This post has an overview of information entropy then moves onto some technical details and experiments. If you have any comments, questions, important additions, etc., please hit me up by commenting on this post, or on twitter at. Thanks to all the people on twitter who answered questions & helped me learn this stuff. Links to threads and resources are at the bottom of the post!

Information Entropy Overview

Information entropy is a measure of how much information there is in some specific data. It isn't the length of the data, but the actual amount of information it contains.

For example, one text file could contain "Apples are red." and another text file could contain "Apples are red. Apples are red. Apples are red." The second text file is 3 times longer than the first but doesn't really give any extra information. These two text files would have nearly the same amount of information entropy, despite them having different lengths.

It isn't really possible to calculate the actual amount of information entropy a specific piece of data has in it, because it's related to Kolmogorov complexity: the length of the shortest program that can generate the data, which turns out to be an incalculable value. That's why I said that those two text files would have nearly the same amount of information entropy instead of saying they had the same amount. A program to generate the second text file is probably going to be very slightly more complex than the program to generate the first one: it'll have a for loop on the outside!

You can get a pretty good upper bound on information entropy, though, by making a histogram of how often each symbol appears, treating that histogram like a weighted N-sided dice (a probability mass function, or PMF), and calculating the entropy of that dice.

Interestingly, Huffman compression is related to this. In Huffman compression, you take a histogram of the data and give shorter bit patterns to the symbols that occur more often. This makes the overall data smaller, but doesn't change the amount of information there is in the data. In fact, it's part of a family of compressors called "entropy encoders".

Because of this, you can get a pretty good idea of how much information is in some data by how much you can compress it. If it shrinks a lot, the information density was low. If it doesn't shrink much, or even gets larger (which happens when the compressed data plus the compression header is larger than the original data!), then the information density is pretty high.

However, there are counterexamples to that. Encrypting data hides the patterns of the data, making it look more like uniformly distributed white noise (independent random numbers) while keeping the data the same size.
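To make the histogram idea concrete, here's a minimal C++ sketch of my own (not code from the post itself; treating one symbol = one byte is an assumption): build a byte histogram, treat it as a PMF, and compute the Shannon entropy H = -Σ pᵢ log₂(pᵢ) in bits per byte.

```cpp
#include <cmath>
#include <cstdio>
#include <cstring>

// Minimal sketch (assumption: one symbol = one byte) of the histogram upper
// bound: histogram the bytes, treat the histogram as a PMF, and compute
// Shannon entropy H = -sum_i p_i * log2(p_i), in bits per byte.
double EntropyBitsPerByte(const unsigned char* data, size_t size)
{
    size_t histogram[256] = {};
    for (size_t i = 0; i < size; ++i)
        histogram[data[i]]++;

    double entropy = 0.0;
    for (size_t count : histogram)
    {
        if (count == 0)
            continue;
        double p = double(count) / double(size);
        entropy -= p * std::log2(p);
    }
    return entropy; // 0.0 (one repeated byte) up to 8.0 (uniform random bytes)
}

int main()
{
    const char* text1 = "Apples are red.";
    const char* text2 = "Apples are red. Apples are red. Apples are red.";
    printf("text1: %0.4f bits/byte over %zu bytes\n",
        EntropyBitsPerByte((const unsigned char*)text1, strlen(text1)), strlen(text1));
    printf("text2: %0.4f bits/byte over %zu bytes\n",
        EntropyBitsPerByte((const unsigned char*)text2, strlen(text2)), strlen(text2));
    return 0;
}
```

Note that the two example files report nearly the same bits per byte: the histogram ignores ordering, so it can't see the repetition. Multiplying bits per byte by file length would overcount the second file's information by about 3x, which is exactly why this is only an upper bound.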
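And here's a sketch of the compression probe described above, using zlib's compress() as one illustrative choice (the post doesn't prescribe a specific compressor):

```cpp
#include <cstdio>
#include <cstring>
#include <vector>
#include <zlib.h>

// Rough information-density probe: compress the data and compare sizes.
// Build with -lz. A big shrink suggests low information density; little
// shrink, or growth (the compression header!), suggests high density.
int main()
{
    const char* text = "Apples are red. Apples are red. Apples are red.";
    uLong srcLen = (uLong)strlen(text);

    uLongf destLen = compressBound(srcLen);
    std::vector<Bytef> dest(destLen);
    if (compress(dest.data(), &destLen, (const Bytef*)text, srcLen) != Z_OK)
        return 1;

    printf("original: %lu bytes, compressed: %lu bytes\n",
        (unsigned long)srcLen, (unsigned long)destLen);
    return 0;
}
```

Unlike the histogram, a real compressor does exploit repetition, so the repeated file compresses much better than its per-byte entropy alone would suggest — and, per the caveat above, encrypted data would barely compress at all.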