Universal data compression
From Wikipedia, the free encyclopedia
| This article does not cite any references or sources. (January 2007) Please help improve this article by adding citations to reliable sources. Unverifiable material may be challenged and removed. |
| The introduction to this article provides insufficient context for those unfamiliar with the subject. Please help improve the article with a good introductory style. |
Universal data compression is impossible, though some claim to have found a way to do it. Any data compression algorithm necessarily leaves some input data uncompressed. A simple counting argument illustrates this.
Suppose there is an algorithm A which compresses all data sequences of length b (in bits) to shorter length data sequences. Let c be the length of the longest compressed sequence. Since A compresses all data sequences it must be the case that c < b. Now, there are 2b sequences of length b, but only 2c sequences of length c. Since 2c < 2b, it must be the case that at least two sequences, say U1 and U2 compress to the same value, say C. As a result, when uncompressing C it is not possible to determine whether U1 or U2 should be produced. Therefore, there is no way to undo algorithm A, and A is not correct.

