Wavelets and Image Compression

Image Approximation

The ideas for approximating one dimensional signals using fewer coefficients were developed previously. These ideas extend naturally to \(2D\) signals, i.e., images. It was also shown that nonlinear approximation gives better performance than linear approximation. Recall that nonlinear approximation selects the \(M\) largest coefficients instead of the first \(M\) coefficients as in the linear case. For wavelets, nonlinear approximation performs significantly better than linear approximation because large coefficients occur at every scale of the signal decomposition. For images, we therefore stick with nonlinear approximation owing to its superior performance.

Approximation Example: Lena Image Approximation using \(1/50\) of the coefficients

As an example of approximation, consider approximating the Lena image using the Fourier transform and the wavelet transform. The algorithms are the same as in the \(1D\) case.


Original 512x512 Lena Image

In the Fourier case, we take the Fourier transform of the image, retain only the largest one out of every fifty coefficients while setting the rest to zero, and then reconstruct the image from these largest \(2\%\) of the coefficients. The reconstructed image is shown below.


Lena Image Fourier Approximation using only the largest 2% of coefficients
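For concreteness, here is a minimal sketch of this Fourier approximation step in Python, assuming NumPy and a grayscale image held in a 2D array `img`; the function name and `keep` parameter are illustrative.

```python
import numpy as np

def fourier_approx(img, keep=1/50):
    # 2D Fourier transform of the image
    F = np.fft.fft2(img)
    # magnitude of the k-th largest coefficient, k = 2% of all coefficients
    k = int(F.size * keep)
    thresh = np.sort(np.abs(F).ravel())[-k]
    # zero out everything below the threshold and reconstruct
    F_approx = np.where(np.abs(F) >= thresh, F, 0)
    return np.real(np.fft.ifft2(F_approx))
```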

Next we repeat the same approximation process with the wavelet method. We take a \(J=4\) stage Discrete Wavelet Transform (derived from Db2 wavelets) of the Lena image and keep the largest \(1/50\) of the coefficients while setting the rest to zero. The Inverse Discrete Wavelet Transform is then taken using these new coefficients. The resulting image is shown below and is a visibly better reconstruction than the Fourier case.


Lena Image Wavelet Approximation using only the largest 2% of coefficients
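The corresponding wavelet approximation can be sketched along the same lines, here assuming the PyWavelets (`pywt`) package; the \(J=4\) stage Db2 decomposition follows the text above, while the function name and `keep` parameter are again illustrative.

```python
import numpy as np
import pywt

def wavelet_approx(img, wavelet="db2", level=4, keep=1/50):
    # J = 4 stage 2D DWT of the image
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    arr, slices = pywt.coeffs_to_array(coeffs)
    # retain only the largest-magnitude 2% of coefficients
    k = int(arr.size * keep)
    thresh = np.sort(np.abs(arr).ravel())[-k]
    arr = np.where(np.abs(arr) >= thresh, arr, 0.0)
    # inverse DWT from the retained coefficients
    coeffs = pywt.array_to_coeffs(arr, slices, output_format="wavedec2")
    return pywt.waverec2(coeffs, wavelet)
```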

Information Theory Background

Let us assume that the source is discrete and finite, consisting of \(N\) symbols, and is given by the set \(S\), also known as the alphabet.

\[ S=[s_{0},s_{1},\dots,s_{N-1}] \]

Let \(X=[x_{0},x_{1},\dots,x_{i},\dots]\) be a sequence of outputs where each output \(x_{i}\) is taken from the set \(S\). The probability that the system outputs the \(k\)th symbol \(s_{k}\) is given by \(p_{k}\), where

\[ \sum_{k=0}^{N-1} p_{k}=1 \]

as there are only \(N\) symbols.

Self-Information: We define the information content \(I_{k}\) of each symbol based on its probability of being the output. It is given by

\[ I_{k}=-\log_{2}(p_{k}) \]

and is measured in bits. If \(p_{k}=1\), i.e., the symbol occurs every time, then the information content of such a certain event is zero. On the other hand, if the event is impossible, \(p_{k}=0\), then its information content is \(\infty\). More intuitively, the likelier an event, the less information is needed to describe it, while unlikely events carry more information.

Source Entropy: The source entropy is defined as the expected value of the self-information.

\[ H(S)=E\{I_{k}\}=-\sum_{k=0}^{N-1} p_{k}\log_{2}(p_{k}) \]

\(H(S)\) is measured in bits per symbol and is maximal for a flat probability distribution, i.e., if each symbol is equally probable with probability \(\frac{1}{N}\), then \(H(S)=\log_{2}(N)\). A smaller source entropy is obtained if the probability distribution is not flat.

Average Bit Rate \(R_{x}\): Let \(l_{k}\) be the length of the codeword associated with the \(k\)th symbol with output probability \(p_{k}\). The average bit rate \(R_{x}\) is then given by

\[ R_{x}= \sum_{k=0}^{N-1} p_{k}l_{k} \]

The objective of good coding is to minimize the average bit rate for a given information source. Also, \(H(S)\) sets a lower bound on the average bit rate, so the \(R_{x}\) of a particular coding scheme can be measured against the source entropy.
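As a small sketch of these two quantities in Python (the function names are illustrative), the first call below checks the flat-distribution case mentioned above:

```python
from math import log2

def entropy(p):
    # H(S) in bits per symbol; terms with p_k = 0 contribute nothing
    return -sum(pk * log2(pk) for pk in p if pk > 0)

def average_bit_rate(p, l):
    # R_x in bits per symbol, given codeword lengths l_k
    return sum(pk * lk for pk, lk in zip(p, l))

print(entropy([0.25, 0.25, 0.25, 0.25]))               # flat distribution: log2(4) = 2.0
print(average_bit_rate([0.5, 0.25, 0.25], [1, 2, 2]))  # 1.5 bits per symbol
```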

Prefix Code: One important condition for any good code is that no codeword may be a prefix of another codeword. As an example, consider a code with the following codewords

\[ c_{1}=1 , c_{2}=11 , c_{3}=10 , c_{4}=101 \]

Consider the sequence \(1011\). It can be read as both \(c_{4}c_{1}\) and \(c_{3}c_{2}\), which cannot be resolved at the decoder. The solution is to generate codes using a prefix binary tree. Each node of the tree has only one input branch and no more than two output branches, with the left branch labeled \(0\) and the right branch labeled \(1\). The binary tree ends in \(K\) leaves, each corresponding to a unique codeword. In this case, the length \(l_{k}\) of each codeword is given by the number of branches from the root to the \(k\)th leaf.
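The prefix condition is easy to check mechanically; the helper below (an illustrative sketch) rejects the ambiguous code above and accepts the Huffman code constructed in the next section.

```python
def is_prefix_free(codewords):
    # no codeword may be a prefix of a different codeword
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

print(is_prefix_free(["1", "11", "10", "101"]))   # False: "1" is a prefix of "11"
print(is_prefix_free(["0", "10", "110", "111"]))  # True
```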

Huffman Coding

The premise of Huffman coding is that the more probable a symbol, the shorter its codeword should be: if \(P(s_{i}) \lt P(s_{j})\) then \( l_{j} \le l_{i}\). Huffman codes are constructed in \(4\) steps:

1. Sort the symbols in order of decreasing probability.
2. Merge the two least probable symbols (or nodes) into a new node whose probability is the sum of the two.
3. Repeat step 2 until a single node with probability \(1\) remains.
4. Label the two branches of every merge with \(0\) and \(1\) and read each codeword from the root down to its leaf.

Huffman Coding Example


Huffman Coding Example

Using the symbol probabilities \(p=[0.4, 0.3, 0.2, 0.1]\), the codewords are obtained as in step \(4\). We get

\[ s_{1}=0 , s_{2}=10 , s_{3}=110 , s_{4}=111 \]

The average bit rate is found by using

\[ R_{x}= \sum_{k=0}^{N-1} p_{k}l_{k} = 0.4 * 1 + 0.3 * 2 + 0.2 * 3 + 0.1 * 3 = 1.9 \]

The source entropy is calculated using

\[ H(S)=E\{I_{k}\}=-\sum_{k=0}^{N-1} p_{k}\log_{2}(p_{k}) \] \[ H(S)=-[0.4\log_{2}(0.4)+0.3\log_{2}(0.3)+0.2\log_{2}(0.2)+0.1\log_{2}(0.1)] = 1.846 \]

The average bit rate is slightly higher than the source entropy, but it is still better than fixed-length coding, which requires \(2\) bits per symbol for this example.
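As an illustration, here is a minimal Huffman construction in Python using a heap, applied to the example probabilities; it is a sketch rather than an optimized coder, and the exact 0/1 labeling of branches may differ from the codewords listed above, but the codeword lengths, and hence the bit rate, are the same.

```python
import heapq

def huffman(probs):
    # heap entries: (probability, tie-breaker, {symbol index: codeword so far})
    heap = [(p, i, {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # merge the two least probable nodes, prefixing 0/1 to their codewords
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

probs = [0.4, 0.3, 0.2, 0.1]
code = huffman(probs)
rate = sum(p * len(code[s]) for s, p in enumerate(probs))
print(code, rate)   # codeword lengths 1, 2, 3, 3 -> average bit rate 1.9
```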

Arithmetic Coding

Drawbacks of Huffman Coding

Huffman coding assigns an integer number of bits to every symbol, so the average bit rate can exceed the source entropy by up to one bit per symbol; the loss is most noticeable when one symbol is very probable. Coding blocks of symbols reduces this overhead, but the code table then grows exponentially with the block length. Arithmetic coding avoids these problems by encoding the entire input sequence as a single number.

Basis of Arithmetic Coding: Consider a binary sequence \(B = 100111011\). We can convert it into a decimal number \( 0 \leq v \lt 1\) by reading it as the binary fraction \( 0.100111011_{2} \), which equals \( v = 0.615234375_{10} \). In other words, no matter how long the sequence is, it can be mapped into \( [0,1)\). In arithmetic coding, we create a sequence of nested intervals

\[ \phi_{k}(S)= [a_{k},b_{k}) \]

where \(a_{k}\) and \(b_{k}\) are real numbers and

\[ 0 \leq a_{k} \leq a_{k+1} , b_{k+1} \leq b_{k} \leq 1 \]

Alternatively, we can represent each interval by its base \(b\) (its lowest value) and its length \(l\):

\[ |b,l \gt \]

Arithmetic Coding Algorithm

Start with the interval \(|b_{0},l_{0}\gt = |0,1\gt\). For each input symbol \(s_{k}\), with probability \(p(s_{k})\) and cumulative probability \(c(s_{k})\), update the interval as

\[ |b_{k},l_{k}\gt = |b_{k-1}+c(s_{k})\,l_{k-1}\, , \,p(s_{k})\,l_{k-1}\gt \]

After the last symbol, any number \(v\) lying inside the final interval represents the entire input sequence.

Example: Let us consider a source consisting of four symbols with probability and cumulative distribution given as

\[ p=[0.2 , 0.5 , 0.2 , 0.1] \] \[ c=[0 , 0.2 , 0.7 , 0.9 , 1] \]

Let us consider the input sequence \(S= [ 2 , 1] \). Corresponding to the first value \(2\), we have \(p(s_{1}) = 0.2 \) and \(c(s_{1}) = 0.7 \). Calculating

\[ |b_{1},l_{1}\gt = |b_{0}+c(s_{1})\,l_{0}\, , \,p(s_{1})\,l_{0}\gt = |0.7,0.2 \gt \]

Corresponding to the second value \(1\), we have \(p(s_{2}) = 0.5 \) and \(c(s_{2}) = 0.2 \). Calculating

\[ |b_{2},l_{2}\gt = |b_{1}+c(s_{2})\,l_{1}\, , \,p(s_{2})\,l_{1}\gt = |0.74,0.10 \gt \]

If we are transmitting only these two values, the encoder is free to pick the shortest binary number lying in the interval \([0.74, 0.84)\). The minimum number of bits needed for encoding is given by

\[ B_{m}= -\log_{2} (l_{n}) \]

\(B_{m}\) is rounded up to the nearest integer. This coding is especially effective for long chains of input values, since all inputs are encoded together as a single block; the trade-off is that a more complex implementation is needed to generate the codeword.
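A minimal sketch of the encoder's interval update, using the probabilities from the example above; floating-point arithmetic is used purely for illustration, whereas practical coders work with rescaled integer intervals.

```python
from math import ceil, log2

p = [0.2, 0.5, 0.2, 0.1]        # symbol probabilities
c = [0.0, 0.2, 0.7, 0.9, 1.0]   # cumulative distribution

def arith_encode(symbols):
    b, l = 0.0, 1.0                    # start from the interval |0, 1>
    for s in symbols:
        b, l = b + c[s] * l, p[s] * l  # |b_k, l_k> = |b_{k-1} + c(s_k) l_{k-1} , p(s_k) l_{k-1}>
    return b, l

b, l = arith_encode([2, 1])
print(b, l, ceil(-log2(l)))            # ~0.74, ~0.10 and B_m = 4 bits
```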

Decoding

Although some overhead is needed (such as the number of symbols to decode), decoding depends solely on the binary string \(v\). The decoding inverts the interval updates: find the symbol \(s\) whose interval \([c(s), c(s)+p(s))\) contains \(v\), output that symbol, replace \(v\) with \((v-c(s))/p(s)\), and repeat until all symbols are recovered.
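A matching decoding sketch under the same assumptions, given a value \(v\) inside the final interval and the number of symbols to recover:

```python
p = [0.2, 0.5, 0.2, 0.1]        # symbol probabilities (as in the example)
c = [0.0, 0.2, 0.7, 0.9, 1.0]   # cumulative distribution

def arith_decode(v, n_symbols):
    out = []
    for _ in range(n_symbols):
        # pick the symbol whose interval [c(s), c(s) + p(s)) contains v
        s = max(k for k in range(len(p)) if c[k] <= v)
        out.append(s)
        # rescale v back to [0, 1) and continue with the next symbol
        v = (v - c[s]) / p[s]
    return out

print(arith_decode(0.75, 2))    # recovers [2, 1]
```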


Resources

Numerical Tour
Huffman Coding Wiki
Arithmetic Coding Wiki
JPEG2000 Wiki
JPEG2000 Official Page
C Valens EZW Page ( Matlab and C Codes )
