
Audio Steganography PT4: detecting MP3Stego

A new method to detect MP3Stego

When an MP3 file is encoded, the frame length is counted in bytes, as the bit stream format requires, but during quantization and encoding the Main Data is a bit stream. So when the data is packed into frames, 1 to 7 padding bits may be needed to bring the length up to a whole number of bytes. Moreover, different frames compress at different rates. To keep the frame rate constant, further padding bits may be needed to fill the frame. These additional bits are kept in the stuffingBits or ancillary positions, as shown in Figure 3.

Figure 3 MP3 frame structure

The following figure shows how MP3Stego packs data into frames and appends stuffingBits.

void ResvFrameEnd(L3_side_info_t *l3_side, int mean_bits)
{   …
    if (stuffingBits) {   …
#ifdef MP3STEGO  /* preserve the parity of part2_3_length */
        if (stuffingBits % 2) {
            gi->part2_3_length += stuffingBits - 1;
            stuffingBits = 1;
        } else {
            gi->part2_3_length += stuffingBits;
        }
#else  /* a normal encoder has no parity requirement */
        gi->part2_3_length += stuffingBits;
#endif
        …
    }
    …
}

Figure 4 The simplified bit reservoir codes of MP3Stego

When 8hz-mp3 encodes an MP3 file, 1 to 7 padding bits are appended at the part23 stuffingBits positions so that the part23 length becomes a whole number of bytes; any other padding bits that are needed are appended at the ancillary positions. With MP3Stego, however, the number of stuffingBits added to part23 must be even, to preserve the hiding rule that an odd part23 length represents 1 and an even part23 length represents 0. There is a 50% probability that the number of padding bits is odd, in which case 1 bit is left over, and MP3Stego appends it at the ancillary position, as shown in Figure 4.
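The split described above can be sketched in a few lines of C. This is only an illustration of the rule as the text states it, not MP3Stego's actual code: `stuffing_split_t`, `split_stuffing` and `hidden_bit` are hypothetical names introduced here, and the `stego` flag stands in for the `MP3STEGO` compile-time switch.

```c
#include <assert.h>

/* Hypothetical helper types and names, not part of MP3Stego or 8hz-mp3. */
typedef struct {
    int part23_length;   /* part23 length after stuffing, in bits */
    int ancillary_bits;  /* leftover bits pushed to the ancillary area */
} stuffing_split_t;

/* Distribute stuffing_bits between part23 and the ancillary area.
 * stego != 0 models the MP3Stego branch: only an even number of bits
 * may go into part23, so the parity of its length is unchanged. */
stuffing_split_t split_stuffing(int part23_length, int stuffing_bits, int stego)
{
    stuffing_split_t out;

    assert(part23_length >= 0 && stuffing_bits >= 0);

    if (stego && (stuffing_bits % 2)) {
        /* Odd count: keep one bit back so parity is preserved. */
        out.part23_length = part23_length + stuffing_bits - 1;
        out.ancillary_bits = 1;
    } else {
        /* Even count, or a normal encoder: all bits go into part23. */
        out.part23_length = part23_length + stuffing_bits;
        out.ancillary_bits = 0;
    }
    return out;
}

/* Hiding rule from the text: odd length represents 1, even represents 0. */
int hidden_bit(int part23_length)
{
    return part23_length % 2;
}
```

For example, a 1001-bit part23 needs 7 bits to reach a byte boundary. `split_stuffing(1001, 7, 0)` gives 1008 bits and no ancillary bit, while `split_stuffing(1001, 7, 1)` gives 1007 bits (still odd, so the hidden bit is still 1) plus 1 ancillary bit. That is the "7 modulo 8" case a detector can look for.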

Although this change does not interrupt MP3 decoding, it deviates from the normal case. Normal MP3 encoding and MP3Stego encoding can therefore be told apart by the statistics of the part23 length, the stuffingBits length and the ancillary length. Table 1 shows the statistics of these three lengths modulo 8 when a WAVE file is encoded to an MP3 file.
Table 1 Statistics of the part23 length, stuffingBits and ancillary for MP3Stego, 8hz-mp3 and Lame

Encoder    Length (mod 8)      0     1     2     3     4     5     6     7
MP3Stego   part23            396     0     0     0     0     0     0   396
MP3Stego   stuffingBits      985     0   189     0   224     0   186     0
MP3Stego   ancillary         396   396     0     0     0     0     0     0
8hz-mp3    part23            792     0     0     0     0     0     0     0
8hz-mp3    stuffingBits      894    91    96    96   102   116    95    94
8hz-mp3    ancillary         792     0     0     0     0     0     0     0
Lame       part23            106   101    90   110    96   101   103    86
Lame       stuffingBits     1585     0     0     0     0     0     0     0
Lame       ancillary         106    86   103   101    96   110    90   101

Clearly, MP3Stego does not adjust every part23 length to a whole number of bytes: 50% are byte-aligned and the other 50% equal 7 modulo 8. The stuffingBits counts modulo 8 are 0 at 1, 3, 5 and 7 and nearly equal at 2, 4 and 6, and the ancillary length is 1 in 50% of cases and 0 in the other 50%. Meanwhile, 8hz-mp3 adjusts the part23 length to a whole number of bytes when encoding, its stuffingBits distribution over 1 to 7 is nearly equal, and its ancillary length is always 0. There is a distinct difference between MP3Stego and 8hz-mp3.
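Counts like those in Table 1 come from reducing each per-frame length modulo 8 and tallying the residues. A minimal sketch follows, assuming the lengths have already been extracted from the frames by some parser that is not shown here; `mod8_histogram` is a hypothetical name, not a function of any of the encoders discussed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: tally how many of the n lengths fall into each
 * residue class modulo 8. hist must have room for 8 counters. */
void mod8_histogram(const int *values, size_t n, long hist[8])
{
    size_t i;

    assert(values != NULL || n == 0);

    for (i = 0; i < 8; i++) {
        hist[i] = 0;
    }
    for (i = 0; i < n; i++) {
        assert(values[i] >= 0);   /* lengths are non-negative bit counts */
        hist[values[i] % 8]++;
    }
}
```

Running this once over the part23 lengths, once over the stuffingBits lengths and once over the ancillary lengths of a file would yield one row each of the kind shown in Table 1.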

There are many MP3 encoders. To compare their characteristics, we also need the statistics of other encoders.
Table 1 gives the statistics of the classic MP3 encoder Lame. Lame does not adjust the part23 length; instead it appends the padding bits at the ancillary position. So all stuffingBits are 0, and the sum of the part23 length and the ancillary length in the same frame is 0 modulo 8.

Comparing these statistics across encoders leads to a statistical formula for detecting MP3Stego.

R=\frac{abs(L_8^7-L_8^6)}{\sum_{i=0}^7 L_8^i}+\frac{4*stereo}{3}*\frac{abs(N_8^2 - N_8^1)+abs(N_8^4 - N_8^3)+abs(N_8^6 - N_8^5)}{\sum_{i=0}^7 N_8^i}

Here L_8^i is the number of part23 lengths that equal i modulo 8, and N_8^i is the number of stuffingBits lengths that equal i modulo 8. The threshold can be set to 0.5: when R is greater than 0.5, MP3Stego is detected, and when R is less than 0.5, it is not.
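As a rough check of the formula against Table 1, the statistic can be computed directly from the two histograms. This is a sketch under stated assumptions: `mp3stego_r` is a hypothetical name, and since the text does not define `stereo`, it is simply passed in as a parameter (the figures quoted below use `stereo = 1` as an assumption).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper computing R from the formula above.
 * L[i]: count of part23 lengths equal to i modulo 8.
 * N[i]: count of stuffingBits lengths equal to i modulo 8.
 * stereo: factor from the formula; its meaning is not given in the text. */
double mp3stego_r(const long L[8], const long N[8], int stereo)
{
    long sum_l = 0, sum_n = 0;
    double part23_term, stuffing_term;
    int i;

    for (i = 0; i < 8; i++) {
        sum_l += L[i];
        sum_n += N[i];
    }
    assert(sum_l > 0 && sum_n > 0);   /* avoid division by zero */

    part23_term = (double)labs(L[7] - L[6]) / (double)sum_l;
    stuffing_term = (double)(labs(N[2] - N[1]) + labs(N[4] - N[3]) +
                             labs(N[6] - N[5])) / (double)sum_n;

    return part23_term + (4.0 * stereo / 3.0) * stuffing_term;
}
```

With the Table 1 rows and `stereo = 1`, this gives R of about 1.00 for MP3Stego, about 0.027 for 8hz-mp3 and about 0.021 for Lame, so only the MP3Stego file exceeds the 0.5 threshold.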


About Pythorian

Exploration and Production oriented security consultant for securing IT infrastructures relating to natural resources.

