ADX (file format)

From Wikipedia, the free encyclopedia

CRI ADX
Developed by: CRI Middleware
Platform: Cross-platform
Development status: Active / Unknown
Genre: Codec / File format
License: Proprietary
Website: CRI Middleware

ADX is a lossy, proprietary audio storage and compression format developed by CRI Middleware specifically for use in video games. It is derived from ADPCM. Its most notable feature is a looping function that has proved useful for background music in the games that have adopted the format, such as the Dreamcast and later-generation Sonic the Hedgehog games from SEGA, as well as many PlayStation 2 and GameCube games. A sibling format, AHX, uses a variant of MPEG-2 audio and is intended specifically for voice recordings. A packaging archive format, AFS, bundles individual ADX and AHX files into a single container.

One of the first games to use ADX was Burning Rangers, on the Sega Saturn.

General Overview

ADX supports the usual range of sampling frequencies, such as 22050 Hz, 44100 Hz and 48000 Hz, but the sample depth is fixed at 16 bits. It supports multiple channels; although the file format itself can represent up to 255 channels, in practice there seems to be an implicit limit of stereo (2-channel) audio. The only really unusual feature is the looping functionality, which lets the audio player automatically skip backwards after reaching a single specified point in the track. In theory the same mechanism could be used to skip forwards, but that would be redundant since the audio could simply be trimmed with an editing program instead.
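The looping behaviour described above can be sketched as a position update. This is only an illustration: the function name is hypothetical, and it assumes the player tracks a sample index and treats the loop end as the first sample that is not played.

```c
#include <stdint.h>

/* Hypothetical helper: advance the playback position by one sample.
 * loop_begin and loop_end are sample indices taken from the header.
 * Assumption: loop_end is exclusive, i.e. reaching it triggers the jump. */
static uint32_t adx_next_position(uint32_t position, int loop_enabled,
                                  uint32_t loop_begin, uint32_t loop_end)
{
    position += 1;
    if (loop_enabled && position == loop_end)
        position = loop_begin;   /* skip backwards to the loop start */
    return position;
}
```

With looping disabled the position simply keeps increasing until the end of the track.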

Technical Description

This section provides a complete technical overview of ADX and is aimed mainly at people with a background in programming.

The ADX format's specification is not freely available; however, the most important elements of the structure have been documented in various places on the web. The information here may be incomplete but is sufficient to build a working codec or transcoder. AHX is not covered here because information about that variant is scarce, although a cursory examination with a hex editor reveals a design strikingly similar to ADX 'version 3' minus the looping feature.

As a side note, AFS archive files are a simple variant of a tarball that uses numerical indices rather than names to identify the contents. Source code for an extractor for this format is included in the ADX archive at [1].

File Header

The ADX on-disk format is big-endian. The identified sections of the main header are outlined below:

Offset  Fields, in byte order (each row covers bytes 0 to 15 from the offset)
0x00    0x80, 0 | Copyright Offset | Unknown | Channel Count | Sample Rate | Total Samples
0x10    Version Mark | Unknown | Loop Enabled (v3) | Loop begin sample index (v3)
0x20    Loop begin byte index (v3) | Loop Enabled (v4) or End sample index (v3) | Loop begin sample index (v4) or End byte index (v3) | Loop begin byte index (v4)
0x30    Loop end sample index (v4) | Loop end byte index (v4) | Unknown
0x40    Unknown/Empty
???     [CopyrightOffset - 2] -> ASCII String: "(c)CRI"
...     [CopyrightOffset + 4] -> Audio Data

The "Version Mark" field should contain the big-endian value 0x01F40400 (hexadecimal) for 'version' 4, or 0x01F40300 for 'version' 3. Fields labelled "Unknown" contain either unknown data or appear to be reserved (i.e. filled with null bytes). Fields labelled with only one of v3 or v4 are "Unknown" in the other version.
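A minimal sketch of reading the version-independent header fields follows. The struct, the function names and the exact byte offsets are assumptions: the offsets (Copyright Offset at bytes 2-3, Channel Count at byte 7, Sample Rate at bytes 8-11, Total Samples at bytes 12-15, Version Mark at bytes 0x10-0x13) are inferred from the field order in the table and from the field sizes used by the decoding code, not taken from an official specification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative subset of the header; names are not part of any CRI API. */
struct adx_header {
    uint16_t copyright_offset;
    uint8_t  channel_count;
    uint32_t sample_rate;
    uint32_t total_samples;
    uint32_t version_mark;
};

/* Big-endian readers that work on any host byte order. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer does not look like an ADX header. */
static int adx_parse_header(const uint8_t *data, size_t size,
                            struct adx_header *out)
{
    if (size < 0x14 || data[0] != 0x80 || data[1] != 0x00)
        return -1;

    out->copyright_offset = read_be16(data + 2);    /* assumed offset */
    out->channel_count    = data[7];                /* assumed offset */
    out->sample_rate      = read_be32(data + 8);    /* assumed offset */
    out->total_samples    = read_be32(data + 12);   /* assumed offset */
    out->version_mark     = read_be32(data + 0x10); /* assumed offset */

    /* The "(c)CRI" string sits at CopyrightOffset - 2 and ends right
     * where the audio data starts (CopyrightOffset + 4). */
    if (out->copyright_offset < 2 ||
        (size_t)out->copyright_offset + 4 > size ||
        memcmp(data + out->copyright_offset - 2, "(c)CRI", 6) != 0)
        return -1;

    return 0;
}
```

The caller can then compare version_mark against the values given above and take the audio data to start at copyright_offset + 4.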

Sample Format

ADX-encoded audio data is broken into a series of consecutive 18-byte blocks. Each block contains data for one channel only. Blocks are laid out in 'frames': one block for each channel, in ascending channel order, makes up a frame (i.e. Frame 1: left channel block, right channel block; Frame 2: left, right; etc.). The layout of a block itself looks like this:

Bytes 0-1    Scale
Bytes 2-17   32 4-bit samples

The scale is a 16-bit unsigned big-endian integer. The high bit is not part of the scale value; it is a flag marking the end of the ADX stream.
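As a sketch of how the scale field might be split into its two parts, the helpers below read the first two bytes of a block and separate the end-of-stream flag from the remaining 15 bits. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define ADX_END_FLAG   0x8000u   /* high bit of the scale field */
#define ADX_SCALE_MASK 0x7FFFu   /* remaining 15 bits: the scale itself */

/* Read the big-endian 16-bit field at the start of an 18-byte block. */
static uint16_t adx_scale_field(const uint8_t *block)
{
    return (uint16_t)((block[0] << 8) | block[1]);
}

/* Non-zero if the block carries the end-of-stream flag. */
static int adx_block_is_end(const uint8_t *block)
{
    return (adx_scale_field(block) & ADX_END_FLAG) != 0;
}

/* The 15-bit scale value with the flag bit removed. */
static uint16_t adx_block_scale(const uint8_t *block)
{
    return (uint16_t)(adx_scale_field(block) & ADX_SCALE_MASK);
}
```

Note that these helpers return the stored value; the decoding code in the next section adds 1 to the scale before using it.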

Decoding Samples

As noted above, each sample consists of 4 bits. The high 4 bits of each byte hold the first sample and the low 4 bits hold the second.

Bits 7-4   First sample
Bits 3-0   Second sample
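The nybble layout can be illustrated with a small accessor. This is a sketch with a hypothetical function name; it returns the raw, still unsigned 4-bit value and leaves sign handling to the decoding step.

```c
#include <stdint.h>

/* Return the raw 4-bit value of sample n (0..31) from the 16 sample
 * bytes of a block (i.e. the bytes that follow the 2-byte scale). */
static uint8_t adx_raw_nibble(const uint8_t *sample_bytes, unsigned n)
{
    uint8_t byte = sample_bytes[n / 2];
    if (n % 2 == 0)
        return (uint8_t)(byte >> 4);     /* even index: high 4 bits */
    return (uint8_t)(byte & 0x0F);       /* odd index: low 4 bits */
}
```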

The decoding method for a sample is demonstrated below in C99:

#include <stdint.h>
#include <string.h>        /* memcpy */
#include <arpa/inet.h>     /* ntohs (POSIX) */

#define SAMPLES_PER_BLOCK  32
#define BYTES_PER_BLOCK    18    /* SAMPLES_PER_BLOCK / 2 + sizeof(uint16_t) */

/* sample_index is a uint_fast32_t incremented every time a sample has been decoded from every channel */
/* current_channel is a uint_fast8_t that holds the index for the channel currently being decoded (ie. 0 for left, 1 for right) */
/* audio_data_start is a uint_fast16_t byte index of the first byte of audio data in the file (ie. adx_header->CopyrightOffset + 4) */
/* num_channels is a uint_fast8_t channel count, that is 1 for mono, 2 for stereo, etc (This is adx_header->ChannelCount verbatim) */
/* raw_data is a uint8_t pointer to the start of where the file is located in memory */
/* previous_sample and second_previous_sample are both int_fast32_t's */
/* volume is an int_fast32_t amplification factor (typical values are discussed below) */

int_fast32_t sample;
uint_fast8_t sample_4bit;
uint_fast16_t block_scale;
uint_fast32_t data_index;
uint16_t raw_scale;

/* --- Get 4 bit sample --- */
data_index = audio_data_start + (sample_index / SAMPLES_PER_BLOCK) * num_channels * BYTES_PER_BLOCK + current_channel * BYTES_PER_BLOCK;
memcpy(&raw_scale, &raw_data[data_index], sizeof raw_scale);   /* Copy rather than cast: the block may not be 2-byte aligned */
block_scale = ntohs(raw_scale) + 1;
data_index += 2 + sample_index % SAMPLES_PER_BLOCK / 2;
sample_4bit = raw_data[data_index];
if (sample_index % 2)                      /* If the sample index [starting at 0] is odd then we are decoding a secondary sample */
    sample_4bit &= 0x0F;
else                                       /* Otherwise it is a primary sample */
    sample_4bit >>= 4;

/* --- Decode 4 bit sample --- */
sample = sample_4bit;
if (sample_4bit & 8) sample -= 16;         /* Check the 4th bit (the sign), if negative then adjust for larger variable */

sample *= (int_fast32_t)block_scale * volume; /* Scale up the sample and amplify (cast keeps the arithmetic signed) */
sample += previous_sample * 0x7298;        /* Incorporate previous sample data */
sample -= second_previous_sample * 0x3350; /* Incorporate previous previous sample data */
sample >>= 14;                             /* Divide the sample by 16384 */
if (sample > 32767)                        /* Clamp the sample to the valid range for a 16bit signed sample */
    sample = 32767;
else if (sample < -32768)
    sample = -32768;

second_previous_sample = previous_sample;  /* Update the previous samples for the current channel */
previous_sample = sample;

The above code assumes the file has been read or mapped into the program's address space (see mmap). This is not necessary in a practical implementation, but it makes the demonstration simpler.

Before processing the sample, it is necessary to acquire the "block scale" and the byte containing the sample within the file. The calculations used here appear more complex than they truly are; a counter and cached variables would be simpler and more efficient in practice, but the full positional calculations are shown for clarity.

The first calculation finds the channel block for the current sample. This involves converting the 'total samples read' counter into a 'number of frames read' counter, then adding the offset of the block for the current channel within the frame. The 'block scale' is located at the start of the block, so it needs to be converted to the local endianness (ntohs is appropriate for this task) and stored for later. The second calculation moves to the byte within the channel block. As the samples are nybbles, not whole bytes, the if statement discards the undesired sample and, where necessary, shifts the wanted nybble into the low 4 bits.
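The two positional calculations can be isolated into small functions, which makes them easier to check by hand. The function names are hypothetical; the arithmetic follows the demonstration code above.

```c
#include <stddef.h>
#include <stdint.h>

#define ADX_SAMPLES_PER_BLOCK 32
#define ADX_BYTES_PER_BLOCK   18

/* Byte offset of the block holding sample_index for the given channel. */
static size_t adx_block_offset(size_t audio_data_start, uint32_t sample_index,
                               unsigned num_channels, unsigned channel)
{
    size_t frame = sample_index / ADX_SAMPLES_PER_BLOCK;  /* frames already passed */
    return audio_data_start
         + (frame * num_channels + channel) * ADX_BYTES_PER_BLOCK;
}

/* Byte offset of the byte containing the sample's nybble: skip the
 * 2-byte scale, then two samples per byte. */
static size_t adx_sample_byte_offset(size_t audio_data_start, uint32_t sample_index,
                                     unsigned num_channels, unsigned channel)
{
    return adx_block_offset(audio_data_start, sample_index, num_channels, channel)
         + 2 + (sample_index % ADX_SAMPLES_PER_BLOCK) / 2;
}
```

For a stereo file, sample 33 of the right channel lives in the fourth block of the stream (frame 1, channel 1), in the first sample byte of that block.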

The decoding process first widens the 4-bit signed value to a 32-bit (or larger) variable, as few desktop processors can handle 4-bit numbers directly. The highest bit of the 4-bit value is the sign bit, and the number itself is stored in two's complement. The demonstration code uses a simple trick to sign-extend the value: for example, if sample_4bit is -1 (1111 in binary), which reads as 15 in unsigned arithmetic, subtracting 16 converts the number back to -1 in the larger variable.
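The sign-extension trick can be written as a standalone function, shown here as a sketch with a hypothetical name:

```c
#include <stdint.h>

/* Convert a raw 4-bit two's complement value (0..15) to a signed integer
 * in the range -8..7 by subtracting 16 when the sign bit (bit 3) is set. */
static int32_t adx_sign_extend4(uint8_t nibble)
{
    int32_t value = nibble & 0x0F;
    if (value & 8)
        value -= 16;
    return value;
}
```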

The next stage is to multiply the sample by the 'block scale', which gives it a sensible amplitude, and then amplify it by a volume. The value used for the volume varies between sources from 0x1000 to 0x4000; going higher than 0x4000 is not recommended, as distortion caused by oversaturating the sound may become noticeable in common test files. The next two steps incorporate information from the previous two samples to bring the sample in line with the others. The previous-sample trackers carry over across block boundaries, but a separate set of trackers must be kept for each channel; the values start at 0 in the first audio frame of the file. Lastly, the sample is divided by 16384 using a right shift to bring it into the expected signed 16-bit range (-32768 to 32767), then clamped if necessary.
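One way to keep a separate tracker set per channel is to bundle the two previous samples into a small state struct, as in the sketch below. The struct and function names are hypothetical, the scale argument is expected to be the stored scale plus 1 as in the demonstration code, and a 64-bit intermediate is used here (a choice of this sketch, not of the original code) to rule out overflow.

```c
#include <stdint.h>

/* Per-channel history; both values start at 0 at the beginning of the file. */
struct adx_channel_state {
    int32_t previous;
    int32_t second_previous;
};

/* Decode one sign-extended 4-bit sample into a 16-bit PCM sample. */
static int16_t adx_decode_sample(struct adx_channel_state *state,
                                 int32_t nibble, int32_t scale, int32_t volume)
{
    int64_t sample = (int64_t)nibble * scale * volume;   /* scale up and amplify */
    sample += (int64_t)state->previous * 0x7298;         /* previous sample */
    sample -= (int64_t)state->second_previous * 0x3350;  /* sample before that */
    sample >>= 14;              /* divide by 16384; assumes an arithmetic shift */

    if (sample > 32767)         /* clamp to the signed 16-bit range */
        sample = 32767;
    else if (sample < -32768)
        sample = -32768;

    state->second_previous = state->previous;
    state->previous = (int32_t)sample;
    return (int16_t)sample;
}
```

A decoder would hold one adx_channel_state per channel, zero-initialised, and pass the matching one for every block of that channel.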

The decoded sample value is a regular raw PCM amplitude sample that can be played on a sound card or fed into an encoder to transcode into some other sound format.

Encryption

ADX supports a simple encryption scheme which XORs values from a linear congruential pseudorandom number generator with the block scale values. This method is computationally inexpensive to decrypt (in keeping with ADX's real-time decoding) yet renders the encrypted files unusable without the key. The encryption is active when the Version Mark value in the header is 0x01F40408 (note that the final byte is 0x08 rather than 0x00 as in unencrypted files). As XOR is symmetric, the same method is used to decrypt as to encrypt. The encryption key is a set of three 16-bit values: the multiplier, increment, and start values for the linear congruential generator (the modulus is 0x8000, to keep the values in the 15-bit range of valid block scales). Typically all ADX files from a single game will use the same key.
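The generator and the XOR step might look like the sketch below. The struct and function names are hypothetical, and the reduction is done by masking with 0x7FFF (that is, modulo 0x8000), which is an assumption about how the 15-bit range is enforced. How the generator is stepped relative to blocks (for example, whether it advances for unencrypted blocks) is not specified here and is left to the caller.

```c
#include <stdint.h>

/* The three 16-bit key values described above. */
struct adx_key {
    uint16_t start;
    uint16_t multiplier;
    uint16_t increment;
};

/* Advance the linear congruential generator by one step.
 * Assumption: results are reduced modulo 0x8000 by masking. */
static uint16_t adx_lcg_next(uint16_t state, const struct adx_key *key)
{
    uint32_t next = (uint32_t)state * key->multiplier + key->increment;
    return (uint16_t)(next & 0x7FFFu);
}

/* XOR a block's scale with the current generator value.
 * The same call both encrypts and decrypts. */
static uint16_t adx_xor_scale(uint16_t scale, uint16_t keystream)
{
    return (uint16_t)(scale ^ keystream);
}
```

Applying adx_xor_scale twice with the same keystream value returns the original scale, which is the symmetry the text refers to.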

The encryption method is vulnerable to known-plaintext attacks. If an unencrypted version of the same audio is known, the random number stream can easily be retrieved, and from it the key parameters can be determined, rendering every ADX file encrypted with that same key decryptable. The encryption method attempts to make this more difficult by not encrypting silent blocks (those with all sample nybbles equal to 0), as their scale is known to be 0.
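A known-plaintext recovery can be sketched in two steps: XOR an encrypted scale with the matching plaintext scale to get a generator output, then search for a multiplier and increment that reproduce three consecutive outputs. The function names are hypothetical, and the sketch assumes the generator is reduced modulo 0x8000 and that the three values really are consecutive outputs.

```c
#include <stdint.h>

/* One value of the random number stream, from a matching pair of scales. */
static uint16_t adx_keystream_value(uint16_t encrypted_scale, uint16_t plain_scale)
{
    return (uint16_t)((encrypted_scale ^ plain_scale) & 0x7FFFu);
}

/* Search for (multiplier, increment) such that
 *   x1 = (multiplier * x0 + increment) mod 0x8000
 *   x2 = (multiplier * x1 + increment) mod 0x8000
 * Returns 1 and stores the first match, or 0 if none exists.
 * Only multipliers below 0x8000 are tried, since larger values are
 * equivalent under this modulus. More than one pair may fit. */
static int adx_recover_lcg(uint16_t x0, uint16_t x1, uint16_t x2,
                           uint16_t *multiplier, uint16_t *increment)
{
    for (uint32_t m = 0; m < 0x8000u; m++) {
        uint32_t c = ((uint32_t)x1 - m * x0) & 0x7FFFu;  /* forced by x0 -> x1 */
        if (((m * x1 + c) & 0x7FFFu) == x2) {            /* check x1 -> x2 */
            *multiplier = (uint16_t)m;
            *increment  = (uint16_t)c;
            return 1;
        }
    }
    return 0;
}
```

When the difference between x1 and x0 is odd, the multiplier is determined uniquely modulo 0x8000; otherwise several candidates can match and more stream values are needed to tell them apart.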

Even if the encrypted ADX is the only sample available, it is possible to determine a key by assuming that the scale values of the decrypted ADX must fall within a "low range". This method does not necessarily find the key used to encrypt the file, however. While it can always determine keys that produce an apparently correct output, errors may exist undetected. This is due to the increasingly random distribution of the lower bits of the scale values, which becomes impossible to separate from the randomness added by the encryption.
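The "low range" test can be sketched as a plausibility check on a candidate key. Everything specific here is an assumption made for illustration: the function name, the max_scale threshold, the masking with 0x7FFF, and the simplification that the generator starts at the start value and advances once per block (silent blocks are not treated specially).

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if decrypting every scale with the candidate key yields
 * values no larger than max_scale, 0 otherwise. */
static int adx_key_plausible(const uint16_t *encrypted_scales, size_t count,
                             uint16_t start, uint16_t multiplier,
                             uint16_t increment, uint16_t max_scale)
{
    uint32_t state = start & 0x7FFFu;
    for (size_t i = 0; i < count; i++) {
        uint16_t scale = (uint16_t)((encrypted_scales[i] ^ state) & 0x7FFFu);
        if (scale > max_scale)
            return 0;                         /* decrypted scale is too large */
        state = (state * multiplier + increment) & 0x7FFFu;
    }
    return 1;
}
```

A key that passes this check is only a candidate: as noted above, a wrong key can still produce scales that look acceptable, particularly when it differs from the real one only in the low bits.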

Sources
