On audio compression

In the last few days, I have been reading and thinking a lot about audio compression.

Lossy v. lossless compression

As most of you will know, there are two major types of compression: lossless and lossy. In the first case, we take a string of digital information and reduce the amount of space it takes to store without actually destroying any information at all. For example, we could take a string like:

1-2-1-7-3-5-5-5-5-5-5-5-5-5-5-5-5-5-2-2-2-3-4

And convert it into:

1-2-1-7-3-5(13)-2(3)-3-4

Depending on the character of the data and the kinds of rules we use to compress it, this will result in a greater or lesser amount of compression. The upshot is that we can always return the data to its original state. If the file in question is an executable (a computer program), this is obviously required. A file that closely resembles Doom, as a string of bits, will nonetheless probably not run like Doom (or at all).
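The scheme in the example above is usually called run-length encoding. Here is a minimal sketch of it in Python; the function names are mine, chosen for illustration, and real lossless formats use considerably more sophisticated rules than this.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


def format_pairs(pairs):
    """Render pairs in the notation used above: a run of one stays bare."""
    return "-".join(str(v) if n == 1 else f"{v}({n})" for v, n in pairs)


original = "1-2-1-7-3-5-5-5-5-5-5-5-5-5-5-5-5-5-2-2-2-3-4".split("-")
encoded = rle_encode(original)

# Lossless means the round trip gives back exactly what we started with.
assert rle_decode(encoded) == original
print(format_pairs(encoded))
```

The assertion is the whole point: whatever rules we pick, decoding must reproduce the input exactly.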

Lossless compression is great. It allows us, for instance, to go back to the original data and then manipulate it with as much freedom as we had to begin with. The cost of that flexibility is that losslessly compressed files are larger than those treated with lossy compression. For data that is exposed to human senses (especially photos, music, and video), it is generally worthwhile to employ ‘lossy’ compression. A compact disc stores somewhere in the realm of 700MB of data. Uncompressed, that would take up an equivalent amount of space on an iPod or computer hard drive. There is almost certainly some level of lossy compression at which it would be impossible for a human being with good ears and the best audio equipment to tell whether they were hearing the compressed or uncompressed version. This is especially true when the data source is a CD, a format with considerable limitations of its own when it comes to storing audio information.

Lossy compression, therefore, discards bits of the information that are less noticeable in order to save space. Two bits of sky that are almost-but-not-quite the same colour of blue in an uncompressed image file might become actually the same colour of blue in a compressed image file. This happens to a greater and greater degree as the level of compression increases. As with music, there is some point where it is basically impossible to distinguish the original uncompressed data from a compressed file of high quality. With music, it might be that a tenth of a second of near silence followed by a tenth of a second of the slightest noise becomes a fifth of a second of near silence.
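The sky example can be sketched with a toy quantizer. This is only an illustration under my own assumptions (the pixel values and the step size are made up, and real image codecs such as JPEG work quite differently), but it shows how two nearly identical colours collapse into one and why the original cannot be recovered afterwards.

```python
def quantize(channel, step):
    """Round an 8-bit channel value (0-255) down to a multiple of step."""
    return (channel // step) * step


def quantize_pixel(rgb, step):
    """Apply the same coarse rounding to the red, green and blue channels."""
    return tuple(quantize(c, step) for c in rgb)


# Two hypothetical, almost-but-not-quite identical blues.
sky_a = (100, 149, 237)
sky_b = (102, 151, 235)

# With a step of 1 nothing is discarded and the pixels stay distinct.
print(quantize_pixel(sky_a, 1), quantize_pixel(sky_b, 1))

# With a coarser step both pixels land on the same colour.
print(quantize_pixel(sky_a, 8), quantize_pixel(sky_b, 8))
```

A larger step means fewer distinct colours to store and more visible damage, which is the trade-off that a compression level controls.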

MP3 and AAC are both very common kinds of music compression. Each can be done at different bit rates, which determine how much data is used to represent a given length of time. Higher bit rates contain more data (which one may or may not be able to hear), while lower bit rates contain less. The iTunes standard is to use AAC at 128 kbit/s. I have seen experts do everything from utterly condemning this as far too low to claiming that at this level the sound is ‘transparent’, meaning that it is impossible to tell that it was compressed.
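To make the bit-rate idea concrete, here is a rough calculation of file sizes. It assumes the iTunes figure means 128 kilobits per second, uses the standard CD audio parameters (44,100 samples per second, 16 bits per sample, two channels), and takes a hypothetical four-minute song; the function name is my own.

```python
def size_megabytes(kbit_per_second, seconds):
    """File size in megabytes (10**6 bytes) at a constant bit rate."""
    bits = kbit_per_second * 1000 * seconds
    return bits / 8 / 1_000_000


# Uncompressed CD audio: samples per second * bits per sample * channels.
CD_KBPS = 44_100 * 16 * 2 / 1000

track_seconds = 4 * 60  # hypothetical four-minute song

for label, rate in [("CD audio", CD_KBPS), ("AAC at 256", 256), ("AAC at 128", 128)]:
    print(f"{label}: {rate:g} kbit/s, {size_megabytes(rate, track_seconds):.2f} MB")
```

Variable bit rate files will not follow this exactly, since the encoder spends more bits on difficult passages and fewer on easy ones.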

But what sort to use, exactly?

Websites on which form of compression to use generally take the form of: “I have made twenty five different versions of the same three songs. I then listened to each using my superior audio equipment and finely tuned ear and have decided that X is the best sort of compression. Anyone who thinks you should use something more compressed than X obviously doesn’t have my fine ability to discern detail. Anyone who wants you to use more than X is an audiophile snob who is more concerned about equipment than music.”

This is not a very useful kind of judgment. Most problematically, the person doing the listening is also the experimenter, and so knows which track is which. It has been well established that taking an audio expert and telling them that they are listening to a $50,000 audiophile quality stereo will lead to a good review of the sound, even if they are really listening to a $2,000 system. (There are famous pranks where people have put a $100 portable CD player inside the case for absurdly expensive audio gear and passed the former off as the latter to experts.) The trouble is both that those being asked to make the judgment feel pressured to demonstrate their expertise and that people genuinely perceive things they expect to be superior as being so.

Notoriously, people who are given Coke and Pepsi to taste are more likely to express a preference for the latter if they do not know which is which, but for the former when they do. Their pre-existing expectations affect the way they taste the drinks.

What is really necessary is a double-blind study. We would make a large number of versions of a collection of tracks with different musical qualities. The files would then be assigned randomized names by a group that does not communicate with either the experimenters or the subjects. The subjects would then listen to two different versions of the same track and choose which they prefer. Each of these trials would produce what statisticians call a dyad. Once we have hundreds of dyads through which to compare versions, presented in differing orders, we can start to draw statistically valid conclusions about whether two versions can be distinguished and which one is perceived as better on average.
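The blinding step could be sketched roughly as follows. Everything here is hypothetical (the function names, the version labels, the eight-character codes); the idea is only that a third party holds the key linking codes to encodings, while subjects and experimenters see nothing but opaque names in a random playback order.

```python
import random
import secrets


def blind_labels(versions):
    """Map each version to an opaque random code; only a third party keeps this key."""
    key = {}
    for version in versions:
        code = secrets.token_hex(4)
        while code in key.values():  # regenerate on the rare collision
            code = secrets.token_hex(4)
        key[version] = code
    return key


def make_trial(key, version_a, version_b, rng=random):
    """Return the two coded files in a random playback order (one dyad)."""
    pair = [key[version_a], key[version_b]]
    rng.shuffle(pair)
    return tuple(pair)


# Hypothetical encodings of one track.
key = blind_labels(["aac_128", "aac_192", "aac_256"])
first, second = make_trial(key, "aac_128", "aac_192")
print(f"Play {first}, then {second}, and record which one the subject prefers.")
```

Shuffling the order matters because listeners may favour whichever clip they hear first or last, regardless of quality.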

We would then analyze those frequencies to determine whether the difference between one version (say, 128 kbit/s AAC) and another (say, 192 kbit/s AAC) is statistically significant. I would posit that we will eventually find a point where people are likely to pick one or the other at random, because they are essentially the same (640 kbit/s AAC v. 1024 kbit/s AAC, for instance). We would therefore take the lowest quality setting that listeners cannot reliably tell apart from the settings above it, judged at, say, a 95% confidence level, and use that to encode our music.
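One simple way to run that analysis, assuming each trial is an independent choice between two versions, is an exact binomial test against the hypothesis that listeners are picking at random. This is a sketch of one reasonable test rather than the only valid design, and the counts at the bottom are invented for illustration.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided p-value: total probability of every outcome
    that is no more likely than the observed one under chance p."""
    def pmf(k):
        return comb(trials, k) * p**k * (1 - p) ** (trials - k)

    observed = pmf(successes)
    tail = sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, tail)


def distinguishable(successes, trials, alpha=0.05):
    """True if the preference differs from a coin flip at the 1 - alpha level."""
    return binomial_two_sided_p(successes, trials) < alpha


# Invented results: how often the higher bit rate was preferred in 200 trials.
for preferred in (104, 130):
    p_value = binomial_two_sided_p(preferred, 200)
    print(preferred, "of 200:", round(p_value, 4), distinguishable(preferred, 200))
```

A result that fails this test does not prove two versions sound identical; it only means the trials gave no good evidence of a difference, so more trials make the conclusion firmer.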

This methodology isn’t perfect, but it would be dramatically more rigorous than the expert-driven approach described above.

Author: Milan

In the spring of 2005, I graduated from the University of British Columbia with a degree in International Relations and a general focus in the area of environmental politics. In the fall of 2005, I began reading for an M.Phil in IR at Wadham College, Oxford. Outside school, I am very interested in photography, writing, and the outdoors. I am writing this blog to keep in touch with friends and family around the world, provide a more personal view of graduate student life in Oxford, and pass on some lessons I've learned here.

15 thoughts on “On audio compression”

  1. Exhibit A

    “For only about a 5% penalty in file size I use variable bit rate encoding for better quality. This lets the coder use more bits when it has to. I set this under PREFERENCES > ADVANCED > IMPORTING > Import Using & Setting > Custom, and then check “Use Variable Bit Rate Encoding (VBR).” Apple has this pretty well hidden. I leave the rest at default of 128kbs, auto and auto.

    VBR sounds better for the same file size. As far as I can see the only reason Apple doesn’t default to this is for compatibility with old iPods. Having a new iPod Nano, no problem!

    I couldn’t hear any defects. 128kbs VBR AAC sounds the same as my CDs. Any defects I heard were accurate reproductions of flaws in the original CDs…”

    “Audiophiles are people more in love with equipment and algorithms than music. They prefer listening for artifacts over enjoying music. They, like most people, hear things based on what they expect to hear. Tell them something was data-reduced and it really will sound worse to them, even if you play them an uncompressed selection! Most people don’t worry themselves sick about the oxygen content of their power cables or green magic marker on the edges of their CDs. Audiophiles oddly are deaf to the clicks, pops, scratches, horrendous inner groove distortion and speed and pitch changes caused by eccentric pressings of the vinyl records they still hoard.”

  2. Exhibit B

    “The 128 kbit/sec VBR (file size 760KB for 42 seconds) is a bit better, though still very fuzzy. Although the artifacts are still very very big, the transients of the harpsichord don’t interfere with the voice that heavily anymore. For less demanding music, this encoding is on the edge of usable…”

    “For me it resulted in selecting AAC 224kbit/sec as my default format, as this is the encoding which provides good sound quality, and still results in acceptable file sizes (at least for my iPod 30GB). Although the AAC 224kbit/sec still shows some minor artifacts, you have to compare it with the original and critical source material in order to recognize those artifacts, though for the Tori Amos track I can recognize them immediately. The AAC 320 kbit/sec encoding results in much larger files, for just a little bit more quality, which was my reason not to select it. For less critical material, I use the 160kbit/sec AAC encoding, just to gain some space on my iPod. For critical material (piano, lots of cymbals, music close to hard rock with continuously distorted guitars) this results in obvious flanging or tremolo.”

  3. “Note how people who review compression schemes tend to have unusually ugly websites.”

    Snobbery comes in many forms. Those who do not agonize over CSS are obviously inferior.

  4. Thank God I still keep all my music as MIDI files!

    Oh, the glory of MIDI audio!

  5. Seriously, according to this page Variable Bit Rate (VBR) AAC at 128 kbit/s is indistinguishable from the original. Ken Rockwell agrees and I trust him.

    Debate closed.

  6. Mark,

    If you register for an account on this blog, you can edit your own comments. For the moment, if you post a corrected version, I will backdate it and delete the incorrect one.

    Anonymous,

    Variable Bit Rate (VBR) AAC at 128 kbit/s is also the solution I have decided upon.

  7. To our subjects’ ears, there wasn’t a tremendous distinction between the tracks encoded at 128Kb/s and those encoded at 256Kb/s. None of them were absolutely sure about their choices with either set of earphones, even after an average of five back-to-back A/B listening tests. That tells us the value in the Apple’s and EMI’s more expensive tracks lies solely in the fact that they’re free of DRM restrictions.

    Source

  8. 1/3 of People Can’t Tell 48Kbps Audio From 160Kbps

    “Results of a blind listening test show that a third of people can’t tell the difference between music encoded at 48Kbps and the same music encoded at 160Kbps. The test was conducted by CNet to find out whether streaming music service Spotify sounded better than new rival Sky Songs. Spotify uses 160Kbps OGG compression for its free service, whereas Sky Songs uses 48Kbps AAC+ compression. Over a third of participants thought the lower bit rate sounded better.”

  9. That is rather surprising.

    I thought perhaps the comparison was done based on the sound hardware people had at home, but it was apparently with “a pair of £500 reference-grade headphones and a high-end audio processor.”

    The sample size was quite small, however. Just 16 people.

  10. The perceived quality of a recording depends on what the listener’s ears have been trained on (as well as the quality of the audio equipment and the ambient noise). Jonathan Berger, a professor of music at Stanford University in California, gets his incoming students every year to listen to a variety of recordings compressed with different algorithms. Each year, their preference for music in MP3 format increases.

    Clearly, the iPod generation is becoming attuned to the “sizzle” caused by a muffled bass and clipped high notes that MP3’s lossy codec imparts. Their preference is similar to the way audiophiles from a previous generation swore that vinyl LPs produced a warmer, richer sound than CDs. To their ears, they did.

    In reality, they had simply become so attuned to the clicks and crackles, as well as the limited dynamic range, of the older format that the familiarity made them feel comfortable. A future generation—trained to hear a recording’s subtleties burned by a lossless codec onto an audio Blu-ray Disc—will be puzzled by their parents’ preoccupation with sizzling songs rather than an authentic replica of the music the performer actually created.

  11. Articles last month revealed that musician Neil Young and Apple’s Steve Jobs discussed offering digital music downloads of ‘uncompromised studio quality’. Much of the press and user commentary was particularly enthusiastic about the prospect of uncompressed 24 bit 192kHz downloads. 24/192 featured prominently in my own conversations with Mr. Young’s group several months ago.

    Unfortunately, there is no point to distributing music in 24-bit/192kHz format. Its playback fidelity is slightly inferior to 16/44.1 or 16/48, and it takes up 6 times the space.

    There are a few real problems with the audio quality and ‘experience’ of digitally distributed music today. 24/192 solves none of them. While everyone fixates on 24/192 as a magic bullet, we’re not going to see any actual improvement.
