Dither noise in audio recording… in “today’s” reality.

There are lots of guys out there who swear by recording audio only at 44.1 kHz. They claim that recording at anything higher introduces quantization errors when the audio is down-converted for CD-Audio, and that this requires “dither noise” to compensate.

Wikipedia has some interesting articles that explain how quantizing your audio can add bad resonant frequencies to it, and they emphasize how dither noise can solve this problem. I’m not going to explain them again here, but what I WILL do is point out what these examples are NOT. It is important to note that the examples are *intentionally modified* from real-world scenarios. They are built to explain what quantization errors are, and how adding noise (dither noise) to a recording can reduce the harmonic errors introduced by (poor) sample rate conversion.

I put the word “poor” in parens in that last sentence, but really it should be BOLD. The examples in the Wikipedia articles don’t really emphasize that in order to create the situation where dither noise is needed, you first need to start with poor sample rate conversion and poor bit-depth.

1. For example, take ten numbers that represent points in time.

[0][1][2][3][4][5][6][7][8][9]

2. Then reduce the speed by 30% by stepping through those points 0.7 at a time instead of 1 (new sample n reads from position n × 0.7). This effectively
stretches the sound: here we’re turning 10 samples into 14.
[0][0.7][1.4][2.1][2.8][3.5][4.2][4.9][5.6][6.3][7][7.7][8.4][9.1]

3. Then let’s round them off.
[0][1][1][2][3][4][4][5][6][6][7][8][8][9]
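The three steps above can be sketched in Python. This is a minimal illustration, not a real resampler. `read_positions` and `round_half_up` are hypothetical helpers, and the sketch assumes the output length is the input length divided by the speed, rounded down. Fractions keep positions like 3.5 exact, so halves always round up the way they do in the hand-worked example.

```python
from fractions import Fraction
import math

def read_positions(n_in: int, speed: Fraction) -> list[Fraction]:
    """Step 2: the (fractional) spot in the original samples that each
    output sample reads from. speed=7/10 plays back 30% slower."""
    n_out = math.floor(n_in / speed)          # 10 / 0.7 -> 14 output samples
    return [i * speed for i in range(n_out)]

def round_half_up(x: Fraction) -> int:
    """Step 3: snap a position to the nearest existing sample (.5 rounds up)."""
    return math.floor(x + Fraction(1, 2))

positions = read_positions(10, Fraction(7, 10))
nearest = [round_half_up(p) for p in positions]

print([float(p) for p in positions])
print(nearest)  # -> [0, 1, 1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 8, 9]
```

The rounded list is the “poor” part. Some original samples are repeated and none are blended together.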

Step #3 is the “poor” part. Given a sine wave, for example, it will
simply duplicate some samples, adding jagged edges to what would otherwise
be a smooth waveform. If we did the opposite, shrinking the audio
instead of stretching it, the samples of the original sine wave would be
skipped unevenly. That adds unwanted “bumps”, which your ear would interpret as a sound at a particular frequency.

A good sample rate conversion process would take the amplitude
measurements from #2 above. Given, say, [0.7], it would take 30% of
original sample [0] and 70% of original sample [1], interpolating between
them. It would also combine and average samples together when
multiple samples were being condensed down into a single sample.
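Here is a rough sketch of the difference between step 3 (rounding) and the interpolation described above. It resamples a sine wave 30% slower and measures how far each result lands from the true wave. Several things here are assumptions for illustration:

- the function names;
- the 20-sample sine period;
- holding the last sample at the end.

Real converters use longer windowed-sinc filters rather than straight-line interpolation. The averaging needed when shrinking audio isn’t shown.

```python
import math

def nearest_resample(samples, positions):
    """Step 3 above: use whichever original sample is closest (.5 rounds up)."""
    last = len(samples) - 1
    return [samples[min(math.floor(p + 0.5), last)] for p in positions]

def linear_resample(samples, positions):
    """Blend the two neighbouring samples by distance: position 0.7 takes
    30% of sample [0] and 70% of sample [1]."""
    last = len(samples) - 1
    out = []
    for p in positions:
        i = min(math.floor(p), last)
        frac = p - i
        nxt = samples[min(i + 1, last)]   # hold the final sample at the end
        out.append((1 - frac) * samples[i] + frac * nxt)
    return out

PERIOD = 20  # samples per sine cycle (assumed for the demo)

def wave(t):
    """The ideal, continuous sine wave the samples were taken from."""
    return math.sin(2 * math.pi * t / PERIOD)

samples = [wave(n) for n in range(40)]
positions = [i * 0.7 for i in range(56)]  # 30% slower, stays within the 40 samples

def max_error(resampled):
    """Worst deviation from the ideal sine at each read position."""
    return max(abs(r - wave(p)) for r, p in zip(resampled, positions))

print(f"rounding:      max error {max_error(nearest_resample(samples, positions)):.4f}")
print(f"interpolation: max error {max_error(linear_resample(samples, positions)):.4f}")
```

Even this simple blend cuts the worst-case error by roughly an order of magnitude compared with rounding.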

Many DAWs these days use a 64-bit floating-point engine for all mixing.

I like to record at 96 kHz, just because I can. Why would I want to record at 96 kHz? Why would anyone? You can’t hear those frequencies, right?

It is correct that you cannot hear tones above roughly 18-20 kHz. The exact limit is different for everyone, but it is below the 22.05 kHz maximum frequency that a 44.1 kHz recording can represent. BUT… I had an interesting conversation with David Eden. He started Eden Electronics, a company regarded as making some of the best (and most expensive) bass amplifiers in the world.
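The 22 kHz ceiling comes from the Nyquist limit: a sampled recording can only represent frequencies up to half its sample rate. Here is a tiny sketch of that arithmetic for the rates mentioned in this post. The helper name is just for illustration.

```python
def nyquist_khz(sample_rate_khz: float) -> float:
    """Highest frequency a recording at this sample rate can represent."""
    return sample_rate_khz / 2

for rate in (44.1, 96, 192):
    print(f"{rate} kHz recording: tones up to {nyquist_khz(rate)} kHz")
```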

The human ear cannot hear the ultrasonic tones that could be recorded at 96 kHz, but it CAN sense the direction of a sound at roughly the equivalent of up to 250 kHz (depending on the person). If you haven’t heard a recording mixed at 192 kHz, you should give one a listen. It sounds considerably more real (the words “freakin’ awesome” come to mind), as if the instruments are not in the speakers but in the room with you. Additionally, acoustic instruments such as violins, brass, and acoustic guitars emit ultrasonic vibrations into the air. You can’t necessarily hear them up close, but they interact with the air at a distance.

I want the option of having the highest-quality recording possible; fuck the limits of the CD. It won’t be around forever.

So now, to get back to the original discussion… let’s assume that we’re stuck with poor sample rate conversion, so the other points in the Wikipedia articles still actually MEAN something. As if bending reality in the quantization process weren’t enough, the later examples bend reality YET AGAIN, purely for the sake of illustration.

One example illustrates what happens when you take this wave and reduce it from 16-bit to 6-bit accuracy. Listen to all that scary resonance! If I weren’t a computer programmer for a living, I might be scared by the 6-bit samples posted on Wikipedia. But a careful look at what a 6-bit sample really is shows how the example blows the problem out of proportion.
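For a feel of what that kind of example is doing, here is a rough Python sketch that quantizes a sine wave with and without dither. The following are all assumptions for illustration, not how Wikipedia or any particular DAW does it:

- the `quantize` helper;
- the 0.9 amplitude;
- the 50-sample period;
- the choice of triangular (TPDF) dither.

Without dither, the rounding error is a fixed function of the signal, so it repeats every cycle. That repetition is what shows up as harmonics. With dither, it no longer repeats. The last loop shows how the worst-case rounding error shrinks as bits are added.

```python
import math
import random

def quantize(x: float, bits: int, dither: bool = False, rng=random) -> float:
    """Round x (in -1.0..1.0) to the nearest level a signed `bits`-bit format
    can store. With dither=True, triangular (TPDF) noise spanning +/-1 step
    is added before rounding."""
    step = 2.0 / 2 ** bits                      # spacing between adjacent levels
    if dither:
        x += (rng.random() - rng.random()) * step
    level = round(x / step)
    level = max(-(2 ** (bits - 1)), min(level, 2 ** (bits - 1) - 1))  # clip
    return level * step

PERIOD = 50
one_cycle = [0.9 * math.sin(2 * math.pi * n / PERIOD) for n in range(PERIOD)]
signal = one_cycle * 10                         # ten identical cycles

rng = random.Random(0)
err_plain = [quantize(s, 6) - s for s in signal]
err_dither = [quantize(s, 6, dither=True, rng=rng) - s for s in signal]

def repeats_every_cycle(err):
    """True if the error pattern is identical in every cycle of the signal."""
    return all(err[n] == err[n + PERIOD] for n in range(len(err) - PERIOD))

print("6-bit error repeats every cycle, no dither:", repeats_every_cycle(err_plain))
print("6-bit error repeats every cycle, dithered: ", repeats_every_cycle(err_dither))

for bits in (6, 16, 24):
    worst = max(abs(quantize(s, bits) - s) for s in signal)
    print(f"{bits:>2}-bit worst-case rounding error: {worst:.2e} of full scale")
```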

6-bit audio uses 6 “bits” to represent each measurement of sound pressure at a given point in time. A bit can be on or off, a “1” or a “0” in binary. So if you wanted to hear something awful, you could encode your music at 1 bit per sample. You’d essentially only be able to create square waves… the sound pressure measurements could only be -100% or +100%, just TWO possible levels with nothing in between… it would sound awful, but maybe you’re making awful music anyway 😉

If you have 2 bits, there are 4 possible values: 00, 01, 10, and 11. In decimal these are 0, 1, 2, and 3, so there are 4 possible sound pressure levels. Each bit you add doubles the number of values that can be represented, so by the time you get up to 6 bits you have 64 possible values. Here’s a table, er, somethin’…

Bits / Precision (levels)
-------------------------
1 / 2
2 / 4
3 / 8
4 / 16
5 / 32
6 / 64
7 / 128
8 / 256
16 / 65,536
20 / 1,048,576
24 / 16,777,216
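The table can be generated rather than typed. This short sketch also adds a column the original table doesn’t have: the ratio between full scale and a single level, in decibels (20 × log10 of the level count). That works out to about 6 dB per bit, which is one rough way to see how each added bit widens the gap between the loudest and quietest sounds that can be stored. The helper names are hypothetical.

```python
import math

def levels(bits: int) -> int:
    """Number of distinct amplitude values: each added bit doubles it."""
    return 2 ** bits

def full_scale_to_step_db(bits: int) -> float:
    """Ratio of the full range to one step, in decibels (~6.02 dB per bit)."""
    return 20 * math.log10(levels(bits))

print("Bits / Precision (levels) / Full scale to one step")
for bits in (1, 2, 3, 4, 5, 6, 7, 8, 16, 20, 24):
    print(f"{bits} / {levels(bits):,} / {full_scale_to_step_db(bits):.1f} dB")
```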

First off, 6-bit audio, with only 64 possible amplitude values, represents extremely poor audio quality even by 1982 standards. Even 8-bit audio (which has 4x as many levels as 6-bit audio) has been absent from computers for quite a while now.

A CD is 16-bit, which can represent the audio with 65,536 levels. It is fair to say that this is pretty good for consumer purposes. It could be better in a professional environment, though, particularly when recording things like drums, which mix very loud and very quiet sounds together.

Anyone who’s anyone records at 24-bit. Keeping in mind that every bit doubles the number of levels, 24 bits gives you 16,777,216 of them. Now, it is quite safe to say that no analog equipment (microphones, preamps, XLR cables, speakers, etc.) can deliver audio to your computer with 16.7 million levels of accuracy.

The errors introduced by POOR quantization of audio would fall somewhere between 0 and 1 of those 16.7 million levels. So even assuming that your quantization is POOR, you’re not really at risk of polluting your final CD master with unwanted resonant frequencies. That would only happen if you stacked 1000 or more poorly quantized tracks on top of each other BEFORE you did the final master. And that assumes they are all resonating in lock-step with each other.

But the waves add together, so to stack 1000 sine waves in lock-step (in phase) you’d have to turn each of them down to 1/1000th of its original volume to avoid clipping. If the quantization errors were present in those original sine waves, then the resonant interference would be turned down along with them. So… in short… I think you’re pretty safe. Many digital audio recording packages mix the audio at 64-bit accuracy these days. A 64-bit number is big enough to count every millisecond since the dinosaur age, with room to spare!
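Here is a quick check of that counting arithmetic: how many years a 64-bit counter can span at a few tick sizes. It assumes a 365.25-day year. For comparison, the first dinosaurs appeared roughly 230 million years ago. A 64-bit floating-point mix engine actually stores a 53-bit significand, so the counter is a loose analogy for its precision.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def years_countable(bits: int, tick_seconds: float) -> float:
    """Years a `bits`-bit counter can cover if it ticks once per tick_seconds."""
    return 2 ** bits * tick_seconds / SECONDS_PER_YEAR

for name, tick in (("seconds", 1.0), ("milliseconds", 1e-3), ("nanoseconds", 1e-9)):
    print(f"64-bit count of {name}: about {years_countable(64, tick):,.0f} years")
```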

Some other links:

Here’s an article that contradicts my viewpoint. It should be noted that this article is 10 years old, and its footnote concedes that, 10 years later, 96 kHz to 44.1 kHz conversion can be done 100% transparently.
[1]http://www.stereophile.com/asweseeit/397awsi/

Also, here’s an article that talks about other reasons why you might want to go 96 kHz.
[2]http://www.digitalprosound.com/Htm/SoapBox/soap2_Apogee.htm
