What is perfect sound and how can I get it?

In the audio industry, nearly every company claims to deliver the best sound, but rarely, if ever, is there a clear explanation of what that means. This raises the question: how does one define “perfect sound”?

This article will outline objective standards for what perfect sound means and, most importantly, how it will affect both your work as a music producer and your enjoyment as a listener.

While some of the concepts detailed below may seem quite complicated, it is important to note that there are software products like Neptune that do the heavy lifting for you.

Perfect Sound in a Perfect World

The simplest definition of a perfect audio system is a system that accurately reproduces the source audio. 

To make or enjoy music in the highest detail, your goal is to hear the audio exactly as it is.

Given the nature of stereo audio and modern studio design, our target sound system will be slightly different from the “perfect” system described above.

This brings us to the commonly accepted industry standards for professional audio systems. 

Industry Standards for Professional Audio

As it turns out, there are industry standards that have been outlined by the EBU (European Broadcasting Union). These standards are used by most acousticians as a guide for creating professional studios.

Most of the commercial audio you’ve heard, whether that be a popular song, sound effects in a video game, or a movie score, was either created, mixed, or mastered (or a combination of the three) in a studio with these standards.

So, in order to hear the audio as the artist, producer, or engineer heard it, we need to recreate the sound that would be heard in that setting.

Now that we have a target sound that we are looking to achieve, we can come up with a solution. If you are interested in the details, I recommend you read through the standards laid out by the EBU, but I will go through some of the key defining features of the “ideal listening setup” here. 

You might be surprised to learn that studio quality audio is both affordable and achievable in any location and on any budget.

1. The most important factor is frequency response.

The single most important factor for a high-quality sound system is a flat frequency response. Essentially, this means that, through the entire audible range of 20Hz-20kHz, each frequency will be played at the same volume. 

This can be measured using a calibrated omnidirectional microphone that listens to the system’s output as it plays a sine wave sweep and plots each frequency’s relative amplitude. Ideally, the graphical representation of this would be a perfectly flat line extending from the lowest audible frequency to the highest.
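To make the measurement signal concrete, here is a minimal sketch of an exponential (logarithmic) sine sweep covering 20Hz-20kHz. The function names, the 5 second duration, and the 48kHz sample rate are assumptions chosen for illustration, not part of any particular measurement tool.

```python
import math

def log_sweep(f_start=20.0, f_end=20000.0, duration=5.0, sample_rate=48000):
    """Return samples of an exponential sine sweep from f_start to f_end (Hz)."""
    k = math.log(f_end / f_start)          # natural log of the frequency ratio
    n = int(duration * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate
        # The phase is the integral of the instantaneous frequency over time.
        phase = 2.0 * math.pi * f_start * duration / k * (math.exp(t / duration * k) - 1.0)
        samples.append(math.sin(phase))
    return samples

def instantaneous_frequency(t, f_start=20.0, f_end=20000.0, duration=5.0):
    """Frequency (Hz) the sweep is playing at time t (seconds)."""
    return f_start * (f_end / f_start) ** (t / duration)

sweep = log_sweep(duration=0.1)            # short sweep, just for demonstration
print(len(sweep), instantaneous_frequency(2.5))
```

An exponential sweep spends the same amount of time in every octave, which suits a frequency axis that is normally plotted logarithmically.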

A flat response matters because, if you're mixing in an environment that is weak in the low frequencies, you will likely overcompensate by adding too much bass to the mix. When you play that mix in the car or in a bass boosted club, it will likely sound pretty bad.

Mixing in an environment with an uneven frequency response is like trying to paint a landscape while color blind.

A flat frequency response is nearly impossible to achieve with speakers. The reason for this is that a room’s reverberant qualities inevitably influence the sound that travels directly from the speakers to the microphone. Although it is possible to dampen or attenuate a room’s reverberance, doing so in a cost-effective way can be challenging to say the least.

The problem with speakers

According to the EBU specifications, the frequency response target is +/-3dB from flat when smoothed at 1/3 octave. It is important to note that a system can fit this criterion and still have a raw (unsmoothed) response that deviates dramatically from this range.

It is equally important to realize that when you are listening to your speakers, you are hearing the raw response and not the averaged response. 
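The gap between a smoothed and a raw response can be illustrated with a small sketch. It assumes a simplified form of 1/3-octave smoothing that averages dB values directly (real analyzers usually average power), and it uses made-up data that alternates between +5dB and -5dB rather than a real measurement.

```python
import math

def smooth_fractional_octave(freqs, levels_db, fraction=3):
    """Average each point with neighbours within +/- 1/(2*fraction) octave."""
    half_width = 2.0 ** (1.0 / (2 * fraction))   # +/- 1/6 octave for fraction=3
    smoothed = []
    for fc in freqs:
        lo, hi = fc / half_width, fc * half_width
        window = [lvl for f, lvl in zip(freqs, levels_db) if lo <= f <= hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

def max_deviation_db(levels_db):
    """Largest distance (dB) of any point from the mean level."""
    mean = sum(levels_db) / len(levels_db)
    return max(abs(lvl - mean) for lvl in levels_db)

# Hypothetical raw response: 10 octaves from 20Hz, 24 points per octave,
# swinging between +5dB and -5dB from one point to the next.
freqs = [20.0 * 2.0 ** (i / 24.0) for i in range(240)]
raw = [5.0 if i % 2 == 0 else -5.0 for i in range(240)]
smoothed = smooth_fractional_octave(freqs, raw)

print(max_deviation_db(raw))             # raw swing
print(max_deviation_db(smoothed) < 3.0)  # smoothed curve sits inside +/-3dB
```

The same underlying data fails a +/-3dB test when raw and passes it once smoothed, which is exactly the situation described above.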

For example, here is a snapshot of the frequency response of my current setup, both smoothed and unsmoothed.

As you can see, the smoothed graph falls well within the recommended range as defined by the EBU. The raw response, however, deviates on average +/-5dB from flat. Keep in mind that this system performs exceptionally well relative to the average response of a typical home setup. 

Therefore, a raw response of +/-5dB from flat is basically the best you can ever hope for with speakers. This response is only possible because of a combination of high-quality room treatment, large room volume, and most importantly, digital signal processing correction. For reference, here is the response before DSP.

To reiterate the main point, it is basically impossible for even the highest quality speaker systems to measure perfectly due to the effects of reverb. However, now that we have an understanding of the standards and limitations of the frequency response for speakers, we can focus on alternative ways to increase listening clarity through headphones. 

Headphones can be corrected more accurately

One major advantage of headphones is that they propagate sound directly to your ears without any external influence from the room. By switching to headphones, we have already eliminated essentially all of the problems associated with reverberation.

It is important to note that, for the highest quality measurements, in-ear headphones are preferable to over-ear headphones. Although over-ear headphones are far less affected by reverberation than speakers, in-ear headphones allow more accurate digital signal processing correction, assuming the build quality is satisfactory.

All that is left to do is make sure that what is arriving at the eardrums of the listener is correct. We will discuss the headphone measurement process in a future article, but long story short, we are looking to normalize the response so that the headphones deliver a flat frequency response.

Here is an example of what this looks like graphically for Apple EarPods 3.5mm. 

What you are seeing here is, quite literally, a perfect frequency response (albeit with some high-pass filtering below 45Hz). The difference between the headphone’s response above and the speaker’s response from earlier is that the raw response of the EarPods is significantly flatter. As we know from our discussion earlier, the discrepancy is due to the absence or presence of reverb.

At this point, it should be clear that achieving a perfect frequency response with speakers is not a realistic expectation, but it is possible with in-ear headphones. 

Now that we have addressed the frequency response problem, let’s look at the fundamental difference between stereo presentation on speakers and on headphones.

2. Crossfeed is an essential part of human hearing

The largest fundamental difference between speaker and headphone listening is the presence or absence of crossfeed. Crossfeed is a phenomenon that occurs naturally when listening on speakers, whereby sound from each speaker travels to both ears. The sound reaching the opposite ear arrives slightly delayed and contains only mid and low frequencies.

To reference an important point made earlier, all professionally produced audio is created in control rooms with speakers. Therefore, if we want to hear what the artist, producer, and engineer heard on speakers, we need to replicate the effect of crossfeed on headphones.

According to the EBU, the recommended listening setup places the speakers in an equilateral triangle with the listener at a distance of 6-12ft. In order to accommodate both near and far-field styles of listening, our crossfeed uses a distance of 6ft. With some basic geometry, we can work out the difference in the time it takes sound to reach each ear. The delayed signal also needs to be filtered in order to simulate the shadow effect of the head. The majority of research on this subject points towards an optimal cutoff frequency that lies anywhere from 800Hz to 1600Hz. Feedback from our beta testers suggested that 1600Hz provides the most realistic simulation.
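As a rough illustration of the geometry and filtering described above, the sketch below estimates the delay between the two ears for a 6ft equilateral triangle and applies a very basic crossfeed. It is not Neptune's implementation. The 0.175m ear spacing, the straight-line path (ignoring diffraction around the head), the one-pole low-pass filter, and the 0.5 crossfeed gain are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0    # m/s, assumed room temperature
FEET_TO_METRES = 0.3048

def interaural_delay_seconds(side_ft=6.0, ear_spacing_m=0.175):
    """Extra travel time to the far ear for one speaker of an equilateral setup."""
    side = side_ft * FEET_TO_METRES
    angle = math.radians(30.0)             # each speaker sits 30 degrees off centre
    spk_x, spk_y = -side * math.sin(angle), side * math.cos(angle)  # left speaker
    half = ear_spacing_m / 2.0
    near = math.hypot(spk_x + half, spk_y)  # distance to left ear at (-half, 0)
    far = math.hypot(spk_x - half, spk_y)   # distance to right ear at (+half, 0)
    return (far - near) / SPEED_OF_SOUND

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Simple first-order low-pass standing in for the head-shadow filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def crossfeed(left, right, sample_rate=48000, cutoff_hz=1600.0, gain=0.5):
    """Mix a delayed, low-passed copy of each channel into the opposite one."""
    delay = round(interaural_delay_seconds() * sample_rate)

    def shadowed(signal):
        filtered = one_pole_lowpass(signal, cutoff_hz, sample_rate)
        return ([0.0] * delay + filtered)[:len(signal)]

    left_to_right, right_to_left = shadowed(left), shadowed(right)
    out_left = [l + gain * x for l, x in zip(left, right_to_left)]
    out_right = [r + gain * x for r, x in zip(right, left_to_right)]
    return out_left, out_right

print(round(interaural_delay_seconds() * 1000.0, 3), "ms")
```

With these assumptions the delay comes out at roughly a quarter of a millisecond, or about 12 samples at 48kHz.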

The effects of a proper crossfeed implementation can be either subtle or pronounced depending on the input. For example, a song that contains a lot of stereo information (and especially hard-panned elements) will sound drastically different with crossfeed processing whereas a song that is mostly mono will not sound much different. The perceptible changes can be understood by the graphic below.

 

Other factors that might be affecting your audio system

1. Distortion Characteristics

The next most important measurable characteristic of a sound system is its distortion. Distortion is defined as any additional sound that is produced that is not inherent to the audio itself.

Distortion can be categorized as either linear or nonlinear and further classified as either harmonic or inharmonic. It has many causes, including speaker design, amplifier limitations, and digital-to-analog conversion. Acceptable levels of distortion, as defined by the EBU, are less than 3% for frequencies below 250Hz and less than 1% for frequencies from 250Hz to 16kHz. Ideally, for perfect sound, there would be zero distortion. However, because speakers and headphones have physical limitations and DSP cannot correct for distortion, we need to pick a system with the lowest relative distortion levels.

As it turns out, this eliminates all speakers from the conversation. Because the physical demands placed on speakers are far higher than those placed on headphones, speakers tend to have higher levels of distortion. By the same logic, the physical demands on over-ear headphones are higher than those on in-ear headphones, so many well-designed in-ear headphones perform better than their over-ear counterparts.

This exact phenomenon can be observed with Apple EarPods which, according to measurements taken by RTINGS.com, have lower distortion levels than almost all of the other headphones that they have measured. To reiterate, in the case of distortion, it is important to choose a headphone that is naturally capable of producing low levels. In order to achieve perfect sound, distortion must be kept low so that the audio input can be delivered to the listener in its original form.
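The distortion percentages discussed in this section are typically expressed as total harmonic distortion (THD): the combined level of the harmonics relative to the fundamental. The sketch below shows that arithmetic. The function names are hypothetical, and the limit check assumes the 3% figure applies below 250Hz and the 1% figure from 250Hz to 16kHz, which is our reading of the limits quoted above.

```python
import math

def thd_percent(fundamental_amplitude, harmonic_amplitudes):
    """THD as a percentage: RMS sum of the harmonics over the fundamental."""
    harmonic_sum = math.sqrt(sum(a * a for a in harmonic_amplitudes))
    return 100.0 * harmonic_sum / fundamental_amplitude

def within_limit(frequency_hz, thd):
    """Check a THD percentage against the assumed limits for its frequency."""
    if frequency_hz < 250.0:
        return thd < 3.0
    if frequency_hz <= 16000.0:
        return thd < 1.0
    return True   # no limit is quoted above 16kHz

# Hypothetical measurement: a 1kHz tone with small 2nd and 3rd harmonics.
thd = thd_percent(1.0, [0.006, 0.008])
print(round(thd, 3), within_limit(1000.0, thd))
```

Note that the same THD figure can pass at 100Hz and fail at 1kHz, because the tolerance is looser in the low frequencies.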

2. Phase and Time Response

Although the phase and time response of a system is the least important variable for perfect sound, it provides another example of why headphone listening is more accurate than speaker listening.

Most free field setups feature two loudspeakers each with a tweeter (producing high frequencies >2000Hz), one or two woofers (producing low-mid frequencies 40Hz - 2000Hz), and one or two subwoofers (producing sub frequencies <40Hz). 

In the example of a three-way system with two subwoofers, there exist eight different sound propagating locations, with the contents of each arriving at the listening position at different times. 

There is not much relevant research regarding how perceptible these time differences are, but for perfect sound reproduction, it is clear that we want all frequencies arriving at the eardrum at the same time. With headphones, most of this work is done for us because all frequencies are typically propagated by the same driver.
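To get a feel for the size of these time differences, the sketch below converts driver-to-listener distances into arrival times. The distances are hypothetical values picked for illustration, not measurements of any real system, and the speed of sound is assumed to be 343 m/s.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed room temperature

def arrival_times_ms(distances_m):
    """Travel time in milliseconds from each driver to the listening position."""
    return {name: 1000.0 * d / SPEED_OF_SOUND for name, d in distances_m.items()}

def arrival_spread_ms(distances_m):
    """Gap between the earliest and latest arriving driver, in milliseconds."""
    times = arrival_times_ms(distances_m).values()
    return max(times) - min(times)

# Hypothetical distances for one side of a speaker setup, in metres.
drivers = {"tweeter": 1.83, "woofer": 1.85, "subwoofer": 2.40}
print(arrival_times_ms(drivers))
print(arrival_spread_ms(drivers))
```

With a single driver per ear, as in most headphones, the spread for each ear is zero by construction.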

Benefits of a Perfect Audio System

1. Perfect Mix Translation

In order to maximize your mix’s clarity and legibility on all devices, you need to work in a neutral mixing environment.

The reasoning for this is that if your mix sounds good with a flat frequency response, it will sound relatively good on a system that is colored (bass boosted club, bright car, etc).

If you’ve ever made a mix that badly fails the car test, a poor critical listening environment is a likely culprit.

2. Accurate Listening

A flat frequency response brings with it improved detail retrieval, better dynamics, and, when combined with proper crossfeed, enhanced spatial perception. When working in an appropriate environment, it is easier to choose the correct reverb or the right snare.

3. Easy Collaboration

When working with a remote team, it is incredibly important that all contributors are hearing the same thing. When your team is using software such as Neptune, you can be confident not only that you are hearing a perfect reproduction of the source audio, but also that your team is hearing exactly what you are.

Conclusion

Hopefully, this article has shed some light on what variables are relevant in the pursuit of perfect sound reproduction. It is one thing to read about it, but another to experience it for yourself.

If you are interested in what perfect actually sounds like, try Neptune free for 14 days.