Audio Precision Explains Headphone Measurements and Head Related Transfer Function

In the image above clipped from the video below, you'll see a little snippet of text from the IEC60268-7 specification for headphone measurements.

"No known objective method produces a flat frequency response characteristic from an earphone which is subjectively judged to produce wide band uncolored reproduction."

Before you watch the video, let me provide a few pointers to help you follow along. What we're talking about here is the fact that sound that measures flat on a standard measurement microphone will no longer be flat when you put your head in that sound field and measure the response at your eardrum. Your outer ear—specifically the concha bowl around the ear canal—provides gain between roughly 2kHz and 6kHz. There are other effects that cause sound at the ear drum to deviate from flat as well—ear canal resonances, and head and torso boundary gain for example.

Free-Field Response
Free-Field response is an early acoustic standard showing the relationship between the sound in space and that at your ear drum, with sound coming from directly in front of you (in an anechoic chamber). The first part of the video (graph at 0:10) shows roughly what this response looks like. It says Head Related Transfer Function (HRTF), but you'll notice the measurement is in an anechoic chamber and the angle of incidence is 0 degrees, which makes it the Free-Field response.

Now the problem with the Free-Field measurement is that it only shows the ear drum response for sounds coming from directly in front of you. When a sound source moves away from center, the shape of the pinna (the outside of your ear) reacts differently to the sound, and the shape of the response will change. The plot at 6:26 shows measurements for angles of incidence of 0, 90, 180, and 270 degrees, and shows how the frequency response at the ear drum changes with incidence angle.

Technically, the HRTF is the entire set of ear drum response curves for all angles of incidence. Free-Field response is generally understood as only the ear drum response for sound coming from directly in front of you, in an anechoic chamber.

Diffuse Field Response
This film does a really terrific job of showing you that if you add up all the response curves of sound coming at you from every direction, you end up with a new type of response curve called the Diffuse Field Response. Generally, this measurement is made in a very reverberant room that gets filled with sound causing it to approach the head from every direction at once. For a long time after this response was identified it was touted as the best target response curve for headphones; some models even had "Diffuse Field Equalized" printed on the headphone.

The problem is, it just doesn't work. Neither sound coming from directly in front of you in an anechoic chamber (Free-Field), nor sound coming at you from all directions in a reverberant chamber (Diffuse Field), represents a good approximation of how you listen to music. This leads to the problem highlighted in the IEC spec, and the conundrum of subjective and objective measurement of headphone response yielding different results. I'll go on, but this would be a good time to watch the flick.

(Click here if you have trouble seeing the video.)

Now, Audio Precision is a class outfit; their gear is spectacular, and their technical publications outstanding. In this video they've done an excellent job of showing us exactly where headphone standards are today. Their conclusion? As the instructor rightly summarizes at 9:28 regarding the IEC headphone measurement specification:

"So basically what they're saying here is: Nobody can agree on what is good."

I'm motivated to post this video here today because I thought it was a nice, clear explanation of Free-Field and Diffuse Field equalization curves for readers just getting into this subject. But I'm also motivated to show InnerFidelity readers what a difficult position headphone engineers find themselves in when trying to determine how a headphone should measure, and how important it is that a new headphone target response curve be developed.

Regular readers will be well aware of the work Sean Olive, Todd Welti, and Elisabeth McMullin have been doing at Harman's research facility on just such a target response curve. Their basic premise is that headphones should sound like good speakers in a good listening room.

It just makes sense, doesn't it?

If you'd like to catch up on my previous postings on the subject, check out the articles here, here, and here.

Rillion's picture

One might naively think that headphones should be voiced to sound like two speakers in the traditional equilateral triangle arrangement since that is what most recordings are mixed on. However, without crossfeed effects, a headphone will not perfectly model the reduced treble of mono sounds resulting from two-speaker comb-filtering. On the other hand, headphones should be able to better reproduce a three-speaker/channel arrangement that eliminates the comb filtering with a "center" channel for vocals and solo instruments. The problem with this is that 3-channel (or more) music recordings are not that common.

Tyll Hertsens's picture
Your point is well taken. The differences between headphone and speaker reproduction are complex, but thoughtful adjustments to EQ that more effectively mimic speakers in a room are far better than FF and DF target EQs, I think.
sszorin's picture

To 'imitate speakers in a room' is, I think, the wrong approach. Headphones should represent instruments on a stage; they should represent how the instruments would sound in space if one were standing at the central position of the vocal microphone.

AstralStorm's picture

In fact, using a simulated three channel model (AKA mid-side stereo) and then applying crossfeed as needed between the three channels gives a much better result than trying to model crossfeed between two speakers.
I bet this is because the brain is actually resynthesizing the center signal from stereo + diffuse reverberation, not just ITD and ILD, but I don't have a hard theory.
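The three-channel idea can be sketched in a few lines. This is a minimal, delay-free illustration (the function name and gain value are made up for the example; a real crossfeed would also low-pass filter and delay the crossfed component):

```python
import numpy as np

def three_channel_crossfeed(left, right, g=0.3):
    """Sketch: crossfeed applied between the channels of a mid-side
    (center + two sides) split rather than between the two speaker feeds.

    g is a hypothetical crossfeed amount (0 = none).
    """
    center = 0.5 * (left + right)   # "mid" channel (phantom center)
    side_l = 0.5 * (left - right)   # left side component
    side_r = -side_l                # right side = -(left side)
    # Crossfeeding the opposite-phase side channels shrinks the stereo
    # width while leaving the center untouched, so mono content sees
    # no comb filtering.
    out_l = center + (1 - g) * side_l + g * side_r
    out_r = center + (1 - g) * side_r + g * side_l
    return out_l, out_r
```

Because the two side channels are mirror images, the crossfeed collapses to a simple attenuation of the side component: mono material passes through untouched, while hard-panned material bleeds a fraction g into the opposite ear.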

AstralStorm's picture

The result, given headphones equalized to a personally flat sound (which apparently is less variable than one would think), is that this sounds very close to speakers in a highly damped room. (Not an anechoic chamber; the distance is simple to gauge.)

Rillion's picture

Hi AstralStorm,

I've experimented with that some as well. If you include a delay in only your side signals, then the comb effect will be greatly reduced. This will not accurately represent a two-speaker setup, but it is arguably better for certain music. I notice it mainly adds clarity to male vocalists, at least with the music I listened to carefully.

I'm not sure the delays are translated properly with side channels generated by subtracting right from left: the phase is reversed on one channel and then mid-side gets mixed back together after the processing--it makes my brain hurt thinking about it. Better results might be obtained with a "center cut" algorithm based on phases, for example: . There is a LADSPA implementation of this which unfortunately has some slightly audible artifacts (echo or warbling). I have not had time to improve upon it myself.

AstralStorm's picture

The delays are actually translated correctly: a nearly in-phase signal is crossfed less.
Phase is also correct; in fact, phase cancellation effects work more like they do on actual speakers in this setup, since they depend on the amount of center summing.
You'll see exactly what happens once you write down the equations.
No modern codec even tries going down the full orthogonalization route, as it is unnecessary and inaccurate.

In addition, the center channel should be slightly reduced in volume. With perfect speakers in an anechoic room, the summing is 6 dB for 60°; real setups are generally closer to 3 dB.
The difference is a "cone" soundscape as opposed to a "triangle" soundscape. It depends on the pan law used for the recording too; most use 3 dB.
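The two summing figures are quick to check: a perfectly coherent (in-phase) sum of two equal signals doubles the amplitude, while a decorrelated sum only doubles the power. A two-line sketch:

```python
import math

# Perfectly coherent sum (ideal speakers, anechoic room): amplitudes add
coherent_gain_db = 20 * math.log10(2)    # about 6 dB
# Partially decorrelated sum (real rooms, reflections): powers add
incoherent_gain_db = 10 * math.log10(2)  # about 3 dB
```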

What is more important is that phase cancellation is then incomplete, just like in actual speakers.

Rillion's picture

Seems I was mistaken when I looked at this before. It is easy to verify that the approach of applying crossfeed only to the side channels of a simple M-S separation (with right S reversed phase from left S) does give the right results in the limits of either mono signals or hard-panned signals. I still need to verify the intermediate cases, but suspect it will work out fine. Thanks for bringing it up!
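Those two limits can be checked numerically. A delay-free sketch (the function name and the gain g are illustrative only):

```python
import numpy as np

def s_only_crossfeed(left, right, g=0.25):
    """Crossfeed applied only to the side (S) channel of an M-S split."""
    m = 0.5 * (left + right)
    s = 0.5 * (left - right)   # right S = -(left S)
    s *= 1 - 2 * g             # crossfeeding opposite-phase sides attenuates S
    return m + s, m - s

x = np.random.randn(8)
# Mono limit: S = 0, so the signal passes through untouched (no combing)
l, r = s_only_crossfeed(x, x)
assert np.allclose(l, x) and np.allclose(r, x)
# Hard-pan limit: a fraction g bleeds into the opposite channel
l, r = s_only_crossfeed(x, np.zeros_like(x), g=0.25)
assert np.allclose(l, 0.75 * x) and np.allclose(r, 0.25 * x)
```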

Rillion's picture

What I meant by "reversed phase" is (right S) = -(left S) .

Rillion's picture

The M-S approach seems fine with delays turned off, but I'm still not sure about what the delays do to the signals. Looks like I have a bit more work to do before I really understand this ...

Rillion's picture

Hi AstralStorm,

I've examined applying crossfeed to the S channel enough to be convinced that it generally does a better job, at least for the most simple crossfeed implementations: the comb that normally affects mono sound is replaced by a very attenuated anti-comb in hard-panned sound. This ends up working better with the way my favorite headphones are voiced (conventional crossfeed darkens their sound too much).

Anyway, can you refer me to any good resources to learn more about the "cone" vs. "triangle" soundscape? I already know a bit about the triangle soundscape in an anechoic room, since that is what I currently simulate (overlaid with a Bruel & Kjaer room response that gradually boosts the low frequencies).

Rillion's picture

Once the overall tonal balance is matched using shelf filters, crossfeeding just the S channel sounds very close to conventional crossfeed. You have to listen very carefully to perceive differences. My impression is that crossfeeding just the S channel improves the clarity of soloists and singers at the expense of making the band sound less dynamic in music that has a lot of hard-panned instruments. I guess it all comes back to how the music was mixed in the first place (as many people have said in this thread, there seem to be no standards for this). I don't want to sound too critical of the S-channel crossfeed approach, because these listening tests are very time-consuming, so my conclusions may be biased by a small sample size.

castleofargh's picture

If we agree to go with flat speakers as the reference sound, shouldn't we start by trying to get a similar signature while using some manner of crossfeed?
I don't really see the point of trying to get the "same sound" when we know from the start that there is nothing realistic in the way the sound is brought to us in headphones.
I don't know if we should keep expecting the sound engineers to think about headphones while mixing albums and pray for some standardization of this process, or push for crossfeed as a default feature on headphone amps. But I really believe that sounding real (no 100%-in-only-one-ear nonsense) is imperative if we ever want headphones to sound flat.
Nobody looks at a badly done 3D rendering and thinks, "oh, those colors are really well balanced."

Rillion's picture

Certainly aiming to reproduce speakers in a room is better than the status quo. I'm not sure what the best target is, but I do know that a headphone EQ that sounds right paired with a fairly realistic crossfeed will sound overly-bright without realistic crossfeed, at least with some music.

Seth195208's picture

..poses one of the most interesting (and seemingly paradoxical) scientific questions in all of audio. Makes my head spin when I try to make sense of all the variables.

Tyll Hertsens's picture
And that's why I continue to bring up the subject: it's quite confusing for many at first. Takes a while for it all to soak in. Still learning here.
Seth195208's picture

..inherently incorrect by bypassing each individual's own unique HRTF (outer ear, head, and upper body) processing mechanisms that regular in-room audio naturally and fully accounts for? Is a generic approximation really good enough? Really?

Rillion's picture

That is a question that deserves more study -- perhaps there are already studies published on it.

I don't think perfect reproduction of individual HRTFs is necessary for most people, since the human brain can adapt to different acoustic environments. However, there are certain things that are very hard to adapt to, such as hard-panned bass instruments. How closely you need to reproduce individual HRTFs for a comfortable listening experience is an interesting question, and probably varies from person to person and with the type of music.

Also, external speakers have the complicated issue of room treatment, which can have a huge impact on the frequency response. There are professionals who make a living on acoustic room treatment.

Tyll Hertsens's picture
I wouldn't say "incorrect", but I would say unnatural.

One problem is that once you start going beyond relatively simple, but probably not complete, compensations, you have to start using complex DSP algorithms. Many, myself included, will likely prefer simple approximations that don't degrade the front-end analog signal too much.

Mind you, I'm all for DSP solutions, and I think there will be plenty in the future, but I don't think they'll easily rise to the level of resolution and finesse that audiophiles strive for. Bottom line: I think there's room to develop along both lines.

Seth195208's picture

.. larger ears and larger canals (let alone the shapes of those things) compared to smaller ones must have a profound effect on timing, comb filtering, resonances, and frequency response, especially at and above the 2.5 to 3.5 kHz peak. There is also no reliable way to test for an individual's perceived "accuracy" at these frequencies other than asking the individual whether it sounds "subjectively" accurate or not. This is where the objective science of HRTF starts breaking down.

ultrabike's picture

Your article makes all of this very accessible to me.

"Regular readers will be well aware of the work Sean Olive, Todd Welti, and Elisabeth McMullin have been doing at Harman's research facility on just such a target response curve. Their basic premise is that headphones should sound like good speakers in a good listening room.

It just makes sense, doesn't it?"

IMO, to some extent. I honestly don't know what speakers were used to develop the standard free-field and diffuse-field target curves, but I bet they were not bad speakers. I feel that saying headphones should sound like good speakers, in a "good" room or an anechoic chamber, doesn't completely address the problem.

One could say the problem is the use of an anechoic chamber, and that a "good" room should solve the problem. But what is a "good" room? Furthermore, "good" speakers in a "good" listening room at 0 degrees (free-field like)? Maybe 30 degrees? A bunch of degrees mashed up together (diffuse-field like)?

What qualifies a speaker as "good" in a "good" listening room? A +/-3 dB flat frequency response at 0 degrees, 1 meter, in an anechoic chamber? How about the off-axis frequency roll-off? How much absorption does the "good" room provide? How about room modes? What is the "good" coloration that the "good" speakers + "good" room should have? Will this overlay such coloration onto ANY recording's coloration (with its random mastering effects) in a "good" way?

Speakers are usually evaluated in anechoic chambers. My best guess is that relative performance evaluation would be fairly difficult if every speaker were measured in a different "good" room. Yet the fact remains that most people do not listen in anechoic chambers. What then should be the optimal interaction between the room and the speaker? How much energy should be reflected? What should a reference speaker's frequency/angle polar response be?

Maybe something that reproduces "well" recorded music realistically, and close to a "well" set-up live performance, will do fine... we would probably have to listen to a good live performance and compare it to what we get from a good speaker in a good room.

Question: Are all of these compensations relative to sound impinging on the head at 180 degrees? That is, does [x]-field compensation take the frequency response difference between the x-degree response and the 180-degree response?

EDIT: Based on this LINKY, it is not referenced to 180 degrees and is applicable only if using an ear or head to measure. Remove the ear or head from the equation while maintaining proper acoustic impedance, and this comp deal may be less necessary.

Tyll Hertsens's picture
Great comments, but I think I'd have to write a whole article to answer them properly.

A speaker that measures flat in an anechoic chamber will measure slightly bass-heavy in a good listening room, due to the boundary effect of the walls and the wider dispersion of low frequencies, resulting in more sound power in the lows in the room. So good speakers in a good room naturally sound a little bass-heavy, and people are accustomed to that.

Now, a good room is a little tough to define, but there does exist an IEC spec for listening rooms. The problem is that most studio acoustic designers don't really follow the spec; they've spent years developing their own brand of "special sauce," and as a result recording studio speaker response varies to some degree. And as soon as a studio mixes an album on a non-calibrated system, you have no idea where flat really is.

So, I would say that the quick and dirty way to develop a headphone target response would be to put flat-measuring speakers in an IEC standard listening room, and then measure the response using an IEC standard measurement head...possibly taking a few measurements with the head a few degrees off axis to either side and then averaging the measurements.

Problem is, that's just a guess on my part, and is exactly why the folks at Harman are approaching it so methodically and essentially using listener satisfaction as the controlling factor. They do make comments in the papers that they think the headphone target response is likely strongly related to the sound of speakers in the room, and they did make some assumptions along those lines when developing a variety of headphone responses to subjectively test, but they let the subjective response data drive the results and not their assumptions.

For most people wanting to delve into this topic further, the best place to start is Floyd Toole's book "Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms". It doesn't talk about headphones, but it does go into great detail about how they untangled the problem of speakers being flat in an anechoic chamber and bass-heavy in a room, and how we humans, amazing as we are, somehow perceive the flat speaker in a room that adds some bass as being appropriate. (That's an almost ridiculous oversimplification of the material in the book, though.) He also goes into detail about what he calls "the circle of confusion": if no standards are followed for correct in-studio acoustic response, then everyone will continue to chase their tail trying to find neutral. The point is that the problems with headphones today are just an extension of the problems seen in the past with speakers...and those problems aren't even fully resolved yet in terms of implementing what's been learned.

As a side note: Sean Olive worked with Floyd Toole when the work was done at Canada's NRC decades ago. I see that as one of the reasons he's motivated to do the work he's doing: the headphone problem is so strongly related to the idea of finding some sort of meaningful "neutral" and then designing gear around those standards. Sean also happens to be the current President of the Audio Engineering Society, and it feels very reassuring to know the group will consider his work very seriously.

BTW, Floyd's book is also filled with great information about reflections and reverberance and when it's important, and all sorts of cool stuff. I HIGHLY recommend it to audio enthusiasts as a fascinating read.

ultrabike's picture
Thanks Tyll, really appreciate your comments.
jagwap's picture

I like this research into the natural sound of loudspeakers in a room for headphones. It is an interesting goal. However, much of the reason a flat-response speaker in a room is perceived to be flat is that the brain filters down the reflected sound against the early arrival directly from the speaker. So if a transducer has added bass to resemble the sound of an in-room speaker, this additional bass should be delayed a little, or it will just sound like a bass lift, like these "room feel" headphones tend to have.

Music happens in the time domain. The frequency domain is a useful mathematical transform to allow us to understand the energy in each (1/3rd) octave to give balance when we do not have "DC to light" bandwidth.
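The delayed-bass idea above can be sketched as a delayed, low-passed copy mixed back into the dry signal. This is only an illustration under assumed values (function name, cutoff, gain, and delay are all made up; a real implementation would use properly designed filters):

```python
import numpy as np

def delayed_bass_lift(x, fs=48000, cutoff=120.0, gain=0.5, delay_ms=10.0):
    """Sketch: add a delayed, low-passed copy of the signal so the bass
    lift arrives slightly after the direct sound, mimicking room energy."""
    # one-pole low-pass filter coefficient
    a = np.exp(-2 * np.pi * cutoff / fs)
    bass = np.empty_like(x)
    state = 0.0
    for i, sample in enumerate(x):
        state = (1 - a) * sample + a * state
        bass[i] = state
    # delay the filtered copy by delay_ms before mixing it back in
    d = int(fs * delay_ms / 1000)
    delayed = np.concatenate([np.zeros(d), bass[:-d]]) if d else bass
    return x + gain * delayed
```

The point of the delay is that the first d samples of any transient pass through unchanged; only the late, low-frequency energy is boosted, rather than the direct sound itself.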

DaveinSM's picture

Very cool, thanks for sharing this! This is the best headphone review site on the internet that I have been able to find.

Seth195208's picture

Whether it be good, bad, right, wrong, ugly, or beautiful, the buck stops with your own individual HRTF. If it is bypassed, in part or in whole, you won't be able to hear good, bad, right, wrong, ugly, or beautiful the way your own brain has always expected to hear it. And getting good measurements above 6 kHz is somewhat analogous to the Heisenberg uncertainty principle, where the act of measurement (by having tiny microphones placed inside the ear canal on real human subjects with real living skin, comparable to an elephant in the living room) will grossly alter and invalidate the measurements.

Tyll Hertsens's picture
On the other hand our brains are amazingly adaptive, and are powerfully capable of making adjustments to the ways we perceive. So there's hope.

And along your line of thinking, I've often tried to dream up a way of using the headphone driver to "ping" the ear and receive a reflection from which it could map the shape of the ear to adjust parameters. Maybe we'd need a bunch of lasers!? Mua ha ha ha ha! :)

Seth195208's picture

Our brains are so amazingly adaptive, that listening to high quality audio is totally moot, because our brain will fill in the gaps. Waaaah!

Tyll Hertsens's picture


Rillion's picture

Tyll mentioned using lasers to map the ear. Well, someone is already working on it: