AES Headphone Technology Conference Highlight Paper
Oh my, this paper sure throws a wrench into my mental monkey-works.
In it, Günther Theile opines about how to develop the standardized EQ and signal processing needed to deliver a tonally neutral headphone listening experience for audio professionals. There are some very interesting tidbits of information here...but the conclusion will be somewhat troubling for headphone audio purists.
The frequency response of high-quality studio monitor headphones should provide the same sound colour neutrality as demanded for loudspeaker monitoring in listening rooms according to ITU-Rec. BS 1116. This is obtained by a probe measured frequency-independent diffuse-field transfer function in accordance with ITU-Rec. BS 708. Spectrum level based calibration requires a reference sound field that provides sufficient diffusity as well as a flat frequency response in order to avoid coloration. Headphone manufacturers are interested in an attractive sound designed in accordance with actual preferences of consumers. Alternative target responses have been designed to simulate what a listener hears from a high-quality multichannel loudspeaker system in a reference listening room (in-room equalization). It is shown that this intention can only be realized with binaural room synthesis implementation that ensures accurate binaural rendering of the spatial cues, ideally including head tracking and personalization methods. A corresponding suitable standard based on a neutral listening room is desirable, not least in view of multichannel sound headphone reproduction. The virtual 3D listening room would avoid inadequate in-head perception of suboptimal two-channel stereo downmix material. Instead, it would ensure the intended perception of the recording in terms of space and colour. However, alternative in-room based equalization target curves should be documented with measures according to ITU-Rec. BS 708 Annex 2 that offer clear information for the assessment of tone colour, as well as comparability of headphone frequency responses.
As many readers will know, I've recently spent some time measuring my head in Harman's killer listening room in an effort to come up with a target response curve specific to my dummy head. Well...sadly...this paper points out that such an effort might be in vain. Let's work our way through some of the interesting points of Theile's paper.
Imaging - Here's one I didn't know: Turns out that while headphone imaging is inside the head, it has been found to be more precise than speakers. In the plot above, speakers at 3 meters (normal room listening), speakers at 1 meter (near field listening), and headphones were evaluated for the ability to precisely locate a sound within the normal stereo image. Nothing more to say on this...just thought it was cool.
In-Head Localization with Headphones - This seems to me one of those things that's so obvious we never bother to think about it, but the answer is quite interesting. Theile asserts that each ear is hearing the sound properly: If you listen to only one ear-piece, you correctly hear the sound as at the ear, zero distance away. Then, when you engage both ears, you essentially create a phantom image between the two sources just like with stereo speakers, except in this case, with the sources at both ears, the phantom image ends up in your head.
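The phantom-image idea can be sketched with a standard constant-power pan law (my own illustration, not code from the paper): equal levels at the two sources place the image at the midpoint between them, which with headphones means inside your head.

```python
import numpy as np

def constant_power_pan(signal, pan):
    """Pan a mono signal between two sources (speakers or ear-pieces).

    pan: -1.0 = fully left, 0.0 = centered phantom image, +1.0 = fully right.
    Uses the common constant-power (sine/cosine) pan law.
    """
    theta = (pan + 1.0) * np.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    left = np.cos(theta) * signal
    right = np.sin(theta) * signal
    return left, right

# A 440 Hz tone at 48 kHz, panned dead center: both ears get equal level,
# so the brain constructs a phantom image midway between the sources.
tone = np.sin(2 * np.pi * 440 * np.arange(4800) / 48000.0)
left, right = constant_power_pan(tone, 0.0)
```

With speakers the midpoint is out in the room; with the sources strapped to your ears, that same midpoint is the center of your head.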
Tone Color - This is where things get rough. For a real sound source outside your head you get two types of information: spatial (what angle and distance to the sound) and tonal (the sound of the sound).
The diagram above shows a model of what happens to the original sound as it becomes perceived by the listener. First, the sound impinges on the outer ear and becomes "spatially encoded": it gains some reflections and colorations that make up the psychoacoustic cues you need to determine the location of the source. Then the signals from both ears go into the brain for spatial decoding to determine the direction and distance to the source.
Then a weird thing happens, in the "Gestalt determining stage" the brain determines the location of the source, then removes all the tonal, level, and timing information. It then passes the now cleaned up signal on to become the perceived auditory event. Said another way, the brain knows how to remove all effects of the pinna and body reflections so that you perceive the sound as it actually is, rather than hearing what you're actually hearing. Amazing!
Sound-pressure Level Divergence (SLD) - Okay, if that was weird, this is weirder. It turns out that the mind perceives loudness differently depending on the nature of the sound field. Basically, for the same perceived loudness in free-field, diffuse field, and headphone listening, different levels will be measured in the ear canal.
I'm sure we can all relate to the idea that we tend to play headphones louder than speakers for the same perceived listening level. The plot above shows that difference in level is about 4dB, and also shows that the SLD varies with frequency.
What that means for me and my dummy head trying to make a target response curve from speaker measurements, is that what ever curve I get, I now need to adjust it by the SLD plot above (maybe?) to perceptually correct it for headphone listening.
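That adjustment step could be sketched like this. The SLD numbers below are placeholders I made up for illustration; they are not Theile's measured data, only stand-ins shaped like the plot (a few dB, varying with frequency).

```python
import numpy as np

# Hypothetical SLD values in dB at a few frequencies -- illustrative only,
# NOT the paper's data. Positive means the headphone level must be higher
# than the speaker level for the same perceived loudness.
sld_freqs_hz = np.array([100.0, 500.0, 1000.0, 4000.0, 10000.0])
sld_db = np.array([3.0, 4.0, 4.5, 4.0, 3.0])

def apply_sld(target_freqs_hz, target_db):
    """Shift a speaker-derived target curve by the interpolated SLD,
    to (perceptually) correct it for headphone listening."""
    correction_db = np.interp(target_freqs_hz, sld_freqs_hz, sld_db)
    return target_db + correction_db

# A flat speaker-derived target, for illustration: after correction it
# simply takes on the shape of the SLD curve.
flat_target_db = np.zeros(len(sld_freqs_hz))
corrected_db = apply_sld(sld_freqs_hz, flat_target_db)
```

Whether a simple additive dB shift like this is even the right operation is exactly the question the paper raises, so treat it as a back-of-the-envelope sketch.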
(Next time I'm in L.A. I'm going to buy Sean Olive a really nice dinner and pick his brain like an opal miner.)
Theile's Conclusions - Because of all these perceptual problems with headphones, it is exceedingly difficult to make headphones sound like speakers. Just making sure the signal in the ear canal is the same in both speaker and headphone listening will not ensure that the listener has the same experience in both cases. Theile sums it up this way:
Simple loudness based calibration does not replicate the original complex outer ear transfer functions given in the reference configuration.
As shown in the previous sections, this is evident also for spectrum level based calibration targeting to simulate what a listener hears from a high-quality multichannel loudspeaker system in a reference listening room (so-called "in-room equalization"). The result is in-head locatedness, which means that certain ear signal spectrum characteristics generated from individual loudspeakers in the room do not contribute to the spatial perception process but instead degrade colour neutrality.
His conclusion is that the only way to make very accurate studio monitor headphones is to first tune the headphone to the diffuse field response, as it delivers the least linear distortion in the transducer/ear interface and will be able to most accurately play an incoming signal for the ear. Then, using digital signal processing (DSP), create a fake room using binaural room impulse response information for a high acoustic quality listening room. Then create virtual speakers in that room to play the sound. Then add a head tracker and a bunch of HRTF data so that you can move your head normally and hear the cues change, because your brain won't be reliably fooled if you don't. Research shows that if you do all these things, only then can you properly perceive tonal neutrality on headphones.
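The virtual-speaker step boils down to convolution: each channel of the recording is convolved with a binaural room impulse response (BRIR) for its virtual speaker, once per ear, and the results are summed. Here's a minimal sketch with toy four-sample "BRIRs" I invented for illustration (a real one is thousands of samples long, and the head-tracking/personalization parts are omitted entirely):

```python
import numpy as np

def render_virtual_speakers(left_ch, right_ch, brirs):
    """Place two virtual speakers in a (synthesized) room for headphones.

    brirs maps (speaker, ear) -> impulse response; each ear hears the sum
    of both speakers filtered through the room and the listener's body.
    Returns the (left_ear, right_ear) headphone feed.
    """
    left_ear = (np.convolve(left_ch, brirs[("L", "left_ear")]) +
                np.convolve(right_ch, brirs[("R", "left_ear")]))
    right_ear = (np.convolve(left_ch, brirs[("L", "right_ear")]) +
                 np.convolve(right_ch, brirs[("R", "right_ear")]))
    return left_ear, right_ear

# Toy BRIRs: a direct path plus one quieter, later room reflection.
ir_same_side = np.array([1.0, 0.0, 0.0, 0.3])   # speaker to nearer ear
ir_cross = np.array([0.0, 0.6, 0.0, 0.2])       # to far ear: delayed, attenuated
brirs = {("L", "left_ear"): ir_same_side, ("L", "right_ear"): ir_cross,
         ("R", "right_ear"): ir_same_side, ("R", "left_ear"): ir_cross}

impulse = np.zeros(8); impulse[0] = 1.0          # a click in the left channel only
left_ear, right_ear = render_virtual_speakers(impulse, np.zeros(8), brirs)
```

Feeding a click to only the left virtual speaker, the left ear gets the direct response while the right ear gets it delayed and attenuated, which is exactly the interaural cue set that pushes the image back out of your head.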
My Conclusions - What this means to me is that manufacturers will be developing high-end headphones for audio pros that have diffuse field equalization so that they can be hooked up to electronics that will do all the DSP for virtualization of the audio. And, sadly, diffuse field equalization sounds pretty bad without all the DSP.
There will be a push from manufacturers of high-end headphone gear to make headphones that sound tonally neutral by adding DSP, because it's almost impossible to do it with a passive headphone due to perceptual problems.
Oy vey. I have a fair bit of skepticism that that much signal processing will ever deliver the type of resolution and nuance an audiophile desires...tonally neutral or not.