Harman Researchers Make Important Headway in Understanding Headphone Response
What should a headphone sound like?
You wouldn't believe how complicated it is to answer that question. The problem, in a nutshell, is this: put a calibrated measurement microphone in front of an ideal speaker and measure the frequency response and you get a flat line. Stick your head where that calibrated mic was and look at the signal at your ear drum and what you get is far from a flat line. Boundary gain from your torso adds a bit of mid-range energy; the shape of your pinnae acts to amplify signal between roughly 2kHz and 5kHz; and ear canal resonances make mincemeat of the highs.
And it gets worse....
As you move your head from side to side, or up and down, relative to the speaker, the response at the ear drum changes significantly. It's a good thing because all these changes are what allow you to tell where a sound is coming from. But it makes measuring a pair of headphones and being able to tell if they're neutral a veritable nightmare. How, the hell, to do that!?
This family of response curves, describing what our ear drum hears as sound comes from various angles, is called as a class the Head Related Transfer Function (HRTF). Audio engineers have long established two curves as important for compensating for the HRTF: Free Field and Diffuse Field. The Free Field curve models what is heard at the ear drum with sound coming from directly in front of you and without any reflected energy, as if you were listening to a speaker in an anechoic chamber. The Diffuse Field curve models what you'd hear at the ear drum if you were in a very live room (all concrete walls) with speakers placed in numerous places around the room pumping it full of energy; in other words, what's heard at the ear drum with flat sound coming at you from all directions.
The problem, as I see it, is that neither of these compensations models what would be heard at the ear drum listening to two good speakers in an acoustically treated listening room, which, I would assume, is what headphones are supposed to be mimicking. But as much sense as that might make to a simple journalist like me, it doesn't move the state of the art forward. Someone with serious science chops, lots of money to throw at the problem, and all the gear to do the studies right has to step up to the plate and do the hard work needed to establish a really good target curve for headphone reproduction. One of those someones is Dr. Sean E. Olive, Director of Acoustic Research at Harman International.
Approaching the Problem
Having been aware for some time of Dr. Olive's work at Harman in developing target response curves for headphones, I contacted him about the possibility of a visit to his lab while in L.A. for T.H.E. Show Newport. I was delighted when he agreed, and simply tickled pink when he also invited me to join him and Dr. Floyd E. Toole for dinner during the show.
Dr. Toole led a team of researchers, which included Sean Olive, in a very important body of work dubbed the "Athena Project", which started in 1986 and was a partnered effort between Canada's National Research Council (NRC) and a non-profit consortium of five Canadian audio companies. The project's task was to develop a set of target specifications for speaker performance known to be pleasing to listeners. The basic premise of the research was that speakers that sound good will sell well, and Toole's job was to find out what "sound good" meant. He didn't simply settle on "flat is best"; he set out to determine what people actually preferred by developing a comprehensive regime of subjective testing to very carefully relate listener preferences to the technical performance of speakers.
Spoiler alert: Flat, neutral response was preferred.
But it's not nearly as simple as that. Off-axis response, evenness of response, room reflections, bass reinforcement, bass extension, and a wide variety of other factors strongly come into play. It would be unfair of me to over-simplify the work or insights that arose from a very serious effort to critically evaluate the subjective impressions that led them to their conclusions, so I'll point you to Dr. Toole's marvelous book, "Sound Reproduction: Loudspeakers and Rooms." In it, Dr. Toole takes the reader on a grand tour of the world of audio reproduction and his life's work investigating the subject. His written voice is clear, humorous, and eye-opening, making accessible to the avid enthusiast the nuts and bolts of what makes for a great listening experience. I've skimmed the book and now have started reading it in earnest, and I find it a great treat for my inner audio geek. Highly recommended.
The take-away point for this article is this: When research using very carefully controlled double-blind testing is done to evaluate audio performance, results may point toward the simple conclusion that neutral is preferred, but the data will also be rich with nuance, and that nuance may provide the information needed to build not just good audio transducers, but great ones.
The current work being led by Dr. Olive at Harman, with the help of researchers Todd Welti and Elisabeth McMullin, to determine a target headphone response is similar in its approach to the previous work done by Dr. Toole at the NRC. And though still in its early stages, some interesting results have begun to emerge, which are documented in three AES papers that Dr. Olive provided me for this article. (I am an AES member, so it's all kosher.) I'll briefly summarize these papers below.
The Relationship between Perception and Measurement of Headphone Sound Quality - Presented at the 133rd AES Convention Oct 2012, San Francisco
For this paper double-blind tests were performed on six popular circumaural headphones (AKG K701 and K550; Audeze LCD-2 v2; Beats by Dre Studio; Bose Quiet Comfort 15; and V-Moda Crossfade LP). Test subjects were unable to see the headphones being placed on their head by the test administrator, and small handles were installed on the headphones so the subject could adjust the headphones on their head without any tactile cues as to which cans they were wearing. This was a two-part test where subjects were asked to provide responses on various sound quality attributes (spectral, spatial, dynamics, and distortions) in the first part, and perceived spectral balance and comfort in the second.
The paper explains in excruciating detail the efforts to develop an experiment that was free of bias, which include: level matching techniques; the selection of music used; the use of short clips; the development of listener feedback metrics and comments; and selection of qualified listeners. It goes on to describe the rigorous application of statistical analysis to the resulting raw data, which not only held the grading information on headphone performance, but also made it possible to determine the reliability of the various subjects' impressions.
Sitting in my catbird seat, with long experience of all the headphones included in the test, I have to tell you that it's quite amusing to see all this science tease out observations that resonate strongly with my experience. If I had a billion bucks I'd love to perform such tests for all InnerFidelity headphone reviews...but I don't, so I guess we'll just have to be satisfied with trusting reviewers for the task.
The paper goes on to compare the subjective listening test results with the measured performance of the headphones using standardized test equipment. It also takes the measurements one step further: some subjects had small microphones inserted into their ear canals so the headphones could be measured on real heads.
Six conclusions were drawn from the experiment:
- Subjects perceived significant differences between the headphones in the areas of comfort, preferred attributes of sound quality, and spectral balance.
- The most preferred headphones had the least deviation from flat and neutral in spectral balance rating.
- Sound quality attribute preferences and overall spectral balance ratings occurred in separate parts of this two-part test, but there was a strong correlation between perceptions of poor spectral balance and comments associated with low preference rating.
- The more preferred headphones had measurements showing flatter, smoother amplitude response, and better extended bass. The measured amplitude response was generally a good predictor of perceived spectral balance and preference rating.
- The most preferred headphone did not have the 12dB peak at 3kHz which exists in the diffuse field standard curve. Two headphones which did have this peak were judged to be too thin and bright.
- In-ear measurements showed significant variations in amplitude response depending upon the listener and model of headphone used. Some headphones varied more than others. How these headphone/listener variations affect the accuracy, reliability, and validity of subjective testing will be explored in future work.
Listener Preference for Different Headphone Target Response Curves - Presented at the 134th AES Convention May 2013, Rome, Italy
You can't build a bridge over a river from only one side. Sure, science can be done by simply amassing data and then observing which way the data points, but sometimes it's more efficient to take a stab at what you think is right (a hypothesis) and then test it. Like I said before, it just seems obvious to me that headphones should sound like good speakers in a good room, but until you test that hypothesis you can't know if that's true or not.
In this paper, the Harman team set out to test that idea. A new headphone compensation curve was developed using calibrated speakers in a room, which were then "listened to" by a dummy head with a calibrated "ear". A GRAS 43AG coupler was mounted in a head in the listening position, and the head was then rotated to three different positions (-30 deg, 0 deg, +30 deg), with measurements taken at each to spatially average the response. This gave the researchers a starting target curve for what the ear hears in a room with speakers.
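To make the idea of spatially averaging measurements from several head positions concrete, here is a minimal sketch. It assumes the curves are averaged on a power basis and then converted back to dB; the summary above doesn't say how the Harman team actually did the averaging, and the function name and dB values are made up for illustration.

```python
import math

def spatial_average_db(curves_db):
    """Power-average several magnitude responses given in dB.

    curves_db: list of equal-length lists, one list per head angle.
    Returns one list of dB values, one per frequency bin.
    NOTE: power averaging is an assumption, not the paper's stated method.
    """
    n_bins = len(curves_db[0])
    if any(len(curve) != n_bins for curve in curves_db):
        raise ValueError("all curves must share the same frequency bins")
    averaged = []
    for i in range(n_bins):
        # Convert dB to power, average across angles, convert back to dB.
        mean_power = sum(10 ** (curve[i] / 10) for curve in curves_db) / len(curves_db)
        averaged.append(10 * math.log10(mean_power))
    return averaged

# Hypothetical responses at three head angles (dB at 1 kHz, 3 kHz, 8 kHz).
left = [0.0, 10.0, 2.0]
center = [0.0, 12.0, 4.0]
right = [0.0, 10.0, 2.0]
target = spatial_average_db([left, center, right])
```

Averaging in the power domain keeps a single sharp dip at one angle from dominating the result; averaging the dB values directly would be another reasonable choice and gives slightly different numbers.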
Then two different headphones (Audeze LCD-2 v2, and Sennheiser HD 518) were measured on a coupler and inverse filters calculated to cause the headphones to measure perfectly flat on the coupler. Having a filter that could flatten the headphones now allowed the researchers the ability to add various target EQs to the headphones to determine which was most preferred in subjective listening tests.
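The flatten-then-add-a-target step is easy to picture if you think of every response as a list of dB values per frequency band, where cascading filters is just addition. The sketch below is a simplified illustration under that assumption, not the researchers' actual filter design; the function names and all numbers are hypothetical.

```python
def inverse_filter_db(measured_db):
    """Gain per band that cancels the measured response, leaving 0 dB (flat)."""
    return [-m for m in measured_db]

def apply_eq_db(response_db, eq_db):
    """In dB, passing a response through a filter is addition per band."""
    return [r + e for r, e in zip(response_db, eq_db)]

# Hypothetical per-band values in dB (not taken from the paper).
measured = [3.0, 0.0, -2.0, 6.0]   # headphone as measured on the coupler
target = [4.0, 0.0, 1.0, 8.0]      # one candidate target curve

flatten = inverse_filter_db(measured)
flat = apply_eq_db(measured, flatten)      # headphone now measures flat
total_eq = apply_eq_db(flatten, target)    # inverse filter plus target EQ
result = apply_eq_db(measured, total_eq)   # headphone now follows the target
```

Because the headphone is flattened first, the same target EQ can be dropped onto either headphone and listeners hear the target rather than the headphone's own coloration.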
The target EQs tested were:
- Diffuse-field based on Hammershoi & Moller.
- Diffuse-field based on Moller
- Modified Diffuse-field based on Lorho
- Free-field based on Hammershoi & Moller
- In-room speaker response based on measurements in Harman Reference Listening Room.
- Modified in-room response with less bass and treble.
- Unmodified sound of the headphones.
I'll quote the paper here:
"The underlying premise or hypothesis was quite simple: since stereo recordings are optimized to sound good through loudspeakers in a room, they will only sound good through headphones that simulate the response of a loudspeaker system in a room. This study provides empirical evidence that this premise is well grounded"
Damn! Dontcha just love it when common sense prevails. I'm going to be watching the continuing work on this subject with great interest and hope one day in the future I'll have a new curve to compensate my headphone measurements. Thank you Olive et al!
A Virtual Headphone Listening Test Methodology - not yet published, will be presented at AES 51st International Conference, August 2013, Helsinki, Finland
In this last paper I'll report on here, the Harman team take the Sennheiser HD 518, measure it, create an inverse filter, and then apply it to the headphone to make it flat. They then take measurements of the six headphones in the first paper, and create filters that represent their response. These new filters are then applied to the flattened HD 518 so as to mimic the sound of the other headphones. Basically, the researchers are making the HD 518 sound like the other headphones in the test. The paper includes a series of graphs comparing the amplitude response of the real and virtual headphones...and the similarity is remarkable in all but the top octave.
Above 10kHz, there are significant difficulties getting reliable and stable amplitude response measurements, as even very small positional changes on the measurement coupler cause wild swings in measured response. The researchers chose not to apply any filters above 10kHz.
Now, with six virtual headphones available on one headphone, the researchers can repeat the double-blind testing of sound quality without the difficulty of having a facilitator place the various headphones on the subject's head, and without the additional factors of fit, weight, and comfort differences between cans skewing evaluations of sound quality. The paper then goes on to describe the test design, methods, and data reduction in great detail.
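Here is a rough sketch of the virtual headphone idea, again treating responses as dB values per band. The EQ is simply the difference between the headphone being imitated and the HD 518, with no correction above 10kHz as described above. The function name, frequency points, and dB values are all hypothetical, and a real implementation would design actual filters rather than subtract a few numbers.

```python
CUTOFF_HZ = 10_000  # per the article: no filtering applied above 10 kHz

def virtual_headphone_eq(freqs_hz, replicator_db, imitated_db, cutoff_hz=CUTOFF_HZ):
    """EQ (dB per band) that makes the replicator headphone mimic another one."""
    eq = []
    for freq, rep, imitated in zip(freqs_hz, replicator_db, imitated_db):
        if freq > cutoff_hz:
            eq.append(0.0)  # leave the top octave untouched
        else:
            eq.append(imitated - rep)  # flatten the replicator, add the other response
    return eq

# Hypothetical measurements in dB at a handful of frequencies.
freqs = [100, 1000, 3000, 10000, 16000]
hd518 = [2.0, 0.0, 5.0, -3.0, -8.0]   # replicator headphone
other = [6.0, 0.0, 9.0, -1.0, 4.0]    # headphone being imitated

eq = virtual_headphone_eq(freqs, hd518, other)
simulated = [r + e for r, e in zip(hd518, eq)]
```

Up to the cutoff the simulated response matches the imitated headphone; above it, the replicator's own response shows through, which is consistent with the real and virtual curves diverging in the top octave.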
A number of results were observed:
- Overall, the preference ratings of virtual headphones were lower than with the real headphones, but the distribution of preference was wider with the virtual cans than the real ones. It's a little complicated, so you'd have to read the paper to understand the justifications, but it is thought that subjects having the ability to select from all six headphones at the touch of a button allowed listeners to give a wider and more stable scaling of their headphone preferences.
- Broadly, there was good agreement between the headphone preference ratings in the standard and virtual test (correlation coefficient r=0.85).
- There was also very good agreement on the perceived spectral balance of four of the six headphones. Listeners were essentially asked to draw a frequency response curve while listening to both the real and virtual headphones. These perceived curves were compared and correlation coefficients were very good at: r=0.98; 0.91; 0.83; and 0.80.
- Two of the headphones had poor correlation coefficients (r=0.05 and 0.69). The AKG K550 is known to have fit problems, and the heavy weight and higher clamping force of the Audeze LCD-2 v2 may have made it identifiable in the standard test. These physical and comfort issues are thought to have skewed the listening test results.
- Statistical methods allowed the researchers to determine that the virtual test was significantly less prone to variations from influences other than sound quality, making it a more reliable indicator of sonic preference by listeners.
- The virtual test method required far fewer trials and delivered more discriminating results, making it a far more efficient method for testing headphone sound quality than double-blind testing with real headphones.
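For readers unfamiliar with the r values quoted in the results above, this is a minimal sketch of a Pearson correlation coefficient, assuming that is the statistic behind the reported r (the summary doesn't name it). The ratings below are invented for illustration and are not data from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical mean preference ratings for six headphones (not from the paper).
real_ratings = [72.0, 65.0, 58.0, 40.0, 35.0, 30.0]
virtual_ratings = [68.0, 66.0, 50.0, 42.0, 28.0, 25.0]
r = pearson_r(real_ratings, virtual_ratings)
```

A value near 1 means the two tests rank and space the headphones almost identically, while a value near 0, like the r=0.05 case above, means the two sets of judgments have essentially no linear relationship.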
The paper concludes that while the virtual method of testing headphones does not address some important factors in headphone satisfaction (fit, comfort, bass leakage), it does provide a superior method for subjective testing of audio performance. It's not only more discriminating than testing the actual headphones, but it provides a method that is much more efficient (cheaper, faster) than the standard testing methods.
I'll quote the authors' concluding remarks:
"The authors hope this paper and the virtual method will encourage more headphone manufacturers and researchers to do proper headphone listening tests. Such tests can provide new information and guidance needed to improve the measurement, design, and sound quality of future headphones."
A final note on this paper: Some of the more technical InnerFidelity readers will be aware of the fact that the process of virtualizing headphone performance using DSP techniques does not easily take into account non-linear distortions of the headphones themselves. The authors acknowledge this issue, and believe it may be significant when testing headphones at higher listening levels, but also believe the effects of non-linear distortion are quite small and likely have little effect on test results. Fodder for future research no doubt.
My Closing Thoughts
I believe the work being done by Dr. Olive and his team, brought out in these three papers, represents a significant step forward in the development of better headphones for us all. To summarize:
- The first paper shows that there is a direct relationship between measured performance of headphones and the listening experience. Listeners strongly tend to prefer neutral, even amplitude response, and good bass extension.
- The second paper shows that the long-standing diffuse-field target curve is not preferred by listeners. People prefer headphones that mimic the sound of good speakers in a good room.
- The final paper describes a method for virtual double-blind headphone testing that is efficient and discriminating, allowing manufacturers and researchers a less onerous method for evaluating headphone performance.
I'll add that if you're curious about this type of testing, you should come to T.H.E. Show Newport next year. I haven't announced it officially yet, but InnerFidelity and Head-Fi members will be producing a display area called, "Candemonium: A Headphone Sideshow", which will appear in the "Headphonium" area of next year's show. You can think of it as a cross between a carnival sideshow and a science exhibit having a bunch of hands-on experiences with headphones in various ways. (Candemonium is being developed with the intention of drawing audiophiles not yet fully aware of the headphone hobby, and new blood from the youthful public who have an interest in headphones, further into the activity.) Dr. Olive has agreed to sponsor some listening stations in the Candemonium area that will allow show-goers the opportunity to participate in informal listening tests very similar to those described in the papers above. (We'll put some Beats in the mix and see how they do.)
Okie dokie, 'nuf of the technical stuff. Turn the page and I'll show you some pictures of my visit to Harman's research labs....
Sean Olive's blog is very informative.
Link to the two papers available in the AES library here and here. (Papers are available to non-members for $20 and for members $5.)
Head-Fi thread discussing first paper, Sean Olive (as Tonmeister2008) and Todd Welti (as MeatusMaximus) contribute to the lively and informative dialog.
Head-Fi thread started by Todd Welti discussing whether or not pinnae should be used on couplers when measuring headphones.