Are Objective Headphone Measurements Relevant to Audiophiles' Subjective Experience?

Tyll Hertsens

Ed. Note: I talk with Steve Guttenberg on the phone a few times a month, and I very much enjoy our chats. When Steve suggested we have a conversation about measurements and their relevance to audiophiles and write it up as an article, I was all-in immediately.

I hope you enjoy our chat as much as I did.

Steve Guttenberg: Let's get down to the nitty-gritty: do you really think measurements can be used to predict sound quality preferences? And are measurements in any way more reliable predictors than subjective listening tests? We all know that tube electronics don't measure as well as solid-state gear, but that doesn't seem to bother a lot of audiophiles (and musicians) who crave tube sound. So what do measurements bring to the party that subjective evaluations miss?

Tyll Hertsens: There's a tension between subjective and objective evaluation. It seems to me they both attempt to evaluate headphones, but operate in two different domains. Essentially, I think measurements can tell you how close something is to neutral, and to what extent it deviates from neutral. Subjective testing allows you to know how pleasing something is, and in what ways. I don’t think our brain and perceptual system is like an Audio Precision tester at all; I don’t think we’re good at objectively analyzing what we hear. Likewise, my AP tester isn’t conscious, and it doesn’t experience what it’s hearing. So I don’t think it can tell us much about the pleasurable experiences of listening to a pair of headphones.

The trick here is knowing that objective and subjective evaluation happen in two different domains, and one has to keep an open mind to bring in information from both without feeling conflicted. If a person falls too strongly into one camp or the other, their viewpoint bias may override their ability to see the valid points of the other viewpoint. The question is how to hold both valid viewpoints at the same time.

SG: I see your point, and I want to say upfront that measurements are valuable tools for engineers; I'm just not convinced laymen can use them to predict whether they would prefer headphone A over headphone B by looking at wiggly lines on a graph, as opposed to reading a reviewer's impressions of the two headphones. I recall that when we were both on a panel at the 2011 Rocky Mountain Audio Fest, you said the Sennheiser HD 800 measured better than any headphone you had tested to date, but it wasn't your favorite headphone to listen to. On another occasion you mentioned the Bowers & Wilkins P5 didn't measure all that well, but you enjoyed its sound.

TH: Well … it’s complicated. There is a whole variety of measurements that I take, and when I say a headphone measures well it’s really a summary of a variety of discrete measurements. The HD 800, for example, measures excellently in terms of the speed of the initial transient of a 300 Hz square wave, and has little ringing after the initial transient. This makes for a headphone with very good imaging, because transients are clearly identifiable, providing accurate arrival-time information that the brain uses to localize sounds and build a sonic image. But the initial transient also overshoots a little too far, which makes the headphones sound a little piercing, a characteristic I don’t like.
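The overshoot and ringing described here can be put into numbers. The sketch below assumes one half-cycle of a measured square-wave response is available as a sampled array starting at the transition; it reports percent overshoot relative to the settled level and a settling time (the last sample still outside a ±5% band). The `second_order_step` helper is a hypothetical stand-in for a real measurement: an underdamped second-order system, not data from the HD 800 or any other headphone.

```python
import numpy as np


def second_order_step(t, natural_hz, zeta):
    """Step response of an underdamped second-order system (0 < zeta < 1).
    Used here only as a synthetic stand-in for a measured transient."""
    wn = 2 * np.pi * natural_hz
    wd = wn * np.sqrt(1 - zeta**2)
    decay = np.exp(-zeta * wn * t)
    return 1 - decay * (np.cos(wd * t) + zeta / np.sqrt(1 - zeta**2) * np.sin(wd * t))


def overshoot_and_settling(response, band=0.05):
    """Return (overshoot in %, settling time in samples) for one half-cycle.

    The settled level is taken as the median of the second half of the
    half-cycle; this choice is an assumption, not a measurement standard.
    """
    response = np.asarray(response, dtype=float)
    settled = np.median(response[len(response) // 2:])
    overshoot_pct = 100.0 * (response.max() - settled) / settled
    # Indices still outside +/- band of the settled level (includes the initial rise)
    outside = np.nonzero(np.abs(response - settled) > band * abs(settled))[0]
    settling_samples = int(outside[-1]) + 1 if outside.size else 0
    return overshoot_pct, settling_samples


# One half-cycle of a 300 Hz square wave (1/600 s), sampled at a hypothetical 6 MHz
t = np.linspace(0, 1 / 600, 10001)
print(overshoot_and_settling(second_order_step(t, natural_hz=5000, zeta=0.3)))
```

In this toy model, lowering `zeta` raises both the overshoot and the settling time, and raising it does the reverse; a real headphone's response is messier than a single second-order system.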

The subjective experience is complex too. The B&W P5 doesn’t measure well (it has somewhat excessive bass and rolled-off highs), but I remain very attracted to them: I like their look and feel, they’re very easy on the ears and fun to hear for movie listening and phone calls, and their isolation is astonishing. There are a lot of things I really like about them, and I care very little about their inaccuracies given how satisfied I am in other ways. Basically, I’m not listening for their faults in normal everyday use, so I don’t hear them.

I feel things are very complicated for both subjective and objective evaluation. “Measures well” and “sounds good” are both gross oversimplifications.

SG: So there's subjective analysis of the objective measurements. That's cool, and then there's the stuff that can't be measured and quantified, like comfort and the feel of the thing in your hands, which heavily influence preferences. The intimacy of headphones puts them in another category compared with other audio components. Measurements can't capture that, and in some ways it's more important for a headphone to feel good than to sound good.

TH: I would certainly say the look and feel of headphones has far more relevance to user satisfaction than the look and feel of speakers. I mean, you’re not wearing your speakers on your head while you commute on the subway.

The value system for personal audio is clearly different than other types of audio. Esthetics, comfort, and convenience play a much more intimate role in your experience with them than with home audio. And satisfaction in those areas is going to bleed over into your aural satisfaction as confirmation bias causes you to hear them as better than they otherwise might be.

So, yes, feeling good about the look and feel of a personal audio product is probably more important than it is with a home product.

SG: It's interesting: John Grado told me last year that he never measures prototypes when designing new headphones. He's been at it a long time, and his ears tell him what he needs to know. Grado's working method is at one extreme, and I guess Sennheiser is at the other. Sennheiser takes a harder-engineering, more measurement-oriented approach to advancing the state of the art in headphone design.

John Grado is a "seat of the pants" designer, so why is his sound far more consistent than Sennheiser's? I can't identify a Sennheiser sound per se; it's all over the place. If its engineers were really using a scientific approach, wouldn't all Sennheisers of a certain type, say open-back, have an identifiable Sennheiser sound? They don't, so even for a company like Sennheiser other factors come into play.

TH: Well … Grado’s sound doesn’t change much because their headphones are all nearly identical in design. Sennheiser makes maybe a dozen full-size open cans, but the designs change significantly and tend to run in families. For example: Sennheiser makes the HD 518, HD 558, and HD 598. This is a family of headphones that share a similar look. Sennheiser probably does some things like use common parts in these headphones to gain economies of scale. So they really aren’t completely free to optimize the top product in the line as it may have some parts in common with the lowest priced product in the line.

You’ve also got a big company with corporate goals and timelines. There comes a point where management will “shoot the engineer” and put the product into production. Their job is WAY more complex than Grado’s, so you get more variance in the end sound.

SG: I still think it's odd that Sennheiser doesn't have a Sennheiser sound. The reasons behind consumer preferences are complex, and all I'm saying is that the better-measuring headphone isn't guaranteed success. Grado owners don't live in a vacuum, and many of them are being exposed to other brands' sounds. Some stick with Grado, some move on. Some Sennheiser owners cross over to Grado or Beyerdynamic or Hifiman. People eventually wind up with the sound they like; how well the 'phones measure has little to do with it.

Comments
Jazz Casual

What you should be saying, Alondite, is that other people's subjective views don't mean anything to you. I don't think that I am alone in valuing some people's listening impressions and relating them to my own subjective experience, regardless of whether I share their personal preferences or not. You might like the taste of grapefruit and I might not, but I trust we can agree that it tastes sour. Similarly, I know a few head-fi'ers whose descriptions of how headphones sound very much align with my own, but that doesn't necessarily mean that we share the same headphone preferences. A person who dislikes treble-happy, mid-centric or bassy headphones can be just as capable of perceiving those sound traits and describing them reasonably accurately as someone who actively seeks them out. I have found subjective listening impressions from ears that I trust more useful than objective measurements for the purpose of comparing and evaluating headphones, both before and after I've heard them for myself.

mikeaj
It's not just the reviewer

I see a lot of audio gear reviews mention which music was played, what amps were used, and so on.

However, there's an even more important aspect that usually goes unreported: the listening volume! Not only does the performance of gear change at different volumes, but human perception of the sound also varies a lot with volume.
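One way to make the listening level reportable is to estimate it from the headphone's sensitivity and the drive voltage. A minimal sketch, assuming sensitivity is quoted in dB SPL per milliwatt (some manufacturers quote dB SPL per volt instead) and that the nominal impedance is a fair stand-in for the real, frequency-dependent load; the headphone in the example is hypothetical.

```python
import math


def estimated_spl_db(sensitivity_db_spl_per_mw, volts_rms, impedance_ohms):
    """Approximate sound pressure level for a given RMS drive voltage.

    P (mW) = 1000 * V^2 / Z, and SPL = sensitivity + 10 * log10(P / 1 mW).
    Ignores how the impedance (and so the power) varies with frequency.
    """
    power_mw = 1000.0 * volts_rms**2 / impedance_ohms
    return sensitivity_db_spl_per_mw + 10.0 * math.log10(power_mw)


# Hypothetical 32-ohm headphone rated at 100 dB SPL/mW
print(round(estimated_spl_db(100.0, 0.179, 32.0), 1))  # roughly 1 mW of drive
print(round(estimated_spl_db(100.0, 0.566, 32.0), 1))  # roughly 10 mW of drive
```

Each tenfold increase in power adds 10 dB, so a small-looking change in voltage can mean a large change in perceived loudness.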

There are a lot of confounding variables to consider when evaluating audio reviews, but this is one that is too often forgotten.

kmj2587's more general comments about reproducible results are spot on.

Sometimes I think that the source of information (measurements vs. human listening) gets so much of the focus that it becomes contentious. What we want is -good- data, wherever it came from. Then we can hopefully draw the correct conclusions from that data. Unfortunately, acoustic measurement systems often aren't blameless (though most are good enough to yield some valuable insights), and a lot of people misinterpret measurement data. On the flip side, a large number of listening impressions produce bad data, because of a wide variety of biasing influences and factors, as previously discussed by others.

tinyman392
Why Subjectivism is still important

I agree that objective looks at headphones do explain a lot of subjective ideas, and can confirm what people hear (or why they hear it that way), and weed out opinions that may not be up to snuff.

Objective data can never tell us for certain that a headphone will have a given property. What is "punchy" in the objective world? What is textured bass in the objective world? You can give me an example of what it is, but there is no definitive answer, as I'll always be able to find a counterexample whose graph doesn't look the way you say it should. For example, if you say treble sparkle/splash is caused by extension in the higher frequencies (like in the PFE232, Brainwavz B2, etc.), I'll come back and mention the Etymotic graph that shows the treble rolling off... If you want to say texture comes from greater emphasis on the sub-bass, I'll point you straight to the Aurisonics ASG-1 graph that shows it well recessed. I could go all day with just about every single property (good or bad), finding examples that contradict what you'd expect.

In this day and age, I don't see too much of it being used, though. Many of the words we use to describe a pair of headphones really can be translated to the graph. However, many of the graphs cannot be translated back into the words we describe our headphones with. It isn't a two-way street... Yet. We are still learning all of this stuff, and it'll take some time, but with it, we can learn why something has texture, punch, detail, harshness, etc.

I truly think that we need both kinds of results to really get a good grasp of what something sounds like. Objective data alone is not enough. Subjective data has subjectivity to it. However, put them both together, and when one confirms the other, you have a really big picture of how something is going to sound. But remember, subjective can be confirmed by objective. However, until you know definitively what punch looks like (in ALL instances), objective cannot be confirmed by subjective (you can't have something confirmed if you don't have a full grasp of what it is, precisely).

AstralStorm
Need an example and not talk

I can't find the graph you're talking about. Please provide an actual example; otherwise you come across as a troll.
The ASG-1 has been described every which way by everyone and their dog. I suspect it has a peaky frequency response, with multiple peaks, like, say, the Ultrasone PRO 650, which I've tried and didn't like for various reasons. (This chart: http://www.innerfidelity.com/images/UltrasonePRO650.pdf)

"For example, if you say treble sparkle/splash is caused by an extension in the higher frequencies"
You don't. Bandwidth is not directly related to tonal balance and vice versa.

Brightness/darkness is directly defined by the tilt (skew) of the graph, i.e. its smoothed slope. Major peaks can throw this measure for a loop, as it's typically based on a straight average and should be a weighted average instead. (But nobody has the set of weights yet.)
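The tilt measure described above can be sketched as the slope of a straight line fitted to the frequency response on a log-frequency axis, in dB per octave. This is one plausible reading of "skew of the graph", not a standard definition; the optional per-point weights stand in for the weighting the comment says nobody has settled on, so they default to uniform (the plain average it criticizes).

```python
import numpy as np


def tilt_db_per_octave(freqs_hz, mag_db, weights=None):
    """Slope of a weighted least-squares line through mag_db vs. log2(frequency).

    Positive values tilt toward the treble (brighter), negative toward the
    bass (darker). `weights` multiplies each point's squared error.
    """
    x = np.log2(np.asarray(freqs_hz, dtype=float))
    y = np.asarray(mag_db, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    # np.polyfit weights the residuals themselves, so pass sqrt(w) to weight squared error
    slope, _intercept = np.polyfit(x, y, 1, w=np.sqrt(w))
    return slope


freqs = np.geomspace(20, 20000, 200)
print(tilt_db_per_octave(freqs, -1.0 * np.log2(freqs / 1000)))  # input is a dark, -1 dB/octave tilt
```

A single large peak in the upper mids pulls the unweighted slope upward even if the rest of the response is flat, which is the failure mode the comment describes; down-weighting that region removes its influence.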

Etymotics are bright mostly because of the magnitude of the ~2.5k peak, which is pretty high, and somewhat because of a lower 7k peak. Other than this nasty issue, they're nicely balanced, if a bit limited in bandwidth (they end at 17.5 kHz), but that is an issue for most IEMs.

There's also fast/slow. This is typically due first to a drooping step response, and then to ringing, which is best seen in the impulse response (but is also noticeable in the step response).
So, e.g., the Etymotic ER4PT are overly fast.
Huge overshoot can also give the feeling of brightness if it happens at high frequencies.

Bass control/bass bleed is directly related to the step response at low frequencies, which also affects THD there. Long ringing and high-magnitude ringing are bad, as is drooping. There seems to be no consensus on which one is more important.

The phase chart is important for checking how the device will react to bad sources (high output impedance, low current).
This is the first thing that has to be checked: whether the source you used for matching is compromised by high output impedance or capacitive coupling. Otherwise any description you make will not translate to anything (unless you happen to measure using the same source).

Care to actually provide something that's not mappable?

acs

Something important has not been mentioned, perhaps it is obvious so no one has said it, but I think that it should be put on the table.

If your goal is to accurately reproduce what is on the recording, using objective measurements when selecting amplifiers and digital sources is a requirement. Amplifiers can be so transparent that the signal from the input can almost be nulled with the signal from the output, with only a small amount of barely audible distortion added. DACs can also be very accurate. Measurements can reveal these performance levels, whereas listening cannot.
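A null test like the one described here can be sketched numerically: scale the output to best match the input, subtract, and express what is left relative to the input. This assumes both captures are already time-aligned sample-for-sample at the same rate; real null tests also have to correct for delay and any phase shift, which this sketch skips. The "amplifier" in the example is hypothetical: an 8x gain plus a small cubic term (coefficient 0.001).

```python
import numpy as np


def null_depth_db(input_signal, output_signal):
    """Residual after gain-matching output to input, in dB relative to the input.

    More negative means a deeper null, i.e. a more transparent device.
    """
    x = np.asarray(input_signal, dtype=float)
    y = np.asarray(output_signal, dtype=float)
    gain = np.dot(x, y) / np.dot(y, y)  # least-squares scale factor for y
    residual = x - gain * y
    residual_energy = np.sum(residual**2)
    if residual_energy == 0:
        return float("-inf")  # perfect null
    return 10 * np.log10(residual_energy / np.sum(x**2))


# Hypothetical amplifier: 8x gain plus a little third-order distortion
fs = 48000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
y = 8 * (x + 1e-3 * x**3)
print(round(null_depth_db(x, y), 1))
```

With these assumed numbers the null lands around -72 dB; the residual is the third harmonic that the distortion term creates, which no gain adjustment can cancel.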

Speakers and headphones cannot reproduce the incoming signal anywhere near as accurately as an amplifier or a DAC. No speaker or headphone comes close to nulling the signal from input to output; this is inherent in the way they work and the function they serve. This is where subjectivity takes a larger role. Deciding what is important to you in a sound signature, based on the recordings you listen to and your personal preferences, matters much more in selecting headphones and speakers than in selecting amplifiers and sources.

Imagine listening to an expensive amplifier that sounds good with HD650s, so you buy it. Then you decide to try some Beyer T5Ps and don't like the sound. You think, geez, this amp sounded so good with the 650s, and now it just doesn't work the same with the T5P. Well, you didn't look at the measurements, and it has a high output impedance, which can shift the frequency response of lower-impedance headphones like the T5P but might not affect the 650. You have made an expensive purchase of an amplifier that only works with high-impedance headphones, but you couldn't have listened for it, because you didn't have the headphones you were going to buy, and even if you did, you might have thought it was "synergy". Measurements and knowledge would have revealed the problem in the first place and saved you the trouble of getting on the merry-go-round.
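The effect described here follows from a simple voltage divider: the amplifier's output impedance and the headphone's impedance split the voltage, and because a headphone's impedance varies with frequency, so does the split. A minimal sketch, treating both impedances as pure resistances (ignoring phase); the impedance swings and output impedances in the example are hypothetical, not measurements of the HD650, the T5P, or any particular amp.

```python
import math


def divider_level_db(z_headphone, z_out):
    """Level at the headphone terminals relative to the amp's unloaded output, in dB."""
    return 20 * math.log10(z_headphone / (z_headphone + z_out))


def response_shift_db(z_min, z_max, z_out):
    """How much louder frequencies where the load is z_max play than those at z_min,
    purely because of the amplifier's output impedance."""
    return divider_level_db(z_max, z_out) - divider_level_db(z_min, z_out)


# Hypothetical impedance swings, driven from a 50-ohm and a 1-ohm output
for label, z_min, z_max in [("low-Z can, 32-64 ohm", 32, 64),
                            ("high-Z can, 300-500 ohm", 300, 500)]:
    for z_out in (50, 1):
        print(label, z_out, "ohm out:", round(response_shift_db(z_min, z_max, z_out), 2), "dB")
```

With these assumed numbers, the low-impedance headphone's response shifts by roughly 3.2 dB on the 50-ohm output but the high-impedance one by only about 0.5 dB, and both shifts nearly vanish with a 1-ohm output.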

It is very foolish to think that measurements are not a useful tool in selecting audio components. Guttenberg's insistence that they are just squiggly lines is like an illiterate person saying that words are just squiggly lines. Just because you haven't taken the time to understand what measurements mean doesn't negate their meaning.

-Aaron.

jadeeast
Those cookies look good.

Let's pretend that I like cookies. In fact, I'm obsessed with them.

I read cookie blogs and magazines. I've studied the FMRI scans of people's brains eating cookies. I've been an active member of cookie-fi since it started. I read all the peer-reviewed cookie studies. I've studied the culture and history of cookies. I've interviewed 10,000 cookie eaters and watched hours of footage of cookie eaters on your-tube. I study the brain chemistry of cookies. I've studied homemade and industrial cookie processes. I subscribe to Cookie-o-phile magazine. I collect cookie jars. I have a degree in food chemistry. I have charts describing the weight, size, density, and colour of 1,000 kinds of cookies. I've read over 2,000 reviews of cookies from experts and lay people. I've read the double- and triple-blind studies of cookies going back to 1927.

I've made it my life's mission to know everything about cookies. I know all there is to know objectively about cookies, including other people's subjective experience tasting them, but I've never touched or eaten one myself.

What knowledge would I gain by eating a cookie?

AstralStorm
:)

Not much, but it'd still be tasty. (Assuming it fits your criteria for a good-tasting cookie.)

Jazz Casual

Not necessarily tasty but it would have flavour and texture, which could be observed and described without making a value judgement about whether it tastes good or not. ; )

Alondite
GR07

@AstralStorm

That was basically what I was saying. I've never heard something that I've seen measured as having "ruler flat" bass, so the GR07 is my reference for now. I would imagine that the GR07s measure fairly well in the bass though.

And that bump at ~6kHz seems to flatten out over time. I did a blind A/B test with a brand new pair and a pair with ~300 hours of use and the new pair was very noticeably more sibilant. Actually they seem to change in a number of ways, but that was the most obvious.

I'm still convinced that they would knock the Phonaks off Tyll's Wall of Fame if he ever got to hear a broken-in pair.

AstralStorm
No such effect here

Nah, it doesn't flatten at all - you might've got used to it.

I'm actually interested in this burn-in thing that hasn't been captured yet.
I've tried to keep tabs on it by re-equalizing said IEM to sound flat and checking the variances. The difference at 6k between corrections over time was relatively insignificant (within 1 dB in level and 100 Hz in peak position).
I can directly use the first EQ of these with great results.

In comparison, switching eartips produced variations in the 6k region of ~3 dB in level and ~500 Hz in peak position (included silicone vs. new Comply T140),
and ~4.5 dB in the 200-400 Hz bass region (same tips).
All the tips sealed very well.
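The comparison reported here (shift in level and peak position around 6 kHz, between eartips or between EQ corrections over time) can be automated. A minimal sketch, assuming both curves are sampled on the same frequency grid; the 4-9 kHz search band and the Gaussian peaks in the example are hypothetical, not the commenter's data.

```python
import numpy as np


def band_peak(freqs_hz, mag_db, lo_hz, hi_hz):
    """Frequency and level of the highest point of a response within [lo_hz, hi_hz]."""
    f = np.asarray(freqs_hz, dtype=float)
    m = np.asarray(mag_db, dtype=float)
    in_band = (f >= lo_hz) & (f <= hi_hz)
    i = np.argmax(m[in_band])
    return f[in_band][i], m[in_band][i]


def peak_shift(freqs_hz, curve_a_db, curve_b_db, lo_hz=4000.0, hi_hz=9000.0):
    """(Hz, dB) change in the band peak going from curve A to curve B."""
    fa, la = band_peak(freqs_hz, curve_a_db, lo_hz, hi_hz)
    fb, lb = band_peak(freqs_hz, curve_b_db, lo_hz, hi_hz)
    return fb - fa, lb - la


# Synthetic curves on a 1 Hz grid: a 6 dB peak at 6 kHz vs. a 9 dB peak at 6.5 kHz
freqs = np.arange(20.0, 20001.0)
tip_a = 6 * np.exp(-((freqs - 6000) / 300) ** 2)
tip_b = 9 * np.exp(-((freqs - 6500) / 300) ** 2)
print(peak_shift(freqs, tip_a, tip_b))
```

Running the same comparison on several units of the same IEM would separate tip effects from manufacturing variability.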

Comply tips do wear out over time, giving progressively reduced bass and a slightly increased high resonance. The effect over the month was ~1.5 dB; Comply tips don't last much longer than that.

I can keep more accurate data if you wish, but it's time consuming and there's a sample of one IEM. I'd require at least a few to determine manufacturing variability.

marksig
The problem is accountability.

I was going to write a lengthy treatise on human psychology and how too many audiophiles don't know what they're doing and ruin the audio industry for everyone. I have opted for a shorter comment to save everyone time:

I find most objective-vs-subjective discussions beside the point and grossly misleading. I'm not going to write about which side is right or wrong.

The actual problem is that "HiFi" back in the first days implied measurements and technical excellence in circuit design to bring the lowest distortion, the most linear frequency response, etc. at the lowest cost to the table. Just like in any other engineering field. Today, "HiFi" is almost a synonym for "too expensive for you dirty plebs".

A weird anti-science attitude spares companies the hassle of proving (in whatever way) that their $1000 headphone amp and $2000/box stereo speaker system perform better than the $75 O2 and €190/box Nubert nuBox 381 (sighted listening tests a.k.a. "reviews" on Head-Fi are useless). There is no accountability. More expensive must mean better, because surely companies are abstract entities that do not lie unless I think they do. Vintage is back and better than the run-of-the-mill solid-state stuff available today because Head-Fi says so. The result: ridiculously expensive products of questionable quality on the one side and cheap Logitech/Bose/Skullcandy/Beats/kitchen radio garbage on the other. A few sensible, well-engineered and underappreciated products are somewhere in the middle. Guess what the masses are going to buy? Producers must target these crappy transducers, making their music sound good there, to sell stuff. Go figure.

The audio industry is making a fool out of itself. And budding audiophiles are made to believe the lies and waste a lot of money. It is perverse. In the end, everyone loses.

I can't help but think of quacks. They advertised their useless services on the market and people believed them because they didn't know better or wanted to believe the quacks were better than "real" medical doctors back then.

Keep in mind that e.g. liking tube sound and such is perfectly okay. But please keep the deliberate distortion on the transducer level or in effect boxes so you can switch the tube cruft off depending on your mood.

Craigster75
Bass vs. true audiophile sound

After reading this article, I am glad to see I am not alone in my desire for neutral, transparent sound, yet with bass emphasis in terms of presence and punch. I realize it is an oxymoron to call myself an audiophile bass head, but I think this is how I would best describe modern music listening on the go. I am old enough to recall cleaning my Tom Petty "Damn the Torpedoes" album with a Discwasher on my Technics turntable before listening on my Sennheisers with earcups made of yellow Nerf material, but young enough to appreciate a solid bass beat from Nicki Minaj to help motivate me during a workout. I have been wracking my brain trying to find headphones under $500 that will marry killer bass with crystal-clear, accurate mids and highs. If the Sennheiser HD800 is the reference for mids and highs and the Beats PRO is the reference for bass, what are my best options in a portable headphone that could most closely meet my needs?

Below are a few I am considering based on research, but I have not listened to them. It is frustrating how few retailers have these available to demo, so I would be relying on online retailers that accept returns, but don't want to make a habit of returning headphones.

Ultrasone Pro 900
Beyerdynamic DT770
Aiaiai TMA-1
Phiaton MS 400
Philips Fidelio L1
AKG K550
Sony MDR-MA900
PSB M4U2
Sennheiser Amperior (not yet available)
V Moda M100 (not yet available)
Ultrasone Edition 8 (if I hit the lottery)

I would welcome suggestions/feedback/recommendations, thank you.