Are Objective Headphone Measurements Relevant to Audiophiles' Subjective Experience?

Ed. Note: I talk with Steve Guttenberg on the phone a few times a month, and I very much enjoy our chats. When Steve suggested we have a conversation about measurements and their relevance to audiophiles and write it up as an article, I was all-in immediately.

I hope you enjoy our chat as much as I did.

Steve Guttenberg: Let's get down to the nitty-gritty: do you really think measurements can be used to predict sound quality preferences? And are measurements in any way more reliable predictors than subjective listening tests? We all know that tube electronics don't measure as well as solid-state gear, but that doesn't seem to bother a lot of audiophiles (and musicians) who crave tube sound. So what do measurements bring to the party that subjective evaluations miss?

Tyll Hertsens: There's a tension between subjective and objective evaluation. It seems to me they both attempt to evaluate headphones, but operate in two different domains. Essentially, I think measurements can tell you how close something is to neutral, and to what extent it deviates from neutral. Subjective testing lets you know how pleasing something is, and in what ways. I don’t think our brains and perceptual systems are anything like an Audio Precision tester; I don’t think we’re good at objectively analyzing what we hear. Likewise, my AP tester isn’t conscious, and it doesn’t experience what it’s hearing. So I don’t think it can tell us much about the pleasurable experience of listening to a pair of headphones.

The trick here is knowing that objective and subjective evaluation happen in two different domains, and one has to have an open mind to bring in information from both without feeling conflicted. If a person falls too strongly into one camp or the other, their viewpoint bias may override the ability to see valid points of the other viewpoint. The question is how to hold both valid viewpoints at the same time.

SG: I see your point, and I want to say upfront that measurements are valuable tools for engineers. I'm just not convinced laymen can use them to predict whether they would prefer headphone A over headphone B based on looking at wiggly lines on a graph, as opposed to reading a reviewer's impressions of the two headphones. I recall that when we were both on a panel at the 2011 Rocky Mountain Audio Fest, you said the Sennheiser HD 800 measured better than any headphone you had tested to date, but it wasn't your favorite headphone to listen to. On another occasion you mentioned that the Bowers & Wilkins P5 didn't measure all that well, but you enjoyed its sound.

TH: Well … it’s complicated. There are a whole variety of measurements that I take, and when I say a headphone measures well it’s really a summary of a variety of discrete measurements. The HD 800, for example, measures excellently in terms of the speed of the initial transient of a 300 Hz square wave, and has little ringing after the initial transient. This makes for a headphone with very good imaging as transients are clearly identifiable, providing accurate sound arrival time information for the brain to use to accurately localize sound and build a sonic image. But the initial transient also overshoots a little too far, and makes the headphones sound a little piercing, which is a characteristic I don’t like.

The subjective experience is complex too. The B&W P5 doesn’t measure well (it has somewhat excessive bass and rolled-off highs), but I remain very attracted to their look and feel. For movies and phone calls they’re very easy on the ears and fun to hear, and their isolation is astonishing. There are a lot of things I really like about them, and their inaccuracies matter very little to me next to the satisfaction they give me in other ways. Basically, I’m not listening for their faults in normal everyday use, so I don’t hear them.

I feel things are very complicated for both subjective and objective evaluation. “Measures well” and “sounds good” are both gross oversimplifications.

SG: So there's subjective analysis of the objective measurements. That's cool, and then there's the stuff that can't be measured and quantified (like comfort and the feel of the thing in your hands) that heavily influences preferences. The intimacy of headphones puts them in another category compared with other audio components. Measurements can't capture that, and in some ways it's more important for a headphone to feel good than to sound good.

TH: I would certainly say the look and feel of headphones has far more relevance to user satisfaction than the look and feel of speakers. I mean, you’re not wearing your speakers on your head while you commute on the subway.

The value system for personal audio is clearly different than other types of audio. Esthetics, comfort, and convenience play a much more intimate role in your experience with them than with home audio. And satisfaction in those areas is going to bleed over into your aural satisfaction as confirmation bias causes you to hear them as better than they otherwise might be.

So, yes, feeling good about the look and feel of a personal audio product is probably more important than it is with a home product.

SG: It's interesting: John Grado told me last year that he never measures prototypes when designing new headphones. He's been at it a long time, and his ears tell him what he needs to know. Grado's working method is at one extreme, and I guess Sennheiser is at the other. Sennheiser takes a harder-engineering, more measurement-oriented approach to advancing the state of the art in headphone design.

John Grado is a "seat of the pants" designer, so why is his sound far more consistent than Sennheiser's? I can't identify a Sennheiser sound per se; it's all over the place. If its engineers were really using a scientific approach, wouldn't all Sennheisers of a certain type (open-back, for example) have an identifiable Sennheiser sound? They don't, so even for a company like Sennheiser, other factors come into play.

TH: Well … Grado’s sound doesn’t change much because their headphones are all nearly identical in design. Sennheiser makes maybe a dozen full-size open cans, but the designs change significantly and tend to run in families. For example, Sennheiser makes the HD 518, HD 558, and HD 598, a family of headphones that share a similar look. Sennheiser probably uses common parts across these headphones to gain economies of scale, so they aren’t completely free to optimize the top product in the line, since it may share parts with the lowest-priced product in the line.

You’ve also got a big company with corporate goals and timelines. There comes a point where management will “shoot the engineer” and put the product into production. Their job is WAY more complex than Grado’s, so you get more variance in the end sound.

SG: I still think it's odd that Sennheiser doesn't have a Sennheiser sound. The reasons behind consumer preferences are complex, and all I'm saying is the better-measuring headphone isn't guaranteed success. Grado owners don't live in a vacuum, and many of them are being exposed to other brands' sounds. Some stick with Grado, some move on. Some Sennheiser owners cross over to Grado or Beyerdynamic or Hifiman. People eventually wind up with the sound they like; how well the 'phones measure has little to do with it.


COMMENTS
Limp:

Really appreciate that you take the time to educate Steve a bit, Tyll. :P

Also good that you included that last bit about MOG. A lot of people, me included, gladly sacrifice some fidelity for convenience. I haven't used a dedicated DAC or amplifier for months, while my little Clip and IEMs get no rest at all.

ultrabike:

A consumer, newbie, noob take on the matter:

Measurements matter because they serve as a concise guide for noobs like me about what to buy. If a headphone has a bass hump, it will likely measure that way. If a headphone is bright, it will likely measure that way. If your last set of headphones didn't satisfy your needs because they were too bright, a set of measurements and definitions will help you avoid this problem again without having to go through a ton of headphones in the process. Furthermore, it can narrow your search to a few good cans at incredible prices...

I would never have guessed the KSC75 was as good as it is had I not seen its measurements... Maybe I'm biased, but I can tell you it destroyed all of my previous, somewhat more expensive gear by no small margin, and the measurements correlate with that.

Our brain adapts to the environment, and it has to if we are to survive. But it can only do so much. A headphone that fits your needs in the measurement world (it doesn't have to be flat) will likely fit your needs in the real world... At least that is my experience. Furthermore, say your brain has adapted to a crappy can: put on a good set of well-measuring cans (where "well-measuring" means a frequency response that matches what you are looking for), and I can guarantee from experience that people will perceive the difference.

And there are those aberrations that your brain will have to sweat incredibly hard to equalize...

Willakan:

Blind testing (especially ABX) is a *more* sensitive protocol than sighted listening where genuine differences are concerned. If a difference is suddenly very hard to detect under blind conditions, this does not mean that blind conditions make it hard to detect differences!

The following comment about "cheap crap" is also not appreciated.

ultrabike:

The quote goes:

"Blind testers can't reliably tell the difference between coffee and tea. The audio guys who steadfastly believe in blind testing have it easy, they can buy the cheapest crap and live happily ever after!"

I think the context is ABX testing. The brain kind of adapts, but not instantaneously. It really depends on the conditions: how long it takes to switch from A to B, and how many times, I guess.

Now, coffee and tea are very different subjects here. It is really hard for me to tell Blue Mountain from a similarly roasted Tarrazu. But come on, man, Folgers will get killed by any decent bean. Roasting makes a tremendous difference... I don't think espresso tastes at all like medium-roasted Colombian... And tea? In Japan I tasted a tea with a beef flavor... Man, I guess Steve was upset or something, 'cuz that rant didn't make any sense...

I believe Steve is a connoisseur; he has probably heard more systems than I will in 10 lifetimes, but it seems pointless to me to roast measurements, ABX tests, and objective thinking... For some, an insatiable thirst for the next piece of gear might be important. Others of us just want relatively decent stuff... Not expensive crap, or gear that's just 1% better for 1000% of the price.

Limp:

My immediate reaction to Steve's statement was a bit simpler: "Occam's Razor"

If a whole bunch of people really can't tell the difference between two things, might there just be no perceivable difference?
And before he can get around to saying that ABX testing is unnatural and not perceptive enough, I will remind him that our auditory memory is really limited. Within less than a second most of it is gone, and our subconscious will be more than happy to fill in those blank spots on its own.

It is true that our auditory system is far from the standards of an AP system, but carefully controlled, large sample size blind tests are as good as we get.

ultrabike:

I know what you mean, bro. Not my first reaction, but yeah: "Occam's Razor" ~= "Pluralitas non est ponenda sine necessitate" ~= KISS principle.

From the KISS side of things, I don't like to see myself putting down $2500 for a badly measuring Ultrasone ED10. Simple, no? I might see myself putting down $400 for a great-measuring, heavily engineered HD600.

I guess someone can definitely get their paws on an elite Ultrasone ED10, while I, not too sorrily, enjoy my not-too-crappy HD600...

BTW Tyll, the coffee and cookies in the picture look FANTASTIC!!! Or is it tea? Hopefully you guys were able to tell ha?!

:) I think Steve has very valid and useful opinions, and he brings some very cool gadgets into the spotlight... I just wish he would put a tiny bit more weight on objective stuff... Anyhow, you will always have the right and the left... and ignorant folks like me in the not-so-comfy middle :(

catscratch:

I would think that if you apply Occam's Razor to it, the only conclusion from a DBT where 80% of people couldn't tell the difference was that 80% of people couldn't tell the difference. Saying that there is no difference would be jumping to conclusions.

Also, if 20% of people in a test could reliably tell the difference, then there IS a difference.

I don't put much stock in DBT, not because audio is some sort of religion to me, but simply because I think DBT is fundamentally flawed. It doesn't separate audio perception from audio memory and audio comparative skills. There is also a difference between short-term and long-term perception, and DBT conditions are not normal listening conditions. The interpretation of DBT is flawed too: if a very small minority can reliably tell a difference, there is a difference. Otherwise you're assuming that all hearing ability, or indeed audio memory and interpretive ability, is the same.

I agree with Tyll's stance on measurements - I think measurements are simply a tool to be used in audio, as is DBT, as is plain old listening, and as is - first and foremost - understanding the fundamental science and theory behind it all. Trusting only DBT is just as faith-based as anything since it assumes that DBT is flawless. Trusting only measurements assumes that we have the ability to measure absolutely everything that matters, and that our testing methodology is, again, flawless.

The objectivist standpoint simplifies something that's inherently much more complex. We don't understand our perception perfectly (or indeed very well at all) and we don't know everything there is to know about audio or electrical engineering or headphone/amp/whatever design. If you want to be a true skeptic then you need to be skeptical of everything except what has been scientifically demonstrated and peer-reviewed and is generally temporarily assumed to be true. Temporarily assumed, mind you, is the best we can do. Objectivism is selectively skeptical, which is just as faith-based as anything out there.

AstralStorm:

The actual test for something as badly defined as "quality" is ABC/HR augmented with a low anchor (a known bad-quality sample).
That's not a simple DBT.

It's similar to MUSHRA testing (ITU-R BS.1534), but proven to be more sensitive (ITU-R BS.1116-1).
The results then have to be analyzed for statistical significance using a valid F-test. (Neither Tukey nor straight ANOVA is sensitive enough; there are better methods.)

To test headphones in such a setting, one would have to accurately record their sound output and reproduce it through some known reference design, preferably speakers that are known to be accurate, equalized, with the listener in the "sweet spot". You can imagine that's difficult to set up.

The closest anyone has ever gotten to blind testing headphones is Tyll's test for wear differences between those two pairs of AKGs. Similarly, manufacturing variance could perhaps be quantified using a double-blind test.

The whole trick is that straight DBT can only tell you if something is not audible. This works well for amps, but not really for headphones.
Unlike correct amplifiers, headphones sound really different from each other. Frequency-response differences of more than 3 dB are trivial to find, as are THD differences of more than 0.1%, which can be audible...
That's not even counting the various HRTF issues due to placement and ear shape...
The custom IEM manufacturers have it far easier. :)

Note that the average of preferences is not the same as preference for the average, but it seems people grade certain average things as quite appealing - for example, average faces.

KikassAssassin:

I have a hard time getting inside the brainspace of someone who DOESN'T like blind ABX testing, but Steve's comment about audio being a belief system gives me some hints as to where those people are coming from. If you're the kind of person who takes things on faith, and you treat audio almost like a religion, no amount of evidence is going to convince you that what you believe isn't true. Faith-based belief systems are based on ignoring evidence in favor of your own biases.

As for me, I'm a skeptic. I try as best as I can not to take anything on faith, and I need to see substantial, provable evidence before I'm going to be convinced of something. The placebo effect is very real and demonstrable, so it's important to be aware of it when comparing audio products. Our brains are very good at tricking us into thinking we're hearing differences that don't exist, just because we want or expect to hear a difference. I've experienced this myself before when comparing DSP settings in my audio player software, where I was convinced I was hearing a substantial difference between two settings until I did a blind ABX test, where I couldn't tell any difference at all. Once I'd convinced myself there was no difference, I suddenly didn't hear any difference when I listened to them conventionally.

As far as I'm aware, ABX testing is one of the best methods available for weeding out the placebo effect, so I don't trust anyone, even my own ears, to be capable of offering a truly accurate subjective evaluation of audio equipment if they refuse to do ABX testing.

Plus, as Steve said, it can also save you a lot of money. You'd think that would convince more people to give it a shot. ;)
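For readers wondering what it means to "pass" an ABX test: the usual approach is to ask how likely the listener's score would be if they were simply guessing on every trial. The sketch below is a minimal illustration, assuming independent trials and a 50% chance of guessing each one correctly (a one-sided binomial test); the function name `abx_p_value` and the 16-trial example are hypothetical and not taken from any particular ABX tool, and the 0.05 cutoff mentioned afterwards is only a common convention.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value for an ABX run (hypothetical helper).

    Returns the probability of getting at least `correct` right out of
    `trials` by pure guessing, assuming each trial is independent and a
    guess is right with probability 0.5.
    """
    if trials < 0 or not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Count the outcomes with `correct` or more hits, out of 2**trials equally likely ones.
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials


if __name__ == "__main__":
    # Illustrative run: 12 correct identifications out of 16 trials.
    print(f"p = {abx_p_value(12, 16):.4f}")
```

With 16 trials, 12 correct works out to p of about 0.038, under the conventional 0.05 threshold, while 11 correct gives about 0.105 and would usually not be counted as a pass. Note that a non-significant run only means this listener did not demonstrate a difference in this session; it is not by itself proof that no difference exists.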

13mh13:

The graph on headphone.com shows a huge bass boost. Given these 'phones' bass tuner, I emailed HeadRoom about the tuner setting used for the graph. HR said the tuner was at its MIDDLE position. The tuner only ADDS bass, so the minimum position gives the flattest response. They are faves: they sound and measure very nicely. YMMV.

AstralStorm:

Yes, it's fairly widely known that the IE8s have the right amount of bass at the lowest (single-notch) setting. They still have a "warm" sound signature even at the lowest setting: a pronounced boost around 120 Hz (I'd guess about 3 dB).

Steve Guttenberg:

Guys, enough with the blind testing thing; we spent 3,000 words mostly talking about whether measurements predict sound quality preferences more reliably than subjective listening tests. No one addressed the question: what do measurements bring to the party that subjective evaluations miss? The wiggly lines on a graph must be subjectively evaluated, after all.

ultrabike:

Hopefully I won't get crucified for once in my life but here we go...

"What do measurements bring to the party that subjective evaluations miss?"

Subjective evaluations depend heavily on the person doing the evaluation. Objective evaluations remove that dependency to some extent. A subjective evaluation is very valuable when provided by an expert such as yourself and Tyll (for the wealth of information it provides). An objective evaluation removes the expert, and allows the buyer to insert him/herself in the evaluation, provided the buyer has some baseline or idea of what he/she wants...

Say you buy two cans: KSC75 and DT-990. You find the KSC75 more to your liking but want to upgrade. You feel the KSC75 may still be just a tad bright and would like more bass extension. Then you look at the plots and select a set of cans that may fit the bill based on their frequency response, and then read subjective evaluations to see what the experts think, given their wealth of experience.

That is what I have been trying to do, since I don't have access to some of the cans I'm considering. If I had access to all of the cans, then maybe objective evaluations would just be fun, confirming what I already hear... I personally don't have access to even an HD600... Best Buy doesn't carry it.

purrin:

Measurements ... are just another tool.

Personally I would rely on them more than subjective listening tests, especially if the subjective listening is from untrained or biased ears. Measurements bring objectivity and precision where all listeners (with trained ears, untrained ears, biased, treble sensitive, etc.) can calibrate their own tastes and experience accordingly.

HD800: http://www.changstar.com/index.php?action=dlattach;topic=49.0;attach=934...

Grado RS1: http://www.changstar.com/index.php?action=dlattach;topic=44.0;attach=916...

Modded T50RP "Paradox": http://www.changstar.com/index.php?action=dlattach;topic=41.0;attach=926...

Even if you didn't know which headphone the graphs represented, guess which one of the above has the most aggressive and raw sounding midrange?

Which one has the most relaxed and speaker-like presentation?

And which has a bit of treble grain with a slight bite in the upper mid/lower treble region?

What it comes down to is communication and speaking the same language. Simplistic language such as "better", "more accurate", "more bassy", or "more fun" is inadequate for the more serious hobbyists.

Username:

This was a good read. I would like to see a headphone review where you guys team up and share what you think about a headphone: Steve's golden ears against Tyll's measurements. It could be a good bias-control measure and a fun read. Or you could each pick a headphone at a common price and run an array of tests to see which is better for a specific scenario (like travel use or something).

PS Tyll you could rake in some more page hits if you split this into three pages as the second page is a little long.

Phos:

Immaculate Baking Company Ginger Spice?

I think an interesting test would be a double dissociation of the effects of sighted listening on perception. It's hard to detect differences between amps under blind conditions, so would amps be heard differently if they looked different but were the same? It would be an AB test (I don't think this falls under ABX) with two amps in different enclosures and a switch that might or might not actually be configured to switch between the amps; the listener has to tell whether it is.

As for headphones, measurements seem to me to make it a lot easier to do what Tyll suggested: determining where you stand vs. neutrality. I didn't realize that the A900s were mid-centered phones for quite a while, since that ran somewhat contrary to AT's marketing.

Tyll Hertsens:

Nah. A friend's wife, who's a hell of a baker.

Phos:

I suspect Sennheiser's lack of a house sound might have something to do with the limitations of the technology they have (or had); the voicing is going to be determined by what the type of driver they're working with is capable of. They seem to have changed their sound to follow what their drivers do best. You can measure a phone very closely, but how to make a given change to its sound isn't always clear.

Mkubota1:

Can't tell the difference between coffee or tea? May I ask what you're drinking? To become a Master of Wine (MW), I'm afraid you won't get away with blowing off blind testing.

Other than that, it was a fantastic read and excellent points were made by both. But I have yet to figure SG out. I used to enjoy reading his column but have gotten turned off by one too many "music that sounds like crap" comments. That's just too "hey kids, get off of my lawn!" for me, and awfully judgemental for a subjectivist. ;-)

TH, I love the way you listen to music and we are exactly alike in that respect. Keep up the excellent work!

Long time listener:

Measurements DO help in finding the headphone you want, but it takes time and experience (and money spent) to learn to use them, and to correlate them with what you are hearing. (Thanks for providing them Tyll!)

Tyll has often mentioned, for example, that sharp initial overshoots in treble square waves are a sign of harsh-sounding highs, and found them in both the Sennheiser HD25-II and the Audio-Technica M-50. After listening to both, I found it was true. Neither pleased me when listening to orchestral strings--too harsh. When searching for the ideal all-rounder for use with my HifiMan HM-801, I ended up rejecting both, since I listen mostly to classical and electronic music.

Now when I look for a possible headphone candidate, I look for treble that is highly damped and does not overshoot--like the AKG 518LE, or some of the new Philips headphones. I haven't heard the latter, but will audition them based on the measurements.

Surprisingly, FYI, the AKG 518LE has fantastic synergy with the HM-801--the ideal all-rounder for it. Deep growling bass, silky strings, rich, creamy woodwinds. They're not just for Hip-Hop, Tyll. They work just as well for Arnold Bax symphonies (on Chandos, with Handley conducting) as they do for Zombi's Escape Velocity, or Atticus Ross' soundtrack to The Book of Eli. Deep, dark, luscious, beautiful. Give them a try. Their only drawback is that they exhibit more variation from one to another than I've ever heard with other phones. If you take off the earpads, you can even see visual differences in the size of the apertures between the left and right sides sometimes, which affects the sound. But they're cheap. Buy two and keep the one that sounds best.

I also know from the graphs that I won't like anything where the frequency response is too high in the 2-4 kHz region. It will sound too "hot" and aggressive to me, so I don't even bother to audition those phones.

Cheers

Steve Guttenberg:

So what do measurements provide that my ears can't detect? If the problem is so small I can't hear it, then it wouldn't show up in a blind test, would it? We're back to where we started: the truly worrisome measurements merely provide visual proof of what you can hear with your ears. If you can't, what's the problem?

kmj2587:

The problem is that different subjective reviewers will say different and sometimes contrasting things about the same piece of equipment. Confirmation bias, expectation bias, and the inability to consciously process 100% of the sensory information we receive moment to moment are inescapable, and all three lead to the problem of conflicting reviews. When a reviewer is listening for some characteristic or another, their brain is going to pick up the relevant information and discard much of the rest. Ultimately it comes down to the scientific principle of results needing to be reproducible. When subjective reviews contradict one another, they are showing that their results are not reproducible.

Machines also eliminate much of the variability surrounding a listener review. Machines don't care what they were listening to previously. Machines don't get listening fatigue. Machines don't have other things on their minds when they're measuring a piece of equipment. Finally, machines don't have financial incentives to soften bad reviews or play up good reviews based on the preferences of advertisers. There can always be problems with the design of a testing system, but anyone worth their salt is going to provide enough information about their setup that shortcomings can be found and pointed out.

AstralStorm:

That's why I've personally validated a few good reviewers' opinions.
As long as the reviewer in question is consistent, it's all good for me.

Both ClieOS and Tyll have proven to be consistent and good at detecting flaws that were also annoying to me.
(However, the magnitude of annoyance differed: ClieOS's ratings tend to have a lower "dynamic range".)

purrin:

^ This.

It's all about COMMUNICATION! Short of actual experience with a product, it's going to come down to communication from one audiophile to another, or a reviewer to an audiophile. Measurements are a precise way to do this.

For example, we can say that the HD800, PS1000, and T1 can sound bright. Only (the right set of) measurements can precisely delineate how, or in what different ways, they sound bright.

JA @ Stereophile, who I feel is one of the best reviewers around, is very effective at communicating what a loudspeaker sounds like, using FR, CSD, off-axis, step response, 1kHz RTA, and other measurements to supplement subjective listening observations.

With enough experience and exposure, a reader can correlate those measurements with something tangible. The correlation of those measurements with subjective experience isn't always perfect, but it is still very high, especially with transducers, which have inaccuracies several orders of magnitude higher than electronics.

Since the context of the article is the "audiophile" experience, let's stop mickey-mousing around. Imagine Car & Driver saying X car feels faster than Y car without providing basic measurements. Even though acceleration, braking, and skidpad numbers don't tell the entire story, if Car & Driver didn't provide at least some basic measurements, no one would really take it seriously.

Finally, there are three great misunderstandings that I see in these discussions: 1) the assumption that people always prefer a neutral FR (those who don't like neutrality in the first place consider measurements useless, so they never bother to understand the characteristics they do like); 2) the subjectivist vs. objectivist arguments, which have very little to do with the topic at hand; and 3) the misconception that a good set of measurements always, 100% of the time, equates to good sound.

I think what it comes down to is that some people are threatened by measurements, since measurements provide an ideal or reference. And if our tastes differ from the ideal, we feel somehow defective or abnormal, which is of course completely ridiculous. If we take the time to study measurements and what they represent, then we will eventually be able to find patterns (regular or irregular) that are agreeable or incompatible with our own tastes.

Jazz Casual:

The car analogy is not a convincing one. A car's travelling speed and braking distance can be measured, and the results universally accepted. Can the same be said for headphone measurements?

We need to be able to reliably measure and monitor the speed that a car is travelling at, in order to comply with the rules of the road. We don't need to know the frequency response of a headphone in order to listen, appreciate, enjoy and describe the music that it is reproducing. Is there really a need to measure a headphone's performance beyond a curious interest in doing so?

I think that subjectivist vs objectivist arguments have everything to do with the topic at hand. What is the science of measuring headphones predicated upon? Is it based upon a universally accepted proof or standard? Or is there debate surrounding its currency? Is it likely to be reviewed and revised in the near future?

The term "neutral" is a rubbery one that is often used by geeks and reviewers alike, to describe headphones that measure well but sound very different. How can there be so many different degrees and flavours of neutral?

A healthy scepticism of the accuracy and usefulness of headphone measurements should not be misconstrued by the measurement geeks as fear and ignorance.

purrin:

Accuracy is an issue. Precision is not. As long as you work within a system, such as Tyll's, a lot of information can be arrived at.

The point of headphone measurements (at this time) is not to determine or argue about absolutes (what is neutral, what is bright), but compare relatives. With Tyll's graphs, this can most certainly be done to a surprising degree of correlation to the subjective experience.

Try to explain in words how you perceive the treble of the T1, HD800, and PS1000 to differ (I've owned the latter two and have heard the T1), then look at Tyll's graphs. You will see specific patterns that pertain to them. Measurements are only useless if you do not know how to read them. Again, this takes practice.

Jazz Casual

Inaccuracy is more than an issue if you're in the business of taking measurements, it is a fundamental problem. It indicates that the measuring methodology lacks rigour and the data it yields is unreliable. This could be described as bad science.

The purpose of measuring headphones as you describe it sounds imprecise and open to broad, subjective interpretation; not unlike comparing subjective listening impressions. ; )

The headphone forums are testament to people's ability to convey what they hear in words. Some are more adept at doing this than others, but there is no requirement for anyone to do so precisely or even accurately. We should know that they are no substitute for our own listening experience, and nor are headphone measurements for that matter.

What is the value in learning how to precisely interpret inaccurate headphone measurements? How do we know if they are accurate or not? Are they subject to stringent peer review and verification against an agreed standard?

Jazz Casual

This is a timely article, given that a couple of days ago I revisited the T1 and HD800, two headphones that come much closer to the ideal frequency response for a headphone than my own PS1000 does. The measurements only accord with how I hear the PS1000 compared to the T1 and HD800 in one area: the midrange. To my ears, both the T1 and HD800 have more midrange presence than the PS1000. However, the measurements don't explain why the PS1000's treble presentation sounds smoother and is less fatiguing despite its treble peak being greater than that of the T1 and HD800. Nor do they explain why the PS1000 manages to dig just as deep and reach very low acoustic and electronic bass notes with the same authority as the T1 and HD800. Most importantly (for me, anyway), the measurements shed no light whatsoever on the intangibles that make the PS1000 a far more pleasurable listening experience: the passion it conveys, its crystalline presentation, and its unique ability to transmit what sounds to me like the essence of the recording without robbing it of any of its vitality, as the other two phones can, particularly the HD800. So I'm with SG on this one. : )

Alondite

In the grand scheme of things, subjectivity is only valuable to the individual. There are some 7 billion or so people in the world, and thus 7 billion different opinions.

Subjectivity has no place in the realm of scientific research and testing. In fact, the very presence of subjectivity in research will essentially invalidate all of the findings. Objective headphone measurements reflect the true sound of the headphones, free of any subjectivity. You can't argue with objective results. If there is some discrepancy between what is seen in the measurements and what is heard, then either the measurement procedure is potentially flawed or there aren't enough different measurements to tell the whole story.

I don't personally care about the subjective views of others concerning sound. If somebody tells me that the ER4s have "anemic" bass I'm basically going to ignore them because the measurements tell me that the bass is balanced within the rest of the spectrum.

I think people are mistaking subjectivity for personal preference. They are not the same. Subjective views are simply characteristics of an object that exist within the mind, not whether they appeal to you or not. Objective views are the true, measurable characteristics of an object.

Everybody hears the objective qualities of sound. Subjectivity then comes from a reference established within the mind, and any variation in what is heard based on anatomical/perceptive differences between people.

For example, my reference for bass quantity is the Vsonic GR07. They have what I would consider to be neutral, balanced bass. That is my subjective view, based on how I hear the music and on comparisons with other headphones I've heard (they might measure that way as well; I don't know yet, because I haven't seen any measurements). I consider the Audio Technica AD900 to be light in bass compared to my reference. However, had my reference been the HiFiMan RE0, I might think differently. In addition, I'm sure there are people out there who think that the GR07 are light in bass, and others who think they are bassy. Ultimately, the only definite answer is the one reflected by the measurements.

Anyway, I've rambled on without making a solid point, so I'll try to do that now:

Subjective views don't mean anything to anyone but the one who holds them and have no place when discussing sound quality unless in reference to something else. Without a reference, subjective views are arbitrary and meaningless.

AstralStorm

The VSonic GR07 wouldn't be my reference for bass quality, though, only for quantity. I haven't found a good enough bass-quality reference yet.

The quantity is just right, but the bass is weighted toward the upper bass (a 200-400 Hz boost of 4.5-8 dB, depending on tips).
This gives them a slightly "boxy" sound, not unpleasant but not what I'd call neutral*.

To my ears they also have a 3-6 dB boost around 6 kHz, the annoying range that is also used for "presence" boost, so I get double-boosted sibilant vocals. The HifiMan RE272, for example, have a similar boost, but placed lower, around 4.5-5 kHz, which is much less annoying.

(*Neutral defined as "reproduces all/most frequencies at the same loudness to my ears".)

Jazz Casual

What you should be saying, Alondite, is that other people's subjective views don't mean anything to you. I don't think that I am alone in valuing some people's listening impressions and relating them to my own subjective experience, regardless of whether I share their personal preferences or not. You might like the taste of grapefruit and I might not, but I trust we can agree that it tastes sour. Similarly, I know a few head-fi'ers whose descriptions of how headphones sound very much align with my own, but that doesn't necessarily mean that we share the same headphone preferences. A person who dislikes treble-happy, mid-centric or bassy headphones can be just as capable of perceiving those sound traits and describing them reasonably accurately as someone who actively seeks them out. I have found subjective listening impressions from ears that I trust more useful than objective measurements for the purpose of comparing and evaluating headphones, both before and after I've heard them for myself.

mikeaj

I see a lot of audio gear reviews mention which music was played, what amps were used, and so on.

However, there's an even more important aspect that usually goes unreported: the listening volume! Not only does the performance of gear change at different volumes, but human perception to the sound will vary a lot too, based on the volume.

There are a lot of confounding variables to consider when evaluating audio reviews, but this is one that is too often forgotten.

kmj2587's more general comments about reproducible results are spot on.

Sometimes I think that the source of information (measurements vs. human listening) gets so much of the focus that it becomes contentious. What we want is -good- data, wherever it came from. Then we can hopefully draw the correct conclusions based on that data. Unfortunately, acoustic measurement systems often aren't blameless (though most are good enough to yield some valuable insights), and a lot of people misinterpret measurement data. On the flip side, a large amount of listening impressions produce bad data, because of a wide variety of biasing influences and factors, as previously discussed by others.

tinyman392

I agree that objective looks at headphones do explain a lot of subjective ideas, and can confirm what people hear (or why they hear it that way), and weed out opinions that may not be up to snuff.

Objective data cannot ever tell that a headphone will have a given property. What is punchy in the objective world? What is textured bass in the objective world? You can give me an example of what it is, but there is no definitive answer as I'll always be able to find a counter-example that doesn't fit what you say it would graphically. For example, if you say treble sparkle/splash is caused by an extension in the higher frequencies (like in the PFE232, Brainwavz B2, etc.), I'll come back and mention the Etymotic graph that shows the treble rolling off... If you want to say texture is the higher focus on the sub-bass, I'll point you straight to the Aurisonics ASG-1 graph that shows it well recessed. I can go all day with just about every single property (good or bad) that will contradict what you'd expect it to be.

In this day and age, I don't see it being used much, though. Many of the words we use to describe a pair of headphones really can be translated to the graph. However, many of the graphs cannot be translated back into the words we describe our headphones with. It isn't a two-way street... yet. We are still learning all of this stuff, and it'll take some time, but with it we can learn why something has texture, punch, detailing, harshness, etc.

I truly think that we need both kinds of results in order to really get a good grasp on what something sounds like. Objective data alone is not enough, and subjective data has subjectivity built into it. However, put them together, and when one confirms the other, you have a really big picture of how something is going to sound. But remember: subjective can be confirmed by objective. However, until you know definitively what punch looks like (in ALL instances), objective cannot be confirmed by subjective (you can't have something confirmed if you don't have a full grasp of what it is, precisely).

AstralStorm

I can't find the graph you're talking about. Please provide an actual example, otherwise you come across as a troll.
ASG-1 has been described all ways by everyone and their dog. I suspect it has a peaky frequency response, with multiple peaks, like, say, Ultrasone PRO 650 which I've tried and didn't like due to various reasons. (This chart: http://www.innerfidelity.com/images/UltrasonePRO650.pdf)

"For example, if you say treble sparkle/splash is caused by an extension in the higher frequencies"
You don't. Bandwidth is not directly related to tonal balance and vice versa.

Brightness/darkness is directly defined by skew of the graph (tangent of graph, filtered). Major peaks can throw this measure for a loop, as it's typically based on straight average and should be a weighted average instead. (but nobody has the set of weights yet)
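
The "skew" idea above can be made concrete as a tilt number. A minimal sketch, assuming skew means the slope of a straight line fitted to the frequency response on a log-frequency axis (so it reads in dB per octave); the function name and the optional weights are hypothetical, and since, as noted, nobody has an agreed set of weights, the default is the plain (uniform) average the comment warns about.

```python
import math


def tilt_db_per_octave(freqs_hz, levels_db, weights=None):
    """Fit a straight line to a frequency response and return its slope in dB/octave.

    Frequencies are placed on a log2 axis, so the slope reads directly as
    dB per octave: positive means the response rises toward the treble
    (brighter), negative means it falls (darker).

    weights: optional per-point weights (hypothetical; no agreed set exists).
    The default is uniform weighting, i.e. a plain average that a single
    large peak can throw off.
    """
    n = len(freqs_hz)
    if n < 2 or len(levels_db) != n:
        raise ValueError("need at least two matching frequency/level points")
    if weights is None:
        weights = [1.0] * n
    x = [math.log2(f) for f in freqs_hz]
    w_total = sum(weights)
    x_mean = sum(w * xi for w, xi in zip(weights, x)) / w_total
    y_mean = sum(w * yi for w, yi in zip(weights, levels_db)) / w_total
    # Weighted least-squares slope: weighted covariance over weighted variance.
    cov = sum(w * (xi - x_mean) * (yi - y_mean)
              for w, xi, yi in zip(weights, x, levels_db))
    var = sum(w * (xi - x_mean) ** 2 for w, xi in zip(weights, x))
    return cov / var


# Flat response except for one large peak at the top of the band.
freqs = [100, 200, 400, 800, 1600]
peaky = [0.0, 0.0, 0.0, 0.0, 10.0]
print(tilt_db_per_octave(freqs, peaky))                   # about 2 dB/octave: the peak alone reads as "bright"
print(tilt_db_per_octave(freqs, peaky, [1, 1, 1, 1, 0]))  # 0.0 once the peak is given no weight
```

The two calls illustrate the comment's caveat: with a straight average, one peak tilts the whole curve, while a weighting scheme changes the verdict entirely, which is why the choice of weights matters.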

Etymotics are bright mostly because of the magnitude of the ~2.5k peak, which is pretty high, and partly because of a lower 7k peak. Other than this nasty issue they're nicely balanced, if a bit limited in bandwidth, but that is an issue for most IEMs (they end at 17.5 kHz).

There's also fast/slow: this is typically due first to a drooping step response, and then to ringing, which is best visible in the impulse response (but also noticeable in the step response).
So, e.g. Etymotic ER4PT are overly fast.
Huge overshoot can also give the feeling of brightness if it happens at high frequencies.

Bass control/bass bleed is directly related to the step response at low frequencies, which also affects THD there. Long ringing and high-magnitude ringing are bad, as is drooping. There seems to be no consensus on which one is more important.

Phase chart is important in checking how the device will react to bad sources (high output impedance, low current).
This is the first thing that has to be checked: whether the source you use for matching is compromised by high output impedance or capacitor coupling. Otherwise any description you make will not be translatable to anything (unless you happen to measure using the same source).

Care to actually provide something that's not mappable?

acs

Something important has not been mentioned. Perhaps it is so obvious that no one has said it, but I think that it should be put on the table.

If your goal is to accurately reproduce what is on the recording, using objective measurements when selecting amplifiers and digital sources is a requirement. Amplifiers can be so transparent that the signal from the input can almost be nulled with the signal from the output, only adding a small amount of barely audible distortion. DACs can also be very accurate. Measurements can reveal these performance levels, whereas listening cannot.
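
The "nulling" described above is what is usually called a null test: scale the input to match the output, subtract, and see how much is left. A minimal sketch of that arithmetic, assuming the two captures are already sample-aligned and that a single broadband gain is the only correction (real null tests also have to handle delay and clock drift); the function names and the test signal are hypothetical.

```python
import math


def _rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def null_depth_db(inp, out):
    """Residual left after nulling a device's output against its input, in dB re the output.

    Assumes both captures are time-aligned and the same length, and matches
    only one broadband gain (no delay, phase or EQ correction). The more
    negative the result, the deeper the null and the more transparent the device.
    """
    if not inp or len(inp) != len(out):
        raise ValueError("captures must be non-empty and equal length")
    # Least-squares gain that best maps the input onto the output.
    gain = sum(x * y for x, y in zip(inp, out)) / sum(x * x for x in inp)
    residual = [y - gain * x for x, y in zip(inp, out)]
    res = _rms(residual)
    if res == 0.0:
        return float("-inf")  # perfect null: output is an exact scaled copy
    return 20 * math.log10(res / _rms(out))


# Hypothetical amplifier: gain of 10 plus a 3rd harmonic at 0.1% of the fundamental.
n = 1000
t = [2 * math.pi * k / n for k in range(n)]
inp = [math.sin(x) for x in t]
out = [10 * math.sin(x) + 0.01 * math.sin(3 * x) for x in t]
print(round(null_depth_db(inp, out), 1))  # about -60 dB
```

Everything the amplifier adds beyond a clean gain shows up in the residual, which is why a deep null is strong evidence of transparency that casual listening cannot provide.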

Speakers and headphones cannot reproduce the incoming signal anywhere near as accurately as an amplifier or a DAC. There is no speaker or headphone that could come close to nulling the signal from input to output; this is inherent in the way that they work and the function they serve. This is where subjectivity takes a larger role. Deciding what is important to you in a sound signature, based on the recordings you listen to and your personal preferences, is much more important in selecting headphones and speakers than in selecting amplifiers and sources.

Imagine listening to an expensive amplifier that sounds good with HD650s, so you buy it. Then you decide to try some Beyer T5Ps and don't like the sound. You think, geez, this amp sounded so good with the 650s, and now it just doesn't work the same with the T5P. Well, you didn't look at measurements, and it has a high output impedance, which can shift the frequency response of lower-impedance headphones like the T5P but might not affect the 650. You have made an expensive purchase of an amplifier that only works with high-impedance headphones, but you couldn't have listened for it, because you didn't have the headphones you were going to buy, and even if you did, you might have thought it was "synergy". Measurements and knowledge would have revealed the problem in the first place and saved you the trouble of getting on the merry-go-round.
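
The mechanism behind this is a voltage divider: the amplifier's output impedance and the headphone's impedance split the signal, and because a headphone's impedance usually varies with frequency, so does the split. A minimal sketch, assuming purely resistive impedances (phase ignored) and using made-up impedance curves for a generic low-impedance and high-impedance headphone; these are not measurements of the HD650, the T5P, or any real model, and the function names are hypothetical.

```python
import math


def level_db(z_load_ohms, z_out_ohms):
    """Level at the headphone relative to a zero-ohm source: divider Z_load / (Z_load + Z_out)."""
    return 20 * math.log10(z_load_ohms / (z_load_ohms + z_out_ohms))


def response_deviation_db(impedance_curve, z_out_ohms):
    """Peak-to-peak frequency-response change caused by the source's output impedance.

    impedance_curve maps frequency (Hz) to impedance magnitude (ohms).
    Simplification: impedances are treated as resistive magnitudes, so the
    phase of the headphone's impedance is ignored.
    """
    levels = [level_db(z, z_out_ohms) for z in impedance_curve.values()]
    return max(levels) - min(levels)


# Hypothetical impedance curves, each with a bass-resonance peak (not measured data).
low_z = {100: 60.0, 1000: 32.0, 10000: 35.0}
high_z = {100: 500.0, 1000: 300.0, 10000: 310.0}

amp_z_out = 30.0  # hypothetical high-output-impedance amp, in ohms
print(round(response_deviation_db(low_z, amp_z_out), 2))   # about 2.2 dB (bass lifted relative to mids)
print(round(response_deviation_db(high_z, amp_z_out), 2))  # about 0.3 dB
```

With the same amp, the low-impedance example picks up a bass lift of roughly 2 dB while the high-impedance one changes by only about a third of a dB, which is the kind of difference that could be mistaken for "synergy" in a listening session.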

It is very foolish to think that measurements are not a useful tool in selecting audio components. Guttenberg's insistence that they are just squiggly lines is like an illiterate person saying that words are just squiggly lines. Just because you haven't taken the time to understand what measurements mean doesn't negate their meaning.

-Aaron.

jadeeast

Let's pretend that I like cookies. In fact, I'm obsessed with them.

I read cookie blogs and magazines. I've studied the FMRI scans of people's brains eating cookies. I've been an active member of cookie-fi since it started. I read all the peer-reviewed cookie studies. I've studied the culture and history of cookies. I've interviewed 10,000 cookie eaters and watched hours of footage of cookie eaters on your-tube. I study the brain chemistry of cookies. I've studied homemade and industrial cookie processes. I subscribe to Cookie-o-phile magazine. I collect cookie jars. I have a degree in food chemistry. I have charts describing the weight, size, density, and colour of 1,000 kinds of cookies. I've read over 2,000 reviews of cookies from experts and lay people. I've read the double- and triple-blind studies of cookies going back to 1927.

I've made it my life's mission to know everything about cookies. I know all there is to know objectively about cookies, including other people's subjective experience of tasting them, but I've never touched or eaten one myself.

What knowledge would I gain by eating a cookie?

AstralStorm

Not much, but it'd still be tasty. (assuming it fits your criteria of good tasting cookie)

Jazz Casual

Not necessarily tasty but it would have flavour and texture, which could be observed and described without making a value judgement about whether it tastes good or not. ; )

Alondite

@AstralStorm

That was basically what I was saying. I've never heard something that I've seen measured as having "ruler flat" bass, so the GR07 is my reference for now. I would imagine that the GR07s measure fairly well in the bass though.

And that bump at ~6kHz seems to flatten out over time. I did a blind A/B test with a brand new pair and a pair with ~300 hours of use and the new pair was very noticeably more sibilant. Actually they seem to change in a number of ways, but that was the most obvious.

I'm still convinced that they would knock the Phonaks off Tyll's Wall of Fame if he ever got to hear a broken-in pair.

AstralStorm

Nah, it doesn't flatten at all - you might've got used to it.

I'm actually interested in this burn-in thing that hasn't been captured yet.
I've tried to keep tabs on it by re-equalizing said IEM to sound flat and checking the variances. The difference in the 6k region between corrections over time was relatively insignificant (within 1 dB in level and 100 Hz in peak position).
I can directly use the first EQ of these with great results.

In comparison, switching eartips produced variances in the 6k region of ~3 dB in level and ~500 Hz in peak position (silicone tips vs. new Comply T140s),
and ~4.5 dB in the 200-400 Hz bass region (same tips).
All the tips sealed very well.

Comply tips do wear out over time, giving even less bass and a slightly stronger high-frequency resonance. The effect over the month was ~1.5 dB; Comply tips don't last much longer than this.

I can keep more accurate data if you wish, but it's time-consuming, and it's a sample of one IEM. I'd need at least a few to determine manufacturing variability.

marksig

I was going to write a lengthy treatise on human psychology and how too many audiophiles don't know what they're doing and ruin the audio industry for everyone. I have opted for a shorter comment to save everyone time:

I find most objective-vs-subjective discussions beside the point and grossly misguided. I'm not going to write about which side is right or wrong.

The actual problem is that "HiFi" back in the first days implied measurements and technical excellence in circuit design to bring the lowest distortion, the most linear frequency response, etc. at the lowest cost to the table. Just like in any other engineering field. Today, "HiFi" is almost a synonym for "too expensive for you dirty plebs".

A weird anti-science attitude spares companies the hassle of proving (in whatever way) that their $1000 headphone amp and $2000/box stereo speaker system perform better than the $75 O2 and €190/box Nubert nuBox 381 (sighted listening tests a.k.a. "reviews" on Head-Fi are useless). There is no accountability. More expensive must mean better because surely companies are abstract entities that do not lie unless I think they do. Vintage is back and better than run-of-the-mill-solid-state-stuff available today because Head-Fi says so. The result: Ridiculously expensive products of questionable quality on the one side and cheap Logitech/Bose/Skullcandy/Beats/kitchen radio garbage on the other. A few sensible, well-engineered and underappreciated products are somewhere in the middle. Guess what the masses are going to buy? Producers must target these crappy transducers to make their music sound good there to sell stuff. Go figure.

The audio industry is making a fool out of itself. And budding audiophiles are made to believe the lies and waste a lot of money. It is perverse. In the end, everyone loses.

I can't help but think of quacks. They advertised their useless services on the market and people believed them because they didn't know better or wanted to believe the quacks were better than "real" medical doctors back then.

Keep in mind that e.g. liking tube sound and such is perfectly okay. But please keep the deliberate distortion on the transducer level or in effect boxes so you can switch the tube cruft off depending on your mood.

Craigster75

After reading this article, I am glad to see I am not alone regarding my desire for neutral, transparent sound, yet with bass emphasis in terms of presence and punch. I realize it is an oxymoron to call myself an audiophile bass head, but I think this is how I would best describe modern music listening on the go. I am old enough to recall cleaning my Tom Petty "Damn the Torpedoes" album with Discwasher on my Technics turntable before listening on my Sennheisers with earcups made of yellow Nerf material, but young enough to appreciate a solid bass beat from Nicki Minaj to help motivate me during a workout. I have been wracking my brain trying to find headphones under $500 that will marry killer bass with crystal clear, accurate mids and highs. If the Sennheiser HD800 is the reference for mids and highs and the Beats PRO is the reference for bass, what are my best options in a portable headphone that could most closely meet my needs?

Below are a few I am considering based on research, but I have not listened to them. It is frustrating how few retailers have these available to demo, so I would be relying on online retailers that accept returns, but don't want to make a habit of returning headphones.

Ultrasone Pro 900
Beyerdynamic DT770
Aiaiai TMA-1
Phiaton MS 400
Philips Fidelio L1
AKG K550
Sony MDR-MA900
PSB M4U2
Sennheiser Amperior (not yet available)
V Moda M100 (not yet available)
Ultrasone Edition 8 (if I hit the lottery)

I would welcome suggestions/feedback/recommendations, thank you.
