One Enthusiast's Take on Top-of-the-Line Headphones: The State of Flagships

In a recent reddit post here, SanjiWatsuki has unleashed his impression of current flagship headphones from various manufacturers, grading them on the basis of measurements and his interpretation of them. I found his methods and evaluations laudable—even though I would add other factors and weight them a bit differently. If you want to read along, you'll find the document here: "The State of Flagships". My comments follow...

Page 2 - "Most of them suck at designing flagships."
Ha ha ha! Pretty funny...I've been trying to figure out how to say the same thing recently.

Page 4 - Flagship Characteristics
I think these are pretty good benchmarks for headphones, with the exception of #9. After hearing quite a few good sealed headphones lately (NAD VISO HP50, Focal Spirit Professional, Sennheiser Momentum), I'm beginning to think it just might be possible to make a sealed headphone that is world class. Personally, I'd much rather EQ for frequency response on a Focal Spirit Professional or VISO HP50 than any of the headphones mentioned in the article.

Page 12 - Why This Metric? #7
Wiggles in the impedance response between 300Hz and 1kHz can be caused by a rocking voice coil, I suppose, but I tend to think of impedance blips in the range between 300Hz and 3kHz as coming from not dealing properly with the small volumes trapped, or nearly so, around the driver and diaphragm. These are usually dealt with by using various damped ports in and around the driver. I do agree, however, that a bunch of small wiggles in the impedance response can be an indicator of poor design. On the other hand, methods used by designers trying to flatten the diaphragm response will often show up as bumps in impedance, and those bumps indicate something that is actually flattening the FR.

Page 14 - Why This Metric? #7
Here again is the statement that open headphones are inherently superior. One thing that's fairly easy to see is that sealed headphones are better able to control what's going on in the bass. It seems to me that open cans have a harder time keeping the air compressed around the ear for the long bass excursions, and as a result, sealed headphones often outperform open headphones in bass distortion.

I've also heard some sealed cans lately that are particularly good at treble resolution (HP50, B&W P7). These cans do deliver a sense of space similar to that of an open can. It also seems to me that some of the sense of space from open cans might be that you remain aware of some outside sounds, which may tend to create a sense of space that's not actually coming from the audio, but rather your ongoing acoustic sense of the room you're in. I'll admit the open cans are more likely to deliver good imaging and soundstage, but I'm beginning to think it's not a foregone conclusion.

No mention of time domain signal analysis.
In his reddit postings he claims, "Square waves don't tell you much at all, especially not more than the frequency response. Making judgments based on the square wave is a poor decision, in my opinion. Tyll talks about trying to determining damping of the signal -- frankly, I'm not entire sure I buy it. I've never seen a study at AES come to his conclusion, but I'll accept it once someone else who has done more scientific rigor has said so. Edit: It's also worth noting that the tizziness appears whenever there is a lot of treble, particularly if it is a spiky treble."

It's true that the square wave is largely just another way to look at frequency response, but it can also tell you some things about the phase response. The 30Hz square wave is particularly good for observing bass phase: if the waveform rapidly drops to the zero line and stays there, you have no bass. But if it undershoots the zero line substantially, it means there's bass there, but it's gone out of phase.
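A rough way to see both behaviors is to run an ideal 30Hz square wave through two stand-in high-pass filters. The sketch below is an illustration only: the filter types, the 200Hz and 60Hz corner frequencies, and the 48kHz sample rate are assumptions chosen for the demo, not measurements of any headphone. The first-order roll-off placed well above 30Hz makes the waveform collapse to the zero line after each edge; the second-order roll-off swings below it.

```python
import math

SAMPLE_RATE = 48000  # assumed sample rate, Hz


def square_wave(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Ideal +/-1 square wave that starts on a rising edge."""
    period = sample_rate / freq_hz
    return [1.0 if (n % period) < period / 2 else -1.0 for n in range(n_samples)]


def first_order_highpass(x, cutoff_hz, sample_rate=SAMPLE_RATE):
    """RC-style high-pass: 6 dB/octave roll-off below the cutoff."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    y, prev_x, prev_y = [], 0.0, 0.0
    for s in x:
        prev_y = a * (prev_y + s - prev_x)
        prev_x = s
        y.append(prev_y)
    return y


def second_order_highpass(x, cutoff_hz, q=0.7071, sample_rate=SAMPLE_RATE):
    """Biquad high-pass (Audio EQ Cookbook form): 12 dB/octave roll-off."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b0 = (1.0 + cos_w0) / 2.0
    b1 = -(1.0 + cos_w0)
    b2 = b0
    a0, a1, a2 = 1.0 + alpha, -2.0 * cos_w0, 1.0 - alpha
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        out = (b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y


def lowest_point_of_top_half(y, freq_hz, cycles, sample_rate=SAMPLE_RATE):
    """Minimum of the output during the positive half of the last cycle."""
    period = int(sample_rate / freq_hz)
    start = (cycles - 1) * period
    return min(y[start:start + period // 2])


CYCLES = 4
wave = square_wave(30.0, CYCLES * 1600)  # 1600 samples per 30Hz cycle at 48kHz
no_bass = first_order_highpass(wave, cutoff_hz=200.0)
shifted_bass = second_order_highpass(wave, cutoff_hz=60.0)

print("first-order, 200Hz corner:", lowest_point_of_top_half(no_bass, 30.0, CYCLES))
print("second-order, 60Hz corner:", lowest_point_of_top_half(shifted_bass, 30.0, CYCLES))
```

The first number sits at essentially zero (the top of the wave has drooped to the zero line), while the second is clearly negative (the top of the wave has crossed below it).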

Also, the reason why the treble goes "spiky" is because of resonant interactions, and this phenomenon is more directly observable in the impulse response, or better yet the cumulative spectral decay plots derived from the impulse response. Music is an extremely random signal; reproducing it requires a system that reacts well in the moment. I think time domain information is important.
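To make the link between impulse response and spectral decay concrete, here is a minimal sketch that follows a single frequency as the analysis start point moves later in time. Everything in it is assumed for illustration: the impulse response is a synthetic 6kHz resonance with a 1ms decay constant, the sample rate is 48kHz, and the window is a plain truncation. A real cumulative spectral decay plot does this across the whole spectrum with a shaped window.

```python
import cmath
import math

SAMPLE_RATE = 48000  # assumed sample rate, Hz


def damped_resonance(freq_hz, decay_s, n_samples, sample_rate=SAMPLE_RATE):
    """Synthetic impulse response: one exponentially decaying resonance."""
    tau = decay_s * sample_rate
    return [
        math.exp(-n / tau) * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def magnitude_at(samples, freq_hz, sample_rate=SAMPLE_RATE):
    """Magnitude of a single DFT-style bin evaluated at freq_hz."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    return abs(sum(s * cmath.exp(-1j * w * n) for n, s in enumerate(samples)))


def decay_slices_db(impulse_response, freq_hz, hop, n_slices):
    """Level at freq_hz, in dB relative to the first slice, for later and later start points."""
    mags = [magnitude_at(impulse_response[k * hop:], freq_hz) for k in range(n_slices)]
    return [20.0 * math.log10(m / mags[0]) for m in mags]


HOP = 24  # samples between slices (0.5 ms at 48kHz)
ir = damped_resonance(freq_hz=6000.0, decay_s=0.001, n_samples=480)
slices = decay_slices_db(ir, freq_hz=6000.0, hop=HOP, n_slices=6)

for k, level in enumerate(slices):
    print(f"{k * HOP / SAMPLE_RATE * 1000:.1f} ms: {level:.1f} dB")
```

A resonance that rings longer would show a shallower fall from slice to slice, which is the ridge one looks for in a decay plot.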

The 300Hz square wave responses at left seem to me to capture the nature of the treble response of the headphones shown in a more meaningful way. (The grade received in the paper is shown next to the model number.) The HD800 is truly a great headphone; its clean impulse gives it unparalleled imaging. SanjiWatsuki says in the HD700 section, "There's brightness near 6kHz, but nothing to suggest horrible resonance." I'd say the square wave response makes the resonance pretty clear. The K812 and the T1 have a very hard time reproducing clean instantaneous edges, which is a lot of what music is. And the K701 response is exactly why one needs more factors and additional metrics: this headphone should get a better grade against the flagships.

More Emphasis on Listening
I think analyses without significant listening to verify the audibility of measured features are an incomplete attempt at a true evaluation. I'd like to have seen a few more listening comments in this paper.

A Good Attempt Overall!
On the other hand, an evaluation based on measurements alone does provide a significant amount of information. I applaud SanjiWatsuki for this valiant attempt to grade headphones based on objective measurements. I found it very interesting, and no doubt many others will.

I do think the difficult thing here is identifying the actual relevant factors and how they add up. Also, knowing the target response in absolute terms is very, very difficult. Nonetheless, I think it's efforts like this one by SanjiWatsuki, acting as models for ongoing modification by the larger headphone enthusiast community, that provide an avenue forward for real learning in the hobby.

The document: "The State of Flagships"
Threads on reddit and Head-Fi.

yaluen's picture

Keeping your finger on the pulse of things even during the holidays :)

Should fix the Head-Fi link (

Tyll Hertsens's picture

Sorry, link fixt.

Jazz Casual's picture

SanjiWatsuki's blinkered approach is lamentable rather than laudable. To evaluate flagship or any other headphones on measurements alone is not only incomplete but misguided. Even Stereophile recognises the value in presenting subjective listening impressions and measurements of the audio gear it reviews, presumably with the objective of providing a balanced perspective for its readers. 

This post from another forum skewers the flaws inherent to SanjiWatsuki's rating methodology rather nicely: "Seemingly interesting but actually poor analysis. It is possible for headphone A to barely pass all categories and be rated A+, while headphone B passes 5 out of 9 with flying colors, barely misses on the other 4, and is rated an F. The cutoffs for each category are arbitrary, with no correlation to how sound is perceived, making this an example of why purely "objective" reviews are 1) subjective and 2) generally useless." 
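The flaw described in that quote is easy to reproduce with made-up numbers. In the sketch below the "margins" are hypothetical, normalized distances from each cutoff (positive means better than the cutoff); they are not taken from the presentation. Counting passes and averaging margins rank the same two headphones in opposite order.

```python
def passes(margins):
    """Count metrics whose margin clears the cutoff (margin >= 0)."""
    return sum(1 for m in margins if m >= 0)


def mean_margin(margins):
    """Average distance from the cutoffs, keeping the size of each hit or miss."""
    return sum(margins) / len(margins)


# Hypothetical margins in arbitrary normalized units.
headphone_a = [0.01] * 9                # barely passes all nine categories
headphone_b = [1.0] * 5 + [-0.01] * 4   # aces five, barely misses four

for name, margins in (("A", headphone_a), ("B", headphone_b)):
    print(name, "passes:", passes(margins), "mean margin:", round(mean_margin(margins), 3))
```

Averaging is only one possible alternative to pass/fail counting, and it inherits the same arbitrariness if the cutoffs themselves have no perceptual basis.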

SanjiWatsuki's picture

I fully admit that it is possible for a headphone to pass all the characteristics while being a poor headphone. Part of the reason that this was released was to vet the process. One of the headphones pointed out to score abnormally high was the B&W P5 -- which means that part of the process was a success.


I also fully admit that the cut-offs were completely arbitrary. The actual system was developed very hastily. Hence, I've open sourced the metrics on the Sound Science forum at Head-Fi. I literally chose those standards in about 15 minutes, so I don't hold any love for the particular numbers chosen.


That said, I believe that the metrics that I've chosen to analyze the headphones are good for what they were intended to do -- it was not an attempt to measure how good a headphone sounds, but instead an attempt to measure how well it was engineered. I explained my logic in the "Why this metric?" section of the presentation, and I haven't heard anything (yet!) to suggest that the analysis from that standpoint is wrong. Basically, I would expect that a flagship would be engineered to the highest standards, and I tried to choose metrics to reflect specific acoustic engineering difficulties.


I know the limits of the data that I've chosen, and I would NEVER suggest anyone to make a decision about which headphone to buy based purely on this document. Anyone that thinks that this was the intention is missing the point of the document.

samratpatil's picture

Appreciate this man! Thanks a lot. I can understand there are flaws, also that you decided the process in 15 minutes and also that the results might probably be unfair to some cans. But this is definitely a step in the right direction. I was actually looking for something like this for a very long time. Being on the lookout for headphones I realized (and was shocked) there aren't any methods to objectively review these precision engineered items that we splurge thousands on (only by reading some lofty adjectives). I think the adjectives have their place, but what you've started is necessary. I wish you the best in refining the process and parameters. Hope this becomes a standard and a formal reference guide. Looking forward to your updates.

Alondite's picture

The problem with subjective impression is that it's subject to bias and individual perception; you can give the exact same headphone to 10 different people and get 10 different impressions. There's a reason why all scholarly research is done using only objective evidence. 

Your second paragraph is pretty spot-on, except for this part: 

"The cutoffs for each category are arbitrary, with no correlation to how sound is perceived, making this an example of why purely "objective" reviews are 1) subjective and 2) generally useless."

How it sounds is going to be entirely individual, and thus equally meaningless as arbitrary cutoffs. 

The effective solution would be to research the minimum difference, that is, the scientifically achieved reference point at which differences in distortion, volume, etc., can be perceived. These are going to vary among people, but I'd almost guarantee that there is a statistically significant value, if the research has been done, that is.

At that point, you'd give headphones a grade in each category based upon standard deviations of the minimum difference, and assign an overall grade. 
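As a loose sketch of that idea, the snippet below expresses each deviation as a multiple of an assumed minimum perceivable difference and maps it to a grade. It simplifies the proposal (plain multiples rather than standard deviations), and every number in it, including the threshold values and the grade bands, is a placeholder for illustration rather than a research result.

```python
def jnd_units(measured, target, jnd):
    """Deviation from target expressed in multiples of the just-noticeable difference."""
    return abs(measured - target) / jnd


def grade(units):
    """Hypothetical grade bands; the band edges are placeholders."""
    if units < 1.0:
        return "A"  # below the assumed audibility threshold
    if units < 2.0:
        return "B"
    if units < 4.0:
        return "C"
    return "F"


# Placeholder values: (measured, target, assumed JND), all in dB.
categories = {
    "bass level": (1.5, 0.0, 1.0),
    "treble peak": (6.0, 0.0, 1.0),
}

for name, (measured, target, jnd) in categories.items():
    units = jnd_units(measured, target, jnd)
    print(name, round(units, 2), grade(units))
```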

Subjectivism should be avoided at all cost in any type of scientific research if you're hoping to achieve valid results. 

Jazz Casual's picture

Just to be clear, I am not advocating a subjectivist position here. Frankly, I find the objectivist vs subjectivist debate in the discussion of headphones old and tired. The design and manufacture of headphones doesn't take place in some vacuum, where the only considerations that matter are how well they measure against a standard that may be subject to change. It is not a precise science, and that's partly why I think it is misguided and somewhat naive to attempt to evaluate headphones entirely by using so called objective criteria.

matador's picture

...including the perception of what these measurements mean.

Jazz Casual's picture

Any product (including a flagship headphone) can be superbly engineered and not perform as well as intended, and be out-performed by products that have not been engineered to the same standards - so what?

I'm sure that some bright spark will come along and start poking holes in your metrics soon enough. ;)

I expect that no one in their right mind would seriously consider making a headphone purchasing decision based on your document alone. 

xnor's picture

Why so negative?

Face it, companies sometimes produce expensive crap and market it as high-end. I applaud any attempt at pointing that out and getting past biased subjective impressions.

Jazz Casual's picture

Even if it's flawed and simplistic? Then clap away.

xnor's picture

Yes, because it's a starting point that can be improved / built upon.

Above you complain about a narrow perspective, yet it seems your own perspective is very narrow. Either it's perfect or crap? And don't get me started on "even stereophile recognises the value in presenting subjective listening impressions". What a ludicrously odd way to put it.

Back to bad headphones marketed and priced as flagships: it's bad for everyone. The manufacturer, the customers, even the industry. Isn't that obvious?

Jazz Casual's picture

You've either misunderstood the basis of my criticism (which is not a complaint by the way) or deliberately misrepresented it. If you find my reference to Stereophile so ludicrous, then I suggest you replace it with Innerfidelity and you might begin to grasp where I'm coming from. It's a matter of taking a balanced approach to assessing the performance of hi fi equipment generally and headphones in particular, which I think is lacking in SanjiWatsuki's approach, no matter how well intentioned it may be.

KG_Jag's picture

I very much like the architecture here: define and measure the objective factors of performance that the best headphones are expected to achieve. Of course there are elements, often significant elements, of subjectivity in what factors are selected; what metrics are selected; how tests are conducted; and (in this case) issuing either a pass or fail (or in one case--a half pass). Yet this seems to be an honest and reasonable effort to determine which cans cut the mustard at the highest levels and which ones don't--from a strict technical performance viewpoint.

Of course achieving engineering excellence does not necessarily mean that the best sound is achieved. It certainly doesn't tell me whether it's the sound I personally prefer the most--either overall or for a specific music genre or audio use. I have always turned to my ears for the grading of that final exam. All the rest is just a (hopefully) helpful yardstick and/or simulator...a tool to help determine the best qualified to give that final exam.

Jazz Casual's picture

Nice post.

inarc's picture

Regarding inherent advantages of open vs closed headphones, I strongly suspect that the natural crossfeed/crosstalk you get with open headphones by virtue of being open is the main reason they are generally lauded for their "soundstage", "depth", "imaging", etc.

Edit: Didn't read, LOL.

Argyris's picture

I think that measurements should always be indexed against actual listening, lest we find ourselves looking too closely at metrics that are insignificant to humans, or else aiming for measurement targets which don't actually properly model the ideal result. Recall Bob Carver's immortal wisdom: "If it measures good and sounds bad, it is bad; if it measures bad and sounds good, you've measured the wrong thing." Measurements should be used as possible indicators of why something sounds bad; they should not be used as predictors for when something will sound good unless similar measurement characteristics have been recorded for other equipment that does sound good.

What the folks over at CS and what Tyll here on IF have been doing for the past few years is measuring a bunch of headphones and trying to work out if certain sonic characteristics, good or bad, have correlating characteristics in the measurements. This work is in its infancy at this point, and I don't think we're even close to understanding everything. Even during my short time following the hobby we've already started shifting away from one of the fundamental assumptions of frequency response measurement, which is that you want it to be as flat as possible when tracking the DF target curve. A smaller presence bump and a gentle warm tilt are emerging as features that are more often associated with "good" sound.

Just a few things to think about. I know mentioning Bob Carver is a bit like throwing gasoline on a fire in some places, but that line always stuck with me. I think the reason it offends a lot of people is because he's basically saying they wasted their money on their mondo expensive gear that might measure minutely better on paper but for which the sonic improvement is imperceptible by humans in any sort of blind test.

Jazz Casual's picture

Thoughtful post, nicely expressed. It strikes me as wrongheaded to judge a headphone entirely on measurements, which are indicative at best. As you also point out, this area of research is in its infancy with the various methodologies employed for gathering data and the assumptions underpinning them evolving over time.   

vforrest's picture

Nice effort. Obviously a lot of work went into the analysis. While I'm not sure open-back headphones have to be better than sealed-back (I'll grant you most current ones are), and my personal preferences would have led to a somewhat different weighting (I hate bright, peaky headphones), at least you developed criteria and applied them. While measurements will never tell the total story about how headphones (or anything else) sound, they do tell a lot. Thanks for the hard work.


Vade Forrester

Seth195208's picture

...part of what makes this hobby so interesting. I love reading stuff like this. (That is, as long as everyone stays objective. Heh heh.)

neo's picture

Is that Nwavguy re-incarnated??

gefski's picture

"...and I would NEVER suggest anyone to make a decision about which headphone to buy based purely on this document."

"Do yourself a favor and stay away from the D7100..."

"If you're thinking about buying the SRH1840s, stop..."


mikeaj's picture

I think the idea is that you look at other sources of information too and hopefully are able to try them for yourself and come to those conclusions.

Anyhow, I'm not at all 100% convinced by the methodology here (even if we restrict ourselves to looking at datasheets from IF only, never mind other measurements, never mind people's subjective impressions), but neither is the author himself. It's a work in progress.

It's also a whole lot of information and some perspectives I hadn't really seen before, so good job.

Jim Tavegia's picture

My 701's are still my ref and I love them. I also use my Sony 7506's for my location recordings and some listening. I also enjoy my Grado 80s for my portable use and have used many Senn models at church. My Koss Porta-Pros are my insomniac cans right by the bed with a portable CD player and minidisc player for those sleepless times at night.

I still think that for the $219 I paid for my 701 they are the best bang for the buck I've ever spent. 

monetschemist's picture

Tyll, looking at your 300Hz square waves above, my ranking (I'm not going to give a grade) would be

#1 HD700 (small overshoot, damps very quickly; I'm thinking that sustained low-level longer ringing is not as important, but if I'm wrong it should move to #3 and #2 and #3 should move to #1 and #2 respectively)

#2 HD800 (bigger overshoot, damps very quickly)

#3 K701 (same overshoot as HD800, slightly slower damping)

#4 T1 (smaller overshoot than HD800 and K701 but much slower damping)

#5 K812 (what's up with that, looks sloppy to me).

Leaving aside my doubts with the HD700's sustained low-level ringing, how is the mark for the K812 better than the T1 and the K701?  The worst ringing of the bunch...
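The two things being eyeballed in this ranking, overshoot and how quickly the ringing damps, can be put into numbers. The sketch below does that for synthetic second-order step responses; the damping values, the 5kHz natural frequency, the 192kHz sample rate, and the 2% settling band are all assumptions for illustration, and none of the curves represent the measured headphones.

```python
import math

SAMPLE_RATE = 192000  # assumed sample rate, Hz


def step_response(damping, natural_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Step response of an underdamped second-order system (0 < damping < 1)."""
    wn = 2.0 * math.pi * natural_hz
    wd = wn * math.sqrt(1.0 - damping ** 2)
    k = damping / math.sqrt(1.0 - damping ** 2)
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        envelope = math.exp(-damping * wn * t)
        out.append(1.0 - envelope * (math.cos(wd * t) + k * math.sin(wd * t)))
    return out


def settled_level(y):
    """Average of the last tenth of the trace, taken as the final level."""
    n_tail = max(1, len(y) // 10)
    return sum(y[-n_tail:]) / n_tail


def overshoot(y):
    """Peak excursion above the settled level, as a fraction of that level."""
    return max(y) / settled_level(y) - 1.0


def settling_time(y, tolerance=0.02, sample_rate=SAMPLE_RATE):
    """Time after which the trace stays within +/-tolerance of the settled level."""
    level = settled_level(y)
    last_outside = -1
    for n, v in enumerate(y):
        if abs(v - level) > tolerance * abs(level):
            last_outside = n
    return (last_outside + 1) / sample_rate


N = 576  # 3 ms of samples at 192kHz
for damping in (0.2, 0.5, 0.6):
    y = step_response(damping, 5000.0, N)
    print(damping, round(overshoot(y), 3), round(settling_time(y) * 1e6), "us")
```

On a real square wave capture the same two functions could be applied to the half cycle following a rising edge, with the caveat that measurement noise would call for a wider settling band.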

What am I missing?

Thanks for continuing to generate discussion on this topic!

ZeeeM's picture

I wonder if the distortions on the K812 have an effect like exciters have.

Mad Lust Envy's picture

"It also seems to me that some of the sense of space from open cans might be that you remain aware of some outside sounds, which may tend to create a sense of space that's not actually coming from the audio, but rather your ongoing acoustic sense of the room you're in."


So much this. I recently reviewed the Alpha Dogs for my gaming headphone guide, and I essentially said the exact same thing, lol. Closed headphones CAN give a very convincing open soundstage, but the user has to understand that the lack of isolation of open cans may trick the brain into thinking the sound is more 'open'.

Mike Birman's picture

Interesting to match my own listening experience to headphones graded A or B+. Subjectively, they remain the best sounding to my ears so that, at least, is anecdotal evidence for the efficacy of measurements (although they should not be the only metric). Has there been any substantive study of the measurement variability within models? How much variability exists over the life of any particular headphone model due to manufacturing or materials changes, tweaking etc.? There appears to be the possibility for significant changes over the manufacturing life of any particular headphone that might require occasional re-testing or the averaging of measurements over time.

DS-21's picture

The presentation makes much hay out of nonlinear (harmonic, IM, etc.) distortion. Is there any real evidence that nonlinear distortion (short of obvious overdrive) is perceptually relevant in headphones? In loudspeakers, every serious listening study has failed to find any material correlation between nonlinear distortion and listener preferences.

AstralStorm's picture

It's relatively easily audible once it's beyond 1% range.

The typical result is bass "muddiness" (misinterpreted by some as "warmth") or "lack of control" - and "grain" or "glare" if it happens in mids or highs. Or even sound "compressed" as in dynamic range compression.

The bass distortion may matter less in some genres, ones that happen to not have too prominent bass, meaning classical and jazz. Most everything else uses kick drums.

If by evidence you mean double blind tests, well, so called audiophiles do not like these, because they could expose the biases, so they actually bring out heavy artillery against them.

Not that it's really possible to run a blind test for headphones and in-ears.

AstralStorm's picture

Tyll, why don't you finally accept that your time domain "analysis" is actually guesswork and there is little correlation between the square wave behavior at 30Hz and good sound that's not already covered by the FR?

What can be seen there is whether the enclosure is made to boost bass or the capacitance/inductance (including mechanical equivalents) are causing bass boost or cut. That's about all you can read from it. In high frequencies, it would show any ringing, but since it is conflated with frequency response, it is very hard to judge.

As to the text, I do agree the point about closed headphones has been hastily made. It is missing the point about resonances somewhat by not using an actual resonance metric (as in a properly tuned CSD graph and T60).
In that matter, HD700 is worse than it looks and so is Beyer T1.

Listening is very important of course, but you cannot really do a blind test and there is the "new thing" bias going on, so we need measurements to keep us honest. And actually the points are correct to my ears if incomplete.

AGB's picture

Not only does it appear that for the most part Sanji's measurements correlate to the subjective impressions of many, but they provide clues for what to look for and how headphones are designed. Most importantly, in #4 Sanji may be suggesting that headphones - in my experience all of them - require equalization to tap the device's potential. An un-equalized headphone is not likely to satisfy over the long haul. For example, take arguably the best of the breed: Audez'e LCD-Xes need about a +3dB treble boost with an octave-wide parametric DSP at 1.5K and another boost of +4.4 dB at 4.9K to restore transient accuracy and sparkle, with a bass CUT of -5.1 dB at 48 Hz to tighten and flatten the bass and restore magic resolution and integrity into the lowest octaves. This may or may not result in "flat" at your ears. You may have to tweak it some, and again, to cover a moving target as your earphone, wires and/or DAC break in. I add that you've not heard what the LCD-X can do until you do exactly as I had stipulated above...and you'll be amazed with the results.
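For anyone who wants to try settings like these in software, a parametric band is commonly built as a peaking biquad filter. The sketch below uses the widely circulated Audio EQ Cookbook peaking formula; the 48kHz sample rate is an assumption, and since only the first band is described as an octave wide, applying a one-octave width to all three bands is also an assumption.

```python
import cmath
import math

SAMPLE_RATE = 48000  # assumed sample rate, Hz


def q_from_octaves(bandwidth_octaves):
    """Convert a bandwidth in octaves to filter Q."""
    ratio = 2.0 ** bandwidth_octaves
    return math.sqrt(ratio) / (ratio - 1.0)


def peaking_biquad(center_hz, gain_db, q, sample_rate=SAMPLE_RATE):
    """Peaking-EQ coefficients (b, a), Audio EQ Cookbook form, normalized so a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / amp
    b = [(1.0 + alpha * amp) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * amp) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / amp) / a0]
    return b, a


def gain_db_at(b, a, freq_hz, sample_rate=SAMPLE_RATE):
    """Filter gain in dB at freq_hz, evaluated on the unit circle."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    num = b[0] + b[1] * z_inv + b[2] * z_inv * z_inv
    den = a[0] + a[1] * z_inv + a[2] * z_inv * z_inv
    return 20.0 * math.log10(abs(num / den))


Q = q_from_octaves(1.0)  # one-octave bandwidth assumed for every band
BANDS = [(1500.0, 3.0), (4900.0, 4.4), (48.0, -5.1)]  # (center Hz, gain dB) from the comment
filters = [peaking_biquad(freq, gain, Q) for freq, gain in BANDS]

for (freq, gain), (b, a) in zip(BANDS, filters):
    print(f"{freq:g} Hz: requested {gain:+.1f} dB, filter gives {gain_db_at(b, a, freq):+.2f} dB")
```

Running the three filters in series gives the combined curve; because neighboring bands overlap slightly, the total at each center frequency will differ a little from the single-band figure.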

Flat at my ears should deviate from yours, as each ear cavity is its very own resonance chamber. The only DUMMY HEAD I speak of here that actually hears, rather than measures, are ours. Lastly: Don't knock it until you've tried it!

All scientific pursuit requires standards, and subjectivity winds up  with everyone having his own rubber ruler. I say this after three decades plus reviewing and analyzing equipment - but that's just my opinion. And not everyone's opinion carries the same weight. In my view Sanji's and Tyll's carry far more than many others'.

matador's picture

I owned it for a year, it's a great headphone, I was able to tweak my system to a very high level of performance trying to get the HD800 to sound right. And I got very close to accomplishing that. But recently I decided I need headphones for work so I purchased and tested a new headphone, the Shure SE846. It was then I realized how absolutely fantastic my home setup sounded. I heard for the first time in forever... MUSIC. Not just sounds positioned properly in time and space, but actual music. Not only that but in many ways much more revealing of source issues than the HD800, which, in all actuality, is not that revealing, or that accurate. There is a lot of "engineering" in that headphone but it's more like trickery. They're taking a lot out of the sound to deliver those nice measurements.

Bloos's picture

"The Grado sound is literally the sound of an underengineered headphone with some targeted treble peaks."
Ooh that's hilarious!

Cats_Paw's picture

I find it a bit strange that people argue about objectivity vs subjectivity in audio.
I always thought that the point of the measurements is not to tell me what I am supposed to "like" but that I could objectively get an idea of what headphone I probably would like.

What I am getting at is that it seems I like a headphone that is polite in the treble, but I would never know that if I did not have the data from the measurements to compare the HPs I liked vs the ones I didn't.

So... yes, both being objective and subjective about audio works for me, but I understand that only the objective data will not change for another person, while the subjective part will.

In other words, I can recommend a HP to someone who likes a model that I know the measurements for as long as the measurements of the HP I am recommending are similar to those of the one the person enjoys.

I would however most likely be wrong to recommend a LCD-2 (a HP I like a lot) to someone who loves the HD800 (too bright for me, too little bass).