Expert Tests InnerFidelity's Headphone Measurement Repeatability and Reproducibility Page 2
Procedures/Results & Discussion
Right off the bat, I wanted to see just how repeatable Tyll's measurements were. But before I go into that, I want to note the difference between precision (repeatability) and accuracy. I think it's best described by the image above. As you can see, one can be "precise" but not accurate. Being accurate means being both precise and "hitting the bull's eye."
Back to the testing ...
First, we wanted to characterize Tyll's gear (dummy head, isolation chamber, microphones, measurement equipment) on its own to determine just how much variability the equipment alone was causing in the measurements. Tyll simply placed the HD800s on the dummy head and took 5 successive measurements over a short span of time. Note that the headphones were left on the head without any changes, and the 5 measurements were done in short order to eliminate any potential variability from room temperature changes. The results were quite impressive, to say the least.
Below are the five frequency response graphs of this experiment:
I then determined the standard deviation at each frequency, multiplied that value by 2, and plotted the 2x standard deviation (dB) against frequency (Hz). The reason I used 2x again was that statistically, if we measured the same pair of headphones 100 times, we would expect about 95 of the measurements to fall within +/- 2 standard deviations (or sigmas), assuming the measurement errors are roughly normally distributed. This graph would also show us the frequency ranges where the measurement system had difficulty.
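As a rough sketch of the calculation described above, the snippet below computes 2x the standard deviation at each frequency point across several runs. The function name, the data layout, and the numbers are all hypothetical and made up for illustration; the sample standard deviation (n - 1 denominator) is also an assumption, since the article does not say which form was used.

```python
from statistics import stdev


def two_sigma_per_frequency(runs):
    """Return 2x the sample standard deviation at each frequency point.

    `runs` is a list of measurements; each measurement is a list of levels
    in dB, and all measurements share the same frequency grid.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    n_points = len(runs[0])
    if any(len(run) != n_points for run in runs):
        raise ValueError("all runs must share the same frequency grid")
    # zip(*runs) groups the levels of all runs by frequency point.
    return [2.0 * stdev(levels) for levels in zip(*runs)]


# Synthetic, illustrative values only (not Tyll's data).
freqs_hz = [100, 1000, 10000]
runs_db = [
    [90.0, 92.0, 85.0],
    [90.1, 92.0, 84.0],
    [89.9, 92.0, 86.0],
]
two_sigma = two_sigma_per_frequency(runs_db)
for f, s in zip(freqs_hz, two_sigma):
    print(f"{f} Hz: 2x std = {s:.2f} dB")
```

Plotting `two_sigma` against `freqs_hz` would give the kind of spread-versus-frequency curve discussed here.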
The results here also looked very good, with an average 2x standard deviation of roughly 0.1dB.
We can use this data to extrapolate: should we measure the exact same pair of headphones 1000 times in succession (discounting any placement or temperature variations), we could expect about 950 of the measurements to fall within +/- 0.1dB. This shows that the measurement tools are very precise. (I am using the word "tools" in place of "system," as the system includes the effects of changing headphone position on the head, temperature variations, etc.)
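The "95 out of 100" rule of thumb comes from the normal distribution. Whether these measurement errors are actually normal is an assumption that the data here does not verify, but under that assumption the coverage of +/- 2 sigma can be checked in a few lines (the function name is hypothetical):

```python
import math


def normal_coverage(k):
    """Fraction of a normal distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


coverage = normal_coverage(2.0)
expected_within = round(1000 * coverage)
print(f"coverage within +/- 2 sigma: {coverage:.4f}")  # 0.9545
print(f"expected out of 1000 measurements: {expected_within}")  # 954
```

So "950 out of 1000" is a slightly conservative rounding of the theoretical figure.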
Now how does this precision relate to accuracy? Well the dummy head is composed of the dimensions/density of the "average" human head. So, comparing this method to say me using my Radio Shack SPL meter, I've got to believe that Tyll's system is far more accurate in determining how my ears will hear it.
Now the fun begins….
For the next set of experiments, Tyll performed the exact same experiment as above, BUT this time he removed the headphones and placed them back on the head between measurements. The results were very different in terms of precision. For this data set, we have a total of 7 runs. While leaving the headphones on the head yielded remarkably repeatable results, the action of removing the headphones and placing them back on the head made substantial changes.
Here is the frequency response graph from this run of 5 successive measurements, but this time taking off the headphones and placing them back on the head:
As we can see, the precision up to about 2kHz is still quite good; however, the data does start to diverge after this point. Comparing this to the first run (when the headphones were not removed from the head), we can see quite a bit more variability in the measurements in the higher frequencies.
Performing a similar analysis on the standard deviation vs. frequency, we see that the deleterious effects on the precision at higher frequencies are more pronounced:
The bass to mid bass/lower mids regions (Regions 1 and 2) still show little loss of precision due to headphone placement. Region 3 (1300Hz – 3400Hz) shows the variability shifting upwards. Once we approach about 3.4kHz, the variability really does begin to become a strong function of headphone placement. The resulting 2x standard deviation is now 1.8dB. Again, this would mean that should we measure the same pair of HD800s 1000 times (placing the headphones back on the head between each measurement), we could expect about 950 measurements to be within +/- 1.8dB.
However, once we reach higher frequencies (>8.4kHz), the variability becomes very strongly affected. The average standard deviation in this region is now almost 5dB. So yet again, variations of +/-5dB are to be expected in this region. The maximum variability is higher still, at approximately 18dB. So it appears that the slightest change in headphone placement on the head can significantly change the high frequency resonances in the dummy head and drastically reduce the measurement system's ability to precisely measure higher frequencies.
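The per-region averages and maxima quoted above can be thought of as simple summaries of the 2x standard deviation curve over a frequency band. Here is a minimal sketch, assuming the curve is available as two parallel lists; the function name and all data values are made up for illustration, and only the 8.4kHz band edge is taken from the text.

```python
def summarize_band(freqs_hz, two_sigma_db, lo_hz, hi_hz):
    """Return (mean, max) of the 2x std values with lo_hz <= f < hi_hz."""
    values = [v for f, v in zip(freqs_hz, two_sigma_db) if lo_hz <= f < hi_hz]
    if not values:
        raise ValueError("no frequency points fall inside the band")
    return sum(values) / len(values), max(values)


# Synthetic, illustrative curve only (not the measured data).
freqs_hz = [50, 500, 2000, 5000, 10000, 15000]
two_sigma_db = [0.1, 0.2, 0.5, 1.5, 4.0, 6.0]

# Everything above 8.4 kHz, the boundary mentioned in the text.
treble_mean, treble_max = summarize_band(
    freqs_hz, two_sigma_db, 8400, float("inf")
)
print(f"treble: mean {treble_mean:.1f} dB, max {treble_max:.1f} dB")
```

The same call with other band edges (for example 1300 and 3400) would summarize the other regions.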
Finally, we performed the exact same testing procedure that Tyll outlines on innerfidelity.com. This method averages measurements taken at 5 specific positions on the dummy head (slightly forward, back, up, down, and centered) to "average" out the above variability.
The results showed that this spatial averaging method does noticeably smooth out the variances, as seen in the two graphs below:
To confirm, each graph above compares the result of five runs where each run is the averaged result of the 5 different headphone positions. We can see that the variations between runs have been reduced by a good margin; but they still are quite a bit larger than in the tests where the headphones were not repositioned on the dummy head.
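To make the structure of this test concrete: each run is first averaged over its 5 positions, and the spread is then computed across the averaged runs. The sketch below follows that two-step structure with made-up numbers. Averaging the dB values directly is an assumption; the article does not say how the positional average is formed, and the function names are hypothetical.

```python
from statistics import mean, stdev


def spatial_average(position_curves):
    """Average several curves (one per headphone position) point by point."""
    return [mean(levels) for levels in zip(*position_curves)]


def two_sigma_across_runs(runs):
    """2x the sample standard deviation across runs at each frequency point."""
    return [2.0 * stdev(levels) for levels in zip(*runs)]


# Synthetic data: 2 runs x 5 positions x 2 frequency points (dB).
run_a = [[80.0, 70.0], [80.0, 74.0], [80.0, 66.0], [80.0, 72.0], [80.0, 68.0]]
run_b = [[80.0, 71.0], [80.0, 75.0], [80.0, 67.0], [80.0, 73.0], [80.0, 69.0]]

averaged_runs = [spatial_average(run_a), spatial_average(run_b)]
spread = two_sigma_across_runs(averaged_runs)
print(averaged_runs)
print(spread)
```

In this toy data the individual positions differ by several dB at the second frequency point, while the averaged runs differ by only 1dB, which is the smoothing effect described above.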
The 2x standard deviation vs. frequency graphs show this as well:
So now headphone measurements in Region 3 that vary within 1.33dB should be considered equivalent (but please note that we did see variations of up to almost 2.5dB). Furthermore, in Region 4, we saw variations of +/- 3.5dB (with maximum variations of 7.4dB).
What does all this data mean? First off, I learned that measuring headphones (unlike measuring conference room table lengths) is pretty darn difficult. However, the gear used by Tyll does very much seem up to the task of offering us readings that are precise. And because it is based on a dummy head, it does appear to give a more accurate reading of what's going on at the average human head than other methodologies.
Between 10Hz and 3.4kHz, the precision of the measurements seems to be relatively unaffected by headphone placement (with 2x standard deviations of less than 1dB). So one could use this information with confidence to compare, say, two different headphone models in the bass and mids.
But measurements in the treble region appear to be strongly influenced by the position of the headphones on the head. Even the smallest variances in placements alter the resonance artifacts (peaks and valleys) in the higher frequencies that can make it appear that 2 different headphones measure differently (by up to 7dB). Even when measuring the same pair of HD800s we found variability in the higher frequencies (> 8.4kHz) that averaged approximately 3.5dB with peak variances of up to 7.4dB.
Finally, based on the data, it does very much appear that from 10Hz to 3.4kHz the measurements are very precise and not very dependent on headphone positioning. So comparisons between different headphones and models are quite relevant statistically. In the range from 3.4kHz to about 8.5kHz, the precision is still good, but can vary (so please take this into account). The most problematic area is the treble (frequencies > 8.5kHz), where even the slightest variations in headphone placement can have a drastic negative effect on the repeatability. So please use caution when trying to make comparisons in this region of the frequency range.
Next steps? I'm not sure, but the challenge is there for Tyll (or anyone else for that matter) to work on a measurement system that still maintains a level of accuracy (i.e. dummy head), but improves upon the precision at higher frequencies as even the slightest variances in placement can have such profound differences on the measured results. I'm sure that using this type of methodology to question what's being measured should help in that endeavor.
I would like to thank Tyll for his great amount of work in pulling together this data and actually running the experiments. As well, I've got to say, in the few telephone conversations I've had with Tyll, I've learned more than in all the years of independent learning that I undertook since I got into this hobby. He's a great guy with a great depth of knowledge and really just knows his stuff.
Editor's Note: Aw geez, thanks, Peter. And thanks for the terrific analysis. Let me complicate it a little further ...
First, the reader should know that the peaks and valleys in the high frequency region arise from resonances: between the driver and ear; within the concha ridge of the ear; and in the ear canal itself. While the driver might be putting out a completely flat frequency response in the treble, all the resonances will make it appear that it's not. Basically, the ear's not hearing the driver as much as it's hearing all the resonances the driver is exciting in the coupler (the combined headphone/ear acoustic system).
Because these resonant cavities are very small, very small positional changes of the headphones on the head significantly shift the resonant frequencies of the acoustic coupling between the headphone and ear. So the changes in amplitude that are being measured for this study are primarily occurring from the shifting in frequency of the resonant peaks and nulls.
One way to rid ourselves of this pesky problem might be to apply some smoothing to the frequency response curves. Because the resonances typically create adjacent peak and null features, applying a smoothing filter should somewhat average the peaks with the nulls to arrive at a mean response. This might indicate an approximation of how much energy the driver is emitting into the coupler. (Peter and I have already chatted about this as a further avenue for exploration.)
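As one possible illustration of such smoothing (not a method used in this study), the sketch below applies a simple fractional-octave moving average: each point is replaced by the mean of all points within a fixed fraction of an octave around it. The function name, the 1/3-octave width, and the data are hypothetical, and averaging dB values rather than power is a simplifying assumption.

```python
def octave_smooth(freqs_hz, levels_db, width_octaves=1 / 3):
    """Smooth a response by averaging within +/- width_octaves/2 of each point."""
    half = 2.0 ** (width_octaves / 2.0)  # frequency ratio for half the window
    smoothed = []
    for f in freqs_hz:
        window = [
            level
            for g, level in zip(freqs_hz, levels_db)
            if f / half <= g <= f * half
        ]
        # The window always contains the point itself, so it is never empty.
        smoothed.append(sum(window) / len(window))
    return smoothed


# Synthetic treble response with adjacent peaks and nulls (dB).
freqs_hz = [8000, 9000, 10000, 11000, 12000]
levels_db = [0.0, 6.0, -6.0, 6.0, -6.0]

smoothed = octave_smooth(freqs_hz, levels_db)
print(smoothed)
```

In this toy example the raw curve swings between +6dB and -6dB, while the smoothed curve stays within +/- 2dB, because neighboring peaks and nulls partly cancel in the average.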
Another miserable reality is that one headphone may couple with the head in a completely different way than another. Peter's current study shows us the repeatability when measuring an HD 800, a headphone that's fairly positionally insensitive. If I were to perform the same test with a Beyerdynamic DT1350, which tends to be fairly positionally sensitive, we might see a completely different analysis from Peter. In that case, variations reported in bass response due to the changing seal would be much higher.
The point is: If Peter's article cautions us to take headphone measurements with more than just a grain of salt, my experience tells me we need to unscrew the top of the salt shaker.
Nonetheless, it's the best objective measure we've got, and you can rest assured that I will continue to try to improve my skills and methods as I perform these measurements.
Thanks for the article, Peter. I look forward to producing more data for your future number crunching sessions.