Testing the Audibility of Break-in Effects

Subjectivist: "Man, I got my headphones last week and they're breaking in nicely."
Objectivist: "Yer nuts, dude, it's your head breaking in to the sound of your new headphones."
Subjectivist: "Leave me alone, troll, take your objectivism to 'Sound Science.' We have the minds of Gods and poets, and don't need your weights and measures to know what we know what we know."
Objectivist: "What can I say to someone who's their own placebo?"
Subjectivist: "Break-in exists ... I've heard it ... I stamp my feet three times and you will go away."
Objectivist: "Lol ... you couldn't blind test your way out of a paper bag!"

And so it goes.

Let's try to clear a bit of this up, eh?

Break-In Testing So Far
If you've been following along, you'll know I have three brand new Quincy Jones Q701 headphones (green, black, and white) that I'm using to see if we can measure the effects of break-in.

I did the first exploratory break-in test on the green pair some time ago, in which we saw some changes and learned what we might look for in subsequent testing.

I designed a second more complex test based on what we learned on the first test, and we saw the changes over time more clearly.

Now, we have an avenue to do a break-in test on the third pair and really run it through the wringer. But first, I thought it would be good to do a subjective test to see if I could hear the difference between a brand new pair and one that's been broken in considerably.

Subjective Break-In Testing
The green Q701 that was used in the first test has been on my bench playing pink noise at about 90dB for well beyond 1000 hours at this point. The white Q701 remains sealed, brand spanking new in its box. I thought it was important for our exploration into break-in to find out if we could hear some difference between the two.

So, I called my buddy Brian (screen name "NA Blur" on Head-Fi) here in Bozeman, and arranged a time for him to come over and help me out with the test. I made some score sheets, colored a coin, and set up the gear on my dining room table.

The Test
This was a single blind test where I did not know which headphone was being placed on my head, but Brian did know. (In double blind tests, Brian wouldn't know which was which either.) In order to take some influence out of Brian's hands, a coin was flipped at the beginning of each test that would indicate which headphone he would place on my head. At no time during the test would Brian indicate whether I was guessing correctly or not, and I would not know the score until the end of each trial.

I did notice that the music-playing computer in front of me provided some reflections in which I could potentially see the headphones on my head, so the screen was tilted way back to prevent seeing any reflection. As Brian was putting headphones on my head, there was no way to see them, or to feel the difference between them. I had no idea at all which headphone was on my head.
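As an aside, the coin-flip protocol is simple enough to sketch in code. The sketch below is purely illustrative (nothing like it was used in the actual test), and the `noisy_listener` who guesses right 70% of the time is an entirely hypothetical stand-in:

```python
import random

def run_trial(num_guesses, listener, seed=None):
    """Single-blind protocol sketch: a coin flip decides which headphone
    is presented on each guess; guesses are scored silently and the
    total is revealed only after the trial ends."""
    rng = random.Random(seed)
    score = 0
    for _ in range(num_guesses):
        presented = rng.choice(["broken-in", "new"])  # the coin flip
        guess = listener(presented)                   # listener can't see which it is
        score += (guess == presented)                 # no feedback during the trial
    return score

# A hypothetical listener who correctly identifies the headphone 70% of the time.
def noisy_listener(presented, _rng=random.Random(42)):
    if _rng.random() < 0.7:
        return presented
    return "new" if presented == "broken-in" else "broken-in"

print(run_trial(19, noisy_listener, seed=1))  # score out of 19
```

A perfect listener would score 19 of 19; a pure coin-flipper averages about 9.5.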

Before testing began, I listened to both headphones to try to perceive what the differences were between the two. I thought it was fairly clear that the broken-in pair was smoother sounding than the new pair. Brian did the same and thought he heard a difference as well.

Then we began testing. I went first and had a somewhat difficult time. I was using a Tiger Okoshi track with a strong trumpet solo that I knew could sound harsh if not well reproduced. As we progressed through this test, Brian said maybe I should try another track ... which was a bit of a hint that I wasn't doing so well, but we were on our first trial so I figured I'd switch the music I was using mid-stream. I switched to a Pinback driving rock track that was very dense with sound, and which could sound harsh and pinched when poorly presented.

By the time we finished 19 guesses, I called it quits and found that I had gotten 13 out of 19 correct, which is statistically significant, but I thought I could do better.
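For readers who want to check the arithmetic on a score like this, the chance probability can be worked out with a one-tailed exact binomial test. A minimal Python sketch (the function name is mine, not anything from the test itself):

```python
from math import comb

def p_at_least(correct: int, trials: int, p: float = 0.5) -> float:
    """Probability of getting at least `correct` right out of `trials`
    purely by guessing (success probability p per guess)."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

print(round(p_at_least(13, 19), 4))  # → 0.0835
```

In other words, a blind guesser would score 13 or better out of 19 roughly 8% of the time.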

Brian tried it as well. At this point I hadn't told him which headphone was the broken-in pair, and he didn't want to know. He also took only a very short (probably too short) listen before starting his trial. When he was done, he had gotten about 65% wrong. We think he had gotten mixed up about which color headphone made which sound. It seemed evident he was hearing differences, but misidentifying which headphone was which.

Then I sat down for my second trial with a pretty good sense of what I was listening for and what music to play.

Blind testing is not easy. Even when there is a fairly clear difference it can be quite disorienting not knowing whether you are guessing correctly or not. Opportunity is rife for self-doubt, anxiety, and second-guessing oneself. But I've done a fair bit of blind testing of prototype amplifiers, so I knew what I was in for. As the second test progressed, I relaxed and relied on my previous experience, and used a technique I think works very well for this sort of thing.

I don't actually try to listen for problems in the sound, or for differences in the sound. I relax and listen to the music normally, as if for enjoyment, then I pay attention to and monitor how I feel about the music. So I'm not trying to be critically aware of the sound so much as I'm critically aware of my reaction to the sound. It's worked very well for me in the past.


inarc's picture

"Have we absolutely proven that break-in is an audible phenomenon? No. All I've proven is that I could tell one headphone from another."

To rule out that you were just discerning manufacturing tolerances, you could break in the new Q701 for a comparable amount of time and then re-do the blind test. If -- after an honest effort -- you then failed to tell the two headphones apart, you could be sure that the current audible differences are (mainly) due to break-in.

xnor's picture

Manufacturing tolerances, my thought exactly.
I really respect what you're doing, but that's a big problem in a test where you try to show that burn-in exists. You've seen yourself that different headphone samples (even if they have the same color :D and probably even if they're from the same batch) have slightly different channel balance, impedance, and sensitivity/efficiency (-> different SPL).
What I'm trying to say is that I think you could achieve the same test results with two brand new Q701s or K701s.

Tyll Hertsens's picture
Thanks! This is a good idea. Will do.
dalethorn's picture

When you see the headphone being manufactured and coming off the assembly line and you grab it and take it with you, then you know the mfr didn't pre-burn it on an automated rig. Other than that, you don't know. You could assume no pre-burn for cheapies, but for mfrs' pride items, I wouldn't assume that.

Tyll Hertsens's picture
The headphones were sent to me by a PR firm. I would guess that they are turned on at the factory for testing purposes and to pass QA. And manufacturers often burn in product for possibly a day to weed out infant-mortality issues, but it's bad form to build this step into a manufacturing process for a rather simple product. It's better to tweak the manufacturing process so product coming off the line is virtually guaranteed to work. AKG is a quality manufacturer, so I highly doubt the cans have more than ten minutes of time on them when they leave the factory.

I will, however, make an attempt to find out.

J oker's picture

Are you planning to do a similar test with an in-ear model?

Tyll Hertsens's picture
Probably not as these tests are extremely time consuming ... but you never know. Frankly, I think once I've done what I can on break-in, I'm going to try to do some testing on the audible differences of IEM tips.
JAD's picture

What about the heat and humidity of the skin transmitted to the earpads? It could be a clue, because the most recently worn headphones could be warmer or feel recently worn. A good way to solve this is to heat the earpads equally, or to use the same earpads on both. Furthermore, virtually identical earpads could sound different.

There are many factors like those, but I've no doubt that break-in is subjectively noticeable. Of course, that's my personal experience, like many others'.

Tyll Hertsens's picture
I doubt this is a problem; the cans were on and off my head rather quickly, and I certainly didn't feel anything at all. More importantly, doing blind testing is psychologically complicated, and settling in to simply hearing and focusing on that without thinking about anything else at all is quite important. So when performing these tests, I'm hardly "thinking" about my conclusion at all. Even if there was a temperature difference in the pads, I'd not only be unaware of it, but those sensations would likely be strongly ignored.

Also because it's been asked elsewhere, the green headphones had about 200 hours of head time on the dummy, and were on the shelf burning in for the remainder. Pad wear is very likely not a factor.

donunus's picture

Even if there are differences in manufacturing with some models being slightly bassier and older pads making the sound more intimate and richer, the difference where one is harsh and makes one wince vs the other is a typical burn in related thing. I'm inclined to believe that these new cans will be as smooth as the one with 1000 hours on them after burn in but what I am not sure about is if they will end up with the same exact tonal balance.

There have been reports of k701s sounding different in the bass for instance even when both have been well used. These conclusions were made after comparing two k701s side by side by the way.

The Monkey's picture
Thanks for this excellent write up, Tyll. I remain a skeptic, but readily admit that the 701 is the only can I have heard where I suspected break in could be a factor. I would love to see these tests run with a different can, too. Further, regardless of whether break-in is real, the notion of it is still abused by the proudly uninformed. In other words, people who really have no idea what they are talking about regularly trot out break-in as a way of either protecting their own observations from critique or discounting the views of others. I'd like to ban such folks from the hobby.
Tyll Hertsens's picture
Pure gold, little ape! :)

I don't think we should ban them from the hobby though ... they should be made to fetch cigars and bourbon at meets, and massage my feet at trade shows. Yeah, that'll teach 'em.

dalethorn's picture

Not a bad idea. Being skeptics, they would certainly want to perform all of the physical chores, just to make sure.

The Monkey's picture
Tyll, with respect to significance, were there sufficient trials to draw such conclusions?
Tyll Hertsens's picture
Um ... my goal was to see if I could hear clear differences between cans. I think I clearly heard them sighted, and the tests showed a very clear result. I think it's enough to show I could hear differences, and I can describe what I was listening for. So, yes, I think it's significant in that way.

Also, more trials would not have gone any further toward proving the audibility of burn-in, so making the test more stringent wouldn't have added any info, I think.

khaos's picture

If my memory is correct, you have the measurements of the new green Q701 somewhere in your database.

Comparing the measurements of the new green Q701, its measurements in its broken-in state, and the measurements of the still-new white Q701 should give us some information about whether the differences you heard are more likely due to break-in or manufacturing tolerance.

Tyll Hertsens's picture
Well, I tend to think the differences would be at the same level as the differences seen with positional changes on the head. So, it would be difficult to say what had changed over time, and what was just the difference from doing the measurement at a different time.

Really, looking at this issue is best done by putting a pair of headphones in the chamber and doing the long testing over time without moving the head ... which was done in previous tests, and did show small but measurable differences.

mward's picture

Thanks Tyll. More great work here.

I used to be fairly skeptical of break-in... and I still am, somewhat. But when I reviewed Maximo's iM-590, I noticed they had a tag that said they should be burned in for 20 hours or so. So I listened to them, found them harsh, gave them a 24 hour burn-in, listened again, and although I still found them somewhat bright, most of the harshness was gone.

So now I burn everything I review in for 24 hours, usually before I even try them, and certainly before I do any formal evaluation. Although judging by your data, maybe I should do more...

AncientWisdom's picture

Great article, I love your approach, and I love that you assume NOTHING even after hearing an audible difference.

The number of variables is obviously astounding, but even showing proven audible differences is a great step forward IMO.

In the previous articles you said that, based on the measurements, you don't think the differences would be significant, i.e. cans would have the same SQ out of the box as well as after burn-in. How do you judge that now that you have heard the difference? (How big is the difference?)

I also fully support the suggestion in the first comment above. It would also have been good if a blind test had been conducted on the headphones when they were new - to show that (possibly) you statistically couldn't tell them apart. Maybe worth doing one before burning in the third pair?

One last thing: in one of the previous articles someone suggested an experiment which I think would be very interesting: record the same track's output through the new and burned-in cans and then compare the waveforms (hopefully that's the right term).

Tyll Hertsens's picture
To be honest, I do assume some things. Someone pointed out on Head-Fi that in my video the language I use indicates that I believe I'm hearing the difference as a result of burn-in ... and that's true. I DO think I was hearing differences from burn-in. But I'm also willing to be skeptical of myself. I very much understand that what I personally believe has nothing to do with providing proof.

I think what I've said in the past is that the differences out of the box and after burn-in are small, and what you hear out of the box is pretty much what you'll hear 1000 hours later. What I'm saying is comments like "OMG! My cans are like night-and-day different now that I've burned them in" are largely baloney.

donunus's picture

On all the headphones I have had, including the K701s, I have noticed the effects of burn-in were mostly subtle, and admittedly sometimes due to psychological acclimation/getting used to a certain sound, but the HD555s I had before were a totally different phenomenon. It was really weird in that out of the box they were echoey as hell, much brighter than the Grados I had, and had zero bass. Listening to the same 555s now, which a cousin of mine owns, they are warm sounding and even slightly dull in the highs. I actually thought they were damaged when I first listened to them out of the box. My 555s were an Irish-made 50-ohm model, by the way, in case that matters.

Currawong's picture

One of the most useful points I feel you made was how, with practice, you could improve your blind-test score. A lot of people feel that any DBT is absolute in the truth it conveys regardless of anything else, but don't understand that such a thing is far from what a properly conceived and executed scientific experiment requires if it is to produce meaningful results.

Most importantly, you've pointed out that there are no absolutes regarding audio, and you're doing very well demonstrating the complexity of what goes on when we put headphones on and listen to music.

Tyll Hertsens's picture
Thanks mate. I think people should practice doing blind tests to sharpen their listening skills. The problem is they're hard, stressful, and rather uncomfortable, as you end up going through a bit of a psychological wringer, and until you get good at it you'll not score well. I think most of the failures of blind testing come from the fact that most people don't practice it.

Why? Because they're lazy, and don't want to do something that's uncomfortable. It's so much easier to say that blind testing isn't informative, and so much easier to blather about their own experience as if it's objectively factual.

I think the best part of this article is the intro, which strives to show the closed mindedness of both "sides."

donunus's picture

I really agree with this post. Some people just say this or that is much better (FLAC vs. AAC, for example) when they won't even do a foobar ABX test to prove it. Then there are people who won't listen when you say that even with a foobar ABX, you can't really be sure one file sounds exactly the same as another unless you had the best equipment connected to your system at the time of testing.

With an HD600, for example, I can tell a FLAC from an MP3 in a foobar ABX far more easily than if I were using a PX100-II. These details don't seem to matter to a lot of the objectivist camp, and it's sad. I think one has to be a combination of both subjectivist and objectivist to really know for sure about their hearing skills and the truth of a lot of things in audio.

dalethorn's picture

I'm finding it easy and informative to compare music tracks now, using the same player, different players, or multiple copies of the same player so I can have both running and switch instantly. Comparing headphones is not so easy, having to remove one and put on the other. Very distracting. Most of that can be tuned out, but it's still very stressful when you're concentrating as hard as you can. Speakers were easier, since you could just flip a switch.

johnjen's picture

For many years I too was 'caught' in the VS. trap. And then I realized it is very artificial and trying to stay in only one camp or the other divorced me from the reality I was actually immersed within.

Both have value and both provide their own types of results. And neither is 'complete' without the other. And in reality they can't be mixed together, like oil and water. They can however be used to examine and discern our ability to enjoy music to a greater or lesser degree. To help us hone and refine and increase our ability to discover more and more of what is already there just waiting to be revealed.

Numbers will never capture the enjoyment of listening to music nor the visceral and nearly (at times) palpable experience of being immersed in the music. And conversely we wouldn't be able to listen into those very same inner details and structure of the music unless it were capable of delivering a faithful re-creation of the original. It is this ability to accurately re-create the original where the numbers can give us the ability to peer into the finer structure of how the music is 'put back together' and then presented for our enjoyment.

And I suspect that as we gain in our ability to resolve ever more complex relationships between the numbers and our experience of what the numbers 'mean' (to the direct experience itself) both sides of the VS. syndrome will gain.

But even more importantly, our appreciation and enjoyment of the experience not only of listening but also striving to grasp and understand the complexity of the relationship between the numbers and our enjoyment, which is usually kept apart by the VS. syndrome itself, provides even further levels of fascination and challenge.

Well at least for me it does.


Tyll Hertsens's picture
Hiya JJ, good to see you here!

I think the tricky thing for most to get is the concept of "domains." Measurement is in one domain, the listening experience is in another. The twain meet in places, but don't in most.

Both domains exist, though. Both are valid. Oil and water indeed. A little balsamic vinegar may help.


keanex's picture

The biggest problem I see is that you didn't consider the pads as the reason for the difference. At a meet there was a guy with a brand new K701 and another person's old K701. They could tell the difference in an AB test, but when they switched the pads they came to the conclusion that the broken-in pads were providing a better seal, therefore better sound.

Tyll Hertsens's picture
Other than the roughly 200 hours the headphones had on the measurement head during the burn-in test, the headphones haven't been worn at all, so there is very little mechanical wear on the pads. They were on a shelf for the 1000+ hour burn-in.
Hal Espen's picture

Thanks for a fascinating exercise in non-rancorous self-experimentation. Indeed, you’ve boldly tackled a topic that has fanned many a flame war. But this kind of testing is too rare and too often taboo in audio journalism, and it provides a refreshing break from the customary oenophile-like audio evaluation based on subjective sound-tasting and the reviewer’s bona fides in “trained listening.”

But I still think we end up in the old familiar objectivist/subjectivist hall of mirrors. I have to admit that my heart sank a bit when I got to this proviso: "Before testing began, I listened to both headphones to try to perceive what the differences were between the two. I thought it was fairly clear that the broken-in pair was smoother sounding than the new pair. Brian did the same and thought he heard a difference as well." The potential effect this had in providing cues, priming expectations, and identifying audible distinctiveness going into the experiment is to disqualify the test on the basis of prejudicial prep work. You took a sneak peek at the answers before taking the test! A key pitfall in the ideas behind blind testing and component break-in is that you may be training your ear, not hearing meaningful differences between different pieces of gear at different stages of break-in.

And of course, the possibility that inarc and xnor raise in the first comments above — that the differences between the headphones may exist already before any “breaking in” — adds another degree of uncertainty to the project.

Part of my skepticism about break-in also relates to the unvarying conclusion that long break-in not only changes the sound, but inevitably IMPROVES it. The fact that 100% of the reported changes after break-in result in a component sounding better should be a significant warning that the fog of subjectivity is thick on the ground.

Still, as much I’d like to dismiss your outcome for such reasons, it’s difficult to avoid the conclusion that you’re on to something. Again, thanks for a highly suggestive effort.

Tyll Hertsens's picture
The problem is that it's not like a drug trial, where the effect you are looking for is an involuntary change in symptoms. In this case, I have to make a conscious choice: Is this A or B? If I hadn't listened to the headphones at all, the first time I tested the result would be meaningless, as it would be a totally random guess. My first series of tests was still somewhat like that, as I was listening to a tune that wasn't as good at highlighting the differences. Again, I was simply showing I could fairly easily tell the difference between cans, so hearing them before the test did nothing but give me a starting point to correlate the sound with a particular pair.
bikermanlax's picture

I too was skeptical about break-in, agreeing that much of the difference was just learning to hear the sound of the new headphones.

The last pair I bought I didn't bother to burn in at all. Based on Tyll's recommendation I bought some 1350's (after having a horrible experience with their doppelganger, the T50). I noticed a certain dissonance in the middle range of many classical piano sonatas. The dissonance was there in many different recordings and vintages. I simply stopped listening to piano sonatas as the rest of the sound was excellent.

After somewhere between 20 and 50 hours of play, I stumbled back into a piano sonata and the dissonance was now gone. I went back and listened to the same recordings that had been a problem when the phones were new. Great sound now. Burn-in? Seems to be the best explanation to me.

xnor's picture
The question is if the drivers changed or if you simply got used to the new sound signature, which to me seems to be the more likely explanation in this case and also many other burn-in reports.
pbarach's picture

If this were actually a true double-blind test, then neither Tyll nor Brian would know which set of cans had been burned in. Unfortunately, Brian knew (and I know that because Tyll said which ones had been burned in at the onset of the video). So it's a SINGLE blind test.

Tyll Hertsens's picture
And I've said as much. Still interesting though, eh?
AstralStorm's picture

Now, why haven't you actually added a measurement of both headphones after the test? You do have the test rig.
If there is an audible difference, there must be a measurable one as well. (Yes, the ears are very precise; 0.5 dB in frequency response is enough for a trained listener like yourself.)

Also, was the trial count preset, or "until we get bored or find a significant difference"?
Tests without a preset trial count have far lower statistical power - a Bayesian analysis with a flat prior (0.5) would have to be used instead of a simple Bernoulli-trial analysis; the normal-error assumption behind the usual significance metrics (e.g. ANOVA or Fisher's test) doesn't apply.
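The point about preset trial counts can be illustrated with a quick simulation: a pure guesser who checks the running score after every guess and stops as soon as it looks "significant" will cross the nominal 5% threshold far more often than 5% of the time. A hedged Python sketch (function names and parameters are mine, not from the article):

```python
import random
from math import comb

def p_at_least(correct, trials, p=0.5):
    """One-tailed exact binomial p-value: chance of at least `correct`
    successes out of `trials` by pure guessing."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

def peeking_false_positive_rate(max_trials=30, alpha=0.05, runs=5000, seed=1):
    """Simulate a pure guesser who tests for significance after every
    guess and stops as soon as the running score clears `alpha`."""
    rng = random.Random(seed)
    stops = 0
    for _ in range(runs):
        correct = 0
        for n in range(1, max_trials + 1):
            correct += rng.random() < 0.5      # a coin-flip guess
            if p_at_least(correct, n) < alpha:  # peek at the p-value
                stops += 1                      # "significant" by luck
                break
    return stops / runs

print(peeking_false_positive_rate())  # noticeably above the nominal 0.05
```

This is why fixing the number of trials in advance (or correcting for the peeking) matters when quoting significance.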

I'd suspect the earpads do break in, as they're pretty loose mechanical parts subject to a lot of load, but likely not the driver. The test for the latter is simple - first measure before "burn-in", and later after playing music of choice for x hours.
I'd bet (quite a lot of money even) that you won't be able to even measure the difference. (beyond typical placement issues, so do a bunch of measurements)

I've found a similar phenomenon with IEMs - the driver doesn't change one bit, but the tips do, a lot - they get softer and fit better, especially foam ones.

udauda's picture

The data you acquired from the test are certainly interesting, but I wonder whether they are meaningful in an objective way. Your pal was not blinded, and there was a long delay while your pal was removing & replacing the Q701s from your head. (ITU recommendations suggest non-stop simultaneous transitions among the A - B - X test chain during a subjective assessment test, not to mention that double-blinding & level-matching are mandatory.)

Here's a better way to execute this kind of test.
1. Record 2 different kinds of test material with the head-and-torso simulator wearing the new Q701 & the broken-in Q701.
2. Apply diffuse-field compensation to the recorded material (with professional-quality DAW software).
3. Run the material through double-blind ABX comparison software (e.g. foobar2000) using one of those flat diffuse-field headphones.
4. Try to be as comfortable as you can while listening.
5. Post the result!

This will certainly remove all kinds of distracting test variables, and keep the test result objectively valid.

Tyll Hertsens's picture
I dunno man, I'd rather hear the real thing. I think I can deal with it.
udauda's picture

"I'd rather hear the real thing."

And that's the kind of notion J. Atkinson of Stereophile would take whenever someone challenges him with a blind test :) I think we can be more scientifically rational than that.

Your data won't carry any objective meaning if the test wasn't conducted with 'formality'. What I recommended to you is merely one (it's called Binaural Room Scanning) of many existing binaural recording techniques, and even Harman Kardon utilizes the same kind of approach when measuring room acoustics & the acoustic properties of an automobile. The results were so good, they were even able to convince Toyota with it!

And maybe you can take this idea to another level, such as by sharing the recorded material from the Q701s with users and having them post their own ABX results. Once enough data samples are gathered, you may be able to come up with something quite meaningful scientifically. (But you must prove that the two Q701s were sonically identical before breaking in one.)

SoulSyde's picture


I loved this article and the included video. Well done. I think you put a provocative argument to rest. I'm a firm believer in burn-in (break-in) with regard to dynamic drivers and tubes. The effects, I believe, are less noticeable with solid-state op-amp-driven amps (discrete excluded), BA headphones, and DACs. But I'll leave that be.

Any thoughts on doing a "double blind" cable test? I would really enjoy your thoughts and opinions on this subject.

