Big Sound 2015 Pyrates Aboard!

(Ed Note: No joke...there are some very good and legit reasons Mike's face is not photographed. 'Nuf said.)

Just a little background here for those unaware of the inner-workings of the headphone hobby. Head-Fi.com is the mothership of all headphone enthusiast forums. It has a culture (if one can even really wrap this type of thing up in two words) of cordial collegiality, which makes it a comfortable place for the masses, but sometimes at the expense of the learning that occurs with heartfelt disagreement and the difficult dialog that follows.

Changstar.com is just the opposite: They'll happily agree to disagree, and then just carry on disagreeing. Not in an angry or insulting way, just in that if you have an opinion, you're expected to be able to hold it out there confidently regardless of the contrary thoughts of others. If you're thin-of-skin, Changstar.com is going to look awfully rough and tumble. But if you're well prepared to state your best formed opinion, and have a centered, stable sense of self, you may find the feedback there illuminating...maybe a little thrilling...

It's not for everyone.

Additionally, many of the pyrates at Changstar.com have built their own measurement systems and are heavily into modifying headphones to improve on stock performance. I mention that because some of the talk in the video will be about the state of the art in modified cans.

Just to complete the trifecta, Head-Case.org is also an important resource for headphone enthusiasts, especially those interested in electrostatic headphones and DIY headphone amplifiers.

N00bs be warned! Head-Fi is probably your best first stop, followed shortly thereafter by long-term lurking at the other two sites to learn their respective cultural norms.

Anywho...Anax and Hands spent the weekend working at blind tests and listening to the headphones available. Blind tests were attacked with heretofore unparalleled zeal, and headphones were plugged and unplugged with unceremonious vigor as the iPad app rapidly requested one reference track after another from the obedient Aurender. None of this really bothered me at all...until, after returning them to the hotel Saturday evening, I came back to shut down the room and found it in need of airing out, lest the stench of a small harbor at low tide permeate the rest of my home.

Good times.

Blind Tests
Tyler did a top-notch job of identifying the Bakoon current source amp, the TTVJ Teton, and the Moon 430HA with the Sennheiser HD 800. The HD 800's widely varying impedance interacts strongly with the very high output impedance of the Bakoon, less strongly with the lower (about 120 Ohm) output impedance of the Teton, and essentially not at all with the Moon's sub-0.5 Ohm output impedance.

When we switched over to the HiFiMAN HE-1000, which has a very constant, flat impedance response, the differences were much more difficult for Tyler to identify. He spent quite a bit of time training himself to hear them, and once he found his groove he made only 3 errors out of a possible 15 in his 5-trial tests.
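Why output impedance matters so much more with the HD 800 than with the HE-1000 can be sketched with a simple voltage-divider model: the fraction of the amp's voltage that reaches the headphone is Z_load / (Z_load + Z_out), so a headphone whose impedance changes with frequency gets a frequency-dependent level change. This is a simplified sketch, not a measurement. Only the Teton's roughly 120 Ohm and the Moon's sub-0.5 Ohm output impedances come from the text; the headphone impedance figures and the Bakoon's very large output impedance (a stand-in for a near-ideal current source) are hypothetical values chosen for illustration.

```python
import math

# Hypothetical impedance magnitudes in ohms, NOT measured values:
# "nominal" is the flat region, "peak" is the impedance peak.
HEADPHONES = {
    "HD 800 (assumed)": {"nominal": 300.0, "peak": 650.0},
    "HE-1000 (assumed)": {"nominal": 35.0, "peak": 35.0},  # flat impedance curve
}

# Output impedances in ohms. Teton and Moon follow the text; the Bakoon
# value is an assumed stand-in for a near-ideal current source.
AMPS = {
    "Moon 430HA": 0.5,
    "TTVJ Teton": 120.0,
    "Bakoon (assumed)": 1e6,
}


def level_db(z_load: float, z_out: float) -> float:
    """Voltage across the load, in dB relative to the amp's open-circuit voltage."""
    return 20.0 * math.log10(z_load / (z_load + z_out))


def response_shift_db(z_out: float, z_nominal: float, z_peak: float) -> float:
    """How much louder the impedance-peak region plays than the nominal region, in dB."""
    return level_db(z_peak, z_out) - level_db(z_nominal, z_out)


if __name__ == "__main__":
    for hp, z in HEADPHONES.items():
        for amp, z_out in AMPS.items():
            shift = response_shift_db(z_out, z["nominal"], z["peak"])
            print(f"{hp:18s} on {amp:16s}: {shift:+.2f} dB at the impedance peak")
```

With these assumed numbers, the Teton lifts the HD 800's impedance-peak region by roughly 1.5 dB and the current-source stand-in by about 6.7 dB, while the Moon's shift is negligible. The flat-impedance HE-1000 shows 0 dB on every amp, which is consistent with the amps being much harder to tell apart on it.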

Mike, on the other hand, shot for the brass ring and decided to try to blind test the differences between the Schiit Yggdrasil and Antelope Zodiac Platinum DAC (without clock). This is the test that both Bob Katz and I failed, and Mike similarly did no better than a coin toss during his Saturday attempt at a series. In each case, however, we all felt we heard differences...small, subtle differences that tend to be obscured by the cruel nature of blind testing.

I cannot emphasize enough how difficult blind testing is, not because the differences are so small (which, of course, they sometimes are) but because your mind plays such nasty tricks on you in the process. Doubts, fears, and uncertainties about one's personal capabilities nag at you; as you start to get an inkling of which is which while blind, you also tend to project biases based on what your guess might be, which causes you to hear it a little differently and second-guess yourself. When journalists take a poke at audiophiles by performing blind tests with untrained, man-off-the-street listeners, they do a huge disservice to the true nature of the variations in audio reproduction. Blind testing is neither easy nor reliable unless listeners have a degree of experience and conditions are very well controlled; at that point, results can be meaningful.

On Sunday, however, Anax arrived with his eye-patch and game face on. "I want to do it again. I am going to crush this test!" Seems to me Mike has the word "determination" spelled out somewhere in his genome. He spent at least three hours rigorously learning, then executing, a blind test differentiating between the Schiit Yggdrasil and Antelope Platinum DAC. (Again, the Rubidium clock was not used, as it creates a huge disparity in price between the two.) During the test, Mike might listen back and forth between DACs for 15 minutes, take a five-minute stretch break to clear his mind, and then go back to continue evaluating. Quite near the end, he had a score of six right out of eight trials. I told him, "Mike, that's 75%, that's statistically valid and damned good. You've been at it for a long time now; you should stop right here." But the Anax determination kicked in, and he wanted to go for two more to make a 10-trial series.

He missed the next one.

Argh.

We agreed he should stop at 6/9, still giving a solid 66% score. There's no doubt he could tell the difference blind; he was just worn out.

His take was that the Yggy was the more "incisive" sounding DAC. As we talked, we agreed his experience might be similar to mine, where I felt the Yggy was the more dynamic of the two DACs. He didn't comment on image width or depth, so there was nothing to compare with Bob Katz's impression that the Antelope had a wider and deeper image, which I thought I heard as well.


The Headphones
Unlike other visitors to Big Sound 2015, the two pyrates had significant experience with, and well-formed opinions of, most of the new headphones present. Their opinions are quite complex; it really is best to point you to their informative (and, at 50 minutes, very long) video rather than try to digest an amputated version. It's also a good opportunity to hear how two experienced headphone enthusiasts think and talk about modern-day headphones.

I will offer this personal opinion, however: just as Bob Katz thought this crop of ultra-high-end headphones falls short of the state of the art in studio monitors, Mike, Tyler, and I all felt headphones still have a way to go before they're really hitting the nail on the head.

What Bob doesn't have an eye for is how far we've come with headphones in the past 10 to 15 years, and how these new cans do represent significant corporate learning in getting ever closer to an ideal headphone sound. All three of us thought the companies represented at Big Sound 2015 were making forward strides. It may sometimes be two steps forward and one back, but it is progress...and that's a good thing.

With that, I'll leave you with nearly an hour of pyrate dialog on the poop deck of the good ship Changstar.com. Enjoy!

Argh!

COMMENTS
TMRaven

AAAAAAAAAARRRRRRR!!!!

purrin

It occurred to me that the Schiit Yggdrasil and Antelope Zodiac Platinum DAC could be more similar than different, both being high-end DACs.

I would have liked to see a comparison between one of the above with an ODAC or BM.

Shike

"We agreed he should stop at 6/9, still giving a solid 66% score. There's no doubt he could tell the difference blind; he was just worn out."

That's still a 25% chance of guessing on a low trial test . . .

Tyll Hertsens
He was hearing a difference.

Doesn't surprise me given the very different nature of the DACs.

Shike

Tyll,

Are you familiar with stats? Based on the outcome of his testing, there's still a 25% chance of guessing. That isn't sufficient evidence to make such a conclusion. You are jumping the gun on this.

Take this for example:

We're testing to see if an individual is psychic.

We have two cards, one ace and a joker.
We run nine tests.
Six times they find the ace, three times they find the joker.

Would we argue they are psychic? Based on the stats (25% chance of guessing) with the low trials we would not. I know you've been in touch with some statisticians before, I suggest you ask their opinion if you think I'm barking up the wrong tree. The math on it is pretty black and white.
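Shike's 25% figure is the one-sided binomial tail for a pure guesser, assuming each of the 9 trials is an independent two-way choice guessed correctly with probability 1/2, scoring 6 or more:

$$
P(X \ge 6) = \frac{1}{2^{9}} \sum_{k=6}^{9} \binom{9}{k} = \frac{84 + 36 + 9 + 1}{512} = \frac{130}{512} \approx 0.254
$$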

steaxauce

I just want to point out that the results aren't valid if you stop the test when he's doing best, because you greatly increase the probability of getting a positive result by chance if you allow yourself to do this. You have to decide in advance how many trials you're going to do and then stick to it, otherwise you have to throw the results out.
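The effect of stopping "when he's doing best" can be made concrete by exact enumeration. This is a minimal sketch assuming a pure guesser (each trial an independent 50/50) and a hypothetical stopping rule: peek after every trial from 6 through 10 and stop the first time the running score is at least 75%. The function names are illustrative, not from any library.

```python
from fractions import Fraction
from itertools import product


def fixed_n_false_positive(n: int, k_min: int) -> Fraction:
    """Chance a pure guesser scores >= k_min out of n when n is fixed in advance."""
    hits = sum(1 for seq in product((0, 1), repeat=n) if sum(seq) >= k_min)
    return Fraction(hits, 2 ** n)


def optional_stopping_false_positive(max_n: int, min_n: int, threshold: Fraction) -> Fraction:
    """Chance a pure guesser's running score reaches >= threshold at ANY
    checkpoint between min_n and max_n trials (stop as soon as it does)."""
    hits = 0
    # Every length-max_n sequence of guesses is equally likely, so counting
    # sequences whose prefix ever meets the rule gives the exact probability.
    for seq in product((0, 1), repeat=max_n):
        running = 0
        for n, correct in enumerate(seq, start=1):
            running += correct
            if n >= min_n and Fraction(running, n) >= threshold:
                hits += 1
                break
    return Fraction(hits, 2 ** max_n)


if __name__ == "__main__":
    fixed = fixed_n_false_positive(8, 6)
    peeking = optional_stopping_false_positive(max_n=10, min_n=6, threshold=Fraction(3, 4))
    print(f"fixed 8 trials, need 6+:       {fixed} = {float(fixed):.3f}")
    print(f"peek after trials 6-10, >=75%: {peeking} = {float(peeking):.3f}")
```

Because trial 8 is one of the checkpoints, the peeking rule can only add ways to "pass", so its false-positive rate is strictly higher than the fixed design's 37/256: a guesser who goes 5/6 and then misses twice passes at trial 6 but fails a fixed 6-of-8 criterion.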

anaxilus

To be precise, Tyll wanted to end the test when it actually ended. The test ended at 8 for everyone as the standard. I was the one who decided to go beyond the test parameters because I knew a certain select group would be predisposed and biased against accepting 6/8 or 75% as statistically valid. It should have occurred to me that shooting for 8/10 or 80% wouldn't have mattered either to the same certain people. I'd be happy to do it again if anyone wants to pay for shipping and insurance so I can use my own amp and transport rather than someone else's gear I'm not familiar with.

steaxauce

Thanks for clarifying. I'd still recommend sticking to the planned number of trials and not going over, because as soon as you go over you find yourself playing the same game of deciding to stop when it suits you, based on some criterion unknown to us, and that prevents the experiment from being useful. In this case, I'd say you can just throw out the last trial and say you got 6/8.

The probability of getting 6/8 or better by guessing is (C(8, 6) + C(8, 7) + C(8, 8))/2^8 = 37/256 = approx. 0.145, or a little over 1/7. (Here C(n, k) is the number of ways of choosing k elements from n elements. See the binomial distribution for the formula I'm using.)

1/7 isn't bad, but it's definitely not statistically significant. Usually you use a significance threshold of p = .05, but I would want an even lower one for this experiment since, given the number of people doing these blind tests, there's a good chance that some people would do that well by chance. It's also a problem that you attempted the test more than once. You might argue that you did better because you were more focused and more methodical the second day, and that may well be true, but I'd still like to see a very significant result if you're going to cite the best of more than one attempt. It's really hard to do this with just 8 trials. 7/8 gives you p=.035, and you have to get 8/8 for p=.004.
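steaxauce's numbers can be checked with a few lines of Python. This is a minimal sketch of the one-sided binomial tail he describes, assuming each trial is an independent two-way choice that a guesser gets right with probability 1/2; `p_value` is an illustrative name, not a library function.

```python
from fractions import Fraction
from math import comb


def p_value(correct: int, trials: int) -> Fraction:
    """One-sided binomial p-value: chance that a pure guesser (1/2 per trial)
    gets `correct` or more right out of `trials`."""
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return Fraction(favorable, 2 ** trials)


for correct in (6, 7, 8):
    p = p_value(correct, 8)
    print(f"{correct}/8: p = {p} = {float(p):.3f}")
# 6/8: p = 37/256 = 0.145
# 7/8: p = 9/256 = 0.035
# 8/8: p = 1/256 = 0.004
```

Exact fractions avoid rounding surprises; the same function gives 130/512 for the 6/9 result discussed earlier in the thread.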

By the way, thanks for posting the HD800 mod. I'm going to be trying it out soon.

anaxilus

Excellent points. The math is the math. As usual, it will be up to individuals to interpret the data and statistics as they see fit. All I have to offer is the data myself and others provided per our performances. We can all have our own ideas as to what it means if anything.

Cool, shoot me a pm if you have any questions about the mod when you do give it a go.

Cheers!

Bob Katz

I'm afraid steaxauce is correct. You can't arbitrarily stop a blind test when you're going great. They don't allow you to do that in Blackjack, do they? Please don't take this as my being against Blind Tests, just to say that there are a lot of gotchas and I cringe when certain posters in our lovely forum say, "Bob, why don't you take a blind test on this?" OK, another month spent on this.

In fact, let's say that Anax got 10 out of 10. I understand that once in every 70 tries (I'm making this up but there is a finite number) you can get 10 out of 10 right. So to prove your statistics scientifically you have to be even more formal than just one listener getting 10 out of 10. Ugggg... I told you blind testing is work!
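Under the usual coin-flip model (each trial an independent 50/50 guess), the number Bob is reaching for is 1/2^10, or 1 in 1024, for a single guesser. His broader point, that one listener's result isn't the whole story, can be sketched as the chance that at least one of several independent guessers reaches a given score. The panel sizes below are hypothetical, and the function name is illustrative.

```python
from fractions import Fraction


def chance_someone_scores(p_single: Fraction, listeners: int) -> Fraction:
    """Chance that at least one of `listeners` independent pure guessers
    reaches a score whose single-guesser probability is p_single."""
    return 1 - (1 - p_single) ** listeners


perfect_ten = Fraction(1, 2 ** 10)   # 10/10 by guessing: 1 in 1024
six_of_eight = Fraction(37, 256)     # 6/8 or better, steaxauce's figure

for listeners in (1, 5, 20):  # hypothetical panel sizes
    print(
        listeners,
        f"10/10: {float(chance_someone_scores(perfect_ten, listeners)):.4f}",
        f"6/8+: {float(chance_someone_scores(six_of_eight, listeners)):.3f}",
    )
```

With just five guessers, the chance that someone reaches 6/8 or better is already better than even (about 0.54), which is why a single listener's score needs a stricter threshold when many people take the same test.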

I'll be happy to organize and take one formal blind test a year and survive.

anaxilus

Just wanted to point out in case you missed my clarification. I arbitrarily continued the test after it was over. There's a difference there from your characterization. If you check your own test sheet, you'll notice the test ends at 8, not 10.

donunus

I don't know what it is with the fatigue kicking in at around 6 trials or so because even with 320 mp3 vs flac foobar ABX, I can get 7/7 and start failing after that.

anaxilus

So can I (under my own conditions). What does that have to do with this particular DAC test exactly? How are they identical?

donunus

Just pointing out that I get listener fatigue with anything that requires concentration, and that maybe it's the same for you. A perfect score of 6/6 is already good enough to show that there is a difference between the DACs, even for people who don't trust in your listening skills. If you were comparing a 64kbps mp3 vs flac, that's a different story, and fatigue would probably only set in at 1000 trials lol
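For reference, under the same coin-flip model used elsewhere in this thread (each trial an independent 50/50 guess, number of trials fixed in advance), a perfect run is the only way to reach 6/6 or 7/7:

$$
P(6/6) = \left(\tfrac{1}{2}\right)^{6} = \tfrac{1}{64} \approx 0.016, \qquad
P(7/7) = \left(\tfrac{1}{2}\right)^{7} = \tfrac{1}{128} \approx 0.0078
$$

Both fall under the conventional p = .05 threshold for a single pre-planned run.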

donunus

or 6/8 for that matter... plus everyone there agrees on what the differences are between the DACs. I don't know what's up with people who can't fathom the possibility that there are actually people in this world who can discern between DACs/amps and whatnot. This goes back to the Stereo Review days with Julian Hirsch and the all-amps-sound-the-same bit. Frustrating.

anaxilus

Thanks for the clarification! I agree. Forming subjective impressions over time at your leisure using a familiar listening protocol is quite different than objective testing with unfamiliar gear and protocols. Tyll had a great recent article about this.

Psychoacoustic adaptation is a powerful effect where the ear and brain adjust over time to compensate for what they're hearing. When I offer impressions normally, I try to do it with fresh ears and mindset as quickly as possible, using known constants in my own chain to avoid this process. Over enough time and at loud enough levels, your brain wants to adapt to any fatiguing sound to make it more palatable. Bright and bassy phones begin to sound more 'neutral', for example.

Regardless, I hope this helps to illuminate that the results of a DBT can be as much a commentary on the difficult process and interaction of such a test with the subject as they are any sort of conclusion to be generalized or extrapolated from. If differences exist, it is repeated testing under the same conditions by experienced and practiced listeners using capable gear (note the skew of results between the HD800 and HE1000) that will tease them out when dealing with the most nuanced of sonic cues.

TMRaven

What is their list of headphone preferences? Every other guest had one.

Bob Katz

I'd sure like to know which amp they used to evaluate them. Since the GSX/Headamp was clearly a faster amp than all the others, I'd love to hear their reactions to the LCD-3's on the Headamp. And they should have heard the equalized LCD-3's on my M3. A perfect fit.

JK

Great write up and video !!!

tony

I heard this group was one to be careful about. Nonsense, they seem like my kind of guys. Certainly not "Blindly loyal Fan Boys" or "prickly types"; these are the kind of guys I work with, strivers for careful accuracy.

I can see why Jude ( who is every Manufacturer's Advocate ) would not welcome folks spending time looking for "Runs in the Stockings".

Once again, the Crows are out there crowing away and now the traffic of Cars & Trucks ( must'ave been Rush Hour )

Phew, these guys are Sennheiser lovers, hmm, good value for money people, again my resonance is resonating.

Planar consistency is what I ran into with my Audeze 8 Open demo vs. what I ended up owning. Stoddard mentions things about manufacturing these devices requiring a commitment.

The DAC insights, this time around, bring better focus on what to expect with these devices.

In manufacturing, Quality & Constancy are the Corner Stones of continued success. Toyota nearly killed the US Auto industry with it. ( but we're back ) and Sennheiser seem to have had it for decades. I saw the Sennheiser factory videos, phew, compare with the Audeze or Grado Shop Videos and you'll get the idea that Sennheiser is a Rolex outfit and the others are Garage outfits. The significance is that a person can build off of and improve from the consistent device. ( as these Changstar lads are able to do )

People seem to say the HD580 is mediocre; I think it's Vanilla ( but the best damn vanilla you can buy! ). I modded my own a few times; it loves modding the way an older woman loves make-up and new clothes.

Tony in Michigan

romaz

I knew I would appreciate the commentary from these guys given the wealth of their experience and they did not disappoint. Nothing at this level is black or white but rather nuances of gray. I know what Mike is saying about the differences he heard between these 2 DACs despite his blind scores possibly showing no statistical significance compared to chance. Fatigue probably did play a role and he might have done better with 4 attempts followed by a long break and then another 4 attempts and so forth.

Regardless, having had the opportunity to evaluate several DACs recently in the controlled setting of my home over a span of time, what I have found is that sometimes the differences that were appreciated were subtle and sometimes they were stark, but more often than not these differences took time to discern and were without question real and reproducible. Sometimes the differences were better appreciated in one part of a track and not another, or in one genre of music and not another. If a certain DAC excels in dynamic range, such a DAC may not really shine at all with a studio recording but would certainly come to life with a live recording of the London Philharmonic, for example. If a certain DAC excels in its ability to reproduce transients, then you wouldn't notice this listening to Diana Krall seductively singing "Peel Me a Grape." Even if a certain DAC is famous for its resolving ability, unless you play a complex track rich in detail and nuance, you might not be able to fully appreciate that DAC either.

My opinion is that the best DACs play everything well, but without question there are certain things that will challenge a DAC more than others, and those challenges don't necessarily come often. In between those challenging moments, all modern DACs might do equally well, which would make one believe they were all on equal footing. My point is that the qualities of a fine DAC (or the differences between 2 DACs) are often best appreciated over time and with a broad variety of content rather than a 10-second loop of music. I believe anyone who tells you their jaw dropped during the first 10 seconds of listening to a DAC is either coming from a very bad DAC to start with or else feeding you a line. So in this sense, this methodology for blind DAC testing may need to be revised, because a blind test where experienced people like Tyll, Bob and Mike are unable to do better than chance has to be considered suspect, in my opinion. If you think about it, in a digital headphone chain (meaning you're not spinning vinyl), you have your source (digital file player or streamer), DAC, amplifier and, finally, the headphone. Of these components, the most expensive production headphone that you can purchase today is the Abyss at almost $5,500. The most expensive dedicated headphone amp I am aware of is the Woo WA234 at $16k with stock tubes. The Aurender W20 (digital music player) that Tyll has in his possession is among the most expensive at about $17k. DACs take the cake, however, as there are dedicated DACs that cost $90,000 and more. Obviously, no one would be buying these devices unless there was a difference.

tony

This certainly is the case.

Things settle down after a goodly bit of time. A person needs time to adjust. And a person does adjust. Still, no glaring, obvious or dramatic difference is being reported about DAC sonic performance. Nothing at all like the differences found from phono cartridges. Back in the day we heard night & day differences between Monster Cable and lamp wire, differences that the man on the street could hear.
That very "dramatic difference" is what I was anticipating but DACs don't seem to offer Drama, I haven't yet discovered it. In fact, I was so concerned that I could not hear properly that I had my own hearing evaluated at the University of Michigan Medical in Ann Arbor, Mi. Personally, I'm relieved that the Golden Ears are having the same result.
For now, I'm having to safely conclude that standard DeltaSigma DAC design pretty much gets it right, the better DACs get it a little bit more right but the small improvement has a multi-thousand price.
And, if a person needs better performance, the improved performance comes easiest and most efficiently from the improved transducer and headphone design.
Phew, it's taken me 4 or more years to finally get to this level of understanding.
Our own Bob Katz may have a perspective of the Professional Authority Creating music designed to be reproduced. He must maintain "Standards", he meets quality levels. He knows what something of his actually sounds like. He isn't allowed to have gear that beautifies his work. Bob Katz needs to see the naked truth. We rely on him to accurately report.
But I need to hit an Average Superb, a wide Average, maybe not an outstandingly Superb ( at any cost ). I can achieve this at the $1,000 level by careful gear selections. ( which is where Tyll came in ).
My professional years in Audio gave me the confidence to know and accept "when to say when" ( as Jason Stoddard describes it ).
This Big Sound 2015 brings me peace, I feel free from the Neurosis that my gear is lacking and the Psychosis that my little ODAC isn't quite as useful as a $10,000 MSB Analog.
Now, I'm free to find a HD800 in Black & Gold, which I may get on my next trip to Germany.
Big Sound 2015 hasn't freed me from my craving to purchase!

Tony in Michigan

SleepModezZ

"Obviously, no one would be buying these devices unless there was a difference." That's not really valid economics. You would have to say: Obviously, no one would be buying these devices unless they thought that there was a difference. And many think that they can hear a difference while in a real blind test they would not. Some just buy the "best available" to make sure that they do not miss on anything.

Surprisingly many are affected by advertising, while we should know better. In medicine people pay to get treatments that do more harm than good because they believe that they do good for them. So people do not actually have to be able to hear any difference to pay thousands of dollars more - and still be totally happy with their purchases. Owning luxury products oozing of quality can be very rewarding - even if they do not sound any better than a product costing a fraction of its price.

Maybe many reason, as you do, that there must be a difference because people are not fools who pay for nothing. While economics often presupposes rational decision makers (homo economicus), there is now much evidence that real people are often quite the opposite. People often do pay more for products that are not better than the cheaper products. (I have not read it, but "Predictably Irrational" by Dan Ariely seems like an entertaining book on this subject.)

You showed that you have extremely good ears and listening skills, but we could still ask whether the differences that you heard were worth thousands of dollars. Was the more expensive equipment better, or just different?

romaz

I agree with you that what you say happens. The placebo effect that you allude to is undoubtedly real as your preconceived ideas and emotions regarding a product can definitely affect your judgement and proper blind testing can effectively bring this to light.

With respect to DACs, there is no doubt in my mind there are differences. Whether these differences, as subtle as they may be, are worth "thousands", well that is an individual value judgement more than anything else, isn't it? I question the business practice of a certain DAC company that is charging $15k for a femto clock upgrade but who am I to argue the wisdom or judgement of those who insist this upgrade is worth it to them? The market will generally dictate whether what a company charges for their product is appropriate and sustainable and it is interesting to note that this certain company has been in business for many years with their business model.

zobel

The analysis of these cans was very succinct, interesting, honest, and very helpful. Having each of your inputs here was a real bonus, and really shed more light on what these headphones can and can't do in this setting.

I'm especially interested in the correlation of preferred sound signature to the measured frequency response, and the current favored target curve. It would be interesting to note other measurements that can be used to point out differences and preferences between the cans on trial here.

It is also very interesting to see the variability between units of particular models. I've heard that even the venerable Sennheiser HD 600 has undergone changes to driver manufacturing over the years it has been in production, and people have reported quite noticeable changes in its sound. Equipment reviews can differ a lot depending on product consistency, which generally leads people to distrust the reviewer when their experiences and opinions disagree with his. Tony's comment about how a lack of quality control and product consistency in the auto industry is eventually found out by consumers, and has a real effect on sales, is well taken. I also agree with him that Sennheiser has an overall excellent track record in QC. The human touch is often necessary in crucial steps of high-end headphone manufacturing, and is performed by highly skilled craftspersons (more often women at Sennheiser). There will be variability in drivers, and the best companies know this and carefully measure and match each pair that passes muster, so that each pair of cans will have drivers that are as identical as possible. This YouTube video is a factory tour of the entire HD 800 production line, and illustrates the precision employed and the testing done for quality control:

https://www.youtube.com/watch?v=5es8zggYM7A

You might have already seen this, if not, be sure to check it out!
Do you think Sennheiser will come out with an HD 850 that incorporates the fixes the mods address?

Good to see you looking well Tyll. Hope you are feeling good again.
Zobel

Hands

It's not much of a secret these days that the HD600/650 underwent some silent revisions that DID change their sound a bit.

That said, they do not have the same type of regular, slight variance that many planar headphones seem to have. Tyll actually measured my HD650, and though slightly modded, it measured surprisingly close to his pair. As did Anax's stock HD800, a newer pair, and Tyll's older HD800.

I've heard three HE1000s now, and they all had slight, but subjectively noticeable differences. I have seen some public and private measurements from 3rd parties on various OPPO planars and also noticed production variability. You won't get that on the HD600/650/800 to NEAR that extent, and it's important to not mix that up with silent revisions over time. Hope that helps. :)

donunus

About Blind testing, the hard part when concentrating to discern between things may also have to do with right brain left brain functionality. When one listens to music and enjoys it, he is using the right side of the brain. When one is trying to be analytical, the left brain starts working. The hard part is getting a good balance of left and right brain functionality at the same time. This is why people say they CLEARLY hear differences in sound quality between equipment yet can still fail a blind test. We are all in this to enjoy the music and tickle our right brain. The left brain can go to hell for all I care lol. Then the annoying part is when people call placebo hahahaha. I hardly think that people like Tyll and Anax that live and breathe audio can still be fooled by placebo.

SleepModezZ

If people would "CLEARLY hear differences in sound quality between equipment," they would not fail a blind test. It just doesn't make any sense to attribute some qualities to one sound reproducing system, and other qualities to another system, and then say that those are valid attributions after failing to differentiate between the systems in a blind test. Left side of the brain, right side of the brain: it doesn't matter at all.

Concentration (and its possible negative consequences) happens only because it is difficult to tell which system is which. If it would be easy - to either side of the brain - you would not need to concentrate. That is just what happens in easy blind listening test - fatigue won't be kicking in and the identification of systems can be done really quickly.

The placebo effect in medicine is not something that only affects some, while some, less gullible individuals, are immune to it. I remember reading of one test where all subjects that were influenced by placebo were eliminated from the next trial. You should have less placebo effect in the next trial, right? Wrong. There was as much placebo effect in the trial with the screened subjects. Placebo effect in listening tests is not the same as placebo effect in responses to medicines, but I would guess that you should not expect some people to be immune to the effect in listening tests, no matter how experienced listeners they are. Maybe Toole said something about this in his presentation that was linked to from an article here a while ago?

donunus

Don't underestimate the power of the left and right brain as it applies to this. You may be able to enjoy music more when not analyzing, and the magic of one DAC can make you enjoy the music even more than the other, but once you start stressing and trying to pinpoint what makes it better, you shift to the left brain and all that magic goes away. I know it's hard to imagine for science people, but there are things in this world that haven't been explained by science yet. Some people are good at shifting between left and right brain, and can get better at it with more training, though.

SleepModezZ

It goes over my head how it can be possible to hear and describe differences between two systems, and then fail to identify which is which.

According to you, for some reason the "right side of the brain" seems capable to eloquently attribute different qualities to different sound sources in a sighted environment, but then it is incapable to identify *when* it is hearing one or the other in a blinded environment. How could that be even possible?

Blinding a listening environment should not blind "the right side of the brain" - it just blinds the knowledge of the sound source. All differences in sound should be as clear in blinded setting as they are in a sighted setting. The concentration and all other "left side of the brain" butting in is just a consequence of the hardness of the task of differentiating the sources.

You seem to think that it goes like this: Blinded setting -> "left side of the brain" butting in -> hard to identify two sounds that would otherwise be easy to tell apart.

I think it goes like this: Hard to identify two sounds -> concentration, analyzing and even fear of failure kicks in -> failure in identification is attributed to the "left side of the brain butting in". I agree that trying really hard can have a negative effect on performance, but, no, it is not the reason for the failure. The real reason for failing is the obvious one: The differences in the sources are so small that it is just plain hard to notice them.

"Brain-games" are an effect of difficulties, not the source of them. They are not even a necessary effect of blind listening, only of the difficulties in identifying the sources: Trying hard to find something, then finding something and becoming sure that you can hear a difference, just to be shown that you were wrong, and getting frustrated and fatigued with it all.

Mental fatigue happens when something is difficult and the person (or his/her brain) tries to put more effort into it. Some research with patients with fibromyalgia showed more activation of their brains as a whole compared to healthy subjects when doing some mental tasks. One hypothesis was that the brain tries to compensate for the difficulties they were having in completing the tasks. Something like that might be happening in these listening tests also. "The left side of the brain" butting in is just the brain's way to compensate for a difficult task. It won't happen if the task is easy and can be done effortlessly.

Some sort of blinding is necessary for getting valid information. Toole's presentation had enough evidence for that claim. The blinding could be achieved with other, less stressful, ways if necessary. You could give more time, use equipment that the listener is familiar with, give fake sight information etc. But the blinding is necessary in one form or other.
