AES Headphone Technology Conference Summary

From the look of the current market you may think the headphone train is moving along pretty fast. Well, you ain't seen nothing yet. The photo above is a very high speed exposure of the pistons in a new engine that's about to be hooked up to the front of that train...and it's already running at very high RPM.

Given the glimpse of what's coming down the tracks by engineers at the Audio Engineering Society Headphone Conference, my overwhelming impression walking out the door is that we're about to see a revolution in headphones.


You remember what cell phones looked like back in 1973? 15 years later everyone had one. Today, it's your phone, camera, email, calendar, watch, newspaper, TV, stereo, book library, and things we wouldn't have even thought of back then like Facebook, tape measure, GPS, and a million other things. What we carry in our pocket today would look like magic to someone 50 years ago.

You can expect headphones to undergo an even more dramatic and rapid development in future. Let's take a look at the topical trends in research revealed at the conference.

Individual HRTF Acquisition - It seemed pretty universally understood by researchers that generic HRTFs from dummy heads or anthropometric averages are just not going to cut it. Our ears are just too different from one person to the next, making the psychoacoustic cues we hear significantly different. Researchers know they need to develop fast and easy methods for acquiring individualized HRTFs in order to convince us we're hearing sounds from outside our head when we wear our headphones.

It's likely a couple of methods will be used in future. One may be a service delivered at audiologists, and maybe places like eyeglass stores, where you wear a head tracker and small microphones at the entrance of your ear canals. You look at a wall with a grid on it and a speaker mounted in the middle. Then you move your head around in lazy random circles to cover the grid as the speaker plays test signals. As you do, the audio at your ears and your head position are captured, then processed into a personalized HRTF.
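To make the "cover the grid" idea concrete, here's a minimal sketch of how a capture app might track which head directions have been measured so far. Everything in it is an assumption for illustration: the function name, the angle ranges, and the 10-degree cell size are hypothetical, not details from any system shown at the conference.

```python
def grid_coverage(head_angles_deg, az_range=(-60, 60), el_range=(-30, 30), cell_deg=10):
    """Bin (azimuth, elevation) head-tracker samples into grid cells.

    Returns the fraction of cells visited and the set of visited cells.
    Ranges and cell size are illustrative assumptions.
    """
    az_cells = (az_range[1] - az_range[0]) // cell_deg
    el_cells = (el_range[1] - el_range[0]) // cell_deg
    visited = set()
    for az, el in head_angles_deg:
        # Ignore samples that fall outside the measurement grid.
        if az_range[0] <= az < az_range[1] and el_range[0] <= el < el_range[1]:
            cell = (int((az - az_range[0]) // cell_deg),
                    int((el - el_range[0]) // cell_deg))
            visited.add(cell)
    return len(visited) / (az_cells * el_cells), visited


# Hypothetical tracker samples: two land in the same cell, one is off the grid.
fraction, cells = grid_coverage([(0, 0), (5, 5), (-60, -30), (100, 0)])
print(f"{fraction:.1%} of the grid covered")
```

The session would keep prompting you to move your head until the covered fraction gets close enough to 100%.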

The other method will likely be some sort of optical 3D capture of your ears and various body measurements. This may be done by a service similar to the above, or it might simply be implemented through 3D capture using a cellphone camera. Once an accurate anthropometric model is made of the user, a full coverage HRTF can be calculated.

Imagine this: You fire up your smartphone for the first time. During the audio set-up process it asks you to tape a quarter to your forehead for scale...hell, they might even include some Post-it notes with targets in the box. Then you take a selfie. It asks you to do the same thing pointing at your left and right ear; and then you take a movie of your ears while sweeping the phone around from various angles. Then it asks you some questions like: weight; height; gender; race; belt size....who knows. Once it's got all the info it needs, it creates an anthropometric model of you, and SHAZZAM! you've got an HRTF. All right in the comfort of your own living room...just before you send it to your Xbox9 and start blowing people's heads off.

Simplifying HRTF and BRIR - Imagine how much computational power it's going to take to synthesize any number of sounds coming from any direction and distance, continuously, as you move your head around, without latency. Lots...whole freaking lots.

Researchers are currently looking very hard at how to simplify these calculations. Smartphones are smart, and they're going to get smarter, but man, they're not going to get that smart and still be able to deliver without latency. The computational load is going to be distributed and much of it will be done in the headphones themselves, but it still has to be manageable.
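For a rough feel of the scale, here's a back-of-envelope sketch. The numbers are my own illustrative assumptions (48 kHz audio, a half-second BRIR, eight virtual sources), not figures from the conference, and it counts the brute-force case where every source is filtered sample by sample in the time domain.

```python
def direct_convolution_macs_per_second(sample_rate_hz, brir_seconds, num_sources, ears=2):
    """Multiply-accumulate operations per second for time-domain FIR filtering.

    Each output sample needs one multiply-accumulate per filter tap,
    per ear, per virtual source.
    """
    taps = int(round(sample_rate_hz * brir_seconds))  # filter length in samples
    return sample_rate_hz * taps * ears * num_sources


# Assumed scenario: 48 kHz, 0.5 s BRIR, 8 simultaneous sources, 2 ears.
macs = direct_convolution_macs_per_second(48_000, 0.5, 8)
print(f"{macs / 1e9:.1f} GMAC/s")
```

Practical renderers typically use FFT-based convolution, which is far cheaper than this brute-force count, but the sketch shows why filter length and source count are the first things anyone would want to trim.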

Researchers are currently trying to find minimum perceptible resolution requirements for things like: How many discrete HRTF angles are needed? How many microphones are needed in a spherical array to accurately capture a sound field for virtual display? How much can the binaural room impulse response be simplified with artificial reverberant decay significantly shortening the BRIR? The computational problem of virtual audio synthesis is huge, so researchers are going to spend a lot of effort making it as simple as possible.
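Here's one way to picture the BRIR shortening idea, as a hedged sketch rather than anyone's published method: keep only the early part of the measured response and stand in for the late reverb with exponentially decaying noise. The function names, the 50 ms split point, and the 0.4 s RT60 are all assumptions for illustration, and the "measured" BRIR is just a synthetic placeholder.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples (0.0 for an empty list)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def synthetic_tail(num_samples, sample_rate_hz, rt60_seconds, start_level, seed=0):
    """White noise whose amplitude envelope falls by 60 dB over rt60_seconds."""
    rng = random.Random(seed)
    # 60 dB is an amplitude factor of 10**-3, spread evenly over the RT60.
    decay_per_sample = 10.0 ** (-3.0 / (rt60_seconds * sample_rate_hz))
    return [start_level * rng.uniform(-1.0, 1.0) * decay_per_sample ** n
            for n in range(num_samples)]


def simplify_brir(brir, sample_rate_hz, keep_seconds, rt60_seconds, tail_seconds):
    """Split a BRIR into a short measured part and an artificial decaying tail."""
    keep = min(len(brir), int(round(keep_seconds * sample_rate_hz)))
    early = list(brir[:keep])
    # Match the tail's starting level to the end of the kept early part.
    level = rms(early[-max(1, keep // 10):])
    tail_len = int(round(tail_seconds * sample_rate_hz))
    return early, synthetic_tail(tail_len, sample_rate_hz, rt60_seconds, level)


fs = 48_000
measured = synthetic_tail(fs, fs, 0.4, 1.0, seed=1)  # placeholder for a real BRIR
early, tail = simplify_brir(measured, fs, keep_seconds=0.05,
                            rt60_seconds=0.4, tail_seconds=0.35)
print(len(measured), "measured samples reduced to", len(early), "plus a synthetic tail")
```

The open question the researchers are chasing is how short the measured part can get before listeners notice the swap.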

Subjective Testing - This whole science of how to fool you into perceiving that a sound is coming from outside your head is completely subjective. Researchers may know how to measure the sound in your ear canal and how it might change with angle, but they don't know if it's fooling you or not. At every step along the way, researchers must do the expensive and time-consuming task of performing subjective tests on living, breathing listeners.

Much of the work shown at the conference was centered around making subjective testing more efficient. Things like: Improved statistical analysis methods allowing fewer subjects, fewer trials, and faster post-test data reduction and analysis. Better methods to allow test subjects to express themselves meaningfully through standardized vocabulary and better tablet user interfaces. And even finding out for sure what people want in headphones. (Spoiler: People say they want sound quality first, but when it's tested it comes in third after design and comfort.)

Controlled Acoustic Transparency - Remember the goal of all this is a mixed auditory reality where you can hear artificially created audio seemingly coming from anywhere around you, but you must also be able to hear everything in your environment just as you normally would. In other words, if nothing is playing in the headphones, you would continue to hear the sounds around you, where they were coming from, and at their proper distance. Headphones need to be completely acoustically transparent, and that's likely to have to be accomplished electronically. This problem alone is huge.

Now add to it smart noise canceling. Imagine you're sitting at a table in a crowded, noisy restaurant. You say, "Identify table mates." You look directly at each in turn, "That's John....that's Mary...that's Mom. Engage conversation mode!" All sorts of DSP stuff kicks in and from then on you can hear your friends clearly, and the noisy restaurant quiets to a murmur. That's going to be cool, but man is that complicated.

Summary - To all that above stuff add a full-size headphone with a drop-down transparent visor that delivers mixed-reality vision (like the Microsoft HoloLens) and you've got one heck of a gizmo. Kids can hear the Pokemons rustling around and giggling in the hedgerow; they'll catch fleeting glimpses as they begin the chase; and then a bevy of Pokemons will run out of the bushes and into the street with your kids running behind. Plenty of money in that...and danger. Too many people are already getting killed walking out in front of trains with their headphones on. I shudder to think about inattentional blindness in the future.

But you'll also be able to go to Japan and read all the train station signs. And when you ask for directions, you'll hear the answer translated into English. Blind people will have cameras on their headgear and will get spoken directions and warnings when out and about thanks to autonomous car technologies. They'll ask, "Where's the shoe store?" And the headphones will leave a virtual sonic breadcrumb trail for them to follow. Need I go on?

A quality consumer headphone 10 years from now will cost what a quality consumer smartphone costs. In today's environment that's about $800, and you'll likely pay for it just like you did for your phone: through your cell provider.

The question I don't know how to answer is: What are high-end audio enthusiast headphones going to look like then?

I'd love to hear what you think in the comments.

jgazal's picture

Thank you, Tyll! I am not an expert, but in my opinion you did an excellent coverage.
You summarized the limitations (how much precision is needed: tetrahedral microphones or 32-capsule Eigenmikes? 3rd order ambisonics?; 32 measuring sources also for personal HRTF acquisition?) and how much latency to compute the needed precision for a subjective low error perception.
Still, I guess/feel that once the transducers achieve a minimum standard of audio quality the choice of binaural rendering of 3D fields over speakers or headphones will remain under the same criteria: Music on the go or at home; Comfort wearing headphones; decor in listening rooms; aesthetics and luxury goods.

ADU's picture

"Thank you, Tyll! I am not an expert, but in my opinion you did an excellent coverage."

+1. Finally had a chance this weekend to read and better absorb all the articles on the AES conference, and I think Tyll did a very nice job of breaking things down in a way that ordinary lay headphone users like myself can grasp. The "your takeaway" parts were particularly helpful.

What will audiophile cans be like in the year 2026?... Hopefully something like my AKG K553's, but with a smaller and/or better padded headband that fits more users comfortably, and maybe some kind of washable cloth covering on the earpads (to keep my head warmer in the winter, and absorb the sweat in the summer), and a little more accurate/balanced sound.

I'm not askin for a lot here folks. :) Jus sumthin that will fit well and sound good for a reasonable price,... and look good on me while I'm flying around in my hydrogen-powered, self-navigating drone-car.

norb's picture

Hi, isn't this new system called tone control something like HRTF but with a sound that measures your ear canal?

"Tone Control: Full control of adjustment sound signature. It allows bass and treble adjustments without any sound distortion, for true personalized sound - developed with inputs from Grammy winner Quincy Jones." ... taken from this website:

Most of the listeners claim to hear a significant difference. I don't expect them to be neutral audiophile headphones and so I wouldn't buy them, but it seems that they already implemented something like what you were writing. At least they began thinking about how to do it. Don't know if it really works but I wouldn't mind if let's say the Ether C Flow would have something of that kind too in a few years, so that they produce special sound for my special ear canal.

tony's picture

You, Dear Sir and Steve G. were discussing how headphones had become a "Hobby", the iPod and Beats had become the leading forces. You seemed surprised and rather delighted. You showed no inkling of where the Industry would be eventually. I, for my part, was delighted that I was able to find Sennheiser without having to spend endless hours and expense, thank you, I'm happy to report that I'm still delighted with Sennheiser products.

Phew, 5 years hence, I see Smart Phones dominating lives, Smart Phones becoming all things to all people ( even having quad-DAC chipsets = Audiophile aspirations in the soon to be released V20 LG Phone ).

But wait, you say, there's more coming, much more.

I think you're right, there is much more coming, most of it good.

I see Audiophile quality ( like RMAF 2011 was discussing ) available to all citizens ( whether they want it or not ) from their little hand held phone. Superb transducers already exist to take good advantage of the Phone's limited output power.

I no longer see the need for elaborate electronic devices, high powered Amplification and perhaps a smaller number of surviving Manufacturers to make the gear Headphone hobbyists are buying today but the Hobby will continue, just as vinyl 33 has survived.

This summary you've just presented is perhaps your finest Prose and most acute Philosophical analysis, extrapolating the destinations this journey will take us is invigorating and inspirational, once again, I give you 5 Stars!

Just today, my son found a 78 Record at a Jumbles Sale. He bought it and presented it to me, asking if I could play it. I thought of the Record as the very place I started in life. He discovered the Jumbles Sale on his iPhone 6s as he was returning home from work. Now, as I write this, I'm understanding that: 78s to iPhones is just the beginning but it's one hell of a beginning!

For my part, I'll be working on Solar stuff, in Florida.

Thank you for this superb journalism,

Tony in Michigan

Tyll Hertsens's picture
You're welcome, Tony. Didja find a little turntable for that 78?
tony's picture

Hello Tyll,

I and my son were deep into TT sales, back in the day, he knows I won't be playing records. That 78 is just a missing 'snap' out of the Photo Album of my early life, perhaps he's reminding me of just how Ancient I am ( in a nice way ). I suppose I'll mount the Record on a wall hanging plinth. In 1985, Ortofon still sold a 78 MM Cartridge and VPI TT had 78 Capability.

Tony in Michigan

Cheche's picture

Whenever I see words like sounds from outside our heads in the later articles I can't help thinking about the retired list of WOF. Did it ever happen to have one below the line punching through, back to the celebrated ones?

pf-THX's picture

Thanks for your writeup. It is very good. It was also a pleasure meeting you.

Tonmeister's picture

As the co-chair of the AES Headphone Technology Conference I want to thank you for the excellent coverage of the conference and summary of the papers. I'm so glad you were able to attend.

You continue to do a great service for the headphone community distilling scientific and technical information to people who might not otherwise have access to it. Plus, you make the information digestible without distorting the facts. Kudos to you.
Sean Olive

Tyll Hertsens's picture
Thanks, Sean. That was my biggest struggle: To keep track of what they were actually saying. It's pretty dense stuff...much of it far over my head. I'm very happy you think I did the work justice.

And thanks for your work keeping the effort on well directed rails.

castleofargh's picture

I'm only annoyed that all those things weren't available 10 years ago. if cellphones are considered now, it means that computer processing has been enough for a while. but I guess that's the usual pace of the audio industry as a whole entity, wait 10 years, see someone else doing it, then try to jump on the boat before it's really too late.
this time it's clearly the crappy 3D goggles that have forced the sound to also move on to moving 3D.

thank you for the coverage, nothing to really call a revolution, but a few interesting alternative ideas on how to make custom measurements. I can't wait for the day when people stop thinking that a headphone without DSP can have a great soundstage.

3VRMS's picture

Don't know if you have heard of Viper4Android. I've been using this for a while on an old Samsung S3 with my Philips SHE3890, and it has been amazing:

Might be a hassle to set up sometimes, but I highly recommend everyone to give it a try.

After setting Viper4Android up on a rooted Android phone (required), unzip the impulse response samples and choose them in "Convolver," based on the sampling rate of the phone/external DAC that you play out of the phone.

Really makes a big difference for me. What is better is that it can also use impulse response samples for digital compensation, making headphones as neutral as possible based on measurements (many of which come from your site, too! :D ). I use it with my Philips SHE3590 and it really offers a great listening experience on the cheap.

Hope you guys benefit from it too. put the accelerometer in smartphones these days to good use...
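For anyone curious what a convolver does with those impulse response samples, here's a minimal sketch of plain time-domain convolution. This is not Viper4Android's actual code, just the textbook operation; the function name and the tiny example signals are made up for illustration. It also shows why the impulse response has to match the playback sampling rate: the two lists are lined up sample by sample.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution: each input sample launches a scaled copy
    of the impulse response into the output."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


# Made-up three-sample signal and two-tap impulse response.
print(convolve([1.0, 2.0, 3.0], [1.0, 0.5]))  # prints [1.0, 2.5, 4.0, 1.5]
```

A real-time app would more likely do this with FFTs in blocks, since the nested loop gets expensive fast with long impulse responses.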