AES Headphone Technology Conference Summary
Given the glimpse of what's coming down the tracks offered by engineers at the Audio Engineering Society Headphone Conference, my overwhelming impression walking out the door was that we're about to see a revolution in headphones.
You remember what cell phones looked like back in 1973? 15 years later everyone had one. Today, it's your phone, camera, email, calendar, watch, newspaper, TV, stereo, book library, and things we wouldn't have even thought of back then like Facebook, tape measure, GPS, and a million other things. What we carry in our pocket today would look like magic to someone 50 years ago.
You can expect headphones to undergo an even more dramatic and rapid development in the future. Let's take a look at the topical trends in research revealed at the conference.
Individual HRTF Acquisition - It seemed pretty universally understood by researchers that generic HRTFs from dummy heads or anthropomorphic averages are just not going to cut it. Our ears are just too different from one person to the next, making the psychoacoustic cues we hear significantly different. Researchers know they need to develop fast and easy methods for acquiring individualized HRTFs in order to convince us we're hearing sounds from outside our head when we wear our headphones.
It's likely a couple of methods will be used in the future. One may be a service delivered at audiologists' offices, and maybe places like eyeglass stores, where you wear a head tracker and small microphones at the entrance of your ear canals. You look at a wall with a grid on it and a speaker mounted in the middle. Then you move your head around in lazy random circles to cover the grid as the speaker plays test signals. As you do, the audio from your ears and your head position are captured, then processed and converted into a personalized HRTF.
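The math underneath that capture step is essentially impulse-response estimation by deconvolution: the known test signal goes into the room, the ear microphones record what arrives, and dividing one spectrum by the other recovers the ear's impulse response for that head position. Here's a minimal sketch of the idea, assuming NumPy; the function name and the toy chirp are my own invention for illustration, not any particular lab's method:

```python
import numpy as np

def impulse_response(recorded, sweep, eps=1e-12):
    """Estimate an impulse response by frequency-domain deconvolution:
    divide the spectrum of the ear-mic recording by the spectrum of the
    known test sweep. The eps regularization avoids dividing by near-zero
    bins outside the sweep's frequency band."""
    n = len(recorded) + len(sweep) - 1
    R = np.fft.rfft(recorded, n)
    S = np.fft.rfft(sweep, n)
    H = R * np.conj(S) / (np.abs(S) ** 2 + eps)
    return np.fft.irfft(H, n)

# Toy check: the "recording" is the sweep delayed by 5 samples and halved,
# so the recovered response should peak at sample 5 with value near 0.5.
fs = 8000
t = np.arange(fs) / fs
sweep = np.sin(2 * np.pi * (100 + 1900 * t) * t)   # rising chirp
recorded = 0.5 * np.concatenate([np.zeros(5), sweep])
h = impulse_response(recorded, sweep)
print(np.argmax(np.abs(h)))  # 5
```

Do that for enough head positions and you have a set of head-related impulse responses; Fourier-transform them and you have the HRTF.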
The other method will likely be some sort of optical 3D capture of your ears and various body measurements. This may be done by a service similar to the above, or it might simply be implemented through 3D capture using a cellphone camera. Once an accurate anthropomorphic model is made of the user, a full coverage HRTF can be calculated.
Imagine this: You fire up your smartphone for the first time. During the audio set-up process it asks you to tape a quarter to your forehead for scale...hell, they might even include some Post-it notes with targets in the box. Then you take a selfie. It asks you to do the same thing pointing at your left and right ear; and then you take a movie of your ears while sweeping the phone around from various angles. Then it asks you some questions like: weight; height; gender; race; belt size....who knows. Once it's got all the info it needs, it creates an anthropomorphic model of you, and SHAZZAM! you've got an HRTF. All right in the comfort of your own living room...just before you send it to your Xbox9 and start blowing people's heads off.
Simplifying HRTF and BRIR - Imagine how much computational power it's going to take to synthesize any number of sounds coming from any direction and distance, continuously, as you move your head around, without latency. Lots...whole freaking lots.
Researchers are currently looking very hard at how to simplify these calculations. Smartphones are smart, and they're going to get smarter, but man, they're not going to get that smart and still be able to deliver without latency. The computational load is going to be distributed and much of it will be done in the headphones themselves, but it still has to be manageable.
Researchers are currently trying to find minimum perceptible resolution requirements for things like: How many discrete HRTF angles are needed? How many microphones are needed in a spherical array to accurately capture a sound field for virtual display? How much can the binaural room impulse response be simplified with an artificial reverberant decay significantly shortening the BRIR? The computational problem of virtual audio synthesis is huge, so researchers are going to spend a lot of effort making it as simple as possible.
Subjective Testing - This whole science of how to fool you into perceiving that a sound is coming from outside your head is completely subjective. Researchers may know how to measure the sound in your ear canal and how it might change with angle, but they don't know if it's fooling you or not. At every step along the way, researchers must do the expensive and time-consuming task of performing subjective tests on living, breathing listeners.
Much of the work shown at the conference was centered around making subjective testing more efficient. Things like: Improved statistical analysis methods allowing fewer subjects, fewer trials, and faster post-test data reduction and analysis. Better methods to allow test subjects to express themselves meaningfully through standardized vocabulary and better tablet user interfaces. And even finding out for sure what people want in headphones. (Spoiler: People say they want sound quality first, but when it's tested it comes in third after design and comfort.)
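To give a feel for the statistics side of listening tests, here's the textbook exact binomial analysis of an ABX trial in plain Python; this is a standard calculation, not any specific method presented at the conference:

```python
from math import comb

def abx_p_value(correct, trials):
    """Exact one-sided binomial p-value for an ABX listening test:
    the probability of getting at least `correct` answers right in
    `trials` trials if the subject were purely guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(round(abx_p_value(12, 16), 4))  # 0.0384
```

Twelve correct out of sixteen already clears the usual p < 0.05 bar, which is why smarter test designs matter: they tell you the minimum number of subjects and trials you can get away with before fatigue sets in.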
Controlled Acoustic Transparency - Remember the goal of all this is a mixed auditory reality where you can hear artificially created audio seemingly coming from anywhere around you, but you must also be able to hear everything in your environment just as you normally would. In other words, if nothing is playing in the headphones, you would continue to hear the sounds around you, where they were coming from, and at their proper distance. Headphones need to be completely acoustically transparent, and that's likely to have to be accomplished electronically. This problem alone is huge.
Now add to it smart noise canceling. Imagine you're sitting at a table in a crowded, noisy restaurant. You say, "Identify table mates." You look directly at each in turn, "That's John....that's Mary...that's Mom. Engage conversation mode!" All sorts of DSP stuff kicks in and from then on you can hear your friends clearly, and the noisy restaurant quiets to a murmur. That's going to be cool, but man is that complicated.
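The core trick behind "conversation mode" is microphone-array beamforming. The simplest member of that family, delay-and-sum, is easy to sketch: shift each microphone's signal so sound from the look direction lines up, then average, so the talker adds coherently while restaurant babble partially cancels. A bare-bones version assuming NumPy, with invented names and integer-sample delays only (real systems use fractional delays and far fancier adaptive filters):

```python
import numpy as np

def delay_and_sum(mics, delays_samples):
    """Steer a mic array: advance each channel by its steering delay so the
    wavefront from the look direction aligns, then average the channels."""
    out = np.zeros(mics.shape[1])
    for ch, d in zip(mics, delays_samples):
        out += np.roll(ch, -d)  # integer-sample steering delay (sketch only)
    return out / len(mics)

# Toy check: a 440 Hz "voice" arrives at 3 mics with different delays,
# buried in independent noise; beamforming should reduce the noise power.
rng = np.random.default_rng(1)
n = 4000
sig = np.sin(2 * np.pi * 440 * np.arange(n) / 16000)
delays = [0, 3, 6]
mics = np.stack([np.roll(sig, d) + 0.5 * rng.standard_normal(n) for d in delays])
out = delay_and_sum(mics, delays)
err_single = np.mean((mics[0] - sig) ** 2)
err_beam = np.mean((out - sig) ** 2)
print(err_beam < err_single)  # True: noise power drops roughly 3x with 3 mics
```

Now imagine doing that adaptively, per talker, while tracking your head and gaze. Cool, and complicated.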
Summary - To all that above stuff add a full-size headphone with a drop-down transparent visor that delivers mixed-reality vision (like the Microsoft HoloLens) and you've got one heck of a gizmo. Kids can hear the Pokémon rustling around and giggling in the hedgerow; they'll catch fleeting glimpses as they begin the chase; and then a bevy of Pokémon will run out of the bushes and into the street with your kids running behind. Plenty of money in that...and danger. Too many people are already getting killed walking out in front of trains with their headphones on. I shudder to think about inattentional blindness in the future.
But you'll also be able to go to Japan and read all the train station signs. And when you ask for directions, you'll hear the answer translated into English. Blind people will have cameras on their headgear and will get spoken directions and warnings when out and about thanks to autonomous car technologies. They'll ask, "Where's the shoe store?" And the headphones will leave a virtual sonic breadcrumb trail for them to follow. Need I go on?
A quality consumer headphone 10 years from now will cost what a quality consumer smartphone costs: in today's environment, about $800. And you'll likely pay for it just like you did for your phone: through your cell provider.
The question I don't know how to answer is: What are high-end audio enthusiast headphones going to look like then?
I'd love to hear what you think in the comments.