04 July 2017, 02:18 am   #6
Shy
Join Date: Jul 2004
Posts: 372

There are probably several ways to implement multichannel lossy audio coding so that it is much more efficient than stereo-optimized coding. But I suppose no one will bother any time soon: it is difficult to implement, and there are many multichannel configurations, none of which is close to being a guaranteed future standard, let alone an agreed-upon "proper" configuration.

The psychoacoustic considerations in the multichannel audio field, even among the most experienced professionals in the world, such as Dr. David Griesinger, are mainly driven by "practicality" and partial backward compatibility rather than by accuracy and science. Speaker configurations have many problems in general. Even simple stereo has serious acoustic and psychoacoustic problems: unless an advanced "phase cancellation" technique is applied during playback, each ear hears both speakers, so everything you hear from a pair of speakers has a greatly incorrect stereo field that hardly matches the source audio's stereo field as heard with headphones or earphones, which ensure that sound from each channel reaches only one ear. Existing implementations of phase cancellation (such as "transaural" processing) are greatly limited. They sometimes work amazingly well with some synthetic signals in some frequency ranges, but they have nothing that simulates real speaker-optimized HRTFs (head-related transfer functions), so the result sounds mostly weird and fatiguing. So stereo through two speakers is somewhat decent at best, and when more speakers are introduced, the added problems become much too complicated. It really doesn't seem that anyone will have any incentive, in the next decades or probably much later, to take multiple-speaker setups "seriously" enough to provide a psychoacoustically correct representation of an audio source.
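To make the "phase cancellation" idea concrete, here is a minimal sketch of crosstalk cancellation at a single frequency. It is a toy model, not any product's actual method: it assumes a perfectly symmetric setup in which each speaker reaches the far ear with a fixed gain and delay, and it ignores HRTFs and room reflections entirely. The function names and the values `gain=0.7` and `delay_s=0.00025` are made up for illustration.

```python
import cmath

# Toy model (assumption): the opposite-side path from a speaker to the far ear
# is the same-side path scaled by `gain` and delayed by `delay_s` seconds.
def crosstalk_path(freq_hz, gain=0.7, delay_s=0.00025):
    """Complex response of the speaker-to-far-ear path at one frequency."""
    return gain * cmath.exp(-2j * cmath.pi * freq_hz * delay_s)

def at_ears(speaker_l, speaker_r, freq_hz):
    """What each ear receives: its own speaker plus leakage from the other."""
    h = crosstalk_path(freq_hz)
    return speaker_l + h * speaker_r, h * speaker_l + speaker_r

def apply_canceller(left, right, freq_hz):
    """Pre-filter the channels with the inverse of the 2x2 acoustic matrix
    [[1, h], [h, 1]], so that the leakage cancels at the ears."""
    h = crosstalk_path(freq_hz)
    det = 1 - h * h  # nonzero here because |h| < 1
    return (left - h * right) / det, (right - h * left) / det

# A signal that exists only in the left channel, examined at 1 kHz.
freq = 1000.0
_, plain_right_ear = at_ears(1.0, 0.0, freq)
spk_l, spk_r = apply_canceller(1.0, 0.0, freq)
fixed_left_ear, fixed_right_ear = at_ears(spk_l, spk_r, freq)

print("leak into right ear, no cancellation:", abs(plain_right_ear))
print("leak into right ear, with cancellation:", abs(fixed_right_ear))
print("left ear level, with cancellation:", abs(fixed_left_ear))
```

In this idealized model the leakage cancels exactly, but only because the canceller knows the acoustic path perfectly. A real listener's head, position and room change that path at every frequency, which is where the limitations described above come from.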

Multichannel setups were "hacks" from the start, meant for large spaces, to enable a more even audio representation. Even today there isn't a single widely accepted multichannel recording method, and definitely not one that really adheres to the principles of psychoacoustics: since there isn't a true-to-source way to play back more than two channels, there can't be a way to know how to record correctly. Multichannel recording, mixing and playback are fundamentally different from stereo because they aim at placing sounds in a virtual field rather than simulating a listener's perspective. They aim to bring sounds "from the outside to the inside (you)", whereas in stereo it's possible to get the opposite: an actual listener's perspective, with the signal in each channel being a substitute for the signal that each ear would capture in a real environment, as opposed to a simulated environment created by multiple speakers.

A nice analogy is how you view things: you put your hand close to your face, you also see your desk a meter or two beyond it, and you also see the view beyond your window, all at the same time. How would a video playback system simulate that? Not through multiple screens placed at different distances from you, but with a virtual reality system, which takes advantage of the fact that each of our eyes doesn't see exactly the same image as the other. Although a setup with multiple screens is possible in theory, it would be impractical in general, and even when it worked fine for a specific image, it would still be nowhere near as good as two screens projecting images right into our eyes. It's similar with audio: you could take the lame approach, which is multiple speakers, or you could take the right approach, which is two speakers or, in large places, multiple speakers used for left and right.
A simple "binaural recording" (an earphone-like microphone in each ear) gives anyone who plays it back through headphones or earphones an amazingly accurate and realistic sound that greatly resembles the original sound in its original surroundings. Degrees of success vary, but even through a pair of speakers such a recording can sound very immersive and "real". If advanced phase cancellation were applied, it would sound nearly as good through speakers as it does through headphones. We just don't have that yet.

In the case of an audio codec optimized for stereo, we know exactly what is correct and optimal, because there is relatively little ambiguity in the properties of the audio signal and no ambiguity in how it can be optimally played back. Known and reproducible psychoacoustic effects are easily taken advantage of. To reproduce an audio signal in a way that actually adheres to the source, we have one simple, problem-free condition: the signal from the left channel goes only to the left ear, and the signal from the right channel goes only to the right ear. Headphones or earphones enable this, as would the not-yet-available speakers with advanced, adaptive phase cancellation. Mid/Side stereo processing can easily be used to reduce bitrate, along with a panning rule that works very well because it adheres to real-world acoustics as well as to artificially mixed audio. There are also far fewer large differences in level and separation between channels than in a channel configuration that doesn't adhere to the simple binaural rule of nature and that aims at producing a "creative" outer field rather than simulating a listener's point of reference, which is what most stereo does.
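As a rough illustration of why Mid/Side processing helps, here is a minimal sketch of the basic sum/difference transform. It is not any particular codec's implementation; the function names and sample values are made up. When the two channels are similar, most of the energy ends up in the mid signal and the side signal becomes small, so it needs fewer bits.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Invert the transform: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def energy(samples):
    """Sum of squared sample values, a crude proxy for coding cost."""
    return sum(v * v for v in samples)

# Hypothetical samples: a mostly centered signal, the two channels differ
# only at the third sample.
left = [0.5, -0.25, 0.75, 0.0]
right = [0.5, -0.25, 0.5, 0.0]

mid, side = ms_encode(left, right)
print("side signal:", side)
print("energy of right channel:", energy(right))
print("energy of side signal:", energy(side))
print("round trip exact:", ms_decode(mid, side) == (left, right))
```

With a hard-panned or deliberately decorrelated mix, the side signal is no smaller than the original channels and the transform saves nothing, which is the situation described above for "creative" multichannel fields.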

I'm probably mentioning too many things, but this field is something I have a great interest in, specifically developing good virtual sound field processing for audio production, virtual reality and standard playback systems that works well with both headphones and speakers. Hopefully it won't take too many years; I have great hopes of achieving this holy grail.