Musepack Forums


dev 01 July 2017 08:44 am

a bit of background

It's been quiet here lately, and since I've been using Musepack I've wanted to ask about its history; I think the time is right to finally do it.
Who are Andree Buschmann, Frank Klemm, and the people behind MDT?
I know the story is that Andree was dissatisfied with mp3's quality and he decided to do something about it.
But it's not like reading some books and writing a bit of code on a PC ;)
This type of work requires a lot of knowledge and know-how to achieve software of this quality.
So who are the people behind it, and how did we get to where we are with Musepack?
And where did the name come from?
No story is too long, please don't be shy and share... how it all happened :)


Shy 02 July 2017 01:46 am

Musepack in 2017
Hi, dev.

Andree Buschmann started working on Musepack long before graduating from Hannover University, with electrical engineering and digital audio signal processing among his degrees. Although he has worked at major companies as a software engineer, senior manager, and director of system integration, his main interests are still acoustics, audio signal processing and complicated things like that :) and in recent years he has made contributions to the Rockbox project, greatly improving Musepack decoding performance, among other things.

I wasn't among the earliest testers or users, so I can't tell any nice stories about that period. I've heard some things about the stages of the development and obsessive (in a good way) people who contributed to improving it, but it wouldn't be appropriate to tell half-stories, so I'll leave it to others in case they ever feel like sharing it.

Frank Klemm is a man many people would call a DSP guru. He had many years of experience with advanced signal processing, psychoacoustics, circuit optimization, etc. before he came in to help with Musepack development. He worked on integrating and optimizing an advanced psychoacoustic model, and much of the tweaking was based on listening tests done by himself and users. With each update or few, there were big, noticeable improvements, until it slowly reached a point where changing things further was unnecessary or made other things worse instead.

The current state is that anyone who dares to touch anything in the audio-related code quickly realizes they're in way over their head, and that every tiny change can amazingly degrade everything. Anyone who looks at the audio-related code and isn't a world-class DSP guru thinks they're looking at an extraterrestrial language. After some further browsing, the response is basically "wow, I have a lot to learn"; we've been getting comments like that from very experienced programmers, well-intentioned and hoping to be able to contribute something, but we have yet to hear back from any of them :) though that's never a surprise. There's a funny yet true recurring quote by Frank that we like: "You have very little knowledge.", followed by an explanation that makes you realize just how true it is.

For over a decade, Frank Klemm has been working at Zeiss, engineering extremely advanced devices related to life sciences. A job he had always wanted and is very happy with. A bit more important than our puny audio format ;).

Nicolas Botti, who did most of the work on the SV8 update to Musepack and subsequent patches, has had experience with DSP coding in the past, and even designed and released an experimental wavelet-based video codec (called rududu). He has made great improvements and optimizations to the Musepack format, which only a person obsessed with optimizing things and making sure it's done right could have done. I can't expand on more personal details.

Peter Pawlowski, known mainly as the author of foobar2000 (and of Winamp's main components before that), has made big contributions to Musepack: porting between programming languages, code arrangement, and bug and security fixes. He's a very experienced programmer who does things right and kills you if your code is bad. Well, not really, although he is nicknamed "DEATH". And he's a great friend. It's funny that I think fondly of that name when I see it. It's derived from an internal joke by a great early Musepack contributor called Filburt :).. (it goes a bit like "MPC IST DEATH! VQF IST GOD!" ;))

People have come and gone, and we haven't heard from some of them in a long time, but we'll always appreciate their years-long ongoing support. Mainly Lefungus (who did plugins and other things for years) and Ganymed (the man behind Mp3tag, great guy). We get patches from people occasionally, mentioned in our SVN. We're always open to any contribution.

Seed (Meni Berman) has been a Musepack tester since the early days, and there's no one whose taste in music and sound in general I appreciate more. He and I are musicians, and I'm a sound engineer, mastering engineer, and audio production software algorithm designer, so yeah, good sound is what we're all about :). Along with him, I fought to save Musepack back in 2003, when business people with interests in competing formats, as well as unrelated ones, were trying to get rid of Musepack by taking over other software projects and web domains (including this one) and trying to bury it while promoting other things. I won't expand on that terrible time, but I'll just say we have all the info, logs, everything we'd ever need, in case anyone ever decides to open that shameful chapter in open-source software. Being lovers of music and fine sound quality, it has always been in our interest to make sure support for Musepack grows and is available to everyone in the software and hardware they use. To a large extent, we've had great success since then, and today it's possible to play MPC files on any mobile device that's worth anything, and many other devices.

I've been providing info/answers to all kinds of people, from users to students to journalists to developers who want to integrate Musepack in their production-oriented software. I've been hosting the site and software since 2004, making sure we always have stellar uptime, without asking for support from anyone and without putting a single ad on it, and that's how it's always going to stay, completely free of any "special interests". grimmel (real name private), another great old friend of ours, has been a great help with SVN and Trac hosting.

There is indeed not much happening these past years, but that definitely doesn't mean anything is wrong; we're always here, and MPC is working great and making people and their batteries happy :).

Nice to hear from you.

Ah, and I'm not sure exactly how the name "Musepack" was arrived at, but it was meant to be short for "Music Packer", and of course a replacement for the old, less appropriate name, and it correlates with the "MPC" file name extension.

dev 02 July 2017 09:19 am

Hi Shy.

I knew I could count on You replying :)
"please don't be shy and share..." - this was intentional :D

You speak of things that are impossible to find out in any other way than by talking directly to the people from that time, thanks.
Reading about the fight to save Musepack reminds me of eMule, but unfortunately that fight was lost :( and it happened because of people's ignorance.
This is what I thought, that Andree and Frank had to be a lot more than just audio enthusiasts to be able to do this work.
Since the official SV8 release, MPC has been my religion when it comes to building my music library :)
Portability was harder in the beginning, but now we have Android and f2k mobile.
And maybe one day we'll be able to stream lossless signals from f2k mobile to other devices; it's getting more beautiful each year :)
From where I sit it's great that there is a final version of Musepack; maybe it's dumb of me, but I would prefer it stayed like that :)
In the past I hoped to use MPC as a quality multichannel track in concert videos with MKV, but for that we now have Apple's AAC with TVBR.
And I think it would be a waste of time to implement it; I'd probably be the only one using it :)
I hope this topic stays open for anyone who would like to share a bit of history about Musepack. Thank You, Shy, for Your great response; it's nice to talk to You again.
If Musepack ever needs anything, know that I'm here to help.


Shy 02 July 2017 11:08 pm

Yeah, having a stable, pretty much "finalized" version of the format, which fixed the problems with the previous version, has been really important and relieving. It's definitely a big reason for the widespread, ever-growing support. We spent years raising awareness of Musepack, cleaning things up and getting support, thanks in large part to our influential friends in the audio software world, and to people like you who spread the word. We were lucky to have Nicolas show up and do lots of great, very professional work for a long time, got the SV8 beta Slashdotted :), did lots of testing to make sure everything works right under all conditions, got everything going. It paid off.

Although SV8 does allow multichannel and support by containers such as MKV, it hasn't happened yet, because the video world is completely dominated by formats owned by huge corporations. Even FLAC, which by now has pretty significant industry support behind it and is supported in MKV, is struggling to find a place in video. Truthfully, the error-correction considerations in video are significant, and some formats offer a greatly superior "complete package" that neither Musepack nor other formats ever aimed to offer, so this gives them an advantage, even though a universal external ECC solution would have been much better. But that goes against business interests :).

Also, as much as multichannel MPC would have been nice, it has the same problem that AAC and others have, which is: no one really does multichannel coding properly; it's basically a very lame hack of multiple stereo pairs and mono channels. A proper multichannel perceptual audio encoder would have to have true multichannel coding that takes advantage of the info in all channels, to reduce bitrate as much as possible, which is the whole point of efficient audio coding to begin with. Without this, we just get a huge waste of bits, and files which may be several times bigger than they could have been had the coding method been true multichannel. The problem, of course, is that it would be incredibly hard to design, and no one in the world would invest the resources needed to upgrade an existing format like Musepack, or even AAC, which has all the support in the world, not to mention design something new aimed at high sound quality, which would be senseless of course, and which I don't expect to happen in the next decades. Luckily stereo will always remain the dominant format, as we have two ears, and people will keep using headphones and two speakers in most places, and headphone usage will greatly increase in the future, as "virtual reality" and "augmented reality" keep expanding into every aspect of entertainment, media and life in general.
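To put a rough number on the bit-waste argument, here's a tiny sketch in Python with NumPy. The 4-channel signal is entirely made up for illustration: each channel shares one common component (as channels in real multichannel mixes largely do) plus a small unique residual. Coding channels independently pays for the common component once per channel, while a joint coder could send it once:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024

# Toy 4-channel signal: every channel shares a common component plus
# a small unique residual per channel.
common = rng.standard_normal(n)
channels = np.stack([common + 0.1 * rng.standard_normal(n) for _ in range(4)])

# Coding the channels independently spends bits on the shared
# component four times over.
indep_energy = np.sum(channels ** 2)

# A joint coder could transmit the shared component once, plus only
# the small per-channel residuals.
joint_energy = np.sum(common ** 2) + np.sum((channels - common) ** 2)

# Far less total signal energy to represent -> (roughly) far fewer bits.
print(joint_energy / indep_energy)  # well under 0.5 for this signal
```

This equates signal energy with bit cost, which is only a loose proxy for what a real perceptual coder does, but it shows why "multiple stereo pairs" leaves so much redundancy on the table.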

dev 03 July 2017 06:26 am

Hi Shy

Yes it was; I remember I had audio in all available formats, and only the official SV8 release convinced me to make something out of that mess.

"A proper multichannel perceptual audio encoder..."
In general, do You think it would be technically possible?
An encoder that makes multichannel audio sound transparent on headphones and on different home speaker systems in different rooms?
A psychoacoustic model that takes input from all channels would be pretty amazing work.
But like You say, it would be a lot of work, and probably only a handful of people would later use it, for personal use.


Shy 04 July 2017 02:18 am

There are probably at least several ways that multichannel lossy audio coding could be implemented in such a way that it would be much more efficient than stereo-optimized coding, but I suppose the difficulty of actually implementing it, and the fact that there are many multichannel configurations, none of which is close to being a guaranteed future standard, not to mention in any way an agreed upon "proper" configuration, means that no one will bother any time soon.

The psychoacoustic considerations in the multichannel audio field, even by the most experienced and professional people in the world, such as Doctor David Griesinger, are really driven mainly by "practicality" and partial backward compatibility, rather than accuracy and science. There are many problems with speaker configurations in general, and a good example is the fact that even simple stereo has great acoustic and psychoacoustic problems, starting from the fact that unless an advanced "phase cancellation" technique is applied during playback, everything you hear from a pair of speakers has a greatly incorrect stereo field that hardly matches the source audio's stereo field, as heard with headphones or earphones which ensure that sound from each channel only reaches one ear. Existing implementations of phase cancellation methods (such as "transaural") are greatly limited because although they do sometimes work amazingly well with some synthetic signals in some frequency ranges, they don't really have anything that simulates real speaker-optimized HRTF (head-related transfer functions), so it sounds mostly weird and fatiguing. So even stereo that we get through two speakers is somewhat decent at best, but when more speakers are introduced, the multiple problems that are added are much too complicated, and really it doesn't seem like anyone will ever have any incentive in the next decades, or probably much later than that, to take multiple speaker setups "seriously" enough to provide a psychoacoustically correct representation of an audio source.

Multichannel setups were "hacks" from the start, meant for large spaces, to enable a more even audio representation. Even today, there isn't a single widely accepted multichannel recording method, and definitely not one that really adheres to principles of psychoacoustics, because there isn't even a true-to-source way to play back more than two channels, so there can't even be a way to know how to record correctly. Multichannel recording and mixing and playback in general is fundamentally, deeply different from stereo because it aims at placing sounds in a virtual field, rather than attempting to simulate a listener's perspective. It aims to bring sounds "from the outside to the inside (you)", whereas in stereo it's possible to get the opposite, which is an actual listener's perspective, with the signal in each channel being a substitute for the signal that each ear would capture in a real environment, as opposed to a simulated environment created by multiple speakers. A nice analogy is for example how you view things: you put your hand close to your face, you also see your desk a meter or two beyond it, and you also see the view from beyond your window, all of those at the same time. How would a video playback system be able to simulate that? Not through multiple screens, all placed at different distances from you, but using a virtual reality system, which takes advantage of the fact that each of our eyes doesn't see the exact same image the other one does. Although it is possible in theory to have a setup with multiple screens, it would of course be unfeasible in general, and even when working fine for a specific image, it would still be nowhere near as good as those two screens projecting images right into our eyes. It's similar with audio: you could take the lame approach, which is multiple speakers, or you could take the right approach, which is two speakers, or in large places, multiple speakers used for left and right.
A simple "binaural recording" (an earphone-like microphone in each ear) gives anyone who plays it back through headphones or earphones an amazingly accurate and realistic sound which greatly resembles the original sound in the original surroundings. Degrees of success vary, but even through a couple of speakers, such a recording can sound very immersive and "real". If an advanced phase cancellation was applied, it would sound nearly as good through the speakers as it sounds through headphones. We just don't have that yet.
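The core binaural cues being described can be sketched in a few lines of Python with NumPy. This places a mono click to the listener's left using only an interaural time difference and an interaural level difference; the 0.6 ms delay and 6 dB attenuation are rough illustrative numbers, not a measured HRTF, which is far richer than these two cues:

```python
import numpy as np

SR = 44100  # sample rate in Hz

# A mono click we want the listener to hear coming from the left.
click = np.zeros(512)
click[0] = 1.0

# Rough interaural cues for a source far to the left (illustrative
# values, not a measured HRTF): the right ear receives the sound
# about 0.6 ms later and ~6 dB quieter due to head shadowing.
itd = int(round(0.0006 * SR))   # interaural time difference in samples
ild = 10.0 ** (-6.0 / 20.0)     # interaural level difference (linear gain)

left_ear = click
right_ear = np.concatenate([np.zeros(itd), click[:-itd]]) * ild
```

Played over headphones, each channel reaches only its ear, so these cues survive intact; over speakers, crosstalk between the channels smears them, which is exactly the problem phase cancellation ("transaural") processing tries to solve.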

In the case of an audio codec optimized for stereo, we know exactly what is correct and optimal, because there is relatively little ambiguity in the properties of the audio signal, and no ambiguity in the way it can be optimally played back. Known and reproducible psychoacoustic effects are easily taken advantage of. In order to reproduce an audio signal in a way that actually adheres to the source, we have the simple, problem-free condition which is: signal from the left channel goes only to the left ear, and signal from the right channel goes only to the right ear, which is enabled by headphones or earphones, or the yet unavailable speakers with advanced, adaptive phase cancellation applied. Mid/Side stereo processing can easily be taken advantage of, to reduce bitrate, along with a panning rule that works very well because it adheres to real-world acoustics as well as artificially mixed audio, and there's much less occurrence of big differences in sound levels and separation as there is in a channel configuration that doesn't adhere to the simple binaural rule of nature, and which aims at producing a "creative" outer field rather than simulating a listener's point of reference, which is what most stereo does.
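The mid/side processing mentioned here can be shown concretely (Python with NumPy; the sample values are invented, but typical music really does have this kind of left/right correlation). Almost all the energy lands in the mid channel, the side channel is tiny and thus cheap to code, and the transform loses nothing:

```python
import numpy as np

# Toy stereo frames; values invented, but left and right channels in
# typical music are highly correlated like this.
left = np.array([0.50, 0.60, 0.55, 0.70, 0.65])
right = np.array([0.48, 0.61, 0.53, 0.72, 0.60])

# Mid carries the shared content, side carries the (small) difference.
mid = (left + right) / 2.0
side = (left - right) / 2.0

# Nearly all the energy is in the mid channel, so the side channel
# can be coded with very few bits.
print(np.sum(side ** 2) / np.sum(mid ** 2))  # a tiny fraction

# The transform is perfectly invertible: L = M + S, R = M - S.
assert np.allclose(mid + side, left)
assert np.allclose(mid - side, right)
```

Lossy coders quantize the mid and side signals after a transform like this rather than reconstructing them exactly, but the bit savings come from the same energy imbalance shown here.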

I'm really just mentioning too many things because this field is something I have a great interest in, specifically in developing good virtual sound field processing for audio production, virtual reality and standard playback systems, that works well with both headphones and speakers. Hopefully it won't take too many years, but I have great hopes for achieving this holy grail :).

dev 04 July 2017 05:09 am

It's an interesting read, Shy.
Can I sample Your work?
Are You currently working on some HRTF software for mastering life-like binaural recordings?
I think I've had only one CD that was recorded using "the head". Do You think there's a chance the music industry will go that way?

Shy 04 July 2017 08:01 am

I'm just at very early stages, without final decisions on even the most basic processing upon which other things will be built :). It's crucial to know what approaches should be taken in the first place, so that the end result won't be yet another failure. There are many software solutions nowadays available to developers and audio producers, which combine so-called HRTF (which are extremely low quality and ineffective in any software that uses them), the Haas effect, the Doppler effect, reverberation, echo, and simple phase and filtering techniques, to provide a virtual "mixing stage". So in a game, you're able to pinpoint the direction of some sounds accurately, and you get a nice realistic overall sound and feel like you're in a real environment. It's very nice even with the existing limited tools, but it could be much better, and much more "real", if the underlying effect algorithms simulated real-world effects more accurately.

Binaural recording won't be common because of many reasons:

- Even just walking around and doing nature or urban recordings, it's a big hassle, because it's not very comfortable having those mics stuck in your ears for long (and a very bad idea just like using earphones, sticking things in your ears is never good and can cause infections), and they are very sensitive to wind, so you need big foam pads covering the mics in any weather except rare weather with close to zero wind, and in some cases you need additional foam on your ears as well, and really windy conditions would require a pretty ridiculous setup which makes it all pretty unfeasible. Of course, taking a heavy, high quality, ridiculously expensive Neumann KU100 along with you is probably not an option either :).

- An ideal recording location is required. In the uncommon event that there is no intention to add effects such as reverberation, phaser, chorus, flanger, or other effects later, then all that's needed is a good location with just the right reverberation / resonance conditions, and likely it would need to be quite a large space as well (which can be especially problematic if you want large space + little reverb), because if it's small, you won't have enough room to place all the instruments / sound sources spread enough from each other. If you think "OK, I do multiple takes anyway, I can use the same position for several instruments", what happens is that since it's a binaural recording, it just sounds weird, because then you have multiple completely different sounds coming from the same position, which is unrealistic and even more pronounced in a binaural recording than in a typical half-baked stereo mix. If you're familiar with traditional recording methods then you'd think "OK, I can deal with that, I'll just place mics closer or further away from the sound sources, as needed, and use a favored method for each type of instrument", but all that is completely shattered when it comes to binaural recording. If you don't use the exact same, SINGLE position for the microphones/head, when recording each instrument or person, the result will be bad and completely against the very idea of a binaural recording, which is to have a single, listener's perspective, around which everything happens. If you position the mics/head differently in each recording (if it's separate takes, meant to be mixed later), what you get is a completely messed up "space" and overall weird-sounding performance, because the reverberation, frequency response, the overall "positional cues", will be extremely different in each take, and when combined, you no longer have a binaural recording/mix, you have a mix of many binaural recordings, which is completely useless.

- If the intention is to also have separate effects applied to each sound, then the binaural recording would have to be made in a space with no or very little reverberation, because applying additional reverberation on top of the existing recorded reverberation doesn't sound good, and applying a phaser effect, or flanger or other delay-based effects, also doesn't sound good, because the entire "space" in which the sound occurs gets "painted" by the effect, while it was intended only for the "dry" sound and not the space it's within. It's extremely hard and/or expensive to get a large closed space with very little reverberation. Recording outside is usually not an option.

- The preferred method by most people in most musical genres nowadays, even completely acoustic with no effects, is to record each instrument separately using multiple types of microphones, in mono or one of multiple 2-channel methods, and mix them in a way that doesn't necessarily correlate with how a real performance sounds in a real environment. Some microphones and microphone placements enable capturing an instrument's sound better or more desirably than others. Even many people who prefer a true to life stereo image, prefer recording in the comfort of their own place, and making the adjustments to each recording (including equalization, peak limiting, panning, etc.) later, trying to create a realistic, different environment than where the instruments were originally recorded. For this reason, software that enables easy mixing in a virtual environment, and with realistic results, is the preferred method for most people.

- Today, "virtual instruments" are often the norm, rather than a secondary addition to real performances. Virtual instruments are either synthesizers, sample libraries (like pre-recorded single-note samples, and sometimes very sophisticated scripting), or a combination of both, aiming to enable musicians to create believable, real-sounding acoustic instrument performances, or a synthetic sound similar to analog synthesizers' or any other kind of synthetic sound, from plain subtractive to "physical modeling". Since those are either synthetic sounds or pre-recorded samples, those who make use of such instruments have only one option in regards to creating a result that would be similar to recording in a real space, and that is to mix those sounds in a virtual environment made up of either a sequencer's plain interface and included effects, additional plugin effects, external effect processors, or a combination of them. This virtual environment we have to operate within may or (usually) may not make it easy to process those sounds in a way that eventually results in a believable, immersive stereo mix. So most people nowadays who aim to create a believable, good-sounding musical performance, really need tools that enable them to get a "real" stereo mix, which when played on headphones, makes the listener feel they're in a real environment, and the same with speakers, to the largest extent possible without or with phase-cancelation post-processing.

Due to the changes in how things are made and how things are experienced / "consumed", the future of audio production as well as playback really is heavily dependent on effect algorithms most of all, so it's crucial that software as well as hardware is improved to meet the demands of our ever-progressing and expanding "virtualization". Ever since people started using telephones and phonograph cylinders, we've had a "virtual reality" field of sound, where people can essentially exist in places they're not, and where you can hear an artificial device emit any kind of sound. This is just a natural, requested progression of "virtual reality", in an era where the focus is now on blurring the boundaries between what is real and what is artificial.

Heh, and yeah, that drifted way beyond anything related to Musepack, file compression or anything like that. At least maybe somewhat interesting.

dev 04 July 2017 09:43 am

It's a vast topic in its own right, and there are a lot of options to choose from, sometimes too many.
And if that's not enough, the end user with f2k can apply practically any plugin, to one's liking.
And even encode the result to Musepack, changes included :)
We have tools for excellence.
