Musepack Forums


Grunt 08 October 2011 12:45 pm

Suzanne Vega - Tonal component removed
 
Hi guys. Have you ever heard Tom's Diner (.mpc) without its tonal component?

I just tried replacing all the "Samples data" with white noise, and I was surprised how much information the non-tonal component carries and how good it sounds, despite the fact that the polyphase filter bank used in MPEG/MPC has only 32 subbands (it is like an equalizer with 32 bars spread uniformly across the spectrum, with a time resolution of 1152 samples). I knew that for perception only the shape of the noise envelope matters, and that the envelope shape can be approximated ([1]->[2] and some of Monty's articles, I think), but I never thought it could be so rough and still sound so good. Does anyone know of any R&D on this phenomenon, ideally with results published as papers? For example, I'm interested in: how rough the shape can be, what frequency and time resolution is sufficient, what the relationships between tonal and non-tonal components are, how all of this affects psychoacoustic masking, and so on. :)
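For anyone who wants to play with the idea outside a real decoder, here is a minimal sketch of the noise substitution described above. It is not Musepack decoder code: the function names, the `block_len=12` default, and the use of RMS as the envelope measure are all assumptions made for illustration (a real MPC stream carries the envelope as scale factors instead). It replaces every block of subband samples with white noise scaled to the same level, so only the rough per-band envelope survives.

```python
import math
import random


def rms(block):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    return math.sqrt(sum(x * x for x in block) / len(block)) if block else 0.0


def noise_substitute(subbands, block_len=12, seed=0):
    """Replace each block of subband samples with white noise of equal RMS.

    subbands:  list of lists, one list of samples per subband.
    block_len: samples per envelope block. Hypothetical value, standing in
               for the span covered by one scale factor.
    seed:      seed for the noise generator, so runs are repeatable.
    """
    rng = random.Random(seed)
    out = []
    for band in subbands:
        new_band = []
        for start in range(0, len(band), block_len):
            block = band[start:start + block_len]
            target = rms(block)  # envelope value we want to keep
            noise = [rng.uniform(-1.0, 1.0) for _ in block]
            level = rms(noise)
            gain = target / level if level > 0 else 0.0
            # Keep the level, throw away the fine structure.
            new_band.extend(n * gain for n in noise)
        out.append(new_band)
    return out
```

Feeding the result through the synthesis filter bank (not shown here) would give the "tonal component removed" version of a clip. Changing `block_len` or merging neighbouring bands is one way to experiment with how rough the envelope can get.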

Just out of curiosity: if I subtract the sample bits from the bitstream, I get 11.2 kbps for this clip (standard preset, counting the pure bitstream: packets, their headers + resolution, SCF_types and SCFs), because the sample data doesn't matter and can be regenerated on the decoder side. Interesting, isn't it?
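The arithmetic behind that figure can be sketched as below. The helper name and its parameters are hypothetical, and the numbers in the usage line are placeholders chosen for illustration, not measurements taken from the clip.

```python
def side_info_kbps(total_bits, sample_bits, duration_s):
    """Bitrate left over once the quantized sample bits are subtracted.

    total_bits:  size of the whole bitstream, in bits.
    sample_bits: bits spent on the quantized subband samples.
    duration_s:  clip duration in seconds.
    Returns kilobits per second (1 kbps = 1000 bits/s).
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return (total_bits - sample_bits) / duration_s / 1000.0


# Placeholder values only: a 50 s clip with 560,000 bits of side information.
rate = side_info_kbps(total_bits=5_000_000, sample_bits=4_440_000, duration_s=50.0)
```

Whatever remains after the subtraction is what a decoder would need in order to regenerate the noise-only version on its own.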

Added: more variations (e.g. stereo, more percussion). Tom's Diner is a purely vocal illustration. 10 meaningless points for every clip you identify.

Shy 08 October 2011 05:28 pm

It's definitely an interesting topic. Regarding your surprise at this, I can tell you about something that surprised me even more: I have an old analog 10-band vocoder (SEV-66), and it's amazing that even such a seemingly simple processor can retain a huge amount of very comprehensible information from the source (speech, for example) using just its noise generator (and with other wide-band input signals). Merely 10 bands, and yet pretty much anything comes through with comprehensible accuracy. When I compared it to 40-band vocoders, not only was there no real improvement, they were actually worse (hard to compare, though, since they're digital implementations).

I don't know about relevant research papers unfortunately, but there are a few here with technical knowledge who might.

