Musepack Forums  

11 March 2005, 09:35 pm   #1
eye
Member
 
Join Date: Mar 2005
Location: München, Germany
Posts: 2
Slow ARM feasible?

Hello.

I am programming for the Dreamcast at the moment. Its sound co-processor is a DSP with a built-in ARM7 core that, according to the specifications floating around on the net, runs at 45 MHz. The whole thing is attached to 2 MB of RAM. The technical challenges include the slowness of the system and the lack of an FPU.

I recall seeing a codec based on libtremor that decodes 96 kbps Ogg Vorbis files, usually without running into performance trouble, running solely on this ARM7 co-processor. However, I cannot find it again, and when I asked about it in another forum I was pointed to Musepack. I looked through the libmusepack archive, but it lacked the documentation to answer my questions, so I hope someone here can answer them.

- How much space does the libmusepack decoder need?
- How do you think Musepack would compare in performance to libtremor on such a target? I have seen a comparison with normal libvorbis, but it doesn't tell me much, since libtremor should be faster by orders of magnitude even on a real machine, and floating-point emulation is simply deadly.
- How hard would it be to adapt it to the target?
- Can it easily be made to read from a circular input buffer?

Additionally, would anyone like to assist me? Perhaps someone who owns a Dreamcast and was intending to do something along these lines anyway? I am still quite unfamiliar with the hardware, and I have to get the prod out within 2 weeks, for the Breakpoint party (see breakpoint.untergrund.net). I'm in quite some trouble because my time is getting very tight.

Regards, eye/photoAllergics.
11 March 2005, 10:11 pm   #2
Seed
Musepack Nanny
 
Join Date: Jul 2004
Posts: 168

I am not sure whether you intend to play Vorbis files or .mpc (Musepack) files on the Dreamcast. Anyway, this is what I can tell you:

1. circular input buffer, no
2. 45 MHz is very little and without a floating point unit my bet is that both Vorbis and MPC files will be close to impossible to play.
3. libmusepack is faster than MAD, so it's probably your best bet right now, especially if you want decent sound to come out of the machine

You should post on the forum at www.hydrogenaudio.org and you might find help there.

Good luck
11 March 2005, 10:36 pm   #3
eye
Member
 
Join Date: Mar 2005
Location: München, Germany
Posts: 2

I need a playing routine for the Dreamcast co-processor. I can choose the file format freely. The main processor will be very busy doing other stuff.

I don't need decent sound immediately. I only need the bass to rock the party - the sound system attached will be very loud but not very high-fidelity.

I don't think I can access main RAM (of which there is plenty, 16 MB) from the code running on the co-processor, so I need some strategy to load the stream to be decoded in pieces, while discarding parts that are no longer needed. Naturally, the simplest is a circular buffer - the player need only maintain a pointer to the place in the buffer up to which the data is no longer needed - "until here is discarded" - and from a thread running on the main CPU I can detect that this pointer has moved and write some new data, say, by streaming it from disk.
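
The circular-buffer scheme described above could look roughly like the following minimal C sketch. Everything in it is an assumption made for illustration: the names (`ring_t`, `ring_write`, `ring_read`), the 4 KB size, and the idea that both processors can see the same struct. It does not model how the main CPU actually reaches sound RAM on the Dreamcast, nor any cache or bus ordering issues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4096u  /* hypothetical size; must be a power of two */

typedef struct {
    volatile uint32_t read_pos;   /* "until here is discarded"; written only by the decoder */
    volatile uint32_t write_pos;  /* written only by the feeder on the main CPU */
    uint8_t data[RING_SIZE];
} ring_t;

/* Bytes waiting to be decoded. Both counters run freely and wrap at 2^32,
   so unsigned subtraction stays correct across the wrap. */
static uint32_t ring_used(const ring_t *r) { return r->write_pos - r->read_pos; }

/* Bytes the feeder may still write without overwriting unread data. */
static uint32_t ring_free(const ring_t *r) { return RING_SIZE - ring_used(r); }

/* Feeder side: copy in as much of src as fits and return that count. */
static uint32_t ring_write(ring_t *r, const uint8_t *src, uint32_t n)
{
    assert(r != NULL && src != NULL);
    uint32_t room = ring_free(r);
    uint32_t w = r->write_pos;
    if (n > room)
        n = room;
    for (uint32_t i = 0; i < n; i++)
        r->data[(w + i) & (RING_SIZE - 1u)] = src[i];
    r->write_pos = w + n;  /* publish only after the bytes are in place */
    return n;
}

/* Decoder side: take up to n bytes and mark them as discarded. */
static uint32_t ring_read(ring_t *r, uint8_t *dst, uint32_t n)
{
    assert(r != NULL && dst != NULL);
    uint32_t avail = ring_used(r);
    uint32_t rd = r->read_pos;
    if (n > avail)
        n = avail;
    for (uint32_t i = 0; i < n; i++)
        dst[i] = r->data[(rd + i) & (RING_SIZE - 1u)];
    r->read_pos = rd + n;  /* the feeder sees this move and refills */
    return n;
}
```

Because each counter has exactly one writer, the feeder only needs to poll `read_pos` (via `ring_free`) to know how much fresh data it may stream in from disk.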

MAD, or for that matter any MP3 codec, need not be mentioned - they are too slow, I believe.

Thanks for the link, I'll check it out.
8 October 2011, 05:01 pm   #4
Grunt
Member
 
Join Date: Oct 2011
Location: Morava
Posts: 3

I know this is a pretty old thread; nevertheless (for possible future demomakers):
Quote:
Originally Posted by eye
I am still quite unfamiliar with the hardware, and I have to get the prod out within 2 weeks, for the Breakpoint party (see breakpoint.untergrund.net). I'm in quite some trouble because my time is getting very tight.

Regards, eye/photoAllergics.
Precalculated sound for a demo, at a demoparty? Or even PCM? That's pretty lame. You should be ashamed that you even thought of it! I saw a recent demo from ASD: visually perfect, but from a technical standpoint pretty sh*t. Don't be like them. A little inspiration from Hungary right here. I'm really sorry about Freezepoint. It was a good event. RIP (oder am Rhein).
Quote:
Originally Posted by eye
How much space does libmusepack decoder need?
Static memory? A little more than an arbitrary MPEG-1 Layer II decoder, because of the LUTs. Otherwise, if you leave out the Huffman tables and a few others (some of them are more or less needed), it could even fit into the 64k or 4k category.
Quote:
Originally Posted by Seed
1. circular input buffer, no
A buffer about the size of one packet? (There is even a switch to control the number of frames per packet, and one frame is really small.) IMHO no problem at all, especially if you get rid of the Seek Table. The current bitstream format is not suitable for this task, but otherwise I don't see any reason why it shouldn't be possible.
Quote:
Originally Posted by Seed
2. 45 MHz is very little and without a floating point unit my bet is that both Vorbis and MPC files will be close to impossible to play.
Actually, Musepack is still one of the formats with the best speed-to-compression ratio on planet Earth, AFAIK (if you know better, let me know). It would be perfectly suitable for cheap, power-saving mini-players, if anyone in the industry gave a f**k that something like Musepack even exists. So a 45 MHz ARM? If Musepack can't do it, then I don't know what can (maybe MPEG-1 Layer I, but for transparent rendering it needs far more bits).

From my personal blog:
Code:
mpcdec - Musepack (MPC) decoder v1.0.0 (C) 2006-2009 MDT
Built Feb  5 2011 02:57:16
9061668 samples decoded in 9250 ms (22.21x)
on a 333 MHz Celeron. So if I scale it linearly, that Celeron would need about 15 MHz for real-time decoding (333 MHz / 22.21 ≈ 15 MHz). I know that is too simplistic an evaluation, but now look at this:
Code:
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 63.58      6.18     6.18    15734     0.39     0.39  mpc_synthese_filter_float_internal
 29.32      9.03     2.85     7867     0.36     0.37  mpc_decoder_read_bitstream_sv8
  6.48      9.66     0.63     7868     0.08     1.23  mpc_decoder_decode_frame
  0.31      9.69     0.03     7990     0.00     0.00  mpc_bits_log_dec
  0.21      9.71     0.02                             main
  0.10      9.72     0.01     7868     0.00     1.23  mpc_demux_decode
  0.00      9.72     0.00     7867     0.00     0.79  mpc_decoder_synthese_filter_float
  0.00      9.72     0.00      281     0.00     0.00  mpc_demux_fill
  0.00      9.72     0.00      129     0.00     0.00  mpc_bits_get_block
  0.00      9.72     0.00       88     0.00     0.00  read_stdio
  0.00      9.72     0.00       60     0.00     0.00  mpc_bits_golomb_dec
  0.00      9.72     0.00       20     0.00     0.00  can_fill_lut
  0.00      9.72     0.00       16     0.00     0.00  huff_fill_lut
  0.00      9.72     0.00        6     0.00     0.00  mpc_bits_get_size
  0.00      9.72     0.00        5     0.00     0.00  mpc_bits_read
  0.00      9.72     0.00        3     0.00     0.00  tell_stdio
  0.00      9.72     0.00        2     0.00     0.00  mpc_demux_seek
  0.00      9.72     0.00        2     0.00     0.00  seek_stdio
  0.00      9.72     0.00        1     0.00     0.00  get_size_stdio
  0.00      9.72     0.00        1     0.00     0.00  huff_init_lut
  0.00      9.72     0.00        1     0.00     0.00  mpc_crc32
  0.00      9.72     0.00        1     0.00     0.00  mpc_decoder_exit
  0.00      9.72     0.00        1     0.00     0.00  mpc_decoder_init
  0.00      9.72     0.00        1     0.00     0.00  mpc_decoder_init_quant
  0.00      9.72     0.00        1     0.00     0.00  mpc_demux_ST
  0.00      9.72     0.00        1     0.00     0.00  mpc_demux_exit
  0.00      9.72     0.00        1     0.00     0.00  mpc_demux_get_info
  0.00      9.72     0.00        1     0.00     0.00  mpc_demux_init
  0.00      9.72     0.00        1     0.00     0.00  mpc_get_encoder_string
  0.00      9.72     0.00        1     0.00     0.00  mpc_reader_exit_stdio
  0.00      9.72     0.00        1     0.00     0.00  mpc_reader_init_stdio
  0.00      9.72     0.00        1     0.00     0.00  mpc_reader_init_stdio_stream
  0.00      9.72     0.00        1     0.00     0.00  streaminfo_encoder_info
  0.00      9.72     0.00        1     0.00     0.00  streaminfo_gain
  0.00      9.72     0.00        1     0.00     0.00  streaminfo_read_header_sv8
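
The back-of-the-envelope estimate above can be redone in a few lines of C. This is only a sketch: the 44.1 kHz sample rate is an assumption (it is not stated, but it reproduces the reported 22.21x), the function names are made up for illustration, and linear scaling with clock speed ignores differences in architecture, memory, and the missing FPU.

```c
#include <assert.h>

/* Seconds of audio decoded per second of wall-clock time. */
static double realtime_factor(double samples, double sample_rate_hz, double decode_ms)
{
    assert(sample_rate_hz > 0.0 && decode_ms > 0.0);
    return (samples / sample_rate_hz) / (decode_ms / 1000.0);
}

/* Clock the same CPU would need for 1x real time, IF speed scaled linearly. */
static double clock_for_realtime_mhz(double clock_mhz, double factor)
{
    assert(factor > 0.0);
    return clock_mhz / factor;
}

/* Part of that clock budget attributable to one function's profile share. */
static double share_of_clock_mhz(double total_mhz, double self_percent)
{
    return total_mhz * self_percent / 100.0;
}
```

With the figures from the post (9061668 samples, 9250 ms, 333 MHz) this gives roughly 22.2x and about 15 MHz, of which the 63.58% spent in the synthesis filter would be about 9.5 MHz. These are numbers for the float build on an x86 with an FPU, so they say nothing directly about an FPU-less ARM.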
mpc_synthese_filter_float_internal is the only procedure that really matters (and it is one of the fastest polyphase filters I know), so without the rest of the mess around it (Huffman, probably) it could manage. In my opinion, anyway. That synthesis filter is the general 32-subband one used in all of MPEG-1/2 audio, and there are plenty of implementations. I know of some fixed-point filters (not as fast, of course), and Libav has a few optimized for different platforms (ARM being one of them, I believe). The next-slowest device I've got at home is a (circa) 200 MHz MIPS, so I'm going to do some magic and let you know.
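
To make the fixed-point idea concrete: the inner loops of a polyphase synthesis filter are mostly multiply-accumulates of samples against constant window coefficients, and on a machine without an FPU those can be done in integer arithmetic. The following is a generic Q15 sketch, not code from libmusepack or Libav; the names `q15_mul` and `q15_dot` and the choice of 15 fractional bits are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define Q15_SHIFT 15  /* a raw value v represents v / 32768.0 */

/* Multiply two Q15 values. The product is formed in 64 bits and shifted back
   down. Right-shifting a negative value is implementation-defined in C;
   this assumes the usual arithmetic shift. Truncates, does not round. */
static int32_t q15_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> Q15_SHIFT);
}

/* Dot product of n samples with n window coefficients, both Q15.
   Accumulating at full precision and shifting once at the end loses
   less precision than shifting every product. */
static int32_t q15_dot(const int16_t *x, const int16_t *w, size_t n)
{
    assert(x != NULL && w != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * w[i];
    return (int32_t)(acc >> Q15_SHIFT);
}
```

Whether a 64-bit accumulator is cheap depends on the core; on a CPU without a long-multiply instruction the accumulator would have to be narrowed, at some cost in headroom or precision.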
Quote:
Originally Posted by eye
I don't need decent sound immediately. I only need the bass to rock the party - the sound system attached will be very loud but not very high-fidelity.
OK, why a perceptual format then? Those are for transparent, complex reproduction. For this task you need something less complex and less heavy. What about a PCM loop, if nothing else, or, if you really need a compressed format, why not ADPCM?
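
To show how light ADPCM decoding is compared with a perceptual codec, here is a minimal sketch of decoding one 4-bit code in the widely used IMA ADPCM scheme. IMA is picked only as a well-known example; the hardware on a given target may use a different ADPCM variant, and the names `ima_state` and `ima_decode_nibble` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard IMA ADPCM step-size table (89 entries). */
static const int16_t ima_step[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
    598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707,
    1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871,
    5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

/* How the step index moves for each 3-bit magnitude. */
static const int8_t ima_index_adj[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };

typedef struct {
    int32_t predictor;  /* last decoded sample */
    int32_t index;      /* position in ima_step, 0..88 */
} ima_state;

/* Decode one 4-bit code into a 16-bit sample: only shifts, adds, table lookups. */
static int16_t ima_decode_nibble(ima_state *s, uint8_t nibble)
{
    assert(s != NULL && s->index >= 0 && s->index <= 88);
    int32_t step = ima_step[s->index];
    int32_t diff = step >> 3;

    if (nibble & 4) diff += step;
    if (nibble & 2) diff += step >> 1;
    if (nibble & 1) diff += step >> 2;
    if (nibble & 8) diff = -diff;   /* top bit is the sign */

    s->predictor += diff;
    if (s->predictor > 32767)  s->predictor = 32767;
    if (s->predictor < -32768) s->predictor = -32768;

    s->index += ima_index_adj[nibble & 7];
    if (s->index < 0)  s->index = 0;
    if (s->index > 88) s->index = 88;

    return (int16_t)s->predictor;
}
```

There is no multiplication and no floating point anywhere in the loop, which is why this kind of format suits a slow integer-only core; the price is a fixed 4:1 ratio against 16-bit PCM and no psychoacoustic modelling at all.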
Quote:
Originally Posted by eye
I don't think I can access main RAM (of which there is plenty, 16 MB) from the code running on the co-processor, so I need some strategy to load the stream to be decoded in pieces, while discarding parts that are no longer needed. Naturally, the simplest is a circular buffer - the player need only maintain a pointer to the place in the buffer up to which the data is no longer needed - "until here is discarded" - and from a thread running on the main CPU I can detect that this pointer has moved and write some new data, say, by streaming it from disk.
The fastest is pure PCM, straight from the HDD to the DAC via DMA. The only cycles it costs are the DMA's own for copying. If you don't have DMA, you will need to improvise somehow. The best solution is of course to use your own synthesizer (or a MOD module; a few of those have already been done in the demoscene). Keep hacking! :mrgreen: