Speed vs Compression ratio

Nov 28, 2011 at 5:20 AM

Hi Christoph,

Thanks a ton for porting Speex to .NET!  I've been trying to use it to send speech collected from the WP7 microphone up to a web service for recognition, and the Speex library has achieved about 90% compression on the audio - which is exactly what I was looking for.

On the flip side, it is VERY CPU intensive.  On the WP7 emulator I was spending half a second of CPU time for every second of audio.  On the actual phone I've been testing (a Samsung Focus), it is a whopping 2 seconds for every second of audio, which of course precludes doing the Speex encoding synchronously (i.e. on the UI thread).
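For reference, the figures above amount to a real-time factor of 0.5 on the emulator and 2.0 on the phone; anything above 1.0 means the encoder cannot keep up with live audio. The following is only a minimal sketch of measuring that factor and pushing the work onto a thread-pool thread. The names BackgroundEncoding, RealTimeFactor and EncodeAsync are made up for illustration, and the encode delegate is a placeholder for whatever Speex call is actually used.

```csharp
using System;
using System.Threading;

public static class BackgroundEncoding
{
    // CPU seconds spent per second of audio. A value above 1.0 means the
    // encoder is slower than real time.
    public static double RealTimeFactor(double cpuSeconds, double audioSeconds)
    {
        return cpuSeconds / audioSeconds;
    }

    // Runs a caller-supplied encode function on a thread-pool thread and
    // passes the result to a callback. 'encode' is a placeholder for the
    // real Speex call; it is not part of any library.
    public static void EncodeAsync(byte[] pcm, Func<byte[], byte[]> encode, Action<byte[]> onDone)
    {
        ThreadPool.QueueUserWorkItem(_ => onDone(encode(pcm)));
    }
}
```

Note that the callback runs on the thread-pool thread, so on the phone any UI update made from it would still have to be marshalled back to the UI thread (for example through the page's Dispatcher).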

My question: is there a way to compromise on encoding size but use less CPU? 

Thanks!

Omri.

Coordinator
Nov 28, 2011 at 2:19 PM

Hi Omri,

Yes, audio encoding is quite CPU intensive. That's why all mobile phones have a hardware audio encoder/decoder for GSM encoding.

Anyway, you can play a little with the quality settings but I cannot assure you that it will help. There have been some improvements in the latest Speex releases but they still need to be ported.

Is using a managed wrapper around native code an option for you? If yes, then try the wrapper hack in the source, as it uses the latest Speex version.

Nov 28, 2011 at 6:26 PM
balistof wrote:

Hi Omri,

Yes, audio encoding is quite CPU intensive. That's why all mobile phones have a hardware audio encoder/decoder for GSM encoding.

Anyway, you can play a little with the quality settings but I cannot assure you that it will help. There have been some improvements in the latest Speex releases but they still need to be ported.

Is using a managed wrapper around native code an option for you? If yes, then try the wrapper hack in the source, as it uses the latest Speex version.


Thanks Christoph!  I didn't realize that not setting the quality defaulted it to the highest quality.  In doing some experiments, it looks like a quality level of 0 is too low for my purposes, but 1 works pretty well.  As an extra bonus, it not only takes less than half of the CPU time, it also results in far better compression ratios: an extra factor of 4x (so a 64000 byte PCM stream encodes into about 1500 bytes!)
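As a quick check of those numbers, here is a small sketch of the arithmetic. The class and method names are invented for illustration, and the PCM format is an assumption: the post does not state it, so 16 kHz, 16-bit mono is assumed here, which makes 64000 bytes equal to 2 seconds of audio.

```csharp
using System;

public static class CompressionMath
{
    // Ratio of raw size to encoded size, e.g. 10.0 means "10x smaller".
    public static double Ratio(long rawBytes, long encodedBytes)
    {
        return (double)rawBytes / encodedBytes;
    }

    // Average encoded bit rate in bits per second for mono PCM input.
    // sampleRate and bytesPerSample describe the raw stream and are
    // assumptions supplied by the caller.
    public static double EncodedBitsPerSecond(long rawBytes, long encodedBytes,
                                              int sampleRate, int bytesPerSample)
    {
        double seconds = (double)rawBytes / (sampleRate * bytesPerSample);
        return encodedBytes * 8 / seconds;
    }
}
```

With the figures from the post, Ratio(64000, 1500) comes out at roughly 43x, which fits "90% compression" (10x) times an extra factor of 4. Under the assumed format, EncodedBitsPerSecond(64000, 1500, 16000, 2) gives 6000 bits per second.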

Unfortunately, the Windows Phone 7 environment doesn't yet allow native DLLs, so I can't try the managed wrapper approach.

That said, I did run into one interesting tidbit on the Speex site. One of the FAQs mentions that you can compile Speex to use only fixed-point ops, which would presumably run faster on a low-end CPU with weaker floating-point performance, such as the Snapdragons used in WP7 phones.  Do you know what this would entail?  I'm assuming that if this was a compiler option in the original Speex source, it was implemented using preprocessor directives that you stripped out when you ported to C#...
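For readers unfamiliar with the term, fixed-point arithmetic represents fractions as scaled integers so that only integer instructions are needed. The sketch below illustrates the idea with the common Q15 format (16-bit values with 15 fractional bits). It is an illustration only, not code from Speex or NSpeex, and the class name FixedPointQ15 is made up.

```csharp
using System;

public static class FixedPointQ15
{
    // Converts a value in [-1, 1) to Q15: a 16-bit integer holding
    // the value scaled by 2^15. Values outside that range overflow.
    public static short FromDouble(double x)
    {
        return (short)Math.Round(x * 32768.0);
    }

    // Converts a Q15 value back to a double.
    public static double ToDouble(short q)
    {
        return q / 32768.0;
    }

    // Multiplies two Q15 values using only integer operations: the 32-bit
    // product carries 30 fractional bits, so shifting right by 15 returns
    // it to Q15. No saturation is applied in this sketch.
    public static short Multiply(short a, short b)
    {
        return (short)((a * b) >> 15);
    }
}
```

For example, 0.5 is stored as 16384 and 0.25 as 8192; Multiply(16384, 8192) yields 4096, which is 0.125 in Q15. A fixed-point port would have to replace every floating-point operation in the codec with integer operations of this kind.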

Thanks again!

Omri.

Dec 5, 2011 at 9:08 AM

Omri, what did you change to get it working on WP7?

I added the NSpeex.Silverlight project to my WP7 solution, removed the Contract validation statements, which are missing on WP7, and it compiles. I then used the example from the documentation section, which also compiled and ran fine, but the resulting file is always corrupted. I can see the file size getting smaller and the RIFF header being created, but the file is always corrupted. I tried encoding the byte[] as the mic buffer gets filled, and I also tried encoding after the capture is complete. Either way the file gets smaller, and I can even see some audio data when I open it, but when I try to open it with speexdec, it is rejected:

C:\speex\bin>speexdec.exe c:\temptts\teste.raw
This doesn't look like a Speex file
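One possible explanation, which nobody in this thread confirms, is a container mismatch: as far as I know speexdec reads Speex data wrapped in an Ogg stream, whose pages begin with the bytes "OggS", while the file described above starts with a RIFF header. A minimal sketch for checking which container a file starts with follows; ContainerSniffer and Identify are hypothetical names, not part of NSpeex.

```csharp
using System;

public static class ContainerSniffer
{
    // True if 'data' begins with the ASCII characters of 'tag'.
    private static bool StartsWith(byte[] data, string tag)
    {
        if (data == null || data.Length < tag.Length) return false;
        for (int i = 0; i < tag.Length; i++)
        {
            if (data[i] != (byte)tag[i]) return false;
        }
        return true;
    }

    // Returns "Ogg", "RIFF", or "unknown" based on the leading magic bytes.
    public static string Identify(byte[] header)
    {
        if (StartsWith(header, "OggS")) return "Ogg";
        if (StartsWith(header, "RIFF")) return "RIFF";
        return "unknown";
    }
}
```

Passing the first few bytes of the encoded file to Identify shows whether the output is in the container the decoder is likely to expect.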

My WP7 class follows. If anyone can help, I would be very thankful.

namespace PanoramaApp1
{
    public partial class MainPage : PhoneApplicationPage
    {
        private Microphone microphone = Microphone.Default;
        private byte[] buffer;
        private MemoryStream audiostream;
        private SoundEffect sound;
        private bool hasspeak = false;
        public static ManualResetEvent allDone = new ManualResetEvent(false);

        // Constructor
        public MainPage()
        {
            SpeexEncoder encoder = new SpeexEncoder(BandMode.Wide);
            encoder.Quality = 10;

            InitializeComponent();

            // Set the data context of the listbox control to the sample data
            DataContext = App.ViewModel;
            this.Loaded += new RoutedEventHandler(MainPage_Loaded);
           
            microphone.BufferReady += (object sender, EventArgs e) =>
            {
                microphone.GetData(buffer);
                audiostream.Write(buffer, 0, buffer.Length);
               
                /*
                //ON THE FLY CAPTURE. ALSO FAILED
                short[] data = new short[buffer.Length/2];
                int sampleIndex = 0;
                for (int index = 0; index < buffer.Length; index += 2, sampleIndex++)
                {
                    data[sampleIndex] = BitConverter.ToInt16(buffer, index);
                }

                var encodedData = new byte[buffer.Length];
                // note: the number of samples per frame must be a multiple of encoder.FrameSize
                var encodedBytes = encoder.Encode(data, 0, sampleIndex, encodedData, 0, buffer.Length);
                if (encodedBytes != 0)
                {
                    var upstreamFrame = new byte[encodedBytes];
                    Array.Copy(encodedData, upstreamFrame, encodedBytes);

                    // todo: do something with the encoded data
                    audiostream.Write(upstreamFrame, 0, upstreamFrame.Length);
                }
                */
            };
        }


        public byte[] EncodeAudio(byte[] wavRawData)
        {
            var encoder = new SpeexEncoder(BandMode.Wide);
            //var pcmWaveWriter = new PcmWaveWriter(1, 10, 16000, 1, encoder.FrameSize, true);
            var pcmWaveWriter = new PcmWaveWriter( 16000, 1);

            pcmWaveWriter.Open(@"audio.speex.wav");
            pcmWaveWriter.WriteHeader("Test conversion");

            var inDataSize = wavRawData.Length / 2;
            inDataSize = inDataSize - inDataSize % encoder.FrameSize;

            var inData = new short[inDataSize];

            for (var index = 0; index < inDataSize; index++)
            {
                inData[index] = BitConverter.ToInt16(wavRawData, index * 2);
            }

            for (var offset = 0; offset + encoder.FrameSize <= inDataSize; offset += encoder.FrameSize)
            {
                var encodedBuffer = new byte[1024];
                var encodedBytes = encoder.Encode(inData, offset, encoder.FrameSize, encodedBuffer, 0, encoder.FrameSize);

                var chunk = new byte[encodedBytes];
                Array.Copy(encodedBuffer, 0, chunk, 0, encodedBytes);
                pcmWaveWriter.WritePacket(chunk, 0, chunk.Length);
            }

            pcmWaveWriter.Close();


            IsolatedStorageFile file = IsolatedStorageFile.GetUserStoreForApplication();

            FileStream fs = file.OpenFile(@"audio.speex.wav", FileMode.Open);
            byte[] arr = new byte[fs.Length];
            fs.Read(arr, 0, arr.Length);
            fs.Close();

            return arr;

        }

        // Load data for the ViewModel Items
        private void MainPage_Loaded(object sender, RoutedEventArgs e)
        {
            if (!App.ViewModel.IsDataLoaded)
            {
                App.ViewModel.LoadData();
            }
        }

        private void Button_Hold(object sender, GestureEventArgs e)
        {
            hasspeak = true;
            microphone.BufferDuration = TimeSpan.FromMilliseconds(1000);
            buffer = new byte[microphone.GetSampleSizeInBytes(microphone.BufferDuration)];
            audiostream = new MemoryStream();
            microphone.Start();
            tbxStatus.Text = "Estou ouvindo..."; // Portuguese for "I'm listening..."
        }
 
        private void Button_MouseLeave(object sender, MouseEventArgs e)
        {
            if (hasspeak)
            {
                this.panorama.DefaultItem = 1;
               
                hasspeak = false;
                microphone.Stop();
                SendPost();
            }
            else
            {
//                tbxStatus.Text = "";
            }
        }


        void SendPost()
        {
            ServiceReference1.ftwp7SoapClient req = new ServiceReference1.ftwp7SoapClient();
            req.SRCompleted += new EventHandler<ServiceReference1.SRCompletedEventArgs>(req_SRCompleted);
            byte[] arr = EncodeAudio(audiostream.ToArray());
            req.SRAsync(arr, "sp", "ok");     
        }

        void req_SRCompleted(object sender, ServiceReference1.SRCompletedEventArgs e)
        {
            string ok = "ok";
           
        }

   }
}
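The class above converts the microphone bytes to 16-bit samples and trims them to a whole number of encoder frames before encoding. That step can be isolated and tested on its own, as in the sketch below. PcmFrames and ToWholeFrames are hypothetical names, and the input is assumed to be 16-bit little-endian mono PCM; the frame size would come from encoder.FrameSize in the real code.

```csharp
using System;

public static class PcmFrames
{
    // Converts 16-bit little-endian PCM bytes to samples and drops any
    // trailing samples that do not fill a whole frame of 'frameSize' samples.
    public static short[] ToWholeFrames(byte[] pcm, int frameSize)
    {
        int sampleCount = pcm.Length / 2;
        sampleCount -= sampleCount % frameSize;

        var samples = new short[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            // Low byte first, then high byte, independent of CPU endianness.
            samples[i] = (short)(pcm[2 * i] | (pcm[2 * i + 1] << 8));
        }
        return samples;
    }
}
```

Each run of frameSize samples in the returned array can then be handed to the encoder one frame at a time, as the loop in EncodeAudio does.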