Having trouble encoding/decoding

Jan 18, 2011 at 5:22 AM

Not sure I'm using NSpeex correctly. Basically I've got a 16-bit mono stream I'm getting from Silverlight's Microphone that I'd like to encode. My encoding loop looks like this:

    public override void AddSamples(byte[] samples, uint startIndex, uint numSamples)
    {
        // add the new samples to the end of the store buffer. because we have to pass only
        // a certain number of bytes to the encoder, we'll use storeBuffer to cache the incoming data
        // and serve it up in the given chunks
        uint numConvertedSamples = numSamples / 2;
        this.storeBuffer.AddSamples(samples.ToInt16(startIndex, numSamples), 0, numConvertedSamples);

        // Debug.Assert(numSamples == this.storeBuffer.NumSamples);
        var destIndex = this.outputBuffer.GetUsedBufferEnd(numConvertedSamples);
        var numSamplesToUse = this.storeBuffer.NumSamples -
            (this.storeBuffer.NumSamples % this.encoder.FrameSize);

        // encode the data in the storeBuffer into the output buffer
        var count = this.encoder.Encode(this.storeBuffer.Samples, (int)this.storeBuffer.OutputSampleStart,
                                        (int)numSamplesToUse, this.outputBuffer.Samples, (int)destIndex,
                                        this.outputBuffer.Samples.Length);
        this.storeBuffer.RemoveSamples((uint)numSamplesToUse);
        this.outputBuffer.UpdateNumSamplesInBuffer((uint)count);
    }

And the decoding loop:

      
    public override void AddSamples(byte[] samples, uint startIndex, uint numSamples)
    {
        // stick this in our raw buffer
        this.rawBuffer.AddSamples(samples, startIndex, numSamples);

        // now break these up into chunks of 320
        uint numUsableSamples = this.rawBuffer.NumSamples - (uint)(this.rawBuffer.NumSamples % decoder.FrameSize);

        for (int i = 0; i < numUsableSamples; i += this.decoder.FrameSize)
        {
            uint offsetIndex = this.tempBuffer.GetUsedBufferEnd(4000);

            var bytesWritten = this.decoder.Decode(this.rawBuffer.Samples, i,
                this.decoder.FrameSize, this.tempBuffer.Samples,
                (int)offsetIndex, false);

            this.tempBuffer.UpdateNumSamplesInBuffer((uint)bytesWritten);

            // now need to convert back to bytes and return...
            this.OutputPipe.AddSamples(this.tempBuffer.Samples.ToByte(
                this.tempBuffer.OutputSampleStart, this.tempBuffer.NumSamples,
                this.numBitsPerOutputSample),
                0,
                this.tempBuffer.NumSamples);

            // remove the used samples from the temp buffer.
            this.tempBuffer.RemoveSamples(this.tempBuffer.NumSamples);
        }
        this.rawBuffer.RemoveSamples(numUsableSamples);
    }
Does this look right?
Essentially I'm getting format violations on the second iteration through the decode loop. Both the encoder and decoder are initialized with the wide band mode, although I've also tried narrow. Any ideas?
Jul 18, 2011 at 8:24 AM

Hello,

I have the same problem. I cannot decode audio.

I successfully encode the audio as described in the documentation:

        public byte[] EncodeAudio(byte[] rawData)
        {
            var encoder = new SpeexEncoder(BandMode.Wide);
            var inDataSize = rawData.Length / 2;
            var inData = new short[inDataSize];

            for (var index = 0; index < rawData.Length; index += 2)
            {
                inData[index / 2] = BitConverter.ToInt16(rawData, index);
            }
            inDataSize = inDataSize - inDataSize % encoder.FrameSize;

            var encodedData = new byte[rawData.Length];
            var encodedBytes = encoder.Encode(inData, 0, inDataSize, encodedData, 0, encodedData.Length);

            byte[] encodedAudioData = null;
            if (encodedBytes != 0)
            {
                encodedAudioData = new byte[encodedBytes];
                Array.Copy(encodedData, 0, encodedAudioData, 0, encodedBytes);
            }
            rawDataSize = inDataSize; // Count of encoded shorts, for debugging
            return encodedAudioData;
        }

Then I cannot decode the data.
If I try to decode the whole buffer at once, as follows, I get an IndexOutOfRangeException:
        public byte[] DecodeAudio(byte[] encodedData)
        {
            var decoder = new SpeexDecoder(BandMode.Wide);

            // rawDataSize equals the count of encoded shorts from the encoding step
            var tmpBuffer = new short[rawDataSize];

            // The following line causes the IndexOutOfRangeException
            var read = decoder.Decode(encodedData, 0, encodedData.Length, tmpBuffer, 0, false);

            byte[] retData = null;
            return retData;
        }

If I try to decode the data in chunks, I get an error on the second iteration:
        public byte[] DecodeAudio(byte[] encodedData)
        {            
            var decoder = new SpeexDecoder(BandMode.Wide);
            var outBuffer = new List<byte[]>();
            var tmpBuffer = new short[2048];
            for (var idx = 0; idx + decoder.FrameSize < encodedData.Length; idx += decoder.FrameSize)
            {
                //On the second iteration: NSpeex.InvalidFormatException: Invalid mode encountered: 9
                var read = decoder.Decode(encodedData, idx, decoder.FrameSize, tmpBuffer, 0, false);
                var tmpData = new byte[read * 2];
                for (var i = 0; i < read; i++)
                {
                    var ba = BitConverter.GetBytes(tmpBuffer[i]);
                    Array.Copy(ba, 0, tmpData, i * 2, 2);
                }
                outBuffer.Add(tmpData);
            }
            var fullSize = outBuffer.Sum(m => m.Length);
            var retData = new byte[fullSize];
            var offset = 0;
            foreach (var b in outBuffer)
            {
                Array.Copy(b, 0, retData, offset, b.Length);
                offset += b.Length;
            }
            return retData;
        }

Can anyone tell me what I am doing wrong?
Jul 18, 2011 at 4:58 PM
Edited Jul 18, 2011 at 5:20 PM

I have found a workaround myself. I now also encode the audio in pieces and save a header into the byte array:

        public byte[] EncodeAudio(byte[] rawData)
        {
            var encoder = new SpeexEncoder(BandMode.Wide);
            var encodedData = new List<byte[]>();

            var inDataSize = rawData.Length / 2;
            inDataSize = inDataSize - inDataSize % encoder.FrameSize;
            var inData = new short[inDataSize];           

            for (var index = 0; index < inDataSize; index ++)
            {
                inData[index] = BitConverter.ToInt16(rawData, index*2);
            }
            
            var encodingFrameSize = encoder.FrameSize;
            var encodedBuffer = new byte[1024];
            for (var offset = 0; offset + encodingFrameSize < inDataSize; offset += encodingFrameSize)
            {
                var encodedBytes = encoder.Encode(inData, offset, encodingFrameSize, encodedBuffer, 0, encodingFrameSize);
                var encodedFrame = new byte[encodedBytes];
                Array.Copy(encodedBuffer, 0, encodedFrame, 0, encodedBytes);
                encodedData.Add(encodedFrame);
            }
            var ms = new MemoryStream();
            ms.Write(signature, 0, signature.Length); // write a signature to ensure that it is the right format

            var countBits = BitConverter.GetBytes((Int32)encodedData.Count);
            ms.Write(countBits, 0, countBits.Length); // A count of frames encoded
            foreach (var d in encodedData) // A list of lengths of frames
            {
                var bc = BitConverter.GetBytes((Int32)d.Length);
                ms.Write(bc, 0, bc.Length);  
            }
            foreach (var d in encodedData) // Data 
            {
                ms.Write(d, 0, d.Length);
            }
            return ms.ToArray();
        }

This array can be successfully decoded back (also in pieces) and played, so now it has to be stored in a database. Are there any standard solutions for preparing and storing audio data in a (remote) database? I still need to add the audio format info (samples/sec, channels, bits/sec) to this array.
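I am thinking of simply extending the header that EncodeAudio writes with a few more fields, something like this (my own layout, only a sketch):

            // Hypothetical extra header fields, written right after the signature:
            ms.Write(BitConverter.GetBytes((Int32)16000), 0, 4); // samples per second (wide band = 16 kHz)
            ms.Write(BitConverter.GetBytes((Int16)1), 0, 2);     // channels
            ms.Write(BitConverter.GetBytes((Int16)16), 0, 2);    // bits per sample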

Coordinator
Jul 18, 2011 at 5:25 PM

Hi AlexPuh,

you are right. You have to wrap each encoded frame in some sort of container. The simplest container will at least contain the encoded frame size. Even that can be dropped (or only stored in the first container) if you work with a constant bit rate, since each encoded frame will then have the same size.

Encoding and decoding with Speex always works on a frame basis, which requires passing in correctly sized audio data for encoding and decoding. As mentioned before, if you always encode with the same sampling rate, quality, and a constant bit rate, the encoded frame size will be constant.

Common containers are Ogg, etc., but for simple scenarios you can make up your own.
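For example, a minimal length-prefixed container could be written and read roughly like this (just a sketch to illustrate the idea, not an official NSpeex API; error handling omitted):

        // Writing: prefix each encoded frame with its length in bytes.
        void WriteFrame(Stream stream, byte[] encodedFrame, int encodedBytes)
        {
            stream.Write(BitConverter.GetBytes(encodedBytes), 0, 4);
            stream.Write(encodedFrame, 0, encodedBytes);
        }

        // Reading: recover the exact encoded frame size before calling Decode.
        byte[] ReadFrame(Stream stream)
        {
            var lengthBytes = new byte[4];
            if (stream.Read(lengthBytes, 0, 4) < 4)
                return null; // end of stream
            var frame = new byte[BitConverter.ToInt32(lengthBytes, 0)];
            stream.Read(frame, 0, frame.Length);
            return frame;
        }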

Christoph

Aug 1, 2011 at 5:12 AM

OK, enclosing the frames in a container did the trick. The thing that was really unobvious was that I needed to pass the encoded frame size to the decoder, not decoder.FrameSize.
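In other words, the decode call now looks roughly like this (a sketch; frameLength is the encoded frame size read back from my container):

    var samplesDecoded = decoder.Decode(encodedFrame, 0, frameLength, outBuffer, 0, false);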

Now, however, I've got a different "fun" issue: the played back stream appears sped up by a factor of 10x or so. I'm not really sure what to attribute that to.

A couple of things come to mind:

1. Is the encoder assuming the input stream is mono or stereo? I'm feeding it a mono stream... could that play into it?

2. I've checked the sample rate; it appears to be 16 kHz throughout...

What else could this be? Any ideas?

Aug 14, 2011 at 5:43 PM
//Encoding Part
public static List<byte[]> EncodeAudio(byte[] rawData)
{
    List<byte[]> encodedData = new List<byte[]>();

    int inDataSize = rawData.Length / 2;
    int encodingFrameSize = encoder.FrameSize;
    inDataSize = inDataSize - inDataSize % encodingFrameSize;
    short[] inData = new short[inDataSize];

    for (int index = 0; index < inDataSize; index++)
    {
        inData[index] = BitConverter.ToInt16(rawData, index * 2);
    }

    byte[] encodedBuffer = new byte[1024];

    for (int offset = 0; offset + encodingFrameSize < inDataSize; offset += encodingFrameSize)
    {
        int encodedBytes = encoder.Encode(inData, offset, encodingFrameSize, encodedBuffer, 0, encodingFrameSize);
        byte[] encodedFrame = new byte[encodedBytes];
        Array.Copy(encodedBuffer, 0, encodedFrame, 0, encodedBytes);
        encodedData.Add(encodedFrame);
    }

    return encodedData;
}


//Decoding Part


public static List<byte[]> DecodeAudio(List<byte[]> encodedAudioData)
{
    List<byte[]> decodedAudioData = new List<byte[]>();
    foreach (byte[] dataChunk in encodedAudioData)
    {
        short[] decodedFrame = new short[encoder.FrameSize];
        int decodedBytes = decoder.Decode(dataChunk, 0, dataChunk.Length, decodedFrame, 0, false);
        byte[] DecodedBuffer = new byte[decodedBytes];

        for (int shortIndex = 0, byteIndex = 0; byteIndex < decodedBytes; shortIndex++)
        {
            DecodedBuffer[byteIndex] = (byte)(decodedFrame[shortIndex] >> 8);
            DecodedBuffer[++byteIndex] = (byte)(decodedFrame[shortIndex]);
            byteIndex++;
        }
        decodedAudioData.Add(DecodedBuffer);
    }

    return decodedAudioData;
}


When I try to play the decoded data, I hear a buzzing sound (similar to the no-signal sound on a radio). To play the audio I used the header of the WAV file.
Can anyone help me? I think there's a problem in the decoding part.
Aug 16, 2011 at 6:01 PM

The issue might be how you're converting back into a byte stream. Instead of

DecodedBuffer[byteIndex] = (byte)(decodedFrame[shortIndex] >> 8);
DecodedBuffer[++byteIndex] = (byte)(decodedFrame[shortIndex]);
what I do is

 

BitConverter.GetBytes(samples[startIndex + i]).CopyTo(result, i * 2);
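The full conversion I use looks roughly like this (a sketch; samples, startIndex, and sampleCount are names from my own pipeline):

    // Convert decoded 16-bit samples back into a little-endian byte stream.
    var result = new byte[sampleCount * 2];
    for (var i = 0; i < sampleCount; i++)
    {
        BitConverter.GetBytes(samples[startIndex + i]).CopyTo(result, i * 2);
    }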

 

So I solved the speedup problem... somewhat. Now instead of a 10x speedup, I'm getting about a 1.2x speedup, and a lot of crackle in the audio. Unusable as is.

Coordinator
Aug 16, 2011 at 6:07 PM

The Encoder assumes mono.

How do you fill the playback buffer? You have to be absolutely in sync with the playback buffer in order to play it back correctly.

Aug 17, 2011 at 1:17 PM

 

                    BitConverter.GetBytes(decodedFrame[shortIndex]).CopyTo(DecodedBuffer, shortIndex * 2);

Can anyone tell me why the quality of the sound is poor, and how to fix it?
Dec 26, 2011 at 5:32 PM

AlexPuh, can you share both your Encode and Decode method implementations, as well as the signature you are using?

May 3, 2012 at 12:01 PM

I have the same problem with sound speed. Was anyone able to fix it? Thanks.