Encoding And Decoding - WTF Am I doing wrong?

Mar 15, 2013 at 12:29 AM
Ok, I've tried posting twice before and I really hope these forums aren't dead because this is absolutely critical...

I've got a short[] representing my audio data. I need to convert this to byte[] via Speex, send it over the network, then convert it back to short[] and play it. These are the functions I am using:
private byte[] EncodeAudio( short[] rawData )
{
    var encoder = new NSpeex.SpeexEncoder( NSpeex.BandMode.Wide );
    encoder.Quality = 1;
    var encodedData = new List<byte[]>();

    var inDataSize = rawData.Length;
    inDataSize = inDataSize + inDataSize % encoder.FrameSize;
    var inData = new short[ inDataSize ];
    System.Array.Copy( rawData, inData, rawData.Length );

    var encodingFrameSize = encoder.FrameSize;
    var encodedBuffer = new byte[ 1024 ];
    for( var offset = 0; offset + encodingFrameSize < inDataSize; offset += encodingFrameSize )
    {
        var encodedBytes = encoder.Encode( inData, offset, encodingFrameSize, encodedBuffer, 0, encodingFrameSize );
        var encodedFrame = new byte[ encodedBytes ];
        System.Array.Copy( encodedBuffer, 0, encodedFrame, 0, encodedBytes );
        encodedData.Add( encodedFrame );
    }

    var ms = new MemoryStream();

    var countBits = BitConverter.GetBytes( (Int32)encodedData.Count );
    ms.Write( countBits, 0, countBits.Length );

    for( int i = 0; i < encodedData.Count; i++ )
    {
        var d = encodedData[ i ];
        var bc = BitConverter.GetBytes( (Int32)d.Length );
        ms.Write( bc, 0, bc.Length );
    }
    for( int i = 0; i < encodedData.Count; i++ )
    {
        var d = encodedData[ i ];
        ms.Write( d, 0, d.Length );
    }

    return ms.ToArray();
}

private short[] DecodeAudio( byte[] encodedData )
{
    var decoder = new NSpeex.SpeexDecoder( NSpeex.BandMode.Wide, false );
    var outBuffer = new List<byte[]>();
    var tmpBuffer = new short[ 1024 ];

    int numPackets = BitConverter.ToInt32( encodedData, 0 );

    List<int> packet_sizes = new List<int>();

    for( int i = 0; i < numPackets; i++ )
    {
        int frame_size = BitConverter.ToInt32( encodedData, 4 + ( i * 4 ) );
        packet_sizes.Add( frame_size );
    }

    int fr_index = 0;
    for( var idx = 4 + ( numPackets * 4 ); idx + packet_sizes[ fr_index ] < encodedData.Length; idx += packet_sizes[ fr_index ] )
    {
        var read = decoder.Decode( encodedData, idx, packet_sizes[fr_index], tmpBuffer, 0, false );
        var tmpData = new byte[ read * 2 ];
        for( var i = 0; i < read; i++ )
        {
            var ba = BitConverter.GetBytes( tmpBuffer[ i ] );
            System.Array.Copy( ba, 0, tmpData, i * 2, 2 );
        }
        outBuffer.Add( tmpData );
        fr_index++;
    }
    var fullSize = outBuffer.Sum( delegate( byte[] m ) { return m.Length; } );
    var retData = new byte[ fullSize ];
    var offset = 0;
    for( int i = 0; i < outBuffer.Count; i++ )
    {
        var b = outBuffer[ i ];
        System.Array.Copy( b, 0, retData, offset, b.Length );
        offset += b.Length;
    }

    short[] pcm = new short[ retData.Length / sizeof( short ) ];
    Buffer.BlockCopy( retData, 0, pcm, 0, retData.Length );

    return pcm;
}
But this simply isn't working. The sound is incredibly choppy. This isn't the case with the other two codecs I have implemented (ADPCM and G711); both sound absolutely perfect, but for the life of me I just can't get Speex to work.
Please, can anybody point out what the hell I'm doing wrong?
Coordinator
Mar 15, 2013 at 6:02 AM

Your code is a bit difficult to understand, but check out the sample in the source of this project. You will find exactly what you are looking for.

Chris

Mar 15, 2013 at 8:21 PM
I'm sorry, I can't seem to find the code sample you're referring to.
I'm beginning to really pull my hair out over this - I have no doubt that NSpeex will be useful but the complete lack of documentation is turning out to be a huge pain in the @55.
Mar 15, 2013 at 9:22 PM
Actually I might have made some headway on this.
I went into the source code and modified a few lines in order to avoid the IndexOutOfRange exceptions you'd normally get if you tried encoding/decoding audio in one go instead of in chunks. This is OK for me since I'm already chunking up my audio before encoding it (should be small enough for Speex to handle).
I tested this with a sound file of me speaking into the microphone. I read the float[] pcm samples, converted those into short[] pcm samples, fed them into the speex encoder, then fed the result directly into the speex decoder, and finally took the result, fed it into a new audio clip and played that. It was a fairly long sound so it takes a while for Speex to chew through all of the data, but the result is absolutely perfect and around 8% the size of the original raw audio data.
Now I need to roll this code into my main project and cross my fingers :)
Mar 16, 2013 at 12:40 AM
OK, yes!
It works just about perfectly!
My only issue now is for some reason it consumes absolutely insane amounts of CPU. Perhaps if instead of encoding as soon as I receive microphone data, I stick the raw microphone input on a queue which is periodically processed?
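The queue idea above could look roughly like the sketch below. It is only an illustration under assumptions: `PendingAudioQueue` and `ProcessPending` are made-up names, and `encodeChunk` stands in for whatever actually calls the Speex encoder. Note that a queue does not make the encoding itself cheaper; it only moves the work out of the microphone callback and lets you cap how much is done per tick.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: buffers raw microphone chunks so they can be
// encoded later, outside the capture callback.
sealed class PendingAudioQueue
{
    private readonly Queue<short[]> pending = new Queue<short[]>();
    private readonly object gate = new object();

    // Called from the microphone callback; only stores the chunk.
    public void Enqueue(short[] chunk)
    {
        lock (gate) { pending.Enqueue(chunk); }
    }

    // Called periodically (timer, update loop, worker thread).
    // Encodes at most maxChunks chunks and returns how many were processed.
    public int ProcessPending(int maxChunks, Action<short[]> encodeChunk)
    {
        int processed = 0;
        while (processed < maxChunks)
        {
            short[] chunk;
            lock (gate)
            {
                if (pending.Count == 0) break;
                chunk = pending.Dequeue();
            }
            encodeChunk(chunk); // runs outside the lock
            processed++;
        }
        return processed;
    }
}
```

It may also be worth creating the encoder once and reusing it, rather than constructing a new one per call as the functions in the first post do.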
May 15, 2013 at 2:56 PM
How did you resolve your problem?

Can you post your code (Encode/Decode) here?

I'm trying to do the same, but when I play it on another device, the sound is wrong.
Jul 19, 2013 at 5:37 PM
Edited Jul 19, 2013 at 5:38 PM
So here's what I found you need to do.

First, you NEED to make sure your input data is properly sized. If you're in Narrow mode, I think it takes 320 samples, or in Wide mode 640 samples. In any case, it very specifically needs to be that many samples! Or I think a multiple of that many.

Then just pass your data to encode.
Then, you need to transmit the encoded data, and the length of the data (what's returned by the Encode method). I myself packed this all into a byte array (the first four bytes are the length, all the rest is encoded data).
Then just pass this to Decode.

And it should work perfectly - or at least it does on my end.
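The two steps above (sizing the input to a whole number of frames, then prefixing the encoded bytes with their length) can be sketched without touching NSpeex at all. Everything here is an assumption for illustration: `SpeexFraming` and its methods are made-up names, and `frameSize` is whatever your encoder reports (the first post reads it from `encoder.FrameSize`). As far as I know Speex frames are 20 ms, i.e. 160 samples in narrowband and 320 in wideband, which would make 320 and 640 two frames each; reading the frame size from the encoder avoids guessing. Also note that `n + n % frameSize`, as used in the first post, does not generally give a multiple of the frame size (700 with a frame size of 320 gives 760, not 960).

```csharp
using System;

// Hypothetical helpers; none of this is part of NSpeex.
static class SpeexFraming
{
    // Rounds sampleCount up to the next multiple of frameSize.
    public static int RoundUpToFrame(int sampleCount, int frameSize)
    {
        int remainder = sampleCount % frameSize;
        return remainder == 0 ? sampleCount : sampleCount + (frameSize - remainder);
    }

    // Copies the samples into a buffer padded with zeros (silence)
    // so that its length is a whole number of frames.
    public static short[] PadToFrames(short[] raw, int frameSize)
    {
        var padded = new short[RoundUpToFrame(raw.Length, frameSize)];
        Array.Copy(raw, padded, raw.Length);
        return padded;
    }

    // Builds a packet: 4-byte length followed by the first
    // encodedLength bytes of the encoder's output buffer.
    public static byte[] Pack(byte[] encoded, int encodedLength)
    {
        var packet = new byte[4 + encodedLength];
        Buffer.BlockCopy(BitConverter.GetBytes(encodedLength), 0, packet, 0, 4);
        Buffer.BlockCopy(encoded, 0, packet, 4, encodedLength);
        return packet;
    }

    // Reads the length prefix and returns just the encoded payload.
    public static byte[] Unpack(byte[] packet)
    {
        int length = BitConverter.ToInt32(packet, 0);
        var payload = new byte[length];
        Buffer.BlockCopy(packet, 4, payload, 0, length);
        return payload;
    }
}
```

`BitConverter` uses the machine's byte order, so this assumes both ends of the connection have the same endianness.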
Jul 30, 2013 at 10:33 AM
@Wiffledude: I'm trying to replicate your code. Maybe a noob question, but even though the function Sum applied to a List<T> is documented on the web, I really can't make it work in Visual Studio. The line of code:
var fullSize = outBuffer.Sum( delegate( byte[] m ) { return m.Length; } );
generates an error, as the List type does not contain a definition for the Sum function.
Would you please explain to me how you managed to make it work? Also, if I have to decode a Speex file not encoded by me, how can I get the number of samples for that specific file? Is it fixed (as you said, 320 for narrow, 640 for wide etc.) or do I have to inspect the file?
Thanks in advance!
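On the Sum error: `Sum` is not a member of `List<T>` itself. It is a LINQ extension method, so the file needs `using System.Linq;` (and the project needs a reference to System.Core, .NET 3.5 or later). A minimal sketch, with `BufferMath` being a made-up name for illustration, plus a plain loop for projects where LINQ is not available:

```csharp
using System;
using System.Collections.Generic;
using System.Linq; // brings the Sum extension method into scope

// Hypothetical helper class, for illustration only.
static class BufferMath
{
    // Same computation as the line from the first post, using LINQ.
    public static int TotalLength(List<byte[]> buffers)
    {
        return buffers.Sum(b => b.Length);
    }

    // Equivalent loop that does not need System.Linq.
    public static int TotalLengthNoLinq(List<byte[]> buffers)
    {
        int total = 0;
        foreach (var b in buffers)
        {
            total += b.Length;
        }
        return total;
    }
}
```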
Jul 30, 2013 at 4:20 PM
spiaggefredde wrote:
@Wiffledude: I'm trying to replicate your code. Maybe a noob question, but even though the function Sum applied to a List<T> is documented on the web, I really can't make it work in Visual Studio. The line of code:
var fullSize = outBuffer.Sum( delegate( byte[] m ) { return m.Length; } );
generates an error, as the List type does not contain a definition for the Sum function.
Would you please explain to me how you managed to make it work? Also, if I have to decode a Speex file not encoded by me, how can I get the number of samples for that specific file? Is it fixed (as you said, 320 for narrow, 640 for wide etc.) or do I have to inspect the file?
Thanks in advance!
Lemme grab the actual encoding/decoding function from my code.
int byteLen = 320;
switch( mode )
{
    case BandMode.Narrow:
        byteLen = 320;
        break;
    case BandMode.Wide:
        byteLen = 640;
        break;
    case BandMode.UltraWide:
        byteLen = 1280;
        break;
}

byte[] encoded = new byte[ byteLen + 4 ];

int length = speexEnc.Encode( input, 0, input.Length, encoded, 4, encoded.Length );

// first 4 bytes contains length
byte[] len_bytes = BitConverter.GetBytes( length );

System.Array.Copy( len_bytes, encoded, 4 );

return encoded;
This function takes a short array of 16-bit PCM samples. Before I pass them off to this function, I always ensure the array is the correct size for the bandwidth mode (320, 640, or 1280 samples).
As you can see, I take the length that was returned by Encode and copy it to the beginning of the output byte array (so the first four bytes are the encoded length, and the rest is the actual encoded data).

And my decoding function:
int shortLen = 320;
switch( mode )
{
    case BandMode.Narrow:
        shortLen = 320;
        break;
    case BandMode.Wide:
        shortLen = 640;
        break;
    case BandMode.UltraWide:
        shortLen = 1280;
        break;
}

byte[] len_bytes = new byte[ 4 ];
System.Array.Copy( input, len_bytes, 4 );

int dataLength = BitConverter.ToInt32( len_bytes, 0 );

byte[] actual_bytes = new byte[ input.Length - 4 ];
Buffer.BlockCopy( input, 4, actual_bytes, 0, input.Length - 4 );

short[] decoded = new short[ shortLen ];

speexDec.Decode( actual_bytes, 0, dataLength, decoded, 0, false );

return decoded;
And for me, that works absolutely perfectly.
Aug 6, 2013 at 4:38 AM
Edited Aug 6, 2013 at 7:47 AM
Thanks @Wiffledude, your sample works!

btw, when converting float to short you need shortvalue = floatvalue * 32767.0f;
http://music.columbia.edu/pipermail/music-dsp/2003-January/021613.html
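A small sketch of that float-to-short conversion, assuming the float samples are nominally in the range -1.0 to 1.0. `PcmConvert` is a made-up name, and the clamping step is my addition so that out-of-range input does not wrap around when cast:

```csharp
using System;

// Hypothetical helper, for illustration only.
static class PcmConvert
{
    // Converts float samples (nominally -1.0..1.0) to 16-bit PCM.
    public static short[] FloatToShort(float[] samples)
    {
        var result = new short[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            // Clamp first so values outside -1..1 saturate instead of overflowing.
            float clamped = Math.Max(-1.0f, Math.Min(1.0f, samples[i]));
            result[i] = (short)(clamped * 32767.0f);
        }
        return result;
    }

    // Converts 16-bit PCM back to float samples using the same scale factor.
    public static float[] ShortToFloat(short[] samples)
    {
        var result = new float[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            result[i] = samples[i] / 32767.0f;
        }
        return result;
    }
}
```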
Nov 18, 2013 at 12:16 AM
Hi,
Can you paste your entire encoding and decoding functions?