
Implementing a serverside soundmixer using SoX

In a recent project the primary goal was to deliver a fully functional soundmixer based on ActionScript and Flash. Users should be able to mix a pre-defined set of sound samples by dropping them onto multiple tracks, as shown in Figure 1. When the user hits the record button, the selected samples should be mixed and made available as a downloadable MP3 file.

Fig. 1: Soundmixer Tracks

The first approach was to do the sound processing entirely on the client side, i.e. by directly mixing the sound samples’ ByteArrays using ActionScript. This amounted to adding up the bytes of the left and right channels and computing the average. Unfortunately, the resulting sound quality was not satisfying (noise, etc.), so a different approach was taken.
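To make the averaging idea concrete, here is a minimal sketch in Python rather than ActionScript. It is not the code from the project: the function name `mix_average` is made up for illustration, and it assumes the audio has already been decoded into signed 16-bit integer samples for a single channel. With 16-bit audio, averaging the raw bytes one by one is not the same as averaging the sample values.

```python
def mix_average(a, b):
    """Mix two sample sequences (signed 16-bit ints, one channel) by averaging.

    Hypothetical helper for illustration only. The shorter input is treated
    as if it were padded with silence (zeros).
    """
    length = max(len(a), len(b))
    mixed = []
    for i in range(length):
        sample_a = a[i] if i < len(a) else 0
        sample_b = b[i] if i < len(b) else 0
        # The average of two 16-bit values always fits into 16 bits again.
        mixed.append((sample_a + sample_b) // 2)
    return mixed


print(mix_average([1000, -1000, 200], [3000, 1000]))
```

Note that plain averaging also halves the level of each source, which is one reason more elaborate mixing schemes exist.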

Why not do the sound mixing entirely on the server side?

Mixing raw audio data loaded from the samples on the client side through ActionScript obviously requires a certain amount of effort. Various solutions exist, ranging from simply computing weighted floats for the channel bytes to using rather complex mixing algorithms. But why reinvent the wheel in ActionScript? Fortunately, various tools exist that do an excellent job of converting, concatenating and mixing audio files, like SoX. If we move the mixing process to the backend, we get a rich set of functionality to handle the audio data selected by the user and can also minimize the code on the client side.

Consequently, the next step was to define a process to delegate the mixing to the backend by telling it which audio files to mix. For this to work we have to keep in mind that there may be empty track segments, i.e. segments where no sound sample should be played. This is important because all tracks need to have the same length when they are mixed. Once we have this information, it is simply a set of calls to SoX (or any other tool of your choice) with the respective command line switches and arguments to produce the mixed audio file.

The final solution was to build a multi-dimensional array in ActionScript holding the metadata of each track and its samples. This metadata is then posted as a JSON object to the backend for the actual processing. The metadata structure used was something like the following:

{
  track1 : [ "sample1", "empty", "sample2", "empty" ],
  track2 : [ "empty", "sample3", "empty", "sample1" ],
  ...,
}
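As a rough sketch of how a backend could read this structure, the following Python snippet parses the posted metadata and pads shorter tracks with "empty" segments so that all tracks end up with the same number of segments. The function name `load_tracks` is hypothetical, and the payload is assumed to be strict JSON, i.e. with quoted keys, unlike the shorthand above.

```python
import json

EMPTY = "empty"  # marker for a segment without a sample, as in the structure above


def load_tracks(payload):
    """Parse the posted metadata and pad every track to the same segment count.

    Hypothetical helper: assumes a JSON object that maps track names to lists
    of sample identifiers.
    """
    tracks = json.loads(payload)
    if not isinstance(tracks, dict) or not tracks:
        raise ValueError("expected a non-empty JSON object of tracks")
    for name, segments in tracks.items():
        if not isinstance(segments, list) or not all(isinstance(s, str) for s in segments):
            raise ValueError(f"track {name!r} must be a list of sample identifiers")
    longest = max(len(segments) for segments in tracks.values())
    # Fill missing trailing segments with silence markers.
    return {name: segments + [EMPTY] * (longest - len(segments))
            for name, segments in tracks.items()}


payload = '{"track1": ["sample1", "empty"], "track2": ["empty", "sample3", "empty", "sample1"]}'
print(load_tracks(payload))
```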

In order to assemble this metadata you need to implement a function that gets called every time the play head hits a new track segment. Furthermore, this should be done for each track. Once the play head hits a new segment you only need to add the respective sample identifier to the corresponding metadata field, e.g. “sample1”.
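The bookkeeping described above could look roughly like the following. The project did this in ActionScript; this Python sketch only illustrates the idea, and the class name `TrackMetadata` and the callback `on_segment_entered` are invented for this example.

```python
import json


class TrackMetadata:
    """Collects, per track, which sample sits in which segment (hypothetical sketch)."""

    def __init__(self, track_names, segment_count):
        # Every segment starts out as "empty" until the play head reports a sample.
        self.tracks = {name: ["empty"] * segment_count for name in track_names}

    def on_segment_entered(self, track, index, sample_id=None):
        """Call this whenever the play head hits segment `index` of `track`."""
        self.tracks[track][index] = sample_id or "empty"

    def to_json(self):
        """Serialize the collected metadata for posting to the backend."""
        return json.dumps(self.tracks)


metadata = TrackMetadata(["track1", "track2"], segment_count=4)
metadata.on_segment_entered("track1", 0, "sample1")
metadata.on_segment_entered("track1", 2, "sample2")
metadata.on_segment_entered("track2", 1, "sample3")
metadata.on_segment_entered("track2", 3, "sample1")
print(metadata.to_json())
```

Pre-filling each track with "empty" means segments without a sample need no special handling, and all tracks have the same number of segments by construction.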

On the backend side two steps are required to get to the mixed audio file:

  1. Concatenate sound samples of each track, resulting in track1_mixed.wav, track2_mixed.wav, etc.
  2. Mix the per-track files into the final tracks_mixed.mp3
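The two steps above can be sketched as a function that builds the SoX command lines. This is an illustration in Python, not the project's PHP code, and it rests on several assumptions: the function name `build_sox_commands` and the directory layout are made up, all samples share the same duration, sample rate and channel count, and a silence file of matching format already exists (it could be generated with something like `sox -n -r 44100 -c 2 silence.wav trim 0.0 2.0`). Listing several input files makes SoX concatenate them, and the `-m` option mixes the inputs instead. Writing MP3 output requires a SoX build with MP3 support.

```python
from pathlib import Path


def build_sox_commands(tracks, sample_dir, work_dir, silence_wav,
                       output="tracks_mixed.mp3"):
    """Return the SoX command lines for concatenating and mixing the tracks.

    Hypothetical helper: `tracks` maps track names to lists of sample
    identifiers, where "empty" stands for a silent segment.
    """
    sample_dir, work_dir = Path(sample_dir), Path(work_dir)
    commands, track_files = [], []

    # Step 1: concatenate the segments of each track into <track>_mixed.wav.
    for name, segments in tracks.items():
        inputs = [str(silence_wav) if seg == "empty"
                  else str(sample_dir / f"{seg}.wav")
                  for seg in segments]
        track_file = str(work_dir / f"{name}_mixed.wav")
        commands.append(["sox", *inputs, track_file])
        track_files.append(track_file)

    # Step 2: mix all track files into the final MP3.
    final = str(work_dir / output)
    if len(track_files) == 1:
        # With a single track there is nothing to mix, so just convert it.
        commands.append(["sox", track_files[0], final])
    else:
        commands.append(["sox", "-m", *track_files, final])
    return commands


tracks = {"track1": ["sample1", "empty"], "track2": ["empty", "sample3"]}
for command in build_sox_commands(tracks, "samples", "work", "samples/silence.wav"):
    print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)`. Building the commands as argument lists rather than shell strings avoids quoting problems with file names.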

Note that ActionScript currently only fully supports WAV files. That’s why all sound samples are available as WAV files and any internal sound processing can be done on this raw audio data.

The Backend Code

For this project a PHP backend was used. Furthermore, a simple API was provided to handle the submission of the audio metadata and return the URL to the mixed file to be downloaded.
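The original backend was written in PHP and its code is not shown here. As a language-neutral illustration of what such an API endpoint has to do, here is a minimal Python sketch. Everything in it is an assumption: the function name `handle_mix_request`, the set `KNOWN_SAMPLES`, the `mix_tracks` callback that performs the actual SoX processing, and the base URL.

```python
import json

# Hypothetical whitelist of the pre-defined sample identifiers.
KNOWN_SAMPLES = {"sample1", "sample2", "sample3"}


def handle_mix_request(body, mix_tracks, base_url="https://example.com/mixes/"):
    """Validate the posted metadata, trigger the mix and return (status, JSON body).

    `mix_tracks` is an assumed callback that takes the validated tracks,
    produces the mixed file and returns its file name.
    """
    try:
        tracks = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "invalid JSON"})
    if not isinstance(tracks, dict) or not tracks:
        return 400, json.dumps({"error": "expected an object of tracks"})
    for segments in tracks.values():
        if not isinstance(segments, list):
            return 400, json.dumps({"error": "each track must be a list"})
        for seg in segments:
            # Identifiers end up in file paths, so accept only known samples.
            if not isinstance(seg, str) or (seg != "empty" and seg not in KNOWN_SAMPLES):
                return 400, json.dumps({"error": "unknown sample identifier"})
    filename = mix_tracks(tracks)
    return 200, json.dumps({"url": base_url + filename})


status, response = handle_mix_request(
    '{"track1": ["sample1", "empty"], "track2": ["empty", "sample3"]}',
    mix_tracks=lambda tracks: "tracks_mixed.mp3",
)
print(status, response)
```

Checking identifiers against a fixed set matters because they are turned into file names that are passed to a command line tool.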

I will add some code later...

Conclusion

As you can see, it is rather simple to process audio files on the server side using SoX or any other audio processing tool of your choice. SoX is a very powerful tool for audio processing and provides a vast range of options to work with.
