In this post we will manipulate WAV files using Python. The goal is to add noise, or any other sound, to a recorded WAV file.
Record your voice
Let’s record a sample of your voice using the rec command, which is part of SoX:
userk@dopamine:~$ mkdir temp && cd temp
userk@dopamine:~/temp$ rec voice.wav
Input File : 'default' (alsa)
Channels : 2
Sample Rate : 48000
Precision : 16-bit
Sample Encoding: 16-bit Signed Integer PCM
In:0.00% 00:00:03.16 [00:00:00.00] Out:147k [ | ] Hd:0.0 Clip:0
PRESS CTRL+C
Aborted.
To play back your recording, use the play command:
userk@dopamine:~/temp$ play voice.wav
voice.wav:
File Size: 590k Bit Rate: 1.54M
Encoding: Signed PCM
Channels: 2 @ 16-bit
Samplerate: 48000Hz
Replaygain: off
Duration: 00:00:03.07
In:100% 00:00:03.07 [00:00:00.00] Out:147k [ | ] Hd:0.0 Clip:0
Done.
By default, the rec command uses 2 channels and a sampling rate of 48 kHz. This information is important since we are going to manipulate this file. For simplicity’s sake, we will use sox to convert the sample to a single-channel signal with a 16 kHz sampling rate.
userk@dopamine:~/temp$ sox -t wav voice.wav -t wav -r 16000 -b 16 -e signed-integer -c 1 voice1.wav
Then listen to the newly created sample:
userk@dopamine:~/temp$ play voice1.wav
voice1.wav:
File Size: 98.3k Bit Rate: 256k
Encoding: Signed PCM
Channels: 1 @ 16-bit
Samplerate: 16000Hz
Replaygain: off
Duration: 00:00:03.07
In:100% 00:00:03.07 [00:00:00.00] Out:49.2k [ | ] Clip:0
Done.
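The properties that play reports can also be checked from Python before any processing. The following is a minimal sketch using the standard-library wave module (not the audiolab library used later in this post). The helper name describe_wav is hypothetical, and the file to inspect is assumed to be passed on the command line.

```python
import sys
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, duration_s) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8          # sample width is given in bytes
        duration = wav.getnframes() / rate     # frames divided by frames per second
    return channels, rate, bits, duration


if __name__ == "__main__":
    # Assumption: file names are given as arguments, e.g. voice1.wav
    for name in sys.argv[1:]:
        print(name, describe_wav(name))
```

For the converted sample we would expect one channel, 16000 Hz and 16 bits, in line with the output of play.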
Download the sound sample
Feel free to download any sound you want but keep in mind that we are looking for a .wav file.
You can download a file directly from this link and use it with this tutorial.
For example, if you want to overlay a car sound on your voice:
userk@dopamine:~/temp$ wget http://www.userk.co.uk/download/sound/machine.wav
userk@dopamine:~/temp$ ls
machine.wav voice.wav voice1.wav
We can analyze the characteristics of the sound file using avprobe from the package libav-tools.
userk@dopamine:~/temp$ sudo apt-get install libav-tools
userk@dopamine:~/temp$ avprobe machine.wav
Input #0, wav, from 'machine.wav':
  Metadata:
    encoded_by : ZOOM Handy Recorder H4n
    date : 2013-06-29
    creation_time : 01:17:38
    time_reference : 223584000
    coding_history : A=PCM,F=48000,W=16,M=stereo,T=ZOOM Handy Recorder H4n
  Duration: 00:01:08.10, bitrate: 1536 kb/s
    Stream #0:0: Audio: pcm_s16le ( / 0x0001), 48000 Hz, 2 channels, s16, 1536 kb/s
As you can see, the wav file has a sampling rate of 48 kHz and 2 channels. We need to convert it with sox so that it matches the converted voice sample (16 kHz, one channel, 16-bit).
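The bitrate figures printed by avprobe and play follow directly from the sampling rate, the number of channels and the sample size of uncompressed PCM audio. A short sketch of that arithmetic (the function name is hypothetical):

```python
def pcm_bitrate_kbps(sample_rate_hz, channels, bits_per_sample):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * channels * bits_per_sample / 1000


print(pcm_bitrate_kbps(48000, 2, 16))  # 1536.0, the value avprobe shows for machine.wav
print(pcm_bitrate_kbps(16000, 1, 16))  # 256.0, the value play shows for voice1.wav
```

This also shows why the converted files are so much smaller: a third of the sampling rate and half the channels give one sixth of the data.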
userk@dopamine:~/temp$ sox -t wav machine.wav -t wav -r 16000 -b 16 -e signed-integer -c 1 machine1.wav remix 2
Note that the remix 2 effect at the end of the command selects the second channel of the input as the source of the single-channel output.
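To make the channel selection concrete: in a stereo PCM wav file the samples are interleaved as left, right, left, right, so keeping the second channel amounts to taking every other sample starting at index 1. Here is a small sketch of that idea in plain Python. The function name is hypothetical, and the resampling to 16 kHz that sox performs in the same command is not shown.

```python
from array import array


def pick_channel(interleaved, n_channels, channel):
    """Return the samples of one channel (0-based index) from interleaved PCM samples."""
    if not 0 <= channel < n_channels:
        raise ValueError("channel index out of range")
    return interleaved[channel::n_channels]


# Toy stereo signal: left samples are positive, right samples are negative.
stereo = array("h", [1, -1, 2, -2, 3, -3])

# sox numbers channels from 1, so its channel 2 is index 1 here.
right = pick_channel(stereo, n_channels=2, channel=1)
print(list(right))  # prints [-1, -2, -3]
```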
Adding one file to the other using Python
Let’s get our hands dirty! We are going to use two libraries: numpy to manipulate the audio data, and audiolab for input/output operations such as opening and saving sound files.
We need the scikit-learn, scikits.audiolab and numpy Python packages installed.
userk@dopamine:~/temp$ sudo apt-get install libsndfile1-dev
userk@dopamine:~/temp$ pip install scikits.audiolab --user
Here is the script we will be using
The first two lines import the required libraries. Lines 4 and 5 then load the previously saved sound samples and extract their encoding and sampling frequency.
Lines 7 and 8 check whether the two samples differ in encoding or sampling frequency.
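A check of this kind can be sketched as follows. This version uses the standard-library wave module instead of audiolab, so it compares the sample width in bytes rather than audiolab's encoding string, and it also compares the channel count. The function name same_format is hypothetical.

```python
import wave


def same_format(path_a, path_b):
    """Return True if both WAV files share sample rate, sample width and channel count."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        return (
            a.getframerate() == b.getframerate()
            and a.getsampwidth() == b.getsampwidth()
            and a.getnchannels() == b.getnchannels()
        )
```

Calling same_format("voice1.wav", "machine1.wav") should return True once both files have been converted with sox as described earlier.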
Line 10 picks a sub-array with the same length as the voice signal from the wave file containing the noise. Note that we use the numpy.split() function to cut the data2 array into three pieces and the // operator to perform floor division. You can find more information about the division operator in Python here.
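As an illustration of that step, the sketch below cuts a window of the voice's length out of a longer noise array. It assumes the window is centred in the noise file, which the text does not state. The array names mirror data2 from the description, the helper name is hypothetical, and the toy arrays stand in for real audio.

```python
import numpy as np


def middle_window(noise, length):
    """Return `length` consecutive samples taken from the middle of `noise`."""
    if length > len(noise):
        raise ValueError("noise is shorter than the requested window")
    start = (len(noise) - length) // 2  # floor division keeps the index an integer
    # Splitting at two indices yields three pieces: before, window, after.
    _, window, _ = np.split(noise, [start, start + length])
    return window


data1 = np.zeros(4)        # stand-in for the voice samples
data2 = np.arange(11.0)    # stand-in for the longer noise samples
print(middle_window(data2, len(data1)))
```

With plain / instead of //, start would be a float in Python 3 and could not be used as a split index.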
Finally, the for loop creates several noise-affected samples by adding the noise signal to the voice with different weights.
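A minimal sketch of such a loop is given below. The weights, the clipping step and the variable names are assumptions made for illustration, not the values used by the original script. The samples are assumed to be floats in the range [-1, 1], synthetic signals stand in for the recorded files, and writing the results to disk is left out.

```python
import numpy as np


def mix(voice, noise, weight):
    """Add `weight` times the noise to the voice and clip the result to [-1, 1]."""
    if voice.shape != noise.shape:
        raise ValueError("voice and noise must have the same shape")
    return np.clip(voice + weight * noise, -1.0, 1.0)


# Toy signals standing in for the voice and the trimmed noise window (1 s at 16 kHz).
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
voice = 0.5 * np.sin(2 * np.pi * 220 * t)
noise = np.random.default_rng(0).uniform(-1.0, 1.0, t.size)

# Hypothetical weights: a larger weight gives a noisier sample.
noisy = {w: mix(voice, noise, w) for w in (0.1, 0.3, 0.5)}
for w, signal in noisy.items():
    print(f"weight={w}: peak amplitude {np.abs(signal).max():.2f}")
```

Clipping guards against values outside the valid range when the weighted noise pushes the sum past full scale.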