Tuesday, July 22, 2008

Recovering an Invalid WAV File

Last Saturday I was using my newish Edirol R-09HR to record a gig (Almaden Auto Festival). I had it in a shock mount on a mic stand, which worked fine for more than 3 hours. But towards the end, a bunch of little kids were dancing nearby and, you guessed it, one of them finally hit the stand and knocked it over.

After the song ended, I jumped down from the stage to inspect the situation. This recorder writes to an SD card. It has no moving parts, so I didn't expect any damage, and in fact I figured it would still be recording. Nope, the power was off. But there was no apparent damage to the recorder, so I quickly started it up again and set it to record the last bit of our set while I jumped back on stage.

Later at home, I downloaded the files and opened them all in Sound Forge -- all but one of them. Uh-oh, the 1.5GB file containing the entire set for The Iconics was invalid. Yikes!

Eventually, as one does, I turned to Google and searched for something like [invalid wav file]. In due course this led me to some forum posts that suggested using Audacity's "Import Raw Data" function. Hooray! I'm saved!

I had been meaning to try out Audacity, so I had no hesitation about downloading it. It started right up; I found "Import Raw Data" under the "Project" menu and clicked it. As expected, an options window opened. I knew my format was 24-bit, 44.1kHz stereo WAV, and I thought that would be enough. Wrong. There were also byte-order options: big-endian or little-endian.

I won't bore the reader with details of how I spent the next hour or so. Suffice it to say I tried every one of the endian options, combined with several different data formats, and never got anything more than noise. I even tried a raw import of a file that I knew was good. Still no dice.

Finally I had an idea. I knew that the WAV files had headers, and that the raw import would treat them as if they were sound data, but I hadn't considered that the header could be throwing off the alignment of the data.

Think of it this way: 24-bit stereo requires six bytes for each sample frame (3 bytes per channel, two channels). The importer chews through the file in groups of six bytes, starting at the very beginning. If the header's size is not a multiple of six, then by the time the importer reaches the real data it is out of alignment and interprets every byte in the wrong position, which comes out as noise.
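
To make the alignment argument concrete, here is a small Python sketch. The function names are made up for illustration, and it assumes the 44-byte header that the dump later in this post turns out to have.

```python
def frame_size(bits_per_sample: int, channels: int) -> int:
    """Bytes in one sample frame: one sample for every channel."""
    return (bits_per_sample // 8) * channels


def misalignment(header_bytes: int, bits_per_sample: int, channels: int) -> int:
    """How far into a frame the real audio starts when the header is
    (wrongly) imported as audio. 0 means the frames still line up."""
    return header_bytes % frame_size(bits_per_sample, channels)


print(frame_size(24, 2))        # 6 bytes per frame for 24-bit stereo
print(misalignment(44, 24, 2))  # 2: every frame is read two bytes off
print(misalignment(44, 16, 2))  # 0: 16-bit stereo frames would still line up
```

A remainder of 2 means every frame the importer builds takes its last two bytes from the next frame. That is why nothing but noise came out, whichever byte order was chosen.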

I Googled my way to a nice description of the Microsoft WAVE format at Stanford's CCRMA.

The answer was right in front of me, but by then it was 2am and my brain was toast. Bedtime for Bonzo.

The next morning I could attack it with fresh eyes. First, I decided to dump the beginning of the damaged file and compare it both to an undamaged file and to the fields described in the diagram.

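Here's the dump of an undamaged file, made with the Unix command "od -N 128 -x -c" under Cygwin. Note that od writes the offsets (the first column) in octal, and that -x prints each pair of bytes as a little-endian 16-bit word, so "4952" is the bytes "R" (52) and "I" (49) in swapped order.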

0000000 4952 4646 7e24 0d68 4157 4556 6d66 2074
R I F F $ ~ h \r W A V E f m t
0000020 0010 0000 0001 0002 ac44 0000 0998 0004
020 \0 \0 \0 001 \0 002 \0 D 254 \0 \0 230 \t 004 \0
0000040 0006 0018 6164 6174 7e00 0d68 3925 b500
006 \0 030 \0 d a t a \0 ~ h \r % 9 \0 265
0000060 ffc8 3159 1000 ffe9 16f7 7b00 fffe 0793
310 377 Y 1 \0 020 351 377 367 026 \0 { 376 377 223 \a
0000100 6b00 0012 ef9b 50ff 002a cc71 acff 0040
\0 k 022 \0 233 357 377 P * \0 q 314 377 254 @ \0
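
As a cross-check against the CCRMA chart, the first 44 bytes of this dump can be decoded field by field with Python's `struct` module. This sketch assumes the "canonical" layout from that chart, in which a 16-byte fmt chunk is followed immediately by the data chunk. The field names follow the chart, and the hex string is transcribed by hand from the dump above.

```python
import struct

# First 44 bytes of the undamaged file, transcribed from the od dump above.
# od -x shows byte-swapped 16-bit words ("4952"), so the bytes are written
# here in file order ("5249" = "RI").
GOOD_HEADER = bytes.fromhex(
    "52494646247e680d57415645666d7420"  # offsets 0-15
    "100000000100020044ac000098090400"  # offsets 16-31
    "0600180064617461007e680d"          # offsets 32-43
)

FIELDS = ("riff_id", "chunk_size", "wave_id", "fmt_id", "fmt_size",
          "audio_format", "num_channels", "sample_rate", "byte_rate",
          "block_align", "bits_per_sample", "data_id", "data_size")


def parse_canonical_header(header: bytes) -> dict:
    """Unpack a 44-byte canonical PCM WAV header (integers are little-endian)."""
    if len(header) < 44:
        raise ValueError("need at least 44 bytes")
    return dict(zip(FIELDS, struct.unpack("<4sI4s4sIHHIIHH4sI", header[:44])))


hdr = parse_canonical_header(GOOD_HEADER)
for name, value in hdr.items():
    print(f"{name:>16}: {value}")
```

For the good file this gives 2 channels, 44100 Hz, 24 bits per sample, a block align of 6 and a byte rate of 264600 (44100 × 6). The data size is 224951808, which is exactly the RIFF chunk size (224951844) minus 36.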

Now here's a dump of the damaged file:

0000000 4952 4646 0000 0000 4157 4556 6d66 2074
R I F F \0 \0 \0 \0 W A V E f m t
0000020 0010 0000 0001 0002 ac44 0000 0998 0004
020 \0 \0 \0 001 \0 002 \0 D 254 \0 \0 230 \t 004 \0
0000040 0006 0018 6164 6174 0000 0000 340a ed00
006 \0 030 \0 d a t a \0 \0 \0 \0 \n 4 \0 355
0000060 0048 3878 7d00 004b 3aaf 7b00 004f 3be8
H \0 x 8 \0 } K \0 257 : \0 { O \0 350 ;
0000100 7800 0051 3d29 e100 0054 4083 9200 0057
\0 x Q \0 ) = \0 341 T \0 203 @ \0 222 W \0

Some observations:
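  1. The damaged file has an intact WAVE header: apart from the two size fields, it is byte-for-byte identical to the good file's header.
  2. Comparison with the chart confirms that the actual data begins at offset 44 decimal. (The line labeled 0000040 starts at octal 40, which is 32 decimal. Count 12 more bytes from there, past BlockAlign, BitsPerSample, the "data" tag and the size field.)
  3. In the damaged file, the "Subchunk2 Size" is zero, and so is the RIFF "ChunkSize" at offset 4. This makes perfect sense: those sizes can only be written once the recording is finished, and the power went off before that could happen.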
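
The same observations can be checked in code. This is a hedged sketch: the function name is hypothetical, and the hex string is transcribed by hand from the damaged file's dump. Rather than assuming a fixed 44-byte header, it walks the RIFF sub-chunks until it reaches "data". It also shows the octal arithmetic behind observation 2.

```python
import struct

# First 44 bytes of the damaged file, transcribed from its od dump (file byte order).
DAMAGED_HEADER = bytes.fromhex(
    "524946460000000057415645666d7420"  # ChunkSize (offsets 4-7) is zero
    "100000000100020044ac000098090400"
    "060018006461746100000000"          # Subchunk2 Size (offsets 40-43) is zero
)


def locate_data(buf: bytes) -> tuple:
    """Walk the RIFF sub-chunks and return (offset of first audio byte, declared data size)."""
    if buf[0:4] != b"RIFF" or buf[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE header")
    pos = 12  # the first sub-chunk follows the 12-byte RIFF descriptor
    while pos + 8 <= len(buf):
        chunk_id, size = struct.unpack("<4sI", buf[pos:pos + 8])
        if chunk_id == b"data":
            return pos + 8, size
        pos += 8 + size + (size & 1)  # RIFF chunks are padded to an even length
    raise ValueError("no data chunk found in the bytes given")


riff_size = struct.unpack("<I", DAMAGED_HEADER[4:8])[0]
offset, data_size = locate_data(DAMAGED_HEADER)
print(riff_size, offset, data_size)  # 0 44 0
print(int("40", 8), int("54", 8))    # 32 44: od's line 0000040 starts at byte 32, audio at octal 054
```

Walking the chunks instead of hard-coding 44 matters only for files that carry extra chunks before "data". For this recorder's files, both approaches land on byte 44.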
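
Now we can fix our problem by setting the "Start offset" in Audacity's raw import to 44 bytes, with the other settings matching the recording: signed 24-bit PCM, little-endian byte order (which is what WAV uses), 2 channels, and 44100 Hz.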
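
This isn't what I did (Audacity did the job), but the same idea can be sketched with Python's standard `wave` module. Skip the first 44 bytes and write everything after them under a freshly built, correct header. The function name, the file names and the defaults (24-bit stereo at 44.1 kHz, offset 44) are assumptions chosen to match this recording. Adjust them to match yours.

```python
import wave


def recover_raw_pcm(src_path: str, dst_path: str, offset: int = 44,
                    channels: int = 2, sample_width: int = 3,
                    sample_rate: int = 44100, chunk_frames: int = 1 << 16) -> int:
    """Hypothetical helper: copy raw PCM starting at `offset` into a new WAV file.

    Returns the number of whole sample frames written.
    """
    frame_bytes = channels * sample_width
    frames = 0
    leftover = b""
    with open(src_path, "rb") as src, wave.open(dst_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)  # 3 bytes = 24-bit
        dst.setframerate(sample_rate)
        src.seek(offset)                # skip the damaged header
        while True:
            block = src.read(chunk_frames * frame_bytes)
            if not block:
                break
            data = leftover + block
            whole = len(data) - len(data) % frame_bytes
            dst.writeframes(data[:whole])  # wave fills in the sizes on close
            leftover = data[whole:]
            frames += whole // frame_bytes
    # A trailing partial frame (cut off when the power died) is dropped.
    return frames
```

Usage would be along the lines of `recover_raw_pcm("damaged.wav", "recovered.wav")`. The `wave` module writes the size fields itself when the output file is closed, which is exactly the step the recorder never got to do.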

That's it! It worked fine.


Micah said...

Perfect! This worked great for me! Thanks for helping me along on such a hard issue. The only thing I had to fiddle with was the sample rate (Hz) setting. The default was too high and everyone sounded like chipmunks. Thanks again!

Chip Chapin (G) said...

Thanks Micah, I'm glad this helped somebody. Reading it over again, I think I could have said it all in a lot fewer words!

Dank said...

I have an invalid WAV file from a recent shoot that I can't afford to lose. I have tried Audacity and Audition (offsets 1-5, and also 44 because of your post).

I can't figure out what I am doing wrong or if this file is just not recoverable. The file size is 500+ MB so there is data there. Can you please take a look at the hex screen shot and let me know if you have any ideas:


This was recorded with a Zoom H4n, 16-bit, 44100 Hz.