More than you ever wanted to know about pitch-shifting
I wrote some code to re-implement random sound effect pitch-shifting in Doom, specifically the Chocolate-Doom project. This page is just an info-dump of the notes and details that I made along the way.
Measurements
A big part of the project was measuring the behaviour of the pitch-shift in old versions of Doom. One approach to this would be to run it through a disassembler and trace the machine code. I know some people are using this approach to figure out how the OPL code works.
I took a different approach: knowing a bit about the pitch calculations from the Linux Doom source code, I made small modifications to the EXE to make the random number generator predictable. I then also modified the game data so the sound effects were replaced with a single, one-second sine wave, tuned to middle C. I then started the game up, triggered some sound effects and recorded the output. I could then compare the result against the input sine wave.
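The comparison step can be sketched roughly as follows. This is not the tooling used for the measurements here; it is a minimal illustration that assumes unsigned 8-bit mono samples, a sample rate of 11025 Hz, and an equal-tempered middle C of 261.63 Hz. The names `sine_u8` and `estimate_hz` are made up for the example. It estimates frequency by counting rising crossings of the midpoint, which is crude but adequate for a clean sine wave.

```python
import math

SAMPLE_RATE = 11025   # assumption: the usual rate for Doom sound effects
MIDDLE_C_HZ = 261.63  # assumption: equal-tempered middle C (C4)


def sine_u8(freq_hz, seconds=1.0, rate=SAMPLE_RATE):
    """Return unsigned 8-bit mono samples of a sine wave."""
    n = int(seconds * rate)
    return bytes(
        int(round(127.5 + 127.5 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n)
    )


def estimate_hz(samples, rate=SAMPLE_RATE):
    """Estimate frequency by counting rising crossings of the midpoint (128)."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if prev < 128 <= cur
    )
    return crossings * rate / len(samples)


if __name__ == "__main__":
    reference = sine_u8(MIDDLE_C_HZ)
    # Stand-in for a recording of the game's output, pitched up by 6%.
    recorded = sine_u8(MIDDLE_C_HZ * 1.06)
    print(estimate_hz(recorded) / estimate_hz(reference))
```

The ratio of the two estimates is the kind of number that appears in the measurement tables on this page. A real recording would also need trimming and possibly resampling before counting crossings.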
The sine wave file and test scripts and code:
- u8 middlec.wav: A single sound effect, 1 sec sine wav tuned to middle C. (8 bit mono 11050Hz)
- sine.wad: switch.wav replacing DSPODTH1.
- switch.wad: A single sound effect, as with sine.wad but replacing SWITCH (Heretic sound effect)
- moan.py: opens doom1.wad, expecting the 1.2 SW IWAD, replaces all directory entries for sound effects with one pointing at DSPODTH1
- sine.py: opens doom2.wad and reads out all sound directory entries. Opens sine.wad and reads all the data. Writes out sine2.wad, mapping all doom2 sfx to the one sound effect in sine.wad.
- sizesfx.py: opens doom2.wad, prints out how many bytes are used by sound effects.
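For a flavour of what a script like sizesfx.py has to do, here is a minimal sketch of walking a WAD directory and totalling the sound effect lumps. It is not the original script. It assumes the standard WAD layout (a 12-byte header followed somewhere by 16-byte directory entries) and treats any lump whose name starts with `DS` as a digital sound effect, which is a naming convention rather than something the file format enforces. The function name `sfx_bytes` is made up for the example.

```python
import struct


def sfx_bytes(wad: bytes, prefix: bytes = b"DS") -> int:
    """Sum the sizes of all lumps whose name starts with `prefix`."""
    # Header: 4-byte magic, lump count, offset of the directory.
    magic, numlumps, diroffset = struct.unpack_from("<4sii", wad, 0)
    if magic not in (b"IWAD", b"PWAD"):
        raise ValueError("not a WAD file")
    total = 0
    for i in range(numlumps):
        # Directory entry: lump offset, lump size, 8-byte NUL-padded name.
        _filepos, size, name = struct.unpack_from("<ii8s", wad, diroffset + 16 * i)
        if name.rstrip(b"\0").upper().startswith(prefix):
            total += size
    return total


if __name__ == "__main__":
    with open("doom2.wad", "rb") as f:
        print(sfx_bytes(f.read()))
```

Repointing every sound entry at one lump, as moan.py and sine.py do, is the same directory walk with the offset and size fields rewritten instead of summed.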
Doom 1.2 shareware
M_Random | pitch input | 1.2 | chocolate-doom |
---|---|---|---|
255 | 113 | 1.06 | 1.117 |
247 | 121 | 1.029 | 1.054 |
113 | 127 | 1.005 | 1.007 |
16 | 128 | 1 | 1 |
141 | 131 | 0.971 | 0.9765 |
135 | 137 | 0.928 | 0.9296 |
0 | 144 | 0.883 | 0.875 |
The red line ("c-d") was the behaviour of my patch against Chocolate-Doom at the time of testing, and the orange line ("exp") was an improved algorithm I was testing. I eventually settled on that algorithm, but you can see there's a little room for improvement.
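The numbers in the Doom 1.2 table above can be reproduced with two small formulas. Both are fitted to the table rows rather than copied from any source code, so treat them as a reading of the data: the "pitch input" column matches `128 + 16 - (M_Random & 31)`, and the "chocolate-doom" column matches a straight-line mapping `(256 - pitch) / 128`. The function names are made up for the example.

```python
NORM_PITCH = 128  # the pitch value at which the table shows a ratio of 1


def doom_pitch(rnd: int) -> int:
    """Pitch input for a given M_Random value (fitted to the table)."""
    return NORM_PITCH + 16 - (rnd & 31)


def linear_ratio(pitch: int) -> float:
    """Straight-line ratio matching the 'chocolate-doom' column (fitted)."""
    return (2 * NORM_PITCH - pitch) / NORM_PITCH


if __name__ == "__main__":
    for rnd in (255, 247, 113, 16, 141, 135, 0):
        pitch = doom_pitch(rnd)
        print(rnd, pitch, round(linear_ratio(pitch), 4))
```

The measured 1.2 column sits closer to 1 than this straight line at both extremes, which is the gap visible in the table.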
Heretic 1.3
Heretic has 15 distinct pitch values, and calls M_Random twice when generating them. This made things tricky with the approach I was taking, because I couldn't just fix the RNG to one value: I had to write 'stripes' of values across it, instead. I ended up writing scripts to make this easier:
- printrand.py: opens a supplied file, expecting heretic.exe version 1.3, prints out the RNG table
- rand.py: expects two numbers. Opens heretic.exe, expecting version 1.3, writes out haxetic.exe, replacing the RNG table with the numbers supplied repeated
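The 'stripes' idea can be sketched like this. It is not rand.py itself: it assumes the RNG table is 256 one-byte entries stored contiguously in the executable, and it leaves finding the table's file offset to the caller, since that depends on the exact EXE. `striped_table` and `patch_table` are names made up for the example.

```python
TABLE_SIZE = 256  # assumption: 256 one-byte entries, stored contiguously


def striped_table(a: int, b: int) -> bytes:
    """Build a table that alternates a, b, a, b, ... for its whole length."""
    return bytes([a, b] * (TABLE_SIZE // 2))


def patch_table(exe: bytes, offset: int, table: bytes) -> bytes:
    """Return a copy of `exe` with the table at `offset` replaced."""
    if len(table) != TABLE_SIZE:
        raise ValueError("table must be exactly 256 bytes")
    if offset < 0 or offset + TABLE_SIZE > len(exe):
        raise ValueError("offset out of range")
    return exe[:offset] + table + exe[offset + TABLE_SIZE:]
```

With the table striped, two consecutive M_Random calls always return the supplied pair (in one order or the other, depending on where the index starts), so the pair of values feeding a pitch calculation is known in advance.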
rndindex | pitch input | chocolate-doom | 1.3 |
---|---|---|---|
82 | 120 | 1.059 | 1.024 |
116 | 121 | 1.052 | |
0 | 122 | 1.044 | 1.011 |
125 | 123 | 1.036 | |
13 | 124 | | |
11 | 125 | | |
100 | 126 | | |
102 | 127 | 1.005 | 0.98 |
1 | 128 | 1 | |
119 | 129 | | |
117 | 130 | | |
132 | 131 | 0.974 | |
111 | 132 | | |
105 | 133 | | |
16 | 134 | 0.95 | 0.969 |
Compared to the Doom shifting, we're quite a lot further out for Heretic, but it's still roughly right, and sounds OK in-game.
Pitch shifting code
I guessed pretty early that the Doom/DMX code wasn't doing a "proper" pitch-shift, but was stretching or squashing the sample to change the length and pitch. My initial stabs therefore resized the sound effect by the same ratio as the pitch value against the "norm". This turned out to be inverted, which makes some kind of sense: a higher pitch value results in a higher pitch, which is a shorter play time, and shorter buffer.
The solution we went with was a variation on my very first hack: map source to destination buffer cells based on their percentage offset from the start of the buffer. In short, just copying a subset of samples over, or doubling up some to make up the required buffer length, but not modifying the samples in any way.
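The mapping described above amounts to nearest-neighbour resampling. A minimal sketch follows, written in Python for brevity rather than the C used in Chocolate-Doom. `resample_nearest` is a name made up for the example, and how the destination length is derived from the pitch value is left to the caller.

```python
def resample_nearest(src: bytes, dst_len: int) -> bytes:
    """Stretch or squash `src` to `dst_len` samples without altering any sample.

    Each destination cell copies the source cell at the same proportional
    offset from the start of the buffer.
    """
    if not src or dst_len <= 0:
        return b""
    return bytes(src[i * len(src) // dst_len] for i in range(dst_len))
```

For example, `resample_nearest(bytes([0, 10, 20, 30]), 2)` keeps every other sample (pitch up), while a destination length of 8 doubles each sample up (pitch down).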
Interpolation
Proper resampling is more involved. I wrote a slight improvement on the above which did interpolation of cells in the 'pitch-up' case: every source sample is mixed into the output buffer, depending on what cells it would contribute to. Where multiple source samples map to the same destination cell, they are averaged with an even weighting. A further improvement again would be to have a non-even weighting, based on how close the cells matched up.
- moan.c: my higher quality decimator, not used in the end
I actually thought this sounded better for the 8 bit 11kHz samples from Doom, but when I tried reworking it to support higher quality samples, I got a lot of noise, and would have had to write a low-pass filter. Rather than sort it out, we just used the first iteration, which sounded good enough and was fast to perform frequently at runtime.
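The even-weighted averaging for the 'pitch-up' case can be sketched as below. This is an illustration of the idea in Python, not a port of moan.c, and `decimate_average` is a name made up for the example. It only handles shrinking the buffer.

```python
def decimate_average(src: bytes, dst_len: int) -> bytes:
    """Shrink `src` to `dst_len` samples, averaging the samples in each cell."""
    if dst_len <= 0 or dst_len > len(src):
        raise ValueError("dst_len must be between 1 and len(src)")
    sums = [0] * dst_len
    counts = [0] * dst_len
    for i, sample in enumerate(src):
        # Same proportional mapping as the simple copy, but accumulate
        # every source sample instead of dropping the ones that collide.
        j = i * dst_len // len(src)
        sums[j] += sample
        counts[j] += 1
    # Because dst_len <= len(src), every destination cell receives at
    # least one source sample, so no count is zero.
    return bytes(total // count for total, count in zip(sums, counts))
```

A non-even weighting would replace the plain count with weights based on how much each source sample overlaps the destination cell.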
Memory management
Chocolate-Doom had a sound-effect caching scheme in place before I started adding pitch-shifting. Every in-game sound effect is pre-cached at game start-up and stored in a priority-list. When a sound effect is played, it's promoted to the top of the list. If the list reaches a certain size, sounds are purged from the bottom of the list.
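The caching scheme described above behaves like a least-recently-used list. Here is a minimal sketch in Python; it is not the Chocolate-Doom code, and the class name, method names and the choice to measure the threshold in bytes are all assumptions made for the example.

```python
from collections import OrderedDict


class SoundCache:
    """Toy priority list: most recently played sounds survive longest."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes      # assumption: threshold is in bytes
        self.total = 0
        self.entries = OrderedDict()    # oldest first, newest last

    def add(self, name: str, data: bytes) -> None:
        if name in self.entries:
            self.total -= len(self.entries.pop(name))
        self.entries[name] = data
        self.total += len(data)
        self._purge()

    def play(self, name: str):
        """Return the cached data and promote it, or None if not cached."""
        data = self.entries.get(name)
        if data is not None:
            self.entries.move_to_end(name)
        return data

    def _purge(self) -> None:
        # Drop sounds from the bottom of the list until under the threshold,
        # always keeping the entry that was just added.
        while self.total > self.max_bytes and len(self.entries) > 1:
            _, old = self.entries.popitem(last=False)
            self.total -= len(old)
```

In this model, caching every pitched variant multiplies the number of entries per sound, which is why memory grows quickly when the threshold is set high.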
My first attempt at adding pitch-shifting inserted the shifted sound effects into the priority list. The trouble with this was the threshold for throwing out sound effects was set pretty high, and memory usage grew quite a lot with the extra sound effects. Here's the memory usage for Doom's DEMO1 and DOOM2's DEMO3, with the caching. The lines represent different game sample rates. p0 means shifting disabled, p1 means enabled.
In the end we decided to just not cache pitched sound effects. They are re-calculated every time they are needed. The red and blue lines on these graphs correspond to the earlier ones, the green lines are pitch shifting on with no caching.
Before we decided to just not cache, I was planning to tweak the purging algorithm to throw out pitched sounds first, and possibly set a second, lower waterline for pitched sounds. I recorded a long-ish session of Doom 2 and compared the memory usage of pitch on and cache versus pitch off, to see if it 'topped out'. I think this is inconclusive (play time was approximately 15 minutes).