DSP Time in Unity

I’ve used Unity for a long time now, but until I worked on Shapesong I had never had to worry about exact audio timing. For example, it had never mattered that two samples played exactly simultaneously, or that an audio clip needed to end at a certain time.

In the Shapesong prototype there is a section that involves a player-directed song. At first, a solitary piano loop plays, but as the player progresses, more layers of loops are added on top, building to a finale. It’s important that the loops stay in sync; otherwise the time difference is distractingly noticeable to the player, even if it is only a few milliseconds. This led me to research audio-specific timing in Unity, and there are some interesting quirks in how Unity handles things that I wish I had known going in.

DSP Time and Scheduling

DSP time in Unity is the sample-accurate time, in seconds, since the game started. (One exception: in the editor it seems to keep running even when the game is not being played.) The AudioSettings.dspTime value is important because it is the input to the AudioSource’s timing-based functions: AudioSource.PlayScheduled() and AudioSource.SetScheduledEndTime().
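To make the relationship concrete, here is a minimal sketch of stitching two clips together with PlayScheduled(). The field names and the one-second lead time are my own choices for illustration, not anything prescribed by Unity or taken from the Shapesong code.

```csharp
using UnityEngine;

public class StitchExample : MonoBehaviour
{
    public AudioSource sourceA; // plays the first clip
    public AudioSource sourceB; // plays the second clip

    void Start()
    {
        // Schedule the first clip slightly in the future so the
        // audio system has time to buffer it (see the next section).
        double startTime = AudioSettings.dspTime + 1.0;
        sourceA.PlayScheduled(startTime);

        // Use the clip's sample count rather than AudioClip.length
        // to get a sample-accurate duration.
        double clipALength = (double)sourceA.clip.samples / sourceA.clip.frequency;

        // Start the second clip exactly when the first one ends.
        sourceB.PlayScheduled(startTime + clipALength);
    }
}
```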

Here’s a quote from the AudioSource.PlayScheduled() documentation:

“This is the preferred way to stitch AudioClips in music players because it is independent of the frame rate and gives the audio system enough time to prepare the playback of the sound to fetch it from media where the opening and buffering takes a lot of time (streams) without causing sudden CPU spikes.”

AudioSource.PlayScheduled() is great for synchronizing your audio, except that the documentation doesn’t mention that you actually need to give the audio system enough time to buffer the audio clip! If you want your clip to start at the moment you call AudioSource.PlayScheduled(), it won’t magically buffer instantly. You may schedule the audio to play right away, but it won’t actually start until a few frames later.

Frustratingly, the buffer time depends on a number of factors: the hardware you’re running the game on, the size of the sample, and so on. For Shapesong I hacked in an “arbitrary wait time” that was tacked on any time a loop needed to start immediately. For the player it wasn’t noticeable (it only needed to be a couple hundred milliseconds, but our samples were small), and it ensured that every time I called AudioSource.PlayScheduled(), the clip would start exactly when I wanted it to.
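A sketch of that workaround, assuming the details above: instead of scheduling at the current dspTime, pad it with a small lead so the clip has time to buffer. The 0.2 second default is illustrative; the right value depends on your hardware and sample sizes.

```csharp
using UnityEngine;

public class ImmediatePlayExample : MonoBehaviour
{
    public AudioSource source;

    // Headroom, in seconds, added before playback so the clip can buffer.
    public double scheduleLead = 0.2;

    // "Play right away" actually means "play a couple hundred ms from now".
    public void PlayNow()
    {
        source.PlayScheduled(AudioSettings.dspTime + scheduleLead);
    }
}
```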

Starting a New Layer in the Middle of a Loop

During the guided song, the player has control of when new layers are started. This means that sometimes a new layer needs to be introduced right away instead of at the start of the next segment. To further complicate this problem, not all of the loop samples were the same length. Some were a full measure, others were a half or quarter measure.

To solve this problem, I kept track of the current “measure time” as a normalized value between 0 and 1. When it came time to play a clip that was out of phase with the measure, I would first take the ratio of the clip length to the measure length. Then I would divide the current measure time by that ratio to get the normalized position within the clip to seek to. For example, if the clip was half a measure long and the current measure time was 0.25, the seek value would need to be 0.5. The result was taken modulo 1 to keep it normalized.
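Here is a sketch of that calculation, assuming the measure time is tracked elsewhere and passed in. The names and the use of timeSamples to do the seek are my own; the post only describes the math.

```csharp
using UnityEngine;

public static class LoopSeek
{
    // measureTime is the current position in the measure, normalized to [0, 1).
    public static void PlayInPhase(AudioSource source, double measureTime, double measureLength)
    {
        double clipLength = (double)source.clip.samples / source.clip.frequency;

        // Ratio of clip length to measure length, e.g. 0.5 for a half-measure loop.
        double ratio = clipLength / measureLength;

        // Normalized position within the clip, wrapped back into [0, 1).
        // Example from the text: measureTime 0.25 with ratio 0.5 gives 0.5.
        double normalizedSeek = (measureTime / ratio) % 1.0;

        // Seek by sample for accuracy, then start playback.
        source.timeSamples = (int)(normalizedSeek * source.clip.samples);
        source.Play();
    }
}
```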

Conclusion

It was fun to work with the Unity audio system in more depth. While synchronized audio can be annoying to work with, Unity does a great job overall of handling most of the heavy lifting. As far as audio goes, I’m just scratching the surface; there are really interesting things you can do, especially in the area of procedurally generating sounds in the OnAudioFilterRead() function. Perhaps that will be the topic of a future post!