5: Mixing Functionality for the Interactive Diegesis
When establishing theoretical solutions to audio mixing problems, it is important to take into account the functionality and capabilities of the tools available to achieve the desired results.
The two leading third-party audio implementation engines, Firelight Technologies’ FMOD and Audiokinetic’s Wwise, lead the field in game audio middleware, each offering an increasing array of functionality and creative options to the designer through an integrated single-project system, not unlike mixing a record in a DAW. This allows the sound designer to organize all the sounds that contribute to the game as well as, in the case of Wwise, the motion feedback controls relating to physical feedback (rumble) on hand-held devices. How these audio assets behave and are processed in accordance with in-game parameters and triggers can all be set using the various interfaces. Using Codemasters’ ‘Operation Flashpoint’, a first-person shooter/open-world combat simulator, as a template, the following dissects their respective creative potential in relation to enabling the application of theoretical devices.
One of the most useful functions of a mixing desk, whether a physical analogue console or one contained within a Digital Audio Workstation, is the ability to group channels of sound together and apply parameters across the resultant output. If we look at these sound design projects’ mixing set-ups we can see that concurrent streams of audio have been allocated to a sympathetic mixer group fader, which are in turn sub-groups of wider generic group faders. [fig 4] Of course the graphical mixer offers a clarity that the sound engineer can relate to from his or her physical studio or live work experience.
However, the digital, ‘in the box’ realm offers approaches to mixing that do not necessarily require the image of a mixing desk.
For example, relative audio element levels in the figure 4 project were set in the edit screen of Cubase. Rather than automating a whole channel, volumes could be adjusted by selecting the audio block and altering its gain against the level of the other concurrent elements. When running a sound design project this approach could be seen as more appropriate. Also, channel automation is often drawn into the edit window rather than played out on a fader. It could be argued that a project of this type could easily be completed without the use of a graphical virtual mixing console, and as such the absence of such a window, until recently, in leading middleware audio engines should make sense to the audio designer. Wwise does now feature a mixing desk for simulation purposes in capture sessions, though it is actually the Mixer States functionality that provides snapshots at run time. Firelight Technologies also plan to launch an interactive mixer for FMOD that looks graphically like a mixing desk; this had not been released at the time of writing, but it was featured at the last GDC conference, so release should be imminent. [fig 5]
For the purposes of game audio, sounds are triggered according to variable event arguments and parameter fluctuations, and as such events and their associated sound definitions take the place of linear channels in a DAW editor. In FMOD you can create event groups that contain a multitude of associated events. A typical event group for a game such as ‘Operation Flashpoint’ might be, for instance, footsteps. Contained within would be the sound definitions for the player’s footsteps on all surfaces within the game environment. These would be single-shot layered sound instances. In this case we have a layer for the boot hitting the surface itself and three layers of accompanying clothing and gear sounds. Parameter-wise we have static 3D spatialisation, as these sounds are attached to our character and move with us, and a velocity parameter layer that increases volume in relation to the speed of the player moving across the terrain. Variation is provided by each segment of the ‘surface’ layer being attached to seven footsteps, each on a 14.29% play probability. [fig 6]
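The play-probability mechanism just described can be sketched in a few lines of Python. This is an illustrative model of the behavior, not FMOD’s actual implementation, and the file names are hypothetical:

```python
import random

# Seven surface variants, each at 1/7 (about 14.29%) play probability,
# as in the 'surface' layer of the footsteps event.
VARIANTS = ["boot_grass_%02d.wav" % i for i in range(1, 8)]

def pick_footstep(variants, last=None):
    """Pick one variant at equal probability, re-rolling once if we
    drew the same file as last time to soften audible repetition."""
    choice = random.choice(variants)
    if choice == last:
        choice = random.choice(variants)
    return choice

step = pick_footstep(VARIANTS)
```

Each call returns one of the seven files; over many steps the distribution approaches the 14.29% per-variant probability set in the tool.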
However, in the case of more complex events such as ambiences or vehicle movement, we see layering of sounds in channels for simultaneous playback with multiple automated parameters and effects. As a consequence, multiple channels of audio are grouped together in an event Sound Definition that acts as a group channel, just like the footsteps. In FMOD the game engine talks directly to the in-game parameters such as distance and time to articulate game states on an individual event basis, which does not give the sound designer a clear idea of how sounds are working together without loading them into the game. The Wwise Actor-Mixer object works in much the same way, grouping ‘child’ objects together so that they can be associated with a single game object.
In Wwise sounds are grouped and routed through busses, much like the use of group and buss functionality in a DAW, and split into three hierarchies: the ‘Master-Mixer’ hierarchy at the top, equivalent to group faders and master output channels; the ‘Actor-Mixer’ hierarchy below that, which in this example contains some ambient distant combat sound sub-groups and organizes assets using a variety of ‘containers’; and a dedicated ‘Music’ hierarchy, which does the same job as the ‘Actor-Mixer’ hierarchy though with dedicated music functionality.
The Master-Mixer hierarchy allows us to route the audio assets of our game through various categories and apply effects to them at a buss level. This is particularly effective for applying environmental immersive effects across a range of sounds that need to gel. In this way we could emulate Walter Murch’s layering for ‘Apocalypse Now’, or something along those lines. Hence we could have six sub-mixes split into Vehicles, Music, Weapons, Explosions, Footsteps and character movement sounds, and finally Ambiences and additional foley. Being a game, we would of course need a final group containing front-end ontological interface sounds.
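How group-level attenuation cascades through such a buss hierarchy can be sketched as a walk up a routing tree. The buss names follow the sub-mixes proposed above, but the gain values are arbitrary illustrations, and this is a simplified model rather than Wwise’s actual mixing graph:

```python
# Child -> parent routing for a fragment of the proposed hierarchy.
PARENT = {"Vehicles": "Master", "Weapons": "Master", "Rifle": "Weapons"}

# Linear gain set on each buss fader (1.0 = unity).
GAIN = {"Master": 1.0, "Weapons": 0.5, "Rifle": 0.8, "Vehicles": 1.0}

def effective_gain(bus):
    """Walk up the buss tree, multiplying each fader's gain, to find
    the level actually applied to a sound routed through `bus`."""
    g = GAIN.get(bus, 1.0)
    while bus in PARENT:
        bus = PARENT[bus]
        g *= GAIN.get(bus, 1.0)
    return g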
The ‘containers’ referred to earlier mark a departure from comparisons with linear audio systems in that they concern themselves with the interactive and adaptive qualities of the gaming medium. For example, you would not have a device in film that randomly triggers different variations of a weapon sound every time the movie was watched, nor a sound designer who, unable to decide in post-production which sound to use, randomized channel activation. What we do have here are devices that can randomly pick from a number of similar but not identical assets, and devices that blend and layer sounds and transition between them in accordance with changing game states. These devices are interfaced with through containers that can be chained together to accomplish desired results. In the example below a ‘random’ container picks between several ‘blend’ containers which contain the layered sounds of a distant conflagration. In turn, the ‘blend’ containers are each connected to a different ‘random’ container that picks between some distant explosions to intersperse the flatter ambiences in the ‘blend’ container.
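This chaining of containers can be modeled as a small tree walk: a ‘random’ container resolves to one of its children, while a ‘blend’ container resolves to all of them at once. A sketch of the behavior only, not Audiokinetic’s implementation, with hypothetical file names:

```python
import random

def resolve(node):
    """Flatten a container tree into the list of files to start now:
    a 'random' container picks one child, a 'blend' container plays
    all of its children simultaneously."""
    kind, payload = node
    if kind == "wav":
        return [payload]
    if kind == "random":
        return resolve(random.choice(payload))
    if kind == "blend":
        files = []
        for child in payload:
            files.extend(resolve(child))
        return files
    raise ValueError("unknown container type: %s" % kind)

# Random container picking between two blends of distant combat,
# each blend pairing a flat ambience with its own random explosion.
distant_combat = ("random", [
    ("blend", [("wav", "amb_rumble_a.wav"),
               ("random", [("wav", "boom_far_01.wav"),
                           ("wav", "boom_far_02.wav")])]),
    ("blend", [("wav", "amb_rumble_b.wav"),
               ("random", [("wav", "boom_far_03.wav"),
                           ("wav", "boom_far_04.wav")])]),
])

playing = resolve(distant_combat)
```

Every resolution yields one ambience bed plus one of its associated distant explosions, mirroring the structure described above.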
Below we can see the original FMOD layering of sounds that make up one of the randomized ambient events utilizing Sound Definition layers. [fig 9] This has been successfully replicated in Wwise using blend containers.
4.2 SoundSeed: Solving Repetition, Saving RAM and Aiding Environmental Immersivity
Wwise’s SoundSeed sets out to solve one of the most persistent challenges in immersive diegetic game audio: repetition. Repetition of audio assets in gameplay can often run in opposition to the game designer’s desire to offer immersion in the diegesis occupied by the player’s avatar. We, in our experience as human beings, are instinctively aware that each footstep we make sounds different from the last, and that if we clash two pieces of metal together, for instance a sword-on-shield clash in a game, each hit we make in the real world sounds perceivably different. As Raymond Biggs of Firelight Technologies notes when talking about new developments for FMOD, “One unique aspect of game audio that sound designers and composers have to take into account and try to avoid is listener fatigue caused by repetition”. Nevertheless, the game audio designer has always been forced to compromise on this by three integral factors, all focused around expense. First is the expense of runtime memory, compounded by the issue that audio is competing for processor resources with all the other elements of the game: “For example, a reverb effect of the quality of Waves’ IR1 convolution reverb would blow the entire audio CPU budget for most games – and that’s just for one effect.” In addition, there is the expense of media storage (space on the DVD, cartridge et al.) and of paying foley artists to create, in most games, many times the amount of audio that would be needed for an equivalent film.
SoundSeed exists as an additional module that supplies data for real-time re-synthesis of audio assets, creating near-infinite variations of the same sword clash, for instance. SoundSeed Impact, illustrated here, consists of two components: a standalone impact modeler and an effect plug-in which integrates the technology into the main Wwise system. In addition, one of its main assets is that by storing only residual noise and not resonance, file sizes for individual impact assets are radically reduced.
“So far it looks like SoundSeed can make a lot of difference for us. It may actually add a new layer of creativity, allowing for a new type of sound design – a design that happens within the tool itself. This will benefit everyone, especially the end user who will experience more variety. If we didn’t have to worry about memory usage we probably wouldn’t have noticed SoundSeed in the first place. Its main advantage for us is technical and concerns the use of RAM.”  “For example, we can use only one gun sound waveform for a particular type of gun instead of five to ten. Multiply this with all the sounds where this is possible and we may be saving huge amounts of precious RAM. It looks like SoundSeed can save us some considerable time and money once we get it up and running in EVE, and our sound design methods are geared towards the use of it.” 
SoundSeed effectively separates the impact noise from the vibrations that occur following the contact of swords, footsteps or any other impact, the follow-on vibrations being the element of the sound that takes up by far the most space in the audio file. SoundSeed analyzes the resonant portion of the sound and extracts it from the impact noise content, leaving a residual file. The data from the resonant element is saved as a text file containing the frequency, magnitude and bandwidth of each detected resonant mode. At runtime, in the game, the engine therefore only has to call on the audio file containing the noise; the resonant content is re-synthesized by the SoundSeed plug-in using the data extracted offline in the modeler, which is only a text file.
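The resynthesis step can be illustrated with a textbook modal-synthesis sketch: each stored mode (frequency, magnitude, bandwidth) becomes a damped sinusoid added back onto the residual noise. This is a simplified illustration of the principle, not Audiokinetic’s actual algorithm:

```python
import math

def resynthesize(residual, modes, sample_rate=44100):
    """Add resonant modes back onto the stored residual noise.
    Each mode is (freq_hz, magnitude, bandwidth_hz); a wider
    bandwidth means a faster exponential decay of that mode."""
    out = list(residual)
    for freq, mag, bandwidth in modes:
        decay = math.pi * bandwidth  # decay rate from bandwidth
        for n in range(len(out)):
            t = n / sample_rate
            out[n] += mag * math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
    return out

# A short (silent, for clarity) 'residual' buffer plus two detected modes.
residual = [0.0] * 64
modes = [(440.0, 0.5, 8.0), (1320.0, 0.2, 25.0)]
samples = resynthesize(residual, modes)
```

Because only the tiny text file of modes varies between playbacks, perturbing the frequencies or magnitudes before resynthesis yields a different-sounding impact from the same stored residual.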
SoundSeed has thus gone some way towards solving the RAM and storage issues in terms of file size; but, as with the gun sound from ‘EVE’, by modifying the frequency, magnitude and bandwidth properties recorded in the text file, SoundSeed can also generate a near-infinite number of variations of that sound, as demonstrated here.
6.4 Vehicle Simulations in FMOD and Wwise
A particular challenge for mixing immersive diegetic audio in games is the replication of moving vehicles, particularly those piloted by the player. It could be said with some certainty that the average human being is instinctively aware of the behavior of sound in relation to engine-powered vehicles: pitch rising as the engine speeds up, gear shifts, the discrepancy between velocity and engine revolutions per minute, and so on. But also, if interactive, those sounds act as kinetic devices, reacting to the player’s control of the vehicle and providing feedback on the consequences. As you will see from these examples, this requires several concurrent layers of WAV files, each subject to, in the case of this simulated vehicle, five parameter layers, with several signal processors acting independently according to each parameter’s position. Most of the signal processing consists of simple volume curves across sequentially crossfaded samples, with a pitch shift applied only to the engine whine layer, the other pitch effects being hard-baked into the individual samples.
Both packages feature particular functionality designed to make vehicle simulation easier. FMOD’s Engine Designer and Wwise’s CarSim essentially allow you to simulate how your mix should sound once integrated into the game. The FMOD system links a slider labeled RPM to the ‘RPM’ primary parameter in the Event Editor, with which you can speed up or slow down the engine.
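The crossfading of sequential engine samples driven by the RPM parameter can be sketched as overlapping triangular volume windows. The band ranges below are illustrative; in the actual project the curves are drawn by hand in the editor:

```python
def layer_gains(rpm, bands):
    """Return a volume gain per engine sample, where `bands` lists an
    overlapping (low, high) RPM range for each sample. A triangular
    window peaking at mid-band makes adjacent samples crossfade."""
    gains = []
    for low, high in bands:
        if rpm <= low or rpm >= high:
            gains.append(0.0)
        else:
            mid = (low + high) / 2.0
            gains.append(1.0 - abs(rpm - mid) / (mid - low))
    return gains

# Three idle/mid/high samples with 50% overlap between neighbours.
BANDS = [(0, 2000), (1000, 3000), (2000, 4000)]
gains = layer_gains(1500, BANDS)
```

At 1500 RPM the idle and mid samples each play at half volume while the high sample is silent; with 50% overlap the gains always sum to unity, so the overall engine level stays constant through the crossfade.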
You are also able to set up the load fluctuations associated with engine acceleration by using the sliders in the Engine Settings panel.
A key effect for effective diegetic immersion when dealing with vehicles operating in a three-dimensional space is spatialisation processing, achieved here through the use of a 3D Pan Level parameter.
Again, like the Engine Designer, FMOD offers quick simulation of the programmed 3D parameter attenuation using the 3D Audition tool. In Wwise it is a fairly simple process to derive a very similar engine system by creating a ‘blend’ container, which of course can contain the five layers in the same way as the FMOD Event Editor. We can then set volume and pitch parameters as well as crossfade samples in the same way as the FMOD example, as shown here, and demo them using the transport bar, which is set up with separate Engine Load and RPM controls. [clip 19]
3D spatialisation can likewise be handled and simulated using the Attenuation editor, as seen below. This can of course be applied to individual sound layers. [fig 14]
6.5 Side-Chaining To Achieve Clarity
In sound design for film, where the picture is fixed, it is a fairly simple process to set the automation of levels for your sound content according to the needs of effective communication of essential narrative and immersive function. In interactive games this is obviously not the case. If we look back at Walter Murch’s battle scene in Apocalypse Now and revisit the shot of the soldier yelling “I’m not going, I’m not going”, we can hear that Murch has backed off the other sounds in the mix so that the soldier’s voice is clearly audible. Knowing when a piece of dialogue such as this might be triggered in a game can be hard to predict; if triggered kinetically, it would be impossible in more open games. Hence a solution has been incorporated into Wwise that has been borrowed from music production techniques. Side-chaining effectively allows the amplitude of one signal to work as an argument that manipulates another audio channel. A typical use is to lower the volume of the bass guitar slightly when the kick is played, to give an impression of tightness and accentuate the kick drum. Similarly, it is often used with vocals and competing instruments in the same frequency range. The key is that this technique works automatically, making it a very attractive tool for the game sound designer. As in interactive scenarios we cannot accurately predict where we need to ‘duck’ sounds, a system that prioritizes sounds automatically could arguably solve a lot of mixing problems. Of course, other sound engines including FMOD have a sound prioritization system that allows the volumes of audio assets to be dropped for pre-determined lengths of time according to game events. However, by utilizing Wwise’s side-chaining functionality, transients are taken account of, and automatic mixing should sound much more fluid with the faster automated response times that this functionality provides.
The following illustrates a proposed application of this system where dialogue could be used to temporarily lower the volume of the players weapon.
For this exercise we have two busses: one dealing with weapon sounds and one with the game dialogue. For there to be communication between the two busses, we can assign a game parameter that represents a feasible dynamic range.
As we wish, in this case, to ensure that the dialogue is audible in preference to the sound of the weapon firing, we need the engine to monitor the level of the output signal of the former in order for it to affect the latter. We can do this by inserting Wwise’s ‘Meter’ effect into the dialogue buss. We then provide an RTPC volume curve, responsive to our dynamic range parameter, in the ‘Weapon’ property editor, the ‘X’ axis being the decibel reading and the ‘Y’ axis being volume. [clip 20] [fig 15]
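The RTPC curve described here amounts to a simple mapping from the metered dialogue level to a weapon-buss volume offset. A minimal sketch, with threshold and depth values chosen purely for illustration:

```python
def duck_gain_db(dialogue_db, threshold_db=-40.0, max_cut_db=-9.0):
    """Map the Meter reading on the dialogue buss (X axis, dBFS) to a
    volume offset on the weapons buss (Y axis, dB). No attenuation
    below the threshold, ramping linearly to max_cut_db at 0 dBFS."""
    if dialogue_db <= threshold_db:
        return 0.0
    fraction = (dialogue_db - threshold_db) / (0.0 - threshold_db)
    return max_cut_db * min(fraction, 1.0)
```

A subtler mix would simply use a smaller max_cut_db, or a gentler slope, so that the ducking stays below the player’s threshold of noticing the engineering.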
As a result of this relatively simple process we now hear that, as dialogue between the team kicks in, the sounds of the weapons are appropriately attenuated. In terms of creating an immersive audio environment it may seem that we are pulling away from realism by implementing such engineering characteristics; it could be argued that in a fierce firefight we should barely be able to hear our team mates, and so we are negating our commitment to realism. However, this would of course depend on the degree of attenuation offered by our volume curve. The human hearing system naturally attenuates and filters frequencies so that the listener can focus on what he or she is concentrating on, hence your ability to follow a conversation in a busy bar with many competing loud sources of sound. We should probably expect the same cognitive response in the situation of a firefight, and hence we should be able to conceive a subtler volume curve in the side-chain that better allows us to hear essential communication whilst not dropping the sounds of weapon fire to the extent that the player is aware of the engineering.
It is of course true that you could use side-chaining to affect any RTPC parameter of your choosing: for example, preserving assets rather than cutting them outright by applying equalization only across the speech intelligibility frequency range, cutting those frequencies to make room for speech. Or perhaps an alien presence might affect communications equipment, increasing distortion in relation to proximity.
6.6 Dynamic Mixing Solution Using States In Wwise
In Wwise you can also articulate dynamic mixing solutions for an array of audio objects simultaneously using States. The State Group Property Editor can be used via busses to affect either individual or whole categories of sounds dependent on the arguments generated by the game, such as fight intensity for example. These snapshots allow combined automation of four properties: pitch, a low-pass filter, volume and LFE level, LFE being the low frequency effects channel featured on surround systems. This means the designer could choose to boost or cut audio content under 120Hz without altering the level of the rest of the audio content in the other speakers. If applied to the battle ambience, for example, we could have states of light, medium, intense and conflagration, and create snapshots and transition times to represent these, perhaps by simply using the volume and LFE controls [clip 21]. If we wanted to articulate an impression of the heaviness of the weapons utilized, we could include pitch shifting alongside the LFE and have snapshots set according to game states of light arms, medium arms and heavy arms.
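The behavior of such a state transition can be sketched as linear interpolation between two property snapshots over the transition time. The property values are hypothetical, and Wwise’s actual interpolation curves are configurable, so this is illustrative only:

```python
def blend_states(a, b, t):
    """Interpolate between two state snapshots at fraction t (0..1)
    of the transition time; each snapshot automates the same four
    properties (volume, pitch, low-pass filter, LFE)."""
    t = max(0.0, min(1.0, t))
    return {key: a[key] + (b[key] - a[key]) * t for key in a}

# Hypothetical snapshots for a battle ambience.
light   = {"volume_db": -6.0, "pitch_cents": 0.0,    "lpf": 0.0,  "lfe_db": -12.0}
intense = {"volume_db":  0.0, "pitch_cents": -200.0, "lpf": 10.0, "lfe_db":   0.0}

halfway = blend_states(light, intense, 0.5)
```

Halfway through the transition every property sits midway between its two snapshot values, so the whole category of sounds moves together rather than each object being automated individually.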
‘States’ occupy an element of the Wwise system referred to as ‘Game Syncs’. These tools are intended to “streamline and handle the changes and alternatives that are part of the game”. They include ‘Switches’ which, as the name suggests, switch sounds or effects dependent on changes in the game. An example might be walking in off the street into a nightclub in ‘Grand Theft Auto 4’, where a change of the reverb settings attached to the player’s footsteps would be required.
6.7 The Wwise Soundcaster and Mixer
In order to audition effects such as the above, Wwise supports a simulation system that allows the sound designer to generate mixes of possible combinations of activated assets without having to wait for access to the game build or for the audio programmer to integrate them. This seems to be what Firelight Technologies might be trying to catch up on with their forthcoming equivalent. Soundcaster allows you to drop any container, switch, effect, sound or buss into its workspace and play them in any combination. If running through the mixer, as in the following demonstration, you can see and adjust levels first in the mixer, which has no bearing on actual output to the game, and then in the objects themselves once an appropriate mixing solution, such as a volume level, has been found. [clip 22] [fig 16]
Firelight Technologies’ forthcoming FMOD Studio seems to offer very similar possibilities, though it seems likely that the mixer will become part of the engine rather than a simulation device: “FMOD Studio is designed specifically for creating sound effects and music that have variation and respond dynamically to in-game action”. It would seem that FMOD Studio might take on the role of Wwise’s ‘Master-Mixer’ buss system, integrating the music system with the rest of the Designer so that the designer can hear all of the elements simultaneously.
6.8 Effects and Signal Processing
Referring back to the fear emitters utilized in Dead Space, we can see how adaptive mixing of effects can be used for narrative purposes as well as for achieving relative immersive realism. On lower ‘fear’ settings the reverb DSP is quite ‘wet’, giving a big resonant emptiness to the sound world; however, as the fear level rises and new sounds appear across the diegesis, the reverb dries up in proportion in order to make room for them. [clip 23]
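This relationship can be sketched as a simple inverse mapping from the fear parameter to the reverb’s wet level. The endpoint values below are illustrative, not taken from Dead Space’s actual mix:

```python
def reverb_wet(fear, wet_at_calm=0.8, wet_at_peak=0.2):
    """Map the game's fear parameter (0.0 calm .. 1.0 peak) to a
    reverb wet level: the mix dries up as fear, and the density of
    competing sounds, rises."""
    fear = max(0.0, min(1.0, fear))
    return wet_at_calm + (wet_at_peak - wet_at_calm) * fear
```

At low fear the big, wet reverb supplies the resonant emptiness; as fear ramps up, the falling wet level clears spectral space for the newly triggered sounds.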
This is important when reflecting on the frequency make-up of the Dead Space sound world, which is low-mid and bass dominant. When not well defined, audio content in this area of the frequency spectrum can quickly become muddy when there are multiple sources present. This real-time adaptation of effects is deployed across a broader spectrum of processing, allowing more precise narrative alignment with visual effects: “We were able to look at values inside the visual effect at every frame and say ‘okay, it’s changed slightly, we’re going to change the audio to match exactly what it’s doing’.”