A cookbook for 360° sound – what I’ve learned, with lots of links.

When I started working on 360° sound delivery, I thought I’d have to chew through some reasonably coherent technical writing to get a working process. Instead I almost drowned in an ocean of piffle washing over a few islands of isolated first-hand knowledge. My attempt here has been to force those islands together.

It doesn’t help, for example, that Dolby has re-applied the ‘Atmos’ label to a wide range of outcomes, each defined by the incapability of a device. Some Android phones are advertised as providing ‘Atmos’ sound through their tiny speakers, which is patently bullshit. Nevertheless, some listeners are dead certain they’re hearing full surround, and are equally impressed by Dolby’s Atmos demonstrations on YouTube – a platform with no surround capability whatsoever. (YouTube can supply binaural sound, but that’s different, as will be explained.)

You’ll believe a pig can fly…

You might want to cut Dolby out of the whole deal for being snake oil. Most DAWs and video editors have dropped support for Dolby in favour of AAC. But here comes Apple with their AirPods, adding the damn thing back to fit their Apple TV+ ecosystem. It’s all very messy.

And the gatekeepers are in control again. Once upon a time you had to record music in a hired studio, but we’ve enjoyed a period of bedroom production and the glorious tumult that goes with that (but see this guy for determination). That’s not yet happening with surround – both Dolby and Sony have a ‘fuck you’ page online for people wanting access to their production tools. If you don’t have the right speaker array and the right number of staff (seriously – “How many staff do you have?”) you are not going to throw mud pies at their white dress. We need a process that works around their edges.

Terms

Let’s start by defining some terms. Surround should always refer to a multi-speaker system where there is a focal point – a screen, a stage. The listener is orientated in a particular direction. But when there is no dominant direction – just a sphere of sound – you’re dealing with Ambisonics. It seems obvious, but it’s very easy to confuse the two when trying to get sound from A to B. They each have purposes and advantages – for example, most rock bands play from a specific direction.

However, when I create ambisonic soundtracks for YouTube videos, they offer a starting point for the viewer, but there is no correct listening orientation. You are welcome to turn your head at odds to my suggestion. Then again, I’ve recently made a set of surround music mixes for Apple’s AirPods Pro, which are always orientated to the phone screen. Different concepts.

Ambisonic

Ambisonic seems to be the best way to store your music for any future outcome. There’s no patent on it and plenty of free tools that can mix it down to whatever you need – no requirement to pay for some commercial solution. Most DAWs and some video editors now have at least some ambisonic capability. There are plug-in tools that can emphasise sound focus from one angle, so that surround mix design is possible.

One major issue with ambisonic sound is that it’s complex. If you’re not mathematically inclined it can be a terrible barrage of gobbledegook. It’s worth reading the maths just once or twice and then settling for this executive summary: the more channels, the higher the positional fidelity. These channels do not represent speaker positions; they carry phase differences – a kind of rotational hologram. First order (4 channels) has been used for years to capture sounds on a reasonably simple microphone array or dummy head and provide it as a binaural headphone image. Second order (9 channels) is an improvement and third order (16 channels) is detailed. You can keep going higher, and each time get better positional quality.
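If the maths makes your eyes glaze, the one formula worth keeping is the channel count: order n needs (n + 1)² channels. A quick sketch (Python is just my choice for illustration here, nothing the tools require):

```python
# Full-sphere ambisonics of order n carries (n + 1)^2 channels.
def ambisonic_channels(order: int) -> int:
    return (order + 1) ** 2

for order in [1, 2, 3, 7]:
    print(f"order {order}: {ambisonic_channels(order)} channels")
# order 1: 4, order 2: 9, order 3: 16, order 7: 64
```

Those numbers are the 4, 9, 16 and 64 that keep turning up below.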

Every time you share this diagram a kitten dies.

At some point most DAWs will cop out. Cubase can reach 16 channels, the standard Pro Tools only 8 (which you’ll notice isn’t the required 9). Ableton has an Envelop plug-in that promises 16 (but I could never get it to work). Logic doesn’t seem to have any support. But Reaper can handle 64 channels, or 7th order – which is why most academic work is done there. Keep in mind that means each track in your mix would then be 64 times bigger than a mono track!

How many channels do you really need? It could depend on how many sources need to be differentiated in the master. An orchestra might need 64 channels, a singer just 9. Another way to think about it is the distance between speakers required to reproduce the image. If you’re aiming at a standard 5.1 speaker array, you could use only 3rd order ambisonics – so long as you disable the centre speaker and move the surrounds up to create a quad array. But the 5.1 array is so bad at reproducing spherical sound (those front three speakers have a lot of crosstalk) that you’d probably need 4th order to help fill the big gaps either side. Also note that you lose all height information. A cube of speakers needs only 1st order sound. See https://en.wikipedia.org/wiki/Ambisonic_reproduction_systems

To future proof your masters, the more channels the better. There’s some good news about the way the orders relate to each other: you downmix an ambisonic recording simply by removing unwanted channels. If you take a 3rd order recording and keep only the first 9 channels, you end up with the same recording, just with fuzzier positioning. And there are parametric methods that can take fewer channels and extrapolate them into more.
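That truncation step really is as dumb as it sounds – a sketch, using plain lists to stand in for one frame of per-channel samples:

```python
# Downmix by truncation: keep only the first (target_order + 1)^2 channels.
def truncate_order(frame, target_order):
    return frame[:(target_order + 1) ** 2]

third_order = list(range(16))          # stand-in for one 16-channel frame
second_order = truncate_order(third_order, 2)
print(len(second_order))               # 9 - a valid 2nd order frame
```

No resampling, no matrix maths – the lower-order channels are already sitting at the front of the higher-order recording.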

Unless you’re embedded in a particular DAW, Reaper is your home base (I am a Cubase user). You start by learning how to send multichannel sound to the output while monitoring with a binaural converter on a stereo output, so you can hear what you are doing. If you can afford it, invest in a binaural system that does head tracking, so you can swivel the sound around as if it were on speakers. You’re then going to pick up at least two suites of plug-ins – SPARTA and IEM. They are not simple; they are not for the novice. IEM is where you start, until you get to a task where SPARTA can do some difficult thing.

Practice panning before adding any effects. Practice raising some sounds above the head and dropping some below, putting counterpoint sounds behind you and the essentials in front. There are many things to learn.

At the moment you have at least one, maybe two, mass culture outlets for your ambisonic productions. YouTube can host videos with first order sound, where you can rotate the phone to turn the listener’s head. They must be .mp4 videos with four channels of audio, and YouTube needs spatial metadata injected into the file (Google provides a free injector tool). The good news is exactly the same format works locally in VLC, which runs on everything including phones. But most phone owners don’t understand VLC, so you will have to train them. We can hope that one day a music streaming service will take the risk.
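Whatever does your muxing, it’s worth confirming the audio really is four channels before it goes anywhere near a video. Python’s built-in wave module can check a WAV header (the file name below is just an example – here we write a silent 4-channel file first so there’s something to inspect):

```python
import wave

def wav_channels(path):
    # Read just the header and report the channel count.
    with wave.open(path, "rb") as w:
        return w.getnchannels()

# Self-test: write one second of silent 4-channel 16-bit audio at 48 kHz.
with wave.open("ambi_test.wav", "wb") as w:
    w.setnchannels(4)
    w.setsampwidth(2)
    w.setframerate(48000)
    w.writeframes(b"\x00\x00" * 4 * 48000)

print(wav_channels("ambi_test.wav"))  # 4
```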

From Ambisonic to Surround

At this point you may have doubts about headphones, which are rarely the best way to mix sound. You’ll be tempted to downmix to a surround speaker system to get the right quality – so why not work in surround in the first place? And which surround system? 5.1? 7.1? 7.1.2? Or some future format?

RODE’s SoundField application is an easy way to convert from ambisonics to surround, but it can only handle 1st order recordings. IEM and SPARTA are harder to use but more accurate.

I believe that most people don’t listen to music on surround systems. Some might still go to a movie theatre, though that’s less common these days. Others might watch movies with a 5.1 speaker system, or one of those soundbars (which are basically like unwrapped binaural headphones). But seemingly everyone has a smartphone, and often the headphones that go with it. Apple is investing in headphones on the basis that’s where the action will be in future. Sony is in hot pursuit. In the same way you’d once test a mix for AM radio and car sound, you reach the majority of the audience by mixing for a variety of headphones. I’ve picked up the AirPods and the Sony equivalent on the basis that a lot of music is going to be heard this way in future.

My mixing always starts with ‘higher’ order ambisonics. I use a variety of binaural rendering tools straight out of Cubase or Reaper into a variety of headphones. Once I’m happy with that I’ll start making downmix versions. The IEM and SPARTA tools can take an ambisonic image and place virtual microphones at the directions where speakers would sit. You output the sound of these microphones as multiple mono signals – 6 of them for 5.1 surround, 8 for 7.1 and so on. This is the area causing me the most difficulty.
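The virtual-microphone idea is easy to show at first order. A minimal sketch, assuming FuMa-style B-format (W, X, Y, Z, with W carrying its usual 1/√2 gain) and plain virtual cardioids – the real decoders in IEM and SPARTA are far more sophisticated than this:

```python
import math

def virtual_cardioid(w, x, y, z, azimuth_deg, elevation_deg=0.0):
    """One speaker feed: a virtual cardioid aimed at the given direction."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return 0.5 * (math.sqrt(2) * w              # restore FuMa's W gain
                  + x * math.cos(az) * math.cos(el)
                  + y * math.sin(az) * math.cos(el)
                  + z * math.sin(el))

# A source dead ahead, FuMa-encoded: W = 1/sqrt(2), X = 1, Y = Z = 0.
w, x, y, z = 1 / math.sqrt(2), 1.0, 0.0, 0.0

# Quad array: virtual microphones at 45, 135, 225 and 315 degrees.
feeds = [virtual_cardioid(w, x, y, z, a) for a in (45, 135, 225, 315)]
# The front pair (45, 315) comes out louder than the rear pair, as it should.
```

Run per sample across the whole recording, those four feeds are your four mono speaker files.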

5.1 and beyond

Many video tools understand 5.1 surround. Output a WAV with 6 channels and it will drop into almost any software, preserving the L, C, R, LFE, SL, SR speaker positions. Above that there’s not much luck. For example, Adobe Premiere will cut a 7.1 WAV into 5.1 plus a stereo pair (which at least sensibly allows you to downmix). Audition sees 8 unassigned channels; other tools like Apple’s Compressor just see nothing, zilch. There are of course high-end tools that work with 7.1, but they cost. I don’t know much about them. Like I said – gatekeepers. For that reason, if you must use surround, 5.1 is probably the best option for now.
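If you’re assembling that 6-channel WAV yourself, interleaving is all there is to it. A standard-library sketch – with the caveat that channel order conventions differ (the WAV/WAVEFORMATEXTENSIBLE default is FL, FR, FC, LFE, BL, BR), so check what your target software actually expects:

```python
import struct
import wave

def write_6ch_wav(path, channels, rate=48000):
    """Interleave six equal-length lists of 16-bit samples into one WAV."""
    assert len(channels) == 6
    frames = bytearray()
    for frame in zip(*channels):                # one sample from each channel
        for sample in frame:
            frames += struct.pack("<h", sample)  # little-endian 16-bit PCM
    with wave.open(path, "wb") as w:
        w.setnchannels(6)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(bytes(frames))

silence = [0] * 480                              # 10 ms per channel at 48 kHz
write_6ch_wav("mix_5_1.wav", [silence] * 6)      # example file name
```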

The easiest way to get your 5.1 mix to a consumer is to make a video the same length as the audio and then ‘mux’ it (combine it in a video container) with your 6-channel WAV. You can do that in Premiere, probably in Final Cut, or in one of several dedicated ‘muxing’ tools. Now that you have a recognisable surround video, feed it into Handbrake, set to convert the audio to Dolby 5.1. The result plays perfectly on Apple AirPods.
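If you’d rather script the mux than click through Premiere, ffmpeg can do it (assuming ffmpeg is installed; the file names are placeholders). A sketch that only builds the command line – hand the list to subprocess.run when you’re ready:

```python
def mux_command(video_in, wav_in, mp4_out):
    """Build an ffmpeg command muxing a 6-channel WAV under a picture track."""
    return [
        "ffmpeg",
        "-i", video_in,                # input 0: the picture
        "-i", wav_in,                  # input 1: the 6-channel surround WAV
        "-map", "0:v", "-map", "1:a",  # video from input 0, audio from input 1
        "-c:v", "copy",                # leave the picture untouched
        "-c:a", "aac", "-ac", "6",     # encode the audio, keep 6 channels
        mp4_out,
    ]

cmd = mux_command("picture.mp4", "mix_5_1.wav", "surround.mp4")
print(" ".join(cmd))
```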

You’re not likely to work with object-based systems like Dolby Atmos or Sony 360 any time soon. But there are some interesting alternatives which I must admit I’m still figuring out. Most game engines already work in ambisonics and 7.1 and higher – and most of them are free. The Australian sound design software FMOD isn’t really designed for music production – it’s built for interactive music. But it has the positional and animation tools that could lead to a new kind of underground culture – Unity Punk or something like that. But that’s for a later essay – this one has grown top-heavy!
