Recording audio and video for amateurs

During the pandemic, conferences and meetings had to move online. As scientists, we are increasingly asked to pre-record presentations ourselves, which is new ground for many of us. Recently, I have also become dissatisfied with the quality of my laptop’s built-in microphone, even for video calls. The camera is also bad, but that matters less: in a video call, and in a recorded presentation, I will only appear as a small image in a corner. So here are some notes on what I learned about recording videos and how I tried to improve the audio quality. No guarantees of correctness; I am an amateur, too. Everything is done on Linux using free software, which should also be available on Windows and Mac. This post may be updated in the future.

Hardware
Software
Input settings
Recording
Cutting
Loudness normalisation
Export

Hardware and environment

The first principle is “crap in/crap out”, as always. If your audio clips or your signal-to-noise ratio is low, this cannot be fixed in post-processing. Before you even turn on your computer, some things have to be arranged and/or bought.

The biggest issue is the acoustic quality of the room you’re recording in. If you search for “acoustic treatment”, you will find a plethora of options and details, and you will also find that there is no upper limit to the money and effort you can spend. Most of us are not going to turn our offices or living rooms into recording studios, so I will only recommend a simple clap test: clap your hands loudly and listen for the reverb. In a bathroom, for example, you will hear a lot of it. Try to reduce it; carpets might help. Also turn off any unneeded noise source (fan, second computer, phone, office mate).

As I said above, I have not yet looked deeply into improving video quality. Just use your eyes to check whether the lighting is decent. Try to set the scene a bit: it looks bad if you sit in front of a plain white wall, so perhaps have some plants or a bookshelf behind you. Make sure you wear clothes.

Regarding hardware, you should really look into getting a somewhat decent microphone. Again, there is no real upper limit on the price you can pay (a recurring theme with audio equipment), and there are many different opinions and reviews on the net, so it is easy to get confused. You will most likely want a “dynamic” microphone (these pick up less noise from your less-than-optimal room) with a “cardioid pattern” (meaning the mic picks up more sound from the front than from the sides or behind). Personally, I found the YouTube channel “Podcastage” helpful, and some tips here are taken from there (but there are many videos and details, so you can get lost quickly).

I decided to go for an XLR microphone and an audio interface because this setup is (a) more flexible, (b) not much more expensive, and (c) gives me a decent headphone output on my PC for free. XLR is simply a standard connector for professional audio equipment (so you also need a matching cable), and the audio interface amplifies the microphone signal and converts it to a digital signal that travels into your computer via USB. A cheap option is the Behringer XM8500 microphone with a Behringer U-Phoria UMC22 audio interface. The latter should be plug and play under Linux and other operating systems.

You will also need the XLR cable and a microphone stand or boom arm. It is best to put the stand on a different surface than your keyboard and mouse, so that the microphone does not pick up your typing or you bumping into the table with your arms or legs. But take heed: the microphone should not be too far from your mouth, otherwise the signal-to-noise ratio suffers. So some sort of flexible arm mount is nice to have. If you can, check your local music store; they might be able to give you more tips.

Finally, you will want to do something about plosives. If you hold your hand in front of your mouth and say “plosive”, you will feel a puff of air from the “p”. If this air hits the microphone, it makes an unpleasant popping noise. The best option is to buy a “pop filter”, often a piece of cloth that you put in front of the microphone to stop the air from hitting it. If that is not possible, position the microphone at a 45° angle to your mouth, so that the air stream passes by it. Or do both.

Software

I use

OBS to record video and audio,
Kdenlive to cut clips, arrange clips, and render the final video, and
Ardour to post-process audio.

In case you need to convert any audio to .wav format, you can do it on the command line with ffmpeg:

ffmpeg -i <input_file> -acodec pcm_s24le -ar 48000 <output>.wav

This gives a bit depth of 24 bits per sample and a sampling rate of 48 kHz. Other options are, e.g., pcm_s16le for 16-bit depth, and sampling rates of 44.1 kHz or 96 kHz (the latter is probably overkill).
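If you have several files to convert, the command above can be wrapped in a small shell function. The following is a minimal sketch, assuming a POSIX shell with ffmpeg on the PATH; the function names (pcm_codec, pcm_bytes_per_second, to_wav) are made up for illustration. It also shows why bit depth and sampling rate matter for disk space: uncompressed PCM needs bit depth / 8 × sampling rate × channels bytes per second.

```shell
# Sketch: wrapper around the ffmpeg conversion above. Source this file in a
# POSIX shell; assumes ffmpeg is on the PATH. Function names are hypothetical.

# Map a bit depth to ffmpeg's signed little-endian PCM codec name.
pcm_codec() {
    case "$1" in
        16|24|32) echo "pcm_s${1}le" ;;
        *) echo "unsupported bit depth: $1" >&2; return 1 ;;
    esac
}

# Uncompressed data rate: bit depth / 8 * sampling rate * channels (bytes/s).
pcm_bytes_per_second() {
    echo $(( $1 / 8 * $2 * $3 ))
}

# Usage: to_wav INPUT OUTPUT.wav [DEPTH (default 24)] [RATE (default 48000)]
to_wav() {
    codec=$(pcm_codec "${3:-24}") || return 1
    ffmpeg -i "$1" -acodec "$codec" -ar "${4:-48000}" "$2"
}

# Example: convert every .m4a file in the current directory.
#   for f in *.m4a; do to_wav "$f" "${f%.*}.wav"; done
```

At 24 bits, 48 kHz and two channels, pcm_bytes_per_second gives 288000 bytes per second, so ten minutes of stereo audio take about 173 MB.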

Input settings (also for video calls)

You want to maximise the signal-to-noise ratio of the microphone (we want to hear your voice, not your neighbour’s cat) without clipping. First, bring the microphone closer to your mouth. You might be limited by not wanting it to be visible in the video. Then try to keep this distance at all times (I know this can be distracting or make you sit stiffly; it takes practice to get comfortable with it). If you need to listen to audio while your mic records, such as other people in a video call, use headphones, so that the microphone does not pick up the sound from your speakers.

Now we have to set the gain. On an audio interface (or some USB microphones), this is a knob that controls how much the microphone signal is amplified, often helpfully labelled “gain”. If you have one, make sure that your operating system’s volume control for the microphone is set to 0 dB or 100% (this should be the default in PulseAudio on modern Linux). Otherwise, you will have to adjust the input volume via the OS volume control. If your microphone sound only comes out of the left channel, see here.

Start OBS, even if you do not need to record. At the bottom of the window, there are volume meters with green/yellow/red bars. Make sure the mic is unmuted there and its slider is set to 0.0 dB. Now talk normally and you should see the signal. Make sure the correct microphone is chosen in the settings! You can test this by scratching your fingers over the mic (other microphones, such as those built into laptops, should not pick this up). Adjust the gain so that your normal speaking volume puts the volume meter in the yellow area. That’s it. This leaves some “headroom” to avoid clipping even if you scream or laugh loudly, while still picking up enough signal. You can test a loud sound and make sure it does not reach 0 dB on the scale. It is best to do this “calibration” every time you record.
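On a PulseAudio system, the OS-side volume can also be reset from a terminal, and the loud-sound test can be checked numerically on a short test recording. This is a sketch, assuming pactl and ffmpeg are installed; the function names are hypothetical. It relies on pactl’s set-source-volume command with the special name @DEFAULT_SOURCE@, and on ffmpeg’s volumedetect filter, which reports the peak level as max_volume in dB relative to full scale (dBFS).

```shell
# Sketch for a PulseAudio setup; assumes pactl and ffmpeg are installed.
# All function names are hypothetical.

# Set the OS-side volume of the default input to 100% (0 dB), so that the
# gain knob on the interface is the only thing controlling the level.
reset_mic_volume() {
    pactl set-source-volume @DEFAULT_SOURCE@ 100%
}

# Read ffmpeg's volumedetect log from stdin and print the peak in dBFS.
parse_max_volume() {
    sed -n 's/.*max_volume: \(-\{0,1\}[0-9.]*\) dB.*/\1/p'
}

# Peak level of the audio in a short test recording; -vn skips the video.
peak_of() {
    ffmpeg -hide_banner -nostats -i "$1" -vn -af volumedetect -f null - 2>&1 \
        | parse_max_volume
}

# Succeed if PEAK stays below LIMIT (default 0 dBFS, i.e. no clipping).
# An empty PEAK (nothing measured) counts as failure.
has_headroom() {
    awk -v p="$1" -v lim="${2:-0}" 'BEGIN { if (p == "") exit 2; exit !(p < lim) }'
}

# Example: record a few seconds of loud laughing in OBS, then
#   has_headroom "$(peak_of test.mkv)" -3 && echo "gain is fine"
```

The default limit of 0 dBFS matches the rule above; passing a stricter limit such as -3 is an assumption on my part that simply leaves an extra safety margin.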

If all you wanted was better audio in a video call, you’re done. Just make sure to select the correct microphone/audio interface in your video call software.

If you also want to record some audio from your computer (e.g., music or program sounds), the OBS developers recommend keeping these levels in the green area rather than the yellow. This recommendation is probably aimed at people who livestream video games and do not want these sounds to overwhelm their voice. You will have to test whether this works for your use case. Alternatively, you could add such clips later in a video editor, as described below.

Recording

OBS is quite capable, but I found it easy to use. Just arrange everything on screen the way you want it. For recording, I recommend high-quality settings if you can afford them (the limits are your CPU speed and disk space; play around with them, though I did not encounter any problems). Go to File → Settings → Output and set “Recording Quality” to “Indistinguishable Quality, Large File Size” and “Recording Format” to “mkv”. You can choose a “Sample Rate” of 48 kHz under Audio, but I doubt it matters much. More important is to set the resolution and FPS under Video to whatever your target is. If you are unsure, 1920x1080 at 60 FPS gives high-quality recordings. 30 or 24 FPS are also fine; these are used for TV in the US and for movies, respectively (you can experiment with what looks best if you really want).

Take note that if you record a window on your screen (say, the slides of your presentation), the signal will have exactly the pixel size that the window has on the screen. So if you want your slides to fill your full 1920x1080 video, they need to be displayed at 1920x1080, which also means your monitor needs at least that resolution. Otherwise, use a smaller output size for your video; nothing is gained by upscaling a 640x480 input to full HD.
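To confirm that a recording actually has the resolution and frame rate you set, you can ask ffprobe, which ships with ffmpeg. This is a minimal sketch; video_props and fps_decimal are hypothetical helper names. ffprobe reports the frame rate as a fraction, such as 60/1 or 30000/1001 (about 29.97 FPS), so the second function converts it to a decimal.

```shell
# Sketch: check resolution and frame rate of a recording.
# Assumes ffprobe (part of ffmpeg) is installed; names are hypothetical.

# Print "WIDTH,HEIGHT,RATE" for the first video stream, e.g. "1920,1080,60/1".
video_props() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=width,height,r_frame_rate -of csv=p=0 "$1"
}

# Convert a fractional frame rate such as 30000/1001 to a decimal.
# A plain number without "/" is treated as already being frames per second.
fps_decimal() {
    echo "$1" | awk -F/ '{ d = ($2 == "" ? 1 : $2); printf "%.2f\n", $1 / d }'
}

# Example:
#   video_props recording.mkv
#   fps_decimal 30000/1001
```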

Once you have convinced yourself that these settings work for you, it is finally time to record. Set everything up and hit the Start Recording button. I would leave a few seconds at the beginning to get comfortable; you can cut them later. Also leave a few seconds at the end to avoid an abrupt cut that you don’t like and cannot fix later.

You can then find the video in your home directory (by default), named with the timestamp of the start of the recording. If you’re happy with it and you are going to upload it to YouTube, you can probably get away with just trimming the beginning and end a bit. If you only want to trim, use Avidemux, which can make cuts losslessly, without re-encoding. This is much faster than, for example, Kdenlive. I mention YouTube because it automatically adjusts the loudness. If you use the file elsewhere, for example on your own website, or send it to a TV or radio station (why are they not assisting you?), you will need to normalise the loudness yourself. We will get to that later.
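If you prefer the command line to Avidemux, ffmpeg can also trim without re-encoding by copying the streams. This is an alternative to the Avidemux route, sketched under the assumption that ffmpeg is installed; trim_copy and keep_duration are hypothetical names. With stream copying, the cut can only start at a keyframe, so the actual start may be slightly earlier than requested.

```shell
# Sketch: lossless trimming with ffmpeg stream copy (alternative to Avidemux).
# Assumes ffmpeg is installed; function names are hypothetical.

# Length of the part to keep, from START and END in seconds.
keep_duration() {
    awk -v s="$1" -v e="$2" 'BEGIN { d = e - s; if (d <= 0) exit 1; print d }'
}

# Usage: trim_copy INPUT OUTPUT START_SECONDS END_SECONDS
trim_copy() {
    if [ "$#" -ne 4 ]; then
        echo "usage: trim_copy INPUT OUTPUT START END" >&2
        return 2
    fi
    dur=$(keep_duration "$3" "$4") || return 1
    # -ss before -i seeks in the input; -t limits the output duration;
    # -c copy copies audio and video without re-encoding.
    ffmpeg -hide_banner -ss "$3" -i "$1" -t "$dur" -c copy "$2"
}

# Example: drop the first 5 s and everything after 125.5 s.
#   trim_copy recording.mkv trimmed.mkv 5 125.5
```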

Cutting

If you had to do several takes, recorded several sections independently, want to insert some video clips, or forgot to wear clothes and need to put some black bars over parts of the video, you will need a video editor. Start Kdenlive and drag any material you want to include from your file manager to the Project Bin field. Go to Project → Project Settings and adjust the output video resolution and FPS (you probably want to match the OBS settings).

Now you should already have several empty video and audio tracks available. Drag and drop your clips into the timeline and arrange them. Use the “Razor tool” to split a clip, then delete the parts you don’t like. There are two previews available, the clip monitor and the project monitor; the project monitor shows how the final product will look and sound.

A useful effect is the “dissolve” transition. Place two clips on different video tracks so that they overlap in time. Then click on the bottom right corner of one of them. A little box labelled “Dissolve” should appear. Drag it between the two tracks and make sure it covers the whole overlapping time range. Be careful: the audio from both clips will play simultaneously.
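Kdenlive handles this graphically, but the same idea can be expressed with ffmpeg’s xfade filter (ffmpeg 4.3 or later) for video and its acrossfade filter for audio, which may help to see what the transition does. This is a sketch, not the Kdenlive workflow above. It assumes both clips have the same resolution, frame rate and pixel format, and dissolve and xfade_offset are hypothetical names. The key arithmetic is that the dissolve must start one fade length before the first clip ends. Unlike the Kdenlive setup, acrossfade fades the two audio tracks into each other instead of playing both at full volume.

```shell
# Sketch: command-line dissolve with ffmpeg's xfade (video) and acrossfade
# (audio) filters. Assumes matching resolution, frame rate and pixel format.
# Function names are hypothetical.

# xfade's offset: length of the first clip minus the fade duration (seconds).
xfade_offset() {
    awk -v len="$1" -v fade="$2" 'BEGIN { print len - fade }'
}

# Usage: dissolve CLIP1 CLIP2 OUTPUT CLIP1_LENGTH FADE_SECONDS
dissolve() {
    off=$(xfade_offset "$4" "$5")
    ffmpeg -hide_banner -i "$1" -i "$2" -filter_complex \
        "[0:v][1:v]xfade=transition=fade:duration=$5:offset=$off[v];[0:a][1:a]acrossfade=d=$5[a]" \
        -map "[v]" -map "[a]" "$3"
}

# Example: a 1-second dissolve between a 10-second intro and the main part.
#   dissolve intro.mkv talk.mkv joined.mkv 10 1
```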

You can also make freeze frames by selecting a clip and adding the Motion → Freeze effect. The effect takes a timestamp relative to the start of the clip; from that point on, the image is frozen. You can combine this with a dissolve transition, possibly setting the track property to “Black”, to fade to black.

If your audio tracks have different loudness, for example because you used a video clip from another source or added some music, you will have to adjust this. This is a complex topic, but for now I suggest keeping one audio track as the loudness reference and adjusting the others until you reach the desired result. You can do this with the effect Audio correction → Volume (keyframeable). Apply it to each clip you want to adjust and change the gain until you are happy. You can even add more keyframes with the + button to apply different adjustments in different parts of the clip.
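Gain values like these are usually given in dB. As a reminder of what the numbers mean, the amplitude factor is 10^(dB/20), so -6 dB roughly halves the signal amplitude and +6 dB roughly doubles it. The following sketch shows that conversion and, as a command-line counterpart to the Kdenlive effect, applies a fixed gain with ffmpeg’s volume filter. The function names are hypothetical, and ffmpeg is assumed to be installed.

```shell
# Sketch: relation between a gain in dB and an amplitude factor.
# Function names are hypothetical.

# factor = 10^(dB/20)
db_to_factor() {
    awk -v db="$1" 'BEGIN { printf "%.3f\n", 10 ^ (db / 20) }'
}

# Apply a fixed gain (in dB) to a file with ffmpeg's volume filter,
# writing 24-bit WAV so the bit depth is not reduced to ffmpeg's 16-bit default.
# Usage: apply_gain INPUT OUTPUT.wav DB   e.g. apply_gain music.wav quieter.wav -6
apply_gain() {
    ffmpeg -hide_banner -i "$1" -af "volume=${3}dB" -c:a pcm_s24le "$2"
}
```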

Save the project.

Now we just need to adjust the overall loudness so that it matches other, similar media. Unfortunately, I could not get the loudness normalisation in Kdenlive to work, so we have to process the audio externally. For this, click on Render and choose WAV as the output format. Click Render to File; it should not take long. We will now normalise this .wav file.

Loudness normalisation

First, a tiny bit of theory. The quietest and loudest sounds you want in your audio determine its dynamic range, and you need to be able to represent both of them digitally (the same holds for analogue audio, by the way; it just works differently). We already made sure our input captures a good signal range, so where is the problem? Imagine playing a music album and having to constantly adjust the volume because one track is really quiet and another really loud. This is even worse for music streaming services, which often want to sell you an inoffensive background playlist of music from different artists for, let’s say, exercising. You would not want to adjust your phone’s volume while lifting weights. That’s why sound engineers came up with loudness units relative to full scale (LUFS). The details do not really matter here; you just need to know that it is an average measure of loudness as perceived by humans.
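Since one loudness unit (LU) corresponds to 1 dB, normalising to a target is, in the simplest case, a single gain change: gain = target - measured integrated loudness. The following is a minimal sketch of that arithmetic (gain_to_target is a hypothetical name). A plain gain increase can push peaks above 0 dBFS, so it is worth checking the peak level afterwards.

```shell
# Sketch: gain (in dB, since 1 LU = 1 dB) that moves a measured integrated
# loudness to a target loudness. Positive means "make it louder".
# Function name is hypothetical.
gain_to_target() {
    awk -v m="$1" -v t="$2" 'BEGIN { printf "%.1f\n", t - m }'
}

# Example: measured -23.4 LUFS, target -14 LUFS -> prints 9.4
#   gain_to_target -23.4 -14
```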

This allows us to normalise the loudness to a given target LUFS value. I could only really get this to work in Ardour. Start Ardour, create a new session and choose PulseAudio as the audio system (unless you know better and want something else). Make sure that Edit → Preferences → Mixer → Master → Enable master-bus output gain control is activated.

Go to the “editor” view (button on the top right). Right-click below “Master” and add a new audio track. Choose stereo and click “Add and Close”. Drag and drop the .wav file you exported from Kdenlive to the Audio 1 track.

Now go to the “mixer” view (button on the top right). On the right side, there is the mixer strip for the master bus, i.e., the total output. There is a button labelled LAN. Click it and choose Analyze. After a while, you will get the results and some options to adjust loudness. You can either choose a LUFS value (“integrated loudness”) yourself or use one of the many presets. “EBU R128” is for TV broadcast. “Youtube” is probably also good for podcasts and video material that goes onto any web page that does not automatically normalise loudness (you will see that the various online platforms have fairly similar settings). Then press “Apply”. If you want, you can play similar audio material at the same time as your adjusted track to make sure they have comparable loudness (make sure both programs use the same volume settings in your operating system). If you’re unhappy, repeat with adjusted values; higher values are louder (remember that LUFS values are negative, so -14 LUFS is louder than -23 LUFS).

Now go to Session → Export → Export to audio file. You will probably do fine with the “BWAV 24bit session rate” format. Export the file.
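To double-check the exported file independently of Ardour, ffmpeg’s ebur128 filter measures integrated loudness according to EBU R128 and prints a summary at the end. This is a sketch, assuming ffmpeg is installed; measure_lufs and parse_integrated are hypothetical names. The option framelog=verbose keeps the per-frame measurements out of the normal log, and the parser assumes the summary contains a line like “I: -16.2 LUFS”. That is how current ffmpeg versions format it, but it is not a stable interface.

```shell
# Sketch: measure integrated loudness with ffmpeg's ebur128 filter.
# Assumes ffmpeg is installed; function names are hypothetical.

# Read ebur128 output from stdin and print the value following the last
# "I:" field, which is the integrated loudness from the final summary.
parse_integrated() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "I:") value = $(i + 1) }
         END { if (value == "") exit 1; print value }'
}

# Usage: measure_lufs export.wav   (prints the integrated loudness in LUFS)
measure_lufs() {
    ffmpeg -hide_banner -nostats -i "$1" -af ebur128=framelog=verbose \
        -f null - 2>&1 | parse_integrated
}
```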

Export

After normalisation, go back to Kdenlive. Drag and drop your normalised master into the Project Bin and move it onto an empty audio track in the timeline. Align its start with the zero timestamp so that it is in sync with the video. Mute all other audio tracks.

Click Render and choose an output format. MP4 with H.264 renders reasonably quickly and is widely supported. If you care about a format that is free to use without any restrictions, use WebM with VP8. If you are going beyond full HD, need smaller files, and can accept longer rendering times, use MP4 with H.265 or WebM with VP9. If file size is an issue, you might have to try all of these. Click Render to File and go do something else until it finishes. You are done!
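If you ever want to encode from the command line instead of Kdenlive, the following sketch maps the formats above to ffmpeg encoders: libx264 (H.264), libx265 (H.265), libvpx (VP8) and libvpx-vp9 (VP9), with AAC audio in MP4 and Vorbis or Opus audio in WebM. The quality values (-crf and -b:v) are assumptions, commonly used starting points rather than tuned recommendations. codec_args and encode are hypothetical names, and the encoders must be compiled into your ffmpeg build.

```shell
# Sketch: ffmpeg encoder choices roughly matching the formats above.
# Quality values are assumed starting points; function names are hypothetical.
codec_args() {
    case "$1" in
        h264) echo "-c:v libx264 -crf 20 -c:a aac" ;;
        h265) echo "-c:v libx265 -crf 24 -c:a aac" ;;
        vp8)  echo "-c:v libvpx -crf 10 -b:v 2M -c:a libvorbis" ;;
        vp9)  echo "-c:v libvpx-vp9 -crf 31 -b:v 0 -c:a libopus" ;;
        *) echo "unknown format: $1" >&2; return 1 ;;
    esac
}

# Usage: encode INPUT OUTPUT FORMAT  (.mp4 for h264/h265, .webm for vp8/vp9)
encode() {
    args=$(codec_args "$3") || return 1
    # $args is intentionally unquoted so that it splits into separate options.
    ffmpeg -hide_banner -i "$1" $args "$2"
}

# Example:
#   encode master.mkv talk.webm vp9
```

Lower -crf values mean higher quality and larger files, so this is the knob to turn when testing file sizes.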