Recording audio and video for amateurs
During the pandemic, conferences and meetings had to move online. As scientists, we are increasingly asked to pre-record presentations ourselves. This is new ground for many of us. Recently, I have also become dissatisfied with the quality of the built-in laptop microphone, even for video calls. The camera is also bad, but this does not matter as much for a video call, and a recorded presentation will only have a small image of me in some corner. So here are some notes on what I learned about recording videos and how I tried to improve the audio quality. No guarantees for correctness; I am also an amateur. Everything is done on Linux using free software, which should also be available on Windows and Mac. Possibly subject to updates in the future.
Hardware and environment
The first principle is “crap in/crap out”, as always. If your audio clips or your signal-to-noise ratio is low, this cannot be fixed in post-processing. Before you even turn on your computer, some things have to be arranged and/or bought.
The biggest issue is the acoustic quality of the room you’re recording in. You can search for “acoustic treatment” to find the plethora of options and details. You will also find that there is no upper limit on the amount of money and effort you can spend. Most of us are not going to turn our offices or living rooms into recording studios, so I am only going to recommend doing something like a clap test: clap your hands loudly and listen for the reverb. In your bathroom, for example, you will hear a lot of reverb. Try to reduce that. Carpets might help. Turn off any unneeded noise source (fan, second computer, phone, office mate).
As I said above, I have not yet looked too deeply into improving video quality. Just use your eyes to see if the lighting is decent. Try to set the scene a bit: it looks bad if you are just in front of a white wall, so perhaps you can have some plants or a bookshelf behind you. Make sure you wear clothes.
Regarding hardware, you should really look into getting a somewhat decent microphone. Again, there is no real upper limit on the price you can pay (this is a theme for audio equipment). There are also many different opinions and reviews on the net, so things are confusing. You will most likely want a “dynamic” microphone (those pick up less noise from your less-than-optimal room) with a “cardioid pattern” (which means the mic will pick up more sound in front of it than to the sides or behind it). Personally, I found the YouTube channel “Podcastage” helpful and some tips are taken from there (but there are many videos and details, so you can get lost quickly). I decided to go for an XLR microphone and an audio interface because this setup (a) is more flexible, (b) is not much more expensive, and (c) gives me a decent headphone output on my PC for free. XLR is just a standard connector for professional audio equipment (meaning you also need a special cable), and the audio interface amplifies the microphone signal and converts it to a digital signal that travels into your computer via USB. A cheap option is the Behringer XM8500 microphone and a Behringer U-Phoria UMC22 audio interface. The latter should be plug and play under Linux and other operating systems. You will also need the XLR cable and a microphone stand or boom arm. It is best to have the stand on a different surface than your keyboard and mouse, so that the microphone does not pick up your typing noise or you bumping into the table with your arms or legs. But take heed: the microphone should not be too far from your mouth, because a short distance improves the signal-to-noise ratio. So some sort of flexible arm mount is nice to have. Check your local music store if you can; they might be able to give you more tips.
Finally, you will want to do something about plosives. If you hold your hand in front of your mouth and say “plosive”, you will feel some wind from the “p” sound. If this wind hits the microphone, it will make an unpleasant noise. The best option is to buy a “pop filter”, which is often a piece of cloth that you can put in front of the microphone and which then stops the wind from hitting the mic. If that is not possible, put the microphone at a 45° angle towards your mouth, so that the wind passes by it. Or do both.
Software
I use
- OBS to record video and audio,
- Kdenlive to cut clips, arrange clips, and render the final video, and
- Ardour to post-process audio.
In case you need to convert any audio to .wav format, you can do it on the command line with ffmpeg:
ffmpeg -i <input_file> -acodec pcm_s24le -ar 48000 <output>.wav
This gives a depth of 24 bits per sample and a 48 kHz sampling rate. Other options are e.g. pcm_s16le for 16-bit depth and sampling rates of 44.1 kHz or 96 kHz (the latter is probably nonsense).
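To get a feeling for what these options cost in disk space: the data rate of uncompressed PCM audio is just sampling rate × bytes per sample × number of channels. Here is a minimal Python sketch of that arithmetic. The function name `pcm_size_mb` and the example of a ten-minute stereo recording are my own assumptions, and the small WAV header is ignored.

```python
def pcm_size_mb(sample_rate_hz, bits_per_sample, channels, seconds):
    """Approximate size of uncompressed PCM audio in megabytes (10^6 bytes).

    The WAV header (a few dozen bytes) is ignored.
    """
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * seconds / 1_000_000


if __name__ == "__main__":
    # Hypothetical example: ten minutes of stereo audio.
    for rate, bits in [(44_100, 16), (48_000, 24), (96_000, 24)]:
        size = pcm_size_mb(rate, bits, channels=2, seconds=600)
        print(f"{rate} Hz, {bits}-bit: {size:.1f} MB")
```

Doubling the sampling rate from 48 kHz to 96 kHz doubles the file size, which is one more reason not to bother with it for speech.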
Input settings (also for video calls)
You want to maximise the signal-to-noise ratio of the microphone (we want to hear your voice, not your neighbour’s cat) without clipping. First, bring your microphone closer to your mouth. You might be limited by the fact that you do not want it to be visible in the video. Then try to keep this distance from the microphone at all times (I know that it can be distracting or make you take a stiff position; you have to practice to get comfortable with this). If you need to listen to some audio (such as other people in a video call) while your mic records, you should use headphones.
Now we will have to set the gain. On an audio interface (or some USB microphones), this is a knob that controls how much the microphone signal is amplified, often helpfully labelled “gain”. If you have that, make sure that your operating system’s volume control is set to 0 dB or 100% for the microphone (this should be the default in PulseAudio on modern Linux). Otherwise, you will have to adjust the input volume via the OS volume control. If your microphone sound only comes out of the left channel, see here. Start OBS, even if you do not need to record. At the bottom of the screen, there are volume meters with green/yellow/red bars. Make sure the mic is unmuted there and the slider is set to 0.0 dB. Now talk normally and you should see the signal. Make sure the correct microphone is chosen in the settings! You can scratch your fingers over the mic to test this (other microphones, such as those built into laptops, should not pick this up). Adjust the gain so that your normal speaking volume puts the volume meter in the yellow area. That’s it. This should give you some headroom to avoid clipping even if you scream or laugh loudly, while still picking up enough signal. You can test a loud sound and make sure it does not reach 0 dB on the scale. It is best to do this “calibration” every time.
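The dB scale on such a meter is relative to full scale: 0 dB is the largest value the digital signal can represent, and everything quieter is negative. As a rough illustration of what headroom means, here is a small Python sketch that converts peak sample values to dB relative to full scale (dBFS). The function names and the example samples are made up for illustration, and this is not how OBS computes its meter internally.

```python
import math


def peak_dbfs(samples):
    """Peak level in dBFS for samples normalised to the range -1.0..1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(peak)


def headroom_db(samples):
    """Distance between the loudest sample and full scale (0 dBFS)."""
    return -peak_dbfs(samples)


if __name__ == "__main__":
    # Hypothetical samples: speech peaking at a quarter of full scale.
    speech = [0.0, 0.1, -0.25, 0.2]
    print(f"peak: {peak_dbfs(speech):.1f} dBFS, "
          f"headroom: {headroom_db(speech):.1f} dB")
```

A peak at a quarter of full scale is about -12 dBFS, so a sound would have to be four times larger in amplitude before it clips.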
If all you wanted is better audio in a video call, you’re done. Just make sure to select the correct microphone/audio interface in your video call software.
If you also want to record some audio from your computer (e.g., music or program sounds), the OBS developers recommend keeping these levels in the green area and not in the yellow. This recommendation is probably made for people who livestream video games and do not want these sounds to overwhelm the voice. You must test if this is reasonable for your use case. You might also consider adding video clips later in video editing software, as described below.
Recording
OBS is quite capable, but I found it easy to use. Just arrange everything on screen the way you want it to be. For recording, I recommend using high-quality settings if you can afford it (limitations are your CPU speed and disk space; play around with it, I did not encounter any problems). Go to File → Settings → Output and set “Recording Quality” to “Indistinguishable Quality, Large File Size” and “Recording Format” to “mkv”. You can go for a “Sample Rate” of 48 kHz under Audio, but I doubt it matters much. More important is to set the resolution and FPS under Video to whatever your target is. If you don’t know, you could go for 1920x1080 with 60 FPS for high-quality recordings. 30 or 24 FPS, for example, are also OK, being used for TV in the US and movies, respectively (you can experiment with what looks best if you really want). Take note that if you record a window on your screen (let’s say the slides from your presentation), the signal will have only the exact pixel size that the window has on the screen. So if you want your slides to fill your full 1920x1080 video, they need to be displayed at 1920x1080. That also means your monitor needs at least that resolution. Otherwise, use a smaller output for your video; nothing is gained by up-scaling a 640x480 input to full HD.
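If you are unsure whether a captured window would have to be enlarged to fill your video, the check is simple arithmetic. Here is a small Python sketch; the function names are my own, and it assumes the source is scaled uniformly (keeping its aspect ratio) until it fits into the output canvas.

```python
def fit_scale(src_w, src_h, dst_w, dst_h):
    """Scale factor that fits a source into a canvas, keeping the aspect ratio."""
    return min(dst_w / src_w, dst_h / src_h)


def needs_upscaling(src_w, src_h, dst_w, dst_h):
    """True if the source has to be enlarged to fill the canvas."""
    return fit_scale(src_w, src_h, dst_w, dst_h) > 1.0


if __name__ == "__main__":
    # A 640x480 window in a full HD video would be blown up by a factor of 2.25.
    print(fit_scale(640, 480, 1920, 1080))
    print(needs_upscaling(640, 480, 1920, 1080))
    # A window that is already 1920x1080 is used as it is.
    print(needs_upscaling(1920, 1080, 1920, 1080))
```

Any factor above 1 means pixels are being invented, so either make the window larger or choose a smaller output resolution.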
If you have convinced yourself that these settings work for you, it is finally time to record. Set everything up and hit the Start Recording button. I would leave a few seconds in the beginning to get comfortable, you can cut them later. Also leave a few seconds at the end to avoid having an abrupt cut that you don’t like and cannot fix later.
You can then find the video in your home directory, named with the timestamp of the start of the recording. If you’re happy with it, and you are going to upload this to YouTube, you can probably get away with just trimming the beginning and end a bit. If you just want to trim, use Avidemux, which can make cuts losslessly and without re-encoding. This is much faster than Kdenlive, for example. I mention YouTube specifically because they automatically adjust the loudness. If you use the file elsewhere, for example on your own website, or send it to a TV or radio station (why are they not assisting you?), you will need to normalise the loudness yourself. We will get to that later.
Cutting
If you had to do several takes, recorded several sections independently, want to insert some video clips, or forgot to wear clothes and need to put some black bars over parts of the video, you will need a video editor. Start Kdenlive and drag any material you want to include from your file manager to the Project Bin field. Go to Project → Project Settings and adjust the output video resolution and FPS (you probably want to match the OBS settings).
Now you should already have several empty video and audio tracks available. Drag and drop your clips into the timeline and arrange them. You can use the “Razor tool” to split a clip. With this, you can delete the parts you don’t like. There are two previews available, the clip monitor and the project monitor. The project monitor will show you how the final product will look and sound.
A useful effect is the “dissolve” transition. Move two video tracks so that they overlap. Then click on the bottom right corner of one of them. A little box labelled “Dissolve” should appear. Move this between the two tracks and make sure it covers the whole overlapped time range. Be careful: The audio from both tracks will play simultaneously.
You can also make freeze frames by selecting a clip and adding the Motion → Freeze effect. This uses a time stamp relative to the start of the clip. From this point on, the image is frozen. You can combine this with a dissolve transition, possibly setting the track property to “Black” to have a fade to black.
If your audio tracks have different loudness, for example because you used a video clip from another source or you added some music, you will have to adjust this. This is a complex topic, but for now I would suggest just keeping one audio track as the reference loudness and adjusting the others until you reach the desired result. This can be achieved with the effect Audio correction → Volume (keyframeable). Apply this to each clip you want to adjust and change the gain until you are happy. You can even add more key frames with the + button to have different adjustments in different parts of the clip.
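If you wonder what a gain in dB does to the signal: it multiplies every sample by a fixed factor, and key frames let that factor change over time. Here is a minimal Python sketch of the idea. The function names are hypothetical, and the linear interpolation in dB between key frames is my assumption for illustration, not a description of what Kdenlive does internally.

```python
def db_to_factor(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def gain_at(keyframes, t):
    """Gain in dB at time t, interpolated linearly between (time, gain_db) pairs.

    Before the first and after the last key frame, the nearest value is held.
    """
    keyframes = sorted(keyframes)
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    for (t0, g0), (t1, g1) in zip(keyframes, keyframes[1:]):
        if t <= t1:
            return g0 + (g1 - g0) * (t - t0) / (t1 - t0)
    return keyframes[-1][1]


if __name__ == "__main__":
    # Hypothetical example: lower the music by 6 dB over the first ten seconds.
    music = [(0.0, 0.0), (10.0, -6.0)]
    for t in (0.0, 5.0, 10.0, 30.0):
        g = gain_at(music, t)
        print(f"t={t:>4} s: {g:+.1f} dB, factor {db_to_factor(g):.2f}")
```

As a rule of thumb, -6 dB roughly halves the amplitude and +6 dB roughly doubles it.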
Save the project.
Now, we just need to adjust the overall loudness so that it matches other, similar media. Unfortunately, I could not get the loudness normalisation in Kdenlive to work, so we have to treat the audio externally. For this, click on Render and choose WAV as the output format. Click Render to File. It should not take long. We will now have to normalise this .wav file.
Loudness normalisation
First, a tiny bit of theory. The quietest and loudest sounds you want to have in your audio determine the dynamic range. That means that you should be able to digitally represent these signals (same for analogue audio, by the way, it just works differently). We already made sure our input captures a good signal range, so where is the problem? Imagine playing a music album and having to constantly adjust the volume because one track is really quiet and another really loud. This is even worse for music streaming services that often want to sell you a non-offensive background playlist consisting of music from different artists for, let’s say, exercising. You would not want to adjust your phone’s volume while lifting weights. That’s why sound engineers came up with loudness units relative to full scale (LUFS). It does not really matter what that is, you just need to know that it is an average value of loudness as perceived by humans.
This allows us to normalise the loudness to a given target LUFS value. I could only really get this to work in Ardour. Start Ardour, make a new session, and choose PulseAudio (unless you know better and want something else). Make sure that Edit → Preferences → Mixer → Master → Enable master-bus output gain control is activated.
Go to the “editor” view (button on the top right). Right-click below “Master” and add a new audio track. Choose stereo and click “Add and Close”. Drag and drop the .wav file you exported from Kdenlive to the Audio 1 track.
Now go to the “mixer” view (button on the top right). On the right side, there is the mixer for the master, i.e., the total output. There is a button labelled LAN. Click it and choose Analyze. After waiting a while, you will get the results and some options to adjust loudness. You can either choose a LUFS value (“integrated loudness”) yourself, or use one of the many presets. “EBU R128” is for TV broadcast. “Youtube” is probably also good for podcasts and video material that goes onto any webpage that does not automatically normalise the loudness (you will see that the various online platforms have relatively similar settings). Then press “Apply”. If you want, you can play similar audio material at the same time as your adjusted track to make sure it has comparable loudness (make sure both programs use the same volume settings in your operating system). If you’re unhappy, repeat with adjusted values; higher values are louder (remember that LUFS is a negative number).
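What happens when you apply a target is, to a good approximation, a plain gain change: the difference between the target and the measured integrated loudness, in dB. Here is a small Python sketch of that arithmetic. The function names and the measured values are made up for illustration, and the -1 dB peak ceiling is my own assumption, not something taken from Ardour.

```python
def normalisation_gain_db(measured_lufs, target_lufs):
    """Gain in dB that moves the integrated loudness to the target value."""
    return target_lufs - measured_lufs


def peak_after_gain_db(peak_db, gain_db):
    """Peak level in dB relative to full scale after applying the gain."""
    return peak_db + gain_db


if __name__ == "__main__":
    # Hypothetical analysis result: -20 LUFS integrated, peak at -5 dB.
    measured, peak = -20.0, -5.0
    ceiling = -1.0  # assumed safety margin below full scale
    for target in (-23.0, -14.0):
        gain = normalisation_gain_db(measured, target)
        new_peak = peak_after_gain_db(peak, gain)
        status = "ok" if new_peak <= ceiling else "would exceed the ceiling"
        print(f"target {target} LUFS: gain {gain:+.1f} dB, "
              f"peak {new_peak:+.1f} dB ({status})")
```

The second case shows why a loud target is not always reachable by gain alone: if the peaks would end up above full scale, you need a lower target or a limiter.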
Now go to Session → Export → Export to audio file. You will probably do fine with the “BWAV 24bit session rate” format. Export the file.
Export
After normalisation, go back to Kdenlive. Drag and drop your normalised master into the Project Bin and move it into an empty audio track in the timeline. Align its beginning with the zero timestamp to sync it up with the video. Mute all other tracks.
Click Render and choose an output format. MP4 with h264 will render reasonably quickly and is probably widely supported. If you care about having a format that is free to use without any restrictions, use WebM with VP8. If you are going for more than full HD, need smaller file sizes, and can accept longer rendering times, use MP4 with h265 or WebM with VP9. If file size is an issue, you might have to test out all of these. Click Render to File and go do something else until it finishes. You are done!