Audio Quality for Transcription

Audio quality is one of those topics that I frequently discuss with clients. I had a conversation with a client today on this very subject. Mr Client had recorded a security-team meeting on his cell phone. I am not a cell phone fundi and assumed that he must have some sort of state-of-the-art cell phone with a superior external microphone capable of capturing voice input effectively.



When I received the audio from Mr Client, I did a test transcription, which produced a transcription ratio of between 1:15 and 1:20, depending on the speaker. This means that it would have taken me between 15 and 20 hours to transcribe one hour of recording. At that ratio the charge becomes too expensive and the resulting transcript is still peppered with “inaudible” gaps. Sometimes we just have to admit defeat and classify a recording as “not viable for transcription”.
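To make the arithmetic concrete, here is a minimal sketch of how a transcription ratio translates into working hours and cost. A ratio of 1:15 means 15 hours of work for every hour of audio, and 1:20 means 20. The function names and the hourly rate below are hypothetical and chosen purely for illustration; they are not my actual rates.

```python
def transcription_hours(audio_hours: float, ratio: float) -> float:
    """Hours of work needed for a recording at a 1:ratio transcription ratio."""
    return audio_hours * ratio


def transcription_cost(audio_hours: float, ratio: float, hourly_rate: float) -> float:
    """Total charge for the recording at the given hourly rate."""
    return transcription_hours(audio_hours, ratio) * hourly_rate


# Hypothetical rate, for illustration only (currency left unspecified).
ASSUMED_HOURLY_RATE = 100.0

# The two ends of the range measured in the test transcription.
for ratio in (15, 20):
    hours = transcription_hours(1.0, ratio)
    cost = transcription_cost(1.0, ratio, ASSUMED_HOURLY_RATE)
    print(f"1:{ratio} -> {hours:.0f} hours of work, charge {cost:.2f}")
```

Whatever the real rate is, the charge scales directly with the ratio, which is why poor audio becomes expensive so quickly.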


I receive many interview recordings conducted in noisy restaurants. I believe this happens because the researcher offers her interviewee a meal or coffee as a token of appreciation for his time and knowledge. It could also be that the interviewee feels unable to speak openly at his place of employment, or feels safer in a public place, and therefore chooses the friendly atmosphere of a restaurant where he is more at ease.


It just so happens that a restaurant can be a very noisy place. Adding to this, speech becomes a little more difficult to discern when the speaker has a mouth full of food. There will be moments of distraction when the waitress takes the order, serves the meals, does follow-up visits or when friends or colleagues happen to visit the same restaurant. The transcript could look something like this.


Stacey: How were you made aware of the new policy?

Peter: Well, I went to the – thank you, that looks delicious – I went to the implementation meetings … mm, this calamari is delicious – the implementation meetings were taking place on a weekly basis – would you like to try some of this?

Stacey: Ah, no thanks. I’ll stick with my schnitzel.

Peter: Oh okay. Yeah, so I knew all along that the policy was in the pipeline and that the implementation was going to happen in the early part of the year. Oh what’s that?

Waitress: It’s the sauce you ordered.

Peter: Oh thanks. Yeah, so I knew about the policy from the start. Could you pass me the salt?

Stacey: Sure. So when was the policy actually implemented?

Peter: Thanks. Yeah, it needs a bit of salt. I think I’d like to order a glass of water as well. I guess the waitress will be back in a few minutes. Sorry, your question – oh yeah that’s better –

Stacey: When was the policy implemented?

Peter: Oh yeah. Well, it was planned to happen simultaneously with the commencement of the FY but then there were some glitches that happened –

Stacey: Glitches? What happened?

Peter: – kind of toward the end of the testing phase and we had to wait for –

(inaudible cross-talking)

Peter: Ah yes, could you bring me a glass of water please?

Waitress: Certainly.

Peter: Yeah, the implementation. The effective date was the 1st of June last year but it was supposed to have been the 1st of March … and yeah, the system went live and we had the usual teething problems but it wasn’t too bad.

Stacey: You mentioned that you experienced some glitches toward the end of the testing phase. Could you expand on that a little please?

Peter: Well, we were waiting for HR and the operations guys to finalise the restructuring before we could commence with the training. You see what happens is … um, this calamari is really good. So yeah, the training needed to happen before we could go live and the policy implementation had to happen around the same time. It was a sensitive situation because we didn’t want to –

Waitress: Your water, Sir.

Peter: Thanks. We didn’t … yeah, so it didn’t happen until about the end of March I think.

Stacey: What didn’t happen?

Peter: The … the finalisation of the … the restructuring process. You see – Rob, hi man. Good to see you.


Ordinarily the food compliments are omitted when doing the transcription, but a verbatim transcript would look something like the above sample. Now imagine the above discussion taking place in an environment that has another 50 patrons also conducting conversations across their tables. Throw in a dose of Kenny G in the background and the same 50 patrons speaking even louder to make themselves heard above Kenny G. Now add a mouthful of food to the speaker … and we have what we would call “challenging” audio. It is standard practice for transcriptionists to increase their rates when transcribing audio of this nature. It stands to reason that it is going to take a lot longer to transcribe a one-on-one interview conducted in a restaurant than one that was recorded in a less noisy environment like an office or lounge.


To put this in some sort of context, even the presence of an air conditioner in a meeting room can muffle the voices of the participants. Obviously this will have less impact than the guffawing of the restaurant patron sitting at the table next to you. As unobtrusive as white noise can be (or is supposed to be), it can make a significant difference to the quality of the audio presented to the transcriptionist.

So when planning the recording of your next hearing or interview, remember to minimise background noise where possible. It is not possible to eliminate background noise altogether, but you can make small adjustments such as closing windows to reduce external traffic noise or asking participants not to move around the room unless absolutely necessary. Remind the participants that the session is being recorded and ask them, where possible, to project their voices and to speak towards the microphone.