It’s no secret that audio editing is a task many podcasters must begrudgingly embrace. While modern digital audio workstations have come a long way toward making audio editing easier, they still lack tools that truly speed up or automate the process. A skilled audio producer can spot some obvious correction points just by looking at the waveforms in an editing program. But for the most part, editing spoken-word tracks still means listening to every minute of recorded audio.
Adobe could turn spoken-word audio production on its head with a new prototype called Project VoCo. Ars Technica has already dubbed VoCo the “Photoshop for audio,” quoting developer Zeyu Jin, who debuted it at a recent Adobe conference.
VoCo works like this: give the software about 20 minutes of spoken audio from a single voice. The sample can be recorded specifically for VoCo or taken from a podcast or audiobook recording. VoCo then transcribes the speech and displays the transcript as text. From there, an audio producer can simply rearrange the words, and VoCo will edit the audio to match the changes in the text.
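To make the text-to-audio mapping concrete, here is a minimal Python sketch of how transcript-based editing can work in principle. It is not Adobe's implementation. It assumes that each transcribed word already carries hypothetical start and end sample indices, as a speech recognizer with word-level timing might provide. It then splices those audio spans together in whatever order the editor arranges the words.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical transcript entry: the word plus the span of audio it covers.
    text: str
    start: int  # index of the first audio sample for this word (inclusive)
    end: int    # index one past the last audio sample for this word (exclusive)


def rearrange(samples: list[int], words: list[Word], new_order: list[int]) -> list[int]:
    """Build edited audio by concatenating each word's audio span in new_order.

    new_order holds indices into `words`. Words left out are cut, and
    repeated indices duplicate a word.
    """
    out: list[int] = []
    for i in new_order:
        w = words[i]
        out.extend(samples[w.start:w.end])  # copy this word's slice of audio
    return out


# Toy example: six "samples" of audio, three words of two samples each.
samples = [1, 2, 3, 4, 5, 6]
words = [Word("hello", 0, 2), Word("big", 2, 4), Word("world", 4, 6)]

# The editor drags "world" in front of "hello" and deletes "big".
edited = rearrange(samples, words, [2, 0])
print(" ".join(words[i].text for i in [2, 0]))  # world hello
print(edited)  # [5, 6, 1, 2]
```

A real editor would also need to smooth the joins between clips, for example with short crossfades, so the splices aren't audible. This sketch skips that step.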
It doesn’t stop there. VoCo can also synthesize new words in the speaker’s voice based on the sample audio it has received, allowing the editor to literally put words into the speaker’s mouth.
For anyone dreading their next audio editing job, VoCo could speed up the process dramatically. Of course, the software’s ability to convincingly generate speech from typed text creates endless potential for hilarity. But it could also be misused, for example to fabricate statements someone never made. With this in mind, Adobe has added a watermarking tool to VoCo to make it easier to identify authentic audio.
See VoCo in action in the video below.