Examples of speech generated by VoiceNoNG and other models

This project is maintained by JasonSWFu

Demo page of the proposed VoiceNoNG

Section 1. Examples of editing speech from movies/YouTube:

Example 1-1: Spider-Man (2002)

Original transcript: With Great Power, comes Great Responsibility

Target transcript: With more GPU, comes Great Responsibility.

Original:

Proposed VoiceNoNG:


Example 1-2: Spider-Man (2002)

Original transcript: With Great Power, comes Great Responsibility

Target transcript: With more computational resources, comes Great Responsibility.

Original:

Proposed VoiceNoNG:


Example 2: Harry Potter and the Philosopher’s Stone (2001)

Original transcript: Dear Mr. Potter, we are pleased to inform you that you have been accepted at Hogwarts School of Witchcraft and Wizardry.

Target transcript: Dear Mr. Potter, we are pleased to inform you that you have been expelled from Hogwarts School of Witchcraft and Wizardry.

Original:

Proposed VoiceNoNG:

(It preserves the British accent and even successfully generates the background music!)

RealEdit dataset:

Section 2. Examples of editing speech with background audio:

Example 1: YOU1000000110_S0000046.wav

Original transcript: argentina’s trophy and it’s a fifth world crown.

Target transcript: argentina’s trophy and victory is a fifth world crown.

Original:

Voicebox: (bad quality)

VoiceCraft: (generates speech with unintended long silences and missing words)

Proposed VoiceNoNG:

Section 3. Examples of attention errors (hallucinations) in VoiceCraft:

Example 1: 8173_294714_000033_000000.wav

Target transcript: promise that you will not ask me to borrow any money from the bank for the money of you for mister van brandt she rejoined and i accept your help gratefully.

VoiceCraft: promise that you will not ask me to borrow any money of you from mister van brandt you rejoined and i accept your help gratefully.

Proposed VoiceNoNG:


Example 2: YOU1000000101_S0000132.wav

Target transcript: yet anytime you and i question the schemes of the dogooders or dare to dig into any of their motives we’re denounced as being against their humanitarian goals they say we are always against things we are never for anything.

VoiceCraft: yet anytime you and i question the schemes of the dog or dare to dig into any of their motives we’re denounced as gooders we’re denounced as being against their humanitarian goals they say we are always against things we are never for anything.

Proposed VoiceNoNG:

Section 4. Examples from LibriTTS (within the RealEdit dataset)

Example 1: 116_288046_000004_000007.wav

Original transcript: And since we are doomed to know the truth, let us cultivate a love for it.

Target transcript: And since we are doomed to possess and seek knowledge, let us cultivate a love for it.

Original:

Proposed VoiceNoNG:

VoiceCraft:

Voicebox:

Post-quantization:


Example 2: 2035_147960_000003_000004.wav

Original transcript: We might get some puppies, or owl eggs, or snake skins.

Target transcript: We might get several colorful gemstones, or owl eggs, or snake skins.

Original:

Proposed VoiceNoNG:

VoiceCraft:

Voicebox:

Post-quantization:

Section 5. Examples from YouTube (within the RealEdit dataset)

Example 1: YOU1000000005_S0000035.wav

Original transcript: and then the campaign content i think this one is really key to use as well.

Target transcript: and then the campaign content is super detailed so this one is really key to use as well.

Original:

Proposed VoiceNoNG:

VoiceCraft:

Voicebox:

Post-quantization:


Example 2: YOU1000000167_S0000107.wav

Original transcript: he hadn’t expected london to have quite so many legs.

Target transcript: he hadn’t expected the new furniture to have quite so many legs.

Original:

Proposed VoiceNoNG:

VoiceCraft:

Voicebox:

Post-quantization:

Section 6. Examples from Spotify (within the RealEdit dataset)

Example 1: show_2CJ6f4oLCccT3fsUaWAk9k-3fVgo6u94DJHpK7uP1Qb7V.wav

Original transcript: And, like comment subscribe give me feedback give me feedback.

Target transcript: And, like comment subscribe give me your thoughts and any feedback.

Original:

Proposed VoiceNoNG:

VoiceCraft:

Voicebox:

Post-quantization:


Example 2: show_2T8QRK60cWaPQflfo6Wuc4-4oTO10xL7hQQS2fuXBy1d7.wav

Original transcript: In the pursuit of lightness minimal stress ultimate fulfillment.

Target transcript: In the pursuit of calm serenity an escape from stress ultimate fulfillment.

Original:

Proposed VoiceNoNG:

VoiceCraft:

Voicebox:

Post-quantization: