Curriculum Learning in 2014 for BBC Wales...
Twelve years ago, we built a Welsh automatic speech recognition (ASR) system which updated itself every day using synthetic speech. This was a pragmatic hack to cope with a language whose vocabulary changes faster than audio data can be collected.
Today, new research in multilingual speech-to-text translation (S2TT) is using remarkably similar ideas, but applying them at global scale. Looking back, it’s striking how many of the modern techniques echo the Welsh pipeline we built years earlier. See Yexing Du et al., arXiv:2409.19510.
This short retrospective looks at what we did, why it mattered, and how it connects to state-of-the-art multilingual models today.
Read the full story here →