Curriculum Learning in 2014 for BBC Wales...

Posted 2025-11-19 16:38:27 • Updated 2025-11-19 16:52:46

Twelve years ago, we built a Welsh automatic speech recognition (ASR) system which updated itself every day using synthetic speech. This was a pragmatic hack to cope with a language whose vocabulary changes faster than audio data can be collected.
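The post describes the daily update loop only at a high level. The sketch below is a purely hypothetical illustration, not the original BBC Wales pipeline. It shows one step such a loop might include: spotting words in the day's text that the recognizer's lexicon does not yet cover, then pairing each one with synthetic audio. Everything here is an assumption, including the function names `find_new_words` and `build_synthetic_batch`, the `min_count` threshold, and the `synthesize` callable, which stands in for any text-to-speech engine.

```python
import re
from collections import Counter
from typing import Callable

# Hypothetical tokenizer: a token is a run of letters (Unicode-aware, so Welsh
# letters such as ŵ and ŷ are kept), optionally joined by an apostrophe as in "i'r".
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def find_new_words(todays_text: str, lexicon: set[str], min_count: int = 2) -> list[str]:
    """Return words seen at least `min_count` times today that the lexicon lacks."""
    tokens = TOKEN_RE.findall(todays_text.lower())
    counts = Counter(t for t in tokens if t not in lexicon)
    # Most frequent unseen words first; ties broken alphabetically for determinism.
    return sorted(
        (w for w, c in counts.items() if c >= min_count),
        key=lambda w: (-counts[w], w),
    )


def build_synthetic_batch(
    new_words: list[str], synthesize: Callable[[str], object]
) -> list[tuple[object, str]]:
    """Pair each new word with synthetic audio; `synthesize` is a hypothetical TTS stand-in."""
    return [(synthesize(word), word) for word in new_words]


if __name__ == "__main__":
    lexicon = {"mae'r", "tywydd", "yn", "braf", "mae"}
    text = "Mae'r tywydd yn braf. Mae Brexit yn bwnc. Brexit eto."
    new_words = find_new_words(text, lexicon)
    print(new_words)  # ['brexit']
    fake_tts = lambda s: f"<audio:{s}>"  # placeholder, not a real TTS call
    print(build_synthetic_batch(new_words, fake_tts))
```

The `min_count` threshold keeps one-off typos out of the training data. Note that Welsh initial mutations (for example pwnc becoming bwnc above) mean a naive lexicon lookup like this one will also flag mutated forms of known words as "new"; a real system would need to handle that.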

Today, new research in multilingual speech-to-text translation (S2TT) uses remarkably similar ideas, but applies them at global scale (see Yexing Du et al., arXiv:2409.19510). Looking back, it is striking how many of these modern techniques echo the Welsh pipeline we built.

This short retrospective looks at what we did, why it mattered, and how it connects to state-of-the-art multilingual models today.
