Roadmap Phases

1 · Phrase completion

Our first milestone is to deliver on the core premise behind our latency innovation: that LLMs, like native speakers, generally know what is about to be said before each word is uttered. This is a significant capability in its own right, even before applying it to translation.
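To make the premise concrete, here is a deliberately toy sketch of phrase completion: a bigram counter that greedily extends a phrase with the most frequent next word seen in its training text. All names here are illustrative; a real LLM does this with learned token probabilities over a vast vocabulary, not raw counts.

```python
from collections import defaultdict

def train_bigrams(corpus: str):
    """Count which word follows which in the training text."""
    counts = defaultdict(lambda: defaultdict(int))
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def complete(counts, start: str, max_words: int = 5):
    """Greedily extend `start` with the most frequent follower each step."""
    out = [start]
    for _ in range(max_words):
        followers = counts.get(out[-1])
        if not followers:
            break  # no continuation known for this word
        out.append(max(followers, key=followers.get))
    return " ".join(out)

model = train_bigrams("to be or not to be that is the question")
print(complete(model, "to"))  # → "to be or not to be"
```

Even this crude model "knows what comes next" for highly predictable phrases, which is the property the roadmap builds on.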

2 · Real-time text translation

The next step in our product development is a streaming service that translates text between languages, using a novel confidence measure to dynamically adjust the degree of prediction, optimising for accuracy anywhere on the spectrum from completely predictable quotations to lists of random words. Think of translating subtitles into any language with minimal lag.
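The confidence measure itself is proprietary, but the gating pattern it enables can be sketched as follows. All names and the scores are placeholders, not BabelBit's actual method: predicted tokens are emitted only while confidence stays above a threshold, otherwise the system waits for more source text.

```python
def gated_predict(candidates, threshold=0.8):
    """Emit predicted tokens only while confidence stays above threshold.

    `candidates` is a list of (token, confidence) pairs from some
    predictive model; the gate stops at the first low-confidence token,
    trading speculation for accuracy.
    """
    emitted = []
    for token, confidence in candidates:
        if confidence < threshold:
            break  # fall back to waiting for more source input
        emitted.append(token)
    return emitted

# Highly predictable text lets many tokens through early...
print(gated_predict([("to", 0.99), ("be", 0.97), ("or", 0.95)]))
# ...while a list of random words emits nothing and simply waits.
print(gated_predict([("aardvark", 0.02), ("kettle", 0.01)]))
```

Raising the threshold slides the service toward the accurate-but-slower end of the spectrum; lowering it favours latency.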

3 · Low-latency audio ingest; text output

This is well-trodden ground for our Chief Scientist, who has been working on low-latency speech processing models since 2020. His team at Neurence succeeded in reducing the latency of an end-to-end morphing process (including vocoding) to 50 ms. This will further speed up our ability to offer translated subtitles for live speech, e.g. in meetings.
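A minimal sketch of what "low latency" means operationally: process audio in small chunks and check each chunk's processing time against a 50 ms budget (the figure cited above). The chunk size and the stand-in processing function are assumptions for illustration only.

```python
import time

BUDGET_S = 0.050  # 50 ms end-to-end budget, per the text above

def process_chunk(chunk: bytes) -> str:
    """Placeholder for a real speech-to-text model call."""
    return f"<{len(chunk)} bytes transcribed>"

def stream(chunks):
    """Yield (text, within_budget) for each incoming audio chunk."""
    for chunk in chunks:
        start = time.perf_counter()
        text = process_chunk(chunk)
        elapsed = time.perf_counter() - start
        yield text, elapsed <= BUDGET_S

# Three hypothetical 20 ms frames of 8 kHz 16-bit mono audio.
results = list(stream([b"\x00" * 320] * 3))
print(all(ok for _, ok in results))
```

In a real pipeline the budget check would drive backpressure or model-size fallback rather than a simple boolean.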

4 · Speech-to-speech

The final stage is to output speech. The biggest latency gain is likely to come from a true multi-modal, single-shot transformation, but there are further areas where speech-mode LLMs outperform legacy models. For example, LLMs can already handle the divergence in how emotions are expressed across languages, without needing explicit supervised training.

© BabelBit 2025