MOSS-TTS-Nano ONNX Demo

State-of-the-art text-to-speech demo for multilingual voice cloning.

Voice Clone - Clone any voice from a reference audio clip.
Voice Presets - Choose a built-in demo from assets/demo.jsonl.

Built with MOSS-TTS-Nano.

Demo

Prompt Speech

Using the selected demo prompt speech.

Text

Generation Options

Max New Frames

Voice Clone Max Text Tokens

Max TTS Batch Size (0=auto)

Max Codec Batch Size (0=auto)

A value of 0 keeps the default automatic batching. Set Max TTS Batch Size to 1 to force split chunks to run one at a time. Buffered generation preserves chunk order and decodes codec sub-batches no larger than the current TTS batch. Realtime Streaming Decode also preserves output order and uses the smallest active chunk-group width among auto batching, Max TTS Batch Size, and Max Codec Batch Size.
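The width rule described above can be sketched as follows. This is only an illustration of the stated behavior, not the app's actual code: the function names `effective_group_width` and `split_in_order` and the `auto_width` parameter are hypothetical.

```python
from typing import List, Sequence


def effective_group_width(auto_width: int,
                          max_tts_batch: int = 0,
                          max_codec_batch: int = 0) -> int:
    """Smallest active width; a limit of 0 means "auto" and is ignored.

    Hypothetical helper: auto_width stands in for whatever width the
    app's automatic batching would pick on its own.
    """
    candidates = [auto_width]
    candidates += [limit for limit in (max_tts_batch, max_codec_batch) if limit > 0]
    # Never go below one chunk per group.
    return max(1, min(candidates))


def split_in_order(chunks: Sequence[str], width: int) -> List[List[str]]:
    """Group chunks into batches of at most `width`, preserving order."""
    return [list(chunks[i:i + width]) for i in range(0, len(chunks), width)]


if __name__ == "__main__":
    width = effective_group_width(auto_width=4, max_tts_batch=8, max_codec_batch=2)
    print(width, split_in_order(["s1", "s2", "s3"], width))
```

With Max TTS Batch Size set to 1, this sketch yields one chunk per group, which matches the "one at a time" behavior described above.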

CPU Threads

This ONNX app uses the execution provider selected at server start. CPU Threads selects which cached ONNX Runtime instance serves each request.
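One way to picture a per-thread-count cache is the sketch below. It is an assumption about the general pattern, not this app's implementation: `SessionCache` and `make_ort_session` are hypothetical names, and only the `onnxruntime` calls inside `make_ort_session` (`SessionOptions`, `intra_op_num_threads`, `InferenceSession`) come from the real library.

```python
from typing import Any, Callable, Dict


class SessionCache:
    """Build one runtime instance per CPU thread count and reuse it."""

    def __init__(self, factory: Callable[[int], Any]) -> None:
        self._factory = factory
        self._sessions: Dict[int, Any] = {}

    def get(self, cpu_threads: int) -> Any:
        # Create the instance on first use, then serve it from the cache.
        if cpu_threads not in self._sessions:
            self._sessions[cpu_threads] = self._factory(cpu_threads)
        return self._sessions[cpu_threads]


def make_ort_session(model_path: str, cpu_threads: int,
                     provider: str = "CPUExecutionProvider") -> Any:
    """Hypothetical factory; the provider is fixed when the server starts."""
    import onnxruntime as ort  # imported lazily so the cache works without it

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = cpu_threads
    return ort.InferenceSession(model_path, sess_options=opts, providers=[provider])
```

A server could create `SessionCache(lambda n: make_ort_session(path, n))` once at startup, where `path` is a placeholder for the model file, and call `get(cpu_threads)` on each request.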

Sampling Mode

The fixed mode uses the sampling constants baked into the ONNX model.

Seed

Text Temperature

Text Top P

Text Top K

Audio Temperature

Audio Top P

Audio Top K

Audio Repetition Penalty

Do Sample (derived from Sampling Mode)
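For readers unfamiliar with the Temperature, Top P, and Top K controls above, the following is a generic sketch of how these three settings are commonly combined when sampling one token. It illustrates the standard technique only; it is not taken from MOSS-TTS-Nano, omits the repetition penalty, and `sample_token` is a hypothetical name.

```python
import math
import random
from typing import List, Optional


def sample_token(logits: List[float],
                 temperature: float = 1.0,
                 top_k: int = 0,
                 top_p: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Sample an index from logits using temperature, top-k, and top-p.

    Assumes temperature > 0; top_k == 0 disables the top-k filter.
    """
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank candidates from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Passing a seeded `random.Random` as `rng` makes the draw repeatable, which is the usual purpose of a Seed field like the one above.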

Enable WeTextProcessing

Enable normalize_tts_text

WeTextProcessing and normalize_tts_text can now be toggled independently for each request. WeTextProcessing is preloaded during startup so enabling it does not add first-request graph-build latency.
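The per-request toggles can be pictured with the sketch below. It assumes both stages are plain text-to-text callables built once at startup and that WeTextProcessing runs first; that ordering, the `TextPreprocessor` class, and its parameter names are assumptions for illustration, not details confirmed by the app.

```python
from typing import Callable

TextStage = Callable[[str], str]


class TextPreprocessor:
    """Hold preloaded stages and apply each one only when requested."""

    def __init__(self, wetext_stage: TextStage, tts_normalize_stage: TextStage) -> None:
        # Both stages are constructed at startup, so enabling one later
        # does not pay a first-request build cost.
        self._wetext_stage = wetext_stage
        self._tts_normalize_stage = tts_normalize_stage

    def run(self, text: str, enable_wetext: bool, enable_tts_normalize: bool) -> str:
        # Assumed order: WeTextProcessing first, then normalize_tts_text.
        if enable_wetext:
            text = self._wetext_stage(text)
        if enable_tts_normalize:
            text = self._tts_normalize_stage(text)
        return text
```

Because each flag is checked per call, two requests arriving back to back can use different combinations without reloading anything.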

Realtime Streaming Decode

Initial Playback Delay (s)

Warmup Status

Warmup complete. device=cpu elapsed=34.83s | WeTextProcessing ready.

Text Normalization Status

WeTextProcessing ready. languages=zh,en

Run Status

Idle.

Normalized Text

Playback Script

The current sentence will be highlighted here during playback.

Generated Speech

Checkpoint: /home/user/app/models/MOSS-TTS-Nano-100M-ONNX

Audio Tokenizer: /home/user/app/models/MOSS-Audio-Tokenizer-Nano-ONNX