NAVA — Audio-Video Generator (ZeroGPU)

Single H200 · FP8 · Default 5s @ 24fps · 25 steps. ~5 minutes per request when the queue is short.

Tip: ① Type a short prompt (Chinese or English). ② Optionally upload a first-frame image: image-to-video (I2V) mode turns on automatically and the aspect ratio switches automatically. ③ Click Rewrite Prompt: Qwen3 expands your input into the long Chinese caption format NAVA was trained on, and (when an image is uploaded) Qwen3-VL captions the scene and composes that caption into the rewrite. Wrap any spoken line in <S>...<E>; the rewriter preserves these verbatim.
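The <S>...<E> convention above can be checked on your side before you submit a rewritten prompt. The following is a minimal sketch, not part of NAVA: the helper names `spoken_lines` and `preserves_speech` are hypothetical, and it assumes the tags are the literal strings <S> and <E> with no nesting.

```python
import re

# Assumption: spoken lines are delimited by the literal tags <S> and <E>,
# as described in the tip. Nested tags are not handled.
SPOKEN = re.compile(r"<S>(.*?)<E>", re.DOTALL)


def spoken_lines(prompt: str) -> list[str]:
    """Return the text of every <S>...<E> span, in order of appearance."""
    return SPOKEN.findall(prompt)


def preserves_speech(original: str, rewritten: str) -> bool:
    """True if the rewrite carries the same spoken lines, verbatim and in order."""
    return spoken_lines(original) == spoken_lines(rewritten)


original = "A woman waves and says <S>Hello, welcome back!<E> to the camera."
rewritten = (
    "In a sunlit kitchen, a woman in a blue sweater waves at the camera "
    "and says <S>Hello, welcome back!<E> with a warm smile."
)

print(spoken_lines(original))
print(preserves_speech(original, rewritten))
```

If `preserves_speech` returns False, the rewrite altered or dropped a spoken line, and you may want to edit the rewritten prompt by hand before generating.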

Image (optional — uploads enable I2V mode)

Speaker Reference (optional, max 2)

Aspect Ratio (auto-set when you upload an image)
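One plausible way an aspect ratio can be auto-set from an uploaded image is to pick the closest entry from a fixed list. This is a sketch only: the list of ratios below is hypothetical (the page does not say which ratios NAVA supports), and `nearest_ratio` is an illustrative helper, not the app's actual logic.

```python
import math

# Hypothetical set of supported ratios (width / height); the real list is not
# stated on this page.
SUPPORTED = {
    "16:9": 16 / 9,
    "4:3": 4 / 3,
    "1:1": 1.0,
    "3:4": 3 / 4,
    "9:16": 9 / 16,
}


def nearest_ratio(width: int, height: int) -> str:
    """Return the supported ratio name closest to width/height.

    Distance is measured on a log scale so that 2:1 and 1:2 are treated as
    equally far from 1:1.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height
    return min(SUPPORTED, key=lambda name: abs(math.log(SUPPORTED[name] / target)))


print(nearest_ratio(1920, 1080))
print(nearest_ratio(1080, 1920))
```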