@HuggingPapers: NVIDIA just released an optimized version of the Kokoro TTS model on Hugging Face A lightweight 82M parameter speech sy…

Summary

NVIDIA released an optimized ONNX version of the Kokoro TTS model on Hugging Face. The 82M parameter model is lightweight, fast, and ready for commercial use.

NVIDIA just released an optimized version of the Kokoro TTS model on Hugging Face: a lightweight 82M-parameter speech synthesizer ready for commercial use, running fast on NVIDIA GPUs via ONNX Runtime. https://t.co/mhxM7fMAWL

Cached at: 05/31/26, 12:28 AM


nvidia/kokoro-82M-onnx-opt · Hugging Face

Source: https://huggingface.co/nvidia/kokoro-82M-onnx-opt

Kokoro Overview

Description:

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost‑efficient. Kokoro can be deployed anywhere from production environments to personal projects. Kokoro was developed by hexgrad. This model is ready for commercial/non-commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to the Non-NVIDIA hexgrad Model Card.

License/Terms of Use:

Apache-2.0

Deployment Geography:

Global

Use Case:

Developers and enterprises building text‑to‑speech applications, voice assistants, and audio generation services. Suitable for any domain that requires high‑quality, low‑latency speech synthesis, from production APIs to personal projects.

Release Date:

**Hugging Face:** 05/29/2026 via [URL]

Reference(s):

  • StyleTTS 2
  • ISTFTNet

Model Architecture:

**Architecture Type:** Transformer
**Network Architecture:** StyleTTS 2, ISTFTNet, Decoder only
This model was developed based on yl4579/StyleTTS2-LJSpeech.
**Number of model parameters:** 82M (8.2*10^7)

Input:

**Input Type(s):** Text
**Input Format(s):** String
**Input Parameters:** One-Dimensional (1D)
**Other Properties Related to Input:**

  • Input Length: maximum of about 500 tokens; splitting input into chunks of 100-200 tokens is recommended.
  • Input Language: English (full support); Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, Brazilian Portuguese (partial support).
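The chunking recommendation above (inputs of roughly 100-200 tokens, below the ~500-token ceiling) could be sketched as follows. This is a minimal illustration, not the model card's method: the function name `chunk_text` is hypothetical, and whitespace-separated words are assumed as a stand-in for the model's tokens, which the card does not define precisely.

```python
import re


def chunk_text(text, max_tokens=200):
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    Assumption: whitespace-separated words approximate the model's tokens.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = sentence.split()
        # A sentence longer than the budget is split hard on word boundaries.
        while len(words) > max_tokens:
            if current:
                chunks.append(" ".join(current))
                current, count = [], 0
            chunks.append(" ".join(words[:max_tokens]))
            words = words[max_tokens:]
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then be synthesized separately and the resulting audio concatenated. Splitting on sentence boundaries is a design choice intended to keep pauses natural at chunk joins.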

Output:

**Output Type(s):** Audio
**Output Format:** Audio (.wav, .mp3)
**Output Parameters:** One-Dimensional (1D)
**Other Properties Related to Output:** Audio output duration is approximately one minute per 1,000 characters of input text.
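The stated ratio of about one minute of audio per 1,000 input characters lends itself to a rough duration estimate. The sketch below is only that arithmetic; the function name `estimate_audio_seconds` is hypothetical, and actual durations will vary with voice, language, and speaking rate.

```python
def estimate_audio_seconds(text, seconds_per_1000_chars=60.0):
    """Rough output duration from the card's ~1 minute per 1,000 characters."""
    return len(text) / 1000.0 * seconds_per_1000_chars
```

For example, a 250-character prompt would be expected to produce roughly 15 seconds of audio under this rule of thumb.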

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

Runtime Engine(s):

  • ONNXRuntime win-x64-gpu_cuda13-1.24.3

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere
  • NVIDIA Blackwell
  • NVIDIA Lovelace
  • NVIDIA Turing

**[Preferred/Supported] Operating System(s):** Windows 10/11

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

This AI model can be embedded as an Application Programming Interface (API) call into the software environment described above.

Model Version(s):

v1.0

Training, Testing, and Evaluation Datasets:

Training Dataset:

**Link:** Undisclosed
**Data Modality:** Audio
**Audio Training Data Size:** Less than 10,000 Hours
**Data Collection Method by dataset:** Hybrid: Automated, Synthetic
**Labeling Method by dataset:** Automated
**Properties (Quantity, Dataset Descriptions, Sensor(s)):** Kokoro was trained exclusively on permissive, non‑copyrighted audio data and IPA phoneme labels. The dataset comprises public‑domain recordings, audio released under permissive licenses, and synthetic audio generated by closed‑source TTS models. Overall, the training corpus amounts to a few hundred hours of audio.

Testing Dataset:

**Link:** Undisclosed
**Data Collection Method by dataset:** Undisclosed
**Labeling Method by dataset:** Undisclosed
**Properties (Quantity, Dataset Descriptions, Sensor(s)):** Undisclosed

Evaluation Dataset:

**Link:** Undisclosed
**Data Collection Method by dataset:** Undisclosed
**Labeling Method by dataset:** Undisclosed
**Properties (Quantity, Dataset Descriptions, Sensor(s)):** Undisclosed

Inference:

Acceleration Engine:

  • TensorRT
  • CUDA
  • CoreML
  • XNNPACK
  • NNAPI
  • DirectML
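In ONNX Runtime, acceleration engines like those listed above are selected as execution providers when a session is created. The sketch below shows one plausible way to choose among them at load time. The preference order, the helper names `pick_providers` and `make_session`, and any model file path passed in are assumptions for illustration; the card does not specify them.

```python
# Assumed mapping from the listed engines to ONNX Runtime provider names,
# ordered by a hypothetical preference (fastest NVIDIA paths first).
PREFERRED_PROVIDERS = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
    "CoreMLExecutionProvider",
    "NnapiExecutionProvider",
    "XnnpackExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available, preferred=PREFERRED_PROVIDERS):
    """Keep preferred providers that this ONNX Runtime build reports as available."""
    chosen = [p for p in preferred if p in available]
    # Fall back to the CPU provider if nothing in the preference list matched.
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path):
    """Create an inference session using the best available providers.

    model_path is a hypothetical local path to the downloaded .onnx file.
    """
    import onnxruntime as ort  # imported lazily so pick_providers has no dependency

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

On the Windows CUDA build named under Runtime Engine(s), `ort.get_available_providers()` would be expected to include `CUDAExecutionProvider`, in which case this sketch prefers it over the CPU fallback.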

Test Hardware:

  • NVIDIA GeForce RTX 4090
  • NVIDIA GeForce RTX 3070 Ti
  • NVIDIA GeForce RTX 2060

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or concerns here.
