k2-fsa/OmniVoice

Hugging Face Models Trending 2026/03/30 13:43 Model

text-to-speech voice-cloning multilingual diffusion-model open-source ai-speech

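Summary

OmniVoice is a massively multilingual zero-shot text-to-speech model that supports more than 600 languages. It is built on a diffusion language model architecture and offers fast inference and voice cloning.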

Task: text-to-speech
Tags: omnivoice, safetensors, zero-shot, multilingual, voice-cloning, voice-design, text-to-speech, aae, aal, aao, ab, abb, abn, abr, abs, abv, acm, acw, acx, adf, adx, ady, aeb, aec, af, afb, afo, ahl, ahs, ajg, aju, ala, aln, alo, am, amu, an, anc, ank, anp, anw, aom, apc, apd, arb, arq, ars, ary, arz, as, ast, avl, awo, ayl, ayp, az, ba, bag, bas, bax, bba, bbj, bbl, bbu, bce, bci, bcs, bcy, bda, bde, bdm, be, beb, bew, bfd, bft, bg, bgp, bhb, bhh, bho, bhp, bhr, bjj, bjk, bjn, bjt, bkh, bkm, bky, bmm, bmq, bn, bnm, bnn, bns, bo, bou, bqg, br, bra, brh, bri, brx, bs, bsh, bsj, bsk, btm, btv, bug, bum, buo, bux, bwr, bxf, byc, bys, byv, byx, bzc, bzw, ca, ccg, ceb, cen, cfa, cgg, chq, cjk, ckb, ckl, ckr, cky, cnh, cpy, cs, cte, ctl, cut, cux, cv, cy, da, dag, dar, dav, dbd, dcc, de, deg, dgh, dgo, dje, dmk, dml, dru, dty, dua, dv, dyu, dzg, ebr, ebu, ego, eiv, eko, ekr, el, elm, en, eo, es, esu, et, eto, ets, etu, eu, ewo, ext, eyo, fa, fan, fat, ff, ffm, fi, fia, fil, fip, fkk, fmp, fr, fub, fuc, fue, fuf, fuh, fui, fuq, fuv, fy, ga, gbm, gbr, gby, gcc, gdf, gej, ges, ggg, gid, gig, giz, gjk, gju, gl, glw, gn, gol, gom, gsl, gu, gui, gur, guz, gv, gwc, gwe, gwt, gya, gyz, ha, hah, hao, haw, haz, hbb, he, hem, hi, hia, hkk, hla, hno, hoj, hr, hsb, ht, hu, hue, hul, hux, hwo, hy, hz, ia, ibb, id, ida, idu, ig, ijc, ijn, ik, ikw, is, ish, iso, it, its, itw, itz, ja, jal, jax, jgo, jmx, jns, jqr, juk, juo, jv, ka, kab, kai, kaj, kam, kbd, kbl, kbt, kcq, kdh, kea, keu, kfe, kfk, kfp, khg, khw, kj, kjc, kjk, kk, kln, kls, km, kmr, kmy, kn, kna, knn, ko, kol, koo, kpo, kqo, ks, ksd, ksf, kto, kuh, kvx, kw, kwm, kxp, ky, kyx, lag, lb, lcm, ldb, lg, lij, lir, lkb, lla, ln, lnu, lo, loa, lrk, lss, lt, ltg, lto, lua, luo, lus, lv, lwg, mab, maf, mai, mau, max, mbo, mcf, mcn, mcx, mdd, mde, mdf, mek, mer, meu, mfm, mfn, mfo, mfv, mgg, mgi, mhk, mhr, mi, mig, miu, mk, mkf, mki, ml, mlq, mn, mne, mni, mqy, mr, mrj, mrr, mrt, ms, mse, msh, msw, mt, mtr, mtu, mtx, mua, mug, mui, mve, mvy,
mxs, mxu, mxy, my, myv, mzl, nal, nan, nap, nb, nbh, ncf, nco, ncx, ndi, ng, ngi, nhg, nhi, nhn, nhq, nja, nl, nla, nlv, nmg, nmz, nn, nnh, no, noe, npi, nso, ny, nyu, oc, odk, odu, ogo, om, orc, oru, ory, os, pa, pbs, pbt, pbu, pcm, pex, phl, phr, pip, piy, pko, pl, plk, plt, pmq, pms, pmy, pnb, poc, poe, pow, prq, ps, pst, pt, pua, pwn, qug, qum, qup, qur, qus, quv, qux, quy, qva, qvi, qvj, qvl, qwa, qws, qxa, qxp, qxt, qxu, qxw, rag, rm, ro, rob, rof, roo, rth, ru, rup, rw, sa, sah, sat, sau, say, sbn, sc, scl, scn, sd, sei, shu, si, sip, siw, sjr, sk, skg, skr, sl, sn, snc, snk, so, sol, sps, sq, sr, src, sro, ssi, ste, sua, sv, sva, sw, szy, ta, tan, tar, tay, tbf, tcf, tcy, tdn, tdx, te, tg, tgc, th, the, thq, thr, thv, ti, tig, tio, tk, tkg, tkt, tli, tlp, tn, tok, tpl, tpz, tqp, tr, trp, trq, trv, trw, tt, ttj, ttr, ttu, tui, tul, tuq, tuv, tuy, tvo, tvu, tw, twu, txs, txy, udl, ug, uk, uki, umb, ur, ush, uz, uzn, vai, var, ver, vi, vmc, vmj, vmm, vmp, vmz, vot, vro, wbl, wci, weo, wes, wja, wji, wo, wof, xh, xhe, xka, xmf, xmv, xmw, xpe, xti, xtu, yaq, yav, yay, ydd, ydg, yer, yes, yi, yo, yue, zga, zgh, zh, zoc, zoh, zor, zpv, zpy, ztg, ztn, ztp, zts, ztu, zu, zza, arxiv:2604.00688, base_model:Qwen/Qwen3-0.6B, base_model:finetune:Qwen/Qwen3-0.6B, license:apache-2.0, region:us

Cached: 2026/05/08 09:09

k2-fsa/OmniVoice · Hugging Face

Source: https://huggingface.co/k2-fsa/OmniVoice

OmniVoice

Hugging Face Model (https://huggingface.co/k2-fsa/OmniVoice) | Hugging Face Space (https://huggingface.co/spaces/k2-fsa/OmniVoice) | GitHub Code (https://github.com/k2-fsa/OmniVoice) | Open in Colab (https://colab.research.google.com/github/k2-fsa/OmniVoice/blob/master/docs/OmniVoice.ipynb)

OmniVoice is a massively multilingual zero-shot text-to-speech (TTS) model supporting more than 600 languages. Built on a novel diffusion language model-style architecture, it generates high-quality speech at high inference speed and supports both voice cloning and voice design.

**Paper:** OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models (https://huggingface.co/papers/2604.00688)
**Code repository:** GitHub (https://github.com/k2-fsa/OmniVoice)
**Demo:** Hugging Face Space (https://huggingface.co/spaces/k2-fsa/OmniVoice)
**Colab:** Google Colab notebook (https://colab.research.google.com/github/k2-fsa/OmniVoice/blob/master/docs/OmniVoice.ipynb)

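Key Features

600+ languages supported: the broadest language coverage among zero-shot TTS models.
Voice cloning: state-of-the-art voice cloning quality from a short reference audio clip.
Voice design: control the voice through assigned speaker attributes (gender, age, pitch, dialect/accent, whisper, etc.).
Fine-grained control: non-verbal symbols (e.g., [laughter]) and pronunciation correction via pinyin or phonemes.
Fast inference: RTF as low as 0.025 (40x faster than real time).
Diffusion language model-style architecture: a clean, streamlined, and scalable design that delivers both quality and speed.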
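The real-time factor (RTF) quoted in the feature list is conventionally the time spent synthesizing divided by the duration of the audio produced, so the speed-up over real time is its reciprocal. The sketch below works through that arithmetic; the function names are illustrative and are not part of the omnivoice library.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """How many times faster than real time a given RTF is."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Example: 10 s of speech synthesized in 0.25 s of compute.
rtf = real_time_factor(processing_seconds=0.25, audio_seconds=10.0)
print(f"RTF = {rtf:.3f}, about {speedup_over_real_time(rtf):.0f}x real time")
```

At the quoted RTF of 0.025, one minute of speech would take about 1.5 seconds to generate. Actual figures depend on hardware and settings that the model card does not specify.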

Usage

To get started, install the omnivoice library:

We recommend using a fresh virtual environment (e.g., conda, venv, etc.) to avoid conflicts.

Step 1: Install PyTorch

NVIDIA GPU

```
# Install PyTorch for your CUDA version, e.g.
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
```

For other versions, see the official PyTorch website (https://pytorch.org/get-started/locally/).

Apple Silicon

pip install torch==2.8.0 torchaudio==2.8.0

Step 2: Install OmniVoice

pip install omnivoice
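After both installation steps, it can help to confirm which accelerator PyTorch actually sees before loading the model. The following is a minimal sketch using standard PyTorch calls (`torch.cuda.is_available()` and `torch.backends.mps.is_available()`). The helper names are hypothetical, and whether OmniVoice accepts `"mps"` or `"cpu"` as a `device_map` value is an assumption, since the model card only shows `"cuda:0"`.

```python
import importlib.util


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer an NVIDIA GPU, then Apple Silicon (MPS), then the CPU."""
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"  # assumption: not shown in the model card
    return "cpu"  # assumption: not shown in the model card


def detect_device() -> str:
    """Report the best device PyTorch can see; raise if PyTorch is missing."""
    if importlib.util.find_spec("torch") is None:
        raise RuntimeError("PyTorch is not installed; complete Step 1 first.")
    import torch

    return pick_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

Calling `detect_device()` in the new environment should return `"cuda:0"` on a working CUDA install. A result of `"cpu"` on an NVIDIA machine usually means the installed wheel does not match the local CUDA version.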

Python API

You can use OmniVoice for zero-shot voice cloning as follows:

```python
from omnivoice import OmniVoice
import soundfile as sf
import torch

# Load the model
model = OmniVoice.from_pretrained(
    "k2-fsa/OmniVoice",
    device_map="cuda:0",
    dtype=torch.float16
)

# Generate audio
audio = model.generate(
    text="Hello, this is a test of zero-shot voice cloning.",
    ref_audio="ref.wav",
    ref_text="Transcription of the reference audio.",
)  # audio is a list of np.ndarray with shape (T,), sampled at 24 kHz.

sf.write("out.wav", audio[0], 24000)
```
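To clone one reference voice across several sentences, the same `generate` call can be wrapped in a loop. This is a sketch under stated assumptions: `synthesize_all` and the `utt_NNN.wav` naming are hypothetical, and the only OmniVoice behavior it relies on is what the snippet above shows (keyword arguments `text`, `ref_audio`, `ref_text`, and a returned list of 1-D arrays at 24 kHz). The writer is passed in so that `soundfile.write` can be supplied in real use.

```python
from pathlib import Path

SAMPLE_RATE = 24_000  # sampling rate stated in the model card


def synthesize_all(model, sentences, ref_audio, ref_text, out_dir, write_wav):
    """Generate one WAV file per sentence using a single reference voice.

    `model` needs a generate(text=..., ref_audio=..., ref_text=...) method
    that returns a list of 1-D arrays, as in the model card example.
    `write_wav` is any callable taking (path, data, samplerate),
    for example soundfile.write.
    Returns a list of (path, duration_in_seconds) tuples.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for index, sentence in enumerate(sentences):
        audio = model.generate(
            text=sentence, ref_audio=ref_audio, ref_text=ref_text
        )
        path = out_dir / f"utt_{index:03d}.wav"
        write_wav(str(path), audio[0], SAMPLE_RATE)
        # Duration follows from sample count divided by sampling rate.
        results.append((path, len(audio[0]) / SAMPLE_RATE))
    return results
```

With the model loaded as above, a call such as `synthesize_all(model, ["First line.", "Second line."], "ref.wav", "Transcription of the reference audio.", "out", sf.write)` would write `out/utt_000.wav` and `out/utt_001.wav`.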

For more generation modes (e.g., voice design), features (e.g., non-verbal symbols, pronunciation correction), and complete usage instructions, see our GitHub repository (https://github.com/k2-fsa/OmniVoice).

Discussion & Communication

You can discuss directly on GitHub Issues (https://github.com/k2-fsa/OmniVoice/issues).

You can also scan the QR code to join our WeChat group or follow our WeChat official account.

Citation

@article{zhu2026omnivoice,
  title={OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models},
  author={Zhu, Han and Ye, Lingxuan and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Han, Zhifeng and Zhuang, Weiji and Lin, Long and Povey, Daniel},
  journal={arXiv preprint arXiv:2604.00688},
  year={2026}
}

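Disclaimer

Users are strictly prohibited from using this model for unauthorized voice cloning, voice impersonation, fraud, scams, or any other illegal or unethical activities. All users must ensure full compliance with applicable local laws, regulations, and ethical standards. The developers assume no liability for any misuse of this model. They advocate responsible AI development and use, and encourage the community to uphold safety and ethical principles in AI research and applications.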

