@gkxspace: I spend two to three thousand on AI subscriptions every month, some for TTS, ASR, etc. The mainstream ones are expensive and their API protocols differ. I kept thinking: is there a single plan that covers voice cloning, meeting transcription, AI podcast generation, real-time voice Q&A, voice input, and coding? Finally found a godsend—StepFun's S...
Summary
StepFun launches Step Plan subscription at $6.99/month, integrating LLM, TTS, ASR, image generation, and other AI models. Supports direct OpenAI SDK connection, applicable for voice cloning, meeting transcription, AI podcast generation, etc.
View Cached Full Text
Cached at: 05/20/26, 04:35 PM
I used to spend two to three thousand yuan a month on AI subscriptions, some of which were for TTS, ASR, etc. The mainstream services are quite expensive, and their API protocols are all different.
I’ve always been looking for a single plan that could do it all: voice cloning, meeting transcription, AI podcast generation, real-time voice Q&A, voice input, and code writing.
Finally found a true lifesaver — Step Plan by StepFun. It costs $6.99 per month and I can never use it all up. So I gradually canceled all the others.
One subscription gets you access to top-tier models of all kinds:
- LLM: Step 3.5 Flash — incredibly low latency, and you can also integrate it with Claude / Cursor / Cline.
- TTS: stepaudio-2.5-tts (I checked; its ranking is higher than ElevenLabs).
- ASR: Real-time voice conversations with voice cloning support.
- Image generation: Text-to-image + image editing, generating images in 0.7 seconds.
All accessible directly via the OpenAI SDK — just change the base URL.
Here are some use cases (details in the comments):
- English audio recording → Chinese notes in 54 seconds
- Long English article → dual‑speaker MP3 for commuting
- Same text → TTS with 7 different emotions
- Lu Xun’s Kong Yiji → automatic role‑based audiobook
- English podcast → end-to-end Chinese remake
@StepFun_ai
Similar Articles
@MaxForAI: If you are working on voice agents, you should try this project. A team from NTU, NUS, and Shanghai AI Lab released: Mega-ASR. This fully open-source ASR is built on Qwen3-ASR, aiming to break the long-standing bottleneck of ASR performance in noisy, reverberant, or other impaired real-world environments...
NTU, NUS, and Shanghai AI Lab jointly released Mega-ASR, a fully open-source ASR model built on Qwen3-ASR. Using the Voices-in-the-Wild-2M dataset and progressive acoustic-to-semantic optimization, it achieves up to 30% relative Word Error Rate (WER) reduction in real-world noisy environments. With only 1.7B parameters, it enables efficient inference on consumer-grade hardware.
@yhslgg: Old Yang shares another gem open-source tool—KrillinAI, 10,000 stars on GitHub, a must-see for multilingual audio/video content! In a nutshell: from video download to subtitle translation, AI dubbing, video compositing, the entire pipeline is covered, and it can even auto-generate platform covers, supporting Bilibili, Douyin, Xiaohongshu, YouTube…
KrillinAI is an open-source tool that integrates the entire workflow of video downloading, subtitle translation, AI dubbing, and video compositing. It supports context-aware translation, voice cloning, auto layout, and cover generation, and is compatible with multiple AI models, suitable for multilingual audio/video content creation and distribution.
@laobaishare: This is incredible. Google just dropped a free AI voice dictation app, supporting iOS and Mac. All paid features unlocked, no subscription needed. 100% free, fully local, powered by Gemma 4. Download here: https://ai.google.dev/edg…
Google launched a free AI voice dictation app, powered by Gemma 4, supporting iOS and Mac, fully local, no subscription needed.
@hisevenih: The AI voice community is blown away. This GitHub open-source black tech takes AI voice to an insane level, truly achieving: one sentence, one voice. Remember this project name: VoxCPM2. It has already gained 20K stars on GitHub. Most incredibly, it doesn't even need a reference audio…
GitHub open-source project VoxCPM2 achieves AI voice cloning without reference audio, generating target voice precisely with just one sentence, has gained 20K stars.
@denziideng: Another AI voice cloning 'dimensional reduction attack'... The CosyVoice I shared before can clone in 3 seconds, which I thought was already scary enough. But today's tool is even more lethal — after casually recording 1 minute of my own voice for training, it directly replicates tone, mannerisms, emotions, breathing, and pauses. It's almost like the soul of the original person possessed it! C...
GPT-SoVITS is an open-source AI voice cloning tool that supports zero-shot (5-second voice) and few-shot (1-minute training) high-fidelity voice cloning, cross-lingual inference, and comes with a complete WebUI toolchain. It has garnered 57.8k stars on GitHub, becoming the leading open-source project in the voice cloning field.