@gkxspace: I spend two to three thousand on AI subscriptions every month, some for TTS, ASR, etc. The mainstream ones are expensive and their API protocols differ. I kept thinking: is there a single plan that covers voice cloning, meeting transcription, AI podcast generation, real-time voice Q&A, voice input, and coding? Finally found a godsend—StepFun's S...

X AI KOLs Timeline 05/20/26, 01:04 PM Products

ai-subscription voice-cloning tts asr speech-to-text image-generation step-fun

Summary

StepFun launches Step Plan subscription at $6.99/month, integrating LLM, TTS, ASR, image generation, and other AI models. Supports direct OpenAI SDK connection, applicable for voice cloning, meeting transcription, AI podcast generation, etc.

I spend two to three thousand on AI subscriptions every month, some for TTS, ASR, etc. The mainstream ones are expensive and their API protocols differ. I kept thinking: is there a single plan that covers: Voice cloning, meeting transcription, AI podcast generation, real-time voice Q&A, voice input, and coding Finally found a godsend—StepFun's Step Plan, $6.99/month, more than enough. So I gradually canceled the others. One subscription includes top-tier models across categories: 1. LLM: Step 3.5 Flash, extremely low latency, also compatible with Claude / Cursor / Cline 2. TTS: stepaudio-2.5-tts (ranked higher than ElevenLabs, as I checked) 3. ASR: Real-time voice dialogue, voice cloning supported 4. Image generation: text-to-image + image editing, 0.7 seconds per image All directly connectable via OpenAI SDK, just change the base_url. Here are some use cases (details in comments): 1. English recording → Chinese notes in 54 seconds 2. English long text → dual-speaker mp3 for commute listening 3. Same text → TTS performs 7 emotions 4. Lu Xun's "Kong Yiji" → audiobook with auto-split characters 5. English podcast → end-to-end Chinese remake @StepFun_ai

Original Article

View Cached Full Text

Cached at: 05/20/26, 04:35 PM

I used to spend two to three thousand yuan a month on AI subscriptions, some of which were for TTS, ASR, etc. The mainstream services are quite expensive, and their API protocols are all different.

I’ve always been looking for a single plan that could do it all: voice cloning, meeting transcription, AI podcast generation, real-time voice Q&A, voice input, and code writing.

Finally found a true lifesaver — Step Plan by StepFun. It costs $6.99 per month and I can never use it all up. So I gradually canceled all the others.

One subscription gets you access to top-tier models of all kinds:

LLM: Step 3.5 Flash — incredibly low latency, and you can also integrate it with Claude / Cursor / Cline.
TTS: stepaudio-2.5-tts (I checked; its ranking is higher than ElevenLabs).
ASR: Real-time voice conversations with voice cloning support.
Image generation: Text-to-image + image editing, generating images in 0.7 seconds.

All accessible directly via the OpenAI SDK — just change the base URL.

Here are some use cases (details in the comments):

English audio recording → Chinese notes in 54 seconds
Long English article → dual‑speaker MP3 for commuting
Same text → TTS with 7 different emotions
Lu Xun’s Kong Yiji → automatic role‑based audiobook
English podcast → end-to-end Chinese remake

@StepFun_ai

Similar Articles

@laobaishare: This is incredible. Google just dropped a free AI voice dictation app, supporting iOS and Mac. All paid features unlocked, no subscription needed. 100% free, fully local, powered by Gemma 4. Download here: https://ai.google.dev/edg…

@hisevenih: The AI voice community is blown away. This GitHub open-source black tech takes AI voice to an insane level, truly achieving: one sentence, one voice. Remember this project name: VoxCPM2. It has already gained 20K stars on GitHub. Most incredibly, it doesn't even need a reference audio…

Submit Feedback

Similar Articles

@MaxForAI: If you are working on voice agents, you should try this project. A team from NTU, NUS, and Shanghai AI Lab released: Mega-ASR. This fully open-source ASR is built on Qwen3-ASR, aiming to break the long-standing bottleneck of ASR performance in noisy, reverberant, or other impaired real-world environments...

@yhslgg: Old Yang shares another gem open-source tool—KrillinAI, 10,000 stars on GitHub, a must-see for multilingual audio/video content! In a nutshell: from video download to subtitle translation, AI dubbing, video compositing, the entire pipeline is covered, and it can even auto-generate platform covers, supporting Bilibili, Douyin, Xiaohongshu, YouTube…

@laobaishare: This is incredible. Google just dropped a free AI voice dictation app, supporting iOS and Mac. All paid features unlocked, no subscription needed. 100% free, fully local, powered by Gemma 4. Download here: https://ai.google.dev/edg…

@hisevenih: The AI voice community is blown away. This GitHub open-source black tech takes AI voice to an insane level, truly achieving: one sentence, one voice. Remember this project name: VoxCPM2. It has already gained 20K stars on GitHub. Most incredibly, it doesn't even need a reference audio…

@denziideng: Another AI voice cloning 'dimensional reduction attack'... The CosyVoice I shared before can clone in 3 seconds, which I thought was already scary enough. But today's tool is even more lethal — after casually recording 1 minute of my own voice for training, it directly replicates tone, mannerisms, emotions, breathing, and pauses. It's almost like the soul of the original person possessed it! C...