@gkxspace: I spend two to three thousand on AI subscriptions every month, some for TTS, ASR, etc. The mainstream ones are expensive and their API protocols differ. I kept thinking: is there a single plan that covers voice cloning, meeting transcription, AI podcast generation, real-time voice Q&A, voice input, and coding? Finally found a godsend—StepFun's S...

X AI KOLs Timeline 05/20/26, 01:04 PM Products

ai-subscription voice-cloning tts asr speech-to-text image-generation step-fun

Summary

StepFun launches Step Plan subscription at $6.99/month, integrating LLM, TTS, ASR, image generation, and other AI models. Supports direct OpenAI SDK connection, applicable for voice cloning, meeting transcription, AI podcast generation, etc.

I spend two to three thousand on AI subscriptions every month, some for TTS, ASR, etc. The mainstream ones are expensive and their API protocols differ. I kept thinking: is there a single plan that covers: Voice cloning, meeting transcription, AI podcast generation, real-time voice Q&A, voice input, and coding Finally found a godsend—StepFun's Step Plan, $6.99/month, more than enough. So I gradually canceled the others. One subscription includes top-tier models across categories: 1. LLM: Step 3.5 Flash, extremely low latency, also compatible with Claude / Cursor / Cline 2. TTS: stepaudio-2.5-tts (ranked higher than ElevenLabs, as I checked) 3. ASR: Real-time voice dialogue, voice cloning supported 4. Image generation: text-to-image + image editing, 0.7 seconds per image All directly connectable via OpenAI SDK, just change the base_url. Here are some use cases (details in comments): 1. English recording → Chinese notes in 54 seconds 2. English long text → dual-speaker mp3 for commute listening 3. Same text → TTS performs 7 emotions 4. Lu Xun's "Kong Yiji" → audiobook with auto-split characters 5. English podcast → end-to-end Chinese remake @StepFun_ai

Original Article

View Cached Full Text

Cached at: 05/20/26, 04:35 PM

I used to spend two to three thousand yuan a month on AI subscriptions, some of which were for TTS, ASR, etc. The mainstream services are quite expensive, and their API protocols are all different.

I’ve always been looking for a single plan that could do it all: voice cloning, meeting transcription, AI podcast generation, real-time voice Q&A, voice input, and code writing.

Finally found a true lifesaver — Step Plan by StepFun. It costs $6.99 per month and I can never use it all up. So I gradually canceled all the others.

One subscription gets you access to top-tier models of all kinds:

LLM: Step 3.5 Flash — incredibly low latency, and you can also integrate it with Claude / Cursor / Cline.
TTS: stepaudio-2.5-tts (I checked; its ranking is higher than ElevenLabs).
ASR: Real-time voice conversations with voice cloning support.
Image generation: Text-to-image + image editing, generating images in 0.7 seconds.

All accessible directly via the OpenAI SDK — just change the base URL.

Here are some use cases (details in the comments):

English audio recording → Chinese notes in 54 seconds
Long English article → dual‑speaker MP3 for commuting
Same text → TTS with 7 different emotions
Lu Xun’s Kong Yiji → automatic role‑based audiobook
English podcast → end-to-end Chinese remake

@StepFun_ai

Similar Articles

@FinanceYF5: AI subscription plan subsidies are much larger than imagined. Claude Max 20x: $200/month, actual usage value about $8,000. ChatGPT Pro 20x: $200/month, actual usage value about $14,000. You spend $200, they lose thousands supporting you. This price war,…

@FeitengLi: Next week, after adding speaker labeling and speech generation, it won't be this cheap early bird price anymore.

@cevenif: Bro, it's time to say goodbye to those paid voice tools! The open-source and free Voicebox has arrived, completely crushing paid giants like ElevenLabs and WisprFlow. Features: Voice cloning - instantly become anyone, Global voice input - accessible anytime...

Submit Feedback

Similar Articles

@FinanceYF5: AI subscription plan subsidies are much larger than imagined. Claude Max 20x: $200/month, actual usage value about $8,000. ChatGPT Pro 20x: $200/month, actual usage value about $14,000. You spend $200, they lose thousands supporting you. This price war,…

@FeitengLi: Next week, after adding speaker labeling and speech generation, it won't be this cheap early bird price anymore.

@MaxForAI: If you are working on voice agents, you should try this project. A team from NTU, NUS, and Shanghai AI Lab released: Mega-ASR. This fully open-source ASR is built on Qwen3-ASR, aiming to break the long-standing bottleneck of ASR performance in noisy, reverberant, or other impaired real-world environments...

@yhslgg: Old Yang shares another gem open-source tool—KrillinAI, 10,000 stars on GitHub, a must-see for multilingual audio/video content! In a nutshell: from video download to subtitle translation, AI dubbing, video compositing, the entire pipeline is covered, and it can even auto-generate platform covers, supporting Bilibili, Douyin, Xiaohongshu, YouTube…

@cevenif: Bro, it's time to say goodbye to those paid voice tools! The open-source and free Voicebox has arrived, completely crushing paid giants like ElevenLabs and WisprFlow. Features: Voice cloning - instantly become anyone, Global voice input - accessible anytime...