Tag
PaddleOCR releases PP-OCRv6, a new OCR model series ranging from 1.5M to 34.5M parameters. It offers improved accuracy and faster inference, supports 50 languages, covers new scenarios such as PCB and CAD drawings, and is released under the Apache 2.0 open-source license.
CodeAlchemy is a synthetic data generation framework that transforms publicly available code into semantically rich training data using five strategies, producing over 500 billion tokens and enabling small models to outperform much larger ones on code benchmarks.
An open-source voice synthesis model with 2 billion parameters, trained on 2 million hours of data. It supports 30 languages and 9 Chinese dialects, lets users describe a voice in natural language, can clone a voice from a 3-second recording, delivers 48kHz studio-quality audio, and is free for commercial use under the Apache-2.0 license.
Microsoft has released MAI-Voice-2, an expressive text-to-speech system supporting voice cloning in 15 languages.
Ax is a JS/TS library providing high-level abstractions for LLM usage (signatures, agents, flows, optimizers). It now introduces axIR, allowing compilation to Python, Java, C++, and Go, making the same programming model available across multiple languages.
KrillinAI is an open-source tool that integrates the entire workflow of video downloading, subtitle translation, AI dubbing, and video compositing. It supports context-aware translation, voice cloning, auto layout, and cover generation, and is compatible with multiple AI models, suitable for multilingual audio/video content creation and distribution.
Scribe2SRT is an open-source speech-to-subtitle tool built on PySide6 and the ElevenLabs API. It supports multiple languages and applies optimized formatting to quickly generate high-quality SRT subtitles.
FindMyAI is a free AI search engine that recommends the best AI tool for any task without requiring signup. It supports 18 languages and aims to help users quickly find suitable AI tools.
Supertonic is a lightning-fast, on-device TTS model with 99M parameters, supporting 31 languages. It runs locally with no API costs, outperforms cloud TTS on accuracy for numbers, phone numbers, and technical terms, and can be installed via Python, Node.js, Rust, Go, and more.
OpenAI Codex is a GPT-3 descendant trained on natural language and billions of lines of source code, capable of generating working code across 15+ programming languages with 3.5x more context memory than GPT-3, now available in private beta via API.
OpenAI released GPT Realtime-2 and two accompanying models during Build Hour, making voice interaction more intelligent and natural. GPT Realtime-2 supports 128k context, parallel tool calls, and dynamic voice cloning, and was demonstrated in production-grade applications such as voice-driven shopping assistants and analytics dashboards.