DeepSeek, a Chinese AI model built by a quant hedge fund, reportedly matches GPT-4-level performance at roughly 5% of the training cost, triggering significant market disruption, including a roughly $600B drop in NVIDIA's market capitalization. A free 1-hour-50-minute course has been released teaching users how to run DeepSeek V4 locally and via the API.
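For the API route, a minimal sketch follows, assuming DeepSeek's published OpenAI-compatible endpoint; the `deepseek-chat` model id is the standard documented id, not necessarily the exact model the course targets:

```python
# Minimal sketch of calling DeepSeek over its OpenAI-compatible API.
# Base URL and "deepseek-chat" follow DeepSeek's public API docs;
# substitute whatever model id the course actually uses.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # issued at platform.deepseek.com
    base_url="https://api.deepseek.com",  # OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize attention in one sentence."}],
)
print(response.choices[0].message.content)
```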
This paper compares a domain-trained small language model (Olava Extract) against frontier LLMs for structured contract extraction, showing that the specialized model achieves higher F1 scores at a dramatically lower cost.
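To make the evaluation concrete, here is a hypothetical sketch of this kind of setup: a fixed extraction schema plus micro-averaged field-level F1. The field names and exact-match scoring are assumptions for illustration, not Olava Extract's actual schema or metric:

```python
# Hypothetical contract schema and field-level F1, micro-averaged over
# exact matches. Field names are illustrative assumptions.
from dataclasses import dataclass, asdict

@dataclass
class ContractFields:
    party_a: str
    party_b: str
    effective_date: str   # ISO-8601 date string
    termination_date: str
    governing_law: str

def field_f1(predictions: list[ContractFields], gold: list[ContractFields]) -> float:
    """Micro-averaged F1 over exact field matches across all contracts."""
    tp = fp = fn = 0
    for pred, ref in zip(predictions, gold):
        ref_fields = asdict(ref)
        for field, value in asdict(pred).items():
            ref_value = ref_fields[field]
            if value and value == ref_value:
                tp += 1            # correct non-empty prediction
            elif value:
                fp += 1            # predicted a value the gold doesn't confirm
            if ref_value and value != ref_value:
                fn += 1            # gold value missed or mismatched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```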
A tweet claims that a small vision-language model fine-tuned on custom data can match GPT-5 accuracy at 50× lower cost, citing Liquid AI's 1.6B model running locally with llama.cpp.
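One way to reproduce the local setup is through llama.cpp's OpenAI-compatible server. A sketch under that assumption (the GGUF filenames and exact server flags are placeholders; they depend on your llama.cpp build and checkpoint):

```python
# Sketch of querying a locally served VLM via llama.cpp's OpenAI-compatible
# server. Assumes llama-server was started with a GGUF checkpoint and its
# vision projector, e.g.:
#   llama-server -m lfm2-vl-1.6b.gguf --mmproj mmproj.gguf --port 8080
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local")

with open("invoice.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="local",  # llama-server serves whatever model it was launched with
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the invoice total."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```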
TRACER is an open-source system that trains lightweight ML surrogates on production traces from LLM classification endpoints, routing requests through a parity gate that activates surrogates only when agreement with the original model exceeds a specified threshold. This approach achieves 83-100% surrogate coverage on intent classification benchmarks while keeping handling boundaries and failure modes interpretable.
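A minimal sketch of the parity-gate idea follows: a surrogate trained on logged (input, label) traces serves traffic only where its agreement with the original LLM clears a threshold. Function names and the global (rather than per-segment) gating granularity are illustrative assumptions, not TRACER's actual API:

```python
# Train a cheap surrogate on production traces, then gate it on measured
# agreement with the original LLM endpoint before routing live traffic to it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_surrogate(traces):
    """traces: list of (request_text, llm_label) pairs from production logs."""
    texts, labels = zip(*traces)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    return model

def parity_gate(surrogate, holdout_traces, threshold=0.98):
    """Open the gate only if surrogate/LLM agreement on held-out traces
    meets the threshold; return the measured agreement for auditing."""
    texts, llm_labels = zip(*holdout_traces)
    preds = surrogate.predict(list(texts))
    agreement = sum(p == y for p, y in zip(preds, llm_labels)) / len(preds)
    return agreement >= threshold, agreement

def route(request_text, surrogate, gate_open, call_llm):
    """Serve from the surrogate when the gate is open; else fall back to the LLM."""
    if gate_open:
        return surrogate.predict([request_text])[0]
    return call_llm(request_text)
```

Measuring agreement on a held-out slice of traces, rather than trusting training accuracy, is what makes the coverage boundary explicit: requests the surrogate cannot handle at parity simply keep flowing to the original model.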
Google introduces Gemini 3.1 Flash-Lite, a high-speed, cost-efficient AI model available in preview via Google AI Studio and the Vertex AI API, designed for high-volume developer workloads.
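A sketch of calling the preview model through the google-genai Python SDK (the SDK and its `generate_content` call are real; availability of the `gemini-3.1-flash-lite` model id in a given project is an assumption based on the announcement):

```python
# Sketch: single high-volume-style classification call via the google-genai SDK.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio

response = client.models.generate_content(
    model="gemini-3.1-flash-lite",  # preview model id per the announcement
    contents="Classify this support ticket: 'My invoice is wrong.'",
)
print(response.text)
```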
W-RAC introduces a cost-efficient chunking framework for web document processing in RAG systems that reduces LLM token usage by an order of magnitude through structured content representation and retrieval-aware grouping decisions. The method decouples text extraction from semantic chunk planning, achieving comparable or better retrieval performance than traditional chunking approaches while minimizing hallucination risks.
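An illustrative sketch of the decoupling described above: the LLM sees only a compact structural outline (block ids, types, and short previews) rather than the full page text, and returns grouping decisions that are applied locally. Function names and the outline format are assumptions, not W-RAC's actual interface:

```python
# Decouple extraction from chunk planning: plan chunks over a compact
# outline (roughly an order of magnitude fewer tokens than the raw text),
# then materialize full-text chunks locally from the extracted blocks.

def build_outline(blocks):
    """blocks: [{'id': int, 'type': 'heading'|'para'|'table', 'text': str}, ...]
    Emit one short line per block so the LLM plans chunks cheaply."""
    lines = []
    for b in blocks:
        preview = b["text"][:60]  # truncated preview keeps the prompt small
        lines.append(f"[{b['id']}] {b['type']}: {preview}")
    return "\n".join(lines)

def apply_plan(blocks, plan):
    """plan: [[block ids for chunk 1], [block ids for chunk 2], ...],
    as returned by the LLM. Reassemble full-text chunks locally."""
    by_id = {b["id"]: b["text"] for b in blocks}
    return ["\n".join(by_id[i] for i in group) for group in plan]

# Usage: prompt the LLM with build_outline(blocks) and an instruction like
# "group these blocks into retrieval-ready chunks", parse the returned id
# groupings, then call apply_plan(blocks, plan) to produce chunks for embedding.
```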