beautyyuyanli/multilingual-e5-large
Summary
The Multilingual E5-large embedding model is now available on Replicate, costing about $0.00098 per run and completing in roughly one second on an Nvidia L40S GPU.
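Embedding models like E5-large map text to dense vectors that are typically compared with cosine similarity. As a minimal sketch (the toy 4-dimensional vectors below stand in for the model's real outputs, which for E5-large are much higher-dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real E5 query/document embeddings.
query_vec = [0.1, 0.3, 0.5, 0.2]
doc_vec = [0.2, 0.1, 0.4, 0.3]
print(round(cosine_similarity(query_vec, doc_vec), 4))  # → 0.9063
```

In a retrieval setup you would embed a query and a set of documents through the hosted model, then rank documents by this score; values close to 1.0 indicate semantically similar text.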
Similar Articles
krthr/clip-embeddings
A CLIP-based embedding model hosted on Replicate that generates 768-dimensional embeddings for both images and text using the clip-vit-large-patch14 architecture, costing ~$0.00022 per run.
Building a Fast Multilingual OCR Model with Synthetic Data
NVIDIA introduces Nemotron OCR v2, a fast multilingual OCR model built using synthetic data generation. The model achieves 34.7 pages/second on a single A100 GPU by using a unified FOTS-based architecture with feature reuse across detection, recognition, and relational components.
New embedding models and API updates
OpenAI released two new embedding models: text-embedding-3-small (5× cheaper than ada-002, with a 40%+ improvement on the MIRACL benchmark) and text-embedding-3-large (best performance, with up to 3072 dimensions). Both models show significant gains on standard benchmarks while reducing costs.
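The text-embedding-3 models support shortened embeddings: OpenAI documents that you can request fewer dimensions via the API, or equivalently truncate the full vector yourself and re-normalize it to unit length. A minimal sketch of the manual truncate-and-renormalize step (the toy 8-dimensional vector is a stand-in for a real 3072-dimensional output):

```python
import math

def shorten_embedding(vec, dim):
    """Truncate an embedding to `dim` components and re-normalize to
    unit length -- the manual equivalent of the API's `dimensions`
    parameter for text-embedding-3 models."""
    cut = vec[:dim]
    norm = math.sqrt(sum(x * x for x in cut))
    return [x / norm for x in cut]

# Toy vector standing in for a full text-embedding-3-large embedding.
full = [0.4, 0.3, 0.2, 0.1, 0.05, 0.05, 0.02, 0.01]
short = shorten_embedding(full, 4)
print(len(short))  # → 4
```

Shortened embeddings trade a small amount of retrieval quality for lower storage and faster similarity search, which is why the re-normalization step matters: cosine similarity assumes unit-length vectors.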
@0xshimei: https://x.com/0xshimei/status/2053088751862288846
This article provides a comprehensive 2026 guide to free and low-cost large language models, comparing Chinese domestic and international options.
@0xSero: GLM-5.1-478B-NVFP4 Running on: - 4x RTX Pro 6000 - Sglang - 370,000 max tokens (1.75x full context) - p10 27.7 | p90 45…
A quantized (NVFP4) 478B-parameter GLM-5.1 model runs on four RTX Pro 6000 GPUs via SGLang, delivering a 370k-token context window at up to 45 tok/s decode and 1,340 tok/s prefill; it is demoed driving Figma.