#language-model

Tag · Cards List

@percyliang: For the next Marin model, we are putting together a new data mix. Currently we have 18T tokens, but could use more. So …

X AI KOLs Following · 7h ago

Percy Liang announces that his team is compiling a new data mix for the next Marin model and requests contributions of high-quality token data for pre-training, mid-training, and SFT.

@0xLogicrw: Kaiming He's team at MIT has released a new language model, ELF (Embedded Language Flows). They sidestep current autoregressive architectures by applying their diffusion-model expertise from the visual domain directly to text generation. Specifically: the entire generation process takes place in a continuous vector space, converting it back to...

X AI KOLs Timeline · 15h ago

Kaiming He's team at MIT released ELF, a new language model that runs a diffusion process in a continuous vector space for text generation, bypassing standard autoregressive architectures and significantly reducing data requirements.
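
The thread gives no implementation details, but the broad recipe it describes (denoise in a continuous embedding space, then round back to tokens) can be sketched as below. Everything here, from the denoiser stand-in to the schedule and the nearest-embedding rounding, is an illustrative assumption, not how ELF actually works.

```python
import torch

# Toy sketch of diffusion-based text generation in a continuous
# embedding space. The denoiser stand-in, the linear schedule, and the
# nearest-embedding rounding are illustrative guesses, not ELF's design.

vocab_size, dim, seq_len, steps = 32000, 512, 64, 50
embed = torch.nn.Embedding(vocab_size, dim)
denoiser = torch.nn.TransformerEncoder(  # stands in for the trained model
    torch.nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True),
    num_layers=4,
)

@torch.no_grad()
def generate():
    x = torch.randn(1, seq_len, dim)   # start from pure noise
    for t in range(steps):
        x0_hat = denoiser(x)           # predict the clean embeddings
        alpha = 1.0 - (t + 1) / steps  # toy schedule: less noise each step
        x = alpha * torch.randn_like(x) + (1 - alpha) * x0_hat
    # round continuous vectors back to discrete tokens (nearest embedding)
    logits = x @ embed.weight.T
    return logits.argmax(-1)

print(generate().shape)  # torch.Size([1, 64])
```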

HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model

arXiv cs.CL · 16h ago

HEBATRON is a new open-weight, Hebrew-specialized large language model built on NVIDIA's Nemotron-3 Mixture-of-Experts architecture, achieving strong reasoning performance with efficient inference. It is the first language-specific adaptation of this architecture and natively supports long-context processing.

Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

Hugging Face Daily Papers · 2d ago

This paper identifies KV-cache contamination as a failure mode for activation steering in dialogue. It proposes GCAD, which extracts steering signals from prompt contributions and applies token-level gating to improve long-horizon coherence, achieving substantial gains on multi-turn benchmarks.
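
The card doesn't describe GCAD's mechanics; as a generic illustration of activation steering with token-level gating, a sketch might look like the following, where the steering vector, the gating criterion, and the intervention point are all assumptions rather than the paper's method.

```python
import torch

# Generic activation-steering sketch with token-level gating. The
# steering vector, gate threshold, and layer choice are illustrative
# assumptions, not the GCAD method itself.

def steer_hidden_states(hidden, steer_vec, gate_threshold=0.1):
    """hidden: (batch, seq, dim); steer_vec: (dim,)."""
    direction = steer_vec / steer_vec.norm()
    score = hidden @ direction                      # per-token alignment
    gate = (score.abs() > gate_threshold).float()   # token-level on/off gate
    return hidden + gate.unsqueeze(-1) * direction  # nudge only gated tokens

hidden = torch.randn(2, 16, 512)
steered = steer_hidden_states(hidden, torch.randn(512))
print(steered.shape)  # torch.Size([2, 16, 512])
```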

Emergent Modularity in Mixture-of-Experts Models (8 minute read)

TLDR AI · 2d ago

Ai2 releases EMO, a 14B-parameter mixture-of-experts language model trained to develop emergent modularity, which allows a small subset of experts to handle specific tasks while maintaining near full-model performance.

new MoE from ai2, EMO

Reddit r/LocalLLaMA · 4d ago

Ai2 released EMO, a mixture-of-experts language model with 1B active parameters out of 14B total, trained on 1 trillion tokens and featuring document-level routing in which experts cluster around domains.
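
As a rough sketch of what document-level routing means (one routing decision per document, shared by all its tokens, rather than per-token routing), the toy MoE layer below may help; the sizes, the mean-pooled router, and the top-k gating are illustrative assumptions, not EMO's actual design.

```python
import torch

# Toy document-level MoE routing: one routing decision per document
# (sequence), not per token. All sizes and the mean-pooled router are
# illustrative assumptions, not EMO's architecture.

dim, n_experts, top_k = 512, 16, 2
router = torch.nn.Linear(dim, n_experts)
experts = torch.nn.ModuleList(
    [torch.nn.Linear(dim, dim) for _ in range(n_experts)]
)

def moe_forward(x):
    """x: (batch, seq, dim) — each batch row is one document."""
    doc_repr = x.mean(dim=1)                     # pool tokens per document
    weights = router(doc_repr).softmax(-1)       # (batch, n_experts)
    top_w, top_i = weights.topk(top_k, dim=-1)   # route each doc to top-k experts
    top_w = top_w / top_w.sum(-1, keepdim=True)  # renormalize gate weights
    out = torch.zeros_like(x)
    for b in range(x.size(0)):
        for w, i in zip(top_w[b], top_i[b]):
            out[b] = out[b] + w * experts[i](x[b])  # same experts for every token
    return out

print(moe_forward(torch.randn(4, 32, dim)).shape)  # torch.Size([4, 32, 512])
```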

Amália and the Future of European Portuguese LLMs

Hacker News Top · 5d ago

The Portuguese government invested €5.5M in AMÁLIA, an LLM for European Portuguese based on EuroLLM that is billed as open-source, but the model's data, weights, and benchmarks are not yet publicly available.

Construction of Knowledge Graph based on Language Model

arXiv cs.CL · 2026-04-22

A review paper from Kunming University surveys how pre-trained language models automate knowledge-graph construction and introduces LLHKG, a lightweight-LLM framework that matches GPT-3.5 performance.
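
The survey's methods aren't detailed in the card; a minimal, hypothetical sketch of the general pattern it covers (prompting a language model to emit knowledge-graph triples) might look like this. The prompt wording and the stubbed call_llm are placeholders, not the LLHKG pipeline.

```python
import json, re

# Generic sketch of LLM-based triple extraction for knowledge-graph
# construction. The prompt wording and the stubbed call_llm are
# placeholders, not the LLHKG pipeline from the paper.

PROMPT = (
    "Extract (subject, relation, object) triples from the text below.\n"
    "Answer with a JSON list of 3-element lists.\n\nText: {text}\nTriples:"
)

def call_llm(prompt: str) -> str:
    # Stub: swap in any chat/completions client here.
    return '[["Kunming University", "located_in", "Kunming"]]'

def extract_triples(text: str):
    raw = call_llm(PROMPT.format(text=text))
    match = re.search(r"\[.*\]", raw, re.S)  # tolerate surrounding prose
    return [tuple(t) for t in json.loads(match.group(0))] if match else []

print(extract_triples("Kunming University is located in Kunming."))
```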

Remask, Don't Replace: Token-to-Mask Refinement in Masked Diffusion Language Models

arXiv cs.CL · 2026-04-22

Introduces Token-to-Mask (T2M) remasking, which fixes generation errors in masked diffusion LMs by resetting suspect tokens to the mask state instead of overwriting them, yielding up to +5.92 accuracy points on CMATH with no extra training or parameters.
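
A minimal sketch of the remasking idea, assuming a confidence-based criterion for "suspect" tokens (the paper's actual criterion and schedule may differ):

```python
import torch

# Sketch of token-to-mask remasking during iterative masked-LM decoding:
# low-confidence tokens are reset to [MASK] and re-predicted rather than
# overwritten in place. The confidence criterion and fixed round count
# are illustrative assumptions, not the paper's exact algorithm.

MASK_ID = 0

@torch.no_grad()
def decode(model, tokens, n_rounds=8, conf_threshold=0.9):
    """tokens: (seq,) LongTensor, initially all MASK_ID."""
    for _ in range(n_rounds):
        logits = model(tokens.unsqueeze(0)).squeeze(0)  # (seq, vocab)
        conf, pred = logits.softmax(-1).max(-1)         # per-token confidence
        tokens = pred.clone()
        tokens[conf < conf_threshold] = MASK_ID         # remask suspect tokens
    return tokens

vocab = 100
toy = torch.nn.Sequential(torch.nn.Embedding(vocab, 64), torch.nn.Linear(64, vocab))
print(decode(toy, torch.zeros(16, dtype=torch.long)).shape)  # torch.Size([16])
```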

Building my own Diffusion Language Model from scratch was easier than I thought [P]

Reddit r/MachineLearning · 2026-04-21

A developer shares a minimalist 7.5M-parameter diffusion language model trained from scratch on Shakespeare, releasing the code as a learning resource.

grok 4.3 beta: musk's ($300/month) megaphone

Reddit r/singularity · 2026-04-18

Grok 4.3 beta has been released through xAI's $300/month subscription tier, an incremental update to Elon Musk's AI assistant platform.

VaultGemma: The world's most capable differentially private LLM

Google DeepMind Blog · 2025-10-23

Google DeepMind introduces VaultGemma, a 1B-parameter open-weight language model trained with differential privacy, accompanied by new scaling-laws research characterizing the compute-privacy-utility trade-offs of differentially private LLM training.
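
For context on what differentially private training involves, the canonical recipe is DP-SGD: clip each example's gradient and add calibrated Gaussian noise. The sketch below shows one such step; the clip norm, noise multiplier, and toy model are illustrative, and real DP training also has to track a privacy budget.

```python
import torch

# Sketch of one DP-SGD step: clip each example's gradient to a fixed
# norm, sum, add Gaussian noise, then update. Clip norm and noise
# multiplier are illustrative; real DP training also tracks the privacy
# budget (epsilon, delta) spent across steps.

def dp_sgd_step(model, loss_fn, batch, clip_norm=1.0, noise_mult=1.1, lr=1e-3):
    params = [p for p in model.parameters() if p.requires_grad]
    grad_sum = [torch.zeros_like(p) for p in params]
    for x, y in batch:  # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        total = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))
        scale = min(1.0, clip_norm / (float(total) + 1e-12))  # clip factor
        for g, p in zip(grad_sum, params):
            g += p.grad * scale
    with torch.no_grad():
        for g, p in zip(grad_sum, params):
            noise = torch.randn_like(g) * noise_mult * clip_norm
            p -= lr * (g + noise) / len(batch)  # noisy averaged update

model = torch.nn.Linear(10, 2)
data = [(torch.randn(10), torch.tensor(0)) for _ in range(4)]
dp_sgd_step(model, torch.nn.functional.cross_entropy, data)
```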

Introducing gpt-oss

OpenAI Blog · 2025-08-05

OpenAI releases gpt-oss-120b and gpt-oss-20b, two state-of-the-art open-weight language models under the Apache 2.0 license that achieve near-parity with proprietary models while remaining deployable on consumer hardware and edge devices. Both models demonstrate strong reasoning and tool-use capabilities and ship with comprehensive safety evaluations.

Introducing ChatGPT

OpenAI Blog · 2022-11-30

OpenAI introduces ChatGPT, a conversational AI model fine-tuned from GPT-3.5 using reinforcement learning from human feedback (RLHF). The model is designed to answer follow-up questions, admit mistakes, and reject inappropriate requests, with free access provided during the research preview.
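
As background on the RLHF recipe the card mentions, the reward model is typically trained on human preference pairs with a pairwise ranking loss; a minimal sketch (the tiny scorer and random features are illustrative only):

```python
import torch

# Pairwise ranking loss used to train an RLHF reward model: push the
# reward of the human-preferred response above the rejected one. The
# tiny scorer and random "response features" are illustrative only.

dim = 64
reward_model = torch.nn.Sequential(
    torch.nn.Linear(dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)

chosen, rejected = torch.randn(8, dim), torch.randn(8, dim)
r_chosen = reward_model(chosen).squeeze(-1)
r_rejected = reward_model(rejected).squeeze(-1)
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()  # gradients for a reward-model training step
```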

GPT-3 powers the next generation of apps

OpenAI Blog · 2021-03-25

OpenAI announces that over 300 applications are now using GPT-3 through their API, nine months after launch, generating 4.5 billion words daily. Featured use cases include Viable for customer feedback analysis, Fable Studio for interactive storytelling, and Algolia for semantic search capabilities.

OpenAI API

OpenAI Blog · 2020-06-11

OpenAI announces the release of an API for accessing its AI models with a general-purpose text interface, launching in private beta with strict safety measures including mandatory production reviews and content restrictions to prevent harmful use cases.

GPT-2: 1.5B release

OpenAI Blog · 2019-11-05

OpenAI releases the 1.5B-parameter GPT-2 model, along with analyses of how credible humans find its outputs, its potential for misuse via fine-tuning on extremist ideologies, and the challenge of detecting synthetic text. Detection models achieve ~95% accuracy but require complementary approaches for practical deployment.

Better language models and their implications

OpenAI Blog · 2019-02-14

OpenAI introduces GPT-2, a 1.5 billion parameter transformer-based language model trained on 40GB of internet text that achieves state-of-the-art performance on language modeling benchmarks and demonstrates zero-shot capabilities in reading comprehension, translation, question answering, and summarization. Due to safety concerns, only a smaller model and technical paper are released publicly rather than the full trained model.
