small-model

#small-model

@no_stp_on_snek: what actually surprised me fine-tuning a small open model. note I'm fairly new in this area so some of this may seem obv…

X AI KOLs Timeline · 7h ago Cached

A developer shares surprising lessons from fine-tuning a small open model: the base model often already maxes out the metric the fine-tune was meant to improve, the real weakness is behavioral (caving), and fine-tuning requires careful measurement and balancing.

@manateelazycat: Did a big shot come from Baidu's AI Whampoa Military Academy? The open-source Unlimited OCR, based on DeepSeek OCR, immediately drops a killer move. According to its published data, it scored 93.23 on OmniDocBench v1.5, surpassing DeepSeek OCR and...

X AI KOLs Timeline · yesterday Cached

The open-source OCR model Unlimited OCR, based on DeepSeek OCR, achieves 93.23 on OmniDocBench v1.5 with only 3B parameters, outperforming DeepSeek OCR, Gemini 2.5, and others.

@VikParuchuri: Surya 2, which has 650M params and scores 83.3% on olmocr, is the most accurate small OCR model. One reason why is char…

X AI KOLs Following · yesterday Cached

Surya 2 is a 650M-parameter OCR model that scores 83.3% on olmocr and is claimed to be the most accurate small OCR model. One stated reason is character tokenization, which reportedly helps both accuracy and model size.

@xdotli: ICYMI Nanbeige 4.1, a 3b model released by Chinese Indeed, outperforms Qwen3-30b-A3b + Qwen 3.5 4b. It can finish long …

X AI KOLs Timeline · 3d ago Cached

Nanbeige 4.1, a 3B model from a company the poster calls the "Chinese Indeed", reportedly outperforms Qwen3-30B-A3B and Qwen 3.5 4B, including on tasks requiring 600+ tool calls.

@TheAhmadOsman: 600M that beats a 397B and Sonnet 4.5. Small and specialized models FTW

X AI KOLs Following · 4d ago Cached

A 600M parameter reasoning model trained using SYNTH reportedly outperforms a 397B model and Sonnet 4.5 in an industrial application for the Paris subway, highlighting the effectiveness of small, specialized models.

I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model.

Reddit r/LocalLLaMA · 5d ago

Inflect-Nano, an extremely small text-to-speech model with 4.63 million parameters, has been released.

A 4b model is now beating 30b ones at web research and the reason is not size

Reddit r/artificial · 6d ago

A 4 billion parameter open model from the Apodex family outperforms 30 billion parameter models on web research benchmarks, attributed to careful training data and self-verification techniques rather than raw scale, suggesting a more democratic trajectory for AI capability.

VibeThinker-3B: what is this witchcraft? Killing it at MathQA like it has ~30B parameters

Reddit r/LocalLLaMA · 6d ago

VibeThinker-3B is a small 3B parameter model that achieves performance comparable to ~30B parameter models on the MathQA benchmark, demonstrating significant efficiency.

@aijoey: WeiboAI dropped VibeThinker-3B, so I had to try it locally. this is a 3B model, not a giant frontier system. in the vid…

X AI KOLs Timeline · 6d ago Cached

WeiboAI released VibeThinker-3B, a small 3B reasoning model tested locally on coding tasks, achieving 3/3 on algorithm problems.

Glimmer 1 - Glint Research. A foundational 10,000 parameter language model

Reddit r/LocalLLaMA · 2026-06-16

Introduces Glimmer, a 10,000 parameter language model trained on 500K tokens of FineWeb-Edu with a standard Llama architecture, available on HuggingFace.

@kimmonismus: Crazy: A 3B model is now reaching highly competitive results on verifiable reasoning tasks. VibeThinker-3B scores 94.3 …

X AI KOLs Following · 2026-06-16 Cached

A 3B model, VibeThinker-3B, achieves highly competitive results on verifiable reasoning tasks through post-training refinements on Qwen2.5-Coder, including curriculum SFT, multi-domain RL, offline self-distillation, and a final RL-based instruct stage.

*cough* gemma3 270M *cough*

Reddit r/LocalLLaMA · 2026-06-15

The post points to Gemma 3 270M, Google's 270M-parameter language model in the Gemma 3 family.

Want to build a custom model

Reddit r/LocalLLaMA · 2026-06-14

A user discusses building a small autocomplete model (25M parameters) as a learning project, mentions hardware constraints (32GB VRAM), data requirements (~100M tokens), and seeks advice on datasets and data formatting for autocomplete-style training.

Spent the weekend on the Apodex 4b, plus a quick look at the 35b mini

Reddit r/LocalLLaMA · 2026-06-12

The author tests the Apodex 4B-SFT and 35B mini models, finding the 4B-SFT surpasses other 4B models in multi-hop search tasks without hallucination, and notes the design philosophy of separating answer checking from generation.

@omarsar0: this model is the opposite of mythos. It's small, cost effective, apache 2.0, and locally deployable. This is the way LL…

X AI KOLs Following · 2026-06-10

This model is small, cost-effective, open-source (Apache 2.0), and locally deployable, representing a shift towards transparent and sovereign AI.

@NielsRogge: On http://paperswithcode.co, you can see Mythos 5 getting beaten by a 4B open-source model on CharXiv, a popular chart …

X AI KOLs Following · 2026-06-09 Cached

A 4B open-source model beats Mythos 5 on the CharXiv chart understanding benchmark, showing strong performance from a freely available small model.

@nickfrosst: this model is the opposite of mythos. It's small, cost effective, apache 2.0, and locally deployable. This is the way LL…

X AI KOLs Following · 2026-06-09 Cached

Cohere released North Mini Code, its first open-source coding model under Apache 2.0, designed to be small, cost-effective, locally deployable, and focused on agentic performance.

@berryxia: Damn, this directly steals Apple's thunder! A 6.6B small model shuts up Siri and a bunch of cloud giants, running locally on Mac with just 7GB of RAM. CJ Zafir's Mac-1 not only has ridiculously small parameters but also integrates 487 Mac-native tools, enabling chain calls, automatic reasoning, and more...

X AI KOLs Timeline · 2026-06-08 Cached

CJ Zafir's team has introduced Mac-1, a 6.6B-parameter small model that runs locally on a Mac with only 7GB of RAM. It can chain calls across 487 Mac-native tools at an inference speed of 65 tok/s, and aims to challenge an agent paradigm dominated by cloud-based large models.

Google's new Gemma 4 12B model is designed to run on any laptop with 16GB of RAM

Ars Technica · 2026-06-03 Cached

Google releases Gemma 4 12B, a compact AI model optimized for local laptop use with only 16GB of RAM, featuring multi-token prediction and streamlined multimodal capabilities for text, audio, and images.

@_philschmid: We just launched a Gemma 4 12B! Our first mid-sized model with native audio inputs. Gemma 4 12 B is a unified, encoder-…

X AI KOLs Following · 2026-06-03 Cached

Google launched Gemma 4 12B, a mid-sized multimodal model with native audio input that requires only 16GB of memory and is released under Apache 2.0.
