@ProfBuehlerMIT: For science, AI sovereignty and physics-grounded reasoning are non-negotiable. But how can we teach a small LLM like Ge…
Summary
mistral.rs now natively supports Agent Skills, enabling locally-run small LLMs to perform complex agentic workflows for scientific tasks, with full control over models, data, and execution.
Cached at: 06/18/26, 06:09 AM
For science, AI sovereignty and physics-grounded reasoning are non-negotiable. But how can we teach a small LLM like Gemma-4-E4B physics? One way is to use Agent Skills, but this has so far been limited to closed frontier models. mistral․rs now implements Agent Skills natively: the first self-hosted inference engine that does this as part of the local inference substrate, where we can use small models to solve complex scientific and other tasks in a flexible and scalable way.
We are in a period of uncertainty about frontier models: access, pricing, deprecation, abrupt restriction. The good news is that when the entire stack runs locally, you can build AI that is entirely your own. You own the weights, the skills, the execution loop, and the data; all of it runs on your hardware and is reproducible and durable.
Virtually all local inference engines expose a model behind an OpenAI-compatible endpoint; everything agentic is then assembled around it by an external orchestrator that injects context, manages tools, mounts files, and brokers execution. mistral․rs is natively agentic and moves that machinery into the server itself, so complex agentic workflows can be built and run locally on open-source models.
With this new feature you can now upload Agent Skills bundles to /v1/skills, reference them from Responses API requests by identity, and run them inside a native agentic loop with persistent Python sessions, figure capture, sandboxed shell execution, and file inputs mounted directly into the working session. It is plug-and-play and completely compatible with your existing code and workflow.
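To make the upload-then-reference flow concrete, here is a minimal client-side sketch in Python. It only builds the two HTTP requests and does not send them. Only the /v1/skills path comes from the post. The base URL, the raw-zip upload body, the /v1/responses path (borrowed from the OpenAI Responses API convention), and the "skills" field with its "id" key are all assumptions for illustration; check the mistral.rs documentation for the actual request shapes.

```python
import json
import urllib.request

# Assumption: placeholder address for a locally running server.
BASE_URL = "http://localhost:8080"


def build_skill_upload(bundle_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/skills carrying a skill bundle.

    Assumption: the bundle is sent as a raw zip body. The real server may
    expect a different encoding, such as multipart form data.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/skills",
        data=bundle_bytes,
        method="POST",
        headers={"Content-Type": "application/zip"},
    )


def build_response_request(model: str, prompt: str, skill_id: str) -> urllib.request.Request:
    """Build (but do not send) a Responses-style request that names a skill."""
    payload = {
        "model": model,
        "input": prompt,
        # Hypothetical field: the post says skills are referenced "by identity"
        # but does not give the field name or its shape.
        "skills": [{"id": skill_id}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/responses",  # assumption: OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

With a server running, each request could be sent with urllib.request.urlopen(request). Keeping construction separate from sending makes it easy to inspect the payload before anything touches the network.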
A model with a native skill substrate can act, observe consequences, and modify what it is able to do. The skill is a retained procedural capability of the system.
Attached is a short video of all of it: skills, code execution, and the full agentic loop carried by Gemma-4-E4B, all running entirely on my MacBook Pro. You can install and run a server with this capability in two lines in your terminal, with any quantization you need.
Nice work by the @googlegemma team @OfficialLoganK @demishassabis and @ericlbuehler with mistral․rs!