@ProfBuehlerMIT: For science, AI sovereignty and physics-grounded reasoning are non-negotiable. But how can we teach a small LLM like Ge…

X AI KOLs Timeline 06/18/26, 04:11 AM Tools

local-ai inference-engine agent-skills open-source self-hosted scientific-reasoning mistral-rs

Summary

mistral.rs now natively supports Agent Skills, enabling locally run small LLMs to perform complex agentic workflows for scientific tasks, with full control over models, data, and execution.

For science, AI sovereignty and physics-grounded reasoning are non-negotiable. But how can we teach a small LLM like Gemma-4-E4B physics? One way is to use Agent Skills, but this has so far been limited to closed frontier models. mistral.rs now implements Agent Skills natively: the first self-hosted inference engine that does this as part of the local inference substrate, where we can use small models to solve complex scientific and other tasks in a flexible and scalable way.

We are in a period of uncertainty about frontier models - access, pricing, deprecation, abrupt restriction. The good news is that when the entire stack runs locally we can build AI that is entirely your own: You own the weights, the skills, the execution loop, the data - all of it runs on your hardware and is reproducible and durable.

While virtually all local inference engines expose a model behind an OpenAI-compatible endpoint, everything agentic is then assembled around it by an external orchestrator that injects context, manages tools, mounts files, and brokers execution. mistral.rs is natively agentic and moves that machinery into the server itself, allowing us to build complex agentic workflows and run them locally, on open-source models.

With this new feature you can now upload Agent Skills bundles to /v1/skills, reference them from Responses API requests by identity, and run them inside a native agentic loop with persistent Python sessions, figure capture, sandboxed shell execution, file inputs mounted directly into the working session; plug-and-play and completely compatible with your existing code/workflow.

A model with a native skill substrate can act, observe consequences, and can modify what it is able to do. The skill is retained procedural capability of the system.

Attached is a short video of all of it: skills, code execution, the full agentic loop carried by Gemma-4-E4B; running entirely on my MacBook Pro. You can install and run a server with this capability in two lines in your terminal, with any quantization you need.

Nice work by the @googlegemma team @OfficialLoganK @demishassabis and @ericlbuehler with mistral.rs!
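
As a rough illustration of the "reference them from Responses API requests by identity" step described in the post, the sketch below builds (but does not send) such a request. Only the /v1/skills path and the use of a Responses-style API come from the post. The server address, the /v1/responses path (borrowed from the OpenAI convention), the model string, the skill id, and every JSON field name, in particular "skills", are assumptions made for illustration and should be checked against the mistral.rs documentation.

```python
import json
import urllib.request

# Assumption: address of a locally running server (not stated in the post).
BASE_URL = "http://localhost:8080"
# From the post: skill bundles are uploaded here (upload itself is not shown).
SKILLS_URL = f"{BASE_URL}/v1/skills"


def build_responses_request(model: str, prompt: str, skill_id: str) -> urllib.request.Request:
    """Build, without sending, a Responses-style request that names a skill by id."""
    payload = {
        "model": model,
        "input": prompt,
        # Hypothetical field name and shape: the real encoding of a skill
        # reference must be taken from the server's documentation.
        "skills": [{"id": skill_id}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/responses",  # assumption: OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder model and skill identifiers, chosen only for the example.
    req = build_responses_request(
        "gemma-4-e4b", "Estimate the period of a 1 m pendulum.", "physics-basics"
    )
    print(req.get_method(), req.full_url)
    print(json.loads(req.data))
```

Passing the returned object to urllib.request.urlopen would send it to a running server. The request is only constructed here so that the sketch stays runnable without one.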

Similar Articles

@dair_ai: Can an LLM agent actually build a model of an environment it cannot see? This work makes the question gradeable. An age…

Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

LLMs Go To Confession, Automated Scientific Research, What Copilot Users Want, Reasoning For Less

@j_golebiowski: The next agent stack: a frontier LLM as orchestrator, fine-tuned SLMs as skills. For PII redaction, the orchestrator ne…

Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making
