Tag
Microsoft's NextLat introduces a training objective that rewards belief-state representations instead of relying solely on next-token prediction, pushing models toward compact world models for better generalization.
Microsoft Research introduces Next-Latent Prediction (NextLat), a self-supervised method that trains transformers to predict their own next latent state, enabling compact world models for reasoning and planning and achieving up to 3.3x faster inference via self-speculative decoding.
In the latest Research Focus newsletter, Microsoft Research highlights multiple advances, including 30x faster analytics with CoddSpeed, AI wildlife re-identification, and LLMs that learn across tasks without retraining.
Microsoft's Project Ire, an autonomous malware-classification agent, used behavioral reverse engineering rather than IOC signatures to identify a LOTUSLITE variant that had evaded major EDR tools.
Microsoft Research introduces Arbor, a generalist autonomous research agent that uses persistent hypothesis-tree refinement for cumulative learning, outperforming Codex and Claude Code across six research tasks and achieving 86% Any-Medal on MLE-Bench Lite.
Microsoft Research introduces Mirage, a latent spatial memory that stores 3D scenes as latent tokens, achieving up to 10.57x faster video generation and 55x lower memory use with state-of-the-art consistency.
Microsoft Research's latest newsletter highlights AgentPex, an open-source system for automated evaluation of agentic behaviors; new theoretical work on variance reduction for ranking systems; a call to shift from documents to repositories for human-agent collaboration; and a global challenge on AI value alignment.
A roundup of three notable AI papers: SkillOpt treats skill documents as trainable parameters to optimize frozen agents; a new method compiles agentic workflows into model weights for 100x cost reduction; and AutoScientists introduces a decentralized agent team for long-running science without a central planner.
Microsoft Research introduces SkillOpt, a method that treats agent skill documents as trainable external state, using an optimizer model to make bounded edits validated by a held-out set. The approach achieves best or tied results across 52 evaluation cells and improves accuracy by over 23 points on GPT-5.5, with zero extra inference cost and transferable skills.
Introducing SkillOpt, an optimizer that treats natural-language skills as trainable external parameters instead of finetuning model weights. It uses bounded edits and validation gating to enable stable, controllable skill updates, achieving best or tied-best results across 52 settings on 6 benchmarks with 7 models.
Magma is an open-source repository from Microsoft Research for building multimodal AI agents that integrate vision, language, and action, providing model links, inference examples, training instructions, and demos.
Microsoft's 2026 Future of Work report finds that generative AI is reshaping the workplace at an unprecedented pace, but that its benefits are distributed very unevenly, with junior roles hit hardest. AI is evolving from an acceleration tool into a collaboration partner, which makes human professional judgment even more crucial.
Microsoft Research announced new tools, models, repositories, and papers, including MagenticLite, agentic GitHub workflows, verification-first agents, and meaning-matching fine-tuning, during the Microsoft Research Forum virtual series.
Microsoft introduces GridSFM, a small foundation model that can predict AC optimal power flow in milliseconds, significantly improving grid efficiency and reducing congestion costs.
Microsoft Research announces MatterSim updates including MatterSim-MT, a multi-task foundation model for materials characterization, faster simulation (3-5x speedup), and experimental validation of thermal conductivity predictions for a new material.
This paper introduces Switchcraft, the first AI model router specifically optimized for agentic tool calling to reduce inference costs. By using a lightweight DistilBERT classifier, it achieves significant cost savings while maintaining high accuracy in tool-use tasks.
A new paper by Microsoft Research and Salesforce reveals that LLM performance drops significantly in multi-turn conversations due to a 'Lost in Conversation' phenomenon, challenging the reliability of current single-turn benchmarks.
This Microsoft Research paper introduces a randomized scheduling technique designed to provide probabilistic guarantees for uncovering bugs in software systems. Published for the ASPLOS conference, it focuses on systematic fault detection through algorithmic randomness.
This paper introduces DataDignity, a framework and benchmark (FakeWiki) for pinpoint provenance, aiming to identify the specific training data sources that support an LLM's response. It proposes ScoringModel and SteerFuse methods to improve attribution accuracy over standard retrieval baselines.
This paper introduces AgenticRAG, a framework from Microsoft that enhances enterprise knowledge base retrieval by equipping LLMs with tools for iterative search, document navigation, and analysis. It demonstrates significant improvements in recall and factuality over standard RAG pipelines on multiple benchmarks.