Tag
Microsoft released ASSERT at Build 2026, an open-source framework that converts natural language behavior specifications into executable evaluations for AI agents.
A user demonstrated generating CAD graphics by voice with Opus 4.8 in just two rounds of conversation, noting that the result is not yet industrial-grade but is suitable for demos.
This paper applies successor representations from reinforcement learning to natural language, training a neural network to predict the expected distribution of future words. It shows that linguistic categories like parts of speech and lexical subclasses emerge spontaneously without explicit supervision.
A critical observation that recent industry AI papers lack novelty, citing examples such as SkillOpt, which treats natural-language skills as trainable external parameters.
The Browser-use team has launched a terminal TUI tool written in Rust, allowing users to control the browser through natural language. It supports running with a logged-in Chrome, a headless browser, or Browser Use cloud.
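CADAM is an open-source tool that generates 3D CAD models in Blender from natural language using ChatGPT, Claude, or Gemini, with support for parametric control and multiple export formats.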
This paper argues that AI agent performance depends more on the harness (control layer) than on prompts alone, proposing natural-language agent harnesses to make design choices inspectable and portable.
Rémi shares that he now types plain English into his shell using Qwen3.6-27B on his laptop, highlighting a practical AI-powered tool for command-line interaction.
Introduces an open-source tool called AI Humanize Text, which uses methods such as multilingual translation chains and multi-round rewriting by large models to make AI-generated text read more naturally and avoid detection.
Chronicle is a 324M-parameter decoder-only transformer pretrained from scratch on both natural language and time series. It achieves competitive performance on NLU and time series classification tasks and sets a new state of the art for frozen-embedding time series classification on the UCR/UEA datasets.
Figma launches an AI agent within its collaborative canvas that allows users to generate, edit, and automate design tasks using natural language prompts, leveraging partnerships with OpenAI and Anthropic.
LLMEval-Logic is a new Chinese benchmark for evaluating logical reasoning in LLMs, featuring solver-verified answers and adversarial hardening. The benchmark reveals significant gaps in current models, with the best reaching only 37.5% accuracy on hard items.
xAI published a setup guide for the xurl skill, which enables the Hermes AI agent to read from and write to X (Twitter) through natural language commands.
Semble is an agent-oriented code search tool that supports natural language queries and accurately returns semantically complete code snippets. It reduces token consumption by 98% compared with traditional grep+read methods and features intelligent chunking, dual-path retrieval, and code-aware re-ranking.
An AI-based open-source diagram generation tool that creates draw.io diagrams from natural language, supports multiple models, and has 28k GitHub stars.
Incantation presents an interactive video world model that uses natural language as the action interface for fine-grained multi-entity control and cross-entity generalization, achieving high performance and real-time streaming through novel attention and distillation techniques.
AgentSwarms launches a new SQL & BI Agent workspace that allows users to upload CSVs and ask natural language questions, automatically converting them to SQL queries and generating visualizations.
Introduces an open-source AI harness that automates small business tasks through natural language; the project is seeking contributors to expand its functionality.
Vex is an open-source CLI agent harness that lets users edit videos via natural language commands, automating tasks like silence removal, b-roll addition, and visual generation.
A tweet highlights that while reasoning models excel at nuance and natural language understanding, this capability hasn't translated to retrieval systems, pointing to a key bottleneck in AI.