A practical guide to setting up an always-on AI agent on a Mac mini, covering hardware selection, cloud vs. local AI model tradeoffs, and agent system choices for automating tasks like sales reporting and social media suggestions.
CyberSecQwen-4B is a specialized 4B-parameter model fine-tuned for defensive cybersecurity tasks, designed to run locally on a single GPU and address privacy, cost, and air-gapped deployment needs.
Modly is an open-source desktop app that generates fully textured 3D meshes from images, running 100% locally on your GPU with pluggable AI model extensions.
A developer built a JARVIS-style personal assistant called CYBER with wake word activation, local voice cloning via XTTS v2, vision mode, and LLM-generated system commands, all running locally without cloud dependencies.
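As a minimal sketch of the local voice-cloning step, XTTS v2 can be driven through the open-source Coqui TTS library; the reference clip, output text, and file paths below are hypothetical placeholders, not CYBER's actual pipeline.

    # Local voice cloning with XTTS v2 via Coqui TTS (pip install TTS).
    # Paths and text are illustrative placeholders.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # First run downloads the multilingual XTTS v2 weights; after that,
    # everything runs locally.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

    # Clone the voice from a short reference clip and synthesize a reply.
    tts.tts_to_file(
        text="All systems online.",
        speaker_wav="reference_voice.wav",  # a few seconds of the target voice
        language="en",
        file_path="reply.wav",
    )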
A user asks exactly which ~4GB AI model (likely Gemini Nano) Chrome silently downloaded for on-device features, and requests a GGUF version for local execution via llama.cpp.
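For context, any GGUF file can be run locally with llama.cpp's command-line client; the filename below is hypothetical, since no GGUF conversion of Chrome's on-device model is known to exist.

    # Hypothetical invocation: no GGUF of Chrome's model actually exists.
    # -ngl 99 offloads all layers to the GPU; -p supplies the prompt.
    llama-cli -m gemini-nano.gguf -ngl 99 -p "Summarize this page:"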
A user demonstrates successful local inference of a 27B-parameter Qwen model across three GTX 1080 Ti GPUs, achieving 28-30 tokens per second with TurboQuant optimization.
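The post does not detail the TurboQuant configuration, but as a generic sketch, llama.cpp can spread a model across three identical GPUs with its tensor-split option; the model filename is illustrative.

    # Generic multi-GPU sketch (not the poster's TurboQuant setup):
    # offload all layers and split tensors evenly across three GPUs.
    llama-cli -m qwen-27b-q4.gguf -ngl 99 --tensor-split 1,1,1 -p "Hello"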
Kuku is introduced as an open-source tool designed to serve as a local second brain for managing AI interactions.
The author shares a locally runnable AI companion built with Python, Gemini, and Ollama, featuring a custom cognitive architecture based on Global Workspace Theory and an Integrated Information Theory proxy for personality modeling.
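As a minimal sketch of the inference half, an Ollama-backed companion can talk to the local Ollama server over its standard REST API; the model name and prompt are placeholders, and the post's cognitive-architecture code is not reproduced here.

    # Query a local Ollama server (default: http://localhost:11434).
    # Model name and prompt are placeholders, not the author's setup.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",  # any model already pulled locally
            "prompt": "Introduce yourself in one sentence.",
            "stream": False,    # return a single JSON object, not chunks
        },
        timeout=120,
    )
    print(resp.json()["response"])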
The author announces the release of 'lightning-mlx', a local AI engine optimized for Apple Silicon that achieves high token throughput for coding agents and tool-calling workflows.
A new implementation of Multi-Token Prediction (MTP) in llama.cpp achieves a 40% speedup for Gemma 4 models, tested on an M5 Max MacBook Pro. The post provides links to quantized GGUF models and the patched source code.
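Anyone rebuilding the patched source can sanity-check the claimed speedup with llama.cpp's bundled benchmark tool; the GGUF filename here is illustrative.

    # Compare tokens/sec between patched and unpatched builds:
    # -p 512 benchmarks prompt processing, -n 128 benchmarks generation.
    llama-bench -m gemma-4-q4.gguf -p 512 -n 128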
The author introduces the site plan for effectiveTPS, a tool designed to compare local AI models using a new 'effective TPS' metric alongside raw speed and latency. It aims to provide a simple leaderboard that highlights useful output quality over raw marketing numbers.
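The post does not publish the exact formula; one plausible reading, sketched below, discounts raw tokens per second by the fraction of output that is actually useful (for example, a benchmark pass rate), so a fast model that emits filler scores lower.

    # Hypothetical 'effective TPS' sketch: raw throughput discounted by an
    # output-quality factor. The real effectiveTPS formula is not published
    # in the post; this only illustrates the idea.
    def effective_tps(tokens_generated: int, seconds: float, quality: float) -> float:
        """quality in [0, 1], e.g. a benchmark pass rate."""
        raw_tps = tokens_generated / seconds
        return raw_tps * quality

    # A slower but more accurate model can beat a fast, sloppy one:
    print(effective_tps(3000, 100, 0.90))  # 27.0 effective TPS
    print(effective_tps(6000, 100, 0.40))  # 24.0 effective TPS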
Reefy turns any PC into a private AI machine, running models entirely on the user's own hardware.
OpenClaw, an open-source persistent AI assistant, has become the most-starred project on GitHub, sparking debate over security and autonomy. NVIDIA is collaborating to harden its security and is releasing NemoClaw as a secure reference implementation.
Poolside releases Laguna XS.2, a 33B-parameter MoE model with 3B activated parameters, designed for agentic coding and local deployment on Macs with 36GB of RAM.
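As a rough back-of-envelope check (not Poolside's figure): 33B weights at about 4 bits each is roughly 33e9 × 0.5 bytes ≈ 16.5 GB, leaving headroom for KV cache and the OS within 36GB, while only the 3B activated parameters are read per token, which keeps generation fast.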
A detailed guide for running the 35B-parameter Qwen3.6 model locally on Apple Silicon with llama.cpp to power the pi coding agent, including optimized configuration flags and sampling parameters.
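As an illustrative starting point (the guide's own flags and values may differ), a llama.cpp server for a coding agent on Apple Silicon can be launched like this:

    # Illustrative llama-server setup on Apple Silicon; the guide's actual
    # flag values may differ. -ngl 99 offloads all layers to Metal.
    llama-server -m qwen3.6-35b-q4_k_m.gguf \
        -c 16384 -ngl 99 \
        --temp 0.7 --top-p 0.9 --min-p 0.05 \
        --port 8080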
Developer claims Hermes fine-tunes of Gemma 4 and Qwen 3.5 deliver the best local LLM performance, suggesting they rival paid BigAI models.
Developer achieves productive local agentic coding with Qwen3.6-35B (4-bit MLX) and the pi.dev tool, completing real tickets efficiently on current hardware.
A user demonstrates Qwen 3.6 running autonomously on an AMD 7900 XTX GPU, locally creating an Android app, a feat the poster describes as sci-fi achieved today.
NVIDIA and Google collaborate to optimize Gemma 4 models for local deployment across RTX GPUs, DGX Spark, and Jetson devices, enabling efficient on-device agentic AI with support for reasoning, coding, multimodal capabilities, and 35+ languages.
GGML and llama.cpp have joined Hugging Face to ensure long-term sustainability of local AI development. Georgi Gerganov's team will maintain full autonomy over the projects while receiving resources to scale community support and improve integration between llama.cpp inference and transformers model definitions.