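@FakeMaidenMaker: Folks, I've dug up another hands-on AI engineering gem called Hands-On-AI-Engineering. It hit 2.3K stars right after being open-sourced, and a single repo packs in 50+ real AI projects that run out of the box. What makes it most practical is that it skips abstract theory: every project is a complete small…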

X AI KOLs Timeline | Tools

Summary

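Recommends Hands-On-AI-Engineering, a GitHub repository that gained 2.3K stars shortly after being open-sourced. It contains 50+ directly runnable AI projects across categories such as RAG, AI agents, and OCR, each with complete code and documentation, which makes it well suited to hands-on learning.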


Cached: 2026/06/16 11:51

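Folks, I've dug up another hands-on AI engineering gem: Hands-On-AI-Engineering.

It hit 2.3K stars right after being open-sourced, and a single repo packs in 50+ real AI projects that run out of the box.

What makes it most practical is that it skips abstract theory: every project is a complete small application, with code, configuration, and documentation all provided.

If you want to learn RAG, it has more than ten RAG implementations. If you want to build AI agents, there are dozens of ready-made examples you can adapt, from multi-agent financial analysis and automated form filling to GitHub PR review. It also covers niche but genuinely needed tasks, such as OCR that turns prescriptions and math formulas into structured data.

Learning this used to mean hunting for tutorials across the web and piecing things together. Now you can read one repo from start to finish, build along once, and come away with the basics.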

GitHub:


Sumanth077/Hands-On-AI-Engineering

Source: https://github.com/Sumanth077/Hands-On-AI-Engineering

🚀 Hands-On AI Engineering

License: MIT | PRs Welcome

A curated collection of practical, production-ready AI projects across multiple modalities, including language models, multimodal models, OCR systems, RAG pipelines, and AI agents. Each project is designed to help you learn, experiment, and build real-world AI applications.

🎯 Why This Repository?

  • Learn by Doing: Each project includes complete code, setup instructions, and documentation
  • Production-Ready: Projects follow best practices and are ready to be adapted for real-world use
  • Diverse Use Cases: From RAG systems to multi-agent workflows and specialized applications
  • Multiple Model Providers: Projects use OpenAI, Anthropic, Google, and open-source models
  • Active Community: Regular updates and new project additions

🗂️ Project Categories

🤖 AI Agents

Intelligent AI agents for various automation tasks.

  • Multi-Agent Financial Analyst — Team of specialized agents for comprehensive financial analysis.
  • FinAgent — Financial assistant agent for stock market analysis and insights.
  • Daily AI News Digest — Automated daily digest from 92 Karpathy-curated tech blogs delivered to Telegram every morning. MiniMax M2.7 scores articles from the last 24 hours and surfaces the 3 most significant stories.
  • Agentic Form Filler — Agentic form-filling agent using Landing AI for layout parsing and MiniMax M2.7 for multi-turn data gathering.
  • AI Travel Planning Agent — Multi-agent travel planner that turns a single natural language request into a complete trip plan with flights, hotels, and a day-by-day itinerary.
  • Competitive Intelligence Agent — Generates strategic sales battlecards by analyzing competitors through the lens of your own business context.
  • Multi-Agent Research Assistant (AG2) — Multi-agent research pipeline using AG2 where three specialists collaborate to research any topic and produce a structured report.
  • Self-Reflective Agentic RAG — LangGraph RAG system that grades retrieved context, rewrites the query if needed, and generates an answer only once the context passes validation.
  • Agentic SQL Search — Natural language to SQL agent powered by Gemma 4 that writes, executes, and explains queries against an e-commerce database.
  • Stock Portfolio Analyst — Portfolio analysis agent built with Agno and DeepSeek-V4-Flash. Fetches live market data via YFinance and generates a report covering P&L, concentration risk, and rebalancing recommendations.
  • Eagle Eye — GitHub PR review agent using OpenClaw and Telegram. Fetches diffs via GitHub MCP, performs structured code review with severity ratings, and posts feedback after user approval.
  • CartMate — AI Customer Support Agent — Memory-powered e-commerce support agent built with Mem0 and Mistral Small 4 that remembers customers and picks up conversations where they left off.
  • Multi-Agent Coding Assistant — Four-stage coding pipeline powered by Mistral Small 4 and LangChain. A Planner, Coder, and Reviewer agent collaborate to produce a polished final implementation.
  • Startup Analyst — Startup due-diligence agent powered by MiniMax M2.5. Scrapes a company’s site with Firecrawl and produces an investment-grade report covering market position, financials, team, and risks.
  • Research Team — Multi-agent research system powered by MiniMax M2.5. Seek searches the web, Scout navigates internal documents, and a team leader synthesises findings into a structured report.
  • GitHub Intelligence Agent — GitHub research agent powered by Gemini 3 Flash and GitHub’s official MCP server. Ask anything about repos, contributors, issues, or codebases.
  • Smolagents Code Agent — Agentic task runner powered by Mistral Small 4 and HuggingFace smolagents. Writes and executes Python code at each step using DuckDuckGo and Wikipedia.
  • Agent Discovery Agent — Searches and compares AI agents across NANDA, MCP, Virtuals Protocol, A2A, and ERC-8004 through a single natural language interface. Powered by Gemini 3 Flash.
  • Cal Scheduling Agent — Conversational scheduling assistant that manages Cal.com appointments through natural language. Book, reschedule, cancel, and check availability with automatic timezone handling.
  • Hacker News Newsletter Agent — Fetches the 10 latest Hacker News stories, scrapes full article content with Trafilatura, generates a structured HTML newsletter with Gemma 4, and delivers it to your inbox via Gmail SMTP.
  • Hotel Finder Agent — Conversational hotel search agent powered by qwen3.6-flash via Orq.ai and the Trivago MCP Server. Search by location, dates, guest count, price range, star rating, and amenities.
  • Marketing Strategy Agent — Multi-agent marketing campaign generator. A Market Analyst (with Serper web search), Strategy Officer, and Creative Director run sequentially to produce market research, a full strategy, and creative campaign content. Powered by deepseek-v4-flash via Orq.ai.
  • Brand Monitor — Monitors brand mentions across Web, YouTube, Twitter/X, and LinkedIn in a single run. Scrapingdog collects platform data and DeepSeek V4 Flash produces a structured intelligence brief per channel.
  • AI Debate Agent - Two LLM debaters argue opposing sides of any topic you choose. A judge scores each turn and declares a winner.
  • Browser Automation Agent - Takes a natural language instruction and autonomously navigates the web to complete it using browser-use.
  • Documentation QnA Agent - Chat with any documentation by URL. Uses Fetch MCP and DeepSeek V4 Flash on NVIDIA NIM.
  • Job Posting Agent - Generates tailored job postings from a company name and role using DeepSeek V4 Flash on NVIDIA NIM.
  • LangChain Data Agent - Query the Chinook SQLite database in plain English through a conversational Streamlit chat interface.
  • Travel Planner Agent - AI trip planning assistant covering weather, budget, packing lists, and day-by-day itineraries from a single request.
  • Personal Finance Agent - Upload a bank statement CSV, auto-categorize transactions, and ask natural language questions about your spending. Powered by a LangChain tool-calling agent backed by Orq.ai with SQLite persistence.
  • Offline Medical Agent - Fully offline agentic RAG system for clinical protocol lookup at remote clinics and field hospitals.
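
The control flow described for the Self-Reflective Agentic RAG entry above (grade the retrieved context, rewrite the query if needed, generate only after validation passes) can be sketched as a plain loop. This is a minimal illustration, not the project's LangGraph code: the function name `self_reflective_answer`, the `max_rewrites` limit, and all four `toy_*` stand-ins are assumptions made up for this example. In a real system the grader, rewriter, and generator would each be LLM calls.

```python
from typing import Callable, List, Optional


def self_reflective_answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, List[str]], bool],
    rewrite: Callable[[str], str],
    generate: Callable[[str, List[str]], str],
    max_rewrites: int = 2,  # hypothetical limit, not taken from the project
) -> Optional[str]:
    """Answer only when retrieved context passes grading; otherwise return None."""
    current = query
    for attempt in range(max_rewrites + 1):
        docs = retrieve(current)
        if grade(current, docs):
            # Generate against the user's original question, using validated context.
            return generate(query, docs)
        if attempt < max_rewrites:
            current = rewrite(current)
    return None


# Toy stand-ins (hypothetical) so the loop can run without any model or index.
CORPUS = {
    "retrieval augmented generation": ["RAG pairs a retriever with a generator."],
}


def toy_retrieve(q: str) -> List[str]:
    return CORPUS.get(q.lower(), [])


def toy_grade(q: str, docs: List[str]) -> bool:
    # A real grader would judge relevance; here any hit counts as a pass.
    return len(docs) > 0


def toy_rewrite(q: str) -> str:
    # Expands one known abbreviation; leaves every other query unchanged.
    return "retrieval augmented generation" if q.strip().lower() == "rag" else q


def toy_generate(q: str, docs: List[str]) -> str:
    return f"{q}: {docs[0]}"


if __name__ == "__main__":
    print(self_reflective_answer("RAG", toy_retrieve, toy_grade, toy_rewrite, toy_generate))
```

Returning `None` when every attempt fails keeps the "answer only once the context passes validation" property explicit. A caller can then decide whether to fall back to web search or to tell the user nothing relevant was found.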

📸 OCR

Extracting structure and meaning from visual data and documents.

  • Image-to-Structured-Data Extractor — Converts images into validated, structured JSON using Mistral Large 3 and Instructor.
  • LaTeX Formula OCR — Extracts math formulas from images and PDFs into LaTeX using a local vision-language model.
  • Medical Prescription Digitizer — Digitizes handwritten or printed prescriptions into structured output using Mistral Large 3, with real-time drug name validation against RxNorm.
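
The OCR entries above describe turning model output into validated, structured data. The sketch below shows only that validation step, under loud assumptions: the projects use Instructor, while this example substitutes standard-library dataclasses so it runs without dependencies, and the `Medication` fields and the `medications` key are hypothetical rather than the repository's schema.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Medication:
    # Hypothetical fields chosen for illustration only.
    name: str
    dose: str
    frequency: str


def parse_prescription(raw_json: str) -> List[Medication]:
    """Parse a model's JSON reply and reject records with missing or empty fields."""
    data = json.loads(raw_json)
    if not isinstance(data, dict) or not isinstance(data.get("medications"), list):
        raise ValueError("expected an object with a 'medications' list")
    result: List[Medication] = []
    for i, item in enumerate(data["medications"]):
        if not isinstance(item, dict):
            raise ValueError(f"medications[{i}] is not an object")
        values = {}
        for field in ("name", "dose", "frequency"):
            value = item.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"medications[{i}].{field} is missing or empty")
            values[field] = value.strip()
        result.append(Medication(**values))
    return result


if __name__ == "__main__":
    reply = '{"medications": [{"name": "Amoxicillin", "dose": "500 mg", "frequency": "3x daily"}]}'
    print(parse_prescription(reply))
```

Failing loudly on an incomplete record is a deliberate choice for this kind of data: a silently dropped dose is worse than a rejected extraction that can be retried or sent for manual review.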

🎧 Audio

Projects for audio understanding and analysis.

  • Music Explorer — Chat with any audio file or YouTube video using Gemini 3 Flash. Ask for transcriptions, emotion analysis, instrument identification, and timestamp-aware breakdowns.
  • Multilingual Audio Translator — Upload or record audio in any language, get it transcribed with faster-whisper, translated via Gemini, and played back as synthesized speech using Kokoro TTS.

🎬 Multimodal

Projects combining vision, video, and language models.

  • GLM-OCR Pro — Structured document extraction using GLM-OCR via Ollama, transforming images and PDFs into formatted Markdown locally.
  • Video Understanding Agent — Summarizes YouTube videos into chapters, key takeaways, and action items using Gemini Flash.
  • Multimodal Weather App — Upload a map image and get live weather. Mistral Small 4 identifies the city via vision, then fetches real-time conditions through native tool calling.
  • Multimodal RAG — RAG system that ingests text, URLs, PDFs, images, audio, and video into a shared ChromaDB index. Gemini Embedding 2 handles retrieval and Gemini 3 Flash generates grounded answers, passing actual file URIs for media sources.
  • Image Question Answering — Upload a PDF, select a page, and ask visual questions answered by Gemma 4 with thinking mode. PyMuPDF renders each page to a full-resolution image for grounded reasoning over charts, tables, and figures.
  • Medical Document Parser - Extracts a structured clinical profile from medical PDFs and images using Gemma 4 vision.

📚 RAG Applications

Retrieval-Augmented Generation systems for knowledge-enhanced AI applications.

  • Agentic RAG with O3-Mini & DuckDuckGo — RAG system using O3-Mini with DuckDuckGo for real-time web search.
  • Agentic RAG with Qwen & FireCrawl — RAG system using Qwen and FireCrawl for web scraping and retrieval.
  • Vision RAG — Multimodal RAG system for processing and querying visual content.
  • Clinical RAG with ADE — High-precision clinical RAG using LandingAI ADE for visual-first document parsing and Mistral Large for grounded reasoning.
  • YouTube Transcript RAG — Chat with any YouTube video using Whisper transcription, ChromaDB retrieval, and Mistral Small 4, with timestamp-linked answers.
  • GraphRAG Knowledge System — Builds a local knowledge graph from uploaded documents using Mistral Small 4 and NetworkX, supporting both entity-level and thematic queries.
  • Hybrid RAG System — Indexes documents into a knowledge graph and a vector store in parallel. Mistral Small 4 answers questions with fused context from both retrieval paths.
  • HyDE RAG — RAG pipeline using Hypothetical Document Embeddings. Gemini 3 Flash generates hypothetical answers, Gemini Embedding 2 embeds and averages them, and the result retrieves more precise chunks from ChromaDB.
  • Rock Music RAG — Custom rock music knowledge base built from Wikipedia. Add any band, ask questions across all of them, and get sourced answers powered by BM25 retrieval and Gemma 4.
  • RAG Agent with Database Routing — Routes queries across three specialized Qdrant databases (products, support, financial) using an Agno router agent. Falls back to a LangGraph ReAct web search agent when no relevant documents are found.
  • Reasoning RAG - Ask questions against any web source and get cited answers with a live, step-by-step reasoning trace via Gradio.
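
The HyDE RAG entry above describes generating hypothetical answers, embedding and averaging them, and retrieving with the averaged vector. A minimal sketch of the averaging and ranking steps follows. It assumes a toy bag-of-words embedding over a made-up four-word vocabulary in place of Gemini Embedding 2, and an in-memory list in place of ChromaDB. The names `hyde_retrieve`, `toy_embed`, and `VOCAB` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(
    hypothetical_answers: Sequence[str],
    chunks: Sequence[str],
    embed: Callable[[str], Vector],
    top_k: int = 1,
) -> List[str]:
    """Rank chunks by similarity to the mean embedding of the hypothetical answers."""
    query_vec = mean_vector([embed(text) for text in hypothetical_answers])
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


# Toy embedding (hypothetical): word counts over a tiny fixed vocabulary.
VOCAB = ["guitar", "drums", "album", "tour"]


def toy_embed(text: str) -> Vector:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


if __name__ == "__main__":
    answers = ["the guitar solo on the album", "a guitar riff"]
    chunks = ["drums and tour dates", "guitar parts on the album"]
    print(hyde_retrieve(answers, chunks, toy_embed))
```

Searching with the mean of several hypothetical answers, rather than with the raw question, is the core of the technique. An answer-shaped text tends to sit closer in embedding space to the passages that actually contain the answer.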

🤝 Contributing

We welcome contributions! Whether you’re adding new projects, improving existing ones, or fixing bugs, your help makes this repository better for everyone.

How to Contribute

  1. Read the guidelines: Check CONTRIBUTING.md for detailed instructions
  2. Create an issue: Propose your project or improvement
  3. Follow the structure: Use the appropriate category folder
  4. Submit a PR: One project per pull request

Project Structure Requirements

  • Each project must be in its own folder within the appropriate category
  • Must include a comprehensive README.md (use our template)
  • Must include requirements.txt or pyproject.toml
  • Must include .env.example for required API keys
  • Follow snake_case naming convention
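
The file requirements above lend themselves to a quick local check before opening a pull request. The helper below is a hypothetical sketch and not part of the repository. It assumes the snake_case rule applies to the project folder name, and it checks only that the files exist, not what they contain.

```python
import re
from pathlib import Path
from typing import List

# Lowercase letters and digits separated by single underscores (assumed reading of "snake_case").
SNAKE_CASE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")


def check_project(folder: Path) -> List[str]:
    """Return a list of problems; an empty list means the folder passes these checks."""
    problems: List[str] = []
    if not SNAKE_CASE.match(folder.name):
        problems.append(f"folder name '{folder.name}' is not snake_case")
    if not (folder / "README.md").is_file():
        problems.append("missing README.md")
    if not any((folder / name).is_file() for name in ("requirements.txt", "pyproject.toml")):
        problems.append("missing requirements.txt or pyproject.toml")
    if not (folder / ".env.example").is_file():
        problems.append("missing .env.example")
    return problems


if __name__ == "__main__":
    import sys

    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    for problem in check_project(target):
        print(problem)
```

Run it with the project folder as the only argument. No output means none of the listed problems were found.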

📜 License

This repository is licensed under the MIT License. See the LICENSE file for details.


🙏 Acknowledgments

Thank you to all contributors who have helped build this collection of AI engineering projects!


Built with ❤️ by the AI Engineering Community

For sponsorship or collaboration inquiries, reach the maintainer at [email protected].
