@FakeMaidenMaker: Folks, we've dug up another AI engineering treasure: Hands-On-AI-Engineering. Just open-sourced, it already hit 2.3K stars. One repo stuffed with over 50 real AI projects that you can run directly. The most practical part is it doesn't talk about vague theories; each project is a complete little…

Summary

Recommends the newly open-sourced GitHub repo Hands-On-AI-Engineering, which has 2.3K stars and contains over 50 runnable AI projects covering RAG, AI agents, OCR, and more. Each project provides full code and instructions, making it suitable for hands-on learning.

Folks, we've dug up another AI engineering treasure: Hands-On-AI-Engineering. Just open-sourced, it has already hit 2.3K stars. One repo holds over 50 real AI projects that you can run directly. The most practical part is that it skips vague theory; each project is a complete mini-application, with all code, configuration, and instructions provided. If you want to learn RAG, there are over a dozen ways to implement it. If you're into AI agents, there are dozens of ready-made examples to adapt, from multi-agent financial analysis and auto form filling to GitHub PR review. There's also OCR for converting prescriptions and math formulas into structured data, a niche but essential need. Before, you'd have to search everywhere for tutorials and piece things together. Now you can work through this one repo from start to finish, set up each project as you go, and get started.

Cached at: 06/16/26, 11:51 AM

Sumanth077/Hands-On-AI-Engineering

Source: https://github.com/Sumanth077/Hands-On-AI-Engineering

🚀 Hands-On AI Engineering

License: MIT (https://opensource.org/licenses/MIT) | PRs Welcome

A curated collection of practical, production-ready AI projects across multiple modalities, including language models, multimodal models, OCR systems, RAG pipelines, and AI agents. Each project is designed to help you learn, experiment, and build real-world AI applications.


🎯 Why This Repository?

  • Learn by Doing: Each project includes complete code, setup instructions, and documentation
  • Production-Ready: Projects follow best practices and are ready to be adapted for real-world use
  • Diverse Use Cases: From RAG systems to multi-agent workflows and specialized applications
  • Multiple Model Providers: Projects use OpenAI, Anthropic, Google, and open-source models
  • Active Community: Regular updates and new project additions

🗂️ Project Categories

🤖 AI Agents

Intelligent AI agents for various automation tasks.

  • Multi-Agent Financial Analyst — Team of specialized agents for comprehensive financial analysis.
  • FinAgent — Financial assistant agent for stock market analysis and insights.
  • Daily AI News Digest — Automated daily digest from 92 Karpathy-curated tech blogs delivered to Telegram every morning. MiniMax M2.7 scores articles from the last 24 hours and surfaces the 3 most significant stories.
  • Agentic Form Filler — Agentic form-filling agent using Landing AI for layout parsing and MiniMax M2.7 for multi-turn data gathering.
  • AI Travel Planning Agent — Multi-agent travel planner that turns a single natural language request into a complete trip plan with flights, hotels, and a day-by-day itinerary.
  • Competitive Intelligence Agent — Generates strategic sales battlecards by analyzing competitors through the lens of your own business context.
  • Multi-Agent Research Assistant (AG2) — Multi-agent research pipeline using AG2 where three specialists collaborate to research any topic and produce a structured report.
  • Self-Reflective Agentic RAG — LangGraph RAG system that grades retrieved context, rewrites the query if needed, and generates an answer only once the context passes validation.
  • Agentic SQL Search — Natural language to SQL agent powered by Gemma 4 that writes, executes, and explains queries against an e-commerce database.
  • Stock Portfolio Analyst — Portfolio analysis agent built with Agno and DeepSeek-V4-Flash. Fetches live market data via YFinance and generates a report covering P&L, concentration risk, and rebalancing recommendations.
  • Eagle Eye — GitHub PR review agent using OpenClaw and Telegram. Fetches diffs via GitHub MCP, performs structured code review with severity ratings, and posts feedback after user approval.
  • CartMate — AI Customer Support Agent — Memory-powered e-commerce support agent built with Mem0 and Mistral Small 4 that remembers customers and picks up conversations where they left off.
  • Multi-Agent Coding Assistant — Four-stage coding pipeline powered by Mistral Small 4 and LangChain. A Planner, Coder, and Reviewer agent collaborate to produce a polished final implementation.
  • Startup Analyst — Startup due-diligence agent powered by MiniMax M2.5. Scrapes a company’s site with Firecrawl and produces an investment-grade report covering market position, financials, team, and risks.
  • Research Team — Multi-agent research system powered by MiniMax M2.5. Seek searches the web, Scout navigates internal documents, and a team leader synthesises findings into a structured report.
  • GitHub Intelligence Agent — GitHub research agent powered by Gemini 3 Flash and GitHub’s official MCP server. Ask anything about repos, contributors, issues, or codebases.
  • Smolagents Code Agent — Agentic task runner powered by Mistral Small 4 and HuggingFace smolagents. Writes and executes Python code at each step using DuckDuckGo and Wikipedia.
  • Agent Discovery Agent — Searches and compares AI agents across NANDA, MCP, Virtuals Protocol, A2A, and ERC-8004 through a single natural language interface. Powered by Gemini 3 Flash.
  • Cal Scheduling Agent — Conversational scheduling assistant that manages Cal.com appointments through natural language. Book, reschedule, cancel, and check availability with automatic timezone handling.
  • Hacker News Newsletter Agent — Fetches the 10 latest Hacker News stories, scrapes full article content with Trafilatura, generates a structured HTML newsletter with Gemma 4, and delivers it to your inbox via Gmail SMTP.
  • Hotel Finder Agent — Conversational hotel search agent powered by qwen3.6-flash via Orq.ai and the Trivago MCP Server. Search by location, dates, guest count, price range, star rating, and amenities.
  • Marketing Strategy Agent — Multi-agent marketing campaign generator. A Market Analyst (with Serper web search), Strategy Officer, and Creative Director run sequentially to produce market research, a full strategy, and creative campaign content. Powered by deepseek-v4-flash via Orq.ai.
  • Brand Monitor — Monitors brand mentions across Web, YouTube, Twitter/X, and LinkedIn in a single run. Scrapingdog collects platform data and DeepSeek V4 Flash produces a structured intelligence brief per channel.
  • AI Debate Agent - Two LLM debaters argue opposing sides of any topic you choose. A judge scores each turn and declares a winner.
  • Browser Automation Agent - Takes a natural language instruction and autonomously navigates the web to complete it using browser-use.
  • Documentation QnA Agent - Chat with any documentation by URL. Uses Fetch MCP and DeepSeek V4 Flash on NVIDIA NIM.
  • Job Posting Agent - Generates tailored job postings from a company name and role using DeepSeek V4 Flash on NVIDIA NIM.
  • LangChain Data Agent - Query the Chinook SQLite database in plain English through a conversational Streamlit chat interface.
  • Travel Planner Agent - AI trip planning assistant covering weather, budget, packing lists, and day-by-day itineraries from a single request.
  • Personal Finance Agent - Upload a bank statement CSV, auto-categorize transactions, and ask natural language questions about your spending. Powered by a LangChain tool-calling agent backed by Orq.ai with SQLite persistence.
  • Offline Medical Agent - Fully offline agentic RAG system for clinical protocol lookup at remote clinics and field hospitals.
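
The Self-Reflective Agentic RAG entry above describes a loop: retrieve context, grade it, rewrite the query if the grade fails, and generate only once the context passes. The listed project builds this with LangGraph; the sketch below is not that implementation. It is a minimal, framework-free illustration of the control flow, and every name in it (`self_reflective_answer`, the `toy_*` helpers, `DOCS`, `max_rewrites`) is a hypothetical stand-in for a real retriever, grader, and model.

```python
from typing import Callable, List


def self_reflective_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, List[str]], bool],
    rewrite: Callable[[str], str],
    generate: Callable[[str, List[str]], str],
    max_rewrites: int = 2,
) -> str:
    """Retrieve, grade, rewrite the query on failure, and answer only on a pass."""
    query = question
    for attempt in range(max_rewrites + 1):
        context = retrieve(query)
        if grade(question, context):
            # Context passed validation, so generation is allowed.
            return generate(question, context)
        if attempt < max_rewrites:
            query = rewrite(query)
    return "No sufficiently relevant context was found."


# Toy stand-ins (assumptions for illustration only, not real model calls).
DOCS = {
    "refund policy": "Refunds are issued within 14 days of purchase.",
    "shipping": "Orders ship within 2 business days.",
}


def toy_retrieve(query: str) -> List[str]:
    # Keyword lookup instead of a vector store.
    return [text for key, text in DOCS.items() if key in query.lower()]


def toy_grade(question: str, context: List[str]) -> bool:
    # A real grader would ask a model whether the context answers the question.
    return len(context) > 0


def toy_rewrite(query: str) -> str:
    return query.replace("money back", "refund policy")


def toy_generate(question: str, context: List[str]) -> str:
    return "Based on the docs: " + " ".join(context)


answer = self_reflective_answer(
    "How do I get my money back?",
    toy_retrieve, toy_grade, toy_rewrite, toy_generate,
)
print(answer)  # Based on the docs: Refunds are issued within 14 days of purchase.
```

The first retrieval finds nothing, the rewrite maps the wording onto a key the toy store knows, and the second pass succeeds. Capping the number of rewrites keeps the loop from running forever when no relevant context exists.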

📸 OCR

Extracting structure and meaning from visual data and documents.

  • Image-to-Structured-Data Extractor — Converts images into validated, structured JSON using Mistral Large 3 and Instructor.
  • LaTeX Formula OCR — Extracts math formulas from images and PDFs into LaTeX using a local vision-language model.
  • Medical Prescription Digitizer — Digitizes handwritten or printed prescriptions into structured output using Mistral Large 3, with real-time drug name validation against RxNorm.

🎧 Audio

Projects for audio understanding and analysis.

  • Music Explorer — Chat with any audio file or YouTube video using Gemini 3 Flash. Ask for transcriptions, emotion analysis, instrument identification, and timestamp-aware breakdowns.
  • Multilingual Audio Translator — Upload or record audio in any language, get it transcribed with faster-whisper, translated via Gemini, and played back as synthesized speech using Kokoro TTS.

🎬 Multimodal

Projects combining vision, video, and language models.

  • GLM-OCR Pro — Structured document extraction using GLM-OCR via Ollama, transforming images and PDFs into formatted Markdown locally.
  • Video Understanding Agent — Summarizes YouTube videos into chapters, key takeaways, and action items using Gemini Flash.
  • Multimodal Weather App — Upload a map image and get live weather. Mistral Small 4 identifies the city via vision, then fetches real-time conditions through native tool calling.
  • Multimodal RAG — RAG system that ingests text, URLs, PDFs, images, audio, and video into a shared ChromaDB index. Gemini Embedding 2 handles retrieval and Gemini 3 Flash generates grounded answers, passing actual file URIs for media sources.
  • Image Question Answering — Upload a PDF, select a page, and ask visual questions answered by Gemma 4 with thinking mode. PyMuPDF renders each page to a full-resolution image for grounded reasoning over charts, tables, and figures.
  • Medical Document Parser - Extracts a structured clinical profile from medical PDFs and images using Gemma 4 vision.

📚 RAG Applications

Retrieval-Augmented Generation systems for knowledge-enhanced AI applications.

  • Agentic RAG with O3-Mini & DuckDuckGo — RAG system using O3-Mini with DuckDuckGo for real-time web search.
  • Agentic RAG with Qwen & FireCrawl — RAG system using Qwen and FireCrawl for web scraping and retrieval.
  • Vision RAG — Multimodal RAG system for processing and querying visual content.
  • Clinical RAG with ADE — High-precision clinical RAG using LandingAI ADE for visual-first document parsing and Mistral Large for grounded reasoning.
  • YouTube Transcript RAG — Chat with any YouTube video using Whisper transcription, ChromaDB retrieval, and Mistral Small 4, with timestamp-linked answers.
  • GraphRAG Knowledge System — Builds a local knowledge graph from uploaded documents using Mistral Small 4 and NetworkX, supporting both entity-level and thematic queries.
  • Hybrid RAG System — Indexes documents into a knowledge graph and a vector store in parallel. Mistral Small 4 answers questions with fused context from both retrieval paths.
  • HyDE RAG — RAG pipeline using Hypothetical Document Embeddings. Gemini 3 Flash generates hypothetical answers, Gemini Embedding 2 embeds and averages them, and the result retrieves more precise chunks from ChromaDB.
  • Rock Music RAG — Custom rock music knowledge base built from Wikipedia. Add any band, ask questions across all of them, and get sourced answers powered by BM25 retrieval and Gemma 4.
  • RAG Agent with Database Routing — Routes queries across three specialized Qdrant databases (products, support, financial) using an Agno router agent. Falls back to a LangGraph ReAct web search agent when no relevant documents are found.
  • Reasoning RAG - Ask questions against any web source and get cited answers with a live, step-by-step reasoning trace via Gradio.
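
The HyDE RAG entry above embeds several model-generated hypothetical answers, averages the vectors, and uses the result as the retrieval query. The sketch below illustrates only that averaging-and-ranking step under loud assumptions: the bag-of-words `embed` function replaces a real embedding model, the hypothetical answers are hand-written rather than generated, and an in-memory list replaces ChromaDB. All function and variable names are illustrative.

```python
import math
from typing import List


def embed(text: str, vocab: List[str]) -> List[float]:
    """Toy embedding (assumption): one word count per vocabulary term."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hyde_retrieve(hypothetical_answers: List[str], chunks: List[str],
                  vocab: List[str], top_k: int = 1) -> List[str]:
    """Rank chunks against the averaged embedding of the hypothetical answers."""
    query_vec = mean_vector([embed(h, vocab) for h in hypothetical_answers])
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c, vocab)),
                    reverse=True)
    return ranked[:top_k]


vocab = ["guitar", "drums", "album", "tour", "riff"]
chunks = [
    "the band recorded the album in 1971",
    "the tour covered forty cities",
    "the drums were tracked live",
]
# In a real pipeline a model would write these for the user's question.
hypothetical = [
    "their album had a famous riff",
    "the album sessions took a month",
]
top = hyde_retrieve(hypothetical, chunks, vocab)
print(top)  # ['the band recorded the album in 1971']
```

Averaging smooths out details that only one hypothetical answer invents (here, the riff), so the query vector leans toward what the answers share.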

🤝 Contributing

We welcome contributions! Whether you’re adding new projects, improving existing ones, or fixing bugs, your help makes this repository better for everyone.

How to Contribute

  1. Read the guidelines: Check CONTRIBUTING.md for detailed instructions
  2. Create an issue: Propose your project or improvement
  3. Follow the structure: Use the appropriate category folder
  4. Submit a PR: One project per pull request

Project Structure Requirements

  • Each project must be in its own folder within the appropriate category
  • Must include a comprehensive README.md (use our template)
  • Must include requirements.txt or pyproject.toml
  • Must include .env.example for required API keys
  • Follow snake_case naming convention
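
The requirements above can be checked mechanically before opening a pull request. The script below is a hypothetical helper, not something the repository is known to ship. It assumes the snake_case rule applies to the project folder name, which the list does not state explicitly, and it checks only that files exist, not that the README is comprehensive.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Assumption: snake_case means lowercase letters and digits joined by underscores.
SNAKE_CASE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")


def check_project(project_dir: Path) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if not SNAKE_CASE.match(project_dir.name):
        problems.append(f"folder name '{project_dir.name}' is not snake_case")
    if not (project_dir / "README.md").is_file():
        problems.append("missing README.md")
    if not any((project_dir / name).is_file()
               for name in ("requirements.txt", "pyproject.toml")):
        problems.append("missing requirements.txt or pyproject.toml")
    if not (project_dir / ".env.example").is_file():
        problems.append("missing .env.example")
    return problems


# Demo on throwaway folders so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "hyde_rag"
    good.mkdir()
    for name in ("README.md", "requirements.txt", ".env.example"):
        (good / name).write_text("")
    bad = Path(tmp) / "HydeRag"
    bad.mkdir()
    good_report = check_project(good)
    bad_report = check_project(bad)

print(good_report)      # []
print(len(bad_report))  # 4
```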

📜 License

This repository is licensed under the MIT License. See the LICENSE file for details.


🙏 Acknowledgments

Thank you to all contributors who have helped build this collection of AI engineering projects!


Built with ❤️ by the AI Engineering Community (https://aiengineering.beehiiv.com/).

For sponsorship or collaboration inquiries, reach the maintainer at [email protected].

