@Sumanth_077: 10 GitHub Repositories you should definitely check as an AI Engineer! 1. Hands on AI Engineering Curated repository of …
Summary
A tweet thread lists 10 must-check GitHub repositories for AI engineers, covering hands-on AI engineering, LLMs, AI agents, ML deployment, and more.
Cached at: 06/14/26, 07:38 AM
10 GitHub Repositories you should definitely check as an AI Engineer!
- Hands on AI Engineering
Curated repository of AI-powered applications and agentic systems showcasing practical use cases of LLMs
Check this out: https://github.com/Sumanth077/Hands-On-AI-Engineering…
- Hands on Large Language Models
This repository contains the complete code examples from the book Hands-On Large Language Models.
It includes notebook examples that cover everything from the introduction to language models to fine-tuning them.
Check this out: https://github.com/HandsOnLLM/Hands-On-Large-Language-Models…
- AI Agents for Beginners
Beginner-friendly course on AI agents.
This free 11-lesson course teaches you everything you need to get started building AI agents.
Check this out: https://github.com/microsoft/ai-agents-for-beginners…
- GenAI Agents
This repository provides tutorials and implementations for various Generative AI Agent techniques, from basic to advanced.
It serves as a comprehensive guide for building intelligent, interactive AI systems.
Check this out: https://github.com/NirDiamant/GenAI_Agents…
- Made with ML
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Check this out: https://github.com/GokuMohandas/Made-With-ML…
- Learn Harness Engineering
A project-based course on building the environment, state management, verification, and control mechanisms that make AI coding agents work reliably.
Check this out: https://github.com/walkinglabs/learn-harness-engineering…
- AutoResearch by Andrej Karpathy
Learn how to build autonomous ML experiment loops where AI agents modify training code, run experiments, and iterate on their own.
This 630-line Python script shows you how to set up an agentic research workflow that runs ~100 experiments overnight on a single GPU. Practical implementation of autonomous research systems.
Check this out: https://github.com/karpathy/autoresearch…
- Designing Machine Learning Systems
This repo contains summaries and resources for the book Designing Machine Learning Systems.
Check this out: https://github.com/chiphuyen/dmls-book…
- Awesome LLM Inference
Curated list of LLM/VLM inference papers with code, covering Flash-Attention, Paged-Attention, WINT8/4, Parallelism, and more.
Comprehensive resource for LLM inference optimization techniques including quantization, KV cache management, attention mechanisms, and deployment strategies.
Check this out: https://github.com/xlite-dev/Awesome-LLM-Inference…
- LLM Course: The best hands-on course to learn Large Language Models with roadmaps and Colab notebooks!
Check this out: https://github.com/mlabonne/llm-course…
If you found it insightful, reshare with your network.
Find me → @Sumanth_077 for more insights and tutorials on AI Engineering!
Sumanth077/Hands-On-AI-Engineering
Source: https://github.com/Sumanth077/Hands-On-AI-Engineering
A curated collection of practical, production-ready AI projects across multiple modalities, including language models, multimodal models, OCR systems, RAG pipelines, and AI agents. Each project is designed to help you learn, experiment, and build real-world AI applications.
🎯 Why This Repository?
- Learn by Doing: Each project includes complete code, setup instructions, and documentation
- Production-Ready: Projects follow best practices and are ready to be adapted for real-world use
- Diverse Use Cases: From RAG systems to multi-agent workflows and specialized applications
- Multiple Model Providers: Projects use OpenAI, Anthropic, Google, and open-source models
- Active Community: Regular updates and new project additions
🗂️ Project Categories
🤖 AI Agents
Intelligent AI agents for various automation tasks.
- Multi-Agent Financial Analyst — Team of specialized agents for comprehensive financial analysis.
- FinAgent — Financial assistant agent for stock market analysis and insights.
- Daily AI News Digest — Automated daily digest from 92 Karpathy-curated tech blogs delivered to Telegram every morning. MiniMax M2.7 scores articles from the last 24 hours and surfaces the 3 most significant stories.
- Agentic Form Filler — Agentic form-filling agent using Landing AI for layout parsing and MiniMax M2.7 for multi-turn data gathering.
- AI Travel Planning Agent — Multi-agent travel planner that turns a single natural language request into a complete trip plan with flights, hotels, and a day-by-day itinerary.
- Competitive Intelligence Agent — Generates strategic sales battlecards by analyzing competitors through the lens of your own business context.
- Multi-Agent Research Assistant (AG2) — Multi-agent research pipeline using AG2 where three specialists collaborate to research any topic and produce a structured report.
- Self-Reflective Agentic RAG — LangGraph RAG system that grades retrieved context, rewrites the query if needed, and generates an answer only once the context passes validation.
- Agentic SQL Search — Natural language to SQL agent powered by Gemma 4 that writes, executes, and explains queries against an e-commerce database.
- Stock Portfolio Analyst — Portfolio analysis agent built with Agno and DeepSeek-V4-Flash. Fetches live market data via YFinance and generates a report covering P&L, concentration risk, and rebalancing recommendations.
- Eagle Eye — GitHub PR review agent using OpenClaw and Telegram. Fetches diffs via GitHub MCP, performs structured code review with severity ratings, and posts feedback after user approval.
- CartMate — AI Customer Support Agent — Memory-powered e-commerce support agent built with Mem0 and Mistral Small 4 that remembers customers and picks up conversations where they left off.
- Multi-Agent Coding Assistant — Four-stage coding pipeline powered by Mistral Small 4 and LangChain. A Planner, Coder, and Reviewer agent collaborate to produce a polished final implementation.
- Startup Analyst — Startup due-diligence agent powered by MiniMax M2.5. Scrapes a company’s site with Firecrawl and produces an investment-grade report covering market position, financials, team, and risks.
- Research Team — Multi-agent research system powered by MiniMax M2.5. Seek searches the web, Scout navigates internal documents, and a team leader synthesises findings into a structured report.
- GitHub Intelligence Agent — GitHub research agent powered by Gemini 3 Flash and GitHub’s official MCP server. Ask anything about repos, contributors, issues, or codebases.
- Smolagents Code Agent — Agentic task runner powered by Mistral Small 4 and HuggingFace smolagents. Writes and executes Python code at each step using DuckDuckGo and Wikipedia.
- Agent Discovery Agent — Searches and compares AI agents across NANDA, MCP, Virtuals Protocol, A2A, and ERC-8004 through a single natural language interface. Powered by Gemini 3 Flash.
- Cal Scheduling Agent — Conversational scheduling assistant that manages Cal.com appointments through natural language. Book, reschedule, cancel, and check availability with automatic timezone handling.
- Hacker News Newsletter Agent — Fetches the 10 latest Hacker News stories, scrapes full article content with Trafilatura, generates a structured HTML newsletter with Gemma 4, and delivers it to your inbox via Gmail SMTP.
- Hotel Finder Agent — Conversational hotel search agent powered by qwen3.6-flash via Orq.ai and the Trivago MCP Server. Search by location, dates, guest count, price range, star rating, and amenities.
- Marketing Strategy Agent — Multi-agent marketing campaign generator. A Market Analyst (with Serper web search), Strategy Officer, and Creative Director run sequentially to produce market research, a full strategy, and creative campaign content. Powered by deepseek-v4-flash via Orq.ai.
- Brand Monitor — Monitors brand mentions across Web, YouTube, Twitter/X, and LinkedIn in a single run. Scrapingdog collects platform data and DeepSeek V4 Flash produces a structured intelligence brief per channel.
- AI Debate Agent - Two LLM debaters argue opposing sides of any topic you choose. A judge scores each turn and declares a winner.
- Browser Automation Agent - Takes a natural language instruction and autonomously navigates the web to complete it using browser-use.
- Documentation QnA Agent - Chat with any documentation by URL. Uses Fetch MCP and DeepSeek V4 Flash on NVIDIA NIM.
- Job Posting Agent - Generates tailored job postings from a company name and role using DeepSeek V4 Flash on NVIDIA NIM.
- LangChain Data Agent - Query the Chinook SQLite database in plain English through a conversational Streamlit chat interface.
- Travel Planner Agent - AI trip planning assistant covering weather, budget, packing lists, and day-by-day itineraries from a single request.
- Personal Finance Agent - Upload a bank statement CSV, auto-categorize transactions, and ask natural language questions about your spending. Powered by a LangChain tool-calling agent backed by Orq.ai with SQLite persistence.
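The Self-Reflective Agentic RAG entry in this list describes a loop: grade the retrieved context, rewrite the query if the grade fails, and generate an answer only once the context passes. A minimal sketch of that control flow in plain Python follows. The project itself is built on LangGraph, which is not used here; the `retrieve`, `grade`, `rewrite`, and `generate` callables, the `max_rewrites` limit, and the toy data are all hypothetical stand-ins for illustration.

```python
from typing import Callable, List


def self_reflective_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, List[str]], bool],
    rewrite: Callable[[str], str],
    generate: Callable[[str, List[str]], str],
    max_rewrites: int = 2,
) -> str:
    """Answer only once the retrieved context passes grading.

    All four callables are hypothetical stand-ins for LLM or retriever calls.
    """
    query = question
    for attempt in range(max_rewrites + 1):
        context = retrieve(query)
        if grade(question, context):
            return generate(question, context)
        if attempt < max_rewrites:
            query = rewrite(query)  # try again with a reformulated query
    return "No sufficiently relevant context found."


# Toy stand-ins so the sketch runs without any model or vector store.
DOCS = {
    "langgraph": "LangGraph builds stateful agent graphs.",
    "bm25": "BM25 is a lexical ranking function.",
}


def toy_retrieve(query: str) -> List[str]:
    return [text for key, text in DOCS.items() if key in query.lower()]


def toy_grade(question: str, context: List[str]) -> bool:
    return len(context) > 0


def toy_rewrite(query: str) -> str:
    # Pretend the rewriter expands a vague query with a known keyword.
    return query + " langgraph"


def toy_generate(question: str, context: List[str]) -> str:
    return "Based on context: " + " ".join(context)


answer = self_reflective_rag(
    "How do I build agent graphs?", toy_retrieve, toy_grade, toy_rewrite, toy_generate
)
print(answer)
```

The first retrieval returns nothing, so the query is rewritten once and the second retrieval passes grading. Capping the number of rewrites keeps the loop from running indefinitely when no relevant context exists.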
📸 OCR
Extracting structure and meaning from visual data and documents.
- Image-to-Structured-Data Extractor — Converts images into validated, structured JSON using Mistral Large 3 and Instructor.
- LaTeX Formula OCR — Extracts math formulas from images and PDFs into LaTeX using a local vision-language model.
- Medical Prescription Digitizer — Digitizes handwritten or printed prescriptions into structured output using Mistral Large 3, with real-time drug name validation against RxNorm.
🎧 Audio
Projects for audio understanding and analysis.
- Music Explorer — Chat with any audio file or YouTube video using Gemini 3 Flash. Ask for transcriptions, emotion analysis, instrument identification, and timestamp-aware breakdowns.
- Multilingual Audio Translator — Upload or record audio in any language, get it transcribed with faster-whisper, translated via Gemini, and played back as synthesized speech using Kokoro TTS.
🎬 Multimodal
Projects combining vision, video, and language models.
- GLM-OCR Pro — Structured document extraction using GLM-OCR via Ollama, transforming images and PDFs into formatted Markdown locally.
- Video Understanding Agent — Summarizes YouTube videos into chapters, key takeaways, and action items using Gemini Flash.
- Multimodal Weather App — Upload a map image and get live weather. Mistral Small 4 identifies the city via vision, then fetches real-time conditions through native tool calling.
- Multimodal RAG — RAG system that ingests text, URLs, PDFs, images, audio, and video into a shared ChromaDB index. Gemini Embedding 2 handles retrieval and Gemini 3 Flash generates grounded answers, passing actual file URIs for media sources.
- Image Question Answering — Upload a PDF, select a page, and ask visual questions answered by Gemma 4 with thinking mode. PyMuPDF renders each page to a full-resolution image for grounded reasoning over charts, tables, and figures.
- Medical Document Parser - Extracts a structured clinical profile from medical PDFs and images using Gemma 4 vision.
📚 RAG Applications
Retrieval-Augmented Generation systems for knowledge-enhanced AI applications.
- Agentic RAG with O3-Mini & DuckDuckGo — RAG system using O3-Mini with DuckDuckGo for real-time web search.
- Agentic RAG with Qwen & FireCrawl — RAG system using Qwen and FireCrawl for web scraping and retrieval.
- Vision RAG — Multimodal RAG system for processing and querying visual content.
- Clinical RAG with ADE — High-precision clinical RAG using LandingAI ADE for visual-first document parsing and Mistral Large for grounded reasoning.
- YouTube Transcript RAG — Chat with any YouTube video using Whisper transcription, ChromaDB retrieval, and Mistral Small 4, with timestamp-linked answers.
- GraphRAG Knowledge System — Builds a local knowledge graph from uploaded documents using Mistral Small 4 and NetworkX, supporting both entity-level and thematic queries.
- Hybrid RAG System — Indexes documents into a knowledge graph and a vector store in parallel. Mistral Small 4 answers questions with fused context from both retrieval paths.
- HyDE RAG — RAG pipeline using Hypothetical Document Embeddings. Gemini 3 Flash generates hypothetical answers, Gemini Embedding 2 embeds and averages them, and the result retrieves more precise chunks from ChromaDB.
- Rock Music RAG — Custom rock music knowledge base built from Wikipedia. Add any band, ask questions across all of them, and get sourced answers powered by BM25 retrieval and Gemma 4.
- RAG Agent with Database Routing — Routes queries across three specialized Qdrant databases (products, support, financial) using an Agno router agent. Falls back to a LangGraph ReAct web search agent when no relevant documents are found.
- Reasoning RAG - Ask questions against any web source and get cited answers with a live, step-by-step reasoning trace via Gradio.
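The HyDE RAG entry in this list describes generating hypothetical answers, embedding and averaging them, and using the averaged vector to retrieve chunks. The sketch below illustrates only that averaging-then-retrieval step. It assumes a toy bag-of-words `embed` function and an in-memory list of chunks; the project itself uses Gemini Embedding 2 and ChromaDB, neither of which is called here, and the vocabulary, hypothetical answers, and chunks are invented for illustration.

```python
import math
from typing import List

# Hypothetical toy vocabulary; a real pipeline would call an embedding model.
VOCAB = ["guitar", "drums", "tour", "album", "vocals"]


def embed(text: str) -> List[float]:
    """Toy bag-of-words embedding: count of each vocabulary word."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def average(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hyde_retrieve(hypothetical_answers: List[str], chunks: List[str], k: int = 1) -> List[str]:
    """Rank chunks against the averaged embedding of the hypothetical answers."""
    query_vector = average([embed(answer) for answer in hypothetical_answers])
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True)
    return ranked[:k]


# In HyDE these would be LLM-generated guesses at the answer; hard-coded here.
hypotheticals = [
    "the album features heavy guitar and loud drums",
    "guitar riffs dominate the album",
]
chunks = [
    "the band announced a world tour",
    "the new album is driven by guitar and drums",
    "the singer is known for raw vocals",
]
top = hyde_retrieve(hypotheticals, chunks)
print(top)
```

Averaging several hypothetical answers smooths out the quirks of any single generated guess before the vector is compared against the indexed chunks.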
🤝 Contributing
We welcome contributions! Whether you’re adding new projects, improving existing ones, or fixing bugs, your help makes this repository better for everyone.
How to Contribute
- Read the guidelines: Check CONTRIBUTING.md for detailed instructions
- Create an issue: Propose your project or improvement
- Follow the structure: Use the appropriate category folder
- Submit a PR: One project per pull request
Project Structure Requirements
- Each project must be in its own folder within the appropriate category
- Must include a comprehensive README.md (use our template)
- Must include requirements.txt or pyproject.toml
- Must include .env.example for required API keys
- Follow snake_case naming convention
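The requirements above can be checked mechanically before opening a pull request. The following is a minimal sketch, not a tool shipped with the repository: the `check_project` helper, the exact snake_case pattern (lowercase letters, digits, and underscores), and the demo folder name are assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Assumed reading of "snake_case": lowercase words or digits joined by underscores.
SNAKE_CASE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")


def check_project(project_dir: Path) -> List[str]:
    """Return a list of problems; an empty list means the folder passes."""
    problems = []
    if not SNAKE_CASE.match(project_dir.name):
        problems.append(f"folder name '{project_dir.name}' is not snake_case")
    if not (project_dir / "README.md").is_file():
        problems.append("missing README.md")
    if not any((project_dir / name).is_file() for name in ("requirements.txt", "pyproject.toml")):
        problems.append("missing requirements.txt or pyproject.toml")
    if not (project_dir / ".env.example").is_file():
        problems.append("missing .env.example")
    return problems


# Demonstration against a throwaway folder (hypothetical project name).
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "hotel_finder_agent"
    project.mkdir()
    (project / "README.md").write_text("# Hotel Finder Agent\n")
    (project / "requirements.txt").write_text("")
    before = check_project(project)   # .env.example not created yet
    (project / ".env.example").write_text("API_KEY=\n")
    after = check_project(project)

print(before)
print(after)
```

Returning a list of problems rather than raising on the first failure lets a contributor see everything that needs fixing in one run.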
📜 License
This repository is licensed under the MIT License. See the LICENSE file for details.
🙏 Acknowledgments
Thank you to all contributors who have helped build this collection of AI engineering projects!
Built with ❤️ by the AI Engineering Community
For sponsorship or collaboration inquiries, reach the maintainer at [email protected].