@mdancho84: This guy built an entire AI data science team in Python. Then open-sourced (100% free). It automates data science workf…
Summary
An open-source Python library that creates an AI-powered data science team to automate workflows from data loading to modeling, with a visual pipeline studio for reproducibility.
Cached at: 06/03/26, 01:40 AM
This guy built an entire AI data science team in Python. Then open-sourced (100% free).
It automates data science workflows with AI, including data loading, cleaning, exploratory analysis, and feature engineering. And it tracks each step in a 100% reproducible pipeline.
00:00 Project Overview
01:32 Diving into the AI Data Science Workflow and Data Loading
02:10 Data Wrangling and Cleaning
03:33 Data Visualization Insights & Plotting
04:08 Feature Engineering
05:00 Live 1-Hour Workshop
05:44 AI Data Science Team Python Library
AI Data Science Team On GitHub (Give it a Star) https://github.com/business-science/ai-data-science-team…
Want to learn how to build + ship AI and Data Science projects (that businesses actually want in 2026)?
On June 24th, I am hosting a free workshop to help you get started with AI + DS projects in Python.
Register here (500 seats): https://learn.business-science.io/ai-register
business-science/ai-data-science-team
Source: https://github.com/business-science/ai-data-science-team
AI Data Science Team
AI Data Science Team is a Python library of specialized agents for common data science workflows, plus a flagship app: AI Pipeline Studio. The Studio turns your work into a visual, reproducible pipeline, while the AI team handles data loading, cleaning, visualization, and modeling.
Status: Beta. Breaking changes may occur until 0.1.0.
Please ⭐ us on GitHub (it takes 2 seconds and means a lot).
AI Pipeline Studio (Flagship App)
AI Pipeline Studio is the main example of the AI Data Science Team in action.

Highlights:
- Pipeline-first workspace: Visual Editor, Table, Chart, EDA, Code, Model, Predictions, MLflow
- Manual + AI steps with lineage and reproducible scripts
- Multi-dataset handling and merge workflows
- Project saves: metadata-only or full-data
- Storage footprint controls and rehydrate workflows
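To make the idea of "manual + AI steps with lineage" concrete, here is a minimal sketch of a pipeline that records who authored each step and replays the steps in order. It is not the Studio's implementation: `Step`, `Pipeline`, and the two example functions are hypothetical names, and plain lists of dicts stand in for DataFrames.

```python
from dataclasses import dataclass, field
from typing import Callable

Rows = list[dict]  # a tiny stand-in for a DataFrame


@dataclass
class Step:
    name: str    # human-readable label shown in the lineage
    source: str  # "manual" or "ai": who authored the step
    func: Callable[[Rows], Rows]


@dataclass
class Pipeline:
    steps: list[Step] = field(default_factory=list)

    def add(self, name: str, source: str, func: Callable[[Rows], Rows]) -> "Pipeline":
        self.steps.append(Step(name, source, func))
        return self

    def run(self, rows: Rows) -> Rows:
        # Replaying the same ordered steps on the same input gives the same output.
        for step in self.steps:
            rows = step.func(rows)
        return rows

    def lineage(self) -> list[str]:
        return [f"{i}. [{s.source}] {s.name}" for i, s in enumerate(self.steps, start=1)]


def drop_missing_price(rows: Rows) -> Rows:
    return [r for r in rows if r.get("price") is not None]


def add_price_with_tax(rows: Rows) -> Rows:
    # Hypothetical 10% tax rate, used only to illustrate a derived feature.
    return [{**r, "price_with_tax": round(r["price"] * 1.1, 2)} for r in rows]


pipeline = (
    Pipeline()
    .add("drop rows with missing price", "manual", drop_missing_price)
    .add("add price_with_tax feature", "ai", add_price_with_tax)
)

raw = [{"sku": "A", "price": 10.0}, {"sku": "B", "price": None}]
result = pipeline.run(raw)
print(result)
print("\n".join(pipeline.lineage()))
```

Because every step is a named, ordered function, the lineage list doubles as a recipe: running the pipeline again on the same raw data reproduces the same result.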
Run it:
streamlit run apps/ai-pipeline-studio-app/app.py
Full app docs: apps/ai-pipeline-studio-app/README.md
Quickstart
Requirements
- Python 3.10+
- OpenAI API key (or Ollama for local models)
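A quick way to confirm these requirements before installing is a small preflight script. The sketch below is not part of the library; `check_requirements` is a hypothetical helper. It assumes the OpenAI key is supplied through the conventional `OPENAI_API_KEY` environment variable and that a local Ollama install exposes an `ollama` executable on the PATH.

```python
import os
import shutil
import sys


def check_requirements(env=None, version=None, ollama_on_path=None):
    """Return a list of human-readable problems; an empty list means ready."""
    # Arguments default to the real environment; tests can pass explicit values.
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else version
    if ollama_on_path is None:
        ollama_on_path = shutil.which("ollama") is not None

    problems = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    # Either backend is enough: a hosted OpenAI key or a local Ollama install.
    if not env.get("OPENAI_API_KEY") and not ollama_on_path:
        problems.append("Set OPENAI_API_KEY or install Ollama for local models")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Missing:", problem)
```

Running the script prints nothing when both requirements are satisfied.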
Install the app and library
Clone the repo and install in editable mode:
git clone https://github.com/business-science/ai-data-science-team.git
cd ai-data-science-team
pip install -e .
Run the AI Pipeline Studio app
streamlit run apps/ai-pipeline-studio-app/app.py
Library Overview
The repository includes both the AI Pipeline Studio app and the underlying AI Data Science Team library. The library provides agent building blocks and multi-agent workflows for:
- Data loading and inspection
- Cleaning, wrangling, and feature engineering
- Visualization and EDA
- Modeling and evaluation (H2O + MLflow tools)
- SQL database interaction
Agents (Snapshot)
Agent examples live in examples/. Notable agents:
- Data Loader Tools Agent
- Data Wrangling Agent
- Data Cleaning Agent
- Data Visualization Agent
- EDA Tools Agent
- Feature Engineering Agent
- SQL Database Agent
- H2O ML Agent
- MLflow Tools Agent
- Multi-agent workflows (e.g., Pandas Data Analyst, SQL Data Analyst)
- Supervisor Agent (oversees other agents)
- Custom tools for data science tasks
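The supervisor pattern in the list above can be illustrated with a toy router. This is only a sketch of the general idea, not the library's Supervisor Agent: the worker functions and the keyword table are hypothetical, and a real supervisor would typically ask an LLM to choose the worker rather than match keywords.

```python
from typing import Callable


# Hypothetical workers: plain functions standing in for LLM-backed agents.
def data_cleaning_worker(task: str) -> str:
    return f"cleaning: {task}"


def data_visualization_worker(task: str) -> str:
    return f"visualization: {task}"


def feature_engineering_worker(task: str) -> str:
    return f"feature engineering: {task}"


# Assumed keyword table, used here in place of an LLM routing decision.
ROUTES: list[tuple[tuple[str, ...], Callable[[str], str]]] = [
    (("clean", "missing", "duplicate"), data_cleaning_worker),
    (("plot", "chart", "visualize"), data_visualization_worker),
    (("feature", "encode"), feature_engineering_worker),
]


def supervisor(task: str) -> str:
    """Send the task to the first worker whose keywords match it."""
    lowered = task.lower()
    for keywords, worker in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return worker(task)
    raise ValueError(f"No worker can handle: {task!r}")


print(supervisor("Plot sales by month"))
```

Keeping each worker narrow and putting the routing decision in one place is what lets a supervisor coordinate many specialized agents without each one knowing about the others.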
Apps
See all apps in apps/. Notable apps:
- AI Pipeline Studio: apps/ai-pipeline-studio-app/
- EDA Explorer App: apps/exploratory-copilot-app/
- Pandas Data Analyst App: apps/pandas-data-analyst-app/
Use OpenAI
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model_name="gpt-4.1-mini",
)
Use Ollama (Local LLM)
ollama serve
ollama pull llama3.1:8b
from langchain_ollama import ChatOllama
llm = ChatOllama(
    model="llama3.1:8b",
)
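If you switch between the two backends often, the two snippets can be wrapped in one small factory. This is a convenience sketch rather than anything the library ships: `build_llm` and `MODEL_NAMES` are hypothetical names, and the model identifiers are simply the ones shown above. The imports are deferred so that only the package for the chosen provider needs to be installed.

```python
# Model identifiers copied from the OpenAI and Ollama examples in this README.
MODEL_NAMES = {"openai": "gpt-4.1-mini", "ollama": "llama3.1:8b"}


def build_llm(provider: str):
    """Return a LangChain chat model for "openai" or "ollama"."""
    if provider not in MODEL_NAMES:
        raise ValueError(f"Unknown provider {provider!r}; expected one of {sorted(MODEL_NAMES)}")
    model = MODEL_NAMES[provider]
    if provider == "openai":
        # Imported lazily; the client reads OPENAI_API_KEY from the environment.
        from langchain_openai import ChatOpenAI

        return ChatOpenAI(model_name=model)
    # Imported lazily; expects `ollama serve` to be running and the model pulled.
    from langchain_ollama import ChatOllama

    return ChatOllama(model=model)
```

For example, `llm = build_llm("ollama")` gives a local model, and the resulting `llm` can be used anywhere the snippets above use theirs.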
Next-Gen AI Agentic Workshop
Want to learn how to build AI agents and AI apps for real data science workflows? Join my next‑gen AI workshop: https://learn.business-science.io/ai-register