@mdancho84: This guy built an entire AI data science team in Python. Then open-sourced (100% free). It automates data science workf…

Summary

An open-source Python library that creates an AI-powered data science team to automate workflows from data loading to modeling, with a visual pipeline studio for reproducibility.

Cached at: 06/03/26, 01:40 AM

This guy built an entire AI data science team in Python. Then open-sourced (100% free).

It automates data science workflows with AI, including data loading, cleaning, exploratory analysis, and feature engineering. And it tracks each step in a 100% reproducible pipeline.

00:00 Project Overview
01:32 Diving into the AI Data Science Workflow and Data Loading
02:10 Data Wrangling and Cleaning
03:33 Data Visualization Insights & Plotting
04:08 Feature Engineering
05:00 Live 1-Hour Workshop
05:44 AI Data Science Team Python Library

AI Data Science Team On GitHub (Give it a Star) https://github.com/business-science/ai-data-science-team…

Want to learn how to build + ship AI and Data Science projects (that businesses actually want in 2026)?

On June 24th, I am hosting a free workshop to help you get started with AI + DS projects in Python.

Register here (500 seats): https://learn.business-science.io/ai-register


business-science/ai-data-science-team

Source: https://github.com/business-science/ai-data-science-team

AI Data Science Team

AI Data Science Team is a Python library of specialized agents for common data science workflows, plus a flagship app: AI Pipeline Studio. The Studio turns your work into a visual, reproducible pipeline, while the AI team handles data loading, cleaning, visualization, and modeling.

Status: Beta. Breaking changes may occur until 0.1.0.

Please ⭐ us on GitHub (it takes 2 seconds and means a lot).

AI Pipeline Studio (Flagship App)

AI Pipeline Studio is the main example of the AI Data Science Team in action.

Highlights:

  • Pipeline-first workspace: Visual Editor, Table, Chart, EDA, Code, Model, Predictions, MLflow
  • Manual + AI steps with lineage and reproducible scripts
  • Multi-dataset handling and merge workflows
  • Project saves: metadata-only or full-data
  • Storage footprint controls and rehydrate workflows
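
To make "lineage and reproducible scripts" concrete, here is a minimal sketch of the idea in plain Python. It is not the Studio's implementation: the `Step` and `Pipeline` classes, their fields, and the hash-based step ids are all hypothetical names chosen for illustration. Each step stores its code and a pointer to its parent, and the whole pipeline can be exported as one script and replayed.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class Step:
    """One recorded pipeline step (hypothetical structure, not the Studio's)."""
    name: str
    code: str        # Python source that transforms the variable `rows`
    origin: str      # "manual" or "ai"
    parent_id: str   # id of the previous step ("" for the first step)
    step_id: str = ""

    def __post_init__(self):
        # Derive the id from the parent id plus the code, so the same
        # sequence of steps always produces the same lineage ids.
        digest = hashlib.sha256((self.parent_id + self.code).encode("utf-8"))
        self.step_id = digest.hexdigest()[:12]


@dataclass
class Pipeline:
    steps: list = field(default_factory=list)

    def add(self, name, code, origin="manual"):
        parent_id = self.steps[-1].step_id if self.steps else ""
        step = Step(name=name, code=code, origin=origin, parent_id=parent_id)
        self.steps.append(step)
        return step

    def to_script(self):
        """Emit one plain Python script that replays every step in order."""
        parts = [
            f"# step {s.step_id} ({s.origin}): {s.name}\n{s.code}"
            for s in self.steps
        ]
        return "\n\n".join(parts) + "\n"

    def run(self):
        """Replay the generated script in a fresh namespace and return it."""
        namespace = {}
        exec(self.to_script(), namespace)
        return namespace


pipeline = Pipeline()
pipeline.add("load", "rows = [{'x': 1}, {'x': None}, {'x': 3}]")
pipeline.add("drop missing", "rows = [r for r in rows if r['x'] is not None]", origin="ai")
pipeline.add("add feature", "rows = [{**r, 'x_sq': r['x'] ** 2} for r in rows]", origin="ai")

result = pipeline.run()["rows"]
print(pipeline.to_script())
```

Because the ids depend only on the code and the parent id, rebuilding the same steps in the same order gives the same lineage, which is one simple way to check that a saved pipeline has not drifted.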

Run it:

streamlit run apps/ai-pipeline-studio-app/app.py

Full app docs: apps/ai-pipeline-studio-app/README.md

Quickstart

Requirements

  • Python 3.10+
  • OpenAI API key (or Ollama for local models)

Install the app and library

Clone the repo (https://github.com/business-science/ai-data-science-team), then install it in editable mode from the repository root:

pip install -e .

Run the AI Pipeline Studio app

streamlit run apps/ai-pipeline-studio-app/app.py

Library Overview

The repository includes both the AI Pipeline Studio app and the underlying AI Data Science Team library. The library provides agent building blocks and multi-agent workflows for:

  • Data loading and inspection
  • Cleaning, wrangling, and feature engineering
  • Visualization and EDA
  • Modeling and evaluation (H2O + MLflow tools)
  • SQL database interaction

Agents (Snapshot)

Agent examples live in examples/. Notable agents:

  • Data Loader Tools Agent
  • Data Wrangling Agent
  • Data Cleaning Agent
  • Data Visualization Agent
  • EDA Tools Agent
  • Feature Engineering Agent
  • SQL Database Agent
  • H2O ML Agent
  • MLflow Tools Agent
  • Multi-agent workflows (e.g., Pandas Data Analyst, SQL Data Analyst)
  • Supervisor Agent (oversees other agents)
  • Custom tools for data science tasks
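
As a rough picture of how a supervisor can oversee specialized agents, the sketch below routes a request to small worker functions. It is an illustration of the pattern only and does not use or mirror the library's API: `Supervisor`, `cleaning_agent`, and `feature_agent` are hypothetical names, and simple keyword matching stands in for the LLM-driven routing a real supervisor would use.

```python
from typing import Callable, Dict, List

# Each "agent" here is a plain function over a list of row dicts. A real agent
# would call an LLM and generate code instead.
Agent = Callable[[List[dict]], List[dict]]


def cleaning_agent(rows):
    """Drop rows that contain any missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def feature_agent(rows):
    """Add a squared copy of the numeric column 'x'."""
    return [{**r, "x_sq": r["x"] ** 2} for r in rows]


class Supervisor:
    """Routes a request to registered agents by keyword (stand-in for LLM routing)."""

    def __init__(self):
        self.routes: Dict[str, Agent] = {}

    def register(self, keyword, agent):
        self.routes[keyword] = agent

    def plan(self, request):
        # Order the matching agents by where their keyword appears in the text.
        text = request.lower()
        hits = [(text.find(k), k) for k in self.routes if k in text]
        return [k for _, k in sorted(hits)]

    def run(self, request, rows):
        for keyword in self.plan(request):
            rows = self.routes[keyword](rows)
        return rows


supervisor = Supervisor()
supervisor.register("clean", cleaning_agent)
supervisor.register("feature", feature_agent)

data = [{"x": 2}, {"x": None}, {"x": 5}]
out = supervisor.run("Clean the data, then engineer a feature", data)
print(out)
```

The point of the pattern is that each worker stays small and single-purpose, while the supervisor owns the ordering of work.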

Apps

See all apps in apps/. Notable apps:

  • AI Pipeline Studio: apps/ai-pipeline-studio-app/
  • EDA Explorer App: apps/exploratory-copilot-app/
  • Pandas Data Analyst App: apps/pandas-data-analyst-app/

Use OpenAI

from langchain_openai import ChatOpenAI

# ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable.
llm = ChatOpenAI(
    model_name="gpt-4.1-mini",
)

Use Ollama (Local LLM)

Start the Ollama server and pull a model (shell):

ollama serve
ollama pull llama3.1:8b

Then create the chat model (Python):

from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3.1:8b",
)
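
If you switch between the two backends, a small helper can pick one at runtime. This is a sketch under assumptions, not part of the library: `choose_backend` and `make_llm` are hypothetical helpers, and the rule "use OpenAI when `OPENAI_API_KEY` is set, otherwise fall back to Ollama" is just one reasonable convention. The imports are deferred so that only the package for the chosen backend needs to be installed.

```python
import os


def choose_backend(env=None):
    """Return "openai" if an API key is present in env, otherwise "ollama"."""
    env = os.environ if env is None else env
    return "openai" if env.get("OPENAI_API_KEY") else "ollama"


def make_llm(backend=None):
    """Build a LangChain chat model for the chosen backend (hypothetical helper)."""
    backend = backend or choose_backend()
    if backend == "openai":
        # Imported lazily so Ollama-only setups do not need langchain_openai.
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model="gpt-4.1-mini")
    # Assumes `ollama serve` is running and the model has been pulled.
    from langchain_ollama import ChatOllama
    return ChatOllama(model="llama3.1:8b")


print(choose_backend({"OPENAI_API_KEY": "sk-example"}))
print(choose_backend({}))
```

Call `make_llm()` once at startup and pass the returned object wherever an `llm` is expected.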

Next-Gen AI Agentic Workshop

Want to learn how to build AI agents and AI apps for real data science workflows? Join my next‑gen AI workshop: https://learn.business-science.io/ai-register
