Anthropic shared best practices for implementing self-service data analysis with Claude, reporting 95% automation of business analysis queries at an overall accuracy of about 95%, and detailed its agent analysis tech stack, three main failure modes, and the corresponding countermeasures.
databow is a new open-source Rust CLI tool that provides a unified interface for querying any database with an ADBC driver, supporting over 30 databases including PostgreSQL, DuckDB, and Snowflake.
Ingestr is an open-source CLI tool for high-speed data movement between any source and destination, supporting numerous databases, data warehouses, and SaaS applications.
This paper formalizes Autonomous Agentic Data Engineering, where LLMs act as autonomous data engineers to curate and optimize training data for specialized domains, and reports that GPT-5.2 improves student model performance by 57.29%.
Streambed is an open-source CDC engine that streams Postgres WAL changes to Iceberg tables on S3, with a built-in query server using DuckDB that speaks the Postgres wire protocol.
GitHub's Open Source Friday event features Elvis Kahoro from dltHub discussing dlt, an open-source Python library for building data pipelines without complexity.
This paper introduces Autonomous Agentic Data Engineering, a task where LLMs autonomously execute end-to-end data curation pipelines for model specialization, showing significant performance gains (e.g., GPT-5.2 improves a student model by 57.29%).
This article provides a comprehensive step-by-step breakdown of how modern Large Language Models like ChatGPT and Claude are built from scratch, covering data collection, tokenization, transformer architectures, training, alignment, and deployment.
Hermes Agent receives major improvements to session storage and access, saving 20-40% of disk space and improving speed.
DuckDB introduces 'Quack', a new client-server protocol that enables DuckDB instances to communicate via HTTP, supporting concurrent writers and remote access while maintaining simplicity and performance.
DataTalksClub is offering a free 9-week data engineering zoomcamp course covering containers, orchestration, data warehousing, analytics, and batch and streaming processing.
LakeSail releases Sail, a Rust-native rewrite of Apache Spark that achieves 8x speed-ups and 94% lower infrastructure costs while maintaining full API compatibility.