data-engineering

#data-engineering

@riba2534: https://x.com/riba2534/status/2062495991421616319

X AI KOLs Timeline · 2d ago Cached

Anthropic shared best practices for implementing self-service data analysis with Claude, achieving 95% automation of business analysis queries with an overall accuracy of about 95%, and detailed the agent analysis tech stack, three main failure modes, and corresponding countermeasures.

databow: a Rust CLI to query any database with an ADBC driver

Hacker News Top · 4d ago Cached

databow is a new open-source Rust CLI tool that provides a unified interface for querying any database with an ADBC driver, supporting over 30 databases including PostgreSQL, DuckDB, and Snowflake.

@svpino: Super-fast CLI tool for moving data around: $ ingestr ingest --source-uri --dest-uri Ingestr is open-source, requires a…

X AI KOLs Following · 5d ago

Ingestr is an open-source CLI tool for high-speed data movement between any source and destination, supporting numerous databases, data warehouses, and SaaS applications.

Exploring Autonomous Agentic Data Engineering for Model Specialization

arXiv cs.CL · 6d ago Cached

This paper formalizes Autonomous Agentic Data Engineering, where LLMs act as autonomous data engineers to curate and optimize training data for specialized domains, and reports that GPT-5.2 in this role improves student model performance by 57.29%.

Show HN: Streambed – Stream Postgres to Iceberg on S3, Supports Postgres Wire

Hacker News Top · 6d ago Cached

Streambed is an open-source CDC engine that streams Postgres WAL changes to Iceberg tables on S3, with a built-in query server using DuckDB that speaks the Postgres wire protocol.

@github: Build data pipelines without the complexity. Tomorrow on Open Source Friday, dev advocate Elvis Kahoro explains how @dl…

X AI KOLs Following · 2026-05-28 Cached

GitHub's Open Source Friday event features Elvis Kahoro from dltHub discussing dlt, an open-source Python library for building data pipelines without complexity.

Exploring Autonomous Agentic Data Engineering for Model Specialization

Hugging Face Daily Papers · 2026-05-28 Cached

This paper introduces Autonomous Agentic Data Engineering, a task where LLMs autonomously execute end-to-end data curation pipelines for model specialization, showing significant performance gains (e.g., GPT-5.2 improves a student model by 57.29%).

@shabnam_774: https://x.com/shabnam_774/status/2058517919760355729

X AI KOLs Timeline · 2026-05-24 Cached

This article provides a comprehensive step-by-step breakdown of how modern Large Language Models like ChatGPT and Claude are built from scratch, covering data collection, tokenization, transformer architectures, training, alignment, and deployment.

@Teknium: Our database and data engineering expert @yoniebans made some major improvements to the way sessions are stored and acc…

X AI KOLs Following · 2026-05-21 Cached

Hermes Agent received major improvements to how sessions are stored and accessed, saving 20-40% of disk space and improving speed.

Quack: The DuckDB Client-Server Protocol

Hacker News Top · 2026-05-12 Cached

DuckDB introduces 'Quack', a new client-server protocol that enables DuckDB instances to communicate via HTTP, supporting concurrent writers and remote access while maintaining simplicity and performance.

DataTalksClub/data-engineering-zoomcamp

GitHub Trending (daily) · 2026-05-29 Cached

DataTalksClub offers the Data Engineering Zoomcamp, a free 9-week course covering containers, orchestration, data warehousing, analytics, and batch and streaming processing.

@LakeSailHQ: Spark rebuilt in Rust — no JVM, 8x faster, 94% less cost.

X AI KOLs Following · 2026-04-22 Cached

LakeSail releases Sail, a Rust-native rewrite of Apache Spark with no JVM, which the company says is 8x faster and cuts infrastructure costs by 94% while maintaining full API compatibility.
