open-data

#open-data

Open Repair Data Standard – Open Repair Alliance

Hacker News Top · 4d ago

The Open Repair Data Standard (ORDS) defines a shared approach for collecting and sharing repair data about small electrical and electronic devices, so that repair trends can be aggregated and analyzed across community repair groups.

I tested 5 AI models summarizing the same news articles. They all inherited the source's framing, even when trying to be neutral. i'm rookie, be kind

Reddit r/ArtificialInteligence · 2026-05-30

A user tested five AI models summarizing immigration news articles and found that all models inherited the framing of the source text, sounding neutral but shaping reader understanding through emphasis and omission. The study is small and exploratory, with open data available.

@lhoestq: You don't know you actually need local Common Crawl

X AI KOLs Timeline · 2026-05-22

Learn how to set up and use Common Crawl data locally for web data processing tasks.

@james_y_zou: We added >220K FDA regulatory and >1M clinical trial docs to #paperclip. All natively indexed for agents and free. Now …

X AI KOLs Timeline · 2026-05-21

Paperclip adds over 220K FDA regulatory documents and over 1M clinical trial documents from multiple registries, so AI agents can search and reason over clinical and regulatory data without web search. Users can query FDA documents, ClinicalTrials.gov, and international registries through a unified filesystem interface.

Released a free 9.8M doc Indic multilingual corpus — Hindi, Bengali, Tamil, Telugu + 7 more (CC0, HuggingFace) [P]

Reddit r/MachineLearning · 2026-05-18

Released a free 9.8 million document multilingual Indic corpus (11 languages, CC0 license) on HuggingFace, containing approximately 8.4 billion tokens, built for multilingual research.

Protovoters: Free, accessible voter files for democracy

Lobsters Hottest · 2026-04-23

Protovoters is an open-source tool that lets users build local voter files from public data and use them with standard geospatial software, aiming to replace expensive proprietary platforms like VAN or NationBuilder.

3.4M Solar Panels

Hacker News Top · 2026-04-22

Version 2 of the GM-SEUS open dataset now maps 3.4 million U.S. solar panels plus new rooftop arrays, up from 2.9 million in v1.

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Hugging Face Daily Papers · 2026-04-21

DR-Venus-4B is a 4B-parameter deep-research agent trained on only 10K open samples via agentic SFT+RL with turn-level rewards. It outperforms prior sub-9B agents and rivals 30B models on research benchmarks while remaining deployable on edge devices.

Katzilla

Product Hunt · 2026-04-18

Katzilla is a product that gives citizens easy access to government data in a form optimized for AI consumption.

Betere Kamerstukken, en hoe lastig innovatie is (Better Parliamentary Papers, and How Hard Innovation Is)

Bert Hubert · 2026-02-16

Bert Hubert describes how he added hyperlinks to documents of the Tweede Kamer (the Dutch House of Representatives) and linked motion numbers mentioned in debates to the corresponding documents, and reflects on how slowly innovation moves.
