Tag
The Open Repair Data Standard (ORDS) defines a shared approach for collecting and sharing repair data on small electrical and electronic devices, enabling aggregation and analysis of repair trends across different community repair groups.
A user tested five AI models summarizing immigration news articles and found that all models inherited the framing of the source text, sounding neutral but shaping reader understanding through emphasis and omission. The study is small and exploratory, with open data available.
Learn how to set up and use Common Crawl data locally for web data processing tasks.
Paperclip adds over 220K FDA regulatory documents and 1M clinical trials from multiple registries, enabling AI agents to search and reason over clinical and regulatory data without web search. This update allows users to query FDA documents, ClinicalTrials.gov, and international registries via a unified filesystem interface.
A free multilingual Indic corpus of 9.8 million documents (11 languages, CC0 license) has been released on HuggingFace. It contains approximately 8.4 billion tokens and was built for multilingual research.
Protovoters is an open-source tool that lets users build local voter files from public data and use them with standard geospatial software, aiming to replace expensive proprietary platforms like VAN or NationBuilder.
Version 2 of the GM-SEUS open dataset now maps 3.4 million U.S. solar panels plus new rooftop arrays, up from 2.9 million in v1.
DR-Venus-4B is a 4B-parameter deep-research agent trained on only 10K open samples via agentic SFT+RL with turn-level rewards, outperforming prior sub-9B agents and rivaling 30B models on research benchmarks while staying deployable on edge devices.
Katzilla is a product designed to provide easy government data access for citizens, optimized for AI consumption.
Bert Hubert describes how he added hyperlinks to Tweede Kamer (Dutch House of Representatives) parliamentary papers and linked the motion numbers mentioned in debates to the corresponding documents, and reflects on the slow pace of innovation.