Show HN: Large Scale Article Extract of Newspapers 1730s-1960s
Author: brettnbutter
Source: Hacker News
Hello HN, over the past 7 months I've spent nearly 3,000 hours building SNEWPAPERS, the first historical newspaper archive with full-text extractions, nearly perfect OCR, a vast categorization taxonomy and, of course, semantic and agentic search capabilities.

Problem: I wanted to search through newspaper archives, but every service I tried only lets you search by keywords and dates, and gives you back raw images of the pages, far too many of them and with no context. A sea of noise.

Solution: I taught machines how to read the newspapers, and so far I've extracted the content from more than 600k pages (about 5TB) from the Chronicling America collection.
Problems I had to deal with included an endless variety of layouts, font sizes, scan qualities, resolutions and aspect ratios, as well as navigating around the images on the page.
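Multi-column layouts are one of the variations listed above. A common classical way to split a page into text columns is a vertical projection profile: sum the ink in each pixel column and treat sufficiently long whitespace runs as gutters. The following is a minimal sketch of that technique only, not the method SNEWPAPERS uses (the post doesn't describe its pipeline). The function name `column_spans`, the 0/1 list-of-rows page format, and the `max_ink`/`min_gap` thresholds are all hypothetical.

```python
from typing import List, Tuple


def column_spans(page: List[List[int]], max_ink: int = 0,
                 min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split a binarized page into text columns via a vertical projection profile.

    Hypothetical input format: page is a list of pixel rows, 1 = ink, 0 = background.
    max_ink: a pixel column with at most this much ink counts as whitespace.
    min_gap: minimum whitespace run (>= 1) treated as a gutter between columns.
    Returns half-open (start, end) x-ranges of the detected text columns.
    """
    if not page:
        return []
    width = len(page[0])
    # Vertical projection: total ink in each pixel column.
    profile = [sum(row[x] for row in page) for x in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None  # x where the current text column began, if inside one
    gap = 0       # length of the current whitespace run
    for x, count in enumerate(profile):
        if count <= max_ink:
            gap += 1
            if start is not None and gap == min_gap:
                # Gutter confirmed: end the column at the first blank column.
                spans.append((start, x - min_gap + 1))
                start = None
        else:
            if start is None:
                start = x
            gap = 0  # short blank runs stay inside the current column
    if start is not None:
        # Close the last column, trimming a trailing blank run shorter than min_gap.
        spans.append((start, width - gap))
    return spans


page = [
    [0, 1, 1, 0, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 0, 1, 1, 0, 1],
]
print(column_spans(page))  # [(1, 3), (6, 10)]
```

The one-pixel gap at x = 8 is narrower than `min_gap`, so it stays inside the second column. On real scans this only works after binarization and deskewing, and it breaks down on headlines or images that span several columns, which is part of why varied layouts are hard.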