Talkie-1930 Language Model Released
Model is available via Hugging Face and GitHub.
Training data cuts off at 1930; works published that year entered the US public domain in 2026.
Inspired by Owain Evans's 'vintage LLMs' concept.
Trained on 260 billion tokens of text affected by OCR noise.
With noisy OCR text, the model reaches only 30% of the performance achieved with human-transcribed text.
Exhibits 'temporal leakage' by retaining knowledge of post-1930 events.
Designed for experimental temporal generalization, not production use.
Team plans to scale model for multi-agent historical simulations.
Research aims to bridge STEM and humanities through open-source frameworks.