Language Modeling with Limited Data, Infinite Compute
March 2026
NanoGPT Slowrun is an open effort to implement data-efficient learning algorithms; 5.5x data efficiency in the first week and improving...
Similar Articles (10 found)
68.7% similar
Techniques for training large neural networks
Large neural networks are at the core of many recent advances in AI, but training them is a difficult en...
67.3% similar
> the generation of 281,128 augmented examples, from which 1,000 were
held out as a benchmark test set.
This model is trained on a custom dataset of 2...
67.0% similar
Writing an LLM from scratch, part 22 -- finally training our LLM!
This post wraps up my notes on chapter 5 of Sebastian Raschka's book "Build a Large ...
66.8% similar
There has been a lot of interest on HN in fine-tuning open-source LLMs recently (e.g., Anyscale's post at
https://news.ycombinator.com/item?id=37090632)...
66.8% similar
Day zero model performance optimization work is a mix of experimentation, bug fixing, and benchmarking guided by intuition and experience. This writeu...
65.2% similar
MicroGPT explained interactively
Andrej Karpathy wrote a 200-line Python script that trains and runs a GPT from scratch, with no libraries or dependen...
64.8% similar
Sam said yesterday that chatgpt handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without insane VRAM or pain...
63.6% similar
Things we learned about LLMs in 2024
31st December 2024
A lot has happened in the world of Large Language Models over the course of 2024. Here's a rev...
63.3% similar
The Bitter Lesson is Misunderstood
Together, the Bitter Lesson and Scaling Laws reveal that the god of Compute we worship is yoked to an even greater ...
63.2% similar
Understanding LLM Inference Engines: Inside Nano-vLLM (Part 1)
Architecture, Scheduling, and the Path from Prompt to Token
When deploying large langua...