Writing an LLM from scratch, part 22 -- finally training our LLM!
This post wraps up my notes on chapter 5 of Sebastian Raschka's book "Build a Large Language Model (from Scratch)". Understanding cros...