Topic modeling remains a critical tool in the AI and NLP toolbox. While large language models (LLMs) handle text exceptionally well, extracting high-level topics from massive datasets still requires d...
Similar Articles (10 found)
62.2% similar
Writing an LLM from scratch, part 22 -- finally training our LLM!
This post wraps up my notes on chapter 5 of Sebastian Raschka's book "Build a Large ...
60.2% similar
word2vec-style vector arithmetic on docs embeddings
2025 October 29
word2vec popularized the idea of representing words as vectors where semantically...
59.7% similar
> the generation of 281,128 augmented examples, from which 1,000 were
held out as a benchmark test set.
This model is trained on a custom dataset of 2...
58.5% similar
Multi-modal ML with OpenAI's CLIP
Language models (LMs) can not rely on language alone. That is the idea behind the "Experience Grounds Language" pape...
58.4% similar
MicroGPT explained interactively
Andrej Karpathy wrote a 200-line Python script that trains and runs a GPT from scratch, with no libraries or dependen...
58.2% similar
How big are our embeddings now and why?
#embeddings #openai #anthropic #huggingface #dimensionality
A few years ago, I wrote a paper on embeddings. At...
57.4% similar
2 Years of ML vs. 1 Month of Prompting
November 7, 2025
Recalls at major automakers cost hundreds of millions of dollars a year. It's a huge issue. To...
56.7% similar
The Illustrated Word2vec
Discussions:
Hacker News (347 points, 37 comments), Reddit r/MachineLearning (151 points, 19 comments)
Translations: Chinese ...
56.2% similar
There has been a lot of interest on HN in fine-tuning open-source LLMs recently (e.g., Anyscale's post at https://news.ycombinator.com/item?id=37090632)...
56.0% similar
Things we learned about LLMs in 2024
31st December 2024
A lot has happened in the world of Large Language Models over the course of 2024. Here's a rev...