Day-zero model performance optimization is a mix of experimentation, bug fixing, and benchmarking, guided by intuition and experience. This writeup outlines the process we followed to achieve SOTA...
Similar Articles (10 found)
67.0% similar
There has been a lot of interest on HN in fine-tuning open-source LLMs recently (e.g. Anyscale's post at https://news.ycombinator.com/item?id=37090632)...
67.0% similar
Each month, this newsletter is read by over 45K+ operators, investors, and tech / product leaders and executives. If you found value in this newslette...
66.5% similar
Sam said yesterday that ChatGPT handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without insane VRAM or pain...
66.5% similar
OpenAI just released two open-weight models, gpt-oss-120b and gpt-oss-20b, after months of anticipation (you can try them here).
That means anyone with ...
62.2% similar
When we launched Skald, we wanted it to not only be self-hostable, but also for one to be able to run it without sending any data to third-parties.
Wi...
62.0% similar
Why DeepSeek is cheap at scale but expensive to run locally
Why is DeepSeek-V3 supposedly fast and cheap to serve at scale, but too slow and expensive...
62.0% similar
> the generation of 281,128 augmented examples, from which 1,000 were held out as a benchmark test set.
This model is trained on a custom dataset of 2...
61.9% similar
What happens when coding agents stop feeling like dialup?
It's funny how quickly humans adjust to new technology. Only a few months ago Claude Code an...
61.5% similar
Techniques for training large neural networks
Large neural networks are at the core of many recent advances in AI, but training them is a difficult en...
60.9% similar
Things we learned about LLMs in 2024
31st December 2024
A lot has happened in the world of Large Language Models over the course of 2024. Here's a rev...