Day-zero model performance optimization work is a mix of experimentation, bug fixing, and benchmarking, guided by intuition and experience. This writeup outlines the process we followed to achieve SOTA...
Similar Articles (10 found)
67.0% similar
There has been a lot of interest on HN in fine-tuning open-source LLMs recently (eg. Anyscale's post at
https://news.ycombinator.com/item?id=37090632)...
67.0% similar
Each month, this newsletter is read by 45K+ operators, investors, and tech/product leaders and executives. If you found value in this newslette...
66.5% similar
Sam said yesterday that chatgpt handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without insane VRAM or pain...
66.5% similar
OpenAI just released two open-weight models, gpt-oss-120b and gpt-oss-20b, after months of anticipation (you can try them here).
That means anyone with ...
64.7% similar
The three types of LLM workloads and how to serve them
We hold this truth to be self-evident: not all workloads are created equal.
But for large langu...
64.1% similar
GPT-5.2
11th December 2025
OpenAI reportedly declared a "code red" on the 1st of December in response to increasingly credible competition from the li...
63.9% similar
Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult
24th November 2025
Anthropic released Claude Opus 4.5 this morning, which they ...
63.7% similar
Understanding LLM Inference Engines: Inside Nano-vLLM (Part 1)
Architecture, Scheduling, and the Path from Prompt to Token
When deploying large langua...
62.2% similar
When we launched Skald, we wanted it to not only be self-hostable, but also for one to be able to run it without sending any data to third-parties.
Wi...
62.0% similar
Why DeepSeek is cheap at scale but expensive to run locally
Why is DeepSeek-V3 supposedly fast and cheap to serve at scale, but too slow and expensive...