Sam Altman said yesterday that ChatGPT handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without insane VRAM or painfully slow speeds.
Sure, they have huge GPU cluste...
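To make the VRAM complaint concrete, here is a rough back-of-the-envelope sketch of what hosting a large dense model locally costs in memory: weights plus KV cache. The 70B parameter count, layer count, hidden size, and context length below are illustrative assumptions (GPT-4's actual specs are unpublished), not real figures.

```python
# Rough VRAM estimate for hosting an LLM locally: weights + KV cache.
# All model dimensions here are illustrative assumptions, not GPT-4 specs.

def weight_gib(n_params_b: float, bytes_per_param: float) -> float:
    """GiB needed just to hold the weights (n_params_b in billions)."""
    return n_params_b * 1e9 * bytes_per_param / 2**30

def kv_cache_gib(n_layers: int, d_model: int, ctx_len: int,
                 bytes_per_val: float = 2.0) -> float:
    """GiB for one sequence's KV cache (2 tensors per layer: K and V)."""
    return 2 * n_layers * d_model * ctx_len * bytes_per_val / 2**30

# A hypothetical 70B dense model at different precisions:
for name, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"70B @ {name}: ~{weight_gib(70, bpp):.0f} GiB of weights")

# KV cache for an assumed 80-layer, d_model=8192 model at 8k context:
print(f"KV cache @ 8k ctx: ~{kv_cache_gib(80, 8192, 8192):.1f} GiB")
```

Even at int4, a 70B-class model wants ~33 GiB for weights alone, which is why consumer GPUs with 12-24 GB force either aggressive quantization, CPU offload, or painfully slow speeds.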
Similar Articles (10 found)
🔍 66.5% similar
Day zero model performance optimization work is a mix of experimentation, bug fixing, and benchmarking guided by intuition and experience. This writeu...
🔍 66.3% similar
For some reason they focus on the inference, which is the computationally cheap part. If you're working on ML (as opposed to deploying someone else's ...
🔍 64.8% similar
Why DeepSeek is cheap at scale but expensive to run locally
Why is DeepSeek-V3 supposedly fast and cheap to serve at scale, but too slow and expensive...
🔍 64.8% similar
Language Modeling with Limited Data, Infinite Compute
March 2026
NanoGPT Slowrun is an open effort to implement data-efficient learning algorithms; 5....
🔍 63.9% similar
The three types of LLM workloads and how to serve them
We hold this truth to be self-evident: not all workloads are created equal.
But for large langu...
🔍 63.2% similar
What is a good algorithm-to-purpose map for ML beginners? Looking for something like "Algo X is good for making predictions when your data looks like ...
🔍 63.1% similar
Techniques for training large neural networks
Large neural networks are at the core of many recent advances in AI, but training them is a difficult en...
🔍 61.6% similar
The Inference Economy
What data center build outs tell us about intelligence costs
Trillion dollar data center buildouts are all the rage. Discussions...
🔍 61.5% similar
Laptop-Only LLM: Tune Google Gemma 3 in Minutes (Code Inside)
A clean, from-scratch walkthrough (with code) to tune a 270M-para...
🔍 60.9% similar
Owning a $5M data center
These days it seems you need a trillion fake dollars, or lunch with politicians to get your own data center. They may help, b...