Similar Articles

Evaluating LLMs for my personal use case

https://darkcoding.net/software/personal-ai-evals-aug-2025/

Domain: darkcoding.net Added: 2025-09-01

Evaluating LLMs for my personal use case Summary It’s great that AI can win maths Olympiads, but that’s not what I’m doing. I mostly ask basic Rust, Python, Linux and life questions. So I did my own e...

Similar Articles (10 found)

https://simonwillison.net/2025/Nov/24/claude-opus/#atom-entries

simonwillison.net 2025-12-18

Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult 24th November 2025 Anthropic released Claude Opus 4.5 this morning, which they ...

🔍 69.7% similar

Things we learned about LLMs in 2024

https://simonwillison.net/2024/Dec/31/llms-in-2024/

simonwillison.net 2025-07-12

Things we learned about LLMs in 2024 31st December 2024 A lot has happened in the world of Large Language Models over the course of 2024. Here’s a rev...

https://simonwillison.net/2025/Aug/7/gpt-5/

simonwillison.net 2025-08-13

GPT-5: Key characteristics, pricing and model card 7th August 2025 I’ve had preview access to the new GPT-5 model family for the past two weeks (see r...

🔍 68.2% similar

2 Years of ML vs. 1 Month of Prompting

https://www.levs.fyi/blog/2-years-of-ml-vs-1-month-of-prompting/

www.levs.fyi 2025-11-15

2 Years of ML vs. 1 Month of Prompting November 7, 2025 Recalls at major automakers cost hundreds of millions of dollars a year. It’s a huge issue. To...

🔍 66.7% similar

GPT-5.2

https://simonwillison.net/2025/Dec/11/gpt-52/#atom-entries

simonwillison.net 2025-12-18

GPT-5.2 11th December 2025 OpenAI reportedly declared a “code red” on the 1st of December in response to increasingly credible competition from the li...

https://news.ycombinator.com/item?id=45427634

news.ycombinator.com 2025-10-11

> the generation of 281,128 augmented examples, from which 1,000 were held out as a benchmark test set. This model is trained on a custom dataset of 2...

🔍 64.9% similar

So you wanna build a local RAG?

https://blog.yakkomajuri.com/blog/local-rag

blog.yakkomajuri.com 2025-11-28

When we launched Skald, we wanted it to not only be self-hostable, but also for one to be able to run it without sending any data to third-parties. Wi...

🔍 64.9% similar

Olmo 3 is a fully open LLM

https://simonwillison.net/2025/Nov/22/olmo-3/#atom-entries

simonwillison.net 2025-12-18

Olmo 3 is a fully open LLM 22nd November 2025 Olmo is the LLM series from Ai2—the Allen institute for AI. Unlike most open weight models these are not...

https://lightcapai.medium.com/same-ai-different-answer-how-tiny-prompts-can-change-everything-83e880f9773f

lightcapai.medium.com 2025-08-13

Same AI, Different Answer: How Tiny Prompts Can Change Everything Why Does ChatGPT Sometimes Feel Different? If you’ve used AI chatbots like ChatGPT f...

🔍 64.3% similar

Vibe Coding as a Coding Veteran

https://levelup.gitconnected.com/vibe-coding-as-a-coding-veteran-cd370fe2be50

levelup.gitconnected.com 2025-09-01

Vibe Coding as a Coding Veteran From 8-bit Assembly to English-as-Code By now, we’ve all heard about this “vibe coding” thing: you let an AI assistant...