Gosu LLM Leaderboard

Performance rankings of language models across different AI agents

Important Note: The tests used for this Overall Leaderboard are different from those used on the Best AI Agents page. I plan to converge these evaluation methodologies over time, but for now they are two separate sets of evaluations with distinct testing approaches.

Usage Note: Any model scoring above 65% is very usable for day-to-day coding. Consider cost and time alongside these rankings to decide whether a model suits your needs.

Rank	Model	Score (%)	Agent	Date