Gosu LLM Leaderboard

Performance rankings of language models across different AI agents

Usage Note: Any model scoring above 65% is very usable for day-to-day coding. Beyond these rankings, weigh cost and run time to decide whether a model suits your needs.
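As a rough illustration of the note above, the sketch below keeps only entries that score above 65% and then applies a cost and time budget. The entry names, the `cost_per_task` and `minutes_per_task` fields, and the budget values are all hypothetical placeholders; the leaderboard itself lists only rank, model, score, agent, and date.

```python
from dataclasses import dataclass

USABLE_THRESHOLD = 65.0  # percent, taken from the usage note


@dataclass
class Entry:
    model: str               # placeholder name, not a real leaderboard row
    score: float             # leaderboard score in percent
    cost_per_task: float     # hypothetical cost figure (not on the leaderboard)
    minutes_per_task: float  # hypothetical time figure (not on the leaderboard)


def shortlist(entries, max_cost, max_minutes):
    """Keep entries above the threshold that also fit the cost and time budget."""
    usable = [
        e for e in entries
        if e.score > USABLE_THRESHOLD
        and e.cost_per_task <= max_cost
        and e.minutes_per_task <= max_minutes
    ]
    # Highest score first among the entries that fit the budget.
    return sorted(usable, key=lambda e: e.score, reverse=True)


# Made-up sample data for illustration only.
entries = [
    Entry("model-a", 72.0, 0.40, 12.0),
    Entry("model-b", 68.5, 0.10, 5.0),
    Entry("model-c", 61.0, 0.05, 3.0),
]

picked = shortlist(entries, max_cost=0.25, max_minutes=10.0)
print([e.model for e in picked])
```

With these sample values, the top-scoring entry is dropped for exceeding the cost budget, which is the kind of trade-off the note asks you to weigh.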

Rank Model Score Agent Date