Building a Natural Language Database Query Agent: Complete Guide

Mar 18, 2026

Model	Size	BIRD Execution Accuracy	Key Strength
Databricks RLVR	32B	75.68%	Highest score listed; trained with reinforcement learning with verifiable rewards (RLVR)
Arctic-Text2SQL-R1	32B	71.83%	Outperforms proprietary models on BIRD
Arctic-Text2SQL-R1	7B	68.47%	Matches 70B models at 1/10th the size
GPT-4o	Proprietary	55.9-66.2%	Strong generalist, but outscored by specialized text-to-SQL models
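
To make the accuracy column concrete: execution accuracy is generally computed by running the predicted SQL and the reference SQL against the same database and checking whether they return the same rows. The sketch below illustrates that idea with Python's built-in sqlite3 module. The helper names (execution_match, execution_accuracy), the toy users table, and the example queries are all hypothetical, and the official BIRD evaluation script may differ in details such as timeouts and result ordering.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same set of rows."""
    try:
        predicted_rows = set(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        # Assumption: a predicted query that fails to run counts as a miss.
        return False
    gold_rows = set(conn.execute(gold_sql).fetchall())
    return predicted_rows == gold_rows


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


# Toy in-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO users (name, age) VALUES ('Ana', 31), ('Ben', 24), ('Cy', 45);
    """
)

pairs = [
    # Different SQL text, same rows: counts as correct.
    ("SELECT name FROM users WHERE age > 30",
     "SELECT name FROM users WHERE age >= 31"),
    # Runs, but returns extra rows: counts as incorrect.
    ("SELECT name FROM users",
     "SELECT name FROM users WHERE age < 30"),
    # Misspelled column, so the query fails: counts as incorrect.
    ("SELECT nme FROM users",
     "SELECT name FROM users"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {accuracy:.2%}")
```

Comparing result sets rather than SQL strings means two differently written queries are both counted as correct when they return the same rows, which is why this metric is usually preferred over exact string matching.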