World Insurance Model

The benchmark for insurance agents.

Anyone can build an agent that reads a policy. The harder question is whether it actually got the answer right. The World Insurance Model (WIM) turns complex insurance expertise into verifiable tasks, giving models a structured environment to learn, test, and improve through feedback.

Evals are gravity for agent systems: development bends toward whatever they measure. The work isn't pushing scores higher; it's making sure the gravity well sits under behavior that actually matters. Give a team a metric they trust and the curve takes care of itself.