According to the initial results, no model—including Gemini 3 Pro, GPT-5, or Claude 4.5 Opus—managed to crack a 70% accuracy ...