Your agent eval could be grading the wrong thing. Output-only evals ...
