Really great work! I recently evaluated models on tau^2 bench, but different user simulators and different prompt versions lead to highly variable scores. Would you release your tau^2 bench codebase? Thanks a lot!