MCP Servers
terminal
filesystem
huggingface
Local Tools
history
claim_done
python_execute
manage_context
handle_overlong_tool_outputs

Instruction

Please scan the workspace folder and pick the model checkpoint with the highest eval_accuracy, then push that model’s folder to the Hugging Face Hub as a model repo named MyAwesomeModel-TestRepo. Finalize the repo’s README.md with the detailed evaluation results for all 15 benchmarks, keeping three decimal places. Base it on the current README.md in the workspace and make sure the uploaded version is complete. Do not change any content in README.md other than the benchmark scores. You can use hf_token.txt in the workspace if necessary.
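The selection and upload steps in the instruction could be sketched roughly as follows. This is a minimal sketch under assumptions the task text does not confirm: that each checkpoint folder holds a JSON file with a top-level eval_accuracy key, and that the helper names find_best_checkpoint, format_score, and push_folder are purely illustrative. The upload helper relies on the huggingface_hub package (HfApi.create_repo and HfApi.upload_folder) and is imported lazily so the selection logic runs without it.

```python
import json
from pathlib import Path


def find_best_checkpoint(checkpoints_dir):
    """Return (folder, accuracy) for the highest eval_accuracy found.

    Assumption: the metric lives in some JSON file inside each checkpoint
    folder under a top-level "eval_accuracy" key. Adjust to the real layout.
    """
    best_dir, best_acc = None, float("-inf")
    for json_path in sorted(Path(checkpoints_dir).rglob("*.json")):
        try:
            data = json.loads(json_path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that are not valid JSON
        if not isinstance(data, dict):
            continue
        acc = data.get("eval_accuracy")
        is_number = isinstance(acc, (int, float)) and not isinstance(acc, bool)
        if is_number and acc > best_acc:
            best_dir, best_acc = json_path.parent, float(acc)
    return best_dir, best_acc


def format_score(value):
    """Format a benchmark score with three decimal places."""
    return f"{value:.3f}"


def push_folder(folder, repo_name, token):
    """Create the model repo if needed and upload the folder's contents."""
    from huggingface_hub import HfApi  # lazy import: only needed for upload

    api = HfApi(token=token)
    # create_repo returns a RepoUrl; its repo_id includes the user namespace.
    repo_id = api.create_repo(repo_name, repo_type="model", exist_ok=True).repo_id
    api.upload_folder(folder_path=str(folder), repo_id=repo_id, repo_type="model")
    return repo_id
```

A typical call sequence would be find_best_checkpoint("workspace/checkpoints"), then copying the finalized README.md into the returned folder, then push_folder(folder, "MyAwesomeModel-TestRepo", token) with the token read from hf_token.txt.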

Initial State

Local Workspace

workspace/
├── checkpoints/
├── evaluation/
├── figures/
└── README.md