MCP Servers



Local Tools



manage_context
handle_overlong_tool_outputs
Instruction
Please scan the workspace folder, pick the model checkpoint with the highest eval_accuracy, then push the best model’s folder to Hugging Face Hub as a model repo named MyAwesomeModel-TestRepo.
Finalize the repo’s README.md with the detailed evaluation results for all 15 benchmarks (keep three decimal places). You must refer to the current README.md under the workspace and ensure it is complete in the uploaded repo. Do not change any other content in the README.md besides the benchmark scores.
You can use the hf_token.txt under the workspace if necessary.
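The upload step described above could be sketched roughly as follows, using the `huggingface_hub` client. This is a minimal sketch, not the reference solution: the helper names (`read_token`, `format_score`, `push_model_folder`) are hypothetical, and it assumes hf_token.txt holds a single token string and that the README.md inside the chosen folder has already been updated before the upload.

```python
from pathlib import Path


def read_token(token_file: Path) -> str:
    """Read the token, assuming the file holds just the token string."""
    return token_file.read_text(encoding="utf-8").strip()


def format_score(value: float) -> str:
    """Render a benchmark score with exactly three decimal places."""
    return f"{value:.3f}"


def push_model_folder(folder: Path, repo_name: str, token: str) -> str:
    """Create the model repo if needed and upload the folder's contents."""
    # Imported lazily so the helpers above work without the dependency.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # With no namespace in repo_name, the repo is created under the
    # token owner's account; the returned RepoUrl carries the full repo_id.
    repo_url = api.create_repo(repo_id=repo_name, repo_type="model", exist_ok=True)
    api.upload_folder(
        folder_path=str(folder),
        repo_id=repo_url.repo_id,
        repo_type="model",
    )
    return repo_url.repo_id
```

A call might look like `push_model_folder(best_folder, "MyAwesomeModel-TestRepo", read_token(Path("workspace/hf_token.txt")))`, where `best_folder` is the selected checkpoint folder.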
Initial State
Local Workspace
workspace/
├── checkpoints/
├── evaluation/
├── figures/
└── README.md
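
One way to scan the workspace for the best checkpoint is sketched below. The task card does not say where eval_accuracy is recorded, so this sketch assumes each checkpoint folder under checkpoints/ contains some JSON file with a top-level "eval_accuracy" key; that layout and the function name `find_best_checkpoint` are assumptions and should be checked against the real files.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def find_best_checkpoint(checkpoints_dir: Path) -> Optional[Tuple[Path, float]]:
    """Return (checkpoint_folder, eval_accuracy) for the highest score found."""
    best: Optional[Tuple[Path, float]] = None
    # Assumption: the metric is a top-level "eval_accuracy" key in a JSON
    # file somewhere inside each checkpoint folder.
    for json_file in sorted(checkpoints_dir.rglob("*.json")):
        try:
            data = json.loads(json_file.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that are not readable JSON
        if not isinstance(data, dict):
            continue
        score = data.get("eval_accuracy")
        # Reject missing values and booleans (bool is a subclass of int).
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            continue
        if best is None or score > best[1]:
            best = (json_file.parent, float(score))
    return best
```

Calling `find_best_checkpoint(Path("workspace/checkpoints"))` returns `None` when no file carries the key, which signals that the metric lives elsewhere (for example under evaluation/) and the assumption needs revisiting.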