MCP Servers

Local Tools

manage_context
handle_overlong_tool_outputs
Instruction
In the table on the Notion page mcp_experiments_recordings, use the historical experiments of the W&B project mbzuai-llm/Guru to list the highest val-core acc mean@1/mean@k score for each benchmark named in the table headers, then calculate and fill in the Best Step for each run (format: step(average acc)).
Instructions:
- If multiple runs have the same name, treat them as one run for combined statistics.
- The average score at a step is the arithmetic mean of only the metrics available at that step; missing metrics are excluded from the mean.
- Only operate on the target page under the specified parent page; do not change column names or order.
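
The aggregation rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the reference solution: the input shape (a list of `(run_name, rows)` pairs, each row a dict with a `"step"` key plus metric values), the function names, the tie-breaking rule (the earliest step wins) and the four-decimal output precision are all hypothetical. Fetching the history from W&B and writing to Notion are left out.

```python
import math
from collections import defaultdict
from statistics import fmean


def is_missing(value):
    """Treat None and NaN as a missing metric (assumed convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def merge_runs(runs):
    """Combine rows of runs that share a name into name -> step -> {metric: value}."""
    merged = defaultdict(dict)
    for name, rows in runs:
        for row in rows:
            metrics = {k: v for k, v in row.items()
                       if k != "step" and not is_missing(v)}
            merged[name].setdefault(row["step"], {}).update(metrics)
    return dict(merged)


def best_per_metric(step_metrics):
    """Highest value seen for each metric across all steps of one merged run."""
    best = {}
    for metrics in step_metrics.values():
        for key, value in metrics.items():
            best[key] = max(value, best.get(key, value))
    return best


def best_step(step_metrics):
    """Step with the highest mean over the metrics available at that step."""
    averages = {step: fmean(m.values())
                for step, m in sorted(step_metrics.items()) if m}
    step = max(averages, key=averages.get)  # earliest step wins a tie
    return step, averages[step]


def format_best_step(step, average, digits=4):
    """Render the cell as step(average acc); the precision is an assumption."""
    return f"{step}({average:.{digits}f})"
```

For example, two runs named `"a"` would be merged first with `merge_runs`, after which `format_best_step(*best_step(merged["a"]))` yields the Best Step cell and `best_per_metric(merged["a"])` yields the per-benchmark maxima. A step that reports only some benchmarks is averaged over those alone, so it is neither penalized nor padded with zeros.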
Initial State
Local Workspace
workspace
└── table_template.txt