# Toolathlon

## Docs

- [Introducing Toolathlon](https://toolathlon.xyz/docs/blog/toolathlon.md)
- [Contributors](https://toolathlon.xyz/docs/contri.md)
- [Task](https://toolathlon.xyz/docs/dataset.md)
- [Discord](https://toolathlon.xyz/docs/discord.md)
- [Toolathlon Framework](https://toolathlon.xyz/docs/intro.md)
- [Model Leaderboard](https://toolathlon.xyz/docs/leaderboard.md)
- [Selected Servers](https://toolathlon.xyz/docs/selected.md): Public MCP servers we selected.
- [Self Implemented Toolkits](https://toolathlon.xyz/docs/self-impl.md): Toolkits we implemented to make it more robust and capable for our complex evaluation.
- [Submitting to the Toolathlon](https://toolathlon.xyz/docs/submit.md): How to submit your agent results to the Toolathlon leaderboard.
- [Find Alita Paper](https://toolathlon.xyz/docs/tasks/aca/10.md): Find a paper on agentic reasoning whose title has 'Alita' and download it.
- [Paper Checker](https://toolathlon.xyz/docs/tasks/aca/152.md): Audit and fix all citation and cross-reference commands across LaTeX files.
- [Academic PDF Report](https://toolathlon.xyz/docs/tasks/aca/158.md): Extract first author names and affiliations from papers and save consolidated results as paper_updated.xlsx.
- [Set Conf CR DDL](https://toolathlon.xyz/docs/tasks/aca/2.md): Check my emails from the past day for any mention of the COML conference main-track camera-ready deadline, and if found, schedule a calendar reminder for three hours before that deadline.
- [Latex Prompt Box](https://toolathlon.xyz/docs/tasks/aca/229.md): Add a boxed final prompt section at the end of the appendix in the LaTeX project, styled consistently with Appendix B of M-STAR, without modifying any other content.
- [VLM History Completer](https://toolathlon.xyz/docs/tasks/aca/290.md): Copy the VLM history spreadsheet to vlm-history-completer as VLM-History, then research and populate the "Architecture" and "Sources" columns for each model.
- [Email Paper Homepage](https://toolathlon.xyz/docs/tasks/aca/3.md): Remotely update your personal homepage repository by checking all repos, updating paper statuses from 'preprint/under review' based on email acceptances, and linking released code repositories from GitHub.
- [CVPR Research](https://toolathlon.xyz/docs/tasks/aca/30.md): Identify the top three CVPR 2025 authors most aligned with your research and active in academia, using Paper Copilot for publication stats, then return their names.
- [ImageNet](https://toolathlon.xyz/docs/tasks/aca/32.md): Summarize the ImageNet 256 experimental results from my image generation papers into a LaTeX table.
- [Notion Personal Website](https://toolathlon.xyz/docs/tasks/aca/372.md): Update the Notion page 'Colley Whisson' with all details from colley_whisson.docx across four sections: About Me, Paintings, Workshop, and Prizes.
- [Personal Website Construct](https://toolathlon.xyz/docs/tasks/aca/4.md): Fork and customize the academicpages template as `LJT-Homepage` using only memory-provided personal and academic details.
- [Add Bibtex](https://toolathlon.xyz/docs/tasks/aca/5.md): Add five specified articles to ref.bib in BibTeX format, maintaining consistency with existing entries and sourcing conference versions from OpenReview where applicable.
- [HK Top Conf](https://toolathlon.xyz/docs/tasks/aca/66.md): Count and compare deep learning paper outputs from HKU, CUHK, and HKUST at top AI conferences.
- [Meeting Assign](https://toolathlon.xyz/docs/tasks/aca/88.md): Parse availability from a When2Meet link, recommend a 2-hour weekly meeting slot.
- [Profile Update Online](https://toolathlon.xyz/docs/tasks/aca/9.md): Update personal profile with the metadata and details from the two newly uploaded papers.
- [PPT Analysis](https://toolathlon.xyz/docs/tasks/campus/125.md): Summarize functional vs. imperative symbol tables, extract and annotate all related code from Compile.pptx, and explain the homework in HW.PDF.
- [Academic Warning](https://toolathlon.xyz/docs/tasks/campus/149.md): Flag students with >25% score drops in bad_student.csv; log >45% drops to exam_log for auto-alerting counselors.
- [Course TA HWS](https://toolathlon.xyz/docs/tasks/campus/161.md): Organize and clean up student homework files by renaming, categorizing by programming language, and removing unrelated code to maintain a tidy workspace.
- [Language School](https://toolathlon.xyz/docs/tasks/campus/162.md): Summarize official TOEFL/IELTS, fees, and deadlines for QS CS Top 10 US grad programs into “cs_top10_us_2025.xlsx”.
- [University Course Selection](https://toolathlon.xyz/docs/tasks/campus/165.md): Generate clean Excel schedules matching course and exam preferences, keeping only valid options.
- [Student Interview](https://toolathlon.xyz/docs/tasks/campus/183.md): A professor's workflow of scheduling interviews with students based on their research qualifications and calendar availability.
- [Canvas Art Manager](https://toolathlon.xyz/docs/tasks/campus/188.md): Create and publish Canvas courses for assigned instructors using the master schedule.
- [Canvas New Students Notification](https://toolathlon.xyz/docs/tasks/campus/189.md): Add new transfer students to course sheets and privately notify each that their first assignment grade will mirror their second, urging diligent completion.
- [Canvas List Test](https://toolathlon.xyz/docs/tasks/campus/190.md): Complete and update the two given CSV files with the pending assignments and quizzes, sorted by deadline and course code.
- [Canvas Art Quiz](https://toolathlon.xyz/docs/tasks/campus/196.md): Create a Canvas quiz titled “Classic Art History Questions” with four multiple-choice questions and auto-generated correct answers, one point each.
- [Canvas Arrange Exam](https://toolathlon.xyz/docs/tasks/campus/197.md): Update exam_schedule.xlsx with your remaining final exams, cleaning course codes/names and marking unreleased info as TBD.
- [Canvas Homework Grader Python](https://toolathlon.xyz/docs/tasks/campus/306.md): Grade Homework2 by downloading latest Python submissions from email, running them to check for errors, and assigning 10 (pass) or 0 (fail) in Canvas based on correctness.
- [Course Schedule](https://toolathlon.xyz/docs/tasks/campus/34.md): Select daytime courses prioritizing Prof. Yulian’s sections, match required exams from master table, exclude exemptions, and output sorted exam schedule.
- [Canvas Submit Late Work](https://toolathlon.xyz/docs/tasks/campus/37.md): Submit your unsubmitted assignments via Playwright and email the TA with your leave document for Assignment 2.
- [Course Assistant](https://toolathlon.xyz/docs/tasks/campus/38.md): Identify NLP students who haven't submitted final presentations, and email each with their name and ID in the subject 'nlp-course-emergency' to avoid spam filters.
- [Apply PhD Email](https://toolathlon.xyz/docs/tasks/campus/39.md): Locate and act on submission instructions from Kaiming’s email titled “submit_material,” using workspace files and stored personal info.
- [Canvas Do Quiz](https://toolathlon.xyz/docs/tasks/campus/404.md): Log in to Canvas, identify all unfinished quizzes, and forcibly submit them.
- [Search CA School](https://toolathlon.xyz/docs/tasks/campus/49.md): Find and list top AI grad schools near LA, ranked and sorted by distance.
- [Fillout Online Forms](https://toolathlon.xyz/docs/tasks/daily/133.md): Submit the welcome party form using memory data, defaulting unspecified fields to negative.
- [Trip Adviser](https://toolathlon.xyz/docs/tasks/daily/155.md): Plan a route from Tokyo Station’s Nihonbashi Exit to Kamakura Station with a nearby Starbucks stop before departure, then to Kamakura Museum of History and Culture.
- [Upenn Campus Route](https://toolathlon.xyz/docs/tasks/daily/156.md): Plan the shortest walking route visiting all UPenn campus attractions.
- [Trip Itinerary Generator](https://toolathlon.xyz/docs/tasks/daily/159.md): Generate a two-day Paris itinerary starting July 28, 2025 from Notre Dame, grouping east/south (Day 1) and west/central (Day 2) attractions from 'my_wishlist.txt', output as 'Paris_Itinerary.json' in specified format.
- [Train Ticket Plan](https://toolathlon.xyz/docs/tasks/daily/16.md): Find matching round-trip high-speed train schedules for two travelers departing from Beijing South and Shanghai Hongqiao to Qufu East.
- [Subway Planning](https://toolathlon.xyz/docs/tasks/daily/169.md): Plan the shortest-walking MRT route from Hilton Garden Inn Singapore to Changi Airport, including a required meet-up stop, and save the station sequence to routine.txt.
- [Dietary Health](https://toolathlon.xyz/docs/tasks/daily/181.md): Analyze whether your current diet meets fitness nutrition guidelines based on your body data and ingredient nutrition table, outputting a structured report in analysis.md according to the specified format.
- [Cooking Guidance](https://toolathlon.xyz/docs/tasks/daily/182.md): Recommend 3 dishes based on available ingredients, output to cuisine.json, and generate a missing-ingredients shopping list in shopping.csv.
- [Mrbeast Analysis](https://toolathlon.xyz/docs/tasks/daily/201.md): Analyze MrBeast's video uploads from Jan 2024 to Jun 2025 to determine upload day frequency, average video length, upload interval, and transcript-based content.
- [Youtube Repo](https://toolathlon.xyz/docs/tasks/daily/209.md): Scan a YouTube playlist for machine learning technical videos, extract associated technologies and their GitHub repositories, and document each project’s URL and main functions.
- [Identify All Songs](https://toolathlon.xyz/docs/tasks/daily/210.md): Extract and identify song names from the lyrics of the YouTube playlist.
- [Travel Exchange](https://toolathlon.xyz/docs/tasks/daily/313.md): Aggregate personal expense files for seven travelers, convert all costs to CNY using actual June 5, 2025 exchange rates.
- [NHL B2B Analysis](https://toolathlon.xyz/docs/tasks/daily/316.md): Analyze the NHL 2024–2025 schedule to count each team’s back-to-back games by home/away configuration.
- [Music Analysis](https://toolathlon.xyz/docs/tasks/daily/319.md): Analyze Billboard 1940s pop charts to compute each song’s longest consecutive chart run.
- [Inter Final Performance Analysis](https://toolathlon.xyz/docs/tasks/daily/351.md): Populate Inter Milan’s 2023 and 2025 UCL final stats into three Google Sheets tabs, compute differences in “StatsDifference”, and mark missing data.
- [Notion Movies](https://toolathlon.xyz/docs/tasks/daily/371.md): Update the “Ultimate Movie Tracker” Notion page by completing existing movie fields, adding “Star Wars: Episode III” with full metadata and “Watched” status, and embedding its official YouTube trailer link.
- [Arrange Workspace](https://toolathlon.xyz/docs/tasks/daily/78.md): Organize the files in user's workspace.
- [Stock Build Position](https://toolathlon.xyz/docs/tasks/finance/126.md): Allocate $1M USD across US, Hong Kong, and A-shares in a 4:3:3 ratio, purchasing whole shares using today’s opening prices and exchange rates, and populate the position table accordingly.
- [A/B Testing](https://toolathlon.xyz/docs/tasks/finance/141.md): Determine A/B test winner by mean per-scenario conversion rate.
- [Price Comparison](https://toolathlon.xyz/docs/tasks/finance/143.md): Compare product prices against competitor FutureGadget by extracting PDF data, merging with internal pricing, calculating differences, and storing structured results.
- [Oil Price](https://toolathlon.xyz/docs/tasks/finance/179.md): Fetch last 12 months of WTI/Brent prices from Yahoo Finance, analyze spread dynamics, calculate z-score-based trading signals, backtest the strategy, and return a summary report.
- [Quantitative Financial Analysis](https://toolathlon.xyz/docs/tasks/finance/223.md): Create or update a Google Sheet named ‘2025_Q2_Market_Data’ with a worksheet ‘May-Jun_2025’ containing daily stock data.
- [Investment Decision Analysis](https://toolathlon.xyz/docs/tasks/finance/267.md): Create three separate Google Sheets spreadsheets in the designated folder using Yahoo Finance data, following naming, content, and formatting rules.
- [Yahoo Analysis](https://toolathlon.xyz/docs/tasks/finance/275.md): Evaluate historical analyst rating accuracy for NVDA and AAPL by measuring stock price performance.
- [NVIDIA Market](https://toolathlon.xyz/docs/tasks/finance/284.md): Analyze NVIDIA's institutional ownership trends across 8 quarters, adjust for stock split, populate results_template.xlsx with common holdings only.
- [Nvidia Stock Analysis](https://toolathlon.xyz/docs/tasks/finance/345.md): Populate "results_template.xlsx" with NVIDIA ownership and market trend data per specifications.
- [GDP CR5 Analysis](https://toolathlon.xyz/docs/tasks/finance/352.md): Calculate and rank regions by GDP CR5 index using WorldBank data, saving top 5 countries, GDP sums, and ratios.
- [Excel Data Transformation](https://toolathlon.xyz/docs/tasks/office/108.md): Restructure appliance sales data from wide to long format and save as Processed.xlsx.
- [Excel Market Research](https://toolathlon.xyz/docs/tasks/office/109.md): Convert market product categories to internal classifications and calculate 2015–2024 Appliance sales growth rate, saving results in growth_rate.xlsx for data-only loading.
- [Sales Accounting](https://toolathlon.xyz/docs/tasks/office/116.md): Update last week’s missing transactions in Account_Book.xlsx using structured records from memory.json.
- [Interview Report](https://toolathlon.xyz/docs/tasks/office/127.md): Analyze candidate resumes and evaluations against job requirements in “Requirement.txt”, then output only the top candidate’s name.
- [Privacy Desensitization](https://toolathlon.xyz/docs/tasks/office/131.md): Scan, detect, and desensitize sensitive data (phones, SSNs, emails, credit cards, IPs) in specified files.
- [Game Statistics](https://toolathlon.xyz/docs/tasks/office/142.md): Process end-of-day player scores by generating a top-100 leaderboard in a date-named table and updating daily stats for all players into the historical master table for long-term analysis.
- [Flagged Transactions](https://toolathlon.xyz/docs/tasks/office/144.md): Detect anomalies in 2025 transactions for high-net-worth clients by flagging amounts > mean + 3σ per client.
- [Live Transactions](https://toolathlon.xyz/docs/tasks/office/147.md): Archive a suspicious transaction’s full data as a JSON file named by its ID and log a critical fraud alert for investigation.
- [Machine Operating](https://toolathlon.xyz/docs/tasks/office/150.md): Generate and upload an anomaly report for out-of-range sensor readings during a specified time window.
- [Reimbursement Form Filler](https://toolathlon.xyz/docs/tasks/office/163.md): Extract taxi receipt data (filename, month, amount) from ‘bills/’ PDFs for the last 3 months and populate ‘Bill_Format.xlsx’ → rename to ‘department_expenses.xlsx’ per ‘requirement.txt’.
- [Detect Revised Terms](https://toolathlon.xyz/docs/tasks/office/23.md): Analyze new_law.pdf and court rulings to identify conflicting, inconsistent, or obsolete legal provisions.
- [Payable Invoice Checker](https://toolathlon.xyz/docs/tasks/office/235.md): Automatically update PURCHASE_INVOICE tables with receipts from workspace, email purchasing managers unpaid filenames.
- [Notion HR](https://toolathlon.xyz/docs/tasks/office/237.md): Update candidate records in Notion strictly per resume content, and email applicants whose applied positions are closed using the provided template.
- [Invoice Org](https://toolathlon.xyz/docs/tasks/office/24.md): Extract invoice details (amount, date, vendor), convert all to RMB using historical exchange rates.
- [SLA Timeout Monitor](https://toolathlon.xyz/docs/tasks/office/292.md): Identify overdue tickets by comparing against SLA response times, then send reminder emails to responsible managers and apology emails to affected users.
- [Notion Find Job](https://toolathlon.xyz/docs/tasks/office/327.md): For "Checking" jobs meeting salary, location, and role criteria, send a templated application email and update their Notion `Job Tracker` status.
- [Travel Expense Reimbursement](https://toolathlon.xyz/docs/tasks/office/331.md): Validate expense claims against invoices.
- [Landing Task Reminder](https://toolathlon.xyz/docs/tasks/office/368.md): Auto-generate training emails for new and overdue employees, CC supervisors, and update Snowflake.
- [Shopping Helper](https://toolathlon.xyz/docs/tasks/shopping/113.md): Find and recommend 3 genuine black leather sofas within $200–400 on Amazon.
- [Woocommerce Customer Survey](https://toolathlon.xyz/docs/tasks/shopping/266.md): For customers with “Completed” orders, generate a Google Forms feedback survey per form_requirements.md, email it to them.
- [Woocommerce Update Cover](https://toolathlon.xyz/docs/tasks/shopping/279.md): Update the main product image based on WooCommerce order data.
- [Update Material Inventory](https://toolathlon.xyz/docs/tasks/shopping/285.md): Monitor new WooCommerce paid orders, deduct raw materials via BOM from Google Sheets inventory, recalculate max producible quantities, and sync updated stock levels back to WooCommerce.
- [Woocommerce New Welcome](https://toolathlon.xyz/docs/tasks/shopping/299.md): Sync first-time customer data from WooCommerce to BigQuery and send each a welcome email.
- [Woocommerce Stock Alert](https://toolathlon.xyz/docs/tasks/shopping/300.md): Identify WooCommerce products with stock below safety threshold, log them in Google Sheets stock_sheet, and email purchasing manager.
- [Filter Low Selling Products](https://toolathlon.xyz/docs/tasks/shopping/301.md): Identify slow-moving products, move them to “Outlet/Clearance”, and email subscribers a sorted list.
- [Woocommerce Product Recall](https://toolathlon.xyz/docs/tasks/shopping/303.md): Sync latest unupdated product inventories from each city’s SQLite warehouse database to WooCommerce online store.
- [Inventory Sync](https://toolathlon.xyz/docs/tasks/shopping/304.md): Sync latest unupdated product inventories from each city’s SQLite warehouse database to WooCommerce online store.
- [Woocommerce New Product](https://toolathlon.xyz/docs/tasks/shopping/305.md): Send new product reservation alerts to subscribers with “new_product_alerts” preference and discount reminders to all customers.
- [Ipad EDU Price](https://toolathlon.xyz/docs/tasks/shopping/95.md): Compare official Apple education prices for iPad Pro (256GB, 11-inch) + Apple Pencil Pro across Mainland China, Hong Kong, Singapore, and the U.S.
- [Verl Dataset](https://toolathlon.xyz/docs/tasks/tech/146.md): Download the most downloaded DeepScaleR dataset from Hugging Face, convert it to Parquet format per `format.json`, and save it as `verl_deepscaler.parquet` for replicating DeepSeek-R1’s 'aha moment' in the Verl framework.
- [Logical Datasets Collection](https://toolathlon.xyz/docs/tasks/tech/160.md): Generate a LaTeX table named datasets.tex listing five logical reasoning datasets with columns for task count, trainable flag, and adjustable difficulty flag.
- [Merge HF Datasets](https://toolathlon.xyz/docs/tasks/tech/17.md): Download the first 500 entries from three Hugging Face datasets (toolace, glaive, xlam), reformat them into a unified schema.
- [Git Milestone](https://toolathlon.xyz/docs/tasks/tech/173.md): Fetch and save metadata for GitHub's milestone repositories (IDs 1, 1K, 1M, 1B) into github_info.json, skipping unavailable ones.
- [Dataset License Issue](https://toolathlon.xyz/docs/tasks/tech/18.md): Determine the most permissive license from direct data/model sources for each dataset, reply and close the issue.
- [HuggingFace Upload](https://toolathlon.xyz/docs/tasks/tech/19.md): Upload the highest-accuracy checkpoint from workspace to Hugging Face Hub.
- [K8S Safety Audit](https://toolathlon.xyz/docs/tasks/tech/241.md): Perform a Kubernetes cluster security audit and append the results to the "Week3" worksheet in the "Kubernetes Security Audit" Google Sheet, matching the column order and formatting of prior weeks.
- [K8S MySQL](https://toolathlon.xyz/docs/tasks/tech/242.md): Set up a persistent port-forward on port 30124 to the f1 database, and complete the debugging task.
- [K8S Redis Helm Upgrade](https://toolathlon.xyz/docs/tasks/tech/244.md): Upgrade the Bitnami Redis Helm chart to exact version 22.0.0 in the shared-services namespace.
- [K8S PR Preview Testing](https://toolathlon.xyz/docs/tasks/tech/245.md): Deploy the feature/pr-123 branch of SimpleShopping to Kubernetes.
- [K8S Deployment Cleanup](https://toolathlon.xyz/docs/tasks/tech/248.md): Identify and stop all Deployments in 'dev-' namespaces running versions older than 30 days, then email the cluster admin a chronologically ordered list of affected deployments with their age in days.
- [LLM Training Dataset](https://toolathlon.xyz/docs/tasks/tech/280.md): Organize pre-training datasets used by LLaMA and GPT-Neo into the ptdata sheet, sorted by size descending.
- [Sync ToDo to Readme](https://toolathlon.xyz/docs/tasks/tech/286.md): Extract all TODO items from .py files in lexicographical file order, update them into README.md.
- [Experiments Recordings](https://toolathlon.xyz/docs/tasks/tech/294.md): Update the Notion table with best scores and steps per benchmark from W&B runs, combining same-named runs and averaging available metrics.
- [Task Tracker](https://toolathlon.xyz/docs/tasks/tech/295.md): Find recent task-adding branches in BenchTasksCollv3, mark tasks as implemented or implementing, update Notion, and add implemented tasks to a new finalpool branch.
- [Git Bug Hunt](https://toolathlon.xyz/docs/tasks/tech/42.md): Email the author of the commit that introduced 'remove_caching_layer' to urgently investigate the performance issue, using the provided template.
- [Git Repo](https://toolathlon.xyz/docs/tasks/tech/47.md): Summarize the ImageNet 256 experimental results from my image generation papers into a LaTeX table.
- [Wandb Best Score](https://toolathlon.xyz/docs/tasks/tech/93.md): Identify the top-performing experiment and its best step from the Weights & Biases project.
- [Wandb Shortest Length](https://toolathlon.xyz/docs/tasks/tech/94.md): Identify the experiment yielding shortest responses from W&B logs, then extract entropy_loss, clip_ratio, and response_length_mean.
- [How to use Toolathlon](https://toolathlon.xyz/docs/test.md)
- [Trajectory](https://toolathlon.xyz/docs/traj.md)
- [Tool Decathlon](https://toolathlon.xyz/introduction.md)

## OpenAPI Specs

- [openapi](https://toolathlon.xyz/api-reference/openapi.json)