Benchmark POC
Benchmark POC: Benchmarking LLMs on AO
This proof of concept (POC) aims to benchmark large language models (LLMs) on AO. Funders can create a funding pool with a set of questions, and models compete to answer them. Participants can train and submit their models, which are evaluated and ranked on a daily-updated leaderboard. At the end of the funding period, winners are determined based on the leaderboard, and rewards are distributed.
🔑Prerequisites
AOS and ArDrive installed on your system.
📽️Processes
Pool Creation
Model Upload
Model Evaluation
Scoring and Leaderboard
📜Detailed Process
1. Pool Creation
Upload Dataset
Upload your chosen benchmarking dataset (e.g., SIQA) to the Arweave blockchain using the ArDrive application or CLI.
Create pool
Create a pool by sending the following message through AOS:
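ao.send({
  Target = 'xU9zFkq3X2ZQ6olwNVvr1vUWIjc3kXTWr7xKQD6dh10',
  Action = 'Transfer',
  Recipient = 'DLJoP8Xtdat6SKz3kqYGZPaa7DJBG6etF1jRLQCwquo',
  -- Fee is a variable holding the amount to transfer; define it before sending
  Quantity = Fee,
  -- Replace the placeholder with your dataset process ID, keeping the quotes
  ['X-Dataset'] = '<your dataset process id>',
  ['X-Allocation'] = 'ArithmeticDecrease'
})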
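The message above can be wrapped in a small helper so that the fee and dataset ID are checked before anything is sent. This is a minimal sketch, not part of the POC: `build_create_pool_msg`, `TOKEN_PROCESS`, and `POOL_PROCESS` are hypothetical names, and the two IDs are simply the ones used in the message above. It assumes that the `Target` is a token process (because the action is `Transfer`) and that `Quantity` should be a string of digits, as AO token processes commonly expect.

```lua
-- Hypothetical helper (not part of the POC): builds the pool-creation
-- transfer message shown above after checking its inputs.

-- Assumed to be the token process, since the action is a Transfer.
local TOKEN_PROCESS = 'xU9zFkq3X2ZQ6olwNVvr1vUWIjc3kXTWr7xKQD6dh10'
-- The pool process that receives the fee.
local POOL_PROCESS = 'DLJoP8Xtdat6SKz3kqYGZPaa7DJBG6etF1jRLQCwquo'

local function build_create_pool_msg(fee, dataset_id)
  -- Assumption: the quantity is passed as a string of digits.
  assert(type(fee) == 'string' and fee:match('^%d+$') ~= nil,
    'fee must be a string of digits')
  assert(type(dataset_id) == 'string' and #dataset_id > 0,
    'dataset_id must be a non-empty string')
  return {
    Target = TOKEN_PROCESS,
    Action = 'Transfer',
    Recipient = POOL_PROCESS,
    Quantity = fee,
    ['X-Dataset'] = dataset_id,
    ['X-Allocation'] = 'ArithmeticDecrease',
  }
end

-- Usage inside AOS (left as a comment because the values are placeholders):
-- ao.send(build_create_pool_msg(Fee, '<your dataset process id>'))
```

Building the table separately from sending it makes it easy to inspect the message in the AOS prompt before any tokens are transferred.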
For details on creating a dataset process ID, refer to the tutorial on our GitHub.
2. Prepare Model
Upload fine-tuned models
Upload two fine-tuned models (e.g., llama3-8B fine-tuned on the alpaca and samsum datasets) to the Arweave blockchain via the ArDrive application or CLI.
After uploading a model, you receive its data tx ID, for example: ISrbGzQot05rs_HKC08O_SmkipYQnqgB1yC3mjZZeEo
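Before registering a model, it can help to check that the ID you copied has the shape of an Arweave transaction ID: 43 characters drawn from the base64url alphabet (letters, digits, `-`, and `_`). The sketch below is a convenience only; `looks_like_arweave_id` is a hypothetical helper, and it checks the format of the string, not whether the transaction exists on Arweave.

```lua
-- Hypothetical helper: returns true when `id` has the shape of an
-- Arweave transaction ID (43 base64url characters).
local function looks_like_arweave_id(id)
  return type(id) == 'string'
    and #id == 43
    and id:match('^[A-Za-z0-9_%-]+$') ~= nil
end

-- The example data tx ID from this section passes the check.
print(looks_like_arweave_id('ISrbGzQot05rs_HKC08O_SmkipYQnqgB1yC3mjZZeEo'))  -- true

-- A truncated ID does not.
print(looks_like_arweave_id('ISrbGzQot05rs_HKC08O'))  -- false
```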
3. Model Evaluation
Register models
Join a pool by sending a message to the pool process with the following payload:
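ao.send({
  Target = 'DLJoP8Xtdat6SKz3kqYGZPaa7DJBG6etF1jRLQCwquo',
  Action = 'Join-Pool',
  -- Replace each <...> placeholder with its value; string IDs must be double-quoted inside the JSON
  Data = '{"dataset": <the pool id you want to join>, "model": <your model id>}'
})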
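Writing the JSON string by hand is easy to get wrong, so the `Data` value for the Join-Pool message can be built by a small function instead. This is a sketch under assumptions: `join_pool_payload` is a hypothetical name, and both IDs are assumed to be sent as JSON strings. Inside AOS, the bundled `json` module would likely be the more usual choice; plain string formatting is used here only so the example runs in any Lua interpreter.

```lua
-- Hypothetical helper: builds the JSON string for the Join-Pool `Data` field.
-- Assumption: both the pool ID and the model ID are sent as JSON strings.
local function join_pool_payload(pool_id, model_id)
  -- Escape backslashes and double quotes so the result stays valid JSON.
  local function esc(s)
    return (tostring(s):gsub('[\\"]', '\\%0'))
  end
  return string.format('{"dataset": "%s", "model": "%s"}',
    esc(pool_id), esc(model_id))
end

-- Usage inside AOS (left as a comment because the pool ID is a placeholder):
-- ao.send({
--   Target = 'DLJoP8Xtdat6SKz3kqYGZPaa7DJBG6etF1jRLQCwquo',
--   Action = 'Join-Pool',
--   Data = join_pool_payload('<pool id>', 'ISrbGzQot05rs_HKC08O_SmkipYQnqgB1yC3mjZZeEo'),
-- })
print(join_pool_payload('pool-1', 'model-1'))
```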
Once you join a pool, your model is evaluated on the pool's dataset, and its score is sent back to the pool process.
4. Scoring and Leaderboard
Retrieve Leaderboard Results
Retrieve the leaderboard results by sending a message to the pool process with the following payload:
ao.send({
  Target = 'DLJoP8Xtdat6SKz3kqYGZPaa7DJBG6etF1jRLQCwquo',
  Action = 'Leaderboard',
  -- Replace the placeholder with the pool ID, keeping the quotes
  Data = '<pool id>'
})
This message executes and displays the model leaderboard within AOS. The leaderboard is updated every 24 hours.
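The format of the leaderboard response is not specified here. Purely to illustrate the ranking step, the sketch below sorts a list of entries by score, highest first. The `model` and `score` field names, the assumption that a higher score is better, and the sample values are all made up for the example; they are not the pool process's actual schema or real results.

```lua
-- Illustrative only: ranks entries by score in descending order.
-- Field names and "higher is better" are assumptions.
local function rank(entries)
  -- Copy the list so the caller's table is left untouched.
  local sorted = {}
  for i, e in ipairs(entries) do
    sorted[i] = e
  end
  table.sort(sorted, function(a, b)
    if a.score ~= b.score then
      return a.score > b.score
    end
    return a.model < b.model -- deterministic tie-break by name
  end)
  return sorted
end

-- Made-up sample data, not real benchmark results.
local board = rank({
  { model = 'model-a', score = 0.61 },
  { model = 'model-b', score = 0.74 },
  { model = 'model-c', score = 0.61 },
})

for position, entry in ipairs(board) do
  print(position, entry.model, entry.score)
end
```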

