Benchmark POC
This proof of concept (POC) aims to benchmark large language models (LLMs) on AO. Funders can create a funding pool with a set of questions, and models compete to answer them. Participants can train and submit their models, which are evaluated and ranked on a daily-updated leaderboard. At the end of the funding period, winners are determined based on the leaderboard, and rewards are distributed.
Familiarity with Arweave, AO, and AOS.
AOS and ArDrive installed on your system.
Pool Creation
Model Upload
Model Evaluation
Scoring and Leaderboard
Upload Dataset
Upload your chosen benchmarking dataset (e.g., SIQA) to the Arweave blockchain using the ArDrive application or CLI.
Create pool
Create a pool by sending the following message through AOS:
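The exact payload depends on the pool process implementation; as a rough sketch, a pool-creation message in AOS might look like the following (the target ID, action name, and tag names are all illustrative placeholders, not the POC's actual interface):

```lua
-- Illustrative sketch only: the Action name, tag names, and process ID
-- are placeholders; consult the GitHub tutorial for the real payload.
Send({
  Target = "POOL_FACTORY_PROCESS_ID",        -- hypothetical pool-factory process
  Action = "Create-Pool",                    -- hypothetical action name
  Tags = {
    ["Dataset-TxId"] = "YOUR_DATASET_TX_ID", -- Arweave tx ID of the uploaded dataset
    ["Reward-Pool"]  = "1000",               -- funding amount (illustrative)
    ["Duration"]     = "30"                  -- funding period in days (illustrative)
  }
})
```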
For details on creating a dataset process ID, refer to the tutorial on our GitHub.
Upload fine-tuned models
Upload two fine-tuned models (e.g., Llama3-8B fine-tuned on the Alpaca and SAMSum datasets) to the Arweave blockchain via the ArDrive application or CLI. For example:
After uploading a model, you receive its data transaction ID, such as: ISrbGzQot05rs_HKC08O_SmkipYQnqgB1yC3mjZZeEo
Register models
Join a pool by sending a message to the pool process with the following payload:
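The registration payload is defined by the pool process; a hedged sketch of what joining a pool with your model's transaction ID might look like in AOS (action and tag names are assumptions, the tx ID is the example from above):

```lua
-- Illustrative sketch only: the Action name and tag names are placeholders.
Send({
  Target = "POOL_PROCESS_ID",  -- the pool process you are joining
  Action = "Join-Pool",        -- hypothetical action name
  Tags = {
    -- Arweave data tx ID of your uploaded fine-tuned model
    ["Model-TxId"] = "ISrbGzQot05rs_HKC08O_SmkipYQnqgB1yC3mjZZeEo"
  }
})
```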
Once you join a pool, your model is evaluated against the dataset, and its score is sent back to the pool process.
Retrieve the leaderboard results by sending a message to the pool process with the following payload:
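As a sketch, the leaderboard query could be a simple message like the one below (the action name is an assumption; the real payload is defined by the pool process):

```lua
-- Illustrative sketch only: the Action name is a placeholder.
Send({
  Target = "POOL_PROCESS_ID",   -- the pool process to query
  Action = "Get-Leaderboard"    -- hypothetical action name
})
```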
The leaderboard is updated every 24 hours.
This message executes and displays the model leaderboard within AOS, for example: