Benchmark POC

Benchmark POC: Benchmarking LLMs on AO

This proof of concept (POC) aims to benchmark large language models (LLMs) on AO. Funders can create a funding pool with a set of questions, and models compete to answer them. Participants can train and submit their models, which are evaluated and ranked on a daily-updated leaderboard. At the end of the funding period, winners are determined based on the leaderboard, and rewards are distributed.

🔑 Prerequisites

  1. Familiarity with AO, AOS, and ArDrive.

  2. AOS and ArDrive installed on your system.

📽️ Processes

  1. Pool Creation

  2. Model Upload

  3. Model Evaluation

  4. Scoring and Leaderboard

📜 Detailed Process

1. Pool Creation

Upload Dataset

Upload your chosen benchmarking dataset (e.g., SIQA) to the Arweave blockchain using the ArDrive application or CLI.

Create Pool

Create a pool by sending the following message through AOS:

ao.send({
   Target = 'xU9zFkq3X2ZQ6olwNVvr1vUWIjc3kXTWr7xKQD6dh10',
   Action = 'Transfer',
   -- Recipient is the pool process (the same ID the later messages target)
   Recipient = 'DLJoP8Xtdat6SKz3kqYGZPaa7DJBG6etF1jRLQCwquo',
   Quantity = Fee,  -- the pool fee; define Fee before sending
   ['X-Dataset'] = '<your dataset process id>',
   ['X-Allocation'] = 'ArithmeticDecrease'
})

For details on creating a dataset process ID, refer to the tutorial on our GitHub.
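The pool-creation message can also be built by a small helper that checks its inputs before anything is sent. The following is a minimal sketch, not the project's own tooling: `build_create_pool_msg` is a hypothetical name, the fee value `'1000'` is a placeholder, and passing `Quantity` as a string is an assumption about what the token process expects.

```lua
-- Sketch only: build_create_pool_msg is a hypothetical helper, not part of AO.
local TOKEN_PROCESS = 'xU9zFkq3X2ZQ6olwNVvr1vUWIjc3kXTWr7xKQD6dh10' -- Target from the message above
local POOL_PROCESS  = 'DLJoP8Xtdat6SKz3kqYGZPaa7DJBG6etF1jRLQCwquo' -- Recipient from the message above

local function build_create_pool_msg(dataset_process_id, fee)
  assert(type(dataset_process_id) == 'string' and #dataset_process_id > 0,
    'dataset process id must be a non-empty string')
  assert(fee ~= nil, 'fee is required')
  return {
    Target = TOKEN_PROCESS,
    Action = 'Transfer',
    Recipient = POOL_PROCESS,
    -- Assumption: the quantity is sent as a string.
    Quantity = tostring(fee),
    ['X-Dataset'] = dataset_process_id,
    ['X-Allocation'] = 'ArithmeticDecrease',
  }
end

-- Placeholder values for illustration only.
local msg = build_create_pool_msg('<your dataset process id>', '1000')

-- Send only when running inside AOS, where the global `ao` exists.
if ao then ao.send(msg) end
```

Building the table first makes it easy to inspect the message in the AOS prompt before it is sent.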

2. Model Upload

Upload Fine-Tuned Models

Upload two fine-tuned models (e.g., Llama3-8B fine-tuned on the Alpaca and SAMSum datasets) to the Arweave blockchain using the ArDrive application or CLI.

After uploading a model, you receive its data transaction ID, for example: ISrbGzQot05rs_HKC08O_SmkipYQnqgB1yC3mjZZeEo

3. Model Evaluation

Register Models

Join a pool by sending a message to the pool process with the following payload:

ao.send({
   Target = 'DLJoP8Xtdat6SKz3kqYGZPaa7DJBG6etF1jRLQCwquo',
   Action = 'Join-Pool',
   -- Data is a JSON string; keep it on one line, since a quoted Lua string cannot span lines
   Data = '{"dataset": <the pool id you want to join>, "model": <your model id>}'
})

Once you join a pool, your model is evaluated against the pool's dataset, and the resulting score is sent back to the pool process.
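Because the Join-Pool payload is JSON embedded in a Lua string, quoting mistakes are easy to make. The sketch below shows one way to assemble it, under stated assumptions: `build_join_pool_msg` and `is_plain_id` are hypothetical helpers, `'EXAMPLE_POOL_ID'` is a placeholder, and both IDs are assumed to be passed as JSON strings. If the pool process expects a different shape, adjust the format string accordingly.

```lua
-- Sketch only: these helpers are hypothetical, not part of AO.
local POOL_PROCESS = 'DLJoP8Xtdat6SKz3kqYGZPaa7DJBG6etF1jRLQCwquo' -- Target from the message above

-- True when `id` contains only letters, digits, '_' and '-',
-- so it can be placed in JSON without escaping.
local function is_plain_id(id)
  return type(id) == 'string' and id:match('^[%w_%-]+$') ~= nil
end

local function build_join_pool_msg(pool_id, model_id)
  assert(is_plain_id(pool_id), 'pool id must be a non-empty plain string')
  assert(is_plain_id(model_id), 'model id must be a non-empty plain string')
  return {
    Target = POOL_PROCESS,
    Action = 'Join-Pool',
    -- Assumption: both IDs are passed as JSON strings.
    Data = string.format('{"dataset": "%s", "model": "%s"}', pool_id, model_id),
  }
end

-- 'EXAMPLE_POOL_ID' is a placeholder; the model ID is the sample data tx ID shown earlier.
local join_msg = build_join_pool_msg('EXAMPLE_POOL_ID',
  'ISrbGzQot05rs_HKC08O_SmkipYQnqgB1yC3mjZZeEo')
print(join_msg.Data)

-- Send only when running inside AOS, where the global `ao` exists.
if ao then ao.send(join_msg) end
```

Rejecting IDs with unexpected characters up front keeps a malformed payload from reaching the pool process.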

4. Scoring and Leaderboard

Retrieve Leaderboard Results

Retrieve the leaderboard results by sending a message to the pool process with the following payload:

ao.send({
   Target = 'DLJoP8Xtdat6SKz3kqYGZPaa7DJBG6etF1jRLQCwquo',
   Action = 'Leaderboard',
   Data = '<pool id>'
})

Sending this message displays the model leaderboard within AOS. The leaderboard is updated every 24 hours.
