NAME

OpenAPI::Client::OpenAI::Path::evals-eval_id-runs - Documentation for the /evals/{eval_id}/runs path.

OPERATIONS

GET /evals/{eval_id}/runs

getEvalRuns

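$client->get_eval_runs({
    eval_id => $eval_id,    # required path parameter
    limit   => 20,          # optional query parameters: after, limit, order, status
});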

Get a list of runs for an evaluation.

Path/query parameters

  • eval_id (in path, required, string) - The ID of the evaluation to retrieve runs for.

  • after (in query, optional, string) - Identifier for the last run from the previous pagination request.

  • limit (in query, optional, integer) - Number of runs to retrieve.

    Default: 20

  • order (in query, optional, string) - Sort order for runs by timestamp: asc for ascending, desc for descending.

    Allowed values: asc, desc

    Default: asc

  • status (in query, optional, string) - Filter runs by status.

    Allowed values: queued, in_progress, completed, canceled, failed
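The query parameters above can be assembled and checked before a request is sent. The following is a minimal sketch: `build_eval_runs_params` is a hypothetical helper, and passing the path parameter and the query parameters together in one hashref to `get_eval_runs` is assumed from how OpenAPI::Client-based clients take parameters.

```perl
use strict;
use warnings;

# Allowed values copied from the parameter list above.
my %ALLOWED_ORDER  = map { $_ => 1 } qw(asc desc);
my %ALLOWED_STATUS = map { $_ => 1 } qw(queued in_progress completed canceled failed);

# Hypothetical helper: builds the parameter hashref for get_eval_runs,
# leaving out unset optional parameters and rejecting invalid values early.
sub build_eval_runs_params {
    my ($eval_id, %opt) = @_;
    die "eval_id is required\n" unless defined $eval_id && length $eval_id;

    my %params = (eval_id => $eval_id);

    if (defined $opt{limit}) {
        die "limit must be a positive integer\n"
            unless $opt{limit} =~ /\A[1-9][0-9]*\z/;
        $params{limit} = $opt{limit} + 0;
    }
    if (defined $opt{order}) {
        die "order must be asc or desc\n" unless $ALLOWED_ORDER{ $opt{order} };
        $params{order} = $opt{order};
    }
    if (defined $opt{status}) {
        die "unknown status '$opt{status}'\n" unless $ALLOWED_STATUS{ $opt{status} };
        $params{status} = $opt{status};
    }
    # Cursor from a previous page (see EvalRunList below).
    $params{after} = $opt{after} if defined $opt{after};

    return \%params;
}
```

For example, `$client->get_eval_runs(build_eval_runs_params($eval_id, limit => 50, status => 'completed'))` would ask for up to 50 completed runs.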

Responses

200 - A list of runs for the evaluation

Content-Type: application/json

Example:

{
   "data" : [
      {
         "created_at" : 1740110812,
         "error" : null,
         "eval_id" : "eval_67b7fa9a81a88190ab4aa417e397ea21",
         "id" : "evalrun_67b7fbdad46c819092f6fe7a14189620",
         "metadata" : {
            "test" : "synthetics"
         },
         "model" : "o3-mini",
         "name" : "Academic Assistant",
         "object" : "eval.run",
         "per_model_usage" : null,
         "per_testing_criteria_results" : [
            {
               "failed" : 80,
               "passed" : 91,
               "testing_criteria" : "String check grader"
            }
         ],
         "report_url" : "https://platform.openai.com/evaluations/eval_67b7fa9a81a88190ab4aa417e397ea21?run_id=evalrun_67b7fbdad46c819092f6fe7a14189620",
         "result_counts" : {
            "errored" : 0,
            "failed" : 80,
            "passed" : 91,
            "total" : 171
         },
         "run_data_source" : {
            "datasource_reference" : null,
            "max_completion_tokens" : null,
            "model" : "o3-mini",
            "seed" : null,
            "temperature" : null,
            "template_messages" : [
               {
                  "content" : {
                     "text" : "You are a helpful assistant.",
                     "type" : "input_text"
                  },
                  "role" : "system",
                  "type" : "message"
               },
               {
                  "content" : {
                     "text" : "Hello, can you help me with my homework?",
                     "type" : "input_text"
                  },
                  "role" : "user",
                  "type" : "message"
               }
            ],
            "top_p" : null,
            "type" : "completions"
         },
         "status" : "completed"
      }
   ],
   "first_id" : "evalrun_67b7fbdad46c819092f6fe7a14189620",
   "has_more" : false,
   "last_id" : "evalrun_67b7fbdad46c819092f6fe7a14189620",
   "object" : "list"
}
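Each entry in data carries its own status and result_counts. The sketch below reduces a decoded list to one line per run. It assumes you already hold the decoded hashref (for a blocking call, typically `$tx->res->json` on the returned Mojo transaction); `summarize_runs` is a hypothetical name.

```perl
use strict;
use warnings;

# Summarizes each run in a decoded EvalRunList hashref as
# "<id> <status> <passed>/<total>", using the result_counts object.
sub summarize_runs {
    my ($list) = @_;
    my @lines;
    for my $run (@{ $list->{data} || [] }) {
        my $counts = $run->{result_counts} || {};
        push @lines, sprintf '%s %s %d/%d',
            $run->{id}, $run->{status},
            $counts->{passed} // 0, $counts->{total} // 0;
    }
    return @lines;
}
```

Applied to the example response above, it would yield the single line `evalrun_67b7fbdad46c819092f6fe7a14189620 completed 91/171`.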

POST /evals/{eval_id}/runs

createEvalRun

$client->create_eval_run({
    eval_id => $eval_id,    # required path parameter
    body    => { ... },     # a CreateEvalRunRequest (see SCHEMAS)
});

Creates a new run for an evaluation, specifying the data source and the model configuration to test. The data source is validated against the schema specified in the evaluation's configuration.

Path/query parameters

  • eval_id (in path, required, string) - The ID of the evaluation to create a run for.
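The request body is a CreateEvalRunRequest (see SCHEMAS). As a sketch, assuming the request's data_source takes the same shape that the 201 example below echoes back (a completions source with a message template and inline file_content items), a body could be built like this. `build_completions_run_body` and its argument names are hypothetical; since the server validates the data source against the evaluation's configuration, check the shape against that configuration before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a CreateEvalRunRequest body for a
# "completions" run. The data_source layout mirrors the 201 example
# response and is an assumption, not a confirmed request schema.
sub build_completions_run_body {
    my (%args) = @_;

    # Each inline item becomes { item => { input => ..., ground_truth => ... } }.
    my @content;
    push @content, { item => { %$_ } } for @{ $args{items} || [] };

    my %body = (
        data_source => {
            type           => 'completions',
            model          => $args{model},
            input_messages => {
                type     => 'template',
                template => [
                    {
                        type    => 'message',
                        role    => 'developer',
                        content => { type => 'input_text', text => $args{instructions} },
                    },
                    {
                        type    => 'message',
                        role    => 'user',
                        # Template placeholder referring to each item's "input" field.
                        content => { type => 'input_text', text => '{{item.input}}' },
                    },
                ],
            },
            source => { type => 'file_content', content => \@content },
        },
    );
    # name is optional in CreateEvalRunRequest, so only send it when given.
    $body{name} = $args{name} if defined $args{name};
    return \%body;
}
```

The result would then be sent as `$client->create_eval_run({ eval_id => $eval_id, body => $body })`.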

Responses

201 - Successfully created a run for the evaluation

Content-Type: application/json

Example:

{
   "created_at" : 1743092069,
   "data_source" : {
      "input_messages" : {
         "template" : [
            {
               "content" : {
                  "text" : "Categorize a given news headline into one of the following topics: Technology, Markets, World, Business, or Sports.\n\n# Steps\n\n1. Analyze the content of the news headline to understand its primary focus.\n2. Extract the subject matter, identifying any key indicators or keywords.\n3. Use the identified indicators to determine the most suitable category out of the five options: Technology, Markets, World, Business, or Sports.\n4. Ensure only one category is selected per headline.\n\n# Output Format\n\nRespond with the chosen category as a single word. For instance: \"Technology\", \"Markets\", \"World\", \"Business\", or \"Sports\".\n\n# Examples\n\n**Input**: \"Apple Unveils New iPhone Model, Featuring Advanced AI Features\"  \n**Output**: \"Technology\"\n\n**Input**: \"Global Stocks Mixed as Investors Await Central Bank Decisions\"  \n**Output**: \"Markets\"\n\n**Input**: \"War in Ukraine: Latest Updates on Negotiation Status\"  \n**Output**: \"World\"\n\n**Input**: \"Microsoft in Talks to Acquire Gaming Company for $2 Billion\"  \n**Output**: \"Business\"\n\n**Input**: \"Manchester United Secures Win in Premier League Football Match\"  \n**Output**: \"Sports\" \n\n# Notes\n\n- If the headline appears to fit into more than one category, choose the most dominant theme.\n- Keywords or phrases such as \"stocks\", \"company acquisition\", \"match\", or technological brands can be good indicators for classification.\n",
                  "type" : "input_text"
               },
               "role" : "developer",
               "type" : "message"
            },
            {
               "content" : {
                  "text" : "{{item.input}}",
                  "type" : "input_text"
               },
               "role" : "user",
               "type" : "message"
            }
         ],
         "type" : "template"
      },
      "model" : "gpt-4o-mini",
      "sampling_params" : {
         "max_completion_tokens" : 2048,
         "seed" : 42,
         "temperature" : 1,
         "top_p" : 1
      },
      "source" : {
         "content" : [
            {
               "item" : {
                  "ground_truth" : "Technology",
                  "input" : "Tech Company Launches Advanced Artificial Intelligence Platform"
               }
            },
            {
               "item" : {
                  "ground_truth" : "Markets",
                  "input" : "Central Bank Increases Interest Rates Amid Inflation Concerns"
               }
            },
            {
               "item" : {
                  "ground_truth" : "World",
                  "input" : "International Summit Addresses Climate Change Strategies"
               }
            },
            {
               "item" : {
                  "ground_truth" : "Business",
                  "input" : "Major Retailer Reports Record-Breaking Holiday Sales"
               }
            },
            {
               "item" : {
                  "ground_truth" : "Sports",
                  "input" : "National Team Qualifies for World Championship Finals"
               }
            },
            {
               "item" : {
                  "ground_truth" : "Markets",
                  "input" : "Stock Markets Rally After Positive Economic Data Released"
               }
            },
            {
               "item" : {
                  "ground_truth" : "Business",
                  "input" : "Global Manufacturer Announces Merger with Competitor"
               }
            },
            {
               "item" : {
                  "ground_truth" : "Technology",
                  "input" : "Breakthrough in Renewable Energy Technology Unveiled"
               }
            },
            {
               "item" : {
                  "ground_truth" : "World",
                  "input" : "World Leaders Sign Historic Climate Agreement"
               }
            },
            {
               "item" : {
                  "ground_truth" : "Sports",
                  "input" : "Professional Athlete Sets New Record in Championship Event"
               }
            },
            {
               "item" : {
                  "ground_truth" : "Business",
                  "input" : "Financial Institutions Adapt to New Regulatory Requirements"
               }
            },
            {
               "item" : {
                  "ground_truth" : "Technology",
                  "input" : "Tech Conference Showcases Advances in Artificial Intelligence"
               }
            },
            {
               "item" : {
                  "ground_truth" : "Markets",
                  "input" : "Global Markets Respond to Oil Price Fluctuations"
               }
            },
            {
               "item" : {
                  "ground_truth" : "World",
                  "input" : "International Cooperation Strengthened Through New Treaty"
               }
            },
            {
               "item" : {
                  "ground_truth" : "Sports",
                  "input" : "Sports League Announces Revised Schedule for Upcoming Season"
               }
            }
         ],
         "type" : "file_content"
      },
      "type" : "completions"
   },
   "error" : null,
   "eval_id" : "eval_67e579652b548190aaa83ada4b125f47",
   "id" : "evalrun_67e57965b480819094274e3a32235e4c",
   "metadata" : {},
   "model" : "gpt-4o-mini",
   "name" : "gpt-4o-mini",
   "object" : "eval.run",
   "per_model_usage" : null,
   "per_testing_criteria_results" : null,
   "report_url" : "https://platform.openai.com/evaluations/eval_67e579652b548190aaa83ada4b125f47?run_id=evalrun_67e57965b480819094274e3a32235e4c",
   "result_counts" : {
      "errored" : 0,
      "failed" : 0,
      "passed" : 0,
      "total" : 0
   },
   "status" : "queued"
}
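A newly created run starts in the queued status, as in the example above. The sketch below polls until the run leaves the queued and in_progress states; treating completed, failed and canceled as final is an assumption based on the status values listed for the GET filter. Retrieving a single run uses an endpoint not documented on this page, so the fetch is left to a caller-supplied code reference. `wait_for_run` and its options are hypothetical.

```perl
use strict;
use warnings;

# Assumed final statuses; queued and in_progress are treated as still running.
my %TERMINAL = map { $_ => 1 } qw(completed failed canceled);

# Polls until the run reaches a final status or max_attempts is used up.
# fetch: code ref returning the current decoded EvalRun hashref.
# sleep: code ref used between attempts (replaceable in tests).
sub wait_for_run {
    my (%args) = @_;
    my $fetch        = $args{fetch};
    my $interval     = $args{interval}     // 5;
    my $max_attempts = $args{max_attempts} // 60;
    my $sleep        = $args{sleep}        // sub { sleep $_[0] };

    for my $attempt (1 .. $max_attempts) {
        my $run = $fetch->();
        return $run if $TERMINAL{ $run->{status} // '' };
        $sleep->($interval) if $attempt < $max_attempts;
    }
    die "run did not finish after $max_attempts attempts\n";
}
```

Keeping the fetch and the delay as parameters lets the loop be tested without network access and leaves the choice of retrieval call to the caller.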

400 - Bad request (for example, missing eval object)

Content-Type: application/json

Example:

{
   "code" : "string",
   "message" : "string",
   "param" : "string",
   "type" : "string"
}
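Error responses use the Error shape described under SCHEMAS. The following minimal sketch turns a status code and decoded body into a Perl exception. `check_response` is a hypothetical helper, and accepting the fields either at the top level (as in this example) or wrapped in an `error` object is a defensive assumption.

```perl
use strict;
use warnings;

# Returns the decoded payload for 2xx responses; otherwise dies with a
# message built from the Error fields (message, type, param).
sub check_response {
    my ($code, $json) = @_;
    return $json if $code >= 200 && $code < 300;

    my $err = ref $json eq 'HASH' ? $json : {};
    # Accept {"error": {...}} as well as the flat layout shown above.
    $err = $err->{error} if ref $err->{error} eq 'HASH';

    my $msg = sprintf 'HTTP %d: %s', $code, $err->{message} // 'unknown error';
    $msg .= " (type: $err->{type})"   if defined $err->{type};
    $msg .= " (param: $err->{param})" if defined $err->{param};
    die "$msg\n";
}
```

With a blocking call this might be used as `my $run = check_response($tx->res->code, $tx->res->json);`.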

SCHEMAS

CreateEvalRunRequest

Properties:

  • data_source (object, required) - Details about the run's data source.

  • metadata (Metadata)

    See "Metadata" below for shape.

  • name (string) - The name of the run.

Error

Properties:

  • code (string or null, required)

  • message (string, required)

  • param (string or null, required)

  • type (string, required)

EvalApiError

Properties:

  • code (string, required) - The error code.

  • message (string, required) - The error message.

EvalRun

Properties:

  • created_at (integer, required) - Unix timestamp (in seconds) when the evaluation run was created.

  • data_source (object, required) - Information about the run's data source.

  • error (EvalApiError, required)

    See "EvalApiError" below for shape.

  • eval_id (string, required) - The identifier of the associated evaluation.

  • id (string, required) - Unique identifier for the evaluation run.

  • metadata (Metadata, required)

    See "Metadata" below for shape.

  • model (string, required) - The model that is evaluated, if applicable.

  • name (string, required) - The name of the evaluation run.

  • object (string, required) - The type of the object. Always "eval.run".

    Allowed values: eval.run

    Default: eval.run

  • per_model_usage (array of object, required) - Usage statistics for each model during the evaluation run.

  • per_testing_criteria_results (array of object, required) - Results for each testing criterion applied during the evaluation run.

  • report_url (string, required) - The URL to the rendered evaluation run report on the UI dashboard.

  • result_counts (object, required) - Counters summarizing the outcomes of the evaluation run.

  • status (string, required) - The status of the evaluation run.

EvalRunList

Properties:

  • data (array of EvalRun, required) - An array of eval run objects.

  • first_id (string, required) - The identifier of the first eval run in the data array.

  • has_more (boolean, required) - Indicates whether there are more eval runs available.

  • last_id (string, required) - The identifier of the last eval run in the data array.

  • object (string, required) - The type of this object. It is always set to "list".

    Allowed values: list

    Default: list
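has_more and last_id work together with the after query parameter of GET /evals/{eval_id}/runs as a cursor. The sketch below collects all runs by passing the previous page's last_id as after until has_more is false. `fetch_all_runs` is hypothetical, and the page fetcher is a code reference so that the loop can be exercised without network access; the commented wiring assumes a blocking OpenAPI::Client::OpenAI call.

```perl
use strict;
use warnings;

# Follows the EvalRunList cursor: request pages with after => last_id
# while has_more is true. $fetch_page takes the cursor (undef for the
# first page) and returns one decoded EvalRunList hashref.
sub fetch_all_runs {
    my ($fetch_page) = @_;
    my (@runs, $after);
    while (1) {
        my $page = $fetch_page->($after);
        push @runs, @{ $page->{data} || [] };
        last unless $page->{has_more} && defined $page->{last_id};
        $after = $page->{last_id};
    }
    return \@runs;
}

# Possible wiring (assumption, not run here):
# my $runs = fetch_all_runs(sub {
#     my ($after) = @_;
#     my %params = (eval_id => $eval_id);
#     $params{after} = $after if defined $after;
#     return $client->get_eval_runs(\%params)->res->json;
# });
```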

Metadata

Up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
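These limits can be checked client-side before a request is sent. The following is a minimal sketch; `metadata_problems` is a hypothetical helper, and measuring length with Perl's `length` assumes keys and values are already decoded character strings.

```perl
use strict;
use warnings;

# Checks a metadata hashref against the stated limits: at most 16 pairs,
# keys up to 64 characters, string values up to 512 characters.
# Returns a list of problems (empty when the metadata is acceptable).
sub metadata_problems {
    my ($meta) = @_;
    my @problems;
    my @keys = sort keys %$meta;
    push @problems, sprintf('too many pairs: %d (max 16)', scalar @keys) if @keys > 16;
    for my $k (@keys) {
        my $v = $meta->{$k};
        push @problems, "key '${k}' is longer than 64 characters" if length $k > 64;
        if (!defined $v || ref $v) {
            push @problems, "value for '${k}' must be a string";
        }
        elsif (length $v > 512) {
            push @problems, "value for '${k}' is longer than 512 characters";
        }
    }
    return @problems;
}
```

For instance, the `{ test => 'synthetics' }` metadata from the GET example produces no problems.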

SEE ALSO

OpenAPI::Client::OpenAI::Path

COPYRIGHT AND LICENSE

Copyright (C) 2023-2026 by Nelson Ferraz

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.0 or, at your option, any later version of Perl 5 you may have available.