🐍 Python Client API Reference

EGPClient

EGPClient(api_key=None, account_id=None, endpoint_url=None, config_generator=None, log_curl_commands=None)

The SGP client object. This is the main entry point for interacting with SGP.

From this client you can access "collections" which interact with various SGP components. Each collection will have various methods that interact with the API. Some collections may have sub-collections to signify a hierarchical relationship between the entities they represent.
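
For example, a minimal usage sketch (assuming a valid API key and at least one existing Knowledge Base):

from scale_egp.sdk.client import EGPClient

client = EGPClient(api_key="<API_KEY>")

# Top-level collections are accessed as methods on the client.
knowledge_bases = client.knowledge_bases().list()
models = client.models().list()

# Sub-collections hang off their parent collection, e.g. uploads belong to a knowledge base.
uploads = client.knowledge_bases().uploads().list(knowledge_base=knowledge_bases[0])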

For users within strict firewall environments, the client can be configured to use a proxy via the config_generator argument. Here is an example of how to perform Kerberos authentication through a proxy.

import httpx
from requests_kerberos import HTTPKerberosAuth
from urllib3.util import parse_url

from scale_egp.sdk.client import EGPClient, EGPClientConfig, EGPClientConfigGenerator

class KerberosProxyConfigGenerator(EGPClientConfigGenerator):

    def __init__(self, proxy_url: str):
        self._proxy_url = proxy_url

    def generate(self) -> EGPClientConfig:
        # Route both HTTP and HTTPS traffic through the proxy, attaching fresh
        # Kerberos headers each time the config is generated.
        return EGPClientConfig(proxies={
            "http://": httpx.Proxy(url=self._proxy_url, headers=self._get_proxy_headers()),
            "https://": httpx.Proxy(url=self._proxy_url, headers=self._get_proxy_headers()),
        })

    def _get_proxy_headers(self) -> httpx.Headers:
        # Build a preemptive Kerberos "Negotiate" header for the proxy host.
        auth = HTTPKerberosAuth()
        negotiate_details = auth.generate_request_header(None, parse_url(self._proxy_url).host, is_preemptive=True)
        return httpx.Headers({"Proxy-Authorization": negotiate_details}, encoding="utf-8")

client = EGPClient(
    api_key="<API_KEY>",
    config_generator=KerberosProxyConfigGenerator("http://proxy.example.com:3128")
)

Parameters:

Name Type Description Default
api_key str

The SGP API key to use. If not provided, the EGP_API_KEY environment variable will be used. Enterprise customers of SGP should use the API key provided to them by their Scale account manager.

None
account_id str

The SGP account ID to use. If not provided, the ACCOUNT_ID environment variable will be used.

None
endpoint_url str

The SGP endpoint URL to use. If not provided, the EGP_ENDPOINT_URL environment variable will be used. If that is not set, the default SGP endpoint URL https://api.egp.scale.com will be used. Enterprise customers of SGP should use the endpoint URL provided by their Scale account manager.

None
config_generator Optional[EGPClientConfigGenerator]

An instance of EGPClientConfigGenerator, which must implement a generate function that returns an EGPClientConfig object. The client config will be used to inject httpx.Client arguments on demand per request. This is useful for dynamically setting proxies, timeouts, etc.

None

application_specs

application_specs()

Returns the Application Spec Collection.

Use this collection to create and manage Application Specs. These are specifications for the AI application you are building. They contain information about the AI application, such as its name and description. Associating your Evaluations with an Application Spec allows evaluations to be grouped by application.

Returns:

Type Description
ApplicationSpecCollection

The Application Spec Collection.

chunks

chunks()

Returns the Chunk Collection.

Use this collection to create and manage Chunks.

Returns:

Type Description
ChunkCollection

The Chunk Collection.

completions

completions()

Returns the Completion Collection.

Use this collection if you want to make a request to an LLM to generate a completion.

Returns:

Type Description
CompletionCollection

The Completion Collection.

evaluation_configs

evaluation_configs()

Returns the Evaluation Config Collection.

Use this collection to manage Evaluation Configurations. Evaluation Configurations are used to define the parameters of an evaluation.

Returns:

Type Description
EvaluationConfigCollection

The Evaluation Config Collection.

evaluation_datasets

evaluation_datasets()

Returns the Evaluation Dataset Collection.

Use this collection to create and manage Evaluation Datasets or Test Cases within them.

Returns:

Type Description
EvaluationDatasetCollection

The Evaluation Dataset Collection.

evaluations

evaluations()

Returns the Evaluation Collection.

Use this collection to create and manage Evaluations and Test Case Results.

Evaluations are used to evaluate the performance of AI applications. Users are expected to follow this procedure to perform an evaluation (a sketch of the workflow appears after the list):

  1. Select an Evaluation Dataset.
  2. Iterate through the dataset's Test Cases.
  3. For each test case, use the AI application to generate output from the test case's input prompt.
  4. Submit this data as a batch of Test Case Results associated with an Evaluation.
  5. Annotators asynchronously log into the SGP annotation platform to annotate the submitted Test Case Results. These annotations are used to evaluate the performance of the AI application.
  6. Check back on the Test Case Results to see whether the result field has been populated. If so, the evaluation is complete and the annotation data can be used to evaluate the performance of the AI application.
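
A minimal sketch of this loop, assuming an existing Evaluation Dataset, Application Spec, and Evaluation Configuration; the my_application helper and attribute access paths are illustrative assumptions:

from scale_egp.sdk.client import EGPClient

client = EGPClient(api_key="<API_KEY>")

evaluation = client.evaluations().create(
    name="support-bot-eval",
    description="First evaluation run of the support bot",
    application_spec=application_spec,    # an existing ApplicationSpec
    evaluation_config=evaluation_config,  # an existing EvaluationConfig
)

# Steps 1-3: iterate over the dataset's test cases and generate outputs.
for test_case in client.evaluation_datasets().test_cases().iter(evaluation_dataset=dataset):
    output = my_application(test_case.test_case_data.input)  # hypothetical application; assumed attribute path
    # Step 4: submit the output as a Test Case Result via
    # client.evaluations().test_case_results() (see TestCaseResultCollection).

# Steps 5-6: annotators review asynchronously; poll the Test Case Results until
# the result field is populated.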

Returns:

Type Description
EvaluationCollection

The Evaluation Collection.

knowledge_base_data_sources

knowledge_base_data_sources()

Returns the Knowledge Base Data Source Collection.

Use this collection to create and manage Knowledge Base Data Sources.

Returns:

Type Description
KnowledgeBaseDataSourceCollection

The Knowledge Base Data Source Collection.

knowledge_bases

knowledge_bases()

Returns the Knowledge Base Collection.

Use this collection to create and manage Knowledge Bases.

Returns:

Type Description
KnowledgeBaseCollection

The Knowledge Base Collection.

model_groups

model_groups()

Returns the Model Group Collection.

Use this collection to create and manage Model Groups.

TODO: Write extensive documentation on Model Groups

Returns:

Type Description
ModelGroupCollection

The Model Group Collection.

model_templates

model_templates()

Returns the Model Template Collection.

Use this collection to create and manage Model Templates.

To prevent users from creating arbitrary models, users with more advanced permissions can create Model Templates. Models can only be created from Model Templates, which lets power users curate a set of approved models that other users can derive from.

When the model is instantiated from a model template, the settings from the template are referenced to reserve the required computing resources, pull the correct docker image, etc.
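
A minimal sketch of deriving a model from an approved template; this assumes client is an EGPClient and that ModelTemplate objects expose id and model_type attributes:

# Pick an approved Model Template and create a Model from it.
template = client.model_templates().list()[0]

model = client.models().create(
    name="my-custom-model",
    model_type=template.model_type,   # assumed attribute on ModelTemplate
    model_template_id=template.id,    # assumed attribute on ModelTemplate
)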

Returns:

Type Description
ModelTemplateCollection

The Model Template Collection.

models

models()

Returns the Model Collection.

Use this collection to create and manage Models.

In generative AI applications, there are many types of models that are useful. For example, embedding models are useful for translating natural language into queryable vector representations, reranking models are useful when a vector database's query results need to be re-ranked based on some other criteria, and LLMs are useful for generating text from a prompt.

This collection allows you to create, deploy, and manage any custom model you choose if none of the built-in models fit your use case.
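
For example, a hedged sketch of deploying a Model and executing the deployment; the id attribute and the shape of request are assumptions (the request schema is defined by the underlying Model Template):

deployment = client.models().deployments().create(
    model=model,                          # a ModelInstance created earlier
    name="my-custom-model-deployment",
)

response = client.models().deployments().execute(
    id=deployment.id,                     # assumed attribute on ModelDeployment
    model=model,
    request={"prompt": "Hello, world"},   # shape depends on the model's request parameters schema
    timeout=60,
)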

Returns:

Type Description
ModelInstanceCollection

The Model Collection.

question_sets

question_sets()

Returns the Question Set Collection.

Use this collection to create and manage Question Sets.

Returns:

Type Description
QuestionSetCollection

The Question Set Collection.

questions

questions()

Returns the Question Collection.

Use this collection to create and manage Questions.

Returns:

Type Description
QuestionCollection

The Question Collection.

users

users()

Returns the Users Collection.

Use this collection to get information about the currently authenticated user or to get information about other users.

Returns:

Type Description
UsersCollection

The Users Collection.

KnowledgeBaseCollection

KnowledgeBaseCollection(api_client)

artifacts

artifacts()

Returns a KnowledgeBaseArtifactsCollection object for artifacts associated with a knowledge base.

Returns:

Type Description
KnowledgeBaseArtifactsCollection

A KnowledgeBaseArtifactsCollection object.

chunks

chunks()

Returns a KnowledgeBaseChunksCollection object for chunks associated with a knowledge base.

Returns:

Type Description
KnowledgeBaseChunksCollection

A KnowledgeBaseChunksCollection object.

create

create(name, embedding_model_name=None, model_deployment_id=None, metadata=None, account_id=None)

Create a new Knowledge Base. Must pass either embedding_model_name or model_deployment_id.

Parameters:

Name Type Description Default
name str

The name of the Knowledge Base.

required
embedding_model_name Optional[EmbeddingModelName]

The name of the embedding model to use for the Knowledge Base.

None
model_deployment_id Optional[str]

ID for an EmbeddingConfigModelsAPI config.

None
metadata Optional[Dict[str, Any]]

The metadata of the Knowledge Base.

None
account_id Optional[str]

The ID of the account to create this Knowledge Base for.

None

Returns:

Type Description
KnowledgeBase

The newly created Knowledge Base.
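
A minimal sketch, assuming client is an EGPClient; replace the placeholder with a supported EmbeddingModelName value:

knowledge_base = client.knowledge_bases().create(
    name="product-docs",
    embedding_model_name="<EMBEDDING_MODEL_NAME>",  # must be a supported EmbeddingModelName value
    metadata={"team": "support"},
)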

delete

delete(id)

Delete a Knowledge Base by ID.

Parameters:

Name Type Description Default
id str

The ID of the Knowledge Base.

required

get

get(id)

Get a Knowledge Base by ID.

Parameters:

Name Type Description Default
id str

The ID of the Knowledge Base.

required

Returns:

Type Description
KnowledgeBase

The Knowledge Base.

list

list()

List all Knowledge Bases.

Returns:

Type Description
List[KnowledgeBase]

A list of Knowledge Bases.

upload_schedules

upload_schedules()

Returns a KnowledgeBaseUploadScheduleCollection object for upload schedules associated with a knowledge base.

Returns:

Type Description
KnowledgeBaseUploadScheduleCollection

A KnowledgeBaseUploadScheduleCollection object.

uploads

uploads()

Returns a KnowledgeBaseUploadsCollection object for uploads associated with a knowledge base.

Returns:

Type Description
KnowledgeBaseUploadsCollection

A KnowledgeBaseUploadsCollection object.

KnowledgeBaseUploadsCollection

KnowledgeBaseUploadsCollection(api_client)

cancel

cancel(knowledge_base, id)

Cancel an upload.

Parameters:

Name Type Description Default
knowledge_base KnowledgeBase

The Knowledge Base the upload was created for.

required
id str

The ID of the upload to cancel.

required

Returns:

Type Description
bool

True if the upload was canceled, False otherwise.

create_local_upload

create_local_upload(knowledge_base, data_source_config, chunks)

Create a new local upload.

Parameters:

Name Type Description Default
knowledge_base KnowledgeBase

The Knowledge Base to upload data to.

required
data_source_config LocalChunksSourceConfig

The data source config.

required
chunks List[ChunkToUpload]

The chunks to upload.

required

Returns:

Type Description
KnowledgeBaseUpload

The newly created local upload.
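
A hedged sketch of a local upload; the import path and the LocalChunksSourceConfig and ChunkToUpload fields shown here are illustrative assumptions:

from scale_egp.sdk.types import LocalChunksSourceConfig, ChunkToUpload  # assumed import path

upload = client.knowledge_bases().uploads().create_local_upload(
    knowledge_base=knowledge_base,
    data_source_config=LocalChunksSourceConfig(
        artifact_name="faq.md",         # assumed field
        artifact_uri="local://faq.md",  # assumed field
    ),
    chunks=[
        ChunkToUpload(
            text="To reset your password, open Settings > Security.",  # assumed field
            metadata={"source": "faq.md"},                             # assumed field
        ),
    ],
)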

create_remote_upload

create_remote_upload(knowledge_base, data_source_config, data_source_auth_config, chunking_strategy_config)

Create a new remote upload.

Parameters:

Name Type Description Default
knowledge_base KnowledgeBase

The Knowledge Base to upload data to.

required
data_source_config RemoteDataSourceConfig

The data source config.

required
data_source_auth_config Optional[DataSourceAuthConfig]

The data source auth config.

required
chunking_strategy_config ChunkingStrategyConfig

The chunking strategy config.

required

Returns:

Type Description
KnowledgeBaseUpload

The newly created remote upload.

get

get(id, knowledge_base)

Get a Knowledge Base Upload by ID.

Parameters:

Name Type Description Default
id str

The ID of the Knowledge Base Upload.

required
knowledge_base KnowledgeBase

The Knowledge Base the upload was created for.

required

Returns:

Type Description
KnowledgeBaseUpload

The Knowledge Base Upload.

list

list(knowledge_base)

List all Knowledge Base Uploads.

Returns:

Type Description
List[KnowledgeBaseUpload]

A list of Knowledge Base Uploads.

KnowledgeBaseArtifactsCollection

KnowledgeBaseArtifactsCollection(api_client)

get

get(id, knowledge_base, status_filter=ChunkUploadStatus.COMPLETED.value)

Get a Knowledge Base Artifact by ID.

Parameters:

Name Type Description Default
id str

The ID of the Knowledge Base Artifact.

required
knowledge_base KnowledgeBase

The Knowledge Base the artifact was created for.

required
status_filter Optional[ChunkUploadStatus]

Return only artifacts with the given status.

ChunkUploadStatus.COMPLETED.value

Returns:

Type Description
KnowledgeBaseArtifact

The Knowledge Base Artifact.

list

list(knowledge_base)

List all Knowledge Base Artifacts.

Returns:

Type Description
List[KnowledgeBaseArtifact]

A list of Knowledge Base Artifacts.

ChunkCollection

ChunkCollection(api_client)

rank

rank(query, relevant_chunks, rank_strategy, top_k=None, account_id=None)

Re-rank a list of chunks against a query.

Parameters:

Name Type Description Default
query str

Natural language query to re-rank chunks against. If a vector store query was originally used to retrieve these chunks, please use the same query for this ranking.

required
relevant_chunks List[Chunk]

List of chunks to rank.

required
rank_strategy Union[CrossEncoderRankStrategy, RougeRankStrategy, ModelRankStrategy]

The ranking strategy to use. Rank strategies determine how the ranking is done; they consist of the ranking method name and any additional parameters needed to compute the ranking. So far, only the cross_encoder rank strategy is supported; more rank strategies are planned.

required
top_k Optional[int]

Number of chunks to return. Must be greater than 0 if specified. If not specified, all chunks will be returned.

None

Returns:

Type Description
List[Chunk]

An ordered list of the re-ranked chunks.
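
A minimal sketch, assuming the CrossEncoderRankStrategy import path and that its defaults are acceptable:

from scale_egp.sdk.types import CrossEncoderRankStrategy  # assumed import path

ranked_chunks = client.chunks().rank(
    query="How do I reset my password?",
    relevant_chunks=retrieved_chunks,          # chunks previously retrieved from a knowledge base query
    rank_strategy=CrossEncoderRankStrategy(),  # assuming the default configuration
    top_k=5,
)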

synthesize

synthesize(query, chunks)

Synthesize a natural language response from a list of chunks.

Parameters:

Name Type Description Default
query str

Natural language query to synthesize response from.

required
chunks List[Chunk]

List of chunks to synthesize response from.

required

Returns:

Type Description
str

A natural language response synthesized from the list of chunks.

CompletionCollection

CompletionCollection(api_client)

create

create(model, prompt, account_id, images=None, model_parameters=None)

Create a new LLM Completion.

Parameters:

Name Type Description Default
model Union[Literal['gpt-4', 'gpt-4-0613', 'gpt-4-32k', 'gpt-4-32k-0613', 'gpt-4-vision-preview', 'gpt-4o', 'gpt-3.5-turbo', 'gpt-3.5-turbo-0613', 'gpt-3.5-turbo-16k', 'gpt-3.5-turbo-16k-0613', 'text-davinci-003', 'text-davinci-002', 'text-curie-001', 'text-babbage-001', 'text-ada-001', 'claude-instant-1', 'claude-instant-1.1', 'claude-2', 'claude-2.0', 'llama-7b', 'llama-2-7b', 'llama-2-7b-chat', 'llama-2-13b', 'llama-2-13b-chat', 'llama-2-70b', 'llama-2-70b-chat', 'falcon-7b', 'falcon-7b-instruct', 'falcon-40b', 'falcon-40b-instruct', 'mpt-7b', 'mpt-7b-instruct', 'flan-t5-xxl', 'mistral-7b', 'mistral-7b-instruct', 'mixtral-8x7b', 'mixtral-8x7b-instruct', 'llm-jp-13b-instruct-full', 'llm-jp-13b-instruct-full-dolly', 'zephyr-7b-alpha', 'zephyr-7b-beta', 'codellama-7b', 'codellama-7b-instruct', 'codellama-13b', 'codellama-13b-instruct', 'codellama-34b', 'codellama-34b-instruct', 'codellama-70b', 'codellama-70b-instruct', 'gemini-pro', 'gemini-1.5-pro-preview-0409'], str]

The model to use for the completion.

required
prompt str

The prompt to use for the completion.

required
model_parameters Optional[ModelParameters]

The parameters to use for the model.

None

Returns:

Type Description
Completion

The newly created Completion.
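
A minimal sketch; the attribute path used to read the generated text is an assumption:

completion = client.completions().create(
    model="gpt-3.5-turbo",
    prompt="Summarize the benefits of vector databases in two sentences.",
    account_id="<ACCOUNT_ID>",
)
print(completion.completion.text)  # assumed attribute path for the generated text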

stream

stream(model, prompt, account_id, images=None, model_parameters=None)

Stream LLM Completions.

Returns:

Type Description
Iterable[Completion]

The newly created Completion, streamed incrementally as it is generated.

ModelTemplateCollection

ModelTemplateCollection(api_client)

Collections class for SGP Models.

create

create(name, endpoint_type, model_type, vendor_configuration, model_creation_parameters_schema=None, model_request_parameters_schema=None, account_id=None)

Create a new SGP Model Template.

Parameters:

Name Type Description Default
name str

The name of the Model Template.

required
endpoint_type ModelEndpointType

The type of model this template will create. See Model Types and Schemas

required
model_type ModelType

The type of the Model Template.

required
vendor_configuration ModelVendorConfiguration

The vendor configuration of the Model Template.

required
model_creation_parameters_schema Optional[ParameterSchema]

The model creation parameters schema of the Model Template.

None
model_request_parameters_schema Optional[ParameterSchema]

The model request parameters schema of the Model Template.

None
account_id Optional[str]

The account ID of the Model Template.

None

Returns:

Type Description
ModelTemplate

The created Model Template.

delete

delete(id)

Delete a Model Template by ID.

Returns:

Type Description
bool

True if the Model Template was successfully deleted.

get

get(id)

Get a Model Template by ID.

Returns:

Type Description
ModelTemplate

The Model Template.

list

list()

List all Model Templates that the user has access to.

Returns:

Type Description
List[ModelTemplate]

A list of Model Templates that the user has access to.

update

update(id, *, name, endpoint_type=None, model_type=None, vendor_configuration=None, model_creation_parameters_schema=None, model_request_parameters_schema=None)

Update a Model Template by ID.

Returns:

Type Description
ModelTemplate

The updated Model Template.

ModelInstanceCollection

ModelInstanceCollection(api_client)

Collections class for SGP Models.

create

create(name, model_type, model_group_id=None, model_vendor=None, model_template_id=None, base_model_id=None, base_model_metadata=None, account_id=None, model_card=None, training_data_card=None)

Create a new SGP Model.

Returns:

Type Description
ModelInstance

The created Model.

delete

delete(id)

Delete a model by ID.

Returns:

Type Description
bool

True if the model was successfully deleted.

deployments

deployments()

Returns a ModelDeploymentCollection for deployments associated with this model.

get

get(id)

Get a Model by ID.

Returns:

Type Description
ModelInstance

The Model.

list

list()

List all models that the user has access to.

Returns:

Type Description
List[ModelInstance]

A list of models.

update

update(id, *, name=None, model_template_id=None, base_model_id=None, model_creation_parameters=None)

Update a Model by ID.

Returns:

Type Description
ModelInstance

The updated Model.

ModelDeploymentCollection

ModelDeploymentCollection(api_client)

create

create(model, name, model_creation_parameters=None, vendor_configuration=None, account_id=None)

Create a new ModelDeployment.

Parameters:

Name Type Description Default
model ModelInstance

The Model to associate the ModelDeployment with.

required

Returns:

Type Description
ModelDeployment

The newly created ModelDeployment.

delete

delete(id, model)

Delete a ModelDeployment.

Parameters:

Name Type Description Default
id str

The ID of the ModelDeployment.

required
model ModelInstance

The Model the ModelDeployment is associated with.

required

execute

execute(id, model, request, timeout=None)

Execute the specified model deployment with the given request.

Returns:

Type Description
BaseModelResponse

The model deployment's response.

get

get(id, model)

Get a ModelDeployment by ID.

Parameters:

Name Type Description Default
id str

The ID of the ModelDeployment.

required
model ModelInstance

The Model the ModelDeployment is associated with.

required

Returns:

Type Description
ModelDeployment

The ModelDeployment.

list

list(model)

List ModelDeployments.

Parameters:

Name Type Description Default
model ModelInstance

The Model the ModelDeployments are associated with.

required

Returns:

Type Description
List[ModelDeployment]

A list of ModelDeployments.

ApplicationSpecCollection

ApplicationSpecCollection(api_client)

create

create(name, description, account_id=None)

Create a new Application Spec.

Parameters:

Name Type Description Default
name str

The name of the Application Spec.

required
description str

The description of the Application Spec.

required
account_id Optional[str]

The ID of the account to create this Application Spec for.

None

Returns:

Type Description
ApplicationSpec

The newly created Application Spec.

delete

delete(id)

Delete an Application Spec by ID.

Parameters:

Name Type Description Default
id str

The ID of the Application Spec.

required

get

get(id)

Get an Application Spec by ID.

Parameters:

Name Type Description Default
id str

The ID of the Application Spec.

required

Returns:

Type Description
ApplicationSpec

The Application Spec.

list

list()

List all Application Specs.

Returns:

Type Description
List[ApplicationSpec]

A list of Application Specs.

update

update(id, name=None, description=None)

Update an Application Spec by ID.

Parameters:

Name Type Description Default
id str

The ID of the Application Spec.

required
name Optional[str]

The name of the Application Spec.

None
description Optional[str]

The description of the Application Spec.

None

Returns:

Type Description
ApplicationSpec

The updated Application Spec.

QuestionCollection

QuestionCollection(api_client)

create

create(type, title, prompt, choices, multi=None, dropdown=False, required=False, conditions=None, account_id=None)

Create a new Question.

Parameters:

Name Type Description Default
type QuestionType

The type of the Question.

required
title str

The title of the Question.

required
prompt str

The prompt of the Question.

required
account_id Optional[str]

The ID of the account to create this Question for.

None
choices Optional[List[CategoricalChoice]]

The choices of the Question.

required
multi Optional[bool]

Whether the question is multi-select.

None
dropdown Optional[bool]

Whether the question is to be displayed as a dropdown.

False
required Optional[bool]

Whether the question is required.

False
conditions Optional[List[dict]]

The conditions for the question.

None

Returns: The newly created Question.

get

get(id)

Get the details of a Question.

Parameters:

Name Type Description Default
id str

The ID of the Question.

required

Returns:

Type Description
Question

The Question.

list

list()

List Questions.

Returns:

Type Description
List[Question]

A list of Questions.

QuestionSetCollection

QuestionSetCollection(api_client)

create

create(name, questions, account_id=None)

Create a new Question Set.

Parameters:

Name Type Description Default
name str

The name of the Question Set.

required
questions List[Question]

The questions in this Question Set.

required
account_id Optional[str]

The ID of the account to create this Question Set for.

None

Returns: The newly created Question Set.

get

get(id)

Get the details of a Question Set.

Parameters:

Name Type Description Default
id str

The ID of the Question Set.

required

Returns:

Type Description
QuestionSet

The details of the Question Set.

list

list()

List Question Sets.

Returns:

Type Description
List[QuestionSet]

A list of Question Sets.

EvaluationConfigCollection

EvaluationConfigCollection(api_client)

create

create(evaluation_type, question_set, account_id=None)

Create a new Evaluation Configuration.

Parameters:

Name Type Description Default
evaluation_type EvaluationType

The type of Evaluation. Only HUMAN is supported.

required
question_set QuestionSet

The Question Set to associate with the Evaluation.

required
account_id Optional[str]

The ID of the account to create this Evaluation Configuration for.

None

Returns:

Type Description
EvaluationConfig

The newly created Evaluation Configuration.

delete

delete(id)

Delete an Evaluation Configuration.

Parameters:

Name Type Description Default
id str

The ID of the Evaluation Configuration.

required

get

get(id)

Get the details of an evaluation config.

Parameters:

Name Type Description Default
id str

The ID of the Evaluation Configuration.

required

Returns:

Type Description
EvaluationConfig

The Evaluation Configuration.

list

list()

List Evaluation Configurations.

Returns:

Type Description
List[EvaluationConfig]

A list of Evaluation Configurations.

EvaluationDatasetCollection

EvaluationDatasetCollection(api_client)

add_test_cases

add_test_cases(evaluation_dataset, test_cases_data, update_dataset_version=True)

Add new test cases to an existing dataset.

Unless you want to batch up multiple modifications to a dataset and snapshot them all at once, you should leave update_dataset_version=True. See the docs for the update_dataset_version method for more details.

The schema_types currently supported and their corresponding fields are:

Schema Types:

GENERATION

Field Type Default
input str required
expected_output str None
expected_extra_info Dict[str, Any] None

Parameters:

Name Type Description Default
evaluation_dataset EvaluationDataset

The dataset to add test cases to.

required
test_cases_data List[Union[GenerationTestCaseData]]

The test cases to add.

required
update_dataset_version bool

Whether to update the dataset version after adding the test cases. Defaults to True.

True

Returns:

Type Description
EvaluationDataset

The updated dataset.

add_test_cases_from_file

add_test_cases_from_file(evaluation_dataset, filepath, update_dataset_version=True)

Add new test cases to an existing dataset from a JSONL file.

Unless you want to batch up multiple modifications to a dataset and snapshot them all at once, you should leave update_dataset_version=True. See the docs for the update_dataset_version method for more details.

The schema_types currently supported and their corresponding fields are:

Schema Types:

GENERATION

Field Type Default
input str required
expected_output str None
expected_extra_info Dict[str, Any] None

Parameters:

Name Type Description Default
evaluation_dataset EvaluationDataset

The dataset to add test cases to.

required
filepath str

The path to the JSONL file.

required
update_dataset_version bool

Whether to update the dataset version after adding the test cases.

True

Returns:

Type Description
EvaluationDataset

The updated dataset.

create

create(name, schema_type, account_id=None)

Create a new empty dataset.

Generally, since most users will already have a list of test cases they want to create a dataset from, they should use the create_from_file method instead.

Parameters:

Name Type Description Default
name str

The name of the dataset.

required
schema_type TestCaseSchemaType

The schema type of the dataset.

required
account_id Optional[str]

The ID of the account to create this dataset for.

None

Returns:

Type Description
EvaluationDataset

The newly created dataset.

create_from_file

create_from_file(name, schema_type, filepath)

Create a new dataset that is seeded with test cases from a JSONL file.

The keys of each JSON object in the JSONL file must match the fields of the specified schema type.

The schema_types currently supported and their corresponding fields are:

Schema Types:

GENERATION

Field Type Default
input str required
expected_output str None
expected_extra_info Dict[str, Any] None

Parameters:

Name Type Description Default
name str

The name of the dataset.

required
schema_type TestCaseSchemaType

The schema type of the dataset.

required
filepath str

The path to the JSONL file.

required

Returns:

Type Description
EvaluationDataset

The newly created dataset.
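
For example, a sketch that writes a small GENERATION-schema JSONL file and seeds a dataset from it; the TestCaseSchemaType import path is an assumption:

import json

from scale_egp.sdk.types import TestCaseSchemaType  # assumed import path

# Each line is a JSON object whose keys match the GENERATION schema fields.
rows = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "What is 2 + 2?", "expected_output": "4", "expected_extra_info": {"difficulty": "easy"}},
]
with open("test_cases.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

dataset = client.evaluation_datasets().create_from_file(
    name="geography-questions",
    schema_type=TestCaseSchemaType.GENERATION,  # assuming an enum member named GENERATION
    filepath="test_cases.jsonl",
)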

delete

delete(id)

Delete an existing dataset.

Parameters:

Name Type Description Default
id str

The ID of the dataset.

required

Returns:

Type Description
bool

True if the dataset was deleted, False otherwise.

delete_test_cases

delete_test_cases(evaluation_dataset, test_case_ids, update_dataset_version=True)

Delete test cases from an existing dataset.

Unless you want to batch up multiple modifications to a dataset and snapshot them all at once, you should leave update_dataset_version=True. See the docs for the update_dataset_version method for more details.

Parameters:

Name Type Description Default
evaluation_dataset EvaluationDataset

The dataset to delete test cases from.

required
test_case_ids List[str]

The IDs of the test cases to delete.

required
update_dataset_version bool

Whether to update the dataset version after deleting the test cases.

True

Returns:

Type Description
EvaluationDataset

The updated dataset.

get

get(id)

Get an existing dataset by ID.

Parameters:

Name Type Description Default
id str

The ID of the dataset.

required

Returns:

Type Description
EvaluationDataset

The dataset.

list

list()

List all datasets.

Returns:

Type Description
List[EvaluationDataset]

The datasets.

modify_test_cases

modify_test_cases(evaluation_dataset, modified_test_cases, update_dataset_version=True)

Modify test cases in an existing dataset.

Unless you want to batch up multiple modifications to a dataset and snapshot them all at once, you should leave update_dataset_version=True. See the docs for the update_dataset_version method for more details.

Parameters:

Name Type Description Default
evaluation_dataset EvaluationDataset

The dataset to modify test cases in.

required
modified_test_cases List[TestCase]

The modified test cases.

required
update_dataset_version bool

Whether to update the dataset version after modifying the test cases.

True

Returns:

Type Description
EvaluationDataset

The updated dataset.

overwrite_from_file

overwrite_from_file(evaluation_dataset, schema_type, filepath)

Overwrite all test cases in an existing dataset from a JSONL file.

The keys of each JSON object in the JSONL file must match the fields of the specified schema type.

The schema_types currently supported and their corresponding fields are:

Schema Types:

GENERATION

Field Type Default
input str required
expected_output str None
expected_extra_info Dict[str, Any] None

Parameters:

Name Type Description Default
evaluation_dataset EvaluationDataset

The dataset to overwrite.

required
schema_type TestCaseSchemaType

The schema type of the dataset.

required
filepath str

The path to the JSONL file.

required

Returns:

Type Description
EvaluationDataset

The updated dataset.

test_cases

test_cases()

Returns a TestCaseCollection object for test cases associated with the current Evaluation Dataset.

update

update(id, name=None, schema_type=None)

Update the attributes of an existing dataset.

Important: This method will NOT update the version of the dataset. It will only update the attributes of the dataset. If you want to snapshot the current state of the dataset under an incremented version number, you should use the update_dataset_version method instead.

Parameters:

Name Type Description Default
id str

The ID of the dataset.

required
name Optional[str]

The name of the dataset.

None
schema_type Optional[TestCaseSchemaType]

The schema type of the dataset.

None

Returns:

Type Description
EvaluationDataset

The updated dataset.

update_dataset_version

update_dataset_version(evaluation_dataset)

Update the version of an existing dataset.

This method will snapshot the current state of the dataset under an incremented version number.

Warning: By default, the add_test_cases, delete_test_cases, and modify_test_cases methods will automatically update the dataset version for you. However, if you want to batch up multiple modifications to a dataset and snapshot them all at once, you can set update_dataset_version=False on those methods and then call this method manually afterward.

Parameters:

Name Type Description Default
evaluation_dataset EvaluationDataset

The dataset to update the version of.

required

Returns:

Type Description
EvaluationDataset

The updated dataset.
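
For example, a sketch of batching several modifications and snapshotting them under a single new version (new_test_cases and stale_test_case_ids stand in for your own data):

# Make several modifications without snapshotting a new version each time...
dataset = client.evaluation_datasets().add_test_cases(
    evaluation_dataset=dataset,
    test_cases_data=new_test_cases,
    update_dataset_version=False,
)
dataset = client.evaluation_datasets().delete_test_cases(
    evaluation_dataset=dataset,
    test_case_ids=stale_test_case_ids,
    update_dataset_version=False,
)

# ...then snapshot all of the changes under one incremented version.
dataset = client.evaluation_datasets().update_dataset_version(evaluation_dataset=dataset)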

TestCaseCollection

TestCaseCollection(api_client)

create

create(evaluation_dataset, schema_type, test_case_data, test_case_metadata=None, chat_history=None)

Create a new test case.

Parameters:

Name Type Description Default
evaluation_dataset EvaluationDataset

The dataset to create the test case in.

required
schema_type TestCaseSchemaType

The schema type of the test case.

required
test_case_data Union[GenerationTestCaseData]

The test case data.

required

Returns:

Type Description
TestCase

The newly created test case.

create_batch

create_batch(evaluation_dataset, test_cases)

Create multiple new test cases.

Parameters:

Name Type Description Default
evaluation_dataset EvaluationDataset

The dataset to create the test cases in.

required
test_cases List[TestCaseRequest]

The test cases to create.

required

Returns:

Type Description
List[TestCase]

The newly created test cases.

delete

delete(id, evaluation_dataset)

Delete an existing test case.

Parameters:

Name Type Description Default
id str

The ID of the test case.

required
evaluation_dataset EvaluationDataset

The dataset to delete the test case from.

required

Returns:

Type Description
bool

True if the test case was deleted successfully, False otherwise.

get

get(id, evaluation_dataset)

Get an existing test case by ID.

Parameters:

Name Type Description Default
id str

The ID of the test case.

required
evaluation_dataset EvaluationDataset

The dataset to get the test case from.

required

Returns:

Type Description
TestCase

The test case.

iter

iter(evaluation_dataset)

Iterate over all test cases in a dataset.

Parameters:

Name Type Description Default
evaluation_dataset EvaluationDataset

The dataset to iterate over test cases from.

required

Returns:

Type Description
Iterable[TestCase]

The test cases.

list

list(evaluation_dataset)

List all test cases in a dataset.

Parameters:

Name Type Description Default
evaluation_dataset EvaluationDataset

The dataset to list test cases from.

required

Returns:

Type Description
List[TestCase]

The test cases.

update

update(test_case_id, evaluation_dataset, schema_type=None, test_case_data=None, test_case_metadata=None, chat_history=None)

Update an existing test case.

Parameters:

Name Type Description Default
test_case_id str

The ID of the test case.

required
evaluation_dataset EvaluationDataset

The dataset to update the test case in.

required
schema_type Optional[TestCaseSchemaType]

The schema type of the test case.

None
test_case_data Optional[Union[GenerationTestCaseData]]

The test case data.

None

Returns:

Type Description
TestCase

The updated test case.

EvaluationCollection

EvaluationCollection(api_client)

create

create(name, description, application_spec, evaluation_config, tags=None, account_id=None)

Create a new Evaluation.

Parameters:

Name Type Description Default
name str

The name of the Evaluation.

required
description str

The description of the Evaluation.

required
application_spec ApplicationSpec

The Application Spec to associate the Evaluation with.

required
evaluation_config EvaluationConfig

The configuration for the Evaluation.

required
tags Optional[Dict[str, Any]]

Optional key-value pairs to associate with the Evaluation.

None
account_id Optional[str]

The ID of the account to create this Evaluation for.

None

Returns:

Type Description
Evaluation

The newly created Evaluation.

delete

delete(id)

Delete an Evaluation.

Parameters:

Name Type Description Default
id str

The ID of the Evaluation.

required

get

get(id)

Get an Evaluation by ID.

Parameters:

Name Type Description Default
id str

The ID of the Evaluation.

required

Returns:

Type Description
Evaluation

The Evaluation.

list

list()

List Evaluations.

Returns:

Type Description
List[Evaluation]

A list of Evaluations.

test_case_results

test_case_results()

Returns a TestCaseResultCollection for test case results associated with this evaluation.

update

update(id, name=None, description=None, evaluation_config=None, tags=None)

Update an Evaluation.

Parameters:

Name Type Description Default
id str

The ID of the Evaluation.

required
name Optional[str]

The name of the Evaluation.

None
description Optional[str]

The description of the Evaluation.

None
evaluation_config Optional[EvaluationConfig]

The configuration for the Evaluation.

None
tags Optional[Dict[str, Any]]

Optional key-value pairs to associate with the Evaluation.

None

Returns:

Type Description
Evaluation

The updated Evaluation.