
Launch Client

LaunchClient

LaunchClient(api_key: str, endpoint: Optional[str] = None, self_hosted: bool = False, use_path_with_custom_endpoint: bool = False)

Scale Launch Python Client.

Initializes a Scale Launch Client.

Parameters:

Name Type Description Default
api_key str

Your Scale API key

required
endpoint Optional[str]

The Scale Launch Endpoint (this should not need to be changed)

None
self_hosted bool

True iff you are connecting to a self-hosted Scale Launch

False
use_path_with_custom_endpoint bool

True iff you are not using the default Scale Launch endpoint but your endpoint has path routing (to SCALE_LAUNCH_VX_PATH) set up

False
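
A minimal instantiation sketch, assuming the client package is importable as launch; the API key and endpoint values are placeholders. The client variable is reused in the sketches below.

from launch import LaunchClient

client = LaunchClient(api_key="YOUR_SCALE_API_KEY")

# Self-hosted installations point at their own endpoint instead, e.g.:
# client = LaunchClient(api_key="...", endpoint="https://launch.example.internal", self_hosted=True)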

batch_async_request

batch_async_request(*, model_bundle: Union[ModelBundle, str], urls: Optional[List[str]] = None, inputs: Optional[List[Dict[str, Any]]] = None, batch_url_file_location: Optional[str] = None, serialization_format: str = 'JSON', labels: Optional[Dict[str, str]] = None, cpus: Optional[int] = None, memory: Optional[str] = None, gpus: Optional[int] = None, gpu_type: Optional[str] = None, storage: Optional[str] = None, max_workers: Optional[int] = None, per_worker: Optional[int] = None, timeout_seconds: Optional[float] = None) -> Dict[str, Any]

Sends a batch inference request using a given bundle. Returns a key that can be used to retrieve the results of inference at a later time.

Must have exactly one of urls or inputs passed in.

Parameters:

Name Type Description Default
model_bundle Union[ModelBundle, str]

The bundle or the name of the bundle to use for inference.

required
urls Optional[List[str]]

A list of URLs, each pointing to a file containing model input. These must be accessible by Scale Launch, so the URLs need to be either public or signed URLs.

None
inputs Optional[List[Dict[str, Any]]]

A list of model inputs. If provided, we will upload the inputs and pass them in to Launch.

None
batch_url_file_location Optional[str]

In self-hosted mode, the input to the batch job will be uploaded to this location if provided. Otherwise, one will be determined from bundle_location_fn()

None
serialization_format str

Serialization format of the output, either 'PICKLE' or 'JSON'. 'PICKLE' corresponds to pickling results before returning them.

'JSON'
labels Optional[Dict[str, str]]

An optional dictionary of key/value pairs to associate with this endpoint.

None
cpus Optional[int]

Number of cpus each worker should get, e.g. 1, 2, etc. This must be greater than or equal to 1.

None
memory Optional[str]

Amount of memory each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of memory.

None
storage Optional[str]

Amount of local ephemeral storage each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of storage.

None
gpus Optional[int]

Number of gpus each worker should get, e.g. 0, 1, etc.

None
max_workers Optional[int]

The maximum number of workers. Must be greater than or equal to 0, and greater than or equal to min_workers.

None
per_worker Optional[int]

The maximum number of concurrent requests that an individual worker can service. Launch automatically scales the number of workers for the endpoint so that each worker is processing per_worker requests:

  • If the average number of concurrent requests per worker is lower than per_worker, then the number of workers will be reduced.
  • Otherwise, if the average number of concurrent requests per worker is higher than per_worker, then the number of workers will be increased to meet the elevated traffic.
None
gpu_type Optional[str]

If specifying a non-zero number of gpus, this controls the type of gpu requested. Here are the supported values:

  • nvidia-tesla-t4
  • nvidia-ampere-a10
  • nvidia-hopper-h100
  • nvidia-hopper-h100-1g20g
  • nvidia-hopper-h100-3g40g
None
timeout_seconds Optional[float]

The maximum amount of time (in seconds) that the batch job can take. If not specified, the server defaults to 12 hours. This includes the time required to build the endpoint and the total time required for all the individual tasks.

None

Returns:

Type Description
Dict[str, Any]

A dictionary containing the key job_id, whose value is the ID of the batch job.
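
A usage sketch, assuming a client as above and an existing bundle named "my-bundle"; the inputs, labels, and resource values are placeholders.

# Submit a batch job over in-memory inputs and keep the returned job ID.
response = client.batch_async_request(
    model_bundle="my-bundle",
    inputs=[{"x": 1}, {"x": 2}],
    labels={"team": "demo", "product": "demo"},
    cpus=1,
    memory="2Gi",
    max_workers=2,
    per_worker=1,
)
job_id = response["job_id"]  # use this later with get_batch_async_response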

cancel_fine_tune

cancel_fine_tune(fine_tune_id: str) -> CancelFineTuneResponse

Cancel a fine-tune

Parameters:

Name Type Description Default
fine_tune_id str

ID of the fine-tune

required

Returns:

Name Type Description
CancelFineTuneResponse CancelFineTuneResponse

whether the cancellation was successful

clone_model_bundle_with_changes

clone_model_bundle_with_changes(model_bundle: Union[ModelBundle, str], app_config: Optional[Dict] = None) -> ModelBundle
Warning

This method is deprecated. Use clone_model_bundle_with_changes_v2 instead.

Parameters:

Name Type Description Default
model_bundle Union[ModelBundle, str]

The existing bundle or its ID.

required
app_config Optional[Dict]

The new bundle's app_config. If not passed in, the new bundle's app_config will be set to None.

None

Returns:

Type Description
ModelBundle

A ModelBundle object

clone_model_bundle_with_changes_v2

clone_model_bundle_with_changes_v2(original_model_bundle_id: str, new_app_config: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Clone a model bundle with an optional new app_config.

Parameters:

Name Type Description Default
original_model_bundle_id str

The ID of the model bundle you want to clone.

required
new_app_config Optional[Dict[str, Any]]

A dictionary of new app config values to use for the cloned model.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the cloned model bundle.

completions_stream

completions_stream(endpoint_name: str, prompt: str, max_new_tokens: int, temperature: float, stop_sequences: Optional[List[str]] = None, return_token_log_probs: Optional[bool] = False, timeout: float = DEFAULT_LLM_COMPLETIONS_TIMEOUT) -> Iterable[CompletionStreamV1Response]

Run prompt completion on an LLM endpoint in streaming fashion. Will fail if the endpoint does not support streaming.

Parameters:

Name Type Description Default
endpoint_name str

The name of the LLM endpoint to make the request to

required
prompt str

The prompt to send to the endpoint

required
max_new_tokens int

The maximum number of tokens to generate for each prompt

required
temperature float

The temperature to use for sampling

required
stop_sequences Optional[List[str]]

List of sequences to stop the completion at

None
return_token_log_probs Optional[bool]

Whether to return the log probabilities of the tokens

False

Returns:

Type Description
Iterable[CompletionStreamV1Response]

Iterable responses for prompt completion
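
A consumption sketch, assuming an LLM endpoint named "my-llm" that supports streaming; the response field names used here (output.text) are an assumption and may differ by client version.

for chunk in client.completions_stream(
    endpoint_name="my-llm",
    prompt="Hello, my name is",
    max_new_tokens=64,
    temperature=0.7,
):
    # Each chunk is a CompletionStreamV1Response; field names assumed here.
    if chunk.output is not None:
        print(chunk.output.text, end="", flush=True)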

completions_sync

completions_sync(endpoint_name: str, prompt: str, max_new_tokens: int, temperature: float, stop_sequences: Optional[List[str]] = None, return_token_log_probs: Optional[bool] = False, timeout: float = DEFAULT_LLM_COMPLETIONS_TIMEOUT) -> CompletionSyncV1Response

Run prompt completion on a sync LLM endpoint. Will fail if the endpoint is not sync.

Parameters:

Name Type Description Default
endpoint_name str

The name of the LLM endpoint to make the request to

required
prompt str

The completion prompt to send to the endpoint

required
max_new_tokens int

The maximum number of tokens to generate for each prompt

required
temperature float

The temperature to use for sampling

required
stop_sequences Optional[List[str]]

List of sequences to stop the completion at

None
return_token_log_probs Optional[bool]

Whether to return the log probabilities of the tokens

False

Returns:

Type Description
CompletionSyncV1Response

Response for prompt completion
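
The synchronous variant, under the same assumptions (the response field names are assumed):

response = client.completions_sync(
    endpoint_name="my-llm",
    prompt="Write a haiku about autumn.",
    max_new_tokens=64,
    temperature=0.2,
)
print(response.output.text)  # field names assumed; inspect the response object in your version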

create_docker_image_batch_job

create_docker_image_batch_job(*, labels: Dict[str, str], docker_image_batch_job_bundle: Optional[Union[str, DockerImageBatchJobBundleResponse]] = None, docker_image_batch_job_bundle_name: Optional[str] = None, job_config: Optional[Dict[str, Any]] = None, cpus: Optional[int] = None, memory: Optional[str] = None, gpus: Optional[int] = None, gpu_type: Optional[str] = None, storage: Optional[str] = None)

For self hosted mode only.

Parameters:

  • docker_image_batch_job_bundle: Specifies the docker image bundle to use for the batch job. Either the string ID of a docker image bundle, or a DockerImageBatchJobBundleResponse object. Only one of docker_image_batch_job_bundle and docker_image_batch_job_bundle_name can be specified.
  • docker_image_batch_job_bundle_name: The name of a batch job bundle. If specified, Launch will use the most recent bundle with that name owned by the current user. Only one of docker_image_batch_job_bundle and docker_image_batch_job_bundle_name can be specified.
  • labels: Kubernetes labels that are present on the batch job.
  • job_config: A JSON-serializable Python object that will get passed to the batch job, specifically as the contents of a file mounted at mount_location inside the bundle. You can call Python's json.load() on the file to retrieve the contents.
  • cpus: Optional override for the number of cpus to give to your job. Either the default must be specified in the bundle, or this must be specified.
  • memory: Optional override for the amount of memory to give to your job. Either the default must be specified in the bundle, or this must be specified.
  • gpus: Optional number of gpus to give to the bundle. If not specified in the bundle or here, this will be interpreted as 0 gpus.
  • gpu_type: Optional type of gpu. If the final number of gpus is positive, this must be specified either in the bundle or here.
  • storage: Optional reserved amount of disk to give to your batch job. If not specified, your job may be evicted if it is using too much disk.
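
A sketch of launching a job against an existing bundle; the bundle name, labels, and job_config values are placeholders.

job = client.create_docker_image_batch_job(
    labels={"team": "demo", "product": "demo"},
    docker_image_batch_job_bundle_name="my-batch-bundle",
    job_config={"input_path": "s3://my-bucket/inputs.jsonl", "mode": "full"},
    cpus=2,
    memory="8Gi",
)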

create_docker_image_batch_job_bundle

create_docker_image_batch_job_bundle(*, name: str, image_repository: str, image_tag: str, command: List[str], env: Optional[Dict[str, str]] = None, mount_location: Optional[str] = None, cpus: Optional[int] = None, memory: Optional[str] = None, gpus: Optional[int] = None, gpu_type: Optional[str] = None, storage: Optional[str] = None) -> CreateDockerImageBatchJobBundleResponse

For self hosted mode only.

Creates a Docker Image Batch Job Bundle.

Parameters:

Name Type Description Default
name str

A user-defined name for the bundle. Does not need to be unique.

required
image_repository str

The (short) repository of your image. For example, if your image is located at 123456789012.dkr.ecr.us-west-2.amazonaws.com/repo:tag, and your version of Launch is configured to look at 123456789012.dkr.ecr.us-west-2.amazonaws.com for Docker Images, you would pass the value repo for the image_repository parameter.

required
image_tag str

The tag of your image inside of the repo. In the example above, you would pass the value tag for the image_tag parameter.

required
command List[str]

The command to run inside the docker image.

required
env Optional[Dict[str, str]]

A dictionary of environment variables to inject into your docker image.

None
mount_location Optional[str]

A location in the filesystem where you would like a JSON-formatted file, controllable at runtime, to be mounted. This allows behavior to be specified at runtime. (Specifically, the contents of this file can be read via json.load() inside of the user-defined code.)

None
cpus Optional[int]

Optional default value for the number of cpus to give the job.

None
memory Optional[str]

Optional default value for the amount of memory to give the job.

None
gpus Optional[int]

Optional default value for the number of gpus to give the job.

None
gpu_type Optional[str]

Optional default value for the type of gpu to give the job.

None
storage Optional[str]

Optional default value for the amount of disk to give the job.

None
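
A sketch of creating a bundle whose container reads its job_config from mount_location; the repository, tag, command, and paths are illustrative.

bundle = client.create_docker_image_batch_job_bundle(
    name="my-batch-bundle",
    image_repository="repo",
    image_tag="tag",
    command=["python", "run_job.py"],
    mount_location="/etc/job_config.json",
    cpus=2,
    memory="8Gi",
)

# Inside the container, run_job.py (a hypothetical script) can read the per-job config:
#
#     import json
#     with open("/etc/job_config.json") as f:
#         job_config = json.load(f)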

create_fine_tune

create_fine_tune(model: str, training_file: str, validation_file: Optional[str] = None, fine_tuning_method: Optional[str] = None, hyperparameters: Optional[Dict[str, str]] = None, wandb_config: Optional[Dict[str, Any]] = None, suffix: str = None) -> CreateFineTuneResponse

Create a fine-tune

Parameters:

Name Type Description Default
model str

Identifier of base model to train from.

required
training_file str

Path to file of training dataset. Dataset must be a csv with columns 'prompt' and 'response'.

required
validation_file Optional[str]

Path to file of validation dataset. Has the same format as training_file. If not provided, we will generate a split from the training dataset.

None
fine_tuning_method Optional[str]

Fine-tuning method. Currently unused, but when different techniques are implemented we will expose this field.

None
hyperparameters Optional[Dict[str, str]]

Hyperparameters to pass in to training job.

None
wandb_config Optional[Dict[str, Any]]

Configuration for Weights and Biases. To enable, set hyperparameters["report_to"] to wandb. A Weights and Biases API key must be provided via the api_key field.

None
suffix str

Optional user-provided identifier suffix for the fine-tuned model.

None

Returns:

Name Type Description
CreateFineTuneResponse CreateFineTuneResponse

ID of the created fine-tune
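
A sketch of kicking off a fine-tune; the base model identifier, file reference, and hyperparameters are placeholders, and the response field name is assumed.

fine_tune = client.create_fine_tune(
    model="llama-2-7b",            # placeholder base model identifier
    training_file="file-abc123",   # CSV with 'prompt' and 'response' columns
    hyperparameters={"epochs": "1"},
    suffix="my-experiment",
)
fine_tune_id = fine_tune.id  # field name assumed; use with get_fine_tune / cancel_fine_tune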

create_llm_model_endpoint

create_llm_model_endpoint(endpoint_name: str, model_name: str, inference_framework_image_tag: str, source: LLMSource = LLMSource.HUGGING_FACE, inference_framework: LLMInferenceFramework = LLMInferenceFramework.DEEPSPEED, num_shards: int = 4, quantize: Optional[Quantization] = None, checkpoint_path: Optional[str] = None, cpus: int = 32, memory: str = '192Gi', storage: Optional[str] = None, gpus: int = 4, min_workers: int = 0, max_workers: int = 1, per_worker: int = 10, gpu_type: Optional[str] = 'nvidia-ampere-a10', endpoint_type: str = 'sync', high_priority: Optional[bool] = False, post_inference_hooks: Optional[List[PostInferenceHooks]] = None, default_callback_url: Optional[str] = None, default_callback_auth_kind: Optional[Literal['basic', 'mtls']] = None, default_callback_auth_username: Optional[str] = None, default_callback_auth_password: Optional[str] = None, default_callback_auth_cert: Optional[str] = None, default_callback_auth_key: Optional[str] = None, public_inference: Optional[bool] = None, update_if_exists: bool = False, labels: Optional[Dict[str, str]] = None)

Creates and registers a model endpoint in Scale Launch. The returned object is an instance of type Endpoint, which is a base class of either SyncEndpoint or AsyncEndpoint. This is the object to which you send inference requests.

Parameters:

Name Type Description Default
endpoint_name str

The name of the model endpoint you want to create. The name must be unique across all endpoints that you own.

required
model_name str

name for the LLM. List can be found at (TODO: add list of supported models)

required
inference_framework_image_tag str

image tag for the inference framework. (TODO: use latest image tag when unspecified)

required
source LLMSource

source of the LLM. Currently only HuggingFace is supported.

HUGGING_FACE
inference_framework LLMInferenceFramework

inference framework for the LLM. Currently only DeepSpeed is supported.

DEEPSPEED
num_shards int

number of shards for the LLM. When bigger than 1, LLM will be sharded to multiple GPUs. Number of GPUs must be larger than num_shards.

4
quantize Optional[Quantization]

Quantization method for the LLM. Only affects behavior for text-generation-inference models.

None
checkpoint_path Optional[str]

Path to the checkpoint to load the model from. Only affects behavior for text-generation-inference models.

None
cpus int

Number of cpus each worker should get, e.g. 1, 2, etc. This must be greater than or equal to 1.

32
memory str

Amount of memory each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of memory.

'192Gi'
storage Optional[str]

Amount of local ephemeral storage each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of storage.

None
gpus int

Number of gpus each worker should get, e.g. 0, 1, etc.

4
min_workers int

The minimum number of workers. Must be greater than or equal to 0. This should be determined by computing the minimum throughput of your workload and dividing it by the throughput of a single worker. This field must be at least 1 for synchronous endpoints.

0
max_workers int

The maximum number of workers. Must be greater than or equal to 0, and greater than or equal to min_workers. This should be determined by computing the maximum throughput of your workload and dividing it by the throughput of a single worker.

1
per_worker int

The maximum number of concurrent requests that an individual worker can service. Launch automatically scales the number of workers for the endpoint so that each worker is processing per_worker requests, subject to the limits defined by min_workers and max_workers.

  • If the average number of concurrent requests per worker is lower than per_worker, then the number of workers will be reduced.
  • Otherwise, if the average number of concurrent requests per worker is higher than per_worker, then the number of workers will be increased to meet the elevated traffic.

Here is our recommendation for computing per_worker:

  1. Compute min_workers and max_workers per your minimum and maximum throughput requirements.
  2. Determine a value for the maximum number of concurrent requests in the workload. Divide this number by max_workers. Doing this ensures that the number of workers will "climb" to max_workers.
10
gpu_type Optional[str]

If specifying a non-zero number of gpus, this controls the type of gpu requested. Here are the supported values:

  • nvidia-tesla-t4
  • nvidia-ampere-a10
  • nvidia-hopper-h100
  • nvidia-hopper-h100-1g20g
  • nvidia-hopper-h100-3g40g
'nvidia-ampere-a10'
endpoint_type str

Either "sync" or "async".

'sync'
high_priority Optional[bool]

Either True or False. Enabling this will allow the created endpoint to leverage the shared pool of prewarmed nodes for faster spinup time.

False
post_inference_hooks Optional[List[PostInferenceHooks]]

List of hooks to trigger after inference tasks are served.

None
default_callback_url Optional[str]

The default callback url to use for async endpoints. This can be overridden in the task parameters for each individual task. post_inference_hooks must contain "callback" for the callback to be triggered.

None
default_callback_auth_kind Optional[Literal['basic', 'mtls']]

The default callback auth kind to use for async endpoints. Either "basic" or "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_username Optional[str]

The default callback auth username to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_password Optional[str]

The default callback auth password to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_cert Optional[str]

The default callback auth cert to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_key Optional[str]

The default callback auth key to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
public_inference Optional[bool]

If True, this endpoint will be available to all user IDs for inference.

None
update_if_exists bool

If True, will attempt to update the endpoint if it exists. Otherwise, will unconditionally try to create a new endpoint. Note that endpoint names for a given user must be unique, so attempting to call this function with update_if_exists=False for an existing endpoint will raise an error.

False
labels Optional[Dict[str, str]]

An optional dictionary of key/value pairs to associate with this endpoint.

None

Returns:

Type Description

An Endpoint object that can be used to make requests to the endpoint.
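
A sketch of standing up an LLM endpoint; the model name, image tag, and resource values are illustrative, not recommendations.

endpoint = client.create_llm_model_endpoint(
    endpoint_name="my-llm",
    model_name="llama-2-7b",                  # placeholder; must be a supported model
    inference_framework_image_tag="latest",   # placeholder image tag
    num_shards=1,
    cpus=8,
    memory="40Gi",
    storage="40Gi",
    gpus=1,
    gpu_type="nvidia-ampere-a10",
    min_workers=1,
    max_workers=1,
    per_worker=10,
    endpoint_type="sync",
    labels={"team": "demo", "product": "demo"},
)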

create_model_bundle

create_model_bundle(model_bundle_name: str, env_params: Dict[str, str], *, load_predict_fn: Optional[Callable[[LaunchModel_T], Callable[[Any], Any]]] = None, predict_fn_or_cls: Optional[Callable[[Any], Any]] = None, requirements: Optional[List[str]] = None, model: Optional[LaunchModel_T] = None, load_model_fn: Optional[Callable[[], LaunchModel_T]] = None, app_config: Optional[Union[Dict[str, Any], str]] = None, globals_copy: Optional[Dict[str, Any]] = None, request_schema: Optional[Type[BaseModel]] = None, response_schema: Optional[Type[BaseModel]] = None) -> ModelBundle
Warning

This method is deprecated. Use create_model_bundle_from_callable_v2 instead.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create. The name must be unique across all bundles that you own.

required
predict_fn_or_cls Optional[Callable[[Any], Any]]

Function or a Callable class that runs end-to-end (pre/post processing and model inference) on the call. i.e. predict_fn_or_cls(REQUEST) -> RESPONSE.

None
model Optional[LaunchModel_T]

Typically a trained Neural Network, e.g. a Pytorch module.

Exactly one of model and load_model_fn must be provided.

None
load_model_fn Optional[Callable[[], LaunchModel_T]]

A function that, when run, loads a model. This function is essentially a deferred wrapper around the model argument.

Exactly one of model and load_model_fn must be provided.

None
load_predict_fn Optional[Callable[[LaunchModel_T], Callable[[Any], Any]]]

Function that, when called with a model, returns a function that carries out inference.

If model is specified, then this is equivalent to: load_predict_fn(model, app_config=optional_app_config) -> predict_fn

Otherwise, if load_model_fn is specified, then this is equivalent to: load_predict_fn(load_model_fn(), app_config=optional_app_config) -> predict_fn

In both cases, predict_fn is then the inference function, i.e.: predict_fn(REQUEST) -> RESPONSE

None
requirements Optional[List[str]]

A list of python package requirements, where each list element is of the form <package_name>==<package_version>, e.g.

["tensorflow==2.3.0", "tensorflow-hub==0.11.0"]

If you do not pass in a value for requirements, then you must pass in globals() for the globals_copy argument.

None
app_config Optional[Union[Dict[str, Any], str]]

Either a Dictionary that represents a YAML file contents or a local path to a YAML file.

None
env_params Dict[str, str]

A dictionary that dictates environment information e.g. the use of pytorch or tensorflow, which base image tag to use, etc. Specifically, the dictionary should contain the following keys:

  • framework_type: either tensorflow or pytorch.
  • PyTorch fields:
    • pytorch_image_tag: An image tag for the pytorch docker base image. The list of tags can be found from https://hub.docker.com/r/pytorch/pytorch/tags.

Example:

{
    "framework_type": "pytorch",
    "pytorch_image_tag": "1.10.0-cuda11.3-cudnn8-runtime",
}

  • Tensorflow fields:
    • tensorflow_version: Version of tensorflow, e.g. "2.3.0".

required
globals_copy Optional[Dict[str, Any]]

Dictionary of the global symbol table. Normally provided by globals() built-in function.

None
request_schema Optional[Type[BaseModel]]

A pydantic model that represents the request schema for the model bundle. This is used to validate the request body for the model bundle's endpoint.

None
response_schema Optional[Type[BaseModel]]

A pydantic model that represents the response schema for the model bundle. This is used to validate the response for the model bundle's endpoint. Note: If request_schema is specified, then response_schema must also be specified.

None

create_model_bundle_from_callable_v2

create_model_bundle_from_callable_v2(*, model_bundle_name: str, load_predict_fn: Callable[[LaunchModel_T], Callable[[Any], Any]], load_model_fn: Callable[[], LaunchModel_T], request_schema: Type[BaseModel], response_schema: Type[BaseModel], requirements: Optional[List[str]] = None, pytorch_image_tag: Optional[str] = None, tensorflow_version: Optional[str] = None, custom_base_image_repository: Optional[str] = None, custom_base_image_tag: Optional[str] = None, app_config: Optional[Union[Dict[str, Any], str]] = None, metadata: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Uploads and registers a model bundle to Scale Launch.

Parameters:

Name Type Description Default
model_bundle_name str

Name of the model bundle.

required
load_predict_fn Callable[[LaunchModel_T], Callable[[Any], Any]]

Function that takes in a model and returns a predict function. When your model bundle is deployed, this predict function will be called as follows:

input = {"input": "some input"} # or whatever your request schema is.

def load_model_fn():
    # load model
    return model

def load_predict_fn(model, app_config=None):
    def predict_fn(input):
        # do pre-processing
        output = model(input)
        # do post-processing
        return output
    return predict_fn

predict_fn = load_predict_fn(load_model_fn(), app_config=optional_app_config)
response = predict_fn(input)

required
load_model_fn Callable[[], LaunchModel_T]

A function that, when run, loads a model.

required
request_schema Type[BaseModel]

A pydantic model that represents the request schema for the model bundle. This is used to validate the request body for the model bundle's endpoint.

required
response_schema Type[BaseModel]

A pydantic model that represents the response schema for the model bundle. This is used to validate the response for the model bundle's endpoint.

required
requirements Optional[List[str]]

List of pip requirements.

None
pytorch_image_tag Optional[str]

The image tag for the PyTorch image that will be used to run the bundle. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
tensorflow_version Optional[str]

The version of TensorFlow that will be used to run the bundle. If not specified, the default version will be used. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
custom_base_image_repository Optional[str]

The repository for a custom base image that will be used to run the bundle. If not specified, the default base image will be used. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
custom_base_image_tag Optional[str]

The tag for a custom base image that will be used to run the bundle. Must be specified if custom_base_image_repository is specified.

None
app_config Optional[Union[Dict[str, Any], str]]

An optional dictionary of configuration values that will be passed to the bundle when it is run. These values can be accessed by the bundle via the app_config global variable.

None
metadata Optional[Dict[str, Any]]

Metadata to record with the bundle.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the created model bundle.
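
Putting the pieces together, an end-to-end sketch with a toy model and toy pydantic schemas (everything here is illustrative):

from pydantic import BaseModel

class MyRequest(BaseModel):
    x: float

class MyResponse(BaseModel):
    y: float

def load_model_fn():
    # In practice this would load real weights from disk or a remote store.
    return lambda x: 2 * x

def load_predict_fn(model, app_config=None):
    def predict_fn(request):
        return {"y": model(request["x"])}
    return predict_fn

bundle = client.create_model_bundle_from_callable_v2(
    model_bundle_name="my-bundle",
    load_predict_fn=load_predict_fn,
    load_model_fn=load_model_fn,
    request_schema=MyRequest,
    response_schema=MyResponse,
    requirements=[],
    pytorch_image_tag="1.10.0-cuda11.3-cudnn8-runtime",
)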

create_model_bundle_from_dirs

create_model_bundle_from_dirs(*, model_bundle_name: str, base_paths: List[str], requirements_path: str, env_params: Dict[str, str], load_predict_fn_module_path: str, load_model_fn_module_path: str, app_config: Optional[Union[Dict[str, Any], str]] = None, request_schema: Optional[Type[BaseModel]] = None, response_schema: Optional[Type[BaseModel]] = None) -> ModelBundle
Warning

This method is deprecated. Use create_model_bundle_from_dirs_v2 instead.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create. The name must be unique across all bundles that you own.

required
base_paths List[str]

The paths on the local filesystem where the bundle code lives.

required
requirements_path str

A path on the local filesystem where a requirements.txt file lives.

required
env_params Dict[str, str]

A dictionary that dictates environment information e.g. the use of pytorch or tensorflow, which base image tag to use, etc. Specifically, the dictionary should contain the following keys:

  • framework_type: either tensorflow or pytorch.
  • PyTorch fields:
    • pytorch_image_tag: An image tag for the pytorch docker base image. The list of tags can be found from https://hub.docker.com/r/pytorch/pytorch/tags

Example:

{
    "framework_type": "pytorch",
    "pytorch_image_tag": "1.10.0-cuda11.3-cudnn8-runtime",
}

required
load_predict_fn_module_path str

A python module path for a function that, when called with the output of load_model_fn_module_path, returns a function that carries out inference.

required
load_model_fn_module_path str

A python module path for a function that returns a model. The output feeds into the function located at load_predict_fn_module_path.

required
app_config Optional[Union[Dict[str, Any], str]]

Either a Dictionary that represents a YAML file contents or a local path to a YAML file.

None
request_schema Optional[Type[BaseModel]]

A pydantic model that represents the request schema for the model bundle. This is used to validate the request body for the model bundle's endpoint.

None
response_schema Optional[Type[BaseModel]]

A pydantic model that represents the response schema for the model bundle. This is used to validate the response for the model bundle's endpoint. Note: If request_schema is specified, then response_schema must also be specified.

None

create_model_bundle_from_dirs_v2

create_model_bundle_from_dirs_v2(*, model_bundle_name: str, base_paths: List[str], load_predict_fn_module_path: str, load_model_fn_module_path: str, request_schema: Type[BaseModel], response_schema: Type[BaseModel], requirements_path: Optional[str] = None, pytorch_image_tag: Optional[str] = None, tensorflow_version: Optional[str] = None, custom_base_image_repository: Optional[str] = None, custom_base_image_tag: Optional[str] = None, app_config: Optional[Dict[str, Any]] = None, metadata: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Packages up code from one or more local filesystem folders and uploads them as a bundle to Scale Launch. In this mode, a bundle is just local code instead of a serialized object.

For example, if you have a directory structure like so, and your current working directory is my_root:

   my_root/
       my_module1/
           __init__.py
           ...files and directories
           my_inference_file.py
       my_module2/
           __init__.py
           ...files and directories

then calling create_model_bundle_from_dirs_v2 with base_paths=["my_module1", "my_module2"] essentially creates a zip file without the root directory, e.g.:

   my_module1/
       __init__.py
       ...files and directories
       my_inference_file.py
   my_module2/
       __init__.py
       ...files and directories

and these contents will be unzipped relative to the server side application root. Bear these points in mind when referencing Python module paths for this bundle. For instance, if my_inference_file.py has def f(...) as the desired inference loading function, then the load_predict_fn_module_path argument should be my_module1.my_inference_file.f.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create.

required
base_paths List[str]

A list of paths to directories that will be zipped up and uploaded as a bundle. Each path must be relative to the current working directory.

required
load_predict_fn_module_path str

The Python module path for a function that, when called with the output of the function at load_model_fn_module_path, returns a function that carries out inference.

required
load_model_fn_module_path str

The Python module path for a function that, when run, loads and returns a model. The output feeds into the function located at load_predict_fn_module_path.

required
request_schema Type[BaseModel]

A Pydantic model that defines the request schema for the bundle.

required
response_schema Type[BaseModel]

A Pydantic model that defines the response schema for the bundle.

required
requirements_path Optional[str]

Path to a requirements.txt file that will be used to install dependencies for the bundle. This file must be relative to the current working directory.

None
pytorch_image_tag Optional[str]

The image tag for the PyTorch image that will be used to run the bundle. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
tensorflow_version Optional[str]

The version of TensorFlow that will be used to run the bundle. If not specified, the default version will be used. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
custom_base_image_repository Optional[str]

The repository for a custom base image that will be used to run the bundle. If not specified, the default base image will be used. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
custom_base_image_tag Optional[str]

The tag for a custom base image that will be used to run the bundle. Must be specified if custom_base_image_repository is specified.

None
app_config Optional[Dict[str, Any]]

An optional dictionary of configuration values that will be passed to the bundle when it is run. These values can be accessed by the bundle via the app_config global variable.

None
metadata Optional[Dict[str, Any]]

Metadata to record with the bundle.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the created model bundle.
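
Following the directory layout above, a call sketch; the load_model helper name is hypothetical, and the schemas are reused from the callable_v2 example.

bundle = client.create_model_bundle_from_dirs_v2(
    model_bundle_name="my-dir-bundle",
    base_paths=["my_module1", "my_module2"],
    load_predict_fn_module_path="my_module1.my_inference_file.f",
    load_model_fn_module_path="my_module1.my_inference_file.load_model",  # hypothetical helper
    request_schema=MyRequest,
    response_schema=MyResponse,
    requirements_path="requirements.txt",
    pytorch_image_tag="1.10.0-cuda11.3-cudnn8-runtime",
)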

create_model_bundle_from_runnable_image_v2

create_model_bundle_from_runnable_image_v2(*, model_bundle_name: str, request_schema: Type[BaseModel], response_schema: Type[BaseModel], repository: str, tag: str, command: List[str], healthcheck_route: Optional[str] = None, predict_route: Optional[str] = None, env: Dict[str, str], readiness_initial_delay_seconds: int, metadata: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Create a model bundle from a runnable image. The specified command must start a process that will listen for requests on port 5005 using HTTP.

Inference requests must be served at the POST /predict route while the GET /readyz route is a healthcheck.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create.

required
request_schema Type[BaseModel]

A Pydantic model that defines the request schema for the bundle.

required
response_schema Type[BaseModel]

A Pydantic model that defines the response schema for the bundle.

required
repository str

The name of the Docker repository for the runnable image.

required
tag str

The tag for the runnable image.

required
command List[str]

The command that will be used to start the process that listens for requests.

required
predict_route Optional[str]

The endpoint route on the runnable image that will be called.

None
healthcheck_route Optional[str]

The healthcheck endpoint route on the runnable image.

None
env Dict[str, str]

A dictionary of environment variables that will be passed to the bundle when it is run.

required
readiness_initial_delay_seconds int

The number of seconds to wait for the HTTP server to become ready and successfully respond on its healthcheck.

required
metadata Optional[Dict[str, Any]]

Metadata to record with the bundle.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the created model bundle.
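
A sketch, assuming an image whose command starts an HTTP server on port 5005 serving POST /predict and GET /readyz; the repository, tag, and command are placeholders.

bundle = client.create_model_bundle_from_runnable_image_v2(
    model_bundle_name="my-runnable-bundle",
    request_schema=MyRequest,
    response_schema=MyResponse,
    repository="my-model-server",                           # placeholder repo in your registry
    tag="v1",
    command=["python", "-m", "server", "--port", "5005"],   # hypothetical entrypoint
    env={"LOG_LEVEL": "info"},
    readiness_initial_delay_seconds=30,
)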

create_model_bundle_from_streaming_enhanced_runnable_image_v2

create_model_bundle_from_streaming_enhanced_runnable_image_v2(*, model_bundle_name: str, request_schema: Type[BaseModel], response_schema: Type[BaseModel], repository: str, tag: str, command: Optional[List[str]] = None, healthcheck_route: Optional[str] = None, predict_route: Optional[str] = None, streaming_command: List[str], streaming_predict_route: Optional[str] = None, env: Dict[str, str], readiness_initial_delay_seconds: int, metadata: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Create a model bundle from a runnable image. The specified command must start a process that will listen for requests on port 5005 using HTTP.

Inference requests must be served at the POST /predict route while the GET /readyz route is a healthcheck.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create.

required
request_schema Type[BaseModel]

A Pydantic model that defines the request schema for the bundle.

required
response_schema Type[BaseModel]

A Pydantic model that defines the response schema for the bundle.

required
repository str

The name of the Docker repository for the runnable image.

required
tag str

The tag for the runnable image.

required
command Optional[List[str]]

The command that will be used to start the process that listens for requests if this bundle is used as a SYNC or ASYNC endpoint.

None
healthcheck_route Optional[str]

The healthcheck endpoint route on the runnable image.

None
predict_route Optional[str]

The endpoint route on the runnable image that will be called if this bundle is used as a SYNC or ASYNC endpoint.

None
streaming_command List[str]

The command that will be used to start the process that listens for requests if this bundle is used as a STREAMING endpoint.

required
streaming_predict_route Optional[str]

The endpoint route on the runnable image that will be called if this bundle is used as a STREAMING endpoint.

None
env Dict[str, str]

A dictionary of environment variables that will be passed to the bundle when it is run.

required
readiness_initial_delay_seconds int

The number of seconds to wait for the HTTP server to become ready and successfully respond on its healthcheck.

required
metadata Optional[Dict[str, Any]]

Metadata to record with the bundle.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the created model bundle.

create_model_bundle_from_triton_enhanced_runnable_image_v2

create_model_bundle_from_triton_enhanced_runnable_image_v2(*, model_bundle_name: str, request_schema: Type[BaseModel], response_schema: Type[BaseModel], repository: str, tag: str, command: List[str], healthcheck_route: Optional[str] = None, predict_route: Optional[str] = None, env: Dict[str, str], readiness_initial_delay_seconds: int, triton_model_repository: str, triton_model_replicas: Optional[Dict[str, str]] = None, triton_num_cpu: float, triton_commit_tag: str, triton_storage: Optional[str] = None, triton_memory: Optional[str] = None, triton_readiness_initial_delay_seconds: int, metadata: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Create a model bundle from a runnable image and a tritonserver image.

Same requirements as create_model_bundle_from_runnable_image_v2, with additional constraints necessary for configuring tritonserver's execution.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create.

required
request_schema Type[BaseModel]

A Pydantic model that defines the request schema for the bundle.

required
response_schema Type[BaseModel]

A Pydantic model that defines the response schema for the bundle.

required
repository str

The name of the Docker repository for the runnable image.

required
tag str

The tag for the runnable image.

required
command List[str]

The command that will be used to start the process that listens for requests.

required
predict_route Optional[str]

The endpoint route on the runnable image that will be called.

None
healthcheck_route Optional[str]

The healthcheck endpoint route on the runnable image.

None
env Dict[str, str]

A dictionary of environment variables that will be passed to the bundle when it is run.

required
readiness_initial_delay_seconds int

The number of seconds to wait for the HTTP server to become ready and successfully respond on its healthcheck.

required
triton_model_repository str

The S3 prefix that contains the contents of the model repository, formatted according to https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md

required
triton_model_replicas Optional[Dict[str, str]]

If supplied, the name and number of replicas to make for each model.

None
triton_num_cpu float

Number of CPUs, fractional, to allocate to tritonserver.

required
triton_commit_tag str

The image tag of the specific tritonserver version.

required
triton_storage Optional[str]

Amount of storage space to allocate for the tritonserver container.

None
triton_memory Optional[str]

Amount of memory to allocate for the tritonserver container.

None
triton_readiness_initial_delay_seconds int

Like readiness_initial_delay_seconds, but for tritonserver's own healthcheck.

required
metadata Optional[Dict[str, Any]]

Metadata to record with the bundle.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the created model bundle.

create_model_endpoint

create_model_endpoint(*, endpoint_name: str, model_bundle: Union[ModelBundle, str], cpus: int = 3, memory: str = '8Gi', storage: str = '16Gi', gpus: int = 0, min_workers: int = 1, max_workers: int = 1, per_worker: int = 10, gpu_type: Optional[str] = None, endpoint_type: str = 'sync', high_priority: Optional[bool] = False, post_inference_hooks: Optional[List[PostInferenceHooks]] = None, default_callback_url: Optional[str] = None, default_callback_auth_kind: Optional[Literal['basic', 'mtls']] = None, default_callback_auth_username: Optional[str] = None, default_callback_auth_password: Optional[str] = None, default_callback_auth_cert: Optional[str] = None, default_callback_auth_key: Optional[str] = None, public_inference: Optional[bool] = None, update_if_exists: bool = False, labels: Optional[Dict[str, str]] = None) -> Optional[Endpoint]

Creates and registers a model endpoint in Scale Launch. The returned object is an instance of type Endpoint, which is a base class of either SyncEndpoint or AsyncEndpoint. This is the object to which you send inference requests.

Parameters:

Name Type Description Default
endpoint_name str

The name of the model endpoint you want to create. The name must be unique across all endpoints that you own.

required
model_bundle Union[ModelBundle, str]

The ModelBundle that the endpoint should serve.

required
cpus int

Number of cpus each worker should get, e.g. 1, 2, etc. This must be greater than or equal to 1.

3
memory str

Amount of memory each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of memory.

'8Gi'
storage str

Amount of local ephemeral storage each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of storage.

'16Gi'
gpus int

Number of gpus each worker should get, e.g. 0, 1, etc.

0
min_workers int

The minimum number of workers. Must be greater than or equal to 0. This should be determined by computing the minimum throughput of your workload and dividing it by the throughput of a single worker. This field must be at least 1 for synchronous endpoints.

1
max_workers int

The maximum number of workers. Must be greater than or equal to 0, and greater than or equal to min_workers. This should be determined by computing the maximum throughput of your workload and dividing it by the throughput of a single worker.

1
per_worker int

The maximum number of concurrent requests that an individual worker can service. Launch automatically scales the number of workers for the endpoint so that each worker is processing per_worker requests, subject to the limits defined by min_workers and max_workers.

  • If the average number of concurrent requests per worker is lower than per_worker, then the number of workers will be reduced.
  • Otherwise, if the average number of concurrent requests per worker is higher than per_worker, then the number of workers will be increased to meet the elevated traffic.

Here is our recommendation for computing per_worker:

  1. Compute min_workers and max_workers per your minimum and maximum throughput requirements.
  2. Determine a value for the maximum number of concurrent requests in the workload. Divide this number by max_workers. Doing this ensures that the number of workers will "climb" to max_workers.
10
gpu_type Optional[str]

If specifying a non-zero number of gpus, this controls the type of gpu requested. Here are the supported values:

  • nvidia-tesla-t4
  • nvidia-ampere-a10
  • nvidia-hopper-h100
  • nvidia-hopper-h100-1g20g
  • nvidia-hopper-h100-3g40g
None
endpoint_type str

Either "sync", "async", or "streaming".

'sync'
high_priority Optional[bool]

Either True or False. Enabling this will allow the created endpoint to leverage the shared pool of prewarmed nodes for faster spinup time.

False
post_inference_hooks Optional[List[PostInferenceHooks]]

List of hooks to trigger after inference tasks are served.

None
default_callback_url Optional[str]

The default callback url to use for async endpoints. This can be overridden in the task parameters for each individual task. post_inference_hooks must contain "callback" for the callback to be triggered.

None
default_callback_auth_kind Optional[Literal['basic', 'mtls']]

The default callback auth kind to use for async endpoints. Either "basic" or "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_username Optional[str]

The default callback auth username to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_password Optional[str]

The default callback auth password to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_cert Optional[str]

The default callback auth cert to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_key Optional[str]

The default callback auth key to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
public_inference Optional[bool]

If True, this endpoint will be available to all user IDs for inference.

None
update_if_exists bool

If True, will attempt to update the endpoint if it exists. Otherwise, will unconditionally try to create a new endpoint. Note that endpoint names for a given user must be unique, so attempting to call this function with update_if_exists=False for an existing endpoint will raise an error.

False
labels Optional[Dict[str, str]]

An optional dictionary of key/value pairs to associate with this endpoint.

None

Returns:

Type Description
Optional[Endpoint]

An Endpoint object that can be used to make requests to the endpoint.
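
A deployment sketch, assuming the bundle "my-bundle" created earlier; resource values are illustrative.

endpoint = client.create_model_endpoint(
    endpoint_name="my-endpoint",
    model_bundle="my-bundle",
    cpus=3,
    memory="8Gi",
    storage="16Gi",
    gpus=0,
    min_workers=1,
    max_workers=3,
    per_worker=10,
    endpoint_type="async",
    labels={"team": "demo", "product": "demo"},
)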

delete_file

delete_file(file_id: str) -> DeleteFileResponse

Delete a file

Parameters:

Name Type Description Default
file_id str

ID of the file

required

Returns:

Name Type Description
DeleteFileResponse DeleteFileResponse

whether the deletion was successful

delete_llm_model_endpoint

delete_llm_model_endpoint(model_endpoint_name: str) -> bool

Deletes an LLM model endpoint.

Parameters:

Name Type Description Default
model_endpoint_name str

The name of the model endpoint to delete.

required

delete_model_endpoint

delete_model_endpoint(model_endpoint_name: str)

Deletes a model endpoint.

Parameters:

Name Type Description Default
model_endpoint_name str

The name of the model endpoint to delete.

required

edit_model_endpoint

edit_model_endpoint(*, model_endpoint: Union[ModelEndpoint, str], model_bundle: Optional[Union[ModelBundle, str]] = None, cpus: Optional[float] = None, memory: Optional[str] = None, storage: Optional[str] = None, gpus: Optional[int] = None, min_workers: Optional[int] = None, max_workers: Optional[int] = None, per_worker: Optional[int] = None, gpu_type: Optional[str] = None, high_priority: Optional[bool] = None, post_inference_hooks: Optional[List[PostInferenceHooks]] = None, default_callback_url: Optional[str] = None, default_callback_auth_kind: Optional[Literal['basic', 'mtls']] = None, default_callback_auth_username: Optional[str] = None, default_callback_auth_password: Optional[str] = None, default_callback_auth_cert: Optional[str] = None, default_callback_auth_key: Optional[str] = None, public_inference: Optional[bool] = None) -> None

Edits an existing model endpoint. Here are the fields that cannot be edited on an existing endpoint:

  • The endpoint's name.
  • The endpoint's type (i.e. you cannot go from a SyncEndpoint to an AsyncEndpoint or vice versa).

Parameters:

Name Type Description Default
model_endpoint Union[ModelEndpoint, str]

The model endpoint (or its name) you want to edit. The name must be unique across all endpoints that you own.

required
model_bundle Optional[Union[ModelBundle, str]]

The ModelBundle that the endpoint should serve.

None
cpus Optional[float]

Number of cpus each worker should get, e.g. 1, 2, etc. This must be greater than or equal to 1.

None
memory Optional[str]

Amount of memory each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of memory.

None
storage Optional[str]

Amount of local ephemeral storage each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of storage.

None
gpus Optional[int]

Number of gpus each worker should get, e.g. 0, 1, etc.

None
min_workers Optional[int]

The minimum number of workers. Must be greater than or equal to 0.

None
max_workers Optional[int]

The maximum number of workers. Must be greater than or equal to 0, and greater than or equal to min_workers.

None
per_worker Optional[int]

The maximum number of concurrent requests that an individual worker can service. Launch automatically scales the number of workers for the endpoint so that each worker is processing per_worker requests:

  • If the average number of concurrent requests per worker is lower than per_worker, then the number of workers will be reduced.
  • Otherwise, if the average number of concurrent requests per worker is higher than per_worker, then the number of workers will be increased to meet the elevated traffic.
None
gpu_type Optional[str]

If specifying a non-zero number of gpus, this controls the type of gpu requested. Here are the supported values:

  • nvidia-tesla-t4
  • nvidia-ampere-a10
  • nvidia-hopper-h100
  • nvidia-hopper-h100-1g20g
  • nvidia-hopper-h100-3g40g
None
high_priority Optional[bool]

Either True or False. Enabling this will allow the created endpoint to leverage the shared pool of prewarmed nodes for faster spinup time.

None
post_inference_hooks Optional[List[PostInferenceHooks]]

List of hooks to trigger after inference tasks are served.

None
default_callback_url Optional[str]

The default callback url to use for async endpoints. This can be overridden in the task parameters for each individual task. post_inference_hooks must contain "callback" for the callback to be triggered.

None
default_callback_auth_kind Optional[Literal['basic', 'mtls']]

The default callback auth kind to use for async endpoints. Either "basic" or "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_username Optional[str]

The default callback auth username to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_password Optional[str]

The default callback auth password to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_cert Optional[str]

The default callback auth cert to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_key Optional[str]

The default callback auth key to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
public_inference Optional[bool]

If True, this endpoint will be available to all user IDs for inference.

None
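
For example, a sketch of scaling the endpoint created above; the values are illustrative.

client.edit_model_endpoint(
    model_endpoint="my-endpoint",
    min_workers=2,
    max_workers=10,
    per_worker=5,
)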

get_batch_async_response

get_batch_async_response(batch_job_id: str) -> Dict[str, Any]

Gets inference results from a previously created batch job.

Parameters:

Name Type Description Default
batch_job_id str

An ID representing the batch job. This ID is in the response from calling batch_async_request.

required

Returns:

Type Description
Dict[str, Any]

A dictionary that contains the following fields:

  • status: The status of the job.
  • result: The url where the result is stored.
  • duration: A string representation of how long the job took to finish or how long it has been running, for a job currently in progress.
  • num_tasks_pending: The number of tasks that are still pending.
  • num_tasks_completed: The number of tasks that have completed.
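
A polling sketch, using the job_id returned by batch_async_request; the status strings checked here are assumptions, so inspect the returned dictionary in your deployment.

import time

while True:
    status = client.get_batch_async_response(job_id)
    if status["status"] not in ("PENDING", "RUNNING"):  # status values assumed
        break
    time.sleep(30)

print(status["result"])  # URL where the batch results are stored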

get_docker_image_batch_job

get_docker_image_batch_job(batch_job_id: str)

For self hosted mode only. Gets information about a batch job given a batch job id.

get_docker_image_batch_job_bundle

get_docker_image_batch_job_bundle(docker_image_batch_job_bundle_id: str) -> DockerImageBatchJobBundleResponse

For self hosted mode only. Gets information for a single batch job bundle with a given id.

get_file

get_file(file_id: str) -> GetFileResponse

Get metadata about a file

Parameters:

Name Type Description Default
file_id str

ID of the file

required

Returns:

Name Type Description
GetFileResponse GetFileResponse

ID, filename, and size of the requested file

get_file_content

get_file_content(file_id: str) -> GetFileContentResponse

Get a file's content

Parameters:

Name Type Description Default
file_id str

ID of the file

required

Returns:

Name Type Description
GetFileContentResponse GetFileContentResponse

ID and content of the requested file

get_fine_tune

get_fine_tune(fine_tune_id: str) -> GetFineTuneResponse

Get status of a fine-tune

Parameters:

Name Type Description Default
fine_tune_id str

ID of the fine-tune

required

Returns:

Name Type Description
GetFineTuneResponse GetFineTuneResponse

ID and status of the requested fine-tune

get_fine_tune_events

get_fine_tune_events(fine_tune_id: str) -> GetFineTuneEventsResponse

Get list of fine-tune events

Parameters:

Name Type Description Default
fine_tune_id str

ID of the fine-tune

required

Returns:

Name Type Description
GetFineTuneEventsResponse GetFineTuneEventsResponse

a list of all the events of the fine-tune

get_latest_docker_image_batch_job_bundle

get_latest_docker_image_batch_job_bundle(bundle_name: str) -> DockerImageBatchJobBundleResponse

For self hosted mode only. Gets information for the latest batch job bundle with a given name.

get_latest_model_bundle_v2

get_latest_model_bundle_v2(model_bundle_name: str) -> ModelBundleV2Response

Get the latest version of a model bundle.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to get.

required

Returns:

Type Description
ModelBundleV2Response

An object containing the following keys:

  • id: The ID of the model bundle.
  • name: The name of the model bundle.
  • schema_location: The location of the schema for the model bundle.
  • flavor: The flavor of the model bundle. Either RunnableImage, CloudpickleArtifact, ZipArtifact, or TritonEnhancedRunnableImageFlavor.
  • created_at: The time the model bundle was created.
  • metadata: A dictionary of metadata associated with the model bundle.
  • model_artifact_ids: A list of IDs of model artifacts associated with the bundle.

get_llm_model_endpoint

get_llm_model_endpoint(endpoint_name: str) -> Optional[Union[AsyncEndpoint, SyncEndpoint, StreamingEndpoint]]

Gets a model endpoint associated with a name that the user has access to.

Parameters:

Name Type Description Default
endpoint_name str

The name of the endpoint to retrieve.

required

get_model_bundle

get_model_bundle(model_bundle: Union[ModelBundle, str]) -> ModelBundle

Returns a model bundle, specified by the model_bundle argument, that the user owns.

Parameters:

Name Type Description Default
model_bundle Union[ModelBundle, str]

The bundle or its name.

required

Returns:

Type Description
ModelBundle

A ModelBundle object

get_model_bundle_v2

get_model_bundle_v2(model_bundle_id: str) -> ModelBundleV2Response

Get a model bundle.

Parameters:

Name Type Description Default
model_bundle_id str

The ID of the model bundle you want to get.

required

Returns:

Type Description
ModelBundleV2Response

An object containing the following fields:

  • id: The ID of the model bundle.
  • name: The name of the model bundle.
  • flavor: The flavor of the model bundle. Either RunnableImage, CloudpickleArtifact, ZipArtifact, or TritonEnhancedRunnableImageFlavor.
  • created_at: The time the model bundle was created.
  • metadata: A dictionary of metadata associated with the model bundle.
  • model_artifact_ids: A list of IDs of model artifacts associated with the bundle.

get_model_endpoint

get_model_endpoint(endpoint_name: str) -> Optional[Union[AsyncEndpoint, SyncEndpoint]]

Gets a model endpoint associated with a name.

Parameters:

Name Type Description Default
endpoint_name str

The name of the endpoint to retrieve.

required

list_docker_image_batch_job_bundles

list_docker_image_batch_job_bundles(bundle_name: Optional[str] = None, order_by: Optional[Literal['newest', 'oldest']] = None) -> ListDockerImageBatchJobBundleResponse

For self hosted mode only. Gets information for multiple bundles.

Parameters:

Name Type Description Default
bundle_name Optional[str]

The name of the bundles to retrieve. If not specified, this will retrieve all bundles.

None
order_by Optional[Literal['newest', 'oldest']]

Either "newest", "oldest", or not specified. Specify to sort by newest/oldest.

None

list_files

list_files() -> ListFilesResponse

List files

Returns:

Name Type Description
ListFilesResponse ListFilesResponse

list of all files (ID, filename, and size)

list_fine_tunes

list_fine_tunes() -> ListFineTunesResponse

List fine-tunes

Returns:

Name Type Description
ListFineTunesResponse ListFineTunesResponse

list of all fine-tunes and their statuses

list_llm_model_endpoints

list_llm_model_endpoints() -> List[Endpoint]

Lists all LLM model endpoints that the user has access to.

Returns:

Type Description
List[Endpoint]

A list of ModelEndpoint objects.

list_model_bundles

list_model_bundles() -> List[ModelBundle]

Returns a list of model bundles that the user owns.

Returns:

Type Description
List[ModelBundle]

A list of ModelBundle objects

list_model_bundles_v2

list_model_bundles_v2() -> ListModelBundlesV2Response

List all model bundles.

Returns:

Type Description
ListModelBundlesV2Response

An object containing the following keys:

  • model_bundles: A list of model bundles. Each model bundle is an object.

list_model_endpoints

list_model_endpoints() -> List[Endpoint]

Lists all model endpoints that the user owns.

Returns:

Type Description
List[Endpoint]

A list of ModelEndpoint objects.

model_download

model_download(model_name: str, download_format: str = 'hugging_face') -> ModelDownloadResponse

Download a fine-tuned model.

Parameters:

Name Type Description Default
model_name str

name of the model to download

required
download_format str

format of the model to download

'hugging_face'

Returns:

Name Type Description
ModelDownloadResponse ModelDownloadResponse

Dictionary with file names and URLs to download the model.

read_endpoint_creation_logs

read_endpoint_creation_logs(model_endpoint: Union[ModelEndpoint, str])

Retrieves the logs for the creation of the endpoint.

Parameters:

Name Type Description Default
model_endpoint Union[ModelEndpoint, str]

The endpoint or its name.

required

register_batch_csv_location_fn

register_batch_csv_location_fn(batch_csv_location_fn: Callable[[], str])

For self-hosted mode only. Registers a function that gives a location for batch CSV inputs. Should give different locations each time. This function is called as batch_csv_location_fn(), and should return a batch_csv_url that upload_batch_csv_fn can take.

Strictly, batch_csv_location_fn() does not need to return a str. The only requirement is that if batch_csv_location_fn returns a value of type T, then upload_batch_csv_fn() takes in an object of type T as its second argument (i.e. batch_csv_url).

Parameters:

Name Type Description Default
batch_csv_location_fn Callable[[], str]

Function that generates batch_csv_urls for upload_batch_csv_fn.

required

register_bundle_location_fn

register_bundle_location_fn(bundle_location_fn: Callable[[], str])

For self-hosted mode only. Registers a function that gives a location for a model bundle. Should give different locations each time. This function is called as bundle_location_fn(), and should return a bundle_url that register_upload_bundle_fn can take.

Strictly, bundle_location_fn() does not need to return a str. The only requirement is that if bundle_location_fn returns a value of type T, then upload_bundle_fn() takes in an object of type T as its second argument (i.e. bundle_url).

Parameters:

Name Type Description Default
bundle_location_fn Callable[[], str]

Function that generates bundle_urls for upload_bundle_fn.

required

register_upload_batch_csv_fn

register_upload_batch_csv_fn(upload_batch_csv_fn: Callable[[str, str], None])

For self-hosted mode only. Registers a function that handles batch text upload. This function is called as

upload_batch_csv_fn(csv_text, csv_url)

This function should directly write the contents of csv_text as a text string into csv_url.

Parameters:

Name Type Description Default
upload_batch_csv_fn Callable[[str, str], None]

Function that takes in a csv text (string type), and uploads that bundle to an appropriate location. Only needed for self-hosted mode.

required

register_upload_bundle_fn

register_upload_bundle_fn(upload_bundle_fn: Callable[[str, str], None])

For self-hosted mode only. Registers a function that handles model bundle upload. This function is called as

upload_bundle_fn(serialized_bundle, bundle_url)

This function should directly write the contents of serialized_bundle as a binary string into bundle_url.

See register_bundle_location_fn for more notes on the signature of upload_bundle_fn

Parameters:

Name Type Description Default
upload_bundle_fn Callable[[str, str], None]

Function that takes in a serialized bundle (bytes type), and uploads that bundle to an appropriate location. Only needed for self-hosted mode.

required
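
In self-hosted mode the location and upload hooks are typically registered together. A sketch using S3 via boto3; the bucket name and key scheme are placeholders.

import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "my-launch-bundles"  # placeholder bucket

def bundle_location_fn():
    # Return a fresh location on every call.
    return f"s3://{BUCKET}/bundles/{uuid.uuid4()}"

def upload_bundle_fn(serialized_bundle, bundle_url):
    # Write the serialized bundle bytes to the location chosen above.
    key = bundle_url.replace(f"s3://{BUCKET}/", "")
    s3.put_object(Bucket=BUCKET, Key=key, Body=serialized_bundle)

client.register_bundle_location_fn(bundle_location_fn)
client.register_upload_bundle_fn(upload_bundle_fn)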

update_docker_image_batch_job

update_docker_image_batch_job(batch_job_id: str, cancel: bool)

For self hosted mode only. Updates a batch job by id. Use this if you want to cancel/delete a batch job.

upload_file

upload_file(file_path: str) -> UploadFileResponse

Upload a file

Parameters:

Name Type Description Default
file_path str

Path to a local file to upload.

required

Returns:

Name Type Description
UploadFileResponse UploadFileResponse

ID of the created file