
Launch Client

LaunchClient

LaunchClient(api_key: str, endpoint: Optional[str] = None, self_hosted: bool = False, use_path_with_custom_endpoint: bool = False)

Scale Launch Python Client.

Initializes a Scale Launch Client.

Parameters:

Name Type Description Default
api_key str

Your Scale API key

required
endpoint Optional[str]

The Scale Launch Endpoint (this should not need to be changed)

None
self_hosted bool

True iff you are connecting to a self-hosted Scale Launch

False
use_path_with_custom_endpoint bool

True iff you are not using the default Scale Launch endpoint but your endpoint has path routing (to SCALE_LAUNCH_VX_PATH) set up

False
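
A minimal instantiation sketch, assuming the client package is importable as launch; the API key and endpoint values are placeholders. The client variable is reused in the sketches below.

from launch import LaunchClient

client = LaunchClient(api_key="YOUR_SCALE_API_KEY")

# Self-hosted installations point at their own endpoint instead, e.g.:
# client = LaunchClient(api_key="...", endpoint="https://launch.example.internal", self_hosted=True)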

batch_async_request

batch_async_request(*, model_bundle: Union[ModelBundle, str], urls: Optional[List[str]] = None, inputs: Optional[List[Dict[str, Any]]] = None, batch_url_file_location: Optional[str] = None, serialization_format: str = 'JSON', labels: Optional[Dict[str, str]] = None, cpus: Optional[int] = None, memory: Optional[str] = None, gpus: Optional[int] = None, gpu_type: Optional[str] = None, storage: Optional[str] = None, max_workers: Optional[int] = None, per_worker: Optional[int] = None, timeout_seconds: Optional[float] = None) -> Dict[str, Any]

Sends a batch inference request using a given bundle. Returns a key that can be used to retrieve the results of inference at a later time.

Must have exactly one of urls or inputs passed in.

Parameters:

Name Type Description Default
model_bundle Union[ModelBundle, str]

The bundle or the name of the bundle to use for inference.

required
urls Optional[List[str]]

A list of URLs, each pointing to a file containing model input. These must be accessible by Scale Launch, so the URLs need to be either public or signed URLs.

None
inputs Optional[List[Dict[str, Any]]]

A list of model inputs. If provided, we will upload the inputs and pass them in to Launch.

None
batch_url_file_location Optional[str]

In self-hosted mode, the input to the batch job will be uploaded to this location if provided. Otherwise, one will be determined from bundle_location_fn()

None
serialization_format str

Serialization format of the output, either 'PICKLE' or 'JSON'. 'PICKLE' corresponds to pickling results before returning them.

'JSON'
labels Optional[Dict[str, str]]

An optional dictionary of key/value pairs to associate with this endpoint.

None
cpus Optional[int]

Number of cpus each worker should get, e.g. 1, 2, etc. This must be greater than or equal to 1.

None
memory Optional[str]

Amount of memory each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of memory.

None
storage Optional[str]

Amount of local ephemeral storage each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of storage.

None
gpus Optional[int]

Number of gpus each worker should get, e.g. 0, 1, etc.

None
max_workers Optional[int]

The maximum number of workers. Must be greater than or equal to 0, and greater than or equal to min_workers.

None
per_worker Optional[int]

The maximum number of concurrent requests that an individual worker can service. Launch automatically scales the number of workers for the endpoint so that each worker is processing per_worker requests:

  • If the average number of concurrent requests per worker is lower than per_worker, then the number of workers will be reduced.
  • Otherwise, if the average number of concurrent requests per worker is higher than per_worker, then the number of workers will be increased to meet the elevated traffic.
None
gpu_type Optional[str]

If specifying a non-zero number of gpus, this controls the type of gpu requested. Here are the supported values:

  • nvidia-tesla-t4
  • nvidia-ampere-a10
  • nvidia-hopper-h100
  • nvidia-hopper-h100-1g20g
  • nvidia-hopper-h100-3g40g
None
timeout_seconds Optional[float]

The maximum amount of time (in seconds) that the batch job can take. If not specified, the server defaults to 12 hours. This includes the time required to build the endpoint and the total time required for all the individual tasks.

None

Returns:

Type Description
Dict[str, Any]

A dictionary containing the key job_id, whose value is the ID of the batch job.
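
A usage sketch, assuming a client as above and an existing bundle named "my-bundle"; the inputs, labels, and resource values are placeholders.

# Submit a batch job over in-memory inputs and keep the returned job ID.
response = client.batch_async_request(
    model_bundle="my-bundle",
    inputs=[{"x": 1}, {"x": 2}],
    labels={"team": "demo", "product": "demo"},
    cpus=1,
    memory="2Gi",
    max_workers=2,
    per_worker=1,
)
job_id = response["job_id"]  # use this later with get_batch_async_response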

cancel_fine_tune

cancel_fine_tune(fine_tune_id: str) -> CancelFineTuneResponse

Cancel a fine-tune

Parameters:

Name Type Description Default
fine_tune_id str

ID of the fine-tune

required

Returns:

Name Type Description
CancelFineTuneResponse CancelFineTuneResponse

whether the cancellation was successful

clone_model_bundle_with_changes

clone_model_bundle_with_changes(model_bundle: Union[ModelBundle, str], app_config: Optional[Dict] = None) -> ModelBundle
Warning

This method is deprecated. Use clone_model_bundle_with_changes_v2 instead.

Parameters:

Name Type Description Default
model_bundle Union[ModelBundle, str]

The existing bundle or its ID.

required
app_config Optional[Dict]

The new bundle's app_config. If not passed in, the new bundle's app_config will be set to None.

None

Returns:

Type Description
ModelBundle

A ModelBundle object

clone_model_bundle_with_changes_v2

clone_model_bundle_with_changes_v2(original_model_bundle_id: str, new_app_config: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Clone a model bundle with an optional new app_config.

Parameters:

Name Type Description Default
original_model_bundle_id str

The ID of the model bundle you want to clone.

required
new_app_config Optional[Dict[str, Any]]

A dictionary of new app config values to use for the cloned model.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the cloned model bundle.

completions_stream

completions_stream(endpoint_name: str, prompt: str, max_new_tokens: int, temperature: float, stop_sequences: Optional[List[str]] = None, return_token_log_probs: Optional[bool] = False, timeout: float = DEFAULT_LLM_COMPLETIONS_TIMEOUT) -> Iterable[CompletionStreamV1Response]

Run prompt completion on an LLM endpoint in streaming fashion. Will fail if the endpoint does not support streaming.

Parameters:

Name Type Description Default
endpoint_name str

The name of the LLM endpoint to make the request to

required
prompt str

The prompt to send to the endpoint

required
max_new_tokens int

The maximum number of tokens to generate for each prompt

required
temperature float

The temperature to use for sampling

required
stop_sequences Optional[List[str]]

List of sequences to stop the completion at

None
return_token_log_probs Optional[bool]

Whether to return the log probabilities of the tokens

False

Returns:

Type Description
Iterable[CompletionStreamV1Response]

Iterable responses for prompt completion
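
A consumption sketch, assuming an LLM endpoint named "my-llm" that supports streaming; the response field names used here (output.text) are an assumption and may differ by client version.

for chunk in client.completions_stream(
    endpoint_name="my-llm",
    prompt="Hello, my name is",
    max_new_tokens=64,
    temperature=0.7,
):
    # Each chunk is a CompletionStreamV1Response; field names assumed here.
    if chunk.output is not None:
        print(chunk.output.text, end="", flush=True)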

completions_sync

completions_sync(endpoint_name: str, prompt: str, max_new_tokens: int, temperature: float, stop_sequences: Optional[List[str]] = None, return_token_log_probs: Optional[bool] = False, timeout: float = DEFAULT_LLM_COMPLETIONS_TIMEOUT) -> CompletionSyncV1Response

Run prompt completion on a sync LLM endpoint. Will fail if the endpoint is not sync.

Parameters:

Name Type Description Default
endpoint_name str

The name of the LLM endpoint to make the request to

required
prompt str

The completion prompt to send to the endpoint

required
max_new_tokens int

The maximum number of tokens to generate for each prompt

required
temperature float

The temperature to use for sampling

required
stop_sequences Optional[List[str]]

List of sequences to stop the completion at

None
return_token_log_probs Optional[bool]

Whether to return the log probabilities of the tokens

False

Returns:

Type Description
CompletionSyncV1Response

Response for prompt completion
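
The synchronous variant, under the same assumptions (the response field names are assumed):

response = client.completions_sync(
    endpoint_name="my-llm",
    prompt="Write a haiku about autumn.",
    max_new_tokens=64,
    temperature=0.2,
)
print(response.output.text)  # field names assumed; inspect the response object in your version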

create_docker_image_batch_job

create_docker_image_batch_job(*, labels: Dict[str, str], docker_image_batch_job_bundle: Optional[Union[str, DockerImageBatchJobBundleResponse]] = None, docker_image_batch_job_bundle_name: Optional[str] = None, job_config: Optional[Dict[str, Any]] = None, cpus: Optional[int] = None, memory: Optional[str] = None, gpus: Optional[int] = None, gpu_type: Optional[str] = None, storage: Optional[str] = None)

For self hosted mode only.

Parameters:

  • docker_image_batch_job_bundle: Specifies the docker image bundle to use for the batch job. Either the string ID of a docker image bundle, or a DockerImageBatchJobBundleResponse object. Only one of docker_image_batch_job_bundle and docker_image_batch_job_bundle_name can be specified.
  • docker_image_batch_job_bundle_name: The name of a batch job bundle. If specified, Launch will use the most recent bundle with that name owned by the current user. Only one of docker_image_batch_job_bundle and docker_image_batch_job_bundle_name can be specified.
  • labels: Kubernetes labels that are present on the batch job.
  • job_config: A JSON-serializable Python object that will get passed to the batch job, specifically as the contents of a file mounted at mount_location inside the bundle. You can call Python's json.load() on the file to retrieve the contents.
  • cpus: Optional override for the number of cpus to give to your job. Either the default must be specified in the bundle, or this must be specified.
  • memory: Optional override for the amount of memory to give to your job. Either the default must be specified in the bundle, or this must be specified.
  • gpus: Optional number of gpus to give to the bundle. If not specified in the bundle or here, this will be interpreted as 0 gpus.
  • gpu_type: Optional type of gpu. If the final number of gpus is positive, this must be specified either in the bundle or here.
  • storage: Optional reserved amount of disk to give to your batch job. If not specified, your job may be evicted if it is using too much disk.
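
A sketch of launching a job against an existing bundle; the bundle name, labels, and job_config values are placeholders.

job = client.create_docker_image_batch_job(
    labels={"team": "demo", "product": "demo"},
    docker_image_batch_job_bundle_name="my-batch-bundle",
    job_config={"input_path": "s3://my-bucket/inputs.jsonl", "mode": "full"},
    cpus=2,
    memory="8Gi",
)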

create_docker_image_batch_job_bundle

create_docker_image_batch_job_bundle(*, name: str, image_repository: str, image_tag: str, command: List[str], env: Optional[Dict[str, str]] = None, mount_location: Optional[str] = None, cpus: Optional[int] = None, memory: Optional[str] = None, gpus: Optional[int] = None, gpu_type: Optional[str] = None, storage: Optional[str] = None) -> CreateDockerImageBatchJobBundleResponse

For self hosted mode only.

Creates a Docker Image Batch Job Bundle.

Parameters:

Name Type Description Default
name str

A user-defined name for the bundle. Does not need to be unique.

required
image_repository str

The (short) repository of your image. For example, if your image is located at 123456789012.dkr.ecr.us-west-2.amazonaws.com/repo:tag, and your version of Launch is configured to look at 123456789012.dkr.ecr.us-west-2.amazonaws.com for Docker Images, you would pass the value repo for the image_repository parameter.

required
image_tag str

The tag of your image inside of the repo. In the example above, you would pass the value tag for the image_tag parameter.

required
command List[str]

The command to run inside the docker image.

required
env Optional[Dict[str, str]]

A dictionary of environment variables to inject into your docker image.

None
mount_location Optional[str]

A location in the filesystem where you would like a JSON-formatted file, controllable at runtime, to be mounted. This allows behavior to be specified at runtime. (Specifically, the contents of this file can be read via json.load() inside of the user-defined code.)

None
cpus Optional[int]

Optional default value for the number of cpus to give the job.

None
memory Optional[str]

Optional default value for the amount of memory to give the job.

None
gpus Optional[int]

Optional default value for the number of gpus to give the job.

None
gpu_type Optional[str]

Optional default value for the type of gpu to give the job.

None
storage Optional[str]

Optional default value for the amount of disk to give the job.

None
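
A sketch of creating a bundle whose container reads its job_config from mount_location; the repository, tag, command, and paths are illustrative.

bundle = client.create_docker_image_batch_job_bundle(
    name="my-batch-bundle",
    image_repository="repo",
    image_tag="tag",
    command=["python", "run_job.py"],
    mount_location="/etc/job_config.json",
    cpus=2,
    memory="8Gi",
)

# Inside the container, run_job.py (a hypothetical script) can read the per-job config:
#
#     import json
#     with open("/etc/job_config.json") as f:
#         job_config = json.load(f)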

create_fine_tune

create_fine_tune(model: str, training_file: str, validation_file: Optional[str] = None, fine_tuning_method: Optional[str] = None, hyperparameters: Optional[Dict[str, str]] = None, wandb_config: Optional[Dict[str, Any]] = None, suffix: str = None) -> CreateFineTuneResponse

Create a fine-tune

Parameters:

Name Type Description Default
model str

Identifier of base model to train from.

required
training_file str

Path to file of training dataset. Dataset must be a csv with columns 'prompt' and 'response'.

required
validation_file Optional[str]

Path to file of validation dataset. Has the same format as training_file. If not provided, we will generate a split from the training dataset.

None
fine_tuning_method Optional[str]

Fine-tuning method. Currently unused, but when different techniques are implemented we will expose this field.

None
hyperparameters Optional[Dict[str, str]]

Hyperparameters to pass in to training job.

None
wandb_config Optional[Dict[str, Any]]

Configuration for Weights and Biases. To enable, set hyperparameters["report_to"] to wandb. A Weights and Biases API key must be provided via the api_key field.

None
suffix str

Optional user-provided identifier suffix for the fine-tuned model.

None

Returns:

Name Type Description
CreateFineTuneResponse CreateFineTuneResponse

ID of the created fine-tune
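
A sketch of kicking off a fine-tune; the base model identifier, file reference, and hyperparameters are placeholders, and the response field name is assumed.

fine_tune = client.create_fine_tune(
    model="llama-2-7b",            # placeholder base model identifier
    training_file="file-abc123",   # CSV with 'prompt' and 'response' columns
    hyperparameters={"epochs": "1"},
    suffix="my-experiment",
)
fine_tune_id = fine_tune.id  # field name assumed; use with get_fine_tune / cancel_fine_tune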

create_llm_model_endpoint

create_llm_model_endpoint(endpoint_name: str, model_name: str, inference_framework_image_tag: str, source: LLMSource = LLMSource.HUGGING_FACE, inference_framework: LLMInferenceFramework = LLMInferenceFramework.DEEPSPEED, num_shards: int = 4, quantize: Optional[Quantization] = None, checkpoint_path: Optional[str] = None, cpus: int = 32, memory: str = '192Gi', storage: Optional[str] = None, gpus: int = 4, min_workers: int = 0, max_workers: int = 1, per_worker: int = 10, gpu_type: Optional[str] = 'nvidia-ampere-a10', endpoint_type: str = 'sync', high_priority: Optional[bool] = False, post_inference_hooks: Optional[List[PostInferenceHooks]] = None, default_callback_url: Optional[str] = None, default_callback_auth_kind: Optional[Literal['basic', 'mtls']] = None, default_callback_auth_username: Optional[str] = None, default_callback_auth_password: Optional[str] = None, default_callback_auth_cert: Optional[str] = None, default_callback_auth_key: Optional[str] = None, public_inference: Optional[bool] = None, update_if_exists: bool = False, labels: Optional[Dict[str, str]] = None)

Creates and registers a model endpoint in Scale Launch. The returned object is an instance of type Endpoint, which is a base class of either SyncEndpoint or AsyncEndpoint. This is the object to which you send inference requests.

Parameters:

Name Type Description Default
endpoint_name str

The name of the model endpoint you want to create. The name must be unique across all endpoints that you own.

required
model_name str

name for the LLM. List can be found at (TODO: add list of supported models)

required
inference_framework_image_tag str

image tag for the inference framework. (TODO: use latest image tag when unspecified)

required
source LLMSource

source of the LLM. Currently only HuggingFace is supported.

HUGGING_FACE
inference_framework LLMInferenceFramework

inference framework for the LLM. Currently only DeepSpeed is supported.

DEEPSPEED
num_shards int

number of shards for the LLM. When bigger than 1, LLM will be sharded to multiple GPUs. Number of GPUs must be larger than num_shards.

4
quantize Optional[Quantization]

Quantization method for the LLM. Only affects behavior for text-generation-inference models.

None
checkpoint_path Optional[str]

Path to the checkpoint to load the model from. Only affects behavior for text-generation-inference models.

None
cpus int

Number of cpus each worker should get, e.g. 1, 2, etc. This must be greater than or equal to 1.

32
memory str

Amount of memory each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of memory.

'192Gi'
storage Optional[str]

Amount of local ephemeral storage each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of storage.

None
gpus int

Number of gpus each worker should get, e.g. 0, 1, etc.

4
min_workers int

The minimum number of workers. Must be greater than or equal to 0. This should be determined by computing the minimum throughput of your workload and dividing it by the throughput of a single worker. This field must be at least 1 for synchronous endpoints.

0
max_workers int

The maximum number of workers. Must be greater than or equal to 0, and greater than or equal to min_workers. This should be determined by computing the maximum throughput of your workload and dividing it by the throughput of a single worker.

1
per_worker int

The maximum number of concurrent requests that an individual worker can service. Launch automatically scales the number of workers for the endpoint so that each worker is processing per_worker requests, subject to the limits defined by min_workers and max_workers.

  • If the average number of concurrent requests per worker is lower than per_worker, then the number of workers will be reduced.
  • Otherwise, if the average number of concurrent requests per worker is higher than per_worker, then the number of workers will be increased to meet the elevated traffic.

Here is our recommendation for computing per_worker:

  1. Compute min_workers and max_workers per your minimum and maximum throughput requirements.
  2. Determine a value for the maximum number of concurrent requests in the workload. Divide this number by max_workers. Doing this ensures that the number of workers will "climb" to max_workers.
10
gpu_type Optional[str]

If specifying a non-zero number of gpus, this controls the type of gpu requested. Here are the supported values:

  • nvidia-tesla-t4
  • nvidia-ampere-a10
  • nvidia-hopper-h100
  • nvidia-hopper-h100-1g20g
  • nvidia-hopper-h100-3g40g
'nvidia-ampere-a10'
endpoint_type str

Either "sync" or "async".

'sync'
high_priority Optional[bool]

Either True or False. Enabling this will allow the created endpoint to leverage the shared pool of prewarmed nodes for faster spinup time.

False
post_inference_hooks Optional[List[PostInferenceHooks]]

List of hooks to trigger after inference tasks are served.

None
default_callback_url Optional[str]

The default callback url to use for async endpoints. This can be overridden in the task parameters for each individual task. post_inference_hooks must contain "callback" for the callback to be triggered.

None
default_callback_auth_kind Optional[Literal['basic', 'mtls']]

The default callback auth kind to use for async endpoints. Either "basic" or "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_username Optional[str]

The default callback auth username to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_password Optional[str]

The default callback auth password to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_cert Optional[str]

The default callback auth cert to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_key Optional[str]

The default callback auth key to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
public_inference Optional[bool]

If True, this endpoint will be available to all user IDs for inference.

None
update_if_exists bool

If True, will attempt to update the endpoint if it exists. Otherwise, will unconditionally try to create a new endpoint. Note that endpoint names for a given user must be unique, so attempting to call this function with update_if_exists=False for an existing endpoint will raise an error.

False
labels Optional[Dict[str, str]]

An optional dictionary of key/value pairs to associate with this endpoint.

None

Returns:

Type Description

An Endpoint object that can be used to make requests to the endpoint.
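
A sketch of standing up an LLM endpoint; the model name, image tag, and resource values are illustrative, not recommendations.

endpoint = client.create_llm_model_endpoint(
    endpoint_name="my-llm",
    model_name="llama-2-7b",                  # placeholder; must be a supported model
    inference_framework_image_tag="latest",   # placeholder image tag
    num_shards=1,
    cpus=8,
    memory="40Gi",
    storage="40Gi",
    gpus=1,
    gpu_type="nvidia-ampere-a10",
    min_workers=1,
    max_workers=1,
    per_worker=10,
    endpoint_type="sync",
    labels={"team": "demo", "product": "demo"},
)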

create_model_bundle

create_model_bundle(model_bundle_name: str, env_params: Dict[str, str], *, load_predict_fn: Optional[Callable[[LaunchModel_T], Callable[[Any], Any]]] = None, predict_fn_or_cls: Optional[Callable[[Any], Any]] = None, requirements: Optional[List[str]] = None, model: Optional[LaunchModel_T] = None, load_model_fn: Optional[Callable[[], LaunchModel_T]] = None, app_config: Optional[Union[Dict[str, Any], str]] = None, globals_copy: Optional[Dict[str, Any]] = None, request_schema: Optional[Type[BaseModel]] = None, response_schema: Optional[Type[BaseModel]] = None) -> ModelBundle
Warning

This method is deprecated. Use create_model_bundle_from_callable_v2 instead.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create. The name must be unique across all bundles that you own.

required
predict_fn_or_cls Optional[Callable[[Any], Any]]

Function or a Callable class that runs end-to-end (pre/post processing and model inference) on the call. i.e. predict_fn_or_cls(REQUEST) -> RESPONSE.

None
model Optional[LaunchModel_T]

Typically a trained Neural Network, e.g. a Pytorch module.

Exactly one of model and load_model_fn must be provided.

None
load_model_fn Optional[Callable[[], LaunchModel_T]]

A function that, when run, loads a model. This function is essentially a deferred wrapper around the model argument.

Exactly one of model and load_model_fn must be provided.

None
load_predict_fn Optional[Callable[[LaunchModel_T], Callable[[Any], Any]]]

Function that, when called with a model, returns a function that carries out inference.

If model is specified, then this is equivalent to: load_predict_fn(model, app_config=optional_app_config) -> predict_fn

Otherwise, if load_model_fn is specified, then this is equivalent to: load_predict_fn(load_model_fn(), app_config=optional_app_config) -> predict_fn

In both cases, predict_fn is then the inference function, i.e.: predict_fn(REQUEST) -> RESPONSE

None
requirements Optional[List[str]]

A list of python package requirements, where each list element is of the form <package_name>==<package_version>, e.g.

["tensorflow==2.3.0", "tensorflow-hub==0.11.0"]

If you do not pass in a value for requirements, then you must pass in globals() for the globals_copy argument.

None
app_config Optional[Union[Dict[str, Any], str]]

Either a Dictionary that represents a YAML file contents or a local path to a YAML file.

None
env_params Dict[str, str]

A dictionary that dictates environment information e.g. the use of pytorch or tensorflow, which base image tag to use, etc. Specifically, the dictionary should contain the following keys:

  • framework_type: either tensorflow or pytorch.
  • PyTorch fields:
    • pytorch_image_tag: An image tag for the pytorch docker base image. The list of tags can be found from https://hub.docker.com/r/pytorch/pytorch/tags.

Example:

{
    "framework_type": "pytorch",
    "pytorch_image_tag": "1.10.0-cuda11.3-cudnn8-runtime",
}

  • Tensorflow fields:
    • tensorflow_version: Version of tensorflow, e.g. "2.3.0".

required
globals_copy Optional[Dict[str, Any]]

Dictionary of the global symbol table. Normally provided by globals() built-in function.

None
request_schema Optional[Type[BaseModel]]

A pydantic model that represents the request schema for the model bundle. This is used to validate the request body for the model bundle's endpoint.

None
response_schema Optional[Type[BaseModel]]

A pydantic model that represents the response schema for the model bundle. This is used to validate the response for the model bundle's endpoint. Note: If request_schema is specified, then response_schema must also be specified.

None

create_model_bundle_from_callable_v2

create_model_bundle_from_callable_v2(*, model_bundle_name: str, load_predict_fn: Callable[[LaunchModel_T], Callable[[Any], Any]], load_model_fn: Callable[[], LaunchModel_T], request_schema: Type[BaseModel], response_schema: Type[BaseModel], requirements: Optional[List[str]] = None, pytorch_image_tag: Optional[str] = None, tensorflow_version: Optional[str] = None, custom_base_image_repository: Optional[str] = None, custom_base_image_tag: Optional[str] = None, app_config: Optional[Union[Dict[str, Any], str]] = None, metadata: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Uploads and registers a model bundle to Scale Launch.

Parameters:

Name Type Description Default
model_bundle_name str

Name of the model bundle.

required
load_predict_fn Callable[[LaunchModel_T], Callable[[Any], Any]]

Function that takes in a model and returns a predict function. When your model bundle is deployed, this predict function will be called as follows:

input = {"input": "some input"} # or whatever your request schema is.

def load_model_fn():
    # load model
    return model

def load_predict_fn(model, app_config=None):
    def predict_fn(input):
        # do pre-processing
        output = model(input)
        # do post-processing
        return output
    return predict_fn

predict_fn = load_predict_fn(load_model_fn(), app_config=optional_app_config)
response = predict_fn(input)

required
load_model_fn Callable[[], LaunchModel_T]

A function that, when run, loads a model.

required
request_schema Type[BaseModel]

A pydantic model that represents the request schema for the model bundle. This is used to validate the request body for the model bundle's endpoint.

required
response_schema Type[BaseModel]

A pydantic model that represents the response schema for the model bundle. This is used to validate the response for the model bundle's endpoint.

required
requirements Optional[List[str]]

List of pip requirements.

None
pytorch_image_tag Optional[str]

The image tag for the PyTorch image that will be used to run the bundle. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
tensorflow_version Optional[str]

The version of TensorFlow that will be used to run the bundle. If not specified, the default version will be used. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
custom_base_image_repository Optional[str]

The repository for a custom base image that will be used to run the bundle. If not specified, the default base image will be used. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
custom_base_image_tag Optional[str]

The tag for a custom base image that will be used to run the bundle. Must be specified if custom_base_image_repository is specified.

None
app_config Optional[Union[Dict[str, Any], str]]

An optional dictionary of configuration values that will be passed to the bundle when it is run. These values can be accessed by the bundle via the app_config global variable.

None
metadata Optional[Dict[str, Any]]

Metadata to record with the bundle.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the created model bundle.
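
Putting the pieces together, an end-to-end sketch with a toy model and toy pydantic schemas (everything here is illustrative):

from pydantic import BaseModel

class MyRequest(BaseModel):
    x: float

class MyResponse(BaseModel):
    y: float

def load_model_fn():
    # In practice this would load real weights from disk or a remote store.
    return lambda x: 2 * x

def load_predict_fn(model, app_config=None):
    def predict_fn(request):
        return {"y": model(request["x"])}
    return predict_fn

bundle = client.create_model_bundle_from_callable_v2(
    model_bundle_name="my-bundle",
    load_predict_fn=load_predict_fn,
    load_model_fn=load_model_fn,
    request_schema=MyRequest,
    response_schema=MyResponse,
    requirements=[],
    pytorch_image_tag="1.10.0-cuda11.3-cudnn8-runtime",
)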

create_model_bundle_from_dirs

create_model_bundle_from_dirs(*, model_bundle_name: str, base_paths: List[str], requirements_path: str, env_params: Dict[str, str], load_predict_fn_module_path: str, load_model_fn_module_path: str, app_config: Optional[Union[Dict[str, Any], str]] = None, request_schema: Optional[Type[BaseModel]] = None, response_schema: Optional[Type[BaseModel]] = None) -> ModelBundle
Warning

This method is deprecated. Use create_model_bundle_from_dirs_v2 instead.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create. The name must be unique across all bundles that you own.

required
base_paths List[str]

The paths on the local filesystem where the bundle code lives.

required
requirements_path str

A path on the local filesystem where a requirements.txt file lives.

required
env_params Dict[str, str]

A dictionary that dictates environment information e.g. the use of pytorch or tensorflow, which base image tag to use, etc. Specifically, the dictionary should contain the following keys:

  • framework_type: either tensorflow or pytorch.
  • PyTorch fields:
    • pytorch_image_tag: An image tag for the pytorch docker base image. The list of tags can be found from https://hub.docker.com/r/pytorch/pytorch/tags

Example:

{
    "framework_type": "pytorch",
    "pytorch_image_tag": "1.10.0-cuda11.3-cudnn8-runtime",
}

required
load_predict_fn_module_path str

A python module path for a function that, when called with the output of load_model_fn_module_path, returns a function that carries out inference.

required
load_model_fn_module_path str

A python module path for a function that returns a model. The output feeds into the function located at load_predict_fn_module_path.

required
app_config Optional[Union[Dict[str, Any], str]]

Either a Dictionary that represents a YAML file contents or a local path to a YAML file.

None
request_schema Optional[Type[BaseModel]]

A pydantic model that represents the request schema for the model bundle. This is used to validate the request body for the model bundle's endpoint.

None
response_schema Optional[Type[BaseModel]]

A pydantic model that represents the response schema for the model bundle. This is used to validate the response for the model bundle's endpoint. Note: If request_schema is specified, then response_schema must also be specified.

None

create_model_bundle_from_dirs_v2

create_model_bundle_from_dirs_v2(*, model_bundle_name: str, base_paths: List[str], load_predict_fn_module_path: str, load_model_fn_module_path: str, request_schema: Type[BaseModel], response_schema: Type[BaseModel], requirements_path: Optional[str] = None, pytorch_image_tag: Optional[str] = None, tensorflow_version: Optional[str] = None, custom_base_image_repository: Optional[str] = None, custom_base_image_tag: Optional[str] = None, app_config: Optional[Dict[str, Any]] = None, metadata: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Packages up code from one or more local filesystem folders and uploads them as a bundle to Scale Launch. In this mode, a bundle is just local code instead of a serialized object.

For example, if you have a directory structure like so, and your current working directory is my_root:

   my_root/
       my_module1/
           __init__.py
           ...files and directories
           my_inference_file.py
       my_module2/
           __init__.py
           ...files and directories

then calling create_model_bundle_from_dirs_v2 with base_paths=["my_module1", "my_module2"] essentially creates a zip file without the root directory, e.g.:

   my_module1/
       __init__.py
       ...files and directories
       my_inference_file.py
   my_module2/
       __init__.py
       ...files and directories

and these contents will be unzipped relative to the server side application root. Bear these points in mind when referencing Python module paths for this bundle. For instance, if my_inference_file.py has def f(...) as the desired inference loading function, then the load_predict_fn_module_path argument should be my_module1.my_inference_file.f.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create.

required
base_paths List[str]

A list of paths to directories that will be zipped up and uploaded as a bundle. Each path must be relative to the current working directory.

required
load_predict_fn_module_path str

The Python module path for a function that, when called with the output of the function at load_model_fn_module_path, returns a function that carries out inference.

required
load_model_fn_module_path str

The Python module path for a function that, when run, loads and returns a model. The output feeds into the function located at load_predict_fn_module_path.

required
request_schema Type[BaseModel]

A Pydantic model that defines the request schema for the bundle.

required
response_schema Type[BaseModel]

A Pydantic model that defines the response schema for the bundle.

required
requirements_path Optional[str]

Path to a requirements.txt file that will be used to install dependencies for the bundle. This file must be relative to the current working directory.

None
pytorch_image_tag Optional[str]

The image tag for the PyTorch image that will be used to run the bundle. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
tensorflow_version Optional[str]

The version of TensorFlow that will be used to run the bundle. If not specified, the default version will be used. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
custom_base_image_repository Optional[str]

The repository for a custom base image that will be used to run the bundle. If not specified, the default base image will be used. Exactly one of pytorch_image_tag, tensorflow_version, or custom_base_image_repository must be specified.

None
custom_base_image_tag Optional[str]

The tag for a custom base image that will be used to run the bundle. Must be specified if custom_base_image_repository is specified.

None
app_config Optional[Dict[str, Any]]

An optional dictionary of configuration values that will be passed to the bundle when it is run. These values can be accessed by the bundle via the app_config global variable.

None
metadata Optional[Dict[str, Any]]

Metadata to record with the bundle.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the created model bundle.
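
Following the directory layout above, a call sketch; the load_model helper name is hypothetical, and the schemas are reused from the callable_v2 example.

bundle = client.create_model_bundle_from_dirs_v2(
    model_bundle_name="my-dir-bundle",
    base_paths=["my_module1", "my_module2"],
    load_predict_fn_module_path="my_module1.my_inference_file.f",
    load_model_fn_module_path="my_module1.my_inference_file.load_model",  # hypothetical helper
    request_schema=MyRequest,
    response_schema=MyResponse,
    requirements_path="requirements.txt",
    pytorch_image_tag="1.10.0-cuda11.3-cudnn8-runtime",
)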

create_model_bundle_from_runnable_image_v2

create_model_bundle_from_runnable_image_v2(*, model_bundle_name: str, request_schema: Type[BaseModel], response_schema: Type[BaseModel], repository: str, tag: str, command: List[str], healthcheck_route: Optional[str] = None, predict_route: Optional[str] = None, env: Dict[str, str], readiness_initial_delay_seconds: int, metadata: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Create a model bundle from a runnable image. The specified command must start a process that will listen for requests on port 5005 using HTTP.

Inference requests must be served at the POST /predict route while the GET /readyz route is a healthcheck.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create.

required
request_schema Type[BaseModel]

A Pydantic model that defines the request schema for the bundle.

required
response_schema Type[BaseModel]

A Pydantic model that defines the response schema for the bundle.

required
repository str

The name of the Docker repository for the runnable image.

required
tag str

The tag for the runnable image.

required
command List[str]

The command that will be used to start the process that listens for requests.

required
predict_route Optional[str]

The endpoint route on the runnable image that will be called.

None
healthcheck_route Optional[str]

The healthcheck endpoint route on the runnable image.

None
env Dict[str, str]

A dictionary of environment variables that will be passed to the bundle when it is run.

required
readiness_initial_delay_seconds int

The number of seconds to wait for the HTTP server to become ready and successfully respond on its healthcheck.

required
metadata Optional[Dict[str, Any]]

Metadata to record with the bundle.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the created model bundle.
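
A sketch, assuming an image whose command starts an HTTP server on port 5005 serving POST /predict and GET /readyz; the repository, tag, and command are placeholders.

bundle = client.create_model_bundle_from_runnable_image_v2(
    model_bundle_name="my-runnable-bundle",
    request_schema=MyRequest,
    response_schema=MyResponse,
    repository="my-model-server",                           # placeholder repo in your registry
    tag="v1",
    command=["python", "-m", "server", "--port", "5005"],   # hypothetical entrypoint
    env={"LOG_LEVEL": "info"},
    readiness_initial_delay_seconds=30,
)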

create_model_bundle_from_streaming_enhanced_runnable_image_v2

create_model_bundle_from_streaming_enhanced_runnable_image_v2(*, model_bundle_name: str, request_schema: Type[BaseModel], response_schema: Type[BaseModel], repository: str, tag: str, command: Optional[List[str]] = None, healthcheck_route: Optional[str] = None, predict_route: Optional[str] = None, streaming_command: List[str], streaming_predict_route: Optional[str] = None, env: Dict[str, str], readiness_initial_delay_seconds: int, metadata: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Create a model bundle from a runnable image. The specified command must start a process that will listen for requests on port 5005 using HTTP.

Inference requests must be served at the POST /predict route while the GET /readyz route is a healthcheck.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create.

required
request_schema Type[BaseModel]

A Pydantic model that defines the request schema for the bundle.

required
response_schema Type[BaseModel]

A Pydantic model that defines the response schema for the bundle.

required
repository str

The name of the Docker repository for the runnable image.

required
tag str

The tag for the runnable image.

required
command Optional[List[str]]

The command that will be used to start the process that listens for requests if this bundle is used as a SYNC or ASYNC endpoint.

None
healthcheck_route Optional[str]

The healthcheck endpoint route on the runnable image.

None
predict_route Optional[str]

The endpoint route on the runnable image that will be called if this bundle is used as a SYNC or ASYNC endpoint.

None
streaming_command List[str]

The command that will be used to start the process that listens for requests if this bundle is used as a STREAMING endpoint.

required
streaming_predict_route Optional[str]

The endpoint route on the runnable image that will be called if this bundle is used as a STREAMING endpoint.

None
env Dict[str, str]

A dictionary of environment variables that will be passed to the bundle when it is run.

required
readiness_initial_delay_seconds int

The number of seconds to wait for the HTTP server to become ready and successfully respond on its healthcheck.

required
metadata Optional[Dict[str, Any]]

Metadata to record with the bundle.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the created model bundle.

create_model_bundle_from_triton_enhanced_runnable_image_v2

create_model_bundle_from_triton_enhanced_runnable_image_v2(*, model_bundle_name: str, request_schema: Type[BaseModel], response_schema: Type[BaseModel], repository: str, tag: str, command: List[str], healthcheck_route: Optional[str] = None, predict_route: Optional[str] = None, env: Dict[str, str], readiness_initial_delay_seconds: int, triton_model_repository: str, triton_model_replicas: Optional[Dict[str, str]] = None, triton_num_cpu: float, triton_commit_tag: str, triton_storage: Optional[str] = None, triton_memory: Optional[str] = None, triton_readiness_initial_delay_seconds: int, metadata: Optional[Dict[str, Any]] = None) -> CreateModelBundleV2Response

Create a model bundle from a runnable image and a tritonserver image.

Same requirements as create_model_bundle_from_runnable_image_v2, with additional constraints necessary for configuring tritonserver's execution.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to create.

required
request_schema Type[BaseModel]

A Pydantic model that defines the request schema for the bundle.

required
response_schema Type[BaseModel]

A Pydantic model that defines the response schema for the bundle.

required
repository str

The name of the Docker repository for the runnable image.

required
tag str

The tag for the runnable image.

required
command List[str]

The command that will be used to start the process that listens for requests.

required
predict_route Optional[str]

The endpoint route on the runnable image that will be called.

None
healthcheck_route Optional[str]

The healthcheck endpoint route on the runnable image.

None
env Dict[str, str]

A dictionary of environment variables that will be passed to the bundle when it is run.

required
readiness_initial_delay_seconds int

The number of seconds to wait for the HTTP server to become ready and successfully respond on its healthcheck.

required
triton_model_repository str

The S3 prefix that contains the contents of the model repository, formatted according to https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md

required
triton_model_replicas Optional[Dict[str, str]]

If supplied, the name and number of replicas to make for each model.

None
triton_num_cpu float

Number of CPUs, fractional, to allocate to tritonserver.

required
triton_commit_tag str

The image tag of the specific tritonserver version.

required
triton_storage Optional[str]

Amount of storage space to allocate for the tritonserver container.

None
triton_memory Optional[str]

Amount of memory to allocate for the tritonserver container.

None
triton_readiness_initial_delay_seconds int

Like readiness_initial_delay_seconds, but for tritonserver's own healthcheck.

required
metadata Optional[Dict[str, Any]]

Metadata to record with the bundle.

None

Returns:

Type Description
CreateModelBundleV2Response

An object containing the following keys:

  • model_bundle_id: The ID of the created model bundle.

create_model_endpoint

create_model_endpoint(*, endpoint_name: str, model_bundle: Union[ModelBundle, str], cpus: int = 3, memory: str = '8Gi', storage: str = '16Gi', gpus: int = 0, min_workers: int = 1, max_workers: int = 1, per_worker: int = 10, gpu_type: Optional[str] = None, endpoint_type: str = 'sync', high_priority: Optional[bool] = False, post_inference_hooks: Optional[List[PostInferenceHooks]] = None, default_callback_url: Optional[str] = None, default_callback_auth_kind: Optional[Literal['basic', 'mtls']] = None, default_callback_auth_username: Optional[str] = None, default_callback_auth_password: Optional[str] = None, default_callback_auth_cert: Optional[str] = None, default_callback_auth_key: Optional[str] = None, public_inference: Optional[bool] = None, update_if_exists: bool = False, labels: Optional[Dict[str, str]] = None) -> Optional[Endpoint]

Creates and registers a model endpoint in Scale Launch. The returned object is an instance of type Endpoint, which is a base class of either SyncEndpoint or AsyncEndpoint. This is the object to which you send inference requests.

Parameters:

Name Type Description Default
endpoint_name str

The name of the model endpoint you want to create. The name must be unique across all endpoints that you own.

required
model_bundle Union[ModelBundle, str]

The ModelBundle that the endpoint should serve.

required
cpus int

Number of cpus each worker should get, e.g. 1, 2, etc. This must be greater than or equal to 1.

3
memory str

Amount of memory each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of memory.

'8Gi'
storage str

Amount of local ephemeral storage each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of storage.

'16Gi'
gpus int

Number of gpus each worker should get, e.g. 0, 1, etc.

0
min_workers int

The minimum number of workers. Must be greater than or equal to 0. This should be determined by computing the minimum throughput of your workload and dividing it by the throughput of a single worker. This field must be at least 1 for synchronous endpoints.

1
max_workers int

The maximum number of workers. Must be greater than or equal to 0, and greater than or equal to min_workers. This should be determined by computing the maximum throughput of your workload and dividing it by the throughput of a single worker.

1
per_worker int

The maximum number of concurrent requests that an individual worker can service. Launch automatically scales the number of workers for the endpoint so that each worker is processing per_worker requests, subject to the limits defined by min_workers and max_workers.

  • If the average number of concurrent requests per worker is lower than per_worker, then the number of workers will be reduced.
  • Otherwise, if the average number of concurrent requests per worker is higher than per_worker, then the number of workers will be increased to meet the elevated traffic.

Here is our recommendation for computing per_worker:

  1. Compute min_workers and max_workers per your minimum and maximum throughput requirements.
  2. Determine a value for the maximum number of concurrent requests in the workload. Divide this number by max_workers. Doing this ensures that the number of workers will "climb" to max_workers.
10
gpu_type Optional[str]

If specifying a non-zero number of gpus, this controls the type of gpu requested. Here are the supported values:

  • nvidia-tesla-t4
  • nvidia-ampere-a10
  • nvidia-hopper-h100
  • nvidia-hopper-h100-1g20g
  • nvidia-hopper-h100-3g40g
None
endpoint_type str

Either "sync", "async", or "streaming".

'sync'
high_priority Optional[bool]

Either True or False. Enabling this will allow the created endpoint to leverage the shared pool of prewarmed nodes for faster spinup time.

False
post_inference_hooks Optional[List[PostInferenceHooks]]

List of hooks to trigger after inference tasks are served.

None
default_callback_url Optional[str]

The default callback url to use for async endpoints. This can be overridden in the task parameters for each individual task. post_inference_hooks must contain "callback" for the callback to be triggered.

None
default_callback_auth_kind Optional[Literal['basic', 'mtls']]

The default callback auth kind to use for async endpoints. Either "basic" or "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_username Optional[str]

The default callback auth username to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_password Optional[str]

The default callback auth password to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_cert Optional[str]

The default callback auth cert to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_key Optional[str]

The default callback auth key to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
public_inference Optional[bool]

If True, this endpoint will be available to all user IDs for inference.

None
update_if_exists bool

If True, will attempt to update the endpoint if it exists. Otherwise, will unconditionally try to create a new endpoint. Note that endpoint names for a given user must be unique, so attempting to call this function with update_if_exists=False for an existing endpoint will raise an error.

False
labels Optional[Dict[str, str]]

An optional dictionary of key/value pairs to associate with this endpoint.

None

Returns:

Type Description
Optional[Endpoint]

An Endpoint object that can be used to make requests to the endpoint.
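
A deployment sketch, assuming the bundle "my-bundle" created earlier; resource values are illustrative.

endpoint = client.create_model_endpoint(
    endpoint_name="my-endpoint",
    model_bundle="my-bundle",
    cpus=3,
    memory="8Gi",
    storage="16Gi",
    gpus=0,
    min_workers=1,
    max_workers=3,
    per_worker=10,
    endpoint_type="async",
    labels={"team": "demo", "product": "demo"},
)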

delete_file

delete_file(file_id: str) -> DeleteFileResponse

Delete a file

Parameters:

Name Type Description Default
file_id str

ID of the file

required

Returns:

Name Type Description
DeleteFileResponse DeleteFileResponse

whether the deletion was successful

delete_llm_model_endpoint

delete_llm_model_endpoint(model_endpoint_name: str) -> bool

Deletes an LLM model endpoint.

Parameters:

Name Type Description Default
model_endpoint_name str

The name of the model endpoint to delete.

required

delete_model_endpoint

delete_model_endpoint(model_endpoint_name: str)

Deletes a model endpoint.

Parameters:

Name Type Description Default
model_endpoint_name str

The name of the model endpoint to delete.

required

edit_model_endpoint

edit_model_endpoint(*, model_endpoint: Union[ModelEndpoint, str], model_bundle: Optional[Union[ModelBundle, str]] = None, cpus: Optional[float] = None, memory: Optional[str] = None, storage: Optional[str] = None, gpus: Optional[int] = None, min_workers: Optional[int] = None, max_workers: Optional[int] = None, per_worker: Optional[int] = None, gpu_type: Optional[str] = None, high_priority: Optional[bool] = None, post_inference_hooks: Optional[List[PostInferenceHooks]] = None, default_callback_url: Optional[str] = None, default_callback_auth_kind: Optional[Literal['basic', 'mtls']] = None, default_callback_auth_username: Optional[str] = None, default_callback_auth_password: Optional[str] = None, default_callback_auth_cert: Optional[str] = None, default_callback_auth_key: Optional[str] = None, public_inference: Optional[bool] = None) -> None

Edits an existing model endpoint. Here are the fields that cannot be edited on an existing endpoint:

  • The endpoint's name.
  • The endpoint's type (i.e. you cannot go from a SyncEndpoint to an AsyncEndpoint or vice versa).

Parameters:

Name Type Description Default
model_endpoint Union[ModelEndpoint, str]

The model endpoint (or its name) you want to edit. The name must be unique across all endpoints that you own.

required
model_bundle Optional[Union[ModelBundle, str]]

The ModelBundle that the endpoint should serve.

None
cpus Optional[float]

Number of cpus each worker should get, e.g. 1, 2, etc. This must be greater than or equal to 1.

None
memory Optional[str]

Amount of memory each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of memory.

None
storage Optional[str]

Amount of local ephemeral storage each worker should get, e.g. "4Gi", "512Mi", etc. This must be a positive amount of storage.

None
gpus Optional[int]

Number of gpus each worker should get, e.g. 0, 1, etc.

None
min_workers Optional[int]

The minimum number of workers. Must be greater than or equal to 0.

None
max_workers Optional[int]

The maximum number of workers. Must be greater than or equal to 0, and greater than or equal to min_workers.

None
per_worker Optional[int]

The maximum number of concurrent requests that an individual worker can service. Launch automatically scales the number of workers for the endpoint so that each worker is processing per_worker requests:

  • If the average number of concurrent requests per worker is lower than per_worker, then the number of workers will be reduced.
  • Otherwise, if the average number of concurrent requests per worker is higher than per_worker, then the number of workers will be increased to meet the elevated traffic.
None
gpu_type Optional[str]

If specifying a non-zero number of gpus, this controls the type of gpu requested. Here are the supported values:

  • nvidia-tesla-t4
  • nvidia-ampere-a10
  • nvidia-hopper-h100
  • nvidia-hopper-h100-1g20g
  • nvidia-hopper-h100-3g40g
None
high_priority Optional[bool]

Either True or False. Enabling this will allow the created endpoint to leverage the shared pool of prewarmed nodes for faster spinup time.

None
post_inference_hooks Optional[List[PostInferenceHooks]]

List of hooks to trigger after inference tasks are served.

None
default_callback_url Optional[str]

The default callback url to use for async endpoints. This can be overridden in the task parameters for each individual task. post_inference_hooks must contain "callback" for the callback to be triggered.

None
default_callback_auth_kind Optional[Literal['basic', 'mtls']]

The default callback auth kind to use for async endpoints. Either "basic" or "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_username Optional[str]

The default callback auth username to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_password Optional[str]

The default callback auth password to use. This only applies if default_callback_auth_kind is "basic". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_cert Optional[str]

The default callback auth cert to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
default_callback_auth_key Optional[str]

The default callback auth key to use. This only applies if default_callback_auth_kind is "mtls". This can be overridden in the task parameters for each individual task.

None
public_inference Optional[bool]

If True, this endpoint will be available to all user IDs for inference.

None
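
For example, a sketch of scaling the endpoint created above; the values are illustrative.

client.edit_model_endpoint(
    model_endpoint="my-endpoint",
    min_workers=2,
    max_workers=10,
    per_worker=5,
)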

get_batch_async_response

get_batch_async_response(batch_job_id: str) -> Dict[str, Any]

Gets inference results from a previously created batch job.

Parameters:

Name Type Description Default
batch_job_id str

An ID representing the batch job. This ID is in the response from calling batch_async_request.

required

Returns:

Type Description
Dict[str, Any]

A dictionary that contains the following fields:

  • status: The status of the job.
  • result: The url where the result is stored.
  • duration: A string representation of how long the job took to finish or how long it has been running, for a job currently in progress.
  • num_tasks_pending: The number of tasks that are still pending.
  • num_tasks_completed: The number of tasks that have completed.
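
A polling sketch, using the job_id returned by batch_async_request; the status strings checked here are assumptions, so inspect the returned dictionary in your deployment.

import time

while True:
    status = client.get_batch_async_response(job_id)
    if status["status"] not in ("PENDING", "RUNNING"):  # status values assumed
        break
    time.sleep(30)

print(status["result"])  # URL where the batch results are stored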

get_docker_image_batch_job

get_docker_image_batch_job(batch_job_id: str)

For self hosted mode only. Gets information about a batch job given a batch job id.

get_docker_image_batch_job_bundle

get_docker_image_batch_job_bundle(docker_image_batch_job_bundle_id: str) -> DockerImageBatchJobBundleResponse

For self hosted mode only. Gets information for a single batch job bundle with a given id.

get_file

get_file(file_id: str) -> GetFileResponse

Get metadata about a file

Parameters:

Name Type Description Default
file_id str

ID of the file

required

Returns:

Name Type Description
GetFileResponse GetFileResponse

ID, filename, and size of the requested file

get_file_content

get_file_content(file_id: str) -> GetFileContentResponse

Get a file's content

Parameters:

Name Type Description Default
file_id str

ID of the file

required

Returns:

Name Type Description
GetFileContentResponse GetFileContentResponse

ID and content of the requested file

get_fine_tune

get_fine_tune(fine_tune_id: str) -> GetFineTuneResponse

Get status of a fine-tune

Parameters:

Name Type Description Default
fine_tune_id str

ID of the fine-tune

required

Returns:

Name Type Description
GetFineTuneResponse GetFineTuneResponse

ID and status of the requested fine-tune

get_fine_tune_events

get_fine_tune_events(fine_tune_id: str) -> GetFineTuneEventsResponse

Get list of fine-tune events

Parameters:

Name Type Description Default
fine_tune_id str

ID of the fine-tune

required

Returns:

Name Type Description
GetFineTuneEventsResponse GetFineTuneEventsResponse

a list of all the events of the fine-tune

get_latest_docker_image_batch_job_bundle

get_latest_docker_image_batch_job_bundle(bundle_name: str) -> DockerImageBatchJobBundleResponse

For self hosted mode only. Gets information for the latest batch job bundle with a given name.

get_latest_model_bundle_v2

get_latest_model_bundle_v2(model_bundle_name: str) -> ModelBundleV2Response

Get the latest version of a model bundle.

Parameters:

Name Type Description Default
model_bundle_name str

The name of the model bundle you want to get.

required

Returns:

Type Description
ModelBundleV2Response

An object containing the following keys:

  • id: The ID of the model bundle.
  • name: The name of the model bundle.
  • schema_location: The location of the schema for the model bundle.
  • flavor: The flavor of the model bundle. Either RunnableImage, CloudpickleArtifact, ZipArtifact, or TritonEnhancedRunnableImageFlavor.
  • created_at: The time the model bundle was created.
  • metadata: A dictionary of metadata associated with the model bundle.
  • model_artifact_ids: A list of IDs of model artifacts associated with the bundle.

get_llm_model_endpoint

get_llm_model_endpoint(endpoint_name: str) -> Optional[Union[AsyncEndpoint, SyncEndpoint, StreamingEndpoint]]

Gets a model endpoint associated with a name that the user has access to.

Parameters:

Name Type Description Default
endpoint_name str

The name of the endpoint to retrieve.

required

get_model_bundle

get_model_bundle(model_bundle: Union[ModelBundle, str]) -> ModelBundle

Returns a model bundle, specified by the model_bundle argument, that the user owns.

Parameters:

Name Type Description Default
model_bundle Union[ModelBundle, str]

The bundle or its name.

required

Returns:

Type Description
ModelBundle

A ModelBundle object

get_model_bundle_v2

get_model_bundle_v2(model_bundle_id: str) -> ModelBundleV2Response

Get a model bundle.

Parameters:

Name Type Description Default
model_bundle_id str

The ID of the model bundle you want to get.

required

Returns:

Type Description
ModelBundleV2Response

An object containing the following fields:

  • id: The ID of the model bundle.
  • name: The name of the model bundle.
  • flavor: The flavor of the model bundle. Either RunnableImage, CloudpickleArtifact, ZipArtifact, or TritonEnhancedRunnableImageFlavor.
  • created_at: The time the model bundle was created.
  • metadata: A dictionary of metadata associated with the model bundle.
  • model_artifact_ids: A list of IDs of model artifacts associated with the bundle.

get_model_endpoint

get_model_endpoint(endpoint_name: str) -> Optional[Union[AsyncEndpoint, SyncEndpoint]]

Gets a model endpoint associated with a name.

Parameters:

Name Type Description Default
endpoint_name str

The name of the endpoint to retrieve.

required

list_docker_image_batch_job_bundles

list_docker_image_batch_job_bundles(bundle_name: Optional[str] = None, order_by: Optional[Literal['newest', 'oldest']] = None) -> ListDockerImageBatchJobBundleResponse

For self hosted mode only. Gets information for multiple bundles.

Parameters:

Name Type Description Default
bundle_name Optional[str]

The name of the bundles to retrieve. If not specified, this will retrieve all bundles.

None
order_by Optional[Literal['newest', 'oldest']]

Either "newest", "oldest", or not specified. Specify to sort by newest/oldest.

None

list_files

list_files() -> ListFilesResponse

List files

Returns:

Name Type Description
ListFilesResponse ListFilesResponse

list of all files (ID, filename, and size)

list_fine_tunes

list_fine_tunes() -> ListFineTunesResponse

List fine-tunes

Returns:

Name Type Description
ListFineTunesResponse ListFineTunesResponse

list of all fine-tunes and their statuses

list_llm_model_endpoints

list_llm_model_endpoints() -> List[Endpoint]

Lists all LLM model endpoints that the user has access to.

Returns:

Type Description
List[Endpoint]

A list of ModelEndpoint objects.

list_model_bundles

list_model_bundles() -> List[ModelBundle]

Returns a list of model bundles that the user owns.

Returns:

Type Description
List[ModelBundle]

A list of ModelBundle objects

list_model_bundles_v2

list_model_bundles_v2() -> ListModelBundlesV2Response

List all model bundles.

Returns:

Type Description
ListModelBundlesV2Response

An object containing the following keys:

  • model_bundles: A list of model bundles. Each model bundle is an object.

list_model_endpoints

list_model_endpoints() -> List[Endpoint]

Lists all model endpoints that the user owns.

Returns:

Type Description
List[Endpoint]

A list of ModelEndpoint objects.

model_download

model_download(model_name: str, download_format: str = 'hugging_face') -> ModelDownloadResponse

Download a fine-tuned model.

Parameters:

Name Type Description Default
model_name str

name of the model to download

required
download_format str

format of the model to download

'hugging_face'

Returns:

Name Type Description
ModelDownloadResponse ModelDownloadResponse

Dictionary with file names and URLs to download the model.

read_endpoint_creation_logs

read_endpoint_creation_logs(model_endpoint: Union[ModelEndpoint, str])

Retrieves the logs for the creation of the endpoint.

Parameters:

Name Type Description Default
model_endpoint Union[ModelEndpoint, str]

The endpoint or its name.

required

register_batch_csv_location_fn

register_batch_csv_location_fn(batch_csv_location_fn: Callable[[], str])

For self-hosted mode only. Registers a function that gives a location for batch CSV inputs. Should give different locations each time. This function is called as batch_csv_location_fn(), and should return a batch_csv_url that upload_batch_csv_fn can take.

Strictly, batch_csv_location_fn() does not need to return a str. The only requirement is that if batch_csv_location_fn returns a value of type T, then upload_batch_csv_fn() takes in an object of type T as its second argument (i.e. batch_csv_url).

Parameters:

Name Type Description Default
batch_csv_location_fn Callable[[], str]

Function that generates batch_csv_urls for upload_batch_csv_fn.

required

register_bundle_location_fn

register_bundle_location_fn(bundle_location_fn: Callable[[], str])

For self-hosted mode only. Registers a function that gives a location for a model bundle. Should give different locations each time. This function is called as bundle_location_fn(), and should return a bundle_url that register_upload_bundle_fn can take.

Strictly, bundle_location_fn() does not need to return a str. The only requirement is that if bundle_location_fn returns a value of type T, then upload_bundle_fn() takes in an object of type T as its second argument (i.e. bundle_url).

Parameters:

Name Type Description Default
bundle_location_fn Callable[[], str]

Function that generates bundle_urls for upload_bundle_fn.

required

register_upload_batch_csv_fn

register_upload_batch_csv_fn(upload_batch_csv_fn: Callable[[str, str], None])

For self-hosted mode only. Registers a function that handles batch text upload. This function is called as

upload_batch_csv_fn(csv_text, csv_url)

This function should directly write the contents of csv_text as a text string into csv_url.

Parameters:

Name Type Description Default
upload_batch_csv_fn Callable[[str, str], None]

Function that takes in a csv text (string type), and uploads that bundle to an appropriate location. Only needed for self-hosted mode.

required

register_upload_bundle_fn

register_upload_bundle_fn(upload_bundle_fn: Callable[[str, str], None])

For self-hosted mode only. Registers a function that handles model bundle upload. This function is called as

upload_bundle_fn(serialized_bundle, bundle_url)

This function should directly write the contents of serialized_bundle as a binary string into bundle_url.

See register_bundle_location_fn for more notes on the signature of upload_bundle_fn

Parameters:

Name Type Description Default
upload_bundle_fn Callable[[str, str], None]

Function that takes in a serialized bundle (bytes type), and uploads that bundle to an appropriate location. Only needed for self-hosted mode.

required
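
In self-hosted mode the location and upload hooks are typically registered together. A sketch using S3 via boto3; the bucket name and key scheme are placeholders.

import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "my-launch-bundles"  # placeholder bucket

def bundle_location_fn():
    # Return a fresh location on every call.
    return f"s3://{BUCKET}/bundles/{uuid.uuid4()}"

def upload_bundle_fn(serialized_bundle, bundle_url):
    # Write the serialized bundle bytes to the location chosen above.
    key = bundle_url.replace(f"s3://{BUCKET}/", "")
    s3.put_object(Bucket=BUCKET, Key=key, Body=serialized_bundle)

client.register_bundle_location_fn(bundle_location_fn)
client.register_upload_bundle_fn(upload_bundle_fn)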

update_docker_image_batch_job

update_docker_image_batch_job(batch_job_id: str, cancel: bool)

For self hosted mode only. Updates a batch job by id. Use this if you want to cancel/delete a batch job.

upload_file

upload_file(file_path: str) -> UploadFileResponse

Upload a file

Parameters:

Name Type Description Default
file_path str

Path to a local file to upload.

required

Returns:

Name Type Description
UploadFileResponse UploadFileResponse

ID of the created file