Endpoint Predictions

Once endpoints have been created, you can send tasks to them to make predictions. The following snippets show how to send prediction tasks to async, sync, and streaming endpoints.

Async endpoints return a future; call its get method to block until the result is ready:

import os
from launch import EndpointRequest, LaunchClient

client = LaunchClient(api_key=os.getenv("LAUNCH_API_KEY"))
endpoint = client.get_model_endpoint("demo-endpoint-async")

# predict() on an async endpoint returns an EndpointResponseFuture
future = endpoint.predict(request=EndpointRequest(args={"x": 2, "y": "hello"}))
response = future.get()  # blocks until the EndpointResponse is ready
print(response)

Sync endpoints return the EndpointResponse directly:

import os
from launch import EndpointRequest, LaunchClient

client = LaunchClient(api_key=os.getenv("LAUNCH_API_KEY"))
endpoint = client.get_model_endpoint("demo-endpoint-sync")
response = endpoint.predict(request=EndpointRequest(args={"x": 2, "y": "hello"}))
print(response)

Streaming endpoints return an iterable EndpointResponseStream:

import os
from launch import EndpointRequest, LaunchClient

client = LaunchClient(api_key=os.getenv("LAUNCH_API_KEY"))
endpoint = client.get_model_endpoint("demo-endpoint-streaming")
response = endpoint.predict(request=EndpointRequest(args={"x": 2, "y": "hello"}))
for chunk in response:
    print(chunk)

EndpointRequest

EndpointRequest(url: Optional[str] = None, args: Optional[Dict] = None, callback_url: Optional[str] = None, callback_auth_kind: Optional[Literal['basic', 'mtls']] = None, callback_auth_username: Optional[str] = None, callback_auth_password: Optional[str] = None, callback_auth_cert: Optional[str] = None, callback_auth_key: Optional[str] = None, return_pickled: Optional[bool] = False, request_id: Optional[str] = None)

Represents a single request to either a SyncEndpoint, StreamingEndpoint, or AsyncEndpoint.

Parameters:

- url (Optional[str], default None): A URL to a file that can be read into a ModelBundle's predict function. Can be an image, raw text, etc. Note: the contents of the file located at url are opened as a sequence of bytes and passed to the predict function. If you instead want to pass the URL itself as an input to the predict function, use args. Exactly one of url and args must be specified.
- args (Optional[Dict], default None): A dictionary of arguments to a ModelBundle's predict function. If the predict function has signature predict_fn(foo, bar), then the keys in the dictionary should be "foo" and "bar". Values must be native Python objects. Exactly one of url and args must be specified.
- return_pickled (Optional[bool], default False): Whether the output should be a pickled Python object, or serialized JSON returned directly.
- callback_url (Optional[str], default None): The callback URL to use for this task. If None, the default_callback_url of the endpoint is used. The endpoint must specify "callback" as a post-inference hook for the callback to be triggered.
- callback_auth_kind (Optional[Literal['basic', 'mtls']], default None): The kind of authentication to use for this task's callback, either "basic" or "mtls". Overrides the endpoint's default for this task.
- callback_auth_username (Optional[str], default None): The callback auth username to use. Only applies if callback_auth_kind is "basic".
- callback_auth_password (Optional[str], default None): The callback auth password to use. Only applies if callback_auth_kind is "basic".
- callback_auth_cert (Optional[str], default None): The callback auth cert to use. Only applies if callback_auth_kind is "mtls".
- callback_auth_key (Optional[str], default None): The callback auth key to use. Only applies if callback_auth_kind is "mtls".
- request_id (Optional[str], default None): (deprecated) A user-specifiable id for requests. Should be unique among EndpointRequests made in the same batch call. If one isn't provided, the client will generate its own.
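The "exactly one of url and args" rule above can be pictured as a simple mutual-exclusivity check. The helper below is a hypothetical sketch, not part of the launch SDK:

```python
# Hypothetical helper (not part of the launch SDK) illustrating the
# "exactly one of url and args must be specified" rule from the
# EndpointRequest parameter documentation.
def check_request_inputs(url=None, args=None):
    # Both None or both set violates the rule.
    if (url is None) == (args is None):
        raise ValueError("Exactly one of url and args must be specified")
```

In practice this means a request built from a dictionary of predict arguments must leave url unset, and vice versa.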

EndpointResponseFuture

EndpointResponseFuture(client, endpoint_name: str, async_task_id: str)

Represents a future response from an Endpoint. Specifically, when the EndpointResponseFuture is ready, then its get method will return an actual instance of EndpointResponse.

This object should not be directly instantiated by the user.

Parameters:

- client (required): An instance of LaunchClient.
- endpoint_name (str, required): The name of the endpoint.
- async_task_id (str, required): An async task id.

get

get(timeout: Optional[float] = None) -> EndpointResponse

Retrieves the EndpointResponse for the prediction request after it completes. This method blocks.

Parameters:

- timeout (Optional[float], default None): The maximum number of seconds to wait for the response. If None, the method blocks indefinitely until the response is ready.
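The blocking behavior of get can be pictured as a poll loop that returns as soon as a result is available and gives up once the deadline passes. This is an illustrative sketch of those semantics, not the SDK's actual implementation; poll_fn and interval are assumed names introduced here:

```python
import time

# Illustrative sketch (not the SDK's implementation) of get()'s blocking
# semantics: poll until a result is available or the timeout expires.
def blocking_get(poll_fn, timeout=None, interval=0.01):
    # timeout=None means block indefinitely, matching get()'s documented behavior
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        result = poll_fn()
        if result is not None:
            return result
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("response not ready within timeout")
        time.sleep(interval)
```

With timeout=None the loop never gives up, which is why callers who need bounded latency should always pass an explicit timeout.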

EndpointResponse

EndpointResponse(client, status: str, result_url: Optional[str] = None, result: Optional[str] = None, traceback: Optional[str] = None)

Represents a response received from an Endpoint.

Parameters:

- client (required): An instance of LaunchClient.
- status (str, required): A string representing the status of the request: SUCCESS, FAILURE, or PENDING.
- result_url (Optional[str], default None): A URL pointing to the pickled Python object returned by the Endpoint's predict function. Exactly one of result_url or result will be populated, depending on the value of return_pickled in the request.
- result (Optional[str], default None): The JSON-serialized return value of the Endpoint's predict function; call json.loads() on the value of result to get the original Python object back. Exactly one of result_url or result will be populated, depending on the value of return_pickled in the request.
- traceback (Optional[str], default None): The stack trace if the inference endpoint raised an error. Can be used for debugging.
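A caller consuming these fields typically branches on status and then on which of result or result_url is populated. The helper below is a hypothetical sketch based on the field semantics documented above, not an SDK function:

```python
import json

# Hypothetical helper (not an SDK function) showing how the fields of an
# EndpointResponse are typically consumed, per the parameter docs above.
def extract_output(status, result=None, result_url=None):
    if status != "SUCCESS":
        raise RuntimeError(f"prediction ended with status {status}")
    if result is not None:
        # result holds the JSON-serialized return value of predict
        return json.loads(result)
    # otherwise result_url points at a pickled object for the caller to download
    return result_url
```

Whether result or result_url is populated follows directly from the return_pickled flag set on the originating EndpointRequest.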

EndpointResponseStream

EndpointResponseStream(response)

Bases: Iterator

Represents a stream response from an Endpoint. This object is iterable and yields EndpointResponse objects.

This object should not be directly instantiated by the user.

__iter__

__iter__()

Uses server-sent events to iterate through the stream.

__next__

__next__()

Uses server-sent events to iterate through the stream.