Overview

Creating deployments on Launch generally involves three steps:

1. Create and upload a ModelBundle. Pass your trained model, together with its pre- and post-processing code, to the Scale Launch Python client. We’ll create a model bundle from that code and store it in our Bundle Store.
2. Create a ModelEndpoint. Pass a ModelBundle, along with infrastructure settings such as the desired number of GPUs, to our client. This provisions resources on Scale’s cluster that are dedicated to your ModelEndpoint.
3. Make requests to the ModelEndpoint. You can send requests through the Python client or make HTTP requests directly to Scale.
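To make the flow of these three steps concrete, here is a minimal in-memory sketch. It does not use the Scale Launch client: `ToyClient`, its methods, and the `ModelBundle` and `ModelEndpoint` dataclasses are hypothetical stand-ins written for this illustration, and the real client's method names and parameters should be taken from its own reference.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ModelBundle:
    """Hypothetical stand-in: a trained model plus pre-/post-processing code."""
    name: str
    model: Callable[[Any], Any]
    preprocess: Callable[[Any], Any]
    postprocess: Callable[[Any], Any]


@dataclass
class ModelEndpoint:
    """Hypothetical stand-in: a bundle plus infrastructure settings."""
    name: str
    bundle: ModelBundle
    gpus: int

    def predict(self, payload: Any) -> Any:
        # Run the bundle's pipeline: preprocess -> model -> postprocess.
        features = self.bundle.preprocess(payload)
        return self.bundle.postprocess(self.bundle.model(features))


class ToyClient:
    """In-memory mock of the workflow; NOT the Scale Launch client."""

    def __init__(self) -> None:
        self.bundle_store: Dict[str, ModelBundle] = {}
        self.endpoints: Dict[str, ModelEndpoint] = {}

    def create_model_bundle(self, name, model, preprocess, postprocess):
        # Step 1: package the model with its code and store the bundle.
        bundle = ModelBundle(name, model, preprocess, postprocess)
        self.bundle_store[name] = bundle
        return bundle

    def create_model_endpoint(self, name, bundle, gpus=0):
        # Step 2: attach infrastructure settings to a stored bundle.
        if bundle.name not in self.bundle_store:
            raise ValueError(f"unknown bundle: {bundle.name}")
        endpoint = ModelEndpoint(name, bundle, gpus)
        self.endpoints[name] = endpoint
        return endpoint


client = ToyClient()

# Step 1: create and "upload" a bundle around a trivial model.
bundle = client.create_model_bundle(
    "doubler",
    model=lambda x: 2 * x,
    preprocess=float,
    postprocess=lambda y: {"result": y},
)

# Step 2: create an endpoint for the bundle (no GPUs in this toy case).
endpoint = client.create_model_endpoint("doubler-endpoint", bundle, gpus=0)

# Step 3: make a request to the endpoint.
response = endpoint.predict("21")
print(response)
```

In this sketch the endpoint runs in the same process. On Launch, step 2 instead provisions dedicated resources on Scale's cluster, and step 3 goes over the network through the Python client or HTTP.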