Custom docker images¶
**Warning:** This feature is currently in beta, and the API is likely to change. Please contact us if you are interested in using this feature.
If you need more customization than what cloudpickle or zip artifacts can offer, or if you already have a pre-built docker image, then you can create a Model Bundle from that docker image. You will need to modify your image to run a web server that exposes HTTP port 5005.
In our example below, we assume that you have some existing Python function `my_inference_fn` that can be imported. If you need to invoke some other binary (e.g. a custom C++ binary), then you can shell out to the OS to call that binary; subsequent versions of this document will have native examples for non-Python binaries.
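For instance, an inference function that wraps a non-Python binary might shell out with `subprocess`; a minimal sketch (the binary path `/usr/local/bin/my_cpp_binary` is hypothetical, so this snippet uses `echo` to stay runnable):

```python
import subprocess


def my_inference_fn(url: str) -> str:
    # Shell out to an external binary and capture its stdout.
    # Replace ["echo", url] with your own invocation, e.g.
    # ["/usr/local/bin/my_cpp_binary", url] (hypothetical path).
    completed = subprocess.run(
        ["echo", url],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the binary exits nonzero
    )
    return completed.stdout.strip()


print(my_inference_fn("https://example.com"))  # https://example.com
```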
For the web server, we recommend FastAPI for its speed and ergonomics. Any web server will work, but our examples use FastAPI.
Step 1: Install Requirements¶
You can add `fastapi` and `uvicorn` to the `requirements.txt` file that gets installed as part of your Dockerfile. Alternatively, you can add `pip install fastapi uvicorn` to the Dockerfile directly.
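A sketch of what the relevant Dockerfile lines might look like (the base image, workdir, and file names here are illustrative, not requirements):

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install the web server alongside your existing dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt fastapi uvicorn

COPY . .

# The command configured in Step 4 starts uvicorn on port 5005
EXPOSE 5005
```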
Step 2: Set up a web server application¶
Inside your project workspace, create a `server.py` file with these contents:
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class MyRequestSchema(BaseModel):
    url: str


class MyResponseSchema(BaseModel):
    response: str


def my_inference_fn(req: MyRequestSchema) -> MyResponseSchema:
    # This is an example inference function - you can instead import a function
    # from your own codebase, or shell out to the OS, etc.
    resp = req.url + "_hello"
    return MyResponseSchema(response=resp)


@app.post("/predict")
async def predict(request: MyRequestSchema) -> MyResponseSchema:
    response = my_inference_fn(request)
    return response


@app.get("/readyz")
def readyz():
    return "ok"
```
Step 3: Rebuild and push your image¶
Build your updated Dockerfile and push the image to a location that is accessible by Scale. For instance, if you are using AWS ECR, please make sure that the necessary cross-account permissions allow Scale to pull your docker image.
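For AWS ECR, the build-and-push flow typically looks like the following sketch (the account ID, region, repository name, and tag are placeholders; substitute your own):

```shell
# Authenticate Docker to your ECR registry (placeholder account/region).
aws ecr get-login-password --region us-west-2 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com

# Build, tag, and push the image.
docker build -t my-model-server .
docker tag my-model-server:latest 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-model-server:v1
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-model-server:v1
```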
Step 4: Deploy!¶
Now you can upload your docker image as a Model Bundle, and then create a Model Endpoint referencing that Model Bundle. Note that `path.to.your.server.file:app` in the `command` section below should be relative to the `WORKDIR` of your docker image.
```python
import os

from launch import LaunchClient
from server import MyRequestSchema, MyResponseSchema  # Defined as part of your server.py

client = LaunchClient(api_key=os.getenv("LAUNCH_API_KEY"))

model_bundle_name = "my_bundle_name"

client.create_model_bundle_from_runnable_image_v2(
    model_bundle_name=model_bundle_name,
    request_schema=MyRequestSchema,
    response_schema=MyResponseSchema,
    repository="$YOUR_ECR_REPO",
    tag="$YOUR_IMAGE_TAG",
    command=[
        "dumb-init",
        "--",
        "uvicorn",
        "path.to.your.server.file:app",
        "--port",
        "5005",
        "--host",
        "::",
    ],
    predict_route="/predict",
    healthcheck_route="/readyz",
    readiness_initial_delay_seconds=120,
    env={},
)

client.create_model_endpoint(
    endpoint_name=f"endpoint-{model_bundle_name}",
    model_bundle=model_bundle_name,
    endpoint_type="async",
    min_workers=0,
    max_workers=1,
    per_worker=1,
    memory="30Gi",
    storage="40Gi",
    cpus=4,  # This must be at least 2 because forwarding services consume 1 cpu.
    gpus=1,
    gpu_type="nvidia-ampere-a10",
    update_if_exists=True,
)
```
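Once the endpoint is live, you can send it a request. The sketch below assumes the Launch client's `get_model_endpoint` / `EndpointRequest` interface and a valid API key; it is illustrative, not a definitive usage guide:

```python
import os

from launch import EndpointRequest, LaunchClient

client = LaunchClient(api_key=os.getenv("LAUNCH_API_KEY"))

# Look up the async endpoint created above and submit one request.
endpoint = client.get_model_endpoint("endpoint-my_bundle_name")
future = endpoint.predict(
    request=EndpointRequest(args={"url": "https://example.com"}, return_pickled=False)
)
response = future.get()  # blocks until the async task completes
print(response)
```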