Value Proposition¶
An Enterprise-Ready Platform¶
SGP is not just an API; it is a full-stack platform designed to span multiple business units and teams across a large enterprise. Here is how we support large-scale enterprises.
Org-Level Permissioning¶
Most companies have multiple teams that may be largely unrelated; Amazon, for example, houses both Alexa and Robotics. Each of these business units may want to leverage generative AI, and companies like this are looking to purchase a single platform that all teams can use.
Because these teams own proprietary data that cannot be shared across teams, resources such as knowledge bases and fine-tuned models must be restricted to specific groups of users. Within a user group, resources should be shareable, so that additional users can access the resources meant for them. Users within a group should also carry different permission levels (e.g. read, write).
SGP supports this hierarchical permissioning model by default.
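The hierarchical model above can be illustrated with a small sketch. The names here (`Resource`, `User`, `can_access`) are illustrative assumptions, not the SGP API: resources are owned by a group, grants are made per group, and a user's own permission level applies within the group.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of hierarchical permissioning; these classes and
# the `can_access` check are illustrative, not SGP's actual interface.

@dataclass
class Resource:
    name: str
    owner_group: str
    # per-group grants: group name -> set of permissions ("read", "write")
    grants: dict = field(default_factory=dict)

@dataclass
class User:
    name: str
    group: str
    level: str = "read"  # the user's permission level within their group

def can_access(user: User, resource: Resource, action: str) -> bool:
    """A resource is only visible to groups it was explicitly granted to."""
    perms = resource.grants.get(user.group, set())
    if action not in perms:
        return False
    # Within a granted group, the user's own level still gates writes.
    if action == "write" and user.level != "write":
        return False
    return True
```

Under this sketch, a knowledge base owned by one business unit is simply never granted to another group, so its users cannot see it at all.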
In 2024, SGP will adopt an even more expressive role-based access control model by adopting Google's Zanzibar design.
Enterprise Data Volumes¶
API capability is one thing; full-stack support for enterprise data volumes is another.
Here is an example of how SGP goes deeper than just offering a simple API:
On the surface, retrieval looks like a fairly straightforward process:
- Extract text from a document
- Split a document into chunks
- Embed each chunk into a vector
- Ingest the vector into a vector database
- Query the vector database
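The five steps above can be sketched end to end in a few lines. This is a toy illustration only: the hashed bag-of-words embedding and the in-memory list standing in for a vector database are placeholders for a real embedding model and vector store, and none of the function names are SGP APIs.

```python
import hashlib
import math
from collections import Counter

def extract_text(document: bytes) -> str:
    return document.decode("utf-8")                       # 1. extract text

def chunk(text: str, size: int = 40) -> list:
    return [text[i:i + size] for i in range(0, len(text), size)]  # 2. split

def embed(text: str, dim: int = 64) -> list:
    # 3. embed: hashed bag-of-words, a stand-in for a real embedding model
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[idx] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

index = []  # 4. the "vector database" is just an in-memory list here

def ingest(document: bytes) -> None:
    for c in chunk(extract_text(document)):
        index.append((embed(c), c))

def query(text: str, k: int = 2) -> list:
    qv = embed(text)                                      # 5. query by similarity
    scored = [(sum(a * b for a, b in zip(qv, v)), c) for v, c in index]
    return [c for _, c in sorted(scored, reverse=True)[:k]]
```

Each step is trivial at this scale; the rest of this section is about what breaks when the corpus grows to millions of documents.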
This works for a small number of documents, but what happens if you have hundreds of thousands to millions of documents?
Here is how SGP is designed to make RAG enterprise ready:
- Vector Index Creation and Management
- Optimize shard density and size for performance
- Automatically create new indexes when optimal index sizes are exceeded
- Multiple Data Source Integrations
- Supports Google Drive, S3, SharePoint, direct JSON upload, and more.
- Smart File-Diff Uploads
- Delete artifacts deleted from source
- Re-index artifacts modified at source
- Do nothing for artifacts unchanged at source
- Worker parallelization
- Scale ingestion horizontally to maximize throughput
- SGP can ingest documents at roughly 500 MB/hour using fewer than 100 worker nodes. This throughput can easily be increased by adding nodes for both the ingestion workers and the embedding model, or by optimizing the hardware the embedding model is hosted on.
- Autoscaling
- Autoscale ingestion workers to lower costs during dormancy and burst for spiky workloads
- Text extraction
- Automatically extract text from non-plain-text documents, e.g. DOCX, PPTX, PDF
- Chunking
- Select from a list of SGP supported chunking strategies to automatically split data into chunks during ingestion
- Easily swap out chunking strategies just by varying a small API request payload
- Embedding
- Automatically embed each chunk of text into a vector for storage in the vector DB
- Create knowledge bases with different embedding models to test how different embedding models affect retrieval performance
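The smart file-diff rules listed above reduce to a simple comparison between the data source and what has already been indexed. The sketch below is a hedged illustration of that logic, not SGP's implementation; the function and action names are assumptions.

```python
import hashlib

def checksum(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def plan_sync(source: dict, indexed: dict) -> dict:
    """Decide what to do for each artifact on the next ingestion run.

    source:  path -> content bytes currently at the data source
    indexed: path -> checksum recorded at the last ingestion
    Returns one action per path: "delete", "reindex", "ingest", or "skip".
    """
    actions = {}
    for path in indexed:
        if path not in source:
            actions[path] = "delete"    # artifact deleted at source
    for path, content in source.items():
        if path not in indexed:
            actions[path] = "ingest"    # new artifact at source
        elif checksum(content) != indexed[path]:
            actions[path] = "reindex"   # artifact modified at source
        else:
            actions[path] = "skip"      # unchanged: do nothing
    return actions
```

Because unchanged artifacts resolve to "skip", repeated syncs against a mostly static corpus cost almost nothing, which is what makes frequent re-syncs practical at enterprise volumes.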
Customer VPC Deployments¶
Large enterprises are understandably concerned about data privacy. Data must never leave the tenancy of their private cloud. To support this, SGP is built to be shipped into customer VPCs out of the box.
Currently, SGP is ready to deploy in AWS environments. By the end of Q1 2024, SGP will be ready to deploy in Azure environments. We will continue to expand to other cloud providers as needed.
SGP services are split between a control plane and a data plane.
The control plane is a Scale-managed service that records installation metadata, like software versions and licenses. Additionally, when a software version upgrade is requested, the control plane is responsible for providing the upgrade information to the customer data plane. The control plane does not execute any application logic or have access to any customer application data.
The data plane runs on the customer’s cloud infrastructure, which Scale AI does not require access to. The data plane runs all SGP application services, including the logic for LLM inference, embedding, and retrieval.
For more information regarding this VPC deployment, please reach out to a Scale representative for a full breakdown of our standard engagement.