Designing private network connectivity for RAG-capable gen AI apps

The flexibility of Google Cloud allows enterprises to build secure and reliable architectures for their AI workloads. In this blog post, we look at a reference architecture for private connectivity for retrieval-augmented generation (RAG)-capable generative AI applications. This architecture is for scenarios where all communication within the system must use private IP addresses and must not traverse the internet.

The power of RAG

RAG is a technique used to optimize the output of large language models (LLMs) by grounding them in specific, authoritative knowledge bases outside of their original training data. RAG allows an application to retrieve relevant information from your documents, data sources, or databases in real time. This retrieved context is then provided to the model alongside the user's query, which improves the quality of responses, reduces hallucinations, and helps ensure that the AI's responses are accurate, verifiable, and relevant to your business.

This approach is helpful because it allows you to direct generative AI to use a designated source of truth, rather than relying solely on the model's pre-existing knowledge, and without needing to retrain or fine-tune the model itself.
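To make the retrieve-then-augment idea concrete, here is a minimal Python sketch. It is an illustration only: the bag-of-words `embed` function is a stand-in for a real embedding model, and `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are hypothetical names that do not come from any Google Cloud API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved context alongside the user's query."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}"
    )

# Hypothetical source of truth for the example.
KNOWLEDGE_BASE = [
    "Refunds are processed within 14 days of the return request.",
    "Support is available on weekdays from 9:00 to 17:00.",
]

query = "How long do refunds take?"
prompt = build_prompt(query, retrieve(query, KNOWLEDGE_BASE))
```

The model itself is never retrained here; only the prompt changes, which is the property the paragraph above describes.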

Design pattern example

To see how to set up your network for private connectivity for a RAG application in a regional design, let's look at the design pattern.

The setup comprises an external network (on-prem and other clouds) and Google Cloud environments consisting of a routing project, a Shared VPC host project for RAG, and three specialized service projects: data ingestion, serving, and frontend.

This design utilizes the following services to provide an end-to-end solution:

* Cloud Interconnect or Cloud VPN: Securely connects your on-premises environment or other clouds to the routing VPC network

* Network Connectivity Center: Used as an orchestration framework to manage connectivity between the routing VPC network and the RAG VPC network via VPC spokes and hybrid spokes

* Cloud Router: In the routing project, facilitates dynamic BGP route exchange between the external network and Google Cloud

* Private Service Connect: Provides a private endpoint in the routing VPC network to reach the Cloud Storage bucket for data ingestion without traversing the public internet

* Shared VPC: Host project architecture that allows multiple service projects to use a common, centralized VPC network

* Google Cloud Armor and Application Load Balancer: Placed in the frontend service project to provide security and traffic management for user interaction

* VPC Service Controls: Creates a managed security perimeter around all resources to mitigate data exfiltration risks
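One way to sanity-check the "private IP addresses only" requirement is to audit the addresses your components are configured to use. The sketch below is a simplified illustration in Python: it assumes "private" means the RFC 1918 ranges, and the endpoint names and addresses in `ENDPOINTS` are made up for the example rather than taken from the reference architecture.

```python
import ipaddress

# RFC 1918 private IPv4 ranges (assumption: these define "private" for this check).
RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def is_rfc1918(address: str) -> bool:
    """True if the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918)

def public_endpoints(endpoints: dict[str, str]) -> list[str]:
    """Names of endpoints whose address is not RFC 1918 private, sorted."""
    return sorted(name for name, addr in endpoints.items() if not is_rfc1918(addr))

# Hypothetical inventory; names and addresses are illustrative only.
ENDPOINTS = {
    "psc-storage-endpoint": "10.10.0.5",
    "frontend-load-balancer": "10.20.0.10",
    "rag-datastore": "10.30.0.7",
    "misconfigured-endpoint": "203.0.113.10",
}

violations = public_endpoints(ENDPOINTS)
```

A check like this complements the network design rather than replacing it; it only flags configuration that contradicts the intent.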

The traffic flow

RAG population flow

In the diagram, the green dashed line shows the RAG population flow, which describes how data travels from data engineers to vector storage.

* From the external network, data travels over Cloud Interconnect or Cloud VPN.

* In the routing project, it uses the Private Service Connect endpoint to reach the Cloud Storage bucket.

* From the Cloud Storage bucket in the data ingestion service project, the data ingestion subsystem processes the raw data into chunks.

* The AI model creates vectors from the chunks and returns them to the data ingestion subsystem, which writes them to the RAG datastore in the serving service project.
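The chunk, embed, and write steps of the population flow can be sketched in a few lines of Python. This is an assumption-laden toy: `embed_chunk` derives numbers from a hash instead of calling an AI model, the in-memory dictionary stands in for the RAG datastore, and all function names are hypothetical.

```python
import hashlib

def chunk(text: str, max_words: int = 8) -> list[str]:
    """Split raw text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed_chunk(chunk_text: str, dims: int = 4) -> list[float]:
    """Placeholder vector: deterministic values from a hash, not a semantic embedding."""
    digest = hashlib.sha256(chunk_text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dims]]

def ingest(raw_document: str, datastore: dict[str, list[float]]) -> int:
    """Chunk the document, embed each chunk, and write vectors to the datastore."""
    chunks = chunk(raw_document)
    for piece in chunks:
        datastore[piece] = embed_chunk(piece)
    return len(chunks)

# In-memory stand-in for the RAG datastore in the serving service project.
rag_datastore: dict[str, list[float]] = {}
written = ingest("one two three four five six seven eight nine ten", rag_datastore)
```

In the architecture described here, each of these steps runs in a different place (Cloud Storage, the data ingestion subsystem, the AI model, and the datastore), so the function boundaries roughly mirror the network hops.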

Inference flow

In the diagram, the orange dashed line shows the inference flow, which describes customer or user requests.

* The request travels over Cloud Interconnect or Cloud VPN to the routing VPC network and then over the VPC spoke to the RAG VPC network.

* The request reaches the Application Load Balancer, which is protected by Cloud Armor; once the request is allowed, the load balancer passes it to the frontend subsystem.

* The frontend subsystem forwards the request to the serving subsystem, which augments the prompt with data from the RAG datastore and generates a response via the AI model.

* The grounded response is returned along the same path to the requestor.

Management and routing

In the diagram, the blue dotted lines represent the Network Connectivity Center hybrid and VPC spokes that manage the control plane and route orchestration between the routing network and the RAG VPC network. This ensures that routes learned from the external network are appropriately propagated across the environment.
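As a rough mental model of route propagation through a hub, consider the Python sketch below. It is a deliberately simplified full-mesh model, not the Network Connectivity Center API: real behavior depends on hub topology and export filters, and the spoke names and prefixes are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Spoke:
    """A spoke attached to the hub, with the prefixes it advertises and learns."""
    name: str
    local_prefixes: set[str]
    learned_prefixes: set[str] = field(default_factory=set)

def propagate(spokes: list[Spoke]) -> None:
    """Give every spoke the prefixes advertised by all other spokes on the hub."""
    for spoke in spokes:
        spoke.learned_prefixes = set().union(
            *(other.local_prefixes for other in spokes if other is not spoke)
        )

# Hypothetical spokes: the hybrid spoke carries routes learned from the
# external network via BGP; the VPC spoke carries the RAG VPC subnets.
hybrid_spoke = Spoke("hybrid-spoke", {"192.168.0.0/16"})
rag_vpc_spoke = Spoke("rag-vpc-spoke", {"10.20.0.0/16"})

propagate([hybrid_spoke, rag_vpc_spoke])
```

After `propagate` runs, the RAG VPC spoke knows the external prefix and the hybrid spoke knows the RAG VPC prefix, which is the outcome the paragraph above describes.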

Please read the entire architecture document, Private connectivity for RAG-capable generative AI applications, to understand the specifics, including IAM permissions, VPC Service Controls, and deployment considerations.

Next steps

Take a deeper dive into the Cross-Cloud Network, and other guides about generative AI with RAG:

* Document set: Generative AI with RAG

* Document: Cross-Cloud Network for distributed applications

* Blog: Build Your First ADK Agent Workforce

Want to ask a question, find out more, or share a thought? Please connect with me on LinkedIn.
