Deploying and Monitoring Your Production ADK Agent on Google Cloud

This chapter marks a critical transition: moving your sophisticated, context-aware ADK agent from a local development environment to a production-grade cloud platform. We’ll focus on deploying the containerized agent built in the previous chapter to Google Cloud Run, a fully managed serverless platform. Beyond deployment, we’ll establish essential operational capabilities, including secure secret management, robust logging, and foundational monitoring.

By the end of this chapter, you will have a live, accessible ADK agent running on Google Cloud, capable of persisting its state and conversational context, ready to serve users reliably. This milestone is about making your agent resilient, scalable, and observable in a real-world environment.

Project Overview: From Prototype to Production

In previous chapters, you designed and built an ADK agent capable of maintaining long-term conversational context by integrating with Google Cloud Firestore. You also containerized this agent using Docker. This chapter’s objective is to take that robust, containerized agent and deploy it to a production-ready environment on Google Cloud. We’ll prioritize automation, security, and observability to ensure the agent is not just functional but also maintainable and reliable in the wild.

Tech Stack for Production Deployment

To achieve a production-grade deployment, we’ll leverage several key Google Cloud services and Python libraries:

Python (3.12): The core language for our agent and FastAPI application.
Google ADK (version unknown, checked 2026-05-23): The Agent Development Kit for building the core agent logic.
FastAPI (0.111.0): A modern, fast (high-performance) web framework for Python, used to expose our agent via an HTTP API.
Uvicorn (0.30.1): An ASGI server that runs the FastAPI application.
Docker: For containerizing our agent, providing a portable and consistent deployment unit.
Google Cloud Run: A fully managed serverless platform for deploying containerized applications. It handles infrastructure, scaling, and provides a public endpoint.
Google Cloud Firestore: Our chosen NoSQL database for persisting agent state and conversational context across sessions.
Google Secret Manager: A service for securely storing and managing sensitive configuration data (e.g., API keys).
Google Cloud Logging: Centralized logging for all application and infrastructure logs, automatically integrated with Cloud Run.
Google Cloud IAM (Identity and Access Management): For managing permissions and ensuring the principle of least privilege for our deployed services.
Google Artifact Registry: A universal package manager for storing Docker images and other build artifacts.

Build Plan: Deploying Your Agent to the Cloud

Our deployment process is broken down into a series of logical steps, ensuring each component is correctly configured and secured.

Enable Google Cloud APIs: Activate necessary services in your project.
Create a Dedicated Service Account: Establish a runtime identity for your agent with precise permissions.
Refine Dockerfile for Cloud Run: Optimize the container image for a serverless environment.
Build and Push Docker Image: Store your container in Google Artifact Registry.
Secure Sensitive Configuration: Manage any external API keys or secrets using Secret Manager.
Deploy to Cloud Run: Launch your containerized agent as a managed service.
Verify and Monitor: Test the deployed agent and inspect its logs and metrics.

Architecture: Production Deployment Strategy

Deploying an AI agent to production requires careful consideration of scalability, security, and observability. Google Cloud Run offers an excellent balance of ease of deployment, auto-scaling, and integration with other Google Cloud services, making it an ideal choice for our ADK agent.

Our deployment strategy centers on Cloud Run hosting the ADK agent. The agent will interact with Firestore for state persistence and potentially other external tools. Critical configurations like API keys will be managed securely using Google Secret Manager. All agent activity, including requests, responses, and internal processing, will be automatically captured by Google Cloud Logging.

Here’s a high-level view of the architecture:

flowchart LR
    User --> Cloud_Run[Cloud Run Service]
    Cloud_Run --> ADK_Agent[ADK Agent Container]
    ADK_Agent --> Firestore[Firestore Database]
    ADK_Agent --> Secret_Manager[Secret Manager]
    ADK_Agent --> Cloud_Logging[Cloud Logging]
    Firestore -->|Persist State| ADK_Agent
    Secret_Manager -->|Provide Secrets| ADK_Agent
    ADK_Agent -->|Emit Logs| Cloud_Logging
    subgraph GCS["Google Cloud"]
        Cloud_Run
        Firestore
        Secret_Manager
        Cloud_Logging
    end

Why Cloud Run?

Serverless: No infrastructure to provision or manage. You only pay for the compute resources consumed.
Auto-scaling: Scales automatically from zero instances to handle peak loads, then scales back down, optimizing cost efficiency.
Container-based: Deploy any application packaged as a Docker container, making it highly portable and environment-agnostic.
Integrated: Seamlessly connects with other Google Cloud services like Firestore, Secret Manager, and Cloud Logging out of the box.

Step-by-Step Implementation

Before we begin, ensure you have the Google Cloud SDK installed and authenticated. The gcloud commands below assume you’ve already logged in and set your project.

gcloud auth login
gcloud config set project YOUR_GOOGLE_CLOUD_PROJECT_ID

1. Enable Required Google Cloud APIs

First, enable the APIs this chapter relies on: Cloud Run, Artifact Registry, Secret Manager, Cloud Build, and Cloud Logging. Firestore is assumed to be enabled already, since the agent has been using it since the earlier chapters.

gcloud services enable run.googleapis.com \
    artifactregistry.googleapis.com \
    secretmanager.googleapis.com \
    cloudbuild.googleapis.com \
    logging.googleapis.com

2. Create a Dedicated Service Account

It’s best practice to run your Cloud Run service under a dedicated Google Cloud service account that follows the principle of least privilege. This service account is the agent’s runtime identity; it is not the identity that performs the deployment.

File: 08_deployment_setup.sh

Create this script in your project root and make it executable.

#!/bin/bash

# Get the current project ID
PROJECT_ID=$(gcloud config get-value project)
SERVICE_ACCOUNT_NAME="adk-agent-runner"
SERVICE_ACCOUNT_EMAIL="${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

echo "Creating service account: ${SERVICE_ACCOUNT_EMAIL}"
gcloud iam service-accounts create "${SERVICE_ACCOUNT_NAME}" \
    --display-name "ADK Agent Cloud Run Service Account"

echo "Granting permissions to service account for runtime operations..."

# Grant the Cloud Datastore User role (roles/datastore.user) for state persistence
# Firestore uses datastore roles for data access. This role allows read, write, and delete.
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member "serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
    --role "roles/datastore.user"

# Grant Secret Manager Secret Accessor role to retrieve secrets
# This role allows the service account to access the secret payload.
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member "serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
    --role "roles/secretmanager.secretAccessor"

# Grant Cloud Logging Writer role
# While Cloud Run automatically captures stdout/stderr, this role allows explicit logging if needed.
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member "serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
    --role "roles/logging.logWriter"

echo "Service account setup complete."
echo "Service Account Email: ${SERVICE_ACCOUNT_EMAIL}"

Run this script from your terminal:

bash 08_deployment_setup.sh

📌 Key Idea: The adk-agent-runner service account is strictly for the runtime identity of your Cloud Run service. It needs permissions to interact with other Google Cloud services (like Firestore, Secret Manager, Logging), but not to deploy or manage the Cloud Run service itself.

3. Update Dockerfile for Production Readiness

Ensure your Dockerfile (adapted from Chapter 7) is optimized for production. This includes using a multi-stage build to minimize image size and setting up the entrypoint for Uvicorn.

File: requirements.txt

Verify or create this file in your project root with the following dependencies. These versions were checked as stable on 2026-05-23.

adk
google-cloud-firestore
fastapi==0.111.0
uvicorn==0.30.1
pydantic==2.7.1

File: Dockerfile

Update your Dockerfile to use a multi-stage build.

# Stage 1: Build dependencies
FROM python:3.12-slim-bookworm as builder

WORKDIR /app

# Install ADK, FastAPI, Uvicorn, and other dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: Final image for runtime
FROM python:3.12-slim-bookworm

WORKDIR /app

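# Copy the installed packages from the builder stage, plus the console scripts
# pip placed in /usr/local/bin (the uvicorn executable used by CMD lives there)
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin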
# Copy your application code
COPY . .

# Environment variables for production best practices
ENV PYTHONUNBUFFERED=1
ENV ADK_ENV=production

# Cloud Run injects a PORT environment variable. We expose 8080 as a common default.
EXPOSE 8080

# Command to run the agent using Uvicorn, listening on all interfaces (0.0.0.0).
# The port is fixed at 8080, which matches the default value of Cloud Run's PORT variable.
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

Explanation:

FROM python:3.12-slim-bookworm as builder: Starts the first stage with a slim Python image. 3.12-slim-bookworm is a lightweight base image for Python 3.12.
RUN pip install --no-cache-dir -r requirements.txt: Installs Python dependencies. --no-cache-dir prevents pip from storing cache, further reducing image size.
FROM python:3.12-slim-bookworm: Starts the second, final stage with another clean, slim Python image.
COPY --from=builder ...: This is the core of the multi-stage build. It copies only the installed Python packages from the builder stage, avoiding unnecessary build tools or temporary files.
COPY . .: Copies your application code into the /app directory.
ENV PYTHONUNBUFFERED=1: Ensures Python output is sent directly to stdout/stderr, which Cloud Logging can capture immediately without buffering delays.
EXPOSE 8080: Documents that the container listens on port 8080. Cloud Run does not read EXPOSE; it routes traffic to the port named in its PORT environment variable, which defaults to 8080.
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]: This is the entrypoint. It starts the Uvicorn ASGI server, which hosts your FastAPI application (where app is the FastAPI instance in main.py). It binds to 0.0.0.0 to listen on all available network interfaces.
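The CMD above fixes the port at 8080, which works as long as the Cloud Run service keeps its default container port. If you would rather honor whatever PORT value Cloud Run injects, one option is a small launcher script. The following is a minimal sketch, assuming a hypothetical serve.py placed next to main.py; resolve_port and main are illustrative names, not part of Uvicorn or ADK.

```python
import os


def resolve_port(env=None, default=8080):
    """Return the port to listen on, preferring the PORT variable Cloud Run injects."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "")
    if not raw:
        return default
    port = int(raw)  # a malformed value raises ValueError, failing fast at startup
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


def main():
    """Start Uvicorn programmatically on the resolved port."""
    # Imported here so resolve_port can be used and tested without Uvicorn installed.
    import uvicorn

    uvicorn.run("main:app", host="0.0.0.0", port=resolve_port())
```

With a call to main() at the bottom of serve.py, the Dockerfile's CMD could become ["python", "serve.py"]. Whether that is worth the extra file depends on how likely you are to change the service's container port.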

File: main.py

Your main.py needs to expose an HTTP endpoint for Cloud Run to interact with, typically /agent. This endpoint will receive user messages and session IDs, then route them to your ADK agent’s handle_message method, leveraging your persistent Firestore state manager.

# main.py
import os
import datetime
import logging # Use standard logging for better production practices
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from adk.agent import Agent
from adk.state_manager import InMemoryStateManager # ADK's internal, short-term state manager
from adk.message import Message
from adk.state_manager.firestore import FirestoreStateManager # Our long-term persistent state manager

# Configure basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

app = FastAPI()

# Pydantic model for incoming requests to ensure data validation
class AgentRequest(BaseModel):
    message: str
    session_id: str = "default_session"

# Initialize Firestore State Manager for long-term persistence
# This is *our* external state store, designed for cross-session context.
# It reads the GOOGLE_CLOUD_PROJECT environment variable, which we set at deploy time with --set-env-vars.
PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT", os.getenv("GCP_PROJECT"))
if not PROJECT_ID:
    logger.error("GOOGLE_CLOUD_PROJECT environment variable not set. Cannot initialize FirestoreStateManager.")
    raise ValueError("GOOGLE_CLOUD_PROJECT environment variable not set.")

persistent_state_manager = FirestoreStateManager(project_id=PROJECT_ID, collection_name="adk_agent_sessions")
logger.info(f"FirestoreStateManager initialized for project: {PROJECT_ID}, collection: adk_agent_sessions")

# Define our custom agent that leverages the persistent state manager
class PersistentADKAgent(Agent):
    def __init__(self, name: str, persistent_state_manager: FirestoreStateManager):
        # ADK's internal state_manager is typically for short-term conversational context
        # within a single interaction or a very short-lived session.
        # For our long-running, pause-resume agent, we manage external persistence separately.
        super().__init__(name, state_manager=InMemoryStateManager())
        self.persistent_state_manager = persistent_state_manager
        logger.info(f"PersistentADKAgent '{name}' initialized.")

    async def handle_message(self, message: Message, session_id: str = "default_session"):
        logger.info(f"Agent received message for session {session_id}: {message.content}")

        # Load persistent context for this session from Firestore
        # This is where we implement the "pause/resume" and long-term memory.
        try:
            stored_context = await self.persistent_state_manager.load_state(session_id)
            if stored_context:
                logger.info(f"Loaded persistent context for {session_id}.")
                last_message_content = stored_context.get('last_message', 'none')
                last_timestamp = stored_context.get('timestamp', 'unknown')
                response_content = (
                    f"You said: '{message.content}'. "
                    f"Previously, on {last_timestamp}, you said: '{last_message_content}'."
                )
            else:
                response_content = f"You said: '{message.content}'. This is a new session."
                logger.info(f"No persistent context found for {session_id}. Starting new session.")

            # Simulate agent processing and response
            # In a real agent, ADK's internal state_manager would be used here
            # for tool calls, intermediate thoughts, etc., within this single interaction.
            response = Message(content=response_content, sender="agent")

            # Save updated persistent context to Firestore for the next interaction
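            await self.persistent_state_manager.save_state(
                session_id,
                {
                    "last_message": message.content,
                    # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated as of Python 3.12
                    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                },
            )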
            logger.info(f"Saved updated persistent context for {session_id}.")
            return response
        except Exception as e:
            logger.error(f"Error in handle_message for session {session_id}: {e}", exc_info=True)
            # Re-raise to be caught by the FastAPI exception handler
            raise

# Instantiate our agent globally
adk_agent = PersistentADKAgent(name="MyPersistentADKAgent", persistent_state_manager=persistent_state_manager)

@app.post('/agent')
async def agent_endpoint(request_data: AgentRequest):
    """
    Handles incoming messages for the ADK agent.
    Receives a message and session_id, processes it with the agent,
    and returns the agent's response.
    """
    user_message_content = request_data.message
    session_id = request_data.session_id

    user_message = Message(content=user_message_content, sender="user")

    try:
        response_message = await adk_agent.handle_message(user_message, session_id)
        return {"response": response_message.content}
    except Exception as e:
        logger.error(f"Error handling agent request for session {session_id}: {e}", exc_info=True)
        # Return a generic HTTP 500. The full exception is already in the logs,
        # so internal details are not echoed back to the client.
        raise HTTPException(status_code=500, detail="Internal server error processing request.") from e

# Note: The `if __name__ == '__main__':` block is not needed when using Uvicorn directly
# via the CMD command in the Dockerfile, as Uvicorn manages the server startup.

Key Changes and Decisions in main.py:

FastAPI Integration: The @app.post('/agent') decorator defines an HTTP POST endpoint. Cloud Run will route incoming requests to this.
Pydantic for Validation: AgentRequest ensures that incoming JSON payloads have message (string) and session_id (string, with a default).
Persistent State Manager: The FirestoreStateManager is initialized globally. It relies on the GOOGLE_CLOUD_PROJECT environment variable. Cloud Run does not set this variable on its own; the deploy command in step 6 passes it with --set-env-vars.
PersistentADKAgent: This custom agent class wraps the ADK Agent and explicitly uses our FirestoreStateManager for load_state and save_state operations, thereby providing long-term memory and pause/resume capabilities.
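Standard Logging: print() statements are replaced with Python’s standard logging module, which adds levels and timestamps and makes error paths easier to find. The format configured here is plain text, so Cloud Logging stores each line as an unparsed text payload; emitting one JSON object per line is what lets Cloud Logging parse and filter individual fields.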
Error Handling: Added try...except blocks to catch potential issues during agent processing or Firestore interactions, returning a proper HTTP 500 response.
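The load, reply, save sequence in handle_message can be exercised locally before anything is deployed. The sketch below is an illustration only: InMemorySessionStore, respond, and demo are hypothetical names, and the store is a stand-in that mimics the load_state/save_state coroutines used in main.py rather than the real FirestoreStateManager.

```python
import asyncio
import datetime


class InMemorySessionStore:
    """Hypothetical stand-in exposing load_state/save_state like the Firestore manager."""

    def __init__(self):
        self._data = {}

    async def load_state(self, session_id):
        # Returns None for an unknown session, which the caller treats as "new session".
        return self._data.get(session_id)

    async def save_state(self, session_id, state):
        self._data[session_id] = dict(state)


async def respond(store, session_id, text):
    """Mirror the load -> reply -> save sequence from handle_message."""
    previous = await store.load_state(session_id)
    if previous:
        reply = (
            f"You said: '{text}'. "
            f"Previously, on {previous['timestamp']}, you said: '{previous['last_message']}'."
        )
    else:
        reply = f"You said: '{text}'. This is a new session."
    await store.save_state(
        session_id,
        {
            "last_message": text,
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        },
    )
    return reply


async def demo():
    """Two turns for one session, then a first turn for another."""
    store = InMemorySessionStore()
    first = await respond(store, "user123", "Hello ADK!")
    second = await respond(store, "user123", "How are you doing?")
    other = await respond(store, "user456", "Who are you?")
    return first, second, other


if __name__ == "__main__":
    for line in asyncio.run(demo()):
        print(line)
```

Because the stand-in keeps the same two method names, the same flow can later be pointed at the Firestore-backed manager without changing respond.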

4. Build and Push Docker Image to Artifact Registry

First, configure Docker to authenticate with Google Cloud’s Artifact Registry. Replace us-central1 with your chosen region.

gcloud auth configure-docker us-central1-docker.pkg.dev

Next, build your Docker image and push it to Artifact Registry.

PROJECT_ID=$(gcloud config get-value project)

# Create an Artifact Registry repository if you haven't already
# The repository name 'adk-agent-repo' is a suggestion.
# The command runs synchronously (no --async) so the repository exists before the push below.
gcloud artifacts repositories create adk-agent-repo --repository-format=docker \
    --location=us-central1 --description="Docker repository for ADK agents"

# Define the full image name
IMAGE_NAME="us-central1-docker.pkg.dev/${PROJECT_ID}/adk-agent-repo/adk-persistent-agent:latest"

echo "Building Docker image: ${IMAGE_NAME}"
# Build the Docker image from the current directory
docker build -t "${IMAGE_NAME}" .

echo "Pushing Docker image to Artifact Registry..."
# Push the Docker image to Artifact Registry
docker push "${IMAGE_NAME}"
echo "Docker image pushed successfully."

Explanation:

gcloud artifacts repositories create: Sets up a Docker repository in Artifact Registry to store your container images. This is a one-time setup.
docker build -t "${IMAGE_NAME}" .: Builds the Docker image based on your Dockerfile in the current directory and tags it with the specified name.
docker push "${IMAGE_NAME}": Uploads the built image to the Artifact Registry, making it accessible for Cloud Run deployments.

5. Secure Secrets with Secret Manager

If your agent requires any sensitive configuration (e.g., an external API key for a third-party tool) that isn’t handled by Google Cloud service accounts, use Secret Manager. For Google Cloud services such as Firestore, the client libraries authenticate with the Cloud Run service account’s credentials automatically, so this project needs no explicit API keys for Google services.

Let’s assume you have a hypothetical EXTERNAL_TOOL_API_KEY for a non-Google service.

PROJECT_ID=$(gcloud config get-value project)
SERVICE_ACCOUNT_EMAIL="adk-agent-runner@${PROJECT_ID}.iam.gserviceaccount.com" # From step 2

echo "Creating secret 'EXTERNAL_TOOL_API_KEY' in Secret Manager..."
# Create a secret in Secret Manager with a dummy value
# printf (unlike echo) does not append a trailing newline to the stored value
printf '%s' "my-super-secret-key-value-123" | gcloud secrets create EXTERNAL_TOOL_API_KEY \
    --project="${PROJECT_ID}" --data-file=-

echo "Granting service account access to secret..."
# Grant the service account access to this specific secret
gcloud secrets add-iam-policy-binding EXTERNAL_TOOL_API_KEY \
    --project="${PROJECT_ID}" \
    --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
    --role="roles/secretmanager.secretAccessor"

echo "Secret setup complete."

Explanation:

gcloud secrets create: Creates a new secret named EXTERNAL_TOOL_API_KEY. The --data-file=- flag reads the secret value from standard input.
gcloud secrets add-iam-policy-binding: Grants your adk-agent-runner service account the roles/secretmanager.secretAccessor role on this one secret. The step 2 script already granted the same role project-wide; if you rely on per-secret bindings like this one, you can remove the project-wide grant to stay closer to least privilege.

6. Deploy to Cloud Run

Now, deploy your containerized agent to Cloud Run. We’ll link the dedicated service account, set environment variables, and inject the secret.

PROJECT_ID=$(gcloud config get-value project)
SERVICE_ACCOUNT_EMAIL="adk-agent-runner@${PROJECT_ID}.iam.gserviceaccount.com" # From step 2
IMAGE_NAME="us-central1-docker.pkg.dev/${PROJECT_ID}/adk-agent-repo/adk-persistent-agent:latest" # From step 4

echo "Deploying 'adk-persistent-agent' to Cloud Run..."
gcloud run deploy adk-persistent-agent \
    --image "${IMAGE_NAME}" \
    --platform managed \
    --region us-central1 \
    --allow-unauthenticated \
    --service-account "${SERVICE_ACCOUNT_EMAIL}" \
    --set-env-vars GOOGLE_CLOUD_PROJECT="${PROJECT_ID}" \
    --update-secrets EXTERNAL_TOOL_API_KEY=EXTERNAL_TOOL_API_KEY:latest \
    --memory 512Mi \
    --cpu 1 \
    --timeout 300s \
    --min-instances 0 \
    --max-instances 10

echo "Deployment initiated. Cloud Run will provide a service URL upon completion."

Explanation of parameters:

adk-persistent-agent: The chosen name for your Cloud Run service.
--image "${IMAGE_NAME}": Specifies the Docker image to deploy from Artifact Registry.
--platform managed: Indicates you’re using the fully managed Cloud Run service, where Google handles the underlying infrastructure.
--region us-central1: The Google Cloud region for deployment. Choose one close to your users.
--allow-unauthenticated: Makes the service publicly accessible. For internal applications, you’d omit this and manage access with IAM.
--service-account "${SERVICE_ACCOUNT_EMAIL}": Assigns the dedicated service account created earlier. This account’s permissions dictate what your agent can do on Google Cloud (e.g., access Firestore, Secret Manager).
--set-env-vars GOOGLE_CLOUD_PROJECT="${PROJECT_ID}": Passes your project ID as an environment variable to the container. Your FirestoreStateManager uses this for initialization.
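--update-secrets EXTERNAL_TOOL_API_KEY=EXTERNAL_TOOL_API_KEY:latest: Injects the EXTERNAL_TOOL_API_KEY secret as an environment variable named EXTERNAL_TOOL_API_KEY into your container. The :latest alias is resolved when an instance starts, so instances that are already running keep the value they started with. Pin a specific version number instead if you want deployments to be reproducible.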
--memory 512Mi: Sets the memory limit for each instance. Adjust based on your agent’s resource needs.
--cpu 1: Allocates 1 CPU core per instance.
--timeout 300s: Maximum request processing time. ADK agents with complex tool use or long LLM calls might need longer.
--min-instances 0: Allows the service to scale down to zero instances when idle, significantly saving costs.
--max-instances 10: Sets the maximum number of instances Cloud Run can spin up to handle traffic. Adjust based on expected load.
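Inside the container, the injected secret is an ordinary environment variable. The sketch below shows one way the agent might read it at startup and fail fast if it is missing. load_required_secret, redact, and MissingSecretError are hypothetical helpers introduced here for illustration; only the variable name EXTERNAL_TOOL_API_KEY comes from the deploy command above.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret was not injected."""


def load_required_secret(name, env=None):
    """Read a secret from the environment, stripping stray whitespace or newlines."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise MissingSecretError(f"Required secret '{name}' is not set")
    return value


def redact(value, visible=4):
    """Return a log-safe form of a secret that shows only its last few characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

Failing at startup rather than on the first tool call means a misconfigured revision shows up as a failed deployment instead of as intermittent request errors.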

After deployment, Cloud Run will provide a URL for your service in the terminal output. Note this URL.

7. Configure Logging and Monitoring

Cloud Run automatically integrates with Cloud Logging. All print() statements and logs from your Python application (especially those using the standard logging module, as we’ve updated main.py) will appear in Cloud Logging.

Viewing Logs:

Navigate to the Google Cloud Console.
Go to Logging > Logs Explorer.
Filter by resource.type="cloud_run_revision" and resource.labels.service_name="adk-persistent-agent".

You’ll see your agent’s INFO and ERROR log messages, FastAPI server logs, and any Python tracebacks.

Basic Monitoring:

Cloud Run also provides basic monitoring metrics (request count, latency, error rates) directly on the Cloud Run service details page in the Google Cloud Console. For advanced monitoring and custom alerts, you can use Cloud Monitoring (formerly Stackdriver Monitoring) to create dashboards and trigger notifications based on these metrics or even specific log patterns.

Testing & Verification

Now that your agent is deployed, let’s verify its functionality and context persistence.

Access the Agent: Use curl or a tool like Postman to send POST requests to the /agent path of your Cloud Run service URL (e.g., https://adk-persistent-agent-xxxxxxx-uc.a.run.app/agent). In the commands below, replace YOUR_CLOUD_RUN_URL with the actual URL provided by Cloud Run after deployment.

# Replace with your actual Cloud Run URL
AGENT_URL="YOUR_CLOUD_RUN_URL/agent"

echo "--- First interaction (new session for user123) ---"
curl -X POST -H "Content-Type: application/json" \
     -d '{"message": "Hello ADK!", "session_id": "user123"}' \
     "${AGENT_URL}"

echo ""
echo "--- Second interaction (same session for user123) ---"
curl -X POST -H "Content-Type: application/json" \
     -d '{"message": "How are you doing?", "session_id": "user123"}' \
     "${AGENT_URL}"

echo ""
echo "--- First interaction (new session for user456) ---"
curl -X POST -H "Content-Type: application/json" \
     -d '{"message": "Who are you?", "session_id": "user456"}' \
     "${AGENT_URL}"

Verify Context Persistence:
- Observe Responses: Check the responses from the curl commands. For user123, the second message should correctly acknowledge the content of the first message, demonstrating that the agent loaded the previous context. The response should resemble: "You said: 'How are you doing?'. Previously, on [timestamp], you said: 'Hello ADK!'."
- Inspect Firestore: Go to your Firestore console in Google Cloud. You should see a collection named adk_agent_sessions (or whatever you configured in main.py). Within this collection, there should be documents corresponding to user123 and user456, each containing their respective last_message and timestamp fields. This confirms state is being written and read from the external store.
Check Cloud Run Logs:
- Go to Logging > Logs Explorer and filter for your adk-persistent-agent service.
- You should see log entries for each request, including the agent’s internal logger.info statements (e.g., “Agent received message…”, “Loaded persistent context…”, “Saved updated persistent context…”). This confirms your agent is running, processing requests, and interacting with Firestore as expected. Look for any logger.error messages if something isn’t working.
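The same checks can be scripted so they are repeatable after each deployment. This is a minimal sketch using only the Python standard library; build_agent_request, send, and check_persistence are hypothetical helper names, and the request and response shapes are taken from the /agent endpoint in main.py.

```python
import json
import urllib.request


def build_agent_request(base_url, message, session_id):
    """Build (but do not send) the same POST request the curl commands issue."""
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/agent",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the 'response' field of the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


def check_persistence(base_url, session_id="user123"):
    """Return True if the second reply refers back to the first message."""
    send(build_agent_request(base_url, "Hello ADK!", session_id))
    second = send(build_agent_request(base_url, "How are you doing?", session_id))
    return "you said: 'Hello ADK!'" in second
```

Calling check_persistence with your Cloud Run service URL performs the two-message exchange for one session and reports whether the stored context came back.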

Operations and Production Readiness

Deploying is just the first step. Operating a production AI agent requires ongoing attention to observability, security, scalability, and cost.

Observability

Structured Logging: As implemented in main.py, using Python’s logging module is a good start. For even richer insights, consider formatting your logs as JSON. Cloud Logging can automatically parse JSON logs, allowing you to filter and query specific fields (e.g., jsonPayload.session_id, jsonPayload.tool_call_details, jsonPayload.error_type).
Metrics & Alerts: Use Cloud Monitoring to create custom metrics from your logs (e.g., count of agent errors, latency of agent responses, number of tool calls). Set up alerts to notify your team via email, SMS, or PagerDuty if these metrics cross predefined thresholds.
Request Tracing: For complex agents interacting with many services (databases, other APIs, multiple LLMs), integrate OpenTelemetry. (OpenCensus, its predecessor, has been retired in favor of OpenTelemetry.) This enables distributed tracing, giving you end-to-end visibility into request flows across all services involved in an agent’s response.
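For the JSON logging suggestion above, a small formatter on top of the standard logging module is usually enough. The sketch below is one possible shape, not the only one: JsonFormatter and configure_json_logging are hypothetical names, and passing extra fields under a "fields" key is a convention invented here. The severity and message keys are the ones Cloud Logging treats specially when it parses a JSON line.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Merge caller-supplied fields passed as extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


def configure_json_logging(level=logging.INFO, stream=None):
    """Replace the root logger's handlers with one JSON handler."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)


if __name__ == "__main__":
    configure_json_logging()
    logging.getLogger("agent").info(
        "Loaded persistent context", extra={"fields": {"session_id": "user123"}}
    )
```

With output in this form, a Logs Explorer filter such as jsonPayload.session_id="user123" narrows the view to a single conversation.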

Security

IAM Least Privilege: Regularly review and refine the permissions of your adk-agent-runner service account. Only grant the roles absolutely necessary for the agent’s operation. Avoid broad roles like “Editor” or “Owner.”
Secret Manager Policies: Ensure only the Cloud Run service account can access the secrets it needs. Avoid granting broad “Secret Manager Admin” roles, which could expose all secrets.
Network Access: If your agent needs to access private resources (e.g., a database in a private VPC network), configure a Serverless VPC Access connector or Direct VPC egress for your Cloud Run service.
Input Validation & Guardrails: Always validate and sanitize user inputs to prevent prompt injection attacks or unexpected behavior. While ADK provides some built-in safety features, custom tools or complex prompts might require additional, explicit checks within your agent’s logic.
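As a starting point for input validation, the two request fields can be checked before they reach the agent. The limits below are assumptions chosen for illustration, not values from ADK or Firestore: validate_session_id, validate_message, and MAX_MESSAGE_CHARS are hypothetical. The session_id check matters because the value is used as a storage key, and Firestore document IDs may not contain a forward slash.

```python
import re

# Hypothetical policy: start with a letter or digit, then letters, digits, "_" or "-".
_SESSION_ID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,127}")

# Assumed upper bound on message length; tune it to your model's context budget.
MAX_MESSAGE_CHARS = 4000


def validate_session_id(session_id):
    """Reject IDs that are empty, too long, or contain path-like characters."""
    if not _SESSION_ID_PATTERN.fullmatch(session_id):
        raise ValueError("session_id must be 1-128 characters from [A-Za-z0-9_-]")
    return session_id


def validate_message(message):
    """Trim the message and enforce non-empty, bounded length."""
    text = message.strip()
    if not text:
        raise ValueError("message must not be empty")
    if len(text) > MAX_MESSAGE_CHARS:
        raise ValueError(f"message exceeds {MAX_MESSAGE_CHARS} characters")
    return text
```

Checks like these do not stop prompt injection by themselves; they only narrow what can reach the agent and the database, which makes the remaining guardrails easier to reason about.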

Scalability and Cost Management

Cold Starts: When min-instances is 0 (as configured), Cloud Run scales down to zero instances when idle. The very first request after a period of inactivity will incur a “cold start” delay as a new instance spins up. For latency-sensitive applications with frequent but sporadic traffic, you might set --min-instances to 1 or higher to keep an instance warm, but this increases continuous cost.
Concurrency: Cloud Run instances can handle multiple concurrent requests. The --concurrency setting (default is 80) determines how many requests a single instance can process simultaneously. Tune this based on your agent’s average request processing time and resource usage to optimize performance and cost.
Cost Monitoring: Regularly review your Google Cloud billing to understand Cloud Run, Firestore, and Secret Manager costs. Set budget alerts in the Google Cloud Console to avoid unexpected expenditures.
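To reason about --concurrency and --max-instances together, a rough estimate based on Little's Law (requests in flight = arrival rate × average latency) is often enough. The function below is a back-of-the-envelope sketch with a hypothetical name, estimate_instances; it ignores cold starts, traffic bursts, and Cloud Run's actual autoscaling behavior.

```python
import math


def estimate_instances(requests_per_second, avg_latency_seconds,
                       concurrency=80, max_instances=10):
    """Estimate how many instances a steady load needs.

    Returns (instances, saturated), where saturated is True when the
    estimate exceeds max_instances and requests would start to queue.
    """
    in_flight = requests_per_second * avg_latency_seconds  # Little's Law
    needed = math.ceil(in_flight / concurrency) if in_flight > 0 else 0
    return min(needed, max_instances), needed > max_instances
```

For example, 50 requests per second at 4 seconds each is 200 requests in flight: about 3 instances at the default concurrency of 80, but 20 instances at a concurrency of 10, which is above the --max-instances 10 limit used in the deploy command.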

Common Issues & Solutions

Deployment Fails with Permission Errors:
- Issue: The gcloud run deploy command fails, often because the user or the Cloud Build service account (if using CI/CD) lacks sufficient permissions (e.g., roles/run.admin or roles/artifactregistry.writer).
- Solution: Ensure the identity performing the deployment (your gcloud user) has the necessary permissions. Because the deploy command attaches a service account, that identity also needs roles/iam.serviceAccountUser on adk-agent-runner. The adk-agent-runner service account itself is for runtime, not deployment.
Agent Not Responding / HTTP 500 Errors:
- Issue: The deployed agent returns HTTP 500 errors or doesn’t respond as expected.
- Solution:
  - Check Cloud Run Logs: This is your first and most important debugging step. Look for Python tracebacks, ModuleNotFoundError (ensure requirements.txt is complete and pip install ran correctly in the Dockerfile), or issues connecting to Firestore.
  - Environment Variables: Verify all required environment variables (e.g., GOOGLE_CLOUD_PROJECT) are correctly set in the Cloud Run service configuration. You can check this with gcloud run services describe adk-persistent-agent --platform managed --region us-central1.
  - Service Account Permissions: Double-check that the adk-agent-runner service account has roles/datastore.user (for Firestore) and roles/secretmanager.secretAccessor (if using Secret Manager).
  - Port Listening: Confirm your main.py FastAPI app is correctly exposed via Uvicorn, listening on 0.0.0.0 and the port provided by Cloud Run (default 8080).
Context Loss / State Not Persisting:
- Issue: The agent doesn’t remember previous conversational context across sessions or requests, despite the setup.
- Solution:
  - Firestore Connectivity: Check Cloud Run logs for any errors connecting to Firestore. Verify GOOGLE_CLOUD_PROJECT is correctly passed and that the service account has roles/datastore.user.
  - Firestore Data: Manually inspect your Firestore collection (adk_agent_sessions) in the Google Cloud Console to see if state is actually being written and read for the correct session_id.
  - Agent Logic: Double-check your PersistentADKAgent’s handle_message method to ensure it’s correctly calling self.persistent_state_manager.load_state() and self.persistent_state_manager.save_state() with the appropriate session_id.
Slow Responses (Cold Starts):
- Issue: Initial requests to the agent are slow, especially after periods of inactivity (e.g., the first request of the day).
- Solution: This is a known characteristic of serverless services configured with min-instances=0. If the cold start delay is unacceptable for your application’s latency requirements, set --min-instances to 1 or higher on your Cloud Run service. Be aware that this will incur continuous costs, even when the service is idle.

Summary & Next Step

Congratulations! You have successfully deployed your long-running, context-aware ADK agent to Google Cloud Run, secured its secrets with Secret Manager, and established fundamental logging and monitoring. This agent is now running in a scalable, reliable, and observable production-like environment. You’ve moved beyond a local prototype to a robust system ready for real users.

The project is now fully functional and production-ready. From here, you might consider:

CI/CD Integration: Automate your build, test, and deployment process using Cloud Build, GitHub Actions, or GitLab CI to streamline future updates.
Advanced Monitoring: Set up custom dashboards, detailed alerts, and potentially integrate with APM (Application Performance Monitoring) tools for deeper insights into agent performance and health.
User Interface: Build a frontend application (web or mobile) to interact with your deployed agent API, creating a complete user experience.
A/B Testing: Implement mechanisms to test different agent versions, prompt strategies, or tool configurations in production to optimize performance and user satisfaction.

This concludes our journey of building a long-running AI agent with Google ADK that can maintain context and state across sessions. You now have a solid foundation for developing and operating sophisticated AI applications.

References

This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.
