Robust Testing for Long-Running Agent Workflows

Building a reliable, long-running AI agent that can pause, resume, and maintain its conversational context across sessions is paramount for production systems. This chapter focuses on establishing a robust testing framework to ensure our Google ADK agent’s state persistence and recovery mechanisms function flawlessly under various conditions.

By the end of this milestone, you will have implemented unit, integration, and end-to-end tests. These tests will validate the agent’s ability to save and load its state, preserve conversation history, and correctly resume complex workflows after an interruption. This rigorous testing is crucial for delivering an AI agent that users can trust not to “forget” their interactions.

Project Overview

Our overarching project aims to develop a persistent AI agent using Google’s Agent Development Kit (ADK). This agent is designed to handle multi-turn conversations and complex workflows, maintaining its state and context even if its execution is interrupted or paused. This persistence is achieved by decoupling the agent’s in-memory state from its durable storage, leveraging a robust external data store like Google Cloud Firestore.

The core challenge for such an agent lies in ensuring that its state is consistently saved and accurately restored. Any loss of context could lead to frustrating user experiences, broken business processes, and a significant erosion of user trust. This chapter directly addresses that challenge by building a comprehensive test suite.

Tech Stack for Testing

To build our robust test suite, we will primarily utilize:

Python 3.12.x: The core language for our agent and tests. (The dict | None annotations used in this chapter require Python 3.10 or newer.)
pytest: A widely adopted and powerful testing framework for Python. (Latest stable version recommended).
pytest-mock: A pytest plugin that provides a convenient fixture for unittest.mock functionality, essential for isolating components. (Latest stable version recommended).
Google Cloud Firestore Client Library: For interacting with our chosen state persistence layer. (Latest stable version recommended).
Google Cloud Firestore Emulator: A local, in-memory version of Firestore, critical for fast, isolated, and deterministic integration tests without incurring cloud costs or affecting live data.

This chapter does not depend on a specific Google ADK release. Our agent’s interaction with ADK is modeled with a simplified class for testing purposes, so the examples focus on the persistence aspects and assume only a standard Python 3.x environment.

Milestones for This Chapter

This chapter is structured to build our testing capabilities incrementally:

Unit Test State Serialization: Verify the core logic of converting agent state to a storable format (e.g., JSON) and back, ensuring data integrity.
Integration Test Persistence Layer: Validate that our StateManager correctly interacts with Google Cloud Firestore (using the emulator) to save and load agent states.
End-to-End Test Pause/Resume: Simulate a full agent workflow, including an interruption and subsequent resumption, to confirm the agent picks up the conversation correctly.

Architecture for Testing Persistent Agents

Testing a stateful, long-running AI agent requires a multi-layered approach to ensure reliability at every level. Our testing architecture is designed to provide confidence in data integrity and workflow continuity.

Testing Layers

Unit Tests: Focus on isolated functions or methods (e.g., serialize_state). They should be fast and have no external dependencies, often using mocks for any collaborators.
Integration Tests: Verify the interaction between two or more components (e.g., FirestoreStateManager and the Firestore database emulator). These tests confirm that components work together as expected.
End-to-End (E2E) Tests: Simulate a user’s full journey with the agent, including pauses and resumes, to validate the entire system’s behavior. These are the most comprehensive but also the slowest.

Test File Structure

We’ll organize our tests in a dedicated tests/ directory at the root of our project, mirroring our application’s structure.

.
├── my_adk_agent/
│   ├── __init__.py
│   ├── agent.py               # Our simplified ADK agent model
│   └── persistence.py         # State serialization and Firestore interaction
├── tests/
│   ├── __init__.py
│   ├── test_persistence_units.py  # Unit tests for serialization
│   ├── test_persistence_integration.py # Integration tests for Firestore
│   └── test_e2e_resume.py         # E2E tests for pause/resume workflow
├── requirements.txt
└── main.py

E2E Pause/Resume Test Flow

The most critical test for a long-running agent is verifying its ability to pause and resume. This diagram illustrates the high-level flow of our end-to-end test.

flowchart TD
    A[User Input] --> B[Process Input]
    B --> C[Save State]
    C --> D[Shutdown]
    D --> E[Agent Restarts]
    E --> F[Load State]
    F --> G[Resume Workflow]
    G --> H[Verify Resumption]

Explanation: This flow ensures that after an interruption (simulated shutdown), a new agent instance can correctly load the previously saved state and continue the conversation from the exact point it left off.

Key Testing Principles

Isolation: For unit tests, use mocking to replace external dependencies (like the state store or Google ADK’s internal components) to focus purely on the logic under test.
Deterministic Scenarios: Every test should be repeatable and produce the same results, regardless of when or where it’s run. This is why local emulators are preferred over live cloud services for integration tests.
Clear Assertions: Each test must make specific, unambiguous assertions about the expected state, output, or behavior. Vague tests lead to false confidence.
Realism for Integration/E2E: While unit tests are mocked, integration tests should interact with a realistic (emulated) persistence layer, and E2E tests should simulate user interactions as closely as possible.
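As a minimal sketch of the isolation principle, agent logic can be exercised against an in-memory stand-in for the state store. InMemoryStateManager and CountingAgent are hypothetical names introduced only for this illustration; they assume the same save_agent_state/load_agent_state interface that this chapter uses for its state managers.

```python
import copy


class InMemoryStateManager:
    """Hypothetical test double: keeps agent state in a dict instead of Firestore."""

    def __init__(self):
        self._store = {}

    def save_agent_state(self, agent_id: str, state: dict) -> None:
        # Deep-copy so later mutations by the caller do not leak into "storage".
        self._store[agent_id] = copy.deepcopy(state)

    def load_agent_state(self, agent_id: str):
        state = self._store.get(agent_id)
        return copy.deepcopy(state) if state is not None else None


class CountingAgent:
    """Hypothetical, deliberately tiny agent: counts the messages it has seen."""

    def __init__(self, agent_id: str, state_manager):
        self.agent_id = agent_id
        self.state_manager = state_manager
        # Resume from stored state if present, otherwise start fresh.
        self.state = state_manager.load_agent_state(agent_id) or {"turns": 0}

    def handle_message(self, text: str) -> int:
        self.state["turns"] += 1
        self.state_manager.save_agent_state(self.agent_id, self.state)
        return self.state["turns"]


def run_isolated_resume_check() -> int:
    """Two agent instances share one store; the second resumes the first's count."""
    store = InMemoryStateManager()
    first = CountingAgent("agent-1", store)
    first.handle_message("hello")
    first.handle_message("again")
    second = CountingAgent("agent-1", store)  # simulated restart
    return second.handle_message("after restart")
```

Because the fake copies state on save and load, a test built on it still catches code that relies on shared in-memory references rather than on what was actually persisted.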

Step-by-Step Implementation

We’ll use pytest for our tests. First, ensure the necessary libraries are installed.

pip install pytest pytest-mock google-cloud-firestore

Python version note: The examples in this chapter were written against Python 3.12.x. They do not import Google ADK directly, so we assume they run in any standard Python 3.x environment that satisfies the requirements above.

1. Unit Test: State Serialization/Deserialization

This test validates the serialize_state and deserialize_state functions, which are responsible for converting our agent’s Python dictionary state into a JSON string and back. These functions are typically located in my_adk_agent/persistence.py.

Create tests/test_persistence_units.py:

# tests/test_persistence_units.py
import pytest
from my_adk_agent.persistence import serialize_state, deserialize_state

def test_state_serialization_deserialization():
    """
    Verifies that agent state can be serialized to JSON and deserialized back
    without loss of data or type changes.
    """
    initial_state = {
        "user_id": "test_user_123",
        "conversation_history": [
            {"role": "user", "content": "Hello"},
            {"role": "agent", "content": "Hi there!"}
        ],
        "current_workflow_step": "awaiting_confirmation",
        "context_data": {"order_id": "ORD-001", "item_count": 2},
        "is_active": True,
        "last_updated_timestamp": 1700000000.0, # Example timestamp
    }

    # 📌 Key Idea: Test round-trip fidelity.
    serialized_data = serialize_state(initial_state)
    assert isinstance(serialized_data, str), "Serialized data should be a string."
    assert 'test_user_123' in serialized_data, "User ID should be present in serialized data."
    assert 'awaiting_confirmation' in serialized_data, "Workflow step should be present."

    deserialized_state = deserialize_state(serialized_data)

    # Assert that the deserialized state matches the original
    assert deserialized_state == initial_state, "Deserialized state must match initial state."
    # Also check specific types for robustness, especially for numbers/booleans
    assert isinstance(deserialized_state['is_active'], bool), "Boolean type should be preserved."
    assert isinstance(deserialized_state['last_updated_timestamp'], float), "Float type should be preserved."
    assert deserialized_state['conversation_history'][0]['role'] == 'user', "Nested list/dict data should be correct."

    print(f"\nInitial state: {initial_state}")
    print(f"Serialized data: {serialized_data}")
    print(f"Deserialized state: {deserialized_state}")

def test_empty_state_serialization():
    """Tests serialization/deserialization of an empty state dictionary."""
    empty_state = {}
    serialized = serialize_state(empty_state)
    deserialized = deserialize_state(serialized)
    assert deserialized == empty_state, "Empty state should serialize and deserialize correctly."

def test_invalid_json_deserialization():
    """Tests deserialization with invalid JSON input to ensure error handling."""
    # ⚠️ What can go wrong: Malformed JSON can crash your agent.
    with pytest.raises(ValueError, match="Invalid JSON format"):
        deserialize_state("this is not valid json")

    with pytest.raises(ValueError, match="Invalid JSON format"):
        deserialize_state("{'key': 'value'") # Malformed JSON with single quotes

Explanation:

Round-trip Test: The primary goal is to verify that serialize_state and deserialize_state are inverse operations, meaning deserialize_state(serialize_state(data)) == data.
Type Preservation: A JSON round trip can alter data types (e.g., tuples come back as lists, and integer dictionary keys come back as strings). Explicit assertions ensure critical types like booleans and floats are maintained.
Error Handling: Testing with invalid JSON confirms that our deserialize_state function gracefully handles bad input, raising a ValueError as expected, rather than crashing the application. This is a crucial production awareness point.
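To make the type-preservation point concrete, the following sketch uses only the standard json module to show which Python values survive a round trip unchanged and which do not. The state dictionary here is invented for illustration and is not part of the agent’s real schema.

```python
import json
from datetime import datetime, timezone

state = {
    "coords": (1, 2),            # tuple
    "retries_by_step": {1: 3},   # integer dictionary key
    "score": 1.0,                # float
}

round_tripped = json.loads(json.dumps(state))

# Tuples come back as lists, and integer keys come back as strings.
coords_after = round_tripped["coords"]
keys_after = list(round_tripped["retries_by_step"])
# Floats keep their type.
score_type = type(round_tripped["score"])


def can_serialize(value) -> bool:
    """Return True if json.dumps accepts the value, False if it raises TypeError."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


# datetime objects are rejected by json.dumps unless converted first.
datetime_ok = can_serialize({"at": datetime(2024, 1, 1, tzinfo=timezone.utc)})
```

If the agent state ever holds tuples, non-string keys, or datetimes, a plain equality assertion after a round trip will fail or the serialization will raise, which is exactly what a unit test at this layer should surface.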

2. Integration Test: Firestore Persistence

This test interacts with a real (or emulated) Firestore instance to ensure our save_agent_state and load_agent_state functions work correctly with the actual database. Using a local Firestore emulator is vital for fast, reliable, and isolated integration tests.

First, set up a local Firestore emulator if you haven’t already (see the Google Cloud Firestore Emulator documentation).

Install the Google Cloud CLI (gcloud) and run:

gcloud components install cloud-firestore-emulator
gcloud emulators firestore start --host-port=localhost:8080

Keep this running in a separate terminal.

Now, ensure your my_adk_agent/persistence.py has a FirestoreStateManager class that uses google.cloud.firestore.Client. If you do not have one from the previous chapter, the illustrative version below is enough to follow along.

# my_adk_agent/persistence.py (Illustrative snippet, adapt to your implementation)
import json
from google.cloud import firestore

class StateManager:
    """Abstract base class for state management."""
    def save_agent_state(self, agent_id: str, state: dict):
        raise NotImplementedError
    def load_agent_state(self, agent_id: str) -> dict | None:
        raise NotImplementedError

class FirestoreStateManager(StateManager):
    """Manages agent state persistence using Google Cloud Firestore."""
    def __init__(self, client: firestore.Client, collection_name: str = "agent_states"):
        self.db = client
        self.collection_ref = self.db.collection(collection_name)

    def save_agent_state(self, agent_id: str, state: dict):
        """Saves the agent's state to Firestore."""
        doc_ref = self.collection_ref.document(agent_id)
        # Firestore automatically handles Python dicts, converting them to JSON-like documents.
        # We assume `state` is already a dictionary of basic types or objects Firestore can handle.
        doc_ref.set(state)
        # ⚡ Quick Note: For complex objects, you might need pre-serialization here.
        # Our `serialize_state` is more for general JSON string output, not direct Firestore dicts.
        # Firestore's client library handles basic Python types (dicts, lists, int, str, bool, float) well.

    def load_agent_state(self, agent_id: str) -> dict | None:
        """Loads the agent's state from Firestore."""
        doc_ref = self.collection_ref.document(agent_id)
        doc = doc_ref.get()
        if doc.exists:
            return doc.to_dict()
        return None

def serialize_state(state: dict) -> str:
    """Serializes a dictionary state to a JSON string."""
    try:
        return json.dumps(state)
    except TypeError as e:
        # Chain the original error so the traceback shows the offending value.
        raise ValueError(f"Failed to serialize state to JSON: {e}") from e

def deserialize_state(serialized_state: str) -> dict:
    """Deserializes a JSON string back to a dictionary state."""
    try:
        return json.loads(serialized_state)
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON format for state: {e}") from e

Now, create tests/test_persistence_integration.py:

# tests/test_persistence_integration.py
import pytest
import os
from google.cloud import firestore
from my_adk_agent.persistence import FirestoreStateManager

# IMPORTANT: Configure Firestore emulator for tests
# This environment variable tells the client library to connect to the emulator.
os.environ["FIRESTORE_EMULATOR_HOST"] = "localhost:8080"
os.environ["GCLOUD_PROJECT"] = "test-project-id" # Use a dummy project ID for emulator

@pytest.fixture(scope="module")
def firestore_client_fixture():
    """Provides a Firestore client connected to the emulator for the test module."""
    client = firestore.Client()
    yield client
    # Clean up any data created by tests in this module
    # 🔥 Optimization / Pro tip: For more granular cleanup, delete per test or use a unique collection name per test run.
    # For a simple module-scoped cleanup, deleting the collection is sufficient.
    print("\nCleaning up Firestore emulator data...")
    docs = client.collection("agent_states").stream()
    for doc in docs:
        doc.reference.delete()
    print("Firestore emulator data cleaned.")


def test_save_and_load_agent_state_integration(firestore_client_fixture):
    """
    Tests saving and loading agent state to/from Firestore using the emulator.
    This validates the interaction between our StateManager and Firestore.
    """
    manager = FirestoreStateManager(firestore_client_fixture, collection_name="agent_states")
    test_agent_id = "agent_test_id_001"
    initial_state = {
        "user_id": "test_user_001",
        "conversation_history": [{"role": "user", "content": "Hello Firestore"}],
        "workflow_status": "pending_db_check",
        "nested_data": {"sub_key": 123, "list_items": ["a", "b"]}
    }

    # 1. Save state
    manager.save_agent_state(test_agent_id, initial_state)

    # 2. Load state
    loaded_state = manager.load_agent_state(test_agent_id)

    # 3. Assertions
    assert loaded_state is not None, "Loaded state should not be None."
    assert loaded_state == initial_state, "Loaded state must exactly match initial state."
    assert loaded_state["user_id"] == "test_user_001"
    assert loaded_state["conversation_history"][0]["content"] == "Hello Firestore"
    assert loaded_state["nested_data"]["sub_key"] == 123

    # Test updating state: crucial for long-running agents
    updated_state = initial_state.copy()
    updated_state["workflow_status"] = "completed_db_check"
    updated_state["new_field"] = "added_during_update"
    manager.save_agent_state(test_agent_id, updated_state)
    reloaded_updated_state = manager.load_agent_state(test_agent_id)

    assert reloaded_updated_state == updated_state, "Updated state must be correctly saved and loaded."
    assert reloaded_updated_state["new_field"] == "added_during_update"

def test_load_non_existent_state(firestore_client_fixture):
    """
    Tests loading a state for an agent ID that does not exist.
    It should gracefully return None, indicating no state found.
    """
    manager = FirestoreStateManager(firestore_client_fixture, collection_name="agent_states")
    non_existent_id = "non_existent_agent_id_123"
    loaded_state = manager.load_agent_state(non_existent_id)
    assert loaded_state is None, "Loading non-existent state should return None."

Explanation:

Firestore Emulator: The os.environ lines are critical. They tell the google-cloud-firestore client library to connect to the local emulator instead of a live project, which keeps tests isolated and fast, so they must run before firestore.Client() is created. GCLOUD_PROJECT supplies a dummy project ID for the client.
firestore_client_fixture: This pytest fixture provides a configured firestore.Client instance. scope="module" means it runs once for all tests in this file. The yield statement ensures that cleanup (deleting test data from the emulator) happens after all tests in the module are complete, maintaining a clean slate for subsequent test runs.
Save and Load: The test_save_and_load_agent_state_integration function saves a sample initial_state for a unique agent_id, then immediately loads it back. It asserts that the loaded state is identical to the original, confirming both saving and loading work. It also includes an update scenario, which is vital for long-running agents that continuously modify their state.
Edge Case: Non-existent State: test_load_non_existent_state verifies that the system gracefully handles requests to load state for an agent_id that has no stored data, returning None.
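The cleanup tip in the fixture mentions using a unique collection name per test run. A minimal sketch of that idea follows; unique_collection_name is a hypothetical helper, not part of the Firestore client library.

```python
import uuid


def unique_collection_name(prefix: str = "agent_states") -> str:
    """Build a collection name that is unique to one test run (hypothetical helper)."""
    # A short random suffix keeps runs from reading each other's leftover documents.
    return f"{prefix}_{uuid.uuid4().hex[:8]}"
```

The result could be passed as the collection_name argument of FirestoreStateManager and reused in the fixture’s cleanup loop, so that data left behind by an aborted run never affects the next one.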

3. End-to-End Test: Pause and Resume Workflow

This E2E test simulates an agent interaction, saves its state, then simulates a restart and verifies the agent can pick up exactly where it left off. This requires a simplified model of our ADK agent (MyADKAgent) that interacts with our StateManager.

Let’s assume my_adk_agent/agent.py contains our main MyADKAgent class. This class will manage its internal state using our StateManager.

# my_adk_agent/agent.py (Illustrative snippet, adapt to your ADK agent structure)
from .persistence import StateManager  # Any implementation (e.g., FirestoreStateManager) can be injected
# from adk.agent import Agent as AdkAgent # Placeholder for ADK Agent class, not directly used in this simplified model
# from adk.message import Message # Placeholder for ADK Message class

class MyADKAgent:
    """
    A simplified agent model for demonstrating state management and pause/resume.
    In a real ADK setup, state management would integrate with ADK's lifecycle hooks.
    This model abstracts the ADK specifics to focus on state persistence testing.
    """
    def __init__(self, agent_id: str, state_manager: StateManager):
        self.agent_id = agent_id
        self.state_manager = state_manager
        # Load existing state or initialize new state
        self.state = self.state_manager.load_agent_state(self.agent_id) or self._initial_state()
        print(f"Agent {self.agent_id} initialized. Current state: {self.state.get('current_workflow_step')}")


    def _initial_state(self) -> dict:
        """Returns the default initial state for a new agent session."""
        return {
            "conversation_history": [],
            "current_workflow_step": "start",
            "context_data": {}
        }

    def _save_state(self):
        """Saves the current internal state of the agent using the state manager."""
        self.state_manager.save_agent_state(self.agent_id, self.state)

    def handle_message(self, user_message: str) -> str:
        """
        Simulates agent processing of a user message and updates its internal state.
        Saves state after each interaction.
        """
        self.state["conversation_history"].append({"role": "user", "content": user_message})

        response = ""
        current_step = self.state.get("current_workflow_step")

        # 🧠 Important: This simplified logic demonstrates state transitions.
        # A real ADK agent would use LLM calls, tool execution, etc.
        if current_step == "start":
            response = "Welcome! What is your name?"
            self.state["current_workflow_step"] = "awaiting_name"
        elif current_step == "awaiting_name":
            self.state["context_data"]["user_name"] = user_message
            response = f"Nice to meet you, {user_message}. What task can I help you with?"
            self.state["current_workflow_step"] = "awaiting_task"
        elif current_step == "awaiting_task":
            response = f"Understood. I will help with '{user_message}'. Is that correct? (yes/no)"
            self.state["context_data"]["proposed_task"] = user_message
            self.state["current_workflow_step"] = "confirm_task"
        elif current_step == "confirm_task":
            if user_message.lower() == "yes":
                response = "Great! Starting your task now."
                self.state["current_workflow_step"] = "task_in_progress"
            else:
                response = "Okay, what task should I help with instead?"
                self.state["current_workflow_step"] = "awaiting_task"
        elif current_step == "task_in_progress":
            response = "I'm currently working on your task. What's next?"
        else:
            response = "I'm not sure how to proceed. Let's restart. What is your name?"
            self.state["current_workflow_step"] = "awaiting_name"

        self.state["conversation_history"].append({"role": "agent", "content": response})
        self._save_state() # Save state after every interaction to ensure persistence
        return response

    def get_current_workflow_step(self) -> str:
        """Returns the current step in the agent's workflow."""
        return self.state.get("current_workflow_step", "unknown")

    def get_context_data(self) -> dict:
        """Returns the current context data maintained by the agent."""
        return self.state.get("context_data", {})

Now, create tests/test_e2e_resume.py:

# tests/test_e2e_resume.py
import pytest
import os
from google.cloud import firestore
from my_adk_agent.persistence import FirestoreStateManager
from my_adk_agent.agent import MyADKAgent

# Ensure Firestore emulator is configured for tests
os.environ["FIRESTORE_EMULATOR_HOST"] = "localhost:8080"
os.environ["GCLOUD_PROJECT"] = "test-project-id"

@pytest.fixture(scope="module")
def firestore_client_e2e_fixture():
    """Provides a Firestore client connected to the emulator for E2E tests."""
    client = firestore.Client()
    yield client
    # Clean up specific collection used by E2E tests
    print("\nCleaning up E2E Firestore emulator data...")
    docs = client.collection("e2e_agent_states").stream()
    for doc in docs:
        doc.reference.delete()
    print("E2E Firestore emulator data cleaned.")

def test_agent_pause_resume_workflow(firestore_client_e2e_fixture):
    """
    Tests an end-to-end scenario where an agent workflow is paused and then resumed
    from the correct state and context. This simulates an agent being restarted.
    """
    test_agent_id = "e2e_agent_id_001"
    state_manager = FirestoreStateManager(firestore_client_e2e_fixture, collection_name="e2e_agent_states")

    # --- Part 1: Initial interaction and state saving (Simulate Agent A) ---
    print("\n--- Agent A: Initial conversation ---")
    agent_a = MyADKAgent(test_agent_id, state_manager)
    # Agent A starts, loads state (will be initial state as it's the first run)

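    # Opening message: a brand-new agent is at the "start" step, so it asks for a name
    response0 = agent_a.handle_message("Hello")
    print(f"Agent A Response 0: {response0}")
    assert "What is your name?" in response0, "Agent A should ask for the user's name."
    assert agent_a.get_current_workflow_step() == "awaiting_name", "Agent A should move to awaiting_name step."

    # First interaction: the agent stores the entire message as the user's name
    response1 = agent_a.handle_message("Alice")
    print(f"Agent A Response 1: {response1}")
    assert "Nice to meet you, Alice" in response1, "Agent A should greet Alice."
    assert agent_a.get_current_workflow_step() == "awaiting_task", "Agent A should move to awaiting_task step."
    assert agent_a.get_context_data().get("user_name") == "Alice", "Agent A should store user name."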

    # Second interaction
    response2 = agent_a.handle_message("I need help with booking a flight")
    print(f"Agent A Response 2: {response2}")
    assert "booking a flight" in response2, "Agent A should confirm task."
    assert agent_a.get_current_workflow_step() == "confirm_task", "Agent A should move to confirm_task step."
    assert agent_a.get_context_data().get("proposed_task") == "booking a flight", "Agent A should store proposed task."

    # At this point, agent_a's state is saved to Firestore by `handle_message`
    print(f"Agent A current state: {agent_a.get_current_workflow_step()}, context: {agent_a.get_context_data()}")

    # --- Part 2: Simulate agent shutdown and restart (Simulate Agent B) ---
    print("\n--- Agent B: Resuming conversation ---")
    # Simulate a new instance of the agent, loading state from persistence
    agent_b = MyADKAgent(test_agent_id, state_manager)
    # Agent B initializes, and crucially, loads the state saved by Agent A

    # Verify agent_b has correctly loaded the state from agent_a
    assert agent_b.get_current_workflow_step() == "confirm_task", "Agent B should resume at 'confirm_task' step."
    assert agent_b.get_context_data().get("user_name") == "Alice", "Agent B should have Alice's name in context."
    assert agent_b.get_context_data().get("proposed_task") == "booking a flight", "Agent B should have proposed task in context."
    print(f"Agent B loaded workflow step: {agent_b.get_current_workflow_step()}")
    print(f"Agent B loaded context data: {agent_b.get_context_data()}")

    # Continue the conversation from where Agent A left off
    response3 = agent_b.handle_message("yes")
    print(f"Agent B Response 3: {response3}")
    assert "Starting your task now" in response3, "Agent B should proceed to task start confirmation."
    assert agent_b.get_current_workflow_step() == "task_in_progress", "Agent B should move to 'task_in_progress' step."
    assert agent_b.get_context_data().get("proposed_task") == "booking a flight", "Agent B should retain proposed task after confirmation."

    print("\nE2E Pause/Resume Test Passed!")

Explanation:

Simplified Agent Model: MyADKAgent is a simplified Python class that models the core state-dependent behavior of our ADK agent. It’s not a full ADK agent, but it demonstrates how an agent would interact with our StateManager to save and load its state. This abstraction allows us to test the persistence logic without needing to mock complex ADK internals.
Two Agent Instances: The test creates agent_a to simulate the initial conversation. After agent_a completes a few steps and saves its state, agent_b is instantiated with the same agent_id. This simulates a new process or a restarted agent picking up the conversation.
State Verification: Before agent_b continues, crucial assertions are made to ensure it has loaded the correct current_workflow_step and context_data from agent_a’s last saved state. This is the heart of the pause/resume functionality.
Resumption: agent_b then sends the next logical message (“yes”), and we assert that the conversation proceeds as expected, demonstrating successful resumption from a previously saved state.

Testing & Verification

To run all your tests, ensure the Firestore emulator is running in a separate terminal. Then, navigate to your project’s root directory in the terminal and execute:

pytest

Expected Output:

You should see output similar to this, indicating all tests passed:

============================= test session starts ==============================
platform linux -- Python 3.12.x, pytest-X.X.X, pluggy-X.X.X
rootdir: /path/to/your/project
plugins: mock-X.X.X
collected 6 items

tests/test_persistence_units.py::test_state_serialization_deserialization PASSED [ 16%]
tests/test_persistence_units.py::test_empty_state_serialization PASSED [ 33%]
tests/test_persistence_units.py::test_invalid_json_deserialization PASSED [ 50%]
tests/test_persistence_integration.py::test_save_and_load_agent_state_integration PASSED [ 66%]
tests/test_persistence_integration.py::test_load_non_existent_state PASSED [ 83%]
tests/test_e2e_resume.py::test_agent_pause_resume_workflow PASSED [100%]

============================== 6 passed in X.XXs ===============================

If any test fails, pytest will provide detailed traceback information, pointing you to the exact line where an assertion failed, helping you quickly identify the root cause.

Quick Debugging Checks

Firestore Emulator Status: Is the Firestore emulator running on localhost:8080? If not, pytest will report connection errors for integration and E2E tests.
Environment Variables: Verify that FIRESTORE_EMULATOR_HOST and GCLOUD_PROJECT are correctly set in your test files.
Serialization Logic: If test_persistence_units.py fails, carefully inspect my_adk_agent/persistence.py for issues in serialize_state or deserialize_state, especially concerning complex data types or custom objects that might not convert cleanly to/from JSON.
State Mismatch in E2E: For integration and E2E tests, add print() statements to display the initial_state and loaded_state (or agent_a.state and agent_b.state) at critical points. This allows for direct visual comparison to pinpoint where data loss or corruption occurs.
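The first debugging check can be automated. The sketch below tests whether anything is listening at the emulator address using only the standard library; emulator_is_reachable is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that the emulator itself is healthy.

```python
import os
import socket


def emulator_is_reachable(address=None, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the emulator address succeeds."""
    if address is None:
        # Fall back to the same variable and default port the test files use.
        address = os.environ.get("FIRESTORE_EMULATOR_HOST", "localhost:8080")
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError: nothing listening or host unreachable; ValueError: malformed port.
        return False
```

Calling this before the integration and E2E tests turns an opaque connection error into an immediate, readable failure or skip.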

Production Considerations

Robust testing is a cornerstone of production readiness, especially for stateful AI agents.

CI/CD Integration: Integrate these tests into your Continuous Integration/Continuous Deployment (CI/CD) pipeline. Every code change should trigger the full test suite, including emulator-based integration tests, to catch regressions early before deployment. This ensures that new features or bug fixes don’t inadvertently break state persistence.
Performance Testing: For high-throughput agents handling thousands of concurrent users, the performance of your state persistence layer is critical. Conduct performance tests on save_agent_state and load_agent_state under simulated load. Even a few milliseconds of delay per call can add up to significant latency for users or bottlenecks for the system.
Test Data Management: For complex E2E tests, consider using a data-generation library (e.g., Faker) to produce realistic test data, and seed the generator so that runs stay repeatable. Ensure your test data is always cleaned up between test runs to maintain test isolation and prevent test pollution.
Reliability vs. Speed: Unit tests are fast, providing quick feedback. Integration and E2E tests, involving external dependencies (even emulated ones), will inherently be slower. Balance the depth of testing with the need for quick feedback in your CI pipeline. You might run a subset of fast tests on every commit and the full, slower suite less frequently (e.g., nightly builds or before major deployments).
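For the performance-testing point, a rough latency harness can be written with the standard library alone. measure_latency_ms is a hypothetical helper; it measures sequential wall-clock time for any zero-argument callable and is a starting point, not a substitute for a real load test.

```python
import statistics
import time


def measure_latency_ms(operation, iterations: int = 200) -> dict:
    """Call `operation` repeatedly and summarize wall-clock latency in milliseconds."""
    samples = []
    for _ in range(iterations):
        started = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - started) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cut_points = statistics.quantiles(samples, n=100)
    return {
        "count": len(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": cut_points[94],
        "max_ms": max(samples),
    }
```

In practice the callable would wrap a call such as manager.save_agent_state(agent_id, state) against the emulator or a staging database, and the test would assert that the 95th percentile stays under a budget you choose.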

Common Issues & Solutions

1. Test Flakiness Due to External Dependencies

Issue: Integration or E2E tests intermittently fail without clear cause, especially when interacting with real external services (e.g., a live database, a cloud API). This can be due to network latency, transient service issues, or rate limits.

Solution:

Why it happens: Live external services introduce non-determinism. Network conditions vary, and cloud services can have temporary hiccups.
Use Emulators: As demonstrated, prioritize local emulators (Firestore, Pub/Sub, etc.) for integration tests. They provide a consistent, fast, and isolated environment, eliminating external variability.
Mock External APIs: For services without emulators, use unittest.mock or pytest-mock to replace actual API calls with controlled, predictable responses.
Retries (Limited Use): In very specific E2E scenarios hitting staging environments (not local tests), implementing simple retry logic with exponential backoff for network-related failures might be necessary. Avoid this for local development tests.
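The retry advice above can be sketched as a small helper. retry_with_backoff is a hypothetical function written for this illustration; the choice of exceptions to retry on and the delay values are assumptions to adapt to your environment.

```python
import time


def retry_with_backoff(operation, attempts=4, base_delay=0.5,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `operation`; on a listed exception wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            # Delays grow 0.5s, 1s, 2s, ... with the default base_delay.
            sleep(base_delay * (2 ** attempt))
```

The sleep parameter is injectable so that tests of the helper itself can record the delays instead of actually waiting. Exceptions not listed in retry_on, including assertion failures, propagate immediately, which keeps genuine bugs from being hidden by retries.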

2. Incomplete State Capture in Tests

Issue: An E2E test passes, but later in production, the agent loses some context or data upon resume. This often means the test’s initial_state was too simplistic or didn’t cover all aspects of the agent’s real, evolving state.

Solution:

Why it happens: As agents evolve, new fields or complex data structures are added to their internal state, but the persistence logic or test cases aren’t updated to reflect this.
Comprehensive State Models: Ensure your test initial_state objects are as close as possible to the full complexity of your agent’s actual state, including nested dictionaries, lists, and various data types.
Deep Assertions: Beyond assert loaded_state == initial_state, add specific assertions for critical fields, especially those that are deeply nested or have specific type requirements. This forces a more thorough check.
Review Persistence Logic: Regularly review your agent’s _save_state and load_agent_state methods to ensure all relevant data is being persisted and restored. Consider an explicit schema or Pydantic models for your agent state to prevent accidental omissions.
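One way to make the deep-assertion and explicit-schema advice concrete is a round-trip check that walks every declared field. The sketch below uses a standard-library dataclass instead of Pydantic to stay dependency-free; AgentState, its fields, and the serialize and deserialize helpers are hypothetical stand-ins for the agent's real state model and persistence code.

```python
import json
from dataclasses import asdict, dataclass, field, fields


@dataclass
class AgentState:
    """Illustrative state schema; the field names are assumptions."""

    session_id: str
    step: int = 0
    history: list[dict] = field(default_factory=list)
    pending_tool_calls: dict[str, dict] = field(default_factory=dict)


def serialize(state: AgentState) -> str:
    return json.dumps(asdict(state))


def deserialize(payload: str) -> AgentState:
    return AgentState(**json.loads(payload))


def assert_round_trip(state: AgentState) -> None:
    """Check every declared field individually, so a newly added field
    is covered automatically instead of relying on hand-written asserts."""
    restored = deserialize(serialize(state))
    for f in fields(AgentState):
        original = getattr(state, f.name)
        loaded = getattr(restored, f.name)
        assert type(loaded) is type(original), f"{f.name}: type changed"
        assert loaded == original, f"{f.name}: value changed"


# A fixture that populates every field with non-default, nested data.
initial_state = AgentState(
    session_id="s-1",
    step=4,
    history=[{"role": "user", "text": "book a flight"}],
    pending_tool_calls={"call-1": {"name": "search", "args": {"max_results": 5}}},
)
assert_round_trip(initial_state)
restored_state = deserialize(serialize(initial_state))
```

Because the loop is driven by fields(AgentState), adding a field to the schema extends the check without editing the test, provided the fixture also gives that field a non-default value.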

3. Difficulty Testing Asynchronous Agent Actions

Issue: If your ADK agent performs asynchronous operations (e.g., calling external APIs, waiting for user input, processing long-running tasks), it can be challenging to test the state at specific intermediate points during these operations.

Solution:

Why it happens: Asynchronous operations execute concurrently, making it hard to predict the exact state at a given moment without explicit synchronization.
Mock Asynchronous Calls: Use unittest.mock.AsyncMock (Python 3.8+) to replace asynchronous functions, and a plugin such as pytest-asyncio to run async test functions. Controlling the mocks' return values and side effects makes the asynchronous behavior deterministic for tests.
Introduce Checkpoints: Design your agent with explicit “save points” or state transitions that occur after asynchronous operations complete. Your tests can then assert the state at these known, stable checkpoints.
Event-Driven Testing: For truly complex asynchronous flows, consider an event-driven testing approach where tests assert that specific events are emitted, rather than trying to inspect internal state at every microsecond.
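The mocking and checkpoint ideas above can be combined in a small sketch. CheckpointingAgent is a hypothetical class, not the ADK agent from this chapter, and its injected fetch and save_state coroutines are assumptions made for illustration. The example drives the coroutine with asyncio.run so it needs no plugin; in a pytest suite, an async test managed by pytest-asyncio would do the same job.

```python
import asyncio
from unittest.mock import AsyncMock


class CheckpointingAgent:
    """Hypothetical agent that saves its state at explicit checkpoints."""

    def __init__(self, fetch, save_state):
        self._fetch = fetch
        self._save_state = save_state
        self.state = {"step": "start", "result": None}

    async def run(self) -> None:
        # Checkpoint 1: recorded before the slow external call begins.
        self.state["step"] = "fetching"
        await self._save_state(dict(self.state))

        self.state["result"] = await self._fetch("orders")

        # Checkpoint 2: recorded after the external call completes.
        self.state["step"] = "done"
        await self._save_state(dict(self.state))


# AsyncMock makes the external call resolve instantly with a known value.
fetch = AsyncMock(return_value={"count": 2})
save_state = AsyncMock()

agent = CheckpointingAgent(fetch, save_state)
asyncio.run(agent.run())

# Each awaited save is recorded in order, giving the test stable
# snapshots to assert on.
checkpoints = [call.args[0] for call in save_state.await_args_list]
fetch.assert_awaited_once_with("orders")
```

Passing a copy of the state to each save keeps earlier snapshots from being altered by later mutations, which is what makes the recorded checkpoints trustworthy.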

Summary & Next Step

In this chapter, we’ve established a critical foundation for ensuring the reliability of our long-running ADK agent: a robust testing suite. We successfully implemented:

Unit tests for state serialization and deserialization, verifying the integrity of data conversion.
Integration tests for our Firestore persistence layer, confirming seamless interaction with the database emulator for saving and loading agent states.
End-to-end tests that simulate the entire pause-and-resume workflow, providing high confidence that the agent can pick up a conversation exactly where it left off, even after a restart.

This layered test suite gives us evidence that the agent maintains context and state across sessions. Its core persistence features are now verifiable rather than merely functional, which is essential for user trust and system stability.

Our agent is now capable of persisting its state and has a strong testing framework in place. The next crucial step is to prepare it for actual deployment. In the next chapter, we will focus on Containerization with Docker to package our agent for scalable and portable deployment to Google Cloud.

This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.
