Fallback Strategy

Every layer of the system has a fallback, so the main path stays available under any single-point failure. Core philosophy: prefer degraded service over service interruption.


Seven-Layer Fallback Matrix

| Layer | Component | Failure Scenario | Fallback Target | User Perception |
| --- | --- | --- | --- | --- |
| 1 | LLM | LLM_API_KEY not configured / call failed | _MockLLM assembled reply | Reply is not LLM-generated; retrieval chain works normally |
| 2 | BGE embedding | Model download failed / load timeout | Hash fallback vectors (1024-dim) | Semantic retrieval disabled; flow still runs |
| 3 | LangGraph | Package not installed / build failed / runtime exception | _SynchOrchestrator synchronous orchestration | None; behavior fully consistent |
| 4 | Redis | Connection unreachable | In-memory queue / in-memory dict | Usable on a single machine; sessions lost on restart |
| 5 | Business API | BUSINESS_API_BASE_URL empty / call timeout | Mock business system (in-memory) | Business queries return mock data |
| 6 | Real LLM | Small-model call routed through ModelRouter failed | Falls back to the main LLM (default model) and retries | Occasional latency increase; result normal |
| 7 | Langfuse | Not configured / reporting failed | no-op (silent skip) | No trace data; no impact on main path |

Layer-by-Layer Fallback Details

1. LLM Unavailable → _MockLLM

When LLM_API_KEY is empty, LLMClient auto-instantiates _MockLLM:

class _MockLLM:
    """Mock LLM: fallback when no API Key or call fails.

    Does not call any external service; extracts the last user content from messages
    and assembles a reply. Intent recognition uses keyword rules; answer generation
    uses retrieval result concatenation.
    """
| Behavior | Mock mode | Real mode |
| --- | --- | --- |
| Intent recognition | Keyword rules (sentiment → ticket → business → chitchat → knowledge_qa) | LLM structured recognition + IntentCache |
| Query rewriting | Skipped; uses original query | DeepSeek sync rewrite |
| Answer generation | Retrieval result concatenation | LLM generates from fragments |
| Dialog polish | Skipped; uses raw reply | DialogAgent LLM polish |
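
The keyword-rule intent recognition used in mock mode can be sketched as follows. This is a minimal illustration, not the project's implementation: the keyword sets, `INTENT_RULES`, `last_user_content`, and `classify_intent` are hypothetical names, and only the priority order (sentiment → ticket → business → chitchat → knowledge_qa) comes from the table above.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical keyword sets, checked in the priority order listed in the table.
# The real rule set is not shown in this document.
INTENT_RULES: List[Tuple[str, Set[str]]] = [
    ("sentiment", {"angry", "terrible", "complain"}),
    ("ticket", {"ticket", "escalate"}),
    ("business", {"order", "refund", "return", "account"}),
    ("chitchat", {"hello", "hi", "thanks"}),
]


def last_user_content(messages: List[Dict[str, str]]) -> str:
    """Return the content of the most recent user message, or an empty string."""
    for message in reversed(messages):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""


def classify_intent(messages: List[Dict[str, str]]) -> str:
    """Return the first intent whose keywords appear; default to knowledge_qa."""
    words = set(re.findall(r"[a-z]+", last_user_content(messages).lower()))
    for intent, keywords in INTENT_RULES:
        if words & keywords:
            return intent
    return "knowledge_qa"
```

Because the rules are checked in order, a message that matches several categories resolves to the highest-priority one, and anything unmatched falls through to knowledge_qa.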

Value of mock mode

Mock mode lets the system run the full chain without an LLM key or network access, which suits local development, CI testing, and demos. The retrieval chain (vector + BM25 + RRF + Reranker) still runs in full, so mock mode can be used to verify knowledge base ingestion and retrieval quality.

2. BGE Load Failure → hash fallback

EmbeddingService tries three model sources in order; if all of them fail, it degrades to hash vectorization as the fourth and final level:

# Load order: primary source → mirror source → local cache → hash fallback
# 1. HuggingFace primary source (https://huggingface.co)
# 2. Domestic mirror source (HF_MIRROR_URL, default https://hf-mirror.com)
# 3. Local cache directory (EMBEDDING_LOCAL_CACHE_DIR, default ./models/bge-large-zh)
# 4. hash fallback: deterministic degradation, only ensures the flow is available
class _FallbackEmbedder:
    """Deterministic hash vectorization: only for running the flow.

    Uses hashlib.sha256 to generate 1024-dim vectors, aligned with BGE dimensions
    to avoid vector store schema conflicts. Semantic retrieval is disabled in this mode.
    """
    def encode(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Generate deterministic 1024-dim vectors
        ...

hash fallback impact

The hash fallback only keeps the flow from breaking. Semantic retrieval is completely disabled: identical text yields identical vectors, but semantically similar texts yield uncorrelated vectors. Production environments must ensure the BGE model is available; verify that the embedding mode is bge via GET /api/v1/observability/health.
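
One way to stretch a SHA-256 digest into a deterministic 1024-dim vector is sketched below. The expansion scheme (re-hashing with a counter prefix and unit-normalizing) is an assumption made for illustration; the body of `_FallbackEmbedder.encode` is elided above, and `hash_embed` and `EMBEDDING_DIM` are hypothetical names.

```python
import hashlib
import math
import struct
from typing import List

EMBEDDING_DIM = 1024  # matches the BGE dimension stated above


def hash_embed(text: str, dim: int = EMBEDDING_DIM) -> List[float]:
    """Deterministically map text to a unit-length vector with `dim` components."""
    values: List[float] = []
    counter = 0
    while len(values) < dim:
        # Each SHA-256 digest is 32 bytes, i.e. 8 unsigned 32-bit integers.
        block = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        for (number,) in struct.iter_unpack(">I", block):
            values.append(number / 0xFFFFFFFF * 2.0 - 1.0)  # scale into [-1, 1]
        counter += 1
    values = values[:dim]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]
```

The same input always yields the same vector, so the vector store schema and the write/read path can be exercised, but distances between vectors carry no semantic meaning.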

3. LangGraph Unavailable → Synchronous Orchestrator

When LangGraph is not installed or the build fails, the system degrades to _SynchOrchestrator:

def _get_compiled_graph():
    """Lazily get the compiled LangGraph instance.

    Tries to build on first call; on failure records the error and returns None,
    subsequent calls reuse the result, avoiding retries every time.
    """
    global _compiled_graph, _graph_init_error
    if _compiled_graph is not None:
        return _compiled_graph
    if _graph_init_error is not None:
        return None  # Previously failed, no retry
    try:
        _compiled_graph = _build_lang_graph()
        return _compiled_graph
    except Exception as exc:
        _graph_init_error = exc
        logger.warning("LangGraph build failed, degrading to synchronous orchestrator: %s", exc)
        return None

The synchronous orchestrator reuses the exact same node functions and conditional logic, so its behavior is consistent with the LangGraph version. See Multi-Agent Collaboration · Synchronous Fallback Path.
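
The idea of sharing node functions between the two paths can be sketched as below. The node functions, `BRANCHES`, `run_sync`, and `run` are hypothetical stand-ins; the only external assumption is that the compiled graph object exposes an `invoke(state)` method, as LangGraph compiled graphs do.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)
GraphState = Dict[str, Any]


# Hypothetical node functions, shared by both execution paths.
def intent_node(state: GraphState) -> GraphState:
    intent = "business" if "order" in state["query"].lower() else "knowledge_qa"
    return {**state, "intent": intent}


def business_node(state: GraphState) -> GraphState:
    return {**state, "reply": "business answer"}


def knowledge_node(state: GraphState) -> GraphState:
    return {**state, "reply": "knowledge answer"}


BRANCHES: Dict[str, Callable[[GraphState], GraphState]] = {
    "business": business_node,
    "knowledge_qa": knowledge_node,
}


def run_sync(state: GraphState) -> GraphState:
    """Synchronous path: same nodes, same routing condition, plain function calls."""
    state = intent_node(state)
    return BRANCHES[state["intent"]](state)


def run(state: GraphState, compiled_graph: Optional[Any] = None) -> GraphState:
    """Prefer the compiled graph; use run_sync if it is missing or raises."""
    if compiled_graph is not None:
        try:
            return compiled_graph.invoke(state)
        except Exception as exc:
            logger.warning("Graph run failed, using synchronous path: %s", exc)
    return run_sync(state)
```

Since both paths call the same node functions, any change to a node is picked up by the fallback automatically, which is what keeps the degraded behavior consistent.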

4. Redis Unreachable → In-Memory Queue

When the Redis connection fails, the system degrades to in-process, in-memory data structures:

| Feature | Redis mode | In-memory mode (fallback) |
| --- | --- | --- |
| Session storage | Cross-process persistence | In-process dict, lost on restart |
| Cache | Distributed cache | In-memory dict |
| Message queue | Redis Pub/Sub | In-memory queue |

In-memory mode use cases

In-memory mode is suitable for single-machine development and testing. Multi-instance deployment or production environments requiring session persistence must enable Redis.
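
A session store that degrades to an in-process dict might look roughly like this. It is a sketch under stated assumptions: `KeyValueBackend`, `InMemoryBackend`, and `SessionStore` are hypothetical names, and the primary backend is modeled as any object with `get`/`set` rather than a real Redis client.

```python
import logging
from typing import Dict, Optional, Protocol

logger = logging.getLogger(__name__)


class KeyValueBackend(Protocol):
    """Minimal interface assumed for the primary (Redis-like) backend."""

    def get(self, key: str) -> Optional[str]: ...

    def set(self, key: str, value: str) -> None: ...


class InMemoryBackend:
    """In-process dict; contents are lost when the process restarts."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class SessionStore:
    """Uses the primary backend until it fails once, then stays in memory mode."""

    def __init__(self, primary: Optional[KeyValueBackend] = None) -> None:
        self._backend: KeyValueBackend = primary if primary is not None else InMemoryBackend()
        self.mode = "redis" if primary is not None else "memory"

    def _degrade(self, exc: Exception) -> None:
        logger.warning("Primary store unreachable, degrading to in-memory dict: %s", exc)
        self._backend = InMemoryBackend()
        self.mode = "memory"

    def set(self, key: str, value: str) -> None:
        try:
            self._backend.set(key, value)
        except Exception as exc:
            if self.mode == "memory":
                raise
            self._degrade(exc)
            self._backend.set(key, value)

    def get(self, key: str) -> Optional[str]:
        try:
            return self._backend.get(key)
        except Exception as exc:
            if self.mode == "memory":
                raise
            self._degrade(exc)
            return self._backend.get(key)
```

Data written before the switch is not copied over, which mirrors the trade-off described above: the flow keeps working on one machine, but session state is not durable.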

5. Business API Failure → mock Business System

When BUSINESS_ADAPTER_MODE=http but the configuration is missing or calls fail, the system degrades to the mock business system:

# http mode but BASE_URL empty → auto-degrade to mock with warning
# Business API call timeout/failure → degrade to mock business system

The mock business system provides in-memory order/member/return/account data, ensuring the business query Agent chain is available.
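
The two degradation points described above (empty base URL at startup, failed call at runtime) can be sketched as follows. `AdapterConfig`, `resolve_mode`, `query_order`, and the sample data in `MOCK_ORDERS` are hypothetical; treating "mock" as a valid mode value is also an assumption.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class AdapterConfig:
    mode: str = "mock"   # mirrors BUSINESS_ADAPTER_MODE (assumed values: "http", "mock")
    base_url: str = ""   # mirrors BUSINESS_API_BASE_URL


def resolve_mode(config: AdapterConfig) -> str:
    """http mode without a base URL degrades to mock, with a warning."""
    if config.mode == "http" and not config.base_url.strip():
        logger.warning("http mode configured without a base URL, degrading to mock")
        return "mock"
    return config.mode


# Hypothetical in-memory order data served by the mock business system.
MOCK_ORDERS: Dict[str, Dict[str, Any]] = {
    "A100": {"order_id": "A100", "status": "shipped"},
}


def query_order(order_id: str, mode: str,
                http_call: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Call the real API in http mode; on any failure, answer from mock data."""
    if mode == "http":
        try:
            return http_call(order_id)
        except Exception as exc:
            logger.warning("Business API call failed, degrading to mock: %s", exc)
    return MOCK_ORDERS.get(order_id, {"order_id": order_id, "status": "unknown"})
```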

6. Real LLM Failure → ModelRouter Fallback

When the small-model call routed through ModelRouter fails, the caller falls back to a direct main-LLM call:

try:
    # Prefer small model (Doubao/Qwen), first token ~1s
    raw = get_model_router().chat_with_routing(messages=messages, query=query, ...)
except Exception as exc:
    # ModelRouter call failed: degrade to main LLM direct call, ensure availability
    logger.warning("ModelRouter call failed, degrading main LLM intent recognition: %s", exc)
    raw = self.llm_client.chat(messages, ...)

Dual-Provider routing safety

The small model (Doubao/Qwen) and main LLM (DeepSeek) are different Providers, with incompatible model names. When the small model is not configured, the system uses the main LLM directly, avoiding call failures from model name switching.
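
The safety property described above amounts to never pairing one provider's model name with another provider's endpoint. A minimal sketch, assuming each provider is described by one self-contained config object (`ProviderConfig` and `select_provider` are hypothetical names, and the model names in the usage example are placeholders):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderConfig:
    provider: str  # e.g. the small-model provider or the main LLM provider
    model: str     # model name valid only for this provider
    api_key: str


def select_provider(small: Optional[ProviderConfig],
                    main: ProviderConfig) -> ProviderConfig:
    """Use the small model only when it is fully configured; otherwise use main.

    Returning a whole config keeps the model name and its provider together,
    so a model name is never sent to a provider that does not know it.
    """
    if small is not None and small.api_key and small.model:
        return small
    return main
```

For example, `select_provider(None, ProviderConfig("main", "main-model-name", "key"))` returns the main config unchanged rather than swapping only the model name.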

7. Langfuse Not Configured → no-op

When LANGFUSE_ENABLED=False or the key is empty, LangfuseClient degrades to a no-op:

# When not enabled, start_langfuse_trace returns None
langfuse_trace = start_langfuse_trace(name="run_graph", metadata={...})

# finish_langfuse_trace(None, ...) is a no-op, silently skipped
finish_langfuse_trace(langfuse_trace, status="success")

LLM calls fall back to the native OpenAI SDK (not wrapped by langfuse.openai), so the main path is completely unaffected.
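
The None-handle pattern behind this no-op can be sketched without the Langfuse SDK. `TraceHandle`, `start_trace`, and `finish_trace` are hypothetical stand-ins for illustration; they do not reproduce `start_langfuse_trace` / `finish_langfuse_trace` or any Langfuse API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TraceHandle:
    """Stand-in for a real trace object; only records what was reported."""
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    status: Optional[str] = None


def start_trace(enabled: bool, api_key: str, name: str,
                metadata: Optional[Dict[str, Any]] = None) -> Optional[TraceHandle]:
    """Return None when tracing is disabled or the key is empty."""
    if not enabled or not api_key:
        return None
    return TraceHandle(name=name, metadata=dict(metadata or {}))


def finish_trace(trace: Optional[TraceHandle], status: str) -> None:
    """Accept None so callers never need to branch on whether tracing is on."""
    if trace is None:
        return
    trace.status = status
```

Callers write the same two lines whether or not tracing is configured, which keeps the observability concern out of the business path.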


Fallback Implementation Mechanism

try/except + warning log

Every call that can fail is wrapped in try/except. On failure, a warning is logged and the code degrades to the fallback path instead of raising to the caller:

# Typical pattern: cache read failure does not block the main path
try:
    cached_intent = get_intent_cache().get(query)
    if cached_intent is not None:
        return cached_intent
except Exception as exc:
    # Cache read failure does not block the main path; proceed normally to LLM
    logger.warning("Intent cache read failed, degrading to LLM intent recognition: %s", exc)
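
The same try/except-and-warn pattern can be factored into a small decorator. This is one possible shape, not something the codebase is known to contain; `with_fallback` and the example functions are hypothetical.

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def with_fallback(fallback: Callable[..., Any], label: str) -> Callable:
    """Wrap a function so any exception logs a warning and calls `fallback`."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                logger.warning("%s failed, degrading to fallback: %s", label, exc)
                return fallback(*args, **kwargs)
        return wrapper
    return decorator


def keyword_intent(query: str) -> str:
    """Hypothetical degraded path."""
    return "knowledge_qa"


@with_fallback(keyword_intent, label="LLM intent recognition")
def llm_intent(query: str) -> str:
    """Hypothetical primary path that is currently failing."""
    raise TimeoutError("LLM unavailable")
```

The fallback receives the same arguments as the wrapped function, so the two must share a signature.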

Singleton Reset

A degraded singleton marks its state and does not retry afterward (to avoid repeated exception overhead). After tests or config changes, use reset_* functions to clear the cache and re-initialize:

| Reset function | Effect |
| --- | --- |
| reset_graph() | Resets LangGraph compile cache and synchronous orchestrator instance |
| reset_orchestrator() | Resets OrchestratorAgent singleton |
| reset_hybrid_retriever() | Resets HybridRetriever singleton |
| get_settings.cache_clear() | Resets Settings config singleton |

Reset usage in tests

from app.agents.graph import reset_graph
from app.agents.orchestrator import reset_orchestrator

def test_with_new_config():
    # After modifying config, reset singletons for new config to take effect
    reset_graph()
    reset_orchestrator()
    # Subsequent calls will re-initialize
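
How a reset clears both the cached instance and the "previously failed" marker can be sketched as below. `get_service` and `reset_service` are hypothetical names modeled on the `_get_compiled_graph` pattern shown earlier, and backing `get_settings` with `functools.lru_cache` is an assumption suggested by the `cache_clear()` call, not confirmed by the source.

```python
import logging
import os
from dataclasses import dataclass
from functools import lru_cache
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Settings:
    llm_api_key: str


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Cached settings; get_settings.cache_clear() forces a re-read."""
    return Settings(llm_api_key=os.environ.get("LLM_API_KEY", ""))


_service: Optional[Any] = None
_service_error: Optional[Exception] = None


def get_service(factory: Callable[[], Any]) -> Optional[Any]:
    """Build once; after a failure, return None without retrying."""
    global _service, _service_error
    if _service is not None:
        return _service
    if _service_error is not None:
        return None
    try:
        _service = factory()
    except Exception as exc:
        _service_error = exc
        logger.warning("Service init failed, staying degraded: %s", exc)
    return _service


def reset_service() -> None:
    """Clear the instance and the failure marker so the next call retries."""
    global _service, _service_error
    _service = None
    _service_error = None
```

Without the reset, a test that fixes the configuration after a failed build would still see the degraded path, because the failure marker short-circuits every later call.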

Monitoring Instrumentation Fault Tolerance

A failure in monitoring instrumentation likewise does not affect the business path:

def _record_step_safe(trace_id, node, input_summary, output_summary, duration_ms):
    """Safely record a node step: any exception does not affect the main path.

    On instrumentation failure, only logs, to avoid the observability system
    taking down the business path.
    """
    if not trace_id:
        return
    try:
        get_monitor().record_step(trace_id, node, input_summary, output_summary, duration_ms)
    except Exception as exc:
        logger.warning("Monitoring instrumentation failed node=%s err=%s", node, exc)

Fault Self-Healing: circuit_breaker

The system has a built-in CircuitBreaker that automatically trips on external calls that fail repeatedly and routes them to the fallback path, then recovers through half-open probing:

State Machine

stateDiagram-v2
    [*] --> Closed: Initial state
    Closed --> Open: Consecutive failures reach threshold
    Open --> HalfOpen: After cooldown
    HalfOpen --> Closed: Probe success
    HalfOpen --> Open: Probe failure
    Closed --> [*]

Three States

| State | Behavior | Description |
| --- | --- | --- |
| CLOSED | Normal call | Default state; requests pass through normally |
| OPEN | Direct rejection | Entered once consecutive failures reach the threshold; requests go straight to the fallback path without calling the target service |
| HALF_OPEN | Probe pass | After the cooldown, one probe request is let through; success restores CLOSED, failure returns to OPEN |

Workflow

class CircuitBreaker:
    """Circuit breaker: CLOSED → OPEN → HALF_OPEN → CLOSED.

    Trips (OPEN) after consecutive failures reach threshold; after cooldown enters
    half-open (HALF_OPEN), passes one probe request; success restores (CLOSED),
    failure re-trips (OPEN).
    """

    def call(self, func, *args, **kwargs):
        # CLOSED: normal call, record success/failure
        # OPEN: directly raise CircuitBreakerOpenError, go to fallback path
        # HALF_OPEN: pass one probe
        ...

    def record_success(self):
        # HALF_OPEN → CLOSED (probe success, service restored)
        # CLOSED: reset failure count
        ...

    def record_failure(self):
        # HALF_OPEN → OPEN (probe failure, re-trip)
        # CLOSED: accumulate failures; reach threshold → OPEN
        ...

Value of circuit breaking

When an external service (e.g., the LLM API) is persistently unavailable, the circuit breaker routes requests straight to the fallback path instead of waiting for a timeout (e.g., 30s) on every request, which significantly reduces user-perceived latency. Once the service recovers, half-open probing switches back automatically, with no manual intervention.
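
A runnable version of the three-state machine is sketched below. It follows the transitions described above, but the constructor parameters (`failure_threshold`, `cooldown_seconds`, `clock`), their defaults, and the `BreakerState` enum are assumptions, and this single-threaded sketch does no locking.

```python
import time
from enum import Enum
from typing import Any, Callable


class BreakerState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreakerOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self.state = BreakerState.CLOSED
        self.failures = 0
        self._opened_at = 0.0

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state is BreakerState.OPEN:
            if self._clock() - self._opened_at < self.cooldown_seconds:
                raise CircuitBreakerOpenError("circuit is open")
            self.state = BreakerState.HALF_OPEN  # cooldown elapsed: allow a probe
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.record_failure()
            raise
        self.record_success()
        return result

    def record_success(self) -> None:
        # HALF_OPEN → CLOSED on probe success; CLOSED just resets the count.
        self.state = BreakerState.CLOSED
        self.failures = 0

    def record_failure(self) -> None:
        # A failed probe re-trips immediately; otherwise trip at the threshold.
        self.failures += 1
        if self.state is BreakerState.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = BreakerState.OPEN
            self._opened_at = self._clock()
```

Callers are expected to catch `CircuitBreakerOpenError` and take the fallback path. Injecting `clock` makes the cooldown testable without sleeping.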

Circuit Breaker Statistics

View each component's circuit state via GET /api/v1/observability/health:

{
  "llm": {"state": "closed", "failures": 0},
  "vectorstore": {"state": "closed", "failures": 0},
  "redis": {"state": "open", "failures": 5}
}
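
A monitoring script could pick out tripped components from such a response as sketched below. The sketch assumes the payload has exactly the shape shown above (component name mapped to an object with a `state` field); `open_components` is a hypothetical helper, and fetching the endpoint is left out.

```python
import json
from typing import List


def open_components(health_json: str) -> List[str]:
    """Return the sorted names of components whose breaker state is "open"."""
    payload = json.loads(health_json)
    return sorted(
        name
        for name, info in payload.items()
        if isinstance(info, dict) and info.get("state") == "open"
    )
```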

Fallback Design Summary

flowchart TD
    Req[User Request] --> L1{"LLM available?"}
    L1 -- "No" --> Mock["_MockLLM assembled"]
    L1 -- "Yes" --> L2{"LangGraph available?"}
    L2 -- "No" --> Sync["Synchronous orchestrator"]
    L2 -- "Yes" --> LG["LangGraph state machine"]
    Mock --> L3{"BGE available?"}
    Sync --> L3
    LG --> L3
    L3 -- "No" --> Hash["hash fallback vectors"]
    L3 -- "Yes" --> BGE["BGE semantic retrieval"]
    Hash --> L4{"Redis available?"}
    BGE --> L4
    L4 -- "No" --> Mem["In-memory queue"]
    L4 -- "Yes" --> Redis["Redis persistence"]
    Mem --> Reply([Return reply])
    Redis --> Reply

    style Mock fill:#fff9c4,stroke:#fbc02d
    style Sync fill:#fff9c4,stroke:#fbc02d
    style Hash fill:#ffebee,stroke:#f44336
    style Mem fill:#fff9c4,stroke:#fbc02d

Design principles

  • Fallback-first: Every layer has fallback guarantees; single-point failure does not interrupt service
  • Seamless switching: Post-degradation behavior is as consistent as possible with normal mode (e.g., synchronous orchestrator reuses the same node functions)
  • Observable: Warnings are logged on degradation; component status can be viewed via the health check endpoint
  • Self-healing: Circuit breaker half-open probing auto-recovers, without manual intervention
  • No retry: Degraded singletons mark their state and do not retry, avoiding repeated exception overhead (tests can manually reset)

| Topic | Link |
| --- | --- |
| Multi-Agent collaboration (incl. LangGraph fallback details) | Multi-Agent Collaboration |
| RAG retrieval pipeline (incl. BGE/Reranker fallback) | RAG Retrieval Pipeline |
| Architecture (incl. fallback design principles) | Architecture |
| Configuration guide (fallback-related options) | Configuration Guide |