Fallback Strategy
The system provides a fallback at every layer, so the main path stays available when any single component fails. Core philosophy: prefer degraded service over service interruption.
Seven-Layer Fallback Matrix
| Layer | Component | Failure Scenario | Fallback Target | User Perception |
|---|---|---|---|---|
| 1 | LLM | LLM_API_KEY not configured / call failed | _MockLLM assembled reply | Reply not LLM-generated, retrieval chain normal |
| 2 | BGE embedding | Model download failed / load timeout | hash fallback vectors (1024-dim) | Semantic retrieval disabled, flow runs |
| 3 | LangGraph | Package not installed / build failed / runtime exception | _SynchOrchestrator synchronous orchestration | No perception, behavior fully consistent |
| 4 | Redis | Connection unreachable | In-memory queue / in-memory dict | Single-machine usable, sessions lost on restart |
| 5 | Business API | BUSINESS_API_BASE_URL empty / call timeout | mock business system (in-memory) | Business query returns mock data |
| 6 | Real LLM | Small-model call through ModelRouter failed | Falls back to the main LLM and retries | Occasional latency increase, result normal |
| 7 | Langfuse | Not configured / reporting failed | no-op (silent skip) | No trace data, no impact on main path |
Layer-by-Layer Fallback Details
1. LLM Unavailable → _MockLLM
When LLM_API_KEY is empty, LLMClient auto-instantiates _MockLLM:
class _MockLLM:
    """Mock LLM: fallback when no API Key or call fails.

    Does not call any external service; extracts the last user content from messages
    and assembles a reply. Intent recognition uses keyword rules; answer generation
    uses retrieval result concatenation.
    """
| Behavior | Mock mode | Real mode |
|---|---|---|
| Intent recognition | Keyword rules (sentiment → ticket → business → chitchat → knowledge_qa) | LLM structured recognition + IntentCache |
| Query rewriting | Skipped, uses original query | DeepSeek sync rewrite |
| Answer generation | Retrieval result concatenation | LLM generates from fragments |
| Dialog polish | Skipped, uses raw reply | DialogAgent LLM polish |
Value of mock mode
Mock mode lets the system run the full chain without an LLM Key or network, ideal for local development, CI testing, and demos. The retrieval chain (vector + BM25 + RRF + Reranker) runs in full, which can verify knowledge base ingestion and retrieval effectiveness.
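The keyword-rule intent recognition listed in the table above can be pictured with a short sketch. This is an illustration only, not the project's code: `_KEYWORD_RULES`, the keywords in it, and `classify_intent` are hypothetical. Only the priority order (sentiment → ticket → business → chitchat → knowledge_qa) is taken from the table.

```python
from typing import List, Tuple

# Hypothetical keyword table; the real _MockLLM rules are not shown in this document.
# Order matters: earlier entries win, matching the priority listed in the table.
_KEYWORD_RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("sentiment", ("angry", "terrible", "complain")),
    ("ticket", ("ticket", "human agent")),
    ("business", ("order", "refund", "account")),
    ("chitchat", ("hello", "thanks")),
]


def classify_intent(query: str) -> str:
    """Return the first intent whose keywords appear in the query."""
    text = query.lower()
    for intent, keywords in _KEYWORD_RULES:
        if any(keyword in text for keyword in keywords):
            return intent
    # Nothing matched: treat the query as a knowledge-base question.
    return "knowledge_qa"
```

Because the rules are checked in order, a query that mentions both a complaint and an order is classified as `sentiment`, which is the behavior the priority chain implies.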
2. BGE Load Failure → hash fallback
EmbeddingService tries three model sources in order and, if all of them fail, degrades to hash vectorization as the fourth and last level:
import hashlib
from typing import List

# Load order: primary source → mirror source → local cache → hash fallback
# 1. HuggingFace primary source (https://huggingface.co)
# 2. Domestic mirror source (HF_MIRROR_URL, default https://hf-mirror.com)
# 3. Local cache directory (EMBEDDING_LOCAL_CACHE_DIR, default ./models/bge-large-zh)
# 4. hash fallback: deterministic degradation, only ensures the flow is available
class _FallbackEmbedder:
    """Deterministic hash vectorization: only for running the flow.

    Uses hashlib.sha256 to generate 1024-dim vectors, aligned with BGE dimensions
    to avoid vector store schema conflicts. Semantic retrieval is disabled in this mode.
    """

    def encode(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Generate deterministic 1024-dim vectors
        ...
hash fallback impact
hash fallback only ensures the flow does not break; semantic retrieval is completely disabled (identical text has identical vectors, but semantically similar text has no correlation). Production environments must ensure the BGE model is available; check whether the embedding mode is bge via GET /api/v1/observability/health.
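The expansion step elided in `_FallbackEmbedder.encode` is not shown in this document. One plausible way to stretch a 32-byte SHA-256 digest to 1024 dimensions is to re-hash the text with a counter. The sketch below assumes that approach; `hash_embed` and `EMBEDDING_DIM` are hypothetical names, and the project may expand the digest differently.

```python
import hashlib
import struct
from typing import List

EMBEDDING_DIM = 1024  # matches the BGE dimension stated above


def hash_embed(text: str, dim: int = EMBEDDING_DIM) -> List[float]:
    """Expand SHA-256 digests into a deterministic vector of `dim` floats in [-1, 1]."""
    values: List[float] = []
    counter = 0
    while len(values) < dim:
        # Re-hash with a counter so a 32-byte digest can be extended to any length.
        block = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Read the 32 bytes as 8 unsigned 32-bit integers and scale each to [-1, 1].
        for number in struct.unpack(">8I", block):
            values.append(number / 0xFFFFFFFF * 2.0 - 1.0)
        counter += 1
    return values[:dim]
```

Identical inputs always produce the same vector, but a one-character change produces an unrelated one, which is why similarity search over these vectors carries no semantic signal.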
3. LangGraph Unavailable → Synchronous Orchestrator
When LangGraph is not installed or the graph fails to build, orchestration degrades to _SynchOrchestrator:
def _get_compiled_graph():
    """Lazily get the compiled LangGraph instance.

    Tries to build on first call; on failure records the error and returns None,
    subsequent calls reuse the result, avoiding retries every time.
    """
    global _compiled_graph, _graph_init_error
    if _compiled_graph is not None:
        return _compiled_graph
    if _graph_init_error is not None:
        return None  # Previously failed, no retry
    try:
        _compiled_graph = _build_lang_graph()
        return _compiled_graph
    except Exception as exc:
        _graph_init_error = exc
        logger.warning("LangGraph build failed, degrading to synchronous orchestrator: %s", exc)
        return None
The synchronous orchestrator reuses the exact same node functions and conditional logic, with behavior consistent with the LangGraph version. See Multi-Agent Collaboration · Synchronous Fallback Path.
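The idea of reusing the same node functions outside a graph runtime can be sketched as follows. Every name here (`recognize_intent`, `route_by_intent`, `run_sync`, the node bodies) is hypothetical and far simpler than the real nodes. The point is only that nodes written as plain `state -> state` callables can be driven by ordinary Python control flow.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


# Hypothetical node functions; in a graph-based setup the same callables
# would be registered as graph nodes.
def recognize_intent(state: State) -> State:
    intent = "business" if "order" in state["query"].lower() else "knowledge_qa"
    return {**state, "intent": intent}


def answer_from_knowledge(state: State) -> State:
    return {**state, "reply": f"[kb] {state['query']}"}


def answer_from_business(state: State) -> State:
    return {**state, "reply": f"[biz] {state['query']}"}


def route_by_intent(state: State) -> str:
    """Conditional edge: pick the next node name from the recognized intent."""
    return "business" if state["intent"] == "business" else "knowledge"


NODES: Dict[str, Callable[[State], State]] = {
    "knowledge": answer_from_knowledge,
    "business": answer_from_business,
}


def run_sync(query: str) -> State:
    """Synchronous fallback: call the same nodes with plain control flow."""
    state = recognize_intent({"query": query})
    return NODES[route_by_intent(state)](state)
```

Keeping the routing decision in one function (`route_by_intent` here) means the graph and the synchronous path cannot drift apart on branching logic.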
4. Redis Unreachable → In-Memory Queue
When the Redis connection fails, the system degrades to in-process, in-memory data structures:
| Feature | Redis mode | In-memory mode (fallback) |
|---|---|---|
| Session storage | Cross-process persistence | In-process dict, lost on restart |
| Cache | Distributed cache | In-memory dict |
| Message queue | Redis Pub/Sub | In-memory queue |
In-memory mode use cases
In-memory mode is suitable for single-machine development and testing. Multi-instance deployment or production environments requiring session persistence must enable Redis.
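A minimal sketch of the session-storage fallback, under stated assumptions: `SessionStore`, `InMemorySessionStore`, and `build_session_store` are hypothetical names, and the Redis connection is represented by a caller-supplied `connect` callable so the example runs without a Redis client installed.

```python
import logging
from typing import Callable, Dict, Optional, Protocol

logger = logging.getLogger(__name__)


class SessionStore(Protocol):
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class InMemorySessionStore:
    """In-process dict store: contents are lost when the process restarts."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def build_session_store(connect: Callable[[], SessionStore]) -> SessionStore:
    """Use the store returned by `connect`; fall back to memory if it raises."""
    try:
        return connect()
    except Exception as exc:
        logger.warning("Redis unreachable, degrading to in-memory store: %s", exc)
        return InMemorySessionStore()
```

Because both stores expose the same `get`/`set` surface, callers do not need to know which one they received.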
5. Business API Failure → mock Business System
When BUSINESS_ADAPTER_MODE=http but the configuration is missing or a call fails, the adapter degrades to the mock business system:
# http mode but BASE_URL empty → auto-degrade to mock with warning
# Business API call timeout/failure → degrade to mock business system
The mock business system provides in-memory order/member/return/account data, ensuring the business query Agent chain is available.
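The two degradation triggers (empty base URL, failed call) can be sketched like this. The example is hypothetical throughout: `query_order`, `_MOCK_ORDERS`, the `/orders/{id}` path, and the injected `http_get` callable are illustrative stand-ins, not the project's adapter API.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)

# Hypothetical in-memory fixture standing in for the mock business system.
_MOCK_ORDERS: Dict[str, Dict[str, Any]] = {
    "A1001": {"order_id": "A1001", "status": "shipped", "source": "mock"},
}


def _mock_order(order_id: str) -> Dict[str, Any]:
    default = {"order_id": order_id, "status": "unknown", "source": "mock"}
    return _MOCK_ORDERS.get(order_id, default)


def query_order(
    order_id: str,
    base_url: str,
    http_get: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Query an order over HTTP; degrade to mock data on missing config or failure."""
    if not base_url:
        logger.warning("Business API base URL is empty, degrading to mock business system")
        return _mock_order(order_id)
    try:
        # The URL layout is an assumption made for this sketch.
        return http_get(f"{base_url}/orders/{order_id}")
    except Exception as exc:
        logger.warning("Business API call failed, degrading to mock: %s", exc)
        return _mock_order(order_id)
```

Tagging mock records (the `source` field here) is one way to let downstream code and logs distinguish degraded answers from real ones.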
6. Real LLM Failure → ModelRouter Fallback
Intent recognition first goes through ModelRouter, which prefers the small model; when that call fails, the caller falls back to a direct main LLM call:
try:
    # Prefer small model (Doubao/Qwen), first token ~1s
    raw = get_model_router().chat_with_routing(messages=messages, query=query, ...)
except Exception as exc:
    # ModelRouter call failed: degrade to main LLM direct call, ensure availability
    logger.warning("ModelRouter call failed, degrading to main LLM intent recognition: %s", exc)
    raw = self.llm_client.chat(messages, ...)
Dual-Provider routing safety
The small model (Doubao/Qwen) and main LLM (DeepSeek) are different Providers, with incompatible model names. When the small model is not configured, the system uses the main LLM directly, avoiding call failures from model name switching.
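One way to keep model names from leaking across providers is to treat endpoint and model name as a single unit and select whole providers rather than swapping only the model string. The sketch below assumes that design; `ProviderConfig` and `build_call_chain` are hypothetical, and the URLs and model names are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProviderConfig:
    """Endpoint and model name always travel together."""

    name: str
    base_url: str
    model: str


def build_call_chain(
    small: Optional[ProviderConfig], main: ProviderConfig
) -> List[ProviderConfig]:
    """Order in which providers are tried; each entry keeps its own model name."""
    return [main] if small is None else [small, main]
```

With no small-model provider configured, the chain contains only the main provider, so no request is ever sent with a model name the target endpoint does not recognize.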
7. Langfuse Not Configured → no-op
When LANGFUSE_ENABLED=False or Key is empty, LangfuseClient degrades to no-op:
# When not enabled, start_langfuse_trace returns None
langfuse_trace = start_langfuse_trace(name="run_graph", metadata={...})
# finish_langfuse_trace(None, ...) is a no-op, silently skipped
finish_langfuse_trace(langfuse_trace, status="success")
LLM calls fall back to the native OpenAI SDK (not wrapped by langfuse.openai), so the main path is completely unaffected.
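The None-handle convention used by the two trace helpers can be sketched with simplified stand-ins. Nothing below calls the Langfuse SDK: `_Trace`, `start_trace`, and `finish_trace` are hypothetical and only show how a `None` return value makes every later call a silent no-op.

```python
from typing import Optional


class _Trace:
    """Hypothetical stand-in for a real trace handle."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.status: Optional[str] = None


def start_trace(name: str, enabled: bool, public_key: str = "") -> Optional[_Trace]:
    """Return a trace handle, or None when tracing is disabled or unconfigured."""
    if not enabled or not public_key:
        return None
    return _Trace(name)


def finish_trace(trace: Optional[_Trace], status: str) -> None:
    """Close the trace; a None handle is skipped silently."""
    if trace is None:
        return
    trace.status = status
```

Call sites stay identical in both modes: they always call `start_trace` and `finish_trace` and never branch on whether tracing is enabled.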
Fallback Implementation Mechanism
try/except + warning log
Every call that can fail is wrapped in try/except; on failure, a warning is logged and the fallback path is taken, and the exception is not propagated to the caller:
# Typical pattern: cache read failure does not block the main path
try:
    cached_intent = get_intent_cache().get(query)
    if cached_intent is not None:
        return cached_intent
except Exception as exc:
    # Cache read failure does not block the main path; proceed normally to LLM
    logger.warning("Intent cache read failed, degrading to LLM intent recognition: %s", exc)
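The same try/except-and-warn pattern can be factored into a decorator when many call sites need it. This is a sketch of one possible shape, not something the project is stated to use; `with_fallback`, `keyword_intent`, and `llm_intent` are hypothetical names.

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def with_fallback(fallback: Callable[..., Any]) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator: run `fallback` with the same arguments if the wrapped call raises."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                logger.warning(
                    "%s failed, degrading to %s: %s", func.__name__, fallback.__name__, exc
                )
                return fallback(*args, **kwargs)

        return wrapper

    return decorator


def keyword_intent(query: str) -> str:
    """Cheap local fallback used when the primary call fails."""
    return "knowledge_qa"


@with_fallback(keyword_intent)
def llm_intent(query: str) -> str:
    raise TimeoutError("LLM API timed out")  # simulated failure for the example
```

Calling `llm_intent("hi")` logs one warning and returns the value produced by `keyword_intent`, so the caller never sees the exception.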
Singleton Reset
A degraded singleton marks its state and does not retry afterward (to avoid repeated exception overhead). After tests or config changes, use reset_* functions to clear the cache and re-initialize:
| Reset function | Effect |
|---|---|
| reset_graph() | Resets LangGraph compile cache and synchronous orchestrator instance |
| reset_orchestrator() | Resets OrchestratorAgent singleton |
| reset_hybrid_retriever() | Resets HybridRetriever singleton |
| get_settings.cache_clear() | Resets Settings config singleton |
Reset usage in tests
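A minimal sketch of the reset pattern, assuming the settings accessor is cached with `functools.lru_cache` (the `cache_clear()` call in the table suggests this, but the document does not confirm it). The `get_settings` body and `reload_settings` helper below are hypothetical stand-ins.

```python
import functools
import os
from typing import Dict


@functools.lru_cache(maxsize=1)
def get_settings() -> Dict[str, str]:
    """Hypothetical stand-in for the cached Settings singleton."""
    return {"llm_api_key": os.environ.get("LLM_API_KEY", "")}


def reload_settings() -> Dict[str, str]:
    """Drop the cached instance so the next read sees the current environment."""
    get_settings.cache_clear()
    return get_settings()
```

In a test, the environment would be changed first and the cache cleared afterward; without the `cache_clear()` call, the accessor keeps returning the values captured on first use.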
Monitoring Instrumentation Fault Tolerance
A failure in monitoring instrumentation likewise does not affect the business path:
def _record_step_safe(trace_id, node, input_summary, output_summary, duration_ms):
    """Safely record a node step: any exception does not affect the main path.

    On instrumentation failure, only logs, to avoid the observability system
    taking down the business path.
    """
    if not trace_id:
        return
    try:
        get_monitor().record_step(trace_id, node, input_summary, output_summary, duration_ms)
    except Exception as exc:
        logger.warning("Monitoring instrumentation failed node=%s err=%s", node, exc)
Fault Self-Healing: circuit_breaker
The system has a built-in CircuitBreaker that trips automatically on external calls that fail repeatedly, routes them to the fallback path, and recovers through half-open probing:
State Machine
stateDiagram-v2
[*] --> Closed: Initial state
Closed --> Open: Consecutive failures reach threshold
Open --> HalfOpen: After cooldown
HalfOpen --> Closed: Probe success
HalfOpen --> Open: Probe failure
Closed --> [*]
Three States
| State | Behavior | Description |
|---|---|---|
| CLOSED | Normal call | Default state, requests pass normally |
| OPEN | Direct rejection | Trips after consecutive failures reach threshold; requests go directly to the fallback path, no longer calling the target service |
| HALF_OPEN | Probe pass | After cooldown, passes one probe request; success restores CLOSED, failure returns to OPEN |
Workflow
class CircuitBreaker:
    """Circuit breaker: CLOSED → OPEN → HALF_OPEN → CLOSED.

    Trips (OPEN) after consecutive failures reach threshold; after cooldown enters
    half-open (HALF_OPEN), passes one probe request; success restores (CLOSED),
    failure re-trips (OPEN).
    """

    def call(self, func, *args, **kwargs):
        # CLOSED: normal call, record success/failure
        # OPEN: directly raise CircuitBreakerOpenError, go to fallback path
        # HALF_OPEN: pass one probe
        ...

    def record_success(self):
        # HALF_OPEN → CLOSED (probe success, service restored)
        # CLOSED: reset failure count
        ...

    def record_failure(self):
        # HALF_OPEN → OPEN (probe failure, re-trip)
        # CLOSED: accumulate failures; reach threshold → OPEN
        ...
Value of circuit breaking
When an external service (e.g., LLM API) is persistently unavailable, the circuit breaker goes directly to the fallback path, avoiding waiting for timeout on every request (e.g., 30s), significantly reducing user-perceived latency. After service recovery, half-open probing automatically switches back, without manual intervention.
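The three-state behavior described above can be made concrete with a small runnable sketch. It is not the project's `CircuitBreaker`: the class name `SimpleCircuitBreaker`, the parameter names, the default threshold and cooldown, and the injectable `clock` are all assumptions chosen for illustration, and the sketch is not thread-safe.

```python
import time
from typing import Any, Callable


class CircuitBreakerOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class SimpleCircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self.state = "closed"
        self.failures = 0
        self._opened_at = 0.0

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            if self._clock() - self._opened_at < self.cooldown_seconds:
                # Still cooling down: reject without touching the target service.
                raise CircuitBreakerOpenError("circuit is open")
            self.state = "half_open"  # cooldown elapsed: let one probe through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.record_failure()
            raise
        self.record_success()
        return result

    def record_success(self) -> None:
        # A successful probe or normal call closes the circuit and clears the count.
        self.state = "closed"
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        # A failed probe re-trips immediately; otherwise trip at the threshold.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
```

A caller would catch `CircuitBreakerOpenError` and go straight to the fallback path. Injecting the clock keeps tests fast, because the cooldown can be skipped by advancing a fake time source instead of sleeping.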
Circuit Breaker Statistics
View each component's circuit state via GET /api/v1/observability/health:
{
  "llm": {"state": "closed", "failures": 0},
  "vectorstore": {"state": "closed", "failures": 0},
  "redis": {"state": "open", "failures": 5}
}
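A health report in this shape is easy to check from a script or alerting job. The helper below is a sketch that assumes the response body looks exactly like the sample above; `open_components` is a hypothetical name, and the HTTP request itself is left out so the example stays self-contained.

```python
import json
from typing import Any, Dict, List


def open_components(health_payload: str) -> List[str]:
    """Return the names of components whose breaker is not closed."""
    report: Dict[str, Dict[str, Any]] = json.loads(health_payload)
    return sorted(name for name, info in report.items() if info.get("state") != "closed")


SAMPLE = """
{
  "llm": {"state": "closed", "failures": 0},
  "vectorstore": {"state": "closed", "failures": 0},
  "redis": {"state": "open", "failures": 5}
}
"""
```

For the sample payload, `open_components(SAMPLE)` returns `["redis"]`.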
Fallback Design Summary
flowchart TD
Req[User Request] --> L1{"LLM available?"}
L1 -- "No" --> Mock["_MockLLM assembled"]
L1 -- "Yes" --> L2{"LangGraph available?"}
L2 -- "No" --> Sync["Synchronous orchestrator"]
L2 -- "Yes" --> LG["LangGraph state machine"]
Mock --> L3{"BGE available?"}
Sync --> L3
LG --> L3
L3 -- "No" --> Hash["hash fallback vectors"]
L3 -- "Yes" --> BGE["BGE semantic retrieval"]
Hash --> L4{"Redis available?"}
BGE --> L4
L4 -- "No" --> Mem["In-memory queue"]
L4 -- "Yes" --> Redis["Redis persistence"]
Mem --> Reply([Return reply])
Redis --> Reply
style Mock fill:#fff9c4,stroke:#fbc02d
style Sync fill:#fff9c4,stroke:#fbc02d
style Hash fill:#ffebee,stroke:#f44336
style Mem fill:#fff9c4,stroke:#fbc02d
Design principles
- Fallback-first: Every layer has fallback guarantees; single-point failure does not interrupt service
- Seamless switching: Post-degradation behavior is as consistent as possible with normal mode (e.g., synchronous orchestrator reuses the same node functions)
- Observable: Warnings are logged on degradation; component status can be viewed via the health check endpoint
- Self-healing: Circuit breaker half-open probing auto-recovers, without manual intervention
- No retry: Degraded singletons mark their state and do not retry, avoiding repeated exception overhead (tests can manually reset)
Related Documentation
| Topic | Link |
|---|---|
| Multi-Agent collaboration (incl. LangGraph fallback details) | Multi-Agent Collaboration |
| RAG retrieval pipeline (incl. BGE/Reranker fallback) | RAG Retrieval Pipeline |
| Architecture (incl. fallback design principles) | Architecture |
| Configuration guide (fallback-related options) | Configuration Guide |