Installation Guide
This guide covers the full process from environment preparation to service startup verification, including troubleshooting for common installation issues.
Just want to get started quickly?
If you only want to start as fast as possible, see Quick Start in One Minute. This guide is for scenarios requiring full configuration and troubleshooting.
Environment Requirements
| Item | Minimum | Recommended | Notes |
|---|---|---|---|
| Python | 3.11 | 3.11 / 3.12 | Uses tomllib (in the standard library since 3.11), TypedDict, and other newer features; 3.10 and below are not supported |
| pip | 23.0 | Latest | chromadb and other packages need a newer pip dependency resolver |
| OS | — | — | Windows 10+ / macOS 12+ / Ubuntu 20.04+ all verified |
| Memory | 4 GB | 8 GB+ | Loading the BGE models takes about 2 GB of memory |
| Disk | 2 GB | 3 GB+ | Includes BGE weights and ChromaDB persistent data |
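The Python minimum in the table can be checked before installing anything. The following is a minimal sketch rather than part of the project; `MIN_PYTHON` and `meets_minimum` are hypothetical names chosen for illustration.

```python
import sys

# Minimum interpreter version, taken from the requirements table.
MIN_PYTHON = (3, 11)


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)
```

Calling `meets_minimum()` with no arguments checks the running interpreter; passing a tuple such as `(3, 10, 9)` checks an arbitrary version.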
Windows users note
Some dependencies of chromadb and sentence-transformers require a C++ build environment. If installation fails, first install Visual C++ Build Tools and select the "Desktop development with C++" workload.
Dependency Installation
1. Create a Virtual Environment
# Create a virtual environment
python -m venv .venv
# Activate the virtual environment
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
# Windows (CMD)
.venv\Scripts\activate.bat
# macOS / Linux
source .venv/bin/activate
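Before installing packages, it can help to confirm that the virtual environment is really active. A minimal sketch, assuming an environment created with `python -m venv`; the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base Python installation."""
    return prefix != base_prefix
```

Running `python -c "import sys; print(sys.prefix != sys.base_prefix)"` in the activated shell performs the same comparison.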
2. Upgrade pip
# chromadb and other packages depend on a newer pip dependency resolver
python -m pip install --upgrade pip setuptools wheel
3. Install Project Dependencies
Key dependencies overview
| Dependency | Purpose |
|---|---|
| fastapi + uvicorn | Web framework and ASGI server |
| langgraph | Multi-Agent state machine orchestration (auto-degrades to synchronous orchestration when unavailable) |
| openai | OpenAI-compatible SDK, integrates DeepSeek / Qwen / Doubao, etc. |
| chromadb | Vector database (hnsw:space=cosine) |
| sentence-transformers | BGE embedding and Reranker model loading |
| rank-bm25 | BM25 keyword retrieval |
| langfuse | LLM tracing and prompt version management (auto-degrades when not configured) |
| pydantic-settings | Configuration management, loads from .env and environment variables |
| redis | Session persistence and caching (optional, degrades to in-memory queue) |
| unstructured + PyMuPDF + python-docx + beautifulsoup4 | Multi-format document parsing |
BGE Model Download
The system uses BAAI/bge-large-zh-v1.5 (1024 dimensions) as the embedding model and BAAI/bge-reranker-base as the reranker model. Both are downloaded automatically on first launch, but the download may be slow or fail under domestic (mainland China) network conditions.
Automatic Download (Default)
On first retrieval or ingestion, EmbeddingService tries to load in the following order:
- HuggingFace primary source (https://huggingface.co)
- Domestic mirror source (https://hf-mirror.com, configured via HF_MIRROR_URL)
- Local cache directory (./models/bge-large-zh, configured via EMBEDDING_LOCAL_CACHE_DIR)
- Hash fallback (deterministic degradation, only ensures the flow runs; retrieval quality is not guaranteed)
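The ordered loading above follows a common "first source that works" pattern. The sketch below illustrates that pattern only; it is not the actual EmbeddingService code, and `load_with_fallback` and its `(name, loader)` pairs are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def load_with_fallback(loaders):
    """Try each (name, loader) pair in order and return the first success.

    Each loader is a zero-argument callable that returns a model object
    or raises an exception when its source is unavailable.
    """
    for name, loader in loaders:
        try:
            return name, loader()
        except Exception as exc:  # any failure moves on to the next source
            logger.warning("Loading from %s failed: %s", name, exc)
    raise RuntimeError("no loader succeeded")
```

Placing a loader that cannot fail (such as a hash-based vectorizer) last in the list guarantees that the call always returns something, which is the role the final list item plays.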
Impact of download failure degradation
If the three model sources above all fail, the system degrades to hash fallback vectorization: it uses hashlib.sha256 to generate deterministic 1024-dimensional vectors. Semantic retrieval is disabled in this mode; it only ensures that the flow does not break. Production environments must ensure the BGE model is available.
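To make the hash fallback concrete, here is one way a deterministic 1024-dimensional vector could be derived with hashlib.sha256. This is a sketch under assumptions: the counter-based digest expansion, the byte scaling, and the name `hash_embedding` are illustrative and may differ from what the project actually does.

```python
import hashlib
import math


def hash_embedding(text, dim=1024):
    """Map text to a deterministic unit-length vector of `dim` floats."""
    values = []
    counter = 0
    while len(values) < dim:
        # One SHA-256 digest gives 32 bytes; a counter extends the stream.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(b / 127.5 - 1.0 for b in digest)  # byte -> [-1, 1]
        counter += 1
    values = values[:dim]
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]
```

The same text always produces the same vector, so indexing and lookup keep working, but two texts with similar meaning get unrelated vectors, which is why semantic retrieval is unavailable in this mode.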
Manual Download (Recommended for domestic users)
# Download BGE models to the local cache directory in advance
from sentence_transformers import CrossEncoder, SentenceTransformer

# Download the embedding model (~1.3 GB)
model = SentenceTransformer("BAAI/bge-large-zh-v1.5")
model.save("./models/bge-large-zh")

# Download the reranker model (~400 MB)
reranker = CrossEncoder("BAAI/bge-reranker-base")
reranker.save("./models/bge-reranker-base")
Use a Domestic Mirror
# Method 1: Set environment variable (recommended)
# macOS / Linux
export HF_ENDPOINT=https://hf-mirror.com
# Windows PowerShell
$env:HF_ENDPOINT = "https://hf-mirror.com"
# Method 2: Configure in .env (the project has built-in fallback to this mirror)
# HF_MIRROR_URL=https://hf-mirror.com (already filled by default)
After setting the mirror, automatic download will prefer hf-mirror.com, significantly improving speed for domestic users.
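The same variable can also be set from Python, for example at the top of a download script. A minimal sketch, assuming it runs before any Hugging Face library is imported (those libraries typically read the endpoint when they are first loaded); `use_hf_mirror` is a hypothetical helper.

```python
import os


def use_hf_mirror(mirror_url="https://hf-mirror.com", environ=os.environ):
    """Set HF_ENDPOINT unless an endpoint has already been chosen.

    Returns the endpoint that is in effect afterwards.
    """
    return environ.setdefault("HF_ENDPOINT", mirror_url)
```

Using `setdefault` keeps an endpoint exported in the shell from being silently overridden.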
Offline environment deployment
Run the manual download script on an internet-connected machine, then copy the entire ./models/bge-large-zh directory to the same path on the offline machine. When the system detects the local cache, it loads the model from it directly and needs no internet access.
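After copying, a quick check on the offline machine can confirm that the directory arrived intact. This is a rough sketch: it only verifies that the directory exists and contains at least one file, not that the weights are complete. `local_model_ready` is a hypothetical name.

```python
from pathlib import Path


def local_model_ready(model_dir="./models/bge-large-zh"):
    """Return True when `model_dir` exists and contains at least one file."""
    path = Path(model_dir)
    return path.is_dir() and any(p.is_file() for p in path.rglob("*"))
```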
Environment Variable Configuration
.env.example contains all configuration options with comments, and the default values are safe for out-of-the-box use. For minimal usage, focus on the following:
# === Minimal configuration (mock mode, out of the box) ===
APP_PORT=8000
DEBUG=True
API_KEY= # Empty = no auth
LLM_API_KEY= # Empty = mock mode
BUSINESS_ADAPTER_MODE=mock # mock business system
# === Connect a real LLM (optional) ===
LLM_API_KEY=sk-your-key
LLM_BASE_URL=https://api.deepseek.com/v1
LLM_MODEL=deepseek-chat
Full configuration guide
For detailed explanations, default values, and impact scope of all configuration options, see Configuration Guide.
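The "empty means mock" convention in the minimal configuration can be illustrated with a small standard-library parser. This is a simplified sketch, not the project's loader (the dependency table lists pydantic-settings for that); `parse_env` and `llm_mode` are hypothetical names, and only `KEY=VALUE` lines with optional ` #` inline comments are handled.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and inline comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip()  # drop a trailing inline comment
        values[key.strip()] = value
    return values


def llm_mode(values):
    """An empty or missing LLM_API_KEY selects mock mode."""
    return "real" if values.get("LLM_API_KEY") else "mock"
```

With the minimal configuration, `llm_mode` yields `"mock"`; once `LLM_API_KEY=sk-your-key` is set, it yields `"real"`.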
Optional Dependencies
The following components are not required, but enhance system capabilities when enabled. When not installed or not configured, the system automatically degrades without affecting the main path.
Redis (Session Persistence)
| Scenario | No Redis (default) | With Redis |
|---|---|---|
| Session storage | In-process dict, lost on restart | Cross-process persistence |
| Cache | In-memory dict | Distributed cache |
| Queue | In-memory queue (degraded) | Redis Pub/Sub |
When you need Redis
- Multi-instance deployment: Multiple FastAPI processes need to share session state
- Session persistence: Preserve conversation history after service restart
- Production environment: Recommended to ensure reliability
Redis is not needed for single-machine development; in-memory mode is fully sufficient.
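The degradation from Redis to in-memory storage can be pictured as a small factory. The sketch below is illustrative only: `InMemorySessionStore`, `make_session_store`, and the `redis_url` parameter are hypothetical and do not reflect the project's real class names or settings.

```python
class InMemorySessionStore:
    """Per-process store: its contents are lost when the process exits."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id)

    def set(self, session_id, value):
        self._data[session_id] = value


def make_session_store(redis_url=None):
    """Return a Redis client when a URL is given and redis-py is installed;
    otherwise fall back to the in-memory store."""
    if redis_url:
        try:
            import redis  # optional dependency

            return redis.Redis.from_url(redis_url, decode_responses=True)
        except ImportError:
            pass
    return InMemorySessionStore()
```

Both return values expose `get` and `set`, so calling code does not need to know which backend is active.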
Elasticsearch (Full-text Retrieval Enhancement)
Keyword retrieval is already built in: BM25 is implemented with rank-bm25. Elasticsearch is intended for very large knowledge bases (millions of chunks), where it provides more efficient distributed full-text retrieval.
Real Business System
# .env configuration
BUSINESS_ADAPTER_MODE=http
BUSINESS_API_BASE_URL=https://your-business-api.com
BUSINESS_API_KEY=your-business-api-key
BUSINESS_API_TIMEOUT=10
The default mock mode uses an in-memory mock business system (orders/members/returns/accounts). After switching to http mode, BusinessAgent will call the real business system REST API.
http mode degradation
When BUSINESS_ADAPTER_MODE=http but BUSINESS_API_BASE_URL is empty, the system automatically degrades to mock mode and prints a warning log.
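The degradation rule can be expressed as a small function. A minimal sketch, assuming the two settings are available as plain strings; `resolve_adapter_mode` is a hypothetical name, not the project's actual function.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_adapter_mode(mode, base_url):
    """Return the effective adapter mode for the given settings."""
    if mode == "http" and not (base_url or "").strip():
        # http was requested but there is no endpoint to call.
        logger.warning("BUSINESS_API_BASE_URL is empty; falling back to mock mode")
        return "mock"
    return mode
```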
Startup Verification
1. Start the Service
2. Health Check
A 200 response means the service is healthy:
3. Component Health Check
Returns the status of each component to confirm whether LLM, vector store, Redis, and disk are ready:
4. Chat Verification
curl -X POST http://localhost:8000/api/v1/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello"}'
In mock mode, it should return a chitchat greeting. With a real LLM configured, it returns an LLM-generated reply.
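The same check can be scripted with the Python standard library, which is convenient on Windows shells where the curl quoting above needs adjusting. This is a sketch: the URL and payload come from the curl command, while the assumption that the response body is JSON and the helper names `build_chat_request` and `send_chat` are mine.

```python
import json
import urllib.request


def build_chat_request(message, base_url="http://localhost:8000"):
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message, base_url="http://localhost:8000", timeout=10):
    """Send the request; return (HTTP status, parsed body), assuming JSON."""
    request = build_chat_request(message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))
```

With the service running, `send_chat("Hello")` returns the status code together with the decoded reply.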
Common Installation Issues
chromadb Installation Fails
Symptom: pip install chromadb reports a compile error or dependency conflict.
Solution
- Download Visual C++ Build Tools
- During installation, select "Desktop development with C++"
- Restart the terminal and re-run pip install chromadb
sentence-transformers Downloads Slowly
Symptom: First launch hangs on model download, or times out and fails.
Solution
# Option 1: Set HuggingFace mirror (recommended)
# macOS / Linux
export HF_ENDPOINT=https://hf-mirror.com
# Windows PowerShell
$env:HF_ENDPOINT = "https://hf-mirror.com"
# Option 2: Download manually in advance (see "Manual Download" section above)
# Option 3: The system has built-in mirror fallback, will automatically try hf-mirror.com without extra action
A download failure does not block startup: the system auto-degrades to hash fallback vectorization and the main path still runs, but semantic retrieval is disabled in that mode.
LangGraph Unavailable
Symptom: Log shows LangGraph build failed, degrading to synchronous orchestrator.
Solution
No manual action needed. This is expected behavior: when LangGraph is unavailable or build fails, the system auto-degrades to the _SynchOrchestrator synchronous orchestrator, reusing the same node functions (intent_node / agent_node / dialog_node / escalate_node). Behavior is fully consistent with the LangGraph version, only lacking graph-structure scheduling.
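The fallback can be pictured roughly as follows. This is a simplified sketch, not the project's orchestrator: it runs the node functions strictly in sequence, whereas the real fallback may route between nodes conditionally. `langgraph_available` and `run_sync` are hypothetical names.

```python
import importlib.util


def langgraph_available():
    """True when the langgraph package can be imported in this environment."""
    return importlib.util.find_spec("langgraph") is not None


def run_sync(state, nodes):
    """Synchronous fallback: apply each node function to the state in order."""
    for node in nodes:
        state = node(state)
    return state
```

Because both paths call the same node functions, the choice of orchestrator changes how the nodes are scheduled, not what each node does.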
To enable LangGraph, install it in the active virtual environment:
pip install langgraph
After installation, restart the service. The log should show LangGraph orchestrator built successfully.
Port Already in Use
Symptom: OSError: [Errno 48] Address already in use (Errno 48 on macOS; Linux reports Errno 98 for the same condition).
Solution
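A quick way to confirm that the port is occupied is to probe it before either stopping the other process or changing APP_PORT in .env. The probe below is a minimal sketch using only the standard library; `port_in_use` is a hypothetical helper.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True when something already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(8000)` returns True, another process is listening on the default port.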
Out of Memory (OOM)
Symptom: Process is killed during startup or retrieval, with no clear error in logs.
Solution
# Reduce the embedding batch size to lower peak memory
# .env configuration
EMBEDDING_BATCH_SIZE=16 # Default 32, reduce when memory is tight
# Reduce recall count to lower per-retrieval memory usage
VECTOR_TOP_K=15
BM25_TOP_K=15
Production environments should have at least 8 GB of memory and enable Redis to offload cache pressure from the application process.
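The reason a smaller EMBEDDING_BATCH_SIZE lowers peak memory is that only one batch of texts is encoded at a time. The sketch below shows the batching idea in isolation; `batched` is a hypothetical helper, and how the project actually feeds batches to the model is not shown here.

```python
def batched(items, batch_size=16):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

With 40 chunks and a batch size of 16, this yields batches of 16, 16, and 8, so at most 16 texts are held by the encoder at once.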
Import Error: ModuleNotFoundError
Symptom: ModuleNotFoundError: No module named 'app'.
Solution
Make sure to run commands in the project root directory (the level containing the app/ directory).
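Whether the current directory is the project root can be checked programmatically. A minimal sketch, assuming the only criterion is the presence of an app/ directory as described above; `is_project_root` is a hypothetical helper.

```python
from pathlib import Path


def is_project_root(path="."):
    """Return True when `path` contains an app/ directory."""
    return (Path(path) / "app").is_dir()
```

If `is_project_root()` returns False, change to the directory that contains app/ before starting the service.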
Next Steps
- Configuration Guide — Understand all configuration options
- Architecture — Understand the system design
- Quick Start in One Minute — Quickly verify the installation result