Changelog
This project follows Semantic Versioning. Version numbers use the MAJOR.MINOR.PATCH format. This page records all changes in reverse chronological order.
Version semantics
- MAJOR: Incompatible API changes
- MINOR: Backward-compatible new features
- PATCH: Backward-compatible bug fixes
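The bump rules above can be sketched in a few lines of Python. This is only an illustration of MAJOR.MINOR.PATCH ordering; `Version`, `parse_version`, and `bump` are hypothetical names and not part of this project's release tooling.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(tag: str) -> Version:
    """Parse a tag such as 'v0.4.0' or '0.4.0' into a comparable tuple."""
    major, minor, patch = tag.removeprefix("v").split(".")
    return Version(int(major), int(minor), int(patch))


def bump(version: Version, part: str) -> Version:
    """Return the next version; lower-order parts reset to zero."""
    if part == "major":
        return Version(version.major + 1, 0, 0)
    if part == "minor":
        return Version(version.major, version.minor + 1, 0)
    if part == "patch":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown part: {part!r}")
```

Because the parts are compared as integers, `parse_version("v0.10.0")` sorts after `parse_version("v0.9.3")`, which a plain string comparison would get wrong.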
v0.4.0
Release date: 2026-07-03
Theme: Agent assist workbench — Closing the human-agent collaboration loop
New features
- 8 `/api/v1/agent/*` endpoints: Provide agent-side workbench API support after escalation
  - `GET /sessions/pending`: Pending session list (sorted by `EscalationPriority` descending)
  - `GET /sessions/{session_id}`: Session details (including `EscalationCard` + full `history`)
  - `POST /sessions/{session_id}/accept`: Agent takes over (CAS `pending → assigned`)
  - `POST /sessions/{session_id}/messages`: Agent appends messages to `history`
  - `POST /sessions/{session_id}/knowledge-recommend`: Knowledge recommendation assistance
  - `POST /sessions/{session_id}/business-assist`: Business query assistance (with masking)
  - `POST /sessions/{session_id}/resolve`: Mark as resolved (CAS `assigned → resolved`)
  - `POST /sessions/{session_id}/solution`: Record solution and consolidate back to the knowledge base
- SessionManager extension: 4 new fields + 4 CAS methods
  - New fields: `agent_status` / `assigned_agent_id` / `escalation_card` / `resolve_note`
  - New methods: `list_pending_sessions()` / `assign_agent()` / `resolve_session()` / `mark_pending()`
  - All use CAS (Compare-And-Swap) to ensure concurrency safety
- escalate_node integration: On escalation, automatically calls `mark_pending` to set `agent_status="pending"` and caches the `EscalationCard` to avoid repeated construction
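The CAS transitions listed above (`pending → assigned`, `assigned → resolved`) can be illustrated with a minimal in-memory sketch. `InMemorySessionStore` and its lock-based `_swap` helper are hypothetical stand-ins, not the project's `SessionManager`; the starting state of `None` before `mark_pending` is also an assumption.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Session:
    session_id: str
    agent_status: Optional[str] = None  # assumed flow: None -> pending -> assigned -> resolved
    assigned_agent_id: Optional[str] = None
    resolve_note: Optional[str] = None


class InMemorySessionStore:
    """Hypothetical stand-in for SessionManager, for illustration only."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sessions: Dict[str, Session] = {}

    def add(self, session: Session) -> None:
        with self._lock:
            self._sessions[session.session_id] = session

    def _swap(self, session_id: str, expected: Optional[str], new: str, **updates: str) -> bool:
        """Compare-and-swap: change agent_status only if it still equals `expected`."""
        with self._lock:
            session = self._sessions[session_id]
            if session.agent_status != expected:
                return False  # another caller won the race; an API layer might map this to 409
            session.agent_status = new
            for name, value in updates.items():
                setattr(session, name, value)
            return True

    def mark_pending(self, session_id: str) -> bool:
        return self._swap(session_id, None, "pending")

    def assign_agent(self, session_id: str, agent_id: str) -> bool:
        return self._swap(session_id, "pending", "assigned", assigned_agent_id=agent_id)

    def resolve_session(self, session_id: str, note: str) -> bool:
        return self._swap(session_id, "assigned", "resolved", resolve_note=note)
```

If several agents call `assign_agent` on the same session at once, only the first caller sees `True`; the rest find the status already changed and get `False`.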
Tests
- Added 28 test cases covering all 8 endpoints: happy paths, boundary scenarios (404/409/422), and concurrent takeover under CAS
Documentation
- Added Agent assist workbench tutorial
- Updated API reference with descriptions for 8 endpoints
Complete endpoint list
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/agent/sessions/pending` | GET | Pending list |
| `/api/v1/agent/sessions/{id}` | GET | Session details |
| `/api/v1/agent/sessions/{id}/accept` | POST | Agent takeover |
| `/api/v1/agent/sessions/{id}/messages` | POST | Agent sends message |
| `/api/v1/agent/sessions/{id}/knowledge-recommend` | POST | Knowledge recommendation |
| `/api/v1/agent/sessions/{id}/business-assist` | POST | Business assistance |
| `/api/v1/agent/sessions/{id}/resolve` | POST | Mark resolved |
| `/api/v1/agent/sessions/{id}/solution` | POST | Solution consolidation |
v0.3.0
Release date: 2026-07-02
Theme: Langfuse LLM observability — Full-chain trace visualization
New features
- 11 LLM call points tagged with prompt name/version: Covers all LLM calls, including intent recognition, query rewriting, knowledge generation, and dialog polishing
  - `recognize_intent_v1`: Intent recognition
  - `query_rewrite_v1`: Query rewriting
  - `knowledge_generate_v1`: Knowledge Q&A generation
  - `dialog_polish_v1`: Dialog polishing
  - 11 prompt tags in total
- Dual write to trace and monitor: Each call is recorded in both the Langfuse trace and the local `Monitor`, so each acts as a fallback for the other
  - Langfuse: Visualizes the full chain; automatically reports token/cost/latency
  - Monitor: Local trace summary, queryable via `/api/v1/monitor/traces`
- Automatic no-op fallback when unconfigured: When `LANGFUSE_ENABLED=False` or credentials are empty, all Langfuse calls degrade to no-ops without affecting main-chain performance
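The dual-write idea can be sketched without any Langfuse-specific code: one event goes to every sink, and a failure in one sink is swallowed so the other still records it. `dual_write` and the sink names are hypothetical; the real integration presumably calls the Langfuse SDK and the project's `Monitor` instead of these plain callables.

```python
from typing import Callable, Dict, List

TraceEvent = Dict[str, object]
Sink = Callable[[TraceEvent], None]


def dual_write(event: TraceEvent, sinks: Dict[str, Sink]) -> List[str]:
    """Send one trace event to every sink; return the names of sinks that failed."""
    failed: List[str] = []
    for name, sink in sinks.items():
        try:
            sink(event)
        except Exception:
            # A failing sink must not break the request or the other sink.
            failed.append(name)
    return failed


# Usage: the remote sink is down, the local monitor still gets the event.
local_monitor: List[TraceEvent] = []


def remote_down(event: TraceEvent) -> None:
    raise ConnectionError("remote trace backend unreachable")


failed_sinks = dual_write(
    {"prompt": "recognize_intent_v1", "latency_ms": 120},
    {"monitor": local_monitor.append, "langfuse": remote_down},
)
```

Returning the failed sink names, rather than raising, keeps the main chain unaffected while still leaving something to log or alert on.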
Configuration
New Langfuse configuration items (see .env.example):
LANGFUSE_ENABLED=False # Empty or False: degrade all calls to no-op
LANGFUSE_PUBLIC_KEY= # Obtain from Langfuse Project Settings → API Keys
LANGFUSE_SECRET_KEY=
LANGFUSE_HOST=https://cloud.langfuse.com
Tests
- Added Langfuse integration tests, covering both successful reporting and fallback scenarios
Fallback trigger conditions
- `LANGFUSE_ENABLED=False`
- `LANGFUSE_PUBLIC_KEY` or `LANGFUSE_SECRET_KEY` is empty
- Langfuse service connection timeout (default 3 seconds)
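The two configuration-based conditions can be expressed as a small check over the environment. This is a sketch under assumptions: `langfuse_active`, `NoOpTracer`, and `make_tracer` are hypothetical names, the accepted truthy spellings are a guess, and the third condition (connection timeout) is a runtime failure that this check does not cover.

```python
from typing import Callable, Mapping


def langfuse_active(env: Mapping[str, str]) -> bool:
    """True only when tracing is switched on and both keys are non-empty."""
    # Which spellings count as "enabled" is an assumption of this sketch.
    enabled = env.get("LANGFUSE_ENABLED", "").strip().lower() in {"true", "1", "yes"}
    has_keys = bool(env.get("LANGFUSE_PUBLIC_KEY", "").strip()) and bool(
        env.get("LANGFUSE_SECRET_KEY", "").strip()
    )
    return enabled and has_keys


class NoOpTracer:
    """Accepts the same call shape as a real tracer and does nothing."""

    def trace(self, name: str, **fields: object) -> None:
        return None


def make_tracer(env: Mapping[str, str], real_factory: Callable[[], object]) -> object:
    """Build the real tracer only when configured; otherwise degrade to a no-op."""
    return real_factory() if langfuse_active(env) else NoOpTracer()
```

Deciding once at construction time means call sites never branch on configuration; they call `trace(...)` either way.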
v0.2.0
Release date: 2026-07-02
Theme: Streaming first-token optimization — Perceived wait reduced from 7-8s to <1s
New features
- HotQueryCache / ModelRouter / IntentCache three-layer optimization:
  - `HotQueryCache`: High-frequency query cache; on hit, first token <100ms, skipping the full orchestration
  - `ModelRouter`: Lightweight tasks like intent recognition use small models (about 1/10 cost); main LLM only for generation
  - `IntentCache`: Same-intent reuse, avoiding repeated intent recognition
- Qwen qwen-turbo replaces Doubao lite: Default small model switched to Qwen for better Chinese understanding
  - `SMALL_LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1`
  - `SMALL_LLM_MODEL=qwen-turbo`
- Intent fast-track: High-frequency intents like chitchat/escalation are matched by keywords, skipping LLM intent recognition, with first token <200ms
- Non-knowledge Q&A intent streaming: Replies for intents like chitchat / business_query are sliced at sentence-ending punctuation and streamed out sentence by sentence, instead of being held until generation finishes and emitted in one piece
- First-token latency monitoring: New `stream_first_token` instrumentation, queryable via `/api/v1/performance/metrics` for avg/p95
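Slicing a reply at sentence-ending punctuation, as described for non-knowledge intents, might look like the following sketch. `stream_by_sentence` is a hypothetical name, and the punctuation set (ASCII plus full-width CJK marks) is an assumption; a production splitter would also need to handle cases such as decimals ("3.5") that this simple pattern splits incorrectly.

```python
import re
from typing import Iterator

# A run of non-terminator characters followed by any trailing terminators.
# The terminator set is an assumption of this sketch.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def stream_by_sentence(text: str) -> Iterator[str]:
    """Yield the reply one sentence-sized chunk at a time."""
    for match in _SENTENCE.finditer(text):
        yield match.group(0)
```

The first chunk can be sent as soon as it is available, which is what brings the perceived wait down; joining the chunks reproduces the original text for inputs that do not start with a terminator.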
Performance improvements
| Metric | Before | After | Improvement |
|---|---|---|---|
| First token time (knowledge Q&A) | ~3s | <1s | 67% ↓ |
| First token time (chitchat fast-track) | ~3s | <200ms | 93% ↓ |
| First token time (cache hit) | ~3s | <100ms | 97% ↓ |
| Sync endpoint P95 | 7.94s | 2.27s | 71% ↓ |
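The Improvement column is the relative reduction, (before − after) / before, rounded to a whole percent. The helper below is only a worked check of that arithmetic; `improvement_pct` is a hypothetical name.

```python
def improvement_pct(before_s: float, after_s: float) -> int:
    """Relative latency reduction, rounded to a whole percent."""
    return round((before_s - after_s) / before_s * 100)


# Rows from the table above, using the stated bounds as the "after" values.
knowledge_qa = improvement_pct(3.0, 1.0)    # 67
fast_track = improvement_pct(3.0, 0.2)      # 93
cache_hit = improvement_pct(3.0, 0.1)       # 97
sync_p95 = improvement_pct(7.94, 2.27)      # 71
```

Since the first three "After" values are upper bounds (<1s, <200ms, <100ms), the corresponding percentages are lower bounds on the actual improvement.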
Tests
- Added tests for first-token time, fast-track streaming, and cache hits
Three-layer cache coordination
flowchart TD
A[Request] --> B{HotQueryCache hit?}
B -- Yes --> C[Return directly, first token <100ms]
B -- No --> D{IntentCache hit?}
D -- Yes --> E[Skip intent recognition]
D -- No --> F[ModelRouter routes to small model]
F --> G[Small model intent recognition]
E --> H[Hybrid retrieval + generation]
G --> H
H --> I[Write to HotQueryCache]
I --> J[Return result]
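The flowchart above can be read as the following control flow. This is a sketch, not the project's code: plain dicts stand in for `HotQueryCache` and `IntentCache`, the exact query string is used as the cache key, and writing the recognized intent back to the intent cache is an assumption the flowchart does not show. `classify_intent` and `generate` are hypothetical callables for the small model and the main retrieval-plus-generation step.

```python
from typing import Callable, Dict


def handle_request(
    query: str,
    hot_cache: Dict[str, str],
    intent_cache: Dict[str, str],
    classify_intent: Callable[[str], str],
    generate: Callable[[str, str], str],
) -> str:
    # Layer 1: a HotQueryCache hit returns directly, skipping everything else.
    if query in hot_cache:
        return hot_cache[query]

    # Layer 2: an IntentCache hit skips intent recognition.
    intent = intent_cache.get(query)
    if intent is None:
        # Layer 3: otherwise the small model recognizes the intent.
        intent = classify_intent(query)
        intent_cache[query] = intent  # assumption: cache the result for reuse

    # Hybrid retrieval + generation, then populate the hot cache.
    answer = generate(query, intent)
    hot_cache[query] = answer
    return answer
```

On a repeated query the function returns from the first branch without calling either model, which corresponds to the "<100ms" path in the flowchart.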
v0.1.0
Release date: 2026-06
Theme: Initial version — Multi-agent collaboration + RAG knowledge-enhanced intelligent customer service system
Core features
- Multi-agent collaboration architecture: "1+5" architecture based on LangGraph
  - 1 orchestration Agent (OrchestratorAgent): Intent recognition, routing, fallback aggregation
  - 5 specialized Agents: Knowledge retrieval (KnowledgeAgent) / Business query (BusinessAgent) / Emotion analysis (EmotionAgent) / Ticket handling (TicketAgent) / Dialog polishing (DialogAgent)
  - Automatic fallback to synchronous orchestration when LangGraph is unavailable
- Hybrid retrieval + RAG:
  - Query rewriting → vector retrieval + BM25 dual-channel recall → RRF fusion → Reranker reordering → LLM generation
  - Refuses to answer when similarity is below threshold, avoiding hallucination
  - Recall@5 = 1.0, Hit Rate = 0.9333, hallucination rate = 0.0
- Human escalation loop:
  - Triggered by emotion sensitivity / consecutive failures / user request
  - Generates `EscalationCard` with escalation reason, priority, and context summary
  - Working-hours constraint (`WORKING_HOURS_START` / `WORKING_HOURS_END`)
- Knowledge base governance:
  - Document ingestion pipeline (PDF/Word/Markdown/HTML parsing)
  - Quality validation (deduplication, terminology, sensitive words)
  - Version management and rollback
  - Full / incremental / real-time update mechanisms
- Business system integration:
  - Order / member / return / account API adapter framework
  - Identity verification, phone masking, two-step confirmation for write operations
  - mock / http dual mode, out of the box
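The RRF fusion step in the retrieval pipeline above merges the vector and BM25 result lists by rank rather than by raw score. A minimal sketch of standard Reciprocal Rank Fusion follows; `rrf_fuse` is a hypothetical name, and the constant `k = 60` is the value commonly used in the literature, not necessarily the one this project uses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["d1", "d2", "d3"]
bm25_hits = ["d3", "d1", "d4"]
fused = rrf_fuse([vector_hits, bm25_hits])  # ["d1", "d3", "d2", "d4"]
```

Documents that appear in both channels (`d1`, `d3`) rise above those found by only one, without needing to normalize cosine similarities against BM25 scores.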
Endpoint list (v0.1.0)
| Module | Endpoint count | Description |
|---|---|---|
| Chat | 2 | /chat + /chat/stream |
| Knowledge base | 9 | Ingestion/stats/document management/quality/version/canary |
| Escalation | 3 | Solution submission/review/ingestion |
| Document update | 4 | Full/incremental/real-time/status |
| Ticket mining | 2 | Trigger/status |
| Retrieval tuning | 3 | Query/update/reset |
| Retrieval evaluation | 3 | Run/list/detail |
| Performance monitoring | 3 | Metrics/cache/invalidate |
| Observability | 5 | Circuit breaker/alerts/health/token |
| Monitoring | 5 | Overview/trace/agent/session |
| Operations | 6 | Experiment/dashboard/checklist |
| Health check | 1 | Liveness probe |
| Gateway | 1 | Multi-channel access |
Tests
- Initial test suite of 640+ cases, covering core chains and boundary scenarios
Documentation
- Initial documentation system: Quick start, installation guide, configuration, architecture design, tutorials
v0.1.0 performance metrics (verified with real LLM)
| Metric | Target | Actual | Pass |
|---|---|---|---|
| Recall@5 | ≥ 0.85 | 1.0 | ✅ |
| Hit Rate | ≥ 0.90 | 0.9333 | ✅ |
| Hallucination rate | ≤ 0.10 | 0.0 | ✅ |
| Independent resolution rate | ≥ 60% | 80% | ✅ |
| Average response time | ≤ 3s | 2.27s | ✅ |
Version planning
Next version v0.5.0 plan (draft)
- Frontend agent workbench UI (consumes the 8 agent endpoints added in v0.4.0)
- Knowledge base management console UI
- Multi-language support (English)
- Elasticsearch full-text search integration
Related documentation
- Contributing guide: Version release process
- API reference: Current complete endpoints
- Architecture design: Overall system architecture