Changelog¶

This project follows Semantic Versioning. Version numbers use the MAJOR.MINOR.PATCH format. This page records all changes in reverse chronological order.

Version semantics

MAJOR: Incompatible API changes
MINOR: Backward-compatible new features
PATCH: Backward-compatible bug fixes
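
As a rough illustration of the scheme above, the following sketch parses version strings and reports which component changed between two releases. The helper names `parse_version` and `bump_kind` are hypothetical and are not part of this project.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse 'v0.4.0' or '0.4.0' into a tuple that compares numerically."""
    major, minor, patch = text.lstrip("v").split(".")
    return Version(int(major), int(minor), int(patch))


def bump_kind(old: str, new: str) -> str:
    """Return which component changed going from `old` to `new`."""
    a, b = parse_version(old), parse_version(new)
    if b <= a:
        raise ValueError(f"{new} is not newer than {old}")
    if b.major != a.major:
        return "MAJOR"
    if b.minor != a.minor:
        return "MINOR"
    return "PATCH"


print(bump_kind("v0.3.0", "v0.4.0"))
```

Comparing parsed tuples rather than raw strings matters once a component reaches two digits: as strings, "0.10.0" sorts before "0.9.0".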

v0.4.0¶

Release date: 2026-07-03

Theme: Agent assist workbench — Closing the human-agent collaboration loop

New features¶

8 /api/v1/agent/* endpoints: Provide API support for the human agent's workbench once a session has been escalated
GET /sessions/pending: Pending session list (sorted by EscalationPriority descending)
GET /sessions/{session_id}: Session details (including EscalationCard + full history)
POST /sessions/{session_id}/accept: Agent takes over (CAS pending → assigned)
POST /sessions/{session_id}/messages: Agent appends messages to history
POST /sessions/{session_id}/knowledge-recommend: Knowledge recommendation assistance
POST /sessions/{session_id}/business-assist: Business query assistance (with masking)
POST /sessions/{session_id}/resolve: Mark as resolved (CAS assigned → resolved)
POST /sessions/{session_id}/solution: Record the solution and feed it back into the knowledge base
SessionManager extension: 4 new fields + 4 CAS methods
New fields: agent_status / assigned_agent_id / escalation_card / resolve_note
New methods: list_pending_sessions() / assign_agent() / resolve_session() / mark_pending()
All use CAS (Compare-And-Swap) to ensure concurrency safety
escalate_node integration: On escalation, escalate_node automatically calls mark_pending to set agent_status="pending" and caches the EscalationCard so that it is not rebuilt on later reads
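
The CAS transitions listed above (pending → assigned → resolved) can be sketched with an in-memory store guarded by a lock. This is a minimal illustration under stated assumptions, not the project's SessionManager: the `Session` dataclass, the `InMemorySessionStore` class, and the boolean return values are hypothetical, and only the field and method names `agent_status`, `assigned_agent_id`, `resolve_note`, `assign_agent`, and `resolve_session` come from this changelog.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Session:
    session_id: str
    agent_status: str = "pending"
    assigned_agent_id: Optional[str] = None
    resolve_note: Optional[str] = None


class InMemorySessionStore:
    """Hypothetical store; a real deployment would likely use a database or Redis."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sessions: Dict[str, Session] = {}

    def add(self, session: Session) -> None:
        with self._lock:
            self._sessions[session.session_id] = session

    def _transition(self, session_id: str, expected: str, new: str, **updates) -> bool:
        # Compare-and-swap: change the status only if it still equals `expected`.
        with self._lock:
            session = self._sessions.get(session_id)
            if session is None or session.agent_status != expected:
                return False
            session.agent_status = new
            for name, value in updates.items():
                setattr(session, name, value)
            return True

    def assign_agent(self, session_id: str, agent_id: str) -> bool:
        return self._transition(session_id, "pending", "assigned",
                                assigned_agent_id=agent_id)

    def resolve_session(self, session_id: str, note: str) -> bool:
        return self._transition(session_id, "assigned", "resolved",
                                resolve_note=note)


# Three agents race to accept the same session; only one CAS can succeed.
store = InMemorySessionStore()
store.add(Session("s1"))
results: List[bool] = []


def try_accept(agent_id: str) -> None:
    results.append(store.assign_agent("s1", agent_id))


threads = [threading.Thread(target=try_accept, args=(a,))
           for a in ("agent-a", "agent-b", "agent-c")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(True))
```

An endpoint built on such a store could translate a failed transition into a conflict response, which would fit the 409 cases mentioned in the test notes, though that mapping is an assumption here.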

Tests¶

Added 28 test cases across all 8 endpoints, covering happy paths, error responses (404/409/422), and CAS behavior under concurrent takeover

Documentation¶

Added Agent assist workbench tutorial
Updated API reference with descriptions for 8 endpoints

Complete endpoint list

Endpoint	Method	Description
`/api/v1/agent/sessions/pending`	GET	Pending list
`/api/v1/agent/sessions/{id}`	GET	Session details
`/api/v1/agent/sessions/{id}/accept`	POST	Agent takeover
`/api/v1/agent/sessions/{id}/messages`	POST	Agent sends message
`/api/v1/agent/sessions/{id}/knowledge-recommend`	POST	Knowledge recommendation
`/api/v1/agent/sessions/{id}/business-assist`	POST	Business assistance
`/api/v1/agent/sessions/{id}/resolve`	POST	Mark resolved
`/api/v1/agent/sessions/{id}/solution`	POST	Solution consolidation

v0.3.0¶

Release date: 2026-07-02

Theme: Langfuse LLM observability — Full-chain trace visualization

New features¶

11 LLM call points tagged with prompt name/version: Covers all LLM calls including intent recognition, query rewriting, knowledge generation, and dialog polishing
recognize_intent_v1: Intent recognition
query_rewrite_v1: Query rewriting
knowledge_generate_v1: Knowledge Q&A generation
dialog_polish_v1: Dialog polishing
...and the remaining call points, for 11 prompt tags in total
Dual write to trace and monitor: Each call is recorded both as a Langfuse trace and in the local Monitor, so either record can serve as a fallback for the other
Langfuse: Visualizes the full chain; automatically reports token/cost/latency
Monitor: Local trace summary, queryable via /api/v1/monitor/traces
Automatic no-op fallback when unconfigured: When LANGFUSE_ENABLED=False or credentials are empty, all Langfuse calls degrade to no-ops without affecting main-chain performance
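
The dual-write and no-op behavior described above might look roughly like the sketch below. It deliberately avoids the real Langfuse SDK: `NoOpTracer`, `LocalMonitor`, `FailingRemoteTracer`, and `dual_write` are hypothetical stand-ins used only to show the pattern of writing to two sinks without letting one failure affect the other.

```python
from typing import Any, Dict, List, Protocol


class Tracer(Protocol):
    def record(self, name: str, payload: Dict[str, Any]) -> None: ...


class NoOpTracer:
    """Used when remote tracing is disabled; every call does nothing."""

    def record(self, name: str, payload: Dict[str, Any]) -> None:
        return None


class LocalMonitor:
    """Stand-in for the local Monitor: keeps trace summaries in memory."""

    def __init__(self) -> None:
        self.traces: List[Dict[str, Any]] = []

    def record(self, name: str, payload: Dict[str, Any]) -> None:
        self.traces.append({"name": name, **payload})


class FailingRemoteTracer:
    """Simulates an unreachable remote tracing backend."""

    def record(self, name: str, payload: Dict[str, Any]) -> None:
        raise TimeoutError("remote tracing backend unreachable")


def dual_write(remote: Tracer, monitor: Tracer, name: str,
               payload: Dict[str, Any]) -> int:
    """Write to both sinks; return how many writes succeeded."""
    succeeded = 0
    for sink in (remote, monitor):
        try:
            sink.record(name, payload)
            succeeded += 1
        except Exception:
            # Tracing must never break the main request path.
            continue
    return succeeded


monitor = LocalMonitor()
dual_write(FailingRemoteTracer(), monitor, "recognize_intent_v1", {"latency_ms": 120})
dual_write(NoOpTracer(), monitor, "query_rewrite_v1", {"latency_ms": 80})
print(len(monitor.traces))
```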

Configuration¶

New Langfuse configuration items (see .env.example):

LANGFUSE_ENABLED=False        # Empty or False: degrade all calls to no-op
LANGFUSE_PUBLIC_KEY=          # Obtain from Langfuse Project Settings → API Keys
LANGFUSE_SECRET_KEY=
LANGFUSE_HOST=https://cloud.langfuse.com

Tests¶

Added Langfuse integration tests, covering both successful reporting and fallback scenarios

Fallback trigger conditions

LANGFUSE_ENABLED=False
LANGFUSE_PUBLIC_KEY or LANGFUSE_SECRET_KEY is empty
Langfuse service connection timeout (default 3 seconds)
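
The first two conditions above are static and can be checked once at startup. The sketch below shows one way to express that check; the function name `langfuse_active` and the accepted truthy spellings are assumptions, and the timeout condition is a runtime concern that this check does not cover.

```python
from typing import Mapping


def langfuse_active(env: Mapping[str, str]) -> bool:
    """Return True only if tracing is enabled and both keys are non-empty."""
    # Assumption: these spellings count as "enabled"; anything else, including
    # an empty value, is treated as disabled.
    enabled = env.get("LANGFUSE_ENABLED", "").strip().lower() in {"true", "1", "yes"}
    has_public = bool(env.get("LANGFUSE_PUBLIC_KEY", "").strip())
    has_secret = bool(env.get("LANGFUSE_SECRET_KEY", "").strip())
    return enabled and has_public and has_secret


print(langfuse_active({"LANGFUSE_ENABLED": "False"}))
```

In an application this would typically be called with `os.environ`, and a False result would select a no-op tracer.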

v0.2.0¶

Release date: 2026-07-02

Theme: Streaming first-token optimization — Perceived wait reduced from 7-8s to <1s

New features¶

HotQueryCache / ModelRouter / IntentCache three-layer optimization:
HotQueryCache: High-frequency query cache; on hit, first token <100ms, skipping the full orchestration
ModelRouter: Lightweight tasks such as intent recognition are routed to small models (about 1/10 of the cost); the main LLM is used only for generation
IntentCache: Same-intent reuse, avoiding repeated intent recognition
Qwen qwen-turbo replaces Doubao lite: Default small model switched to Qwen for better Chinese understanding
SMALL_LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
SMALL_LLM_MODEL=qwen-turbo
Intent fast-track: High-frequency intents like chitchat/escalation are matched by keywords, skipping LLM intent recognition, with first token <200ms
Non-knowledge Q&A intent streaming: Responses for intents like chitchat / business_query are split at sentence-ending punctuation and streamed sentence by sentence, instead of waiting for the full response and emitting it as a single chunk
First-token latency monitoring: New stream_first_token instrumentation; avg and p95 values are queryable via /api/v1/performance/metrics
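
Splitting a stream at sentence-ending punctuation, as described for non-knowledge intents, can be sketched with a small generator. The function name `stream_by_sentence` and the exact punctuation set are assumptions; the project may use a different rule.

```python
import re
from typing import Iterable, Iterator

# Assumed set of sentence-ending marks (Chinese and English).
_SENTENCE_END = re.compile(r"[。！？!?.]")


def stream_by_sentence(chunks: Iterable[str]) -> Iterator[str]:
    """Re-chunk an incoming text stream so each yielded piece ends a sentence."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete sentence currently held in the buffer.
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            yield buffer[: match.end()]
            buffer = buffer[match.end():]
    if buffer:
        # Flush trailing text that has no closing punctuation.
        yield buffer


pieces = list(stream_by_sentence(["Hel", "lo! How", " are you? Fi", "ne"]))
print(pieces)  # ['Hello!', ' How are you?', ' Fine']
```

Because "." is in the punctuation set, this simple rule also splits inside decimals such as "3.5"; a production splitter would need to handle that case.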

Performance improvements¶

Metric	Before	After	Improvement
First token time (knowledge Q&A)	~3s	<1s	67% ↓
First token time (chitchat fast-track)	~3s	<200ms	93% ↓
First token time (cache hit)	~3s	<100ms	97% ↓
Sync endpoint P95	7.94s	2.27s	71% ↓
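
The Improvement column is consistent with a relative reduction, (before − after) / before, rounded to a whole percent. The short check below reproduces the four figures; where the table gives an upper bound such as "<1s", the bound itself is used, so the real reduction is at least the value shown.

```python
def reduction_percent(before: float, after: float) -> int:
    """Relative reduction from `before` to `after`, as a whole percent."""
    return round((before - after) / before * 100)


rows = {
    "knowledge Q&A": (3.0, 1.0),
    "chitchat fast-track": (3.0, 0.2),
    "cache hit": (3.0, 0.1),
    "sync endpoint P95": (7.94, 2.27),
}
for label, (before, after) in rows.items():
    print(f"{label}: {reduction_percent(before, after)}%")
```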

Tests¶

Added tests for first-token time, fast-track streaming, and cache hits

Three-layer cache coordination

flowchart TD
    A[Request] --> B{HotQueryCache hit?}
    B -- Yes --> C[Return directly, first token <100ms]
    B -- No --> D{IntentCache hit?}
    D -- Yes --> E[Skip intent recognition]
    D -- No --> F[ModelRouter routes to small model]
    F --> G[Small model intent recognition]
    E --> H[Hybrid retrieval + generation]
    G --> H
    H --> I[Write to HotQueryCache]
    I --> J[Return result]
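
The flow in the chart can be sketched as a small class with two dictionaries and two injected callables. This is an illustration only: the `Pipeline` class, keying both caches by the raw query string, and writing to the intent cache after classification are assumptions not confirmed by this changelog.

```python
from typing import Callable, Dict


class Pipeline:
    def __init__(self,
                 classify_intent: Callable[[str], str],
                 generate: Callable[[str, str], str]) -> None:
        self.hot_query_cache: Dict[str, str] = {}
        self.intent_cache: Dict[str, str] = {}
        self.classify_intent = classify_intent  # small model in the chart
        self.generate = generate                # retrieval + main LLM

    def answer(self, query: str) -> str:
        # Layer 1: a hot-query hit skips the whole orchestration.
        cached = self.hot_query_cache.get(query)
        if cached is not None:
            return cached
        # Layer 2: reuse a known intent, otherwise ask the small model.
        intent = self.intent_cache.get(query)
        if intent is None:
            intent = self.classify_intent(query)
            self.intent_cache[query] = intent
        # Layer 3: generate, then store the result for future hits.
        result = self.generate(query, intent)
        self.hot_query_cache[query] = result
        return result


calls = {"intent": 0, "generate": 0}


def fake_intent(query: str) -> str:
    calls["intent"] += 1
    return "knowledge_qa"


def fake_generate(query: str, intent: str) -> str:
    calls["generate"] += 1
    return f"[{intent}] answer to: {query}"


pipeline = Pipeline(fake_intent, fake_generate)
pipeline.answer("How do I return an item?")
pipeline.answer("How do I return an item?")  # served from hot_query_cache
print(calls)  # {'intent': 1, 'generate': 1}
```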

v0.1.0¶

Release date: 2026-06

Theme: Initial version — Multi-agent collaboration + RAG knowledge-enhanced intelligent customer service system

Core features¶

Multi-agent collaboration architecture: "1+5" architecture based on LangGraph
1 orchestration Agent (OrchestratorAgent): Intent recognition, routing, fallback aggregation
5 specialized Agents: Knowledge retrieval (KnowledgeAgent) / Business query (BusinessAgent) / Emotion analysis (EmotionAgent) / Ticket handling (TicketAgent) / Dialog polishing (DialogAgent)
Automatic fallback to synchronous orchestration when LangGraph is unavailable
Hybrid retrieval + RAG:
Query rewriting → vector retrieval + BM25 dual-channel recall → RRF fusion → Reranker reordering → LLM generation
Refuses to answer when similarity is below threshold, avoiding hallucination
Recall@5 = 1.0, Hit Rate = 0.9333, hallucination rate = 0.0
Human escalation loop:
Triggered by emotion sensitivity / consecutive failures / user request
Generates EscalationCard with escalation reason, priority, and context summary
Working-hours constraint (WORKING_HOURS_START / WORKING_HOURS_END)
Knowledge base governance:
Document ingestion pipeline (PDF/Word/Markdown/HTML parsing)
Quality validation (deduplication, terminology, sensitive words)
Version management and rollback
Full / incremental / real-time update mechanisms
Business system integration:
Order / member / return / account API adapter framework
Identity verification, phone masking, two-step confirmation for write operations
mock / http dual mode, out of the box
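
The RRF fusion step in the retrieval pipeline above merges the vector and BM25 rankings by summing reciprocal ranks. A minimal sketch follows; the function name `rrf_fuse` is hypothetical, and the constant k = 60 is the value commonly used for RRF rather than a setting confirmed for this project.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["d1", "d2", "d3"]
bm25_hits = ["d3", "d1", "d4"]
print(rrf_fuse([vector_hits, bm25_hits]))  # ['d1', 'd3', 'd2', 'd4']
```

Documents that appear in both channels (d1 and d3) outrank those found by only one, which is the property that makes RRF useful before the reranker runs.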

Endpoint list (v0.1.0)¶

Module	Endpoint count	Description
Chat	2	`/chat` + `/chat/stream`
Knowledge base	9	Ingestion/stats/document management/quality/version/canary
Escalation	3	Solution submission/review/ingestion
Document update	4	Full/incremental/real-time/status
Ticket mining	2	Trigger/status
Retrieval tuning	3	Query/update/reset
Retrieval evaluation	3	Run/list/detail
Performance monitoring	3	Metrics/cache/invalidate
Observability	5	Circuit breaker/alerts/health/token
Monitoring	5	Overview/trace/agent/session
Operations	6	Experiment/dashboard/checklist
Health check	1	Liveness probe
Gateway	1	Multi-channel access

Tests¶

Initial test suite of 640+ cases, covering core chains and boundary scenarios

Documentation¶

Initial documentation system: Quick start, installation guide, configuration, architecture design, tutorials

v0.1.0 performance metrics (verified with real LLM)

Metric	Target	Actual	Pass
Recall@5	≥ 0.85	1.0	✅
Hit Rate	≥ 0.90	0.9333	✅
Hallucination rate	≤ 0.10	0.0	✅
Independent resolution rate	≥ 60%	80%	✅
Average response time	≤ 3s	2.27s	✅
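
For readers unfamiliar with the retrieval metric, one common definition of Recall@k is the share of relevant documents that appear in the top k results, averaged over queries. The sketch below implements that definition; the exact definitions used to produce the figures in the table are not stated on this page, so this should not be read as the project's evaluation code.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of relevant documents found among the top-k retrieved ones."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]],
                     k: int = 5) -> float:
    """Average Recall@k over (retrieved, relevant) pairs, one pair per query."""
    values: List[float] = [recall_at_k(r, rel, k) for r, rel in runs]
    return sum(values) / len(values)


runs = [
    (["a", "b", "c"], {"a"}),        # the relevant document is retrieved
    (["x", "y", "z"], {"y", "q"}),   # one of two relevant documents retrieved
]
print(mean_recall_at_k(runs, k=5))  # 0.75
```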

Version planning¶

Plan for the next version, v0.5.0 (draft)

Frontend agent workbench UI (consumes 8 agent endpoints)
Knowledge base management console UI
Multi-language support (English)
Elasticsearch full-text search integration

Related documentation¶

Contributing guide: Version release process
API reference: Complete list of current endpoints
Architecture design: Overall system architecture
