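# Design: End-to-End Trace Logging System for Development and Debugging

## Overview

Provides end-to-end request tracing for joint debugging of the mini-program frontend and backend. The backend collects a fine-grained log entry (span) at every layer, from the moment an HTTP request arrives down to the database queries it issues, and writes the spans to JSON Lines log files. admin-web provides a "Dev/Test Logs" section that supports filtering by time, type, and other dimensions, and viewing the complete request chain.

Enabled only in development and test environments; disabled in production through a switch.
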
## Architecture

### Data flow

```
Mini-program request → FastAPI backend
    ↓
TraceMiddleware (generates request_id, starts timing)
    ↓
CORS middleware (records MIDDLEWARE span)
    ↓
ResponseWrapperMiddleware (records MIDDLEWARE span)
    ↓
Auth layer (records AUTH span, including failure-reason category)
    ↓
Route handler (records ROUTE span)
    ↓
Service layer (records SERVICE span, including function name and parameters)
    ↓
Database layer (records DB_QUERY span, including SQL, parameters, row count, duration)
    ├─ Connection acquire (records DB_CONN span, including acquire time)
    └─ Connection release (records DB_CONN_RELEASE span)
    ↓
[Branch] SSE streaming response (records SSE_START / SSE_EVENT / SSE_END spans)
    ├─ AI call (records AI_CALL span, including app_id, prompt length, token count)
    └─ Streamed tokens (records SSE_EVENT span, including cumulative token count)
    ↓
[Branch] Exception handling (records ERROR span, including exception type, stack trace, originating layer)
    ↓
Response returned (records HTTP_OUT span, including status code, timing summary, response body size)
    ↓
TraceMiddleware writes to the JSON Lines file
    ↓
admin-web reads the log files through the API → display

WebSocket connection → independent trace
    ├─ WS_CONNECT span (connection established)
    ├─ WS_MESSAGE span (each message)
    └─ WS_DISCONNECT span (connection closed)

Background job → independent trace (job_id as root span)
    ├─ JOB_START span (job started)
    ├─ SERVICE / DB_QUERY spans (internal calls)
    └─ JOB_END / JOB_ERROR span (job finished / failed)
```

### Core components

#### 1. Backend: trace collection system

##### 1.1 TraceContext (contextvars)

```python
# apps/backend/app/trace/context.py
import contextvars
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

trace_context: contextvars.ContextVar['TraceContext'] = contextvars.ContextVar('trace_context')


@dataclass
class TraceSpan:
    """A single trace node."""
    span_type: str  # HTTP_IN, AUTH, ROUTE, SERVICE, DB_QUERY, DB_CONN, DB_CONN_RELEASE,
                    # HTTP_OUT, ERROR, DB_ERROR, MIDDLEWARE, MIDDLEWARE_ERROR,
                    # SSE_START, SSE_EVENT, SSE_END, AI_CALL, AI_STREAM, AI_ERROR,
                    # WS_CONNECT, WS_MESSAGE, WS_DISCONNECT,
                    # JOB_START, JOB_END, JOB_ERROR
    module: str              # module path (e.g. "xcx_tasks")
    function: str            # function name (e.g. "get_task_list")
    description_zh: str      # Chinese description
    description_en: str      # English description
    params: dict[str, Any]   # parameters
    result_summary: str      # result summary
    duration_ms: float       # duration in milliseconds
    timestamp: str           # ISO timestamp
    extra: dict[str, Any] = field(default_factory=dict)  # extra information such as the SQL statement


@dataclass
class TraceContext:
    """Request-scoped trace context."""
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex[:12])
    trace_type: str = "http"  # http, sse, ws, job
    start_time: datetime = field(default_factory=datetime.now)
    method: str = ""
    path: str = ""
    user_id: int | None = None
    site_id: int | None = None
    spans: list[TraceSpan] = field(default_factory=list)

    def add_span(self, span: TraceSpan):
        self.spans.append(span)
```

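
As a minimal sketch of how a context variable like `trace_context` might be used, the example below binds a context for the duration of one request and reads it back from a nested helper. The trimmed-down `Ctx` class and the `record` / `handle_request` helpers are hypothetical stand-ins introduced for illustration, not part of the design.

```python
import asyncio
import contextvars
from dataclasses import dataclass, field


@dataclass
class Ctx:
    """Hypothetical, trimmed-down stand-in for TraceContext."""
    request_id: str
    spans: list = field(default_factory=list)


# default=None lets code outside a traced request call get() safely.
current = contextvars.ContextVar("current", default=None)


def record(span_type: str) -> None:
    """Append a span to the active context; do nothing when no trace is active."""
    ctx = current.get()
    if ctx is not None:
        ctx.spans.append(span_type)


async def handle_request(request_id: str) -> Ctx:
    ctx = Ctx(request_id)
    token = current.set(ctx)      # bind the context for this task only
    try:
        record("HTTP_IN")
        await asyncio.sleep(0)    # yield so the concurrent requests interleave
        record("HTTP_OUT")
    finally:
        current.reset(token)      # restore the previous value
    return ctx


async def demo() -> list:
    # gather() wraps each coroutine in a task, and each task runs in its own context copy.
    return await asyncio.gather(handle_request("a"), handle_request("b"))


results = asyncio.run(demo())
```

Even though the two requests interleave at the `await`, each one only sees its own spans, which is the isolation the design relies on.
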
##### 1.2 TraceMiddleware (ASGI middleware)

```python
# apps/backend/app/trace/middleware.py
# - Creates a TraceContext for each request and stores it in contextvars
# - Records the HTTP_IN span (method, path, query_params, body_preview)
# - Records the HTTP_OUT span when the request ends (status_code, duration, body_size)
# - Writes the complete trace to the log file
# - Adds response headers X-Request-ID, X-Process-Time, X-DB-Queries, X-DB-Time
```

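
The responsibilities listed above could be sketched as a pure ASGI middleware. This is an illustration under stated assumptions, not the project's implementation: `TraceMiddlewareSketch`, the in-memory `records` list (standing in for the JSON Lines writer), and the demo app are all hypothetical, and only two of the four response headers are shown.

```python
import asyncio
import time
import uuid


class TraceMiddlewareSketch:
    """Pure-ASGI sketch: times one HTTP request and adds trace headers."""

    def __init__(self, app):
        self.app = app
        self.records = []  # stand-in for the JSON Lines writer

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        request_id = uuid.uuid4().hex[:12]
        started = time.perf_counter()
        status = {"code": None}

        async def send_wrapper(message):
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
                elapsed_ms = (time.perf_counter() - started) * 1000
                headers = list(message.get("headers", []))
                headers.append((b"x-request-id", request_id.encode()))
                headers.append((b"x-process-time", f"{elapsed_ms:.1f}".encode()))
                message = {**message, "headers": headers}
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            # Runs even if the downstream app raises, so the trace record is kept.
            self.records.append({
                "request_id": request_id,
                "path": scope["path"],
                "status_code": status["code"],
                "total_duration_ms": (time.perf_counter() - started) * 1000,
            })


async def hello_app(scope, receive, send):
    """Hypothetical downstream ASGI app used only for the demo."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


sent = []


async def capture(message):
    sent.append(message)


async def no_receive():
    return {"type": "http.request", "body": b"", "more_body": False}


middleware = TraceMiddlewareSketch(hello_app)
asyncio.run(middleware({"type": "http", "path": "/api/xcx/tasks"}, no_receive, capture))
```

Writing the record in a `finally` block is one way to keep the trace when the handler raises before a response is sent.
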
##### 1.3 trace_service / trace_db decorators

```python
# apps/backend/app/trace/decorators.py
def trace_service(description_zh: str, description_en: str):
    """Decorator for service-layer functions; automatically records a function-call span."""
    # Records: module name, function name, parameter names and values, return-value summary, duration


def trace_db(description_zh: str, description_en: str):
    """Decorator for database queries; automatically records a SQL span."""
    # Records: SQL statement, parameters, returned row count, duration
```

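
One possible shape for `trace_service` is sketched below, assuming synchronous service functions. The module-level `SPANS` list is a hypothetical sink used so the example runs on its own; the design appends to the active TraceContext instead.

```python
import functools
import time

SPANS = []  # hypothetical sink; the design appends to the active TraceContext instead


def trace_service(description_zh: str, description_en: str):
    """Decorator factory that records one SERVICE span per call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            outcome = "error"
            try:
                result = func(*args, **kwargs)
                outcome = repr(result)[:80]  # truncated result summary
                return result
            finally:
                # Recorded whether the call returned or raised.
                SPANS.append({
                    "span_type": "SERVICE",
                    "module": func.__module__,
                    "function": func.__name__,
                    "description_zh": description_zh,
                    "description_en": description_en,
                    "params": {"args": list(args), "kwargs": kwargs},
                    "result_summary": outcome,
                    "duration_ms": (time.perf_counter() - started) * 1000,
                })
        return wrapper
    return decorator


@trace_service("查询待处理任务", "Query pending tasks")
def get_task_list(site_id: int, status: str = "pending") -> list:
    return [1, 2, 3]


tasks = get_task_list(1, status="pending")
```

An `async def` service would need a second wrapper that awaits the call; this sketch leaves that out.
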
##### 1.4 Database connection wrapper

```python
# apps/backend/app/trace/db_wrapper.py
# Wraps get_connection() and intercepts cursor.execute()
# Automatically records, for every SQL statement:
# - The complete SQL statement (parameterized)
# - Bound parameter values
# - Returned row count
# - Execution time
# - Call origin (which service function issued it)
# Also records the connection lifecycle:
# - DB_CONN span: time taken to acquire the connection
# - DB_CONN_RELEASE span: connection release
```

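
The interception of `cursor.execute()` could look roughly like the proxy below. To keep the example self-contained, the standard-library `sqlite3` module stands in for the real database driver, and `TracedCursor` is a hypothetical name.

```python
import sqlite3
import time


class TracedCursor:
    """Proxy that times execute() and delegates everything else to the real cursor."""

    def __init__(self, cursor, spans):
        self._cursor = cursor
        self._spans = spans

    def execute(self, sql, params=()):
        started = time.perf_counter()
        try:
            return self._cursor.execute(sql, params)
        finally:
            self._spans.append({
                "span_type": "DB_QUERY",
                "duration_ms": (time.perf_counter() - started) * 1000,
                "extra": {
                    "sql": sql,
                    "params": list(params),
                    "row_count": self._cursor.rowcount,
                },
            })

    def __getattr__(self, name):
        # fetchall(), close(), description, ... fall through unchanged.
        return getattr(self._cursor, name)


spans = []
conn = sqlite3.connect(":memory:")  # sqlite3 stands in for the real driver
cur = TracedCursor(conn.cursor(), spans)
cur.execute("CREATE TABLE tasks (id INTEGER, status TEXT)")
cur.execute("INSERT INTO tasks VALUES (?, ?)", (1, "pending"))
cur.execute("SELECT id FROM tasks WHERE status = ?", ("pending",))
rows = cur.fetchall()
conn.close()
```

`sqlite3` reports `rowcount` as -1 for `SELECT` statements, so a real wrapper may need a driver-specific way to obtain the returned row count.
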
##### 1.5 Auth-layer tracing

```python
# Add spans inside dependencies such as require_approved() / get_current_user()
# Records: token prefix, user_id, site_id, roles, whether the check passed
# On auth failure, records a detailed reason category:
# - AUTH_EXPIRED: token expired
# - AUTH_INVALID: token invalid (bad signature)
# - AUTH_MALFORMED: token malformed (missing fields)
# - AUTH_LIMITED: limited token used on a full-access endpoint
# - AUTH_FORBIDDEN: insufficient role permissions
```

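
The failure categories above map naturally onto an enum plus a small classification function. The sketch below is illustrative only: the claim names (`sub`, `exp`, `limited`, `roles`), the order of the checks, and the 8-character prefix length are assumptions, since the design does not specify them.

```python
from enum import Enum
from typing import Optional


class AuthFailure(str, Enum):
    EXPIRED = "AUTH_EXPIRED"      # token expired
    INVALID = "AUTH_INVALID"      # bad signature
    MALFORMED = "AUTH_MALFORMED"  # required claim missing
    LIMITED = "AUTH_LIMITED"      # limited token used on a full-access endpoint
    FORBIDDEN = "AUTH_FORBIDDEN"  # role not permitted


TOKEN_PREFIX_LEN = 8  # assumed maximum; the design does not fix a length


def token_prefix(token: str, limit: int = TOKEN_PREFIX_LEN) -> str:
    """Keep at most `limit` leading characters so the full token never reaches the log."""
    return token[:limit]


def classify(claims: dict, now: float, required_role: str,
             signature_ok: bool = True) -> Optional[AuthFailure]:
    """Return the failure category, or None when the token passes every check."""
    if not signature_ok:
        return AuthFailure.INVALID
    if "sub" not in claims or "exp" not in claims:
        return AuthFailure.MALFORMED
    if claims["exp"] <= now:
        return AuthFailure.EXPIRED
    if claims.get("limited", False):
        return AuthFailure.LIMITED
    if required_role not in claims.get("roles", []):
        return AuthFailure.FORBIDDEN
    return None


claims = {"sub": 7, "exp": 2_000, "roles": ["coach"]}
ok = classify(claims, now=1_000, required_role="coach")
expired = classify(claims, now=3_000, required_role="coach")
forbidden = classify(claims, now=1_000, required_role="admin")
prefix = token_prefix("eyJhbGciOiJIUzI1NiJ9.payload.signature")
```
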
##### 1.6 SSE streaming response tracing

```python
# apps/backend/app/trace/sse_wrapper.py
# Wraps the event_generator of a StreamingResponse and traces the whole SSE flow:
# - SSE_START span: stream start (records endpoint, user, chat_id)
# - SSE_EVENT span: each event (message/done/error), with the cumulative token count
# - SSE_END span: stream end (total tokens, total duration, whether it completed normally)
# Special handling for the AI call chain:
# - AI_CALL span: DashScope API call (app_id, prompt length, session_id)
# - AI_STREAM span: streamed token reception (recorded once every N tokens to avoid a span explosion)
# - AI_ERROR span: AI call failure (error type, retry count)
```

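
Wrapping the event generator could be sketched as an async generator that passes events through while recording spans. In this illustration each event is counted as one token and the sampling interval `every` is arbitrary; both are assumptions, as are the names `traced_sse` and `fake_ai_stream`.

```python
import asyncio
import time


async def traced_sse(events, spans, every=2):
    """Wrap an async event generator and record SSE_* / AI_STREAM spans around it."""
    started = time.perf_counter()
    tokens = 0
    completed = False
    spans.append({"span_type": "SSE_START"})
    try:
        async for event in events:
            tokens += 1
            if tokens % every == 0:  # sample to avoid a span explosion
                spans.append({"span_type": "AI_STREAM", "cumulative_tokens": tokens})
            yield event
        completed = True
    finally:
        # Also runs when the stream is cut short, leaving completed=False.
        spans.append({
            "span_type": "SSE_END",
            "total_tokens": tokens,
            "completed": completed,
            "duration_ms": (time.perf_counter() - started) * 1000,
        })


async def fake_ai_stream():
    """Hypothetical token source standing in for the AI stream."""
    for token in ["Hel", "lo", "!", "[done]"]:
        yield token


async def collect():
    spans = []
    received = [event async for event in traced_sse(fake_ai_stream(), spans)]
    return received, spans


received, sse_spans = asyncio.run(collect())
```
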
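##### 1.7 Exception/error tracing

```python
# apps/backend/app/trace/error_handler.py
# Integrated into the global exception handlers (http_exception_handler / unhandled_exception_handler):
# - ERROR span: records exception type, exception message, stack summary (first 5 lines), originating layer
# - Distinguishes HTTPException (business error) from uncaught exceptions (system error)
# - Database exceptions (psycopg2.Error) are classified separately: DB_ERROR span
# - Ensures the trace is still written when an exception occurs (the handler calls TraceWriter)
```
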
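
Building an ERROR or DB_ERROR span from a caught exception might look like the sketch below. `error_span` and the `layer` argument are hypothetical, and `sqlite3.Error` stands in for `psycopg2.Error` so the example runs with the standard library alone.

```python
import sqlite3
import traceback


def error_span(exc: BaseException, layer: str, db_error_types=(sqlite3.Error,)) -> dict:
    """Build an ERROR (or DB_ERROR) span from a caught exception."""
    stack = traceback.format_exception(type(exc), exc, exc.__traceback__)
    lines = "".join(stack).splitlines()
    return {
        "span_type": "DB_ERROR" if isinstance(exc, db_error_types) else "ERROR",
        "extra": {
            "exception_type": type(exc).__name__,
            "message": str(exc),
            "stack_summary": lines[:5],  # first 5 lines, as in the design
            "layer": layer,
        },
    }


def failing_service():
    raise ValueError("task not found")


try:
    failing_service()
except ValueError as exc:
    span = error_span(exc, layer="SERVICE")

try:
    sqlite3.connect(":memory:").execute("SELECT * FROM missing_table")
except sqlite3.Error as exc:
    db_span = error_span(exc, layer="DB")
```
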
##### 1.8 WebSocket tracing

```python
# apps/backend/app/trace/ws_wrapper.py
# Wraps WebSocket endpoints and traces the full connection lifecycle:
# - WS_CONNECT span: connection established (execution_id, client info)
# - WS_MESSAGE span: messages pushed (message count, cumulative bytes; recorded once every N messages)
# - WS_DISCONNECT span: connection closed (reason, total messages, total duration)
# WebSocket traces use a separate request_id (ws_ prefix) to distinguish them from HTTP traces
```

##### 1.9 Background job tracing

```python
# apps/backend/app/trace/job_wrapper.py
# Wraps the job handlers registered in lifespan and traces background task execution:
# - JOB_START span: job started (job_name, trigger time)
# - Internal SERVICE / DB_QUERY spans are automatically attached to the job trace
# - JOB_END span: job finished normally (duration, number of records processed)
# - JOB_ERROR span: job failed (exception type, stack summary)
# Job traces use a separate request_id (job_ prefix) and are written to the same log file
# admin-web can filter by trace type (http/sse/ws/job)
```

##### 1.10 Middleware-layer tracing

```python
# TraceMiddleware records the execution time of the middleware chain:
# - MIDDLEWARE span: execution time of ResponseWrapperMiddleware
# - If response wrapping fails (JSON parse error), a MIDDLEWARE_ERROR span is recorded
# - Records the response body size (to detect abnormally large responses)
```

#### 2. Log file scheme

##### 2.1 File organization

```
export/dev-trace-logs/
├── 2026-03-22/
│   ├── trace_2026-03-22_00.jsonl   # split by hour
│   ├── trace_2026-03-22_01.jsonl
│   └── ...
├── 2026-03-23/
│   └── ...
└── _index.json                     # index file (date → file list → record count)
```

##### 2.2 Single log record format (JSON Lines)

Each trace is stored as a single line in the file; the record below is pretty-printed for readability.

```json
{
  "request_id": "a1b2c3d4e5f6",
  "timestamp": "2026-03-22T14:30:15.123",
  "method": "POST",
  "path": "/api/xcx/tasks",
  "status_code": 200,
  "total_duration_ms": 45,
  "user_id": 7,
  "site_id": 1,
  "db_query_count": 2,
  "db_total_ms": 20,
  "error": null,
  "spans": [
    {
      "span_type": "HTTP_IN",
      "module": "trace.middleware",
      "function": "TraceMiddleware.__call__",
      "description_zh": "接收请求 POST /api/xcx/tasks",
      "description_en": "Received request POST /api/xcx/tasks",
      "params": {"query": {"status": "pending", "page": "1"}, "body_preview": ""},
      "result_summary": "",
      "duration_ms": 0,
      "timestamp": "2026-03-22T14:30:15.123"
    },
    {
      "span_type": "AUTH",
      "module": "auth.dependencies",
      "function": "require_approved",
      "description_zh": "JWT 鉴权通过:用户ID=7, 门店ID=1, 角色=[coach]",
      "description_en": "JWT auth passed: user_id=7, site_id=1, roles=[coach]",
      "params": {"token_prefix": "eyJ..."},
      "result_summary": "approved",
      "duration_ms": 2,
      "timestamp": "2026-03-22T14:30:15.125"
    },
    {
      "span_type": "SERVICE",
      "module": "services.task_manager",
      "function": "get_task_list_v2",
      "description_zh": "调用任务管理服务:查询待处理任务",
      "description_en": "Called task manager service: query pending tasks",
      "params": {"user_id": 7, "site_id": 1, "status": "pending"},
      "result_summary": "Returned 15 tasks",
      "duration_ms": 38,
      "timestamp": "2026-03-22T14:30:15.126"
    },
    {
      "span_type": "DB_QUERY",
      "module": "services.task_manager",
      "function": "get_task_list_v2",
      "description_zh": "查询任务表:按门店和状态筛选",
      "description_en": "Query tasks table: filter by site and status",
      "params": {"site_id": 1, "status": "pending"},
      "result_summary": "15 rows",
      "duration_ms": 12,
      "timestamp": "2026-03-22T14:30:15.128",
      "extra": {
        "sql": "SELECT id, customer_name, task_type, ... FROM biz.tasks WHERE site_id = $1 AND status = $2",
        "params": [1, "pending"],
        "row_count": 15
      }
    },
    {
      "span_type": "HTTP_OUT",
      "module": "trace.middleware",
      "function": "TraceMiddleware.__call__",
      "description_zh": "响应返回 200 OK,耗时 45ms",
      "description_en": "Response sent 200 OK, took 45ms",
      "params": {},
      "result_summary": "200 OK, 3.2KB body",
      "duration_ms": 45,
      "timestamp": "2026-03-22T14:30:15.168"
    }
  ]
}
```

##### 2.3 File splitting strategy

- One directory per date: `YYYY-MM-DD/`
- One file per hour: `trace_YYYY-MM-DD_HH.jsonl`
- A file that exceeds 10 MB is rotated automatically: `trace_YYYY-MM-DD_HH_001.jsonl`
- The index file `_index.json` records the record count and size of every file

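
The naming rules above can be sketched as a small path helper. `trace_file_path` is a hypothetical name, and the rule that rotation walks forward through `_001`, `_002`, ... until it finds a file with room is an assumption about how the sequence number advances.

```python
import tempfile
from datetime import datetime
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # 10 MB rotation threshold from the list above


def trace_file_path(root: Path, ts: datetime, max_bytes: int = MAX_BYTES) -> Path:
    """Return the file that a record stamped `ts` should be appended to."""
    day_dir = root / ts.strftime("%Y-%m-%d")
    stem = ts.strftime("trace_%Y-%m-%d_%H")
    candidate = day_dir / f"{stem}.jsonl"
    seq = 0
    # Walk forward through rotated files until one has room (or does not exist yet).
    while candidate.exists() and candidate.stat().st_size >= max_bytes:
        seq += 1
        candidate = day_dir / f"{stem}_{seq:03d}.jsonl"
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    ts = datetime(2026, 3, 22, 14, 30, 15)
    first = trace_file_path(root, ts, max_bytes=16)
    first.parent.mkdir(parents=True)
    first.write_text("x" * 32)  # now over the 16-byte limit used for the demo
    rotated = trace_file_path(root, ts, max_bytes=16)
    first_rel = first.relative_to(root).as_posix()
    rotated_rel = rotated.relative_to(root).as_posix()
```
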
##### 2.4 Cleanup strategy

- Automatic cleanup: checked daily in the early morning; logs older than N days are deleted (default 7 days)
- Manual cleanup: admin-web provides cleanup by date range
- Configuration: `DEV_TRACE_LOG_RETENTION_DAYS=7` (.env)

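
A retention pass over the date directories might be sketched as follows. `cleanup_old_logs` is a hypothetical name, a directory exactly N days old is treated as still inside the window (an assumption about the boundary), and updating `_index.json` after deletion is left out of the sketch.

```python
import shutil
import tempfile
from datetime import date, timedelta
from pathlib import Path


def cleanup_old_logs(root: Path, retention_days: int, today: date) -> list:
    """Delete date directories older than the retention window; return their names."""
    cutoff = today - timedelta(days=retention_days)
    removed = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue  # skip _index.json and stray files
        try:
            day = date.fromisoformat(entry.name)
        except ValueError:
            continue  # not a YYYY-MM-DD directory, leave it alone
        if day < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["2026-03-10", "2026-03-15", "2026-03-22", "not-a-date"]:
        (root / name).mkdir()
    (root / "_index.json").write_text("{}")
    removed = cleanup_old_logs(root, retention_days=7, today=date(2026, 3, 22))
    remaining = sorted(p.name for p in root.iterdir())
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the retention rule easy to test.
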
#### 3. Backend API (log reading + coverage)

```
GET  /api/admin/dev-trace/coverage        # get the most recent coverage scan result
POST /api/admin/dev-trace/coverage/scan   # trigger a coverage scan manually
GET  /api/admin/dev-trace/dates           # list the dates that have logs
GET  /api/admin/dev-trace/requests        # query the request list by filter
     ?date=2026-03-22
     &start_time=14:00
     &end_time=15:00
     &trace_type=http|sse|ws|job          # new: filter by trace type
     &method=POST
     &path_contains=tasks
     &status_code=200
     &min_duration=100
     &has_error=true                      # new: only requests with errors
     &span_type=DB_QUERY,ERROR            # new: requests containing specific span types
     &page=1&page_size=50
GET  /api/admin/dev-trace/request/{id}    # get the complete span chain of one request
POST /api/admin/dev-trace/cleanup         # manually clean up logs in a date range
GET  /api/admin/dev-trace/settings        # get log settings (retention days, switch state)
PUT  /api/admin/dev-trace/settings        # update log settings
```

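
The query parameters of the `requests` endpoint combine as a conjunction of optional filters. The sketch below illustrates that logic over already-parsed records; `matches` and `paginate` are hypothetical helpers, and treating `span_type` as "contains any of the listed types" is an assumption, since the design does not say whether any or all must be present.

```python
def matches(record: dict, *, method=None, path_contains=None, status_code=None,
            min_duration=None, has_error=None, span_types=None) -> bool:
    """Return True when the record satisfies every filter that is not None."""
    if method is not None and record["method"] != method:
        return False
    if path_contains is not None and path_contains not in record["path"]:
        return False
    if status_code is not None and record["status_code"] != status_code:
        return False
    if min_duration is not None and record["total_duration_ms"] < min_duration:
        return False
    if has_error is not None and (record["error"] is not None) != has_error:
        return False
    if span_types is not None:
        present = {span["span_type"] for span in record["spans"]}
        if not present & set(span_types):  # assumed semantics: any listed type
            return False
    return True


def paginate(items: list, page: int = 1, page_size: int = 50) -> list:
    """Return the 1-indexed page of items."""
    start = (page - 1) * page_size
    return items[start:start + page_size]


records = [
    {"method": "POST", "path": "/api/xcx/tasks", "status_code": 200,
     "total_duration_ms": 45, "error": None,
     "spans": [{"span_type": "HTTP_IN"}, {"span_type": "DB_QUERY"}]},
    {"method": "GET", "path": "/api/xcx/notes", "status_code": 500,
     "total_duration_ms": 180, "error": "ValueError",
     "spans": [{"span_type": "HTTP_IN"}, {"span_type": "ERROR"}]},
]
slow_errors = [r for r in records if matches(r, min_duration=100, has_error=True)]
with_db = paginate([r for r in records if matches(r, span_types=["DB_QUERY"])])
```
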
#### 4. admin-web: Dev/Test Logs section

##### 4.1 Page structure

Two-pane (left/right) layout:

- Left: request list (time, method/type, path, status code, duration, DB query count)
- Right: complete span chain tree of the selected request (shown with hierarchical indentation)
- Top: filter bar (date, time range, trace type [HTTP/SSE/WS/Job], method, path keyword, status code, minimum duration, span_type filter)
- Span type color coding: HTTP = blue, AUTH = orange, SERVICE = green, DB = purple, ERROR = red, SSE = cyan, WS = yellow, JOB = gray

##### 4.2 Coverage dashboard (top of page)

A trace coverage status bar is shown at the top of the DevTrace page:

```
┌──────────────────────────────────────────────────────────────────────────────────┐
│ 📊 Trace coverage: Routes 10/11 (91%) | Services 7/23 (30%) | Jobs 4/4 (100%)    │
│ Uncovered: xcx_test, fdw_queries, matching, application, ...         [🔄 Scan]   │
└──────────────────────────────────────────────────────────────────────────────────┘
```

Backend scan logic (`apps/backend/app/trace/coverage.py`):

```python
# Scan dimensions:
# 1. Route coverage: scan the route functions in app/routers/xcx_*.py and
#    compare them against TraceMiddleware's route-prefix matching rules to decide whether they are traced
# 2. Service coverage: scan all public functions (not starting with _) under app/services/ and
#    check whether they carry the @trace_service decorator
# 3. Job coverage: scan the job handlers registered in lifespan and
#    check whether they are wrapped by job_wrapper
# 4. SSE/WS coverage: scan SSE/WS endpoints and check whether the matching wrapper is integrated

# Output structure (shown as a Python literal; `...` marks elided entries):
{
    "scan_time": "2026-03-22T14:30:00",
    "routes": {
        "total": 11, "covered": 10,
        "uncovered": ["xcx_test"],
        "details": [{"name": "xcx_tasks", "covered": True, "functions": 4}, ...]
    },
    "services": {
        "total": 23, "covered": 7,
        "uncovered": ["fdw_queries.get_member_data", "matching.find_best_match", ...],
        "details": [{"module": "task_manager", "total": 5, "covered": 5}, ...]
    },
    "jobs": {"total": 4, "covered": 4, "uncovered": []},
    "sse_endpoints": {"total": 1, "covered": 1, "uncovered": []},
    "ws_endpoints": {"total": 1, "covered": 1, "uncovered": []}
}
```

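
The service-coverage dimension can be checked statically with the standard-library `ast` module, without importing the scanned code. The sketch below covers only that one dimension for a single source string; `scan_service_source` is a hypothetical name and the real scanner may work differently.

```python
import ast


def scan_service_source(source: str, decorator: str = "trace_service") -> dict:
    """Report which public top-level functions carry the given decorator."""
    covered, uncovered = [], []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_"):
            continue  # private helpers are not counted
        names = set()
        for dec in node.decorator_list:
            target = dec.func if isinstance(dec, ast.Call) else dec
            if isinstance(target, ast.Name):
                names.add(target.id)    # @trace_service(...)
            elif isinstance(target, ast.Attribute):
                names.add(target.attr)  # @decorators.trace_service(...)
        (covered if decorator in names else uncovered).append(node.name)
    return {
        "total": len(covered) + len(uncovered),
        "covered": len(covered),
        "uncovered": uncovered,
    }


SAMPLE = '''
@trace_service("查询任务", "Query tasks")
def get_task_list(site_id): ...

async def find_best_match(member_id): ...

def _internal_helper(): ...
'''
report = scan_service_source(SAMPLE)
```

Because `total` is derived from the two lists, the invariant `total == covered + uncovered` holds by construction.
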
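Scan triggers:

- Manual scan: clicking the "Scan" button on the admin-web page calls the API and runs a scan immediately
- Scheduled scan: the backend scans once at startup, then periodically at the configured interval (default 1 hour)
- Scan results are cached in memory; the API returns the most recent scan result

##### 4.3 Settings panel

- Log switch (enable/disable)
- Retention days setting
- Automatic cleanup switch
- Manual cleanup (by date range)
- Disk usage statistics
- Coverage scan interval setting (minutes)

#### 5. Switch mechanism

##### 5.1 Environment variables

```env
DEV_TRACE_ENABLED=true                    # master switch
DEV_TRACE_LOG_DIR=export/dev-trace-logs   # log directory
DEV_TRACE_LOG_RETENTION_DAYS=7            # retention days for automatic cleanup
DEV_TRACE_LOG_SQL=true                    # whether to record full SQL
DEV_TRACE_LOG_PARAMS=true                 # whether to record function parameter values
```

##### 5.2 Runtime switch

- The admin-web settings panel can toggle tracing dynamically (the API modifies in-memory state)
- No backend restart is required
- After a restart, the settings fall back to the .env configuration
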
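
The interplay between the .env values and the in-memory runtime switch could be sketched as below. `TraceSettings`, `load_settings`, and the fallback defaults (other than the 7-day retention) are assumptions made for illustration.

```python
import os
from dataclasses import dataclass, replace


def _env_bool(name: str, default: bool) -> bool:
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class TraceSettings:
    enabled: bool
    log_dir: str
    retention_days: int
    log_sql: bool
    log_params: bool


def load_settings() -> TraceSettings:
    """Read the five DEV_TRACE_* variables; missing ones fall back to assumed defaults."""
    return TraceSettings(
        enabled=_env_bool("DEV_TRACE_ENABLED", False),
        log_dir=os.environ.get("DEV_TRACE_LOG_DIR", "export/dev-trace-logs"),
        retention_days=int(os.environ.get("DEV_TRACE_LOG_RETENTION_DAYS", "7")),
        log_sql=_env_bool("DEV_TRACE_LOG_SQL", True),
        log_params=_env_bool("DEV_TRACE_LOG_PARAMS", True),
    )


os.environ.update({"DEV_TRACE_ENABLED": "true", "DEV_TRACE_LOG_RETENTION_DAYS": "14"})
os.environ.pop("DEV_TRACE_LOG_SQL", None)
settings = load_settings()
runtime = replace(settings, enabled=False)  # in-memory toggle; the environment is untouched
reloaded = load_settings()                  # what a restart would see
```

Keeping the runtime state as a separate immutable copy makes the "restart falls back to .env" behavior fall out naturally.
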
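## Considerations

### Performance impact

- JSON Lines records are appended to the file, so I/O overhead is minimal
- contextvars needs no locks and keeps each request's context isolated per asyncio task and per thread
- Decorator overhead: roughly 0.01 ms per span (negligible)
- File writes are asynchronous (a write failure does not affect request handling)

### Security

- Only the admin role can access the log API
- SQL parameter values are recorded in the logs (acceptable in development; turned off in production)
- Only the token prefix is recorded, never the full value

### Relationship to existing systems

- Does not affect the existing `ResponseWrapperMiddleware` (the trace middleware sits outside it)
- Does not affect the existing logging configuration
- Log file paths follow the `export-paths` convention

### Implementation scope

- Full coverage of xcx_* routes (login, tasks, notes, performance, AI chat, customers, coaches, dashboards, configuration)
- Complete tracing of the SSE streaming endpoint (the AI chat stream in xcx_chat)
- Connection lifecycle tracing for the WebSocket endpoint (/ws/logs)
- Execution tracing for background jobs (task_generator, task_expiry, recall_detector, note_reclassifier)
- End-to-end exception/error tracing (business exceptions, system exceptions, and database exceptions)
- Database connection lifecycle tracing (acquire/release)
- Middleware-layer timing
- Can later be extended to admin_* routes
- Service-layer decorators are added as needed (services involved in joint debugging first)
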
## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system; in essence, it is a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Request ID uniqueness

*For any* sequence of N HTTP requests processed by the TraceMiddleware, all N generated request_id values shall be distinct.

**Validates: Requirement 1.1**

### Property 2: Span order preservation

*For any* sequence of TraceSpan objects added to a TraceContext, the spans list shall preserve the insertion order (i.e., `spans[i].timestamp <= spans[i+1].timestamp` for all valid i).

**Validates: Requirement 1.5**

### Property 3: TraceSpan structural completeness

*For any* TraceSpan instance regardless of span_type, the serialized JSON output shall contain all required fields: span_type, module, function, description_zh, description_en, params, result_summary, duration_ms, timestamp, and extra. Additionally, the top-level trace record shall contain request_id, timestamp, method, path, status_code, total_duration_ms, user_id, site_id, db_query_count, db_total_ms, error, and spans.

**Validates: Requirements 2.4, 3.1, 3.2**

### Property 4: Token prefix truncation

*For any* JWT token string of any length, the AUTH span shall record only a prefix (not exceeding a fixed maximum length) and the recorded value shall not equal the complete token when the token exceeds that length.

**Validates: Requirement 2.5**

### Property 5: JSON serialization round-trip consistency

*For any* valid TraceContext object, serializing it to a JSON line then parsing that JSON line back shall produce an equivalent data structure (all field values preserved).

**Validates: Requirement 3.5**

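
A round-trip check in the spirit of Property 5 might look like the sketch below. The `Span` and `Trace` classes are reduced, hypothetical versions of TraceSpan and TraceContext, and the timestamp is held as an ISO string because `datetime` objects are not directly JSON-serializable.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Span:
    span_type: str
    duration_ms: float
    extra: dict = field(default_factory=dict)


@dataclass
class Trace:
    request_id: str
    timestamp: str  # ISO string; a datetime would need explicit conversion
    spans: list = field(default_factory=list)


def to_json_line(trace: Trace) -> str:
    """Serialize to one line; ensure_ascii=False keeps Chinese descriptions readable."""
    return json.dumps(asdict(trace), ensure_ascii=False)


def from_json_line(line: str) -> Trace:
    raw: dict[str, Any] = json.loads(line)
    raw["spans"] = [Span(**span) for span in raw["spans"]]
    return Trace(**raw)


original = Trace(
    request_id="a1b2c3d4e5f6",
    timestamp="2026-03-22T14:30:15.123",
    spans=[Span("DB_QUERY", 12.0, {"sql": "SELECT 1", "note": "查询\n任务"})],
)
line = to_json_line(original)
restored = from_json_line(line)
```

Embedded newlines are escaped by `json.dumps`, so a record always occupies exactly one line of the `.jsonl` file.
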
### Property 6: Log file path generation

*For any* timestamp, the generated log directory name shall match the format `YYYY-MM-DD/` and the generated log file name shall match the format `trace_YYYY-MM-DD_HH.jsonl`, where the date and hour components correspond to the input timestamp.

**Validates: Requirements 4.1, 4.2**

### Property 7: Index file consistency

*For any* sequence of log write operations, the `_index.json` file shall accurately reflect the current state: every referenced file exists, every existing log file is referenced, and the record count and file size for each entry match the actual file.

**Validates: Requirements 4.4, 4.5**

### Property 8: Cleanup retention correctness

*For any* set of date directories and a configured retention period of N days, after cleanup executes, all directories with dates older than N days from today shall be deleted, all directories within the retention window shall be preserved, and the `_index.json` shall not reference any deleted directories.

**Validates: Requirements 5.2, 5.4**

### Property 9: API filter correctness

*For any* set of stored trace records and any combination of filter parameters (date, time range, trace type, method, path keyword, status code, minimum duration, error flag, span type), all returned results shall satisfy every specified filter criterion, and no matching record shall be omitted from the results (within pagination bounds).

**Validates: Requirement 6.2**

### Property 10: Trace write-read round-trip consistency

*For any* TraceContext written to a log file, querying the Trace_API with the corresponding request_id shall return a record equivalent to the original TraceContext (all fields and spans preserved).

**Validates: Requirement 6.3**

### Property 11: Admin permission enforcement

*For any* Trace_API endpoint and any user without admin role, the API shall return a 403 Forbidden response.

**Validates: Requirements 6.5, 6.6**

### Property 12: Settings update round-trip consistency

*For any* valid settings update (enabled status, retention days, SQL logging flag, parameter logging flag), after a PUT to the settings API, a subsequent GET shall return the updated values.

**Validates: Requirement 7.2**

### Property 13: No trace output when the switch is off

*For any* HTTP request processed while the trace system is disabled (either via `DEV_TRACE_ENABLED=false` or runtime switch off), no TraceContext shall be created, no spans shall be recorded, and no log file entries shall be written.

**Validates: Requirements 8.2, 8.3**

### Property 14: Feature flags control span content

*For any* DB_QUERY span when `DEV_TRACE_LOG_SQL` is false, the span shall not contain the full SQL statement. *For any* SERVICE span when `DEV_TRACE_LOG_PARAMS` is false, the span shall not contain function parameter values.

**Validates: Requirements 8.5, 8.6**

### Property 15: Route prefix filtering

*For any* HTTP request, the TraceMiddleware shall produce trace data if and only if the request path matches the `xcx_*` route prefix. Non-matching requests shall produce no trace output.

**Validates: Requirements 11.1, 11.2**

### Property 16: Trace completeness on exceptions

*For any* HTTP request that results in an exception (HTTPException or unhandled), the trace record shall contain an ERROR or DB_ERROR span with the exception type and message, and the HTTP_OUT span shall still be recorded with the correct error status code.

**Validates: Requirements 12.1, 12.2, 12.3**

### Property 17: SSE streaming trace completeness

*For any* SSE streaming response, the trace record shall contain SSE_START and SSE_END spans. Because SSE_EVENT spans record cumulative token counts, the SSE_END span's total_tokens shall equal the cumulative token count reported by the last SSE_EVENT span. If an error occurs during streaming, an AI_ERROR span shall be present.

**Validates: Requirements 13.1, 13.2, 13.3**

### Property 18: WebSocket trace lifecycle

*For any* WebSocket connection, the trace record shall contain a WS_CONNECT span and a WS_DISCONNECT span. The WS_DISCONNECT span's total_messages shall be consistent with the message counts recorded in the WS_MESSAGE spans, which are emitted once every N messages rather than once per message.

**Validates: Requirements 14.1, 14.2**

### Property 19: Background job trace completeness

*For any* background job execution, the trace record shall contain a JOB_START span and either a JOB_END or JOB_ERROR span. Internal SERVICE and DB_QUERY spans shall be associated with the same job trace request_id.

**Validates: Requirements 15.1, 15.2**

### Property 20: Auth failure reason classification

*For any* authentication failure, the AUTH span shall contain a failure_reason field with one of the defined categories (AUTH_EXPIRED, AUTH_INVALID, AUTH_MALFORMED, AUTH_LIMITED, AUTH_FORBIDDEN), and the reason shall accurately reflect the actual failure cause.

**Validates: Requirement 12.4**

### Property 21: Database connection lifecycle pairing

*For any* database connection acquired during a request, there shall be exactly one DB_CONN span (connection open) and one DB_CONN_RELEASE span (connection close). The DB_CONN_RELEASE timestamp shall be >= the DB_CONN timestamp.

**Validates: Requirement 16.1**

### Property 22: Coverage scan consistency

*For any* set of route files, service modules, and job handlers in the backend codebase, the coverage scanner shall correctly identify all public functions and accurately report which ones have trace decorators/wrappers applied. The total count shall equal covered + uncovered for each category.

**Validates: Requirements 18.1, 18.2**