Requirements Document
Introduction
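The development and debugging full-chain log system (dev-trace-log) provides end-to-end request tracing for joint debugging of the mini-program frontend and backend. The backend collects a fine-grained log entry (span) at every layer, from the incoming HTTP request down to the database query, and writes these spans to JSON Lines log files. admin-web provides a "Dev Test Log" section that supports multi-dimensional filtering and viewing the complete request chain. The system is enabled only in development and test environments; in production it is turned off via a switch.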
Glossary
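- Trace_System: The full-chain log collection system, comprising the middleware, decorators, database wrapper, and related components
- TraceMiddleware: ASGI middleware that creates a trace context for each request, records the HTTP_IN/HTTP_OUT spans, and writes the log file
- TraceContext: Request-scoped trace context based on contextvars; stores the request_id, the span list, and related information
- TraceSpan: A single trace node recording one layer's function call information (type, module, function name, parameters, duration, etc.)
- Span_Type: Enumeration of span types, including HTTP_IN, AUTH, ROUTE, SERVICE, DB_QUERY, and HTTP_OUT
- JSON_Lines_File: A log file with the .jsonl extension; each line holds one complete request trace record in JSON format
- Log_Writer: The log file writing component, responsible for splitting files by date and hour, rotation, and index maintenance
- Trace_API: The set of admin-web backend API endpoints providing log query, cleanup, and settings functions
- Admin_Web_Trace_Page: The "Dev Test Log" section in admin-web, showing the request list and the span chain tree
- Runtime_Switch: A runtime dynamic switch that enables or disables log collection by modifying in-memory state via the API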
Requirements
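Requirement 1: Request-Level Trace Context Management
User Story: As a backend developer, I want every HTTP request to automatically get a unique trace context, so that all spans across the request's full chain are associated with the same request_id.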
Acceptance Criteria
- WHEN an HTTP request enters the FastAPI application, THE TraceMiddleware SHALL create a new TraceContext with a unique request_id and store it in contextvars
- WHEN the TraceContext is created, THE TraceMiddleware SHALL record an HTTP_IN span containing the request method, path, query parameters, and body preview
- WHEN the HTTP response is sent, THE TraceMiddleware SHALL record an HTTP_OUT span containing the status code, total duration, and response body size
- WHEN the HTTP response is sent, THE TraceMiddleware SHALL include X-Request-ID, X-Process-Time, X-DB-Queries, and X-DB-Time in the response headers
- THE TraceContext SHALL maintain an ordered list of TraceSpan objects appended during the request lifecycle
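The criteria above can be illustrated with a minimal sketch, assuming a plain-Python context object rather than the real ASGI middleware. The helper names `start_trace`, `current_trace`, and `finish_trace`, the reduced span fields, and the header value formats are hypothetical and not taken from the spec.

```python
import contextvars
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TraceSpan:
    span_type: str
    function: str
    duration_ms: float = 0.0
    extra: Dict[str, Any] = field(default_factory=dict)


@dataclass
class TraceContext:
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: float = field(default_factory=time.perf_counter)
    spans: List[TraceSpan] = field(default_factory=list)  # kept in append order


# One context per request; contextvars keeps concurrent requests isolated.
_current: contextvars.ContextVar = contextvars.ContextVar("trace_ctx", default=None)


def start_trace(method: str, path: str) -> TraceContext:
    """Create the context and record the HTTP_IN span (hypothetical helper)."""
    ctx = TraceContext()
    ctx.spans.append(TraceSpan("HTTP_IN", "start_trace", extra={"method": method, "path": path}))
    _current.set(ctx)
    return ctx


def current_trace() -> Optional[TraceContext]:
    return _current.get()


def finish_trace(status_code: int) -> Dict[str, str]:
    """Record HTTP_OUT and build the four response headers named in the criteria."""
    ctx = _current.get()
    if ctx is None:
        return {}
    total_ms = (time.perf_counter() - ctx.started_at) * 1000
    db_spans = [s for s in ctx.spans if s.span_type == "DB_QUERY"]
    ctx.spans.append(TraceSpan("HTTP_OUT", "finish_trace", total_ms, {"status_code": status_code}))
    _current.set(None)
    return {
        "X-Request-ID": ctx.request_id,
        "X-Process-Time": f"{total_ms:.1f}",  # millisecond formatting is an assumption
        "X-DB-Queries": str(len(db_spans)),
        "X-DB-Time": f"{sum(s.duration_ms for s in db_spans):.1f}",
    }
```

In a real middleware, `start_trace` would run before the downstream app is called and `finish_trace` after the response is produced; lower layers would reach the context through `current_trace()`.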
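Requirement 2: Multi-Layer Span Collection
User Story: As a backend developer, I want function calls in the four layers (authentication, routing, Service, and database) to be recorded as spans automatically, so that I can see the complete request-processing chain when debugging.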
Acceptance Criteria
- WHEN a request passes through the authentication layer, THE Trace_System SHALL record an AUTH span containing token parse result, user_id, site_id, roles, and approval status
- WHEN a decorated Service function is called, THE Trace_System SHALL record a SERVICE span containing the module name, function name, parameter names and values, return value summary, and duration
- WHEN a database query is executed, THE Trace_System SHALL record a DB_QUERY span containing the full parameterized SQL statement, bound parameter values, returned row count, execution duration, and calling source function
- THE TraceSpan SHALL contain the following fields: span_type, module, function, description_zh, description_en, params, result_summary, duration_ms, timestamp, and extra
- WHEN the AUTH span records a token, THE Trace_System SHALL record only the token prefix, not the complete token value
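One way to collect SERVICE spans is a decorator. The sketch below assumes spans are appended to a module-level list (`SPANS`, a stand-in for the per-request span list) and that a token prefix of 8 characters is acceptable; both are illustrative choices, as is the `mask_token` helper.

```python
import functools
import inspect
import time
from typing import Any, Callable, Dict, List

SPANS: List[Dict[str, Any]] = []  # stand-in for the request's span list


def mask_token(token: str, prefix_len: int = 8) -> str:
    """Keep only a short prefix so the full token never reaches the log."""
    return token[:prefix_len] + "..."


def trace_service(description_en: str = "") -> Callable:
    """Record a SERVICE span around each call of the decorated function."""
    def decorator(func: Callable) -> Callable:
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()  # include defaulted parameters in the span
            start = time.perf_counter()
            summary = "<raised>"  # replaced when the call returns normally
            try:
                result = func(*args, **kwargs)
                summary = repr(result)[:80]
                return result
            finally:
                SPANS.append({
                    "span_type": "SERVICE",
                    "module": func.__module__,
                    "function": func.__name__,
                    "description_en": description_en,
                    "params": dict(bound.arguments),
                    "result_summary": summary,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                })
        return wrapper
    return decorator
```

Recording the span in a `finally` block means a failing Service call still leaves a span behind, which matters for the error-tracing requirements later in the document.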
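Requirement 3: JSON Lines Log Serialization and Writing
User Story: As a backend developer, I want each request's complete trace data written to a log file in JSON Lines format, so that it can later be read and displayed via the API.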
Acceptance Criteria
- WHEN a request completes, THE Log_Writer SHALL serialize the TraceContext into a single JSON line containing request_id, timestamp, method, path, status_code, total_duration_ms, user_id, site_id, db_query_count, db_total_ms, error, and spans array
- WHEN serializing a TraceSpan, THE Log_Writer SHALL include all TraceSpan fields in the JSON output
- THE Log_Writer SHALL write log entries by appending to the current JSON Lines file asynchronously
- IF the log file write operation fails, THEN THE Log_Writer SHALL not affect the HTTP request processing or response
- THE Log_Writer SHALL produce valid JSON on each line such that parsing then re-serializing a log entry produces an equivalent JSON object (round-trip property)
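A minimal sketch of the serialization and failure-isolation criteria, using only the standard library. The function names are hypothetical, and the write is synchronous here for brevity, whereas the spec calls for asynchronous appends.

```python
import json
from typing import Any, Dict


def serialize_trace(record: Dict[str, Any]) -> str:
    """Serialize one request trace to a single JSON line (no trailing newline)."""
    # json.dumps escapes newlines inside strings, so the output stays on one line.
    return json.dumps(record, ensure_ascii=False, separators=(",", ":"))


def append_trace(path: str, record: Dict[str, Any]) -> bool:
    """Append one record; report failure instead of raising into the request path."""
    try:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(serialize_trace(record) + "\n")
        return True
    except (OSError, TypeError, ValueError):
        # Unwritable path or non-serializable value: drop the entry, keep serving.
        return False
```

The round-trip property can then be checked by parsing a line with `json.loads` and comparing it with the result of serializing and parsing it again.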
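Requirement 4: Log File Splitting and Rotation
User Story: As an operations engineer, I want log files to be split automatically by date and hour and rotated automatically when a single file grows too large, so that disk space stays manageable.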
Acceptance Criteria
- THE Log_Writer SHALL organize log files into date-based directories using the format YYYY-MM-DD/
- THE Log_Writer SHALL split log files by hour using the naming format trace_YYYY-MM-DD_HH.jsonl
- WHEN a log file exceeds 10 MB, THE Log_Writer SHALL rotate to a new file with an incremented suffix using the format trace_YYYY-MM-DD_HH_NNN.jsonl
- THE Log_Writer SHALL maintain an _index.json file recording the file list, record count, and file size for each date directory
- WHEN a new log entry is written, THE Log_Writer SHALL update the _index.json file to reflect the current state
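The naming and rotation rules can be sketched as a pure function. It assumes the current file sizes are available as a mapping from file name to bytes (for example, as read from _index.json), that NNN is a zero-padded three-digit counter starting at 001, and that 10 MB means 10 × 1024 × 1024 bytes; none of these details is fixed by the spec.

```python
from datetime import datetime
from typing import Dict

MAX_BYTES = 10 * 1024 * 1024  # assumed interpretation of "10 MB"


def log_relative_path(now: datetime, sizes: Dict[str, int], max_bytes: int = MAX_BYTES) -> str:
    """Return the date-directory-relative path the next entry should be appended to."""
    day = now.strftime("%Y-%m-%d")
    stem = f"trace_{day}_{now.strftime('%H')}"
    name = f"{stem}.jsonl"
    suffix = 0
    # Skip every file for this hour that has already exceeded the limit.
    while sizes.get(name, 0) > max_bytes:
        suffix += 1
        name = f"{stem}_{suffix:03d}.jsonl"
    return f"{day}/{name}"
```

Keeping the decision separate from the file I/O makes the rotation rule easy to test without touching the disk.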
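Requirement 5: Automatic Log Cleanup
User Story: As an operations engineer, I want the system to clean up expired logs automatically, so that disk usage does not grow without bound.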
Acceptance Criteria
- THE Trace_System SHALL execute an automatic cleanup check daily at midnight
- WHEN the automatic cleanup runs, THE Trace_System SHALL delete all log directories older than the configured retention days (default 7 days)
- THE Trace_System SHALL read the retention period from the environment variable DEV_TRACE_LOG_RETENTION_DAYS
- WHEN a date directory is deleted during cleanup, THE Trace_System SHALL update the _index.json file accordingly
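A sketch of how the cleanup check might select directories, assuming "older than the retention days" means a directory date strictly before today minus the retention period. The function names are hypothetical, and the actual deletion and index update are left out.

```python
import os
from datetime import date, datetime, timedelta
from typing import Iterable, List, Mapping


def retention_days(env: Mapping[str, str] = os.environ) -> int:
    """Read the retention period, falling back to the 7-day default from the spec."""
    return int(env.get("DEV_TRACE_LOG_RETENTION_DAYS", "7"))


def expired_dirs(dir_names: Iterable[str], today: date, keep_days: int) -> List[str]:
    """Return the YYYY-MM-DD directory names that fall outside the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in dir_names:
        try:
            day = datetime.strptime(name, "%Y-%m-%d").date()
        except ValueError:
            continue  # not a date directory (e.g. an index file); leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```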
Requirement 6: Backend Log Query API
User Story: As the admin-web frontend, I want to query log data via an API, so that the page can display the request list and chain details.
Acceptance Criteria
- WHEN a GET request is made to /api/admin/dev-trace/dates, THE Trace_API SHALL return a list of dates that have log data available
- WHEN a GET request is made to /api/admin/dev-trace/requests with filter parameters (date, start_time, end_time, method, path_contains, status_code, min_duration, page, page_size), THE Trace_API SHALL return a paginated list of matching request summaries
- WHEN a GET request is made to /api/admin/dev-trace/request/{id}, THE Trace_API SHALL return the complete trace record including all spans for the specified request_id
- WHEN a POST request is made to /api/admin/dev-trace/cleanup with a date range, THE Trace_API SHALL delete log files within the specified date range and return the cleanup result
- THE Trace_API SHALL require admin role authentication for all endpoints
- IF a non-admin user attempts to access any Trace_API endpoint, THEN THE Trace_API SHALL return a 403 Forbidden response
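The filtering and pagination behind the requests endpoint could look like the following sketch, which works on already-parsed trace records. The response shape (`total`, `page`, `page_size`, `items`), the defaults, and the function name are assumptions; only the filter parameter names come from the criteria, and the date and time-range filters are omitted for brevity.

```python
from typing import Any, Dict, List, Optional


def query_requests(
    records: List[Dict[str, Any]],
    *,
    method: Optional[str] = None,
    path_contains: Optional[str] = None,
    status_code: Optional[int] = None,
    min_duration: Optional[float] = None,
    page: int = 1,
    page_size: int = 20,
) -> Dict[str, Any]:
    """Apply the optional filters, then return one page of matches."""
    def matches(rec: Dict[str, Any]) -> bool:
        return (
            (method is None or rec["method"] == method)
            and (path_contains is None or path_contains in rec["path"])
            and (status_code is None or rec["status_code"] == status_code)
            and (min_duration is None or rec["total_duration_ms"] >= min_duration)
        )

    matched = [r for r in records if matches(r)]
    start = (page - 1) * page_size
    return {
        "total": len(matched),
        "page": page,
        "page_size": page_size,
        "items": matched[start:start + page_size],
    }
```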
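Requirement 7: Log Settings Management API
User Story: As an administrator, I want to read and modify the log system's settings via an API, so that I can adjust logging behavior dynamically at runtime.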
Acceptance Criteria
- WHEN a GET request is made to /api/admin/dev-trace/settings, THE Trace_API SHALL return the current settings including enabled status, retention days, SQL logging flag, and parameter logging flag
- WHEN a PUT request is made to /api/admin/dev-trace/settings with updated values, THE Trace_API SHALL update the in-memory runtime configuration without requiring a server restart
- WHEN the server restarts, THE Trace_System SHALL reset all runtime settings to the values defined in the .env file
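Requirement 8: Switch Mechanism
User Story: As a developer, I want to enable or disable log collection via environment variables and a runtime switch, so that logging is off in production and flexibly controllable in development.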
Acceptance Criteria
- THE Trace_System SHALL read the master switch from the environment variable DEV_TRACE_ENABLED
- WHILE DEV_TRACE_ENABLED is set to false, THE TraceMiddleware SHALL skip all trace context creation, span recording, and log file writing
- WHEN the Runtime_Switch is toggled via the settings API, THE Trace_System SHALL immediately start or stop trace collection without server restart
- THE Trace_System SHALL read the log directory path from the environment variable DEV_TRACE_LOG_DIR
- WHERE the DEV_TRACE_LOG_SQL option is set to false, THE Trace_System SHALL omit full SQL statements from DB_QUERY spans
- WHERE the DEV_TRACE_LOG_PARAMS option is set to false, THE Trace_System SHALL omit function parameter values from SERVICE spans
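The switch handling might be sketched as below. The environment variable names come from the criteria; the default values, the accepted boolean spellings, the `extra["sql"]` location of the SQL text, and the helper names are assumptions made for illustration.

```python
import os
from dataclasses import dataclass, replace
from typing import Any, Dict, Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}  # assumed spellings of "enabled"


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in _TRUE


@dataclass(frozen=True)
class TraceSettings:
    enabled: bool
    log_dir: str
    log_sql: bool
    log_params: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> TraceSettings:
    """Build the startup settings from environment variables."""
    env = os.environ if env is None else env
    return TraceSettings(
        enabled=_flag(env, "DEV_TRACE_ENABLED", False),  # off unless explicitly enabled
        log_dir=env.get("DEV_TRACE_LOG_DIR", "logs/dev-trace"),  # hypothetical default
        log_sql=_flag(env, "DEV_TRACE_LOG_SQL", True),
        log_params=_flag(env, "DEV_TRACE_LOG_PARAMS", True),
    )


def update_runtime(settings: TraceSettings, **changes: Any) -> TraceSettings:
    """Return a new in-memory settings object; nothing is persisted across restarts."""
    return replace(settings, **changes)


def apply_options(span: Dict[str, Any], settings: TraceSettings) -> Dict[str, Any]:
    """Strip SQL text or parameter values from a span copy according to the options."""
    out = dict(span)
    if span.get("span_type") == "DB_QUERY" and not settings.log_sql:
        out["extra"] = {k: v for k, v in span.get("extra", {}).items() if k != "sql"}
    if span.get("span_type") == "SERVICE" and not settings.log_params:
        out["params"] = {name: "<omitted>" for name in span.get("params", {})}
    return out
```

Because `update_runtime` only replaces the in-memory object, a restart that calls `load_settings` again naturally resets everything to the values from the environment.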
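Requirement 9: admin-web Dev Test Log Page
User Story: As an administrator, I want to view the request list and the complete span chain tree in admin-web, so that I can quickly locate frontend-backend integration problems.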
Acceptance Criteria
- WHEN the admin navigates to the dev trace log page, THE Admin_Web_Trace_Page SHALL display a left-right split layout with request list on the left and span detail on the right
- WHEN the admin applies filters (date, time range, HTTP method, path keyword, status code, minimum duration), THE Admin_Web_Trace_Page SHALL query the Trace_API and display matching requests
- WHEN the admin selects a request from the list, THE Admin_Web_Trace_Page SHALL display the complete span chain as a hierarchical tree with indentation showing the call depth
- THE Admin_Web_Trace_Page SHALL display each request entry with timestamp, HTTP method, path, status code, total duration, and DB query count
- THE Admin_Web_Trace_Page SHALL display each span with span_type, function name, description (zh), duration, and relevant extra information (SQL for DB_QUERY spans)
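Requirement 10: admin-web Settings Panel
User Story: As an administrator, I want to manage the log system's settings in admin-web, so that I can control log collection behavior and the cleanup policy.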
Acceptance Criteria
- WHEN the admin opens the settings panel, THE Admin_Web_Trace_Page SHALL display the current log enabled status, retention days, auto-cleanup toggle, and disk usage statistics
- WHEN the admin toggles the log enabled switch, THE Admin_Web_Trace_Page SHALL call the settings API and reflect the updated state immediately
- WHEN the admin initiates a manual cleanup with a date range, THE Admin_Web_Trace_Page SHALL call the cleanup API and display the cleanup result (deleted file count and freed space)
- WHEN the admin modifies the retention days, THE Admin_Web_Trace_Page SHALL call the settings API and confirm the update
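Requirement 11: Implementation Scope and Route Coverage
User Story: As a developer, I want phase one to cover tracing for all xcx_* routes, so that I get complete debugging information during mini-program integration.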
Acceptance Criteria
- THE TraceMiddleware SHALL intercept all requests matching the xcx_* route prefix (login, tasks, notes, performance, AI conversation)
- WHEN a request matches a non-xcx route, THE TraceMiddleware SHALL skip trace collection for that request
- THE Trace_System SHALL not interfere with the existing ResponseWrapperMiddleware or logging configuration
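Requirement 12: Full-Chain Exception and Error Tracing
User Story: As a developer, I want any exception raised during request processing to be fully recorded in the trace, so that I can quickly determine which layer the error occurred in and why.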
Acceptance Criteria
- WHEN an HTTPException is raised during request processing, THE Trace_System SHALL record an ERROR span containing the exception type, status code, detail message, and the layer where it occurred
- WHEN an unhandled exception occurs, THE Trace_System SHALL record an ERROR span containing the exception type, message, and stack trace summary (first 5 lines)
- WHEN a database exception (psycopg2.Error) occurs, THE Trace_System SHALL record a DB_ERROR span containing the PostgreSQL error code, message, and the SQL statement that caused it
- WHEN authentication fails, THE AUTH span SHALL include a failure_reason field categorized as one of: AUTH_EXPIRED (token expired), AUTH_INVALID (signature error), AUTH_MALFORMED (missing fields), AUTH_LIMITED (limited token on full endpoint), AUTH_FORBIDDEN (insufficient role)
- WHEN an exception occurs, THE TraceMiddleware SHALL still record the HTTP_OUT span with the error status code and ensure the complete trace is written to the log file
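For the unhandled-exception case, an ERROR span with a five-line stack summary can be built with the standard `traceback` module. The span layout and the `error_span` name below are illustrative assumptions.

```python
import traceback
from typing import Any, Dict


def error_span(exc: BaseException, layer: str, max_lines: int = 5) -> Dict[str, Any]:
    """Build an ERROR span holding the first lines of the formatted traceback."""
    formatted = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return {
        "span_type": "ERROR",
        "extra": {
            "exception_type": type(exc).__name__,
            "message": str(exc),
            "layer": layer,  # e.g. "SERVICE" or "ROUTE"
            "stack_summary": formatted.splitlines()[:max_lines],
        },
    }
```

Truncating to the first lines keeps each log record small; whether the first or the last lines are more useful for diagnosis is a design choice the spec settles in favor of the first five.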
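Requirement 13: SSE Streaming Response Tracing
User Story: As a developer, I want the entire SSE streaming response of an AI conversation to be traced, so that I can see every step of the AI call chain (prompt construction, API call, token stream, completion or error).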
Acceptance Criteria
- WHEN an SSE streaming endpoint is called, THE Trace_System SHALL record an SSE_START span containing the endpoint, user info, and chat_id
- WHEN the AI API (DashScope) is called, THE Trace_System SHALL record an AI_CALL span containing the app_id, prompt length, and session_id
- WHILE SSE tokens are streaming, THE Trace_System SHALL record SSE_EVENT spans at regular intervals (every N tokens) containing the cumulative token count, to avoid span explosion
- WHEN the SSE stream completes, THE Trace_System SHALL record an SSE_END span containing total token count, total duration, and whether it completed normally
- WHEN an error occurs during SSE streaming, THE Trace_System SHALL record an AI_ERROR span containing the error type, message, and retry count
- THE SSE trace SHALL use trace_type="sse" to distinguish from regular HTTP traces
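The sampling rule for SSE_EVENT spans can be sketched as a generator wrapper. It assumes each yielded chunk counts as one token and uses a hypothetical default of N = 50; the span dictionaries are simplified.

```python
import time
from typing import Any, Dict, Iterable, Iterator, List


def traced_stream(tokens: Iterable[str], spans: List[Dict[str, Any]], every_n: int = 50) -> Iterator[str]:
    """Pass tokens through unchanged while recording sampled SSE spans."""
    count = 0
    completed = False
    start = time.perf_counter()
    try:
        for token in tokens:
            count += 1
            if count % every_n == 0:
                # One span per N tokens instead of one per token.
                spans.append({"span_type": "SSE_EVENT", "extra": {"token_count": count}})
            yield token
        completed = True
    finally:
        # Runs on normal completion, on errors, and when the client disconnects.
        spans.append({
            "span_type": "SSE_END",
            "duration_ms": (time.perf_counter() - start) * 1000,
            "extra": {"token_count": count, "completed": completed},
        })
```

The same counting pattern would apply to the WebSocket WS_MESSAGE spans, with a byte counter added alongside the message counter.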
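Requirement 14: WebSocket Connection Tracing
User Story: As a developer, I want the full lifecycle of a WebSocket connection to be traced, so that I can see the complete process of connection establishment, message pushing, and disconnection.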
Acceptance Criteria
- WHEN a WebSocket connection is established, THE Trace_System SHALL record a WS_CONNECT span containing the execution_id and client information
- WHILE WebSocket messages are being pushed, THE Trace_System SHALL record WS_MESSAGE spans at regular intervals (every N messages) containing the cumulative message count and byte count
- WHEN a WebSocket connection is closed, THE Trace_System SHALL record a WS_DISCONNECT span containing the disconnect reason, total message count, and total duration
- THE WebSocket trace SHALL use trace_type="ws" and a request_id with "ws_" prefix to distinguish from HTTP traces
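Requirement 15: Background Job Execution Tracing
User Story: As a developer, I want the execution of background scheduled jobs (task_generator, task_expiry, recall_detector, note_reclassifier) to be traced, so that I can troubleshoot problems in background jobs.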
Acceptance Criteria
- WHEN a background job starts execution, THE Trace_System SHALL create a new TraceContext with trace_type="job" and record a JOB_START span containing the job name and trigger time
- WHILE a job is executing, THE Trace_System SHALL record SERVICE and DB_QUERY spans associated with the job's TraceContext (via contextvars)
- WHEN a job completes successfully, THE Trace_System SHALL record a JOB_END span containing the duration and processed record count
- WHEN a job fails with an exception, THE Trace_System SHALL record a JOB_ERROR span containing the exception type, message, and stack trace summary
- THE job trace SHALL use a request_id with "job_" prefix and be written to the same log files as HTTP traces
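A job wrapper along these lines would produce the JOB_START, JOB_END, and JOB_ERROR spans. The signature shown here (a handler returning the processed record count and a `sink` callable that writes the finished trace) is an assumption, as is re-raising the exception so the scheduler still sees the failure.

```python
import time
import uuid
from typing import Any, Callable, Dict


def job_wrapper(
    job_name: str,
    handler: Callable[[], int],
    sink: Callable[[Dict[str, Any]], None],
) -> Callable[[], None]:
    """Wrap a job handler so every run produces one trace with a job_ request_id."""
    def run() -> None:
        trace: Dict[str, Any] = {
            "request_id": "job_" + uuid.uuid4().hex[:12],
            "trace_type": "job",
            "spans": [{"span_type": "JOB_START",
                       "extra": {"job_name": job_name, "trigger_time": time.time()}}],
        }
        start = time.perf_counter()
        try:
            processed = handler()
            trace["spans"].append({
                "span_type": "JOB_END",
                "duration_ms": (time.perf_counter() - start) * 1000,
                "extra": {"processed": processed},
            })
        except Exception as exc:
            trace["spans"].append({
                "span_type": "JOB_ERROR",
                "extra": {"exception_type": type(exc).__name__, "message": str(exc)},
            })
            raise  # assumed: the scheduler should still observe the failure
        finally:
            sink(trace)  # written whether the job succeeded or failed
    return run
```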
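Requirement 16: Database Connection Lifecycle Tracing
User Story: As a developer, I want to see when each database connection is acquired and released, so that I can detect connection leaks or connection-acquisition bottlenecks.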
Acceptance Criteria
- WHEN a database connection is acquired via get_connection(), THE Trace_System SHALL record a DB_CONN span containing the connection acquisition duration
- WHEN a database connection is released (closed), THE Trace_System SHALL record a DB_CONN_RELEASE span
- EACH DB_CONN span SHALL be paired with exactly one DB_CONN_RELEASE span within the same trace
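The pairing rule lends itself to a context manager, sketched below. It takes any zero-argument connection factory (standing in for get_connection()) and assumes the connection object exposes `close()`; `unpaired_connections` is a hypothetical helper for spotting leaks in a finished trace.

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List


@contextmanager
def traced_connection(get_connection: Callable[[], Any], spans: List[Dict[str, Any]]) -> Iterator[Any]:
    """Record DB_CONN on acquire and always record DB_CONN_RELEASE on exit."""
    start = time.perf_counter()
    conn = get_connection()
    spans.append({"span_type": "DB_CONN",
                  "duration_ms": (time.perf_counter() - start) * 1000})
    try:
        yield conn
    finally:
        conn.close()
        spans.append({"span_type": "DB_CONN_RELEASE"})


def unpaired_connections(spans: List[Dict[str, Any]]) -> int:
    """Number of acquisitions without a matching release (0 means no leak in this trace)."""
    acquired = sum(1 for s in spans if s["span_type"] == "DB_CONN")
    released = sum(1 for s in spans if s["span_type"] == "DB_CONN_RELEASE")
    return acquired - released
```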
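Requirement 17: Middleware Layer Tracing
User Story: As a developer, I want to see the execution time of the middleware chain, so that I can detect performance bottlenecks in the middleware layer.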
Acceptance Criteria
- THE Trace_System SHALL record a MIDDLEWARE span for ResponseWrapperMiddleware containing its execution duration
- IF the ResponseWrapperMiddleware fails to wrap a response (e.g., JSON parse error), THEN THE Trace_System SHALL record a MIDDLEWARE_ERROR span containing the error details
- THE MIDDLEWARE span SHALL include the response body size for monitoring abnormally large responses
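Requirement 18: Trace Coverage Scanning and Display
User Story: As a developer, I want to see, at the top of the admin-web log page, the trace system's current coverage of routes, Services, Jobs, and other modules, so that I can promptly spot functions not yet wired into tracing after adding new modules.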
Acceptance Criteria
- THE Trace_System SHALL provide a coverage scanner that inspects the backend codebase and reports: (a) xcx_* route coverage (which route files are in trace scope), (b) Service function coverage (which public functions in app/services/ have the @trace_service decorator), (c) Job handler coverage (which registered jobs are wrapped by job_wrapper), (d) SSE/WS endpoint coverage (which endpoints have trace wrappers)
- THE coverage scanner SHALL report for each category: total count, covered count, uncovered list with module and function names
- WHEN a GET request is made to /api/admin/dev-trace/coverage, THE Trace_API SHALL return the most recent scan result including scan_time and per-category totals and details
- WHEN a POST request is made to /api/admin/dev-trace/coverage/scan, THE Trace_API SHALL execute a fresh scan immediately and return the updated result
/api/admin/dev-trace/coverage/scan, THE Trace_API SHALL execute a fresh scan immediately and return the updated result - THE Trace_System SHALL execute an automatic coverage scan on server startup and periodically at a configurable interval (default 60 minutes)
- THE Admin_Web_Trace_Page SHALL display a coverage summary bar at the top of the DevTrace page showing per-category coverage percentages and a list of uncovered items
- THE Admin_Web_Trace_Page SHALL provide a manual "Scan" button that triggers an immediate coverage scan via the API
- THE coverage scan interval SHALL be configurable via the settings API and settings panel