Commit 20260223, made before frontend/backend integration testing (前后端联调)
114
docs/audit/prompt_logs/prompt_log_20260221_123335.md
Normal file
@@ -0,0 +1,114 @@
- [P20260221-123335] 2026-02-21 12:33:35 +0800
- summary: CONTEXT TRANSFER: We are continuing a conversation that had gotten too long. Here is a summary: --- ## TASK 1: Execute D…
- prompt:
```text
CONTEXT TRANSFER: We are continuing a conversation that had gotten too long. Here is a summary:

---

## TASK 1: Execute Data Flow Structure Analysis (数据流结构分析)

**STATUS**: in-progress

**USER QUERIES**: User's single request - execute the dataflow structure analysis in two phases: (1) data collection via `analyze_dataflow.py`, (2) report generation via `gen_dataflow_report.py`

**DETAILS**:
- User requested a full dataflow structure analysis for the feiqiu (飞球) connector
- Two-phase process: Phase 1 = data collection, Phase 2 = report generation
- The agent confirmed `scripts/ops/analyze_dataflow.py` exists but was cut off before executing anything
- No commands have been run yet - the task is at the very beginning
- The user specified that if historical task artifacts exist, they should be cleared and the task re-executed

**NEXT STEPS**:
1. Check the output directory status (likely `SYSTEM_ANALYZE_ROOT` from `.env`) for any existing artifacts
2. Run `python scripts/ops/analyze_dataflow.py` from the project root to collect data
3. Verify collection results are on disk: `json_trees/`, `db_schemas/`, `field_mappings/`, `bd_descriptions/`, `collection_manifest.json`
4. Run `python scripts/ops/gen_dataflow_report.py` to generate the Markdown report
5. Verify the report contains all required enhanced content (API date range, JSON field counts, field diff report with whitelist folding, business descriptions, anchor links, etc.)
6. Output the file path and a summary of key statistics

**KEY CONTEXT**:
- Project is a billiard-hall (台球门店) data platform monorepo called NeoZQYY
- ETL pipeline: API → ODS → DWD → DWS with PostgreSQL
- Four DB instances: `etl_feiqiu`, `test_etl_feiqiu`, `zqyy_app`, `test_zqyy_app`
- Environment variables control all output paths (see `export-paths.md` steering)
- Output paths come from `.env` - key vars: `SYSTEM_ANALYZE_ROOT`, `FULL_DATAFLOW_DOC_ROOT`
- Scripts must be run with `uv run python` or `python` from the project root `C:\NeoZQYY`
- OS is Windows with cmd shell
- Whitelist rules (v4): ETL meta cols, SCD2 cols, siteProfile nested fields - still checked but folded in the report
- Only the feiqiu connector is currently being analyzed

**FILEPATHS**:
- `scripts/ops/analyze_dataflow.py` - Phase 1: data collection script
- `scripts/ops/gen_dataflow_report.py` - Phase 2: report generation script (partially loaded, truncated at ~806/889 lines)
- `scripts/ops/field_level_report.py` - Related field-level analysis script
- `scripts/ops/etl_consistency_check.py` - Related consistency check script (partially loaded, truncated at ~811/1011 lines)
- `.env` - Environment variables (not read yet, needed for paths)
- `.env.template` - Template for env vars
- `apps/etl/connectors/feiqiu/docs/architecture/data_flow.md` - Architecture documentation
- `export/SYSTEM/REPORTS/full_dataflow_doc/dataflow_api_ods_dwd.md` - Previous report output (4838 lines, only 408 loaded)

**USER CORRECTIONS AND INSTRUCTIONS**:
- All responses must be in Simplified Chinese (简体中文) per `language-zh.md` steering
- Must use `.env` for all output paths - never hardcode them (per `export-paths.md`)
- Testing/scripts must load `.env` properly (per `testing-env.md`)
- Prefer Python scripts over PowerShell for complex operations (per `tech.md`)
- `cwd` for ETL scripts should be `apps/etl/connectors/feiqiu/`, but these ops scripts run from the project root
- DB connections use `PG_DSN` from `.env`
- This is NOT a spec creation task - it is a direct execution task, despite the system prompt mentioning the spec workflow

**Files to read**:
- `scripts/ops/analyze_dataflow.py`
- `scripts/ops/gen_dataflow_report.py`
- `.env.template`

USER QUERIES (most recent first):
1. <source-event>
The user manually invoked this action
The user is focused on the following file: No file focused
The user has the following paths open:
</source-event>

Execute the dataflow structure analysis by following the steps below. If the task turns out to have been completed already, or traces of a historical run exist, clear them and re-execute:

Phase 1: data collection
1. Run `python scripts/ops/analyze_dataflow.py` to complete data collection (add the --date-from / --date-to options if a date range needs to be specified)
2. Confirm the collection results have been written to disk, including:
- json_trees/ (with multiple sample values under samples)
- db_schemas/
- field_mappings/ (three-layer mappings + anchors)
- bd_descriptions/ (BD_manual business descriptions)
- collection_manifest.json (with json_field_count, date_from, date_to)

Phase 2: report generation
3. Run `python scripts/ops/gen_dataflow_report.py` to generate the Markdown report
4. The report must include the following enhanced content:
- Report header with the API request date range (date_from ~ date_to) and the total JSON data volume
- Overview table with an API JSON field-count column
- 1.1 API↔ODS↔DWD field comparison diff report (whitelist fields folded into a summary, not expanded into detailed table rows)
- 2.3 Coverage table with a business-description column
- API source field table with a business-description column + multiple sample values (enum value explanations)
- ODS table structures with a business-description column + bidirectional upstream/downstream mapping anchor links
- DWD table structures with a business-description column + ODS source anchor links
5. Output the file path and a summary of key statistics

Whitelist rules (v4):
- ETL metadata columns (source_file, source_endpoint, fetched_at, payload, content_hash)
- DWD dimension-table SCD2 management columns (valid_from, valid_to, is_current, etl_loaded_at, etl_batch_id)
- API siteProfile nested object fields
- Whitelisted fields still participate in checks and statistics as normal; they are only displayed folded in the report, with the reason noted

Note: currently only the feiqiu (飞球) connector is analyzed. When new connectors are added in the future, they should be discovered automatically and included in the scope of analysis.
2. <implicit-rules>## Implicit Rules
Focus on creating a new spec file or identifying an existing spec to update.
If starting a new spec, create a requirements.md file in the .kiro/specs directory with clear user stories and acceptance criteria.
If working with an existing spec, review the current requirements and suggest improvements if needed.
Do not make direct code changes yet. First establish or review the spec file that will guide our implementation.</implicit-rules>

---

METADATA:
The previous conversation had 2 messages.

INSTRUCTIONS:
Continue working until the user query has been fully addressed. Do not ask for clarification - proceed with the work based on the context provided.
IMPORTANT: you need to read the files listed in the "Files to read" section
```
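The two-phase run described in the log can be sketched as below. The script paths, the `SYSTEM_ANALYZE_ROOT` variable, and the clear-then-rerun requirement come from the log itself; the minimal `.env` parser and the directory-clearing policy are illustrative assumptions, not the project's actual code.

```python
import os
import shutil
import subprocess
from pathlib import Path

def load_env(env_path):
    """Parse simple KEY=VALUE lines from a .env file, ignoring blanks and comments."""
    env = {}
    for raw in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def run_dataflow_analysis(project_root):
    """Clear historical artifacts, then run collection and report generation in order."""
    env = {**os.environ, **load_env(os.path.join(project_root, ".env"))}
    out_root = Path(env["SYSTEM_ANALYZE_ROOT"])
    if out_root.exists():
        shutil.rmtree(out_root)  # per the log: clear historical task artifacts first
    for script in ("scripts/ops/analyze_dataflow.py",
                   "scripts/ops/gen_dataflow_report.py"):
        subprocess.run(["python", script], cwd=project_root, env=env, check=True)
```

A real run would also honor `--date-from` / `--date-to` and the `uv run python` launcher mentioned in the log; both are omitted here for brevity.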
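The v4 whitelist-folding rule in the prompt (whitelisted columns still checked and counted, but collapsed in the report rather than expanded row by row) can be illustrated with a small sketch. The column names are taken from the log; the function shape and the report's data structures are assumed, and siteProfile nested-field matching (which would need JSON-path prefix checks) is omitted.

```python
# Column sets named in the log's whitelist rules (v4).
ETL_META_COLS = {"source_file", "source_endpoint", "fetched_at", "payload", "content_hash"}
SCD2_COLS = {"valid_from", "valid_to", "is_current", "etl_loaded_at", "etl_batch_id"}

def fold_diff_rows(diff_cols):
    """Split diff columns into rows shown in full and folded whitelist counts.

    Whitelisted columns are still counted (they participate in checks and
    statistics), but the report only shows an aggregate per category.
    """
    shown = []
    folded = {"etl_meta": 0, "scd2": 0}
    for col in diff_cols:
        if col in ETL_META_COLS:
            folded["etl_meta"] += 1   # counted, not expanded in the report
        elif col in SCD2_COLS:
            folded["scd2"] += 1
        else:
            shown.append(col)         # non-whitelisted diffs get full rows
    return shown, folded
```

For example, a diff of `["payload", "valid_to", "member_id", "fetched_at"]` would render one detailed row (`member_id`) plus two folded summary lines with their reasons noted.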