Commit all pending changes before preparing the environment.

This commit is contained in:
Neo
2026-02-19 08:35:13 +08:00
parent ded6dfb9d8
commit 4eac07da47
1387 changed files with 6107191 additions and 33002 deletions


@@ -0,0 +1,137 @@
- [P20260217-013236] 2026-02-17 01:32:36 +0800
- summary: CONTEXT TRANSFER: We are continuing a conversation that had gotten too long. Here is a summary: --- ## TASK 1: Enhance d…
- prompt:
```text
CONTEXT TRANSFER: We are continuing a conversation that had gotten too long. Here is a summary:
---
## TASK 1: Enhance dataflow analysis scripts with field mapping columns and anchor links
**STATUS**: done
**USER QUERIES**: 1 (from previous conversation — "While keeping the new document's data comprehensive, add the upstream/downstream field-mapping columns from the old document's tables plus direct click-to-jump links for each field; update the script and HOOKS")
**DETAILS**: Completed in previous conversation. Added `dwd_tables`, `parse_table_map()`, `parse_fact_mappings()`, `build_field_mappings()` to `dataflow_analyzer.py`. Updated hook to v2.0. All 45 tests passed.
**FILEPATHS**: `scripts/ops/dataflow_analyzer.py`, `.kiro/hooks/dataflow-analyze.kiro.hook`, `tests/test_dataflow_analyzer.py`
---
## TASK 2: Enhance report with business descriptions, sample values, JSON field counts, and diff report
**STATUS**: done
**USER QUERIES**: 2-4 (continuing from previous conversation)
**DETAILS**:
- Enhanced `dataflow_analyzer.py`:
- Added `samples: list[str]` field to `FieldInfo` dataclass with `MAX_SAMPLES = 8`
- Updated `_recurse_json()` and `flatten_json_tree()` to collect multiple distinct sample values per field
- Added `json_field_count` to manifest per-table data
- Added `parse_bd_manual_fields()` — parses BD_manual Markdown docs, extracting `{field_name -> description}` from the "## 字段说明" (field descriptions) tables
- Added `load_bd_descriptions()` — loads ODS + all DWD BD_manual descriptions for a table
- Added `dump_bd_descriptions()` — outputs `bd_descriptions/{table}.json` with ODS and DWD field descriptions
- Updated `dump_collection_results()` to call `dump_bd_descriptions()` and return `bd_descriptions` path
- Rewrote `gen_dataflow_report.py` to v3:
  - Overview table gains an "API JSON 字段数" (API JSON field count) column
  - Section 1.1: API↔ODS↔DWD field diff report with counts and reasons
  - Section 2.3: coverage table gains a "业务描述" (business description) column
  - API/ODS/DWD per-table sections all gain a "业务描述" column sourced from the BD_manual docs
  - API section merges the 说明 (description) and 示例值 (sample value) columns, with enum detection (`_is_enum_like()`) and multi-sample display (`_format_samples()`)
- Updated hook to v3.0.0 — two-phase: `analyze_dataflow.py` → `gen_dataflow_report.py`
- Updated all tests: fixed `FieldInfo` constructors to include `samples`, added 29 new tests
- All 74 tests pass
**FILEPATHS**: `scripts/ops/dataflow_analyzer.py`, `scripts/ops/gen_dataflow_report.py`, `.kiro/hooks/dataflow-analyze.kiro.hook`, `tests/test_dataflow_analyzer.py`
---
## TASK 3: Enhance 1.1 field diff report with clickable field names, detail sub-tables, and bold formatting for unmapped rows
**STATUS**: in-progress
**USER QUERIES**: 5 ("In the 1.1 API↔ODS↔DWD field-diff table, change the diff statistics to diff counts plus the diff field names; clicking a field jumps to that field's position in the per-table details. Nested objects: collapsed by default, expandable. If that interaction is unsupported, record them in separate sub-tables instead, with links jumping to the sub-table for details. In those sub-tables, bold the entire row for any field with no upstream/downstream counterpart.")
**DETAILS**:
User wants to enhance the 1.1 field diff summary table:
1. Change from just counts to "count + clickable field names" that jump to per-table detail sections
2. Nested objects: use `<details>` for collapse/expand, or if not supported, use separate sub-tables
3. Below the summary table, add per-table sub-tables listing specific diff fields
4. In sub-tables, rows where fields have no upstream/downstream mapping should be **bold** (entire row)
The current `_write_field_diff_report()` function in `gen_dataflow_report.py` (line 228) only outputs a summary table with counts and reason strings. It needs to be completely rewritten.
I had read the current implementation and the generated report output to understand the current state, but NO code changes were made yet for this task.
**NEXT STEPS**:
- Rewrite `_write_field_diff_report()` in `scripts/ops/gen_dataflow_report.py`:
- Summary table: each cell shows count + anchor link to per-table sub-table (e.g., `[25](#diff-settlement-records)`)
- After summary table, generate per-table sub-tables with anchors like `<a id="diff-{table}"></a>`
- Each sub-table has 4 sections: API→ODS unmapped fields, ODS no JSON source, ODS→DWD unmapped, DWD no ODS source
- Field names in sub-tables should be clickable links to the corresponding field in section 3 detail tables (using existing anchors like `#api-{table}`, `#ods-{table}`, `#dwd-{dwd_name}`)
- Nested object fields (containing ".") should be grouped separately — use `<details><summary>` for collapse if Markdown renderer supports it, otherwise separate sub-section
- Rows where a field has no upstream/downstream mapping: wrap entire row content in `**bold**`
- Add tests for the new sub-table generation in `tests/test_dataflow_analyzer.py`
- Run tests to verify
**FILEPATHS**: `scripts/ops/gen_dataflow_report.py`, `tests/test_dataflow_analyzer.py`
---
## USER CORRECTIONS AND INSTRUCTIONS:
- All prose/comments/docs in Simplified Chinese (简体中文); code identifiers stay English
- Scripts go in `scripts/ops/`; tests in root `tests/`
- Reference `apps/etl/pipelines/feiqiu/docs/database/` docs for business field descriptions
- The report is generated by a 2-phase process: Python script collects data → `gen_dataflow_report.py` assembles Markdown report
- Markdown `<details>` tags can be used for collapse/expand, but the user said "若不支持则用分表" (if unsupported, use sub-tables) — prefer `<details>` but provide a fallback
- ETL meta columns to exclude from diff: `source_file`, `source_endpoint`, `fetched_at`, `payload`, `content_hash`
- This is NOT a high-risk path change (scripts/ops + tests), no `/audit` needed
## Key Architecture Context:
- `dataflow_analyzer.py` — core collection module (API fetch, JSON flatten, DB schema query, field mapping build, BD_manual parsing)
- `analyze_dataflow.py` — CLI entry point
- `gen_dataflow_report.py` — report generator that reads collected JSON data and outputs Markdown
- `dataflow-analyze.kiro.hook` — userTriggered hook, runs analyze then gen_report
- Output goes to `SYSTEM_ANALYZE_ROOT` env var (or `export/dataflow_analysis/`)
- Anchors in report: `api-{table-name}`, `ods-{table-name}`, `dwd-{dwd-short-name}` (underscores replaced with hyphens)
- field_mappings JSON structure: `anchors`, `json_to_ods[]`, `ods_to_dwd{}`, `dwd_to_ods{}`
## Files to read:
- `scripts/ops/gen_dataflow_report.py` — the file that needs changes (specifically `_write_field_diff_report` function)
- `export/dataflow_analysis/dataflow_2026-02-17_011645.md` — current report output showing the 1.1 diff table format (lines 1-80)
- `tests/test_dataflow_analyzer.py` — test file that needs new tests added
- `export/dataflow_analysis/field_mappings/assistant_accounts_master.json` — sample field_mappings JSON to understand data structure
- `scripts/ops/dataflow_analyzer.py` — for understanding the data structures (FieldInfo, ColumnInfo, etc.)
USER QUERIES (most recent first):
1. In the 1.1 API↔ODS↔DWD field-diff table, change the diff statistics to diff counts plus the diff field names; clicking a field jumps to that field's position in the per-table details. Nested objects: collapsed by default, expandable. If that interaction is unsupported, record them in separate sub-tables instead, with links jumping to the sub-table for details. In those sub-tables, bold the entire row for any field with no upstream/downstream counterpart.
2. Output:
Command executed successfully with no output.
Exit Code: 0
3. Output:
Command executed successfully with no output.
Exit Code: 0
4. In the 1.1 API↔ODS↔DWD field-diff table, change the diff statistics to diff counts plus the diff field names; clicking a field jumps to that field's position in the per-table details. Nested objects: collapsed by default, expandable. If that interaction is unsupported, record them in separate sub-tables instead, with links jumping to the sub-table for details. In those sub-tables, bold the entire row for any field with no upstream/downstream counterpart.
5. Output:
Command executed successfully with no output.
Exit Code: 0
6. Output:
Command executed successfully with no output.
Exit Code: 0
7. Output:
Command executed successfully with no output.
Exit Code: 0
---
METADATA:
The previous conversation had 4 messages.
INSTRUCTIONS:
Continue working until the user query has been fully addressed. Do not ask for clarification - proceed with the work based on the context provided.
IMPORTANT: you need to read the files listed in the "Files to read" section
```
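The anchor scheme described in the context above (underscores replaced with hyphens, a per-layer prefix, summary cells linking to per-table sub-tables) could be sketched roughly as follows. `to_anchor` and `diff_cell` are hypothetical helper names for illustration, not the actual functions in `gen_dataflow_report.py`:

```python
def to_anchor(prefix: str, name: str) -> str:
    """Build a report anchor like 'dwd-settlement-records' (underscores -> hyphens)."""
    return f"{prefix}-{name.replace('_', '-')}"


def diff_cell(count: int, table: str) -> str:
    """Render a summary-table cell: a diff count that links to the per-table sub-table."""
    if count == 0:
        return "0"  # nothing to jump to
    return f"[{count}](#{to_anchor('diff', table)})"
```

With this sketch, `diff_cell(25, "settlement_records")` yields the `[25](#diff-settlement-records)` form shown in the NEXT STEPS.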
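The sub-table rendering rules from the NEXT STEPS (bold the whole row for unmapped fields, group nested "." fields under a collapsed `<details>` block) could be sketched like this. `render_diff_rows` is a simplified stand-in, not the real `_write_field_diff_report()`:

```python
def render_diff_rows(fields: list[tuple[str, str, bool]]) -> list[str]:
    """Render Markdown table rows for one diff sub-table.

    Each tuple is (field_name, reason, mapped). Rows whose field has no
    upstream/downstream counterpart (mapped=False) are fully bolded; nested
    object fields (names containing '.') collapse into a <details> block.
    """
    flat, nested = [], []
    for name, reason, mapped in fields:
        cell_name, cell_reason = name, reason
        if not mapped:  # no counterpart: bold the entire row content
            cell_name, cell_reason = f"**{name}**", f"**{reason}**"
        row = f"| {cell_name} | {cell_reason} |"
        (nested if "." in name else flat).append(row)
    lines = ["| field | reason |", "| --- | --- |", *flat]
    if nested:
        lines += ["", "<details><summary>nested object fields</summary>", "",
                  "| field | reason |", "| --- | --- |", *nested, "", "</details>"]
    return lines
```

Keeping the `<details>` block after the flat table keeps the fallback simple: renderers that ignore the tag still show the nested rows as a plain second table.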
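The BD_manual parsing step (`parse_bd_manual_fields()` extracting `{field_name -> description}` from the "## 字段说明" tables) might look roughly like the following simplified sketch; column order and header wording are assumptions:

```python
def parse_field_table(markdown: str) -> dict[str, str]:
    """Extract {field_name: description} from the '## 字段说明' section.

    Assumes the first two table columns are field name and description;
    header and separator rows are skipped. A simplified stand-in for
    parse_bd_manual_fields() in dataflow_analyzer.py.
    """
    out: dict[str, str] = {}
    in_section = False
    for line in markdown.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == "## 字段说明"
            continue
        if not (in_section and line.lstrip().startswith("|")):
            continue
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2 or not cells[0]:
            continue
        if set(cells[0]) <= {"-", ":", " "}:
            continue  # separator row like | --- | --- |
        if cells[0] in ("字段", "字段名") or cells[0].lower() == "field":
            continue  # header row
        out[cells[0]] = cells[1]
    return out
```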
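The multi-sample collection (`samples` capped at `MAX_SAMPLES = 8`, distinct values only) and the enum heuristic could be sketched as below. `collect_samples` and `is_enum_like` are illustrative guesses at the behavior, not the actual `_recurse_json()` / `_is_enum_like()` code:

```python
MAX_SAMPLES = 8  # mirrors the cap described for FieldInfo.samples


def collect_samples(samples: list[str], value: object) -> None:
    """Append a distinct stringified sample value, keeping at most MAX_SAMPLES."""
    text = "" if value is None else str(value)
    if text not in samples and len(samples) < MAX_SAMPLES:
        samples.append(text)


def is_enum_like(samples: list[str]) -> bool:
    """Heuristic: few distinct, short, non-numeric values suggest an enum field."""
    if not samples or len(samples) > 5:
        return False
    return all(len(s) <= 16 and not s.replace(".", "", 1).isdigit() for s in samples)
```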