Initial commit: complete code for the 飞球 ETL system

2026-02-13 08:05:34 +08:00
commit 3c51f5485d
441 changed files with 117631 additions and 0 deletions
--- a/.kiro/specs/repo-audit/tasks.md
+++ b/.kiro/specs/repo-audit/tasks.md
@@ -0,0 +1,118 @@
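+# Implementation Plan: Read-Only Repository Governance Audit
+
+## Overview
+
+This plan breaks the audit scripts described in the design document into incremental coding tasks. Each task builds on the previous one, and together they produce a runnable audit toolset. All scripts live in the `scripts/audit/` directory, and reports are written to `docs/audit/`.
+
+## Tasks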
+
+- [x] 1. Set up the audit script skeleton and data models
+  - [x] 1.1 Create `scripts/audit/__init__.py` and the data model definitions
+    - Define the `FileEntry` dataclass (`rel_path`, `is_dir`, `size_bytes`, `extension`, `is_empty_dir`)
+    - Define the `Category` and `Disposition` enums
+    - Define the `InventoryItem` dataclass
+    - Define the `FlowNode` dataclass
+    - Define the `DocMapping` and `AlignmentIssue` dataclasses
+    - _Requirements: 1.2, 1.3, 1.4, 2.7, 3.2, 3.3_
+
+  - [x] 1.2 Write the classify completeness property test
+    - **Property 1: classify completeness**
+    - **Validates: Requirements 1.2, 1.3**
+
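+- [x] 2. Implement the repository scanner
+  - [x] 2.1 Create `scripts/audit/scanner.py`
+    - Implement the `EXCLUDED_PATTERNS` constant and the exclusion matching logic
+    - Implement the `scan_repo(root, exclude)` function: walk the file system recursively and return a `list[FileEntry]`
+    - Handle empty-directory detection (`is_empty_dir`)
+    - Handle file read permission errors (skip and log)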
+    - _Requirements: 1.1, 5.1, 5.3_
+
+  - [x] 2.2 Write the scanner exclusion rule property test
+    - **Property 7: Scanner exclusion rules**
+    - **Validates: Requirements 1.1**
+
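+- [x] 3. Implement the file inventory analyzer
+  - [x] 3.1 Create `scripts/audit/inventory_analyzer.py`
+    - Implement the `classify(entry: FileEntry) -> InventoryItem` function, including the full classification rule table
+    - Implement the `build_inventory(entries) -> list[InventoryItem]` batch classification function
+    - Implement the `render_inventory_report(items, repo_root) -> str` Markdown rendering function
+    - Include summary statistics generation (counts per category and per tag)
+    - Note: Requirement 1.8 covers only the `logs/` and `export/` directories (not `reports/`)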
+    - _Requirements: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 1.10, 4.2, 4.5_
+
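+  - [x] 3.2 Write the classify classification rule property tests
+    - **Property 3: Empty directories are marked as deletion candidates**
+    - **Property 4: .lnk/.rar files are marked as deletion candidates**
+    - **Property 5: Disposition range for files under tmp/**
+    - **Property 6: Runtime output directories are marked as archive candidates** (`logs/` and `export/` only)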
+    - **Validates: Requirements 1.5, 1.6, 1.7, 1.8**
+
+  - [x] 3.3 Write the inventory rendering property tests
+    - **Property 2: Inventory rendering completeness**
+    - **Property 8: Inventory is grouped by category**
+    - **Validates: Requirements 1.4, 1.10**
+
+- [x] 4. Checkpoint - ensure the file inventory module tests pass
+  - Make sure all tests pass; if anything is unclear, confirm with the user.
+
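+- [x] 5. Implement the flow tree analyzer
+  - [x] 5.1 Create `scripts/audit/flow_analyzer.py`
+    - Implement the `parse_imports(filepath)` function: use the `ast` module to parse the import statements of a Python file
+    - Implement the `build_flow_tree(repo_root, entry_file)` function: trace the import chain recursively from the entry point
+    - Implement the `find_orphan_modules(repo_root, all_entries, reachable)` function
+    - Implement the `render_flow_report(trees, orphans, repo_root)` function: generate a Mermaid diagram and indented text
+    - Include entry-point detection logic (CLI, GUI, batch jobs, operations scripts)
+    - Include logic for distinguishing task types and loader types
+    - Include summary statistics generation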
+    - _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 4.6_
+
+  - [x] 5.2 Write the flow tree property tests
+    - **Property 9: Validity of flow tree node source_file**
+    - **Property 10: Correctness of orphan module detection**
+    - **Validates: Requirements 2.7, 2.8**
+
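+- [x] 6. Implement the documentation alignment analyzer
+  - [x] 6.1 Create `scripts/audit/doc_alignment_analyzer.py`
+    - Implement the `scan_docs(repo_root)` function: scan all documentation sources
+    - Implement the `extract_code_references(doc_path)` function: extract code references from a document
+    - Implement the `check_reference_validity(ref, repo_root)` function
+    - Implement the `find_undocumented_modules(repo_root, documented)` function
+    - Implement the `check_ddl_vs_dictionary(repo_root)` function: compare the DDL against the data dictionary
+    - Implement the `check_api_samples_vs_parsers(repo_root)` function: compare the API samples against the parsers
+    - Implement the `render_alignment_report(mappings, issues, repo_root)` function
+    - Include summary statistics generation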
+    - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 4.7_
+
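+  - [x] 6.2 Write the documentation alignment property tests
+    - **Property 11: Stale reference detection**
+    - **Property 12: Missing documentation detection**
+    - **Property 16: Completeness of the documentation alignment report sections**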
+    - **Validates: Requirements 3.3, 3.5, 3.8**
+
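+- [x] 7. Checkpoint - ensure the flow tree and documentation alignment module tests pass
+  - Make sure all tests pass; if anything is unclear, confirm with the user.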
+
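+- [x] 8. Implement the audit entry point and report output
+  - [x] 8.1 Create `scripts/audit/run_audit.py`
+    - Implement the `run_audit(repo_root)` main function: call the scanner and the three analyzers in sequence
+    - Implement the check-and-create logic for the `docs/audit/` directory
+    - Implement injection of the report header metadata (timestamp, repository path)
+    - Implement writing the three reports to files
+    - Add the `if __name__ == "__main__"` entry point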
+    - _Requirements: 4.1, 4.2, 4.3, 4.4, 5.2, 5.4_
+
+  - [x] 8.2 Write the report output property tests
+    - **Property 13: Summary statistics consistency**
+    - **Property 14: Report header metadata**
+    - **Property 15: Write operations are confined to docs/audit/**
+    - **Validates: Requirements 4.2, 4.5, 4.6, 4.7, 5.2**
+
+- [x] 9. Final checkpoint - ensure all tests pass
+  - Make sure all tests pass; if anything is unclear, confirm with the user.
+
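+## Notes
+
+- Subtasks marked with `*` are optional and can be skipped to speed up MVP delivery
+- Each task references specific requirement numbers for traceability
+- Property tests use the `hypothesis` library, with at least 100 iterations per test
+- Unit tests verify specific examples and edge cases; property tests verify general correctness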
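
For readers following tasks 1.1 and 2.1, here is a minimal sketch of what the `FileEntry` dataclass and `scan_repo(root, exclude)` might look like. It is not the committed implementation. The field names and the function signature come from the plan above. The contents of `EXCLUDED_PATTERNS`, the `_is_excluded` helper, matching on base names with `fnmatch`, and silently skipping unreadable files (the plan asks for skip and log) are all assumptions made for illustration.

```python
import fnmatch
import os
from dataclasses import dataclass
from pathlib import Path

# Assumed patterns; the actual EXCLUDED_PATTERNS list is not shown in the plan.
EXCLUDED_PATTERNS = (".git", "__pycache__", "*.pyc")


@dataclass(frozen=True)
class FileEntry:
    rel_path: str       # POSIX-style path relative to the repo root
    is_dir: bool
    size_bytes: int     # 0 for directories
    extension: str      # lower-cased suffix such as ".py"; "" for directories
    is_empty_dir: bool  # True only for a directory with nothing in it on disk


def _is_excluded(name: str, exclude) -> bool:
    """Hypothetical helper: match a base name against glob-style patterns."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in exclude)


def scan_repo(root, exclude=EXCLUDED_PATTERNS) -> list[FileEntry]:
    """Walk `root` recursively and return one FileEntry per kept path."""
    root = Path(root)
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Record emptiness before pruning, so "empty" means empty on disk.
        is_empty = not dirnames and not filenames
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if not _is_excluded(d, exclude))
        current = Path(dirpath)
        if current != root:
            entries.append(FileEntry(
                rel_path=current.relative_to(root).as_posix(),
                is_dir=True,
                size_bytes=0,
                extension="",
                is_empty_dir=is_empty,
            ))
        for name in sorted(filenames):
            if _is_excluded(name, exclude):
                continue
            path = current / name
            try:
                size = path.stat().st_size
            except OSError:
                # Unreadable file: skipped here; a real scanner would also log it.
                continue
            entries.append(FileEntry(
                rel_path=path.relative_to(root).as_posix(),
                is_dir=False,
                size_bytes=size,
                extension=path.suffix.lower(),
                is_empty_dir=False,
            ))
    return entries
```

Pruning `dirnames` in place keeps the walk from descending into excluded trees at all, which is one way to satisfy Property 7 by construction, since no path under an excluded directory is ever visited. Calling `scan_repo(".")` from the repository root returns the flat entry list that `build_inventory` in task 3.1 would consume.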