Commit 2026-02-23, before frontend/backend integration testing
This commit is contained in:

.env (83 changed lines)
@@ -14,28 +14,95 @@ DB_USER=local-Python
DB_PASSWORD=Neo-local-1991125

# ------------------------------------------------------------------------------
# Database name (dev/test uses the test databases; switch to etl_feiqiu / zqyy_app in production)
# Database names
# CHANGE 2026-02-15 | Default to the test database to avoid touching production data during development
# CHANGE 2026-02-19 | Removed PG_NAME (not referenced by code; only reserved for the split-style config in .env.template)
#
# Database inventory:
#   etl_feiqiu      — ETL pipeline (feiqiu connector), production
#   test_etl_feiqiu — ETL pipeline (feiqiu connector), dev/test
#   zqyy_app        — mini-program business database, production
#   test_zqyy_app   — mini-program business database, dev/test
# ------------------------------------------------------------------------------
PG_NAME=test_etl_feiqiu
APP_DB_NAME=test_zqyy_app
ETL_DB_NAME=test_etl_feiqiu

# ------------------------------------------------------------------------------
# Composed DSNs (for analyze_dataflow.py / the backend and other places that need a full connection string)
# Composed DSNs (used whenever a subsystem / script needs a full connection string)
# Format: postgresql://user:password@host:port/dbname
# CHANGE 2026-02-16 | Added; read directly by cross-module scripts such as dataflow_analyzer
# CHANGE 2026-02-19 | Multi-DB DSNs: PG_DSN (ETL database, backward compatible) + APP_DB_DSN (business database)
# ------------------------------------------------------------------------------
PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/test_etl_feiqiu
APP_DB_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/test_zqyy_app

# ------------------------------------------------------------------------------
# Output directory for data-flow structure analysis (used by analyze_dataflow.py)
# Falls back to docs/reports/ when unset
# ------------------------------------------------------------------------------
SYSTEM_ANALYZE_ROOT=C:/NeoZQYY/export/dataflow_analysis
# CHANGE 2026-02-21 | Explicit test-database DSNs; ops scripts / integration tests use these first
TEST_DB_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/test_etl_feiqiu
TEST_APP_DB_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/test_zqyy_app
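These composed DSNs are meant to be read directly by scripts. A minimal standard-library sketch of consuming them; the fallback order is an assumption based on the CHANGE 2026-02-21 note, and both helper names are hypothetical:

```python
import os
from urllib.parse import urlsplit

def resolve_etl_dsn(env=os.environ):
    # Prefer the explicit test DSN, then fall back to PG_DSN (assumed order).
    dsn = env.get("TEST_DB_DSN") or env.get("PG_DSN")
    if not dsn:
        raise RuntimeError("neither TEST_DB_DSN nor PG_DSN is set")
    return dsn

def dbname_of(dsn):
    # postgresql://user:password@host:port/dbname -> dbname
    return urlsplit(dsn).path.lstrip("/")
```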

# ------------------------------------------------------------------------------
# General
# ------------------------------------------------------------------------------
TIMEZONE=Asia/Shanghai
LOG_LEVEL=INFO

# ==============================================================================
# Unified output path configuration (export/ directory)
# ==============================================================================
# CHANGE 2026-02-19 | Unified layout for the export directory; all output paths managed centrally
#
# Directory overview:
# export/
# ├── ETL-Connectors/feiqiu/
# │   ├── JSON/    — raw API JSON exports (ODS fetch dumps)
# │   ├── LOGS/    — ETL run logs (one .log per run)
# │   └── REPORTS/ — ETL quality/integrity reports (JSON format)
# ├── SYSTEM/
# │   ├── LOGS/    — system-level ops logs
# │   ├── REPORTS/
# │   │   ├── dataflow_analysis/ — data-flow structure analysis reports (Markdown)
# │   │   ├── field_audit/       — field audit reports
# │   │   └── full_dataflow_doc/ — full-chain data-flow documentation
# │   └── CACHE/
# │       └── api_samples/       — API sample cache (used by gen_full_dataflow_doc)
# └── BACKEND/
#     └── LOGS/ — structured backend logs (reserved)

# ------------------------------------------------------------------------------
# ETL connector (feiqiu) output paths
# ------------------------------------------------------------------------------
# JSON export root (ODS fetch dumps; subdirectories created automatically per TASK_CODE/run_id)
EXPORT_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/JSON
# ETL run log root
LOG_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/LOGS
# Online-fetch JSON output root (used in FETCH_ONLY mode)
FETCH_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/JSON
# ETL quality/integrity report output directory
ETL_REPORT_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/REPORTS

# ------------------------------------------------------------------------------
# System-level output paths
# ------------------------------------------------------------------------------
# Data-flow structure analysis report directory (gen_dataflow_report.py / analyze_dataflow.py)
SYSTEM_ANALYZE_ROOT=C:/NeoZQYY/export/SYSTEM/REPORTS/dataflow_analysis
# Field audit report directory (field_audit.py)
FIELD_AUDIT_ROOT=C:/NeoZQYY/export/SYSTEM/REPORTS/field_audit
# Full-chain data-flow documentation directory (gen_full_dataflow_doc.py)
FULL_DATAFLOW_DOC_ROOT=C:/NeoZQYY/export/SYSTEM/REPORTS/full_dataflow_doc
# API sample cache directory (24h cache for gen_full_dataflow_doc.py)
API_SAMPLE_CACHE_ROOT=C:/NeoZQYY/export/SYSTEM/CACHE/api_samples
# System-level ops log directory
SYSTEM_LOG_ROOT=C:/NeoZQYY/export/SYSTEM/LOGS

# ------------------------------------------------------------------------------
# Backend output paths (reserved)
# ------------------------------------------------------------------------------
# Structured backend log directory
BACKEND_LOG_ROOT=C:/NeoZQYY/export/BACKEND/LOGS

# ------------------------------------------------------------------------------
# Alibaba Cloud Bailian AI configuration
# CHANGE 2026-02-23 | Moved from the PRD document into .env; never store these in plain text in docs
# ------------------------------------------------------------------------------
BAILIAN_API_KEY=sk-6def29cab3474cc797e52b82a46a5dba
BAILIAN_TEST_APP_ID=541edb3d5fcd4c18b13cbad81bb5fb9d
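Each output root above is a plain path value; SYSTEM_ANALYZE_ROOT is documented elsewhere in this file as falling back to docs/reports/ when unset. A tiny sketch of that lookup (the helper name is hypothetical):

```python
import os
from pathlib import Path

def analyze_root(env=os.environ):
    # SYSTEM_ANALYZE_ROOT falls back to docs/reports/ per the .env comment.
    return Path(env.get("SYSTEM_ANALYZE_ROOT") or "docs/reports")
```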
.env.template (409 changed lines)
@@ -3,57 +3,20 @@

# ==============================================================================
# Usage: copy to .env and fill in real values
# Config precedence: DEFAULTS < .env < .env.local < environment variables < CLI arguments
#
# This file contains parameter templates for every layer:
#   [ROOT]    — root .env (shared configuration for all subsystems)
#   [ETL]     — apps/etl/connectors/feiqiu/.env (ETL-specific configuration)
#   [BACKEND] — apps/backend/.env.local (backend-private overrides)
#
# Syntax: KEY=VALUE (forward-slash paths; booleans as true/false; lists comma-separated)
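The documented precedence (DEFAULTS < .env < .env.local < environment variables < CLI arguments) amounts to a layered merge where later layers win. A minimal sketch; the layer names come from the template header, the merge helper itself is hypothetical:

```python
def resolve_config(defaults, dotenv, dotenv_local, environ, cli_args):
    # Later layers override earlier ones, matching the documented precedence.
    merged = {}
    for layer in (defaults, dotenv, dotenv_local, environ, cli_args):
        merged.update(layer)
    return merged
```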

# ------------------------------------------------------------------------------
# Store configuration
# ------------------------------------------------------------------------------
STORE_ID=
TIMEZONE=Asia/Shanghai
# ╔════════════════════════════════════════════════════════════════════════════╗
# ║ [ROOT] root .env — shared configuration layer                              ║
# ╚════════════════════════════════════════════════════════════════════════════╝

# ------------------------------------------------------------------------------
# Database configuration
# ------------------------------------------------------------------------------
# Full DSN (takes precedence; the split-style configuration below is ignored once set)
PG_DSN=postgresql://user:password@host:5432/dbname

# Split-style configuration (used when no DSN is set)
# PG_HOST=localhost
# PG_PORT=5432
# PG_NAME=your_database
# PG_USER=your_user
# PG_PASSWORD=your_password

# Connection timeout (seconds, default 20)
PG_CONNECT_TIMEOUT=10

# ------------------------------------------------------------------------------
# Database session parameters (ETL defaults.py → db.session.*)
# ------------------------------------------------------------------------------
# Session time zone (defaults to TIMEZONE)
# DB_SESSION_TIMEZONE=Asia/Shanghai

# SQL statement timeout (milliseconds, default 30000)
# DB_STATEMENT_TIMEOUT_MS=30000

# Lock-wait timeout (milliseconds, default 5000)
# DB_LOCK_TIMEOUT_MS=5000

# Idle-in-transaction timeout (milliseconds, default 600000 = 10 minutes)
# DB_IDLE_IN_TX_TIMEOUT_MS=600000
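Session parameters like these typically become PostgreSQL `SET` statements (or an `options` string) at connect time. A small sketch of that translation; the key-to-setting mapping follows the comments above, while the helper itself is hypothetical:

```python
def session_set_statements(env):
    # Map .env keys to the PostgreSQL session settings they document.
    mapping = {
        "DB_SESSION_TIMEZONE": "TimeZone",
        "DB_STATEMENT_TIMEOUT_MS": "statement_timeout",
        "DB_LOCK_TIMEOUT_MS": "lock_timeout",
        "DB_IDLE_IN_TX_TIMEOUT_MS": "idle_in_transaction_session_timeout",
    }
    return [f"SET {pg} = '{env[key]}'" for key, pg in mapping.items() if key in env]
```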
# ------------------------------------------------------------------------------
# Database schema configuration
# ------------------------------------------------------------------------------
# CHANGE 2026-02-15 | Aligned with the six-layer architecture of the new etl_feiqiu database
# OLTP business-data schema (default: ods)
SCHEMA_OLTP=ods

# ETL management-data schema (default: meta)
SCHEMA_ETL=meta

# ------------------------------------------------------------------------------
# Backend shared database connection parameters (backend config.py reads the root .env)
# Shared database connection parameters (backend + ETL share the same PostgreSQL instance)
# ------------------------------------------------------------------------------
DB_HOST=localhost
DB_PORT=5432
@@ -61,307 +24,275 @@ DB_USER=
DB_PASSWORD=

# ------------------------------------------------------------------------------
# Business database configuration (used by the backend API)
# Database names
# Database inventory:
#   etl_feiqiu      — ETL pipeline (feiqiu connector), production
#   test_etl_feiqiu — ETL pipeline (feiqiu connector), dev/test
#   zqyy_app        — mini-program business database, production
#   test_zqyy_app   — mini-program business database, dev/test
# Dev/test environments use the test_-prefixed databases
# ------------------------------------------------------------------------------
# Dev/test environments use the test_-prefixed databases; switch to zqyy_app / etl_feiqiu in production
APP_DB_NAME=test_zqyy_app

# ETL database (read-only backend access for the database viewer; when omitted, DB_HOST/PORT/USER/PASSWORD are reused)
# ETL_DB_HOST=
# ETL_DB_PORT=
# ETL_DB_USER=
# ETL_DB_PASSWORD=
ETL_DB_NAME=test_etl_feiqiu

# ------------------------------------------------------------------------------
# JWT authentication (backend only; belongs in apps/backend/.env.local)
# Composed DSNs (used whenever a subsystem / script needs a full connection string)
# Format: postgresql://user:password@host:port/dbname
# ------------------------------------------------------------------------------
# JWT_SECRET_KEY=change-me-in-production
# JWT_ALGORITHM=HS256
# JWT_ACCESS_TOKEN_EXPIRE_MINUTES=30
# JWT_REFRESH_TOKEN_EXPIRE_DAYS=7
PG_DSN=postgresql://user:password@host:5432/test_etl_feiqiu
APP_DB_DSN=postgresql://user:password@host:5432/test_zqyy_app

# Test-database DSNs (ops scripts and integration tests use these first)
TEST_DB_DSN=postgresql://user:password@host:5432/test_etl_feiqiu
TEST_APP_DB_DSN=postgresql://user:password@host:5432/test_zqyy_app
# ------------------------------------------------------------------------------
# WeChat mini-program configuration (backend only; belongs in apps/backend/.env.local)
# General
# ------------------------------------------------------------------------------
# Message-push callback token (must match the value configured in the WeChat console)
# WX_CALLBACK_TOKEN=
# Mini-program AppID (needed for code2Session login)
# WX_APP_ID=
# Mini-program AppSecret (must never appear in frontend code)
# WX_APP_SECRET=
TIMEZONE=Asia/Shanghai
LOG_LEVEL=INFO

# ==============================================================================
# Unified output path configuration (export/ directory)
# ==============================================================================
# Directory overview:
# export/
# ├── ETL-Connectors/feiqiu/
# │   ├── JSON/    — raw API JSON exports
# │   ├── LOGS/    — ETL run logs
# │   └── REPORTS/ — ETL quality/integrity reports
# ├── SYSTEM/
# │   ├── LOGS/    — system-level ops logs
# │   ├── REPORTS/
# │   │   ├── dataflow_analysis/ — data-flow structure analysis reports
# │   │   ├── field_audit/       — field audit reports
# │   │   └── full_dataflow_doc/ — full-chain data-flow documentation
# │   └── CACHE/
# │       └── api_samples/       — API sample cache
# └── BACKEND/
#     └── LOGS/ — structured backend logs

# ------------------------------------------------------------------------------
# CORS (backend only; comma-separated)
# ETL connector output paths
# ------------------------------------------------------------------------------
# CORS_ORIGINS=http://localhost:5173
EXPORT_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/JSON
LOG_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/LOGS
FETCH_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/JSON
ETL_REPORT_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/REPORTS

# ------------------------------------------------------------------------------
# Log level
# System-level output paths
# ------------------------------------------------------------------------------
# LOG_LEVEL=INFO
SYSTEM_ANALYZE_ROOT=C:/NeoZQYY/export/SYSTEM/REPORTS/dataflow_analysis
FIELD_AUDIT_ROOT=C:/NeoZQYY/export/SYSTEM/REPORTS/field_audit
FULL_DATAFLOW_DOC_ROOT=C:/NeoZQYY/export/SYSTEM/REPORTS/full_dataflow_doc
API_SAMPLE_CACHE_ROOT=C:/NeoZQYY/export/SYSTEM/CACHE/api_samples
SYSTEM_LOG_ROOT=C:/NeoZQYY/export/SYSTEM/LOGS

# ------------------------------------------------------------------------------
# Backend output paths
# ------------------------------------------------------------------------------
BACKEND_LOG_ROOT=C:/NeoZQYY/export/BACKEND/LOGS

# ------------------------------------------------------------------------------
# Alibaba Cloud Bailian AI configuration
# ------------------------------------------------------------------------------
BAILIAN_API_KEY=
BAILIAN_TEST_APP_ID=

# ╔════════════════════════════════════════════════════════════════════════════╗
# ║ [ETL] apps/etl/connectors/feiqiu/.env — ETL-specific configuration         ║
# ╚════════════════════════════════════════════════════════════════════════════╝
# The parameters below belong in apps/etl/connectors/feiqiu/.env, not the root .env

# ------------------------------------------------------------------------------
# Store configuration
# ------------------------------------------------------------------------------
STORE_ID=

# ------------------------------------------------------------------------------
# Database configuration (ETL-specific; overrides the shared parameters from the root .env)
# ------------------------------------------------------------------------------
# Full DSN (takes precedence; the split-style configuration is ignored once set)
# PG_DSN=postgresql://user:password@host:5432/test_etl_feiqiu
PG_CONNECT_TIMEOUT=10

# Split-style configuration (used when no DSN is set)
# PG_HOST=localhost
# PG_PORT=5432
# PG_USER=your_user
# PG_PASSWORD=your_password
# PG_NAME=your_database

# Database schemas
SCHEMA_OLTP=ods
SCHEMA_ETL=meta

# ------------------------------------------------------------------------------
# Database session parameters (defaults.py → db.session.*)
# ------------------------------------------------------------------------------
# DB_SESSION_TIMEZONE=Asia/Shanghai
# DB_STATEMENT_TIMEOUT_MS=30000
# DB_LOCK_TIMEOUT_MS=5000
# DB_IDLE_IN_TX_TIMEOUT_MS=600000

# ------------------------------------------------------------------------------
# API configuration (upstream SaaS API)
# ------------------------------------------------------------------------------
API_BASE=
API_TOKEN=

# API request timeout (seconds, default 20)
API_TIMEOUT=20

# Page size (default 200)
API_PAGE_SIZE=200

# Maximum retries (default 3)
API_RETRY_MAX=3

# Retry backoff (JSON array, seconds, default [1, 2, 4])
# API_RETRY_BACKOFF=[1, 2, 4]

# Extra request parameters (JSON object)
# API_PARAMS={}

# Extra request headers (JSON object, empty by default)
# API_HEADERS_EXTRA={}
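API_RETRY_MAX and API_RETRY_BACKOFF together describe a bounded retry loop with per-attempt sleeps. A sketch of how such settings are commonly applied; `fetch` is a hypothetical zero-argument callable, and the clamping of the backoff index is an assumption:

```python
import json
import time

def call_with_retry(fetch, env):
    # API_RETRY_MAX attempts beyond the first; backoff comes from the
    # JSON-array setting documented above.
    retries = int(env.get("API_RETRY_MAX", "3"))
    backoff = json.loads(env.get("API_RETRY_BACKOFF", "[1, 2, 4]"))
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise
            # Reuse the last backoff value if there are more retries than entries.
            time.sleep(backoff[min(attempt, len(backoff) - 1)])
```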

# ------------------------------------------------------------------------------
# Path configuration
# ------------------------------------------------------------------------------
# JSON export root
EXPORT_ROOT=C:/NeoZQYY/export/ETL/JSON

# Log output root
LOG_ROOT=C:/NeoZQYY/export/ETL/LOG

# Online-fetch JSON output root (used in FETCH_ONLY mode)
FETCH_ROOT=C:/NeoZQYY/export/ETL/JSON

# JSON input directory for local clean-and-load (used in INGEST_ONLY mode; defaults to the current fetch directory when empty)
# INGEST_SOURCE_DIR=

# Manifest file name (default manifest.json)
# MANIFEST_NAME=manifest.json

# Ingest report file name (default ingest_report.json)
# INGEST_REPORT_NAME=ingest_report.json

# Pretty-print output JSON (default true)
WRITE_PRETTY_JSON=true

# Maximum bytes per file (default 50MB = 52428800)
# MAX_FILE_BYTES=52428800

# ------------------------------------------------------------------------------
# Pipeline flow configuration
# ------------------------------------------------------------------------------
# Run flow: FULL (fetch + load), FETCH_ONLY (fetch to disk only), INGEST_ONLY (local load only)
PIPELINE_FLOW=FULL

# Data-source mode (set directly, not mapped through PIPELINE_FLOW): hybrid / online / offline
# DATA_SOURCE=hybrid
# ------------------------------------------------------------------------------
# Time-window configuration
# ------------------------------------------------------------------------------
# Window overlap in seconds (default 600, i.e. 10 minutes)
OVERLAP_SECONDS=600

# Busy-period window length in minutes (default 30)
WINDOW_BUSY_MIN=30

# Idle-period window length in minutes (default 180)
WINDOW_IDLE_MIN=180

# Idle period start/end (default 04:00 ~ 16:00)
IDLE_START=04:00
IDLE_END=16:00

# Window split unit (default day)
WINDOW_SPLIT_UNIT=day

# Window split length in days (default 10)
WINDOW_SPLIT_DAYS=10

# Window compensation hours (default 2)
WINDOW_COMPENSATION_HOURS=2

# Advance the cursor on empty results (default true)
ALLOW_EMPTY_RESULT_ADVANCE=true
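The busy/idle split above means a timestamp inside the IDLE_START ~ IDLE_END band gets the longer window. A sketch of that selection only; the real scheduling lives in the connector's orchestration code, and this helper is hypothetical:

```python
from datetime import datetime, time as dtime

def window_minutes(ts, busy_min=30, idle_min=180,
                   idle_start=dtime(4, 0), idle_end=dtime(16, 0)):
    # Idle band uses WINDOW_IDLE_MIN, everything else WINDOW_BUSY_MIN.
    return idle_min if idle_start <= ts.time() < idle_end else busy_min
```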

# ------------------------------------------------------------------------------
# Snapshot configuration
# ------------------------------------------------------------------------------
# Delete rows missing from the snapshot (default true)
SNAPSHOT_MISSING_DELETE=true

# Allow deletes when the snapshot is empty (default false)
SNAPSHOT_ALLOW_EMPTY_DELETE=false

# ------------------------------------------------------------------------------
# Data-integrity check configuration
# ------------------------------------------------------------------------------
# Check mode: history (historical range) / latest (most recent run)
INTEGRITY_MODE=history

# History check start date
INTEGRITY_HISTORY_START=2025-07-01

# History check end date (empty = through today)
# INTEGRITY_HISTORY_END=

# Include dimension tables (default true)
INTEGRITY_INCLUDE_DIMENSIONS=true

# Trigger checks automatically (default false)
INTEGRITY_AUTO_CHECK=false

# Backfill missing data automatically (default false)
INTEGRITY_AUTO_BACKFILL=false

# Compare content (default true)
INTEGRITY_COMPARE_CONTENT=true

# Content-comparison sample cap (default 50)
INTEGRITY_CONTENT_SAMPLE_LIMIT=50

# Backfill on content mismatch (default true)
INTEGRITY_BACKFILL_MISMATCH=true

# Re-check after backfill (default true)
INTEGRITY_RECHECK_AFTER_BACKFILL=true

# Restrict to specific ODS task codes (comma-separated; empty = all)
# INTEGRITY_ODS_TASK_CODES=

# Force monthly splitting (default true)
# INTEGRITY_FORCE_MONTHLY_SPLIT=true
# ------------------------------------------------------------------------------
# Cleaning configuration (ETL defaults.py → clean.*)
# ------------------------------------------------------------------------------
# Log unknown fields (default true)
# CLEAN_LOG_UNKNOWN_FIELDS=true

# Unknown-field log cap (default 50)
# CLEAN_UNKNOWN_FIELDS_LIMIT=50

# Hash algorithm (default sha1)
# CLEAN_HASH_ALGO=sha1

# Hash salt (default empty)
# CLEAN_HASH_SALT=

# Strict numeric validation (default true)
# CLEAN_STRICT_NUMERIC=true

# Money rounding precision (default 2 decimal places)
# CLEAN_ROUND_MONEY_SCALE=2

# ------------------------------------------------------------------------------
# Security configuration (ETL defaults.py → security.*)
# ------------------------------------------------------------------------------
# Redact sensitive values in logs (default true)
# SECURITY_REDACT_IN_LOGS=true

# Keys to redact (JSON array, default ["token","password","Authorization"])
# SECURITY_REDACT_KEYS=["token","password","Authorization"]

# Echo tokens in logs (default false; debugging only)
# SECURITY_ECHO_TOKEN_IN_LOGS=false
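SECURITY_REDACT_KEYS is a JSON array of key names whose values should never reach the logs. A sketch of applying it to a flat record; the helper and the `***` placeholder are assumptions, only the key list and its default come from the template:

```python
import json

def redact(record, env):
    # Replace values of the documented sensitive keys before logging.
    keys = set(json.loads(env.get(
        "SECURITY_REDACT_KEYS", '["token","password","Authorization"]')))
    return {k: ("***" if k in keys else v) for k, v in record.items()}
```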

# ------------------------------------------------------------------------------
# Verification configuration
# ------------------------------------------------------------------------------
# Skip ODS reload when verifying after a fetch (default true)
VERIFY_SKIP_ODS_ON_FETCH=true

# Use local JSON when verifying (default true)
VERIFY_ODS_LOCAL_JSON=true

# ------------------------------------------------------------------------------
# IO configuration
# ------------------------------------------------------------------------------
WRITE_PRETTY_JSON=true
# INGEST_SOURCE_DIR=
# MANIFEST_NAME=manifest.json
# INGEST_REPORT_NAME=ingest_report.json
# MAX_FILE_BYTES=52428800

# ------------------------------------------------------------------------------
# Cleaning configuration (defaults.py → clean.*)
# ------------------------------------------------------------------------------
# CLEAN_LOG_UNKNOWN_FIELDS=true
# CLEAN_UNKNOWN_FIELDS_LIMIT=50
# CLEAN_HASH_ALGO=sha1
# CLEAN_HASH_SALT=
# CLEAN_STRICT_NUMERIC=true
# CLEAN_ROUND_MONEY_SCALE=2

# ------------------------------------------------------------------------------
# Security configuration (defaults.py → security.*)
# ------------------------------------------------------------------------------
# SECURITY_REDACT_IN_LOGS=true
# SECURITY_REDACT_KEYS=["token","password","Authorization"]
# SECURITY_ECHO_TOKEN_IN_LOGS=false
# ------------------------------------------------------------------------------
# DWD layer configuration
# ------------------------------------------------------------------------------
# Use UPSERT for fact tables (default true)
DWD_FACT_UPSERT=true

# Fact UPSERT batch size (default 1000)
# DWD_FACT_UPSERT_BATCH_SIZE=1000

# Minimum batch size (shrunk automatically on lock contention, default 100)
# DWD_FACT_UPSERT_MIN_BATCH_SIZE=100

# Maximum retries (default 2)
# DWD_FACT_UPSERT_MAX_RETRIES=2

# Retry backoff (JSON array, seconds, default [1,2,4])
# DWD_FACT_UPSERT_RETRY_BACKOFF=[1,2,4]

# Fact backfill lock-wait timeout (milliseconds; empty = inherit DB_LOCK_TIMEOUT_MS)
# DWD_FACT_UPSERT_LOCK_TIMEOUT_MS=
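The comments above say the UPSERT batch is shrunk automatically on lock contention, bounded below by DWD_FACT_UPSERT_MIN_BATCH_SIZE. A sketch of one plausible shrink policy; halving is an assumption for illustration, only the floor comes from the setting:

```python
def next_batch_size(current, min_size=100):
    # On lock contention, halve the batch but never drop below the
    # documented minimum (DWD_FACT_UPSERT_MIN_BATCH_SIZE).
    return max(min_size, current // 2)
```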
# ------------------------------------------------------------------------------
# Task-list configuration
# ------------------------------------------------------------------------------
# ODS fetch task list (comma-separated)
RUN_TASKS=PRODUCTS,TABLES,MEMBERS,ASSISTANTS,PACKAGES_DEF,ORDERS,PAYMENTS,REFUNDS,COUPON_USAGE,INVENTORY_CHANGE,TOPUPS,TABLE_DISCOUNT,ASSISTANT_ABOLISH,LEDGER

# DWS aggregation task list (comma-separated; empty = skip)
# RUN_DWS_TASKS=

# Index computation task list (comma-separated; empty = skip)
# RUN_INDEX_TASKS=

# Index lookback days (default 60)
INDEX_LOOKBACK_DAYS=60
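Per the template's syntax note, list values such as RUN_TASKS (and CORS_ORIGINS) are comma-separated. A minimal parsing sketch with a hypothetical helper name:

```python
def parse_list(value):
    # Comma-separated per the template's syntax note; blank items are dropped.
    return [item.strip() for item in value.split(",") if item.strip()]
```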
# ------------------------------------------------------------------------------
# DWS monthly/salary configuration
# ------------------------------------------------------------------------------
# New-hire cap effective date (default 2026-03-01)
# DWS_MONTHLY_NEW_HIRE_CAP_EFFECTIVE_FROM=2026-03-01

# New-hire cap day (default 25)
# DWS_MONTHLY_NEW_HIRE_CAP_DAY=25

# Maximum new-hire tier level (default 2)
# DWS_MONTHLY_NEW_HIRE_MAX_TIER_LEVEL=2

# Salary-calculation run days (default 5)
# DWS_SALARY_RUN_DAYS=5

# Allow runs outside the cycle (default false)
# DWS_SALARY_ALLOW_OUT_OF_CYCLE=false

# Private-room course unit price (default 138)
# DWS_SALARY_ROOM_COURSE_PRICE=138

# ------------------------------------------------------------------------------
# DWS monthly extras
# ------------------------------------------------------------------------------
# Allow recomputing historical months (default false)
# DWS_MONTHLY_ALLOW_HISTORY=false

# Grace days for the previous month (default 5, i.e. days 1-5 of the next month can still compute the previous one)
# DWS_MONTHLY_PREV_GRACE_DAYS=5

# Number of historical months (default 0, i.e. no backfill)
# DWS_MONTHLY_HISTORY_MONTHS=0
# ------------------------------------------------------------------------------
# Backend ETL project path (used by backend config.py; defaults to the monorepo-relative path)
# ------------------------------------------------------------------------------
# ETL_PROJECT_PATH=C:/NeoZQYY/apps/etl/connectors/feiqiu

# ------------------------------------------------------------------------------
# Data-flow structure analysis (used by analyze_dataflow.py)
# ------------------------------------------------------------------------------
# Analysis output directory (falls back to docs/reports/)
# SYSTEM_ANALYZE_ROOT=C:/NeoZQYY/export/dataflow_analysis

# ------------------------------------------------------------------------------
# ODS offline replay configuration (dev/ops only)
# ------------------------------------------------------------------------------
# ODS_JSON_DOC_DIR=export/test-json-doc
# ODS_INCLUDE_FILES=
# ODS_DROP_SCHEMA_FIRST=true

# ╔════════════════════════════════════════════════════════════════════════════╗
# ║ [BACKEND] apps/backend/.env.local — backend-private overrides              ║
# ╚════════════════════════════════════════════════════════════════════════════╝
# The parameters below belong in apps/backend/.env.local, not the root .env

# ------------------------------------------------------------------------------
# ETL database (read-only backend access for the database viewer; when omitted, DB_HOST/PORT/USER/PASSWORD are reused)
# ------------------------------------------------------------------------------
# ETL_DB_HOST=
# ETL_DB_PORT=
# ETL_DB_USER=
# ETL_DB_PASSWORD=

# ------------------------------------------------------------------------------
# JWT authentication
# ------------------------------------------------------------------------------
# JWT_SECRET_KEY=change-me-in-production
# JWT_ALGORITHM=HS256
# JWT_ACCESS_TOKEN_EXPIRE_MINUTES=30
# JWT_REFRESH_TOKEN_EXPIRE_DAYS=7

# ------------------------------------------------------------------------------
# WeChat mini-program configuration
# ------------------------------------------------------------------------------
# WX_CALLBACK_TOKEN=
# WX_APP_ID=
# WX_APP_SECRET=

# ------------------------------------------------------------------------------
# CORS (comma-separated)
# ------------------------------------------------------------------------------
# CORS_ORIGINS=http://localhost:5173

# ------------------------------------------------------------------------------
# ETL project path (subprocess cwd; defaults to the monorepo-relative path)
# ------------------------------------------------------------------------------
# ETL_PROJECT_PATH=C:/NeoZQYY/apps/etl/connectors/feiqiu
@@ -1,4 +1,4 @@
 {
-  "prompt_id": "P20260219-083015",
-  "at": "2026-02-19T08:30:15.914061+08:00"
+  "prompt_id": "P20260223-225610",
+  "at": "2026-02-23T22:56:10.365734+08:00"
 }
@@ -11,7 +11,7 @@ tools: ["read", "write", "shell"]
- Audit dashboard: `docs/audit/audit_dashboard.md` (auto-generated; do not edit by hand)
- Prompt logs: `docs/audit/prompt_logs/`
- Dashboard refresh command: `python scripts/audit/gen_audit_dashboard.py`
- All audit artifacts go into `docs/audit/` at the project root; never write inside submodules (such as `apps/etl/pipelines/feiqiu/docs/audit/`)
- All audit artifacts go into `docs/audit/` at the project root; never write inside submodules (such as `apps/etl/connectors/feiqiu/docs/audit/`)

## Input sources (do not ask the main agent)
- Obtain this run's uncommitted changes via `git status --porcelain` and `git diff`
@@ -20,7 +20,7 @@ tools: ["read", "write", "shell"]

## When heavy post-processing is needed
Run the audit wrap-up if any of the following holds (otherwise output "no logic changes / no audit needed" and clear the pending-audit marker):
- A changed file hits an ETL connector high-risk path: `api/`, `cli/`, `config/`, `database/`, `loaders/`, `models/`, `orchestration/`, `scd/`, `tasks/`, `utils/`, or `quality/` under `apps/etl/pipelines/feiqiu/`
- A changed file hits an ETL connector high-risk path: `api/`, `cli/`, `config/`, `database/`, `loaders/`, `models/`, `orchestration/`, `scd/`, `tasks/`, `utils/`, or `quality/` under `apps/etl/connectors/feiqiu/`
- A changed file hits the backend API: `apps/backend/app/`
- A changed file hits the admin-console source: `apps/admin-web/src/`
- A changed file hits the mini-program source: `apps/miniprogram/miniapp/`, `apps/miniprogram/miniprogram/`
@@ -1,14 +1,14 @@
 {
   "enabled": true,
   "name": "Data Flow Structure Analysis",
-  "description": "Manually trigger the data-flow structure analysis: first run a Python script to collect API JSON, DB table structures, the three-layer field mapping, and BD_manual business descriptions, then have the report generator emit a Markdown document with anchor links, business descriptions, multiple sample values, and a field-difference report.",
-  "version": "3.0.0",
+  "description": "Manually trigger the data-flow structure analysis: first run a Python script to collect API JSON, DB table structures, the three-layer field mapping, and BD_manual business descriptions, then have the report generator emit a Markdown document with anchor links, business descriptions, multiple sample values, whitelist folding, and a field-difference report.",
+  "version": "4.0.0",
   "when": {
     "type": "userTriggered"
   },
   "then": {
     "type": "askAgent",
-    "prompt": "Run the data-flow structure analysis, completing the following steps:\n\nPhase 1: data collection\n1. Run `python scripts/ops/analyze_dataflow.py` to collect the data\n2. Confirm the collected results are on disk, including:\n   - json_trees/ (with multiple sample values in samples)\n   - db_schemas/\n   - field_mappings/ (three-layer mapping + anchors)\n   - bd_descriptions/ (BD_manual business descriptions)\n   - collection_manifest.json (with json_field_count)\n\nPhase 2: report generation\n3. Run `python scripts/ops/gen_dataflow_report.py` to generate the Markdown report\n4. The report includes the following enhancements:\n   - overview table with an API JSON field-count column\n   - 1.1 API↔ODS↔DWD field-comparison difference report\n   - 2.3 coverage table with a business-description column\n   - API source-field tables with a business-description column + multiple sample values (enum explanations)\n   - ODS table structures with a business-description column + bidirectional upstream/downstream mapping anchor links\n   - DWD table structures with a business-description column + ODS source anchor links\n5. Output the file path and a summary of key statistics\n\nNote: only the feiqiu connector is analyzed for now. When new connectors are added, they should be auto-discovered and included."
+    "prompt": "Run the data-flow structure analysis, completing the steps below. If it has already completed or traces of a previous run exist, clear them and re-run:\n\nPhase 1: data collection\n1. Run `python scripts/ops/analyze_dataflow.py` to collect the data (add --date-from / --date-to to restrict the date range if needed)\n2. Confirm the collected results are on disk, including:\n   - json_trees/ (with multiple sample values in samples)\n   - db_schemas/\n   - field_mappings/ (three-layer mapping + anchors)\n   - bd_descriptions/ (BD_manual business descriptions)\n   - collection_manifest.json (with json_field_count, date_from, date_to)\n\nPhase 2: report generation\n3. Run `python scripts/ops/gen_dataflow_report.py` to generate the Markdown report\n4. The report includes the following enhancements:\n   - report header with the API request date range (date_from ~ date_to) and total JSON volume\n   - overview table with an API JSON field-count column\n   - 1.1 API↔ODS↔DWD field-comparison difference report (whitelisted fields folded into a summary instead of expanded table rows)\n   - 2.3 coverage table with a business-description column\n   - API source-field tables with a business-description column + multiple sample values (enum explanations)\n   - ODS table structures with a business-description column + bidirectional upstream/downstream mapping anchor links\n   - DWD table structures with a business-description column + ODS source anchor links\n5. Output the file path and a summary of key statistics\n\nWhitelist rules (v4):\n- ETL metadata columns (source_file, source_endpoint, fetched_at, payload, content_hash)\n- DWD dimension-table SCD2 management columns (valid_from, valid_to, is_current, etl_loaded_at, etl_batch_id)\n- API siteProfile nested-object fields\n- Whitelisted fields still take part in checks and statistics; they are only folded in the report, with the reason noted\n\nNote: only the feiqiu connector is analyzed for now. When new connectors are added, they should be auto-discovered and included."
   },
   "workspaceFolderName": "NeoZQYY",
   "shortName": "dataflow-analyze"
.kiro/hooks/etl-data-consistency.kiro.hook (new file, 15 lines)
@@ -0,0 +1,15 @@
{
  "enabled": true,
  "name": "ETL Data Consistency Check",
  "description": "Manually trigger the ETL end-to-end data-consistency black-box check: take the most recent successful ETL run and compare actual data table by table and field by field across API→ODS→DWD→DWS/INDEX, emitting a detailed data-difference report.",
  "version": "1.0.0",
  "when": {
    "type": "userTriggered"
  },
  "then": {
    "type": "askAgent",
    "prompt": "Run the ETL end-to-end data-consistency black-box check, completing the steps below; if it has already completed or traces of a previous run exist, clear them and re-run:\n\n1. Run `python scripts/ops/etl_consistency_check.py`\n2. The script automatically:\n   a. finds the most recent successful ETL log under LOG_ROOT and parses the task list of that run\n   b. reads the API JSON files that run wrote under FETCH_ROOT\n   c. connects to the database (PG_DSN) and compares every table touched by the run, field by field:\n      - API JSON vs ODS: field completeness, sampled value comparison (key fields of 5 random records)\n      - ODS vs DWD: mapping correctness, value-transformation verification (sampled comparison)\n      - DWD vs DWS/INDEX: aggregation-logic verification (row counts, spot checks of key metrics)\n   d. writes a Markdown report to ETL_REPORT_ROOT\n3. Inspect the report output and summarize the key findings\n\nReport structure:\n- 1. ETL run overview (task list; success/failure/skip counts)\n- 2. API↔ODS data consistency (table-by-table, field-by-field value comparison)\n- 3. ODS↔DWD data consistency (mapping verification + value sampling)\n- 4. DWD↔DWS data consistency (aggregation-logic verification)\n- 5. Exception summary and recommendations\n\nNote: connect with the production PG_DSN (read-only mode); do not modify any data."
  },
  "workspaceFolderName": "NeoZQYY",
  "shortName": "etl-data-consistency"
}
@@ -15,7 +15,7 @@ from datetime import datetime, timezone, timedelta
TZ_TAIPEI = timezone(timedelta(hours=8))

RISK_RULES = [
    (re.compile(r"^apps/etl/pipelines/feiqiu/(api|cli|config|database|loaders|models|orchestration|scd|tasks|utils|quality)/"), "etl"),
    (re.compile(r"^apps/etl/connectors/feiqiu/(api|cli|config|database|loaders|models|orchestration|scd|tasks|utils|quality)/"), "etl"),
    (re.compile(r"^apps/backend/app/"), "backend"),
    (re.compile(r"^apps/admin-web/src/"), "admin-web"),
    (re.compile(r"^apps/miniprogram/(miniapp|miniprogram)/"), "miniprogram"),
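A rule table like RISK_RULES pairs a path regex with a risk label; classification is first-match-wins. A self-contained sketch with a hypothetical two-entry subset of the rules:

```python
import re

# Hypothetical subset of the RISK_RULES table shown above.
RISK_RULES = [
    (re.compile(r"^apps/etl/connectors/feiqiu/(api|cli|tasks)/"), "etl"),
    (re.compile(r"^apps/backend/app/"), "backend"),
]

def classify(path):
    # Return the label of the first matching rule, or None for low-risk paths.
    for pattern, label in RISK_RULES:
        if pattern.search(path):
            return label
    return None
```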
@@ -9,8 +9,69 @@
    ],
    "disabled": false,
    "autoApprove": [
      "git_status",
      "git_branch",
      "all",
      "*"
    ]
  },
  "postgres": {
    "disabled": true
  },
  "pg-etl": {
    "command": "uvx",
    "args": [
      "postgres-mcp",
      "--access-mode=unrestricted"
    ],
    "env": {
      "DATABASE_URI": "postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/etl_feiqiu"
    },
    "disabled": true,
    "autoApprove": [
      "all",
      "*"
    ]
  },
  "pg-etl-test": {
    "command": "uvx",
    "args": [
      "postgres-mcp",
      "--access-mode=unrestricted"
    ],
    "env": {
      "DATABASE_URI": "postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/test_etl_feiqiu"
    },
    "disabled": false,
    "autoApprove": [
      "all",
      "*"
    ]
  },
  "pg-app": {
    "command": "uvx",
    "args": [
      "postgres-mcp",
      "--access-mode=unrestricted"
    ],
    "env": {
      "DATABASE_URI": "postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/zqyy_app"
    },
    "disabled": true,
    "autoApprove": [
      "all",
      "*"
    ]
  },
  "pg-app-test": {
    "command": "uvx",
    "args": [
      "postgres-mcp",
      "--access-mode=unrestricted"
    ],
    "env": {
      "DATABASE_URI": "postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/test_zqyy_app"
    },
    "disabled": false,
    "autoApprove": [
      "all",
      "*"
    ]
@@ -27,7 +27,7 @@ description: When business/ETL/API/auth/miniprogram-interaction logic changes,

### 2b) Per-module README.md (evaluate one by one based on the modules a change touches)
- `README.md` (root): project overview, quick start, environment variables, architecture summary
- `apps/backend/README.md`: backend API routes, configuration, how to run, interface contracts
- `apps/etl/pipelines/feiqiu/README.md`: ETL task list, development conventions, registration flow
- `apps/etl/connectors/feiqiu/README.md`: ETL task list, development conventions, registration flow
- `apps/miniprogram/README.md`: miniprogram page structure, build and deployment
- `apps/admin-web/README.md`: admin console feature overview
- `packages/shared/README.md`: shared-package module descriptions and usage
@@ -13,7 +13,7 @@
3. Frontend and backend communicate via JSON APIs; live logs are pushed over WebSocket
4. Database queries are restricted to read-only to prevent accidental writes
5. **Multi-store isolation**: `site_id` runs through the whole chain; an Operator is bound to a store at login, and every API request carries site_id automatically
6. **Execution-flow (Flow) separation**: the existing 7 Flows and 3 processing modes are fully preserved; the frontend shows selectable layers and tasks per Flow
6. **Execution-flow (Flow) separation**: the existing 7 Flows and 4 processing modes are fully preserved; the frontend shows selectable layers and tasks per Flow

## Architecture
@@ -94,10 +94,11 @@ Isolation strategy of the web admin console:
| `dwd_dws_index` | DWD → DWS summary → DWS index | DWS, INDEX |
| `dwd_index` | DWD → DWS index | INDEX |

3 processing modes:
4 processing modes:
- `increment_only`: incremental processing only
- `verify_only`: verify and repair (optionally "fetch from the API before verifying")
- `increment_verify`: incremental + verify and repair
- `full_window`: process all layers over the actual time range returned by the API, no verification needed

Time-window modes:
- `lookback`: lookback + overlap (lookback_hours + overlap_seconds)
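The Flow rows and mode list above suggest a simple lookup the frontend can drive from. A hedged sketch — the dict layout, `selectable_layers()`, and the two sample Flows are illustrative, not the backend's actual registry structure:

```python
# Flow/mode lookup sketch: each Flow declares the layers the frontend may
# offer; the four processing modes are orthogonal to the Flow choice.
FLOWS = {
    "dwd_dws_index": {"path": "DWD -> DWS summary -> DWS index",
                      "layers": ["DWS", "INDEX"]},
    "dwd_index": {"path": "DWD -> DWS index", "layers": ["INDEX"]},
}
MODES = ["increment_only", "verify_only", "increment_verify", "full_window"]

def selectable_layers(flow_code):
    """Layers the frontend may show for the given Flow."""
    return FLOWS[flow_code]["layers"]

print(selectable_layers("dwd_index"))  # ['INDEX']
```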
@@ -198,7 +199,7 @@ apps/admin-web/
| GET | `/api/tasks/registry` | Get the task registry (grouped by business domain) | 2 |
| GET | `/api/tasks/dwd-tables` | Get DWD table definitions (grouped by business domain) | 2 |
| POST | `/api/tasks/validate` | Validate a TaskConfig | 2, 11 |
| GET | `/api/tasks/flows` | Get the execution-flow list (7 Flows + 3 processing modes) | 2 |
| GET | `/api/tasks/flows` | Get the execution-flow list (7 Flows + 4 processing modes) | 2 |
| POST | `/api/execution/run` | Submit a task execution | 3 |
| GET | `/api/execution/queue` | Get the current queue | 4 |
| POST | `/api/execution/queue` | Add a task to the queue | 4 |
@@ -45,13 +45,13 @@
- [x] 3.1 Migrate CLIBuilder to the backend (`apps/backend/app/services/cli_builder.py`)
  - Migrate the core logic from `gui/utils/cli_builder.py`
  - Adapt to the new TaskConfigSchema; inject the `--store-id` argument automatically
  - Support 7 Flows and 3 processing modes
  - Support 7 Flows and 4 processing modes
  - _Requirements: 2.6_

- [x] 3.2 Implement the task-registry API (`apps/backend/app/routers/tasks.py`)
  - `GET /api/tasks/registry`: return the task list grouped by business domain
  - `GET /api/tasks/dwd-tables`: return DWD table definitions grouped by business domain
  - `GET /api/tasks/flows`: return the 7 Flow definitions and 3 processing-mode definitions
  - `GET /api/tasks/flows`: return the 7 Flow definitions and 4 processing-mode definitions
  - `POST /api/tasks/validate`: validate a TaskConfig and return a preview of the generated CLI command
  - Pydantic schemas live in `apps/backend/app/schemas/tasks.py`
  - _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 11.1, 11.2_
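Task 3.1's core idea — turn a validated task config into an argv list, always injecting `--store-id` — can be sketched in a few lines. `build_cli()` and the flag names other than `--store-id` are assumptions for illustration, not the real CLIBuilder:

```python
# Hedged CLIBuilder sketch: compose an argv list and inject --store-id.
def build_cli(task_code, store_id, flow, mode):
    argv = ["python", "-m", "etl.cli", "run", task_code,
            "--store-id", str(store_id),  # injected automatically per task 3.1
            "--flow", flow, "--mode", mode]
    return argv

cmd = build_cli("DWS_ASSISTANT_DAILY", 1001, "dwd_index", "increment_only")
print(" ".join(cmd))
```

Returning a list (rather than a shell string) keeps the command safe to pass to `subprocess.run` without quoting issues.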
1 .kiro/specs/assistant-abolish-cleanup/.config.kiro Normal file
@@ -0,0 +1 @@
{"generationMode": "requirements-first"}

224 .kiro/specs/assistant-abolish-cleanup/design.md Normal file
@@ -0,0 +1,224 @@
# Design Doc: Assistant Abolish Full-Chain Cleanup

## Overview

This design describes how to safely remove the standalone "assistant abolish" data chain from the ETL pipeline.
Core idea: **delete the standalone abolish chain (API → ODS → DWD) and keep the existing `is_trash` field on service records as the single source for abolish decisions.**

Cleanup scope:
- ETL task definitions (ODS task + registry)
- DWD load mappings (FACT_MAPPINGS + TABLE_MAP)
- DWS aggregation logic (dead-code removal)
- DWD verifier configuration
- Database DDL and migration scripts
- Property tests
- Ops scripts
## Architecture

### Data flow before cleanup

```mermaid
graph LR
    API_Abolish["/AssistantPerformance/GetAbolitionAssistant"] --> ODS_ACR["ods.assistant_cancellation_records"]
    ODS_ACR --> DWD_ATE["dwd.dwd_assistant_trash_event"]
    ODS_ACR --> DWD_ATE_EX["dwd.dwd_assistant_trash_event_ex"]
    DWD_ATE --> |"_extract_trash_records (dead code)"| DWS_DAILY["dws.dws_assistant_daily_detail"]

    API_Service["/AssistantPerformance/GetAssistantServiceRecords"] --> ODS_ASR["ods.assistant_service_records"]
    ODS_ASR --> DWD_SL_EX["dwd.dwd_assistant_service_log_ex"]
    DWD_SL_EX --> |"is_trash field (actually used)"| DWS_DAILY

    style API_Abolish fill:#f99,stroke:#c00
    style ODS_ACR fill:#f99,stroke:#c00
    style DWD_ATE fill:#f99,stroke:#c00
    style DWD_ATE_EX fill:#f99,stroke:#c00
```

### Data flow after cleanup

```mermaid
graph LR
    API_Service["/AssistantPerformance/GetAssistantServiceRecords"] --> ODS_ASR["ods.assistant_service_records"]
    ODS_ASR --> DWD_SL_EX["dwd.dwd_assistant_service_log_ex"]
    DWD_SL_EX --> |"is_trash field"| DWS_DAILY["dws.dws_assistant_daily_detail"]

    style API_Service fill:#9f9,stroke:#090
    style DWD_SL_EX fill:#9f9,stroke:#090
```
## Components and Interfaces

### Files to modify

| Layer | File | Action | Reqs |
|------|------|------|------|
| ODS tasks | `tasks/ods/ods_tasks.py` | Delete the `OdsTaskSpec` entry + remove from the default sequence | 1.2, 1.3 |
| Task registry | `orchestration/task_registry.py` | Delete the `ODS_ASSISTANT_ABOLISH` registration | 1.1 |
| DWD load | `tasks/dwd/dwd_load_task.py` | Delete the FACT_MAPPINGS and TABLE_MAP entries | 2.1–2.4 |
| DWS daily | `tasks/dws/assistant_daily_task.py` | Delete `_extract_trash_records` and `_build_trash_index`; simplify the `_aggregate_by_assistant_date` signature | 3.1–3.4 |
| DWD verify | `tasks/verification/dwd_verifier.py` | Delete the abolish tables' ID and time-field mappings | 4.1–4.4 |
| DDL | `db/etl_feiqiu/schemas/dwd.sql` | Delete CREATE TABLE + COMMENT for `dwd_assistant_trash_event` / `_ex` | 6.1–6.2 |
| DDL | `db/etl_feiqiu/schemas/ods.sql` | Delete CREATE TABLE + COMMENT for `assistant_cancellation_records` | 6.3 |
| DDL | `db/etl_feiqiu/schemas/schema_dwd_doc.sql` | Delete CREATE TABLE + COMMENT for the abolish tables | 6.4 |
| DDL | `db/etl_feiqiu/schemas/schema_ODS_doc.sql` | Delete CREATE TABLE + COMMENT for the abolish tables | 6.5 |
| DDL | `db/etl_feiqiu/schemas/dws.sql` | Update the `dws_assistant_daily_detail` comment | 6.6 |
| DDL | `db/etl_feiqiu/schemas/schema_dws.sql` | Update the `dws_assistant_daily_detail` comment | 6.7 |
| Migration | `db/etl_feiqiu/migrations/` | Add a DROP TABLE migration script | 5.1–5.5 |
| Property test | `tests/test_property_1_fact_mappings.py` | Delete `_REQ3_EXPECTED` and related references | 7.1–7.3 |
| Ops | `scripts/ops/dataflow_analyzer.py` | Delete the `ODS_ASSISTANT_ABOLISH` spec entry | 8.1 |
| Ops | `scripts/ops/gen_full_dataflow_doc.py` | Delete the `ODS_ASSISTANT_ABOLISH` spec entry | 8.1 |
| Ops | `scripts/ops/etl_consistency_check.py` | Delete abolish-related mappings | 8.1–8.2 |
| Ops | `scripts/ops/blackbox_test_report.py` | Delete abolish-related mappings | 8.1–8.4 |
| Ops | `scripts/ops/field_audit.py` | Delete abolish-table audit entries | 8.3–8.4 |
| Ops | `scripts/ops/gen_field_review_doc.py` | Delete abolish-table field definitions | 8.3–8.4 |
| Ops | `scripts/ops/gen_api_field_mapping.py` | Remove from the ODS_TABLES list | 8.3 |
| Ops | `scripts/ops/export_dwd_field_review.py` | Delete abolish-table entries | 8.4 |
| Ops | `scripts/ops/check_ods_latest_indexes.py` | Delete the abolish-table index check | 8.3 |
### Files needing no change (confirmed safe)

| File | Reason |
|------|------|
| `dwd_assistant_service_log_ex` table DDL | Keeps `is_trash` and related fields (Req 9) |
| `ods.assistant_service_records` table DDL | Keeps `is_trash` and related fields (Req 9) |
| `assistant_monthly_task.py` | Only consumes `trashed_seconds`/`trashed_count` from `dws_assistant_daily_detail`; no direct reference to the abolish tables |
| `assistant_salary_task.py` | Only consumes `dws_assistant_monthly_summary`; no direct reference to the abolish tables |
## Data Model

### Tables to drop

```sql
-- ODS layer
ods.assistant_cancellation_records   -- 78 rows

-- DWD layer
dwd.dwd_assistant_trash_event        -- main table
dwd.dwd_assistant_trash_event_ex     -- extension table
```

### Abolish-related fields to keep

```sql
-- In ods.assistant_service_records (kept)
is_trash             INT           -- abolish flag
trash_reason         TEXT          -- abolish reason
trash_applicant_id   BIGINT        -- abolish applicant ID
trash_applicant_name TEXT          -- abolish applicant name

-- In dwd.dwd_assistant_service_log_ex (kept)
is_trash             INTEGER       -- abolish flag
trash_applicant_id   BIGINT        -- abolish applicant ID
trash_applicant_name VARCHAR(64)   -- abolish applicant name
trash_reason         VARCHAR(255)  -- abolish reason
```

### DWS-layer fields (kept; data-source note to be updated)

```sql
-- dws.dws_assistant_daily_detail (kept; comment needs updating)
trashed_seconds INTEGER  -- source: dwd_assistant_service_log_ex.is_trash + income_seconds
trashed_count   INTEGER  -- source: count over dwd_assistant_service_log_ex.is_trash

-- dws.dws_assistant_monthly_summary (kept)
trashed_hours NUMERIC(10,2)  -- rolled up from daily_detail.trashed_seconds
```
## DWS Refactoring Details

### assistant_daily_task.py changes

**Methods to delete:**
- `_extract_trash_records()` — the SQL query against `dwd.dwd_assistant_trash_event`; no remaining consumers
- `_build_trash_index()` — builds the abolish index; no longer part of the decision logic

**Methods to modify:**
- `extract()` — remove the `_extract_trash_records` call and the `trash_records` variable
- `transform()`, or wherever `_aggregate_by_assistant_date` is called — stop passing the `trash_index` argument
- `_aggregate_by_assistant_date()` — remove the `trash_index` parameter from the signature; the `is_trash` decision logic stays unchanged

**Unchanged logic:**

```python
# This logic stays as-is -- abolish status is decided by the is_trash field
is_trashed = bool(record.get('is_trash', 0))
if is_trashed:
    agg['trashed_seconds'] += income_seconds
    agg['trashed_count'] += 1
```
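In context, the unchanged snippet above splits each record's `income_seconds` by its `is_trash` flag. A dependency-free sketch of the surrounding aggregation — `aggregate()` is an illustrative stand-in for `_aggregate_by_assistant_date`, not its real code:

```python
# Split income_seconds into trashed vs. normal buckets per the is_trash flag.
def aggregate(records):
    agg = {"trashed_seconds": 0, "trashed_count": 0,
           "total_seconds": 0, "total_service_count": 0}
    for record in records:
        income_seconds = record.get("income_seconds", 0)
        if bool(record.get("is_trash", 0)):   # the unchanged decision logic
            agg["trashed_seconds"] += income_seconds
            agg["trashed_count"] += 1
        else:
            agg["total_seconds"] += income_seconds
            agg["total_service_count"] += 1
    return agg

print(aggregate([{"is_trash": 1, "income_seconds": 600},
                 {"is_trash": 0, "income_seconds": 1800}]))
```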
## Correctness Properties

*A correctness property is a characteristic or behavior that should hold across all valid executions of the system — essentially a formal statement about what the system should do. Properties bridge human-readable specifications and machine-checkable correctness guarantees.*

### Property 1: Abolish aggregation correctness (is_trash driven)

*For any* collection of service records, where each record carries an `is_trash` flag and an `income_seconds` value, after aggregation:
- `trashed_seconds` equals the sum of `income_seconds` over all records with `is_trash=1`
- `trashed_count` equals the number of records with `is_trash=1`
- `total_service_count` equals the number of records with `is_trash=0`
- `total_seconds` equals the sum of `income_seconds` over all records with `is_trash=0`

**Validates: Requirements 3.4, 9.3**

### Property 2: FACT_MAPPINGS consistency (regression check of the existing property test)

*For any* table name in FACT_MAPPINGS, that table must have a corresponding ODS source-table mapping in TABLE_MAP, and every mapped DWD column name must be a valid SQL identifier.

**Validates: Requirements 2.1–2.4, 7.3**

> Note: this property is already implemented by `tests/test_property_1_fact_mappings.py`. After cleanup, that test must still pass.
## Error Handling

### Migration-script safety

- Every `DROP TABLE` statement uses `IF EXISTS`, so the script is idempotent
- The migration runs in a single transaction and rolls back automatically on failure
- The migration includes comments explaining why the objects are removed, for audit traceability

### Code-deletion safety

- Before deleting `_extract_trash_records` and `_build_trash_index`, confirm there are no other callers
- After removing the `trash_index` parameter from `_aggregate_by_assistant_date`, confirm every call site is updated in lockstep
- Keep the `is_trash` decision logic unchanged so abolish statistics keep working

### Rollback strategy

- DDL changes are managed through migration scripts and can be rolled back with a reverse migration (CREATE TABLE)
- Code changes are rolled back through Git
- ODS table data can optionally be backed up before the drop (small volume, only 78 rows)
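The migration-safety rules above (idempotent `IF EXISTS`, one transaction, an explanatory comment) can be pinned down by generating the script programmatically. `make_migration()` is a hypothetical helper; only the emitted SQL shape comes from this section:

```python
# Generate an idempotent, single-transaction DROP migration.
TABLES = [
    "ods.assistant_cancellation_records",
    "dwd.dwd_assistant_trash_event",
    "dwd.dwd_assistant_trash_event_ex",
]

def make_migration(tables, reason):
    lines = [f"-- {reason}", "BEGIN;"]                       # audit comment + one transaction
    lines += [f"DROP TABLE IF EXISTS {t};" for t in tables]  # idempotent drops
    lines.append("COMMIT;")
    return "\n".join(lines)

sql = make_migration(TABLES, "abolish chain removed; is_trash is the single source")
print(sql)
```

Wrapping the drops in `BEGIN`/`COMMIT` gives the automatic-rollback behavior described above: if any statement fails, the whole migration aborts.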
## Test Strategy

### Property tests

- Use the `hypothesis` library for property testing
- Run each property test for at least 100 iterations
- Annotate each test with the design property number it covers

**Property 1 test plan:**
- Generate random lists of service records (with random `is_trash` flags and `income_seconds` values)
- Call `_aggregate_by_assistant_date`
- Verify `trashed_seconds`/`trashed_count` match manually computed expectations
- Tag: `Feature: assistant-abolish-cleanup, Property 1: abolish aggregation correctness`

**Property 2 test plan:**
- Already covered by `tests/test_property_1_fact_mappings.py`
- After cleanup, run `pytest tests/ -v` to confirm no regressions
- Tag: `Feature: assistant-abolish-cleanup, Property 2: FACT_MAPPINGS consistency`

### Unit tests

- Verify `AssistantDailyTask` no longer has `_extract_trash_records` or `_build_trash_index` methods
- Verify the `_aggregate_by_assistant_date` signature has no `trash_index` parameter
- Verify FACT_MAPPINGS contains no abolish-table entries
- Verify TABLE_MAP contains no abolish-table mappings
- Verify the DwdVerifier configuration contains no abolish tables

### Integration checks

- Run the existing property-test suite: `pytest tests/ -v`
- Run the ETL unit tests: `cd apps/etl/connectors/feiqiu && pytest tests/unit`
- Confirm all tests pass with no regressions
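The plan above uses the `hypothesis` library; the following dependency-free sketch checks the same Property 1 invariant with `random` data so it runs anywhere. `aggregate()` is an illustrative stand-in for `_aggregate_by_assistant_date`, not the task's real code:

```python
# Randomized check of Property 1: trashed vs. total buckets always
# partition the input by is_trash, and the sums match a manual computation.
import random

def aggregate(records):
    trashed = [r for r in records if r["is_trash"] == 1]
    kept = [r for r in records if r["is_trash"] == 0]
    return {
        "trashed_seconds": sum(r["income_seconds"] for r in trashed),
        "trashed_count": len(trashed),
        "total_seconds": sum(r["income_seconds"] for r in kept),
        "total_service_count": len(kept),
    }

rng = random.Random(0)
for _ in range(100):  # at least 100 iterations, as the plan requires
    records = [{"is_trash": rng.randint(0, 1),
                "income_seconds": rng.randint(0, 7200)}
               for _ in range(rng.randint(0, 30))]
    agg = aggregate(records)
    expected = sum(r["income_seconds"] for r in records if r["is_trash"] == 1)
    assert agg["trashed_seconds"] == expected
    assert agg["trashed_count"] + agg["total_service_count"] == len(records)
print("Property 1 holds on 100 random samples")
```

With `hypothesis`, the generator loop would be replaced by `@given(st.lists(...))`, which also shrinks counterexamples automatically.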
127 .kiro/specs/assistant-abolish-cleanup/requirements.md Normal file
@@ -0,0 +1,127 @@
# Requirements Doc: Assistant Abolish Full-Chain Cleanup

## Introduction

The upstream SaaS system exposes a standalone "assistant abolish records" API (`/AssistantPerformance/GetAbolitionAssistant`),
and the ETL system built a full ODS → DWD → DWS chain for it. Investigation found:

1. **The abolish table `dwd_assistant_trash_event` cannot be joined 1:1 with the service-record table `dwd_assistant_service_log`** — the abolish table has no `assistant_service_id` foreign key, and the two IDs come from different sources.
2. **The DWS layer already uses the `dwd_assistant_service_log_ex.is_trash` field** (from the `assistant_service_records` API) to decide directly whether a service was abolished, no longer relying on cross-table matching against the abolish table.
3. The abolish table's `_extract_trash_records` and `_build_trash_index` are still called, but `trash_index` no longer participates in the abolish decision (it exists only "for reference") — dead code.
4. The data source for DWS fields such as `trashed_seconds` / `trashed_count` has switched from the abolish table to the service records' own `income_seconds`; the abolish-table data is no longer consumed.

The whole standalone abolish chain (API fetch → ODS table → DWD tables → DWS references) can therefore be removed safely,
while keeping the existing `is_trash` / `trash_reason` / `trash_applicant_*` fields on `assistant_service_records` as the single data source for abolish decisions.

## Glossary

- **ETL_System**: the Feiqiu ETL connector (`apps/etl/connectors/feiqiu/`)
- **ODS_Layer**: raw data layer (`ods` schema), storing raw records fetched from the upstream API
- **DWD_Layer**: detail data layer (`dwd` schema), storing cleaned fact and dimension tables
- **DWS_Layer**: summary data layer (`dws` schema), storing tables aggregated at business granularity
- **Abolish_Chain**: the standalone assistant-abolish chain — the `ODS_ASSISTANT_ABOLISH` task, the `ods.assistant_cancellation_records` table, the `dwd.dwd_assistant_trash_event` / `_ex` tables, and the DWS-layer code referencing them
- **Service_Trash_Fields**: the abolish-flag fields carried by the `assistant_service_records` API itself (`is_trash`, `trash_reason`, `trash_applicant_id`, `trash_applicant_name`), already mapped into the `dwd_assistant_service_log_ex` table
- **FACT_MAPPINGS**: the ODS → DWD field-mapping dictionary defined in `dwd_load_task.py`
- **Task_Registry**: the task registry in `orchestration/task_registry.py`
## Requirements

### Requirement 1: Remove the ODS-layer abolish task

**User story:** As an ETL maintainer, I want the unused ODS fetch task removed, to cut useless API calls and maintenance burden.

#### Acceptance criteria

1. WHEN ETL_System runs its schedule, THE Task_Registry SHALL NOT contain the `ODS_ASSISTANT_ABOLISH` task registration
2. WHEN ETL_System loads ODS task definitions, THE ETL_System SHALL NOT contain the `OdsTaskSpec` definition for `OdsAssistantAbolishTask`
3. WHEN ETL_System builds the default execution sequence, THE ETL_System SHALL NOT include the `ODS_ASSISTANT_ABOLISH` task code

### Requirement 2: Remove the DWD-layer abolish-table mappings

**User story:** As an ETL maintainer, I want the abolish tables' FACT_MAPPINGS and DWD load configuration removed, to eliminate dead code.

#### Acceptance criteria

1. WHEN DWD_Layer performs a load, THE FACT_MAPPINGS SHALL NOT contain a mapping entry for `dwd.dwd_assistant_trash_event`
2. WHEN DWD_Layer performs a load, THE FACT_MAPPINGS SHALL NOT contain a mapping entry for `dwd.dwd_assistant_trash_event_ex`
3. WHEN DWD_Layer builds the ODS→DWD table map, THE ETL_System SHALL NOT contain a mapping from `dwd.dwd_assistant_trash_event` to `ods.assistant_cancellation_records`
4. WHEN DWD_Layer builds the ODS→DWD table map, THE ETL_System SHALL NOT contain a mapping from `dwd.dwd_assistant_trash_event_ex` to `ods.assistant_cancellation_records`
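These acceptance criteria reduce to a set-membership check over the mapping dicts. An illustrative unit check — the two small dicts below are stand-ins for the real ones in `dwd_load_task.py`:

```python
# Assert the abolish tables are absent from both mapping structures.
FACT_MAPPINGS = {
    "dwd.dwd_assistant_service_log_ex": [("is_trash", "is_trash", "int")],
}
TABLE_MAP = {
    "dwd.dwd_assistant_service_log_ex": "ods.assistant_service_records",
}
ABOLISH_TABLES = {"dwd.dwd_assistant_trash_event",
                  "dwd.dwd_assistant_trash_event_ex"}

def check_cleanup(fact_mappings, table_map):
    """Return any abolish tables still referenced; empty means clean."""
    leftovers = ABOLISH_TABLES & (set(fact_mappings) | set(table_map))
    return sorted(leftovers)

print(check_cleanup(FACT_MAPPINGS, TABLE_MAP))  # []
```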
### Requirement 3: Clean up DWS-layer references to the abolish table

**User story:** As an ETL maintainer, I want the DWS task's abolish-table query and index-building code removed, to eliminate the dead code path.

#### Acceptance criteria

1. WHEN DWS_Layer runs the `DWS_ASSISTANT_DAILY` task, THE AssistantDailyTask SHALL NOT call the `_extract_trash_records` method
2. WHEN DWS_Layer runs the `DWS_ASSISTANT_DAILY` task, THE AssistantDailyTask SHALL NOT call the `_build_trash_index` method
3. WHEN DWS_Layer runs the `DWS_ASSISTANT_DAILY` task, THE AssistantDailyTask SHALL NOT pass a `trash_index` argument to `_aggregate_by_assistant_date`
4. WHEN DWS_Layer aggregates service records, THE AssistantDailyTask SHALL decide whether a service was abolished solely via the `is_trash` field (from the `dwd_assistant_service_log_ex` JOIN)

### Requirement 4: Clean up the DWD verifier configuration

**User story:** As an ETL maintainer, I want the verifier's references to the abolish tables removed, so it does not try to verify tables that no longer exist.

#### Acceptance criteria

1. WHEN DWD_Layer runs data verification, THE DwdVerifier SHALL NOT contain the ID mapping configuration for `dwd_assistant_trash_event`
2. WHEN DWD_Layer runs data verification, THE DwdVerifier SHALL NOT contain the ID mapping configuration for `dwd_assistant_trash_event_ex`
3. WHEN DWD_Layer runs data verification, THE DwdVerifier SHALL NOT contain the time-field mapping configuration for `dwd_assistant_trash_event`
4. WHEN DWD_Layer runs data verification, THE DwdVerifier SHALL NOT contain the time-field mapping configuration for `dwd_assistant_trash_event_ex`
### Requirement 5: Create the database migration script

**User story:** As a database administrator, I want abolish-related database objects removed safely through a migration script, to keep the schema clean.

#### Acceptance criteria

1. WHEN the migration script runs, THE migration script SHALL drop the `dwd.dwd_assistant_trash_event` table
2. WHEN the migration script runs, THE migration script SHALL drop the `dwd.dwd_assistant_trash_event_ex` table
3. WHEN the migration script runs, THE migration script SHALL drop the `ods.assistant_cancellation_records` table
4. WHEN the migration script runs, THE migration script SHALL use `IF EXISTS` on each DROP so repeated runs do not error
5. WHEN the migration script runs, THE migration script SHALL include comments explaining why the objects are removed

### Requirement 6: Keep the DDL docs in sync

**User story:** As an ETL maintainer, I want the DDL schema files to match the actual database structure.

#### Acceptance criteria

1. WHEN the DDL files are updated, THE `db/etl_feiqiu/schemas/dwd.sql` SHALL NOT contain the CREATE TABLE statement for `dwd_assistant_trash_event`
2. WHEN the DDL files are updated, THE `db/etl_feiqiu/schemas/dwd.sql` SHALL NOT contain the CREATE TABLE statement for `dwd_assistant_trash_event_ex`
3. WHEN the DDL files are updated, THE `db/etl_feiqiu/schemas/ods.sql` SHALL NOT contain the CREATE TABLE statement for `assistant_cancellation_records`
4. WHEN the DDL files are updated, THE `db/etl_feiqiu/schemas/schema_dwd_doc.sql` SHALL NOT contain CREATE TABLE or COMMENT statements related to `dwd_assistant_trash_event`
5. WHEN the DDL files are updated, THE `db/etl_feiqiu/schemas/schema_ODS_doc.sql` SHALL NOT contain CREATE TABLE or COMMENT statements related to `assistant_cancellation_records`
6. WHEN the DDL files are updated, THE `dws_assistant_daily_detail` comment in `db/etl_feiqiu/schemas/dws.sql` SHALL no longer cite `dwd_assistant_trash_event` as a data source
7. WHEN the DDL files are updated, THE `dws_assistant_daily_detail` comment in `db/etl_feiqiu/schemas/schema_dws.sql` SHALL no longer cite `dwd_assistant_trash_event` as a data source
### Requirement 7: Update the property tests

**User story:** As a developer, I want the property tests to reflect the post-cleanup state, so the tests stay accurate.

#### Acceptance criteria

1. WHEN the property tests run, THE `test_property_1_fact_mappings.py` SHALL NOT list `dwd.dwd_assistant_trash_event` among the class-A tables
2. WHEN the property tests run, THE `test_property_1_fact_mappings.py` SHALL NOT contain the `assistant_cancellation_records → dwd_assistant_trash_event` mapping expectation (`_REQ3_EXPECTED`)
3. WHEN the property tests have run, THE existing property tests SHALL all pass (no regressions)

### Requirement 8: Update ops-script references

**User story:** As an operator, I want ops scripts to stop referencing removed tables and tasks, to avoid script failures.

#### Acceptance criteria

1. WHEN an ops script loads the ODS task map, THE script SHALL NOT contain the `ODS_ASSISTANT_ABOLISH` → `assistant_cancellation_records` mapping
2. WHEN an ops script loads the DWD table map, THE script SHALL NOT contain the `dwd.dwd_assistant_trash_event` → `ods.assistant_cancellation_records` mapping
3. WHEN an ops script enumerates ODS tables, THE script SHALL NOT include `assistant_cancellation_records`
4. WHEN an ops script enumerates DWD tables, THE script SHALL NOT include `dwd_assistant_trash_event` or `dwd_assistant_trash_event_ex`

### Requirement 9: Keep Service_Trash_Fields untouched

**User story:** As an ETL maintainer, I want confirmation that the cleanup does not affect the existing abolish-flag fields on `assistant_service_records`.

#### Acceptance criteria

1. WHILE the cleanup runs, THE `dwd_assistant_service_log_ex` table SHALL keep the `is_trash`, `trash_reason`, `trash_applicant_id`, `trash_applicant_name` fields unchanged
2. WHILE the cleanup runs, THE `ods.assistant_service_records` table SHALL keep the `is_trash`, `trash_reason`, `trash_applicant_id`, `trash_applicant_name` fields unchanged
3. WHILE the cleanup runs, THE `is_trash`-based abolish logic in AssistantDailyTask SHALL keep working normally
99 .kiro/specs/assistant-abolish-cleanup/tasks.md Normal file
@@ -0,0 +1,99 @@
# Implementation Plan: Assistant Abolish Full-Chain Cleanup

## Overview

Clean up the abolish chain in reverse data-flow order (DWS → DWD → ODS) so every step is verifiable. Clean code references first, then DDL and database objects, and finally ops scripts and tests.

## Tasks

- [x] 1. Clean up DWS-layer dead code
  - [x] 1.1 In `assistant_daily_task.py`, delete the `_extract_trash_records` and `_build_trash_index` methods, remove the `_extract_trash_records` call and the `trash_records` variable from `extract()`, remove the `trash_index` parameter from the `_aggregate_by_assistant_date` signature, and update all call sites in lockstep. Keep the `is_trash` decision logic unchanged.
    - Update the file-header docstring to drop the `dwd_assistant_trash_event` data-source reference
    - _Requirements: 3.1, 3.2, 3.3, 3.4_

  - [x] 1.2 Write a property test for abolish aggregation correctness
    - **Property 1: abolish aggregation correctness (is_trash driven)**
    - Generate random lists of service records; verify `trashed_seconds`/`trashed_count` match a manual computation over the `is_trash=1` records
    - **Validates: Requirements 3.4, 9.3**
- [x] 2. Clean up DWD-layer mappings and the verifier
  - [x] 2.1 Delete the `dwd.dwd_assistant_trash_event` and `dwd.dwd_assistant_trash_event_ex` entries from `FACT_MAPPINGS` in `dwd_load_task.py`, and the corresponding ODS→DWD mappings from `TABLE_MAP`
    - _Requirements: 2.1, 2.2, 2.3, 2.4_

  - [x] 2.2 Delete the ID and time-field mapping configuration for `dwd_assistant_trash_event` and `dwd_assistant_trash_event_ex` from `dwd_verifier.py`
    - _Requirements: 4.1, 4.2, 4.3, 4.4_

- [x] 3. Clean up ODS-layer task definitions
  - [x] 3.1 Delete the `OdsTaskSpec` definition for `ODS_ASSISTANT_ABOLISH` from `ods_tasks.py` and remove the task code from the default execution sequence
    - _Requirements: 1.2, 1.3_

  - [x] 3.2 Delete the `ODS_ASSISTANT_ABOLISH` registration from `task_registry.py` (if registered separately)
    - _Requirements: 1.1_

- [x] 4. Checkpoint — make sure the ETL unit tests pass
  - Run `cd apps/etl/connectors/feiqiu && pytest tests/unit` and ensure all tests pass; ask the user if anything fails.

- [x] 5. Update the property tests
  - [x] 5.1 Delete the `dwd.dwd_assistant_trash_event` entry from the class-A table list in `tests/test_property_1_fact_mappings.py`, and delete the `_REQ3_EXPECTED` mapping expectation and its references in parametrized tests
    - _Requirements: 7.1, 7.2_
- [x] 6. Create the database migration script
  - [x] 6.1 Under `db/etl_feiqiu/migrations/`, create `2026-02-22__drop_assistant_abolish_tables.sql` with `DROP TABLE IF EXISTS` statements for `ods.assistant_cancellation_records`, `dwd.dwd_assistant_trash_event`, and `dwd.dwd_assistant_trash_event_ex`, plus drops for related indexes (e.g. `idx_ods_assistant_cancellation_records_latest`), with comments explaining the removal
    - _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5_

- [x] 7. Update the DDL schema files
  - [x] 7.1 Delete the CREATE TABLE and COMMENT statements for `dwd_assistant_trash_event` and `dwd_assistant_trash_event_ex` from `db/etl_feiqiu/schemas/dwd.sql`
    - _Requirements: 6.1, 6.2_

  - [x] 7.2 Delete the CREATE TABLE and COMMENT statements for `assistant_cancellation_records` from `db/etl_feiqiu/schemas/ods.sql`
    - _Requirements: 6.3_

  - [x] 7.3 Delete the CREATE TABLE and COMMENT statements for `dwd_assistant_trash_event` and `dwd_assistant_trash_event_ex` from `db/etl_feiqiu/schemas/schema_dwd_doc.sql`
    - _Requirements: 6.4_

  - [x] 7.4 Delete the CREATE TABLE and COMMENT statements for `assistant_cancellation_records` from `db/etl_feiqiu/schemas/schema_ODS_doc.sql`
    - _Requirements: 6.5_

  - [x] 7.5 Update the `dws_assistant_daily_detail` comments in `db/etl_feiqiu/schemas/dws.sql` and `db/etl_feiqiu/schemas/schema_dws.sql`, changing the data source from `dwd_assistant_trash_event` to `dwd_assistant_service_log_ex.is_trash`
    - _Requirements: 6.6, 6.7_
- [x] 8. Update the ops scripts
  - [x] 8.1 Delete the `ODS_ASSISTANT_ABOLISH` spec entry from `scripts/ops/dataflow_analyzer.py`
    - _Requirements: 8.1_

  - [x] 8.2 Delete the `ODS_ASSISTANT_ABOLISH` spec entry from `scripts/ops/gen_full_dataflow_doc.py`
    - _Requirements: 8.1_

  - [x] 8.3 Delete the `ODS_ASSISTANT_ABOLISH` mapping and the `dwd.dwd_assistant_trash_event` mapping from `scripts/ops/etl_consistency_check.py`
    - _Requirements: 8.1, 8.2_

  - [x] 8.4 Delete the `assistant_cancellation_records` entry in the ODS_TABLES list, the `ODS_ASSISTANT_ABOLISH` mapping, and the `dwd.dwd_assistant_trash_event` mapping from `scripts/ops/blackbox_test_report.py`
    - _Requirements: 8.1, 8.2, 8.3, 8.4_

  - [x] 8.5 Delete the `assistant_cancellation_records` audit entry from `scripts/ops/field_audit.py`
    - _Requirements: 8.3, 8.4_

  - [x] 8.6 Delete the `assistant_cancellation_records` field-definition block from `scripts/ops/gen_field_review_doc.py`
    - _Requirements: 8.3, 8.4_

  - [x] 8.7 Delete the `assistant_cancellation_records` entry in the ODS_TABLES list from `scripts/ops/gen_api_field_mapping.py`
    - _Requirements: 8.3_

  - [x] 8.8 Delete the `dwd_assistant_trash_event` and `dwd_assistant_trash_event_ex` entries from `scripts/ops/export_dwd_field_review.py`
    - _Requirements: 8.4_

  - [x] 8.9 Delete the `idx_ods_assistant_cancellation_records_latest` index check from `scripts/ops/check_ods_latest_indexes.py`
    - _Requirements: 8.3_

- [x] 9. Final checkpoint — make sure all tests pass
  - Run `cd apps/etl/connectors/feiqiu && pytest tests/unit` and `cd C:\NeoZQYY && pytest tests/ -v`
  - Confirm all tests pass with no regressions; ask the user if anything fails.
  - _Requirements: 7.3, 9.1, 9.2, 9.3_

## Notes

- Tasks marked `*` are optional and may be skipped to speed up the MVP
- Each task references concrete requirement numbers for traceability
- Checkpoints provide incremental verification
- Property tests verify general correctness properties; unit tests verify concrete examples and edge cases
- This cleanup touches high-risk paths (`tasks/`, `orchestration/`, `db/`); run `/audit` after finishing
1 .kiro/specs/dataflow-field-completion/.config.kiro Normal file
@@ -0,0 +1 @@
{"generationMode": "requirements-first"}
447 .kiro/specs/dataflow-field-completion/design.md Normal file
@@ -0,0 +1,447 @@
# Design Doc: Dataflow Field Completion and Frontend-Backend Integration

## Overview

Based on the dataflow analysis report `dataflow_2026-02-19_190440.md`, this design covers three major tasks:

1. **Field completion**: complete field mappings for 11 ODS/DWD tables, including DDL updates, ETL loader/task code sync, and doc refinement
2. **DWS stock summaries**: add daily/weekly/monthly stock summary tables in the DWS layer, built on DWD goods_stock_summary data
3. **Frontend-backend integration**: make the ETL execution flow between admin-web and the FastAPI backend fully usable, including timing and black-box testing

Core design principles:
- **Basis for execution**: the field-completion work follows the audit conclusions in `export/SYSTEM/REPORTS/field_audit/field_review_for_user.md` (path configured via the `FIELD_AUDIT_ROOT` environment variable)
- **Confirm before adding**: for every suspected missing field, first check whether it already exists (a naming difference, already mapped to another column, or configured under a different name in FACT_MAPPINGS); add it only once confirmed missing
- All field-mapping changes go through the declarative `DwdLoadTask.FACT_MAPPINGS` configuration; the core merge logic is untouched
- New DWD tables follow the existing main/ex split (core fields → main table, extended fields → ex table)
- DDL changes run through migration scripts (`db/etl_feiqiu/migrations/`), with schema files updated in lockstep
- Guard against useless additions: add a field only when it is confirmed missing and has business value
## Architecture

### Existing ETL data-flow architecture

```mermaid
graph LR
    API[Upstream SaaS API] -->|JSON| ODS_Loader[GenericODSLoader]
    ODS_Loader -->|UPSERT| ODS[(ODS tables)]
    ODS -->|SELECT| DWD_Task[DwdLoadTask]
    DWD_Task -->|SCD2 merge| DIM[(DWD dimension tables)]
    DWD_Task -->|incremental insert| FACT[(DWD fact tables)]
```
### Field-mapping mechanism

`DwdLoadTask` uses a two-tier mapping strategy:
1. **Auto mapping**: ODS and DWD columns with identical names match automatically
2. **Explicit mapping**: `(dwd_col, ods_expr, cast_type)` triples declared in the `FACT_MAPPINGS` dictionary

This change mainly touches `FACT_MAPPINGS` and `TABLE_MAP`, plus the corresponding DDL.
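The two tiers can be sketched as a single resolution pass. `resolve_columns()` and the generated `CAST(...)` expression form are assumptions for illustration; only the `(dwd_col, ods_expr, cast_type)` triple shape comes from the text above:

```python
# Two-tier column resolution: auto-map identical names, then let explicit
# FACT_MAPPINGS triples override with a cast expression.
def resolve_columns(ods_cols, dwd_cols, explicit_triples):
    mapping = {c: c for c in dwd_cols if c in ods_cols}      # tier 1: auto
    for dwd_col, ods_expr, cast_type in explicit_triples:    # tier 2: explicit
        mapping[dwd_col] = f"CAST({ods_expr} AS {cast_type})"
    return mapping

m = resolve_columns(
    ods_cols={"site_id", "stock", "batch_stock_quantity"},
    dwd_cols={"site_id", "batch_stock_qty"},
    explicit_triples=[("batch_stock_qty", "batch_stock_quantity", "numeric")],
)
print(m)
```

Explicit triples running after auto mapping is what lets a mis-auto-mapped column be corrected declaratively, without touching the merge logic.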
### Frontend-backend integration architecture

```mermaid
graph LR
    AdminWeb[Admin Web<br/>React + Ant Design] -->|HTTP/WS| Backend[FastAPI backend]
    Backend -->|subprocess| ETL[ETL CLI]
    ETL -->|SQL| DB[(PostgreSQL)]
    Backend -->|WebSocket| AdminWeb
```
## Field Audit Conclusions (complete)

The audit is complete; detailed conclusions are in `export/SYSTEM/REPORTS/field_audit/field_review_for_user.md`.

Audit methods included: inspecting existing DWD columns, FACT_MAPPINGS, existing ODS columns, the auto mapping, API JSON samples, and verification against actual database data. The audit found 4 mapping errors, 21 fields to add, 2 DWD tables to create, and 6 fields to skip.
## Components and Interfaces

### Task 1: components touched by field completion

| Component | File path | Change type |
|------|---------|---------|
| DWD load task | `tasks/dwd/dwd_load_task.py` | Modify `FACT_MAPPINGS`, `TABLE_MAP` |
| ODS DDL | `db/etl_feiqiu/schemas/ods.sql` | Add columns (store_goods_master nested expansion) |
| DWD DDL | `db/etl_feiqiu/schemas/dwd.sql` | Add columns, create tables |
| Migration scripts | `db/etl_feiqiu/migrations/` | Add ALTER TABLE / CREATE TABLE |
| ODS loader | `loaders/ods/generic.py` | May need an extended columns list |
| BD_Manual docs | `docs/database/` | Update field descriptions |

### Task 2: components touched by frontend-backend integration

| Component | File path | Change type |
|------|---------|---------|
| Execution API | `apps/backend/app/routers/` | Debug/fix parameter passing |
| Execution page | `apps/admin-web/src/pages/TaskManager.tsx` | Debug/fix frontend logic |
| Timing module | `apps/etl/connectors/feiqiu/utils/` | Add a timer utility |
| Black-box tests | `apps/etl/connectors/feiqiu/quality/` | Add data-consistency checks |
## Data Model

### Field-completion categories

Per the audit conclusions in `field_review_for_user.md`, the changes fall into four categories:

#### 🔴 Mapping-error fixes (high priority)

| Table | Problem | Fix |
|----|------|---------|
| assistant_service_records | DWD `site_assistant_id` wrongly mapped from ODS `order_assistant_id` | Fix the mapping source + add an `order_assistant_id` column |
| store_goods_sales_records | DWD `discount_price` actually mapped from ODS `discount_money` (misleading name) | Rename the DWD column + add the real `discount_price` |
| store_goods_master | `batch_stock_qty` mapped from `stock` (wrong); `provisional_total_cost` mapped from `total_purchase_cost` (wrong) | Fix the FACT_MAPPINGS source columns |

#### Class A: add DWD columns + FACT_MAPPINGS

| Table | Fields to add | DWD target |
|----|----------|---------|
| assistant_accounts_master | 4 | dim_assistant_ex |
| assistant_service_records | 2 | dwd_assistant_service_log_ex |
| assistant_cancellation_records | 0 (mapping update only) | dwd_assistant_trash_event |
| member_balance_changes | 1 | dwd_member_balance_change_ex |
| site_tables_master | 14 | dim_table_ex |

#### Class B: FACT_MAPPINGS only (DWD columns already exist)

| Table | Note |
|----|------|
| recharge_settlements | 5 fields; DWD columns exist; both ODS and DWD data are all zeros (feature not in use) |

#### Skipped (no change needed)

| Table | Reason |
|----|------|
| tenant_goods_master | `commoditycode` is 100% redundant with `commodity_code` (brace-wrapped format); skip |
| store_goods_master (time_slot_sale) | ODS column does not exist; skip |

#### Class C: new DWD tables required

| Table | ODS fields | New DWD table | Note |
|----|----------|---------|------|
| goods_stock_summary | 14 | dwd_goods_stock_summary | First set `requires_window=True` in the ODS config and re-fetch |
| goods_stock_movements | 19 | dwd_goods_stock_movement | Fact table, incremental load by createtime |

#### Class C: new DWD tables suspected (check for alternatives first)

| Table | ODS fields | Suspected new DWD table | What to check |
|----|----------|---------------|---------|
| goods_stock_summary | 14 | dwd_goods_stock_summary | Confirm the missing DWD table is not intentional (e.g. data consumed directly at the ODS layer) |
| goods_stock_movements | 19 | dwd_goods_stock_movement | Same as above |
### Confirmed mappings (audit conclusions)

The following mappings were confirmed against actual database data:

| Field | Conclusion | Table |
|------|---------|-------|
| discount_price (store_goods_sales) | 🔴 DWD `discount_price` actually maps from ODS `discount_money`; rename + add | store_goods_sales_records |
| commoditycode (tenant_goods) | ⏭️ 100% redundant with `commodity_code`; skip | tenant_goods_master |
| site_assistant_id (assistant_service) | 🔴 DWD wrongly maps from ODS `order_assistant_id`; fix | assistant_service_records |
| recharge electricity/coupon fields | ✅ DWD columns exist; only FACT_MAPPINGS entries missing (data all zeros) | recharge_settlements |
| batch_stock_qty (store_goods) | 🔴 wrongly maps from `stock`; should map from `batch_stock_quantity` | store_goods_master |
| provisional_total_cost (store_goods) | 🔴 wrongly maps from `total_purchase_cost`; should map from `provisional_total_cost` | store_goods_master |
### New DWD table design

#### dwd_goods_stock_summary

```sql
CREATE TABLE dwd.dwd_goods_stock_summary (
    site_goods_id bigint NOT NULL,
    goods_name text,
    goods_unit text,
    goods_category_id bigint,
    goods_category_second_id bigint,
    category_name text,
    range_start_stock numeric,
    range_end_stock numeric,
    range_in numeric,
    range_out numeric,
    range_sale numeric,
    range_sale_money numeric(12,2),
    range_inventory numeric,
    current_stock numeric,
    site_id bigint,
    tenant_id bigint,
    fetched_at timestamptz,
    PRIMARY KEY (site_goods_id)
);
```

#### dwd_goods_stock_movement

```sql
CREATE TABLE dwd.dwd_goods_stock_movement (
    site_goods_stock_id bigint NOT NULL,
    tenant_id bigint,
    site_id bigint,
    site_goods_id bigint,
    goods_name text,
    goods_category_id bigint,
    goods_second_category_id bigint,
    unit text,
    price numeric(12,2),
    stock_type integer,
    change_num numeric,
    start_num numeric,
    end_num numeric,
    change_num_a numeric,
    start_num_a numeric,
    end_num_a numeric,
    remark text,
    operator_name text,
    create_time timestamptz,
    fetched_at timestamptz,
    PRIMARY KEY (site_goods_stock_id)
);
```
### recharge_settlements Mapping

Correspondence between ODS columns and DWD columns (naming conversion):

| ODS column (camel-style) | DWD column (snake_case) |
|---------------|--------------|
| plcouponsaleamount | pl_coupon_sale_amount |
| mervousalesamount | mervou_sales_amount |
| electricitymoney | electricity_money |
| realelectricitymoney | real_electricity_money |
| electricityadjustmoney | electricity_adjust_money |

These 5 fields already have column definitions in `dwd_recharge_order` but lack FACT_MAPPINGS entries; the mappings need to be added.
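Note that the correspondence above cannot be derived by a generic camel-to-snake rule (the ODS side is flattened lowercase), so the mapping is best expressed as an explicit dictionary. A minimal sketch, assuming a simple `dwd_col -> ods_col` dict shape — the real FACT_MAPPINGS structure in the project may differ:

```python
# Illustrative explicit ODS -> DWD column-name mapping for recharge_settlements.
# (Assumed shape; the project's actual FACT_MAPPINGS organization takes precedence.)
RECHARGE_FACT_MAPPINGS = {
    "pl_coupon_sale_amount": "plcouponsaleamount",
    "mervou_sales_amount": "mervousalesamount",
    "electricity_money": "electricitymoney",
    "real_electricity_money": "realelectricitymoney",
    "electricity_adjust_money": "electricityadjustmoney",
}

def map_ods_row(ods_row: dict) -> dict:
    """Project an ODS row onto DWD columns via the explicit mapping
    (missing source columns become None, i.e. NULL)."""
    return {dwd: ods_row.get(ods) for dwd, ods in RECHARGE_FACT_MAPPINGS.items()}
```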
### store_goods_master Mapping Corrections

Per the audit conclusions, this table has two mapping errors (not new fields):

| DWD column | Current (wrong) ODS source | Correct ODS column | Verification |
|--------|-------------------|------------|---------|
| `batch_stock_qty` | `stock` (current stock) | `batch_stock_quantity` (batch stock) | only 7.3% of rows equal |
| `provisional_total_cost` | `total_purchase_cost` (actual purchase cost) | `provisional_total_cost` (provisional cost) | 93.5% of rows equal, but 113 rows differ |

The `time_slot_sale` ODS column does not exist; skip it. Expanding the nested `goodsStockWarningInfo` object is out of scope for this round.
### DWS Stock Summary Table Design (Daily/Weekly/Monthly)

Per the findings in Chapter 10 of `field_review_for_user.md`, the goods_stock_summary API supports `startTime`/`endTime` parameters and returns stock summary data for the given time range. Once the ODS task configuration is changed (`requires_window=True` + `time_fields=("startTime", "endTime")`) and data is re-collected, the DWD-layer `dwd_goods_stock_summary` will hold real time-ranged data, on top of which the DWS-layer summaries can be built.

#### The Three DWS Tables

| Table | Granularity | Task code | stat_period |
|------|------|---------|-------------|
| `dws.dws_goods_stock_daily_summary` | daily | `DWS_GOODS_STOCK_DAILY` | `'daily'` |
| `dws.dws_goods_stock_weekly_summary` | weekly | `DWS_GOODS_STOCK_WEEKLY` | `'weekly'` |
| `dws.dws_goods_stock_monthly_summary` | monthly | `DWS_GOODS_STOCK_MONTHLY` | `'monthly'` |

#### DDL Design (all three tables share the same structure)

```sql
CREATE TABLE dws.dws_goods_stock_daily_summary (
    site_id                  bigint NOT NULL,
    tenant_id                bigint,
    stat_date                date NOT NULL,
    site_goods_id            bigint NOT NULL,
    goods_name               text,
    goods_unit               text,
    goods_category_id        bigint,
    goods_category_second_id bigint,
    category_name            text,
    range_start_stock        numeric,
    range_end_stock          numeric,
    range_in                 numeric,
    range_out                numeric,
    range_sale               numeric,
    range_sale_money         numeric(12,2),
    range_inventory          numeric,
    current_stock            numeric,
    stat_period              text NOT NULL DEFAULT 'daily',
    created_at               timestamptz NOT NULL DEFAULT now(),
    updated_at               timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (site_id, stat_date, site_goods_id)
);
```

The weekly and monthly tables share the same structure; only the table name and the `stat_period` default (`'weekly'` / `'monthly'`) differ.
#### Task Implementation Pattern

Inherit from `BaseDwsTask` and implement the three `extract` / `transform` / `load` phases:

- **extract**: query data from `dwd.dwd_goods_stock_summary` by time range
- **transform**: group and aggregate on `stat_date` by granularity (daily/weekly/monthly), computing each stock metric
  - daily: take DWD data as-is (`stat_date` = collection date)
  - weekly: group by ISO week; `stat_date` = the Monday of that week
  - monthly: group by calendar month; `stat_date` = the first day of the month
- **load**: `upsert` into the target table, updating on primary-key conflict
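The weekly/monthly `stat_date` normalization above can be done with the standard library alone. A minimal sketch of just the grouping-key computation — illustrative, not the project's actual transform code:

```python
from datetime import date, timedelta

def stat_date_for(day: date, period: str) -> date:
    """Normalize a summary date: daily keeps the date, weekly snaps to the
    ISO-week Monday, monthly snaps to the first day of the month."""
    if period == "daily":
        return day
    if period == "weekly":
        return day - timedelta(days=day.isoweekday() - 1)  # back to Monday
    if period == "monthly":
        return day.replace(day=1)
    raise ValueError(f"unknown period: {period}")
```

Grouping DWD rows by `(site_id, stat_date_for(row_date, period), site_goods_id)` then yields exactly the primary-key grain of the three DWS tables.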
#### Prerequisites

- Requirement 7 (the new DWD table for goods_stock_summary) must be completed first
- The ODS task configuration change (`requires_window=True`) must be completed first, with data re-collected

#### File Locations

- DDL: `db/etl_feiqiu/schemas/dws.sql`
- Migration script: `db/etl_feiqiu/migrations/{date}__create_dws_goods_stock_summary.sql`
- Task code: `apps/etl/connectors/feiqiu/tasks/dws/goods_stock_daily_task.py`, `goods_stock_weekly_task.py`, `goods_stock_monthly_task.py`
### settlement_ticket_details Full Removal Design

Completely remove all code, DDL, configuration, documentation, and data related to `settlement_ticket_details` (settlement ticket details) from the project.

#### Files/Code Locations to Remove

| Layer | File path | Content to remove |
|------|---------|---------|
| ETL task definitions | `tasks/ods/ods_tasks.py` | `OdsTaskSpec("ODS_SETTLEMENT_TICKET", ...)`, the `OdsSettlementTicketTask` class, the entry in `ENABLED_ODS_CODES`, the `ODS_TASK_CLASSES` override |
| ETL verification | `tasks/verification/dwd_verifier.py` | `settlement_ticket_details` primary-key mapping entry |
| ETL verification | `tasks/verification/ods_verifier.py` | related comments and special-case logic |
| ETL manual ingest | `tasks/utility/manual_ingest_task.py` | table mapping and configuration for `settlement_ticket_details` |
| JSON store | `utils/json_store.py` | path mapping for `/order/getordersettleticketnew` |
| ODS gap check | `scripts/check/check_ods_gaps.py` | the `_check_settlement_tickets` function and its call site |
| Black-box debugging | `scripts/debug/debug_blackbox.py` | `ODS_SETTLEMENT_TICKET` skip logic |
| DDL | `db/etl_feiqiu/schemas/ods.sql`, `schema_ODS_doc.sql` | `settlement_ticket_details` CREATE TABLE statement and comments |
| Seed data | `db/etl_feiqiu/seeds/seed_ods_tasks.sql` | `ODS_SETTLEMENT_TICKET` entry |
| Index check | `scripts/ops/check_ods_latest_indexes.py` | `idx_ods_settlement_ticket_details_latest` |
| Analysis script | `scripts/ops/gen_full_dataflow_doc.py` | ODS spec entry and special skip logic |
| Analysis script | `scripts/ops/gen_field_review_doc.py` | Chapter 12 settlement_ticket_details configuration |
| Analysis script | `scripts/ops/gen_api_field_mapping.py` | entry in the table-name list |
| Analysis script | `scripts/ops/field_audit.py` | audit configuration and special handling |
| Analysis script | `scripts/ops/export_dwd_field_review.py` | field-list configuration |
| Analysis script | `scripts/ops/dataflow_analyzer.py` | ODS spec entry and skip logic |
| Docs | `docs/database/etl_feiqiu_schema_migration.md` | index entry |
| ETL docs | `apps/etl/connectors/feiqiu/docs/etl_tasks/` | task table entries |
| Unit tests | `tests/unit/test_ods_tasks.py` | `test_ods_settlement_ticket_by_payment_relate_ids` |
#### Migration Script

```sql
-- Remove the settlement_ticket_details table and index
DROP INDEX IF EXISTS ods.idx_ods_settlement_ticket_details_latest;
DROP TABLE IF EXISTS ods.settlement_ticket_details;

-- Remove the task registration from meta.ods_task_registry
DELETE FROM meta.ods_task_registry WHERE task_code = 'ODS_SETTLEMENT_TICKET';
```
#### Notes

- Report files under `export/` (`field_audit_report.md`, `dataflow_api_ods_dwd.md`, etc.) are historical artifacts; no manual cleanup is needed — the next regeneration simply won't include the table
- Audit logs under `docs/audit/` are historical records; leave them untouched
- Temporary files under `tmp/` need no handling
## Correctness Properties

*A correctness property is a characteristic or behavior that should hold across all valid executions of a system — essentially a formal statement of what the system should do. Properties are the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: FACT_MAPPINGS field-mapping correctness

*For any* ODS table row and any configured `FACT_MAPPINGS` entry `(dwd_col, ods_expr, cast_type)`, after the DWD load task runs, the value of column `dwd_col` in the DWD target row shall equal the value extracted from the ODS row via `ods_expr` and converted per `cast_type`.

**Validates: Requirements 1.1, 1.2, 2.1, 3.1, 4.1, 5.1, 6.1, 6.2, 7.2, 8.2, 9.1, 10.3, 11.1**
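The extract → cast → write semantics of Property 1 can be stated as a small pure function. A sketch under simplifying assumptions — only three cast types shown, plain column lookup instead of SQL expressions; not the real loader implementation:

```python
from decimal import Decimal, InvalidOperation

# Illustrative subset of cast types; the real mapping layer supports more.
CASTS = {"bigint": int, "numeric": Decimal, "text": str}

def apply_mapping(ods_row: dict, dwd_col: str, ods_col: str, cast_type: str):
    """Extract and convert a value per one FACT_MAPPINGS entry;
    failed conversions fall back to None (written as NULL)."""
    raw = ods_row.get(ods_col)
    if raw is None:
        return dwd_col, None
    try:
        return dwd_col, CASTS[cast_type](raw)
    except (ValueError, InvalidOperation):
        return dwd_col, None
```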
### Property 2: FACT_MAPPINGS referential integrity

*For any* mapping entry in `FACT_MAPPINGS`, its DWD target column name must exist in the column definitions of the corresponding DWD table, and the column names referenced by its ODS source expression must exist in the column definitions of the corresponding ODS table (or be a valid SQL expression).

**Validates: Requirements 6.3**

### Property 3: TABLE_MAP coverage completeness

*For any* DWD table registered in `TABLE_MAP`, every non-SCD2 column of that table either has an explicit mapping in `FACT_MAPPINGS` or has a same-named column in the corresponding ODS table (automatic mapping).

**Validates: Requirements 7.2, 8.2**

### Property 4: Data consistency after mapping-error fixes

*For any* field whose mapping has been corrected (assistant_service_records.site_assistant_id, store_goods_sales_records.discount_price, store_goods_master.batch_stock_qty, store_goods_master.provisional_total_cost), after the fix the DWD target column's value shall equal the value extracted from the correct ODS source column, not from the previously wrong source column.

**Validates: Requirements 2.1, 4.1, 10.3**

### Property 5: ETL parameter parsing and CLI command construction correctness

*For any* valid combination of ETL execution parameters (store list, data-source mode, verification mode, time range, window splitting, force-full flag, task selection), the CLI command string built by the Backend shall contain all specified parameters, with values matching the input.

**Validates: Requirements 14.1, 14.2**

### Property 6: Data consistency check correctness

*For any* ODS row and its corresponding DWD row, the black-box test checker shall correctly identify: (a) fields present in ODS but missing in DWD, and (b) fields whose values differ between ODS and DWD.

**Validates: Requirements 16.2, 16.3**
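The check in Property 6 can be implemented as a pure function over one ODS/DWD row pair and a column map. A sketch assuming an `ods_col -> dwd_col` dict and directly comparable values (a real checker would also normalize types):

```python
def diff_rows(ods_row: dict, dwd_row: dict, col_map: dict) -> tuple:
    """Return (fields missing in DWD, fields whose values differ)."""
    missing, mismatched = [], []
    for ods_col, dwd_col in col_map.items():
        if dwd_col not in dwd_row:
            missing.append(dwd_col)
        elif dwd_row[dwd_col] != ods_row.get(ods_col):
            mismatched.append(dwd_col)
    return missing, mismatched
```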
### Property 7: Timer record completeness

*For any* sequence of ETL steps, the timer output shall contain each step's name, start time, end time, and elapsed time, with the elapsed time equal to end time minus start time.

**Validates: Requirements 15.2**

### Property 8: DWS stock-summary granularity aggregation correctness

*For any* DWD stock-summary dataset and any summary granularity (daily/weekly/monthly), the transform output of the DWS summary task shall satisfy: (a) each record's `stat_period` matches the task granularity, (b) no duplicate `(site_id, stat_date, site_goods_id)` combinations, and (c) the daily summary has at least as many records as the weekly and monthly summaries.

**Validates: Requirements 12.2, 12.3, 12.4, 12.5, 12.6**
## Error Handling

### Field-Completion Error Handling

| Scenario | Handling |
|------|---------|
| DDL migration fails | Roll back the transaction and log the error; other tables are unaffected |
| ODS column does not exist | Skip that mapping entry and log a WARNING |
| Type conversion fails | Fall back via NULLIF + CAST; failed conversions are written as NULL |
| Primary-key conflict in a new DWD table | Use an ON CONFLICT DO UPDATE strategy |

### DWS Stock-Summary Error Handling

| Scenario | Handling |
|------|---------|
| DWD source table has no data | Skip the summary and log a WARNING |
| Incomplete data across week/month boundaries | Summarize over the data that exists; do not zero-fill |
| Upsert primary-key conflict | Use ON CONFLICT DO UPDATE to update the existing record |
| DWD table not yet created (prerequisite unmet) | Raise an explicit error indicating Requirement 7 must be completed first |

### Frontend/Backend Integration Error Handling

| Scenario | Handling |
|------|---------|
| Parameter validation fails | Return a 422 status code with detailed error information |
| ETL subprocess times out | Set a timeout threshold; on timeout, terminate the process and return an error |
| WebSocket disconnects | Frontend auto-reconnects; backend buffers recent logs |
| Black-box test finds inconsistencies | Record difference details in the report without interrupting the flow |
## Testing Strategy

### Property Tests

Use the `hypothesis` library for property testing, running at least 100 iterations per property.

- **Properties 1-3**: use FakeDB to simulate ODS/DWD table structures, generate random ODS rows, and verify the FACT_MAPPINGS mapping logic
- **Property 4**: for the corrected mapping fields, verify that DWD values come from the correct ODS source columns
- **Property 5**: generate random parameter combinations and verify CLI command construction
- **Property 6**: generate random ODS/DWD row pairs and verify the consistency-check logic
- **Property 7**: generate random step sequences and verify the timer output
- **Property 8**: generate random DWD stock data and verify aggregation correctness across the daily/weekly/monthly granularities

Test label format: `Feature: dataflow-field-completion, Property N: {property_text}`
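Taking Property 8(b) as an example, "no duplicate `(site_id, stat_date, site_goods_id)`" can be written as a pure assertion function that a hypothesis test then feeds with randomly generated transform outputs. A standalone sketch, independent of the real task code:

```python
def assert_unique_grain(rows: list) -> None:
    """Assert summary output has no duplicates at the
    (site_id, stat_date, site_goods_id) grain."""
    seen = set()
    for r in rows:
        key = (r["site_id"], r["stat_date"], r["site_goods_id"])
        assert key not in seen, f"duplicate grain: {key}"
        seen.add(key)
```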
### Unit Tests

- Syntactic correctness of DDL migration scripts (SQL parsing)
- Verification of the concrete mapping values in each table's FACT_MAPPINGS entries
- Boundary-value tests for the DWS stock-summary tasks (cross-week/cross-month data, empty datasets)
- Boundary-value tests for the frontend parameter form
- Timer precision tests

### Integration Tests

- End-to-end ETL execution: the full flow from API JSON to DWD persistence
- Frontend/backend integration: the full flow from Admin Web trigger to ETL completion
- Black-box testing: full-volume data consistency verification

### Test Tooling

- ETL unit tests use the FakeDB/FakeAPI provided by `tests/unit/task_test_utils.py`
- Property tests use the `hypothesis` library
- Backend tests use `pytest` + FastAPI TestClient
New file: `.kiro/specs/dataflow-field-completion/requirements.md` (228 lines)
# Requirements: Dataflow Field Completion and Frontend/Backend Integration

## Introduction

Based on the dataflow analysis report `dataflow_2026-02-19_190440.md`, this feature completes three major tasks:
1. Complete the missing field mappings across 11 ODS/DWD tables (including DDL updates, ETL loader/task code synchronization, and documentation refinement)
2. Create new stock-summary tables in the DWS layer, supporting daily/weekly/monthly stock aggregation
3. Integrate the admin console (admin-web) frontend and backend, ensuring the full ETL flow can be correctly triggered and executed from the web UI

## Glossary

- **ETL_System**: the Feiqiu-connector ETL system (`apps/etl/connectors/feiqiu/`), which extracts data from upstream APIs and processes it through the ODS→DWD→DWS layers
- **ODS**: Operational Data Store, the raw-data layer, preserving the original fields returned by the API
- **DWD**: Data Warehouse Detail, the detail layer of cleaned, standardized business fields
- **DDL**: Data Definition Language, the database schema definitions (under `db/etl_feiqiu/schemas/`)
- **Loader**: ETL loaders (`loaders/`), responsible for cleaning and mapping ODS data into DWD tables
- **Task**: ETL tasks (`tasks/`), orchestrating loader execution
- **Admin_Web**: the admin console (`apps/admin-web/`), a React + Vite + Ant Design frontend
- **Backend**: the FastAPI backend (`apps/backend/`), providing ETL scheduling and data-query APIs
- **SCD2**: Slowly Changing Dimension type 2, used for tracking historical versions of dimension tables
- **BD_Manual**: the business data dictionary (`docs/database/`), documenting field semantics and mapping relationships
- **Field_Mapping**: a field-mapping relationship describing the API JSON → ODS column → DWD column correspondence
- **DWS**: Data Warehouse Summary, the summary layer of statistics aggregated by business dimensions
- **BaseDwsTask**: the DWS task base class (`tasks/dws/base_dws_task.py`), providing the extract/transform/load three-phase framework

## Basis for Execution

The field-completion portion of this requirements document is based on the following audit-conclusion document:
- `export/SYSTEM/REPORTS/field_audit/field_review_for_user.md` — per-table, per-field audit conclusions and recommended actions

## Requirements
### Requirement 1: assistant_accounts_master field completion

**User story:** As a data engineer, I want the 4 unmapped ODS fields in the assistant account master table (system_role_id, job_num, cx_unit_price, pd_unit_price) completed into the DWD layer, so that downstream analysis can use the full assistant profile data.

#### Acceptance Criteria

1. WHEN ETL_System runs the DWD load task for assistant_accounts_master, THE Loader SHALL map ODS column `system_role_id` to the corresponding column of the DWD target table `dim_assistant_ex`
2. WHEN ETL_System runs the DWD load task for assistant_accounts_master, THE Loader SHALL map ODS columns `job_num`, `cx_unit_price`, and `pd_unit_price` to the corresponding columns of the DWD target table `dim_assistant_ex`
3. WHEN new fields are added to a DWD table, THE DDL SHALL include the corresponding ALTER TABLE or CREATE statements in `db/etl_feiqiu/schemas/dwd.sql`
4. WHEN the field mapping is complete, THE BD_Manual SHALL update the corresponding field documentation, eliminating vague descriptions such as "to be completed" or "to be analyzed"

### Requirement 2: assistant_service_records field completion

**User story:** As a data engineer, I want the 3 unmapped ODS fields in the assistant service log table (site_assistant_id, operator_id, operator_name) completed into the DWD layer, so that service operator information can be tracked.

#### Acceptance Criteria

1. WHEN ETL_System runs the DWD load task for assistant_service_records, THE Loader SHALL map ODS columns `site_assistant_id`, `operator_id`, and `operator_name` to the corresponding columns of the DWD target table `dwd_assistant_service_log_ex`
2. WHEN new fields are added to a DWD table, THE DDL SHALL include the corresponding column definitions in `dwd.sql`
3. WHEN the field mapping is complete, THE BD_Manual SHALL update the corresponding field documentation

### Requirement 3: assistant_cancellation_records field completion

**User story:** As a data engineer, I want the 1 unmapped ODS field in the assistant cancellation records table (assistanton) completed into the DWD layer.

#### Acceptance Criteria

1. WHEN ETL_System runs the DWD load task for assistant_cancellation_records, THE Loader SHALL map ODS column `assistanton` to the corresponding column of the DWD target table `dwd_assistant_trash_event_ex`
2. WHEN new fields are added to a DWD table, THE DDL SHALL include the corresponding column definitions in `dwd.sql`
3. WHEN the field mapping is complete, THE BD_Manual SHALL semantically analyze the `assistanton` field and add a precise description

### Requirement 4: store_goods_sales_records field completion

**User story:** As a data engineer, I want the 1 unmapped ODS field in the store goods sales log table (discount_price) completed into the DWD layer, so that the discounted unit price can be analyzed.

#### Acceptance Criteria

1. WHEN ETL_System runs the DWD load task for store_goods_sales_records, THE Loader SHALL map ODS column `discount_price` to the corresponding column of the DWD target table `dwd_store_goods_sale_ex`
2. WHEN new fields are added to a DWD table, THE DDL SHALL include the corresponding column definitions in `dwd.sql`, typed `numeric` (monetary precision)
3. WHEN the field mapping is complete, THE BD_Manual SHALL update the corresponding field documentation

### Requirement 5: member_balance_changes field completion

**User story:** As a data engineer, I want the 1 unmapped ODS field in the member balance change table (relate_id) completed into the DWD layer, so that recharge records or orders can be correlated.

#### Acceptance Criteria

1. WHEN ETL_System runs the DWD load task for member_balance_changes, THE Loader SHALL map ODS column `relate_id` to the corresponding column of the DWD target table `dwd_member_balance_change_ex`
2. WHEN new fields are added to a DWD table, THE DDL SHALL include the corresponding column definitions in `dwd.sql`
3. WHEN the field mapping is complete, THE BD_Manual SHALL update the corresponding field documentation
### Requirement 6: recharge_settlements field completion and mapping creation

**User story:** As a data engineer, I want the 5 unmapped ODS→DWD fields in the recharge settlement table completed, and correct mappings established for the 5 DWD fields that lack ODS sources, so that electricity-fee and coupon-sales data flow through intact.

#### Acceptance Criteria

1. WHEN ETL_System runs the DWD load task for recharge_settlements, THE Loader SHALL map ODS columns `electricityadjustmoney`, `electricitymoney`, `mervousalesamount`, `plcouponsaleamount`, and `realelectricitymoney` to the corresponding columns of the DWD target table `dwd_recharge_order`
2. WHEN the DWD table has columns without an ODS source (`pl_coupon_sale_amount`, `mervou_sales_amount`, `electricity_money`, `real_electricity_money`, `electricity_adjust_money`), THE Loader SHALL establish mappings from the corresponding ODS columns to these DWD columns
3. WHEN the mappings are established, THE ETL_System SHALL ensure the naming conversion between ODS column names (camel-style) and DWD column names (snake_case) is correct
4. WHEN the field mapping is complete, THE DDL SHALL be updated in sync, and THE BD_Manual SHALL update the corresponding field documentation

### Requirement 7: goods_stock_summary new DWD table and field mapping

**User story:** As a data engineer, I want a new DWD target table created for the stock summary table, with all 14 ODS fields fully mapped, so that stock data becomes usable in the DWD layer.

#### Acceptance Criteria

1. WHEN ETL_System needs to load goods_stock_summary data into the DWD layer, THE DDL SHALL create a new DWD target table (e.g. `dwd_goods_stock_summary`) in `dwd.sql`
2. WHEN the DWD target table is created, THE Loader SHALL map all 14 ODS columns (sitegoodsid, goodsname, goodsunit, goodscategoryid, goodscategorysecondid, categoryname, rangestartstock, rangeendstock, rangein, rangeout, rangesale, rangesalemoney, rangeinventory, currentstock) to the DWD target table
3. WHEN the new table is created, THE ETL_System SHALL create the corresponding DWD loader and task code
4. WHEN the new table is created, THE BD_Manual SHALL write complete field documentation for it

### Requirement 8: goods_stock_movements new DWD table and field mapping

**User story:** As a data engineer, I want a new DWD target table created for the stock movement records table, with all 19 ODS fields fully mapped, so that stock movement details become usable in the DWD layer.

#### Acceptance Criteria

1. WHEN ETL_System needs to load goods_stock_movements data into the DWD layer, THE DDL SHALL create a new DWD target table (e.g. `dwd_goods_stock_movement`) in `dwd.sql`
2. WHEN the DWD target table is created, THE Loader SHALL map all 19 ODS columns to the DWD target table
3. WHEN the new table is created, THE ETL_System SHALL create the corresponding DWD loader and task code
4. WHEN the new table is created, THE BD_Manual SHALL write complete field documentation for it
### Requirement 9: site_tables_master field completion

**User story:** As a data engineer, I want the 14 unmapped ODS fields in the billiard-table dimension table completed into the DWD layer, so that table configuration information is fully available.

#### Acceptance Criteria

1. WHEN ETL_System runs the DWD load task for site_tables_master, THE Loader SHALL map the 14 ODS columns (sitename, appletQrCodeUrl, audit_status, charge_free, create_time, delay_lights_time, is_rest_area, light_status, only_allow_groupon, order_delay_time, self_table, tablestatusname, temporary_light_second, virtual_table) to the corresponding columns of the DWD target table `dim_table_ex`
2. WHEN new fields are added to a DWD table, THE DDL SHALL include the corresponding column definitions in `dwd.sql`
3. WHEN the field mapping is complete, THE BD_Manual SHALL update the corresponding field documentation, eliminating vague descriptions such as "to be completed" or "to be analyzed"

### Requirement 10: store_goods_master field completion and nested expansion

**User story:** As a data engineer, I want the flat unmapped fields, nested-object fields, and ODS→DWD unmapped fields in the store goods master table all completed, so that the goods master data is complete.

#### Acceptance Criteria

1. WHEN ETL_System runs the ODS load task for store_goods_master, THE Loader SHALL map the flat API field `time_slot_sale` to the corresponding ODS column
2. WHEN ETL_System runs the ODS load task for store_goods_master, THE Loader SHALL expand the 4 subfields of the nested object `goodsStockWarningInfo` (site_goods_id, sales_day, warning_day_max, warning_day_min) and map them to the corresponding ODS columns
3. WHEN ETL_System runs the DWD load task for store_goods_master, THE Loader SHALL map ODS columns `batch_stock_quantity` and `provisional_total_cost`, along with the expanded stock-warning fields, to the DWD target tables (assigned to `dim_store_goods` or `dim_store_goods_ex` by field purpose)
4. WHEN new fields are added, THE DDL SHALL update `ods.sql` and `dwd.sql` in sync
5. WHEN the field mapping is complete, THE BD_Manual SHALL update the corresponding field documentation

### Requirement 11: tenant_goods_master field completion

**User story:** As a data engineer, I want the 1 unmapped ODS field in the tenant goods master table (commoditycode) completed into the DWD layer.

#### Acceptance Criteria

1. WHEN ETL_System runs the DWD load task for tenant_goods_master, THE Loader SHALL map ODS column `commoditycode` to the corresponding column of the DWD target table `dim_tenant_goods_ex`
2. WHEN new fields are added to a DWD table, THE DDL SHALL include the corresponding column definitions in `dwd.sql`
3. WHEN the field mapping is complete, THE BD_Manual SHALL update the corresponding field documentation
### Requirement 12: DWS stock summary (daily/weekly/monthly)

**User story:** As a data analyst, I want daily, weekly, and monthly stock-summary tables in the DWS layer, so that I can analyze goods stock trends across time dimensions.

#### Acceptance Criteria

1. WHEN Requirement 7 (the new DWD table for goods_stock_summary) is complete and the ODS task configuration has been changed (`requires_window=True` + `time_fields=("startTime", "endTime")`) with data re-collected, THE ETL_System SHALL have the data foundation for building DWS stock summaries
2. WHEN ETL_System runs the DWS_GOODS_STOCK_DAILY task, THE ETL_System SHALL extract data from the DWD-layer `dwd_goods_stock_summary`, aggregate at daily granularity, and write to `dws.dws_goods_stock_daily_summary`
3. WHEN ETL_System runs the DWS_GOODS_STOCK_WEEKLY task, THE ETL_System SHALL extract data from the DWD layer, aggregate at weekly granularity, and write to `dws.dws_goods_stock_weekly_summary`
4. WHEN ETL_System runs the DWS_GOODS_STOCK_MONTHLY task, THE ETL_System SHALL extract data from the DWD layer, aggregate at monthly granularity, and write to `dws.dws_goods_stock_monthly_summary`
5. THE DWS stock-summary tables SHALL contain the following fields: site_id, tenant_id, stat_date (summary date), site_goods_id, goods_name, goods_unit, goods_category_id, goods_category_second_id, category_name (goods dimensions), range_start_stock, range_end_stock, range_in, range_out, range_sale, range_sale_money, range_inventory, current_stock (stock metrics), stat_period (granularity marker: 'daily'/'weekly'/'monthly')
6. THE DWS stock-summary tables SHALL use `(site_id, stat_date, site_goods_id)` as the primary key, enforcing uniqueness across the store, date, and goods dimensions
7. WHEN a DWS stock-summary task runs, THE ETL_System SHALL inherit from `BaseDwsTask` and implement the three `extract` / `transform` / `load` phases
8. WHEN the DWS stock-summary tables are created, THE DDL SHALL include the CREATE TABLE statements in `db/etl_feiqiu/schemas/dws.sql`, with migration scripts under `db/etl_feiqiu/migrations/`
9. WHEN the DWS stock-summary task code is created, THE ETL_System SHALL place it under `apps/etl/connectors/feiqiu/tasks/dws/`
### Requirement 17: Fully remove settlement_ticket_details

**User story:** As a data engineer, I want all code, DDL, configuration, documentation, and data related to settlement_ticket_details (settlement ticket details) completely removed from the project, to simplify maintenance and eliminate an unused dataflow.

#### Acceptance Criteria

1. WHEN the removal is complete, THE ETL_System SHALL no longer contain the `ODS_SETTLEMENT_TICKET` task code (removed from `ENABLED_ODS_CODES`, `ODS_TASK_CLASSES`, and the `OdsSettlementTicketTask` class in `ods_tasks.py`)
2. WHEN the removal is complete, THE DDL SHALL no longer contain the `ods.settlement_ticket_details` table definition (CREATE TABLE statement and comments removed from `ods.sql` / `schema_ODS_doc.sql`)
3. WHEN the removal is complete, THE ETL_System SHALL remove all settlement_ticket_details references from the following locations:
   - `tasks/ods/ods_tasks.py` (OdsTaskSpec, the OdsSettlementTicketTask class, ENABLED_ODS_CODES)
   - `tasks/verification/dwd_verifier.py`, `tasks/verification/ods_verifier.py`
   - `tasks/utility/manual_ingest_task.py`
   - `utils/json_store.py`
   - `scripts/check/check_ods_gaps.py`
   - `scripts/debug/debug_blackbox.py`
4. WHEN the removal is complete, THE ETL_System SHALL remove `ODS_SETTLEMENT_TICKET` from `db/etl_feiqiu/seeds/seed_ods_tasks.sql`
5. WHEN the removal is complete, THE BD_Manual SHALL remove the related entries from `docs/database/etl_feiqiu_schema_migration.md` and the ETL task docs
6. WHEN the removal is complete, THE ETL_System SHALL provide a migration script with `DROP TABLE IF EXISTS ods.settlement_ticket_details` and `DROP INDEX IF EXISTS ods.idx_ods_settlement_ticket_details_latest`
7. WHEN the removal is complete, THE ETL_System SHALL remove the related references from the analysis scripts under `scripts/ops/` (`gen_full_dataflow_doc.py`, `gen_field_review_doc.py`, `gen_api_field_mapping.py`, `field_audit.py`, `export_dwd_field_review.py`, `dataflow_analyzer.py`, `check_ods_latest_indexes.py`)
8. WHEN the removal is complete, THE ETL_System SHALL remove the `test_ods_settlement_ticket_by_payment_relate_ids` test from `tests/unit/test_ods_tasks.py`
### Requirement 13: Documentation refinement

**User story:** As a data engineer, I want all affected BD_Manual documents refined in detail, eliminating all vague descriptions, so that team members can accurately understand every field's meaning.

#### Acceptance Criteria

1. WHEN the documentation refinement task runs, THE BD_Manual SHALL go through every document, item by item, auditing all gaps such as "to be completed", "to be handled", "undetermined", and "undefined"
2. WHEN the documentation refinement task runs, THE BD_Manual SHALL replace coarse descriptions such as "monetary field", "related to XX", or "XXX-type" with precise field semantics
3. WHEN a field description needs refinement, THE ETL_System SHALL determine the field's meaning via manual field-name analysis, contextual inference, distinct-value/enum analysis, and code-usage analysis
4. WHEN the documentation update is complete, THE BD_Manual SHALL ensure every field description includes: field type, business meaning, value range or enum values, and usage locations in the code

### Requirement 14: Admin-Web frontend/backend integration

**User story:** As a system administrator, I want to trigger the full ETL flow from the admin web UI, so that data processing tasks can be managed visually.

#### Acceptance Criteria

1. WHEN an administrator configures ETL parameters in Admin_Web (all stores, api_full, verify-repair-only with pre-verification API fetch, custom range 2025-11-01 to 2026-02-20, 10-day window splitting, force-full, all common functions selected), THE Backend SHALL correctly receive and parse these parameters
2. WHEN Backend receives an ETL execution request, THE Backend SHALL convert the parameters into a command recognizable by ETL_System and trigger execution
3. WHEN an ETL task is running, THE Admin_Web SHALL display task status and progress in real time
4. WHEN all selected tasks finish, THE ETL_System SHALL ensure the processing results are correct (source data consistent with persisted data/fields)
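The parameter-to-command conversion described above can be sketched as a pure function. All flag names and the module path below are illustrative assumptions, not the real ETL CLI:

```python
def build_etl_command(params: dict) -> list:
    """Assemble web-form parameters into a CLI argument list.
    Flag names are hypothetical; check them against the real ETL CLI."""
    cmd = ["python", "-m", "etl.run"]                        # hypothetical entry point
    cmd += ["--stores", ",".join(params.get("stores") or []) or "all"]
    cmd += ["--source", params["source_mode"]]               # e.g. api_full
    if params.get("date_range"):
        start, end = params["date_range"]
        cmd += ["--start", start, "--end", end]
    if params.get("window_days"):
        cmd += ["--window-days", str(params["window_days"])]
    if params.get("force_full"):
        cmd.append("--force-full")
    return cmd
```

A pure function like this is also what Property 5 exercises: hypothesis can generate random parameter dicts and assert each input appears in the output.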
### Requirement 15: ETL execution timing mechanism

**User story:** As a system administrator, I want detailed timing records during ETL execution, so that per-step performance bottlenecks can be analyzed.

#### Acceptance Criteria

1. WHEN an ETL task starts, THE ETL_System SHALL start a timer recording the start time of every step and substep
2. WHEN each step finishes, THE ETL_System SHALL record that step's elapsed time (millisecond precision)
3. WHEN all tasks finish, THE ETL_System SHALL output a fine-grained timing report containing each step's name, start time, end time, and elapsed time
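The timing records required above (name, start, end, elapsed) fit a small helper. A minimal sketch using `time.perf_counter`; the real ETL_System implementation may differ:

```python
import time

class StepTimer:
    """Minimal sketch: record each step's start/end and elapsed milliseconds."""

    def __init__(self):
        self.records = []

    def run(self, name, fn):
        """Execute fn(), record its timing under the given step name,
        and pass the result through."""
        start = time.perf_counter()
        result = fn()
        end = time.perf_counter()
        self.records.append({
            "step": name,
            "start": start,
            "end": end,
            "elapsed_ms": (end - start) * 1000.0,
        })
        return result
```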
### Requirement 16: Black-box testing mechanism

**User story:** As a quality assurance engineer, I want a black-box test run after the full ETL flow completes, verifying consistency between the data source and the persisted data.

#### Acceptance Criteria

1. WHEN all ETL steps complete successfully, THE ETL_System SHALL check, from a black-box tester's perspective, whether the data source and the persisted data/fields are consistent
2. WHEN the black-box test runs, THE ETL_System SHALL compare field completeness between the API source data and the ODS persisted data
3. WHEN the black-box test runs, THE ETL_System SHALL compare mapping correctness between the ODS data and the DWD persisted data
4. WHEN the black-box test completes, THE ETL_System SHALL output a black-box test report containing per-table check results, difference details, and pass/fail status
New file: `.kiro/specs/dataflow-field-completion/tasks.md` (304 lines)
# Implementation Plan: Dataflow Field Completion and Frontend/Backend Integration

## Overview

Proceed in phases, in the order "audit and confirm first → then DDL changes → then code mappings → then remove the deprecated table → then DWS summaries → then documentation refinement → finally integration". Field completion for each table follows the "confirm before adding" principle.

Basis for execution: `export/SYSTEM/REPORTS/field_audit/field_review_for_user.md` (per-table, per-field audit conclusions)

## Tasks

- [x] 1. Field audit script and infrastructure
- [x] 1.1 Write the field audit script `scripts/ops/field_audit.py`
  - Connect to the database and run the audit flow for each target table: check existing DWD columns, current FACT_MAPPINGS, ODS columns, and automatic mappings
  - Output an audit record table (markdown), annotating each field's conclusion and recommended action
  - Cover all suspected missing fields across the 11 tables
  - _Requirements: 1.1-1.4, 2.1-2.3, 3.1-3.3, 4.1-4.3, 5.1-5.3, 6.1-6.4, 7.1-7.4, 8.1-8.4, 9.1-9.3, 10.1-10.5, 11.1-11.3_

- [x] 1.2 Run the audit script and generate the audit report
  - Run the script and review its output
  - _Requirements: 1.1-1.4_

- [x] 1.3 Per-field investigation and inference to confirm audit conclusions (performed by Kiro)
  - Conclusion document: `export/SYSTEM/REPORTS/field_audit/field_review_for_user.md`
  - Script output is only a lead, not a final conclusion
  - For each field the script marks "missing" or "misaligned", Kiro performs the following investigation, field by field:
    - Check the FACT_MAPPINGS source to confirm whether it is already mapped under another name/expression
    - Check the DWD DDL to confirm whether a column with the same semantics already exists (naming difference)
    - Check the ODS loader code to confirm whether the ODS column is actually written
    - Infer from field-naming patterns, contextual semantics, and business logic
    - Where necessary, query actual database data (SELECT DISTINCT / sampling) to assist judgment
  - Annotate a final decision per field: no change needed / mapping only / new column + mapping / skip (with reasons)
  - Record the investigation process and inference basis in the audit report for traceability
  - _Requirements: 1.1-1.4, 2.1-2.3, 3.1-3.3, 4.1-4.3, 5.1-5.3, 6.1-6.4, 7.1-7.4, 8.1-8.4, 9.1-9.3, 10.1-10.5, 11.1-11.3_

- [x] 2. Checkpoint - audit results confirmed
  - Completed; final document: `export/SYSTEM/REPORTS/field_audit/field_review_for_user.md`
- [x] 3. 🔴 Mapping-error fixes (high priority)
  - Basis: the mapping-error-fix chapter of `field_review_for_user.md`

- [x] 3.1 assistant_service_records — fix the site_assistant_id mapping error
  - Current problem: DWD `dwd_assistant_service_log.site_assistant_id` is wrongly mapped from ODS `order_assistant_id` (an order-level ID); it should map from ODS `site_assistant_id` (the assistant profile ID)
  - Fix FACT_MAPPINGS: change the ODS source of DWD `site_assistant_id` from `order_assistant_id` to `site_assistant_id`
  - Add a new DWD column `order_assistant_id` (bigint) to `dwd_assistant_service_log`, mapped from ODS `order_assistant_id`
  - Write the migration script
  - Historical data must be reloaded
  - _Requirements: 2.1, 2.2_

- [x] 3.2 store_goods_sales_records — fix the misleading discount_price column name
  - Current problem: DWD `discount_price` is actually mapped from ODS `discount_money` (discount amount), not ODS `discount_price` (discounted unit price)
  - Rename the DWD column `discount_price` to `discount_money`
  - Add a new DWD column `discount_price` (numeric), mapped from ODS `discount_price` (discounted unit price)
  - Write the migration script
  - _Requirements: 4.1, 4.2_

- [x] 3.3 store_goods_master — fix the mapping sources of batch_stock_qty and provisional_total_cost
  - Change the FACT_MAPPINGS source of `batch_stock_qty` from `stock` (current stock) to `batch_stock_quantity` (batch stock)
  - Change the FACT_MAPPINGS source of `provisional_total_cost` from `total_purchase_cost` (actual purchase cost) to `provisional_total_cost` (provisional cost)
  - Historical data must be reloaded
  - _Requirements: 10.3_
- [x] 4. Class A tables: add DWD columns + FACT_MAPPINGS
  - Basis: each table's "fields to add / map" section in `field_review_for_user.md`

- [x] 4.1 assistant_accounts_master — add 4 fields to dim_assistant_ex
  - New columns: `system_role_id` (bigint), `job_num` (text), `cx_unit_price` (numeric(18,2)), `pd_unit_price` (numeric(18,2))
  - Update `dwd.sql` and add the FACT_MAPPINGS entries
  - Write the migration script
  - _Requirements: 1.1, 1.2, 1.3_

- [x] 4.2 assistant_service_records — add 2 fields to dwd_assistant_service_log_ex
  - New columns: `operator_id` (bigint), `operator_name` (text)
  - Skip `siteprofile` (a nested jsonb column)
  - Update `dwd.sql` and add the FACT_MAPPINGS entries
  - Write the migration script
  - _Requirements: 2.1, 2.2_

- [x] 4.3 assistant_cancellation_records — update FACT_MAPPINGS
  - Update the mapping: ODS `assistanton` → DWD `dwd_assistant_trash_event.assistant_no`
  - Skip `siteprofile` (a nested jsonb column)
  - _Requirements: 3.1_

- [x] 4.4 member_balance_changes — add 1 field to dwd_member_balance_change_ex
  - New column: `relate_id` (bigint) — related business document ID
  - Update `dwd.sql` and add the FACT_MAPPINGS entry
  - Write the migration script
  - _Requirements: 5.1, 5.2_

- [x] 4.5 site_tables_master — add 14 fields to dim_table_ex
  - New columns: `create_time` (timestamptz), `light_status` (integer), `tablestatusname` (text), `sitename` (text), `appletQrCodeUrl` (text), `audit_status` (integer), `charge_free` (integer), `delay_lights_time` (integer), `is_rest_area` (integer), `only_allow_groupon` (integer), `order_delay_time` (integer), `self_table` (integer), `temporary_light_second` (integer), `virtual_table` (integer)
  - Update `dwd.sql` and add the FACT_MAPPINGS entries
  - Write the migration script
  - _Requirements: 9.1, 9.2_

- [x] 4.6 tenant_goods_master — no change needed (skip)
  - `commoditycode` is 100% redundant with `commodity_code` (brace-wrapped format); confirmed skip
  - _Requirements: 11.1 (confirmed no action needed)_

- [x] 4.7 Write property tests for Class A field mappings
  - **Property 1: FACT_MAPPINGS field-mapping correctness**
  - **Validates: Requirements 1.1, 1.2, 2.1, 3.1, 4.1, 5.1, 9.1**
- [x] 5. Class B tables: FACT_MAPPINGS additions / corrections only
  - Basis: the Class B chapter of `field_review_for_user.md`

- [x] 5.1 recharge_settlements — add 5 FACT_MAPPINGS entries
  - Mappings only (the DWD columns already exist; data is all 0 on both the ODS and DWD sides, as the business feature is unused):
    - `plcouponsaleamount → pl_coupon_sale_amount`
    - `mervousalesamount → mervou_sales_amount`
    - `electricitymoney → electricity_money`
    - `realelectricitymoney → real_electricity_money`
    - `electricityadjustmoney → electricity_adjust_money`
  - No DDL change, no migration script
  - _Requirements: 6.1, 6.2, 6.3_

- [x] 5.2 Write property tests for Class B tables
  - **Property 2: FACT_MAPPINGS referential integrity**
  - **Validates: Requirements 6.3**

- [x] 5.5. Checkpoint - mapping fixes and Class A/B tables confirmed
  - Ensure all mapping errors are fixed and the FACT_MAPPINGS, DDL, and migration scripts for the Class A/B tables are updated
  - Ask the user if questions arise.
- [x] 6. Class C tables: create new DWD tables with full mappings
  - Basis: the Class C chapter of `field_review_for_user.md`

- [x] 6.1 goods_stock_summary — change the ODS config + create a new DWD table
  - Step 1: change the ODS task config to `requires_window=True` + `time_fields=("startTime", "endTime")`
  - Step 2: re-collect historical data (batched by time window)
  - Step 3: write DDL creating `dwd.dwd_goods_stock_summary` (14 fields)
  - Step 4: register in `TABLE_MAP` and add mappings in `FACT_MAPPINGS`
  - Step 5: create the DWD loader and task code
  - Write the migration script
  - _Requirements: 7.1, 7.2, 7.3_

- [x] 6.2 goods_stock_movements — create a new DWD table
  - Write DDL creating `dwd.dwd_goods_stock_movement` (19 fields, fact table, incrementally loaded by createtime)
  - Register in `TABLE_MAP` and add mappings in `FACT_MAPPINGS` (camelCase → snake_case naming)
  - Create the DWD loader and task code
  - Write the migration script
  - _Requirements: 8.1, 8.2, 8.3_
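The camelCase → snake_case conversion in task 6.2 can be done in bulk with a small helper — a sketch, assuming the ODS side really carries camelCase names:

```python
import re

def camel_to_snake(name: str) -> str:
    """goodsCategoryId -> goods_category_id; names already in
    snake_case pass through unchanged."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()
```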
- [x] 6.3 Write property tests for Class C tables
  - **Property 3: TABLE_MAP coverage completeness**
  - **Validates: Requirements 7.2, 8.2**

- [x] 7. Checkpoint - all field completion done
  - Ensure field completion for all 11 tables is finished and all tests pass; ask the user if questions arise.
- [x] 7.3. Fully remove settlement_ticket_details
- [x] 7.3.1 Remove settlement_ticket_details from the ETL core code
  - From `tasks/ods/ods_tasks.py`, remove: `OdsTaskSpec("ODS_SETTLEMENT_TICKET", ...)`, the `OdsSettlementTicketTask` class, the entry in `ENABLED_ODS_CODES`, the `ODS_TASK_CLASSES` override
  - Remove the primary-key mapping entry from `tasks/verification/dwd_verifier.py`
  - Remove the related comments and special handling from `tasks/verification/ods_verifier.py`
  - Remove the table mapping and configuration from `tasks/utility/manual_ingest_task.py`
  - Remove the `/order/getordersettleticketnew` path mapping from `utils/json_store.py`
  - _Requirements: 17.1, 17.3_

- [x] 7.3.2 Remove the DDL and seed data; write the migration script
  - Remove the CREATE TABLE statement and comments from `db/etl_feiqiu/schemas/ods.sql` and `schema_ODS_doc.sql`
  - Remove `ODS_SETTLEMENT_TICKET` from `db/etl_feiqiu/seeds/seed_ods_tasks.sql`
  - Write the migration script: `DROP TABLE IF EXISTS ods.settlement_ticket_details`, `DROP INDEX`, `DELETE FROM meta.ods_task_registry`
  - _Requirements: 17.2, 17.4, 17.6_

- [x] 7.3.3 Remove references from analysis scripts and tools
  - Under `scripts/ops/`, remove the related references from: `gen_full_dataflow_doc.py`, `gen_field_review_doc.py`, `gen_api_field_mapping.py`, `field_audit.py`, `export_dwd_field_review.py`, `dataflow_analyzer.py`, `check_ods_latest_indexes.py`
  - Remove the `_check_settlement_tickets` function and its call site from `scripts/check/check_ods_gaps.py`
  - Remove the skip logic from `scripts/debug/debug_blackbox.py`
  - _Requirements: 17.7_

- [x] 7.3.4 Remove references from docs and tests
  - Remove the index entry from `docs/database/etl_feiqiu_schema_migration.md`
  - Remove the task table entries from `apps/etl/connectors/feiqiu/docs/etl_tasks/`
  - Remove `test_ods_settlement_ticket_by_payment_relate_ids` from `tests/unit/test_ods_tasks.py`
  - _Requirements: 17.5, 17.8_
- [x] 7.5. DWS 库存汇总(日/周/月)
|
||||
- 前置依赖:任务 6.1(goods_stock_summary DWD 表)完成、ODS 任务配置修改(`requires_window=True`)完成并重新采集数据
|
||||
|
||||
- [x] 7.5.1 编写 DWS 库存汇总 DDL 与迁移脚本
|
||||
- 在 `db/etl_feiqiu/schemas/dws.sql` 中添加三张表的建表语句:`dws_goods_stock_daily_summary`、`dws_goods_stock_weekly_summary`、`dws_goods_stock_monthly_summary`
|
||||
- 编写迁移脚本 `db/etl_feiqiu/migrations/{date}__create_dws_goods_stock_summary.sql`
|
||||
- 主键:`(site_id, stat_date, site_goods_id)`
|
||||
- 字段:site_id, tenant_id, stat_date, site_goods_id, goods_name, goods_unit, goods_category_id, goods_category_second_id, category_name, range_start_stock, range_end_stock, range_in, range_out, range_sale, range_sale_money, range_inventory, current_stock, stat_period, created_at, updated_at
|
||||
- _Requirements: 12.5, 12.6, 12.8_
|
||||
|
||||
- [x] 7.5.2 实现 DWS_GOODS_STOCK_DAILY 任务
|
||||
- 在 `apps/etl/connectors/feiqiu/tasks/dws/` 下创建 `goods_stock_daily_task.py`
|
||||
- 继承 `BaseDwsTask`,实现 `extract` / `transform` / `load` 三阶段
|
||||
- extract:从 `dwd.dwd_goods_stock_summary` 按时间范围查询
|
||||
- transform:按日粒度汇总,`stat_period='daily'`
|
||||
- load:upsert 写入 `dws.dws_goods_stock_daily_summary`
|
||||
- _Requirements: 12.2, 12.7_
|
||||
|
||||
- [x] 7.5.3 实现 DWS_GOODS_STOCK_WEEKLY 任务
|
||||
- 创建 `goods_stock_weekly_task.py`,按 ISO 周分组,`stat_date` = 周一日期,`stat_period='weekly'`
|
||||
- _Requirements: 12.3, 12.7_
|
||||
|
||||
- [x] 7.5.4 实现 DWS_GOODS_STOCK_MONTHLY 任务
|
||||
- 创建 `goods_stock_monthly_task.py`,按自然月分组,`stat_date` = 月首日期,`stat_period='monthly'`
|
||||
- _Requirements: 12.4, 12.7_
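The three granularities above differ only in how `stat_date` is bucketed. A minimal sketch of the bucketing rules (ISO-week Monday for weekly, first of month for monthly), independent of `BaseDwsTask`; the function name is illustrative:

```python
import datetime

def bucket_stat_date(d: datetime.date, stat_period: str) -> datetime.date:
    """Map a detail date to the stat_date of its summary bucket."""
    if stat_period == "daily":
        return d
    if stat_period == "weekly":
        # ISO week: stat_date is the Monday of the week containing d
        return d - datetime.timedelta(days=d.isoweekday() - 1)
    if stat_period == "monthly":
        # Calendar month: stat_date is the first day of the month
        return d.replace(day=1)
    raise ValueError(f"unknown stat_period: {stat_period}")

d = datetime.date(2026, 2, 18)  # a Wednesday
print(bucket_stat_date(d, "weekly"))   # 2026-02-16
print(bucket_stat_date(d, "monthly"))  # 2026-02-01
```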

- [x] 7.5.5 Register the DWS inventory summary tasks with the scheduler
  - Add `DWS_GOODS_STOCK_DAILY`, `DWS_GOODS_STOCK_WEEKLY`, and `DWS_GOODS_STOCK_MONTHLY` to the task registry
  - Ensure the task dependencies are correct (depends on the DWD goods_stock_summary load completing)
  - _Requirements: 12.9_

- [x] 7.5.6 Write the DWS inventory summary property test
  - **Property 8: DWS inventory summary granularity aggregation correctness**
  - **Validates: Requirements 12.2, 12.3, 12.4, 12.5, 12.6**


- [x] 8. Documentation refinement
- [x] 8.1 Refine the BD_Manual documents covering the class A/B/C tables
  - Go through every document item by item, hunting down all gaps marked "to be added", "pending", "undetermined", or "undefined"
  - Replace rough descriptions such as "amount field", "XX-related", or "XXX-type" with precise field semantics
  - Determine field meanings via field-name analysis, context inference, value/enum scans, and code-usage analysis
  - Ensure every field description covers: field type, business meaning, value range or enum values, and where the code uses it
  - _Requirements: 13.1, 13.2, 13.3, 13.4_

- [x] 8.2 Update the field descriptions of the affected tables in the dataflow analysis reports
  - Sync the corresponding documents under `docs/database/`
  - _Requirements: 1.4, 2.3, 3.3, 4.3, 5.3, 6.4, 7.4, 8.4, 9.3, 10.5, 11.3_

- [x] 9. Checkpoint - documentation refinement complete
  - Ensure all documents are updated with no leftover "to be added" markers; ask the user if questions arise.

- [x] 10. Admin-Web frontend/backend integration
- [x] 10.1 Investigate and fix the backend ETL execution API
  - Check the ETL-execution routes in `apps/backend/app/routers/`
  - Ensure parameters parse correctly: all stores, api_full, verify-and-repair-only with pre-verify API fetch, custom range 2025-11-01 to 2026-02-20, 10-day window splitting, force-full, select-all common options
  - Ensure parameters are translated correctly into the ETL CLI command
  - _Requirements: 14.1, 14.2_
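The parameter-to-CLI translation can be sketched as a pure function. The `--flow`, `--processing-mode`, and `--dry-run` flags appear in the CLI command recorded in task 12.3; the input dict shape and the remaining flag names (`--start-date`, `--end-date`, `--window-days`, `--force-full`) are assumptions for illustration:

```python
def build_cli_command(params: dict) -> list[str]:
    # Hypothetical simplified form-state -> CLI argument translation
    cmd = ["python", "-m", "cli.main", "--flow", params["flow"],
           "--processing-mode", params["processing_mode"]]
    if params.get("date_range"):             # e.g. custom range 2025-11-01..2026-02-20
        start, end = params["date_range"]
        cmd += ["--start-date", start, "--end-date", end]
    if params.get("window_days"):            # e.g. 10-day window splitting
        cmd += ["--window-days", str(params["window_days"])]
    if params.get("force_full"):
        cmd.append("--force-full")
    if params.get("dry_run"):
        cmd.append("--dry-run")
    return cmd

print(build_cli_command({
    "flow": "ods_dwd", "processing_mode": "increment_only",
    "window_days": 10, "force_full": True,
}))
```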

- [x] 10.2 Investigate and fix the frontend TaskManager page
  - Check the parameter configuration form in `apps/admin-web/src/pages/TaskManager.tsx`
  - Ensure every parameter option can be selected and submitted correctly
  - Ensure task execution status is shown in real time (WebSocket log stream)
  - _Requirements: 14.3_

- [x] 10.3 Implement the ETL execution timer module
  - Add a timer utility under `apps/etl/connectors/feiqiu/utils/`
  - Record start time, end time, and duration (millisecond precision) for every step and sub-step
  - Output a timing report document once all tasks finish
  - _Requirements: 15.1, 15.2, 15.3_
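A minimal sketch of such a step timer; the class and method names are assumptions (the real utility also writes the report to a document rather than returning a string):

```python
import time

class EtlTimer:
    """Records duration (ms) per named step via context managers."""
    def __init__(self):
        self.steps = []

    def step(self, name):
        timer = self
        class _Step:
            def __enter__(self):
                self.start = time.perf_counter()
                return self
            def __exit__(self, *exc):
                end = time.perf_counter()
                timer.steps.append({
                    "name": name,
                    "duration_ms": round((end - self.start) * 1000, 3),
                })
                return False
        return _Step()

    def report(self) -> str:
        total = sum(s["duration_ms"] for s in self.steps)
        lines = [f"{s['name']}: {s['duration_ms']} ms" for s in self.steps]
        return "\n".join(lines + [f"TOTAL: {round(total, 3)} ms"])

t = EtlTimer()
with t.step("INCREMENT_ETL"):
    time.sleep(0.01)
with t.step("VERIFICATION"):
    pass
print(t.report())
```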

- [x] 10.4 Write the timer property test
  - **Property 7: timer record completeness**
  - **Validates: Requirements 15.2**

- [x] 10.5 Write the ETL parameter-parsing property test
  - **Property 5: ETL parameter parsing and CLI command construction correctness**
  - **Validates: Requirements 14.1, 14.2**

- [x] 11. Black-box testing mechanism
- [x] 11.1 Implement the data consistency checker
  - Add a consistency-check module under `apps/etl/connectors/feiqiu/quality/`
  - Implement field-completeness comparison of API source data vs ODS stored data
  - Implement mapping-correctness comparison of ODS data vs DWD stored data
  - Output a black-box test report (per-table check results, difference details, pass/fail status)
  - _Requirements: 16.1, 16.2, 16.3, 16.4_
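The API-vs-ODS field-completeness check reduces to a set comparison per record. A sketch with hypothetical record shapes and function name:

```python
def check_field_completeness(api_record: dict, ods_record: dict) -> dict:
    """Compare an API source record against its ODS row; report missing/extra fields."""
    api_fields, ods_fields = set(api_record), set(ods_record)
    missing = sorted(api_fields - ods_fields)   # present in API, lost in ODS
    extra = sorted(ods_fields - api_fields)     # ODS-only (e.g. ETL metadata columns)
    return {"passed": not missing, "missing": missing, "extra": extra}

api = {"order_id": 1, "amount": 99.0, "member_id": 7}
ods = {"order_id": 1, "amount": 99.0, "etl_batch_id": "b1"}
print(check_field_completeness(api, ods))
# {'passed': False, 'missing': ['member_id'], 'extra': ['etl_batch_id']}
```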

- [x] 11.2 Write the data consistency property test
  - **Property 6: data consistency check correctness**
  - **Validates: Requirements 16.2, 16.3**

- [x] 12. End-to-end integration verification
- [x] 12.1 Run the full ETL flow and verify data correctness
  - Integrate `EtlTimer` into the `FlowRunner.run()` method in `orchestration/flow_runner.py`
  - Wrap both the incremental-ETL and verification branches in timed steps (`INCREMENT_ETL`, `VERIFICATION`, `FETCH_BEFORE_VERIFY`)
  - `timer.finish(write_report=True)` automatically writes the timing report to `ETL_REPORT_ROOT` when the flow ends
  - Deliverable: `export/ETL-Connectors/feiqiu/REPORTS/etl_timing_*.md`
  - _Requirements: 14.4, 15.3_

- [x] 12.2 Run the black-box tests and generate the report
  - Integrate `ConsistencyChecker` into the `_run_post_consistency_check()` method of `FlowRunner.run()`
  - After the ETL flow completes, the consistency checks (API vs ODS + ODS vs DWD) run automatically and write a report to `ETL_REPORT_ROOT`
  - The standalone verification script `scripts/ops/run_post_etl_reports.py` confirms report generation works
  - Actual run results: API vs ODS 22/22 passed, ODS vs DWD 38/42 passed
  - Deliverable: `export/ETL-Connectors/feiqiu/REPORTS/consistency_report_*.md`
  - _Requirements: 16.1, 16.4_

- [x] 12.3 Browser-based frontend/backend integration verification (2026-02-20)
  - Start the backend with `uvicorn app.main:app --reload` (localhost:8000) and the frontend with `pnpm dev` (localhost:5173)
  - Log into the admin console in a browser and open the "Task Configuration" page
  - Configuration: Flow=ods_dwd, processing mode=increment only, dry-run=✓, local JSON=✓, 24h lookback
  - Click "Run Now" → auto-redirects to the "Task Manager > History" tab
  - Verified result: status=success, duration=22.5s, exit_code=0
  - CLI command built correctly: `python -m cli.main --flow ods_dwd --processing-mode increment_only --tasks DWD_LOAD_FROM_ODS --lookback-hours 24 --overlap-seconds 600 --dry-run --data-source offline --store-id 2790685415443269`
  - Execution logs stream to the frontend modal in real time (WebSocket /ws/logs/{id}) ✅
  - Timing report generated automatically: `etl_timing_20260220_073610.md` (2 steps, 20.78s total) ✅
  - Consistency report generated automatically: `consistency_report_20260220_073610.md` ✅
  - _Requirements: 14.1, 14.2, 14.3, 14.4, 15.3, 16.4_

- [x] 13. Final checkpoint - all done
  - Ensure all field completion, documentation refinement, frontend/backend integration, and black-box testing are finished; ask the user if questions arise.

## Notes

- Tasks marked `*` are optional and can be skipped to speed up the MVP
- Every task references concrete requirement numbers to ensure traceability
- Checkpoints ensure incremental verification
- Property tests verify general correctness properties; unit tests verify concrete examples and edge cases
- All changes touching `loaders/`, `tasks/`, or `db/` are high-risk paths and must trigger `/audit` on completion
@@ -10,7 +10,7 @@

5. Turn the data-collection script into a task supporting CLI parameters (date range, row count), writing output to the `SYSTEM_ANALYZE_ROOT` directory
6. The whole flow is triggered manually via a Kiro Hook; Python scripts handle the mechanical data preparation while the Kiro Agent handles semantic analysis and report assembly

-A sample of the existing script output is at `docs/reports/dataflow_api_ods_dwd.md`.
+A sample of the existing script output is at `export/SYSTEM/REPORTS/full_dataflow_doc/dataflow_api_ods_dwd.md`.

## Glossary


@@ -84,7 +84,7 @@

- Use `hypothesis` to generate random column sets and window parameters, capture the SQL with `FakeCursor`, and verify that the INSERT INTO ... ON CONFLICT structure matches expectations
- File: `tests/test_dwd_phase1_properties.py`

-- [~] 7. Final checkpoint - ensure all tests pass
+- [x] 7. Final checkpoint - ensure all tests pass
  - Run `cd apps/etl/pipelines/feiqiu && pytest tests/unit` and `cd C:\NeoZQYY && pytest tests/ -v` to make sure all tests pass; ask the user if questions arise.

## Notes


1
.kiro/specs/etl-aggregation-fix/.config.kiro
Normal file
@@ -0,0 +1 @@

{"generationMode": "requirements-first"}

464
.kiro/specs/etl-aggregation-fix/design.md
Normal file
@@ -0,0 +1,464 @@
# Design Document

## Overview

This design covers the in-depth solutions for the 4 "temporary stopgap" fixes from the v8 integration round (requirement C is split into C1 and C2 below), ordered by priority:
1. **Requirement D (P0)**: normalize the `DwdLoadTask.load()` return-value format
2. **Requirement C1 (P1)**: complete the member-birthday ETL pipeline
3. **Requirement B (P1)**: cross-store member lookup support
4. **Requirement A (P2)**: per-level segmented assistant monthly aggregation
5. **Requirement C2 (P2)**: assistant-entered member birthdays

Design principles:
- Each requirement is independently deployable; implement step by step in priority order
- DDL changes go through migration scripts and support rollback
- Keep the existing ETL architecture (the BaseTask E/T/L template) unchanged

## Architecture

The overall architecture is unchanged; the changes concentrate in the following layers:

```mermaid
graph TD
    subgraph "Requirement D: return-value normalization"
        D1[DwdLoadTask.load] -->|errors: int| D2[BaseTask._accumulate_counts]
        D2 -->|sum| D3[FlowRunner._safe_int]
    end

    subgraph "Requirement A: per-level segmented stats"
        A1[dws_assistant_daily_detail] -->|GROUP BY level_code| A2[AssistantMonthlyTask]
        A2 -->|one row per level| A3[dws_assistant_monthly_summary]
        A3 --> A4[AssistantSalaryTask segmented calc]
    end

    subgraph "Requirement B: cross-store member lookup"
        B1[dwd fact tables] -->|member_id IN| B2[dim_member]
        B2 --> B3[DWS tasks]
    end

    subgraph "Requirement C: birthday field"
        C1[ODS payload] -->|extract birthday| C2[dim_member.birthday]
        C3[backend API] -->|UPSERT| C4[zqyy_app.member_birthday_manual]
        C2 --> C5[DWS tasks: COALESCE]
        C4 -->|FDW read-only| C5
    end
```

## Components and Interfaces

### Requirement D: Return-Value Format Normalization

**Files changed:**
- `apps/etl/connectors/feiqiu/tasks/dwd/dwd_load_task.py`
- `apps/etl/connectors/feiqiu/tasks/base_task.py`

**DwdLoadTask.load() return-value change:**

```python
# Before
return {"tables": summary, "errors": errors}
# errors: list[dict], e.g. [{"table": "dim_assistant_ex", "error": "..."}]

# After
return {
    "tables": summary,
    "errors": len(errors),    # int — consistent with other tasks
    "error_details": errors,  # list[dict] — details kept for logging
}
```

**BaseTask._accumulate_counts() defensive-layer enhancement:**

```python
@staticmethod
def _accumulate_counts(total: dict, current: dict) -> dict:
    for key, value in (current or {}).items():
        if isinstance(value, (int, float)):
            total[key] = (total.get(key) or 0) + value
        elif isinstance(value, list):
            # Defensive layer: accumulate list values as len()
            total[key] = (total.get(key) or 0) + len(value)
        else:
            total.setdefault(key, value)
    return total
```

**FlowRunner._safe_int() stays unchanged** as the final defensive layer.
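The accumulation behavior above can be exercised standalone; a sketch reproducing `_accumulate_counts` as a module-level function (renamed for illustration), showing that the legacy list-valued `errors` still accumulates safely:

```python
def accumulate_counts(total: dict, current: dict) -> dict:
    # Standalone copy of BaseTask._accumulate_counts for illustration
    for key, value in (current or {}).items():
        if isinstance(value, (int, float)):
            total[key] = (total.get(key) or 0) + value
        elif isinstance(value, list):
            # Defensive layer: accumulate list values as len()
            total[key] = (total.get(key) or 0) + len(value)
        else:
            total.setdefault(key, value)
    return total

counts: dict = {}
accumulate_counts(counts, {"rows": 10, "errors": 2})
# Legacy list-typed errors value no longer breaks the sum
accumulate_counts(counts, {"rows": 5, "errors": [{"table": "t", "error": "boom"}]})
print(counts)  # {'rows': 15, 'errors': 3}
```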

### Requirement A: Per-Level Segmented Monthly Assistant Aggregation

**DDL changes:**

```sql
-- Migration: drop the old unique constraint, create the new one
ALTER TABLE dws.dws_assistant_monthly_summary
    DROP CONSTRAINT IF EXISTS uk_dws_assistant_monthly;

ALTER TABLE dws.dws_assistant_monthly_summary
    ADD CONSTRAINT uk_dws_assistant_monthly
    UNIQUE (site_id, assistant_id, stat_month, assistant_level_code);
```

**AssistantMonthlyTask._extract_daily_aggregates() change:**

```sql
-- Before: GROUP BY assistant_id, DATE_TRUNC('month', stat_date)
-- After: add assistant_level_code to the grouping
SELECT
    assistant_id,
    assistant_level_code,
    assistant_level_name,
    -- nickname: take the most recent record
    (ARRAY_AGG(assistant_nickname ORDER BY stat_date DESC))[1] AS assistant_nickname,
    DATE_TRUNC('month', stat_date)::DATE AS stat_month,
    COUNT(DISTINCT stat_date) AS work_days,
    SUM(total_service_count) AS total_service_count,
    -- ... remaining aggregate fields unchanged
FROM dws.dws_assistant_daily_detail
WHERE site_id = %s AND ({month_where})
GROUP BY assistant_id, assistant_level_code, assistant_level_name,
    DATE_TRUNC('month', stat_date)
```

**AssistantSalaryTask adaptation:**
- `_extract_monthly_summary()` returns multiple rows (same assistant, different levels)
- `transform()` iterates over the rows, computing a salary per row with the level's own `level_price` and `tier`
- Each `(assistant_id, stat_month, assistant_level_code)` ultimately yields one salary record

**AssistantFinanceTask._extract_daily_revenue() nickname fix:**

```sql
-- Before: MAX(s.nickname) AS assistant_nickname
-- After:
(ARRAY_AGG(s.nickname ORDER BY s.start_use_time DESC))[1] AS assistant_nickname
```

**AssistantCustomerTask._extract_service_pairs() nickname fix:**

```sql
-- Before: MAX(assistant_nickname) AS assistant_nickname
-- After:
(ARRAY_AGG(assistant_nickname ORDER BY service_date DESC))[1] AS assistant_nickname
```

### Requirement B: Cross-Store Member Lookup

**Change pattern:** the SQL in every `_extract_member_info(site_id)` method changes from:

```sql
WHERE register_site_id = %s AND scd2_is_current = 1
```

to a reverse lookup through the fact table:

```sql
WHERE member_id IN (
    SELECT DISTINCT tenant_member_id
    FROM dwd.{fact_table}
    WHERE site_id = %s AND tenant_member_id IS NOT NULL AND tenant_member_id != 0
) AND scd2_is_current = 1
```

**Affected tasks and their fact tables:**

| Task | Method | Fact table |
|------|--------|-----------|
| `member_visit_task.py` | `_extract_member_info` | `dwd_settlement_head` |
| `member_consumption_task.py` | `_extract_member_info` | `dwd_settlement_head` |
| `assistant_customer_task.py` | `_extract_member_info` | `dwd_assistant_service_log` |

**Handling of `dim_member_card_account`:**
- The queries against `dim_member_card_account` in `member_consumption_task.py` and `finance_recharge_task.py` also use `register_site_id`
- They likewise switch to a reverse lookup via the fact table's `tenant_member_id`:

```sql
WHERE tenant_member_id IN (
    SELECT DISTINCT tenant_member_id
    FROM dwd.{fact_table}
    WHERE site_id = %s AND tenant_member_id IS NOT NULL AND tenant_member_id != 0
) AND scd2_is_current = 1
```
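The effect of this pattern can be shown in miniature. A sketch using SQLite with a hypothetical two-table schema: the old `register_site_id` pre-filter misses a cross-store member, while the reverse lookup finds her:

```python
import sqlite3

# In-memory illustration of the reverse-lookup pattern (hypothetical schema):
# member 100 registered at site 1 but consumed at site 2.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_member (member_id INTEGER, register_site_id INTEGER, nickname TEXT);
CREATE TABLE fact_settlement (site_id INTEGER, tenant_member_id INTEGER);
INSERT INTO dim_member VALUES (100, 1, 'Alice');
INSERT INTO fact_settlement VALUES (2, 100);
""")
# Old pattern: pre-filter by register_site_id -> misses the cross-store member
old = db.execute(
    "SELECT nickname FROM dim_member WHERE register_site_id = ?", (2,)
).fetchall()
# New pattern: reverse lookup through the fact table
new = db.execute(
    """SELECT nickname FROM dim_member WHERE member_id IN (
           SELECT DISTINCT tenant_member_id FROM fact_settlement
           WHERE site_id = ? AND tenant_member_id IS NOT NULL AND tenant_member_id != 0
       )""", (2,)
).fetchall()
print(old, new)  # [] [('Alice',)]
```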

### Requirement C1: Member Birthday ETL Pipeline Completion

**DDL changes:**

```sql
-- Add the column to dim_member
ALTER TABLE dwd.dim_member ADD COLUMN IF NOT EXISTS birthday DATE;
COMMENT ON COLUMN dwd.dim_member.birthday IS 'Member birthday, sourced from the birthday field in the ODS member_profiles payload';
```

**ODS → DWD loading:**
- `DwdLoadTask`'s column mapping is automatic (it reads the DWD table's column names via `_get_columns()` and matches them against ODS column names)
- The ODS `member_profiles` table has no `birthday` column, but the `payload` JSONB may contain one
- `_build_column_mapping()` or `_fetch_source_rows()` would otherwise need logic to extract `birthday` from `payload`
- Chosen approach: also add a `birthday` column to the ODS table (keeping ODS aligned with the API fields), extracting it from the JSON at ODS ingest time

```sql
-- Add the column to ODS member_profiles
ALTER TABLE ods.member_profiles ADD COLUMN IF NOT EXISTS birthday DATE;
```

The ODS ingest logic (`ods_tasks.py`) already has a mechanism for extracting fields from JSON; adding a `birthday` field mapping is enough. DwdLoadTask's automatic column matching then maps `ods.member_profiles.birthday` to `dwd.dim_member.birthday`.

**SCD2 handling:**
- `birthday` is an ordinary dimension column on `dim_member`, so SCD2 change detection includes it automatically
- When the birthday value returned by the API changes, an SCD2 version update is triggered

**Restoring birthday references in the DWS tasks:**
- Add `birthday` to the `_extract_member_info()` SQL in `member_visit_task.py`
- Same for `member_consumption_task.py`

### Requirement C2: Assistant-Entered Member Birthdays

**DDL (`zqyy_app` / `test_zqyy_app` business DB):**

```sql
CREATE TABLE IF NOT EXISTS member_birthday_manual (
    id BIGSERIAL PRIMARY KEY,
    member_id BIGINT NOT NULL,
    birthday_value DATE NOT NULL,
    recorded_by_assistant_id BIGINT,
    recorded_by_name VARCHAR(50),
    recorded_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    source VARCHAR(20) DEFAULT 'assistant',
    CONSTRAINT uk_member_birthday_manual
        UNIQUE (member_id, recorded_by_assistant_id)
);

COMMENT ON TABLE member_birthday_manual IS 'Member birthdays manually entered by assistants';
CREATE INDEX idx_mbd_member ON member_birthday_manual (member_id);
```

**FDW mapping (ETL DB reading business-DB data):**

The current FDW direction is `zqyy_app` → `etl_feiqiu` (the business DB reads ETL data). Requirement C2 needs the reverse: ETL DWS tasks read the manual-entry table in the business DB.

Approach: create FDW foreign tables in `etl_feiqiu` pointing at `zqyy_app`:

```sql
-- Run inside etl_feiqiu
CREATE SERVER IF NOT EXISTS zqyy_app_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', dbname 'zqyy_app', port '5432');

CREATE USER MAPPING IF NOT EXISTS FOR etl_user
    SERVER zqyy_app_server
    OPTIONS (user 'app_reader', password '***');

CREATE SCHEMA IF NOT EXISTS fdw_app;

CREATE FOREIGN TABLE fdw_app.member_birthday_manual (
    id BIGINT,
    member_id BIGINT,
    birthday_value DATE,
    recorded_by_assistant_id BIGINT,
    recorded_by_name VARCHAR(50),
    recorded_at TIMESTAMPTZ,
    source VARCHAR(20)
) SERVER zqyy_app_server
OPTIONS (schema_name 'public', table_name 'member_birthday_manual');
```

**DWS birthday read logic:**

```sql
-- Prefer the manually entered value, fall back to the API value
COALESCE(
    (SELECT birthday_value
     FROM fdw_app.member_birthday_manual
     WHERE member_id = m.member_id
     ORDER BY recorded_at ASC  -- earliest submission wins
     LIMIT 1),
    m.birthday
) AS member_birthday
```

**Backend API:**

```python
# apps/backend/app/routers/member_birthday.py
@router.post("/member-birthday")
async def submit_member_birthday(
    member_id: int,
    birthday_value: date,
    assistant_id: int,
    assistant_name: str,
    db=Depends(get_db),
):
    """Assistant submits a member birthday (UPSERT)."""
    sql = """
        INSERT INTO member_birthday_manual
            (member_id, birthday_value, recorded_by_assistant_id, recorded_by_name)
        VALUES (%s, %s, %s, %s)
        ON CONFLICT (member_id, recorded_by_assistant_id)
        DO UPDATE SET
            birthday_value = EXCLUDED.birthday_value,
            recorded_at = NOW()
    """
    db.execute(sql, (member_id, birthday_value, assistant_id, assistant_name))
    return {"status": "ok"}
```

## Data Model

### Change Summary

| Database | Table | Change type | Notes |
|----------|-------|-------------|-------|
| `etl_feiqiu` | `ods.member_profiles` | add column | `birthday DATE` |
| `etl_feiqiu` | `dwd.dim_member` | add column | `birthday DATE` |
| `etl_feiqiu` | `dws.dws_assistant_monthly_summary` | alter constraint | UK gains `assistant_level_code` |
| `zqyy_app` | `member_birthday_manual` | new table | manual birthday entry |
| `etl_feiqiu` | `fdw_app.member_birthday_manual` | new foreign table | FDW mapping |

### Migration Script List

Ordered by priority; each migration script is independently executable:

1. `2026-02-22__D_dwd_load_return_format.sql` — no DDL (code-only change)
2. `2026-02-22__C1_dim_member_add_birthday.sql` — ODS/DWD column additions
3. `2026-02-22__B_no_ddl_code_only.sql` — no DDL (code-only change)
4. `2026-02-22__A_monthly_summary_uk_change.sql` — unique-constraint change
5. `2026-02-22__C2_member_birthday_manual.sql` — new table + FDW


## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of the system — essentially a formal statement of what the system should do. Properties are the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: DwdLoadTask return-value format consistency

*For any* execution result of DwdLoadTask.load(), the value under the `errors` key of the returned dict shall be of type `int` and equal the length of the `error_details` list.

**Validates: Requirement 1.1**

### Property 2: _accumulate_counts type-safe accumulation

*For any* count dict containing `int`, `float`, and `list` values, `_accumulate_counts()` shall accumulate `int`/`float` values directly, accumulate `list` values as `len()`, and raise no exception.

**Validates: Requirement 1.2**

### Property 3: per-level segmented aggregation correctness

*For any* assistant with daily data spanning N distinct `assistant_level_code` values within a month, `_extract_daily_aggregates()` shall return exactly N rows, and the per-row performance metrics shall sum to that assistant's monthly totals.

**Validates: Requirement 2.1**

### Property 4: nickname taken from the most recent record

*For any* assistant with records carrying different nicknames within an aggregation period, the aggregated nickname shall equal the nickname of the most recent record. This property applies to AssistantMonthlyTask, AssistantFinanceTask, and AssistantCustomerTask.

**Validates: Requirements 2.3, 2.5, 2.6**

### Property 5: salary computed per level segment

*For any* assistant with monthly summary rows for multiple levels, AssistantSalaryTask shall compute a salary per level using that level's own `level_price` and `tier` configuration, and the number of salary records shall equal the number of monthly summary rows.

**Validates: Requirement 2.4**

### Property 6: cross-store members are queryable

*For any* member registered at store A with consumption records at store B, store B's DWS tasks shall obtain that member's dimension data (nickname, mobile, etc.) when reverse-looking-up `dim_member` via the fact table.

**Validates: Requirements 3.1, 3.2**

### Property 7: birthday ODS→DWD load correctness

*For any* ODS `member_profiles` record containing a `birthday` value, the `birthday` of the corresponding `dwd.dim_member` record after a DwdLoadTask load shall match the ODS source value.

**Validates: Requirement 4.2**

### Property 8: birthday SCD2 change detection

*For any* existing `dim_member` record, when the same member's `birthday` changes in ODS, SCD2 shall close the old version and create a new one whose `birthday` equals the new value.

**Validates: Requirement 4.3**

### Property 9: birthday UPSERT idempotency

*For any* `(member_id, assistant_id)` pair, after two consecutive submissions with different `birthday_value`s, the `member_birthday_manual` table shall contain exactly one row for that pair, with `birthday_value` equal to the last submitted value.

**Validates: Requirements 5.2, 5.5**
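Property 9 can be demonstrated in miniature with SQLite, whose `ON CONFLICT ... DO UPDATE` mirrors the PostgreSQL statement in the backend API section (table trimmed to the relevant columns; a stand-in, not the real schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE member_birthday_manual (
    member_id INTEGER, birthday_value TEXT, recorded_by_assistant_id INTEGER,
    UNIQUE (member_id, recorded_by_assistant_id))""")

def submit(member_id, birthday, assistant_id):
    # UPSERT keyed on (member_id, recorded_by_assistant_id)
    db.execute("""INSERT INTO member_birthday_manual VALUES (?, ?, ?)
                  ON CONFLICT (member_id, recorded_by_assistant_id)
                  DO UPDATE SET birthday_value = excluded.birthday_value""",
               (member_id, birthday, assistant_id))

submit(100, "1990-01-01", 7)
submit(100, "1991-05-20", 7)   # same assistant resubmits -> row is updated in place
submit(100, "1992-03-03", 8)   # another assistant -> a separate row is kept
rows = db.execute("SELECT * FROM member_birthday_manual "
                  "ORDER BY recorded_by_assistant_id").fetchall()
print(rows)  # [(100, '1991-05-20', 7), (100, '1992-03-03', 8)]
```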

### Property 10: manual entry takes precedence over the API source

*For any* member with a value both in `dim_member.birthday` and in `member_birthday_manual`, the `member_birthday` output by DWS tasks shall equal the manually entered value.

**Validates: Requirement 5.4**

### Property 11: SCD2 updates do not affect the manual-entry table

*For any* member with a record in `member_birthday_manual`, that record shall remain unchanged after a DwdLoadTask SCD2 update of `dim_member.birthday`.

**Validates: Requirement 5.6**

## Error Handling

### Requirement D: return-value format

- When a single table's load fails in `DwdLoadTask.load()`, the error is appended to the `error_details` list and the `errors` count is incremented
- `_accumulate_counts()` keeps values of unknown types via `setdefault` (existing behavior unchanged)
- `_safe_int()` returns 0 for non-int/list types (existing behavior unchanged)

### Requirement A: level segmentation

- An assistant with no service records in a month produces no monthly summary row (existing behavior unchanged)
- A NULL `assistant_level_code` is treated as its own group (NULL counts as one level)
- After the unique-constraint change, old data must be cleaned up (DELETE + recompute the current month)

### Requirement B: cross-store lookup

- When the fact table has no consumption records for the store, `_extract_member_info()` returns an empty dict (existing behavior unchanged)
- When the subquery returns an empty set, `WHERE member_id IN (empty set)` is equivalent to `WHERE FALSE` and raises no error

### Requirement C1: birthday field

- A NULL or empty-string `birthday` in ODS is stored as NULL in DWD
- Invalid date formats are set to NULL by DwdLoadTask's existing type-conversion logic

### Requirement C2: manual entry

- Submissions are still allowed when `member_id` does not exist in `dim_member` (an assistant may discover a new customer before ETL does)
- `birthday_value` format validation is handled by the backend API's Pydantic schema
- When the FDW connection fails, DWS tasks should catch the exception and degrade to using only `dim_member.birthday`
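A sketch of that degradation path; the helper names and `FakeDB` are illustrative assumptions, not the project's real DB layer (the real task would issue its SQL through the existing connection machinery):

```python
def fetch_birthdays(db, member_ids):
    try:
        # Preferred path: COALESCE over the FDW foreign table
        # (may fail when the zqyy_app server is unreachable)
        return db.query_manual_and_api_birthdays(member_ids)
    except Exception:
        # Degraded path: API-sourced birthdays from dim_member only
        return db.query_api_birthdays(member_ids)

class FakeDB:
    """Simulates an FDW outage to exercise the fallback branch."""
    def query_manual_and_api_birthdays(self, ids):
        raise ConnectionError("FDW server unreachable")
    def query_api_birthdays(self, ids):
        return {m: "1990-01-01" for m in ids}

print(fetch_birthdays(FakeDB(), [1, 2]))  # {1: '1990-01-01', 2: '1990-01-01'}
```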

## Testing Strategy

### Test Frameworks

- Property tests: `hypothesis` (Python), at least 100 iterations per property test
- Unit tests: `pytest`
- Test utilities: `apps/etl/connectors/feiqiu/tests/unit/task_test_utils.py` provides FakeDB/FakeAPI

### Property Tests

Each correctness property maps to one hypothesis property test, annotated as:

```python
# Feature: etl-aggregation-fix, Property N: {property_text}
@given(...)
def test_property_N_xxx(data):
    ...
```

Property test locations:
- Requirement D (Properties 1-2): `apps/etl/connectors/feiqiu/tests/unit/test_return_format_properties.py`
- Requirement A (Properties 3-5): `apps/etl/connectors/feiqiu/tests/unit/test_monthly_aggregation_properties.py`
- Requirement B (Property 6): `apps/etl/connectors/feiqiu/tests/unit/test_multi_store_properties.py`
- Requirement C (Properties 7-11): `apps/etl/connectors/feiqiu/tests/unit/test_birthday_properties.py`
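Property 4's SQL expression `(ARRAY_AGG(... ORDER BY ... DESC))[1]` has a simple pure-Python model that a property test can compare against; a sketch with an assumed `(stat_date, nickname)` record shape:

```python
def latest_nickname(records):
    # records: list of (stat_date, nickname) tuples; equivalent to ordering by
    # stat_date DESC and taking the first element, as ARRAY_AGG(...)[1] does.
    # ISO date strings compare correctly as plain strings.
    return max(records, key=lambda r: r[0])[1]

records = [
    ("2026-01-05", "old_name"),
    ("2026-01-20", "new_name"),
    ("2026-01-10", "mid_name"),
]
print(latest_nickname(records))  # new_name
```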

### Unit Tests

Unit tests cover concrete examples and edge cases:
- DDL structure validation (unique constraints, column existence)
- Empty-data / NULL-value boundaries
- Migration-script rollback verification

### Test Environment

- Databases: `test_etl_feiqiu` / `test_zqyy_app` (via the `TEST_DB_DSN` environment variable)
- Pure unit tests use FakeDB/FakeAPI, with no real database connection
- ETL test cwd: `apps/etl/connectors/feiqiu/`
- Backend test cwd: `apps/backend/`
84
.kiro/specs/etl-aggregation-fix/requirements.md
Normal file
@@ -0,0 +1,84 @@
# Requirements Document

## Introduction

The v8 integration round fixed 11 bugs; for 4 of them the current fix is a temporary stopgap that needs a more complete solution. This spec covers the following four requirements:
- **Requirement A**: segment assistant monthly aggregation by level, replacing `MAX()` aggregation
- **Requirement B**: mark the known `register_site_id` limitation in cross-store member lookup + reserve an extension plan
- **Requirement C**: complete the member-birthday pipeline end to end (C1: ETL chain / C2: manual entry)
- **Requirement D**: normalize the `DwdLoadTask.load()` return-value format

## Glossary

- **DwdLoadTask**: the DWD-layer load task, which cleans ODS raw data and loads it into the DWD detail layer
- **DWS tasks**: data-summary-layer tasks that aggregate DWD data into business report data
- **SCD2**: slowly changing dimension type 2; tracks historical changes of dimension attributes via versioned records
- **BaseTask**: the ETL task base class providing the Extract/Transform/Load template methods and count-accumulation logic
- **FlowRunner**: the ETL flow orchestrator; runs tasks layer by layer and aggregates counts
- **dim_member**: the DWD member dimension table, versioned with SCD2
- **dws_assistant_monthly_summary**: the DWS assistant monthly summary table
- **dws_assistant_salary_calc**: the DWS assistant salary calculation table
- **register_site_id**: the member's registration store ID, currently the only store identifier in `dim_member`
- **member_birthday_manual**: the manual birthday-entry table (in the `zqyy_app` business DB) storing member birthdays submitted by assistants, exposed read-only to ETL DWS tasks via FDW
- **assistant_level_code**: the assistant level code; it can change within a month due to promotion/demotion
- **_safe_int()**: the type-safety helper in `flow_runner.py` that normalizes `int`/`list`/`None` to `int`
- **_accumulate_counts()**: the count-accumulation method on `BaseTask` that merges stats from multiple execution segments

## Requirements

### Requirement 1: DwdLoadTask Return-Value Format Normalization (Requirement D)

**User story:** As an ETL developer, I want the return value of DwdLoadTask.load() to be consistent with other tasks, so that FlowRunner can safely aggregate the counts of all tasks.

#### Acceptance Criteria

1. WHEN DwdLoadTask.load() completes, THE DwdLoadTask SHALL return a dict containing an `errors` key (an `int` equal to the length of the error list) and an `error_details` key (a `list[dict]` with the error details)
2. WHEN BaseTask._accumulate_counts() encounters a count value of type `list`, THE BaseTask SHALL convert it to `len(list)` before accumulating (defensive layer)
3. WHEN FlowRunner aggregates all task counts, THE FlowRunner SHALL retain `_safe_int()` as the final defensive layer so that `sum()` cannot crash on inconsistent types

### Requirement 2: Per-Level Segmented Monthly Assistant Aggregation (Requirement A)

**User story:** As an operations manager, I want assistant monthly summaries segmented by level, so that performance and salary calculations accurately reflect each level period.

#### Acceptance Criteria

1. WHEN an assistant has multiple assistant_level_code values within a month, THE AssistantMonthlyTask SHALL group by `(assistant_id, stat_month, assistant_level_code)` and produce one row per level with that level's performance metrics
2. THE dws_assistant_monthly_summary table SHALL use `(site_id, assistant_id, stat_month, assistant_level_code)` as its unique constraint
3. WHEN AssistantMonthlyTask needs a nickname value, THE AssistantMonthlyTask SHALL take the nickname of the most recent record (ordered by time descending) instead of using `MAX()` aggregation
4. WHEN AssistantSalaryTask computes salaries, THE AssistantSalaryTask SHALL compute commission per level segment, adapted to the new multi-row monthly summary structure
5. WHEN AssistantFinanceTask needs a nickname while extracting daily revenue, THE AssistantFinanceTask SHALL take the nickname of the most recent record instead of using `MAX()` aggregation
6. WHEN AssistantCustomerTask needs a nickname while extracting service pairs, THE AssistantCustomerTask SHALL take the nickname of the most recent record instead of using `MAX()` aggregation

### Requirement 3: Cross-Store Member Lookup (Requirement B)

**User story:** As an operations manager, I want DWS tasks to correctly look up members who consume across stores, so that store B can see dimension data for members who registered at store A but consume at store B.

#### Acceptance Criteria

1. WHEN a DWS task needs member information, THE DWS task SHALL reverse-look-up `dim_member` via the `member_id` values in the fact table instead of pre-filtering with `WHERE register_site_id = %s`
2. WHEN a member registers at store A and consumes at store B, THE DWS tasks of store B SHALL be able to retrieve that member's nickname, mobile number, and other dimension data
3. WHEN a DWS task extracts member information, THE DWS task SHALL use the pattern `WHERE member_id IN (SELECT DISTINCT member_id FROM dwd.<fact_table> WHERE site_id = %s)` to obtain member dimension data

### Requirement 4: Member Birthday ETL Pipeline Completion (Requirement C1)

**User story:** As an operations manager, I want member birthday information to flow intact from the upstream API to the DWS layer, so that it can be used for member analysis and sales leads.

#### Acceptance Criteria

1. THE dim_member table SHALL contain a `birthday DATE` column
2. WHEN DwdLoadTask performs the ODS → DWD load, THE DwdLoadTask SHALL include the `birthday` field in its column mapping, loading the ODS birthday data into `dim_member.birthday`
3. WHEN SCD2 updates a dim_member record, THE DwdLoadTask SHALL treat `birthday` as one of the change-detection fields and handle birthday changes normally
4. WHEN MemberVisitTask and other DWS tasks extract member information, THE DWS tasks SHALL read the birthday field from `dim_member.birthday` and write it to the DWS target tables

### Requirement 5: Assistant-Entered Member Birthdays (Requirement C2)

**User story:** As an assistant, I want to submit customers' birthdays manually, so that this key sales lead is captured even when the upstream API does not provide it.

#### Acceptance Criteria

1. THE `zqyy_app` business DB (`test_zqyy_app` in dev/test) SHALL contain a `member_birthday_manual` table with the fields `member_id`, `birthday_value`, `recorded_by_assistant_id`, `recorded_by_name`, `recorded_at`, and `source`, and a unique constraint on `(member_id, recorded_by_assistant_id)`
2. WHEN the same assistant resubmits a birthday for the same member, THE system SHALL update that assistant's existing record (UPSERT) while preserving all other assistants' submissions
3. WHEN DWS tasks need manually entered birthdays, THE DWS tasks SHALL read `zqyy_app.member_birthday_manual` through the read-only FDW mapping
4. WHEN both dim_member.birthday (API source) and member_birthday_manual (manual source) have values, THE DWS tasks SHALL prefer the manually entered value
5. WHEN an assistant submits a birthday via the backend API, THE backend SHALL expose a POST endpoint accepting `member_id`, `birthday_value`, `assistant_id`, and `assistant_name`, performing an UPSERT into `zqyy_app.member_birthday_manual`
6. WHEN SCD2 updates dim_member.birthday, THE DwdLoadTask SHALL update the API-sourced birthday normally without affecting the manual-entry table in the business DB
228
.kiro/specs/etl-aggregation-fix/tasks.md
Normal file
@@ -0,0 +1,228 @@
# Implementation Plan: ETL Aggregation Fixes and Birthday Field Completion

## Overview

Implement step by step in priority order (D → C1 → B → A → C2); each requirement is independently deployable. Code changes are concentrated in the ETL connector's tasks layer and the backend API; DDL changes run through migration scripts.

## Tasks

- [x] 1. Requirement D: DwdLoadTask return-value format normalization
- [x] 1.1 Change the DwdLoadTask.load() return format
  - In the `load()` method of `apps/etl/connectors/feiqiu/tasks/dwd/dwd_load_task.py`, change `return {"tables": summary, "errors": errors}` to `return {"tables": summary, "errors": len(errors), "error_details": errors}`
  - _Requirements: 1.1_

- [x] 1.2 Strengthen the BaseTask._accumulate_counts() defensive layer
  - In the `_accumulate_counts()` method of `apps/etl/connectors/feiqiu/tasks/base_task.py`, add an `isinstance(value, list)` branch that converts lists to `len()` before accumulating
  - _Requirements: 1.2_

- [x] 1.3 Write property test: DwdLoadTask return-value format consistency
  - **Property 1: DwdLoadTask return-value format consistency**
  - **Validates: Requirement 1.1**
  - File: `apps/etl/connectors/feiqiu/tests/unit/test_return_format_properties.py`

- [x] 1.4 Write property test: _accumulate_counts type-safe accumulation
  - **Property 2: _accumulate_counts type-safe accumulation**
  - **Validates: Requirement 1.2**
  - File: `apps/etl/connectors/feiqiu/tests/unit/test_return_format_properties.py`

- [x] 2. Checkpoint — Requirement D complete
  - Ensure all tests pass; ask the user if questions arise.

- [x] 3. Requirement C1: member birthday ETL pipeline completion
- [x] 3.1 Write migration script: add the birthday column to ODS/DWD
  - Create `db/etl_feiqiu/migrations/2026-02-22__C1_dim_member_add_birthday.sql`
  - ODS: `ALTER TABLE ods.member_profiles ADD COLUMN IF NOT EXISTS birthday DATE;`
  - DWD: `ALTER TABLE dwd.dim_member ADD COLUMN IF NOT EXISTS birthday DATE;`
  - Include rollback statements and verification SQL
  - _Requirements: 4.1_

- [-] 3.1a Run migration script C1 on the test database
  - Run `2026-02-22__C1_dim_member_add_birthday.sql` against `test_etl_feiqiu`
  - Run the verification SQL to confirm the columns were added
  - _Requirements: 4.1_

- [x] 3.2 Update the ODS ingest logic: extract the birthday field
  - Add a `birthday` JSON-extraction mapping to the ODS ingest task
  - Confirm that the `member_profiles` field list in `ods_tasks.py` includes `birthday`
  - _Requirements: 4.2_

- [x] 3.3 Verify that DwdLoadTask's automatic column mapping includes birthday
  - DwdLoadTask reads the DWD table's column names automatically via `_get_columns()`; confirm `birthday` is included in the column mapping
  - SCD2 change detection automatically covers all non-SCD2-metadata columns; confirm `birthday` participates in change detection
  - _Requirements: 4.2, 4.3_

- [x] 3.4 Restore birthday references in the DWS tasks
  - Add the `birthday` field to the `_extract_member_info()` SQL in `member_visit_task.py`
  - Add the `birthday` field to the `_extract_member_info()` SQL in `member_consumption_task.py`
  - Update the DWS tasks' `transform()` methods to write `member_birthday` into the output records
  - _Requirements: 4.4_

- [x] 3.5 Write property test: birthday ODS→DWD load correctness
  - **Property 7: birthday ODS→DWD load correctness**
  - **Validates: Requirement 4.2**
  - File: `apps/etl/connectors/feiqiu/tests/unit/test_birthday_properties.py`

- [x] 3.6 Write property test: birthday SCD2 change detection
  - **Property 8: birthday SCD2 change detection**
  - **Validates: Requirement 4.3**
  - File: `apps/etl/connectors/feiqiu/tests/unit/test_birthday_properties.py`

- [x] 4. Checkpoint — Requirement C1 complete
  - Ensure all tests pass; ask the user if questions arise.

- [x] 5. Requirement B: cross-store member lookup
- [x] 5.1 Change _extract_member_info() in member_visit_task.py
  - Replace `WHERE register_site_id = %s` with a reverse lookup via `tenant_member_id` in the `dwd_settlement_head` fact table
  - _Requirements: 3.1, 3.2_

- [x] 5.2 Change _extract_member_info() in member_consumption_task.py
  - Replace `WHERE register_site_id = %s` with a reverse lookup via `tenant_member_id` in the `dwd_settlement_head` fact table
  - Also change the `dim_member_card_account` query to the fact-table reverse lookup
  - _Requirements: 3.1, 3.2_

- [x] 5.3 Change _extract_member_info() in assistant_customer_task.py
  - Replace `WHERE register_site_id = %s` with a reverse lookup via `tenant_member_id` in the `dwd_assistant_service_log` fact table
  - _Requirements: 3.1, 3.2_

- [x] 5.4 Change the dim_member_card_account query in finance_recharge_task.py
  - Replace `WHERE register_site_id = %s` with the fact-table reverse lookup
  - _Requirements: 3.1_

- [x] 5.5 Write property test: cross-store members are queryable
  - **Property 6: cross-store members are queryable**
  - **Validates: Requirements 3.1, 3.2**
  - File: `apps/etl/connectors/feiqiu/tests/unit/test_multi_store_properties.py`

- [x] 6. Checkpoint — Requirement B complete
  - Ensure all tests pass; ask the user if questions arise.

- [x] 7. Requirement A: per-level segmented monthly assistant aggregation
- [x] 7.1 Write migration script: unique-constraint change
  - Create `db/etl_feiqiu/migrations/2026-02-22__A_monthly_summary_uk_change.sql`
  - DROP the old constraint `uk_dws_assistant_monthly`, ADD the new constraint `(site_id, assistant_id, stat_month, assistant_level_code)`
  - Include rollback statements and verification SQL
  - _Requirements: 2.2_

- [x] 7.1a Run migration script A on the test database
  - Run `2026-02-22__A_monthly_summary_uk_change.sql` against `test_etl_feiqiu`
  - Run the verification SQL to confirm the constraint changed (`SELECT conname FROM pg_constraint ...`)
  - _Requirements: 2.2_

- [x] 7.2 Change AssistantMonthlyTask._extract_daily_aggregates()
  - Add `assistant_level_code, assistant_level_name` to the GROUP BY
  - Change the nickname expression to `(ARRAY_AGG(assistant_nickname ORDER BY stat_date DESC))[1]`
  - _Requirements: 2.1, 2.3_

- [x] 7.3 Adapt AssistantMonthlyTask._process_month() to multiple rows
  - Confirm `_process_month()` correctly handles aggregates spanning multiple levels for the same assistant
  - Each row uses its own `assistant_level_code` for level matching and ranking
  - _Requirements: 2.1_

- [x] 7.4 Adapt AssistantSalaryTask to per-level salary calculation
  - `_extract_monthly_summary()` already returns multiple rows (same assistant, different levels)
  - `transform()` iterates over the rows, computing a salary per row with the level's own `level_price` and `tier`
  - _Requirements: 2.4_

- [x] 7.5 Change the nickname expression in AssistantFinanceTask._extract_daily_revenue()
  - Replace `MAX(s.nickname)` with `(ARRAY_AGG(s.nickname ORDER BY s.start_use_time DESC))[1]`
  - _Requirements: 2.5_

- [x] 7.6 Change the nickname expression in AssistantCustomerTask._extract_service_pairs()
  - Replace `MAX(assistant_nickname)` with `(ARRAY_AGG(assistant_nickname ORDER BY service_date DESC))[1]`
  - _Requirements: 2.6_

- [x] 7.7 Write property test: per-level segmented aggregation correctness
  - **Property 3: per-level segmented aggregation correctness**
  - **Validates: Requirement 2.1**
  - File: `apps/etl/connectors/feiqiu/tests/unit/test_monthly_aggregation_properties.py`

- [x] 7.8 Write property test: nickname taken from the most recent record
  - **Property 4: nickname taken from the most recent record**
  - **Validates: Requirements 2.3, 2.5, 2.6**
  - File: `apps/etl/connectors/feiqiu/tests/unit/test_monthly_aggregation_properties.py`

- [x] 7.9 Write property test: salary computed per level segment
  - **Property 5: salary computed per level segment**
  - **Validates: Requirement 2.4**
  - File: `apps/etl/connectors/feiqiu/tests/unit/test_monthly_aggregation_properties.py`

- [x] 8. Checkpoint — Requirement A complete
  - Ensure all tests pass; ask the user if questions arise.

- [x] 9. Requirement C2: assistant-entered member birthdays
- [x] 9.1 Write migration script: create the member_birthday_manual table
  - Create `db/zqyy_app/migrations/2026-02-22__C2_member_birthday_manual.sql`
  - Create the `member_birthday_manual` table in `zqyy_app` / `test_zqyy_app`
  - Include rollback statements and verification SQL
  - _Requirements: 5.1_

- [x] 9.1a Run migration script C2 on the test database
  - Run `2026-02-22__C2_member_birthday_manual.sql` against `test_zqyy_app`
  - Run the verification SQL to confirm the table and constraints were created
  - _Requirements: 5.1_

- [x] 9.2 Write the FDW mapping scripts: ETL DB reading the business DB
  - Create `db/fdw/setup_fdw_reverse.sql` (etl_feiqiu → zqyy_app direction)
  - Create `db/fdw/setup_fdw_reverse_test.sql` (test-environment version)
  - Create the `fdw_app.member_birthday_manual` foreign table in the ETL DB
  - _Requirements: 5.3_

- [x] 9.2a Run the FDW mapping script on the test database
  - Run `setup_fdw_reverse_test.sql` against `test_etl_feiqiu`
  - Verify that the `fdw_app.member_birthday_manual` foreign table is readable
  - _Requirements: 5.3_

- [x] 9.3 Change the DWS tasks: birthday priority logic
  - In `_extract_member_info()` of `member_visit_task.py` and `member_consumption_task.py`, use the `COALESCE(fdw_app.member_birthday_manual, dim_member.birthday)` logic
  - Add degradation handling for FDW connection failures
  - _Requirements: 5.4_

- [x] 9.4 Implement the backend API: birthday submission endpoint
  - Create `apps/backend/app/routers/member_birthday.py`
  - Implement the `POST /member-birthday` endpoint performing the UPSERT
  - Create the Pydantic schema `apps/backend/app/schemas/member_birthday.py`
|
||||
- 在 `apps/backend/app/main.py` 中注册路由
|
||||
- _需求: 5.5_
|
||||
|
||||
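The idempotent UPSERT behind task 9.4 can be sketched as a pure SQL-builder. This is only an illustration: the column names (`member_id`, `site_id`, `birthday`, `updated_by`, `updated_at`) and the `(member_id, site_id)` unique key are assumptions — the real schema lives in `2026-02-22__C2_member_birthday_manual.sql` and the real handler in the FastAPI router.

```python
# Hypothetical sketch of the ON CONFLICT upsert behind POST /member-birthday.
# Column names and the conflict target are assumptions; the migration script
# is the source of truth for the actual schema.

def build_birthday_upsert(member_id: int, site_id: int,
                          birthday: str, updated_by: str):
    """Return (sql, params) for an idempotent birthday upsert."""
    sql = (
        "INSERT INTO member_birthday_manual "
        "(member_id, site_id, birthday, updated_by, updated_at) "
        "VALUES (%s, %s, %s, %s, now()) "
        "ON CONFLICT (member_id, site_id) DO UPDATE SET "
        "birthday = EXCLUDED.birthday, "
        "updated_by = EXCLUDED.updated_by, "
        "updated_at = now()"
    )
    return sql, (member_id, site_id, birthday, updated_by)
```

Because re-submitting the same payload only rewrites the same row, this shape is what Property 9 (UPSERT idempotency) exercises.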
  - [x] 9.5 Write property test: birthday UPSERT idempotency
    - **Property 9: birthday UPSERT idempotency**
    - **Verifies: Requirements 5.2, 5.5**
    - File: `apps/etl/connectors/feiqiu/tests/unit/test_birthday_properties.py`

  - [x] 9.6 Write property test: manual backfill takes precedence over the API source
    - **Property 10: manual backfill takes precedence over the API source**
    - **Verifies: Requirement 5.4**
    - File: `apps/etl/connectors/feiqiu/tests/unit/test_birthday_properties.py`

  - [x] 9.7 Write property test: SCD2 updates do not affect the manual backfill table
    - **Property 11: SCD2 updates do not affect the manual backfill table**
    - **Verifies: Requirement 5.6**
    - File: `apps/etl/connectors/feiqiu/tests/unit/test_birthday_properties.py`

- [x] 10. Wrap-up: merge into the main DDL and update docs
  - [x] 10.1 Merge the migration changes into the main DDL files
    - Merge the `birthday` column definition into `db/etl_feiqiu/schemas/ods.sql` (`member_profiles` table)
    - Merge the `birthday` column definition into `db/etl_feiqiu/schemas/dwd.sql` (`dim_member` table)
    - Merge the new unique constraint into `db/etl_feiqiu/schemas/dws.sql` (`dws_assistant_monthly_summary` table)
    - Merge the `member_birthday_manual` table definition into `db/zqyy_app/schemas/init.sql`
    - Merge the reverse FDW mapping into `db/fdw/setup_fdw.sql` and `db/fdw/setup_fdw_test.sql`

  - [x] 10.2 Update the database documentation
    - Create or update docs for the affected tables under `docs/database/`:
      - `dim_member`: document the new `birthday` column
      - `dws_assistant_monthly_summary`: document the unique-constraint change
      - `member_birthday_manual`: new table doc (including the FDW mapping)
    - Follow the existing `BD_Manual_*.md` naming convention

  - [x] 10.3 Final verification
    - Ensure all tests pass
    - Confirm the main DDL files match the actual structure of the test databases
    - Ask the user if questions arise

## Notes

- Tasks marked `*` are optional test tasks and may be skipped to speed up the MVP
- Each requirement is independently deployable; the checkpoints ensure incremental verification
- Migration scripts must first be verified on `test_etl_feiqiu` / `test_zqyy_app`
- Property tests use hypothesis, with at least 100 iterations per property
- Unit tests use FakeDB/FakeAPI and do not depend on a real database
126
.kiro/specs/etl-fullstack-integration/design.md
Normal file
@@ -0,0 +1,126 @@
# Design Document: ETL Full-Pipeline Frontend/Backend Integration (etl-fullstack-integration)

## Overview

This spec is an ops integration task; it involves no new feature development. The goal is to verify, in a real environment, the end-to-end correctness of the frontend and backend code produced by the `admin-web-console` spec, while collecting performance data.

Core flow:
1. Start the backend + frontend services
2. Log in via the API to obtain a JWT
3. Submit a full-pipeline ETL task (api_full, full_window, force-full, all common tasks, custom window 2025-11-01~2026-02-20, 30-day slices, all stores)
4. Monitor execution in real time, capturing errors/warnings
5. Generate a comprehensive report when execution completes

## Architecture

```
Integration script (scripts/ops/)
│
├── 1. Start services
│   ├── uvicorn app.main:app (backend :8000)
│   └── pnpm dev (frontend :5173)
│
├── 2. API call chain
│   ├── POST /api/auth/login → JWT
│   ├── GET /api/tasks/registry → task list
│   ├── GET /api/tasks/sync-check → sync check
│   ├── POST /api/tasks/validate → CLI preview
│   └── POST /api/execution/run → trigger execution
│
├── 3. Monitoring loop
│   ├── GET /api/execution/queue → status polling
│   ├── GET /api/execution/{id}/logs → log retrieval
│   └── error/warning detection
│
└── 4. Report generation
    └── write to SYSTEM_LOG_ROOT
```

## Task Parameters

Per the user's requirements, the concrete parameters of the integration task:

```python
INTEGRATION_TASK_CONFIG = {
    "flow": "api_full",                # full pipeline: API → ODS → DWD → DWS → INDEX
    "processing_mode": "full_window",  # full-window processing
    "window_mode": "custom",           # custom time range
    "window_start": "2025-11-01 00:00",
    "window_end": "2026-02-20 00:00",
    "window_split": "day",             # split by day
    "window_split_days": 30,           # 30 days per slice
    "force_full": True,                # force full refresh
    "dry_run": False,
    "tasks": [  # all tasks with is_common=True
        # ODS layer (22)
        "ODS_ASSISTANT_ACCOUNT", "ODS_ASSISTANT_LEDGER", "ODS_ASSISTANT_ABOLISH",
        "ODS_SETTLEMENT_RECORDS", "ODS_TABLE_USE", "ODS_TABLE_FEE_DISCOUNT",
        "ODS_TABLES", "ODS_PAYMENT", "ODS_REFUND", "ODS_PLATFORM_COUPON",
        "ODS_MEMBER", "ODS_MEMBER_CARD", "ODS_MEMBER_BALANCE", "ODS_RECHARGE_SETTLE",
        "ODS_GROUP_PACKAGE", "ODS_GROUP_BUY_REDEMPTION",
        "ODS_INVENTORY_STOCK", "ODS_INVENTORY_CHANGE",
        "ODS_GOODS_CATEGORY", "ODS_STORE_GOODS", "ODS_STORE_GOODS_SALES", "ODS_TENANT_GOODS",
        # DWD layer (1 common)
        "DWD_LOAD_FROM_ODS",
        # DWS layer (15 common, excluding DWS_MAINTENANCE)
        "DWS_BUILD_ORDER_SUMMARY", "DWS_ASSISTANT_DAILY", "DWS_ASSISTANT_MONTHLY",
        "DWS_ASSISTANT_CUSTOMER", "DWS_ASSISTANT_SALARY", "DWS_ASSISTANT_FINANCE",
        "DWS_MEMBER_CONSUMPTION", "DWS_MEMBER_VISIT",
        "DWS_FINANCE_DAILY", "DWS_FINANCE_RECHARGE", "DWS_FINANCE_INCOME_STRUCTURE",
        "DWS_FINANCE_DISCOUNT_DETAIL",
        "DWS_GOODS_STOCK_DAILY", "DWS_GOODS_STOCK_WEEKLY", "DWS_GOODS_STOCK_MONTHLY",
        # INDEX layer (3 common, excluding DWS_ML_MANUAL_IMPORT)
        "DWS_WINBACK_INDEX", "DWS_NEWCONV_INDEX", "DWS_RELATION_INDEX",
    ],
    # store_id is injected by the backend from the JWT (default admin site_id=1)
    # Note: the user asked for "all stores", but the system currently has only
    # site_id=1; additional stores will need to be run one by one later
}
```

## Monitoring Strategy

- Polling interval: 30 seconds
- Maximum wait: 30 minutes (with no new log output)
- Error detection: log lines matching `ERROR`, `CRITICAL`, `Traceback`, `Exception`
- Warning detection: log lines matching `WARNING`, `WARN`
- Timing: extract timestamps from the logs and compute per-stage durations
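The monitoring strategy above can be sketched as a small polling loop. The `fetch_status` / `fetch_logs` callables stand in for the real `GET /api/execution/queue` and `GET /api/execution/{id}/logs` calls (not implemented here); only the error/warning classification rules are taken directly from this section.

```python
import re
import time

ERROR_PAT = re.compile(r"ERROR|CRITICAL|Traceback|Exception")
WARN_PAT = re.compile(r"WARNING|WARN")

def classify(lines):
    """Split log lines into (errors, warnings) per the detection rules above."""
    errors = [l for l in lines if ERROR_PAT.search(l)]
    warnings = [l for l in lines if WARN_PAT.search(l) and not ERROR_PAT.search(l)]
    return errors, warnings

def monitor(fetch_status, fetch_logs, interval=30, idle_timeout=1800,
            sleep=time.sleep):
    """Poll until the task leaves queued/running, or idle_timeout seconds
    pass with no new log output (the 30-minute rule above)."""
    seen, idle = 0, 0.0
    findings = {"errors": [], "warnings": []}
    while True:
        lines = fetch_logs()[seen:]          # incremental logs only
        errs, warns = classify(lines)
        findings["errors"] += errs
        findings["warnings"] += warns
        seen += len(lines)
        idle = 0.0 if lines else idle + interval
        if idle >= idle_timeout:
            findings["timeout"] = True
            return findings
        if fetch_status() not in ("queued", "running"):
            return findings
        sleep(interval)
```

Injecting `sleep` keeps the loop testable without waiting out real intervals.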
## Report Format

The report is written as a Markdown file at `{SYSTEM_LOG_ROOT}/{date}__etl_integration_report.md`:

```markdown
# ETL Full-Pipeline Integration Report

## Execution Summary
- Task parameters: ...
- Start time / end time / total duration
- Exit code / final status

## Performance Report
- Per-slice duration comparison table
- Top-5 slowest stages
- Overall throughput estimate

## DEBUG Report (if any)
- Error summary
- Warning summary
- Relevant log excerpts
- Probable root-cause analysis
```

## Correctness Properties

This spec is an ops integration task with no new feature code, so no formal property tests are defined. Verification is done by confirming that:
- Service health checks pass
- The task is submitted successfully and starts executing
- The exit code and logs after completion match expectations
- The report file is generated successfully

## Test Strategy

This spec is itself one integration test; no additional unit or property tests are written. Acceptance criteria:
- Backend API responses are correct
- The ETL CLI subprocess starts and runs normally
- Logs are captured and pushed correctly
- The report file is written to SYSTEM_LOG_ROOT
70
.kiro/specs/etl-fullstack-integration/requirements.md
Normal file
@@ -0,0 +1,70 @@
# Requirements Document: ETL Full-Pipeline Frontend/Backend Integration (etl-fullstack-integration)

## Introduction

Based on the frontend and backend code delivered by the completed `admin-web-console` spec, run a full end-to-end integration pass. Submit a full-pipeline ETL task (api_full) through the admin console API, covering the entire ODS → DWD → DWS → INDEX chain, to verify frontend/backend cooperation, subprocess execution, log streaming, and error handling. Also collect fine-grained timing data to locate performance bottlenecks.

## Glossary

- **Integration run**: wiring the admin-web backend API to the feiqiu ETL CLI and triggering a full ETL run through real API calls
- **All common tasks**: every task in the registry with `is_common=True` (excluding utility, manual-import, maintenance, and other tasks with `is_common=False`)
- **Fine-grained timing**: recording, via log parsing or CLI output, the duration of each stage (ODS/DWD/DWS/INDEX) and each subtask during ETL execution

## Requirements

### Requirement 1: Service startup and health checks

**User story:** As a developer, I want to start the backend and frontend services with one command and confirm they are healthy, so that integration can begin.

#### Acceptance Criteria

1. WHEN the backend service starts, THE Backend_API SHALL respond at `http://localhost:8000`
2. WHEN the frontend service starts, THE Admin_Web SHALL be reachable at `http://localhost:5173`
3. WHEN `POST /api/auth/login` is called, THE Backend_API SHALL return a valid JWT token
4. WHEN `GET /api/tasks/registry` is called, THE Backend_API SHALL return a non-empty task registry
5. WHEN `GET /api/tasks/sync-check` is called, THE Backend_API SHALL confirm the backend task registry is in sync with the real ETL registry

### Requirement 2: Full-pipeline task submission and execution

**User story:** As a developer, I want to submit an ETL task covering the full chain through the backend API, to verify the complete flow of task configuration, CLI construction, and subprocess execution.

#### Acceptance Criteria

1. WHEN a TaskConfig is submitted (api_full + full_window + all common tasks + custom window 2025-11-01~2026-02-20 + 30-day slices + force-full), THE Backend_API SHALL validate the configuration and return a CLI command preview
2. WHEN the task is submitted to the execution queue, THE Backend_API SHALL create a queued task and start executing it automatically
3. WHILE the ETL subprocess is running, THE Backend_API SHALL stream real-time logs over WebSocket
4. WHEN the ETL subprocess finishes, THE Backend_API SHALL record the exit code, execution duration, and full logs to task_execution_log

### Requirement 3: Execution monitoring and error handling

**User story:** As a developer, I want to monitor execution status in real time, so that errors and warnings are detected promptly and can be debugged.

#### Acceptance Criteria

1. WHILE the task is running, THE monitoring script SHALL poll execution status and logs every 30 seconds
2. WHEN ERROR- or WARNING-level messages appear in the logs, THE monitoring script SHALL record and flag them immediately
3. WHEN the task finishes (success or failure), THE monitoring script SHALL stop polling and collect the final status
4. IF the task runs for more than 30 minutes with no new log output, THEN THE monitoring script SHALL report a timeout warning
5. IF the task fails, THEN THE monitoring script SHALL collect the full stderr and error context

### Requirement 4: Performance timing and bottleneck analysis

**User story:** As a developer, I want fine-grained execution timing data, so that performance bottlenecks can be found.

#### Acceptance Criteria

1. WHEN the task finishes, THE report SHALL include the total execution duration
2. WHEN the logs contain stage timestamps, THE report SHALL parse and present the duration of each window slice
3. THE report SHALL flag the Top-5 slowest stages/tasks
4. THE report SHALL include a per-slice (30-day) duration comparison

### Requirement 5: Integration report output

**User story:** As a developer, I want a comprehensive report after the integration run, covering execution status, performance data, and any DEBUG findings.

#### Acceptance Criteria

1. THE report SHALL include: an execution summary (task parameters, start/end time, total duration, exit code)
2. THE report SHALL include: a performance report (per-stage durations, per-slice comparison, Top-5 bottlenecks)
3. IF errors or warnings occurred during execution, THEN THE report SHALL include a DEBUG report (error summary, relevant log excerpts, probable root-cause analysis)
4. THE report SHALL be written to the directory specified by the `SYSTEM_LOG_ROOT` environment variable
86
.kiro/specs/etl-fullstack-integration/tasks.md
Normal file
@@ -0,0 +1,86 @@
# Implementation Plan: ETL Full-Pipeline Frontend/Backend Integration (etl-fullstack-integration)

## Overview

Based on the completed `admin-web-console` frontend and backend code, run an end-to-end integration pass. Submit an api_full full-pipeline ETL task via the backend API (custom window 2025-11-01~2026-02-20, 30-day slices, force-full, all common tasks), monitor execution in real time, collect performance data, and finally generate a comprehensive report.

## Tasks

- [ ] 1. Service startup and health checks
  - [ ] 1.1 Start the backend service (`apps/backend/`, uvicorn :8000) and confirm the API is reachable
    - Run `uvicorn app.main:app --host 0.0.0.0 --port 8000` with cwd `apps/backend/`
    - Verify `GET /api/tasks/flows` returns 200
    - _Requirements: 1.1_

  - [ ] 1.2 Start the frontend service (`apps/admin-web/`, pnpm dev :5173) and confirm the page is reachable
    - Run `pnpm dev` with cwd `apps/admin-web/`
    - Verify `http://localhost:5173` is reachable
    - _Requirements: 1.2_

  - [ ] 1.3 API health check: log in for a JWT, verify the task registry, run sync-check
    - `POST /api/auth/login` with the default admin account (admin / admin123) to obtain a JWT
    - `GET /api/tasks/registry` and confirm a non-empty task list
    - `GET /api/tasks/sync-check` and confirm `in_sync=true` (backend registry matches the real ETL registry)
    - If sync-check disagrees, record the differences and report them to the user
    - _Requirements: 1.3, 1.4, 1.5_

- [ ] 2. Full-pipeline task submission
  - [ ] 2.1 Build the TaskConfig and call validate to preview the CLI command
    - Build the TaskConfig: flow=api_full, processing_mode=full_window, window_mode=custom, window_start="2025-11-01 00:00", window_end="2026-02-20 00:00", window_split=day, window_split_days=30, force_full=True, tasks=all with is_common=True
    - Call `POST /api/tasks/validate` to confirm the configuration is valid
    - Record the generated CLI command preview and confirm the parameters are complete (--flow api_full --processing-mode full_window --window-start ... --window-end ... --window-split day --window-split-days 30 --force-full --store-id 1 --tasks ...)
    - _Requirements: 2.1_

  - [ ] 2.2 Submit for execution (`POST /api/execution/run`) and record the execution_id
    - Call `POST /api/execution/run` to submit the task
    - Record the returned execution_id
    - Confirm the task status becomes running
    - _Requirements: 2.2, 2.4_

- [ ] 3. Execution monitoring and DEBUG
  - [ ] 3.1 Start the monitoring loop: poll status and logs every 30 seconds, detect errors/warnings, wait up to 30 minutes
    - Poll `GET /api/execution/queue` for task status
    - Poll `GET /api/execution/{id}/logs` for incremental logs
    - Detect ERROR / CRITICAL / Traceback / Exception / WARNING in the logs
    - Record the timestamp and incremental log line count of each poll
    - If there is no new log output for 30 consecutive minutes, report a timeout warning
    - Stop polling when the task finishes (success/failed/cancelled)
    - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5_

  - [ ] 3.2 DEBUG-analyze any errors/warnings found during execution
    - Collect every ERROR and WARNING log line plus context (5 lines before and after)
    - Classify error types: API timeout, database connection, data quality, configuration, etc.
    - If the task failed, capture the full stderr and analyze the root cause
    - Record the DEBUG findings in the report
    - _Requirements: 3.2, 3.5_

- [ ] 4. Performance timing and report generation
  - [ ] 4.1 Parse the execution logs and extract fine-grained timing data
    - Extract the start/end time of each (30-day) window slice from the logs
    - Compute each slice's duration
    - Identify the duration of each ODS / DWD / DWS / INDEX stage
    - Flag the Top-5 slowest stages/tasks
    - _Requirements: 4.1, 4.2, 4.3, 4.4_
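Slice-timing extraction for task 4.1 might look like the sketch below. The log line format (`YYYY-MM-DD HH:MM:SS ... slice <name> start|end`) is an assumption for illustration — the real ETL log layout is not specified in this plan, so the regex would need adjusting to match it.

```python
import re
from datetime import datetime

# Assumed log shape: "2026-02-21 10:00:00 INFO slice 2025-11-01..2025-12-01 start"
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*slice (?P<slice>\S+) (?P<ev>start|end)"
)

def slice_durations(lines):
    """Return {slice_name: seconds} from matching start/end log lines."""
    starts, out = {}, {}
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if m["ev"] == "start":
            starts[m["slice"]] = ts
        elif m["slice"] in starts:
            out[m["slice"]] = (ts - starts.pop(m["slice"])).total_seconds()
    return out

def top_n(durations, n=5):
    """Top-N slowest slices, matching the report's Top-5 requirement."""
    return sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Unmatched starts (a slice that never logged `end`) are simply dropped, which keeps a crashed run from producing bogus durations.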
  - [ ] 4.2 Generate the comprehensive integration report and write it to SYSTEM_LOG_ROOT
    - Report includes: execution summary (parameters, times, exit code)
    - Report includes: performance report (per-slice comparison, Top-5 bottlenecks)
    - Report includes: DEBUG report (if errors/warnings occurred)
    - Output path: `{SYSTEM_LOG_ROOT}/{date}__etl_integration_report.md`
    - The path is read from the `SYSTEM_LOG_ROOT` environment variable; raise an error if it is missing
    - _Requirements: 5.1, 5.2, 5.3, 5.4_

- [ ] 5. Service cleanup
  - [ ] 5.1 Stop the backend and frontend services and clean up resources
    - Stop the uvicorn process
    - Stop the pnpm dev process
    - Report the integration completion status

## Notes

- This spec is an ops integration task; no new feature code is developed
- No property or unit tests are written; the integration run is itself the verification
- All common tasks = every task in the registry with `is_common=True` (41 in total)
- "All stores": the system currently has only site_id=1 (bound to the default admin); multiple stores must be run one by one later
- The monitor tolerates idle waits; a timeout is reported only after 30 minutes with no new logs
- The report output path follows the export-paths convention and is read from the SYSTEM_LOG_ROOT environment variable
1
.kiro/specs/etl-staff-dimension/.config.kiro
Normal file
@@ -0,0 +1 @@
{"generationMode": "requirements-first"}
354
.kiro/specs/etl-staff-dimension/design.md
Normal file
@@ -0,0 +1,354 @@
# Design Document: ETL Staff Dimension Table (staff_info)

## Overview

Add a staff dimension table to the feiqiu ETL connector: fetch the pool hall's full staff roster (store managers, supervisors, coaches, cashiers, assistant admins, etc.) from the `SearchSystemStaffInfo` API, land it in the ODS layer, then clean and load it into DWD. The staff table and the existing assistant table (`assistant_accounts_master`) are fully independent entities.

## API Response Structure

```json
{
  "data": {
    "total": 15,
    "staffProfiles": [
      {
        "id": 3020236636900101,
        "cashierPointId": 2790685415443270,
        "cashierPointName": "默认",
        "job_num": "",
        "staff_name": "葛芃",
        "mobile": "13811638071",
        "auth_code": "",
        "avatar": "",
        "create_time": "2025-12-24 00:03:37",
        "entry_time": "2025-12-23 08:00:00",
        "is_delete": 0,
        "leave_status": 0,
        "resign_time": "2225-12-24 00:03:37",
        "site_id": 2790685415443269,
        "staff_identity": 2,
        "status": 1,
        "system_role_id": 4,
        "system_user_id": 3020236636293893,
        "tenant_id": 2790683160709957,
        "tenant_org_id": 2790685415443269,
        "job": "店长",
        "shop_name": "朗朗桌球",
        "account_status": 1,
        "is_reserve": 1,
        "groupName": "",
        "groupId": 0,
        "alias_name": "葛芃",
        "staff_profile_id": 0,
        "site_label": "",
        "rank_id": -1,
        "ding_talk_synced": 1,
        "new_rank_id": 0,
        "new_staff_identity": 0,
        "salary_grant_enabled": 2,
        "rankName": "无职级",
        "entry_type": 1,
        "userRoles": [],
        "entry_sign_status": 0,
        "resign_sign_status": 0,
        "criticism_status": 1,
        "gender": 3
      }
    ]
  },
  "code": 0
}
```

## 1. ODS Layer Design

### 1.1 ODS Task Spec

```python
OdsTaskSpec(
    code="ODS_STAFF_INFO",
    class_name="OdsStaffInfoTask",
    table_name="ods.staff_info_master",
    endpoint="/PersonnelManagement/SearchSystemStaffInfo",
    data_path=("data",),
    list_key="staffProfiles",
    pk_columns=(_int_col("id", "id", required=True),),
    extra_params={
        "workStatusEnum": 0,
        "dingTalkSynced": 0,
        "staffIdentity": 0,
        "rankId": 0,
        "criticismStatus": 0,
        "signStatus": -1,
    },
    include_source_endpoint=False,
    include_fetched_at=False,
    include_record_index=True,
    requires_window=False,
    time_fields=None,
    snapshot_mode=SnapshotMode.FULL_TABLE,
    description="员工档案 ODS:SearchSystemStaffInfo -> staffProfiles 原始 JSON",
)
```

### 1.2 ODS DDL: `ods.staff_info_master`

```sql
CREATE TABLE ods.staff_info_master (
    id                   BIGINT NOT NULL,
    tenant_id            BIGINT,
    site_id              BIGINT,
    tenant_org_id        BIGINT,
    system_user_id       BIGINT,
    staff_name           TEXT,
    alias_name           TEXT,
    mobile               TEXT,
    avatar               TEXT,
    gender               INTEGER,
    job                  TEXT,
    job_num              TEXT,
    staff_identity       INTEGER,
    status               INTEGER,
    account_status       INTEGER,
    system_role_id       INTEGER,
    rank_id              INTEGER,
    rank_name            TEXT,
    new_rank_id          INTEGER,
    new_staff_identity   INTEGER,
    leave_status         INTEGER,
    entry_time           TIMESTAMP WITHOUT TIME ZONE,
    resign_time          TIMESTAMP WITHOUT TIME ZONE,
    create_time          TIMESTAMP WITHOUT TIME ZONE,
    is_delete            INTEGER,
    is_reserve           INTEGER,
    shop_name            TEXT,
    site_label           TEXT,
    cashier_point_id     BIGINT,
    cashier_point_name   TEXT,
    group_id             BIGINT,
    group_name           TEXT,
    staff_profile_id     BIGINT,
    auth_code            TEXT,
    auth_code_create     TIMESTAMP WITHOUT TIME ZONE,
    ding_talk_synced     INTEGER,
    salary_grant_enabled INTEGER,
    entry_type           INTEGER,
    entry_sign_status    INTEGER,
    resign_sign_status   INTEGER,
    criticism_status     INTEGER,
    user_roles           JSONB,
    -- ETL metadata
    content_hash         TEXT NOT NULL,
    source_file          TEXT,
    fetched_at           TIMESTAMP WITH TIME ZONE DEFAULT now(),
    payload              JSONB NOT NULL
);

COMMENT ON TABLE ods.staff_info_master IS '员工档案主数据(来源:SearchSystemStaffInfo API)';
```

### 1.3 ODS Column-Name Mapping Notes

CamelCase fields returned by the API are normalized to snake_case in the ODS layer (handled automatically by BaseOdsTask):
- `cashierPointId` → `cashier_point_id`
- `cashierPointName` → `cashier_point_name`
- `staffName` / `staff_name` → `staff_name` (already snake_case in the API)
- `systemUserId` / `system_user_id` → `system_user_id`
- `tenantOrgId` / `tenant_org_id` → `tenant_org_id`
- `groupName` → `group_name` (note: the API returns camelCase `groupName`)
- `groupId` → `group_id` (the API returns camelCase `groupId`)
- `rankName` → `rank_name` (the API returns camelCase `rankName`)
- `userRoles` → `user_roles` (array, stored as JSONB)
- `authCodeCreate` / `auth_code_create` → `auth_code_create`
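The camelCase → snake_case normalization described in this subsection can be illustrated with a stand-in converter. The real conversion lives in BaseOdsTask (not shown in this spec), and note that the mapping table in §2.4 uses plain lower-casing instead, so treat this as an illustration of the convention here, not the authoritative implementation.

```python
import re

def to_snake(name: str) -> str:
    """Insert an underscore before each upper-case letter that follows a
    lower-case letter or digit, then lower-case the whole name."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()

def normalize_record(record: dict) -> dict:
    """Normalize every key of one staffProfiles record, values untouched."""
    return {to_snake(k): v for k, v in record.items()}
```

Names that are already snake_case (`staff_name`, `auth_code_create`) pass through unchanged, matching the bullets above.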
## 2. DWD Layer Design

### 2.1 Main Table DDL: `dwd.dim_staff`

Core business fields for high-frequency queries.

```sql
CREATE TABLE dwd.dim_staff (
    staff_id         BIGINT NOT NULL,
    staff_name       TEXT,
    alias_name       TEXT,
    mobile           TEXT,
    gender           INTEGER,
    job              TEXT,
    tenant_id        BIGINT,
    site_id          BIGINT,
    system_role_id   INTEGER,
    staff_identity   INTEGER,
    status           INTEGER,
    leave_status     INTEGER,
    entry_time       TIMESTAMP WITH TIME ZONE,
    resign_time      TIMESTAMP WITH TIME ZONE,
    is_delete        INTEGER,
    -- SCD2
    scd2_start_time  TIMESTAMP WITH TIME ZONE NOT NULL,
    scd2_end_time    TIMESTAMP WITH TIME ZONE,
    scd2_is_current  INTEGER,
    scd2_version     INTEGER,
    PRIMARY KEY (staff_id, scd2_start_time)
);

COMMENT ON TABLE dwd.dim_staff IS '员工档案维度主表(SCD2)';
```

### 2.2 Extension Table DDL: `dwd.dim_staff_ex`

Secondary / rarely changing fields.

```sql
CREATE TABLE dwd.dim_staff_ex (
    staff_id             BIGINT NOT NULL,
    avatar               TEXT,
    job_num              TEXT,
    account_status       INTEGER,
    rank_id              INTEGER,
    rank_name            TEXT,
    new_rank_id          INTEGER,
    new_staff_identity   INTEGER,
    is_reserve           INTEGER,
    shop_name            TEXT,
    site_label           TEXT,
    tenant_org_id        BIGINT,
    system_user_id       BIGINT,
    cashier_point_id     BIGINT,
    cashier_point_name   TEXT,
    group_id             BIGINT,
    group_name           TEXT,
    staff_profile_id     BIGINT,
    auth_code            TEXT,
    auth_code_create     TIMESTAMP WITH TIME ZONE,
    ding_talk_synced     INTEGER,
    salary_grant_enabled INTEGER,
    entry_type           INTEGER,
    entry_sign_status    INTEGER,
    resign_sign_status   INTEGER,
    criticism_status     INTEGER,
    create_time          TIMESTAMP WITH TIME ZONE,
    user_roles           JSONB,
    -- SCD2
    scd2_start_time      TIMESTAMP WITH TIME ZONE NOT NULL,
    scd2_end_time        TIMESTAMP WITH TIME ZONE,
    scd2_is_current      INTEGER,
    scd2_version         INTEGER,
    PRIMARY KEY (staff_id, scd2_start_time)
);

COMMENT ON TABLE dwd.dim_staff_ex IS '员工档案维度扩展表(SCD2)';
```

### 2.3 TABLE_MAP Mapping

```python
# Add to DwdLoadTask.TABLE_MAP:
"dwd.dim_staff": "ods.staff_info_master",
"dwd.dim_staff_ex": "ods.staff_info_master",
```

### 2.4 FACT_MAPPINGS Column Mapping

```python
# dim_staff main-table mapping
"dwd.dim_staff": [
    ("staff_id", "id", None),
    ("entry_time", "entry_time", "timestamptz"),
    ("resign_time", "resign_time", "timestamptz"),
],
# dim_staff_ex extension-table mapping
"dwd.dim_staff_ex": [
    ("staff_id", "id", None),
    ("rank_name", "rankname", None),
    ("cashier_point_id", "cashierpointid", "bigint"),
    ("cashier_point_name", "cashierpointname", None),
    ("group_id", "groupid", "bigint"),
    ("group_name", "groupname", None),
    ("system_user_id", "systemuserid", "bigint"),
    ("tenant_org_id", "tenantorgid", "bigint"),
    ("auth_code_create", "auth_code_create", "timestamptz"),
    ("create_time", "create_time", "timestamptz"),
    ("user_roles", "userroles", "jsonb"),
],
```

Notes:
- The ODS column names here are auto-derived by BaseOdsTask from the API's camelCase (e.g. `cashierPointId` → `cashierpointid`; note that in this mapping the PG column names are all-lowercase with no underscores)
- DWD main-table columns that share names with ODS (`staff_name`, `alias_name`, `mobile`, etc.) map automatically and need no explicit configuration
- `staff_id` maps from the ODS `id` column

## 3. Data Flow Overview

```
API: SearchSystemStaffInfo
  ↓ (POST, paginated, extra_params filters)
ODS: ods.staff_info_master
  ↓ (SCD2 merge, FULL_TABLE snapshot)
DWD: dwd.dim_staff + dwd.dim_staff_ex
```

## 4. Test Framework

- Test framework: `pytest` + `hypothesis`
- Unit tests use `FakeDB` / `FakeAPI` (`tests/unit/task_test_utils.py`)

## 5. Correctness Properties

### P1: ODS task-spec completeness
For the `ODS_STAFF_INFO` task spec, the following must hold:
- `code == "ODS_STAFF_INFO"`
- `table_name == "ods.staff_info_master"`
- `endpoint == "/PersonnelManagement/SearchSystemStaffInfo"`
- `list_key == "staffProfiles"`
- `snapshot_mode == SnapshotMode.FULL_TABLE`
- `requires_window == False`
- `time_fields is None`
- `"staffProfiles"` is present in `DEFAULT_LIST_KEYS`
- `"ODS_STAFF_INFO"` is present in `ENABLED_ODS_CODES`

Verification: direct unit-test assertions

### P2: DWD mapping completeness
For the DWD load configuration, the following must hold:
- `TABLE_MAP["dwd.dim_staff"] == "ods.staff_info_master"`
- `TABLE_MAP["dwd.dim_staff_ex"] == "ods.staff_info_master"`
- `FACT_MAPPINGS["dwd.dim_staff"]` contains the `staff_id` → `id` mapping
- `FACT_MAPPINGS["dwd.dim_staff_ex"]` contains the `staff_id` → `id` mapping

Verification: direct unit-test assertions

### P3: ODS column-name extraction consistency (property test)
For any staff record returned by the API (with mixed camelCase and snake_case field names), after BaseOdsTask processing:
- All field names are converted to lower-case snake_case
- The `id` field is non-null and a positive integer
- The `payload` field contains the complete raw JSON

Verification: hypothesis property test generating random staff records and checking conversion consistency
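In the real suite P3 would be driven by hypothesis; the dependency-free sketch below exercises the same name-normalization part of the property with stdlib `random`, using a local stand-in for BaseOdsTask's conversion (the payload-retention bullet is not covered here).

```python
import random
import re
import string

def to_snake(name: str) -> str:
    # stand-in for BaseOdsTask's column-name normalization
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()

def random_field(rng):
    """Generate a field name in either camelCase or snake_case."""
    words = ["".join(rng.choices(string.ascii_lowercase, k=rng.randint(2, 6)))
             for _ in range(rng.randint(1, 3))]
    if rng.random() < 0.5:
        return words[0] + "".join(w.capitalize() for w in words[1:])
    return "_".join(words)

rng = random.Random(0)
for _ in range(100):  # at least 100 iterations, matching the project convention
    record = {random_field(rng): 1 for _ in range(5)}
    record["id"] = rng.randint(1, 10**9)
    out = {to_snake(k): v for k, v in record.items()}
    assert all(k == k.lower() for k in out)   # all keys lower-case
    assert out["id"] > 0                      # id preserved and positive
```

A hypothesis version would replace `random_field` with a `@given` strategy over the same two name shapes.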
## 6. File Change List

### Code changes
| File | Change | Description |
|------|--------|-------------|
| `apps/etl/connectors/feiqiu/api/client.py` | modify | add `"staffProfiles"` to `DEFAULT_LIST_KEYS` |
| `apps/etl/connectors/feiqiu/tasks/ods/ods_tasks.py` | modify | add the `ODS_STAFF_INFO` task spec + register it in `ENABLED_ODS_CODES` |
| `apps/etl/connectors/feiqiu/tasks/dwd/dwd_load_task.py` | modify | add dim_staff/dim_staff_ex mappings to `TABLE_MAP` and `FACT_MAPPINGS` |

### DDL / migrations
| File | Change | Description |
|------|--------|-------------|
| `db/etl_feiqiu/migrations/2026-02-22__add_staff_info_tables.sql` | add | ODS + DWD table-creation migration script |
| `docs/database/ddl/etl_feiqiu__ods.sql` | modify | append the `ods.staff_info_master` DDL |
| `docs/database/ddl/etl_feiqiu__dwd.sql` | modify | append the `dwd.dim_staff` + `dwd.dim_staff_ex` DDL |

### Documentation
| File | Change | Description |
|------|--------|-------------|
| `apps/etl/connectors/feiqiu/docs/database/ODS/mappings/mapping_SearchSystemStaffInfo_staff_info_master.md` | add | API→ODS column-mapping doc |
| `apps/etl/connectors/feiqiu/docs/database/ODS/main/BD_manual_staff_info_master.md` | add | ODS table BD manual |
| `apps/etl/connectors/feiqiu/docs/database/DWD/main/BD_manual_dim_staff.md` | add | DWD main-table BD manual |
| `apps/etl/connectors/feiqiu/docs/database/DWD/main/BD_manual_dim_staff_ex.md` | add | DWD extension-table BD manual |
| `apps/etl/connectors/feiqiu/docs/database/README.md` | modify | add staff-table entries |
| `apps/etl/connectors/feiqiu/docs/etl_tasks/ods_tasks.md` | modify | add the ODS_STAFF_INFO task description |
| `docs/database/README.md` | modify | add staff-related table entries |
82
.kiro/specs/etl-staff-dimension/requirements.md
Normal file
@@ -0,0 +1,82 @@
# Requirements Document: ETL Staff Dimension Table (staff_info)

## Overview

Add a "staff" dimension table to the feiqiu ETL connector: fetch pool-hall staff data from the upstream `SearchSystemStaffInfo` API, land it in the ODS layer, then clean and load it into the DWD layer as an SCD2 slowly changing dimension.

## User Stories

### US-1: API integration and ODS landing
As an ETL developer, I need the `SearchSystemStaffInfo` API wired into the existing API request framework with its raw data landed in the ODS layer, so that it can be cleaned downstream.

Acceptance criteria:
- 1.1 Add an `ODS_STAFF_INFO` task spec to `ODS_TASK_SPECS` with endpoint `/PersonnelManagement/SearchSystemStaffInfo`
- 1.2 The request body carries the necessary filter parameters (`workStatusEnum`, `staffIdentity`, etc.) and uses the existing `APIClient.iter_paginated` pagination mechanism
- 1.3 The ODS table `ods.staff_info_master` contains every business field returned by the API plus the ETL metadata columns (`content_hash`, `source_file`, `fetched_at`, `payload`)
- 1.4 The task is configured with `snapshot_mode=FULL_TABLE` (full snapshot, no time window) and `requires_window=False`
- 1.5 Add `staffProfiles` (the list key of the API response) to `DEFAULT_LIST_KEYS`
- 1.6 Register `ODS_STAFF_INFO` in `ENABLED_ODS_CODES`

### US-2: DWD dimension design and SCD2 load
As a data analyst, I need a cleaned staff dimension table, so that staff attributes can be joined in the DWS summary layer.

Acceptance criteria:
- 2.1 Create the `dwd.dim_staff` main table with the core business fields (staff ID, name, mobile, role, store, employment status, etc.) plus the SCD2 columns
- 2.2 Create the `dwd.dim_staff_ex` extension table with secondary/rarely changing fields plus the SCD2 columns
- 2.3 Register the `dwd.dim_staff` → `ods.staff_info_master` and `dwd.dim_staff_ex` → `ods.staff_info_master` mappings in `DwdLoadTask.TABLE_MAP`
- 2.4 Configure the column mappings in `DwdLoadTask.FACT_MAPPINGS` (ODS column → DWD column, with any needed type casts)
- 2.5 The DWD load goes through the SCD2 merge path with working change detection

### US-3: DDL creation and archiving
As a DBA, I need the ODS and DWD DDL correctly created and archived in the project docs.

Acceptance criteria:
- 3.1 Write the ODS DDL (`ods.staff_info_master`) and execute it on the test database
- 3.2 Write the DWD DDL (`dwd.dim_staff`, `dwd.dim_staff_ex`) and execute it on the test database
- 3.3 Archive the DDL in `docs/database/ddl/etl_feiqiu__ods.sql` and `docs/database/ddl/etl_feiqiu__dwd.sql`
- 3.4 Write a migration script under `db/etl_feiqiu/migrations/` (date-prefixed)

### US-4: Documentation updates
As a team member, I need all related docs updated in sync, so that the new data flow is understandable.

Acceptance criteria:
- 4.1 Add the ODS mapping doc: `apps/etl/connectors/feiqiu/docs/database/ODS/mappings/mapping_SearchSystemStaffInfo_staff_info_master.md`
- 4.2 Add the ODS BD_manual doc: `apps/etl/connectors/feiqiu/docs/database/ODS/main/BD_manual_staff_info_master.md`
- 4.3 Add the DWD BD_manual doc: `apps/etl/connectors/feiqiu/docs/database/DWD/main/BD_manual_dim_staff.md`
- 4.4 Update `apps/etl/connectors/feiqiu/docs/database/README.md` with staff-table entries
- 4.5 Update `apps/etl/connectors/feiqiu/docs/etl_tasks/ods_tasks.md` with the ODS_STAFF_INFO task description
- 4.6 Update `docs/database/README.md` with staff-related table entries
- 4.7 Add the DWD BD_manual extension-table doc: `apps/etl/connectors/feiqiu/docs/database/DWD/main/BD_manual_dim_staff_ex.md` (if an extension table exists)

## API Information

| Property | Value |
|----------|-------|
| URL | `https://pc.ficoo.vip/apiprod/admin/v1/PersonnelManagement/SearchSystemStaffInfo` |
| Method | POST |
| Endpoint path | `/PersonnelManagement/SearchSystemStaffInfo` |
| Auth | Bearer token (standard feiqiu API auth) |
| Pagination | `page` + `limit` (same as existing endpoints) |

### Request Body Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| workStatusEnum | int | 0 | employment-status filter (0 = all) |
| dingTalkSynced | int | 0 | DingTalk sync status (0 = all) |
| staffIdentity | int | 0 | staff-identity filter (0 = all) |
| rankId | int | 0 | rank filter (0 = all) |
| criticismStatus | int | 0 | criticism status (0 = all) |
| signStatus | int | -1 | signing status (-1 = all) |
| page | int | 1 | page number |
| limit | int | 50 | page size |
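The page/limit pagination described above can be sketched with an injected transport. The real client goes through `APIClient.iter_paginated`, so this generator is only an illustration; the `post` callable is a placeholder, while the request-body defaults and the `data.staffProfiles` / `data.total` response shape follow the tables in this document.

```python
def iter_staff_profiles(post, limit=50, **filters):
    """Yield staffProfiles records page by page until `total` is exhausted.

    `post(endpoint, body)` is an injected callable returning the parsed
    JSON response; in the real system this is the authenticated API client.
    """
    page, seen = 1, 0
    while True:
        body = {"page": page, "limit": limit,
                "workStatusEnum": 0, "dingTalkSynced": 0, "staffIdentity": 0,
                "rankId": 0, "criticismStatus": 0, "signStatus": -1, **filters}
        data = post("/PersonnelManagement/SearchSystemStaffInfo", body)["data"]
        rows = data.get("staffProfiles", [])
        yield from rows
        seen += len(rows)
        if not rows or seen >= data["total"]:
            return
        page += 1
```

Stopping on an empty page as well as on `seen >= total` guards against a stale `total` while staff records are being added or removed mid-crawl.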
## Technical Constraints

- The staff table is a dimension table; the DWD layer uses SCD2
- ODS uses `SnapshotMode.FULL_TABLE` (full snapshot with soft delete)
- No time window is needed (`requires_window=False`, `time_fields=None`)
- The primary key is `id` (the staff ID)
- The API response structure is confirmed: `data.staffProfiles` is the list key, `data.total` the record count
- The staff table and the assistant table (`assistant_accounts_master`) are fully independent entities
- The DWD layer is split into a main table (`dim_staff`) plus an extension table (`dim_staff_ex`)
33
.kiro/specs/etl-staff-dimension/tasks.md
Normal file
@@ -0,0 +1,33 @@
# Task List: ETL Staff Dimension Table (staff_info)

## Tasks

- [x] 1. DDL creation and database execution
  - [x] 1.1 Write the migration script `db/etl_feiqiu/migrations/2026-02-22__add_staff_info_tables.sql` with CREATE TABLE statements for the three ODS + DWD tables
  - [x] 1.2 Run the migration on the test database (test_etl_feiqiu), creating `ods.staff_info_master`, `dwd.dim_staff`, `dwd.dim_staff_ex`
  - [x] 1.3 Append the DDL to the archives `docs/database/ddl/etl_feiqiu__ods.sql` and `docs/database/ddl/etl_feiqiu__dwd.sql`

- [x] 2. ODS task registration
  - [x] 2.1 Add `"staffProfiles"` to `DEFAULT_LIST_KEYS` in `apps/etl/connectors/feiqiu/api/client.py`
  - [x] 2.2 Add the `ODS_STAFF_INFO` task spec to `ODS_TASK_SPECS` in `apps/etl/connectors/feiqiu/tasks/ods/ods_tasks.py`
  - [x] 2.3 Register `"ODS_STAFF_INFO"` in the `ENABLED_ODS_CODES` set

- [x] 3. DWD mapping registration
  - [x] 3.1 Add the `dwd.dim_staff` and `dwd.dim_staff_ex` mappings to `TABLE_MAP` in `apps/etl/connectors/feiqiu/tasks/dwd/dwd_load_task.py`
  - [x] 3.2 Add the `dwd.dim_staff` and `dwd.dim_staff_ex` column mappings to `FACT_MAPPINGS`

- [x] 4. Unit tests
  - [x] 4.1 Write the ODS task-spec completeness test (verifies property P1)
  - [x] 4.2 Write the DWD mapping completeness test (verifies property P2)

- [x] 5. Property tests
  - [x] 5.1 [PBT] Write the ODS column-name extraction consistency property test (verifies property P3): for any staff record, name conversion and payload retention are correct

- [x] 6. Documentation
  - [x] 6.1 Add the ODS mapping doc: `apps/etl/connectors/feiqiu/docs/database/ODS/mappings/mapping_SearchSystemStaffInfo_staff_info_master.md`
  - [x] 6.2 Add the ODS BD_manual doc: `apps/etl/connectors/feiqiu/docs/database/ODS/main/BD_manual_staff_info_master.md`
  - [x] 6.3 Add the DWD BD_manual main-table doc: `apps/etl/connectors/feiqiu/docs/database/DWD/main/BD_manual_dim_staff.md`
  - [x] 6.4 Add the DWD BD_manual extension-table doc: `apps/etl/connectors/feiqiu/docs/database/DWD/main/BD_manual_dim_staff_ex.md`
  - [x] 6.5 Update `apps/etl/connectors/feiqiu/docs/database/README.md` with staff-table entries
  - [x] 6.6 Update `apps/etl/connectors/feiqiu/docs/etl_tasks/ods_tasks.md` with the ODS_STAFF_INFO task description
  - [x] 6.7 Update `docs/database/README.md` with staff-related table entries
1  .kiro/specs/spi-spending-power-index/.config.kiro  Normal file
@@ -0,0 +1 @@
{"generationMode": "requirements-first"}
395  .kiro/specs/spi-spending-power-index/design.md  Normal file
@@ -0,0 +1,395 @@
# Design document: SPI (Spending Power Index)

## Overview

SPI is the 7th index in the NeoZQYY index system (after WBI/NCI/RS/OS/MS/ML). Its grain is `(site_id, member_id)`, and it measures a member's overall spending-power tier within a site.

SPI uses a "main score + sub-score" structure:
- Level (spending level): log1p-compressed weighted combination of spend amounts and average ticket
- Speed (spending velocity): weighted combination of absolute, relative, and EWMA velocity
- Stability (spending stability): weekly coverage over the last 90 days

SPI does not inherit `MemberIndexBaseTask` (that base class carries the NEW/OLD/STOP member-segmentation logic shared by WBI/NCI, which SPI does not need); it inherits `BaseIndexTask` directly and implements its own data-extraction and scoring logic.

### Design decisions

1. **Inherit BaseIndexTask rather than MemberIndexBaseTask**: SPI needs no member segmentation (NEW/OLD/STOP); every member with spending records participates. The features extracted by `MemberIndexBaseTask._build_member_activity` (intervals, t_v/t_r/t_a, etc.) do not match SPI's needs, so reuse would only add coupling.
2. **Independent data extraction**: SPI needs weekly aggregation, daily spend series, and other features that `MemberIndexBaseTask` does not provide, so it writes its own SQL extraction logic.
3. **Auto-calibrated amount-compression bases**: on first execution, medians computed from the site's data serve as the bases; they can later be overridden manually via cfg_index_parameters.
4. **Independent sub-score mapping**: Level/Speed/Stability each run `batch_normalize_to_display` independently, using distinct index_type suffixes (SPI_LEVEL/SPI_SPEED/SPI_STABILITY) to isolate percentile history.

## Architecture
```mermaid
graph TD
    subgraph SRC["Data sources"]
        A[dwd.dwd_settlement_head<br/>settlement orders]
        B[dwd.dwd_recharge_order<br/>recharge orders]
    end

    subgraph TASK["SpendingPowerIndexTask"]
        C[extract_spending_features<br/>extract base features]
        D[calculate_level<br/>Level sub-score]
        E[calculate_speed<br/>Speed sub-score]
        F[calculate_stability<br/>Stability sub-score]
        G[calculate_spi_raw<br/>SPI total score]
        H[batch_normalize_to_display<br/>display-score mapping]
    end

    subgraph CFG["Configuration"]
        I[cfg_index_parameters<br/>index_type='SPI']
    end

    subgraph OUT["Output"]
        J[dws.dws_member_spending_power_index]
        K[dws.dws_index_percentile_history]
    end

    A --> C
    B --> C
    I --> C
    I --> D
    I --> E
    I --> F
    I --> G
    C --> D
    C --> E
    C --> F
    D --> G
    E --> G
    F --> G
    G --> H
    H --> J
    H --> K
```
### Inheritance hierarchy

```
BaseTask
└── BaseDwsTask
    └── BaseIndexTask
        ├── MemberIndexBaseTask    ← WBI / NCI (not used by SPI)
        ├── RelationIndexTask      ← RS/OS/MS/ML
        ├── MlManualImportTask     ← ML ledger import
        └── SpendingPowerIndexTask ← SPI (new)
```
## Components and interfaces

### SpendingPowerIndexTask

Inherits `BaseIndexTask` and implements the following interface:

```python
class SpendingPowerIndexTask(BaseIndexTask):
    INDEX_TYPE = "SPI"

    DEFAULT_PARAMS = {
        # Window parameters
        'spend_window_short_days': 30,
        'spend_window_long_days': 90,
        'ewma_alpha_daily_spend': 0.3,
        # Amount-compression bases (initial defaults; may be overridden
        # by auto-calibration or by the config table)
        'amount_base_spend_30': 500.0,
        'amount_base_spend_90': 1500.0,
        'amount_base_ticket_90': 200.0,
        'amount_base_recharge_90': 1000.0,
        'amount_base_speed_abs': 100.0,
        'amount_base_ewma_90': 50.0,
        # Level sub-score weights
        'w_level_spend_30': 0.30,
        'w_level_spend_90': 0.35,
        'w_level_ticket_90': 0.20,
        'w_level_recharge_90': 0.15,
        # Speed sub-score weights
        'w_speed_abs': 0.50,
        'w_speed_rel': 0.30,
        'w_speed_ewma': 0.20,
        # Total-score weights
        'weight_level': 0.60,
        'weight_speed': 0.30,
        'weight_stability': 0.10,
        # Stability parameters
        'stability_window_days': 90,
        'use_stability': 1,
        # Mapping and smoothing
        'percentile_lower': 5,
        'percentile_upper': 95,
        'compression_mode': 1,  # log1p
        'use_smoothing': 1,
        'ewma_alpha': 0.2,
        # Speed calculation
        'speed_epsilon': 1e-6,
    }

    # --- Required abstract methods ---
    def get_task_code(self) -> str: ...
    def get_target_table(self) -> str: ...
    def get_primary_keys(self) -> List[str]: ...
    def get_index_type(self) -> str: ...

    # --- Core execution flow ---
    def execute(self, context=None) -> Dict[str, Any]: ...

    # --- Data extraction ---
    def _extract_spending_features(self, site_id, params) -> Dict[int, SPIMemberFeatures]: ...
    def _extract_recharge_features(self, site_id, params) -> Dict[int, RechargeFeatures]: ...
    def _calibrate_amount_bases(self, features, params) -> Dict[str, float]: ...

    # --- Sub-score computation (pure functions, independently testable) ---
    @staticmethod
    def compute_level(features, params) -> float: ...
    @staticmethod
    def compute_speed(features, params) -> float: ...
    @staticmethod
    def compute_stability(features, params) -> float: ...
    @staticmethod
    def compute_spi_raw(level, speed, stability, params) -> float: ...

    # --- Persistence ---
    def _save_spi_data(self, data_list, site_id) -> int: ...
```

### Key design point: sub-score computation as static methods

`compute_level`, `compute_speed`, `compute_stability`, and `compute_spi_raw` are designed as `@staticmethod`s that depend on neither the database nor task-instance state. Property tests can therefore call these pure functions directly, with no mocked database connection.
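To make the pure-function contract concrete, here is a minimal sketch of `compute_level` and `compute_spi_raw` as standalone functions. This is not the actual task code: parameter names follow `DEFAULT_PARAMS`, and the feature argument is flattened to plain floats for illustration.

```python
import math

def compute_level(spend_30, spend_90, avg_ticket_90, recharge_90, params):
    """Level sub-score: weighted sum of log1p-compressed amounts."""
    return (
        params['w_level_spend_30'] * math.log1p(spend_30 / params['amount_base_spend_30'])
        + params['w_level_spend_90'] * math.log1p(spend_90 / params['amount_base_spend_90'])
        + params['w_level_ticket_90'] * math.log1p(avg_ticket_90 / params['amount_base_ticket_90'])
        + params['w_level_recharge_90'] * math.log1p(recharge_90 / params['amount_base_recharge_90'])
    )

def compute_spi_raw(level, speed, stability, params):
    """Total score: SPI_raw = w_L * L + w_S * S + w_P * P."""
    return (
        params['weight_level'] * level
        + params['weight_speed'] * speed
        + params['weight_stability'] * stability
    )

params = {
    'w_level_spend_30': 0.30, 'w_level_spend_90': 0.35,
    'w_level_ticket_90': 0.20, 'w_level_recharge_90': 0.15,
    'amount_base_spend_30': 500.0, 'amount_base_spend_90': 1500.0,
    'amount_base_ticket_90': 200.0, 'amount_base_recharge_90': 1000.0,
    'weight_level': 0.60, 'weight_speed': 0.30, 'weight_stability': 0.10,
}

# All-zero input yields Level = 0.0, and non-negative inputs can never
# produce a negative SPI_raw (Property 1 below).
level = compute_level(0.0, 0.0, 0.0, 0.0, params)
print(level, compute_spi_raw(level, 0.0, 0.0, params))  # 0.0 0.0
```

Because both functions touch only their arguments, a hypothesis test can feed them arbitrary non-negative floats without any fixture setup.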
### The SPIMemberFeatures dataclass

```python
@dataclass
class SPIMemberFeatures:
    """Member-level features required for SPI computation."""
    member_id: int
    site_id: int

    # Base features
    spend_30: float = 0.0             # total spend, last 30 days
    spend_90: float = 0.0             # total spend, last 90 days
    recharge_90: float = 0.0          # total recharge, last 90 days
    orders_30: int = 0                # settlement count, last 30 days
    orders_90: int = 0                # settlement count, last 90 days
    visit_days_30: int = 0            # distinct spending days, last 30 days
    visit_days_90: int = 0            # distinct spending days, last 90 days
    avg_ticket_90: float = 0.0        # 90-day average ticket
    active_weeks_90: int = 0          # calendar weeks with spending, last 90 days
    daily_spend_ewma_90: float = 0.0  # EWMA of daily spend

    # Sub-scores
    score_level_raw: float = 0.0
    score_speed_raw: float = 0.0
    score_stability_raw: float = 0.0

    # Display sub-scores (filled after normalization)
    score_level_display: float = 0.0
    score_speed_display: float = 0.0
    score_stability_display: float = 0.0

    # Total scores
    raw_score: float = 0.0
    display_score: float = 0.0
```
## Data model

### dws.dws_member_spending_power_index structure

```sql
CREATE TABLE dws.dws_member_spending_power_index (
    spi_id BIGSERIAL PRIMARY KEY,
    site_id INTEGER NOT NULL,
    member_id BIGINT NOT NULL,

    -- Base features
    spend_30 NUMERIC(14,2) DEFAULT 0,
    spend_90 NUMERIC(14,2) DEFAULT 0,
    recharge_90 NUMERIC(14,2) DEFAULT 0,
    orders_30 INTEGER DEFAULT 0,
    orders_90 INTEGER DEFAULT 0,
    visit_days_30 INTEGER DEFAULT 0,
    visit_days_90 INTEGER DEFAULT 0,
    avg_ticket_90 NUMERIC(14,2) DEFAULT 0,
    active_weeks_90 INTEGER DEFAULT 0,
    daily_spend_ewma_90 NUMERIC(14,2) DEFAULT 0,

    -- Sub-scores (raw)
    score_level_raw NUMERIC(10,4) DEFAULT 0,
    score_speed_raw NUMERIC(10,4) DEFAULT 0,
    score_stability_raw NUMERIC(10,4) DEFAULT 0,

    -- Sub-scores (display, 0-10)
    score_level_display NUMERIC(5,2) DEFAULT 0,
    score_speed_display NUMERIC(5,2) DEFAULT 0,
    score_stability_display NUMERIC(5,2) DEFAULT 0,

    -- Total scores
    raw_score NUMERIC(10,4) DEFAULT 0,
    display_score NUMERIC(5,2) DEFAULT 0,

    -- Metadata
    calc_time TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Unique constraint (business key)
CREATE UNIQUE INDEX idx_spi_site_member
    ON dws.dws_member_spending_power_index (site_id, member_id);

-- Query index
CREATE INDEX idx_spi_display_score
    ON dws.dws_member_spending_power_index (site_id, display_score DESC);
```
### New seed data for cfg_index_parameters

Append `index_type='SPI'` parameter rows to `db/etl_feiqiu/seeds/seed_index_parameters.sql`, in the same format as the existing WBI/NCI parameters.
### Execution flow

```mermaid
sequenceDiagram
    participant Scheduler
    participant Task as SpendingPowerIndexTask
    participant DB as PostgreSQL
    participant Base as BaseIndexTask

    Scheduler->>Task: execute(context)
    Task->>DB: resolve site_id
    Task->>Base: load_index_parameters('SPI')
    Base->>DB: SELECT FROM cfg_index_parameters
    Base-->>Task: params dict

    Task->>DB: extract settlement orders (last 90 days)
    Task->>DB: extract recharge orders (last 90 days)
    Task->>Task: aggregate member-level features
    Task->>Task: calibrate amount bases (if needed)

    loop per member
        Task->>Task: compute_level(features, params)
        Task->>Task: compute_speed(features, params)
        Task->>Task: compute_stability(features, params)
        Task->>Task: compute_spi_raw(L, S, P, params)
    end

    Task->>Base: batch_normalize_to_display(SPI raw scores)
    Task->>Base: batch_normalize_to_display(Level raw scores)
    Task->>Base: batch_normalize_to_display(Speed raw scores)
    Task->>Base: batch_normalize_to_display(Stability raw scores)

    Task->>DB: DELETE FROM dws_member_spending_power_index WHERE site_id = %s
    Task->>DB: INSERT INTO dws_member_spending_power_index (batch)
    Task->>Base: save_percentile_history('SPI')
    Task-->>Scheduler: {status, member_count, records_inserted}
```
## Correctness properties

*A correctness property is a behavioral characteristic that must hold on every legal execution path of the system — in essence, a formal statement of "what the system should do". Properties are the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

The properties below are derived from the acceptance criteria in the requirements document; each is a universally quantified proposition verifiable with hypothesis property tests. The sub-score functions (`compute_level`, `compute_speed`, `compute_stability`, `compute_spi_raw`) are pure static methods with no database dependency and can be used in property tests directly.

### Property 1: SPI total score is non-negative

*For any* non-negative Level, Speed, and Stability sub-scores and non-negative weight parameters, `compute_spi_raw(L, S, P, params)` returns a non-negative value.

Derivation: `SPI_raw = w_L × L + w_S × S + w_P × P`; when every sub-score ≥ 0 and every weight ≥ 0, the weighted sum is necessarily ≥ 0.

**Validates: Requirements 6.1, 10.1**

### Property 2: Level is monotonically non-decreasing in spend amounts

*For any* non-negative features and parameters, increasing only `spend_30` or `spend_90` (all other features fixed) must not decrease the return value of `compute_level`.

Derivation: every term of `L` has the form `w × ln(1 + x/M)`; `ln(1 + x/M)` is monotonically increasing in `x` (`x ≥ 0, M > 0`) and the weight `w ≥ 0`, so increasing any spend term can only increase `L` or leave it unchanged.

**Validates: Requirements 3.1, 10.2**
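A monotonicity check of this shape needs no task wiring at all. The sketch below uses a local stand-in for the two spend terms of the Level formula and random perturbations; the real suite uses hypothesis, as shown in the testing-strategy section.

```python
import math
import random

def level(spend_30, spend_90, params):
    # Stand-in for the two spend terms of the Level formula.
    return (params['w30'] * math.log1p(spend_30 / params['m30'])
            + params['w90'] * math.log1p(spend_90 / params['m90']))

params = {'w30': 0.30, 'w90': 0.35, 'm30': 500.0, 'm90': 1500.0}
rng = random.Random(42)
for _ in range(200):
    s30 = rng.uniform(0, 5000)
    s90 = rng.uniform(0, 20000)
    delta = rng.uniform(0, 1000)
    # Increasing spend_30 alone must never decrease the score.
    assert level(s30 + delta, s90, params) >= level(s30, s90, params)
print("monotonicity holds on 200 random samples")
```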
### Property 3: Speed is monotonically non-decreasing in spend_30

*For any* non-negative features and parameters, increasing only `spend_30` (all other features fixed) must not decrease the return value of `compute_speed`.

Derivation:
- `V_abs = ln(1 + spend_30 / (max(visit_days_30, 1) × V0))`: monotonically increasing in spend_30
- `V_rel = ln((spend_30/30 + ε) / (spend_90/90 + ε))`: increasing spend_30 grows the numerator, so `max(0, V_rel)` does not decrease
- `V_ewma`: independent of spend_30, unchanged
- In the weighted sum, the first two terms do not decrease and the third is unchanged, so the total does not decrease

**Validates: Requirements 4.1, 4.4, 10.3**

### Property 4: Stability lies in [0, 1]

*For any* `active_weeks_90` in [0, 13], `compute_stability` returns a value in [0, 1].

Derivation: `P = active_weeks_90 / 13`; for `active_weeks_90 ∈ {0, 1, ..., 13}`, `P ∈ [0, 1]`.

**Validates: Requirements 5.2, 5.4, 10.4**

### Property 5: Display score lies in [0, 10]

*For any* non-empty list of raw_scores (all values non-negative), after `batch_normalize_to_display` every display_score lies in [0.00, 10.00].

Derivation: `batch_normalize_to_display` first Winsorizes to [P5, P95], then MinMax-maps to [0, 10], and finally clamps with `max(0, min(10, score))`.

**Validates: Requirements 6.6, 10.5**
## Error handling

| Scenario | Handling |
|------|----------|
| Site has no settlement/recharge data | Return `{'status': 'skipped', 'reason': 'no_data'}`; write no records |
| SPI parameters missing from cfg_index_parameters | Fall back to the `DEFAULT_PARAMS` dictionary; log a WARNING |
| Amount-compression base is 0 or negative | Fall back to the default base in DEFAULT_PARAMS; log a WARNING |
| orders_90 = 0 would divide by zero | `avg_ticket_90 = spend_90 / max(orders_90, 1)`; denominator is at least 1 |
| visit_days_30 = 0 would divide by zero | `max(visit_days_30, 1)` inside the `V_abs` formula; denominator is at least 1 |
| v_30 and v_90 both 0 would break V_rel | Guard with `ε` (speed_epsilon, default 1e-6) |
| All members share the same raw_score | `batch_normalize_to_display` returns 5.0 when `max - min < ε` |
| Database write fails | Transaction rolls back; the exception propagates to the scheduler |
| No EWMA percentile history exists (first run) | Skip smoothing; use the current percentiles directly |
## Testing strategy

### Property tests (hypothesis)

Property tests live in the monorepo-level `tests/` directory and use the `hypothesis` library.

Each property test corresponds to one Property in this design document and runs at least 100 iterations.

Test file: `tests/test_spi_properties.py`

```python
# Feature: spi-spending-power-index, Property 1: SPI total score is non-negative
@given(
    level=st.floats(min_value=0, max_value=100),
    speed=st.floats(min_value=0, max_value=100),
    stability=st.floats(min_value=0, max_value=1),
)
@settings(max_examples=200)
def test_spi_raw_non_negative(level, speed, stability):
    params = SpendingPowerIndexTask.DEFAULT_PARAMS
    result = SpendingPowerIndexTask.compute_spi_raw(level, speed, stability, params)
    assert result >= 0
```

Property-testing library: `hypothesis` (already a project dependency).

### Unit tests

Unit tests live in `apps/etl/connectors/feiqiu/tests/unit/` and use the FakeDB/FakeAPI helpers.

Focus areas:
- Edge cases: all-zero input, a single extreme value
- Config fallback: defaults are used when parameters are missing
- Task registration: the SPI entry in task_registry is correct
- With use_stability=0 the Stability sub-score does not participate in the computation

### Test configuration

- Property tests: `cd C:\NeoZQYY && pytest tests/test_spi_properties.py -v`
- Unit tests: `cd apps/etl/connectors/feiqiu && pytest tests/unit/test_spi_task.py -v`
- Every property test is annotated `@settings(max_examples=200)`
- Every property test's comment references its Property number in this design document
156  .kiro/specs/spi-spending-power-index/requirements.md  Normal file
@@ -0,0 +1,156 @@
# Requirements document: SPI (Spending Power Index)

## Introduction

SPI (Spending Power Index) is a new customer-level index in the NeoZQYY index system that evaluates a member's overall spending-power tier within a site. SPI is a weighted composite of three sub-scores — spending level (Level), spending velocity (Speed), and spending stability (Stability) — and works alongside the existing NCI/WBI/RS/OS/MS/ML indexes to give operators a basis for customer tiering and resource allocation.

## Glossary

- **SPI_Task**: `SpendingPowerIndexTask`, the ETL task that computes the SPI index
- **BaseIndexTask**: base class for index tasks; provides display-score mapping (Winsorize → compression → MinMax 0-10 → EWMA smoothing)
- **cfg_index_parameters**: the `dws.cfg_index_parameters` table; stores index-algorithm parameters keyed by `index_type` + `param_name`
- **dws_member_spending_power_index**: the SPI result table; stores member-level spending-power scores
- **Raw_Score**: the un-normalized score produced directly by the algorithm
- **Display_Score**: Raw_Score normalized via P5/P95 Winsorize → optional compression → MinMax mapping to [0, 10]
- **Level_Sub_Score**: spending-level sub-score; measures amount tier and average-ticket level
- **Speed_Sub_Score**: spending-velocity sub-score; measures recent spending pace and its change
- **Stability_Sub_Score**: spending-stability sub-score; measures the temporal coverage stability of spending behavior
- **Winsorize**: percentile clipping; bounds values to [P5, P95] to suppress extreme values
- **EWMA**: Exponential Weighted Moving Average; smooths percentiles to avoid batch-to-batch jitter
- **log1p**: the `ln(1 + x)` compression transform, used for long-tailed distributions
- **settle_type**: settlement type; 1 = table checkout, 3 = mall order, 5 = recharge order
- **task_registry**: `orchestration/task_registry.py`, the ETL task registry
- **delete-before-insert**: per-site full-refresh strategy; delete all of a site's old records first, then insert the new ones
## Requirements

### Requirement 1: create the SPI result table

**User story:** As an ETL developer, I need to create the SPI result table so that member-level spending-power scores and intermediate features can be stored.

#### Acceptance criteria

1. THE SPI_Task SHALL write results to `dws.dws_member_spending_power_index` with primary key `(site_id, member_id)`
2. THE dws_member_spending_power_index table SHALL contain the base-feature columns: `spend_30`, `spend_90`, `recharge_90`, `orders_30`, `orders_90`, `visit_days_30`, `visit_days_90`, `avg_ticket_90`, `active_weeks_90`, `daily_spend_ewma_90`
3. THE dws_member_spending_power_index table SHALL contain the sub-score columns: `score_level_raw`, `score_level_display`, `score_speed_raw`, `score_speed_display`, `score_stability_raw`, `score_stability_display`
4. THE dws_member_spending_power_index table SHALL contain the total-score columns: `raw_score` (SPI raw score) and `display_score` (SPI display score, numeric(5,2))
5. THE dws_member_spending_power_index table SHALL contain the metadata columns: `calc_time`, `created_at`, `updated_at`
6. THE developer SHALL write the migration script `db/etl_feiqiu/migrations/<date>_create_dws_member_spending_power_index.sql` and run it against the test database test_etl_feiqiu
7. WHEN the DDL has run successfully on the test database, THE developer SHALL run `python scripts/ops/gen_consolidated_ddl.py` to export the latest DDL from the test database and merge it into `docs/database/ddl/etl_feiqiu__dws.sql`
### Requirement 2: extract SPI base features

**User story:** As an ETL developer, I need to extract member settlement and recharge data from the DWD layer to compute the base features SPI requires.

#### Acceptance criteria

1. THE SPI_Task SHALL extract the last 90 days of settlement orders (settle_type IN (1, 3)) from `dwd.dwd_settlement_head` and aggregate them into customer-level features
2. THE SPI_Task SHALL extract the last 90 days of recharge orders (settle_type = 5) from `dwd.dwd_recharge_order` and aggregate them into customer-level recharge features
3. THE SPI_Task SHALL compute the base features: `spend_30` (30-day spend total), `spend_90` (90-day spend total), `recharge_90` (90-day recharge total), `orders_30` (30-day settlement count), `orders_90` (90-day settlement count), `visit_days_30` (30-day distinct spending days), `visit_days_90` (90-day distinct spending days)
4. THE SPI_Task SHALL compute `avg_ticket_90 = spend_90 / max(orders_90, 1)`
5. THE SPI_Task SHALL compute `active_weeks_90` (calendar weeks with spending in the last 90 days, at most 13)
6. THE SPI_Task SHALL compute `daily_spend_ewma_90` as the EWMA of the 90-day daily-spend series, with the smoothing coefficient read from cfg_index_parameters
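As an illustration of criteria 5 and 6, a daily-spend series reduces to `active_weeks_90` and `daily_spend_ewma_90` roughly as follows. This is a sketch, not the task's extraction code; seeding the EWMA with the first observation is an assumption of this sketch.

```python
from datetime import date, timedelta

def daily_ewma(series, alpha=0.3):
    """EWMA over an ordered daily-spend series: e_t = alpha*x_t + (1-alpha)*e_{t-1}."""
    ewma = 0.0
    for i, x in enumerate(series):
        ewma = x if i == 0 else alpha * x + (1 - alpha) * ewma  # seed with x_0 (sketch assumption)
    return ewma

def active_weeks(spend_days):
    """Distinct ISO calendar weeks that contain at least one spending day."""
    return len({d.isocalendar()[:2] for d in spend_days})

# One spending day in each of 4 consecutive calendar weeks.
days = [date(2026, 2, 2) + timedelta(weeks=w) for w in range(4)]
print(active_weeks(days))                        # 4
print(daily_ewma([10.0, 0.0, 30.0], alpha=0.3))  # 13.9
```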
### Requirement 3: Level sub-score

**User story:** As an ETL developer, I need the spending-level (Level) sub-score algorithm to measure a customer's spend-amount tier.

#### Acceptance criteria

1. THE SPI_Task SHALL compute the Level sub-score as: `L = w_s30 × ln(1 + spend_30/M30) + w_s90 × ln(1 + spend_90/M90) + w_ticket × ln(1 + avg_ticket_90/T0) + w_r90 × ln(1 + recharge_90/R90)`
2. THE SPI_Task SHALL read the Level weights (`w_level_spend_30`, `w_level_spend_90`, `w_level_ticket_90`, `w_level_recharge_90`) and amount-compression bases (`amount_base_spend_30`, `amount_base_spend_90`, `amount_base_ticket_90`, `amount_base_recharge_90`) from cfg_index_parameters
3. WHEN all spend and recharge amounts are 0, THE SPI_Task SHALL set the raw Level sub-score to 0.0
### Requirement 4: Speed sub-score

**User story:** As an ETL developer, I need the spending-velocity (Speed) sub-score algorithm to measure a customer's recent spending pace.

#### Acceptance criteria

1. THE SPI_Task SHALL compute absolute velocity: `V_abs = ln(1 + spend_30 / (max(visit_days_30, 1) × V0))`
2. THE SPI_Task SHALL compute relative velocity: `V_rel = ln((v_30 + ε) / (v_90 + ε))`, where `v_30 = spend_30 / 30`, `v_90 = spend_90 / 90`, and `ε` is a small anti-division-by-zero constant
3. THE SPI_Task SHALL compute EWMA velocity: `V_ewma = ln(1 + daily_spend_ewma_90 / E0)`
4. THE SPI_Task SHALL compose the Speed sub-score as: `S = w_abs × V_abs + w_rel × max(0, V_rel) + w_ewma × V_ewma`
5. THE SPI_Task SHALL reward acceleration only (`V_rel > 0`) and SHALL NOT directly penalize deceleration (enforced via `max(0, V_rel)`)
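The five criteria above compose into a single pure function. A sketch (not the task code; the bases V0/E0 and the weights use the parameter names from this document):

```python
import math

def compute_speed(spend_30, spend_90, visit_days_30, daily_spend_ewma_90, params):
    """Speed sub-score: S = w_abs*V_abs + w_rel*max(0, V_rel) + w_ewma*V_ewma."""
    eps = params['speed_epsilon']
    v_abs = math.log1p(spend_30 / (max(visit_days_30, 1) * params['amount_base_speed_abs']))
    v_rel = math.log((spend_30 / 30 + eps) / (spend_90 / 90 + eps))
    v_ewma = math.log1p(daily_spend_ewma_90 / params['amount_base_ewma_90'])
    # max(0, v_rel): acceleration adds score, deceleration is not penalized.
    return (params['w_speed_abs'] * v_abs
            + params['w_speed_rel'] * max(0.0, v_rel)
            + params['w_speed_ewma'] * v_ewma)

params = {'speed_epsilon': 1e-6, 'amount_base_speed_abs': 100.0,
          'amount_base_ewma_90': 50.0,
          'w_speed_abs': 0.50, 'w_speed_rel': 0.30, 'w_speed_ewma': 0.20}

# With all-zero input, v_rel = ln(eps/eps) = 0, so the score is exactly 0.0.
print(compute_speed(0.0, 0.0, 0, 0.0, params))  # 0.0
```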
### Requirement 5: Stability sub-score

**User story:** As an ETL developer, I need the spending-stability (Stability) sub-score algorithm to distinguish steady high spenders from one-off spikes.

#### Acceptance criteria

1. THE SPI_Task SHALL compute stability over the last 90 days, with the window fixed at 90 days
2. THE SPI_Task SHALL compute stability as weekly coverage: `P = active_weeks_90 / 13` (the last 90 days span roughly 13 calendar weeks)
3. WHEN `use_stability = 0` in cfg_index_parameters, THE SPI_Task SHALL treat the Stability weight as 0 and skip the stability computation
4. THE Stability_Sub_Score SHALL lie in the range [0, 1]
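The whole sub-score reduces to one line plus the `use_stability` switch. In this sketch the result is additionally clamped to [0, 1] — the clamp is an assumption of the sketch; the document only guarantees the range for `active_weeks_90` in [0, 13]:

```python
def compute_stability(active_weeks_90, use_stability=1):
    """Weekly coverage P = active_weeks_90 / 13; 0.0 when the switch is off."""
    if not use_stability:
        return 0.0
    return min(max(active_weeks_90 / 13.0, 0.0), 1.0)  # clamp is a sketch assumption

print(compute_stability(13))                   # 1.0
print(compute_stability(0))                    # 0.0
print(compute_stability(7, use_stability=0))   # 0.0 — switch disables the sub-score
```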
### Requirement 6: SPI composition and display-score mapping

**User story:** As an ETL developer, I need to combine the three sub-scores into the SPI total and map it to a display score so that business users can read spending-power tiers directly.

#### Acceptance criteria

1. THE SPI_Task SHALL compute the SPI total as `SPI_raw = w_L × L + w_S × S + w_P × P`, with default weights `w_L=0.60`, `w_S=0.30`, `w_P=0.10`
2. THE SPI_Task SHALL reuse BaseIndexTask's `batch_normalize_to_display` method to map Raw_Score to Display_Score (0-10)
3. THE SPI_Task SHALL map the Level, Speed, and Stability sub-scores to display scores (0-10) independently
4. THE SPI_Task SHALL support configuring the compression mode via cfg_index_parameters (`compression_mode`: 0 = none, 1 = log1p, 2 = asinh)
5. THE SPI_Task SHALL support configuring EWMA percentile smoothing via cfg_index_parameters (`use_smoothing`, `ewma_alpha`)
6. THE Display_Score SHALL keep 2 decimal places and lie in the range [0.00, 10.00]
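The Winsorize → MinMax → clamp pipeline behind criteria 2 and 6 can be sketched as follows. This is a stand-in for `batch_normalize_to_display`, whose real implementation lives in BaseIndexTask; the compression and EWMA-smoothing steps are omitted, and the percentile indexing is simplified.

```python
def normalize_to_display(raw_scores, p_lo=5, p_hi=95):
    """Winsorize to [P5, P95], MinMax-map to [0, 10], clamp, round to 2 decimals."""
    xs = sorted(raw_scores)
    lo = xs[int(len(xs) * p_lo / 100)]
    hi = xs[min(int(len(xs) * p_hi / 100), len(xs) - 1)]
    if hi - lo < 1e-9:
        return [5.0] * len(raw_scores)  # degenerate distribution -> midpoint
    out = []
    for x in raw_scores:
        w = min(max(x, lo), hi)               # Winsorize
        score = (w - lo) / (hi - lo) * 10.0   # MinMax to [0, 10]
        out.append(round(max(0.0, min(10.0, score)), 2))
    return out

scores = normalize_to_display([float(i) for i in range(100)])
print(min(scores), max(scores))  # 0.0 10.0
```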
### Requirement 7: SPI configuration parameter management

**User story:** As an ETL developer, I need all SPI default parameters registered in cfg_index_parameters so that algorithm parameters are configurable and traceable.

#### Acceptance criteria

1. THE seed-data script SHALL insert all `index_type='SPI'` parameters into cfg_index_parameters, covering window parameters, amount-compression bases, sub-score weights, total-score weights, and mapping/smoothing parameters
2. THE SPI_Task SHALL load parameters via BaseIndexTask's `load_index_parameters(index_type='SPI')`
3. IF an SPI parameter is missing from cfg_index_parameters, THEN THE SPI_Task SHALL use the default defined in the DEFAULT_PARAMS dictionary
4. THE seed-data script SHALL be appended to `db/etl_feiqiu/seeds/seed_index_parameters.sql`
### Requirement 8: amount-compression base calibration

**User story:** As an ETL developer, I need a calibration mechanism for the amount-compression bases so that SPI scores adapt to each site's spending distribution.

#### Acceptance criteria

1. THE SPI_Task SHALL, on first execution or when parameters are missing, support computing suggested amount-compression bases automatically from the site's historical data
2. THE SPI_Task SHALL use medians of the last 90 days of spending data as the default calibration values: `amount_base_spend_30` = median of 30-day spend, `amount_base_spend_90` = median of 90-day spend, `amount_base_ticket_90` = median of the 90-day average ticket, `amount_base_recharge_90` = median of 90-day recharge, `amount_base_speed_abs` = median of spend per spending day, `amount_base_ewma_90` = median of the daily-spend EWMA
3. IF a corresponding amount-compression base already exists in cfg_index_parameters, THEN THE SPI_Task SHALL prefer the configured value over the auto-calibrated one
4. THE SPI_Task SHALL log the amount-compression bases actually used, so operators can audit and tune them manually
5. THE seed-data script SHALL ship reasonable initial defaults for the bases (based on typical billiards-hall spending levels)
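The calibration precedence in criteria 2-3 — configured value wins, otherwise fall back to the site median, otherwise to the shipped default — can be sketched like this (a hypothetical helper, not the task's `_calibrate_amount_bases`; `statistics.median` stands in for the SQL median):

```python
from statistics import median

def calibrate_base(name, configured_params, site_values, default):
    """Pick a base value: configured value > site median > shipped default."""
    value = configured_params.get(name)
    if value is not None and value > 0:
        return value                # config table has priority
    if site_values:
        return median(site_values)  # auto-calibrate from site data
    return default                  # fall back to the seed default

print(calibrate_base('amount_base_spend_30', {}, [120.0, 480.0, 900.0], 500.0))  # 480.0
print(calibrate_base('amount_base_spend_30',
                     {'amount_base_spend_30': 650.0}, [120.0], 500.0))           # 650.0
```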
### Requirement 9: SPI task registration and execution

**User story:** As an ETL developer, I need to register the SPI task in task_registry and implement the full execution flow so the scheduler can trigger the computation.

#### Acceptance criteria

1. THE SPI_Task SHALL be registered in task_registry under the task code `DWS_SPENDING_POWER_INDEX`, with `layer="INDEX"` and `requires_db_config=False`
2. THE SPI_Task SHALL declare the dependency `depends_on=["DWS_MEMBER_CONSUMPTION"]`
3. THE SPI_Task SHALL use the delete-before-insert strategy: delete old records by `site_id` first, then batch-insert the new ones
4. WHEN a site has no settlement or recharge data, THE SPI_Task SHALL return `{'status': 'skipped', 'reason': 'no_data'}` and skip the computation
5. THE SPI_Task SHALL, after execution, save percentile history to the `dws_index_percentile_history` table (index_type='SPI')
### Requirement 10: SPI algorithm correctness testing

**User story:** As an ETL developer, I need property tests (hypothesis) that verify the SPI algorithm so the computation provably matches the PRD.

#### Acceptance criteria

1. THE property tests SHALL verify: for any non-negative spend/recharge amounts, SPI_raw is non-negative
2. THE property tests SHALL verify: all else equal, increasing spend_30 or spend_90 never lowers the Level sub-score (monotonicity)
3. THE property tests SHALL verify: all else equal, increasing spend_30 never lowers the Speed sub-score (monotonicity)
4. THE property tests SHALL verify: the Stability sub-score lies in [0, 1]
5. THE property tests SHALL verify: the Display_Score lies in [0.00, 10.00]
6. THE property tests SHALL verify: the total-score weights satisfy `w_L + w_S + w_P = 1.0` (weight normalization)
### Requirement 11: documentation updates

**User story:** As an ETL developer, I need to update the related documentation so team members understand SPI's table structure, algorithm, and usage.

#### Acceptance criteria

1. THE developer SHALL write the database manual `docs/database/BD_Manual_dws_member_spending_power_index.md`, covering table structure, column descriptions, indexes, and verification SQL
2. THE developer SHALL update the ETL task doc `apps/etl/connectors/feiqiu/docs/etl_tasks/index_tasks.md` with a new SPI chapter
3. THE documentation SHALL cover the SPI formulas, parameter list, data sources, and computation-flow description
131  .kiro/specs/spi-spending-power-index/tasks.md  Normal file
@@ -0,0 +1,131 @@
# Implementation plan: SPI (Spending Power Index)

## Overview

Following the design document, the SPI work is split into six phases — DDL and tables → core algorithm → task registration and execution flow → config seed data → property tests → documentation — each built incrementally and verifiable on its own.

## Tasks
- [x] 1. Create the DDL and database tables
- [x] 1.1 Write the migration script `db/etl_feiqiu/migrations/<date>_create_dws_member_spending_power_index.sql`
  - Create `dws.dws_member_spending_power_index` (with sequence, unique index, query index)
  - Column definitions follow the data-model section of the design document
  - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_
- [x] 1.2 Run the migration against the test database test_etl_feiqiu
  - Execute the SQL via the TEST_DB_DSN connection
  - _Requirements: 1.6_
- [x] 1.3 Run `gen_consolidated_ddl.py` to export the DDL from the test database into the main DDL
  - Run `python scripts/ops/gen_consolidated_ddl.py` and verify `docs/database/ddl/etl_feiqiu__dws.sql` now contains the new table
  - _Requirements: 1.7_
- [x] 2. Implement the SPI core algorithm (pure functions)
- [x] 2.1 Create the `SPIMemberFeatures` dataclass and the `SpendingPowerIndexTask` skeleton
  - New file `apps/etl/connectors/feiqiu/tasks/dws/index/spending_power_index_task.py`
  - Define the `SPIMemberFeatures` dataclass
  - Define `SpendingPowerIndexTask` inheriting `BaseIndexTask`, with `INDEX_TYPE`, `DEFAULT_PARAMS`, and the abstract-method implementations
  - _Requirements: 7.2, 7.3, 9.1_
- [x] 2.2 Implement the `compute_level` static method
  - Level formula: `L = w_s30 × ln(1 + spend_30/M30) + w_s90 × ln(1 + spend_90/M90) + w_ticket × ln(1 + avg_ticket_90/T0) + w_r90 × ln(1 + recharge_90/R90)`
  - All-zero input returns 0.0
  - _Requirements: 3.1, 3.2, 3.3_
- [x] 2.3 Implement the `compute_speed` static method
  - Absolute velocity V_abs, relative velocity V_rel (acceleration-only bonus), EWMA velocity V_ewma
  - Speed formula: `S = w_abs × V_abs + w_rel × max(0, V_rel) + w_ewma × V_ewma`
  - _Requirements: 4.1, 4.2, 4.3, 4.4, 4.5_
- [x] 2.4 Implement the `compute_stability` static method
  - Weekly coverage: `P = active_weeks_90 / 13`
  - Return 0.0 when `use_stability=0`
  - _Requirements: 5.1, 5.2, 5.3, 5.4_
- [x] 2.5 Implement the `compute_spi_raw` static method
  - Total formula: `SPI_raw = w_L × L + w_S × S + w_P × P`
  - _Requirements: 6.1_
- [x] 2.6 Write property test: SPI total non-negativity
  - **Property 1: SPI total score is non-negative**
  - **Validates: Requirements 6.1, 10.1**
- [x] 2.7 Write property test: Level monotonicity
  - **Property 2: Level is monotonically non-decreasing in spend amounts**
  - **Validates: Requirements 3.1, 10.2**
- [x] 2.8 Write property test: Speed monotonicity
  - **Property 3: Speed is monotonically non-decreasing in spend_30**
  - **Validates: Requirements 4.1, 4.4, 10.3**
- [x] 2.9 Write property test: Stability range
  - **Property 4: Stability lies in [0, 1]**
  - **Validates: Requirements 5.2, 5.4, 10.4**
- [x] 2.10 Write property test: display-score range
  - **Property 5: Display score lies in [0, 10]**
  - **Validates: Requirements 6.6, 10.5**
- [x] 3. Checkpoint - make sure the core-algorithm tests pass
  - Run `cd C:\NeoZQYY && pytest tests/test_spi_properties.py -v`
  - All property tests must pass; ask the user if anything fails.
- [x] 4. Implement data extraction and the execution flow
- [x] 4.1 Implement `_extract_spending_features`
  - Extract the last 90 days of settlement orders from `dwd.dwd_settlement_head` and aggregate them into member-level features
  - Compute spend_30/90, orders_30/90, visit_days_30/90, avg_ticket_90, active_weeks_90
  - _Requirements: 2.1, 2.3, 2.4, 2.5_
- [x] 4.2 Implement `_extract_recharge_features`
  - Extract the last 90 days of recharge orders from `dwd.dwd_recharge_order`
  - _Requirements: 2.2_
- [x] 4.3 Implement `_compute_daily_spend_ewma`
  - Compute the EWMA over the 90-day daily-spend series
  - _Requirements: 2.6_
- [x] 4.4 Implement `_calibrate_amount_bases`
  - Compute site-data medians as calibration values for the amount-compression bases
  - Config-table values take priority over auto-calibrated ones
  - Log the bases actually used
  - _Requirements: 8.1, 8.2, 8.3, 8.4_
- [x] 4.5 Implement `execute` (full execution flow)
  - Resolve site_id → load params → extract features → calibrate bases → compute sub-scores → compose total → normalize display scores → persist
  - delete-before-insert strategy
  - Return skipped when there is no data
  - Save percentile history
  - _Requirements: 6.2, 6.3, 6.4, 6.5, 6.6, 9.3, 9.4, 9.5_
- [x] 4.6 Implement `_save_spi_data`
  - Batch INSERT into dws_member_spending_power_index
  - _Requirements: 1.1_
- [x] 5. Task registration and module exports
- [x] 5.1 Export `SpendingPowerIndexTask` from `tasks/dws/index/__init__.py`
  - Add the import and the __all__ entry
  - _Requirements: 9.1_
- [x] 5.2 Export `SpendingPowerIndexTask` from `tasks/dws/__init__.py`
  - Add the import and the __all__ entry
  - _Requirements: 9.1_
- [x] 5.3 Register the SPI task in `orchestration/task_registry.py`
  - `default_registry.register("DWS_SPENDING_POWER_INDEX", SpendingPowerIndexTask, requires_db_config=False, layer="INDEX", depends_on=["DWS_MEMBER_CONSUMPTION"])`
  - _Requirements: 9.1, 9.2_
- [-] 6. Config seed data
- [x] 6.1 Append the SPI parameters to `db/etl_feiqiu/seeds/seed_index_parameters.sql`
  - Insert every `index_type='SPI'` parameter row (windows, bases, weights, mapping, stability)
  - Use reasonable initial defaults for the amount-compression bases
  - _Requirements: 7.1, 7.4, 8.5_
- [~] 6.2 Run the seed-data script against the test database
  - Execute the INSERTs via the TEST_DB_DSN connection
  - _Requirements: 7.1_

- [~] 7. Checkpoint - make sure the unit tests and integration checks pass
  - Run `cd apps/etl/connectors/feiqiu && pytest tests/unit/test_spi_task.py -v`
  - All tests must pass; ask the user if anything fails.

- [ ] 8. Documentation updates
- [~] 8.1 Write the database manual `docs/database/BD_Manual_dws_member_spending_power_index.md`
  - Cover table structure, column descriptions, indexes, verification SQL (at least 3 statements), compatibility notes, rollback strategy
  - _Requirements: 11.1_
- [~] 8.2 Update the ETL task doc `apps/etl/connectors/feiqiu/docs/etl_tasks/index_tasks.md`
  - Add a DWS_SPENDING_POWER_INDEX chapter covering formulas, parameter list, data sources, computation flow
  - Update the overview table and the inheritance diagram
  - _Requirements: 11.2, 11.3_

- [~] 9. Final checkpoint - make sure all tests pass
  - Run the property tests: `cd C:\NeoZQYY && pytest tests/test_spi_properties.py -v`
  - Run the unit tests: `cd apps/etl/connectors/feiqiu && pytest tests/unit/test_spi_task.py -v`
  - All tests must pass; ask the user if anything fails.

## Notes

- Subtasks marked `*` (property tests) are optional and may be skipped to speed up an MVP
- Every task cites the requirement numbers it covers, for traceability
- Checkpoints enforce incremental verification
- Property tests verify universally quantified correctness properties; unit tests cover concrete examples and edge cases
48  .kiro/steering/export-paths-full.md  Normal file
@@ -0,0 +1,48 @@
---
inclusion: fileMatch
fileMatchPattern: "**/.env*,**/scripts/**,**/export/**,**/EXPORT-PATHS*"
name: export-paths-full
description: Full output-path specification (directory layout, environment-variable mapping, checklist). Auto-loaded when a .env / scripts / export file is read.
---

# Full output-path specification

## Directory layout and environment variables

```
export/
├── ETL-Connectors/feiqiu/
│   ├── JSON/        — EXPORT_ROOT / FETCH_ROOT
│   ├── LOGS/        — LOG_ROOT
│   └── REPORTS/     — ETL_REPORT_ROOT
├── SYSTEM/
│   ├── LOGS/        — SYSTEM_LOG_ROOT
│   ├── REPORTS/
│   │   ├── dataflow_analysis/ — SYSTEM_ANALYZE_ROOT
│   │   ├── field_audit/       — FIELD_AUDIT_ROOT
│   │   └── full_dataflow_doc/ — FULL_DATAFLOW_DOC_ROOT
│   └── CACHE/
│       └── api_samples/       — API_SAMPLE_CACHE_ROOT
└── BACKEND/
    └── LOGS/        — BACKEND_LOG_ROOT
```

## How paths are read, in detail

- `scripts/ops/` scripts: read via `_env_paths.get_output_path("VAR_NAME")` (which calls `load_dotenv` internally)
- ETL core modules: read via `env_parser.py` → the `io.*` config section of `AppConfig`
- Standalone ETL scripts: read via `os.environ.get("ETL_REPORT_ROOT")`, raising when missing
- Backend: read via `os.environ.get("BACKEND_LOG_ROOT")`

## Checklist for new output scenarios

When any operation needs to write a file, confirm in this order:
1. Does the output already have an environment variable? → use it directly
2. Does it fit an existing directory category (ETL/SYSTEM/BACKEND)? → use the parent-directory variable plus a sub-path
3. Neither? → create a sensible subdirectory under `export/`, add an environment variable, and update `.env` / `.env.template` / `EXPORT-PATHS.md`

## Shared utilities

- `scripts/ops/_env_paths.py`: provides `get_output_path(env_var)` — auto `load_dotenv` + read + create the directory + raise when missing
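A minimal sketch of what a helper like `get_output_path` does — read, create, fail loudly. This is not the real `scripts/ops/_env_paths.py`; the `load_dotenv` step is omitted (it requires python-dotenv), and the demo path is arbitrary:

```python
import os
from pathlib import Path

def get_output_path(env_var: str) -> Path:
    """Read an output directory from the environment, create it, raise if unset."""
    value = os.environ.get(env_var)
    if not value:
        # Coding rule 2: a missing variable must raise, never fall back silently.
        raise KeyError(f"environment variable {env_var} is not set; check the root .env")
    path = Path(value)
    path.mkdir(parents=True, exist_ok=True)
    return path

os.environ["ETL_REPORT_ROOT"] = "/tmp/neo_export/ETL-Connectors/feiqiu/REPORTS"
print(get_output_path("ETL_REPORT_ROOT"))
```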
## Reference docs

- Full directory description: `docs/deployment/EXPORT-PATHS.md`
- Environment-variable definitions: the "unified output paths" section of the root `.env`
16  .kiro/steering/export-paths.md  Normal file
@@ -0,0 +1,16 @@
---
inclusion: always
---

# Output-path rules (mandatory)

## Core principle

All file output must go into the unified `export/` directory tree, with paths controlled by `.env` environment variables. Creating output directories outside the `export/` tree is forbidden.

## Coding rules

1. Paths must be read from `.env`; hardcoding any directory structure (absolute or relative) is forbidden
2. A missing environment variable must raise (`KeyError` / `RuntimeError`); silent fallbacks are forbidden
3. How to read paths: `scripts/ops/` uses `_env_paths.get_output_path()`; ETL core uses `AppConfig.io.*`; standalone scripts use `os.environ.get()` plus an explicit error
4. New output types: first add the variable to the root `.env` + `.env.template`, then reference it in code, and update `docs/deployment/EXPORT-PATHS.md`

> Full directory layout, the environment-variable mapping table, and the new-scenario checklist live in `export-paths-full.md` (fileMatch: auto-loaded when a `.env*` / `scripts/` / `export/` file is read; can also be loaded manually with `#export-paths-full`).
18  .kiro/steering/product-full.md  Normal file
@@ -0,0 +1,18 @@
---
inclusion: fileMatch
fileMatchPattern: "**/tasks/**,**/models/**,**/loaders/**,**/scd/**,**/quality/**,**/business-rules/**"
name: product-full
description: Detailed product notes (ETL features, index algorithms, online/offline modes). Auto-loaded when an ETL task/model/business-rule file is read.
---

# Detailed product notes

## ETL features

- Extracts operational data (orders, payments, members, assistants, inventory, etc.) from the upstream SaaS API
- Lands raw data in ODS, keeping the source payload for traceability
- Cleans and loads into DWD; dimensions use SCD2, facts load incrementally by time
- Aggregates into DWS: assistant performance, daily finance reports, member analytics, payroll, and custom index algorithms (WBI/NCI/RS/OS/MS/ML/SPI)
- Supports both online (API fetch) and offline (JSON replay) modes

## Main entry points

See the common-commands section of `tech.md`.
@@ -4,29 +4,19 @@ inclusion: always
 # Product overview

-NeoZQYY Monorepo — a full-stack data platform for billiards-hall operations, comprising an ETL Connector, a backend API, an admin console, and a WeChat mini-program.
+NeoZQYY Monorepo — a full-stack data platform for billiards-hall operations.

 ## Subsystems
-- ETL Connector: extracts operational data from the upstream SaaS API through ODS → DWD → DWS layered processing
-- FastAPI backend: business API service
-- WeChat mini-program: consumer-facing UI
-- Admin console (`apps/admin-web/`): task management, schedule config, data browsing, ETL status monitoring (replaces the former PySide6 desktop GUI)
-- Shared packages: enums, monetary precision, time utilities
-
-> Subsystem paths are listed in `structure-lite.md`
-
-## ETL features
-- Extracts operational data (orders, payments, members, assistants, inventory, etc.) from the upstream SaaS API
-- Lands raw data in ODS, keeping the source payload for traceability
-- Cleans and loads into DWD; dimensions use SCD2, facts load incrementally by time
-- Aggregates into DWS: assistant performance, daily finance reports, member analytics, payroll, and custom index algorithms (WBI/NCI/RS/OS/MS/ML)
-- Supports both online (API fetch) and offline (JSON replay) modes
+- ETL Connector (`apps/etl/connectors/feiqiu/`): upstream SaaS API → ODS → DWD → DWS layered processing
+- FastAPI backend (`apps/backend/`): business API service
+- WeChat mini-program (`apps/miniprogram/`): consumer-facing UI
+- Admin console (`apps/admin-web/`): task management, schedule config, data browsing, ETL monitoring
+- MCP Server (`apps/mcp-server/`): AI tool-integration service
+- Shared packages (`packages/shared/`): enums, monetary precision, time utilities

 ## Business context
-- Multi-site isolation implemented via `site_id` + RLS
+- Multi-site isolation: `site_id` + RLS
 - Core entities: members, assistants, tables, orders, payments, refunds, group-buy packages, inventory
-- The domain language is primarily Chinese; code comments, docs, and UI copy are all in Chinese
-- Currency: CNY; amounts stored as numeric(2)
+- Domain language Chinese; currency CNY, amounts numeric(2)

 ## Main entry points
 See the common-commands section of `tech.md`.
+> ETL feature details, index algorithms, etc. live in `product-full.md` (fileMatch: auto-loaded when an ETL task/model/business-rule file is read; can also be loaded manually with `#product-full`).
@@ -11,15 +11,16 @@ inclusion: always
- `apps/backend/` — FastAPI backend
- `apps/miniprogram/` — WeChat mini program
- `apps/admin-web/` — admin console (React + Vite + Ant Design)
- `apps/mcp-server/` — MCP Server (AI tool integration)
- `packages/shared/` — cross-project shared packages
- `db/` — DDL / migrations / seeds (`etl_feiqiu/`, `zqyy_app/`, `fdw/`)
- `docs/` — project-level docs + `audit/` (unified audit landing point)
- `tests/` — monorepo-level property tests
- `scripts/` — project-level ops scripts
- `scripts/` — project-level ops scripts (`ops/`, `audit/`, `migrate/`, `server/`)

## High-risk paths (changes require audit)
- Under `apps/etl/connectors/feiqiu/`: `api/`, `cli/`, `config/`, `database/`, `loaders/`, `models/`, `orchestration/`, `scd/`, `tasks/`, `utils/`, `quality/`
- `apps/backend/app/`, `apps/admin-web/src/`, `apps/miniprogram/miniapp/`, `apps/miniprogram/miniprogram/`
- `apps/backend/app/`, `apps/admin-web/src/`, `apps/miniprogram/miniprogram/`
- `packages/shared/`, `db/`, loose root files (`.env*`, `pyproject.toml`)

## File-ownership rules (mandatory)

@@ -31,12 +31,14 @@ NeoZQYY/
│  │  ├── tests/              # backend tests
│  │  └── pyproject.toml
│  ├── miniprogram/           # WeChat mini program
│  │  ├── miniapp/            # mini-program source (main package)
│  │  ├── miniprogram/        # mini-program source (subpackages)
│  │  ├── miniprogram/        # mini-program source
│  │  └── doc/                # mini-program docs
│  └── admin-web/             # admin console
│     ├── src/                # frontend source (api/components/pages/store/types)
│     └── src/__tests__/      # frontend tests
│  ├── admin-web/             # admin console
│  │  ├── src/                # frontend source (api/components/pages/store/types)
│  │  └── src/__tests__/      # frontend tests
│  └── mcp-server/            # MCP Server (AI tool integration)
│     ├── server.py
│     └── pyproject.toml
├── packages/shared/          # cross-project shared packages (enums, money, datetime_utils)
├── db/
│  ├── etl_feiqiu/
@@ -53,6 +55,7 @@ NeoZQYY/
│  │  └── audit_dashboard.md  # audit dashboard (auto-generated)
│  ├── database/              # global database docs
│  ├── architecture/          # architecture design
│  ├── deployment/            # deployment docs (EXPORT-PATHS.md, LAUNCH-CHECKLIST.md)
│  ├── prd/                   # product requirements
│  ├── contracts/             # data contracts
│  └── ...
@@ -60,8 +63,9 @@ NeoZQYY/
├── scripts/                  # project-level ops scripts
│  ├── audit/                 # audit tools (gen_audit_dashboard.py)
│  ├── ops/                   # routine ops (init_databases, clone_to_test_db, etc.)
│  └── migrate/               # one-off migration scripts
├── pyproject.toml            # uv workspace root config
│  ├── migrate/               # one-off migration scripts
│  └── server/                # server deployment scripts
├── pyproject.toml            # uv workspace root config (4 members)
├── .env.template
└── README.md
```
@@ -70,18 +74,14 @@ NeoZQYY/
- Task pattern: inherit `BaseTask` (Extract → Transform → Load), register in `orchestration/task_registry.py`
- Loader pattern: one Loader per target table, `upsert()` + conflict handling
- Config layering: DEFAULTS → `.env` → CLI overrides
- Flow: specified via the `--pipeline` argument (e.g. `api_full`); the old `--pipeline-flow` is deprecated
- Flow: specified via the `--pipeline` argument (e.g. `api_full`)
- Multi-store isolation: `site_id` + RLS (`app` schema view layer)
- Cross-database access: `zqyy_app` maps `etl_feiqiu.app` read-only via FDW

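The task pattern above (inherit `BaseTask`, register in a registry) can be sketched minimally. Everything below — the registry dict, the decorator, and `DemoTask` — is an illustrative skeleton, not the actual code in `orchestration/task_registry.py`:

```python
TASK_REGISTRY: dict[str, type] = {}

def register(code: str):
    """Register a task class under its task code (registry pattern sketch)."""
    def deco(cls):
        TASK_REGISTRY[code] = cls
        return cls
    return deco

class BaseTask:
    """Extract → Transform → Load skeleton."""
    def extract(self):
        raise NotImplementedError
    def transform(self, rows):
        return rows
    def load(self, rows):
        raise NotImplementedError
    def run(self):
        return self.load(self.transform(self.extract()))

@register("DEMO_TASK")
class DemoTask(BaseTask):
    def extract(self):
        return [1, 2, 3]
    def transform(self, rows):
        return [r * 10 for r in rows]
    def load(self, rows):
        return sum(rows)
```

An orchestrator can then look tasks up by code (`TASK_REGISTRY["DEMO_TASK"]().run()`), which is what registration buys over direct imports.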
## File-ownership rules (expanded)

### Inside modules (each APP / Connector is self-governing)
The following directories in each submodule are module-private and hold only that module's own content:
- `docs/` — module-private docs (API reference, business rules, task notes, ops guides, etc.)
- `tests/` — module-private tests (unit and integration tests)
- `scripts/` — module-private scripts (data checks, fixes, exports, etc.)

Each submodule's `docs/`, `tests/`, and `scripts/` are module-private and hold only that module's own content.
Placing project-level content inside module directories is forbidden, as is placing module-private content at the root.

### Project level (managed at the root)
@@ -96,7 +96,7 @@ NeoZQYY/
- Audit dashboard: `docs/audit/audit_dashboard.md` (auto-generated; do not edit by hand)
- Prompt logs: `docs/audit/prompt_logs/`
- Dashboard generator script: `scripts/audit/gen_audit_dashboard.py`
- Writing audit artifacts inside submodules (e.g. `apps/etl/connectors/feiqiu/docs/audit/`) is forbidden
- Writing audit artifacts inside submodules is forbidden

### Quick-reference table

27
.kiro/steering/tech-full.md
Normal file
@@ -0,0 +1,27 @@
---
inclusion: fileMatch
fileMatchPattern: "**/pyproject.toml,**/config/**,**/migrations/**,**/.env*,**/seeds/**"
name: tech-full
description: Detailed tech-stack notes (dependency list, DDL baseline, test tooling, seed data). Auto-loaded when config/migration/dependency files are read.
---

# Detailed tech-stack notes

## Core dependencies
- ETL: `psycopg2-binary`, `requests`, `python-dateutil`, `tzdata`, `python-dotenv`, `openpyxl`
- Backend: `fastapi`, `uvicorn[standard]`, `psycopg2-binary`, `python-dotenv`
- Admin console: `React`, `Vite`, `Ant Design` (managed separately with pnpm)
- Shared package: `neozqyy-shared` (referenced inside the workspace)
- Testing: `pytest`, `hypothesis`

## Database details
- Business database `zqyy_app` (users/RBAC/tasks/approvals), maps ETL data read-only via FDW
- DDL baseline: `docs/database/ddl/` (auto-exported from the test database, one file per schema); regenerate with `python scripts/ops/gen_consolidated_ddl.py`
- Old DDL / migration scripts are archived under `db/_archived/ddl_baseline_2026-02-22/`
- Seed data: `db/etl_feiqiu/seeds/`, `db/zqyy_app/seeds/`

## Testing
- ETL unit tests: `cd apps/etl/connectors/feiqiu && pytest tests/unit`
- ETL integration tests: `TEST_DB_DSN="..." pytest tests/integration`
- Monorepo property tests: `pytest tests/ -v` (from the root, hypothesis)
- Test tooling: `apps/etl/connectors/feiqiu/tests/unit/task_test_utils.py` provides FakeDB/FakeAPI
@@ -5,58 +5,29 @@ inclusion: always
# Tech stack & build

## Language & runtime
- Python 3.10+
- uv workspace for unified dependency management (the root `pyproject.toml` declares 3 workspace members)

## Core dependencies
- ETL: `psycopg2-binary`, `requests`, `python-dateutil`, `tzdata`, `python-dotenv`, `openpyxl`
- Backend: `fastapi`, `uvicorn[standard]`, `psycopg2-binary`, `python-dotenv`
- Admin console: `React`, `Vite`, `Ant Design` (`apps/admin-web/`, managed separately with pnpm)
- Shared package: `neozqyy-shared` (referenced inside the workspace)
- Testing: `pytest`, `hypothesis`
- Python 3.10+, uv workspace (the root `pyproject.toml` declares 4 members: etl/connectors/feiqiu, backend, mcp-server, shared)
- Admin console: React + Vite + Ant Design (`apps/admin-web/`, standalone pnpm)

## Database
- PostgreSQL (remote instance)
- Six-layer schema architecture: `meta` (scheduling metadata), `ods` (raw data), `dwd` (detail data), `core` (cross-store standardization), `dws` (aggregates), `app` (RLS view layer)
- Business database: `zqyy_app` (users/RBAC/tasks/approvals), maps ETL data read-only via FDW
- DDL files live in `db/etl_feiqiu/schemas/`, migration scripts in `db/etl_feiqiu/migrations/`
- Seed data lives in `db/etl_feiqiu/seeds/`

## Testing
- ETL unit tests: `cd apps/etl/connectors/feiqiu && pytest tests/unit`
- ETL integration tests: `TEST_DB_DSN="..." pytest tests/integration`
- Monorepo property tests: `pytest tests/ -v` (from the root, hypothesis)
- Test tooling: `apps/etl/connectors/feiqiu/tests/unit/task_test_utils.py` provides FakeDB/FakeAPI
- PostgreSQL remote instance, four databases: `etl_feiqiu` / `test_etl_feiqiu` (ETL), `zqyy_app` / `test_zqyy_app` (business)
- Six ETL schema layers: meta / ods / dwd / core / dws / app
- DSNs: `PG_DSN` (ETL), `APP_DB_DSN` (business), defined in the root `.env`
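The DSN bullets above refer to the composed `postgresql://user:password@host:port/dbname` strings in the root `.env`. A sketch of composing one from part variables — `DB_HOST` and `DB_PORT` are assumed names here (only `DB_USER` / `DB_PASSWORD` and the `*_DB_NAME` variables are visible in the `.env` excerpt), and the credential is URL-quoted so special characters survive:

```python
import os
from urllib.parse import quote

def build_dsn(dbname_var: str) -> str:
    """Compose postgresql://user:password@host:port/dbname from .env-style parts."""
    parts = {k: os.environ.get(k)
             for k in ("DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT", dbname_var)}
    missing = [k for k, v in parts.items() if not v]
    if missing:
        raise RuntimeError(f"missing env vars: {missing}")  # fail loudly, no fallback
    return (f"postgresql://{quote(parts['DB_USER'])}:{quote(parts['DB_PASSWORD'])}"
            f"@{parts['DB_HOST']}:{parts['DB_PORT']}/{parts[dbname_var]}")
```

`build_dsn("ETL_DB_NAME")` vs `build_dsn("APP_DB_NAME")` then yields the ETL-library and business-library DSNs from the same credentials.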
## Common commands
```bash
# install dependencies
uv sync

# ETL development
cd apps/etl/connectors/feiqiu
python -m cli.main --dry-run --tasks DWD_LOAD_FROM_ODS

# backend development
cd apps/backend
uvicorn app.main:app --reload

# ETL unit tests
cd apps/etl/connectors/feiqiu && pytest tests/unit

# property tests
cd C:\NeoZQYY && pytest tests/ -v
uv sync                                              # install dependencies
cd apps/etl/connectors/feiqiu && python -m cli.main --dry-run --tasks DWD_LOAD_FROM_ODS
cd apps/backend && uvicorn app.main:app --reload
cd apps/etl/connectors/feiqiu && pytest tests/unit   # ETL unit tests
cd C:\NeoZQYY && pytest tests/ -v                    # property tests
```

## Configuration system
- Layered overrides: root `.env` < app `.env.local` < environment variables < CLI arguments
- ETL config class: `apps/etl/connectors/feiqiu/config/settings.py` → `AppConfig`
- Sensitive values go in `.env` / `.env.local` and must never be committed; `.env.template` provides the template
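The layering rule above ("later layers win") can be sketched as a plain dict merge — a simplified model, not the actual `AppConfig` logic; in particular the convention that `None` means "not set at this layer" is an assumption for illustration:

```python
def layer_config(defaults: dict, dotenv: dict, cli: dict) -> dict:
    """Merge config layers; later layers win, but None means 'not set here'."""
    merged = dict(defaults)
    merged.update({k: v for k, v in dotenv.items() if v is not None})
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged
```

Skipping `None` is what lets a CLI flag the user did not pass leave the `.env` value intact instead of clobbering it.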
## Script execution rules
- For multi-step operations, file handling, database work, and similar script-level tasks, prefer writing a Python script (`.py`) and running it via `python script.py`
- Avoid writing complex logic directly in PowerShell, to dodge escaping, encoding, and pipeline syntax traps
- Direct shell commands are fine when:
  - the user explicitly asks for PowerShell / CMD
  - the operation is a single simple command (e.g. `pytest`, `uv sync`, `git status`)
- Python script placement follows the two-tier rule: one-off ops scripts go in `scripts/ops/`, module-private scripts go in a sensible place under the module's own `scripts/`
- Prefer writing complex operations as Python scripts; avoid complex PowerShell logic
- One-off ops scripts go in `scripts/ops/`; module-private scripts go in the module's `scripts/`

> The core dependency list, DDL baseline, seed data, and other details are in `tech-full.md` (fileMatch: auto-loaded when pyproject.toml / config / migration files are read; can also be loaded manually via `#tech-full`).

25
.kiro/steering/testing-env.md
Normal file
@@ -0,0 +1,25 @@
---
inclusion: always
---

# Testing & verification environment rules (mandatory)

## Core principle
When running tests, verification, debugging, or one-off scripts, the AI must use the same parameter environment as a real run. Omitting configuration, skipping the `.env` load, or using an incomplete parameter set "because it's just a test" is forbidden.

## Specific requirements

1. **Environment variables must be fully loaded**: test scripts must load the root `.env` (and the module `.env`) via `load_dotenv` or an equivalent; never assume "tests don't need path configuration"
2. **No silent fallback to unintended defaults**: if a required parameter (e.g. `FETCH_ROOT`, `EXPORT_ROOT`, `PG_DSN`) fails to load, abort with an error immediately instead of silently using an empty string or another setting's value
3. **cwd must match a real run**: ETL CLI tests run with `cwd` set to `apps/etl/connectors/feiqiu/`; backend tests with `cwd` set to `apps/backend/`
4. **Do not build simplified configs just for tests**: unless the user explicitly asks for an isolated test environment, always load configuration through the normal `AppConfig.load()` flow
5. **Database connections use the test databases**: when tests touch the database, prefer `test_etl_feiqiu` / `test_zqyy_app` (via the `TEST_DB_DSN` environment variable) over the production databases

## Exceptions
Deviation is allowed when:
- the user explicitly specifies particular parameters or a simplified environment
- pure unit tests use FakeDB/FakeAPI (`tests/unit/task_test_utils.py`) and touch no real paths or connections
- the CLI is being verified in `--dry-run` mode (path configuration must still be complete)

## Background
This rule stems from a real incident: during a test, `FETCH_ROOT` failed to load, an `or` chain fell through the empty string, and the timezone value `Asia/Shanghai` was mistaken for a file path, creating a junk directory `Asia/Shanghai/ODS_JSON_ARCHIVE/` under the project root.
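The incident described above is the classic `os.environ.get("FETCH_ROOT") or other_value` pitfall: when the variable is unset, the chain silently substitutes whatever comes next. A minimal sketch of the fix (the function name is hypothetical):

```python
import os

def fetch_root() -> str:
    """Fail fast instead of `os.environ.get("FETCH_ROOT") or other_value`,
    whose fallback once substituted the timezone string for a path."""
    value = os.environ.get("FETCH_ROOT")
    if not value:
        raise RuntimeError("FETCH_ROOT is not set; load the root .env before running")
    return value
```

With this shape, a test run with an unloaded `.env` dies with a clear message instead of writing into a directory named after a timezone.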
8
apps/admin-web/.vite/deps/_metadata.json
Normal file
@@ -0,0 +1,8 @@
{
  "hash": "75f75ae2",
  "configHash": "3c6579c7",
  "lockfileHash": "4e1d8c76",
  "browserHash": "dc64490c",
  "optimized": {},
  "chunks": {}
}
3
apps/admin-web/.vite/deps/package.json
Normal file
@@ -0,0 +1,3 @@
{
  "type": "module"
}
@@ -17,6 +17,7 @@ import {
  DashboardOutlined,
  FileTextOutlined,
  LogoutOutlined,
  DesktopOutlined,
} from "@ant-design/icons";
import type { MenuProps } from "antd";
import { useAuthStore } from "./store/authStore";
@@ -29,6 +30,7 @@ import EnvConfig from "./pages/EnvConfig";
import DBViewer from "./pages/DBViewer";
import ETLStatus from "./pages/ETLStatus";
import LogViewer from "./pages/LogViewer";
import OpsPanel from "./pages/OpsPanel";

const { Sider, Content, Footer } = Layout;
const { Text } = Typography;
@@ -44,6 +46,7 @@ const NAV_ITEMS: MenuProps["items"] = [
  { key: "/db-viewer", icon: <DatabaseOutlined />, label: "数据库" },
  { key: "/log-viewer", icon: <FileTextOutlined />, label: "日志" },
  { key: "/env-config", icon: <ToolOutlined />, label: "环境配置" },
  { key: "/ops-panel", icon: <DesktopOutlined />, label: "运维面板" },
];

/* ------------------------------------------------------------------ */
@@ -140,6 +143,7 @@ const AppLayout: React.FC = () => {
            <Route path="/db-viewer" element={<DBViewer />} />
            <Route path="/etl-status" element={<ETLStatus />} />
            <Route path="/log-viewer" element={<LogViewer />} />
            <Route path="/ops-panel" element={<OpsPanel />} />
          </Routes>
        </Content>
        <Footer
@@ -154,7 +158,7 @@ const AppLayout: React.FC = () => {
            <Space size={8}>
              <Spin size="small" />
              <Text>执行中</Text>
              <Tag color="processing">{runningTask.config.pipeline}</Tag>
              <Tag color="processing">{runningTask.config.flow}</Tag>
              <Text type="secondary" style={{ fontSize: 12 }}>
                {runningTask.config.tasks.slice(0, 3).join(", ")}
                {runningTask.config.tasks.length > 3 && ` +${runningTask.config.tasks.length - 3}`}

100
apps/admin-web/src/api/opsPanel.ts
Normal file
@@ -0,0 +1,100 @@
/**
 * Ops control panel API.
 *
 * Talks to the backend /api/ops/* endpoints: service status, Git operations, system info, etc.
 */

import { apiClient } from "./client";

// ---- Type definitions ----

export interface SystemInfo {
  cpu_percent: number;
  memory_total_gb: number;
  memory_used_gb: number;
  memory_percent: number;
  disk_total_gb: number;
  disk_used_gb: number;
  disk_percent: number;
  boot_time: string;
}

export interface ServiceStatus {
  env: string;
  label: string;
  running: boolean;
  pid: number | null;
  port: number;
  uptime_seconds: number | null;
  memory_mb: number | null;
  cpu_percent: number | null;
}

export interface GitInfo {
  env: string;
  branch: string;
  last_commit_hash: string;
  last_commit_message: string;
  last_commit_time: string;
  has_local_changes: boolean;
}

export interface ActionResult {
  env: string;
  action: string;
  success: boolean;
  message: string;
}

export interface GitPullResult {
  env: string;
  success: boolean;
  output: string;
}

// ---- API calls ----

export async function fetchSystemInfo(): Promise<SystemInfo> {
  const { data } = await apiClient.get<SystemInfo>("/ops/system");
  return data;
}

export async function fetchServicesStatus(): Promise<ServiceStatus[]> {
  const { data } = await apiClient.get<ServiceStatus[]>("/ops/services");
  return data;
}

export async function fetchGitInfo(): Promise<GitInfo[]> {
  const { data } = await apiClient.get<GitInfo[]>("/ops/git");
  return data;
}

export async function startService(env: string): Promise<ActionResult> {
  const { data } = await apiClient.post<ActionResult>(`/ops/services/${env}/start`);
  return data;
}

export async function stopService(env: string): Promise<ActionResult> {
  const { data } = await apiClient.post<ActionResult>(`/ops/services/${env}/stop`);
  return data;
}

export async function restartService(env: string): Promise<ActionResult> {
  const { data } = await apiClient.post<ActionResult>(`/ops/services/${env}/restart`);
  return data;
}

export async function gitPull(env: string): Promise<GitPullResult> {
  const { data } = await apiClient.post<GitPullResult>(`/ops/git/${env}/pull`);
  return data;
}

export async function syncDeps(env: string): Promise<ActionResult> {
  const { data } = await apiClient.post<ActionResult>(`/ops/git/${env}/sync-deps`);
  return data;
}

export async function fetchEnvFile(env: string): Promise<{ env: string; content: string }> {
  const { data } = await apiClient.get<{ env: string; content: string }>(`/ops/env-file/${env}`);
  return data;
}
@@ -240,7 +240,7 @@ const ScheduleTab: React.FC = () => {
      task_codes: [],
      task_config: {
        tasks: [],
        pipeline: 'api_full',
        flow: 'api_full',
        processing_mode: 'increment_only',
        pipeline_flow: 'FULL',
        dry_run: false,

@@ -7,7 +7,7 @@
 * Features:
 * - Sync check: Badge indicator on the right of the toolbar; click to show a diff Modal
 * - Select-common / select-all / invert / clear buttons
 * - DWD table selected = filters the load scope of DWD_LOAD_FROM_ODS
 * - DWD table checked = chooses which DWD tables to load (positive selection, consistent with ODS/DWS)
 */

import React, { useEffect, useState, useMemo, useCallback } from "react";
@@ -151,6 +151,16 @@ const TaskSelector: React.FC<TaskSelectorProps> = ({
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [registry]);

  /* CHANGE [2026-02-19] intent: DWD tables are positively checked; default to all selected after load */
  useEffect(() => {
    if (!onDwdTablesChange) return;
    const allTables = Object.values(dwdTableGroups).flat().map((t) => t.table_name);
    if (allTables.length > 0 && selectedDwdTables.length === 0) {
      onDwdTablesChange(allTables);
    }
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [dwdTableGroups]);

  const domainGroups = useMemo(
    () => buildDomainGroups(registry, dwdTableGroups, layers),
    [registry, dwdTableGroups, layers],
@@ -251,9 +261,9 @@ const TaskSelector: React.FC<TaskSelectorProps> = ({
          <div style={{ display: "flex", justifyContent: "space-between", alignItems: "center", marginBottom: 4 }}>
            <Space size={4}>
              <TableOutlined style={{ color: "#52c41a", fontSize: 12 }} />
              <Text style={{ fontSize: 12, fontWeight: 500 }}>DWD 表过滤</Text>
              <Text style={{ fontSize: 12, fontWeight: 500 }}>DWD 装载表</Text>
              <Text type="secondary" style={{ fontSize: 11 }}>
                {domainDwdSelected.length === 0 ? "(未选 = 全部装载)" : `${domainDwdSelected.length}/${dwdTables.length}`}
                {`${domainDwdSelected.length}/${dwdTables.length}`}
              </Text>
            </Space>
            <Space size={4}>
@@ -370,6 +380,9 @@ const TaskSelector: React.FC<TaskSelectorProps> = ({
            >
              <Text strong style={!t.is_common ? { color: "#999" } : undefined}>{t.code}</Text>
              <Text type="secondary" style={{ marginLeft: 8 }}>{t.name}</Text>
              {t.description && (t.layer === "DWS" || t.layer === "INDEX") && (
                <Text type="secondary" style={{ marginLeft: 6, fontSize: 10, color: "#8c8c8c" }}>({t.description})</Text>
              )}
              {!t.is_common && <Tag color="default" style={{ marginLeft: 6, fontSize: 11 }}>不常用</Tag>}
            </Checkbox>
          </div>

365
apps/admin-web/src/pages/OpsPanel.tsx
Normal file
@@ -0,0 +1,365 @@
/**
 * Ops control panel page.
 *
 * Features:
 * - Server resource overview (CPU / memory / disk)
 * - Per-environment service status + start/stop/restart buttons
 * - Per-environment Git status + pull / sync-deps buttons
 * - Per-environment .env viewer (sensitive values masked)
 */

import React, { useEffect, useState, useCallback } from "react";
import {
  Card,
  Row,
  Col,
  Tag,
  Button,
  Space,
  Statistic,
  Progress,
  Modal,
  message,
  Descriptions,
  Spin,
  Tooltip,
  Typography,
  Input,
} from "antd";
import {
  PlayCircleOutlined,
  PauseCircleOutlined,
  ReloadOutlined,
  CloudDownloadOutlined,
  SyncOutlined,
  FileTextOutlined,
  CheckCircleOutlined,
  CloseCircleOutlined,
  ClockCircleOutlined,
  DesktopOutlined,
} from "@ant-design/icons";
import type {
  SystemInfo,
  ServiceStatus,
  GitInfo,
} from "../api/opsPanel";
import {
  fetchSystemInfo,
  fetchServicesStatus,
  fetchGitInfo,
  startService,
  stopService,
  restartService,
  gitPull,
  syncDeps,
  fetchEnvFile,
} from "../api/opsPanel";

const { Text, Title } = Typography;
const { TextArea } = Input;

/* ------------------------------------------------------------------ */
/*  Helpers                                                            */
/* ------------------------------------------------------------------ */

/** Format seconds as "Xd Xh Xm". */
function formatUptime(seconds: number | null): string {
  if (seconds == null) return "-";
  const d = Math.floor(seconds / 86400);
  const h = Math.floor((seconds % 86400) / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const parts: string[] = [];
  if (d > 0) parts.push(`${d}天`);
  if (h > 0) parts.push(`${h}时`);
  parts.push(`${m}分`);
  return parts.join(" ");
}

/* ------------------------------------------------------------------ */
/*  Component                                                          */
/* ------------------------------------------------------------------ */

const OpsPanel: React.FC = () => {
  const [system, setSystem] = useState<SystemInfo | null>(null);
  const [services, setServices] = useState<ServiceStatus[]>([]);
  const [gitInfos, setGitInfos] = useState<GitInfo[]>([]);
  const [loading, setLoading] = useState(true);
  const [actionLoading, setActionLoading] = useState<Record<string, boolean>>({});
  const [envModalOpen, setEnvModalOpen] = useState(false);
  const [envModalContent, setEnvModalContent] = useState("");
  const [envModalTitle, setEnvModalTitle] = useState("");

  // ---- Data loading ----

  const loadAll = useCallback(async () => {
    try {
      const [sys, svc, git] = await Promise.all([
        fetchSystemInfo(),
        fetchServicesStatus(),
        fetchGitInfo(),
      ]);
      setSystem(sys);
      setServices(svc);
      setGitInfos(git);
    } catch {
      message.error("加载运维数据失败");
    } finally {
      setLoading(false);
    }
  }, []);

  useEffect(() => {
    loadAll();
    const timer = setInterval(loadAll, 15_000);
    return () => clearInterval(timer);
  }, [loadAll]);

  // ---- Action handlers ----

  const withAction = async (key: string, fn: () => Promise<void>) => {
    setActionLoading((prev) => ({ ...prev, [key]: true }));
    try {
      await fn();
    } finally {
      setActionLoading((prev) => ({ ...prev, [key]: false }));
    }
  };

  const handleStart = (env: string) =>
    withAction(`start-${env}`, async () => {
      const r = await startService(env);
      r.success ? message.success(r.message) : message.warning(r.message);
      await loadAll();
    });

  const handleStop = (env: string) =>
    withAction(`stop-${env}`, async () => {
      const r = await stopService(env);
      r.success ? message.success(r.message) : message.warning(r.message);
      await loadAll();
    });

  const handleRestart = (env: string) =>
    withAction(`restart-${env}`, async () => {
      const r = await restartService(env);
      r.success ? message.success(r.message) : message.warning(r.message);
      await loadAll();
    });

  const handlePull = (env: string) =>
    withAction(`pull-${env}`, async () => {
      const r = await gitPull(env);
      if (r.success) {
        message.success("拉取成功");
        Modal.info({ title: `Git Pull - ${env}`, content: <pre style={{ maxHeight: 300, overflow: "auto", fontSize: 12 }}>{r.output}</pre>, width: 600 });
      } else {
        message.error("拉取失败");
        Modal.error({ title: `Git Pull 失败 - ${env}`, content: <pre style={{ maxHeight: 300, overflow: "auto", fontSize: 12 }}>{r.output}</pre>, width: 600 });
      }
      await loadAll();
    });

  const handleSyncDeps = (env: string) =>
    withAction(`sync-${env}`, async () => {
      const r = await syncDeps(env);
      r.success ? message.success("依赖同步完成") : message.error(r.message);
    });

  const handleViewEnv = async (env: string, label: string) => {
    try {
      const r = await fetchEnvFile(env);
      setEnvModalTitle(`${label} .env 配置`);
      setEnvModalContent(r.content);
      setEnvModalOpen(true);
    } catch {
      message.error("读取配置文件失败");
    }
  };

  // ---- Render ----

  if (loading) {
    return <Spin size="large" style={{ display: "flex", justifyContent: "center", marginTop: 120 }} />;
  }

  return (
    <div>
      <Title level={4} style={{ marginBottom: 16 }}>
        <DesktopOutlined style={{ marginRight: 8 }} />
        运维控制面板
      </Title>

      {/* ---- System resources ---- */}
      {system && (
        <Card size="small" title="服务器资源" style={{ marginBottom: 16 }}>
          <Row gutter={24}>
            <Col span={8}>
              <Statistic title="CPU 使用率" value={system.cpu_percent} suffix="%" />
              <Progress percent={system.cpu_percent} size="small" status={system.cpu_percent > 80 ? "exception" : "normal"} showInfo={false} />
            </Col>
            <Col span={8}>
              <Statistic title="内存" value={system.memory_used_gb} suffix={`/ ${system.memory_total_gb} GB`} precision={1} />
              <Progress percent={system.memory_percent} size="small" status={system.memory_percent > 85 ? "exception" : "normal"} showInfo={false} />
            </Col>
            <Col span={8}>
              <Statistic title="磁盘" value={system.disk_used_gb} suffix={`/ ${system.disk_total_gb} GB`} precision={1} />
              <Progress percent={system.disk_percent} size="small" status={system.disk_percent > 90 ? "exception" : "normal"} showInfo={false} />
            </Col>
          </Row>
          <Text type="secondary" style={{ fontSize: 12, marginTop: 8, display: "block" }}>
            开机时间:{new Date(system.boot_time).toLocaleString()}
          </Text>
        </Card>
      )}

      {/* ---- Service status ---- */}
      <Card size="small" title="服务状态" style={{ marginBottom: 16 }}>
        <Row gutter={16}>
          {services.map((svc) => (
            <Col span={12} key={svc.env}>
              <Card
                size="small"
                type="inner"
                title={
                  <Space>
                    {svc.running
                      ? <CheckCircleOutlined style={{ color: "#52c41a" }} />
                      : <CloseCircleOutlined style={{ color: "#ff4d4f" }} />}
                    {svc.label}
                    <Tag color={svc.running ? "success" : "error"}>
                      {svc.running ? "运行中" : "已停止"}
                    </Tag>
                  </Space>
                }
                extra={<Tag>:{svc.port}</Tag>}
              >
                {svc.running && (
                  <Descriptions size="small" column={3} style={{ marginBottom: 12 }}>
                    <Descriptions.Item label="PID">{svc.pid}</Descriptions.Item>
                    <Descriptions.Item label="运行时长">
                      <ClockCircleOutlined style={{ marginRight: 4 }} />
                      {formatUptime(svc.uptime_seconds)}
                    </Descriptions.Item>
                    <Descriptions.Item label="内存">{svc.memory_mb ?? "-"} MB</Descriptions.Item>
                  </Descriptions>
                )}
                <Space>
                  {!svc.running && (
                    <Button
                      type="primary"
                      size="small"
                      icon={<PlayCircleOutlined />}
                      loading={actionLoading[`start-${svc.env}`]}
                      onClick={() => handleStart(svc.env)}
                    >
                      启动
                    </Button>
                  )}
                  {svc.running && (
                    <>
                      <Button
                        danger
                        size="small"
                        icon={<PauseCircleOutlined />}
                        loading={actionLoading[`stop-${svc.env}`]}
                        onClick={() => handleStop(svc.env)}
                      >
                        停止
                      </Button>
                      <Button
                        size="small"
                        icon={<ReloadOutlined />}
                        loading={actionLoading[`restart-${svc.env}`]}
                        onClick={() => handleRestart(svc.env)}
                      >
                        重启
                      </Button>
                    </>
                  )}
                </Space>
              </Card>
            </Col>
          ))}
        </Row>
      </Card>

      {/* ---- Git status & config ---- */}
      <Card size="small" title="代码与配置" style={{ marginBottom: 16 }}>
        <Row gutter={16}>
          {gitInfos.map((git) => {
            const envCfg = services.find((s) => s.env === git.env);
            const label = envCfg?.label ?? git.env;
            return (
              <Col span={12} key={git.env}>
                <Card size="small" type="inner" title={label}>
                  <Descriptions size="small" column={1} style={{ marginBottom: 12 }}>
                    <Descriptions.Item label="分支">
                      <Tag color="blue">{git.branch}</Tag>
                      {git.has_local_changes && (
                        <Tooltip title="工作区有未提交的变更">
                          <Tag color="warning">有变更</Tag>
                        </Tooltip>
                      )}
                    </Descriptions.Item>
                    <Descriptions.Item label="最新提交">
                      <Text code style={{ fontSize: 12 }}>{git.last_commit_hash}</Text>
                      <Text type="secondary" style={{ marginLeft: 8, fontSize: 12 }}>
                        {git.last_commit_message}
                      </Text>
                    </Descriptions.Item>
                    <Descriptions.Item label="提交时间">
                      <Text type="secondary" style={{ fontSize: 12 }}>{git.last_commit_time}</Text>
                    </Descriptions.Item>
                  </Descriptions>
                  <Space>
                    <Button
                      size="small"
                      icon={<CloudDownloadOutlined />}
                      loading={actionLoading[`pull-${git.env}`]}
                      onClick={() => handlePull(git.env)}
                    >
                      Git Pull
                    </Button>
                    <Button
                      size="small"
                      icon={<SyncOutlined />}
                      loading={actionLoading[`sync-${git.env}`]}
                      onClick={() => handleSyncDeps(git.env)}
                    >
                      同步依赖
                    </Button>
                    <Button
                      size="small"
                      icon={<FileTextOutlined />}
                      onClick={() => handleViewEnv(git.env, label)}
                    >
                      查看配置
                    </Button>
                  </Space>
                </Card>
              </Col>
            );
          })}
        </Row>
      </Card>

      {/* ---- Config viewer modal ---- */}
      <Modal
        title={envModalTitle}
        open={envModalOpen}
        onCancel={() => setEnvModalOpen(false)}
        footer={null}
        width={700}
      >
        <TextArea
          value={envModalContent}
          readOnly
          autoSize={{ minRows: 10, maxRows: 30 }}
          style={{ fontFamily: "monospace", fontSize: 12 }}
        />
      </Modal>
    </div>
  );
};

export default OpsPanel;
@@ -71,6 +71,7 @@ const FALLBACK_PROCESSING_MODES: ProcModeEntry[] = [
  { value: "increment_only", label: "仅增量", desc: "按游标增量抓取和装载" },
  { value: "verify_only", label: "校验并修复", desc: "对比源和目标,修复差异" },
  { value: "increment_verify", label: "增量+校验", desc: "先增量再校验" },
  { value: "full_window", label: "全窗口", desc: "用 API 返回数据的时间范围处理所有层" },
];

/** Convert the FlowDef[] returned by the API into Record<id, FlowEntry> */
@@ -243,7 +244,7 @@ const TaskConfig: React.FC = () => {
      : selectedTasks;
    return {
      tasks,
      pipeline: flow,
      flow: flow,
      processing_mode: processingMode,
      pipeline_flow: "FULL",
      dry_run: dryRun,
@@ -258,7 +259,8 @@ const TaskConfig: React.FC = () => {
      skip_ods_when_fetch_before_verify: false,
      ods_use_local_json: useLocalJson,
      store_id: effectiveStoreId,
      dwd_only_tables: selectedDwdTables.length > 0 ? selectedDwdTables : null,
      /* CHANGE [2026-02-19] intent: DWD tables are positively checked; checked = load */
      dwd_only_tables: layers.includes("DWD") ? (selectedDwdTables.length > 0 ? selectedDwdTables : null) : null,
      force_full: forceFull,
      extra_args: {},
    };

@@ -2,22 +2,27 @@
 * Task management page.
 *
 * Three tabs: Queue, Schedule, History.
 * Queue tab: tasks in the "running" state can be clicked to open a live WebSocket log stream.
 * History tab: click a record to view execution details and historical logs.
 */

import React, { useEffect, useState, useCallback } from 'react';
import React, { useEffect, useState, useCallback, useRef } from 'react';
import {
  Tabs, Table, Tag, Button, Popconfirm, Space, message, Drawer,
  Typography, Descriptions, Empty,
  Typography, Descriptions, Empty, Spin,
} from 'antd';
import {
  ReloadOutlined, DeleteOutlined, StopOutlined,
  UnorderedListOutlined, ClockCircleOutlined, HistoryOutlined,
  FileTextOutlined,
} from '@ant-design/icons';
import type { ColumnsType } from 'antd/es/table';
import type { QueuedTask, ExecutionLog } from '../types';
import {
  fetchQueue, fetchHistory, deleteFromQueue, cancelExecution,
} from '../api/execution';
import { apiClient } from '../api/client';
import LogStream from '../components/LogStream';
import ScheduleTab from '../components/ScheduleTab';

const { Title, Text } = Typography;
@@ -61,6 +66,13 @@ const QueueTab: React.FC = () => {
  const [data, setData] = useState<QueuedTask[]>([]);
  const [loading, setLoading] = useState(false);

  /* WebSocket log-stream state */
  const [logDrawerOpen, setLogDrawerOpen] = useState(false);
  const [logLines, setLogLines] = useState<string[]>([]);
  const [logTaskId, setLogTaskId] = useState<string | null>(null);
  const [wsConnected, setWsConnected] = useState(false);
  const wsRef = useRef<WebSocket | null>(null);

  const load = useCallback(async () => {
    setLoading(true);
    try { setData(await fetchQueue()); }
@@ -70,6 +82,51 @@ const QueueTab: React.FC = () => {

  useEffect(() => { load(); }, [load]);

  /* Poll the queue every 5 seconds to keep statuses fresh */
  useEffect(() => {
    const timer = setInterval(load, 5_000);
    return () => clearInterval(timer);
  }, [load]);

  /* Close the WebSocket when the component unmounts */
  useEffect(() => {
    return () => { wsRef.current?.close(); };
  }, []);

  /** Open the log drawer and establish a WebSocket connection */
  const handleViewLogs = useCallback((taskId: string) => {
    setLogTaskId(taskId);
    setLogLines([]);
    setLogDrawerOpen(true);

    // Close any previous connection
    wsRef.current?.close();

    const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
    const host = window.location.host;
    const ws = new WebSocket(`${protocol}//${host}/ws/logs/${taskId}`);
    wsRef.current = ws;

    ws.onopen = () => { setWsConnected(true); };
    ws.onmessage = (event) => {
      setLogLines((prev) => [...prev, event.data]);
    };
    ws.onclose = () => { setWsConnected(false); };
    ws.onerror = () => {
      message.error('WebSocket 连接失败');
      setWsConnected(false);
    };
  }, []);

  /** Close the log drawer */
  const handleCloseLogDrawer = useCallback(() => {
    setLogDrawerOpen(false);
    wsRef.current?.close();
    wsRef.current = null;
    setWsConnected(false);
    setLogTaskId(null);
  }, []);

  const handleDelete = async (id: string) => {
    try { await deleteFromQueue(id); message.success('已删除'); load(); }
    catch { message.error('删除失败'); }
@@ -90,7 +147,7 @@ const QueueTab: React.FC = () => {
      ),
    },
    {
      title: 'Flow', dataIndex: ['config', 'pipeline'], key: 'pipeline', width: 120,
      title: 'Flow', dataIndex: ['config', 'flow'], key: 'flow', width: 120,
      render: (v: string) => <Tag>{v}</Tag>,
    },
    {
@@ -100,7 +157,7 @@ const QueueTab: React.FC = () => {
    { title: '位置', dataIndex: 'position', key: 'position', width: 60, align: 'center' },
    { title: '创建时间', dataIndex: 'created_at', key: 'created_at', width: 170, render: fmtTime },
    {
      title: '操作', key: 'action', width: 100, align: 'center',
      title: '操作', key: 'action', width: 160, align: 'center',
      render: (_: unknown, record: QueuedTask) => {
        if (record.status === 'pending') {
          return (
@@ -111,9 +168,17 @@ const QueueTab: React.FC = () => {
        }
        if (record.status === 'running') {
          return (
            <Popconfirm title="确认取消执行?" onConfirm={() => handleCancel(record.id)}>
              <Button type="link" danger icon={<StopOutlined />} size="small">取消</Button>
            </Popconfirm>
            <Space size={4}>
              <Button
                type="link" icon={<FileTextOutlined />} size="small"
                onClick={() => handleViewLogs(record.id)}
              >
                日志
              </Button>
              <Popconfirm title="确认取消执行?" onConfirm={() => handleCancel(record.id)}>
                <Button type="link" danger icon={<StopOutlined />} size="small">取消</Button>
              </Popconfirm>
            </Space>
          );
        }
        return null;
@@ -132,6 +197,29 @@ const QueueTab: React.FC = () => {
        loading={loading} pagination={false} size="small"
        locale={{ emptyText: <Empty description="队列为空" /> }}
      />

      {/* Live log drawer */}
      <Drawer
        title={
          <Space>
            <FileTextOutlined />
            <span>执行日志</span>
            {wsConnected
              ? <Tag color="processing">实时连接中</Tag>
              : <Tag>未连接</Tag>}
          </Space>
        }
        open={logDrawerOpen}
        onClose={handleCloseLogDrawer}
        width={720}
        styles={{ body: { padding: 12, display: 'flex', flexDirection: 'column', height: '100%' } }}
      >
        {logTaskId && (
          <div style={{ flex: 1, minHeight: 0 }}>
            <LogStream executionId={logTaskId} lines={logLines} />
          </div>
        )}
      </Drawer>
    </>
  );
};
@@ -144,6 +232,8 @@ const HistoryTab: React.FC = () => {
  const [data, setData] = useState<ExecutionLog[]>([]);
  const [loading, setLoading] = useState(false);
  const [detail, setDetail] = useState<ExecutionLog | null>(null);
  const [historyLogLines, setHistoryLogLines] = useState<string[]>([]);
  const [logLoading, setLogLoading] = useState(false);

  const load = useCallback(async () => {
    setLoading(true);
@@ -154,6 +244,28 @@ const HistoryTab: React.FC = () => {

  useEffect(() => { load(); }, [load]);

  /** Load details and logs when a row is clicked */
  const handleRowClick = useCallback(async (record: ExecutionLog) => {
    setDetail(record);
    setHistoryLogLines([]);
    setLogLoading(true);
    try {
      const { data: logData } = await apiClient.get<{
        execution_id: string;
        output_log: string | null;
        error_log: string | null;
      }>(`/execution/${record.id}/logs`);
      const parts: string[] = [];
      if (logData.output_log) parts.push(logData.output_log);
      if (logData.error_log) parts.push(logData.error_log);
      setHistoryLogLines(parts.join('\n').split('\n').filter(Boolean));
    } catch {
      /* Logs may not exist; fail silently */
    } finally {
      setLogLoading(false);
    }
  }, []);

  const columns: ColumnsType<ExecutionLog> = [
    {
      title: '任务', dataIndex: 'task_codes', key: 'task_codes',
@@ -187,31 +299,44 @@ const HistoryTab: React.FC = () => {
        rowKey="id" columns={columns} dataSource={data}
        loading={loading} pagination={{ pageSize: 20, showTotal: (t) => `共 ${t} 条` }}
        size="small"
        onRow={(record) => ({ onClick: () => setDetail(record), style: { cursor: 'pointer' } })}
        onRow={(record) => ({ onClick: () => handleRowClick(record), style: { cursor: 'pointer' } })}
      />

      <Drawer
        title="执行详情" open={!!detail} onClose={() => setDetail(null)}
        width={520}
        width={720}
        styles={{ body: { padding: 12 } }}
      >
        {detail && (
          <Descriptions column={1} bordered size="small">
            <Descriptions.Item label="任务">{detail.task_codes?.join(', ')}</Descriptions.Item>
            <Descriptions.Item label="状态">
              <Tag color={STATUS_COLOR[detail.status] ?? 'default'}>{detail.status}</Tag>
            </Descriptions.Item>
            <Descriptions.Item label="开始时间">{fmtTime(detail.started_at)}</Descriptions.Item>
            <Descriptions.Item label="结束时间">{fmtTime(detail.finished_at)}</Descriptions.Item>
            <Descriptions.Item label="时长">{fmtDuration(detail.duration_ms)}</Descriptions.Item>
            <Descriptions.Item label="退出码">
              {detail.exit_code != null ? (
                <Tag color={detail.exit_code === 0 ? 'success' : 'error'}>{detail.exit_code}</Tag>
              ) : '—'}
            </Descriptions.Item>
            <Descriptions.Item label="命令">
              <code style={{ wordBreak: 'break-all', fontSize: 12 }}>{detail.command || '—'}</code>
            </Descriptions.Item>
          </Descriptions>
          <>
            <Descriptions column={1} bordered size="small" style={{ marginBottom: 16 }}>
              <Descriptions.Item label="任务">{detail.task_codes?.join(', ')}</Descriptions.Item>
              <Descriptions.Item label="状态">
                <Tag color={STATUS_COLOR[detail.status] ?? 'default'}>{detail.status}</Tag>
              </Descriptions.Item>
              <Descriptions.Item label="开始时间">{fmtTime(detail.started_at)}</Descriptions.Item>
              <Descriptions.Item label="结束时间">{fmtTime(detail.finished_at)}</Descriptions.Item>
              <Descriptions.Item label="时长">{fmtDuration(detail.duration_ms)}</Descriptions.Item>
              <Descriptions.Item label="退出码">
                {detail.exit_code != null ? (
                  <Tag color={detail.exit_code === 0 ? 'success' : 'error'}>{detail.exit_code}</Tag>
                ) : '—'}
              </Descriptions.Item>
              <Descriptions.Item label="命令">
                <code style={{ wordBreak: 'break-all', fontSize: 12 }}>{detail.command || '—'}</code>
              </Descriptions.Item>
            </Descriptions>

            {/* Historical log display */}
            <div style={{ marginBottom: 8, display: 'flex', alignItems: 'center', gap: 8 }}>
              <FileTextOutlined />
              <Text strong>执行日志</Text>
              {logLoading && <Spin size="small" />}
            </div>
            <div style={{ height: 400 }}>
              <LogStream executionId={detail.id} lines={historyLogLines} />
            </div>
          </>
        )}
      </Drawer>
    </>

@@ -6,8 +6,8 @@
/** ETL task execution config */
export interface TaskConfig {
  tasks: string[];
  /** Flow ID of the execution flow (maps to CLI --pipeline) */
  pipeline: string;
  /** Flow ID of the execution flow (maps to CLI --flow) */
  flow: string;
  /** Processing mode */
  processing_mode: string;
  /** Legacy-mode compatibility (deprecated) */
@@ -36,7 +36,7 @@ export interface TaskConfig {
}

/** Execution-flow (Flow) definition */
export interface PipelineDefinition {
export interface FlowDefinition {
  id: string;
  name: string;
  /** Layers included: ODS / DWD / DWS / INDEX */

@@ -12,7 +12,9 @@ from fastapi.middleware.cors import CORSMiddleware

from app import config
# CHANGE 2026-02-19 | added the xcx_test router (MVP validation) and the wx_callback router (WeChat message push)
from app.routers import auth, execution, schedules, tasks, env_config, db_viewer, etl_status, xcx_test, wx_callback
# CHANGE 2026-02-22 | added the member_birthday router (assistants manually backfill member birthdays)
# CHANGE 2026-02-23 | added the ops_panel router (ops control panel)
from app.routers import auth, execution, schedules, tasks, env_config, db_viewer, etl_status, xcx_test, wx_callback, member_birthday, ops_panel
from app.services.scheduler import scheduler
from app.services.task_queue import task_queue
from app.ws.logs import ws_router
@@ -60,6 +62,8 @@ app.include_router(etl_status.router)
app.include_router(ws_router)
app.include_router(xcx_test.router)
app.include_router(wx_callback.router)
app.include_router(member_birthday.router)
app.include_router(ops_panel.router)


@app.get("/health", tags=["系统"])

apps/backend/app/routers/member_birthday.py (new file, 57 lines)
@@ -0,0 +1,57 @@
"""
Member-birthday manual backfill routes.

- POST /api/member-birthday — an assistant submits a member's birthday (UPSERT)
"""

import logging

from fastapi import APIRouter, HTTPException, status

from app.database import get_connection
from app.schemas.member_birthday import MemberBirthdaySubmit

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api", tags=["会员生日"])


@router.post("/member-birthday")
async def submit_member_birthday(body: MemberBirthdaySubmit):
    """
    An assistant submits a member's birthday (UPSERT).

    When the same (member_id, assistant_id) pair submits again,
    birthday_value and recorded_at are updated; records from other
    assistants are left untouched.
    """
    sql = """
        INSERT INTO member_birthday_manual
            (member_id, birthday_value, recorded_by_assistant_id, recorded_by_name, site_id)
        VALUES (%s, %s, %s, %s, %s)
        ON CONFLICT (member_id, recorded_by_assistant_id)
        DO UPDATE SET
            birthday_value = EXCLUDED.birthday_value,
            recorded_at = NOW()
    """
    conn = get_connection()
    try:
        with conn.cursor() as cur:
            cur.execute(sql, (
                body.member_id,
                body.birthday_value,
                body.assistant_id,
                body.assistant_name,
                body.site_id,
            ))
        conn.commit()
    except Exception:
        conn.rollback()
        logger.exception("会员生日 UPSERT 失败: member_id=%s", body.member_id)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="生日提交失败,请稍后重试",
        )
    finally:
        conn.close()

    return {"status": "ok"}
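The route leans on PostgreSQL's `ON CONFLICT ... DO UPDATE` to keep exactly one row per (member, assistant) pair. The same semantics can be exercised standalone; a minimal sketch using sqlite3 (which supports the same clause since 3.24) against a trimmed stand-in table, not the real schema:

```python
import sqlite3

# Stand-in for member_birthday_manual: one row per (member_id, assistant_id).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE member_birthday_manual (
        member_id INTEGER,
        recorded_by_assistant_id INTEGER,
        birthday_value TEXT,
        PRIMARY KEY (member_id, recorded_by_assistant_id)
    )
""")

UPSERT = """
    INSERT INTO member_birthday_manual (member_id, recorded_by_assistant_id, birthday_value)
    VALUES (?, ?, ?)
    ON CONFLICT (member_id, recorded_by_assistant_id)
    DO UPDATE SET birthday_value = EXCLUDED.birthday_value
"""

conn.execute(UPSERT, (1, 10, "1991-01-25"))  # assistant 10 records a birthday
conn.execute(UPSERT, (1, 11, "1991-01-26"))  # assistant 11 keeps an independent record
conn.execute(UPSERT, (1, 10, "1991-02-14"))  # assistant 10 resubmits -> update, not insert

rows = sorted(conn.execute("SELECT * FROM member_birthday_manual").fetchall())
print(rows)  # two rows total; assistant 10's value was overwritten
```

Resubmitting never duplicates a row, and one assistant's correction never touches another's record — which is exactly the behavior the docstring promises.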
apps/backend/app/routers/ops_panel.py (new file, 372 lines)
@@ -0,0 +1,372 @@
"""
Ops control-panel API.

Provides service-status viewing, start/stop control, Git operations, and
config management for each environment on the server.
Backs the ops-panel page of the admin console.
"""

import asyncio
import os
import subprocess
import platform
from datetime import datetime
from pathlib import Path
from typing import Any

import psutil
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

router = APIRouter(prefix="/api/ops", tags=["运维面板"])

# ---- Environment definitions ----
# Two environments on the server; fall back to local paths on a dev machine (handy for debugging)

_SERVER_BASE = Path("D:/NeoZQYY")

ENVIRONMENTS: dict[str, dict[str, Any]] = {
    "test": {
        "label": "测试环境",
        "repo_path": str(_SERVER_BASE / "test" / "repo"),
        "branch": "test",
        "port": 8001,
        "bat_script": str(_SERVER_BASE / "scripts" / "start-test-api.bat"),
        "window_title": "NeoZQYY Test API",
    },
    "prod": {
        "label": "正式环境",
        "repo_path": str(_SERVER_BASE / "prod" / "repo"),
        "branch": "master",
        "port": 8000,
        "bat_script": str(_SERVER_BASE / "scripts" / "start-prod-api.bat"),
        "window_title": "NeoZQYY Prod API",
    },
}


# ---- Data models ----

class ServiceStatus(BaseModel):
    env: str
    label: str
    running: bool
    pid: int | None = None
    port: int
    uptime_seconds: float | None = None
    memory_mb: float | None = None
    cpu_percent: float | None = None


class GitInfo(BaseModel):
    env: str
    branch: str
    last_commit_hash: str
    last_commit_message: str
    last_commit_time: str
    has_local_changes: bool


class SystemInfo(BaseModel):
    cpu_percent: float
    memory_total_gb: float
    memory_used_gb: float
    memory_percent: float
    disk_total_gb: float
    disk_used_gb: float
    disk_percent: float
    boot_time: str


class EnvFileContent(BaseModel):
    content: str


class GitPullResult(BaseModel):
    env: str
    success: bool
    output: str


class ServiceActionResult(BaseModel):
    env: str
    action: str
    success: bool
    message: str


# ---- Helpers ----

def _find_uvicorn_process(port: int) -> psutil.Process | None:
    """Find the uvicorn process listening on the given port."""
    for proc in psutil.process_iter(["pid", "name", "cmdline"]):
        try:
            cmdline = proc.info.get("cmdline") or []
            cmdline_str = " ".join(cmdline)
            # Match a uvicorn process whose command line mentions the port
            if "uvicorn" in cmdline_str and str(port) in cmdline_str:
                return proc
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return None


def _run_cmd(cmd: str | list[str], cwd: str | None = None, timeout: int = 30) -> tuple[bool, str]:
    """Run a command and return (success, output)."""
    try:
        result = subprocess.run(
            cmd,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
            shell=isinstance(cmd, str),
            encoding="utf-8",
            errors="replace",
        )
        output = (result.stdout + "\n" + result.stderr).strip()
        return result.returncode == 0, output
    except subprocess.TimeoutExpired:
        return False, "命令执行超时"
    except Exception as e:
        return False, str(e)
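`_run_cmd` never raises: timeouts and launch failures are folded into the `(success, output)` pair the callers consume. A trimmed standalone version of that contract (simplified — no `cwd` or encoding handling) behaves like this:

```python
import subprocess
import sys

def run_cmd(cmd, timeout: int = 30) -> tuple[bool, str]:
    """Trimmed sketch of _run_cmd: never raises, returns (success, output)."""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
            shell=isinstance(cmd, str),  # str -> shell command, list -> argv
        )
        # stdout and stderr are merged into one displayable string
        return result.returncode == 0, (result.stdout + "\n" + result.stderr).strip()
    except subprocess.TimeoutExpired:
        return False, "timed out"
    except Exception as e:  # e.g. FileNotFoundError for a missing executable
        return False, str(e)

ok, out = run_cmd([sys.executable, "-c", "print('hello')"])
print(ok, out)   # True hello
bad, _ = run_cmd([sys.executable, "-c", "raise SystemExit(1)"])
print(bad)       # False
```

Returning a flag instead of raising keeps the route handlers free of try/except blocks around every Git or `uv` invocation.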


# ---- System info ----

@router.get("/system", response_model=SystemInfo)
async def get_system_info():
    """Return an overview of server system resources."""
    mem = psutil.virtual_memory()
    disk = psutil.disk_usage("D:\\") if os.path.exists("D:\\") else psutil.disk_usage("/")
    boot = datetime.fromtimestamp(psutil.boot_time())
    return SystemInfo(
        cpu_percent=psutil.cpu_percent(interval=0.5),
        memory_total_gb=round(mem.total / (1024 ** 3), 2),
        memory_used_gb=round(mem.used / (1024 ** 3), 2),
        memory_percent=mem.percent,
        disk_total_gb=round(disk.total / (1024 ** 3), 2),
        disk_used_gb=round(disk.used / (1024 ** 3), 2),
        disk_percent=disk.percent,
        boot_time=boot.isoformat(),
    )


# ---- Service status ----

@router.get("/services", response_model=list[ServiceStatus])
async def get_services_status():
    """Return the running state of every environment's service."""
    results = []
    for env_key, env_cfg in ENVIRONMENTS.items():
        proc = _find_uvicorn_process(env_cfg["port"])
        if proc:
            try:
                mem_info = proc.memory_info()
                create_time = proc.create_time()
                results.append(ServiceStatus(
                    env=env_key,
                    label=env_cfg["label"],
                    running=True,
                    pid=proc.pid,
                    port=env_cfg["port"],
                    uptime_seconds=round(datetime.now().timestamp() - create_time, 1),
                    memory_mb=round(mem_info.rss / (1024 ** 2), 1),
                    cpu_percent=proc.cpu_percent(interval=0.1),
                ))
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                results.append(ServiceStatus(
                    env=env_key, label=env_cfg["label"],
                    running=False, port=env_cfg["port"],
                ))
        else:
            results.append(ServiceStatus(
                env=env_key, label=env_cfg["label"],
                running=False, port=env_cfg["port"],
            ))
    return results


# ---- Service start/stop ----

@router.post("/services/{env}/start", response_model=ServiceActionResult)
async def start_service(env: str):
    """Start the backend service for the given environment."""
    if env not in ENVIRONMENTS:
        raise HTTPException(404, f"未知环境: {env}")

    cfg = ENVIRONMENTS[env]
    proc = _find_uvicorn_process(cfg["port"])
    if proc:
        return ServiceActionResult(
            env=env, action="start", success=True,
            message=f"服务已在运行中 (PID: {proc.pid})",
        )

    bat_path = cfg["bat_script"]
    if not os.path.exists(bat_path):
        raise HTTPException(400, f"启动脚本不存在: {bat_path}")

    # Launch the .bat script in a new window via the `start` command
    try:
        subprocess.Popen(
            f'start "{cfg["window_title"]}" cmd /c "{bat_path}"',
            shell=True,
        )
        # Give the process time to start
        await asyncio.sleep(3)
        new_proc = _find_uvicorn_process(cfg["port"])
        if new_proc:
            return ServiceActionResult(
                env=env, action="start", success=True,
                message=f"服务已启动 (PID: {new_proc.pid})",
            )
        else:
            return ServiceActionResult(
                env=env, action="start", success=False,
                message="启动命令已执行,但未检测到进程,请检查日志",
            )
    except Exception as e:
        return ServiceActionResult(
            env=env, action="start", success=False, message=str(e),
        )


@router.post("/services/{env}/stop", response_model=ServiceActionResult)
async def stop_service(env: str):
    """Stop the backend service for the given environment."""
    if env not in ENVIRONMENTS:
        raise HTTPException(404, f"未知环境: {env}")

    cfg = ENVIRONMENTS[env]
    proc = _find_uvicorn_process(cfg["port"])
    if not proc:
        return ServiceActionResult(
            env=env, action="stop", success=True, message="服务未在运行",
        )

    try:
        # Terminate the process tree (children included)
        parent = psutil.Process(proc.pid)
        children = parent.children(recursive=True)
        for child in children:
            child.terminate()
        parent.terminate()
        # Wait for the processes to exit
        gone, alive = psutil.wait_procs([parent] + children, timeout=5)
        for p in alive:
            p.kill()
        return ServiceActionResult(
            env=env, action="stop", success=True, message="服务已停止",
        )
    except Exception as e:
        return ServiceActionResult(
            env=env, action="stop", success=False, message=str(e),
        )


@router.post("/services/{env}/restart", response_model=ServiceActionResult)
async def restart_service(env: str):
    """Restart the backend service for the given environment."""
    stop_result = await stop_service(env)
    if not stop_result.success and "未在运行" not in stop_result.message:
        return ServiceActionResult(
            env=env, action="restart", success=False,
            message=f"停止失败: {stop_result.message}",
        )
    await asyncio.sleep(1)
    start_result = await start_service(env)
    return ServiceActionResult(
        env=env, action="restart",
        success=start_result.success,
        message=start_result.message,
    )


# ---- Git operations ----

@router.get("/git", response_model=list[GitInfo])
async def get_git_info():
    """Return the Git status of every environment."""
    results = []
    for env_key, env_cfg in ENVIRONMENTS.items():
        repo = env_cfg["repo_path"]
        if not os.path.isdir(os.path.join(repo, ".git")):
            results.append(GitInfo(
                env=env_key, branch="N/A",
                last_commit_hash="N/A", last_commit_message="仓库不存在",
                last_commit_time="", has_local_changes=False,
            ))
            continue

        _, branch = _run_cmd(["git", "rev-parse", "--abbrev-ref", "HEAD"], cwd=repo)
        _, log_out = _run_cmd(
            ["git", "log", "-1", "--format=%H|%s|%ci"],
            cwd=repo,
        )
        _, status_out = _run_cmd(["git", "status", "--porcelain"], cwd=repo)

        parts = log_out.strip().split("|", 2) if log_out else ["", "", ""]
        results.append(GitInfo(
            env=env_key,
            branch=branch.strip(),
            last_commit_hash=parts[0][:8] if parts[0] else "N/A",
            last_commit_message=parts[1] if len(parts) > 1 else "",
            last_commit_time=parts[2] if len(parts) > 2 else "",
            has_local_changes=bool(status_out.strip()),
        ))
    return results
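`git log -1 --format=%H|%s|%ci` emits one pipe-delimited line, and `split("|", 2)` caps the split at three fields. A standalone sketch of the same parsing, fed a hypothetical sample line (not output from a real repo):

```python
def parse_git_log_line(log_out: str) -> dict:
    """Parse one `git log -1 --format=%H|%s|%ci` line into commit fields."""
    parts = log_out.strip().split("|", 2) if log_out else ["", "", ""]
    return {
        "hash": parts[0][:8] if parts[0] else "N/A",  # short hash for display
        "message": parts[1] if len(parts) > 1 else "",
        "time": parts[2] if len(parts) > 2 else "",
    }

# Hypothetical sample line in the %H|%s|%ci layout.
sample = "0a1b2c3d4e5f60718293a4b5c6d7e8f901234567|fix: pipeline -> flow|2026-02-23 10:00:00 +0800"
info = parse_git_log_line(sample)
print(info["hash"], info["message"])   # 0a1b2c3d fix: pipeline -> flow
print(parse_git_log_line("")["hash"])  # N/A
```

One caveat the `maxsplit` guard does not cover: a commit subject that itself contains `|` gets cut at its first pipe, and the remainder leaks into the time field; a control-character separator such as `%x00` would be unambiguous.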


@router.post("/git/{env}/pull", response_model=GitPullResult)
async def git_pull(env: str):
    """Run git pull for the given environment."""
    if env not in ENVIRONMENTS:
        raise HTTPException(404, f"未知环境: {env}")

    cfg = ENVIRONMENTS[env]
    repo = cfg["repo_path"]
    if not os.path.isdir(os.path.join(repo, ".git")):
        raise HTTPException(400, f"仓库路径不存在: {repo}")

    success, output = _run_cmd(["git", "pull", "--ff-only"], cwd=repo, timeout=60)
    return GitPullResult(env=env, success=success, output=output)


@router.post("/git/{env}/sync-deps", response_model=ServiceActionResult)
async def sync_deps(env: str):
    """Run uv sync --all-packages for the given environment."""
    if env not in ENVIRONMENTS:
        raise HTTPException(404, f"未知环境: {env}")

    cfg = ENVIRONMENTS[env]
    repo = cfg["repo_path"]
    success, output = _run_cmd(["uv", "sync", "--all-packages"], cwd=repo, timeout=120)
    return ServiceActionResult(
        env=env, action="sync-deps", success=success, message=output[:500],
    )


# ---- Environment config management ----

@router.get("/env-file/{env}")
async def get_env_file(env: str):
    """Read the environment's .env file, with sensitive values masked."""
    if env not in ENVIRONMENTS:
        raise HTTPException(404, f"未知环境: {env}")

    env_path = Path(ENVIRONMENTS[env]["repo_path"]) / ".env"
    if not env_path.exists():
        raise HTTPException(404, f".env 文件不存在: {env_path}")

    lines = env_path.read_text(encoding="utf-8").splitlines()
    masked_lines = []
    sensitive_keys = {"PASSWORD", "SECRET", "TOKEN", "DSN", "APP_SECRET"}
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if any(s in key.upper() for s in sensitive_keys):
                masked_lines.append(f"{key}=********")
                continue
        masked_lines.append(line)
    return {"env": env, "content": "\n".join(masked_lines)}
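The masking is a substring match against the upper-cased key, so derived names such as `APP_DB_DSN` are caught as well, while comments and blank lines pass through untouched. Factored into a pure helper (hypothetical name `mask_env`; same logic as the loop above), it can be tested directly:

```python
SENSITIVE = {"PASSWORD", "SECRET", "TOKEN", "DSN", "APP_SECRET"}

def mask_env(text: str) -> str:
    """Mask values of sensitive keys in .env content; keep comments and blanks."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if any(s in key.upper() for s in SENSITIVE):
                out.append(f"{key}=********")  # hide the value, keep the key
                continue
        out.append(line)  # comment, blank, or non-sensitive assignment
    return "\n".join(out)

sample = "# db config\nDB_USER=local-Python\nDB_PASSWORD=hunter2\nPG_DSN=postgresql://u:p@h/db\n"
print(mask_env(sample))
```

`DB_PASSWORD` and `PG_DSN` come back as `********` while `DB_USER` and the comment line survive — the shape the admin UI expects.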

@@ -4,7 +4,7 @@
Provides 4 endpoints:
- GET /api/tasks/registry — task list grouped by business domain
- GET /api/tasks/dwd-tables — DWD table definitions grouped by business domain
- GET /api/tasks/flows — 7 flows + 3 processing modes
- GET /api/tasks/flows — 7 flows + 4 processing modes
- POST /api/tasks/validate — validate a TaskConfig and return a CLI command preview

All endpoints require JWT auth. The validate endpoint injects store_id from the JWT.
@@ -103,6 +103,7 @@ PROCESSING_MODE_DEFINITIONS: list[ProcessingModeDefinition] = [
    ProcessingModeDefinition(id="increment_only", name="仅增量处理", description="只处理新增和变更的数据"),
    ProcessingModeDefinition(id="verify_only", name="仅校验修复", description="校验现有数据并修复不一致"),
    ProcessingModeDefinition(id="increment_verify", name="增量 + 校验修复", description="先增量处理,再校验并修复"),
    ProcessingModeDefinition(id="full_window", name="全窗口处理", description="用 API 返回数据的实际时间范围处理全部层,无需校验"),
]


@@ -163,7 +164,7 @@ async def get_dwd_tables(
async def get_flows(
    user: CurrentUser = Depends(get_current_user),
) -> FlowsResponse:
    """Return the 7 flow definitions and 3 processing-mode definitions"""
    """Return the 7 flow definitions and 4 processing-mode definitions"""
    return FlowsResponse(
        flows=FLOW_DEFINITIONS,
        processing_modes=PROCESSING_MODE_DEFINITIONS,
@@ -183,8 +184,9 @@ async def validate_task_config(
    errors: list[str] = []

    # Validate the flow ID
    if config.pipeline not in FLOW_LAYER_MAP:
        errors.append(f"无效的执行流程: {config.pipeline}")
    # CHANGE [2026-02-20] intent: pipeline → flow, unify naming
    if config.flow not in FLOW_LAYER_MAP:
        errors.append(f"无效的执行流程: {config.flow}")

    # Validate that the task list is non-empty
    if not config.tasks:

apps/backend/app/schemas/member_birthday.py (new file, 19 lines)
@@ -0,0 +1,19 @@
"""
Pydantic models for member-birthday manual backfill.

- MemberBirthdaySubmit: request body for an assistant submitting a member's birthday
"""

from datetime import date

from pydantic import BaseModel, Field


class MemberBirthdaySubmit(BaseModel):
    """An assistant submits a member's birthday."""

    member_id: int = Field(..., gt=0, description="会员 ID")
    birthday_value: date = Field(..., description="生日日期")
    assistant_id: int = Field(..., gt=0, description="助教 ID")
    assistant_name: str = Field(..., min_length=1, max_length=50, description="助教姓名")
    site_id: int = Field(..., gt=0, description="门店 ID")

@@ -13,8 +13,8 @@ class TaskConfigSchema(BaseModel):
    """Task config — the frontend/backend transport format

    Field-to-CLI-argument mapping:
    - pipeline → --pipeline (flow ID, one of 7)
    - processing_mode → --processing-mode (one of 3 processing modes)
    - flow → --flow (flow ID, one of 7)
    - processing_mode → --processing-mode (one of 4 processing modes)
    - tasks → --tasks (comma-separated)
    - dry_run → --dry-run (boolean flag)
    - window_mode → chooses between the lookback and custom time windows (frontend-only logic; no direct CLI mapping)
@@ -30,7 +30,8 @@ class TaskConfigSchema(BaseModel):
    """

    tasks: list[str]
    pipeline: str = "api_ods_dwd"
    # CHANGE [2026-02-20] intent: pipeline → flow, unify naming (drop the historical alias)
    flow: str = "api_ods_dwd"
    processing_mode: str = "increment_only"
    dry_run: bool = False
    window_mode: str = "lookback"

@@ -6,7 +6,7 @@

Supports:
- 7 flows (api_ods / api_ods_dwd / api_full / ods_dwd / dwd_dws / dwd_dws_index / dwd_index)
- 3 processing modes (increment_only / verify_only / increment_verify)
- 4 processing modes (increment_only / verify_only / increment_verify / full_window)
- automatic injection of the --store-id argument
"""

@@ -30,6 +30,7 @@ VALID_PROCESSING_MODES: set[str] = {
    "increment_only",
    "verify_only",
    "increment_verify",
    "full_window",
}
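With `full_window` added to the set, any unknown mode should be rejected before the command line is assembled. A simplified stand-in for that check (not the actual CLIBuilder code):

```python
VALID_PROCESSING_MODES = {"increment_only", "verify_only", "increment_verify", "full_window"}

def processing_mode_args(mode: str) -> list[str]:
    """Return the CLI fragment for a processing mode, rejecting unknown ids."""
    if mode not in VALID_PROCESSING_MODES:
        raise ValueError(f"invalid processing mode: {mode}")
    return ["--processing-mode", mode]

print(processing_mode_args("full_window"))  # ['--processing-mode', 'full_window']
```

Validating against the set keeps the backend's accepted ids in lockstep with the four `ProcessingModeDefinition` entries served to the frontend.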
|
||||
|
||||
# CLI 支持的 extra_args 键(值类型 + 布尔类型)
|
||||
@@ -72,7 +73,8 @@ class CLIBuilder:
|
||||
cmd: list[str] = [python_executable, "-m", "cli.main"]
|
||||
|
||||
# -- Flow(执行流程) --
|
||||
cmd.extend(["--flow", config.pipeline])
|
||||
# CHANGE [2026-02-20] intent: pipeline → flow,统一命名
|
||||
cmd.extend(["--flow", config.flow])
|
||||
|
||||
# -- 处理模式 --
|
||||
if config.processing_mode:
|
||||
|
||||
@@ -46,7 +46,7 @@ ODS_TASKS: list[TaskDefinition] = [
|
||||
TaskDefinition("ODS_ASSISTANT_LEDGER", "助教服务记录", "抽取助教服务流水", "助教", "ODS", is_ods=True),
|
||||
TaskDefinition("ODS_ASSISTANT_ABOLISH", "助教取消记录", "抽取助教取消/作废记录", "助教", "ODS", is_ods=True),
|
||||
TaskDefinition("ODS_SETTLEMENT_RECORDS", "结算记录", "抽取订单结算记录", "结算", "ODS", is_ods=True),
|
||||
TaskDefinition("ODS_SETTLEMENT_TICKET", "结账小票", "抽取结账小票明细", "结算", "ODS", is_ods=True),
|
||||
# CHANGE [2026-07-20] intent: 同步 ETL 侧移除——ODS_SETTLEMENT_TICKET 已在 Task 7.3 中彻底移除
|
||||
TaskDefinition("ODS_TABLE_USE", "台费流水", "抽取台费使用流水", "台桌", "ODS", is_ods=True),
|
||||
TaskDefinition("ODS_TABLE_FEE_DISCOUNT", "台费折扣", "抽取台费折扣记录", "台桌", "ODS", is_ods=True),
|
||||
TaskDefinition("ODS_TABLES", "台桌主数据", "抽取门店台桌信息", "台桌", "ODS", is_ods=True, requires_window=False),
|
||||
@@ -59,7 +59,7 @@ ODS_TASKS: list[TaskDefinition] = [
|
||||
TaskDefinition("ODS_RECHARGE_SETTLE", "充值结算", "抽取充值结算记录", "会员", "ODS", is_ods=True),
|
||||
TaskDefinition("ODS_GROUP_PACKAGE", "团购套餐", "抽取团购套餐定义", "团购", "ODS", is_ods=True, requires_window=False),
|
||||
TaskDefinition("ODS_GROUP_BUY_REDEMPTION", "团购核销", "抽取团购核销记录", "团购", "ODS", is_ods=True),
|
||||
TaskDefinition("ODS_INVENTORY_STOCK", "库存快照", "抽取商品库存汇总", "库存", "ODS", is_ods=True, requires_window=False),
|
||||
TaskDefinition("ODS_INVENTORY_STOCK", "库存快照", "抽取商品库存汇总", "库存", "ODS", is_ods=True),
|
||||
TaskDefinition("ODS_INVENTORY_CHANGE", "库存变动", "抽取库存出入库记录", "库存", "ODS", is_ods=True),
|
||||
TaskDefinition("ODS_GOODS_CATEGORY", "商品分类", "抽取商品分类树", "商品", "ODS", is_ods=True, requires_window=False),
|
||||
TaskDefinition("ODS_STORE_GOODS", "门店商品", "抽取门店商品主数据", "商品", "ODS", is_ods=True, requires_window=False),
@@ -91,6 +91,10 @@ DWS_TASKS: list[TaskDefinition] = [
TaskDefinition("DWS_FINANCE_DISCOUNT_DETAIL", "折扣明细", "汇总折扣明细", "财务", "DWS"),
# CHANGE [2026-02-19] intent: 同步 ETL 侧合并——原 DWS_RETENTION_CLEANUP / DWS_MV_REFRESH_* 已合并为 DWS_MAINTENANCE
TaskDefinition("DWS_MAINTENANCE", "DWS 维护", "刷新物化视图 + 清理过期留存数据", "通用", "DWS", requires_window=False, is_common=False),
# CHANGE [2026-07-20] intent: 注册 DWS 库存汇总任务(日/周/月),依赖 DWD goods_stock_summary 加载完成(需求 12.9)
TaskDefinition("DWS_GOODS_STOCK_DAILY", "库存日报", "按日粒度汇总商品库存数据", "库存", "DWS"),
TaskDefinition("DWS_GOODS_STOCK_WEEKLY", "库存周报", "按周粒度汇总商品库存数据", "库存", "DWS"),
TaskDefinition("DWS_GOODS_STOCK_MONTHLY", "库存月报", "按月粒度汇总商品库存数据", "库存", "DWS"),
]

# ── INDEX 任务定义 ────────────────────────────────────────────
@@ -99,7 +103,8 @@ INDEX_TASKS: list[TaskDefinition] = [
TaskDefinition("DWS_WINBACK_INDEX", "回流指数 (WBI)", "计算会员回流指数", "指数", "INDEX"),
TaskDefinition("DWS_NEWCONV_INDEX", "新客转化指数 (NCI)", "计算新客转化指数", "指数", "INDEX"),
TaskDefinition("DWS_ML_MANUAL_IMPORT", "手动导入 (ML)", "手动导入机器学习数据", "指数", "INDEX", requires_window=False, is_common=False),
TaskDefinition("DWS_RELATION_INDEX", "关系指数 (RS)", "计算助教-客户关系指数", "指数", "INDEX"),
# CHANGE [2026-02-19] intent: 补充说明 RelationIndexTask 产出 RS/OS/MS/ML 四个子指数
TaskDefinition("DWS_RELATION_INDEX", "关系指数 (RS)", "产出 RS/OS/MS/ML 四个子指数", "指数", "INDEX"),
]

# ── 工具类任务定义 ────────────────────────────────────────────
@@ -210,6 +215,9 @@ DWD_TABLES: list[DwdTableDefinition] = [
DwdTableDefinition("dwd.dwd_payment", "支付流水", "结算", "ods.payment_transactions"),
DwdTableDefinition("dwd.dwd_refund", "退款流水", "结算", "ods.refund_transactions"),
DwdTableDefinition("dwd.dwd_refund_ex", "退款流水(扩展)", "结算", "ods.refund_transactions"),
# CHANGE [2026-07-20] intent: 同步 Task 6.1/6.2 新建的 DWD 库存表
DwdTableDefinition("dwd.dwd_goods_stock_summary", "库存汇总", "库存", "ods.goods_stock_summary"),
DwdTableDefinition("dwd.dwd_goods_stock_movement", "库存变动", "库存", "ods.goods_stock_movements"),
]

@@ -16,6 +16,7 @@ dependencies = [
"python-dotenv>=1.0",
"python-jose[cryptography]>=3.3",
"bcrypt>=4.0",
"psutil>=5.9",
]

[tool.uv.sources]

@@ -1,7 +1,7 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""CLIBuilder 单元测试
|
||||
|
||||
覆盖:7 种 Flow、3 种处理模式、时间窗口、store_id 自动注入、extra_args 等。
|
||||
覆盖:7 种 Flow、4 种处理模式、时间窗口、store_id 自动注入、extra_args 等。
|
||||
"""
|
||||
|
||||
import pytest
|
||||
@@ -24,11 +24,11 @@ ETL_PATH = "/fake/etl/project"
|
||||
|
||||
class TestBasicCommand:
|
||||
def test_minimal_command(self, builder: CLIBuilder):
|
||||
"""最小配置应生成 python -m cli.main --pipeline ... --processing-mode ..."""
|
||||
"""最小配置应生成 python -m cli.main --flow ... --processing-mode ..."""
|
||||
config = TaskConfigSchema(tasks=["ODS_MEMBER"])
|
||||
cmd = builder.build_command(config, ETL_PATH)
|
||||
assert cmd[:3] == ["python", "-m", "cli.main"]
|
||||
assert "--pipeline" in cmd
|
||||
assert "--flow" in cmd
|
||||
assert "--processing-mode" in cmd
|
||||
|
||||
def test_custom_python_executable(self, builder: CLIBuilder):
|
||||
@@ -56,20 +56,20 @@ class TestBasicCommand:
|
||||
class TestFlows:
|
||||
@pytest.mark.parametrize("flow_id", sorted(VALID_FLOWS))
|
||||
def test_all_flows_accepted(self, builder: CLIBuilder, flow_id: str):
|
||||
config = TaskConfigSchema(tasks=["ODS_MEMBER"], pipeline=flow_id)
|
||||
config = TaskConfigSchema(tasks=["ODS_MEMBER"], flow=flow_id)
|
||||
cmd = builder.build_command(config, ETL_PATH)
|
||||
idx = cmd.index("--pipeline")
|
||||
idx = cmd.index("--flow")
|
||||
assert cmd[idx + 1] == flow_id
|
||||
|
||||
def test_default_flow_is_api_ods_dwd(self, builder: CLIBuilder):
|
||||
config = TaskConfigSchema(tasks=["ODS_MEMBER"])
|
||||
cmd = builder.build_command(config, ETL_PATH)
|
||||
idx = cmd.index("--pipeline")
|
||||
idx = cmd.index("--flow")
|
||||
assert cmd[idx + 1] == "api_ods_dwd"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 3 种处理模式
|
||||
# 4 种处理模式
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestProcessingModes:
|
||||
|
||||
@@ -36,7 +36,7 @@ _NOW = datetime(2024, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
|
||||
# 构造测试用的 TaskConfig payload
|
||||
_VALID_CONFIG = {
|
||||
"tasks": ["ODS_MEMBER"],
|
||||
"pipeline": "api_ods",
|
||||
"flow": "api_ods",
|
||||
}
|
||||
|
||||
|
||||
@@ -74,7 +74,7 @@ class TestRunTask:
|
||||
|
||||
def test_run_invalid_config_returns_422(self):
|
||||
"""缺少必填字段 tasks 时返回 422"""
|
||||
resp = client.post("/api/execution/run", json={"pipeline": "api_ods"})
|
||||
resp = client.post("/api/execution/run", json={"flow": "api_ods"})
|
||||
assert resp.status_code == 422
|
||||
|
||||
|
||||
|
||||
@@ -40,7 +40,7 @@ _task_codes = ["ODS_MEMBER", "ODS_PAYMENT", "ODS_ORDER", "DWD_LOAD_FROM_ODS", "D
|
||||
_simple_config_st = st.builds(
|
||||
TaskConfigSchema,
|
||||
tasks=st.lists(st.sampled_from(_task_codes), min_size=1, max_size=3, unique=True),
|
||||
pipeline=st.sampled_from(["api_ods", "api_ods_dwd", "ods_dwd"]),
|
||||
flow=st.sampled_from(["api_ods", "api_ods_dwd", "ods_dwd"]),
|
||||
)
|
||||
|
||||
|
||||
@@ -254,7 +254,7 @@ def test_queue_crud_invariant(mock_get_conn, config, site_id, initial_count):
|
||||
db.rows[tid] = {
|
||||
"id": tid,
|
||||
"site_id": site_id,
|
||||
"config": {"tasks": ["ODS_MEMBER"], "pipeline": "api_ods"},
|
||||
"config": {"tasks": ["ODS_MEMBER"], "flow": "api_ods"},
|
||||
"status": "pending",
|
||||
"position": i + 1,
|
||||
}
|
||||
@@ -322,7 +322,7 @@ def test_queue_dequeue_order(mock_get_conn, site_id, num_tasks, positions):
|
||||
db.rows[tid] = {
|
||||
"id": tid,
|
||||
"site_id": site_id,
|
||||
"config": {"tasks": [_task_codes[i % len(_task_codes)]], "pipeline": "api_ods"},
|
||||
"config": {"tasks": [_task_codes[i % len(_task_codes)]], "flow": "api_ods"},
|
||||
"status": "pending",
|
||||
"position": pos,
|
||||
}
|
||||
@@ -372,7 +372,7 @@ def test_queue_reorder_consistency(mock_get_conn, site_id, num_tasks, data):
|
||||
db.rows[tid] = {
|
||||
"id": tid,
|
||||
"site_id": site_id,
|
||||
"config": {"tasks": ["ODS_MEMBER"], "pipeline": "api_ods"},
|
||||
"config": {"tasks": ["ODS_MEMBER"], "flow": "api_ods"},
|
||||
"status": "pending",
|
||||
"position": i + 1,
|
||||
}
|
||||
|
||||
@@ -42,7 +42,7 @@ _task_codes = ["ODS_MEMBER", "ODS_PAYMENT", "ODS_ORDER", "DWD_LOAD_FROM_ODS", "D
|
||||
|
||||
_simple_task_config_st = st.fixed_dictionaries({
|
||||
"tasks": st.lists(st.sampled_from(_task_codes), min_size=1, max_size=3, unique=True),
|
||||
"pipeline": st.sampled_from(["api_ods", "api_ods_dwd", "ods_dwd", "api_full"]),
|
||||
"flow": st.sampled_from(["api_ods", "api_ods_dwd", "ods_dwd", "api_full"]),
|
||||
})
|
||||
|
||||
# 调度配置策略:覆盖 5 种调度类型
|
||||
@@ -324,8 +324,8 @@ def test_due_schedule_auto_enqueue(
|
||||
assert enqueued_config.tasks == task_config["tasks"], (
|
||||
f"入队的 tasks 应为 {task_config['tasks']},实际 {enqueued_config.tasks}"
|
||||
)
|
||||
assert enqueued_config.pipeline == task_config["pipeline"], (
|
||||
f"入队的 pipeline 应为 {task_config['pipeline']},实际 {enqueued_config.pipeline}"
|
||||
assert enqueued_config.flow == task_config["flow"], (
|
||||
f"入队的 flow 应为 {task_config['flow']},实际 {enqueued_config.flow}"
|
||||
)
|
||||
|
||||
|
||||
|
||||
@@ -230,7 +230,7 @@ class TestCheckAndEnqueue:
|
||||
@patch("app.services.scheduler.task_queue")
|
||||
def test_enqueues_due_tasks(self, mock_tq, mock_get_conn, sched):
|
||||
"""到期任务应被入队,且更新 last_run_at / run_count / next_run_at"""
|
||||
task_config = {"tasks": ["ODS_MEMBER"], "pipeline": "api_ods_dwd"}
|
||||
task_config = {"tasks": ["ODS_MEMBER"], "flow": "api_ods_dwd"}
|
||||
schedule_config = {
|
||||
"schedule_type": "interval",
|
||||
"interval_value": 1,
|
||||
@@ -280,7 +280,7 @@ class TestCheckAndEnqueue:
|
||||
def test_skips_invalid_config(self, mock_tq, mock_get_conn, sched):
|
||||
"""配置反序列化失败的任务应被跳过"""
|
||||
# task_config 缺少必填字段 tasks
|
||||
bad_config = {"pipeline": "api_ods_dwd"}
|
||||
bad_config = {"flow": "api_ods_dwd"}
|
||||
schedule_config = {"schedule_type": "once"}
|
||||
|
||||
cur = _mock_cursor(
|
||||
@@ -300,7 +300,7 @@ class TestCheckAndEnqueue:
|
||||
@patch("app.services.scheduler.task_queue")
|
||||
def test_enqueue_failure_continues(self, mock_tq, mock_get_conn, sched):
|
||||
"""入队失败时应跳过该任务,继续处理后续任务"""
|
||||
task_config = {"tasks": ["ODS_MEMBER"], "pipeline": "api_ods_dwd"}
|
||||
task_config = {"tasks": ["ODS_MEMBER"], "flow": "api_ods_dwd"}
|
||||
schedule_config = {"schedule_type": "once"}
|
||||
|
||||
cur = _mock_cursor(
|
||||
@@ -327,7 +327,7 @@ class TestCheckAndEnqueue:
|
||||
@patch("app.services.scheduler.task_queue")
|
||||
def test_once_type_sets_next_run_none(self, mock_tq, mock_get_conn, sched):
|
||||
"""once 类型任务入队后,next_run_at 应被设为 NULL"""
|
||||
task_config = {"tasks": ["ODS_MEMBER"], "pipeline": "api_ods_dwd"}
|
||||
task_config = {"tasks": ["ODS_MEMBER"], "flow": "api_ods_dwd"}
|
||||
schedule_config = {"schedule_type": "once"}
|
||||
|
||||
select_cur = _mock_cursor(
|
||||
|
||||
@@ -40,14 +40,14 @@ _SCHEDULE_CONFIG = {
|
||||
_VALID_CREATE = {
|
||||
"name": "每日全量同步",
|
||||
"task_codes": ["ODS_MEMBER", "ODS_ORDER"],
|
||||
"task_config": {"tasks": ["ODS_MEMBER", "ODS_ORDER"], "pipeline": "api_ods"},
|
||||
"task_config": {"tasks": ["ODS_MEMBER", "ODS_ORDER"], "flow": "api_ods"},
|
||||
"schedule_config": _SCHEDULE_CONFIG,
|
||||
}
|
||||
|
||||
# 模拟数据库返回的完整行(13 列,与 _SELECT_COLS 对应)
|
||||
_DB_ROW = (
|
||||
"sched-1", 100, "每日全量同步", ["ODS_MEMBER", "ODS_ORDER"],
|
||||
json.dumps({"tasks": ["ODS_MEMBER"], "pipeline": "api_ods"}),
|
||||
json.dumps({"tasks": ["ODS_MEMBER"], "flow": "api_ods"}),
|
||||
json.dumps(_SCHEDULE_CONFIG),
|
||||
True, None, _NEXT, 0, None, _NOW, _NOW,
|
||||
)
|
||||
|
||||
@@ -53,7 +53,7 @@ def _make_queue_rows(site_id: int, count: int) -> list[tuple]:
|
||||
rows.append((
|
||||
str(uuid.uuid4()), # id
|
||||
site_id, # site_id
|
||||
json.dumps({"tasks": ["ODS_MEMBER"], "pipeline": "api_ods"}), # config
|
||||
json.dumps({"tasks": ["ODS_MEMBER"], "flow": "api_ods"}), # config
|
||||
"pending", # status
|
||||
i + 1, # position
|
||||
datetime(2024, 1, 1, tzinfo=timezone.utc), # created_at
|
||||
@@ -75,7 +75,7 @@ def _make_schedule_rows(site_id: int, count: int) -> list[tuple]:
|
||||
site_id, # site_id
|
||||
f"调度任务_{i}", # name
|
||||
["ODS_MEMBER"], # task_codes
|
||||
{"tasks": ["ODS_MEMBER"], "pipeline": "api_ods"}, # task_config
|
||||
{"tasks": ["ODS_MEMBER"], "flow": "api_ods"}, # task_config
|
||||
{"schedule_type": "daily", "daily_time": "04:00", # schedule_config
|
||||
"interval_value": 1, "interval_unit": "hours",
|
||||
"weekly_days": [1], "weekly_time": "04:00",
|
||||
|
||||
@@ -31,7 +31,7 @@ _tasks_st = st.lists(
|
||||
unique=True,
|
||||
)
|
||||
|
||||
_pipeline_st = st.sampled_from(sorted(VALID_FLOWS))
|
||||
_flow_st = st.sampled_from(sorted(VALID_FLOWS))
|
||||
_processing_mode_st = st.sampled_from(sorted(VALID_PROCESSING_MODES))
|
||||
_window_mode_st = st.sampled_from(["lookback", "custom"])
|
||||
|
||||
@@ -69,7 +69,7 @@ def _valid_task_config_st():
|
||||
@st.composite
|
||||
def _build(draw):
|
||||
tasks = draw(_tasks_st)
|
||||
pipeline = draw(_pipeline_st)
|
||||
flow_id = draw(_flow_st)
|
||||
processing_mode = draw(_processing_mode_st)
|
||||
dry_run = draw(st.booleans())
|
||||
window_mode = draw(_window_mode_st)
|
||||
@@ -103,7 +103,7 @@ def _valid_task_config_st():
|
||||
|
||||
return TaskConfigSchema(
|
||||
tasks=tasks,
|
||||
pipeline=pipeline,
|
||||
flow=flow_id,
|
||||
processing_mode=processing_mode,
|
||||
dry_run=dry_run,
|
||||
window_mode=window_mode,
|
||||
@@ -204,10 +204,10 @@ def test_task_config_to_cli_completeness(config: TaskConfigSchema):
|
||||
"""Property 7: CLIBuilder 生成的命令应包含 TaskConfig 中所有非空字段对应的 CLI 参数。"""
|
||||
cmd = _builder.build_command(config, _ETL_PATH)
|
||||
|
||||
# 1) --pipeline 始终存在且值正确
|
||||
assert "--pipeline" in cmd
|
||||
idx = cmd.index("--pipeline")
|
||||
assert cmd[idx + 1] == config.pipeline
|
||||
# 1) --flow 始终存在且值正确
|
||||
assert "--flow" in cmd
|
||||
idx = cmd.index("--flow")
|
||||
assert cmd[idx + 1] == config.flow
|
||||
|
||||
# 2) --processing-mode 始终存在且值正确
|
||||
assert "--processing-mode" in cmd
|
||||
|
||||
@@ -24,7 +24,7 @@ def executor() -> TaskExecutor:
|
||||
def sample_config() -> TaskConfigSchema:
|
||||
return TaskConfigSchema(
|
||||
tasks=["ODS_MEMBER", "ODS_PAYMENT"],
|
||||
pipeline="api_ods_dwd",
|
||||
flow="api_ods_dwd",
|
||||
store_id=42,
|
||||
)
|
||||
|
||||
|
||||
@@ -25,7 +25,7 @@ def queue() -> TaskQueue:
|
||||
def sample_config() -> TaskConfigSchema:
|
||||
return TaskConfigSchema(
|
||||
tasks=["ODS_MEMBER", "ODS_PAYMENT"],
|
||||
pipeline="api_ods_dwd",
|
||||
flow="api_ods_dwd",
|
||||
store_id=42,
|
||||
)
|
||||
|
||||
@@ -107,7 +107,7 @@ class TestEnqueue:
|
||||
config_json_str = insert_call[0][1][2]
|
||||
parsed = json.loads(config_json_str)
|
||||
assert parsed["tasks"] == ["ODS_MEMBER", "ODS_PAYMENT"]
|
||||
assert parsed["pipeline"] == "api_ods_dwd"
|
||||
assert parsed["flow"] == "api_ods_dwd"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -129,7 +129,7 @@ class TestDequeue:
|
||||
@patch("app.services.task_queue.get_connection")
|
||||
def test_dequeue_returns_task(self, mock_get_conn, queue):
|
||||
task_id = str(uuid.uuid4())
|
||||
config_dict = {"tasks": ["ODS_MEMBER"], "pipeline": "api_ods"}
|
||||
config_dict = {"tasks": ["ODS_MEMBER"], "flow": "api_ods"}
|
||||
row = (
|
||||
task_id, 42, json.dumps(config_dict), "pending", 1,
|
||||
None, None, None, None, None,
|
||||
@@ -149,7 +149,7 @@ class TestDequeue:
|
||||
@patch("app.services.task_queue.get_connection")
|
||||
def test_dequeue_updates_status_to_running(self, mock_get_conn, queue):
|
||||
task_id = str(uuid.uuid4())
|
||||
config_dict = {"tasks": ["ODS_MEMBER"], "pipeline": "api_ods"}
|
||||
config_dict = {"tasks": ["ODS_MEMBER"], "flow": "api_ods"}
|
||||
row = (
|
||||
task_id, 42, json.dumps(config_dict), "pending", 1,
|
||||
None, None, None, None, None,
|
||||
@@ -285,7 +285,7 @@ class TestQuery:
|
||||
@patch("app.services.task_queue.get_connection")
|
||||
def test_list_pending_returns_tasks(self, mock_get_conn, queue):
|
||||
tid = str(uuid.uuid4())
|
||||
config = json.dumps({"tasks": ["ODS_MEMBER"], "pipeline": "api_ods"})
|
||||
config = json.dumps({"tasks": ["ODS_MEMBER"], "flow": "api_ods"})
|
||||
rows = [(tid, 42, config, "pending", 1, None, None, None, None, None)]
|
||||
cur = _mock_cursor(fetchall_val=rows)
|
||||
conn = _mock_conn(cur)
|
||||
@@ -353,7 +353,7 @@ class TestProcessLoop:
|
||||
task_id = str(uuid.uuid4())
|
||||
config_dict = {
|
||||
"tasks": ["ODS_MEMBER"],
|
||||
"pipeline": "api_ods_dwd",
|
||||
"flow": "api_ods_dwd",
|
||||
"processing_mode": "increment_only",
|
||||
"dry_run": False,
|
||||
"window_mode": "lookback",
|
||||
|
||||
@@ -152,7 +152,7 @@ class TestValidate:
|
||||
resp = client.post("/api/tasks/validate", json={
|
||||
"config": {
|
||||
"tasks": ["ODS_MEMBER", "ODS_PAYMENT"],
|
||||
"pipeline": "api_ods",
|
||||
"flow": "api_ods",
|
||||
}
|
||||
})
|
||||
assert resp.status_code == 200
|
||||
@@ -169,7 +169,7 @@ class TestValidate:
|
||||
resp = client.post("/api/tasks/validate", json={
|
||||
"config": {
|
||||
"tasks": ["DWD_LOAD_FROM_ODS"],
|
||||
"pipeline": "ods_dwd",
|
||||
"flow": "ods_dwd",
|
||||
"store_id": 999,
|
||||
}
|
||||
})
|
||||
@@ -184,7 +184,7 @@ class TestValidate:
|
||||
resp = client.post("/api/tasks/validate", json={
|
||||
"config": {
|
||||
"tasks": ["ODS_MEMBER"],
|
||||
"pipeline": "nonexistent_flow",
|
||||
"flow": "nonexistent_flow",
|
||||
}
|
||||
})
|
||||
assert resp.status_code == 200
|
||||
@@ -196,7 +196,7 @@ class TestValidate:
|
||||
resp = client.post("/api/tasks/validate", json={
|
||||
"config": {
|
||||
"tasks": [],
|
||||
"pipeline": "api_ods",
|
||||
"flow": "api_ods",
|
||||
}
|
||||
})
|
||||
assert resp.status_code == 200
|
||||
@@ -208,7 +208,7 @@ class TestValidate:
|
||||
resp = client.post("/api/tasks/validate", json={
|
||||
"config": {
|
||||
"tasks": ["ODS_MEMBER"],
|
||||
"pipeline": "api_ods",
|
||||
"flow": "api_ods",
|
||||
"window_mode": "custom",
|
||||
"window_start": "2024-01-01",
|
||||
"window_end": "2024-01-31",
|
||||
@@ -225,7 +225,7 @@ class TestValidate:
|
||||
resp = client.post("/api/tasks/validate", json={
|
||||
"config": {
|
||||
"tasks": ["ODS_MEMBER"],
|
||||
"pipeline": "api_ods",
|
||||
"flow": "api_ods",
|
||||
"window_mode": "custom",
|
||||
"window_start": "2024-12-31",
|
||||
"window_end": "2024-01-01",
|
||||
@@ -237,7 +237,7 @@ class TestValidate:
|
||||
resp = client.post("/api/tasks/validate", json={
|
||||
"config": {
|
||||
"tasks": ["ODS_MEMBER"],
|
||||
"pipeline": "api_ods",
|
||||
"flow": "api_ods",
|
||||
"dry_run": True,
|
||||
}
|
||||
})
|
||||
|
||||
@@ -26,17 +26,18 @@ SCHEMA_ETL=meta
# API 配置(上游 SaaS API)
# ------------------------------------------------------------------------------
API_BASE=https://pc.ficoo.vip/apiprod/admin/v1/
API_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnQtdHlwZSI6IjQiLCJ1c2VyLXR5cGUiOiIxIiwiaHR0cDovL3NjaGVtYXMubWljcm9zb2Z0LmNvbS93cy8yMDA4LzA2L2lkZW50aXR5L2NsYWltcy9yb2xlIjoiMTIiLCJyb2xlLWlkIjoiMTIiLCJ0ZW5hbnQtaWQiOiIyNzkwNjgzMTYwNzA5OTU3Iiwibmlja25hbWUiOiLnp5_miLfnrqHnkIblkZjvvJrmganmgakxIiwic2l0ZS1pZCI6IjAiLCJtb2JpbGUiOiIxMzgxMDUwMjMwNCIsInNpZCI6IjI5NTA0ODk2NTgzOTU4NDUiLCJzdGFmZi1pZCI6IjMwMDk5MTg2OTE1NTkwNDUiLCJvcmctaWQiOiIwIiwicm9sZS10eXBlIjoiMyIsInJlZnJlc2hUb2tlbiI6InoxazVzWjlDeEFKYnFkNG1pT3NwUzBsQTRMYUNGcURkQjJBdFdsQk1DbDA9IiwicmVmcmVzaEV4cGlyeVRpbWUiOiIyMDI2LzIvMjIg5LiL5Y2IMTE6NTk6MzAiLCJuZWVkQ2hlY2tUb2tlbiI6ImZhbHNlIiwiZXhwIjoxNzcxNzc1OTcwLCJpc3MiOiJ0ZXN0IiwiYXVkIjoiVXNlciJ9.27D1QgKFYGgMKR9bS5NbCSl4kIf9oFVOQLsFl_ITxdI
API_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnQtdHlwZSI6IjQiLCJ1c2VyLXR5cGUiOiIxIiwiaHR0cDovL3NjaGVtYXMubWljcm9zb2Z0LmNvbS93cy8yMDA4LzA2L2lkZW50aXR5L2NsYWltcy9yb2xlIjoiMTIiLCJyb2xlLWlkIjoiMTIiLCJ0ZW5hbnQtaWQiOiIyNzkwNjgzMTYwNzA5OTU3Iiwibmlja25hbWUiOiLnp5_miLfnrqHnkIblkZjvvJrmganmgakxIiwic2l0ZS1pZCI6IjAiLCJtb2JpbGUiOiIxMzgxMDUwMjMwNCIsInNpZCI6IjI5NTA0ODk2NTgzOTU4NDUiLCJzdGFmZi1pZCI6IjMwMDk5MTg2OTE1NTkwNDUiLCJvcmctaWQiOiIwIiwicm9sZS10eXBlIjoiMyIsInJlZnJlc2hUb2tlbiI6IjN4d3IwYjNWN01jemlvcFYyZnZibmtpMVg4MEhxNVFvOFRMcHh3RkNkQUk9IiwicmVmcmVzaEV4cGlyeVRpbWUiOiIyMDI2LzMvMSDkuIvljYgxMDo1MDozOCIsIm5lZWRDaGVja1Rva2VuIjoiZmFsc2UiLCJleHAiOjE3NzIzNzY2MzgsImlzcyI6InRlc3QiLCJhdWQiOiJVc2VyIn0.k_f4jnSGKOKPoZC22bVSrAo9A1FfRqvsNiGw-Vmc0qQ
API_TIMEOUT=20
API_PAGE_SIZE=200
API_RETRY_MAX=3

# ------------------------------------------------------------------------------
# 路径配置(已更新为 NeoZQYY 路径)
# 路径配置
# CHANGE 2026-02-19 | 统一迁移到 export/ETL-Connectors/feiqiu/ 下
# ------------------------------------------------------------------------------
EXPORT_ROOT=C:/NeoZQYY/export/ETL/JSON
LOG_ROOT=C:/NeoZQYY/export/ETL/LOG
FETCH_ROOT=C:/NeoZQYY/export/ETL/JSON
EXPORT_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/JSON
LOG_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/LOGS
FETCH_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/JSON
WRITE_PRETTY_JSON=true

# ------------------------------------------------------------------------------

@@ -53,6 +53,7 @@ DEFAULT_LIST_KEYS: Tuple[str, ...] = (
"goodsCategoryList",
"orderGoodsList",
"orderGoodsLedgers",
"staffProfiles",
)

@@ -174,7 +174,12 @@ def build_recording_client(
tz_name = _cfg_get(cfg, "app.timezone", "Asia/Shanghai") or "Asia/Shanghai"
tz = ZoneInfo(tz_name)
ts = datetime.now(tz).strftime("%Y%m%d-%H%M%S")
fetch_root = _cfg_get(cfg, "pipeline.fetch_root") or _cfg_get(cfg, "io.export_root") or "export/JSON"
fetch_root = _cfg_get(cfg, "pipeline.fetch_root") or _cfg_get(cfg, "io.export_root")
if not fetch_root:
raise RuntimeError(
"EXPORT_ROOT / FETCH_ROOT 未配置。"
"请在根 .env 中设置,参考 .env.template"
)
task_upper = str(task_code).upper()
output_dir = Path(fetch_root) / task_upper / f"{task_upper}-{run_id}-{ts}"

@@ -7,7 +7,6 @@
3. Layers 模式:通过 --layers 自由组合 ETL 层(ODS,DWD,DWS,INDEX)

注意:--flow 和 --layers 互斥,不能同时使用。
--pipeline 是 --flow 的已弃用别名,使用时会输出 DeprecationWarning。

处理模式说明(--processing-mode):
- increment_only:仅增量 - 只执行增量数据处理
@@ -75,6 +74,7 @@ PROCESSING_MODE_CHOICES = [
"increment_only",     # 仅增量
"verify_only",        # 校验并修复(跳过增量)
"increment_verify",   # 增量 + 校验并修复
"full_window",        # 全窗口处理(用 API 返回数据的时间范围,处理所有层,无需校验)
]

# 时间窗口切分选项
@@ -150,9 +150,6 @@ def parse_args():

# 仅执行 DWS + INDEX 层
python -m cli.main --layers DWS,INDEX

# --pipeline 仍可用(已弃用,建议迁移到 --flow)
python -m cli.main --pipeline api_full
""",
)

@@ -167,16 +164,9 @@ def parse_args():
choices=FLOW_CHOICES,
help="Flow 类型(与 --layers 互斥):api_ods, api_ods_dwd, api_full, ods_dwd, dwd_dws, dwd_dws_index, dwd_index",
)
# --pipeline 作为已弃用别名,映射到独立 dest 以便检测使用
parser.add_argument(
"--pipeline",
choices=FLOW_CHOICES,
dest="pipeline_deprecated",
help="[已弃用] 请使用 --flow。功能与 --flow 相同,使用时输出 DeprecationWarning",
)
parser.add_argument(
"--layers",
help="ETL 层自由组合,逗号分隔(ODS,DWD,DWS,INDEX),与 --flow/--pipeline 互斥",
help="ETL 层自由组合,逗号分隔(ODS,DWD,DWS,INDEX),与 --flow 互斥",
)
parser.add_argument(
"--processing-mode",
@@ -409,6 +399,10 @@ def build_cli_overrides(args) -> dict:
if args.allow_empty_advance:
overrides.setdefault("run", {})["allow_empty_result_advance"] = True

# 处理模式(写入 config 供 ODS 任务层读取)
if hasattr(args, "processing_mode") and args.processing_mode:
overrides.setdefault("run", {})["processing_mode"] = args.processing_mode

# 强制全量更新
if args.force_full:
overrides.setdefault("run", {})["force_full_update"] = True
@@ -451,19 +445,7 @@ def main():
logger = setup_logging()
args = parse_args()

# --pipeline 已弃用别名处理:合并到 args.flow(参数名保留以兼容旧调用)
if args.pipeline_deprecated:
import warnings
warnings.warn(
"--pipeline 参数已弃用,请使用 --flow",
DeprecationWarning,
stacklevel=2,
)
if args.flow:
print("错误: --pipeline 和 --flow 不能同时指定", file=sys.stderr)
sys.exit(2)
args.flow = args.pipeline_deprecated

# CHANGE [2026-02-20] intent: 移除 --pipeline 弃用别名,统一使用 --flow
# --layers 和 --flow 互斥校验
if getattr(args, "layers", None) and args.flow:
print("错误: --layers 和 --flow 互斥,请只指定其中一个", file=sys.stderr)
@@ -552,7 +534,7 @@ def main():
db_conn, api_client, logger,
)
result = runner.run(
pipeline=args.flow,
flow=args.flow,
processing_mode=args.processing_mode,
data_source=data_source,
window_start=window_start,
@@ -610,7 +592,7 @@ def main():
db_conn, api_client, logger,
)
result = runner.run(
pipeline=None,
flow=None,
layers=layers,
processing_mode=args.processing_mode,
data_source=data_source,

@@ -80,9 +80,9 @@ DEFAULTS = {
"allow_empty_result_advance": True,
},
"io": {
"export_root": "export/JSON",
"log_root": "export/LOG",
"fetch_root": "export/JSON",
"export_root": "",
"log_root": "",
"fetch_root": "",
"ingest_source_dir": "",
"manifest_name": "manifest.json",
"ingest_report_name": "ingest_report.json",
@@ -94,7 +94,7 @@ DEFAULTS = {
# 运行流程:FETCH_ONLY(仅在线抓取落盘)、INGEST_ONLY(本地清洗入库)、FULL(抓取 + 清洗入库)
"flow": "FULL",
# 在线抓取 JSON 输出根目录(按任务、run_id 与时间自动创建子目录)
"fetch_root": "export/JSON",
"fetch_root": "",
# 本地清洗入库时的 JSON 输入目录(为空则默认使用本次抓取目录)
"ingest_source_dir": "",
},
@@ -115,7 +115,7 @@ DEFAULTS = {
},
"ods": {
# ODS 离线重建/回放相关(仅开发/运维使用)
"json_doc_dir": "export/test-json-doc",
"json_doc_dir": "",
"include_files": "",
"drop_schema_first": True,
},

@@ -64,7 +64,7 @@
- 为全部 23 张 ODS 表创建 `(业务主键, fetched_at DESC)` 复合索引
- 使用 `CREATE INDEX CONCURRENTLY IF NOT EXISTS`,保证幂等且不锁表
- 索引命名规范:`idx_ods_{table_name}_latest`
- 同步更新 `db/etl_feiqiu/schemas/ods.sql` 中的索引定义
- 同步更新 `docs/database/ddl/etl_feiqiu__ods.sql` 中的索引定义

### 用途

@@ -0,0 +1,44 @@
# dim_staff 员工档案主表

> 生成时间:2026-02-23

## 表信息

| 属性 | 值 |
|------|-----|
| Schema | dwd |
| 表名 | dim_staff |
| 主键 | staff_id, scd2_start_time |
| 扩展表 | dim_staff_ex |
| ODS 来源 | ods.staff_info_master |
| 说明 | 员工档案维度主表(SCD2),包含核心业务字段 |

## 字段说明

| 序号 | 字段名 | 类型 | 可空 | 主键 | 说明 |
|------|--------|------|------|------|------|
| 1 | staff_id | BIGINT | NO | PK | 员工唯一标识(映射自 ODS id) |
| 2 | staff_name | TEXT | YES | | 员工姓名 |
| 3 | alias_name | TEXT | YES | | 别名 |
| 4 | mobile | TEXT | YES | | 手机号 |
| 5 | gender | INTEGER | YES | | 性别 |
| 6 | job | TEXT | YES | | 职位(店长/主管/教练/收银员等) |
| 7 | tenant_id | BIGINT | YES | | 租户 ID |
| 8 | site_id | BIGINT | YES | | 门店 ID |
| 9 | system_role_id | INTEGER | YES | | 系统角色 ID |
| 10 | staff_identity | INTEGER | YES | | 员工身份类型 |
| 11 | status | INTEGER | YES | | 账号状态 |
| 12 | leave_status | INTEGER | YES | | 在职状态(0=在职,1=离职) |
| 13 | entry_time | TIMESTAMPTZ | YES | | 入职时间 |
| 14 | resign_time | TIMESTAMPTZ | YES | | 离职时间 |
| 15 | is_delete | INTEGER | YES | | 删除标记 |
| 16 | scd2_start_time | TIMESTAMPTZ | NO | PK | SCD2 版本生效时间 |
| 17 | scd2_end_time | TIMESTAMPTZ | YES | | SCD2 版本失效时间 |
| 18 | scd2_is_current | INTEGER | YES | | 当前版本标记 |
| 19 | scd2_version | INTEGER | YES | | 版本号 |

## 与其他表的关系

- 扩展表:`dwd.dim_staff_ex`(次要/低频变更字段)
- ODS 来源:`ods.staff_info_master`
- 与助教维度表(`dim_assistant`)是完全独立的实体
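The scd2_start_time / scd2_end_time / scd2_is_current / scd2_version columns above follow standard type-2 slowly-changing-dimension semantics. A minimal in-memory sketch of that version transition, assuming an attribute change closes the current row and opens a new one (this is illustrative only, not the project's actual loader):

```python
from datetime import datetime, timezone


def scd2_apply(versions: list[dict], staff_id: int, new_row: dict, now: datetime) -> list[dict]:
    """Close the current version for staff_id and append a new one (SCD2 type-2)."""
    current = next(
        (v for v in versions if v["staff_id"] == staff_id and v["scd2_is_current"] == 1),
        None,
    )
    if current is not None:
        # Compare only the tracked business attributes.
        if {k: current.get(k) for k in new_row} == new_row:
            return versions  # no change: keep the current version open
        current["scd2_end_time"] = now
        current["scd2_is_current"] = 0
    versions.append({
        "staff_id": staff_id,
        **new_row,
        "scd2_start_time": now,
        "scd2_end_time": None,       # open-ended current version
        "scd2_is_current": 1,
        "scd2_version": (current["scd2_version"] + 1) if current else 1,
    })
    return versions


versions: list[dict] = []
t1 = datetime(2026, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2026, 2, 1, tzinfo=timezone.utc)
scd2_apply(versions, 1, {"job": "教练"}, t1)
scd2_apply(versions, 1, {"job": "店长"}, t2)  # closes v1, opens v2
```

Querying "current state" then reduces to filtering on `scd2_is_current = 1`, matching how the composite primary key `(staff_id, scd2_start_time)` keeps every historical version addressable.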
@@ -0,0 +1,51 @@
# dim_staff_ex 员工档案扩展表

> 生成时间:2026-02-23

## 表信息

| 属性 | 值 |
|------|-----|
| Schema | dwd |
| 表名 | dim_staff_ex |
| 主键 | staff_id, scd2_start_time |
| 主表 | dim_staff |
| ODS 来源 | ods.staff_info_master |
| 说明 | 员工档案维度扩展表(SCD2),包含次要/低频变更字段 |

## 字段说明

| 序号 | 字段名 | 类型 | 可空 | 主键 | 说明 |
|------|--------|------|------|------|------|
| 1 | staff_id | BIGINT | NO | PK | 员工唯一标识(映射自 ODS id) |
| 2 | avatar | TEXT | YES | | 头像 URL |
| 3 | job_num | TEXT | YES | | 工号 |
| 4 | account_status | INTEGER | YES | | 账号启用状态 |
| 5 | rank_id | INTEGER | YES | | 职级 ID |
| 6 | rank_name | TEXT | YES | | 职级名称 |
| 7 | new_rank_id | INTEGER | YES | | 新职级 ID |
| 8 | new_staff_identity | INTEGER | YES | | 新员工身份 |
| 9 | is_reserve | INTEGER | YES | | 预约标记 |
| 10 | shop_name | TEXT | YES | | 门店名称 |
| 11 | site_label | TEXT | YES | | 门店标签 |
| 12 | tenant_org_id | BIGINT | YES | | 租户组织 ID |
| 13 | system_user_id | BIGINT | YES | | 系统用户 ID |
| 14 | cashier_point_id | BIGINT | YES | | 收银点 ID |
| 15 | cashier_point_name | TEXT | YES | | 收银点名称 |
| 16 | group_id | BIGINT | YES | | 分组 ID |
| 17 | group_name | TEXT | YES | | 分组名称 |
| 18 | staff_profile_id | BIGINT | YES | | 员工档案 ID |
| 19 | auth_code | TEXT | YES | | 授权码 |
| 20 | auth_code_create | TIMESTAMPTZ | YES | | 授权码创建时间 |
| 21 | ding_talk_synced | INTEGER | YES | | 钉钉同步状态 |
| 22 | salary_grant_enabled | INTEGER | YES | | 工资发放启用 |
| 23 | entry_type | INTEGER | YES | | 入职类型 |
| 24 | entry_sign_status | INTEGER | YES | | 入职签约状态 |
| 25 | resign_sign_status | INTEGER | YES | | 离职签约状态 |
| 26 | criticism_status | INTEGER | YES | | 批评状态 |
| 27 | create_time | TIMESTAMPTZ | YES | | 创建时间 |
| 28 | user_roles | JSONB | YES | | 用户角色列表 |
| 29 | scd2_start_time | TIMESTAMPTZ | NO | PK | SCD2 版本生效时间 |
| 30 | scd2_end_time | TIMESTAMPTZ | YES | | SCD2 版本失效时间 |
| 31 | scd2_is_current | INTEGER | YES | | 当前版本标记 |
| 32 | scd2_version | INTEGER | YES | | 版本号 |
@@ -0,0 +1,70 @@
# staff_info_master 员工档案主表

> 生成时间:2026-02-23

## 表信息

| 属性 | 值 |
|------|-----|
| Schema | ods |
| 表名 | staff_info_master |
| 主键 | id |
| 数据来源 | SearchSystemStaffInfo API |
| 快照模式 | FULL_TABLE |
| 说明 | 员工档案主数据(店长、主管、教练、收银员等) |

## 字段说明

| 序号 | 字段名 | 类型 | 可空 | 说明 |
|------|--------|------|------|------|
| 1 | id | BIGINT | NO | 员工主键 ID |
| 2 | tenant_id | BIGINT | YES | 租户 ID |
| 3 | site_id | BIGINT | YES | 门店 ID |
| 4 | tenant_org_id | BIGINT | YES | 租户组织 ID |
| 5 | system_user_id | BIGINT | YES | 系统用户 ID |
| 6 | staff_name | TEXT | YES | 员工姓名 |
| 7 | alias_name | TEXT | YES | 别名 |
| 8 | mobile | TEXT | YES | 手机号 |
| 9 | avatar | TEXT | YES | 头像 URL |
| 10 | gender | INTEGER | YES | 性别(3=未知) |
| 11 | job | TEXT | YES | 职位(店长/主管/教练/收银员等) |
| 12 | job_num | TEXT | YES | 工号 |
| 13 | staff_identity | INTEGER | YES | 员工身份类型 |
| 14 | status | INTEGER | YES | 账号状态 |
| 15 | account_status | INTEGER | YES | 账号启用状态 |
| 16 | system_role_id | INTEGER | YES | 系统角色 ID |
| 17 | rank_id | INTEGER | YES | 职级 ID |
| 18 | rank_name | TEXT | YES | 职级名称 |
| 19 | new_rank_id | INTEGER | YES | 新职级 ID |
| 20 | new_staff_identity | INTEGER | YES | 新员工身份 |
| 21 | leave_status | INTEGER | YES | 在职状态(0=在职,1=离职) |
| 22 | entry_time | TIMESTAMP | YES | 入职时间 |
| 23 | resign_time | TIMESTAMP | YES | 离职时间 |
| 24 | create_time | TIMESTAMP | YES | 创建时间 |
| 25 | is_delete | INTEGER | YES | 删除标记 |
| 26 | is_reserve | INTEGER | YES | 预约标记 |
| 27 | shop_name | TEXT | YES | 门店名称 |
| 28 | site_label | TEXT | YES | 门店标签 |
| 29 | cashier_point_id | BIGINT | YES | 收银点 ID |
| 30 | cashier_point_name | TEXT | YES | 收银点名称 |
| 31 | group_id | BIGINT | YES | 分组 ID |
| 32 | group_name | TEXT | YES | 分组名称 |
| 33 | staff_profile_id | BIGINT | YES | 员工档案 ID |
| 34 | auth_code | TEXT | YES | 授权码 |
| 35 | auth_code_create | TIMESTAMP | YES | 授权码创建时间 |
| 36 | ding_talk_synced | INTEGER | YES | 钉钉同步状态 |
| 37 | salary_grant_enabled | INTEGER | YES | 工资发放启用 |
| 38 | entry_type | INTEGER | YES | 入职类型 |
| 39 | entry_sign_status | INTEGER | YES | 入职签约状态 |
| 40 | resign_sign_status | INTEGER | YES | 离职签约状态 |
| 41 | criticism_status | INTEGER | YES | 批评状态 |
| 42 | user_roles | JSONB | YES | 用户角色列表 |
| 43 | content_hash | TEXT | NO | 记录内容哈希 |
| 44 | source_file | TEXT | YES | 来源文件路径 |
| 45 | fetched_at | TIMESTAMPTZ | YES | 抓取时间 |
| 46 | payload | JSONB | NO | 原始 JSON |

## 与其他表的关系

- 员工表与助教表(`assistant_accounts_master`)是完全独立的实体
- 下游:`dwd.dim_staff`(主表)、`dwd.dim_staff_ex`(扩展表)
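The content_hash column above is described as a dedup hash over the record content. One plausible way to compute such a hash is to exclude the ETL metadata columns (which change on every fetch) and hash a canonical serialization of the business fields; the project's actual scheme is not shown here, so the field names and algorithm below are assumptions:

```python
import hashlib
import json

# ETL metadata columns change on every fetch and must not affect the hash.
_META_KEYS = {"content_hash", "source_file", "fetched_at", "payload"}


def content_hash(record: dict) -> str:
    """Stable SHA-256 over the business fields of one staff record."""
    business = {k: v for k, v in record.items() if k not in _META_KEYS}
    # sort_keys + compact separators make the serialization deterministic,
    # so the same business content always yields the same hash.
    canonical = json.dumps(
        business, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With a hash like this, a FULL_TABLE snapshot load can skip rows whose content_hash already exists, keeping only genuinely changed records.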
@@ -0,0 +1,80 @@
|
||||
# 员工档案(SearchSystemStaffInfo) → staff_info_master 字段映射
|
||||
|
||||
> 生成时间:2026-02-23
|
||||
|
||||
## 端点信息
|
||||
|
||||
| 属性 | 值 |
|
||||
|------|-----|
|
||||
| 接口路径 | `PersonnelManagement/SearchSystemStaffInfo` |
|
||||
| 请求方法 | POST |
|
||||
| ODS 对应表 | `ods.staff_info_master` |
|
||||
| JSON 数据路径 | `data.staffProfiles` |
|
||||
| 快照模式 | FULL_TABLE(全量快照) |
|
||||
|
||||
## 请求参数
|
||||
|
||||
| 参数 | 类型 | 默认值 | 说明 |
|
||||
|------|------|--------|------|
|
||||
| workStatusEnum | int | 0 | 在职状态筛选(0=全部) |
|
||||
| dingTalkSynced | int | 0 | 钉钉同步状态(0=全部) |
|
||||
| staffIdentity | int | 0 | 员工身份筛选(0=全部) |
|
||||
| rankId | int | 0 | 职级筛选(0=全部) |
|
||||
| criticismStatus | int | 0 | 批评状态(0=全部) |
|
||||
| signStatus | int | -1 | 签约状态(-1=全部) |
|
||||
## Field mapping

| JSON field | ODS column | Type conversion | Description |
|-----------|----------|----------|------|
| id | id | int→BIGINT | Staff primary-key ID |
| tenant_id | tenant_id | int→BIGINT | Tenant ID |
| site_id | site_id | int→BIGINT | Store ID |
| tenant_org_id | tenant_org_id | int→BIGINT | Tenant organization ID |
| system_user_id | system_user_id | int→BIGINT | System user ID |
| staff_name | staff_name | string→TEXT | Staff name |
| alias_name | alias_name | string→TEXT | Alias |
| mobile | mobile | string→TEXT | Mobile number |
| avatar | avatar | string→TEXT | Avatar URL |
| gender | gender | int→INTEGER | Gender (3 = unknown) |
| job | job | string→TEXT | Job title (store manager / supervisor / coach, etc.) |
| job_num | job_num | string→TEXT | Employee number |
| staff_identity | staff_identity | int→INTEGER | Staff identity type |
| status | status | int→INTEGER | Account status |
| account_status | account_status | int→INTEGER | Account enabled status |
| system_role_id | system_role_id | int→INTEGER | System role ID |
| rank_id | rank_id | int→INTEGER | Rank ID |
| rankName | rank_name | string→TEXT | Rank name (camelCase→snake_case) |
| new_rank_id | new_rank_id | int→INTEGER | New rank ID |
| new_staff_identity | new_staff_identity | int→INTEGER | New staff identity |
| leave_status | leave_status | int→INTEGER | Employment status (0 = active) |
| entry_time | entry_time | string→TIMESTAMP | Hire time |
| resign_time | resign_time | string→TIMESTAMP | Resignation time |
| create_time | create_time | string→TIMESTAMP | Creation time |
| is_delete | is_delete | int→INTEGER | Deletion flag |
| is_reserve | is_reserve | int→INTEGER | Reservation flag |
| shop_name | shop_name | string→TEXT | Store name |
| site_label | site_label | string→TEXT | Store label |
| cashierPointId | cashier_point_id | int→BIGINT | Cashier point ID (camelCase→snake_case) |
| cashierPointName | cashier_point_name | string→TEXT | Cashier point name (camelCase→snake_case) |
| groupId | group_id | int→BIGINT | Group ID (camelCase→snake_case) |
| groupName | group_name | string→TEXT | Group name (camelCase→snake_case) |
| staff_profile_id | staff_profile_id | int→BIGINT | Staff profile ID |
| auth_code | auth_code | string→TEXT | Authorization code |
| auth_code_create | auth_code_create | string→TIMESTAMP | Authorization-code creation time |
| ding_talk_synced | ding_talk_synced | int→INTEGER | DingTalk sync status |
| salary_grant_enabled | salary_grant_enabled | int→INTEGER | Salary disbursement enabled |
| entry_type | entry_type | int→INTEGER | Onboarding type |
| entry_sign_status | entry_sign_status | int→INTEGER | Onboarding contract-signing status |
| resign_sign_status | resign_sign_status | int→INTEGER | Offboarding contract-signing status |
| criticism_status | criticism_status | int→INTEGER | Criticism status |
| userRoles | user_roles | array→JSONB | User role list (camelCase→snake_case) |
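Most JSON fields map to identically named columns; only a handful of camelCase fields (rankName, cashierPointId, groupId, userRoles, ...) are renamed to snake_case. A minimal sketch of that rename rule, sufficient for the names in this mapping:

```python
import re

def camel_to_snake(name: str) -> str:
    """Insert '_' before each interior uppercase letter, then lowercase.

    Covers the simple camelCase names in this mapping; not intended to
    handle acronym runs such as 'HTTPServer'.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

# The documented renames all follow this rule:
assert camel_to_snake("rankName") == "rank_name"
assert camel_to_snake("cashierPointId") == "cashier_point_id"
assert camel_to_snake("groupName") == "group_name"
assert camel_to_snake("userRoles") == "user_roles"
# Names that are already snake_case pass through unchanged:
assert camel_to_snake("staff_profile_id") == "staff_profile_id"
```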
## ETL metadata columns

| Column | Type | Description |
|------|------|------|
| content_hash | TEXT | Hash of the record content (used for dedup) |
| source_file | TEXT | Source file path |
| fetched_at | TIMESTAMPTZ | Fetch timestamp |
| payload | JSONB | Raw JSON preserved in full |
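A dedup hash like `content_hash` has to be computed over the business fields only, with the per-run ETL columns excluded, otherwise every snapshot would hash differently. The sketch below is illustrative; the pipeline's actual algorithm and field set are not specified in this doc.

```python
import hashlib
import json

# Columns written by the ETL itself; they vary per run and must not
# influence the hash.
ETL_META = {"content_hash", "source_file", "fetched_at", "payload"}

def content_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON rendering of the business fields."""
    business = {k: v for k, v in record.items() if k not in ETL_META}
    canonical = json.dumps(
        business, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"id": 1, "staff_name": "张三", "fetched_at": "2026-02-23T10:00:00Z"}
b = {"fetched_at": "2026-02-24T10:00:00Z", "staff_name": "张三", "id": 1}
assert content_hash(a) == content_hash(b)   # meta columns and key order ignored
assert content_hash(a) != content_hash({"id": 1, "staff_name": "李四"})
```

Sorting keys and fixing the separators makes the JSON rendering canonical, so two fetches of the same record always yield the same hash.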
@@ -1,230 +1,38 @@
# BD_Manual — Feiqiu ETL database manual

# Feiqiu ETL database manual

> This document is the navigation index for the `docs/bd_manual/` directory, covering table-level docs, field-mapping docs, and change records for the four data layers: ODS, DWD, DWS, and ETL_Admin.
> Module-specific table-level docs, field mappings, and extension-table notes.
> DDL baselines live in the project-level `docs/database/ddl/`; change records have been archived under `_archived/`.
## Directory layout

```
docs/bd_manual/
├── README.md                  ← this file (root index)
├── ddl_compare_results.md     ← DDL comparison results summary
├── ODS/                       ← operational data store layer (ods schema)
│   ├── main/                  ← table-level docs
│   ├── mappings/              ← API JSON → ODS field-mapping docs
│   └── changes/               ← change records
├── DWD/                       ← detail data layer (dwd schema)
│   ├── main/                  ← table-level docs
│   ├── Ex/                    ← extension-table docs (SCD2 dimension extensions, etc.)
│   └── changes/               ← change records
├── DWS/                       ← data service layer (dws schema)
│   ├── main/                  ← table-level docs
│   └── changes/               ← change records
└── ETL_Admin/                 ← ETL admin layer (meta schema)
    ├── main/                  ← table-level docs
    └── changes/               ← change records
database/
├── ODS/
│   ├── main/      - ODS table-level docs (BD_manual_*.md)
│   └── mappings/  - API JSON → ODS field mappings (mapping_*.md)
├── DWD/
│   ├── main/      - DWD main-table docs
│   └── Ex/        - DWD extension-table docs
├── DWS/
│   └── main/      - DWS summary-table docs
├── ETL_Admin/
│   └── main/      - meta schema table docs
└── _archived/     - obsolete change records, DDL comparison reports, docs for dropped tables
```
## Document naming conventions

## Document types

| Document type | Naming format | Example |
|----------|----------|------|
| Table-level doc | `BD_manual_{table}.md` | `BD_manual_member_profiles.md` |
| Mapping doc | `mapping_{API endpoint}_{ODS table}.md` | `mapping_GetTenantMemberList_member_profiles.md` |
| Change record | `{YYYYMMDD}_{summary}.md` or `{YYYY-MM-DD}_{summary}.md` | `2026-02-13_ddl_sync_ods.md` |

| Type | Naming rule | Description |
|------|---------|------|
| Table-level doc | `BD_manual_{table}.md` | Field descriptions, primary keys, business meaning |
| Extension-table doc | `BD_manual_{table}_ex.md` | SCD2 extension fields, overflow fields |
| Field mapping | `mapping_{API endpoint}_{ODS table}.md` | Mapping of API JSON fields → ODS columns |
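The naming rules above are mechanical enough to check with a few regular expressions. The patterns below are a sketch inferred from the tables; the authoritative checks live in `scripts/validate_bd_manual.py`, whose exact rules may differ.

```python
import re

# One pattern per documented naming rule (inferred, not authoritative).
# ex_doc is listed before table_doc because its pattern is a special
# case of the table_doc pattern.
PATTERNS = {
    "ex_doc":    re.compile(r"^BD_manual_[a-z0-9_]+_ex\.md$"),
    "table_doc": re.compile(r"^BD_manual_[a-z0-9_]+\.md$"),
    "mapping":   re.compile(r"^mapping_[A-Za-z0-9]+_[a-z0-9_]+\.md$"),
    "change":    re.compile(r"^(\d{8}|\d{4}-\d{2}-\d{2})_[\w-]+\.md$"),
}

def classify(filename):
    """Return the first matching doc type, or None if non-conforming."""
    for kind, pattern in PATTERNS.items():
        if pattern.match(filename):
            return kind
    return None

assert classify("BD_manual_member_profiles.md") == "table_doc"
assert classify("BD_manual_dim_member_ex.md") == "ex_doc"
assert classify("mapping_GetTenantMemberList_member_profiles.md") == "mapping"
assert classify("2026-02-13_ddl_sync_ods.md") == "change"
assert classify("notes.txt") is None
```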
## ODS layer document inventory (ods)

## Relationship to project-level docs

### Table-level docs (`ODS/main/`, 23 files)

| # | File | Table |
|------|--------|--------|
| 1 | `BD_manual_assistant_accounts_master.md` | assistant_accounts_master |
| 2 | `BD_manual_assistant_cancellation_records.md` | assistant_cancellation_records |
| 3 | `BD_manual_assistant_service_records.md` | assistant_service_records |
| 4 | `BD_manual_goods_stock_movements.md` | goods_stock_movements |
| 5 | `BD_manual_goods_stock_summary.md` | goods_stock_summary |
| 6 | `BD_manual_group_buy_packages.md` | group_buy_packages |
| 7 | `BD_manual_group_buy_redemption_records.md` | group_buy_redemption_records |
| 8 | `BD_manual_member_balance_changes.md` | member_balance_changes |
| 9 | `BD_manual_member_profiles.md` | member_profiles |
| 10 | `BD_manual_member_stored_value_cards.md` | member_stored_value_cards |
| 11 | `BD_manual_payment_transactions.md` | payment_transactions |
| 12 | `BD_manual_platform_coupon_redemption_records.md` | platform_coupon_redemption_records |
| 13 | `BD_manual_recharge_settlements.md` | recharge_settlements |
| 14 | `BD_manual_refund_transactions.md` | refund_transactions |
| 15 | `BD_manual_settlement_records.md` | settlement_records |
| 16 | `BD_manual_settlement_ticket_details.md` | settlement_ticket_details |
| 17 | `BD_manual_site_tables_master.md` | site_tables_master |
| 18 | `BD_manual_stock_goods_category_tree.md` | stock_goods_category_tree |
| 19 | `BD_manual_store_goods_master.md` | store_goods_master |
| 20 | `BD_manual_store_goods_sales_records.md` | store_goods_sales_records |
| 21 | `BD_manual_table_fee_discount_records.md` | table_fee_discount_records |
| 22 | `BD_manual_table_fee_transactions.md` | table_fee_transactions |
| 23 | `BD_manual_tenant_goods_master.md` | tenant_goods_master |
### API→ODS field-mapping docs (`ODS/mappings/`, 23 files)

| # | File | API endpoint → ODS table |
|------|--------|-------------------|
| 1 | `mapping_GetAbolitionAssistant_assistant_cancellation_records.md` | GetAbolitionAssistant → assistant_cancellation_records |
| 2 | `mapping_GetAllOrderSettleList_settlement_records.md` | GetAllOrderSettleList → settlement_records |
| 3 | `mapping_GetGoodsInventoryList_store_goods_master.md` | GetGoodsInventoryList → store_goods_master |
| 4 | `mapping_GetGoodsSalesList_store_goods_sales_records.md` | GetGoodsSalesList → store_goods_sales_records |
| 5 | `mapping_GetGoodsStockReport_goods_stock_summary.md` | GetGoodsStockReport → goods_stock_summary |
| 6 | `mapping_GetMemberCardBalanceChange_member_balance_changes.md` | GetMemberCardBalanceChange → member_balance_changes |
| 7 | `mapping_GetOfflineCouponConsumePageList_platform_coupon_redemption_records.md` | GetOfflineCouponConsumePageList → platform_coupon_redemption_records |
| 8 | `mapping_GetOrderAssistantDetails_assistant_service_records.md` | GetOrderAssistantDetails → assistant_service_records |
| 9 | `mapping_GetOrderSettleTicketNew_settlement_ticket_details.md` | GetOrderSettleTicketNew → settlement_ticket_details |
| 10 | `mapping_GetPayLogListPage_payment_transactions.md` | GetPayLogListPage → payment_transactions |
| 11 | `mapping_GetRechargeSettleList_recharge_settlements.md` | GetRechargeSettleList → recharge_settlements |
| 12 | `mapping_GetRefundPayLogList_refund_transactions.md` | GetRefundPayLogList → refund_transactions |
| 13 | `mapping_GetSiteTableOrderDetails_table_fee_transactions.md` | GetSiteTableOrderDetails → table_fee_transactions |
| 14 | `mapping_GetSiteTables_site_tables_master.md` | GetSiteTables → site_tables_master |
| 15 | `mapping_GetSiteTableUseDetails_group_buy_redemption_records.md` | GetSiteTableUseDetails → group_buy_redemption_records |
| 16 | `mapping_GetTaiFeeAdjustList_table_fee_discount_records.md` | GetTaiFeeAdjustList → table_fee_discount_records |
| 17 | `mapping_GetTenantMemberCardList_member_stored_value_cards.md` | GetTenantMemberCardList → member_stored_value_cards |
| 18 | `mapping_GetTenantMemberList_member_profiles.md` | GetTenantMemberList → member_profiles |
| 19 | `mapping_QueryGoodsOutboundReceipt_goods_stock_movements.md` | QueryGoodsOutboundReceipt → goods_stock_movements |
| 20 | `mapping_QueryPackageCouponList_group_buy_packages.md` | QueryPackageCouponList → group_buy_packages |
| 21 | `mapping_QueryPrimarySecondaryCategory_stock_goods_category_tree.md` | QueryPrimarySecondaryCategory → stock_goods_category_tree |
| 22 | `mapping_QueryTenantGoods_tenant_goods_master.md` | QueryTenantGoods → tenant_goods_master |
| 23 | `mapping_SearchAssistantInfo_assistant_accounts_master.md` | SearchAssistantInfo → assistant_accounts_master |

### Change records (`ODS/changes/`)

| File | Description |
|--------|------|
| `2026-02-13_ddl_sync_ods.md` | DDL comparison sync (ODS layer) |
| `20260213_align_ods_with_api.md` | Align ODS table structures with the API |
| `20260214_drop_ods_option_name_able_site_transfer.md` | Drop redundant ODS columns/tables |
| `20260214_drop_ods_settlelist.md` | Drop the ODS settle_list table |
## DWD layer document inventory (dwd)

### Table-level docs (`DWD/main/`, 22 files)

| # | File | Table |
|------|--------|--------|
| 1 | `BD_manual_dwd.md` | dwd (layer overview) |
| 2 | `BD_manual_dim_assistant.md` | dim_assistant |
| 3 | `BD_manual_dim_goods_category.md` | dim_goods_category |
| 4 | `BD_manual_dim_groupbuy_package.md` | dim_groupbuy_package |
| 5 | `BD_manual_dim_member.md` | dim_member |
| 6 | `BD_manual_dim_member_card_account.md` | dim_member_card_account |
| 7 | `BD_manual_dim_site.md` | dim_site |
| 8 | `BD_manual_dim_store_goods.md` | dim_store_goods |
| 9 | `BD_manual_dim_table.md` | dim_table |
| 10 | `BD_manual_dim_tenant_goods.md` | dim_tenant_goods |
| 11 | `BD_manual_dwd_assistant_service_log.md` | dwd_assistant_service_log |
| 12 | `BD_manual_dwd_assistant_trash_event.md` | dwd_assistant_trash_event |
| 13 | `BD_manual_dwd_groupbuy_redemption.md` | dwd_groupbuy_redemption |
| 14 | `BD_manual_dwd_member_balance_change.md` | dwd_member_balance_change |
| 15 | `BD_manual_dwd_payment.md` | dwd_payment |
| 16 | `BD_manual_dwd_platform_coupon_redemption.md` | dwd_platform_coupon_redemption |
| 17 | `BD_manual_dwd_recharge_order.md` | dwd_recharge_order |
| 18 | `BD_manual_dwd_refund.md` | dwd_refund |
| 19 | `BD_manual_dwd_settlement_head.md` | dwd_settlement_head |
| 20 | `BD_manual_dwd_store_goods_sale.md` | dwd_store_goods_sale |
| 21 | `BD_manual_dwd_table_fee_adjust.md` | dwd_table_fee_adjust |
| 22 | `BD_manual_dwd_table_fee_log.md` | dwd_table_fee_log |

### Extension-table docs (`DWD/Ex/`, 19 files)

| # | File | Extension table |
|------|--------|------------|
| 1 | `BD_manual_dim_assistant_ex.md` | dim_assistant_ex |
| 2 | `BD_manual_dim_groupbuy_package_ex.md` | dim_groupbuy_package_ex |
| 3 | `BD_manual_dim_member_card_account_ex.md` | dim_member_card_account_ex |
| 4 | `BD_manual_dim_member_ex.md` | dim_member_ex |
| 5 | `BD_manual_dim_site_ex.md` | dim_site_ex |
| 6 | `BD_manual_dim_store_goods_ex.md` | dim_store_goods_ex |
| 7 | `BD_manual_dim_table_ex.md` | dim_table_ex |
| 8 | `BD_manual_dim_tenant_goods_ex.md` | dim_tenant_goods_ex |
| 9 | `BD_manual_dwd_assistant_service_log_ex.md` | dwd_assistant_service_log_ex |
| 10 | `BD_manual_dwd_assistant_trash_event_ex.md` | dwd_assistant_trash_event_ex |
| 11 | `BD_manual_dwd_groupbuy_redemption_ex.md` | dwd_groupbuy_redemption_ex |
| 12 | `BD_manual_dwd_member_balance_change_ex.md` | dwd_member_balance_change_ex |
| 13 | `BD_manual_dwd_platform_coupon_redemption_ex.md` | dwd_platform_coupon_redemption_ex |
| 14 | `BD_manual_dwd_recharge_order_ex.md` | dwd_recharge_order_ex |
| 15 | `BD_manual_dwd_refund_ex.md` | dwd_refund_ex |
| 16 | `BD_manual_dwd_settlement_head_ex.md` | dwd_settlement_head_ex |
| 17 | `BD_manual_dwd_store_goods_sale_ex.md` | dwd_store_goods_sale_ex |
| 18 | `BD_manual_dwd_table_fee_adjust_ex.md` | dwd_table_fee_adjust_ex |
| 19 | `BD_manual_dwd_table_fee_log_ex.md` | dwd_table_fee_log_ex |

### Change records (`DWD/changes/`)

| File | Description |
|--------|------|
| `2026-02-13_ddl_sync_dwd.md` | DDL comparison sync (DWD layer) |
| `20260214_drop_dwd_settle_list.md` | Drop the DWD settle_list table |
## DWS layer document inventory (dws)

### Table-level docs (`DWS/main/`, 29 files)

| # | File | Table |
|------|--------|--------|
| 1 | `BD_manual_cfg_area_category.md` | cfg_area_category |
| 2 | `BD_manual_cfg_assistant_level_price.md` | cfg_assistant_level_price |
| 3 | `BD_manual_cfg_bonus_rules.md` | cfg_bonus_rules |
| 4 | `BD_manual_cfg_index_parameters.md` | cfg_index_parameters |
| 5 | `BD_manual_cfg_performance_tier.md` | cfg_performance_tier |
| 6 | `BD_manual_cfg_skill_type.md` | cfg_skill_type |
| 7 | `BD_manual_dws_assistant_customer_stats.md` | dws_assistant_customer_stats |
| 8 | `BD_manual_dws_assistant_daily_detail.md` | dws_assistant_daily_detail |
| 9 | `BD_manual_dws_assistant_finance_analysis.md` | dws_assistant_finance_analysis |
| 10 | `BD_manual_dws_assistant_monthly_summary.md` | dws_assistant_monthly_summary |
| 11 | `BD_manual_dws_assistant_recharge_commission.md` | dws_assistant_recharge_commission |
| 12 | `BD_manual_dws_assistant_salary_calc.md` | dws_assistant_salary_calc |
| 13 | `BD_manual_dws_finance_daily_summary.md` | dws_finance_daily_summary |
| 14 | `BD_manual_dws_finance_discount_detail.md` | dws_finance_discount_detail |
| 15 | `BD_manual_dws_finance_expense_summary.md` | dws_finance_expense_summary |
| 16 | `BD_manual_dws_finance_income_structure.md` | dws_finance_income_structure |
| 17 | `BD_manual_dws_finance_recharge_summary.md` | dws_finance_recharge_summary |
| 18 | `BD_manual_dws_index_percentile_history.md` | dws_index_percentile_history |
| 19 | `BD_manual_dws_member_assistant_intimacy.md` | dws_member_assistant_intimacy |
| 20 | `BD_manual_dws_member_assistant_relation_index.md` | dws_member_assistant_relation_index |
| 21 | `BD_manual_dws_member_consumption_summary.md` | dws_member_consumption_summary |
| 22 | `BD_manual_dws_member_newconv_index.md` | dws_member_newconv_index |
| 23 | `BD_manual_dws_member_visit_detail.md` | dws_member_visit_detail |
| 24 | `BD_manual_dws_member_winback_index.md` | dws_member_winback_index |
| 25 | `BD_manual_dws_ml_manual_order_alloc.md` | dws_ml_manual_order_alloc |
| 26 | `BD_manual_dws_ml_manual_order_source.md` | dws_ml_manual_order_source |
| 27 | `BD_manual_dws_order_summary.md` | dws_order_summary |
| 28 | `BD_manual_dws_platform_settlement.md` | dws_platform_settlement |
| 29 | `BD_manual_v_member_recall_priority.md` | v_member_recall_priority |

### Change records (`DWS/changes/`)

| File | Description |
|--------|------|
| `2026-02-13_ddl_sync_dws.md` | DDL comparison sync (DWS layer) |
## ETL_Admin layer document inventory (etl_admin)

### Table-level docs (`ETL_Admin/main/`, 3 files)

| # | File | Table |
|------|--------|--------|
| 1 | `BD_manual_etl_cursor.md` | etl_cursor |
| 2 | `BD_manual_etl_run.md` | etl_run |
| 3 | `BD_manual_etl_task.md` | etl_task |

### Change records (`ETL_Admin/changes/`)

No change records yet.
## Related resources

| Resource | Path | Description |
| Content | Location | Description |
|------|------|------|
| ODS data dictionary | `docs/dictionary/ods_tables_dictionary.md` | Overview of all ODS-layer tables |
| DDL comparison results | `docs/bd_manual/ddl_compare_results.md` | Report comparing DDL files with the actual database state |
| DDL file (ODS) | `database/schema_ODS_doc.sql` | ODS-layer table definitions |
| DDL file (DWD) | `database/schema_dwd_doc.sql` | DWD-layer table definitions |
| DDL file (DWS) | `database/schema_dws.sql` | DWS-layer table definitions |
| DDL file (ETL_Admin) | `database/schema_etl_admin.sql` | ETL_Admin-layer table definitions |
| API endpoint docs | `docs/api-reference/endpoints/` | Upstream SaaS API endpoint documentation |
| DDL comparison script | `scripts/compare_ddl_db.py` | Tool comparing DDL with the actual database state |
| Doc validation script | `scripts/validate_bd_manual.py` | BD_Manual coverage and format validation |
| DDL baseline | `docs/database/ddl/` | Auto-exported from the database, one file per schema |
| ODS→DWD field mapping | `docs/database/BD_Manual_*.md` | Cross-layer mapping (ODS tables → DWD tables) |
| Table-level field docs | `*/main/BD_manual_*.md` in this directory | Per-table field details |
| API→ODS field mapping | `ODS/mappings/` in this directory | API JSON → ODS column mapping |
Some files were not shown because too many files have changed in this diff.