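feat: P1-P3 full-stack integration (database foundation + DWS extension + mini-program auth + engineering tooling)

## P1 Database foundation
- zqyy_app: create the auth/biz schemas and the FDW connection to etl_feiqiu
- etl_feiqiu: create the app schema RLS views and the goods stock alert table
- Clean up leftover assistant_abolish data

## P2 ETL/DWS extension
- Add the DWS assistant order contribution table (dws.assistant_order_contribution)
- Add the assistant_order_contribution_task task and its RLS view
- Add recharge fields to member_consumption and penalty fields to assistant_daily
- Update the ODS/DWD/DWS task docs and the business-rule docs
- Update core modules including consistency_checker, flow_runner, and task_registry

## P3 Mini-program auth system
- Add the xcx_auth route/schema (WeChat login + JWT)
- Add the wechat/role/matching/application service layer
- zqyy_app auth table migration + role/permission seed data
- auth/dependencies.py supports mini-program JWT auth

## Docs and audit
- Add the DOCUMENTATION-MAP documentation index
- Add 7 BD_Manual database change documents
- Update the DDL baseline snapshot (etl_feiqiu 6 schemas + zqyy_app auth)
- Add the full-stack integration audit record; update the deployment checklist
- Add the BACKLOG roadmap and the FDW→Core migration plan

## Kiro engineering
- Add 5 Specs (P1/P2/P3/full-stack integration/core business)
- Add audit automation scripts (agent_on_stop/build_audit_context/compliance_prescan)
- Add 6 Hooks (compliance check/session log/commit audit, etc.)
- Add the doc-map steering file

## Operations and tests
- Add ops scripts: migration verification/API health check/ETL monitoring/integration report
- Add property-based tests: test_dws_contribution / test_auth_system
- Clean up expired export report files
- Update the .gitignore exclusion rules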
@@ -5,6 +5,26 @@
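
---

## 2026-02-26

### Documentation audit: full correction of task counts, metric definitions, and SCD2 rules

- **Summary**: Systematically audited and corrected 9 documents under `docs/`, using `task_registry.py` as the single source of truth; fixed 15 inconsistencies in task counts, domain grouping, table names, toolchain references, and related items
- **Corrections**:
  - `docs/README.md`: removed references to the nonexistent `audit/` and `reports/` directories
  - `docs/etl_tasks/README.md`: ODS 22→23, DWS 13→17; added `ODS_STAFF_INFO`/`DWS_ASSISTANT_ORDER_CONTRIBUTION`/inventory summary domain/SPI
  - `docs/etl_tasks/ods_tasks.md`: total task count 22→23
  - `docs/etl_tasks/dws_tasks.md`: total task count 14→17, domain groups four→five; added the inventory summary domain overview table
  - `docs/etl_tasks/base_task_mechanism.md`: DWS 13→17, INDEX 4→5
  - `docs/operations/environment_setup.md`: removed a duplicated paragraph
  - `docs/operations/troubleshooting.md`: `pip install -r requirements.txt` → `uv sync`
  - `docs/business-rules/dws_metrics.md`: fully rewritten; removed nonexistent tables, corrected table names, added the inventory summary domain and index algorithm sections
  - `docs/business-rules/scd2_rules.md`: fully rewritten; filled in the actual tracked fields of 9 dimension tables, added the `dim_staff` dimension, documented the change detection mechanism
- **Scope of impact**: documentation (the entire `docs/` directory; no code changes)
- **Risk**: very low (documentation-only corrections)

---

## 2026-02-19

### Full documentation refresh: schema names, tech stack, and task counts synced to the current project state
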
@@ -6,8 +6,6 @@
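|-------------|------|
| [`architecture/`](architecture/README.md) | Architecture design docs: overall system architecture, data flow (ODS→DWD→DWS), module interactions |
| [`api-reference/`](api-reference/) | API reference (standardized docs for 25 endpoints + JSON samples) |
| [`audit/`](audit/README.md) | Audit directory (historical archive; new records have moved to the root `docs/audit/`) |
| [`audit/repo/`](audit/repo/) | Repository audit reports (auto-generated by `scripts/audit/`: file inventory, call flow, doc alignment) |
| [`business-rules/`](business-rules/README.md) | Business rule docs: index algorithms, DWS metric definitions, SCD2 handling rules, and other business logic |
| [`database/`](database/README.md) | Unified database docs directory: layer overview + table-level docs for ODS/DWD/DWS/ETL_Admin |
| [`database/overview/`](database/overview/) | Layer overview / quick-reference index (table list, primary keys, row counts, business-domain classification) |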
@@ -17,8 +15,6 @@
| [`database/ETL_Admin/`](database/ETL_Admin/) | ETL admin-layer table manual (etl_cursor/etl_run/etl_task) |
| [`etl_tasks/`](etl_tasks/README.md) | ETL task docs (ODS/DWD/DWS/index task descriptions and mechanisms) |
| [`operations/`](operations/README.md) | Operations docs: environment setup guide, scheduling configuration, troubleshooting manual |
| [`reports/`](reports/) | Analysis reports (data quality, consistency checks, and similar outputs) |
| [`requirements/`](requirements/) | Requirements docs (functional requirements, metric-definition supplements, index PRD, etc.) |
| [`CHANGELOG.md`](CHANGELOG.md) | Project-level change history (date, change summary, scope of impact) |

## Maintenance conventions

@@ -3,7 +3,7 @@
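This document defines the business definitions, calculation rules, and data sources of the summary metrics in the `dws` schema.
All metrics are aggregated from the DWD detail layer.

> **Status**: skeleton document; the concrete formulas and field mappings for each section are still to be added.
> For each task's detailed implementation (source tables, output fields, core logic), see [dws_tasks.md](../etl_tasks/dws_tasks.md).

---
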
@@ -11,39 +11,51 @@
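
### 1.1 Assistant daily report (dws_assistant_daily_detail)

<!-- TODO: define the daily assistant performance metrics, including order count, service duration, revenue, etc. -->

- Target table: `dws.dws_assistant_daily_detail`
- Data source: DWD order fact table, assistant dimension table
- Data source: `dwd_assistant_service_log`, `dwd_assistant_trash_event`, `dim_assistant` (SCD2)
- Grain: store × assistant × date
- Core metrics: *(to be defined)*
- Core metrics: service count (total/base course/bonus course/private-room course), billed seconds and hours, ledger amount, distinct customer and table counts, void statistics
- Course type classification: map `skill_id` → `BASE`/`BONUS`/`ROOM` via `cfg_skill_type`
- Assistant level: SCD2 as-of lookup, taking the level version in effect on the statistics date

### 1.2 Assistant monthly report (dws_assistant_monthly_summary)

<!-- TODO: monthly assistant performance summary metrics -->

- Target table: `dws.dws_assistant_monthly_summary`
- Data source: aggregated from the assistant daily report
- Data source: aggregation of `dws_assistant_daily_detail` + `dwd_assistant_service_log` (monthly deduplication) + `dim_assistant` + `cfg_performance_tier`
- Grain: store × assistant × year-month
- Core metrics: *(to be defined)*
- Core metrics: monthly cumulative service count/duration/amount, effective performance hours (`total_hours - trashed_hours`), performance tier matching, ranking (ties taken into account)
- New-hire rule: an assistant whose hire date falls after the 1st of the month counts as a new hire; tier matching then uses the daily average scaled to 30 days
- Monthly distinct customers/tables: deduplicated directly from DWD, because summing daily distinct counts would distort the figures

### 1.3 Assistant customer statistics (dws_assistant_customer_stats)

<!-- TODO: statistics on the customers served by each assistant -->

- Target table: `dws.dws_assistant_customer_stats`
- Data source: DWD order fact table, member dimension table
- Data source: `dwd_assistant_service_log`, `dim_member`, `dim_assistant`
- Grain: store × assistant × member
- Core metrics: *(to be defined)*
- Core metrics: lifetime totals (first/most recent service date, cumulative count/duration/amount), 6 rolling windows (7/10/15/30/60/90 days), activity status
- Walk-in exclusion: rows whose `member_id` is 0 or None are excluded from the statistics
- HAVING filter: keep only assistant-customer pairs with a service record in the last 90 days

### 1.4 Assistant financial analysis (dws_assistant_finance_analysis)

<!-- TODO: per-assistant financial analysis metrics -->

- Target table: `dws.dws_assistant_finance_analysis`
- Data source: DWD payment/refund fact tables
- Data source: `dwd_assistant_service_log`, `cfg_skill_type`, `dws_assistant_salary_calc`, `dws_assistant_daily_detail`
- Grain: store × assistant × date
- Core metrics: *(to be defined)*
- Core metrics: daily revenue (total/base course/bonus course/private-room course), average daily cost (`gross_salary / work_days`), gross profit and gross margin
- Depends on: the outputs of `DWS_ASSISTANT_SALARY` and `DWS_ASSISTANT_DAILY`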
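
As a rough illustration of the 1.4 metrics, the sketch below derives the average daily cost from `gross_salary / work_days` as stated above. The gross profit and gross margin formulas (revenue minus daily cost, and profit divided by revenue) are assumptions, since this document names those metrics without defining them; `daily_finance` and its dictionary keys are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def daily_finance(revenue: Decimal, gross_salary: Decimal, work_days: int) -> dict:
    """Hypothetical helper deriving the 1.4 cost and profit metrics for one day."""
    # Average daily cost as stated in the text: gross_salary / work_days.
    if work_days > 0:
        daily_cost = (gross_salary / work_days).quantize(CENT, rounding=ROUND_HALF_UP)
    else:
        daily_cost = Decimal("0.00")
    # Assumption: gross profit is revenue minus the average daily cost.
    gross_profit = revenue - daily_cost
    # Assumption: gross margin is profit over revenue; left as None when revenue is zero.
    gross_margin = None
    if revenue != 0:
        gross_margin = (gross_profit / revenue).quantize(
            Decimal("0.0001"), rounding=ROUND_HALF_UP
        )
    return {
        "daily_cost": daily_cost,
        "gross_profit": gross_profit,
        "gross_margin": gross_margin,
    }


result = daily_finance(Decimal("500.00"), Decimal("6000.00"), 30)
```

`Decimal` is used instead of `float` so that amounts stay exact to the cent, in line with the `NUMERIC(12,2)` convention used for amount fields elsewhere in this document.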
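
### 1.5 Assistant order revenue contribution (dws_assistant_order_contribution)

- Target table: `dws.dws_assistant_order_contribution`
- Data source: `dwd_settlement_head`, `dwd_table_fee_log`, `dwd_assistant_service_log`
- Grain: store × assistant × date
- Core metrics (four statistics):
  - `order_gross_revenue`: gross order revenue (table fee + drinks and food + all assistant service fees)
  - `order_net_revenue`: net order revenue (gross revenue - all assistant service commissions)
  - `time_weighted_revenue`: time-weighted contribution revenue (allocated by share of service duration)
  - `time_weighted_net_revenue`: time-weighted net contribution (time-weighted contribution - the assistant's own service commission)
- Special handling for overtime-rest (超休) and tip (打赏) sessions: assistants with `course_type = BONUS` are excluded from order-level allocation
- Depends on: `DWD_LOAD_FROM_ODS`

---
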
@@ -51,21 +63,17 @@

### 2.1 Assistant salary (dws_assistant_salary_calc)

<!-- TODO: salary calculation rules, commission rates, settlement period -->

- Target table: `dws.dws_assistant_salary_calc`
- Data source: assistant daily/monthly reports, recharge commission
- Grain: store × assistant × settlement period
- Core metrics: *(to be defined)*

### 2.2 Recharge commission (dws_assistant_recharge_commission)

<!-- TODO: recharge commission calculation rules -->

- Target table: `dws.dws_assistant_recharge_commission`
- Data source: DWD recharge fact table
- Grain: store × assistant × date
- Core metrics: *(to be defined)*
- Data source: `dws_assistant_monthly_summary`, `dws_assistant_recharge_commission`, `cfg_performance_tier`, `cfg_assistant_level_price`, `cfg_bonus_rules`
- Grain: store × assistant × settlement month
- Core formulas:
  - Gross salary = course income + total bonuses
  - Base course income = `base_hours × (base_course_price - base_deduction)`
  - Bonus course income = `bonus_hours × bonus_course_price × (1 - bonus_deduction_ratio)`
  - Private-room course income = `room_hours × (room_course_price - base_deduction)`
  - Total bonuses = sprint bonus + Top 3 ranking bonus + recharge commission + other bonuses
- Level pricing: SCD2, taking the value that was in effect for the settlement month
- Run schedule: by default the task runs only during the first 5 days of the month
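
The salary formulas above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes that "course income" is the sum of the base, bonus, and private-room course incomes, and the `SalaryInput` container and `gross_salary` function are hypothetical names. The variable names inside the formulas are taken from the text.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class SalaryInput:
    """Hypothetical container for one assistant's settlement-month inputs."""
    base_hours: Decimal
    bonus_hours: Decimal
    room_hours: Decimal
    base_course_price: Decimal
    bonus_course_price: Decimal
    room_course_price: Decimal
    base_deduction: Decimal          # per-hour deduction for base and room courses
    bonus_deduction_ratio: Decimal   # fraction between 0 and 1
    total_bonuses: Decimal = Decimal("0")  # sprint + Top 3 + recharge commission + other


def gross_salary(s: SalaryInput) -> Decimal:
    """Gross salary = course income + total bonuses."""
    base_income = s.base_hours * (s.base_course_price - s.base_deduction)
    bonus_income = s.bonus_hours * s.bonus_course_price * (1 - s.bonus_deduction_ratio)
    room_income = s.room_hours * (s.room_course_price - s.base_deduction)
    # Assumption: course income is the sum of the three course-type incomes.
    return base_income + bonus_income + room_income + s.total_bonuses


example = SalaryInput(
    base_hours=Decimal("100"),
    bonus_hours=Decimal("10"),
    room_hours=Decimal("5"),
    base_course_price=Decimal("98"),
    bonus_course_price=Decimal("190"),
    room_course_price=Decimal("138"),
    base_deduction=Decimal("18"),
    bonus_deduction_ratio=Decimal("0.4"),
    total_bonuses=Decimal("300"),
)
total = gross_salary(example)
```

The prices and hours in `example` are made-up figures chosen only to make the arithmetic easy to follow.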

---

@@ -73,57 +81,33 @@

### 3.1 Daily finance summary (dws_finance_daily_summary)

<!-- TODO: daily finance summary definitions, including revenue, expenses, profit, etc. -->

- Target table: `dws.dws_finance_daily_summary`
- Data source: DWD payment/refund/order fact tables
- Data source: `dwd_settlement_head`, `dwd_groupbuy_redemption`, `dwd_recharge_order`, `dwd_member_balance_change`
- Grain: store × date
- Core metrics: *(to be defined)*
- Core metrics: gross amount (list price), total discounts (group-buy/member/gift card/manual/rounding), recognized revenue, cash inflow/outflow/net change, card consumption, recharge statistics (first/repeat recharge), order statistics
- Recognized revenue = gross amount - total discounts
- Amount fields are uniformly `NUMERIC(12,2)`, denominated in Chinese yuan (CNY)

### 3.2 Income structure (dws_finance_income_structure)

<!-- TODO: definitions for classifying income by source/type -->

- Target table: `dws.dws_finance_income_structure`
- Data source: DWD payment fact table
- Grain: store × date × income type
- Core metrics: *(to be defined)*
- Data source: `dwd_settlement_head`, `dwd_table_fee_log`, `dwd_assistant_service_log`, `dim_table`, `cfg_area_category`
- Grain: store × date × structure type × category code
- Two analysis dimensions: by income type (`INCOME_TYPE`: table fee/goods/assistant base course/assistant bonus course) and by area (`AREA`: mapped via `cfg_area_category`)

### 3.3 Discount detail (dws_finance_discount_detail)

<!-- TODO: discount/promotion statistics definitions -->

- Target table: `dws.dws_finance_discount_detail`
- Data source: DWD order fact table
- Grain: store × date
- Core metrics: *(to be defined)*
- Data source: `dwd_settlement_head`, `dwd_groupbuy_redemption`, `dwd_member_balance_change`
- Grain: store × date × discount type
- Discount types: `GROUPBUY`/`VIP`/`ROUNDING`/`GIFT_CARD_TABLE`/`GIFT_CARD_DRINK`/`GIFT_CARD_COUPON`/`BIG_CUSTOMER`/`OTHER`

### 3.4 Recharge summary (dws_finance_recharge_summary)

<!-- TODO: summary definitions for recharge amount, count, etc. -->

- Target table: `dws.dws_finance_recharge_summary`
- Data source: DWD recharge fact table
- Data source: `dwd_recharge_order`, `dim_member_card_account`
- Grain: store × date
- Core metrics: *(to be defined)*

### 3.5 Expense summary (dws_finance_expense_summary)

<!-- TODO: expense summary definitions by category -->

- Target table: `dws.dws_finance_expense_summary`
- Data source: DWD expense fact table
- Grain: store × date
- Core metrics: *(to be defined)*

### 3.6 Platform settlement (dws_platform_settlement)

<!-- TODO: settlement definitions for third-party platforms (group-buy, etc.) -->

- Target table: `dws.dws_platform_settlement`
- Data source: DWD group-buy/payment fact tables
- Grain: store × date
- Core metrics: *(to be defined)*
- Core metrics: recharge count/total amount (cash + gift), first/repeat recharge split, distinct member count, store-wide card balance snapshot

---

@@ -131,21 +115,19 @@
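
### 4.1 Member consumption summary (dws_member_consumption_summary)

<!-- TODO: member consumption behavior summary definitions -->

- Target table: `dws.dws_member_consumption_summary`
- Data source: DWD order/payment fact tables, member dimension table
- Data source: `dwd_settlement_head`, `dim_member` (SCD2), `dim_member_card_account` (SCD2)
- Grain: store × member
- Core metrics: *(to be defined)*
- Core metrics: lifetime consumption, visit count and consumption amount over 6 rolling windows (7/10/15/30/60/90 days), card balances (cash card/gift card), activity status, customer tier
- Customer tier rules: high value (≥ 3 visits and ≥ 1000 yuan within 90 days) → medium (consumption within 30 days) → low activity (consumption within 90 days but none within 30 days) → churned
- Walk-in exclusion: rows whose `member_id` is 0 or None are excluded from the statistics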
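
The customer tier rules in 4.1 read naturally as a cascade. The sketch below assumes the arrows denote evaluation order (the first matching rule wins) and that "has consumption" can be tested through the visit count of the window; the function name, parameter names, and tier labels are hypothetical.

```python
from decimal import Decimal


def customer_tier(visits_90d: int, amount_90d: Decimal, visits_30d: int) -> str:
    """Classify a member using the 4.1 rules, evaluated top to bottom."""
    # High value: at least 3 visits and at least 1000 yuan within 90 days.
    if visits_90d >= 3 and amount_90d >= Decimal("1000"):
        return "HIGH_VALUE"
    # Medium: some consumption within the last 30 days.
    if visits_30d > 0:
        return "MEDIUM"
    # Low activity: consumption within 90 days but none within 30 days.
    if visits_90d > 0:
        return "LOW_ACTIVITY"
    # No consumption in the last 90 days.
    return "CHURNED"


tiers = [
    customer_tier(visits_90d=4, amount_90d=Decimal("1500"), visits_30d=0),
    customer_tier(visits_90d=2, amount_90d=Decimal("5000"), visits_30d=1),
    customer_tier(visits_90d=1, amount_90d=Decimal("80"), visits_30d=0),
    customer_tier(visits_90d=0, amount_90d=Decimal("0"), visits_30d=0),
]
```

Under these assumptions, a member with 4 visits and 1500 yuan in 90 days is high value even without a visit in the last 30 days, because the high-value rule is checked first.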

### 4.2 Member visit detail (dws_member_visit_detail)

<!-- TODO: definitions for member visit frequency, time-of-day distribution, etc. -->

- Target table: `dws.dws_member_visit_detail`
- Data source: DWD order fact table
- Grain: store × member × date
- Core metrics: *(to be defined)*
- Data source: `dwd_settlement_head`, `dwd_assistant_service_log`, `dwd_table_fee_log`, `dim_member`, `dim_table`, `cfg_area_category`
- Grain: store × member × settlement order
- Core metrics: consumption amount breakdown (table fee/goods/assistant), payment method breakdown (cash/stored-value card/gift card/group-buy voucher), table usage duration, assistant service details (JSON)

---

@@ -153,67 +135,84 @@

### 5.1 Order summary wide table (dws_order_summary)

<!-- TODO: order-level summary wide table definitions -->

- Target table: `dws.dws_order_summary`
- Data source: DWD order/payment/refund fact tables
- Data source: `dwd_settlement_head`, `dwd_table_fee_log`, `dwd_assistant_service_log`, `dwd_store_goods_sale`, `dwd_groupbuy_redemption`, `dwd_refund`/`dwd_refund_ex`
- Grain: store × settlement order
- Core metrics: *(to be defined)*
- Core metrics: fee breakdown (table fee/assistant/goods/group-buy), discounts, amount totals, payment methods, ledger revenue, effective consumption, refunds and net revenue
- Six CTEs merge the source tables; amounts prefer values aggregated from the detail tables and fall back to the summary fields on the settlement header table

---

## 6. Custom index algorithms
## 6. Inventory summary

For the detailed calculation flow, parameters, and normalization methods of the index algorithms, see [index_algorithm_cn.md](index_algorithm_cn.md).
### 6.1 Daily inventory summary (dws_goods_stock_daily_summary)

- Target table: `dws.dws_goods_stock_daily_summary`
- Data source: `dwd_goods_stock_summary`
- Grain: store × date × goods item
- Update strategy: upsert (ON CONFLICT DO UPDATE)
- Core logic: group by the date of `fetched_at`; numeric metrics use SUM, and opening/closing values take the first/last record of the day

### 6.2 Weekly inventory summary (dws_goods_stock_weekly_summary)

- Target table: `dws.dws_goods_stock_weekly_summary`
- Data source: `dwd_goods_stock_summary`
- Grain: store × ISO week × goods item
- Update strategy: upsert (ON CONFLICT DO UPDATE)
- Core logic: group by ISO week; `stat_date` = the Monday of that week

### 6.3 Monthly inventory summary (dws_goods_stock_monthly_summary)

- Target table: `dws.dws_goods_stock_monthly_summary`
- Data source: `dwd_goods_stock_summary`
- Grain: store × calendar month × goods item
- Update strategy: upsert (ON CONFLICT DO UPDATE)
- Core logic: group by calendar month; `stat_date` = the first day of that month
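
The `stat_date` rules for the weekly and monthly summaries can be sketched as follows. The helper names are hypothetical; only the rules themselves (Monday of the ISO week, first day of the calendar month) come from the text above.

```python
from datetime import date, timedelta


def week_stat_date(day: date) -> date:
    """Return the Monday of the ISO week that contains `day`."""
    # date.weekday() is 0 for Monday, so subtracting it lands on Monday.
    return day - timedelta(days=day.weekday())


def month_stat_date(day: date) -> date:
    """Return the first day of the calendar month that contains `day`."""
    return day.replace(day=1)


weekly_key = week_stat_date(date(2026, 2, 26))    # a Thursday
monthly_key = month_stat_date(date(2026, 2, 26))
year_boundary_key = week_stat_date(date(2026, 1, 1))
```

Note that an ISO week can straddle a year boundary: 2026-01-01 falls in the week whose Monday is 2025-12-29, so the weekly `stat_date` may belong to the previous calendar year.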

---

## 7. Custom index algorithms

For the detailed calculation flow, parameters, and normalization methods of the index algorithms, see [index_tasks.md](../etl_tasks/index_tasks.md).

The summary tables behind each index are outlined below:

### 6.1 Member recall index: WBI (dws_member_recall_index)

<!-- TODO: business definition and trigger conditions of the WBI index -->

- Target table: `dws.dws_member_recall_index`
- Grain: store × member

### 6.2 New-customer conversion index: NCI (dws_member_newconv_index)

<!-- TODO: business definition and scoring rules of the NCI index -->

- Target table: `dws.dws_member_newconv_index`
- Grain: store × member

### 6.3 Relation index: RS (dws_member_assistant_relation_index)

<!-- TODO: business definition of the RS index and the intimacy calculation -->

- Target table: `dws.dws_member_assistant_relation_index`
- Grain: store × member × assistant

### 6.4 Assistant-member intimacy (dws_member_assistant_intimacy)

<!-- TODO: intimacy scoring definition -->

- Target table: `dws.dws_member_assistant_intimacy`
- Grain: store × member × assistant

### 6.5 Winback index: OS (dws_member_winback_index)

<!-- TODO: business definition of the OS index and the winback rules -->
### 7.1 Winback index: WBI (dws_member_winback_index)

- Target table: `dws.dws_member_winback_index`
- Grain: store × member
- Task code: `DWS_WINBACK_INDEX`
- Depends on: `DWS_MEMBER_VISIT`, `DWS_MEMBER_CONSUMPTION`

### 6.6 Manual ledger: ML (dws_ml_manual_order_source / dws_ml_manual_order_alloc)
### 7.2 New-customer conversion index: NCI (dws_member_newconv_index)

<!-- TODO: business definition and allocation rules of the ML manual ledger -->
- Target table: `dws.dws_member_newconv_index`
- Grain: store × member
- Task code: `DWS_NEWCONV_INDEX`
- Depends on: `DWS_MEMBER_VISIT`, `DWS_MEMBER_CONSUMPTION`

### 7.3 Relation index: RS (dws_member_assistant_relation_index)

- Target table: `dws.dws_relation_index`
- Grain: store × member × assistant
- Task code: `DWS_RELATION_INDEX`
- Depends on: `DWS_ASSISTANT_DAILY`

### 7.4 Spending power index: SPI (dws_member_spending_power_index)

- Target table: `dws.dws_member_spending_power_index`
- Grain: store × member
- Task code: `DWS_SPENDING_POWER_INDEX`
- Depends on: `DWS_MEMBER_CONSUMPTION`

### 7.5 Manual ledger: ML (dws_ml_manual_order_source / dws_ml_manual_order_alloc)

- Wide table: `dws.dws_ml_manual_order_source`
- Narrow table: `dws.dws_ml_manual_order_alloc`
- Grain: store × order × assistant
- Task code: `DWS_ML_MANUAL_IMPORT`

### 6.7 Index percentile history (dws_index_percentile_history)

<!-- TODO: historical snapshot definition for index percentile normalization -->
### 7.6 Index percentile history (dws_index_percentile_history)

- Target table: `dws.dws_index_percentile_history`
- Grain: store × index type × date

@@ -3,8 +3,6 @@
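This document defines the SCD2 (Slowly Changing Dimension Type 2) handling strategy,
validity-interval management, and versioning rules for dimension tables in the `dwd` schema.

> **Status**: skeleton document; the tracked fields and change triggers for each dimension table are still to be added.

---

## 1. Overview
@@ -17,9 +15,18 @@ SCD2 tracks attribute changes by retaining historical versions of dimension records. When a tracked

### 1.2 Implementation module

- Handler: `scd/scd2_handler.py`, the `SCD2Handler` class
- Core method: `upsert(table_name, natural_key, tracked_fields, record, effective_date)`
- Return value: `INSERT` (new record), `UPDATE` (attribute change), `UNCHANGED` (no change)
- Handler: `tasks/dwd/dwd_load_task.py`, the `_merge_dim_scd2()` method
- Change detection: `_is_row_changed()` compares all columns other than the SCD2 control columns; a difference in any column counts as a change
- Bulk close: `_close_current_dim_bulk()` sets `scd2_end_time` and `scd2_is_current = 0` on the old versions in bulk
- Bulk insert: `_insert_dim_rows_bulk()` inserts the new version rows in bulk

### 1.3 Change detection logic

`_is_row_changed(current, incoming, dwd_cols)` iterates over all columns of the target table (excluding the SCD2 control columns) and compares the current version with the incoming data column by column. Values are type-normalized before comparison:
- Null normalization: `None`, the empty string, and `"null"` are treated as equivalent
- Numeric normalization: numeric strings are converted before being compared with `Decimal`/`int`
- Boolean normalization: `"true"`/`"1"`/`"yes"` and similar values are treated as equivalent to `True`
- Date normalization: date strings are parsed before being compared with `datetime`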
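
The four normalization rules can be illustrated with a small comparison helper. This is a sketch under stated assumptions, not the project's `_is_row_changed()`: the function names are hypothetical, the set of false-like strings mirrors the true-like set listed above, and date strings are assumed to be in ISO format.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

SCD2_COLS = {"scd2_start_time", "scd2_end_time", "scd2_is_current", "scd2_version"}
TRUE_STRINGS = {"true", "1", "yes"}
FALSE_STRINGS = {"false", "0", "no"}  # assumption: mirror of the true-like set


def _is_null(value) -> bool:
    return value is None or (isinstance(value, str) and value.strip().lower() in ("", "null"))


def _to_bool(value):
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in TRUE_STRINGS:
        return True
    if text in FALSE_STRINGS:
        return False
    return None  # not interpretable as a boolean


def values_equal(a, b) -> bool:
    """Compare two column values after type normalization."""
    if _is_null(a) or _is_null(b):
        return _is_null(a) and _is_null(b)
    if isinstance(a, bool) or isinstance(b, bool):
        return _to_bool(a) is not None and _to_bool(a) == _to_bool(b)
    if isinstance(a, datetime) or isinstance(b, datetime):
        try:
            left = a if isinstance(a, datetime) else datetime.fromisoformat(str(a))
            right = b if isinstance(b, datetime) else datetime.fromisoformat(str(b))
        except ValueError:
            return False
        return left == right
    if isinstance(a, (int, Decimal)) or isinstance(b, (int, Decimal)):
        try:
            return Decimal(str(a)) == Decimal(str(b))
        except InvalidOperation:
            return False
    return a == b


def is_row_changed_sketch(current: dict, incoming: dict, columns) -> bool:
    """True if any non-SCD2 column differs after normalization."""
    return any(
        not values_equal(current.get(col), incoming.get(col))
        for col in columns
        if col not in SCD2_COLS
    )


current = {"level": 3, "price": Decimal("12.50"), "is_delete": False,
           "nickname": None, "entry_time": datetime(2026, 2, 1, 9, 0), "scd2_version": 1}
incoming = {"level": "3", "price": "12.5", "is_delete": "0",
            "nickname": "", "entry_time": "2026-02-01 09:00:00", "scd2_version": 7}
unchanged = is_row_changed_sketch(current, incoming, current.keys())
changed = is_row_changed_sketch(current, {**incoming, "level": "4"}, current.keys())
```

Without normalization, every row in this example would look changed on each run (for example `3` versus `"3"`), which would create a spurious new SCD2 version per load.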

---

@@ -38,103 +45,119 @@ SCD2 tracks attribute changes by retaining historical versions of dimension records. When a tracked

- Primary key: `(natural_key, scd2_start_time)`; versions of the same natural key are distinguished by their effective time
- Unique index: `WHERE scd2_is_current = 1`, guaranteeing a single current record per natural key
- Exclusion constraint (GiST): `tstzrange(scd2_start_time, scd2_end_time)`, preventing overlapping version intervals for the same natural key

---

## 3. Processing flow

```
Dimension record received
_merge_dim_scd2(cur, dwd_table, ods_table, dwd_cols, ods_cols, now)
│
▼
Look up the current record (valid_to IS NULL) by natural_key
├── 1. Fetch the latest valid version from ODS (DISTINCT ON + is_delete IS DISTINCT FROM 1)
│
├── Not found → INSERT a new record (is_current=1, valid_from=now)
├── 2. Fetch the current version from DWD (scd2_is_current = 1)
│
└── Found → compare tracked_fields
│
├── No change → UNCHANGED (skip)
│
└── Changed → UPDATE the old record (valid_to=now, is_current=0)
INSERT a new record (valid_from=now, is_current=1)
├── 3. Compare record by record on the natural key:
│ │
│ ├── Not in DWD → collect for insertion (INSERT)
│ │
│ ├── Exists and _is_row_changed() returns True → collect for update
│ │ ├── Close the old version (scd2_end_time = now, scd2_is_current = 0)
│ │ └── Insert the new version (scd2_start_time = now, scd2_is_current = 1, version + 1)
│ │
│ └── Exists and unchanged → skip (UNCHANGED)
│
├── 4. _close_current_dim_bulk(): close the old versions in bulk
│
└── 5. _insert_dim_rows_bulk(): insert the new versions in bulk
```
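
Steps 2 to 5 of the new flow can be sketched in memory as follows. This is an illustration only: `merge_dim_scd2_sketch` is a hypothetical function working on lists of dictionaries rather than on database cursors, it uses plain `!=` instead of the normalized comparison of `_is_row_changed()`, and it assumes that an open version carries `scd2_end_time = None`.

```python
from datetime import datetime, timezone


def merge_dim_scd2_sketch(current_rows, incoming_rows, key, tracked, now):
    """Plan SCD2 changes: return (keys_to_close, rows_to_insert)."""
    # Step 2: index the current DWD versions (scd2_is_current = 1) by natural key.
    current_by_key = {row[key]: row for row in current_rows}
    keys_to_close, rows_to_insert = [], []
    # Step 3: compare each incoming ODS row with its current version.
    for row in incoming_rows:
        existing = current_by_key.get(row[key])
        if existing is None:
            version = 1  # natural key not in DWD yet: plain insert
        elif any(existing.get(col) != row.get(col) for col in tracked):
            keys_to_close.append(row[key])  # old version will be closed at `now`
            version = existing["scd2_version"] + 1
        else:
            continue  # unchanged: skip
        rows_to_insert.append({
            **row,
            "scd2_start_time": now,
            "scd2_end_time": None,  # assumption: open-ended current version
            "scd2_is_current": 1,
            "scd2_version": version,
        })
    # Steps 4 and 5 (bulk close, bulk insert) would consume these two lists.
    return keys_to_close, rows_to_insert


now = datetime(2026, 2, 26, tzinfo=timezone.utc)
current = [
    {"assistant_id": 1, "level": 2, "scd2_version": 1},
    {"assistant_id": 2, "level": 1, "scd2_version": 3},
]
incoming = [
    {"assistant_id": 1, "level": 3},  # changed: close version 1, insert version 2
    {"assistant_id": 2, "level": 1},  # unchanged: skipped
    {"assistant_id": 9, "level": 1},  # new natural key: insert version 1
]
to_close, to_insert = merge_dim_scd2_sketch(current, incoming, "assistant_id", ["level"], now)
```

Collecting the keys to close and the rows to insert first, and applying them afterwards in two bulk statements, matches the "collect, then bulk close, then bulk insert" ordering of the diagram.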

---

## 4. Dimension table SCD2 configuration

> Tracked fields = all columns of the table other than the natural key and the SCD2 control columns (`scd2_start_time`/`scd2_end_time`/`scd2_is_current`/`scd2_version`). A change in the value of any tracked field triggers a new version.

### 4.1 Store dimension (dim_site / dim_site_ex)

<!-- TODO: natural key, tracked field list -->

- Schema: `dwd`
- ODS source: `ods.table_fee_transactions` (extracted from the `siteProfile` snapshot in the table fee transactions)
- Natural key: `site_id`
- Tracked fields: *(to be defined)*
- Change triggers: *(to be added)*
- dim_site tracked fields: `org_id`, `tenant_id`, `shop_name`, `site_label`, `full_address`, `address`, `longitude`, `latitude`, `tenant_site_region_id`, `business_tel`, `site_type`, `shop_status`
- dim_site_ex tracked fields: `avatar`, `address`, `longitude`, `latitude`, `tenant_site_region_id`, `auto_light`, `light_status`, `light_type`, `light_token`, `site_type`, `site_label`, `attendance_enabled`, `attendance_distance`, `customer_service_qrcode`, `customer_service_wechat`, `fixed_pay_qrcode`, `prod_env`, `shop_status`, `create_time`, `update_time`
- Change triggers: changes to basic store information such as name, address, status, or coordinates

### 4.2 Table dimension (dim_table / dim_table_ex)

<!-- TODO: natural key, tracked field list -->

- Schema: `dwd`
- ODS source: `ods.site_tables_master`
- Natural key: `table_id`
- Tracked fields: *(to be defined)*
- Change triggers: *(to be added)*
- dim_table tracked fields: `site_id`, `table_name`, `site_table_area_id`, `site_table_area_name`, `tenant_table_area_id`, `table_price`, `order_id`
- dim_table_ex tracked fields: `show_status`, `is_online_reservation`, `table_cloth_use_time`, `table_cloth_use_cycle`, `table_status`, `create_time`, `light_status`, `tablestatusname`, `sitename`, `applet_qr_code_url`, `audit_status`, `charge_free`, `delay_lights_time`, `is_rest_area`, `only_allow_groupon`, `order_delay_time`, `self_table`, `temporary_light_second`, `virtual_table`
- Change triggers: changes to table name, area, price, or status

### 4.3 Assistant dimension (dim_assistant / dim_assistant_ex)

<!-- TODO: natural key, tracked field list -->

- Schema: `dwd`
- ODS source: `ods.assistant_accounts_master`
- Natural key: `assistant_id`
- Tracked fields: *(to be defined)*
- Change triggers: *(to be added)*
- dim_assistant tracked fields: `user_id`, `assistant_no`, `real_name`, `nickname`, `mobile`, `tenant_id`, `site_id`, `team_id`, `team_name`, `level`, `entry_time`, `resign_time`, `leave_status`, `assistant_status`
- dim_assistant_ex tracked fields: `gender`, `birth_date`, `avatar`, `introduce`, `video_introduction_url`, `height`, `weight`, `shop_name`, `group_id`, `group_name`, `person_org_id`, `staff_id`, `staff_profile_id`, `assistant_grade`, `sum_grade`, `get_grade_times`, `charge_way`, `allow_cx`, `is_guaranteed`, `salary_grant_enabled`, `entry_type`, `entry_sign_status`, `resign_sign_status`, `work_status`, `show_status`, `show_sort`, `online_status`, `is_delete`, `criticism_status`, `create_time`, `update_time`, `start_time`, `end_time`, `last_table_id`, `last_table_name`, `last_update_name`, `order_trade_no`, `ding_talk_synced`, `site_light_cfg_id`, `light_equipment_id`, `light_status`, `is_team_leader`, `serial_number`, `system_role_id`, `job_num`, `cx_unit_price`, `pd_unit_price`
- Change triggers: changes to assistant level, team, status, hire/resignation, rating, etc.

### 4.4 Member dimension (dim_member / dim_member_ex)

<!-- TODO: natural key, tracked field list -->

- Schema: `dwd`
- ODS source: `ods.member_profiles`
- Natural key: `member_id`
- Tracked fields: *(to be defined)*
- Change triggers: *(to be added)*
- dim_member tracked fields: `system_member_id`, `tenant_id`, `register_site_id`, `mobile`, `nickname`, `member_card_grade_code`, `member_card_grade_name`, `create_time`, `update_time`, `pay_money_sum`, `recharge_money_sum`, `birthday`
- dim_member_ex tracked fields: `referrer_member_id`, `point`, `register_site_name`, `growth_value`, `user_status`, `status`, `person_tenant_org_id`, `person_tenant_org_name`, `register_source`
- Change triggers: changes to member nickname, mobile number, card grade, cumulative consumption/recharge, status, etc.

### 4.5 Member card account dimension (dim_member_card_account / dim_member_card_account_ex)

<!-- TODO: natural key, tracked field list -->

- Schema: `dwd`
- ODS source: `ods.member_stored_value_cards`
- Natural key: `member_card_id`
- Tracked fields: *(to be defined)*
- Change triggers: *(to be added)*
- dim_member_card_account tracked fields: `tenant_id`, `register_site_id`, `tenant_member_id`, `system_member_id`, `card_type_id`, `member_card_grade_code`, `member_card_grade_code_name`, `member_card_type_name`, `member_name`, `member_mobile`, `balance`, `start_time`, `end_time`, `last_consume_time`, `status`, `is_delete`, `principal_balance`, `member_grade`
- dim_member_card_account_ex tracked fields: (60+ columns, including various discount ratios and deduction switches; see the DDL)
- Change triggers: changes to card balance, status, discount configuration, validity period, etc.

### 4.6 Goods dimension (dim_tenant_goods / dim_tenant_goods_ex / dim_store_goods / dim_store_goods_ex)
### 4.6 Goods dimension

<!-- TODO: natural key, tracked field list -->
#### Tenant goods (dim_tenant_goods / dim_tenant_goods_ex)

- Schema: `dwd`
- Natural key: `tenant_goods_id` / `site_goods_id`
- Tracked fields: *(to be defined)*
- Change triggers: *(to be added)*
- ODS source: `ods.tenant_goods_master`
- Natural key: `tenant_goods_id`
- dim_tenant_goods tracked fields: `tenant_id`, `supplier_id`, `category_name`, `goods_category_id`, `goods_second_category_id`, `goods_name`, `goods_number`, `unit`, `market_price`, `goods_state`, `create_time`, `update_time`, `is_delete`, `not_sale`

#### Store goods (dim_store_goods / dim_store_goods_ex)

- ODS source: `ods.store_goods_master`
- Natural key: `site_goods_id`
- dim_store_goods tracked fields: `tenant_id`, `site_id`, `tenant_goods_id`, `goods_name`, `goods_category_id`, `goods_second_category_id`, `category_level1_name`, `category_level2_name`, `batch_stock_qty`, `sale_qty`, `total_sales_qty`, `sale_price`, `created_at`, `updated_at`, `avg_monthly_sales`, `goods_state`, `enable_status`, `send_state`, `is_delete`, `commodity_code`, `not_sale`

### 4.7 Goods category dimension (dim_goods_category)

<!-- TODO: natural key, tracked field list -->

- Schema: `dwd`
- ODS source: `ods.stock_goods_category_tree`
- Natural key: `category_id`
- Tracked fields: *(to be defined)*
- Change triggers: *(to be added)*
- Tracked fields: `tenant_id`, `category_name`, `alias_name`, `parent_category_id`, `business_name`, `tenant_goods_business_id`, `category_level`, `is_leaf`, `open_salesman`, `sort_order`, `is_warehousing`
- Change triggers: changes to category name, level, sort order, or enabled status

### 4.8 Group-buy package dimension (dim_groupbuy_package / dim_groupbuy_package_ex)

<!-- TODO: natural key, tracked field list -->

- Schema: `dwd`
- ODS source: `ods.group_buy_packages`
- Natural key: `groupbuy_package_id`
- Tracked fields: *(to be defined)*
- Change triggers: *(to be added)*
- dim_groupbuy_package tracked fields: `tenant_id`, `site_id`, `package_name`, `package_template_id`, `selling_price`, `coupon_face_value`, `duration_seconds`, `start_time`, `end_time`, `table_area_name`, `is_enabled`, `is_delete`, `create_time`, `tenant_table_area_id_list`, `card_type_ids`, `sort`, `is_first_limit`
- Change triggers: changes to package name, price, face value, validity period, or enabled status

### 4.9 Staff dimension (dim_staff / dim_staff_ex)

- ODS source: `ods.staff_info_master`
- Natural key: `staff_id`
- dim_staff tracked fields: `staff_name`, `alias_name`, `mobile`, `gender`, `job`, `tenant_id`, `site_id`, `system_role_id`, `staff_identity`, `status`, `leave_status`, `entry_time`, `resign_time`, `is_delete`
- dim_staff_ex tracked fields: `avatar`, `job_num`, `account_status`, `rank_id`, `rank_name`, `new_rank_id`, `new_staff_identity`, `is_reserve`, `shop_name`, `site_label`, `tenant_org_id`, `system_user_id`, `cashier_point_id`, `cashier_point_name`, `group_id`, `group_name`, `staff_profile_id`, `auth_code`, `auth_code_create`, `ding_talk_synced`, `salary_grant_enabled`, `entry_type`, `entry_sign_status`, `resign_sign_status`, `criticism_status`, `create_time`, `user_roles`
- Change triggers: changes to staff name, position, role, status, hire/resignation, etc.

---

@@ -167,9 +190,10 @@ ORDER BY scd2_start_time;
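
## 6. Notes

- **Time zone**: `scd2_start_time` / `scd2_end_time` use `TIMESTAMPTZ`, stored uniformly in the server time zone
- **Time zone**: `scd2_start_time` / `scd2_end_time` use `TIMESTAMPTZ`, stored uniformly in the `Asia/Shanghai` time zone
- **Concurrency safety**: the current implementation processes rows serially within a single ETL run and takes no row-level locks; concurrent writes need additional protection
- **Deletion policy**: dimension records are never physically deleted; they are only marked inactive by closing the version (`scd2_is_current = 0`)
- **ODS source filtering**: reads from ODS uniformly use `DISTINCT ON (natural_key) ... WHERE is_delete IS DISTINCT FROM 1 ORDER BY natural_key, fetched_at DESC` to ensure the latest valid version is taken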
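
To make the ODS source filter concrete, the sketch below reproduces its effect in plain Python on a list of dictionaries. It is an illustration of the selection rule, not code from the project; `latest_valid_rows` is a hypothetical name, and the sample rows are invented.

```python
from datetime import datetime


def latest_valid_rows(rows, key):
    """Keep, per natural key, the most recently fetched row that is not deleted."""
    latest = {}
    for row in rows:
        # `is_delete IS DISTINCT FROM 1` keeps both NULL (None) and 0.
        if row.get("is_delete") == 1:
            continue
        best = latest.get(row[key])
        # ORDER BY natural_key, fetched_at DESC: the newest fetch wins.
        if best is None or row["fetched_at"] > best["fetched_at"]:
            latest[row[key]] = row
    return list(latest.values())


rows = [
    {"member_id": 1, "is_delete": 0, "fetched_at": datetime(2026, 2, 1), "nickname": "a"},
    {"member_id": 1, "is_delete": None, "fetched_at": datetime(2026, 2, 2), "nickname": "b"},
    {"member_id": 2, "is_delete": 1, "fetched_at": datetime(2026, 2, 2), "nickname": "c"},
]
selected = latest_valid_rows(rows, "member_id")
```

Because the deletion filter runs before the per-key selection, a key whose newest row is deleted falls back to its newest non-deleted row, and a key with only deleted rows disappears from the result.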

---


@@ -51,10 +51,10 @@ graph LR
| Document | Description |
|------|------|
| [BaseTask common mechanisms](base_task_mechanism.md) | Task base-class template methods, TaskContext, time windows, registry, Flow execution |
| [ODS layer tasks](ods_tasks.md) | Architecture, configuration structure, API endpoints, and target tables of the 22 generic ODS tasks |
| [ODS layer tasks](ods_tasks.md) | Architecture, configuration structure, API endpoints, and target tables of the 23 generic ODS tasks |
| [DWD layer tasks](dwd_tasks.md) | DWD_LOAD_FROM_ODS core load, SCD2 handling, quality checks |
| [DWS layer tasks](dws_tasks.md) | 13 DWS tasks in total: assistant performance, member analysis, finance statistics, maintenance tasks |
| [INDEX layer tasks](index_tasks.md) | WBI/NCI/RS index algorithms + ML manual ledger import |
| [DWS layer tasks](dws_tasks.md) | 17 DWS tasks in total: assistant performance, member analysis, finance statistics, inventory summary, maintenance tasks |
| [INDEX layer tasks](index_tasks.md) | WBI/NCI/RS/SPI index algorithms + ML manual ledger import |
| [Utility tasks](utility_tasks.md) | Schema initialization, manual ingestion, archiving, cutoff check, integrity validation |

---
@@ -89,6 +89,7 @@ graph LR
| `ODS_STORE_GOODS_SALES` | `OdsGoodsLedgerTask` | `ods.store_goods_sales_records` | Store goods sales transactions | [View](ods_tasks.md) |
| `ODS_TENANT_GOODS` | `OdsTenantGoodsTask` | `ods.tenant_goods_master` | Tenant goods master data | [View](ods_tasks.md) |
| `ODS_SETTLEMENT_RECORDS` | `OdsOrderSettleTask` | `ods.settlement_records` | Settlement records | [View](ods_tasks.md) |
| `ODS_STAFF_INFO` | `OdsStaffInfoTask` | `ods.staff_info_master` | Staff master data (active and departed) | [View](ods_tasks.md) |

### DWD layer (detail data)

@@ -108,6 +109,7 @@ graph LR
| `DWS_ASSISTANT_CUSTOMER` | `AssistantCustomerTask` | `dws_assistant_customer_stats` | date + assistant + member | [View](dws_tasks.md) |
| `DWS_ASSISTANT_SALARY` | `AssistantSalaryTask` | `dws_assistant_salary_calc` | month + assistant | [View](dws_tasks.md) |
| `DWS_ASSISTANT_FINANCE` | `AssistantFinanceTask` | `dws_assistant_finance_analysis` | date + assistant | [View](dws_tasks.md) |
| `DWS_ASSISTANT_ORDER_CONTRIBUTION` | `AssistantOrderContributionTask` | `dws_assistant_order_contribution` | date + assistant | [View](dws_tasks.md) |

#### Member analysis domain

@@ -125,6 +127,14 @@ graph LR
| `DWS_FINANCE_INCOME_STRUCTURE` | `FinanceIncomeStructureTask` | `dws_finance_income_structure` | date + income type | [View](dws_tasks.md) |
| `DWS_FINANCE_DISCOUNT_DETAIL` | `FinanceDiscountDetailTask` | `dws_finance_discount_detail` | date + discount type | [View](dws_tasks.md) |

#### Inventory summary domain

| Task code | Python class | Target table | Grain | Details |
|----------|-----------|--------|------|------|
| `DWS_GOODS_STOCK_DAILY` | `GoodsStockDailyTask` | `dws_goods_stock_daily_summary` | date + goods item | [View](dws_tasks.md) |
| `DWS_GOODS_STOCK_WEEKLY` | `GoodsStockWeeklyTask` | `dws_goods_stock_weekly_summary` | ISO week + goods item | [View](dws_tasks.md) |
| `DWS_GOODS_STOCK_MONTHLY` | `GoodsStockMonthlyTask` | `dws_goods_stock_monthly_summary` | month + goods item | [View](dws_tasks.md) |

#### Maintenance tasks

| Task code | Python class | Summary | Details |
@@ -140,6 +150,7 @@ graph LR
| `DWS_NEWCONV_INDEX` | `NewconvIndexTask` | `dws_member_newconv_index` | NCI (new-customer conversion index) | [View](index_tasks.md) |
| `DWS_RELATION_INDEX` | `RelationIndexTask` | `dws_relation_index` | RS (relation index) | [View](index_tasks.md) |
| `DWS_ML_MANUAL_IMPORT` | `MlManualImportTask` | `dws_ml_manual_ledger` | ML (manual ledger import) | [View](index_tasks.md) |
| `DWS_SPENDING_POWER_INDEX` | `SpendingPowerIndexTask` | `dws_member_spending_power_index` | SPI (spending power index) | [View](index_tasks.md) |

### Utility / validation tasks

@@ -353,4 +364,4 @@ python -m cli.main --tasks DATA_INTEGRITY_CHECK

---

> Last updated: 2026-02-18
> Last updated: 2026-02-26

@@ -286,8 +286,8 @@ default_registry.register("DWS_ASSISTANT_FINANCE", AssistantFinanceTask, layer="
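|----|------|------|
| ODS | 23 | Generic ODS tasks (generated dynamically from `ODS_TASK_CLASSES`), all with `skip_unchanged=True` by default |
| DWD | 2 | Includes the core load task `DWD_LOAD_FROM_ODS` and the quality check |
| DWS | 13 | Assistant performance, member analysis, finance statistics, unified maintenance task (the former 3 MV refresh/cleanup tasks were merged into DWS_MAINTENANCE) |
| INDEX | 4 | Winback index, new-customer conversion index, relation index, manual ledger import |
| DWS | 17 | Assistant performance (including order revenue contribution), member analysis, finance statistics, inventory summary, maintenance (the former 3 MV refresh/cleanup tasks were merged into DWS_MAINTENANCE) |
| INDEX | 5 | Winback index, new-customer conversion index, relation index, spending power index, manual ledger import |
| Utility | 7 | Schema initialization, manual ingestion, archiving, validation, etc. |
| Validation | 1 | Data integrity validation |

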
@@ -8,9 +8,9 @@

## Overview

The DWS layer has 13 registered tasks, divided into four groups by business domain:
The DWS layer has 17 registered tasks (including DWS_MAINTENANCE), divided into five groups by business domain:

### Assistant performance domain (5 tasks)
### Assistant performance domain (6 tasks)

| Task code | Python class | Target table | Grain | Update strategy |
|----------|-----------|--------|------|----------|
@@ -19,6 +19,7 @@ The DWS layer has 13 registered tasks, divided into four groups by business domain:
| `DWS_ASSISTANT_CUSTOMER` | `AssistantCustomerTask` | `dws_assistant_customer_stats` | date + assistant + member | delete-before-insert |
| `DWS_ASSISTANT_SALARY` | `AssistantSalaryTask` | `dws_assistant_salary_calc` | month + assistant | delete-before-insert |
| `DWS_ASSISTANT_FINANCE` | `AssistantFinanceTask` | `dws_assistant_finance_analysis` | date + assistant | delete-before-insert |
| `DWS_ASSISTANT_ORDER_CONTRIBUTION` | `AssistantOrderContributionTask` | `dws_assistant_order_contribution` | date + assistant | delete-before-insert |

### Member analysis domain (2 tasks)

@@ -36,6 +37,14 @@ The DWS layer has 13 registered tasks, divided into four groups by business domain:
| `DWS_FINANCE_INCOME_STRUCTURE` | `FinanceIncomeStructureTask` | `dws_finance_income_structure` | date + income type | delete-before-insert |
| `DWS_FINANCE_DISCOUNT_DETAIL` | `FinanceDiscountDetailTask` | `dws_finance_discount_detail` | date + discount type | delete-before-insert |

### Inventory summary domain (3 tasks)

| Task code | Python class | Target table | Grain | Update strategy |
|----------|-----------|--------|------|----------|
| `DWS_GOODS_STOCK_DAILY` | `GoodsStockDailyTask` | `dws_goods_stock_daily_summary` | date + goods item | upsert |
| `DWS_GOODS_STOCK_WEEKLY` | `GoodsStockWeeklyTask` | `dws_goods_stock_weekly_summary` | ISO week + goods item | upsert |
| `DWS_GOODS_STOCK_MONTHLY` | `GoodsStockMonthlyTask` | `dws_goods_stock_monthly_summary` | month + goods item | upsert |

### Maintenance tasks (2 tasks)

| Task code | Python class | Inherits | Description | Update strategy |
@@ -377,6 +386,51 @@ dwd_assistant_service_log ────► DWS_ASSISTANT_CUSTOMER (customer relationship

---

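### DWS_ASSISTANT_ORDER_CONTRIBUTION: four assistant order revenue statistics

| Property | Value |
|------|-----|
| Task code | `DWS_ASSISTANT_ORDER_CONTRIBUTION` |
| Python class | `AssistantOrderContributionTask` (`tasks/dws/assistant_order_contribution_task.py`) |
| Target table | `dws.dws_assistant_order_contribution` |
| Primary key | `site_id`, `assistant_id`, `stat_date` |
| Grain | date + assistant |
| Update strategy | delete-before-insert (by date window) |
| Update frequency | daily |
| Depends on | `DWD_LOAD_FROM_ODS` |

#### Data sources

| Source table | Schema | Purpose |
|--------|--------|------|
| `dwd_settlement_head` | `dwd` | Settlement header table (order-level information) |
| `dwd_table_fee_log` | `dwd` | Table fee detail (table usage duration, table fee amount) |
| `dwd_assistant_service_log` | `dwd` | Assistant service records (service duration, revenue, commission) |

#### Aggregation dimensions and output fields

Rows are aggregated by `(assistant_id, stat_date)` and produce the following fields:

| Field group | Field | Description |
|----------|------|------|
| Identifiers | `site_id`, `tenant_id`, `assistant_id`, `assistant_nickname`, `stat_date` | Store, assistant, date |
| Four statistics | `order_gross_revenue` | Gross order revenue: table fee + drinks and food + all assistant service fees |
| | `order_net_revenue` | Net order revenue: gross order revenue - all assistant service commissions |
| | `time_weighted_revenue` | Time-weighted contribution revenue: personal contribution prorated by service duration |
| | `time_weighted_net_revenue` | Time-weighted net contribution: time-weighted contribution revenue - personal service commission |
| Auxiliary | `order_count`, `total_service_seconds` | Number of orders participated in, total service duration in seconds |

#### Core business logic

1. **Gross order revenue (order_gross_revenue)**: the full revenue of every order the assistant took part in (table fee + drinks and food + all assistant service fees); each participating assistant receives the same value
2. **Net order revenue (order_net_revenue)**: gross order revenue minus the total service commission of all assistants on that order; each participating assistant receives the same value
3. **Time-weighted contribution revenue (time_weighted_revenue)**: table fee allocated by the assistant's share of service duration at each table + personal service fee + drinks and food split in proportion to total service duration
4. **Time-weighted net contribution (time_weighted_net_revenue)**: time-weighted contribution revenue minus that assistant's own service commission
5. **Special handling for overtime-rest (超休) and tip (打赏) sessions**: assistants with `course_type = BONUS` are excluded from order-level allocation; all four statistics equal the assistant's own service revenue and commission figures
6. **Table fee allocation formula**: `billable_seconds = MAX(SUM(assistant service seconds), table usage seconds)`; each assistant's share is `service_seconds / billable_seconds`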
|
||||
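
The table-fee apportionment rule in item 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the task's actual code: the helper name `allocate_table_fee` and its arguments are hypothetical, and `Decimal` is used here only to avoid float rounding.

```python
from decimal import Decimal


def allocate_table_fee(table_fee, table_seconds, service_seconds):
    """Split one table's fee across assistants by service time (hypothetical helper).

    service_seconds maps assistant_id -> seconds served at this table.
    """
    total_service = sum(service_seconds.values())
    # billable_seconds = MAX(SUM(assistant service seconds), table usage seconds)
    billable_seconds = max(total_service, table_seconds)
    if billable_seconds <= 0:
        return {aid: Decimal("0") for aid in service_seconds}
    return {
        aid: table_fee * Decimal(secs) / Decimal(billable_seconds)
        for aid, secs in service_seconds.items()
    }


# One assistant covers half of a one-hour table session, another a quarter.
shares = allocate_table_fee(Decimal("120"), 3600, {"a1": 1800, "a2": 900})
# Two assistants overlap for the full hour: the fee is split over 7200 seconds.
overlap = allocate_table_fee(Decimal("120"), 3600, {"a1": 3600, "a2": 3600})
```

Because the denominator is the larger of the two durations, the shares never add up to more than the table fee, and overlapping assistants split the fee instead of each claiming all of it.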

---

### DWS_ASSISTANT_DAILY: assistant daily performance detail

| Property | Value |
@@ -8,16 +8,17 @@

 ## Overview

-The INDEX layer has 4 registered tasks:
+The INDEX layer has 5 registered tasks:

 | Task code | Python class | Target table | Index type | Update strategy |
 |----------|-----------|--------|----------|----------|
 | `DWS_WINBACK_INDEX` | `WinbackIndexTask` | `dws_member_winback_index` | WBI (win-back index) | delete-before-insert (full refresh per site) |
 | `DWS_NEWCONV_INDEX` | `NewconvIndexTask` | `dws_member_newconv_index` | NCI (new-customer conversion index) | delete-before-insert (full refresh per site) |
 | `DWS_RELATION_INDEX` | `RelationIndexTask` | `dws_member_assistant_relation_index` | RS/OS/MS/ML (relation indices) | delete-before-insert (full refresh per site) |
+| `DWS_SPENDING_POWER_INDEX` | `SpendingPowerIndexTask` | `dws_member_spending_power_index` | SPI (spending power index) | delete-before-insert (full refresh per site) |
 | `DWS_ML_MANUAL_IMPORT` | `MlManualImportTask` | `dws_ml_manual_order_source` / `dws_ml_manual_order_alloc` | ML (manual ledger import) | delete then write, per scope |

-> Registered in `orchestration/task_registry.py`; all INDEX tasks use `requires_db_config=False` and `layer="INDEX"`.
+> Registered in `orchestration/task_registry.py`; all INDEX tasks use `requires_db_config=False` and `layer="INDEX"`. The SPI task additionally declares `depends_on=["DWS_MEMBER_CONSUMPTION"]`.

 ---

@@ -34,8 +35,9 @@ BaseTask
 ├── MemberIndexBaseTask ← member feature extraction shared by WBI / NCI
 │ ├── WinbackIndexTask
 │ └── NewconvIndexTask
-├── RelationIndexTask ← RS/OS/MS/ML four-in-one
-└── MlManualImportTask ← ML manual ledger import
+├── RelationIndexTask ← RS/OS/MS/ML four-in-one
+├── SpendingPowerIndexTask ← SPI spending power index (independent data extraction)
+└── MlManualImportTask ← ML manual ledger import
 ```

 ### Abstract methods that subclasses must implement
@@ -414,6 +416,177 @@ NCI 产出 3 个 Display Score:
| `w_value` | 1.0 | Value weight |

---

## DWS_SPENDING_POWER_INDEX: Spending Power Index (SPI)

| Property | Value |
|------|-----|
| Task code | `DWS_SPENDING_POWER_INDEX` |
| Python class | `SpendingPowerIndexTask` (`tasks/dws/index/spending_power_index_task.py`) |
| Inheritance chain | `BaseTask → BaseDwsTask → BaseIndexTask → SpendingPowerIndexTask` |
| Target table | `dws.dws_member_spending_power_index` |
| Primary key | `site_id, member_id` |
| Index type | `SPI` |
| Depends on | `DWS_MEMBER_CONSUMPTION` |
| Update strategy | Full refresh per site (`DELETE WHERE site_id = %s`, then `INSERT`) |

### Business meaning

SPI measures a member's overall spending-power tier within a site: the higher the score, the stronger the member's spending capacity and willingness to spend. It is intended for customer segmentation, resource allocation, and targeted marketing, and is used alongside operational indices such as WBI and NCI.

SPI does not use the member segmentation logic of `MemberIndexBaseTask` (NEW/OLD/STOP); every member with a consumption or recharge record in the last 90 days is included in the calculation.

### Calculation scope

All members with a consumption order (`settle_type IN (1, 3)`) or a recharge order (`settle_type = 5`) in the last 90 days. Sites with no consumption or recharge data are skipped, and the task returns `{'status': 'skipped', 'reason': 'no_data'}`.

### Data sources

| Data | Source table | Extraction |
|------|--------|----------|
| Consumption orders | `dwd.dwd_settlement_head` | `settle_type IN (1, 3)`, last 90 days, aggregated into member-level features |
| Recharge orders | `dwd.dwd_recharge_order` | `settle_type = 5`, last 90 days, aggregated into member-level recharge features |
| Algorithm parameters | `dws.cfg_index_parameters` | `index_type = 'SPI'` |

### Base features (SPIMemberFeatures)

Member-level spending features extracted from the DWD layer and computed from it:

| Field | Type | Meaning |
|------|------|------|
| `spend_30` | float | Total spend in the last 30 days |
| `spend_90` | float | Total spend in the last 90 days |
| `recharge_90` | float | Total recharge amount in the last 90 days |
| `orders_30` | int | Number of consumption orders in the last 30 days |
| `orders_90` | int | Number of consumption orders in the last 90 days |
| `visit_days_30` | int | Number of days with spending in the last 30 days (deduplicated by day) |
| `visit_days_90` | int | Number of days with spending in the last 90 days (deduplicated by day) |
| `avg_ticket_90` | float | 90-day average ticket: `spend_90 / max(orders_90, 1)` |
| `active_weeks_90` | int | Number of calendar weeks with spending in the last 90 days (at most 13) |
| `daily_spend_ewma_90` | float | EWMA of daily spend over the last 90 days |
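
The `daily_spend_ewma_90` feature is described only as an EWMA of daily spend. A minimal sketch of one common way to compute it is shown here; the recursion form, the seeding with the first day's value, and the function name `daily_spend_ewma` are assumptions for illustration, and the task's own `_compute_daily_spend_ewma` may differ. The default `alpha=0.3` mirrors the `ewma_alpha_daily_spend` parameter.

```python
def daily_spend_ewma(daily_spend, alpha=0.3):
    """EWMA over a chronologically ordered list of daily spend amounts.

    Assumption: the series is seeded with the first day's value, and days
    without spending appear in the list as 0.0.
    """
    if not daily_spend:
        return 0.0
    ewma = float(daily_spend[0])
    for amount in daily_spend[1:]:
        # Standard recursion: s_t = alpha * x_t + (1 - alpha) * s_(t-1)
        ewma = alpha * float(amount) + (1.0 - alpha) * ewma
    return ewma
```

A larger `alpha` makes the feature react faster to the most recent days; a smaller one keeps more of the 90-day history.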
### Algorithm overview

SPI is a weighted combination of three sub-scores:

```
SPI_raw = w_L × Level + w_S × Speed + w_P × Stability
```

Default weights: `w_L = 0.60`, `w_S = 0.30`, `w_P = 0.10`.

#### Sub-score 1: spending level (Level)

A weighted sum of log1p-compressed spend amounts and average ticket, measuring the customer's spending tier:

```
L = w_s30 × ln(1 + spend_30 / M30)
  + w_s90 × ln(1 + spend_90 / M90)
  + w_ticket × ln(1 + avg_ticket_90 / T0)
  + w_r90 × ln(1 + recharge_90 / R90)
```

| Parameter | Default | Meaning |
|------|--------|------|
| `w_level_spend_30` (w_s30) | 0.30 | Weight of 30-day spend |
| `w_level_spend_90` (w_s90) | 0.35 | Weight of 90-day spend |
| `w_level_ticket_90` (w_ticket) | 0.20 | Weight of average ticket |
| `w_level_recharge_90` (w_r90) | 0.15 | Weight of recharge |
| `amount_base_spend_30` (M30) | 500 | Compression base for 30-day spend |
| `amount_base_spend_90` (M90) | 1500 | Compression base for 90-day spend |
| `amount_base_ticket_90` (T0) | 200 | Compression base for average ticket |
| `amount_base_recharge_90` (R90) | 1000 | Compression base for recharge |

When all consumption and recharge amounts are 0, the Level sub-score is 0.0.

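
The Level formula translates directly into code. The sketch below is illustrative only: the documentation names a `compute_level` step but does not show its signature, so the argument list and the `LEVEL_DEFAULTS` dictionary are assumptions; the numeric defaults are the documented ones.

```python
import math

# Documented default parameters for the Level sub-score.
LEVEL_DEFAULTS = {
    "w_level_spend_30": 0.30,
    "w_level_spend_90": 0.35,
    "w_level_ticket_90": 0.20,
    "w_level_recharge_90": 0.15,
    "amount_base_spend_30": 500.0,
    "amount_base_spend_90": 1500.0,
    "amount_base_ticket_90": 200.0,
    "amount_base_recharge_90": 1000.0,
}


def compute_level(spend_30, spend_90, avg_ticket_90, recharge_90, params=LEVEL_DEFAULTS):
    """Weighted sum of log1p-compressed amounts (hypothetical signature)."""
    p = params
    return (
        p["w_level_spend_30"] * math.log1p(spend_30 / p["amount_base_spend_30"])
        + p["w_level_spend_90"] * math.log1p(spend_90 / p["amount_base_spend_90"])
        + p["w_level_ticket_90"] * math.log1p(avg_ticket_90 / p["amount_base_ticket_90"])
        + p["w_level_recharge_90"] * math.log1p(recharge_90 / p["amount_base_recharge_90"])
    )
```

Since the four weights sum to 1.0, a member whose amounts all equal their compression bases scores `ln 2`; the bases therefore act as the "typical" amounts around which the log compression is centred.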
#### Sub-score 2: spending speed (Speed)

Measures how fast recent spending is progressing and how its pace is changing, as a weighted combination of three speed metrics:

```
S = w_abs × V_abs + w_rel × max(0, V_rel) + w_ewma × V_ewma
```

| Speed metric | Formula | Meaning |
|----------|------|------|
| V_abs (absolute speed) | `ln(1 + spend_30 / (max(visit_days_30, 1) × V0))` | Log-compressed average spend per visit day |
| V_rel (relative speed) | `ln((v_30 + ε) / (v_90 + ε))`, where `v_30 = spend_30/30` and `v_90 = spend_90/90` | Change of the recent spending rate relative to the long-term rate |
| V_ewma (EWMA speed) | `ln(1 + daily_spend_ewma_90 / E0)` | Log-compressed daily-spend EWMA |

Design note: only acceleration (`V_rel > 0`) adds to the score; deceleration is not directly penalized (implemented via `max(0, V_rel)`).

| Parameter | Default | Meaning |
|------|--------|------|
| `w_speed_abs` | 0.50 | Weight of absolute speed |
| `w_speed_rel` | 0.30 | Weight of relative speed |
| `w_speed_ewma` | 0.20 | Weight of EWMA speed |
| `amount_base_speed_abs` (V0) | 100 | Compression base for absolute speed |
| `amount_base_ewma_90` (E0) | 50 | Compression base for EWMA speed |
| `speed_epsilon` (ε) | 1e-6 | Small constant that prevents division by zero |

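
A compact sketch of the Speed sub-score, following the three formulas in the table. The function signature and keyword names are assumptions made for this example; only the formulas and default values come from the documentation.

```python
import math


def compute_speed(spend_30, spend_90, visit_days_30, daily_spend_ewma_90,
                  w_abs=0.50, w_rel=0.30, w_ewma=0.20,
                  v0=100.0, e0=50.0, eps=1e-6):
    """Speed sub-score (hypothetical signature, documented defaults)."""
    # Absolute speed: average spend per visit day, log-compressed.
    v_abs = math.log1p(spend_30 / (max(visit_days_30, 1) * v0))
    # Relative speed: recent daily rate versus long-term daily rate.
    v_30 = spend_30 / 30
    v_90 = spend_90 / 90
    v_rel = math.log((v_30 + eps) / (v_90 + eps))
    # EWMA speed: log-compressed daily-spend EWMA.
    v_ewma = math.log1p(daily_spend_ewma_90 / e0)
    # Only acceleration is rewarded; deceleration is clamped to zero.
    return w_abs * v_abs + w_rel * max(0.0, v_rel) + w_ewma * v_ewma
```

For a decelerating member (for example 300 spent in the last 30 days out of 3000 in 90 days), `V_rel` is negative and contributes nothing, so the score falls back to the absolute and EWMA terms.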
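#### Sub-score 3: spending stability (Stability)

Based on weekly coverage over the last 90 days; it distinguishes steady high spending from occasional spikes:

```
P = active_weeks_90 / 13
```

The last 90 days span roughly 13 calendar weeks, and `active_weeks_90` is the number of those weeks with spending. The value lies in [0, 1].

When `use_stability = 0`, the weight of the Stability sub-score is treated as 0 and the stability calculation is skipped.
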
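
Stability and the final raw score can be sketched together. The names `compute_stability` and `compute_spi_raw` appear in the documented execution flow, but their signatures here are assumptions. Two choices in this sketch are also assumptions rather than documented behaviour: the input is clamped to the 0..13 range defensively, and the remaining weights are not renormalized when stability is disabled.

```python
def compute_stability(active_weeks_90, use_stability=1):
    """P = active_weeks_90 / 13, or 0.0 when the sub-score is disabled."""
    if not use_stability:
        return 0.0
    # Defensive clamp (assumption): keep the result inside [0, 1].
    return min(max(active_weeks_90, 0), 13) / 13


def compute_spi_raw(level, speed, stability,
                    w_l=0.60, w_s=0.30, w_p=0.10, use_stability=1):
    """SPI_raw = w_L * Level + w_S * Speed + w_P * Stability."""
    if not use_stability:
        w_p = 0.0  # the Stability weight is treated as 0
    return w_l * level + w_s * speed + w_p * stability
```

With the default weights, a member active in every one of the 13 weeks gains at most 0.10 from stability, so this sub-score mainly breaks ties between members with similar Level and Speed.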
### Display Score normalization

SPI produces 4 Display Scores, each independently normalized to a 0-10 scale:

| Display score | Corresponding raw score | `index_type` in percentile history |
|--------|---------------|---------------------|
| `display_score` | `raw_score` (SPI total) | `SPI` |
| `score_level_display` | `score_level_raw` | `SPI_LEVEL` |
| `score_speed_display` | `score_speed_raw` | `SPI_SPEED` |
| `score_stability_display` | `score_stability_raw` | `SPI_STABILITY` |

Normalization reuses `BaseIndexTask.batch_normalize_to_display`:

```
Raw Score → [optional compression] → Winsorize(P5, P95) → MinMax(0, 10) → [optional EWMA smoothing]
```

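
The two mandatory stages of that pipeline, Winsorize(P5, P95) followed by MinMax(0, 10), can be illustrated as below. This is a sketch, not `batch_normalize_to_display` itself: the optional compression and EWMA smoothing stages are omitted, the linear-interpolation percentile is an assumed method, and the handling of a degenerate distribution (everything mapped to 5.0) is a placeholder choice.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile of an ascending list, q in [0, 100]."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def normalize_to_display(raw_scores, lower=5, upper=95):
    """Winsorize at the given percentiles, then scale linearly to 0..10."""
    if not raw_scores:
        return []
    ordered = sorted(raw_scores)
    p_lo, p_hi = percentile(ordered, lower), percentile(ordered, upper)
    if p_hi <= p_lo:
        # Degenerate distribution (assumption): map everything to the midpoint.
        return [5.0 for _ in raw_scores]
    clipped = [min(max(s, p_lo), p_hi) for s in raw_scores]
    return [10.0 * (s - p_lo) / (p_hi - p_lo) for s in clipped]


display = normalize_to_display([float(i) for i in range(101)])
```

Winsorizing first keeps a handful of extreme spenders from compressing everyone else into the bottom of the 0-10 range.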
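### Amount compression base calibration

SPI supports automatic calibration of the amount compression bases:

1. On the first run, or when a parameter is missing, the median of each base is computed from the site's last 90 days of spending data and used as the suggested value
2. If the corresponding parameter already exists in `cfg_index_parameters`, the configured value takes precedence
3. The base values actually used are written to the log so that operations staff can review them and tune them manually
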
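
The precedence rule (configured value first, data-driven median second) could look like the following. Everything here is a hypothetical illustration: `calibrate_amount_base` is not the task's `_calibrate_amount_bases`, and filtering to positive observations and falling back to a default when no data exists are assumptions.

```python
from statistics import median


def calibrate_amount_base(name, configured, observed, fallback):
    """Choose the compression base for one parameter (hypothetical helper).

    configured: parameters loaded from the config table (name -> value)
    observed:   member-level amounts from the last 90 days
    fallback:   default used when there is neither config nor data
    """
    if name in configured:
        # A value in the config table always wins.
        return configured[name]
    positive = [v for v in observed if v > 0]  # assumption: ignore zero amounts
    return float(median(positive)) if positive else fallback
```

Using the median rather than the mean keeps the suggested base close to a typical member even when a few members spend far more than the rest.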
### Execution flow

```
1. Get site_id
2. load_index_parameters('SPI') loads the parameters (missing parameters fall back to DEFAULT_PARAMS)
3. _extract_spending_features: extract consumption features from dwd_settlement_head
4. _extract_recharge_features: extract recharge features from dwd_recharge_order
5. _compute_daily_spend_ewma: compute the daily-spend EWMA
6. _calibrate_amount_bases: calibrate the amount compression bases
7. Per member: compute_level → compute_speed → compute_stability → compute_spi_raw
8. batch_normalize_to_display: normalize the SPI total and the three sub-scores independently
9. DELETE FROM dws_member_spending_power_index WHERE site_id = %s
10. _save_spi_data: batch INSERT
11. Save percentile history to dws_index_percentile_history (index_type='SPI')
```

### Default weights

| Parameter | Default | Meaning |
|------|--------|------|
| `weight_level` | 0.60 | Weight of the Level sub-score in the total |
| `weight_speed` | 0.30 | Weight of the Speed sub-score in the total |
| `weight_stability` | 0.10 | Weight of the Stability sub-score in the total |

---

## DWS_RELATION_INDEX: relation indices (RS/OS/MS/ML)
@@ -751,5 +924,29 @@ ORDER BY effective_from DESC
| `compression_mode` | 1 | Compression mode (default log1p) |
| `use_smoothing` / `ewma_alpha` | 1 / 0.2 | EWMA smoothing |

### SPI parameter list

| Parameter | Default | Description |
|--------|--------|------|
| `spend_window_short_days` | 30 | Short-term spending window (days) |
| `spend_window_long_days` | 90 | Long-term spending window (days) |
| `ewma_alpha_daily_spend` | 0.3 | Smoothing factor of the daily-spend EWMA |
| `amount_base_spend_30` | 500 | Compression base for 30-day spend |
| `amount_base_spend_90` | 1500 | Compression base for 90-day spend |
| `amount_base_ticket_90` | 200 | Compression base for average ticket |
| `amount_base_recharge_90` | 1000 | Compression base for recharge |
| `amount_base_speed_abs` | 100 | Compression base for absolute speed |
| `amount_base_ewma_90` | 50 | Compression base for EWMA speed |
| `w_level_spend_30` / `w_level_spend_90` | 0.30 / 0.35 | Spend weights in the Level sub-score |
| `w_level_ticket_90` / `w_level_recharge_90` | 0.20 / 0.15 | Average-ticket / recharge weights in the Level sub-score |
| `w_speed_abs` / `w_speed_rel` / `w_speed_ewma` | 0.50 / 0.30 / 0.20 | The three weights of the Speed sub-score |
| `weight_level` / `weight_speed` / `weight_stability` | 0.60 / 0.30 / 0.10 | Weights of the three sub-scores in the SPI total |
| `stability_window_days` | 90 | Stability calculation window (days) |
| `use_stability` | 1 | Whether the Stability sub-score is enabled (0 = skip) |
| `percentile_lower` / `percentile_upper` | 5 / 95 | Normalization percentiles |
| `compression_mode` | 1 | Compression mode (default log1p) |
| `use_smoothing` / `ewma_alpha` | 1 / 0.2 | EWMA smoothing of percentiles |
| `speed_epsilon` | 1e-6 | Small constant that prevents division by zero in the speed calculation |

> Seed data script: `db/etl_feiqiu/seeds/seed_index_parameters.sql`
> DDL definition: `docs/database/ddl/etl_feiqiu__dws.sql`

@@ -228,7 +228,7 @@ execute(cursor_data)

 ### content_hash deduplication

-`content_hash` is the core deduplication mechanism of the generic ODS tasks; it is enabled by default for all 22 tasks (`skip_unchanged=True`).
+`content_hash` is the core deduplication mechanism of the generic ODS tasks; it is enabled by default for all 23 tasks (`skip_unchanged=True`).

 #### How it is computed

@@ -118,12 +118,6 @@ psql "$PG_DSN" -f db/etl_feiqiu/seeds/seed_*.sql
 > Note: the old `db/etl_feiqiu/schemas/` and `db/etl_feiqiu/migrations/` directories have been archived to `db/_archived/`.
 > The DDL baseline is now managed centrally under `docs/database/ddl/` and can be regenerated with `python scripts/ops/gen_consolidated_ddl.py`.

-Or initialize through the CLI utility tasks:
-
-```bash
-python -m cli.main --tasks INIT_ODS_SCHEMA,INIT_DWD_SCHEMA,INIT_DWS_SCHEMA,SEED_DWS_CONFIG --pg-dsn "$PG_DSN"
-```
-
 ## 5. Verify the installation

 ```bash
@@ -123,7 +123,7 @@

 **Solution**:
 ```bash
-pip install -r requirements.txt
+uv sync
 ```

 ### 5.3 Encoding issues

@@ -166,7 +166,11 @@ class FlowRunner:

         timer.start_step("INCREMENT_ETL")
         if task_codes:
-            results = self.task_executor.run_tasks(task_codes, data_source=data_source)
+            # CHANGE [2026-02-24] intent: also topologically sort task_codes passed in by the frontend,
+            # so that DWS does not start computing before DWD has finished (missing cross-layer ordering bug)
+            # prompt: "fix tasks not running in layer order when all tasks are selected in the admin console"
+            sorted_codes = topological_sort(task_codes, self.task_registry)
+            results = self.task_executor.run_tasks(sorted_codes, data_source=data_source)
         else:
             auto_tasks = self._resolve_tasks(layers)
             results = self.task_executor.run_tasks(auto_tasks, data_source=data_source)

@@ -107,6 +107,11 @@ class TaskExecutor:
                results.append(result_entry)
            except Exception as exc:  # noqa: BLE001
                self.logger.error("Task %s failed: %s", task_code, exc, exc_info=True)
                # CHANGE 2026-02-24 | roll back after a task failure to prevent cascading InFailedSqlTransaction errors
                try:
                    self.db.rollback()
                except Exception:
                    pass
                results.append({
                    "task_code": task_code,
                    "status": "失败",

@@ -30,6 +30,7 @@ from tasks.utility.seed_dws_config_task import SeedDwsConfigTask
# DWS-layer task imports
from tasks.dws import (
    AssistantDailyTask,
    AssistantOrderContributionTask,
    AssistantMonthlyTask,
    AssistantCustomerTask,
    AssistantSalaryTask,
@@ -147,6 +148,7 @@ default_registry.register("DATA_INTEGRITY_CHECK", DataIntegrityTask, requires_db
# ── DWS-layer business tasks ────────────────────────────────────────────
default_registry.register("DWS_BUILD_ORDER_SUMMARY", DwsBuildOrderSummaryTask, requires_db_config=False, layer="DWS")
default_registry.register("DWS_ASSISTANT_DAILY", AssistantDailyTask, layer="DWS")
default_registry.register("DWS_ASSISTANT_ORDER_CONTRIBUTION", AssistantOrderContributionTask, layer="DWS", depends_on=["DWD_LOAD_FROM_ODS"])
# CHANGE [2026-07-17] intent: add depends_on declarations for known dependencies (requirements 8.1, 8.2)
default_registry.register("DWS_ASSISTANT_MONTHLY", AssistantMonthlyTask, layer="DWS", depends_on=["DWS_ASSISTANT_DAILY"])
default_registry.register("DWS_ASSISTANT_CUSTOMER", AssistantCustomerTask, layer="DWS")
@@ -166,7 +168,8 @@ default_registry.register("DWS_GOODS_STOCK_MONTHLY", GoodsStockMonthlyTask, laye
 # replaced by the unified maintenance task DWS_MAINTENANCE (requirement 4.5)
 # depends_on: all other DWS tasks; MV refresh and cleanup should run after the data has been written
 default_registry.register("DWS_MAINTENANCE", DwsMaintenanceTask, layer="DWS", depends_on=[
-    "DWS_ASSISTANT_DAILY", "DWS_ASSISTANT_MONTHLY", "DWS_ASSISTANT_CUSTOMER",
+    "DWS_ASSISTANT_DAILY", "DWS_ASSISTANT_ORDER_CONTRIBUTION",
+    "DWS_ASSISTANT_MONTHLY", "DWS_ASSISTANT_CUSTOMER",
     "DWS_ASSISTANT_SALARY", "DWS_ASSISTANT_FINANCE",
     "DWS_MEMBER_CONSUMPTION", "DWS_MEMBER_VISIT",
     "DWS_FINANCE_DAILY", "DWS_FINANCE_RECHARGE",

@@ -2,6 +2,8 @@
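"""Topological sort module (Kahn's algorithm)

Topologically sorts the task list by dependency:
- Explicit dependencies: inter-task dependencies declared in TaskMeta.depends_on
- Implicit layer dependencies: ODS → DWD → DWS → INDEX; within one batch, lower-layer tasks must run before higher-layer tasks
- Only tasks in the current execution list are sorted
- A warning is logged when a task referenced in depends_on is not in the list
- Circular dependencies are detected and raise ValueError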
@@ -11,10 +13,22 @@ import logging

logger = logging.getLogger(__name__)

# Layer priority: lower values run first
_LAYER_ORDER: dict[str, int] = {
    "ODS": 0,
    "DWD": 1,
    "DWS": 2,
    "INDEX": 3,
}


def topological_sort(task_codes: list[str], registry) -> list[str]:
    """Topologically sort the task list (Kahn's algorithm).

    In addition to explicit depends_on dependencies, implicit layer dependencies are injected:
    within one batch, all ODS tasks are ordered before DWD, DWD before DWS,
    and DWS before INDEX. This ensures a correct cross-layer execution order.

    Args:
        task_codes: list of task codes to sort
        registry: TaskRegistry instance that provides get_metadata() for dependency lookup
@@ -29,9 +43,10 @@ def topological_sort(task_codes: list[str], registry) -> list[str]:
         return []

     in_degree = {code: 0 for code in task_codes}
-    graph = {code: [] for code in task_codes}
+    graph: dict[str, list[str]] = {code: [] for code in task_codes}
     task_set = set(task_codes)

+    # 1. Explicit dependencies (depends_on)
     for code in task_codes:
         meta = registry.get_metadata(code)
         if meta and meta.depends_on:
@@ -44,6 +59,31 @@ def topological_sort(task_codes: list[str], registry) -> list[str]:
                        "Task %s depends on %s, but the latter is not in the current execution list", code, dep
                    )

    # CHANGE [2026-02-24] intent: inject implicit layer dependencies to ensure a correct cross-layer execution order
    # assumptions: the layer order is fixed as ODS→DWD→DWS→INDEX; tasks in the same layer have no implicit mutual dependencies
    # prompt: "fix tasks not running in layer order when all tasks are selected in the admin console"
    # 2. Implicit layer dependencies: group by layer and add edges between adjacent layers
    # pick one "representative node" per layer as a barrier to avoid O(n*m) fully connected edges
    layer_groups: dict[int, list[str]] = {}
    for code in task_codes:
        meta = registry.get_metadata(code)
        if meta and meta.layer:
            order = _LAYER_ORDER.get(meta.layer.upper())
            if order is not None:
                layer_groups.setdefault(order, []).append(code)

    sorted_layers = sorted(layer_groups.keys())
    for i in range(len(sorted_layers) - 1):
        lower_layer = sorted_layers[i]
        higher_layer = sorted_layers[i + 1]
        # every task in the higher layer depends on all tasks in the lower layer
        for higher_code in layer_groups[higher_layer]:
            for lower_code in layer_groups[lower_layer]:
                # avoid re-adding an edge that already exists as an explicit dependency
                if higher_code not in graph[lower_code]:
                    graph[lower_code].append(higher_code)
                    in_degree[higher_code] += 1

    queue = deque(code for code in task_codes if in_degree[code] == 0)
    result = []
    while queue:

@@ -606,6 +606,11 @@ def run_consistency_check(
            report.ods_vs_dwd_results.append(result)

        except Exception as exc:
            # CHANGE 2026-02-24 | roll back so that InFailedSqlTransaction does not cascade into subsequent table checks
            try:
                db_conn.conn.rollback()
            except Exception:
                pass
            result = TableCheckResult(
                table_name=dwd_full,
                check_type="ods_vs_dwd",

393
apps/etl/connectors/feiqiu/scripts/verify_dws_extensions.py
Normal file
@@ -0,0 +1,393 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""DWS-layer extension verification script (shadow-run verification).

Checks the structural integrity and data sanity of three DWS tables against Requirements 9.1–9.4:
1. dws_assistant_order_contribution: consistency of the four statistics
2. dws_member_consumption_summary: recharge-window fields
3. dws_assistant_daily_detail: tier-conversion penalty (定档折算惩罚) fields
4. Existence of the RLS views and FDW foreign tables

Usage:
    cd apps/etl/connectors/feiqiu
    python scripts/verify_dws_extensions.py
"""

from __future__ import annotations

import os
import sys
from pathlib import Path

from dotenv import load_dotenv

# ---------------------------------------------------------------------------
# 1. Load the root .env (per the testing-env.md convention)
# ---------------------------------------------------------------------------
_ROOT = Path(__file__).resolve().parents[5]  # scripts/ → feiqiu/ → connectors/ → etl/ → apps/ → repo root
load_dotenv(_ROOT / ".env")

PG_DSN = os.environ.get("PG_DSN")
if not PG_DSN:
    raise RuntimeError("PG_DSN is not set; check the .env configuration")

APP_DB_DSN = os.environ.get("APP_DB_DSN")
if not APP_DB_DSN:
    raise RuntimeError("APP_DB_DSN is not set; check the .env configuration")

# ---------------------------------------------------------------------------
# 2. Database connection
# ---------------------------------------------------------------------------
try:
    import psycopg2
except ImportError:
    print("ERROR: psycopg2 is not installed; run uv pip install psycopg2-binary")
    sys.exit(1)


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

class _Result:
    """A single verification result."""

    def __init__(self, name: str):
        self.name = name
        self.passed = True
        self.details: list[str] = []

    def fail(self, msg: str) -> None:
        self.passed = False
        self.details.append(f" FAIL: {msg}")

    def ok(self, msg: str) -> None:
        self.details.append(f" OK: {msg}")

    def __str__(self) -> str:
        status = "PASS" if self.passed else "FAIL"
        header = f"[{status}] {self.name}"
        if self.details:
            return header + "\n" + "\n".join(self.details)
        return header


def _query(conn, sql: str, params=None) -> list[tuple]:
    with conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()


def _query_one(conn, sql: str, params=None):
    rows = _query(conn, sql, params)
    return rows[0] if rows else None


def _column_exists(conn, schema: str, table: str, column: str) -> bool:
    row = _query_one(
        conn,
        """
        SELECT 1 FROM information_schema.columns
        WHERE table_schema = %s AND table_name = %s AND column_name = %s
        """,
        (schema, table, column),
    )
    return row is not None


def _table_exists(conn, schema: str, table: str) -> bool:
    row = _query_one(
        conn,
        """
        SELECT 1 FROM information_schema.tables
        WHERE table_schema = %s AND table_name = %s
        """,
        (schema, table),
    )
    return row is not None


# ---------------------------------------------------------------------------
# Check 1: dws_assistant_order_contribution four statistics (Req 9.1, 9.2)
# ---------------------------------------------------------------------------

def verify_contribution_table(conn) -> _Result:
    r = _Result("Check 1: dws_assistant_order_contribution table structure and data")

    # 1a. Table exists
    if not _table_exists(conn, "dws", "dws_assistant_order_contribution"):
        r.fail("table dws.dws_assistant_order_contribution does not exist")
        return r
    r.ok("table exists")

    # 1b. Key columns exist
    required_cols = [
        "contribution_id", "site_id", "tenant_id", "assistant_id",
        "assistant_nickname", "stat_date",
        "order_gross_revenue", "order_net_revenue",
        "time_weighted_revenue", "time_weighted_net_revenue",
        "order_count", "total_service_seconds",
        "created_at", "updated_at",
    ]
    missing = [c for c in required_cols
               if not _column_exists(conn, "dws", "dws_assistant_order_contribution", c)]
    if missing:
        r.fail(f"missing columns: {', '.join(missing)}")
    else:
        r.ok(f"all {len(required_cols)} columns exist")

    # 1c. Unique index exists
    idx_row = _query_one(
        conn,
        """
        SELECT indexname FROM pg_indexes
        WHERE schemaname = 'dws'
          AND tablename = 'dws_assistant_order_contribution'
          AND indexname = 'idx_aoc_site_assistant_date'
        """,
    )
    if idx_row:
        r.ok("unique index idx_aoc_site_assistant_date exists")
    else:
        r.fail("unique index idx_aoc_site_assistant_date does not exist")

    # 1d. Row count (informational, never a FAIL)
    row = _query_one(conn, "SELECT COUNT(*) FROM dws.dws_assistant_order_contribution")
    count = row[0] if row else 0
    r.ok(f"current row count: {count}")

    # 1e. If there is data, check that the four statistics are non-negative
    if count > 0:
        neg_row = _query_one(
            conn,
            """
            SELECT COUNT(*) FROM dws.dws_assistant_order_contribution
            WHERE order_gross_revenue < 0
               OR order_net_revenue < 0
               OR time_weighted_revenue < 0
               OR time_weighted_net_revenue < 0
            """,
        )
        neg_count = neg_row[0] if neg_row else 0
        if neg_count > 0:
            r.fail(f"{neg_count} rows have a negative value in one of the four statistics")
        else:
            r.ok("all four statistics are non-negative")

    return r


# ---------------------------------------------------------------------------
# Check 2: dws_member_consumption_summary recharge-window fields (Req 9.3)
# ---------------------------------------------------------------------------

def verify_consumption_fields(conn) -> _Result:
    r = _Result("Check 2: dws_member_consumption_summary recharge-window fields")

    if not _table_exists(conn, "dws", "dws_member_consumption_summary"):
        r.fail("table dws.dws_member_consumption_summary does not exist")
        return r
    r.ok("table exists")

    new_cols = [
        "recharge_count_30d", "recharge_count_60d", "recharge_count_90d",
        "recharge_amount_30d", "recharge_amount_60d", "recharge_amount_90d",
        "avg_ticket_amount",
    ]
    missing = [c for c in new_cols
               if not _column_exists(conn, "dws", "dws_member_consumption_summary", c)]
    if missing:
        r.fail(f"missing new columns: {', '.join(missing)}")
    else:
        r.ok(f"all {len(new_cols)} new columns exist")

    # If there is data, check that recharge amounts and the average ticket are non-negative
    row = _query_one(conn, "SELECT COUNT(*) FROM dws.dws_member_consumption_summary")
    count = row[0] if row else 0
    r.ok(f"current row count: {count}")

    if count > 0:
        neg_row = _query_one(
            conn,
            """
            SELECT COUNT(*) FROM dws.dws_member_consumption_summary
            WHERE recharge_amount_30d < 0
               OR recharge_amount_60d < 0
               OR recharge_amount_90d < 0
               OR avg_ticket_amount < 0
            """,
        )
        neg_count = neg_row[0] if neg_row else 0
        if neg_count > 0:
            r.fail(f"{neg_count} rows have a negative recharge amount or average ticket")
        else:
            r.ok("recharge amounts and average ticket are all non-negative")

    return r


# ---------------------------------------------------------------------------
# Check 3: dws_assistant_daily_detail penalty fields (Req 9.4)
# ---------------------------------------------------------------------------

def verify_penalty_fields(conn) -> _Result:
    r = _Result("Check 3: dws_assistant_daily_detail penalty fields")

    if not _table_exists(conn, "dws", "dws_assistant_daily_detail"):
        r.fail("table dws.dws_assistant_daily_detail does not exist")
        return r
    r.ok("table exists")

    new_cols = ["penalty_minutes", "penalty_reason", "is_exempt", "per_hour_contribution"]
    missing = [c for c in new_cols
               if not _column_exists(conn, "dws", "dws_assistant_daily_detail", c)]
    if missing:
        r.fail(f"missing new columns: {', '.join(missing)}")
    else:
        r.ok(f"all {len(new_cols)} penalty columns exist")

    # Check that the is_exempt column is of type boolean
    type_row = _query_one(
        conn,
        """
        SELECT data_type FROM information_schema.columns
        WHERE table_schema = 'dws'
          AND table_name = 'dws_assistant_daily_detail'
          AND column_name = 'is_exempt'
        """,
    )
    if type_row and type_row[0] == "boolean":
        r.ok("is_exempt column type is boolean")
    elif type_row:
        r.fail(f"is_exempt column type is {type_row[0]}, expected boolean")

    # If there is data, check penalty_minutes >= 0
    row = _query_one(conn, "SELECT COUNT(*) FROM dws.dws_assistant_daily_detail")
    count = row[0] if row else 0
    r.ok(f"current row count: {count}")

    if count > 0:
        neg_row = _query_one(
            conn,
            """
            SELECT COUNT(*) FROM dws.dws_assistant_daily_detail
            WHERE penalty_minutes < 0
            """,
        )
        neg_count = neg_row[0] if neg_row else 0
        if neg_count > 0:
            r.fail(f"{neg_count} rows have a negative penalty_minutes")
        else:
            r.ok("penalty_minutes is non-negative everywhere")

    return r


# ---------------------------------------------------------------------------
# Check 4: RLS views and FDW mapping (Req 7, 8)
# ---------------------------------------------------------------------------

def verify_rls_views(conn_etl) -> _Result:
    r = _Result("Check 4a: RLS views exist (ETL database, app schema)")

    views = [
        "v_dws_assistant_order_contribution",
        "v_dws_member_consumption_summary",
        "v_dws_assistant_daily_detail",
    ]
    for v in views:
        if _table_exists(conn_etl, "app", v):
            r.ok(f"view app.{v} exists")
        else:
            r.fail(f"view app.{v} does not exist")

    return r


def verify_fdw_tables(conn_app) -> _Result:
    r = _Result("Check 4b: FDW foreign tables exist (application database, fdw_etl schema)")

    # FDW foreign tables share the names of the RLS views, including the v_ prefix
    tables = [
        "v_dws_assistant_order_contribution",
        "v_dws_member_consumption_summary",
        "v_dws_assistant_daily_detail",
    ]
    for t in tables:
        row = _query_one(
            conn_app,
            """
            SELECT 1 FROM information_schema.tables
            WHERE table_schema = 'fdw_etl' AND table_name = %s
            """,
            (t,),
        )
        if row:
            r.ok(f"foreign table fdw_etl.{t} exists")
        else:
            r.fail(f"foreign table fdw_etl.{t} does not exist")

    return r


# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------

def main() -> int:
    results: list[_Result] = []

    # Connect to the ETL test database
    try:
        conn_etl = psycopg2.connect(PG_DSN)
        conn_etl.autocommit = True
    except Exception as e:
        print(f"ERROR: cannot connect to the ETL database ({PG_DSN[:40]}...): {e}")
        return 1

    # Connect to the application test database
    try:
        conn_app = psycopg2.connect(APP_DB_DSN)
        conn_app.autocommit = True
    except Exception as e:
        print(f"ERROR: cannot connect to the application database ({APP_DB_DSN[:40]}...): {e}")
        conn_etl.close()
        return 1

    try:
        print("=" * 60)
        print("DWS-layer extension verification (shadow run)")
        print("=" * 60)
        print()

        # Checks against the ETL database
        results.append(verify_contribution_table(conn_etl))
        results.append(verify_consumption_fields(conn_etl))
        results.append(verify_penalty_fields(conn_etl))
        results.append(verify_rls_views(conn_etl))

        # Checks against the application database
        results.append(verify_fdw_tables(conn_app))

        # Print the results
        for r in results:
            print(r)
            print()

        # Summary
        total = len(results)
        passed = sum(1 for r in results if r.passed)
        failed = total - passed
        print("=" * 60)
        print(f"Summary: {passed}/{total} passed, {failed} failed")
        print("=" * 60)

        return 0 if failed == 0 else 1

    finally:
        conn_etl.close()
        conn_app.close()


if __name__ == "__main__":
    sys.exit(main())
@@ -269,6 +269,9 @@ class DwdLoadTask(BaseTask):
            ("days_on_shelf", "days_available", None),
            ("sort_order", "sort", None),
            ("time_slot_sale", "time_slot_sale", None),  # CHANGE 2026-02-21: added time-slot sales flag
            ("warning_sales_day", "warning_sales_day", None),  # CHANGE 2026-02-24: average daily sales for the stock warning
            ("warning_day_max", "warning_day_max", None),  # CHANGE 2026-02-24: upper bound of warning days
            ("warning_day_min", "warning_day_min", None),  # CHANGE 2026-02-24: lower bound of warning days
        ],
        "dwd.dim_goods_category": [
            ("category_id", "id", None),

@@ -13,6 +13,7 @@ DWS层ETL任务模块

from .base_dws_task import BaseDwsTask, TimeLayer, TimeWindow, CourseType, DiscountType
from .assistant_daily_task import AssistantDailyTask
from .assistant_order_contribution_task import AssistantOrderContributionTask
from .assistant_monthly_task import AssistantMonthlyTask
from .assistant_customer_task import AssistantCustomerTask
from .assistant_salary_task import AssistantSalaryTask
@@ -47,6 +48,7 @@ __all__ = [
    "DiscountType",
    # Assistant dimension
    "AssistantDailyTask",
    "AssistantOrderContributionTask",
    "AssistantMonthlyTask",
    "AssistantCustomerTask",
    "AssistantSalaryTask",

@@ -29,12 +29,19 @@

 from __future__ import annotations

-from datetime import date, datetime, timedelta
-from decimal import Decimal
+from collections import defaultdict
+from datetime import date, datetime, time, timedelta
+from decimal import Decimal, ROUND_HALF_UP
 from typing import Any, Dict, List, Optional, Set, Tuple

 from .base_dws_task import BaseDwsTask, CourseType, TaskContext

+# Penalty area set: hall A/B/C/S/TV + mahjong rooms M1–M7
+PENALTY_AREAS: Set[str] = {
+    "A", "B", "C", "S", "TV",
+    "M1", "M2", "M3", "M4", "M5", "M6", "M7",
+}
+

 class AssistantDailyTask(BaseDwsTask):
     """
@@ -93,7 +100,7 @@ class AssistantDailyTask(BaseDwsTask):

     def transform(self, extracted: Dict[str, Any], context: TaskContext) -> List[Dict[str, Any]]:
         """
-        Transform data: aggregate by assistant + date
+        Transform data: aggregate by assistant + date, and run tier-conversion penalty (定档折算惩罚) detection
         """
         service_records = extracted['service_records']
         site_id = extracted['site_id']
@@ -108,6 +115,68 @@ class AssistantDailyTask(BaseDwsTask):
|
||||
service_records,
|
||||
site_id
|
||||
)
|
||||
|
||||
        # ── Tier-conversion penalty detection ──
        # Build records in the shape that overlap detection expects
        overlap_records = []
        for r in service_records:
            start_t = r.get("start_use_time")
            end_t = r.get("last_use_time")
            if start_t is None or end_t is None:
                continue
            overlap_records.append({
                "assistant_id": r.get("assistant_id"),
                "table_id": r.get("table_id"),
                "table_area": r.get("table_area_name", ""),
                "start_time": start_t,
                "end_time": end_t,
                "service_date": r.get("service_date"),
            })

        violations = self.detect_overlap_violations(overlap_records, PENALTY_AREAS)

        # Fill the penalty fields into the aggregated rows
        for agg in aggregated:
            aid = agg["assistant_id"]
            stat_date = agg["stat_date"]
            key = (aid, stat_date)

            if agg.get("is_exempt"):
                # Exempt: no penalty is computed
                agg["penalty_minutes"] = Decimal("0")
                agg["penalty_reason"] = None
                agg["is_exempt"] = True
                agg["per_hour_contribution"] = None
            elif key in violations:
                # Violation: compute the penalty.
                # A day may carry several violations; use the largest overlap count.
                v_list = violations[key]
                overlap_count = max(v["overlap_count"] for v in v_list)
                # per_hour_contribution should be derived from table-fee data;
                # here it is approximated from the aggregated base_ledger_amount and base_hours
                base_hours = agg.get("base_hours", Decimal("0"))
                base_amount = agg.get("base_ledger_amount", Decimal("0"))
                if base_hours > 0:
                    per_hour = base_amount / base_hours / Decimal(str(overlap_count))
                else:
                    per_hour = Decimal("0")

                actual_minutes = agg.get("base_hours", Decimal("0")) * Decimal("60")
                penalty = self.compute_penalty_minutes(actual_minutes, per_hour)

                agg["penalty_minutes"] = penalty
                agg["penalty_reason"] = (
                    f"规则2违规:同台桌{overlap_count}名助教重叠挂台,"
                    f"单人每小时贡献={per_hour:.2f}元"
                )
                agg["is_exempt"] = False
                agg["per_hour_contribution"] = per_hour
            else:
                # No violation
                agg["penalty_minutes"] = Decimal("0")
                agg["penalty_reason"] = None
                agg["is_exempt"] = False
                agg["per_hour_contribution"] = None

        return aggregated
|
||||
|
||||
@@ -143,6 +212,9 @@ class AssistantDailyTask(BaseDwsTask):
|
||||
asl.real_use_seconds,
|
||||
asl.ledger_amount,
|
||||
asl.ledger_unit_price,
|
||||
asl.start_use_time,
|
||||
asl.last_use_time,
|
||||
asl.table_area_name,
|
||||
DATE(asl.start_use_time) AS service_date,
|
||||
COALESCE(ex.is_trash, 0) AS is_trash
|
||||
FROM dwd.dwd_assistant_service_log asl
|
||||
@@ -281,6 +353,131 @@ class AssistantDailyTask(BaseDwsTask):
|
||||
|
||||
return result
|
||||
|
||||
    # ==========================================================================
    # Tier-conversion penalty: pure functions (static methods, no database access)
    # ==========================================================================

    @staticmethod
    def detect_overlap_violations(
        service_records: List[Dict[str, Any]],
        penalty_areas: Set[str],
    ) -> Dict[Tuple[int, date], List[Dict[str, Any]]]:
        """
        Detect violations where more than 2 assistants are attached to the
        same table during the same time span.

        Input:
            service_records: list of service records; each must contain
                assistant_id, table_id, table_area, start_time, end_time, service_date
            penalty_areas: set of areas to check (e.g. PENALTY_AREAS)

        Output:
            {(assistant_id, service_date): [violation_info, ...]}
            violation_info contains table_id, overlap_count, assistant_ids, etc.

        Algorithm:
            1. Keep only the records that fall in a penalty area
            2. Group by (table_id, service_date)
            3. Run a sweep line over each group to find the peak number of
               assistants active at the same time
            4. If the peak exceeds 2, flag every assistant active at a peak moment
        """
        # Filter: keep only records in a penalty area with complete time information
        filtered = []
        for r in service_records:
            area = r.get("table_area", "")
            if area not in penalty_areas:
                continue
            if r.get("start_time") is None or r.get("end_time") is None:
                continue
            filtered.append(r)

        # Group by (table_id, service_date)
        groups: Dict[Tuple[int, date], List[Dict[str, Any]]] = defaultdict(list)
        for r in filtered:
            key = (r["table_id"], r["service_date"])
            groups[key].append(r)

        violations: Dict[Tuple[int, date], List[Dict[str, Any]]] = defaultdict(list)

        for (table_id, svc_date), records in groups.items():
            if len(records) <= 2:
                # At most 2 records, so more than 2 assistants is impossible
                continue

            # Sweep line: collect every event point, then look for the peak
            events: List[Tuple[Any, int, int]] = []  # (time, +1/-1, assistant_id)
            for r in records:
                aid = r["assistant_id"]
                events.append((r["start_time"], 1, aid))
                events.append((r["end_time"], -1, aid))

            # Sort by time; at the same instant handle +1 (start) before -1 (end),
            # so that an exact hand-off still counts as an overlap
            events.sort(key=lambda e: (e[0], -e[1]))

            # Sweep: track the set of currently active assistants
            active: Dict[int, int] = defaultdict(int)  # assistant_id -> open record count
            max_overlap = 0
            max_overlap_aids: Set[int] = set()

            for t, delta, aid in events:
                active[aid] += delta
                if active[aid] <= 0:
                    del active[aid]

                current_count = len(active)
                if current_count > max_overlap:
                    max_overlap = current_count
                    max_overlap_aids = set(active.keys())
                elif current_count == max_overlap and current_count > 2:
                    max_overlap_aids |= set(active.keys())

            if max_overlap > 2:
                violation_info = {
                    "table_id": table_id,
                    "service_date": svc_date,
                    "overlap_count": max_overlap,
                    "assistant_ids": max_overlap_aids,
                }
                # Record the violation for every assistant involved
                for aid in max_overlap_aids:
                    violations[(aid, svc_date)].append(violation_info)

        return dict(violations)
|
||||
|
||||
    @staticmethod
    def compute_penalty_minutes(
        actual_minutes: Decimal,
        per_hour_contribution: Decimal,
        threshold: Decimal = Decimal("24"),
    ) -> Decimal:
        """
        Compute the penalty minutes (pure function).

        Rules:
            - per_hour_contribution >= threshold -> 0 (minutes count in full)
            - per_hour_contribution < threshold ->
              actual_minutes × (1 - per_hour_contribution / threshold)
            - per_hour_contribution < 0 -> treated as 0 (defensive)

        Result range: [0, actual_minutes]
        """
        if actual_minutes <= 0:
            return Decimal("0")

        # Defensive: treat negative values as 0
        phc = max(per_hour_contribution, Decimal("0"))

        if phc >= threshold:
            return Decimal("0")

        # penalty = actual_minutes × (1 - phc / threshold)
        ratio = Decimal("1") - phc / threshold
        penalty = actual_minutes * ratio

        # Clamp the result to [0, actual_minutes]
        penalty = max(Decimal("0"), min(penalty, actual_minutes))
        return penalty
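The penalty rule above can be checked by hand with a small standalone sketch. The function name `penalty_minutes` and the sample figures are illustrative assumptions, not part of the commit; the arithmetic restates the rule from the docstring of `compute_penalty_minutes`.

```python
from decimal import Decimal


def penalty_minutes(
    actual_minutes: Decimal,
    per_hour: Decimal,
    threshold: Decimal = Decimal("24"),
) -> Decimal:
    """Illustrative restatement of the rule: scale minutes by the shortfall ratio."""
    if actual_minutes <= 0:
        return Decimal("0")
    phc = max(per_hour, Decimal("0"))  # negative contribution is treated as 0
    if phc >= threshold:
        return Decimal("0")  # at or above the threshold: no penalty
    penalty = actual_minutes * (Decimal("1") - phc / threshold)
    return max(Decimal("0"), min(penalty, actual_minutes))


# 120 minutes served at 18 per hour against a threshold of 24:
# 120 × (1 − 18/24) = 30 penalty minutes.
partial = penalty_minutes(Decimal("120"), Decimal("18"))
at_threshold = penalty_minutes(Decimal("120"), Decimal("24"))
negative_input = penalty_minutes(Decimal("120"), Decimal("-5"))
```

With these made-up inputs, a contribution at the threshold yields no penalty, while a negative contribution is clamped to zero and therefore penalizes all 120 minutes.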
|
||||
|
||||
|
||||
# Public exports
|
||||
__all__ = ['AssistantDailyTask']
|
||||
__all__ = ['AssistantDailyTask', 'PENALTY_AREAS']
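The sweep-line step used by `detect_overlap_violations` can be illustrated in isolation. The sketch below is a simplified standalone version, not the task's implementation: `peak_overlap`, the helper `at`, and the sample intervals are hypothetical, and it assumes each assistant appears at most once per table.

```python
from datetime import datetime
from typing import List, Set, Tuple


def peak_overlap(
    intervals: List[Tuple[int, datetime, datetime]],
) -> Tuple[int, Set[int]]:
    """Return (peak concurrent count, ids active at any peak moment).

    Each interval is (assistant_id, start, end).
    """
    events = []
    for aid, start, end in intervals:
        events.append((start, 1, aid))
        events.append((end, -1, aid))
    # At equal timestamps, starts (+1) sort before ends (-1),
    # so an exact hand-off counts as an overlap.
    events.sort(key=lambda e: (e[0], -e[1]))

    active: Set[int] = set()
    peak = 0
    peak_ids: Set[int] = set()
    for _, delta, aid in events:
        if delta == 1:
            active.add(aid)
        else:
            active.discard(aid)
        if len(active) > peak:
            peak = len(active)
            peak_ids = set(active)
        elif len(active) == peak:
            peak_ids |= active
    return peak, peak_ids


def at(hour: int, minute: int = 0) -> datetime:
    """Build a timestamp on an arbitrary sample day."""
    return datetime(2026, 2, 24, hour, minute)


sample = [
    (101, at(19), at(21)),
    (102, at(20), at(22)),
    (103, at(21), at(23)),        # starts exactly when 101 ends
    (104, at(23, 30), at(23, 45)),
]
peak, ids = peak_overlap(sample)
```

Because the start of 103 is processed before the end of 101, the three assistants are counted as simultaneously active at 21:00, which is the hand-off behavior the sort key is meant to produce.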
|
||||
|
||||
@@ -0,0 +1,542 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
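"""
Assistant order-revenue task: four statistics

Description:
    At "assistant + date" granularity, compute each assistant's daily
    contribution to order revenue:
    - order_gross_revenue: gross order revenue (table fee + drinks and food + all assistant service fees)
    - order_net_revenue: net order revenue (gross order revenue - all assistant service commissions)
    - time_weighted_revenue: time-weighted revenue (personal contribution prorated by service time)
    - time_weighted_net_revenue: time-weighted net contribution (time-weighted revenue - personal service commission)

Data sources:
    - dwd_settlement_head: settlement header table
    - dwd_table_fee_log: table-fee details
    - dwd_assistant_service_log: assistant service records

Target table:
    dws.dws_assistant_order_contribution

Update strategy:
    - Idempotent: delete-before-insert (by date window)

Core algorithm:
    Time-weighted revenue is computed in these steps:
    1. Billable duration per table = MAX(total assistant service time, table usage time)
    2. Table-fee share = table_fee × (personal service time / billable duration)
    3. The personal service fee is counted directly
    4. Drinks and food are split in proportion to each assistant's share of total assistant service time

    Bonus sessions (超休/打赏, course_type=BONUS) do not take part in order-level
    apportionment; all four statistics use the assistant's own service revenue and commission.

Author: ETL team
Created: 2026-02-24
"""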
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import date
|
||||
from decimal import Decimal
|
||||
from typing import Any, Dict, List
|
||||
|
||||
from .base_dws_task import BaseDwsTask, TaskContext
|
||||
|
||||
|
||||
# =============================================================================
# Data structures
# =============================================================================

@dataclass
class TableUsage:
    """Table usage information"""
    table_id: int
    table_area: str  # Area name (A/B/C/S/TV/M1-M7, etc.)
    usage_seconds: int  # Table usage duration (seconds)
    table_fee: Decimal  # Table fee / room fee


@dataclass
class AssistantService:
    """Assistant service record"""
    assistant_id: int
    table_id: int
    service_seconds: int  # Service duration (seconds)
    ledger_amount: Decimal  # Service revenue (amount charged for the assistant)
    commission: Decimal  # Assistant commission
    skill_id: int
    course_type: str  # BASE / BONUS / ROOM
    nickname: str = ""  # Assistant nickname (for output)


@dataclass
class OrderData:
    """Aggregated order data (everything belonging to one settlement)"""
    order_settle_id: int
    site_id: int
    total_table_fee: Decimal  # Total table fee
    total_goods_amount: Decimal  # Total drinks and food amount
    tables: List[TableUsage] = field(default_factory=list)
    assistants: List[AssistantService] = field(default_factory=list)
    stat_date: date | None = None  # Order date (date part of pay_time)
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# 助教订单流水统计任务
|
||||
# =============================================================================
|
||||
|
||||
class AssistantOrderContributionTask(BaseDwsTask):
|
||||
    """
    Assistant order-revenue task: four statistics

    Granularity: (site_id, assistant_id, stat_date)
    Strategy: idempotent delete-before-insert
    """
|
||||
|
||||
DATE_COL = "stat_date"
|
||||
|
||||
def get_task_code(self) -> str:
|
||||
return "DWS_ASSISTANT_ORDER_CONTRIBUTION"
|
||||
|
||||
def get_target_table(self) -> str:
|
||||
return "dws_assistant_order_contribution"
|
||||
|
||||
def get_primary_keys(self) -> List[str]:
|
||||
return ["site_id", "assistant_id", "stat_date"]
|
||||
|
||||
# =========================================================================
|
||||
# Main ETL flow (extract / transform; load uses the base-class default)
|
||||
# =========================================================================
|
||||
|
||||
def extract(self, context: TaskContext) -> Dict[str, Any]:
|
||||
"""Extract: read settlement, table-fee and assistant-service data from the DWD layer and aggregate it per order into OrderData"""
|
||||
start_date = context.window_start.date() if hasattr(context.window_start, 'date') else context.window_start
|
||||
end_date = context.window_end.date() if hasattr(context.window_end, 'date') else context.window_end
|
||||
site_id = context.store_id
|
||||
|
||||
self.logger.info(
|
||||
"%s: 提取数据,日期范围 %s ~ %s",
|
||||
self.get_task_code(), start_date, end_date
|
||||
)
|
||||
|
||||
# 1. Extract settlement headers for table-checkout orders (settle_type=1 means table checkout)
|
||||
settlements = self._extract_settlements(site_id, start_date, end_date)
|
||||
|
||||
# 2. Extract table-fee details
|
||||
table_fees = self._extract_table_fees(site_id, start_date, end_date)
|
||||
|
||||
# 3. Extract assistant service records (with course-type mapping)
|
||||
service_logs = self._extract_service_logs(site_id, start_date, end_date)
|
||||
|
||||
# 4. Aggregate by order_settle_id into a list of OrderData
|
||||
orders = self._aggregate_to_orders(settlements, table_fees, service_logs)
|
||||
|
||||
self.logger.info(
|
||||
"%s: 提取完成,结算单 %d 条,聚合订单 %d 条",
|
||||
self.get_task_code(), len(settlements), len(orders)
|
||||
)
|
||||
|
||||
return {
|
||||
'orders': orders,
|
||||
'start_date': start_date,
|
||||
'end_date': end_date,
|
||||
'site_id': site_id,
|
||||
}
|
||||
|
||||
def transform(self, extracted: Dict[str, Any], context: TaskContext) -> List[Dict[str, Any]]:
|
||||
"""Transform: compute the four statistics and aggregate them into daily rows by (assistant_id, stat_date)"""
|
||||
orders: List[OrderData] = extracted['orders']
|
||||
site_id = extracted['site_id']
|
||||
|
||||
self.logger.info(
|
||||
"%s: 转换数据,订单 %d 条",
|
||||
self.get_task_code(), len(orders)
|
||||
)
|
||||
|
||||
# 按 (assistant_id, stat_date) 聚合
|
||||
agg: Dict[tuple, Dict[str, Any]] = {}
|
||||
|
||||
for order in orders:
|
||||
# 跳过无助教服务的订单
|
||||
if not order.assistants:
|
||||
continue
|
||||
|
||||
# 获取订单日期(从结算主表的 pay_time 推导,存储在 order 中)
|
||||
stat_date = getattr(order, 'stat_date', None)
|
||||
if stat_date is None:
|
||||
continue
|
||||
|
||||
# 收集该订单所有参与助教(去重)
|
||||
assistant_ids = set(a.assistant_id for a in order.assistants)
|
||||
|
||||
for aid in assistant_ids:
|
||||
contribution = self.compute_assistant_contribution(order, aid)
|
||||
key = (aid, stat_date)
|
||||
|
||||
if key not in agg:
|
||||
# 获取助教昵称(取第一条服务记录的昵称)
|
||||
nickname = next(
|
||||
(a.nickname for a in order.assistants if a.assistant_id == aid),
|
||||
None
|
||||
)
|
||||
agg[key] = {
|
||||
'site_id': site_id,
|
||||
'tenant_id': order.site_id, # tenant_id 与 site_id 相同
|
||||
'assistant_id': aid,
|
||||
'assistant_nickname': nickname,
|
||||
'stat_date': stat_date,
|
||||
'order_gross_revenue': Decimal('0'),
|
||||
'order_net_revenue': Decimal('0'),
|
||||
'time_weighted_revenue': Decimal('0'),
|
||||
'time_weighted_net_revenue': Decimal('0'),
|
||||
'order_count': 0,
|
||||
'total_service_seconds': 0,
|
||||
}
|
||||
|
||||
rec = agg[key]
|
||||
rec['order_gross_revenue'] += contribution['order_gross_revenue']
|
||||
rec['order_net_revenue'] += contribution['order_net_revenue']
|
||||
rec['time_weighted_revenue'] += contribution['time_weighted_revenue']
|
||||
rec['time_weighted_net_revenue'] += contribution['time_weighted_net_revenue']
|
||||
rec['order_count'] += 1
|
||||
# 累加该助教在该订单中的总服务时长
|
||||
rec['total_service_seconds'] += sum(
|
||||
a.service_seconds for a in order.assistants if a.assistant_id == aid
|
||||
)
|
||||
|
||||
result = list(agg.values())
|
||||
self.logger.info(
|
||||
"%s: 转换完成,输出 %d 条助教日度统计",
|
||||
self.get_task_code(), len(result)
|
||||
)
|
||||
return result
|
||||
|
||||
# load() uses the BaseDwsTask default implementation (DATE_COL="stat_date")
|
||||
|
||||
# =========================================================================
|
||||
# 数据提取方法
|
||||
# =========================================================================
|
||||
|
||||
def _extract_settlements(
|
||||
self, site_id: int, start_date: date, end_date: date
|
||||
) -> List[Dict[str, Any]]:
|
||||
"""提取台桌结账订单的结算主表
|
||||
|
||||
settle_type=1 为台桌结账,包含台费、酒水食品等金额。
|
||||
"""
|
||||
sql = """
|
||||
SELECT
|
||||
order_settle_id,
|
||||
site_id,
|
||||
tenant_id,
|
||||
table_charge_money,
|
||||
goods_money,
|
||||
DATE(pay_time) AS stat_date
|
||||
FROM dwd.dwd_settlement_head
|
||||
WHERE site_id = %s
|
||||
AND settle_type = 1
|
||||
AND DATE(pay_time) >= %s
|
||||
AND DATE(pay_time) <= %s
|
||||
"""
|
||||
rows = self.db.query(sql, (site_id, start_date, end_date))
|
||||
return [dict(row) for row in rows] if rows else []
|
||||
|
||||
def _extract_table_fees(
|
||||
self, site_id: int, start_date: date, end_date: date
|
||||
) -> List[Dict[str, Any]]:
|
||||
"""提取台费明细
|
||||
|
||||
每条记录对应一张台桌在一个订单中的台费信息。
|
||||
real_table_use_seconds 为台桌实际使用时长。
|
||||
"""
|
||||
sql = """
|
||||
SELECT
|
||||
tfl.order_settle_id,
|
||||
tfl.site_table_id AS table_id,
|
||||
COALESCE(tfl.site_table_area_name, '') AS table_area,
|
||||
COALESCE(tfl.real_table_use_seconds, 0) AS usage_seconds,
|
||||
COALESCE(tfl.ledger_amount, 0) AS table_fee
|
||||
FROM dwd.dwd_table_fee_log tfl
|
||||
WHERE tfl.site_id = %s
|
||||
AND DATE(tfl.start_use_time) >= %s
|
||||
AND DATE(tfl.start_use_time) <= %s
|
||||
AND COALESCE(tfl.is_delete, 0) = 0
|
||||
"""
|
||||
rows = self.db.query(sql, (site_id, start_date, end_date))
|
||||
return [dict(row) for row in rows] if rows else []
|
||||
|
||||
def _extract_service_logs(
|
||||
self, site_id: int, start_date: date, end_date: date
|
||||
) -> List[Dict[str, Any]]:
|
||||
"""提取助教服务记录(含课程类型映射)
|
||||
|
||||
通过 LEFT JOIN cfg_skill_type 获取 course_type_code,
|
||||
real_service_money 为助教分成。
|
||||
"""
|
||||
sql = """
|
||||
SELECT
|
||||
asl.order_settle_id,
|
||||
asl.site_assistant_id AS assistant_id,
|
||||
asl.nickname,
|
||||
asl.site_table_id AS table_id,
|
||||
COALESCE(asl.income_seconds, 0) AS service_seconds,
|
||||
COALESCE(asl.ledger_amount, 0) AS ledger_amount,
|
||||
COALESCE(asl.real_service_money, 0) AS commission,
|
||||
COALESCE(asl.skill_id, 0) AS skill_id,
|
||||
COALESCE(cst.course_type_code, 'BASE') AS course_type
|
||||
FROM dwd.dwd_assistant_service_log asl
|
||||
LEFT JOIN dws.cfg_skill_type cst
|
||||
ON asl.skill_id = cst.skill_id
|
||||
AND cst.is_active = TRUE
|
||||
WHERE asl.site_id = %s
|
||||
AND DATE(asl.start_use_time) >= %s
|
||||
AND DATE(asl.start_use_time) <= %s
|
||||
AND COALESCE(asl.is_delete, 0) = 0
|
||||
"""
|
||||
rows = self.db.query(sql, (site_id, start_date, end_date))
|
||||
return [dict(row) for row in rows] if rows else []
|
||||
|
||||
def _aggregate_to_orders(
|
||||
self,
|
||||
settlements: List[Dict[str, Any]],
|
||||
table_fees: List[Dict[str, Any]],
|
||||
service_logs: List[Dict[str, Any]],
|
||||
) -> List[OrderData]:
|
||||
"""按 order_settle_id 聚合为 OrderData 列表
|
||||
|
||||
只保留有助教服务记录的订单(无助教的订单在 transform 中也会跳过)。
|
||||
"""
|
||||
from collections import defaultdict
|
||||
|
||||
# 按 order_settle_id 索引台费和服务记录
|
||||
table_fee_map: Dict[int, List[Dict]] = defaultdict(list)
|
||||
for tf in table_fees:
|
||||
table_fee_map[tf['order_settle_id']].append(tf)
|
||||
|
||||
service_map: Dict[int, List[Dict]] = defaultdict(list)
|
||||
for sl in service_logs:
|
||||
service_map[sl['order_settle_id']].append(sl)
|
||||
|
||||
orders: List[OrderData] = []
|
||||
for settle in settlements:
|
||||
oid = settle['order_settle_id']
|
||||
svc_list = service_map.get(oid)
|
||||
# 跳过无助教服务的订单
|
||||
if not svc_list:
|
||||
continue
|
||||
|
||||
tables = [
|
||||
TableUsage(
|
||||
table_id=int(tf['table_id']),
|
||||
table_area=tf['table_area'],
|
||||
usage_seconds=int(tf['usage_seconds']),
|
||||
table_fee=Decimal(str(tf['table_fee'])),
|
||||
)
|
||||
for tf in table_fee_map.get(oid, [])
|
||||
]
|
||||
|
||||
assistants = [
|
||||
AssistantService(
|
||||
assistant_id=int(sl['assistant_id']),
|
||||
table_id=int(sl['table_id']),
|
||||
service_seconds=int(sl['service_seconds']),
|
||||
ledger_amount=Decimal(str(sl['ledger_amount'])),
|
||||
commission=Decimal(str(sl['commission'])),
|
||||
skill_id=int(sl['skill_id']),
|
||||
course_type=sl['course_type'],
|
||||
nickname=sl.get('nickname', ''),
|
||||
)
|
||||
for sl in svc_list
|
||||
]
|
||||
|
||||
orders.append(OrderData(
|
||||
order_settle_id=int(oid),
|
||||
site_id=int(settle['site_id']),
|
||||
total_table_fee=Decimal(str(settle.get('table_charge_money') or 0)),
|
||||
total_goods_amount=Decimal(str(settle.get('goods_money') or 0)),
|
||||
tables=tables,
|
||||
assistants=assistants,
|
||||
stat_date=settle.get('stat_date'),
|
||||
))
|
||||
|
||||
return orders
|
||||
|
||||
    # =========================================================================
    # Core computation (pure functions, no database access, easy to property-test)
    # =========================================================================

    @staticmethod
    def compute_order_gross_revenue(order: OrderData) -> Decimal:
        """Gross order revenue = table fee + drinks and food + all assistant service fees

        Every participating assistant receives the same order_gross_revenue value.
        """
        total_service_amount = sum(
            (a.ledger_amount for a in order.assistants), Decimal('0')
        )
        return order.total_table_fee + order.total_goods_amount + total_service_amount

    @staticmethod
    def compute_order_net_revenue(order: OrderData) -> Decimal:
        """Net order revenue = gross order revenue - all assistant service commissions

        Every participating assistant receives the same order_net_revenue value.
        """
        gross = AssistantOrderContributionTask.compute_order_gross_revenue(order)
        total_commission = sum(
            (a.commission for a in order.assistants), Decimal('0')
        )
        return gross - total_commission
|
||||
|
||||
    @staticmethod
    def compute_time_weighted_revenue(
        order: OrderData, assistant_id: int
    ) -> Decimal:
        """Time-weighted revenue = prorated table fee + personal service fee + prorated drinks and food

        Steps:
        1. Per table: billable_seconds = MAX(total assistant service time, table usage time)
           table-fee share = table_fee × (personal service time / billable_seconds)
        2. The personal service fee (ledger_amount) is counted directly
        3. Drinks and food are split by the assistant's total service time as a
           share of all assistants' total service time

        Bonus sessions (超休/打赏, BONUS): all four statistics use the personal service
        revenue and commission and do not take part in order-level apportionment.
        The caller handles that case; this method only covers regular services.

        Edge cases:
        - Table usage time is 0 and total assistant service time is also 0: table-fee share = 0
        - Total assistant service time is 0: drinks and food share = 0
        """
        # --- Select this assistant's service records (BONUS excluded) ---
        my_services = [
            a for a in order.assistants
            if a.assistant_id == assistant_id and a.course_type != "BONUS"
        ]
        all_non_bonus = [a for a in order.assistants if a.course_type != "BONUS"]

        # No non-BONUS service records for this assistant: return 0
        if not my_services:
            return Decimal('0')

        # --- Step 1: prorate the table fee by service time ---
        table_fee_share = Decimal('0')
        for table in order.tables:
            # Total service time of all assistants on this table
            table_total_svc = sum(
                a.service_seconds for a in all_non_bonus
                if a.table_id == table.table_id
            )
            # This assistant's service time on this table
            my_table_svc = sum(
                a.service_seconds for a in my_services
                if a.table_id == table.table_id
            )
            if my_table_svc == 0:
                continue

            # Billable duration = MAX(total assistant service time, table usage time)
            billable_seconds = max(table_total_svc, table.usage_seconds)
            if billable_seconds <= 0:
                continue

            table_fee_share += table.table_fee * Decimal(my_table_svc) / Decimal(billable_seconds)

        # --- Step 2: count the personal service fee directly ---
        personal_service = sum(
            (a.ledger_amount for a in my_services), Decimal('0')
        )

        # --- Step 3: split drinks and food by share of total service time ---
        my_total_seconds = sum(a.service_seconds for a in my_services)
        all_total_seconds = sum(a.service_seconds for a in all_non_bonus)

        if all_total_seconds > 0 and my_total_seconds > 0:
            goods_share = order.total_goods_amount * Decimal(my_total_seconds) / Decimal(all_total_seconds)
        else:
            goods_share = Decimal('0')

        return table_fee_share + personal_service + goods_share
|
||||
|
||||
@staticmethod
|
||||
def compute_time_weighted_net_revenue(
|
||||
time_weighted_revenue: Decimal, assistant_commission: Decimal
|
||||
) -> Decimal:
|
||||
"""Time-weighted net contribution = time-weighted revenue - personal service commission"""
|
||||
return time_weighted_revenue - assistant_commission
|
||||
|
||||
    @staticmethod
    def compute_assistant_contribution(
        order: OrderData, assistant_id: int
    ) -> Dict[str, Decimal]:
        """Compute the four statistics for one assistant in one order (with BONUS handling)

        The returned dict contains:
        - order_gross_revenue
        - order_net_revenue
        - time_weighted_revenue
        - time_weighted_net_revenue
        - total_commission (this assistant's own commission, auxiliary field)

        Bonus sessions (超休/打赏, BONUS): all four statistics use the personal service
        revenue and commission and do not take part in order-level apportionment.
        """
        cls = AssistantOrderContributionTask

        # All service records of this assistant
        my_services = [a for a in order.assistants if a.assistant_id == assistant_id]
        if not my_services:
            return {
                'order_gross_revenue': Decimal('0'),
                'order_net_revenue': Decimal('0'),
                'time_weighted_revenue': Decimal('0'),
                'time_weighted_net_revenue': Decimal('0'),
                'total_commission': Decimal('0'),
            }

        # Split BONUS and non-BONUS services
        bonus_services = [a for a in my_services if a.course_type == "BONUS"]
        normal_services = [a for a in my_services if a.course_type != "BONUS"]

        # BONUS part: use the personal revenue directly
        bonus_revenue = sum((a.ledger_amount for a in bonus_services), Decimal('0'))
        bonus_commission = sum((a.commission for a in bonus_services), Decimal('0'))

        if normal_services:
            # Has regular services: use the normal computation
            normal_commission = sum((a.commission for a in normal_services), Decimal('0'))
            total_commission = normal_commission + bonus_commission
            gross = cls.compute_order_gross_revenue(order)
            net = cls.compute_order_net_revenue(order)
            twr = cls.compute_time_weighted_revenue(order, assistant_id)

            # Combine the final values: compute time_weighted_revenue first, then
            # subtract total_commission, so twnr == twr_final - total_commission holds exactly
            twr_final = twr + bonus_revenue
            twnr_final = twr_final - total_commission

            return {
                'order_gross_revenue': gross + bonus_revenue,
                'order_net_revenue': net + (bonus_revenue - bonus_commission),
                'time_weighted_revenue': twr_final,
                'time_weighted_net_revenue': twnr_final,
                'total_commission': total_commission,
            }
        else:
            # BONUS-only assistant: all four statistics use the personal revenue
            return {
                'order_gross_revenue': bonus_revenue,
                'order_net_revenue': bonus_revenue - bonus_commission,
                'time_weighted_revenue': bonus_revenue,
                'time_weighted_net_revenue': bonus_revenue - bonus_commission,
                'total_commission': bonus_commission,
            }
|
||||
|
||||
|
||||
# Public exports
|
||||
__all__ = [
|
||||
'TableUsage',
|
||||
'AssistantService',
|
||||
'OrderData',
|
||||
'AssistantOrderContributionTask',
|
||||
]
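A worked example may help when reviewing the apportionment in `compute_time_weighted_revenue`. The sketch below is a simplified, single-table restatement with made-up amounts; `services`, `time_weighted_revenue`, and all figures are illustrative assumptions and it ignores BONUS records entirely.

```python
from decimal import Decimal

# Hypothetical order: one table used for 2 hours, table fee 120, drinks and food 60.
table_fee = Decimal("120")
goods_amount = Decimal("60")
table_usage_seconds = 7200

# assistant -> (service_seconds, ledger_amount); values are invented for illustration.
services = {
    "A": (3600, Decimal("100")),
    "B": (1800, Decimal("50")),
}

total_service_seconds = sum(sec for sec, _ in services.values())
# Billable duration = MAX(total assistant service time, table usage time)
billable_seconds = max(total_service_seconds, table_usage_seconds)


def time_weighted_revenue(name: str) -> Decimal:
    """Table-fee share + personal service fee + drinks-and-food share."""
    seconds, ledger = services[name]
    fee_share = table_fee * Decimal(seconds) / Decimal(billable_seconds)
    goods_share = goods_amount * Decimal(seconds) / Decimal(total_service_seconds)
    return fee_share + ledger + goods_share


twr_a = time_weighted_revenue("A")  # 60 + 100 + 40
twr_b = time_weighted_revenue("B")  # 30 + 50 + 20
```

In this example the table was in use longer than the assistants served, so only 90 of the 120 table fee is attributed to assistants, while the drinks and food amount is distributed in full.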
|
||||
@@ -85,10 +85,14 @@ class MemberConsumptionTask(BaseDwsTask):
|
||||
# 3. 获取会员卡余额
|
||||
card_balances = self._extract_card_balances(site_id)
|
||||
|
||||
# CHANGE 2025-07-15 | task 4.1: fetch recharge statistics (30/60/90-day windows)
|
||||
recharge_stats = self._extract_recharge_stats(site_id, stat_date)
|
||||
|
||||
return {
|
||||
'consumption_stats': consumption_stats,
|
||||
'member_info': member_info,
|
||||
'card_balances': card_balances,
|
||||
'recharge_stats': recharge_stats,
|
||||
'stat_date': stat_date,
|
||||
'site_id': site_id
|
||||
}
|
||||
@@ -100,6 +104,7 @@ class MemberConsumptionTask(BaseDwsTask):
|
||||
consumption_stats = extracted['consumption_stats']
|
||||
member_info = extracted['member_info']
|
||||
card_balances = extracted['card_balances']
|
||||
recharge_stats = extracted.get('recharge_stats', {})
|
||||
stat_date = extracted['stat_date']
|
||||
site_id = extracted['site_id']
|
||||
|
||||
@@ -119,11 +124,20 @@ class MemberConsumptionTask(BaseDwsTask):
|
||||
|
||||
memb_info = member_info.get(member_id, {})
|
||||
balance = card_balances.get(member_id, {})
|
||||
# CHANGE 2025-07-15 | task 4.2: merge recharge statistics; default to 0 when there is no record
|
||||
recharge = recharge_stats.get(member_id, {})
|
||||
|
||||
# 计算活跃度和客户分层
|
||||
days_since_last = self._calc_days_since(stat_date, stats.get('last_consume_date'))
|
||||
customer_tier = self._calculate_customer_tier(stats, days_since_last)
|
||||
|
||||
# CHANGE 2025-07-15 | task 4.2: average spend per visit = total_consume_amount / MAX(total_visit_count, 1)
|
||||
total_consume_amount = self.safe_decimal(stats.get('total_consume_amount', 0))
|
||||
total_visit_count = self.safe_int(stats.get('total_visit_count', 0))
|
||||
avg_ticket_amount = (
|
||||
total_consume_amount / max(total_visit_count, 1)
|
||||
).quantize(Decimal('0.01'))
|
||||
|
||||
record = {
|
||||
'site_id': site_id,
|
||||
'tenant_id': self.config.get("app.tenant_id", site_id),
|
||||
@@ -137,8 +151,8 @@ class MemberConsumptionTask(BaseDwsTask):
|
||||
# 全量累计统计
|
||||
'first_consume_date': stats.get('first_consume_date'),
|
||||
'last_consume_date': stats.get('last_consume_date'),
|
||||
'total_visit_count': self.safe_int(stats.get('total_visit_count', 0)),
|
||||
'total_consume_amount': self.safe_decimal(stats.get('total_consume_amount', 0)),
|
||||
'total_visit_count': total_visit_count,
|
||||
'total_consume_amount': total_consume_amount,
|
||||
'total_recharge_amount': self.safe_decimal(memb_info.get('recharge_money_sum', 0)),
|
||||
'total_table_fee': self.safe_decimal(stats.get('total_table_fee', 0)),
|
||||
'total_goods_amount': self.safe_decimal(stats.get('total_goods_amount', 0)),
|
||||
@@ -156,6 +170,15 @@ class MemberConsumptionTask(BaseDwsTask):
|
||||
'consume_amount_30d': self.safe_decimal(stats.get('consume_amount_30d', 0)),
|
||||
'consume_amount_60d': self.safe_decimal(stats.get('consume_amount_60d', 0)),
|
||||
'consume_amount_90d': self.safe_decimal(stats.get('consume_amount_90d', 0)),
|
||||
# Recharge window statistics (30/60/90 days)
|
||||
'recharge_count_30d': self.safe_int(recharge.get('count_30d', 0)),
|
||||
'recharge_count_60d': self.safe_int(recharge.get('count_60d', 0)),
|
||||
'recharge_count_90d': self.safe_int(recharge.get('count_90d', 0)),
|
||||
'recharge_amount_30d': self.safe_decimal(recharge.get('amount_30d', 0)),
|
||||
'recharge_amount_60d': self.safe_decimal(recharge.get('amount_60d', 0)),
|
||||
'recharge_amount_90d': self.safe_decimal(recharge.get('amount_90d', 0)),
|
||||
# 次均消费
|
||||
'avg_ticket_amount': avg_ticket_amount,
|
||||
# 卡余额
|
||||
'cash_card_balance': self.safe_decimal(balance.get('cash_balance', 0)),
|
||||
'gift_card_balance': self.safe_decimal(balance.get('gift_balance', 0)),
|
||||
@@ -259,13 +282,14 @@ class MemberConsumptionTask(BaseDwsTask):
|
||||
) AS birthday
|
||||
FROM dwd.dim_member m
|
||||
WHERE m.member_id IN (
|
||||
SELECT DISTINCT tenant_member_id
|
||||
SELECT DISTINCT member_id
|
||||
FROM dwd.dwd_settlement_head
|
||||
WHERE site_id = %s
|
||||
AND tenant_member_id IS NOT NULL
|
||||
AND tenant_member_id != 0
|
||||
AND member_id IS NOT NULL
|
||||
AND member_id != 0
|
||||
) AND m.scd2_is_current = 1
|
||||
"""
|
||||
# CHANGE 2026-02-24 | Fix column name: tenant_member_id → member_id (dwd_settlement_head has no tenant_member_id column)
|
||||
sql_fallback = """
|
||||
SELECT
|
||||
member_id,
|
||||
@@ -277,16 +301,18 @@ class MemberConsumptionTask(BaseDwsTask):
|
||||
birthday
|
||||
FROM dwd.dim_member
|
||||
WHERE member_id IN (
|
||||
SELECT DISTINCT tenant_member_id
|
||||
SELECT DISTINCT member_id
|
||||
FROM dwd.dwd_settlement_head
|
||||
WHERE site_id = %s
|
||||
AND tenant_member_id IS NOT NULL
|
||||
AND tenant_member_id != 0
|
||||
AND member_id IS NOT NULL
|
||||
AND member_id != 0
|
||||
) AND scd2_is_current = 1
|
||||
"""
|
||||
try:
|
||||
rows = self.db.query(sql_with_fdw, (site_id,))
|
||||
except Exception as exc:
|
||||
# CHANGE [2026-02-24] A failed FDW query leaves the transaction in a failed state; roll back before running the fallback
|
||||
self.db.rollback()
|
||||
# FDW connection failed; fall back to dim_member.birthday only
|
||||
self.logger.warning(
|
||||
"%s: FDW 读取 member_birthday_manual 失败,降级为 dim_member.birthday — %s",
|
||||
@@ -352,6 +378,55 @@ class MemberConsumptionTask(BaseDwsTask):
|
||||
|
||||
return result
|
||||
|
||||
# CHANGE 2025-07-15 | task 4.1: add the recharge-statistics extraction method
|
||||
def _extract_recharge_stats(
|
||||
self,
|
||||
site_id: int,
|
||||
stat_date: date,
|
||||
) -> Dict[int, Dict[str, Any]]:
|
||||
"""
|
||||
Extract 30/60/90-day recharge statistics from dwd.dwd_recharge_order
|
||||
|
||||
Returns: {member_id: {count_30d, count_60d, count_90d,
|
||||
amount_30d, amount_60d, amount_90d}}
|
||||
"""
|
||||
sql = """
|
||||
SELECT
|
||||
member_id,
|
||||
COUNT(CASE WHEN DATE(pay_time) >= %s - INTERVAL '29 days' THEN 1 END) AS count_30d,
|
||||
COUNT(CASE WHEN DATE(pay_time) >= %s - INTERVAL '59 days' THEN 1 END) AS count_60d,
|
||||
COUNT(CASE WHEN DATE(pay_time) >= %s - INTERVAL '89 days' THEN 1 END) AS count_90d,
|
||||
COALESCE(SUM(CASE WHEN DATE(pay_time) >= %s - INTERVAL '29 days' THEN pay_amount ELSE 0 END), 0) AS amount_30d,
|
||||
COALESCE(SUM(CASE WHEN DATE(pay_time) >= %s - INTERVAL '59 days' THEN pay_amount ELSE 0 END), 0) AS amount_60d,
|
||||
COALESCE(SUM(CASE WHEN DATE(pay_time) >= %s - INTERVAL '89 days' THEN pay_amount ELSE 0 END), 0) AS amount_90d
|
||||
FROM dwd.dwd_recharge_order
|
||||
WHERE site_id = %s
|
||||
AND member_id IS NOT NULL
|
||||
AND member_id != 0
|
||||
AND pay_time IS NOT NULL
|
||||
AND DATE(pay_time) <= %s
|
||||
GROUP BY member_id
|
||||
"""
|
||||
params = (
|
||||
stat_date, stat_date, stat_date,
|
||||
stat_date, stat_date, stat_date,
|
||||
site_id, stat_date,
|
||||
)
|
||||
rows = self.db.query(sql, params)
|
||||
|
||||
result: Dict[int, Dict[str, Any]] = {}
|
||||
for row in (rows or []):
|
||||
rd = dict(row)
|
||||
result[rd['member_id']] = {
|
||||
'count_30d': rd.get('count_30d', 0),
|
||||
'count_60d': rd.get('count_60d', 0),
|
||||
'count_90d': rd.get('count_90d', 0),
|
||||
'amount_30d': self.safe_decimal(rd.get('amount_30d', 0)),
|
||||
'amount_60d': self.safe_decimal(rd.get('amount_60d', 0)),
|
||||
'amount_90d': self.safe_decimal(rd.get('amount_90d', 0)),
|
||||
}
|
||||
return result
|
||||
|
||||
# ==========================================================================
|
||||
# 工具方法
|
||||
# ==========================================================================
|
||||
|
||||
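The window arithmetic in `_extract_recharge_stats` is easy to get off by one: subtracting 29 days and bounding the range at `stat_date` yields an inclusive 30-day window. The following is a minimal pure-Python sketch of the same rule for a single member. The helper name `recharge_windows` and the `(pay_date, pay_amount)` tuple layout are assumptions for illustration and do not appear in the commit.

```python
from datetime import date, timedelta
from decimal import Decimal


def recharge_windows(orders, stat_date, windows=(30, 60, 90)):
    """Count and sum one member's recharges per rolling window.

    orders: iterable of (pay_date, pay_amount) tuples (hypothetical layout).
    Each window is inclusive on both ends, so an N-day window starts
    N - 1 days before stat_date, mirroring INTERVAL '29 days' for 30 days.
    """
    stats = {}
    for days in windows:
        start = stat_date - timedelta(days=days - 1)
        # Orders after stat_date are ignored, like DATE(pay_time) <= %s.
        hits = [amount for pay_date, amount in orders if start <= pay_date <= stat_date]
        stats[f"count_{days}d"] = len(hits)
        stats[f"amount_{days}d"] = sum(hits, Decimal("0"))
    return stats


if __name__ == "__main__":
    sample = [
        (date(2025, 7, 15), Decimal("100")),  # on stat_date: inside every window
        (date(2025, 6, 16), Decimal("50")),   # first day of the 30-day window
        (date(2025, 6, 15), Decimal("20")),   # one day too old for 30d, inside 60d
        (date(2025, 7, 16), Decimal("999")),  # after stat_date: ignored
    ]
    print(recharge_windows(sample, date(2025, 7, 15)))
```

Unlike the SQL, this sketch returns zero counts for a member with no qualifying orders, whereas the `GROUP BY` query simply returns no row for that member.
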
@@ -351,6 +351,8 @@ class MemberVisitTask(BaseDwsTask):
         try:
             rows = self.db.query(sql_with_fdw, (site_id,))
         except Exception as exc:
+            # CHANGE [2026-02-24] After a failed FDW query the transaction is in a failed state; roll back before running the fallback
+            self.db.rollback()
             # FDW connection failed; fall back to using dim_member.birthday only
             self.logger.warning(
                 "%s: failed to read member_birthday_manual via FDW, falling back to dim_member.birthday - %s",

@@ -161,6 +161,8 @@ class BaseOdsTask(BaseTask):
             segment_keys: set[tuple] = set()
             # CHANGE 2026-02-18 | collect the actual earliest timestamp of the data returned by the API in WINDOW mode
             segment_earliest_time: datetime | None = None
+            # CHANGE [2026-02-24] collect the actual latest timestamp of the data returned by the API, used for late-cutoff protection
+            segment_latest_time: datetime | None = None

             self.logger.info(
                 "%s: starting (%s/%s), window [%s ~ %s]",
@@ -197,6 +199,13 @@ class BaseOdsTask(BaseTask):
                     if page_earliest is not None:
                         if segment_earliest_time is None or page_earliest < segment_earliest_time:
                             segment_earliest_time = page_earliest
+                    # CHANGE [2026-02-24] collect the actual latest timestamp, used for late-cutoff protection
+                    page_latest = self._collect_latest_time(
+                        page_records, snapshot_time_column
+                    )
+                    if page_latest is not None:
+                        if segment_latest_time is None or page_latest > segment_latest_time:
+                            segment_latest_time = page_latest
                     inserted, updated, skipped = self._insert_records_schema_aware(
                         table=spec.table_name,
                         records=page_records,
@@ -229,13 +238,27 @@ class BaseOdsTask(BaseTask):
                         spec.code, seg_start, segment_earliest_time,
                     )
                     effective_window_start = segment_earliest_time
+                # CHANGE [2026-02-24] late-cutoff protection: narrow the soft-delete range using the actual latest timestamp from the API,
+                # so that data vanishing from the window tail as the recent endpoint's retention period rolls forward is not wrongly marked deleted
+                effective_window_end = seg_end
+                if (
+                    snapshot_protect_early_cutoff
+                    and snapshot_mode == SnapshotMode.WINDOW
+                    and segment_latest_time is not None
+                    and segment_latest_time < seg_end
+                ):
+                    self.logger.info(
+                        "%s: late-cutoff protection active, soft-delete window end narrowed from %s to %s",
+                        spec.code, seg_end, segment_latest_time,
+                    )
+                    effective_window_end = segment_latest_time
                 deleted = self._mark_missing_as_deleted(
                     table=spec.table_name,
                     business_pk_cols=business_pk_cols,
                     snapshot_mode=snapshot_mode,
                     snapshot_time_column=snapshot_time_column,
                     window_start=effective_window_start,
-                    window_end=seg_end,
+                    window_end=effective_window_end,
                     key_values=segment_keys,
                     allow_empty=snapshot_allow_empty,
                 )
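The late-cutoff change above mirrors the existing early-cutoff protection: the soft-delete window is clamped to the time range the API actually returned. The sketch below isolates that clamping as a pure function. The name `narrow_soft_delete_window` is hypothetical, and the early-cutoff condition (`earliest > seg_start`) is an assumption, since that branch is only partly visible in the hunk.

```python
from datetime import datetime
from typing import Optional, Tuple


def narrow_soft_delete_window(
    seg_start: datetime,
    seg_end: datetime,
    earliest: Optional[datetime],
    latest: Optional[datetime],
) -> Tuple[datetime, datetime]:
    """Clamp [seg_start, seg_end] to the range covered by returned records.

    Records outside the clamped window are never candidates for soft
    deletion, so data the API no longer serves is not flagged as deleted.
    """
    start = seg_start
    end = seg_end
    # Assumed early-cutoff rule: data begins later than the requested start.
    if earliest is not None and earliest > seg_start:
        start = earliest
    # Late-cutoff rule from the diff: data ends before the requested end.
    if latest is not None and latest < seg_end:
        end = latest
    return start, end


if __name__ == "__main__":
    window = narrow_soft_delete_window(
        datetime(2026, 2, 1),
        datetime(2026, 2, 24),
        earliest=datetime(2026, 2, 3),
        latest=datetime(2026, 2, 20),
    )
    print(window)
```

When no records come back at all, both timestamps are `None` and the window is left unchanged; in the real task that case appears to be governed separately by `allow_empty`.
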
@@ -548,7 +571,39 @@ class BaseOdsTask(BaseTask):
             except (ValueError, TypeError, OverflowError):
                 continue
         return earliest
+    def _collect_latest_time(
+        self, records: list, time_column: str
+    ) -> datetime | None:
+        """Extract the maximum value of time_column from a batch of records returned by the API.
+
+        # CHANGE [2026-02-24] Prompt=diagnose is_delete mis-flagging of 2976396053006405
+        # Used for late-cutoff protection: when the retention period of the API's recent endpoint rolls forward
+        # and data at the tail of the window disappears, avoid wrongly soft-deleting data after that tail.
+        """
+        if not records or not time_column:
+            return None
+        latest: datetime | None = None
+        for rec in records:
+            if not isinstance(rec, dict):
+                continue
+            merged = self._merge_record_layers(rec)
+            raw = self._get_value_case_insensitive(merged, time_column)
+            if raw is None:
+                continue
+            try:
+                if isinstance(raw, datetime):
+                    ts = raw
+                elif isinstance(raw, str):
+                    ts = dtparser.parse(raw)
+                else:
+                    continue
+                if ts.tzinfo is None:
+                    ts = ts.replace(tzinfo=self.tz)
+                if latest is None or ts > latest:
+                    latest = ts
+            except (ValueError, TypeError, OverflowError):
+                continue
+        return latest

     def _mark_missing_as_deleted(
         self,
@@ -995,6 +1050,13 @@ class BaseOdsTask(BaseTask):
                 updated += 1
         return inserted, updated

+    # Mapping from goodsStockWarningInfo nested fields → flat ODS column names
+    _STOCK_WARNING_FIELD_MAP: dict[str, str] = {
+        "sales_day": "warning_sales_day",
+        "warning_day_max": "warning_day_max",
+        "warning_day_min": "warning_day_min",
+    }
+
     @staticmethod
     def _merge_record_layers(record: dict) -> dict:
         merged = record
@@ -1005,6 +1067,13 @@ class BaseOdsTask(BaseTask):
         settle_inner = merged.get("settleList")
         if isinstance(settle_inner, dict):
             merged = {**settle_inner, **merged}
+        # CHANGE 2026-02-24 | flatten the nested goodsStockWarningInfo object,
+        # promoting sales_day/warning_day_max/warning_day_min to top-level keys
+        warning_info = merged.get("goodsStockWarningInfo")
+        if isinstance(warning_info, dict):
+            for src_key, dst_key in BaseOdsTask._STOCK_WARNING_FIELD_MAP.items():
+                if src_key in warning_info and dst_key not in merged:
+                    merged[dst_key] = warning_info[src_key]
         return merged

     @staticmethod

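The flattening rule added to `_merge_record_layers` can be tried in isolation. The sketch below reuses the field map from the diff, but as a standalone function. The name `flatten_stock_warning` and the sample record are assumptions for illustration, and the function copies its input, which the original method may not do.

```python
# Field map copied from the diff: nested key -> flat ODS column name.
STOCK_WARNING_FIELD_MAP = {
    "sales_day": "warning_sales_day",
    "warning_day_max": "warning_day_max",
    "warning_day_min": "warning_day_min",
}


def flatten_stock_warning(record: dict) -> dict:
    """Promote selected goodsStockWarningInfo fields to top-level keys.

    Existing top-level keys win: a nested value is copied only when the
    destination key is not already present in the record.
    """
    merged = dict(record)  # shallow copy so the caller's dict is not mutated
    warning_info = merged.get("goodsStockWarningInfo")
    if isinstance(warning_info, dict):
        for src_key, dst_key in STOCK_WARNING_FIELD_MAP.items():
            if src_key in warning_info and dst_key not in merged:
                merged[dst_key] = warning_info[src_key]
    return merged


if __name__ == "__main__":
    raw = {
        "id": 1,
        "warning_day_max": 9,  # already present at top level, so it is kept
        "goodsStockWarningInfo": {
            "sales_day": 7,
            "warning_day_max": 30,
            "warning_day_min": 3,
        },
    }
    print(flatten_stock_warning(raw))
```

The copy is a deliberate choice in this sketch: the method in the diff starts from `merged = record`, so if no earlier layer merge rebinds `merged` to a new dict, the assignment would write into the caller's record. Whether that matters in the real task cannot be determined from the hunk alone.
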
@@ -20,17 +20,24 @@ from orchestration.topological_sort import topological_sort
 @dataclass
 class _FakeMeta:
     depends_on: list[str] = field(default_factory=list)
+    layer: str | None = None


 class _FakeRegistry:
     """Minimal Registry stand-in that only provides get_metadata()."""

-    def __init__(self, deps: dict[str, list[str]]):
+    def __init__(self, deps: dict[str, list[str]], layers: dict[str, str] | None = None):
         self._deps = deps
+        self._layers = layers or {}

     def get_metadata(self, code: str):
         if code in self._deps:
-            return _FakeMeta(depends_on=self._deps[code])
+            return _FakeMeta(
+                depends_on=self._deps[code],
+                layer=self._layers.get(code),
+            )
+        if code in self._layers:
+            return _FakeMeta(layer=self._layers[code])
         return _FakeMeta()


@@ -140,3 +147,78 @@ class TestTopologicalSort:
         assert result.index("DWS_ASSISTANT_DAILY") < result.index("DWS_ASSISTANT_MONTHLY")
         # All tasks come before MAINTENANCE
         assert result.index("DWS_MAINTENANCE") == len(result) - 1
+
+    # ── Cross-layer implicit dependency tests ──────────────────────────────────────
+
+    def test_cross_layer_ods_before_dwd(self):
+        """ODS tasks should be ordered before DWD tasks (implicit layer dependency)."""
+        reg = _FakeRegistry(
+            deps={"ODS_A": [], "DWD_LOAD": []},
+            layers={"ODS_A": "ODS", "DWD_LOAD": "DWD"},
+        )
+        # Deliberately put DWD first
+        result = topological_sort(["DWD_LOAD", "ODS_A"], reg)
+        assert result.index("ODS_A") < result.index("DWD_LOAD")
+
+    def test_cross_layer_full_pipeline_order(self):
+        """Full pipeline: ODS → DWD → DWS → INDEX, regardless of input order."""
+        reg = _FakeRegistry(
+            deps={"IDX": [], "DWS_A": [], "DWD_L": [], "ODS_X": []},
+            layers={"ODS_X": "ODS", "DWD_L": "DWD", "DWS_A": "DWS", "IDX": "INDEX"},
+        )
+        # Deliberately pass the input in reverse order
+        result = topological_sort(["IDX", "DWS_A", "DWD_L", "ODS_X"], reg)
+        assert result.index("ODS_X") < result.index("DWD_L")
+        assert result.index("DWD_L") < result.index("DWS_A")
+        assert result.index("DWS_A") < result.index("IDX")
+
+    def test_cross_layer_multiple_tasks_per_layer(self):
+        """With several tasks per layer, every lower-layer task comes before every higher-layer task."""
+        reg = _FakeRegistry(
+            deps={"ODS_A": [], "ODS_B": [], "DWD_X": [], "DWD_Y": []},
+            layers={"ODS_A": "ODS", "ODS_B": "ODS", "DWD_X": "DWD", "DWD_Y": "DWD"},
+        )
+        result = topological_sort(["DWD_Y", "ODS_A", "DWD_X", "ODS_B"], reg)
+        # Every ODS task comes before every DWD task
+        for ods in ["ODS_A", "ODS_B"]:
+            for dwd in ["DWD_X", "DWD_Y"]:
+                assert result.index(ods) < result.index(dwd)
+
+    def test_cross_layer_with_explicit_deps_combined(self):
+        """Implicit layer dependencies and explicit depends_on both take effect."""
+        reg = _FakeRegistry(
+            deps={
+                "ODS_A": [],
+                "DWD_LOAD": [],
+                "DWS_MONTHLY": ["DWS_DAILY"],
+                "DWS_DAILY": [],
+            },
+            layers={
+                "ODS_A": "ODS",
+                "DWD_LOAD": "DWD",
+                "DWS_DAILY": "DWS",
+                "DWS_MONTHLY": "DWS",
+            },
+        )
+        result = topological_sort(
+            ["DWS_MONTHLY", "DWS_DAILY", "DWD_LOAD", "ODS_A"], reg
+        )
+        # Layers: ODS < DWD < DWS
+        assert result.index("ODS_A") < result.index("DWD_LOAD")
+        assert result.index("DWD_LOAD") < result.index("DWS_DAILY")
+        assert result.index("DWD_LOAD") < result.index("DWS_MONTHLY")
+        # Explicit dependency: DAILY < MONTHLY
+        assert result.index("DWS_DAILY") < result.index("DWS_MONTHLY")
+
+    def test_tasks_without_layer_unaffected(self):
+        """Tasks without a layer are unaffected by layer ordering and keep their existing dependencies."""
+        reg = _FakeRegistry(
+            deps={"UTIL_A": [], "ODS_X": [], "DWD_L": []},
+            layers={"ODS_X": "ODS", "DWD_L": "DWD"},
+            # UTIL_A has no layer
+        )
+        result = topological_sort(["UTIL_A", "DWD_L", "ODS_X"], reg)
+        # ODS still comes before DWD
+        assert result.index("ODS_X") < result.index("DWD_L")
+        # UTIL_A has no layer constraint and should appear in the result
+        assert "UTIL_A" in result
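The tests above pin down the behaviour expected of `topological_sort` without showing how the implicit layer edges are built. One way to satisfy them is sketched below, under stated assumptions: the function name `layered_topological_sort`, the plain-dict inputs, and the `LAYER_ORDER` table are all hypothetical, and the real `orchestration.topological_sort` module may work differently.

```python
from collections import deque

# Assumed layer ranking, taken from the order the tests describe.
LAYER_ORDER = {"ODS": 0, "DWD": 1, "DWS": 2, "INDEX": 3}


def layered_topological_sort(codes, deps, layers):
    """Kahn's algorithm with implicit "lower layer first" edges.

    codes:  task codes to order (assumed unique)
    deps:   code -> list of explicit prerequisite codes
    layers: code -> layer name; tasks with no layer get no implicit edges
    """
    selected = set(codes)
    pending = {code: set() for code in codes}  # code -> unmet prerequisites
    for code in codes:
        # Explicit depends_on edges, restricted to the selected tasks.
        pending[code].update(d for d in deps.get(code, []) if d in selected)
        rank = LAYER_ORDER.get(layers.get(code))
        if rank is None:
            continue
        # Implicit edges: every selected task in a lower layer is a prerequisite.
        for other in codes:
            other_rank = LAYER_ORDER.get(layers.get(other))
            if other_rank is not None and other_rank < rank:
                pending[code].add(other)

    ready = deque(code for code in codes if not pending[code])
    ordered = []
    while ready:
        current = ready.popleft()
        ordered.append(current)
        for code in codes:
            if current in pending[code]:
                pending[code].discard(current)
                if not pending[code]:
                    ready.append(code)
    if len(ordered) != len(codes):
        raise ValueError("dependency cycle detected")
    return ordered


if __name__ == "__main__":
    print(
        layered_topological_sort(
            ["IDX", "DWS_A", "DWD_L", "ODS_X"],
            deps={},
            layers={"ODS_X": "ODS", "DWD_L": "DWD", "DWS_A": "DWS", "IDX": "INDEX"},
        )
    )
```

Linking every task to all lower-layer tasks costs a quadratic number of edges in the count of selected tasks, which is acceptable for a registry of a few dozen tasks. A larger graph could instead link each layer only to the layer directly below it.
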