# Design Document: Finance Board DWS Area-Dimension Refactor

## Overview

This refactor fixes the core bug in the finance board where, for `area≠all`, discount data was read from the global DWS table, which severely distorted area-level discount ratios (for example, a discount ratio of 417.9% for B区).

It uses a two-layer architecture:

1. **Atomic layer** `dws_finance_area_daily`: stores precomputed revenue, discount, cash flow, and related data for 9 areas at daily granularity, keyed by `(site_id, stat_date, area_code)`
2. **Cache layer** `dws_finance_board_cache`: caches aggregated results for completed periods to avoid repeated SUM computation

Backend queries change as follows: completed periods check the cache first; on a miss they SUM from the daily table; in-progress periods always SUM directly from the daily table. The API signature and response structure are unchanged, so the frontend needs no changes.

Key design decisions:

- Discounts are aggregated directly by the table area of each settlement order, with no apportioning (each settlement order corresponds to exactly one table)
- The area mapping is extracted into a shared package, so ETL and backend use the same configuration
- `discount_gift_card` uses the gift-card consumption amount (consistent with the existing ETL)
- Cash flow, recharge, and card consumption are populated only when `area_code='all'`; they cannot be split by area

## Architecture

### System architecture diagram

```mermaid
graph TB
    subgraph "Data sources"
        DWD[dwd_settlement_head<br/>settlement order details]
        DIM[dim_table<br/>table dimension SCD2]
        DWS_OLD[dws_finance_daily_summary<br/>existing global daily summary]
    end
    subgraph "Shared package packages/shared/"
        AM[area_mapping.py<br/>AREA_LABEL_MAP + resolve_area_code]
    end
    subgraph "ETL layer apps/etl/"
        ETL1[DWS_FINANCE_AREA_DAILY<br/>hourly · delete-before-insert]
        ETL2[DWS_FINANCE_BOARD_CACHE<br/>daily · fingerprint comparison]
    end
    subgraph "New DWS tables"
        T1[dws_finance_area_daily<br/>atomic layer · 9 rows/day/site]
        T2[dws_finance_board_cache<br/>cache layer · completed periods]
    end
    subgraph "Backend apps/backend/"
        SVC[board_service.py<br/>cache-first query logic]
        FDW[fdw_queries.py<br/>get_finance_overview_area<br/>get_finance_revenue_area]
    end
    subgraph "Frontend (unchanged)"
        MP[Mini program finance board page]
    end
    DWD --> ETL1
    DIM --> ETL1
    DWS_OLD --> ETL1
    AM --> ETL1
    AM --> FDW
    ETL1 --> T1
    T1 --> ETL2
    ETL2 --> T2
    T1 --> FDW
    T2 --> FDW
    FDW --> SVC
    SVC --> MP
```

### Data flow

```mermaid
sequenceDiagram
    participant Client as Mini program
    participant API as FastAPI
    participant SVC as board_service
    participant Cache as dws_finance_board_cache
    participant Daily as dws_finance_area_daily
    Client->>API: GET /api/xcx/board/finance?time=X&area=Y&compare=Z
    API->>SVC: get_finance_board(time, area, compare, site_id)
    alt Completed period (lastMonth/lastWeek/...)
        SVC->>Cache: Query cache
        alt Cache hit
            Cache-->>SVC: Return cached data
        else Cache miss
            SVC->>Daily: SUM(area_code=Y, date_range)
            Daily-->>SVC: Aggregated result
            SVC->>Cache: Write cache
        end
    else In-progress period (month/week/quarter)
        SVC->>Daily: SUM(area_code=Y, date_range)
        Daily-->>SVC: Aggregated result
    end
    alt compare=1
        SVC->>SVC: Apply the same logic to the previous period
        SVC->>SVC: calc_compare(current, previous)
    end
    SVC-->>API: FinanceBoardResponse
    API-->>Client: JSON (camelCase)
```

### ETL task dependencies

```mermaid
graph LR
    A[DWD_LOAD_FROM_ODS] --> B[DWS_FINANCE_AREA_DAILY<br/>hourly]
    B --> C[DWS_FINANCE_BOARD_CACHE<br/>once per day]
    A --> D[DWS_FINANCE_DAILY<br/>existing task · unchanged]
```

## Components and Interfaces

### 1. Shared area mapping: `packages/shared/src/neozqyy_shared/area_mapping.py`

```python
# Area code -> list of physical area names
AREA_LABEL_MAP: dict[str, list[str]] = {
    "hallA": ["A区"],
    "hallB": ["B区"],
    "hallC": ["C区", "TV台", "美洲豹赛台"],
    "vip": ["VIP包厢"],
    "snooker": ["斯诺克区"],
    "mahjong": ["麻将房", "M7", "M8", "666", "发财"],
    "ktv": ["K包", "k包活动区", "幸会158"],
}

# All specific area codes (excluding all/hall)
SPECIFIC_AREA_CODES: list[str]  # ["hallA", "hallB", ..., "ktv"]

# All 9 area codes
ALL_AREA_CODES: list[str]  # ["all", "hall", "hallA", ..., "ktv"]

# Reverse mapping: physical area name -> area code
_REVERSE_MAP: dict[str, str]  # {"A区": "hallA", "B区": "hallB", ...}


def resolve_area_code(area_name: str | None) -> str | None:
    """Take a site_table_area_name and return its area_code. Returns None if unmatched."""


def get_area_labels(area_code: str) -> list[str] | None:
    """Take an area_code and return its physical area names. Returns None for all/hall."""
```

Design decisions:

- `hall` = the sum of all specific areas (excluding `all`); semantically equivalent to `all` and kept for backward compatibility
- `all` = the sum of all areas
- An unmatched `area_name` returns `None`; the ETL decides whether to log a warning

### 2. ETL task: `DWS_FINANCE_AREA_DAILY`

Location: `apps/etl/connectors/feiqiu/tasks/dws/finance_area_daily.py`

Inherits from `FinanceBaseTask` (reusing its settlement extraction methods) and overrides `extract` / `transform` / `load`:

- **extract**: reads the day's settlement orders from `dwd_settlement_head` + `dim_table` (using the business-day cutoff), and reads the global cash flow, recharge, and card consumption fields from `dws_finance_daily_summary`
- **transform**: uses `resolve_area_code` to map each settlement order to an area, aggregates revenue and discount fields per area, and builds 9 rows (7 specific areas + hall + all)
- **load**: delete-before-insert (delete by `site_id + stat_date`, then insert the 9 rows)

Key interface:

```python
class FinanceAreaDailyTask(FinanceBaseTask):
    def get_task_code(self) -> str:
        return "DWS_FINANCE_AREA_DAILY"

    def get_target_table(self) -> str:
        return "dws.dws_finance_area_daily"

    def get_primary_keys(self) -> list[str]:
        return ["site_id", "stat_date", "area_code"]

    def extract(self, context: TaskContext) -> dict: ...

    def transform(self, extracted: dict, context: TaskContext) -> list[dict]: ...
```

### 3. ETL task: `DWS_FINANCE_BOARD_CACHE`

Location: `apps/etl/connectors/feiqiu/tasks/dws/finance_board_cache.py`

Inherits from `BaseDwsTask`:

- **extract**: iterates over 5 completed periods × 9 areas = 45 combinations and reads the daily rows from `dws_finance_area_daily` for each
- **transform**: computes a data fingerprint (MD5), compares it with the cache table, and marks the combinations that need recomputation
- **load**: for each marked combination, SUMs from the daily table and upserts into the cache table

Fingerprint computation:

```python
import hashlib
import json


def compute_fingerprint(rows: list[dict]) -> str:
    """MD5 over (stat_date, gross_amount, discount_total), sorted by stat_date."""
    sorted_rows = sorted(rows, key=lambda r: str(r["stat_date"]))
    # Stringify every field so dates and Decimals serialize deterministically.
    payload = json.dumps(
        [
            (str(r["stat_date"]), str(r["gross_amount"]), str(r["discount_total"]))
            for r in sorted_rows
        ]
    )
    return hashlib.md5(payload.encode()).hexdigest()
```

### 4. Backend query changes: `fdw_queries.py`

New and modified functions:

```python
def get_finance_overview_area(
    conn, site_id: int, start_date: str, end_date: str, area_code: str = "all"
) -> dict:
    """Aggregate the 8 overview metrics from v_dws_finance_area_daily by area_code."""


def get_finance_revenue_area(
    conn, site_id: int, start_date: str, end_date: str, area_code: str = "all"
) -> dict:
    """Aggregate the revenue section data from v_dws_finance_area_daily by area_code."""


def get_finance_board_cache(
    conn, site_id: int, time_range: str, area_code: str
) -> dict | None:
    """Query the v_dws_finance_board_cache cache."""


def set_finance_board_cache(
    conn, site_id: int, time_range: str, area_code: str, data: dict
) -> None:
    """Insert or update a cache entry."""
```

### 5. Backend service changes: `board_service.py`

Changes to `get_finance_board`:

- Adds cache lookup logic: completed periods check the cache first
- `_build_overview` now calls `get_finance_overview_area` (passing area_code)
- `_build_revenue` now calls `get_finance_revenue_area` (passing area_code)
- `_build_cashflow` is unchanged (it always uses global data)
- The overview override logic for `area≠all` is retained

## Data Models

### 1. `dws_finance_area_daily`: atomic layer

```sql
CREATE TABLE dws.dws_finance_area_daily (
    id BIGSERIAL PRIMARY KEY,
    site_id BIGINT NOT NULL,
    tenant_id BIGINT NOT NULL,
    stat_date DATE NOT NULL,
    area_code VARCHAR(20) NOT NULL,
    -- Revenue structure (4 items + gross_amount)
    table_fee_amount NUMERIC(14,2) NOT NULL DEFAULT 0,
    goods_amount NUMERIC(14,2) NOT NULL DEFAULT 0,
    assistant_pd_amount NUMERIC(14,2) NOT NULL DEFAULT 0,
    assistant_cx_amount NUMERIC(14,2) NOT NULL DEFAULT 0,
    gross_amount NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Discount breakdown (6 items + discount_total)
    discount_groupbuy NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_vip NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_manual NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_gift_card NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_rounding NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_other NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_total NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Confirmed income
    confirmed_income NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Cash flow (area_code='all' only)
    cash_pay_amount NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_paper_amount NUMERIC(14,2) NOT NULL DEFAULT 0,
    scan_pay_amount NUMERIC(14,2) NOT NULL DEFAULT 0,
    groupbuy_pay_amount NUMERIC(14,2) NOT NULL DEFAULT 0,
    recharge_cash_inflow NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_inflow_total NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_outflow_total NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_balance_change NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Card consumption (area_code='all' only)
    card_consume_total NUMERIC(14,2) NOT NULL DEFAULT 0,
    recharge_card_consume NUMERIC(14,2) NOT NULL DEFAULT 0,
    gift_card_consume NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Recharge (area_code='all' only)
    recharge_cash NUMERIC(14,2) NOT NULL DEFAULT 0,
    first_recharge_cash NUMERIC(14,2) NOT NULL DEFAULT 0,
    renewal_cash NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Order statistics
    order_count INTEGER NOT NULL DEFAULT 0,
    -- Metadata
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (site_id, stat_date, area_code)
);
```

Constraints and identities:

- `gross_amount = table_fee_amount + goods_amount + assistant_pd_amount + assistant_cx_amount`
- `discount_total = discount_groupbuy + discount_vip + discount_manual + discount_gift_card + discount_rounding + discount_other`
- `confirmed_income = gross_amount - discount_total`
- `area_code ∈ {all, hall, hallA, hallB, hallC, vip, snooker, mahjong, ktv}`
- When `area_code ≠ 'all'`, the cash flow, card consumption, and recharge fields are 0

### 2. `dws_finance_board_cache`: cache layer

```sql
CREATE TABLE dws.dws_finance_board_cache (
    id BIGSERIAL PRIMARY KEY,
    site_id BIGINT NOT NULL,
    time_range VARCHAR(20) NOT NULL,
    area_code VARCHAR(20) NOT NULL,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    prev_start_date DATE,
    prev_end_date DATE,
    -- The 8 overview metrics
    occurrence NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_rate NUMERIC(8,4) NOT NULL DEFAULT 0,
    confirmed_revenue NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_in NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_out NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_balance NUMERIC(14,2) NOT NULL DEFAULT 0,
    balance_rate NUMERIC(8,4) NOT NULL DEFAULT 0,
    -- Fingerprint
    data_fingerprint VARCHAR(64),
    computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    -- Metadata
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (site_id, time_range, area_code)
);
```

Cache strategy:

- `time_range ∈ {lastMonth, lastWeek, lastQuarter, quarter3, half6}` → cached
- `time_range ∈ {month, week, quarter}` → not cached
- Invalidation condition: `data_fingerprint` changes (caused by backfilled data)

### 3. Area mapping data model

```python
from typing import Literal

# area_code enum values
AREA_CODES = Literal[
    "all", "hall", "hallA", "hallB", "hallC",
    "vip", "snooker", "mahjong", "ktv"
]

# Structure of AREA_LABEL_MAP
AREA_LABEL_MAP: dict[str, list[str]] = {
    "hallA": ["A区"],
    "hallB": ["B区"],
    "hallC": ["C区", "TV台", "美洲豹赛台"],
    "vip": ["VIP包厢"],
    "snooker": ["斯诺克区"],
    "mahjong": ["麻将房", "M7", "M8", "666", "发财"],
    "ktv": ["K包", "k包活动区", "幸会158"],
}
```

## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system; essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Area mapping round-trip

*For any* `area_name` that appears in one of the value lists of `AREA_LABEL_MAP`, `resolve_area_code(area_name)` should return the corresponding `area_code`, and `area_name in get_area_labels(resolve_area_code(area_name))` should be True.

**Validates: Requirements 1.1, 1.5**

### Property 2: Unknown area names return None

*For any* string `area_name` that does not appear in any value list of `AREA_LABEL_MAP`, `resolve_area_code(area_name)` should return `None`.

**Validates: Requirements 1.4**

### Property 3: Arithmetic identities of daily rows

*For any* `dws_finance_area_daily` row, the following three identities must hold simultaneously:

1. `gross_amount = table_fee_amount + goods_amount + assistant_pd_amount + assistant_cx_amount`
2. `discount_total = discount_groupbuy + discount_vip + discount_manual + discount_gift_card + discount_rounding + discount_other`
3. `confirmed_income = gross_amount - discount_total`

**Validates: Requirements 2.1, 2.2, 2.3, 8.3**

### Property 4: Cash flow, card consumption, and recharge are zero for non-all areas

*For any* `dws_finance_area_daily` row with `area_code ≠ 'all'`, all cash flow fields (cash_pay_amount, cash_paper_amount, scan_pay_amount, groupbuy_pay_amount, recharge_cash_inflow, cash_inflow_total, cash_outflow_total, cash_balance_change), card consumption fields (card_consume_total, recharge_card_consume, gift_card_consume), and recharge fields (recharge_cash, first_recharge_cash, renewal_cash) should be 0.

**Validates: Requirements 2.5**

### Property 5: ETL output completeness and aggregation correctness

*For any* set of input settlement orders, the ETL transform should output exactly 9 rows (area_code covering all/hall/hallA/hallB/hallC/vip/snooker/mahjong/ktv). The revenue and discount fields of the `hall` row equal the sum of the corresponding fields across the hallA~ktv rows, and the same holds for the `all` row when every settlement maps to a known area (per Error Handling, settlements with an unmapped area count toward `all` only).

**Validates: Requirements 2.7, 2.8, 8.4**

### Property 6: ETL idempotency (delete-before-insert)

*For any* set of input settlement orders, running the ETL transform twice for the same `(site_id, stat_date)` should produce identical output both times.

**Validates: Requirements 3.4**

### Property 7: settle_type filtering

*For any* set of settlement orders with varying `settle_type` values, the ETL processes only records with `settle_type IN (1, 3)`; settlement orders with other settle_type values should not affect the output amounts.

**Validates: Requirements 3.6**

### Property 8: Fingerprint determinism and cache invalidation

*For any* set of daily rows, `compute_fingerprint` is deterministic (the same input produces the same output). And *for any* modification of the source data (changing the gross_amount or discount_total of any row), the new fingerprint should differ from the original.

**Validates: Requirements 5.2, 5.3, 5.4**

### Property 9: In-progress periods are never written to the cache

*For any* `time_range ∈ {month, week, quarter}`, the ETL cache task should not write a cache record for that time_range.

**Validates: Requirements 5.7**

### Property 10: Query routing correctness

*For any* query request: when `time_range` is a completed period and a cache entry exists, the cached data should be returned directly; when no cache entry exists, the result should be computed by SUM from the daily table and written to the cache; when `time_range` is an in-progress period, the result should be computed by SUM from the daily table without consulting the cache.

**Validates: Requirements 6.1, 6.2, 6.3, 9.4**

### Property 11: Area filtering behavior

*For any* query with `area_code ≠ 'all'`, the `recharge` section should return `null`, and the data in the `cashflow`/`expense`/`coach_analysis` sections should match the data returned for `area_code='all'`.

**Validates: Requirements 6.7, 6.8**

### Property 12: Fixed item counts in revenue

*For any* revenue section returned by a query, `discount_items` should contain exactly 5 items (group-buy / member discount / manual adjustment / gift card / other), and `channel_items` should contain exactly 3 items (stored-value card settlement offset / cash and online payment / group-buy redemption).

**Validates: Requirements 7.3, 7.4**

### Property 13: Overview override logic when area≠all

*For any* query with `area_code ≠ 'all'`, `overview.occurrence` should equal `revenue.total_occurrence`, `overview.discount` should equal `revenue.discount_total`, and `overview.confirmed_revenue` should equal `revenue.confirmed_total`.

**Validates: Requirements 7.6**

### Property 14: Regression consistency for area=all

*For any* date range and a query with `area_code='all'`, the 8 overview metrics from the new logic (querying `dws_finance_area_daily`) should exactly match those from the old logic (querying `dws_finance_daily_summary`).

**Validates: Requirements 9.1**

## Error Handling

### ETL layer

| Scenario | Handling strategy |
|------|---------|
| `resolve_area_code` returns None (unknown area) | Log a WARNING; the settlement order is excluded from every specific area row but still counted in the `all` row |
| `dws_finance_daily_summary` has no data for the day | Fill the cash flow, recharge, and card consumption fields of the `all` row with 0 and log a WARNING |
| No `dim_table` match for `table_id` (`scd2_is_current=1`) | Treat the settlement order's area_code as None and handle as above |
| delete-before-insert transaction fails | Roll back the whole transaction, mark the task as failed, and retry on the next scheduled run |
| Daily table has no data when computing the fingerprint | The fingerprint is the MD5 of an empty string and the cache entry is marked "no data" (note that `compute_fingerprint([])` as written hashes `[]`, so the empty case needs explicit handling) |

### Backend query layer

| Scenario | Handling strategy |
|------|---------|
| `dws_finance_area_daily` has no data (new site or new date) | Return all-zero overview/revenue, consistent with the existing fallback logic |
| Cache write fails | The query result is still returned; log an ERROR and retry the write on the next request |
| Cache table connection fails | Fall back to SUM directly from the daily table without interrupting the request |
| Invalid area_code parameter | Rejected by FastAPI's AreaFilterEnum validation with a 422 response |

### Data consistency protection

- The ETL delete-before-insert runs inside a single transaction, which guarantees atomicity
- Cache writes use `ON CONFLICT ... DO UPDATE`, which guarantees idempotency
- Backend queries use `SET LOCAL app.current_site_id` to guarantee RLS isolation

## Testing Strategy

### Property-Based Testing

Uses the **hypothesis** library (Python), with at least 100 examples per property test in CI (the `dev` profile under Test configuration uses 30 for faster local runs).

| Property | Test file | Generator |
|----------|---------|--------|
| Property 1-2 | `tests/test_area_mapping_props.py` | `st.text()` generates random area_name values |
| Property 3-7 | `tests/test_finance_area_daily_props.py` | Random lists of settlement orders (amounts via `st.decimals`, area_name drawn from a mix of known and unknown values) |
| Property 8-9 | `tests/test_finance_board_cache_props.py` | Random lists of daily rows |
| Property 10-14 | `tests/test_board_service_props.py` | Random query parameters + mocked database results |

Every test function must carry a comment tag:

```python
# Feature: board-finance-dws-area-refactor, Property 1: area mapping round-trip
```

### Unit tests

| Scope | Test file | Focus |
|---------|---------|--------|
| area_mapping | `tests/test_area_mapping_unit.py` | Edge cases: empty string, None, letter case, special characters |
| ETL transform | `apps/etl/connectors/feiqiu/tests/unit/test_finance_area_daily.py` | discount_gift_card definition check, business-day cutoff boundaries |
| ETL cache | `apps/etl/connectors/feiqiu/tests/unit/test_finance_board_cache.py` | Fingerprint change detection, empty data handling |
| Backend queries | `apps/backend/tests/unit/test_fdw_queries_area.py` | SQL correctness, area_code filtering, cache hit/miss |
| Backend service | `apps/backend/tests/unit/test_board_service_area.py` | Override logic, period-over-period calculation, fallback behavior |
| Regression validation | `scripts/ops/validate_board_finance.py` | Full comparison across 144 combinations |

### Test configuration

```python
# conftest.py / hypothesis settings
from hypothesis import settings

settings.register_profile("ci", max_examples=100)
settings.register_profile("dev", max_examples=30)
```

### Integration verification

- Full validation script over 144 combinations: `scripts/ops/validate_board_finance.py` (8 time_range × 9 area_code × 2 compare)
- area=all regression comparison: diff the output of the new and old logic
- Cache hit-rate verification: the second request for a completed period does not trigger a SUM
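
The 144 combinations covered by the full validation (8 time_range × 9 area_code × 2 compare) can be enumerated mechanically. The sketch below is illustrative only: `TIME_RANGES`, `COMPARE_FLAGS`, and `iter_combinations` are hypothetical names, not taken from `validate_board_finance.py`, and the compare values `0`/`1` are an assumption based on the `compare=1` branch in the data flow.

```python
from itertools import product

# The 8 time ranges listed under the cache strategy (3 uncached + 5 cached).
TIME_RANGES = [
    "month", "week", "quarter",
    "lastMonth", "lastWeek", "lastQuarter", "quarter3", "half6",
]
# The 9 area codes of the atomic layer.
AREA_CODES = [
    "all", "hall", "hallA", "hallB", "hallC",
    "vip", "snooker", "mahjong", "ktv",
]
COMPARE_FLAGS = [0, 1]  # assumption: compare off / on


def iter_combinations():
    """Yield every (time_range, area_code, compare) triple to validate."""
    return product(TIME_RANGES, AREA_CODES, COMPARE_FLAGS)


combos = list(iter_combinations())
```

A validation script could loop over `combos`, call the board endpoint for each triple, and diff the response against a reference.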
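
The interface stubs for the shared area mapping (Properties 1 and 2) leave the function bodies open. One possible shape is sketched here, assuming exact, case-sensitive matching on the label; that matching rule is an assumption, since the design only specifies that unmatched names return `None`.

```python
from __future__ import annotations

AREA_LABEL_MAP: dict[str, list[str]] = {
    "hallA": ["A区"],
    "hallB": ["B区"],
    "hallC": ["C区", "TV台", "美洲豹赛台"],
    "vip": ["VIP包厢"],
    "snooker": ["斯诺克区"],
    "mahjong": ["麻将房", "M7", "M8", "666", "发财"],
    "ktv": ["K包", "k包活动区", "幸会158"],
}

# Derived constants: 7 specific codes, then all 9 codes.
SPECIFIC_AREA_CODES: list[str] = list(AREA_LABEL_MAP)
ALL_AREA_CODES: list[str] = ["all", "hall", *SPECIFIC_AREA_CODES]

# Reverse mapping built once at import time: label -> area code.
_REVERSE_MAP: dict[str, str] = {
    label: code for code, labels in AREA_LABEL_MAP.items() for label in labels
}


def resolve_area_code(area_name: str | None) -> str | None:
    """Return the area_code for a physical area name, or None if unmatched."""
    if area_name is None:
        return None
    # Assumption: exact, case-sensitive lookup with no normalization.
    return _REVERSE_MAP.get(area_name)


def get_area_labels(area_code: str) -> list[str] | None:
    """Return the physical area names for a code; None for all/hall."""
    return AREA_LABEL_MAP.get(area_code)
```

Deriving `_REVERSE_MAP` from `AREA_LABEL_MAP` keeps the two directions from drifting apart, which is what the round-trip property checks.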
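
The per-area aggregation performed by the `DWS_FINANCE_AREA_DAILY` transform (Property 5, plus the unknown-area rule from the ETL error table) can be illustrated with a reduced model. This is a sketch under simplifying assumptions: `aggregate_by_area` is a hypothetical helper, each settlement is reduced to `(area_code, gross_amount, discount_total)`, and only three of the table's columns are modelled.

```python
from decimal import Decimal

SPECIFIC = ["hallA", "hallB", "hallC", "vip", "snooker", "mahjong", "ktv"]


def aggregate_by_area(settlements):
    """Build the 9 area rows from (area_code or None, gross, discount) tuples."""
    zero = Decimal("0")
    rows = {
        code: {"gross_amount": zero, "discount_total": zero}
        for code in ["all", "hall", *SPECIFIC]
    }
    for area_code, gross, discount in settlements:
        # Every settlement counts toward `all`; only mapped ones count toward
        # `hall` and their own specific area row.
        targets = ["all"]
        if area_code in SPECIFIC:
            targets += ["hall", area_code]
        for target in targets:
            rows[target]["gross_amount"] += gross
            rows[target]["discount_total"] += discount
    for row in rows.values():
        row["confirmed_income"] = row["gross_amount"] - row["discount_total"]
    return rows


rows = aggregate_by_area([
    ("hallA", Decimal("100.00"), Decimal("10.00")),
    ("hallB", Decimal("50.00"), Decimal("5.00")),
    (None, Decimal("20.00"), Decimal("0.00")),  # unmapped area
])
```

Because each settlement's discount lands in the same row as its revenue, the discount ratio for an area can no longer exceed what that area's own orders justify.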
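
The cache-first routing of Property 10, together with the fallback rules in the backend error table, can be sketched with injected callables. `load_board_data` and its parameters are hypothetical; the real `board_service.py` works with a database connection and the `fdw_queries` functions rather than plain callables.

```python
import logging

logger = logging.getLogger(__name__)

# Completed periods that are eligible for caching.
CACHED_RANGES = {"lastMonth", "lastWeek", "lastQuarter", "quarter3", "half6"}


def load_board_data(time_range, area_code, cache_get, cache_set, sum_daily):
    """Route a query to the cache or to a SUM over the daily table."""
    if time_range not in CACHED_RANGES:
        # In-progress period: always aggregate, never touch the cache.
        return sum_daily(time_range, area_code)
    try:
        cached = cache_get(time_range, area_code)
    except Exception:
        # Cache unavailable: degrade to a direct SUM without failing the request.
        logger.exception("cache read failed, falling back to SUM")
        return sum_daily(time_range, area_code)
    if cached is not None:
        return cached
    data = sum_daily(time_range, area_code)
    try:
        cache_set(time_range, area_code, data)
    except Exception:
        # A failed write must not affect the response; the next request retries.
        logger.exception("cache write failed")
    return data
```

Passing the three operations in as callables keeps the routing logic testable without a database, which matches the mocked-database approach listed for `tests/test_board_service_props.py`.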