Design Document: Finance Board DWS Area-Dimension Refactor

Overview

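This refactor fixes the core bug in the finance board where, for area≠all, discount data is read from the global DWS table, which badly distorts the area-level discount ratio (for example, B区 (area B) shows a discount ratio of 417.9%).

The design uses a two-layer architecture:

  1. Atomic layer dws_finance_area_daily: stores precomputed revenue, discount, cash-flow, and related data for 9 areas at daily granularity, keyed by (site_id, stat_date, area_code)
  2. Cache layer dws_finance_board_cache: caches aggregated results for completed periods to avoid repeated SUM computation

Backend queries change as follows: for a completed period, check the cache first and, on a miss, SUM from the daily table; for a current period, SUM directly from the daily table. The API signature and response structure are unchanged, so the frontend needs no changes.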

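Key design decisions:

  • Discounts are aggregated directly by the area of the settlement's table, with no apportionment (each settlement maps to exactly one table)
  • The area mapping is extracted into a shared package so that the ETL and the backend use the same configuration
  • discount_gift_card uses the gift-card consumption amount (consistent with the existing ETL)
  • Cash flow, recharge, and card consumption have values only when area_code='all'; they cannot be split by area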

Architecture

System architecture diagram

graph TB
    subgraph "Data sources"
        DWD[dwd_settlement_head<br/>settlement details]
        DIM[dim_table<br/>table dimension, SCD2]
        DWS_OLD[dws_finance_daily_summary<br/>existing global daily summary]
    end

    subgraph "Shared package packages/shared/"
        AM[area_mapping.py<br/>AREA_LABEL_MAP + resolve_area_code]
    end

    subgraph "ETL layer apps/etl/"
        ETL1[DWS_FINANCE_AREA_DAILY<br/>hourly · delete-before-insert]
        ETL2[DWS_FINANCE_BOARD_CACHE<br/>daily · fingerprint comparison]
    end

    subgraph "New DWS tables"
        T1[dws_finance_area_daily<br/>atomic layer · 9 rows/day/site]
        T2[dws_finance_board_cache<br/>cache layer · completed periods]
    end

    subgraph "Backend apps/backend/"
        SVC[board_service.py<br/>cache-first query logic]
        FDW[fdw_queries.py<br/>get_finance_overview_area<br/>get_finance_revenue_area]
    end

    subgraph "Frontend (unchanged)"
        MP[Mini-program finance board page]
    end

    DWD --> ETL1
    DIM --> ETL1
    DWS_OLD --> ETL1
    AM --> ETL1
    AM --> FDW
    ETL1 --> T1
    T1 --> ETL2
    ETL2 --> T2
    T1 --> FDW
    T2 --> FDW
    FDW --> SVC
    SVC --> MP

Data flow

sequenceDiagram
    participant Client as Mini-program
    participant API as FastAPI
    participant SVC as board_service
    participant Cache as dws_finance_board_cache
    participant Daily as dws_finance_area_daily

    Client->>API: GET /api/xcx/board/finance?time=X&area=Y&compare=Z
    API->>SVC: get_finance_board(time, area, compare, site_id)

    alt Completed period (lastMonth/lastWeek/...)
        SVC->>Cache: query cache
        alt cache hit
            Cache-->>SVC: return cached data
        else cache miss
            SVC->>Daily: SUM(area_code=Y, date_range)
            Daily-->>SVC: aggregated result
            SVC->>Cache: write cache
        end
    else Current period (month/week/quarter)
        SVC->>Daily: SUM(area_code=Y, date_range)
        Daily-->>SVC: aggregated result
    end

    alt compare=1
        SVC->>SVC: apply the same logic to the previous period
        SVC->>SVC: calc_compare(current, previous)
    end

    SVC-->>API: FinanceBoardResponse
    API-->>Client: JSON (camelCase)
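
The routing in the sequence diagram can be sketched in a few lines of Python. This is a minimal illustration, not the actual board_service code: load_overview and its three callback parameters (cache_get, cache_set, sum_daily) are hypothetical names that stand in for the cache table and daily table accessors.

```python
from typing import Callable, Optional

# Period names taken from the caching policy in this document.
COMPLETED_RANGES = {"lastMonth", "lastWeek", "lastQuarter", "quarter3", "half6"}
CURRENT_RANGES = {"month", "week", "quarter"}


def load_overview(
    time_range: str,
    area_code: str,
    cache_get: Callable[[str, str], Optional[dict]],
    cache_set: Callable[[str, str, dict], None],
    sum_daily: Callable[[str, str], dict],
) -> dict:
    """Route one lookup: cache first for completed periods, SUM otherwise."""
    if time_range in CURRENT_RANGES:
        # Current periods still change, so they always read the daily table.
        return sum_daily(time_range, area_code)
    if time_range not in COMPLETED_RANGES:
        raise ValueError(f"unknown time_range: {time_range}")

    try:
        cached = cache_get(time_range, area_code)
    except Exception:
        cached = None  # cache unavailable: fall back to the daily table
    if cached is not None:
        return cached

    result = sum_daily(time_range, area_code)
    try:
        cache_set(time_range, area_code, result)
    except Exception:
        pass  # a failed cache write must not fail the request
    return result
```

Passing the accessors in as callables keeps the routing rule testable without a database; a real implementation would presumably also log the swallowed exceptions.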

ETL task dependencies

graph LR
    A[DWD_LOAD_FROM_ODS] --> B[DWS_FINANCE_AREA_DAILY<br/>hourly]
    B --> C[DWS_FINANCE_BOARD_CACHE<br/>once daily]
    A --> D[DWS_FINANCE_DAILY<br/>existing task · unchanged]

Components and Interfaces

1. Shared area mapping: packages/shared/src/neozqyy_shared/area_mapping.py

# Area code → list of physical area names
AREA_LABEL_MAP: dict[str, list[str]] = {
    "hallA":   ["A区"],
    "hallB":   ["B区"],
    "hallC":   ["C区", "TV台", "美洲豹赛台"],
    "vip":     ["VIP包厢"],
    "snooker": ["斯诺克区"],
    "mahjong": ["麻将房", "M7", "M8", "666", "发财"],
    "ktv":     ["K包", "k包活动区", "幸会158"],
}

# All specific area codes (excluding all/hall)
SPECIFIC_AREA_CODES: list[str]  # ["hallA", "hallB", ..., "ktv"]

# All 9 area codes
ALL_AREA_CODES: list[str]  # ["all", "hall", "hallA", ..., "ktv"]

# Reverse mapping: physical area name → area code
_REVERSE_MAP: dict[str, str]  # {"A区": "hallA", "B区": "hallB", ...}

def resolve_area_code(area_name: str | None) -> str | None:
    """Given a site_table_area_name, return the matching area_code. Returns None if unmatched."""

def get_area_labels(area_code: str) -> list[str] | None:
    """Given an area_code, return its list of physical area names. Returns None for all/hall."""

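One way the stubs above could be filled in is shown below. It is a sketch under two assumptions that the spec does not settle: names are matched exactly (no trimming or case folding), and get_area_labels also returns None for codes it does not know.

```python
from typing import Optional

AREA_LABEL_MAP: dict[str, list[str]] = {
    "hallA":   ["A区"],
    "hallB":   ["B区"],
    "hallC":   ["C区", "TV台", "美洲豹赛台"],
    "vip":     ["VIP包厢"],
    "snooker": ["斯诺克区"],
    "mahjong": ["麻将房", "M7", "M8", "666", "发财"],
    "ktv":     ["K包", "k包活动区", "幸会158"],
}

# Derived from the map so the three structures cannot drift apart.
SPECIFIC_AREA_CODES: list[str] = list(AREA_LABEL_MAP)
ALL_AREA_CODES: list[str] = ["all", "hall", *SPECIFIC_AREA_CODES]
_REVERSE_MAP: dict[str, str] = {
    label: code for code, labels in AREA_LABEL_MAP.items() for label in labels
}


def resolve_area_code(area_name: Optional[str]) -> Optional[str]:
    """Exact-match lookup (assumption); None for None or unknown names."""
    if area_name is None:
        return None
    return _REVERSE_MAP.get(area_name)


def get_area_labels(area_code: str) -> Optional[list[str]]:
    """None for all/hall, and (assumption) for any unknown code as well."""
    return AREA_LABEL_MAP.get(area_code)
```

Building _REVERSE_MAP from AREA_LABEL_MAP makes the round-trip between the two lookups hold by construction, provided no label appears under two codes.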
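Design decisions:

  • hall = the sum of all specific areas (excluding all); semantically equivalent to all and kept for backward compatibility
  • all = the sum of all areas
  • An unmatched area_name returns None; the ETL decides whether to log a warning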

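2. ETL task: DWS_FINANCE_AREA_DAILY

Location: apps/etl/connectors/feiqiu/tasks/dws/finance_area_daily.py

Inherits from FinanceBaseTask (reusing its settlement extraction methods) and overrides extract / transform / load:

  • extract: pulls the day's settlements from dwd_settlement_head + dim_table (bounded by the business-day cutoff), and also pulls the global cash-flow, recharge, and card-consumption fields from dws_finance_daily_summary
  • transform: uses resolve_area_code to map each settlement to an area, aggregates the revenue and discount fields by area, and builds 9 rows (7 specific areas + hall + all)
  • load: delete-before-insert (delete by site_id + stat_date, then insert 9 rows)

Key interface: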

class FinanceAreaDailyTask(FinanceBaseTask):
    def get_task_code(self) -> str: return "DWS_FINANCE_AREA_DAILY"
    def get_target_table(self) -> str: return "dws.dws_finance_area_daily"
    def get_primary_keys(self) -> list[str]: return ["site_id", "stat_date", "area_code"]
    def extract(self, context: TaskContext) -> dict: ...
    def transform(self, extracted: dict, context: TaskContext) -> list[dict]: ...
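
The aggregation step of transform might look roughly like the sketch below. It is not the task's real code: build_area_rows, the input keys (area_name, settle_type, gross_amount, discount_total), and the reduced two-field set are all assumptions made for illustration. It follows the error-handling rule that a settlement with an unmapped area still counts toward the all row, and it applies the settle_type IN (1, 3) filter from Property 7 inside the function for brevity.

```python
from decimal import Decimal
from typing import Callable, Optional

SPECIFIC_AREA_CODES = ["hallA", "hallB", "hallC", "vip", "snooker", "mahjong", "ktv"]
SUM_FIELDS = ("gross_amount", "discount_total")  # reduced field set for the sketch


def build_area_rows(
    settlements: list[dict],
    resolve: Callable[[Optional[str]], Optional[str]],
) -> list[dict]:
    """Aggregate settlements into exactly 9 rows: all, hall, and 7 specific areas."""
    zero = Decimal("0")
    rows = {
        code: dict.fromkeys(SUM_FIELDS, zero)
        for code in ["all", "hall", *SPECIFIC_AREA_CODES]
    }
    for s in settlements:
        if s.get("settle_type") not in (1, 3):
            continue  # other settle_type values must not affect the amounts
        code = resolve(s.get("area_name"))
        # Every settlement counts toward "all"; mapped ones also count toward
        # their specific area and toward "hall".
        targets = ["all"] if code is None else ["all", "hall", code]
        for target in targets:
            for field in SUM_FIELDS:
                rows[target][field] += s[field]
    return [
        {
            "area_code": code,
            **vals,
            "confirmed_income": vals["gross_amount"] - vals["discount_total"],
        }
        for code, vals in rows.items()
    ]
```

Because the function is pure (same input, same 9 rows), running it twice for one (site_id, stat_date) yields identical output, which is what makes delete-before-insert safe to repeat.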

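3. ETL task: DWS_FINANCE_BOARD_CACHE

Location: apps/etl/connectors/feiqiu/tasks/dws/finance_board_cache.py

Inherits from BaseDwsTask:

  • extract: iterates over 5 completed periods × 9 areas = 45 combinations and, for each one, reads the daily rows from dws_finance_area_daily
  • transform: computes a data fingerprint (MD5), compares it with the cache table, and marks the combinations that need recomputation
  • load: for combinations that need recomputation, SUMs from the daily table and upserts into the cache table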

Fingerprint computation:

import hashlib
import json

def compute_fingerprint(rows: list[dict]) -> str:
    """MD5 over the (stat_date, gross_amount, discount_total) tuples, sorted by stat_date."""
    # Sort by date so the fingerprint does not depend on the order rows were read in
    sorted_rows = sorted(rows, key=lambda r: str(r['stat_date']))
    payload = json.dumps([(str(r['stat_date']), str(r['gross_amount']), str(r['discount_total'])) for r in sorted_rows])
    return hashlib.md5(payload.encode()).hexdigest()
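
The comparison step in transform can be illustrated as follows. compute_fingerprint is repeated from above so the block runs on its own; needs_recompute is a hypothetical helper name, and treating a missing cached fingerprint as "recompute" is an assumption.

```python
import hashlib
import json
from decimal import Decimal
from typing import Optional


def compute_fingerprint(rows: list[dict]) -> str:
    """MD5 over (stat_date, gross_amount, discount_total), sorted by stat_date."""
    sorted_rows = sorted(rows, key=lambda r: str(r["stat_date"]))
    payload = json.dumps(
        [
            (str(r["stat_date"]), str(r["gross_amount"]), str(r["discount_total"]))
            for r in sorted_rows
        ]
    )
    return hashlib.md5(payload.encode()).hexdigest()


def needs_recompute(rows: list[dict], cached_fingerprint: Optional[str]) -> bool:
    """True when no fingerprint is cached yet or the daily rows have changed."""
    return cached_fingerprint != compute_fingerprint(rows)


rows = [
    {"stat_date": "2026-03-01", "gross_amount": Decimal("100.00"), "discount_total": Decimal("10.00")},
    {"stat_date": "2026-03-02", "gross_amount": Decimal("80.00"), "discount_total": Decimal("0.00")},
]
fingerprint = compute_fingerprint(rows)

# A backfill that changes one amount produces a different fingerprint.
backfilled = [dict(rows[0], gross_amount=Decimal("120.00")), rows[1]]
```

Note that the fingerprint compares string forms, so Decimal("100.0") and Decimal("100.00") hash differently; that is harmless as long as the values always come from the same NUMERIC(14,2) columns.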

4. Backend query changes: fdw_queries.py

New and modified functions:

def get_finance_overview_area(
    conn, site_id: int, start_date: str, end_date: str, area_code: str = "all"
) -> dict:
    """Aggregate the 8 overview metrics from v_dws_finance_area_daily by area_code."""

def get_finance_revenue_area(
    conn, site_id: int, start_date: str, end_date: str, area_code: str = "all"
) -> dict:
    """Aggregate the revenue section data from v_dws_finance_area_daily by area_code."""

def get_finance_board_cache(
    conn, site_id: int, time_range: str, area_code: str
) -> dict | None:
    """Look up the cache in v_dws_finance_board_cache."""

def set_finance_board_cache(
    conn, site_id: int, time_range: str, area_code: str, data: dict
) -> None:
    """Insert or update a cache entry."""

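5. Backend service changes: board_service.py

Changes to get_finance_board:

  • Adds cache lookup: completed periods check the cache first
  • _build_overview now calls get_finance_overview_area (passing area_code)
  • _build_revenue now calls get_finance_revenue_area (passing area_code)
  • _build_cashflow is unchanged (always uses global data)
  • The overview override logic for area≠all is retained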

Data Models

1. dws_finance_area_daily: atomic layer

CREATE TABLE dws.dws_finance_area_daily (
    id              BIGSERIAL PRIMARY KEY,
    site_id         BIGINT NOT NULL,
    tenant_id       BIGINT NOT NULL,
    stat_date       DATE NOT NULL,
    area_code       VARCHAR(20) NOT NULL,
    -- Revenue structure (4 items + gross_amount)
    table_fee_amount        NUMERIC(14,2) NOT NULL DEFAULT 0,
    goods_amount            NUMERIC(14,2) NOT NULL DEFAULT 0,
    assistant_pd_amount     NUMERIC(14,2) NOT NULL DEFAULT 0,
    assistant_cx_amount     NUMERIC(14,2) NOT NULL DEFAULT 0,
    gross_amount            NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Discount breakdown (6 items + discount_total)
    discount_groupbuy       NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_vip            NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_manual         NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_gift_card      NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_rounding       NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_other          NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_total          NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Confirmed income
    confirmed_income        NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Cash flow (only for area_code='all')
    cash_pay_amount         NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_paper_amount       NUMERIC(14,2) NOT NULL DEFAULT 0,
    scan_pay_amount         NUMERIC(14,2) NOT NULL DEFAULT 0,
    groupbuy_pay_amount     NUMERIC(14,2) NOT NULL DEFAULT 0,
    recharge_cash_inflow    NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_inflow_total       NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_outflow_total      NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_balance_change     NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Card consumption (only for area_code='all')
    card_consume_total      NUMERIC(14,2) NOT NULL DEFAULT 0,
    recharge_card_consume   NUMERIC(14,2) NOT NULL DEFAULT 0,
    gift_card_consume       NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Recharge (only for area_code='all')
    recharge_cash           NUMERIC(14,2) NOT NULL DEFAULT 0,
    first_recharge_cash     NUMERIC(14,2) NOT NULL DEFAULT 0,
    renewal_cash            NUMERIC(14,2) NOT NULL DEFAULT 0,
    -- Order statistics
    order_count             INTEGER NOT NULL DEFAULT 0,
    -- Metadata
    created_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (site_id, stat_date, area_code)
);

Constraints and identities:

  • gross_amount = table_fee_amount + goods_amount + assistant_pd_amount + assistant_cx_amount
  • discount_total = discount_groupbuy + discount_vip + discount_manual + discount_gift_card + discount_rounding + discount_other
  • confirmed_income = gross_amount - discount_total
  • area_code ∈ {all, hall, hallA, hallB, hallC, vip, snooker, mahjong, ktv}
  • When area_code ≠ 'all', the cash-flow, card-consumption, and recharge fields = 0
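
These identities are easy to check row by row. The validator below is an illustrative sketch: check_row and the violation labels are hypothetical, and a missing key is treated as 0 to mirror the column defaults.

```python
from decimal import Decimal

REVENUE_FIELDS = ("table_fee_amount", "goods_amount",
                  "assistant_pd_amount", "assistant_cx_amount")
DISCOUNT_FIELDS = ("discount_groupbuy", "discount_vip", "discount_manual",
                   "discount_gift_card", "discount_rounding", "discount_other")
# Fields that may be non-zero only on the area_code='all' row.
ALL_ONLY_FIELDS = (
    "cash_pay_amount", "cash_paper_amount", "scan_pay_amount",
    "groupbuy_pay_amount", "recharge_cash_inflow", "cash_inflow_total",
    "cash_outflow_total", "cash_balance_change",
    "card_consume_total", "recharge_card_consume", "gift_card_consume",
    "recharge_cash", "first_recharge_cash", "renewal_cash",
)
VALID_AREA_CODES = {"all", "hall", "hallA", "hallB", "hallC",
                    "vip", "snooker", "mahjong", "ktv"}
ZERO = Decimal("0")


def check_row(row: dict) -> list[str]:
    """Return labels of the violated rules; an empty list means the row is valid."""
    def total(fields):
        return sum((row.get(f, ZERO) for f in fields), ZERO)

    gross = row.get("gross_amount", ZERO)
    discount = row.get("discount_total", ZERO)
    violations = []
    if row.get("area_code") not in VALID_AREA_CODES:
        violations.append("area_code")
    if gross != total(REVENUE_FIELDS):
        violations.append("gross_amount")
    if discount != total(DISCOUNT_FIELDS):
        violations.append("discount_total")
    if row.get("confirmed_income", ZERO) != gross - discount:
        violations.append("confirmed_income")
    if row.get("area_code") != "all" and any(
        row.get(f, ZERO) != 0 for f in ALL_ONLY_FIELDS
    ):
        violations.append("all_only_fields")
    return violations
```

Using Decimal rather than float keeps the equality checks exact for NUMERIC(14,2) values.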

2. dws_finance_board_cache: cache layer

CREATE TABLE dws.dws_finance_board_cache (
    id              BIGSERIAL PRIMARY KEY,
    site_id         BIGINT NOT NULL,
    time_range      VARCHAR(20) NOT NULL,
    area_code       VARCHAR(20) NOT NULL,
    start_date      DATE NOT NULL,
    end_date        DATE NOT NULL,
    prev_start_date DATE,
    prev_end_date   DATE,
    -- Overview (8 items)
    occurrence          NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount            NUMERIC(14,2) NOT NULL DEFAULT 0,
    discount_rate       NUMERIC(8,4) NOT NULL DEFAULT 0,
    confirmed_revenue   NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_in             NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_out            NUMERIC(14,2) NOT NULL DEFAULT 0,
    cash_balance        NUMERIC(14,2) NOT NULL DEFAULT 0,
    balance_rate        NUMERIC(8,4) NOT NULL DEFAULT 0,
    -- Fingerprint
    data_fingerprint    VARCHAR(64),
    computed_at         TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    -- Metadata
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (site_id, time_range, area_code)
);

Caching policy:

  • time_range ∈ {lastMonth, lastWeek, lastQuarter, quarter3, half6} → cached
  • time_range ∈ {month, week, quarter} → not cached
  • Invalidation condition: data_fingerprint changes (caused by backfilled data)

3. Area mapping data model

from typing import Literal

# area_code enum values
AREA_CODES = Literal[
    "all", "hall", "hallA", "hallB", "hallC",
    "vip", "snooker", "mahjong", "ktv"
]

# AREA_LABEL_MAP structure
AREA_LABEL_MAP: dict[str, list[str]] = {
    "hallA":   ["A区"],
    "hallB":   ["B区"],
    "hallC":   ["C区", "TV台", "美洲豹赛台"],
    "vip":     ["VIP包厢"],
    "snooker": ["斯诺克区"],
    "mahjong": ["麻将房", "M7", "M8", "666", "发财"],
    "ktv":     ["K包", "k包活动区", "幸会158"],
}

Correctness Properties

A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

Property 1: Area mapping round-trip

For any area_name that appears in one of the value lists of AREA_LABEL_MAP, resolve_area_code(area_name) should return the corresponding area_code, and area_name in get_area_labels(resolve_area_code(area_name)) should be True.

Validates: Requirements 1.1, 1.5

Property 2: Unknown area names return None

For any string area_name that is not in any value list of AREA_LABEL_MAP, resolve_area_code(area_name) should return None.

Validates: Requirements 1.4

Property 3: Daily-row arithmetic identities

For any dws_finance_area_daily row, the following three identities must all hold:

  1. gross_amount = table_fee_amount + goods_amount + assistant_pd_amount + assistant_cx_amount
  2. discount_total = discount_groupbuy + discount_vip + discount_manual + discount_gift_card + discount_rounding + discount_other
  3. confirmed_income = gross_amount - discount_total

Validates: Requirements 2.1, 2.2, 2.3, 8.3

Property 4: Cash flow, card consumption, and recharge are zero for non-all areas

For any dws_finance_area_daily row with area_code ≠ 'all', all cash-flow fields (cash_pay_amount, cash_paper_amount, scan_pay_amount, groupbuy_pay_amount, recharge_cash_inflow, cash_inflow_total, cash_outflow_total, cash_balance_change), card-consumption fields (card_consume_total, recharge_card_consume, gift_card_consume), and recharge fields (recharge_cash, first_recharge_cash, renewal_cash) should be 0.

Validates: Requirements 2.5

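Property 5: ETL output completeness and aggregation correctness

For any set of input settlements, the ETL transform should output exactly 9 rows whose area_code values cover all/hall/hallA/hallB/hallC/vip/snooker/mahjong/ktv. The revenue and discount fields of the all row = the sum of the corresponding fields across the hallA through ktv rows, and the revenue and discount fields of the hall row = the sum of the corresponding fields across the hallA through ktv rows.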

Validates: Requirements 2.7, 2.8, 8.4

Property 6: ETL idempotence (delete-before-insert)

For any set of input settlements, running the ETL transform twice for the same (site_id, stat_date) should produce identical output.

Validates: Requirements 3.4

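Property 7: settle_type filtering

For any set of settlements with mixed settle_type values, the ETL processes only records with settle_type IN (1, 3); settlements with other settle_type values should not affect the output amounts.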

Validates: Requirements 3.6

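Property 8: Fingerprint determinism and cache invalidation

For any set of daily rows, compute_fingerprint is deterministic (the same input produces the same output). In addition, for any modification to the source data (changing the gross_amount or discount_total of any row), the new fingerprint should differ from the original.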

Validates: Requirements 5.2, 5.3, 5.4

Property 9: Current periods are not written to the cache

For any time_range ∈ {month, week, quarter}, the ETL cache task should not write a cache record for that time_range.

Validates: Requirements 5.7

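Property 10: Query routing correctness

For any query request: when time_range is a completed period and a cache entry exists, the cached data should be returned directly; when no cache entry exists, the result should be computed by SUM from the daily table and written to the cache; when time_range is a current period, the result should be computed by SUM directly from the daily table without consulting the cache.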

Validates: Requirements 6.1, 6.2, 6.3, 9.4

Property 11: Area filtering behavior

For any query with area_code ≠ 'all', the recharge section should return null, and the data in the cashflow/expense/coach_analysis sections should match the data returned for area_code='all'.

Validates: Requirements 6.7, 6.8

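Property 12: Fixed item counts in revenue

For any revenue section returned by a query, discount_items should contain exactly 5 items (group-buy / member discount / manual adjustment / gift card / other), and channel_items should contain exactly 3 items (stored-value card settlement write-off / cash and online payment / group-buy redemption).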

Validates: Requirements 7.3, 7.4

Property 13: Overview override logic when area≠all

For any query with area_code ≠ 'all', overview.occurrence should equal revenue.total_occurrence, overview.discount should equal revenue.discount_total, and overview.confirmed_revenue should equal revenue.confirmed_total.

Validates: Requirements 7.6

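Property 14: Regression consistency for area=all

For any date range and a query with area_code='all', the 8 overview metrics produced by the new logic (querying dws_finance_area_daily) should exactly match those of the old logic (querying dws_finance_daily_summary).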

Validates: Requirements 9.1

Error Handling

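ETL layer

| Scenario | Handling |
| --- | --- |
| resolve_area_code returns None (unknown area) | Log a WARNING; the settlement is excluded from every specific-area row but still counts toward the all row |
| dws_finance_daily_summary has no data for the day | Fill the cash-flow, recharge, and card-consumption fields of the all row with 0 and log a WARNING |
| No matching table_id in dim_table (scd2_is_current=1) | Treat the settlement's area_code as None and handle as above |
| delete-before-insert transaction fails | Roll back the whole transaction, mark the task as failed, and retry on the next scheduled run |
| Daily table has no data when computing the fingerprint | The fingerprint is the MD5 of the empty string, and the cache entry is marked "no data" |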

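Backend query layer

| Scenario | Handling |
| --- | --- |
| dws_finance_area_daily has no data (new site / new date) | Return an all-zero overview/revenue, consistent with the existing fallback logic |
| Cache write fails | The query result is still returned; log an ERROR and retry the write on the next request |
| Cache table connection fails | Fall back to SUM directly from the daily table without interrupting the request |
| Invalid area_code parameter | Rejected by FastAPI validation against AreaFilterEnum; returns 422 |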

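Data consistency safeguards

  • The ETL delete-before-insert runs inside a single transaction, which guarantees atomicity
  • Cache writes use ON CONFLICT ... DO UPDATE, which guarantees idempotence
  • Backend queries use SET LOCAL app.current_site_id to enforce RLS isolation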
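
The idempotent cache write can be demonstrated without a PostgreSQL instance. The sketch below uses Python's built-in sqlite3 as a stand-in (SQLite 3.24+ accepts the same ON CONFLICT ... DO UPDATE clause); the board_cache table is a cut-down, hypothetical version of dws_finance_board_cache, and make_cache_db / write_cache are illustrative names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE board_cache (
    site_id          INTEGER NOT NULL,
    time_range       TEXT    NOT NULL,
    area_code        TEXT    NOT NULL,
    occurrence       TEXT    NOT NULL,
    data_fingerprint TEXT,
    UNIQUE (site_id, time_range, area_code)
)
"""

UPSERT = """
INSERT INTO board_cache (site_id, time_range, area_code, occurrence, data_fingerprint)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (site_id, time_range, area_code) DO UPDATE SET
    occurrence = excluded.occurrence,
    data_fingerprint = excluded.data_fingerprint
"""


def make_cache_db() -> sqlite3.Connection:
    """Create an in-memory database holding the reduced cache table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def write_cache(conn, site_id, time_range, area_code, occurrence, fingerprint) -> None:
    """Insert or overwrite one cache entry; repeating the call leaves one row."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(UPSERT, (site_id, time_range, area_code, occurrence, fingerprint))
```

Writing the same entry twice leaves a single row, and writing a changed entry overwrites it in place, which is the behavior the retry-on-next-request rule relies on.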

Testing Strategy

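Property-based testing

Uses hypothesis (Python), with at least 100 iterations per property test.

| Property | Test file | Generator |
| --- | --- | --- |
| Property 1-2 | tests/test_area_mapping_props.py | st.text() generates random area_name values |
| Property 3-7 | tests/test_finance_area_daily_props.py | Random settlement lists (amounts from st.decimals; area_name drawn from a mix of known and unknown names) |
| Property 8-9 | tests/test_finance_board_cache_props.py | Random lists of daily rows |
| Property 10-14 | tests/test_board_service_props.py | Random query parameters + mocked database responses |

Every test function must carry a comment tag:

# Feature: board-finance-dws-area-refactor, Property 1: Area mapping round-trip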

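Unit tests

| Scope | Test file | Focus |
| --- | --- | --- |
| area_mapping | tests/test_area_mapping_unit.py | Edge cases: empty string, None, letter case, special characters |
| ETL transform | apps/etl/connectors/feiqiu/tests/unit/test_finance_area_daily.py | discount_gift_card definition check, business-day cutoff boundaries |
| ETL cache | apps/etl/connectors/feiqiu/tests/unit/test_finance_board_cache.py | Fingerprint change detection, empty-data handling |
| Backend queries | apps/backend/tests/unit/test_fdw_queries_area.py | SQL correctness, area_code filtering, cache hit/miss |
| Backend service | apps/backend/tests/unit/test_board_service_area.py | Override logic, period-over-period comparison, fallback behavior |
| Regression validation | scripts/ops/validate_board_finance.py | Full comparison across all 144 combinations |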

Test configuration

# conftest.py / hypothesis settings
from hypothesis import settings
settings.register_profile("ci", max_examples=100)
settings.register_profile("dev", max_examples=30)

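Integration validation

  • Full validation script over all 144 combinations: scripts/ops/validate_board_finance.py (8 time_range × 9 area_code × 2 compare)
  • area=all regression comparison: diff the outputs of the new and old logic
  • Cache hit-rate check: a second request for a completed period does not trigger a SUM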
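
The 144-combination count follows from a Cartesian product, sketched below. This is not the contents of validate_board_finance.py: the 8 time_range values are assumed to be the 5 cached plus 3 uncached periods named in the caching policy, and the compare flag is assumed to take the values 0 and 1.

```python
from itertools import product

# Assumed to be the full set: 3 current periods + 5 completed periods.
TIME_RANGES = ["month", "week", "quarter",
               "lastMonth", "lastWeek", "lastQuarter", "quarter3", "half6"]
AREA_CODES = ["all", "hall", "hallA", "hallB", "hallC",
              "vip", "snooker", "mahjong", "ktv"]
COMPARE_FLAGS = [0, 1]  # assumed encoding of the compare query parameter


def all_combinations() -> list[tuple[str, str, int]]:
    """Every (time_range, area_code, compare) triple the validation should cover."""
    return list(product(TIME_RANGES, AREA_CODES, COMPARE_FLAGS))


combos = all_combinations()
print(len(combos))  # 8 * 9 * 2 = 144
```

Each triple maps directly onto one request of the form GET /api/xcx/board/finance?time=X&area=Y&compare=Z.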