Compare commits

..

3 Commits

Author SHA1 Message Date
Neo
90fb63feaf Tidy up SQL comments 2025-12-13 08:26:09 +08:00
Neo
0ab040b9fb Tidy up project 2025-12-09 05:43:04 +08:00
Neo
0c29bd41f8 Tidy up project 2025-12-09 05:42:57 +08:00
40 changed files with 12899 additions and 3630 deletions

119
README.md

@@ -1,88 +1,57 @@
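# 飞球 ETL System # 飞球 ETL System (ODS → DWD)
An ETL pipeline for store operations: it pulls order/payment/member/inventory JSON from the upstream API, lands it in ODS, then cleans and loads it into DWD (SCD2 dimensions, incremental facts), and provides quality-check and regression-verification tools. An ETL for store operations: it pulls upstream JSON (or ingests it offline), lands it in ODS, cleans and loads it into DWD (SCD2 dimensions, incremental facts), and provides a quality-check report.
## Key Features ## Quick Run (offline sample JSON)
- Two-layer design: ODS keeps the raw data and DWD holds cleaned, standardized data, supporting replay and reload. 1) Environment: Python 3.10+, PostgreSQL; key `.env` entries: `PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test`, `INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc`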
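- Task scheduling: ETLScheduler manages tasks, logging, and failure isolation in one place; CLI-friendly.
- Configuration: defaults + .env + CLI overrides, convenient for running in multiple environments.
- Bulk loading: generic ODS Loader / SCD2 dimension merge / incremental fact writes.
- Regression checks: sample JSON, row-count comparison, and quality reports for quick verification.
## Repository Structure
- etl_billiards/config: default configuration, environment variable parsing, CLI overrides.
- etl_billiards/api: HTTP client with retry and pagination wrappers.
- etl_billiards/database: connection management, bulk upsert wrappers, DDL.
- etl_billiards/tasks: business tasks (ODS/DWD/initialization/manual ingest, etc.).
- etl_billiards/loaders: ODS/DWD/SCD Loader implementations.
- etl_billiards/orchestration: scheduler and task registration.
- etl_billiards/scripts: test, rebuild, and health-check scripts.
- etl_billiards/reports: quality report output.
- etl_billiards/docs: ODS->DWD mapping notes and sample JSON notes.
## Main Supported Tasks
- ODS: order settlement, table-fee transactions, assistant transactions/cancellations, inventory, payments, refunds, members, recharge settlements, etc.
- DWD: dimension tables (store/table/member/assistant/goods, etc.) and fact tables (settlement, payment, refund, recharge, table fee, goods sales, etc.).
- Initialization and manual ingest: INIT_ODS_SCHEMA, MANUAL_INGEST.
## Quick Start
1) Environment: Python 3.10+ with PostgreSQL available; run commands under etl_billiards/.
2) Install dependencies: 2) Install dependencies: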
```bash ```bash
cd etl_billiards cd etl_billiards
pip install -r requirements.txt pip install -r requirements.txt
# Development mode: pip install -e .
``` ```
3) Configure .env (sample key entries): 3) One-shot ODS→DWD→quality check
```bash ```bash
PG_DSN=postgresql://user:pwd@host:5432/LLZQ-test python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,INIT_DWD_SCHEMA --pipeline-flow INGEST_ONLY
API_BASE=https://api.example.com python -m etl_billiards.cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "C:\dev\LLTQ\export\test-json-doc"
API_TOKEN=your_token python -m etl_billiards.cli.main --tasks DWD_LOAD_FROM_ODS --pipeline-flow INGEST_ONLY
STORE_ID=2790685415443269 python -m etl_billiards.cli.main --tasks DWD_QUALITY_CHECK --pipeline-flow INGEST_ONLY
EXPORT_ROOT=C:\dev\LLTQ\export\JSON # Report: etl_billiards/reports/dwd_quality_report.json
LOG_ROOT=C:\dev\LLTQ\export\LOG
INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc
```
4) Initialize schemas and tables:
```bash
python -m cli.main --tasks INIT_ODS_SCHEMA --pipeline-flow INGEST_ONLY --ingest-source "C:\dev\LLTQ\export\test-json-doc"
# Or run schema_*.sql directly with psql
```
5) Run tasks (examples):
```bash
# Default task list (see config/defaults.py)
python -m cli.main
# Specific tasks
python -m cli.main --tasks settlement_records,recharge_settlements
# Manual ingest of sample JSON only
python -m cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "C:\dev\LLTQ\export\test-json-doc"
``` ```
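## Execution and Data Flow ## Directories and File Roles
- CLI parses arguments -> AppConfig.load merges configuration -> ETLScheduler creates the DB/API/logging context -> tasks are instantiated -> fetch/clean/write. - Root directory: `etl_billiards/` main code; `requirements.txt` dependencies; `run_etl.sh/.bat` launch scripts; `.env/.env.example` configuration; `tmp/` holds drafts/debug files/backups.
- ODS tasks: call the API with pagination, extract and parse fields, then bulk upsert; payload keeps the raw JSON. - etl_billiards/ main directory
- DWD tasks: dimension tables use SCD2; fact tables are written incrementally by time watermark. - `config/`: `defaults.py` default values, `env_parser.py` parses .env, `settings.py` unified configuration loading.
- `api/`: `client.py` HTTP requests, retry, and pagination.
- `database/`: `connection.py` connection wrapper, `operations.py` bulk upsert; DDL: `schema_ODS_doc.sql`, `schema_dwd_doc.sql`.
- `tasks/`: business tasks
- `init_schema_task.py`: INIT_ODS_SCHEMA / INIT_DWD_SCHEMA.
- `manual_ingest_task.py`: sample JSON → ODS.
- `dwd_load_task.py`: ODS → DWD (mapping, SCD2/incremental facts).
- Other tasks can be added as needed.
- `loaders/`: ODS/DWD/SCD2 Loader implementations.
- `scd/`: `scd2_handler.py` handles SCD2 history for dimensions.
- `quality/`: quality checkers (row-count/amount comparison).
- `orchestration/`: `scheduler.py` scheduling; `task_registry.py` task registration; `run_tracker.py` run records.
- `scripts/`: rebuild/test/health-check tools.
- `docs/`: `ods_to_dwd_mapping.md` mapping notes, `ods_sample_json.md` sample JSON notes, `dwd_quality_check.md` quality-check notes.
- `reports/`: quality-check output (such as `dwd_quality_report.json`).
- `tests/`: unit/integration tests; `utils/`: shared utilities.
- `backups/` (if present): backups of key files.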
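## Testing and Regression ## Business Flow and File Relationships
- Unit/integration tests: pytest or python scripts/run_tests.py --suite online. 1) Scheduling entry point: `cli/main.py` parses the CLI → `orchestration/scheduler.py` creates tasks according to `task_registry.py` → initializes the DB/API/Config context.
- Offline mode: TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=... pytest tests/unit/test_etl_tasks_offline.py. 2) ODS: `init_schema_task.py` runs `schema_ODS_doc.sql` to create tables; `manual_ingest_task.py` reads JSON from `INGEST_SOURCE_DIR` and bulk upserts it into ODS.
- Database connectivity: python scripts/test_db_connection.py --dsn <PG_DSN> --query "SELECT 1". 3) DWD: `init_schema_task.py` runs `schema_dwd_doc.sql` to create tables; `dwd_load_task.py` cleans data from ODS and writes it to DWD according to `TABLE_MAP/FACT_MAPPINGS`; dimensions use SCD2 (`scd/scd2_handler.py`), facts are incremental by time/watermark.
4) Quality check: the quality task reads ODS/DWD, computes row counts/amounts, and outputs `reports/dwd_quality_report.json`.
5) Configuration: `config/defaults.py` + `.env` + CLI arguments are layered; HTTP (when online mode is enabled) goes through `api/client.py`; DB access goes through `database/connection.py`.
6) Docs: `docs/ods_to_dwd_mapping.md` records the field mapping; `docs/ods_sample_json.md` describes the sample data structure for side-by-side debugging.
## Other Notes ## Current Status (2025-12-09)
- .env.example lists all configuration entries; config/defaults.py provides defaults and task windows. - All sample JSON has been ingested; DWD row counts match ODS.
- loaders/ods/generic.py supports custom primary keys/conflict columns; tasks/manual_ingest_task.py can quickly ingest sample JSON into the matching ODS tables. - The category dimension has been flattened into top-level + sub-level categories: `dim_goods_category` has 26 rows (category_level/leaf are populated).
- Adding a new task: implement it in tasks/ and register it in orchestration/task_registry.py. - Most remaining nulls are empty in the source data; confirm whether upstream provides the values before backfilling.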
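## ODS Tasks and Scheduling
- Registration: INIT_ODS_SCHEMA and MANUAL_INGEST are enabled in etl_admin.etl_task (store_id=2790685415443269); other tasks can be added as needed.
- Sample data directory: defaults to C:\dev\LLTQ\export\test-json-doc (can be overridden by INGEST_SOURCE_DIR in .env).
- One-shot rebuild + ingest:
`bash
python -m cli.main --tasks INIT_ODS_SCHEMA,MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "C:\dev\LLTQ\export\test-json-doc"
`
- Row-count comparison: etl_billiards/ods_row_report.json stores the sample JSON row counts alongside the ODS row counts and can be used for regression checks.
- Backups: etl_billiards/backups/ keeps the current versions of schema_ODS_doc.sql and tasks/manual_ingest_task.py.
- Recharge settlement ODS: recharge_settlements has been flattened by settleList into primary fields (recharge_order_id as primary key, plus amount/status/snapshot columns), while site_profile and payload keep the raw JSON. The recharge_settlements task writes directly to this table, and manual ingest automatically expands recharge_settlements.json.
## Candidates for Trimming/Archiving
- Drafts, old backups, and debug scripts in `tmp/` and `tmp/etl_billiards_misc/` are for reference only; nothing depends on them at runtime.
- The root keeps only the necessary files (README, requirements, run_etl.*, .env/.env.example); other temporary files have been moved to tmp.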

216
README_FULL.md Normal file

@@ -0,0 +1,216 @@
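# 飞球 ETL System (ODS → DWD): Detailed Edition
> This document is the detailed project guide. It is kept consistent with the current code and covers ODS tasks, DWD loading, quality checks, and key points for development and extension.
---
## 1. Project Overview
An ETL for store operations: it collects orders, payments, members, inventory, and other data from the upstream API or offline JSON, lands it in **ODS** first, then cleans and loads it into **DWD** (SCD2 dimensions, incremental facts), and outputs a quality-check report. The project uses a modular, layered architecture (configuration, API, database, Loader/SCD, quality, scheduling, CLI, tests), and everything is driven through the CLI.
---
## 2. Quick Start (offline sample JSON)
**Requirements**: Python 3.10+ and PostgreSQL. Key `.env` entries:
- `PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test`
- `INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc`
**Install dependencies**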
```bash
cd etl_billiards
pip install -r requirements.txt
```
**One-shot ODS → DWD → quality check (offline replay)**
```bash
# Initialize ODS + DWD
python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,INIT_DWD_SCHEMA --pipeline-flow INGEST_ONLY
# Ingest sample JSON into ODS (can be overridden with INGEST_SOURCE_DIR in .env)
python -m etl_billiards.cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "C:\dev\LLTQ\export\test-json-doc"
# Load DWD from ODS
python -m etl_billiards.cli.main --tasks DWD_LOAD_FROM_ODS --pipeline-flow INGEST_ONLY
# Quality-check report
python -m etl_billiards.cli.main --tasks DWD_QUALITY_CHECK --pipeline-flow INGEST_ONLY
# Report output: etl_billiards/reports/dwd_quality_report.json
```
> Steps can also be run individually as needed:
> - Create tables only: `python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA`
> - ODS ingest only: `python -m etl_billiards.cli.main --tasks MANUAL_INGEST`
> - DWD load only: `python -m etl_billiards.cli.main --tasks INIT_DWD_SCHEMA,DWD_LOAD_FROM_ODS`
---
## 3. Configuration and Paths
- Sample data directory: `C:\dev\LLTQ\export\test-json-doc` (can be overridden by `INGEST_SOURCE_DIR` in `.env`).
- Log/export directories: `LOG_ROOT` and `EXPORT_ROOT` (in `.env`).
- Report: `etl_billiards/reports/dwd_quality_report.json`.
- DDL: `etl_billiards/database/schema_ODS_doc.sql` and `etl_billiards/database/schema_dwd_doc.sql`.
- Task registration: `etl_billiards/orchestration/task_registry.py` (INIT_ODS_SCHEMA, MANUAL_INGEST, INIT_DWD_SCHEMA, DWD_LOAD_FROM_ODS, and DWD_QUALITY_CHECK are enabled by default).
**Security note**: store database credentials in `.env` or a controlled secrets manager, and use a least-privilege account in production.
---
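## 4. Directory Structure and Key Files
- Root directory: `etl_billiards/` main code; `requirements.txt` dependencies; `run_etl.sh/.bat` launch scripts; `.env/.env.example` configuration; `tmp/` drafts/debug archive.
- `config/`: `defaults.py` default values, `env_parser.py` parses .env, `settings.py` loads everything through AppConfig.
- `api/`: `client.py` HTTP requests, retry, pagination.
- `database/`: `connection.py` connection wrapper; `operations.py` bulk upsert; DDL SQL (ODS/DWD).
- `tasks/`:
  - `init_schema_task.py` (INIT_ODS_SCHEMA/INIT_DWD_SCHEMA);
  - `manual_ingest_task.py` (sample JSON → ODS);
  - `dwd_load_task.py` (ODS → DWD mapping, SCD2/incremental facts);
  - other tasks can be added as needed.
- `loaders/`: ODS/DWD/SCD2 Loader implementations.
- `scd/`: `scd2_handler.py` handles SCD2 history for dimensions.
- `quality/`: quality checkers (row-count/amount comparison).
- `orchestration/`: `scheduler.py` scheduling; `task_registry.py` registration; `run_tracker.py` run records; `cursor_manager.py` watermark management.
- `scripts/`: rebuild/test/health-check tools.
- `docs/`: `ods_to_dwd_mapping.md` mapping notes; `ods_sample_json.md` sample JSON notes; `dwd_quality_check.md` quality-check notes.
- `reports/`: quality-check output (such as `dwd_quality_report.json`).
- `tests/`: unit/integration tests; `utils/`: shared utilities; `backups/`: backups (if present).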
---
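## 5. Architecture and Flow
Execution chain (control flow):
1) The CLI (`cli/main.py`) parses arguments → builds AppConfig → initializes logging and the DB connection;
2) The scheduling layer (`scheduler.py`) instantiates tasks from the registry in `task_registry.py` and sets the run_uuid, cursor (watermark), and context;
3) Task base-class template:
   - Get the time window/watermark (cursor_manager);
   - Fetch data: online mode calls `api/client.py` with pagination and retry; offline mode reads JSON files directly;
   - Parse and validate: type conversion and required-field checks (for example, parse/validate inside the task);
   - Load: call a Loader (`loaders/`) to perform bulk upsert/SCD2/incremental writes (backed by `database/operations.py`);
   - Quality check (if needed): the quality module compares row counts, amounts, and so on;
   - Update the watermark and run record (`run_tracker.py`), then commit or roll back the transaction.
Data flow and dependencies:
- Configuration: `config/defaults.py` + `.env` + CLI arguments are layered to form AppConfig.
- API access: `api/client.py` provides pagination/retry; offline ingest reads files directly.
- DB access: `database/connection.py` provides the connection context; `operations.py` handles bulk upsert/paged writes.
- ODS: `manual_ingest_task.py` reads JSON → ODS tables (keeping payload/source/timestamp).
- DWD: `dwd_load_task.py` selects fields from ODS according to `TABLE_MAP/FACT_MAPPINGS`; dimensions use SCD2 (`scd/scd2_handler.py`); facts are incremental; field expressions (JSON->>, CAST) are supported.
- Quality check: the `quality` module or related tasks compare ODS/DWD row counts, amounts, and so on, and write output to `reports/`.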
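
The configuration layering described above (`config/defaults.py` + `.env` + CLI arguments) can be sketched as a simple ordered merge. This is a minimal illustration, not the project's `AppConfig` implementation: the function name `merge_config`, the keys, and the rule that `None` means "not set" are all assumptions made for the example.

```python
from typing import Any, Mapping


def merge_config(*layers: Mapping[str, Any]) -> dict:
    """Merge layers left to right; later layers win.

    A value of None is treated as "not provided" and never overrides
    an earlier layer (an assumption of this sketch).
    """
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if value is not None:
                merged[key] = value
    return merged


# Hypothetical values for the three layers.
defaults = {"timezone": "Asia/Taipei", "page_size": 200, "pipeline_flow": None}
env_values = {"pg_dsn": "postgresql://user:pwd@host:5432/db"}
cli_values = {"pipeline_flow": "INGEST_ONLY", "page_size": None}

config = merge_config(defaults, env_values, cli_values)
```

Passing the layers in order of increasing priority keeps the override rule in one place, so adding a fourth source later would only mean adding one more argument.
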
---
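## 6. ODS → DWD Strategy
1. ODS as the system of record: keep the source primary key, payload, and time/source information.
2. DWD cleaning: SCD2 for dimensions; facts incremental by time/watermark; standardize field types, units, and enums while keeping traceable fields.
3. Unified business keys: site_id, member_id, table_id, order_settle_id, order_trade_no, and so on use the same names everywhere.
4. No over-aggregation: DWD only holds detail rows with light cleaning; aggregation is left to DWS/reports.
5. De-nesting: arrays are expanded into child tables/child rows, and repeated profiles are extracted into dimensions.
6. Long-term evolution: prefer adding columns/tables over frequently altering existing table structures.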
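
The SCD2 handling for dimensions mentioned above can be illustrated with a small in-memory sketch. It is not the code in `scd/scd2_handler.py`: the class `DimVersion`, the function `scd2_merge`, and the `valid_from`/`valid_to` fields are hypothetical names chosen for the example, and a real handler would operate on database rows instead of a Python list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class DimVersion:
    business_key: str
    attrs: dict
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current version


def scd2_merge(history: List[DimVersion], business_key: str,
               attrs: dict, now: datetime) -> str:
    """Apply one incoming dimension record to the version history."""
    current = next(
        (v for v in history
         if v.business_key == business_key and v.valid_to is None),
        None,
    )
    if current is not None and current.attrs == attrs:
        return "unchanged"          # nothing changed, keep the open version
    if current is not None:
        current.valid_to = now      # close the previous version
    history.append(DimVersion(business_key, dict(attrs), now))
    return "versioned" if current is not None else "inserted"
```

Closing the old version instead of overwriting it is what lets fact rows be joined to the attribute values that were valid at the time of the event.
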
---
## 7. Common CLI Commands
```bash
# Run all registered tasks
python -m etl_billiards.cli.main
# Run specific tasks
python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,MANUAL_INGEST
# Override the DSN
python -m etl_billiards.cli.main --pg-dsn "postgresql://user:pwd@host:5432/db"
# Override the API
python -m etl_billiards.cli.main --api-base "https://api.example.com" --api-token "..."
# Dry run (no database writes)
python -m etl_billiards.cli.main --dry-run --tasks DWD_LOAD_FROM_ODS
```
---
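## 8. Testing (ONLINE / OFFLINE)
- `TEST_MODE=ONLINE`: calls the real API and runs the full E/T/L chain.
- `TEST_MODE=OFFLINE`: reads offline JSON from `TEST_JSON_ARCHIVE_DIR` and runs only Transform + Load.
- `TEST_DB_DSN`: if set, integration tests connect to a real database; if unset, an in-memory/temporary database is used.
Examples: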
```bash
TEST_MODE=ONLINE pytest tests/unit/test_etl_tasks_online.py
TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=tests/source-data-doc pytest tests/unit/test_etl_tasks_offline.py
python scripts/test_db_connection.py --dsn postgresql://user:pwd@host:5432/db --query "SELECT 1"
```
---
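## 9. Development and Extension
- New task: subclass BaseTask in `tasks/`, implement `get_task_code/execute`, and register it in `orchestration/task_registry.py`.
- New Loader/Checker: follow `loaders/` and `quality/`, and reuse the bulk upsert/quality-check interfaces.
- Configuration: `config/defaults.py` + `.env` + CLI are layered; new settings must be declared in both defaults and env_parser.
---
## 10. ODS Task Rollout Guide
- Task registration script: `etl_billiards/database/seed_ods_tasks.sql` (replace store_id, then run `psql "$PG_DSN" -f ...`).
- Confirm that the required ODS tasks are enabled in `etl_admin.etl_task`.
- Offline replay: `scripts/rebuild_ods_from_json` (if present) can rebuild ODS from local JSON.
- Unit tests: `pytest etl_billiards/tests/unit/test_ods_tasks.py`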
---
## 11. ODS Table Overview (data paths)
| ODS table | API path | Data list path |
| ------------------------------------ | ------------------------------------------------- | ----------------------------- |
| assistant_accounts_master | /PersonnelManagement/SearchAssistantInfo | data.assistantInfos |
| assistant_service_records | /AssistantPerformance/GetOrderAssistantDetails | data.orderAssistantDetails |
| assistant_cancellation_records | /AssistantPerformance/GetAbolitionAssistant | data.abolitionAssistants |
| goods_stock_movements | /GoodsStockManage/QueryGoodsOutboundReceipt | data.queryDeliveryRecordsList |
| goods_stock_summary | /TenantGoods/GetGoodsStockReport | data |
| group_buy_packages | /PackageCoupon/QueryPackageCouponList | data.packageCouponList |
| group_buy_redemption_records | /Site/GetSiteTableUseDetails | data.siteTableUseDetailsList |
| member_profiles | /MemberProfile/GetTenantMemberList | data.tenantMemberInfos |
| member_balance_changes | /MemberProfile/GetMemberCardBalanceChange | data.tenantMemberCardLogs |
| member_stored_value_cards | /MemberProfile/GetTenantMemberCardList | data.tenantMemberCards |
| payment_transactions | /PayLog/GetPayLogListPage | data |
| platform_coupon_redemption_records | /Promotion/GetOfflineCouponConsumePageList | data |
| recharge_settlements | /Site/GetRechargeSettleList | data.settleList |
| refund_transactions | /Order/GetRefundPayLogList | data |
| settlement_records | /Site/GetAllOrderSettleList | data.settleList |
| settlement_ticket_details | /Order/GetOrderSettleTicketNew | Full JSON |
| site_tables_master | /Table/GetSiteTables | data.siteTables |
| stock_goods_category_tree | /TenantGoodsCategory/QueryPrimarySecondaryCategory| data.goodsCategoryList |
| store_goods_master | /TenantGoods/GetGoodsInventoryList | data.orderGoodsList |
| store_goods_sales_records | /TenantGoods/GetGoodsSalesList | data.orderGoodsLedgers |
| table_fee_discount_records | /Site/GetTaiFeeAdjustList | data.taiFeeAdjustInfos |
| table_fee_transactions | /Site/GetSiteTableOrderDetails | data.siteTableUseDetailsList |
| tenant_goods_master | /TenantGoods/QueryTenantGoods | data.tenantGoodsList |
> For the complete field-level mapping, see `docs/` and the ODS/DWD DDL.
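
The data list path column of the table above gives a dotted path into each API response. A minimal sketch of how such a path could be resolved is shown below; the function name `extract_records` and the fallback behavior (empty list for a missing path, single object wrapped in a list) are assumptions of this example, not the project's actual loader logic. The "full JSON" case for `settlement_ticket_details` is outside what this helper covers.

```python
from typing import Any


def extract_records(payload: dict, list_path: str) -> list:
    """Walk a dotted path such as 'data.settleList' and return the records."""
    node: Any = payload
    for key in list_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return []               # path missing: treat as no records
        node = node[key]
    if node is None:
        return []
    # Some endpoints return the list directly under 'data'; if a single
    # object is found instead, wrap it so callers always get a list.
    return node if isinstance(node, list) else [node]


# Hypothetical response shaped like the settlement list endpoint.
response = {"code": 0, "data": {"settleList": [{"id": 1}, {"id": 2}]}}
records = extract_records(response, "data.settleList")
```
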
---
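## 12. DWD Dimensions and Modeling Principles
1. One grain, one business key: each DWD table carries exactly one kind of business event at one grain; do not mix grains.
2. Understand the business chain first, then model; do not mechanically create tables from JSON lists.
3. Unified business keys: site_id, member_id, table_id, order_settle_id, order_trade_no, and so on must be named consistently.
4. Keep detail rows and avoid over-aggregation; aggregation belongs in DWS/reports.
5. Clean and standardize while keeping traceable fields (source primary key, time, amount, payload).
6. De-nest and decouple: expand arrays into child rows and extract repeated profiles into dimensions.
7. Evolve by adding columns/tables first, minimizing breaking changes to existing table structures.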
---
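## 13. Current Status (2025-12-09)
- All sample JSON has been ingested; DWD row counts match ODS.
- The category dimension has been flattened into top-level + sub-level categories: `dim_goods_category` has 26 rows (category_level/leaf are populated).
- Some empty fields are empty in the source data itself; confirm with upstream before backfilling.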
---
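## 14. Candidates for Trimming/Archiving
- Drafts, old backups, and debug scripts in `tmp/` and `tmp/etl_billiards_misc/` are for reference only and do not affect execution.
- The root keeps only the necessary files (README, requirements, run_etl.*, .env/.env.example); other temporary files have been moved to tmp.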
---
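## 15. FAQ
- Empty field values: if the mapping exists and the source column is non-empty but the field is still empty, check the upstream JSON next; dimension SCD2 merges on the full data set.
- DSN/paths: confirm that `PG_DSN` and `INGEST_SOURCE_DIR` in `.env` match your local environment.
- New tasks: implement in `tasks/` and register in `task_registry.py`; update the DDL and mappings as well when necessary.
- Permissions/execution: check the network and account permissions; scripts need execute permission (for example, `chmod +x run_etl.sh`).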


@@ -1,837 +0,0 @@
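# Billiards Hall ETL System (Modular Version): Merged Documentation
This is a merged version of several earlier documents (such as `INDEX.md`, `QUICK_START.md`, `ARCHITECTURE.md`, `MIGRATION_GUIDE.md`, `PROJECT_STRUCTURE.md`, and `README.md`). It keeps only content relevant to **the current project itself**: project description, directory structure, architecture, data and control flow, and migration and extension guides. It does not include change history or descriptions of the refactoring process.
---
## 1. Project Overview
The Billiards Hall ETL System is an ETL project for store operations. It pulls orders, payments, members, and other data from an external business API, then parses, validates, applies SCD2 handling, runs quality checks, and writes the result to a PostgreSQL database. It supports incremental sync and task run tracking.
The system uses a modular, layered architecture. Core features:
- Modular directory structure (configuration, database, API, models, loaders, SCD2, quality checks, orchestration, tasks, CLI, utilities, tests, and so on, with clear layering).
- Complete configuration management: defaults + environment variables + CLI arguments, overriding in layers.
- Reusable database access layer (connection management, bulk upsert wrappers).
- API client with retry and pagination support.
- Type-safe data parsing and validation modules.
- SCD2 dimension history management.
- Data quality checks (for example, balance consistency checks).
- Orchestration layer for unified scheduling, cursor management, and run tracking.
- A command-line entry point that manages task execution, with task filtering, dry-run, and other modes.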
---
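## 2. Quick Start
### 2.1 Environment Setup
- Python version: 3.10+ recommended
- Database: PostgreSQL
- Operating system: Windows / Linux / macOS
```bash
# After cloning/downloading the code, enter the project directory
cd etl_billiards/
ls -la
```
You will see the top level of the directory structure below (see Chapter 4 for details):
- `config/` - configuration management
- `database/` - database access
- `api/` - API client
- `tasks/` - ETL task implementations
- `cli/` - command-line entry point
- `docs/` - technical documentation
### 2.2 Install Dependencies
```bash
pip install -r requirements.txt
```
Main dependencies, for example (the actual `requirements.txt` is authoritative):
- `psycopg2-binary`: PostgreSQL driver
- `requests`: HTTP client
- `python-dateutil`: date/time handling
- `tzdata`: time zone data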
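### 2.3 Configure Environment Variables
Copy and edit the environment variable template:
```bash
cp .env.example .env
# Edit .env with your preferred editor
```
Sample `.env` (minimal configuration):
```bash
# Database
PG_DSN=postgresql://user:password@localhost:5432/....
# API
API_BASE=https://api.example.com
API_TOKEN=your_token_here
# Store/application
STORE_ID=2790685415443269
TIMEZONE=Asia/Taipei
# Directories
EXPORT_ROOT=/path/to/export
LOG_ROOT=/path/to/logs
```
> Default values for all settings are in `config/defaults.py`; the effective configuration is the three-layer combination of defaults + environment variables + CLI arguments.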
### 2.4 Run Your First Task
Run through the CLI entry point:
```bash
# Run all tasks
python -m cli.main
# Run only the orders task
python -m cli.main --tasks ORDERS
# Run orders + payments
python -m cli.main --tasks ORDERS,PAYMENTS
# On Windows, use the script
run_etl.bat --tasks ORDERS
# On Linux / macOS, use the script
./run_etl.sh --tasks ORDERS
```
### 2.5 View Results
- Log directory: set by `LOG_ROOT`, for example
```bash
ls -la C:\dev\LLTQ\export\LOG/
```
- Export directory: set by `EXPORT_ROOT`, for example
```bash
ls -la C:\dev\LLTQ\export\JSON/
```
---
## 3. Common Commands and Development Tools
### 3.1 Common CLI Commands
```bash
# Run all tasks
python -m cli.main
# Run specific tasks
python -m cli.main --tasks ORDERS,PAYMENTS,MEMBERS
# Use a custom database
python -m cli.main --pg-dsn "postgresql://user:password@host:5432/db"
# Use a custom API endpoint
python -m cli.main --api-base "https://api.example.com" --api-token "..."
# Dry run (no database writes)
python -m cli.main --dry-run --tasks ORDERS
```
### 3.2 IDE / Code Quality Tooling Example (VSCode)
Sample `.vscode/settings.json`:
```json
{
  "python.linting.enabled": true,
  "python.linting.pylintEnabled": true,
  "python.formatting.provider": "black",
  "python.testing.pytestEnabled": true
}
```
Code formatting and linting:
```bash
pip install black isort pylint
black .
isort .
pylint etl_billiards/
```
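### 3.3 Testing
```bash
# Install test dependencies (as needed)
pip install pytest pytest-cov
# Run all tests
pytest
# Run unit tests only
pytest tests/unit/
# Generate a coverage report
pytest --cov=. --cov-report=html
```
Sample tests (the actual project is authoritative):
- `tests/unit/test_config.py`: configuration management unit tests
- `tests/unit/test_parsers.py`: parser unit tests
- `tests/integration/test_database.py`: database integration tests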
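#### 3.3.1 Test Modes (ONLINE / OFFLINE)
- With `TEST_MODE=ONLINE` (the default), tests simulate the live API and run the full E/T/L.
- With `TEST_MODE=OFFLINE`, tests read data from the archived JSON in `TEST_JSON_ARCHIVE_DIR` and run only Transform + Load, which is useful for verifying that locally archived data can still be replayed.
- `TEST_JSON_ARCHIVE_DIR`: offline JSON archive directory (for example, `tests/source-data-doc` or a snapshot produced by CI).
- `TEST_JSON_TEMP_DIR`: temporary JSON output directory generated by tests, which keeps each run's data isolated.
- `TEST_DB_DSN`: optional. If set, unit tests connect to this PostgreSQL DSN and actually write to the database; if left empty, tests use an in-memory fake database to avoid a database dependency.
Sample commands:
```bash
# Online mode covering all tasks
TEST_MODE=ONLINE pytest tests/unit/test_etl_tasks_online.py
# Offline mode covering all tasks with archived JSON
TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=tests/source-data-doc pytest tests/unit/test_etl_tasks_offline.py
# Combine arguments through the script as needed (example: online + orders cases only)
python scripts/run_tests.py --suite online --mode ONLINE --keyword ORDERS
# Use the script to connect to a real test database and replay offline mode
python scripts/run_tests.py --suite offline --mode OFFLINE --db-dsn postgresql://user:pwd@localhost:5432/testdb
# Use a preset command from the "command repository"
python scripts/run_tests.py --preset offline_realdb
python scripts/run_tests.py --list-presets # View or customize scripts/test_presets.py
```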
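#### 3.3.2 Scripted Test Combinations (`run_tests.py` / `test_presets.py`)
- `scripts/run_tests.py` is the unified pytest entry point: it adds the project root to `sys.path` automatically and provides `--suite online/offline/integration`, `--tests` (custom paths), `--mode`, `--db-dsn`, `--json-archive`, `--json-temp`, `--keyword/-k`, `--pytest-args`, `--env KEY=VALUE`, and other arguments that can be combined freely like building blocks;
- `--preset foo` reads the configuration in `PRESETS["foo"]` from `scripts/test_presets.py` and layers it onto the current command; `--list-presets` and `--dry-run` can be used to review presets or only print the command;
- Running `python scripts/test_presets.py` directly executes the presets listed in `AUTO_RUN_PRESETS` in order; passing `--preset x --dry-run` only prints the corresponding command.
`test_presets.py` acts as a "command repository". Each preset is a dictionary; the commonly used fields are:
| Field | Purpose |
| ---------------------------- | ------------------------------------------------------------------ |
| `suite` | Reuses the built-in suites of `run_tests.py` (online/offline/integration); multiple values allowed |
| `tests` | Appends any pytest paths, for example `tests/unit/test_config.py` |
| `mode` | Overrides `TEST_MODE` (ONLINE / OFFLINE) |
| `db_dsn` | Overrides `TEST_DB_DSN` to connect to a real test database |
| `json_archive` / `json_temp` | Sets the offline JSON archive and temporary directories |
| `keyword` | Maps to `pytest -k` for keyword filtering |
| `pytest_args` | Extra pytest arguments, for example `-vv --maxfail=1` |
| `env` | List of extra environment variables, such as `["STORE_ID=123"]` |
| `preset_meta` | Descriptive text that explains the scenario |
Example: the `offline_realdb` preset sets `TEST_MODE=OFFLINE`, uses `tests/source-data-doc` as the archive directory, and connects to the test database through `db_dsn`. Running `python scripts/run_tests.py --preset offline_realdb` or `python scripts/test_presets.py --preset offline_realdb` reuses this combination, keeping local, CI, and production replay scripts consistent.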
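#### 3.3.3 Quick Database Connectivity Check
`python scripts/test_db_connection.py` is the lightest PostgreSQL connectivity check: it uses `TEST_DB_DSN` by default (or `--dsn` if given), tries to connect, and runs `SELECT 1 AS ok` (customizable with `--query`). Typical usage:
```bash
# Read TEST_DB_DSN from .env/environment variables
python scripts/test_db_connection.py
# Specify a DSN temporarily and check the task configuration table
python scripts/test_db_connection.py --dsn postgresql://user:pwd@host:5432/.... --query "SELECT count(*) FROM etl_admin.etl_task"
```
An exit code of 0 means the connection and query succeeded. On a non-zero exit code, use the database section of Chapter 8, "Troubleshooting" (network, firewall, account permissions, and so on) to locate the problem before running the full ETL.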
---
## 4. Project Structure and File Descriptions
### 4.1 Overall Directory Structure (tree)
```text
etl_billiards/
├── README.md # Project overview and usage
├── MIGRATION_GUIDE.md # Guide for migrating from the old version
├── requirements.txt # Python dependency list
├── setup.py # Project installation configuration
├── .env.example # Environment variable template
├── .gitignore # Git ignore rules
├── run_etl.sh # Linux/Mac run script
├── run_etl.bat # Windows run script
├── config/ # Configuration management module
│ ├── __init__.py
│ ├── defaults.py # Default configuration values
│ ├── env_parser.py # Environment variable parser
│ └── settings.py # Main configuration class
├── database/ # Database access layer
│ ├── __init__.py
│ ├── connection.py # Database connection management
│ └── operations.py # Bulk operation wrappers
├── api/ # HTTP API client
│ ├── __init__.py
│ └── client.py # API client (retry + pagination)
├── models/ # Data model layer
│ ├── __init__.py
│ ├── parsers.py # Type parsers
│ └── validators.py # Data validators
├── loaders/ # Data loader layer
│ ├── __init__.py
│ ├── base_loader.py # Loader base class
│ ├── dimensions/ # Dimension table loaders
│ │ ├── __init__.py
│ │ └── member.py # Member dimension loader
│ └── facts/ # Fact table loaders
│ ├── __init__.py
│ ├── order.py # Order fact table loader
│ └── payment.py # Payment record loader
├── scd/ # SCD2 handling layer
│ ├── __init__.py
│ └── scd2_handler.py # SCD2 history handler
├── quality/ # Data quality check layer
│ ├── __init__.py
│ ├── base_checker.py # Quality checker base class
│ └── balance_checker.py # Balance consistency checker
├── orchestration/ # ETL orchestration layer
│ ├── __init__.py
│ ├── scheduler.py # ETL scheduler
│ ├── task_registry.py # Task registry (factory pattern)
│ ├── cursor_manager.py # Cursor manager
│ └── run_tracker.py # Run record tracker
├── tasks/ # ETL task layer
│ ├── __init__.py
│ ├── base_task.py # Task base class (template method)
│ ├── orders_task.py # Orders ETL task
│ ├── payments_task.py # Payments ETL task
│ └── members_task.py # Members ETL task
├── cli/ # Command-line interface layer
│ ├── __init__.py
│ └── main.py # CLI main entry point
├── utils/ # Utility functions
│ ├── __init__.py
│ └── helpers.py # Shared helper functions
├── tests/ # Test code
│ ├── __init__.py
│ ├── unit/ # Unit tests
│ │ ├── __init__.py
│ │ ├── test_config.py
│ │ └── test_parsers.py
│ ├── testdata_json/ # Test JSON files for cleaning and loading
│ │ └── XX.json
│ └── integration/ # Integration tests
│ ├── __init__.py
│ └── test_database.py
└── docs/ # Documentation
└── ARCHITECTURE.md # Architecture design document
```
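### 4.2 Module Responsibilities at a Glance
- **config/**
  - Single configuration entry point with three layers of override: defaults, environment variables, and command-line arguments.
- **database/**
  - Wraps PostgreSQL connections and bulk operations (insert, update, upsert, and so on).
- **api/**
  - Wraps HTTP calls to the upstream business API, with retry, pagination, and timeout control.
- **models/**
  - Provides type parsers (timestamps, amounts, integers, and so on) and business-level data validators.
- **loaders/**
  - Provides load logic for fact and dimension tables (including bulk upsert and write-result statistics).
- **scd/**
  - SCD2 history management for dimension data (validity periods, version flags, and so on).
- **quality/**
  - Quality check strategies, such as balance consistency and record-count alignment.
- **orchestration/**
  - Task scheduling, task registration, cursor management (incremental windows), and run record tracking.
- **tasks/**
  - Concrete business tasks (orders, payments, members, and so on), each wrapping the complete flow of "fetch → process → write → record result".
- **cli/**
  - Command-line entry point that parses arguments and starts the scheduling flow.
- **utils/**
  - Miscellaneous utility functions.
- **tests/**
  - Unit and integration test code.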
---
## 5. Architecture Design and Flow
### 5.1 Layered Architecture Diagram
```text
┌─────────────────────────────────────┐
│     CLI command-line interface      │ <- cli/main.py
└─────────────┬───────────────────────┘
┌─────────────▼───────────────────────┐
│         Orchestration layer         │ <- orchestration/
│   (Scheduler, TaskRegistry, ...)    │
└─────────────┬───────────────────────┘
┌─────────────▼───────────────────────┐
│             Tasks layer             │ <- tasks/
│   (OrdersTask, PaymentsTask, ...)   │
└───┬─────────┬─────────┬─────────────┘
    │         │         │
    ▼         ▼         ▼
┌────────┐ ┌─────┐ ┌──────────┐
│Loaders │ │ SCD │ │ Quality  │ <- loaders/, scd/, quality/
└────────┘ └─────┘ └──────────┘
          ┌───────▼────────┐
          │  Models layer  │ <- models/
          └───────┬────────┘
          ┌───────▼────────┐
          │   API client   │ <- api/
          └───────┬────────┘
          ┌───────▼────────┐
          │Database access │ <- database/
          └───────┬────────┘
          ┌───────▼────────┐
          │ Configuration  │ <- config/
          └────────────────┘
```
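### 5.2 Layer Responsibilities (current design)
- **CLI layer (`cli/`)**
  - Parses command-line arguments (task list, dry-run, configuration overrides, and so on).
  - Initializes configuration and logging, then hands off to the orchestration layer.
- **Orchestration layer (`orchestration/`)**
  - `scheduler.py`: selects the tasks to run based on configuration and CLI arguments, and controls execution order and the parallelism strategy.
  - `task_registry.py`: provides the task registry and creates task instances by task code (factory pattern).
  - `cursor_manager.py`: manages incremental cursors (time window / ID cursor).
  - `run_tracker.py`: records the status, statistics, and errors of each task run.
- **Task layer (`tasks/`)**
  - `base_task.py`: defines the task execution template (template method pattern), including getting the window, calling upstream, parsing/validating, writing to the database, and updating the cursor.
  - `orders_task.py` / `payments_task.py` / `members_task.py`: implement the concrete task logic (orders, payments, members).
- **Loader / SCD / quality layers**
  - `loaders/`: wraps Upsert / Insert / Update logic per target table.
  - `scd/scd2_handler.py`: provides SCD2 history management for dimension tables.
  - `quality/`: runs data quality checks, such as balance reconciliation.
- **Model layer (`models/`)**
  - `parsers.py`: handles data type conversion (string → timestamp, Decimal, int, and so on).
  - `validators.py`: performs field-level and record-level validation.
- **API layer (`api/client.py`)**
  - Wraps HTTP calls and handles retry, timeout, and pagination.
- **Database layer (`database/`)**
  - Manages database connections and contexts.
  - Provides bulk insert / update / upsert interfaces.
- **Configuration layer (`config/`)**
  - Defines default values for settings.
  - Parses environment variables and converts types.
  - Exposes a single unified configuration object.
### 5.3 Design Patterns (currently in use)
- Factory pattern: task registration / creation (`TaskRegistry`).
- Template method pattern: task execution flow (`BaseTask`).
- Strategy pattern: different Loader / Checker implementations provide different strategies.
- Dependency injection: dependencies such as `db`, `api`, and `config` are passed to tasks through the constructor.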
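
The factory and dependency-injection patterns listed above can be sketched as follows. This is an illustration under assumptions, not the contents of `orchestration/task_registry.py`: only `register(code, cls)` appears elsewhere in this document, while the `create` method, the case-insensitive lookup, and the `EchoTask` class are hypothetical.

```python
class TaskRegistry:
    """Maps task codes to task classes and builds instances on demand."""

    def __init__(self):
        self._tasks = {}

    def register(self, task_code, task_cls):
        self._tasks[task_code.upper()] = task_cls

    def create(self, task_code, **deps):
        # Dependencies (config, db, api, ...) are injected via the constructor.
        try:
            task_cls = self._tasks[task_code.upper()]
        except KeyError:
            raise ValueError(f"unknown task code: {task_code}") from None
        return task_cls(**deps)


class EchoTask:
    """Hypothetical task used only to demonstrate the registry."""

    def __init__(self, config, db=None, api=None):
        self.config = config
        self.db = db
        self.api = api

    def execute(self):
        return {"status": "SUCCESS", "store_id": self.config["store_id"]}


registry = TaskRegistry()
registry.register("ECHO", EchoTask)
task = registry.create("echo", config={"store_id": 123456})
result = task.execute()
```

Because the scheduler only needs a task code and a set of dependencies, adding a task does not require touching the scheduling code itself.
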
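### 5.4 Data and Control Flow
Overall flow:
1. The CLI parses arguments and loads the configuration.
2. The Scheduler builds dependencies such as the database connection and API client.
3. The Scheduler iterates over the task configuration, gets each task class from `TaskRegistry`, and instantiates it.
4. Each task runs the same template:
   - Read the cursor / time window.
   - Call the API to pull data (with pagination if needed).
   - Parse and validate the data.
   - Write to the database through a Loader (fact table / dimension table / SCD2).
   - Run quality checks.
   - Update the cursor and run record.
5. After all tasks finish, release connections and exit the process.
### 5.5 Error Handling Strategy
- A single task failure does not affect the execution of other tasks.
- Database operation errors automatically roll back the current transaction.
- Failed API requests are retried according to configuration; once retries are exhausted, the error is recorded and that task is terminated.
- All errors are recorded in the log and the run tracking table for later investigation.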
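
The retry and failure-isolation rules above can be sketched in a few lines. This is a simplified illustration, not the project's scheduler or API client: the names `call_with_retry` and `run_all`, the linear backoff, and the result dictionary shape are assumptions made for the example.

```python
import time


def call_with_retry(func, retries=3, backoff_seconds=0.0, retry_on=(Exception,)):
    """Call func, retrying up to `retries` attempts before re-raising."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return func()
        except retry_on as exc:
            last_error = exc
            if attempt < retries:
                time.sleep(backoff_seconds * attempt)  # simple linear backoff
    raise last_error


def run_all(tasks):
    """Run every task; one task failing does not stop the others."""
    results = {}
    for code, func in tasks.items():
        try:
            results[code] = {"status": "SUCCESS", "result": call_with_retry(func)}
        except Exception as exc:
            # The error is recorded and the loop moves on to the next task.
            results[code] = {"status": "FAILED", "error": str(exc)}
    return results
```

In a real run, the `FAILED` entries would be written to the log and the run tracking table instead of only being returned.
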
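### 5.6 ODS + DWD Two-Stage Strategy (new)
To support backtracking/replay and later DWD wide-table construction, the project adds a `billiards_ods` schema and a set of dedicated ODS tasks/Loaders:
- **ODS tables**: `billiards_ods.ods_order_settle`, `ods_table_use_detail`, `ods_assistant_ledger`, `ods_assistant_abolish`, `ods_goods_ledger`, `ods_payment`, `ods_refund`, `ods_coupon_verify`, `ods_member`, `ods_member_card`, `ods_package_coupon`, `ods_inventory_stock`, `ods_inventory_change`. Every record stores `store_id + source primary key + payload JSON + fetched_at + source_endpoint` and related information.
- **Generic Loader**: `loaders/ods/generic.py::GenericODSLoader` wraps `INSERT ... ON CONFLICT ...` and the bulk write logic in one place; callers only provide the column names and the primary key columns.
- **ODS tasks**: `tasks/ods_tasks.py` defines a set of tasks through `OdsTaskSpec` (`ODS_ORDER_SETTLE`, `ODS_PAYMENT`, `ODS_ASSISTANT_LEDGER`, and so on) and registers them automatically in `TaskRegistry`, so they can be run directly with `python -m cli.main --tasks ODS_ORDER_SETTLE,ODS_PAYMENT`.
- **Two-stage chain**
  1. Stage 1 (ODS): call the API or read offline archived JSON and write the raw records to the ODS tables, keeping metadata such as pagination, fetch time, and source file.
  2. Stage 2 (DWD/DIM): later fact tasks for orders, payments, coupons, and so on will read the payload from ODS instead, then parse/validate and write to the `billiards.fact_*` and `dim_*` tables, avoiding repeated pulls from the upstream API.
> The new unit test `tests/unit/test_ods_tasks.py` covers the load path for `ODS_ORDER_SETTLE` and `ODS_PAYMENT` and can serve as a template for extending other ODS tasks.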
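
To make the "column names plus primary key columns" idea behind the generic loader concrete, here is a minimal sketch of how an `INSERT ... ON CONFLICT` statement could be assembled. It is not the code of `GenericODSLoader`: the function `build_upsert_sql`, the `%s` placeholder style, and the `pay_id` column are assumptions for illustration.

```python
def build_upsert_sql(table, columns, pk_columns):
    """Build a PostgreSQL upsert statement with positional placeholders.

    Identifiers are interpolated directly, so this sketch is only safe
    for table and column names that come from trusted code, not user input.
    """
    non_keys = [c for c in columns if c not in pk_columns]
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    conflict = ", ".join(pk_columns)
    if non_keys:
        updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in non_keys)
        action = f"DO UPDATE SET {updates}"
    else:
        action = "DO NOTHING"  # nothing to update when every column is a key
    return (
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict}) {action}"
    )


sql = build_upsert_sql(
    "billiards_ods.ods_payment",
    ["store_id", "pay_id", "payload"],   # pay_id is a hypothetical key column
    ["store_id", "pay_id"],
)
```
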
---
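## 6. Migration Guide (from the old script to the current project)
This section explains how to migrate from the old single-file script (such as `task_merged.py`) to the current modular project. It is part of the current project's usage guide and does not go into historical comparisons.
### 6.1 Core Function Mapping
| Old function / class | New location | Description |
| --------------------- | ----------------------------------------------------- | ---------- |
| `DEFAULTS` dict | `config/defaults.py` | Configuration defaults |
| `build_config()` | `config/settings.py::AppConfig.load()` | Configuration loading |
| `Pg` class | `database/connection.py::DatabaseConnection` | Database connection |
| `http_get_json()` | `api/client.py::APIClient.get()` | API request |
| `paged_get()` | `api/client.py::APIClient.get_paginated()` | Paginated request |
| `parse_ts()` | `models/parsers.py::TypeParser.parse_timestamp()` | Timestamp parsing |
| `upsert_fact_order()` | `loaders/facts/order.py::OrderLoader.upsert_orders()` | Order loading |
| `scd2_upsert()` | `scd/scd2_handler.py::SCD2Handler.upsert()` | SCD2 handling |
| `run_task_orders()` | `tasks/orders_task.py::OrdersTask.execute()` | Orders task |
| `main()` | `cli/main.py::main()` | Main entry point |
### 6.2 Typical Migration Steps
1. **Migrate configuration**
   - Move configuration that was hard-coded in `DEFAULTS` or in the script into `.env` and `config/defaults.py`.
   - Use `AppConfig.load()` as the single way to obtain configuration.
2. **Verify by running in parallel**
```bash
# Old script
python task_merged.py --tasks ORDERS
# New project
python -m cli.main --tasks ORDERS
```
Compare the data tables and logs produced by the old and new versions to confirm they are consistent.
3. **Migrate custom logic**
   - Custom cleaning logic from the old script → move into the relevant `loaders/` module or task class.
   - Custom tasks → implement in `tasks/` and register in `task_registry`.
   - Custom API calls → extend `api/client.py` or wrap them in a separate service class.
4. **Switch over gradually**
   - Run in parallel in the test environment first.
   - Then switch production tasks to the new version step by step.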
## 7. Development and Extension Guide (current project)
### 7.1 Adding a New Task
1. Create the task class in the `tasks/` directory:
```python
from .base_task import BaseTask


class MyTask(BaseTask):
    def get_task_code(self) -> str:
        return "MY_TASK"

    def execute(self) -> dict:
        # 1. Get the time window
        window_start, window_end, _ = self._get_time_window()
        # 2. Call the API to fetch data
        records, _ = self.api.get_paginated(...)
        # 3. Parse / validate
        parsed = [self._parse(r) for r in records]
        # 4. Load the data
        loader = MyLoader(self.db)
        inserted, updated, _ = loader.upsert(parsed)
        # 5. Commit and return the result
        self.db.commit()
        return self._build_result("SUCCESS", {
            "inserted": inserted,
            "updated": updated,
        })
```
2. Register it in `orchestration/task_registry.py`:
```python
from tasks.my_task import MyTask
default_registry.register("MY_TASK", MyTask)
```
3. Enable it in the task configuration table (example):
```sql
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
VALUES ('MY_TASK', 123456, TRUE);
```
### 7.2 Adding a New Loader
```python
from loaders.base_loader import BaseLoader


class MyLoader(BaseLoader):
    def upsert(self, records: list) -> tuple:
        sql = "INSERT INTO table_name (...) VALUES (...) ON CONFLICT (...) DO UPDATE SET ... RETURNING (xmax = 0) AS inserted"
        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
```
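
The `RETURNING (xmax = 0) AS inserted` clause in the loader above relies on a commonly used PostgreSQL idiom: for a row that was freshly inserted the system column `xmax` is 0, while a row updated through `ON CONFLICT ... DO UPDATE` has a non-zero `xmax`. A minimal sketch of turning the returned flags into counts follows; how `batch_upsert_with_returning` does this internally is not shown in this document, so the helper name `count_upsert_results` and the row shape (one-element tuples) are assumptions.

```python
def count_upsert_results(returned_rows):
    """Split rows returned by `RETURNING (xmax = 0) AS inserted` into counts.

    Each row is assumed to be a one-element tuple holding the boolean flag,
    which is how a DB-API cursor would typically return a single column.
    """
    inserted = sum(1 for (was_inserted,) in returned_rows if was_inserted)
    updated = len(returned_rows) - inserted
    return inserted, updated


# Hypothetical result set for three upserted records.
counts = count_upsert_results([(True,), (False,), (True,)])
```
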
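### 7.3 Adding a New Quality Checker
1. Implement the checker in `quality/`, inheriting from the base class in `base_checker.py`.
2. Call the checker from the task or scheduling flow to verify the data after it is written.
### 7.4 Extending Type Parsing and Validation
- Add new type parsing methods in `models/parsers.py`.
- Add new rules in `models/validators.py` (such as enum validation or cross-field validation).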
---
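## 8. Troubleshooting
### 8.1 Database Connection Failure
```text
Error: could not connect to server
```
What to check:
- Verify that `PG_DSN` or the related database settings are correct.
- Confirm that the database service is running and reachable over the network.
### 8.2 API Request Timeout
```text
Error: requests.exceptions.Timeout
```
What to check:
- Check the `API_BASE` address and network connectivity.
- Raise the timeout and retry count as appropriate (adjust in the configuration).
### 8.3 Module Import Error
```text
Error: ModuleNotFoundError
```
What to check:
- Confirm that you are running from the project root (the directory containing the `etl_billiards/` package).
- Or install the project in editable mode with `pip install -e .`.
### 8.4 Permission Issues
```text
Error: Permission denied
```
What to check:
- The script lacks execute permission: `chmod +x run_etl.sh`.
- On Windows, run as administrator or change the permissions of the log / export directories.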
---
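## 9. Pre-Run Checklist
Before running in earnest, confirm that:
- [ ] Python 3.10+ is installed.
- [ ] `pip install -r requirements.txt` has been run.
- [ ] `.env` is configured correctly (database, API, store ID, paths, and so on).
- [ ] The PostgreSQL database is reachable.
- [ ] The API service is accessible and the credentials are valid.
- [ ] The `LOG_ROOT` and `EXPORT_ROOT` directories exist and are writable.
---
## 10. Reference Notes
- This document merges the earlier quick start, project structure, architecture, and migration guide content, and can serve as the single reference for the current project.
- To split it into several documents later, break it up by chapter, for example "Quick Start", "Architecture Design", "Migration Guide", and "Development and Extension".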
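## 11. Run/Debug Modes
- Production keeps only "task mode": registered tasks (ETL/ODS) are run through the scheduler/CLI, and debug scripts are not used.
- Helper scripts available for development/debugging (can be removed or disabled before going live):
  - `python -m etl_billiards.scripts.rebuild_ods_from_json`: rebuilds `billiards_ods` from a local JSON directory, for offline initialization/verification. Environment variables: `PG_DSN` (required), `JSON_DOC_DIR` (optional, defaults to `C:\dev\LLTQ\export\test-json-doc`), `INCLUDE_FILES` (comma-separated file names), `DROP_SCHEMA_FIRST` (defaults to true).
- If a script must stay in production, document its purpose and disabling conditions in the operations manual to avoid misuse.
## 12. ODS Task Rollout Guide
- Task registration: `etl_billiards/database/seed_ods_tasks.sql` lists the ODS tasks currently enabled. Replace the `store_id` in it with the actual store, then run:
```
psql "$PG_DSN" -f etl_billiards/database/seed_ods_tasks.sql
```
`ON CONFLICT` keeps enabled=true and avoids duplicates.
- Scheduling: confirm that the required ODS tasks are enabled in `etl_admin.etl_task` (task codes are in the seed script); they can then be invoked by the scheduler or with the CLI `--tasks` option.
- Offline backfill: in development, `rebuild_ods_from_json` can initialize ODS from sample JSON; use it with caution in production. By default it deduplicates on `(source_file, record_index)`.
- Testing: `pytest etl_billiards/tests/unit/test_ods_tasks.py` covers the core ODS tasks; set `ETL_SKIP_DOTENV=1` during tests to skip reading the local .env.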
## 13. ODS Table Mapping Overview
| ODS table | API path | Data list path |
| ------------------------------------ | ---------------------------------------------------- | ----------------------------- |
| `assistant_accounts_master` | `/PersonnelManagement/SearchAssistantInfo` | data.assistantInfos |
| `assistant_service_records` | `/AssistantPerformance/GetOrderAssistantDetails` | data.orderAssistantDetails |
| `assistant_cancellation_records` | `/AssistantPerformance/GetAbolitionAssistant` | data.abolitionAssistants |
| `goods_stock_movements` | `/GoodsStockManage/QueryGoodsOutboundReceipt` | data.queryDeliveryRecordsList |
| `goods_stock_summary` | `/TenantGoods/GetGoodsStockReport` | data |
| `group_buy_packages` | `/PackageCoupon/QueryPackageCouponList` | data.packageCouponList |
| `group_buy_redemption_records` | `/Site/GetSiteTableUseDetails` | data.siteTableUseDetailsList |
| `member_profiles` | `/MemberProfile/GetTenantMemberList` | data.tenantMemberInfos |
| `member_balance_changes` | `/MemberProfile/GetMemberCardBalanceChange` | data.tenantMemberCardLogs |
| `member_stored_value_cards` | `/MemberProfile/GetTenantMemberCardList` | data.tenantMemberCards |
| `payment_transactions` | `/PayLog/GetPayLogListPage` | data |
| `platform_coupon_redemption_records` | `/Promotion/GetOfflineCouponConsumePageList` | data |
| `recharge_settlements` | `/Site/GetRechargeSettleList` | data.settleList |
| `refund_transactions` | `/Order/GetRefundPayLogList` | data |
| `settlement_records` | `/Site/GetAllOrderSettleList` | data.settleList |
| `settlement_ticket_details` | `/Order/GetOrderSettleTicketNew` | (entire raw JSON payload) |
| `site_tables_master` | `/Table/GetSiteTables` | data.siteTables |
| `stock_goods_category_tree` | `/TenantGoodsCategory/QueryPrimarySecondaryCategory` | data.goodsCategoryList |
| `store_goods_master` | `/TenantGoods/GetGoodsInventoryList` | data.orderGoodsList |
| `store_goods_sales_records` | `/TenantGoods/GetGoodsSalesList` | data.orderGoodsLedgers |
| `table_fee_discount_records` | `/Site/GetTaiFeeAdjustList` | data.taiFeeAdjustInfos |
| `table_fee_transactions` | `/Site/GetSiteTableOrderDetails` | data.siteTableUseDetailsList |
| `tenant_goods_master` | `/TenantGoods/QueryTenantGoods` | data.tenantGoodsList |
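
The "data list path" column can be read as a dotted path into the response body. The sketch below shows one way to resolve such a path; `extract_records` is a hypothetical helper, the sample payload is invented, and wrapping a non-list value in a list is an assumption rather than the loader's documented behavior.

```python
from typing import Any


def extract_records(payload: dict[str, Any], list_path: str) -> list[Any]:
    """Walk a dotted path such as 'data.settleList' and return the list found there."""
    node: Any = payload
    for key in list_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return []  # path missing: treat it as an empty page
        node = node[key]
    # Assumption: a non-list value at the path is treated as a single record.
    return node if isinstance(node, list) else [node]


payload = {"code": 0, "data": {"total": 2, "settleList": [{"id": 1}, {"id": 2}]}}
records = extract_records(payload, "data.settleList")
flat = extract_records({"data": [{"id": 9}]}, "data")
```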
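## 14. ODS-related environment variables and defaults
- `.env` / environment variables:
  - `JSON_DOC_DIR`: directory of ODS sample JSON files (for development/backfill)
  - `ODS_INCLUDE_FILES`: restricts which files are imported (comma-separated file names, without `.json`)
  - `ODS_DROP_SCHEMA_FIRST`: true/false, whether to rebuild the schema first
  - `ETL_SKIP_DOTENV`: set to 1 in tests/CI to skip reading the local .env
- `ods` defaults in `config/defaults.py`:
  - `json_doc_dir`: `C:\dev\LLTQ\export\test-json-doc`
  - `include_files`: `""`
  - `drop_schema_first`: `True`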
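
A minimal sketch of how these variables could be layered over the defaults. The variable names and default values come from the list above; the function `load_ods_settings`, the `include_list` key, and the rule that only the string `true` enables the flag are assumptions for illustration, not the project's actual config loader.

```python
import os
from typing import Mapping, Optional

DEFAULTS = {
    "json_doc_dir": r"C:\dev\LLTQ\export\test-json-doc",
    "include_files": "",
    "drop_schema_first": True,
}


def load_ods_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Overlay environment variables on the documented defaults."""
    env = os.environ if env is None else env
    settings = dict(DEFAULTS)
    if "JSON_DOC_DIR" in env:
        settings["json_doc_dir"] = env["JSON_DOC_DIR"]
    if "ODS_INCLUDE_FILES" in env:
        settings["include_files"] = env["ODS_INCLUDE_FILES"]
    if "ODS_DROP_SCHEMA_FIRST" in env:
        # Assumption: only the literal "true" (any case) turns the flag on.
        settings["drop_schema_first"] = env["ODS_DROP_SCHEMA_FIRST"].strip().lower() == "true"
    # Split the comma-separated value into bare file names (no .json suffix).
    settings["include_list"] = [
        name.strip() for name in settings["include_files"].split(",") if name.strip()
    ]
    return settings


example = load_ods_settings(
    {"ODS_INCLUDE_FILES": "member_profiles, payment_transactions",
     "ODS_DROP_SCHEMA_FIRST": "false"}
)
```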
---
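## 15. DWD modeling around "business events"
1. One unique, atomic grain
   - A DWD table may have only one business grain, for example:
     - one row = one checkout;
     - one row = one table-fee segment;
     - one row = one assistant service;
     - one row = one member balance change.
   - A table must not mix "order header" rows with "order line" rows, nor hold summaries in some rows and details in others.
   - Once the grain is fixed, every column has to match it:
     - a "settlement header" table should not carry per-line goods details;
     - a goods-detail table should not carry order-level totals.
   - This is the single most important rule of the DWD layer.
2. Model business processes, not JSON lists
   - Start by drawing the real business flow:
     - open table / switch table / close table → table-fee ledger
     - assistant joins a table → assistant service ledger / cancellation events
     - ordering → goods sales ledger
     - recharge / consumption → balance changes / recharge orders
     - checkout → settlement header + payment ledger / refund ledger
     - group-buy / platform coupons → redemption ledger
3. Explicit primary keys, unified foreign keys
   - Every DWD table must have a business primary key (even if it is just the id supplied by the API); do not rely on database auto-increment.
   - All fields that represent the same concept must share one name and one meaning:
     - store: always site_id, always mapped to siteProfile.id;
     - member: always member_id, mapped to member_profiles.id, with system_member_id kept in its own column;
     - table: always table_id, mapped to site_tables_master.id;
     - settlement: always order_settle_id;
     - order: always order_trade_no, and so on.
   - Otherwise joining tables later in DWS or for AI use becomes very painful.
4. Keep details, avoid over-aggregation
   - DWD fact tables should in principle hold detail-level data only:
     - do not compute daily, weekly, or monthly summaries in DWD; that is the job of DWS;
     - do not fold multiple events into one row (for example, one table holding both daily summaries and individual transactions).
   - When aggregation is needed, build subject-oriented wide tables in DWS:
     - dws_member_day_profile, dws_site_day_summary, and so on.
   - DWD is only responsible for the fine-grained truth.
5. Cleanse and standardize uniformly, but stay traceable
   - Cleansing that must happen in DWD:
     - type conversion: string timestamps → time types, amounts unified as decimal, booleans unified as 0/1;
     - unit unification: seconds vs. minutes and yuan vs. fen are each normalized to one unit;
     - enum standardization: status codes and type codes get fixed meanings in DWD, with enum dimension tables where needed.
   - At the same time, ensure that:
     - every DWD record can be traced back to ODS;
     - the source-system primary key is kept;
     - the original time and amount fields are kept (never overwritten).
6. Flatten and un-nest
   - A typical JSON structure is: pagination envelope + header + detail arrays + assorted nested objects (siteProfile, tableProfile, goodsLedgers…).
   - The DWD rules are:
     - drop the pagination envelope;
     - split arrays into child tables (header table / line table);
     - pull repeated profiles out into dimension tables (store, table, goods, member…).
   - The goal: every DWD table is a flat two-dimensional table with no complex nested JSON.
7. A stable, extensible model
   - Keep DWD table structures as stable as possible; meet new requirements by:
     - adding columns;
     - creating new fact / dimension tables;
     - deriving metrics in DWS;
   - rather than by frequently restructuring existing DWD tables.
   - This also matters for feeding the data to an LLM later: the prompts and schema understanding tuned for the AI should need as few changes as possible.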
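
Principles 5 and 6 can be sketched in a few lines of Python. The raw record below is invented for illustration, and the helper names `clean_fact_row` and `extract_site_dim` as well as the `*_raw` column suffix are assumptions, not part of the project's loaders.

```python
from datetime import datetime
from decimal import Decimal


def clean_fact_row(raw: dict) -> dict:
    """Convert types for DWD while keeping the source key and the raw values."""
    return {
        "table_fee_log_id": raw["id"],          # source-system primary key
        "site_id": raw["siteProfile"]["id"],    # foreign key to the site dimension
        "create_time": datetime.strptime(raw["create_time"], "%Y-%m-%d %H:%M:%S"),
        "create_time_raw": raw["create_time"],  # original value kept for traceability
        "ledger_amount": Decimal(str(raw["ledger_amount"])),
        "ledger_amount_raw": raw["ledger_amount"],
        "is_delete": 1 if raw["is_delete"] else 0,  # booleans unified as 0/1
    }


def extract_site_dim(raw: dict) -> dict:
    """Pull the repeated nested profile out as a dimension row."""
    profile = raw["siteProfile"]
    return {"site_id": profile["id"], "shop_name": profile["shop_name"]}


raw = {
    "id": 101,
    "create_time": "2025-12-01 10:30:00",
    "ledger_amount": "88.50",
    "is_delete": False,
    "siteProfile": {"id": 7, "shop_name": "Demo"},
}
row = clean_fact_row(raw)
dim = extract_site_dim(raw)
```

The fact row stays flat and carries only the `site_id` reference, while the nested profile becomes its own dimension row.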

File diff suppressed because it is too large Load Diff


@@ -1,10 +1,10 @@
-- Register the new ODS tasks in etl_admin.etl_task (replace store_id as needed).
-- Usage (example):
-- psql "$PG_DSN" -f etl_billiards/database/seed_ods_tasks.sql
-- Or run the contents of this file directly inside psql.
WITH target_store AS (
SELECT 2790685415443269::bigint AS store_id -- TODO: replace with the actual store_id
),
task_codes AS (
SELECT unnest(ARRAY[
@@ -37,5 +37,3 @@ SELECT t.task_code, s.store_id, TRUE
FROM task_codes t CROSS JOIN target_store s
ON CONFLICT (task_code, store_id) DO UPDATE
SET enabled = EXCLUDED.enabled;


@@ -1,9 +0,0 @@
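# DWD quality check guide
Purpose: row-count and amount reconciliation, plus sampled back-checks, after ODS → DWD loading.
## Row count comparison (example)
- Source: `etl_billiards/ods_row_report.json` records the ODS row counts for the sample JSON and can serve as the baseline for DWD comparison.
- Execution: after the DWD run finishes, count rows in the key tables and align them with the ODS totals or the JSON baseline; output the differences when anomalies appear.
## Amount/metric reconciliation suggestions
- dwd_settlement_head / dwd_settlement_head_Ex: aggregate order totals and refund amounts, and reconcile them with the ODS settleList amounts.
- dwd_store_goods_sale: aggregate sales amount/quantity by goods and compare with the ODS store_goods_sales_records aggregates.
- dwd_member_balance_change: aggregate change amounts by member and compare with the aggregates of the corresponding ODS table.
- dwd_recharge_order / dwd_payment / dwd_refund: aggregate amounts by payment method and time period, and check the differences.
## Sampled back-checks
- Randomly pick several DWD records and look them up in the ODS payload (query the ODS table by primary key) to confirm that the field mapping is correct.
- For SCD2 dimensions (DIM tables): verify that each business key has exactly one row with is_current=1, that time ranges do not overlap, and that version numbers increase.
## Suggestions for automated check scripts
- Statistics script: write the row counts/amounts of the key DWD tables to JSON for easy comparison with the baseline.
- Anomaly alerts: when a row-count or amount deviation exceeds the threshold, print the details (primary key list, aggregate breakdown).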
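
A minimal sketch of such a check, assuming counts and sums have already been queried. The entry shape mirrors the report JSON in this changeset (`count` with `dwd`/`ods`/`diff`, and `amounts` per column); the function names and the tolerance defaults are hypothetical.

```python
def build_report_entry(dwd_table, ods_table, dwd_count, ods_count, amount_sums):
    """Build one report entry; amount_sums maps column -> (dwd_sum, ods_sum)."""
    return {
        "dwd_table": dwd_table,
        "ods_table": ods_table,
        "count": {"dwd": dwd_count, "ods": ods_count, "diff": dwd_count - ods_count},
        "amounts": [
            {"column": col, "dwd_sum": d, "ods_sum": o, "diff": round(d - o, 2)}
            for col, (d, o) in sorted(amount_sums.items())
        ],
    }


def find_anomalies(entry, count_tolerance=0, amount_tolerance=0.01):
    """Return readable messages for deviations above the given thresholds."""
    problems = []
    if abs(entry["count"]["diff"]) > count_tolerance:
        problems.append(f"row count diff {entry['count']['diff']}")
    for amount in entry["amounts"]:
        if abs(amount["diff"]) > amount_tolerance:
            problems.append(f"{amount['column']} diff {amount['diff']}")
    return problems


bad = build_report_entry(
    "billiards_dwd.dim_site", "billiards_ods.table_fee_transactions", 1, 200, {}
)
good = build_report_entry(
    "billiards_dwd.dwd_table_fee_adjust",
    "billiards_ods.table_fee_discount_records",
    200,
    200,
    {"ledger_amount": (20650.84, 20650.84)},
)
```

Dimension tables legitimately have fewer rows than a fact-like ODS source, so a real check would likely need per-table tolerances rather than one global threshold.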


@@ -1,28 +0,0 @@
# ODS sample JSON reference table
Sample file names share the same prefix as production file names (production files append a `_YYYYMMDDHHMMSS` timestamp), and table names match the file prefix for easy business cross-reference. Default sample directory: `C:\dev\LLTQ\export\test-json-doc`
| JSON file name (prefix) | ODS table | Primary key | Notes |
| --- | --- | --- | --- |
| assistant_accounts_master.json | assistant_accounts_master | id | Staff (assistant) master data |
| assistant_cancellation_records.json | assistant_cancellation_records | id | Staff (assistant) cancellation events |
| assistant_service_records.json | assistant_service_records | id | Staff (assistant) service ledger |
| goods_stock_movements.json | goods_stock_movements | id | Stock in/out movements |
| goods_stock_summary.json | goods_stock_summary | id | Stock summary |
| group_buy_packages.json | group_buy_packages | id | Group-buy package definitions |
| group_buy_redemption_records.json | group_buy_redemption_records | id | Group-buy redemption/consumption |
| member_balance_changes.json | member_balance_changes | id | Stored-value balance changes |
| member_profiles.json | member_profiles | id | Member profiles |
| member_stored_value_cards.json | member_stored_value_cards | id | Stored-value card accounts |
| payment_transactions.json | payment_transactions | id | Payment ledger |
| platform_coupon_redemption_records.json | platform_coupon_redemption_records | id | Platform coupon redemption |
| recharge_settlements.json | recharge_settlements | id | Stored-value recharge settlements |
| refund_transactions.json | refund_transactions | id | Refund ledger |
| settlement_records.json | settlement_records | id | Order settlement header |
| settlement_ticket_details.json | settlement_ticket_details | id | Receipt/detail table |
| site_tables_master.json | site_tables_master | id | Table master data |
| stock_goods_category_tree.json | stock_goods_category_tree | id | Goods category tree |
| store_goods_master.json | store_goods_master | id | Store goods master |
| store_goods_sales_records.json | store_goods_sales_records | id | Store goods sales details |
| table_fee_discount_records.json | table_fee_discount_records | id | Table-fee discount/price adjustment |
| table_fee_transactions.json | table_fee_transactions | id | Table billing ledger |
| tenant_goods_master.json | tenant_goods_master | id | Brand/tenant-level goods master |
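
Because the table name equals the file prefix, it can be derived by stripping the extension and the optional `_YYYYMMDDHHMMSS` suffix. The helper name `table_name_from_file` and the sample timestamp are hypothetical; this is a sketch of the naming rule, not the loader's code.

```python
import re

# Fourteen digits after an underscore at the end of the stem: _YYYYMMDDHHMMSS
_TIMESTAMP_SUFFIX = re.compile(r"_\d{14}$")


def table_name_from_file(file_name: str) -> str:
    """Strip '.json' and an optional timestamp suffix to get the ODS table name."""
    stem = file_name[:-5] if file_name.lower().endswith(".json") else file_name
    return _TIMESTAMP_SUFFIX.sub("", stem)


sample = table_name_from_file("member_profiles.json")
stamped = table_name_from_file("site_tables_master_20251209054304.json")
```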


@@ -1,252 +0,0 @@
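# ODS → DWD mapping document (rebuilt)
This file was rebuilt from the latest DWD quality-check results and lists, for each DWD table, its ODS source and field mapping status. DIM tables use SCD2 by default (SCD2_start_time / SCD2_end_time / SCD2_is_current / SCD2_version).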
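
How the four SCD2 columns typically interact can be shown with an in-memory sketch: when a tracked attribute changes, the current version is closed and a new one is opened. The function `apply_scd2`, the `FAR_FUTURE` end marker, and the sample rows are assumptions for illustration; the project's SCD2 loader may differ.

```python
from datetime import datetime

FAR_FUTURE = datetime(9999, 12, 31)  # assumed marker for an open-ended version


def apply_scd2(history: list[dict], key: str, incoming: dict, now: datetime) -> list[dict]:
    """Close the current version if attributes changed, then append a new one."""
    current = next(
        (r for r in history if r[key] == incoming[key] and r["scd2_is_current"] == 1),
        None,
    )
    if current is not None:
        tracked = {k: v for k, v in current.items() if not k.startswith("scd2_")}
        if tracked == incoming:
            return history  # nothing changed, keep the current version
        current["scd2_end_time"] = now
        current["scd2_is_current"] = 0
    version = current["scd2_version"] + 1 if current is not None else 1
    history.append(
        {**incoming, "scd2_start_time": now, "scd2_end_time": FAR_FUTURE,
         "scd2_is_current": 1, "scd2_version": version}
    )
    return history


history: list[dict] = []
apply_scd2(history, "site_id", {"site_id": 1, "shop_name": "A"}, datetime(2025, 12, 1))
apply_scd2(history, "site_id", {"site_id": 1, "shop_name": "B"}, datetime(2025, 12, 9))
apply_scd2(history, "site_id", {"site_id": 1, "shop_name": "B"}, datetime(2025, 12, 10))
```

After the three calls the history holds two versions: the first closed on 2025-12-09, the second current with version 2; the third call is a no-op because nothing changed.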
## Table-level mapping overview
| DWD table | Primary key / hint | Source ODS table | SCD2 |
| --- | --- | --- | --- |
| billiards_dwd.dim_site | see schema_dwd_doc.sql | billiards_ods.site_tables_master | Yes |
| billiards_dwd.dim_site_ex | see schema_dwd_doc.sql | billiards_ods.site_tables_master | Yes |
| billiards_dwd.dim_table | see schema_dwd_doc.sql | billiards_ods.site_tables_master | Yes |
| billiards_dwd.dim_table_ex | see schema_dwd_doc.sql | billiards_ods.site_tables_master | Yes |
| billiards_dwd.dim_assistant | see schema_dwd_doc.sql | billiards_ods.assistant_accounts_master | Yes |
| billiards_dwd.dim_assistant_ex | see schema_dwd_doc.sql | billiards_ods.assistant_accounts_master | Yes |
| billiards_dwd.dim_member | see schema_dwd_doc.sql | billiards_ods.member_profiles | Yes |
| billiards_dwd.dim_member_ex | see schema_dwd_doc.sql | billiards_ods.member_profiles | Yes |
| billiards_dwd.dim_member_card_account | see schema_dwd_doc.sql | billiards_ods.member_stored_value_cards | Yes |
| billiards_dwd.dim_member_card_account_ex | see schema_dwd_doc.sql | billiards_ods.member_stored_value_cards | Yes |
| billiards_dwd.dim_tenant_goods | see schema_dwd_doc.sql | billiards_ods.tenant_goods_master | Yes |
| billiards_dwd.dim_tenant_goods_ex | see schema_dwd_doc.sql | billiards_ods.tenant_goods_master | Yes |
| billiards_dwd.dim_store_goods | see schema_dwd_doc.sql | billiards_ods.store_goods_master | Yes |
| billiards_dwd.dim_store_goods_ex | see schema_dwd_doc.sql | billiards_ods.store_goods_master | Yes |
| billiards_dwd.dim_goods_category | see schema_dwd_doc.sql | billiards_ods.stock_goods_category_tree | Yes |
| billiards_dwd.dim_groupbuy_package | see schema_dwd_doc.sql | billiards_ods.group_buy_packages | Yes |
| billiards_dwd.dim_groupbuy_package_ex | see schema_dwd_doc.sql | billiards_ods.group_buy_packages | Yes |
| billiards_dwd.dwd_settlement_head | see schema_dwd_doc.sql | billiards_ods.settlement_records | No |
| billiards_dwd.dwd_settlement_head_ex | see schema_dwd_doc.sql | billiards_ods.settlement_records | No |
| billiards_dwd.dwd_table_fee_log | see schema_dwd_doc.sql | billiards_ods.table_fee_transactions | No |
| billiards_dwd.dwd_table_fee_log_ex | see schema_dwd_doc.sql | billiards_ods.table_fee_transactions | No |
| billiards_dwd.dwd_table_fee_adjust | see schema_dwd_doc.sql | billiards_ods.table_fee_discount_records | No |
| billiards_dwd.dwd_table_fee_adjust_ex | see schema_dwd_doc.sql | billiards_ods.table_fee_discount_records | No |
| billiards_dwd.dwd_store_goods_sale | see schema_dwd_doc.sql | billiards_ods.store_goods_sales_records | No |
| billiards_dwd.dwd_store_goods_sale_ex | see schema_dwd_doc.sql | billiards_ods.store_goods_sales_records | No |
| billiards_dwd.dwd_assistant_service_log | see schema_dwd_doc.sql | billiards_ods.assistant_service_records | No |
| billiards_dwd.dwd_assistant_service_log_ex | see schema_dwd_doc.sql | billiards_ods.assistant_service_records | No |
| billiards_dwd.dwd_assistant_trash_event | see schema_dwd_doc.sql | billiards_ods.assistant_cancellation_records | No |
| billiards_dwd.dwd_assistant_trash_event_ex | see schema_dwd_doc.sql | billiards_ods.assistant_cancellation_records | No |
| billiards_dwd.dwd_member_balance_change | see schema_dwd_doc.sql | billiards_ods.member_balance_changes | No |
| billiards_dwd.dwd_member_balance_change_ex | see schema_dwd_doc.sql | billiards_ods.member_balance_changes | No |
| billiards_dwd.dwd_groupbuy_redemption | see schema_dwd_doc.sql | billiards_ods.group_buy_redemption_records | No |
| billiards_dwd.dwd_groupbuy_redemption_ex | see schema_dwd_doc.sql | billiards_ods.group_buy_redemption_records | No |
| billiards_dwd.dwd_platform_coupon_redemption | see schema_dwd_doc.sql | billiards_ods.platform_coupon_redemption_records | No |
| billiards_dwd.dwd_platform_coupon_redemption_ex | see schema_dwd_doc.sql | billiards_ods.platform_coupon_redemption_records | No |
| billiards_dwd.dwd_recharge_order | see schema_dwd_doc.sql | billiards_ods.recharge_settlements | No |
| billiards_dwd.dwd_recharge_order_ex | see schema_dwd_doc.sql | billiards_ods.recharge_settlements | No |
| billiards_dwd.dwd_payment | see schema_dwd_doc.sql | billiards_ods.payment_transactions | No |
| billiards_dwd.dwd_refund | see schema_dwd_doc.sql | billiards_ods.refund_transactions | No |
| billiards_dwd.dwd_refund_ex | see schema_dwd_doc.sql | billiards_ods.refund_transactions | No |
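## Field-level mapping (same-name copy / needs mapping)
Same-name copy: the DWD field has the same name as an ODS column and is copied directly. Needs mapping: ODS has no column of that name, so the load logic must specify a source or a default value.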
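
The split between the two groups amounts to a set-membership test on column names. In the sketch below, `classify_columns` is a hypothetical helper and the column lists are abbreviated for illustration; they are not the full table definitions.

```python
def classify_columns(dwd_columns: list[str], ods_columns: set[str]) -> dict[str, list[str]]:
    """Split DWD columns into same-name copies and columns that need a mapping."""
    same_name = [c for c in dwd_columns if c in ods_columns]
    needs_mapping = [c for c in dwd_columns if c not in ods_columns]
    return {"same_name": same_name, "needs_mapping": needs_mapping}


# Abbreviated, illustrative column sets
ods_cols = {"id", "site_id", "table_name", "table_price"}
dwd_cols = ["site_id", "shop_name", "tenant_id", "scd2_version"]
result = classify_columns(dwd_cols, ods_cols)
```

Columns in the second group still need an explicit source expression or default in the load logic; the classification only tells the loader which ones it may copy blindly.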
### billiards_dwd.dim_site
Source: billiards_ods.site_tables_master
**Same-name copy fields:** site_id
**Mapped/derived fields:** org_id, shop_name, business_tel, full_address, tenant_id, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_site_ex
Source: billiards_ods.site_tables_master
**Same-name copy fields:** site_id, light_status, create_time
**Mapped/derived fields:** avatar, address, longitude, latitude, tenant_site_region_id, auto_light, light_type, light_token, site_type, site_label, attendance_enabled, attendance_distance, customer_service_qrcode, customer_service_wechat, fixed_pay_qrcode, prod_env, shop_status, update_time, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_table
Source: billiards_ods.site_tables_master
**Same-name copy fields:** site_id, table_name, site_table_area_id, table_price
**Mapped/derived fields:** table_id, tenant_id, site_table_area_name, tenant_table_area_id, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_table_ex
Source: billiards_ods.site_tables_master
**Same-name copy fields:** show_status, is_online_reservation, table_cloth_use_time, table_cloth_use_cycle, table_status
**Mapped/derived fields:** table_id, last_maintenance_time, remark, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_assistant
Source: billiards_ods.assistant_accounts_master
**Same-name copy fields:** assistant_no, real_name, nickname, mobile, tenant_id, site_id, team_id, team_name, level, entry_time, resign_time, leave_status, assistant_status
**Mapped/derived fields:** assistant_id, user_id, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_assistant_ex
Source: billiards_ods.assistant_accounts_master
**Same-name copy fields:** gender, avatar, video_introduction_url, staff_id, staff_profile_id, sum_grade, get_grade_times, work_status, show_status, show_sort, create_time, update_time, start_time, end_time, order_trade_no
**Mapped/derived fields:** assistant_id, birth_date, introduce, height, weight, shop_name, group_id, group_name, person_org_id, assistant_grade, charge_way, allow_cx, is_guaranteed, salary_grant_enabled, entry_type, entry_sign_status, resign_sign_status, online_status, is_delete, criticism_status, last_table_id, last_table_name, last_update_name, ding_talk_synced, site_light_cfg_id, light_equipment_id, light_status, is_team_leader, serial_number, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_member
Source: billiards_ods.member_profiles
**Same-name copy fields:** system_member_id, tenant_id, register_site_id, mobile, nickname, member_card_grade_code, member_card_grade_name, create_time
**Mapped/derived fields:** member_id, update_time, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_member_ex
Source: billiards_ods.member_profiles
**Same-name copy fields:** referrer_member_id, point, growth_value, user_status, status
**Mapped/derived fields:** member_id, register_site_name, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_member_card_account
Source: billiards_ods.member_stored_value_cards
**Same-name copy fields:** tenant_id, register_site_id, tenant_member_id, system_member_id, card_type_id, member_card_grade_code, member_card_grade_code_name, member_card_type_name, member_name, member_mobile, balance, start_time, end_time, last_consume_time, status, is_delete
**Mapped/derived fields:** member_card_id, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_member_card_account_ex
Source: billiards_ods.member_stored_value_cards
**Same-name copy fields:** site_name, tenantavatar, effect_site_id, able_cross_site, card_physics_type, card_no, bind_password, use_scene, denomination, create_time, disable_start_time, disable_end_time, is_allow_give, is_allow_order_deduct, sort, table_discount, goods_discount, assistant_discount, assistant_reward_discount, table_service_discount, goods_service_discount, assistant_service_discount, coupon_discount, table_discount_sub_switch, goods_discount_sub_switch, assistant_discount_sub_switch, assistant_reward_discount_sub_switch, goods_discount_range_type, table_deduct_radio, goods_deduct_radio, assistant_deduct_radio, table_service_deduct_radio, goods_service_deduct_radio, assistant_service_deduct_radio, assistant_reward_deduct_radio, coupon_deduct_radio, cardsettlededuct, tablecarddeduct, tableservicecarddeduct, goodscardeduct, goodsservicecarddeduct, assistantcarddeduct, assistantservicecarddeduct, assistantrewardcarddeduct, couponcarddeduct, deliveryfeededuct, tableareaid, goodscategoryid, pdassisnatlevel, cxassisnatlevel
**Mapped/derived fields:** member_card_id, tenant_name, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_tenant_goods
Source: billiards_ods.tenant_goods_master
**Same-name copy fields:** tenant_id, supplier_id, goods_category_id, goods_second_category_id, goods_name, goods_number, unit, market_price, goods_state, create_time, update_time, is_delete
**Mapped/derived fields:** tenant_goods_id, category_name, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_tenant_goods_ex
Source: billiards_ods.tenant_goods_master
**Same-name copy fields:** remark_name, pinyin_initial, goods_cover, goods_bar_code, commodity_code, min_discount_price, cost_price, cost_price_type, able_discount, sale_channel, is_warehousing, able_site_transfer, common_sale_royalty, point_sale_royalty
**Mapped/derived fields:** tenant_goods_id, commodity_code_list, is_in_site, out_goods_id, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_store_goods
Source: billiards_ods.store_goods_master
**Same-name copy fields:** tenant_id, site_id, tenant_goods_id, goods_name, goods_category_id, goods_second_category_id, sale_price, goods_state, enable_status, send_state, is_delete
**Mapped/derived fields:** site_goods_id, category_level1_name, category_level2_name, batch_stock_qty, sale_qty, total_sales_qty, created_at, updated_at, avg_monthly_sales, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_store_goods_ex
Source: billiards_ods.store_goods_master
**Same-name copy fields:** unit, pinyin_initial, cost_price, cost_price_type, total_purchase_cost, min_discount_price, audit_status, sale_channel, is_warehousing, forbid_sell_status, able_site_transfer, custom_label_type, option_required, remark
**Mapped/derived fields:** site_goods_id, site_name, goods_barcode, goods_cover_url, stock_qty, stock_secondary_qty, safety_stock_qty, provisional_total_cost, is_discountable, days_on_shelf, freeze_status, sort_order, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_goods_category
Source: billiards_ods.stock_goods_category_tree
**Same-name copy fields:** tenant_id, category_name, alias_name, business_name, tenant_goods_business_id, open_salesman, is_warehousing
**Mapped/derived fields:** category_id, parent_category_id, category_level, is_leaf, sort_order, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_groupbuy_package
Source: billiards_ods.group_buy_packages
**Same-name copy fields:** tenant_id, site_id, package_name, selling_price, start_time, end_time, table_area_name, is_enabled, is_delete, create_time, tenant_table_area_id_list, card_type_ids
**Mapped/derived fields:** groupbuy_package_id, package_template_id, coupon_face_value, duration_seconds, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dim_groupbuy_package_ex
Source: billiards_ods.group_buy_packages
**Same-name copy fields:** site_name, usable_count, date_type, usable_range, date_info, start_clock, end_clock, add_start_clock, add_end_clock, area_tag_type, table_area_id, tenant_table_area_id, table_area_id_list, group_type, system_group_type, effective_status, max_selectable_categories, creator_name
**Mapped/derived fields:** groupbuy_package_id, package_type, scd2_start_time, scd2_end_time, scd2_is_current, scd2_version
### billiards_dwd.dwd_settlement_head
Source: billiards_ods.settlement_records
**Same-name copy fields:** (none)
**Mapped/derived fields:** order_settle_id, tenant_id, site_id, site_name, table_id, settle_name, order_trade_no, create_time, pay_time, settle_type, revoke_order_id, member_id, member_name, member_phone, member_card_account_id, member_card_type_name, is_bind_member, member_discount_amount, consume_money, table_charge_money, goods_money, real_goods_money, assistant_pd_money, assistant_cx_money, adjust_amount, pay_amount, balance_amount, recharge_card_amount, gift_card_amount, coupon_amount, rounding_amount, point_amount
### billiards_dwd.dwd_settlement_head_ex
Source: billiards_ods.settlement_records
**Same-name copy fields:** (none)
**Mapped/derived fields:** order_settle_id, serial_number, settle_status, can_be_revoked, revoke_order_name, revoke_time, is_first_order, service_money, cash_amount, card_amount, online_amount, refund_amount, prepay_money, payment_method, coupon_sale_amount, all_coupon_discount, goods_promotion_money, assistant_promotion_money, activity_discount, assistant_manual_discount, point_discount_price, point_discount_cost, is_use_coupon, is_use_discount, is_activity, operator_name, salesman_name, order_remark, operator_id, salesman_user_id
### billiards_dwd.dwd_table_fee_log
Source: billiards_ods.table_fee_transactions
**Same-name copy fields:** order_trade_no, order_settle_id, order_pay_id, tenant_id, site_id, site_table_id, site_table_area_id, site_table_area_name, tenant_table_area_id, member_id, ledger_name, ledger_unit_price, ledger_count, ledger_amount, real_table_charge_money, coupon_promotion_amount, member_discount_amount, adjust_amount, real_table_use_seconds, add_clock_seconds, start_use_time, ledger_end_time, create_time, ledger_status, is_single_order, is_delete
**Mapped/derived fields:** table_fee_log_id
### billiards_dwd.dwd_table_fee_log_ex
Source: billiards_ods.table_fee_transactions
**Same-name copy fields:** operator_name, salesman_name, used_card_amount, service_money, mgmt_fee, fee_total, ledger_start_time, last_use_time, operator_id, salesman_user_id, salesman_org_id
**Mapped/derived fields:** table_fee_log_id
### billiards_dwd.dwd_table_fee_adjust
Source: billiards_ods.table_fee_discount_records
**Same-name copy fields:** order_trade_no, order_settle_id, tenant_id, site_id, tenant_table_area_id, ledger_amount, ledger_status, is_delete
**Mapped/derived fields:** table_fee_adjust_id, table_id, table_area_id, table_area_name, adjust_time
### billiards_dwd.dwd_table_fee_adjust_ex
Source: billiards_ods.table_fee_discount_records
**Same-name copy fields:** adjust_type, ledger_count, ledger_name, applicant_name, operator_name, applicant_id, operator_id
**Mapped/derived fields:** table_fee_adjust_id
### billiards_dwd.dwd_store_goods_sale
Source: billiards_ods.store_goods_sales_records
**Same-name copy fields:** order_trade_no, order_settle_id, order_pay_id, order_goods_id, site_id, tenant_id, site_goods_id, tenant_goods_id, tenant_goods_category_id, tenant_goods_business_id, site_table_id, ledger_name, ledger_group_name, ledger_unit_price, ledger_count, ledger_amount, real_goods_money, cost_money, ledger_status, is_delete, create_time
**Mapped/derived fields:** store_goods_sale_id, discount_price
### billiards_dwd.dwd_store_goods_sale_ex
Source: billiards_ods.store_goods_sales_records
**Same-name copy fields:** goods_remark, option_value_name, operator_name, salesman_user_id, salesman_name, salesman_role_id, discount_money, coupon_deduct_money, member_discount_amount, point_discount_money, point_discount_money_cost, package_coupon_id, order_coupon_id, member_coupon_id, option_price, option_member_discount_money, option_coupon_deduct_money, push_money, is_single_order, sales_type, operator_id
**Mapped/derived fields:** store_goods_sale_id, legacy_order_goods_id, site_name, legacy_site_id, open_salesman_flag, salesman_org_id, returns_number
### billiards_dwd.dwd_assistant_service_log
Source: billiards_ods.assistant_service_records
**Same-name copy fields:** order_trade_no, order_settle_id, order_pay_id, order_assistant_id, order_assistant_type, tenant_id, site_id, site_table_id, nickname, assistant_team_id, person_org_id, assistant_level, ledger_unit_price, ledger_amount, projected_income, coupon_deduct_money, income_seconds, real_use_seconds, add_clock, create_time, start_use_time, last_use_time, is_delete
**Mapped/derived fields:** assistant_service_id, tenant_member_id, system_member_id, assistant_no, site_assistant_id, user_id, level_name, skill_id, skill_name
### billiards_dwd.dwd_assistant_service_log_ex
Source: billiards_ods.assistant_service_records
**Same-name copy fields:** ledger_name, ledger_group_name, ledger_count, member_discount_amount, manual_discount_amount, service_money, returns_clock, ledger_start_time, ledger_end_time, ledger_status, is_confirm, is_single_order, is_not_responding, is_trash, trash_applicant_id, trash_applicant_name, trash_reason, salesman_user_id, salesman_name, salesman_org_id, skill_grade, service_grade, composite_grade, sum_grade, get_grade_times, grade_status, composite_grade_time
**Mapped/derived fields:** assistant_service_id, table_name, assistant_name
### billiards_dwd.dwd_assistant_trash_event
Source: billiards_ods.assistant_cancellation_records
**Same-name copy fields:** (none)
**Mapped/derived fields:** assistant_trash_event_id, site_id, table_id, table_area_id, assistant_no, assistant_name, charge_minutes_raw, abolish_amount, trash_reason, create_time
### billiards_dwd.dwd_assistant_trash_event_ex
Source: billiards_ods.assistant_cancellation_records
**Same-name copy fields:** (none)
**Mapped/derived fields:** assistant_trash_event_id, table_name, table_area_name
### billiards_dwd.dwd_member_balance_change
Source: billiards_ods.member_balance_changes
**Same-name copy fields:** tenant_id, site_id, register_site_id, tenant_member_id, system_member_id, tenant_member_card_id, card_type_id, from_type, payment_method, is_delete, remark
**Mapped/derived fields:** balance_change_id, card_type_name, member_name, member_mobile, balance_before, change_amount, balance_after, change_time
### billiards_dwd.dwd_member_balance_change_ex
Source: billiards_ods.member_balance_changes
**Same-name copy fields:** refund_amount, operator_id, operator_name
**Mapped/derived fields:** balance_change_id, pay_site_name, register_site_name
### billiards_dwd.dwd_groupbuy_redemption
Source: billiards_ods.group_buy_redemption_records
**Same-name copy fields:** tenant_id, site_id, table_id, tenant_table_area_id, table_charge_seconds, order_trade_no, order_settle_id, order_coupon_id, coupon_origin_id, promotion_activity_id, promotion_coupon_id, order_coupon_channel, ledger_unit_price, ledger_count, ledger_amount, coupon_money, promotion_seconds, coupon_code, is_single_order, is_delete, ledger_name, create_time
**Mapped/derived fields:** redemption_id
### billiards_dwd.dwd_groupbuy_redemption_ex
Source: billiards_ods.group_buy_redemption_records
**Same-name copy fields:** order_pay_id, goods_promotion_money, table_service_promotion_money, assistant_promotion_money, assistant_service_promotion_money, reward_promotion_money, recharge_promotion_money, offer_type, ledger_status, operator_id, operator_name, salesman_user_id, salesman_name, salesman_role_id, ledger_group_name
**Mapped/derived fields:** redemption_id, site_name, table_name, table_area_name, goods_option_price, salesman_org_id
### billiards_dwd.dwd_platform_coupon_redemption
Source: billiards_ods.platform_coupon_redemption_records
**Same-name copy fields:** tenant_id, site_id, coupon_code, coupon_channel, coupon_name, sale_price, coupon_money, coupon_free_time, channel_deal_id, deal_id, group_package_id, site_order_id, table_id, certificate_id, verify_id, use_status, is_delete, create_time, consume_time
**Mapped/derived fields:** platform_coupon_redemption_id
### billiards_dwd.dwd_platform_coupon_redemption_ex
Source: billiards_ods.platform_coupon_redemption_records
**Same-name copy fields:** coupon_cover, coupon_remark, groupon_type, operator_id, operator_name
**Mapped/derived fields:** platform_coupon_redemption_id
### billiards_dwd.dwd_recharge_order
Source: billiards_ods.recharge_settlements
**Same-name copy fields:** (none)
**Mapped/derived fields:** recharge_order_id, tenant_id, site_id, member_id, member_name_snapshot, member_phone_snapshot, tenant_member_card_id, member_card_type_name, settle_relate_id, settle_type, settle_name, is_first, pay_amount, refund_amount, point_amount, cash_amount, payment_method, create_time, pay_time
### billiards_dwd.dwd_recharge_order_ex
Source: billiards_ods.recharge_settlements
**Same-name copy fields:** (none)
**Mapped/derived fields:** recharge_order_id, site_name_snapshot, settle_status, is_bind_member, is_activity, is_use_coupon, is_use_discount, can_be_revoked, online_amount, balance_amount, card_amount, coupon_amount, recharge_card_amount, gift_card_amount, prepay_money, consume_money, goods_money, real_goods_money, table_charge_money, service_money, activity_discount, all_coupon_discount, goods_promotion_money, assistant_promotion_money, assistant_pd_money, assistant_cx_money, assistant_manual_discount, coupon_sale_amount, member_discount_amount, point_discount_price, point_discount_cost, adjust_amount, rounding_amount, operator_id, operator_name_snapshot, salesman_user_id, salesman_name, order_remark, table_id, serial_number, revoke_order_id, revoke_order_name, revoke_time
### billiards_dwd.dwd_payment
Source: billiards_ods.payment_transactions
**Same-name copy fields:** site_id, relate_type, relate_id, pay_amount, pay_status, payment_method, online_pay_channel, create_time, pay_time
**Mapped/derived fields:** payment_id, pay_date
### billiards_dwd.dwd_refund
Source: billiards_ods.refund_transactions
**Same-name copy fields:** tenant_id, site_id, relate_type, relate_id, pay_amount, channel_fee, pay_time, create_time, payment_method, member_id, member_card_id
**Mapped/derived fields:** refund_id
### billiards_dwd.dwd_refund_ex
Source: billiards_ods.refund_transactions
**Same-name copy fields:** pay_sn, refund_amount, round_amount, balance_frozen_amount, card_frozen_amount, pay_status, action_type, is_revoke, is_delete, check_status, online_pay_channel, online_pay_type, pay_terminal, pay_config_id, cashier_point_id, operator_id, channel_payer_id, channel_pay_no
**Mapped/derived fields:** refund_id, tenant_name


@@ -1,692 +0,0 @@
{
"generated_at": "2025-12-09T01:38:19.992961",
"tables": [
{
"dwd_table": "billiards_dwd.dim_site",
"ods_table": "billiards_ods.table_fee_transactions",
"count": {
"dwd": 1,
"ods": 200,
"diff": -199
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_site_ex",
"ods_table": "billiards_ods.table_fee_transactions",
"count": {
"dwd": 1,
"ods": 200,
"diff": -199
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_table",
"ods_table": "billiards_ods.site_tables_master",
"count": {
"dwd": 71,
"ods": 71,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_table_ex",
"ods_table": "billiards_ods.site_tables_master",
"count": {
"dwd": 71,
"ods": 71,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_assistant",
"ods_table": "billiards_ods.assistant_accounts_master",
"count": {
"dwd": 50,
"ods": 50,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_assistant_ex",
"ods_table": "billiards_ods.assistant_accounts_master",
"count": {
"dwd": 50,
"ods": 50,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_member",
"ods_table": "billiards_ods.member_profiles",
"count": {
"dwd": 199,
"ods": 199,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_member_ex",
"ods_table": "billiards_ods.member_profiles",
"count": {
"dwd": 199,
"ods": 199,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_member_card_account",
"ods_table": "billiards_ods.member_stored_value_cards",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "balance",
"dwd_sum": 31061.03,
"ods_sum": 31061.03,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dim_member_card_account_ex",
"ods_table": "billiards_ods.member_stored_value_cards",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "deliveryfeededuct",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dim_tenant_goods",
"ods_table": "billiards_ods.tenant_goods_master",
"count": {
"dwd": 156,
"ods": 156,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_tenant_goods_ex",
"ods_table": "billiards_ods.tenant_goods_master",
"count": {
"dwd": 156,
"ods": 156,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_store_goods",
"ods_table": "billiards_ods.store_goods_master",
"count": {
"dwd": 161,
"ods": 161,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_store_goods_ex",
"ods_table": "billiards_ods.store_goods_master",
"count": {
"dwd": 161,
"ods": 161,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_goods_category",
"ods_table": "billiards_ods.stock_goods_category_tree",
"count": {
"dwd": 9,
"ods": 9,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_groupbuy_package",
"ods_table": "billiards_ods.group_buy_packages",
"count": {
"dwd": 17,
"ods": 17,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_groupbuy_package_ex",
"ods_table": "billiards_ods.group_buy_packages",
"count": {
"dwd": 17,
"ods": 17,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_settlement_head",
"ods_table": "billiards_ods.settlement_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_settlement_head_ex",
"ods_table": "billiards_ods.settlement_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_table_fee_log",
"ods_table": "billiards_ods.table_fee_transactions",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "adjust_amount",
"dwd_sum": 1157.45,
"ods_sum": 1157.45,
"diff": 0.0
},
{
"column": "coupon_promotion_amount",
"dwd_sum": 11244.49,
"ods_sum": 11244.49,
"diff": 0.0
},
{
"column": "ledger_amount",
"dwd_sum": 18107.0,
"ods_sum": 18107.0,
"diff": 0.0
},
{
"column": "member_discount_amount",
"dwd_sum": 1149.19,
"ods_sum": 1149.19,
"diff": 0.0
},
{
"column": "real_table_charge_money",
"dwd_sum": 5705.06,
"ods_sum": 5705.06,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_table_fee_log_ex",
"ods_table": "billiards_ods.table_fee_transactions",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "fee_total",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "mgmt_fee",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "service_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "used_card_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_table_fee_adjust",
"ods_table": "billiards_ods.table_fee_discount_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "ledger_amount",
"dwd_sum": 20650.84,
"ods_sum": 20650.84,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_table_fee_adjust_ex",
"ods_table": "billiards_ods.table_fee_discount_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_store_goods_sale",
"ods_table": "billiards_ods.store_goods_sales_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "cost_money",
"dwd_sum": 22.3,
"ods_sum": 22.3,
"diff": 0.0
},
{
"column": "ledger_amount",
"dwd_sum": 4583.0,
"ods_sum": 4583.0,
"diff": 0.0
},
{
"column": "real_goods_money",
"dwd_sum": 3791.0,
"ods_sum": 3791.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_store_goods_sale_ex",
"ods_table": "billiards_ods.store_goods_sales_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "coupon_deduct_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "discount_money",
"dwd_sum": 792.0,
"ods_sum": 792.0,
"diff": 0.0
},
{
"column": "member_discount_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "option_coupon_deduct_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "option_member_discount_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "point_discount_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "point_discount_money_cost",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "push_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_assistant_service_log",
"ods_table": "billiards_ods.assistant_service_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "coupon_deduct_money",
"dwd_sum": 626.83,
"ods_sum": 626.83,
"diff": 0.0
},
{
"column": "ledger_amount",
"dwd_sum": 63251.37,
"ods_sum": 63251.37,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_assistant_service_log_ex",
"ods_table": "billiards_ods.assistant_service_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "manual_discount_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "member_discount_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "service_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_assistant_trash_event",
"ods_table": "billiards_ods.assistant_cancellation_records",
"count": {
"dwd": 15,
"ods": 15,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_assistant_trash_event_ex",
"ods_table": "billiards_ods.assistant_cancellation_records",
"count": {
"dwd": 15,
"ods": 15,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_member_balance_change",
"ods_table": "billiards_ods.member_balance_changes",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_member_balance_change_ex",
"ods_table": "billiards_ods.member_balance_changes",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "refund_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_groupbuy_redemption",
"ods_table": "billiards_ods.group_buy_redemption_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "coupon_money",
"dwd_sum": 12266.0,
"ods_sum": 12266.0,
"diff": 0.0
},
{
"column": "ledger_amount",
"dwd_sum": 12049.53,
"ods_sum": 12049.53,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_groupbuy_redemption_ex",
"ods_table": "billiards_ods.group_buy_redemption_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "assistant_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "assistant_service_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "goods_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "recharge_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "reward_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "table_service_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_platform_coupon_redemption",
"ods_table": "billiards_ods.platform_coupon_redemption_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "coupon_money",
"dwd_sum": 11956.0,
"ods_sum": 11956.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_platform_coupon_redemption_ex",
"ods_table": "billiards_ods.platform_coupon_redemption_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_recharge_order",
"ods_table": "billiards_ods.recharge_settlements",
"count": {
"dwd": 74,
"ods": 74,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_recharge_order_ex",
"ods_table": "billiards_ods.recharge_settlements",
"count": {
"dwd": 74,
"ods": 74,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_payment",
"ods_table": "billiards_ods.payment_transactions",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "pay_amount",
"dwd_sum": 10863.0,
"ods_sum": 10863.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_refund",
"ods_table": "billiards_ods.refund_transactions",
"count": {
"dwd": 11,
"ods": 11,
"diff": 0
},
"amounts": [
{
"column": "channel_fee",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "pay_amount",
"dwd_sum": -62186.0,
"ods_sum": -62186.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_refund_ex",
"ods_table": "billiards_ods.refund_transactions",
"count": {
"dwd": 11,
"ods": 11,
"diff": 0
},
"amounts": [
{
"column": "balance_frozen_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "card_frozen_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "refund_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "round_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
}
],
"note": "Row-count and amount reconciliation; amount fields are found by automatically scanning numeric columns whose names contain amount/money/fee/balance."
}
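
A report of this shape is easy to check mechanically. The snippet below is a minimal sketch, not part of the commit: the function name `find_mismatches` and the `tolerance` parameter are assumptions introduced for illustration, while the keys (`tables`, `dwd_table`, `count`, `amounts`, `diff`) are the ones visible in the report above. The second sample entry reuses the mismatching counts (26 vs 9) that appear in the report diff in this comparison.

```python
import json


def find_mismatches(report: dict, tolerance: float = 0.005) -> list[str]:
    """Describe every table whose row counts or amount sums disagree."""
    problems = []
    for entry in report.get("tables", []):
        table = entry.get("dwd_table", "<unknown>")
        count = entry.get("count", {})
        if count.get("diff", 0) != 0:
            problems.append(
                f"{table}: rows dwd={count.get('dwd')} ods={count.get('ods')}"
            )
        for amount in entry.get("amounts", []):
            # Amounts are floats, so compare against a small tolerance (assumed value).
            if abs(amount.get("diff", 0.0)) > tolerance:
                problems.append(f"{table}.{amount.get('column')}: diff={amount['diff']}")
    return problems


# Two entries shaped like the report; the second one has a row-count mismatch.
sample = json.loads("""
{
  "tables": [
    {"dwd_table": "billiards_dwd.dwd_payment",
     "ods_table": "billiards_ods.payment_transactions",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "pay_amount", "dwd_sum": 10863.0,
                  "ods_sum": 10863.0, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dim_goods_category",
     "ods_table": "billiards_ods.stock_goods_category_tree",
     "count": {"dwd": 26, "ods": 9, "diff": 17},
     "amounts": []}
  ]
}
""")

for line in find_mismatches(sample):
    print(line)
```

An empty result would mean every table reconciles on both row counts and scanned amount columns.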

@@ -1,5 +1,5 @@
 {
-"generated_at": "2025-12-09T03:43:54.887796",
+"generated_at": "2025-12-09T05:21:24.745244",
 "tables": [
 {
 "dwd_table": "billiards_dwd.dim_site",
@@ -159,9 +159,9 @@
 "dwd_table": "billiards_dwd.dim_goods_category",
 "ods_table": "billiards_ods.stock_goods_category_tree",
 "count": {
-"dwd": 9,
+"dwd": 26,
 "ods": 9,
-"diff": 0
+"diff": 17
 },
 "amounts": []
 },
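
The hunk above is a change between two report snapshots: only `generated_at` and the `dim_goods_category` counts differ. The following is a small sketch of how two snapshots could be compared programmatically; `changed_counts` is a hypothetical helper name, and the sample dictionaries are trimmed to the fields shown in the reports.

```python
def changed_counts(old: dict, new: dict) -> dict:
    """Map each DWD table to (old_count, new_count) where the counts differ."""
    old_counts = {t["dwd_table"]: t["count"] for t in old.get("tables", [])}
    changes = {}
    for entry in new.get("tables", []):
        name = entry["dwd_table"]
        before = old_counts.get(name)
        # Tables that only exist in one snapshot are ignored in this sketch.
        if before is not None and before != entry["count"]:
            changes[name] = (before, entry["count"])
    return changes


old_report = {
    "generated_at": "2025-12-09T03:43:54.887796",
    "tables": [
        {"dwd_table": "billiards_dwd.dim_goods_category",
         "count": {"dwd": 9, "ods": 9, "diff": 0}},
        {"dwd_table": "billiards_dwd.dim_groupbuy_package",
         "count": {"dwd": 17, "ods": 17, "diff": 0}},
    ],
}
new_report = {
    "generated_at": "2025-12-09T05:21:24.745244",
    "tables": [
        {"dwd_table": "billiards_dwd.dim_goods_category",
         "count": {"dwd": 26, "ods": 9, "diff": 17}},
        {"dwd_table": "billiards_dwd.dim_groupbuy_package",
         "count": {"dwd": 17, "ods": 17, "diff": 0}},
    ],
}

print(changed_counts(old_report, new_report))
```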

File diff suppressed because it is too large (repeated for 3 files)

@@ -0,0 +1,634 @@
# -*- coding: utf-8 -*-
import ast
import json
import re
from collections import deque
from pathlib import Path
ROOT = Path(r"C:\dev\LLTQ\ETL\feiqiu-ETL")
SQL_PATH = ROOT / "etl_billiards" / "database" / "schema_dwd_doc.sql"
DOC_DIR = Path(r"C:\dev\LLTQ\export\test-json-doc")
DWD_TASK_PATH = ROOT / "etl_billiards" / "tasks" / "dwd_load_task.py"
SCD_COLS = {"scd2_start_time", "scd2_end_time", "scd2_is_current", "scd2_version"}
SITEPROFILE_FIELD_PURPOSE = {
"id": "门店 ID用于门店维度关联。",
"org_id": "组织/机构 ID用于组织维度归属。",
"shop_name": "门店名称,用于展示与查询。",
"site_label": "门店标签(如 A/B 店),用于展示与分组。",
"full_address": "门店详细地址,用于展示与地理信息。",
"address": "门店地址简称/快照,用于展示。",
"longitude": "经度,用于定位与地图展示。",
"latitude": "纬度,用于定位与地图展示。",
"tenant_site_region_id": "租户下门店区域 ID用于区域维度分析。",
"business_tel": "门店电话,用于联系信息展示。",
"site_type": "门店类型枚举,用于门店分类。",
"shop_status": "门店状态枚举,用于营业状态标识。",
"tenant_id": "租户/品牌 ID用于商户维度过滤与关联。",
"auto_light": "是否启用自动灯控配置,用于门店设备策略。",
"attendance_enabled": "是否启用考勤功能,用于门店考勤配置。",
"attendance_distance": "考勤允许距离(米),用于考勤打卡限制。",
"prod_env": "环境标识(生产/测试),用于区分配置环境。",
"light_status": "灯控状态/开关,用于灯控设备管理。",
"light_type": "灯控类型,用于设备类型区分。",
"light_token": "灯控控制令牌,用于对接灯控服务。",
"avatar": "门店头像/图片 URL用于展示。",
"wifi_name": "门店 WiFi 名称,用于展示与引导。",
"wifi_password": "门店 WiFi 密码,用于展示与引导。",
"customer_service_qrcode": "客服二维码 URL用于引导联系。",
"customer_service_wechat": "客服微信号,用于引导联系。",
"fixed_pay_qrCode": "固定收款码二维码URL用于收款引导。",
"create_time": "门店创建时间(快照字段)。",
"update_time": "门店更新时间(快照字段)。",
}
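def _escape_sql(s: str) -> str:
    # Double single quotes so the text is safe inside a SQL string literal.
    return (s or "").replace("'", "''")


def _first_sentence(text: str, max_len: int = 140) -> str:
    s = re.sub(r"\s+", " ", (text or "").strip())
    if not s:
        return ""
    # Keep only the text before the first sentence-ending punctuation mark.
    parts = re.split(r"[。;;]\s*", s)
    s = parts[0].strip() if parts else s
    if len(s) > max_len:
        # Truncate and mark the cut with an ellipsis.
        s = s[: max_len - 1] + "…"
    return s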
def normalize_key(s: str) -> str:
    # Lower-case and drop underscores, hyphens and whitespace.
    return re.sub(r"[_\-\s]", "", (s or "").lower())


def snake_to_lower_camel(s: str) -> str:
    parts = re.split(r"[_\-\s]+", s)
    if not parts:
        return s
    first = parts[0].lower()
    rest = "".join(p[:1].upper() + p[1:] for p in parts[1:] if p)
    return first + rest


def snake_to_upper_camel(s: str) -> str:
    parts = re.split(r"[_\-\s]+", s)
    return "".join(p[:1].upper() + p[1:] for p in parts if p)


def find_key_in_record(record: dict, token: str) -> str | None:
    if not isinstance(record, dict):
        return None
    if token in record:
        return token
    norm_to_key = {normalize_key(k): k for k in record.keys()}
    candidates = [
        token,
        token.lower(),
        token.upper(),
        snake_to_lower_camel(token),
        snake_to_upper_camel(token),
    ]
    # Common variants: siteProfile / siteprofile
    if normalize_key(token) == "siteprofile":
        candidates.extend(["siteProfile", "siteprofile"])
    for c in candidates:
        nk = normalize_key(c)
        if nk in norm_to_key:
            return norm_to_key[nk]
    return None
def parse_dwd_task_mappings(path: Path):
    # Read TABLE_MAP / FACT_MAPPINGS from the class body without importing the module.
    mod = ast.parse(path.read_text(encoding="utf-8"))
    table_map = None
    fact_mappings = None
    for node in mod.body:
        if isinstance(node, ast.ClassDef) and node.name == "DwdLoadTask":
            for stmt in node.body:
                if isinstance(stmt, ast.Assign) and len(stmt.targets) == 1 and isinstance(stmt.targets[0], ast.Name):
                    name = stmt.targets[0].id
                    if name == "TABLE_MAP":
                        table_map = ast.literal_eval(stmt.value)
                    elif name == "FACT_MAPPINGS":
                        fact_mappings = ast.literal_eval(stmt.value)
                if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                    name = stmt.target.id
                    if name == "TABLE_MAP":
                        table_map = ast.literal_eval(stmt.value)
                    elif name == "FACT_MAPPINGS":
                        fact_mappings = ast.literal_eval(stmt.value)
    if not isinstance(table_map, dict) or not isinstance(fact_mappings, dict):
        raise RuntimeError("Failed to parse TABLE_MAP/FACT_MAPPINGS from dwd_load_task.py")
    return table_map, fact_mappings
def parse_columns_from_ddl(create_sql: str):
    # Extract column names from the body of a CREATE TABLE statement.
    start = create_sql.find("(")
    end = create_sql.rfind(")")
    body = create_sql[start + 1 : end]
    cols = []
    for line in body.splitlines():
        s = line.strip().rstrip(",")
        if not s:
            continue
        if s.upper().startswith("PRIMARY KEY"):
            continue
        if s.upper().startswith("CONSTRAINT "):
            continue
        m = re.match(r"^([A-Za-z_][A-Za-z0-9_]*)\s+", s)
        if not m:
            continue
        name = m.group(1)
        if name.upper() in {"PRIMARY", "UNIQUE", "FOREIGN", "CHECK"}:
            continue
        cols.append(name.lower())
    return cols
def _find_best_record_list(data, required_norm_keys: set[str]):
    # Breadth-first search for the list of dicts whose keys best match the required keys.
    best = None
    best_score = -1.0
    best_path: list[str] = []
    q = deque([(data, 0, [])])
    visited = 0
    while q and visited < 25000:
        node, depth, path = q.popleft()
        visited += 1
        if depth > 10:
            continue
        if isinstance(node, list):
            if node and all(isinstance(x, dict) for x in node[:3]):
                scores = []
                for x in node[:5]:
                    keys_norm = {normalize_key(k) for k in x.keys()}
                    scores.append(len(keys_norm & required_norm_keys))
                score = sum(scores) / max(1, len(scores))
                if score > best_score:
                    best_score = score
                    best = node
                    best_path = path
                for x in node[:10]:
                    q.append((x, depth + 1, path))
            else:
                for x in node[:120]:
                    q.append((x, depth + 1, path))
        elif isinstance(node, dict):
            for k, v in list(node.items())[:160]:
                q.append((v, depth + 1, path + [str(k)]))
    node_str = ".".join(best_path) if best_path else "$"
    return best or [], node_str
def _format_example(value, max_len: int = 120) -> str:
    # Render a sample value compactly; long renderings are cut and end with an ellipsis.
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        s = value.strip()
        if len(s) > max_len:
            s = s[: max_len - 1] + "…"
        return s
    if isinstance(value, dict):
        keys = list(value)[:6]
        mini = {k: value.get(k) for k in keys}
        rendered = json.dumps(mini, ensure_ascii=False)
        if len(value) > len(keys):
            rendered = rendered[:-1] + ", …}"
        if len(rendered) > max_len:
            rendered = rendered[: max_len - 1] + "…"
        return rendered
    if isinstance(value, list):
        if not value:
            return "[]"
        rendered = json.dumps(value[0], ensure_ascii=False)
        if len(value) > 1:
            rendered = f"[{rendered}, …] (len={len(value)})"
        else:
            rendered = f"[{rendered}]"
        if len(rendered) > max_len:
            rendered = rendered[: max_len - 1] + "…"
        return rendered
    s = str(value)
    if len(s) > max_len:
        s = s[: max_len - 1] + "…"
    return s
def _infer_purpose(table: str, col: str, json_path: str | None) -> str:
    lcol = col.lower()
    if lcol in SCD_COLS:
        if lcol == "scd2_start_time":
            return "SCD2 开始时间(版本生效起点),用于维度慢变追踪。"
        if lcol == "scd2_end_time":
            return "SCD2 结束时间(默认 9999-12-31 表示当前版本),用于维度慢变追踪。"
        if lcol == "scd2_is_current":
            return "SCD2 当前版本标记(1=当前,0=历史),用于筛选最新维度记录。"
        if lcol == "scd2_version":
            return "SCD2 版本号(自增),用于与时间段一起避免版本重叠。"
    if json_path and json_path.startswith("siteProfile."):
        sf = json_path.split(".", 1)[1]
        return SITEPROFILE_FIELD_PURPOSE.get(sf, "门店快照字段,用于门店维度补充信息。")
    if lcol.endswith("_id"):
        return "标识类 ID 字段,用于关联/定位相关实体。"
    if lcol.endswith("_time") or lcol.endswith("time") or lcol.endswith("_date"):
        return "时间/日期字段,用于记录业务时间与统计口径对齐。"
    if any(k in lcol for k in ["amount", "money", "fee", "price", "deduct", "cost", "balance"]):
        return "金额字段,用于计费/结算/核算等金额计算。"
    if any(k in lcol for k in ["count", "num", "number", "seconds", "qty", "quantity"]):
        return "数量/时长字段,用于统计与计量。"
    if lcol.endswith("_name") or lcol.endswith("name"):
        return "名称字段,用于展示与辅助识别。"
    if lcol.endswith("_status") or lcol == "status":
        return "状态枚举字段,用于标识业务状态。"
    if lcol.startswith("is_") or lcol.startswith("can_"):
        return "布尔/开关字段,用于表示是否/可用性等业务开关。"
    # Table-level fallback
    if table.startswith("dim_"):
        return "维度字段,用于补充维度属性。"
    return "明细字段,用于记录事实取值。"
def _parse_json_extract(expr: str):
    # e.g. siteprofile->>'org_id'
    m = re.match(r"^([A-Za-z_][A-Za-z0-9_]*)\s*->>\s*'([^']+)'\s*$", expr)
    if not m:
        return None
    base = m.group(1)
    field = m.group(2)
    if normalize_key(base) == "siteprofile":
        base = "siteProfile"
    return base, field
def build_table_comment(table: str, source_ods: str | None, source_json_base: str | None) -> str:
    table_l = table.lower()
    if table_l.startswith("dim_"):
        kind = "DWD 维度表"
    else:
        kind = "DWD 明细事实表"
    extra = "扩展字段表" if table_l.endswith("_ex") else ""
    if source_ods and source_json_base:
        src = (
            f"ODS 来源表:{source_ods}(对应 JSON:{source_json_base}.json;分析:{source_json_base}-Analysis.md)。"
            f"装载/清洗逻辑参考:etl_billiards/tasks/dwd_load_task.py(DwdLoadTask)。"
        )
    else:
        src = "来源:由 ODS 清洗装载生成(详见 DWD 装载任务)。"
    return f"{kind}{('(' + extra + ')') if extra else ''}:{table_l}。{src}"
def get_source_info(table_l: str, table_map: dict) -> tuple[str | None, str | None]:
    key = f"billiards_dwd.{table_l}"
    source_ods = table_map.get(key)
    if not source_ods:
        return None, None
    # The sample JSON file is named after the ODS table (schema prefix removed).
    json_base = source_ods.split(".")[-1]
    return source_ods, json_base


def build_column_mappings(table_l: str, cols: list[str], fact_mappings: dict) -> dict[str, tuple[str | None, str | None]]:
    # return col -> (json_path, src_expr)
    mapping_list = fact_mappings.get(f"billiards_dwd.{table_l}") or []
    explicit = {dwd_col.lower(): src_expr for dwd_col, src_expr, _cast in mapping_list}
    casts = {dwd_col.lower(): cast for dwd_col, _src_expr, cast in mapping_list}
    out: dict[str, tuple[str | None, str | None]] = {}
    for c in cols:
        if c in SCD_COLS:
            out[c] = (None, None)
            continue
        src_expr = explicit.get(c, c)
        cast = casts.get(c)
        json_path = None
        parsed = _parse_json_extract(src_expr)
        if parsed:
            base, field = parsed
            json_path = f"{base}.{field}"
        else:
            # derived: pay_date uses pay_time + cast date
            if cast == "date":
                json_path = src_expr
            else:
                json_path = src_expr
        out[c] = (json_path, src_expr)
    return out
def load_json_records(json_base: str, required_norm_keys: set[str]):
    json_path = DOC_DIR / f"{json_base}.json"
    data = json.loads(json_path.read_text(encoding="utf-8"))
    return _find_best_record_list(data, required_norm_keys)


def pick_example_from_record(record: dict, json_path: str | None):
    if not json_path:
        return None
    if json_path.startswith("siteProfile."):
        base_key = find_key_in_record(record, "siteProfile")
        base = record.get(base_key) if base_key else None
        if isinstance(base, dict):
            field = json_path.split(".", 1)[1]
            return base.get(field)
        return None
    # plain key
    key = find_key_in_record(record, json_path)
    if key:
        return record.get(key)
    # fallback: try match by normalized name
    nk = normalize_key(json_path)
    for k in record.keys():
        if normalize_key(k) == nk:
            return record.get(k)
    return None
def resolve_json_field_display(records: list, json_path: str | None, cast: str | None = None) -> str:
    if not json_path:
        return ""
    if json_path.startswith("siteProfile."):
        return json_path
    actual_key = None
    for r in records[:80]:
        if not isinstance(r, dict):
            continue
        k = find_key_in_record(r, json_path)
        if k:
            actual_key = k
            break
    base = actual_key or json_path
    if cast == "date":
        return f"{base}派生DATE({base})"
    if cast == "boolean":
        return f"{base}派生BOOLEAN({base})"
    if cast in {"numeric", "timestamptz"}:
        return f"{base}派生CAST({base} AS {cast})"
    return base


def resolve_ods_source_field(records: list, src_expr: str | None, cast: str | None = None) -> str:
    if not src_expr:
        return ""
    parsed = _parse_json_extract(src_expr)
    if parsed:
        base, field = parsed
        # Normalize casing for display.
        if normalize_key(base) == "siteprofile":
            base = "siteProfile"
        return f"{base}.{field}"
    # Direct field: prefer the actual JSON key name (original casing / camelCase).
    actual = None
    for r in records[:80]:
        if not isinstance(r, dict):
            continue
        k = find_key_in_record(r, src_expr)
        if k:
            actual = k
            break
    base = actual or src_expr
    if cast == "date":
        return f"{base}派生DATE({base})"
    if cast == "boolean":
        return f"{base}派生BOOLEAN({base})"
    if cast in {"numeric", "timestamptz"}:
        return f"{base}派生CAST({base} AS {cast})"
    return base
def resolve_json_field_triplet(
    json_file: str | None,
    record_node: str | None,
    records: list,
    json_path: str | None,
    cast: str | None = None,
) -> str:
    # Build "<json file> - <record node> - <field>" for the column comment.
    if not json_file:
        json_file = ""
    node = record_node or "$"
    if not json_path:
        return f"{json_file} - 无 - 无"
    if json_path.startswith("siteProfile."):
        base_key = None
        field_key = None
        for r in records[:80]:
            if not isinstance(r, dict):
                continue
            base_key = find_key_in_record(r, "siteProfile")
            if base_key:
                base = r.get(base_key)
                if isinstance(base, dict):
                    raw_field = json_path.split(".", 1)[1]
                    # Match the sub-field's actual casing where possible.
                    if raw_field in base:
                        field_key = raw_field
                    else:
                        nk = normalize_key(raw_field)
                        for k in base.keys():
                            if normalize_key(k) == nk:
                                field_key = k
                                break
                break
        base_key = base_key or "siteProfile"
        field_key = field_key or json_path.split(".", 1)[1]
        node = f"{node}.{base_key}" if node else base_key
        field = field_key
    else:
        actual = None
        for r in records[:80]:
            if isinstance(r, dict):
                actual = find_key_in_record(r, json_path)
                if actual:
                    break
        field = actual or json_path
    if cast == "date":
        field = f"{field}派生DATE({field})"
    elif cast == "boolean":
        field = f"{field}派生BOOLEAN({field})"
    elif cast in {"numeric", "timestamptz"}:
        field = f"{field}派生CAST({field} AS {cast})"
    return f"{json_file} - {node} - {field}"
def main():
    table_map, fact_mappings = parse_dwd_task_mappings(DWD_TASK_PATH)
    raw = SQL_PATH.read_text(encoding="utf-8", errors="replace")
    newline = "\r\n" if "\r\n" in raw else "\n"

    # strip all sql comments and existing COMMENT ON statements, incl. DO-block comment exec lines
    kept_lines = []
    for line in raw.splitlines(True):
        if line.lstrip().startswith("--"):
            continue
        if re.match(r"^\s*COMMENT ON\s+(TABLE|COLUMN)\s+", line, re.I):
            continue
        if "COMMENT ON COLUMN" in line or "COMMENT ON TABLE" in line:
            # remove legacy execute format lines too
            continue
        kept_lines.append(line)
    clean = "".join(kept_lines)

    create_re = re.compile(
        r"(^\s*CREATE TABLE IF NOT EXISTS\s+(?P<table>[A-Za-z0-9_]+)\s*\([\s\S]*?\)\s*;)",
        re.M,
    )
    out_parts = []
    last = 0
    count_tables = 0
    for m in create_re.finditer(clean):
        stmt = m.group(1)
        table = m.group("table").lower()
        out_parts.append(clean[last : m.end()])
        cols = parse_columns_from_ddl(stmt)
        source_ods, json_base = get_source_info(table, table_map)

        # derive required keys
        required_norm = set()
        col_map = build_column_mappings(table, cols, fact_mappings)
        # cast map for json field display
        cast_map = {
            dwd_col.lower(): cast
            for dwd_col, _src_expr, cast in (fact_mappings.get(f"billiards_dwd.{table}") or [])
        }
        src_expr_map = {
            dwd_col.lower(): src_expr
            for dwd_col, src_expr, _cast in (fact_mappings.get(f"billiards_dwd.{table}") or [])
        }
        for c, (jp, _src) in col_map.items():
            if not jp:
                continue
            if jp.startswith("siteProfile."):
                required_norm.add(normalize_key("siteProfile"))
            else:
                required_norm.add(normalize_key(jp))

        records = []
        record_node = "$"
        if json_base and (DOC_DIR / f"{json_base}.json").exists():
            try:
                records, record_node = load_json_records(json_base, required_norm)
            except Exception:
                records = []
                record_node = "$"

        table_comment = build_table_comment(table, source_ods, json_base)
        comment_lines = [f"COMMENT ON TABLE billiards_dwd.{table} IS '{_escape_sql(table_comment)}';"]
        for c in cols:
            jp, _src = col_map.get(c, (None, None))
            if c in SCD_COLS:
                if c == "scd2_start_time":
                    ex = "2025-11-10T00:00:00+08:00"
                elif c == "scd2_end_time":
                    ex = "9999-12-31T00:00:00+00:00"
                elif c == "scd2_is_current":
                    ex = "1"
                else:
                    ex = "1"
                json_field = "无 - DWD慢变元数据 - 无"
                ods_src = "DWD慢变元数据"
            else:
                # pick example from first records
                ex_val = None
                for r in records[:80]:
                    v = pick_example_from_record(r, jp)
                    if v not in (None, ""):
                        ex_val = v
                        break
                ex = _format_example(ex_val)
                json_field = resolve_json_field_triplet(
                    f"{json_base}.json" if json_base else None,
                    record_node,
                    records,
                    jp,
                    cast_map.get(c),
                )
                src_expr = src_expr_map.get(c, jp)
                ods_src = resolve_ods_source_field(records, src_expr, cast_map.get(c))
            purpose = _first_sentence(_infer_purpose(table, c, jp), 140)
            func = purpose
            if "用于" not in func:
                func = "用于" + func.rstrip("")
            if source_ods:
                ods_table_only = source_ods.split(".")[-1]
                ods_src_display = f"{ods_table_only} - {ods_src}"
            else:
                ods_src_display = f"无 - {ods_src}"
            comment = (
                f"【说明】{purpose}"
                f" 【示例】{ex}{func})。"
                f" 【ODS来源】{ods_src_display}"
                f" 【JSON字段】{json_field}"
            )
            comment_lines.append(
                f"COMMENT ON COLUMN billiards_dwd.{table}.{c} IS '{_escape_sql(comment)}';"
            )
        out_parts.append(newline + newline + (newline.join(comment_lines)) + newline + newline)
        last = m.end()
        count_tables += 1
    out_parts.append(clean[last:])
    result = "".join(out_parts)
    # collapse extra blank lines
    result = re.sub(r"(?:\r?\n){4,}", newline * 3, result)
    # Keep a one-time backup of the original file before overwriting it.
    backup = SQL_PATH.with_suffix(SQL_PATH.suffix + ".bak")
    if not backup.exists():
        backup.write_text(raw, encoding="utf-8")
    SQL_PATH.write_text(result, encoding="utf-8")
    print(f"Rewrote comments for {count_tables} tables: {SQL_PATH}")


if __name__ == "__main__":
    main()
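
The script above reads `TABLE_MAP` and `FACT_MAPPINGS` from `dwd_load_task.py` through the `ast` module instead of importing it, so the task module's own dependencies never have to load. Below is a minimal, self-contained sketch of that technique. The helper name `read_class_constant` and the `DEMO` source are assumptions made for illustration; the real mappings live in the repository and are not reproduced here.

```python
import ast


def read_class_constant(source: str, class_name: str, const_name: str):
    """Return the literal assigned to const_name in the body of class_name."""
    module = ast.parse(source)
    for node in module.body:
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for stmt in node.body:
                if isinstance(stmt, ast.Assign):
                    names = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
                elif isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                    names = [stmt.target.id]
                else:
                    continue
                if const_name in names and stmt.value is not None:
                    # literal_eval accepts an AST node and only evaluates literals.
                    return ast.literal_eval(stmt.value)
    raise LookupError(f"{class_name}.{const_name} not found")


# Hypothetical stand-in for dwd_load_task.py; the values are illustrative only.
DEMO = '''
class DwdLoadTask:
    TABLE_MAP: dict[str, str] = {
        "billiards_dwd.dwd_payment": "billiards_ods.payment_transactions",
    }
    FACT_MAPPINGS = {
        "billiards_dwd.dwd_payment": [("pay_date", "pay_time", "date")],
    }
'''

print(read_class_constant(DEMO, "DwdLoadTask", "TABLE_MAP"))
print(read_class_constant(DEMO, "DwdLoadTask", "FACT_MAPPINGS"))
```

Because `ast.literal_eval` rejects anything other than literals, this approach only works while the mappings stay plain dict, list, tuple and string literals.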

@@ -0,0 +1,560 @@
# -*- coding: utf-8 -*-
import json
import re
from pathlib import Path
from collections import defaultdict
SQL_PATH = Path(r"C:\dev\LLTQ\ETL\feiqiu-ETL\etl_billiards\database\schema_ODS_doc.sql")
DOC_DIR = Path(r"C:\dev\LLTQ\export\test-json-doc")
TABLE_CN = {
"member_profiles": "会员档案/会员账户信息",
"member_balance_changes": "会员余额变更流水",
"member_stored_value_cards": "会员储值/卡券账户列表",
"recharge_settlements": "充值结算记录",
"settlement_records": "结账/结算记录",
"assistant_cancellation_records": "助教作废/取消记录",
"assistant_accounts_master": "助教档案主数据",
"assistant_service_records": "助教服务流水",
"site_tables_master": "门店桌台主数据",
"table_fee_discount_records": "台费折扣记录",
"table_fee_transactions": "台费流水",
"goods_stock_movements": "商品库存变动流水",
"stock_goods_category_tree": "商品分类树",
"goods_stock_summary": "商品库存汇总",
"payment_transactions": "支付流水",
"refund_transactions": "退款流水",
"platform_coupon_redemption_records": "平台券核销/使用记录",
"tenant_goods_master": "租户商品主数据",
"group_buy_packages": "团购套餐主数据",
"group_buy_redemption_records": "团购核销记录",
"settlement_ticket_details": "结算小票明细",
"store_goods_master": "门店商品主数据",
"store_goods_sales_records": "门店商品销售流水",
}
COMMON_FIELD_PURPOSE = {
"tenant_id": "租户/品牌 ID用于商户维度过滤与关联。",
"site_id": "门店 ID用于门店维度过滤与关联。",
"register_site_id": "会员注册门店 ID用于归属门店维度关联。",
"site_name": "门店名称快照,用于直接展示。",
"id": "本表主键 ID用于唯一标识一条记录。",
"system_member_id": "系统级会员 ID跨门店/跨卡种统一到‘人’的维度)。",
"order_trade_no": "订单交易号,用于串联同一订单下的各类消费明细。",
"order_settle_id": "订单结算/结账主键,用于关联结算记录与小票明细。",
"order_pay_id": "关联支付流水的主键 ID用于追溯支付明细。",
"point": "积分余额,用于记录会员积分取值。",
"growth_value": "成长值/成长积分,用于会员成长与等级评估。",
"referrer_member_id": "推荐人会员 ID用于记录会员推荐/拉新关系。",
"create_time": "记录创建时间(业务侧产生时间)。",
"status": "状态枚举,用于标识记录当前业务状态。",
"user_status": "用户状态枚举,用于标识会员账户/用户可用状态。",
"is_delete": "逻辑删除标记0=否1=是)。",
"payload": "完整原始 JSON 记录快照,用于回溯与二次解析。",
"source_file": "ETL 元数据:原始导出文件名,用于数据追溯。",
"source_endpoint": "ETL 元数据:采集来源(接口/文件路径),用于数据追溯。",
"fetched_at": "ETL 元数据:采集/入库时间戳,用于口径对齐与增量处理。",
}
ETL_META_FIELDS = {"source_file", "source_endpoint", "fetched_at"}
def _first_sentence(text: str, max_len: int = 120) -> str:
    s = re.sub(r"\s+", " ", (text or "").strip())
    if not s:
        return ""
    # Keep only the text before the first sentence-ending punctuation mark.
    parts = re.split(r"[。;;]\s*", s)
    s = parts[0].strip() if parts else s
    if len(s) > max_len:
        # Truncate and mark the cut with an ellipsis.
        s = s[: max_len - 1] + "…"
    return s


def _escape_sql(s: str) -> str:
    return (s or "").replace("'", "''")


def normalize_key(s: str) -> str:
    return re.sub(r"[_\-\s]", "", (s or "").lower())


def snake_to_lower_camel(s: str) -> str:
    parts = re.split(r"[_\-\s]+", s)
    if not parts:
        return s
    first = parts[0].lower()
    rest = "".join(p[:1].upper() + p[1:] for p in parts[1:] if p)
    return first + rest


def snake_to_upper_camel(s: str) -> str:
    parts = re.split(r"[_\-\s]+", s)
    return "".join(p[:1].upper() + p[1:] for p in parts if p)


def find_key_in_record(record: dict, token: str) -> str | None:
    if not isinstance(record, dict) or not token:
        return None
    if token in record:
        return token
    norm_to_key = {normalize_key(k): k for k in record.keys()}
    candidates = [
        token,
        token.lower(),
        token.upper(),
        snake_to_lower_camel(token),
        snake_to_upper_camel(token),
    ]
    for c in candidates:
        nk = normalize_key(c)
        if nk in norm_to_key:
            return norm_to_key[nk]
    return None
def _infer_purpose(_table: str, col: str) -> str:
    if col in COMMON_FIELD_PURPOSE:
        return COMMON_FIELD_PURPOSE[col]
    lower = col.lower()
    if lower.endswith("_id"):
        return "标识类 ID 字段,用于关联/定位相关实体。"
    if lower.endswith("_time") or lower.endswith("time"):
        return "时间字段,用于记录业务时间点/发生时间。"
    if any(k in lower for k in ["amount", "money", "fee", "price", "deduct", "cost"]):
        return "金额字段,用于计费/结算/分摊等金额计算。"
    if any(k in lower for k in ["count", "num", "number", "seconds", "qty"]):
        return "数量/时长字段,用于统计与计量。"
    if lower.endswith("_name") or lower.endswith("name"):
        return "名称字段,用于展示与辅助识别。"
    if lower.endswith("_code") or lower.endswith("code"):
        return "编码/枚举字段,用于表示类型、等级或业务枚举。"
    if lower.startswith("is_") or lower.startswith("able_") or lower.startswith("can_"):
        return "布尔/开关字段,用于表示权限、可用性或状态开关。"
    return "来自 JSON 导出的原始字段,用于保留业务取值。"
def _format_example(value, max_len: int = 120) -> str:
    # Render a sample value compactly; long renderings are cut and end with an ellipsis.
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        s = value.strip()
        if len(s) > max_len:
            s = s[: max_len - 1] + "…"
        return s
    if isinstance(value, list):
        if not value:
            return "[]"
        sample = value[0]
        rendered = json.dumps(sample, ensure_ascii=False)
        if len(value) > 1:
            rendered = f"[{rendered}, …] (len={len(value)})"
        else:
            rendered = f"[{rendered}]"
        if len(rendered) > max_len:
            rendered = rendered[: max_len - 1] + "…"
        return rendered
    if isinstance(value, dict):
        keys = list(value)[:6]
        mini = {k: value.get(k) for k in keys}
        rendered = json.dumps(mini, ensure_ascii=False)
        if len(value) > len(keys):
            rendered = rendered[:-1] + ", …}"
        if len(rendered) > max_len:
            rendered = rendered[: max_len - 1] + "…"
        return rendered
    rendered = str(value)
    if len(rendered) > max_len:
        rendered = rendered[: max_len - 1] + "…"
    return rendered
def _find_best_record_list(data, columns):
    # Breadth-first search for the list of dicts whose keys best overlap the table columns.
    cols = set(columns)
    best = None
    best_score = -1
    queue = [(data, 0)]
    visited = 0
    while queue and visited < 20000:
        node, depth = queue.pop(0)
        visited += 1
        if depth > 8:
            continue
        if isinstance(node, list):
            if node and all(isinstance(x, dict) for x in node[:3]):
                scores = []
                for x in node[:5]:
                    scores.append(len(set(x.keys()) & cols))
                score = sum(scores) / max(1, len(scores))
                if score > best_score:
                    best_score = score
                    best = node
                for x in node[:10]:
                    queue.append((x, depth + 1))
            else:
                for x in node[:50]:
                    queue.append((x, depth + 1))
        elif isinstance(node, dict):
            for v in list(node.values())[:80]:
                queue.append((v, depth + 1))
    return best


def _find_best_record_list_and_node(data, columns):
    # Same search as above, but also returns the dotted path of the winning list.
    cols = set(columns)
    best = None
    best_score = -1
    best_path = []
    queue = [(data, 0, [])]
    visited = 0
    while queue and visited < 25000:
        node, depth, path = queue.pop(0)
        visited += 1
        if depth > 10:
            continue
        if isinstance(node, list):
            if node and all(isinstance(x, dict) for x in node[:3]):
                scores = []
                for x in node[:5]:
                    scores.append(len(set(x.keys()) & cols))
                score = sum(scores) / max(1, len(scores))
                if score > best_score:
                    best_score = score
                    best = node
                    best_path = path
                for x in node[:10]:
                    queue.append((x, depth + 1, path))
            else:
                for x in node[:80]:
                    queue.append((x, depth + 1, path))
        elif isinstance(node, dict):
            for k, v in list(node.items())[:120]:
                queue.append((v, depth + 1, path + [str(k)]))
    node_str = ".".join(best_path) if best_path else "$"
    return best or [], node_str
def _choose_examples(records, columns):
    # For each column, take the first non-empty value among the first 120 records.
    examples = {}
    if not records:
        return examples
    for col in columns:
        val = None
        for r in records[:120]:
            if isinstance(r, dict) and col in r and r[col] not in (None, ""):
                val = r[col]
                break
        examples[col] = val
    return examples


def _extract_header_fields(line: str, columns_set):
    s = line.strip()
    if not s:
        return []
    # Supports headers such as: 1. id / 1.1 siteProfile / 8. tenant_id
    m = re.match(r"^\d+(?:\.\d+)*[\.)]?\s+(.+)$", s)
    if m:
        s = m.group(1).strip()
    parts = re.split(r"\s*[/、,]\s*", s)
    fields = [p.strip() for p in parts if p.strip() in columns_set]
    if not fields and s in columns_set:
        fields = [s]
    if fields and len(line) <= 120:
        return fields
    return []
def _parse_field_purpose_from_block(block_lines):
    lines = [l.rstrip() for l in block_lines]

    def pick_after_label(labels):
        # Return the text after a label, or the paragraph that follows it.
        for i, l in enumerate(lines):
            for lab in labels:
                if lab in l:
                    after = l.split(lab, 1)[1].strip()
                    if after:
                        return after
                    buf = []
                    j = i + 1
                    while j < len(lines) and not lines[j].strip():
                        j += 1
                    for k in range(j, len(lines)):
                        if not lines[k].strip():
                            break
                        if re.match(r"^[\w\u4e00-\u9fff]+[::]", lines[k].strip()):
                            break
                        buf.append(lines[k].strip())
                    if buf:
                        return " ".join(buf)
        return ""

    # Also accept variants such as 「含义(结合其它文件):」 and 「含义(推测):」.
    picked = pick_after_label(["含义:", "含义:"])
    if not picked:
        for i, l in enumerate(lines):
            s = l.strip()
            m = re.match(r"^含义.*[::]\s*(.*)$", s)
            if m:
                after = m.group(1).strip()
                if after:
                    picked = after
                else:
                    buf = []
                    j = i + 1
                    while j < len(lines) and not lines[j].strip():
                        j += 1
                    for k in range(j, len(lines)):
                        if not lines[k].strip():
                            break
                        if re.match(r"^[\w\u4e00-\u9fff]+[::]", lines[k].strip()):
                            break
                        buf.append(lines[k].strip())
                    if buf:
                        picked = " ".join(buf)
                break
    if not picked:
        picked = pick_after_label(["作用:", "作用:"])
    if not picked:
        for i, l in enumerate(lines):
            s = l.strip()
            m = re.match(r"^作用.*[::]\s*(.*)$", s)
            if m:
                after = m.group(1).strip()
                if after:
                    picked = after
                break
    if not picked:
        # Fallback: take the first line that is not a descriptor such as “类型:” or “唯一值个数:”.
        for l in lines:
            s = l.strip()
            if not s:
                continue
            if any(
                s.startswith(prefix)
                for prefix in [
                    "类型:",
                    "非空:",
                    "唯一值",
                    "观测",
                    "特征",
                    "统计",
                    "分布",
                    "说明:",
                    "关联:",
                    "结构关系",
                    "和其它表",
                    "重复记录",
                    "全部为",
                ]
            ):
                continue
            picked = s
            break
    return _first_sentence(picked, 160)
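

def _is_poor_purpose(purpose: str) -> bool:
    """Return True when the extracted purpose is empty or only a label fragment."""
    s = (purpose or "").strip()
    if not s:
        return True
    # A trailing colon (full-width or ASCII) means only a label was captured.
    if s.endswith(":") or s.endswith(":"):
        return True
    if s.startswith("全部为"):
        return True
    if s.startswith("含义") and (":" in s or ":" in s) and len(s) <= 12:
        return True
    return False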


def parse_analysis(analysis_text: str, columns):
    """Split the analysis document into per-field blocks and map each column to its purpose."""
    columns_set = set(columns)
    blocks = defaultdict(list)
    current_fields = []
    buf = []
    for raw in analysis_text.splitlines():
        fields = _extract_header_fields(raw, columns_set)
        if fields:
            # A new heading closes the previous block.
            if current_fields and buf:
                for f in current_fields:
                    blocks[f].extend(buf)
            current_fields = fields
            buf = []
        else:
            if current_fields:
                buf.append(raw)
    # Flush the last open block.
    if current_fields and buf:
        for f in current_fields:
            blocks[f].extend(buf)
    purposes = {}
    for col in columns:
        if col in blocks and blocks[col]:
            p = _parse_field_purpose_from_block(blocks[col])
            if p:
                purposes[col] = p
    return purposes


def parse_columns_from_ddl(create_sql: str):
    """Return the column names of a CREATE TABLE statement that has one column per line."""
    start = create_sql.find("(")
    end = create_sql.rfind(")")
    body = create_sql[start + 1 : end]
    cols = []
    for line in body.splitlines():
        s = line.strip().rstrip(",")
        if not s:
            continue
        if s.startswith(")"):
            continue
        if s.upper().startswith("CONSTRAINT "):
            continue
        m = re.match(r"^([A-Za-z_][A-Za-z0-9_]*)\s+", s)
        if not m:
            continue
        name = m.group(1)
        # Skip table-level constraint clauses.
        if name.upper() in {"PRIMARY", "UNIQUE", "FOREIGN", "CHECK"}:
            continue
        cols.append(name)
    return cols


def build_comment_block(table: str, columns, analysis_text: str, records):
    """Build the COMMENT ON TABLE / COMMENT ON COLUMN statements for one ODS table."""
    # records is a (record_list, records_node) pair resolved by the caller,
    # so the JSON is not traversed again here.
    records, records_node = records
    purposes = parse_analysis(analysis_text, columns)
    examples = _choose_examples(records, columns)
    table_cn = TABLE_CN.get(table, table)
    table_comment = (
        f"ODS 原始明细表:{table_cn}"
        f"来源C:/dev/LLTQ/export/test-json-doc/{table}.json分析{table}-Analysis.md。"
        f"字段以导出原样为主ETL 补充 source_file/source_endpoint/fetched_at并保留 payload 为原始记录快照。"
    )
    lines = []
    lines.append(f"COMMENT ON TABLE billiards_ods.{table} IS '{_escape_sql(table_comment)}';")
    for col in columns:
        json_file = f"{table}.json"
        if col in ETL_META_FIELDS:
            json_field = f"{json_file} - ETL元数据 - 无"
        elif col == "payload":
            json_field = f"{json_file} - {records_node} - $"
        else:
            # Look up the key as it is actually spelled in the sample records.
            actual = None
            for r in records[:50]:
                if isinstance(r, dict):
                    actual = find_key_in_record(r, col)
                    if actual:
                        break
            field_name = actual or col
            json_field = f"{json_file} - {records_node} - {field_name}"
        purpose = purposes.get(col) or _infer_purpose(table, col)
        purpose = _first_sentence(purpose, 140) or _infer_purpose(table, col)
        if _is_poor_purpose(purpose):
            purpose = COMMON_FIELD_PURPOSE.get(col) or _infer_purpose(table, col)
        if col in ETL_META_FIELDS:
            if col == "source_file":
                ex = f"{table}.json"
            elif col == "source_endpoint":
                ex = f"C:/dev/LLTQ/export/test-json-doc/{table}.json"
            else:
                ex = "2025-11-10T00:00:00+08:00"
        elif col == "payload":
            ex = "{...}"
        else:
            ex = _format_example(examples.get(col))
        func = purpose
        if "用于" not in func:
            func = "用于" + func.rstrip("")
        # ODS source is "table - column" (the ODS table's own column);
        # columns added by the ETL are flagged as such.
        if col in ETL_META_FIELDS:
            ods_src = f"{table} - {col}ETL补充"
        else:
            ods_src = f"{table} - {col}"
        comment = (
            f"【说明】{purpose}"
            f" 【示例】{ex}{func})。"
            f" 【ODS来源】{ods_src}"
            f" 【JSON字段】{json_field}"
        )
        lines.append(
            f"COMMENT ON COLUMN billiards_ods.{table}.{col} IS '{_escape_sql(comment)}';"
        )
    return "\n".join(lines)


text = SQL_PATH.read_text(encoding="utf-8")
newline = "\r\n" if "\r\n" in text else "\n"

# Drop existing "--" comment lines and COMMENT ON statements before regenerating them.
kept = []
for raw_line in text.splitlines(True):
    stripped = raw_line.lstrip()
    if stripped.startswith("--"):
        continue
    if re.match(r"^\s*COMMENT ON\s+(TABLE|COLUMN)\s+", raw_line):
        continue
    kept.append(raw_line)
clean = "".join(kept)

create_re = re.compile(
    r"(CREATE TABLE IF NOT EXISTS\s+billiards_ods\.(?P<table>[A-Za-z0-9_]+)\s*\([\s\S]*?\)\s*;)",
    re.M,
)

# Re-emit each CREATE TABLE statement followed by its generated comment block.
out_parts = []
last = 0
count = 0
for m in create_re.finditer(clean):
    out_parts.append(clean[last : m.end()])
    table = m.group("table")
    create_sql = m.group(1)
    cols = parse_columns_from_ddl(create_sql)
    analysis_text = (DOC_DIR / f"{table}-Analysis.md").read_text(encoding="utf-8")
    data = json.loads((DOC_DIR / f"{table}.json").read_text(encoding="utf-8"))
    record_list, record_node = _find_best_record_list_and_node(data, cols)
    comment_block = build_comment_block(table, cols, analysis_text, (record_list, record_node))
    out_parts.append(newline + newline + comment_block + newline + newline)
    last = m.end()
    count += 1
out_parts.append(clean[last:])

result = "".join(out_parts)
# Collapse runs of four or more line breaks down to three.
result = re.sub(r"(?:\r?\n){4,}", newline * 3, result)

# Back up the original file before overwriting it.
backup = SQL_PATH.with_suffix(SQL_PATH.suffix + ".rewrite2.bak")
backup.write_text(text, encoding="utf-8")
SQL_PATH.write_text(result, encoding="utf-8")
print(f"Rewrote comments for {count} tables. Backup: {backup}")

1907
tmp/schema_ODS_doc.sql Normal file

File diff suppressed because it is too large

1878
tmp/schema_dwd_doc.sql Normal file

File diff suppressed because it is too large