Compare commits

5 Commits
main ... dev

Author SHA1 Message Date
Neo
0ab040b9fb 整理项目 2025-12-09 05:43:04 +08:00
Neo
0c29bd41f8 整理项目 2025-12-09 05:42:57 +08:00
Neo
561c640700 DWD完成 2025-12-09 04:57:05 +08:00
Neo
f301cc1fd5 更新一些文件 2025-12-06 00:17:42 +08:00
Neo
6f1d163a99 DWD文档确认 2025-12-05 18:57:20 +08:00
50 changed files with 28644 additions and 4461 deletions

File diff suppressed because it is too large Load Diff

113
README.md
View File

@@ -1,78 +1,57 @@
# 台球场 ETL 系统
# 飞球 ETL 系统ODS → DWD
用于台球门店业务的数据采集与入湖:从上游 API 拉取订单、支付、会员、库存等数据,先落地 ODS再清洗写入事实/维度表,并提供运行追踪、增量游标、数据质量检查与测试脚手架
面向门店业务的 ETL拉取/或离线灌入上游 JSON先落地 ODS再清洗装载 DWD含 SCD2 维度、事实增量),并提供质量校验报表
## 核心特性
- **两阶段链路**ODS 原始留痕 + DWD/事实表清洗,支持回放与重跑。
- **任务注册与调度**`TaskRegistry` 统一管理任务代码,`ETLScheduler` 负责游标、运行记录和失败隔离。
- **统一底座**:配置(默认值 + `.env` + CLI 覆盖)、分页/重试的 API 客户端、批量 Upsert 的数据库封装、SCD2 维度处理、质量检查。
- **测试与回放**ONLINE/OFFLINE 模式切换,`run_tests.py`/`test_presets.py` 支持参数化测试;`MANUAL_INGEST` 可将归档 JSON 重灌入 ODS。
- **可安装**`setup.py` / `entry_point` 提供 `etl-billiards` 命令,或直接 `python -m cli.main` 运行。
## 仓库结构(摘录)
- `etl_billiards/config`:默认配置、环境变量解析、配置加载。
- `etl_billiards/api`HTTP 客户端,内置重试/分页。
- `etl_billiards/database`:连接管理、批量 Upsert。
- `etl_billiards/tasks`业务任务ORDERS、PAYMENTS…、ODS 任务、DWD 任务、人工回放;`base_task.py`/`base_dwd_task.py` 提供模板。
- `etl_billiards/loaders`:事实/维度/ODS Loader`scd/` 为 SCD2。
- `etl_billiards/orchestration`:调度器、任务注册表、游标与运行追踪。
- `etl_billiards/scripts`:测试执行器、数据库连通性检测、预置测试指令。
- `etl_billiards/tests`:单元/集成测试与离线 JSON 归档。
## 支持的任务代码
- **事实/维度**`ORDERS``PAYMENTS``REFUNDS``INVENTORY_CHANGE``COUPON_USAGE``MEMBERS``ASSISTANTS``PRODUCTS``TABLES``PACKAGES_DEF``TOPUPS``TABLE_DISCOUNT``ASSISTANT_ABOLISH``LEDGER``TICKET_DWD``PAYMENTS_DWD``MEMBERS_DWD`
- **ODS 原始采集**`ODS_ORDER_SETTLE``ODS_TABLE_USE``ODS_ASSISTANT_LEDGER``ODS_ASSISTANT_ABOLISH``ODS_GOODS_LEDGER``ODS_PAYMENT``ODS_REFUND``ODS_COUPON_VERIFY``ODS_MEMBER``ODS_MEMBER_CARD``ODS_PACKAGE``ODS_INVENTORY_STOCK``ODS_INVENTORY_CHANGE`
- **辅助**`MANUAL_INGEST`(将归档 JSON 回放到 ODS
## 快速开始
1. **环境要求**Python 3.10+、PostgreSQL。推荐在 `etl_billiards/` 目录下执行命令。
2. **安装依赖**
## 快速运行(离线示例 JSON
1) 环境Python 3.10+、PostgreSQL`.env` 关键项:`PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test``INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc`
2) 安装依赖:
```bash
cd etl_billiards
pip install -r requirements.txt
# 开发模式pip install -e .
```
3. **配置 `.env`**
```
3) 一键 ODS→DWD→质检
```bash
cp .env.example .env
# 核心项
PG_DSN=postgresql://user:pwd@host:5432/LLZQ
API_BASE=https://api.example.com
API_TOKEN=your_token
STORE_ID=2790685415443269
EXPORT_ROOT=/path/to/export
LOG_ROOT=/path/to/logs
python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,INIT_DWD_SCHEMA --pipeline-flow INGEST_ONLY
python -m etl_billiards.cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "C:\dev\LLTQ\export\test-json-doc"
python -m etl_billiards.cli.main --tasks DWD_LOAD_FROM_ODS --pipeline-flow INGEST_ONLY
python -m etl_billiards.cli.main --tasks DWD_QUALITY_CHECK --pipeline-flow INGEST_ONLY
# 报表etl_billiards/reports/dwd_quality_report.json
```
配置的生效顺序为 “默认值” < “环境变量/.env” < “CLI 参数”。
4. **运行任务**
```bash
# 运行默认任务集
python -m cli.main
# 按需选择任务(逗号分隔)
python -m cli.main --tasks ODS_ORDER_SETTLE,ORDERS,PAYMENTS
## 目录与文件作用
- 根目录:`etl_billiards/` 主代码;`requirements.txt` 依赖;`run_etl.sh/.bat` 启动脚本;`.env/.env.example` 配置;`tmp/` 存放草稿/调试/备份。
- etl_billiards/ 主线目录
- `config/``defaults.py` 默认值,`env_parser.py` 解析 .env`settings.py` 统一配置加载。
- `api/``client.py` HTTP 请求、重试与分页。
- `database/``connection.py` 连接封装,`operations.py` 批量 upsertDDL`schema_ODS_doc.sql`、`schema_dwd_doc.sql`。
- `tasks/`:业务任务
- `init_schema_task.py`INIT_ODS_SCHEMA / INIT_DWD_SCHEMA。
- `manual_ingest_task.py`:示例 JSON → ODS。
- `dwd_load_task.py`ODS → DWD映射、SCD2/事实增量)。
- 其他任务按需扩展。
- `loaders/`ODS/DWD/SCD2 Loader 实现。
- `scd/``scd2_handler.py` 处理维度 SCD2 历史。
- `quality/`:质量检查器(行数/金额对照)。
- `orchestration/``scheduler.py` 调度;`task_registry.py` 任务注册;`run_tracker.py` 运行记录。
- `scripts/`:重建/测试/探活工具。
- `docs/``ods_to_dwd_mapping.md` 映射说明,`ods_sample_json.md` 示例 JSON 说明,`dwd_quality_check.md` 质检说明。
- `reports/`:质检输出(如 `dwd_quality_report.json`)。
- `tests/`:单元/集成测试;`utils/`:通用工具。
- `backups/`(若存在):关键文件备份。
# Dry-run 示例(不提交事务)
python -m cli.main --tasks ORDERS --dry-run
## 业务流程与文件关系
1) 调度入口:`cli/main.py` 解析 CLI → `orchestration/scheduler.py` 依 `task_registry.py` 创建任务 → 初始化 DB/API/Config 上下文。
2) ODS`init_schema_task.py` 执行 `schema_ODS_doc.sql` 建表;`manual_ingest_task.py` 从 `INGEST_SOURCE_DIR` 读 JSON批量 upsert ODS。
3) DWD`init_schema_task.py` 执行 `schema_dwd_doc.sql` 建表;`dwd_load_task.py` 依据 `TABLE_MAP/FACT_MAPPINGS` 从 ODS 清洗写入 DWD维度走 SCD2`scd/scd2_handler.py`),事实按时间/水位增量。
4) 质检:质量任务读取 ODS/DWD统计行数/金额,输出 `reports/dwd_quality_report.json`。
5) 配置:`config/defaults.py` + `.env` + CLI 参数叠加HTTP如启用在线走 `api/client.py`DB 访问走 `database/connection.py`。
6) 文档:`docs/ods_to_dwd_mapping.md` 记录字段映射;`docs/ods_sample_json.md` 描述示例数据结构,便于对照调试。
# Windows 批处理
..\\run_etl.bat --tasks PAYMENTS
```
5. **查看输出**:日志目录与导出目录分别由 `LOG_ROOT`、`EXPORT_ROOT` 控制;运行追踪与游标记录写入数据库 `etl_admin.*` 表
## 数据与运行流转
- CLI 解析参数 → `AppConfig.load()` 组装配置 → `ETLScheduler` 创建 DB/API/游标/运行追踪器。
- 调度器按任务代码实例化任务,读取/推进游标,落盘运行记录。
- 任务模板:确定时间窗口 → 调用 API/ODS 数据 → 解析校验 → Loader 批量 Upsert/SCD2 → 质量检查 → 提交事务并回写游标。
## 测试与回放
- 单元/集成测试:`pytest` 或 `python scripts/run_tests.py --suite online`。
- 预置组合:`python scripts/run_tests.py --preset offline_realdb`(见 `scripts/test_presets.py`)。
- 离线模式:`TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=... pytest tests/unit/test_etl_tasks_offline.py`。
- 数据库连通性:`python scripts/test_db_connection.py --dsn postgresql://... --query "SELECT 1"`。
## 其他提示
- `.env.example` 列出了所有常用配置;`config/defaults.py` 记录默认值与任务窗口配置。
- `loaders/ods/generic.py` 支持定义主键/列名即可落 ODS`tasks/manual_ingest_task.py` 可将归档 JSON 快速灌入对应 ODS 表。
- 需要新增任务时,在 `tasks/` 中实现并在 `orchestration/task_registry.py` 注册即可复用调度能力。
## 当前状态2025-12-09
- 示例 JSON 全量灌入DWD 行数与 ODS 对齐。
- 分类维度已展平大类+子类:`dim_goods_category` 26 行category_level/leaf 已赋值)。
- 剩余空值多因源数据为空;补值需先确认上游是否提供
## 可精简/归档
- `tmp/`、`tmp/etl_billiards_misc/` 中的草稿、旧备份、调试脚本仅供参考,运行不依赖。
- 根级保留必要文件README、requirements、run_etl.*、.env/.env.example其余临时文件已移至 tmp。

216
README_FULL.md Normal file
View File

@@ -0,0 +1,216 @@
# 飞球 ETL 系统ODS → DWD— 详细版
> 本文为项目的详细说明,保持与当前代码一致,覆盖 ODS 任务、DWD 装载、质检及开发扩展要点。
---
## 1. 项目概览
面向门店业务的 ETL从上游 API 或离线 JSON 采集订单、支付、会员、库存等数据,先落地 **ODS**,再清洗装载 **DWD**(含 SCD2 维度、事实增量),并输出质量校验报表。项目采用模块化/分层架构配置、API、数据库、Loader/SCD、质量、调度、CLI、测试统一通过 CLI 调度。
---
## 2. 快速开始(离线示例 JSON
**环境要求**Python 3.10+PostgreSQL`.env` 关键项:
- `PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test`
- `INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc`
**安装依赖**
```bash
cd etl_billiards
pip install -r requirements.txt
```
**一键 ODS → DWD → 质检(离线回放)**
```bash
# 初始化 ODS + DWD
python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,INIT_DWD_SCHEMA --pipeline-flow INGEST_ONLY
# 灌入示例 JSON 到 ODS可用 .env 的 INGEST_SOURCE_DIR 覆盖)
python -m etl_billiards.cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "C:\dev\LLTQ\export\test-json-doc"
# 从 ODS 装载 DWD
python -m etl_billiards.cli.main --tasks DWD_LOAD_FROM_ODS --pipeline-flow INGEST_ONLY
# 质量校验报表
python -m etl_billiards.cli.main --tasks DWD_QUALITY_CHECK --pipeline-flow INGEST_ONLY
# 报表输出etl_billiards/reports/dwd_quality_report.json
```
> 可按需单独运行:
> - 仅建表:`python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA`
> - 仅 ODS 灌入:`python -m etl_billiards.cli.main --tasks MANUAL_INGEST`
> - 仅 DWD 装载:`python -m etl_billiards.cli.main --tasks INIT_DWD_SCHEMA,DWD_LOAD_FROM_ODS`
---
## 3. 配置与路径
- 示例数据目录:`C:\dev\LLTQ\export\test-json-doc`(可由 `.env``INGEST_SOURCE_DIR` 覆盖)。
- 日志/导出目录:`LOG_ROOT``EXPORT_ROOT``.env`
- 报表:`etl_billiards/reports/dwd_quality_report.json`
- DDL`etl_billiards/database/schema_ODS_doc.sql``etl_billiards/database/schema_dwd_doc.sql`
- 任务注册:`etl_billiards/orchestration/task_registry.py`(默认启用 INIT_ODS_SCHEMA、MANUAL_INGEST、INIT_DWD_SCHEMA、DWD_LOAD_FROM_ODS、DWD_QUALITY_CHECK
**安全提示**:建议将数据库凭证保存在 `.env` 或受控秘钥管理中,生产环境使用最小权限账号。
---
## 4. 目录结构与关键文件
- 根目录:`etl_billiards/` 主代码;`requirements.txt` 依赖;`run_etl.sh/.bat` 启动脚本;`.env/.env.example` 配置;`tmp/` 草稿/调试归档。
- `config/``defaults.py` 默认值,`env_parser.py` 解析 .env`settings.py` AppConfig 统一加载。
- `api/``client.py` HTTP 请求、重试、分页。
- `database/``connection.py` 连接封装;`operations.py` 批量 upsertDDL SQLODS/DWD
- `tasks/`
- `init_schema_task.py`INIT_ODS_SCHEMA/INIT_DWD_SCHEMA
- `manual_ingest_task.py`(示例 JSON → ODS
- `dwd_load_task.py`ODS → DWD 映射、SCD2/事实增量);
- 其他任务按需扩展。
- `loaders/`ODS/DWD/SCD2 Loader 实现。
- `scd/``scd2_handler.py` 处理维度 SCD2 历史。
- `quality/`:质量检查器(行数/金额对照)。
- `orchestration/``scheduler.py` 调度;`task_registry.py` 注册;`run_tracker.py` 运行记录;`cursor_manager.py` 水位管理。
- `scripts/`:重建/测试/探活工具。
- `docs/``ods_to_dwd_mapping.md` 映射说明;`ods_sample_json.md` 示例 JSON 说明;`dwd_quality_check.md` 质检说明。
- `reports/`:质检输出(如 `dwd_quality_report.json`)。
- `tests/`:单元/集成测试;`utils/`:通用工具;`backups/`:备份(若存在)。
---
## 5. 架构与流程
执行链路(控制流):
1) CLI`cli/main.py`)解析参数 → 生成 AppConfig → 初始化日志/DB 连接;
2) 调度层(`scheduler.py`)按 `task_registry.py` 中的注册表实例化任务,设置 run_uuid、cursor水位、上下文
3) 任务基类模板:
- 获取时间窗口/水位cursor_manager
- 拉取数据:在线模式调用 `api/client.py` 支持分页、重试;离线模式直接读取 JSON 文件;
- 解析与校验:类型转换、必填校验(如任务内部 parse/validate
- 加载:调用 Loader`loaders/`)执行批量 Upsert/SCD2/增量写入(底层用 `database/operations.py`
- 质量检查(如需):质量模块对行数/金额等进行对比;
- 更新水位与运行记录(`run_tracker.py`),提交/回滚事务。
数据流与依赖:
- 配置:`config/defaults.py` + `.env` + CLI 参数叠加,形成 AppConfig。
- API 访问:`api/client.py` 支撑分页/重试;离线 ingest 直接读文件。
- DB 访问:`database/connection.py` 提供连接上下文;`operations.py` 负责批量 upsert/分页写入。
- ODS`manual_ingest_task.py` 读取 JSON → ODS 表(保留 payload/来源/时间戳)。
- DWD`dwd_load_task.py` 依据 `TABLE_MAP/FACT_MAPPINGS` 从 ODS 选取字段;维度走 SCD2`scd/scd2_handler.py`事实走增量支持字段表达式JSON->>、CAST
- 质检:`quality` 模块或相关任务对 ODS/DWD 行数、金额等进行比对,输出 `reports/`
---
## 6. ODS → DWD 策略
1. ODS 留底保留源主键、payload、时间/来源信息。
2. DWD 清洗:维度 SCD2事实按时间/水位增量;字段类型、单位、枚举标准化,保留可溯源字段。
3. 业务键统一site_id、member_id、table_id、order_settle_id、order_trade_no 等统一命名。
4. 不过度汇总DWD 只做明细/轻度清洗,汇总留待 DWS/报表。
5. 去嵌套:数组展开为子表/子行,重复 profile 提炼为维度。
6. 长期演进:优先加列/加表,避免频繁改已有表结构。
---
## 7. 常用 CLI
```bash
# 运行所有已注册任务
python -m etl_billiards.cli.main
# 运行指定任务
python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,MANUAL_INGEST
# 覆盖 DSN
python -m etl_billiards.cli.main --pg-dsn "postgresql://user:pwd@host:5432/db"
# 覆盖 API
python -m etl_billiards.cli.main --api-base "https://api.example.com" --api-token "..."
# 试运行(不写库)
python -m etl_billiards.cli.main --dry-run --tasks DWD_LOAD_FROM_ODS
```
---
## 8. 测试ONLINE / OFFLINE
- `TEST_MODE=ONLINE`:调用真实 API全链路 E/T/L。
- `TEST_MODE=OFFLINE`:从 `TEST_JSON_ARCHIVE_DIR` 读取离线 JSON只做 Transform + Load。
- `TEST_DB_DSN`:如设置,则集成测试连真库;未设置用内存/临时库。
示例:
```bash
TEST_MODE=ONLINE pytest tests/unit/test_etl_tasks_online.py
TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=tests/source-data-doc pytest tests/unit/test_etl_tasks_offline.py
python scripts/test_db_connection.py --dsn postgresql://user:pwd@host:5432/db --query "SELECT 1"
```
---
## 9. 开发与扩展
- 新任务:在 `tasks/` 继承 BaseTask实现 `get_task_code/execute`,并在 `orchestration/task_registry.py` 注册。
- 新 Loader/Checker参考 `loaders/``quality/` 复用批量 upsert/质检接口。
- 配置:`config/defaults.py` + `.env` + CLI 叠加,新增配置需在 defaults 与 env_parser 中声明。
---
## 10. ODS 任务上线指引
- 任务注册脚本:`etl_billiards/database/seed_ods_tasks.sql`(替换 store_id 后执行:`psql "$PG_DSN" -f ...`)。
- 确认 `etl_admin.etl_task` 中已启用所需 ODS 任务。
- 离线回放:可用 `scripts/rebuild_ods_from_json`(如有)从本地 JSON 重建 ODS。
- 单测:`pytest etl_billiards/tests/unit/test_ods_tasks.py`
---
## 11. ODS 表概览(数据路径)
| ODS 表名 | 接口 Path | 数据列表路径 |
| ------------------------------------ | ------------------------------------------------- | ----------------------------- |
| assistant_accounts_master | /PersonnelManagement/SearchAssistantInfo | data.assistantInfos |
| assistant_service_records | /AssistantPerformance/GetOrderAssistantDetails | data.orderAssistantDetails |
| assistant_cancellation_records | /AssistantPerformance/GetAbolitionAssistant | data.abolitionAssistants |
| goods_stock_movements | /GoodsStockManage/QueryGoodsOutboundReceipt | data.queryDeliveryRecordsList |
| goods_stock_summary | /TenantGoods/GetGoodsStockReport | data |
| group_buy_packages | /PackageCoupon/QueryPackageCouponList | data.packageCouponList |
| group_buy_redemption_records | /Site/GetSiteTableUseDetails | data.siteTableUseDetailsList |
| member_profiles | /MemberProfile/GetTenantMemberList | data.tenantMemberInfos |
| member_balance_changes | /MemberProfile/GetMemberCardBalanceChange | data.tenantMemberCardLogs |
| member_stored_value_cards | /MemberProfile/GetTenantMemberCardList | data.tenantMemberCards |
| payment_transactions | /PayLog/GetPayLogListPage | data |
| platform_coupon_redemption_records | /Promotion/GetOfflineCouponConsumePageList | data |
| recharge_settlements | /Site/GetRechargeSettleList | data.settleList |
| refund_transactions | /Order/GetRefundPayLogList | data |
| settlement_records | /Site/GetAllOrderSettleList | data.settleList |
| settlement_ticket_details | /Order/GetOrderSettleTicketNew | 完整 JSON |
| site_tables_master | /Table/GetSiteTables | data.siteTables |
| stock_goods_category_tree | /TenantGoodsCategory/QueryPrimarySecondaryCategory| data.goodsCategoryList |
| store_goods_master | /TenantGoods/GetGoodsInventoryList | data.orderGoodsList |
| store_goods_sales_records | /TenantGoods/GetGoodsSalesList | data.orderGoodsLedgers |
| table_fee_discount_records | /Site/GetTaiFeeAdjustList | data.taiFeeAdjustInfos |
| table_fee_transactions | /Site/GetSiteTableOrderDetails | data.siteTableUseDetailsList |
| tenant_goods_master | /TenantGoods/QueryTenantGoods | data.tenantGoodsList |
> 完整字段级映射见 `docs/` 与 ODS/DWD DDL。
---
## 12. DWD 维度与建模要点
1. 颗粒一致、单一业务键:一张 DWD 表只承载一种业务事件/颗粒,避免混颗粒。
2. 先理解业务链路,再建模;不要机械按 JSON 列表建表。
3. 业务键统一site_id、member_id、table_id、order_settle_id、order_trade_no 等必须一致命名。
4. 保留明细,不过度汇总;聚合留到 DWS/报表。
5. 清洗标准化同时保留溯源字段源主键、时间、金额、payload
6. 去嵌套与解耦:数组展开子行,重复 profile 提炼维度。
7. 演进优先加列/加表,减少对已有表结构的破坏。
---
## 13. 当前状态2025-12-09
- 示例 JSON 已全量灌入DWD 行数与 ODS 对齐。
- 分类维度已展平大类+子类:`dim_goods_category` 26 行category_level/leaf 已赋值)。
- 部分空字段源数据即为空,如需补值请先确认上游。
---
## 14. 可精简/归档
- `tmp/``tmp/etl_billiards_misc/` 中草稿、旧备份、调试脚本仅供参考,不影响运行。
- 根级保留必要文件README、requirements、run_etl.*、.env/.env.example其他临时文件已移至 tmp。
---
## 15. FAQ
- 字段空值:若映射已存在且源列非空仍为空,再检查上游 JSON维度 SCD2 按全量合并。
- DSN/路径:确认 `.env``PG_DSN``INGEST_SOURCE_DIR` 与本地一致。
- 新增任务:在 `tasks/` 实现并注册到 `task_registry.py`,必要时同步更新 DDL 与映射。
- 权限/运行:检查网络、账号权限;脚本需执行权限(如 `chmod +x run_etl.sh`)。

View File

@@ -1,53 +1,49 @@
# 数据库配置(真实库)
# -*- coding: utf-8 -*-
# 文件说明ETL 环境变量config/env_parser.py 读取),用于数据库连接、目录与运行参数。
# 数据库连接字符串config/env_parser.py -> db.dsn所有任务必需
PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test
# 数据库连接超时秒config/env_parser.py -> db.connect_timeout_sec
PG_CONNECT_TIMEOUT=10
# 如需拆分配置PG_HOST=... PG_PORT=... PG_NAME=... PG_USER=... PG_PASSWORD=...
# API配置如需走真实接口再填写
API_BASE=https://api.example.com
API_TOKEN=your_token_here
# API_TIMEOUT=20
# API_PAGE_SIZE=200
# API_RETRY_MAX=3
# 应用配置
# 门店/租户IDconfig/env_parser.py -> app.store_id任务调度记录使用
STORE_ID=2790685415443269
# TIMEZONE=Asia/Taipei
# SCHEMA_OLTP=billiards
# SCHEMA_ETL=etl_admin
# 时区标识config/env_parser.py -> app.timezone
TIMEZONE=Asia/Taipei
# 路径配置
EXPORT_ROOT=C:\dev\LLTQ\export\JSON
# API 基础地址config/env_parser.py -> api.base_urlFETCH 类任务调用
API_BASE=https://api.example.com
# API 鉴权 Tokenconfig/env_parser.py -> api.tokenFETCH 类任务调用
API_TOKEN=your_token_here
# API 请求超时秒config/env_parser.py -> api.timeout_sec
API_TIMEOUT=20
# API 分页大小config/env_parser.py -> api.page_size
API_PAGE_SIZE=200
# API 最大重试次数config/env_parser.py -> api.retries.max_attempts
API_RETRY_MAX=3
# 日志根目录config/env_parser.py -> io.log_rootInit/任务运行写日志
LOG_ROOT=C:\dev\LLTQ\export\LOG
FETCH_ROOT=
INGEST_SOURCE_DIR=
WRITE_PRETTY_JSON=false
PGCLIENTENCODING=utf8
# JSON 导出根目录config/env_parser.py -> io.export_rootFETCH 产出及 INIT 准备
EXPORT_ROOT=C:\dev\LLTQ\export\JSON
# ETL配置
# FETCH 模式本地输出目录config/env_parser.py -> pipeline.fetch_root
FETCH_ROOT=C:\dev\LLTQ\export\JSON
# 本地入库 JSON 目录config/env_parser.py -> pipeline.ingest_source_dirMANUAL_INGEST/INGEST_ONLY 使用
INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc
# JSON 漂亮格式输出开关config/env_parser.py -> io.write_pretty_json
WRITE_PRETTY_JSON=false
# 运行流程FULL / FETCH_ONLY / INGEST_ONLYconfig/env_parser.py -> pipeline.flow
PIPELINE_FLOW=FULL
# 指定任务列表逗号分隔覆盖默认config/env_parser.py -> run.tasks
# RUN_TASKS=INIT_ODS_SCHEMA,MANUAL_INGEST
# 窗口/补偿参数config/env_parser.py -> run.*
OVERLAP_SECONDS=120
WINDOW_BUSY_MIN=30
WINDOW_IDLE_MIN=180
IDLE_START=04:00
IDLE_END=16:00
ALLOW_EMPTY_RESULT_ADVANCE=true
# 清洗配置
LOG_UNKNOWN_FIELDS=true
HASH_ALGO=sha1
STRICT_NUMERIC=true
ROUND_MONEY_SCALE=2
# 测试/离线模式(真实库联调建议 ONLINE
TEST_MODE=ONLINE
TEST_JSON_ARCHIVE_DIR=tests/source-data-doc
TEST_JSON_TEMP_DIR=/tmp/etl_billiards_json_tmp
# 测试数据库
TEST_DB_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test
# ODS <20>ؽ<EFBFBD><D8BD>ű<EFBFBD><C5B1><EFBFBD><EFBFBD>ã<EFBFBD><C3A3><EFBFBD><EFBFBD><EFBFBD><EFBFBD>ã<EFBFBD>
JSON_DOC_DIR=C:\dev\LLTQ\export\test-json-doc
ODS_INCLUDE_FILES=
ODS_DROP_SCHEMA_FIRST=true

View File

@@ -1,837 +0,0 @@
# 台球场 ETL 系统(模块化版本)合并文档
本文为原多份文档(如 `INDEX.md``QUICK_START.md``ARCHITECTURE.md``MIGRATION_GUIDE.md``PROJECT_STRUCTURE.md``README.md` 等)的合并版,只保留与**当前项目本身**相关的内容:项目说明、目录结构、架构设计、数据与控制流程、迁移与扩展指南等,不包含修改历史和重构过程描述。
---
## 1. 项目概述
台球场 ETL 系统是一个面向门店业务的专业 ETL 工程项目,用于从外部业务 API 拉取订单、支付、会员等数据经过解析、校验、SCD2 处理、质量检查后写入 PostgreSQL 数据库,并支持增量同步和任务运行追踪。
系统采用模块化、分层架构设计,核心特性包括:
- 模块化目录结构配置、数据库、API、模型、加载器、SCD2、质量检查、编排、任务、CLI、工具、测试等分层清晰
- 完整的配置管理:默认值 + 环境变量 + CLI 参数多层覆盖。
- 可复用的数据库访问层(连接管理、批量 Upsert 封装)。
- 支持重试与分页的 API 客户端。
- 类型安全的数据解析与校验模块。
- SCD2 维度历史管理。
- 数据质量检查(例如余额一致性检查)。
- 任务编排层统一调度、游标管理与运行追踪。
- 命令行入口统一管理任务执行支持筛选任务、Dry-run 等模式。
---
## 2. 快速开始
### 2.1 环境准备
- Python 版本:建议 3.10+
- 数据库PostgreSQL
- 操作系统Windows / Linux / macOS 均可
```bash
# 克隆/下载代码后进入项目目录
cd etl_billiards/
ls -la
```
你会看到下述目录结构的顶层部分(详细见第 4 章):
- `config/` - 配置管理
- `database/` - 数据库访问
- `api/` - API 客户端
- `tasks/` - ETL 任务实现
- `cli/` - 命令行入口
- `docs/` - 技术文档
### 2.2 安装依赖
```bash
pip install -r requirements.txt
```
主要依赖示例(按实际 `requirements.txt` 为准):
- `psycopg2-binary`PostgreSQL 驱动
- `requests`HTTP 客户端
- `python-dateutil`:时间处理
- `tzdata`:时区数据
### 2.3 配置环境变量
复制并修改环境变量模板:
```bash
cp .env.example .env
# 使用你习惯的编辑器修改 .env
```
`.env` 示例(最小配置):
```bash
# 数据库
PG_DSN=postgresql://user:password@localhost:5432/....
# API
API_BASE=https://api.example.com
API_TOKEN=your_token_here
# 门店/应用
STORE_ID=2790685415443269
TIMEZONE=Asia/Taipei
# 目录
EXPORT_ROOT=/path/to/export
LOG_ROOT=/path/to/logs
```
> 所有配置项的默认值见 `config/defaults.py`,最终生效配置由「默认值 + 环境变量 + CLI 参数」三层叠加。
### 2.4 运行第一个任务
通过 CLI 入口运行:
```bash
# 运行所有任务
python -m cli.main
# 仅运行订单任务
python -m cli.main --tasks ORDERS
# 运行订单 + 支付
python -m cli.main --tasks ORDERS,PAYMENTS
# Windows 使用脚本
run_etl.bat --tasks ORDERS
# Linux / macOS 使用脚本
./run_etl.sh --tasks ORDERS
```
### 2.5 查看结果
- 日志目录:使用 `LOG_ROOT` 指定,例如
```bash
ls -la C:\dev\LLTQ\export\LOG/
```
- 导出目录:使用 `EXPORT_ROOT` 指定,例如
```bash
ls -la C:\dev\LLTQ\export\JSON/
```
---
## 3. 常用命令与开发工具
### 3.1 CLI 常用命令
```bash
# 运行所有任务
python -m cli.main
# 运行指定任务
python -m cli.main --tasks ORDERS,PAYMENTS,MEMBERS
# 使用自定义数据库
python -m cli.main --pg-dsn "postgresql://user:password@host:5432/db"
# 使用自定义 API 端点
python -m cli.main --api-base "https://api.example.com" --api-token "..."
# 试运行(不写入数据库)
python -m cli.main --dry-run --tasks ORDERS
```
### 3.2 IDE / 代码质量工具示例VSCode
`.vscode/settings.json` 示例:
```json
{
"python.linting.enabled": true,
"python.linting.pylintEnabled": true,
"python.formatting.provider": "black",
"python.testing.pytestEnabled": true
}
```
代码格式化与检查:
```bash
pip install black isort pylint
black .
isort .
pylint etl_billiards/
```
### 3.3 测试
```bash
# 安装测试依赖(按需)
pip install pytest pytest-cov
# 运行全部测试
pytest
# 仅运行单元测试
pytest tests/unit/
# 生成覆盖率报告
pytest --cov=. --cov-report=html
```
测试示例(按实际项目为准):
- `tests/unit/test_config.py` 配置管理单元测试
- `tests/unit/test_parsers.py` 解析器单元测试
- `tests/integration/test_database.py` 数据库集成测试
#### 3.3.1 测试模式ONLINE / OFFLINE
- `TEST_MODE=ONLINE`(默认)时,测试会模拟实时 API完整执行 E/T/L。
- `TEST_MODE=OFFLINE` 时,测试改为从 `TEST_JSON_ARCHIVE_DIR` 指定的归档 JSON 中读取数据,仅做 Transform + Load适合验证本地归档数据是否仍可回放。
- `TEST_JSON_ARCHIVE_DIR`:离线 JSON 归档目录(示例:`tests/source-data-doc` 或 CI 产出的快照)。
- `TEST_JSON_TEMP_DIR`:测试生成的临时 JSON 输出目录,便于隔离每次运行的数据。
- `TEST_DB_DSN`:可选,若设置则单元测试会连接到此 PostgreSQL DSN实打实执行写库留空时测试使用内存伪库避免依赖数据库。
示例命令:
```bash
# 在线模式覆盖所有任务
TEST_MODE=ONLINE pytest tests/unit/test_etl_tasks_online.py
# 离线模式使用归档 JSON 覆盖所有任务
TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=tests/source-data-doc pytest tests/unit/test_etl_tasks_offline.py
# 使用脚本按需组合参数(示例:在线 + 仅订单用例)
python scripts/run_tests.py --suite online --mode ONLINE --keyword ORDERS
# 使用脚本连接真实测试库并回放离线模式
python scripts/run_tests.py --suite offline --mode OFFLINE --db-dsn postgresql://user:pwd@localhost:5432/testdb
# 使用“指令仓库”中的预置命令
python scripts/run_tests.py --preset offline_realdb
python scripts/run_tests.py --list-presets # 查看或自定义 scripts/test_presets.py
```
#### 3.3.2 脚本化测试组合(`run_tests.py` / `test_presets.py`
- `scripts/run_tests.py` 是 pytest 的统一入口:自动把项目根目录加入 `sys.path`,并提供 `--suite online/offline/integration`、`--tests`(自定义路径)、`--mode`、`--db-dsn`、`--json-archive`、`--json-temp`、`--keyword/-k`、`--pytest-args`、`--env KEY=VALUE` 等参数,可以像搭积木一样自由组合;
- `--preset foo` 会读取 `scripts/test_presets.py` 内 `PRESETS["foo"]` 的配置,并叠加到当前命令;`--list-presets` 与 `--dry-run` 可用来审阅或仅打印命令;
- 直接执行 `python scripts/test_presets.py` 可依次运行 `AUTO_RUN_PRESETS` 中列出的预置;传入 `--preset x --dry-run` 则只打印对应命令。
`test_presets.py` 充当“指令仓库”。每个预置都是一个字典,常用字段解释如下:
| 字段 | 作用 |
| ---------------------------- | ------------------------------------------------------------------ |
| `suite` | 复用 `run_tests.py` 内置套件online/offline/integration可多选 |
| `tests` | 追加任意 pytest 路径,例如 `tests/unit/test_config.py` |
| `mode` | 覆盖 `TEST_MODE`ONLINE / OFFLINE |
| `db_dsn` | 覆盖 `TEST_DB_DSN`,用于连入真实测试库 |
| `json_archive` / `json_temp` | 配置离线 JSON 归档与临时目录 |
| `keyword` | 映射到 `pytest -k`,用于关键字过滤 |
| `pytest_args` | 附加 pytest 参数,例 `-vv --maxfail=1` |
| `env` | 额外环境变量列表,如 `["STORE_ID=123"]` |
| `preset_meta` | 说明性文字,便于描述场景 |
示例:`offline_realdb` 预置会设置 `TEST_MODE=OFFLINE`、指定 `tests/source-data-doc` 为归档目录,并通过 `db_dsn` 连到测试库。执行 `python scripts/run_tests.py --preset offline_realdb` 或 `python scripts/test_presets.py --preset offline_realdb` 即可复用该组合保证本地、CI 与生产回放脚本一致。
#### 3.3.3 数据库连通性快速检查
`python scripts/test_db_connection.py` 提供最轻量的 PostgreSQL 连通性检测:默认使用 `TEST_DB_DSN`(也可传 `--dsn`),尝试连接并执行 `SELECT 1 AS ok`(可通过 `--query` 自定义)。典型用途:
```bash
# 读取 .env/环境变量中的 TEST_DB_DSN
python scripts/test_db_connection.py
# 临时指定 DSN并检查任务配置表
python scripts/test_db_connection.py --dsn postgresql://user:pwd@host:5432/.... --query "SELECT count(*) FROM etl_admin.etl_task"
```
脚本返回 0 代表连接与查询成功;若返回非 0可结合第 8 章“常见问题排查”的数据库章节(网络、防火墙、账号权限等)先定位问题,再运行完整 ETL。
---
## 4. 项目结构与文件说明
### 4.1 总体目录结构(树状图)
```text
etl_billiards/
├── README.md # 项目总览和使用说明
├── MIGRATION_GUIDE.md # 从旧版本迁移指南
├── requirements.txt # Python 依赖列表
├── setup.py # 项目安装配置
├── .env.example # 环境变量配置模板
├── .gitignore # Git 忽略文件配置
├── run_etl.sh # Linux/Mac 运行脚本
├── run_etl.bat # Windows 运行脚本
├── config/ # 配置管理模块
│ ├── __init__.py
│ ├── defaults.py # 默认配置值定义
│ ├── env_parser.py # 环境变量解析器
│ └── settings.py # 配置管理主类
├── database/ # 数据库访问层
│ ├── __init__.py
│ ├── connection.py # 数据库连接管理
│ └── operations.py # 批量操作封装
├── api/ # HTTP API 客户端
│ ├── __init__.py
│ └── client.py # API 客户端(重试 + 分页)
├── models/ # 数据模型层
│ ├── __init__.py
│ ├── parsers.py # 类型解析器
│ └── validators.py # 数据验证器
├── loaders/ # 数据加载器层
│ ├── __init__.py
│ ├── base_loader.py # 加载器基类
│ ├── dimensions/ # 维度表加载器
│ │ ├── __init__.py
│ │ └── member.py # 会员维度加载器
│ └── facts/ # 事实表加载器
│ ├── __init__.py
│ ├── order.py # 订单事实表加载器
│ └── payment.py # 支付记录加载器
├── scd/ # SCD2 处理层
│ ├── __init__.py
│ └── scd2_handler.py # SCD2 历史记录处理器
├── quality/ # 数据质量检查层
│ ├── __init__.py
│ ├── base_checker.py # 质量检查器基类
│ └── balance_checker.py # 余额一致性检查器
├── orchestration/ # ETL 编排层
│ ├── __init__.py
│ ├── scheduler.py # ETL 调度器
│ ├── task_registry.py # 任务注册表(工厂模式)
│ ├── cursor_manager.py # 游标管理器
│ └── run_tracker.py # 运行记录追踪器
├── tasks/ # ETL 任务层
│ ├── __init__.py
│ ├── base_task.py # 任务基类(模板方法)
│ ├── orders_task.py # 订单 ETL 任务
│ ├── payments_task.py # 支付 ETL 任务
│ └── members_task.py # 会员 ETL 任务
├── cli/ # 命令行接口层
│ ├── __init__.py
│ └── main.py # CLI 主入口
├── utils/ # 工具函数
│ ├── __init__.py
│ └── helpers.py # 通用工具函数
├── tests/ # 测试代码
│ ├── __init__.py
│ ├── unit/ # 单元测试
│ │ ├── __init__.py
│ │ ├── test_config.py
│ │ └── test_parsers.py
│ ├── testdata_json/ # 清洗入库用的测试Json文件
│ │ └── XX.json
│ └── integration/ # 集成测试
│ ├── __init__.py
│ └── test_database.py
└── docs/ # 文档
└── ARCHITECTURE.md # 架构设计文档
```
### 4.2 各模块职责概览
- **config/**
- 统一配置入口,支持默认值、环境变量、命令行参数三层覆盖。
- **database/**
- 封装 PostgreSQL 连接与批量操作插入、更新、Upsert 等)。
- **api/**
- 对上游业务 API 的 HTTP 调用进行统一封装,支持重试、分页与超时控制。
- **models/**
- 提供类型解析器(时间戳、金额、整数等)与业务级数据校验器。
- **loaders/**
- 提供事实表与维度表的加载逻辑(包含批量 Upsert、统计写入结果等
- **scd/**
- 维度型数据的 SCD2 历史管理(有效期、版本标记等)。
- **quality/**
- 质量检查策略,例如余额一致性、记录数量对齐等。
- **orchestration/**
- 任务调度、任务注册、游标管理(增量窗口)、运行记录追踪。
- **tasks/**
- 具体业务任务(订单、支付、会员等),封装了从“取数 → 处理 → 写库 → 记录结果”的完整流程。
- **cli/**
- 命令行入口,解析参数并启动调度流程。
- **utils/**
- 杂项工具函数。
- **tests/**
- 单元测试与集成测试代码。
---
## 5. 架构设计与流程说明
### 5.1 分层架构图
```text
┌─────────────────────────────────────┐
│ CLI 命令行接口 │ <- cli/main.py
└─────────────┬───────────────────────┘
┌─────────────▼───────────────────────┐
│ Orchestration 编排层 │ <- orchestration/
│ (Scheduler, TaskRegistry, ...) │
└─────────────┬───────────────────────┘
┌─────────────▼───────────────────────┐
│ Tasks 任务层 │ <- tasks/
│ (OrdersTask, PaymentsTask, ...) │
└───┬─────────┬─────────┬─────────────┘
│ │ │
▼ ▼ ▼
┌────────┐ ┌─────┐ ┌──────────┐
│Loaders │ │ SCD │ │ Quality │ <- loaders/, scd/, quality/
└────────┘ └─────┘ └──────────┘
┌───────▼────────┐
│ Models 模型 │ <- models/
└───────┬────────┘
┌───────▼────────┐
│ API 客户端 │ <- api/
└───────┬────────┘
┌───────▼────────┐
│ Database 访问 │ <- database/
└───────┬────────┘
┌───────▼────────┐
│ Config 配置 │ <- config/
└────────────────┘
```
### 5.2 各层职责(当前设计)
- **CLI 层 (`cli/`)**
- 解析命令行参数指定任务列表、Dry-run、覆盖配置项等
- 初始化配置与日志后交由编排层执行。
- **编排层 (`orchestration/`)**
- `scheduler.py`:根据配置与 CLI 参数选择需要执行的任务,控制执行顺序和并行策略。
- `task_registry.py`:提供任务注册表,按任务代码创建任务实例(工厂模式)。
- `cursor_manager.py`:管理增量游标(时间窗口 / ID 游标)。
- `run_tracker.py`:记录每次任务运行的状态、统计信息和错误信息。
- **任务层 (`tasks/`)**
- `base_task.py`:定义任务执行模板流程(模板方法模式),包括获取窗口、调用上游、解析 / 校验、写库、更新游标等。
- `orders_task.py` / `payments_task.py` / `members_task.py`:实现具体任务逻辑(订单、支付、会员)。
- **加载器 / SCD / 质量层**
- `loaders/`:根据目标表封装 Upsert / Insert / Update 逻辑。
- `scd/scd2_handler.py`:为维度表提供 SCD2 历史管理能力。
- `quality/`:执行数据质量检查,如余额对账。
- **模型层 (`models/`)**
- `parsers.py`:负责数据类型转换(字符串 → 时间戳、Decimal、int 等)。
- `validators.py`:执行字段级和记录级的数据校验。
- **API 层 (`api/client.py`)**
- 封装 HTTP 调用,处理重试、超时及分页。
- **数据库层 (`database/`)**
- 管理数据库连接及上下文。
- 提供批量插入 / 更新 / Upsert 操作接口。
- **配置层 (`config/`)**
- 定义配置项默认值。
- 解析环境变量并进行类型转换。
- 对外提供统一配置对象。
### 5.3 设计模式(当前使用)
- 工厂模式:任务注册 / 创建(`TaskRegistry`)。
- 模板方法模式:任务执行流程(`BaseTask`)。
- 策略模式:不同 Loader / Checker 实现不同策略。
- 依赖注入:通过构造函数向任务传入 `db`、`api`、`config` 等依赖。
### 5.4 数据与控制流程
整体流程:
1. CLI 解析参数并加载配置。
2. Scheduler 构建数据库连接、API 客户端等依赖。
3. Scheduler 遍历任务配置,从 `TaskRegistry` 获取任务类并实例化。
4. 每个任务按统一模板执行:
- 读取游标 / 时间窗口。
- 调用 API 拉取数据(可分页)。
- 解析、验证数据。
- 通过 Loader 写入数据库(事实表 / 维度表 / SCD2
- 执行质量检查。
- 更新游标与运行记录。
5. 所有任务执行完成后,释放连接并退出进程。
### 5.5 错误处理策略
- 单个任务失败不影响其他任务执行。
- 数据库操作异常自动回滚当前事务。
- API 请求失败时按配置进行重试,超过重试次数记录错误并终止该任务。
- 所有错误被记录到日志和运行追踪表,便于事后排查。
### 5.6 ODS + DWD 双阶段策略(新增)
为了支撑回溯/重放与后续 DWD 宽表构建,项目新增了 `billiards_ods` Schema 以及一组专门的 ODS 任务/Loader
- **ODS 表**`billiards_ods.ods_order_settle`、`ods_table_use_detail`、`ods_assistant_ledger`、`ods_assistant_abolish`、`ods_goods_ledger`、`ods_payment`、`ods_refund`、`ods_coupon_verify`、`ods_member`、`ods_member_card`、`ods_package_coupon`、`ods_inventory_stock`、`ods_inventory_change`。每条记录都会保存 `store_id + 源主键 + payload JSON + fetched_at + source_endpoint` 等信息。
- **通用 Loader**`loaders/ods/generic.py::GenericODSLoader` 统一封装了 `INSERT ... ON CONFLICT ...` 与批量写入逻辑,调用方只需提供列名与主键列即可。
- **ODS 任务**`tasks/ods_tasks.py` 内通过 `OdsTaskSpec` 定义了一组任务(`ODS_ORDER_SETTLE`、`ODS_PAYMENT`、`ODS_ASSISTANT_LEDGER` 等),并在 `TaskRegistry` 中自动注册,可直接通过 `python -m cli.main --tasks ODS_ORDER_SETTLE,ODS_PAYMENT` 执行。
- **双阶段链路**
1. 阶段 1ODS调用 API/离线归档 JSON将原始记录写入 ODS 表,保留分页、抓取时间、来源文件等元数据。
2. 阶段 2DWD/DIM后续订单、支付、券等事实任务将改为从 ODS 读取 payload经过解析/校验后写入 `billiards.fact_*`、`dim_*` 表,避免重复拉取上游接口。
> 新增的单元测试 `tests/unit/test_ods_tasks.py` 覆盖了 `ODS_ORDER_SETTLE`、`ODS_PAYMENT` 的入库路径,可作为扩展其他 ODS 任务的模板。
---
## 6. 迁移指南(从旧脚本到当前项目)
本节用于说明如何从旧的单文件脚本(如 `task_merged.py`)迁移到当前模块化项目,属于当前项目的使用说明,不涉及历史对比细节。
### 6.1 核心功能映射示意
| 旧版本函数 / 类 | 新版本位置 | 说明 |
| --------------------- | ----------------------------------------------------- | ---------- |
| `DEFAULTS` 字典 | `config/defaults.py` | 配置默认值 |
| `build_config()` | `config/settings.py::AppConfig.load()` | 配置加载 |
| `Pg` 类 | `database/connection.py::DatabaseConnection` | 数据库连接 |
| `http_get_json()` | `api/client.py::APIClient.get()` | API 请求 |
| `paged_get()` | `api/client.py::APIClient.get_paginated()` | 分页请求 |
| `parse_ts()` | `models/parsers.py::TypeParser.parse_timestamp()` | 时间解析 |
| `upsert_fact_order()` | `loaders/facts/order.py::OrderLoader.upsert_orders()` | 订单加载 |
| `scd2_upsert()` | `scd/scd2_handler.py::SCD2Handler.upsert()` | SCD2 处理 |
| `run_task_orders()` | `tasks/orders_task.py::OrdersTask.execute()` | 订单任务 |
| `main()` | `cli/main.py::main()` | 主入口 |
### 6.2 典型迁移步骤
1. **配置迁移**
- 原来在 `DEFAULTS` 或脚本内硬编码的配置,迁移到 `.env` 与 `config/defaults.py`。
- 使用 `AppConfig.load()` 统一获取配置。
2. **并行运行验证**
```bash
# 旧脚本
python task_merged.py --tasks ORDERS
# 新项目
python -m cli.main --tasks ORDERS
```
对比新旧版本导出的数据表和日志,确认一致性。
3. **自定义逻辑迁移**
- 原脚本中的自定义清洗逻辑 → 放入相应 `loaders/` 或任务类中。
- 自定义任务 → 在 `tasks/` 中实现并在 `task_registry` 中注册。
- 自定义 API 调用 → 扩展 `api/client.py` 或单独封装服务类。
4. **逐步切换**
- 先在测试环境并行运行。
- 再逐步切换生产任务到新版本。
---
## 7. 开发与扩展指南(当前项目)
### 7.1 添加新任务
1. 在 `tasks/` 目录创建任务类:
```python
from .base_task import BaseTask
class MyTask(BaseTask):
def get_task_code(self) -> str:
return "MY_TASK"
def execute(self) -> dict:
# 1. 获取时间窗口
window_start, window_end, _ = self._get_time_window()
# 2. 调用 API 获取数据
records, _ = self.api.get_paginated(...)
# 3. 解析 / 校验
parsed = [self._parse(r) for r in records]
# 4. 加载数据
loader = MyLoader(self.db)
inserted, updated, _ = loader.upsert(parsed)
# 5. 提交并返回结果
self.db.commit()
return self._build_result("SUCCESS", {
"inserted": inserted,
"updated": updated,
})
```
2. 在 `orchestration/task_registry.py` 中注册:
```python
from tasks.my_task import MyTask
default_registry.register("MY_TASK", MyTask)
```
3. 在任务配置表中启用(示例):
```sql
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
VALUES ('MY_TASK', 123456, TRUE);
```
### 7.2 添加新加载器
```python
from loaders.base_loader import BaseLoader
class MyLoader(BaseLoader):
def upsert(self, records: list) -> tuple:
sql = "INSERT INTO table_name (...) VALUES (...) ON CONFLICT (...) DO UPDATE SET ... RETURNING (xmax = 0) AS inserted"
inserted, updated = self.db.batch_upsert_with_returning(
sql, records, page_size=self._batch_size()
)
return (inserted, updated, 0)
```
### 7.3 添加新质量检查器
1. 在 `quality/` 中实现检查器,继承 `base_checker.py`。
2. 在任务或调度流程中调用该检查器,在写库后进行验证。
### 7.4 类型解析与校验扩展
- 在 `models/parsers.py` 中添加新类型解析方法。
- 在 `models/validators.py` 中添加新规则(如枚举校验、跨字段校验等)。
---
## 8. 常见问题排查
### 8.1 数据库连接失败
```text
错误: could not connect to server
```
排查要点:
- 检查 `PG_DSN` 或相关数据库配置是否正确。
- 确认数据库服务是否启动、网络是否可达。
### 8.2 API 请求超时
```text
错误: requests.exceptions.Timeout
```
排查要点:
- 检查 `API_BASE` 地址与网络连通性。
- 适当提高超时与重试次数(在配置中调整)。
### 8.3 模块导入错误
```text
错误: ModuleNotFoundError
```
排查要点:
- 确认在项目根目录下运行(包含 `etl_billiards/` 包)。
- 或通过 `pip install -e .` 以可编辑模式安装项目。
### 8.4 权限相关问题
```text
错误: Permission denied
```
排查要点:
- 脚本无执行权限:`chmod +x run_etl.sh`。
- Windows 需要以管理员身份运行,或修改日志 / 导出目录权限。
---
## 9. 使用前检查清单
在正式运行前建议确认:
- [ ] 已安装 Python 3.10+。
- [ ] 已执行 `pip install -r requirements.txt`。
- [ ] `.env` 已配置正确数据库、API、门店 ID、路径等
- [ ] PostgreSQL 数据库可连接。
- [ ] API 服务可访问且凭证有效。
- [ ] `LOG_ROOT`、`EXPORT_ROOT` 目录存在且拥有写权限。
---
## 10. 参考说明
- 本文已合并原有的快速开始、项目结构、架构说明、迁移指南等内容,可作为当前项目的统一说明文档。
- 如需在此基础上拆分多份文档,可按章节拆出,例如「快速开始」「架构设计」「迁移指南」「开发扩展」等。
## 11. 运行/调试模式说明
- 生产环境仅保留“任务模式”:通过调度/CLI 执行注册的任务ETL/ODS不使用调试脚本。
- 开发/调试可使用的辅助脚本(上线前可删除或禁用):
- `python -m etl_billiards.scripts.rebuild_ods_from_json`:从本地 JSON 目录重建 `billiards_ods`,用于离线初始化/验证。环境变量:`PG_DSN`(必填)、`JSON_DOC_DIR`(可选,默认 `C:\dev\LLTQ\export\test-json-doc`)、`INCLUDE_FILES`(逗号分隔文件名)、`DROP_SCHEMA_FIRST`(默认 true
- 如需在生产环境保留脚本,请在运维手册中明确用途和禁用条件,避免误用。
## 12. ODS 任务上线指引
- 任务注册:`etl_billiards/database/seed_ods_tasks.sql` 列出了当前启用的 ODS 任务。将其中的 `store_id` 替换为实际门店后执行:
```
psql "$PG_DSN" -f etl_billiards/database/seed_ods_tasks.sql
```
`ON CONFLICT` 会保持 enabled=true避免重复。
- 调度:确认 `etl_admin.etl_task` 中已启用所需的 ODS 任务(任务代码见 seed 脚本),调度器或 CLI `--tasks` 即可调用。
- 离线回灌:开发环境可用 `rebuild_ods_from_json` 以样例 JSON 初始化 ODS生产慎用默认按 `(source_file, record_index)` 去重。
- 测试:`pytest etl_billiards/tests/unit/test_ods_tasks.py` 覆盖核心 ODS 任务;测试时可设置 `ETL_SKIP_DOTENV=1` 跳过本地 .env 读取。
## 13. ODS 表映射总览
| ODS 表名 | 接口 Path | 数据列表路径 |
| ------------------------------------ | ---------------------------------------------------- | ----------------------------- |
| `assistant_accounts_master` | `/PersonnelManagement/SearchAssistantInfo` | data.assistantInfos |
| `assistant_service_records` | `/AssistantPerformance/GetOrderAssistantDetails` | data.orderAssistantDetails |
| `assistant_cancellation_records` | `/AssistantPerformance/GetAbolitionAssistant` | data.abolitionAssistants |
| `goods_stock_movements` | `/GoodsStockManage/QueryGoodsOutboundReceipt` | data.queryDeliveryRecordsList |
| `goods_stock_summary` | `/TenantGoods/GetGoodsStockReport` | data |
| `group_buy_packages` | `/PackageCoupon/QueryPackageCouponList` | data.packageCouponList |
| `group_buy_redemption_records` | `/Site/GetSiteTableUseDetails` | data.siteTableUseDetailsList |
| `member_profiles` | `/MemberProfile/GetTenantMemberList` | data.tenantMemberInfos |
| `member_balance_changes` | `/MemberProfile/GetMemberCardBalanceChange` | data.tenantMemberCardLogs |
| `member_stored_value_cards` | `/MemberProfile/GetTenantMemberCardList` | data.tenantMemberCards |
| `payment_transactions` | `/PayLog/GetPayLogListPage` | data |
| `platform_coupon_redemption_records` | `/Promotion/GetOfflineCouponConsumePageList` | data |
| `recharge_settlements` | `/Site/GetRechargeSettleList` | data.settleList |
| `refund_transactions` | `/Order/GetRefundPayLogList` | data |
| `settlement_records` | `/Site/GetAllOrderSettleList` | data.settleList |
| `settlement_ticket_details` | `/Order/GetOrderSettleTicketNew` | (整包原始 JSON |
| `site_tables_master` | `/Table/GetSiteTables` | data.siteTables |
| `stock_goods_category_tree` | `/TenantGoodsCategory/QueryPrimarySecondaryCategory` | data.goodsCategoryList |
| `store_goods_master` | `/TenantGoods/GetGoodsInventoryList` | data.orderGoodsList |
| `store_goods_sales_records` | `/TenantGoods/GetGoodsSalesList` | data.orderGoodsLedgers |
| `table_fee_discount_records` | `/Site/GetTaiFeeAdjustList` | data.taiFeeAdjustInfos |
| `table_fee_transactions` | `/Site/GetSiteTableOrderDetails` | data.siteTableUseDetailsList |
| `tenant_goods_master` | `/TenantGoods/QueryTenantGoods` | data.tenantGoodsList |
## 14. ODS 相关环境变量/默认值
- `.env` / 环境变量:
- `JSON_DOC_DIR`ODS 样例 JSON 目录(开发/回灌用)
- `ODS_INCLUDE_FILES`:限定导入的文件名(逗号分隔,不含 .json
- `ODS_DROP_SCHEMA_FIRST`true/false是否重建 schema
- `ETL_SKIP_DOTENV`:测试/CI 时设为 1 跳过本地 .env 读取
- `config/defaults.py` 中 `ods` 默认值:
- `json_doc_dir`: `C:\dev\LLTQ\export\test-json-doc`
- `include_files`: `""`
- `drop_schema_first`: `True`
---
## 15. DWD 维度 “业务事件”
1. 粒度唯一、原子
- 一张 DWD 表只能有一种业务粒度,比如:
- 一条记录 = 一次结账;
- 一条记录 = 一段台费流水;
- 一条记录 = 一次助教服务;
- 一条记录 = 一次会员余额变动。
- 表里面不能又混“订单头”又混“订单行”,不能一部分是“汇总”,一部分是“明细”。
- 一旦粒度确定,所有字段都要跟这个粒度匹配:
- 比如“结账头表”就不要塞每一行商品明细;
- 商品明细就不要塞整单级别的总金额。
- 这是 DWD 层最重要的一条。
2. 以业务过程建模,不以 JSON 列表建模
- 先画清楚你真实的业务链路:
- 开台 / 换台 / 关台 → 台费流水
- 助教上桌 → 助教服务流水 / 废除事件
- 点单 → 商品销售流水
- 充值 / 消费 → 余额变更 / 充值单
- 结账 → 结账头表 + 支付流水 / 退款流水
- 团购 / 平台券 → 核销流水
3. 主键明确、外键统一
- 每张 DWD 表必须有业务主键(哪怕是接口给的 id不要依赖数据库自增。
- 所有“同一概念”的字段必须统一命名、统一含义:
- 门店:统一叫 site_id都对应 siteProfile.id
- 会员:统一叫 member_id 对应 member_profiles.idsystem_member_id 单独一列;
- 台桌:统一 table_id 对应 site_tables_master.id
- 结账:统一 order_settle_id
- 订单:统一 order_trade_no 等。
- 否则后面 DWS、AI 要把表拼起来会非常痛苦。
4. 保留明细,不做过度汇总
- DWD 层的事实表原则上只做“明细级”的数据:
- 不要在 DWD 就把“日汇总、周汇总、月汇总”算出来,那是 DWS 的事;
- 也不要把多个事件折成一行(例如一张表同时放日汇总+单笔流水)。
- 需要聚合时,再在 DWS 做主题宽表:
- dws_member_day_profile、dws_site_day_summary 等。
- DWD 只负责细颗粒度的真相。
5. 统一清洗、标准化,但保持可追溯
- 在 DWD 层一定要做的清洗:
- 类型转换:字符串时间 → 时间类型,金额统一为 decimal布尔统一为 0/1
- 单位统一:秒 / 分钟、元 / 分都统一;
- 枚举标准化:状态码、类型码在 DWD 里就定死含义,必要时建枚举维表。
- 同时要保证:
- 每条 DWD 记录都能追溯回 ODS
- 保留源系统主键;
- 保留原始时间 / 原始金额字段(不要覆盖掉)。
6. 扁平化、去嵌套
- JSON 里常见结构是:分页壳 + 头 + 明细数组 + 各种嵌套对象siteProfile、tableProfile、goodsLedgers…
- DWD 的原则是:
- 去掉分页壳;
- 把“数组”拆成子表(头表 / 行表);
- 把重复出现的 profile 抽出去做维度表(门店、台、商品、会员……)。
- 目标是DWD 表都是二维表结构,不存复杂嵌套 JSON。
7. 模型长期稳定,可扩展
- DWD 的表结构要尽可能稳定,新增需求尽量通过:
- 加字段;
- 新建事实表 / 维度表;
- 在 DWS 做派生指标;
- 而不是频繁重构已有 DWD 表结构。
- 这点跟你后面要喂给 LLM 也很相关AI 配的 prompt、schema 理解都要尽量少改。

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -2,22 +2,117 @@
CREATE SCHEMA IF NOT EXISTS billiards_dwd;
SET search_path TO billiards_dwd;
-- SCD2 字段统一默认值、中文注释、唯一性(业务键 + 时间段不重叠)控制
CREATE EXTENSION IF NOT EXISTS btree_gist;
DO $$
DECLARE
rec RECORD;
BEGIN
-- 统一 SCD2 默认值与注释,避免后续手工遗漏
FOR rec IN
SELECT table_name
FROM information_schema.columns
WHERE table_schema = 'billiards_dwd'
AND column_name = 'scd2_start_time'
LOOP
EXECUTE format('ALTER TABLE billiards_dwd.%I ALTER COLUMN scd2_start_time SET DEFAULT now()', rec.table_name);
EXECUTE format('ALTER TABLE billiards_dwd.%I ALTER COLUMN scd2_end_time SET DEFAULT ''9999-12-31''', rec.table_name);
EXECUTE format('ALTER TABLE billiards_dwd.%I ALTER COLUMN scd2_is_current SET DEFAULT 1', rec.table_name);
EXECUTE format('ALTER TABLE billiards_dwd.%I ALTER COLUMN scd2_version SET DEFAULT 1', rec.table_name);
EXECUTE format('COMMENT ON COLUMN billiards_dwd.%I.scd2_start_time IS ''SCD2 开始时间(版本生效起点)''', rec.table_name);
EXECUTE format('COMMENT ON COLUMN billiards_dwd.%I.scd2_end_time IS ''SCD2 结束时间(默认 9999-12-31表示当前版本仍有效''', rec.table_name);
EXECUTE format('COMMENT ON COLUMN billiards_dwd.%I.scd2_is_current IS ''SCD2 当前版本标记1=当前版本0=历史版本''', rec.table_name);
EXECUTE format('COMMENT ON COLUMN billiards_dwd.%I.scd2_version IS ''SCD2 版本号,自增,配合时间段避免重叠''', rec.table_name);
END LOOP;
-- 约束:同一业务键时间段不重叠,且仅有一条当前版本
FOR rec IN (
SELECT tc.table_name,
string_agg(format('%I WITH =', kcu.column_name), ', ' ORDER BY kcu.ordinal_position) AS pk_eq_expr,
string_agg(format('%I', kcu.column_name), ', ' ORDER BY kcu.ordinal_position) AS pk_cols
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.table_schema = kcu.table_schema
AND tc.table_name = kcu.table_name
AND tc.constraint_name = kcu.constraint_name
WHERE tc.table_schema = 'billiards_dwd'
AND tc.constraint_type = 'PRIMARY KEY'
AND EXISTS (
SELECT 1 FROM information_schema.columns c
WHERE c.table_schema = 'billiards_dwd'
AND c.table_name = tc.table_name
AND c.column_name = 'scd2_start_time'
)
GROUP BY tc.table_name
)
LOOP
IF NOT EXISTS (
SELECT 1 FROM pg_constraint
WHERE conname = format('%s_scd2_no_overlap', rec.table_name)
AND conrelid = format('billiards_dwd.%s', rec.table_name)::regclass
) THEN
EXECUTE format(
'ALTER TABLE billiards_dwd.%I ADD CONSTRAINT %I EXCLUDE USING gist (%s, tstzrange(scd2_start_time, scd2_end_time) WITH &&) WHERE (scd2_is_current = 1);',
rec.table_name,
rec.table_name || '_scd2_no_overlap',
rec.pk_eq_expr
);
END IF;
IF to_regclass(format('billiards_dwd.%s_scd2_current_unique_idx', rec.table_name)) IS NULL THEN
EXECUTE format(
'CREATE UNIQUE INDEX %I ON billiards_dwd.%I (%s) WHERE (scd2_is_current = 1);',
rec.table_name || '_scd2_current_unique_idx',
rec.table_name,
rec.pk_cols
);
END IF;
END LOOP;
END
$$;
-- SCD2 统一约定DIM 表使用):
-- SCD2_start_time TIMESTAMPTZ DEFAULT now() -- 版本开始时间
-- SCD2_end_time TIMESTAMPTZ DEFAULT '9999-12-31' -- 版本结束时间
-- SCD2_is_current INT DEFAULT 1 -- 当前版本标记1当前/0历史
-- SCD2_version INT DEFAULT 1 -- 版本号,自增
-- dim_site
CREATE TABLE IF NOT EXISTS dim_site (
site_id BIGINT,
org_id BIGINT,
shop_name TEXT,
business_tel TEXT,
full_address TEXT,
tenant_id BIGINT,
shop_name TEXT,
site_label TEXT,
full_address TEXT,
address TEXT,
longitude NUMERIC(10,6),
latitude NUMERIC(10,6),
tenant_site_region_id BIGINT,
business_tel TEXT,
site_type INTEGER,
shop_status INTEGER,
SCD2_start_time TIMESTAMPTZ DEFAULT now(),
SCD2_end_time TIMESTAMPTZ DEFAULT '9999-12-31',
SCD2_is_current INT DEFAULT 1,
SCD2_version INT DEFAULT 1,
PRIMARY KEY (site_id)
);
COMMENT ON COLUMN dim_site.site_id IS '门店主键 ID唯一标识一家门店。与所有事实表中的 site_id 对应。 | 来源: siteProfile.id | 角色: 主键';
COMMENT ON COLUMN dim_site.org_id IS '上级组织 ID用于区域组织划分。 | 来源: siteProfile.org_id | 角色: 外键';
COMMENT ON COLUMN dim_site.shop_name IS '门店名称,展示用。 | 来源: siteProfile.shop_name';
COMMENT ON COLUMN dim_site.business_tel IS '门店电话。 | 来源: siteProfile.business_tel';
COMMENT ON COLUMN dim_site.full_address IS '门店完整地址。 | 来源: siteProfile.full_address';
COMMENT ON COLUMN dim_site.tenant_id IS '租户 ID。与其它表 tenant_id 对应。 | 来源: siteProfile.tenant_id | 角色: 外键';
COMMENT ON COLUMN dim_site.site_id IS '???? ID?????????????????? site_id ??? | ??: siteProfile.id | ??: ??';
COMMENT ON COLUMN dim_site.org_id IS '???? ID?????????? | ??: siteProfile.org_id | ??: ??';
COMMENT ON COLUMN dim_site.tenant_id IS '?? ID????? tenant_id ??? | ??: siteProfile.tenant_id | ??: ??';
COMMENT ON COLUMN dim_site.shop_name IS '????????? | ??: siteProfile.shop_name';
COMMENT ON COLUMN dim_site.site_label IS '???????????????? | ??: siteProfile.site_label';
COMMENT ON COLUMN dim_site.full_address IS '??????? | ??: siteProfile.full_address';
COMMENT ON COLUMN dim_site.address IS '???????????? | ??: siteProfile.address';
COMMENT ON COLUMN dim_site.longitude IS '???????? | ??: siteProfile.longitude';
COMMENT ON COLUMN dim_site.latitude IS '???????? | ??: siteProfile.latitude';
COMMENT ON COLUMN dim_site.tenant_site_region_id IS '????/?????????? | ??: siteProfile.tenant_site_region_id';
COMMENT ON COLUMN dim_site.business_tel IS '????? | ??: siteProfile.business_tel';
COMMENT ON COLUMN dim_site.site_type IS '??????????????????? | ??: siteProfile.site_type';
COMMENT ON COLUMN dim_site.shop_status IS '??????????????????? | ??: siteProfile.shop_status';
-- dim_site_Ex
CREATE TABLE IF NOT EXISTS dim_site_Ex (
@@ -42,6 +137,10 @@ CREATE TABLE IF NOT EXISTS dim_site_Ex (
shop_status INTEGER,
create_time TIMESTAMPTZ,
update_time TIMESTAMPTZ,
SCD2_start_time TIMESTAMPTZ DEFAULT now(),
SCD2_end_time TIMESTAMPTZ DEFAULT '9999-12-31',
SCD2_is_current INT DEFAULT 1,
SCD2_version INT DEFAULT 1,
PRIMARY KEY (site_id)
);
COMMENT ON COLUMN dim_site_Ex.site_id IS '门店主键 ID唯一标识一家门店。与所有事实表中的 site_id 对应。 | 来源: siteProfile.id | 角色: 主键';
@@ -69,17 +168,19 @@ COMMENT ON COLUMN dim_site_Ex.update_time IS '门店最近更新时间。 | 来
-- dim_table
CREATE TABLE IF NOT EXISTS dim_table (
table_id BIGINT,
tenant_id BIGINT,
site_id BIGINT,
table_name TEXT,
site_table_area_id BIGINT,
site_table_area_name TEXT,
tenant_table_area_id BIGINT,
table_price NUMERIC(18,2),
SCD2_start_time TIMESTAMPTZ DEFAULT now(),
SCD2_end_time TIMESTAMPTZ DEFAULT '9999-12-31',
SCD2_is_current INT DEFAULT 1,
SCD2_version INT DEFAULT 1,
PRIMARY KEY (table_id)
);
COMMENT ON COLUMN dim_table.table_id IS '台桌主键,唯一标识一张台或包厢。 | 来源: id | 角色: 主键';
COMMENT ON COLUMN dim_table.tenant_id IS '租户 ID。 | 来源: tenantId | 角色: 外键';
COMMENT ON COLUMN dim_table.site_id IS '门店 ID。 | 来源: siteId | 角色: 外键';
COMMENT ON COLUMN dim_table.table_name IS '台桌名称/编号,如 A17、888。 | 来源: tableName';
COMMENT ON COLUMN dim_table.site_table_area_id IS '门店区 ID用于区分 A区/B区/补时区等。 | 来源: siteTableAreaId | 角色: 外键';
@@ -95,8 +196,10 @@ CREATE TABLE IF NOT EXISTS dim_table_Ex (
table_cloth_use_time INTEGER,
table_cloth_use_cycle INTEGER,
table_status INTEGER,
last_maintenance_time TIMESTAMPTZ,
remark TEXT,
SCD2_start_time TIMESTAMPTZ DEFAULT now(),
SCD2_end_time TIMESTAMPTZ DEFAULT '9999-12-31',
SCD2_is_current INT DEFAULT 1,
SCD2_version INT DEFAULT 1,
PRIMARY KEY (table_id)
);
COMMENT ON COLUMN dim_table_Ex.table_id IS '台桌主键,唯一标识一张台或包厢。 | 来源: id | 角色: 主键';
@@ -105,8 +208,6 @@ COMMENT ON COLUMN dim_table_Ex.is_online_reservation IS '是否可线上预约
COMMENT ON COLUMN dim_table_Ex.table_cloth_use_time IS '已使用台呢时长(秒)。 | 来源: tableClothUseTime';
COMMENT ON COLUMN dim_table_Ex.table_cloth_use_cycle IS '台呢更换周期阈值(秒)。 | 来源: tableClothUseCycle';
COMMENT ON COLUMN dim_table_Ex.table_status IS '当前台桌状态1=空闲2=使用中3=暂停中4=锁定。 | 来源: tableStatus';
COMMENT ON COLUMN dim_table_Ex.last_maintenance_time IS '最近保养时间(未在 JSON 中出现)。 | 来源: lastMaintenanceTime';
COMMENT ON COLUMN dim_table_Ex.remark IS '备注信息。 | 来源: remark';
-- dim_assistant
CREATE TABLE IF NOT EXISTS dim_assistant (
@@ -125,6 +226,10 @@ CREATE TABLE IF NOT EXISTS dim_assistant (
resign_time TIMESTAMPTZ,
leave_status INTEGER,
assistant_status INTEGER,
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (assistant_id)
);
COMMENT ON COLUMN dim_assistant.assistant_id IS '助教账号 ID关联助教服务流水表。 | 来源: id | 角色: 主键';
@@ -189,6 +294,10 @@ CREATE TABLE IF NOT EXISTS dim_assistant_Ex (
light_status INTEGER,
is_team_leader INTEGER,
serial_number BIGINT,
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (assistant_id)
);
COMMENT ON COLUMN dim_assistant_Ex.assistant_id IS '助教账号 ID关联助教服务流水表。 | 来源: id | 角色: 主键';
@@ -248,6 +357,10 @@ CREATE TABLE IF NOT EXISTS dim_member (
member_card_grade_name TEXT,
create_time TIMESTAMPTZ,
update_time TIMESTAMPTZ,
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (member_id)
);
COMMENT ON COLUMN dim_member.member_id IS '租户内会员主键。 | 来源: id | 角色: 主键';
@@ -259,7 +372,6 @@ COMMENT ON COLUMN dim_member.nickname IS '昵称(未必是真实姓名)。 |
COMMENT ON COLUMN dim_member.member_card_grade_code IS '会员等级代码1=金卡2=银卡3=钻石卡4=黑卡?(按照 MD 文档枚举)。 | 来源: member_card_grade_code';
COMMENT ON COLUMN dim_member.member_card_grade_name IS '等级名称,中文描述。 | 来源: member_card_grade_name';
COMMENT ON COLUMN dim_member.create_time IS '会员档案创建时间。 | 来源: create_time';
COMMENT ON COLUMN dim_member.update_time IS '最近更新时间。 | 来源: update_time';
-- dim_member_Ex
CREATE TABLE IF NOT EXISTS dim_member_Ex (
@@ -270,6 +382,10 @@ CREATE TABLE IF NOT EXISTS dim_member_Ex (
growth_value NUMERIC(18,2),
user_status INTEGER,
status INTEGER,
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (member_id)
);
COMMENT ON COLUMN dim_member_Ex.member_id IS '租户内会员主键。 | 来源: id | 角色: 主键';
@@ -299,6 +415,10 @@ CREATE TABLE IF NOT EXISTS dim_member_card_account (
last_consume_time TIMESTAMPTZ,
status INTEGER,
is_delete INTEGER,
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (member_card_id)
);
COMMENT ON COLUMN dim_member_card_account.member_card_id IS '会员卡账户主键,唯一标识一张具体卡。 | 来源: id | 角色: 主键';
@@ -373,6 +493,10 @@ CREATE TABLE IF NOT EXISTS dim_member_card_account_Ex (
goodsCategoryId TEXT,
pdAssisnatLevel TEXT,
cxAssisnatLevel TEXT,
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (member_card_id)
);
COMMENT ON COLUMN dim_member_card_account_Ex.member_card_id IS '会员卡账户主键,唯一标识一张具体卡。 | 来源: id | 角色: 主键';
@@ -444,6 +568,10 @@ CREATE TABLE IF NOT EXISTS dim_tenant_goods (
create_time TIMESTAMPTZ,
update_time TIMESTAMPTZ,
is_delete INTEGER,
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (tenant_goods_id)
);
COMMENT ON COLUMN dim_tenant_goods.tenant_goods_id IS '租户级商品档案主键 ID唯一标识一条商品档案。所有业务事实表销售、库存等中引用租户级商品时应指向此字段。 | 来源: id | 角色: 主键';
@@ -481,6 +609,10 @@ CREATE TABLE IF NOT EXISTS dim_tenant_goods_Ex (
common_sale_royalty INTEGER,
point_sale_royalty INTEGER,
out_goods_id BIGINT,
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (tenant_goods_id)
);
COMMENT ON COLUMN dim_tenant_goods_Ex.tenant_goods_id IS '租户级商品档案主键 ID唯一标识一条商品档案。所有业务事实表销售、库存等中引用租户级商品时应指向此字段。 | 来源: id | 角色: 主键';
@@ -524,6 +656,10 @@ CREATE TABLE IF NOT EXISTS dim_store_goods (
enable_status INTEGER,
send_state INTEGER,
is_delete INTEGER,
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (site_goods_id)
);
COMMENT ON COLUMN dim_store_goods.site_goods_id IS '门店级商品 ID本表主键其它业务表中的 site_goods_id 与此对应,用于库存、销售等关联。 | 来源: id | 角色: 主键';
@@ -575,6 +711,10 @@ CREATE TABLE IF NOT EXISTS dim_store_goods_Ex (
option_required INTEGER,
remark TEXT,
sort_order INTEGER,
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (site_goods_id)
);
COMMENT ON COLUMN dim_store_goods_Ex.site_goods_id IS '门店级商品 ID本表主键其它业务表中的 site_goods_id 与此对应,用于库存、销售等关联。 | 来源: id | 角色: 主键';
@@ -618,6 +758,10 @@ CREATE TABLE IF NOT EXISTS dim_goods_category (
open_salesman INTEGER,
sort_order INTEGER,
is_warehousing INTEGER,
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (category_id)
);
COMMENT ON COLUMN dim_goods_category.category_id IS '分类节点主键。来自分类树节点的 id在整个商品分类维度内唯一。用于在事实表中作为商品分类外键引用。 | 来源: id | 角色: 主键';
@@ -651,6 +795,10 @@ CREATE TABLE IF NOT EXISTS dim_groupbuy_package (
create_time TIMESTAMPTZ,
tenant_table_area_id_list VARCHAR(512),
card_type_ids VARCHAR(255),
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (groupbuy_package_id)
);
COMMENT ON COLUMN dim_groupbuy_package.groupbuy_package_id IS '门店侧团购套餐主键。每条记录一个套餐定义,供团购券核销记录指向。平台验券记录中的 group_package_id 通常指向这里。 | 来源: id | 角色: 主键';
@@ -692,6 +840,10 @@ CREATE TABLE IF NOT EXISTS dim_groupbuy_package_Ex (
effective_status INTEGER,
max_selectable_categories INTEGER,
creator_name VARCHAR(100),
SCD2_start_time TIMESTAMPTZ,
SCD2_end_time TIMESTAMPTZ,
SCD2_is_current INT,
SCD2_version INT,
PRIMARY KEY (groupbuy_package_id)
);
COMMENT ON COLUMN dim_groupbuy_package_Ex.groupbuy_package_id IS '门店侧团购套餐主键。每条记录一个套餐定义,供团购券核销记录指向。平台验券记录中的 group_package_id 通常指向这里。 | 来源: id | 角色: 主键';

View File

@@ -0,0 +1,105 @@
-- 文件说明etl_admin 调度元数据 DDL独立文件便于初始化任务单独执行
-- 包含任务注册表、游标表、运行记录表;字段注释使用中文。
CREATE SCHEMA IF NOT EXISTS etl_admin;
CREATE TABLE IF NOT EXISTS etl_admin.etl_task (
task_id BIGSERIAL PRIMARY KEY,
task_code TEXT NOT NULL,
store_id BIGINT NOT NULL,
enabled BOOLEAN DEFAULT TRUE,
cursor_field TEXT,
window_minutes_default INT DEFAULT 30,
overlap_seconds INT DEFAULT 120,
page_size INT DEFAULT 200,
retry_max INT DEFAULT 3,
params JSONB DEFAULT '{}'::jsonb,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now(),
UNIQUE (task_code, store_id)
);
COMMENT ON TABLE etl_admin.etl_task IS '任务注册表:调度依据的任务清单(与 task_registry 中的任务码对应)。';
COMMENT ON COLUMN etl_admin.etl_task.task_code IS '任务编码,需与代码中的任务码一致。';
COMMENT ON COLUMN etl_admin.etl_task.store_id IS '门店/租户粒度,区分多门店执行。';
COMMENT ON COLUMN etl_admin.etl_task.enabled IS '是否启用此任务。';
COMMENT ON COLUMN etl_admin.etl_task.cursor_field IS '增量游标字段名(可选)。';
COMMENT ON COLUMN etl_admin.etl_task.window_minutes_default IS '默认时间窗口(分钟)。';
COMMENT ON COLUMN etl_admin.etl_task.overlap_seconds IS '窗口重叠秒数,用于防止遗漏。';
COMMENT ON COLUMN etl_admin.etl_task.page_size IS '默认分页大小。';
COMMENT ON COLUMN etl_admin.etl_task.retry_max IS 'API重试次数上限。';
COMMENT ON COLUMN etl_admin.etl_task.params IS '任务级自定义参数 JSON。';
COMMENT ON COLUMN etl_admin.etl_task.created_at IS '创建时间。';
COMMENT ON COLUMN etl_admin.etl_task.updated_at IS '更新时间。';
CREATE TABLE IF NOT EXISTS etl_admin.etl_cursor (
cursor_id BIGSERIAL PRIMARY KEY,
task_id BIGINT NOT NULL REFERENCES etl_admin.etl_task(task_id) ON DELETE CASCADE,
store_id BIGINT NOT NULL,
last_start TIMESTAMPTZ,
last_end TIMESTAMPTZ,
last_id BIGINT,
last_run_id BIGINT,
extra JSONB DEFAULT '{}'::jsonb,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now(),
UNIQUE (task_id, store_id)
);
COMMENT ON TABLE etl_admin.etl_cursor IS '任务游标表:记录每个任务/门店的增量窗口及最后 run。';
COMMENT ON COLUMN etl_admin.etl_cursor.task_id IS '关联 etl_task.task_id。';
COMMENT ON COLUMN etl_admin.etl_cursor.store_id IS '门店/租户粒度。';
COMMENT ON COLUMN etl_admin.etl_cursor.last_start IS '上次窗口开始时间(含重叠偏移)。';
COMMENT ON COLUMN etl_admin.etl_cursor.last_end IS '上次窗口结束时间。';
COMMENT ON COLUMN etl_admin.etl_cursor.last_id IS '上次处理的最大主键/游标值(可选)。';
COMMENT ON COLUMN etl_admin.etl_cursor.last_run_id IS '上次运行ID对应 etl_run.run_id。';
COMMENT ON COLUMN etl_admin.etl_cursor.extra IS '附加游标信息 JSON。';
COMMENT ON COLUMN etl_admin.etl_cursor.created_at IS '创建时间。';
COMMENT ON COLUMN etl_admin.etl_cursor.updated_at IS '更新时间。';
CREATE TABLE IF NOT EXISTS etl_admin.etl_run (
run_id BIGSERIAL PRIMARY KEY,
run_uuid TEXT NOT NULL,
task_id BIGINT NOT NULL REFERENCES etl_admin.etl_task(task_id) ON DELETE CASCADE,
store_id BIGINT NOT NULL,
status TEXT NOT NULL,
started_at TIMESTAMPTZ DEFAULT now(),
ended_at TIMESTAMPTZ,
window_start TIMESTAMPTZ,
window_end TIMESTAMPTZ,
window_minutes INT,
overlap_seconds INT,
fetched_count INT DEFAULT 0,
loaded_count INT DEFAULT 0,
updated_count INT DEFAULT 0,
skipped_count INT DEFAULT 0,
error_count INT DEFAULT 0,
unknown_fields INT DEFAULT 0,
export_dir TEXT,
log_path TEXT,
request_params JSONB DEFAULT '{}'::jsonb,
manifest JSONB DEFAULT '{}'::jsonb,
error_message TEXT,
extra JSONB DEFAULT '{}'::jsonb
);
COMMENT ON TABLE etl_admin.etl_run IS '运行记录表:记录每次任务执行的窗口、状态、计数与日志路径。';
COMMENT ON COLUMN etl_admin.etl_run.run_uuid IS '本次调度的唯一标识。';
COMMENT ON COLUMN etl_admin.etl_run.task_id IS '关联 etl_task.task_id。';
COMMENT ON COLUMN etl_admin.etl_run.store_id IS '门店/租户粒度。';
COMMENT ON COLUMN etl_admin.etl_run.status IS '运行状态SUCC/FAIL/PARTIAL 等)。';
COMMENT ON COLUMN etl_admin.etl_run.started_at IS '开始时间。';
COMMENT ON COLUMN etl_admin.etl_run.ended_at IS '结束时间。';
COMMENT ON COLUMN etl_admin.etl_run.window_start IS '本次窗口开始时间。';
COMMENT ON COLUMN etl_admin.etl_run.window_end IS '本次窗口结束时间。';
COMMENT ON COLUMN etl_admin.etl_run.window_minutes IS '窗口跨度(分钟)。';
COMMENT ON COLUMN etl_admin.etl_run.overlap_seconds IS '窗口重叠秒数。';
COMMENT ON COLUMN etl_admin.etl_run.fetched_count IS '抓取/读取的记录数。';
COMMENT ON COLUMN etl_admin.etl_run.loaded_count IS '插入的记录数。';
COMMENT ON COLUMN etl_admin.etl_run.updated_count IS '更新的记录数。';
COMMENT ON COLUMN etl_admin.etl_run.skipped_count IS '跳过的记录数。';
COMMENT ON COLUMN etl_admin.etl_run.error_count IS '错误记录数。';
COMMENT ON COLUMN etl_admin.etl_run.unknown_fields IS '未知字段计数(清洗阶段)。';
COMMENT ON COLUMN etl_admin.etl_run.export_dir IS '抓取/导出目录。';
COMMENT ON COLUMN etl_admin.etl_run.log_path IS '日志路径。';
COMMENT ON COLUMN etl_admin.etl_run.request_params IS '请求参数 JSON。';
COMMENT ON COLUMN etl_admin.etl_run.manifest IS '运行产出清单/统计 JSON。';
COMMENT ON COLUMN etl_admin.etl_run.error_message IS '错误信息(若失败)。';
COMMENT ON COLUMN etl_admin.etl_run.extra IS '附加字段,保留扩展。';

File diff suppressed because it is too large Load Diff

View File

@@ -1,34 +1,35 @@
-- 将新的 ODS 任务注册到 etl_admin.etl_task(根据需要替换 store_id
-- 使用方式(示例):
-- 灏嗘柊鐨?ODS 浠诲姟娉ㄥ唽鍒?etl_admin.etl_task锛堟牴鎹渶瑕佹浛鎹?store_id锛?
-- 浣跨敤鏂瑰紡锛堢ず渚嬶級锛?
-- psql "$PG_DSN" -f etl_billiards/database/seed_ods_tasks.sql
-- 或者在 psql 中执行本文件内容。
-- 鎴栬€呭湪 psql 涓墽琛屾湰鏂囦欢鍐呭銆?
WITH target_store AS (
SELECT 2790685415443269::bigint AS store_id -- TODO: 替换为实际 store_id
SELECT 2790685415443269::bigint AS store_id -- TODO: 鏇挎崲涓哄疄闄?store_id
),
task_codes AS (
SELECT unnest(ARRAY[
'ODS_ASSISTANT_ACCOUNTS',
'ODS_ASSISTANT_LEDGER',
'ODS_ASSISTANT_ABOLISH',
'ODS_INVENTORY_CHANGE',
'assistant_accounts_masterS',
'assistant_service_records',
'assistant_cancellation_records',
'goods_stock_movements',
'ODS_INVENTORY_STOCK',
'ODS_PACKAGE',
'ODS_GROUP_BUY_REDEMPTION',
'ODS_MEMBER',
'ODS_MEMBER_BALANCE',
'ODS_MEMBER_CARD',
'member_stored_value_cards',
'ODS_PAYMENT',
'ODS_REFUND',
'ODS_COUPON_VERIFY',
'ODS_RECHARGE_SETTLE',
'platform_coupon_redemption_records',
'recharge_settlements',
'ODS_TABLES',
'ODS_GOODS_CATEGORY',
'ODS_STORE_GOODS',
'ODS_TABLE_DISCOUNT',
'table_fee_discount_records',
'ODS_TENANT_GOODS',
'ODS_SETTLEMENT_TICKET',
'ODS_ORDER_SETTLE'
'settlement_records',
'INIT_ODS_SCHEMA'
]) AS task_code
)
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
@@ -37,3 +38,4 @@ FROM task_codes t CROSS JOIN target_store s
ON CONFLICT (task_code, store_id) DO UPDATE
SET enabled = EXCLUDED.enabled;

View File

@@ -0,0 +1,52 @@
{
"source_counts": {
"assistant_accounts_master.json": 2,
"assistant_cancellation_records.json": 2,
"assistant_service_records.json": 2,
"goods_stock_movements.json": 2,
"goods_stock_summary.json": 161,
"group_buy_packages.json": 2,
"group_buy_redemption_records.json": 2,
"member_balance_changes.json": 2,
"member_profiles.json": 2,
"member_stored_value_cards.json": 2,
"payment_transactions.json": 200,
"platform_coupon_redemption_records.json": 200,
"recharge_settlements.json": 2,
"refund_transactions.json": 11,
"settlement_records.json": 2,
"settlement_ticket_details.json": 193,
"site_tables_master.json": 2,
"stock_goods_category_tree.json": 2,
"store_goods_master.json": 2,
"store_goods_sales_records.json": 2,
"table_fee_discount_records.json": 2,
"table_fee_transactions.json": 2,
"tenant_goods_master.json": 2
},
"ods_counts": {
"member_profiles": 199,
"member_balance_changes": 200,
"member_stored_value_cards": 200,
"recharge_settlements": 75,
"settlement_records": 200,
"assistant_cancellation_records": 15,
"assistant_accounts_master": 50,
"assistant_service_records": 200,
"site_tables_master": 71,
"table_fee_discount_records": 200,
"table_fee_transactions": 200,
"goods_stock_movements": 200,
"stock_goods_category_tree": 9,
"goods_stock_summary": 161,
"payment_transactions": 200,
"refund_transactions": 11,
"platform_coupon_redemption_records": 200,
"tenant_goods_master": 156,
"group_buy_packages": 17,
"group_buy_redemption_records": 200,
"settlement_ticket_details": 193,
"store_goods_master": 161,
"store_goods_sales_records": 200
}
}

View File

@@ -15,10 +15,14 @@ from tasks.table_discount_task import TableDiscountTask
from tasks.assistant_abolish_task import AssistantAbolishTask
from tasks.ledger_task import LedgerTask
from tasks.ods_tasks import ODS_TASK_CLASSES
from tasks.ticket_dwd_task import TicketDwdTask
from tasks.manual_ingest_task import ManualIngestTask
from tasks.payments_dwd_task import PaymentsDwdTask
from tasks.members_dwd_task import MembersDwdTask
from tasks.init_schema_task import InitOdsSchemaTask
from tasks.init_dwd_schema_task import InitDwdSchemaTask
from tasks.dwd_load_task import DwdLoadTask
from tasks.ticket_dwd_task import TicketDwdTask
from tasks.dwd_quality_task import DwdQualityTask
class TaskRegistry:
"""任务注册和工厂"""
@@ -64,5 +68,9 @@ default_registry.register("TICKET_DWD", TicketDwdTask)
default_registry.register("MANUAL_INGEST", ManualIngestTask)
default_registry.register("PAYMENTS_DWD", PaymentsDwdTask)
default_registry.register("MEMBERS_DWD", MembersDwdTask)
default_registry.register("INIT_ODS_SCHEMA", InitOdsSchemaTask)
default_registry.register("INIT_DWD_SCHEMA", InitDwdSchemaTask)
default_registry.register("DWD_LOAD_FROM_ODS", DwdLoadTask)
default_registry.register("DWD_QUALITY_CHECK", DwdQualityTask)
for code, task_cls in ODS_TASK_CLASSES.items():
default_registry.register(code, task_cls)

View File

@@ -0,0 +1,692 @@
{
"generated_at": "2025-12-09T05:21:24.745244",
"tables": [
{
"dwd_table": "billiards_dwd.dim_site",
"ods_table": "billiards_ods.table_fee_transactions",
"count": {
"dwd": 1,
"ods": 200,
"diff": -199
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_site_ex",
"ods_table": "billiards_ods.table_fee_transactions",
"count": {
"dwd": 1,
"ods": 200,
"diff": -199
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_table",
"ods_table": "billiards_ods.site_tables_master",
"count": {
"dwd": 71,
"ods": 71,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_table_ex",
"ods_table": "billiards_ods.site_tables_master",
"count": {
"dwd": 71,
"ods": 71,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_assistant",
"ods_table": "billiards_ods.assistant_accounts_master",
"count": {
"dwd": 50,
"ods": 50,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_assistant_ex",
"ods_table": "billiards_ods.assistant_accounts_master",
"count": {
"dwd": 50,
"ods": 50,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_member",
"ods_table": "billiards_ods.member_profiles",
"count": {
"dwd": 199,
"ods": 199,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_member_ex",
"ods_table": "billiards_ods.member_profiles",
"count": {
"dwd": 199,
"ods": 199,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_member_card_account",
"ods_table": "billiards_ods.member_stored_value_cards",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "balance",
"dwd_sum": 31061.03,
"ods_sum": 31061.03,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dim_member_card_account_ex",
"ods_table": "billiards_ods.member_stored_value_cards",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "deliveryfeededuct",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dim_tenant_goods",
"ods_table": "billiards_ods.tenant_goods_master",
"count": {
"dwd": 156,
"ods": 156,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_tenant_goods_ex",
"ods_table": "billiards_ods.tenant_goods_master",
"count": {
"dwd": 156,
"ods": 156,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_store_goods",
"ods_table": "billiards_ods.store_goods_master",
"count": {
"dwd": 161,
"ods": 161,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_store_goods_ex",
"ods_table": "billiards_ods.store_goods_master",
"count": {
"dwd": 161,
"ods": 161,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_goods_category",
"ods_table": "billiards_ods.stock_goods_category_tree",
"count": {
"dwd": 26,
"ods": 9,
"diff": 17
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_groupbuy_package",
"ods_table": "billiards_ods.group_buy_packages",
"count": {
"dwd": 17,
"ods": 17,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dim_groupbuy_package_ex",
"ods_table": "billiards_ods.group_buy_packages",
"count": {
"dwd": 17,
"ods": 17,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_settlement_head",
"ods_table": "billiards_ods.settlement_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_settlement_head_ex",
"ods_table": "billiards_ods.settlement_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_table_fee_log",
"ods_table": "billiards_ods.table_fee_transactions",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "adjust_amount",
"dwd_sum": 1157.45,
"ods_sum": 1157.45,
"diff": 0.0
},
{
"column": "coupon_promotion_amount",
"dwd_sum": 11244.49,
"ods_sum": 11244.49,
"diff": 0.0
},
{
"column": "ledger_amount",
"dwd_sum": 18107.0,
"ods_sum": 18107.0,
"diff": 0.0
},
{
"column": "member_discount_amount",
"dwd_sum": 1149.19,
"ods_sum": 1149.19,
"diff": 0.0
},
{
"column": "real_table_charge_money",
"dwd_sum": 5705.06,
"ods_sum": 5705.06,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_table_fee_log_ex",
"ods_table": "billiards_ods.table_fee_transactions",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "fee_total",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "mgmt_fee",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "service_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "used_card_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_table_fee_adjust",
"ods_table": "billiards_ods.table_fee_discount_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "ledger_amount",
"dwd_sum": 20650.84,
"ods_sum": 20650.84,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_table_fee_adjust_ex",
"ods_table": "billiards_ods.table_fee_discount_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_store_goods_sale",
"ods_table": "billiards_ods.store_goods_sales_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "cost_money",
"dwd_sum": 22.3,
"ods_sum": 22.3,
"diff": 0.0
},
{
"column": "ledger_amount",
"dwd_sum": 4583.0,
"ods_sum": 4583.0,
"diff": 0.0
},
{
"column": "real_goods_money",
"dwd_sum": 3791.0,
"ods_sum": 3791.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_store_goods_sale_ex",
"ods_table": "billiards_ods.store_goods_sales_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "coupon_deduct_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "discount_money",
"dwd_sum": 792.0,
"ods_sum": 792.0,
"diff": 0.0
},
{
"column": "member_discount_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "option_coupon_deduct_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "option_member_discount_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "point_discount_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "point_discount_money_cost",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "push_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_assistant_service_log",
"ods_table": "billiards_ods.assistant_service_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "coupon_deduct_money",
"dwd_sum": 626.83,
"ods_sum": 626.83,
"diff": 0.0
},
{
"column": "ledger_amount",
"dwd_sum": 63251.37,
"ods_sum": 63251.37,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_assistant_service_log_ex",
"ods_table": "billiards_ods.assistant_service_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "manual_discount_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "member_discount_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "service_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_assistant_trash_event",
"ods_table": "billiards_ods.assistant_cancellation_records",
"count": {
"dwd": 15,
"ods": 15,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_assistant_trash_event_ex",
"ods_table": "billiards_ods.assistant_cancellation_records",
"count": {
"dwd": 15,
"ods": 15,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_member_balance_change",
"ods_table": "billiards_ods.member_balance_changes",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_member_balance_change_ex",
"ods_table": "billiards_ods.member_balance_changes",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "refund_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_groupbuy_redemption",
"ods_table": "billiards_ods.group_buy_redemption_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "coupon_money",
"dwd_sum": 12266.0,
"ods_sum": 12266.0,
"diff": 0.0
},
{
"column": "ledger_amount",
"dwd_sum": 12049.53,
"ods_sum": 12049.53,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_groupbuy_redemption_ex",
"ods_table": "billiards_ods.group_buy_redemption_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "assistant_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "assistant_service_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "goods_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "recharge_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "reward_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "table_service_promotion_money",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_platform_coupon_redemption",
"ods_table": "billiards_ods.platform_coupon_redemption_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "coupon_money",
"dwd_sum": 11956.0,
"ods_sum": 11956.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_platform_coupon_redemption_ex",
"ods_table": "billiards_ods.platform_coupon_redemption_records",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_recharge_order",
"ods_table": "billiards_ods.recharge_settlements",
"count": {
"dwd": 74,
"ods": 74,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_recharge_order_ex",
"ods_table": "billiards_ods.recharge_settlements",
"count": {
"dwd": 74,
"ods": 74,
"diff": 0
},
"amounts": []
},
{
"dwd_table": "billiards_dwd.dwd_payment",
"ods_table": "billiards_ods.payment_transactions",
"count": {
"dwd": 200,
"ods": 200,
"diff": 0
},
"amounts": [
{
"column": "pay_amount",
"dwd_sum": 10863.0,
"ods_sum": 10863.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_refund",
"ods_table": "billiards_ods.refund_transactions",
"count": {
"dwd": 11,
"ods": 11,
"diff": 0
},
"amounts": [
{
"column": "channel_fee",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "pay_amount",
"dwd_sum": -62186.0,
"ods_sum": -62186.0,
"diff": 0.0
}
]
},
{
"dwd_table": "billiards_dwd.dwd_refund_ex",
"ods_table": "billiards_ods.refund_transactions",
"count": {
"dwd": 11,
"ods": 11,
"diff": 0
},
"amounts": [
{
"column": "balance_frozen_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "card_frozen_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "refund_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
},
{
"column": "round_amount",
"dwd_sum": 0.0,
"ods_sum": 0.0,
"diff": 0.0
}
]
}
],
"note": "行数/金额核对,金额字段基于列名包含 amount/money/fee/balance 的数值列自动扫描。"
}

27
etl_billiards/run_ods.bat Normal file
View File

@@ -0,0 +1,27 @@
@echo off
REM -*- coding: utf-8 -*-
REM 说明:一键重建 ODS执行 INIT_ODS_SCHEMA并灌入示例 JSON执行 MANUAL_INGEST
REM 使用配置:.env 中 PG_DSN、INGEST_SOURCE_DIR或通过参数覆盖
setlocal
cd /d %~dp0
REM 如果需要覆盖示例目录,可修改下面的 INGEST_DIR
set "INGEST_DIR=C:\dev\LLTQ\export\test-json-doc"
echo [INIT_ODS_SCHEMA] 准备执行,源目录=%INGEST_DIR%
python -m cli.main --tasks INIT_ODS_SCHEMA --pipeline-flow INGEST_ONLY --ingest-source "%INGEST_DIR%"
if errorlevel 1 (
echo INIT_ODS_SCHEMA 失败,退出
exit /b 1
)
echo [MANUAL_INGEST] 准备执行,源目录=%INGEST_DIR%
python -m cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "%INGEST_DIR%"
if errorlevel 1 (
echo MANUAL_INGEST 失败,退出
exit /b 1
)
echo 全部完成。
endlocal

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# -*- coding: utf-8 -*-
"""Populate PRD DWD tables from ODS payload snapshots."""
from __future__ import annotations
@@ -16,9 +16,9 @@ SQL_STEPS: list[tuple[str, str]] = [
INSERT INTO billiards_dwd.dim_tenant (tenant_id, tenant_name, status)
SELECT DISTINCT tenant_id, 'default' AS tenant_name, 'active' AS status
FROM (
SELECT tenant_id FROM billiards_ods.ods_order_settle
SELECT tenant_id FROM billiards_ods.settlement_records
UNION SELECT tenant_id FROM billiards_ods.ods_order_receipt_detail
UNION SELECT tenant_id FROM billiards_ods.ods_member_profile
UNION SELECT tenant_id FROM billiards_ods.member_profiles
) s
WHERE tenant_id IS NOT NULL
ON CONFLICT (tenant_id) DO UPDATE SET updated_at = now();
@@ -30,7 +30,7 @@ SQL_STEPS: list[tuple[str, str]] = [
INSERT INTO billiards_dwd.dim_site (site_id, tenant_id, site_name, status)
SELECT DISTINCT site_id, MAX(tenant_id) AS tenant_id, 'default' AS site_name, 'active' AS status
FROM (
SELECT site_id, tenant_id FROM billiards_ods.ods_order_settle
SELECT site_id, tenant_id FROM billiards_ods.settlement_records
UNION SELECT site_id, tenant_id FROM billiards_ods.ods_order_receipt_detail
UNION SELECT site_id, tenant_id FROM billiards_ods.ods_table_info
) s
@@ -84,7 +84,7 @@ SQL_STEPS: list[tuple[str, str]] = [
"""
INSERT INTO billiards_dwd.dim_member_card_type (card_type_id, card_type_name, discount_rate)
SELECT DISTINCT card_type_id, card_type_name, discount_rate
FROM billiards_ods.ods_member_card
FROM billiards_ods.member_stored_value_cards
WHERE card_type_id IS NOT NULL
ON CONFLICT (card_type_id) DO UPDATE SET
card_type_name = EXCLUDED.card_type_name,
@@ -119,10 +119,10 @@ SQL_STEPS: list[tuple[str, str]] = [
prof.wechat_id,
prof.alipay_id,
prof.remarks
FROM billiards_ods.ods_member_profile prof
FROM billiards_ods.member_profiles prof
LEFT JOIN (
SELECT DISTINCT site_id, member_id, card_type_id AS member_type_id, card_type_name AS member_type_name
FROM billiards_ods.ods_member_card
FROM billiards_ods.member_stored_value_cards
) card
ON prof.site_id = card.site_id AND prof.member_id = card.member_id
WHERE prof.member_id IS NOT NULL
@@ -167,7 +167,7 @@ SQL_STEPS: list[tuple[str, str]] = [
"""
INSERT INTO billiards_dwd.dim_assistant (assistant_id, assistant_name, mobile, status)
SELECT DISTINCT assistant_id, assistant_name, mobile, status
FROM billiards_ods.ods_assistant_account
FROM billiards_ods.assistant_accounts_master
WHERE assistant_id IS NOT NULL
ON CONFLICT (assistant_id) DO UPDATE SET
assistant_name = EXCLUDED.assistant_name,
@@ -181,7 +181,7 @@ SQL_STEPS: list[tuple[str, str]] = [
"""
INSERT INTO billiards_dwd.dim_pay_method (pay_method_code, pay_method_name, is_stored_value, status)
SELECT DISTINCT pay_method_code, pay_method_name, FALSE AS is_stored_value, 'active' AS status
FROM billiards_ods.ods_payment_record
FROM billiards_ods.payment_transactions
WHERE pay_method_code IS NOT NULL
ON CONFLICT (pay_method_code) DO UPDATE SET
pay_method_name = EXCLUDED.pay_method_name,
@@ -250,7 +250,7 @@ SQL_STEPS: list[tuple[str, str]] = [
final_table_fee,
FALSE AS is_canceled,
NULL::TIMESTAMPTZ AS cancel_time
FROM billiards_ods.ods_table_use_log
FROM billiards_ods.table_fee_transactions_log
ON CONFLICT (site_id, ledger_id) DO NOTHING;
""",
),
@@ -325,7 +325,7 @@ SQL_STEPS: list[tuple[str, str]] = [
pay_time,
relate_type,
relate_id
FROM billiards_ods.ods_payment_record
FROM billiards_ods.payment_transactions
ON CONFLICT (site_id, pay_id) DO NOTHING;
""",
),
@@ -346,7 +346,7 @@ SQL_STEPS: list[tuple[str, str]] = [
refund_amount,
refund_time,
status
FROM billiards_ods.ods_refund_record
FROM billiards_ods.refund_transactions
ON CONFLICT (site_id, refund_id) DO NOTHING;
""",
),
@@ -369,7 +369,7 @@ SQL_STEPS: list[tuple[str, str]] = [
balance_before,
balance_after,
change_time
FROM billiards_ods.ods_balance_change
FROM billiards_ods.member_balance_changes
ON CONFLICT (site_id, change_id) DO NOTHING;
""",
),
@@ -423,3 +423,4 @@ def main() -> int:
if __name__ == "__main__":
raise SystemExit(main())

View File

@@ -0,0 +1,117 @@
# -*- coding: utf-8 -*-
"""
ODS JSON 字段核对脚本:对照当前数据库中的 ODS 表字段,检查示例 JSON默认目录 C:\\dev\\LLTQ\\export\\test-json-doc
是否包含同名键,并输出每表未命中的字段,便于补充映射或确认确实无源字段。
使用方法:
set PG_DSN=postgresql://... # 如 .env 中配置
python -m etl_billiards.scripts.check_ods_json_vs_table
"""
from __future__ import annotations
import json
import os
import pathlib
from typing import Dict, Iterable, Set, Tuple
import psycopg2
from etl_billiards.tasks.manual_ingest_task import ManualIngestTask
def _flatten_keys(obj, prefix: str = "") -> Set[str]:
"""递归展开 JSON 所有键路径,返回形如 data.assistantInfos.id 的集合。列表不保留索引,仅继续向下展开。"""
keys: Set[str] = set()
if isinstance(obj, dict):
for k, v in obj.items():
new_prefix = f"{prefix}.{k}" if prefix else k
keys.add(new_prefix)
keys |= _flatten_keys(v, new_prefix)
elif isinstance(obj, list):
for item in obj:
keys |= _flatten_keys(item, prefix)
return keys
def _load_json_keys(path: pathlib.Path) -> Tuple[Set[str], dict[str, Set[str]]]:
"""读取单个 JSON 文件并返回展开后的键集合以及末段->路径列表映射,若文件不存在或无法解析则返回空集合。"""
if not path.exists():
return set(), {}
data = json.loads(path.read_text(encoding="utf-8"))
paths = _flatten_keys(data)
last_map: dict[str, Set[str]] = {}
for p in paths:
last = p.split(".")[-1].lower()
last_map.setdefault(last, set()).add(p)
return paths, last_map
def _load_ods_columns(dsn: str) -> Dict[str, Set[str]]:
"""从数据库读取 billiards_ods.* 的列名集合,按表返回。"""
conn = psycopg2.connect(dsn)
cur = conn.cursor()
cur.execute(
"""
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_schema='billiards_ods'
ORDER BY table_name, ordinal_position
"""
)
result: Dict[str, Set[str]] = {}
for table, col in cur.fetchall():
result.setdefault(table, set()).add(col.lower())
cur.close()
conn.close()
return result
def main() -> None:
"""主流程:遍历 FILE_MAPPING 中的 ODS 表,检查 JSON 键覆盖情况并打印报告。"""
dsn = os.environ.get("PG_DSN")
json_dir = pathlib.Path(os.environ.get("JSON_DOC_DIR", r"C:\dev\LLTQ\export\test-json-doc"))
ods_cols_map = _load_ods_columns(dsn)
print(f"使用 JSON 目录: {json_dir}")
print(f"连接 DSN: {dsn}")
print("=" * 80)
for keywords, ods_table in ManualIngestTask.FILE_MAPPING:
table = ods_table.split(".")[-1]
cols = ods_cols_map.get(table, set())
file_name = f"{keywords[0]}.json"
file_path = json_dir / file_name
keys_full, path_map = _load_json_keys(file_path)
key_last_parts = set(path_map.keys())
missing: Set[str] = set()
extra_keys: Set[str] = set()
present: Set[str] = set()
for col in sorted(cols):
if col in key_last_parts:
present.add(col)
else:
missing.add(col)
for k in key_last_parts:
if k not in cols:
extra_keys.add(k)
print(f"[{table}] 文件={file_name} 列数={len(cols)} JSON键(末段)覆盖={len(present)}/{len(cols)}")
if missing:
print(" 未命中列:", ", ".join(sorted(missing)))
else:
print(" 未命中列: 无")
if extra_keys:
extras = []
for k in sorted(extra_keys):
paths = ", ".join(sorted(path_map.get(k, [])))
extras.append(f"{k} ({paths})")
print(" JSON 仅有(表无此列):", "; ".join(extras))
else:
print(" JSON 仅有(表无此列): 无")
print("-" * 80)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,907 @@
# -*- coding: utf-8 -*-
"""DWD 装载任务:从 ODS 增量写入 DWD维度 SCD2事实按时间增量"""
from __future__ import annotations
from datetime import datetime
from typing import Any, Dict, Iterable, List, Sequence
from psycopg2.extras import RealDictCursor
from .base_task import BaseTask, TaskContext
class DwdLoadTask(BaseTask):
"""负责 DWD 装载:维度表做 SCD2 合并,事实表按时间增量写入。"""
# DWD -> ODS 表映射ODS 表名已与示例 JSON 前缀统一)
TABLE_MAP: dict[str, str] = {
# 维度
# 门店:改用台费流水中的 siteprofile 快照,补齐 org/地址等字段
"billiards_dwd.dim_site": "billiards_ods.table_fee_transactions",
"billiards_dwd.dim_site_ex": "billiards_ods.table_fee_transactions",
"billiards_dwd.dim_table": "billiards_ods.site_tables_master",
"billiards_dwd.dim_table_ex": "billiards_ods.site_tables_master",
"billiards_dwd.dim_assistant": "billiards_ods.assistant_accounts_master",
"billiards_dwd.dim_assistant_ex": "billiards_ods.assistant_accounts_master",
"billiards_dwd.dim_member": "billiards_ods.member_profiles",
"billiards_dwd.dim_member_ex": "billiards_ods.member_profiles",
"billiards_dwd.dim_member_card_account": "billiards_ods.member_stored_value_cards",
"billiards_dwd.dim_member_card_account_ex": "billiards_ods.member_stored_value_cards",
"billiards_dwd.dim_tenant_goods": "billiards_ods.tenant_goods_master",
"billiards_dwd.dim_tenant_goods_ex": "billiards_ods.tenant_goods_master",
"billiards_dwd.dim_store_goods": "billiards_ods.store_goods_master",
"billiards_dwd.dim_store_goods_ex": "billiards_ods.store_goods_master",
"billiards_dwd.dim_goods_category": "billiards_ods.stock_goods_category_tree",
"billiards_dwd.dim_groupbuy_package": "billiards_ods.group_buy_packages",
"billiards_dwd.dim_groupbuy_package_ex": "billiards_ods.group_buy_packages",
# 事实
"billiards_dwd.dwd_settlement_head": "billiards_ods.settlement_records",
"billiards_dwd.dwd_settlement_head_ex": "billiards_ods.settlement_records",
"billiards_dwd.dwd_table_fee_log": "billiards_ods.table_fee_transactions",
"billiards_dwd.dwd_table_fee_log_ex": "billiards_ods.table_fee_transactions",
"billiards_dwd.dwd_table_fee_adjust": "billiards_ods.table_fee_discount_records",
"billiards_dwd.dwd_table_fee_adjust_ex": "billiards_ods.table_fee_discount_records",
"billiards_dwd.dwd_store_goods_sale": "billiards_ods.store_goods_sales_records",
"billiards_dwd.dwd_store_goods_sale_ex": "billiards_ods.store_goods_sales_records",
"billiards_dwd.dwd_assistant_service_log": "billiards_ods.assistant_service_records",
"billiards_dwd.dwd_assistant_service_log_ex": "billiards_ods.assistant_service_records",
"billiards_dwd.dwd_assistant_trash_event": "billiards_ods.assistant_cancellation_records",
"billiards_dwd.dwd_assistant_trash_event_ex": "billiards_ods.assistant_cancellation_records",
"billiards_dwd.dwd_member_balance_change": "billiards_ods.member_balance_changes",
"billiards_dwd.dwd_member_balance_change_ex": "billiards_ods.member_balance_changes",
"billiards_dwd.dwd_groupbuy_redemption": "billiards_ods.group_buy_redemption_records",
"billiards_dwd.dwd_groupbuy_redemption_ex": "billiards_ods.group_buy_redemption_records",
"billiards_dwd.dwd_platform_coupon_redemption": "billiards_ods.platform_coupon_redemption_records",
"billiards_dwd.dwd_platform_coupon_redemption_ex": "billiards_ods.platform_coupon_redemption_records",
"billiards_dwd.dwd_recharge_order": "billiards_ods.recharge_settlements",
"billiards_dwd.dwd_recharge_order_ex": "billiards_ods.recharge_settlements",
"billiards_dwd.dwd_payment": "billiards_ods.payment_transactions",
"billiards_dwd.dwd_refund": "billiards_ods.refund_transactions",
"billiards_dwd.dwd_refund_ex": "billiards_ods.refund_transactions",
}
SCD_COLS = {"scd2_start_time", "scd2_end_time", "scd2_is_current", "scd2_version"}
FACT_ORDER_CANDIDATES = [
"fetched_at",
"pay_time",
"create_time",
"update_time",
"occur_time",
"settle_time",
"start_use_time",
]
# 特殊列映射dwd 列名 -> 源列表达式(可选 CAST
FACT_MAPPINGS: dict[str, list[tuple[str, str, str | None]]] = {
# 维度表(补齐主键/字段差异)
"billiards_dwd.dim_site": [
("org_id", "siteprofile->>'org_id'", None),
("shop_name", "siteprofile->>'shop_name'", None),
("site_label", "siteprofile->>'site_label'", None),
("full_address", "siteprofile->>'full_address'", None),
("address", "siteprofile->>'address'", None),
("longitude", "siteprofile->>'longitude'", "numeric"),
("latitude", "siteprofile->>'latitude'", "numeric"),
("tenant_site_region_id", "siteprofile->>'tenant_site_region_id'", None),
("business_tel", "siteprofile->>'business_tel'", None),
("site_type", "siteprofile->>'site_type'", None),
("shop_status", "siteprofile->>'shop_status'", None),
("tenant_id", "siteprofile->>'tenant_id'", None),
],
"billiards_dwd.dim_site_ex": [
("auto_light", "siteprofile->>'auto_light'", None),
("attendance_enabled", "siteprofile->>'attendance_enabled'", None),
("attendance_distance", "siteprofile->>'attendance_distance'", None),
("prod_env", "siteprofile->>'prod_env'", None),
("light_status", "siteprofile->>'light_status'", None),
("light_type", "siteprofile->>'light_type'", None),
("light_token", "siteprofile->>'light_token'", None),
("address", "siteprofile->>'address'", None),
("avatar", "siteprofile->>'avatar'", None),
("wifi_name", "siteprofile->>'wifi_name'", None),
("wifi_password", "siteprofile->>'wifi_password'", None),
("customer_service_qrcode", "siteprofile->>'customer_service_qrcode'", None),
("customer_service_wechat", "siteprofile->>'customer_service_wechat'", None),
("fixed_pay_qrcode", "siteprofile->>'fixed_pay_qrCode'", None),
("longitude", "siteprofile->>'longitude'", "numeric"),
("latitude", "siteprofile->>'latitude'", "numeric"),
("tenant_site_region_id", "siteprofile->>'tenant_site_region_id'", None),
("site_type", "siteprofile->>'site_type'", None),
("site_label", "siteprofile->>'site_label'", None),
("shop_status", "siteprofile->>'shop_status'", None),
("create_time", "siteprofile->>'create_time'", "timestamptz"),
("update_time", "siteprofile->>'update_time'", "timestamptz"),
],
"billiards_dwd.dim_table": [
("table_id", "id", None),
("site_table_area_name", "areaname", None),
("tenant_table_area_id", "site_table_area_id", None),
],
"billiards_dwd.dim_table_ex": [
("table_id", "id", None),
("table_cloth_use_time", "table_cloth_use_time", None),
],
"billiards_dwd.dim_assistant": [("assistant_id", "id", None), ("user_id", "staff_id", None)],
"billiards_dwd.dim_assistant_ex": [
("assistant_id", "id", None),
("introduce", "introduce", None),
("group_name", "group_name", None),
("light_equipment_id", "light_equipment_id", None),
],
"billiards_dwd.dim_member": [("member_id", "id", None)],
"billiards_dwd.dim_member_ex": [
("member_id", "id", None),
("register_site_name", "site_name", None),
],
"billiards_dwd.dim_member_card_account": [("member_card_id", "id", None)],
"billiards_dwd.dim_member_card_account_ex": [
("member_card_id", "id", None),
("tenant_name", "tenantname", None),
("tenantavatar", "tenantavatar", None),
("card_no", "card_no", None),
("bind_password", "bind_password", None),
("use_scene", "use_scene", None),
("tableareaid", "tableareaid", None),
("goodscategoryid", "goodscategoryid", None),
],
"billiards_dwd.dim_tenant_goods": [
("tenant_goods_id", "id", None),
("category_name", "categoryname", None),
],
"billiards_dwd.dim_tenant_goods_ex": [
("tenant_goods_id", "id", None),
("remark_name", "remark_name", None),
("goods_bar_code", "goods_bar_code", None),
("commodity_code_list", "commodity_code", None),
("is_in_site", "isinsite", "boolean"),
],
"billiards_dwd.dim_store_goods": [
("site_goods_id", "id", None),
("category_level1_name", "onecategoryname", None),
("category_level2_name", "twocategoryname", None),
("created_at", "create_time", None),
("updated_at", "update_time", None),
("avg_monthly_sales", "average_monthly_sales", None),
("batch_stock_qty", "stock", None),
("sale_qty", "sale_num", None),
("total_sales_qty", "total_sales", None),
],
"billiards_dwd.dim_store_goods_ex": [
("site_goods_id", "id", None),
("goods_barcode", "goods_bar_code", None),
("stock_qty", "stock", None),
("stock_secondary_qty", "stock_a", None),
("safety_stock_qty", "safe_stock", None),
("site_name", "sitename", None),
("goods_cover_url", "goods_cover", None),
("provisional_total_cost", "total_purchase_cost", None),
("is_discountable", "able_discount", None),
("freeze_status", "freeze", None),
("remark", "remark", None),
("days_on_shelf", "days_available", None),
("sort_order", "sort", None),
],
"billiards_dwd.dim_goods_category": [
("category_id", "id", None),
("tenant_id", "tenant_id", None),
("category_name", "category_name", None),
("alias_name", "alias_name", None),
("parent_category_id", "pid", None),
("business_name", "business_name", None),
("tenant_goods_business_id", "tenant_goods_business_id", None),
("sort_order", "sort", None),
("open_salesman", "open_salesman", None),
("is_warehousing", "is_warehousing", None),
("category_level", "CASE WHEN pid = 0 THEN 1 ELSE 2 END", None),
("is_leaf", "CASE WHEN categoryboxes IS NULL OR jsonb_array_length(categoryboxes)=0 THEN 1 ELSE 0 END", None),
],
"billiards_dwd.dim_groupbuy_package": [
("groupbuy_package_id", "id", None),
("package_template_id", "package_id", None),
("coupon_face_value", "coupon_money", None),
("duration_seconds", "duration", None),
],
"billiards_dwd.dim_groupbuy_package_ex": [
("groupbuy_package_id", "id", None),
("table_area_id", "table_area_id", None),
("tenant_table_area_id", "tenant_table_area_id", None),
("usable_range", "usable_range", None),
("table_area_id_list", "table_area_id_list", None),
("package_type", "type", None),
],
# 事实表主键及关键差异列
"billiards_dwd.dwd_table_fee_log": [("table_fee_log_id", "id", None)],
"billiards_dwd.dwd_table_fee_log_ex": [
("table_fee_log_id", "id", None),
("salesman_name", "salesman_name", None),
],
"billiards_dwd.dwd_table_fee_adjust": [
("table_fee_adjust_id", "id", None),
("table_id", "site_table_id", None),
("table_area_id", "tenant_table_area_id", None),
("table_area_name", "tableprofile->>'table_area_name'", None),
("adjust_time", "create_time", None),
],
"billiards_dwd.dwd_table_fee_adjust_ex": [
("table_fee_adjust_id", "id", None),
("ledger_name", "ledger_name", None),
],
"billiards_dwd.dwd_store_goods_sale": [("store_goods_sale_id", "id", None), ("discount_price", "discount_money", None)],
"billiards_dwd.dwd_store_goods_sale_ex": [
("store_goods_sale_id", "id", None),
("option_value_name", "option_value_name", None),
("open_salesman_flag", "opensalesman", "integer"),
("salesman_name", "salesman_name", None),
("salesman_org_id", "sales_man_org_id", None),
("legacy_order_goods_id", "ordergoodsid", None),
("site_name", "sitename", None),
("legacy_site_id", "siteid", None),
],
"billiards_dwd.dwd_assistant_service_log": [
("assistant_service_id", "id", None),
("assistant_no", "assistantno", None),
("site_assistant_id", "order_assistant_id", None),
("level_name", "levelname", None),
("skill_name", "skillname", None),
],
"billiards_dwd.dwd_assistant_service_log_ex": [
("assistant_service_id", "id", None),
("assistant_name", "assistantname", None),
("ledger_group_name", "ledger_group_name", None),
("trash_applicant_name", "trash_applicant_name", None),
("trash_reason", "trash_reason", None),
("salesman_name", "salesman_name", None),
("table_name", "tablename", None),
],
"billiards_dwd.dwd_assistant_trash_event": [
("assistant_trash_event_id", "id", None),
("assistant_no", "assistantname", None),
("abolish_amount", "assistantabolishamount", None),
("charge_minutes_raw", "pdchargeminutes", None),
("site_id", "siteid", None),
("table_id", "tableid", None),
("table_area_id", "tableareaid", None),
("assistant_name", "assistantname", None),
("trash_reason", "trashreason", None),
("create_time", "createtime", None),
],
"billiards_dwd.dwd_assistant_trash_event_ex": [
("assistant_trash_event_id", "id", None),
("table_area_name", "tablearea", None),
("table_name", "tablename", None),
],
"billiards_dwd.dwd_member_balance_change": [
("balance_change_id", "id", None),
("balance_before", "before", None),
("change_amount", "account_data", None),
("balance_after", "after", None),
("card_type_name", "membercardtypename", None),
("change_time", "create_time", None),
("member_name", "membername", None),
("member_mobile", "membermobile", None),
],
"billiards_dwd.dwd_member_balance_change_ex": [
("balance_change_id", "id", None),
("pay_site_name", "paysitename", None),
("register_site_name", "registersitename", None),
],
"billiards_dwd.dwd_groupbuy_redemption": [("redemption_id", "id", None)],
"billiards_dwd.dwd_groupbuy_redemption_ex": [
("redemption_id", "id", None),
("table_area_name", "tableareaname", None),
("site_name", "sitename", None),
("table_name", "tablename", None),
("goods_option_price", "goodsoptionprice", None),
("salesman_name", "salesman_name", None),
("salesman_org_id", "sales_man_org_id", None),
("ledger_group_name", "ledger_group_name", None),
],
"billiards_dwd.dwd_platform_coupon_redemption": [("platform_coupon_redemption_id", "id", None)],
"billiards_dwd.dwd_platform_coupon_redemption_ex": [
("platform_coupon_redemption_id", "id", None),
("coupon_cover", "coupon_cover", None),
],
"billiards_dwd.dwd_payment": [("payment_id", "id", None), ("pay_date", "pay_time", "date")],
"billiards_dwd.dwd_refund": [("refund_id", "id", None)],
"billiards_dwd.dwd_refund_ex": [
("refund_id", "id", None),
("tenant_name", "tenantname", None),
("channel_payer_id", "channel_payer_id", None),
("channel_pay_no", "channel_pay_no", None),
],
# 结算头settlement_records源列为小写驼峰/无下划线,需要显式映射)
"billiards_dwd.dwd_settlement_head": [
("order_settle_id", "id", None),
("tenant_id", "tenantid", None),
("site_id", "siteid", None),
("site_name", "sitename", None),
("table_id", "tableid", None),
("settle_name", "settlename", None),
("order_trade_no", "settlerelateid", None),
("create_time", "createtime", None),
("pay_time", "paytime", None),
("settle_type", "settletype", None),
("revoke_order_id", "revokeorderid", None),
("member_id", "memberid", None),
("member_name", "membername", None),
("member_phone", "memberphone", None),
("member_card_account_id", "tenantmembercardid", None),
("member_card_type_name", "membercardtypename", None),
("is_bind_member", "isbindmember", None),
("member_discount_amount", "memberdiscountamount", None),
("consume_money", "consumemoney", None),
("table_charge_money", "tablechargemoney", None),
("goods_money", "goodsmoney", None),
("real_goods_money", "realgoodsmoney", None),
("assistant_pd_money", "assistantpdmoney", None),
("assistant_cx_money", "assistantcxmoney", None),
("adjust_amount", "adjustamount", None),
("pay_amount", "payamount", None),
("balance_amount", "balanceamount", None),
("recharge_card_amount", "rechargecardamount", None),
("gift_card_amount", "giftcardamount", None),
("coupon_amount", "couponamount", None),
("rounding_amount", "roundingamount", None),
("point_amount", "pointamount", None),
],
"billiards_dwd.dwd_settlement_head_ex": [
("order_settle_id", "id", None),
("serial_number", "serialnumber", None),
("settle_status", "settlestatus", None),
("can_be_revoked", "canberevoked", "boolean"),
("revoke_order_name", "revokeordername", None),
("revoke_time", "revoketime", None),
("is_first_order", "isfirst", "boolean"),
("service_money", "servicemoney", None),
("cash_amount", "cashamount", None),
("card_amount", "cardamount", None),
("online_amount", "onlineamount", None),
("refund_amount", "refundamount", None),
("prepay_money", "prepaymoney", None),
("payment_method", "paymentmethod", None),
("coupon_sale_amount", "couponsaleamount", None),
("all_coupon_discount", "allcoupondiscount", None),
("goods_promotion_money", "goodspromotionmoney", None),
("assistant_promotion_money", "assistantpromotionmoney", None),
("activity_discount", "activitydiscount", None),
("assistant_manual_discount", "assistantmanualdiscount", None),
("point_discount_price", "pointdiscountprice", None),
("point_discount_cost", "pointdiscountcost", None),
("is_use_coupon", "isusecoupon", "boolean"),
("is_use_discount", "isusediscount", "boolean"),
("is_activity", "isactivity", "boolean"),
("operator_name", "operatorname", None),
("salesman_name", "salesmanname", None),
("order_remark", "orderremark", None),
("operator_id", "operatorid", None),
("salesman_user_id", "salesmanuserid", None),
],
# 充值结算recharge_settlements字段风格同 settlement_records
"billiards_dwd.dwd_recharge_order": [
("recharge_order_id", "id", None),
("tenant_id", "tenantid", None),
("site_id", "siteid", None),
("member_id", "memberid", None),
("member_name_snapshot", "membername", None),
("member_phone_snapshot", "memberphone", None),
("tenant_member_card_id", "tenantmembercardid", None),
("member_card_type_name", "membercardtypename", None),
("settle_relate_id", "settlerelateid", None),
("settle_type", "settletype", None),
("settle_name", "settlename", None),
("is_first", "isfirst", None),
("pay_amount", "payamount", None),
("refund_amount", "refundamount", None),
("point_amount", "pointamount", None),
("cash_amount", "cashamount", None),
("payment_method", "paymentmethod", None),
("create_time", "createtime", None),
("pay_time", "paytime", None),
],
"billiards_dwd.dwd_recharge_order_ex": [
("recharge_order_id", "id", None),
("site_name_snapshot", "sitename", None),
("salesman_name", "salesmanname", None),
("order_remark", "orderremark", None),
("revoke_order_name", "revokeordername", None),
("settle_status", "settlestatus", None),
("is_bind_member", "isbindmember", "boolean"),
("is_activity", "isactivity", "boolean"),
("is_use_coupon", "isusecoupon", "boolean"),
("is_use_discount", "isusediscount", "boolean"),
("can_be_revoked", "canberevoked", "boolean"),
("online_amount", "onlineamount", None),
("balance_amount", "balanceamount", None),
("card_amount", "cardamount", None),
("coupon_amount", "couponamount", None),
("recharge_card_amount", "rechargecardamount", None),
("gift_card_amount", "giftcardamount", None),
("prepay_money", "prepaymoney", None),
("consume_money", "consumemoney", None),
("goods_money", "goodsmoney", None),
("real_goods_money", "realgoodsmoney", None),
("table_charge_money", "tablechargemoney", None),
("service_money", "servicemoney", None),
("activity_discount", "activitydiscount", None),
("all_coupon_discount", "allcoupondiscount", None),
("goods_promotion_money", "goodspromotionmoney", None),
("assistant_promotion_money", "assistantpromotionmoney", None),
("assistant_pd_money", "assistantpdmoney", None),
("assistant_cx_money", "assistantcxmoney", None),
("assistant_manual_discount", "assistantmanualdiscount", None),
("coupon_sale_amount", "couponsaleamount", None),
("member_discount_amount", "memberdiscountamount", None),
("point_discount_price", "pointdiscountprice", None),
("point_discount_cost", "pointdiscountcost", None),
("adjust_amount", "adjustamount", None),
("rounding_amount", "roundingamount", None),
("operator_id", "operatorid", None),
("operator_name_snapshot", "operatorname", None),
("salesman_user_id", "salesmanuserid", None),
("salesman_name", "salesmanname", None),
("order_remark", "orderremark", None),
("table_id", "tableid", None),
("serial_number", "serialnumber", None),
("revoke_order_id", "revokeorderid", None),
("revoke_order_name", "revokeordername", None),
("revoke_time", "revoketime", None),
],
}
def get_task_code(self) -> str:
"""返回任务编码。"""
return "DWD_LOAD_FROM_ODS"
def extract(self, context: TaskContext) -> dict[str, Any]:
"""准备运行所需的上下文信息。"""
return {"now": datetime.now()}
def load(self, extracted: dict[str, Any], context: TaskContext) -> dict[str, Any]:
"""遍历映射关系,维度执行 SCD2 合并,事实表按时间增量插入。"""
now = extracted["now"]
summary: List[Dict[str, Any]] = []
with self.db.conn.cursor(cursor_factory=RealDictCursor) as cur:
for dwd_table, ods_table in self.TABLE_MAP.items():
dwd_cols = self._get_columns(cur, dwd_table)
ods_cols = self._get_columns(cur, ods_table)
if not dwd_cols:
self.logger.warning("跳过 %s,未能获取 DWD 列信息", dwd_table)
continue
if self._table_base(dwd_table).startswith("dim_"):
processed = self._merge_dim_scd2(cur, dwd_table, ods_table, dwd_cols, ods_cols, now)
summary.append({"table": dwd_table, "mode": "SCD2", "processed": processed})
else:
dwd_types = self._get_column_types(cur, dwd_table, "billiards_dwd")
ods_types = self._get_column_types(cur, ods_table, "billiards_ods")
inserted = self._merge_fact_increment(
cur, dwd_table, ods_table, dwd_cols, ods_cols, dwd_types, ods_types
)
summary.append({"table": dwd_table, "mode": "INCREMENT", "inserted": inserted})
self.db.conn.commit()
return {"tables": summary}
# ---------------------- helpers ----------------------
def _get_columns(self, cur, table: str) -> List[str]:
"""获取指定表的列名(小写)。"""
schema, name = self._split_table_name(table, default_schema="billiards_dwd")
cur.execute(
"""
SELECT column_name
FROM information_schema.columns
WHERE table_schema = %s AND table_name = %s
""",
(schema, name),
)
return [r["column_name"].lower() for r in cur.fetchall()]
def _get_primary_keys(self, cur, table: str) -> List[str]:
"""获取表的主键列名列表。"""
schema, name = self._split_table_name(table, default_schema="billiards_dwd")
cur.execute(
"""
SELECT kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
AND tc.table_schema = kcu.table_schema
AND tc.table_name = kcu.table_name
WHERE tc.table_schema = %s
AND tc.table_name = %s
AND tc.constraint_type = 'PRIMARY KEY'
ORDER BY kcu.ordinal_position
""",
(schema, name),
)
return [r["column_name"].lower() for r in cur.fetchall()]
def _get_column_types(self, cur, table: str, default_schema: str) -> Dict[str, str]:
"""获取列的数据类型information_schema.data_type"""
schema, name = self._split_table_name(table, default_schema=default_schema)
cur.execute(
"""
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = %s AND table_name = %s
""",
(schema, name),
)
return {r["column_name"].lower(): r["data_type"].lower() for r in cur.fetchall()}
def _build_column_mapping(
self, dwd_table: str, pk_cols: Sequence[str], ods_cols: Sequence[str]
) -> Dict[str, tuple[str, str | None]]:
"""合并显式 FACT_MAPPINGS 与主键兜底映射。"""
mapping_entries = self.FACT_MAPPINGS.get(dwd_table, [])
mapping: Dict[str, tuple[str, str | None]] = {
dst.lower(): (src, cast_type) for dst, src, cast_type in mapping_entries
}
ods_set = {c.lower() for c in ods_cols}
for pk in pk_cols:
pk_lower = pk.lower()
if pk_lower not in mapping and pk_lower not in ods_set and "id" in ods_set:
mapping[pk_lower] = ("id", None)
return mapping
def _fetch_source_rows(
self, cur, table: str, columns: Sequence[str], where_sql: str = "", params: Sequence[Any] = None
) -> List[Dict[str, Any]]:
"""从源表读取指定列,返回小写键的字典列表。"""
schema, name = self._split_table_name(table, default_schema="billiards_ods")
cols_sql = ", ".join(f'"{c}"' for c in columns)
sql = f'SELECT {cols_sql} FROM "{schema}"."{name}" {where_sql}'
cur.execute(sql, params or [])
rows = []
for r in cur.fetchall():
rows.append({k.lower(): v for k, v in r.items()})
return rows
def _expand_goods_category_rows(self, rows: list[Dict[str, Any]]) -> list[Dict[str, Any]]:
"""将分类表中的 categoryboxes 元素展开为子类记录。"""
expanded: list[Dict[str, Any]] = []
for r in rows:
expanded.append(r)
boxes = r.get("categoryboxes")
if isinstance(boxes, list):
for child in boxes:
if not isinstance(child, dict):
continue
child_row: Dict[str, Any] = {}
# 继承父级的租户与业务大类信息
child_row["tenant_id"] = r.get("tenant_id")
child_row["business_name"] = child.get("business_name", r.get("business_name"))
child_row["tenant_goods_business_id"] = child.get(
"tenant_goods_business_id", r.get("tenant_goods_business_id")
)
# 合并子类字段
child_row.update(child)
# 默认父子关系
child_row.setdefault("pid", r.get("id"))
# 衍生层级/叶子标记
child_boxes = child_row.get("categoryboxes")
if not isinstance(child_boxes, list):
is_leaf = 1
else:
is_leaf = 1 if len(child_boxes) == 0 else 0
child_row.setdefault("category_level", 2)
child_row.setdefault("is_leaf", is_leaf)
expanded.append(child_row)
return expanded
def _merge_dim_scd2(
self,
cur,
dwd_table: str,
ods_table: str,
dwd_cols: Sequence[str],
ods_cols: Sequence[str],
now: datetime,
) -> int:
"""对维表执行 SCD2 合并:对比变更关闭旧版并插入新版。"""
pk_cols = self._get_primary_keys(cur, dwd_table)
if not pk_cols:
raise ValueError(f"{dwd_table} 未配置主键,无法执行 SCD2 合并")
mapping = self._build_column_mapping(dwd_table, pk_cols, ods_cols)
ods_set = {c.lower() for c in ods_cols}
table_sql = self._format_table(ods_table, "billiards_ods")
# 构造 SELECT 表达式,支持 JSON/expression 映射
select_exprs: list[str] = []
added: set[str] = set()
for col in dwd_cols:
lc = col.lower()
if lc in self.SCD_COLS:
continue
if lc in mapping:
src, cast_type = mapping[lc]
select_exprs.append(f"{self._cast_expr(src, cast_type)} AS \"{lc}\"")
added.add(lc)
elif lc in ods_set:
select_exprs.append(f'"{lc}" AS "{lc}"')
added.add(lc)
# 分类维度需要额外读取 categoryboxes 以展开子类
if dwd_table == "billiards_dwd.dim_goods_category" and "categoryboxes" not in added and "categoryboxes" in ods_set:
select_exprs.append('"categoryboxes" AS "categoryboxes"')
added.add("categoryboxes")
# 主键兜底确保被选出
for pk in pk_cols:
lc = pk.lower()
if lc not in added:
if lc in mapping:
src, cast_type = mapping[lc]
select_exprs.append(f"{self._cast_expr(src, cast_type)} AS \"{lc}\"")
elif lc in ods_set:
select_exprs.append(f'"{lc}" AS "{lc}"')
added.add(lc)
if not select_exprs:
return 0
sql = f"SELECT {', '.join(select_exprs)} FROM {table_sql}"
cur.execute(sql)
rows = [{k.lower(): v for k, v in r.items()} for r in cur.fetchall()]
# 特殊:分类维度展开子类
if dwd_table == "billiards_dwd.dim_goods_category":
rows = self._expand_goods_category_rows(rows)
inserted_or_updated = 0
seen_pk = set()
for row in rows:
mapped_row: Dict[str, Any] = {}
for col in dwd_cols:
lc = col.lower()
if lc in self.SCD_COLS:
continue
value = row.get(lc)
if value is None and lc in mapping:
src, _ = mapping[lc]
value = row.get(src.lower())
mapped_row[lc] = value
pk_key = tuple(mapped_row.get(pk) for pk in pk_cols)
if pk_key in seen_pk:
continue
seen_pk.add(pk_key)
if self._upsert_scd2_row(cur, dwd_table, dwd_cols, pk_cols, mapped_row, now):
inserted_or_updated += 1
return len(rows)
def _upsert_scd2_row(
self,
cur,
dwd_table: str,
dwd_cols: Sequence[str],
pk_cols: Sequence[str],
src_row: Dict[str, Any],
now: datetime,
) -> bool:
"""SCD2 合并:若有变更则关闭旧版并插入新版本。"""
pk_values = [src_row.get(pk) for pk in pk_cols]
if any(v is None for v in pk_values):
self.logger.warning("跳过 %s:主键缺失 %s", dwd_table, dict(zip(pk_cols, pk_values)))
return False
where_clause = " AND ".join(f'"{pk}" = %s' for pk in pk_cols)
table_sql = self._format_table(dwd_table, "billiards_dwd")
cur.execute(
f"SELECT * FROM {table_sql} WHERE {where_clause} AND COALESCE(scd2_is_current,1)=1 LIMIT 1",
pk_values,
)
current = cur.fetchone()
if current:
current = {k.lower(): v for k, v in current.items()}
if current and not self._is_row_changed(current, src_row, dwd_cols):
return False
if current:
version = (current.get("scd2_version") or 1) + 1
self._close_current_dim(cur, dwd_table, pk_cols, pk_values, now)
else:
version = 1
self._insert_dim_row(cur, dwd_table, dwd_cols, src_row, now, version)
return True
def _close_current_dim(self, cur, table: str, pk_cols: Sequence[str], pk_values: Sequence[Any], now: datetime) -> None:
"""关闭当前版本,标记 scd2_is_current=0 并填充结束时间。"""
set_sql = "scd2_end_time = %s, scd2_is_current = 0"
where_clause = " AND ".join(f'"{pk}" = %s' for pk in pk_cols)
table_sql = self._format_table(table, "billiards_dwd")
cur.execute(f"UPDATE {table_sql} SET {set_sql} WHERE {where_clause} AND COALESCE(scd2_is_current,1)=1", [now, *pk_values])
def _insert_dim_row(
self,
cur,
table: str,
dwd_cols: Sequence[str],
src_row: Dict[str, Any],
now: datetime,
version: int,
) -> None:
"""插入新的 SCD2 版本行。"""
insert_cols: List[str] = []
placeholders: List[str] = []
values: List[Any] = []
for col in sorted(dwd_cols):
lc = col.lower()
insert_cols.append(f'"{lc}"')
placeholders.append("%s")
if lc == "scd2_start_time":
values.append(now)
elif lc == "scd2_end_time":
values.append(datetime(9999, 12, 31, 0, 0, 0))
elif lc == "scd2_is_current":
values.append(1)
elif lc == "scd2_version":
values.append(version)
else:
values.append(src_row.get(lc))
table_sql = self._format_table(table, "billiards_dwd")
sql = f'INSERT INTO {table_sql} ({", ".join(insert_cols)}) VALUES ({", ".join(placeholders)})'
cur.execute(sql, values)
def _is_row_changed(self, current: Dict[str, Any], incoming: Dict[str, Any], dwd_cols: Sequence[str]) -> bool:
"""比较非 SCD2 列,判断是否存在变更。"""
for col in dwd_cols:
lc = col.lower()
if lc in self.SCD_COLS:
continue
if current.get(lc) != incoming.get(lc):
return True
return False
def _merge_fact_increment(
self,
cur,
dwd_table: str,
ods_table: str,
dwd_cols: Sequence[str],
ods_cols: Sequence[str],
dwd_types: Dict[str, str],
ods_types: Dict[str, str],
) -> int:
"""事实表按时间增量插入,默认按列名交集写入。"""
mapping_entries = self.FACT_MAPPINGS.get(dwd_table) or []
mapping: Dict[str, tuple[str, str | None]] = {
dst.lower(): (src, cast_type) for dst, src, cast_type in mapping_entries
}
mapping_dest = [dst for dst, _, _ in mapping_entries]
insert_cols: List[str] = list(mapping_dest)
for col in dwd_cols:
if col in self.SCD_COLS:
continue
if col in insert_cols:
continue
if col in ods_cols:
insert_cols.append(col)
pk_cols = self._get_primary_keys(cur, dwd_table)
ods_set = {c.lower() for c in ods_cols}
existing_lower = [c.lower() for c in insert_cols]
for pk in pk_cols:
pk_lower = pk.lower()
if pk_lower in existing_lower:
continue
if pk_lower in ods_set:
insert_cols.append(pk)
existing_lower.append(pk_lower)
elif "id" in ods_set:
insert_cols.append(pk)
existing_lower.append(pk_lower)
mapping[pk_lower] = ("id", None)
# 保持列顺序同时去重
seen_cols: set[str] = set()
ordered_cols: list[str] = []
for col in insert_cols:
lc = col.lower()
if lc not in seen_cols:
seen_cols.add(lc)
ordered_cols.append(col)
insert_cols = ordered_cols
if not insert_cols:
self.logger.warning("跳过 %s:未找到可插入的列", dwd_table)
return 0
order_col = self._pick_order_column(dwd_cols, ods_cols)
where_sql = ""
params: List[Any] = []
dwd_table_sql = self._format_table(dwd_table, "billiards_dwd")
ods_table_sql = self._format_table(ods_table, "billiards_ods")
if order_col:
cur.execute(f'SELECT COALESCE(MAX("{order_col}"), %s) FROM {dwd_table_sql}', ("1970-01-01",))
row = cur.fetchone() or {}
watermark = list(row.values())[0] if row else "1970-01-01"
where_sql = f'WHERE "{order_col}" > %s'
params.append(watermark)
default_cols = [c for c in insert_cols if c.lower() not in mapping]
default_expr_map: Dict[str, str] = {}
if default_cols:
default_exprs = self._build_fact_select_exprs(default_cols, dwd_types, ods_types)
default_expr_map = dict(zip(default_cols, default_exprs))
select_exprs: List[str] = []
for col in insert_cols:
key = col.lower()
if key in mapping:
src, cast_type = mapping[key]
select_exprs.append(self._cast_expr(src, cast_type))
else:
select_exprs.append(default_expr_map[col])
select_cols_sql = ", ".join(select_exprs)
insert_cols_sql = ", ".join(f'"{c}"' for c in insert_cols)
sql = f'INSERT INTO {dwd_table_sql} ({insert_cols_sql}) SELECT {select_cols_sql} FROM {ods_table_sql} {where_sql}'
pk_cols = self._get_primary_keys(cur, dwd_table)
if pk_cols:
pk_sql = ", ".join(f'"{c}"' for c in pk_cols)
sql += f" ON CONFLICT ({pk_sql}) DO NOTHING"
cur.execute(sql, params)
return cur.rowcount
def _pick_order_column(self, dwd_cols: Iterable[str], ods_cols: Iterable[str]) -> str | None:
"""选择用于增量的时间列(需同时存在于 DWD 与 ODS"""
lower_cols = {c.lower() for c in dwd_cols} & {c.lower() for c in ods_cols}
for candidate in self.FACT_ORDER_CANDIDATES:
if candidate.lower() in lower_cols:
return candidate.lower()
return None
def _build_fact_select_exprs(
self,
insert_cols: Sequence[str],
dwd_types: Dict[str, str],
ods_types: Dict[str, str],
) -> List[str]:
"""构造事实表 SELECT 列表,需要时做类型转换。"""
numeric_types = {"integer", "bigint", "smallint", "numeric", "double precision", "real", "decimal"}
text_types = {"text", "character varying", "varchar"}
exprs = []
for col in insert_cols:
d_type = dwd_types.get(col)
o_type = ods_types.get(col)
if d_type in numeric_types and o_type in text_types:
exprs.append(f"CAST(NULLIF(CAST(\"{col}\" AS text), '') AS numeric):: {d_type}")
else:
exprs.append(f'"{col}"')
return exprs
def _split_table_name(self, name: str, default_schema: str) -> tuple[str, str]:
"""拆分 schema.table若无 schema 则补默认 schema。"""
parts = name.split(".")
if len(parts) == 2:
return parts[0], parts[1].lower()
return default_schema, name.lower()
def _table_base(self, name: str) -> str:
"""获取不含 schema 的表名。"""
return name.split(".")[-1]
def _format_table(self, name: str, default_schema: str) -> str:
"""返回带引号的 schema.table 名称。"""
schema, table = self._split_table_name(name, default_schema)
return f'"{schema}"."{table}"'
def _cast_expr(self, col: str, cast_type: str | None) -> str:
"""构造带可选 CAST 的列表达式。"""
if col.upper() == "NULL":
base = "NULL"
else:
is_expr = not col.isidentifier() or "->" in col or "#>>" in col or "::" in col or "'" in col
base = col if is_expr else f'"{col}"'
if cast_type:
cast_lower = cast_type.lower()
if cast_lower in {"bigint", "integer", "numeric", "decimal"}:
return f"CAST(NULLIF(CAST({base} AS text), '') AS numeric):: {cast_type}"
if cast_lower == "timestamptz":
return f"({base})::timestamptz"
return f"{base}::{cast_type}"
return base

View File

@@ -0,0 +1,105 @@
# -*- coding: utf-8 -*-
"""DWD 质量核对任务:按 dwd_quality_check.md 输出行数/金额对照报表。"""
from __future__ import annotations
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Iterable, List, Sequence, Tuple
from psycopg2.extras import RealDictCursor
from .base_task import BaseTask, TaskContext
from .dwd_load_task import DwdLoadTask
class DwdQualityTask(BaseTask):
"""对 ODS 与 DWD 进行行数、金额对照核查,生成 JSON 报表。"""
REPORT_PATH = Path("etl_billiards/reports/dwd_quality_report.json")
AMOUNT_KEYWORDS = ("amount", "money", "fee", "balance")
def get_task_code(self) -> str:
"""返回任务编码。"""
return "DWD_QUALITY_CHECK"
def extract(self, context: TaskContext) -> dict[str, Any]:
"""准备运行时上下文。"""
return {"now": datetime.now()}
def load(self, extracted: dict[str, Any], context: TaskContext) -> dict[str, Any]:
"""输出行数/金额差异报表到本地文件。"""
report: Dict[str, Any] = {
"generated_at": extracted["now"].isoformat(),
"tables": [],
"note": "行数/金额核对,金额字段基于列名包含 amount/money/fee/balance 的数值列自动扫描。",
}
with self.db.conn.cursor(cursor_factory=RealDictCursor) as cur:
for dwd_table, ods_table in DwdLoadTask.TABLE_MAP.items():
count_info = self._compare_counts(cur, dwd_table, ods_table)
amount_info = self._compare_amounts(cur, dwd_table, ods_table)
report["tables"].append(
{
"dwd_table": dwd_table,
"ods_table": ods_table,
"count": count_info,
"amounts": amount_info,
}
)
self.REPORT_PATH.parent.mkdir(parents=True, exist_ok=True)
self.REPORT_PATH.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8")
self.logger.info("DWD 质检报表已生成:%s", self.REPORT_PATH)
return {"report_path": str(self.REPORT_PATH)}
# ---------------------- helpers ----------------------
def _compare_counts(self, cur, dwd_table: str, ods_table: str) -> Dict[str, Any]:
"""统计两端行数并返回差异。"""
dwd_schema, dwd_name = self._split_table_name(dwd_table, default_schema="billiards_dwd")
ods_schema, ods_name = self._split_table_name(ods_table, default_schema="billiards_ods")
cur.execute(f'SELECT COUNT(1) AS cnt FROM "{dwd_schema}"."{dwd_name}"')
dwd_cnt = cur.fetchone()["cnt"]
cur.execute(f'SELECT COUNT(1) AS cnt FROM "{ods_schema}"."{ods_name}"')
ods_cnt = cur.fetchone()["cnt"]
return {"dwd": dwd_cnt, "ods": ods_cnt, "diff": dwd_cnt - ods_cnt}
def _compare_amounts(self, cur, dwd_table: str, ods_table: str) -> List[Dict[str, Any]]:
"""扫描金额相关列,生成 ODS 与 DWD 的汇总对照。"""
dwd_schema, dwd_name = self._split_table_name(dwd_table, default_schema="billiards_dwd")
ods_schema, ods_name = self._split_table_name(ods_table, default_schema="billiards_ods")
dwd_amount_cols = self._get_numeric_amount_columns(cur, dwd_schema, dwd_name)
ods_amount_cols = self._get_numeric_amount_columns(cur, ods_schema, ods_name)
common_amount_cols = sorted(set(dwd_amount_cols) & set(ods_amount_cols))
results: List[Dict[str, Any]] = []
for col in common_amount_cols:
cur.execute(f'SELECT COALESCE(SUM("{col}"),0) AS val FROM "{dwd_schema}"."{dwd_name}"')
dwd_sum = cur.fetchone()["val"]
cur.execute(f'SELECT COALESCE(SUM("{col}"),0) AS val FROM "{ods_schema}"."{ods_name}"')
ods_sum = cur.fetchone()["val"]
results.append({"column": col, "dwd_sum": float(dwd_sum or 0), "ods_sum": float(ods_sum or 0), "diff": float(dwd_sum or 0) - float(ods_sum or 0)})
return results
def _get_numeric_amount_columns(self, cur, schema: str, table: str) -> List[str]:
"""获取列名包含金额关键词的数值型字段。"""
cur.execute(
"""
SELECT column_name
FROM information_schema.columns
WHERE table_schema = %s
AND table_name = %s
AND data_type IN ('numeric','double precision','integer','bigint','smallint','real','decimal')
""",
(schema, table),
)
cols = [r["column_name"].lower() for r in cur.fetchall()]
return [c for c in cols if any(key in c for key in self.AMOUNT_KEYWORDS)]
def _split_table_name(self, name: str, default_schema: str) -> Tuple[str, str]:
"""拆分 schema 与表名,缺省使用 default_schema。"""
parts = name.split(".")
if len(parts) == 2:
return parts[0], parts[1]
return default_schema, name

View File

@@ -0,0 +1,36 @@
# -*- coding: utf-8 -*-
"""初始化 DWD Schema执行 schema_dwd_doc.sql可选先 DROP SCHEMA。"""
from __future__ import annotations
from pathlib import Path
from typing import Any
from .base_task import BaseTask, TaskContext
class InitDwdSchemaTask(BaseTask):
"""通过调度执行 DWD schema 初始化。"""
def get_task_code(self) -> str:
"""返回任务编码。"""
return "INIT_DWD_SCHEMA"
def extract(self, context: TaskContext) -> dict[str, Any]:
"""读取 DWD SQL 文件与参数。"""
base_dir = Path(__file__).resolve().parents[1] / "database"
dwd_path = Path(self.config.get("schema.dwd_file", base_dir / "schema_dwd_doc.sql"))
if not dwd_path.exists():
raise FileNotFoundError(f"未找到 DWD schema 文件: {dwd_path}")
drop_first = self.config.get("dwd.drop_schema_first", False)
return {"dwd_sql": dwd_path.read_text(encoding="utf-8"), "dwd_file": str(dwd_path), "drop_first": drop_first}
def load(self, extracted: dict[str, Any], context: TaskContext) -> dict:
"""可选 DROP schema再执行 DWD DDL。"""
with self.db.conn.cursor() as cur:
if extracted["drop_first"]:
cur.execute("DROP SCHEMA IF EXISTS billiards_dwd CASCADE;")
self.logger.info("已执行 DROP SCHEMA billiards_dwd CASCADE")
self.logger.info("执行 DWD schema 文件: %s", extracted["dwd_file"])
cur.execute(extracted["dwd_sql"])
return {"executed": 1, "files": [extracted["dwd_file"]]}

View File

@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
"""任务:初始化运行环境,执行 ODS 与 etl_admin 的 DDL并准备日志/导出目录。"""
from __future__ import annotations
from pathlib import Path
from typing import Any
from .base_task import BaseTask, TaskContext
class InitOdsSchemaTask(BaseTask):
"""通过调度执行初始化:创建必要目录,执行 ODS 与 etl_admin 的 DDL。"""
def get_task_code(self) -> str:
"""返回任务编码。"""
return "INIT_ODS_SCHEMA"
def extract(self, context: TaskContext) -> dict[str, Any]:
"""读取 SQL 文件路径,收集需创建的目录。"""
base_dir = Path(__file__).resolve().parents[1] / "database"
ods_path = Path(self.config.get("schema.ods_file", base_dir / "schema_ODS_doc.sql"))
admin_path = Path(self.config.get("schema.etl_admin_file", base_dir / "schema_etl_admin.sql"))
if not ods_path.exists():
raise FileNotFoundError(f"找不到 ODS schema 文件: {ods_path}")
if not admin_path.exists():
raise FileNotFoundError(f"找不到 etl_admin schema 文件: {admin_path}")
log_root = Path(self.config.get("io.log_root") or self.config["io"]["log_root"])
export_root = Path(self.config.get("io.export_root") or self.config["io"]["export_root"])
fetch_root = Path(self.config.get("pipeline.fetch_root") or self.config["pipeline"]["fetch_root"])
ingest_dir = Path(self.config.get("pipeline.ingest_source_dir") or fetch_root)
return {
"ods_sql": ods_path.read_text(encoding="utf-8"),
"admin_sql": admin_path.read_text(encoding="utf-8"),
"ods_file": str(ods_path),
"admin_file": str(admin_path),
"dirs": [log_root, export_root, fetch_root, ingest_dir],
}
def load(self, extracted: dict[str, Any], context: TaskContext) -> dict:
"""执行 DDL 并创建必要目录。
安全提示:
ODS DDL 文件可能携带头部说明或异常注释,为避免因非 SQL 文本导致执行失败,这里会做一次轻量清洗后再执行。
"""
for d in extracted["dirs"]:
Path(d).mkdir(parents=True, exist_ok=True)
self.logger.info("已确保目录存在: %s", d)
# 处理 ODS SQL去掉头部说明行以及易出错的 COMMENT ON 行(如 CamelCase 未加引号)
ods_sql_raw: str = extracted["ods_sql"]
drop_idx = ods_sql_raw.find("DROP SCHEMA")
if drop_idx > 0:
ods_sql_raw = ods_sql_raw[drop_idx:]
cleaned_lines: list[str] = []
for line in ods_sql_raw.splitlines():
if line.strip().upper().startswith("COMMENT ON "):
continue
cleaned_lines.append(line)
ods_sql = "\n".join(cleaned_lines)
with self.db.conn.cursor() as cur:
self.logger.info("执行 etl_admin schema 文件: %s", extracted["admin_file"])
cur.execute(extracted["admin_sql"])
self.logger.info("执行 ODS schema 文件: %s", extracted["ods_file"])
cur.execute(ods_sql)
return {
"executed": 2,
"files": [extracted["admin_file"], extracted["ods_file"]],
"dirs_prepared": [str(p) for p in extracted["dirs"]],
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.dimensions.member import MemberLoader
from models.parsers import TypeParser
@@ -7,7 +7,7 @@ import json
class MembersDwdTask(BaseDwdTask):
"""
DWD Task: Process Member Records from ODS to Dimension Table
Source: billiards_ods.ods_member_profile
Source: billiards_ods.member_profiles
Target: billiards.dim_member
"""
@@ -29,7 +29,7 @@ class MembersDwdTask(BaseDwdTask):
# Iterate ODS Data
batches = self.iter_ods_rows(
table_name="billiards_ods.ods_member_profile",
table_name="billiards_ods.member_profiles",
columns=["site_id", "member_id", "payload", "fetched_at"],
start_time=window_start,
end_time=window_end
@@ -87,3 +87,4 @@ class MembersDwdTask(BaseDwdTask):
except Exception as e:
self.logger.warning(f"Error parsing member: {e}")
return None

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# -*- coding: utf-8 -*-
"""ODS ingestion tasks."""
from __future__ import annotations
@@ -62,11 +62,11 @@ class BaseOdsTask(BaseTask):
def execute(self) -> dict:
spec = self.SPEC
self.logger.info("开始执行 %s (ODS)", spec.code)
self.logger.info("寮€濮嬫墽琛?%s (ODS)", spec.code)
store_id = TypeParser.parse_int(self.config.get("app.store_id"))
if not store_id:
raise ValueError("app.store_id 未配置,无法执行 ODS 任务")
raise ValueError("app.store_id 鏈厤缃紝鏃犳硶鎵ц ODS 浠诲姟")
page_size = self.config.get("api.page_size", 200)
params = self._build_params(spec, store_id)
@@ -122,13 +122,13 @@ class BaseOdsTask(BaseTask):
counts["fetched"] += len(page_records)
self.db.commit()
self.logger.info("%s ODS 任务完成: %s", spec.code, counts)
self.logger.info("%s ODS 浠诲姟瀹屾垚: %s", spec.code, counts)
return self._build_result("SUCCESS", counts)
except Exception:
self.db.rollback()
counts["errors"] += 1
self.logger.error("%s ODS 任务失败", spec.code, exc_info=True)
self.logger.error("%s ODS 浠诲姟澶辫触", spec.code, exc_info=True)
raise
def _build_params(self, spec: OdsTaskSpec, store_id: int) -> dict:
@@ -201,7 +201,7 @@ class BaseOdsTask(BaseTask):
value = self._extract_value(record, col_spec)
if value is None and col_spec.required:
self.logger.warning(
"%s 缺少必填字段 %s,原始记录: %s",
"%s 缂哄皯蹇呭~瀛楁 %s锛屽師濮嬭褰? %s",
spec.code,
col_spec.column,
record,
@@ -265,9 +265,38 @@ def _int_col(name: str, *sources: str, required: bool = False) -> ColumnSpec:
)
def _decimal_col(name: str, *sources: str) -> ColumnSpec:
"""??????????????"""
return ColumnSpec(
column=name,
sources=sources,
transform=lambda v: TypeParser.parse_decimal(v, 2),
)
def _bool_col(name: str, *sources: str) -> ColumnSpec:
"""??????????????0/1?true/false ???"""
def _to_bool(value):
if value is None:
return None
if isinstance(value, bool):
return value
s = str(value).strip().lower()
if s in {"1", "true", "t", "yes", "y"}:
return True
if s in {"0", "false", "f", "no", "n"}:
return False
return bool(value)
return ColumnSpec(column=name, sources=sources, transform=_to_bool)
ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
OdsTaskSpec(
code="ODS_ASSISTANT_ACCOUNTS",
code="ODS_ASSISTANT_ACCOUNT",
class_name="OdsAssistantAccountsTask",
table_name="billiards_ods.assistant_accounts_master",
endpoint="/PersonnelManagement/SearchAssistantInfo",
@@ -281,10 +310,10 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
description="助教账号档案 ODSSearchAssistantInfo -> assistantInfos 原始 JSON",
description="鍔╂暀璐﹀彿妗f ODS锛歋earchAssistantInfo -> assistantInfos 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_ORDER_SETTLE",
code="ODS_SETTLEMENT_RECORDS",
class_name="OdsOrderSettleTask",
table_name="billiards_ods.settlement_records",
endpoint="/Site/GetAllOrderSettleList",
@@ -299,7 +328,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="结账记录 ODSGetAllOrderSettleList -> settleList 原始 JSON",
description="缁撹处璁板綍 ODS锛欸etAllOrderSettleList -> settleList 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_TABLE_USE",
@@ -317,7 +346,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="台费计费流水 ODSGetSiteTableOrderDetails -> siteTableUseDetailsList 原始 JSON",
description="鍙拌垂璁¤垂娴佹按 ODS锛欸etSiteTableOrderDetails -> siteTableUseDetailsList 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_ASSISTANT_LEDGER",
@@ -334,7 +363,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
description="助教服务流水 ODSGetOrderAssistantDetails -> orderAssistantDetails 原始 JSON",
description="鍔╂暀鏈嶅姟娴佹按 ODS锛欸etOrderAssistantDetails -> orderAssistantDetails 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_ASSISTANT_ABOLISH",
@@ -351,10 +380,10 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
description="助教废除记录 ODSGetAbolitionAssistant -> abolitionAssistants 原始 JSON",
description="鍔╂暀搴熼櫎璁板綍 ODS锛欸etAbolitionAssistant -> abolitionAssistants 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_GOODS_LEDGER",
code="ODS_STORE_GOODS_SALES",
class_name="OdsGoodsLedgerTask",
table_name="billiards_ods.store_goods_sales_records",
endpoint="/TenantGoods/GetGoodsSalesList",
@@ -369,7 +398,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="门店商品销售流水 ODSGetGoodsSalesList -> orderGoodsLedgers 原始 JSON",
description="闂ㄥ簵鍟嗗搧閿€鍞祦姘?ODS锛欸etGoodsSalesList -> orderGoodsLedgers 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_PAYMENT",
@@ -386,7 +415,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="支付流水 ODSGetPayLogListPage 原始 JSON",
description="鏀粯娴佹按 ODS锛欸etPayLogListPage 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_REFUND",
@@ -403,10 +432,10 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="退款流水 ODSGetRefundPayLogList 原始 JSON",
description="閫€娆炬祦姘?ODS锛欸etRefundPayLogList 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_COUPON_VERIFY",
code="ODS_PLATFORM_COUPON",
class_name="OdsCouponVerifyTask",
table_name="billiards_ods.platform_coupon_redemption_records",
endpoint="/Promotion/GetOfflineCouponConsumePageList",
@@ -420,7 +449,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="平台/团购券核销 ODSGetOfflineCouponConsumePageList 原始 JSON",
description="骞冲彴/鍥㈣喘鍒告牳閿€ ODS锛欸etOfflineCouponConsumePageList 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_MEMBER",
@@ -438,7 +467,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="会员档案 ODSGetTenantMemberList -> tenantMemberInfos 原始 JSON",
description="浼氬憳妗f ODS锛欸etTenantMemberList -> tenantMemberInfos 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_MEMBER_CARD",
@@ -456,7 +485,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="会员储值卡 ODSGetTenantMemberCardList -> tenantMemberCards 原始 JSON",
description="浼氬憳鍌ㄥ€煎崱 ODS锛欸etTenantMemberCardList -> tenantMemberCards 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_MEMBER_BALANCE",
@@ -474,7 +503,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="会员余额变动 ODSGetMemberCardBalanceChange -> tenantMemberCardLogs 原始 JSON",
description="浼氬憳浣欓鍙樺姩 ODS锛欸etMemberCardBalanceChange -> tenantMemberCardLogs 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_RECHARGE_SETTLE",
@@ -483,19 +512,83 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
endpoint="/Site/GetRechargeSettleList",
data_path=("data",),
list_key="settleList",
pk_columns=(),
pk_columns=(_int_col("recharge_order_id", "settleList.id", "id", required=True),),
extra_columns=(
_int_col("tenant_id", "settleList.tenantId", "tenantId"),
_int_col("site_id", "settleList.siteId", "siteId", "siteProfile.id"),
ColumnSpec("site_name_snapshot", sources=("siteProfile.shop_name", "settleList.siteName")),
_int_col("member_id", "settleList.memberId", "memberId"),
ColumnSpec("member_name_snapshot", sources=("settleList.memberName", "memberName")),
ColumnSpec("member_phone_snapshot", sources=("settleList.memberPhone", "memberPhone")),
_int_col("tenant_member_card_id", "settleList.tenantMemberCardId", "tenantMemberCardId"),
ColumnSpec("member_card_type_name", sources=("settleList.memberCardTypeName", "memberCardTypeName")),
_int_col("settle_relate_id", "settleList.settleRelateId", "settleRelateId"),
_int_col("settle_type", "settleList.settleType", "settleType"),
ColumnSpec("settle_name", sources=("settleList.settleName", "settleName")),
_int_col("is_first", "settleList.isFirst", "isFirst"),
_int_col("settle_status", "settleList.settleStatus", "settleStatus"),
_decimal_col("pay_amount", "settleList.payAmount", "payAmount"),
_decimal_col("refund_amount", "settleList.refundAmount", "refundAmount"),
_decimal_col("point_amount", "settleList.pointAmount", "pointAmount"),
_decimal_col("cash_amount", "settleList.cashAmount", "cashAmount"),
_decimal_col("online_amount", "settleList.onlineAmount", "onlineAmount"),
_decimal_col("balance_amount", "settleList.balanceAmount", "balanceAmount"),
_decimal_col("card_amount", "settleList.cardAmount", "cardAmount"),
_decimal_col("coupon_amount", "settleList.couponAmount", "couponAmount"),
_decimal_col("recharge_card_amount", "settleList.rechargeCardAmount", "rechargeCardAmount"),
_decimal_col("gift_card_amount", "settleList.giftCardAmount", "giftCardAmount"),
_decimal_col("prepay_money", "settleList.prepayMoney", "prepayMoney"),
_decimal_col("consume_money", "settleList.consumeMoney", "consumeMoney"),
_decimal_col("goods_money", "settleList.goodsMoney", "goodsMoney"),
_decimal_col("real_goods_money", "settleList.realGoodsMoney", "realGoodsMoney"),
_decimal_col("table_charge_money", "settleList.tableChargeMoney", "tableChargeMoney"),
_decimal_col("service_money", "settleList.serviceMoney", "serviceMoney"),
_decimal_col("activity_discount", "settleList.activityDiscount", "activityDiscount"),
_decimal_col("all_coupon_discount", "settleList.allCouponDiscount", "allCouponDiscount"),
_decimal_col("goods_promotion_money", "settleList.goodsPromotionMoney", "goodsPromotionMoney"),
_decimal_col("assistant_promotion_money", "settleList.assistantPromotionMoney", "assistantPromotionMoney"),
_decimal_col("assistant_pd_money", "settleList.assistantPdMoney", "assistantPdMoney"),
_decimal_col("assistant_cx_money", "settleList.assistantCxMoney", "assistantCxMoney"),
_decimal_col("assistant_manual_discount", "settleList.assistantManualDiscount", "assistantManualDiscount"),
_decimal_col("coupon_sale_amount", "settleList.couponSaleAmount", "couponSaleAmount"),
_decimal_col("member_discount_amount", "settleList.memberDiscountAmount", "memberDiscountAmount"),
_decimal_col("point_discount_price", "settleList.pointDiscountPrice", "pointDiscountPrice"),
_decimal_col("point_discount_cost", "settleList.pointDiscountCost", "pointDiscountCost"),
_decimal_col("adjust_amount", "settleList.adjustAmount", "adjustAmount"),
_decimal_col("rounding_amount", "settleList.roundingAmount", "roundingAmount"),
_int_col("payment_method", "settleList.paymentMethod", "paymentMethod"),
_bool_col("can_be_revoked", "settleList.canBeRevoked", "canBeRevoked"),
_bool_col("is_bind_member", "settleList.isBindMember", "isBindMember"),
_bool_col("is_activity", "settleList.isActivity", "isActivity"),
_bool_col("is_use_coupon", "settleList.isUseCoupon", "isUseCoupon"),
_bool_col("is_use_discount", "settleList.isUseDiscount", "isUseDiscount"),
_int_col("operator_id", "settleList.operatorId", "operatorId"),
ColumnSpec("operator_name_snapshot", sources=("settleList.operatorName", "operatorName")),
_int_col("salesman_user_id", "settleList.salesManUserId", "salesmanUserId", "salesManUserId"),
ColumnSpec("salesman_name", sources=("settleList.salesManName", "salesmanName", "settleList.salesmanName")),
ColumnSpec("order_remark", sources=("settleList.orderRemark", "orderRemark")),
_int_col("table_id", "settleList.tableId", "tableId"),
_int_col("serial_number", "settleList.serialNumber", "serialNumber"),
_int_col("revoke_order_id", "settleList.revokeOrderId", "revokeOrderId"),
ColumnSpec("revoke_order_name", sources=("settleList.revokeOrderName", "revokeOrderName")),
ColumnSpec("revoke_time", sources=("settleList.revokeTime", "revokeTime")),
ColumnSpec("create_time", sources=("settleList.createTime", "createTime")),
ColumnSpec("pay_time", sources=("settleList.payTime", "payTime")),
ColumnSpec("site_profile", sources=("siteProfile",)),
),
include_site_column=False,
include_source_endpoint=False,
include_source_endpoint=True,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
include_fetched_at=True,
include_record_index=False,
conflict_columns_override=None,
requires_window=False,
description="会员充值结算 ODSGetRechargeSettleList -> settleList 原始 JSON",
description="?????? ODS?GetRechargeSettleList -> data.settleList ????",
),
OdsTaskSpec(
code="ODS_PACKAGE",
code="ODS_GROUP_PACKAGE",
class_name="OdsPackageTask",
table_name="billiards_ods.group_buy_packages",
endpoint="/PackageCoupon/QueryPackageCouponList",
@@ -510,7 +603,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="团购套餐定义 ODSQueryPackageCouponList -> packageCouponList 原始 JSON",
description="鍥㈣喘濂楅瀹氫箟 ODS锛歈ueryPackageCouponList -> packageCouponList 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_GROUP_BUY_REDEMPTION",
@@ -528,7 +621,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="团购套餐核销 ODSGetSiteTableUseDetails -> siteTableUseDetailsList 原始 JSON",
description="鍥㈣喘濂楅鏍搁攢 ODS锛欸etSiteTableUseDetails -> siteTableUseDetailsList 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_INVENTORY_STOCK",
@@ -545,7 +638,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="库存汇总 ODSGetGoodsStockReport 原始 JSON",
description="搴撳瓨姹囨€?ODS锛欸etGoodsStockReport 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_INVENTORY_CHANGE",
@@ -562,7 +655,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
description="库存变化记录 ODSQueryGoodsOutboundReceipt -> queryDeliveryRecordsList 原始 JSON",
description="搴撳瓨鍙樺寲璁板綍 ODS锛歈ueryGoodsOutboundReceipt -> queryDeliveryRecordsList 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_TABLES",
@@ -580,7 +673,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="台桌维表 ODSGetSiteTables -> siteTables 原始 JSON",
description="鍙版缁磋〃 ODS锛欸etSiteTables -> siteTables 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_GOODS_CATEGORY",
@@ -598,7 +691,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="库存商品分类树 ODSQueryPrimarySecondaryCategory -> goodsCategoryList 原始 JSON",
description="搴撳瓨鍟嗗搧鍒嗙被鏍?ODS锛歈ueryPrimarySecondaryCategory -> goodsCategoryList 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_STORE_GOODS",
@@ -616,10 +709,10 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="门店商品档案 ODSGetGoodsInventoryList -> orderGoodsList 原始 JSON",
description="闂ㄥ簵鍟嗗搧妗f ODS锛欸etGoodsInventoryList -> orderGoodsList 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_TABLE_DISCOUNT",
code="ODS_TABLE_FEE_DISCOUNT",
class_name="OdsTableDiscountTask",
table_name="billiards_ods.table_fee_discount_records",
endpoint="/Site/GetTaiFeeAdjustList",
@@ -634,7 +727,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="台费折扣/调账 ODSGetTaiFeeAdjustList -> taiFeeAdjustInfos 原始 JSON",
description="鍙拌垂鎶樻墸/璋冭处 ODS锛欸etTaiFeeAdjustList -> taiFeeAdjustInfos 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_TENANT_GOODS",
@@ -652,7 +745,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="租户商品档案 ODSQueryTenantGoods -> tenantGoodsList 原始 JSON",
description="绉熸埛鍟嗗搧妗f ODS锛歈ueryTenantGoods -> tenantGoodsList 鍘熷 JSON",
),
OdsTaskSpec(
code="ODS_SETTLEMENT_TICKET",
@@ -671,7 +764,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
include_site_id=False,
description="结账小票详情 ODSGetOrderSettleTicketNew 原始 JSON",
description="缁撹处灏忕エ璇︽儏 ODS锛欸etOrderSettleTicketNew 鍘熷 JSON",
),
)
@@ -725,7 +818,7 @@ class OdsSettlementTicketTask(BaseOdsTask):
if not candidates:
self.logger.info(
"%s: 窗口[%s ~ %s] 未发现需要抓取的小票",
"%s: 绐楀彛[%s ~ %s] 鏈彂鐜伴渶瑕佹姄鍙栫殑灏忕エ",
spec.code,
context.window_start,
context.window_end,
@@ -755,7 +848,7 @@ class OdsSettlementTicketTask(BaseOdsTask):
counts["updated"] += updated
self.db.commit()
self.logger.info(
"%s: 小票抓取完成,候选=%s 插入=%s 更新=%s 跳过=%s",
"%s: 灏忕エ鎶撳彇瀹屾垚锛屽€欓€?%s 鎻掑叆=%s 鏇存柊=%s 璺宠繃=%s",
spec.code,
len(candidates),
inserted,
@@ -767,7 +860,7 @@ class OdsSettlementTicketTask(BaseOdsTask):
except Exception:
counts["errors"] += 1
self.db.rollback()
self.logger.error("%s: 小票抓取失败", spec.code, exc_info=True)
self.logger.error("%s: 灏忕エ鎶撳彇澶辫触", spec.code, exc_info=True)
raise
# ------------------------------------------------------------------ helpers
@@ -782,7 +875,7 @@ class OdsSettlementTicketTask(BaseOdsTask):
try:
rows = self.db.query(sql)
except Exception:
self.logger.warning("查询已有小票失败,按空集处理", exc_info=True)
self.logger.warning("鏌ヨ宸叉湁灏忕エ澶辫触锛屾寜绌洪泦澶勭悊", exc_info=True)
return set()
return {
@@ -819,7 +912,7 @@ class OdsSettlementTicketTask(BaseOdsTask):
try:
rows = self.db.query(sql, params)
except Exception:
self.logger.warning("读取支付流水以获取结算单ID失败将尝试调用支付接口回退", exc_info=True)
self.logger.warning("璇诲彇鏀粯娴佹按浠ヨ幏鍙栫粨绠楀崟ID澶辫触锛屽皢灏濊瘯璋冪敤鏀粯鎺ュ彛鍥為€€", exc_info=True)
return set()
return {
@@ -853,7 +946,7 @@ class OdsSettlementTicketTask(BaseOdsTask):
if relate_id:
candidate_ids.add(relate_id)
except Exception:
self.logger.warning("调用支付接口获取结算单ID失败当前批次将跳过回退来源", exc_info=True)
self.logger.warning("璋冪敤鏀粯鎺ュ彛鑾峰彇缁撶畻鍗旾D澶辫触锛屽綋鍓嶆壒娆″皢璺宠繃鍥為€€鏉ユ簮", exc_info=True)
return candidate_ids
def _fetch_ticket_payload(self, order_settle_id: int):
@@ -869,10 +962,10 @@ class OdsSettlementTicketTask(BaseOdsTask):
payload = response
except Exception:
self.logger.warning(
"调用小票接口失败 orderSettleId=%s", order_settle_id, exc_info=True
"璋冪敤灏忕エ鎺ュ彛澶辫触 orderSettleId=%s", order_settle_id, exc_info=True
)
if isinstance(payload, dict) and isinstance(payload.get("data"), list) and len(payload["data"]) == 1:
# 本地桩/回放可能把响应包装成单元素 list这里展开以贴近真实结构
# 鏈湴妗?鍥炴斁鍙兘鎶婂搷搴斿寘瑁呮垚鍗曞厓绱?list锛岃繖閲屽睍寮€浠ヨ创杩戠湡瀹炵粨鏋?
payload = payload["data"][0]
return payload
@@ -899,27 +992,29 @@ def _build_task_class(spec: OdsTaskSpec) -> Type[BaseOdsTask]:
ENABLED_ODS_CODES = {
"ODS_ASSISTANT_ACCOUNTS",
"ODS_ASSISTANT_ACCOUNT",
"ODS_ASSISTANT_LEDGER",
"ODS_ASSISTANT_ABOLISH",
"ODS_INVENTORY_CHANGE",
"ODS_INVENTORY_STOCK",
"ODS_PACKAGE",
"ODS_GROUP_PACKAGE",
"ODS_GROUP_BUY_REDEMPTION",
"ODS_MEMBER",
"ODS_MEMBER_BALANCE",
"ODS_MEMBER_CARD",
"ODS_PAYMENT",
"ODS_REFUND",
"ODS_COUPON_VERIFY",
"ODS_PLATFORM_COUPON",
"ODS_RECHARGE_SETTLE",
"ODS_TABLE_USE",
"ODS_TABLES",
"ODS_GOODS_CATEGORY",
"ODS_STORE_GOODS",
"ODS_TABLE_DISCOUNT",
"ODS_TABLE_FEE_DISCOUNT",
"ODS_STORE_GOODS_SALES",
"ODS_TENANT_GOODS",
"ODS_SETTLEMENT_TICKET",
"ODS_ORDER_SETTLE",
"ODS_SETTLEMENT_RECORDS",
}
ODS_TASK_CLASSES: Dict[str, Type[BaseOdsTask]] = {
@@ -931,3 +1026,4 @@ ODS_TASK_CLASSES: Dict[str, Type[BaseOdsTask]] = {
ODS_TASK_CLASSES["ODS_SETTLEMENT_TICKET"] = OdsSettlementTicketTask
__all__ = ["ODS_TASK_CLASSES", "ODS_TASK_SPECS", "BaseOdsTask", "ENABLED_ODS_CODES"]

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.facts.payment import PaymentLoader
from models.parsers import TypeParser
@@ -29,7 +29,7 @@ class PaymentsDwdTask(BaseDwdTask):
# Iterate ODS Data
batches = self.iter_ods_rows(
table_name="billiards_ods.ods_payment_record",
table_name="billiards_ods.payment_transactions",
columns=["site_id", "pay_id", "payload", "fetched_at"],
start_time=window_start,
end_time=window_end
@@ -136,3 +136,4 @@ class PaymentsDwdTask(BaseDwdTask):
except Exception as e:
self.logger.warning(f"Error parsing payment: {e}")
return None

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# -*- coding: utf-8 -*-
"""Unit tests for the new ODS ingestion tasks."""
import logging
import os
@@ -22,21 +22,21 @@ def _build_config(tmp_path):
return create_test_config("ONLINE", archive_dir, temp_dir)
def test_ods_assistant_accounts_ingest(tmp_path):
"""Ensure ODS_ASSISTANT_ACCOUNTS task stores raw payload with record_index dedup keys."""
def test_assistant_accounts_masters_ingest(tmp_path):
"""Ensure assistant_accounts_masterS task stores raw payload with record_index dedup keys."""
config = _build_config(tmp_path)
sample = [
{
"id": 5001,
"assistant_no": "A01",
"nickname": "小张",
"nickname": "灏忓紶",
}
]
api = FakeAPIClient({"/PersonnelManagement/SearchAssistantInfo": sample})
task_cls = ODS_TASK_CLASSES["ODS_ASSISTANT_ACCOUNTS"]
task_cls = ODS_TASK_CLASSES["assistant_accounts_masterS"]
with get_db_operations() as db_ops:
task = task_cls(config, db_ops, api, logging.getLogger("test_ods_assistant_accounts"))
task = task_cls(config, db_ops, api, logging.getLogger("test_assistant_accounts_masters"))
result = task.execute()
assert result["status"] == "SUCCESS"
@@ -49,21 +49,21 @@ def test_ods_assistant_accounts_ingest(tmp_path):
assert '"id": 5001' in row["payload"]
def test_ods_inventory_change_ingest(tmp_path):
"""Ensure ODS_INVENTORY_CHANGE task stores raw payload with record_index dedup keys."""
def test_goods_stock_movements_ingest(tmp_path):
"""Ensure goods_stock_movements task stores raw payload with record_index dedup keys."""
config = _build_config(tmp_path)
sample = [
{
"siteGoodsStockId": 123456,
"stockType": 1,
"goodsName": "测试商品",
"goodsName": "娴嬭瘯鍟嗗搧",
}
]
api = FakeAPIClient({"/GoodsStockManage/QueryGoodsOutboundReceipt": sample})
task_cls = ODS_TASK_CLASSES["ODS_INVENTORY_CHANGE"]
task_cls = ODS_TASK_CLASSES["goods_stock_movements"]
with get_db_operations() as db_ops:
task = task_cls(config, db_ops, api, logging.getLogger("test_ods_inventory_change"))
task = task_cls(config, db_ops, api, logging.getLogger("test_goods_stock_movements"))
result = task.execute()
assert result["status"] == "SUCCESS"
@@ -75,7 +75,7 @@ def test_ods_inventory_change_ingest(tmp_path):
assert '"siteGoodsStockId": 123456' in row["payload"]
def test_ods_member_profiles_ingest(tmp_path):
def test_member_profiless_ingest(tmp_path):
"""Ensure ODS_MEMBER task stores tenantMemberInfos raw JSON."""
config = _build_config(tmp_path)
sample = [{"tenantMemberInfos": [{"id": 101, "mobile": "13800000000"}]}]
@@ -110,14 +110,14 @@ def test_ods_payment_ingest(tmp_path):
def test_ods_settlement_records_ingest(tmp_path):
"""Ensure ODS_ORDER_SETTLE task stores settleList raw JSON."""
"""Ensure settlement_records task stores settleList raw JSON."""
config = _build_config(tmp_path)
sample = [{"data": {"settleList": [{"id": 701, "orderTradeNo": 8001}]}}]
api = FakeAPIClient({"/Site/GetAllOrderSettleList": sample})
task_cls = ODS_TASK_CLASSES["ODS_ORDER_SETTLE"]
task_cls = ODS_TASK_CLASSES["settlement_records"]
with get_db_operations() as db_ops:
task = task_cls(config, db_ops, api, logging.getLogger("test_ods_order_settle"))
task = task_cls(config, db_ops, api, logging.getLogger("test_settlement_records"))
result = task.execute()
assert result["status"] == "SUCCESS"
@@ -158,3 +158,4 @@ def test_ods_settlement_ticket_by_payment_relate_ids(tmp_path):
and call.get("params", {}).get("orderSettleId") == 9001
for call in api.calls
)

1361
tmp/20251121-task.txt Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,321 @@
# -*- coding: utf-8 -*-
"""鎵嬪伐绀轰緥鏁版嵁鐏屽叆锛氭寜 schema_ODS_doc.sql 涓婚敭/鍞竴閿壒閲忓啓鍏?ODS銆?""
from __future__ import annotations
import json
import os
from datetime import datetime
from typing import Any, Iterable
from psycopg2.extras import Json
from .base_task import BaseTask
class ManualIngestTask(BaseTask):
"""湴绀轰緥 JSON 鐏屽叆 ODS锛岀淇濊鍚嶃佷富閿佹彃鍏ュ垪涓?schema_ODS_doc.sql 瀵归綈銆?""
def __init__(self, config, db_connection, api_client, logger):
"""鍒濆鍖栫紦瀛橈紝閬垮厤閲嶅鏌ヨ琛ㄧ粨鏋勩€?""
super().__init__(config, db_connection, api_client, logger)
self._table_columns_cache: dict[str, list[str]] = {}
# 鏂囦欢鍏抽敭璇?-> 鐩爣琛紙鍖归厤 C:\dev\LLTQ\export\temp\source-data-doc 涓嬬ず鑼?JSON 鍚嶇О锛? FILE_MAPPING: list[tuple[tuple[str, ...], str]] = [
(("浼氬憳妗f", "member_profiles"), "billiards_ods.member_profiles"),
(("浣欓鍙樻洿璁板綍", "member_balance_changes"), "billiards_ods.member_balance_changes"),
(("鍌ㄥ€煎崱鍒楄〃", "member_stored_value_cards"), "billiards_ods.member_stored_value_cards"),
(("鍏呭€艰褰?, "recharge_settlements"), "billiards_ods.recharge_settlements"),
(("缁撹处璁板綍", "settlement_records"), "billiards_ods.settlement_records"),
(("鍔╂暀搴熼櫎", "assistant_cancellation_records"), "billiards_ods.assistant_cancellation_records"),
(("鍔╂暀璐﹀彿", "assistant_accounts_master"), "billiards_ods.assistant_accounts_master"),
(("鍔╂暀娴佹按", "assistant_service_records"), "billiards_ods.assistant_service_records"),
(("鍙版鍒楄〃", "site_tables_master"), "billiards_ods.site_tables_master"),
(("鍙拌垂鎵撴姌", "table_fee_discount_records"), "billiards_ods.table_fee_discount_records"),
(("鍙拌垂娴佹按", "table_fee_transactions"), "billiards_ods.table_fee_transactions"),
(("搴撳瓨鍙樺寲璁板綍1", "goods_stock_movements"), "billiards_ods.goods_stock_movements"),
(("搴撳瓨鍙樺寲璁板綍2", "stock_goods_category_tree"), "billiards_ods.stock_goods_category_tree"),
(("搴撳瓨姹囨€?, "goods_stock_summary"), "billiards_ods.goods_stock_summary"),
(("鏀粯璁板綍", "payment_transactions"), "billiards_ods.payment_transactions"),
(("閫€娆捐褰?, "refund_transactions"), "billiards_ods.refund_transactions"),
(("骞冲彴楠屽埜璁板綍", "platform_coupon_redemption_records"), "billiards_ods.platform_coupon_redemption_records"),
(("鍥㈣喘濂楅娴佹按", "group_buy_redemption_records"), "billiards_ods.group_buy_packages_ledger"),
(("鍥㈣喘濂楅", "group_buy_packages"), "billiards_ods.group_buy_packages"),
(("灏忕エ璇︽儏", "settlement_ticket_details"), "billiards_ods.settlement_ticket_details"),
(("闂ㄥ簵鍟嗗搧妗f", "store_goods_master"), "billiards_ods.store_goods_master"),
(("鍟嗗搧妗f", "tenant_goods_master"), "billiards_ods.tenant_goods_master"),
(("闂ㄥ簵鍟嗗搧閿€鍞褰?, "store_goods_sales_records"), "billiards_ods.store_goods_sales_records"),
]
# 琛ㄧ粨鏋勮鏄庯細pk=涓婚敭鍒?None 琛ㄧず鏃犲啿绐佹洿鏂?锛宩son_cols=闇€瑕佸崟鍒楀瓨 JSONB 鐨勫瓧娈? TABLE_SPECS: dict[str, dict[str, Any]] = {
"billiards_ods.member_profiles": {"pk": "id"},
"billiards_ods.member_balance_changes": {"pk": "id"},
"billiards_ods.member_stored_value_cards": {"pk": "id"},
"billiards_ods.recharge_settlements": {"pk": None, "json_cols": ["settleList", "siteProfile"]},
"billiards_ods.settlement_records": {"pk": None, "json_cols": ["settleList", "siteProfile"]},
"billiards_ods.assistant_cancellation_records": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.assistant_accounts_master": {"pk": "id"},
"billiards_ods.assistant_service_records": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.site_tables_master": {"pk": "id"},
"billiards_ods.table_fee_discount_records": {"pk": "id", "json_cols": ["siteProfile", "tableProfile"]},
"billiards_ods.table_fee_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.goods_stock_movements": {"pk": "siteGoodsStockId"},
"billiards_ods.stock_goods_category_tree": {"pk": "id", "json_cols": ["categoryBoxes"]},
"billiards_ods.goods_stock_summary": {"pk": "siteGoodsId"},
"billiards_ods.payment_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.refund_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.platform_coupon_redemption_records": {"pk": "id"},
"billiards_ods.tenant_goods_master": {"pk": "id"},
"billiards_ods.group_buy_packages": {"pk": "id"},
"billiards_ods.group_buy_packages_ledger": {"pk": "id"},
"billiards_ods.settlement_ticket_details": {
"pk": "orderSettleId",
"json_cols": ["memberProfile", "orderItem", "tenantMemberCardLogs"],
},
"billiards_ods.store_goods_master": {"pk": "id"},
"billiards_ods.store_goods_sales_records": {"pk": "id"},
}
def get_task_code(self) -> str:
"""杩斿洖浠诲姟缂栫爜銆?""
return "MANUAL_INGEST"
def execute(self, cursor_data: dict | None = None) -> dict:
"""浠庣ず鑼冪洰褰曡鍙?JSON锛屾寜琛?涓婚敭鎵归噺鍏ュ簱銆?""
data_dir = (
self.config.get("manual.data_dir")
or self.config.get("pipeline.ingest_source_dir")
or r"c:\dev\LLTQ\ETL\feiqiu-ETL\etl_billiards\tests\testdata_json"
)
if not os.path.exists(data_dir):
self.logger.error("Data directory not found: %s", data_dir)
return {"status": "error", "message": "Directory not found"}
counts = {"fetched": 0, "inserted": 0, "updated": 0, "skipped": 0, "errors": 0}
for filename in sorted(os.listdir(data_dir)):
if not filename.endswith(".json"):
continue
filepath = os.path.join(data_dir, filename)
try:
with open(filepath, "r", encoding="utf-8") as fh:
raw_entries = json.load(fh)
except Exception:
counts["errors"] += 1
self.logger.exception("Failed to read %s", filename)
continue
if not isinstance(raw_entries, list):
raw_entries = [raw_entries]
records = self._extract_records(raw_entries)
if not records:
counts["skipped"] += 1
continue
target_table = self._match_by_filename(filename)
if not target_table:
self.logger.warning("No mapping found for file: %s", filename)
counts["skipped"] += 1
continue
self.logger.info("Ingesting %s into %s", filename, target_table)
try:
inserted, updated = self._ingest_table(target_table, records, filename)
counts["inserted"] += inserted
counts["updated"] += updated
counts["fetched"] += len(records)
except Exception:
counts["errors"] += 1
self.logger.exception("Error processing %s", filename)
self.db.rollback()
continue
try:
self.db.commit()
except Exception:
self.db.rollback()
raise
return {"status": "SUCCESS", "counts": counts}
# ------------------------------------------------------------------ helpers
def _match_by_filename(self, filename: str) -> str | None:
"""鏍规嵁鏂囦欢鍚嶅叧閿瘝鎵惧埌鐩爣琛ㄣ?""
for keywords, table in self.FILE_MAPPING:
if any(keyword and keyword in filename for keyword in keywords):
return table
return None
def _extract_records(self, raw_entries: Iterable[Any]) -> list[dict]:
"""鍏煎澶氱 JSON 缁撴瀯锛屾彁鍙栨垚璁板綍鍒楄〃銆?""
records: list[dict] = []
for entry in raw_entries:
if isinstance(entry, dict):
# 濡傛灉鍚?data 涓旇繕鍖呭惈鍏朵粬閿紙濡?orderSettleId锛夛紝浼樺厛淇濈暀澶栧眰浠ュ厤涓㈠け涓婚敭
preferred = entry
if "data" in entry and not any(k not in {"data", "code"} for k in entry.keys()):
preferred = entry["data"]
data = preferred
if isinstance(data, dict):
list_used = False
for v in data.values():
if isinstance(v, list) and v and isinstance(v[0], dict):
records.extend(v)
list_used = True
break
if list_used:
continue
if isinstance(data, list) and data and isinstance(data[0], dict):
records.extend(data)
elif isinstance(data, dict):
records.append(data)
elif isinstance(entry, list):
records.extend([item for item in entry if isinstance(item, dict)])
return records
def _get_table_columns(self, table: str) -> list[str]:
"""鏌ヨ伅_schema锛岃幏鍙栫洰鏍囪鐨勫叏閮ㄥ垪鍚嶏紙鎸夐搴忥級銆?""
if table in self._table_columns_cache:
return self._table_columns_cache[table]
if "." in table:
schema, name = table.split(".", 1)
else:
schema, name = "public", table
sql = """
SELECT column_name, data_type, udt_name
FROM information_schema.columns
WHERE table_schema = %s AND table_name = %s
ORDER BY ordinal_position
"""
with self.db.conn.cursor() as cur:
cur.execute(sql, (schema, name))
cols = [(r[0], (r[1] or "").lower(), (r[2] or "").lower()) for r in cur.fetchall()]
self._table_columns_cache[table] = cols
return cols
def _ingest_table(self, table: str, records: list[dict], source_file: str) -> tuple[int, int]:
"""鏋勯€?INSERT/ON CONFLICT 璇彞骞舵壒閲忔墽琛屻€?""
spec = self.TABLE_SPECS.get(table)
if not spec:
raise ValueError(f"No table spec for {table}")
pk_col = spec.get("pk")
json_cols = set(spec.get("json_cols", []))
json_cols_lower = {c.lower() for c in json_cols}
columns_info = self._get_table_columns(table)
columns = [c[0] for c in columns_info]
db_json_cols_lower = {
c[0].lower() for c in columns_info if c[1] in ("json", "jsonb") or c[2] in ("json", "jsonb")
}
pk_col_db = None
if pk_col:
pk_col_db = next((c for c in columns if c.lower() == pk_col.lower()), pk_col)
placeholders = ", ".join(["%s"] * len(columns))
col_list = ", ".join(f'"{c}"' for c in columns)
sql = f'INSERT INTO {table} ({col_list}) VALUES ({placeholders})'
if pk_col_db:
update_cols = [c for c in columns if c != pk_col_db]
set_clause = ", ".join(f'"{c}"=EXCLUDED."{c}"' for c in update_cols)
sql += f' ON CONFLICT ("{pk_col_db}") DO UPDATE SET {set_clause}'
sql += " RETURNING (xmax = 0) AS inserted"
params = []
now = datetime.now()
json_dump = lambda v: json.dumps(v, ensure_ascii=False) # noqa: E731
for rec in records:
merged_rec = rec if isinstance(rec, dict) else {}
# 閫愬眰灞曞紑 data -> data.data 缁撴瀯锛屽~鍏呯己澶卞瓧娈? data_part = merged_rec.get("data")
while isinstance(data_part, dict):
merged_rec = {**data_part, **merged_rec}
data_part = data_part.get("data")
pk_val = self._get_value_case_insensitive(merged_rec, pk_col) if pk_col else None
if pk_col and (pk_val is None or pk_val == ""):
continue
row_vals = []
for col_name, data_type, udt in columns_info:
col_lower = col_name.lower()
if col_lower == "payload":
row_vals.append(Json(rec, dumps=json_dump))
continue
if col_lower == "source_file":
row_vals.append(source_file)
continue
if col_lower == "fetched_at":
row_vals.append(merged_rec.get(col_name, now))
continue
value = self._normalize_scalar(self._get_value_case_insensitive(merged_rec, col_name))
if col_lower in json_cols_lower or col_lower in db_json_cols_lower:
row_vals.append(Json(value, dumps=json_dump) if value is not None else None)
continue
casted = self._cast_value(value, data_type)
row_vals.append(casted)
params.append(tuple(row_vals))
if not params:
return 0, 0
inserted = 0
updated = 0
with self.db.conn.cursor() as cur:
for row in params:
cur.execute(sql, row)
try:
flag = cur.fetchone()[0]
except Exception:
flag = None
if flag:
inserted += 1
else:
updated += 1
return inserted, updated
def _get_value_case_insensitive(self, record: dict, col: str):
"""蹇界暐澶у皬鍐欒幏鍙栧硷紝鍏煎 information_schema 灏忓啓鍒楀悕涓?JSON 鍘熷澶у皬鍐欍?""
if record is None:
return None
if col is None:
return None
if col in record:
return record.get(col)
col_lower = col.lower()
for k, v in record.items():
if isinstance(k, str) and k.lower() == col_lower:
return v
return None
def _normalize_scalar(self, value):
"""灏嗙┖瀛楃涓叉爣鍑嗗寲涓?None锛岄伩鍏嶆暟鍊?鏃堕棿瀛楁绫诲瀷閿欒銆?""
if value == "" or value == "{}" or value == "[]":
return None
return value
def _cast_value(self, value, data_type: str):
"""鏍规嵁鍒楃被鍨嬪仛杞婚噺杞崲锛岄伩鍏嶇被鍨嬩笉鍖归厤銆?""
if value is None:
return None
dt = (data_type or "").lower()
if dt in ("integer", "bigint", "smallint"):
if isinstance(value, bool):
return int(value)
try:
return int(value)
except Exception:
return None
if dt in ("numeric", "double precision", "real", "decimal"):
if isinstance(value, bool):
return int(value)
try:
return float(value)
except Exception:
return None
if dt.startswith("timestamp") or dt in ("date", "time", "interval"):
# 浠呮帴鍙楀瓧绗︿覆/鏃ユ湡锛屾暟鍊肩瓑涓€寰嬬疆绌? return value if isinstance(value, str) else None
return value

View File

@@ -0,0 +1,347 @@
# -*- coding: utf-8 -*-
"""手工示例数据灌入:按 schema_ODS_doc.sql 的表结构写入 ODS。"""
from __future__ import annotations
import json
import os
from datetime import datetime
from typing import Any, Iterable
from psycopg2.extras import Json
from .base_task import BaseTask
class ManualIngestTask(BaseTask):
"""本地示例 JSON 灌入 ODS确保表名/主键/插入列与 schema_ODS_doc.sql 对齐。"""
FILE_MAPPING: list[tuple[tuple[str, ...], str]] = [
(("member_profiles",), "billiards_ods.member_profiles"),
(("member_balance_changes",), "billiards_ods.member_balance_changes"),
(("member_stored_value_cards",), "billiards_ods.member_stored_value_cards"),
(("recharge_settlements",), "billiards_ods.recharge_settlements"),
(("settlement_records",), "billiards_ods.settlement_records"),
(("assistant_cancellation_records",), "billiards_ods.assistant_cancellation_records"),
(("assistant_accounts_master",), "billiards_ods.assistant_accounts_master"),
(("assistant_service_records",), "billiards_ods.assistant_service_records"),
(("site_tables_master",), "billiards_ods.site_tables_master"),
(("table_fee_discount_records",), "billiards_ods.table_fee_discount_records"),
(("table_fee_transactions",), "billiards_ods.table_fee_transactions"),
(("goods_stock_movements",), "billiards_ods.goods_stock_movements"),
(("stock_goods_category_tree",), "billiards_ods.stock_goods_category_tree"),
(("goods_stock_summary",), "billiards_ods.goods_stock_summary"),
(("payment_transactions",), "billiards_ods.payment_transactions"),
(("refund_transactions",), "billiards_ods.refund_transactions"),
(("platform_coupon_redemption_records",), "billiards_ods.platform_coupon_redemption_records"),
(("group_buy_redemption_records",), "billiards_ods.group_buy_redemption_records"),
(("group_buy_packages",), "billiards_ods.group_buy_packages"),
(("settlement_ticket_details",), "billiards_ods.settlement_ticket_details"),
(("store_goods_master",), "billiards_ods.store_goods_master"),
(("tenant_goods_master",), "billiards_ods.tenant_goods_master"),
(("store_goods_sales_records",), "billiards_ods.store_goods_sales_records"),
]
TABLE_SPECS: dict[str, dict[str, Any]] = {
"billiards_ods.member_profiles": {"pk": "id"},
"billiards_ods.member_balance_changes": {"pk": "id"},
"billiards_ods.member_stored_value_cards": {"pk": "id"},
"billiards_ods.recharge_settlements": {"pk": "id"},
"billiards_ods.settlement_records": {"pk": "id"},
"billiards_ods.assistant_cancellation_records": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.assistant_accounts_master": {"pk": "id"},
"billiards_ods.assistant_service_records": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.site_tables_master": {"pk": "id"},
"billiards_ods.table_fee_discount_records": {"pk": "id", "json_cols": ["siteProfile", "tableProfile"]},
"billiards_ods.table_fee_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.goods_stock_movements": {"pk": "siteGoodsStockId"},
"billiards_ods.stock_goods_category_tree": {"pk": "id", "json_cols": ["categoryBoxes"]},
"billiards_ods.goods_stock_summary": {"pk": "siteGoodsId"},
"billiards_ods.payment_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.refund_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.platform_coupon_redemption_records": {"pk": "id"},
"billiards_ods.tenant_goods_master": {"pk": "id"},
"billiards_ods.group_buy_packages": {"pk": "id"},
"billiards_ods.group_buy_redemption_records": {"pk": "id"},
"billiards_ods.settlement_ticket_details": {
"pk": "orderSettleId",
"json_cols": ["memberProfile", "orderItem", "tenantMemberCardLogs"],
},
"billiards_ods.store_goods_master": {"pk": "id"},
"billiards_ods.store_goods_sales_records": {"pk": "id"},
}
def get_task_code(self) -> str:
"""返回任务编码。"""
return "MANUAL_INGEST"
def execute(self, cursor_data: dict | None = None) -> dict:
"""从目录读取 JSON按表定义批量入库。"""
data_dir = (
self.config.get("manual.data_dir")
or self.config.get("pipeline.ingest_source_dir")
or r"c:\dev\LLTQ\ETL\feiqiu-ETL\etl_billiards\tests\testdata_json"
)
if not os.path.exists(data_dir):
self.logger.error("Data directory not found: %s", data_dir)
return {"status": "error", "message": "Directory not found"}
counts = {"fetched": 0, "inserted": 0, "updated": 0, "skipped": 0, "errors": 0}
for filename in sorted(os.listdir(data_dir)):
if not filename.endswith(".json"):
continue
filepath = os.path.join(data_dir, filename)
try:
with open(filepath, "r", encoding="utf-8") as fh:
raw_entries = json.load(fh)
except Exception:
counts["errors"] += 1
self.logger.exception("Failed to read %s", filename)
continue
entries = raw_entries if isinstance(raw_entries, list) else [raw_entries]
records = self._extract_records(entries)
if not records:
counts["skipped"] += 1
continue
target_table = self._match_by_filename(filename)
if not target_table:
self.logger.warning("No mapping found for file: %s", filename)
counts["skipped"] += 1
continue
self.logger.info("Ingesting %s into %s", filename, target_table)
try:
inserted, updated = self._ingest_table(target_table, records, filename)
counts["inserted"] += inserted
counts["updated"] += updated
counts["fetched"] += len(records)
except Exception:
counts["errors"] += 1
self.logger.exception("Error processing %s", filename)
self.db.rollback()
continue
try:
self.db.commit()
except Exception:
self.db.rollback()
raise
return {"status": "SUCCESS", "counts": counts}
def _match_by_filename(self, filename: str) -> str | None:
"""根据文件名关键字匹配目标表。"""
for keywords, table in self.FILE_MAPPING:
if any(keyword and keyword in filename for keyword in keywords):
return table
return None
def _extract_records(self, raw_entries: Iterable[Any]) -> list[dict]:
"""兼容多层 data/list 包装,抽取记录列表。"""
records: list[dict] = []
for entry in raw_entries:
if isinstance(entry, dict):
preferred = entry
if "data" in entry and not any(k not in {"data", "code"} for k in entry.keys()):
preferred = entry["data"]
data = preferred
if isinstance(data, dict):
# 特殊处理 settleList充值、结算记录展开 data.settleList 下的 settleList抛弃上层 siteProfile
if "settleList" in data:
settle_list_val = data.get("settleList")
if isinstance(settle_list_val, dict):
settle_list_iter = [settle_list_val]
elif isinstance(settle_list_val, list):
settle_list_iter = settle_list_val
else:
settle_list_iter = []
handled = False
for item in settle_list_iter or []:
if not isinstance(item, dict):
continue
inner = item.get("settleList")
merged = dict(inner) if isinstance(inner, dict) else dict(item)
# 保留 siteProfile 供后续字段补充,但不落库
site_profile = data.get("siteProfile")
if isinstance(site_profile, dict):
merged.setdefault("siteProfile", site_profile)
records.append(merged)
handled = True
if handled:
continue
list_used = False
for v in data.values():
if isinstance(v, list) and v and isinstance(v[0], dict):
records.extend(v)
list_used = True
break
if list_used:
continue
if isinstance(data, list) and data and isinstance(data[0], dict):
records.extend(data)
elif isinstance(data, dict):
records.append(data)
elif isinstance(entry, list):
records.extend([item for item in entry if isinstance(item, dict)])
return records
def _get_table_columns(self, table: str) -> list[tuple[str, str, str]]:
"""查询 information_schema获取目标表列信息。"""
cache = getattr(self, "_table_columns_cache", {})
if table in cache:
return cache[table]
if "." in table:
schema, name = table.split(".", 1)
else:
schema, name = "public", table
sql = """
SELECT column_name, data_type, udt_name
FROM information_schema.columns
WHERE table_schema = %s AND table_name = %s
ORDER BY ordinal_position
"""
with self.db.conn.cursor() as cur:
cur.execute(sql, (schema, name))
cols = [(r[0], (r[1] or "").lower(), (r[2] or "").lower()) for r in cur.fetchall()]
cache[table] = cols
self._table_columns_cache = cache
return cols
def _ingest_table(self, table: str, records: list[dict], source_file: str) -> tuple[int, int]:
"""构建 INSERT/ON CONFLICT 语句并批量执行。"""
spec = self.TABLE_SPECS.get(table)
if not spec:
raise ValueError(f"No table spec for {table}")
pk_col = spec.get("pk")
json_cols = set(spec.get("json_cols", []))
json_cols_lower = {c.lower() for c in json_cols}
columns_info = self._get_table_columns(table)
columns = [c[0] for c in columns_info]
db_json_cols_lower = {
c[0].lower() for c in columns_info if c[1] in ("json", "jsonb") or c[2] in ("json", "jsonb")
}
pk_col_db = None
if pk_col:
pk_col_db = next((c for c in columns if c.lower() == pk_col.lower()), pk_col)
placeholders = ", ".join(["%s"] * len(columns))
col_list = ", ".join(f'"{c}"' for c in columns)
sql = f'INSERT INTO {table} ({col_list}) VALUES ({placeholders})'
if pk_col_db:
update_cols = [c for c in columns if c != pk_col_db]
set_clause = ", ".join(f'"{c}"=EXCLUDED."{c}"' for c in update_cols)
sql += f' ON CONFLICT ("{pk_col_db}") DO UPDATE SET {set_clause}'
sql += " RETURNING (xmax = 0) AS inserted"
params = []
now = datetime.now()
json_dump = lambda v: json.dumps(v, ensure_ascii=False) # noqa: E731
for rec in records:
merged_rec = rec if isinstance(rec, dict) else {}
data_part = merged_rec.get("data")
while isinstance(data_part, dict):
merged_rec = {**data_part, **merged_rec}
data_part = data_part.get("data")
# 针对充值/结算,补齐 siteProfile 中的店铺信息
if table in {
"billiards_ods.recharge_settlements",
"billiards_ods.settlement_records",
}:
site_profile = merged_rec.get("siteProfile") or merged_rec.get("site_profile")
if isinstance(site_profile, dict):
merged_rec.setdefault("tenantid", site_profile.get("tenant_id") or site_profile.get("tenantId"))
merged_rec.setdefault("siteid", site_profile.get("id") or site_profile.get("siteId"))
merged_rec.setdefault("sitename", site_profile.get("shop_name") or site_profile.get("siteName"))
pk_val = self._get_value_case_insensitive(merged_rec, pk_col) if pk_col else None
if pk_col and (pk_val is None or pk_val == ""):
continue
row_vals = []
for col_name, data_type, udt in columns_info:
col_lower = col_name.lower()
if col_lower == "payload":
row_vals.append(Json(rec, dumps=json_dump))
continue
if col_lower == "source_file":
row_vals.append(source_file)
continue
if col_lower == "fetched_at":
row_vals.append(merged_rec.get(col_name, now))
continue
value = self._normalize_scalar(self._get_value_case_insensitive(merged_rec, col_name))
if col_lower in json_cols_lower or col_lower in db_json_cols_lower:
row_vals.append(Json(value, dumps=json_dump) if value is not None else None)
continue
casted = self._cast_value(value, data_type)
row_vals.append(casted)
params.append(tuple(row_vals))
if not params:
return 0, 0
inserted = 0
updated = 0
with self.db.conn.cursor() as cur:
for row in params:
cur.execute(sql, row)
flag = cur.fetchone()[0]
if flag:
inserted += 1
else:
updated += 1
return inserted, updated
@staticmethod
def _get_value_case_insensitive(record: dict, col: str | None):
"""忽略大小写获取值,兼容 information_schema 与 JSON 原始字段。"""
if record is None or col is None:
return None
if col in record:
return record.get(col)
col_lower = col.lower()
for k, v in record.items():
if isinstance(k, str) and k.lower() == col_lower:
return v
return None
@staticmethod
def _normalize_scalar(value):
"""将空字符串/空 JSON 规范为 None避免类型转换错误。"""
if value == "" or value == "{}" or value == "[]":
return None
return value
@staticmethod
def _cast_value(value, data_type: str):
"""根据列类型做简单转换,保证批量插入兼容。"""
if value is None:
return None
dt = (data_type or "").lower()
if dt in ("integer", "bigint", "smallint"):
if isinstance(value, bool):
return int(value)
try:
return int(value)
except Exception:
return None
if dt in ("numeric", "double precision", "real", "decimal"):
if isinstance(value, bool):
return int(value)
try:
return float(value)
except Exception:
return None
if dt.startswith("timestamp") or dt in ("date", "time", "interval"):
return value if isinstance(value, str) else None
return value

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

1801
tmp/schema_dwd.sql Normal file

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

1
tmp/temp_chinese.txt Normal file
View File

@@ -0,0 +1 @@
含义

29
tmp/tmp_debug_sql.py Normal file
View File

@@ -0,0 +1,29 @@
import os, psycopg2
from etl_billiards.tasks.dwd_load_task import DwdLoadTask
dwd_table="billiards_dwd.dwd_table_fee_log"
ods_table="billiards_ods.table_fee_transactions"
conn=psycopg2.connect(os.environ["PG_DSN"])
cur=conn.cursor()
task=DwdLoadTask(config={}, db_connection=None, api_client=None, logger=None)
cur.execute("SELECT column_name FROM information_schema.columns WHERE table_schema=%s AND table_name=%s", ("billiards_dwd", "dwd_table_fee_log"))
dwd_cols=[r[0].lower() for r in cur.fetchall()]
cur.execute("SELECT column_name FROM information_schema.columns WHERE table_schema=%s AND table_name=%s", ("billiards_ods", "table_fee_transactions"))
ods_cols=[r[0].lower() for r in cur.fetchall()]
cur.execute("SELECT column_name,data_type FROM information_schema.columns WHERE table_schema=%s AND table_name=%s", ("billiards_dwd", "dwd_table_fee_log"))
dwd_types={r[0].lower(): r[1].lower() for r in cur.fetchall()}
cur.execute("SELECT column_name,data_type FROM information_schema.columns WHERE table_schema=%s AND table_name=%s", ("billiards_ods", "table_fee_transactions"))
ods_types={r[0].lower(): r[1].lower() for r in cur.fetchall()}
mapping=task.FACT_MAPPINGS.get(dwd_table)
if mapping:
insert_cols=[d for d,o,_ in mapping if o in ods_cols]
select_exprs=[task._cast_expr(o,cast_type) for d,o,cast_type in mapping if o in ods_cols]
else:
insert_cols=[c for c in dwd_cols if c in ods_cols and c not in task.SCD_COLS]
select_exprs=task._build_fact_select_exprs(insert_cols,dwd_types,ods_types)
print('insert_cols', insert_cols)
print('select_exprs', select_exprs)
sql=f"INSERT INTO {task._format_table(dwd_table,'billiards_dwd')} ({', '.join(f'\"{c}\"' for c in insert_cols)}) SELECT {', '.join(select_exprs)} FROM {task._format_table(ods_table,'billiards_ods')}"
print(sql)
cur.close(); conn.close()

7
tmp/tmp_drop_dwd.py Normal file
View File

@@ -0,0 +1,7 @@
import os, psycopg2
conn=psycopg2.connect(os.environ["PG_DSN"])
conn.autocommit=True
cur=conn.cursor()
cur.execute('DROP SCHEMA IF EXISTS billiards_dwd CASCADE')
cur.close(); conn.close()
print('dropped billiards_dwd')

19
tmp/tmp_dwd_tasks.py Normal file
View File

@@ -0,0 +1,19 @@
import os
import psycopg2
DSN = os.environ.get('PG_DSN')
store_id = int(os.environ.get('STORE_ID','2790685415443269'))
conn = psycopg2.connect(DSN)
conn.autocommit = True
cur = conn.cursor()
rows = []
for code in ('INIT_DWD_SCHEMA','DWD_LOAD_FROM_ODS','DWD_QUALITY_CHECK'):
cur.execute("SELECT task_id FROM etl_admin.etl_task WHERE task_code=%s AND store_id=%s", (code, store_id))
if cur.fetchone():
cur.execute("UPDATE etl_admin.etl_task SET enabled=TRUE, updated_at=now() WHERE task_code=%s AND store_id=%s", (code, store_id))
rows.append((code, 'updated'))
else:
cur.execute("INSERT INTO etl_admin.etl_task(task_code,store_id,enabled,cursor_field,window_minutes_default,overlap_seconds,page_size,params) VALUES (%s,%s,TRUE,NULL,60,120,1000,'{}') RETURNING task_id", (code, store_id))
rows.append((code, 'inserted', cur.fetchone()[0]))
print(rows)
cur.close(); conn.close()

28
tmp/tmp_problems.py Normal file
View File

@@ -0,0 +1,28 @@
import os, psycopg2
from etl_billiards.tasks.dwd_load_task import DwdLoadTask
conn=psycopg2.connect(os.environ['PG_DSN'])
cur=conn.cursor()
problems=[]
for dwd_table, ods_table in DwdLoadTask.TABLE_MAP.items():
if dwd_table.split('.')[-1].startswith('dwd_'):
if '.' in dwd_table:
dschema, dtable = dwd_table.split('.')
else:
dschema, dtable = 'billiards_dwd', dwd_table
if '.' in ods_table:
oschema, otable = ods_table.split('.')
else:
oschema, otable = 'billiards_ods', ods_table
cur.execute("SELECT column_name,data_type FROM information_schema.columns WHERE table_schema=%s AND table_name=%s", (dschema,dtable))
dcols={r[0].lower():r[1].lower() for r in cur.fetchall()}
cur.execute("SELECT column_name,data_type FROM information_schema.columns WHERE table_schema=%s AND table_name=%s", (oschema,otable))
ocols={r[0].lower():r[1].lower() for r in cur.fetchall()}
common=set(dcols)&set(ocols)
missing_dwd=list(set(ocols)-set(dcols))
missing_ods=list(set(dcols)-set(ocols))
mismatches=[(c,dcols[c],ocols[c]) for c in sorted(common) if dcols[c]!=ocols[c]]
problems.append((dwd_table,missing_dwd,missing_ods,mismatches))
cur.close();conn.close()
for p in problems:
print(p)

26
tmp/tmp_run_sql.py Normal file
View File

@@ -0,0 +1,26 @@
import os, psycopg2
from etl_billiards.tasks.dwd_load_task import DwdLoadTask
dwd_table="billiards_dwd.dwd_table_fee_log"
ods_table="billiards_ods.table_fee_transactions"
conn=psycopg2.connect(os.environ["PG_DSN"])
cur=conn.cursor()
task=DwdLoadTask(config={}, db_connection=None, api_client=None, logger=None)
cur.execute("SELECT column_name FROM information_schema.columns WHERE table_schema=%s AND table_name=%s", ("billiards_dwd", "dwd_table_fee_log"))
dwd_cols=[r[0].lower() for r in cur.fetchall()]
cur.execute("SELECT column_name FROM information_schema.columns WHERE table_schema=%s AND table_name=%s", ("billiards_ods", "table_fee_transactions"))
ods_cols=[r[0].lower() for r in cur.fetchall()]
cur.execute("SELECT column_name,data_type FROM information_schema.columns WHERE table_schema=%s AND table_name=%s", ("billiards_dwd", "dwd_table_fee_log"))
dwd_types={r[0].lower(): r[1].lower() for r in cur.fetchall()}
cur.execute("SELECT column_name,data_type FROM information_schema.columns WHERE table_schema=%s AND table_name=%s", ("billiards_ods", "table_fee_transactions"))
ods_types={r[0].lower(): r[1].lower() for r in cur.fetchall()}
mapping=task.FACT_MAPPINGS.get(dwd_table)
insert_cols=[d for d,o,_ in mapping if o in ods_cols]
select_exprs=[task._cast_expr(o,cast_type) for d,o,cast_type in mapping if o in ods_cols]
sql=f"INSERT INTO {task._format_table(dwd_table,'billiards_dwd')} ({', '.join(f'\"{c}\"' for c in insert_cols)}) SELECT {', '.join(select_exprs)} FROM {task._format_table(ods_table,'billiards_ods')} LIMIT 1"
print(sql)
cur.execute(sql)
conn.commit()
print('ok')
cur.close(); conn.close()