Compare commits
19 Commits: 13d853c3f5 ... dev

| SHA1 |
|---|
| 0ab040b9fb |
| 0c29bd41f8 |
| 561c640700 |
| f301cc1fd5 |
| 6f1d163a99 |
| a6ad343092 |
| b9b050bb5d |
| cbd16a39ba |
| 92f219b575 |
| b1f64c4bac |
| ed47754b46 |
| fbee8a751e |
| cbe48c8ee7 |
| 821d302243 |
| 9a1df70a23 |
| 5bb5a8a568 |
| c3749474c6 |
| 7f87421678 |
| 84e80841cd |
.gitignore (vendored, new file, 48 lines)
@@ -0,0 +1,48 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# Logs and exports
*.log
*.jsonl
export/
logs/

# Environment variables
.env
.env.local

# Testing
.pytest_cache/
.coverage
htmlcov/
README.md (new file, 57 lines)
@@ -0,0 +1,57 @@
# Feiqiu ETL System (ODS → DWD)

An ETL pipeline for store operations: it fetches (or ingests offline) upstream JSON, lands it in ODS first, then cleans and loads it into DWD (SCD2 dimensions, incremental facts), and produces data-quality reports.

## Quick Run (offline sample JSON)
1) Environment: Python 3.10+ and PostgreSQL; key `.env` entries: `PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test`, `INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc`.
2) Install dependencies:
```bash
cd etl_billiards
pip install -r requirements.txt
```
3) One-shot ODS → DWD → quality check:
```bash
python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,INIT_DWD_SCHEMA --pipeline-flow INGEST_ONLY
python -m etl_billiards.cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "C:\dev\LLTQ\export\test-json-doc"
python -m etl_billiards.cli.main --tasks DWD_LOAD_FROM_ODS --pipeline-flow INGEST_ONLY
python -m etl_billiards.cli.main --tasks DWD_QUALITY_CHECK --pipeline-flow INGEST_ONLY
# Report: etl_billiards/reports/dwd_quality_report.json
```

## Directory Layout and File Roles
- Root: `etl_billiards/` main code; `requirements.txt` dependencies; `run_etl.sh/.bat` launch scripts; `.env/.env.example` configuration; `tmp/` drafts/debugging/backups.
- etl_billiards/ main directories
  - `config/`: `defaults.py` default values, `env_parser.py` parses .env, `settings.py` unified config loading.
  - `api/`: `client.py` HTTP requests, retries, and pagination.
  - `database/`: `connection.py` connection wrapper, `operations.py` batch upserts; DDL: `schema_ODS_doc.sql`, `schema_dwd_doc.sql`.
  - `tasks/`: business tasks
    - `init_schema_task.py`: INIT_ODS_SCHEMA / INIT_DWD_SCHEMA.
    - `manual_ingest_task.py`: sample JSON → ODS.
    - `dwd_load_task.py`: ODS → DWD (mappings, SCD2/incremental facts).
    - Other tasks are added as needed.
  - `loaders/`: ODS/DWD/SCD2 loader implementations.
  - `scd/`: `scd2_handler.py` maintains SCD2 dimension history.
  - `quality/`: quality checkers (row-count/amount reconciliation).
  - `orchestration/`: `scheduler.py` scheduling; `task_registry.py` task registration; `run_tracker.py` run records.
  - `scripts/`: rebuild/test/probe utilities.
  - `docs/`: `ods_to_dwd_mapping.md` mapping notes, `ods_sample_json.md` sample-JSON notes, `dwd_quality_check.md` quality-check notes.
  - `reports/`: quality-check output (e.g. `dwd_quality_report.json`).
  - `tests/`: unit/integration tests; `utils/`: shared utilities.
  - `backups/` (if present): backups of key files.

## Business Flow and File Relationships
1) Scheduling entry: `cli/main.py` parses the CLI → `orchestration/scheduler.py` creates tasks per `task_registry.py` → initializes the DB/API/Config context.
2) ODS: `init_schema_task.py` runs `schema_ODS_doc.sql` to create tables; `manual_ingest_task.py` reads JSON from `INGEST_SOURCE_DIR` and batch-upserts into ODS.
3) DWD: `init_schema_task.py` runs `schema_dwd_doc.sql`; `dwd_load_task.py` cleans ODS data into DWD per `TABLE_MAP/FACT_MAPPINGS`; dimensions go through SCD2 (`scd/scd2_handler.py`), facts load incrementally by time/watermark.
4) Quality: the quality task reads ODS/DWD, tallies row counts/amounts, and writes `reports/dwd_quality_report.json`.
5) Configuration: `config/defaults.py` + `.env` + CLI arguments are layered; HTTP (when online mode is enabled) goes through `api/client.py`; DB access goes through `database/connection.py`.
6) Docs: `docs/ods_to_dwd_mapping.md` records field mappings; `docs/ods_sample_json.md` describes the sample data structures for side-by-side debugging.

## Current Status (2025-12-09)
- Sample JSON fully ingested; DWD row counts match ODS.
- The category dimension is flattened into primary + secondary levels: `dim_goods_category` has 26 rows (category_level/leaf populated).
- Most remaining nulls stem from empty source data; confirm upstream availability before backfilling.

## Candidates for Cleanup/Archiving
- Drafts, old backups, and debug scripts under `tmp/` and `tmp/etl_billiards_misc/` are reference-only; nothing at runtime depends on them.
- The root keeps only essential files (README, requirements, run_etl.*, .env/.env.example); other temporary files have been moved to tmp.
README_FULL.md (new file, 216 lines)
@@ -0,0 +1,216 @@
# Feiqiu ETL System (ODS → DWD): Detailed Edition

> This document is the detailed project reference. It tracks the current code and covers ODS tasks, DWD loading, quality checks, and development/extension notes.

---

## 1. Project Overview

An ETL pipeline for store operations: it collects orders, payments, members, inventory, and other data from the upstream API or offline JSON, lands it in **ODS** first, then cleans and loads it into **DWD** (SCD2 dimensions, incremental facts), and emits data-quality reports. The project uses a modular, layered architecture (config, API, database, loaders/SCD, quality, orchestration, CLI, tests), all driven through the CLI.

---

## 2. Quick Start (offline sample JSON)

**Requirements**: Python 3.10+; PostgreSQL; key `.env` entries:
- `PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test`
- `INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc`

**Install dependencies**:
```bash
cd etl_billiards
pip install -r requirements.txt
```

**One-shot ODS → DWD → quality check (offline replay)**:
```bash
# Initialize ODS + DWD
python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,INIT_DWD_SCHEMA --pipeline-flow INGEST_ONLY

# Ingest sample JSON into ODS (INGEST_SOURCE_DIR in .env can override)
python -m etl_billiards.cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "C:\dev\LLTQ\export\test-json-doc"

# Load DWD from ODS
python -m etl_billiards.cli.main --tasks DWD_LOAD_FROM_ODS --pipeline-flow INGEST_ONLY

# Quality-check report
python -m etl_billiards.cli.main --tasks DWD_QUALITY_CHECK --pipeline-flow INGEST_ONLY
# Report output: etl_billiards/reports/dwd_quality_report.json
```

> Steps can also be run individually:
> - Schema only: `python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA`
> - ODS ingest only: `python -m etl_billiards.cli.main --tasks MANUAL_INGEST`
> - DWD load only: `python -m etl_billiards.cli.main --tasks INIT_DWD_SCHEMA,DWD_LOAD_FROM_ODS`

---
## 3. Configuration and Paths
- Sample data directory: `C:\dev\LLTQ\export\test-json-doc` (overridable via `INGEST_SOURCE_DIR` in `.env`).
- Log/export directories: `LOG_ROOT`, `EXPORT_ROOT` in `.env`.
- Report: `etl_billiards/reports/dwd_quality_report.json`.
- DDL: `etl_billiards/database/schema_ODS_doc.sql`, `etl_billiards/database/schema_dwd_doc.sql`.
- Task registry: `etl_billiards/orchestration/task_registry.py` (enables INIT_ODS_SCHEMA, MANUAL_INGEST, INIT_DWD_SCHEMA, DWD_LOAD_FROM_ODS, DWD_QUALITY_CHECK by default).

**Security note**: keep database credentials in `.env` or a managed secret store, and use a least-privilege account in production.

---
## 4. Directory Structure and Key Files
- Root: `etl_billiards/` main code; `requirements.txt` dependencies; `run_etl.sh/.bat` launch scripts; `.env/.env.example` configuration; `tmp/` draft/debug archive.
- `config/`: `defaults.py` default values, `env_parser.py` parses .env, `settings.py` unified AppConfig loading.
- `api/`: `client.py` HTTP requests, retries, pagination.
- `database/`: `connection.py` connection wrapper; `operations.py` batch upserts; DDL SQL (ODS/DWD).
- `tasks/`:
  - `init_schema_task.py` (INIT_ODS_SCHEMA/INIT_DWD_SCHEMA);
  - `manual_ingest_task.py` (sample JSON → ODS);
  - `dwd_load_task.py` (ODS → DWD mapping, SCD2/incremental facts);
  - other tasks added as needed.
- `loaders/`: ODS/DWD/SCD2 loader implementations.
- `scd/`: `scd2_handler.py` maintains SCD2 dimension history.
- `quality/`: quality checkers (row-count/amount reconciliation).
- `orchestration/`: `scheduler.py` scheduling; `task_registry.py` registration; `run_tracker.py` run records; `cursor_manager.py` watermark management.
- `scripts/`: rebuild/test/probe utilities.
- `docs/`: `ods_to_dwd_mapping.md` mapping notes; `ods_sample_json.md` sample-JSON notes; `dwd_quality_check.md` quality-check notes.
- `reports/`: quality-check output (e.g. `dwd_quality_report.json`).
- `tests/`: unit/integration tests; `utils/`: shared utilities; `backups/`: backups (if present).

---
## 5. Architecture and Flow
Execution chain (control flow; a sketch of the task template follows this section):
1) The CLI (`cli/main.py`) parses arguments → builds AppConfig → initializes logging/DB connections;
2) The orchestration layer (`scheduler.py`) instantiates tasks from the registry in `task_registry.py`, setting run_uuid, the cursor (watermark), and context;
3) The task base-class template:
   - obtain the time window/watermark (cursor_manager);
   - fetch data: online mode calls `api/client.py` with pagination and retries; offline mode reads JSON files directly;
   - parse and validate: type conversion, required-field checks (per-task parse/validate);
   - load: call a loader (`loaders/`) for batch upsert/SCD2/incremental writes (backed by `database/operations.py`);
   - quality check (when needed): the quality module compares row counts, amounts, etc.;
   - update the watermark and run record (`run_tracker.py`), then commit or roll back the transaction.

Data flow and dependencies:
- Configuration: `config/defaults.py` + `.env` + CLI arguments are layered into AppConfig.
- API access: `api/client.py` handles pagination/retries; offline ingest reads files directly.
- DB access: `database/connection.py` provides the connection context; `operations.py` handles batch upserts/paged writes.
- ODS: `manual_ingest_task.py` reads JSON → ODS tables (retaining payload/source/timestamps).
- DWD: `dwd_load_task.py` selects fields from ODS per `TABLE_MAP/FACT_MAPPINGS`; dimensions use SCD2 (`scd/scd2_handler.py`), facts load incrementally; field expressions (JSON->>, CAST) are supported.
- Quality: the `quality` module (or related tasks) compares ODS/DWD row counts, amounts, etc., writing output to `reports/`.
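A minimal sketch of the task template described above; the class and method names here are illustrative assumptions, not the repo's exact BaseTask API:

```python
# Illustrative task-template skeleton; all names are assumptions, not the real BaseTask API.
class SketchTask:
    def __init__(self, config, db, client, tracker, cursor_mgr, logger):
        self.config, self.db, self.client = config, db, client
        self.tracker, self.cursor_mgr, self.logger = tracker, cursor_mgr, logger

    def get_task_code(self) -> str:
        return "SKETCH"

    def fetch(self, window):
        # Online mode would call self.client.get_paginated(...); offline mode reads JSON files.
        return []

    def parse_and_validate(self, raw):
        # Type conversion and required-field checks happen here.
        return raw

    def load(self, rows):
        # Loaders delegate to database/operations.py for batch upserts.
        return {"rows": len(rows)}

    def execute(self):
        window = self.cursor_mgr.get_window(self.get_task_code())  # watermark/time window
        try:
            rows = self.parse_and_validate(self.fetch(window))
            stats = self.load(rows)
            self.cursor_mgr.advance(self.get_task_code(), window)  # move the watermark forward
            self.db.commit()
            self.logger.info("loaded %s", stats)
        except Exception:
            self.db.rollback()  # failed runs leave the watermark untouched
            raise
        finally:
            self.tracker.record_run(self.get_task_code())  # run_tracker.py run record
```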
---

## 6. ODS → DWD Strategy
1. ODS keeps the raw record: source primary keys, payload, time/source information.
2. DWD cleans: dimensions use SCD2 (see the sketch after this list), facts load incrementally by time/watermark; field types, units, and enums are standardized while traceability fields are retained.
3. Unified business keys: site_id, member_id, table_id, order_settle_id, order_trade_no, etc. share consistent names.
4. No over-aggregation: DWD holds detail/lightly cleaned data; aggregation belongs in DWS/reporting.
5. De-nesting: arrays are expanded into child tables/rows; repeated profiles are extracted into dimensions.
6. Long-term evolution: prefer adding columns/tables over frequently altering existing schemas.
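A sketch of the SCD2 close-and-insert step from point 2. The table and column names are illustrative (`dim_goods_category` appears in the status notes); the repo's real logic lives in `scd/scd2_handler.py`:

```python
# Illustrative SCD2 merge for one dimension row; schema and column names are assumptions.
def scd2_merge(ops, row):
    """Close the current version when attributes changed, then insert the new version."""
    ops.execute(
        """
        UPDATE dwd.dim_goods_category
           SET valid_to = now(), is_current = FALSE
         WHERE category_id = %(category_id)s
           AND is_current
           AND (category_name, category_level) IS DISTINCT FROM
               (%(category_name)s, %(category_level)s)
        """,
        row,
    )
    ops.execute(
        """
        INSERT INTO dwd.dim_goods_category
               (category_id, category_name, category_level, valid_from, is_current)
        SELECT %(category_id)s, %(category_name)s, %(category_level)s, now(), TRUE
         WHERE NOT EXISTS (
               SELECT 1 FROM dwd.dim_goods_category
                WHERE category_id = %(category_id)s AND is_current
         )
        """,
        row,
    )
```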
---

## 7. Common CLI Commands
```bash
# Run all registered tasks
python -m etl_billiards.cli.main
# Run specific tasks
python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,MANUAL_INGEST
# Override the DSN
python -m etl_billiards.cli.main --pg-dsn "postgresql://user:pwd@host:5432/db"
# Override the API
python -m etl_billiards.cli.main --api-base "https://api.example.com" --api-token "..."
# Dry run (no writes)
python -m etl_billiards.cli.main --dry-run --tasks DWD_LOAD_FROM_ODS
```

---
## 8. Testing (ONLINE / OFFLINE)
- `TEST_MODE=ONLINE`: calls the real API; full E/T/L chain.
- `TEST_MODE=OFFLINE`: reads offline JSON from `TEST_JSON_ARCHIVE_DIR`; Transform + Load only.
- `TEST_DB_DSN`: when set, integration tests hit a real database; otherwise an in-memory/temporary one is used.

Examples:
```bash
TEST_MODE=ONLINE pytest tests/unit/test_etl_tasks_online.py
TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=tests/source-data-doc pytest tests/unit/test_etl_tasks_offline.py
python scripts/test_db_connection.py --dsn postgresql://user:pwd@host:5432/db --query "SELECT 1"
```

---
## 9. Development and Extension
- New task: subclass BaseTask in `tasks/`, implement `get_task_code/execute`, and register it in `orchestration/task_registry.py`.
- New loader/checker: follow `loaders/` and `quality/`, reusing the batch-upsert and quality-check interfaces.
- Configuration: `config/defaults.py` + `.env` + CLI layering; new settings must be declared in both defaults and env_parser.

---

## 10. ODS Task Rollout Guide
- Task seeding script: `etl_billiards/database/seed_ods_tasks.sql` (replace store_id, then run `psql "$PG_DSN" -f ...`).
- Confirm the required ODS tasks are enabled in `etl_admin.etl_task`.
- Offline replay: `scripts/rebuild_ods_from_json` (if present) can rebuild ODS from local JSON.
- Unit tests: `pytest etl_billiards/tests/unit/test_ods_tasks.py`.

---
## 11. ODS Table Overview (data paths)

| ODS table | API path | Data list path |
| --- | --- | --- |
| assistant_accounts_master | /PersonnelManagement/SearchAssistantInfo | data.assistantInfos |
| assistant_service_records | /AssistantPerformance/GetOrderAssistantDetails | data.orderAssistantDetails |
| assistant_cancellation_records | /AssistantPerformance/GetAbolitionAssistant | data.abolitionAssistants |
| goods_stock_movements | /GoodsStockManage/QueryGoodsOutboundReceipt | data.queryDeliveryRecordsList |
| goods_stock_summary | /TenantGoods/GetGoodsStockReport | data |
| group_buy_packages | /PackageCoupon/QueryPackageCouponList | data.packageCouponList |
| group_buy_redemption_records | /Site/GetSiteTableUseDetails | data.siteTableUseDetailsList |
| member_profiles | /MemberProfile/GetTenantMemberList | data.tenantMemberInfos |
| member_balance_changes | /MemberProfile/GetMemberCardBalanceChange | data.tenantMemberCardLogs |
| member_stored_value_cards | /MemberProfile/GetTenantMemberCardList | data.tenantMemberCards |
| payment_transactions | /PayLog/GetPayLogListPage | data |
| platform_coupon_redemption_records | /Promotion/GetOfflineCouponConsumePageList | data |
| recharge_settlements | /Site/GetRechargeSettleList | data.settleList |
| refund_transactions | /Order/GetRefundPayLogList | data |
| settlement_records | /Site/GetAllOrderSettleList | data.settleList |
| settlement_ticket_details | /Order/GetOrderSettleTicketNew | full JSON |
| site_tables_master | /Table/GetSiteTables | data.siteTables |
| stock_goods_category_tree | /TenantGoodsCategory/QueryPrimarySecondaryCategory | data.goodsCategoryList |
| store_goods_master | /TenantGoods/GetGoodsInventoryList | data.orderGoodsList |
| store_goods_sales_records | /TenantGoods/GetGoodsSalesList | data.orderGoodsLedgers |
| table_fee_discount_records | /Site/GetTaiFeeAdjustList | data.taiFeeAdjustInfos |
| table_fee_transactions | /Site/GetSiteTableOrderDetails | data.siteTableUseDetailsList |
| tenant_goods_master | /TenantGoods/QueryTenantGoods | data.tenantGoodsList |

> Full field-level mappings are in `docs/` and the ODS/DWD DDL.

---
## 12. DWD Dimensions and Modeling Guidelines
1. One grain, one business key: each DWD table carries a single business event/grain; never mix grains.
2. Understand the business chain before modeling; do not mechanically create one table per JSON list.
3. Unified business keys: site_id, member_id, table_id, order_settle_id, order_trade_no, etc. must be named consistently.
4. Keep detail, avoid over-aggregation; leave rollups to DWS/reporting.
5. Standardize while cleaning, but retain traceability fields (source keys, timestamps, amounts, payload).
6. De-nest and decouple: expand arrays into child rows, extract repeated profiles into dimensions.
7. Evolve by adding columns/tables first, minimizing disruption to existing schemas.

---
## 13. Current Status (2025-12-09)
- Sample JSON fully ingested; DWD row counts match ODS.
- The category dimension is flattened into primary + secondary levels: `dim_goods_category` has 26 rows (category_level/leaf populated).
- Some fields are null because the source is null; confirm upstream before backfilling.

---

## 14. Candidates for Cleanup/Archiving
- Drafts, old backups, and debug scripts under `tmp/` and `tmp/etl_billiards_misc/` are reference-only and do not affect runtime.
- The root keeps only essential files (README, requirements, run_etl.*, .env/.env.example); other temporary files have been moved to tmp.

---
## 15. FAQ
- Null fields: if the mapping exists and the source column is non-null but the target is still null, re-check the upstream JSON; SCD2 dimensions merge from full snapshots.
- DSN/paths: make sure `PG_DSN` and `INGEST_SOURCE_DIR` in `.env` match your local setup.
- Adding tasks: implement in `tasks/` and register in `task_registry.py`; update the DDL and mappings where needed.
- Permissions/runtime: check network access and account permissions; scripts need execute permission (e.g. `chmod +x run_etl.sh`).
app/etl_busy.py (new file, 9 lines)
@@ -0,0 +1,9 @@
# app/etl_busy.py
def run():
    """
    Busy-period fetch logic.
    TODO: implement the actual fetch flow here (API calls / page parsing / writes to PostgreSQL, etc.)
    """
    print("Running busy-period ETL...")
    # Example: hook up PostgreSQL or HTTP fetching here later
    # ...
app/etl_idle.py (new file, 8 lines)
@@ -0,0 +1,8 @@
# app/etl_idle.py
def run():
    """
    Idle-period fetch logic.
    Suitable for full syncs, large historical backfills, etc.
    """
    print("Running idle-period ETL...")
    # ...
app/runner.py (new file, 31 lines)
@@ -0,0 +1,31 @@
# app/runner.py
import argparse
from datetime import datetime

from . import etl_busy, etl_idle


def main():
    parser = argparse.ArgumentParser(description="Feiqiu ETL Runner")
    parser.add_argument(
        "--mode",
        choices=["busy", "idle"],
        required=True,
        help="ETL mode: busy or idle",
    )

    args = parser.parse_args()
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    print(f"[{now}] Start ETL mode={args.mode}")

    if args.mode == "busy":
        etl_busy.run()
    else:
        etl_idle.run()

    print(f"[{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}] ETL finished.")


if __name__ == "__main__":
    main()
etl_billiards/.env (new file, 49 lines)
@@ -0,0 +1,49 @@
# -*- coding: utf-8 -*-
# File notes: ETL environment variables (read by config/env_parser.py) for database connections, directories, and runtime parameters.

# Database connection string, config/env_parser.py -> db.dsn, required by all tasks
PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test
# Database connect timeout in seconds, config/env_parser.py -> db.connect_timeout_sec
PG_CONNECT_TIMEOUT=10

# Store/tenant ID, config/env_parser.py -> app.store_id, used by task scheduling records
STORE_ID=2790685415443269
# Timezone identifier, config/env_parser.py -> app.timezone
TIMEZONE=Asia/Taipei

# API base URL, config/env_parser.py -> api.base_url, used by FETCH-type tasks
API_BASE=https://api.example.com
# API auth token, config/env_parser.py -> api.token, used by FETCH-type tasks
API_TOKEN=your_token_here
# API request timeout in seconds, config/env_parser.py -> api.timeout_sec
API_TIMEOUT=20
# API page size, config/env_parser.py -> api.page_size
API_PAGE_SIZE=200
# Max API retries, config/env_parser.py -> api.retries.max_attempts
API_RETRY_MAX=3

# Log root directory, config/env_parser.py -> io.log_root, written by Init/task runs
LOG_ROOT=C:\dev\LLTQ\export\LOG
# JSON export root directory, config/env_parser.py -> io.export_root, FETCH output and INIT staging
EXPORT_ROOT=C:\dev\LLTQ\export\JSON

# Local output directory for FETCH mode, config/env_parser.py -> pipeline.fetch_root
FETCH_ROOT=C:\dev\LLTQ\export\JSON
# Local JSON ingest directory, config/env_parser.py -> pipeline.ingest_source_dir, used by MANUAL_INGEST/INGEST_ONLY
INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc

# Pretty-printed JSON output switch, config/env_parser.py -> io.write_pretty_json
WRITE_PRETTY_JSON=false

# Pipeline flow: FULL / FETCH_ONLY / INGEST_ONLY, config/env_parser.py -> pipeline.flow
PIPELINE_FLOW=FULL
# Explicit task list (comma-separated, overrides the default), config/env_parser.py -> run.tasks
# RUN_TASKS=INIT_ODS_SCHEMA,MANUAL_INGEST

# Window/backfill parameters, config/env_parser.py -> run.*
OVERLAP_SECONDS=120
WINDOW_BUSY_MIN=30
WINDOW_IDLE_MIN=180
IDLE_START=04:00
IDLE_END=16:00
ALLOW_EMPTY_RESULT_ADVANCE=true
etl_billiards/__init__.py (new file, empty)
etl_billiards/api/__init__.py (new file, empty)

etl_billiards/api/client.py (new file, 256 lines)
@@ -0,0 +1,256 @@
# -*- coding: utf-8 -*-
"""API client: unified wrapper for POST requests, retries, pagination, and list extraction."""
from __future__ import annotations

from typing import Iterable, Sequence, Tuple

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

DEFAULT_BROWSER_HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "Content-Type": "application/json",
    "Origin": "https://pc.ficoo.vip",
    "Referer": "https://pc.ficoo.vip/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "zh-CN,zh;q=0.9",
    "sec-ch-ua": '"Google Chrome";v="141", "Not?A_Brand";v="8", "Chromium";v="141"',
    "sec-ch-ua-platform": '"Windows"',
    "sec-ch-ua-mobile": "?0",
    "sec-fetch-site": "same-origin",
    "sec-fetch-mode": "cors",
    "sec-fetch-dest": "empty",
    "priority": "u=1, i",
    "X-Requested-With": "XMLHttpRequest",
    "DNT": "1",
}

DEFAULT_LIST_KEYS: Tuple[str, ...] = (
    "list",
    "rows",
    "records",
    "items",
    "dataList",
    "data_list",
    "tenantMemberInfos",
    "tenantMemberCardLogs",
    "tenantMemberCards",
    "settleList",
    "orderAssistantDetails",
    "assistantInfos",
    "siteTables",
    "taiFeeAdjustInfos",
    "siteTableUseDetailsList",
    "tenantGoodsList",
    "packageCouponList",
    "queryDeliveryRecordsList",
    "goodsCategoryList",
    "orderGoodsList",
    "orderGoodsLedgers",
)


class APIClient:
    """HTTP API client (defaults to POST with a JSON body)."""

    def __init__(
        self,
        base_url: str,
        token: str | None = None,
        timeout: int = 20,
        retry_max: int = 3,
        headers_extra: dict | None = None,
    ):
        self.base_url = (base_url or "").rstrip("/")
        self.token = self._normalize_token(token)
        self.timeout = timeout
        self.retry_max = retry_max
        self.headers_extra = headers_extra or {}
        self._session: requests.Session | None = None

    # ------------------------------------------------------------------ HTTP basics
    def _get_session(self) -> requests.Session:
        """Get or create a Session with retry support."""
        if self._session is None:
            self._session = requests.Session()

            retries = max(0, int(self.retry_max) - 1)
            retry = Retry(
                total=None,
                connect=retries,
                read=retries,
                status=retries,
                allowed_methods=frozenset(["GET", "POST"]),
                status_forcelist=(429, 500, 502, 503, 504),
                backoff_factor=0.5,
                respect_retry_after_header=True,
                raise_on_status=False,
            )

            adapter = HTTPAdapter(max_retries=retry)
            self._session.mount("http://", adapter)
            self._session.mount("https://", adapter)
            self._session.headers.update(self._build_headers())

        return self._session

    def get(self, endpoint: str, params: dict | None = None) -> dict:
        """
        Request entry point kept under its legacy name (actually issues a POST with a JSON body).
        """
        return self._post_json(endpoint, params)

    def _post_json(self, endpoint: str, payload: dict | None = None) -> dict:
        if not self.base_url:
            raise ValueError("API base_url is not configured")

        url = f"{self.base_url}/{endpoint.lstrip('/')}"
        sess = self._get_session()
        resp = sess.post(url, json=payload or {}, timeout=self.timeout)
        resp.raise_for_status()
        data = resp.json()
        self._ensure_success(data)
        return data

    def _build_headers(self) -> dict:
        headers = dict(DEFAULT_BROWSER_HEADERS)
        headers.update(self.headers_extra)
        if self.token:
            headers["Authorization"] = self.token
        return headers

    @staticmethod
    def _normalize_token(token: str | None) -> str | None:
        if not token:
            return None
        t = str(token).strip()
        if not t.lower().startswith("bearer "):
            t = f"Bearer {t}"
        return t

    @staticmethod
    def _ensure_success(payload: dict):
        """Raise when the API returns a non-zero code, so callers can retry/log."""
        if isinstance(payload, dict) and "code" in payload:
            code = payload.get("code")
            if code not in (0, "0", None):
                msg = payload.get("msg") or payload.get("message") or ""
                raise ValueError(f"API returned error code={code} msg={msg}")

    # ------------------------------------------------------------------ Pagination
    def iter_paginated(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int | None = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | Sequence[str] | None = None,
        page_start: int = 1,
        page_end: int | None = None,
    ) -> Iterable[tuple[int, list, dict, dict]]:
        """
        Page iterator: fetch page by page, yielding (page_no, records, request_params, raw_response).
        When page_size is None, no pagination parameters are sent and only one request is made.
        """
        base_params = dict(params or {})
        page = page_start

        while True:
            page_params = dict(base_params)
            if page_size is not None:
                page_params[page_field] = page
                page_params[size_field] = page_size

            payload = self._post_json(endpoint, page_params)
            records = self._extract_list(payload, data_path, list_key)

            yield page, records, page_params, payload

            if page_size is None:
                break
            if page_end is not None and page >= page_end:
                break
            if len(records) < (page_size or 0):
                break
            if len(records) == 0:
                break

            page += 1

    def get_paginated(
        self,
        endpoint: str,
        params: dict,
        page_size: int | None = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | Sequence[str] | None = None,
        page_start: int = 1,
        page_end: int | None = None,
    ) -> tuple[list, list]:
        """Fetch all pages and collect every record into a single list."""
        records, pages_meta = [], []

        for page_no, page_records, request_params, response in self.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
            page_start=page_start,
            page_end=page_end,
        ):
            records.extend(page_records)
            pages_meta.append(
                {"page": page_no, "request": request_params, "response": response}
            )

        return records, pages_meta

    # ------------------------------------------------------------------ Response parsing
    @classmethod
    def _extract_list(
        cls, payload: dict | list, data_path: tuple, list_key: str | Sequence[str] | None
    ) -> list:
        """Extract the list structure per data_path/list_key, tolerating common field names."""
        cur: object = payload

        if isinstance(cur, list):
            return cur

        for key in data_path:
            if isinstance(cur, dict):
                cur = cur.get(key)
            else:
                cur = None
            if cur is None:
                break

        if isinstance(cur, list):
            return cur

        if isinstance(cur, dict):
            if list_key:
                keys = (list_key,) if isinstance(list_key, str) else tuple(list_key)
                for k in keys:
                    if isinstance(cur.get(k), list):
                        return cur[k]

            for k in DEFAULT_LIST_KEYS:
                if isinstance(cur.get(k), list):
                    return cur[k]

            for v in cur.values():
                if isinstance(v, list):
                    return v

        return []
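A minimal usage sketch for the client above; the endpoint and `list_key` come from the ODS table overview in README_FULL, while the `storeId` parameter name is an assumption:

```python
# Hypothetical APIClient usage; the storeId parameter name is an assumption.
from api.client import APIClient

client = APIClient(
    base_url="https://pc.ficoo.vip/apiprod/admin/v1",  # default from config/defaults.py
    token="your_token_here",                           # normalized to "Bearer ..." by _normalize_token
)

records, pages_meta = client.get_paginated(
    endpoint="/MemberProfile/GetTenantMemberList",  # path from the ODS table overview
    params={"storeId": 2790685415443269},           # assumed parameter name
    page_size=200,
    list_key="tenantMemberInfos",                   # list field per the ODS table overview
)
print(f"fetched {len(records)} records over {len(pages_meta)} pages")
```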
etl_billiards/api/local_json_client.py (new file, 74 lines)
@@ -0,0 +1,74 @@
# -*- coding: utf-8 -*-
"""Local JSON client: mimics APIClient's pagination interface, replaying data from dumped JSON."""
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable, Tuple

from api.client import APIClient
from utils.json_store import endpoint_to_filename


class LocalJsonClient:
    """
    Reads JSON produced by RecordingAPIClient and exposes iter_paginated/get_paginated.
    """

    def __init__(self, base_dir: str | Path):
        self.base_dir = Path(base_dir)
        if not self.base_dir.exists():
            raise FileNotFoundError(f"JSON directory does not exist: {self.base_dir}")

    def iter_paginated(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> Iterable[Tuple[int, list, dict, dict]]:
        file_path = self.base_dir / endpoint_to_filename(endpoint)
        if not file_path.exists():
            raise FileNotFoundError(f"No matching JSON file found: {file_path}")

        with file_path.open("r", encoding="utf-8") as fp:
            payload = json.load(fp)

        pages = payload.get("pages")
        if not isinstance(pages, list) or not pages:
            pages = [{"page": 1, "request": params or {}, "response": payload}]

        for idx, page in enumerate(pages, start=1):
            response = page.get("response", {})
            request_params = page.get("request") or {}
            page_no = page.get("page") or idx
            records = APIClient._extract_list(response, data_path, list_key)  # type: ignore[attr-defined]
            yield page_no, records, request_params, response

    def get_paginated(
        self,
        endpoint: str,
        params: dict,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> tuple[list, list]:
        records: list = []
        pages_meta: list = []
        for page_no, page_records, request_params, response in self.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
        ):
            records.extend(page_records)
            pages_meta.append({"page": page_no, "request": request_params, "response": response})
        return records, pages_meta
etl_billiards/api/recording_client.py (new file, 118 lines)
@@ -0,0 +1,118 @@
# -*- coding: utf-8 -*-
"""Wraps APIClient, dumping paginated responses to disk for later local cleaning."""
from __future__ import annotations

from datetime import datetime
from pathlib import Path
from typing import Any, Iterable, Tuple

from api.client import APIClient
from utils.json_store import dump_json, endpoint_to_filename


class RecordingAPIClient:
    """
    Proxies APIClient: iter_paginated/get_paginated calls also write each response to a JSON file.
    The filename is derived from the endpoint and written under the given output_dir.
    """

    def __init__(
        self,
        base_client: APIClient,
        output_dir: Path | str,
        task_code: str,
        run_id: int,
        write_pretty: bool = False,
    ):
        self.base = base_client
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.task_code = task_code
        self.run_id = run_id
        self.write_pretty = write_pretty
        self.last_dump: dict[str, Any] | None = None

    # ------------------------------------------------------------------ public API
    def iter_paginated(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> Iterable[Tuple[int, list, dict, dict]]:
        pages: list[dict[str, Any]] = []
        total_records = 0

        for page_no, records, request_params, response in self.base.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
        ):
            pages.append({"page": page_no, "request": request_params, "response": response})
            total_records += len(records)
            yield page_no, records, request_params, response

        self._dump(endpoint, params, page_size, pages, total_records)

    def get_paginated(
        self,
        endpoint: str,
        params: dict,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> tuple[list, list]:
        records: list = []
        pages_meta: list = []

        for page_no, page_records, request_params, response in self.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
        ):
            records.extend(page_records)
            pages_meta.append({"page": page_no, "request": request_params, "response": response})

        return records, pages_meta

    # ------------------------------------------------------------------ internal
    def _dump(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int,
        pages: list[dict[str, Any]],
        total_records: int,
    ):
        filename = endpoint_to_filename(endpoint)
        path = self.output_dir / filename
        payload = {
            "task_code": self.task_code,
            "run_id": self.run_id,
            "endpoint": endpoint,
            "params": params or {},
            "page_size": page_size,
            "pages": pages,
            "total_records": total_records,
            "dumped_at": datetime.utcnow().isoformat() + "Z",
        }
        dump_json(path, payload, pretty=self.write_pretty)
        self.last_dump = {
            "file": str(path),
            "endpoint": endpoint,
            "pages": len(pages),
            "records": total_records,
        }
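A sketch of the record-then-replay round trip these two wrappers enable; directory, task code, and endpoint values are illustrative:

```python
# Hypothetical record/replay round trip; paths and codes are examples only.
from api.client import APIClient
from api.local_json_client import LocalJsonClient
from api.recording_client import RecordingAPIClient

live = APIClient(base_url="https://pc.ficoo.vip/apiprod/admin/v1", token="your_token_here")
recorder = RecordingAPIClient(live, output_dir=r"C:\dev\LLTQ\export\JSON",
                              task_code="MEMBERS", run_id=1)

# Online pass: pages are fetched and simultaneously dumped to one JSON file per endpoint.
records, _ = recorder.get_paginated("/MemberProfile/GetTenantMemberList", params={})
print(recorder.last_dump)  # {"file": ..., "endpoint": ..., "pages": ..., "records": ...}

# Offline pass: replay the dumped pages without touching the network.
replay = LocalJsonClient(r"C:\dev\LLTQ\export\JSON")
same_records, _ = replay.get_paginated("/MemberProfile/GetTenantMemberList", params={})
assert len(same_records) == len(records)
```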
etl_billiards/cli/__init__.py (new file, empty)

etl_billiards/cli/main.py (new file, 158 lines)
@@ -0,0 +1,158 @@
# -*- coding: utf-8 -*-
"""CLI entry point."""
import sys
import argparse
import logging
from pathlib import Path

from config.settings import AppConfig
from orchestration.scheduler import ETLScheduler

def setup_logging():
    """Configure logging."""
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s [%(levelname)s] %(name)s: %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )
    return logging.getLogger("etl_billiards")

def parse_args():
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(description="Billiards-hall ETL system")

    # Basic arguments
    parser.add_argument("--store-id", type=int, help="Store ID")
    parser.add_argument("--tasks", help="Task list, comma-separated")
    parser.add_argument("--dry-run", action="store_true", help="Dry run (no commit)")

    # Database arguments
    parser.add_argument("--pg-dsn", help="PostgreSQL DSN")
    parser.add_argument("--pg-host", help="PostgreSQL host")
    parser.add_argument("--pg-port", type=int, help="PostgreSQL port")
    parser.add_argument("--pg-name", help="PostgreSQL database name")
    parser.add_argument("--pg-user", help="PostgreSQL user")
    parser.add_argument("--pg-password", help="PostgreSQL password")

    # API arguments
    parser.add_argument("--api-base", help="API base URL")
    parser.add_argument("--api-token", "--token", dest="api_token", help="API token (Bearer token)")
    parser.add_argument("--api-timeout", type=int, help="API timeout (seconds)")
    parser.add_argument("--api-page-size", type=int, help="Page size")
    parser.add_argument("--api-retry-max", type=int, help="Max API retries")

    # Directory arguments
    parser.add_argument("--export-root", help="Export root directory")
    parser.add_argument("--log-root", help="Log root directory")

    # Fetch/ingest pipeline
    parser.add_argument("--pipeline-flow", choices=["FULL", "FETCH_ONLY", "INGEST_ONLY"], help="Pipeline mode")
    parser.add_argument("--fetch-root", help="Root directory for fetched JSON output")
    parser.add_argument("--ingest-source", help="Source directory for local ingest")
    parser.add_argument("--write-pretty-json", action="store_true", help="Pretty-print fetched JSON")

    # Run window
    parser.add_argument("--idle-start", help="Idle window start (HH:MM)")
    parser.add_argument("--idle-end", help="Idle window end (HH:MM)")
    parser.add_argument("--allow-empty-advance", action="store_true", help="Advance the window on empty results")

    return parser.parse_args()

def build_cli_overrides(args) -> dict:
    """Build config overrides from command-line arguments."""
    overrides = {}

    # Basics
    if args.store_id is not None:
        overrides.setdefault("app", {})["store_id"] = args.store_id

    # Database
    if args.pg_dsn:
        overrides.setdefault("db", {})["dsn"] = args.pg_dsn
    if args.pg_host:
        overrides.setdefault("db", {})["host"] = args.pg_host
    if args.pg_port:
        overrides.setdefault("db", {})["port"] = args.pg_port
    if args.pg_name:
        overrides.setdefault("db", {})["name"] = args.pg_name
    if args.pg_user:
        overrides.setdefault("db", {})["user"] = args.pg_user
    if args.pg_password:
        overrides.setdefault("db", {})["password"] = args.pg_password

    # API
    if args.api_base:
        overrides.setdefault("api", {})["base_url"] = args.api_base
    if args.api_token:
        overrides.setdefault("api", {})["token"] = args.api_token
    if args.api_timeout:
        overrides.setdefault("api", {})["timeout_sec"] = args.api_timeout
    if args.api_page_size:
        overrides.setdefault("api", {})["page_size"] = args.api_page_size
    if args.api_retry_max:
        overrides.setdefault("api", {}).setdefault("retries", {})["max_attempts"] = args.api_retry_max

    # Directories
    if args.export_root:
        overrides.setdefault("io", {})["export_root"] = args.export_root
    if args.log_root:
        overrides.setdefault("io", {})["log_root"] = args.log_root

    # Fetch/ingest pipeline
    if args.pipeline_flow:
        overrides.setdefault("pipeline", {})["flow"] = args.pipeline_flow.upper()
    if args.fetch_root:
        overrides.setdefault("pipeline", {})["fetch_root"] = args.fetch_root
    if args.ingest_source:
        overrides.setdefault("pipeline", {})["ingest_source_dir"] = args.ingest_source
    if args.write_pretty_json:
        overrides.setdefault("io", {})["write_pretty_json"] = True

    # Run window
    if args.idle_start:
        overrides.setdefault("run", {}).setdefault("idle_window", {})["start"] = args.idle_start
    if args.idle_end:
        overrides.setdefault("run", {}).setdefault("idle_window", {})["end"] = args.idle_end
    if args.allow_empty_advance:
        overrides.setdefault("run", {})["allow_empty_result_advance"] = True

    # Tasks
    if args.tasks:
        tasks = [t.strip().upper() for t in args.tasks.split(",") if t.strip()]
        overrides.setdefault("run", {})["tasks"] = tasks

    return overrides

def main():
    """Main entry point."""
    logger = setup_logging()
    args = parse_args()

    try:
        # Load configuration
        cli_overrides = build_cli_overrides(args)
        config = AppConfig.load(cli_overrides)

        logger.info("Configuration loaded")
        logger.info(f"Store ID: {config.get('app.store_id')}")
        logger.info(f"Task list: {config.get('run.tasks')}")

        # Create the scheduler
        scheduler = ETLScheduler(config, logger)

        # Run tasks
        task_codes = config.get("run.tasks")
        scheduler.run_tasks(task_codes)

        # Close connections
        scheduler.close()

        logger.info("ETL run complete")
        return 0

    except Exception as e:
        logger.error(f"ETL run failed: {e}", exc_info=True)
        return 1

if __name__ == "__main__":
    sys.exit(main())
etl_billiards/config/__init__.py (new file, empty)

etl_billiards/config/defaults.py (new file, 120 lines)
@@ -0,0 +1,120 @@
# -*- coding: utf-8 -*-
"""Configuration defaults."""

DEFAULTS = {
    "app": {
        "timezone": "Asia/Taipei",
        "store_id": "",
        "schema_oltp": "billiards",
        "schema_etl": "etl_admin",
    },
    "db": {
        "dsn": "",
        "host": "",
        "port": "",
        "name": "",
        "user": "",
        "password": "",
        "connect_timeout_sec": 20,
        "batch_size": 1000,
        "session": {
            "timezone": "Asia/Taipei",
            "statement_timeout_ms": 30000,
            "lock_timeout_ms": 5000,
            "idle_in_tx_timeout_ms": 600000,
        },
    },
    "api": {
        "base_url": "https://pc.ficoo.vip/apiprod/admin/v1",
        "token": None,
        "timeout_sec": 20,
        "page_size": 200,
        "params": {},
        "retries": {
            "max_attempts": 3,
            "backoff_sec": [1, 2, 4],
        },
        "headers_extra": {},
    },
    "run": {
        "tasks": [
            "PRODUCTS",
            "TABLES",
            "MEMBERS",
            "ASSISTANTS",
            "PACKAGES_DEF",
            "ORDERS",
            "PAYMENTS",
            "REFUNDS",
            "COUPON_USAGE",
            "INVENTORY_CHANGE",
            "TOPUPS",
            "TABLE_DISCOUNT",
            "ASSISTANT_ABOLISH",
            "LEDGER",
        ],
        "window_minutes": {
            "default_busy": 30,
            "default_idle": 180,
        },
        "overlap_seconds": 120,
        "idle_window": {
            "start": "04:00",
            "end": "16:00",
        },
        "allow_empty_result_advance": True,
    },
    "io": {
        "export_root": r"C:\dev\LLTQ\export\JSON",
        "log_root": r"C:\dev\LLTQ\export\LOG",
        "manifest_name": "manifest.json",
        "ingest_report_name": "ingest_report.json",
        "write_pretty_json": True,
        "max_file_bytes": 50 * 1024 * 1024,
    },
    "pipeline": {
        # Pipeline flow: FETCH_ONLY (online fetch to disk only), INGEST_ONLY (local clean + load), FULL (fetch + clean + load)
        "flow": "FULL",
        # Root directory for online-fetched JSON (subdirectories created per task, run_id, and time)
        "fetch_root": r"C:\dev\LLTQ\export\JSON",
        # JSON input directory for local clean + load (empty = default to this run's fetch directory)
        "ingest_source_dir": "",
    },
    "clean": {
        "log_unknown_fields": True,
        "unknown_fields_limit": 50,
        "hash_key": {
            "algo": "sha1",
            "salt": "",
        },
        "strict_numeric": True,
        "round_money_scale": 2,
    },
    "security": {
        "redact_in_logs": True,
        "redact_keys": ["token", "password", "Authorization"],
        "echo_token_in_logs": False,
    },
    "ods": {
        # ODS offline rebuild/replay settings (development/ops use only)
        "json_doc_dir": r"C:\dev\LLTQ\export\test-json-doc",
        "include_files": "",
        "drop_schema_first": True,
    },
}

# Task-code constants
TASK_ORDERS = "ORDERS"
TASK_PAYMENTS = "PAYMENTS"
TASK_REFUNDS = "REFUNDS"
TASK_INVENTORY_CHANGE = "INVENTORY_CHANGE"
TASK_COUPON_USAGE = "COUPON_USAGE"
TASK_MEMBERS = "MEMBERS"
TASK_ASSISTANTS = "ASSISTANTS"
TASK_PRODUCTS = "PRODUCTS"
TASK_TABLES = "TABLES"
TASK_PACKAGES_DEF = "PACKAGES_DEF"
TASK_TOPUPS = "TOPUPS"
TASK_TABLE_DISCOUNT = "TABLE_DISCOUNT"
TASK_ASSISTANT_ABOLISH = "ASSISTANT_ABOLISH"
TASK_LEDGER = "LEDGER"
etl_billiards/config/env_parser.py (new file, 175 lines)
@@ -0,0 +1,175 @@
# -*- coding: utf-8 -*-
"""Environment-variable parsing."""
import os
import json
from pathlib import Path
from copy import deepcopy

ENV_MAP = {
    "TIMEZONE": ("app.timezone",),
    "STORE_ID": ("app.store_id",),
    "SCHEMA_OLTP": ("app.schema_oltp",),
    "SCHEMA_ETL": ("app.schema_etl",),
    "PG_DSN": ("db.dsn",),
    "PG_HOST": ("db.host",),
    "PG_PORT": ("db.port",),
    "PG_NAME": ("db.name",),
    "PG_USER": ("db.user",),
    "PG_PASSWORD": ("db.password",),
    "PG_CONNECT_TIMEOUT": ("db.connect_timeout_sec",),
    "API_BASE": ("api.base_url",),
    "API_TOKEN": ("api.token",),
    "FICOO_TOKEN": ("api.token",),
    "API_TIMEOUT": ("api.timeout_sec",),
    "API_PAGE_SIZE": ("api.page_size",),
    "API_RETRY_MAX": ("api.retries.max_attempts",),
    "API_RETRY_BACKOFF": ("api.retries.backoff_sec",),
    "API_PARAMS": ("api.params",),
    "EXPORT_ROOT": ("io.export_root",),
    "LOG_ROOT": ("io.log_root",),
    "MANIFEST_NAME": ("io.manifest_name",),
    "INGEST_REPORT_NAME": ("io.ingest_report_name",),
    "WRITE_PRETTY_JSON": ("io.write_pretty_json",),
    "RUN_TASKS": ("run.tasks",),
    "OVERLAP_SECONDS": ("run.overlap_seconds",),
    "WINDOW_BUSY_MIN": ("run.window_minutes.default_busy",),
    "WINDOW_IDLE_MIN": ("run.window_minutes.default_idle",),
    "IDLE_START": ("run.idle_window.start",),
    "IDLE_END": ("run.idle_window.end",),
    "IDLE_WINDOW_START": ("run.idle_window.start",),
    "IDLE_WINDOW_END": ("run.idle_window.end",),
    "ALLOW_EMPTY_RESULT_ADVANCE": ("run.allow_empty_result_advance",),
    "ALLOW_EMPTY_ADVANCE": ("run.allow_empty_result_advance",),
    "PIPELINE_FLOW": ("pipeline.flow",),
    "JSON_FETCH_ROOT": ("pipeline.fetch_root",),
    "JSON_SOURCE_DIR": ("pipeline.ingest_source_dir",),
    "FETCH_ROOT": ("pipeline.fetch_root",),
    "INGEST_SOURCE_DIR": ("pipeline.ingest_source_dir",),
}


def _deep_set(d, dotted_keys, value):
    cur = d
    for k in dotted_keys[:-1]:
        cur = cur.setdefault(k, {})
    cur[dotted_keys[-1]] = value


def _coerce_env(v: str):
    if v is None:
        return None
    s = v.strip()
    if s.lower() in ("true", "false"):
        return s.lower() == "true"
    try:
        if s.isdigit() or (s.startswith("-") and s[1:].isdigit()):
            return int(s)
    except Exception:
        pass
    if (s.startswith("{") and s.endswith("}")) or (s.startswith("[") and s.endswith("]")):
        try:
            return json.loads(s)
        except Exception:
            return s
    return s


def _strip_inline_comment(value: str) -> str:
    """Strip inline comments that are not inside quotes."""
    result = []
    in_quote = False
    quote_char = ""
    escape = False
    for ch in value:
        if escape:
            result.append(ch)
            escape = False
            continue
        if ch == "\\":
            escape = True
            result.append(ch)
            continue
        if ch in ("'", '"'):
            if not in_quote:
                in_quote = True
                quote_char = ch
            elif quote_char == ch:
                in_quote = False
                quote_char = ""
            result.append(ch)
            continue
        if ch == "#" and not in_quote:
            break
        result.append(ch)
    return "".join(result).rstrip()


def _unquote_value(value: str) -> str:
    """Handle quoting/raw strings and trailing commas."""
    trimmed = value.strip()
    trimmed = _strip_inline_comment(trimmed)
    trimmed = trimmed.rstrip(",").rstrip()
    if not trimmed:
        return trimmed
    if len(trimmed) >= 2 and trimmed[0] in ("'", '"') and trimmed[-1] == trimmed[0]:
        return trimmed[1:-1]
    if (
        len(trimmed) >= 3
        and trimmed[0] in ("r", "R")
        and trimmed[1] in ("'", '"')
        and trimmed[-1] == trimmed[1]
    ):
        return trimmed[2:-1]
    return trimmed


def _parse_dotenv_line(line: str) -> tuple[str, str] | None:
    """Parse a single line from the .env file."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    if stripped.startswith("export "):
        stripped = stripped[len("export ") :].strip()
    if "=" not in stripped:
        return None
    key, value = stripped.split("=", 1)
    key = key.strip()
    value = _unquote_value(value)
    return key, value


def _load_dotenv_values() -> dict:
    """Read key/value pairs from the .env file at the project root."""
    if os.environ.get("ETL_SKIP_DOTENV") in ("1", "true", "TRUE", "True"):
        return {}
    root = Path(__file__).resolve().parents[1]
    dotenv_path = root / ".env"
    if not dotenv_path.exists():
        return {}
    values: dict[str, str] = {}
    for line in dotenv_path.read_text(encoding="utf-8", errors="ignore").splitlines():
        parsed = _parse_dotenv_line(line)
        if parsed:
            key, value = parsed
            values[key] = value
    return values


def _apply_env_values(cfg: dict, source: dict):
    for env_key, dotted in ENV_MAP.items():
        val = source.get(env_key)
        if val is None:
            continue
        v2 = _coerce_env(val)
        for path in dotted:
            if path == "run.tasks" and isinstance(v2, str):
                v2 = [item.strip() for item in v2.split(",") if item.strip()]
            _deep_set(cfg, path.split("."), v2)


def load_env_overrides(defaults: dict) -> dict:
    cfg = deepcopy(defaults)
    # Apply .env first, then real environment variables, so CLI overrides stay highest priority
    _apply_env_values(cfg, _load_dotenv_values())
    _apply_env_values(cfg, os.environ)
    return cfg
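Illustrative expectations for the coercion helper above, written as bare assertions rather than a shipped test file:

```python
# How _coerce_env maps raw .env strings to typed values.
from config.env_parser import _coerce_env

assert _coerce_env("true") is True               # booleans, case-insensitive
assert _coerce_env("200") == 200                 # plain integers become int
assert _coerce_env("-5") == -5                   # negative integers too
assert _coerce_env('["a", "b"]') == ["a", "b"]   # JSON arrays/objects are parsed
assert _coerce_env("04:00") == "04:00"           # everything else stays a string
```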
etl_billiards/config/settings.py (new file, 92 lines)
@@ -0,0 +1,92 @@
# -*- coding: utf-8 -*-
"""Main configuration class."""
from copy import deepcopy
from .defaults import DEFAULTS
from .env_parser import load_env_overrides

class AppConfig:
    """Application configuration manager."""

    def __init__(self, config_dict: dict):
        self.config = config_dict

    @classmethod
    def load(cls, cli_overrides: dict = None):
        """Load configuration: DEFAULTS < ENV < CLI."""
        cfg = load_env_overrides(DEFAULTS)

        if cli_overrides:
            cls._deep_merge(cfg, cli_overrides)

        # Normalize
        cls._normalize(cfg)
        cls._validate(cfg)

        return cls(cfg)

    @staticmethod
    def _deep_merge(dst, src):
        """Deep-merge dictionaries."""
        for k, v in src.items():
            if isinstance(v, dict) and isinstance(dst.get(k), dict):
                AppConfig._deep_merge(dst[k], v)
            else:
                dst[k] = v

    @staticmethod
    def _normalize(cfg):
        """Normalize configuration values."""
        # Convert store_id to an integer
        try:
            cfg["app"]["store_id"] = int(str(cfg["app"]["store_id"]).strip())
        except Exception:
            raise SystemExit("app.store_id must be an integer")

        # Assemble the DSN
        if not cfg["db"]["dsn"]:
            cfg["db"]["dsn"] = (
                f"postgresql://{cfg['db']['user']}:{cfg['db']['password']}"
                f"@{cfg['db']['host']}:{cfg['db']['port']}/{cfg['db']['name']}"
            )

        # connect_timeout is capped to the 1-20 second range
        try:
            timeout_sec = int(cfg["db"].get("connect_timeout_sec") or 5)
        except Exception:
            raise SystemExit("db.connect_timeout_sec must be an integer")
        cfg["db"]["connect_timeout_sec"] = max(1, min(timeout_sec, 20))

        # Session parameters
        cfg["db"].setdefault("session", {})
        sess = cfg["db"]["session"]
        sess.setdefault("timezone", cfg["app"]["timezone"])

        for k in ("statement_timeout_ms", "lock_timeout_ms", "idle_in_tx_timeout_ms"):
            if k in sess and sess[k] is not None:
                try:
                    sess[k] = int(sess[k])
                except Exception:
                    raise SystemExit(f"db.session.{k} must be an integer (milliseconds)")

    @staticmethod
    def _validate(cfg):
        """Validate required settings."""
        missing = []
        if not cfg["app"]["store_id"]:
            missing.append("app.store_id")
        if missing:
            raise SystemExit("Missing required configuration: " + ", ".join(missing))

    def get(self, key: str, default=None):
        """Get a config value (dotted-path lookup supported)."""
        keys = key.split(".")
        val = self.config
        for k in keys:
            if isinstance(val, dict):
                val = val.get(k)
            else:
                return default
        return val if val is not None else default

    def __getitem__(self, key):
        return self.config[key]
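A sketch of the layering `AppConfig.load` performs (DEFAULTS < .env/environment < CLI); the DSN below is a placeholder:

```python
# Illustrative configuration layering and dotted-path reads.
from config.settings import AppConfig

cli_overrides = {
    "app": {"store_id": 2790685415443269},
    "db": {"dsn": "postgresql://user:pwd@host:5432/db"},  # placeholder DSN
    "run": {"tasks": ["INIT_ODS_SCHEMA", "MANUAL_INGEST"]},
}
config = AppConfig.load(cli_overrides)

print(config.get("app.store_id"))            # 2790685415443269
print(config.get("db.connect_timeout_sec"))  # capped into the 1-20s range by _normalize
print(config.get("api.page_size", 200))      # default returned when a key is absent
```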
etl_billiards/database/__init__.py (new file, empty)

etl_billiards/database/base.py (new file, 112 lines)
@@ -0,0 +1,112 @@
# -*- coding: utf-8 -*-
"""
Database operations (batching, RETURNING support).
"""
import re
from typing import List, Dict, Tuple
import psycopg2.extras
from .connection import DatabaseConnection


class DatabaseOperations(DatabaseConnection):
    """Extended database operations (batch upsert with RETURNING support)."""

    def batch_execute(self, sql: str, rows: List[Dict], page_size: int = 1000):
        """Execute SQL in batches (no RETURNING)."""
        if not rows:
            return
        with self.conn.cursor() as c:
            psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)

    def batch_upsert_with_returning(self, sql: str, rows: List[Dict], page_size: int = 1000) -> Tuple[int, int]:
        """
        Batch UPSERT while counting inserts/updates.

        Args:
            sql: SQL containing a RETURNING clause
            rows: list of data rows
            page_size: batch size

        Returns:
            (inserted_count, updated_count) tuple
        """
        if not rows:
            return (0, 0)

        use_returning = "RETURNING" in sql.upper()

        with self.conn.cursor() as c:
            if not use_returning:
                psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)
                return (0, 0)

            # Prefer vectorized execution
            try:
                inserted, updated = self._execute_with_returning_vectorized(c, sql, rows, page_size)
                return (inserted, updated)
            except Exception:
                # Fall back to row-by-row execution
                return self._execute_with_returning_row_by_row(c, sql, rows)

    def _execute_with_returning_vectorized(self, cursor, sql: str, rows: List[Dict], page_size: int) -> Tuple[int, int]:
        """Vectorized execution (via execute_values)."""
        # Parse the VALUES clause
        m = re.search(r"VALUES\s*\((.*?)\)", sql, flags=re.IGNORECASE | re.DOTALL)
        if not m:
            raise ValueError("Cannot parse VALUES clause")

        tpl = "(" + m.group(1) + ")"
        base_sql = sql[:m.start()] + "VALUES %s" + sql[m.end():]

        ret = psycopg2.extras.execute_values(
            cursor, base_sql, rows, template=tpl, page_size=page_size, fetch=True
        )

        if not ret:
            return (0, 0)

        inserted = 0
        for rec in ret:
            flag = self._extract_inserted_flag(rec)
            if flag:
                inserted += 1

        return (inserted, len(ret) - inserted)

    def _execute_with_returning_row_by_row(self, cursor, sql: str, rows: List[Dict]) -> Tuple[int, int]:
        """Row-by-row execution (fallback)."""
        inserted = 0
        updated = 0

        for r in rows:
            cursor.execute(sql, r)
            try:
                rec = cursor.fetchone()
            except Exception:
                rec = None

            flag = self._extract_inserted_flag(rec) if rec else None

            if flag:
                inserted += 1
            else:
                updated += 1

        return (inserted, updated)

    @staticmethod
    def _extract_inserted_flag(rec) -> bool:
        """Extract the inserted flag from a returned record."""
        if isinstance(rec, tuple):
            return bool(rec[0])
        elif isinstance(rec, dict):
            return bool(rec.get("inserted"))
        else:
            try:
                return bool(rec["inserted"])
            except Exception:
                return False


# Pg alias retained for backward compatibility
Pg = DatabaseOperations
etl_billiards/database/connection.py (new file, 63 lines)
@@ -0,0 +1,63 @@
# -*- coding: utf-8 -*-
"""Database connection manager with capped connect_timeout."""

import psycopg2
import psycopg2.extras


class DatabaseConnection:
    """Wrap psycopg2 connection with session parameters and timeout guard."""

    def __init__(self, dsn: str, session: dict = None, connect_timeout: int = None):
        timeout_val = connect_timeout if connect_timeout is not None else 5
        # PRD: database connect_timeout must not exceed 20 seconds.
        timeout_val = max(1, min(int(timeout_val), 20))

        self.conn = psycopg2.connect(dsn, connect_timeout=timeout_val)
        self.conn.autocommit = False

        # Session parameters (timezone, statement timeout, etc.)
        if session:
            with self.conn.cursor() as c:
                if session.get("timezone"):
                    c.execute("SET TIME ZONE %s", (session["timezone"],))
                if session.get("statement_timeout_ms") is not None:
                    c.execute(
                        "SET statement_timeout = %s",
                        (int(session["statement_timeout_ms"]),),
                    )
                if session.get("lock_timeout_ms") is not None:
                    c.execute(
                        "SET lock_timeout = %s", (int(session["lock_timeout_ms"]),)
                    )
                if session.get("idle_in_tx_timeout_ms") is not None:
                    c.execute(
                        "SET idle_in_transaction_session_timeout = %s",
                        (int(session["idle_in_tx_timeout_ms"]),),
                    )

    def query(self, sql: str, args=None):
        """Execute a query and fetch all rows."""
        with self.conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as c:
            c.execute(sql, args)
            return c.fetchall()

    def execute(self, sql: str, args=None):
        """Execute a SQL statement without returning rows."""
        with self.conn.cursor() as c:
            c.execute(sql, args)

    def commit(self):
        """Commit current transaction."""
        self.conn.commit()

    def rollback(self):
        """Rollback current transaction."""
        self.conn.rollback()

    def close(self):
        """Safely close the connection."""
        try:
            self.conn.close()
        except Exception:
            pass
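A construction sketch mirroring the `db.session` defaults in `config/defaults.py`; the DSN is a placeholder:

```python
# Illustrative DatabaseConnection setup with the session defaults from config/defaults.py.
from database.connection import DatabaseConnection

db = DatabaseConnection(
    dsn="postgresql://user:pwd@host:5432/db",  # placeholder DSN
    session={
        "timezone": "Asia/Taipei",
        "statement_timeout_ms": 30000,
        "lock_timeout_ms": 5000,
        "idle_in_tx_timeout_ms": 600000,
    },
    connect_timeout=10,  # values above 20 are capped per the PRD comment
)
rows = db.query("SELECT 1 AS ok")  # RealDictCursor rows: [{'ok': 1}]
db.close()
```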
etl_billiards/database/operations.py (new file, 99 lines)
@@ -0,0 +1,99 @@
# -*- coding: utf-8 -*-
"""数据库批量操作"""
import psycopg2.extras
import re


class DatabaseOperations:
    """数据库批量操作封装"""

    def __init__(self, connection):
        self._connection = connection
        self.conn = connection.conn

    def batch_execute(self, sql: str, rows: list, page_size: int = 1000):
        """批量执行SQL"""
        if not rows:
            return
        with self.conn.cursor() as c:
            psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)

    def batch_upsert_with_returning(self, sql: str, rows: list,
                                    page_size: int = 1000) -> tuple:
        """批量UPSERT并返回插入/更新计数"""
        if not rows:
            return (0, 0)

        use_returning = "RETURNING" in sql.upper()

        with self.conn.cursor() as c:
            if not use_returning:
                psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)
                return (0, 0)

            # 尝试向量化执行
            try:
                m = re.search(r"VALUES\s*\((.*?)\)", sql, flags=re.IGNORECASE | re.DOTALL)
                if m:
                    tpl = "(" + m.group(1) + ")"
                    base_sql = sql[:m.start()] + "VALUES %s" + sql[m.end():]

                    ret = psycopg2.extras.execute_values(
                        c, base_sql, rows, template=tpl, page_size=page_size, fetch=True
                    )

                    if not ret:
                        return (0, 0)

                    inserted = sum(1 for rec in ret if self._is_inserted(rec))
                    return (inserted, len(ret) - inserted)
            except Exception:
                pass

            # 回退:逐行执行
            inserted = 0
            updated = 0
            for r in rows:
                c.execute(sql, r)
                try:
                    rec = c.fetchone()
                except Exception:
                    rec = None

                if self._is_inserted(rec):
                    inserted += 1
                else:
                    updated += 1

            return (inserted, updated)

    @staticmethod
    def _is_inserted(rec) -> bool:
        """判断是否为插入操作"""
        if rec is None:
            return False
        if isinstance(rec, tuple):
            return bool(rec[0])
        if isinstance(rec, dict):
            return bool(rec.get("inserted"))
        return False

    # --- pass-through helpers -------------------------------------------------
    def commit(self):
        """提交事务(委托给底层连接)"""
        self._connection.commit()

    def rollback(self):
        """回滚事务(委托给底层连接)"""
        self._connection.rollback()

    def query(self, sql: str, args=None):
        """执行查询并返回结果"""
        return self._connection.query(sql, args)

    def execute(self, sql: str, args=None):
        """执行任意 SQL"""
        self._connection.execute(sql, args)

    def cursor(self):
        """暴露原生 cursor,供特殊操作使用"""
        return self.conn.cursor()
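用法示意(表名与数据为演示假设):batch_upsert_with_returning 先尝试把 VALUES 子句改写为 execute_values 形式;注意对本仓库常用的 %(col)s 命名占位符,正则截取到的模板在 mogrify 阶段即失败(尚未执行任何语句,事务未被污染),于是自然落入逐行回退,计数结果一致,只是失去批量优化。

# 演示:demo.t 为假设的表
ops = DatabaseOperations(db)
sql = (
    "INSERT INTO demo.t (id, name) VALUES (%(id)s, %(name)s) "
    "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name "
    "RETURNING (xmax = 0) AS inserted"
)
ins, upd = ops.batch_upsert_with_returning(sql, [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
ops.commit()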
1886
etl_billiards/database/schema_ODS_doc copy.sql
Normal file
File diff suppressed because it is too large
1907
etl_billiards/database/schema_ODS_doc.sql
Normal file
File diff suppressed because it is too large
1878
etl_billiards/database/schema_dwd_doc.sql
Normal file
File diff suppressed because it is too large
105
etl_billiards/database/schema_etl_admin.sql
Normal file
@@ -0,0 +1,105 @@
-- 文件说明:etl_admin 调度元数据 DDL(独立文件,便于初始化任务单独执行)。
-- 包含任务注册表、游标表、运行记录表;字段注释使用中文。

CREATE SCHEMA IF NOT EXISTS etl_admin;

CREATE TABLE IF NOT EXISTS etl_admin.etl_task (
    task_id BIGSERIAL PRIMARY KEY,
    task_code TEXT NOT NULL,
    store_id BIGINT NOT NULL,
    enabled BOOLEAN DEFAULT TRUE,
    cursor_field TEXT,
    window_minutes_default INT DEFAULT 30,
    overlap_seconds INT DEFAULT 120,
    page_size INT DEFAULT 200,
    retry_max INT DEFAULT 3,
    params JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now(),
    UNIQUE (task_code, store_id)
);
COMMENT ON TABLE etl_admin.etl_task IS '任务注册表:调度依据的任务清单(与 task_registry 中的任务码对应)。';
COMMENT ON COLUMN etl_admin.etl_task.task_code IS '任务编码,需与代码中的任务码一致。';
COMMENT ON COLUMN etl_admin.etl_task.store_id IS '门店/租户粒度,区分多门店执行。';
COMMENT ON COLUMN etl_admin.etl_task.enabled IS '是否启用此任务。';
COMMENT ON COLUMN etl_admin.etl_task.cursor_field IS '增量游标字段名(可选)。';
COMMENT ON COLUMN etl_admin.etl_task.window_minutes_default IS '默认时间窗口(分钟)。';
COMMENT ON COLUMN etl_admin.etl_task.overlap_seconds IS '窗口重叠秒数,用于防止遗漏。';
COMMENT ON COLUMN etl_admin.etl_task.page_size IS '默认分页大小。';
COMMENT ON COLUMN etl_admin.etl_task.retry_max IS 'API重试次数上限。';
COMMENT ON COLUMN etl_admin.etl_task.params IS '任务级自定义参数 JSON。';
COMMENT ON COLUMN etl_admin.etl_task.created_at IS '创建时间。';
COMMENT ON COLUMN etl_admin.etl_task.updated_at IS '更新时间。';

CREATE TABLE IF NOT EXISTS etl_admin.etl_cursor (
    cursor_id BIGSERIAL PRIMARY KEY,
    task_id BIGINT NOT NULL REFERENCES etl_admin.etl_task(task_id) ON DELETE CASCADE,
    store_id BIGINT NOT NULL,
    last_start TIMESTAMPTZ,
    last_end TIMESTAMPTZ,
    last_id BIGINT,
    last_run_id BIGINT,
    extra JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now(),
    UNIQUE (task_id, store_id)
);
COMMENT ON TABLE etl_admin.etl_cursor IS '任务游标表:记录每个任务/门店的增量窗口及最后 run。';
COMMENT ON COLUMN etl_admin.etl_cursor.task_id IS '关联 etl_task.task_id。';
COMMENT ON COLUMN etl_admin.etl_cursor.store_id IS '门店/租户粒度。';
COMMENT ON COLUMN etl_admin.etl_cursor.last_start IS '上次窗口开始时间(含重叠偏移)。';
COMMENT ON COLUMN etl_admin.etl_cursor.last_end IS '上次窗口结束时间。';
COMMENT ON COLUMN etl_admin.etl_cursor.last_id IS '上次处理的最大主键/游标值(可选)。';
COMMENT ON COLUMN etl_admin.etl_cursor.last_run_id IS '上次运行ID,对应 etl_run.run_id。';
COMMENT ON COLUMN etl_admin.etl_cursor.extra IS '附加游标信息 JSON。';
COMMENT ON COLUMN etl_admin.etl_cursor.created_at IS '创建时间。';
COMMENT ON COLUMN etl_admin.etl_cursor.updated_at IS '更新时间。';

CREATE TABLE IF NOT EXISTS etl_admin.etl_run (
    run_id BIGSERIAL PRIMARY KEY,
    run_uuid TEXT NOT NULL,
    task_id BIGINT NOT NULL REFERENCES etl_admin.etl_task(task_id) ON DELETE CASCADE,
    store_id BIGINT NOT NULL,
    status TEXT NOT NULL,
    started_at TIMESTAMPTZ DEFAULT now(),
    ended_at TIMESTAMPTZ,
    window_start TIMESTAMPTZ,
    window_end TIMESTAMPTZ,
    window_minutes INT,
    overlap_seconds INT,
    fetched_count INT DEFAULT 0,
    loaded_count INT DEFAULT 0,
    updated_count INT DEFAULT 0,
    skipped_count INT DEFAULT 0,
    error_count INT DEFAULT 0,
    unknown_fields INT DEFAULT 0,
    export_dir TEXT,
    log_path TEXT,
    request_params JSONB DEFAULT '{}'::jsonb,
    manifest JSONB DEFAULT '{}'::jsonb,
    error_message TEXT,
    extra JSONB DEFAULT '{}'::jsonb
);
COMMENT ON TABLE etl_admin.etl_run IS '运行记录表:记录每次任务执行的窗口、状态、计数与日志路径。';
COMMENT ON COLUMN etl_admin.etl_run.run_uuid IS '本次调度的唯一标识。';
COMMENT ON COLUMN etl_admin.etl_run.task_id IS '关联 etl_task.task_id。';
COMMENT ON COLUMN etl_admin.etl_run.store_id IS '门店/租户粒度。';
COMMENT ON COLUMN etl_admin.etl_run.status IS '运行状态(SUCC/FAIL/PARTIAL 等)。';
COMMENT ON COLUMN etl_admin.etl_run.started_at IS '开始时间。';
COMMENT ON COLUMN etl_admin.etl_run.ended_at IS '结束时间。';
COMMENT ON COLUMN etl_admin.etl_run.window_start IS '本次窗口开始时间。';
COMMENT ON COLUMN etl_admin.etl_run.window_end IS '本次窗口结束时间。';
COMMENT ON COLUMN etl_admin.etl_run.window_minutes IS '窗口跨度(分钟)。';
COMMENT ON COLUMN etl_admin.etl_run.overlap_seconds IS '窗口重叠秒数。';
COMMENT ON COLUMN etl_admin.etl_run.fetched_count IS '抓取/读取的记录数。';
COMMENT ON COLUMN etl_admin.etl_run.loaded_count IS '插入的记录数。';
COMMENT ON COLUMN etl_admin.etl_run.updated_count IS '更新的记录数。';
COMMENT ON COLUMN etl_admin.etl_run.skipped_count IS '跳过的记录数。';
COMMENT ON COLUMN etl_admin.etl_run.error_count IS '错误记录数。';
COMMENT ON COLUMN etl_admin.etl_run.unknown_fields IS '未知字段计数(清洗阶段)。';
COMMENT ON COLUMN etl_admin.etl_run.export_dir IS '抓取/导出目录。';
COMMENT ON COLUMN etl_admin.etl_run.log_path IS '日志路径。';
COMMENT ON COLUMN etl_admin.etl_run.request_params IS '请求参数 JSON。';
COMMENT ON COLUMN etl_admin.etl_run.manifest IS '运行产出清单/统计 JSON。';
COMMENT ON COLUMN etl_admin.etl_run.error_message IS '错误信息(若失败)。';
COMMENT ON COLUMN etl_admin.etl_run.extra IS '附加字段,保留扩展。';
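-- 用法示意(仅为演示游标表的写法,并非调度器 run_tracker 的实际语句):
-- 一次运行结束后推进该任务/门店的增量游标
INSERT INTO etl_admin.etl_cursor (task_id, store_id, last_start, last_end, last_run_id)
VALUES (1, 2790685415443269, now() - interval '30 minutes', now(), 42)
ON CONFLICT (task_id, store_id) DO UPDATE
SET last_start = EXCLUDED.last_start,
    last_end = EXCLUDED.last_end,
    last_run_id = EXCLUDED.last_run_id,
    updated_at = now();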
41
etl_billiards/database/seed_ods_tasks.sql
Normal file
@@ -0,0 +1,41 @@
-- 将新的 ODS 任务注册到 etl_admin.etl_task(根据需要替换 store_id)。
-- 使用方式(示例):
-- psql "$PG_DSN" -f etl_billiards/database/seed_ods_tasks.sql
-- 或者在 psql 中执行本文件内容。

WITH target_store AS (
    SELECT 2790685415443269::bigint AS store_id -- TODO: 替换为实际 store_id
),
task_codes AS (
    SELECT unnest(ARRAY[
        'assistant_accounts_masterS',
        'assistant_service_records',
        'assistant_cancellation_records',
        'goods_stock_movements',
        'ODS_INVENTORY_STOCK',
        'ODS_PACKAGE',
        'ODS_GROUP_BUY_REDEMPTION',
        'ODS_MEMBER',
        'ODS_MEMBER_BALANCE',
        'member_stored_value_cards',
        'ODS_PAYMENT',
        'ODS_REFUND',
        'platform_coupon_redemption_records',
        'recharge_settlements',
        'ODS_TABLES',
        'ODS_GOODS_CATEGORY',
        'ODS_STORE_GOODS',
        'table_fee_discount_records',
        'ODS_TENANT_GOODS',
        'ODS_SETTLEMENT_TICKET',
        'settlement_records',
        'INIT_ODS_SCHEMA'
    ]) AS task_code
)
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
SELECT t.task_code, s.store_id, TRUE
FROM task_codes t CROSS JOIN target_store s
ON CONFLICT (task_code, store_id) DO UPDATE
SET enabled = EXCLUDED.enabled;
0
etl_billiards/loaders/__init__.py
Normal file
23
etl_billiards/loaders/base_loader.py
Normal file
@@ -0,0 +1,23 @@
# -*- coding: utf-8 -*-
"""数据加载器基类"""

import logging


class BaseLoader:
    """数据加载器基类"""

    def __init__(self, db_ops, logger=None):
        self.db = db_ops
        self.logger = logger or logging.getLogger(self.__class__.__name__)

    def upsert(self, records: list) -> tuple:
        """
        执行 UPSERT 操作
        返回: (inserted_count, updated_count, skipped_count)
        """
        raise NotImplementedError("子类需实现 upsert 方法")

    def _batch_size(self) -> int:
        """批次大小"""
        return 1000
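最小子类示意(demo.dim_example 及其字段均为假设,只为展示 BaseLoader 约定的 upsert 契约与三元组返回值):

class ExampleLoader(BaseLoader):
    def upsert(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)
        sql = """
            INSERT INTO demo.dim_example (id, name) VALUES (%(id)s, %(name)s)
            ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name
            RETURNING (xmax = 0) AS inserted
        """
        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)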
0
etl_billiards/loaders/dimensions/__init__.py
Normal file
114
etl_billiards/loaders/dimensions/assistant.py
Normal file
@@ -0,0 +1,114 @@
# -*- coding: utf-8 -*-
"""助教维度加载器"""

from ..base_loader import BaseLoader


class AssistantLoader(BaseLoader):
    """写入 dim_assistant"""

    def upsert_assistants(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.dim_assistant (
                store_id, assistant_id, assistant_no, nickname, real_name,
                gender, mobile, level, team_id, team_name,
                assistant_status, work_status, entry_time, resign_time,
                start_time, end_time, create_time, update_time,
                system_role_id, online_status, allow_cx, charge_way,
                pd_unit_price, cx_unit_price, is_guaranteed, is_team_leader,
                serial_number, show_sort, is_delete, raw_data
            )
            VALUES (
                %(store_id)s, %(assistant_id)s, %(assistant_no)s, %(nickname)s, %(real_name)s,
                %(gender)s, %(mobile)s, %(level)s, %(team_id)s, %(team_name)s,
                %(assistant_status)s, %(work_status)s, %(entry_time)s, %(resign_time)s,
                %(start_time)s, %(end_time)s, %(create_time)s, %(update_time)s,
                %(system_role_id)s, %(online_status)s, %(allow_cx)s, %(charge_way)s,
                %(pd_unit_price)s, %(cx_unit_price)s, %(is_guaranteed)s, %(is_team_leader)s,
                %(serial_number)s, %(show_sort)s, %(is_delete)s, %(raw_data)s
            )
            ON CONFLICT (store_id, assistant_id) DO UPDATE SET
                assistant_no = EXCLUDED.assistant_no,
                nickname = EXCLUDED.nickname,
                real_name = EXCLUDED.real_name,
                gender = EXCLUDED.gender,
                mobile = EXCLUDED.mobile,
                level = EXCLUDED.level,
                team_id = EXCLUDED.team_id,
                team_name = EXCLUDED.team_name,
                assistant_status = EXCLUDED.assistant_status,
                work_status = EXCLUDED.work_status,
                entry_time = EXCLUDED.entry_time,
                resign_time = EXCLUDED.resign_time,
                start_time = EXCLUDED.start_time,
                end_time = EXCLUDED.end_time,
                update_time = COALESCE(EXCLUDED.update_time, now()),
                system_role_id = EXCLUDED.system_role_id,
                online_status = EXCLUDED.online_status,
                allow_cx = EXCLUDED.allow_cx,
                charge_way = EXCLUDED.charge_way,
                pd_unit_price = EXCLUDED.pd_unit_price,
                cx_unit_price = EXCLUDED.cx_unit_price,
                is_guaranteed = EXCLUDED.is_guaranteed,
                is_team_leader = EXCLUDED.is_team_leader,
                serial_number = EXCLUDED.serial_number,
                show_sort = EXCLUDED.show_sort,
                is_delete = EXCLUDED.is_delete,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
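说明:上面的 `RETURNING (xmax = 0) AS inserted` 是 PostgreSQL 区分"插入还是更新"的惯用技巧——新插入的元组 xmax 为 0,被 ON CONFLICT ... DO UPDATE 更新的元组 xmax 非 0。它依赖 MVCC 的实现细节而非官方承诺的接口;本仓库各 Loader 与 DatabaseOperations.batch_upsert_with_returning 正是据此统计插入/更新计数。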
34
etl_billiards/loaders/dimensions/member.py
Normal file
@@ -0,0 +1,34 @@
# -*- coding: utf-8 -*-
"""会员维度表加载器"""
from ..base_loader import BaseLoader


class MemberLoader(BaseLoader):
    """会员维度加载器"""

    def upsert_members(self, records: list, store_id: int) -> tuple:
        """加载会员数据"""
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.dim_member (
                store_id, member_id, member_name, phone, balance,
                status, register_time, raw_data
            )
            VALUES (
                %(store_id)s, %(member_id)s, %(member_name)s, %(phone)s, %(balance)s,
                %(status)s, %(register_time)s, %(raw_data)s
            )
            ON CONFLICT (store_id, member_id) DO UPDATE SET
                member_name = EXCLUDED.member_name,
                phone = EXCLUDED.phone,
                balance = EXCLUDED.balance,
                status = EXCLUDED.status,
                register_time = EXCLUDED.register_time,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(sql, records, page_size=self._batch_size())
        return (inserted, updated, 0)
91
etl_billiards/loaders/dimensions/package.py
Normal file
@@ -0,0 +1,91 @@
# -*- coding: utf-8 -*-
"""团购/套餐定义加载器"""

from ..base_loader import BaseLoader


class PackageDefinitionLoader(BaseLoader):
    """写入 dim_package_coupon"""

    def upsert_packages(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.dim_package_coupon (
                store_id, package_id, package_code, package_name,
                table_area_id, table_area_name, selling_price, duration_seconds,
                start_time, end_time, type, is_enabled, is_delete,
                usable_count, creator_name, date_type, group_type,
                coupon_money, area_tag_type, system_group_type,
                card_type_ids, raw_data
            )
            VALUES (
                %(store_id)s, %(package_id)s, %(package_code)s, %(package_name)s,
                %(table_area_id)s, %(table_area_name)s, %(selling_price)s, %(duration_seconds)s,
                %(start_time)s, %(end_time)s, %(type)s, %(is_enabled)s, %(is_delete)s,
                %(usable_count)s, %(creator_name)s, %(date_type)s, %(group_type)s,
                %(coupon_money)s, %(area_tag_type)s, %(system_group_type)s,
                %(card_type_ids)s, %(raw_data)s
            )
            ON CONFLICT (store_id, package_id) DO UPDATE SET
                package_code = EXCLUDED.package_code,
                package_name = EXCLUDED.package_name,
                table_area_id = EXCLUDED.table_area_id,
                table_area_name = EXCLUDED.table_area_name,
                selling_price = EXCLUDED.selling_price,
                duration_seconds = EXCLUDED.duration_seconds,
                start_time = EXCLUDED.start_time,
                end_time = EXCLUDED.end_time,
                type = EXCLUDED.type,
                is_enabled = EXCLUDED.is_enabled,
                is_delete = EXCLUDED.is_delete,
                usable_count = EXCLUDED.usable_count,
                creator_name = EXCLUDED.creator_name,
                date_type = EXCLUDED.date_type,
                group_type = EXCLUDED.group_type,
                coupon_money = EXCLUDED.coupon_money,
                area_tag_type = EXCLUDED.area_tag_type,
                system_group_type = EXCLUDED.system_group_type,
                card_type_ids = EXCLUDED.card_type_ids,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
134
etl_billiards/loaders/dimensions/product.py
Normal file
@@ -0,0 +1,134 @@
# -*- coding: utf-8 -*-
"""商品维度 + 价格SCD2 加载器"""

from ..base_loader import BaseLoader
from ...scd.scd2_handler import SCD2Handler


class ProductLoader(BaseLoader):
    """商品维度加载器(dim_product + dim_product_price_scd)"""

    def __init__(self, db_ops):
        super().__init__(db_ops)
        # SCD2 处理器,复用通用逻辑
        self.scd_handler = SCD2Handler(db_ops)

    def upsert_products(self, records: list, store_id: int) -> tuple:
        """
        加载商品维度及价格SCD

        返回: (inserted_count, updated_count, skipped_count)
        """
        if not records:
            return (0, 0, 0)

        # 1) 维度主表:billiards.dim_product
        sql_base = """
            INSERT INTO billiards.dim_product (
                store_id, product_id, site_product_id, product_name,
                category_id, category_name, second_category_id, unit,
                cost_price, sale_price, allow_discount, status,
                supplier_id, barcode, is_combo,
                created_time, updated_time, raw_data
            )
            VALUES (
                %(store_id)s, %(product_id)s, %(site_product_id)s, %(product_name)s,
                %(category_id)s, %(category_name)s, %(second_category_id)s, %(unit)s,
                %(cost_price)s, %(sale_price)s, %(allow_discount)s, %(status)s,
                %(supplier_id)s, %(barcode)s, %(is_combo)s,
                %(created_time)s, %(updated_time)s, %(raw_data)s
            )
            ON CONFLICT (store_id, product_id) DO UPDATE SET
                site_product_id = EXCLUDED.site_product_id,
                product_name = EXCLUDED.product_name,
                category_id = EXCLUDED.category_id,
                category_name = EXCLUDED.category_name,
                second_category_id = EXCLUDED.second_category_id,
                unit = EXCLUDED.unit,
                cost_price = EXCLUDED.cost_price,
                sale_price = EXCLUDED.sale_price,
                allow_discount = EXCLUDED.allow_discount,
                status = EXCLUDED.status,
                supplier_id = EXCLUDED.supplier_id,
                barcode = EXCLUDED.barcode,
                is_combo = EXCLUDED.is_combo,
                updated_time = COALESCE(EXCLUDED.updated_time, now()),
                raw_data = EXCLUDED.raw_data
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql_base,
            records,
            page_size=self._batch_size(),
        )

        # 2) 价格 SCD2:billiards.dim_product_price_scd
        # 只追踪 price + 类目 + 名称等字段的历史
        tracked_fields = [
            "product_name",
            "category_id",
            "category_name",
            "second_category_id",
            "cost_price",
            "sale_price",
            "allow_discount",
            "status",
        ]
        natural_key = ["store_id", "product_id"]

        for rec in records:
            effective_date = rec.get("updated_time") or rec.get("created_time")

            scd_record = {
                "store_id": rec["store_id"],
                "product_id": rec["product_id"],
                "product_name": rec.get("product_name"),
                "category_id": rec.get("category_id"),
                "category_name": rec.get("category_name"),
                "second_category_id": rec.get("second_category_id"),
                "cost_price": rec.get("cost_price"),
                "sale_price": rec.get("sale_price"),
                "allow_discount": rec.get("allow_discount"),
                "status": rec.get("status"),
                # 原表中有 raw_data jsonb 字段,这里直接复用 task 传入的 raw_data
                "raw_data": rec.get("raw_data"),
            }

            # 这里我们不强行区分 INSERT/UPDATE/SKIP,对 ETL 统计来说意义不大
            self.scd_handler.upsert(
                table_name="billiards.dim_product_price_scd",
                natural_key=natural_key,
                tracked_fields=tracked_fields,
                record=scd_record,
                effective_date=effective_date,
            )

        # skipped_count 统一按 0 返回(真正被丢弃的记录在 Task 端已经过滤)
        return (inserted, updated, 0)
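SCD2 写入的概念示意(假设历史表带有 valid_from/valid_to/is_current 三列,这些列名是演示假设;scd2_handler 的真实列名与实现以其源码为准):当被跟踪字段变化时,先关闭当前版本,再插入新版本。

-- 概念示意,非 scd2_handler 的实际语句
UPDATE billiards.dim_product_price_scd
   SET valid_to = '2024-01-01', is_current = FALSE
 WHERE store_id = 1 AND product_id = 100 AND is_current;

INSERT INTO billiards.dim_product_price_scd
    (store_id, product_id, sale_price, valid_from, is_current)
VALUES (1, 100, 58.00, '2024-01-01', TRUE);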
80
etl_billiards/loaders/dimensions/table.py
Normal file
@@ -0,0 +1,80 @@
# -*- coding: utf-8 -*-
"""台桌维度加载器"""

from ..base_loader import BaseLoader


class TableLoader(BaseLoader):
    """将台桌档案写入 dim_table"""

    def upsert_tables(self, records: list) -> tuple:
        """批量写入台桌档案"""
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.dim_table (
                store_id, table_id, site_id, area_id, area_name,
                table_name, table_price, table_status, table_status_name,
                light_status, is_rest_area, show_status, virtual_table,
                charge_free, only_allow_groupon, is_online_reservation,
                created_time, raw_data
            )
            VALUES (
                %(store_id)s, %(table_id)s, %(site_id)s, %(area_id)s, %(area_name)s,
                %(table_name)s, %(table_price)s, %(table_status)s, %(table_status_name)s,
                %(light_status)s, %(is_rest_area)s, %(show_status)s, %(virtual_table)s,
                %(charge_free)s, %(only_allow_groupon)s, %(is_online_reservation)s,
                %(created_time)s, %(raw_data)s
            )
            ON CONFLICT (store_id, table_id) DO UPDATE SET
                site_id = EXCLUDED.site_id,
                area_id = EXCLUDED.area_id,
                area_name = EXCLUDED.area_name,
                table_name = EXCLUDED.table_name,
                table_price = EXCLUDED.table_price,
                table_status = EXCLUDED.table_status,
                table_status_name = EXCLUDED.table_status_name,
                light_status = EXCLUDED.light_status,
                is_rest_area = EXCLUDED.is_rest_area,
                show_status = EXCLUDED.show_status,
                virtual_table = EXCLUDED.virtual_table,
                charge_free = EXCLUDED.charge_free,
                only_allow_groupon = EXCLUDED.only_allow_groupon,
                is_online_reservation = EXCLUDED.is_online_reservation,
                created_time = COALESCE(EXCLUDED.created_time, dim_table.created_time),
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
0
etl_billiards/loaders/facts/__init__.py
Normal file
64
etl_billiards/loaders/facts/assistant_abolish.py
Normal file
@@ -0,0 +1,64 @@
# -*- coding: utf-8 -*-
"""助教作废事实表"""

from ..base_loader import BaseLoader


class AssistantAbolishLoader(BaseLoader):
    """写入 fact_assistant_abolish"""

    def upsert_records(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_assistant_abolish (
                store_id, abolish_id, table_id, table_name,
                table_area_id, table_area, assistant_no, assistant_name,
                charge_minutes, abolish_amount, create_time, trash_reason,
                raw_data
            )
            VALUES (
                %(store_id)s, %(abolish_id)s, %(table_id)s, %(table_name)s,
                %(table_area_id)s, %(table_area)s, %(assistant_no)s, %(assistant_name)s,
                %(charge_minutes)s, %(abolish_amount)s, %(create_time)s, %(trash_reason)s,
                %(raw_data)s
            )
            ON CONFLICT (store_id, abolish_id) DO UPDATE SET
                table_id = EXCLUDED.table_id,
                table_name = EXCLUDED.table_name,
                table_area_id = EXCLUDED.table_area_id,
                table_area = EXCLUDED.table_area,
                assistant_no = EXCLUDED.assistant_no,
                assistant_name = EXCLUDED.assistant_name,
                charge_minutes = EXCLUDED.charge_minutes,
                abolish_amount = EXCLUDED.abolish_amount,
                create_time = EXCLUDED.create_time,
                trash_reason = EXCLUDED.trash_reason,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
136
etl_billiards/loaders/facts/assistant_ledger.py
Normal file
@@ -0,0 +1,136 @@
# -*- coding: utf-8 -*-
"""助教流水事实表"""

from ..base_loader import BaseLoader


class AssistantLedgerLoader(BaseLoader):
    """写入 fact_assistant_ledger"""

    def upsert_ledgers(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_assistant_ledger (
                store_id, ledger_id, assistant_no, assistant_name, nickname,
                level_name, table_name, ledger_unit_price, ledger_count,
                ledger_amount, projected_income, service_money,
                member_discount_amount, manual_discount_amount, coupon_deduct_money,
                order_trade_no, order_settle_id, operator_id, operator_name,
                assistant_team_id, assistant_level, site_table_id,
                order_assistant_id, site_assistant_id, user_id,
                ledger_start_time, ledger_end_time, start_use_time, last_use_time,
                income_seconds, real_use_seconds, is_trash, trash_reason,
                is_confirm, ledger_status, create_time, raw_data
            )
            VALUES (
                %(store_id)s, %(ledger_id)s, %(assistant_no)s, %(assistant_name)s, %(nickname)s,
                %(level_name)s, %(table_name)s, %(ledger_unit_price)s, %(ledger_count)s,
                %(ledger_amount)s, %(projected_income)s, %(service_money)s,
                %(member_discount_amount)s, %(manual_discount_amount)s, %(coupon_deduct_money)s,
                %(order_trade_no)s, %(order_settle_id)s, %(operator_id)s, %(operator_name)s,
                %(assistant_team_id)s, %(assistant_level)s, %(site_table_id)s,
                %(order_assistant_id)s, %(site_assistant_id)s, %(user_id)s,
                %(ledger_start_time)s, %(ledger_end_time)s, %(start_use_time)s, %(last_use_time)s,
                %(income_seconds)s, %(real_use_seconds)s, %(is_trash)s, %(trash_reason)s,
                %(is_confirm)s, %(ledger_status)s, %(create_time)s, %(raw_data)s
            )
            ON CONFLICT (store_id, ledger_id) DO UPDATE SET
                assistant_no = EXCLUDED.assistant_no,
                assistant_name = EXCLUDED.assistant_name,
                nickname = EXCLUDED.nickname,
                level_name = EXCLUDED.level_name,
                table_name = EXCLUDED.table_name,
                ledger_unit_price = EXCLUDED.ledger_unit_price,
                ledger_count = EXCLUDED.ledger_count,
                ledger_amount = EXCLUDED.ledger_amount,
                projected_income = EXCLUDED.projected_income,
                service_money = EXCLUDED.service_money,
                member_discount_amount = EXCLUDED.member_discount_amount,
                manual_discount_amount = EXCLUDED.manual_discount_amount,
                coupon_deduct_money = EXCLUDED.coupon_deduct_money,
                order_trade_no = EXCLUDED.order_trade_no,
                order_settle_id = EXCLUDED.order_settle_id,
                operator_id = EXCLUDED.operator_id,
                operator_name = EXCLUDED.operator_name,
                assistant_team_id = EXCLUDED.assistant_team_id,
                assistant_level = EXCLUDED.assistant_level,
                site_table_id = EXCLUDED.site_table_id,
                order_assistant_id = EXCLUDED.order_assistant_id,
                site_assistant_id = EXCLUDED.site_assistant_id,
                user_id = EXCLUDED.user_id,
                ledger_start_time = EXCLUDED.ledger_start_time,
                ledger_end_time = EXCLUDED.ledger_end_time,
                start_use_time = EXCLUDED.start_use_time,
                last_use_time = EXCLUDED.last_use_time,
                income_seconds = EXCLUDED.income_seconds,
                real_use_seconds = EXCLUDED.real_use_seconds,
                is_trash = EXCLUDED.is_trash,
                trash_reason = EXCLUDED.trash_reason,
                is_confirm = EXCLUDED.is_confirm,
                ledger_status = EXCLUDED.ledger_status,
                create_time = EXCLUDED.create_time,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
91
etl_billiards/loaders/facts/coupon_usage.py
Normal file
@@ -0,0 +1,91 @@
# -*- coding: utf-8 -*-
"""券核销事实表"""

from ..base_loader import BaseLoader


class CouponUsageLoader(BaseLoader):
    """写入 fact_coupon_usage"""

    def upsert_coupon_usage(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_coupon_usage (
                store_id, usage_id, coupon_code, coupon_channel, coupon_name,
                sale_price, coupon_money, coupon_free_time, use_status,
                create_time, consume_time, operator_id, operator_name,
                table_id, site_order_id, group_package_id, coupon_remark,
                deal_id, certificate_id, verify_id, is_delete, raw_data
            )
            VALUES (
                %(store_id)s, %(usage_id)s, %(coupon_code)s, %(coupon_channel)s, %(coupon_name)s,
                %(sale_price)s, %(coupon_money)s, %(coupon_free_time)s, %(use_status)s,
                %(create_time)s, %(consume_time)s, %(operator_id)s, %(operator_name)s,
                %(table_id)s, %(site_order_id)s, %(group_package_id)s, %(coupon_remark)s,
                %(deal_id)s, %(certificate_id)s, %(verify_id)s, %(is_delete)s, %(raw_data)s
            )
            ON CONFLICT (store_id, usage_id) DO UPDATE SET
                coupon_code = EXCLUDED.coupon_code,
                coupon_channel = EXCLUDED.coupon_channel,
                coupon_name = EXCLUDED.coupon_name,
                sale_price = EXCLUDED.sale_price,
                coupon_money = EXCLUDED.coupon_money,
                coupon_free_time = EXCLUDED.coupon_free_time,
                use_status = EXCLUDED.use_status,
                create_time = EXCLUDED.create_time,
                consume_time = EXCLUDED.consume_time,
                operator_id = EXCLUDED.operator_id,
                operator_name = EXCLUDED.operator_name,
                table_id = EXCLUDED.table_id,
                site_order_id = EXCLUDED.site_order_id,
                group_package_id = EXCLUDED.group_package_id,
                coupon_remark = EXCLUDED.coupon_remark,
                deal_id = EXCLUDED.deal_id,
                certificate_id = EXCLUDED.certificate_id,
                verify_id = EXCLUDED.verify_id,
                is_delete = EXCLUDED.is_delete,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
73
etl_billiards/loaders/facts/inventory_change.py
Normal file
@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
"""库存变动事实表"""

from ..base_loader import BaseLoader


class InventoryChangeLoader(BaseLoader):
    """写入 fact_inventory_change"""

    def upsert_changes(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_inventory_change (
                store_id, change_id, site_goods_id, stock_type, goods_name,
                change_time, start_qty, end_qty, change_qty, unit, price,
                operator_name, remark, goods_category_id,
                goods_second_category_id, raw_data
            )
            VALUES (
                %(store_id)s, %(change_id)s, %(site_goods_id)s, %(stock_type)s, %(goods_name)s,
                %(change_time)s, %(start_qty)s, %(end_qty)s, %(change_qty)s, %(unit)s, %(price)s,
                %(operator_name)s, %(remark)s, %(goods_category_id)s,
                %(goods_second_category_id)s, %(raw_data)s
            )
            ON CONFLICT (store_id, change_id) DO UPDATE SET
                site_goods_id = EXCLUDED.site_goods_id,
                stock_type = EXCLUDED.stock_type,
                goods_name = EXCLUDED.goods_name,
                change_time = EXCLUDED.change_time,
                start_qty = EXCLUDED.start_qty,
                end_qty = EXCLUDED.end_qty,
                change_qty = EXCLUDED.change_qty,
                unit = EXCLUDED.unit,
                price = EXCLUDED.price,
                operator_name = EXCLUDED.operator_name,
                remark = EXCLUDED.remark,
                goods_category_id = EXCLUDED.goods_category_id,
                goods_second_category_id = EXCLUDED.goods_second_category_id,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
42
etl_billiards/loaders/facts/order.py
Normal file
@@ -0,0 +1,42 @@
# -*- coding: utf-8 -*-
"""订单事实表加载器"""
from ..base_loader import BaseLoader


class OrderLoader(BaseLoader):
    """订单数据加载器"""

    def upsert_orders(self, records: list, store_id: int) -> tuple:
        """加载订单数据"""
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_order (
                store_id, order_id, order_no, member_id, table_id,
                order_time, end_time, total_amount, discount_amount,
                final_amount, pay_status, order_status, remark, raw_data
            )
            VALUES (
                %(store_id)s, %(order_id)s, %(order_no)s, %(member_id)s, %(table_id)s,
                %(order_time)s, %(end_time)s, %(total_amount)s, %(discount_amount)s,
                %(final_amount)s, %(pay_status)s, %(order_status)s, %(remark)s, %(raw_data)s
            )
            ON CONFLICT (store_id, order_id) DO UPDATE SET
                order_no = EXCLUDED.order_no,
                member_id = EXCLUDED.member_id,
                table_id = EXCLUDED.table_id,
                order_time = EXCLUDED.order_time,
                end_time = EXCLUDED.end_time,
                total_amount = EXCLUDED.total_amount,
                discount_amount = EXCLUDED.discount_amount,
                final_amount = EXCLUDED.final_amount,
                pay_status = EXCLUDED.pay_status,
                order_status = EXCLUDED.order_status,
                remark = EXCLUDED.remark,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(sql, records, page_size=self._batch_size())
        return (inserted, updated, 0)
61
etl_billiards/loaders/facts/payment.py
Normal file
@@ -0,0 +1,61 @@
# -*- coding: utf-8 -*-
"""支付事实表加载器"""
from ..base_loader import BaseLoader


class PaymentLoader(BaseLoader):
    """支付数据加载器"""

    def upsert_payments(self, records: list, store_id: int) -> tuple:
        """加载支付数据"""
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_payment (
                store_id, pay_id, order_id,
                site_id, tenant_id,
                order_settle_id, order_trade_no,
                relate_type, relate_id,
                create_time, pay_time,
                pay_amount, fee_amount, discount_amount,
                payment_method, pay_type,
                online_pay_channel, pay_terminal,
                pay_status, remark, raw_data
            )
            VALUES (
                %(store_id)s, %(pay_id)s, %(order_id)s,
                %(site_id)s, %(tenant_id)s,
                %(order_settle_id)s, %(order_trade_no)s,
                %(relate_type)s, %(relate_id)s,
                %(create_time)s, %(pay_time)s,
                %(pay_amount)s, %(fee_amount)s, %(discount_amount)s,
                %(payment_method)s, %(pay_type)s,
                %(online_pay_channel)s, %(pay_terminal)s,
                %(pay_status)s, %(remark)s, %(raw_data)s
            )
            ON CONFLICT (store_id, pay_id) DO UPDATE SET
                order_settle_id = EXCLUDED.order_settle_id,
                order_trade_no = EXCLUDED.order_trade_no,
                relate_type = EXCLUDED.relate_type,
                relate_id = EXCLUDED.relate_id,
                order_id = EXCLUDED.order_id,
                site_id = EXCLUDED.site_id,
                tenant_id = EXCLUDED.tenant_id,
                create_time = EXCLUDED.create_time,
                pay_time = EXCLUDED.pay_time,
                pay_amount = EXCLUDED.pay_amount,
                fee_amount = EXCLUDED.fee_amount,
                discount_amount = EXCLUDED.discount_amount,
                payment_method = EXCLUDED.payment_method,
                pay_type = EXCLUDED.pay_type,
                online_pay_channel = EXCLUDED.online_pay_channel,
                pay_terminal = EXCLUDED.pay_terminal,
                pay_status = EXCLUDED.pay_status,
                remark = EXCLUDED.remark,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(sql, records, page_size=self._batch_size())
        return (inserted, updated, 0)
88
etl_billiards/loaders/facts/refund.py
Normal file
@@ -0,0 +1,88 @@
# -*- coding: utf-8 -*-
"""退款事实表加载器"""

from ..base_loader import BaseLoader


class RefundLoader(BaseLoader):
    """写入 fact_refund"""

    def upsert_refunds(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_refund (
                store_id, refund_id, site_id, tenant_id,
                pay_amount, pay_status, pay_time, create_time,
                relate_type, relate_id, payment_method, refund_amount,
                action_type, pay_terminal, operator_id, channel_pay_no,
                channel_fee, is_delete, member_id, member_card_id, raw_data
            )
            VALUES (
                %(store_id)s, %(refund_id)s, %(site_id)s, %(tenant_id)s,
                %(pay_amount)s, %(pay_status)s, %(pay_time)s, %(create_time)s,
                %(relate_type)s, %(relate_id)s, %(payment_method)s, %(refund_amount)s,
                %(action_type)s, %(pay_terminal)s, %(operator_id)s, %(channel_pay_no)s,
                %(channel_fee)s, %(is_delete)s, %(member_id)s, %(member_card_id)s, %(raw_data)s
            )
            ON CONFLICT (store_id, refund_id) DO UPDATE SET
                site_id = EXCLUDED.site_id,
                tenant_id = EXCLUDED.tenant_id,
                pay_amount = EXCLUDED.pay_amount,
                pay_status = EXCLUDED.pay_status,
                pay_time = EXCLUDED.pay_time,
                create_time = EXCLUDED.create_time,
                relate_type = EXCLUDED.relate_type,
                relate_id = EXCLUDED.relate_id,
                payment_method = EXCLUDED.payment_method,
                refund_amount = EXCLUDED.refund_amount,
                action_type = EXCLUDED.action_type,
                pay_terminal = EXCLUDED.pay_terminal,
                operator_id = EXCLUDED.operator_id,
                channel_pay_no = EXCLUDED.channel_pay_no,
                channel_fee = EXCLUDED.channel_fee,
                is_delete = EXCLUDED.is_delete,
                member_id = EXCLUDED.member_id,
                member_card_id = EXCLUDED.member_card_id,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
82
etl_billiards/loaders/facts/table_discount.py
Normal file
@@ -0,0 +1,82 @@
# -*- coding: utf-8 -*-
"""台费打折事实表"""

from ..base_loader import BaseLoader


class TableDiscountLoader(BaseLoader):
    """写入 fact_table_discount"""

    def upsert_discounts(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_table_discount (
                store_id, discount_id, adjust_type, applicant_id, applicant_name,
                operator_id, operator_name, ledger_amount, ledger_count,
                ledger_name, ledger_status, order_settle_id, order_trade_no,
                site_table_id, table_area_id, table_area_name,
                create_time, is_delete, raw_data
            )
            VALUES (
                %(store_id)s, %(discount_id)s, %(adjust_type)s, %(applicant_id)s, %(applicant_name)s,
                %(operator_id)s, %(operator_name)s, %(ledger_amount)s, %(ledger_count)s,
                %(ledger_name)s, %(ledger_status)s, %(order_settle_id)s, %(order_trade_no)s,
                %(site_table_id)s, %(table_area_id)s, %(table_area_name)s,
                %(create_time)s, %(is_delete)s, %(raw_data)s
            )
            ON CONFLICT (store_id, discount_id) DO UPDATE SET
                adjust_type = EXCLUDED.adjust_type,
                applicant_id = EXCLUDED.applicant_id,
                applicant_name = EXCLUDED.applicant_name,
                operator_id = EXCLUDED.operator_id,
                operator_name = EXCLUDED.operator_name,
                ledger_amount = EXCLUDED.ledger_amount,
                ledger_count = EXCLUDED.ledger_count,
                ledger_name = EXCLUDED.ledger_name,
                ledger_status = EXCLUDED.ledger_status,
                order_settle_id = EXCLUDED.order_settle_id,
                order_trade_no = EXCLUDED.order_trade_no,
                site_table_id = EXCLUDED.site_table_id,
                table_area_id = EXCLUDED.table_area_id,
                table_area_name = EXCLUDED.table_area_name,
                create_time = EXCLUDED.create_time,
                is_delete = EXCLUDED.is_delete,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
188
etl_billiards/loaders/facts/ticket.py
Normal file
@@ -0,0 +1,188 @@
# -*- coding: utf-8 -*-
"""小票详情加载器"""
from ..base_loader import BaseLoader
import json


class TicketLoader(BaseLoader):
    """
    Loader for parsing Ticket Detail JSON and populating DWD fact tables.
    Handles:
    - fact_order (Header)
    - fact_order_goods (Items)
    - fact_table_usage (Items)
    - fact_assistant_service (Items)
    """

    def process_tickets(self, tickets: list, store_id: int) -> tuple:
        """
        Process a batch of ticket JSONs.
        Returns (inserted_count, error_count)
        """
        inserted_count = 0
        error_count = 0

        # Prepare batch lists
        orders = []
        goods_list = []
        table_usages = []
        assistant_services = []

        for ticket in tickets:
            try:
                # 1. Parse Header (fact_order)
                root_data = ticket.get("data", {}).get("data", {})
                if not root_data:
                    continue

                order_settle_id = root_data.get("orderSettleId")
                if not order_settle_id:
                    continue

                orders.append({
                    "store_id": store_id,
                    "order_settle_id": order_settle_id,
                    "order_trade_no": 0,
                    "order_no": str(root_data.get("orderSettleNumber", "")),
                    "member_id": 0,
                    "pay_time": root_data.get("payTime"),
                    "total_amount": root_data.get("consumeMoney", 0),
                    "pay_amount": root_data.get("actualPayment", 0),
                    "discount_amount": root_data.get("memberOfferAmount", 0),
                    "coupon_amount": root_data.get("couponAmount", 0),
                    "status": "PAID",
                    "cashier_name": root_data.get("cashierName", ""),
                    "remark": root_data.get("orderRemark", ""),
                    "raw_data": json.dumps(ticket, ensure_ascii=False)
                })

                # 2. Parse Items (orderItem list)
                order_items = root_data.get("orderItem", [])
                for item in order_items:
                    order_trade_no = item.get("siteOrderId")

                    # 2.1 Table Ledger
                    table_ledger = item.get("tableLedger")
                    if table_ledger:
                        table_usages.append({
                            "store_id": store_id,
                            "order_ledger_id": table_ledger.get("orderTableLedgerId"),
                            "order_settle_id": order_settle_id,
                            "table_id": table_ledger.get("siteTableId"),
                            "table_name": table_ledger.get("tableName"),
                            "start_time": table_ledger.get("chargeStartTime"),
                            "end_time": table_ledger.get("chargeEndTime"),
                            "duration_minutes": table_ledger.get("useDuration", 0),
                            "total_amount": table_ledger.get("consumptionAmount", 0),
                            "pay_amount": table_ledger.get("consumptionAmount", 0) - table_ledger.get("memberDiscountAmount", 0)
                        })

                    # 2.2 Goods Ledgers
                    goods_ledgers = item.get("goodsLedgers", [])
                    for g in goods_ledgers:
                        goods_list.append({
                            "store_id": store_id,
                            "order_goods_id": g.get("orderGoodsLedgerId"),
                            "order_settle_id": order_settle_id,
                            "order_trade_no": order_trade_no,
                            "goods_id": g.get("siteGoodsId"),
                            "goods_name": g.get("goodsName"),
                            "quantity": g.get("goodsCount", 0),
                            "unit_price": g.get("goodsPrice", 0),
                            "total_amount": g.get("ledgerAmount", 0),
                            "pay_amount": g.get("realGoodsMoney", 0)
                        })

                    # 2.3 Assistant Services
                    assistant_ledgers = item.get("assistantPlayWith", [])
                    for a in assistant_ledgers:
                        assistant_services.append({
                            "store_id": store_id,
                            "ledger_id": a.get("orderAssistantLedgerId"),
                            "order_settle_id": order_settle_id,
                            "assistant_id": a.get("assistantId"),
                            "assistant_name": a.get("ledgerName"),
                            "service_type": a.get("skillName", "Play"),
                            "start_time": a.get("ledgerStartTime"),
                            "end_time": a.get("ledgerEndTime"),
                            "duration_minutes": int(a.get("ledgerCount", 0) / 60) if a.get("ledgerCount") else 0,
                            "total_amount": a.get("ledgerAmount", 0),
                            "pay_amount": a.get("ledgerAmount", 0)
                        })

                inserted_count += 1

            except Exception as e:
                self.logger.error(f"Error parsing ticket: {e}", exc_info=True)
                error_count += 1

        # 3. Batch Insert/Upsert
        if orders:
            self._upsert_orders(orders)
        if goods_list:
            self._upsert_goods(goods_list)
        if table_usages:
            self._upsert_table_usages(table_usages)
        if assistant_services:
            self._upsert_assistant_services(assistant_services)

        return inserted_count, error_count

    def _upsert_orders(self, rows):
        sql = """
            INSERT INTO billiards.fact_order (
                store_id, order_settle_id, order_trade_no, order_no, member_id,
                pay_time, total_amount, pay_amount, discount_amount, coupon_amount,
                status, cashier_name, remark, raw_data
            ) VALUES (
                %(store_id)s, %(order_settle_id)s, %(order_trade_no)s, %(order_no)s, %(member_id)s,
                %(pay_time)s, %(total_amount)s, %(pay_amount)s, %(discount_amount)s, %(coupon_amount)s,
                %(status)s, %(cashier_name)s, %(remark)s, %(raw_data)s
            )
            ON CONFLICT (store_id, order_settle_id) DO UPDATE SET
                pay_time = EXCLUDED.pay_time,
                pay_amount = EXCLUDED.pay_amount,
                updated_at = now()
        """
        self.db.batch_execute(sql, rows)

    def _upsert_goods(self, rows):
        sql = """
            INSERT INTO billiards.fact_order_goods (
                store_id, order_goods_id, order_settle_id, order_trade_no,
                goods_id, goods_name, quantity, unit_price, total_amount, pay_amount
            ) VALUES (
                %(store_id)s, %(order_goods_id)s, %(order_settle_id)s, %(order_trade_no)s,
                %(goods_id)s, %(goods_name)s, %(quantity)s, %(unit_price)s, %(total_amount)s, %(pay_amount)s
            )
            ON CONFLICT (store_id, order_goods_id) DO UPDATE SET
                pay_amount = EXCLUDED.pay_amount
        """
        self.db.batch_execute(sql, rows)

    def _upsert_table_usages(self, rows):
        sql = """
            INSERT INTO billiards.fact_table_usage (
                store_id, order_ledger_id, order_settle_id, table_id, table_name,
                start_time, end_time, duration_minutes, total_amount, pay_amount
            ) VALUES (
                %(store_id)s, %(order_ledger_id)s, %(order_settle_id)s, %(table_id)s, %(table_name)s,
                %(start_time)s, %(end_time)s, %(duration_minutes)s, %(total_amount)s, %(pay_amount)s
            )
            ON CONFLICT (store_id, order_ledger_id) DO UPDATE SET
                pay_amount = EXCLUDED.pay_amount
        """
        self.db.batch_execute(sql, rows)

    def _upsert_assistant_services(self, rows):
        sql = """
            INSERT INTO billiards.fact_assistant_service (
                store_id, ledger_id, order_settle_id, assistant_id, assistant_name,
                service_type, start_time, end_time, duration_minutes, total_amount, pay_amount
            ) VALUES (
                %(store_id)s, %(ledger_id)s, %(order_settle_id)s, %(assistant_id)s, %(assistant_name)s,
                %(service_type)s, %(start_time)s, %(end_time)s, %(duration_minutes)s, %(total_amount)s, %(pay_amount)s
            )
            ON CONFLICT (store_id, ledger_id) DO UPDATE SET
                pay_amount = EXCLUDED.pay_amount
        """
        self.db.batch_execute(sql, rows)
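process_tickets 期望的最小票据结构示意(字段名均取自上方解析逻辑,数值为演示假设):

# 演示:构造一张仅含台桌流水的票据并解析
ticket = {
    "data": {"data": {
        "orderSettleId": 9001,
        "orderSettleNumber": "20240101001",
        "payTime": "2024-01-01 12:00:00",
        "consumeMoney": 100.0,
        "actualPayment": 90.0,
        "memberOfferAmount": 10.0,
        "couponAmount": 0,
        "cashierName": "张三",
        "orderRemark": "",
        "orderItem": [{
            "siteOrderId": 8001,
            "tableLedger": {
                "orderTableLedgerId": 7001, "siteTableId": 1, "tableName": "A1",
                "chargeStartTime": "2024-01-01 10:00:00",
                "chargeEndTime": "2024-01-01 12:00:00",
                "useDuration": 120, "consumptionAmount": 60.0,
                "memberDiscountAmount": 6.0,
            },
            "goodsLedgers": [],
            "assistantPlayWith": [],
        }],
    }}
}
loader = TicketLoader(ops)
ok, err = loader.process_tickets([ticket], store_id=1)   # -> (1, 0)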
118
etl_billiards/loaders/facts/topup.py
Normal file
@@ -0,0 +1,118 @@
# -*- coding: utf-8 -*-
"""充值记录事实表"""

from ..base_loader import BaseLoader


class TopupLoader(BaseLoader):
    """写入 fact_topup"""

    def upsert_topups(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_topup (
                store_id, topup_id, member_id, member_name, member_phone,
                card_id, card_type_name, pay_amount, consume_money,
                settle_status, settle_type, settle_name, settle_relate_id,
                pay_time, create_time, operator_id, operator_name,
                payment_method, refund_amount, cash_amount, card_amount,
                balance_amount, online_amount, rounding_amount, adjust_amount,
                goods_money, table_charge_money, service_money, coupon_amount,
                order_remark, raw_data
            )
            VALUES (
                %(store_id)s, %(topup_id)s, %(member_id)s, %(member_name)s, %(member_phone)s,
                %(card_id)s, %(card_type_name)s, %(pay_amount)s, %(consume_money)s,
                %(settle_status)s, %(settle_type)s, %(settle_name)s, %(settle_relate_id)s,
                %(pay_time)s, %(create_time)s, %(operator_id)s, %(operator_name)s,
                %(payment_method)s, %(refund_amount)s, %(cash_amount)s, %(card_amount)s,
                %(balance_amount)s, %(online_amount)s, %(rounding_amount)s, %(adjust_amount)s,
                %(goods_money)s, %(table_charge_money)s, %(service_money)s, %(coupon_amount)s,
                %(order_remark)s, %(raw_data)s
            )
            ON CONFLICT (store_id, topup_id) DO UPDATE SET
                member_id = EXCLUDED.member_id,
                member_name = EXCLUDED.member_name,
                member_phone = EXCLUDED.member_phone,
                card_id = EXCLUDED.card_id,
                card_type_name = EXCLUDED.card_type_name,
                pay_amount = EXCLUDED.pay_amount,
                consume_money = EXCLUDED.consume_money,
                settle_status = EXCLUDED.settle_status,
                settle_type = EXCLUDED.settle_type,
                settle_name = EXCLUDED.settle_name,
                settle_relate_id = EXCLUDED.settle_relate_id,
                pay_time = EXCLUDED.pay_time,
                create_time = EXCLUDED.create_time,
                operator_id = EXCLUDED.operator_id,
                operator_name = EXCLUDED.operator_name,
                payment_method = EXCLUDED.payment_method,
                refund_amount = EXCLUDED.refund_amount,
                cash_amount = EXCLUDED.cash_amount,
                card_amount = EXCLUDED.card_amount,
                balance_amount = EXCLUDED.balance_amount,
                online_amount = EXCLUDED.online_amount,
                rounding_amount = EXCLUDED.rounding_amount,
                adjust_amount = EXCLUDED.adjust_amount,
                goods_money = EXCLUDED.goods_money,
                table_charge_money = EXCLUDED.table_charge_money,
                service_money = EXCLUDED.service_money,
                coupon_amount = EXCLUDED.coupon_amount,
                order_remark = EXCLUDED.order_remark,
                raw_data = EXCLUDED.raw_data,
                updated_at = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
6
etl_billiards/loaders/ods/__init__.py
Normal file
@@ -0,0 +1,6 @@
# -*- coding: utf-8 -*-
"""ODS loader helpers."""

from .generic import GenericODSLoader

__all__ = ["GenericODSLoader"]
67
etl_billiards/loaders/ods/generic.py
Normal file
@@ -0,0 +1,67 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Generic ODS loader that keeps raw payload + primary keys."""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
from datetime import datetime, timezone
|
||||
from typing import Iterable, Sequence
|
||||
|
||||
from ..base_loader import BaseLoader
|
||||
|
||||
|
||||
class GenericODSLoader(BaseLoader):
|
||||
"""Insert/update helper for ODS tables that share the same pattern."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
db_ops,
|
||||
table_name: str,
|
||||
columns: Sequence[str],
|
||||
conflict_columns: Sequence[str],
|
||||
):
|
||||
super().__init__(db_ops)
|
||||
if not conflict_columns:
|
||||
raise ValueError("conflict_columns must not be empty for ODS loader")
|
||||
self.table_name = table_name
|
||||
self.columns = list(columns)
|
||||
self.conflict_columns = list(conflict_columns)
|
||||
self._sql = self._build_sql()
|
||||
|
||||
def upsert_rows(self, rows: Iterable[dict]) -> tuple[int, int, int]:
|
||||
"""Insert/update the provided iterable of dictionaries."""
|
||||
rows = list(rows)
|
||||
if not rows:
|
||||
return (0, 0, 0)
|
||||
|
||||
normalized = [self._normalize_row(row) for row in rows]
|
||||
inserted, updated = self.db.batch_upsert_with_returning(
|
||||
self._sql, normalized, page_size=self._batch_size()
|
||||
)
|
||||
return inserted, updated, 0
|
||||
|
||||
def _build_sql(self) -> str:
|
||||
col_list = ", ".join(self.columns)
|
||||
placeholders = ", ".join(f"%({col})s" for col in self.columns)
|
||||
conflict_clause = ", ".join(self.conflict_columns)
|
||||
update_columns = [c for c in self.columns if c not in self.conflict_columns]
|
||||
set_clause = ", ".join(f"{col} = EXCLUDED.{col}" for col in update_columns)
|
||||
return (
|
||||
f"INSERT INTO {self.table_name} ({col_list}) "
|
||||
f"VALUES ({placeholders}) "
|
||||
f"ON CONFLICT ({conflict_clause}) DO UPDATE SET {set_clause} "
|
||||
f"RETURNING (xmax = 0) AS inserted"
|
||||
)
|
||||
|
||||
def _normalize_row(self, row: dict) -> dict:
|
||||
normalized = {}
|
||||
for col in self.columns:
|
||||
value = row.get(col)
|
||||
if col == "payload" and value is not None and not isinstance(value, str):
|
||||
normalized[col] = json.dumps(value, ensure_ascii=False)
|
||||
else:
|
||||
normalized[col] = value
|
||||
|
||||
if "fetched_at" in normalized and normalized["fetched_at"] is None:
|
||||
normalized["fetched_at"] = datetime.now(timezone.utc)
|
||||
|
||||
return normalized
|
||||
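For illustration, a hypothetical use of `GenericODSLoader`; the table and column names are placeholders, and `db_ops` stands in for the project's `DatabaseOperations` wrapper:

```python
# Illustrative only: table/columns are placeholders.
loader = GenericODSLoader(
    db_ops,
    table_name="billiards_ods.payment_transactions",
    columns=["store_id", "pay_id", "payload", "fetched_at"],
    conflict_columns=["store_id", "pay_id"],
)
inserted, updated, skipped = loader.upsert_rows([
    # dict payloads are JSON-encoded; a missing fetched_at defaults to UTC now
    {"store_id": 1, "pay_id": 1001, "payload": {"amount": "12.50"}, "fetched_at": None},
])
```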
etl_billiards/models/__init__.py (new file, 0 lines)
etl_billiards/models/parsers.py (new file, 50 lines)
@@ -0,0 +1,50 @@
# -*- coding: utf-8 -*-
"""Data type parsers."""
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP
from dateutil import parser as dtparser
from zoneinfo import ZoneInfo


class TypeParser:
    """Type parsing helpers."""

    @staticmethod
    def parse_timestamp(s: str, tz: ZoneInfo) -> datetime | None:
        """Parse a timestamp string into an aware datetime."""
        if not s:
            return None
        try:
            dt = dtparser.parse(s)
            if dt.tzinfo is None:
                return dt.replace(tzinfo=tz)
            return dt.astimezone(tz)
        except Exception:
            return None

    @staticmethod
    def parse_decimal(value, scale: int = 2) -> Decimal | None:
        """Parse a monetary amount."""
        if value is None:
            return None
        try:
            d = Decimal(str(value))
            return d.quantize(Decimal(10) ** -scale, rounding=ROUND_HALF_UP)
        except Exception:
            return None

    @staticmethod
    def parse_int(value) -> int | None:
        """Parse an integer."""
        if value is None:
            return None
        try:
            return int(value)
        except Exception:
            return None

    @staticmethod
    def format_timestamp(dt: datetime | None, tz: ZoneInfo) -> str | None:
        """Format a timestamp."""
        if not dt:
            return None
        return dt.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")
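A few example calls showing the parser's lenient contract (invalid input yields None rather than raising):

```python
from zoneinfo import ZoneInfo

tz = ZoneInfo("Asia/Taipei")
TypeParser.parse_timestamp("2025-12-09 05:21:24", tz)  # aware datetime in tz
TypeParser.parse_decimal("12.345")                     # Decimal('12.35'), ROUND_HALF_UP
TypeParser.parse_int("42")                             # 42
TypeParser.parse_int("not-a-number")                   # None (errors swallowed)
```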
etl_billiards/models/validators.py (new file, 25 lines)
@@ -0,0 +1,25 @@
# -*- coding: utf-8 -*-
"""Data validators."""
from decimal import Decimal


class DataValidator:
    """Data validation helpers."""

    @staticmethod
    def validate_positive_amount(value: Decimal | None, field_name: str = "amount"):
        """Ensure an amount is not negative."""
        if value is not None and value < 0:
            raise ValueError(f"{field_name} must not be negative: {value}")

    @staticmethod
    def validate_required(value, field_name: str):
        """Ensure a required field is present."""
        if value is None or value == "":
            raise ValueError(f"{field_name} is a required field")

    @staticmethod
    def validate_range(value, min_val, max_val, field_name: str):
        """Ensure a value falls within [min_val, max_val]."""
        if value is not None:
            if value < min_val or value > max_val:
                raise ValueError(f"{field_name} must be between {min_val} and {max_val}")
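Unlike the parsers, the validators raise on bad input; for example:

```python
from decimal import Decimal

DataValidator.validate_required("M-1001", "member_id")                 # passes
DataValidator.validate_positive_amount(Decimal("9.99"), "pay_amount")  # passes
DataValidator.validate_range(150, 0, 100, "discount_pct")             # raises ValueError
```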
etl_billiards/ods_row_report.json (new file, 52 lines)
@@ -0,0 +1,52 @@
{
  "source_counts": {
    "assistant_accounts_master.json": 2,
    "assistant_cancellation_records.json": 2,
    "assistant_service_records.json": 2,
    "goods_stock_movements.json": 2,
    "goods_stock_summary.json": 161,
    "group_buy_packages.json": 2,
    "group_buy_redemption_records.json": 2,
    "member_balance_changes.json": 2,
    "member_profiles.json": 2,
    "member_stored_value_cards.json": 2,
    "payment_transactions.json": 200,
    "platform_coupon_redemption_records.json": 200,
    "recharge_settlements.json": 2,
    "refund_transactions.json": 11,
    "settlement_records.json": 2,
    "settlement_ticket_details.json": 193,
    "site_tables_master.json": 2,
    "stock_goods_category_tree.json": 2,
    "store_goods_master.json": 2,
    "store_goods_sales_records.json": 2,
    "table_fee_discount_records.json": 2,
    "table_fee_transactions.json": 2,
    "tenant_goods_master.json": 2
  },
  "ods_counts": {
    "member_profiles": 199,
    "member_balance_changes": 200,
    "member_stored_value_cards": 200,
    "recharge_settlements": 75,
    "settlement_records": 200,
    "assistant_cancellation_records": 15,
    "assistant_accounts_master": 50,
    "assistant_service_records": 200,
    "site_tables_master": 71,
    "table_fee_discount_records": 200,
    "table_fee_transactions": 200,
    "goods_stock_movements": 200,
    "stock_goods_category_tree": 9,
    "goods_stock_summary": 161,
    "payment_transactions": 200,
    "refund_transactions": 11,
    "platform_coupon_redemption_records": 200,
    "tenant_goods_master": 156,
    "group_buy_packages": 17,
    "group_buy_redemption_records": 200,
    "settlement_ticket_details": 193,
    "store_goods_master": 161,
    "store_goods_sales_records": 200
  }
}
etl_billiards/orchestration/__init__.py (new file, 0 lines)
etl_billiards/orchestration/cursor_manager.py (new file, 62 lines)
@@ -0,0 +1,62 @@
# -*- coding: utf-8 -*-
"""ETL cursor manager."""
from datetime import datetime


class CursorManager:
    """Manages ETL cursors."""

    def __init__(self, db_connection):
        self.db = db_connection

    def get_or_create(self, task_id: int, store_id: int) -> dict:
        """Fetch the cursor row, creating it if missing."""
        rows = self.db.query(
            "SELECT * FROM etl_admin.etl_cursor WHERE task_id=%s AND store_id=%s",
            (task_id, store_id)
        )

        if rows:
            return rows[0]

        # Create a new cursor row
        self.db.execute(
            """
            INSERT INTO etl_admin.etl_cursor(task_id, store_id, last_start, last_end, last_id, extra)
            VALUES(%s, %s, NULL, NULL, NULL, '{}'::jsonb)
            """,
            (task_id, store_id)
        )
        self.db.commit()

        rows = self.db.query(
            "SELECT * FROM etl_admin.etl_cursor WHERE task_id=%s AND store_id=%s",
            (task_id, store_id)
        )
        return rows[0] if rows else None

    def advance(self, task_id: int, store_id: int, window_start: datetime,
                window_end: datetime, run_id: int, last_id: int = None):
        """Advance the cursor window."""
        if last_id is not None:
            sql = """
                UPDATE etl_admin.etl_cursor
                SET last_start = %s,
                    last_end = %s,
                    last_id = GREATEST(COALESCE(last_id, 0), %s),
                    last_run_id = %s,
                    updated_at = now()
                WHERE task_id = %s AND store_id = %s
            """
            self.db.execute(sql, (window_start, window_end, last_id, run_id, task_id, store_id))
        else:
            sql = """
                UPDATE etl_admin.etl_cursor
                SET last_start = %s,
                    last_end = %s,
                    last_run_id = %s,
                    updated_at = now()
                WHERE task_id = %s AND store_id = %s
            """
            self.db.execute(sql, (window_start, window_end, run_id, task_id, store_id))

        self.db.commit()
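Putting the two methods together, a typical incremental window might be driven like this. A sketch: `db_conn` is assumed to expose the `query`/`execute`/`commit` methods used above, and the IDs are placeholders:

```python
from datetime import datetime, timedelta

mgr = CursorManager(db_conn)                  # db_conn: DatabaseConnection
cursor_row = mgr.get_or_create(task_id=7, store_id=1)

window_end = datetime.now()
window_start = cursor_row["last_end"] or (window_end - timedelta(minutes=30))
# ... extract/load everything in [window_start, window_end) ...
mgr.advance(task_id=7, store_id=1, window_start=window_start,
            window_end=window_end, run_id=123, last_id=98765)
```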
etl_billiards/orchestration/run_tracker.py (new file, 70 lines)
@@ -0,0 +1,70 @@
# -*- coding: utf-8 -*-
"""Run record tracker."""
import json
from datetime import datetime


class RunTracker:
    """Manages ETL run records."""

    def __init__(self, db_connection):
        self.db = db_connection

    def create_run(self, task_id: int, store_id: int, run_uuid: str,
                   export_dir: str, log_path: str, status: str,
                   window_start: datetime = None, window_end: datetime = None,
                   window_minutes: int = None, overlap_seconds: int = None,
                   request_params: dict = None) -> int:
        """Create a run record."""
        sql = """
            INSERT INTO etl_admin.etl_run(
                run_uuid, task_id, store_id, status, started_at, window_start, window_end,
                window_minutes, overlap_seconds, fetched_count, loaded_count, updated_count,
                skipped_count, error_count, unknown_fields, export_dir, log_path,
                request_params, manifest, error_message, extra
            ) VALUES (
                %s, %s, %s, %s, now(), %s, %s, %s, %s, 0, 0, 0, 0, 0, 0, %s, %s, %s,
                '{}'::jsonb, NULL, '{}'::jsonb
            )
            RETURNING run_id
        """

        result = self.db.query(
            sql,
            (run_uuid, task_id, store_id, status, window_start, window_end,
             window_minutes, overlap_seconds, export_dir, log_path,
             json.dumps(request_params or {}, ensure_ascii=False))
        )

        run_id = result[0]["run_id"]
        self.db.commit()
        return run_id

    def update_run(self, run_id: int, counts: dict, status: str,
                   ended_at: datetime = None, manifest: dict = None,
                   error_message: str = None):
        """Update a run record."""
        sql = """
            UPDATE etl_admin.etl_run
            SET fetched_count = %s,
                loaded_count = %s,
                updated_count = %s,
                skipped_count = %s,
                error_count = %s,
                unknown_fields = %s,
                status = %s,
                ended_at = %s,
                manifest = %s,
                error_message = %s
            WHERE run_id = %s
        """

        self.db.execute(
            sql,
            (counts.get("fetched", 0), counts.get("inserted", 0),
             counts.get("updated", 0), counts.get("skipped", 0),
             counts.get("errors", 0), counts.get("unknown_fields", 0),
             status, ended_at,
             json.dumps(manifest or {}, ensure_ascii=False),
             error_message, run_id)
        )
        self.db.commit()
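The expected lifecycle is create-then-update. A sketch with placeholder values (note that the scheduler maps RUNNING to PARTIAL before writing, see `_map_run_status` below):

```python
from datetime import datetime

tracker = RunTracker(db_conn)                 # db_conn as above; values are placeholders
run_id = tracker.create_run(
    task_id=7, store_id=1, run_uuid="abc123", export_dir="export/20251209",
    log_path="logs/abc123.log", status="PARTIAL",
)
# ... run the task, accumulating counters ...
tracker.update_run(
    run_id,
    counts={"fetched": 200, "inserted": 180, "updated": 20},
    status="SUCC",
    ended_at=datetime.now(),
)
```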
etl_billiards/orchestration/scheduler.py (new file, 234 lines)
@@ -0,0 +1,234 @@
# -*- coding: utf-8 -*-
"""ETL scheduler: supports three modes, online fetch, offline cleanse-and-load, and full pipeline."""
from __future__ import annotations

import uuid
from datetime import datetime
from pathlib import Path
from zoneinfo import ZoneInfo

from api.client import APIClient
from api.local_json_client import LocalJsonClient
from api.recording_client import RecordingAPIClient
from database.connection import DatabaseConnection
from database.operations import DatabaseOperations
from orchestration.cursor_manager import CursorManager
from orchestration.run_tracker import RunTracker
from orchestration.task_registry import default_registry


class ETLScheduler:
    """Schedules multiple tasks, running the fetch / cleanse-and-load stages per pipeline.flow."""

    def __init__(self, config, logger):
        self.config = config
        self.logger = logger
        self.tz = ZoneInfo(config.get("app.timezone", "Asia/Taipei"))

        self.pipeline_flow = str(config.get("pipeline.flow", "FULL") or "FULL").upper()
        self.fetch_root = Path(config.get("pipeline.fetch_root") or config["io"]["export_root"])
        self.ingest_source_dir = config.get("pipeline.ingest_source_dir") or ""
        self.write_pretty_json = bool(config.get("io.write_pretty_json", False))

        # Components
        self.db_conn = DatabaseConnection(
            dsn=config["db"]["dsn"],
            session=config["db"].get("session"),
            connect_timeout=config["db"].get("connect_timeout_sec"),
        )
        self.db_ops = DatabaseOperations(self.db_conn)

        self.api_client = APIClient(
            base_url=config["api"]["base_url"],
            token=config["api"]["token"],
            timeout=config["api"]["timeout_sec"],
            retry_max=config["api"]["retries"]["max_attempts"],
            headers_extra=config["api"].get("headers_extra"),
        )

        self.cursor_mgr = CursorManager(self.db_conn)
        self.run_tracker = RunTracker(self.db_conn)
        self.task_registry = default_registry

    # ------------------------------------------------------------------ public
    def run_tasks(self, task_codes: list | None = None):
        """Run tasks from configuration or from an explicit list."""
        run_uuid = uuid.uuid4().hex
        store_id = self.config.get("app.store_id")

        if not task_codes:
            task_codes = self.config.get("run.tasks", [])

        self.logger.info("Starting tasks: %s, run_uuid=%s", task_codes, run_uuid)

        for task_code in task_codes:
            try:
                self._run_single_task(task_code, run_uuid, store_id)
            except Exception as exc:  # noqa: BLE001
                self.logger.error("Task %s failed: %s", task_code, exc, exc_info=True)
                continue

        self.logger.info("All tasks completed")

    # ------------------------------------------------------------------ internals
    def _run_single_task(self, task_code: str, run_uuid: str, store_id: int):
        """Orchestrate fetch / cleanse-and-load for a single task."""
        task_cfg = self._load_task_config(task_code, store_id)
        if not task_cfg:
            self.logger.warning("Task %s is disabled or does not exist", task_code)
            return

        task_id = task_cfg["task_id"]
        cursor_data = self.cursor_mgr.get_or_create(task_id, store_id)

        # Run record
        export_dir = Path(self.config["io"]["export_root"]) / datetime.now(self.tz).strftime("%Y%m%d")
        log_path = str(Path(self.config["io"]["log_root"]) / f"{run_uuid}.log")
        run_id = self.run_tracker.create_run(
            task_id=task_id,
            store_id=store_id,
            run_uuid=run_uuid,
            export_dir=str(export_dir),
            log_path=log_path,
            status=self._map_run_status("RUNNING"),
        )

        # Prepare the directory for the fetch stage
        fetch_dir = self._build_fetch_dir(task_code, run_id)
        fetch_stats = None

        try:
            if self._flow_includes_fetch():
                fetch_stats = self._execute_fetch(task_code, cursor_data, fetch_dir, run_id)
                if self.pipeline_flow == "FETCH_ONLY":
                    counts = self._counts_from_fetch(fetch_stats)
                    self.run_tracker.update_run(
                        run_id=run_id,
                        counts=counts,
                        status=self._map_run_status("SUCCESS"),
                        ended_at=datetime.now(self.tz),
                    )
                    return

            if self._flow_includes_ingest():
                source_dir = self._resolve_ingest_source(fetch_dir, fetch_stats)
                result = self._execute_ingest(task_code, cursor_data, source_dir)

                self.run_tracker.update_run(
                    run_id=run_id,
                    counts=result["counts"],
                    status=self._map_run_status(result["status"]),
                    ended_at=datetime.now(self.tz),
                )

                if (result.get("status") or "").upper() == "SUCCESS":
                    window = result.get("window")
                    if window:
                        self.cursor_mgr.advance(
                            task_id=task_id,
                            store_id=store_id,
                            window_start=window.get("start"),
                            window_end=window.get("end"),
                            run_id=run_id,
                        )

        except Exception as exc:  # noqa: BLE001
            self.run_tracker.update_run(
                run_id=run_id,
                counts={},
                status=self._map_run_status("FAIL"),
                ended_at=datetime.now(self.tz),
                error_message=str(exc),
            )
            raise

    def _execute_fetch(self, task_code: str, cursor_data: dict | None, fetch_dir: Path, run_id: int):
        """Online fetch stage: pull via RecordingAPIClient and write to disk; no transform/load."""
        recording_client = RecordingAPIClient(
            base_client=self.api_client,
            output_dir=fetch_dir,
            task_code=task_code,
            run_id=run_id,
            write_pretty=self.write_pretty_json,
        )
        task = self.task_registry.create_task(task_code, self.config, self.db_ops, recording_client, self.logger)
        context = task._build_context(cursor_data)  # type: ignore[attr-defined]
        self.logger.info("%s: fetch stage started, dir=%s", task_code, fetch_dir)

        extracted = task.extract(context)
        # Fetch is done; transform/load is deliberately skipped here
        stats = recording_client.last_dump or {}
        fetched_count = stats.get("records") or (
            len(extracted.get("records", [])) if isinstance(extracted, dict) else 0
        )
        self.logger.info(
            "%s: fetch finished, file=%s, records=%s",
            task_code,
            stats.get("file"),
            fetched_count,
        )
        return {"file": stats.get("file"), "records": fetched_count, "pages": stats.get("pages")}

    def _execute_ingest(self, task_code: str, cursor_data: dict | None, source_dir: Path):
        """Local cleanse-and-load: replay JSON via LocalJsonClient through the task's normal ETL path."""
        local_client = LocalJsonClient(source_dir)
        task = self.task_registry.create_task(task_code, self.config, self.db_ops, local_client, self.logger)
        self.logger.info("%s: local cleanse-and-load started, source dir=%s", task_code, source_dir)
        return task.execute(cursor_data)

    def _build_fetch_dir(self, task_code: str, run_id: int) -> Path:
        ts = datetime.now(self.tz).strftime("%Y%m%d-%H%M%S")
        return Path(self.fetch_root) / f"{task_code.upper()}-{run_id}-{ts}"

    def _resolve_ingest_source(self, fetch_dir: Path, fetch_stats: dict | None) -> Path:
        if fetch_stats and fetch_dir.exists():
            return fetch_dir
        if self.ingest_source_dir:
            return Path(self.ingest_source_dir)
        raise FileNotFoundError("No JSON directory provided for local cleanse-and-load")

    def _counts_from_fetch(self, stats: dict | None) -> dict:
        fetched = (stats or {}).get("records") or 0
        return {
            "fetched": fetched,
            "inserted": 0,
            "updated": 0,
            "skipped": 0,
            "errors": 0,
        }

    def _flow_includes_fetch(self) -> bool:
        return self.pipeline_flow in {"FETCH_ONLY", "FULL"}

    def _flow_includes_ingest(self) -> bool:
        return self.pipeline_flow in {"INGEST_ONLY", "FULL"}

    def _load_task_config(self, task_code: str, store_id: int) -> dict | None:
        """Load the task configuration from the database."""
        sql = """
            SELECT task_id, task_code, store_id, enabled, cursor_field,
                   window_minutes_default, overlap_seconds, page_size, retry_max, params
            FROM etl_admin.etl_task
            WHERE store_id = %s AND task_code = %s AND enabled = TRUE
        """

        rows = self.db_conn.query(sql, (store_id, task_code))
        return rows[0] if rows else None

    def close(self):
        """Close connections."""
        self.db_conn.close()

    @staticmethod
    def _map_run_status(status: str) -> str:
        """
        Map a task-reported status to etl_admin.run_status_enum
        (SUCC / FAIL / PARTIAL)
        """
        normalized = (status or "").upper()
        if normalized in {"SUCCESS", "SUCC"}:
            return "SUCC"
        if normalized in {"FAIL", "FAILED", "ERROR"}:
            return "FAIL"
        if normalized in {"RUNNING", "PARTIAL", "PENDING", "IN_PROGRESS"}:
            return "PARTIAL"
        # Unknown statuses default to FAIL so they surface during triage
        return "FAIL"
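Driving the scheduler end to end then takes only a few lines. A sketch: `config` is assumed to come from the settings loader in `config/settings.py`:

```python
import logging

logger = logging.getLogger("etl")
scheduler = ETLScheduler(config, logger)
try:
    scheduler.run_tasks(["MANUAL_INGEST", "DWD_LOAD_FROM_ODS"])
finally:
    scheduler.close()
```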
etl_billiards/orchestration/task_registry.py (new file, 76 lines)
@@ -0,0 +1,76 @@
# -*- coding: utf-8 -*-
"""Task registry."""
from tasks.orders_task import OrdersTask
from tasks.payments_task import PaymentsTask
from tasks.members_task import MembersTask
from tasks.products_task import ProductsTask
from tasks.tables_task import TablesTask
from tasks.assistants_task import AssistantsTask
from tasks.packages_task import PackagesDefTask
from tasks.refunds_task import RefundsTask
from tasks.coupon_usage_task import CouponUsageTask
from tasks.inventory_change_task import InventoryChangeTask
from tasks.topups_task import TopupsTask
from tasks.table_discount_task import TableDiscountTask
from tasks.assistant_abolish_task import AssistantAbolishTask
from tasks.ledger_task import LedgerTask
from tasks.ods_tasks import ODS_TASK_CLASSES
from tasks.manual_ingest_task import ManualIngestTask
from tasks.payments_dwd_task import PaymentsDwdTask
from tasks.members_dwd_task import MembersDwdTask
from tasks.init_schema_task import InitOdsSchemaTask
from tasks.init_dwd_schema_task import InitDwdSchemaTask
from tasks.dwd_load_task import DwdLoadTask
from tasks.ticket_dwd_task import TicketDwdTask
from tasks.dwd_quality_task import DwdQualityTask


class TaskRegistry:
    """Task registration and factory."""

    def __init__(self):
        self._tasks = {}

    def register(self, task_code: str, task_class):
        """Register a task class."""
        self._tasks[task_code.upper()] = task_class

    def create_task(self, task_code: str, config, db_connection, api_client, logger):
        """Create a task instance."""
        task_code = task_code.upper()
        if task_code not in self._tasks:
            raise ValueError(f"Unknown task type: {task_code}")

        task_class = self._tasks[task_code]
        return task_class(config, db_connection, api_client, logger)

    def get_all_task_codes(self) -> list:
        """Return all registered task codes."""
        return list(self._tasks.keys())


# Default registry
default_registry = TaskRegistry()
default_registry.register("PRODUCTS", ProductsTask)
default_registry.register("TABLES", TablesTask)
default_registry.register("MEMBERS", MembersTask)
default_registry.register("ASSISTANTS", AssistantsTask)
default_registry.register("PACKAGES_DEF", PackagesDefTask)
default_registry.register("ORDERS", OrdersTask)
default_registry.register("PAYMENTS", PaymentsTask)
default_registry.register("REFUNDS", RefundsTask)
default_registry.register("COUPON_USAGE", CouponUsageTask)
default_registry.register("INVENTORY_CHANGE", InventoryChangeTask)
default_registry.register("TOPUPS", TopupsTask)
default_registry.register("TABLE_DISCOUNT", TableDiscountTask)
default_registry.register("ASSISTANT_ABOLISH", AssistantAbolishTask)
default_registry.register("LEDGER", LedgerTask)
default_registry.register("TICKET_DWD", TicketDwdTask)
default_registry.register("MANUAL_INGEST", ManualIngestTask)
default_registry.register("PAYMENTS_DWD", PaymentsDwdTask)
default_registry.register("MEMBERS_DWD", MembersDwdTask)
default_registry.register("INIT_ODS_SCHEMA", InitOdsSchemaTask)
default_registry.register("INIT_DWD_SCHEMA", InitDwdSchemaTask)
default_registry.register("DWD_LOAD_FROM_ODS", DwdLoadTask)
default_registry.register("DWD_QUALITY_CHECK", DwdQualityTask)
for code, task_cls in ODS_TASK_CLASSES.items():
    default_registry.register(code, task_cls)
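Extending the registry with a new task is a two-liner; the class below is a hypothetical stand-in for a real task implementation:

```python
class MyCustomTask:  # hypothetical; real tasks follow the tasks/ base-class contract
    def __init__(self, config, db_connection, api_client, logger):
        self.config, self.db, self.api, self.logger = config, db_connection, api_client, logger

default_registry.register("MY_CUSTOM", MyCustomTask)
# Lookup is case-insensitive because register/create_task both upper-case the code:
task = default_registry.create_task("my_custom", config, db_ops, api_client, logger)
```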
etl_billiards/quality/__init__.py (new file, 0 lines)
etl_billiards/quality/balance_checker.py (new file, 73 lines)
@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
"""Balance consistency checker."""
from .base_checker import BaseDataQualityChecker


class BalanceChecker(BaseDataQualityChecker):
    """Checks amount consistency across orders, payments, and refunds."""

    def check(self, store_id: int, start_date: str, end_date: str) -> dict:
        """
        Check balance consistency for the given time range.

        Verifies: order total = payment total - refund total
        """
        checks = []

        # Order total
        sql_orders = """
            SELECT COALESCE(SUM(final_amount), 0) AS total
            FROM billiards.fact_order
            WHERE store_id = %s
              AND order_time >= %s
              AND order_time < %s
              AND order_status = 'COMPLETED'
        """
        order_total = self.db.query(sql_orders, (store_id, start_date, end_date))[0]["total"]

        # Payment total
        sql_payments = """
            SELECT COALESCE(SUM(pay_amount), 0) AS total
            FROM billiards.fact_payment
            WHERE store_id = %s
              AND pay_time >= %s
              AND pay_time < %s
              AND pay_status = 'SUCCESS'
        """
        payment_total = self.db.query(sql_payments, (store_id, start_date, end_date))[0]["total"]

        # Refund total
        sql_refunds = """
            SELECT COALESCE(SUM(refund_amount), 0) AS total
            FROM billiards.fact_refund
            WHERE store_id = %s
              AND refund_time >= %s
              AND refund_time < %s
              AND refund_status = 'SUCCESS'
        """
        refund_total = self.db.query(sql_refunds, (store_id, start_date, end_date))[0]["total"]

        # Verify the balance
        expected_total = payment_total - refund_total
        diff = abs(float(order_total) - float(expected_total))
        threshold = 0.01  # one-cent tolerance

        passed = diff < threshold

        checks.append({
            "name": "balance_consistency",
            "passed": passed,
            "message": f"order total: {order_total}, payments - refunds: {expected_total}, diff: {diff}",
            "details": {
                "order_total": float(order_total),
                "payment_total": float(payment_total),
                "refund_total": float(refund_total),
                "diff": diff
            }
        })

        all_passed = all(c["passed"] for c in checks)

        return {
            "passed": all_passed,
            "checks": checks
        }
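Invoking the checker and surfacing failures might look like this (dates are placeholders):

```python
checker = BalanceChecker(db_conn, logger)
result = checker.check(store_id=1, start_date="2025-12-01", end_date="2025-12-09")
if not result["passed"]:
    for item in result["checks"]:
        logger.warning("%s failed: %s", item["name"], item["message"])
```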
etl_billiards/quality/base_checker.py (new file, 19 lines)
@@ -0,0 +1,19 @@
# -*- coding: utf-8 -*-
"""Base class for data quality checkers."""


class BaseDataQualityChecker:
    """Base class for data quality checkers."""

    def __init__(self, db_connection, logger):
        self.db = db_connection
        self.logger = logger

    def check(self) -> dict:
        """
        Run the quality check.

        Returns: {
            "passed": bool,
            "checks": [{"name": str, "passed": bool, "message": str}]
        }
        """
        raise NotImplementedError("Subclasses must implement check()")
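A new checker only has to honor the return contract. A minimal hypothetical subclass (the table name is illustrative, and `self.db.query` is assumed to accept an empty parameter tuple):

```python
class RowCountChecker(BaseDataQualityChecker):
    """Fails when a fact table is unexpectedly empty."""

    def check(self) -> dict:
        rows = self.db.query("SELECT COUNT(*) AS n FROM billiards.fact_payment", ())
        n = rows[0]["n"]
        passed = n > 0
        return {
            "passed": passed,
            "checks": [{"name": "fact_payment_nonempty", "passed": passed,
                        "message": f"fact_payment has {n} rows"}],
        }
```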
etl_billiards/reports/dwd_quality_report.json (new file, 692 lines)
@@ -0,0 +1,692 @@
{
  "generated_at": "2025-12-09T05:21:24.745244",
  "tables": [
    {"dwd_table": "billiards_dwd.dim_site", "ods_table": "billiards_ods.table_fee_transactions", "count": {"dwd": 1, "ods": 200, "diff": -199}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_site_ex", "ods_table": "billiards_ods.table_fee_transactions", "count": {"dwd": 1, "ods": 200, "diff": -199}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_table", "ods_table": "billiards_ods.site_tables_master", "count": {"dwd": 71, "ods": 71, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_table_ex", "ods_table": "billiards_ods.site_tables_master", "count": {"dwd": 71, "ods": 71, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_assistant", "ods_table": "billiards_ods.assistant_accounts_master", "count": {"dwd": 50, "ods": 50, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_assistant_ex", "ods_table": "billiards_ods.assistant_accounts_master", "count": {"dwd": 50, "ods": 50, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_member", "ods_table": "billiards_ods.member_profiles", "count": {"dwd": 199, "ods": 199, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_member_ex", "ods_table": "billiards_ods.member_profiles", "count": {"dwd": 199, "ods": 199, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_member_card_account", "ods_table": "billiards_ods.member_stored_value_cards", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "balance", "dwd_sum": 31061.03, "ods_sum": 31061.03, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dim_member_card_account_ex", "ods_table": "billiards_ods.member_stored_value_cards", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "deliveryfeededuct", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dim_tenant_goods", "ods_table": "billiards_ods.tenant_goods_master", "count": {"dwd": 156, "ods": 156, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_tenant_goods_ex", "ods_table": "billiards_ods.tenant_goods_master", "count": {"dwd": 156, "ods": 156, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_store_goods", "ods_table": "billiards_ods.store_goods_master", "count": {"dwd": 161, "ods": 161, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_store_goods_ex", "ods_table": "billiards_ods.store_goods_master", "count": {"dwd": 161, "ods": 161, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_goods_category", "ods_table": "billiards_ods.stock_goods_category_tree", "count": {"dwd": 26, "ods": 9, "diff": 17}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_groupbuy_package", "ods_table": "billiards_ods.group_buy_packages", "count": {"dwd": 17, "ods": 17, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_groupbuy_package_ex", "ods_table": "billiards_ods.group_buy_packages", "count": {"dwd": 17, "ods": 17, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_settlement_head", "ods_table": "billiards_ods.settlement_records", "count": {"dwd": 200, "ods": 200, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_settlement_head_ex", "ods_table": "billiards_ods.settlement_records", "count": {"dwd": 200, "ods": 200, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_table_fee_log", "ods_table": "billiards_ods.table_fee_transactions", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "adjust_amount", "dwd_sum": 1157.45, "ods_sum": 1157.45, "diff": 0.0},
                 {"column": "coupon_promotion_amount", "dwd_sum": 11244.49, "ods_sum": 11244.49, "diff": 0.0},
                 {"column": "ledger_amount", "dwd_sum": 18107.0, "ods_sum": 18107.0, "diff": 0.0},
                 {"column": "member_discount_amount", "dwd_sum": 1149.19, "ods_sum": 1149.19, "diff": 0.0},
                 {"column": "real_table_charge_money", "dwd_sum": 5705.06, "ods_sum": 5705.06, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_table_fee_log_ex", "ods_table": "billiards_ods.table_fee_transactions", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "fee_total", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "mgmt_fee", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "service_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "used_card_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_table_fee_adjust", "ods_table": "billiards_ods.table_fee_discount_records", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "ledger_amount", "dwd_sum": 20650.84, "ods_sum": 20650.84, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_table_fee_adjust_ex", "ods_table": "billiards_ods.table_fee_discount_records", "count": {"dwd": 200, "ods": 200, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_store_goods_sale", "ods_table": "billiards_ods.store_goods_sales_records", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "cost_money", "dwd_sum": 22.3, "ods_sum": 22.3, "diff": 0.0},
                 {"column": "ledger_amount", "dwd_sum": 4583.0, "ods_sum": 4583.0, "diff": 0.0},
                 {"column": "real_goods_money", "dwd_sum": 3791.0, "ods_sum": 3791.0, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_store_goods_sale_ex", "ods_table": "billiards_ods.store_goods_sales_records", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "coupon_deduct_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "discount_money", "dwd_sum": 792.0, "ods_sum": 792.0, "diff": 0.0},
                 {"column": "member_discount_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "option_coupon_deduct_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "option_member_discount_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "point_discount_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "point_discount_money_cost", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "push_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_assistant_service_log", "ods_table": "billiards_ods.assistant_service_records", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "coupon_deduct_money", "dwd_sum": 626.83, "ods_sum": 626.83, "diff": 0.0},
                 {"column": "ledger_amount", "dwd_sum": 63251.37, "ods_sum": 63251.37, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_assistant_service_log_ex", "ods_table": "billiards_ods.assistant_service_records", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "manual_discount_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "member_discount_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "service_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_assistant_trash_event", "ods_table": "billiards_ods.assistant_cancellation_records", "count": {"dwd": 15, "ods": 15, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_assistant_trash_event_ex", "ods_table": "billiards_ods.assistant_cancellation_records", "count": {"dwd": 15, "ods": 15, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_member_balance_change", "ods_table": "billiards_ods.member_balance_changes", "count": {"dwd": 200, "ods": 200, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_member_balance_change_ex", "ods_table": "billiards_ods.member_balance_changes", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "refund_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_groupbuy_redemption", "ods_table": "billiards_ods.group_buy_redemption_records", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "coupon_money", "dwd_sum": 12266.0, "ods_sum": 12266.0, "diff": 0.0},
                 {"column": "ledger_amount", "dwd_sum": 12049.53, "ods_sum": 12049.53, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_groupbuy_redemption_ex", "ods_table": "billiards_ods.group_buy_redemption_records", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "assistant_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "assistant_service_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "goods_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "recharge_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "reward_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "table_service_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_platform_coupon_redemption", "ods_table": "billiards_ods.platform_coupon_redemption_records", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "coupon_money", "dwd_sum": 11956.0, "ods_sum": 11956.0, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_platform_coupon_redemption_ex", "ods_table": "billiards_ods.platform_coupon_redemption_records", "count": {"dwd": 200, "ods": 200, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_recharge_order", "ods_table": "billiards_ods.recharge_settlements", "count": {"dwd": 74, "ods": 74, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_recharge_order_ex", "ods_table": "billiards_ods.recharge_settlements", "count": {"dwd": 74, "ods": 74, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_payment", "ods_table": "billiards_ods.payment_transactions", "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [{"column": "pay_amount", "dwd_sum": 10863.0, "ods_sum": 10863.0, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_refund", "ods_table": "billiards_ods.refund_transactions", "count": {"dwd": 11, "ods": 11, "diff": 0},
     "amounts": [{"column": "channel_fee", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "pay_amount", "dwd_sum": -62186.0, "ods_sum": -62186.0, "diff": 0.0}]},
    {"dwd_table": "billiards_dwd.dwd_refund_ex", "ods_table": "billiards_ods.refund_transactions", "count": {"dwd": 11, "ods": 11, "diff": 0},
     "amounts": [{"column": "balance_frozen_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "card_frozen_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "refund_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
                 {"column": "round_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}]}
  ],
  "note": "Row-count/amount reconciliation; amount columns are auto-detected by scanning numeric columns whose names contain amount/money/fee/balance."
}
etl_billiards/requirements.txt (new file, 5 lines)
@@ -0,0 +1,5 @@
# Python dependencies
psycopg2-binary>=2.9.0
requests>=2.28.0
python-dateutil>=2.8.0
tzdata>=2023.0
etl_billiards/run_ods.bat (new file, 27 lines)
@@ -0,0 +1,27 @@
@echo off
REM -*- coding: utf-8 -*-
REM One-click rebuild of ODS (runs INIT_ODS_SCHEMA) and ingest of the sample JSON (runs MANUAL_INGEST)
REM Configuration: PG_DSN and INGEST_SOURCE_DIR from .env, or override via the variable below

setlocal
cd /d %~dp0

REM Adjust INGEST_DIR below if you need to override the sample directory
set "INGEST_DIR=C:\dev\LLTQ\export\test-json-doc"

echo [INIT_ODS_SCHEMA] starting, source dir=%INGEST_DIR%
python -m cli.main --tasks INIT_ODS_SCHEMA --pipeline-flow INGEST_ONLY --ingest-source "%INGEST_DIR%"
if errorlevel 1 (
    echo INIT_ODS_SCHEMA failed, exiting
    exit /b 1
)

echo [MANUAL_INGEST] starting, source dir=%INGEST_DIR%
python -m cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "%INGEST_DIR%"
if errorlevel 1 (
    echo MANUAL_INGEST failed, exiting
    exit /b 1
)

echo All done.
endlocal
etl_billiards/scd/__init__.py (new file, 0 lines)
etl_billiards/scd/scd2_handler.py (new file, 89 lines)
@@ -0,0 +1,89 @@
# -*- coding: utf-8 -*-
"""SCD2 (Slowly Changing Dimension Type 2) handling logic."""
from datetime import datetime


def _row_to_dict(cursor, row):
    if row is None:
        return None
    columns = [desc[0] for desc in cursor.description]
    return {col: row[idx] for idx, col in enumerate(columns)}


class SCD2Handler:
    """Maintains SCD2 history records."""

    def __init__(self, db_ops):
        self.db = db_ops

    def upsert(
        self,
        table_name: str,
        natural_key: list,
        tracked_fields: list,
        record: dict,
        effective_date: datetime = None,
    ) -> str:
        """
        Apply an SCD2 update.

        Returns:
            The operation performed: 'INSERT', 'UPDATE', or 'UNCHANGED'
        """
        effective_date = effective_date or datetime.now()

        where_clause = " AND ".join([f"{k} = %({k})s" for k in natural_key])
        sql_select = f"""
            SELECT * FROM {table_name}
            WHERE {where_clause}
              AND valid_to IS NULL
        """

        with self.db.conn.cursor() as current:
            current.execute(sql_select, record)
            existing = _row_to_dict(current, current.fetchone())

            if not existing:
                record["valid_from"] = effective_date
                record["valid_to"] = None
                record["is_current"] = True

                fields = list(record.keys())
                placeholders = ", ".join([f"%({f})s" for f in fields])
                sql_insert = f"""
                    INSERT INTO {table_name} ({', '.join(fields)})
                    VALUES ({placeholders})
                """
                current.execute(sql_insert, record)
                return "INSERT"

            has_changes = any(existing.get(field) != record.get(field) for field in tracked_fields)
            if not has_changes:
                return "UNCHANGED"

            update_where = " AND ".join([f"{k} = %({k})s" for k in natural_key])
            sql_close = f"""
                UPDATE {table_name}
                SET valid_to = %(effective_date)s,
                    is_current = FALSE
                WHERE {update_where}
                  AND valid_to IS NULL
            """
            record["effective_date"] = effective_date
            current.execute(sql_close, record)

            record["valid_from"] = effective_date
            record["valid_to"] = None
            record["is_current"] = True

            fields = list(record.keys())
            if "effective_date" in fields:
                fields.remove("effective_date")
            placeholders = ", ".join([f"%({f})s" for f in fields])
            sql_insert = f"""
                INSERT INTO {table_name} ({', '.join(fields)})
                VALUES ({placeholders})
            """
            current.execute(sql_insert, record)

            return "UPDATE"
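A usage sketch: tracking changes to a member's grade as SCD2 history. The table and field names are illustrative, and committing the transaction is left to the caller:

```python
handler = SCD2Handler(db_ops)
op = handler.upsert(
    table_name="billiards_dwd.dim_member_scd",        # illustrative table
    natural_key=["store_id", "member_id"],
    tracked_fields=["member_grade", "mobile"],
    record={"store_id": 1, "member_id": 42, "member_grade": "gold", "mobile": "13800000000"},
)
# 'INSERT' on first sight; 'UPDATE' when a tracked field changed (the old row
# is closed via valid_to/is_current=FALSE); 'UNCHANGED' otherwise.
```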
etl_billiards/scripts/Temp1.py (new file, 0 lines)
etl_billiards/scripts/bootstrap_schema.py (new file, 76 lines)
@@ -0,0 +1,76 @@
# -*- coding: utf-8 -*-
"""Apply the PRD-aligned warehouse schema (ODS/DWD/DWS) to PostgreSQL."""
from __future__ import annotations

import argparse
import os
import sys
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parents[1]
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

from database.connection import DatabaseConnection  # noqa: E402


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Create/upgrade warehouse schemas using schema_v2.sql"
    )
    parser.add_argument(
        "--dsn",
        help="PostgreSQL DSN (fallback to PG_DSN env)",
        default=os.environ.get("PG_DSN"),
    )
    parser.add_argument(
        "--file",
        help="Path to schema SQL",
        default=str(PROJECT_ROOT / "database" / "schema_v2.sql"),
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=int(os.environ.get("PG_CONNECT_TIMEOUT", 10) or 10),
        help="connect_timeout seconds (capped at 20, default 10)",
    )
    return parser.parse_args()


def apply_schema(dsn: str, sql_path: Path, timeout: int) -> None:
    if not sql_path.exists():
        raise FileNotFoundError(f"Schema file not found: {sql_path}")

    sql_text = sql_path.read_text(encoding="utf-8")
    timeout_val = max(1, min(timeout, 20))

    conn = DatabaseConnection(dsn, connect_timeout=timeout_val)
    try:
        with conn.conn.cursor() as cur:
            cur.execute(sql_text)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def main() -> int:
    args = parse_args()
    if not args.dsn:
        print("Missing DSN. Set PG_DSN or pass --dsn.", file=sys.stderr)
        return 2

    try:
        apply_schema(args.dsn, Path(args.file), args.timeout)
    except Exception as exc:  # pragma: no cover - utility script
        print(f"Schema apply failed: {exc}", file=sys.stderr)
        return 1

    print("Schema applied successfully.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
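Besides the CLI entry point, `apply_schema` can be called directly; the DSN below is a placeholder:

```python
from pathlib import Path

apply_schema(
    dsn="postgresql://user:secret@localhost:5432/billiards",  # placeholder DSN
    sql_path=Path("etl_billiards/database/schema_v2.sql"),
    timeout=10,
)
```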
etl_billiards/scripts/build_dwd_from_ods.py (new file, 426 lines)
@@ -0,0 +1,426 @@
# -*- coding: utf-8 -*-
"""Populate PRD DWD tables from ODS payload snapshots."""
from __future__ import annotations

import argparse
import os
import sys

import psycopg2


SQL_STEPS: list[tuple[str, str]] = [
    (
        "dim_tenant",
        """
        INSERT INTO billiards_dwd.dim_tenant (tenant_id, tenant_name, status)
        SELECT DISTINCT tenant_id, 'default' AS tenant_name, 'active' AS status
        FROM (
            SELECT tenant_id FROM billiards_ods.settlement_records
            UNION SELECT tenant_id FROM billiards_ods.ods_order_receipt_detail
            UNION SELECT tenant_id FROM billiards_ods.member_profiles
        ) s
        WHERE tenant_id IS NOT NULL
        ON CONFLICT (tenant_id) DO UPDATE SET updated_at = now();
        """,
    ),
    (
        "dim_site",
        """
        INSERT INTO billiards_dwd.dim_site (site_id, tenant_id, site_name, status)
        SELECT DISTINCT site_id, MAX(tenant_id) AS tenant_id, 'default' AS site_name, 'active' AS status
        FROM (
            SELECT site_id, tenant_id FROM billiards_ods.settlement_records
            UNION SELECT site_id, tenant_id FROM billiards_ods.ods_order_receipt_detail
            UNION SELECT site_id, tenant_id FROM billiards_ods.ods_table_info
        ) s
        WHERE site_id IS NOT NULL
        GROUP BY site_id
        ON CONFLICT (site_id) DO UPDATE SET updated_at = now();
        """,
    ),
    (
        "dim_product_category",
        """
        INSERT INTO billiards_dwd.dim_product_category (category_id, category_name, parent_id, level_no, status)
        SELECT DISTINCT category_id, category_name, parent_id, level_no, status
        FROM billiards_ods.ods_goods_category
        WHERE category_id IS NOT NULL
        ON CONFLICT (category_id) DO UPDATE SET
            category_name = EXCLUDED.category_name,
            parent_id = EXCLUDED.parent_id,
            level_no = EXCLUDED.level_no,
            status = EXCLUDED.status;
        """,
    ),
    (
        "dim_product",
        """
        INSERT INTO billiards_dwd.dim_product (goods_id, goods_name, goods_code, category_id, category_name, unit, default_price, status)
        SELECT DISTINCT goods_id, goods_name, NULL::TEXT AS goods_code, category_id, category_name, NULL::TEXT AS unit, sale_price AS default_price, status
        FROM billiards_ods.ods_store_product
        WHERE goods_id IS NOT NULL
        ON CONFLICT (goods_id) DO UPDATE SET
            goods_name = EXCLUDED.goods_name,
            category_id = EXCLUDED.category_id,
            category_name = EXCLUDED.category_name,
            default_price = EXCLUDED.default_price,
            status = EXCLUDED.status,
            updated_at = now();
        """,
    ),
    (
        "dim_product_from_sales",
        """
        INSERT INTO billiards_dwd.dim_product (goods_id, goods_name)
        SELECT DISTINCT goods_id, goods_name
        FROM billiards_ods.ods_store_sale_item
        WHERE goods_id IS NOT NULL
        ON CONFLICT (goods_id) DO NOTHING;
        """,
    ),
    (
        "dim_member_card_type",
        """
        INSERT INTO billiards_dwd.dim_member_card_type (card_type_id, card_type_name, discount_rate)
        SELECT DISTINCT card_type_id, card_type_name, discount_rate
        FROM billiards_ods.member_stored_value_cards
        WHERE card_type_id IS NOT NULL
        ON CONFLICT (card_type_id) DO UPDATE SET
            card_type_name = EXCLUDED.card_type_name,
            discount_rate = EXCLUDED.discount_rate;
        """,
    ),
    (
        "dim_member",
        """
        INSERT INTO billiards_dwd.dim_member (
            site_id, member_id, tenant_id, member_name, nickname, gender, birthday, mobile,
            member_type_id, member_type_name, status, register_time, last_visit_time,
            balance, total_recharge_amount, total_consumed_amount, wechat_id, alipay_id, remark
        )
        SELECT DISTINCT
            prof.site_id,
            prof.member_id,
            prof.tenant_id,
            prof.member_name,
            prof.nickname,
            prof.gender,
            prof.birthday,
            prof.mobile,
            card.member_type_id,
            card.member_type_name,
            prof.status,
            prof.register_time,
            prof.last_visit_time,
            prof.balance,
            NULL::NUMERIC AS total_recharge_amount,
            NULL::NUMERIC AS total_consumed_amount,
            prof.wechat_id,
            prof.alipay_id,
            prof.remarks
        FROM billiards_ods.member_profiles prof
        LEFT JOIN (
            SELECT DISTINCT site_id, member_id, card_type_id AS member_type_id, card_type_name AS member_type_name
            FROM billiards_ods.member_stored_value_cards
        ) card
            ON prof.site_id = card.site_id AND prof.member_id = card.member_id
        WHERE prof.member_id IS NOT NULL
        ON CONFLICT (site_id, member_id) DO UPDATE SET
            member_name = EXCLUDED.member_name,
            nickname = EXCLUDED.nickname,
            gender = EXCLUDED.gender,
            birthday = EXCLUDED.birthday,
            mobile = EXCLUDED.mobile,
            member_type_id = EXCLUDED.member_type_id,
            member_type_name = EXCLUDED.member_type_name,
            status = EXCLUDED.status,
            register_time = EXCLUDED.register_time,
            last_visit_time = EXCLUDED.last_visit_time,
            balance = EXCLUDED.balance,
            wechat_id = EXCLUDED.wechat_id,
            alipay_id = EXCLUDED.alipay_id,
            remark = EXCLUDED.remark,
            updated_at = now();
        """,
    ),
    (
        "dim_table",
        """
        INSERT INTO billiards_dwd.dim_table (table_id, site_id, table_code, table_name, table_type, area_name, status, created_time, updated_time)
        SELECT DISTINCT table_id, site_id, table_code, table_name, table_type, area_name, status, created_time, updated_time
        FROM billiards_ods.ods_table_info
        WHERE table_id IS NOT NULL
        ON CONFLICT (table_id) DO UPDATE SET
            site_id = EXCLUDED.site_id,
            table_code = EXCLUDED.table_code,
            table_name = EXCLUDED.table_name,
            table_type = EXCLUDED.table_type,
            area_name = EXCLUDED.area_name,
            status = EXCLUDED.status,
            created_time = EXCLUDED.created_time,
            updated_time = EXCLUDED.updated_time;
        """,
    ),
    (
        "dim_assistant",
        """
        INSERT INTO billiards_dwd.dim_assistant (assistant_id, assistant_name, mobile, status)
        SELECT DISTINCT assistant_id, assistant_name, mobile, status
        FROM billiards_ods.assistant_accounts_master
        WHERE assistant_id IS NOT NULL
        ON CONFLICT (assistant_id) DO UPDATE SET
            assistant_name = EXCLUDED.assistant_name,
            mobile = EXCLUDED.mobile,
            status = EXCLUDED.status,
            updated_at = now();
        """,
    ),
    (
        "dim_pay_method",
        """
        INSERT INTO billiards_dwd.dim_pay_method (pay_method_code, pay_method_name, is_stored_value, status)
        SELECT DISTINCT pay_method_code, pay_method_name, FALSE AS is_stored_value, 'active' AS status
        FROM billiards_ods.payment_transactions
        WHERE pay_method_code IS NOT NULL
        ON CONFLICT (pay_method_code) DO UPDATE SET
            pay_method_name = EXCLUDED.pay_method_name,
            status = EXCLUDED.status,
            updated_at = now();
        """,
    ),
    (
        "dim_coupon_platform",
        """
        INSERT INTO billiards_dwd.dim_coupon_platform (platform_code, platform_name)
        SELECT DISTINCT platform_code, platform_code AS platform_name
        FROM billiards_ods.ods_platform_coupon_log
        WHERE platform_code IS NOT NULL
        ON CONFLICT (platform_code) DO NOTHING;
        """,
    ),
    (
        "fact_sale_item",
        """
        INSERT INTO billiards_dwd.fact_sale_item (
            site_id, sale_item_id, order_trade_no, order_settle_id, member_id,
            goods_id, category_id, quantity, original_amount, discount_amount,
            final_amount, is_gift, sale_time
        )
        SELECT
            site_id,
            sale_item_id,
            order_trade_no,
            order_settle_id,
            NULL::BIGINT AS member_id,
            goods_id,
            category_id,
            quantity,
            original_amount,
            discount_amount,
            final_amount,
            COALESCE(is_gift, FALSE),
            sale_time
        FROM billiards_ods.ods_store_sale_item
        ON CONFLICT (site_id, sale_item_id) DO NOTHING;
        """,
    ),
    (
        "fact_table_usage",
        """
        INSERT INTO billiards_dwd.fact_table_usage (
            site_id, ledger_id, order_trade_no, order_settle_id, table_id,
            member_id, start_time, end_time, duration_minutes,
            original_table_fee, member_discount_amount, manual_discount_amount,
            final_table_fee, is_canceled, cancel_time
        )
        SELECT
            site_id,
            ledger_id,
            order_trade_no,
            order_settle_id,
            table_id,
            member_id,
            start_time,
            end_time,
            duration_minutes,
            original_table_fee,
            0::NUMERIC AS member_discount_amount,
            discount_amount AS manual_discount_amount,
            final_table_fee,
            FALSE AS is_canceled,
            NULL::TIMESTAMPTZ AS cancel_time
        FROM billiards_ods.table_fee_transactions_log
        ON CONFLICT (site_id, ledger_id) DO NOTHING;
        """,
    ),
    (
        "fact_assistant_service",
        """
        INSERT INTO billiards_dwd.fact_assistant_service (
            site_id, ledger_id, order_trade_no, order_settle_id, assistant_id,
            assist_type_code, member_id, start_time, end_time, duration_minutes,
            original_fee, member_discount_amount, manual_discount_amount,
            final_fee, is_canceled, cancel_time
        )
        SELECT
            site_id,
            ledger_id,
            order_trade_no,
            order_settle_id,
            assistant_id,
            NULL::TEXT AS assist_type_code,
            member_id,
            start_time,
            end_time,
            duration_minutes,
            original_fee,
            0::NUMERIC AS member_discount_amount,
            discount_amount AS manual_discount_amount,
            final_fee,
            FALSE AS is_canceled,
            NULL::TIMESTAMPTZ AS cancel_time
        FROM billiards_ods.ods_assistant_service_log
        ON CONFLICT (site_id, ledger_id) DO NOTHING;
        """,
    ),
    (
        "fact_coupon_usage",
        """
        INSERT INTO billiards_dwd.fact_coupon_usage (
            site_id, coupon_id, package_id, order_trade_no, order_settle_id,
            member_id, platform_code, status, deduct_amount, settle_price, used_time
        )
        SELECT
            site_id,
            coupon_id,
            NULL::BIGINT AS package_id,
            order_trade_no,
            order_settle_id,
            member_id,
            platform_code,
            status,
            deduct_amount,
            settle_price,
            used_time
        FROM billiards_ods.ods_platform_coupon_log
        ON CONFLICT (site_id, coupon_id) DO NOTHING;
        """,
    ),
    (
        "fact_payment",
        """
        INSERT INTO billiards_dwd.fact_payment (
            site_id, pay_id, order_trade_no, order_settle_id, member_id,
            pay_method_code, pay_amount, pay_time, relate_type, relate_id
        )
        SELECT
            site_id,
            pay_id,
            order_trade_no,
            order_settle_id,
            member_id,
            pay_method_code,
            pay_amount,
            pay_time,
|
||||
relate_type,
|
||||
relate_id
|
||||
FROM billiards_ods.payment_transactions
|
||||
ON CONFLICT (site_id, pay_id) DO NOTHING;
|
||||
""",
|
||||
),
|
||||
(
|
||||
"fact_refund",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.fact_refund (
|
||||
site_id, refund_id, order_trade_no, order_settle_id, member_id,
|
||||
pay_method_code, refund_amount, refund_time, status
|
||||
)
|
||||
SELECT
|
||||
site_id,
|
||||
refund_id,
|
||||
order_trade_no,
|
||||
order_settle_id,
|
||||
member_id,
|
||||
pay_method_code,
|
||||
refund_amount,
|
||||
refund_time,
|
||||
status
|
||||
FROM billiards_ods.refund_transactions
|
||||
ON CONFLICT (site_id, refund_id) DO NOTHING;
|
||||
""",
|
||||
),
|
||||
(
|
||||
"fact_balance_change",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.fact_balance_change (
|
||||
site_id, change_id, member_id, change_type, relate_type, relate_id,
|
||||
pay_method_code, change_amount, balance_before, balance_after, change_time
|
||||
)
|
||||
SELECT
|
||||
site_id,
|
||||
change_id,
|
||||
member_id,
|
||||
change_type,
|
||||
NULL::TEXT AS relate_type,
|
||||
relate_id,
|
||||
NULL::TEXT AS pay_method_code,
|
||||
change_amount,
|
||||
balance_before,
|
||||
balance_after,
|
||||
change_time
|
||||
FROM billiards_ods.member_balance_changes
|
||||
ON CONFLICT (site_id, change_id) DO NOTHING;
|
||||
""",
|
||||
),
|
||||
]
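
# Note: SQL_STEPS runs top to bottom in a single transaction (see main() below), so
# the dimension upserts land before the fact inserts and a failure in any step rolls
# back the whole batch. The ON CONFLICT ... DO NOTHING clauses on the fact steps
# make re-runs idempotent.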


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Build DWD tables from ODS payloads (PRD schema).")
    parser.add_argument(
        "--dsn",
        default=os.environ.get("PG_DSN"),
        help="PostgreSQL DSN (fallback PG_DSN env)",
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=int(os.environ.get("PG_CONNECT_TIMEOUT", 10) or 10),
        help="connect_timeout seconds (capped at 20, default 10)",
    )
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    if not args.dsn:
        print("Missing DSN. Use --dsn or PG_DSN.", file=sys.stderr)
        return 2

    timeout_val = max(1, min(args.timeout, 20))
    conn = psycopg2.connect(args.dsn, connect_timeout=timeout_val)
    conn.autocommit = False
    try:
        with conn.cursor() as cur:
            for name, sql in SQL_STEPS:
                cur.execute(sql)
                print(f"[OK] {name}")
        conn.commit()
    except Exception as exc:  # pragma: no cover - operational script
        conn.rollback()
        print(f"[FAIL] {exc}", file=sys.stderr)
        return 1
    finally:
        try:
            conn.close()
        except Exception:
            pass

    print("DWD build complete.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
322
etl_billiards/scripts/build_dws_order_summary.py
Normal file
@@ -0,0 +1,322 @@
# -*- coding: utf-8 -*-
"""Recompute billiards_dws.dws_order_summary from DWD fact tables."""
from __future__ import annotations

import argparse
import os
import sys
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parents[1]
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

from database.connection import DatabaseConnection  # noqa: E402


SQL_BUILD_SUMMARY = r"""
WITH table_fee AS (
    SELECT
        site_id,
        order_settle_id,
        order_trade_no,
        MIN(member_id) AS member_id,
        SUM(COALESCE(final_table_fee, 0)) AS table_fee_amount,
        SUM(COALESCE(member_discount_amount, 0)) AS member_discount_amount,
        SUM(COALESCE(manual_discount_amount, 0)) AS manual_discount_amount,
        SUM(COALESCE(original_table_fee, 0)) AS original_table_fee,
        MIN(start_time) AS first_time
    FROM billiards_dwd.fact_table_usage
    WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR start_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR start_time::date <= %(end_date)s)
      AND COALESCE(is_canceled, FALSE) = FALSE
    GROUP BY site_id, order_settle_id, order_trade_no
),
assistant_fee AS (
    SELECT
        site_id,
        order_settle_id,
        order_trade_no,
        MIN(member_id) AS member_id,
        SUM(COALESCE(final_fee, 0)) AS assistant_service_amount,
        SUM(COALESCE(member_discount_amount, 0)) AS member_discount_amount,
        SUM(COALESCE(manual_discount_amount, 0)) AS manual_discount_amount,
        SUM(COALESCE(original_fee, 0)) AS original_fee,
        MIN(start_time) AS first_time
    FROM billiards_dwd.fact_assistant_service
    WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR start_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR start_time::date <= %(end_date)s)
      AND COALESCE(is_canceled, FALSE) = FALSE
    GROUP BY site_id, order_settle_id, order_trade_no
),
goods_fee AS (
    SELECT
        site_id,
        order_settle_id,
        order_trade_no,
        MIN(member_id) AS member_id,
        SUM(COALESCE(final_amount, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS goods_amount,
        SUM(COALESCE(discount_amount, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS goods_discount_amount,
        SUM(COALESCE(original_amount, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS goods_original_amount,
        COUNT(*) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS item_count,
        SUM(COALESCE(quantity, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS total_item_quantity,
        MIN(sale_time) AS first_time
    FROM billiards_dwd.fact_sale_item
    WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR sale_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR sale_time::date <= %(end_date)s)
    GROUP BY site_id, order_settle_id, order_trade_no
),
coupon_usage AS (
    SELECT
        site_id,
        order_settle_id,
        order_trade_no,
        MIN(member_id) AS member_id,
        SUM(COALESCE(deduct_amount, 0)) AS coupon_deduction,
        SUM(COALESCE(settle_price, 0)) AS settle_price,
        MIN(used_time) AS first_time
    FROM billiards_dwd.fact_coupon_usage
    WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR used_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR used_time::date <= %(end_date)s)
    GROUP BY site_id, order_settle_id, order_trade_no
),
payments AS (
    SELECT
        fp.site_id,
        fp.order_settle_id,
        fp.order_trade_no,
        MIN(fp.member_id) AS member_id,
        SUM(COALESCE(fp.pay_amount, 0)) AS total_paid_amount,
        SUM(COALESCE(fp.pay_amount, 0)) FILTER (WHERE COALESCE(pm.is_stored_value, FALSE)) AS stored_card_deduct,
        SUM(COALESCE(fp.pay_amount, 0)) FILTER (WHERE NOT COALESCE(pm.is_stored_value, FALSE)) AS external_paid_amount,
        MIN(fp.pay_time) AS first_time
    FROM billiards_dwd.fact_payment fp
    LEFT JOIN billiards_dwd.dim_pay_method pm ON fp.pay_method_code = pm.pay_method_code
    WHERE (%(site_id)s IS NULL OR fp.site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR fp.pay_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR fp.pay_time::date <= %(end_date)s)
    GROUP BY fp.site_id, fp.order_settle_id, fp.order_trade_no
),
refunds AS (
    SELECT
        site_id,
        order_settle_id,
        order_trade_no,
        SUM(COALESCE(refund_amount, 0)) AS refund_amount
    FROM billiards_dwd.fact_refund
    WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR refund_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR refund_time::date <= %(end_date)s)
    GROUP BY site_id, order_settle_id, order_trade_no
),
combined_ids AS (
    SELECT site_id, order_settle_id, order_trade_no FROM table_fee
    UNION
    SELECT site_id, order_settle_id, order_trade_no FROM assistant_fee
    UNION
    SELECT site_id, order_settle_id, order_trade_no FROM goods_fee
    UNION
    SELECT site_id, order_settle_id, order_trade_no FROM coupon_usage
    UNION
    SELECT site_id, order_settle_id, order_trade_no FROM payments
    UNION
    SELECT site_id, order_settle_id, order_trade_no FROM refunds
),
site_dim AS (
    SELECT site_id, tenant_id FROM billiards_dwd.dim_site
)
INSERT INTO billiards_dws.dws_order_summary (
    site_id,
    order_settle_id,
    order_trade_no,
    order_date,
    tenant_id,
    member_id,
    member_flag,
    recharge_order_flag,
    item_count,
    total_item_quantity,
    table_fee_amount,
    assistant_service_amount,
    goods_amount,
    group_amount,
    total_coupon_deduction,
    member_discount_amount,
    manual_discount_amount,
    order_original_amount,
    order_final_amount,
    stored_card_deduct,
    external_paid_amount,
    total_paid_amount,
    book_table_flow,
    book_assistant_flow,
    book_goods_flow,
    book_group_flow,
    book_order_flow,
    order_effective_consume_cash,
    order_effective_recharge_cash,
    order_effective_flow,
    refund_amount,
    net_income,
    created_at,
    updated_at
)
SELECT
    c.site_id,
    c.order_settle_id,
    c.order_trade_no,
    COALESCE(tf.first_time, af.first_time, gf.first_time, pay.first_time, cu.first_time)::date AS order_date,
    sd.tenant_id,
    COALESCE(tf.member_id, af.member_id, gf.member_id, cu.member_id, pay.member_id) AS member_id,
    COALESCE(tf.member_id, af.member_id, gf.member_id, cu.member_id, pay.member_id) IS NOT NULL AS member_flag,
    -- recharge flag: no consumption side but has payments
    (COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) = 0)
        AND COALESCE(pay.total_paid_amount, 0) > 0 AS recharge_order_flag,
    COALESCE(gf.item_count, 0) AS item_count,
    COALESCE(gf.total_item_quantity, 0) AS total_item_quantity,
    COALESCE(tf.table_fee_amount, 0) AS table_fee_amount,
    COALESCE(af.assistant_service_amount, 0) AS assistant_service_amount,
    COALESCE(gf.goods_amount, 0) AS goods_amount,
    COALESCE(cu.settle_price, 0) AS group_amount,
    COALESCE(cu.coupon_deduction, 0) AS total_coupon_deduction,
    COALESCE(tf.member_discount_amount, 0) + COALESCE(af.member_discount_amount, 0) + COALESCE(gf.goods_discount_amount, 0) AS member_discount_amount,
    COALESCE(tf.manual_discount_amount, 0) + COALESCE(af.manual_discount_amount, 0) AS manual_discount_amount,
    COALESCE(tf.original_table_fee, 0) + COALESCE(af.original_fee, 0) + COALESCE(gf.goods_original_amount, 0) AS order_original_amount,
    COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) - COALESCE(cu.coupon_deduction, 0) AS order_final_amount,
    COALESCE(pay.stored_card_deduct, 0) AS stored_card_deduct,
    COALESCE(pay.external_paid_amount, 0) AS external_paid_amount,
    COALESCE(pay.total_paid_amount, 0) AS total_paid_amount,
    COALESCE(tf.table_fee_amount, 0) AS book_table_flow,
    COALESCE(af.assistant_service_amount, 0) AS book_assistant_flow,
    COALESCE(gf.goods_amount, 0) AS book_goods_flow,
    COALESCE(cu.settle_price, 0) AS book_group_flow,
    COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) AS book_order_flow,
    CASE
        WHEN (COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) = 0)
        THEN 0
        ELSE COALESCE(pay.external_paid_amount, 0)
    END AS order_effective_consume_cash,
    CASE
        WHEN (COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) = 0)
        THEN COALESCE(pay.external_paid_amount, 0)
        ELSE 0
    END AS order_effective_recharge_cash,
    COALESCE(pay.external_paid_amount, 0) + COALESCE(cu.settle_price, 0) AS order_effective_flow,
    COALESCE(rf.refund_amount, 0) AS refund_amount,
    (COALESCE(pay.external_paid_amount, 0) + COALESCE(cu.settle_price, 0)) - COALESCE(rf.refund_amount, 0) AS net_income,
    now() AS created_at,
    now() AS updated_at
FROM combined_ids c
LEFT JOIN table_fee tf ON c.site_id = tf.site_id AND c.order_settle_id = tf.order_settle_id
LEFT JOIN assistant_fee af ON c.site_id = af.site_id AND c.order_settle_id = af.order_settle_id
LEFT JOIN goods_fee gf ON c.site_id = gf.site_id AND c.order_settle_id = gf.order_settle_id
LEFT JOIN coupon_usage cu ON c.site_id = cu.site_id AND c.order_settle_id = cu.order_settle_id
LEFT JOIN payments pay ON c.site_id = pay.site_id AND c.order_settle_id = pay.order_settle_id
LEFT JOIN refunds rf ON c.site_id = rf.site_id AND c.order_settle_id = rf.order_settle_id
LEFT JOIN site_dim sd ON c.site_id = sd.site_id
ON CONFLICT (site_id, order_settle_id) DO UPDATE SET
    order_trade_no = EXCLUDED.order_trade_no,
    order_date = EXCLUDED.order_date,
    tenant_id = EXCLUDED.tenant_id,
    member_id = EXCLUDED.member_id,
    member_flag = EXCLUDED.member_flag,
    recharge_order_flag = EXCLUDED.recharge_order_flag,
    item_count = EXCLUDED.item_count,
    total_item_quantity = EXCLUDED.total_item_quantity,
    table_fee_amount = EXCLUDED.table_fee_amount,
    assistant_service_amount = EXCLUDED.assistant_service_amount,
    goods_amount = EXCLUDED.goods_amount,
    group_amount = EXCLUDED.group_amount,
    total_coupon_deduction = EXCLUDED.total_coupon_deduction,
    member_discount_amount = EXCLUDED.member_discount_amount,
    manual_discount_amount = EXCLUDED.manual_discount_amount,
    order_original_amount = EXCLUDED.order_original_amount,
    order_final_amount = EXCLUDED.order_final_amount,
    stored_card_deduct = EXCLUDED.stored_card_deduct,
    external_paid_amount = EXCLUDED.external_paid_amount,
    total_paid_amount = EXCLUDED.total_paid_amount,
    book_table_flow = EXCLUDED.book_table_flow,
    book_assistant_flow = EXCLUDED.book_assistant_flow,
    book_goods_flow = EXCLUDED.book_goods_flow,
    book_group_flow = EXCLUDED.book_group_flow,
    book_order_flow = EXCLUDED.book_order_flow,
    order_effective_consume_cash = EXCLUDED.order_effective_consume_cash,
    order_effective_recharge_cash = EXCLUDED.order_effective_recharge_cash,
    order_effective_flow = EXCLUDED.order_effective_flow,
    refund_amount = EXCLUDED.refund_amount,
    net_income = EXCLUDED.net_income,
    updated_at = now();
"""


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Build/update dws_order_summary from DWD fact tables."
    )
    parser.add_argument(
        "--dsn",
        default=os.environ.get("PG_DSN"),
        help="PostgreSQL DSN (fallback: PG_DSN env)",
    )
    parser.add_argument(
        "--site-id",
        type=int,
        default=None,
        help="Filter by site_id (optional, default all sites)",
    )
    parser.add_argument(
        "--start-date",
        dest="start_date",
        default=None,
        help="Filter facts from this date (YYYY-MM-DD, optional)",
    )
    parser.add_argument(
        "--end-date",
        dest="end_date",
        default=None,
        help="Filter facts until this date (YYYY-MM-DD, optional)",
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=int(os.environ.get("PG_CONNECT_TIMEOUT", 10) or 10),
        help="connect_timeout seconds (capped at 20, default 10)",
    )
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    if not args.dsn:
        print("Missing DSN. Set PG_DSN or pass --dsn.", file=sys.stderr)
        return 2

    params = {
        "site_id": args.site_id,
        "start_date": args.start_date,
        "end_date": args.end_date,
    }
    timeout_val = max(1, min(args.timeout, 20))

    conn = DatabaseConnection(args.dsn, connect_timeout=timeout_val)
    try:
        with conn.conn.cursor() as cur:
            cur.execute(SQL_BUILD_SUMMARY, params)
        conn.commit()
    except Exception as exc:  # pragma: no cover - operational script
        conn.rollback()
        print(f"DWS build failed: {exc}", file=sys.stderr)
        return 1
    finally:
        conn.close()

    print("dws_order_summary refreshed.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
117
etl_billiards/scripts/check_ods_json_vs_table.py
Normal file
@@ -0,0 +1,117 @@
# -*- coding: utf-8 -*-
"""
ODS JSON field check: compare the columns of the ODS tables currently in the database
against the sample JSON files (default directory C:\\dev\\LLTQ\\export\\test-json-doc),
looking for keys with the same name, and report each table's unmatched columns so the
mapping can be extended or the absence of a source field confirmed.

Usage:
    set PG_DSN=postgresql://...   # as configured in .env
    python -m etl_billiards.scripts.check_ods_json_vs_table
"""
from __future__ import annotations

import json
import os
import pathlib
from typing import Dict, Set, Tuple

import psycopg2

from etl_billiards.tasks.manual_ingest_task import ManualIngestTask


def _flatten_keys(obj, prefix: str = "") -> Set[str]:
    """Recursively expand every key path in a JSON value into a set of dotted paths
    such as ``data.assistantInfos.id``. List indices are not kept; lists are simply
    descended into."""
    keys: Set[str] = set()
    if isinstance(obj, dict):
        for k, v in obj.items():
            new_prefix = f"{prefix}.{k}" if prefix else k
            keys.add(new_prefix)
            keys |= _flatten_keys(v, new_prefix)
    elif isinstance(obj, list):
        for item in obj:
            keys |= _flatten_keys(item, prefix)
    return keys
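
# Illustrative only (hypothetical payload): the nested structure below flattens to
# three dotted paths; the list item shares its parent's prefix because indices are dropped.
#
#     _flatten_keys({"data": {"assistantInfos": [{"id": 1}]}})
#     # -> {"data", "data.assistantInfos", "data.assistantInfos.id"}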


def _load_json_keys(path: pathlib.Path) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Read one JSON file and return the expanded key-path set plus a map from the
    last path segment (lower-cased) to its full paths; empty collections are returned
    when the file does not exist."""
    if not path.exists():
        return set(), {}
    data = json.loads(path.read_text(encoding="utf-8"))
    paths = _flatten_keys(data)
    last_map: Dict[str, Set[str]] = {}
    for p in paths:
        last = p.split(".")[-1].lower()
        last_map.setdefault(last, set()).add(p)
    return paths, last_map


def _load_ods_columns(dsn: str) -> Dict[str, Set[str]]:
    """Read the column names of billiards_ods.* from the database, grouped by table."""
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    cur.execute(
        """
        SELECT table_name, column_name
        FROM information_schema.columns
        WHERE table_schema='billiards_ods'
        ORDER BY table_name, ordinal_position
        """
    )
    result: Dict[str, Set[str]] = {}
    for table, col in cur.fetchall():
        result.setdefault(table, set()).add(col.lower())
    cur.close()
    conn.close()
    return result


def main() -> None:
    """Main flow: walk the ODS tables in FILE_MAPPING, check JSON key coverage, and print a report."""
    dsn = os.environ.get("PG_DSN")
    if not dsn:
        raise SystemExit("Missing PG_DSN environment variable (see .env).")
    json_dir = pathlib.Path(os.environ.get("JSON_DOC_DIR", r"C:\dev\LLTQ\export\test-json-doc"))

    ods_cols_map = _load_ods_columns(dsn)

    print(f"JSON directory: {json_dir}")
    print(f"Connecting with DSN: {dsn}")
    print("=" * 80)

    for keywords, ods_table in ManualIngestTask.FILE_MAPPING:
        table = ods_table.split(".")[-1]
        cols = ods_cols_map.get(table, set())
        file_name = f"{keywords[0]}.json"
        file_path = json_dir / file_name
        keys_full, path_map = _load_json_keys(file_path)
        key_last_parts = set(path_map.keys())

        missing: Set[str] = set()
        extra_keys: Set[str] = set()
        present: Set[str] = set()
        for col in sorted(cols):
            if col in key_last_parts:
                present.add(col)
            else:
                missing.add(col)
        for k in key_last_parts:
            if k not in cols:
                extra_keys.add(k)

        print(f"[{table}] file={file_name} columns={len(cols)} JSON-key (last-segment) coverage={len(present)}/{len(cols)}")
        if missing:
            print("  unmatched columns:", ", ".join(sorted(missing)))
        else:
            print("  unmatched columns: none")
        if extra_keys:
            extras = []
            for k in sorted(extra_keys):
                paths = ", ".join(sorted(path_map.get(k, [])))
                extras.append(f"{k} ({paths})")
            print("  JSON-only keys (no matching column):", "; ".join(extras))
        else:
            print("  JSON-only keys (no matching column): none")
        print("-" * 80)


if __name__ == "__main__":
    main()
258
etl_billiards/scripts/rebuild_ods_from_json.py
Normal file
@@ -0,0 +1,258 @@
# -*- coding: utf-8 -*-
"""
Rebuild the billiards_ods.* tables from a local JSON sample directory and import the samples.

Usage:
    PYTHONPATH=. python -m etl_billiards.scripts.rebuild_ods_from_json [--dsn ...] [--json-dir ...] [--include ...] [--drop-schema-first]

Environment variables:
    PG_DSN                 PostgreSQL DSN (required)
    PG_CONNECT_TIMEOUT     optional, seconds, default 10
    JSON_DOC_DIR           optional, JSON directory, default C:\\dev\\LLTQ\\export\\test-json-doc
    ODS_INCLUDE_FILES      optional, comma-separated file names (without .json)
    ODS_DROP_SCHEMA_FIRST  optional, true/false, default true
"""
from __future__ import annotations

import argparse
import os
import re
import sys
import json
from pathlib import Path
from typing import Iterable, List, Tuple

import psycopg2
from psycopg2 import sql
from psycopg2.extras import Json, execute_values


DEFAULT_JSON_DIR = r"C:\dev\LLTQ\export\test-json-doc"
SPECIAL_LIST_PATHS: dict[str, tuple[str, ...]] = {
    "assistant_accounts_master": ("data", "assistantInfos"),
    "assistant_cancellation_records": ("data", "abolitionAssistants"),
    "assistant_service_records": ("data", "orderAssistantDetails"),
    "goods_stock_movements": ("data", "queryDeliveryRecordsList"),
    "goods_stock_summary": ("data",),
    "group_buy_packages": ("data", "packageCouponList"),
    "group_buy_redemption_records": ("data", "siteTableUseDetailsList"),
    "member_balance_changes": ("data", "tenantMemberCardLogs"),
    "member_profiles": ("data", "tenantMemberInfos"),
    "member_stored_value_cards": ("data", "tenantMemberCards"),
    "recharge_settlements": ("data", "settleList"),
    "settlement_records": ("data", "settleList"),
    "site_tables_master": ("data", "siteTables"),
    "stock_goods_category_tree": ("data", "goodsCategoryList"),
    "store_goods_master": ("data", "orderGoodsList"),
    "store_goods_sales_records": ("data", "orderGoodsLedgers"),
    "table_fee_discount_records": ("data", "taiFeeAdjustInfos"),
    "table_fee_transactions": ("data", "siteTableUseDetailsList"),
    "tenant_goods_master": ("data", "tenantGoodsList"),
}


def sanitize_identifier(name: str) -> str:
    """Turn an arbitrary string into a usable SQL identifier (lower-cased,
    non-alphanumerics replaced with underscores, digit-leading names prefixed)."""
    cleaned = re.sub(r"[^0-9a-zA-Z_]", "_", name.strip())
    if not cleaned:
        cleaned = "col"
    if cleaned[0].isdigit():
        cleaned = f"_{cleaned}"
    return cleaned.lower()
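
# Illustrative only: a few hypothetical inputs and their sanitized forms.
#
#     sanitize_identifier("2024-Sales Data")  # -> "_2024_sales_data"
#     sanitize_identifier("siteId")           # -> "siteid"
#     sanitize_identifier("!!!")              # -> "___"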


def _extract_list_via_path(node, path: tuple[str, ...]):
    cur = node
    for key in path:
        if isinstance(cur, dict):
            cur = cur.get(key)
        else:
            return []
    return cur if isinstance(cur, list) else []


def load_records(payload, list_path: tuple[str, ...] | None = None) -> list:
    """
    Try to extract the record list from a JSON structure:
    - payload is already a list -> return it
    - dict whose "data" is a list -> return it
    - dict whose "data" is a dict -> take its first list-valued field
    - any list-valued field of the dict -> return it
    - otherwise wrap the payload as a single record
    """
    if list_path:
        if isinstance(payload, list):
            merged: list = []
            for item in payload:
                merged.extend(_extract_list_via_path(item, list_path))
            if merged:
                return merged
        elif isinstance(payload, dict):
            lst = _extract_list_via_path(payload, list_path)
            if lst:
                return lst

    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        data_node = payload.get("data")
        if isinstance(data_node, list):
            return data_node
        if isinstance(data_node, dict):
            for v in data_node.values():
                if isinstance(v, list):
                    return v
        for v in payload.values():
            if isinstance(v, list):
                return v
    return [payload]
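
# Illustrative only (hypothetical payloads) of the fallback order documented above:
#
#     load_records({"data": {"total": 2, "items": [{"a": 1}, {"a": 2}]}})
#     # -> [{"a": 1}, {"a": 2}]   (first list-valued field under "data")
#
#     load_records({"code": 0})
#     # -> [{"code": 0}]          (no list anywhere: wrapped as a single record)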


def collect_columns(records: Iterable[dict]) -> List[str]:
    """Collect every top-level key across the records as table columns; only dict
    records are considered."""
    cols: set[str] = set()
    for rec in records:
        if isinstance(rec, dict):
            cols.update(rec.keys())
    return sorted(cols)
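
# Illustrative only: columns are the sorted union of top-level keys, so records of
# differing shapes still land in one table.
#
#     collect_columns([{"id": 1, "name": "A"}, {"id": 2, "mobile": "x"}])
#     # -> ["id", "mobile", "name"]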


def create_table(cur, schema: str, table: str, columns: List[Tuple[str, str]]):
    """
    Create the table: every source field is jsonb, plus the source_file, record_index,
    payload and ingested_at bookkeeping columns.
    columns: [(col_name, original_key)]
    """
    fields = [sql.SQL("{} jsonb").format(sql.Identifier(col)) for col, _ in columns]
    constraint_name = f"uq_{table}_source_record"
    ddl = sql.SQL(
        "CREATE TABLE IF NOT EXISTS {schema}.{table} ("
        "source_file text,"
        "record_index integer,"
        "{cols},"
        "payload jsonb,"
        "ingested_at timestamptz default now(),"
        "CONSTRAINT {constraint} UNIQUE (source_file, record_index)"
        ");"
    ).format(
        schema=sql.Identifier(schema),
        table=sql.Identifier(table),
        cols=sql.SQL(",").join(fields),
        constraint=sql.Identifier(constraint_name),
    )
    cur.execute(ddl)


def insert_records(cur, schema: str, table: str, columns: List[Tuple[str, str]], records: list, source_file: str):
    """Bulk-insert the records."""
    col_idents = [sql.Identifier(col) for col, _ in columns]
    orig_keys = [orig for _, orig in columns]
    all_cols = [sql.Identifier("source_file"), sql.Identifier("record_index")] + col_idents + [
        sql.Identifier("payload")
    ]

    rows = []
    for idx, rec in enumerate(records):
        if not isinstance(rec, dict):
            rec = {"value": rec}
        row_values = [source_file, idx]
        for key in orig_keys:
            row_values.append(Json(rec.get(key)))
        row_values.append(Json(rec))
        rows.append(row_values)

    insert_sql = sql.SQL("INSERT INTO {}.{} ({}) VALUES %s ON CONFLICT DO NOTHING").format(
        sql.Identifier(schema),
        sql.Identifier(table),
        sql.SQL(",").join(all_cols),
    )
    execute_values(cur, insert_sql, rows, page_size=500)


def rebuild(schema: str = "billiards_ods", data_dir: str | Path = DEFAULT_JSON_DIR):
    parser = argparse.ArgumentParser(description="Rebuild billiards_ods.* tables and import the JSON samples")
    parser.add_argument("--dsn", dest="dsn", help="PostgreSQL DSN (defaults to the PG_DSN environment variable)")
    parser.add_argument("--json-dir", dest="json_dir", help=f"JSON directory, default {DEFAULT_JSON_DIR}")
    parser.add_argument(
        "--include",
        dest="include_files",
        help="Restrict the import to these file names (comma-separated, without .json); default all",
    )
    parser.add_argument(
        "--drop-schema-first",
        dest="drop_schema_first",
        action="store_true",
        help="Drop and recreate the schema first (default true)",
    )
    parser.add_argument(
        "--no-drop-schema-first",
        dest="drop_schema_first",
        action="store_false",
        help="Keep the existing schema and only deduplicate on conflict",
    )
    parser.set_defaults(drop_schema_first=None)
    args = parser.parse_args()

    dsn = args.dsn or os.environ.get("PG_DSN")
    if not dsn:
        print("Missing --dsn/PG_DSN; cannot connect to the database.")
        sys.exit(1)
    timeout = max(1, min(int(os.environ.get("PG_CONNECT_TIMEOUT", 10)), 60))
    env_drop = os.environ.get("ODS_DROP_SCHEMA_FIRST") or os.environ.get("DROP_SCHEMA_FIRST")
    drop_schema_first = (
        args.drop_schema_first
        if args.drop_schema_first is not None
        else str(env_drop or "true").lower() in ("1", "true", "yes")
    )
    include_files_env = args.include_files or os.environ.get("ODS_INCLUDE_FILES") or os.environ.get("INCLUDE_FILES")
    include_files = set()
    if include_files_env:
        include_files = {p.strip().lower() for p in include_files_env.split(",") if p.strip()}

    base_dir = Path(args.json_dir or data_dir or DEFAULT_JSON_DIR)
    if not base_dir.exists():
        print(f"JSON directory does not exist: {base_dir}")
        sys.exit(1)

    conn = psycopg2.connect(dsn, connect_timeout=timeout)
    conn.autocommit = False
    cur = conn.cursor()

    if drop_schema_first:
        print(f"Dropping schema {schema} ...")
        cur.execute(sql.SQL("DROP SCHEMA IF EXISTS {} CASCADE;").format(sql.Identifier(schema)))
        cur.execute(sql.SQL("CREATE SCHEMA {};").format(sql.Identifier(schema)))
    else:
        cur.execute(
            sql.SQL("SELECT schema_name FROM information_schema.schemata WHERE schema_name=%s"),
            (schema,),
        )
        if not cur.fetchone():
            cur.execute(sql.SQL("CREATE SCHEMA {};").format(sql.Identifier(schema)))

    json_files = sorted(base_dir.glob("*.json"))
    for path in json_files:
        stem_lower = path.stem.lower()
        if include_files and stem_lower not in include_files:
            continue

        print(f"Processing {path.name} ...")
        payload = json.loads(path.read_text(encoding="utf-8"))
        list_path = SPECIAL_LIST_PATHS.get(stem_lower)
        records = load_records(payload, list_path=list_path)
        columns_raw = collect_columns(records)
        columns = [(sanitize_identifier(c), c) for c in columns_raw]

        table_name = sanitize_identifier(path.stem)
        create_table(cur, schema, table_name, columns)
        if records:
            insert_records(cur, schema, table_name, columns, records, path.name)
        print(f" -> rows: {len(records)}, columns: {len(columns)}")

    conn.commit()
    cur.close()
    conn.close()
    print("Rebuild done.")


if __name__ == "__main__":
    rebuild()
195
etl_billiards/scripts/run_tests.py
Normal file
@@ -0,0 +1,195 @@
# -*- coding: utf-8 -*-
"""
Flexible test runner: compose parameters or presets (pipeline flow, database,
archive paths, ...) like building blocks; running this file invokes pytest directly.

Examples:
    python scripts/run_tests.py --suite online --flow FULL --keyword ORDERS
    python scripts/run_tests.py --preset fetch_only
    python scripts/run_tests.py --suite online --json-source tmp/archives
"""
from __future__ import annotations

import argparse
import importlib.util
import os
import shlex
import sys
from typing import Dict, List

import pytest

PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))

# Make sure the project root is on sys.path so tests can import config / tasks etc.
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

SUITE_MAP: Dict[str, str] = {
    "online": "tests/unit/test_etl_tasks_online.py",
    "integration": "tests/integration/test_database.py",
}

PRESETS: Dict[str, Dict] = {}


def _load_presets():
    preset_path = os.path.join(os.path.dirname(__file__), "test_presets.py")
    if not os.path.exists(preset_path):
        return
    spec = importlib.util.spec_from_file_location("test_presets", preset_path)
    if not spec or not spec.loader:
        return
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # type: ignore[attr-defined]
    presets = getattr(module, "PRESETS", {})
    if isinstance(presets, dict):
        PRESETS.update(presets)


_load_presets()


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="ETL test runner (parameterised composition)")
    parser.add_argument(
        "--suite",
        choices=sorted(SUITE_MAP.keys()),
        nargs="+",
        help="Preset test suites, several allowed (default: all suites)",
    )
    parser.add_argument(
        "--tests",
        nargs="+",
        help="Custom test paths (can be combined with --suite), e.g. tests/unit/test_config.py",
    )
    parser.add_argument(
        "--flow",
        choices=["FETCH_ONLY", "INGEST_ONLY", "FULL"],
        help="Override PIPELINE_FLOW (online fetch / local ingest / full pipeline)",
    )
    parser.add_argument("--json-source", help="Set JSON_SOURCE_DIR (JSON directory used for local ingest)")
    parser.add_argument("--json-fetch-root", help="Set JSON_FETCH_ROOT (root output directory for online fetches)")
    parser.add_argument(
        "--keyword",
        "-k",
        help="pytest -k keyword filter (e.g. ORDERS runs only matching cases)",
    )
    parser.add_argument(
        "--pytest-args",
        help="Extra pytest arguments in command-line format (e.g. \"-vv --maxfail=1\")",
    )
    parser.add_argument(
        "--env",
        action="append",
        metavar="KEY=VALUE",
        help="Extra environment variables, repeatable, e.g. --env STORE_ID=123",
    )
    parser.add_argument("--preset", choices=sorted(PRESETS.keys()) if PRESETS else None, nargs="+",
                        help="Pick one or more preset combinations from scripts/test_presets.py")
    parser.add_argument("--list-presets", action="store_true", help="List available presets and exit")
    parser.add_argument("--dry-run", action="store_true", help="Only print the command and environment that would run; do not invoke pytest")
    return parser.parse_args()


def apply_presets_to_args(args: argparse.Namespace):
    if not args.preset:
        return
    for name in args.preset:
        preset = PRESETS.get(name, {})
        if not preset:
            continue
        for key, value in preset.items():
            if key in ("suite", "tests"):
                if not value:
                    continue
                existing = getattr(args, key)
                if existing is None:
                    setattr(args, key, list(value))
                else:
                    existing.extend(value)
            elif key == "env":
                args.env = (args.env or []) + list(value)
            elif key == "pytest_args":
                args.pytest_args = " ".join(filter(None, [value, args.pytest_args or ""]))
            elif key == "keyword":
                if args.keyword is None:
                    args.keyword = value
            else:
                if getattr(args, key, None) is None:
                    setattr(args, key, value)


def apply_env(args: argparse.Namespace) -> Dict[str, str]:
    env_updates = {}
    if args.flow:
        env_updates["PIPELINE_FLOW"] = args.flow
    if args.json_source:
        env_updates["JSON_SOURCE_DIR"] = args.json_source
    if args.json_fetch_root:
        env_updates["JSON_FETCH_ROOT"] = args.json_fetch_root
    if args.env:
        for item in args.env:
            if "=" not in item:
                raise SystemExit(f"Malformed --env argument: {item!r}; expected KEY=VALUE")
            key, value = item.split("=", 1)
            env_updates[key.strip()] = value.strip()

    for key, value in env_updates.items():
        os.environ[key] = value
    return env_updates


def build_pytest_args(args: argparse.Namespace) -> List[str]:
    targets: List[str] = []
    if args.suite:
        for suite in args.suite:
            targets.append(SUITE_MAP[suite])
    if args.tests:
        targets.extend(args.tests)
    if not targets:
        targets = list(SUITE_MAP.values())

    pytest_args: List[str] = targets
    if args.keyword:
        pytest_args += ["-k", args.keyword]
    if args.pytest_args:
        pytest_args += shlex.split(args.pytest_args)
    return pytest_args
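
# Illustrative only: for "--suite online --keyword ORDERS --pytest-args '-vv'",
# build_pytest_args returns
#
#     ["tests/unit/test_etl_tasks_online.py", "-k", "ORDERS", "-vv"]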


def main() -> int:
    os.chdir(PROJECT_ROOT)
    args = parse_args()
    if args.list_presets:
        print("Available presets:")
        if not PRESETS:
            print("(none yet; add entries in scripts/test_presets.py)")
        else:
            for name in sorted(PRESETS):
                print(f"- {name}")
        return 0

    apply_presets_to_args(args)
    env_updates = apply_env(args)
    pytest_args = build_pytest_args(args)

    print("=== Environment overrides ===")
    if env_updates:
        for k, v in env_updates.items():
            print(f"{k}={v}")
    else:
        print("(none; system defaults apply)")
    print("\n=== Pytest arguments ===")
    print(" ".join(pytest_args))
    print()

    if args.dry_run:
        print("Dry-run mode; pytest was not executed")
        return 0

    exit_code = pytest.main(pytest_args)
    return int(exit_code)


if __name__ == "__main__":
    sys.exit(main())
64
etl_billiards/scripts/test_db_connection.py
Normal file
@@ -0,0 +1,64 @@
# -*- coding: utf-8 -*-
"""Quick utility for validating PostgreSQL connectivity (ASCII-only output)."""
from __future__ import annotations

import argparse
import os
import sys

PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

from database.connection import DatabaseConnection


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="PostgreSQL connectivity smoke test")
    parser.add_argument("--dsn", help="Override TEST_DB_DSN / env value")
    parser.add_argument(
        "--query",
        default="SELECT 1 AS ok",
        help="Custom SQL to run after connection (default: SELECT 1 AS ok)",
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=10,
        help="connect_timeout seconds passed to psycopg2 (capped at 20, default: 10)",
    )
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    dsn = args.dsn or os.environ.get("TEST_DB_DSN")
    if not dsn:
        print("Missing DSN. Use --dsn or TEST_DB_DSN.", file=sys.stderr)
        return 2

    print(f"Trying connection: {dsn}")
    try:
        timeout = max(1, min(args.timeout, 20))
        conn = DatabaseConnection(dsn, connect_timeout=timeout)
    except Exception as exc:  # pragma: no cover - diagnostic output
        print("Connection failed:", exc, file=sys.stderr)
        return 1

    try:
        result = conn.query(args.query)
        print("Connection OK, query result:")
        for row in result:
            print(row)
        conn.close()
        return 0
    except Exception as exc:  # pragma: no cover - diagnostic output
        print("Connection succeeded but query failed:", exc, file=sys.stderr)
        try:
            conn.close()
        finally:
            return 3


if __name__ == "__main__":
    raise SystemExit(main())
122
etl_billiards/scripts/test_presets.py
Normal file
@@ -0,0 +1,122 @@
# -*- coding: utf-8 -*-
"""Test-command repository: keeps the common run_tests.py combinations in one place for one-shot execution."""
from __future__ import annotations

import argparse
import os
import subprocess
import sys
from typing import List

RUN_TESTS_SCRIPT = os.path.join(os.path.dirname(__file__), "run_tests.py")

# Presets that run automatically by default (reorder or edit as needed)
AUTO_RUN_PRESETS = ["fetch_only"]

PRESETS = {
    "fetch_only": {
        "suite": ["online"],
        "flow": "FETCH_ONLY",
        "json_fetch_root": "tmp/json_fetch",
        "keyword": "ORDERS",
        "pytest_args": "-vv",
        "preset_meta": "Online fetch stage only, writing to a local directory",
    },
    "ingest_local": {
        "suite": ["online"],
        "flow": "INGEST_ONLY",
        "json_source": "tests/source-data-doc",
        "keyword": "ORDERS",
        "preset_meta": "Local ingest from the given JSON directory",
    },
    "full_pipeline": {
        "suite": ["online"],
        "flow": "FULL",
        "json_fetch_root": "tmp/json_fetch",
        "keyword": "ORDERS",
        "preset_meta": "Full pipeline: fetch first, then ingest",
    },
}


def print_parameter_help() -> None:
    print("=== Parameter keys ===")
    print("suite           : preset suite list, e.g. ['online', 'integration']")
    print("tests           : custom pytest path list")
    print("flow            : PIPELINE_FLOW (FETCH_ONLY / INGEST_ONLY / FULL)")
    print("json_source     : JSON_SOURCE_DIR, JSON directory used for local ingest")
    print("json_fetch_root : JSON_FETCH_ROOT, root output directory for online fetches")
    print("keyword         : pytest -k filter keyword")
    print("pytest_args     : extra pytest arguments (string)")
    print("env             : extra environment variables, e.g. ['KEY=VALUE']")
    print("preset_meta     : documentation only")
    print()


def print_presets() -> None:
    if not PRESETS:
        print("No presets defined yet; add entries to PRESETS.")
        return
    for idx, (name, payload) in enumerate(PRESETS.items(), start=1):
        comment = payload.get("preset_meta", "")
        print(f"{idx}. {name}")
        if comment:
            print(f"   note: {comment}")
        for key, value in payload.items():
            if key == "preset_meta":
                continue
            print(f"   {key}: {value}")
        print()


def resolve_targets(requested: List[str] | None) -> List[str]:
    if not PRESETS:
        raise SystemExit("PRESETS is empty; define test combinations first.")

    def valid(names: List[str]) -> List[str]:
        return [name for name in names if name in PRESETS]

    if requested:
        candidates = valid(requested)
        missing = [name for name in requested if name not in PRESETS]
        if missing:
            print(f"Warning: ignoring undefined presets {missing}")
        if candidates:
            return candidates

    auto = valid(AUTO_RUN_PRESETS)
    if auto:
        return auto

    return list(PRESETS.keys())


def run_presets(preset_names: List[str], dry_run: bool) -> None:
    for name in preset_names:
        cmd = [sys.executable, RUN_TESTS_SCRIPT, "--preset", name]
        printable = " ".join(cmd)
        if dry_run:
            print(f"[Dry-Run] {printable}")
        else:
            print(f"\n>>> Running: {printable}")
            subprocess.run(cmd, check=False)


def main() -> None:
    parser = argparse.ArgumentParser(description="Test preset repository (central config that batch-triggers run_tests)")
    parser.add_argument("--preset", choices=sorted(PRESETS.keys()), nargs="+", help="Presets to run")
    parser.add_argument("--list", action="store_true", help="Only list the parameter help and all presets")
    parser.add_argument("--dry-run", action="store_true", help="Print the commands only; do not run pytest")
    args = parser.parse_args()

    if args.list:
        print_parameter_help()
        print_presets()
        return

    targets = resolve_targets(args.preset)
    run_presets(targets, dry_run=args.dry_run)


if __name__ == "__main__":
    main()
30
etl_billiards/setup.py
Normal file
@@ -0,0 +1,30 @@
# -*- coding: utf-8 -*-
"""
Setup script for ETL Billiards
"""
from setuptools import setup, find_packages

with open("requirements.txt") as f:
    requirements = f.read().splitlines()

setup(
    name="etl-billiards",
    version="2.0.0",
    description="Modular ETL system for billiards business data",
    author="Data Platform Team",
    author_email="data-platform@example.com",
    packages=find_packages(),
    install_requires=requirements,
    python_requires=">=3.10",
    entry_points={
        "console_scripts": [
            "etl-billiards=cli.main:main",
        ],
    },
    classifiers=[
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
    ],
)
0
etl_billiards/tasks/__init__.py
Normal file
81
etl_billiards/tasks/assistant_abolish_task.py
Normal file
@@ -0,0 +1,81 @@
# -*- coding: utf-8 -*-
"""Assistant-cancellation task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.assistant_abolish import AssistantAbolishLoader
from models.parsers import TypeParser


class AssistantAbolishTask(BaseTask):
    """Sync cancelled (abolished) assistant service records"""

    def get_task_code(self) -> str:
        return "ASSISTANT_ABOLISH"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/AssistantPerformance/GetAbolitionAssistant",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="abolitionAssistants",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_record(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = AssistantAbolishLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_records(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_record(self, raw: dict, store_id: int) -> dict | None:
        abolish_id = TypeParser.parse_int(raw.get("id"))
        if not abolish_id:
            self.logger.warning("Skipping record without an abolish ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "abolish_id": abolish_id,
            "table_id": TypeParser.parse_int(raw.get("tableId")),
            "table_name": raw.get("tableName"),
            "table_area_id": TypeParser.parse_int(raw.get("tableAreaId")),
            "table_area": raw.get("tableArea"),
            "assistant_no": raw.get("assistantOn"),
            "assistant_name": raw.get("assistantName"),
            "charge_minutes": TypeParser.parse_int(raw.get("pdChargeMinutes")),
            "abolish_amount": TypeParser.parse_decimal(raw.get("assistantAbolishAmount")),
            "create_time": TypeParser.parse_timestamp(
                raw.get("createTime") or raw.get("create_time"), self.tz
            ),
            "trash_reason": raw.get("trashReason"),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
102
etl_billiards/tasks/assistants_task.py
Normal file
@@ -0,0 +1,102 @@
# -*- coding: utf-8 -*-
"""Assistant-accounts task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.dimensions.assistant import AssistantLoader
from models.parsers import TypeParser


class AssistantsTask(BaseTask):
    """Sync assistant account profiles"""

    def get_task_code(self) -> str:
        return "ASSISTANTS"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params({"siteId": context.store_id})
        records, _ = self.api.get_paginated(
            endpoint="/PersonnelManagement/SearchAssistantInfo",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="assistantInfos",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_assistant(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = AssistantLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_assistants(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_assistant(self, raw: dict, store_id: int) -> dict | None:
        assistant_id = TypeParser.parse_int(raw.get("id"))
        if not assistant_id:
            self.logger.warning("Skipping record without an assistant ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "assistant_id": assistant_id,
            "assistant_no": raw.get("assistant_no") or raw.get("assistantNo"),
            "nickname": raw.get("nickname"),
            "real_name": raw.get("real_name") or raw.get("realName"),
            "gender": raw.get("gender"),
            "mobile": raw.get("mobile"),
            "level": raw.get("level"),
            "team_id": TypeParser.parse_int(raw.get("team_id") or raw.get("teamId")),
            "team_name": raw.get("team_name"),
            "assistant_status": raw.get("assistant_status"),
            "work_status": raw.get("work_status"),
            "entry_time": TypeParser.parse_timestamp(
                raw.get("entry_time") or raw.get("entryTime"), self.tz
            ),
            "resign_time": TypeParser.parse_timestamp(
                raw.get("resign_time") or raw.get("resignTime"), self.tz
            ),
            "start_time": TypeParser.parse_timestamp(
                raw.get("start_time") or raw.get("startTime"), self.tz
            ),
            "end_time": TypeParser.parse_timestamp(
                raw.get("end_time") or raw.get("endTime"), self.tz
            ),
            "create_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "update_time": TypeParser.parse_timestamp(
                raw.get("update_time") or raw.get("updateTime"), self.tz
            ),
            "system_role_id": raw.get("system_role_id"),
            "online_status": raw.get("online_status"),
            "allow_cx": raw.get("allow_cx"),
            "charge_way": raw.get("charge_way"),
            "pd_unit_price": TypeParser.parse_decimal(raw.get("pd_unit_price")),
            "cx_unit_price": TypeParser.parse_decimal(raw.get("cx_unit_price")),
            "is_guaranteed": raw.get("is_guaranteed"),
            "is_team_leader": raw.get("is_team_leader"),
            "serial_number": raw.get("serial_number"),
            "show_sort": raw.get("show_sort"),
            "is_delete": raw.get("is_delete"),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
79
etl_billiards/tasks/base_dwd_task.py
Normal file
@@ -0,0 +1,79 @@
# -*- coding: utf-8 -*-
"""Base class for DWD tasks"""
import json
from typing import Any, Dict, Iterator, List, Optional, Tuple
from datetime import datetime

from .base_task import BaseTask
from models.parsers import TypeParser


class BaseDwdTask(BaseTask):
    """
    Base class for DWD-layer tasks.
    Reads data from the ODS tables so that subclasses can clean it and write the
    fact/dimension tables.
    """

    def _get_ods_cursor(self, task_code: str) -> Optional[datetime]:
        """
        Get the fetched_at watermark of the last processed ODS batch.
        Simplified for now; it should really be read from the etl_cursor table.
        Until then we rely on BaseTask's time-window logic, or the subclass
        manages its own cursor.
        """
        # TODO: hook up the real CursorManager
        # For now return None; subclasses can use _get_time_window instead
        return None

    def iter_ods_rows(
        self,
        table_name: str,
        columns: List[str],
        start_time: datetime,
        end_time: datetime,
        time_col: str = "fetched_at",
        batch_size: int = 1000
    ) -> Iterator[List[Dict[str, Any]]]:
        """
        Iterate over an ODS table in batches.

        Args:
            table_name: ODS table name
            columns: columns to select (must include payload)
            start_time: window start (inclusive)
            end_time: window end (inclusive)
            time_col: time filter column, default fetched_at
            batch_size: rows per batch
        """
        offset = 0
        cols_str = ", ".join(columns)

        while True:
            sql = f"""
                SELECT {cols_str}
                FROM {table_name}
                WHERE {time_col} >= %s AND {time_col} <= %s
                ORDER BY {time_col} ASC
                LIMIT %s OFFSET %s
            """

            rows = self.db.query(sql, (start_time, end_time, batch_size, offset))

            if not rows:
                break

            yield rows

            if len(rows) < batch_size:
                break

            offset += batch_size
|
||||
|
||||
def parse_payload(self, row: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""
|
||||
解析 ODS 行中的 payload JSON
|
||||
"""
|
||||
payload = row.get("payload")
|
||||
if isinstance(payload, str):
|
||||
return json.loads(payload)
|
||||
elif isinstance(payload, dict):
|
||||
return payload
|
||||
return {}
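A subclass would typically combine `iter_ods_rows` with `parse_payload` as below (a minimal sketch; the task code is invented, and the selected columns are assumed to exist on the ODS table):

```python
class MemberDwdTask(BaseDwdTask):
    """Example DWD task that streams member profiles out of ODS."""

    def get_task_code(self) -> str:
        return "DEMO_MEMBER_DWD"

    def extract(self, context):
        payloads = []
        # Batches of up to 1000 rows whose fetched_at falls inside the window.
        for batch in self.iter_ods_rows(
            table_name="billiards_ods.member_profiles",
            columns=["id", "payload", "fetched_at"],
            start_time=context.window_start,
            end_time=context.window_end,
        ):
            payloads.extend(self.parse_payload(row) for row in batch)
        return {"records": payloads}
```

One caveat of the LIMIT/OFFSET approach: if new rows land inside the window between batches, page boundaries shift and rows can be skipped or repeated; keyset pagination on `(fetched_at, id)` would be the sturdier choice.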
141
etl_billiards/tasks/base_task.py
Normal file
@@ -0,0 +1,141 @@
# -*- coding: utf-8 -*-
"""Base class for ETL tasks (Extract/Transform/Load template method)."""
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class TaskContext:
    """Runtime information passed through to Extract/Transform/Load."""

    store_id: int
    window_start: datetime
    window_end: datetime
    window_minutes: int
    cursor: dict | None = None


class BaseTask:
    """Task base class providing the E/T/L template."""

    def __init__(self, config, db_connection, api_client, logger):
        self.config = config
        self.db = db_connection
        self.api = api_client
        self.logger = logger
        self.tz = ZoneInfo(config.get("app.timezone", "Asia/Taipei"))

    # ------------------------------------------------------------------ basics
    def get_task_code(self) -> str:
        """Return the task code."""
        raise NotImplementedError("Subclasses must implement get_task_code")

    # ------------------------------------------------------------------ E/T/L hooks
    def extract(self, context: TaskContext):
        """Extract data."""
        raise NotImplementedError("Subclasses must implement extract")

    def transform(self, extracted, context: TaskContext):
        """Transform data."""
        return extracted

    def load(self, transformed, context: TaskContext) -> dict:
        """Load data and return counters."""
        raise NotImplementedError("Subclasses must implement load")

    # ------------------------------------------------------------------ main flow
    def execute(self, cursor_data: dict | None = None) -> dict:
        """Orchestrate Extract → Transform → Load."""
        context = self._build_context(cursor_data)
        task_code = self.get_task_code()
        self.logger.info(
            "%s: starting, window [%s ~ %s]",
            task_code,
            context.window_start,
            context.window_end,
        )

        try:
            extracted = self.extract(context)
            transformed = self.transform(extracted, context)
            counts = self.load(transformed, context) or {}
            self.db.commit()
        except Exception:
            self.db.rollback()
            self.logger.error("%s: execution failed", task_code, exc_info=True)
            raise

        result = self._build_result("SUCCESS", counts)
        result["window"] = {
            "start": context.window_start,
            "end": context.window_end,
            "minutes": context.window_minutes,
        }
        self.logger.info("%s: done, counts=%s", task_code, result["counts"])
        return result

    # ------------------------------------------------------------------ helpers
    def _build_context(self, cursor_data: dict | None) -> TaskContext:
        window_start, window_end, window_minutes = self._get_time_window(cursor_data)
        return TaskContext(
            store_id=self.config.get("app.store_id"),
            window_start=window_start,
            window_end=window_end,
            window_minutes=window_minutes,
            cursor=cursor_data,
        )

    def _get_time_window(self, cursor_data: dict | None = None) -> tuple:
        """Compute the time window."""
        now = datetime.now(self.tz)

        idle_start = self.config.get("run.idle_window.start", "04:00")
        idle_end = self.config.get("run.idle_window.end", "16:00")
        is_idle = self._is_in_idle_window(now, idle_start, idle_end)

        if is_idle:
            window_minutes = self.config.get("run.window_minutes.default_idle", 180)
        else:
            window_minutes = self.config.get("run.window_minutes.default_busy", 30)

        overlap_seconds = self.config.get("run.overlap_seconds", 120)

        if cursor_data and cursor_data.get("last_end"):
            window_start = cursor_data["last_end"] - timedelta(seconds=overlap_seconds)
        else:
            window_start = now - timedelta(minutes=window_minutes)

        window_end = now
        return window_start, window_end, window_minutes

    def _is_in_idle_window(self, dt: datetime, start_time: str, end_time: str) -> bool:
        """Check whether dt falls inside the idle window."""
        # Lexicographic "HH:MM" comparison; assumes the window does not cross midnight.
        current_time = dt.strftime("%H:%M")
        return start_time <= current_time <= end_time

    def _merge_common_params(self, base: dict) -> dict:
        """
        Merge the global and task-level parameter pools so that filter conditions
        can be overridden/appended uniformly from configuration.

        Supports:
        - generic keys under api.params
        - task-level keys under api.params.<task_code_lower>
        """
        merged: dict = {}
        common = self.config.get("api.params", {}) or {}
        if isinstance(common, dict):
            merged.update(common)

        task_key = f"api.params.{self.get_task_code().lower()}"
        scoped = self.config.get(task_key, {}) or {}
        if isinstance(scoped, dict):
            merged.update(scoped)

        merged.update(base)
        return merged

    def _build_result(self, status: str, counts: dict) -> dict:
        """Build the result dict."""
        return {"status": status, "counts": counts}
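For reference, the smallest possible concrete task only overrides `get_task_code`, `extract`, and `load` (`transform` defaults to a pass-through). A sketch with an invented task code:

```python
class PingTask(BaseTask):
    """No-op task: exercises the E/T/L template and transaction handling."""

    def get_task_code(self) -> str:
        return "PING"

    def extract(self, context: TaskContext):
        return []

    def load(self, transformed, context: TaskContext) -> dict:
        return {"fetched": 0, "inserted": 0, "updated": 0, "skipped": 0, "errors": 0}


# task = PingTask(config, db, api, logger)
# result = task.execute()          # commits on success, rolls back on error
# result["status"] == "SUCCESS"    # result["window"] carries the computed window
```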
93
etl_billiards/tasks/coupon_usage_task.py
Normal file
@@ -0,0 +1,93 @@
# -*- coding: utf-8 -*-
"""Platform coupon redemption task."""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.coupon_usage import CouponUsageLoader
from models.parsers import TypeParser


class CouponUsageTask(BaseTask):
    """Sync platform coupon verification/redemption records."""

    def get_task_code(self) -> str:
        return "COUPON_USAGE"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/Promotion/GetOfflineCouponConsumePageList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_usage(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = CouponUsageLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_coupon_usage(
            transformed["records"]
        )
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_usage(self, raw: dict, store_id: int) -> dict | None:
        usage_id = TypeParser.parse_int(raw.get("id"))
        if not usage_id:
            self.logger.warning("Skipping record missing the coupon usage ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "usage_id": usage_id,
            "coupon_code": raw.get("coupon_code"),
            "coupon_channel": raw.get("coupon_channel"),
            "coupon_name": raw.get("coupon_name"),
            "sale_price": TypeParser.parse_decimal(raw.get("sale_price")),
            "coupon_money": TypeParser.parse_decimal(raw.get("coupon_money")),
            "coupon_free_time": TypeParser.parse_int(raw.get("coupon_free_time")),
            "use_status": raw.get("use_status"),
            "create_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "consume_time": TypeParser.parse_timestamp(
                raw.get("consume_time") or raw.get("consumeTime"), self.tz
            ),
            "operator_id": TypeParser.parse_int(raw.get("operator_id")),
            "operator_name": raw.get("operator_name"),
            "table_id": TypeParser.parse_int(raw.get("table_id")),
            "site_order_id": TypeParser.parse_int(raw.get("site_order_id")),
            "group_package_id": TypeParser.parse_int(raw.get("group_package_id")),
            "coupon_remark": raw.get("coupon_remark"),
            "deal_id": raw.get("deal_id"),
            "certificate_id": raw.get("certificate_id"),
            "verify_id": raw.get("verify_id"),
            "is_delete": raw.get("is_delete"),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
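Since `extract` goes through `_merge_common_params`, extra request filters can come from configuration without code changes. A hedged sketch of the precedence (the `tenantId`/`useStatus` keys are invented examples; the exact config format depends on how settings.py loads keys):

```python
# api.params               -> applied to every task's request
# api.params.coupon_usage  -> applied only to COUPON_USAGE
config_params = {
    "api.params": {"tenantId": 42},
    "api.params.coupon_usage": {"useStatus": 1},
}
# _merge_common_params merges global -> task-scoped -> explicit arguments,
# so siteId/startTime/endTime passed by the task itself always win.
```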
907
etl_billiards/tasks/dwd_load_task.py
Normal file
@@ -0,0 +1,907 @@
# -*- coding: utf-8 -*-
"""DWD load task: incremental writes from ODS into DWD (SCD2 dimensions, time-based fact increments)."""
from __future__ import annotations

from datetime import datetime
from typing import Any, Dict, Iterable, List, Sequence

from psycopg2.extras import RealDictCursor

from .base_task import BaseTask, TaskContext


class DwdLoadTask(BaseTask):
    """Performs the DWD load: SCD2 merges for dimension tables, time-based incremental inserts for fact tables."""

    # DWD -> ODS table map (ODS table names already aligned with the sample-JSON prefixes)
    TABLE_MAP: dict[str, str] = {
        # Dimensions
        # Site: sourced from the siteprofile snapshot in table-fee transactions to fill org/address fields
        "billiards_dwd.dim_site": "billiards_ods.table_fee_transactions",
        "billiards_dwd.dim_site_ex": "billiards_ods.table_fee_transactions",
        "billiards_dwd.dim_table": "billiards_ods.site_tables_master",
        "billiards_dwd.dim_table_ex": "billiards_ods.site_tables_master",
        "billiards_dwd.dim_assistant": "billiards_ods.assistant_accounts_master",
        "billiards_dwd.dim_assistant_ex": "billiards_ods.assistant_accounts_master",
        "billiards_dwd.dim_member": "billiards_ods.member_profiles",
        "billiards_dwd.dim_member_ex": "billiards_ods.member_profiles",
        "billiards_dwd.dim_member_card_account": "billiards_ods.member_stored_value_cards",
        "billiards_dwd.dim_member_card_account_ex": "billiards_ods.member_stored_value_cards",
        "billiards_dwd.dim_tenant_goods": "billiards_ods.tenant_goods_master",
        "billiards_dwd.dim_tenant_goods_ex": "billiards_ods.tenant_goods_master",
        "billiards_dwd.dim_store_goods": "billiards_ods.store_goods_master",
        "billiards_dwd.dim_store_goods_ex": "billiards_ods.store_goods_master",
        "billiards_dwd.dim_goods_category": "billiards_ods.stock_goods_category_tree",
        "billiards_dwd.dim_groupbuy_package": "billiards_ods.group_buy_packages",
        "billiards_dwd.dim_groupbuy_package_ex": "billiards_ods.group_buy_packages",
        # Facts
        "billiards_dwd.dwd_settlement_head": "billiards_ods.settlement_records",
        "billiards_dwd.dwd_settlement_head_ex": "billiards_ods.settlement_records",
        "billiards_dwd.dwd_table_fee_log": "billiards_ods.table_fee_transactions",
        "billiards_dwd.dwd_table_fee_log_ex": "billiards_ods.table_fee_transactions",
        "billiards_dwd.dwd_table_fee_adjust": "billiards_ods.table_fee_discount_records",
        "billiards_dwd.dwd_table_fee_adjust_ex": "billiards_ods.table_fee_discount_records",
        "billiards_dwd.dwd_store_goods_sale": "billiards_ods.store_goods_sales_records",
        "billiards_dwd.dwd_store_goods_sale_ex": "billiards_ods.store_goods_sales_records",
        "billiards_dwd.dwd_assistant_service_log": "billiards_ods.assistant_service_records",
        "billiards_dwd.dwd_assistant_service_log_ex": "billiards_ods.assistant_service_records",
        "billiards_dwd.dwd_assistant_trash_event": "billiards_ods.assistant_cancellation_records",
        "billiards_dwd.dwd_assistant_trash_event_ex": "billiards_ods.assistant_cancellation_records",
        "billiards_dwd.dwd_member_balance_change": "billiards_ods.member_balance_changes",
        "billiards_dwd.dwd_member_balance_change_ex": "billiards_ods.member_balance_changes",
        "billiards_dwd.dwd_groupbuy_redemption": "billiards_ods.group_buy_redemption_records",
        "billiards_dwd.dwd_groupbuy_redemption_ex": "billiards_ods.group_buy_redemption_records",
        "billiards_dwd.dwd_platform_coupon_redemption": "billiards_ods.platform_coupon_redemption_records",
        "billiards_dwd.dwd_platform_coupon_redemption_ex": "billiards_ods.platform_coupon_redemption_records",
        "billiards_dwd.dwd_recharge_order": "billiards_ods.recharge_settlements",
        "billiards_dwd.dwd_recharge_order_ex": "billiards_ods.recharge_settlements",
        "billiards_dwd.dwd_payment": "billiards_ods.payment_transactions",
        "billiards_dwd.dwd_refund": "billiards_ods.refund_transactions",
        "billiards_dwd.dwd_refund_ex": "billiards_ods.refund_transactions",
    }

    SCD_COLS = {"scd2_start_time", "scd2_end_time", "scd2_is_current", "scd2_version"}
    FACT_ORDER_CANDIDATES = [
        "fetched_at",
        "pay_time",
        "create_time",
        "update_time",
        "occur_time",
        "settle_time",
        "start_use_time",
    ]

    # Special column mappings: DWD column -> source column/expression (with optional CAST)
    FACT_MAPPINGS: dict[str, list[tuple[str, str, str | None]]] = {
        # Dimension tables (bridge primary-key/column-name differences)
        "billiards_dwd.dim_site": [
            ("org_id", "siteprofile->>'org_id'", None),
            ("shop_name", "siteprofile->>'shop_name'", None),
            ("site_label", "siteprofile->>'site_label'", None),
            ("full_address", "siteprofile->>'full_address'", None),
            ("address", "siteprofile->>'address'", None),
            ("longitude", "siteprofile->>'longitude'", "numeric"),
            ("latitude", "siteprofile->>'latitude'", "numeric"),
            ("tenant_site_region_id", "siteprofile->>'tenant_site_region_id'", None),
            ("business_tel", "siteprofile->>'business_tel'", None),
            ("site_type", "siteprofile->>'site_type'", None),
            ("shop_status", "siteprofile->>'shop_status'", None),
            ("tenant_id", "siteprofile->>'tenant_id'", None),
        ],
        "billiards_dwd.dim_site_ex": [
            ("auto_light", "siteprofile->>'auto_light'", None),
            ("attendance_enabled", "siteprofile->>'attendance_enabled'", None),
            ("attendance_distance", "siteprofile->>'attendance_distance'", None),
            ("prod_env", "siteprofile->>'prod_env'", None),
            ("light_status", "siteprofile->>'light_status'", None),
            ("light_type", "siteprofile->>'light_type'", None),
            ("light_token", "siteprofile->>'light_token'", None),
            ("address", "siteprofile->>'address'", None),
            ("avatar", "siteprofile->>'avatar'", None),
            ("wifi_name", "siteprofile->>'wifi_name'", None),
            ("wifi_password", "siteprofile->>'wifi_password'", None),
            ("customer_service_qrcode", "siteprofile->>'customer_service_qrcode'", None),
            ("customer_service_wechat", "siteprofile->>'customer_service_wechat'", None),
            ("fixed_pay_qrcode", "siteprofile->>'fixed_pay_qrCode'", None),
            ("longitude", "siteprofile->>'longitude'", "numeric"),
            ("latitude", "siteprofile->>'latitude'", "numeric"),
            ("tenant_site_region_id", "siteprofile->>'tenant_site_region_id'", None),
            ("site_type", "siteprofile->>'site_type'", None),
            ("site_label", "siteprofile->>'site_label'", None),
            ("shop_status", "siteprofile->>'shop_status'", None),
            ("create_time", "siteprofile->>'create_time'", "timestamptz"),
            ("update_time", "siteprofile->>'update_time'", "timestamptz"),
        ],
        "billiards_dwd.dim_table": [
            ("table_id", "id", None),
            ("site_table_area_name", "areaname", None),
            ("tenant_table_area_id", "site_table_area_id", None),
        ],
        "billiards_dwd.dim_table_ex": [
            ("table_id", "id", None),
            ("table_cloth_use_time", "table_cloth_use_time", None),
        ],
        "billiards_dwd.dim_assistant": [("assistant_id", "id", None), ("user_id", "staff_id", None)],
        "billiards_dwd.dim_assistant_ex": [
            ("assistant_id", "id", None),
            ("introduce", "introduce", None),
            ("group_name", "group_name", None),
            ("light_equipment_id", "light_equipment_id", None),
        ],
        "billiards_dwd.dim_member": [("member_id", "id", None)],
        "billiards_dwd.dim_member_ex": [
            ("member_id", "id", None),
            ("register_site_name", "site_name", None),
        ],
        "billiards_dwd.dim_member_card_account": [("member_card_id", "id", None)],
        "billiards_dwd.dim_member_card_account_ex": [
            ("member_card_id", "id", None),
            ("tenant_name", "tenantname", None),
            ("tenantavatar", "tenantavatar", None),
            ("card_no", "card_no", None),
            ("bind_password", "bind_password", None),
            ("use_scene", "use_scene", None),
            ("tableareaid", "tableareaid", None),
            ("goodscategoryid", "goodscategoryid", None),
        ],
        "billiards_dwd.dim_tenant_goods": [
            ("tenant_goods_id", "id", None),
            ("category_name", "categoryname", None),
        ],
        "billiards_dwd.dim_tenant_goods_ex": [
            ("tenant_goods_id", "id", None),
            ("remark_name", "remark_name", None),
            ("goods_bar_code", "goods_bar_code", None),
            ("commodity_code_list", "commodity_code", None),
            ("is_in_site", "isinsite", "boolean"),
        ],
        "billiards_dwd.dim_store_goods": [
            ("site_goods_id", "id", None),
            ("category_level1_name", "onecategoryname", None),
            ("category_level2_name", "twocategoryname", None),
            ("created_at", "create_time", None),
            ("updated_at", "update_time", None),
            ("avg_monthly_sales", "average_monthly_sales", None),
            ("batch_stock_qty", "stock", None),
            ("sale_qty", "sale_num", None),
            ("total_sales_qty", "total_sales", None),
        ],
        "billiards_dwd.dim_store_goods_ex": [
            ("site_goods_id", "id", None),
            ("goods_barcode", "goods_bar_code", None),
            ("stock_qty", "stock", None),
            ("stock_secondary_qty", "stock_a", None),
            ("safety_stock_qty", "safe_stock", None),
            ("site_name", "sitename", None),
            ("goods_cover_url", "goods_cover", None),
            ("provisional_total_cost", "total_purchase_cost", None),
            ("is_discountable", "able_discount", None),
            ("freeze_status", "freeze", None),
            ("remark", "remark", None),
            ("days_on_shelf", "days_available", None),
            ("sort_order", "sort", None),
        ],
        "billiards_dwd.dim_goods_category": [
            ("category_id", "id", None),
            ("tenant_id", "tenant_id", None),
            ("category_name", "category_name", None),
            ("alias_name", "alias_name", None),
            ("parent_category_id", "pid", None),
            ("business_name", "business_name", None),
            ("tenant_goods_business_id", "tenant_goods_business_id", None),
            ("sort_order", "sort", None),
            ("open_salesman", "open_salesman", None),
            ("is_warehousing", "is_warehousing", None),
            ("category_level", "CASE WHEN pid = 0 THEN 1 ELSE 2 END", None),
            ("is_leaf", "CASE WHEN categoryboxes IS NULL OR jsonb_array_length(categoryboxes)=0 THEN 1 ELSE 0 END", None),
        ],
        "billiards_dwd.dim_groupbuy_package": [
            ("groupbuy_package_id", "id", None),
            ("package_template_id", "package_id", None),
            ("coupon_face_value", "coupon_money", None),
            ("duration_seconds", "duration", None),
        ],
        "billiards_dwd.dim_groupbuy_package_ex": [
            ("groupbuy_package_id", "id", None),
            ("table_area_id", "table_area_id", None),
            ("tenant_table_area_id", "tenant_table_area_id", None),
            ("usable_range", "usable_range", None),
            ("table_area_id_list", "table_area_id_list", None),
            ("package_type", "type", None),
        ],
        # Fact-table primary keys and key column differences
        "billiards_dwd.dwd_table_fee_log": [("table_fee_log_id", "id", None)],
        "billiards_dwd.dwd_table_fee_log_ex": [
            ("table_fee_log_id", "id", None),
            ("salesman_name", "salesman_name", None),
        ],
        "billiards_dwd.dwd_table_fee_adjust": [
            ("table_fee_adjust_id", "id", None),
            ("table_id", "site_table_id", None),
            ("table_area_id", "tenant_table_area_id", None),
            ("table_area_name", "tableprofile->>'table_area_name'", None),
            ("adjust_time", "create_time", None),
        ],
        "billiards_dwd.dwd_table_fee_adjust_ex": [
            ("table_fee_adjust_id", "id", None),
            ("ledger_name", "ledger_name", None),
        ],
        "billiards_dwd.dwd_store_goods_sale": [("store_goods_sale_id", "id", None), ("discount_price", "discount_money", None)],
        "billiards_dwd.dwd_store_goods_sale_ex": [
            ("store_goods_sale_id", "id", None),
            ("option_value_name", "option_value_name", None),
            ("open_salesman_flag", "opensalesman", "integer"),
            ("salesman_name", "salesman_name", None),
            ("salesman_org_id", "sales_man_org_id", None),
            ("legacy_order_goods_id", "ordergoodsid", None),
            ("site_name", "sitename", None),
            ("legacy_site_id", "siteid", None),
        ],
        "billiards_dwd.dwd_assistant_service_log": [
            ("assistant_service_id", "id", None),
            ("assistant_no", "assistantno", None),
            ("site_assistant_id", "order_assistant_id", None),
            ("level_name", "levelname", None),
            ("skill_name", "skillname", None),
        ],
        "billiards_dwd.dwd_assistant_service_log_ex": [
            ("assistant_service_id", "id", None),
            ("assistant_name", "assistantname", None),
            ("ledger_group_name", "ledger_group_name", None),
            ("trash_applicant_name", "trash_applicant_name", None),
            ("trash_reason", "trash_reason", None),
            ("salesman_name", "salesman_name", None),
            ("table_name", "tablename", None),
        ],
        "billiards_dwd.dwd_assistant_trash_event": [
            ("assistant_trash_event_id", "id", None),
            ("assistant_no", "assistantname", None),
            ("abolish_amount", "assistantabolishamount", None),
            ("charge_minutes_raw", "pdchargeminutes", None),
            ("site_id", "siteid", None),
            ("table_id", "tableid", None),
            ("table_area_id", "tableareaid", None),
            ("assistant_name", "assistantname", None),
            ("trash_reason", "trashreason", None),
            ("create_time", "createtime", None),
        ],
        "billiards_dwd.dwd_assistant_trash_event_ex": [
            ("assistant_trash_event_id", "id", None),
            ("table_area_name", "tablearea", None),
            ("table_name", "tablename", None),
        ],
        "billiards_dwd.dwd_member_balance_change": [
            ("balance_change_id", "id", None),
            ("balance_before", "before", None),
            ("change_amount", "account_data", None),
            ("balance_after", "after", None),
            ("card_type_name", "membercardtypename", None),
            ("change_time", "create_time", None),
            ("member_name", "membername", None),
            ("member_mobile", "membermobile", None),
        ],
        "billiards_dwd.dwd_member_balance_change_ex": [
            ("balance_change_id", "id", None),
            ("pay_site_name", "paysitename", None),
            ("register_site_name", "registersitename", None),
        ],
        "billiards_dwd.dwd_groupbuy_redemption": [("redemption_id", "id", None)],
        "billiards_dwd.dwd_groupbuy_redemption_ex": [
            ("redemption_id", "id", None),
            ("table_area_name", "tableareaname", None),
            ("site_name", "sitename", None),
            ("table_name", "tablename", None),
            ("goods_option_price", "goodsoptionprice", None),
            ("salesman_name", "salesman_name", None),
            ("salesman_org_id", "sales_man_org_id", None),
            ("ledger_group_name", "ledger_group_name", None),
        ],
        "billiards_dwd.dwd_platform_coupon_redemption": [("platform_coupon_redemption_id", "id", None)],
        "billiards_dwd.dwd_platform_coupon_redemption_ex": [
            ("platform_coupon_redemption_id", "id", None),
            ("coupon_cover", "coupon_cover", None),
        ],
        "billiards_dwd.dwd_payment": [("payment_id", "id", None), ("pay_date", "pay_time", "date")],
        "billiards_dwd.dwd_refund": [("refund_id", "id", None)],
        "billiards_dwd.dwd_refund_ex": [
            ("refund_id", "id", None),
            ("tenant_name", "tenantname", None),
            ("channel_payer_id", "channel_payer_id", None),
            ("channel_pay_no", "channel_pay_no", None),
        ],
        # Settlement head: settlement_records (source columns are lowercased camelCase
        # without underscores, so explicit mapping is required)
        "billiards_dwd.dwd_settlement_head": [
            ("order_settle_id", "id", None),
            ("tenant_id", "tenantid", None),
            ("site_id", "siteid", None),
            ("site_name", "sitename", None),
            ("table_id", "tableid", None),
            ("settle_name", "settlename", None),
            ("order_trade_no", "settlerelateid", None),
            ("create_time", "createtime", None),
            ("pay_time", "paytime", None),
            ("settle_type", "settletype", None),
            ("revoke_order_id", "revokeorderid", None),
            ("member_id", "memberid", None),
            ("member_name", "membername", None),
            ("member_phone", "memberphone", None),
            ("member_card_account_id", "tenantmembercardid", None),
            ("member_card_type_name", "membercardtypename", None),
            ("is_bind_member", "isbindmember", None),
            ("member_discount_amount", "memberdiscountamount", None),
            ("consume_money", "consumemoney", None),
            ("table_charge_money", "tablechargemoney", None),
            ("goods_money", "goodsmoney", None),
            ("real_goods_money", "realgoodsmoney", None),
            ("assistant_pd_money", "assistantpdmoney", None),
            ("assistant_cx_money", "assistantcxmoney", None),
            ("adjust_amount", "adjustamount", None),
            ("pay_amount", "payamount", None),
            ("balance_amount", "balanceamount", None),
            ("recharge_card_amount", "rechargecardamount", None),
            ("gift_card_amount", "giftcardamount", None),
            ("coupon_amount", "couponamount", None),
            ("rounding_amount", "roundingamount", None),
            ("point_amount", "pointamount", None),
        ],
        "billiards_dwd.dwd_settlement_head_ex": [
            ("order_settle_id", "id", None),
            ("serial_number", "serialnumber", None),
            ("settle_status", "settlestatus", None),
            ("can_be_revoked", "canberevoked", "boolean"),
            ("revoke_order_name", "revokeordername", None),
            ("revoke_time", "revoketime", None),
            ("is_first_order", "isfirst", "boolean"),
            ("service_money", "servicemoney", None),
            ("cash_amount", "cashamount", None),
            ("card_amount", "cardamount", None),
            ("online_amount", "onlineamount", None),
            ("refund_amount", "refundamount", None),
            ("prepay_money", "prepaymoney", None),
            ("payment_method", "paymentmethod", None),
            ("coupon_sale_amount", "couponsaleamount", None),
            ("all_coupon_discount", "allcoupondiscount", None),
            ("goods_promotion_money", "goodspromotionmoney", None),
            ("assistant_promotion_money", "assistantpromotionmoney", None),
            ("activity_discount", "activitydiscount", None),
            ("assistant_manual_discount", "assistantmanualdiscount", None),
            ("point_discount_price", "pointdiscountprice", None),
            ("point_discount_cost", "pointdiscountcost", None),
            ("is_use_coupon", "isusecoupon", "boolean"),
            ("is_use_discount", "isusediscount", "boolean"),
            ("is_activity", "isactivity", "boolean"),
            ("operator_name", "operatorname", None),
            ("salesman_name", "salesmanname", None),
            ("order_remark", "orderremark", None),
            ("operator_id", "operatorid", None),
            ("salesman_user_id", "salesmanuserid", None),
        ],
        # Recharge settlements: recharge_settlements (same column style as settlement_records)
        "billiards_dwd.dwd_recharge_order": [
            ("recharge_order_id", "id", None),
            ("tenant_id", "tenantid", None),
            ("site_id", "siteid", None),
            ("member_id", "memberid", None),
            ("member_name_snapshot", "membername", None),
            ("member_phone_snapshot", "memberphone", None),
            ("tenant_member_card_id", "tenantmembercardid", None),
            ("member_card_type_name", "membercardtypename", None),
            ("settle_relate_id", "settlerelateid", None),
            ("settle_type", "settletype", None),
            ("settle_name", "settlename", None),
            ("is_first", "isfirst", None),
            ("pay_amount", "payamount", None),
            ("refund_amount", "refundamount", None),
            ("point_amount", "pointamount", None),
            ("cash_amount", "cashamount", None),
            ("payment_method", "paymentmethod", None),
            ("create_time", "createtime", None),
            ("pay_time", "paytime", None),
        ],
        "billiards_dwd.dwd_recharge_order_ex": [
            ("recharge_order_id", "id", None),
            ("site_name_snapshot", "sitename", None),
            ("salesman_name", "salesmanname", None),
            ("order_remark", "orderremark", None),
            ("revoke_order_name", "revokeordername", None),
            ("settle_status", "settlestatus", None),
            ("is_bind_member", "isbindmember", "boolean"),
            ("is_activity", "isactivity", "boolean"),
            ("is_use_coupon", "isusecoupon", "boolean"),
            ("is_use_discount", "isusediscount", "boolean"),
            ("can_be_revoked", "canberevoked", "boolean"),
            ("online_amount", "onlineamount", None),
            ("balance_amount", "balanceamount", None),
            ("card_amount", "cardamount", None),
            ("coupon_amount", "couponamount", None),
            ("recharge_card_amount", "rechargecardamount", None),
            ("gift_card_amount", "giftcardamount", None),
            ("prepay_money", "prepaymoney", None),
            ("consume_money", "consumemoney", None),
            ("goods_money", "goodsmoney", None),
            ("real_goods_money", "realgoodsmoney", None),
            ("table_charge_money", "tablechargemoney", None),
            ("service_money", "servicemoney", None),
            ("activity_discount", "activitydiscount", None),
            ("all_coupon_discount", "allcoupondiscount", None),
            ("goods_promotion_money", "goodspromotionmoney", None),
            ("assistant_promotion_money", "assistantpromotionmoney", None),
            ("assistant_pd_money", "assistantpdmoney", None),
            ("assistant_cx_money", "assistantcxmoney", None),
            ("assistant_manual_discount", "assistantmanualdiscount", None),
            ("coupon_sale_amount", "couponsaleamount", None),
            ("member_discount_amount", "memberdiscountamount", None),
            ("point_discount_price", "pointdiscountprice", None),
            ("point_discount_cost", "pointdiscountcost", None),
            ("adjust_amount", "adjustamount", None),
            ("rounding_amount", "roundingamount", None),
            ("operator_id", "operatorid", None),
            ("operator_name_snapshot", "operatorname", None),
            ("salesman_user_id", "salesmanuserid", None),
            ("salesman_name", "salesmanname", None),
            ("order_remark", "orderremark", None),
            ("table_id", "tableid", None),
            ("serial_number", "serialnumber", None),
            ("revoke_order_id", "revokeorderid", None),
            ("revoke_order_name", "revokeordername", None),
            ("revoke_time", "revoketime", None),
        ],
    }

    def get_task_code(self) -> str:
        """Return the task code."""
        return "DWD_LOAD_FROM_ODS"

    def extract(self, context: TaskContext) -> dict[str, Any]:
        """Prepare the runtime context needed by this run."""
        return {"now": datetime.now()}

    def load(self, extracted: dict[str, Any], context: TaskContext) -> dict[str, Any]:
        """Walk the table map: SCD2-merge dimensions, insert facts incrementally by time."""
        now = extracted["now"]
        summary: List[Dict[str, Any]] = []
        with self.db.conn.cursor(cursor_factory=RealDictCursor) as cur:
            for dwd_table, ods_table in self.TABLE_MAP.items():
                dwd_cols = self._get_columns(cur, dwd_table)
                ods_cols = self._get_columns(cur, ods_table)
                if not dwd_cols:
                    self.logger.warning("Skipping %s: could not read DWD column metadata", dwd_table)
                    continue

                if self._table_base(dwd_table).startswith("dim_"):
                    processed = self._merge_dim_scd2(cur, dwd_table, ods_table, dwd_cols, ods_cols, now)
                    summary.append({"table": dwd_table, "mode": "SCD2", "processed": processed})
                else:
                    dwd_types = self._get_column_types(cur, dwd_table, "billiards_dwd")
                    ods_types = self._get_column_types(cur, ods_table, "billiards_ods")
                    inserted = self._merge_fact_increment(
                        cur, dwd_table, ods_table, dwd_cols, ods_cols, dwd_types, ods_types
                    )
                    summary.append({"table": dwd_table, "mode": "INCREMENT", "inserted": inserted})

        self.db.conn.commit()
        return {"tables": summary}

    # ---------------------- helpers ----------------------
    def _get_columns(self, cur, table: str) -> List[str]:
        """Return the column names of a table (lowercased)."""
        schema, name = self._split_table_name(table, default_schema="billiards_dwd")
        cur.execute(
            """
            SELECT column_name
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s
            """,
            (schema, name),
        )
        return [r["column_name"].lower() for r in cur.fetchall()]

    def _get_primary_keys(self, cur, table: str) -> List[str]:
        """Return the table's primary-key column names."""
        schema, name = self._split_table_name(table, default_schema="billiards_dwd")
        cur.execute(
            """
            SELECT kcu.column_name
            FROM information_schema.table_constraints tc
            JOIN information_schema.key_column_usage kcu
              ON tc.constraint_name = kcu.constraint_name
             AND tc.table_schema = kcu.table_schema
             AND tc.table_name = kcu.table_name
            WHERE tc.table_schema = %s
              AND tc.table_name = %s
              AND tc.constraint_type = 'PRIMARY KEY'
            ORDER BY kcu.ordinal_position
            """,
            (schema, name),
        )
        return [r["column_name"].lower() for r in cur.fetchall()]

    def _get_column_types(self, cur, table: str, default_schema: str) -> Dict[str, str]:
        """Return column data types (information_schema.data_type)."""
        schema, name = self._split_table_name(table, default_schema=default_schema)
        cur.execute(
            """
            SELECT column_name, data_type
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s
            """,
            (schema, name),
        )
        return {r["column_name"].lower(): r["data_type"].lower() for r in cur.fetchall()}

    def _build_column_mapping(
        self, dwd_table: str, pk_cols: Sequence[str], ods_cols: Sequence[str]
    ) -> Dict[str, tuple[str, str | None]]:
        """Merge the explicit FACT_MAPPINGS with a primary-key fallback mapping."""
        mapping_entries = self.FACT_MAPPINGS.get(dwd_table, [])
        mapping: Dict[str, tuple[str, str | None]] = {
            dst.lower(): (src, cast_type) for dst, src, cast_type in mapping_entries
        }
        ods_set = {c.lower() for c in ods_cols}
        for pk in pk_cols:
            pk_lower = pk.lower()
            if pk_lower not in mapping and pk_lower not in ods_set and "id" in ods_set:
                mapping[pk_lower] = ("id", None)
        return mapping

    def _fetch_source_rows(
        self, cur, table: str, columns: Sequence[str], where_sql: str = "", params: Sequence[Any] | None = None
    ) -> List[Dict[str, Any]]:
        """Read the given columns from a source table; returns dicts with lowercased keys."""
        schema, name = self._split_table_name(table, default_schema="billiards_ods")
        cols_sql = ", ".join(f'"{c}"' for c in columns)
        sql = f'SELECT {cols_sql} FROM "{schema}"."{name}" {where_sql}'
        cur.execute(sql, params or [])
        rows = []
        for r in cur.fetchall():
            rows.append({k.lower(): v for k, v in r.items()})
        return rows

    def _expand_goods_category_rows(self, rows: list[Dict[str, Any]]) -> list[Dict[str, Any]]:
        """Expand categoryboxes elements of the category table into child-category records."""
        expanded: list[Dict[str, Any]] = []
        for r in rows:
            expanded.append(r)
            boxes = r.get("categoryboxes")
            if isinstance(boxes, list):
                for child in boxes:
                    if not isinstance(child, dict):
                        continue
                    child_row: Dict[str, Any] = {}
                    # Inherit tenant and top-level business info from the parent
                    child_row["tenant_id"] = r.get("tenant_id")
                    child_row["business_name"] = child.get("business_name", r.get("business_name"))
                    child_row["tenant_goods_business_id"] = child.get(
                        "tenant_goods_business_id", r.get("tenant_goods_business_id")
                    )
                    # Merge the child's own fields
                    child_row.update(child)
                    # Default parent-child relationship
                    child_row.setdefault("pid", r.get("id"))
                    # Derive level/leaf flags
                    child_boxes = child_row.get("categoryboxes")
                    if not isinstance(child_boxes, list):
                        is_leaf = 1
                    else:
                        is_leaf = 1 if len(child_boxes) == 0 else 0
                    child_row.setdefault("category_level", 2)
                    child_row.setdefault("is_leaf", is_leaf)
                    expanded.append(child_row)
        return expanded

    def _merge_dim_scd2(
        self,
        cur,
        dwd_table: str,
        ods_table: str,
        dwd_cols: Sequence[str],
        ods_cols: Sequence[str],
        now: datetime,
    ) -> int:
        """SCD2 merge for a dimension table: on change, close the old version and insert a new one."""
        pk_cols = self._get_primary_keys(cur, dwd_table)
        if not pk_cols:
            raise ValueError(f"{dwd_table} has no primary key configured; cannot run the SCD2 merge")

        mapping = self._build_column_mapping(dwd_table, pk_cols, ods_cols)
        ods_set = {c.lower() for c in ods_cols}
        table_sql = self._format_table(ods_table, "billiards_ods")
        # Build the SELECT expressions; JSON/expression mappings are supported
        select_exprs: list[str] = []
        added: set[str] = set()
        for col in dwd_cols:
            lc = col.lower()
            if lc in self.SCD_COLS:
                continue
            if lc in mapping:
                src, cast_type = mapping[lc]
                select_exprs.append(f"{self._cast_expr(src, cast_type)} AS \"{lc}\"")
                added.add(lc)
            elif lc in ods_set:
                select_exprs.append(f'"{lc}" AS "{lc}"')
                added.add(lc)
        # The category dimension additionally needs categoryboxes to expand child categories
        if dwd_table == "billiards_dwd.dim_goods_category" and "categoryboxes" not in added and "categoryboxes" in ods_set:
            select_exprs.append('"categoryboxes" AS "categoryboxes"')
            added.add("categoryboxes")
        # Fallback: make sure the primary keys are selected
        for pk in pk_cols:
            lc = pk.lower()
            if lc not in added:
                if lc in mapping:
                    src, cast_type = mapping[lc]
                    select_exprs.append(f"{self._cast_expr(src, cast_type)} AS \"{lc}\"")
                elif lc in ods_set:
                    select_exprs.append(f'"{lc}" AS "{lc}"')
                added.add(lc)

        if not select_exprs:
            return 0

        sql = f"SELECT {', '.join(select_exprs)} FROM {table_sql}"
        cur.execute(sql)
        rows = [{k.lower(): v for k, v in r.items()} for r in cur.fetchall()]

        # Special case: expand child categories for the category dimension
        if dwd_table == "billiards_dwd.dim_goods_category":
            rows = self._expand_goods_category_rows(rows)

        inserted_or_updated = 0
        seen_pk = set()
        for row in rows:
            mapped_row: Dict[str, Any] = {}
            for col in dwd_cols:
                lc = col.lower()
                if lc in self.SCD_COLS:
                    continue
                value = row.get(lc)
                if value is None and lc in mapping:
                    src, _ = mapping[lc]
                    value = row.get(src.lower())
                mapped_row[lc] = value

            pk_key = tuple(mapped_row.get(pk) for pk in pk_cols)
            if pk_key in seen_pk:
                continue
            seen_pk.add(pk_key)
            if self._upsert_scd2_row(cur, dwd_table, dwd_cols, pk_cols, mapped_row, now):
                inserted_or_updated += 1
        # Report the rows that actually produced a new SCD2 version;
        # unchanged rows are compared but skipped.
        return inserted_or_updated

    def _upsert_scd2_row(
        self,
        cur,
        dwd_table: str,
        dwd_cols: Sequence[str],
        pk_cols: Sequence[str],
        src_row: Dict[str, Any],
        now: datetime,
    ) -> bool:
        """SCD2 merge for one row: if anything changed, close the old version and insert a new one."""
        pk_values = [src_row.get(pk) for pk in pk_cols]
        if any(v is None for v in pk_values):
            self.logger.warning("Skipping %s: missing primary key %s", dwd_table, dict(zip(pk_cols, pk_values)))
            return False

        where_clause = " AND ".join(f'"{pk}" = %s' for pk in pk_cols)
        table_sql = self._format_table(dwd_table, "billiards_dwd")
        cur.execute(
            f"SELECT * FROM {table_sql} WHERE {where_clause} AND COALESCE(scd2_is_current,1)=1 LIMIT 1",
            pk_values,
        )
        current = cur.fetchone()
        if current:
            current = {k.lower(): v for k, v in current.items()}

        if current and not self._is_row_changed(current, src_row, dwd_cols):
            return False

        if current:
            version = (current.get("scd2_version") or 1) + 1
            self._close_current_dim(cur, dwd_table, pk_cols, pk_values, now)
        else:
            version = 1

        self._insert_dim_row(cur, dwd_table, dwd_cols, src_row, now, version)
        return True

    def _close_current_dim(self, cur, table: str, pk_cols: Sequence[str], pk_values: Sequence[Any], now: datetime) -> None:
        """Close the current version: set scd2_is_current=0 and fill the end time."""
        set_sql = "scd2_end_time = %s, scd2_is_current = 0"
        where_clause = " AND ".join(f'"{pk}" = %s' for pk in pk_cols)
        table_sql = self._format_table(table, "billiards_dwd")
        cur.execute(f"UPDATE {table_sql} SET {set_sql} WHERE {where_clause} AND COALESCE(scd2_is_current,1)=1", [now, *pk_values])

    def _insert_dim_row(
        self,
        cur,
        table: str,
        dwd_cols: Sequence[str],
        src_row: Dict[str, Any],
        now: datetime,
        version: int,
    ) -> None:
        """Insert a new SCD2 version row."""
        insert_cols: List[str] = []
        placeholders: List[str] = []
        values: List[Any] = []
        for col in sorted(dwd_cols):
            lc = col.lower()
            insert_cols.append(f'"{lc}"')
            placeholders.append("%s")
            if lc == "scd2_start_time":
                values.append(now)
            elif lc == "scd2_end_time":
                values.append(datetime(9999, 12, 31, 0, 0, 0))
            elif lc == "scd2_is_current":
                values.append(1)
            elif lc == "scd2_version":
                values.append(version)
            else:
                values.append(src_row.get(lc))
        table_sql = self._format_table(table, "billiards_dwd")
        sql = f'INSERT INTO {table_sql} ({", ".join(insert_cols)}) VALUES ({", ".join(placeholders)})'
        cur.execute(sql, values)

    def _is_row_changed(self, current: Dict[str, Any], incoming: Dict[str, Any], dwd_cols: Sequence[str]) -> bool:
        """Compare the non-SCD2 columns to decide whether anything changed."""
        for col in dwd_cols:
            lc = col.lower()
            if lc in self.SCD_COLS:
                continue
            if current.get(lc) != incoming.get(lc):
                return True
        return False

    def _merge_fact_increment(
        self,
        cur,
        dwd_table: str,
        ods_table: str,
        dwd_cols: Sequence[str],
        ods_cols: Sequence[str],
        dwd_types: Dict[str, str],
        ods_types: Dict[str, str],
    ) -> int:
        """Insert fact rows incrementally by time; by default the column-name intersection is written."""
        mapping_entries = self.FACT_MAPPINGS.get(dwd_table) or []
        mapping: Dict[str, tuple[str, str | None]] = {
            dst.lower(): (src, cast_type) for dst, src, cast_type in mapping_entries
        }

        mapping_dest = [dst for dst, _, _ in mapping_entries]
        insert_cols: List[str] = list(mapping_dest)
        for col in dwd_cols:
            if col in self.SCD_COLS:
                continue
            if col in insert_cols:
                continue
            if col in ods_cols:
                insert_cols.append(col)

        pk_cols = self._get_primary_keys(cur, dwd_table)
        ods_set = {c.lower() for c in ods_cols}
        existing_lower = [c.lower() for c in insert_cols]
        for pk in pk_cols:
            pk_lower = pk.lower()
            if pk_lower in existing_lower:
                continue
            if pk_lower in ods_set:
                insert_cols.append(pk)
                existing_lower.append(pk_lower)
            elif "id" in ods_set:
                insert_cols.append(pk)
                existing_lower.append(pk_lower)
                mapping[pk_lower] = ("id", None)

        # Deduplicate while preserving column order
        seen_cols: set[str] = set()
        ordered_cols: list[str] = []
        for col in insert_cols:
            lc = col.lower()
            if lc not in seen_cols:
                seen_cols.add(lc)
                ordered_cols.append(col)
        insert_cols = ordered_cols

        if not insert_cols:
            self.logger.warning("Skipping %s: no insertable columns found", dwd_table)
            return 0

        order_col = self._pick_order_column(dwd_cols, ods_cols)
        where_sql = ""
        params: List[Any] = []
        dwd_table_sql = self._format_table(dwd_table, "billiards_dwd")
        ods_table_sql = self._format_table(ods_table, "billiards_ods")
        if order_col:
            cur.execute(f'SELECT COALESCE(MAX("{order_col}"), %s) FROM {dwd_table_sql}', ("1970-01-01",))
            row = cur.fetchone() or {}
            watermark = list(row.values())[0] if row else "1970-01-01"
            where_sql = f'WHERE "{order_col}" > %s'
            params.append(watermark)

        default_cols = [c for c in insert_cols if c.lower() not in mapping]
        default_expr_map: Dict[str, str] = {}
        if default_cols:
            default_exprs = self._build_fact_select_exprs(default_cols, dwd_types, ods_types)
            default_expr_map = dict(zip(default_cols, default_exprs))

        select_exprs: List[str] = []
        for col in insert_cols:
            key = col.lower()
            if key in mapping:
                src, cast_type = mapping[key]
                select_exprs.append(self._cast_expr(src, cast_type))
            else:
                select_exprs.append(default_expr_map[col])

        select_cols_sql = ", ".join(select_exprs)
        insert_cols_sql = ", ".join(f'"{c}"' for c in insert_cols)
        sql = f'INSERT INTO {dwd_table_sql} ({insert_cols_sql}) SELECT {select_cols_sql} FROM {ods_table_sql} {where_sql}'

        # pk_cols was fetched above; reuse it so re-runs stay idempotent.
        if pk_cols:
            pk_sql = ", ".join(f'"{c}"' for c in pk_cols)
            sql += f" ON CONFLICT ({pk_sql}) DO NOTHING"

        cur.execute(sql, params)
        return cur.rowcount

    def _pick_order_column(self, dwd_cols: Iterable[str], ods_cols: Iterable[str]) -> str | None:
        """Pick the time column used for the increment (must exist in both DWD and ODS)."""
        lower_cols = {c.lower() for c in dwd_cols} & {c.lower() for c in ods_cols}
        for candidate in self.FACT_ORDER_CANDIDATES:
            if candidate.lower() in lower_cols:
                return candidate.lower()
        return None

    def _build_fact_select_exprs(
        self,
        insert_cols: Sequence[str],
        dwd_types: Dict[str, str],
        ods_types: Dict[str, str],
    ) -> List[str]:
        """Build the fact-table SELECT list, casting types where needed."""
        numeric_types = {"integer", "bigint", "smallint", "numeric", "double precision", "real", "decimal"}
        text_types = {"text", "character varying", "varchar"}
        exprs = []
        for col in insert_cols:
            d_type = dwd_types.get(col)
            o_type = ods_types.get(col)
            if d_type in numeric_types and o_type in text_types:
                exprs.append(f"CAST(NULLIF(CAST(\"{col}\" AS text), '') AS numeric)::{d_type}")
            else:
                exprs.append(f'"{col}"')
        return exprs

    def _split_table_name(self, name: str, default_schema: str) -> tuple[str, str]:
        """Split schema.table; fall back to the default schema when none is given."""
        parts = name.split(".")
        if len(parts) == 2:
            return parts[0], parts[1].lower()
        return default_schema, name.lower()

    def _table_base(self, name: str) -> str:
        """Return the table name without the schema."""
        return name.split(".")[-1]

    def _format_table(self, name: str, default_schema: str) -> str:
        """Return the quoted schema.table name."""
        schema, table = self._split_table_name(name, default_schema)
        return f'"{schema}"."{table}"'

    def _cast_expr(self, col: str, cast_type: str | None) -> str:
        """Build a column expression with an optional CAST."""
        if col.upper() == "NULL":
            base = "NULL"
        else:
            is_expr = not col.isidentifier() or "->" in col or "#>>" in col or "::" in col or "'" in col
            base = col if is_expr else f'"{col}"'
        if cast_type:
            cast_lower = cast_type.lower()
            if cast_lower in {"bigint", "integer", "numeric", "decimal"}:
                return f"CAST(NULLIF(CAST({base} AS text), '') AS numeric)::{cast_type}"
            if cast_lower == "timestamptz":
                return f"({base})::timestamptz"
            return f"{base}::{cast_type}"
        return base
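To make the SCD2 flow concrete, here is one record hand-traced through `_upsert_scd2_row` (column values invented for illustration):

```python
# Current version in billiards_dwd.dim_table:
#   {"table_id": 101, "table_name": "T1",
#    "scd2_version": 1, "scd2_is_current": 1, "scd2_end_time": 9999-12-31}
incoming = {"table_id": 101, "table_name": "VIP-1"}  # renamed upstream

# 1. _is_row_changed(current, incoming, dwd_cols) -> True (table_name differs)
# 2. _close_current_dim(...)  -> UPDATE ... SET scd2_end_time = now, scd2_is_current = 0
#                                 WHERE table_id = 101 AND scd2_is_current = 1
# 3. _insert_dim_row(...)     -> INSERT a full new row with scd2_version = 2,
#                                 scd2_is_current = 1, scd2_end_time = 9999-12-31
```

An unchanged snapshot short-circuits at step 1 and writes nothing.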
105
etl_billiards/tasks/dwd_quality_task.py
Normal file
@@ -0,0 +1,105 @@
# -*- coding: utf-8 -*-
"""DWD quality-check task: row-count/amount comparison report, per dwd_quality_check.md."""
from __future__ import annotations

import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Tuple

from psycopg2.extras import RealDictCursor

from .base_task import BaseTask, TaskContext
from .dwd_load_task import DwdLoadTask


class DwdQualityTask(BaseTask):
    """Cross-checks row counts and amounts between ODS and DWD and writes a JSON report."""

    REPORT_PATH = Path("etl_billiards/reports/dwd_quality_report.json")
    AMOUNT_KEYWORDS = ("amount", "money", "fee", "balance")

    def get_task_code(self) -> str:
        """Return the task code."""
        return "DWD_QUALITY_CHECK"

    def extract(self, context: TaskContext) -> dict[str, Any]:
        """Prepare the runtime context."""
        return {"now": datetime.now()}

    def load(self, extracted: dict[str, Any], context: TaskContext) -> dict[str, Any]:
        """Write the row-count/amount difference report to a local file."""
        report: Dict[str, Any] = {
            "generated_at": extracted["now"].isoformat(),
            "tables": [],
            "note": "Row-count/amount reconciliation; amount columns are auto-detected as numeric columns whose names contain amount/money/fee/balance.",
        }

        with self.db.conn.cursor(cursor_factory=RealDictCursor) as cur:
            for dwd_table, ods_table in DwdLoadTask.TABLE_MAP.items():
                count_info = self._compare_counts(cur, dwd_table, ods_table)
                amount_info = self._compare_amounts(cur, dwd_table, ods_table)
                report["tables"].append(
                    {
                        "dwd_table": dwd_table,
                        "ods_table": ods_table,
                        "count": count_info,
                        "amounts": amount_info,
                    }
                )

        self.REPORT_PATH.parent.mkdir(parents=True, exist_ok=True)
        self.REPORT_PATH.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8")
        self.logger.info("DWD quality report generated: %s", self.REPORT_PATH)
        return {"report_path": str(self.REPORT_PATH)}

    # ---------------------- helpers ----------------------
    def _compare_counts(self, cur, dwd_table: str, ods_table: str) -> Dict[str, Any]:
        """Count rows on both sides and return the difference."""
        dwd_schema, dwd_name = self._split_table_name(dwd_table, default_schema="billiards_dwd")
        ods_schema, ods_name = self._split_table_name(ods_table, default_schema="billiards_ods")
        cur.execute(f'SELECT COUNT(1) AS cnt FROM "{dwd_schema}"."{dwd_name}"')
        dwd_cnt = cur.fetchone()["cnt"]
        cur.execute(f'SELECT COUNT(1) AS cnt FROM "{ods_schema}"."{ods_name}"')
        ods_cnt = cur.fetchone()["cnt"]
        return {"dwd": dwd_cnt, "ods": ods_cnt, "diff": dwd_cnt - ods_cnt}

    def _compare_amounts(self, cur, dwd_table: str, ods_table: str) -> List[Dict[str, Any]]:
        """Scan the amount-related columns and compare their ODS vs. DWD sums."""
        dwd_schema, dwd_name = self._split_table_name(dwd_table, default_schema="billiards_dwd")
        ods_schema, ods_name = self._split_table_name(ods_table, default_schema="billiards_ods")

        dwd_amount_cols = self._get_numeric_amount_columns(cur, dwd_schema, dwd_name)
        ods_amount_cols = self._get_numeric_amount_columns(cur, ods_schema, ods_name)
        common_amount_cols = sorted(set(dwd_amount_cols) & set(ods_amount_cols))

        results: List[Dict[str, Any]] = []
        for col in common_amount_cols:
            cur.execute(f'SELECT COALESCE(SUM("{col}"),0) AS val FROM "{dwd_schema}"."{dwd_name}"')
            dwd_sum = cur.fetchone()["val"]
            cur.execute(f'SELECT COALESCE(SUM("{col}"),0) AS val FROM "{ods_schema}"."{ods_name}"')
            ods_sum = cur.fetchone()["val"]
            results.append(
                {
                    "column": col,
                    "dwd_sum": float(dwd_sum or 0),
                    "ods_sum": float(ods_sum or 0),
                    "diff": float(dwd_sum or 0) - float(ods_sum or 0),
                }
            )
        return results

    def _get_numeric_amount_columns(self, cur, schema: str, table: str) -> List[str]:
        """Return the numeric columns whose names contain an amount keyword."""
        cur.execute(
            """
            SELECT column_name
            FROM information_schema.columns
            WHERE table_schema = %s
              AND table_name = %s
              AND data_type IN ('numeric','double precision','integer','bigint','smallint','real','decimal')
            """,
            (schema, table),
        )
        cols = [r["column_name"].lower() for r in cur.fetchall()]
        return [c for c in cols if any(key in c for key in self.AMOUNT_KEYWORDS)]

    def _split_table_name(self, name: str, default_schema: str) -> Tuple[str, str]:
        """Split schema and table name; fall back to default_schema."""
        parts = name.split(".")
        if len(parts) == 2:
            return parts[0], parts[1]
        return default_schema, name
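The report is plain JSON, so a consumer (say, a CI gate) can flag regressions in a few lines. A sketch, with an arbitrary tolerance; keep in mind that dimension tables accumulate SCD2 history, so a positive count diff there is not necessarily an error:

```python
import json
from pathlib import Path

report = json.loads(
    Path("etl_billiards/reports/dwd_quality_report.json").read_text(encoding="utf-8")
)
for entry in report["tables"]:
    count_diff = entry["count"]["diff"]
    amount_diffs = [a for a in entry["amounts"] if abs(a["diff"]) > 0.01]
    if count_diff != 0 or amount_diffs:
        print(entry["dwd_table"], "count diff:", count_diff, "amount diffs:", amount_diffs)
```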
36
etl_billiards/tasks/init_dwd_schema_task.py
Normal file
@@ -0,0 +1,36 @@
# -*- coding: utf-8 -*-
"""Initialize the DWD schema: run schema_dwd_doc.sql, optionally dropping the schema first."""
from __future__ import annotations

from pathlib import Path
from typing import Any

from .base_task import BaseTask, TaskContext


class InitDwdSchemaTask(BaseTask):
    """Runs the DWD schema initialization via the scheduler."""

    def get_task_code(self) -> str:
        """Return the task code."""
        return "INIT_DWD_SCHEMA"

    def extract(self, context: TaskContext) -> dict[str, Any]:
        """Read the DWD SQL file and parameters."""
        base_dir = Path(__file__).resolve().parents[1] / "database"
        dwd_path = Path(self.config.get("schema.dwd_file", base_dir / "schema_dwd_doc.sql"))
        if not dwd_path.exists():
            raise FileNotFoundError(f"DWD schema file not found: {dwd_path}")

        drop_first = self.config.get("dwd.drop_schema_first", False)
        return {"dwd_sql": dwd_path.read_text(encoding="utf-8"), "dwd_file": str(dwd_path), "drop_first": drop_first}

    def load(self, extracted: dict[str, Any], context: TaskContext) -> dict:
        """Optionally DROP the schema, then run the DWD DDL."""
        with self.db.conn.cursor() as cur:
            if extracted["drop_first"]:
                cur.execute("DROP SCHEMA IF EXISTS billiards_dwd CASCADE;")
                self.logger.info("Executed DROP SCHEMA billiards_dwd CASCADE")
            self.logger.info("Executing DWD schema file: %s", extracted["dwd_file"])
            cur.execute(extracted["dwd_sql"])
        return {"executed": 1, "files": [extracted["dwd_file"]]}
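Both knobs read by `extract` come from configuration; for a clean rebuild during development something like the following would be set (a sketch; the exact syntax depends on how settings.py maps keys):

```python
# Rebuild billiards_dwd from scratch on the next INIT_DWD_SCHEMA run:
drop_first = config.get("dwd.drop_schema_first", False)  # set True to force DROP SCHEMA ... CASCADE
# Point at an alternative DDL file instead of database/schema_dwd_doc.sql:
dwd_file = config.get("schema.dwd_file", "etl_billiards/database/schema_dwd_doc.sql")
```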
73
etl_billiards/tasks/init_schema_task.py
Normal file
@@ -0,0 +1,73 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""任务:初始化运行环境,执行 ODS 与 etl_admin 的 DDL,并准备日志/导出目录。"""
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from .base_task import BaseTask, TaskContext
|
||||
|
||||
|
||||
class InitOdsSchemaTask(BaseTask):
|
||||
"""通过调度执行初始化:创建必要目录,执行 ODS 与 etl_admin 的 DDL。"""
|
||||
|
||||
def get_task_code(self) -> str:
|
||||
"""返回任务编码。"""
|
||||
return "INIT_ODS_SCHEMA"
|
||||
|
||||
def extract(self, context: TaskContext) -> dict[str, Any]:
|
||||
"""读取 SQL 文件路径,收集需创建的目录。"""
|
||||
base_dir = Path(__file__).resolve().parents[1] / "database"
|
||||
ods_path = Path(self.config.get("schema.ods_file", base_dir / "schema_ODS_doc.sql"))
|
||||
admin_path = Path(self.config.get("schema.etl_admin_file", base_dir / "schema_etl_admin.sql"))
|
||||
if not ods_path.exists():
|
||||
raise FileNotFoundError(f"找不到 ODS schema 文件: {ods_path}")
|
||||
if not admin_path.exists():
|
||||
raise FileNotFoundError(f"找不到 etl_admin schema 文件: {admin_path}")
|
||||
|
||||
log_root = Path(self.config.get("io.log_root") or self.config["io"]["log_root"])
|
||||
export_root = Path(self.config.get("io.export_root") or self.config["io"]["export_root"])
|
||||
fetch_root = Path(self.config.get("pipeline.fetch_root") or self.config["pipeline"]["fetch_root"])
|
||||
ingest_dir = Path(self.config.get("pipeline.ingest_source_dir") or fetch_root)
|
||||
|
||||
return {
|
||||
"ods_sql": ods_path.read_text(encoding="utf-8"),
|
||||
"admin_sql": admin_path.read_text(encoding="utf-8"),
|
||||
"ods_file": str(ods_path),
|
||||
"admin_file": str(admin_path),
|
||||
"dirs": [log_root, export_root, fetch_root, ingest_dir],
|
||||
}
|
||||
|
||||
def load(self, extracted: dict[str, Any], context: TaskContext) -> dict:
|
||||
"""执行 DDL 并创建必要目录。
|
||||
|
||||
安全提示:
|
||||
ODS DDL 文件可能携带头部说明或异常注释,为避免因非 SQL 文本导致执行失败,这里会做一次轻量清洗后再执行。
|
||||
"""
|
||||
for d in extracted["dirs"]:
|
||||
Path(d).mkdir(parents=True, exist_ok=True)
|
||||
self.logger.info("已确保目录存在: %s", d)
|
||||
|
||||
# 处理 ODS SQL:去掉头部说明行,以及易出错的 COMMENT ON 行(如 CamelCase 未加引号)
|
||||
ods_sql_raw: str = extracted["ods_sql"]
|
||||
drop_idx = ods_sql_raw.find("DROP SCHEMA")
|
||||
if drop_idx > 0:
|
||||
ods_sql_raw = ods_sql_raw[drop_idx:]
|
||||
cleaned_lines: list[str] = []
|
||||
for line in ods_sql_raw.splitlines():
|
||||
if line.strip().upper().startswith("COMMENT ON "):
|
||||
continue
|
||||
cleaned_lines.append(line)
|
||||
ods_sql = "\n".join(cleaned_lines)
|
||||
|
||||
with self.db.conn.cursor() as cur:
|
||||
self.logger.info("执行 etl_admin schema 文件: %s", extracted["admin_file"])
|
||||
cur.execute(extracted["admin_sql"])
|
||||
self.logger.info("执行 ODS schema 文件: %s", extracted["ods_file"])
|
||||
cur.execute(ods_sql)
|
||||
|
||||
return {
|
||||
"executed": 2,
|
||||
"files": [extracted["admin_file"], extracted["ods_file"]],
|
||||
"dirs_prepared": [str(p) for p in extracted["dirs"]],
|
||||
}
|
||||
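The light SQL cleaning in load() can be exercised in isolation. A short sketch with a made-up input showing the round trip: the preamble before DROP SCHEMA is cut and COMMENT ON lines are filtered out.

# Illustrative round trip for the ODS SQL cleaning logic (input is invented).
raw_sql = """-- generated header, not valid SQL
DROP SCHEMA IF EXISTS billiards_ods CASCADE;
CREATE SCHEMA billiards_ods;
COMMENT ON SCHEMA billiards_ods IS 'raw landing zone';
CREATE TABLE billiards_ods.member_profiles (id bigint PRIMARY KEY);
"""

start = raw_sql.find("DROP SCHEMA")
body = raw_sql[start:] if start > 0 else raw_sql
cleaned = "\n".join(
    line for line in body.splitlines()
    if not line.strip().upper().startswith("COMMENT ON ")
)
# 'cleaned' now starts at DROP SCHEMA and carries no COMMENT ON lines.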
90
etl_billiards/tasks/inventory_change_task.py
Normal file
@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
"""Inventory change task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.inventory_change import InventoryChangeLoader
from models.parsers import TypeParser


class InventoryChangeTask(BaseTask):
    """Sync inventory change records"""

    def get_task_code(self) -> str:
        return "INVENTORY_CHANGE"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/GoodsStockManage/QueryGoodsOutboundReceipt",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="queryDeliveryRecordsList",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_change(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = InventoryChangeLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_changes(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_change(self, raw: dict, store_id: int) -> dict | None:
        change_id = TypeParser.parse_int(
            raw.get("siteGoodsStockId") or raw.get("site_goods_stock_id")
        )
        if not change_id:
            self.logger.warning("Skipping record without an inventory change ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "change_id": change_id,
            "site_goods_id": TypeParser.parse_int(
                raw.get("siteGoodsId") or raw.get("site_goods_id")
            ),
            "stock_type": raw.get("stockType") or raw.get("stock_type"),
            "goods_name": raw.get("goodsName"),
            "change_time": TypeParser.parse_timestamp(
                raw.get("createTime") or raw.get("create_time"), self.tz
            ),
            "start_qty": TypeParser.parse_int(raw.get("startNum")),
            "end_qty": TypeParser.parse_int(raw.get("endNum")),
            "change_qty": TypeParser.parse_int(raw.get("changeNum")),
            "unit": raw.get("unit"),
            "price": TypeParser.parse_decimal(raw.get("price")),
            "operator_name": raw.get("operatorName"),
            "remark": raw.get("remark"),
            "goods_category_id": TypeParser.parse_int(raw.get("goodsCategoryId")),
            "goods_second_category_id": TypeParser.parse_int(
                raw.get("goodsSecondCategoryId")
            ),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
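Every pull task in this diff leans on TypeParser (models/parsers.py, not shown here) for tolerant coercion of upstream values. The sketch below is a hedged reconstruction of the contract the call sites assume; only the method names come from usage in this diff, and the bodies (accepted formats, fallback behavior) are assumptions, not the repo's implementation.

# Hedged sketch of the TypeParser contract assumed by the tasks above.
from datetime import datetime
from decimal import Decimal, InvalidOperation


class TypeParserSketch:
    @staticmethod
    def parse_int(value) -> int | None:
        # Tolerate None/"" and non-numeric junk by returning None.
        try:
            return int(value) if value not in (None, "") else None
        except (TypeError, ValueError):
            return None

    @staticmethod
    def parse_decimal(value) -> Decimal | None:
        try:
            return Decimal(str(value)) if value not in (None, "") else None
        except (InvalidOperation, ValueError):
            return None

    @staticmethod
    def parse_timestamp(value, tz) -> datetime | None:
        # Assumed upstream format; the real parser may accept more shapes.
        if not value:
            return None
        try:
            return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
        except (TypeError, ValueError):
            return None

    @staticmethod
    def format_timestamp(dt: datetime, tz) -> str:
        return dt.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")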
115
etl_billiards/tasks/ledger_task.py
Normal file
@@ -0,0 +1,115 @@
# -*- coding: utf-8 -*-
"""Assistant ledger task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.assistant_ledger import AssistantLedgerLoader
from models.parsers import TypeParser


class LedgerTask(BaseTask):
    """Sync assistant service ledgers"""

    def get_task_code(self) -> str:
        return "LEDGER"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/AssistantPerformance/GetOrderAssistantDetails",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="orderAssistantDetails",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_ledger(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = AssistantLedgerLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_ledgers(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_ledger(self, raw: dict, store_id: int) -> dict | None:
        ledger_id = TypeParser.parse_int(raw.get("id"))
        if not ledger_id:
            self.logger.warning("Skipping record without an assistant ledger ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "ledger_id": ledger_id,
            "assistant_no": raw.get("assistantNo"),
            "assistant_name": raw.get("assistantName"),
            "nickname": raw.get("nickname"),
            "level_name": raw.get("levelName"),
            "table_name": raw.get("tableName"),
            "ledger_unit_price": TypeParser.parse_decimal(raw.get("ledger_unit_price")),
            "ledger_count": TypeParser.parse_int(raw.get("ledger_count")),
            "ledger_amount": TypeParser.parse_decimal(raw.get("ledger_amount")),
            "projected_income": TypeParser.parse_decimal(raw.get("projected_income")),
            "service_money": TypeParser.parse_decimal(raw.get("service_money")),
            "member_discount_amount": TypeParser.parse_decimal(
                raw.get("member_discount_amount")
            ),
            "manual_discount_amount": TypeParser.parse_decimal(
                raw.get("manual_discount_amount")
            ),
            "coupon_deduct_money": TypeParser.parse_decimal(
                raw.get("coupon_deduct_money")
            ),
            "order_trade_no": TypeParser.parse_int(raw.get("order_trade_no")),
            "order_settle_id": TypeParser.parse_int(raw.get("order_settle_id")),
            "operator_id": TypeParser.parse_int(raw.get("operator_id")),
            "operator_name": raw.get("operator_name"),
            "assistant_team_id": TypeParser.parse_int(raw.get("assistant_team_id")),
            "assistant_level": raw.get("assistant_level"),
            "site_table_id": TypeParser.parse_int(raw.get("site_table_id")),
            "order_assistant_id": TypeParser.parse_int(raw.get("order_assistant_id")),
            "site_assistant_id": TypeParser.parse_int(raw.get("site_assistant_id")),
            "user_id": TypeParser.parse_int(raw.get("user_id")),
            "ledger_start_time": TypeParser.parse_timestamp(
                raw.get("ledger_start_time"), self.tz
            ),
            "ledger_end_time": TypeParser.parse_timestamp(
                raw.get("ledger_end_time"), self.tz
            ),
            "start_use_time": TypeParser.parse_timestamp(raw.get("start_use_time"), self.tz),
            "last_use_time": TypeParser.parse_timestamp(raw.get("last_use_time"), self.tz),
            "income_seconds": TypeParser.parse_int(raw.get("income_seconds")),
            "real_use_seconds": TypeParser.parse_int(raw.get("real_use_seconds")),
            "is_trash": raw.get("is_trash"),
            "trash_reason": raw.get("trash_reason"),
            "is_confirm": raw.get("is_confirm"),
            "ledger_status": raw.get("ledger_status"),
            "create_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
347
etl_billiards/tasks/manual_ingest_task.py
Normal file
@@ -0,0 +1,347 @@
# -*- coding: utf-8 -*-
"""Manual sample-data ingest: write into ODS following the table structures in schema_ODS_doc.sql."""
from __future__ import annotations

import json
import os
from datetime import datetime
from typing import Any, Iterable

from psycopg2.extras import Json

from .base_task import BaseTask


class ManualIngestTask(BaseTask):
    """Load local sample JSON into ODS, keeping table names / primary keys / insert columns aligned with schema_ODS_doc.sql."""

    FILE_MAPPING: list[tuple[tuple[str, ...], str]] = [
        (("member_profiles",), "billiards_ods.member_profiles"),
        (("member_balance_changes",), "billiards_ods.member_balance_changes"),
        (("member_stored_value_cards",), "billiards_ods.member_stored_value_cards"),
        (("recharge_settlements",), "billiards_ods.recharge_settlements"),
        (("settlement_records",), "billiards_ods.settlement_records"),
        (("assistant_cancellation_records",), "billiards_ods.assistant_cancellation_records"),
        (("assistant_accounts_master",), "billiards_ods.assistant_accounts_master"),
        (("assistant_service_records",), "billiards_ods.assistant_service_records"),
        (("site_tables_master",), "billiards_ods.site_tables_master"),
        (("table_fee_discount_records",), "billiards_ods.table_fee_discount_records"),
        (("table_fee_transactions",), "billiards_ods.table_fee_transactions"),
        (("goods_stock_movements",), "billiards_ods.goods_stock_movements"),
        (("stock_goods_category_tree",), "billiards_ods.stock_goods_category_tree"),
        (("goods_stock_summary",), "billiards_ods.goods_stock_summary"),
        (("payment_transactions",), "billiards_ods.payment_transactions"),
        (("refund_transactions",), "billiards_ods.refund_transactions"),
        (("platform_coupon_redemption_records",), "billiards_ods.platform_coupon_redemption_records"),
        (("group_buy_redemption_records",), "billiards_ods.group_buy_redemption_records"),
        (("group_buy_packages",), "billiards_ods.group_buy_packages"),
        (("settlement_ticket_details",), "billiards_ods.settlement_ticket_details"),
        (("store_goods_master",), "billiards_ods.store_goods_master"),
        (("tenant_goods_master",), "billiards_ods.tenant_goods_master"),
        (("store_goods_sales_records",), "billiards_ods.store_goods_sales_records"),
    ]

    TABLE_SPECS: dict[str, dict[str, Any]] = {
        "billiards_ods.member_profiles": {"pk": "id"},
        "billiards_ods.member_balance_changes": {"pk": "id"},
        "billiards_ods.member_stored_value_cards": {"pk": "id"},
        "billiards_ods.recharge_settlements": {"pk": "id"},
        "billiards_ods.settlement_records": {"pk": "id"},
        "billiards_ods.assistant_cancellation_records": {"pk": "id", "json_cols": ["siteProfile"]},
        "billiards_ods.assistant_accounts_master": {"pk": "id"},
        "billiards_ods.assistant_service_records": {"pk": "id", "json_cols": ["siteProfile"]},
        "billiards_ods.site_tables_master": {"pk": "id"},
        "billiards_ods.table_fee_discount_records": {"pk": "id", "json_cols": ["siteProfile", "tableProfile"]},
        "billiards_ods.table_fee_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
        "billiards_ods.goods_stock_movements": {"pk": "siteGoodsStockId"},
        "billiards_ods.stock_goods_category_tree": {"pk": "id", "json_cols": ["categoryBoxes"]},
        "billiards_ods.goods_stock_summary": {"pk": "siteGoodsId"},
        "billiards_ods.payment_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
        "billiards_ods.refund_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
        "billiards_ods.platform_coupon_redemption_records": {"pk": "id"},
        "billiards_ods.tenant_goods_master": {"pk": "id"},
        "billiards_ods.group_buy_packages": {"pk": "id"},
        "billiards_ods.group_buy_redemption_records": {"pk": "id"},
        "billiards_ods.settlement_ticket_details": {
            "pk": "orderSettleId",
            "json_cols": ["memberProfile", "orderItem", "tenantMemberCardLogs"],
        },
        "billiards_ods.store_goods_master": {"pk": "id"},
        "billiards_ods.store_goods_sales_records": {"pk": "id"},
    }

    def get_task_code(self) -> str:
        """Return the task code."""
        return "MANUAL_INGEST"

    def execute(self, cursor_data: dict | None = None) -> dict:
        """Read JSON files from the directory and load them table by table."""
        data_dir = (
            self.config.get("manual.data_dir")
            or self.config.get("pipeline.ingest_source_dir")
            or r"c:\dev\LLTQ\ETL\feiqiu-ETL\etl_billiards\tests\testdata_json"
        )
        if not os.path.exists(data_dir):
            self.logger.error("Data directory not found: %s", data_dir)
            return {"status": "error", "message": "Directory not found"}

        counts = {"fetched": 0, "inserted": 0, "updated": 0, "skipped": 0, "errors": 0}

        for filename in sorted(os.listdir(data_dir)):
            if not filename.endswith(".json"):
                continue
            filepath = os.path.join(data_dir, filename)
            try:
                with open(filepath, "r", encoding="utf-8") as fh:
                    raw_entries = json.load(fh)
            except Exception:
                counts["errors"] += 1
                self.logger.exception("Failed to read %s", filename)
                continue

            entries = raw_entries if isinstance(raw_entries, list) else [raw_entries]
            records = self._extract_records(entries)
            if not records:
                counts["skipped"] += 1
                continue

            target_table = self._match_by_filename(filename)
            if not target_table:
                self.logger.warning("No mapping found for file: %s", filename)
                counts["skipped"] += 1
                continue

            self.logger.info("Ingesting %s into %s", filename, target_table)
            try:
                inserted, updated = self._ingest_table(target_table, records, filename)
                counts["inserted"] += inserted
                counts["updated"] += updated
                counts["fetched"] += len(records)
            except Exception:
                counts["errors"] += 1
                self.logger.exception("Error processing %s", filename)
                self.db.rollback()
                continue

        try:
            self.db.commit()
        except Exception:
            self.db.rollback()
            raise

        return {"status": "SUCCESS", "counts": counts}

    def _match_by_filename(self, filename: str) -> str | None:
        """Match the target table by filename keyword."""
        for keywords, table in self.FILE_MAPPING:
            if any(keyword and keyword in filename for keyword in keywords):
                return table
        return None

    def _extract_records(self, raw_entries: Iterable[Any]) -> list[dict]:
        """Unwrap nested data/list envelopes and extract the record list."""
        records: list[dict] = []
        for entry in raw_entries:
            if isinstance(entry, dict):
                preferred = entry
                # Unwrap only when the envelope carries nothing but data/code keys.
                if "data" in entry and not any(k not in {"data", "code"} for k in entry.keys()):
                    preferred = entry["data"]
                data = preferred
                if isinstance(data, dict):
                    # Special-case settleList (recharge/settlement records): flatten the
                    # settleList nested under data.settleList, dropping the outer siteProfile.
                    if "settleList" in data:
                        settle_list_val = data.get("settleList")
                        if isinstance(settle_list_val, dict):
                            settle_list_iter = [settle_list_val]
                        elif isinstance(settle_list_val, list):
                            settle_list_iter = settle_list_val
                        else:
                            settle_list_iter = []

                        handled = False
                        for item in settle_list_iter or []:
                            if not isinstance(item, dict):
                                continue
                            inner = item.get("settleList")
                            merged = dict(inner) if isinstance(inner, dict) else dict(item)
                            # Keep siteProfile for later field enrichment; it is not persisted itself.
                            site_profile = data.get("siteProfile")
                            if isinstance(site_profile, dict):
                                merged.setdefault("siteProfile", site_profile)
                            records.append(merged)
                            handled = True
                        if handled:
                            continue

                    list_used = False
                    for v in data.values():
                        if isinstance(v, list) and v and isinstance(v[0], dict):
                            records.extend(v)
                            list_used = True
                            break
                    if list_used:
                        continue
                if isinstance(data, list) and data and isinstance(data[0], dict):
                    records.extend(data)
                elif isinstance(data, dict):
                    records.append(data)
            elif isinstance(entry, list):
                records.extend([item for item in entry if isinstance(item, dict)])
        return records

    def _get_table_columns(self, table: str) -> list[tuple[str, str, str]]:
        """Query information_schema for the target table's column metadata."""
        cache = getattr(self, "_table_columns_cache", {})
        if table in cache:
            return cache[table]
        if "." in table:
            schema, name = table.split(".", 1)
        else:
            schema, name = "public", table
        sql = """
            SELECT column_name, data_type, udt_name
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s
            ORDER BY ordinal_position
        """
        with self.db.conn.cursor() as cur:
            cur.execute(sql, (schema, name))
            cols = [(r[0], (r[1] or "").lower(), (r[2] or "").lower()) for r in cur.fetchall()]
        cache[table] = cols
        self._table_columns_cache = cache
        return cols

    def _ingest_table(self, table: str, records: list[dict], source_file: str) -> tuple[int, int]:
        """Build the INSERT/ON CONFLICT statement and execute it row by row."""
        spec = self.TABLE_SPECS.get(table)
        if not spec:
            raise ValueError(f"No table spec for {table}")

        pk_col = spec.get("pk")
        json_cols = set(spec.get("json_cols", []))
        json_cols_lower = {c.lower() for c in json_cols}

        columns_info = self._get_table_columns(table)
        columns = [c[0] for c in columns_info]
        db_json_cols_lower = {
            c[0].lower() for c in columns_info if c[1] in ("json", "jsonb") or c[2] in ("json", "jsonb")
        }
        pk_col_db = None
        if pk_col:
            pk_col_db = next((c for c in columns if c.lower() == pk_col.lower()), pk_col)

        placeholders = ", ".join(["%s"] * len(columns))
        col_list = ", ".join(f'"{c}"' for c in columns)
        sql = f'INSERT INTO {table} ({col_list}) VALUES ({placeholders})'
        if pk_col_db:
            update_cols = [c for c in columns if c != pk_col_db]
            set_clause = ", ".join(f'"{c}"=EXCLUDED."{c}"' for c in update_cols)
            sql += f' ON CONFLICT ("{pk_col_db}") DO UPDATE SET {set_clause}'
        sql += " RETURNING (xmax = 0) AS inserted"

        params = []
        now = datetime.now()
        json_dump = lambda v: json.dumps(v, ensure_ascii=False)  # noqa: E731
        for rec in records:
            merged_rec = rec if isinstance(rec, dict) else {}
            data_part = merged_rec.get("data")
            while isinstance(data_part, dict):
                merged_rec = {**data_part, **merged_rec}
                data_part = data_part.get("data")

            # For recharge/settlement rows, backfill store fields from siteProfile.
            if table in {
                "billiards_ods.recharge_settlements",
                "billiards_ods.settlement_records",
            }:
                site_profile = merged_rec.get("siteProfile") or merged_rec.get("site_profile")
                if isinstance(site_profile, dict):
                    merged_rec.setdefault("tenantid", site_profile.get("tenant_id") or site_profile.get("tenantId"))
                    merged_rec.setdefault("siteid", site_profile.get("id") or site_profile.get("siteId"))
                    merged_rec.setdefault("sitename", site_profile.get("shop_name") or site_profile.get("siteName"))

            pk_val = self._get_value_case_insensitive(merged_rec, pk_col) if pk_col else None
            if pk_col and (pk_val is None or pk_val == ""):
                continue

            row_vals = []
            for col_name, data_type, udt in columns_info:
                col_lower = col_name.lower()
                if col_lower == "payload":
                    row_vals.append(Json(rec, dumps=json_dump))
                    continue
                if col_lower == "source_file":
                    row_vals.append(source_file)
                    continue
                if col_lower == "fetched_at":
                    row_vals.append(merged_rec.get(col_name, now))
                    continue

                value = self._normalize_scalar(self._get_value_case_insensitive(merged_rec, col_name))

                if col_lower in json_cols_lower or col_lower in db_json_cols_lower:
                    row_vals.append(Json(value, dumps=json_dump) if value is not None else None)
                    continue

                casted = self._cast_value(value, data_type)
                row_vals.append(casted)
            params.append(tuple(row_vals))

        if not params:
            return 0, 0

        inserted = 0
        updated = 0
        with self.db.conn.cursor() as cur:
            for row in params:
                cur.execute(sql, row)
                flag = cur.fetchone()[0]
                if flag:
                    inserted += 1
                else:
                    updated += 1
        return inserted, updated

    @staticmethod
    def _get_value_case_insensitive(record: dict, col: str | None):
        """Case-insensitive lookup, bridging information_schema column names and raw JSON keys."""
        if record is None or col is None:
            return None
        if col in record:
            return record.get(col)
        col_lower = col.lower()
        for k, v in record.items():
            if isinstance(k, str) and k.lower() == col_lower:
                return v
        return None

    @staticmethod
    def _normalize_scalar(value):
        """Normalize empty strings / empty JSON to None to avoid type-cast errors."""
        if value == "" or value == "{}" or value == "[]":
            return None
        return value

    @staticmethod
    def _cast_value(value, data_type: str):
        """Apply simple per-column-type conversions so batch inserts stay compatible."""
        if value is None:
            return None
        dt = (data_type or "").lower()
        if dt in ("integer", "bigint", "smallint"):
            if isinstance(value, bool):
                return int(value)
            try:
                return int(value)
            except Exception:
                return None
        if dt in ("numeric", "double precision", "real", "decimal"):
            if isinstance(value, bool):
                return int(value)
            try:
                return float(value)
            except Exception:
                return None
        if dt.startswith("timestamp") or dt in ("date", "time", "interval"):
            return value if isinstance(value, str) else None
        return value
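The per-row RETURNING (xmax = 0) AS inserted in _ingest_table is a PostgreSQL idiom for telling fresh inserts apart from conflict-updates: xmax is 0 on a newly inserted row version and non-zero on one rewritten by ON CONFLICT ... DO UPDATE. A minimal sketch of the same counting loop against an illustrative table (demo_kv is invented for the example):

# Counting inserts vs. updates with the xmax idiom (illustrative table name).
import psycopg2

UPSERT = """
INSERT INTO demo_kv (k, v) VALUES (%s, %s)
ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v
RETURNING (xmax = 0) AS inserted
"""


def upsert_rows(conn, rows):
    inserted = updated = 0
    with conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS demo_kv (k text PRIMARY KEY, v text)")
        for k, v in rows:
            cur.execute(UPSERT, (k, v))
            if cur.fetchone()[0]:   # True => brand-new row version
                inserted += 1
            else:                   # False => rewritten by DO UPDATE
                updated += 1
    conn.commit()
    return inserted, updated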
90
etl_billiards/tasks/members_dwd_task.py
Normal file
@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.dimensions.member import MemberLoader
from models.parsers import TypeParser
import json


class MembersDwdTask(BaseDwdTask):
    """
    DWD Task: Process Member Records from ODS to Dimension Table

    Source: billiards_ods.member_profiles
    Target: billiards.dim_member
    """

    def get_task_code(self) -> str:
        return "MEMBERS_DWD"

    def execute(self) -> dict:
        self.logger.info(f"Starting {self.get_task_code()} task")

        window_start, window_end, _ = self._get_time_window()
        self.logger.info(f"Processing window: {window_start} to {window_end}")

        loader = MemberLoader(self.db)
        store_id = self.config.get("app.store_id")

        total_inserted = 0
        total_updated = 0

        # Iterate ODS data
        batches = self.iter_ods_rows(
            table_name="billiards_ods.member_profiles",
            columns=["site_id", "member_id", "payload", "fetched_at"],
            start_time=window_start,
            end_time=window_end
        )

        for batch in batches:
            if not batch:
                continue

            parsed_rows = []
            for row in batch:
                payload = self.parse_payload(row)
                if not payload:
                    continue

                parsed = self._parse_member(payload, store_id)
                if parsed:
                    parsed_rows.append(parsed)

            if parsed_rows:
                inserted, updated, skipped = loader.upsert_members(parsed_rows, store_id)
                total_inserted += inserted
                total_updated += updated

        self.db.commit()

        self.logger.info(f"Task {self.get_task_code()} completed. Inserted: {total_inserted}, Updated: {total_updated}")

        return {
            "status": "success",
            "inserted": total_inserted,
            "updated": total_updated,
            "window_start": window_start.isoformat(),
            "window_end": window_end.isoformat()
        }

    def _parse_member(self, raw: dict, store_id: int) -> dict | None:
        """Parse an ODS payload into the dimension structure."""
        try:
            # Handle both the API structure (camelCase) and the manual-ingest structure
            member_id = raw.get("id") or raw.get("memberId")
            if not member_id:
                return None

            return {
                "store_id": store_id,
                "member_id": member_id,
                "member_name": raw.get("name") or raw.get("memberName"),
                "phone": raw.get("phone") or raw.get("mobile"),
                "balance": raw.get("balance", 0),
                "status": str(raw.get("status", "NORMAL")),
                "register_time": raw.get("createTime") or raw.get("registerTime"),
                "raw_data": json.dumps(raw, ensure_ascii=False)
            }
        except Exception as e:
            self.logger.warning(f"Error parsing member: {e}")
            return None
72
etl_billiards/tasks/members_task.py
Normal file
@@ -0,0 +1,72 @@
# -*- coding: utf-8 -*-
"""Member ETL task"""
import json

from .base_task import BaseTask, TaskContext
from loaders.dimensions.member import MemberLoader
from models.parsers import TypeParser


class MembersTask(BaseTask):
    """Member ETL task"""

    def get_task_code(self) -> str:
        return "MEMBERS"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params({"siteId": context.store_id})
        records, _ = self.api.get_paginated(
            endpoint="/MemberProfile/GetTenantMemberList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="tenantMemberInfos",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            parsed_row = self._parse_member(raw, context.store_id)
            if parsed_row:
                parsed.append(parsed_row)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = MemberLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_members(
            transformed["records"], context.store_id
        )
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_member(self, raw: dict, store_id: int) -> dict | None:
        """Parse a member record"""
        try:
            member_id = TypeParser.parse_int(raw.get("memberId"))
            if not member_id:
                return None
            return {
                "store_id": store_id,
                "member_id": member_id,
                "member_name": raw.get("memberName"),
                "phone": raw.get("phone"),
                "balance": TypeParser.parse_decimal(raw.get("balance")),
                "status": raw.get("status"),
                "register_time": TypeParser.parse_timestamp(raw.get("registerTime"), self.tz),
                "raw_data": json.dumps(raw, ensure_ascii=False),
            }
        except Exception as exc:
            self.logger.warning("Failed to parse member record: %s, raw data: %s", exc, raw)
            return None
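All pull tasks share the api.get_paginated contract: an endpoint, merged params, a page_size, a data_path into the response envelope, and an optional list_key naming the record array; the call returns (records, meta). The client itself lives in api/client.py, outside this excerpt. Below is a hedged sketch of the loop those parameters imply; the page-parameter names ("pageIndex"/"pageSize") and the short-page stop condition are assumptions, not confirmed by this diff.

# Hedged sketch of the pagination contract used by the tasks above.
import requests


def get_paginated_sketch(base_url, endpoint, params, page_size=200,
                         data_path=("data",), list_key=None):
    records, page, meta = [], 1, []
    while True:
        query = {**params, "pageIndex": page, "pageSize": page_size}  # assumed names
        payload = requests.get(base_url + endpoint, params=query, timeout=30).json()
        node = payload
        for key in data_path:              # walk e.g. payload["data"]
            node = (node or {}).get(key)
        batch = (node or {}).get(list_key) if list_key else node
        batch = batch or []
        records.extend(batch)
        meta.append({"page": page, "count": len(batch)})
        if len(batch) < page_size:         # a short page means we are done
            break
        page += 1
    return records, meta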
1029
etl_billiards/tasks/ods_tasks.py
Normal file
File diff suppressed because it is too large
91
etl_billiards/tasks/orders_task.py
Normal file
@@ -0,0 +1,91 @@
# -*- coding: utf-8 -*-
"""Order ETL task"""
import json

from .base_task import BaseTask, TaskContext
from loaders.facts.order import OrderLoader
from models.parsers import TypeParser


class OrdersTask(BaseTask):
    """Order data ETL task"""

    def get_task_code(self) -> str:
        return "ORDERS"

    # ------------------------------------------------------------------ E/T/L hooks
    def extract(self, context: TaskContext) -> dict:
        """Pull order records from the API"""
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "rangeStartTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "rangeEndTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, pages_meta = self.api.get_paginated(
            endpoint="/Site/GetAllOrderSettleList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="settleList",
        )
        return {"records": records, "meta": pages_meta}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        """Parse the raw order JSON"""
        parsed_records = []
        skipped = 0

        for rec in extracted.get("records", []):
            parsed = self._parse_order(rec, context.store_id)
            if parsed:
                parsed_records.append(parsed)
            else:
                skipped += 1

        return {
            "records": parsed_records,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        """Write to fact_order"""
        loader = OrderLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_orders(
            transformed["records"], context.store_id
        )

        counts = {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }
        return counts

    # ------------------------------------------------------------------ helpers
    def _parse_order(self, raw: dict, store_id: int) -> dict | None:
        """Parse a single order record"""
        try:
            return {
                "store_id": store_id,
                "order_id": TypeParser.parse_int(raw.get("orderId")),
                "order_no": raw.get("orderNo"),
                "member_id": TypeParser.parse_int(raw.get("memberId")),
                "table_id": TypeParser.parse_int(raw.get("tableId")),
                "order_time": TypeParser.parse_timestamp(raw.get("orderTime"), self.tz),
                "end_time": TypeParser.parse_timestamp(raw.get("endTime"), self.tz),
                "total_amount": TypeParser.parse_decimal(raw.get("totalAmount")),
                "discount_amount": TypeParser.parse_decimal(raw.get("discountAmount")),
                "final_amount": TypeParser.parse_decimal(raw.get("finalAmount")),
                "pay_status": raw.get("payStatus"),
                "order_status": raw.get("orderStatus"),
                "remark": raw.get("remark"),
                "raw_data": json.dumps(raw, ensure_ascii=False),
            }
        except Exception as exc:
            self.logger.warning("Failed to parse order: %s, raw data: %s", exc, raw)
            return None
90
etl_billiards/tasks/packages_task.py
Normal file
@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
"""Group-buy / package definition task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.dimensions.package import PackageDefinitionLoader
from models.parsers import TypeParser


class PackagesDefTask(BaseTask):
    """Sync group-buy package definitions"""

    def get_task_code(self) -> str:
        return "PACKAGES_DEF"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params({"siteId": context.store_id})
        records, _ = self.api.get_paginated(
            endpoint="/PackageCoupon/QueryPackageCouponList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="packageCouponList",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_package(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = PackageDefinitionLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_packages(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_package(self, raw: dict, store_id: int) -> dict | None:
        package_id = TypeParser.parse_int(raw.get("id"))
        if not package_id:
            self.logger.warning("Skipping package record without a package id: %s", raw)
            return None

        return {
            "store_id": store_id,
            "package_id": package_id,
            "package_code": raw.get("package_id") or raw.get("packageId"),
            "package_name": raw.get("package_name"),
            "table_area_id": raw.get("table_area_id"),
            "table_area_name": raw.get("table_area_name"),
            "selling_price": TypeParser.parse_decimal(
                raw.get("selling_price") or raw.get("sellingPrice")
            ),
            "duration_seconds": TypeParser.parse_int(raw.get("duration")),
            "start_time": TypeParser.parse_timestamp(
                raw.get("start_time") or raw.get("startTime"), self.tz
            ),
            "end_time": TypeParser.parse_timestamp(
                raw.get("end_time") or raw.get("endTime"), self.tz
            ),
            "type": raw.get("type"),
            "is_enabled": raw.get("is_enabled"),
            "is_delete": raw.get("is_delete"),
            "usable_count": TypeParser.parse_int(raw.get("usable_count")),
            "creator_name": raw.get("creator_name"),
            "date_type": raw.get("date_type"),
            "group_type": raw.get("group_type"),
            "coupon_money": TypeParser.parse_decimal(
                raw.get("coupon_money") or raw.get("couponMoney")
            ),
            "area_tag_type": raw.get("area_tag_type"),
            "system_group_type": raw.get("system_group_type"),
            "card_type_ids": raw.get("card_type_ids"),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
139
etl_billiards/tasks/payments_dwd_task.py
Normal file
@@ -0,0 +1,139 @@
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.facts.payment import PaymentLoader
from models.parsers import TypeParser
import json


class PaymentsDwdTask(BaseDwdTask):
    """
    DWD Task: Process Payment Records from ODS to Fact Table

    Source: billiards_ods.payment_transactions
    Target: billiards.fact_payment
    """

    def get_task_code(self) -> str:
        return "PAYMENTS_DWD"

    def execute(self) -> dict:
        self.logger.info(f"Starting {self.get_task_code()} task")

        window_start, window_end, _ = self._get_time_window()
        self.logger.info(f"Processing window: {window_start} to {window_end}")

        loader = PaymentLoader(self.db, logger=self.logger)
        store_id = self.config.get("app.store_id")

        total_inserted = 0
        total_updated = 0
        total_skipped = 0

        # Iterate ODS data
        batches = self.iter_ods_rows(
            table_name="billiards_ods.payment_transactions",
            columns=["site_id", "pay_id", "payload", "fetched_at"],
            start_time=window_start,
            end_time=window_end
        )

        for batch in batches:
            if not batch:
                continue

            parsed_rows = []
            for row in batch:
                payload = self.parse_payload(row)
                if not payload:
                    continue

                parsed = self._parse_payment(payload, store_id)
                if parsed:
                    parsed_rows.append(parsed)

            if parsed_rows:
                inserted, updated, skipped = loader.upsert_payments(parsed_rows, store_id)
                total_inserted += inserted
                total_updated += updated
                total_skipped += skipped

        self.db.commit()

        self.logger.info(
            "Task %s completed. inserted=%s updated=%s skipped=%s",
            self.get_task_code(),
            total_inserted,
            total_updated,
            total_skipped,
        )

        return {
            "status": "SUCCESS",
            "counts": {
                "inserted": total_inserted,
                "updated": total_updated,
                "skipped": total_skipped,
            },
            "window_start": window_start,
            "window_end": window_end,
        }

    def _parse_payment(self, raw: dict, store_id: int) -> dict | None:
        """Parse an ODS payload into the fact structure"""
        try:
            pay_id = TypeParser.parse_int(raw.get("payId") or raw.get("id"))
            if not pay_id:
                return None

            relate_type = str(raw.get("relateType") or raw.get("relate_type") or "")
            relate_id = TypeParser.parse_int(raw.get("relateId") or raw.get("relate_id"))

            # Attempt to populate settlement / trade identifiers
            order_settle_id = TypeParser.parse_int(
                raw.get("orderSettleId") or raw.get("order_settle_id")
            )
            order_trade_no = TypeParser.parse_int(
                raw.get("orderTradeNo") or raw.get("order_trade_no")
            )

            if relate_type in {"1", "SETTLE", "ORDER"}:
                order_settle_id = order_settle_id or relate_id

            return {
                "store_id": store_id,
                "pay_id": pay_id,
                "order_id": TypeParser.parse_int(raw.get("orderId") or raw.get("order_id")),
                "order_settle_id": order_settle_id,
                "order_trade_no": order_trade_no,
                "relate_type": relate_type,
                "relate_id": relate_id,
                "site_id": TypeParser.parse_int(
                    raw.get("siteId") or raw.get("site_id") or store_id
                ),
                "tenant_id": TypeParser.parse_int(raw.get("tenantId") or raw.get("tenant_id")),
                "create_time": TypeParser.parse_timestamp(
                    raw.get("createTime") or raw.get("create_time"), self.tz
                ),
                "pay_time": TypeParser.parse_timestamp(raw.get("payTime"), self.tz),
                "pay_amount": TypeParser.parse_decimal(raw.get("payAmount")),
                "fee_amount": TypeParser.parse_decimal(
                    raw.get("feeAmount")
                    or raw.get("serviceFee")
                    or raw.get("channelFee")
                    or raw.get("fee_amount")
                ),
                "discount_amount": TypeParser.parse_decimal(
                    raw.get("discountAmount")
                    or raw.get("couponAmount")
                    or raw.get("discount_amount")
                ),
                "payment_method": str(raw.get("paymentMethod") or raw.get("payment_method") or ""),
                "pay_type": raw.get("payType") or raw.get("pay_type"),
                "online_pay_channel": raw.get("onlinePayChannel") or raw.get("online_pay_channel"),
                "pay_terminal": raw.get("payTerminal") or raw.get("pay_terminal"),
                "pay_status": str(raw.get("payStatus") or raw.get("pay_status") or ""),
                "remark": raw.get("remark"),
                "raw_data": json.dumps(raw, ensure_ascii=False)
            }
        except Exception as e:
            self.logger.warning(f"Error parsing payment: {e}")
            return None
111
etl_billiards/tasks/payments_task.py
Normal file
@@ -0,0 +1,111 @@
# -*- coding: utf-8 -*-
"""Payment record ETL task"""
import json

from .base_task import BaseTask, TaskContext
from loaders.facts.payment import PaymentLoader
from models.parsers import TypeParser


class PaymentsTask(BaseTask):
    """Payment record E/T/L task"""

    def get_task_code(self) -> str:
        return "PAYMENTS"

    # ------------------------------------------------------------------ E/T/L hooks
    def extract(self, context: TaskContext) -> dict:
        """Fetch payment records from the API"""
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "StartPayTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "EndPayTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, pages_meta = self.api.get_paginated(
            endpoint="/PayLog/GetPayLogListPage",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
        )
        return {"records": records, "meta": pages_meta}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        """Parse the payment JSON"""
        parsed, skipped = [], 0
        for rec in extracted.get("records", []):
            cleaned = self._parse_payment(rec, context.store_id)
            if cleaned:
                parsed.append(cleaned)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        """Write to fact_payment"""
        loader = PaymentLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_payments(
            transformed["records"], context.store_id
        )
        counts = {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }
        return counts

    # ------------------------------------------------------------------ helpers
    def _parse_payment(self, raw: dict, store_id: int) -> dict | None:
        """Parse a payment record"""
        try:
            return {
                "store_id": store_id,
                "pay_id": TypeParser.parse_int(raw.get("payId") or raw.get("id")),
                "order_id": TypeParser.parse_int(raw.get("orderId")),
                "order_settle_id": TypeParser.parse_int(
                    raw.get("orderSettleId") or raw.get("order_settle_id")
                ),
                "order_trade_no": TypeParser.parse_int(
                    raw.get("orderTradeNo") or raw.get("order_trade_no")
                ),
                "relate_type": raw.get("relateType") or raw.get("relate_type"),
                "relate_id": TypeParser.parse_int(raw.get("relateId") or raw.get("relate_id")),
                "site_id": TypeParser.parse_int(
                    raw.get("siteId") or raw.get("site_id") or store_id
                ),
                "tenant_id": TypeParser.parse_int(raw.get("tenantId") or raw.get("tenant_id")),
                "pay_time": TypeParser.parse_timestamp(raw.get("payTime"), self.tz),
                "create_time": TypeParser.parse_timestamp(
                    raw.get("createTime") or raw.get("create_time"), self.tz
                ),
                "pay_amount": TypeParser.parse_decimal(raw.get("payAmount")),
                "fee_amount": TypeParser.parse_decimal(
                    raw.get("feeAmount")
                    or raw.get("serviceFee")
                    or raw.get("channelFee")
                    or raw.get("fee_amount")
                ),
                "discount_amount": TypeParser.parse_decimal(
                    raw.get("discountAmount")
                    or raw.get("couponAmount")
                    or raw.get("discount_amount")
                ),
                "pay_type": raw.get("payType"),
                "payment_method": raw.get("paymentMethod") or raw.get("payment_method"),
                "online_pay_channel": raw.get("onlinePayChannel")
                or raw.get("online_pay_channel"),
                "pay_status": raw.get("payStatus"),
                "pay_terminal": raw.get("payTerminal") or raw.get("pay_terminal"),
                "remark": raw.get("remark"),
                "raw_data": json.dumps(raw, ensure_ascii=False),
            }
        except Exception as exc:
            self.logger.warning("Failed to parse payment record: %s, raw data: %s", exc, raw)
            return None
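The parsers above repeatedly fall back across camelCase and snake_case spellings, e.g. raw.get("orderSettleId") or raw.get("order_settle_id"). Note the or chain also skips legitimate falsy values such as 0, which this code effectively treats as absent. A tiny hypothetical helper (pick is not part of the repo) showing the same lookup expressed once, with None-only skipping:

# Hypothetical helper collapsing the camelCase/snake_case fallback chains.
# Unlike `a or b`, it only skips keys that are absent or None, so 0 and "" survive.
def pick(raw: dict, *keys, default=None):
    for key in keys:
        value = raw.get(key)
        if value is not None:
            return value
    return default

# Usage mirroring the call sites above:
# order_settle_id = TypeParser.parse_int(pick(raw, "orderSettleId", "order_settle_id"))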
93
etl_billiards/tasks/products_task.py
Normal file
@@ -0,0 +1,93 @@
# -*- coding: utf-8 -*-
"""Product catalog (PRODUCTS) ETL task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.dimensions.product import ProductLoader
from models.parsers import TypeParser


class ProductsTask(BaseTask):
    """Product dimension ETL task"""

    def get_task_code(self) -> str:
        return "PRODUCTS"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params({"siteId": context.store_id})
        records, _ = self.api.get_paginated(
            endpoint="/TenantGoods/QueryTenantGoods",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="tenantGoodsList",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            parsed_row = self._parse_product(raw, context.store_id)
            if parsed_row:
                parsed.append(parsed_row)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = ProductLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_products(
            transformed["records"], context.store_id
        )
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_product(self, raw: dict, store_id: int) -> dict | None:
        try:
            product_id = TypeParser.parse_int(
                raw.get("siteGoodsId") or raw.get("tenantGoodsId") or raw.get("productId")
            )
            if not product_id:
                return None

            return {
                "store_id": store_id,
                "product_id": product_id,
                "site_product_id": TypeParser.parse_int(raw.get("siteGoodsId")),
                "product_name": raw.get("goodsName") or raw.get("productName"),
                "category_id": TypeParser.parse_int(
                    raw.get("tenantGoodsCategoryId") or raw.get("goodsCategoryId")
                ),
                "category_name": raw.get("categoryName"),
                "second_category_id": TypeParser.parse_int(raw.get("goodsCategorySecondId")),
                "unit": raw.get("goodsUnit"),
                "cost_price": TypeParser.parse_decimal(raw.get("costPrice")),
                "sale_price": TypeParser.parse_decimal(
                    raw.get("goodsPrice") or raw.get("salePrice")
                ),
                "allow_discount": None,
                "status": raw.get("goodsState") or raw.get("status"),
                "supplier_id": TypeParser.parse_int(raw.get("supplierId"))
                if raw.get("supplierId")
                else None,
                "barcode": raw.get("barcode"),
                "is_combo": bool(raw.get("isCombo"))
                if raw.get("isCombo") is not None
                else None,
                "created_time": TypeParser.parse_timestamp(raw.get("createTime"), self.tz),
                "updated_time": TypeParser.parse_timestamp(raw.get("updateTime"), self.tz),
                "raw_data": json.dumps(raw, ensure_ascii=False),
            }
        except Exception as exc:
            self.logger.warning("Failed to parse product record: %s, raw data: %s", exc, raw)
            return None
90
etl_billiards/tasks/refunds_task.py
Normal file
@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
"""Refund record task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.refund import RefundLoader
from models.parsers import TypeParser


class RefundsTask(BaseTask):
    """Sync payment refund transactions"""

    def get_task_code(self) -> str:
        return "REFUNDS"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/Order/GetRefundPayLogList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_refund(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = RefundLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_refunds(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_refund(self, raw: dict, store_id: int) -> dict | None:
        refund_id = TypeParser.parse_int(raw.get("id"))
        if not refund_id:
            self.logger.warning("Skipping record without a refund ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "refund_id": refund_id,
            "site_id": TypeParser.parse_int(raw.get("site_id") or raw.get("siteId")),
            "tenant_id": TypeParser.parse_int(raw.get("tenant_id") or raw.get("tenantId")),
            "pay_amount": TypeParser.parse_decimal(raw.get("pay_amount")),
            "pay_status": raw.get("pay_status"),
            "pay_time": TypeParser.parse_timestamp(
                raw.get("pay_time") or raw.get("payTime"), self.tz
            ),
            "create_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "relate_type": raw.get("relate_type"),
            "relate_id": TypeParser.parse_int(raw.get("relate_id")),
            "payment_method": raw.get("payment_method"),
            "refund_amount": TypeParser.parse_decimal(raw.get("refund_amount")),
            "action_type": raw.get("action_type"),
            "pay_terminal": raw.get("pay_terminal"),
            "operator_id": TypeParser.parse_int(raw.get("operator_id")),
            "channel_pay_no": raw.get("channel_pay_no"),
            "channel_fee": TypeParser.parse_decimal(raw.get("channel_fee")),
            "is_delete": raw.get("is_delete"),
            "member_id": TypeParser.parse_int(raw.get("member_id")),
            "member_card_id": TypeParser.parse_int(raw.get("member_card_id")),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
92
etl_billiards/tasks/table_discount_task.py
Normal file
@@ -0,0 +1,92 @@
# -*- coding: utf-8 -*-
"""Table-fee discount task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.table_discount import TableDiscountLoader
from models.parsers import TypeParser


class TableDiscountTask(BaseTask):
    """Sync table-fee discount/adjustment records"""

    def get_task_code(self) -> str:
        return "TABLE_DISCOUNT"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/Site/GetTaiFeeAdjustList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="taiFeeAdjustInfos",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_discount(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = TableDiscountLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_discounts(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_discount(self, raw: dict, store_id: int) -> dict | None:
        discount_id = TypeParser.parse_int(raw.get("id"))
        if not discount_id:
            self.logger.warning("Skipping record without a discount ID: %s", raw)
            return None

        table_profile = raw.get("tableProfile") or {}
        return {
            "store_id": store_id,
            "discount_id": discount_id,
            "adjust_type": raw.get("adjust_type") or raw.get("adjustType"),
            "applicant_id": TypeParser.parse_int(raw.get("applicant_id")),
            "applicant_name": raw.get("applicant_name"),
            "operator_id": TypeParser.parse_int(raw.get("operator_id")),
            "operator_name": raw.get("operator_name"),
            "ledger_amount": TypeParser.parse_decimal(raw.get("ledger_amount")),
            "ledger_count": TypeParser.parse_int(raw.get("ledger_count")),
            "ledger_name": raw.get("ledger_name"),
            "ledger_status": raw.get("ledger_status"),
            "order_settle_id": TypeParser.parse_int(raw.get("order_settle_id")),
            "order_trade_no": TypeParser.parse_int(raw.get("order_trade_no")),
            "site_table_id": TypeParser.parse_int(
                raw.get("site_table_id") or table_profile.get("id")
            ),
            "table_area_id": TypeParser.parse_int(
                raw.get("tableAreaId") or table_profile.get("site_table_area_id")
            ),
            "table_area_name": table_profile.get("site_table_area_name"),
            "create_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "is_delete": raw.get("is_delete"),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
84
etl_billiards/tasks/tables_task.py
Normal file
@@ -0,0 +1,84 @@
# -*- coding: utf-8 -*-
"""Table catalog task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.dimensions.table import TableLoader
from models.parsers import TypeParser


class TablesTask(BaseTask):
    """Sync the store's table list"""

    def get_task_code(self) -> str:
        return "TABLES"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params({"siteId": context.store_id})
        records, _ = self.api.get_paginated(
            endpoint="/Table/GetSiteTables",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="siteTables",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_table(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = TableLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_tables(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_table(self, raw: dict, store_id: int) -> dict | None:
        table_id = TypeParser.parse_int(raw.get("id"))
        if not table_id:
            self.logger.warning("Skipping table record without a table_id: %s", raw)
            return None

        return {
            "store_id": store_id,
            "table_id": table_id,
            "site_id": TypeParser.parse_int(raw.get("site_id") or raw.get("siteId")),
            "area_id": TypeParser.parse_int(
                raw.get("site_table_area_id") or raw.get("siteTableAreaId")
            ),
            "area_name": raw.get("areaName") or raw.get("site_table_area_name"),
            "table_name": raw.get("table_name") or raw.get("tableName"),
            "table_price": TypeParser.parse_decimal(
                raw.get("table_price") or raw.get("tablePrice")
            ),
            "table_status": raw.get("table_status") or raw.get("tableStatus"),
            "table_status_name": raw.get("tableStatusName"),
            "light_status": raw.get("light_status"),
            "is_rest_area": raw.get("is_rest_area"),
            "show_status": raw.get("show_status"),
            "virtual_table": raw.get("virtual_table"),
            "charge_free": raw.get("charge_free"),
            "only_allow_groupon": raw.get("only_allow_groupon"),
            "is_online_reservation": raw.get("is_online_reservation"),
            "created_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
Some files were not shown because too many files have changed in this diff