Compare commits

5 commits:

- 0ab040b9fb
- 0c29bd41f8
- 561c640700
- f301cc1fd5
- 6f1d163a99

20251121-task.txt (1360 lines): file diff suppressed because it is too large.

README.md (113 lines changed):
@@ -1,78 +1,57 @@

# Billiard Hall ETL System

# Feiqiu ETL System (ODS → DWD)

Data collection and lake ingestion for billiard-hall store operations: pull orders, payments, members, inventory, and other data from upstream APIs, land it in ODS first, then clean it into fact/dimension tables, with run tracking, incremental cursors, data-quality checks, and test scaffolding.

Store-facing ETL: fetch upstream JSON (or ingest it offline), land it in ODS, then clean and load DWD (SCD2 dimensions, incremental facts), and produce quality-check reports.

## Core Features

- **Two-stage pipeline**: raw retention in ODS plus cleansed DWD/fact tables, supporting replay and reruns.
- **Task registration and scheduling**: `TaskRegistry` manages task codes; `ETLScheduler` handles cursors, run records, and failure isolation.
- **Shared foundation**: configuration (defaults + `.env` + CLI overrides), an API client with pagination/retry, batch-upsert database wrappers, SCD2 dimension handling, quality checks.
- **Testing and replay**: ONLINE/OFFLINE mode switching; `run_tests.py`/`test_presets.py` support parameterized tests; `MANUAL_INGEST` can re-ingest archived JSON into ODS.
- **Installable**: `setup.py` / `entry_point` provides the `etl-billiards` command, or run `python -m cli.main` directly.
## Repository Layout (excerpt)

- `etl_billiards/config`: default configuration, environment-variable parsing, config loading.
- `etl_billiards/api`: HTTP client with built-in retry/pagination.
- `etl_billiards/database`: connection management, batch upsert.
- `etl_billiards/tasks`: business tasks (ORDERS, PAYMENTS, …), ODS tasks, DWD tasks, manual replay; `base_task.py`/`base_dwd_task.py` provide templates.
- `etl_billiards/loaders`: fact/dimension/ODS loaders; `scd/` holds SCD2.
- `etl_billiards/orchestration`: scheduler, task registry, cursors, and run tracking.
- `etl_billiards/scripts`: test runner, database connectivity check, preset test commands.
- `etl_billiards/tests`: unit/integration tests and offline JSON archives.

## Supported Task Codes

- **Facts/dimensions**: `ORDERS`, `PAYMENTS`, `REFUNDS`, `INVENTORY_CHANGE`, `COUPON_USAGE`, `MEMBERS`, `ASSISTANTS`, `PRODUCTS`, `TABLES`, `PACKAGES_DEF`, `TOPUPS`, `TABLE_DISCOUNT`, `ASSISTANT_ABOLISH`, `LEDGER`, `TICKET_DWD`, `PAYMENTS_DWD`, `MEMBERS_DWD`.
- **Raw ODS ingestion**: `ODS_ORDER_SETTLE`, `ODS_TABLE_USE`, `ODS_ASSISTANT_LEDGER`, `ODS_ASSISTANT_ABOLISH`, `ODS_GOODS_LEDGER`, `ODS_PAYMENT`, `ODS_REFUND`, `ODS_COUPON_VERIFY`, `ODS_MEMBER`, `ODS_MEMBER_CARD`, `ODS_PACKAGE`, `ODS_INVENTORY_STOCK`, `ODS_INVENTORY_CHANGE`.
- **Auxiliary**: `MANUAL_INGEST` (replays archived JSON into ODS).
## Quick Start

1. **Requirements**: Python 3.10+, PostgreSQL. Running commands from the `etl_billiards/` directory is recommended.
2. **Install dependencies**

## Quick Run (offline sample JSON)

1) Environment: Python 3.10+, PostgreSQL; key `.env` entries: `PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test`, `INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc`.
2) Install dependencies:

```bash
cd etl_billiards
pip install -r requirements.txt
# development mode: pip install -e .
```

3. **Configure `.env`**
3) One-shot ODS → DWD → quality check:

```bash
cp .env.example .env
# core entries
PG_DSN=postgresql://user:pwd@host:5432/LLZQ
API_BASE=https://api.example.com
API_TOKEN=your_token
STORE_ID=2790685415443269
EXPORT_ROOT=/path/to/export
LOG_ROOT=/path/to/logs

python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,INIT_DWD_SCHEMA --pipeline-flow INGEST_ONLY
python -m etl_billiards.cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "C:\dev\LLTQ\export\test-json-doc"
python -m etl_billiards.cli.main --tasks DWD_LOAD_FROM_ODS --pipeline-flow INGEST_ONLY
python -m etl_billiards.cli.main --tasks DWD_QUALITY_CHECK --pipeline-flow INGEST_ONLY
# report: etl_billiards/reports/dwd_quality_report.json
```

Configuration precedence: defaults < environment variables/`.env` < CLI arguments.
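The precedence rule can be sketched as a simple layered merge. This is a hypothetical minimal example, not the project's actual `AppConfig.load()` implementation; the key names are illustrative:

```python
# Layered configuration: defaults < environment/.env < CLI arguments.
# Later layers win for keys they define; absent/None values do not override.

def merge_config(defaults: dict, env: dict, cli: dict) -> dict:
    merged = dict(defaults)
    for layer in (env, cli):
        for key, value in layer.items():
            if value is not None:  # an unset CLI flag keeps the lower layer
                merged[key] = value
    return merged

defaults = {"pg_dsn": "postgresql://localhost/etl", "api_page_size": 200}
env = {"pg_dsn": "postgresql://prod-host/etl"}
cli = {"api_page_size": 500, "pg_dsn": None}

print(merge_config(defaults, env, cli))
# → {'pg_dsn': 'postgresql://prod-host/etl', 'api_page_size': 500}
```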
4. **Run tasks**

```bash
# Run the default task set
python -m cli.main

# Select tasks as needed (comma-separated)
python -m cli.main --tasks ODS_ORDER_SETTLE,ORDERS,PAYMENTS

# Dry-run example (no transaction commit)
python -m cli.main --tasks ORDERS --dry-run

# Windows batch script
..\run_etl.bat --tasks PAYMENTS
```

5. **Inspect output**: log and export directories are controlled by `LOG_ROOT` and `EXPORT_ROOT`; run tracking and cursor records are written to the `etl_admin.*` tables in the database.

## Directory and File Roles

- Root: `etl_billiards/` main code; `requirements.txt` dependencies; `run_etl.sh/.bat` launch scripts; `.env/.env.example` configuration; `tmp/` drafts/debugging/backups.
- `etl_billiards/` main directories:
  - `config/`: `defaults.py` default values, `env_parser.py` `.env` parsing, `settings.py` unified config loading.
  - `api/`: `client.py` HTTP requests, retry, and pagination.
  - `database/`: `connection.py` connection wrapper, `operations.py` batch upsert; DDL: `schema_ODS_doc.sql`, `schema_dwd_doc.sql`.
  - `tasks/`: business tasks
    - `init_schema_task.py`: INIT_ODS_SCHEMA / INIT_DWD_SCHEMA.
    - `manual_ingest_task.py`: sample JSON → ODS.
    - `dwd_load_task.py`: ODS → DWD (mapping, SCD2/incremental facts).
    - other tasks extended as needed.
  - `loaders/`: ODS/DWD/SCD2 loader implementations.
  - `scd/`: `scd2_handler.py` manages SCD2 dimension history.
  - `quality/`: quality checkers (row-count/amount reconciliation).
  - `orchestration/`: `scheduler.py` scheduling; `task_registry.py` task registration; `run_tracker.py` run records.
  - `scripts/`: rebuild/test/connectivity tools.
  - `docs/`: `ods_to_dwd_mapping.md` mapping notes, `ods_sample_json.md` sample JSON notes, `dwd_quality_check.md` quality-check notes.
  - `reports/`: quality-check output (e.g. `dwd_quality_report.json`).
  - `tests/`: unit/integration tests; `utils/`: shared utilities.
  - `backups/` (if present): backups of key files.

## Business Flow and File Relationships

1) Scheduling entry: `cli/main.py` parses the CLI → `orchestration/scheduler.py` creates tasks per `task_registry.py` → initializes the DB/API/Config context.
2) ODS: `init_schema_task.py` runs `schema_ODS_doc.sql` to create tables; `manual_ingest_task.py` reads JSON from `INGEST_SOURCE_DIR` and batch-upserts it into ODS.
3) DWD: `init_schema_task.py` runs `schema_dwd_doc.sql`; `dwd_load_task.py` cleans ODS data into DWD per `TABLE_MAP/FACT_MAPPINGS`; dimensions go through SCD2 (`scd/scd2_handler.py`), facts load incrementally by time/watermark.
4) Quality: the quality task reads ODS/DWD, compares row counts and amounts, and writes `reports/dwd_quality_report.json`.
5) Configuration: `config/defaults.py` + `.env` + CLI arguments are layered; HTTP (when online mode is enabled) goes through `api/client.py`; DB access goes through `database/connection.py`.
6) Docs: `docs/ods_to_dwd_mapping.md` records field mappings; `docs/ods_sample_json.md` describes the sample data structures for side-by-side debugging.
## Data and Run Flow

- The CLI parses arguments → `AppConfig.load()` assembles the configuration → `ETLScheduler` creates the DB/API/cursor/run-tracker components.
- The scheduler instantiates tasks by task code, reads/advances cursors, and persists run records.
- Task template: determine the time window → fetch API/ODS data → parse and validate → loader batch upsert/SCD2 → quality check → commit the transaction and write back the cursor.
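The window step can be illustrated with a small sketch. This assumes the overlap-based semantics implied by the `OVERLAP_SECONDS` setting in `.env` (re-read a safety margin before the stored cursor so late-arriving rows are not missed); it is not the project's actual `cursor_manager` code:

```python
from datetime import datetime, timedelta

# Hypothetical sketch: an incremental fetch window spans from the last
# committed cursor (minus a safety overlap) up to "now". The cursor itself
# only advances after the transaction commits.

def compute_window(last_cursor: datetime, now: datetime, overlap_seconds: int = 120):
    start = last_cursor - timedelta(seconds=overlap_seconds)  # re-read the tail
    return start, now

last = datetime(2025, 12, 9, 12, 0, 0)
now = datetime(2025, 12, 9, 12, 30, 0)
start, end = compute_window(last, now)
print(start, end)  # window starts 2 minutes before the stored cursor
```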
## Testing and Replay

- Unit/integration tests: `pytest` or `python scripts/run_tests.py --suite online`.
- Preset combinations: `python scripts/run_tests.py --preset offline_realdb` (see `scripts/test_presets.py`).
- Offline mode: `TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=... pytest tests/unit/test_etl_tasks_offline.py`.
- Database connectivity: `python scripts/test_db_connection.py --dsn postgresql://... --query "SELECT 1"`.
## Other Tips

- `.env.example` lists all common settings; `config/defaults.py` records default values and task-window configuration.
- `loaders/ods/generic.py` can land data in ODS given just a primary key and column names; `tasks/manual_ingest_task.py` can quickly ingest archived JSON into the matching ODS tables.
- To add a task, implement it under `tasks/` and register it in `orchestration/task_registry.py` to reuse the scheduling machinery.
## Current Status (2025-12-09)

- The sample JSON has been fully ingested; DWD row counts match ODS.
- The category dimension is flattened into primary + secondary levels: `dim_goods_category` has 26 rows (category_level/leaf populated).
- Remaining null values mostly reflect empty source data; confirm upstream availability before backfilling.

## Candidates for Cleanup/Archiving

- Drafts, old backups, and debug scripts under `tmp/` and `tmp/etl_billiards_misc/` are reference-only; nothing at runtime depends on them.
- The root keeps only essential files (README, requirements, run_etl.*, .env/.env.example); other temporary files have been moved to tmp.
README_FULL.md (new file, 216 lines):

@@ -0,0 +1,216 @@
# Feiqiu ETL System (ODS → DWD) — Detailed Edition

> This document is the detailed project description, kept in sync with the current code. It covers ODS tasks, DWD loading, quality checks, and development/extension notes.

---

## 1. Project Overview

Store-facing ETL: collect orders, payments, members, inventory, and other data from the upstream API or offline JSON, land it in **ODS** first, then clean and load **DWD** (SCD2 dimensions, incremental facts), and emit quality-check reports. The project uses a modular, layered architecture (config, API, database, loader/SCD, quality, orchestration, CLI, tests), driven uniformly through the CLI.

---
## 2. Quick Start (offline sample JSON)

**Requirements**: Python 3.10+; PostgreSQL; key `.env` entries:
- `PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test`
- `INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc`

**Install dependencies**:
```bash
cd etl_billiards
pip install -r requirements.txt
```

**One-shot ODS → DWD → quality check (offline replay)**:
```bash
# Initialize ODS + DWD
python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,INIT_DWD_SCHEMA --pipeline-flow INGEST_ONLY

# Ingest sample JSON into ODS (INGEST_SOURCE_DIR in .env can override)
python -m etl_billiards.cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "C:\dev\LLTQ\export\test-json-doc"

# Load DWD from ODS
python -m etl_billiards.cli.main --tasks DWD_LOAD_FROM_ODS --pipeline-flow INGEST_ONLY

# Quality-check report
python -m etl_billiards.cli.main --tasks DWD_QUALITY_CHECK --pipeline-flow INGEST_ONLY
# report output: etl_billiards/reports/dwd_quality_report.json
```
> Steps can also run individually:
> - Schema only: `python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA`
> - ODS ingest only: `python -m etl_billiards.cli.main --tasks MANUAL_INGEST`
> - DWD load only: `python -m etl_billiards.cli.main --tasks INIT_DWD_SCHEMA,DWD_LOAD_FROM_ODS`

---
## 3. Configuration and Paths

- Sample data directory: `C:\dev\LLTQ\export\test-json-doc` (overridable via `INGEST_SOURCE_DIR` in `.env`).
- Log/export directories: `LOG_ROOT`, `EXPORT_ROOT` in `.env`.
- Report: `etl_billiards/reports/dwd_quality_report.json`.
- DDL: `etl_billiards/database/schema_ODS_doc.sql`, `etl_billiards/database/schema_dwd_doc.sql`.
- Task registry: `etl_billiards/orchestration/task_registry.py` (enabled by default: INIT_ODS_SCHEMA, MANUAL_INGEST, INIT_DWD_SCHEMA, DWD_LOAD_FROM_ODS, DWD_QUALITY_CHECK).

**Security note**: keep database credentials in `.env` or a managed secret store, and use a least-privilege account in production.

---
## 4. Directory Structure and Key Files

- Root: `etl_billiards/` main code; `requirements.txt` dependencies; `run_etl.sh/.bat` launch scripts; `.env/.env.example` configuration; `tmp/` draft/debug archive.
- `config/`: `defaults.py` defaults, `env_parser.py` `.env` parsing, `settings.py` AppConfig loading.
- `api/`: `client.py` HTTP requests, retry, pagination.
- `database/`: `connection.py` connection wrapper; `operations.py` batch upsert; DDL SQL (ODS/DWD).
- `tasks/`:
  - `init_schema_task.py` (INIT_ODS_SCHEMA/INIT_DWD_SCHEMA);
  - `manual_ingest_task.py` (sample JSON → ODS);
  - `dwd_load_task.py` (ODS → DWD mapping, SCD2/incremental facts);
  - other tasks extended as needed.
- `loaders/`: ODS/DWD/SCD2 loader implementations.
- `scd/`: `scd2_handler.py` manages SCD2 dimension history.
- `quality/`: quality checkers (row-count/amount reconciliation).
- `orchestration/`: `scheduler.py` scheduling; `task_registry.py` registration; `run_tracker.py` run records; `cursor_manager.py` watermark management.
- `scripts/`: rebuild/test/connectivity tools.
- `docs/`: `ods_to_dwd_mapping.md` mapping notes; `ods_sample_json.md` sample JSON notes; `dwd_quality_check.md` quality-check notes.
- `reports/`: quality-check output (e.g. `dwd_quality_report.json`).
- `tests/`: unit/integration tests; `utils/`: shared utilities; `backups/`: backups (if present).

---
## 5. Architecture and Flow

Execution chain (control flow):
1) The CLI (`cli/main.py`) parses arguments → builds AppConfig → initializes logging and DB connections;
2) The orchestration layer (`scheduler.py`) instantiates tasks from the registry in `task_registry.py`, setting run_uuid, cursor (watermark), and context;
3) The task base-class template:
   - obtain the time window/watermark (cursor_manager);
   - fetch data: online mode calls `api/client.py` with pagination and retry; offline mode reads JSON files directly;
   - parse and validate: type conversion, required-field checks (per-task parse/validate);
   - load: loaders (`loaders/`) perform batch upsert/SCD2/incremental writes (on top of `database/operations.py`);
   - quality check (when needed): the quality module compares row counts, amounts, etc.;
   - update the watermark and run record (`run_tracker.py`), then commit or roll back the transaction.

Data flow and dependencies:
- Configuration: `config/defaults.py` + `.env` + CLI arguments layer into AppConfig.
- API access: `api/client.py` provides pagination/retry; offline ingest reads files directly.
- DB access: `database/connection.py` provides connection contexts; `operations.py` handles batch upsert/paged writes.
- ODS: `manual_ingest_task.py` reads JSON → ODS tables (preserving payload/source/timestamps).
- DWD: `dwd_load_task.py` selects fields from ODS per `TABLE_MAP/FACT_MAPPINGS`; dimensions use SCD2 (`scd/scd2_handler.py`), facts load incrementally; field expressions (JSON `->>`, CAST) are supported.
- Quality: the `quality` module or related tasks compare ODS/DWD row counts, amounts, etc., writing to `reports/`.
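The contents of `TABLE_MAP/FACT_MAPPINGS` are not shown here, so the following is a hypothetical sketch of the general technique: each DWD column is an SQL expression over the ODS payload, and a load statement is generated from the mapping. Table and column names are illustrative, not the project's real map:

```python
# Hypothetical sketch of mapping-driven DWD loading: each target column maps
# to an SQL expression over the ODS payload (JSON ->> extraction plus a CAST).

MAPPING = {
    "target": "dwd.fact_payment",
    "source": "ods.payment_transactions",
    "columns": {
        "pay_id": "(payload ->> 'payId')::bigint",
        "amount": "(payload ->> 'amount')::numeric(18,2)",
        "paid_at": "(payload ->> 'payTime')::timestamptz",
    },
}

def build_load_sql(mapping: dict) -> str:
    cols = ", ".join(mapping["columns"])
    exprs = ", ".join(mapping["columns"].values())
    return (f"INSERT INTO {mapping['target']} ({cols}) "
            f"SELECT {exprs} FROM {mapping['source']}")

print(build_load_sql(MAPPING))
```

Keeping the mapping declarative like this is what makes replays cheap: reloading DWD is just re-running the generated statements over the retained ODS rows.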
---

## 6. ODS → DWD Strategy

1. ODS retention: keep source primary keys, payload, time, and provenance information.
2. DWD cleansing: SCD2 for dimensions; facts incremental by time/watermark; standardize field types, units, and enums while keeping traceable fields.
3. Unified business keys: site_id, member_id, table_id, order_settle_id, order_trade_no, etc. use consistent names.
4. No over-aggregation: DWD stays at detail/light-cleansing level; aggregation belongs in DWS/reporting.
5. De-nesting: expand arrays into child tables/rows; extract repeated profiles into dimensions.
6. Long-term evolution: prefer adding columns/tables over frequently altering existing structures.

---
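The SCD2 point above can be illustrated with a minimal in-memory sketch: when a tracked attribute changes, the current version row is closed and a new current row is appended. This is a conceptual illustration, not the project's `scd2_handler.py`:

```python
from datetime import date

# Minimal in-memory SCD2 sketch: on attribute change, close the current
# version (set valid_to, clear the current flag) and append a new one.

def scd2_apply(history: list, key: str, attrs: dict, today: date) -> list:
    current = next((r for r in history if r["key"] == key and r["is_current"]), None)
    if current and all(current[k] == v for k, v in attrs.items()):
        return history  # no change: keep the current version as-is
    if current:
        current["is_current"] = False
        current["valid_to"] = today
    history.append({"key": key, **attrs, "valid_from": today,
                    "valid_to": None, "is_current": True})
    return history

hist = []
scd2_apply(hist, "member-1", {"level": "silver"}, date(2025, 1, 1))
scd2_apply(hist, "member-1", {"level": "gold"}, date(2025, 6, 1))
print(len(hist))  # 2 versions: closed "silver", current "gold"
```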
## 7. Common CLI Commands

```bash
# Run all registered tasks
python -m etl_billiards.cli.main
# Run specific tasks
python -m etl_billiards.cli.main --tasks INIT_ODS_SCHEMA,MANUAL_INGEST
# Override the DSN
python -m etl_billiards.cli.main --pg-dsn "postgresql://user:pwd@host:5432/db"
# Override the API
python -m etl_billiards.cli.main --api-base "https://api.example.com" --api-token "..."
# Dry run (no database writes)
python -m etl_billiards.cli.main --dry-run --tasks DWD_LOAD_FROM_ODS
```

---
## 8. Testing (ONLINE / OFFLINE)

- `TEST_MODE=ONLINE`: calls the real API; full E/T/L.
- `TEST_MODE=OFFLINE`: reads offline JSON from `TEST_JSON_ARCHIVE_DIR`; Transform + Load only.
- `TEST_DB_DSN`: if set, integration tests connect to the real database; otherwise an in-memory/temporary database is used.

Examples:
```bash
TEST_MODE=ONLINE pytest tests/unit/test_etl_tasks_online.py
TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=tests/source-data-doc pytest tests/unit/test_etl_tasks_offline.py
python scripts/test_db_connection.py --dsn postgresql://user:pwd@host:5432/db --query "SELECT 1"
```

---
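Conceptually, OFFLINE mode replaces the API call with a walk over archived files. A minimal sketch of that idea (illustrative only; the real tasks feed each document into the same Transform + Load path):

```python
import json
import tempfile
from pathlib import Path

# Conceptual OFFLINE-mode sketch: instead of calling the API, iterate the
# archived *.json files in a directory and yield each parsed document.

def replay_archive(archive_dir: Path):
    for path in sorted(archive_dir.glob("*.json")):
        yield path.name, json.loads(path.read_text(encoding="utf-8"))

# Self-contained demo with a throwaway archive directory:
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "orders.json").write_text(json.dumps({"data": [1, 2]}), encoding="utf-8")
    docs = dict(replay_archive(root))

print(docs)  # {'orders.json': {'data': [1, 2]}}
```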
## 9. Development and Extension

- New task: subclass BaseTask under `tasks/`, implement `get_task_code/execute`, and register it in `orchestration/task_registry.py`.
- New loader/checker: follow `loaders/` and `quality/` to reuse the batch-upsert/quality-check interfaces.
- Configuration: `config/defaults.py` + `.env` + CLI are layered; new settings must be declared in both defaults and env_parser.

---
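The "declare in both defaults and env_parser" rule can be sketched as a table of defaults plus per-key type converters, so `.env` strings become typed values. This is a hypothetical pattern, not the project's actual `env_parser.py`:

```python
# Hypothetical sketch: each config key has a default AND a converter; an
# environment variable (UPPER_CASE of the key) overrides the default.

DEFAULTS = {"api_timeout_sec": 20, "write_pretty_json": False}
CONVERTERS = {
    "api_timeout_sec": int,
    "write_pretty_json": lambda s: s.strip().lower() in ("1", "true", "yes"),
}

def load_from_env(environ: dict) -> dict:
    cfg = dict(DEFAULTS)
    for key, convert in CONVERTERS.items():
        raw = environ.get(key.upper())
        if raw is not None:
            cfg[key] = convert(raw)
    return cfg

print(load_from_env({"API_TIMEOUT_SEC": "45", "WRITE_PRETTY_JSON": "true"}))
# → {'api_timeout_sec': 45, 'write_pretty_json': True}
```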
## 10. ODS Task Rollout Guide

- Task seeding script: `etl_billiards/database/seed_ods_tasks.sql` (replace store_id, then run `psql "$PG_DSN" -f ...`).
- Confirm the required ODS tasks are enabled in `etl_admin.etl_task`.
- Offline replay: `scripts/rebuild_ods_from_json` (if present) can rebuild ODS from local JSON.
- Unit tests: `pytest etl_billiards/tests/unit/test_ods_tasks.py`.

---
## 11. ODS Table Overview (data paths)

| ODS table | API path | Data list path |
| --- | --- | --- |
| assistant_accounts_master | /PersonnelManagement/SearchAssistantInfo | data.assistantInfos |
| assistant_service_records | /AssistantPerformance/GetOrderAssistantDetails | data.orderAssistantDetails |
| assistant_cancellation_records | /AssistantPerformance/GetAbolitionAssistant | data.abolitionAssistants |
| goods_stock_movements | /GoodsStockManage/QueryGoodsOutboundReceipt | data.queryDeliveryRecordsList |
| goods_stock_summary | /TenantGoods/GetGoodsStockReport | data |
| group_buy_packages | /PackageCoupon/QueryPackageCouponList | data.packageCouponList |
| group_buy_redemption_records | /Site/GetSiteTableUseDetails | data.siteTableUseDetailsList |
| member_profiles | /MemberProfile/GetTenantMemberList | data.tenantMemberInfos |
| member_balance_changes | /MemberProfile/GetMemberCardBalanceChange | data.tenantMemberCardLogs |
| member_stored_value_cards | /MemberProfile/GetTenantMemberCardList | data.tenantMemberCards |
| payment_transactions | /PayLog/GetPayLogListPage | data |
| platform_coupon_redemption_records | /Promotion/GetOfflineCouponConsumePageList | data |
| recharge_settlements | /Site/GetRechargeSettleList | data.settleList |
| refund_transactions | /Order/GetRefundPayLogList | data |
| settlement_records | /Site/GetAllOrderSettleList | data.settleList |
| settlement_ticket_details | /Order/GetOrderSettleTicketNew | full JSON |
| site_tables_master | /Table/GetSiteTables | data.siteTables |
| stock_goods_category_tree | /TenantGoodsCategory/QueryPrimarySecondaryCategory | data.goodsCategoryList |
| store_goods_master | /TenantGoods/GetGoodsInventoryList | data.orderGoodsList |
| store_goods_sales_records | /TenantGoods/GetGoodsSalesList | data.orderGoodsLedgers |
| table_fee_discount_records | /Site/GetTaiFeeAdjustList | data.taiFeeAdjustInfos |
| table_fee_transactions | /Site/GetSiteTableOrderDetails | data.siteTableUseDetailsList |
| tenant_goods_master | /TenantGoods/QueryTenantGoods | data.tenantGoodsList |

> Full field-level mappings are in `docs/` and the ODS/DWD DDL.

---
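The "data list path" column above (e.g. `data.assistantInfos`) names where the record list sits inside each API response, which suggests a generic dotted-path lookup. A minimal illustration of that idea (the helper name is hypothetical, not the project's actual function):

```python
# Resolve a dotted path like "data.assistantInfos" inside a parsed response.

def extract_records(response: dict, path: str):
    node = response
    for part in path.split("."):
        node = node[part]  # raises KeyError if the path is wrong
    return node

resp = {"code": 0, "data": {"assistantInfos": [{"id": 1}, {"id": 2}]}}
print(extract_records(resp, "data.assistantInfos"))  # [{'id': 1}, {'id': 2}]
```

With this shape, one generic ODS ingest routine can serve every row of the table: each entry only contributes a target table, an API path, and a list path.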
## 12. DWD Dimensions and Modeling Guidelines

1. Consistent grain, single business key: one DWD table carries one business event/grain; never mix grains.
2. Understand the business chain before modeling; do not mechanically create one table per JSON list.
3. Unified business keys: site_id, member_id, table_id, order_settle_id, order_trade_no, etc. must be named consistently.
4. Keep detail, avoid over-aggregation; leave rollups to DWS/reporting.
5. Standardize during cleansing while preserving provenance fields (source keys, times, amounts, payload).
6. De-nest and decouple: expand arrays into child rows; extract repeated profiles into dimensions.
7. Evolve by adding columns/tables first, minimizing breakage of existing structures.

---
## 13. Current Status (2025-12-09)

- The sample JSON is fully ingested; DWD row counts match ODS.
- The category dimension is flattened into primary + secondary levels: `dim_goods_category` has 26 rows (category_level/leaf populated).
- Some fields are empty because the source data is empty; confirm upstream availability before backfilling.

---

## 14. Candidates for Cleanup/Archiving

- Drafts, old backups, and debug scripts under `tmp/` and `tmp/etl_billiards_misc/` are reference-only and do not affect runtime.
- The root keeps only essential files (README, requirements, run_etl.*, .env/.env.example); other temporary files have been moved to tmp.

---
## 15. FAQ

- Empty fields: if a mapping exists and the source column is non-empty yet the DWD value is still empty, re-check the upstream JSON; SCD2 dimensions merge on full loads.
- DSN/paths: confirm `PG_DSN` and `INGEST_SOURCE_DIR` in `.env` match your machine.
- Adding tasks: implement under `tasks/` and register in `task_registry.py`; update the DDL and mappings as needed.
- Permissions/runtime: check network access and account permissions; scripts need execute permission (e.g. `chmod +x run_etl.sh`).
@@ -1,53 +1,49 @@
# Database configuration (real database)
# -*- coding: utf-8 -*-
# File notes: ETL environment variables (read by config/env_parser.py) for database connections, directories, and run parameters.

# Database connection string, config/env_parser.py -> db.dsn, required by all tasks
PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test
# Database connect timeout (seconds), config/env_parser.py -> db.connect_timeout_sec
PG_CONNECT_TIMEOUT=10
# For split settings: PG_HOST=... PG_PORT=... PG_NAME=... PG_USER=... PG_PASSWORD=...

# API configuration (fill in only when calling the real API)
API_BASE=https://api.example.com
API_TOKEN=your_token_here
# API_TIMEOUT=20
# API_PAGE_SIZE=200
# API_RETRY_MAX=3

# Application configuration
# Store/tenant ID, config/env_parser.py -> app.store_id, used in task scheduling records
STORE_ID=2790685415443269
# TIMEZONE=Asia/Taipei
# SCHEMA_OLTP=billiards
# SCHEMA_ETL=etl_admin
# Timezone identifier, config/env_parser.py -> app.timezone
TIMEZONE=Asia/Taipei

# Path configuration
EXPORT_ROOT=C:\dev\LLTQ\export\JSON
# API base URL, config/env_parser.py -> api.base_url, used by FETCH tasks
API_BASE=https://api.example.com
# API auth token, config/env_parser.py -> api.token, used by FETCH tasks
API_TOKEN=your_token_here
# API request timeout (seconds), config/env_parser.py -> api.timeout_sec
API_TIMEOUT=20
# API page size, config/env_parser.py -> api.page_size
API_PAGE_SIZE=200
# API max retry attempts, config/env_parser.py -> api.retries.max_attempts
API_RETRY_MAX=3

# Log root directory, config/env_parser.py -> io.log_root, written by Init/task runs
LOG_ROOT=C:\dev\LLTQ\export\LOG
FETCH_ROOT=
INGEST_SOURCE_DIR=
WRITE_PRETTY_JSON=false
PGCLIENTENCODING=utf8
# JSON export root, config/env_parser.py -> io.export_root, FETCH output and INIT staging
EXPORT_ROOT=C:\dev\LLTQ\export\JSON

# ETL configuration
# Local output directory for FETCH mode, config/env_parser.py -> pipeline.fetch_root
FETCH_ROOT=C:\dev\LLTQ\export\JSON
# Local ingest JSON directory, config/env_parser.py -> pipeline.ingest_source_dir, used by MANUAL_INGEST/INGEST_ONLY
INGEST_SOURCE_DIR=C:\dev\LLTQ\export\test-json-doc

# Pretty-printed JSON output switch, config/env_parser.py -> io.write_pretty_json
WRITE_PRETTY_JSON=false

# Pipeline flow: FULL / FETCH_ONLY / INGEST_ONLY, config/env_parser.py -> pipeline.flow
PIPELINE_FLOW=FULL
# Explicit task list (comma-separated, overrides the default), config/env_parser.py -> run.tasks
# RUN_TASKS=INIT_ODS_SCHEMA,MANUAL_INGEST

# Window/compensation parameters, config/env_parser.py -> run.*
OVERLAP_SECONDS=120
WINDOW_BUSY_MIN=30
WINDOW_IDLE_MIN=180
IDLE_START=04:00
IDLE_END=16:00
ALLOW_EMPTY_RESULT_ADVANCE=true

# Cleansing configuration
LOG_UNKNOWN_FIELDS=true
HASH_ALGO=sha1
STRICT_NUMERIC=true
ROUND_MONEY_SCALE=2

# Test/offline mode (ONLINE recommended when debugging against the real database)
TEST_MODE=ONLINE
TEST_JSON_ARCHIVE_DIR=tests/source-data-doc
TEST_JSON_TEMP_DIR=/tmp/etl_billiards_json_tmp

# Test database
TEST_DB_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test

# ODS rebuild script settings (optional)
JSON_DOC_DIR=C:\dev\LLTQ\export\test-json-doc
ODS_INCLUDE_FILES=
ODS_DROP_SCHEMA_FIRST=true
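The `STRICT_NUMERIC` and `ROUND_MONEY_SCALE=2` settings above suggest strict, scale-aware money parsing during cleansing. A sketch of what that could look like, assuming half-up rounding (the function name and rounding mode are illustrative, not confirmed from the project code):

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

# Parse money strictly as Decimal and round to a fixed scale; reject junk
# input instead of silently coercing it.

def parse_money(raw: str, scale: int = 2) -> Decimal:
    try:
        value = Decimal(raw)
    except InvalidOperation as exc:
        raise ValueError(f"not a number: {raw!r}") from exc
    quantum = Decimal(1).scaleb(-scale)  # e.g. Decimal('0.01') for scale=2
    return value.quantize(quantum, rounding=ROUND_HALF_UP)

print(parse_money("12.345"))  # 12.35
```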
@@ -1,837 +0,0 @@

# Billiard Hall ETL System (Modular Edition) — Merged Documentation

This document merges the former separate documents (`INDEX.md`, `QUICK_START.md`, `ARCHITECTURE.md`, `MIGRATION_GUIDE.md`, `PROJECT_STRUCTURE.md`, `README.md`, etc.), keeping only content about the **current project itself**: project description, directory structure, architecture design, data and control flow, and migration/extension guides. It excludes change history and refactoring narratives.

---
## 1. Project Overview

The billiard-hall ETL system is a store-facing ETL project that pulls orders, payments, members, and other data from external business APIs; parses, validates, SCD2-processes, and quality-checks it; writes it to PostgreSQL; and supports incremental sync and task-run tracking.

The system uses a modular, layered architecture. Core characteristics:

- Modular directory layout (clear layers for config, database, API, models, loaders, SCD2, quality checks, orchestration, tasks, CLI, utilities, tests).
- Complete configuration management: defaults + environment variables + CLI arguments, with layered overrides.
- A reusable database access layer (connection management, batch-upsert wrappers).
- An API client with retry and pagination support.
- Type-safe data parsing and validation modules.
- SCD2 dimension-history management.
- Data-quality checks (e.g. balance-consistency checking).
- An orchestration layer for unified scheduling, cursor management, and run tracking.
- A CLI entry point for task execution, with task filtering, dry-run, and other modes.

---
## 2. Quick Start

### 2.1 Environment

- Python: 3.10+ recommended
- Database: PostgreSQL
- OS: Windows / Linux / macOS

```bash
# After cloning/downloading, enter the project directory
cd etl_billiards/
ls -la
```

You will see the top level of the directory structure below (details in Chapter 4):

- `config/` - configuration management
- `database/` - database access
- `api/` - API client
- `tasks/` - ETL task implementations
- `cli/` - command-line entry point
- `docs/` - technical documentation

### 2.2 Install Dependencies

```bash
pip install -r requirements.txt
```

Main dependencies (see the actual `requirements.txt`):

- `psycopg2-binary`: PostgreSQL driver
- `requests`: HTTP client
- `python-dateutil`: date/time handling
- `tzdata`: timezone data
### 2.3 Configure Environment Variables

Copy and edit the template:

```bash
cp .env.example .env
# edit .env with your preferred editor
```

Minimal `.env` example:

```bash
# Database
PG_DSN=postgresql://user:password@localhost:5432/....

# API
API_BASE=https://api.example.com
API_TOKEN=your_token_here

# Store/application
STORE_ID=2790685415443269
TIMEZONE=Asia/Taipei

# Directories
EXPORT_ROOT=/path/to/export
LOG_ROOT=/path/to/logs
```

> Defaults for all settings live in `config/defaults.py`; the effective configuration is the layered result of defaults + environment variables + CLI arguments.
### 2.4 Run Your First Task

Via the CLI entry point:

```bash
# Run all tasks
python -m cli.main

# Orders only
python -m cli.main --tasks ORDERS

# Orders + payments
python -m cli.main --tasks ORDERS,PAYMENTS

# Windows script
run_etl.bat --tasks ORDERS

# Linux / macOS script
./run_etl.sh --tasks ORDERS
```
### 2.5 Inspect Results

- Log directory: set via `LOG_ROOT`, e.g.

```bash
ls -la C:\dev\LLTQ\export\LOG/
```

- Export directory: set via `EXPORT_ROOT`, e.g.

```bash
ls -la C:\dev\LLTQ\export\JSON/
```

---
## 3. Common Commands and Development Tools

### 3.1 Common CLI Commands

```bash
# Run all tasks
python -m cli.main

# Run specific tasks
python -m cli.main --tasks ORDERS,PAYMENTS,MEMBERS

# Use a custom database
python -m cli.main --pg-dsn "postgresql://user:password@host:5432/db"

# Use a custom API endpoint
python -m cli.main --api-base "https://api.example.com" --api-token "..."

# Dry run (no database writes)
python -m cli.main --dry-run --tasks ORDERS
```
### 3.2 IDE / Code-Quality Tools (example: VSCode)

`.vscode/settings.json` example:

```json
{
    "python.linting.enabled": true,
    "python.linting.pylintEnabled": true,
    "python.formatting.provider": "black",
    "python.testing.pytestEnabled": true
}
```

Formatting and linting:

```bash
pip install black isort pylint

black .
isort .
pylint etl_billiards/
```
### 3.3 Testing

```bash
# Install test dependencies (as needed)
pip install pytest pytest-cov

# Run all tests
pytest

# Unit tests only
pytest tests/unit/

# Coverage report
pytest --cov=. --cov-report=html
```

Test examples (see the actual project):

- `tests/unit/test_config.py` – configuration unit tests
- `tests/unit/test_parsers.py` – parser unit tests
- `tests/integration/test_database.py` – database integration tests
#### 3.3.1 Test Modes (ONLINE / OFFLINE)

- With `TEST_MODE=ONLINE` (the default), tests simulate the live API and run the full E/T/L.
- With `TEST_MODE=OFFLINE`, tests read data from the archived JSON in `TEST_JSON_ARCHIVE_DIR` and run Transform + Load only; useful for verifying that local archives can still be replayed.
- `TEST_JSON_ARCHIVE_DIR`: offline JSON archive directory (e.g. `tests/source-data-doc` or a CI-produced snapshot).
- `TEST_JSON_TEMP_DIR`: temporary JSON output directory for tests, isolating each run's data.
- `TEST_DB_DSN`: optional; if set, unit tests connect to this PostgreSQL DSN and really write to the database; if empty, tests use an in-memory fake to avoid the database dependency.

Example commands:

```bash
# Online mode over all tasks
TEST_MODE=ONLINE pytest tests/unit/test_etl_tasks_online.py

# Offline mode over all tasks, using archived JSON
TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=tests/source-data-doc pytest tests/unit/test_etl_tasks_offline.py

# Compose script options as needed (example: online + orders-only cases)
python scripts/run_tests.py --suite online --mode ONLINE --keyword ORDERS

# Connect the script to a real test database and replay offline mode
python scripts/run_tests.py --suite offline --mode OFFLINE --db-dsn postgresql://user:pwd@localhost:5432/testdb

# Use preset commands from the "command repository"
python scripts/run_tests.py --preset offline_realdb
python scripts/run_tests.py --list-presets  # inspect or customize scripts/test_presets.py
```
#### 3.3.2 Scripted Test Combinations (`run_tests.py` / `test_presets.py`)

- `scripts/run_tests.py` is the unified pytest entry point: it adds the project root to `sys.path` and offers `--suite online/offline/integration`, `--tests` (custom paths), `--mode`, `--db-dsn`, `--json-archive`, `--json-temp`, `--keyword/-k`, `--pytest-args`, and `--env KEY=VALUE`, which compose freely like building blocks;
- `--preset foo` reads the configuration from `PRESETS["foo"]` in `scripts/test_presets.py` and layers it onto the current command; `--list-presets` and `--dry-run` let you review or merely print commands;
- Running `python scripts/test_presets.py` executes the presets listed in `AUTO_RUN_PRESETS` in order; passing `--preset x --dry-run` only prints the corresponding command.

`test_presets.py` acts as a "command repository". Each preset is a dict; common fields:

| Field | Purpose |
| --- | --- |
| `suite` | Reuse `run_tests.py` built-in suites (online/offline/integration; multiple allowed) |
| `tests` | Append arbitrary pytest paths, e.g. `tests/unit/test_config.py` |
| `mode` | Override `TEST_MODE` (ONLINE / OFFLINE) |
| `db_dsn` | Override `TEST_DB_DSN` to target a real test database |
| `json_archive` / `json_temp` | Configure the offline JSON archive and temp directories |
| `keyword` | Maps to `pytest -k` for keyword filtering |
| `pytest_args` | Extra pytest arguments, e.g. `-vv --maxfail=1` |
| `env` | Extra environment variables, e.g. `["STORE_ID=123"]` |
| `preset_meta` | Descriptive text for the scenario |

Example: the `offline_realdb` preset sets `TEST_MODE=OFFLINE`, points the archive at `tests/source-data-doc`, and connects to the test database via `db_dsn`. Running `python scripts/run_tests.py --preset offline_realdb` or `python scripts/test_presets.py --preset offline_realdb` reuses that combination, keeping local, CI, and production replay scripts consistent.
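The "preset layered onto the current command" behavior can be sketched as a dict merge where explicit CLI options win. The field names mirror the table above, but this is an illustration, not the real `run_tests.py` implementation:

```python
# Illustrative preset expansion: a preset contributes defaults; options given
# explicitly on the command line override them.

PRESETS = {
    "offline_realdb": {
        "mode": "OFFLINE",
        "json_archive": "tests/source-data-doc",
        "db_dsn": "postgresql://user:pwd@localhost:5432/testdb",
    },
}

def expand(preset_name: str, cli_options: dict) -> dict:
    options = dict(PRESETS[preset_name])
    options.update({k: v for k, v in cli_options.items() if v is not None})
    return options

print(expand("offline_realdb", {"keyword": "ORDERS", "mode": None}))
```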
#### 3.3.3 Quick Database Connectivity Check

`python scripts/test_db_connection.py` is the lightest-weight PostgreSQL connectivity probe: it defaults to `TEST_DB_DSN` (or takes `--dsn`), connects, and runs `SELECT 1 AS ok` (customizable via `--query`). Typical uses:

```bash
# Read TEST_DB_DSN from .env/environment
python scripts/test_db_connection.py

# Specify a DSN ad hoc and check the task configuration table
python scripts/test_db_connection.py --dsn postgresql://user:pwd@host:5432/.... --query "SELECT count(*) FROM etl_admin.etl_task"
```

An exit code of 0 means the connection and query succeeded; on a non-zero code, use the database section of Chapter 8's troubleshooting (network, firewall, account permissions, etc.) to locate the problem before running the full ETL.

---
## 4. Project Structure and File Notes

### 4.1 Overall Directory Tree

```text
etl_billiards/
│
├── README.md               # Project overview and usage
├── MIGRATION_GUIDE.md      # Migration guide from older versions
├── requirements.txt        # Python dependencies
├── setup.py                # Install configuration
├── .env.example            # Environment-variable template
├── .gitignore              # Git ignore rules
├── run_etl.sh              # Linux/Mac launch script
├── run_etl.bat             # Windows launch script
│
├── config/                 # Configuration management
│   ├── __init__.py
│   ├── defaults.py         # Default values
│   ├── env_parser.py       # Environment-variable parser
│   └── settings.py         # Main configuration class
│
├── database/               # Database access layer
│   ├── __init__.py
│   ├── connection.py       # Connection management
│   └── operations.py       # Batch-operation wrappers
│
├── api/                    # HTTP API client
│   ├── __init__.py
│   └── client.py           # API client (retry + pagination)
│
├── models/                 # Data-model layer
│   ├── __init__.py
│   ├── parsers.py          # Type parsers
│   └── validators.py       # Data validators
│
├── loaders/                # Data-loader layer
│   ├── __init__.py
│   ├── base_loader.py      # Loader base class
│   ├── dimensions/         # Dimension-table loaders
│   │   ├── __init__.py
│   │   └── member.py       # Member dimension loader
│   └── facts/              # Fact-table loaders
│       ├── __init__.py
│       ├── order.py        # Order fact loader
│       └── payment.py      # Payment record loader
│
├── scd/                    # SCD2 processing layer
│   ├── __init__.py
│   └── scd2_handler.py     # SCD2 history handler
│
├── quality/                # Data-quality layer
│   ├── __init__.py
│   ├── base_checker.py     # Checker base class
│   └── balance_checker.py  # Balance-consistency checker
│
├── orchestration/          # ETL orchestration layer
│   ├── __init__.py
│   ├── scheduler.py        # ETL scheduler
│   ├── task_registry.py    # Task registry (factory pattern)
│   ├── cursor_manager.py   # Cursor manager
│   └── run_tracker.py      # Run tracker
│
├── tasks/                  # ETL task layer
│   ├── __init__.py
│   ├── base_task.py        # Task base class (template method)
│   ├── orders_task.py      # Orders ETL task
│   ├── payments_task.py    # Payments ETL task
│   └── members_task.py     # Members ETL task
│
├── cli/                    # Command-line interface layer
│   ├── __init__.py
│   └── main.py             # CLI entry point
│
├── utils/                  # Utility functions
│   ├── __init__.py
│   └── helpers.py          # Shared helpers
│
├── tests/                  # Test code
│   ├── __init__.py
│   ├── unit/               # Unit tests
│   │   ├── __init__.py
│   │   ├── test_config.py
│   │   └── test_parsers.py
│   ├── testdata_json/      # Test JSON files for cleanse-and-load
│   │   └── XX.json
│   └── integration/        # Integration tests
│       ├── __init__.py
│       └── test_database.py
│
└── docs/                   # Documentation
    └── ARCHITECTURE.md     # Architecture design document
```
### 4.2 Module Responsibilities

- **config/**
  - Unified configuration entry; supports three-layer overrides (defaults, environment variables, CLI arguments).
- **database/**
  - Wraps PostgreSQL connections and batch operations (insert, update, upsert, etc.).
- **api/**
  - Wraps HTTP calls to the upstream business API with retry, pagination, and timeout control.
- **models/**
  - Type parsers (timestamps, amounts, integers, etc.) and business-level data validators.
- **loaders/**
  - Load logic for fact and dimension tables (batch upsert, write-result statistics, etc.).
- **scd/**
  - SCD2 history management for dimension data (validity ranges, version flags, etc.).
- **quality/**
  - Quality-check strategies, e.g. balance consistency and record-count alignment.
- **orchestration/**
  - Task scheduling, task registration, cursor management (incremental windows), run tracking.
- **tasks/**
  - Concrete business tasks (orders, payments, members, etc.) wrapping the full "fetch → process → write → record" flow.
- **cli/**
  - Command-line entry; parses arguments and starts the scheduling flow.
- **utils/**
  - Miscellaneous helpers.
- **tests/**
  - Unit and integration tests.

---
## 5. Architecture Design and Flow

### 5.1 Layered Architecture

```text
┌─────────────────────────────────────┐
│ CLI command-line interface          │ <- cli/main.py
└─────────────┬───────────────────────┘
              │
┌─────────────▼───────────────────────┐
│ Orchestration layer                 │ <- orchestration/
│ (Scheduler, TaskRegistry, ...)      │
└─────────────┬───────────────────────┘
              │
┌─────────────▼───────────────────────┐
│ Tasks layer                         │ <- tasks/
│ (OrdersTask, PaymentsTask, ...)     │
└───┬─────────┬─────────┬─────────────┘
    │         │         │
    ▼         ▼         ▼
┌────────┐ ┌─────┐ ┌──────────┐
│Loaders │ │ SCD │ │ Quality  │ <- loaders/, scd/, quality/
└────────┘ └─────┘ └──────────┘
        │
┌───────▼────────┐
│ Models         │ <- models/
└───────┬────────┘
        │
┌───────▼────────┐
│ API client     │ <- api/
└───────┬────────┘
        │
┌───────▼────────┐
│ Database access│ <- database/
└───────┬────────┘
        │
┌───────▼────────┐
│ Config         │ <- config/
└────────────────┘
```
### 5.2 Layer Responsibilities (current design)

- **CLI layer (`cli/`)**

  - Parses command-line arguments (task list, dry-run, configuration overrides, ...).
  - Initializes configuration and logging, then hands off to the orchestration layer.

- **Orchestration layer (`orchestration/`)**

  - `scheduler.py`: selects the tasks to run based on configuration and CLI arguments; controls execution order and parallelism.
  - `task_registry.py`: task registry; creates task instances by task code (factory pattern).
  - `cursor_manager.py`: manages incremental cursors (time windows / ID cursors).
  - `run_tracker.py`: records the status, statistics, and errors of each task run.

- **Task layer (`tasks/`)**

  - `base_task.py`: defines the task execution template (template method pattern): get the window, call upstream, parse/validate, load, update the cursor.
  - `orders_task.py` / `payments_task.py` / `members_task.py`: concrete task logic (orders, payments, members).

- **Loader / SCD / quality layers**

  - `loaders/`: upsert / insert / update logic per target table.
  - `scd/scd2_handler.py`: SCD2 history management for dimension tables.
  - `quality/`: data-quality checks such as balance reconciliation.

- **Model layer (`models/`)**

  - `parsers.py`: type conversion (string → timestamp, Decimal, int, ...).
  - `validators.py`: field-level and record-level validation.

- **API layer (`api/client.py`)**

  - Wraps HTTP calls; handles retries, timeouts, and pagination.

- **Database layer (`database/`)**

  - Manages database connections and contexts.
  - Provides bulk insert / update / upsert operations.

- **Config layer (`config/`)**
  - Defines configuration defaults.
  - Parses environment variables with type conversion.
  - Exposes a single configuration object.

### 5.3 Design Patterns (as used)

- Factory: task registration / creation (`TaskRegistry`).
- Template method: task execution flow (`BaseTask`).
- Strategy: different loaders / checkers implement different strategies.
- Dependency injection: tasks receive `db`, `api`, `config`, etc. via their constructors.

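A minimal sketch of how the factory and template-method patterns combine (class and method names mirror the description above, not the exact code; the real classes also carry injected `db`/`api`/`config` dependencies):

```python
# Simplified sketch: TaskRegistry as factory, BaseTask as template method.

class BaseTask:
    def run(self) -> dict:
        # Template method: fixed skeleton; steps are filled in by subclasses.
        records = self.fetch()
        parsed = [self.parse(r) for r in records]
        return {"status": "SUCCESS", "loaded": len(parsed)}

    def fetch(self) -> list:
        raise NotImplementedError

    def parse(self, record):
        raise NotImplementedError

class OrdersTask(BaseTask):
    def fetch(self) -> list:
        # Stand-in for the real API call.
        return [{"order_id": 1}, {"order_id": 2}]

    def parse(self, record):
        return record["order_id"]

class TaskRegistry:
    def __init__(self):
        self._tasks = {}

    def register(self, code: str, cls) -> None:
        self._tasks[code] = cls

    def create(self, code: str) -> BaseTask:
        return self._tasks[code]()        # factory: instantiate by task code

registry = TaskRegistry()
registry.register("ORDERS", OrdersTask)
result = registry.create("ORDERS").run()
print(result)
```

New tasks only implement the variable steps; the skeleton in `run()` stays identical across all of them.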
### 5.4 Data and Control Flow

Overall flow:

1. The CLI parses arguments and loads configuration.
2. The scheduler builds the dependencies: database connection, API client, etc.
3. The scheduler iterates over the task configuration, fetching each task class from the `TaskRegistry` and instantiating it.
4. Each task follows the same template:
   - Read the cursor / time window.
   - Call the API to fetch data (possibly paginated).
   - Parse and validate the records.
   - Write to the database via a loader (fact / dimension / SCD2 tables).
   - Run quality checks.
   - Update the cursor and the run record.
5. After all tasks finish, connections are released and the process exits.

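Steps 3 to 5 amount to a loop like the following (a sketch under assumed names; the per-task try/except is what gives the failure isolation described in the error-handling section):

```python
# Sketch of the scheduler's main loop: one try/except per task so a single
# failure cannot abort the whole run.
def run_all(task_codes, registry):
    results = {}
    for code in task_codes:
        try:
            task = registry.create(code)          # factory lookup by task code
            results[code] = task.run()
        except Exception as exc:                  # isolate the failing task
            results[code] = {"status": "FAILED", "error": str(exc)}
    return results

# Minimal stand-in registry, for demonstration only.
class DemoRegistry:
    def create(self, code):
        if code == "BAD":
            raise RuntimeError("unknown task")
        class DemoTask:
            def run(self):
                return {"status": "SUCCESS"}
        return DemoTask()

out = run_all(["ORDERS", "BAD"], DemoRegistry())
print(out)
```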
### 5.5 Error Handling

- A single task failure does not affect the other tasks.
- Database exceptions automatically roll back the current transaction.
- Failed API requests are retried according to configuration; once the retry limit is exceeded, the error is recorded and the task is aborted.
- All errors are written to the logs and the run-tracking table for later diagnosis.

### 5.6 Two-Phase ODS + DWD Strategy (new)

To support replay/backfill and downstream DWD wide-table modeling, the project adds a `billiards_ods` schema and a dedicated set of ODS tasks and loaders:

- **ODS tables**: `billiards_ods.ods_order_settle`, `ods_table_use_detail`, `ods_assistant_ledger`, `ods_assistant_abolish`, `ods_goods_ledger`, `ods_payment`, `ods_refund`, `ods_coupon_verify`, `ods_member`, `ods_member_card`, `ods_package_coupon`, `ods_inventory_stock`, `ods_inventory_change`. Each record stores `store_id + source primary key + payload JSON + fetched_at + source_endpoint`, among other metadata.
- **Generic loader**: `loaders/ods/generic.py::GenericODSLoader` encapsulates the `INSERT ... ON CONFLICT ...` statement and batch-write logic; callers only supply the column names and key columns.
- **ODS tasks**: `tasks/ods_tasks.py` defines a set of tasks via `OdsTaskSpec` (`ODS_ORDER_SETTLE`, `ODS_PAYMENT`, `ODS_ASSISTANT_LEDGER`, ...). They are auto-registered in the `TaskRegistry` and can be run directly with `python -m cli.main --tasks ODS_ORDER_SETTLE,ODS_PAYMENT`.
- **Two-phase pipeline**:
  1. Phase 1 (ODS): call the API or read archived JSON, write the raw records to the ODS tables, and keep the pagination, fetch-time, and source-file metadata.
  2. Phase 2 (DWD/DIM): the fact tasks for orders, payments, coupons, etc. will switch to reading payloads from ODS, parsing/validating them, and writing to the `billiards.fact_*` and `dim_*` tables, avoiding repeated upstream API calls.

> The new unit test `tests/unit/test_ods_tasks.py` covers the load paths for `ODS_ORDER_SETTLE` and `ODS_PAYMENT` and can serve as a template for extending other ODS tasks.

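The generic loader's contract (callers supply only column names and key columns) can be sketched as a SQL builder. This is a hypothetical simplification; the real class in `loaders/ods/generic.py` also handles batching:

```python
# Hypothetical sketch of building the INSERT ... ON CONFLICT statement from
# column names and key columns, in the spirit of GenericODSLoader.
def build_upsert_sql(table: str, columns: list, key_columns: list) -> str:
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    keys = ", ".join(key_columns)
    updates = ", ".join(
        f"{c} = EXCLUDED.{c}" for c in columns if c not in key_columns
    )
    return (
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({keys}) DO UPDATE SET {updates}"
    )

sql = build_upsert_sql(
    "billiards_ods.ods_payment",
    ["store_id", "pay_log_id", "payload", "fetched_at"],
    ["store_id", "pay_log_id"],
)
print(sql)
```

Replaying the same archive is therefore idempotent: a re-fetched record updates the existing ODS row instead of duplicating it.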
---

## 6. Migration Guide (from legacy script to this project)

This section explains how to migrate from the legacy single-file script (e.g. `task_merged.py`) to the current modular project. It is usage documentation for the current project, not a historical comparison.

### 6.1 Core Function Mapping

| Legacy function / class | New location | Purpose |
| ----------------------- | ------------ | ------- |
| `DEFAULTS` dict | `config/defaults.py` | Configuration defaults |
| `build_config()` | `config/settings.py::AppConfig.load()` | Configuration loading |
| `Pg` class | `database/connection.py::DatabaseConnection` | Database connection |
| `http_get_json()` | `api/client.py::APIClient.get()` | API requests |
| `paged_get()` | `api/client.py::APIClient.get_paginated()` | Paginated requests |
| `parse_ts()` | `models/parsers.py::TypeParser.parse_timestamp()` | Timestamp parsing |
| `upsert_fact_order()` | `loaders/facts/order.py::OrderLoader.upsert_orders()` | Order loading |
| `scd2_upsert()` | `scd/scd2_handler.py::SCD2Handler.upsert()` | SCD2 handling |
| `run_task_orders()` | `tasks/orders_task.py::OrdersTask.execute()` | Orders task |
| `main()` | `cli/main.py::main()` | Main entry point |

### 6.2 Typical Migration Steps

1. **Configuration migration**

   - Move settings previously hard-coded in `DEFAULTS` or inside the script into `.env` and `config/defaults.py`.
   - Use `AppConfig.load()` as the single way to obtain configuration.

2. **Parallel-run verification**

   ```bash
   # Legacy script
   python task_merged.py --tasks ORDERS

   # New project
   python -m cli.main --tasks ORDERS
   ```

   Compare the tables and logs produced by both versions to confirm consistency.

3. **Custom-logic migration**

   - Custom cleansing logic from the old script → move into the matching `loaders/` module or task class.
   - Custom tasks → implement under `tasks/` and register in `task_registry`.
   - Custom API calls → extend `api/client.py` or wrap them in a dedicated service class.

4. **Gradual cutover**
   - First run both versions in parallel in a test environment.
   - Then switch production tasks to the new version step by step.

---

## 7. Development and Extension Guide (current project)

### 7.1 Adding a New Task

1. Create a task class under `tasks/`:

```python
from .base_task import BaseTask

class MyTask(BaseTask):
    def get_task_code(self) -> str:
        return "MY_TASK"

    def execute(self) -> dict:
        # 1. Get the time window
        window_start, window_end, _ = self._get_time_window()

        # 2. Fetch data from the API
        records, _ = self.api.get_paginated(...)

        # 3. Parse / validate
        parsed = [self._parse(r) for r in records]

        # 4. Load the data
        loader = MyLoader(self.db)
        inserted, updated, _ = loader.upsert(parsed)

        # 5. Commit and return the result
        self.db.commit()
        return self._build_result("SUCCESS", {
            "inserted": inserted,
            "updated": updated,
        })
```

2. Register it in `orchestration/task_registry.py`:

```python
from tasks.my_task import MyTask

default_registry.register("MY_TASK", MyTask)
```

3. Enable it in the task configuration table (example):

```sql
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
VALUES ('MY_TASK', 123456, TRUE);
```

### 7.2 Adding a New Loader

```python
from loaders.base_loader import BaseLoader

class MyLoader(BaseLoader):
    def upsert(self, records: list) -> tuple:
        sql = (
            "INSERT INTO table_name (...) VALUES (...) "
            "ON CONFLICT (...) DO UPDATE SET ... "
            "RETURNING (xmax = 0) AS inserted"
        )
        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
```

### 7.3 Adding a New Quality Checker

1. Implement the checker under `quality/`, inheriting from `base_checker.py`.
2. Invoke it from the task or scheduling flow to validate data after loading.

### 7.4 Extending Type Parsing and Validation

- Add new type-parsing methods in `models/parsers.py`.
- Add new rules in `models/validators.py` (enum checks, cross-field checks, etc.).

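For example, a new amount parser could look like this (a hypothetical method in the style of `models/parsers.py`; the real `TypeParser` API may differ):

```python
from decimal import Decimal, InvalidOperation

# Hypothetical parser: normalize an amount field that may arrive as a
# string, int, float, empty string, or None.
def parse_amount(value, default=None):
    if value is None or value == "":
        return default
    try:
        # Normalize to two decimal places for money columns.
        return Decimal(str(value)).quantize(Decimal("0.01"))
    except InvalidOperation:
        # Unparseable input falls back to the caller-supplied default.
        return default

print(parse_amount("12.5"))
print(parse_amount(None, Decimal("0")))
```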
---

## 8. Troubleshooting

### 8.1 Database Connection Failures

```text
Error: could not connect to server
```

Checklist:

- Verify that `PG_DSN` and the related database settings are correct.
- Confirm the database service is running and reachable over the network.

### 8.2 API Request Timeouts

```text
Error: requests.exceptions.Timeout
```

Checklist:

- Check the `API_BASE` address and network connectivity.
- Increase the timeout and retry count in configuration if needed.

### 8.3 Module Import Errors

```text
Error: ModuleNotFoundError
```

Checklist:

- Make sure you run from the project root (the directory containing the `etl_billiards/` package).
- Or install the project in editable mode with `pip install -e .`.

### 8.4 Permission Problems

```text
Error: Permission denied
```

Checklist:

- Script not executable: `chmod +x run_etl.sh`.
- On Windows, run as administrator or adjust the permissions on the log / export directories.

---

## 9. Pre-Run Checklist

Before a production run, confirm:

- [ ] Python 3.10+ is installed.
- [ ] `pip install -r requirements.txt` has been run.
- [ ] `.env` is configured correctly (database, API, store ID, paths, ...).
- [ ] The PostgreSQL database is reachable.
- [ ] The API service is reachable and the credentials are valid.
- [ ] The `LOG_ROOT` and `EXPORT_ROOT` directories exist and are writable.

---

## 10. Reference Notes

- This document consolidates the previous quick-start, project-structure, architecture, and migration content into one unified guide for the current project.
- If you later need to split it into multiple documents, split by chapter, e.g. "Quick Start", "Architecture", "Migration Guide", "Development and Extension".

## 11. Run / Debug Modes

- Production keeps only "task mode": registered tasks (ETL/ODS) are executed via the scheduler/CLI; debug scripts are not used.
- Helper scripts available for development/debugging (remove or disable before going live):
  - `python -m etl_billiards.scripts.rebuild_ods_from_json`: rebuilds `billiards_ods` from a local JSON directory, for offline initialization/verification. Environment variables: `PG_DSN` (required), `JSON_DOC_DIR` (optional, default `C:\dev\LLTQ\export\test-json-doc`), `INCLUDE_FILES` (comma-separated file names), `DROP_SCHEMA_FIRST` (default true).
  - If a script must stay in production, document its purpose and the conditions for disabling it in the ops manual to prevent misuse.

## 12. ODS Task Rollout Guide

- Task registration: `etl_billiards/database/seed_ods_tasks.sql` lists the currently enabled ODS tasks. Replace its `store_id` with the actual store, then run:

```
psql "$PG_DSN" -f etl_billiards/database/seed_ods_tasks.sql
```

  The `ON CONFLICT` clause keeps `enabled = true` and avoids duplicates.
- Scheduling: confirm the required ODS tasks are enabled in `etl_admin.etl_task` (task codes are listed in the seed script); the scheduler or CLI `--tasks` can then invoke them.
- Offline backfill: in development, `rebuild_ods_from_json` can initialize ODS from sample JSON; use with caution in production. Records are deduplicated by `(source_file, record_index)` by default.
- Tests: `pytest etl_billiards/tests/unit/test_ods_tasks.py` covers the core ODS tasks; set `ETL_SKIP_DOTENV=1` during tests to skip reading the local `.env`.

## 13. ODS Table Mapping Overview

| ODS table | API path | Data list path |
| ------------------------------------ | ---------------------------------------------------- | ----------------------------- |
| `assistant_accounts_master` | `/PersonnelManagement/SearchAssistantInfo` | data.assistantInfos |
| `assistant_service_records` | `/AssistantPerformance/GetOrderAssistantDetails` | data.orderAssistantDetails |
| `assistant_cancellation_records` | `/AssistantPerformance/GetAbolitionAssistant` | data.abolitionAssistants |
| `goods_stock_movements` | `/GoodsStockManage/QueryGoodsOutboundReceipt` | data.queryDeliveryRecordsList |
| `goods_stock_summary` | `/TenantGoods/GetGoodsStockReport` | data |
| `group_buy_packages` | `/PackageCoupon/QueryPackageCouponList` | data.packageCouponList |
| `group_buy_redemption_records` | `/Site/GetSiteTableUseDetails` | data.siteTableUseDetailsList |
| `member_profiles` | `/MemberProfile/GetTenantMemberList` | data.tenantMemberInfos |
| `member_balance_changes` | `/MemberProfile/GetMemberCardBalanceChange` | data.tenantMemberCardLogs |
| `member_stored_value_cards` | `/MemberProfile/GetTenantMemberCardList` | data.tenantMemberCards |
| `payment_transactions` | `/PayLog/GetPayLogListPage` | data |
| `platform_coupon_redemption_records` | `/Promotion/GetOfflineCouponConsumePageList` | data |
| `recharge_settlements` | `/Site/GetRechargeSettleList` | data.settleList |
| `refund_transactions` | `/Order/GetRefundPayLogList` | data |
| `settlement_records` | `/Site/GetAllOrderSettleList` | data.settleList |
| `settlement_ticket_details` | `/Order/GetOrderSettleTicketNew` | (full raw JSON payload) |
| `site_tables_master` | `/Table/GetSiteTables` | data.siteTables |
| `stock_goods_category_tree` | `/TenantGoodsCategory/QueryPrimarySecondaryCategory` | data.goodsCategoryList |
| `store_goods_master` | `/TenantGoods/GetGoodsInventoryList` | data.orderGoodsList |
| `store_goods_sales_records` | `/TenantGoods/GetGoodsSalesList` | data.orderGoodsLedgers |
| `table_fee_discount_records` | `/Site/GetTaiFeeAdjustList` | data.taiFeeAdjustInfos |
| `table_fee_transactions` | `/Site/GetSiteTableOrderDetails` | data.siteTableUseDetailsList |
| `tenant_goods_master` | `/TenantGoods/QueryTenantGoods` | data.tenantGoodsList |

## 14. ODS Environment Variables / Defaults

- `.env` / environment variables:
  - `JSON_DOC_DIR`: directory of sample ODS JSON files (for development/backfill)
  - `ODS_INCLUDE_FILES`: restrict which files are imported (comma-separated, without `.json`)
  - `ODS_DROP_SCHEMA_FIRST`: true/false, whether to rebuild the schema first
  - `ETL_SKIP_DOTENV`: set to 1 in tests/CI to skip reading the local `.env`
- `ods` defaults in `config/defaults.py`:
  - `json_doc_dir`: `C:\dev\LLTQ\export\test-json-doc`
  - `include_files`: `""`
  - `drop_schema_first`: `True`

---

## 15. DWD Modeling: "Business Events"

1. One unique, atomic grain

   - A DWD table has exactly one business grain, for example:
     - one record = one settlement;
     - one record = one table-fee ledger entry;
     - one record = one assistant service session;
     - one record = one member balance change.
   - A table must not mix "order header" rows with "order line" rows, nor summary rows with detail rows.
   - Once the grain is fixed, every column must match it:
     - the settlement header table must not carry per-item goods detail;
     - the goods-detail table must not carry order-level totals.
   - This is the single most important rule of the DWD layer.

2. Model by business process, not by JSON list

   - First map out the real business chain:
     - open / move / close table → table-fee ledger
     - assistant joins a table → assistant service ledger / cancellation events
     - ordering goods → goods sales ledger
     - top-up / spend → balance changes / recharge orders
     - settlement → settlement header + payment ledger / refund ledger
     - group-buy / platform coupons → redemption ledger

3. Explicit primary keys, unified foreign keys

   - Every DWD table must have a business primary key (even if it is just the API-supplied id); do not rely on database auto-increment.
   - Fields that represent the same concept must share one name and one meaning:
     - store: always site_id, mapped to siteProfile.id;
     - member: always member_id, mapped to member_profiles.id, with system_member_id as a separate column;
     - table: always table_id, mapped to site_tables_master.id;
     - settlement: always order_settle_id;
     - order: always order_trade_no, and so on.
   - Otherwise joining the tables later, in DWS or for AI consumption, becomes very painful.

4. Keep detail; avoid premature aggregation

   - DWD fact tables hold detail-level data only:
     - do not compute daily/weekly/monthly summaries in DWD; that belongs in DWS;
     - do not collapse multiple events into one row (e.g. one table holding both daily summaries and individual entries).
   - When aggregation is needed, build subject-oriented wide tables in DWS:
     - dws_member_day_profile, dws_site_day_summary, etc.
   - DWD is responsible only for the fine-grained truth.

5. Unified cleansing and standardization, with traceability

   - Cleansing that must happen in the DWD layer:
     - type conversion: string timestamps → timestamp types, amounts unified as decimal, booleans unified as 0/1;
     - unit normalization: seconds vs minutes, yuan vs fen;
     - enum standardization: status and type codes get fixed meanings in DWD; build enum dimension tables where necessary.
   - At the same time, guarantee that:
     - every DWD record can be traced back to ODS:
       - keep the source-system primary key;
       - keep the original timestamp / amount fields (never overwrite them).

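The conversions in rule 5 can be sketched as follows (field names are illustrative; the source keeps fen amounts and string timestamps, and the cleansed row carries normalized columns alongside the untouched originals):

```python
from datetime import datetime
from decimal import Decimal

# Sketch of rule 5: normalize types and units, but keep the raw fields
# next to them for traceability back to ODS.
def cleanse(raw: dict) -> dict:
    return {
        # normalized columns
        "pay_time": datetime.strptime(raw["payTime"], "%Y-%m-%d %H:%M:%S"),
        "amount_yuan": Decimal(raw["amountFen"]) / 100,    # fen -> yuan
        "is_member": 1 if raw["memberFlag"] else 0,        # bool -> 0/1
        # raw fields kept alongside, never overwritten
        "src_pay_time": raw["payTime"],
        "src_amount_fen": raw["amountFen"],
    }

row = cleanse({"payTime": "2025-01-01 10:00:00", "amountFen": 1250, "memberFlag": True})
print(row["amount_yuan"])
```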
6. Flatten and de-nest

   - The typical JSON structure is: pagination wrapper + header + detail array + assorted nested objects (siteProfile, tableProfile, goodsLedgers, ...).
   - The DWD rules are:
     - drop the pagination wrapper;
     - split arrays into child tables (header table / line table);
     - extract recurring profiles into dimension tables (store, table, goods, member, ...).
   - Goal: every DWD table is a flat two-dimensional table with no nested JSON.

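Rule 6 in miniature: drop the wrapper, split the array into header and line rows, and point each line back at the header key (field names are illustrative, loosely modeled on the settlement payloads):

```python
# Sketch of de-nesting one settlement response into a header row and line rows.
def flatten(response: dict):
    headers, lines = [], []
    for settle in response["data"]["settleList"]:       # drop pagination wrapper
        headers.append({
            "order_settle_id": settle["id"],
            "site_id": settle["siteProfile"]["id"],      # nested profile -> FK
        })
        for i, goods in enumerate(settle.get("goodsLedgers", [])):
            lines.append({
                "order_settle_id": settle["id"],         # link line to header
                "line_no": i + 1,
                "goods_id": goods["goodsId"],
            })
    return headers, lines

resp = {"data": {"settleList": [
    {"id": 9, "siteProfile": {"id": 1},
     "goodsLedgers": [{"goodsId": 7}, {"goodsId": 8}]},
]}}
heads, lines = flatten(resp)
print(len(heads), len(lines))
```

The recurring `siteProfile` object is reduced to a foreign key here; its full content belongs in the `dim_site` dimension table.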
7. Keep the model stable and extensible

   - DWD table structures should change as little as possible; absorb new requirements by:
     - adding columns;
     - creating new fact / dimension tables;
     - deriving metrics in DWS;
   - rather than repeatedly restructuring existing DWD tables.
   - This also matters for feeding the data to an LLM later: the prompts and schema understanding the AI relies on should change as little as possible.

1886  etl_billiards/database/schema_ODS_doc copy.sql  (new file; diff suppressed because it is too large)
1907  etl_billiards/database/schema_ODS_doc.sql       (new file; diff suppressed because it is too large)

@@ -2,22 +2,117 @@

CREATE SCHEMA IF NOT EXISTS billiards_dwd;
SET search_path TO billiards_dwd;

-- Unified SCD2 column defaults, column comments, and uniqueness control
-- (business key + non-overlapping validity ranges)
CREATE EXTENSION IF NOT EXISTS btree_gist;

DO $$
DECLARE
    rec RECORD;
BEGIN
    -- Apply uniform SCD2 defaults and comments, to avoid manual omissions later
    FOR rec IN
        SELECT table_name
        FROM information_schema.columns
        WHERE table_schema = 'billiards_dwd'
          AND column_name = 'scd2_start_time'
    LOOP
        EXECUTE format('ALTER TABLE billiards_dwd.%I ALTER COLUMN scd2_start_time SET DEFAULT now()', rec.table_name);
        EXECUTE format('ALTER TABLE billiards_dwd.%I ALTER COLUMN scd2_end_time SET DEFAULT ''9999-12-31''', rec.table_name);
        EXECUTE format('ALTER TABLE billiards_dwd.%I ALTER COLUMN scd2_is_current SET DEFAULT 1', rec.table_name);
        EXECUTE format('ALTER TABLE billiards_dwd.%I ALTER COLUMN scd2_version SET DEFAULT 1', rec.table_name);

        EXECUTE format('COMMENT ON COLUMN billiards_dwd.%I.scd2_start_time IS ''SCD2 start time (beginning of version validity)''', rec.table_name);
        EXECUTE format('COMMENT ON COLUMN billiards_dwd.%I.scd2_end_time IS ''SCD2 end time (default 9999-12-31, meaning the version is still current)''', rec.table_name);
        EXECUTE format('COMMENT ON COLUMN billiards_dwd.%I.scd2_is_current IS ''SCD2 current-version flag: 1 = current version, 0 = historical version''', rec.table_name);
        EXECUTE format('COMMENT ON COLUMN billiards_dwd.%I.scd2_version IS ''SCD2 version number, incremented; combined with the validity range to avoid overlap''', rec.table_name);
    END LOOP;

    -- Constraints: non-overlapping validity ranges per business key, and only one current version
    FOR rec IN (
        SELECT tc.table_name,
               string_agg(format('%I WITH =', kcu.column_name), ', ' ORDER BY kcu.ordinal_position) AS pk_eq_expr,
               string_agg(format('%I', kcu.column_name), ', ' ORDER BY kcu.ordinal_position) AS pk_cols
        FROM information_schema.table_constraints tc
        JOIN information_schema.key_column_usage kcu
          ON tc.table_schema = kcu.table_schema
         AND tc.table_name = kcu.table_name
         AND tc.constraint_name = kcu.constraint_name
        WHERE tc.table_schema = 'billiards_dwd'
          AND tc.constraint_type = 'PRIMARY KEY'
          AND EXISTS (
              SELECT 1 FROM information_schema.columns c
              WHERE c.table_schema = 'billiards_dwd'
                AND c.table_name = tc.table_name
                AND c.column_name = 'scd2_start_time'
          )
        GROUP BY tc.table_name
    )
    LOOP
        IF NOT EXISTS (
            SELECT 1 FROM pg_constraint
            WHERE conname = format('%s_scd2_no_overlap', rec.table_name)
              AND conrelid = format('billiards_dwd.%s', rec.table_name)::regclass
        ) THEN
            EXECUTE format(
                'ALTER TABLE billiards_dwd.%I ADD CONSTRAINT %I EXCLUDE USING gist (%s, tstzrange(scd2_start_time, scd2_end_time) WITH &&) WHERE (scd2_is_current = 1);',
                rec.table_name,
                rec.table_name || '_scd2_no_overlap',
                rec.pk_eq_expr
            );
        END IF;

        IF to_regclass(format('billiards_dwd.%s_scd2_current_unique_idx', rec.table_name)) IS NULL THEN
            EXECUTE format(
                'CREATE UNIQUE INDEX %I ON billiards_dwd.%I (%s) WHERE (scd2_is_current = 1);',
                rec.table_name || '_scd2_current_unique_idx',
                rec.table_name,
                rec.pk_cols
            );
        END IF;
    END LOOP;
END
$$;

-- SCD2 conventions (used by DIM tables):
-- SCD2_start_time   TIMESTAMPTZ DEFAULT now()           -- version start time
-- SCD2_end_time     TIMESTAMPTZ DEFAULT '9999-12-31'    -- version end time
-- SCD2_is_current   INT DEFAULT 1                       -- current-version flag (1 current / 0 historical)
-- SCD2_version      INT DEFAULT 1                       -- version number, incremented

-- dim_site
CREATE TABLE IF NOT EXISTS dim_site (
    site_id               BIGINT,
    org_id                BIGINT,
    tenant_id             BIGINT,
    shop_name             TEXT,
    site_label            TEXT,
    full_address          TEXT,
    address               TEXT,
    longitude             NUMERIC(10,6),
    latitude              NUMERIC(10,6),
    tenant_site_region_id BIGINT,
    business_tel          TEXT,
    site_type             INTEGER,
    shop_status           INTEGER,
    SCD2_start_time       TIMESTAMPTZ DEFAULT now(),
    SCD2_end_time         TIMESTAMPTZ DEFAULT '9999-12-31',
    SCD2_is_current       INT DEFAULT 1,
    SCD2_version          INT DEFAULT 1,
    PRIMARY KEY (site_id)
);
COMMENT ON COLUMN dim_site.site_id IS 'Store primary key ID, uniquely identifying one store. Matches site_id in all fact tables. | Source: siteProfile.id | Role: primary key';
COMMENT ON COLUMN dim_site.org_id IS 'Parent organization ID, used for regional grouping. | Source: siteProfile.org_id | Role: foreign key';
COMMENT ON COLUMN dim_site.tenant_id IS 'Tenant ID. Matches tenant_id in other tables. | Source: siteProfile.tenant_id | Role: foreign key';
COMMENT ON COLUMN dim_site.shop_name IS 'Store name, for display. | Source: siteProfile.shop_name';
COMMENT ON COLUMN dim_site.site_label IS 'Store label. | Source: siteProfile.site_label';
COMMENT ON COLUMN dim_site.full_address IS 'Full store address. | Source: siteProfile.full_address';
COMMENT ON COLUMN dim_site.address IS 'Short-form address. | Source: siteProfile.address';
COMMENT ON COLUMN dim_site.longitude IS 'Store longitude. | Source: siteProfile.longitude';
COMMENT ON COLUMN dim_site.latitude IS 'Store latitude. | Source: siteProfile.latitude';
COMMENT ON COLUMN dim_site.tenant_site_region_id IS 'Tenant site region ID. | Source: siteProfile.tenant_site_region_id';
COMMENT ON COLUMN dim_site.business_tel IS 'Store phone number. | Source: siteProfile.business_tel';
COMMENT ON COLUMN dim_site.site_type IS 'Site type code. | Source: siteProfile.site_type';
COMMENT ON COLUMN dim_site.shop_status IS 'Store status code. | Source: siteProfile.shop_status';

-- dim_site_Ex
CREATE TABLE IF NOT EXISTS dim_site_Ex (
@@ -42,6 +137,10 @@ CREATE TABLE IF NOT EXISTS dim_site_Ex (
    shop_status INTEGER,
    create_time TIMESTAMPTZ,
    update_time TIMESTAMPTZ,
    SCD2_start_time TIMESTAMPTZ DEFAULT now(),
    SCD2_end_time TIMESTAMPTZ DEFAULT '9999-12-31',
    SCD2_is_current INT DEFAULT 1,
    SCD2_version INT DEFAULT 1,
    PRIMARY KEY (site_id)
);
COMMENT ON COLUMN dim_site_Ex.site_id IS 'Store primary key ID, uniquely identifying one store. Matches site_id in all fact tables. | Source: siteProfile.id | Role: primary key';
@@ -69,17 +168,19 @@ COMMENT ON COLUMN dim_site_Ex.update_time IS 'Store last update time. | Sou
-- dim_table
CREATE TABLE IF NOT EXISTS dim_table (
    table_id BIGINT,
    tenant_id BIGINT,
    site_id BIGINT,
    table_name TEXT,
    site_table_area_id BIGINT,
    site_table_area_name TEXT,
    tenant_table_area_id BIGINT,
    table_price NUMERIC(18,2),
    SCD2_start_time TIMESTAMPTZ DEFAULT now(),
    SCD2_end_time TIMESTAMPTZ DEFAULT '9999-12-31',
    SCD2_is_current INT DEFAULT 1,
    SCD2_version INT DEFAULT 1,
    PRIMARY KEY (table_id)
);
COMMENT ON COLUMN dim_table.table_id IS 'Table primary key, uniquely identifying one table or private room. | Source: id | Role: primary key';
COMMENT ON COLUMN dim_table.tenant_id IS 'Tenant ID. | Source: tenantId | Role: foreign key';
COMMENT ON COLUMN dim_table.site_id IS 'Store ID. | Source: siteId | Role: foreign key';
COMMENT ON COLUMN dim_table.table_name IS 'Table name/number, e.g. A17, 888. | Source: tableName';
COMMENT ON COLUMN dim_table.site_table_area_id IS 'Store area ID, distinguishing area A / area B / overtime area, etc. | Source: siteTableAreaId | Role: foreign key';
@@ -95,8 +196,10 @@ CREATE TABLE IF NOT EXISTS dim_table_Ex (
    table_cloth_use_time INTEGER,
    table_cloth_use_cycle INTEGER,
    table_status INTEGER,
    last_maintenance_time TIMESTAMPTZ,
    remark TEXT,
    SCD2_start_time TIMESTAMPTZ DEFAULT now(),
    SCD2_end_time TIMESTAMPTZ DEFAULT '9999-12-31',
    SCD2_is_current INT DEFAULT 1,
    SCD2_version INT DEFAULT 1,
    PRIMARY KEY (table_id)
);
COMMENT ON COLUMN dim_table_Ex.table_id IS 'Table primary key, uniquely identifying one table or private room. | Source: id | Role: primary key';
@@ -105,8 +208,6 @@ COMMENT ON COLUMN dim_table_Ex.is_online_reservation IS 'Whether online reservation is allowed
COMMENT ON COLUMN dim_table_Ex.table_cloth_use_time IS 'Cloth usage time so far (seconds). | Source: tableClothUseTime';
COMMENT ON COLUMN dim_table_Ex.table_cloth_use_cycle IS 'Cloth replacement cycle threshold (seconds). | Source: tableClothUseCycle';
COMMENT ON COLUMN dim_table_Ex.table_status IS 'Current table status: 1=idle, 2=in use, 3=paused, 4=locked. | Source: tableStatus';
COMMENT ON COLUMN dim_table_Ex.last_maintenance_time IS 'Last maintenance time (not present in the JSON). | Source: lastMaintenanceTime';
COMMENT ON COLUMN dim_table_Ex.remark IS 'Remarks. | Source: remark';

-- dim_assistant
CREATE TABLE IF NOT EXISTS dim_assistant (
@@ -125,6 +226,10 @@ CREATE TABLE IF NOT EXISTS dim_assistant (
    resign_time TIMESTAMPTZ,
    leave_status INTEGER,
    assistant_status INTEGER,
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (assistant_id)
);
COMMENT ON COLUMN dim_assistant.assistant_id IS 'Assistant account ID, linked to the assistant service ledger. | Source: id | Role: primary key';
@@ -189,6 +294,10 @@ CREATE TABLE IF NOT EXISTS dim_assistant_Ex (
    light_status INTEGER,
    is_team_leader INTEGER,
    serial_number BIGINT,
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (assistant_id)
);
COMMENT ON COLUMN dim_assistant_Ex.assistant_id IS 'Assistant account ID, linked to the assistant service ledger. | Source: id | Role: primary key';
@@ -248,6 +357,10 @@ CREATE TABLE IF NOT EXISTS dim_member (
    member_card_grade_name TEXT,
    create_time TIMESTAMPTZ,
    update_time TIMESTAMPTZ,
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (member_id)
);
COMMENT ON COLUMN dim_member.member_id IS 'Member primary key within the tenant. | Source: id | Role: primary key';
@@ -259,7 +372,6 @@ COMMENT ON COLUMN dim_member.nickname IS 'Nickname (not necessarily the real name). |
COMMENT ON COLUMN dim_member.member_card_grade_code IS 'Member grade code: 1=gold? 2=silver? 3=diamond? 4=black? (per the enums in the MD document). | Source: member_card_grade_code';
COMMENT ON COLUMN dim_member.member_card_grade_name IS 'Grade name, human-readable. | Source: member_card_grade_name';
COMMENT ON COLUMN dim_member.create_time IS 'Member profile creation time. | Source: create_time';
COMMENT ON COLUMN dim_member.update_time IS 'Last update time. | Source: update_time';

-- dim_member_Ex
CREATE TABLE IF NOT EXISTS dim_member_Ex (
@@ -270,6 +382,10 @@ CREATE TABLE IF NOT EXISTS dim_member_Ex (
    growth_value NUMERIC(18,2),
    user_status INTEGER,
    status INTEGER,
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (member_id)
);
COMMENT ON COLUMN dim_member_Ex.member_id IS 'Member primary key within the tenant. | Source: id | Role: primary key';
@@ -299,6 +415,10 @@ CREATE TABLE IF NOT EXISTS dim_member_card_account (
    last_consume_time TIMESTAMPTZ,
    status INTEGER,
    is_delete INTEGER,
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (member_card_id)
);
COMMENT ON COLUMN dim_member_card_account.member_card_id IS 'Member card account primary key, uniquely identifying one physical card. | Source: id | Role: primary key';
@@ -373,6 +493,10 @@ CREATE TABLE IF NOT EXISTS dim_member_card_account_Ex (
    goodsCategoryId TEXT,
    pdAssisnatLevel TEXT,
    cxAssisnatLevel TEXT,
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (member_card_id)
);
COMMENT ON COLUMN dim_member_card_account_Ex.member_card_id IS 'Member card account primary key, uniquely identifying one physical card. | Source: id | Role: primary key';
@@ -444,6 +568,10 @@ CREATE TABLE IF NOT EXISTS dim_tenant_goods (
    create_time TIMESTAMPTZ,
    update_time TIMESTAMPTZ,
    is_delete INTEGER,
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (tenant_goods_id)
);
COMMENT ON COLUMN dim_tenant_goods.tenant_goods_id IS 'Tenant-level goods master primary key ID, uniquely identifying one goods record. All business fact tables (sales, inventory, ...) referring to tenant-level goods should point here. | Source: id | Role: primary key';
@@ -481,6 +609,10 @@ CREATE TABLE IF NOT EXISTS dim_tenant_goods_Ex (
    common_sale_royalty INTEGER,
    point_sale_royalty INTEGER,
    out_goods_id BIGINT,
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (tenant_goods_id)
);
COMMENT ON COLUMN dim_tenant_goods_Ex.tenant_goods_id IS 'Tenant-level goods master primary key ID, uniquely identifying one goods record. All business fact tables (sales, inventory, ...) referring to tenant-level goods should point here. | Source: id | Role: primary key';
@@ -524,6 +656,10 @@ CREATE TABLE IF NOT EXISTS dim_store_goods (
    enable_status INTEGER,
    send_state INTEGER,
    is_delete INTEGER,
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (site_goods_id)
);
COMMENT ON COLUMN dim_store_goods.site_goods_id IS 'Store-level goods ID, primary key of this table; site_goods_id in other business tables maps here for inventory, sales, and similar joins. | Source: id | Role: primary key';
@@ -575,6 +711,10 @@ CREATE TABLE IF NOT EXISTS dim_store_goods_Ex (
    option_required INTEGER,
    remark TEXT,
    sort_order INTEGER,
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (site_goods_id)
);
COMMENT ON COLUMN dim_store_goods_Ex.site_goods_id IS 'Store-level goods ID, primary key of this table; site_goods_id in other business tables maps here for inventory, sales, and similar joins. | Source: id | Role: primary key';
@@ -618,6 +758,10 @@ CREATE TABLE IF NOT EXISTS dim_goods_category (
    open_salesman INTEGER,
    sort_order INTEGER,
    is_warehousing INTEGER,
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (category_id)
);
COMMENT ON COLUMN dim_goods_category.category_id IS 'Category node primary key. Comes from the category-tree node id and is unique within the goods-category dimension. Used as the goods-category foreign key in fact tables. | Source: id | Role: primary key';
@@ -651,6 +795,10 @@ CREATE TABLE IF NOT EXISTS dim_groupbuy_package (
    create_time TIMESTAMPTZ,
    tenant_table_area_id_list VARCHAR(512),
    card_type_ids VARCHAR(255),
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (groupbuy_package_id)
);
COMMENT ON COLUMN dim_groupbuy_package.groupbuy_package_id IS 'Store-side group-buy package primary key. One package definition per record, referenced by group-buy coupon redemptions. group_package_id in platform coupon-verification records usually points here. | Source: id | Role: primary key';
@@ -692,6 +840,10 @@ CREATE TABLE IF NOT EXISTS dim_groupbuy_package_Ex (
    effective_status INTEGER,
    max_selectable_categories INTEGER,
    creator_name VARCHAR(100),
    SCD2_start_time TIMESTAMPTZ,
    SCD2_end_time TIMESTAMPTZ,
    SCD2_is_current INT,
    SCD2_version INT,
    PRIMARY KEY (groupbuy_package_id)
);
COMMENT ON COLUMN dim_groupbuy_package_Ex.groupbuy_package_id IS 'Store-side group-buy package primary key. One package definition per record, referenced by group-buy coupon redemptions. group_package_id in platform coupon-verification records usually points here. | Source: id | Role: primary key';

105  etl_billiards/database/schema_etl_admin.sql  (new file)
@@ -0,0 +1,105 @@

-- File purpose: DDL for the etl_admin scheduling metadata (kept in its own file so initialization can run it separately).
-- Contains the task registry, cursor table, and run-record table, with column comments.

CREATE SCHEMA IF NOT EXISTS etl_admin;

CREATE TABLE IF NOT EXISTS etl_admin.etl_task (
    task_id BIGSERIAL PRIMARY KEY,
    task_code TEXT NOT NULL,
    store_id BIGINT NOT NULL,
    enabled BOOLEAN DEFAULT TRUE,
    cursor_field TEXT,
    window_minutes_default INT DEFAULT 30,
    overlap_seconds INT DEFAULT 120,
    page_size INT DEFAULT 200,
    retry_max INT DEFAULT 3,
    params JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now(),
    UNIQUE (task_code, store_id)
);
COMMENT ON TABLE etl_admin.etl_task IS 'Task registry: the task list the scheduler works from (matches the task codes in task_registry).';
COMMENT ON COLUMN etl_admin.etl_task.task_code IS 'Task code; must match the task code in the codebase.';
COMMENT ON COLUMN etl_admin.etl_task.store_id IS 'Store/tenant granularity, to separate multi-store execution.';
COMMENT ON COLUMN etl_admin.etl_task.enabled IS 'Whether this task is enabled.';
COMMENT ON COLUMN etl_admin.etl_task.cursor_field IS 'Incremental cursor field name (optional).';
COMMENT ON COLUMN etl_admin.etl_task.window_minutes_default IS 'Default time window (minutes).';
COMMENT ON COLUMN etl_admin.etl_task.overlap_seconds IS 'Window overlap in seconds, to avoid missed records.';
COMMENT ON COLUMN etl_admin.etl_task.page_size IS 'Default page size.';
COMMENT ON COLUMN etl_admin.etl_task.retry_max IS 'Maximum number of API retries.';
COMMENT ON COLUMN etl_admin.etl_task.params IS 'Task-level custom parameters as JSON.';
COMMENT ON COLUMN etl_admin.etl_task.created_at IS 'Creation time.';
COMMENT ON COLUMN etl_admin.etl_task.updated_at IS 'Update time.';

CREATE TABLE IF NOT EXISTS etl_admin.etl_cursor (
    cursor_id BIGSERIAL PRIMARY KEY,
    task_id BIGINT NOT NULL REFERENCES etl_admin.etl_task(task_id) ON DELETE CASCADE,
    store_id BIGINT NOT NULL,
    last_start TIMESTAMPTZ,
    last_end TIMESTAMPTZ,
    last_id BIGINT,
    last_run_id BIGINT,
    extra JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now(),
    UNIQUE (task_id, store_id)
);
COMMENT ON TABLE etl_admin.etl_cursor IS 'Task cursor table: records the incremental window and last run per task/store.';
COMMENT ON COLUMN etl_admin.etl_cursor.task_id IS 'References etl_task.task_id.';
COMMENT ON COLUMN etl_admin.etl_cursor.store_id IS 'Store/tenant granularity.';
COMMENT ON COLUMN etl_admin.etl_cursor.last_start IS 'Start of the last window (including the overlap offset).';
COMMENT ON COLUMN etl_admin.etl_cursor.last_end IS 'End of the last window.';
COMMENT ON COLUMN etl_admin.etl_cursor.last_id IS 'Largest primary key / cursor value processed last time (optional).';
COMMENT ON COLUMN etl_admin.etl_cursor.last_run_id IS 'Last run ID, matching etl_run.run_id.';
COMMENT ON COLUMN etl_admin.etl_cursor.extra IS 'Extra cursor information as JSON.';
COMMENT ON COLUMN etl_admin.etl_cursor.created_at IS 'Creation time.';
COMMENT ON COLUMN etl_admin.etl_cursor.updated_at IS 'Update time.';

CREATE TABLE IF NOT EXISTS etl_admin.etl_run (
    run_id BIGSERIAL PRIMARY KEY,
    run_uuid TEXT NOT NULL,
    task_id BIGINT NOT NULL REFERENCES etl_admin.etl_task(task_id) ON DELETE CASCADE,
    store_id BIGINT NOT NULL,
    status TEXT NOT NULL,
    started_at TIMESTAMPTZ DEFAULT now(),
    ended_at TIMESTAMPTZ,
    window_start TIMESTAMPTZ,
    window_end TIMESTAMPTZ,
    window_minutes INT,
    overlap_seconds INT,
    fetched_count INT DEFAULT 0,
    loaded_count INT DEFAULT 0,
    updated_count INT DEFAULT 0,
    skipped_count INT DEFAULT 0,
    error_count INT DEFAULT 0,
    unknown_fields INT DEFAULT 0,
    export_dir TEXT,
    log_path TEXT,
    request_params JSONB DEFAULT '{}'::jsonb,
    manifest JSONB DEFAULT '{}'::jsonb,
    error_message TEXT,
|
||||
extra JSONB DEFAULT '{}'::jsonb
|
||||
);
|
||||
COMMENT ON TABLE etl_admin.etl_run IS '运行记录表:记录每次任务执行的窗口、状态、计数与日志路径。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.run_uuid IS '本次调度的唯一标识。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.task_id IS '关联 etl_task.task_id。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.store_id IS '门店/租户粒度。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.status IS '运行状态(SUCC/FAIL/PARTIAL 等)。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.started_at IS '开始时间。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.ended_at IS '结束时间。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.window_start IS '本次窗口开始时间。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.window_end IS '本次窗口结束时间。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.window_minutes IS '窗口跨度(分钟)。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.overlap_seconds IS '窗口重叠秒数。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.fetched_count IS '抓取/读取的记录数。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.loaded_count IS '插入的记录数。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.updated_count IS '更新的记录数。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.skipped_count IS '跳过的记录数。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.error_count IS '错误记录数。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.unknown_fields IS '未知字段计数(清洗阶段)。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.export_dir IS '抓取/导出目录。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.log_path IS '日志路径。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.request_params IS '请求参数 JSON。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.manifest IS '运行产出清单/统计 JSON。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.error_message IS '错误信息(若失败)。';
|
||||
COMMENT ON COLUMN etl_admin.etl_run.extra IS '附加字段,保留扩展。';
File diff suppressed because it is too large
@@ -1,34 +1,35 @@
-- Register the new ODS tasks into etl_admin.etl_task (replace store_id as needed).
-- Usage (example):
--   psql "$PG_DSN" -f etl_billiards/database/seed_ods_tasks.sql
-- or execute the contents of this file inside psql.

WITH target_store AS (
    SELECT 2790685415443269::bigint AS store_id  -- TODO: replace with the real store_id
),
task_codes AS (
    SELECT unnest(ARRAY[
        'ODS_ASSISTANT_ACCOUNTS',
        'ODS_ASSISTANT_LEDGER',
        'ODS_ASSISTANT_ABOLISH',
        'ODS_INVENTORY_CHANGE',
        'assistant_accounts_master',
        'assistant_service_records',
        'assistant_cancellation_records',
        'goods_stock_movements',
        'ODS_INVENTORY_STOCK',
        'ODS_PACKAGE',
        'ODS_GROUP_BUY_REDEMPTION',
        'ODS_MEMBER',
        'ODS_MEMBER_BALANCE',
        'ODS_MEMBER_CARD',
        'member_stored_value_cards',
        'ODS_PAYMENT',
        'ODS_REFUND',
        'ODS_COUPON_VERIFY',
        'ODS_RECHARGE_SETTLE',
        'platform_coupon_redemption_records',
        'recharge_settlements',
        'ODS_TABLES',
        'ODS_GOODS_CATEGORY',
        'ODS_STORE_GOODS',
        'ODS_TABLE_DISCOUNT',
        'table_fee_discount_records',
        'ODS_TENANT_GOODS',
        'ODS_SETTLEMENT_TICKET',
        'ODS_ORDER_SETTLE',
        'settlement_records',
        'INIT_ODS_SCHEMA'
    ]) AS task_code
)
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
@@ -37,3 +38,4 @@ FROM task_codes t CROSS JOIN target_store s
ON CONFLICT (task_code, store_id) DO UPDATE
SET enabled = EXCLUDED.enabled;
52
etl_billiards/ods_row_report.json
Normal file
@@ -0,0 +1,52 @@
{
  "source_counts": {
    "assistant_accounts_master.json": 2,
    "assistant_cancellation_records.json": 2,
    "assistant_service_records.json": 2,
    "goods_stock_movements.json": 2,
    "goods_stock_summary.json": 161,
    "group_buy_packages.json": 2,
    "group_buy_redemption_records.json": 2,
    "member_balance_changes.json": 2,
    "member_profiles.json": 2,
    "member_stored_value_cards.json": 2,
    "payment_transactions.json": 200,
    "platform_coupon_redemption_records.json": 200,
    "recharge_settlements.json": 2,
    "refund_transactions.json": 11,
    "settlement_records.json": 2,
    "settlement_ticket_details.json": 193,
    "site_tables_master.json": 2,
    "stock_goods_category_tree.json": 2,
    "store_goods_master.json": 2,
    "store_goods_sales_records.json": 2,
    "table_fee_discount_records.json": 2,
    "table_fee_transactions.json": 2,
    "tenant_goods_master.json": 2
  },
  "ods_counts": {
    "member_profiles": 199,
    "member_balance_changes": 200,
    "member_stored_value_cards": 200,
    "recharge_settlements": 75,
    "settlement_records": 200,
    "assistant_cancellation_records": 15,
    "assistant_accounts_master": 50,
    "assistant_service_records": 200,
    "site_tables_master": 71,
    "table_fee_discount_records": 200,
    "table_fee_transactions": 200,
    "goods_stock_movements": 200,
    "stock_goods_category_tree": 9,
    "goods_stock_summary": 161,
    "payment_transactions": 200,
    "refund_transactions": 11,
    "platform_coupon_redemption_records": 200,
    "tenant_goods_master": 156,
    "group_buy_packages": 17,
    "group_buy_redemption_records": 200,
    "settlement_ticket_details": 193,
    "store_goods_master": 161,
    "store_goods_sales_records": 200
  }
}
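In this report many `source_counts` entries are small (2-row sample files) while the matching ODS tables hold fuller loads, so a plain key-by-key diff is the useful check. A hypothetical helper (names assumed) that pairs `name.json` entries with ODS tables by stem and flags differing counts:

```python
def count_mismatches(source_counts, ods_counts):
    """Pair 'name.json' entries with ODS row counts by file stem and
    return {stem: (source_rows, ods_rows)} for every differing pair."""
    diffs = {}
    for fname, src_n in source_counts.items():
        stem = fname.rsplit(".", 1)[0]  # strip the .json suffix
        ods_n = ods_counts.get(stem)
        if ods_n is not None and ods_n != src_n:
            diffs[stem] = (src_n, ods_n)
    return diffs
```

Applied to the report above, stems such as `goods_stock_summary` (161 vs 161) pass, while the 2-row sample files unsurprisingly differ from their ODS tables.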
@@ -15,10 +15,14 @@ from tasks.table_discount_task import TableDiscountTask
from tasks.assistant_abolish_task import AssistantAbolishTask
from tasks.ledger_task import LedgerTask
from tasks.ods_tasks import ODS_TASK_CLASSES
from tasks.ticket_dwd_task import TicketDwdTask
from tasks.manual_ingest_task import ManualIngestTask
from tasks.payments_dwd_task import PaymentsDwdTask
from tasks.members_dwd_task import MembersDwdTask
from tasks.init_schema_task import InitOdsSchemaTask
from tasks.init_dwd_schema_task import InitDwdSchemaTask
from tasks.dwd_load_task import DwdLoadTask
from tasks.dwd_quality_task import DwdQualityTask

class TaskRegistry:
    """Task registration and factory."""
@@ -64,5 +68,9 @@ default_registry.register("TICKET_DWD", TicketDwdTask)
default_registry.register("MANUAL_INGEST", ManualIngestTask)
default_registry.register("PAYMENTS_DWD", PaymentsDwdTask)
default_registry.register("MEMBERS_DWD", MembersDwdTask)
default_registry.register("INIT_ODS_SCHEMA", InitOdsSchemaTask)
default_registry.register("INIT_DWD_SCHEMA", InitDwdSchemaTask)
default_registry.register("DWD_LOAD_FROM_ODS", DwdLoadTask)
default_registry.register("DWD_QUALITY_CHECK", DwdQualityTask)
for code, task_cls in ODS_TASK_CLASSES.items():
    default_registry.register(code, task_cls)
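The registrations above follow a plain code-to-class factory pattern. A minimal self-contained sketch of that pattern (simplified; not the repo's exact `TaskRegistry` signature, and `OrdersTask` here is a stand-in class):

```python
class TaskRegistry:
    """Map task codes to task classes and build instances on demand."""

    def __init__(self):
        self._tasks = {}

    def register(self, code, task_cls):
        self._tasks[code] = task_cls

    def create(self, code, **kwargs):
        try:
            return self._tasks[code](**kwargs)
        except KeyError:
            raise KeyError(f"unknown task code: {code}") from None

class OrdersTask:
    """Stand-in task class for illustration."""
    def __init__(self, store_id=None):
        self.store_id = store_id

registry = TaskRegistry()
registry.register("ORDERS", OrdersTask)
```

The scheduler then only needs a task code from `etl_admin.etl_task` to build the right task object, which is why the SQL seed file and the code-side registry must agree on codes.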
692
etl_billiards/reports/dwd_quality_report.json
Normal file
@@ -0,0 +1,692 @@
{
  "generated_at": "2025-12-09T05:21:24.745244",
  "tables": [
    {"dwd_table": "billiards_dwd.dim_site", "ods_table": "billiards_ods.table_fee_transactions",
     "count": {"dwd": 1, "ods": 200, "diff": -199}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_site_ex", "ods_table": "billiards_ods.table_fee_transactions",
     "count": {"dwd": 1, "ods": 200, "diff": -199}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_table", "ods_table": "billiards_ods.site_tables_master",
     "count": {"dwd": 71, "ods": 71, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_table_ex", "ods_table": "billiards_ods.site_tables_master",
     "count": {"dwd": 71, "ods": 71, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_assistant", "ods_table": "billiards_ods.assistant_accounts_master",
     "count": {"dwd": 50, "ods": 50, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_assistant_ex", "ods_table": "billiards_ods.assistant_accounts_master",
     "count": {"dwd": 50, "ods": 50, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_member", "ods_table": "billiards_ods.member_profiles",
     "count": {"dwd": 199, "ods": 199, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_member_ex", "ods_table": "billiards_ods.member_profiles",
     "count": {"dwd": 199, "ods": 199, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_member_card_account", "ods_table": "billiards_ods.member_stored_value_cards",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "balance", "dwd_sum": 31061.03, "ods_sum": 31061.03, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dim_member_card_account_ex", "ods_table": "billiards_ods.member_stored_value_cards",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "deliveryfeededuct", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dim_tenant_goods", "ods_table": "billiards_ods.tenant_goods_master",
     "count": {"dwd": 156, "ods": 156, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_tenant_goods_ex", "ods_table": "billiards_ods.tenant_goods_master",
     "count": {"dwd": 156, "ods": 156, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_store_goods", "ods_table": "billiards_ods.store_goods_master",
     "count": {"dwd": 161, "ods": 161, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_store_goods_ex", "ods_table": "billiards_ods.store_goods_master",
     "count": {"dwd": 161, "ods": 161, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_goods_category", "ods_table": "billiards_ods.stock_goods_category_tree",
     "count": {"dwd": 26, "ods": 9, "diff": 17}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_groupbuy_package", "ods_table": "billiards_ods.group_buy_packages",
     "count": {"dwd": 17, "ods": 17, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dim_groupbuy_package_ex", "ods_table": "billiards_ods.group_buy_packages",
     "count": {"dwd": 17, "ods": 17, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_settlement_head", "ods_table": "billiards_ods.settlement_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_settlement_head_ex", "ods_table": "billiards_ods.settlement_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_table_fee_log", "ods_table": "billiards_ods.table_fee_transactions",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "adjust_amount", "dwd_sum": 1157.45, "ods_sum": 1157.45, "diff": 0.0},
       {"column": "coupon_promotion_amount", "dwd_sum": 11244.49, "ods_sum": 11244.49, "diff": 0.0},
       {"column": "ledger_amount", "dwd_sum": 18107.0, "ods_sum": 18107.0, "diff": 0.0},
       {"column": "member_discount_amount", "dwd_sum": 1149.19, "ods_sum": 1149.19, "diff": 0.0},
       {"column": "real_table_charge_money", "dwd_sum": 5705.06, "ods_sum": 5705.06, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_table_fee_log_ex", "ods_table": "billiards_ods.table_fee_transactions",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "fee_total", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "mgmt_fee", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "service_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "used_card_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_table_fee_adjust", "ods_table": "billiards_ods.table_fee_discount_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "ledger_amount", "dwd_sum": 20650.84, "ods_sum": 20650.84, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_table_fee_adjust_ex", "ods_table": "billiards_ods.table_fee_discount_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_store_goods_sale", "ods_table": "billiards_ods.store_goods_sales_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "cost_money", "dwd_sum": 22.3, "ods_sum": 22.3, "diff": 0.0},
       {"column": "ledger_amount", "dwd_sum": 4583.0, "ods_sum": 4583.0, "diff": 0.0},
       {"column": "real_goods_money", "dwd_sum": 3791.0, "ods_sum": 3791.0, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_store_goods_sale_ex", "ods_table": "billiards_ods.store_goods_sales_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "coupon_deduct_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "discount_money", "dwd_sum": 792.0, "ods_sum": 792.0, "diff": 0.0},
       {"column": "member_discount_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "option_coupon_deduct_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "option_member_discount_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "point_discount_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "point_discount_money_cost", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "push_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_assistant_service_log", "ods_table": "billiards_ods.assistant_service_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "coupon_deduct_money", "dwd_sum": 626.83, "ods_sum": 626.83, "diff": 0.0},
       {"column": "ledger_amount", "dwd_sum": 63251.37, "ods_sum": 63251.37, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_assistant_service_log_ex", "ods_table": "billiards_ods.assistant_service_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "manual_discount_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "member_discount_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "service_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_assistant_trash_event", "ods_table": "billiards_ods.assistant_cancellation_records",
     "count": {"dwd": 15, "ods": 15, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_assistant_trash_event_ex", "ods_table": "billiards_ods.assistant_cancellation_records",
     "count": {"dwd": 15, "ods": 15, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_member_balance_change", "ods_table": "billiards_ods.member_balance_changes",
     "count": {"dwd": 200, "ods": 200, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_member_balance_change_ex", "ods_table": "billiards_ods.member_balance_changes",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "refund_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_groupbuy_redemption", "ods_table": "billiards_ods.group_buy_redemption_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "coupon_money", "dwd_sum": 12266.0, "ods_sum": 12266.0, "diff": 0.0},
       {"column": "ledger_amount", "dwd_sum": 12049.53, "ods_sum": 12049.53, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_groupbuy_redemption_ex", "ods_table": "billiards_ods.group_buy_redemption_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "assistant_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "assistant_service_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "goods_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "recharge_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "reward_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "table_service_promotion_money", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_platform_coupon_redemption", "ods_table": "billiards_ods.platform_coupon_redemption_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "coupon_money", "dwd_sum": 11956.0, "ods_sum": 11956.0, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_platform_coupon_redemption_ex", "ods_table": "billiards_ods.platform_coupon_redemption_records",
     "count": {"dwd": 200, "ods": 200, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_recharge_order", "ods_table": "billiards_ods.recharge_settlements",
     "count": {"dwd": 74, "ods": 74, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_recharge_order_ex", "ods_table": "billiards_ods.recharge_settlements",
     "count": {"dwd": 74, "ods": 74, "diff": 0}, "amounts": []},
    {"dwd_table": "billiards_dwd.dwd_payment", "ods_table": "billiards_ods.payment_transactions",
     "count": {"dwd": 200, "ods": 200, "diff": 0},
     "amounts": [
       {"column": "pay_amount", "dwd_sum": 10863.0, "ods_sum": 10863.0, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_refund", "ods_table": "billiards_ods.refund_transactions",
     "count": {"dwd": 11, "ods": 11, "diff": 0},
     "amounts": [
       {"column": "channel_fee", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "pay_amount", "dwd_sum": -62186.0, "ods_sum": -62186.0, "diff": 0.0}
     ]},
    {"dwd_table": "billiards_dwd.dwd_refund_ex", "ods_table": "billiards_ods.refund_transactions",
     "count": {"dwd": 11, "ods": 11, "diff": 0},
     "amounts": [
       {"column": "balance_frozen_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "card_frozen_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "refund_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0},
       {"column": "round_amount", "dwd_sum": 0.0, "ods_sum": 0.0, "diff": 0.0}
     ]}
  ],
  "note": "Row-count and amount reconciliation; amount columns are auto-scanned numeric columns whose names contain amount/money/fee/balance."
}
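A report in this shape reduces mechanically to a pass/fail list. A hypothetical summarizer (`failing_tables` is an assumed name, not part of DwdQualityTask):

```python
def failing_tables(report):
    """Return dwd_table names whose row-count diff or any amount diff is nonzero."""
    bad = []
    for t in report["tables"]:
        count_off = t["count"]["diff"] != 0
        amount_off = any(a["diff"] != 0.0 for a in t["amounts"])
        if count_off or amount_off:
            bad.append(t["dwd_table"])
    return bad
```

Run against the report above it would flag exactly `dim_site` and `dim_site_ex` (diff -199 against `table_fee_transactions`, which is not their true source grain) and `dim_goods_category` (26 vs 9, the category tree being flattened into more rows than its source).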
27
etl_billiards/run_ods.bat
Normal file
@@ -0,0 +1,27 @@
@echo off
REM -*- coding: utf-8 -*-
REM One-click ODS rebuild: run INIT_ODS_SCHEMA, then ingest the sample JSON via MANUAL_INGEST.
REM Configuration: PG_DSN and INGEST_SOURCE_DIR from .env, or override via arguments.

setlocal
cd /d %~dp0

REM Change INGEST_DIR below to override the sample-data directory.
set "INGEST_DIR=C:\dev\LLTQ\export\test-json-doc"

echo [INIT_ODS_SCHEMA] starting, source dir=%INGEST_DIR%
python -m cli.main --tasks INIT_ODS_SCHEMA --pipeline-flow INGEST_ONLY --ingest-source "%INGEST_DIR%"
if errorlevel 1 (
    echo INIT_ODS_SCHEMA failed, exiting
    exit /b 1
)

echo [MANUAL_INGEST] starting, source dir=%INGEST_DIR%
python -m cli.main --tasks MANUAL_INGEST --pipeline-flow INGEST_ONLY --ingest-source "%INGEST_DIR%"
if errorlevel 1 (
    echo MANUAL_INGEST failed, exiting
    exit /b 1
)

echo All done.
endlocal
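The batch file chains two CLI invocations and stops on the first failure. A cross-platform sketch of the same sequencing in Python (argument names taken from the script; the subprocess wiring is an assumption, not part of the repo):

```python
import subprocess
import sys

def build_commands(ingest_dir):
    """Commands mirroring run_ods.bat: rebuild the ODS schema, then ingest JSON."""
    tail = ["--pipeline-flow", "INGEST_ONLY", "--ingest-source", ingest_dir]
    return [
        [sys.executable, "-m", "cli.main", "--tasks", task] + tail
        for task in ("INIT_ODS_SCHEMA", "MANUAL_INGEST")
    ]

def run_all(ingest_dir):
    """Run each command in order; stop and report failure on the first nonzero exit."""
    for cmd in build_commands(ingest_dir):
        if subprocess.call(cmd) != 0:
            return False
    return True
```

Like the batch file, ordering matters: schema creation must succeed before ingestion is attempted.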
@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
"""Populate PRD DWD tables from ODS payload snapshots."""
from __future__ import annotations
@@ -16,9 +16,9 @@ SQL_STEPS: list[tuple[str, str]] = [
     INSERT INTO billiards_dwd.dim_tenant (tenant_id, tenant_name, status)
     SELECT DISTINCT tenant_id, 'default' AS tenant_name, 'active' AS status
     FROM (
-        SELECT tenant_id FROM billiards_ods.ods_order_settle
+        SELECT tenant_id FROM billiards_ods.settlement_records
         UNION SELECT tenant_id FROM billiards_ods.ods_order_receipt_detail
-        UNION SELECT tenant_id FROM billiards_ods.ods_member_profile
+        UNION SELECT tenant_id FROM billiards_ods.member_profiles
     ) s
     WHERE tenant_id IS NOT NULL
     ON CONFLICT (tenant_id) DO UPDATE SET updated_at = now();
@@ -30,7 +30,7 @@ SQL_STEPS: list[tuple[str, str]] = [
     INSERT INTO billiards_dwd.dim_site (site_id, tenant_id, site_name, status)
     SELECT DISTINCT site_id, MAX(tenant_id) AS tenant_id, 'default' AS site_name, 'active' AS status
     FROM (
-        SELECT site_id, tenant_id FROM billiards_ods.ods_order_settle
+        SELECT site_id, tenant_id FROM billiards_ods.settlement_records
         UNION SELECT site_id, tenant_id FROM billiards_ods.ods_order_receipt_detail
         UNION SELECT site_id, tenant_id FROM billiards_ods.ods_table_info
     ) s
@@ -84,7 +84,7 @@ SQL_STEPS: list[tuple[str, str]] = [
     """
     INSERT INTO billiards_dwd.dim_member_card_type (card_type_id, card_type_name, discount_rate)
     SELECT DISTINCT card_type_id, card_type_name, discount_rate
-    FROM billiards_ods.ods_member_card
+    FROM billiards_ods.member_stored_value_cards
     WHERE card_type_id IS NOT NULL
     ON CONFLICT (card_type_id) DO UPDATE SET
        card_type_name = EXCLUDED.card_type_name,
@@ -119,10 +119,10 @@ SQL_STEPS: list[tuple[str, str]] = [
         prof.wechat_id,
         prof.alipay_id,
         prof.remarks
-    FROM billiards_ods.ods_member_profile prof
+    FROM billiards_ods.member_profiles prof
     LEFT JOIN (
         SELECT DISTINCT site_id, member_id, card_type_id AS member_type_id, card_type_name AS member_type_name
-        FROM billiards_ods.ods_member_card
+        FROM billiards_ods.member_stored_value_cards
     ) card
     ON prof.site_id = card.site_id AND prof.member_id = card.member_id
     WHERE prof.member_id IS NOT NULL
@@ -167,7 +167,7 @@ SQL_STEPS: list[tuple[str, str]] = [
     """
     INSERT INTO billiards_dwd.dim_assistant (assistant_id, assistant_name, mobile, status)
     SELECT DISTINCT assistant_id, assistant_name, mobile, status
-    FROM billiards_ods.ods_assistant_account
+    FROM billiards_ods.assistant_accounts_master
     WHERE assistant_id IS NOT NULL
     ON CONFLICT (assistant_id) DO UPDATE SET
        assistant_name = EXCLUDED.assistant_name,
@@ -181,7 +181,7 @@ SQL_STEPS: list[tuple[str, str]] = [
     """
     INSERT INTO billiards_dwd.dim_pay_method (pay_method_code, pay_method_name, is_stored_value, status)
     SELECT DISTINCT pay_method_code, pay_method_name, FALSE AS is_stored_value, 'active' AS status
-    FROM billiards_ods.ods_payment_record
+    FROM billiards_ods.payment_transactions
     WHERE pay_method_code IS NOT NULL
     ON CONFLICT (pay_method_code) DO UPDATE SET
        pay_method_name = EXCLUDED.pay_method_name,
@@ -250,7 +250,7 @@ SQL_STEPS: list[tuple[str, str]] = [
         final_table_fee,
         FALSE AS is_canceled,
         NULL::TIMESTAMPTZ AS cancel_time
-    FROM billiards_ods.ods_table_use_log
+    FROM billiards_ods.table_fee_transactions_log
     ON CONFLICT (site_id, ledger_id) DO NOTHING;
     """,
     ),
@@ -325,7 +325,7 @@ SQL_STEPS: list[tuple[str, str]] = [
         pay_time,
         relate_type,
         relate_id
-    FROM billiards_ods.ods_payment_record
+    FROM billiards_ods.payment_transactions
     ON CONFLICT (site_id, pay_id) DO NOTHING;
     """,
     ),
@@ -346,7 +346,7 @@ SQL_STEPS: list[tuple[str, str]] = [
         refund_amount,
         refund_time,
         status
-    FROM billiards_ods.ods_refund_record
+    FROM billiards_ods.refund_transactions
     ON CONFLICT (site_id, refund_id) DO NOTHING;
     """,
     ),
@@ -369,7 +369,7 @@ SQL_STEPS: list[tuple[str, str]] = [
         balance_before,
         balance_after,
         change_time
-    FROM billiards_ods.ods_balance_change
+    FROM billiards_ods.member_balance_changes
     ON CONFLICT (site_id, change_id) DO NOTHING;
     """,
     ),
@@ -423,3 +423,4 @@ def main() -> int:

 if __name__ == "__main__":
     raise SystemExit(main())
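The hunks above are mostly one mechanical change: old `ods_*` source table names replaced by the new snake_case names. The mapping they imply, collected by hand from the diff, plus a small rewriter sketch (the `rewrite_sql` helper is illustrative, not a repo function):

```python
# Old ODS table name -> new table name, as applied in the hunks above.
ODS_RENAMES = {
    "ods_order_settle": "settlement_records",
    "ods_member_profile": "member_profiles",
    "ods_member_card": "member_stored_value_cards",
    "ods_assistant_account": "assistant_accounts_master",
    "ods_payment_record": "payment_transactions",
    "ods_table_use_log": "table_fee_transactions_log",
    "ods_refund_record": "refund_transactions",
    "ods_balance_change": "member_balance_changes",
}

def rewrite_sql(sql):
    """Apply the renames to a SQL string, longest keys first so a shorter
    name never clobbers a longer one that contains it as a prefix."""
    for old, new in sorted(ODS_RENAMES.items(), key=lambda kv: -len(kv[0])):
        sql = sql.replace("billiards_ods." + old, "billiards_ods." + new)
    return sql
```

Keeping the mapping in one place makes it easy to audit that every hunk in the diff is covered and that no old name survives elsewhere in the codebase.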
117
etl_billiards/scripts/check_ods_json_vs_table.py
Normal file
@@ -0,0 +1,117 @@
# -*- coding: utf-8 -*-
"""
ODS JSON field reconciliation script: compares the columns of the current ODS tables in the
database against the sample JSON files (default directory C:\\dev\\LLTQ\\export\\test-json-doc),
checks whether a key with the same name exists, and reports the unmatched columns for each
table, so mappings can be completed or confirmed as genuinely absent from the source.

Usage:
    set PG_DSN=postgresql://...  # as configured in .env
    python -m etl_billiards.scripts.check_ods_json_vs_table
"""
from __future__ import annotations

import json
import os
import pathlib
from typing import Dict, Set, Tuple

import psycopg2

from etl_billiards.tasks.manual_ingest_task import ManualIngestTask


def _flatten_keys(obj, prefix: str = "") -> Set[str]:
    """Recursively expand every key path in a JSON value into a set like data.assistantInfos.id.
    List indices are not kept; list items are expanded in place."""
    keys: Set[str] = set()
    if isinstance(obj, dict):
        for k, v in obj.items():
            new_prefix = f"{prefix}.{k}" if prefix else k
            keys.add(new_prefix)
            keys |= _flatten_keys(v, new_prefix)
    elif isinstance(obj, list):
        for item in obj:
            keys |= _flatten_keys(item, prefix)
    return keys


def _load_json_keys(path: pathlib.Path) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Read one JSON file and return its flattened key paths plus a map from last path
    segment to full paths. Returns empty collections if the file is missing."""
    if not path.exists():
        return set(), {}
    data = json.loads(path.read_text(encoding="utf-8"))
    paths = _flatten_keys(data)
    last_map: Dict[str, Set[str]] = {}
    for p in paths:
        last = p.split(".")[-1].lower()
        last_map.setdefault(last, set()).add(p)
    return paths, last_map


def _load_ods_columns(dsn: str) -> Dict[str, Set[str]]:
    """Read the column-name sets of every billiards_ods.* table, grouped by table."""
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    cur.execute(
        """
        SELECT table_name, column_name
        FROM information_schema.columns
        WHERE table_schema='billiards_ods'
        ORDER BY table_name, ordinal_position
        """
    )
    result: Dict[str, Set[str]] = {}
    for table, col in cur.fetchall():
        result.setdefault(table, set()).add(col.lower())
    cur.close()
    conn.close()
    return result


def main() -> None:
    """Main flow: walk the ODS tables in FILE_MAPPING, check JSON key coverage, print a report."""
    dsn = os.environ.get("PG_DSN")
    if not dsn:
        raise SystemExit("PG_DSN is not set; configure it in the environment or .env")
    json_dir = pathlib.Path(os.environ.get("JSON_DOC_DIR", r"C:\dev\LLTQ\export\test-json-doc"))

    ods_cols_map = _load_ods_columns(dsn)

    print(f"JSON directory: {json_dir}")
    print(f"DSN: {dsn}")
    print("=" * 80)

    for keywords, ods_table in ManualIngestTask.FILE_MAPPING:
        table = ods_table.split(".")[-1]
        cols = ods_cols_map.get(table, set())
        file_name = f"{keywords[0]}.json"
        file_path = json_dir / file_name
        _keys_full, path_map = _load_json_keys(file_path)
        key_last_parts = set(path_map.keys())

        missing: Set[str] = set()
        extra_keys: Set[str] = set()
        present: Set[str] = set()
        for col in sorted(cols):
            if col in key_last_parts:
                present.add(col)
            else:
                missing.add(col)
        for k in key_last_parts:
            if k not in cols:
                extra_keys.add(k)

        print(f"[{table}] file={file_name} columns={len(cols)} JSON key (last segment) coverage={len(present)}/{len(cols)}")
        if missing:
            print("  unmatched columns:", ", ".join(sorted(missing)))
        else:
            print("  unmatched columns: none")
        if extra_keys:
            extras = []
            for k in sorted(extra_keys):
                paths = ", ".join(sorted(path_map.get(k, [])))
                extras.append(f"{k} ({paths})")
            print("  JSON-only keys (no matching column):", "; ".join(extras))
        else:
            print("  JSON-only keys (no matching column): none")
        print("-" * 80)


if __name__ == "__main__":
    main()
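The key-flattening behavior above can be checked standalone; this is a minimal re-implementation (so it runs without the package import), showing that dict keys become dotted paths while list indices are dropped:

```python
def flatten_keys(obj, prefix=""):
    """Collect every dotted key path in a nested JSON value; list indices are dropped."""
    keys = set()
    if isinstance(obj, dict):
        for k, v in obj.items():
            p = f"{prefix}.{k}" if prefix else k
            keys.add(p)
            keys |= flatten_keys(v, p)
    elif isinstance(obj, list):
        for item in obj:
            keys |= flatten_keys(item, prefix)
    return keys

doc = {"data": {"assistantInfos": [{"id": 1, "name": "a"}], "total": 2}}
print(sorted(flatten_keys(doc)))
# → ['data', 'data.assistantInfos', 'data.assistantInfos.id', 'data.assistantInfos.name', 'data.total']
```

Because only the last path segment is matched against column names, `data.assistantInfos.id` and a top-level `id` would both count as a hit for an `id` column; the report prints the full paths so such collisions stay visible.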
907
etl_billiards/tasks/dwd_load_task.py
Normal file
@@ -0,0 +1,907 @@
# -*- coding: utf-8 -*-
"""DWD load task: incremental load from ODS into DWD (SCD2 dimensions, time-based fact increments)."""
from __future__ import annotations

from datetime import datetime
from typing import Any, Dict, Iterable, List, Sequence

from psycopg2.extras import RealDictCursor

from .base_task import BaseTask, TaskContext

class DwdLoadTask(BaseTask):
    """Handles the DWD load: SCD2 merge for dimension tables, time-based incremental insert for fact tables."""

    # DWD -> ODS table map (ODS table names already aligned with the sample JSON prefixes)
    TABLE_MAP: dict[str, str] = {
        # dimensions
        # site: sourced from the siteprofile snapshot in the table-fee log, which supplies org/address fields
        "billiards_dwd.dim_site": "billiards_ods.table_fee_transactions",
        "billiards_dwd.dim_site_ex": "billiards_ods.table_fee_transactions",
        "billiards_dwd.dim_table": "billiards_ods.site_tables_master",
        "billiards_dwd.dim_table_ex": "billiards_ods.site_tables_master",
        "billiards_dwd.dim_assistant": "billiards_ods.assistant_accounts_master",
        "billiards_dwd.dim_assistant_ex": "billiards_ods.assistant_accounts_master",
        "billiards_dwd.dim_member": "billiards_ods.member_profiles",
        "billiards_dwd.dim_member_ex": "billiards_ods.member_profiles",
        "billiards_dwd.dim_member_card_account": "billiards_ods.member_stored_value_cards",
        "billiards_dwd.dim_member_card_account_ex": "billiards_ods.member_stored_value_cards",
        "billiards_dwd.dim_tenant_goods": "billiards_ods.tenant_goods_master",
        "billiards_dwd.dim_tenant_goods_ex": "billiards_ods.tenant_goods_master",
        "billiards_dwd.dim_store_goods": "billiards_ods.store_goods_master",
        "billiards_dwd.dim_store_goods_ex": "billiards_ods.store_goods_master",
        "billiards_dwd.dim_goods_category": "billiards_ods.stock_goods_category_tree",
        "billiards_dwd.dim_groupbuy_package": "billiards_ods.group_buy_packages",
        "billiards_dwd.dim_groupbuy_package_ex": "billiards_ods.group_buy_packages",
        # facts
        "billiards_dwd.dwd_settlement_head": "billiards_ods.settlement_records",
        "billiards_dwd.dwd_settlement_head_ex": "billiards_ods.settlement_records",
        "billiards_dwd.dwd_table_fee_log": "billiards_ods.table_fee_transactions",
        "billiards_dwd.dwd_table_fee_log_ex": "billiards_ods.table_fee_transactions",
        "billiards_dwd.dwd_table_fee_adjust": "billiards_ods.table_fee_discount_records",
        "billiards_dwd.dwd_table_fee_adjust_ex": "billiards_ods.table_fee_discount_records",
        "billiards_dwd.dwd_store_goods_sale": "billiards_ods.store_goods_sales_records",
        "billiards_dwd.dwd_store_goods_sale_ex": "billiards_ods.store_goods_sales_records",
        "billiards_dwd.dwd_assistant_service_log": "billiards_ods.assistant_service_records",
        "billiards_dwd.dwd_assistant_service_log_ex": "billiards_ods.assistant_service_records",
        "billiards_dwd.dwd_assistant_trash_event": "billiards_ods.assistant_cancellation_records",
        "billiards_dwd.dwd_assistant_trash_event_ex": "billiards_ods.assistant_cancellation_records",
        "billiards_dwd.dwd_member_balance_change": "billiards_ods.member_balance_changes",
        "billiards_dwd.dwd_member_balance_change_ex": "billiards_ods.member_balance_changes",
        "billiards_dwd.dwd_groupbuy_redemption": "billiards_ods.group_buy_redemption_records",
        "billiards_dwd.dwd_groupbuy_redemption_ex": "billiards_ods.group_buy_redemption_records",
        "billiards_dwd.dwd_platform_coupon_redemption": "billiards_ods.platform_coupon_redemption_records",
        "billiards_dwd.dwd_platform_coupon_redemption_ex": "billiards_ods.platform_coupon_redemption_records",
        "billiards_dwd.dwd_recharge_order": "billiards_ods.recharge_settlements",
        "billiards_dwd.dwd_recharge_order_ex": "billiards_ods.recharge_settlements",
        "billiards_dwd.dwd_payment": "billiards_ods.payment_transactions",
        "billiards_dwd.dwd_refund": "billiards_ods.refund_transactions",
        "billiards_dwd.dwd_refund_ex": "billiards_ods.refund_transactions",
    }

    SCD_COLS = {"scd2_start_time", "scd2_end_time", "scd2_is_current", "scd2_version"}
    FACT_ORDER_CANDIDATES = [
        "fetched_at",
        "pay_time",
        "create_time",
        "update_time",
        "occur_time",
        "settle_time",
        "start_use_time",
    ]
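The `FACT_ORDER_CANDIDATES` fallback can be sketched standalone: pick the first candidate timestamp column that the source table actually has. The helper name here is a hypothetical illustration of the lookup order, not the task's own code:

```python
FACT_ORDER_CANDIDATES = [
    "fetched_at", "pay_time", "create_time", "update_time",
    "occur_time", "settle_time", "start_use_time",
]

def pick_order_column(ods_columns):
    """Return the first candidate present in the source table, or None."""
    cols = {c.lower() for c in ods_columns}
    for cand in FACT_ORDER_CANDIDATES:
        if cand in cols:
            return cand
    return None

print(pick_order_column(["id", "PAY_TIME", "amount"]))  # → pay_time
print(pick_order_column(["id", "amount"]))              # → None
```

Listing `fetched_at` first means the ingestion timestamp wins over any business timestamp whenever the ODS table has one, which keeps the increment monotonic even when business times arrive out of order.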
    # Special column mappings: DWD column name -> source column expression (optional CAST)
    FACT_MAPPINGS: dict[str, list[tuple[str, str, str | None]]] = {
        # dimension tables (fill in primary-key / field-name differences)
        "billiards_dwd.dim_site": [
            ("org_id", "siteprofile->>'org_id'", None),
            ("shop_name", "siteprofile->>'shop_name'", None),
            ("site_label", "siteprofile->>'site_label'", None),
            ("full_address", "siteprofile->>'full_address'", None),
            ("address", "siteprofile->>'address'", None),
            ("longitude", "siteprofile->>'longitude'", "numeric"),
            ("latitude", "siteprofile->>'latitude'", "numeric"),
            ("tenant_site_region_id", "siteprofile->>'tenant_site_region_id'", None),
            ("business_tel", "siteprofile->>'business_tel'", None),
            ("site_type", "siteprofile->>'site_type'", None),
            ("shop_status", "siteprofile->>'shop_status'", None),
            ("tenant_id", "siteprofile->>'tenant_id'", None),
        ],
        "billiards_dwd.dim_site_ex": [
            ("auto_light", "siteprofile->>'auto_light'", None),
            ("attendance_enabled", "siteprofile->>'attendance_enabled'", None),
            ("attendance_distance", "siteprofile->>'attendance_distance'", None),
            ("prod_env", "siteprofile->>'prod_env'", None),
            ("light_status", "siteprofile->>'light_status'", None),
            ("light_type", "siteprofile->>'light_type'", None),
            ("light_token", "siteprofile->>'light_token'", None),
            ("address", "siteprofile->>'address'", None),
            ("avatar", "siteprofile->>'avatar'", None),
            ("wifi_name", "siteprofile->>'wifi_name'", None),
            ("wifi_password", "siteprofile->>'wifi_password'", None),
            ("customer_service_qrcode", "siteprofile->>'customer_service_qrcode'", None),
            ("customer_service_wechat", "siteprofile->>'customer_service_wechat'", None),
            ("fixed_pay_qrcode", "siteprofile->>'fixed_pay_qrCode'", None),
            ("longitude", "siteprofile->>'longitude'", "numeric"),
            ("latitude", "siteprofile->>'latitude'", "numeric"),
            ("tenant_site_region_id", "siteprofile->>'tenant_site_region_id'", None),
            ("site_type", "siteprofile->>'site_type'", None),
            ("site_label", "siteprofile->>'site_label'", None),
            ("shop_status", "siteprofile->>'shop_status'", None),
            ("create_time", "siteprofile->>'create_time'", "timestamptz"),
            ("update_time", "siteprofile->>'update_time'", "timestamptz"),
        ],
        "billiards_dwd.dim_table": [
            ("table_id", "id", None),
            ("site_table_area_name", "areaname", None),
            ("tenant_table_area_id", "site_table_area_id", None),
        ],
        "billiards_dwd.dim_table_ex": [
            ("table_id", "id", None),
            ("table_cloth_use_time", "table_cloth_use_time", None),
        ],
        "billiards_dwd.dim_assistant": [("assistant_id", "id", None), ("user_id", "staff_id", None)],
        "billiards_dwd.dim_assistant_ex": [
            ("assistant_id", "id", None),
            ("introduce", "introduce", None),
            ("group_name", "group_name", None),
            ("light_equipment_id", "light_equipment_id", None),
        ],
        "billiards_dwd.dim_member": [("member_id", "id", None)],
        "billiards_dwd.dim_member_ex": [
            ("member_id", "id", None),
            ("register_site_name", "site_name", None),
        ],
        "billiards_dwd.dim_member_card_account": [("member_card_id", "id", None)],
        "billiards_dwd.dim_member_card_account_ex": [
            ("member_card_id", "id", None),
            ("tenant_name", "tenantname", None),
            ("tenantavatar", "tenantavatar", None),
            ("card_no", "card_no", None),
            ("bind_password", "bind_password", None),
            ("use_scene", "use_scene", None),
            ("tableareaid", "tableareaid", None),
            ("goodscategoryid", "goodscategoryid", None),
        ],
        "billiards_dwd.dim_tenant_goods": [
            ("tenant_goods_id", "id", None),
            ("category_name", "categoryname", None),
        ],
        "billiards_dwd.dim_tenant_goods_ex": [
            ("tenant_goods_id", "id", None),
            ("remark_name", "remark_name", None),
            ("goods_bar_code", "goods_bar_code", None),
            ("commodity_code_list", "commodity_code", None),
            ("is_in_site", "isinsite", "boolean"),
        ],
        "billiards_dwd.dim_store_goods": [
            ("site_goods_id", "id", None),
            ("category_level1_name", "onecategoryname", None),
            ("category_level2_name", "twocategoryname", None),
            ("created_at", "create_time", None),
            ("updated_at", "update_time", None),
            ("avg_monthly_sales", "average_monthly_sales", None),
            ("batch_stock_qty", "stock", None),
            ("sale_qty", "sale_num", None),
            ("total_sales_qty", "total_sales", None),
        ],
        "billiards_dwd.dim_store_goods_ex": [
            ("site_goods_id", "id", None),
            ("goods_barcode", "goods_bar_code", None),
            ("stock_qty", "stock", None),
            ("stock_secondary_qty", "stock_a", None),
            ("safety_stock_qty", "safe_stock", None),
            ("site_name", "sitename", None),
            ("goods_cover_url", "goods_cover", None),
            ("provisional_total_cost", "total_purchase_cost", None),
            ("is_discountable", "able_discount", None),
            ("freeze_status", "freeze", None),
            ("remark", "remark", None),
            ("days_on_shelf", "days_available", None),
            ("sort_order", "sort", None),
        ],
        "billiards_dwd.dim_goods_category": [
            ("category_id", "id", None),
            ("tenant_id", "tenant_id", None),
            ("category_name", "category_name", None),
            ("alias_name", "alias_name", None),
            ("parent_category_id", "pid", None),
            ("business_name", "business_name", None),
            ("tenant_goods_business_id", "tenant_goods_business_id", None),
            ("sort_order", "sort", None),
            ("open_salesman", "open_salesman", None),
            ("is_warehousing", "is_warehousing", None),
            ("category_level", "CASE WHEN pid = 0 THEN 1 ELSE 2 END", None),
            ("is_leaf", "CASE WHEN categoryboxes IS NULL OR jsonb_array_length(categoryboxes)=0 THEN 1 ELSE 0 END", None),
        ],
        "billiards_dwd.dim_groupbuy_package": [
            ("groupbuy_package_id", "id", None),
            ("package_template_id", "package_id", None),
            ("coupon_face_value", "coupon_money", None),
            ("duration_seconds", "duration", None),
        ],
        "billiards_dwd.dim_groupbuy_package_ex": [
            ("groupbuy_package_id", "id", None),
            ("table_area_id", "table_area_id", None),
            ("tenant_table_area_id", "tenant_table_area_id", None),
            ("usable_range", "usable_range", None),
            ("table_area_id_list", "table_area_id_list", None),
            ("package_type", "type", None),
        ],
        # fact-table primary keys and key differing columns
        "billiards_dwd.dwd_table_fee_log": [("table_fee_log_id", "id", None)],
        "billiards_dwd.dwd_table_fee_log_ex": [
            ("table_fee_log_id", "id", None),
            ("salesman_name", "salesman_name", None),
        ],
        "billiards_dwd.dwd_table_fee_adjust": [
            ("table_fee_adjust_id", "id", None),
            ("table_id", "site_table_id", None),
            ("table_area_id", "tenant_table_area_id", None),
            ("table_area_name", "tableprofile->>'table_area_name'", None),
            ("adjust_time", "create_time", None),
        ],
        "billiards_dwd.dwd_table_fee_adjust_ex": [
            ("table_fee_adjust_id", "id", None),
            ("ledger_name", "ledger_name", None),
        ],
        "billiards_dwd.dwd_store_goods_sale": [("store_goods_sale_id", "id", None), ("discount_price", "discount_money", None)],
        "billiards_dwd.dwd_store_goods_sale_ex": [
            ("store_goods_sale_id", "id", None),
            ("option_value_name", "option_value_name", None),
            ("open_salesman_flag", "opensalesman", "integer"),
            ("salesman_name", "salesman_name", None),
            ("salesman_org_id", "sales_man_org_id", None),
            ("legacy_order_goods_id", "ordergoodsid", None),
            ("site_name", "sitename", None),
            ("legacy_site_id", "siteid", None),
        ],
        "billiards_dwd.dwd_assistant_service_log": [
            ("assistant_service_id", "id", None),
            ("assistant_no", "assistantno", None),
            ("site_assistant_id", "order_assistant_id", None),
            ("level_name", "levelname", None),
            ("skill_name", "skillname", None),
        ],
        "billiards_dwd.dwd_assistant_service_log_ex": [
            ("assistant_service_id", "id", None),
            ("assistant_name", "assistantname", None),
            ("ledger_group_name", "ledger_group_name", None),
            ("trash_applicant_name", "trash_applicant_name", None),
            ("trash_reason", "trash_reason", None),
            ("salesman_name", "salesman_name", None),
            ("table_name", "tablename", None),
        ],
        "billiards_dwd.dwd_assistant_trash_event": [
            ("assistant_trash_event_id", "id", None),
            ("assistant_no", "assistantname", None),
            ("abolish_amount", "assistantabolishamount", None),
            ("charge_minutes_raw", "pdchargeminutes", None),
            ("site_id", "siteid", None),
            ("table_id", "tableid", None),
            ("table_area_id", "tableareaid", None),
            ("assistant_name", "assistantname", None),
            ("trash_reason", "trashreason", None),
            ("create_time", "createtime", None),
        ],
        "billiards_dwd.dwd_assistant_trash_event_ex": [
            ("assistant_trash_event_id", "id", None),
            ("table_area_name", "tablearea", None),
            ("table_name", "tablename", None),
        ],
        "billiards_dwd.dwd_member_balance_change": [
            ("balance_change_id", "id", None),
            ("balance_before", "before", None),
            ("change_amount", "account_data", None),
            ("balance_after", "after", None),
            ("card_type_name", "membercardtypename", None),
            ("change_time", "create_time", None),
            ("member_name", "membername", None),
            ("member_mobile", "membermobile", None),
        ],
        "billiards_dwd.dwd_member_balance_change_ex": [
            ("balance_change_id", "id", None),
            ("pay_site_name", "paysitename", None),
            ("register_site_name", "registersitename", None),
        ],
        "billiards_dwd.dwd_groupbuy_redemption": [("redemption_id", "id", None)],
        "billiards_dwd.dwd_groupbuy_redemption_ex": [
            ("redemption_id", "id", None),
            ("table_area_name", "tableareaname", None),
            ("site_name", "sitename", None),
            ("table_name", "tablename", None),
            ("goods_option_price", "goodsoptionprice", None),
            ("salesman_name", "salesman_name", None),
            ("salesman_org_id", "sales_man_org_id", None),
            ("ledger_group_name", "ledger_group_name", None),
        ],
        "billiards_dwd.dwd_platform_coupon_redemption": [("platform_coupon_redemption_id", "id", None)],
        "billiards_dwd.dwd_platform_coupon_redemption_ex": [
            ("platform_coupon_redemption_id", "id", None),
            ("coupon_cover", "coupon_cover", None),
        ],
        "billiards_dwd.dwd_payment": [("payment_id", "id", None), ("pay_date", "pay_time", "date")],
        "billiards_dwd.dwd_refund": [("refund_id", "id", None)],
        "billiards_dwd.dwd_refund_ex": [
            ("refund_id", "id", None),
            ("tenant_name", "tenantname", None),
            ("channel_payer_id", "channel_payer_id", None),
            ("channel_pay_no", "channel_pay_no", None),
        ],
        # settlement head: settlement_records (source columns are lowercased camelCase without underscores, so explicit mapping is required)
        "billiards_dwd.dwd_settlement_head": [
            ("order_settle_id", "id", None),
            ("tenant_id", "tenantid", None),
            ("site_id", "siteid", None),
            ("site_name", "sitename", None),
            ("table_id", "tableid", None),
            ("settle_name", "settlename", None),
            ("order_trade_no", "settlerelateid", None),
            ("create_time", "createtime", None),
            ("pay_time", "paytime", None),
            ("settle_type", "settletype", None),
            ("revoke_order_id", "revokeorderid", None),
            ("member_id", "memberid", None),
            ("member_name", "membername", None),
            ("member_phone", "memberphone", None),
            ("member_card_account_id", "tenantmembercardid", None),
            ("member_card_type_name", "membercardtypename", None),
            ("is_bind_member", "isbindmember", None),
            ("member_discount_amount", "memberdiscountamount", None),
            ("consume_money", "consumemoney", None),
            ("table_charge_money", "tablechargemoney", None),
            ("goods_money", "goodsmoney", None),
            ("real_goods_money", "realgoodsmoney", None),
            ("assistant_pd_money", "assistantpdmoney", None),
            ("assistant_cx_money", "assistantcxmoney", None),
            ("adjust_amount", "adjustamount", None),
            ("pay_amount", "payamount", None),
            ("balance_amount", "balanceamount", None),
            ("recharge_card_amount", "rechargecardamount", None),
            ("gift_card_amount", "giftcardamount", None),
            ("coupon_amount", "couponamount", None),
            ("rounding_amount", "roundingamount", None),
            ("point_amount", "pointamount", None),
        ],
        "billiards_dwd.dwd_settlement_head_ex": [
            ("order_settle_id", "id", None),
            ("serial_number", "serialnumber", None),
            ("settle_status", "settlestatus", None),
            ("can_be_revoked", "canberevoked", "boolean"),
            ("revoke_order_name", "revokeordername", None),
            ("revoke_time", "revoketime", None),
            ("is_first_order", "isfirst", "boolean"),
            ("service_money", "servicemoney", None),
            ("cash_amount", "cashamount", None),
            ("card_amount", "cardamount", None),
            ("online_amount", "onlineamount", None),
            ("refund_amount", "refundamount", None),
            ("prepay_money", "prepaymoney", None),
            ("payment_method", "paymentmethod", None),
            ("coupon_sale_amount", "couponsaleamount", None),
            ("all_coupon_discount", "allcoupondiscount", None),
            ("goods_promotion_money", "goodspromotionmoney", None),
            ("assistant_promotion_money", "assistantpromotionmoney", None),
            ("activity_discount", "activitydiscount", None),
            ("assistant_manual_discount", "assistantmanualdiscount", None),
            ("point_discount_price", "pointdiscountprice", None),
            ("point_discount_cost", "pointdiscountcost", None),
            ("is_use_coupon", "isusecoupon", "boolean"),
            ("is_use_discount", "isusediscount", "boolean"),
            ("is_activity", "isactivity", "boolean"),
            ("operator_name", "operatorname", None),
            ("salesman_name", "salesmanname", None),
            ("order_remark", "orderremark", None),
            ("operator_id", "operatorid", None),
            ("salesman_user_id", "salesmanuserid", None),
        ],
        # recharge settlements: recharge_settlements (same field style as settlement_records)
        "billiards_dwd.dwd_recharge_order": [
            ("recharge_order_id", "id", None),
            ("tenant_id", "tenantid", None),
            ("site_id", "siteid", None),
            ("member_id", "memberid", None),
            ("member_name_snapshot", "membername", None),
            ("member_phone_snapshot", "memberphone", None),
            ("tenant_member_card_id", "tenantmembercardid", None),
            ("member_card_type_name", "membercardtypename", None),
            ("settle_relate_id", "settlerelateid", None),
            ("settle_type", "settletype", None),
            ("settle_name", "settlename", None),
            ("is_first", "isfirst", None),
            ("pay_amount", "payamount", None),
            ("refund_amount", "refundamount", None),
            ("point_amount", "pointamount", None),
            ("cash_amount", "cashamount", None),
            ("payment_method", "paymentmethod", None),
            ("create_time", "createtime", None),
            ("pay_time", "paytime", None),
        ],
        "billiards_dwd.dwd_recharge_order_ex": [
            ("recharge_order_id", "id", None),
            ("site_name_snapshot", "sitename", None),
            ("salesman_name", "salesmanname", None),
            ("order_remark", "orderremark", None),
            ("revoke_order_name", "revokeordername", None),
            ("settle_status", "settlestatus", None),
            ("is_bind_member", "isbindmember", "boolean"),
            ("is_activity", "isactivity", "boolean"),
            ("is_use_coupon", "isusecoupon", "boolean"),
            ("is_use_discount", "isusediscount", "boolean"),
            ("can_be_revoked", "canberevoked", "boolean"),
            ("online_amount", "onlineamount", None),
            ("balance_amount", "balanceamount", None),
            ("card_amount", "cardamount", None),
            ("coupon_amount", "couponamount", None),
            ("recharge_card_amount", "rechargecardamount", None),
            ("gift_card_amount", "giftcardamount", None),
            ("prepay_money", "prepaymoney", None),
            ("consume_money", "consumemoney", None),
            ("goods_money", "goodsmoney", None),
            ("real_goods_money", "realgoodsmoney", None),
            ("table_charge_money", "tablechargemoney", None),
            ("service_money", "servicemoney", None),
            ("activity_discount", "activitydiscount", None),
            ("all_coupon_discount", "allcoupondiscount", None),
            ("goods_promotion_money", "goodspromotionmoney", None),
            ("assistant_promotion_money", "assistantpromotionmoney", None),
            ("assistant_pd_money", "assistantpdmoney", None),
            ("assistant_cx_money", "assistantcxmoney", None),
            ("assistant_manual_discount", "assistantmanualdiscount", None),
            ("coupon_sale_amount", "couponsaleamount", None),
            ("member_discount_amount", "memberdiscountamount", None),
            ("point_discount_price", "pointdiscountprice", None),
            ("point_discount_cost", "pointdiscountcost", None),
            ("adjust_amount", "adjustamount", None),
            ("rounding_amount", "roundingamount", None),
            ("operator_id", "operatorid", None),
            ("operator_name_snapshot", "operatorname", None),
            ("salesman_user_id", "salesmanuserid", None),
            ("salesman_name", "salesmanname", None),
            ("order_remark", "orderremark", None),
            ("table_id", "tableid", None),
            ("serial_number", "serialnumber", None),
            ("revoke_order_id", "revokeorderid", None),
            ("revoke_order_name", "revokeordername", None),
            ("revoke_time", "revoketime", None),
        ],
    }
    def get_task_code(self) -> str:
        """Return the task code."""
        return "DWD_LOAD_FROM_ODS"

    def extract(self, context: TaskContext) -> dict[str, Any]:
        """Prepare the context information needed for the run."""
        return {"now": datetime.now()}

    def load(self, extracted: dict[str, Any], context: TaskContext) -> dict[str, Any]:
        """Walk the table map: SCD2-merge dimensions, insert facts incrementally by time."""
        now = extracted["now"]
        summary: List[Dict[str, Any]] = []
        with self.db.conn.cursor(cursor_factory=RealDictCursor) as cur:
            for dwd_table, ods_table in self.TABLE_MAP.items():
                dwd_cols = self._get_columns(cur, dwd_table)
                ods_cols = self._get_columns(cur, ods_table)
                if not dwd_cols:
                    self.logger.warning("Skipping %s: could not read DWD column metadata", dwd_table)
                    continue

                if self._table_base(dwd_table).startswith("dim_"):
                    processed = self._merge_dim_scd2(cur, dwd_table, ods_table, dwd_cols, ods_cols, now)
                    summary.append({"table": dwd_table, "mode": "SCD2", "processed": processed})
                else:
                    dwd_types = self._get_column_types(cur, dwd_table, "billiards_dwd")
                    ods_types = self._get_column_types(cur, ods_table, "billiards_ods")
                    inserted = self._merge_fact_increment(
                        cur, dwd_table, ods_table, dwd_cols, ods_cols, dwd_types, ods_types
                    )
                    summary.append({"table": dwd_table, "mode": "INCREMENT", "inserted": inserted})

        self.db.conn.commit()
        return {"tables": summary}
    # ---------------------- helpers ----------------------
    def _get_columns(self, cur, table: str) -> List[str]:
        """Return the (lowercased) column names of the given table."""
        schema, name = self._split_table_name(table, default_schema="billiards_dwd")
        cur.execute(
            """
            SELECT column_name
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s
            """,
            (schema, name),
        )
        return [r["column_name"].lower() for r in cur.fetchall()]

    def _get_primary_keys(self, cur, table: str) -> List[str]:
        """Return the primary-key column names of the table."""
        schema, name = self._split_table_name(table, default_schema="billiards_dwd")
        cur.execute(
            """
            SELECT kcu.column_name
            FROM information_schema.table_constraints tc
            JOIN information_schema.key_column_usage kcu
              ON tc.constraint_name = kcu.constraint_name
             AND tc.table_schema = kcu.table_schema
             AND tc.table_name = kcu.table_name
            WHERE tc.table_schema = %s
              AND tc.table_name = %s
              AND tc.constraint_type = 'PRIMARY KEY'
            ORDER BY kcu.ordinal_position
            """,
            (schema, name),
        )
        return [r["column_name"].lower() for r in cur.fetchall()]

    def _get_column_types(self, cur, table: str, default_schema: str) -> Dict[str, str]:
        """Return each column's data type (information_schema.data_type)."""
        schema, name = self._split_table_name(table, default_schema=default_schema)
        cur.execute(
            """
            SELECT column_name, data_type
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s
            """,
            (schema, name),
        )
        return {r["column_name"].lower(): r["data_type"].lower() for r in cur.fetchall()}
    def _build_column_mapping(
        self, dwd_table: str, pk_cols: Sequence[str], ods_cols: Sequence[str]
    ) -> Dict[str, tuple[str, str | None]]:
        """Merge the explicit FACT_MAPPINGS with the primary-key fallback mapping."""
        mapping_entries = self.FACT_MAPPINGS.get(dwd_table, [])
        mapping: Dict[str, tuple[str, str | None]] = {
            dst.lower(): (src, cast_type) for dst, src, cast_type in mapping_entries
        }
        ods_set = {c.lower() for c in ods_cols}
        for pk in pk_cols:
            pk_lower = pk.lower()
            if pk_lower not in mapping and pk_lower not in ods_set and "id" in ods_set:
                mapping[pk_lower] = ("id", None)
        return mapping

    def _fetch_source_rows(
        self, cur, table: str, columns: Sequence[str], where_sql: str = "", params: Sequence[Any] | None = None
    ) -> List[Dict[str, Any]]:
        """Read the given columns from the source table, returning dicts keyed by lowercase column name."""
        schema, name = self._split_table_name(table, default_schema="billiards_ods")
        cols_sql = ", ".join(f'"{c}"' for c in columns)
        sql = f'SELECT {cols_sql} FROM "{schema}"."{name}" {where_sql}'
        cur.execute(sql, params or [])
        rows = []
        for r in cur.fetchall():
            rows.append({k.lower(): v for k, v in r.items()})
        return rows
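The mapping-merge logic can be illustrated standalone: explicit (dst, src, cast) entries win, and a primary key that exists in neither the explicit map nor the source columns falls back to the source's `id` column. The free function below is a hypothetical sketch of that rule, not the task's own method:

```python
def build_column_mapping(explicit, pk_cols, ods_cols):
    """Merge explicit (dst, src, cast) entries with an `id` fallback for primary keys
    found in neither the explicit map nor the source columns."""
    mapping = {dst.lower(): (src, cast) for dst, src, cast in explicit}
    ods_set = {c.lower() for c in ods_cols}
    for pk in pk_cols:
        pk = pk.lower()
        if pk not in mapping and pk not in ods_set and "id" in ods_set:
            mapping[pk] = ("id", None)
    return mapping

m = build_column_mapping(
    [("payment_id", "id", None)],       # explicit entry
    ["payment_id", "site_id"],          # DWD primary key
    ["id", "site_id", "amount"],        # source columns
)
print(m)  # → {'payment_id': ('id', None)}  (site_id exists in the source, so no fallback)
```

The fallback only fires when the source exposes a generic `id`, which matches the common case of a surrogate DWD key like `payment_id` wrapping the upstream `id`.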
    def _expand_goods_category_rows(self, rows: list[Dict[str, Any]]) -> list[Dict[str, Any]]:
        """Expand the categoryboxes elements of the category table into child-category records."""
        expanded: list[Dict[str, Any]] = []
        for r in rows:
            expanded.append(r)
            boxes = r.get("categoryboxes")
            if isinstance(boxes, list):
                for child in boxes:
                    if not isinstance(child, dict):
                        continue
                    child_row: Dict[str, Any] = {}
                    # Inherit the parent's tenant and business-category information
                    child_row["tenant_id"] = r.get("tenant_id")
                    child_row["business_name"] = child.get("business_name", r.get("business_name"))
                    child_row["tenant_goods_business_id"] = child.get(
                        "tenant_goods_business_id", r.get("tenant_goods_business_id")
                    )
                    # Merge the child's own fields
                    child_row.update(child)
                    # Default parent-child relation
                    child_row.setdefault("pid", r.get("id"))
                    # Derive level / leaf flags
                    child_boxes = child_row.get("categoryboxes")
                    if not isinstance(child_boxes, list):
                        is_leaf = 1
                    else:
                        is_leaf = 1 if len(child_boxes) == 0 else 0
                    child_row.setdefault("category_level", 2)
                    child_row.setdefault("is_leaf", is_leaf)
                    expanded.append(child_row)
        return expanded
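The category expansion can be reduced to a small standalone sketch: each parent row is kept, and each dict nested in its `categoryboxes` becomes an extra row that inherits the tenant and points back at the parent via `pid`. This is a simplified illustration of the shape, not the task's full method:

```python
def expand_categories(rows):
    """Flatten each parent category plus the children nested in `categoryboxes`."""
    out = []
    for r in rows:
        out.append(r)
        for child in r.get("categoryboxes") or []:
            if not isinstance(child, dict):
                continue
            row = {"tenant_id": r.get("tenant_id")}  # inherit tenant from the parent
            row.update(child)                        # child's own fields win
            row.setdefault("pid", r.get("id"))       # default parent link
            out.append(row)
    return out

parents = [{"id": 1, "tenant_id": 9, "categoryboxes": [{"id": 11, "category_name": "beer"}]}]
print(len(expand_categories(parents)))  # → 2
```

Using `setdefault("pid", ...)` rather than assignment means a child that already carries its own parent id is left untouched, mirroring the method above.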
    def _merge_dim_scd2(
        self,
        cur,
        dwd_table: str,
        ods_table: str,
        dwd_cols: Sequence[str],
        ods_cols: Sequence[str],
        now: datetime,
    ) -> int:
        """Run an SCD2 merge on a dimension table: close changed versions and insert new ones."""
        pk_cols = self._get_primary_keys(cur, dwd_table)
        if not pk_cols:
            raise ValueError(f"{dwd_table} has no primary key configured; cannot run the SCD2 merge")

        mapping = self._build_column_mapping(dwd_table, pk_cols, ods_cols)
        ods_set = {c.lower() for c in ods_cols}
        table_sql = self._format_table(ods_table, "billiards_ods")
        # Build the SELECT expressions, honouring JSON/expression mappings
        select_exprs: list[str] = []
        added: set[str] = set()
        for col in dwd_cols:
            lc = col.lower()
            if lc in self.SCD_COLS:
                continue
            if lc in mapping:
                src, cast_type = mapping[lc]
                select_exprs.append(f'{self._cast_expr(src, cast_type)} AS "{lc}"')
                added.add(lc)
            elif lc in ods_set:
                select_exprs.append(f'"{lc}" AS "{lc}"')
                added.add(lc)
        # The category dimension additionally needs categoryboxes so children can be expanded
        if dwd_table == "billiards_dwd.dim_goods_category" and "categoryboxes" not in added and "categoryboxes" in ods_set:
            select_exprs.append('"categoryboxes" AS "categoryboxes"')
            added.add("categoryboxes")
        # Fallback: make sure every primary-key column is selected
        for pk in pk_cols:
            lc = pk.lower()
            if lc not in added:
                if lc in mapping:
                    src, cast_type = mapping[lc]
                    select_exprs.append(f'{self._cast_expr(src, cast_type)} AS "{lc}"')
                elif lc in ods_set:
                    select_exprs.append(f'"{lc}" AS "{lc}"')
                added.add(lc)

        if not select_exprs:
            return 0

        sql = f"SELECT {', '.join(select_exprs)} FROM {table_sql}"
        cur.execute(sql)
        rows = [{k.lower(): v for k, v in r.items()} for r in cur.fetchall()]

        # Special case: expand child categories for the goods-category dimension
        if dwd_table == "billiards_dwd.dim_goods_category":
            rows = self._expand_goods_category_rows(rows)

        inserted_or_updated = 0
        seen_pk = set()
        for row in rows:
            mapped_row: Dict[str, Any] = {}
            for col in dwd_cols:
                lc = col.lower()
                if lc in self.SCD_COLS:
                    continue
                value = row.get(lc)
                if value is None and lc in mapping:
                    src, _ = mapping[lc]
                    value = row.get(src.lower())
                mapped_row[lc] = value

            pk_key = tuple(mapped_row.get(pk) for pk in pk_cols)
            if pk_key in seen_pk:
                continue
            seen_pk.add(pk_key)
            if self._upsert_scd2_row(cur, dwd_table, dwd_cols, pk_cols, mapped_row, now):
                inserted_or_updated += 1
        # "processed" reports every source row scanned; actual changes are tracked in inserted_or_updated
        return len(rows)
    def _upsert_scd2_row(
        self,
        cur,
        dwd_table: str,
        dwd_cols: Sequence[str],
        pk_cols: Sequence[str],
        src_row: Dict[str, Any],
        now: datetime,
    ) -> bool:
        """SCD2 合并:若有变更则关闭旧版本并插入新版本。"""
        pk_values = [src_row.get(pk) for pk in pk_cols]
        if any(v is None for v in pk_values):
            self.logger.warning("跳过 %s:主键缺失 %s", dwd_table, dict(zip(pk_cols, pk_values)))
            return False

        where_clause = " AND ".join(f'"{pk}" = %s' for pk in pk_cols)
        table_sql = self._format_table(dwd_table, "billiards_dwd")
        cur.execute(
            f"SELECT * FROM {table_sql} WHERE {where_clause} AND COALESCE(scd2_is_current,1)=1 LIMIT 1",
            pk_values,
        )
        current = cur.fetchone()
        if current:
            current = {k.lower(): v for k, v in current.items()}

        if current and not self._is_row_changed(current, src_row, dwd_cols):
            return False

        if current:
            version = (current.get("scd2_version") or 1) + 1
            self._close_current_dim(cur, dwd_table, pk_cols, pk_values, now)
        else:
            version = 1

        self._insert_dim_row(cur, dwd_table, dwd_cols, src_row, now, version)
        return True

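上面的 `_upsert_scd2_row` 体现了 SCD2 的核心判定:无变化则跳过,有变化则关闭旧版本、插入新版本。下面用一个脱离数据库的内存版小示例勾勒这套流程(仅为示意,`scd2_*` 字段沿用本仓库约定,其余命名为假设):

```python
from datetime import datetime

# 极简内存版 SCD2:history 是一张"维表",pk 为业务主键列名。
HIGH_DATE = datetime(9999, 12, 31)

def scd2_upsert(history: list, pk: str, row: dict, now: datetime) -> bool:
    # 找当前版本(scd2_is_current=1)
    current = next(
        (h for h in history if h[pk] == row[pk] and h["scd2_is_current"] == 1),
        None,
    )
    attrs = {k: v for k, v in row.items() if k != pk}
    # 无变化:跳过
    if current and all(current.get(k) == v for k, v in attrs.items()):
        return False
    version = 1
    if current:
        # 关闭旧版本
        current["scd2_is_current"] = 0
        current["scd2_end_time"] = now
        version = current["scd2_version"] + 1
    # 插入新版本
    history.append({
        pk: row[pk], **attrs,
        "scd2_version": version,
        "scd2_is_current": 1,
        "scd2_start_time": now,
        "scd2_end_time": HIGH_DATE,
    })
    return True
```

与任务内实现一致,同一主键重复提交相同属性时不会产生新版本,属性变化时旧版本被打上结束时间并累加版本号。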
    def _close_current_dim(self, cur, table: str, pk_cols: Sequence[str], pk_values: Sequence[Any], now: datetime) -> None:
        """关闭当前版本:标记 scd2_is_current=0 并填充结束时间。"""
        set_sql = "scd2_end_time = %s, scd2_is_current = 0"
        where_clause = " AND ".join(f'"{pk}" = %s' for pk in pk_cols)
        table_sql = self._format_table(table, "billiards_dwd")
        cur.execute(f"UPDATE {table_sql} SET {set_sql} WHERE {where_clause} AND COALESCE(scd2_is_current,1)=1", [now, *pk_values])

    def _insert_dim_row(
        self,
        cur,
        table: str,
        dwd_cols: Sequence[str],
        src_row: Dict[str, Any],
        now: datetime,
        version: int,
    ) -> None:
        """插入新的 SCD2 版本行。"""
        insert_cols: List[str] = []
        placeholders: List[str] = []
        values: List[Any] = []
        for col in sorted(dwd_cols):
            lc = col.lower()
            insert_cols.append(f'"{lc}"')
            placeholders.append("%s")
            if lc == "scd2_start_time":
                values.append(now)
            elif lc == "scd2_end_time":
                values.append(datetime(9999, 12, 31, 0, 0, 0))
            elif lc == "scd2_is_current":
                values.append(1)
            elif lc == "scd2_version":
                values.append(version)
            else:
                values.append(src_row.get(lc))
        table_sql = self._format_table(table, "billiards_dwd")
        sql = f'INSERT INTO {table_sql} ({", ".join(insert_cols)}) VALUES ({", ".join(placeholders)})'
        cur.execute(sql, values)

    def _is_row_changed(self, current: Dict[str, Any], incoming: Dict[str, Any], dwd_cols: Sequence[str]) -> bool:
        """比较非 SCD2 列,判断是否存在变更。"""
        for col in dwd_cols:
            lc = col.lower()
            if lc in self.SCD_COLS:
                continue
            if current.get(lc) != incoming.get(lc):
                return True
        return False

    def _merge_fact_increment(
        self,
        cur,
        dwd_table: str,
        ods_table: str,
        dwd_cols: Sequence[str],
        ods_cols: Sequence[str],
        dwd_types: Dict[str, str],
        ods_types: Dict[str, str],
    ) -> int:
        """事实表按时间增量插入,默认按列名交集写入。"""
        mapping_entries = self.FACT_MAPPINGS.get(dwd_table) or []
        mapping: Dict[str, tuple[str, str | None]] = {
            dst.lower(): (src, cast_type) for dst, src, cast_type in mapping_entries
        }

        mapping_dest = [dst for dst, _, _ in mapping_entries]
        insert_cols: List[str] = list(mapping_dest)
        for col in dwd_cols:
            if col in self.SCD_COLS:
                continue
            if col in insert_cols:
                continue
            if col in ods_cols:
                insert_cols.append(col)

        pk_cols = self._get_primary_keys(cur, dwd_table)
        ods_set = {c.lower() for c in ods_cols}
        existing_lower = [c.lower() for c in insert_cols]
        for pk in pk_cols:
            pk_lower = pk.lower()
            if pk_lower in existing_lower:
                continue
            if pk_lower in ods_set:
                insert_cols.append(pk)
                existing_lower.append(pk_lower)
            elif "id" in ods_set:
                insert_cols.append(pk)
                existing_lower.append(pk_lower)
                mapping[pk_lower] = ("id", None)

        # 保持列顺序同时去重
        seen_cols: set[str] = set()
        ordered_cols: list[str] = []
        for col in insert_cols:
            lc = col.lower()
            if lc not in seen_cols:
                seen_cols.add(lc)
                ordered_cols.append(col)
        insert_cols = ordered_cols

        if not insert_cols:
            self.logger.warning("跳过 %s:未找到可插入的列", dwd_table)
            return 0

        order_col = self._pick_order_column(dwd_cols, ods_cols)
        where_sql = ""
        params: List[Any] = []
        dwd_table_sql = self._format_table(dwd_table, "billiards_dwd")
        ods_table_sql = self._format_table(ods_table, "billiards_ods")
        if order_col:
            cur.execute(f'SELECT COALESCE(MAX("{order_col}"), %s) FROM {dwd_table_sql}', ("1970-01-01",))
            row = cur.fetchone() or {}
            watermark = list(row.values())[0] if row else "1970-01-01"
            where_sql = f'WHERE "{order_col}" > %s'
            params.append(watermark)

        default_cols = [c for c in insert_cols if c.lower() not in mapping]
        default_expr_map: Dict[str, str] = {}
        if default_cols:
            default_exprs = self._build_fact_select_exprs(default_cols, dwd_types, ods_types)
            default_expr_map = dict(zip(default_cols, default_exprs))

        select_exprs: List[str] = []
        for col in insert_cols:
            key = col.lower()
            if key in mapping:
                src, cast_type = mapping[key]
                select_exprs.append(self._cast_expr(src, cast_type))
            else:
                select_exprs.append(default_expr_map[col])

        select_cols_sql = ", ".join(select_exprs)
        insert_cols_sql = ", ".join(f'"{c}"' for c in insert_cols)
        sql = f'INSERT INTO {dwd_table_sql} ({insert_cols_sql}) SELECT {select_cols_sql} FROM {ods_table_sql} {where_sql}'

        if pk_cols:
            pk_sql = ", ".join(f'"{c}"' for c in pk_cols)
            sql += f" ON CONFLICT ({pk_sql}) DO NOTHING"

        cur.execute(sql, params)
        return cur.rowcount

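事实表增量的关键是水位线:取目标表时间列的最大值,只装载源表中严格大于该值的行。下面用一个纯内存的草图演示这一判定(仅为示意,`created_at` 等命名为假设,与任务内的 SQL 实现无关):

```python
# 水位线增量示意:target 里已装载的最大时间即水位线,
# 源表仅取晚于水位线的行,重复执行不会重复装载。
def incremental_pull(source: list, target: list, order_col: str = "created_at") -> int:
    # 目标为空时退回到一个极早的默认水位线
    watermark = max((r[order_col] for r in target), default="1970-01-01")
    new_rows = [r for r in source if r[order_col] > watermark]
    target.extend(new_rows)
    return len(new_rows)
```

ISO 格式的日期字符串可直接按字典序比较,因此示例无需解析时间;任务内等价的做法是 `COALESCE(MAX("order_col"), '1970-01-01')` 加 `WHERE "order_col" > %s`。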
    def _pick_order_column(self, dwd_cols: Iterable[str], ods_cols: Iterable[str]) -> str | None:
        """选择用于增量的时间列(需同时存在于 DWD 与 ODS)。"""
        lower_cols = {c.lower() for c in dwd_cols} & {c.lower() for c in ods_cols}
        for candidate in self.FACT_ORDER_CANDIDATES:
            if candidate.lower() in lower_cols:
                return candidate.lower()
        return None

    def _build_fact_select_exprs(
        self,
        insert_cols: Sequence[str],
        dwd_types: Dict[str, str],
        ods_types: Dict[str, str],
    ) -> List[str]:
        """构造事实表 SELECT 列表,需要时做类型转换。"""
        numeric_types = {"integer", "bigint", "smallint", "numeric", "double precision", "real", "decimal"}
        text_types = {"text", "character varying", "varchar"}
        exprs = []
        for col in insert_cols:
            d_type = dwd_types.get(col)
            o_type = ods_types.get(col)
            if d_type in numeric_types and o_type in text_types:
                exprs.append(f"CAST(NULLIF(CAST(\"{col}\" AS text), '') AS numeric)::{d_type}")
            else:
                exprs.append(f'"{col}"')
        return exprs

    def _split_table_name(self, name: str, default_schema: str) -> tuple[str, str]:
        """拆分 schema.table,若无 schema 则补默认 schema。"""
        parts = name.split(".")
        if len(parts) == 2:
            return parts[0], parts[1].lower()
        return default_schema, name.lower()

    def _table_base(self, name: str) -> str:
        """获取不含 schema 的表名。"""
        return name.split(".")[-1]

    def _format_table(self, name: str, default_schema: str) -> str:
        """返回带引号的 schema.table 名称。"""
        schema, table = self._split_table_name(name, default_schema)
        return f'"{schema}"."{table}"'

    def _cast_expr(self, col: str, cast_type: str | None) -> str:
        """构造带可选 CAST 的列表达式。"""
        if col.upper() == "NULL":
            base = "NULL"
        else:
            is_expr = not col.isidentifier() or "->" in col or "#>>" in col or "::" in col or "'" in col
            base = col if is_expr else f'"{col}"'
        if cast_type:
            cast_lower = cast_type.lower()
            if cast_lower in {"bigint", "integer", "numeric", "decimal"}:
                return f"CAST(NULLIF(CAST({base} AS text), '') AS numeric)::{cast_type}"
            if cast_lower == "timestamptz":
                return f"({base})::timestamptz"
            return f"{base}::{cast_type}"
        return base
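`_cast_expr` 的要点在于数值类型先经 NULLIF 把空串转成 NULL,避免 `''::bigint` 直接报错。下面是一个简化的纯字符串版本,便于单独理解拼接结果(仅按 `isidentifier` 区分列名与表达式,比任务内实现的判断条件更粗糙,属示意):

```python
# 简化版 cast 表达式拼接:普通列名加双引号,表达式原样保留,
# 数值目标类型套一层 NULLIF 把空串洗成 NULL。
def cast_expr(col: str, cast_type) -> str:
    if col.upper() == "NULL":
        base = "NULL"
    else:
        base = f'"{col}"' if col.isidentifier() else col
    if not cast_type:
        return base
    if cast_type.lower() in {"bigint", "integer", "numeric", "decimal"}:
        return f"CAST(NULLIF(CAST({base} AS text), '') AS numeric)::{cast_type}"
    if cast_type.lower() == "timestamptz":
        return f"({base})::timestamptz"
    return f"{base}::{cast_type}"
```

例如 `cast_expr("amount", "bigint")` 生成的表达式可直接嵌入 SELECT 列表;JSON 取值表达式(含 `->>`)因不是合法标识符而原样透传。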
105
etl_billiards/tasks/dwd_quality_task.py
Normal file
@@ -0,0 +1,105 @@
# -*- coding: utf-8 -*-
"""DWD 质量核对任务:按 dwd_quality_check.md 输出行数/金额对照报表。"""
from __future__ import annotations

import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Iterable, List, Sequence, Tuple

from psycopg2.extras import RealDictCursor

from .base_task import BaseTask, TaskContext
from .dwd_load_task import DwdLoadTask


class DwdQualityTask(BaseTask):
    """对 ODS 与 DWD 进行行数、金额对照核查,生成 JSON 报表。"""

    REPORT_PATH = Path("etl_billiards/reports/dwd_quality_report.json")
    AMOUNT_KEYWORDS = ("amount", "money", "fee", "balance")

    def get_task_code(self) -> str:
        """返回任务编码。"""
        return "DWD_QUALITY_CHECK"

    def extract(self, context: TaskContext) -> dict[str, Any]:
        """准备运行时上下文。"""
        return {"now": datetime.now()}

    def load(self, extracted: dict[str, Any], context: TaskContext) -> dict[str, Any]:
        """输出行数/金额差异报表到本地文件。"""
        report: Dict[str, Any] = {
            "generated_at": extracted["now"].isoformat(),
            "tables": [],
            "note": "行数/金额核对,金额字段基于列名包含 amount/money/fee/balance 的数值列自动扫描。",
        }

        with self.db.conn.cursor(cursor_factory=RealDictCursor) as cur:
            for dwd_table, ods_table in DwdLoadTask.TABLE_MAP.items():
                count_info = self._compare_counts(cur, dwd_table, ods_table)
                amount_info = self._compare_amounts(cur, dwd_table, ods_table)
                report["tables"].append(
                    {
                        "dwd_table": dwd_table,
                        "ods_table": ods_table,
                        "count": count_info,
                        "amounts": amount_info,
                    }
                )

        self.REPORT_PATH.parent.mkdir(parents=True, exist_ok=True)
        self.REPORT_PATH.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8")
        self.logger.info("DWD 质检报表已生成:%s", self.REPORT_PATH)
        return {"report_path": str(self.REPORT_PATH)}

    # ---------------------- helpers ----------------------
    def _compare_counts(self, cur, dwd_table: str, ods_table: str) -> Dict[str, Any]:
        """统计两端行数并返回差异。"""
        dwd_schema, dwd_name = self._split_table_name(dwd_table, default_schema="billiards_dwd")
        ods_schema, ods_name = self._split_table_name(ods_table, default_schema="billiards_ods")
        cur.execute(f'SELECT COUNT(1) AS cnt FROM "{dwd_schema}"."{dwd_name}"')
        dwd_cnt = cur.fetchone()["cnt"]
        cur.execute(f'SELECT COUNT(1) AS cnt FROM "{ods_schema}"."{ods_name}"')
        ods_cnt = cur.fetchone()["cnt"]
        return {"dwd": dwd_cnt, "ods": ods_cnt, "diff": dwd_cnt - ods_cnt}

    def _compare_amounts(self, cur, dwd_table: str, ods_table: str) -> List[Dict[str, Any]]:
        """扫描金额相关列,生成 ODS 与 DWD 的汇总对照。"""
        dwd_schema, dwd_name = self._split_table_name(dwd_table, default_schema="billiards_dwd")
        ods_schema, ods_name = self._split_table_name(ods_table, default_schema="billiards_ods")

        dwd_amount_cols = self._get_numeric_amount_columns(cur, dwd_schema, dwd_name)
        ods_amount_cols = self._get_numeric_amount_columns(cur, ods_schema, ods_name)
        common_amount_cols = sorted(set(dwd_amount_cols) & set(ods_amount_cols))

        results: List[Dict[str, Any]] = []
        for col in common_amount_cols:
            cur.execute(f'SELECT COALESCE(SUM("{col}"),0) AS val FROM "{dwd_schema}"."{dwd_name}"')
            dwd_sum = cur.fetchone()["val"]
            cur.execute(f'SELECT COALESCE(SUM("{col}"),0) AS val FROM "{ods_schema}"."{ods_name}"')
            ods_sum = cur.fetchone()["val"]
            results.append(
                {
                    "column": col,
                    "dwd_sum": float(dwd_sum or 0),
                    "ods_sum": float(ods_sum or 0),
                    "diff": float(dwd_sum or 0) - float(ods_sum or 0),
                }
            )
        return results

    def _get_numeric_amount_columns(self, cur, schema: str, table: str) -> List[str]:
        """获取列名包含金额关键词的数值型字段。"""
        cur.execute(
            """
            SELECT column_name
            FROM information_schema.columns
            WHERE table_schema = %s
              AND table_name = %s
              AND data_type IN ('numeric','double precision','integer','bigint','smallint','real','decimal')
            """,
            (schema, table),
        )
        cols = [r["column_name"].lower() for r in cur.fetchall()]
        return [c for c in cols if any(key in c for key in self.AMOUNT_KEYWORDS)]

    def _split_table_name(self, name: str, default_schema: str) -> Tuple[str, str]:
        """拆分 schema 与表名,缺省使用 default_schema。"""
        parts = name.split(".")
        if len(parts) == 2:
            return parts[0], parts[1]
        return default_schema, name
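质检报表的核心就是对两侧聚合结果逐列做差,diff 非零即为告警项。下面用一个内存版对账草图说明(仅为示意,真实任务中两侧数值来自 SQL 的 SUM 聚合):

```python
# 内存版金额对账:ods/dwd 为 {列名: 汇总值},只比对两侧共有的列,
# 输出结构与 _compare_amounts 的报表项一致。
def reconcile(ods: dict, dwd: dict) -> list:
    report = []
    for col in sorted(set(ods) & set(dwd)):
        report.append(
            {
                "column": col,
                "ods_sum": ods[col],
                "dwd_sum": dwd[col],
                "diff": dwd[col] - ods[col],
            }
        )
    return report
```

列名排序保证报表稳定可比;只取交集列,与任务中"金额列需同时存在于 ODS 与 DWD"的约束一致。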
36
etl_billiards/tasks/init_dwd_schema_task.py
Normal file
@@ -0,0 +1,36 @@
# -*- coding: utf-8 -*-
"""初始化 DWD Schema:执行 schema_dwd_doc.sql,可选先 DROP SCHEMA。"""
from __future__ import annotations

from pathlib import Path
from typing import Any

from .base_task import BaseTask, TaskContext


class InitDwdSchemaTask(BaseTask):
    """通过调度执行 DWD schema 初始化。"""

    def get_task_code(self) -> str:
        """返回任务编码。"""
        return "INIT_DWD_SCHEMA"

    def extract(self, context: TaskContext) -> dict[str, Any]:
        """读取 DWD SQL 文件与参数。"""
        base_dir = Path(__file__).resolve().parents[1] / "database"
        dwd_path = Path(self.config.get("schema.dwd_file", base_dir / "schema_dwd_doc.sql"))
        if not dwd_path.exists():
            raise FileNotFoundError(f"未找到 DWD schema 文件: {dwd_path}")

        drop_first = self.config.get("dwd.drop_schema_first", False)
        return {"dwd_sql": dwd_path.read_text(encoding="utf-8"), "dwd_file": str(dwd_path), "drop_first": drop_first}

    def load(self, extracted: dict[str, Any], context: TaskContext) -> dict:
        """可选 DROP schema,再执行 DWD DDL。"""
        with self.db.conn.cursor() as cur:
            if extracted["drop_first"]:
                cur.execute("DROP SCHEMA IF EXISTS billiards_dwd CASCADE;")
                self.logger.info("已执行 DROP SCHEMA billiards_dwd CASCADE")
            self.logger.info("执行 DWD schema 文件: %s", extracted["dwd_file"])
            cur.execute(extracted["dwd_sql"])
        return {"executed": 1, "files": [extracted["dwd_file"]]}
73
etl_billiards/tasks/init_schema_task.py
Normal file
@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
"""任务:初始化运行环境,执行 ODS 与 etl_admin 的 DDL,并准备日志/导出目录。"""
from __future__ import annotations

from pathlib import Path
from typing import Any

from .base_task import BaseTask, TaskContext


class InitOdsSchemaTask(BaseTask):
    """通过调度执行初始化:创建必要目录,执行 ODS 与 etl_admin 的 DDL。"""

    def get_task_code(self) -> str:
        """返回任务编码。"""
        return "INIT_ODS_SCHEMA"

    def extract(self, context: TaskContext) -> dict[str, Any]:
        """读取 SQL 文件路径,收集需创建的目录。"""
        base_dir = Path(__file__).resolve().parents[1] / "database"
        ods_path = Path(self.config.get("schema.ods_file", base_dir / "schema_ODS_doc.sql"))
        admin_path = Path(self.config.get("schema.etl_admin_file", base_dir / "schema_etl_admin.sql"))
        if not ods_path.exists():
            raise FileNotFoundError(f"找不到 ODS schema 文件: {ods_path}")
        if not admin_path.exists():
            raise FileNotFoundError(f"找不到 etl_admin schema 文件: {admin_path}")

        log_root = Path(self.config.get("io.log_root") or self.config["io"]["log_root"])
        export_root = Path(self.config.get("io.export_root") or self.config["io"]["export_root"])
        fetch_root = Path(self.config.get("pipeline.fetch_root") or self.config["pipeline"]["fetch_root"])
        ingest_dir = Path(self.config.get("pipeline.ingest_source_dir") or fetch_root)

        return {
            "ods_sql": ods_path.read_text(encoding="utf-8"),
            "admin_sql": admin_path.read_text(encoding="utf-8"),
            "ods_file": str(ods_path),
            "admin_file": str(admin_path),
            "dirs": [log_root, export_root, fetch_root, ingest_dir],
        }

    def load(self, extracted: dict[str, Any], context: TaskContext) -> dict:
        """执行 DDL 并创建必要目录。

        安全提示:
            ODS DDL 文件可能携带头部说明或异常注释,为避免因非 SQL 文本导致执行失败,这里会做一次轻量清洗后再执行。
        """
        for d in extracted["dirs"]:
            Path(d).mkdir(parents=True, exist_ok=True)
            self.logger.info("已确保目录存在: %s", d)

        # 处理 ODS SQL:去掉头部说明行,以及易出错的 COMMENT ON 行(如 CamelCase 未加引号)
        ods_sql_raw: str = extracted["ods_sql"]
        drop_idx = ods_sql_raw.find("DROP SCHEMA")
        if drop_idx > 0:
            ods_sql_raw = ods_sql_raw[drop_idx:]
        cleaned_lines: list[str] = []
        for line in ods_sql_raw.splitlines():
            if line.strip().upper().startswith("COMMENT ON "):
                continue
            cleaned_lines.append(line)
        ods_sql = "\n".join(cleaned_lines)

        with self.db.conn.cursor() as cur:
            self.logger.info("执行 etl_admin schema 文件: %s", extracted["admin_file"])
            cur.execute(extracted["admin_sql"])
            self.logger.info("执行 ODS schema 文件: %s", extracted["ods_file"])
            cur.execute(ods_sql)

        return {
            "executed": 2,
            "files": [extracted["admin_file"], extracted["ods_file"]],
            "dirs_prepared": [str(p) for p in extracted["dirs"]],
        }
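`load` 中的"轻量清洗"可以抽成一个纯函数来理解:截掉 DROP SCHEMA 之前的说明文字,并丢弃 COMMENT ON 行。以下为示意草图(假设输入为单个 DDL 字符串,逻辑与任务内实现一致):

```python
# DDL 轻量清洗:头部非 SQL 说明被截断,COMMENT ON 行被丢弃,
# 其余语句原样保留。
def clean_ddl(sql: str) -> str:
    idx = sql.find("DROP SCHEMA")
    if idx > 0:
        sql = sql[idx:]
    lines = [
        ln for ln in sql.splitlines()
        if not ln.strip().upper().startswith("COMMENT ON ")
    ]
    return "\n".join(lines)
```

这样即使 DDL 文件带有文档化的头部说明,执行时也只剩可直接运行的语句。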
File diff suppressed because it is too large
@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.dimensions.member import MemberLoader
from models.parsers import TypeParser
@@ -7,7 +7,7 @@ import json
class MembersDwdTask(BaseDwdTask):
    """
    DWD Task: Process Member Records from ODS to Dimension Table
-   Source: billiards_ods.ods_member_profile
+   Source: billiards_ods.member_profiles
    Target: billiards.dim_member
    """

@@ -29,7 +29,7 @@ class MembersDwdTask(BaseDwdTask):

        # Iterate ODS Data
        batches = self.iter_ods_rows(
-           table_name="billiards_ods.ods_member_profile",
+           table_name="billiards_ods.member_profiles",
            columns=["site_id", "member_id", "payload", "fetched_at"],
            start_time=window_start,
            end_time=window_end
@@ -87,3 +87,4 @@ class MembersDwdTask(BaseDwdTask):
        except Exception as e:
            self.logger.warning(f"Error parsing member: {e}")
        return None

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
"""ODS ingestion tasks."""
from __future__ import annotations

@@ -62,11 +62,11 @@ class BaseOdsTask(BaseTask):

    def execute(self) -> dict:
        spec = self.SPEC
        self.logger.info("开始执行 %s (ODS)", spec.code)

        store_id = TypeParser.parse_int(self.config.get("app.store_id"))
        if not store_id:
            raise ValueError("app.store_id 未配置,无法执行 ODS 任务")

        page_size = self.config.get("api.page_size", 200)
        params = self._build_params(spec, store_id)
@@ -122,13 +122,13 @@ class BaseOdsTask(BaseTask):
            counts["fetched"] += len(page_records)

            self.db.commit()
            self.logger.info("%s ODS 任务完成: %s", spec.code, counts)
            return self._build_result("SUCCESS", counts)

        except Exception:
            self.db.rollback()
            counts["errors"] += 1
            self.logger.error("%s ODS 任务失败", spec.code, exc_info=True)
            raise

    def _build_params(self, spec: OdsTaskSpec, store_id: int) -> dict:
@@ -201,7 +201,7 @@ class BaseOdsTask(BaseTask):
            value = self._extract_value(record, col_spec)
            if value is None and col_spec.required:
                self.logger.warning(
                    "%s 缺少必填字段 %s,原始记录: %s",
                    spec.code,
                    col_spec.column,
                    record,
@@ -265,9 +265,38 @@ def _int_col(name: str, *sources: str, required: bool = False) -> ColumnSpec:
    )


def _decimal_col(name: str, *sources: str) -> ColumnSpec:
    """金额列:统一解析为保留两位小数的数值。"""
    return ColumnSpec(
        column=name,
        sources=sources,
        transform=lambda v: TypeParser.parse_decimal(v, 2),
    )


def _bool_col(name: str, *sources: str) -> ColumnSpec:
    """布尔列:兼容 0/1、true/false 等写法。"""

    def _to_bool(value):
        if value is None:
            return None
        if isinstance(value, bool):
            return value
        s = str(value).strip().lower()
        if s in {"1", "true", "t", "yes", "y"}:
            return True
        if s in {"0", "false", "f", "no", "n"}:
            return False
        return bool(value)

    return ColumnSpec(column=name, sources=sources, transform=_to_bool)


ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
    OdsTaskSpec(
-       code="ODS_ASSISTANT_ACCOUNTS",
+       code="ODS_ASSISTANT_ACCOUNT",
        class_name="OdsAssistantAccountsTask",
        table_name="billiards_ods.assistant_accounts_master",
        endpoint="/PersonnelManagement/SearchAssistantInfo",
@@ -281,10 +310,10 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        description="助教账号档案 ODS:SearchAssistantInfo -> assistantInfos 原始 JSON",
    ),
    OdsTaskSpec(
-       code="ODS_ORDER_SETTLE",
+       code="ODS_SETTLEMENT_RECORDS",
        class_name="OdsOrderSettleTask",
        table_name="billiards_ods.settlement_records",
        endpoint="/Site/GetAllOrderSettleList",
@@ -299,7 +328,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="结账记录 ODS:GetAllOrderSettleList -> settleList 原始 JSON",
    ),
    OdsTaskSpec(
        code="ODS_TABLE_USE",
@@ -317,7 +346,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="台费计费流水 ODS:GetSiteTableOrderDetails -> siteTableUseDetailsList 原始 JSON",
    ),
    OdsTaskSpec(
        code="ODS_ASSISTANT_LEDGER",
@@ -334,7 +363,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        description="助教服务流水 ODS:GetOrderAssistantDetails -> orderAssistantDetails 原始 JSON",
    ),
    OdsTaskSpec(
        code="ODS_ASSISTANT_ABOLISH",
@@ -351,10 +380,10 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        description="助教废除记录 ODS:GetAbolitionAssistant -> abolitionAssistants 原始 JSON",
    ),
    OdsTaskSpec(
-       code="ODS_GOODS_LEDGER",
+       code="ODS_STORE_GOODS_SALES",
        class_name="OdsGoodsLedgerTask",
        table_name="billiards_ods.store_goods_sales_records",
        endpoint="/TenantGoods/GetGoodsSalesList",
@@ -369,7 +398,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="门店商品销售流水 ODS:GetGoodsSalesList -> orderGoodsLedgers 原始 JSON",
    ),
    OdsTaskSpec(
        code="ODS_PAYMENT",
@@ -386,7 +415,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="支付流水 ODS:GetPayLogListPage 原始 JSON",
    ),
    OdsTaskSpec(
        code="ODS_REFUND",
@@ -403,10 +432,10 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="退款流水 ODS:GetRefundPayLogList 原始 JSON",
    ),
    OdsTaskSpec(
-       code="ODS_COUPON_VERIFY",
+       code="ODS_PLATFORM_COUPON",
        class_name="OdsCouponVerifyTask",
        table_name="billiards_ods.platform_coupon_redemption_records",
        endpoint="/Promotion/GetOfflineCouponConsumePageList",
@@ -420,7 +449,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="平台/团购券核销 ODS:GetOfflineCouponConsumePageList 原始 JSON",
    ),
    OdsTaskSpec(
        code="ODS_MEMBER",
@@ -438,7 +467,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="会员档案 ODS:GetTenantMemberList -> tenantMemberInfos 原始 JSON",
    ),
    OdsTaskSpec(
        code="ODS_MEMBER_CARD",
@@ -456,7 +485,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="会员储值卡 ODS:GetTenantMemberCardList -> tenantMemberCards 原始 JSON",
    ),
    OdsTaskSpec(
        code="ODS_MEMBER_BALANCE",
@@ -474,7 +503,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="会员余额变动 ODS:GetMemberCardBalanceChange -> tenantMemberCardLogs 原始 JSON",
    ),
    OdsTaskSpec(
        code="ODS_RECHARGE_SETTLE",
@@ -483,19 +512,83 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
        endpoint="/Site/GetRechargeSettleList",
        data_path=("data",),
        list_key="settleList",
-       pk_columns=(),
+       pk_columns=(_int_col("recharge_order_id", "settleList.id", "id", required=True),),
        extra_columns=(
            _int_col("tenant_id", "settleList.tenantId", "tenantId"),
            _int_col("site_id", "settleList.siteId", "siteId", "siteProfile.id"),
            ColumnSpec("site_name_snapshot", sources=("siteProfile.shop_name", "settleList.siteName")),
            _int_col("member_id", "settleList.memberId", "memberId"),
            ColumnSpec("member_name_snapshot", sources=("settleList.memberName", "memberName")),
            ColumnSpec("member_phone_snapshot", sources=("settleList.memberPhone", "memberPhone")),
            _int_col("tenant_member_card_id", "settleList.tenantMemberCardId", "tenantMemberCardId"),
            ColumnSpec("member_card_type_name", sources=("settleList.memberCardTypeName", "memberCardTypeName")),
            _int_col("settle_relate_id", "settleList.settleRelateId", "settleRelateId"),
            _int_col("settle_type", "settleList.settleType", "settleType"),
            ColumnSpec("settle_name", sources=("settleList.settleName", "settleName")),
            _int_col("is_first", "settleList.isFirst", "isFirst"),
            _int_col("settle_status", "settleList.settleStatus", "settleStatus"),
            _decimal_col("pay_amount", "settleList.payAmount", "payAmount"),
            _decimal_col("refund_amount", "settleList.refundAmount", "refundAmount"),
            _decimal_col("point_amount", "settleList.pointAmount", "pointAmount"),
            _decimal_col("cash_amount", "settleList.cashAmount", "cashAmount"),
            _decimal_col("online_amount", "settleList.onlineAmount", "onlineAmount"),
            _decimal_col("balance_amount", "settleList.balanceAmount", "balanceAmount"),
            _decimal_col("card_amount", "settleList.cardAmount", "cardAmount"),
            _decimal_col("coupon_amount", "settleList.couponAmount", "couponAmount"),
            _decimal_col("recharge_card_amount", "settleList.rechargeCardAmount", "rechargeCardAmount"),
            _decimal_col("gift_card_amount", "settleList.giftCardAmount", "giftCardAmount"),
            _decimal_col("prepay_money", "settleList.prepayMoney", "prepayMoney"),
            _decimal_col("consume_money", "settleList.consumeMoney", "consumeMoney"),
            _decimal_col("goods_money", "settleList.goodsMoney", "goodsMoney"),
            _decimal_col("real_goods_money", "settleList.realGoodsMoney", "realGoodsMoney"),
            _decimal_col("table_charge_money", "settleList.tableChargeMoney", "tableChargeMoney"),
            _decimal_col("service_money", "settleList.serviceMoney", "serviceMoney"),
            _decimal_col("activity_discount", "settleList.activityDiscount", "activityDiscount"),
            _decimal_col("all_coupon_discount", "settleList.allCouponDiscount", "allCouponDiscount"),
            _decimal_col("goods_promotion_money", "settleList.goodsPromotionMoney", "goodsPromotionMoney"),
            _decimal_col("assistant_promotion_money", "settleList.assistantPromotionMoney", "assistantPromotionMoney"),
            _decimal_col("assistant_pd_money", "settleList.assistantPdMoney", "assistantPdMoney"),
            _decimal_col("assistant_cx_money", "settleList.assistantCxMoney", "assistantCxMoney"),
            _decimal_col("assistant_manual_discount", "settleList.assistantManualDiscount", "assistantManualDiscount"),
            _decimal_col("coupon_sale_amount", "settleList.couponSaleAmount", "couponSaleAmount"),
            _decimal_col("member_discount_amount", "settleList.memberDiscountAmount", "memberDiscountAmount"),
            _decimal_col("point_discount_price", "settleList.pointDiscountPrice", "pointDiscountPrice"),
            _decimal_col("point_discount_cost", "settleList.pointDiscountCost", "pointDiscountCost"),
            _decimal_col("adjust_amount", "settleList.adjustAmount", "adjustAmount"),
            _decimal_col("rounding_amount", "settleList.roundingAmount", "roundingAmount"),
            _int_col("payment_method", "settleList.paymentMethod", "paymentMethod"),
            _bool_col("can_be_revoked", "settleList.canBeRevoked", "canBeRevoked"),
            _bool_col("is_bind_member", "settleList.isBindMember", "isBindMember"),
            _bool_col("is_activity", "settleList.isActivity", "isActivity"),
            _bool_col("is_use_coupon", "settleList.isUseCoupon", "isUseCoupon"),
            _bool_col("is_use_discount", "settleList.isUseDiscount", "isUseDiscount"),
            _int_col("operator_id", "settleList.operatorId", "operatorId"),
            ColumnSpec("operator_name_snapshot", sources=("settleList.operatorName", "operatorName")),
_int_col("salesman_user_id", "settleList.salesManUserId", "salesmanUserId", "salesManUserId"),
|
||||
ColumnSpec("salesman_name", sources=("settleList.salesManName", "salesmanName", "settleList.salesmanName")),
|
||||
ColumnSpec("order_remark", sources=("settleList.orderRemark", "orderRemark")),
|
||||
_int_col("table_id", "settleList.tableId", "tableId"),
|
||||
_int_col("serial_number", "settleList.serialNumber", "serialNumber"),
|
||||
_int_col("revoke_order_id", "settleList.revokeOrderId", "revokeOrderId"),
|
||||
ColumnSpec("revoke_order_name", sources=("settleList.revokeOrderName", "revokeOrderName")),
|
||||
ColumnSpec("revoke_time", sources=("settleList.revokeTime", "revokeTime")),
|
||||
ColumnSpec("create_time", sources=("settleList.createTime", "createTime")),
|
||||
ColumnSpec("pay_time", sources=("settleList.payTime", "payTime")),
|
||||
ColumnSpec("site_profile", sources=("siteProfile",)),
|
||||
),
include_site_column=False,
include_source_endpoint=False,
include_source_endpoint=True,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
include_fetched_at=True,
include_record_index=False,
conflict_columns_override=None,
requires_window=False,
description="会员充值结算 ODS:GetRechargeSettleList -> settleList 原始 JSON",
description="会员充值结算 ODS:GetRechargeSettleList -> data.settleList 原始 JSON",
),

OdsTaskSpec(
code="ODS_PACKAGE",
code="ODS_GROUP_PACKAGE",
class_name="OdsPackageTask",
table_name="billiards_ods.group_buy_packages",
endpoint="/PackageCoupon/QueryPackageCouponList",
@@ -510,7 +603,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="团购套餐定义 ODS:QueryPackageCouponList -> packageCouponList 原始 JSON",
),
OdsTaskSpec(
code="ODS_GROUP_BUY_REDEMPTION",
@@ -528,7 +621,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="团购套餐核销 ODS:GetSiteTableUseDetails -> siteTableUseDetailsList 原始 JSON",
),
OdsTaskSpec(
code="ODS_INVENTORY_STOCK",
@@ -545,7 +638,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="库存汇总 ODS:GetGoodsStockReport 原始 JSON",
),
OdsTaskSpec(
code="ODS_INVENTORY_CHANGE",
@@ -562,7 +655,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
description="库存变化记录 ODS:QueryGoodsOutboundReceipt -> queryDeliveryRecordsList 原始 JSON",
),
OdsTaskSpec(
code="ODS_TABLES",
@@ -580,7 +673,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="台桌维表 ODS:GetSiteTables -> siteTables 原始 JSON",
),
OdsTaskSpec(
code="ODS_GOODS_CATEGORY",
@@ -598,7 +691,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="库存商品分类树 ODS:QueryPrimarySecondaryCategory -> goodsCategoryList 原始 JSON",
),
OdsTaskSpec(
code="ODS_STORE_GOODS",
@@ -616,10 +709,10 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="门店商品档案 ODS:GetGoodsInventoryList -> orderGoodsList 原始 JSON",
),
OdsTaskSpec(
code="ODS_TABLE_DISCOUNT",
code="ODS_TABLE_FEE_DISCOUNT",
class_name="OdsTableDiscountTask",
table_name="billiards_ods.table_fee_discount_records",
endpoint="/Site/GetTaiFeeAdjustList",
@@ -634,7 +727,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="台费折扣/调账 ODS:GetTaiFeeAdjustList -> taiFeeAdjustInfos 原始 JSON",
),
OdsTaskSpec(
code="ODS_TENANT_GOODS",
@@ -652,7 +745,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="租户商品档案 ODS:QueryTenantGoods -> tenantGoodsList 原始 JSON",
),
OdsTaskSpec(
code="ODS_SETTLEMENT_TICKET",
@@ -671,7 +764,7 @@ ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
include_site_id=False,
description="结账小票详情 ODS:GetOrderSettleTicketNew 原始 JSON",
),
)

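Editor's note: each spec above lists fallback `sources` for a column, trying the nested `settleList.*` path first and then the flat key. A minimal sketch of how such dotted-path fallback resolution could work (the helper names here are hypothetical illustrations, not the project's actual `ColumnSpec` implementation):

```python
from typing import Any, Optional, Sequence

def resolve_source(record: dict, path: str) -> Optional[Any]:
    """Walk a dotted path like 'settleList.memberName' through nested dicts."""
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

def pick_value(record: dict, sources: Sequence[str]) -> Optional[Any]:
    """Return the first non-None value among the candidate source paths."""
    for path in sources:
        value = resolve_source(record, path)
        if value is not None:
            return value
    return None

record = {"settleList": {"memberName": "张三"}, "payAmount": "12.50"}
assert pick_value(record, ("settleList.memberName", "memberName")) == "张三"
assert pick_value(record, ("settleList.payAmount", "payAmount")) == "12.50"
```

This fallback order is what lets the same spec serve both the nested settle-list payload and flattened replay records.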
@@ -725,7 +818,7 @@ class OdsSettlementTicketTask(BaseOdsTask):

if not candidates:
self.logger.info(
"%s: 窗口[%s ~ %s] 未发现需要抓取的小票",
spec.code,
context.window_start,
context.window_end,
@@ -755,7 +848,7 @@ class OdsSettlementTicketTask(BaseOdsTask):
counts["updated"] += updated
self.db.commit()
self.logger.info(
"%s: 小票抓取完成,候选=%s 插入=%s 更新=%s 跳过=%s",
spec.code,
len(candidates),
inserted,
@@ -767,7 +860,7 @@ class OdsSettlementTicketTask(BaseOdsTask):
except Exception:
counts["errors"] += 1
self.db.rollback()
self.logger.error("%s: 小票抓取失败", spec.code, exc_info=True)
raise

# ------------------------------------------------------------------ helpers
@@ -782,7 +875,7 @@ class OdsSettlementTicketTask(BaseOdsTask):
try:
rows = self.db.query(sql)
except Exception:
self.logger.warning("查询已有小票失败,按空集处理", exc_info=True)
return set()

return {
@@ -819,7 +912,7 @@ class OdsSettlementTicketTask(BaseOdsTask):
try:
rows = self.db.query(sql, params)
except Exception:
self.logger.warning("读取支付流水以获取结算单ID失败,将尝试调用支付接口回退", exc_info=True)
return set()

return {
@@ -853,7 +946,7 @@ class OdsSettlementTicketTask(BaseOdsTask):
if relate_id:
candidate_ids.add(relate_id)
except Exception:
self.logger.warning("调用支付接口获取结算单ID失败,当前批次将跳过回退来源", exc_info=True)
return candidate_ids

def _fetch_ticket_payload(self, order_settle_id: int):
@@ -869,10 +962,10 @@ class OdsSettlementTicketTask(BaseOdsTask):
payload = response
except Exception:
self.logger.warning(
"调用小票接口失败 orderSettleId=%s", order_settle_id, exc_info=True
)
if isinstance(payload, dict) and isinstance(payload.get("data"), list) and len(payload["data"]) == 1:
# 本地桩/回放可能把响应包装成单元素 list,这里展开以贴近真实结构
payload = payload["data"][0]
return payload

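Editor's note: the tail of `_fetch_ticket_payload` above unwraps responses that local stubs/replays wrap as a one-element list under `data`. The same check works as a standalone sketch:

```python
def unwrap_stub_payload(payload):
    """Unwrap {'data': [obj]} wrappers produced by local stubs/replays.

    Anything else (real responses, multi-element lists) passes through unchanged.
    """
    if (
        isinstance(payload, dict)
        and isinstance(payload.get("data"), list)
        and len(payload["data"]) == 1
    ):
        return payload["data"][0]
    return payload

# A wrapped single-element replay response is flattened...
assert unwrap_stub_payload({"data": [{"orderSettleId": 9001}]}) == {"orderSettleId": 9001}
# ...while a genuine multi-element list is left alone.
assert unwrap_stub_payload({"data": [1, 2]}) == {"data": [1, 2]}
```

The guard on `len(...) == 1` is what keeps real list-valued responses intact.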
@@ -899,27 +992,29 @@ def _build_task_class(spec: OdsTaskSpec) -> Type[BaseOdsTask]:


ENABLED_ODS_CODES = {
"ODS_ASSISTANT_ACCOUNTS",
"ODS_ASSISTANT_ACCOUNT",
"ODS_ASSISTANT_LEDGER",
"ODS_ASSISTANT_ABOLISH",
"ODS_INVENTORY_CHANGE",
"ODS_INVENTORY_STOCK",
"ODS_PACKAGE",
"ODS_GROUP_PACKAGE",
"ODS_GROUP_BUY_REDEMPTION",
"ODS_MEMBER",
"ODS_MEMBER_BALANCE",
"ODS_MEMBER_CARD",
"ODS_PAYMENT",
"ODS_REFUND",
"ODS_COUPON_VERIFY",
"ODS_PLATFORM_COUPON",
"ODS_RECHARGE_SETTLE",
"ODS_TABLE_USE",
"ODS_TABLES",
"ODS_GOODS_CATEGORY",
"ODS_STORE_GOODS",
"ODS_TABLE_DISCOUNT",
"ODS_TABLE_FEE_DISCOUNT",
"ODS_STORE_GOODS_SALES",
"ODS_TENANT_GOODS",
"ODS_SETTLEMENT_TICKET",
"ODS_ORDER_SETTLE",
"ODS_SETTLEMENT_RECORDS",
}

ODS_TASK_CLASSES: Dict[str, Type[BaseOdsTask]] = {
@@ -931,3 +1026,4 @@ ODS_TASK_CLASSES: Dict[str, Type[BaseOdsTask]] = {
ODS_TASK_CLASSES["ODS_SETTLEMENT_TICKET"] = OdsSettlementTicketTask

__all__ = ["ODS_TASK_CLASSES", "ODS_TASK_SPECS", "BaseOdsTask", "ENABLED_ODS_CODES"]

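Editor's note: `ENABLED_ODS_CODES` acts as an allow-list over `ODS_TASK_CLASSES` (the hunk shows both old and renamed codes side by side while the migration is in flight). A scheduler could filter the registry roughly like this; the stub class and code names below are stand-ins, not the project's real task classes:

```python
class _StubTask:
    """Stand-in for a BaseOdsTask subclass; only the registry shape matters here."""
    pass

ODS_TASK_CLASSES = {"ODS_PAYMENT": _StubTask, "ODS_LEGACY_FEED": _StubTask}
ENABLED_ODS_CODES = {"ODS_PAYMENT", "ODS_REFUND"}

# Keep only registered classes whose code is on the allow-list; codes on the
# allow-list with no registered class are simply not runnable yet.
runnable = {code: cls for code, cls in ODS_TASK_CLASSES.items() if code in ENABLED_ODS_CODES}
assert set(runnable) == {"ODS_PAYMENT"}
```

Intersecting the two sets means a half-migrated code (registered but disabled, or enabled but unregistered) never reaches the scheduler.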
@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.facts.payment import PaymentLoader
from models.parsers import TypeParser
@@ -29,7 +29,7 @@ class PaymentsDwdTask(BaseDwdTask):

# Iterate ODS Data
batches = self.iter_ods_rows(
table_name="billiards_ods.ods_payment_record",
table_name="billiards_ods.payment_transactions",
columns=["site_id", "pay_id", "payload", "fetched_at"],
start_time=window_start,
end_time=window_end
@@ -136,3 +136,4 @@ class PaymentsDwdTask(BaseDwdTask):
except Exception as e:
self.logger.warning(f"Error parsing payment: {e}")
return None

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# -*- coding: utf-8 -*-
"""Unit tests for the new ODS ingestion tasks."""
import logging
import os
@@ -22,21 +22,21 @@ def _build_config(tmp_path):
return create_test_config("ONLINE", archive_dir, temp_dir)


def test_ods_assistant_accounts_ingest(tmp_path):
"""Ensure ODS_ASSISTANT_ACCOUNTS task stores raw payload with record_index dedup keys."""
def test_assistant_accounts_masters_ingest(tmp_path):
"""Ensure assistant_accounts_masterS task stores raw payload with record_index dedup keys."""
config = _build_config(tmp_path)
sample = [
{
"id": 5001,
"assistant_no": "A01",
"nickname": "小张",
}
]
api = FakeAPIClient({"/PersonnelManagement/SearchAssistantInfo": sample})
task_cls = ODS_TASK_CLASSES["ODS_ASSISTANT_ACCOUNTS"]
task_cls = ODS_TASK_CLASSES["assistant_accounts_masterS"]

with get_db_operations() as db_ops:
task = task_cls(config, db_ops, api, logging.getLogger("test_ods_assistant_accounts"))
task = task_cls(config, db_ops, api, logging.getLogger("test_assistant_accounts_masters"))
result = task.execute()

assert result["status"] == "SUCCESS"
@@ -49,21 +49,21 @@ def test_ods_assistant_accounts_ingest(tmp_path):
assert '"id": 5001' in row["payload"]


def test_ods_inventory_change_ingest(tmp_path):
"""Ensure ODS_INVENTORY_CHANGE task stores raw payload with record_index dedup keys."""
def test_goods_stock_movements_ingest(tmp_path):
"""Ensure goods_stock_movements task stores raw payload with record_index dedup keys."""
config = _build_config(tmp_path)
sample = [
{
"siteGoodsStockId": 123456,
"stockType": 1,
"goodsName": "测试商品",
}
]
api = FakeAPIClient({"/GoodsStockManage/QueryGoodsOutboundReceipt": sample})
task_cls = ODS_TASK_CLASSES["ODS_INVENTORY_CHANGE"]
task_cls = ODS_TASK_CLASSES["goods_stock_movements"]

with get_db_operations() as db_ops:
task = task_cls(config, db_ops, api, logging.getLogger("test_ods_inventory_change"))
task = task_cls(config, db_ops, api, logging.getLogger("test_goods_stock_movements"))
result = task.execute()

assert result["status"] == "SUCCESS"
@@ -75,7 +75,7 @@ def test_ods_inventory_change_ingest(tmp_path):
assert '"siteGoodsStockId": 123456' in row["payload"]


def test_ods_member_profiles_ingest(tmp_path):
def test_member_profiless_ingest(tmp_path):
"""Ensure ODS_MEMBER task stores tenantMemberInfos raw JSON."""
config = _build_config(tmp_path)
sample = [{"tenantMemberInfos": [{"id": 101, "mobile": "13800000000"}]}]
@@ -110,14 +110,14 @@ def test_ods_payment_ingest(tmp_path):


def test_ods_settlement_records_ingest(tmp_path):
"""Ensure ODS_ORDER_SETTLE task stores settleList raw JSON."""
"""Ensure settlement_records task stores settleList raw JSON."""
config = _build_config(tmp_path)
sample = [{"data": {"settleList": [{"id": 701, "orderTradeNo": 8001}]}}]
api = FakeAPIClient({"/Site/GetAllOrderSettleList": sample})
task_cls = ODS_TASK_CLASSES["ODS_ORDER_SETTLE"]
task_cls = ODS_TASK_CLASSES["settlement_records"]

with get_db_operations() as db_ops:
task = task_cls(config, db_ops, api, logging.getLogger("test_ods_order_settle"))
task = task_cls(config, db_ops, api, logging.getLogger("test_settlement_records"))
result = task.execute()

assert result["status"] == "SUCCESS"
@@ -158,3 +158,4 @@ def test_ods_settlement_ticket_by_payment_relate_ids(tmp_path):
and call.get("params", {}).get("orderSettleId") == 9001
for call in api.calls
)

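Editor's note: the tests above pair a `FakeAPIClient` (canned responses keyed by endpoint, plus a recorded `calls` list that the settlement-ticket test asserts on) with real DB operations. A minimal test double with that shape might look like the sketch below; the `get` method name and exact record format are assumptions, not the project's actual `FakeAPIClient` interface:

```python
class FakeAPIClient:
    """Minimal test double: canned responses per endpoint, plus a call log."""

    def __init__(self, responses):
        self.responses = responses  # endpoint -> canned payload
        self.calls = []             # every request made, for assertions

    def get(self, endpoint, params=None):
        # Record the call so tests can assert on endpoints/params used.
        self.calls.append({"endpoint": endpoint, "params": params or {}})
        return self.responses.get(endpoint, [])

api = FakeAPIClient({"/Site/GetAllOrderSettleList": [{"id": 701}]})
assert api.get("/Site/GetAllOrderSettleList") == [{"id": 701}]
assert api.calls[0]["endpoint"] == "/Site/GetAllOrderSettleList"
```

Recording calls rather than stubbing assertions into the client keeps each test free to check only the requests it cares about.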
1361
tmp/20251121-task.txt
Normal file
File diff suppressed because it is too large
321
tmp/etl_billiards_misc/backups/manual_ingest_task.py
Normal file
@@ -0,0 +1,321 @@
# -*- coding: utf-8 -*-
"""手工示例数据灌入:按 schema_ODS_doc.sql 主键/唯一键批量写入 ODS。"""
from __future__ import annotations

import json
import os
from datetime import datetime
from typing import Any, Iterable

from psycopg2.extras import Json

from .base_task import BaseTask


class ManualIngestTask(BaseTask):
"""本地示例 JSON 灌入 ODS,确保表名、主键、插入列与 schema_ODS_doc.sql 对齐。"""

def __init__(self, config, db_connection, api_client, logger):
"""初始化缓存,避免重复查询表结构。"""
super().__init__(config, db_connection, api_client, logger)
self._table_columns_cache: dict[str, list[str]] = {}

# 文件关键词 -> 目标表(匹配 C:\dev\LLTQ\export\temp\source-data-doc 下示范 JSON 名称)
FILE_MAPPING: list[tuple[tuple[str, ...], str]] = [
(("会员档案", "member_profiles"), "billiards_ods.member_profiles"),
(("余额变更记录", "member_balance_changes"), "billiards_ods.member_balance_changes"),
(("储值卡列表", "member_stored_value_cards"), "billiards_ods.member_stored_value_cards"),
(("充值记录", "recharge_settlements"), "billiards_ods.recharge_settlements"),
(("结账记录", "settlement_records"), "billiards_ods.settlement_records"),
(("助教废除", "assistant_cancellation_records"), "billiards_ods.assistant_cancellation_records"),
(("助教账号", "assistant_accounts_master"), "billiards_ods.assistant_accounts_master"),
(("助教流水", "assistant_service_records"), "billiards_ods.assistant_service_records"),
(("台桌列表", "site_tables_master"), "billiards_ods.site_tables_master"),
(("台费打折", "table_fee_discount_records"), "billiards_ods.table_fee_discount_records"),
(("台费流水", "table_fee_transactions"), "billiards_ods.table_fee_transactions"),
(("库存变化记录1", "goods_stock_movements"), "billiards_ods.goods_stock_movements"),
(("库存变化记录2", "stock_goods_category_tree"), "billiards_ods.stock_goods_category_tree"),
(("库存汇总", "goods_stock_summary"), "billiards_ods.goods_stock_summary"),
(("支付记录", "payment_transactions"), "billiards_ods.payment_transactions"),
(("退款记录", "refund_transactions"), "billiards_ods.refund_transactions"),
(("平台验券记录", "platform_coupon_redemption_records"), "billiards_ods.platform_coupon_redemption_records"),
(("团购套餐流水", "group_buy_redemption_records"), "billiards_ods.group_buy_packages_ledger"),
(("团购套餐", "group_buy_packages"), "billiards_ods.group_buy_packages"),
(("小票详情", "settlement_ticket_details"), "billiards_ods.settlement_ticket_details"),
(("门店商品档案", "store_goods_master"), "billiards_ods.store_goods_master"),
(("商品档案", "tenant_goods_master"), "billiards_ods.tenant_goods_master"),
(("门店商品销售记录", "store_goods_sales_records"), "billiards_ods.store_goods_sales_records"),
]

# 表结构说明:pk=主键列(None 表示无冲突更新),json_cols=需要单列存 JSONB 的字段
TABLE_SPECS: dict[str, dict[str, Any]] = {
"billiards_ods.member_profiles": {"pk": "id"},
"billiards_ods.member_balance_changes": {"pk": "id"},
"billiards_ods.member_stored_value_cards": {"pk": "id"},
"billiards_ods.recharge_settlements": {"pk": None, "json_cols": ["settleList", "siteProfile"]},
"billiards_ods.settlement_records": {"pk": None, "json_cols": ["settleList", "siteProfile"]},
"billiards_ods.assistant_cancellation_records": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.assistant_accounts_master": {"pk": "id"},
"billiards_ods.assistant_service_records": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.site_tables_master": {"pk": "id"},
"billiards_ods.table_fee_discount_records": {"pk": "id", "json_cols": ["siteProfile", "tableProfile"]},
"billiards_ods.table_fee_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.goods_stock_movements": {"pk": "siteGoodsStockId"},
"billiards_ods.stock_goods_category_tree": {"pk": "id", "json_cols": ["categoryBoxes"]},
"billiards_ods.goods_stock_summary": {"pk": "siteGoodsId"},
"billiards_ods.payment_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.refund_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
"billiards_ods.platform_coupon_redemption_records": {"pk": "id"},
"billiards_ods.tenant_goods_master": {"pk": "id"},
"billiards_ods.group_buy_packages": {"pk": "id"},
"billiards_ods.group_buy_packages_ledger": {"pk": "id"},
"billiards_ods.settlement_ticket_details": {
"pk": "orderSettleId",
"json_cols": ["memberProfile", "orderItem", "tenantMemberCardLogs"],
},
"billiards_ods.store_goods_master": {"pk": "id"},
"billiards_ods.store_goods_sales_records": {"pk": "id"},
}

def get_task_code(self) -> str:
"""返回任务编码。"""
return "MANUAL_INGEST"

def execute(self, cursor_data: dict | None = None) -> dict:
"""从示范目录读取 JSON,按表/主键批量入库。"""
data_dir = (
self.config.get("manual.data_dir")
or self.config.get("pipeline.ingest_source_dir")
or r"c:\dev\LLTQ\ETL\feiqiu-ETL\etl_billiards\tests\testdata_json"
)
if not os.path.exists(data_dir):
self.logger.error("Data directory not found: %s", data_dir)
return {"status": "error", "message": "Directory not found"}

counts = {"fetched": 0, "inserted": 0, "updated": 0, "skipped": 0, "errors": 0}

for filename in sorted(os.listdir(data_dir)):
if not filename.endswith(".json"):
continue
filepath = os.path.join(data_dir, filename)
try:
with open(filepath, "r", encoding="utf-8") as fh:
raw_entries = json.load(fh)
except Exception:
counts["errors"] += 1
self.logger.exception("Failed to read %s", filename)
continue

if not isinstance(raw_entries, list):
raw_entries = [raw_entries]

records = self._extract_records(raw_entries)
if not records:
counts["skipped"] += 1
continue

target_table = self._match_by_filename(filename)
if not target_table:
self.logger.warning("No mapping found for file: %s", filename)
counts["skipped"] += 1
continue

self.logger.info("Ingesting %s into %s", filename, target_table)
try:
inserted, updated = self._ingest_table(target_table, records, filename)
counts["inserted"] += inserted
counts["updated"] += updated
counts["fetched"] += len(records)
except Exception:
counts["errors"] += 1
self.logger.exception("Error processing %s", filename)
self.db.rollback()
continue

try:
self.db.commit()
except Exception:
self.db.rollback()
raise

return {"status": "SUCCESS", "counts": counts}

# ------------------------------------------------------------------ helpers
def _match_by_filename(self, filename: str) -> str | None:
"""根据文件名关键词找到目标表。"""
for keywords, table in self.FILE_MAPPING:
if any(keyword and keyword in filename for keyword in keywords):
return table
return None

def _extract_records(self, raw_entries: Iterable[Any]) -> list[dict]:
"""兼容多种 JSON 结构,提取成记录列表。"""
records: list[dict] = []
for entry in raw_entries:
if isinstance(entry, dict):
# 如果含 data 且还包含其他键(如 orderSettleId),优先保留外层以免丢失主键
preferred = entry
if "data" in entry and not any(k not in {"data", "code"} for k in entry.keys()):
preferred = entry["data"]
data = preferred
if isinstance(data, dict):
list_used = False
for v in data.values():
if isinstance(v, list) and v and isinstance(v[0], dict):
records.extend(v)
list_used = True
break
if list_used:
continue
if isinstance(data, list) and data and isinstance(data[0], dict):
records.extend(data)
elif isinstance(data, dict):
records.append(data)
elif isinstance(entry, list):
records.extend([item for item in entry if isinstance(item, dict)])
return records

def _get_table_columns(self, table: str) -> list[str]:
"""查询 information_schema,获取目标表的全部列名(按顺序)。"""
if table in self._table_columns_cache:
return self._table_columns_cache[table]
if "." in table:
schema, name = table.split(".", 1)
else:
schema, name = "public", table
sql = """
SELECT column_name, data_type, udt_name
FROM information_schema.columns
WHERE table_schema = %s AND table_name = %s
ORDER BY ordinal_position
"""
with self.db.conn.cursor() as cur:
cur.execute(sql, (schema, name))
cols = [(r[0], (r[1] or "").lower(), (r[2] or "").lower()) for r in cur.fetchall()]
self._table_columns_cache[table] = cols
return cols

def _ingest_table(self, table: str, records: list[dict], source_file: str) -> tuple[int, int]:
"""构造 INSERT/ON CONFLICT 语句并批量执行。"""
spec = self.TABLE_SPECS.get(table)
if not spec:
raise ValueError(f"No table spec for {table}")

pk_col = spec.get("pk")
json_cols = set(spec.get("json_cols", []))
json_cols_lower = {c.lower() for c in json_cols}

columns_info = self._get_table_columns(table)
columns = [c[0] for c in columns_info]
db_json_cols_lower = {
c[0].lower() for c in columns_info if c[1] in ("json", "jsonb") or c[2] in ("json", "jsonb")
}
pk_col_db = None
if pk_col:
pk_col_db = next((c for c in columns if c.lower() == pk_col.lower()), pk_col)

placeholders = ", ".join(["%s"] * len(columns))
col_list = ", ".join(f'"{c}"' for c in columns)
sql = f'INSERT INTO {table} ({col_list}) VALUES ({placeholders})'
if pk_col_db:
update_cols = [c for c in columns if c != pk_col_db]
set_clause = ", ".join(f'"{c}"=EXCLUDED."{c}"' for c in update_cols)
sql += f' ON CONFLICT ("{pk_col_db}") DO UPDATE SET {set_clause}'
sql += " RETURNING (xmax = 0) AS inserted"

params = []
now = datetime.now()
json_dump = lambda v: json.dumps(v, ensure_ascii=False)  # noqa: E731
for rec in records:
merged_rec = rec if isinstance(rec, dict) else {}
# 逐层展开 data -> data.data 结构,填充缺失字段
data_part = merged_rec.get("data")
while isinstance(data_part, dict):
merged_rec = {**data_part, **merged_rec}
data_part = data_part.get("data")

pk_val = self._get_value_case_insensitive(merged_rec, pk_col) if pk_col else None
if pk_col and (pk_val is None or pk_val == ""):
continue

row_vals = []
for col_name, data_type, udt in columns_info:
col_lower = col_name.lower()
if col_lower == "payload":
row_vals.append(Json(rec, dumps=json_dump))
continue
if col_lower == "source_file":
row_vals.append(source_file)
continue
if col_lower == "fetched_at":
row_vals.append(merged_rec.get(col_name, now))
continue

value = self._normalize_scalar(self._get_value_case_insensitive(merged_rec, col_name))

if col_lower in json_cols_lower or col_lower in db_json_cols_lower:
row_vals.append(Json(value, dumps=json_dump) if value is not None else None)
continue

casted = self._cast_value(value, data_type)
row_vals.append(casted)
params.append(tuple(row_vals))

if not params:
return 0, 0

inserted = 0
updated = 0
with self.db.conn.cursor() as cur:
for row in params:
cur.execute(sql, row)
try:
flag = cur.fetchone()[0]
except Exception:
flag = None
if flag:
inserted += 1
else:
updated += 1
return inserted, updated

def _get_value_case_insensitive(self, record: dict, col: str):
"""忽略大小写获取值,兼容 information_schema 小写列名与 JSON 原始大小写。"""
if record is None:
return None
if col is None:
return None
if col in record:
return record.get(col)
col_lower = col.lower()
for k, v in record.items():
if isinstance(k, str) and k.lower() == col_lower:
return v
return None

def _normalize_scalar(self, value):
"""将空字符串标准化为 None,避免数值/时间字段类型错误。"""
if value == "" or value == "{}" or value == "[]":
return None
return value

def _cast_value(self, value, data_type: str):
"""根据列类型做轻量转换,避免类型不匹配。"""
if value is None:
return None
dt = (data_type or "").lower()
if dt in ("integer", "bigint", "smallint"):
if isinstance(value, bool):
return int(value)
try:
return int(value)
except Exception:
return None
if dt in ("numeric", "double precision", "real", "decimal"):
if isinstance(value, bool):
return int(value)
try:
return float(value)
except Exception:
return None
if dt.startswith("timestamp") or dt in ("date", "time", "interval"):
# 仅接受字符串/日期,数值等一律置空
return value if isinstance(value, str) else None
return value

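Editor's note: the backup task above distinguishes inserts from updates via `RETURNING (xmax = 0) AS inserted` (on an upserted row, PostgreSQL's system column `xmax` is 0 only when the row was freshly inserted by this statement). The SQL shape it builds can be shown standalone; the table and column names below are hypothetical:

```python
table = "billiards_ods.member_profiles"   # hypothetical target table
columns = ["id", "mobile", "payload"]
pk = "id"

# Same construction as _ingest_table: positional placeholders, quoted columns.
placeholders = ", ".join(["%s"] * len(columns))
col_list = ", ".join(f'"{c}"' for c in columns)
sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
set_clause = ", ".join(f'"{c}"=EXCLUDED."{c}"' for c in columns if c != pk)
sql += f' ON CONFLICT ("{pk}") DO UPDATE SET {set_clause}'
# xmax = 0 holds only for rows created by this statement, so the boolean
# flag lets the caller count inserts vs. updates per executed row.
sql += " RETURNING (xmax = 0) AS inserted"

assert 'ON CONFLICT ("id") DO UPDATE SET' in sql
assert sql.endswith("RETURNING (xmax = 0) AS inserted")
```

Counting via `xmax` avoids a second round-trip to check row existence, at the cost of relying on a PostgreSQL-specific system column.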
@@ -0,0 +1,347 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""手工示例数据灌入:按 schema_ODS_doc.sql 的表结构写入 ODS。"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
from datetime import datetime
|
||||
from typing import Any, Iterable
|
||||
|
||||
from psycopg2.extras import Json
|
||||
|
||||
from .base_task import BaseTask
|
||||
|
||||
|
||||
class ManualIngestTask(BaseTask):
|
||||
"""本地示例 JSON 灌入 ODS,确保表名/主键/插入列与 schema_ODS_doc.sql 对齐。"""
|
||||
|
||||
    FILE_MAPPING: list[tuple[tuple[str, ...], str]] = [
        (("member_profiles",), "billiards_ods.member_profiles"),
        (("member_balance_changes",), "billiards_ods.member_balance_changes"),
        (("member_stored_value_cards",), "billiards_ods.member_stored_value_cards"),
        (("recharge_settlements",), "billiards_ods.recharge_settlements"),
        (("settlement_records",), "billiards_ods.settlement_records"),
        (("assistant_cancellation_records",), "billiards_ods.assistant_cancellation_records"),
        (("assistant_accounts_master",), "billiards_ods.assistant_accounts_master"),
        (("assistant_service_records",), "billiards_ods.assistant_service_records"),
        (("site_tables_master",), "billiards_ods.site_tables_master"),
        (("table_fee_discount_records",), "billiards_ods.table_fee_discount_records"),
        (("table_fee_transactions",), "billiards_ods.table_fee_transactions"),
        (("goods_stock_movements",), "billiards_ods.goods_stock_movements"),
        (("stock_goods_category_tree",), "billiards_ods.stock_goods_category_tree"),
        (("goods_stock_summary",), "billiards_ods.goods_stock_summary"),
        (("payment_transactions",), "billiards_ods.payment_transactions"),
        (("refund_transactions",), "billiards_ods.refund_transactions"),
        (("platform_coupon_redemption_records",), "billiards_ods.platform_coupon_redemption_records"),
        (("group_buy_redemption_records",), "billiards_ods.group_buy_redemption_records"),
        (("group_buy_packages",), "billiards_ods.group_buy_packages"),
        (("settlement_ticket_details",), "billiards_ods.settlement_ticket_details"),
        (("store_goods_master",), "billiards_ods.store_goods_master"),
        (("tenant_goods_master",), "billiards_ods.tenant_goods_master"),
        (("store_goods_sales_records",), "billiards_ods.store_goods_sales_records"),
    ]

    TABLE_SPECS: dict[str, dict[str, Any]] = {
        "billiards_ods.member_profiles": {"pk": "id"},
        "billiards_ods.member_balance_changes": {"pk": "id"},
        "billiards_ods.member_stored_value_cards": {"pk": "id"},
        "billiards_ods.recharge_settlements": {"pk": "id"},
        "billiards_ods.settlement_records": {"pk": "id"},
        "billiards_ods.assistant_cancellation_records": {"pk": "id", "json_cols": ["siteProfile"]},
        "billiards_ods.assistant_accounts_master": {"pk": "id"},
        "billiards_ods.assistant_service_records": {"pk": "id", "json_cols": ["siteProfile"]},
        "billiards_ods.site_tables_master": {"pk": "id"},
        "billiards_ods.table_fee_discount_records": {"pk": "id", "json_cols": ["siteProfile", "tableProfile"]},
        "billiards_ods.table_fee_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
        "billiards_ods.goods_stock_movements": {"pk": "siteGoodsStockId"},
        "billiards_ods.stock_goods_category_tree": {"pk": "id", "json_cols": ["categoryBoxes"]},
        "billiards_ods.goods_stock_summary": {"pk": "siteGoodsId"},
        "billiards_ods.payment_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
        "billiards_ods.refund_transactions": {"pk": "id", "json_cols": ["siteProfile"]},
        "billiards_ods.platform_coupon_redemption_records": {"pk": "id"},
        "billiards_ods.tenant_goods_master": {"pk": "id"},
        "billiards_ods.group_buy_packages": {"pk": "id"},
        "billiards_ods.group_buy_redemption_records": {"pk": "id"},
        "billiards_ods.settlement_ticket_details": {
            "pk": "orderSettleId",
            "json_cols": ["memberProfile", "orderItem", "tenantMemberCardLogs"],
        },
        "billiards_ods.store_goods_master": {"pk": "id"},
        "billiards_ods.store_goods_sales_records": {"pk": "id"},
    }
    def get_task_code(self) -> str:
        """Return the task code."""
        return "MANUAL_INGEST"
    def execute(self, cursor_data: dict | None = None) -> dict:
        """Read JSON files from the data directory and bulk-load them by table spec."""
        data_dir = (
            self.config.get("manual.data_dir")
            or self.config.get("pipeline.ingest_source_dir")
            or r"c:\dev\LLTQ\ETL\feiqiu-ETL\etl_billiards\tests\testdata_json"
        )
        if not os.path.exists(data_dir):
            self.logger.error("Data directory not found: %s", data_dir)
            return {"status": "error", "message": "Directory not found"}

        counts = {"fetched": 0, "inserted": 0, "updated": 0, "skipped": 0, "errors": 0}

        for filename in sorted(os.listdir(data_dir)):
            if not filename.endswith(".json"):
                continue
            filepath = os.path.join(data_dir, filename)
            try:
                with open(filepath, "r", encoding="utf-8") as fh:
                    raw_entries = json.load(fh)
            except Exception:
                counts["errors"] += 1
                self.logger.exception("Failed to read %s", filename)
                continue

            entries = raw_entries if isinstance(raw_entries, list) else [raw_entries]
            records = self._extract_records(entries)
            if not records:
                counts["skipped"] += 1
                continue

            target_table = self._match_by_filename(filename)
            if not target_table:
                self.logger.warning("No mapping found for file: %s", filename)
                counts["skipped"] += 1
                continue

            self.logger.info("Ingesting %s into %s", filename, target_table)
            try:
                inserted, updated = self._ingest_table(target_table, records, filename)
                counts["inserted"] += inserted
                counts["updated"] += updated
                counts["fetched"] += len(records)
            except Exception:
                counts["errors"] += 1
                self.logger.exception("Error processing %s", filename)
                self.db.rollback()
                continue

        try:
            self.db.commit()
        except Exception:
            self.db.rollback()
            raise

        return {"status": "SUCCESS", "counts": counts}
    def _match_by_filename(self, filename: str) -> str | None:
        """Match the target table by keywords in the file name."""
        for keywords, table in self.FILE_MAPPING:
            if any(keyword and keyword in filename for keyword in keywords):
                return table
        return None
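`_match_by_filename` is a first-match substring scan over FILE_MAPPING, so an earlier, broader keyword can shadow a later, more specific one. A standalone sketch with a trimmed, illustrative mapping:

```python
# illustrative subset of the real FILE_MAPPING
FILE_MAPPING = [
    (("payment_transactions",), "billiards_ods.payment_transactions"),
    (("refund_transactions",), "billiards_ods.refund_transactions"),
]

def match_by_filename(filename):
    # first mapping whose keyword appears anywhere in the file name wins
    for keywords, table in FILE_MAPPING:
        if any(kw and kw in filename for kw in keywords):
            return table
    return None

print(match_by_filename("payment_transactions_20251121.json"))
# billiards_ods.payment_transactions
```

Because matching is by substring, date stamps or prefixes in the file name do not matter, only keyword order does.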
    def _extract_records(self, raw_entries: Iterable[Any]) -> list[dict]:
        """Extract the record list, tolerating multiple layers of data/list wrapping."""
        records: list[dict] = []
        for entry in raw_entries:
            if isinstance(entry, dict):
                preferred = entry
                if "data" in entry and not any(k not in {"data", "code"} for k in entry.keys()):
                    preferred = entry["data"]
                data = preferred
                if isinstance(data, dict):
                    # Special case for settleList (recharge/settlement records): expand the
                    # settleList nested under data.settleList, dropping the outer siteProfile.
                    if "settleList" in data:
                        settle_list_val = data.get("settleList")
                        if isinstance(settle_list_val, dict):
                            settle_list_iter = [settle_list_val]
                        elif isinstance(settle_list_val, list):
                            settle_list_iter = settle_list_val
                        else:
                            settle_list_iter = []

                        handled = False
                        for item in settle_list_iter or []:
                            if not isinstance(item, dict):
                                continue
                            inner = item.get("settleList")
                            merged = dict(inner) if isinstance(inner, dict) else dict(item)
                            # Keep siteProfile for later field enrichment; it is not persisted itself.
                            site_profile = data.get("siteProfile")
                            if isinstance(site_profile, dict):
                                merged.setdefault("siteProfile", site_profile)
                            records.append(merged)
                            handled = True
                        if handled:
                            continue

                    list_used = False
                    for v in data.values():
                        if isinstance(v, list) and v and isinstance(v[0], dict):
                            records.extend(v)
                            list_used = True
                            break
                    if list_used:
                        continue
                if isinstance(data, list) and data and isinstance(data[0], dict):
                    records.extend(data)
                elif isinstance(data, dict):
                    records.append(data)
            elif isinstance(entry, list):
                records.extend([item for item in entry if isinstance(item, dict)])
        return records
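Setting the settleList special case aside, the envelope unwrapping above reduces to three rules: unwrap a pure `{data, code}` wrapper, take the first list-of-dicts value inside a dict, otherwise keep the dict itself. A standalone sketch of that generic path:

```python
def extract_records(entries):
    """Simplified mirror of the generic unwrapping rules (no settleList handling)."""
    records = []
    for entry in entries:
        if isinstance(entry, list):
            records.extend(item for item in entry if isinstance(item, dict))
            continue
        if not isinstance(entry, dict):
            continue
        # unwrap only when the envelope holds nothing but data/code
        data = entry.get("data", entry) if set(entry) <= {"data", "code"} else entry
        if isinstance(data, dict):
            for v in data.values():
                if isinstance(v, list) and v and isinstance(v[0], dict):
                    records.extend(v)  # first list-of-dicts value wins
                    break
            else:
                records.append(data)  # no list payload: the dict is the record
        elif isinstance(data, list) and data and isinstance(data[0], dict):
            records.extend(data)
    return records

print(extract_records([{"code": 0, "data": {"list": [{"id": 1}, {"id": 2}]}}]))
# [{'id': 1}, {'id': 2}]
```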
    def _get_table_columns(self, table: str) -> list[tuple[str, str, str]]:
        """Query information_schema for the target table's column metadata."""
        cache = getattr(self, "_table_columns_cache", {})
        if table in cache:
            return cache[table]
        if "." in table:
            schema, name = table.split(".", 1)
        else:
            schema, name = "public", table
        sql = """
            SELECT column_name, data_type, udt_name
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s
            ORDER BY ordinal_position
        """
        with self.db.conn.cursor() as cur:
            cur.execute(sql, (schema, name))
            cols = [(r[0], (r[1] or "").lower(), (r[2] or "").lower()) for r in cur.fetchall()]
        cache[table] = cols
        self._table_columns_cache = cache
        return cols
    def _ingest_table(self, table: str, records: list[dict], source_file: str) -> tuple[int, int]:
        """Build an INSERT ... ON CONFLICT statement and execute it row by row."""
        spec = self.TABLE_SPECS.get(table)
        if not spec:
            raise ValueError(f"No table spec for {table}")

        pk_col = spec.get("pk")
        json_cols = set(spec.get("json_cols", []))
        json_cols_lower = {c.lower() for c in json_cols}

        columns_info = self._get_table_columns(table)
        columns = [c[0] for c in columns_info]
        db_json_cols_lower = {
            c[0].lower() for c in columns_info if c[1] in ("json", "jsonb") or c[2] in ("json", "jsonb")
        }
        pk_col_db = None
        if pk_col:
            pk_col_db = next((c for c in columns if c.lower() == pk_col.lower()), pk_col)

        placeholders = ", ".join(["%s"] * len(columns))
        col_list = ", ".join(f'"{c}"' for c in columns)
        sql = f'INSERT INTO {table} ({col_list}) VALUES ({placeholders})'
        if pk_col_db:
            update_cols = [c for c in columns if c != pk_col_db]
            set_clause = ", ".join(f'"{c}"=EXCLUDED."{c}"' for c in update_cols)
            sql += f' ON CONFLICT ("{pk_col_db}") DO UPDATE SET {set_clause}'
        sql += " RETURNING (xmax = 0) AS inserted"

        params = []
        now = datetime.now()
        json_dump = lambda v: json.dumps(v, ensure_ascii=False)  # noqa: E731
        for rec in records:
            merged_rec = rec if isinstance(rec, dict) else {}
            data_part = merged_rec.get("data")
            while isinstance(data_part, dict):
                merged_rec = {**data_part, **merged_rec}
                data_part = data_part.get("data")

            # For recharge/settlement tables, backfill store fields from siteProfile.
            if table in {
                "billiards_ods.recharge_settlements",
                "billiards_ods.settlement_records",
            }:
                site_profile = merged_rec.get("siteProfile") or merged_rec.get("site_profile")
                if isinstance(site_profile, dict):
                    merged_rec.setdefault("tenantid", site_profile.get("tenant_id") or site_profile.get("tenantId"))
                    merged_rec.setdefault("siteid", site_profile.get("id") or site_profile.get("siteId"))
                    merged_rec.setdefault("sitename", site_profile.get("shop_name") or site_profile.get("siteName"))

            pk_val = self._get_value_case_insensitive(merged_rec, pk_col) if pk_col else None
            if pk_col and (pk_val is None or pk_val == ""):
                continue

            row_vals = []
            for col_name, data_type, udt in columns_info:
                col_lower = col_name.lower()
                if col_lower == "payload":
                    row_vals.append(Json(rec, dumps=json_dump))
                    continue
                if col_lower == "source_file":
                    row_vals.append(source_file)
                    continue
                if col_lower == "fetched_at":
                    row_vals.append(merged_rec.get(col_name, now))
                    continue

                value = self._normalize_scalar(self._get_value_case_insensitive(merged_rec, col_name))

                if col_lower in json_cols_lower or col_lower in db_json_cols_lower:
                    row_vals.append(Json(value, dumps=json_dump) if value is not None else None)
                    continue

                casted = self._cast_value(value, data_type)
                row_vals.append(casted)
            params.append(tuple(row_vals))

        if not params:
            return 0, 0

        inserted = 0
        updated = 0
        with self.db.conn.cursor() as cur:
            for row in params:
                cur.execute(sql, row)
                flag = cur.fetchone()[0]
                if flag:
                    inserted += 1
                else:
                    updated += 1
        return inserted, updated
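The statement assembled above can be previewed without a database. A standalone sketch of the builder; the `(xmax = 0)` expression is a common PostgreSQL trick for telling fresh inserts from conflict updates, since xmax is zero only on a newly inserted row version:

```python
def build_upsert_sql(table, columns, pk_col):
    """Build the same INSERT ... ON CONFLICT upsert shape as _ingest_table."""
    placeholders = ", ".join(["%s"] * len(columns))
    col_list = ", ".join(f'"{c}"' for c in columns)
    sql = f'INSERT INTO {table} ({col_list}) VALUES ({placeholders})'
    if pk_col:
        update_cols = [c for c in columns if c != pk_col]
        set_clause = ", ".join(f'"{c}"=EXCLUDED."{c}"' for c in update_cols)
        sql += f' ON CONFLICT ("{pk_col}") DO UPDATE SET {set_clause}'
    # (xmax = 0) is TRUE for an inserted row, FALSE for one rewritten by DO UPDATE
    sql += " RETURNING (xmax = 0) AS inserted"
    return sql

print(build_upsert_sql("billiards_ods.payment_transactions", ["id", "amount"], "id"))
```

Note the trade-off the task makes: executing the statement row by row keeps per-row insert/update counts at the cost of one round trip per record, which is acceptable for manual sample loads.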
    @staticmethod
    def _get_value_case_insensitive(record: dict, col: str | None):
        """Fetch a value case-insensitively, bridging information_schema names and raw JSON keys."""
        if record is None or col is None:
            return None
        if col in record:
            return record.get(col)
        col_lower = col.lower()
        for k, v in record.items():
            if isinstance(k, str) and k.lower() == col_lower:
                return v
        return None
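information_schema reports lower-case column names while the source JSON uses camelCase keys, which is why the lookup above is case-insensitive. A minimal standalone sketch (`get_ci` is an illustrative name):

```python
def get_ci(record, col):
    """Exact key first, then a case-insensitive scan over the dict."""
    if record is None or col is None:
        return None
    if col in record:
        return record[col]
    low = col.lower()
    for k, v in record.items():
        if isinstance(k, str) and k.lower() == low:
            return v
    return None

print(get_ci({"siteId": 7}, "siteid"))  # 7
```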
    @staticmethod
    def _normalize_scalar(value):
        """Normalize empty strings and empty JSON literals to None to avoid cast errors."""
        if value == "" or value == "{}" or value == "[]":
            return None
        return value
    @staticmethod
    def _cast_value(value, data_type: str):
        """Cast lightly by column type so the batch insert stays type-compatible."""
        if value is None:
            return None
        dt = (data_type or "").lower()
        if dt in ("integer", "bigint", "smallint"):
            if isinstance(value, bool):
                return int(value)
            try:
                return int(value)
            except Exception:
                return None
        if dt in ("numeric", "double precision", "real", "decimal"):
            if isinstance(value, bool):
                return int(value)
            try:
                return float(value)
            except Exception:
                return None
        if dt.startswith("timestamp") or dt in ("date", "time", "interval"):
            return value if isinstance(value, str) else None
        return value
tmp/etl_billiards_misc/backups/schema_ODS_doc.sql (new file, 1714 lines; diff suppressed because it is too large)
tmp/etl_billiards_misc/backups/schema_ODS_doc.sql.bak_20251209 (new file, 1886 lines; diff suppressed because it is too large)
tmp/etl_billiards_misc/tmp & Delete/DWD层设计草稿.md (new file, 1373 lines; diff suppressed because it is too large)
tmp/etl_billiards_misc/tmp & Delete/schema_v2.sql (new file, 1772 lines; diff suppressed because it is too large)
tmp/recharge_only/recharge_settlements.json (new file, 6899 lines; diff suppressed because it is too large)
tmp/schema_dwd.sql (new file, 1801 lines; diff suppressed because it is too large)
tmp/single_ingest/goods_stock_movements.json (new file, 4218 lines; diff suppressed because it is too large)
tmp/temp_chinese.txt (new file, 1 line)
@@ -0,0 +1 @@
含义
tmp/tmp_debug_sql.py (new file, 29 lines)
@@ -0,0 +1,29 @@
import os

import psycopg2

from etl_billiards.tasks.dwd_load_task import DwdLoadTask

dwd_table = "billiards_dwd.dwd_table_fee_log"
ods_table = "billiards_ods.table_fee_transactions"
conn = psycopg2.connect(os.environ["PG_DSN"])
cur = conn.cursor()
task = DwdLoadTask(config={}, db_connection=None, api_client=None, logger=None)

cols_sql = "SELECT column_name FROM information_schema.columns WHERE table_schema=%s AND table_name=%s"
types_sql = "SELECT column_name, data_type FROM information_schema.columns WHERE table_schema=%s AND table_name=%s"
cur.execute(cols_sql, ("billiards_dwd", "dwd_table_fee_log"))
dwd_cols = [r[0].lower() for r in cur.fetchall()]
cur.execute(cols_sql, ("billiards_ods", "table_fee_transactions"))
ods_cols = [r[0].lower() for r in cur.fetchall()]
cur.execute(types_sql, ("billiards_dwd", "dwd_table_fee_log"))
dwd_types = {r[0].lower(): r[1].lower() for r in cur.fetchall()}
cur.execute(types_sql, ("billiards_ods", "table_fee_transactions"))
ods_types = {r[0].lower(): r[1].lower() for r in cur.fetchall()}

mapping = task.FACT_MAPPINGS.get(dwd_table)
if mapping:
    insert_cols = [d for d, o, _ in mapping if o in ods_cols]
    select_exprs = [task._cast_expr(o, cast_type) for d, o, cast_type in mapping if o in ods_cols]
else:
    insert_cols = [c for c in dwd_cols if c in ods_cols and c not in task.SCD_COLS]
    select_exprs = task._build_fact_select_exprs(insert_cols, dwd_types, ods_types)
print("insert_cols", insert_cols)
print("select_exprs", select_exprs)
col_list = ", ".join(f'"{c}"' for c in insert_cols)  # pre-joined: a backslash inside an f-string expression is a syntax error before Python 3.12
sql = (
    f"INSERT INTO {task._format_table(dwd_table, 'billiards_dwd')} ({col_list}) "
    f"SELECT {', '.join(select_exprs)} FROM {task._format_table(ods_table, 'billiards_ods')}"
)
print(sql)
cur.close()
conn.close()
tmp/tmp_drop_dwd.py (new file, 7 lines)
@@ -0,0 +1,7 @@
import os

import psycopg2

conn = psycopg2.connect(os.environ["PG_DSN"])
conn.autocommit = True
cur = conn.cursor()
cur.execute("DROP SCHEMA IF EXISTS billiards_dwd CASCADE")
cur.close()
conn.close()
print("dropped billiards_dwd")
tmp/tmp_dwd_tasks.py (new file, 19 lines)
@@ -0,0 +1,19 @@
import os

import psycopg2

DSN = os.environ.get("PG_DSN")
store_id = int(os.environ.get("STORE_ID", "2790685415443269"))
conn = psycopg2.connect(DSN)
conn.autocommit = True
cur = conn.cursor()
rows = []
for code in ("INIT_DWD_SCHEMA", "DWD_LOAD_FROM_ODS", "DWD_QUALITY_CHECK"):
    cur.execute("SELECT task_id FROM etl_admin.etl_task WHERE task_code=%s AND store_id=%s", (code, store_id))
    if cur.fetchone():
        cur.execute(
            "UPDATE etl_admin.etl_task SET enabled=TRUE, updated_at=now() WHERE task_code=%s AND store_id=%s",
            (code, store_id),
        )
        rows.append((code, "updated"))
    else:
        cur.execute(
            "INSERT INTO etl_admin.etl_task(task_code, store_id, enabled, cursor_field, window_minutes_default,"
            " overlap_seconds, page_size, params)"
            " VALUES (%s, %s, TRUE, NULL, 60, 120, 1000, '{}') RETURNING task_id",
            (code, store_id),
        )
        rows.append((code, "inserted", cur.fetchone()[0]))
print(rows)
cur.close()
conn.close()
tmp/tmp_problems.py (new file, 28 lines)
@@ -0,0 +1,28 @@
import os

import psycopg2

from etl_billiards.tasks.dwd_load_task import DwdLoadTask

conn = psycopg2.connect(os.environ["PG_DSN"])
cur = conn.cursor()
types_sql = "SELECT column_name, data_type FROM information_schema.columns WHERE table_schema=%s AND table_name=%s"
problems = []
for dwd_table, ods_table in DwdLoadTask.TABLE_MAP.items():
    if not dwd_table.split(".")[-1].startswith("dwd_"):
        continue
    dschema, dtable = dwd_table.split(".") if "." in dwd_table else ("billiards_dwd", dwd_table)
    oschema, otable = ods_table.split(".") if "." in ods_table else ("billiards_ods", ods_table)
    cur.execute(types_sql, (dschema, dtable))
    dcols = {r[0].lower(): r[1].lower() for r in cur.fetchall()}
    cur.execute(types_sql, (oschema, otable))
    ocols = {r[0].lower(): r[1].lower() for r in cur.fetchall()}
    common = set(dcols) & set(ocols)
    missing_dwd = list(set(ocols) - set(dcols))
    missing_ods = list(set(dcols) - set(ocols))
    mismatches = [(c, dcols[c], ocols[c]) for c in sorted(common) if dcols[c] != ocols[c]]
    problems.append((dwd_table, missing_dwd, missing_ods, mismatches))
cur.close()
conn.close()
for p in problems:
    print(p)
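The per-table check in tmp_problems.py is set arithmetic over two `{column: type}` dicts; a standalone sketch of that core, runnable without a database:

```python
def compare_schemas(dwd_cols, ods_cols):
    """Report columns missing on either side plus type mismatches on shared columns."""
    common = set(dwd_cols) & set(ods_cols)
    return {
        "missing_in_dwd": sorted(set(ods_cols) - set(dwd_cols)),
        "missing_in_ods": sorted(set(dwd_cols) - set(ods_cols)),
        "type_mismatches": [
            (c, dwd_cols[c], ods_cols[c]) for c in sorted(common) if dwd_cols[c] != ods_cols[c]
        ],
    }

print(compare_schemas(
    {"id": "bigint", "amount": "numeric"},
    {"id": "bigint", "amount": "text", "extra": "text"},
))
```

Comparing `data_type` strings directly is a coarse check: it flags, say, `numeric` vs `text`, but cannot see precision or nullability differences.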
tmp/tmp_run_sql.py (new file, 26 lines)
@@ -0,0 +1,26 @@
import os

import psycopg2

from etl_billiards.tasks.dwd_load_task import DwdLoadTask

dwd_table = "billiards_dwd.dwd_table_fee_log"
ods_table = "billiards_ods.table_fee_transactions"
conn = psycopg2.connect(os.environ["PG_DSN"])
cur = conn.cursor()
task = DwdLoadTask(config={}, db_connection=None, api_client=None, logger=None)

cols_sql = "SELECT column_name FROM information_schema.columns WHERE table_schema=%s AND table_name=%s"
types_sql = "SELECT column_name, data_type FROM information_schema.columns WHERE table_schema=%s AND table_name=%s"
cur.execute(cols_sql, ("billiards_dwd", "dwd_table_fee_log"))
dwd_cols = [r[0].lower() for r in cur.fetchall()]
cur.execute(cols_sql, ("billiards_ods", "table_fee_transactions"))
ods_cols = [r[0].lower() for r in cur.fetchall()]
cur.execute(types_sql, ("billiards_dwd", "dwd_table_fee_log"))
dwd_types = {r[0].lower(): r[1].lower() for r in cur.fetchall()}
cur.execute(types_sql, ("billiards_ods", "table_fee_transactions"))
ods_types = {r[0].lower(): r[1].lower() for r in cur.fetchall()}

mapping = task.FACT_MAPPINGS.get(dwd_table)
insert_cols = [d for d, o, _ in mapping if o in ods_cols]
select_exprs = [task._cast_expr(o, cast_type) for d, o, cast_type in mapping if o in ods_cols]
col_list = ", ".join(f'"{c}"' for c in insert_cols)  # pre-joined: a backslash inside an f-string expression is a syntax error before Python 3.12
sql = (
    f"INSERT INTO {task._format_table(dwd_table, 'billiards_dwd')} ({col_list}) "
    f"SELECT {', '.join(select_exprs)} FROM {task._format_table(ods_table, 'billiards_ods')} LIMIT 1"
)
print(sql)
cur.execute(sql)
conn.commit()
print("ok")
cur.close()
conn.close()