Compare commits: 13d853c3f5 ... main (14 commits)

Commits: a6ad343092, b9b050bb5d, cbd16a39ba, 92f219b575, b1f64c4bac, ed47754b46, fbee8a751e, cbe48c8ee7, 821d302243, 9a1df70a23, 5bb5a8a568, c3749474c6, 7f87421678, 84e80841cd
.gitignore  (vendored · Normal file · 48 lines)
@@ -0,0 +1,48 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# Logs and exports
*.log
*.jsonl
export/
logs/

# Environment variables
.env
.env.local

# Tests
.pytest_cache/
.coverage
htmlcov/
20251121-task.txt  (Normal file · 1360 lines)
File diff suppressed because it is too large.
DWD层设计建议.docx  (Normal file · binary)
Binary file not shown.
README.md  (78 lines)
@@ -0,0 +1,78 @@
# Billiards Hall ETL System

Data collection and lake ingestion for billiards-store operations: pulls orders, payments, members, inventory, and other data from upstream APIs, lands it in ODS first, then cleans it into fact/dimension tables, with run tracking, incremental cursors, data-quality checks, and test scaffolding.

## Core features
- **Two-stage pipeline**: raw retention in ODS plus DWD/fact-table cleansing, supporting replay and reruns.
- **Task registration and scheduling**: `TaskRegistry` manages task codes centrally; `ETLScheduler` handles cursors, run records, and failure isolation.
- **Shared foundation**: configuration (defaults + `.env` + CLI overrides), an API client with pagination/retry, a database wrapper with batch upsert, SCD2 dimension handling, quality checks.
- **Testing and replay**: ONLINE/OFFLINE mode switching; `run_tests.py`/`test_presets.py` support parameterized tests; `MANUAL_INGEST` can re-load archived JSON into ODS.
- **Installable**: `setup.py` / `entry_point` provide an `etl-billiards` command, or run directly with `python -m cli.main`.

## Repository layout (excerpt)
- `etl_billiards/config`: default config, environment-variable parsing, config loading.
- `etl_billiards/api`: HTTP client with built-in retry/pagination.
- `etl_billiards/database`: connection management, batch upsert.
- `etl_billiards/tasks`: business tasks (ORDERS, PAYMENTS, ...), ODS tasks, DWD tasks, manual replay; `base_task.py`/`base_dwd_task.py` provide templates.
- `etl_billiards/loaders`: fact/dimension/ODS loaders; `scd/` holds SCD2.
- `etl_billiards/orchestration`: scheduler, task registry, cursors, and run tracking.
- `etl_billiards/scripts`: test runner, database connectivity check, preset test commands.
- `etl_billiards/tests`: unit/integration tests and offline JSON archives.

## Supported task codes
- **Fact/dimension**: `ORDERS`, `PAYMENTS`, `REFUNDS`, `INVENTORY_CHANGE`, `COUPON_USAGE`, `MEMBERS`, `ASSISTANTS`, `PRODUCTS`, `TABLES`, `PACKAGES_DEF`, `TOPUPS`, `TABLE_DISCOUNT`, `ASSISTANT_ABOLISH`, `LEDGER`, `TICKET_DWD`, `PAYMENTS_DWD`, `MEMBERS_DWD`.
- **Raw ODS collection**: `ODS_ORDER_SETTLE`, `ODS_TABLE_USE`, `ODS_ASSISTANT_LEDGER`, `ODS_ASSISTANT_ABOLISH`, `ODS_GOODS_LEDGER`, `ODS_PAYMENT`, `ODS_REFUND`, `ODS_COUPON_VERIFY`, `ODS_MEMBER`, `ODS_MEMBER_CARD`, `ODS_PACKAGE`, `ODS_INVENTORY_STOCK`, `ODS_INVENTORY_CHANGE`.
- **Auxiliary**: `MANUAL_INGEST` (replays archived JSON into ODS).

## Quick start
1. **Requirements**: Python 3.10+, PostgreSQL. Commands are best run from the `etl_billiards/` directory.
2. **Install dependencies**

   ```bash
   cd etl_billiards
   pip install -r requirements.txt
   # development mode: pip install -e .
   ```

3. **Configure `.env`**

   ```bash
   cp .env.example .env
   # core entries
   PG_DSN=postgresql://user:pwd@host:5432/LLZQ
   API_BASE=https://api.example.com
   API_TOKEN=your_token
   STORE_ID=2790685415443269
   EXPORT_ROOT=/path/to/export
   LOG_ROOT=/path/to/logs
   ```

   Configuration precedence is "defaults" < "environment variables/.env" < "CLI arguments".
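   The layering amounts to a nested-dict overlay. A minimal sketch of that precedence, assuming a recursive merge like the one `config/settings.py::AppConfig.load()` is described to perform (the `deep_merge` helper here is illustrative, not the project's actual function):

   ```python
   # Minimal sketch of the three-layer precedence (illustrative only; the real
   # logic lives in config/settings.py::AppConfig.load()).
   import os

   DEFAULTS = {"api": {"timeout_sec": 20}, "app": {"store_id": ""}}

   def deep_merge(base: dict, override: dict) -> dict:
       """Recursively overlay `override` onto `base` without mutating either."""
       merged = dict(base)
       for key, value in override.items():
           if isinstance(value, dict) and isinstance(merged.get(key), dict):
               merged[key] = deep_merge(merged[key], value)
           else:
               merged[key] = value
       return merged

   env_layer = {"app": {"store_id": os.environ.get("STORE_ID", "")}}
   cli_layer = {"api": {"timeout_sec": 30}}  # e.g. from --api-timeout 30

   config = deep_merge(deep_merge(DEFAULTS, env_layer), cli_layer)
   assert config["api"]["timeout_sec"] == 30  # CLI wins over defaults
   ```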
4. **Run tasks**

   ```bash
   # run the default task set
   python -m cli.main

   # select tasks as needed (comma-separated)
   python -m cli.main --tasks ODS_ORDER_SETTLE,ORDERS,PAYMENTS

   # dry-run example (no transaction commit)
   python -m cli.main --tasks ORDERS --dry-run

   # Windows batch script
   ..\run_etl.bat --tasks PAYMENTS
   ```

5. **Inspect the output**: the log and export directories are controlled by `LOG_ROOT` and `EXPORT_ROOT`; run tracking and cursor records are written to the `etl_admin.*` database tables.

## Data and run flow
- The CLI parses arguments → `AppConfig.load()` assembles configuration → `ETLScheduler` creates the DB/API/cursor/run-tracker dependencies.
- The scheduler instantiates tasks by task code, reads/advances cursors, and persists run records.
- Task template: determine the time window → call the API/read ODS data → parse and validate → batch upsert/SCD2 via loaders → quality checks → commit the transaction and write back the cursor.

## Testing and replay
- Unit/integration tests: `pytest` or `python scripts/run_tests.py --suite online`.
- Preset combinations: `python scripts/run_tests.py --preset offline_realdb` (see `scripts/test_presets.py`).
- Offline mode: `TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=... pytest tests/unit/test_etl_tasks_offline.py`.
- Database connectivity: `python scripts/test_db_connection.py --dsn postgresql://... --query "SELECT 1"`.

## Other notes
- `.env.example` lists all common settings; `config/defaults.py` records defaults and task-window configuration.
- `loaders/ods/generic.py` lands data in ODS once you define primary keys and column names; `tasks/manual_ingest_task.py` can quickly load archived JSON into the matching ODS tables.
- To add a task, implement it under `tasks/` and register it in `orchestration/task_registry.py` to reuse the scheduling machinery.
app/etl_busy.py  (Normal file · 9 lines)
@@ -0,0 +1,9 @@
# app/etl_busy.py
def run():
    """
    Busy-period fetch logic.
    TODO: implement the actual fetch flow here (API calls / page parsing / writes to PostgreSQL, etc.)
    """
    print("Running busy-period ETL...")
    # example: wire up PostgreSQL or HTTP fetching here later
    # ...
app/etl_idle.py  (Normal file · 8 lines)
@@ -0,0 +1,8 @@
# app/etl_idle.py
def run():
    """
    Idle-period fetch logic.
    Suitable for full syncs, large batch historical corrections, etc.
    """
    print("Running idle-period ETL...")
    # ...
app/runner.py  (Normal file · 31 lines)
@@ -0,0 +1,31 @@
# app/runner.py
import argparse
from datetime import datetime

from . import etl_busy, etl_idle


def main():
    parser = argparse.ArgumentParser(description="Feiqiu ETL Runner")
    parser.add_argument(
        "--mode",
        choices=["busy", "idle"],
        required=True,
        help="ETL mode: busy or idle",
    )

    args = parser.parse_args()
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    print(f"[{now}] Start ETL mode={args.mode}")

    if args.mode == "busy":
        etl_busy.run()
    else:
        etl_idle.run()

    print(f"[{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}] ETL finished.")


if __name__ == "__main__":
    main()
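The runner expects an external scheduler to invoke it; nothing in this diff ships a crontab, so the following pairing is only an assumption based on the WINDOW_BUSY_MIN=30 / WINDOW_IDLE_MIN=180 settings in `.env`:

```bash
# illustrative cron entries (hypothetical paths; not part of the repo)
*/30 * * * *  cd /srv/feiqiu && python -m app.runner --mode busy
0 */3 * * *   cd /srv/feiqiu && python -m app.runner --mode idle
```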
etl_billiards/.env  (Normal file · 53 lines)
@@ -0,0 +1,53 @@
# Database config (real database)
PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test
PG_CONNECT_TIMEOUT=10
# for split settings: PG_HOST=... PG_PORT=... PG_NAME=... PG_USER=... PG_PASSWORD=...

# API config (fill in only when calling the real endpoints)
API_BASE=https://api.example.com
API_TOKEN=your_token_here
# API_TIMEOUT=20
# API_PAGE_SIZE=200
# API_RETRY_MAX=3

# Application config
STORE_ID=2790685415443269
# TIMEZONE=Asia/Taipei
# SCHEMA_OLTP=billiards
# SCHEMA_ETL=etl_admin

# Path config
EXPORT_ROOT=C:\dev\LLTQ\export\JSON
LOG_ROOT=C:\dev\LLTQ\export\LOG
FETCH_ROOT=
INGEST_SOURCE_DIR=
WRITE_PRETTY_JSON=false
PGCLIENTENCODING=utf8

# ETL config
OVERLAP_SECONDS=120
WINDOW_BUSY_MIN=30
WINDOW_IDLE_MIN=180
IDLE_START=04:00
IDLE_END=16:00
ALLOW_EMPTY_RESULT_ADVANCE=true

# Cleansing config
LOG_UNKNOWN_FIELDS=true
HASH_ALGO=sha1
STRICT_NUMERIC=true
ROUND_MONEY_SCALE=2

# Test/offline mode (ONLINE recommended when wired to the real database)
TEST_MODE=ONLINE
TEST_JSON_ARCHIVE_DIR=tests/source-data-doc
TEST_JSON_TEMP_DIR=/tmp/etl_billiards_json_tmp

# Test database
TEST_DB_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test

# ODS rebuild script config (optional)
JSON_DOC_DIR=C:\dev\LLTQ\export\test-json-doc
ODS_INCLUDE_FILES=
ODS_DROP_SCHEMA_FIRST=true
etl_billiards/.env.example  (Normal file · 59 lines)
@@ -0,0 +1,59 @@
# Database config
PG_DSN=postgresql://user:password@localhost:5432/....
PG_HOST=localhost
PG_PORT=5432
PG_NAME=LLZQ
PG_USER=local-Python
PG_PASSWORD=your_password_here
PG_CONNECT_TIMEOUT=10

# API config
API_BASE=https://api.example.com
API_TOKEN=your_token_here
API_TIMEOUT=20
API_PAGE_SIZE=200
API_RETRY_MAX=3
API_RETRY_BACKOFF=[1,2,4]

# Application config
STORE_ID=2790685415443269
TIMEZONE=Asia/Taipei
SCHEMA_OLTP=billiards
SCHEMA_ETL=etl_admin

# Path config
EXPORT_ROOT=/path/to/export
LOG_ROOT=/path/to/logs
FETCH_ROOT=/path/to/json_fetch
INGEST_SOURCE_DIR=
WRITE_PRETTY_JSON=false
MANIFEST_NAME=manifest.json
INGEST_REPORT_NAME=ingest_report.json

# ETL config
OVERLAP_SECONDS=120
WINDOW_BUSY_MIN=30
WINDOW_IDLE_MIN=180
IDLE_START=04:00
IDLE_END=16:00
ALLOW_EMPTY_RESULT_ADVANCE=true

# Cleansing config
LOG_UNKNOWN_FIELDS=true
HASH_ALGO=sha1
STRICT_NUMERIC=true
ROUND_MONEY_SCALE=2

# Test/offline mode
TEST_MODE=ONLINE
TEST_JSON_ARCHIVE_DIR=tests/source-data-doc
TEST_JSON_TEMP_DIR=/tmp/etl_billiards_json_tmp

# Test database (optional: if set, unit tests connect to this DSN)
TEST_DB_DSN=

# ODS rebuild script config (optional)
JSON_DOC_DIR=C:\dev\LLTQ\export\test-json-doc
ODS_INCLUDE_FILES=
ODS_DROP_SCHEMA_FIRST=true
etl_billiards/0.py  (Normal file · 40 lines)
@@ -0,0 +1,40 @@
"""Simple PostgreSQL connectivity smoke-checker."""
import os
import sys
import psycopg2
from psycopg2 import OperationalError


DEFAULT_DSN = os.environ.get(
    "PG_DSN", "postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test"
)
DEFAULT_TIMEOUT = max(1, min(int(os.environ.get("PG_CONNECT_TIMEOUT", 10)), 20))


def check_postgres_connection(dsn: str, timeout: int = DEFAULT_TIMEOUT) -> bool:
    """Return True if connection succeeds; print diagnostics otherwise."""
    try:
        conn = psycopg2.connect(dsn, connect_timeout=timeout)
        with conn:
            with conn.cursor() as cur:
                cur.execute("SELECT 1;")
                _ = cur.fetchone()
        print(f"PostgreSQL connection succeeded (timeout={timeout}s)")
        return True
    except OperationalError as exc:
        print("PostgreSQL connection failed (OperationalError):", exc)
    except Exception as exc:  # pragma: no cover - defensive
        print("PostgreSQL connection failed (other exception):", exc)

    return False


if __name__ == "__main__":
    dsn = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_DSN
    if not dsn:
        print("Missing DSN: pass it as an argument or set the PG_DSN environment variable.")
        sys.exit(2)

    ok = check_postgres_connection(dsn)
    if not ok:
        sys.exit(1)
etl_billiards/README.md  (Normal file · 837 lines)
@@ -0,0 +1,837 @@
# Billiards Hall ETL System (Modular Edition) — Merged Documentation

This document merges the previous set of documents (`INDEX.md`, `QUICK_START.md`, `ARCHITECTURE.md`, `MIGRATION_GUIDE.md`, `PROJECT_STRUCTURE.md`, `README.md`, etc.) and keeps only the content relevant to the **current project itself**: project description, directory layout, architecture, data and control flow, and migration/extension guides. It does not cover change history or the refactoring process.

---

## 1. Project Overview

The billiards hall ETL system is a store-operations ETL project that pulls orders, payments, members, and other data from external business APIs, then parses, validates, applies SCD2 handling and quality checks, writes the results to PostgreSQL, and supports incremental sync and task-run tracking.

The system uses a modular, layered architecture. Core features:

- Modular directory layout (clear layering: config, database, API, models, loaders, SCD2, quality checks, orchestration, tasks, CLI, utilities, tests).
- Full configuration management: defaults + environment variables + CLI arguments, layered overrides.
- Reusable database access layer (connection management, batch-upsert wrappers).
- API client with retry and pagination support.
- Type-safe data parsing and validation modules.
- SCD2 dimension history management.
- Data-quality checks (for example, balance-consistency checks).
- An orchestration layer that centralizes scheduling, cursor management, and run tracking.
- A single command-line entry point for task execution, with task filtering, dry-run mode, and more.

---

## 2. Quick Start

### 2.1 Environment

- Python: 3.10+ recommended
- Database: PostgreSQL
- OS: Windows / Linux / macOS

```bash
# after cloning/downloading, enter the project directory
cd etl_billiards/
ls -la
```

You will see the top of the directory tree (details in chapter 4):

- `config/` - configuration management
- `database/` - database access
- `api/` - API client
- `tasks/` - ETL task implementations
- `cli/` - command-line entry point
- `docs/` - technical documentation

### 2.2 Install dependencies

```bash
pip install -r requirements.txt
```

Main dependencies (defer to the actual `requirements.txt`):

- `psycopg2-binary`: PostgreSQL driver
- `requests`: HTTP client
- `python-dateutil`: date/time handling
- `tzdata`: timezone data

### 2.3 Configure environment variables

Copy and edit the template:

```bash
cp .env.example .env
# edit .env with your editor of choice
```

Minimal `.env` example:

```bash
# database
PG_DSN=postgresql://user:password@localhost:5432/....

# API
API_BASE=https://api.example.com
API_TOKEN=your_token_here

# store / application
STORE_ID=2790685415443269
TIMEZONE=Asia/Taipei

# directories
EXPORT_ROOT=/path/to/export
LOG_ROOT=/path/to/logs
```

> Default values for every setting live in `config/defaults.py`; the effective configuration is the three-layer overlay "defaults + environment variables + CLI arguments".

### 2.4 Run your first task

Via the CLI entry point:

```bash
# run all tasks
python -m cli.main

# orders only
python -m cli.main --tasks ORDERS

# orders + payments
python -m cli.main --tasks ORDERS,PAYMENTS

# Windows script
run_etl.bat --tasks ORDERS

# Linux / macOS script
./run_etl.sh --tasks ORDERS
```

### 2.5 Inspect the results

- Log directory: set via `LOG_ROOT`, for example

```bash
ls -la C:\dev\LLTQ\export\LOG/
```

- Export directory: set via `EXPORT_ROOT`, for example

```bash
ls -la C:\dev\LLTQ\export\JSON/
```

---
## 3. Common Commands and Development Tools

### 3.1 Everyday CLI commands

```bash
# run all tasks
python -m cli.main

# run selected tasks
python -m cli.main --tasks ORDERS,PAYMENTS,MEMBERS

# use a custom database
python -m cli.main --pg-dsn "postgresql://user:password@host:5432/db"

# use a custom API endpoint
python -m cli.main --api-base "https://api.example.com" --api-token "..."

# dry run (no database writes)
python -m cli.main --dry-run --tasks ORDERS
```

### 3.2 IDE / code-quality tools (example: VSCode)

`.vscode/settings.json` example:

```json
{
  "python.linting.enabled": true,
  "python.linting.pylintEnabled": true,
  "python.formatting.provider": "black",
  "python.testing.pytestEnabled": true
}
```

Formatting and linting:

```bash
pip install black isort pylint

black .
isort .
pylint etl_billiards/
```

### 3.3 Testing

```bash
# install test dependencies (as needed)
pip install pytest pytest-cov

# run the full test suite
pytest

# unit tests only
pytest tests/unit/

# coverage report
pytest --cov=. --cov-report=html
```

Test examples (defer to the actual project):

- `tests/unit/test_config.py` – configuration unit tests
- `tests/unit/test_parsers.py` – parser unit tests
- `tests/integration/test_database.py` – database integration tests

#### 3.3.1 Test modes (ONLINE / OFFLINE)

- With `TEST_MODE=ONLINE` (the default), tests simulate the live API and exercise the full E/T/L path.
- With `TEST_MODE=OFFLINE`, tests instead read data from the archived JSON under `TEST_JSON_ARCHIVE_DIR` and run Transform + Load only — useful for verifying that local archives can still be replayed.
- `TEST_JSON_ARCHIVE_DIR`: offline JSON archive directory (e.g. `tests/source-data-doc` or a CI-produced snapshot).
- `TEST_JSON_TEMP_DIR`: temporary output directory for test-generated JSON, isolating each run's data.
- `TEST_DB_DSN`: optional; if set, unit tests connect to this PostgreSQL DSN and actually write to the database. If empty, tests use an in-memory fake database and avoid the dependency.

Example commands:

```bash
# online mode across all tasks
TEST_MODE=ONLINE pytest tests/unit/test_etl_tasks_online.py

# offline mode using archived JSON across all tasks
TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=tests/source-data-doc pytest tests/unit/test_etl_tasks_offline.py

# combine parameters via the script (example: online + order cases only)
python scripts/run_tests.py --suite online --mode ONLINE --keyword ORDERS

# connect to a real test database and replay in offline mode
python scripts/run_tests.py --suite offline --mode OFFLINE --db-dsn postgresql://user:pwd@localhost:5432/testdb

# use preset commands from the "command repository"
python scripts/run_tests.py --preset offline_realdb
python scripts/run_tests.py --list-presets  # view or customize scripts/test_presets.py
```

#### 3.3.2 Scripted test combinations (`run_tests.py` / `test_presets.py`)

- `scripts/run_tests.py` is the single pytest entry point: it adds the project root to `sys.path` automatically and offers `--suite online/offline/integration`, `--tests` (custom paths), `--mode`, `--db-dsn`, `--json-archive`, `--json-temp`, `--keyword/-k`, `--pytest-args`, and `--env KEY=VALUE`, so options can be combined freely like building blocks.
- `--preset foo` reads the `PRESETS["foo"]` entry in `scripts/test_presets.py` and overlays it onto the current command; `--list-presets` and `--dry-run` let you review or merely print the command.
- Running `python scripts/test_presets.py` directly executes the presets listed in `AUTO_RUN_PRESETS` in order; passing `--preset x --dry-run` only prints the corresponding command.

`test_presets.py` acts as a "command repository". Each preset is a dict; the common fields are:

| Field | Purpose |
| ---------------------------- | ------------------------------------------------------------------ |
| `suite` | reuse the built-in suites of `run_tests.py` (online/offline/integration; multiple allowed) |
| `tests` | append arbitrary pytest paths, e.g. `tests/unit/test_config.py` |
| `mode` | override `TEST_MODE` (ONLINE / OFFLINE) |
| `db_dsn` | override `TEST_DB_DSN` to connect to a real test database |
| `json_archive` / `json_temp` | configure the offline JSON archive and temp directories |
| `keyword` | maps to `pytest -k` for keyword filtering |
| `pytest_args` | extra pytest arguments, e.g. `-vv --maxfail=1` |
| `env` | extra environment variables, e.g. `["STORE_ID=123"]` |
| `preset_meta` | descriptive text documenting the scenario |

Example: the `offline_realdb` preset sets `TEST_MODE=OFFLINE`, points the archive at `tests/source-data-doc`, and uses `db_dsn` to reach the test database. Running `python scripts/run_tests.py --preset offline_realdb` or `python scripts/test_presets.py --preset offline_realdb` reuses that combination, keeping local, CI, and production replay scripts consistent.

#### 3.3.3 Quick database connectivity check

`python scripts/test_db_connection.py` is the lightest-weight PostgreSQL connectivity probe: it defaults to `TEST_DB_DSN` (or accepts `--dsn`), attempts to connect, and executes `SELECT 1 AS ok` (customizable via `--query`). Typical uses:

```bash
# read TEST_DB_DSN from .env / environment variables
python scripts/test_db_connection.py

# specify a DSN ad hoc and check the task-config table
python scripts/test_db_connection.py --dsn postgresql://user:pwd@host:5432/.... --query "SELECT count(*) FROM etl_admin.etl_task"
```

An exit code of 0 means the connection and query succeeded; on a non-zero code, work through the database part of chapter 8 ("Troubleshooting": network, firewall, account permissions, etc.) before running the full ETL.

---
## 4. Project Structure and File Guide

### 4.1 Directory tree

```text
etl_billiards/
│
├── README.md                    # project overview and usage
├── MIGRATION_GUIDE.md           # migration guide from the old version
├── requirements.txt             # Python dependency list
├── setup.py                     # install configuration
├── .env.example                 # environment-variable template
├── .gitignore                   # Git ignore rules
├── run_etl.sh                   # Linux/Mac run script
├── run_etl.bat                  # Windows run script
│
├── config/                      # configuration management
│   ├── __init__.py
│   ├── defaults.py              # default config values
│   ├── env_parser.py            # environment-variable parser
│   └── settings.py              # main configuration class
│
├── database/                    # database access layer
│   ├── __init__.py
│   ├── connection.py            # connection management
│   └── operations.py            # batch-operation wrappers
│
├── api/                         # HTTP API client
│   ├── __init__.py
│   └── client.py                # API client (retry + pagination)
│
├── models/                      # data-model layer
│   ├── __init__.py
│   ├── parsers.py               # type parsers
│   └── validators.py            # data validators
│
├── loaders/                     # data loaders
│   ├── __init__.py
│   ├── base_loader.py           # loader base class
│   ├── dimensions/              # dimension-table loaders
│   │   ├── __init__.py
│   │   └── member.py            # member dimension loader
│   └── facts/                   # fact-table loaders
│       ├── __init__.py
│       ├── order.py             # order fact loader
│       └── payment.py           # payment loader
│
├── scd/                         # SCD2 handling
│   ├── __init__.py
│   └── scd2_handler.py          # SCD2 history handler
│
├── quality/                     # data-quality checks
│   ├── __init__.py
│   ├── base_checker.py          # checker base class
│   └── balance_checker.py       # balance-consistency checker
│
├── orchestration/               # ETL orchestration
│   ├── __init__.py
│   ├── scheduler.py             # ETL scheduler
│   ├── task_registry.py         # task registry (factory pattern)
│   ├── cursor_manager.py        # cursor manager
│   └── run_tracker.py           # run-record tracker
│
├── tasks/                       # ETL tasks
│   ├── __init__.py
│   ├── base_task.py             # task base class (template method)
│   ├── orders_task.py           # order ETL task
│   ├── payments_task.py         # payment ETL task
│   └── members_task.py          # member ETL task
│
├── cli/                         # command-line interface
│   ├── __init__.py
│   └── main.py                  # CLI entry point
│
├── utils/                       # utilities
│   ├── __init__.py
│   └── helpers.py               # shared helper functions
│
├── tests/                       # tests
│   ├── __init__.py
│   ├── unit/                    # unit tests
│   │   ├── __init__.py
│   │   ├── test_config.py
│   │   └── test_parsers.py
│   ├── testdata_json/           # test JSON files for cleanse-and-load
│   │   └── XX.json
│   └── integration/             # integration tests
│       ├── __init__.py
│       └── test_database.py
│
└── docs/                        # documentation
    └── ARCHITECTURE.md          # architecture design document
```

### 4.2 Module responsibilities at a glance

- **config/**
  - Single configuration entry point; supports three override layers: defaults, environment variables, CLI arguments.
- **database/**
  - Wraps PostgreSQL connections and batch operations (insert, update, upsert, etc.).
- **api/**
  - Unified wrapper for HTTP calls to the upstream business API, with retry, pagination, and timeout control.
- **models/**
  - Type parsers (timestamps, money, integers, etc.) and business-level data validators.
- **loaders/**
  - Loading logic for fact and dimension tables (batch upsert, write-result statistics, etc.).
- **scd/**
  - SCD2 history management for dimension data (validity ranges, version flags, etc.).
- **quality/**
  - Quality-check strategies, such as balance consistency and record-count reconciliation.
- **orchestration/**
  - Task scheduling, task registration, cursor management (incremental windows), run tracking.
- **tasks/**
  - Concrete business tasks (orders, payments, members, etc.) wrapping the full "fetch → process → write → record results" flow.
- **cli/**
  - Command-line entry point; parses arguments and starts the scheduler.
- **utils/**
  - Miscellaneous helper functions.
- **tests/**
  - Unit and integration test code.

---
## 5. Architecture and Flow

### 5.1 Layered architecture diagram

```text
┌─────────────────────────────────────┐
│  CLI command-line interface         │ <- cli/main.py
└─────────────┬───────────────────────┘
              │
┌─────────────▼───────────────────────┐
│  Orchestration layer                │ <- orchestration/
│  (Scheduler, TaskRegistry, ...)     │
└─────────────┬───────────────────────┘
              │
┌─────────────▼───────────────────────┐
│  Tasks layer                        │ <- tasks/
│  (OrdersTask, PaymentsTask, ...)    │
└───┬─────────┬─────────┬─────────────┘
    │         │         │
    ▼         ▼         ▼
┌────────┐ ┌─────┐ ┌──────────┐
│Loaders │ │ SCD │ │ Quality  │ <- loaders/, scd/, quality/
└────────┘ └─────┘ └──────────┘
        │
┌───────▼────────┐
│  Models        │ <- models/
└───────┬────────┘
        │
┌───────▼────────┐
│  API client    │ <- api/
└───────┬────────┘
        │
┌───────▼────────┐
│  Database      │ <- database/
└───────┬────────┘
        │
┌───────▼────────┐
│  Config        │ <- config/
└────────────────┘
```

### 5.2 Layer responsibilities (current design)

- **CLI layer (`cli/`)**

  - Parses command-line arguments (task list, dry-run, config overrides, etc.).
  - Initializes configuration and logging, then hands off to the orchestration layer.

- **Orchestration layer (`orchestration/`)**

  - `scheduler.py`: selects the tasks to run from configuration and CLI arguments and controls ordering and parallelism.
  - `task_registry.py`: the task registry; creates task instances by task code (factory pattern).
  - `cursor_manager.py`: manages incremental cursors (time windows / ID cursors).
  - `run_tracker.py`: records the status, statistics, and errors of every task run.

- **Tasks layer (`tasks/`)**

  - `base_task.py`: defines the task execution template (template-method pattern): obtain the window, call upstream, parse/validate, write to the database, advance the cursor.
  - `orders_task.py` / `payments_task.py` / `members_task.py`: concrete task logic (orders, payments, members).

- **Loader / SCD / Quality layers**

  - `loaders/`: per-target-table upsert/insert/update logic.
  - `scd/scd2_handler.py`: SCD2 history management for dimension tables (see the SQL sketch after this list).
  - `quality/`: data-quality checks, such as balance reconciliation.

- **Model layer (`models/`)**

  - `parsers.py`: data-type conversion (string → timestamp, Decimal, int, etc.).
  - `validators.py`: field-level and record-level validation.

- **API layer (`api/client.py`)**

  - Wraps HTTP calls; handles retry, timeouts, and pagination.

- **Database layer (`database/`)**

  - Manages database connections and contexts.
  - Provides batch insert / update / upsert interfaces.

- **Config layer (`config/`)**
  - Defines configuration defaults.
  - Parses environment variables with type conversion.
  - Exposes a single configuration object.
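`scd2_handler.py` itself is not shown in this diff, so the following is only a hedged sketch of the "close and insert" step an SCD2 upsert typically performs; the `valid_from`/`valid_to`/`is_current` columns and the member attributes are assumptions, not the project's actual schema:

```sql
-- Hedged sketch of a typical SCD2 "close and insert" for a member dimension.
-- Column names (valid_from, valid_to, is_current) are assumptions, not taken
-- from scd2_handler.py, which is not part of this diff.
BEGIN;

-- 1) close the current version when any tracked attribute changed
UPDATE billiards.dim_member d
SET valid_to = now(), is_current = FALSE
WHERE d.member_id = 1001
  AND d.is_current
  AND (d.level, d.phone) IS DISTINCT FROM ('GOLD', '13800000000');

-- 2) insert the new version only if no open row with these values exists
INSERT INTO billiards.dim_member (member_id, level, phone, valid_from, valid_to, is_current)
SELECT 1001, 'GOLD', '13800000000', now(), 'infinity', TRUE
WHERE NOT EXISTS (
    SELECT 1 FROM billiards.dim_member
    WHERE member_id = 1001 AND is_current
);

COMMIT;
```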
### 5.3 Design patterns in use

- Factory pattern: task registration/creation (`TaskRegistry`).
- Template-method pattern: the task execution flow (`BaseTask`).
- Strategy pattern: different loaders/checkers implement different strategies.
- Dependency injection: tasks receive `db`, `api`, `config`, and other dependencies via their constructors.

### 5.4 Data and control flow

Overall flow:

1. The CLI parses arguments and loads configuration.
2. The scheduler builds the database connection, API client, and other dependencies.
3. The scheduler walks the task configuration, fetches each task class from `TaskRegistry`, and instantiates it.
4. Every task follows the same template (a windowing sketch follows this list):
   - Read the cursor / time window.
   - Call the API to pull data (paginated where applicable).
   - Parse and validate the data.
   - Write to the database through loaders (fact tables / dimension tables / SCD2).
   - Run quality checks.
   - Update the cursor and the run record.
5. Once all tasks finish, connections are released and the process exits.
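The cursor/window step can be pictured with the `OVERLAP_SECONDS` and `WINDOW_BUSY_MIN` settings from `.env`; `cursor_manager.py` is not included in this diff, so the function below is an illustrative assumption, not the project's actual code:

```python
# Illustrative sketch of an incremental window with overlap, based on the
# OVERLAP_SECONDS / WINDOW_BUSY_MIN settings; names here are assumptions.
from datetime import datetime, timedelta

OVERLAP_SECONDS = 120     # re-read a little history to absorb late arrivals
WINDOW_BUSY_MIN = 30      # busy-period window length

def compute_window(last_cursor: datetime, now: datetime) -> tuple[datetime, datetime]:
    """Next [start, end) window: resume from the cursor minus the overlap."""
    start = last_cursor - timedelta(seconds=OVERLAP_SECONDS)
    end = min(last_cursor + timedelta(minutes=WINDOW_BUSY_MIN), now)
    return start, end

start, end = compute_window(datetime(2025, 11, 21, 12, 0), datetime(2025, 11, 21, 12, 40))
print(start, end)  # 11:58 -> 12:30; the 2-minute overlap is deduplicated by upsert
```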
### 5.5 Error-handling strategy

- One task's failure does not affect the other tasks.
- Database exceptions automatically roll back the current transaction.
- Failed API requests are retried per configuration; once retries are exhausted, the error is recorded and the task terminates.
- All errors are written to the logs and the run-tracking table for later diagnosis.

### 5.6 Two-stage ODS + DWD strategy (new)

To support backfill/replay and downstream DWD wide-table construction, the project adds a `billiards_ods` schema plus a dedicated set of ODS tasks/loaders:

- **ODS tables**: `billiards_ods.ods_order_settle`, `ods_table_use_detail`, `ods_assistant_ledger`, `ods_assistant_abolish`, `ods_goods_ledger`, `ods_payment`, `ods_refund`, `ods_coupon_verify`, `ods_member`, `ods_member_card`, `ods_package_coupon`, `ods_inventory_stock`, `ods_inventory_change`. Every record stores `store_id + source primary key + payload JSON + fetched_at + source_endpoint`.
- **Generic loader**: `loaders/ods/generic.py::GenericODSLoader` wraps the `INSERT ... ON CONFLICT ...` batch-write logic; callers only supply column names and the primary-key columns (see the SQL sketch below).
- **ODS tasks**: `tasks/ods_tasks.py` defines a set of tasks via `OdsTaskSpec` (`ODS_ORDER_SETTLE`, `ODS_PAYMENT`, `ODS_ASSISTANT_LEDGER`, etc.), auto-registered in `TaskRegistry` and runnable directly via `python -m cli.main --tasks ODS_ORDER_SETTLE,ODS_PAYMENT`.
- **Two-stage pipeline**:
  1. Stage 1 (ODS): call the API or read archived offline JSON, write the raw records into the ODS tables, and keep pagination, fetch-time, and source-file metadata.
  2. Stage 2 (DWD/DIM): the order, payment, coupon, and other fact tasks will read payloads from ODS, parse/validate them, and write the `billiards.fact_*` / `dim_*` tables, avoiding repeated pulls from the upstream API.

> The new unit test `tests/unit/test_ods_tasks.py` covers the load paths of `ODS_ORDER_SETTLE` and `ODS_PAYMENT` and can serve as a template for further ODS tasks.
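The exact statement built by `GenericODSLoader` is not shown in this diff; a sketch of the idempotent upsert it is described as generating, using the ODS record fields listed above, looks like this:

```sql
-- Hedged sketch of the statement GenericODSLoader is described as generating;
-- the column set follows the ODS record description above, but the exact SQL
-- in loaders/ods/generic.py is not part of this diff.
INSERT INTO billiards_ods.ods_payment
    (store_id, source_id, payload, fetched_at, source_endpoint)
VALUES
    (%(store_id)s, %(source_id)s, %(payload)s::jsonb, %(fetched_at)s, %(source_endpoint)s)
ON CONFLICT (store_id, source_id) DO UPDATE
SET payload         = EXCLUDED.payload,
    fetched_at      = EXCLUDED.fetched_at,
    source_endpoint = EXCLUDED.source_endpoint;
```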
---

## 6. Migration Guide (from the old script to the current project)

This section explains how to migrate from the old single-file script (e.g. `task_merged.py`) to the current modular project. It is usage guidance for the current project, not a historical comparison.

### 6.1 Core function mapping

| Old function / class | New location | Purpose |
| --------------------- | ----------------------------------------------------- | ---------- |
| `DEFAULTS` dict | `config/defaults.py` | config defaults |
| `build_config()` | `config/settings.py::AppConfig.load()` | config loading |
| `Pg` class | `database/connection.py::DatabaseConnection` | database connection |
| `http_get_json()` | `api/client.py::APIClient.get()` | API request |
| `paged_get()` | `api/client.py::APIClient.get_paginated()` | paginated request |
| `parse_ts()` | `models/parsers.py::TypeParser.parse_timestamp()` | time parsing |
| `upsert_fact_order()` | `loaders/facts/order.py::OrderLoader.upsert_orders()` | order loading |
| `scd2_upsert()` | `scd/scd2_handler.py::SCD2Handler.upsert()` | SCD2 handling |
| `run_task_orders()` | `tasks/orders_task.py::OrdersTask.execute()` | order task |
| `main()` | `cli/main.py::main()` | entry point |

### 6.2 Typical migration steps

1. **Migrate configuration**

   - Move settings previously hard-coded in `DEFAULTS` or in the script into `.env` and `config/defaults.py`.
   - Fetch configuration exclusively through `AppConfig.load()`.

2. **Validate by running both in parallel**

   ```bash
   # old script
   python task_merged.py --tasks ORDERS

   # new project
   python -m cli.main --tasks ORDERS
   ```

   Compare the exported tables and logs from both versions to confirm consistency.

3. **Migrate custom logic**

   - Custom cleansing logic from the old script → move into the relevant `loaders/` or task classes.
   - Custom tasks → implement under `tasks/` and register in `task_registry`.
   - Custom API calls → extend `api/client.py` or wrap them in a dedicated service class.

4. **Cut over gradually**
   - Run both versions in parallel in the test environment first.
   - Then switch production tasks to the new version step by step.

---
## 7. Development and Extension Guide (current project)

### 7.1 Adding a new task

1. Create the task class under `tasks/`:

```python
from .base_task import BaseTask

class MyTask(BaseTask):
    def get_task_code(self) -> str:
        return "MY_TASK"

    def execute(self) -> dict:
        # 1. get the time window
        window_start, window_end, _ = self._get_time_window()

        # 2. pull data from the API
        records, _ = self.api.get_paginated(...)

        # 3. parse / validate
        parsed = [self._parse(r) for r in records]

        # 4. load the data
        loader = MyLoader(self.db)
        inserted, updated, _ = loader.upsert(parsed)

        # 5. commit and report
        self.db.commit()
        return self._build_result("SUCCESS", {
            "inserted": inserted,
            "updated": updated,
        })
```

2. Register it in `orchestration/task_registry.py`:

```python
from tasks.my_task import MyTask

default_registry.register("MY_TASK", MyTask)
```

3. Enable it in the task-config table (example):

```sql
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
VALUES ('MY_TASK', 123456, TRUE);
```

### 7.2 Adding a new loader

```python
from loaders.base_loader import BaseLoader

class MyLoader(BaseLoader):
    def upsert(self, records: list) -> tuple:
        # RETURNING (xmax = 0) is a PostgreSQL idiom: xmax is 0 for freshly
        # inserted rows, so it distinguishes inserts from conflict-updates.
        sql = "INSERT INTO table_name (...) VALUES (...) ON CONFLICT (...) DO UPDATE SET ... RETURNING (xmax = 0) AS inserted"
        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
```

### 7.3 Adding a new quality checker

1. Implement the checker under `quality/`, inheriting from `base_checker.py`.
2. Invoke it from the task or the scheduling flow to validate data after it is written.

### 7.4 Extending type parsing and validation

- Add new type-parsing methods in `models/parsers.py`.
- Add new rules in `models/validators.py` (enum checks, cross-field checks, etc.).

---
## 8. Troubleshooting

### 8.1 Database connection failures

```text
Error: could not connect to server
```

Checklist:

- Verify `PG_DSN` and the related database settings.
- Confirm the database service is running and the network is reachable.

### 8.2 API request timeouts

```text
Error: requests.exceptions.Timeout
```

Checklist:

- Check the `API_BASE` address and network connectivity.
- Raise the timeout and retry counts in configuration if appropriate.

### 8.3 Module import errors

```text
Error: ModuleNotFoundError
```

Checklist:

- Make sure you run from the project root (the directory containing the `etl_billiards/` package).
- Or install the project in editable mode via `pip install -e .`.

### 8.4 Permission problems

```text
Error: Permission denied
```

Checklist:

- Script not executable: `chmod +x run_etl.sh`.
- On Windows, run as administrator or adjust permissions on the log/export directories.

---

## 9. Pre-flight Checklist

Before a production run, confirm:

- [ ] Python 3.10+ is installed.
- [ ] `pip install -r requirements.txt` has been run.
- [ ] `.env` is correctly configured (database, API, store ID, paths, etc.).
- [ ] The PostgreSQL database is reachable.
- [ ] The API service is reachable and the credentials are valid.
- [ ] `LOG_ROOT` and `EXPORT_ROOT` exist and are writable.

---

## 10. Reference Notes

- This document merges the former quick start, project structure, architecture, and migration guides into a single reference for the current project.
- If you need to split it back into separate documents, split by chapter, e.g. "Quick Start", "Architecture", "Migration Guide", "Development and Extension".

## 11. Run/Debug Modes

- Production keeps only "task mode": registered tasks (ETL/ODS) run through the scheduler/CLI; debug scripts are not used.
- Helper scripts available for development/debugging (remove or disable before go-live):
  - `python -m etl_billiards.scripts.rebuild_ods_from_json`: rebuilds `billiards_ods` from a local JSON directory, for offline initialization/verification. Environment variables: `PG_DSN` (required), `JSON_DOC_DIR` (optional, default `C:\dev\LLTQ\export\test-json-doc`), `INCLUDE_FILES` (comma-separated file names), `DROP_SCHEMA_FIRST` (default true).
- If such scripts must remain in production, document their purpose and disabling conditions in the ops runbook to prevent misuse.
## 12. ODS Task Rollout Guide

- Task registration: `etl_billiards/database/seed_ods_tasks.sql` lists the currently enabled ODS tasks. Replace its `store_id` with the real store and run:
  ```
  psql "$PG_DSN" -f etl_billiards/database/seed_ods_tasks.sql
  ```
  The `ON CONFLICT` clause keeps enabled=true and avoids duplicates (a sketch of that pattern follows this list).
- Scheduling: confirm the required ODS tasks are enabled in `etl_admin.etl_task` (task codes are in the seed script); the scheduler or CLI `--tasks` can then invoke them.
- Offline backfill: in development, `rebuild_ods_from_json` can initialize ODS from sample JSON; use with care in production. Deduplication defaults to `(source_file, record_index)`.
- Tests: `pytest etl_billiards/tests/unit/test_ods_tasks.py` covers the core ODS tasks; set `ETL_SKIP_DOTENV=1` during tests to skip reading the local .env.
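Since `seed_ods_tasks.sql` itself is not part of this diff, the upsert pattern it is described as using would look roughly like this (column names beyond `task_code`/`store_id`/`enabled` are assumptions):

```sql
-- Hedged sketch of the seed pattern described above; seed_ods_tasks.sql is
-- not shown in this diff, so the conflict target is an assumption.
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
VALUES ('ODS_ORDER_SETTLE', 2790685415443269, TRUE)
ON CONFLICT (task_code, store_id) DO UPDATE
SET enabled = TRUE;  -- re-running the seed keeps the task enabled, no duplicates
```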
## 13. ODS Table Mapping Overview

| ODS table | API path | List path in payload |
| ------------------------------------ | ---------------------------------------------------- | ----------------------------- |
| `assistant_accounts_master` | `/PersonnelManagement/SearchAssistantInfo` | data.assistantInfos |
| `assistant_service_records` | `/AssistantPerformance/GetOrderAssistantDetails` | data.orderAssistantDetails |
| `assistant_cancellation_records` | `/AssistantPerformance/GetAbolitionAssistant` | data.abolitionAssistants |
| `goods_stock_movements` | `/GoodsStockManage/QueryGoodsOutboundReceipt` | data.queryDeliveryRecordsList |
| `goods_stock_summary` | `/TenantGoods/GetGoodsStockReport` | data |
| `group_buy_packages` | `/PackageCoupon/QueryPackageCouponList` | data.packageCouponList |
| `group_buy_redemption_records` | `/Site/GetSiteTableUseDetails` | data.siteTableUseDetailsList |
| `member_profiles` | `/MemberProfile/GetTenantMemberList` | data.tenantMemberInfos |
| `member_balance_changes` | `/MemberProfile/GetMemberCardBalanceChange` | data.tenantMemberCardLogs |
| `member_stored_value_cards` | `/MemberProfile/GetTenantMemberCardList` | data.tenantMemberCards |
| `payment_transactions` | `/PayLog/GetPayLogListPage` | data |
| `platform_coupon_redemption_records` | `/Promotion/GetOfflineCouponConsumePageList` | data |
| `recharge_settlements` | `/Site/GetRechargeSettleList` | data.settleList |
| `refund_transactions` | `/Order/GetRefundPayLogList` | data |
| `settlement_records` | `/Site/GetAllOrderSettleList` | data.settleList |
| `settlement_ticket_details` | `/Order/GetOrderSettleTicketNew` | (full raw JSON) |
| `site_tables_master` | `/Table/GetSiteTables` | data.siteTables |
| `stock_goods_category_tree` | `/TenantGoodsCategory/QueryPrimarySecondaryCategory` | data.goodsCategoryList |
| `store_goods_master` | `/TenantGoods/GetGoodsInventoryList` | data.orderGoodsList |
| `store_goods_sales_records` | `/TenantGoods/GetGoodsSalesList` | data.orderGoodsLedgers |
| `table_fee_discount_records` | `/Site/GetTaiFeeAdjustList` | data.taiFeeAdjustInfos |
| `table_fee_transactions` | `/Site/GetSiteTableOrderDetails` | data.siteTableUseDetailsList |
| `tenant_goods_master` | `/TenantGoods/QueryTenantGoods` | data.tenantGoodsList |

## 14. ODS Environment Variables and Defaults

- `.env` / environment variables:
  - `JSON_DOC_DIR`: sample-JSON directory for ODS (development/backfill use)
  - `ODS_INCLUDE_FILES`: restrict the files to import (comma-separated, without .json)
  - `ODS_DROP_SCHEMA_FIRST`: true/false, whether to rebuild the schema first
  - `ETL_SKIP_DOTENV`: set to 1 in tests/CI to skip reading the local .env
- `ods` defaults in `config/defaults.py`:
  - `json_doc_dir`: `C:\dev\LLTQ\export\test-json-doc`
  - `include_files`: `""`
  - `drop_schema_first`: `True`

---
## 15. DWD Dimension "Business Events"

1. One unique, atomic grain

   - A DWD table may have exactly one business grain, for example:
     - one record = one settlement;
     - one record = one table-fee ledger entry;
     - one record = one assistant service session;
     - one record = one member balance change.
   - A table must not mix "order headers" with "order lines", or mix summary rows with detail rows.
   - Once the grain is fixed, every column must match it:
     - a "settlement header" table should not carry per-line goods detail;
     - a goods-detail table should not carry order-level totals.
   - This is the single most important rule of the DWD layer.

2. Model business processes, not JSON lists

   - First map out the real business chain:
     - open / move / close table → table-fee ledger
     - assistant joins a table → assistant service ledger / cancellation events
     - ordering goods → goods sales ledger
     - top-up / spend → balance changes / top-up orders
     - settlement → settlement header + payment ledger / refund ledger
     - group-buy / platform coupons → redemption ledger

3. Explicit primary keys, unified foreign keys

   - Every DWD table needs a business primary key (even if it is just the API-supplied id); do not rely on database auto-increment.
   - Fields denoting the same concept must share one name and one meaning:
     - store: always site_id, mapped to siteProfile.id;
     - member: always member_id mapped to member_profiles.id, with system_member_id as its own column;
     - table: always table_id mapped to site_tables_master.id;
     - settlement: always order_settle_id;
     - order: always order_trade_no, and so on.
   - Otherwise joining the tables later in DWS, or for AI use, becomes very painful.

4. Keep detail; avoid premature aggregation

   - DWD fact tables hold detail-level data only:
     - do not compute daily/weekly/monthly rollups in DWD — that belongs to DWS;
     - do not collapse multiple events into one row (e.g. a table mixing daily summaries with individual ledger entries).
   - When aggregation is needed, build subject-oriented wide tables in DWS:
     - dws_member_day_profile, dws_site_day_summary, etc.
   - DWD is responsible only for fine-grained truth.

5. Cleanse and standardize consistently, but stay traceable

   - Cleansing that must happen in DWD:
     - type conversion: string timestamps → time types, money unified as decimal, booleans unified as 0/1;
     - unit unification: seconds vs. minutes, yuan vs. fen;
     - enum standardization: pin down the meaning of status/type codes in DWD, building enum dimension tables where needed.
   - At the same time guarantee:
     - every DWD record can be traced back to ODS:
       - keep the source-system primary key;
       - keep the original time / original amount fields (never overwrite them).

6. Flatten and de-nest

   - Typical JSON shape: pagination shell + header + detail array + assorted nested objects (siteProfile, tableProfile, goodsLedgers, ...).
   - DWD principles (a head/line DDL sketch follows this list):
     - drop the pagination shell;
     - split arrays into child tables (header table / line table);
     - extract recurring profile objects into dimension tables (store, table, goods, member, ...).
   - Goal: every DWD table is a flat two-dimensional table with no complex nested JSON.
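A hedged DDL sketch of rule 6's head/line split for a settlement payload; table and column names are illustrative, not taken from the project's actual DDL:

```sql
-- Hedged sketch of the head/line split; names are illustrative only.
CREATE TABLE billiards.dwd_order_settle_head (
    order_settle_id bigint PRIMARY KEY,   -- business key from the API
    site_id         bigint NOT NULL,
    member_id       bigint,
    settle_time     timestamptz NOT NULL,
    total_amount    numeric(12, 2) NOT NULL
);

CREATE TABLE billiards.dwd_order_settle_line (
    order_settle_id bigint NOT NULL REFERENCES billiards.dwd_order_settle_head,
    line_no         int    NOT NULL,
    goods_id        bigint NOT NULL,      -- FK into the goods dimension
    quantity        numeric(12, 3) NOT NULL,
    line_amount     numeric(12, 2) NOT NULL,
    PRIMARY KEY (order_settle_id, line_no)
);
```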
7. Keep the model stable and extensible

   - DWD table structures should stay as stable as possible; satisfy new requirements by:
     - adding columns;
     - adding new fact/dimension tables;
     - deriving metrics in DWS;
   - rather than repeatedly restructuring existing DWD tables.
   - This also matters for feeding an LLM later: prompts and schema understanding should change as little as possible.
etl_billiards/__init__.py  (Normal file · 0 lines)

etl_billiards/api/__init__.py  (Normal file · 0 lines)
etl_billiards/api/client.py  (Normal file · 256 lines)
@@ -0,0 +1,256 @@
# -*- coding: utf-8 -*-
"""API client: unified wrapper for POST requests, retries, pagination, and list extraction."""
from __future__ import annotations

from typing import Iterable, Sequence, Tuple

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

DEFAULT_BROWSER_HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "Content-Type": "application/json",
    "Origin": "https://pc.ficoo.vip",
    "Referer": "https://pc.ficoo.vip/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "zh-CN,zh;q=0.9",
    "sec-ch-ua": '"Google Chrome";v="141", "Not?A_Brand";v="8", "Chromium";v="141"',
    "sec-ch-ua-platform": '"Windows"',
    "sec-ch-ua-mobile": "?0",
    "sec-fetch-site": "same-origin",
    "sec-fetch-mode": "cors",
    "sec-fetch-dest": "empty",
    "priority": "u=1, i",
    "X-Requested-With": "XMLHttpRequest",
    "DNT": "1",
}

DEFAULT_LIST_KEYS: Tuple[str, ...] = (
    "list",
    "rows",
    "records",
    "items",
    "dataList",
    "data_list",
    "tenantMemberInfos",
    "tenantMemberCardLogs",
    "tenantMemberCards",
    "settleList",
    "orderAssistantDetails",
    "assistantInfos",
    "siteTables",
    "taiFeeAdjustInfos",
    "siteTableUseDetailsList",
    "tenantGoodsList",
    "packageCouponList",
    "queryDeliveryRecordsList",
    "goodsCategoryList",
    "orderGoodsList",
    "orderGoodsLedgers",
)


class APIClient:
    """HTTP API client (POSTs a JSON body by default)."""

    def __init__(
        self,
        base_url: str,
        token: str | None = None,
        timeout: int = 20,
        retry_max: int = 3,
        headers_extra: dict | None = None,
    ):
        self.base_url = (base_url or "").rstrip("/")
        self.token = self._normalize_token(token)
        self.timeout = timeout
        self.retry_max = retry_max
        self.headers_extra = headers_extra or {}
        self._session: requests.Session | None = None

    # ------------------------------------------------------------------ HTTP basics
    def _get_session(self) -> requests.Session:
        """Return the session, creating one with retry support on first use."""
        if self._session is None:
            self._session = requests.Session()

            retries = max(0, int(self.retry_max) - 1)
            retry = Retry(
                total=None,
                connect=retries,
                read=retries,
                status=retries,
                allowed_methods=frozenset(["GET", "POST"]),
                status_forcelist=(429, 500, 502, 503, 504),
                backoff_factor=0.5,
                respect_retry_after_header=True,
                raise_on_status=False,
            )

            adapter = HTTPAdapter(max_retries=retry)
            self._session.mount("http://", adapter)
            self._session.mount("https://", adapter)
            self._session.headers.update(self._build_headers())

        return self._session

    def get(self, endpoint: str, params: dict | None = None) -> dict:
        """
        Request entry point kept under its legacy name (actually POSTs JSON).
        """
        return self._post_json(endpoint, params)

    def _post_json(self, endpoint: str, payload: dict | None = None) -> dict:
        if not self.base_url:
            raise ValueError("API base_url is not configured")

        url = f"{self.base_url}/{endpoint.lstrip('/')}"
        sess = self._get_session()
        resp = sess.post(url, json=payload or {}, timeout=self.timeout)
        resp.raise_for_status()
        data = resp.json()
        self._ensure_success(data)
        return data

    def _build_headers(self) -> dict:
        headers = dict(DEFAULT_BROWSER_HEADERS)
        headers.update(self.headers_extra)
        if self.token:
            headers["Authorization"] = self.token
        return headers

    @staticmethod
    def _normalize_token(token: str | None) -> str | None:
        if not token:
            return None
        t = str(token).strip()
        if not t.lower().startswith("bearer "):
            t = f"Bearer {t}"
        return t

    @staticmethod
    def _ensure_success(payload: dict):
        """Raise when the API returns a non-zero code, so callers can retry/log."""
        if isinstance(payload, dict) and "code" in payload:
            code = payload.get("code")
            if code not in (0, "0", None):
                msg = payload.get("msg") or payload.get("message") or ""
                raise ValueError(f"API returned error code={code} msg={msg}")

    # ------------------------------------------------------------------ pagination
    def iter_paginated(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int | None = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | Sequence[str] | None = None,
        page_start: int = 1,
        page_end: int | None = None,
    ) -> Iterable[tuple[int, list, dict, dict]]:
        """
        Page iterator: pulls data page by page and yields
        (page_no, records, request_params, raw_response).
        With page_size=None no pagination parameters are sent and only one request is made.
        """
        base_params = dict(params or {})
        page = page_start

        while True:
            page_params = dict(base_params)
            if page_size is not None:
                page_params[page_field] = page
                page_params[size_field] = page_size

            payload = self._post_json(endpoint, page_params)
            records = self._extract_list(payload, data_path, list_key)

            yield page, records, page_params, payload

            if page_size is None:
                break
            if page_end is not None and page >= page_end:
                break
            if len(records) < (page_size or 0):
                break
            if len(records) == 0:
                break

            page += 1

    def get_paginated(
        self,
        endpoint: str,
        params: dict,
        page_size: int | None = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | Sequence[str] | None = None,
        page_start: int = 1,
        page_end: int | None = None,
    ) -> tuple[list, list]:
        """Fetch all pages and collect every record into a single list."""
        records, pages_meta = [], []

        for page_no, page_records, request_params, response in self.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
            page_start=page_start,
            page_end=page_end,
        ):
            records.extend(page_records)
            pages_meta.append(
                {"page": page_no, "request": request_params, "response": response}
            )

        return records, pages_meta

    # ------------------------------------------------------------------ response parsing
    @classmethod
    def _extract_list(
        cls, payload: dict | list, data_path: tuple, list_key: str | Sequence[str] | None
    ) -> list:
        """Extract the list structure following data_path/list_key, falling back to common field names."""
        cur: object = payload

        if isinstance(cur, list):
            return cur

        for key in data_path:
            if isinstance(cur, dict):
                cur = cur.get(key)
            else:
                cur = None
            if cur is None:
                break

        if isinstance(cur, list):
            return cur

        if isinstance(cur, dict):
            if list_key:
                keys = (list_key,) if isinstance(list_key, str) else tuple(list_key)
                for k in keys:
                    if isinstance(cur.get(k), list):
                        return cur[k]

            for k in DEFAULT_LIST_KEYS:
                if isinstance(cur.get(k), list):
                    return cur[k]

            for v in cur.values():
                if isinstance(v, list):
                    return v

        return []
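As a usage sketch tying the pagination and list-extraction pieces together: the endpoint and `list_key` pair follow the chapter-13 mapping table, while the token and the request parameter names (`siteId`, `beginTime`, `endTime`) are placeholders/assumptions, not documented API fields.

```python
# Usage sketch for APIClient; endpoint/list_key follow the chapter-13 mapping,
# token and parameter names are illustrative placeholders.
from api.client import APIClient

client = APIClient(base_url="https://pc.ficoo.vip/apiprod/admin/v1", token="your_token")

records, pages_meta = client.get_paginated(
    endpoint="/Site/GetAllOrderSettleList",
    params={"siteId": 2790685415443269, "beginTime": "2025-11-20", "endTime": "2025-11-21"},
    list_key="settleList",   # extracted from data.settleList per the mapping table
)
print(len(records), "settlements over", len(pages_meta), "pages")
```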
etl_billiards/api/local_json_client.py  (Normal file · 74 lines)
@@ -0,0 +1,74 @@
# -*- coding: utf-8 -*-
"""Local JSON client: mimics APIClient's pagination interface, replaying data from JSON on disk."""
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable, Tuple

from api.client import APIClient
from utils.json_store import endpoint_to_filename


class LocalJsonClient:
    """
    Reads the JSON produced by RecordingAPIClient and exposes the
    iter_paginated/get_paginated interface.
    """

    def __init__(self, base_dir: str | Path):
        self.base_dir = Path(base_dir)
        if not self.base_dir.exists():
            raise FileNotFoundError(f"JSON directory does not exist: {self.base_dir}")

    def iter_paginated(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> Iterable[Tuple[int, list, dict, dict]]:
        file_path = self.base_dir / endpoint_to_filename(endpoint)
        if not file_path.exists():
            raise FileNotFoundError(f"No matching JSON file found: {file_path}")

        with file_path.open("r", encoding="utf-8") as fp:
            payload = json.load(fp)

        pages = payload.get("pages")
        if not isinstance(pages, list) or not pages:
            pages = [{"page": 1, "request": params or {}, "response": payload}]

        for idx, page in enumerate(pages, start=1):
            response = page.get("response", {})
            request_params = page.get("request") or {}
            page_no = page.get("page") or idx
            records = APIClient._extract_list(response, data_path, list_key)  # type: ignore[attr-defined]
            yield page_no, records, request_params, response

    def get_paginated(
        self,
        endpoint: str,
        params: dict,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> tuple[list, list]:
        records: list = []
        pages_meta: list = []
        for page_no, page_records, request_params, response in self.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
        ):
            records.extend(page_records)
            pages_meta.append({"page": page_no, "request": request_params, "response": response})
        return records, pages_meta
etl_billiards/api/recording_client.py  (Normal file · 118 lines)
@@ -0,0 +1,118 @@
# -*- coding: utf-8 -*-
"""Wraps APIClient, persisting paginated responses to disk for later local cleansing."""
from __future__ import annotations

from datetime import datetime
from pathlib import Path
from typing import Any, Iterable, Tuple

from api.client import APIClient
from utils.json_store import dump_json, endpoint_to_filename


class RecordingAPIClient:
    """
    Proxies APIClient: calls to iter_paginated/get_paginated also write the
    responses to a JSON file. The file name is derived from the endpoint and
    written under output_dir.
    """

    def __init__(
        self,
        base_client: APIClient,
        output_dir: Path | str,
        task_code: str,
        run_id: int,
        write_pretty: bool = False,
    ):
        self.base = base_client
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.task_code = task_code
        self.run_id = run_id
        self.write_pretty = write_pretty
        self.last_dump: dict[str, Any] | None = None

    # ------------------------------------------------------------------ public API
    def iter_paginated(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> Iterable[Tuple[int, list, dict, dict]]:
        pages: list[dict[str, Any]] = []
        total_records = 0

        for page_no, records, request_params, response in self.base.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
        ):
            pages.append({"page": page_no, "request": request_params, "response": response})
            total_records += len(records)
            yield page_no, records, request_params, response

        self._dump(endpoint, params, page_size, pages, total_records)

    def get_paginated(
        self,
        endpoint: str,
        params: dict,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> tuple[list, list]:
        records: list = []
        pages_meta: list = []

        for page_no, page_records, request_params, response in self.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
        ):
            records.extend(page_records)
            pages_meta.append({"page": page_no, "request": request_params, "response": response})

        return records, pages_meta

    # ------------------------------------------------------------------ internal
    def _dump(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int,
        pages: list[dict[str, Any]],
        total_records: int,
    ):
        filename = endpoint_to_filename(endpoint)
        path = self.output_dir / filename
        payload = {
            "task_code": self.task_code,
            "run_id": self.run_id,
            "endpoint": endpoint,
            "params": params or {},
            "page_size": page_size,
            "pages": pages,
            "total_records": total_records,
            "dumped_at": datetime.utcnow().isoformat() + "Z",
        }
        dump_json(path, payload, pretty=self.write_pretty)
        self.last_dump = {
            "file": str(path),
            "endpoint": endpoint,
            "pages": len(pages),
            "records": total_records,
        }
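A record-then-replay sketch combining the two clients above; the directory, token, and `task_code`/`run_id` values are placeholders, and the empty `params` dict stands in for whatever request body the endpoint expects:

```python
# Record online once, then replay the same pages offline; values are placeholders.
from api.client import APIClient
from api.recording_client import RecordingAPIClient
from api.local_json_client import LocalJsonClient

live = APIClient(base_url="https://pc.ficoo.vip/apiprod/admin/v1", token="your_token")
recorder = RecordingAPIClient(live, output_dir="C:/dev/LLTQ/export/JSON/run_42",
                              task_code="ODS_PAYMENT", run_id=42)

# online: fetch and persist every page under output_dir
records, _ = recorder.get_paginated("/PayLog/GetPayLogListPage", params={})

# offline: replay the very same pages without touching the network
replayed, _ = LocalJsonClient("C:/dev/LLTQ/export/JSON/run_42").get_paginated(
    "/PayLog/GetPayLogListPage", params={}
)
assert len(replayed) == len(records)
```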
etl_billiards/cli/__init__.py  (Normal file · 0 lines)

etl_billiards/cli/main.py  (Normal file · 158 lines)
@@ -0,0 +1,158 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""CLI主入口"""
|
||||
import sys
|
||||
import argparse
|
||||
import logging
|
||||
from pathlib import Path
|
||||
|
||||
from config.settings import AppConfig
|
||||
from orchestration.scheduler import ETLScheduler
|
||||
|
||||
def setup_logging():
|
||||
"""设置日志"""
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format='%(asctime)s [%(levelname)s] %(name)s: %(message)s',
|
||||
datefmt='%Y-%m-%d %H:%M:%S'
|
||||
)
|
||||
return logging.getLogger("etl_billiards")
|
||||
|
||||
def parse_args():
|
||||
"""解析命令行参数"""
|
||||
parser = argparse.ArgumentParser(description="台球场ETL系统")
|
||||
|
||||
# 基本参数
|
||||
parser.add_argument("--store-id", type=int, help="门店ID")
|
||||
parser.add_argument("--tasks", help="任务列表,逗号分隔")
|
||||
parser.add_argument("--dry-run", action="store_true", help="试运行(不提交)")
|
||||
|
||||
# 数据库参数
|
||||
parser.add_argument("--pg-dsn", help="PostgreSQL DSN")
|
||||
parser.add_argument("--pg-host", help="PostgreSQL主机")
|
||||
parser.add_argument("--pg-port", type=int, help="PostgreSQL端口")
|
||||
parser.add_argument("--pg-name", help="PostgreSQL数据库名")
|
||||
parser.add_argument("--pg-user", help="PostgreSQL用户名")
|
||||
parser.add_argument("--pg-password", help="PostgreSQL密码")
|
||||
|
||||
# API参数
|
||||
parser.add_argument("--api-base", help="API基础URL")
|
||||
parser.add_argument("--api-token", "--token", dest="api_token", help="API令牌(Bearer Token)")
|
||||
parser.add_argument("--api-timeout", type=int, help="API超时(秒)")
|
||||
parser.add_argument("--api-page-size", type=int, help="分页大小")
|
||||
parser.add_argument("--api-retry-max", type=int, help="API重试最大次数")
|
||||
|
||||
# 目录参数
|
||||
parser.add_argument("--export-root", help="导出根目录")
|
||||
parser.add_argument("--log-root", help="日志根目录")
|
||||
|
||||
# 抓取/清洗管线
|
||||
parser.add_argument("--pipeline-flow", choices=["FULL", "FETCH_ONLY", "INGEST_ONLY"], help="流水线模式")
|
||||
parser.add_argument("--fetch-root", help="抓取JSON输出根目录")
|
||||
parser.add_argument("--ingest-source", help="本地清洗入库源目录")
|
||||
parser.add_argument("--write-pretty-json", action="store_true", help="抓取JSON美化输出")
|
||||
|
||||
# 运行窗口
|
||||
parser.add_argument("--idle-start", help="闲时窗口开始(HH:MM)")
|
||||
parser.add_argument("--idle-end", help="闲时窗口结束(HH:MM)")
|
||||
parser.add_argument("--allow-empty-advance", action="store_true", help="允许空结果推进窗口")
|
||||
|
||||
return parser.parse_args()
|
||||
|
||||
def build_cli_overrides(args) -> dict:
|
||||
"""从命令行参数构建配置覆盖"""
|
||||
overrides = {}
|
||||
|
||||
# 基本信息
|
||||
if args.store_id is not None:
|
||||
overrides.setdefault("app", {})["store_id"] = args.store_id
|
||||
|
||||
# 数据库
|
||||
if args.pg_dsn:
|
||||
overrides.setdefault("db", {})["dsn"] = args.pg_dsn
|
||||
if args.pg_host:
|
||||
overrides.setdefault("db", {})["host"] = args.pg_host
|
||||
if args.pg_port:
|
||||
overrides.setdefault("db", {})["port"] = args.pg_port
|
||||
if args.pg_name:
|
||||
overrides.setdefault("db", {})["name"] = args.pg_name
|
||||
if args.pg_user:
|
||||
overrides.setdefault("db", {})["user"] = args.pg_user
|
||||
if args.pg_password:
|
||||
overrides.setdefault("db", {})["password"] = args.pg_password
|
||||
|
||||
# API
|
||||
if args.api_base:
|
||||
overrides.setdefault("api", {})["base_url"] = args.api_base
|
||||
if args.api_token:
|
||||
overrides.setdefault("api", {})["token"] = args.api_token
|
||||
if args.api_timeout:
|
||||
overrides.setdefault("api", {})["timeout_sec"] = args.api_timeout
|
||||
if args.api_page_size:
|
||||
overrides.setdefault("api", {})["page_size"] = args.api_page_size
|
||||
if args.api_retry_max:
|
||||
overrides.setdefault("api", {}).setdefault("retries", {})["max_attempts"] = args.api_retry_max
|
||||
|
||||
# 目录
|
||||
if args.export_root:
|
||||
overrides.setdefault("io", {})["export_root"] = args.export_root
|
||||
if args.log_root:
|
||||
overrides.setdefault("io", {})["log_root"] = args.log_root
|
||||
|
||||
# 抓取/清洗管线
|
||||
if args.pipeline_flow:
|
||||
overrides.setdefault("pipeline", {})["flow"] = args.pipeline_flow.upper()
|
||||
if args.fetch_root:
|
||||
overrides.setdefault("pipeline", {})["fetch_root"] = args.fetch_root
|
||||
if args.ingest_source:
|
||||
overrides.setdefault("pipeline", {})["ingest_source_dir"] = args.ingest_source
|
||||
if args.write_pretty_json:
|
||||
overrides.setdefault("io", {})["write_pretty_json"] = True
|
||||
|
||||
# 运行窗口
|
||||
if args.idle_start:
|
||||
overrides.setdefault("run", {}).setdefault("idle_window", {})["start"] = args.idle_start
|
||||
if args.idle_end:
|
||||
overrides.setdefault("run", {}).setdefault("idle_window", {})["end"] = args.idle_end
|
||||
if args.allow_empty_advance:
|
||||
overrides.setdefault("run", {})["allow_empty_result_advance"] = True
|
||||
|
||||
# 任务
|
||||
if args.tasks:
|
||||
tasks = [t.strip().upper() for t in args.tasks.split(",") if t.strip()]
|
||||
overrides.setdefault("run", {})["tasks"] = tasks
|
||||
|
||||
return overrides
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
logger = setup_logging()
|
||||
args = parse_args()
|
||||
|
||||
try:
|
||||
# 加载配置
|
||||
cli_overrides = build_cli_overrides(args)
|
||||
config = AppConfig.load(cli_overrides)
|
||||
|
||||
logger.info("配置加载完成")
|
||||
logger.info(f"门店ID: {config.get('app.store_id')}")
|
||||
logger.info(f"任务列表: {config.get('run.tasks')}")
|
||||
|
||||
# 创建调度器
|
||||
scheduler = ETLScheduler(config, logger)
|
||||
|
||||
# 运行任务
|
||||
task_codes = config.get("run.tasks")
|
||||
scheduler.run_tasks(task_codes)
|
||||
|
||||
# 关闭连接
|
||||
scheduler.close()
|
||||
|
||||
logger.info("ETL运行完成")
|
||||
return 0
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"ETL运行失败: {e}", exc_info=True)
|
||||
return 1
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
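A minimal sketch (not from the repo) of how a command line maps onto the nested override dict that `build_cli_overrides` produces; the `SimpleNamespace` stands in for the argparse result, and every attribute mirrors a flag defined above:

```python
from types import SimpleNamespace

# Hypothetical invocation: --pg-host localhost --api-page-size 500
#                          --pipeline-flow fetch_only --tasks "orders, payments"
args = SimpleNamespace(
    store_id=2790685415443269, tasks="orders, payments",
    pg_dsn=None, pg_host="localhost", pg_port=5432, pg_name="LLZQ",
    pg_user="etl", pg_password=None,
    api_base=None, api_token=None, api_timeout=None, api_page_size=500,
    api_retry_max=None, export_root=None, log_root=None,
    pipeline_flow="fetch_only", fetch_root=None, ingest_source=None,
    write_pretty_json=False, idle_start=None, idle_end=None,
    allow_empty_advance=False,
)
overrides = build_cli_overrides(args)
assert overrides["db"]["host"] == "localhost"
assert overrides["pipeline"]["flow"] == "FETCH_ONLY"        # normalized to upper case
assert overrides["run"]["tasks"] == ["ORDERS", "PAYMENTS"]  # split and upper-cased
```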
0
etl_billiards/config/__init__.py
Normal file
120
etl_billiards/config/defaults.py
Normal file
@@ -0,0 +1,120 @@
# -*- coding: utf-8 -*-
"""Configuration default values."""

DEFAULTS = {
    "app": {
        "timezone": "Asia/Taipei",
        "store_id": "",
        "schema_oltp": "billiards",
        "schema_etl": "etl_admin",
    },
    "db": {
        "dsn": "",
        "host": "",
        "port": "",
        "name": "",
        "user": "",
        "password": "",
        "connect_timeout_sec": 20,
        "batch_size": 1000,
        "session": {
            "timezone": "Asia/Taipei",
            "statement_timeout_ms": 30000,
            "lock_timeout_ms": 5000,
            "idle_in_tx_timeout_ms": 600000,
        },
    },
    "api": {
        "base_url": "https://pc.ficoo.vip/apiprod/admin/v1",
        "token": None,
        "timeout_sec": 20,
        "page_size": 200,
        "params": {},
        "retries": {
            "max_attempts": 3,
            "backoff_sec": [1, 2, 4],
        },
        "headers_extra": {},
    },
    "run": {
        "tasks": [
            "PRODUCTS",
            "TABLES",
            "MEMBERS",
            "ASSISTANTS",
            "PACKAGES_DEF",
            "ORDERS",
            "PAYMENTS",
            "REFUNDS",
            "COUPON_USAGE",
            "INVENTORY_CHANGE",
            "TOPUPS",
            "TABLE_DISCOUNT",
            "ASSISTANT_ABOLISH",
            "LEDGER",
        ],
        "window_minutes": {
            "default_busy": 30,
            "default_idle": 180,
        },
        "overlap_seconds": 120,
        "idle_window": {
            "start": "04:00",
            "end": "16:00",
        },
        "allow_empty_result_advance": True,
    },
    "io": {
        "export_root": r"C:\dev\LLTQ\export\JSON",
        "log_root": r"C:\dev\LLTQ\export\LOG",
        "manifest_name": "manifest.json",
        "ingest_report_name": "ingest_report.json",
        "write_pretty_json": True,
        "max_file_bytes": 50 * 1024 * 1024,
    },
    "pipeline": {
        # Run flow: FETCH_ONLY (online fetch to disk only), INGEST_ONLY (clean
        # and load from local files), FULL (fetch + clean and load).
        "flow": "FULL",
        # Root directory for fetched JSON output (subdirectories are created
        # automatically per task, run_id, and timestamp).
        "fetch_root": r"C:\dev\LLTQ\export\JSON",
        # JSON input directory for local clean-and-load (empty = use this
        # run's fetch directory).
        "ingest_source_dir": "",
    },
    "clean": {
        "log_unknown_fields": True,
        "unknown_fields_limit": 50,
        "hash_key": {
            "algo": "sha1",
            "salt": "",
        },
        "strict_numeric": True,
        "round_money_scale": 2,
    },
    "security": {
        "redact_in_logs": True,
        "redact_keys": ["token", "password", "Authorization"],
        "echo_token_in_logs": False,
    },
    "ods": {
        # ODS offline rebuild/replay settings (development/operations only).
        "json_doc_dir": r"C:\dev\LLTQ\export\test-json-doc",
        "include_files": "",
        "drop_schema_first": True,
    },
}

# Task code constants
TASK_ORDERS = "ORDERS"
TASK_PAYMENTS = "PAYMENTS"
TASK_REFUNDS = "REFUNDS"
TASK_INVENTORY_CHANGE = "INVENTORY_CHANGE"
TASK_COUPON_USAGE = "COUPON_USAGE"
TASK_MEMBERS = "MEMBERS"
TASK_ASSISTANTS = "ASSISTANTS"
TASK_PRODUCTS = "PRODUCTS"
TASK_TABLES = "TABLES"
TASK_PACKAGES_DEF = "PACKAGES_DEF"
TASK_TOPUPS = "TOPUPS"
TASK_TABLE_DISCOUNT = "TABLE_DISCOUNT"
TASK_ASSISTANT_ABOLISH = "ASSISTANT_ABOLISH"
TASK_LEDGER = "LEDGER"
175
etl_billiards/config/env_parser.py
Normal file
@@ -0,0 +1,175 @@
# -*- coding: utf-8 -*-
"""Environment variable parsing."""
import os
import json
from pathlib import Path
from copy import deepcopy

ENV_MAP = {
    "TIMEZONE": ("app.timezone",),
    "STORE_ID": ("app.store_id",),
    "SCHEMA_OLTP": ("app.schema_oltp",),
    "SCHEMA_ETL": ("app.schema_etl",),
    "PG_DSN": ("db.dsn",),
    "PG_HOST": ("db.host",),
    "PG_PORT": ("db.port",),
    "PG_NAME": ("db.name",),
    "PG_USER": ("db.user",),
    "PG_PASSWORD": ("db.password",),
    "PG_CONNECT_TIMEOUT": ("db.connect_timeout_sec",),
    "API_BASE": ("api.base_url",),
    "API_TOKEN": ("api.token",),
    "FICOO_TOKEN": ("api.token",),
    "API_TIMEOUT": ("api.timeout_sec",),
    "API_PAGE_SIZE": ("api.page_size",),
    "API_RETRY_MAX": ("api.retries.max_attempts",),
    "API_RETRY_BACKOFF": ("api.retries.backoff_sec",),
    "API_PARAMS": ("api.params",),
    "EXPORT_ROOT": ("io.export_root",),
    "LOG_ROOT": ("io.log_root",),
    "MANIFEST_NAME": ("io.manifest_name",),
    "INGEST_REPORT_NAME": ("io.ingest_report_name",),
    "WRITE_PRETTY_JSON": ("io.write_pretty_json",),
    "RUN_TASKS": ("run.tasks",),
    "OVERLAP_SECONDS": ("run.overlap_seconds",),
    "WINDOW_BUSY_MIN": ("run.window_minutes.default_busy",),
    "WINDOW_IDLE_MIN": ("run.window_minutes.default_idle",),
    "IDLE_START": ("run.idle_window.start",),
    "IDLE_END": ("run.idle_window.end",),
    "IDLE_WINDOW_START": ("run.idle_window.start",),
    "IDLE_WINDOW_END": ("run.idle_window.end",),
    "ALLOW_EMPTY_RESULT_ADVANCE": ("run.allow_empty_result_advance",),
    "ALLOW_EMPTY_ADVANCE": ("run.allow_empty_result_advance",),
    "PIPELINE_FLOW": ("pipeline.flow",),
    "JSON_FETCH_ROOT": ("pipeline.fetch_root",),
    "JSON_SOURCE_DIR": ("pipeline.ingest_source_dir",),
    "FETCH_ROOT": ("pipeline.fetch_root",),
    "INGEST_SOURCE_DIR": ("pipeline.ingest_source_dir",),
}


def _deep_set(d, dotted_keys, value):
    cur = d
    for k in dotted_keys[:-1]:
        cur = cur.setdefault(k, {})
    cur[dotted_keys[-1]] = value


def _coerce_env(v: str):
    if v is None:
        return None
    s = v.strip()
    if s.lower() in ("true", "false"):
        return s.lower() == "true"
    try:
        if s.isdigit() or (s.startswith("-") and s[1:].isdigit()):
            return int(s)
    except Exception:
        pass
    if (s.startswith("{") and s.endswith("}")) or (s.startswith("[") and s.endswith("]")):
        try:
            return json.loads(s)
        except Exception:
            return s
    return s


def _strip_inline_comment(value: str) -> str:
    """Strip an inline comment that is not enclosed in quotes."""
    result = []
    in_quote = False
    quote_char = ""
    escape = False
    for ch in value:
        if escape:
            result.append(ch)
            escape = False
            continue
        if ch == "\\":
            escape = True
            result.append(ch)
            continue
        if ch in ("'", '"'):
            if not in_quote:
                in_quote = True
                quote_char = ch
            elif quote_char == ch:
                in_quote = False
                quote_char = ""
            result.append(ch)
            continue
        if ch == "#" and not in_quote:
            break
        result.append(ch)
    return "".join(result).rstrip()


def _unquote_value(value: str) -> str:
    """Handle quoted/raw strings and trailing commas."""
    trimmed = value.strip()
    trimmed = _strip_inline_comment(trimmed)
    trimmed = trimmed.rstrip(",").rstrip()
    if not trimmed:
        return trimmed
    if len(trimmed) >= 2 and trimmed[0] in ("'", '"') and trimmed[-1] == trimmed[0]:
        return trimmed[1:-1]
    if (
        len(trimmed) >= 3
        and trimmed[0] in ("r", "R")
        and trimmed[1] in ("'", '"')
        and trimmed[-1] == trimmed[1]
    ):
        return trimmed[2:-1]
    return trimmed


def _parse_dotenv_line(line: str) -> tuple[str, str] | None:
    """Parse a single line of a .env file."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    if stripped.startswith("export "):
        stripped = stripped[len("export ") :].strip()
    if "=" not in stripped:
        return None
    key, value = stripped.split("=", 1)
    key = key.strip()
    value = _unquote_value(value)
    return key, value


def _load_dotenv_values() -> dict:
    """Read key/value pairs from the .env file in the project root."""
    if os.environ.get("ETL_SKIP_DOTENV") in ("1", "true", "TRUE", "True"):
        return {}
    root = Path(__file__).resolve().parents[1]
    dotenv_path = root / ".env"
    if not dotenv_path.exists():
        return {}
    values: dict[str, str] = {}
    for line in dotenv_path.read_text(encoding="utf-8", errors="ignore").splitlines():
        parsed = _parse_dotenv_line(line)
        if parsed:
            key, value = parsed
            values[key] = value
    return values


def _apply_env_values(cfg: dict, source: dict):
    for env_key, dotted in ENV_MAP.items():
        val = source.get(env_key)
        if val is None:
            continue
        v2 = _coerce_env(val)
        for path in dotted:
            if path == "run.tasks" and isinstance(v2, str):
                v2 = [item.strip() for item in v2.split(",") if item.strip()]
            _deep_set(cfg, path.split("."), v2)


def load_env_overrides(defaults: dict) -> dict:
    cfg = deepcopy(defaults)
    # Read .env first, then the real environment variables, so the CLI still
    # has the highest priority overall.
    _apply_env_values(cfg, _load_dotenv_values())
    _apply_env_values(cfg, os.environ)
    return cfg
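A quick sketch of the coercion rules above (values are illustrative): booleans and integers are converted, JSON-looking strings are parsed, and everything else stays a string:

```python
assert _coerce_env("true") is True
assert _coerce_env("-42") == -42
assert _coerce_env('["ORDERS", "PAYMENTS"]') == ["ORDERS", "PAYMENTS"]
assert _coerce_env("04:00") == "04:00"   # not a bare integer, kept as a string
```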
92
etl_billiards/config/settings.py
Normal file
@@ -0,0 +1,92 @@
# -*- coding: utf-8 -*-
"""Main configuration manager."""
from copy import deepcopy
from .defaults import DEFAULTS
from .env_parser import load_env_overrides


class AppConfig:
    """Application configuration manager."""

    def __init__(self, config_dict: dict):
        self.config = config_dict

    @classmethod
    def load(cls, cli_overrides: dict = None):
        """Load configuration with precedence DEFAULTS < ENV < CLI."""
        cfg = load_env_overrides(DEFAULTS)

        if cli_overrides:
            cls._deep_merge(cfg, cli_overrides)

        # Normalize and validate
        cls._normalize(cfg)
        cls._validate(cfg)

        return cls(cfg)

    @staticmethod
    def _deep_merge(dst, src):
        """Deep-merge src into dst."""
        for k, v in src.items():
            if isinstance(v, dict) and isinstance(dst.get(k), dict):
                AppConfig._deep_merge(dst[k], v)
            else:
                dst[k] = v

    @staticmethod
    def _normalize(cfg):
        """Normalize configuration values."""
        # Coerce store_id to an integer
        try:
            cfg["app"]["store_id"] = int(str(cfg["app"]["store_id"]).strip())
        except Exception:
            raise SystemExit("app.store_id must be an integer")

        # Assemble the DSN if it was not given directly
        if not cfg["db"]["dsn"]:
            cfg["db"]["dsn"] = (
                f"postgresql://{cfg['db']['user']}:{cfg['db']['password']}"
                f"@{cfg['db']['host']}:{cfg['db']['port']}/{cfg['db']['name']}"
            )

        # Clamp connect_timeout to 1-20 seconds
        try:
            timeout_sec = int(cfg["db"].get("connect_timeout_sec") or 5)
        except Exception:
            raise SystemExit("db.connect_timeout_sec must be an integer")
        cfg["db"]["connect_timeout_sec"] = max(1, min(timeout_sec, 20))

        # Session parameters
        cfg["db"].setdefault("session", {})
        sess = cfg["db"]["session"]
        sess.setdefault("timezone", cfg["app"]["timezone"])

        for k in ("statement_timeout_ms", "lock_timeout_ms", "idle_in_tx_timeout_ms"):
            if k in sess and sess[k] is not None:
                try:
                    sess[k] = int(sess[k])
                except Exception:
                    raise SystemExit(f"db.session.{k} must be an integer (milliseconds)")

    @staticmethod
    def _validate(cfg):
        """Validate required configuration."""
        missing = []
        if not cfg["app"]["store_id"]:
            missing.append("app.store_id")
        if missing:
            raise SystemExit("Missing required configuration: " + ", ".join(missing))

    def get(self, key: str, default=None):
        """Get a configuration value by dotted path."""
        keys = key.split(".")
        val = self.config
        for k in keys:
            if isinstance(val, dict):
                val = val.get(k)
            else:
                return default
        return val if val is not None else default

    def __getitem__(self, key):
        return self.config[key]
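A minimal sketch of the precedence chain, assuming a clean environment and skipping any `.env` file for determinism: defaults are the base, environment variables overwrite them, and CLI overrides win:

```python
import os

os.environ["ETL_SKIP_DOTENV"] = "1"   # ignore .env for a deterministic demo
os.environ["STORE_ID"] = "111"        # ENV layer

cfg = AppConfig.load({"app": {"store_id": 222}})  # CLI layer wins
assert cfg.get("app.store_id") == 222
assert cfg.get("api.page_size") == 200            # untouched default
assert cfg.get("no.such.key", "n/a") == "n/a"     # dotted-path fallback
```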
0
etl_billiards/database/__init__.py
Normal file
112
etl_billiards/database/base.py
Normal file
@@ -0,0 +1,112 @@
# -*- coding: utf-8 -*-
"""
Database operations (batching, RETURNING support).
"""
import re
from typing import List, Dict, Tuple
import psycopg2.extras
from .connection import DatabaseConnection


class DatabaseOperations(DatabaseConnection):
    """Extended database operations (batch upsert with RETURNING support)."""

    def batch_execute(self, sql: str, rows: List[Dict], page_size: int = 1000):
        """Execute SQL in batches (no RETURNING clause)."""
        if not rows:
            return
        with self.conn.cursor() as c:
            psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)

    def batch_upsert_with_returning(self, sql: str, rows: List[Dict], page_size: int = 1000) -> Tuple[int, int]:
        """
        Batch UPSERT and count inserts vs. updates.

        Args:
            sql: SQL containing a RETURNING clause
            rows: list of data rows
            page_size: batch size

        Returns:
            (inserted_count, updated_count) tuple
        """
        if not rows:
            return (0, 0)

        use_returning = "RETURNING" in sql.upper()

        with self.conn.cursor() as c:
            if not use_returning:
                psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)
                return (0, 0)

            # Prefer the vectorized path
            try:
                inserted, updated = self._execute_with_returning_vectorized(c, sql, rows, page_size)
                return (inserted, updated)
            except Exception:
                # Fall back to row-by-row execution
                return self._execute_with_returning_row_by_row(c, sql, rows)

    def _execute_with_returning_vectorized(self, cursor, sql: str, rows: List[Dict], page_size: int) -> Tuple[int, int]:
        """Vectorized execution (execute_values)."""
        # Parse the VALUES clause.
        # Note: the non-greedy match stops at the first ")", which for named
        # placeholders such as %(store_id)s sits inside the placeholder, so
        # this path typically raises and the caller falls back to the
        # row-by-row branch.
        m = re.search(r"VALUES\s*\((.*?)\)", sql, flags=re.IGNORECASE | re.DOTALL)
        if not m:
            raise ValueError("Cannot parse VALUES clause")

        tpl = "(" + m.group(1) + ")"
        base_sql = sql[:m.start()] + "VALUES %s" + sql[m.end():]

        ret = psycopg2.extras.execute_values(
            cursor, base_sql, rows, template=tpl, page_size=page_size, fetch=True
        )

        if not ret:
            return (0, 0)

        inserted = 0
        for rec in ret:
            flag = self._extract_inserted_flag(rec)
            if flag:
                inserted += 1

        return (inserted, len(ret) - inserted)

    def _execute_with_returning_row_by_row(self, cursor, sql: str, rows: List[Dict]) -> Tuple[int, int]:
        """Row-by-row execution (fallback path)."""
        inserted = 0
        updated = 0

        for r in rows:
            cursor.execute(sql, r)
            try:
                rec = cursor.fetchone()
            except Exception:
                rec = None

            flag = self._extract_inserted_flag(rec) if rec else None

            if flag:
                inserted += 1
            else:
                updated += 1

        return (inserted, updated)

    @staticmethod
    def _extract_inserted_flag(rec) -> bool:
        """Extract the `inserted` flag from a returned record."""
        if isinstance(rec, tuple):
            return bool(rec[0])
        elif isinstance(rec, dict):
            return bool(rec.get("inserted"))
        else:
            try:
                return bool(rec["inserted"])
            except Exception:
                return False


# Backward-compatible alias
Pg = DatabaseOperations
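The insert/update split relies on a PostgreSQL detail used throughout the loaders below: for a row produced by `INSERT ... ON CONFLICT DO UPDATE ... RETURNING`, the system column `xmax` is 0 when the row was freshly inserted and non-zero when an existing row was updated. A minimal sketch of a direct call; the `demo.kv` table and DSN are illustrative, not part of the repo schema:

```python
sql = """
    INSERT INTO demo.kv (k, v) VALUES (%(k)s, %(v)s)
    ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v
    RETURNING (xmax = 0) AS inserted
"""
db = DatabaseOperations("postgresql://user:pwd@localhost:5432/LLZQ")
ins, upd = db.batch_upsert_with_returning(sql, [{"k": "a", "v": 1}, {"k": "b", "v": 2}])
db.commit()
print(f"inserted={ins} updated={upd}")  # re-running the same batch reports updates
```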
63
etl_billiards/database/connection.py
Normal file
@@ -0,0 +1,63 @@
# -*- coding: utf-8 -*-
"""Database connection manager with capped connect_timeout."""

import psycopg2
import psycopg2.extras


class DatabaseConnection:
    """Wrap psycopg2 connection with session parameters and timeout guard."""

    def __init__(self, dsn: str, session: dict = None, connect_timeout: int = None):
        timeout_val = connect_timeout if connect_timeout is not None else 5
        # PRD: database connect_timeout must not exceed 20 seconds.
        timeout_val = max(1, min(int(timeout_val), 20))

        self.conn = psycopg2.connect(dsn, connect_timeout=timeout_val)
        self.conn.autocommit = False

        # Session parameters (timezone, statement timeout, etc.)
        if session:
            with self.conn.cursor() as c:
                if session.get("timezone"):
                    c.execute("SET TIME ZONE %s", (session["timezone"],))
                if session.get("statement_timeout_ms") is not None:
                    c.execute(
                        "SET statement_timeout = %s",
                        (int(session["statement_timeout_ms"]),),
                    )
                if session.get("lock_timeout_ms") is not None:
                    c.execute(
                        "SET lock_timeout = %s", (int(session["lock_timeout_ms"]),)
                    )
                if session.get("idle_in_tx_timeout_ms") is not None:
                    c.execute(
                        "SET idle_in_transaction_session_timeout = %s",
                        (int(session["idle_in_tx_timeout_ms"]),),
                    )

    def query(self, sql: str, args=None):
        """Execute a query and fetch all rows."""
        with self.conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as c:
            c.execute(sql, args)
            return c.fetchall()

    def execute(self, sql: str, args=None):
        """Execute a SQL statement without returning rows."""
        with self.conn.cursor() as c:
            c.execute(sql, args)

    def commit(self):
        """Commit current transaction."""
        self.conn.commit()

    def rollback(self):
        """Rollback current transaction."""
        self.conn.rollback()

    def close(self):
        """Safely close the connection."""
        try:
            self.conn.close()
        except Exception:
            pass
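A short usage sketch (DSN illustrative): the session dict is applied once at connect time, and an oversized timeout is clamped by the guard above:

```python
conn = DatabaseConnection(
    "postgresql://user:pwd@localhost:5432/LLZQ",
    session={"timezone": "Asia/Taipei", "statement_timeout_ms": 30000},
    connect_timeout=60,   # clamped to the 20-second PRD cap
)
rows = conn.query("SELECT now() AS ts")   # RealDictCursor: rows[0]["ts"]
conn.commit()
conn.close()
```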
99
etl_billiards/database/operations.py
Normal file
@@ -0,0 +1,99 @@
# -*- coding: utf-8 -*-
"""Batch database operations."""
import psycopg2.extras
import re


class DatabaseOperations:
    """Wrapper for batch database operations."""

    def __init__(self, connection):
        self._connection = connection
        self.conn = connection.conn

    def batch_execute(self, sql: str, rows: list, page_size: int = 1000):
        """Execute SQL in batches."""
        if not rows:
            return
        with self.conn.cursor() as c:
            psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)

    def batch_upsert_with_returning(self, sql: str, rows: list,
                                    page_size: int = 1000) -> tuple:
        """Batch UPSERT and return insert/update counts."""
        if not rows:
            return (0, 0)

        use_returning = "RETURNING" in sql.upper()

        with self.conn.cursor() as c:
            if not use_returning:
                psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)
                return (0, 0)

            # Try vectorized execution first
            try:
                m = re.search(r"VALUES\s*\((.*?)\)", sql, flags=re.IGNORECASE | re.DOTALL)
                if m:
                    tpl = "(" + m.group(1) + ")"
                    base_sql = sql[:m.start()] + "VALUES %s" + sql[m.end():]

                    ret = psycopg2.extras.execute_values(
                        c, base_sql, rows, template=tpl, page_size=page_size, fetch=True
                    )

                    if not ret:
                        return (0, 0)

                    inserted = sum(1 for rec in ret if self._is_inserted(rec))
                    return (inserted, len(ret) - inserted)
            except Exception:
                pass

            # Fallback: execute row by row
            inserted = 0
            updated = 0
            for r in rows:
                c.execute(sql, r)
                try:
                    rec = c.fetchone()
                except Exception:
                    rec = None

                if self._is_inserted(rec):
                    inserted += 1
                else:
                    updated += 1

            return (inserted, updated)

    @staticmethod
    def _is_inserted(rec) -> bool:
        """Determine whether a returned record represents an insert."""
        if rec is None:
            return False
        if isinstance(rec, tuple):
            return bool(rec[0])
        if isinstance(rec, dict):
            return bool(rec.get("inserted"))
        return False

    # --- pass-through helpers -------------------------------------------------
    def commit(self):
        """Commit the transaction (delegated to the underlying connection)."""
        self._connection.commit()

    def rollback(self):
        """Roll back the transaction (delegated to the underlying connection)."""
        self._connection.rollback()

    def query(self, sql: str, args=None):
        """Run a query and return the results."""
        return self._connection.query(sql, args)

    def execute(self, sql: str, args=None):
        """Execute arbitrary SQL."""
        self._connection.execute(sql, args)

    def cursor(self):
        """Expose a raw cursor for special operations."""
        return self.conn.cursor()
1726
etl_billiards/database/schema_dwd_doc.sql
Normal file
File diff suppressed because it is too large
1128
etl_billiards/database/schema_v2.sql
Normal file
File diff suppressed because it is too large
39
etl_billiards/database/seed_ods_tasks.sql
Normal file
@@ -0,0 +1,39 @@
-- Register the new ODS tasks in etl_admin.etl_task (replace store_id as needed).
-- Usage (example):
--   psql "$PG_DSN" -f etl_billiards/database/seed_ods_tasks.sql
-- or run the contents of this file inside psql.

WITH target_store AS (
    SELECT 2790685415443269::bigint AS store_id  -- TODO: replace with the actual store_id
),
task_codes AS (
    SELECT unnest(ARRAY[
        'ODS_ASSISTANT_ACCOUNTS',
        'ODS_ASSISTANT_LEDGER',
        'ODS_ASSISTANT_ABOLISH',
        'ODS_INVENTORY_CHANGE',
        'ODS_INVENTORY_STOCK',
        'ODS_PACKAGE',
        'ODS_GROUP_BUY_REDEMPTION',
        'ODS_MEMBER',
        'ODS_MEMBER_BALANCE',
        'ODS_MEMBER_CARD',
        'ODS_PAYMENT',
        'ODS_REFUND',
        'ODS_COUPON_VERIFY',
        'ODS_RECHARGE_SETTLE',
        'ODS_TABLES',
        'ODS_GOODS_CATEGORY',
        'ODS_STORE_GOODS',
        'ODS_TABLE_DISCOUNT',
        'ODS_TENANT_GOODS',
        'ODS_SETTLEMENT_TICKET',
        'ODS_ORDER_SETTLE'
    ]) AS task_code
)
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
SELECT t.task_code, s.store_id, TRUE
FROM task_codes t CROSS JOIN target_store s
ON CONFLICT (task_code, store_id) DO UPDATE
SET enabled = EXCLUDED.enabled;
16
etl_billiards/feiqiu-ETL.code-workspace
Normal file
@@ -0,0 +1,16 @@
{
    "folders": [
        {
            "path": ".."
        },
        {
            "name": "LLZQ-server",
            "path": "../../../LLZQ-server"
        },
        {
            "name": "feiqiu-ETL-reload",
            "path": "../../feiqiu-ETL-reload"
        }
    ],
    "settings": {}
}
0
etl_billiards/loaders/__init__.py
Normal file
23
etl_billiards/loaders/base_loader.py
Normal file
@@ -0,0 +1,23 @@
# -*- coding: utf-8 -*-
"""Base class for data loaders."""

import logging


class BaseLoader:
    """Base class for data loaders."""

    def __init__(self, db_ops, logger=None):
        self.db = db_ops
        self.logger = logger or logging.getLogger(self.__class__.__name__)

    def upsert(self, records: list) -> tuple:
        """
        Perform the UPSERT operation.
        Returns: (inserted_count, updated_count, skipped_count)
        """
        raise NotImplementedError("Subclasses must implement upsert")

    def _batch_size(self) -> int:
        """Batch size."""
        return 1000
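The concrete loaders that follow all fill in this contract the same way; a minimal hypothetical subclass (the `demo.counter` table is illustrative, not part of the repo schema) shows the shape:

```python
class DemoLoader(BaseLoader):
    def upsert(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)
        sql = """
            INSERT INTO demo.counter (id, n) VALUES (%(id)s, %(n)s)
            ON CONFLICT (id) DO UPDATE SET n = EXCLUDED.n
            RETURNING (xmax = 0) AS inserted
        """
        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
```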
0
etl_billiards/loaders/dimensions/__init__.py
Normal file
114
etl_billiards/loaders/dimensions/assistant.py
Normal file
@@ -0,0 +1,114 @@
# -*- coding: utf-8 -*-
"""Assistant dimension loader."""

from ..base_loader import BaseLoader


class AssistantLoader(BaseLoader):
    """Writes billiards.dim_assistant."""

    def upsert_assistants(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.dim_assistant (
                store_id, assistant_id, assistant_no, nickname, real_name,
                gender, mobile, level, team_id, team_name,
                assistant_status, work_status, entry_time, resign_time,
                start_time, end_time, create_time, update_time,
                system_role_id, online_status, allow_cx, charge_way,
                pd_unit_price, cx_unit_price, is_guaranteed, is_team_leader,
                serial_number, show_sort, is_delete, raw_data
            )
            VALUES (
                %(store_id)s, %(assistant_id)s, %(assistant_no)s, %(nickname)s, %(real_name)s,
                %(gender)s, %(mobile)s, %(level)s, %(team_id)s, %(team_name)s,
                %(assistant_status)s, %(work_status)s, %(entry_time)s, %(resign_time)s,
                %(start_time)s, %(end_time)s, %(create_time)s, %(update_time)s,
                %(system_role_id)s, %(online_status)s, %(allow_cx)s, %(charge_way)s,
                %(pd_unit_price)s, %(cx_unit_price)s, %(is_guaranteed)s, %(is_team_leader)s,
                %(serial_number)s, %(show_sort)s, %(is_delete)s, %(raw_data)s
            )
            ON CONFLICT (store_id, assistant_id) DO UPDATE SET
                assistant_no    = EXCLUDED.assistant_no,
                nickname        = EXCLUDED.nickname,
                real_name       = EXCLUDED.real_name,
                gender          = EXCLUDED.gender,
                mobile          = EXCLUDED.mobile,
                level           = EXCLUDED.level,
                team_id         = EXCLUDED.team_id,
                team_name       = EXCLUDED.team_name,
                assistant_status= EXCLUDED.assistant_status,
                work_status     = EXCLUDED.work_status,
                entry_time      = EXCLUDED.entry_time,
                resign_time     = EXCLUDED.resign_time,
                start_time      = EXCLUDED.start_time,
                end_time        = EXCLUDED.end_time,
                update_time     = COALESCE(EXCLUDED.update_time, now()),
                system_role_id  = EXCLUDED.system_role_id,
                online_status   = EXCLUDED.online_status,
                allow_cx        = EXCLUDED.allow_cx,
                charge_way      = EXCLUDED.charge_way,
                pd_unit_price   = EXCLUDED.pd_unit_price,
                cx_unit_price   = EXCLUDED.cx_unit_price,
                is_guaranteed   = EXCLUDED.is_guaranteed,
                is_team_leader  = EXCLUDED.is_team_leader,
                serial_number   = EXCLUDED.serial_number,
                show_sort       = EXCLUDED.show_sort,
                is_delete       = EXCLUDED.is_delete,
                raw_data        = EXCLUDED.raw_data,
                updated_at      = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
34
etl_billiards/loaders/dimensions/member.py
Normal file
@@ -0,0 +1,34 @@
# -*- coding: utf-8 -*-
"""Member dimension loader."""
from ..base_loader import BaseLoader


class MemberLoader(BaseLoader):
    """Member dimension loader."""

    def upsert_members(self, records: list, store_id: int) -> tuple:
        """Load member data."""
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.dim_member (
                store_id, member_id, member_name, phone, balance,
                status, register_time, raw_data
            )
            VALUES (
                %(store_id)s, %(member_id)s, %(member_name)s, %(phone)s, %(balance)s,
                %(status)s, %(register_time)s, %(raw_data)s
            )
            ON CONFLICT (store_id, member_id) DO UPDATE SET
                member_name   = EXCLUDED.member_name,
                phone         = EXCLUDED.phone,
                balance       = EXCLUDED.balance,
                status        = EXCLUDED.status,
                register_time = EXCLUDED.register_time,
                raw_data      = EXCLUDED.raw_data,
                updated_at    = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(sql, records, page_size=self._batch_size())
        return (inserted, updated, 0)
91
etl_billiards/loaders/dimensions/package.py
Normal file
@@ -0,0 +1,91 @@
# -*- coding: utf-8 -*-
"""Group-buy / package definition loader."""

from ..base_loader import BaseLoader


class PackageDefinitionLoader(BaseLoader):
    """Writes billiards.dim_package_coupon."""

    def upsert_packages(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.dim_package_coupon (
                store_id, package_id, package_code, package_name,
                table_area_id, table_area_name, selling_price, duration_seconds,
                start_time, end_time, type, is_enabled, is_delete,
                usable_count, creator_name, date_type, group_type,
                coupon_money, area_tag_type, system_group_type, card_type_ids,
                raw_data
            )
            VALUES (
                %(store_id)s, %(package_id)s, %(package_code)s, %(package_name)s,
                %(table_area_id)s, %(table_area_name)s, %(selling_price)s, %(duration_seconds)s,
                %(start_time)s, %(end_time)s, %(type)s, %(is_enabled)s, %(is_delete)s,
                %(usable_count)s, %(creator_name)s, %(date_type)s, %(group_type)s,
                %(coupon_money)s, %(area_tag_type)s, %(system_group_type)s, %(card_type_ids)s,
                %(raw_data)s
            )
            ON CONFLICT (store_id, package_id) DO UPDATE SET
                package_code      = EXCLUDED.package_code,
                package_name      = EXCLUDED.package_name,
                table_area_id     = EXCLUDED.table_area_id,
                table_area_name   = EXCLUDED.table_area_name,
                selling_price     = EXCLUDED.selling_price,
                duration_seconds  = EXCLUDED.duration_seconds,
                start_time        = EXCLUDED.start_time,
                end_time          = EXCLUDED.end_time,
                type              = EXCLUDED.type,
                is_enabled        = EXCLUDED.is_enabled,
                is_delete         = EXCLUDED.is_delete,
                usable_count      = EXCLUDED.usable_count,
                creator_name      = EXCLUDED.creator_name,
                date_type         = EXCLUDED.date_type,
                group_type        = EXCLUDED.group_type,
                coupon_money      = EXCLUDED.coupon_money,
                area_tag_type     = EXCLUDED.area_tag_type,
                system_group_type = EXCLUDED.system_group_type,
                card_type_ids     = EXCLUDED.card_type_ids,
                raw_data          = EXCLUDED.raw_data,
                updated_at        = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
134
etl_billiards/loaders/dimensions/product.py
Normal file
@@ -0,0 +1,134 @@
# -*- coding: utf-8 -*-
"""Product dimension + price SCD2 loader."""

from ..base_loader import BaseLoader
from scd.scd2_handler import SCD2Handler


class ProductLoader(BaseLoader):
    """Product dimension loader (dim_product + dim_product_price_scd)."""

    def __init__(self, db_ops):
        super().__init__(db_ops)
        # SCD2 handler, reusing the shared logic
        self.scd_handler = SCD2Handler(db_ops)

    def upsert_products(self, records: list, store_id: int) -> tuple:
        """
        Load the product dimension and its price SCD.

        Returns: (inserted_count, updated_count, skipped_count)
        """
        if not records:
            return (0, 0, 0)

        # 1) Main dimension table: billiards.dim_product
        sql_base = """
            INSERT INTO billiards.dim_product (
                store_id, product_id, site_product_id, product_name,
                category_id, category_name, second_category_id, unit,
                cost_price, sale_price, allow_discount, status,
                supplier_id, barcode, is_combo,
                created_time, updated_time, raw_data
            )
            VALUES (
                %(store_id)s, %(product_id)s, %(site_product_id)s, %(product_name)s,
                %(category_id)s, %(category_name)s, %(second_category_id)s, %(unit)s,
                %(cost_price)s, %(sale_price)s, %(allow_discount)s, %(status)s,
                %(supplier_id)s, %(barcode)s, %(is_combo)s,
                %(created_time)s, %(updated_time)s, %(raw_data)s
            )
            ON CONFLICT (store_id, product_id) DO UPDATE SET
                site_product_id    = EXCLUDED.site_product_id,
                product_name       = EXCLUDED.product_name,
                category_id        = EXCLUDED.category_id,
                category_name      = EXCLUDED.category_name,
                second_category_id = EXCLUDED.second_category_id,
                unit               = EXCLUDED.unit,
                cost_price         = EXCLUDED.cost_price,
                sale_price         = EXCLUDED.sale_price,
                allow_discount     = EXCLUDED.allow_discount,
                status             = EXCLUDED.status,
                supplier_id        = EXCLUDED.supplier_id,
                barcode            = EXCLUDED.barcode,
                is_combo           = EXCLUDED.is_combo,
                updated_time       = COALESCE(EXCLUDED.updated_time, now()),
                raw_data           = EXCLUDED.raw_data
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql_base,
            records,
            page_size=self._batch_size(),
        )

        # 2) Price SCD2: billiards.dim_product_price_scd
        # Only track the history of the price, category, and name fields
        tracked_fields = [
            "product_name",
            "category_id",
            "category_name",
            "second_category_id",
            "cost_price",
            "sale_price",
            "allow_discount",
            "status",
        ]
        natural_key = ["store_id", "product_id"]

        for rec in records:
            effective_date = rec.get("updated_time") or rec.get("created_time")

            scd_record = {
                "store_id": rec["store_id"],
                "product_id": rec["product_id"],
                "product_name": rec.get("product_name"),
                "category_id": rec.get("category_id"),
                "category_name": rec.get("category_name"),
                "second_category_id": rec.get("second_category_id"),
                "cost_price": rec.get("cost_price"),
                "sale_price": rec.get("sale_price"),
                "allow_discount": rec.get("allow_discount"),
                "status": rec.get("status"),
                # The target table has a raw_data jsonb column; reuse the
                # raw_data passed in by the task.
                "raw_data": rec.get("raw_data"),
            }

            # We do not try to distinguish INSERT/UPDATE/SKIP here; it adds
            # little to the ETL statistics.
            self.scd_handler.upsert(
                table_name="billiards.dim_product_price_scd",
                natural_key=natural_key,
                tracked_fields=tracked_fields,
                record=scd_record,
                effective_date=effective_date,
            )

        # skipped_count is always reported as 0 (records that were actually
        # dropped are filtered on the task side).
        return (inserted, updated, 0)
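A sketch of the per-record SCD2 call above, under the assumption (inferred from the handler's signature as used here) that a change in any tracked field closes the current version row and opens a new one keyed by the natural key; all values are illustrative:

```python
handler = SCD2Handler(db_ops)
handler.upsert(
    table_name="billiards.dim_product_price_scd",
    natural_key=["store_id", "product_id"],
    tracked_fields=["sale_price", "status"],
    record={"store_id": 1, "product_id": 42, "sale_price": 18.0, "status": 1},
    effective_date="2025-11-21 10:00:00",
)
```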
80
etl_billiards/loaders/dimensions/table.py
Normal file
@@ -0,0 +1,80 @@
# -*- coding: utf-8 -*-
"""Table dimension loader."""

from ..base_loader import BaseLoader


class TableLoader(BaseLoader):
    """Writes table records into billiards.dim_table."""

    def upsert_tables(self, records: list) -> tuple:
        """Batch-write table records."""
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.dim_table (
                store_id, table_id, site_id, area_id, area_name,
                table_name, table_price, table_status, table_status_name,
                light_status, is_rest_area, show_status, virtual_table,
                charge_free, only_allow_groupon, is_online_reservation,
                created_time, raw_data
            )
            VALUES (
                %(store_id)s, %(table_id)s, %(site_id)s, %(area_id)s, %(area_name)s,
                %(table_name)s, %(table_price)s, %(table_status)s, %(table_status_name)s,
                %(light_status)s, %(is_rest_area)s, %(show_status)s, %(virtual_table)s,
                %(charge_free)s, %(only_allow_groupon)s, %(is_online_reservation)s,
                %(created_time)s, %(raw_data)s
            )
            ON CONFLICT (store_id, table_id) DO UPDATE SET
                site_id               = EXCLUDED.site_id,
                area_id               = EXCLUDED.area_id,
                area_name             = EXCLUDED.area_name,
                table_name            = EXCLUDED.table_name,
                table_price           = EXCLUDED.table_price,
                table_status          = EXCLUDED.table_status,
                table_status_name     = EXCLUDED.table_status_name,
                light_status          = EXCLUDED.light_status,
                is_rest_area          = EXCLUDED.is_rest_area,
                show_status           = EXCLUDED.show_status,
                virtual_table         = EXCLUDED.virtual_table,
                charge_free           = EXCLUDED.charge_free,
                only_allow_groupon    = EXCLUDED.only_allow_groupon,
                is_online_reservation = EXCLUDED.is_online_reservation,
                created_time          = COALESCE(EXCLUDED.created_time, dim_table.created_time),
                raw_data              = EXCLUDED.raw_data,
                updated_at            = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
0
etl_billiards/loaders/facts/__init__.py
Normal file
64
etl_billiards/loaders/facts/assistant_abolish.py
Normal file
@@ -0,0 +1,64 @@
# -*- coding: utf-8 -*-
"""Assistant abolish fact table."""

from ..base_loader import BaseLoader


class AssistantAbolishLoader(BaseLoader):
    """Writes billiards.fact_assistant_abolish."""

    def upsert_records(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_assistant_abolish (
                store_id, abolish_id, table_id, table_name,
                table_area_id, table_area, assistant_no, assistant_name,
                charge_minutes, abolish_amount, create_time, trash_reason,
                raw_data
            )
            VALUES (
                %(store_id)s, %(abolish_id)s, %(table_id)s, %(table_name)s,
                %(table_area_id)s, %(table_area)s, %(assistant_no)s, %(assistant_name)s,
                %(charge_minutes)s, %(abolish_amount)s, %(create_time)s, %(trash_reason)s,
                %(raw_data)s
            )
            ON CONFLICT (store_id, abolish_id) DO UPDATE SET
                table_id       = EXCLUDED.table_id,
                table_name     = EXCLUDED.table_name,
                table_area_id  = EXCLUDED.table_area_id,
                table_area     = EXCLUDED.table_area,
                assistant_no   = EXCLUDED.assistant_no,
                assistant_name = EXCLUDED.assistant_name,
                charge_minutes = EXCLUDED.charge_minutes,
                abolish_amount = EXCLUDED.abolish_amount,
                create_time    = EXCLUDED.create_time,
                trash_reason   = EXCLUDED.trash_reason,
                raw_data       = EXCLUDED.raw_data,
                updated_at     = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
136
etl_billiards/loaders/facts/assistant_ledger.py
Normal file
@@ -0,0 +1,136 @@
# -*- coding: utf-8 -*-
"""Assistant ledger fact table."""

from ..base_loader import BaseLoader


class AssistantLedgerLoader(BaseLoader):
    """Writes billiards.fact_assistant_ledger."""

    def upsert_ledgers(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_assistant_ledger (
                store_id, ledger_id, assistant_no, assistant_name, nickname,
                level_name, table_name, ledger_unit_price, ledger_count,
                ledger_amount, projected_income, service_money,
                member_discount_amount, manual_discount_amount, coupon_deduct_money,
                order_trade_no, order_settle_id, operator_id, operator_name,
                assistant_team_id, assistant_level, site_table_id,
                order_assistant_id, site_assistant_id, user_id,
                ledger_start_time, ledger_end_time, start_use_time, last_use_time,
                income_seconds, real_use_seconds, is_trash, trash_reason,
                is_confirm, ledger_status, create_time, raw_data
            )
            VALUES (
                %(store_id)s, %(ledger_id)s, %(assistant_no)s, %(assistant_name)s, %(nickname)s,
                %(level_name)s, %(table_name)s, %(ledger_unit_price)s, %(ledger_count)s,
                %(ledger_amount)s, %(projected_income)s, %(service_money)s,
                %(member_discount_amount)s, %(manual_discount_amount)s, %(coupon_deduct_money)s,
                %(order_trade_no)s, %(order_settle_id)s, %(operator_id)s, %(operator_name)s,
                %(assistant_team_id)s, %(assistant_level)s, %(site_table_id)s,
                %(order_assistant_id)s, %(site_assistant_id)s, %(user_id)s,
                %(ledger_start_time)s, %(ledger_end_time)s, %(start_use_time)s, %(last_use_time)s,
                %(income_seconds)s, %(real_use_seconds)s, %(is_trash)s, %(trash_reason)s,
                %(is_confirm)s, %(ledger_status)s, %(create_time)s, %(raw_data)s
            )
            ON CONFLICT (store_id, ledger_id) DO UPDATE SET
                assistant_no           = EXCLUDED.assistant_no,
                assistant_name         = EXCLUDED.assistant_name,
                nickname               = EXCLUDED.nickname,
                level_name             = EXCLUDED.level_name,
                table_name             = EXCLUDED.table_name,
                ledger_unit_price      = EXCLUDED.ledger_unit_price,
                ledger_count           = EXCLUDED.ledger_count,
                ledger_amount          = EXCLUDED.ledger_amount,
                projected_income       = EXCLUDED.projected_income,
                service_money          = EXCLUDED.service_money,
                member_discount_amount = EXCLUDED.member_discount_amount,
                manual_discount_amount = EXCLUDED.manual_discount_amount,
                coupon_deduct_money    = EXCLUDED.coupon_deduct_money,
                order_trade_no         = EXCLUDED.order_trade_no,
                order_settle_id        = EXCLUDED.order_settle_id,
                operator_id            = EXCLUDED.operator_id,
                operator_name          = EXCLUDED.operator_name,
                assistant_team_id      = EXCLUDED.assistant_team_id,
                assistant_level        = EXCLUDED.assistant_level,
                site_table_id          = EXCLUDED.site_table_id,
                order_assistant_id     = EXCLUDED.order_assistant_id,
                site_assistant_id      = EXCLUDED.site_assistant_id,
                user_id                = EXCLUDED.user_id,
                ledger_start_time      = EXCLUDED.ledger_start_time,
                ledger_end_time        = EXCLUDED.ledger_end_time,
                start_use_time         = EXCLUDED.start_use_time,
                last_use_time          = EXCLUDED.last_use_time,
                income_seconds         = EXCLUDED.income_seconds,
                real_use_seconds       = EXCLUDED.real_use_seconds,
                is_trash               = EXCLUDED.is_trash,
                trash_reason           = EXCLUDED.trash_reason,
                is_confirm             = EXCLUDED.is_confirm,
                ledger_status          = EXCLUDED.ledger_status,
                create_time            = EXCLUDED.create_time,
                raw_data               = EXCLUDED.raw_data,
                updated_at             = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
91
etl_billiards/loaders/facts/coupon_usage.py
Normal file
@@ -0,0 +1,91 @@
# -*- coding: utf-8 -*-
"""Coupon verification fact table."""

from ..base_loader import BaseLoader


class CouponUsageLoader(BaseLoader):
    """Writes billiards.fact_coupon_usage."""

    def upsert_coupon_usage(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_coupon_usage (
                store_id, usage_id, coupon_code, coupon_channel, coupon_name,
                sale_price, coupon_money, coupon_free_time, use_status,
                create_time, consume_time, operator_id, operator_name,
                table_id, site_order_id, group_package_id, coupon_remark,
                deal_id, certificate_id, verify_id, is_delete, raw_data
            )
            VALUES (
                %(store_id)s, %(usage_id)s, %(coupon_code)s, %(coupon_channel)s, %(coupon_name)s,
                %(sale_price)s, %(coupon_money)s, %(coupon_free_time)s, %(use_status)s,
                %(create_time)s, %(consume_time)s, %(operator_id)s, %(operator_name)s,
                %(table_id)s, %(site_order_id)s, %(group_package_id)s, %(coupon_remark)s,
                %(deal_id)s, %(certificate_id)s, %(verify_id)s, %(is_delete)s, %(raw_data)s
            )
            ON CONFLICT (store_id, usage_id) DO UPDATE SET
                coupon_code      = EXCLUDED.coupon_code,
                coupon_channel   = EXCLUDED.coupon_channel,
                coupon_name      = EXCLUDED.coupon_name,
                sale_price       = EXCLUDED.sale_price,
                coupon_money     = EXCLUDED.coupon_money,
                coupon_free_time = EXCLUDED.coupon_free_time,
                use_status       = EXCLUDED.use_status,
                create_time      = EXCLUDED.create_time,
                consume_time     = EXCLUDED.consume_time,
                operator_id      = EXCLUDED.operator_id,
                operator_name    = EXCLUDED.operator_name,
                table_id         = EXCLUDED.table_id,
                site_order_id    = EXCLUDED.site_order_id,
                group_package_id = EXCLUDED.group_package_id,
                coupon_remark    = EXCLUDED.coupon_remark,
                deal_id          = EXCLUDED.deal_id,
                certificate_id   = EXCLUDED.certificate_id,
                verify_id        = EXCLUDED.verify_id,
                is_delete        = EXCLUDED.is_delete,
                raw_data         = EXCLUDED.raw_data,
                updated_at       = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
73
etl_billiards/loaders/facts/inventory_change.py
Normal file
@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
"""Inventory change fact table."""

from ..base_loader import BaseLoader


class InventoryChangeLoader(BaseLoader):
    """Writes billiards.fact_inventory_change."""

    def upsert_changes(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_inventory_change (
                store_id, change_id, site_goods_id, stock_type, goods_name,
                change_time, start_qty, end_qty, change_qty, unit, price,
                operator_name, remark, goods_category_id,
                goods_second_category_id, raw_data
            )
            VALUES (
                %(store_id)s, %(change_id)s, %(site_goods_id)s, %(stock_type)s, %(goods_name)s,
                %(change_time)s, %(start_qty)s, %(end_qty)s, %(change_qty)s, %(unit)s, %(price)s,
                %(operator_name)s, %(remark)s, %(goods_category_id)s,
                %(goods_second_category_id)s, %(raw_data)s
            )
            ON CONFLICT (store_id, change_id) DO UPDATE SET
                site_goods_id            = EXCLUDED.site_goods_id,
                stock_type               = EXCLUDED.stock_type,
                goods_name               = EXCLUDED.goods_name,
                change_time              = EXCLUDED.change_time,
                start_qty                = EXCLUDED.start_qty,
                end_qty                  = EXCLUDED.end_qty,
                change_qty               = EXCLUDED.change_qty,
                unit                     = EXCLUDED.unit,
                price                    = EXCLUDED.price,
                operator_name            = EXCLUDED.operator_name,
                remark                   = EXCLUDED.remark,
                goods_category_id        = EXCLUDED.goods_category_id,
                goods_second_category_id = EXCLUDED.goods_second_category_id,
                raw_data                 = EXCLUDED.raw_data,
                updated_at               = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
42
etl_billiards/loaders/facts/order.py
Normal file
@@ -0,0 +1,42 @@
# -*- coding: utf-8 -*-
"""Order fact table loader."""
from ..base_loader import BaseLoader


class OrderLoader(BaseLoader):
    """Order data loader."""

    def upsert_orders(self, records: list, store_id: int) -> tuple:
        """Load order data."""
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_order (
                store_id, order_id, order_no, member_id, table_id,
                order_time, end_time, total_amount, discount_amount,
                final_amount, pay_status, order_status, remark, raw_data
            )
            VALUES (
                %(store_id)s, %(order_id)s, %(order_no)s, %(member_id)s, %(table_id)s,
                %(order_time)s, %(end_time)s, %(total_amount)s, %(discount_amount)s,
                %(final_amount)s, %(pay_status)s, %(order_status)s, %(remark)s, %(raw_data)s
            )
            ON CONFLICT (store_id, order_id) DO UPDATE SET
                order_no        = EXCLUDED.order_no,
                member_id       = EXCLUDED.member_id,
                table_id        = EXCLUDED.table_id,
                order_time      = EXCLUDED.order_time,
                end_time        = EXCLUDED.end_time,
                total_amount    = EXCLUDED.total_amount,
                discount_amount = EXCLUDED.discount_amount,
                final_amount    = EXCLUDED.final_amount,
                pay_status      = EXCLUDED.pay_status,
                order_status    = EXCLUDED.order_status,
                remark          = EXCLUDED.remark,
                raw_data        = EXCLUDED.raw_data,
                updated_at      = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(sql, records, page_size=self._batch_size())
        return (inserted, updated, 0)
61
etl_billiards/loaders/facts/payment.py
Normal file
@@ -0,0 +1,61 @@
# -*- coding: utf-8 -*-
"""Payment fact table loader."""
from ..base_loader import BaseLoader


class PaymentLoader(BaseLoader):
    """Payment data loader."""

    def upsert_payments(self, records: list, store_id: int) -> tuple:
        """Load payment data."""
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_payment (
                store_id, pay_id, order_id,
                site_id, tenant_id,
                order_settle_id, order_trade_no,
                relate_type, relate_id,
                create_time, pay_time,
                pay_amount, fee_amount, discount_amount,
                payment_method, pay_type,
                online_pay_channel, pay_terminal,
                pay_status, remark, raw_data
            )
            VALUES (
                %(store_id)s, %(pay_id)s, %(order_id)s,
                %(site_id)s, %(tenant_id)s,
                %(order_settle_id)s, %(order_trade_no)s,
                %(relate_type)s, %(relate_id)s,
                %(create_time)s, %(pay_time)s,
                %(pay_amount)s, %(fee_amount)s, %(discount_amount)s,
                %(payment_method)s, %(pay_type)s,
                %(online_pay_channel)s, %(pay_terminal)s,
                %(pay_status)s, %(remark)s, %(raw_data)s
            )
            ON CONFLICT (store_id, pay_id) DO UPDATE SET
                order_settle_id    = EXCLUDED.order_settle_id,
                order_trade_no     = EXCLUDED.order_trade_no,
                relate_type        = EXCLUDED.relate_type,
                relate_id          = EXCLUDED.relate_id,
                order_id           = EXCLUDED.order_id,
                site_id            = EXCLUDED.site_id,
                tenant_id          = EXCLUDED.tenant_id,
                create_time        = EXCLUDED.create_time,
                pay_time           = EXCLUDED.pay_time,
                pay_amount         = EXCLUDED.pay_amount,
                fee_amount         = EXCLUDED.fee_amount,
                discount_amount    = EXCLUDED.discount_amount,
                payment_method     = EXCLUDED.payment_method,
                pay_type           = EXCLUDED.pay_type,
                online_pay_channel = EXCLUDED.online_pay_channel,
                pay_terminal       = EXCLUDED.pay_terminal,
                pay_status         = EXCLUDED.pay_status,
                remark             = EXCLUDED.remark,
                raw_data           = EXCLUDED.raw_data,
                updated_at         = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(sql, records, page_size=self._batch_size())
        return (inserted, updated, 0)
88
etl_billiards/loaders/facts/refund.py
Normal file
@@ -0,0 +1,88 @@
# -*- coding: utf-8 -*-
"""Refund fact table loader."""

from ..base_loader import BaseLoader


class RefundLoader(BaseLoader):
    """Writes billiards.fact_refund."""

    def upsert_refunds(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_refund (
                store_id, refund_id, site_id, tenant_id,
                pay_amount, pay_status, pay_time, create_time,
                relate_type, relate_id, payment_method, refund_amount,
                action_type, pay_terminal, operator_id, channel_pay_no,
                channel_fee, is_delete, member_id, member_card_id, raw_data
            )
            VALUES (
                %(store_id)s, %(refund_id)s, %(site_id)s, %(tenant_id)s,
                %(pay_amount)s, %(pay_status)s, %(pay_time)s, %(create_time)s,
                %(relate_type)s, %(relate_id)s, %(payment_method)s, %(refund_amount)s,
                %(action_type)s, %(pay_terminal)s, %(operator_id)s, %(channel_pay_no)s,
                %(channel_fee)s, %(is_delete)s, %(member_id)s, %(member_card_id)s, %(raw_data)s
            )
            ON CONFLICT (store_id, refund_id) DO UPDATE SET
                site_id        = EXCLUDED.site_id,
                tenant_id      = EXCLUDED.tenant_id,
                pay_amount     = EXCLUDED.pay_amount,
                pay_status     = EXCLUDED.pay_status,
                pay_time       = EXCLUDED.pay_time,
                create_time    = EXCLUDED.create_time,
                relate_type    = EXCLUDED.relate_type,
                relate_id      = EXCLUDED.relate_id,
                payment_method = EXCLUDED.payment_method,
                refund_amount  = EXCLUDED.refund_amount,
                action_type    = EXCLUDED.action_type,
                pay_terminal   = EXCLUDED.pay_terminal,
                operator_id    = EXCLUDED.operator_id,
                channel_pay_no = EXCLUDED.channel_pay_no,
                channel_fee    = EXCLUDED.channel_fee,
                is_delete      = EXCLUDED.is_delete,
                member_id      = EXCLUDED.member_id,
                member_card_id = EXCLUDED.member_card_id,
                raw_data       = EXCLUDED.raw_data,
                updated_at     = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
82
etl_billiards/loaders/facts/table_discount.py
Normal file
@@ -0,0 +1,82 @@
# -*- coding: utf-8 -*-
"""Table-fee discount fact table."""

from ..base_loader import BaseLoader


class TableDiscountLoader(BaseLoader):
    """Writes billiards.fact_table_discount."""

    def upsert_discounts(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
            INSERT INTO billiards.fact_table_discount (
                store_id, discount_id, adjust_type, applicant_id, applicant_name,
                operator_id, operator_name, ledger_amount, ledger_count,
                ledger_name, ledger_status, order_settle_id, order_trade_no,
                site_table_id, table_area_id, table_area_name, create_time,
                is_delete, raw_data
            )
            VALUES (
                %(store_id)s, %(discount_id)s, %(adjust_type)s, %(applicant_id)s, %(applicant_name)s,
                %(operator_id)s, %(operator_name)s, %(ledger_amount)s, %(ledger_count)s,
                %(ledger_name)s, %(ledger_status)s, %(order_settle_id)s, %(order_trade_no)s,
                %(site_table_id)s, %(table_area_id)s, %(table_area_name)s, %(create_time)s,
                %(is_delete)s, %(raw_data)s
            )
            ON CONFLICT (store_id, discount_id) DO UPDATE SET
                adjust_type     = EXCLUDED.adjust_type,
                applicant_id    = EXCLUDED.applicant_id,
                applicant_name  = EXCLUDED.applicant_name,
                operator_id     = EXCLUDED.operator_id,
                operator_name   = EXCLUDED.operator_name,
                ledger_amount   = EXCLUDED.ledger_amount,
                ledger_count    = EXCLUDED.ledger_count,
                ledger_name     = EXCLUDED.ledger_name,
                ledger_status   = EXCLUDED.ledger_status,
                order_settle_id = EXCLUDED.order_settle_id,
                order_trade_no  = EXCLUDED.order_trade_no,
                site_table_id   = EXCLUDED.site_table_id,
                table_area_id   = EXCLUDED.table_area_id,
                table_area_name = EXCLUDED.table_area_name,
                create_time     = EXCLUDED.create_time,
                is_delete       = EXCLUDED.is_delete,
                raw_data        = EXCLUDED.raw_data,
                updated_at      = now()
            RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
188
etl_billiards/loaders/facts/ticket.py
Normal file
@@ -0,0 +1,188 @@
# -*- coding: utf-8 -*-
"""Receipt (ticket) detail loader."""
import json

from ..base_loader import BaseLoader


class TicketLoader(BaseLoader):
    """
    Loader for parsing Ticket Detail JSON and populating DWD fact tables.
    Handles:
    - fact_order (Header)
    - fact_order_goods (Items)
    - fact_table_usage (Items)
    - fact_assistant_service (Items)
    """

    def process_tickets(self, tickets: list, store_id: int) -> tuple:
        """
        Process a batch of ticket JSONs.
        Returns (inserted_count, error_count)
        """
        inserted_count = 0
        error_count = 0

        # Prepare batch lists
        orders = []
        goods_list = []
        table_usages = []
        assistant_services = []

        for ticket in tickets:
            try:
                # 1. Parse Header (fact_order)
                root_data = ticket.get("data", {}).get("data", {})
                if not root_data:
                    continue

                order_settle_id = root_data.get("orderSettleId")
                if not order_settle_id:
                    continue

                orders.append({
                    "store_id": store_id,
                    "order_settle_id": order_settle_id,
                    "order_trade_no": 0,
                    "order_no": str(root_data.get("orderSettleNumber", "")),
                    "member_id": 0,
                    "pay_time": root_data.get("payTime"),
                    "total_amount": root_data.get("consumeMoney", 0),
                    "pay_amount": root_data.get("actualPayment", 0),
                    "discount_amount": root_data.get("memberOfferAmount", 0),
                    "coupon_amount": root_data.get("couponAmount", 0),
                    "status": "PAID",
                    "cashier_name": root_data.get("cashierName", ""),
                    "remark": root_data.get("orderRemark", ""),
                    "raw_data": json.dumps(ticket, ensure_ascii=False)
                })

                # 2. Parse Items (orderItem list)
                order_items = root_data.get("orderItem", [])
                for item in order_items:
                    order_trade_no = item.get("siteOrderId")

                    # 2.1 Table Ledger
                    table_ledger = item.get("tableLedger")
                    if table_ledger:
                        table_usages.append({
                            "store_id": store_id,
                            "order_ledger_id": table_ledger.get("orderTableLedgerId"),
                            "order_settle_id": order_settle_id,
                            "table_id": table_ledger.get("siteTableId"),
                            "table_name": table_ledger.get("tableName"),
                            "start_time": table_ledger.get("chargeStartTime"),
                            "end_time": table_ledger.get("chargeEndTime"),
                            "duration_minutes": table_ledger.get("useDuration", 0),
                            "total_amount": table_ledger.get("consumptionAmount", 0),
                            # Guard against JSON nulls, which .get(..., 0) does not cover
                            "pay_amount": (table_ledger.get("consumptionAmount") or 0)
                                          - (table_ledger.get("memberDiscountAmount") or 0)
                        })

                    # 2.2 Goods Ledgers
                    goods_ledgers = item.get("goodsLedgers", [])
                    for g in goods_ledgers:
                        goods_list.append({
                            "store_id": store_id,
                            "order_goods_id": g.get("orderGoodsLedgerId"),
                            "order_settle_id": order_settle_id,
                            "order_trade_no": order_trade_no,
                            "goods_id": g.get("siteGoodsId"),
                            "goods_name": g.get("goodsName"),
                            "quantity": g.get("goodsCount", 0),
                            "unit_price": g.get("goodsPrice", 0),
                            "total_amount": g.get("ledgerAmount", 0),
                            "pay_amount": g.get("realGoodsMoney", 0)
                        })

                    # 2.3 Assistant Services
                    assistant_ledgers = item.get("assistantPlayWith", [])
                    for a in assistant_ledgers:
                        assistant_services.append({
                            "store_id": store_id,
                            "ledger_id": a.get("orderAssistantLedgerId"),
                            "order_settle_id": order_settle_id,
                            "assistant_id": a.get("assistantId"),
                            "assistant_name": a.get("ledgerName"),
                            "service_type": a.get("skillName", "Play"),
                            "start_time": a.get("ledgerStartTime"),
                            "end_time": a.get("ledgerEndTime"),
                            "duration_minutes": int(a.get("ledgerCount", 0) / 60) if a.get("ledgerCount") else 0,
                            "total_amount": a.get("ledgerAmount", 0),
                            "pay_amount": a.get("ledgerAmount", 0)
                        })

                inserted_count += 1

            except Exception as e:
                self.logger.error(f"Error parsing ticket: {e}", exc_info=True)
                error_count += 1

        # 3. Batch Insert/Upsert
        if orders:
            self._upsert_orders(orders)
        if goods_list:
            self._upsert_goods(goods_list)
        if table_usages:
            self._upsert_table_usages(table_usages)
        if assistant_services:
            self._upsert_assistant_services(assistant_services)

        return inserted_count, error_count

    def _upsert_orders(self, rows):
        sql = """
        INSERT INTO billiards.fact_order (
            store_id, order_settle_id, order_trade_no, order_no, member_id,
            pay_time, total_amount, pay_amount, discount_amount, coupon_amount,
            status, cashier_name, remark, raw_data
        ) VALUES (
            %(store_id)s, %(order_settle_id)s, %(order_trade_no)s, %(order_no)s, %(member_id)s,
            %(pay_time)s, %(total_amount)s, %(pay_amount)s, %(discount_amount)s, %(coupon_amount)s,
            %(status)s, %(cashier_name)s, %(remark)s, %(raw_data)s
        )
        ON CONFLICT (store_id, order_settle_id) DO UPDATE SET
            pay_time = EXCLUDED.pay_time,
            pay_amount = EXCLUDED.pay_amount,
            updated_at = now()
        """
        self.db.batch_execute(sql, rows)

    def _upsert_goods(self, rows):
        sql = """
        INSERT INTO billiards.fact_order_goods (
            store_id, order_goods_id, order_settle_id, order_trade_no,
            goods_id, goods_name, quantity, unit_price, total_amount, pay_amount
        ) VALUES (
            %(store_id)s, %(order_goods_id)s, %(order_settle_id)s, %(order_trade_no)s,
            %(goods_id)s, %(goods_name)s, %(quantity)s, %(unit_price)s, %(total_amount)s, %(pay_amount)s
        )
        ON CONFLICT (store_id, order_goods_id) DO UPDATE SET
            pay_amount = EXCLUDED.pay_amount
        """
        self.db.batch_execute(sql, rows)

    def _upsert_table_usages(self, rows):
        sql = """
        INSERT INTO billiards.fact_table_usage (
            store_id, order_ledger_id, order_settle_id, table_id, table_name,
            start_time, end_time, duration_minutes, total_amount, pay_amount
        ) VALUES (
            %(store_id)s, %(order_ledger_id)s, %(order_settle_id)s, %(table_id)s, %(table_name)s,
            %(start_time)s, %(end_time)s, %(duration_minutes)s, %(total_amount)s, %(pay_amount)s
        )
        ON CONFLICT (store_id, order_ledger_id) DO UPDATE SET
            pay_amount = EXCLUDED.pay_amount
        """
        self.db.batch_execute(sql, rows)

    def _upsert_assistant_services(self, rows):
        sql = """
        INSERT INTO billiards.fact_assistant_service (
            store_id, ledger_id, order_settle_id, assistant_id, assistant_name,
            service_type, start_time, end_time, duration_minutes, total_amount, pay_amount
        ) VALUES (
            %(store_id)s, %(ledger_id)s, %(order_settle_id)s, %(assistant_id)s, %(assistant_name)s,
            %(service_type)s, %(start_time)s, %(end_time)s, %(duration_minutes)s, %(total_amount)s, %(pay_amount)s
        )
        ON CONFLICT (store_id, ledger_id) DO UPDATE SET
            pay_amount = EXCLUDED.pay_amount
        """
        self.db.batch_execute(sql, rows)
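For reference, `process_tickets` expects each ticket to follow the envelope below. The key names mirror exactly what the parser reads above; the values are illustrative only:

```python
sample_ticket = {
    "data": {
        "data": {
            "orderSettleId": 912345,
            "orderSettleNumber": "S20251121-0001",
            "payTime": "2025-11-21 20:15:00",
            "consumeMoney": 188.0,
            "actualPayment": 168.0,
            "memberOfferAmount": 20.0,
            "couponAmount": 0.0,
            "cashierName": "cashier-01",
            "orderRemark": "",
            "orderItem": [{
                "siteOrderId": 555001,
                "tableLedger": {},          # -> fact_table_usage
                "goodsLedgers": [{}],       # -> fact_order_goods
                "assistantPlayWith": [{}],  # -> fact_assistant_service
            }],
        }
    }
}
```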
118
etl_billiards/loaders/facts/topup.py
Normal file
@@ -0,0 +1,118 @@
# -*- coding: utf-8 -*-
"""Top-up (recharge) fact table."""

from ..base_loader import BaseLoader


class TopupLoader(BaseLoader):
    """Writes fact_topup."""

    def upsert_topups(self, records: list) -> tuple:
        if not records:
            return (0, 0, 0)

        sql = """
        INSERT INTO billiards.fact_topup (
            store_id,
            topup_id,
            member_id,
            member_name,
            member_phone,
            card_id,
            card_type_name,
            pay_amount,
            consume_money,
            settle_status,
            settle_type,
            settle_name,
            settle_relate_id,
            pay_time,
            create_time,
            operator_id,
            operator_name,
            payment_method,
            refund_amount,
            cash_amount,
            card_amount,
            balance_amount,
            online_amount,
            rounding_amount,
            adjust_amount,
            goods_money,
            table_charge_money,
            service_money,
            coupon_amount,
            order_remark,
            raw_data
        )
        VALUES (
            %(store_id)s,
            %(topup_id)s,
            %(member_id)s,
            %(member_name)s,
            %(member_phone)s,
            %(card_id)s,
            %(card_type_name)s,
            %(pay_amount)s,
            %(consume_money)s,
            %(settle_status)s,
            %(settle_type)s,
            %(settle_name)s,
            %(settle_relate_id)s,
            %(pay_time)s,
            %(create_time)s,
            %(operator_id)s,
            %(operator_name)s,
            %(payment_method)s,
            %(refund_amount)s,
            %(cash_amount)s,
            %(card_amount)s,
            %(balance_amount)s,
            %(online_amount)s,
            %(rounding_amount)s,
            %(adjust_amount)s,
            %(goods_money)s,
            %(table_charge_money)s,
            %(service_money)s,
            %(coupon_amount)s,
            %(order_remark)s,
            %(raw_data)s
        )
        ON CONFLICT (store_id, topup_id) DO UPDATE SET
            member_id = EXCLUDED.member_id,
            member_name = EXCLUDED.member_name,
            member_phone = EXCLUDED.member_phone,
            card_id = EXCLUDED.card_id,
            card_type_name = EXCLUDED.card_type_name,
            pay_amount = EXCLUDED.pay_amount,
            consume_money = EXCLUDED.consume_money,
            settle_status = EXCLUDED.settle_status,
            settle_type = EXCLUDED.settle_type,
            settle_name = EXCLUDED.settle_name,
            settle_relate_id = EXCLUDED.settle_relate_id,
            pay_time = EXCLUDED.pay_time,
            create_time = EXCLUDED.create_time,
            operator_id = EXCLUDED.operator_id,
            operator_name = EXCLUDED.operator_name,
            payment_method = EXCLUDED.payment_method,
            refund_amount = EXCLUDED.refund_amount,
            cash_amount = EXCLUDED.cash_amount,
            card_amount = EXCLUDED.card_amount,
            balance_amount = EXCLUDED.balance_amount,
            online_amount = EXCLUDED.online_amount,
            rounding_amount = EXCLUDED.rounding_amount,
            adjust_amount = EXCLUDED.adjust_amount,
            goods_money = EXCLUDED.goods_money,
            table_charge_money = EXCLUDED.table_charge_money,
            service_money = EXCLUDED.service_money,
            coupon_amount = EXCLUDED.coupon_amount,
            order_remark = EXCLUDED.order_remark,
            raw_data = EXCLUDED.raw_data,
            updated_at = now()
        RETURNING (xmax = 0) AS inserted
        """

        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
6
etl_billiards/loaders/ods/__init__.py
Normal file
@@ -0,0 +1,6 @@
# -*- coding: utf-8 -*-
"""ODS loader helpers."""

from .generic import GenericODSLoader

__all__ = ["GenericODSLoader"]
67
etl_billiards/loaders/ods/generic.py
Normal file
@@ -0,0 +1,67 @@
# -*- coding: utf-8 -*-
"""Generic ODS loader that keeps raw payload + primary keys."""
from __future__ import annotations

import json
from datetime import datetime, timezone
from typing import Iterable, Sequence

from ..base_loader import BaseLoader


class GenericODSLoader(BaseLoader):
    """Insert/update helper for ODS tables that share the same pattern."""

    def __init__(
        self,
        db_ops,
        table_name: str,
        columns: Sequence[str],
        conflict_columns: Sequence[str],
    ):
        super().__init__(db_ops)
        if not conflict_columns:
            raise ValueError("conflict_columns must not be empty for ODS loader")
        self.table_name = table_name
        self.columns = list(columns)
        self.conflict_columns = list(conflict_columns)
        self._sql = self._build_sql()

    def upsert_rows(self, rows: Iterable[dict]) -> tuple[int, int, int]:
        """Insert/update the provided iterable of dictionaries."""
        rows = list(rows)
        if not rows:
            return (0, 0, 0)

        normalized = [self._normalize_row(row) for row in rows]
        inserted, updated = self.db.batch_upsert_with_returning(
            self._sql, normalized, page_size=self._batch_size()
        )
        return inserted, updated, 0

    def _build_sql(self) -> str:
        col_list = ", ".join(self.columns)
        placeholders = ", ".join(f"%({col})s" for col in self.columns)
        conflict_clause = ", ".join(self.conflict_columns)
        update_columns = [c for c in self.columns if c not in self.conflict_columns]
        set_clause = ", ".join(f"{col} = EXCLUDED.{col}" for col in update_columns)
        return (
            f"INSERT INTO {self.table_name} ({col_list}) "
            f"VALUES ({placeholders}) "
            f"ON CONFLICT ({conflict_clause}) DO UPDATE SET {set_clause} "
            f"RETURNING (xmax = 0) AS inserted"
        )

    def _normalize_row(self, row: dict) -> dict:
        normalized = {}
        for col in self.columns:
            value = row.get(col)
            if col == "payload" and value is not None and not isinstance(value, str):
                normalized[col] = json.dumps(value, ensure_ascii=False)
            else:
                normalized[col] = value

        if "fetched_at" in normalized and normalized["fetched_at"] is None:
            normalized["fetched_at"] = datetime.now(timezone.utc)

        return normalized
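A usage sketch for the generic loader; the table and column names below are hypothetical, and `db_ops` is assumed to be an already-constructed DatabaseOperations instance:

```python
loader = GenericODSLoader(
    db_ops,
    table_name="billiards_ods.ods_payment",          # hypothetical table
    columns=["store_id", "pay_id", "payload", "fetched_at"],
    conflict_columns=["store_id", "pay_id"],
)
# dict payloads are JSON-serialized; a None fetched_at becomes now(UTC)
inserted, updated, _ = loader.upsert_rows([
    {"store_id": 1, "pay_id": 42, "payload": {"amount": "9.90"}, "fetched_at": None},
])
# Generated SQL for this instance:
#   INSERT INTO billiards_ods.ods_payment (store_id, pay_id, payload, fetched_at)
#   VALUES (%(store_id)s, %(pay_id)s, %(payload)s, %(fetched_at)s)
#   ON CONFLICT (store_id, pay_id) DO UPDATE SET
#       payload = EXCLUDED.payload, fetched_at = EXCLUDED.fetched_at
#   RETURNING (xmax = 0) AS inserted
```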
0
etl_billiards/models/__init__.py
Normal file
50
etl_billiards/models/parsers.py
Normal file
@@ -0,0 +1,50 @@
# -*- coding: utf-8 -*-
"""Data type parsers."""
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP
from dateutil import parser as dtparser
from zoneinfo import ZoneInfo


class TypeParser:
    """Type parsing utilities."""

    @staticmethod
    def parse_timestamp(s: str, tz: ZoneInfo) -> datetime | None:
        """Parse a timestamp string."""
        if not s:
            return None
        try:
            dt = dtparser.parse(s)
            if dt.tzinfo is None:
                return dt.replace(tzinfo=tz)
            return dt.astimezone(tz)
        except Exception:
            return None

    @staticmethod
    def parse_decimal(value, scale: int = 2) -> Decimal | None:
        """Parse a monetary amount."""
        if value is None:
            return None
        try:
            d = Decimal(str(value))
            return d.quantize(Decimal(10) ** -scale, rounding=ROUND_HALF_UP)
        except Exception:
            return None

    @staticmethod
    def parse_int(value) -> int | None:
        """Parse an integer."""
        if value is None:
            return None
        try:
            return int(value)
        except Exception:
            return None

    @staticmethod
    def format_timestamp(dt: datetime | None, tz: ZoneInfo) -> str | None:
        """Format a timestamp as 'YYYY-MM-DD HH:MM:SS' in the given zone."""
        if not dt:
            return None
        return dt.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")
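Behavior of the parsers at a glance; the import path is assumed from the file layout above:

```python
from zoneinfo import ZoneInfo

from models.parsers import TypeParser  # path assumed from this layout

tz = ZoneInfo("Asia/Taipei")
TypeParser.parse_timestamp("2025-11-21 20:15:00", tz)  # tz-aware datetime
TypeParser.parse_timestamp("not a date", tz)           # None; errors are swallowed
TypeParser.parse_decimal("9.995")                      # Decimal('10.00'), ROUND_HALF_UP
TypeParser.parse_int("42")                             # 42
TypeParser.format_timestamp(None, tz)                  # None
```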
25
etl_billiards/models/validators.py
Normal file
@@ -0,0 +1,25 @@
# -*- coding: utf-8 -*-
"""Data validators."""
from decimal import Decimal


class DataValidator:
    """Data validation utilities."""

    @staticmethod
    def validate_positive_amount(value: Decimal | None, field_name: str = "amount"):
        """Validate that an amount is not negative."""
        if value is not None and value < 0:
            raise ValueError(f"{field_name} must not be negative: {value}")

    @staticmethod
    def validate_required(value, field_name: str):
        """Validate a required field."""
        if value is None or value == "":
            raise ValueError(f"{field_name} is required")

    @staticmethod
    def validate_range(value, min_val, max_val, field_name: str):
        """Validate that a value falls within [min_val, max_val]."""
        if value is not None and not (min_val <= value <= max_val):
            raise ValueError(f"{field_name} must be between {min_val} and {max_val}")
0
etl_billiards/orchestration/__init__.py
Normal file
62
etl_billiards/orchestration/cursor_manager.py
Normal file
@@ -0,0 +1,62 @@
# -*- coding: utf-8 -*-
"""Cursor manager."""
from datetime import datetime


class CursorManager:
    """ETL cursor management."""

    def __init__(self, db_connection):
        self.db = db_connection

    def get_or_create(self, task_id: int, store_id: int) -> dict:
        """Get an existing cursor or create one."""
        rows = self.db.query(
            "SELECT * FROM etl_admin.etl_cursor WHERE task_id=%s AND store_id=%s",
            (task_id, store_id)
        )

        if rows:
            return rows[0]

        # Create a new cursor
        self.db.execute(
            """
            INSERT INTO etl_admin.etl_cursor(task_id, store_id, last_start, last_end, last_id, extra)
            VALUES(%s, %s, NULL, NULL, NULL, '{}'::jsonb)
            """,
            (task_id, store_id)
        )
        self.db.commit()

        rows = self.db.query(
            "SELECT * FROM etl_admin.etl_cursor WHERE task_id=%s AND store_id=%s",
            (task_id, store_id)
        )
        return rows[0] if rows else None

    def advance(self, task_id: int, store_id: int, window_start: datetime,
                window_end: datetime, run_id: int, last_id: int = None):
        """Advance the cursor."""
        if last_id is not None:
            sql = """
            UPDATE etl_admin.etl_cursor
            SET last_start = %s,
                last_end = %s,
                last_id = GREATEST(COALESCE(last_id, 0), %s),
                last_run_id = %s,
                updated_at = now()
            WHERE task_id = %s AND store_id = %s
            """
            self.db.execute(sql, (window_start, window_end, last_id, run_id, task_id, store_id))
        else:
            sql = """
            UPDATE etl_admin.etl_cursor
            SET last_start = %s,
                last_end = %s,
                last_run_id = %s,
                updated_at = now()
            WHERE task_id = %s AND store_id = %s
            """
            self.db.execute(sql, (window_start, window_end, run_id, task_id, store_id))

        self.db.commit()
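A sketch of the handshake the scheduler performs around a task run; `db_conn` is assumed to be an open DatabaseConnection, and the ids are illustrative:

```python
from datetime import datetime, timedelta

cursor_mgr = CursorManager(db_conn)  # db_conn: open DatabaseConnection (assumed)
cursor = cursor_mgr.get_or_create(task_id=7, store_id=2790685415443269)

# First run: last_end is NULL, so fall back to a default window.
window_end = datetime.now()
window_start = cursor["last_end"] or window_end - timedelta(minutes=60)

# ... execute the task over (window_start, window_end), obtaining run_id ...
cursor_mgr.advance(task_id=7, store_id=2790685415443269,
                   window_start=window_start, window_end=window_end,
                   run_id=1, last_id=None)  # last_id only for id-based cursors
```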
70
etl_billiards/orchestration/run_tracker.py
Normal file
@@ -0,0 +1,70 @@
# -*- coding: utf-8 -*-
"""Run record tracker."""
import json
from datetime import datetime


class RunTracker:
    """ETL run record management."""

    def __init__(self, db_connection):
        self.db = db_connection

    def create_run(self, task_id: int, store_id: int, run_uuid: str,
                   export_dir: str, log_path: str, status: str,
                   window_start: datetime = None, window_end: datetime = None,
                   window_minutes: int = None, overlap_seconds: int = None,
                   request_params: dict = None) -> int:
        """Create a run record."""
        sql = """
        INSERT INTO etl_admin.etl_run(
            run_uuid, task_id, store_id, status, started_at, window_start, window_end,
            window_minutes, overlap_seconds, fetched_count, loaded_count, updated_count,
            skipped_count, error_count, unknown_fields, export_dir, log_path,
            request_params, manifest, error_message, extra
        ) VALUES (
            %s, %s, %s, %s, now(), %s, %s, %s, %s, 0, 0, 0, 0, 0, 0, %s, %s, %s,
            '{}'::jsonb, NULL, '{}'::jsonb
        )
        RETURNING run_id
        """

        result = self.db.query(
            sql,
            (run_uuid, task_id, store_id, status, window_start, window_end,
             window_minutes, overlap_seconds, export_dir, log_path,
             json.dumps(request_params or {}, ensure_ascii=False))
        )

        run_id = result[0]["run_id"]
        self.db.commit()
        return run_id

    def update_run(self, run_id: int, counts: dict, status: str,
                   ended_at: datetime = None, manifest: dict = None,
                   error_message: str = None):
        """Update a run record."""
        sql = """
        UPDATE etl_admin.etl_run
        SET fetched_count = %s,
            loaded_count = %s,
            updated_count = %s,
            skipped_count = %s,
            error_count = %s,
            unknown_fields = %s,
            status = %s,
            ended_at = %s,
            manifest = %s,
            error_message = %s
        WHERE run_id = %s
        """

        self.db.execute(
            sql,
            (counts.get("fetched", 0), counts.get("inserted", 0),
             counts.get("updated", 0), counts.get("skipped", 0),
             counts.get("errors", 0), counts.get("unknown_fields", 0),
             status, ended_at,
             json.dumps(manifest or {}, ensure_ascii=False),
             error_message, run_id)
        )
        self.db.commit()
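The `counts` mapping consumed by `update_run` uses the keys below; anything missing defaults to 0:

```python
counts = {
    "fetched": 1200,      # rows pulled from the API
    "inserted": 1150,     # new rows written
    "updated": 45,        # rows upserted over existing ones
    "skipped": 5,         # rows ignored (dedup etc.)
    "errors": 0,
    "unknown_fields": 0,  # schema-drift counter
}
# run_tracker.update_run(run_id=run_id, counts=counts, status="SUCC")
```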
234
etl_billiards/orchestration/scheduler.py
Normal file
@@ -0,0 +1,234 @@
# -*- coding: utf-8 -*-
"""ETL scheduling: online fetch, offline ingest, or the full pipeline."""
from __future__ import annotations

import uuid
from datetime import datetime
from pathlib import Path
from zoneinfo import ZoneInfo

from api.client import APIClient
from api.local_json_client import LocalJsonClient
from api.recording_client import RecordingAPIClient
from database.connection import DatabaseConnection
from database.operations import DatabaseOperations
from orchestration.cursor_manager import CursorManager
from orchestration.run_tracker import RunTracker
from orchestration.task_registry import default_registry


class ETLScheduler:
    """Schedules multiple tasks, running fetch/ingest stages according to pipeline.flow."""

    def __init__(self, config, logger):
        self.config = config
        self.logger = logger
        self.tz = ZoneInfo(config.get("app.timezone", "Asia/Taipei"))

        self.pipeline_flow = str(config.get("pipeline.flow", "FULL") or "FULL").upper()
        self.fetch_root = Path(config.get("pipeline.fetch_root") or config["io"]["export_root"])
        self.ingest_source_dir = config.get("pipeline.ingest_source_dir") or ""
        self.write_pretty_json = bool(config.get("io.write_pretty_json", False))

        # Components
        self.db_conn = DatabaseConnection(
            dsn=config["db"]["dsn"],
            session=config["db"].get("session"),
            connect_timeout=config["db"].get("connect_timeout_sec"),
        )
        self.db_ops = DatabaseOperations(self.db_conn)

        self.api_client = APIClient(
            base_url=config["api"]["base_url"],
            token=config["api"]["token"],
            timeout=config["api"]["timeout_sec"],
            retry_max=config["api"]["retries"]["max_attempts"],
            headers_extra=config["api"].get("headers_extra"),
        )

        self.cursor_mgr = CursorManager(self.db_conn)
        self.run_tracker = RunTracker(self.db_conn)
        self.task_registry = default_registry

    # ------------------------------------------------------------------ public
    def run_tasks(self, task_codes: list | None = None):
        """Run tasks from the given list, or from config when none is passed."""
        run_uuid = uuid.uuid4().hex
        store_id = self.config.get("app.store_id")

        if not task_codes:
            task_codes = self.config.get("run.tasks", [])

        self.logger.info("Starting tasks: %s, run_uuid=%s", task_codes, run_uuid)

        for task_code in task_codes:
            try:
                self._run_single_task(task_code, run_uuid, store_id)
            except Exception as exc:  # noqa: BLE001
                self.logger.error("Task %s failed: %s", task_code, exc, exc_info=True)
                continue

        self.logger.info("All tasks completed")

    # ------------------------------------------------------------------ internals
    def _run_single_task(self, task_code: str, run_uuid: str, store_id: int):
        """Fetch/ingest orchestration for a single task."""
        task_cfg = self._load_task_config(task_code, store_id)
        if not task_cfg:
            self.logger.warning("Task %s is disabled or does not exist", task_code)
            return

        task_id = task_cfg["task_id"]
        cursor_data = self.cursor_mgr.get_or_create(task_id, store_id)

        # Run record
        export_dir = Path(self.config["io"]["export_root"]) / datetime.now(self.tz).strftime("%Y%m%d")
        log_path = str(Path(self.config["io"]["log_root"]) / f"{run_uuid}.log")
        run_id = self.run_tracker.create_run(
            task_id=task_id,
            store_id=store_id,
            run_uuid=run_uuid,
            export_dir=str(export_dir),
            log_path=log_path,
            status=self._map_run_status("RUNNING"),
        )

        # Prepare the directory for the fetch stage
        fetch_dir = self._build_fetch_dir(task_code, run_id)
        fetch_stats = None

        try:
            if self._flow_includes_fetch():
                fetch_stats = self._execute_fetch(task_code, cursor_data, fetch_dir, run_id)
                if self.pipeline_flow == "FETCH_ONLY":
                    counts = self._counts_from_fetch(fetch_stats)
                    self.run_tracker.update_run(
                        run_id=run_id,
                        counts=counts,
                        status=self._map_run_status("SUCCESS"),
                        ended_at=datetime.now(self.tz),
                    )
                    return

            if self._flow_includes_ingest():
                source_dir = self._resolve_ingest_source(fetch_dir, fetch_stats)
                result = self._execute_ingest(task_code, cursor_data, source_dir)

                self.run_tracker.update_run(
                    run_id=run_id,
                    counts=result["counts"],
                    status=self._map_run_status(result["status"]),
                    ended_at=datetime.now(self.tz),
                )

                if (result.get("status") or "").upper() == "SUCCESS":
                    window = result.get("window")
                    if window:
                        self.cursor_mgr.advance(
                            task_id=task_id,
                            store_id=store_id,
                            window_start=window.get("start"),
                            window_end=window.get("end"),
                            run_id=run_id,
                        )

        except Exception as exc:  # noqa: BLE001
            self.run_tracker.update_run(
                run_id=run_id,
                counts={},
                status=self._map_run_status("FAIL"),
                ended_at=datetime.now(self.tz),
                error_message=str(exc),
            )
            raise

    def _execute_fetch(self, task_code: str, cursor_data: dict | None, fetch_dir: Path, run_id: int):
        """Online fetch stage: pull via RecordingAPIClient and dump to disk; no transform/load."""
        recording_client = RecordingAPIClient(
            base_client=self.api_client,
            output_dir=fetch_dir,
            task_code=task_code,
            run_id=run_id,
            write_pretty=self.write_pretty_json,
        )
        task = self.task_registry.create_task(task_code, self.config, self.db_ops, recording_client, self.logger)
        context = task._build_context(cursor_data)  # type: ignore[attr-defined]
        self.logger.info("%s: fetch stage started, dir=%s", task_code, fetch_dir)

        extracted = task.extract(context)
        # Fetch is done; transform/load is intentionally skipped here
        stats = recording_client.last_dump or {}
        fetched_count = stats.get("records") or (
            len(extracted.get("records", [])) if isinstance(extracted, dict) else 0
        )
        self.logger.info(
            "%s: fetch complete, file=%s, records=%s",
            task_code,
            stats.get("file"),
            fetched_count,
        )
        return {"file": stats.get("file"), "records": fetched_count, "pages": stats.get("pages")}

    def _execute_ingest(self, task_code: str, cursor_data: dict | None, source_dir: Path):
        """Local ingest: replay JSON via LocalJsonClient through the task's normal ETL path."""
        local_client = LocalJsonClient(source_dir)
        task = self.task_registry.create_task(task_code, self.config, self.db_ops, local_client, self.logger)
        self.logger.info("%s: local ingest started, source dir=%s", task_code, source_dir)
        return task.execute(cursor_data)

    def _build_fetch_dir(self, task_code: str, run_id: int) -> Path:
        ts = datetime.now(self.tz).strftime("%Y%m%d-%H%M%S")
        return Path(self.fetch_root) / f"{task_code.upper()}-{run_id}-{ts}"

    def _resolve_ingest_source(self, fetch_dir: Path, fetch_stats: dict | None) -> Path:
        if fetch_stats and fetch_dir.exists():
            return fetch_dir
        if self.ingest_source_dir:
            return Path(self.ingest_source_dir)
        raise FileNotFoundError("No JSON directory available for local ingest")

    def _counts_from_fetch(self, stats: dict | None) -> dict:
        fetched = (stats or {}).get("records") or 0
        return {
            "fetched": fetched,
            "inserted": 0,
            "updated": 0,
            "skipped": 0,
            "errors": 0,
        }

    def _flow_includes_fetch(self) -> bool:
        return self.pipeline_flow in {"FETCH_ONLY", "FULL"}

    def _flow_includes_ingest(self) -> bool:
        return self.pipeline_flow in {"INGEST_ONLY", "FULL"}

    def _load_task_config(self, task_code: str, store_id: int) -> dict | None:
        """Load task configuration from the database."""
        sql = """
        SELECT task_id, task_code, store_id, enabled, cursor_field,
               window_minutes_default, overlap_seconds, page_size, retry_max, params
        FROM etl_admin.etl_task
        WHERE store_id = %s AND task_code = %s AND enabled = TRUE
        """

        rows = self.db_conn.query(sql, (store_id, task_code))
        return rows[0] if rows else None

    def close(self):
        """Close connections."""
        self.db_conn.close()

    @staticmethod
    def _map_run_status(status: str) -> str:
        """
        Map a task-reported status onto etl_admin.run_status_enum
        (SUCC / FAIL / PARTIAL).
        """
        normalized = (status or "").upper()
        if normalized in {"SUCCESS", "SUCC"}:
            return "SUCC"
        if normalized in {"FAIL", "FAILED", "ERROR"}:
            return "FAIL"
        if normalized in {"RUNNING", "PARTIAL", "PENDING", "IN_PROGRESS"}:
            return "PARTIAL"
        # Unknown statuses are marked FAIL so they surface during triage
        return "FAIL"
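Putting the flow together, a minimal driver might look like the sketch below; the config object is assumed to be produced by the project's config layer, and `pipeline.flow` decides which stages run (FETCH_ONLY dumps JSON, INGEST_ONLY replays it, FULL does both):

```python
import logging

logger = logging.getLogger("etl")
scheduler = ETLScheduler(config, logger)  # config assumed loaded elsewhere
try:
    scheduler.run_tasks(["ORDERS", "PAYMENTS"])
finally:
    scheduler.close()
```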
68
etl_billiards/orchestration/task_registry.py
Normal file
@@ -0,0 +1,68 @@
# -*- coding: utf-8 -*-
"""Task registry."""
from tasks.orders_task import OrdersTask
from tasks.payments_task import PaymentsTask
from tasks.members_task import MembersTask
from tasks.products_task import ProductsTask
from tasks.tables_task import TablesTask
from tasks.assistants_task import AssistantsTask
from tasks.packages_task import PackagesDefTask
from tasks.refunds_task import RefundsTask
from tasks.coupon_usage_task import CouponUsageTask
from tasks.inventory_change_task import InventoryChangeTask
from tasks.topups_task import TopupsTask
from tasks.table_discount_task import TableDiscountTask
from tasks.assistant_abolish_task import AssistantAbolishTask
from tasks.ledger_task import LedgerTask
from tasks.ods_tasks import ODS_TASK_CLASSES
from tasks.ticket_dwd_task import TicketDwdTask
from tasks.manual_ingest_task import ManualIngestTask
from tasks.payments_dwd_task import PaymentsDwdTask
from tasks.members_dwd_task import MembersDwdTask


class TaskRegistry:
    """Task registration and factory."""

    def __init__(self):
        self._tasks = {}

    def register(self, task_code: str, task_class):
        """Register a task class."""
        self._tasks[task_code.upper()] = task_class

    def create_task(self, task_code: str, config, db_connection, api_client, logger):
        """Create a task instance."""
        task_code = task_code.upper()
        if task_code not in self._tasks:
            raise ValueError(f"Unknown task type: {task_code}")

        task_class = self._tasks[task_code]
        return task_class(config, db_connection, api_client, logger)

    def get_all_task_codes(self) -> list:
        """Return all registered task codes."""
        return list(self._tasks.keys())


# Default registry
default_registry = TaskRegistry()
default_registry.register("PRODUCTS", ProductsTask)
default_registry.register("TABLES", TablesTask)
default_registry.register("MEMBERS", MembersTask)
default_registry.register("ASSISTANTS", AssistantsTask)
default_registry.register("PACKAGES_DEF", PackagesDefTask)
default_registry.register("ORDERS", OrdersTask)
default_registry.register("PAYMENTS", PaymentsTask)
default_registry.register("REFUNDS", RefundsTask)
default_registry.register("COUPON_USAGE", CouponUsageTask)
default_registry.register("INVENTORY_CHANGE", InventoryChangeTask)
default_registry.register("TOPUPS", TopupsTask)
default_registry.register("TABLE_DISCOUNT", TableDiscountTask)
default_registry.register("ASSISTANT_ABOLISH", AssistantAbolishTask)
default_registry.register("LEDGER", LedgerTask)
default_registry.register("TICKET_DWD", TicketDwdTask)
default_registry.register("MANUAL_INGEST", ManualIngestTask)
default_registry.register("PAYMENTS_DWD", PaymentsDwdTask)
default_registry.register("MEMBERS_DWD", MembersDwdTask)
for code, task_cls in ODS_TASK_CLASSES.items():
    default_registry.register(code, task_cls)
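Resolving and running a task through the registry then looks like this; config, db_ops, api_client, and logger are assumed to be constructed as in ETLScheduler, and the result shape follows what the scheduler reads back:

```python
task = default_registry.create_task("ORDERS", config, db_ops, api_client, logger)
result = task.execute(cursor_data)  # {'status': ..., 'counts': {...}, 'window': {...}}

default_registry.get_all_task_codes()  # ['PRODUCTS', 'TABLES', ..., plus ODS_* codes]
```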
0
etl_billiards/quality/__init__.py
Normal file
73
etl_billiards/quality/balance_checker.py
Normal file
@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
"""Balance consistency checker."""
from .base_checker import BaseDataQualityChecker


class BalanceChecker(BaseDataQualityChecker):
    """Checks amount consistency across orders, payments, and refunds."""

    def check(self, store_id: int, start_date: str, end_date: str) -> dict:
        """
        Check balance consistency over the given date range.

        Verifies: order total = payment total - refund total
        """
        checks = []

        # Order total
        sql_orders = """
        SELECT COALESCE(SUM(final_amount), 0) AS total
        FROM billiards.fact_order
        WHERE store_id = %s
          AND order_time >= %s
          AND order_time < %s
          AND order_status = 'COMPLETED'
        """
        order_total = self.db.query(sql_orders, (store_id, start_date, end_date))[0]["total"]

        # Payment total
        sql_payments = """
        SELECT COALESCE(SUM(pay_amount), 0) AS total
        FROM billiards.fact_payment
        WHERE store_id = %s
          AND pay_time >= %s
          AND pay_time < %s
          AND pay_status = 'SUCCESS'
        """
        payment_total = self.db.query(sql_payments, (store_id, start_date, end_date))[0]["total"]

        # Refund total
        sql_refunds = """
        SELECT COALESCE(SUM(refund_amount), 0) AS total
        FROM billiards.fact_refund
        WHERE store_id = %s
          AND refund_time >= %s
          AND refund_time < %s
          AND refund_status = 'SUCCESS'
        """
        refund_total = self.db.query(sql_refunds, (store_id, start_date, end_date))[0]["total"]

        # Verify the balance
        expected_total = payment_total - refund_total
        diff = abs(float(order_total) - float(expected_total))
        threshold = 0.01  # one-cent tolerance

        passed = diff < threshold

        checks.append({
            "name": "balance_consistency",
            "passed": passed,
            "message": f"order total: {order_total}, payments - refunds: {expected_total}, diff: {diff}",
            "details": {
                "order_total": float(order_total),
                "payment_total": float(payment_total),
                "refund_total": float(refund_total),
                "diff": diff
            }
        })

        all_passed = all(c["passed"] for c in checks)

        return {
            "passed": all_passed,
            "checks": checks
        }
19
etl_billiards/quality/base_checker.py
Normal file
@@ -0,0 +1,19 @@
# -*- coding: utf-8 -*-
"""Base class for data quality checkers."""


class BaseDataQualityChecker:
    """Base class for data quality checkers."""

    def __init__(self, db_connection, logger):
        self.db = db_connection
        self.logger = logger

    def check(self) -> dict:
        """
        Run the quality check.
        Returns: {
            "passed": bool,
            "checks": [{"name": str, "passed": bool, "message": str}]
        }
        """
        raise NotImplementedError("Subclasses must implement check()")
5
etl_billiards/requirements.txt
Normal file
@@ -0,0 +1,5 @@
# Python dependencies
psycopg2-binary>=2.9.0
requests>=2.28.0
python-dateutil>=2.8.0
tzdata>=2023.0
0
etl_billiards/scd/__init__.py
Normal file
89
etl_billiards/scd/scd2_handler.py
Normal file
@@ -0,0 +1,89 @@
# -*- coding: utf-8 -*-
"""SCD2 (Slowly Changing Dimension Type 2) handling logic."""
from datetime import datetime


def _row_to_dict(cursor, row):
    if row is None:
        return None
    columns = [desc[0] for desc in cursor.description]
    return {col: row[idx] for idx, col in enumerate(columns)}


class SCD2Handler:
    """SCD2 history tracking."""

    def __init__(self, db_ops):
        self.db = db_ops

    def upsert(
        self,
        table_name: str,
        natural_key: list,
        tracked_fields: list,
        record: dict,
        effective_date: datetime = None,
    ) -> str:
        """
        Apply an SCD2 upsert.

        Returns:
            Operation type: 'INSERT', 'UPDATE', or 'UNCHANGED'
        """
        effective_date = effective_date or datetime.now()

        where_clause = " AND ".join([f"{k} = %({k})s" for k in natural_key])
        sql_select = f"""
        SELECT * FROM {table_name}
        WHERE {where_clause}
          AND valid_to IS NULL
        """

        with self.db.conn.cursor() as current:
            current.execute(sql_select, record)
            existing = _row_to_dict(current, current.fetchone())

            if not existing:
                record["valid_from"] = effective_date
                record["valid_to"] = None
                record["is_current"] = True

                fields = list(record.keys())
                placeholders = ", ".join([f"%({f})s" for f in fields])
                sql_insert = f"""
                INSERT INTO {table_name} ({', '.join(fields)})
                VALUES ({placeholders})
                """
                current.execute(sql_insert, record)
                return "INSERT"

            has_changes = any(existing.get(field) != record.get(field) for field in tracked_fields)
            if not has_changes:
                return "UNCHANGED"

            update_where = " AND ".join([f"{k} = %({k})s" for k in natural_key])
            sql_close = f"""
            UPDATE {table_name}
            SET valid_to = %(effective_date)s,
                is_current = FALSE
            WHERE {update_where}
              AND valid_to IS NULL
            """
            record["effective_date"] = effective_date
            current.execute(sql_close, record)

            record["valid_from"] = effective_date
            record["valid_to"] = None
            record["is_current"] = True

            fields = list(record.keys())
            if "effective_date" in fields:
                fields.remove("effective_date")
            placeholders = ", ".join([f"%({f})s" for f in fields])
            sql_insert = f"""
            INSERT INTO {table_name} ({', '.join(fields)})
            VALUES ({placeholders})
            """
            current.execute(sql_insert, record)

            return "UPDATE"
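A usage sketch against a hypothetical SCD2 dimension table that carries the `valid_from` / `valid_to` / `is_current` columns the handler manages:

```python
handler = SCD2Handler(db_ops)  # db_ops assumed, as in the loaders
action = handler.upsert(
    table_name="billiards.dim_member_scd",  # hypothetical table
    natural_key=["store_id", "member_id"],
    tracked_fields=["member_name", "phone", "level"],
    record={"store_id": 1, "member_id": 99,
            "member_name": "A. Chen", "phone": "0912-000-000", "level": "GOLD"},
)
# 'INSERT'    -> no current row for the key; a new current row was created
# 'UNCHANGED' -> current row matches on every tracked field
# 'UPDATE'    -> old row closed (valid_to set), new current row inserted
```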
0
etl_billiards/scripts/Temp1.py
Normal file
76
etl_billiards/scripts/bootstrap_schema.py
Normal file
@@ -0,0 +1,76 @@
# -*- coding: utf-8 -*-
"""Apply the PRD-aligned warehouse schema (ODS/DWD/DWS) to PostgreSQL."""
from __future__ import annotations

import argparse
import os
import sys
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parents[1]
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

from database.connection import DatabaseConnection  # noqa: E402


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Create/upgrade warehouse schemas using schema_v2.sql"
    )
    parser.add_argument(
        "--dsn",
        help="PostgreSQL DSN (fallback to PG_DSN env)",
        default=os.environ.get("PG_DSN"),
    )
    parser.add_argument(
        "--file",
        help="Path to schema SQL",
        default=str(PROJECT_ROOT / "database" / "schema_v2.sql"),
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=int(os.environ.get("PG_CONNECT_TIMEOUT", 10) or 10),
        help="connect_timeout seconds (capped at 20, default 10)",
    )
    return parser.parse_args()


def apply_schema(dsn: str, sql_path: Path, timeout: int) -> None:
    if not sql_path.exists():
        raise FileNotFoundError(f"Schema file not found: {sql_path}")

    sql_text = sql_path.read_text(encoding="utf-8")
    timeout_val = max(1, min(timeout, 20))

    conn = DatabaseConnection(dsn, connect_timeout=timeout_val)
    try:
        with conn.conn.cursor() as cur:
            cur.execute(sql_text)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def main() -> int:
    args = parse_args()
    if not args.dsn:
        print("Missing DSN. Set PG_DSN or pass --dsn.", file=sys.stderr)
        return 2

    try:
        apply_schema(args.dsn, Path(args.file), args.timeout)
    except Exception as exc:  # pragma: no cover - utility script
        print(f"Schema apply failed: {exc}", file=sys.stderr)
        return 1

    print("Schema applied successfully.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
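The same schema apply can also be driven programmatically; the import path and DSN below are assumptions for illustration:

```python
from pathlib import Path

from scripts.bootstrap_schema import apply_schema  # import path assumed

apply_schema(
    dsn="postgresql://user:pwd@host:5432/LLZQ",            # illustrative DSN
    sql_path=Path("etl_billiards/database/schema_v2.sql"),
    timeout=10,  # clamped to 1..20 inside apply_schema
)
```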
425
etl_billiards/scripts/build_dwd_from_ods.py
Normal file
425
etl_billiards/scripts/build_dwd_from_ods.py
Normal file
@@ -0,0 +1,425 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Populate PRD DWD tables from ODS payload snapshots."""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import os
|
||||
import sys
|
||||
|
||||
import psycopg2
|
||||
|
||||
|
||||
SQL_STEPS: list[tuple[str, str]] = [
|
||||
(
|
||||
"dim_tenant",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.dim_tenant (tenant_id, tenant_name, status)
|
||||
SELECT DISTINCT tenant_id, 'default' AS tenant_name, 'active' AS status
|
||||
FROM (
|
||||
SELECT tenant_id FROM billiards_ods.ods_order_settle
|
||||
UNION SELECT tenant_id FROM billiards_ods.ods_order_receipt_detail
|
||||
UNION SELECT tenant_id FROM billiards_ods.ods_member_profile
|
||||
) s
|
||||
WHERE tenant_id IS NOT NULL
|
||||
ON CONFLICT (tenant_id) DO UPDATE SET updated_at = now();
|
||||
""",
|
||||
),
|
||||
(
|
||||
"dim_site",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.dim_site (site_id, tenant_id, site_name, status)
|
||||
SELECT DISTINCT site_id, MAX(tenant_id) AS tenant_id, 'default' AS site_name, 'active' AS status
|
||||
FROM (
|
||||
SELECT site_id, tenant_id FROM billiards_ods.ods_order_settle
|
||||
UNION SELECT site_id, tenant_id FROM billiards_ods.ods_order_receipt_detail
|
||||
UNION SELECT site_id, tenant_id FROM billiards_ods.ods_table_info
|
||||
) s
|
||||
WHERE site_id IS NOT NULL
|
||||
GROUP BY site_id
|
||||
ON CONFLICT (site_id) DO UPDATE SET updated_at = now();
|
||||
""",
|
||||
),
|
||||
(
|
||||
"dim_product_category",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.dim_product_category (category_id, category_name, parent_id, level_no, status)
|
||||
SELECT DISTINCT category_id, category_name, parent_id, level_no, status
|
||||
FROM billiards_ods.ods_goods_category
|
||||
WHERE category_id IS NOT NULL
|
||||
ON CONFLICT (category_id) DO UPDATE SET
|
||||
category_name = EXCLUDED.category_name,
|
||||
parent_id = EXCLUDED.parent_id,
|
||||
level_no = EXCLUDED.level_no,
|
||||
status = EXCLUDED.status;
|
||||
""",
|
||||
),
|
||||
(
|
||||
"dim_product",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.dim_product (goods_id, goods_name, goods_code, category_id, category_name, unit, default_price, status)
|
||||
SELECT DISTINCT goods_id, goods_name, NULL::TEXT AS goods_code, category_id, category_name, NULL::TEXT AS unit, sale_price AS default_price, status
|
||||
FROM billiards_ods.ods_store_product
|
||||
WHERE goods_id IS NOT NULL
|
||||
ON CONFLICT (goods_id) DO UPDATE SET
|
||||
goods_name = EXCLUDED.goods_name,
|
||||
category_id = EXCLUDED.category_id,
|
||||
category_name = EXCLUDED.category_name,
|
||||
default_price = EXCLUDED.default_price,
|
||||
status = EXCLUDED.status,
|
||||
updated_at = now();
|
||||
""",
|
||||
),
|
||||
(
|
||||
"dim_product_from_sales",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.dim_product (goods_id, goods_name)
|
||||
SELECT DISTINCT goods_id, goods_name
|
||||
FROM billiards_ods.ods_store_sale_item
|
||||
WHERE goods_id IS NOT NULL
|
||||
ON CONFLICT (goods_id) DO NOTHING;
|
||||
""",
|
||||
),
|
||||
(
|
||||
"dim_member_card_type",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.dim_member_card_type (card_type_id, card_type_name, discount_rate)
|
||||
SELECT DISTINCT card_type_id, card_type_name, discount_rate
|
||||
FROM billiards_ods.ods_member_card
|
||||
WHERE card_type_id IS NOT NULL
|
||||
ON CONFLICT (card_type_id) DO UPDATE SET
|
||||
card_type_name = EXCLUDED.card_type_name,
|
||||
discount_rate = EXCLUDED.discount_rate;
|
||||
""",
|
||||
),
|
||||
(
|
||||
"dim_member",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.dim_member (
|
||||
site_id, member_id, tenant_id, member_name, nickname, gender, birthday, mobile,
|
||||
member_type_id, member_type_name, status, register_time, last_visit_time,
|
||||
balance, total_recharge_amount, total_consumed_amount, wechat_id, alipay_id, remark
|
||||
)
|
||||
SELECT DISTINCT
|
||||
prof.site_id,
|
||||
prof.member_id,
|
||||
prof.tenant_id,
|
||||
prof.member_name,
|
||||
prof.nickname,
|
||||
prof.gender,
|
||||
prof.birthday,
|
||||
prof.mobile,
|
||||
card.member_type_id,
|
||||
card.member_type_name,
|
||||
prof.status,
|
||||
prof.register_time,
|
||||
prof.last_visit_time,
|
||||
prof.balance,
|
||||
NULL::NUMERIC AS total_recharge_amount,
|
||||
NULL::NUMERIC AS total_consumed_amount,
|
||||
prof.wechat_id,
|
||||
prof.alipay_id,
|
||||
prof.remarks
|
||||
FROM billiards_ods.ods_member_profile prof
|
||||
LEFT JOIN (
|
||||
SELECT DISTINCT site_id, member_id, card_type_id AS member_type_id, card_type_name AS member_type_name
|
||||
FROM billiards_ods.ods_member_card
|
||||
) card
|
||||
ON prof.site_id = card.site_id AND prof.member_id = card.member_id
|
||||
WHERE prof.member_id IS NOT NULL
|
||||
ON CONFLICT (site_id, member_id) DO UPDATE SET
|
||||
member_name = EXCLUDED.member_name,
|
||||
nickname = EXCLUDED.nickname,
|
||||
gender = EXCLUDED.gender,
|
||||
birthday = EXCLUDED.birthday,
|
||||
mobile = EXCLUDED.mobile,
|
||||
member_type_id = EXCLUDED.member_type_id,
|
||||
member_type_name = EXCLUDED.member_type_name,
|
||||
status = EXCLUDED.status,
|
||||
register_time = EXCLUDED.register_time,
|
||||
last_visit_time = EXCLUDED.last_visit_time,
|
||||
balance = EXCLUDED.balance,
|
||||
wechat_id = EXCLUDED.wechat_id,
|
||||
alipay_id = EXCLUDED.alipay_id,
|
||||
remark = EXCLUDED.remark,
|
||||
updated_at = now();
|
||||
""",
|
||||
),
|
||||
(
|
||||
"dim_table",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.dim_table (table_id, site_id, table_code, table_name, table_type, area_name, status, created_time, updated_time)
|
||||
SELECT DISTINCT table_id, site_id, table_code, table_name, table_type, area_name, status, created_time, updated_time
|
||||
FROM billiards_ods.ods_table_info
|
||||
WHERE table_id IS NOT NULL
|
||||
ON CONFLICT (table_id) DO UPDATE SET
|
||||
site_id = EXCLUDED.site_id,
|
||||
table_code = EXCLUDED.table_code,
|
||||
table_name = EXCLUDED.table_name,
|
||||
table_type = EXCLUDED.table_type,
|
||||
area_name = EXCLUDED.area_name,
|
||||
status = EXCLUDED.status,
|
||||
created_time = EXCLUDED.created_time,
|
||||
updated_time = EXCLUDED.updated_time;
|
||||
""",
|
||||
),
|
||||
(
|
||||
"dim_assistant",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.dim_assistant (assistant_id, assistant_name, mobile, status)
|
||||
SELECT DISTINCT assistant_id, assistant_name, mobile, status
|
||||
FROM billiards_ods.ods_assistant_account
|
||||
WHERE assistant_id IS NOT NULL
|
||||
ON CONFLICT (assistant_id) DO UPDATE SET
|
||||
assistant_name = EXCLUDED.assistant_name,
|
||||
mobile = EXCLUDED.mobile,
|
||||
status = EXCLUDED.status,
|
||||
updated_at = now();
|
||||
""",
|
||||
),
|
||||
(
|
||||
"dim_pay_method",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.dim_pay_method (pay_method_code, pay_method_name, is_stored_value, status)
|
||||
SELECT DISTINCT pay_method_code, pay_method_name, FALSE AS is_stored_value, 'active' AS status
|
||||
FROM billiards_ods.ods_payment_record
|
||||
WHERE pay_method_code IS NOT NULL
|
||||
ON CONFLICT (pay_method_code) DO UPDATE SET
|
||||
pay_method_name = EXCLUDED.pay_method_name,
|
||||
status = EXCLUDED.status,
|
||||
updated_at = now();
|
||||
""",
|
||||
),
|
||||
(
|
||||
"dim_coupon_platform",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.dim_coupon_platform (platform_code, platform_name)
|
||||
SELECT DISTINCT platform_code, platform_code AS platform_name
|
||||
FROM billiards_ods.ods_platform_coupon_log
|
||||
WHERE platform_code IS NOT NULL
|
||||
ON CONFLICT (platform_code) DO NOTHING;
|
||||
""",
|
||||
),
|
||||
(
|
||||
"fact_sale_item",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.fact_sale_item (
|
||||
site_id, sale_item_id, order_trade_no, order_settle_id, member_id,
|
||||
goods_id, category_id, quantity, original_amount, discount_amount,
|
||||
final_amount, is_gift, sale_time
|
||||
)
|
||||
SELECT
|
||||
site_id,
|
||||
sale_item_id,
|
||||
order_trade_no,
|
||||
order_settle_id,
|
||||
NULL::BIGINT AS member_id,
|
||||
goods_id,
|
||||
category_id,
|
||||
quantity,
|
||||
original_amount,
|
||||
discount_amount,
|
||||
final_amount,
|
||||
COALESCE(is_gift, FALSE),
|
||||
sale_time
|
||||
FROM billiards_ods.ods_store_sale_item
|
||||
ON CONFLICT (site_id, sale_item_id) DO NOTHING;
|
||||
""",
|
||||
),
|
||||
(
|
||||
"fact_table_usage",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.fact_table_usage (
|
||||
site_id, ledger_id, order_trade_no, order_settle_id, table_id,
|
||||
member_id, start_time, end_time, duration_minutes,
|
||||
original_table_fee, member_discount_amount, manual_discount_amount,
|
||||
final_table_fee, is_canceled, cancel_time
|
||||
)
|
||||
SELECT
|
||||
site_id,
|
||||
ledger_id,
|
||||
order_trade_no,
|
||||
order_settle_id,
|
||||
table_id,
|
||||
member_id,
|
||||
start_time,
|
||||
end_time,
|
||||
duration_minutes,
|
||||
original_table_fee,
|
||||
0::NUMERIC AS member_discount_amount,
|
||||
discount_amount AS manual_discount_amount,
|
||||
final_table_fee,
|
||||
FALSE AS is_canceled,
|
||||
NULL::TIMESTAMPTZ AS cancel_time
|
||||
FROM billiards_ods.ods_table_use_log
|
||||
ON CONFLICT (site_id, ledger_id) DO NOTHING;
|
||||
""",
|
||||
),
|
||||
(
|
||||
"fact_assistant_service",
|
||||
"""
|
||||
INSERT INTO billiards_dwd.fact_assistant_service (
|
||||
            site_id, ledger_id, order_trade_no, order_settle_id, assistant_id,
            assist_type_code, member_id, start_time, end_time, duration_minutes,
            original_fee, member_discount_amount, manual_discount_amount,
            final_fee, is_canceled, cancel_time
        )
        SELECT
            site_id,
            ledger_id,
            order_trade_no,
            order_settle_id,
            assistant_id,
            NULL::TEXT AS assist_type_code,
            member_id,
            start_time,
            end_time,
            duration_minutes,
            original_fee,
            0::NUMERIC AS member_discount_amount,
            discount_amount AS manual_discount_amount,
            final_fee,
            FALSE AS is_canceled,
            NULL::TIMESTAMPTZ AS cancel_time
        FROM billiards_ods.ods_assistant_service_log
        ON CONFLICT (site_id, ledger_id) DO NOTHING;
        """,
    ),
    (
        "fact_coupon_usage",
        """
        INSERT INTO billiards_dwd.fact_coupon_usage (
            site_id, coupon_id, package_id, order_trade_no, order_settle_id,
            member_id, platform_code, status, deduct_amount, settle_price, used_time
        )
        SELECT
            site_id,
            coupon_id,
            NULL::BIGINT AS package_id,
            order_trade_no,
            order_settle_id,
            member_id,
            platform_code,
            status,
            deduct_amount,
            settle_price,
            used_time
        FROM billiards_ods.ods_platform_coupon_log
        ON CONFLICT (site_id, coupon_id) DO NOTHING;
        """,
    ),
    (
        "fact_payment",
        """
        INSERT INTO billiards_dwd.fact_payment (
            site_id, pay_id, order_trade_no, order_settle_id, member_id,
            pay_method_code, pay_amount, pay_time, relate_type, relate_id
        )
        SELECT
            site_id,
            pay_id,
            order_trade_no,
            order_settle_id,
            member_id,
            pay_method_code,
            pay_amount,
            pay_time,
            relate_type,
            relate_id
        FROM billiards_ods.ods_payment_record
        ON CONFLICT (site_id, pay_id) DO NOTHING;
        """,
    ),
    (
        "fact_refund",
        """
        INSERT INTO billiards_dwd.fact_refund (
            site_id, refund_id, order_trade_no, order_settle_id, member_id,
            pay_method_code, refund_amount, refund_time, status
        )
        SELECT
            site_id,
            refund_id,
            order_trade_no,
            order_settle_id,
            member_id,
            pay_method_code,
            refund_amount,
            refund_time,
            status
        FROM billiards_ods.ods_refund_record
        ON CONFLICT (site_id, refund_id) DO NOTHING;
        """,
    ),
    (
        "fact_balance_change",
        """
        INSERT INTO billiards_dwd.fact_balance_change (
            site_id, change_id, member_id, change_type, relate_type, relate_id,
            pay_method_code, change_amount, balance_before, balance_after, change_time
        )
        SELECT
            site_id,
            change_id,
            member_id,
            change_type,
            NULL::TEXT AS relate_type,
            relate_id,
            NULL::TEXT AS pay_method_code,
            change_amount,
            balance_before,
            balance_after,
            change_time
        FROM billiards_ods.ods_balance_change
        ON CONFLICT (site_id, change_id) DO NOTHING;
        """,
    ),
]


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Build DWD tables from ODS payloads (PRD schema).")
    parser.add_argument(
        "--dsn",
        default=os.environ.get("PG_DSN"),
        help="PostgreSQL DSN (fallback PG_DSN env)",
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=int(os.environ.get("PG_CONNECT_TIMEOUT", 10) or 10),
        help="connect_timeout seconds (capped at 20, default 10)",
    )
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    if not args.dsn:
        print("Missing DSN. Use --dsn or PG_DSN.", file=sys.stderr)
        return 2

    timeout_val = max(1, min(args.timeout, 20))
    conn = psycopg2.connect(args.dsn, connect_timeout=timeout_val)
    conn.autocommit = False
    try:
        with conn.cursor() as cur:
            for name, sql in SQL_STEPS:
                cur.execute(sql)
                print(f"[OK] {name}")
        conn.commit()
    except Exception as exc:  # pragma: no cover - operational script
        conn.rollback()
        print(f"[FAIL] {exc}", file=sys.stderr)
        return 1
    finally:
        try:
            conn.close()
        except Exception:
            pass

    print("DWD build complete.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
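Each (name, sql) step above runs on one connection with autocommit off, and the commit happens only after the whole list succeeds; together with the trailing ON CONFLICT ... DO NOTHING clauses this makes reruns safe. A minimal sketch of that all-or-nothing pattern, with a hypothetical DSN and demo tables:

```python
import psycopg2

steps = [
    ("step_ok", "INSERT INTO demo_a SELECT 1 ON CONFLICT DO NOTHING;"),
    ("step_bad", "INSERT INTO missing_table SELECT 1;"),  # raises, triggering rollback
]

conn = psycopg2.connect("postgresql://user:pwd@host:5432/LLZQ")  # illustrative DSN
conn.autocommit = False
try:
    with conn.cursor() as cur:
        for name, sql in steps:
            cur.execute(sql)
    conn.commit()      # only reached when every step succeeded
except Exception:
    conn.rollback()    # demo_a stays untouched even though step_ok ran
finally:
    conn.close()
```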
322
etl_billiards/scripts/build_dws_order_summary.py
Normal file
@@ -0,0 +1,322 @@
# -*- coding: utf-8 -*-
"""Recompute billiards_dws.dws_order_summary from DWD fact tables."""
from __future__ import annotations

import argparse
import os
import sys
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parents[1]
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

from database.connection import DatabaseConnection  # noqa: E402


SQL_BUILD_SUMMARY = r"""
WITH table_fee AS (
    SELECT
        site_id,
        order_settle_id,
        order_trade_no,
        MIN(member_id) AS member_id,
        SUM(COALESCE(final_table_fee, 0)) AS table_fee_amount,
        SUM(COALESCE(member_discount_amount, 0)) AS member_discount_amount,
        SUM(COALESCE(manual_discount_amount, 0)) AS manual_discount_amount,
        SUM(COALESCE(original_table_fee, 0)) AS original_table_fee,
        MIN(start_time) AS first_time
    FROM billiards_dwd.fact_table_usage
    WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR start_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR start_time::date <= %(end_date)s)
      AND COALESCE(is_canceled, FALSE) = FALSE
    GROUP BY site_id, order_settle_id, order_trade_no
),
assistant_fee AS (
    SELECT
        site_id,
        order_settle_id,
        order_trade_no,
        MIN(member_id) AS member_id,
        SUM(COALESCE(final_fee, 0)) AS assistant_service_amount,
        SUM(COALESCE(member_discount_amount, 0)) AS member_discount_amount,
        SUM(COALESCE(manual_discount_amount, 0)) AS manual_discount_amount,
        SUM(COALESCE(original_fee, 0)) AS original_fee,
        MIN(start_time) AS first_time
    FROM billiards_dwd.fact_assistant_service
    WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR start_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR start_time::date <= %(end_date)s)
      AND COALESCE(is_canceled, FALSE) = FALSE
    GROUP BY site_id, order_settle_id, order_trade_no
),
goods_fee AS (
    SELECT
        site_id,
        order_settle_id,
        order_trade_no,
        MIN(member_id) AS member_id,
        SUM(COALESCE(final_amount, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS goods_amount,
        SUM(COALESCE(discount_amount, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS goods_discount_amount,
        SUM(COALESCE(original_amount, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS goods_original_amount,
        COUNT(*) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS item_count,
        SUM(COALESCE(quantity, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS total_item_quantity,
        MIN(sale_time) AS first_time
    FROM billiards_dwd.fact_sale_item
    WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR sale_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR sale_time::date <= %(end_date)s)
    GROUP BY site_id, order_settle_id, order_trade_no
),
coupon_usage AS (
    SELECT
        site_id,
        order_settle_id,
        order_trade_no,
        MIN(member_id) AS member_id,
        SUM(COALESCE(deduct_amount, 0)) AS coupon_deduction,
        SUM(COALESCE(settle_price, 0)) AS settle_price,
        MIN(used_time) AS first_time
    FROM billiards_dwd.fact_coupon_usage
    WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR used_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR used_time::date <= %(end_date)s)
    GROUP BY site_id, order_settle_id, order_trade_no
),
payments AS (
    SELECT
        fp.site_id,
        fp.order_settle_id,
        fp.order_trade_no,
        MIN(fp.member_id) AS member_id,
        SUM(COALESCE(fp.pay_amount, 0)) AS total_paid_amount,
        SUM(COALESCE(fp.pay_amount, 0)) FILTER (WHERE COALESCE(pm.is_stored_value, FALSE)) AS stored_card_deduct,
        SUM(COALESCE(fp.pay_amount, 0)) FILTER (WHERE NOT COALESCE(pm.is_stored_value, FALSE)) AS external_paid_amount,
        MIN(fp.pay_time) AS first_time
    FROM billiards_dwd.fact_payment fp
    LEFT JOIN billiards_dwd.dim_pay_method pm ON fp.pay_method_code = pm.pay_method_code
    WHERE (%(site_id)s IS NULL OR fp.site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR fp.pay_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR fp.pay_time::date <= %(end_date)s)
    GROUP BY fp.site_id, fp.order_settle_id, fp.order_trade_no
),
refunds AS (
    SELECT
        site_id,
        order_settle_id,
        order_trade_no,
        SUM(COALESCE(refund_amount, 0)) AS refund_amount
    FROM billiards_dwd.fact_refund
    WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
      AND (%(start_date)s IS NULL OR refund_time::date >= %(start_date)s)
      AND (%(end_date)s IS NULL OR refund_time::date <= %(end_date)s)
    GROUP BY site_id, order_settle_id, order_trade_no
),
combined_ids AS (
    SELECT site_id, order_settle_id, order_trade_no FROM table_fee
    UNION
    SELECT site_id, order_settle_id, order_trade_no FROM assistant_fee
    UNION
    SELECT site_id, order_settle_id, order_trade_no FROM goods_fee
    UNION
    SELECT site_id, order_settle_id, order_trade_no FROM coupon_usage
    UNION
    SELECT site_id, order_settle_id, order_trade_no FROM payments
    UNION
    SELECT site_id, order_settle_id, order_trade_no FROM refunds
),
site_dim AS (
    SELECT site_id, tenant_id FROM billiards_dwd.dim_site
)
INSERT INTO billiards_dws.dws_order_summary (
    site_id,
    order_settle_id,
    order_trade_no,
    order_date,
    tenant_id,
    member_id,
    member_flag,
    recharge_order_flag,
    item_count,
    total_item_quantity,
    table_fee_amount,
    assistant_service_amount,
    goods_amount,
    group_amount,
    total_coupon_deduction,
    member_discount_amount,
    manual_discount_amount,
    order_original_amount,
    order_final_amount,
    stored_card_deduct,
    external_paid_amount,
    total_paid_amount,
    book_table_flow,
    book_assistant_flow,
    book_goods_flow,
    book_group_flow,
    book_order_flow,
    order_effective_consume_cash,
    order_effective_recharge_cash,
    order_effective_flow,
    refund_amount,
    net_income,
    created_at,
    updated_at
)
SELECT
    c.site_id,
    c.order_settle_id,
    c.order_trade_no,
    COALESCE(tf.first_time, af.first_time, gf.first_time, pay.first_time, cu.first_time)::date AS order_date,
    sd.tenant_id,
    COALESCE(tf.member_id, af.member_id, gf.member_id, cu.member_id, pay.member_id) AS member_id,
    COALESCE(tf.member_id, af.member_id, gf.member_id, cu.member_id, pay.member_id) IS NOT NULL AS member_flag,
    -- recharge flag: no consumption side but has payments
    (COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) = 0)
        AND COALESCE(pay.total_paid_amount, 0) > 0 AS recharge_order_flag,
    COALESCE(gf.item_count, 0) AS item_count,
    COALESCE(gf.total_item_quantity, 0) AS total_item_quantity,
    COALESCE(tf.table_fee_amount, 0) AS table_fee_amount,
    COALESCE(af.assistant_service_amount, 0) AS assistant_service_amount,
    COALESCE(gf.goods_amount, 0) AS goods_amount,
    COALESCE(cu.settle_price, 0) AS group_amount,
    COALESCE(cu.coupon_deduction, 0) AS total_coupon_deduction,
    COALESCE(tf.member_discount_amount, 0) + COALESCE(af.member_discount_amount, 0) + COALESCE(gf.goods_discount_amount, 0) AS member_discount_amount,
    COALESCE(tf.manual_discount_amount, 0) + COALESCE(af.manual_discount_amount, 0) AS manual_discount_amount,
    COALESCE(tf.original_table_fee, 0) + COALESCE(af.original_fee, 0) + COALESCE(gf.goods_original_amount, 0) AS order_original_amount,
    COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) - COALESCE(cu.coupon_deduction, 0) AS order_final_amount,
    COALESCE(pay.stored_card_deduct, 0) AS stored_card_deduct,
    COALESCE(pay.external_paid_amount, 0) AS external_paid_amount,
    COALESCE(pay.total_paid_amount, 0) AS total_paid_amount,
    COALESCE(tf.table_fee_amount, 0) AS book_table_flow,
    COALESCE(af.assistant_service_amount, 0) AS book_assistant_flow,
    COALESCE(gf.goods_amount, 0) AS book_goods_flow,
    COALESCE(cu.settle_price, 0) AS book_group_flow,
    COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) AS book_order_flow,
    CASE
        WHEN (COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) = 0)
            THEN 0
        ELSE COALESCE(pay.external_paid_amount, 0)
    END AS order_effective_consume_cash,
    CASE
        WHEN (COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) = 0)
            THEN COALESCE(pay.external_paid_amount, 0)
        ELSE 0
    END AS order_effective_recharge_cash,
    COALESCE(pay.external_paid_amount, 0) + COALESCE(cu.settle_price, 0) AS order_effective_flow,
    COALESCE(rf.refund_amount, 0) AS refund_amount,
    (COALESCE(pay.external_paid_amount, 0) + COALESCE(cu.settle_price, 0)) - COALESCE(rf.refund_amount, 0) AS net_income,
    now() AS created_at,
    now() AS updated_at
FROM combined_ids c
LEFT JOIN table_fee tf ON c.site_id = tf.site_id AND c.order_settle_id = tf.order_settle_id
LEFT JOIN assistant_fee af ON c.site_id = af.site_id AND c.order_settle_id = af.order_settle_id
LEFT JOIN goods_fee gf ON c.site_id = gf.site_id AND c.order_settle_id = gf.order_settle_id
LEFT JOIN coupon_usage cu ON c.site_id = cu.site_id AND c.order_settle_id = cu.order_settle_id
LEFT JOIN payments pay ON c.site_id = pay.site_id AND c.order_settle_id = pay.order_settle_id
LEFT JOIN refunds rf ON c.site_id = rf.site_id AND c.order_settle_id = rf.order_settle_id
LEFT JOIN site_dim sd ON c.site_id = sd.site_id
ON CONFLICT (site_id, order_settle_id) DO UPDATE SET
    order_trade_no = EXCLUDED.order_trade_no,
    order_date = EXCLUDED.order_date,
    tenant_id = EXCLUDED.tenant_id,
    member_id = EXCLUDED.member_id,
    member_flag = EXCLUDED.member_flag,
    recharge_order_flag = EXCLUDED.recharge_order_flag,
    item_count = EXCLUDED.item_count,
    total_item_quantity = EXCLUDED.total_item_quantity,
    table_fee_amount = EXCLUDED.table_fee_amount,
    assistant_service_amount = EXCLUDED.assistant_service_amount,
    goods_amount = EXCLUDED.goods_amount,
    group_amount = EXCLUDED.group_amount,
    total_coupon_deduction = EXCLUDED.total_coupon_deduction,
    member_discount_amount = EXCLUDED.member_discount_amount,
    manual_discount_amount = EXCLUDED.manual_discount_amount,
    order_original_amount = EXCLUDED.order_original_amount,
    order_final_amount = EXCLUDED.order_final_amount,
    stored_card_deduct = EXCLUDED.stored_card_deduct,
    external_paid_amount = EXCLUDED.external_paid_amount,
    total_paid_amount = EXCLUDED.total_paid_amount,
    book_table_flow = EXCLUDED.book_table_flow,
    book_assistant_flow = EXCLUDED.book_assistant_flow,
    book_goods_flow = EXCLUDED.book_goods_flow,
    book_group_flow = EXCLUDED.book_group_flow,
    book_order_flow = EXCLUDED.book_order_flow,
    order_effective_consume_cash = EXCLUDED.order_effective_consume_cash,
    order_effective_recharge_cash = EXCLUDED.order_effective_recharge_cash,
    order_effective_flow = EXCLUDED.order_effective_flow,
    refund_amount = EXCLUDED.refund_amount,
    net_income = EXCLUDED.net_income,
    updated_at = now();
"""
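Every CTE above repeats the same optional-filter idiom: each named parameter is first compared against NULL, so an omitted CLI flag disables the filter instead of matching nothing. A minimal sketch of the idiom with a hypothetical table:

```python
import psycopg2

SQL = """
SELECT *
FROM demo_orders
WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
  AND (%(start_date)s IS NULL OR order_date >= %(start_date)s)
"""

conn = psycopg2.connect("postgresql://user:pwd@host:5432/LLZQ")  # illustrative DSN
with conn.cursor() as cur:
    # Passing None leaves that predicate permanently true, so the same
    # statement serves both filtered and unfiltered runs.
    cur.execute(SQL, {"site_id": None, "start_date": "2025-01-01"})
    rows = cur.fetchall()
```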
def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Build/update dws_order_summary from DWD fact tables."
    )
    parser.add_argument(
        "--dsn",
        default=os.environ.get("PG_DSN"),
        help="PostgreSQL DSN (fallback: PG_DSN env)",
    )
    parser.add_argument(
        "--site-id",
        type=int,
        default=None,
        help="Filter by site_id (optional, default all sites)",
    )
    parser.add_argument(
        "--start-date",
        dest="start_date",
        default=None,
        help="Filter facts from this date (YYYY-MM-DD, optional)",
    )
    parser.add_argument(
        "--end-date",
        dest="end_date",
        default=None,
        help="Filter facts until this date (YYYY-MM-DD, optional)",
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=int(os.environ.get("PG_CONNECT_TIMEOUT", 10) or 10),
        help="connect_timeout seconds (capped at 20, default 10)",
    )
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    if not args.dsn:
        print("Missing DSN. Set PG_DSN or pass --dsn.", file=sys.stderr)
        return 2

    params = {
        "site_id": args.site_id,
        "start_date": args.start_date,
        "end_date": args.end_date,
    }
    timeout_val = max(1, min(args.timeout, 20))

    conn = DatabaseConnection(args.dsn, connect_timeout=timeout_val)
    try:
        with conn.conn.cursor() as cur:
            cur.execute(SQL_BUILD_SUMMARY, params)
        conn.commit()
    except Exception as exc:  # pragma: no cover - operational script
        conn.rollback()
        print(f"DWS build failed: {exc}", file=sys.stderr)
        return 1
    finally:
        conn.close()

    print("dws_order_summary refreshed.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
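A typical refresh targets one store over a closed date range. The sketch below shells out to the script from inside `etl_billiards/`; the store ID and dates are illustrative:

```python
import subprocess
import sys

subprocess.run(
    [
        sys.executable,
        "scripts/build_dws_order_summary.py",
        "--site-id", "2790685415443269",
        "--start-date", "2025-01-01",
        "--end-date", "2025-01-31",
    ],
    check=True,  # raise if the refresh exits non-zero
)
```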
258
etl_billiards/scripts/rebuild_ods_from_json.py
Normal file
@@ -0,0 +1,258 @@
# -*- coding: utf-8 -*-
"""
Rebuild the billiards_ods.* tables from a local directory of sample JSON
files and import the sample data.

Usage:
    PYTHONPATH=. python -m etl_billiards.scripts.rebuild_ods_from_json [--dsn ...] [--json-dir ...] [--include ...] [--drop-schema-first]

Environment variables:
    PG_DSN                 PostgreSQL DSN (required)
    PG_CONNECT_TIMEOUT     optional, seconds, default 10
    JSON_DOC_DIR           optional, JSON directory, default C:\\dev\\LLTQ\\export\\test-json-doc
    ODS_INCLUDE_FILES      optional, comma-separated file names (without .json)
    ODS_DROP_SCHEMA_FIRST  optional, true/false, default true
"""
from __future__ import annotations

import argparse
import os
import re
import sys
import json
from pathlib import Path
from typing import Iterable, List, Tuple

import psycopg2
from psycopg2 import sql
from psycopg2.extras import Json, execute_values


DEFAULT_JSON_DIR = r"C:\dev\LLTQ\export\test-json-doc"
SPECIAL_LIST_PATHS: dict[str, tuple[str, ...]] = {
    "assistant_accounts_master": ("data", "assistantInfos"),
    "assistant_cancellation_records": ("data", "abolitionAssistants"),
    "assistant_service_records": ("data", "orderAssistantDetails"),
    "goods_stock_movements": ("data", "queryDeliveryRecordsList"),
    "goods_stock_summary": ("data",),
    "group_buy_packages": ("data", "packageCouponList"),
    "group_buy_redemption_records": ("data", "siteTableUseDetailsList"),
    "member_balance_changes": ("data", "tenantMemberCardLogs"),
    "member_profiles": ("data", "tenantMemberInfos"),
    "member_stored_value_cards": ("data", "tenantMemberCards"),
    "recharge_settlements": ("data", "settleList"),
    "settlement_records": ("data", "settleList"),
    "site_tables_master": ("data", "siteTables"),
    "stock_goods_category_tree": ("data", "goodsCategoryList"),
    "store_goods_master": ("data", "orderGoodsList"),
    "store_goods_sales_records": ("data", "orderGoodsLedgers"),
    "table_fee_discount_records": ("data", "taiFeeAdjustInfos"),
    "table_fee_transactions": ("data", "siteTableUseDetailsList"),
    "tenant_goods_master": ("data", "tenantGoodsList"),
}


def sanitize_identifier(name: str) -> str:
    """Turn an arbitrary string into a usable SQL identifier (lower-cased, non-alphanumerics replaced with underscores)."""
    cleaned = re.sub(r"[^0-9a-zA-Z_]", "_", name.strip())
    if not cleaned:
        cleaned = "col"
    if cleaned[0].isdigit():
        cleaned = f"_{cleaned}"
    return cleaned.lower()


def _extract_list_via_path(node, path: tuple[str, ...]):
    cur = node
    for key in path:
        if isinstance(cur, dict):
            cur = cur.get(key)
        else:
            return []
    return cur if isinstance(cur, list) else []


def load_records(payload, list_path: tuple[str, ...] | None = None) -> list:
    """
    Try to extract a list of records from a JSON structure:
    - the payload itself is a list -> return it
    - a dict whose "data" is a list -> return that
    - a dict whose "data" is a dict -> take its first list field
    - any other dict value that is a list -> return it
    - otherwise wrap the payload as a single record
    """
    if list_path:
        if isinstance(payload, list):
            merged: list = []
            for item in payload:
                merged.extend(_extract_list_via_path(item, list_path))
            if merged:
                return merged
        elif isinstance(payload, dict):
            lst = _extract_list_via_path(payload, list_path)
            if lst:
                return lst

    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        data_node = payload.get("data")
        if isinstance(data_node, list):
            return data_node
        if isinstance(data_node, dict):
            for v in data_node.values():
                if isinstance(v, list):
                    return v
        for v in payload.values():
            if isinstance(v, list):
                return v
    return [payload]
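A quick sanity check of that fallback chain, using a made-up payload shaped like the archived API responses:

```python
from etl_billiards.scripts.rebuild_ods_from_json import load_records

payload = {
    "code": 0,
    "data": {"assistantInfos": [{"id": 1}, {"id": 2}], "total": 2},
}

# An explicit path (as configured in SPECIAL_LIST_PATHS) wins:
assert load_records(payload, ("data", "assistantInfos")) == [{"id": 1}, {"id": 2}]

# Without a path, the first list found under "data" is used:
assert load_records(payload) == [{"id": 1}, {"id": 2}]

# Payloads with no list anywhere degrade to a single wrapped record:
assert load_records({"ok": True}) == [{"ok": True}]
```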
def collect_columns(records: Iterable[dict]) -> List[str]:
    """Collect all top-level keys as table columns; only dict records are considered."""
    cols: set[str] = set()
    for rec in records:
        if isinstance(rec, dict):
            cols.update(rec.keys())
    return sorted(cols)


def create_table(cur, schema: str, table: str, columns: List[Tuple[str, str]]):
    """
    Create the table: every field is jsonb, plus source_file, record_index,
    payload and ingested_at.
    columns: [(col_name, original_key)]
    """
    fields = [sql.SQL("{} jsonb").format(sql.Identifier(col)) for col, _ in columns]
    constraint_name = f"uq_{table}_source_record"
    ddl = sql.SQL(
        "CREATE TABLE IF NOT EXISTS {schema}.{table} ("
        "source_file text,"
        "record_index integer,"
        "{cols},"
        "payload jsonb,"
        "ingested_at timestamptz default now(),"
        "CONSTRAINT {constraint} UNIQUE (source_file, record_index)"
        ");"
    ).format(
        schema=sql.Identifier(schema),
        table=sql.Identifier(table),
        cols=sql.SQL(",").join(fields),
        constraint=sql.Identifier(constraint_name),
    )
    cur.execute(ddl)


def insert_records(cur, schema: str, table: str, columns: List[Tuple[str, str]], records: list, source_file: str):
    """Bulk-insert the records."""
    col_idents = [sql.Identifier(col) for col, _ in columns]
    orig_keys = [orig for _, orig in columns]
    all_cols = [sql.Identifier("source_file"), sql.Identifier("record_index")] + col_idents + [
        sql.Identifier("payload")
    ]

    rows = []
    for idx, rec in enumerate(records):
        if not isinstance(rec, dict):
            rec = {"value": rec}
        row_values = [source_file, idx]
        for key in orig_keys:
            row_values.append(Json(rec.get(key)))
        row_values.append(Json(rec))
        rows.append(row_values)

    insert_sql = sql.SQL("INSERT INTO {}.{} ({}) VALUES %s ON CONFLICT DO NOTHING").format(
        sql.Identifier(schema),
        sql.Identifier(table),
        sql.SQL(",").join(all_cols),
    )
    execute_values(cur, insert_sql, rows, page_size=500)


def rebuild(schema: str = "billiards_ods", data_dir: str | Path = DEFAULT_JSON_DIR):
    parser = argparse.ArgumentParser(description="Rebuild billiards_ods.* tables and import JSON samples")
    parser.add_argument("--dsn", dest="dsn", help="PostgreSQL DSN (defaults to the PG_DSN environment variable)")
    parser.add_argument("--json-dir", dest="json_dir", help=f"JSON directory, default {DEFAULT_JSON_DIR}")
    parser.add_argument(
        "--include",
        dest="include_files",
        help="Restrict the import to these file names (comma-separated, without .json); default all",
    )
    parser.add_argument(
        "--drop-schema-first",
        dest="drop_schema_first",
        action="store_true",
        help="Drop and recreate the schema first (default true)",
    )
    parser.add_argument(
        "--no-drop-schema-first",
        dest="drop_schema_first",
        action="store_false",
        help="Keep the existing schema and only deduplicate on conflict",
    )
    parser.set_defaults(drop_schema_first=None)
    args = parser.parse_args()

    dsn = args.dsn or os.environ.get("PG_DSN")
    if not dsn:
        print("Missing --dsn / PG_DSN; cannot connect to the database.")
        sys.exit(1)
    timeout = max(1, min(int(os.environ.get("PG_CONNECT_TIMEOUT", 10)), 60))
    env_drop = os.environ.get("ODS_DROP_SCHEMA_FIRST") or os.environ.get("DROP_SCHEMA_FIRST")
    drop_schema_first = (
        args.drop_schema_first
        if args.drop_schema_first is not None
        else str(env_drop or "true").lower() in ("1", "true", "yes")
    )
    include_files_env = args.include_files or os.environ.get("ODS_INCLUDE_FILES") or os.environ.get("INCLUDE_FILES")
    include_files = set()
    if include_files_env:
        include_files = {p.strip().lower() for p in include_files_env.split(",") if p.strip()}

    base_dir = Path(args.json_dir or data_dir or DEFAULT_JSON_DIR)
    if not base_dir.exists():
        print(f"JSON directory does not exist: {base_dir}")
        sys.exit(1)

    conn = psycopg2.connect(dsn, connect_timeout=timeout)
    conn.autocommit = False
    cur = conn.cursor()

    if drop_schema_first:
        print(f"Dropping schema {schema} ...")
        cur.execute(sql.SQL("DROP SCHEMA IF EXISTS {} CASCADE;").format(sql.Identifier(schema)))
        cur.execute(sql.SQL("CREATE SCHEMA {};").format(sql.Identifier(schema)))
    else:
        cur.execute(
            sql.SQL("SELECT schema_name FROM information_schema.schemata WHERE schema_name=%s"),
            (schema,),
        )
        if not cur.fetchone():
            cur.execute(sql.SQL("CREATE SCHEMA {};").format(sql.Identifier(schema)))

    json_files = sorted(base_dir.glob("*.json"))
    for path in json_files:
        stem_lower = path.stem.lower()
        if include_files and stem_lower not in include_files:
            continue

        print(f"Processing {path.name} ...")
        payload = json.loads(path.read_text(encoding="utf-8"))
        list_path = SPECIAL_LIST_PATHS.get(stem_lower)
        records = load_records(payload, list_path=list_path)
        columns_raw = collect_columns(records)
        columns = [(sanitize_identifier(c), c) for c in columns_raw]

        table_name = sanitize_identifier(path.stem)
        create_table(cur, schema, table_name, columns)
        if records:
            insert_records(cur, schema, table_name, columns, records, path.name)
        print(f"  -> rows: {len(records)}, columns: {len(columns)}")

    conn.commit()
    cur.close()
    conn.close()
    print("Rebuild done.")


if __name__ == "__main__":
    rebuild()
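Restricting a rebuild to two archives without dropping the schema might look like this (the directory and file stems are illustrative):

```python
import os
import subprocess
import sys

env = dict(
    os.environ,
    PYTHONPATH=".",
    ODS_INCLUDE_FILES="member_profiles,settlement_records",
)
subprocess.run(
    [
        sys.executable, "-m", "etl_billiards.scripts.rebuild_ods_from_json",
        "--json-dir", r"C:\dev\LLTQ\export\test-json-doc",
        "--no-drop-schema-first",  # keep existing tables, dedupe on conflict
    ],
    env=env,
    check=True,
)
```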
195
etl_billiards/scripts/run_tests.py
Normal file
@@ -0,0 +1,195 @@
# -*- coding: utf-8 -*-
"""
Flexible test runner: compose parameters or presets (mode/database/archive
paths, etc.) like building blocks, then run this file directly to trigger pytest.

Examples:
    python scripts/run_tests.py --suite online --flow FULL --keyword ORDERS
    python scripts/run_tests.py --preset fetch_only
    python scripts/run_tests.py --suite online --json-source tmp/archives
"""
from __future__ import annotations

import argparse
import importlib.util
import os
import shlex
import sys
from typing import Dict, List

import pytest

PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))

# Make sure the project root is on sys.path so tests can import config / tasks etc.
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

SUITE_MAP: Dict[str, str] = {
    "online": "tests/unit/test_etl_tasks_online.py",
    "integration": "tests/integration/test_database.py",
}

PRESETS: Dict[str, Dict] = {}


def _load_presets():
    preset_path = os.path.join(os.path.dirname(__file__), "test_presets.py")
    if not os.path.exists(preset_path):
        return
    spec = importlib.util.spec_from_file_location("test_presets", preset_path)
    if not spec or not spec.loader:
        return
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # type: ignore[attr-defined]
    presets = getattr(module, "PRESETS", {})
    if isinstance(presets, dict):
        PRESETS.update(presets)


_load_presets()


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="ETL test runner (parameterized)")
    parser.add_argument(
        "--suite",
        choices=sorted(SUITE_MAP.keys()),
        nargs="+",
        help="Preset test suites, multiple allowed (default: all of online/integration)",
    )
    parser.add_argument(
        "--tests",
        nargs="+",
        help="Custom test paths (can be mixed with --suite), e.g. tests/unit/test_config.py",
    )
    parser.add_argument(
        "--flow",
        choices=["FETCH_ONLY", "INGEST_ONLY", "FULL"],
        help="Override PIPELINE_FLOW (online fetch / local ingest / full pipeline)",
    )
    parser.add_argument("--json-source", help="Set JSON_SOURCE_DIR (JSON directory used for local ingest)")
    parser.add_argument("--json-fetch-root", help="Set JSON_FETCH_ROOT (output root for online fetch)")
    parser.add_argument(
        "--keyword",
        "-k",
        help="pytest -k keyword filter (e.g. ORDERS, to run only matching cases)",
    )
    parser.add_argument(
        "--pytest-args",
        help="Extra pytest arguments, same format as the command line (e.g. \"-vv --maxfail=1\")",
    )
    parser.add_argument(
        "--env",
        action="append",
        metavar="KEY=VALUE",
        help="Custom environment variables, repeatable, e.g. --env STORE_ID=123",
    )
    parser.add_argument("--preset", choices=sorted(PRESETS.keys()) if PRESETS else None, nargs="+",
                        help="Pick one or more preset combos from scripts/test_presets.py")
    parser.add_argument("--list-presets", action="store_true", help="List available presets and exit")
    parser.add_argument("--dry-run", action="store_true", help="Only print the command and environment; do not run pytest")
    return parser.parse_args()


def apply_presets_to_args(args: argparse.Namespace):
    if not args.preset:
        return
    for name in args.preset:
        preset = PRESETS.get(name, {})
        if not preset:
            continue
        for key, value in preset.items():
            if key in ("suite", "tests"):
                if not value:
                    continue
                existing = getattr(args, key)
                if existing is None:
                    setattr(args, key, list(value))
                else:
                    existing.extend(value)
            elif key == "env":
                args.env = (args.env or []) + list(value)
            elif key == "pytest_args":
                args.pytest_args = " ".join(filter(None, [value, args.pytest_args or ""]))
            elif key == "keyword":
                if args.keyword is None:
                    args.keyword = value
            else:
                if getattr(args, key, None) is None:
                    setattr(args, key, value)


def apply_env(args: argparse.Namespace) -> Dict[str, str]:
    env_updates = {}
    if args.flow:
        env_updates["PIPELINE_FLOW"] = args.flow
    if args.json_source:
        env_updates["JSON_SOURCE_DIR"] = args.json_source
    if args.json_fetch_root:
        env_updates["JSON_FETCH_ROOT"] = args.json_fetch_root
    if args.env:
        for item in args.env:
            if "=" not in item:
                raise SystemExit(f"Malformed --env argument: {item!r}, expected KEY=VALUE")
            key, value = item.split("=", 1)
            env_updates[key.strip()] = value.strip()

    for key, value in env_updates.items():
        os.environ[key] = value
    return env_updates


def build_pytest_args(args: argparse.Namespace) -> List[str]:
    targets: List[str] = []
    if args.suite:
        for suite in args.suite:
            targets.append(SUITE_MAP[suite])
    if args.tests:
        targets.extend(args.tests)
    if not targets:
        targets = list(SUITE_MAP.values())

    pytest_args: List[str] = targets
    if args.keyword:
        pytest_args += ["-k", args.keyword]
    if args.pytest_args:
        pytest_args += shlex.split(args.pytest_args)
    return pytest_args
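For instance, `--preset fetch_only --pytest-args "--maxfail=1"` composes roughly as follows, assuming the fetch_only preset shipped in test_presets.py:

```python
# Environment overrides applied first: PIPELINE_FLOW=FETCH_ONLY,
# JSON_FETCH_ROOT=tmp/json_fetch. Then the runner effectively calls:
import pytest

pytest.main([
    "tests/unit/test_etl_tasks_online.py",  # suite "online" from the preset
    "-k", "ORDERS",                         # keyword from the preset
    "-vv",                                  # pytest_args from the preset
    "--maxfail=1",                          # appended from the command line
])
```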
def main() -> int:
    os.chdir(PROJECT_ROOT)
    args = parse_args()
    if args.list_presets:
        print("Available presets:")
        if not PRESETS:
            print("(none yet; edit scripts/test_presets.py to add some)")
        else:
            for name in sorted(PRESETS):
                print(f"- {name}")
        return 0

    apply_presets_to_args(args)
    env_updates = apply_env(args)
    pytest_args = build_pytest_args(args)

    print("=== Environment overrides ===")
    if env_updates:
        for k, v in env_updates.items():
            print(f"{k}={v}")
    else:
        print("(no overrides, system defaults apply)")
    print("\n=== Pytest arguments ===")
    print(" ".join(pytest_args))
    print()

    if args.dry_run:
        print("Dry-run mode, pytest was not executed")
        return 0

    exit_code = pytest.main(pytest_args)
    return int(exit_code)


if __name__ == "__main__":
    sys.exit(main())
64
etl_billiards/scripts/test_db_connection.py
Normal file
@@ -0,0 +1,64 @@
# -*- coding: utf-8 -*-
"""Quick utility for validating PostgreSQL connectivity (ASCII-only output)."""
from __future__ import annotations

import argparse
import os
import sys

PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

from database.connection import DatabaseConnection


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="PostgreSQL connectivity smoke test")
    parser.add_argument("--dsn", help="Override TEST_DB_DSN / env value")
    parser.add_argument(
        "--query",
        default="SELECT 1 AS ok",
        help="Custom SQL to run after connection (default: SELECT 1 AS ok)",
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=10,
        help="connect_timeout seconds passed to psycopg2 (capped at 20, default: 10)",
    )
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    dsn = args.dsn or os.environ.get("TEST_DB_DSN")
    if not dsn:
        print("Missing DSN. Use --dsn or TEST_DB_DSN.", file=sys.stderr)
        return 2

    print(f"Trying connection: {dsn}")
    try:
        timeout = max(1, min(args.timeout, 20))
        conn = DatabaseConnection(dsn, connect_timeout=timeout)
    except Exception as exc:  # pragma: no cover - diagnostic output
        print("Connection failed:", exc, file=sys.stderr)
        return 1

    try:
        result = conn.query(args.query)
        print("Connection OK, query result:")
        for row in result:
            print(row)
        conn.close()
        return 0
    except Exception as exc:  # pragma: no cover - diagnostic output
        print("Connection succeeded but query failed:", exc, file=sys.stderr)
        try:
            conn.close()
        finally:
            return 3


if __name__ == "__main__":
    raise SystemExit(main())
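The distinct exit codes (2 = no DSN, 1 = connection failure, 3 = connected but the probe query failed) make the script easy to wire into a health check; a sketch with an illustrative DSN:

```python
import subprocess
import sys

proc = subprocess.run(
    [
        sys.executable, "scripts/test_db_connection.py",
        "--dsn", "postgresql://user:pwd@host:5432/LLZQ",  # illustrative DSN
    ]
)
if proc.returncode == 0:
    print("database reachable")
elif proc.returncode == 3:
    print("connected, but the probe query failed")
else:
    print("connection problem, exit code", proc.returncode)
```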
122
etl_billiards/scripts/test_presets.py
Normal file
@@ -0,0 +1,122 @@
# -*- coding: utf-8 -*-
"""Test command repository: keeps the common run_tests.py combinations in one place so they can be fired with a single command."""
from __future__ import annotations

import argparse
import os
import subprocess
import sys
from typing import List

RUN_TESTS_SCRIPT = os.path.join(os.path.dirname(__file__), "run_tests.py")

# Presets run automatically by default (adjust order/entries as needed)
AUTO_RUN_PRESETS = ["fetch_only"]

PRESETS = {
    "fetch_only": {
        "suite": ["online"],
        "flow": "FETCH_ONLY",
        "json_fetch_root": "tmp/json_fetch",
        "keyword": "ORDERS",
        "pytest_args": "-vv",
        "preset_meta": "Online fetch stage only, writing to a local directory",
    },
    "ingest_local": {
        "suite": ["online"],
        "flow": "INGEST_ONLY",
        "json_source": "tests/source-data-doc",
        "keyword": "ORDERS",
        "preset_meta": "Local cleanse-and-load from the given JSON directory",
    },
    "full_pipeline": {
        "suite": ["online"],
        "flow": "FULL",
        "json_fetch_root": "tmp/json_fetch",
        "keyword": "ORDERS",
        "preset_meta": "Full pipeline: fetch first, then cleanse and load",
    },
}
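Adding a combination is just another dict entry. For example, a hypothetical offline regression preset (not shipped with the repo) could look like:

```python
PRESETS["offline_regression"] = {
    "suite": ["integration"],
    "flow": "INGEST_ONLY",
    "json_source": "tests/source-data-doc",
    "env": ["STORE_ID=2790685415443269"],   # illustrative store
    "pytest_args": "--maxfail=1",
    "preset_meta": "Hypothetical example: integration suite against archived JSON",
}
```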
def print_parameter_help() -> None:
    print("=== Parameter keys ===")
    print("suite           : preset suite list, e.g. ['online','integration']")
    print("tests           : custom pytest path list")
    print("flow            : PIPELINE_FLOW (FETCH_ONLY / INGEST_ONLY / FULL)")
    print("json_source     : JSON_SOURCE_DIR, JSON directory used for local ingest")
    print("json_fetch_root : JSON_FETCH_ROOT, output root for online fetch")
    print("keyword         : pytest -k filter keyword")
    print("pytest_args     : extra pytest arguments (string)")
    print("env             : additional environment variables, e.g. ['KEY=VALUE']")
    print("preset_meta     : documentation only")
    print()


def print_presets() -> None:
    if not PRESETS:
        print("No presets defined yet; add entries to PRESETS.")
        return
    for idx, (name, payload) in enumerate(PRESETS.items(), start=1):
        comment = payload.get("preset_meta", "")
        print(f"{idx}. {name}")
        if comment:
            print(f"   note: {comment}")
        for key, value in payload.items():
            if key == "preset_meta":
                continue
            print(f"   {key}: {value}")
        print()


def resolve_targets(requested: List[str] | None) -> List[str]:
    if not PRESETS:
        raise SystemExit("PRESETS is empty; define some test combinations first.")

    def valid(names: List[str]) -> List[str]:
        return [name for name in names if name in PRESETS]

    if requested:
        candidates = valid(requested)
        missing = [name for name in requested if name not in PRESETS]
        if missing:
            print(f"Warning: ignoring undefined presets {missing}")
        if candidates:
            return candidates

    auto = valid(AUTO_RUN_PRESETS)
    if auto:
        return auto

    return list(PRESETS.keys())


def run_presets(preset_names: List[str], dry_run: bool) -> None:
    for name in preset_names:
        cmd = [sys.executable, RUN_TESTS_SCRIPT, "--preset", name]
        printable = " ".join(cmd)
        if dry_run:
            print(f"[Dry-Run] {printable}")
        else:
            print(f"\n>>> Running: {printable}")
            subprocess.run(cmd, check=False)


def main() -> None:
    parser = argparse.ArgumentParser(description="Test preset repository (configure once, trigger run_tests in batches)")
    parser.add_argument("--preset", choices=sorted(PRESETS.keys()), nargs="+", help="Presets to run")
    parser.add_argument("--list", action="store_true", help="Only list parameter help and all presets")
    parser.add_argument("--dry-run", action="store_true", help="Only print the commands; do not run pytest")
    args = parser.parse_args()

    if args.list:
        print_parameter_help()
        print_presets()
        return

    targets = resolve_targets(args.preset)
    run_presets(targets, dry_run=args.dry_run)


if __name__ == "__main__":
    main()
30
etl_billiards/setup.py
Normal file
@@ -0,0 +1,30 @@
# -*- coding: utf-8 -*-
"""
Setup script for ETL Billiards
"""
from setuptools import setup, find_packages

with open("requirements.txt") as f:
    requirements = f.read().splitlines()

setup(
    name="etl-billiards",
    version="2.0.0",
    description="Modular ETL system for billiards business data",
    author="Data Platform Team",
    author_email="data-platform@example.com",
    packages=find_packages(),
    install_requires=requirements,
    python_requires=">=3.10",
    entry_points={
        "console_scripts": [
            "etl-billiards=cli.main:main",
        ],
    },
    classifiers=[
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
    ],
)
0
etl_billiards/tasks/__init__.py
Normal file
81
etl_billiards/tasks/assistant_abolish_task.py
Normal file
@@ -0,0 +1,81 @@
# -*- coding: utf-8 -*-
"""Assistant cancellation task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.assistant_abolish import AssistantAbolishLoader
from models.parsers import TypeParser


class AssistantAbolishTask(BaseTask):
    """Sync assistant cancellation (abolish) records"""

    def get_task_code(self) -> str:
        return "ASSISTANT_ABOLISH"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/AssistantPerformance/GetAbolitionAssistant",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="abolitionAssistants",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_record(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = AssistantAbolishLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_records(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_record(self, raw: dict, store_id: int) -> dict | None:
        abolish_id = TypeParser.parse_int(raw.get("id"))
        if not abolish_id:
            self.logger.warning("Skipping record without an abolish ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "abolish_id": abolish_id,
            "table_id": TypeParser.parse_int(raw.get("tableId")),
            "table_name": raw.get("tableName"),
            "table_area_id": TypeParser.parse_int(raw.get("tableAreaId")),
            "table_area": raw.get("tableArea"),
            "assistant_no": raw.get("assistantOn"),
            "assistant_name": raw.get("assistantName"),
            "charge_minutes": TypeParser.parse_int(raw.get("pdChargeMinutes")),
            "abolish_amount": TypeParser.parse_decimal(raw.get("assistantAbolishAmount")),
            "create_time": TypeParser.parse_timestamp(
                raw.get("createTime") or raw.get("create_time"), self.tz
            ),
            "trash_reason": raw.get("trashReason"),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
102
etl_billiards/tasks/assistants_task.py
Normal file
@@ -0,0 +1,102 @@
# -*- coding: utf-8 -*-
"""Assistant account task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.dimensions.assistant import AssistantLoader
from models.parsers import TypeParser


class AssistantsTask(BaseTask):
    """Sync assistant account profiles"""

    def get_task_code(self) -> str:
        return "ASSISTANTS"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params({"siteId": context.store_id})
        records, _ = self.api.get_paginated(
            endpoint="/PersonnelManagement/SearchAssistantInfo",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="assistantInfos",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_assistant(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = AssistantLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_assistants(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_assistant(self, raw: dict, store_id: int) -> dict | None:
        assistant_id = TypeParser.parse_int(raw.get("id"))
        if not assistant_id:
            self.logger.warning("Skipping record without an assistant ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "assistant_id": assistant_id,
            "assistant_no": raw.get("assistant_no") or raw.get("assistantNo"),
            "nickname": raw.get("nickname"),
            "real_name": raw.get("real_name") or raw.get("realName"),
            "gender": raw.get("gender"),
            "mobile": raw.get("mobile"),
            "level": raw.get("level"),
            "team_id": TypeParser.parse_int(raw.get("team_id") or raw.get("teamId")),
            "team_name": raw.get("team_name"),
            "assistant_status": raw.get("assistant_status"),
            "work_status": raw.get("work_status"),
            "entry_time": TypeParser.parse_timestamp(
                raw.get("entry_time") or raw.get("entryTime"), self.tz
            ),
            "resign_time": TypeParser.parse_timestamp(
                raw.get("resign_time") or raw.get("resignTime"), self.tz
            ),
            "start_time": TypeParser.parse_timestamp(
                raw.get("start_time") or raw.get("startTime"), self.tz
            ),
            "end_time": TypeParser.parse_timestamp(
                raw.get("end_time") or raw.get("endTime"), self.tz
            ),
            "create_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "update_time": TypeParser.parse_timestamp(
                raw.get("update_time") or raw.get("updateTime"), self.tz
            ),
            "system_role_id": raw.get("system_role_id"),
            "online_status": raw.get("online_status"),
            "allow_cx": raw.get("allow_cx"),
            "charge_way": raw.get("charge_way"),
            "pd_unit_price": TypeParser.parse_decimal(raw.get("pd_unit_price")),
            "cx_unit_price": TypeParser.parse_decimal(raw.get("cx_unit_price")),
            "is_guaranteed": raw.get("is_guaranteed"),
            "is_team_leader": raw.get("is_team_leader"),
            "serial_number": raw.get("serial_number"),
            "show_sort": raw.get("show_sort"),
            "is_delete": raw.get("is_delete"),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
79
etl_billiards/tasks/base_dwd_task.py
Normal file
@@ -0,0 +1,79 @@
# -*- coding: utf-8 -*-
"""Base class for DWD tasks"""
import json
from typing import Any, Dict, Iterator, List, Optional
from datetime import datetime

from .base_task import BaseTask


class BaseDwdTask(BaseTask):
    """
    Base class for DWD-layer tasks.
    Reads rows from ODS tables so that subclasses can cleanse them and
    write the fact/dimension tables.
    """

    def _get_ods_cursor(self, task_code: str) -> Optional[datetime]:
        """
        Return the fetched_at watermark of the last processed ODS data.
        Simplified for now; this should really be read from the etl_cursor
        table. Until then we rely on BaseTask's time-window logic, or the
        subclass manages its own cursor.
        """
        # TODO: hook up the real CursorManager
        # For now return None; subclasses can fall back to _get_time_window
        return None

    def iter_ods_rows(
        self,
        table_name: str,
        columns: List[str],
        start_time: datetime,
        end_time: datetime,
        time_col: str = "fetched_at",
        batch_size: int = 1000
    ) -> Iterator[List[Dict[str, Any]]]:
        """
        Iterate over ODS rows in batches.

        Args:
            table_name: ODS table name
            columns: columns to select (must include payload)
            start_time: window start (inclusive)
            end_time: window end (inclusive)
            time_col: time filter column, default fetched_at
            batch_size: rows per batch
        """
        offset = 0
        cols_str = ", ".join(columns)

        while True:
            sql = f"""
                SELECT {cols_str}
                FROM {table_name}
                WHERE {time_col} >= %s AND {time_col} <= %s
                ORDER BY {time_col} ASC
                LIMIT %s OFFSET %s
            """

            rows = self.db.query(sql, (start_time, end_time, batch_size, offset))

            if not rows:
                break

            yield rows

            if len(rows) < batch_size:
                break

            offset += batch_size

    def parse_payload(self, row: Dict[str, Any]) -> Dict[str, Any]:
        """Parse the payload JSON carried by an ODS row."""
        payload = row.get("payload")
        if isinstance(payload, str):
            return json.loads(payload)
        elif isinstance(payload, dict):
            return payload
        return {}
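A minimal sketch of a subclass built on these two helpers; the class name, method and ODS table are hypothetical, chosen only to show the batch-iterate-then-parse flow:

```python
from datetime import datetime

from tasks.base_dwd_task import BaseDwdTask


class DemoTicketDwdTask(BaseDwdTask):  # hypothetical subclass
    def get_task_code(self) -> str:
        return "TICKET_DWD"

    def rebuild_window(self, start: datetime, end: datetime) -> int:
        """Count usable payloads in the window; a real task would map and upsert them."""
        handled = 0
        for batch in self.iter_ods_rows(
            table_name="billiards_ods.ods_order_settle",  # illustrative table
            columns=["site_id", "payload"],
            start_time=start,
            end_time=end,
        ):
            for row in batch:
                doc = self.parse_payload(row)  # {} when payload is missing
                if doc:
                    handled += 1
        return handled
```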
141
etl_billiards/tasks/base_task.py
Normal file
@@ -0,0 +1,141 @@
# -*- coding: utf-8 -*-
"""Base class for ETL tasks (Extract/Transform/Load template methods)"""
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class TaskContext:
    """Runtime information passed through Extract/Transform/Load."""

    store_id: int
    window_start: datetime
    window_end: datetime
    window_minutes: int
    cursor: dict | None = None


class BaseTask:
    """Task base class providing the E/T/L template."""

    def __init__(self, config, db_connection, api_client, logger):
        self.config = config
        self.db = db_connection
        self.api = api_client
        self.logger = logger
        self.tz = ZoneInfo(config.get("app.timezone", "Asia/Taipei"))

    # ------------------------------------------------------------------ basics
    def get_task_code(self) -> str:
        """Return the task code"""
        raise NotImplementedError("Subclasses must implement get_task_code")

    # ------------------------------------------------------------------ E/T/L hooks
    def extract(self, context: TaskContext):
        """Extract data"""
        raise NotImplementedError("Subclasses must implement extract")

    def transform(self, extracted, context: TaskContext):
        """Transform data"""
        return extracted

    def load(self, transformed, context: TaskContext) -> dict:
        """Load data and return statistics"""
        raise NotImplementedError("Subclasses must implement load")

    # ------------------------------------------------------------------ main flow
    def execute(self, cursor_data: dict | None = None) -> dict:
        """Orchestrate Extract -> Transform -> Load"""
        context = self._build_context(cursor_data)
        task_code = self.get_task_code()
        self.logger.info(
            "%s: starting, window [%s ~ %s]",
            task_code,
            context.window_start,
            context.window_end,
        )

        try:
            extracted = self.extract(context)
            transformed = self.transform(extracted, context)
            counts = self.load(transformed, context) or {}
            self.db.commit()
        except Exception:
            self.db.rollback()
            self.logger.error("%s: execution failed", task_code, exc_info=True)
            raise

        result = self._build_result("SUCCESS", counts)
        result["window"] = {
            "start": context.window_start,
            "end": context.window_end,
            "minutes": context.window_minutes,
        }
        self.logger.info("%s: done, counts=%s", task_code, result["counts"])
        return result

    # ------------------------------------------------------------------ helpers
    def _build_context(self, cursor_data: dict | None) -> TaskContext:
        window_start, window_end, window_minutes = self._get_time_window(cursor_data)
        return TaskContext(
            store_id=self.config.get("app.store_id"),
            window_start=window_start,
            window_end=window_end,
            window_minutes=window_minutes,
            cursor=cursor_data,
        )

    def _get_time_window(self, cursor_data: dict = None) -> tuple:
        """Compute the time window"""
        now = datetime.now(self.tz)

        idle_start = self.config.get("run.idle_window.start", "04:00")
        idle_end = self.config.get("run.idle_window.end", "16:00")
        is_idle = self._is_in_idle_window(now, idle_start, idle_end)

        if is_idle:
            window_minutes = self.config.get("run.window_minutes.default_idle", 180)
        else:
            window_minutes = self.config.get("run.window_minutes.default_busy", 30)

        overlap_seconds = self.config.get("run.overlap_seconds", 120)

        if cursor_data and cursor_data.get("last_end"):
            window_start = cursor_data["last_end"] - timedelta(seconds=overlap_seconds)
        else:
            window_start = now - timedelta(minutes=window_minutes)

        window_end = now
        return window_start, window_end, window_minutes

    def _is_in_idle_window(self, dt: datetime, start_time: str, end_time: str) -> bool:
        """Check whether dt falls inside the idle window"""
        current_time = dt.strftime("%H:%M")
        return start_time <= current_time <= end_time
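Concretely, with the defaults above (idle 04:00-16:00, 30-minute busy window, 180-minute idle window, 120-second overlap), a run at 10:00 whose cursor last ended at 09:40 computes:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

tz = ZoneInfo("Asia/Taipei")
now = datetime(2025, 1, 15, 10, 0, tzinfo=tz)       # inside 04:00-16:00 -> idle
last_end = datetime(2025, 1, 15, 9, 40, tzinfo=tz)  # cursor from the previous run

window_start = last_end - timedelta(seconds=120)    # 09:38, the 2-minute overlap
window_end = now                                    # 10:00
# Without a cursor the task instead looks back the full idle window:
cold_start = now - timedelta(minutes=180)           # 07:00
```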
    def _merge_common_params(self, base: dict) -> dict:
        """
        Merge the global/task-level parameter pools so filter conditions can
        be overridden or appended centrally in the configuration.
        Supported:
        - common keys under api.params
        - task-level keys under api.params.<task_code_lower>
        """
        merged: dict = {}
        common = self.config.get("api.params", {}) or {}
        if isinstance(common, dict):
            merged.update(common)

        task_key = f"api.params.{self.get_task_code().lower()}"
        scoped = self.config.get(task_key, {}) or {}
        if isinstance(scoped, dict):
            merged.update(scoped)

        merged.update(base)
        return merged
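The merge order is common pool, then task-scoped pool, then the call-site values, so later layers win. With a config fragment like the sketch below (keys are illustrative, assuming the config object supports dotted-key lookup), an ORDERS task would send pageSize=500 but keep its own siteId:

```python
# Illustrative config fragment and merge outcome:
config = {
    "api.params": {"pageSize": 200, "channel": "etl"},
    "api.params.orders": {"pageSize": 500},
}

# Inside a task whose get_task_code() == "ORDERS":
#   merged = {"pageSize": 200, "channel": "etl"}   # common pool first
#   merged.update({"pageSize": 500})               # task-scoped pool overrides
#   merged.update({"siteId": 2790685415443269})    # call-site base wins last
# => {"pageSize": 500, "channel": "etl", "siteId": 2790685415443269}
```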
    def _build_result(self, status: str, counts: dict) -> dict:
        """Build the result dict"""
        return {"status": status, "counts": counts}
93
etl_billiards/tasks/coupon_usage_task.py
Normal file
@@ -0,0 +1,93 @@
# -*- coding: utf-8 -*-
"""Platform coupon verification task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.coupon_usage import CouponUsageLoader
from models.parsers import TypeParser


class CouponUsageTask(BaseTask):
    """Sync platform coupon verification/redemption records"""

    def get_task_code(self) -> str:
        return "COUPON_USAGE"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/Promotion/GetOfflineCouponConsumePageList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_usage(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = CouponUsageLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_coupon_usage(
            transformed["records"]
        )
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_usage(self, raw: dict, store_id: int) -> dict | None:
        usage_id = TypeParser.parse_int(raw.get("id"))
        if not usage_id:
            self.logger.warning("Skipping record without a coupon-usage ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "usage_id": usage_id,
            "coupon_code": raw.get("coupon_code"),
            "coupon_channel": raw.get("coupon_channel"),
            "coupon_name": raw.get("coupon_name"),
            "sale_price": TypeParser.parse_decimal(raw.get("sale_price")),
            "coupon_money": TypeParser.parse_decimal(raw.get("coupon_money")),
            "coupon_free_time": TypeParser.parse_int(raw.get("coupon_free_time")),
            "use_status": raw.get("use_status"),
            "create_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "consume_time": TypeParser.parse_timestamp(
                raw.get("consume_time") or raw.get("consumeTime"), self.tz
            ),
            "operator_id": TypeParser.parse_int(raw.get("operator_id")),
            "operator_name": raw.get("operator_name"),
            "table_id": TypeParser.parse_int(raw.get("table_id")),
            "site_order_id": TypeParser.parse_int(raw.get("site_order_id")),
            "group_package_id": TypeParser.parse_int(raw.get("group_package_id")),
            "coupon_remark": raw.get("coupon_remark"),
            "deal_id": raw.get("deal_id"),
            "certificate_id": raw.get("certificate_id"),
            "verify_id": raw.get("verify_id"),
            "is_delete": raw.get("is_delete"),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
90
etl_billiards/tasks/inventory_change_task.py
Normal file
@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
"""Inventory change task."""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.inventory_change import InventoryChangeLoader
from models.parsers import TypeParser


class InventoryChangeTask(BaseTask):
    """Sync inventory change records."""

    def get_task_code(self) -> str:
        return "INVENTORY_CHANGE"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/GoodsStockManage/QueryGoodsOutboundReceipt",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="queryDeliveryRecordsList",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_change(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = InventoryChangeLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_changes(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_change(self, raw: dict, store_id: int) -> dict | None:
        change_id = TypeParser.parse_int(
            raw.get("siteGoodsStockId") or raw.get("site_goods_stock_id")
        )
        if not change_id:
            self.logger.warning("Skipping record missing inventory change ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "change_id": change_id,
            "site_goods_id": TypeParser.parse_int(
                raw.get("siteGoodsId") or raw.get("site_goods_id")
            ),
            "stock_type": raw.get("stockType") or raw.get("stock_type"),
            "goods_name": raw.get("goodsName"),
            "change_time": TypeParser.parse_timestamp(
                raw.get("createTime") or raw.get("create_time"), self.tz
            ),
            "start_qty": TypeParser.parse_int(raw.get("startNum")),
            "end_qty": TypeParser.parse_int(raw.get("endNum")),
            "change_qty": TypeParser.parse_int(raw.get("changeNum")),
            "unit": raw.get("unit"),
            "price": TypeParser.parse_decimal(raw.get("price")),
            "operator_name": raw.get("operatorName"),
            "remark": raw.get("remark"),
            "goods_category_id": TypeParser.parse_int(raw.get("goodsCategoryId")),
            "goods_second_category_id": TypeParser.parse_int(
                raw.get("goodsSecondCategoryId")
            ),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
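The `get_paginated` call sites pass a `data_path` tuple plus an optional `list_key`, which suggests each page response nests its record list, here under `data.queryDeliveryRecordsList`. A rough sketch of that unwrapping, assuming a response shape inferred from the call sites rather than confirmed by this diff:

```python
# Sketch of how a paginated response is likely unwrapped via data_path + list_key.
# The response shape below is an assumption inferred from the call sites.
def extract_records(response: dict, data_path: tuple, list_key: str | None) -> list:
    node = response
    for part in data_path:   # walk nested containers, e.g. ("data",)
        node = node.get(part, {})
    if list_key:             # pick the embedded list, e.g. "queryDeliveryRecordsList"
        node = node.get(list_key, [])
    return node if isinstance(node, list) else []


page = {
    "code": 0,
    "data": {"total": 1, "queryDeliveryRecordsList": [{"siteGoodsStockId": 42}]},
}
print(extract_records(page, ("data",), "queryDeliveryRecordsList"))
# [{'siteGoodsStockId': 42}]
```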
115
etl_billiards/tasks/ledger_task.py
Normal file
@@ -0,0 +1,115 @@
# -*- coding: utf-8 -*-
"""Assistant ledger task."""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.assistant_ledger import AssistantLedgerLoader
from models.parsers import TypeParser


class LedgerTask(BaseTask):
    """Sync assistant service ledgers."""

    def get_task_code(self) -> str:
        return "LEDGER"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/AssistantPerformance/GetOrderAssistantDetails",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="orderAssistantDetails",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_ledger(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = AssistantLedgerLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_ledgers(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_ledger(self, raw: dict, store_id: int) -> dict | None:
        ledger_id = TypeParser.parse_int(raw.get("id"))
        if not ledger_id:
            self.logger.warning("Skipping record missing assistant ledger ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "ledger_id": ledger_id,
            "assistant_no": raw.get("assistantNo"),
            "assistant_name": raw.get("assistantName"),
            "nickname": raw.get("nickname"),
            "level_name": raw.get("levelName"),
            "table_name": raw.get("tableName"),
            "ledger_unit_price": TypeParser.parse_decimal(raw.get("ledger_unit_price")),
            "ledger_count": TypeParser.parse_int(raw.get("ledger_count")),
            "ledger_amount": TypeParser.parse_decimal(raw.get("ledger_amount")),
            "projected_income": TypeParser.parse_decimal(raw.get("projected_income")),
            "service_money": TypeParser.parse_decimal(raw.get("service_money")),
            "member_discount_amount": TypeParser.parse_decimal(
                raw.get("member_discount_amount")
            ),
            "manual_discount_amount": TypeParser.parse_decimal(
                raw.get("manual_discount_amount")
            ),
            "coupon_deduct_money": TypeParser.parse_decimal(
                raw.get("coupon_deduct_money")
            ),
            "order_trade_no": TypeParser.parse_int(raw.get("order_trade_no")),
            "order_settle_id": TypeParser.parse_int(raw.get("order_settle_id")),
            "operator_id": TypeParser.parse_int(raw.get("operator_id")),
            "operator_name": raw.get("operator_name"),
            "assistant_team_id": TypeParser.parse_int(raw.get("assistant_team_id")),
            "assistant_level": raw.get("assistant_level"),
            "site_table_id": TypeParser.parse_int(raw.get("site_table_id")),
            "order_assistant_id": TypeParser.parse_int(raw.get("order_assistant_id")),
            "site_assistant_id": TypeParser.parse_int(raw.get("site_assistant_id")),
            "user_id": TypeParser.parse_int(raw.get("user_id")),
            "ledger_start_time": TypeParser.parse_timestamp(
                raw.get("ledger_start_time"), self.tz
            ),
            "ledger_end_time": TypeParser.parse_timestamp(
                raw.get("ledger_end_time"), self.tz
            ),
            "start_use_time": TypeParser.parse_timestamp(raw.get("start_use_time"), self.tz),
            "last_use_time": TypeParser.parse_timestamp(raw.get("last_use_time"), self.tz),
            "income_seconds": TypeParser.parse_int(raw.get("income_seconds")),
            "real_use_seconds": TypeParser.parse_int(raw.get("real_use_seconds")),
            "is_trash": raw.get("is_trash"),
            "trash_reason": raw.get("trash_reason"),
            "is_confirm": raw.get("is_confirm"),
            "ledger_status": raw.get("ledger_status"),
            "create_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
1001
etl_billiards/tasks/manual_ingest_task.py
Normal file
File diff suppressed because it is too large
Load Diff
89
etl_billiards/tasks/members_dwd_task.py
Normal file
@@ -0,0 +1,89 @@
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.dimensions.member import MemberLoader
from models.parsers import TypeParser
import json

class MembersDwdTask(BaseDwdTask):
    """
    DWD Task: Process Member Records from ODS to Dimension Table
    Source: billiards_ods.ods_member_profile
    Target: billiards.dim_member
    """

    def get_task_code(self) -> str:
        return "MEMBERS_DWD"

    def execute(self) -> dict:
        self.logger.info(f"Starting {self.get_task_code()} task")

        window_start, window_end, _ = self._get_time_window()
        self.logger.info(f"Processing window: {window_start} to {window_end}")

        loader = MemberLoader(self.db)
        store_id = self.config.get("app.store_id")

        total_inserted = 0
        total_updated = 0
        total_errors = 0

        # Iterate ODS data
        batches = self.iter_ods_rows(
            table_name="billiards_ods.ods_member_profile",
            columns=["site_id", "member_id", "payload", "fetched_at"],
            start_time=window_start,
            end_time=window_end
        )

        for batch in batches:
            if not batch:
                continue

            parsed_rows = []
            for row in batch:
                payload = self.parse_payload(row)
                if not payload:
                    continue

                parsed = self._parse_member(payload, store_id)
                if parsed:
                    parsed_rows.append(parsed)

            if parsed_rows:
                inserted, updated, skipped = loader.upsert_members(parsed_rows, store_id)
                total_inserted += inserted
                total_updated += updated

            self.db.commit()

        self.logger.info(f"Task {self.get_task_code()} completed. Inserted: {total_inserted}, Updated: {total_updated}")

        return {
            "status": "success",
            "inserted": total_inserted,
            "updated": total_updated,
            "window_start": window_start.isoformat(),
            "window_end": window_end.isoformat()
        }

    def _parse_member(self, raw: dict, store_id: int) -> dict | None:
        """Parse ODS payload into Dim structure"""
        try:
            # Handle both API structure (camelCase) and manual structure
            member_id = raw.get("id") or raw.get("memberId")
            if not member_id:
                return None

            return {
                "store_id": store_id,
                "member_id": member_id,
                "member_name": raw.get("name") or raw.get("memberName"),
                "phone": raw.get("phone") or raw.get("mobile"),
                "balance": raw.get("balance", 0),
                "status": str(raw.get("status", "NORMAL")),
                "register_time": raw.get("createTime") or raw.get("registerTime"),
                "raw_data": json.dumps(raw, ensure_ascii=False)
            }
        except Exception as e:
            self.logger.warning(f"Error parsing member: {e}")
            return None
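`parse_payload` presumably normalizes the ODS `payload` column, which can surface either as a `dict` (JSONB decoded by the driver) or as a raw JSON string, depending on how the row was ingested. A defensive sketch of that normalization; the behavior is assumed from the call sites, not confirmed by this diff:

```python
# Defensive payload normalization, assuming JSONB may surface as dict or str.
import json
from typing import Any


def parse_payload(row: dict) -> dict | None:
    payload: Any = row.get("payload")
    if isinstance(payload, dict):
        return payload                     # driver already decoded JSONB
    if isinstance(payload, (str, bytes)):
        try:
            decoded = json.loads(payload)  # stored as a JSON string
            return decoded if isinstance(decoded, dict) else None
        except (ValueError, TypeError):
            return None
    return None


print(parse_payload({"payload": '{"memberId": 1001}'}))  # {'memberId': 1001}
print(parse_payload({"payload": {"memberId": 1001}}))    # {'memberId': 1001}
print(parse_payload({"payload": None}))                  # None
```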
72
etl_billiards/tasks/members_task.py
Normal file
@@ -0,0 +1,72 @@
# -*- coding: utf-8 -*-
"""Member ETL task."""
import json

from .base_task import BaseTask, TaskContext
from loaders.dimensions.member import MemberLoader
from models.parsers import TypeParser


class MembersTask(BaseTask):
    """Member ETL task."""

    def get_task_code(self) -> str:
        return "MEMBERS"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params({"siteId": context.store_id})
        records, _ = self.api.get_paginated(
            endpoint="/MemberProfile/GetTenantMemberList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="tenantMemberInfos",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            parsed_row = self._parse_member(raw, context.store_id)
            if parsed_row:
                parsed.append(parsed_row)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = MemberLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_members(
            transformed["records"], context.store_id
        )
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_member(self, raw: dict, store_id: int) -> dict | None:
        """Parse a member record."""
        try:
            member_id = TypeParser.parse_int(raw.get("memberId"))
            if not member_id:
                return None
            return {
                "store_id": store_id,
                "member_id": member_id,
                "member_name": raw.get("memberName"),
                "phone": raw.get("phone"),
                "balance": TypeParser.parse_decimal(raw.get("balance")),
                "status": raw.get("status"),
                "register_time": TypeParser.parse_timestamp(raw.get("registerTime"), self.tz),
                "raw_data": json.dumps(raw, ensure_ascii=False),
            }
        except Exception as exc:
            self.logger.warning("Failed to parse member record: %s, raw data: %s", exc, raw)
            return None
933
etl_billiards/tasks/ods_tasks.py
Normal file
@@ -0,0 +1,933 @@
# -*- coding: utf-8 -*-
"""ODS ingestion tasks."""
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple, Type

from loaders.ods import GenericODSLoader
from models.parsers import TypeParser
from .base_task import BaseTask


ColumnTransform = Callable[[Any], Any]


@dataclass(frozen=True)
class ColumnSpec:
    """Mapping between DB column and source JSON field."""

    column: str
    sources: Tuple[str, ...] = ()
    required: bool = False
    default: Any = None
    transform: ColumnTransform | None = None


@dataclass(frozen=True)
class OdsTaskSpec:
    """Definition of a single ODS ingestion task."""

    code: str
    class_name: str
    table_name: str
    endpoint: str
    data_path: Tuple[str, ...] = ("data",)
    list_key: str | None = None
    pk_columns: Tuple[ColumnSpec, ...] = ()
    extra_columns: Tuple[ColumnSpec, ...] = ()
    include_page_size: bool = False
    include_page_no: bool = False
    include_source_file: bool = True
    include_source_endpoint: bool = True
    include_record_index: bool = False
    include_site_column: bool = True
    include_fetched_at: bool = True
    requires_window: bool = True
    time_fields: Tuple[str, str] | None = ("startTime", "endTime")
    include_site_id: bool = True
    description: str = ""
    extra_params: Dict[str, Any] = field(default_factory=dict)
    conflict_columns_override: Tuple[str, ...] | None = None


class BaseOdsTask(BaseTask):
    """Shared functionality for ODS ingestion tasks."""

    SPEC: OdsTaskSpec

    def get_task_code(self) -> str:
        return self.SPEC.code

    def execute(self) -> dict:
        spec = self.SPEC
        self.logger.info("Starting %s (ODS)", spec.code)

        store_id = TypeParser.parse_int(self.config.get("app.store_id"))
        if not store_id:
            raise ValueError("app.store_id is not configured; cannot run ODS tasks")

        page_size = self.config.get("api.page_size", 200)
        params = self._build_params(spec, store_id)
        columns = self._resolve_columns(spec)
        if spec.conflict_columns_override:
            conflict_columns = list(spec.conflict_columns_override)
        else:
            conflict_columns = []
            if spec.include_site_column:
                conflict_columns.append("site_id")
            conflict_columns += [col.column for col in spec.pk_columns]
        loader = GenericODSLoader(
            self.db,
            spec.table_name,
            columns,
            conflict_columns,
        )

        counts = {"fetched": 0, "inserted": 0, "updated": 0, "skipped": 0, "errors": 0}
        source_file = self._resolve_source_file_hint(spec)

        try:
            global_index = 0
            for page_no, page_records, _, _ in self.api.iter_paginated(
                endpoint=spec.endpoint,
                params=params,
                page_size=page_size,
                data_path=spec.data_path,
                list_key=spec.list_key,
            ):
                rows: List[dict] = []
                for raw in page_records:
                    row = self._build_row(
                        spec=spec,
                        store_id=store_id,
                        record=raw,
                        page_no=page_no if spec.include_page_no else None,
                        page_size_value=len(page_records) if spec.include_page_size else None,
                        source_file=source_file,
                        record_index=global_index if spec.include_record_index else None,
                    )
                    if row is None:
                        counts["skipped"] += 1
                        continue
                    rows.append(row)
                    global_index += 1

                inserted, updated, _ = loader.upsert_rows(rows)
                counts["inserted"] += inserted
                counts["updated"] += updated
                counts["fetched"] += len(page_records)

            self.db.commit()
            self.logger.info("%s ODS task finished: %s", spec.code, counts)
            return self._build_result("SUCCESS", counts)

        except Exception:
            self.db.rollback()
            counts["errors"] += 1
            self.logger.error("%s ODS task failed", spec.code, exc_info=True)
            raise

    def _build_params(self, spec: OdsTaskSpec, store_id: int) -> dict:
        base: dict[str, Any] = {}
        if spec.include_site_id:
            base["siteId"] = store_id
        if spec.requires_window and spec.time_fields:
            window_start, window_end, _ = self._get_time_window()
            start_key, end_key = spec.time_fields
            base[start_key] = TypeParser.format_timestamp(window_start, self.tz)
            base[end_key] = TypeParser.format_timestamp(window_end, self.tz)

        params = self._merge_common_params(base)
        params.update(spec.extra_params)
        return params

    def _resolve_columns(self, spec: OdsTaskSpec) -> List[str]:
        columns: List[str] = []
        if spec.include_site_column:
            columns.append("site_id")
        seen = set(columns)
        for col_spec in list(spec.pk_columns) + list(spec.extra_columns):
            if col_spec.column not in seen:
                columns.append(col_spec.column)
                seen.add(col_spec.column)

        if spec.include_record_index and "record_index" not in seen:
            columns.append("record_index")
            seen.add("record_index")

        if spec.include_page_no and "page_no" not in seen:
            columns.append("page_no")
            seen.add("page_no")

        if spec.include_page_size and "page_size" not in seen:
            columns.append("page_size")
            seen.add("page_size")

        if spec.include_source_file and "source_file" not in seen:
            columns.append("source_file")
            seen.add("source_file")

        if spec.include_source_endpoint and "source_endpoint" not in seen:
            columns.append("source_endpoint")
            seen.add("source_endpoint")

        if spec.include_fetched_at and "fetched_at" not in seen:
            columns.append("fetched_at")
            seen.add("fetched_at")
        if "payload" not in seen:
            columns.append("payload")

        return columns

    def _build_row(
        self,
        spec: OdsTaskSpec,
        store_id: int,
        record: dict,
        page_no: int | None,
        page_size_value: int | None,
        source_file: str | None,
        record_index: int | None = None,
    ) -> dict | None:
        row: dict[str, Any] = {}
        if spec.include_site_column:
            row["site_id"] = store_id

        for col_spec in spec.pk_columns + spec.extra_columns:
            value = self._extract_value(record, col_spec)
            if value is None and col_spec.required:
                self.logger.warning(
                    "%s missing required field %s, raw record: %s",
                    spec.code,
                    col_spec.column,
                    record,
                )
                return None
            row[col_spec.column] = value

        if spec.include_page_no:
            row["page_no"] = page_no
        if spec.include_page_size:
            row["page_size"] = page_size_value
        if spec.include_record_index:
            row["record_index"] = record_index
        if spec.include_source_file:
            row["source_file"] = source_file
        if spec.include_source_endpoint:
            row["source_endpoint"] = spec.endpoint

        if spec.include_fetched_at:
            row["fetched_at"] = datetime.now(self.tz)
        row["payload"] = record
        return row

    def _extract_value(self, record: dict, spec: ColumnSpec):
        value = None
        for key in spec.sources:
            value = self._dig(record, key)
            if value is not None:
                break
        if value is None and spec.default is not None:
            value = spec.default
        if value is not None and spec.transform:
            value = spec.transform(value)
        return value

    @staticmethod
    def _dig(record: Any, path: str | None):
        if not path:
            return None
        current = record
        for part in path.split("."):
            if isinstance(current, dict):
                current = current.get(part)
            else:
                return None
        return current

    def _resolve_source_file_hint(self, spec: OdsTaskSpec) -> str | None:
        resolver = getattr(self.api, "get_source_hint", None)
        if callable(resolver):
            return resolver(spec.endpoint)
        return None


def _int_col(name: str, *sources: str, required: bool = False) -> ColumnSpec:
    return ColumnSpec(
        column=name,
        sources=sources,
        required=required,
        transform=TypeParser.parse_int,
    )

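A `ColumnSpec` resolves a DB column from an ordered list of candidate JSON paths, with `_dig` supporting dotted nesting (`"a.b"`), then applies the default and the transform. A self-contained walk-through of that resolution order, using an illustrative sample record:

```python
# Self-contained walk-through of ColumnSpec-style value resolution:
# try each source path in order, dig through dotted keys, then default + transform.
def dig(record, path):
    current = record
    for part in path.split("."):
        if isinstance(current, dict):
            current = current.get(part)
        else:
            return None
    return current


def extract_value(record, sources, default=None, transform=None):
    value = None
    for key in sources:
        value = dig(record, key)
        if value is not None:
            break
    if value is None and default is not None:
        value = default
    if value is not None and transform:
        value = transform(value)
    return value


record = {"order": {"settleId": "123"}, "siteId": 2790685415443269}
print(extract_value(record, ("orderSettleId", "order.settleId"), transform=int))  # 123
print(extract_value(record, ("missing",), default=0))                             # 0
```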
ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
    OdsTaskSpec(
        code="ODS_ASSISTANT_ACCOUNTS",
        class_name="OdsAssistantAccountsTask",
        table_name="billiards_ods.assistant_accounts_master",
        endpoint="/PersonnelManagement/SearchAssistantInfo",
        data_path=("data",),
        list_key="assistantInfos",
        pk_columns=(_int_col("id", "id", required=True),),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        description="Assistant account profiles ODS: SearchAssistantInfo -> assistantInfos raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_ORDER_SETTLE",
        class_name="OdsOrderSettleTask",
        table_name="billiards_ods.settlement_records",
        endpoint="/Site/GetAllOrderSettleList",
        data_path=("data",),
        list_key="settleList",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Settlement records ODS: GetAllOrderSettleList -> settleList raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_TABLE_USE",
        class_name="OdsTableUseTask",
        table_name="billiards_ods.table_fee_transactions",
        endpoint="/Site/GetSiteTableOrderDetails",
        data_path=("data",),
        list_key="siteTableUseDetailsList",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Table-fee billing records ODS: GetSiteTableOrderDetails -> siteTableUseDetailsList raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_ASSISTANT_LEDGER",
        class_name="OdsAssistantLedgerTask",
        table_name="billiards_ods.assistant_service_records",
        endpoint="/AssistantPerformance/GetOrderAssistantDetails",
        data_path=("data",),
        list_key="orderAssistantDetails",
        pk_columns=(_int_col("id", "id", required=True),),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        description="Assistant service records ODS: GetOrderAssistantDetails -> orderAssistantDetails raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_ASSISTANT_ABOLISH",
        class_name="OdsAssistantAbolishTask",
        table_name="billiards_ods.assistant_cancellation_records",
        endpoint="/AssistantPerformance/GetAbolitionAssistant",
        data_path=("data",),
        list_key="abolitionAssistants",
        pk_columns=(_int_col("id", "id", required=True),),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        description="Assistant cancellation records ODS: GetAbolitionAssistant -> abolitionAssistants raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_GOODS_LEDGER",
        class_name="OdsGoodsLedgerTask",
        table_name="billiards_ods.store_goods_sales_records",
        endpoint="/TenantGoods/GetGoodsSalesList",
        data_path=("data",),
        list_key="orderGoodsLedgers",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Store goods sales records ODS: GetGoodsSalesList -> orderGoodsLedgers raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_PAYMENT",
        class_name="OdsPaymentTask",
        table_name="billiards_ods.payment_transactions",
        endpoint="/PayLog/GetPayLogListPage",
        data_path=("data",),
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Payment records ODS: GetPayLogListPage raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_REFUND",
        class_name="OdsRefundTask",
        table_name="billiards_ods.refund_transactions",
        endpoint="/Order/GetRefundPayLogList",
        data_path=("data",),
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Refund records ODS: GetRefundPayLogList raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_COUPON_VERIFY",
        class_name="OdsCouponVerifyTask",
        table_name="billiards_ods.platform_coupon_redemption_records",
        endpoint="/Promotion/GetOfflineCouponConsumePageList",
        data_path=("data",),
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Platform/group-buy coupon redemption ODS: GetOfflineCouponConsumePageList raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_MEMBER",
        class_name="OdsMemberTask",
        table_name="billiards_ods.member_profiles",
        endpoint="/MemberProfile/GetTenantMemberList",
        data_path=("data",),
        list_key="tenantMemberInfos",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Member profiles ODS: GetTenantMemberList -> tenantMemberInfos raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_MEMBER_CARD",
        class_name="OdsMemberCardTask",
        table_name="billiards_ods.member_stored_value_cards",
        endpoint="/MemberProfile/GetTenantMemberCardList",
        data_path=("data",),
        list_key="tenantMemberCards",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Member stored-value cards ODS: GetTenantMemberCardList -> tenantMemberCards raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_MEMBER_BALANCE",
        class_name="OdsMemberBalanceTask",
        table_name="billiards_ods.member_balance_changes",
        endpoint="/MemberProfile/GetMemberCardBalanceChange",
        data_path=("data",),
        list_key="tenantMemberCardLogs",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Member balance changes ODS: GetMemberCardBalanceChange -> tenantMemberCardLogs raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_RECHARGE_SETTLE",
        class_name="OdsRechargeSettleTask",
        table_name="billiards_ods.recharge_settlements",
        endpoint="/Site/GetRechargeSettleList",
        data_path=("data",),
        list_key="settleList",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Member recharge settlements ODS: GetRechargeSettleList -> settleList raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_PACKAGE",
        class_name="OdsPackageTask",
        table_name="billiards_ods.group_buy_packages",
        endpoint="/PackageCoupon/QueryPackageCouponList",
        data_path=("data",),
        list_key="packageCouponList",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Group-buy package definitions ODS: QueryPackageCouponList -> packageCouponList raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_GROUP_BUY_REDEMPTION",
        class_name="OdsGroupBuyRedemptionTask",
        table_name="billiards_ods.group_buy_redemption_records",
        endpoint="/Site/GetSiteTableUseDetails",
        data_path=("data",),
        list_key="siteTableUseDetailsList",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Group-buy package redemption ODS: GetSiteTableUseDetails -> siteTableUseDetailsList raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_INVENTORY_STOCK",
        class_name="OdsInventoryStockTask",
        table_name="billiards_ods.goods_stock_summary",
        endpoint="/TenantGoods/GetGoodsStockReport",
        data_path=("data",),
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Inventory stock summary ODS: GetGoodsStockReport raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_INVENTORY_CHANGE",
        class_name="OdsInventoryChangeTask",
        table_name="billiards_ods.goods_stock_movements",
        endpoint="/GoodsStockManage/QueryGoodsOutboundReceipt",
        data_path=("data",),
        list_key="queryDeliveryRecordsList",
        pk_columns=(_int_col("sitegoodsstockid", "siteGoodsStockId", required=True),),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        description="Inventory change records ODS: QueryGoodsOutboundReceipt -> queryDeliveryRecordsList raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_TABLES",
        class_name="OdsTablesTask",
        table_name="billiards_ods.site_tables_master",
        endpoint="/Table/GetSiteTables",
        data_path=("data",),
        list_key="siteTables",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Table dimension ODS: GetSiteTables -> siteTables raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_GOODS_CATEGORY",
        class_name="OdsGoodsCategoryTask",
        table_name="billiards_ods.stock_goods_category_tree",
        endpoint="/TenantGoodsCategory/QueryPrimarySecondaryCategory",
        data_path=("data",),
        list_key="goodsCategoryList",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Stock goods category tree ODS: QueryPrimarySecondaryCategory -> goodsCategoryList raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_STORE_GOODS",
        class_name="OdsStoreGoodsTask",
        table_name="billiards_ods.store_goods_master",
        endpoint="/TenantGoods/GetGoodsInventoryList",
        data_path=("data",),
        list_key="orderGoodsList",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Store goods master ODS: GetGoodsInventoryList -> orderGoodsList raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_TABLE_DISCOUNT",
        class_name="OdsTableDiscountTask",
        table_name="billiards_ods.table_fee_discount_records",
        endpoint="/Site/GetTaiFeeAdjustList",
        data_path=("data",),
        list_key="taiFeeAdjustInfos",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Table-fee discount/adjustment ODS: GetTaiFeeAdjustList -> taiFeeAdjustInfos raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_TENANT_GOODS",
        class_name="OdsTenantGoodsTask",
        table_name="billiards_ods.tenant_goods_master",
        endpoint="/TenantGoods/QueryTenantGoods",
        data_path=("data",),
        list_key="tenantGoodsList",
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=False,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        description="Tenant goods master ODS: QueryTenantGoods -> tenantGoodsList raw JSON",
    ),
    OdsTaskSpec(
        code="ODS_SETTLEMENT_TICKET",
        class_name="OdsSettlementTicketTask",
        table_name="billiards_ods.settlement_ticket_details",
        endpoint="/Order/GetOrderSettleTicketNew",
        data_path=(),
        list_key=None,
        pk_columns=(),
        include_site_column=False,
        include_source_endpoint=False,
        include_page_no=False,
        include_page_size=False,
        include_fetched_at=True,
        include_record_index=True,
        conflict_columns_override=("source_file", "record_index"),
        requires_window=False,
        include_site_id=False,
        description="Settlement ticket details ODS: GetOrderSettleTicketNew raw JSON",
    ),
)

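Note that every spec above overrides the conflict key to `("source_file", "record_index")`: a re-run over the same source file replays into the same ODS rows instead of duplicating them, keyed on the record's position within its source rather than on a business ID. A rough sketch of the ON CONFLICT statement shape this implies; the SQL text here is illustrative, the real statement is built inside `GenericODSLoader`:

```python
# Illustrative shape of the upsert GenericODSLoader would issue for these specs;
# the actual SQL construction lives in the loader, not here.
def build_upsert_sql(table: str, columns: list[str], conflict_columns: list[str]) -> str:
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    conflict = ", ".join(conflict_columns)
    updates = ", ".join(
        f"{c} = EXCLUDED.{c}" for c in columns if c not in conflict_columns
    )
    return (
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict}) DO UPDATE SET {updates}"
    )


print(build_upsert_sql(
    "billiards_ods.goods_stock_movements",
    ["sitegoodsstockid", "record_index", "source_file", "payload"],
    ["source_file", "record_index"],
))
```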
def _get_spec(code: str) -> OdsTaskSpec:
    for spec in ODS_TASK_SPECS:
        if spec.code == code:
            return spec
    raise KeyError(f"Spec not found for code {code}")


_SETTLEMENT_TICKET_SPEC = _get_spec("ODS_SETTLEMENT_TICKET")


class OdsSettlementTicketTask(BaseOdsTask):
    """Special handling: fetch ticket details per payment relate_id/orderSettleId."""

    SPEC = _SETTLEMENT_TICKET_SPEC

    def extract(self, context) -> dict:
        """Fetch ticket payloads only (used by fetch-only pipeline)."""
        existing_ids = self._fetch_existing_ticket_ids()
        candidates = self._collect_settlement_ids(
            context.store_id or 0, existing_ids, context.window_start, context.window_end
        )
        candidates = [cid for cid in candidates if cid and cid not in existing_ids]
        payloads, skipped = self._fetch_ticket_payloads(candidates)
        return {"records": payloads, "skipped": skipped, "fetched": len(candidates)}

    def execute(self, cursor_data: dict | None = None) -> dict:
        spec = self.SPEC
        context = self._build_context(cursor_data)
        store_id = TypeParser.parse_int(self.config.get("app.store_id")) or 0

        counts = {"fetched": 0, "inserted": 0, "updated": 0, "skipped": 0, "errors": 0}
        loader = GenericODSLoader(
            self.db,
            spec.table_name,
            self._resolve_columns(spec),
            list(spec.conflict_columns_override or ("source_file", "record_index")),
        )
        source_file = self._resolve_source_file_hint(spec)

        try:
            existing_ids = self._fetch_existing_ticket_ids()
            candidates = self._collect_settlement_ids(
                store_id, existing_ids, context.window_start, context.window_end
            )
            candidates = [cid for cid in candidates if cid and cid not in existing_ids]
            counts["fetched"] = len(candidates)

            if not candidates:
                self.logger.info(
                    "%s: no tickets to fetch in window [%s ~ %s]",
                    spec.code,
                    context.window_start,
                    context.window_end,
                )
                return self._build_result("SUCCESS", counts)

            payloads, skipped = self._fetch_ticket_payloads(candidates)
            counts["skipped"] += skipped
            rows: list[dict] = []
            for idx, payload in enumerate(payloads):
                row = self._build_row(
                    spec=spec,
                    store_id=store_id,
                    record=payload,
                    page_no=None,
                    page_size_value=None,
                    source_file=source_file,
                    record_index=idx if spec.include_record_index else None,
                )
                if row is None:
                    counts["skipped"] += 1
                    continue
                rows.append(row)

            inserted, updated, _ = loader.upsert_rows(rows)
            counts["inserted"] += inserted
            counts["updated"] += updated
            self.db.commit()
            self.logger.info(
                "%s: ticket fetch finished, candidates=%s inserted=%s updated=%s skipped=%s",
                spec.code,
                len(candidates),
                inserted,
                updated,
                counts["skipped"],
            )
            return self._build_result("SUCCESS", counts)

        except Exception:
            counts["errors"] += 1
            self.db.rollback()
            self.logger.error("%s: ticket fetch failed", spec.code, exc_info=True)
            raise

    # ------------------------------------------------------------------ helpers
    def _fetch_existing_ticket_ids(self) -> set[int]:
        sql = """
            SELECT DISTINCT
                   CASE WHEN (payload ->> 'orderSettleId') ~ '^[0-9]+$'
                        THEN (payload ->> 'orderSettleId')::bigint
                   END AS order_settle_id
              FROM billiards_ods.settlement_ticket_details
        """
        try:
            rows = self.db.query(sql)
        except Exception:
            self.logger.warning("Failed to query existing tickets; treating as empty set", exc_info=True)
            return set()

        return {
            TypeParser.parse_int(row.get("order_settle_id"))
            for row in rows
            if row.get("order_settle_id") is not None
        }

    def _collect_settlement_ids(
        self, store_id: int, existing_ids: set[int], window_start, window_end
    ) -> list[int]:
        ids = self._fetch_from_payment_table(store_id)
        if not ids:
            ids = self._fetch_from_payment_api(store_id, window_start, window_end)
        return sorted(i for i in ids if i is not None and i not in existing_ids)

    def _fetch_from_payment_table(self, store_id: int) -> set[int]:
        sql = """
            SELECT DISTINCT COALESCE(
                       CASE WHEN (payload ->> 'orderSettleId') ~ '^[0-9]+$'
                            THEN (payload ->> 'orderSettleId')::bigint END,
                       CASE WHEN (payload ->> 'relateId') ~ '^[0-9]+$'
                            THEN (payload ->> 'relateId')::bigint END
                   ) AS order_settle_id
              FROM billiards_ods.payment_transactions
             WHERE ((payload ->> 'orderSettleId') ~ '^[0-9]+$'
                OR (payload ->> 'relateId') ~ '^[0-9]+$')
        """
        params = None
        if store_id:
            sql += " AND COALESCE((payload ->> 'siteId')::bigint, %s) = %s"
            params = (store_id, store_id)

        try:
            rows = self.db.query(sql, params)
        except Exception:
            self.logger.warning(
                "Failed to read payment records for settlement IDs; will fall back to the payment API",
                exc_info=True,
            )
            return set()

        return {
            TypeParser.parse_int(row.get("order_settle_id"))
            for row in rows
            if row.get("order_settle_id") is not None
        }

    def _fetch_from_payment_api(self, store_id: int, window_start, window_end) -> set[int]:
        params = self._merge_common_params(
            {
                "siteId": store_id,
                "StartPayTime": TypeParser.format_timestamp(window_start, self.tz),
                "EndPayTime": TypeParser.format_timestamp(window_end, self.tz),
            }
        )
        candidate_ids: set[int] = set()
        try:
            for _, records, _, _ in self.api.iter_paginated(
                endpoint="/PayLog/GetPayLogListPage",
                params=params,
                page_size=self.config.get("api.page_size", 200),
                data_path=("data",),
            ):
                for rec in records:
                    relate_id = TypeParser.parse_int(
                        (rec or {}).get("relateId")
                        or (rec or {}).get("orderSettleId")
                        or (rec or {}).get("order_settle_id")
                    )
                    if relate_id:
                        candidate_ids.add(relate_id)
        except Exception:
            self.logger.warning(
                "Failed to fetch settlement IDs from the payment API; skipping this fallback source for the current batch",
                exc_info=True,
            )
        return candidate_ids

    def _fetch_ticket_payload(self, order_settle_id: int):
        payload = None
        try:
            for _, _, _, response in self.api.iter_paginated(
                endpoint=self.SPEC.endpoint,
                params={"orderSettleId": order_settle_id},
                page_size=None,
                data_path=self.SPEC.data_path,
                list_key=self.SPEC.list_key,
            ):
                payload = response
        except Exception:
            self.logger.warning(
                "Ticket API call failed for orderSettleId=%s", order_settle_id, exc_info=True
            )
        if isinstance(payload, dict) and isinstance(payload.get("data"), list) and len(payload["data"]) == 1:
            # Local stubs/replays may wrap the response in a single-element list; unwrap to match the real structure.
            payload = payload["data"][0]
        return payload

    def _fetch_ticket_payloads(self, candidates: list[int]) -> tuple[list, int]:
        """Fetch ticket payloads for a set of orderSettleIds; returns (payloads, skipped)."""
        payloads: list = []
        skipped = 0
        for order_settle_id in candidates:
            payload = self._fetch_ticket_payload(order_settle_id)
            if payload:
                payloads.append(payload)
            else:
                skipped += 1
        return payloads, skipped


def _build_task_class(spec: OdsTaskSpec) -> Type[BaseOdsTask]:
    attrs = {
        "SPEC": spec,
        "__doc__": spec.description or f"ODS ingestion task {spec.code}",
        "__module__": __name__,
    }
    return type(spec.class_name, (BaseOdsTask,), attrs)


ENABLED_ODS_CODES = {
    "ODS_ASSISTANT_ACCOUNTS",
    "ODS_ASSISTANT_LEDGER",
    "ODS_ASSISTANT_ABOLISH",
    "ODS_INVENTORY_CHANGE",
    "ODS_INVENTORY_STOCK",
    "ODS_PACKAGE",
    "ODS_GROUP_BUY_REDEMPTION",
    "ODS_MEMBER",
    "ODS_MEMBER_BALANCE",
    "ODS_MEMBER_CARD",
    "ODS_PAYMENT",
    "ODS_REFUND",
    "ODS_COUPON_VERIFY",
    "ODS_RECHARGE_SETTLE",
    "ODS_TABLES",
    "ODS_GOODS_CATEGORY",
    "ODS_STORE_GOODS",
    "ODS_TABLE_DISCOUNT",
    "ODS_TENANT_GOODS",
    "ODS_SETTLEMENT_TICKET",
    "ODS_ORDER_SETTLE",
}

ODS_TASK_CLASSES: Dict[str, Type[BaseOdsTask]] = {
    spec.code: _build_task_class(spec)
    for spec in ODS_TASK_SPECS
    if spec.code in ENABLED_ODS_CODES
}
# Override with specialized settlement ticket implementation
ODS_TASK_CLASSES["ODS_SETTLEMENT_TICKET"] = OdsSettlementTicketTask

__all__ = ["ODS_TASK_CLASSES", "ODS_TASK_SPECS", "BaseOdsTask", "ENABLED_ODS_CODES"]
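`_build_task_class` stamps out one concrete task class per spec with `type()`, so adding an endpoint is a data change (a new `OdsTaskSpec`) rather than a new module; only `ODS_SETTLEMENT_TICKET` gets a handwritten subclass. A minimal standalone demo of the same technique; the `Spec`/`Base` names here are illustrative, not from the repo:

```python
# Minimal demo of spec-driven class generation with type(), mirroring _build_task_class.
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    code: str
    class_name: str


class Base:
    SPEC: Spec

    def task_code(self) -> str:
        return self.SPEC.code


def build_class(spec: Spec) -> type:
    # type(name, bases, attrs) creates a new class object at runtime.
    return type(spec.class_name, (Base,), {"SPEC": spec, "__module__": __name__})


registry = {
    s.code: build_class(s) for s in (Spec("DEMO_A", "DemoATask"), Spec("DEMO_B", "DemoBTask"))
}
print(registry["DEMO_A"]().task_code())  # DEMO_A
print(registry["DEMO_B"].__name__)       # DemoBTask
```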
91
etl_billiards/tasks/orders_task.py
Normal file
@@ -0,0 +1,91 @@
# -*- coding: utf-8 -*-
"""Order ETL task."""
import json

from .base_task import BaseTask, TaskContext
from loaders.facts.order import OrderLoader
from models.parsers import TypeParser


class OrdersTask(BaseTask):
    """Order data ETL task."""

    def get_task_code(self) -> str:
        return "ORDERS"

    # ------------------------------------------------------------------ E/T/L hooks
    def extract(self, context: TaskContext) -> dict:
        """Pull order records from the API."""
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "rangeStartTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "rangeEndTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, pages_meta = self.api.get_paginated(
            endpoint="/Site/GetAllOrderSettleList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="settleList",
        )
        return {"records": records, "meta": pages_meta}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        """Parse raw order JSON."""
        parsed_records = []
        skipped = 0

        for rec in extracted.get("records", []):
            parsed = self._parse_order(rec, context.store_id)
            if parsed:
                parsed_records.append(parsed)
            else:
                skipped += 1

        return {
            "records": parsed_records,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        """Write to fact_order."""
        loader = OrderLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_orders(
            transformed["records"], context.store_id
        )

        counts = {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }
        return counts

    # ------------------------------------------------------------------ helpers
    def _parse_order(self, raw: dict, store_id: int) -> dict | None:
        """Parse a single order record."""
        try:
            return {
                "store_id": store_id,
                "order_id": TypeParser.parse_int(raw.get("orderId")),
                "order_no": raw.get("orderNo"),
                "member_id": TypeParser.parse_int(raw.get("memberId")),
                "table_id": TypeParser.parse_int(raw.get("tableId")),
                "order_time": TypeParser.parse_timestamp(raw.get("orderTime"), self.tz),
                "end_time": TypeParser.parse_timestamp(raw.get("endTime"), self.tz),
                "total_amount": TypeParser.parse_decimal(raw.get("totalAmount")),
                "discount_amount": TypeParser.parse_decimal(raw.get("discountAmount")),
                "final_amount": TypeParser.parse_decimal(raw.get("finalAmount")),
                "pay_status": raw.get("payStatus"),
                "order_status": raw.get("orderStatus"),
                "remark": raw.get("remark"),
                "raw_data": json.dumps(raw, ensure_ascii=False),
            }
        except Exception as exc:
            self.logger.warning("Failed to parse order: %s, raw data: %s", exc, raw)
            return None
90
etl_billiards/tasks/packages_task.py
Normal file
@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
"""Group-buy/package definition task."""

import json

from .base_task import BaseTask, TaskContext
from loaders.dimensions.package import PackageDefinitionLoader
from models.parsers import TypeParser


class PackagesDefTask(BaseTask):
    """Sync group-buy package definitions."""

    def get_task_code(self) -> str:
        return "PACKAGES_DEF"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params({"siteId": context.store_id})
        records, _ = self.api.get_paginated(
            endpoint="/PackageCoupon/QueryPackageCouponList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="packageCouponList",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_package(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = PackageDefinitionLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_packages(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_package(self, raw: dict, store_id: int) -> dict | None:
        package_id = TypeParser.parse_int(raw.get("id"))
        if not package_id:
            self.logger.warning("Skipping package record missing package id: %s", raw)
            return None

        return {
            "store_id": store_id,
            "package_id": package_id,
            "package_code": raw.get("package_id") or raw.get("packageId"),
            "package_name": raw.get("package_name"),
            "table_area_id": raw.get("table_area_id"),
            "table_area_name": raw.get("table_area_name"),
            "selling_price": TypeParser.parse_decimal(
                raw.get("selling_price") or raw.get("sellingPrice")
            ),
            "duration_seconds": TypeParser.parse_int(raw.get("duration")),
            "start_time": TypeParser.parse_timestamp(
                raw.get("start_time") or raw.get("startTime"), self.tz
            ),
            "end_time": TypeParser.parse_timestamp(
                raw.get("end_time") or raw.get("endTime"), self.tz
            ),
            "type": raw.get("type"),
            "is_enabled": raw.get("is_enabled"),
            "is_delete": raw.get("is_delete"),
            "usable_count": TypeParser.parse_int(raw.get("usable_count")),
            "creator_name": raw.get("creator_name"),
            "date_type": raw.get("date_type"),
            "group_type": raw.get("group_type"),
            "coupon_money": TypeParser.parse_decimal(
                raw.get("coupon_money") or raw.get("couponMoney")
            ),
            "area_tag_type": raw.get("area_tag_type"),
            "system_group_type": raw.get("system_group_type"),
            "card_type_ids": raw.get("card_type_ids"),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
138
etl_billiards/tasks/payments_dwd_task.py
Normal file
@@ -0,0 +1,138 @@
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.facts.payment import PaymentLoader
from models.parsers import TypeParser
import json


class PaymentsDwdTask(BaseDwdTask):
    """
    DWD Task: Process Payment Records from ODS to Fact Table

    Source: billiards_ods.ods_payment_record
    Target: billiards.fact_payment
    """

    def get_task_code(self) -> str:
        return "PAYMENTS_DWD"

    def execute(self) -> dict:
        self.logger.info(f"Starting {self.get_task_code()} task")

        window_start, window_end, _ = self._get_time_window()
        self.logger.info(f"Processing window: {window_start} to {window_end}")

        loader = PaymentLoader(self.db, logger=self.logger)
        store_id = self.config.get("app.store_id")

        total_inserted = 0
        total_updated = 0
        total_skipped = 0

        # Iterate ODS data within the incremental window
        batches = self.iter_ods_rows(
            table_name="billiards_ods.ods_payment_record",
            columns=["site_id", "pay_id", "payload", "fetched_at"],
            start_time=window_start,
            end_time=window_end,
        )

        for batch in batches:
            if not batch:
                continue

            parsed_rows = []
            for row in batch:
                payload = self.parse_payload(row)
                if not payload:
                    continue

                parsed = self._parse_payment(payload, store_id)
                if parsed:
                    parsed_rows.append(parsed)

            if parsed_rows:
                inserted, updated, skipped = loader.upsert_payments(parsed_rows, store_id)
                total_inserted += inserted
                total_updated += updated
                total_skipped += skipped

        self.db.commit()

        self.logger.info(
            "Task %s completed. inserted=%s updated=%s skipped=%s",
            self.get_task_code(),
            total_inserted,
            total_updated,
            total_skipped,
        )

        return {
            "status": "SUCCESS",
            "counts": {
                "inserted": total_inserted,
                "updated": total_updated,
                "skipped": total_skipped,
            },
            "window_start": window_start,
            "window_end": window_end,
        }

    def _parse_payment(self, raw: dict, store_id: int) -> dict | None:
        """Parse an ODS payload into the fact structure"""
        try:
            pay_id = TypeParser.parse_int(raw.get("payId") or raw.get("id"))
            if not pay_id:
                return None

            relate_type = str(raw.get("relateType") or raw.get("relate_type") or "")
            relate_id = TypeParser.parse_int(raw.get("relateId") or raw.get("relate_id"))

            # Attempt to populate settlement / trade identifiers
            order_settle_id = TypeParser.parse_int(
                raw.get("orderSettleId") or raw.get("order_settle_id")
            )
            order_trade_no = TypeParser.parse_int(
                raw.get("orderTradeNo") or raw.get("order_trade_no")
            )

            if relate_type in {"1", "SETTLE", "ORDER"}:
                order_settle_id = order_settle_id or relate_id

            return {
                "store_id": store_id,
                "pay_id": pay_id,
                "order_id": TypeParser.parse_int(raw.get("orderId") or raw.get("order_id")),
                "order_settle_id": order_settle_id,
                "order_trade_no": order_trade_no,
                "relate_type": relate_type,
                "relate_id": relate_id,
                "site_id": TypeParser.parse_int(
                    raw.get("siteId") or raw.get("site_id") or store_id
                ),
                "tenant_id": TypeParser.parse_int(raw.get("tenantId") or raw.get("tenant_id")),
                "create_time": TypeParser.parse_timestamp(
                    raw.get("createTime") or raw.get("create_time"), self.tz
                ),
                "pay_time": TypeParser.parse_timestamp(raw.get("payTime"), self.tz),
                "pay_amount": TypeParser.parse_decimal(raw.get("payAmount")),
                "fee_amount": TypeParser.parse_decimal(
                    raw.get("feeAmount")
                    or raw.get("serviceFee")
                    or raw.get("channelFee")
                    or raw.get("fee_amount")
                ),
                "discount_amount": TypeParser.parse_decimal(
                    raw.get("discountAmount")
                    or raw.get("couponAmount")
                    or raw.get("discount_amount")
                ),
                "payment_method": str(raw.get("paymentMethod") or raw.get("payment_method") or ""),
                "pay_type": raw.get("payType") or raw.get("pay_type"),
                "online_pay_channel": raw.get("onlinePayChannel") or raw.get("online_pay_channel"),
                "pay_terminal": raw.get("payTerminal") or raw.get("pay_terminal"),
                "pay_status": str(raw.get("payStatus") or raw.get("pay_status") or ""),
                "remark": raw.get("remark"),
                "raw_data": json.dumps(raw, ensure_ascii=False),
            }
        except Exception as e:
            self.logger.warning(f"Error parsing payment: {e}")
            return None
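A minimal, self-contained sketch of the key-coalescing logic in `_parse_payment` above, useful as a quick sanity check. `parse_int` here is a simplified stand-in for `TypeParser.parse_int` (the real parser is not shown in this diff):

```python
# Stand-in for TypeParser.parse_int: tolerate None/"" and bad values.
def parse_int(value):
    try:
        return int(value) if value not in (None, "") else None
    except (TypeError, ValueError):
        return None

def settle_id_for(raw: dict) -> int | None:
    """Mirror the orderSettleId fallback: settlement-type payments
    inherit relate_id when no explicit settle id is present."""
    relate_type = str(raw.get("relateType") or raw.get("relate_type") or "")
    relate_id = parse_int(raw.get("relateId") or raw.get("relate_id"))
    settle_id = parse_int(raw.get("orderSettleId") or raw.get("order_settle_id"))
    if relate_type in {"1", "SETTLE", "ORDER"}:
        settle_id = settle_id or relate_id
    return settle_id

assert settle_id_for({"relateType": "1", "relateId": 42}) == 42
assert settle_id_for({"order_settle_id": 7, "relate_type": "SETTLE"}) == 7
```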
111 etl_billiards/tasks/payments_task.py Normal file
@@ -0,0 +1,111 @@
# -*- coding: utf-8 -*-
"""Payment records ETL task"""
import json

from .base_task import BaseTask, TaskContext
from loaders.facts.payment import PaymentLoader
from models.parsers import TypeParser


class PaymentsTask(BaseTask):
    """Payment record E/T/L task"""

    def get_task_code(self) -> str:
        return "PAYMENTS"

    # ------------------------------------------------------------------ E/T/L hooks
    def extract(self, context: TaskContext) -> dict:
        """Fetch payment records from the API"""
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "StartPayTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "EndPayTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, pages_meta = self.api.get_paginated(
            endpoint="/PayLog/GetPayLogListPage",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
        )
        return {"records": records, "meta": pages_meta}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        """Parse the payment JSON"""
        parsed, skipped = [], 0
        for rec in extracted.get("records", []):
            cleaned = self._parse_payment(rec, context.store_id)
            if cleaned:
                parsed.append(cleaned)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        """Write into fact_payment"""
        loader = PaymentLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_payments(
            transformed["records"], context.store_id
        )
        counts = {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }
        return counts

    # ------------------------------------------------------------------ helpers
    def _parse_payment(self, raw: dict, store_id: int) -> dict | None:
        """Parse a single payment record"""
        try:
            return {
                "store_id": store_id,
                "pay_id": TypeParser.parse_int(raw.get("payId") or raw.get("id")),
                "order_id": TypeParser.parse_int(raw.get("orderId")),
                "order_settle_id": TypeParser.parse_int(
                    raw.get("orderSettleId") or raw.get("order_settle_id")
                ),
                "order_trade_no": TypeParser.parse_int(
                    raw.get("orderTradeNo") or raw.get("order_trade_no")
                ),
                "relate_type": raw.get("relateType") or raw.get("relate_type"),
                "relate_id": TypeParser.parse_int(raw.get("relateId") or raw.get("relate_id")),
                "site_id": TypeParser.parse_int(
                    raw.get("siteId") or raw.get("site_id") or store_id
                ),
                "tenant_id": TypeParser.parse_int(raw.get("tenantId") or raw.get("tenant_id")),
                "pay_time": TypeParser.parse_timestamp(raw.get("payTime"), self.tz),
                "create_time": TypeParser.parse_timestamp(
                    raw.get("createTime") or raw.get("create_time"), self.tz
                ),
                "pay_amount": TypeParser.parse_decimal(raw.get("payAmount")),
                "fee_amount": TypeParser.parse_decimal(
                    raw.get("feeAmount")
                    or raw.get("serviceFee")
                    or raw.get("channelFee")
                    or raw.get("fee_amount")
                ),
                "discount_amount": TypeParser.parse_decimal(
                    raw.get("discountAmount")
                    or raw.get("couponAmount")
                    or raw.get("discount_amount")
                ),
                "pay_type": raw.get("payType"),
                "payment_method": raw.get("paymentMethod") or raw.get("payment_method"),
                "online_pay_channel": raw.get("onlinePayChannel")
                or raw.get("online_pay_channel"),
                "pay_status": raw.get("payStatus"),
                "pay_terminal": raw.get("payTerminal") or raw.get("pay_terminal"),
                "remark": raw.get("remark"),
                "raw_data": json.dumps(raw, ensure_ascii=False),
            }
        except Exception as exc:
            self.logger.warning("Failed to parse payment record: %s, raw data: %s", exc, raw)
            return None
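The E/T/L hooks above rely on a driver in `BaseTask` that is not shown in this diff. A hedged sketch of how the three hooks presumably compose per window (`run_once` and its wiring are illustrative, not the actual base-class API):

```python
def run_once(task, context):
    """Illustrative only: extract -> transform -> load for one window."""
    extracted = task.extract(context)                 # pull pages from the API
    transformed = task.transform(extracted, context)  # parse/clean records
    counts = task.load(transformed, context)          # upsert into the fact table
    return counts                                     # fetched/inserted/updated/skipped
```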
93 etl_billiards/tasks/products_task.py Normal file
@@ -0,0 +1,93 @@
# -*- coding: utf-8 -*-
"""Product catalog (PRODUCTS) ETL task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.dimensions.product import ProductLoader
from models.parsers import TypeParser


class ProductsTask(BaseTask):
    """Product dimension ETL task"""

    def get_task_code(self) -> str:
        return "PRODUCTS"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params({"siteId": context.store_id})
        records, _ = self.api.get_paginated(
            endpoint="/TenantGoods/QueryTenantGoods",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="tenantGoodsList",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            parsed_row = self._parse_product(raw, context.store_id)
            if parsed_row:
                parsed.append(parsed_row)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = ProductLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_products(
            transformed["records"], context.store_id
        )
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_product(self, raw: dict, store_id: int) -> dict | None:
        try:
            product_id = TypeParser.parse_int(
                raw.get("siteGoodsId") or raw.get("tenantGoodsId") or raw.get("productId")
            )
            if not product_id:
                return None

            return {
                "store_id": store_id,
                "product_id": product_id,
                "site_product_id": TypeParser.parse_int(raw.get("siteGoodsId")),
                "product_name": raw.get("goodsName") or raw.get("productName"),
                "category_id": TypeParser.parse_int(
                    raw.get("tenantGoodsCategoryId") or raw.get("goodsCategoryId")
                ),
                "category_name": raw.get("categoryName"),
                "second_category_id": TypeParser.parse_int(raw.get("goodsCategorySecondId")),
                "unit": raw.get("goodsUnit"),
                "cost_price": TypeParser.parse_decimal(raw.get("costPrice")),
                "sale_price": TypeParser.parse_decimal(
                    raw.get("goodsPrice") or raw.get("salePrice")
                ),
                "allow_discount": None,
                "status": raw.get("goodsState") or raw.get("status"),
                "supplier_id": TypeParser.parse_int(raw.get("supplierId"))
                if raw.get("supplierId")
                else None,
                "barcode": raw.get("barcode"),
                "is_combo": bool(raw.get("isCombo"))
                if raw.get("isCombo") is not None
                else None,
                "created_time": TypeParser.parse_timestamp(raw.get("createTime"), self.tz),
                "updated_time": TypeParser.parse_timestamp(raw.get("updateTime"), self.tz),
                "raw_data": json.dumps(raw, ensure_ascii=False),
            }
        except Exception as exc:
            self.logger.warning("Failed to parse product record: %s, raw data: %s", exc, raw)
            return None
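`ProductLoader.upsert_products` is not part of this diff; given the repo's batch-upsert database layer, a PostgreSQL upsert it might issue could look like the sketch below (the `dim_product` table and its conflict key are assumptions inferred from the parsed dict, not confirmed by the source):

```python
# Hypothetical upsert statement; dim_product and its (store_id, product_id)
# key are assumptions for illustration only.
UPSERT_PRODUCT_SQL = """
INSERT INTO billiards.dim_product (store_id, product_id, product_name, sale_price, raw_data)
VALUES (%(store_id)s, %(product_id)s, %(product_name)s, %(sale_price)s, %(raw_data)s)
ON CONFLICT (store_id, product_id) DO UPDATE
SET product_name = EXCLUDED.product_name,
    sale_price   = EXCLUDED.sale_price,
    raw_data     = EXCLUDED.raw_data
"""
```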
90 etl_billiards/tasks/refunds_task.py Normal file
@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
"""Refund records task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.refund import RefundLoader
from models.parsers import TypeParser


class RefundsTask(BaseTask):
    """Sync payment refund flows"""

    def get_task_code(self) -> str:
        return "REFUNDS"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/Order/GetRefundPayLogList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_refund(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = RefundLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_refunds(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_refund(self, raw: dict, store_id: int) -> dict | None:
        refund_id = TypeParser.parse_int(raw.get("id"))
        if not refund_id:
            self.logger.warning("Skipping record missing refund ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "refund_id": refund_id,
            "site_id": TypeParser.parse_int(raw.get("site_id") or raw.get("siteId")),
            "tenant_id": TypeParser.parse_int(raw.get("tenant_id") or raw.get("tenantId")),
            "pay_amount": TypeParser.parse_decimal(raw.get("pay_amount")),
            "pay_status": raw.get("pay_status"),
            "pay_time": TypeParser.parse_timestamp(
                raw.get("pay_time") or raw.get("payTime"), self.tz
            ),
            "create_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "relate_type": raw.get("relate_type"),
            "relate_id": TypeParser.parse_int(raw.get("relate_id")),
            "payment_method": raw.get("payment_method"),
            "refund_amount": TypeParser.parse_decimal(raw.get("refund_amount")),
            "action_type": raw.get("action_type"),
            "pay_terminal": raw.get("pay_terminal"),
            "operator_id": TypeParser.parse_int(raw.get("operator_id")),
            "channel_pay_no": raw.get("channel_pay_no"),
            "channel_fee": TypeParser.parse_decimal(raw.get("channel_fee")),
            "is_delete": raw.get("is_delete"),
            "member_id": TypeParser.parse_int(raw.get("member_id")),
            "member_card_id": TypeParser.parse_int(raw.get("member_card_id")),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
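The window parameters passed to `extract` above come from `TypeParser.format_timestamp`; assuming it renders store-local time as "YYYY-MM-DD HH:MM:SS" (the exact format is not confirmed by this diff), the call reduces to something like:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

tz = ZoneInfo("Asia/Shanghai")  # assumed store timezone, for illustration
window_start = datetime(2025, 11, 20, 0, 0, tzinfo=tz)
# Hypothetical equivalent of TypeParser.format_timestamp(window_start, tz):
print(window_start.strftime("%Y-%m-%d %H:%M:%S"))  # -> 2025-11-20 00:00:00
```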
92 etl_billiards/tasks/table_discount_task.py Normal file
@@ -0,0 +1,92 @@
# -*- coding: utf-8 -*-
"""Table-fee discount task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.table_discount import TableDiscountLoader
from models.parsers import TypeParser


class TableDiscountTask(BaseTask):
    """Sync table-fee discount / price-adjustment records"""

    def get_task_code(self) -> str:
        return "TABLE_DISCOUNT"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "startTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "endTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/Site/GetTaiFeeAdjustList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="taiFeeAdjustInfos",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_discount(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = TableDiscountLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_discounts(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_discount(self, raw: dict, store_id: int) -> dict | None:
        discount_id = TypeParser.parse_int(raw.get("id"))
        if not discount_id:
            self.logger.warning("Skipping record missing discount ID: %s", raw)
            return None

        table_profile = raw.get("tableProfile") or {}
        return {
            "store_id": store_id,
            "discount_id": discount_id,
            "adjust_type": raw.get("adjust_type") or raw.get("adjustType"),
            "applicant_id": TypeParser.parse_int(raw.get("applicant_id")),
            "applicant_name": raw.get("applicant_name"),
            "operator_id": TypeParser.parse_int(raw.get("operator_id")),
            "operator_name": raw.get("operator_name"),
            "ledger_amount": TypeParser.parse_decimal(raw.get("ledger_amount")),
            "ledger_count": TypeParser.parse_int(raw.get("ledger_count")),
            "ledger_name": raw.get("ledger_name"),
            "ledger_status": raw.get("ledger_status"),
            "order_settle_id": TypeParser.parse_int(raw.get("order_settle_id")),
            "order_trade_no": TypeParser.parse_int(raw.get("order_trade_no")),
            "site_table_id": TypeParser.parse_int(
                raw.get("site_table_id") or table_profile.get("id")
            ),
            "table_area_id": TypeParser.parse_int(
                raw.get("tableAreaId") or table_profile.get("site_table_area_id")
            ),
            "table_area_name": table_profile.get("site_table_area_name"),
            "create_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "is_delete": raw.get("is_delete"),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
84 etl_billiards/tasks/tables_task.py Normal file
@@ -0,0 +1,84 @@
# -*- coding: utf-8 -*-
"""Table catalog task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.dimensions.table import TableLoader
from models.parsers import TypeParser


class TablesTask(BaseTask):
    """Sync the store's table list"""

    def get_task_code(self) -> str:
        return "TABLES"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params({"siteId": context.store_id})
        records, _ = self.api.get_paginated(
            endpoint="/Table/GetSiteTables",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="siteTables",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_table(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = TableLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_tables(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_table(self, raw: dict, store_id: int) -> dict | None:
        table_id = TypeParser.parse_int(raw.get("id"))
        if not table_id:
            self.logger.warning("Skipping table record missing table_id: %s", raw)
            return None

        return {
            "store_id": store_id,
            "table_id": table_id,
            "site_id": TypeParser.parse_int(raw.get("site_id") or raw.get("siteId")),
            "area_id": TypeParser.parse_int(
                raw.get("site_table_area_id") or raw.get("siteTableAreaId")
            ),
            "area_name": raw.get("areaName") or raw.get("site_table_area_name"),
            "table_name": raw.get("table_name") or raw.get("tableName"),
            "table_price": TypeParser.parse_decimal(
                raw.get("table_price") or raw.get("tablePrice")
            ),
            "table_status": raw.get("table_status") or raw.get("tableStatus"),
            "table_status_name": raw.get("tableStatusName"),
            "light_status": raw.get("light_status"),
            "is_rest_area": raw.get("is_rest_area"),
            "show_status": raw.get("show_status"),
            "virtual_table": raw.get("virtual_table"),
            "charge_free": raw.get("charge_free"),
            "only_allow_groupon": raw.get("only_allow_groupon"),
            "is_online_reservation": raw.get("is_online_reservation"),
            "created_time": TypeParser.parse_timestamp(
                raw.get("create_time") or raw.get("createTime"), self.tz
            ),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
69 etl_billiards/tasks/ticket_dwd_task.py Normal file
@@ -0,0 +1,69 @@
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.facts.ticket import TicketLoader


class TicketDwdTask(BaseDwdTask):
    """
    DWD Task: Process Ticket Details from ODS to Fact Tables

    Source: billiards_ods.settlement_ticket_details
    Targets:
      - billiards.fact_order
      - billiards.fact_order_goods
      - billiards.fact_table_usage
      - billiards.fact_assistant_service
    """

    def get_task_code(self) -> str:
        return "TICKET_DWD"

    def execute(self) -> dict:
        self.logger.info(f"Starting {self.get_task_code()} task")

        # 1. Get time window (incremental load)
        window_start, window_end, _ = self._get_time_window()
        self.logger.info(f"Processing window: {window_start} to {window_end}")

        # 2. Initialize loader
        loader = TicketLoader(self.db, logger=self.logger)
        store_id = self.config.get("app.store_id")

        total_inserted = 0
        total_errors = 0

        # 3. Iterate ODS data: query settlement_ticket_details by fetched_at
        batches = self.iter_ods_rows(
            table_name="billiards_ods.settlement_ticket_details",
            columns=["payload", "fetched_at", "source_file", "record_index"],
            start_time=window_start,
            end_time=window_end,
        )

        for batch in batches:
            if not batch:
                continue

            # Extract payloads
            tickets = []
            for row in batch:
                payload = self.parse_payload(row)
                if payload:
                    tickets.append(payload)

            # Process batch
            inserted, errors = loader.process_tickets(tickets, store_id)
            total_inserted += inserted
            total_errors += errors

        # 4. Commit
        self.db.commit()

        self.logger.info(
            f"Task {self.get_task_code()} completed. Inserted: {total_inserted}, Errors: {total_errors}"
        )

        return {
            "status": "success",
            "inserted": total_inserted,
            "errors": total_errors,
            "window_start": window_start.isoformat(),
            "window_end": window_end.isoformat(),
        }
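`parse_payload` comes from `BaseDwdTask` and is not shown in this diff; a minimal stand-in under the assumption that the ODS `payload` column holds JSON (possibly already decoded by a jsonb-aware driver):

```python
import json

def parse_payload(row: dict) -> dict | None:
    """Decode the ODS payload column; tolerate pre-decoded jsonb values."""
    payload = row.get("payload")
    if isinstance(payload, dict):  # driver already decoded jsonb
        return payload
    if payload is None:
        return None
    try:
        return json.loads(payload)
    except (TypeError, ValueError):
        return None
```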
102 etl_billiards/tasks/topups_task.py Normal file
@@ -0,0 +1,102 @@
# -*- coding: utf-8 -*-
"""Top-up records task"""

import json

from .base_task import BaseTask, TaskContext
from loaders.facts.topup import TopupLoader
from models.parsers import TypeParser


class TopupsTask(BaseTask):
    """Sync stored-value recharge settlement records"""

    def get_task_code(self) -> str:
        return "TOPUPS"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params(
            {
                "siteId": context.store_id,
                "rangeStartTime": TypeParser.format_timestamp(context.window_start, self.tz),
                "rangeEndTime": TypeParser.format_timestamp(context.window_end, self.tz),
            }
        )
        records, _ = self.api.get_paginated(
            endpoint="/Site/GetRechargeSettleList",
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
            list_key="settleList",
        )
        return {"records": records}

    def transform(self, extracted: dict, context: TaskContext) -> dict:
        parsed, skipped = [], 0
        for raw in extracted.get("records", []):
            mapped = self._parse_topup(raw, context.store_id)
            if mapped:
                parsed.append(mapped)
            else:
                skipped += 1
        return {
            "records": parsed,
            "fetched": len(extracted.get("records", [])),
            "skipped": skipped,
        }

    def load(self, transformed: dict, context: TaskContext) -> dict:
        loader = TopupLoader(self.db)
        inserted, updated, loader_skipped = loader.upsert_topups(transformed["records"])
        return {
            "fetched": transformed["fetched"],
            "inserted": inserted,
            "updated": updated,
            "skipped": transformed["skipped"] + loader_skipped,
            "errors": 0,
        }

    def _parse_topup(self, raw: dict, store_id: int) -> dict | None:
        node = raw.get("settleList") if isinstance(raw.get("settleList"), dict) else raw
        topup_id = TypeParser.parse_int(node.get("id"))
        if not topup_id:
            self.logger.warning("Skipping record missing top-up ID: %s", raw)
            return None

        return {
            "store_id": store_id,
            "topup_id": topup_id,
            "member_id": TypeParser.parse_int(node.get("memberId")),
            "member_name": node.get("memberName"),
            "member_phone": node.get("memberPhone"),
            "card_id": TypeParser.parse_int(node.get("tenantMemberCardId")),
            "card_type_name": node.get("memberCardTypeName"),
            "pay_amount": TypeParser.parse_decimal(node.get("payAmount")),
            "consume_money": TypeParser.parse_decimal(node.get("consumeMoney")),
            "settle_status": node.get("settleStatus"),
            "settle_type": node.get("settleType"),
            "settle_name": node.get("settleName"),
            "settle_relate_id": TypeParser.parse_int(node.get("settleRelateId")),
            "pay_time": TypeParser.parse_timestamp(
                node.get("payTime") or node.get("pay_time"), self.tz
            ),
            "create_time": TypeParser.parse_timestamp(
                node.get("createTime") or node.get("create_time"), self.tz
            ),
            "operator_id": TypeParser.parse_int(node.get("operatorId")),
            "operator_name": node.get("operatorName"),
            "payment_method": node.get("paymentMethod"),
            "refund_amount": TypeParser.parse_decimal(node.get("refundAmount")),
            "cash_amount": TypeParser.parse_decimal(node.get("cashAmount")),
            "card_amount": TypeParser.parse_decimal(node.get("cardAmount")),
            "balance_amount": TypeParser.parse_decimal(node.get("balanceAmount")),
            "online_amount": TypeParser.parse_decimal(node.get("onlineAmount")),
            "rounding_amount": TypeParser.parse_decimal(node.get("roundingAmount")),
            "adjust_amount": TypeParser.parse_decimal(node.get("adjustAmount")),
            "goods_money": TypeParser.parse_decimal(node.get("goodsMoney")),
            "table_charge_money": TypeParser.parse_decimal(node.get("tableChargeMoney")),
            "service_money": TypeParser.parse_decimal(node.get("serviceMoney")),
            "coupon_amount": TypeParser.parse_decimal(node.get("couponAmount")),
            "order_remark": node.get("orderRemark"),
            "raw_data": json.dumps(raw, ensure_ascii=False),
        }
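The `settleList` unwrap in `_parse_topup` tolerates both response shapes the API apparently returns (a nested object or a flat record). A tiny self-contained check of that pattern (the shapes are inferred from the code above, not confirmed elsewhere):

```python
def unwrap(raw: dict) -> dict:
    # Mirror the node selection in _parse_topup.
    node = raw.get("settleList")
    return node if isinstance(node, dict) else raw

assert unwrap({"settleList": {"id": 1}})["id"] == 1  # nested shape
assert unwrap({"id": 2})["id"] == 2                  # flat shape
```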
0 etl_billiards/tests/__init__.py Normal file
0 etl_billiards/tests/integration/__init__.py Normal file
33 etl_billiards/tests/integration/test_database.py Normal file
@@ -0,0 +1,33 @@
# -*- coding: utf-8 -*-
"""Database integration tests"""
import pytest
from database.connection import DatabaseConnection
from database.operations import DatabaseOperations

# Note: these tests require a real database connection.
# In CI/CD they should run against a dedicated test database.

@pytest.fixture
def db_connection():
    """Database connection fixture"""
    # Read the test database DSN from the environment
    import os
    dsn = os.environ.get("TEST_DB_DSN")
    if not dsn:
        pytest.skip("Test database not configured")

    conn = DatabaseConnection(dsn)
    yield conn
    conn.close()

def test_database_query(db_connection):
    """Test a basic query"""
    result = db_connection.query("SELECT 1 AS test")
    assert len(result) == 1
    assert result[0]["test"] == 1

def test_database_operations(db_connection):
    """Test database operations"""
    ops = DatabaseOperations(db_connection)
    # TODO: add real test cases
    pass
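To run these integration tests against a scratch database, point `TEST_DB_DSN` at it before invoking pytest (the DSN below is illustrative only):

```python
# Illustrative invocation, e.g. from a shell wrapper:
#   TEST_DB_DSN=postgresql://user:pwd@localhost:5432/llzq_test \
#       pytest etl_billiards/tests/integration -q
# Without TEST_DB_DSN, the fixture above calls pytest.skip(), so the suite
# degrades gracefully on machines with no database available.
```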
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user