Compare commits

...

17 Commits

Author SHA1 Message Date
Neo
a6ad343092 ODS 完成 2025-11-30 07:19:05 +08:00
Neo
b9b050bb5d ODS 完成 2025-11-30 07:18:55 +08:00
Neo
cbd16a39ba 阶段性更新 2025-11-20 01:27:33 +08:00
Neo
92f219b575 阶段性更新 2025-11-20 01:27:04 +08:00
Neo
b1f64c4bac 版本更改 2025-11-19 05:35:22 +08:00
Neo
ed47754b46 版本更改 2025-11-19 05:35:10 +08:00
Neo
fbee8a751e 同步 2025-11-19 05:32:03 +08:00
Neo
cbe48c8ee7 为WinSW做准备 2025-11-19 05:05:11 +08:00
Neo
821d302243 Merge branch 'main' into dev 2025-11-19 03:38:27 +08:00
Neo
9a1df70a23 补全任务与测试 2025-11-19 03:36:44 +08:00
Neo
5bb5a8a568 迁移代码到Git 2025-11-18 21:46:46 +08:00
Neo
c3749474c6 迁移代码到Git 2025-11-18 02:32:00 +08:00
Neo
7f87421678 迁移代码到Git 2025-11-18 02:31:52 +08:00
Neo
84e80841cd 代码迁移 2025-11-18 02:28:47 +08:00
13d853c3f5 Merge pull request 'Merge pull request '清空.md' (#2) from main into test' (#3) from test into main
Reviewed-on: #3
2025-11-17 17:10:17 +00:00
56fac9d3c0 Merge pull request '清空.md' (#2) from main into test
Reviewed-on: #2
2025-11-17 17:08:30 +00:00
ccf3baca2b 清空.md
Signed-off-by: root <Neo101Neo101@gmail.com>
2025-11-17 16:51:46 +00:00
124 changed files with 26218 additions and 606 deletions

48
.gitignore vendored Normal file
View File

@@ -0,0 +1,48 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# 虚拟环境
venv/
ENV/
env/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# 日志和导出
*.log
*.jsonl
export/
logs/
# 环境变量
.env
.env.local
# 测试
.pytest_cache/
.coverage
htmlcov/

0
.gitkeep Normal file
View File

1360
20251121-task.txt Normal file

File diff suppressed because it is too large Load Diff

BIN
DWD层设计建议.docx Normal file

Binary file not shown.

675
README.md
View File

@@ -1,615 +1,78 @@
# 台球场 ETL 系统(模块化版本)合并文档
本文为原多份文档(如 `INDEX.md`、`QUICK_START.md`、`ARCHITECTURE.md`、`MIGRATION_GUIDE.md`、`PROJECT_STRUCTURE.md`、`README.md` 等)的合并版,只保留与**当前项目本身**相关的内容:项目说明、目录结构、架构设计、数据与控制流程、迁移与扩展指南等,不包含修改历史和重构过程描述。
---
## 1. 项目概述
台球场 ETL 系统是一个面向门店业务的专业 ETL 工程项目,用于从外部业务 API 拉取订单、支付、会员等数据经过解析、校验、SCD2 处理、质量检查后写入 PostgreSQL 数据库,并支持增量同步和任务运行追踪。
系统采用模块化、分层架构设计,核心特性包括:
- 模块化目录结构配置、数据库、API、模型、加载器、SCD2、质量检查、编排、任务、CLI、工具、测试等分层清晰。
- 完整的配置管理:默认值 + 环境变量 + CLI 参数多层覆盖。
- 可复用的数据库访问层(连接管理、批量 Upsert 封装)。
- 支持重试与分页的 API 客户端。
- 类型安全的数据解析与校验模块。
- SCD2 维度历史管理。
- 数据质量检查(例如余额一致性检查)。
- 任务编排层统一调度、游标管理与运行追踪。
- 命令行入口统一管理任务执行,支持筛选任务、Dry-run 等模式。
---
## 2. 快速开始
### 2.1 环境准备
- Python 版本:建议 3.10+
- 数据库PostgreSQL
- 操作系统Windows / Linux / macOS 均可
```bash
# 克隆/下载代码后进入项目目录
cd etl_billiards/
ls -la
```
你会看到下述目录结构的顶层部分(详细见第 4 章):
- `config/` - 配置管理
- `database/` - 数据库访问
- `api/` - API 客户端
- `tasks/` - ETL 任务实现
- `cli/` - 命令行入口
- `docs/` - 技术文档
### 2.2 安装依赖
```bash
pip install -r requirements.txt
```
主要依赖示例(按实际 `requirements.txt` 为准):
- `psycopg2-binary`PostgreSQL 驱动
- `requests`HTTP 客户端
- `python-dateutil`:时间处理
- `tzdata`:时区数据
### 2.3 配置环境变量
复制并修改环境变量模板:
```bash
cp .env.example .env
# 使用你习惯的编辑器修改 .env
```
`.env` 示例(最小配置):
```bash
# 数据库
PG_DSN=postgresql://user:password@localhost:5432/LLZQ
# API
API_BASE=https://api.example.com
API_TOKEN=your_token_here
# 门店/应用
STORE_ID=2790685415443269
TIMEZONE=Asia/Taipei
# 目录
EXPORT_ROOT=/path/to/export
LOG_ROOT=/path/to/logs
```
> 所有配置项的默认值见 `config/defaults.py`,最终生效配置由「默认值 + 环境变量 + CLI 参数」三层叠加。
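下面用一个极简的 Python 片段示意“三层叠加”的语义(仅为帮助理解的示意,实际实现以 `config/settings.py` 为准):
```python
from copy import deepcopy

def merge(base: dict, override: dict) -> dict:
    """递归合并:override 中出现的键覆盖 base后写入者优先示意。"""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# 默认值 < 环境变量/.env < CLI 参数
defaults = {"db": {"dsn": "", "batch_size": 1000}}
env_overrides = {"db": {"dsn": "postgresql://user:password@localhost:5432/LLZQ"}}
cli_overrides = {"db": {"batch_size": 500}}
config = merge(merge(defaults, env_overrides), cli_overrides)
# config["db"] == {"dsn": "postgresql://user:password@localhost:5432/LLZQ", "batch_size": 500}
```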
### 2.4 运行第一个任务
通过 CLI 入口运行:
```bash
# 运行所有任务
python -m cli.main
# 仅运行订单任务
python -m cli.main --tasks ORDERS
# 运行订单 + 支付
python -m cli.main --tasks ORDERS,PAYMENTS
# Windows 使用脚本
run_etl.bat --tasks ORDERS
# Linux / macOS 使用脚本
./run_etl.sh --tasks ORDERS
```
### 2.5 查看结果
- 日志目录:使用 `LOG_ROOT` 指定,例如
```bash
ls -la D:\LLZQ\DB\logs/
```
- 导出目录:使用 `EXPORT_ROOT` 指定,例如
```bash
ls -la D:\LLZQ\DB\export/
```
---
## 3. 常用命令与开发工具
### 3.1 CLI 常用命令
```bash
# 运行所有任务
python -m cli.main
# 运行指定任务
python -m cli.main --tasks ORDERS,PAYMENTS,MEMBERS
# 使用自定义数据库
python -m cli.main --pg-dsn "postgresql://user:password@host:5432/db"
# 使用自定义 API 端点
python -m cli.main --api-base "https://api.example.com" --api-token "..."
# 试运行(不写入数据库)
python -m cli.main --dry-run --tasks ORDERS
```
### 3.2 IDE / 代码质量工具示例VSCode
`.vscode/settings.json` 示例:
```json
{
"python.linting.enabled": true,
"python.linting.pylintEnabled": true,
"python.formatting.provider": "black",
"python.testing.pytestEnabled": true
}
```
代码格式化与检查:
```bash
pip install black isort pylint
black .
isort .
pylint etl_billiards/
```
### 3.3 测试
```bash
# 安装测试依赖(按需)
pip install pytest pytest-cov
# 运行全部测试
pytest
# 仅运行单元测试
pytest tests/unit/
# 生成覆盖率报告
pytest --cov=. --cov-report=html
```
测试示例(按实际项目为准):
- `tests/unit/test_config.py` 配置管理单元测试
- `tests/unit/test_parsers.py` 解析器单元测试
- `tests/integration/test_database.py` 数据库集成测试
---
## 4. 项目结构与文件说明
### 4.1 总体目录结构(树状图)
```text
etl_billiards/
├── README.md # 项目总览和使用说明
├── MIGRATION_GUIDE.md # 从旧版本迁移指南
├── requirements.txt # Python 依赖列表
├── setup.py # 项目安装配置
├── .env.example # 环境变量配置模板
├── .gitignore # Git 忽略文件配置
├── run_etl.sh # Linux/Mac 运行脚本
├── run_etl.bat # Windows 运行脚本
├── config/ # 配置管理模块
│ ├── __init__.py
│ ├── defaults.py # 默认配置值定义
│ ├── env_parser.py # 环境变量解析器
│ └── settings.py # 配置管理主类
├── database/ # 数据库访问层
│ ├── __init__.py
│ ├── connection.py # 数据库连接管理
│ └── operations.py # 批量操作封装
├── api/ # HTTP API 客户端
│ ├── __init__.py
│ └── client.py # API 客户端(重试 + 分页)
├── models/ # 数据模型层
│ ├── __init__.py
│ ├── parsers.py # 类型解析器
│ └── validators.py # 数据验证器
├── loaders/ # 数据加载器层
│ ├── __init__.py
│ ├── base_loader.py # 加载器基类
│ ├── dimensions/ # 维度表加载器
│ │ ├── __init__.py
│ │ └── member.py # 会员维度加载器
│ └── facts/ # 事实表加载器
│ ├── __init__.py
│ ├── order.py # 订单事实表加载器
│ └── payment.py # 支付记录加载器
├── scd/ # SCD2 处理层
│ ├── __init__.py
│ └── scd2_handler.py # SCD2 历史记录处理器
├── quality/ # 数据质量检查层
│ ├── __init__.py
│ ├── base_checker.py # 质量检查器基类
│ └── balance_checker.py # 余额一致性检查器
├── orchestration/ # ETL 编排层
│ ├── __init__.py
│ ├── scheduler.py # ETL 调度器
│ ├── task_registry.py # 任务注册表(工厂模式)
│ ├── cursor_manager.py # 游标管理器
│ └── run_tracker.py # 运行记录追踪器
├── tasks/ # ETL 任务层
│ ├── __init__.py
│ ├── base_task.py # 任务基类(模板方法)
│ ├── orders_task.py # 订单 ETL 任务
│ ├── payments_task.py # 支付 ETL 任务
│ └── members_task.py # 会员 ETL 任务
├── cli/ # 命令行接口层
│ ├── __init__.py
│ └── main.py # CLI 主入口
├── utils/ # 工具函数
│ ├── __init__.py
│ └── helpers.py # 通用工具函数
├── tests/ # 测试代码
│ ├── __init__.py
│ ├── unit/ # 单元测试
│ │ ├── __init__.py
│ │ ├── test_config.py
│ │ └── test_parsers.py
│ └── integration/ # 集成测试
│ ├── __init__.py
│ └── test_database.py
└── docs/ # 文档
└── ARCHITECTURE.md # 架构设计文档
```
### 4.2 各模块职责概览
- **config/**
- 统一配置入口,支持默认值、环境变量、命令行参数三层覆盖。
- **database/**
- 封装 PostgreSQL 连接与批量操作(插入、更新、Upsert 等)。
- **api/**
- 对上游业务 API 的 HTTP 调用进行统一封装,支持重试、分页与超时控制。
- **models/**
- 提供类型解析器(时间戳、金额、整数等)与业务级数据校验器。
- **loaders/**
- 提供事实表与维度表的加载逻辑(包含批量 Upsert、统计写入结果等)。
- **scd/**
- 维度型数据的 SCD2 历史管理(有效期、版本标记等)。
- **quality/**
- 质量检查策略,例如余额一致性、记录数量对齐等。
- **orchestration/**
- 任务调度、任务注册、游标管理(增量窗口)、运行记录追踪。
- **tasks/**
- 具体业务任务(订单、支付、会员等),封装了从“取数 → 处理 → 写库 → 记录结果”的完整流程。
- **cli/**
- 命令行入口,解析参数并启动调度流程。
- **utils/**
- 杂项工具函数。
- **tests/**
- 单元测试与集成测试代码。
---
## 5. 架构设计与流程说明
### 5.1 分层架构图
```text
┌─────────────────────────────────────┐
│ CLI 命令行接口 │ <- cli/main.py
└─────────────┬───────────────────────┘
┌─────────────▼───────────────────────┐
│ Orchestration 编排层 │ <- orchestration/
│ (Scheduler, TaskRegistry, ...) │
└─────────────┬───────────────────────┘
┌─────────────▼───────────────────────┐
│ Tasks 任务层 │ <- tasks/
│ (OrdersTask, PaymentsTask, ...) │
└───┬─────────┬─────────┬─────────────┘
│ │ │
▼ ▼ ▼
┌────────┐ ┌─────┐ ┌──────────┐
│Loaders │ │ SCD │ │ Quality │ <- loaders/, scd/, quality/
└────────┘ └─────┘ └──────────┘
┌───────▼────────┐
│ Models 模型 │ <- models/
└───────┬────────┘
┌───────▼────────┐
│ API 客户端 │ <- api/
└───────┬────────┘
┌───────▼────────┐
│ Database 访问 │ <- database/
└───────┬────────┘
┌───────▼────────┐
│ Config 配置 │ <- config/
└────────────────┘
```
### 5.2 各层职责(当前设计)
- **CLI 层 (`cli/`)**
- 解析命令行参数(指定任务列表、Dry-run、覆盖配置项等)。
- 初始化配置与日志后交由编排层执行。
- **编排层 (`orchestration/`)**
- `scheduler.py`:根据配置与 CLI 参数选择需要执行的任务,控制执行顺序和并行策略。
- `task_registry.py`:提供任务注册表,按任务代码创建任务实例(工厂模式)。
- `cursor_manager.py`:管理增量游标(时间窗口 / ID 游标)。
- `run_tracker.py`:记录每次任务运行的状态、统计信息和错误信息。
- **任务层 (`tasks/`)**
- `base_task.py`:定义任务执行模板流程(模板方法模式),包括获取窗口、调用上游、解析 / 校验、写库、更新游标等。
- `orders_task.py` / `payments_task.py` / `members_task.py`:实现具体任务逻辑(订单、支付、会员)。
- **加载器 / SCD / 质量层**
- `loaders/`:根据目标表封装 Upsert / Insert / Update 逻辑。
- `scd/scd2_handler.py`:为维度表提供 SCD2 历史管理能力。
- `quality/`:执行数据质量检查,如余额对账。
- **模型层 (`models/`)**
- `parsers.py`:负责数据类型转换(字符串 → 时间戳、Decimal、int 等)。
- `validators.py`:执行字段级和记录级的数据校验。
- **API 层 (`api/client.py`)**
- 封装 HTTP 调用,处理重试、超时及分页。
- **数据库层 (`database/`)**
- 管理数据库连接及上下文。
- 提供批量插入 / 更新 / Upsert 操作接口。
- **配置层 (`config/`)**
- 定义配置项默认值。
- 解析环境变量并进行类型转换。
- 对外提供统一配置对象。
### 5.3 设计模式(当前使用)
- 工厂模式:任务注册 / 创建(`TaskRegistry`)。
- 模板方法模式:任务执行流程(`BaseTask`)。
- 策略模式:不同 Loader / Checker 实现不同策略。
- 依赖注入:通过构造函数向任务传入 `db`、`api`、`config` 等依赖。
### 5.4 数据与控制流程
整体流程:
1. CLI 解析参数并加载配置。
2. Scheduler 构建数据库连接、API 客户端等依赖。
3. Scheduler 遍历任务配置,从 `TaskRegistry` 获取任务类并实例化。
4. 每个任务按统一模板执行:
- 读取游标 / 时间窗口。
- 调用 API 拉取数据(可分页)。
- 解析、验证数据。
- 通过 Loader 写入数据库(事实表 / 维度表 / SCD2。
- 执行质量检查。
- 更新游标与运行记录。
5. 所有任务执行完成后,释放连接并退出进程。
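下面用一段示意伪码概括这一控制流(`registry.create`、`tracker.start` 等方法名均为假设,仅说明流程,并非 `scheduler.py` 原文):
```python
def run_tasks(scheduler, task_codes: list[str]) -> None:
    """按任务代码依次执行:单任务失败互相隔离(示意)。"""
    for code in task_codes:
        task = scheduler.registry.create(code)        # 工厂模式:按任务代码取任务实例
        run_id = scheduler.tracker.start(code)        # 记录运行开始
        try:
            result = task.execute()                   # 模板方法:取窗口→拉数→解析→写库→质检
            scheduler.tracker.finish(run_id, result)  # 回写运行结果与游标
        except Exception as exc:
            scheduler.db.rollback()                   # 数据库异常回滚当前事务
            scheduler.tracker.fail(run_id, str(exc))  # 记录错误,继续下一个任务
```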
### 5.5 错误处理策略
- 单个任务失败不影响其他任务执行。
- 数据库操作异常自动回滚当前事务。
- API 请求失败时按配置进行重试,超过重试次数记录错误并终止该任务。
- 所有错误被记录到日志和运行追踪表,便于事后排查。
---
## 6. 迁移指南(从旧脚本到当前项目)
本节用于说明如何从旧的单文件脚本(如 `task_merged.py`)迁移到当前模块化项目,属于当前项目的使用说明,不涉及历史对比细节。
### 6.1 核心功能映射示意
| 旧版本函数 / 类 | 新版本位置 | 说明 |
|---------------------------|--------------------------------------------------------|----------------|
| `DEFAULTS` 字典 | `config/defaults.py` | 配置默认值 |
| `build_config()` | `config/settings.py::AppConfig.load()` | 配置加载 |
| `Pg` 类 | `database/connection.py::DatabaseConnection` | 数据库连接 |
| `http_get_json()` | `api/client.py::APIClient.get()` | API 请求 |
| `paged_get()` | `api/client.py::APIClient.get_paginated()` | 分页请求 |
| `parse_ts()` | `models/parsers.py::TypeParser.parse_timestamp()` | 时间解析 |
| `upsert_fact_order()` | `loaders/facts/order.py::OrderLoader.upsert_orders()` | 订单加载 |
| `scd2_upsert()` | `scd/scd2_handler.py::SCD2Handler.upsert()` | SCD2 处理 |
| `run_task_orders()` | `tasks/orders_task.py::OrdersTask.execute()` | 订单任务 |
| `main()` | `cli/main.py::main()` | 主入口 |
### 6.2 典型迁移步骤
1. **配置迁移**
- 原来在 `DEFAULTS` 或脚本内硬编码的配置,迁移到 `.env` 与 `config/defaults.py`。
- 使用 `AppConfig.load()` 统一获取配置。
2. **并行运行验证**
```bash
# 旧脚本
python task_merged.py --tasks ORDERS
# 新项目
python -m cli.main --tasks ORDERS
```
对比新旧版本导出的数据表和日志,确认一致性。
3. **自定义逻辑迁移**
- 原脚本中的自定义清洗逻辑 → 放入相应 `loaders/` 或任务类中。
- 自定义任务 → 在 `tasks/` 中实现并在 `task_registry` 中注册。
- 自定义 API 调用 → 扩展 `api/client.py` 或单独封装服务类。
4. **逐步切换**
- 先在测试环境并行运行。
- 再逐步切换生产任务到新版本。
# 台球场 ETL 系统
用于台球门店业务的数据采集与入湖:从上游 API 拉取订单、支付、会员、库存等数据,先落地 ODS再清洗写入事实/维度表,并提供运行追踪、增量游标、数据质量检查与测试脚手架。
## 核心特性
- **两阶段链路**ODS 原始留痕 + DWD/事实表清洗,支持回放与重跑。
- **任务注册与调度**`TaskRegistry` 统一管理任务代码,`ETLScheduler` 负责游标、运行记录和失败隔离。
- **统一底座**:配置(默认值 + `.env` + CLI 覆盖)、分页/重试的 API 客户端、批量 Upsert 的数据库封装、SCD2 维度处理、质量检查。
- **测试与回放**ONLINE/OFFLINE 模式切换,`run_tests.py`/`test_presets.py` 支持参数化测试;`MANUAL_INGEST` 可将归档 JSON 重灌入 ODS。
- **可安装**`setup.py` / `entry_point` 提供 `etl-billiards` 命令,或直接 `python -m cli.main` 运行。
## 仓库结构(摘录)
- `etl_billiards/config`:默认配置、环境变量解析、配置加载。
- `etl_billiards/api`HTTP 客户端,内置重试/分页。
- `etl_billiards/database`:连接管理、批量 Upsert。
- `etl_billiards/tasks`业务任务ORDERS、PAYMENTS…、ODS 任务、DWD 任务、人工回放;`base_task.py`/`base_dwd_task.py` 提供模板。
- `etl_billiards/loaders`:事实/维度/ODS Loader`scd/` 为 SCD2。
- `etl_billiards/orchestration`:调度器、任务注册表、游标与运行追踪。
- `etl_billiards/scripts`:测试执行器、数据库连通性检测、预置测试指令。
- `etl_billiards/tests`:单元/集成测试与离线 JSON 归档。
## 支持的任务代码
- **事实/维度**`ORDERS`、`PAYMENTS`、`REFUNDS`、`INVENTORY_CHANGE`、`COUPON_USAGE`、`MEMBERS`、`ASSISTANTS`、`PRODUCTS`、`TABLES`、`PACKAGES_DEF`、`TOPUPS`、`TABLE_DISCOUNT`、`ASSISTANT_ABOLISH`、`LEDGER`、`TICKET_DWD`、`PAYMENTS_DWD`、`MEMBERS_DWD`。
- **ODS 原始采集**`ODS_ORDER_SETTLE`、`ODS_TABLE_USE`、`ODS_ASSISTANT_LEDGER`、`ODS_ASSISTANT_ABOLISH`、`ODS_GOODS_LEDGER`、`ODS_PAYMENT`、`ODS_REFUND`、`ODS_COUPON_VERIFY`、`ODS_MEMBER`、`ODS_MEMBER_CARD`、`ODS_PACKAGE`、`ODS_INVENTORY_STOCK`、`ODS_INVENTORY_CHANGE`。
- **辅助**`MANUAL_INGEST`(将归档 JSON 回放到 ODS。
## 快速开始
1. **环境要求**Python 3.10+、PostgreSQL。推荐在 `etl_billiards/` 目录下执行命令。
2. **安装依赖**
```bash
cd etl_billiards
pip install -r requirements.txt
# 开发模式pip install -e .
```
3. **配置 `.env`**
```bash
cp .env.example .env
# 核心项
PG_DSN=postgresql://user:pwd@host:5432/LLZQ
API_BASE=https://api.example.com
API_TOKEN=your_token
STORE_ID=2790685415443269
EXPORT_ROOT=/path/to/export
LOG_ROOT=/path/to/logs
```
配置的生效顺序为 “默认值” < “环境变量/.env” < “CLI 参数”。
4. **运行任务**
```bash
# 运行默认任务集
python -m cli.main
# 按需选择任务(逗号分隔)
python -m cli.main --tasks ODS_ORDER_SETTLE,ORDERS,PAYMENTS
# Dry-run 示例(不提交事务)
python -m cli.main --tasks ORDERS --dry-run
# Windows 批处理
..\\run_etl.bat --tasks PAYMENTS
```
5. **查看输出**:日志目录与导出目录分别由 `LOG_ROOT`、`EXPORT_ROOT` 控制;运行追踪与游标记录写入数据库 `etl_admin.*` 表。
## 数据与运行流转
- CLI 解析参数 → `AppConfig.load()` 组装配置 → `ETLScheduler` 创建 DB/API/游标/运行追踪器。
- 调度器按任务代码实例化任务,读取/推进游标,落盘运行记录。
- 任务模板:确定时间窗口 → 调用 API/ODS 数据 → 解析校验 → Loader 批量 Upsert/SCD2 → 质量检查 → 提交事务并回写游标。
## 测试与回放
- 单元/集成测试:`pytest` 或 `python scripts/run_tests.py --suite online`。
- 预置组合:`python scripts/run_tests.py --preset offline_realdb`(见 `scripts/test_presets.py`)。
- 离线模式:`TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=... pytest tests/unit/test_etl_tasks_offline.py`。
- 数据库连通性:`python scripts/test_db_connection.py --dsn postgresql://... --query "SELECT 1"`。
## 其他提示
- `.env.example` 列出了所有常用配置;`config/defaults.py` 记录默认值与任务窗口配置。
- `loaders/ods/generic.py` 支持定义主键/列名即可落 ODS`tasks/manual_ingest_task.py` 可将归档 JSON 快速灌入对应 ODS 表。
- 需要新增任务时,在 `tasks/` 中实现并在 `orchestration/task_registry.py` 注册即可复用调度能力。
---
## 7. 开发与扩展指南(当前项目)
### 7.1 添加新任务
1. 在 `tasks/` 目录创建任务类:
```python
from .base_task import BaseTask

class MyTask(BaseTask):
    def get_task_code(self) -> str:
        return "MY_TASK"

    def execute(self) -> dict:
        # 1. 获取时间窗口
        window_start, window_end, _ = self._get_time_window()
        # 2. 调用 API 获取数据
        records, _ = self.api.get_paginated(...)
        # 3. 解析 / 校验
        parsed = [self._parse(r) for r in records]
        # 4. 加载数据
        loader = MyLoader(self.db)
        inserted, updated, _ = loader.upsert(parsed)
        # 5. 提交并返回结果
        self.db.commit()
        return self._build_result("SUCCESS", {
            "inserted": inserted,
            "updated": updated,
        })
```
2. 在 `orchestration/task_registry.py` 中注册:
```python
from tasks.my_task import MyTask
default_registry.register("MY_TASK", MyTask)
```
3. 在任务配置表中启用(示例):
```sql
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
VALUES ('MY_TASK', 123456, TRUE);
```
### 7.2 添加新加载器
```python
from loaders.base_loader import BaseLoader

class MyLoader(BaseLoader):
    def upsert(self, records: list) -> tuple:
        sql = "INSERT INTO table_name (...) VALUES (...) ON CONFLICT (...) DO UPDATE SET ... RETURNING (xmax = 0) AS inserted"
        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
```
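补充说明:`RETURNING (xmax = 0) AS inserted` 利用 PostgreSQL 的系统列 `xmax` 区分插入与更新——新插入的行 `xmax` 为 0而被 `ON CONFLICT ... DO UPDATE` 更新的行不为 0借此可以在一次批量 Upsert 中统计 inserted/updated 数量。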
```
### 7.3 添加新质量检查器
1. 在 `quality/` 中实现检查器,继承 `base_checker.py`。
2. 在任务或调度流程中调用该检查器,在写库后进行验证。
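一个最小化的检查器示意(假设 `BaseChecker` 暴露 `check()` 钩子,`query_one`、`window_start` 等成员均为假设,实际签名以 `base_checker.py` 为准):
```python
from quality.base_checker import BaseChecker  # 导入路径与项目目录结构一致

class OrderCountChecker(BaseChecker):
    """示意:校验窗口内订单表记录数与 API 返回条数一致。"""

    def check(self, expected_count: int) -> bool:
        row = self.db.query_one(
            "SELECT count(*) AS n FROM billiards.fact_order "
            "WHERE updated_at >= %s AND updated_at < %s",
            (self.window_start, self.window_end),
        )
        return row["n"] == expected_count
```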
### 7.4 类型解析与校验扩展
- 在 `models/parsers.py` 中添加新类型解析方法。
- 在 `models/validators.py` 中添加新规则(如枚举校验、跨字段校验等)。
---
## 8. 常见问题排查
### 8.1 数据库连接失败
```text
错误: could not connect to server
```
排查要点:
- 检查 `PG_DSN` 或相关数据库配置是否正确。
- 确认数据库服务是否启动、网络是否可达。
### 8.2 API 请求超时
```text
错误: requests.exceptions.Timeout
```
排查要点:
- 检查 `API_BASE` 地址与网络连通性。
- 适当提高超时与重试次数(在配置中调整)。
### 8.3 模块导入错误
```text
错误: ModuleNotFoundError
```
排查要点:
- 确认在项目根目录下运行(包含 `etl_billiards/` 包)。
- 或通过 `pip install -e .` 以可编辑模式安装项目。
### 8.4 权限相关问题
```text
错误: Permission denied
```
排查要点:
- 脚本无执行权限:`chmod +x run_etl.sh`。
- Windows 需要以管理员身份运行,或修改日志 / 导出目录权限。
---
## 9. 使用前检查清单
在正式运行前建议确认:
- [ ] 已安装 Python 3.10+。
- [ ] 已执行 `pip install -r requirements.txt`。
- [ ] `.env` 已配置正确数据库、API、门店 ID、路径等)。
- [ ] PostgreSQL 数据库可连接。
- [ ] API 服务可访问且凭证有效。
- [ ] `LOG_ROOT`、`EXPORT_ROOT` 目录存在且拥有写权限。
---
## 10. 参考说明
- 本文已合并原有的快速开始、项目结构、架构说明、迁移指南等内容,可作为当前项目的统一说明文档。
- 如需在此基础上拆分多份文档,可按章节拆出,例如「快速开始」「架构设计」「迁移指南」「开发扩展」等。

9
app/etl_busy.py Normal file
View File

@@ -0,0 +1,9 @@
# app/etl_busy.py
def run():
    """
    忙时抓取逻辑。
    TODO: 这里写具体抓取流程API 调用 / 网页解析 / 写入 PostgreSQL 等)
    """
    print("Running busy-period ETL...")
    # 示例:后续在这里接 PostgreSQL 或 HTTP 抓取
    # ...

8
app/etl_idle.py Normal file
View File

@@ -0,0 +1,8 @@
# app/etl_idle.py
def run():
    """
    闲时抓取逻辑。
    可以做全量同步、大批量历史修正等。
    """
    print("Running idle-period ETL...")
    # ...

31
app/runner.py Normal file
View File

@@ -0,0 +1,31 @@
# app/runner.py
import argparse
from datetime import datetime

from . import etl_busy, etl_idle

def main():
    parser = argparse.ArgumentParser(description="Feiqiu ETL Runner")
    parser.add_argument(
        "--mode",
        choices=["busy", "idle"],
        required=True,
        help="ETL mode: busy or idle",
    )
    args = parser.parse_args()
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{now}] Start ETL mode={args.mode}")
    if args.mode == "busy":
        etl_busy.run()
    else:
        etl_idle.run()
    print(f"[{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}] ETL finished.")

if __name__ == "__main__":
    main()
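用法示意(在包含 `app/` 包的目录下执行):`python -m app.runner --mode busy` 或 `python -m app.runner --mode idle`。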

53
etl_billiards/.env Normal file
View File

@@ -0,0 +1,53 @@
# 数据库配置(真实库)
PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test
PG_CONNECT_TIMEOUT=10
# 如需拆分配置PG_HOST=... PG_PORT=... PG_NAME=... PG_USER=... PG_PASSWORD=...
# API配置如需走真实接口再填写
API_BASE=https://api.example.com
API_TOKEN=your_token_here
# API_TIMEOUT=20
# API_PAGE_SIZE=200
# API_RETRY_MAX=3
# 应用配置
STORE_ID=2790685415443269
# TIMEZONE=Asia/Taipei
# SCHEMA_OLTP=billiards
# SCHEMA_ETL=etl_admin
# 路径配置
EXPORT_ROOT=C:\dev\LLTQ\export\JSON
LOG_ROOT=C:\dev\LLTQ\export\LOG
FETCH_ROOT=
INGEST_SOURCE_DIR=
WRITE_PRETTY_JSON=false
PGCLIENTENCODING=utf8
# ETL配置
OVERLAP_SECONDS=120
WINDOW_BUSY_MIN=30
WINDOW_IDLE_MIN=180
IDLE_START=04:00
IDLE_END=16:00
ALLOW_EMPTY_RESULT_ADVANCE=true
# 清洗配置
LOG_UNKNOWN_FIELDS=true
HASH_ALGO=sha1
STRICT_NUMERIC=true
ROUND_MONEY_SCALE=2
# 测试/离线模式(真实库联调建议 ONLINE
TEST_MODE=ONLINE
TEST_JSON_ARCHIVE_DIR=tests/source-data-doc
TEST_JSON_TEMP_DIR=/tmp/etl_billiards_json_tmp
# 测试数据库
TEST_DB_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test
# ODS 重建脚本配置(开发用)
JSON_DOC_DIR=C:\dev\LLTQ\export\test-json-doc
ODS_INCLUDE_FILES=
ODS_DROP_SCHEMA_FIRST=true

59
etl_billiards/.env.example Normal file
View File

@@ -0,0 +1,59 @@
# 数据库配置
PG_DSN=postgresql://user:password@localhost:5432/....
PG_HOST=localhost
PG_PORT=5432
PG_NAME=LLZQ
PG_USER=local-Python
PG_PASSWORD=your_password_here
PG_CONNECT_TIMEOUT=10
# API配置
API_BASE=https://api.example.com
API_TOKEN=your_token_here
API_TIMEOUT=20
API_PAGE_SIZE=200
API_RETRY_MAX=3
API_RETRY_BACKOFF=[1,2,4]
# 应用配置
STORE_ID=2790685415443269
TIMEZONE=Asia/Taipei
SCHEMA_OLTP=billiards
SCHEMA_ETL=etl_admin
# 路径配置
EXPORT_ROOT=/path/to/export
LOG_ROOT=/path/to/logs
FETCH_ROOT=/path/to/json_fetch
INGEST_SOURCE_DIR=
WRITE_PRETTY_JSON=false
MANIFEST_NAME=manifest.json
INGEST_REPORT_NAME=ingest_report.json
# ETL配置
OVERLAP_SECONDS=120
WINDOW_BUSY_MIN=30
WINDOW_IDLE_MIN=180
IDLE_START=04:00
IDLE_END=16:00
ALLOW_EMPTY_RESULT_ADVANCE=true
# 清洗配置
LOG_UNKNOWN_FIELDS=true
HASH_ALGO=sha1
STRICT_NUMERIC=true
ROUND_MONEY_SCALE=2
# 测试/离线模式
TEST_MODE=ONLINE
TEST_JSON_ARCHIVE_DIR=tests/source-data-doc
TEST_JSON_TEMP_DIR=/tmp/etl_billiards_json_tmp
# 测试数据库(可选:若设置则单元测试连入此 DSN
TEST_DB_DSN=
# ODS 重建脚本配置(开发用)
JSON_DOC_DIR=C:\dev\LLTQ\export\test-json-doc
ODS_INCLUDE_FILES=
ODS_DROP_SCHEMA_FIRST=true

40
etl_billiards/0.py Normal file
View File

@@ -0,0 +1,40 @@
"""Simple PostgreSQL connectivity smoke-checker."""
import os
import sys
import psycopg2
from psycopg2 import OperationalError
DEFAULT_DSN = os.environ.get(
"PG_DSN", "postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/LLZQ-test"
)
DEFAULT_TIMEOUT = max(1, min(int(os.environ.get("PG_CONNECT_TIMEOUT", 10)), 20))
def check_postgres_connection(dsn: str, timeout: int = DEFAULT_TIMEOUT) -> bool:
"""Return True if connection succeeds; print diagnostics otherwise."""
try:
conn = psycopg2.connect(dsn, connect_timeout=timeout)
with conn:
with conn.cursor() as cur:
cur.execute("SELECT 1;")
_ = cur.fetchone()
print(f"PostgreSQL 连接成功 (timeout={timeout}s)")
return True
except OperationalError as exc:
print("PostgreSQL 连接失败OperationalError", exc)
except Exception as exc: # pragma: no cover - defensive
print("PostgreSQL 连接失败(其他异常):", exc)
return False
if __name__ == "__main__":
dsn = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_DSN
if not dsn:
print("缺少 DSN请传入参数或设置 PG_DSN 环境变量。")
sys.exit(2)
ok = check_postgres_connection(dsn)
if not ok:
sys.exit(1)

837
etl_billiards/README.md Normal file
View File

@@ -0,0 +1,837 @@
# 台球场 ETL 系统(模块化版本)合并文档
本文为原多份文档(如 `INDEX.md`、`QUICK_START.md`、`ARCHITECTURE.md`、`MIGRATION_GUIDE.md`、`PROJECT_STRUCTURE.md`、`README.md` 等)的合并版,只保留与**当前项目本身**相关的内容:项目说明、目录结构、架构设计、数据与控制流程、迁移与扩展指南等,不包含修改历史和重构过程描述。
---
## 1. 项目概述
台球场 ETL 系统是一个面向门店业务的专业 ETL 工程项目,用于从外部业务 API 拉取订单、支付、会员等数据经过解析、校验、SCD2 处理、质量检查后写入 PostgreSQL 数据库,并支持增量同步和任务运行追踪。
系统采用模块化、分层架构设计,核心特性包括:
- 模块化目录结构配置、数据库、API、模型、加载器、SCD2、质量检查、编排、任务、CLI、工具、测试等分层清晰。
- 完整的配置管理:默认值 + 环境变量 + CLI 参数多层覆盖。
- 可复用的数据库访问层(连接管理、批量 Upsert 封装)。
- 支持重试与分页的 API 客户端。
- 类型安全的数据解析与校验模块。
- SCD2 维度历史管理。
- 数据质量检查(例如余额一致性检查)。
- 任务编排层统一调度、游标管理与运行追踪。
- 命令行入口统一管理任务执行,支持筛选任务、Dry-run 等模式。
---
## 2. 快速开始
### 2.1 环境准备
- Python 版本:建议 3.10+
- 数据库PostgreSQL
- 操作系统Windows / Linux / macOS 均可
```bash
# 克隆/下载代码后进入项目目录
cd etl_billiards/
ls -la
```
你会看到下述目录结构的顶层部分(详细见第 4 章):
- `config/` - 配置管理
- `database/` - 数据库访问
- `api/` - API 客户端
- `tasks/` - ETL 任务实现
- `cli/` - 命令行入口
- `docs/` - 技术文档
### 2.2 安装依赖
```bash
pip install -r requirements.txt
```
主要依赖示例(按实际 `requirements.txt` 为准):
- `psycopg2-binary`PostgreSQL 驱动
- `requests`HTTP 客户端
- `python-dateutil`:时间处理
- `tzdata`:时区数据
### 2.3 配置环境变量
复制并修改环境变量模板:
```bash
cp .env.example .env
# 使用你习惯的编辑器修改 .env
```
`.env` 示例(最小配置):
```bash
# 数据库
PG_DSN=postgresql://user:password@localhost:5432/....
# API
API_BASE=https://api.example.com
API_TOKEN=your_token_here
# 门店/应用
STORE_ID=2790685415443269
TIMEZONE=Asia/Taipei
# 目录
EXPORT_ROOT=/path/to/export
LOG_ROOT=/path/to/logs
```
> 所有配置项的默认值见 `config/defaults.py`,最终生效配置由「默认值 + 环境变量 + CLI 参数」三层叠加。
### 2.4 运行第一个任务
通过 CLI 入口运行:
```bash
# 运行所有任务
python -m cli.main
# 仅运行订单任务
python -m cli.main --tasks ORDERS
# 运行订单 + 支付
python -m cli.main --tasks ORDERS,PAYMENTS
# Windows 使用脚本
run_etl.bat --tasks ORDERS
# Linux / macOS 使用脚本
./run_etl.sh --tasks ORDERS
```
### 2.5 查看结果
- 日志目录:使用 `LOG_ROOT` 指定,例如
```bash
ls -la C:\dev\LLTQ\export\LOG/
```
- 导出目录:使用 `EXPORT_ROOT` 指定,例如
```bash
ls -la C:\dev\LLTQ\export\JSON/
```
---
## 3. 常用命令与开发工具
### 3.1 CLI 常用命令
```bash
# 运行所有任务
python -m cli.main
# 运行指定任务
python -m cli.main --tasks ORDERS,PAYMENTS,MEMBERS
# 使用自定义数据库
python -m cli.main --pg-dsn "postgresql://user:password@host:5432/db"
# 使用自定义 API 端点
python -m cli.main --api-base "https://api.example.com" --api-token "..."
# 试运行(不写入数据库)
python -m cli.main --dry-run --tasks ORDERS
```
### 3.2 IDE / 代码质量工具示例VSCode
`.vscode/settings.json` 示例:
```json
{
"python.linting.enabled": true,
"python.linting.pylintEnabled": true,
"python.formatting.provider": "black",
"python.testing.pytestEnabled": true
}
```
代码格式化与检查:
```bash
pip install black isort pylint
black .
isort .
pylint etl_billiards/
```
### 3.3 测试
```bash
# 安装测试依赖(按需)
pip install pytest pytest-cov
# 运行全部测试
pytest
# 仅运行单元测试
pytest tests/unit/
# 生成覆盖率报告
pytest --cov=. --cov-report=html
```
测试示例(按实际项目为准):
- `tests/unit/test_config.py` 配置管理单元测试
- `tests/unit/test_parsers.py` 解析器单元测试
- `tests/integration/test_database.py` 数据库集成测试
#### 3.3.1 测试模式ONLINE / OFFLINE
- `TEST_MODE=ONLINE`(默认)时,测试会模拟实时 API完整执行 E/T/L。
- `TEST_MODE=OFFLINE` 时,测试改为从 `TEST_JSON_ARCHIVE_DIR` 指定的归档 JSON 中读取数据,仅做 Transform + Load适合验证本地归档数据是否仍可回放。
- `TEST_JSON_ARCHIVE_DIR`:离线 JSON 归档目录(示例:`tests/source-data-doc` 或 CI 产出的快照)。
- `TEST_JSON_TEMP_DIR`:测试生成的临时 JSON 输出目录,便于隔离每次运行的数据。
- `TEST_DB_DSN`:可选,若设置则单元测试会连接到此 PostgreSQL DSN实打实执行写库留空时测试使用内存伪库避免依赖数据库。
示例命令:
```bash
# 在线模式覆盖所有任务
TEST_MODE=ONLINE pytest tests/unit/test_etl_tasks_online.py
# 离线模式使用归档 JSON 覆盖所有任务
TEST_MODE=OFFLINE TEST_JSON_ARCHIVE_DIR=tests/source-data-doc pytest tests/unit/test_etl_tasks_offline.py
# 使用脚本按需组合参数(示例:在线 + 仅订单用例)
python scripts/run_tests.py --suite online --mode ONLINE --keyword ORDERS
# 使用脚本连接真实测试库并回放离线模式
python scripts/run_tests.py --suite offline --mode OFFLINE --db-dsn postgresql://user:pwd@localhost:5432/testdb
# 使用“指令仓库”中的预置命令
python scripts/run_tests.py --preset offline_realdb
python scripts/run_tests.py --list-presets # 查看或自定义 scripts/test_presets.py
```
#### 3.3.2 脚本化测试组合(`run_tests.py` / `test_presets.py`
- `scripts/run_tests.py` 是 pytest 的统一入口:自动把项目根目录加入 `sys.path`,并提供 `--suite online/offline/integration`、`--tests`(自定义路径)、`--mode`、`--db-dsn`、`--json-archive`、`--json-temp`、`--keyword/-k`、`--pytest-args`、`--env KEY=VALUE` 等参数,可以像搭积木一样自由组合;
- `--preset foo` 会读取 `scripts/test_presets.py` 内 `PRESETS["foo"]` 的配置,并叠加到当前命令;`--list-presets` 与 `--dry-run` 可用来审阅或仅打印命令;
- 直接执行 `python scripts/test_presets.py` 可依次运行 `AUTO_RUN_PRESETS` 中列出的预置;传入 `--preset x --dry-run` 则只打印对应命令。
`test_presets.py` 充当“指令仓库”。每个预置都是一个字典,常用字段解释如下:
| 字段 | 作用 |
| ---------------------------- | ------------------------------------------------------------------ |
| `suite` | 复用 `run_tests.py` 内置套件online/offline/integration可多选 |
| `tests` | 追加任意 pytest 路径,例如 `tests/unit/test_config.py` |
| `mode` | 覆盖 `TEST_MODE`ONLINE / OFFLINE |
| `db_dsn` | 覆盖 `TEST_DB_DSN`,用于连入真实测试库 |
| `json_archive` / `json_temp` | 配置离线 JSON 归档与临时目录 |
| `keyword` | 映射到 `pytest -k`,用于关键字过滤 |
| `pytest_args` | 附加 pytest 参数,例 `-vv --maxfail=1` |
| `env` | 额外环境变量列表,如 `["STORE_ID=123"]` |
| `preset_meta` | 说明性文字,便于描述场景 |
示例:`offline_realdb` 预置会设置 `TEST_MODE=OFFLINE`、指定 `tests/source-data-doc` 为归档目录,并通过 `db_dsn` 连到测试库。执行 `python scripts/run_tests.py --preset offline_realdb` 或 `python scripts/test_presets.py --preset offline_realdb` 即可复用该组合保证本地、CI 与生产回放脚本一致。
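下面给出一个假设性的预置定义示意(字段名取自上表,具体键名与取值以 `scripts/test_presets.py` 实际内容为准):
```python
# scripts/test_presets.py 中一个预置的示意(假设性示例,并非仓库原文)
PRESETS = {
    "offline_realdb": {
        "suite": ["offline"],                      # 复用 run_tests.py 内置套件
        "mode": "OFFLINE",                         # 覆盖 TEST_MODE
        "db_dsn": "postgresql://user:pwd@localhost:5432/testdb",  # 覆盖 TEST_DB_DSN
        "json_archive": "tests/source-data-doc",   # 离线 JSON 归档目录
        "pytest_args": ["-vv", "--maxfail=1"],     # 附加 pytest 参数
        "env": ["STORE_ID=123"],                   # 额外环境变量
        "preset_meta": "离线模式 + 真实测试库回放",  # 场景说明
    },
}
```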
#### 3.3.3 数据库连通性快速检查
`python scripts/test_db_connection.py` 提供最轻量的 PostgreSQL 连通性检测:默认使用 `TEST_DB_DSN`(也可传 `--dsn`),尝试连接并执行 `SELECT 1 AS ok`(可通过 `--query` 自定义)。典型用途:
```bash
# 读取 .env/环境变量中的 TEST_DB_DSN
python scripts/test_db_connection.py
# 临时指定 DSN并检查任务配置表
python scripts/test_db_connection.py --dsn postgresql://user:pwd@host:5432/.... --query "SELECT count(*) FROM etl_admin.etl_task"
```
脚本返回 0 代表连接与查询成功;若返回非 0可结合第 8 章“常见问题排查”的数据库章节(网络、防火墙、账号权限等)先定位问题,再运行完整 ETL。
---
## 4. 项目结构与文件说明
### 4.1 总体目录结构(树状图)
```text
etl_billiards/
├── README.md # 项目总览和使用说明
├── MIGRATION_GUIDE.md # 从旧版本迁移指南
├── requirements.txt # Python 依赖列表
├── setup.py # 项目安装配置
├── .env.example # 环境变量配置模板
├── .gitignore # Git 忽略文件配置
├── run_etl.sh # Linux/Mac 运行脚本
├── run_etl.bat # Windows 运行脚本
├── config/ # 配置管理模块
│ ├── __init__.py
│ ├── defaults.py # 默认配置值定义
│ ├── env_parser.py # 环境变量解析器
│ └── settings.py # 配置管理主类
├── database/ # 数据库访问层
│ ├── __init__.py
│ ├── connection.py # 数据库连接管理
│ └── operations.py # 批量操作封装
├── api/ # HTTP API 客户端
│ ├── __init__.py
│ └── client.py # API 客户端(重试 + 分页)
├── models/ # 数据模型层
│ ├── __init__.py
│ ├── parsers.py # 类型解析器
│ └── validators.py # 数据验证器
├── loaders/ # 数据加载器层
│ ├── __init__.py
│ ├── base_loader.py # 加载器基类
│ ├── dimensions/ # 维度表加载器
│ │ ├── __init__.py
│ │ └── member.py # 会员维度加载器
│ └── facts/ # 事实表加载器
│ ├── __init__.py
│ ├── order.py # 订单事实表加载器
│ └── payment.py # 支付记录加载器
├── scd/ # SCD2 处理层
│ ├── __init__.py
│ └── scd2_handler.py # SCD2 历史记录处理器
├── quality/ # 数据质量检查层
│ ├── __init__.py
│ ├── base_checker.py # 质量检查器基类
│ └── balance_checker.py # 余额一致性检查器
├── orchestration/ # ETL 编排层
│ ├── __init__.py
│ ├── scheduler.py # ETL 调度器
│ ├── task_registry.py # 任务注册表(工厂模式)
│ ├── cursor_manager.py # 游标管理器
│ └── run_tracker.py # 运行记录追踪器
├── tasks/ # ETL 任务层
│ ├── __init__.py
│ ├── base_task.py # 任务基类(模板方法)
│ ├── orders_task.py # 订单 ETL 任务
│ ├── payments_task.py # 支付 ETL 任务
│ └── members_task.py # 会员 ETL 任务
├── cli/ # 命令行接口层
│ ├── __init__.py
│ └── main.py # CLI 主入口
├── utils/ # 工具函数
│ ├── __init__.py
│ └── helpers.py # 通用工具函数
├── tests/ # 测试代码
│ ├── __init__.py
│ ├── unit/ # 单元测试
│ │ ├── __init__.py
│ │ ├── test_config.py
│ │ └── test_parsers.py
│ ├── testdata_json/ # 清洗入库用的测试 JSON 文件
│ │ └── XX.json
│ └── integration/ # 集成测试
│ ├── __init__.py
│ └── test_database.py
└── docs/ # 文档
└── ARCHITECTURE.md # 架构设计文档
```
### 4.2 各模块职责概览
- **config/**
- 统一配置入口,支持默认值、环境变量、命令行参数三层覆盖。
- **database/**
- 封装 PostgreSQL 连接与批量操作(插入、更新、Upsert 等)。
- **api/**
- 对上游业务 API 的 HTTP 调用进行统一封装,支持重试、分页与超时控制。
- **models/**
- 提供类型解析器(时间戳、金额、整数等)与业务级数据校验器。
- **loaders/**
- 提供事实表与维度表的加载逻辑(包含批量 Upsert、统计写入结果等)。
- **scd/**
- 维度型数据的 SCD2 历史管理(有效期、版本标记等)。
- **quality/**
- 质量检查策略,例如余额一致性、记录数量对齐等。
- **orchestration/**
- 任务调度、任务注册、游标管理(增量窗口)、运行记录追踪。
- **tasks/**
- 具体业务任务(订单、支付、会员等),封装了从“取数 → 处理 → 写库 → 记录结果”的完整流程。
- **cli/**
- 命令行入口,解析参数并启动调度流程。
- **utils/**
- 杂项工具函数。
- **tests/**
- 单元测试与集成测试代码。
---
## 5. 架构设计与流程说明
### 5.1 分层架构图
```text
┌─────────────────────────────────────┐
│ CLI 命令行接口 │ <- cli/main.py
└─────────────┬───────────────────────┘
┌─────────────▼───────────────────────┐
│ Orchestration 编排层 │ <- orchestration/
│ (Scheduler, TaskRegistry, ...) │
└─────────────┬───────────────────────┘
┌─────────────▼───────────────────────┐
│ Tasks 任务层 │ <- tasks/
│ (OrdersTask, PaymentsTask, ...) │
└───┬─────────┬─────────┬─────────────┘
│ │ │
▼ ▼ ▼
┌────────┐ ┌─────┐ ┌──────────┐
│Loaders │ │ SCD │ │ Quality │ <- loaders/, scd/, quality/
└────────┘ └─────┘ └──────────┘
┌───────▼────────┐
│ Models 模型 │ <- models/
└───────┬────────┘
┌───────▼────────┐
│ API 客户端 │ <- api/
└───────┬────────┘
┌───────▼────────┐
│ Database 访问 │ <- database/
└───────┬────────┘
┌───────▼────────┐
│ Config 配置 │ <- config/
└────────────────┘
```
### 5.2 各层职责(当前设计)
- **CLI 层 (`cli/`)**
- 解析命令行参数(指定任务列表、Dry-run、覆盖配置项等)。
- 初始化配置与日志后交由编排层执行。
- **编排层 (`orchestration/`)**
- `scheduler.py`:根据配置与 CLI 参数选择需要执行的任务,控制执行顺序和并行策略。
- `task_registry.py`:提供任务注册表,按任务代码创建任务实例(工厂模式)。
- `cursor_manager.py`:管理增量游标(时间窗口 / ID 游标)。
- `run_tracker.py`:记录每次任务运行的状态、统计信息和错误信息。
- **任务层 (`tasks/`)**
- `base_task.py`:定义任务执行模板流程(模板方法模式),包括获取窗口、调用上游、解析 / 校验、写库、更新游标等。
- `orders_task.py` / `payments_task.py` / `members_task.py`:实现具体任务逻辑(订单、支付、会员)。
- **加载器 / SCD / 质量层**
- `loaders/`:根据目标表封装 Upsert / Insert / Update 逻辑。
- `scd/scd2_handler.py`:为维度表提供 SCD2 历史管理能力。
- `quality/`:执行数据质量检查,如余额对账。
- **模型层 (`models/`)**
- `parsers.py`:负责数据类型转换(字符串 → 时间戳、Decimal、int 等)。
- `validators.py`:执行字段级和记录级的数据校验。
- **API 层 (`api/client.py`)**
- 封装 HTTP 调用,处理重试、超时及分页。
- **数据库层 (`database/`)**
- 管理数据库连接及上下文。
- 提供批量插入 / 更新 / Upsert 操作接口。
- **配置层 (`config/`)**
- 定义配置项默认值。
- 解析环境变量并进行类型转换。
- 对外提供统一配置对象。
### 5.3 设计模式(当前使用)
- 工厂模式:任务注册 / 创建(`TaskRegistry`)。
- 模板方法模式:任务执行流程(`BaseTask`)。
- 策略模式:不同 Loader / Checker 实现不同策略。
- 依赖注入:通过构造函数向任务传入 `db`、`api`、`config` 等依赖。
### 5.4 数据与控制流程
整体流程:
1. CLI 解析参数并加载配置。
2. Scheduler 构建数据库连接、API 客户端等依赖。
3. Scheduler 遍历任务配置,从 `TaskRegistry` 获取任务类并实例化。
4. 每个任务按统一模板执行:
- 读取游标 / 时间窗口。
- 调用 API 拉取数据(可分页)。
- 解析、验证数据。
- 通过 Loader 写入数据库(事实表 / 维度表 / SCD2。
- 执行质量检查。
- 更新游标与运行记录。
5. 所有任务执行完成后,释放连接并退出进程。
### 5.5 错误处理策略
- 单个任务失败不影响其他任务执行。
- 数据库操作异常自动回滚当前事务。
- API 请求失败时按配置进行重试,超过重试次数记录错误并终止该任务。
- 所有错误被记录到日志和运行追踪表,便于事后排查。
### 5.6 ODS + DWD 双阶段策略(新增)
为了支撑回溯/重放与后续 DWD 宽表构建,项目新增了 `billiards_ods` Schema 以及一组专门的 ODS 任务/Loader
- **ODS 表**`billiards_ods.ods_order_settle`、`ods_table_use_detail`、`ods_assistant_ledger`、`ods_assistant_abolish`、`ods_goods_ledger`、`ods_payment`、`ods_refund`、`ods_coupon_verify`、`ods_member`、`ods_member_card`、`ods_package_coupon`、`ods_inventory_stock`、`ods_inventory_change`。每条记录都会保存 `store_id + 源主键 + payload JSON + fetched_at + source_endpoint` 等信息。
- **通用 Loader**`loaders/ods/generic.py::GenericODSLoader` 统一封装了 `INSERT ... ON CONFLICT ...` 与批量写入逻辑,调用方只需提供列名与主键列即可。
- **ODS 任务**`tasks/ods_tasks.py` 内通过 `OdsTaskSpec` 定义了一组任务(`ODS_ORDER_SETTLE`、`ODS_PAYMENT`、`ODS_ASSISTANT_LEDGER` 等),并在 `TaskRegistry` 中自动注册,可直接通过 `python -m cli.main --tasks ODS_ORDER_SETTLE,ODS_PAYMENT` 执行。
- **双阶段链路**
1. 阶段 1ODS调用 API/离线归档 JSON将原始记录写入 ODS 表,保留分页、抓取时间、来源文件等元数据。
2. 阶段 2DWD/DIM后续订单、支付、券等事实任务将改为从 ODS 读取 payload经过解析/校验后写入 `billiards.fact_*`、`dim_*` 表,避免重复拉取上游接口。
> 新增的单元测试 `tests/unit/test_ods_tasks.py` 覆盖了 `ODS_ORDER_SETTLE`、`ODS_PAYMENT` 的入库路径,可作为扩展其他 ODS 任务的模板。
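下面给出 `GenericODSLoader` 的一个假设性调用示意(构造参数、列名与源字段名均为示例写法,实际签名以 `loaders/ods/generic.py` 为准):
```python
import json
from datetime import datetime, timezone

from loaders.ods.generic import GenericODSLoader  # 构造参数为示意,以源码为准

def load_payments_to_ods(db, records: list[dict]) -> None:
    """示意:把一批支付原始记录幂等写入 ODS。"""
    loader = GenericODSLoader(
        db,
        table="billiards_ods.ods_payment",
        columns=["store_id", "pay_log_id", "payload", "fetched_at", "source_endpoint"],
        conflict_keys=["store_id", "pay_log_id"],  # 主键列,用于 ON CONFLICT 去重
    )
    fetched_at = datetime.now(timezone.utc)
    loader.upsert([
        {
            "store_id": 2790685415443269,
            "pay_log_id": r["id"],                         # 源主键(字段名为假设)
            "payload": json.dumps(r, ensure_ascii=False),  # 原始 JSON 留痕,保证可回放
            "fetched_at": fetched_at,
            "source_endpoint": "/PayLog/GetPayLogListPage",
        }
        for r in records
    ])
```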
---
## 6. 迁移指南(从旧脚本到当前项目)
本节用于说明如何从旧的单文件脚本(如 `task_merged.py`)迁移到当前模块化项目,属于当前项目的使用说明,不涉及历史对比细节。
### 6.1 核心功能映射示意
| 旧版本函数 / 类 | 新版本位置 | 说明 |
| --------------------- | ----------------------------------------------------- | ---------- |
| `DEFAULTS` 字典 | `config/defaults.py` | 配置默认值 |
| `build_config()` | `config/settings.py::AppConfig.load()` | 配置加载 |
| `Pg` 类 | `database/connection.py::DatabaseConnection` | 数据库连接 |
| `http_get_json()` | `api/client.py::APIClient.get()` | API 请求 |
| `paged_get()` | `api/client.py::APIClient.get_paginated()` | 分页请求 |
| `parse_ts()` | `models/parsers.py::TypeParser.parse_timestamp()` | 时间解析 |
| `upsert_fact_order()` | `loaders/facts/order.py::OrderLoader.upsert_orders()` | 订单加载 |
| `scd2_upsert()` | `scd/scd2_handler.py::SCD2Handler.upsert()` | SCD2 处理 |
| `run_task_orders()` | `tasks/orders_task.py::OrdersTask.execute()` | 订单任务 |
| `main()` | `cli/main.py::main()` | 主入口 |
### 6.2 典型迁移步骤
1. **配置迁移**
- 原来在 `DEFAULTS` 或脚本内硬编码的配置,迁移到 `.env` 与 `config/defaults.py`。
- 使用 `AppConfig.load()` 统一获取配置。
2. **并行运行验证**
```bash
# 旧脚本
python task_merged.py --tasks ORDERS
# 新项目
python -m cli.main --tasks ORDERS
```
对比新旧版本导出的数据表和日志,确认一致性。
3. **自定义逻辑迁移**
- 原脚本中的自定义清洗逻辑 → 放入相应 `loaders/` 或任务类中。
- 自定义任务 → 在 `tasks/` 中实现并在 `task_registry` 中注册。
- 自定义 API 调用 → 扩展 `api/client.py` 或单独封装服务类。
4. **逐步切换**
- 先在测试环境并行运行。
- 再逐步切换生产任务到新版本。
---
## 7. 开发与扩展指南(当前项目)
### 7.1 添加新任务
1. 在 `tasks/` 目录创建任务类:
```python
from .base_task import BaseTask

class MyTask(BaseTask):
    def get_task_code(self) -> str:
        return "MY_TASK"

    def execute(self) -> dict:
        # 1. 获取时间窗口
        window_start, window_end, _ = self._get_time_window()
        # 2. 调用 API 获取数据
        records, _ = self.api.get_paginated(...)
        # 3. 解析 / 校验
        parsed = [self._parse(r) for r in records]
        # 4. 加载数据
        loader = MyLoader(self.db)
        inserted, updated, _ = loader.upsert(parsed)
        # 5. 提交并返回结果
        self.db.commit()
        return self._build_result("SUCCESS", {
            "inserted": inserted,
            "updated": updated,
        })
```
2. 在 `orchestration/task_registry.py` 中注册:
```python
from tasks.my_task import MyTask
default_registry.register("MY_TASK", MyTask)
```
3. 在任务配置表中启用(示例):
```sql
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
VALUES ('MY_TASK', 123456, TRUE);
```
### 7.2 添加新加载器
```python
from loaders.base_loader import BaseLoader

class MyLoader(BaseLoader):
    def upsert(self, records: list) -> tuple:
        sql = "INSERT INTO table_name (...) VALUES (...) ON CONFLICT (...) DO UPDATE SET ... RETURNING (xmax = 0) AS inserted"
        inserted, updated = self.db.batch_upsert_with_returning(
            sql, records, page_size=self._batch_size()
        )
        return (inserted, updated, 0)
```
### 7.3 添加新质量检查器
1. 在 `quality/` 中实现检查器,继承 `base_checker.py`。
2. 在任务或调度流程中调用该检查器,在写库后进行验证。
### 7.4 类型解析与校验扩展
- 在 `models/parsers.py` 中添加新类型解析方法。
- 在 `models/validators.py` 中添加新规则(如枚举校验、跨字段校验等)。
---
## 8. 常见问题排查
### 8.1 数据库连接失败
```text
错误: could not connect to server
```
排查要点:
- 检查 `PG_DSN` 或相关数据库配置是否正确。
- 确认数据库服务是否启动、网络是否可达。
### 8.2 API 请求超时
```text
错误: requests.exceptions.Timeout
```
排查要点:
- 检查 `API_BASE` 地址与网络连通性。
- 适当提高超时与重试次数(在配置中调整)。
### 8.3 模块导入错误
```text
错误: ModuleNotFoundError
```
排查要点:
- 确认在项目根目录下运行(包含 `etl_billiards/` 包)。
- 或通过 `pip install -e .` 以可编辑模式安装项目。
### 8.4 权限相关问题
```text
错误: Permission denied
```
排查要点:
- 脚本无执行权限:`chmod +x run_etl.sh`。
- Windows 需要以管理员身份运行,或修改日志 / 导出目录权限。
---
## 9. 使用前检查清单
在正式运行前建议确认:
- [ ] 已安装 Python 3.10+。
- [ ] 已执行 `pip install -r requirements.txt`。
- [ ] `.env` 已配置正确数据库、API、门店 ID、路径等)。
- [ ] PostgreSQL 数据库可连接。
- [ ] API 服务可访问且凭证有效。
- [ ] `LOG_ROOT`、`EXPORT_ROOT` 目录存在且拥有写权限。
---
## 10. 参考说明
- 本文已合并原有的快速开始、项目结构、架构说明、迁移指南等内容,可作为当前项目的统一说明文档。
- 如需在此基础上拆分多份文档,可按章节拆出,例如「快速开始」「架构设计」「迁移指南」「开发扩展」等。
## 11. 运行/调试模式说明
- 生产环境仅保留“任务模式”:通过调度/CLI 执行注册的任务ETL/ODS不使用调试脚本。
- 开发/调试可使用的辅助脚本(上线前可删除或禁用):
- `python -m etl_billiards.scripts.rebuild_ods_from_json`:从本地 JSON 目录重建 `billiards_ods`,用于离线初始化/验证。环境变量:`PG_DSN`(必填)、`JSON_DOC_DIR`(可选,默认 `C:\dev\LLTQ\export\test-json-doc`)、`INCLUDE_FILES`(逗号分隔文件名)、`DROP_SCHEMA_FIRST`(默认 true。
- 如需在生产环境保留脚本,请在运维手册中明确用途和禁用条件,避免误用。
## 12. ODS 任务上线指引
- 任务注册:`etl_billiards/database/seed_ods_tasks.sql` 列出了当前启用的 ODS 任务。将其中的 `store_id` 替换为实际门店后执行:
```
psql "$PG_DSN" -f etl_billiards/database/seed_ods_tasks.sql
```
`ON CONFLICT` 会保持 enabled=true避免重复。
- 调度:确认 `etl_admin.etl_task` 中已启用所需的 ODS 任务(任务代码见 seed 脚本),调度器或 CLI `--tasks` 即可调用。
- 离线回灌:开发环境可用 `rebuild_ods_from_json` 以样例 JSON 初始化 ODS生产慎用默认按 `(source_file, record_index)` 去重。
- 测试:`pytest etl_billiards/tests/unit/test_ods_tasks.py` 覆盖核心 ODS 任务;测试时可设置 `ETL_SKIP_DOTENV=1` 跳过本地 .env 读取。
## 13. ODS 表映射总览
| ODS 表名 | 接口 Path | 数据列表路径 |
| ------------------------------------ | ---------------------------------------------------- | ----------------------------- |
| `assistant_accounts_master` | `/PersonnelManagement/SearchAssistantInfo` | data.assistantInfos |
| `assistant_service_records` | `/AssistantPerformance/GetOrderAssistantDetails` | data.orderAssistantDetails |
| `assistant_cancellation_records` | `/AssistantPerformance/GetAbolitionAssistant` | data.abolitionAssistants |
| `goods_stock_movements` | `/GoodsStockManage/QueryGoodsOutboundReceipt` | data.queryDeliveryRecordsList |
| `goods_stock_summary` | `/TenantGoods/GetGoodsStockReport` | data |
| `group_buy_packages` | `/PackageCoupon/QueryPackageCouponList` | data.packageCouponList |
| `group_buy_redemption_records` | `/Site/GetSiteTableUseDetails` | data.siteTableUseDetailsList |
| `member_profiles` | `/MemberProfile/GetTenantMemberList` | data.tenantMemberInfos |
| `member_balance_changes` | `/MemberProfile/GetMemberCardBalanceChange` | data.tenantMemberCardLogs |
| `member_stored_value_cards` | `/MemberProfile/GetTenantMemberCardList` | data.tenantMemberCards |
| `payment_transactions` | `/PayLog/GetPayLogListPage` | data |
| `platform_coupon_redemption_records` | `/Promotion/GetOfflineCouponConsumePageList` | data |
| `recharge_settlements` | `/Site/GetRechargeSettleList` | data.settleList |
| `refund_transactions` | `/Order/GetRefundPayLogList` | data |
| `settlement_records` | `/Site/GetAllOrderSettleList` | data.settleList |
| `settlement_ticket_details` | `/Order/GetOrderSettleTicketNew` | (整包原始 JSON |
| `site_tables_master` | `/Table/GetSiteTables` | data.siteTables |
| `stock_goods_category_tree` | `/TenantGoodsCategory/QueryPrimarySecondaryCategory` | data.goodsCategoryList |
| `store_goods_master` | `/TenantGoods/GetGoodsInventoryList` | data.orderGoodsList |
| `store_goods_sales_records` | `/TenantGoods/GetGoodsSalesList` | data.orderGoodsLedgers |
| `table_fee_discount_records` | `/Site/GetTaiFeeAdjustList` | data.taiFeeAdjustInfos |
| `table_fee_transactions` | `/Site/GetSiteTableOrderDetails` | data.siteTableUseDetailsList |
| `tenant_goods_master` | `/TenantGoods/QueryTenantGoods` | data.tenantGoodsList |
## 14. ODS 相关环境变量/默认值
- `.env` / 环境变量:
- `JSON_DOC_DIR`ODS 样例 JSON 目录(开发/回灌用)
- `ODS_INCLUDE_FILES`:限定导入的文件名(逗号分隔,不含 .json
- `ODS_DROP_SCHEMA_FIRST`true/false是否重建 schema
- `ETL_SKIP_DOTENV`:测试/CI 时设为 1 跳过本地 .env 读取
- `config/defaults.py` 中 `ods` 默认值:
- `json_doc_dir`: `C:\dev\LLTQ\export\test-json-doc`
- `include_files`: `""`
- `drop_schema_first`: `True`
---
## 15. DWD 维度 “业务事件”
1. 粒度唯一、原子
- 一张 DWD 表只能有一种业务粒度,比如:
- 一条记录 = 一次结账;
- 一条记录 = 一段台费流水;
- 一条记录 = 一次助教服务;
- 一条记录 = 一次会员余额变动。
- 表里面不能又混“订单头”又混“订单行”,不能一部分是“汇总”,一部分是“明细”。
- 一旦粒度确定,所有字段都要跟这个粒度匹配:
- 比如“结账头表”就不要塞每一行商品明细;
- 商品明细就不要塞整单级别的总金额。
- 这是 DWD 层最重要的一条。
2. 以业务过程建模,不以 JSON 列表建模
- 先画清楚你真实的业务链路:
- 开台 / 换台 / 关台 → 台费流水
- 助教上桌 → 助教服务流水 / 废除事件
- 点单 → 商品销售流水
- 充值 / 消费 → 余额变更 / 充值单
- 结账 → 结账头表 + 支付流水 / 退款流水
- 团购 / 平台券 → 核销流水
3. 主键明确、外键统一
- 每张 DWD 表必须有业务主键(哪怕是接口给的 id不要依赖数据库自增。
- 所有“同一概念”的字段必须统一命名、统一含义:
- 门店:统一叫 site_id都对应 siteProfile.id
- 会员:统一叫 member_id对应 member_profiles.idsystem_member_id 单独一列;
- 台桌:统一 table_id对应 site_tables_master.id
- 结账:统一 order_settle_id
- 订单:统一 order_trade_no 等。
- 否则后面 DWS、AI 要把表拼起来会非常痛苦。
4. 保留明细,不做过度汇总
- DWD 层的事实表原则上只做“明细级”的数据:
- 不要在 DWD 就把“日汇总、周汇总、月汇总”算出来,那是 DWS 的事;
- 也不要把多个事件折成一行(例如一张表同时放日汇总+单笔流水)。
- 需要聚合时,再在 DWS 做主题宽表:
- dws_member_day_profile、dws_site_day_summary 等。
- DWD 只负责细颗粒度的真相。
5. 统一清洗、标准化,但保持可追溯
- 在 DWD 层一定要做的清洗:
- 类型转换:字符串时间 → 时间类型,金额统一为 decimal布尔统一为 0/1
- 单位统一:秒 / 分钟、元 / 分都统一;
- 枚举标准化:状态码、类型码在 DWD 里就定死含义,必要时建枚举维表。
- 同时要保证:
- 每条 DWD 记录都能追溯回 ODS
- 保留源系统主键;
- 保留原始时间 / 原始金额字段(不要覆盖掉)。
6. 扁平化、去嵌套
- JSON 里常见结构是:分页壳 + 头 + 明细数组 + 各种嵌套对象siteProfile、tableProfile、goodsLedgers…
- DWD 的原则是:
- 去掉分页壳;
- 把“数组”拆成子表(头表 / 行表);
- 把重复出现的 profile 抽出去做维度表(门店、台、商品、会员……)。
- 目标是DWD 表都是二维表结构,不存复杂嵌套 JSON。
7. 模型长期稳定,可扩展
- DWD 的表结构要尽可能稳定,新增需求尽量通过:
- 加字段;
- 新建事实表 / 维度表;
- 在 DWS 做派生指标;
- 而不是频繁重构已有 DWD 表结构。
- 这点跟你后面要喂给 LLM 也很相关AI 配的 prompt、schema 理解都要尽量少改。
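以上述第 6 条为例,下面用一个极简 Python 片段示意“去分页壳、头/行拆分”的扁平化思路(`settleAmount`、`goodsId` 等字段名为假设,仅演示结构):
```python
def flatten_settle(payload: dict) -> tuple[list[dict], list[dict]]:
    """示意:把带分页壳的结账 JSON 拆成头表/行表两组二维记录。"""
    heads, lines = [], []
    for settle in payload.get("data", {}).get("settleList", []):  # 去掉分页壳
        heads.append({
            "order_settle_id": settle["id"],          # 业务主键,不依赖数据库自增
            "site_id": settle["siteProfile"]["id"],   # 嵌套 profile 抽成维度外键
            "settle_amount": settle["settleAmount"],  # 字段名为假设
        })
        for idx, ledger in enumerate(settle.get("goodsLedgers", []), start=1):
            lines.append({                            # 明细数组拆成行表
                "order_settle_id": settle["id"],      # 回链头表
                "line_no": idx,
                "goods_id": ledger["goodsId"],        # 字段名为假设
                "amount": ledger["amount"],
            })
    return heads, lines
```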


256
etl_billiards/api/client.py Normal file
View File

@@ -0,0 +1,256 @@
# -*- coding: utf-8 -*-
"""API 客户端:统一封装 POST/重试/分页与列表提取逻辑。"""
from __future__ import annotations

from typing import Iterable, Sequence, Tuple

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

DEFAULT_BROWSER_HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "Content-Type": "application/json",
    "Origin": "https://pc.ficoo.vip",
    "Referer": "https://pc.ficoo.vip/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "zh-CN,zh;q=0.9",
    "sec-ch-ua": '"Google Chrome";v="141", "Not?A_Brand";v="8", "Chromium";v="141"',
    "sec-ch-ua-platform": '"Windows"',
    "sec-ch-ua-mobile": "?0",
    "sec-fetch-site": "same-origin",
    "sec-fetch-mode": "cors",
    "sec-fetch-dest": "empty",
    "priority": "u=1, i",
    "X-Requested-With": "XMLHttpRequest",
    "DNT": "1",
}

DEFAULT_LIST_KEYS: Tuple[str, ...] = (
    "list",
    "rows",
    "records",
    "items",
    "dataList",
    "data_list",
    "tenantMemberInfos",
    "tenantMemberCardLogs",
    "tenantMemberCards",
    "settleList",
    "orderAssistantDetails",
    "assistantInfos",
    "siteTables",
    "taiFeeAdjustInfos",
    "siteTableUseDetailsList",
    "tenantGoodsList",
    "packageCouponList",
    "queryDeliveryRecordsList",
    "goodsCategoryList",
    "orderGoodsList",
    "orderGoodsLedgers",
)

class APIClient:
    """HTTP API 客户端(默认使用 POST + JSON 请求体)"""

    def __init__(
        self,
        base_url: str,
        token: str | None = None,
        timeout: int = 20,
        retry_max: int = 3,
        headers_extra: dict | None = None,
    ):
        self.base_url = (base_url or "").rstrip("/")
        self.token = self._normalize_token(token)
        self.timeout = timeout
        self.retry_max = retry_max
        self.headers_extra = headers_extra or {}
        self._session: requests.Session | None = None

    # ------------------------------------------------------------------ HTTP 基础
    def _get_session(self) -> requests.Session:
        """获取或创建带重试的 Session。"""
        if self._session is None:
            self._session = requests.Session()
            retries = max(0, int(self.retry_max) - 1)
            retry = Retry(
                total=None,
                connect=retries,
                read=retries,
                status=retries,
                allowed_methods=frozenset(["GET", "POST"]),
                status_forcelist=(429, 500, 502, 503, 504),
                backoff_factor=0.5,
                respect_retry_after_header=True,
                raise_on_status=False,
            )
            adapter = HTTPAdapter(max_retries=retry)
            self._session.mount("http://", adapter)
            self._session.mount("https://", adapter)
            self._session.headers.update(self._build_headers())
        return self._session

    def get(self, endpoint: str, params: dict | None = None) -> dict:
        """
        兼容旧名的请求入口(实际以 POST JSON 方式请求)。
        """
        return self._post_json(endpoint, params)

    def _post_json(self, endpoint: str, payload: dict | None = None) -> dict:
        if not self.base_url:
            raise ValueError("API base_url 未配置")
        url = f"{self.base_url}/{endpoint.lstrip('/')}"
        sess = self._get_session()
        resp = sess.post(url, json=payload or {}, timeout=self.timeout)
        resp.raise_for_status()
        data = resp.json()
        self._ensure_success(data)
        return data

    def _build_headers(self) -> dict:
        headers = dict(DEFAULT_BROWSER_HEADERS)
        headers.update(self.headers_extra)
        if self.token:
            headers["Authorization"] = self.token
        return headers

    @staticmethod
    def _normalize_token(token: str | None) -> str | None:
        if not token:
            return None
        t = str(token).strip()
        if not t.lower().startswith("bearer "):
            t = f"Bearer {t}"
        return t

    @staticmethod
    def _ensure_success(payload: dict):
        """API 返回 code 非 0 时主动抛错,便于上层重试/记录。"""
        if isinstance(payload, dict) and "code" in payload:
            code = payload.get("code")
            if code not in (0, "0", None):
                msg = payload.get("msg") or payload.get("message") or ""
                raise ValueError(f"API 返回错误 code={code} msg={msg}")

    # ------------------------------------------------------------------ 分页
    def iter_paginated(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int | None = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | Sequence[str] | None = None,
        page_start: int = 1,
        page_end: int | None = None,
    ) -> Iterable[tuple[int, list, dict, dict]]:
        """
        分页迭代器:逐页拉取数据并产出 (page_no, records, request_params, raw_response)。
        page_size=None 时不附带分页参数,仅拉取一次。
        """
        base_params = dict(params or {})
        page = page_start
        while True:
            page_params = dict(base_params)
            if page_size is not None:
                page_params[page_field] = page
                page_params[size_field] = page_size
            payload = self._post_json(endpoint, page_params)
            records = self._extract_list(payload, data_path, list_key)
            yield page, records, page_params, payload
            if page_size is None:
                break
            if page_end is not None and page >= page_end:
                break
            if len(records) < (page_size or 0):
                break
            if len(records) == 0:
                break
            page += 1

    def get_paginated(
        self,
        endpoint: str,
        params: dict,
        page_size: int | None = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | Sequence[str] | None = None,
        page_start: int = 1,
        page_end: int | None = None,
    ) -> tuple[list, list]:
        """分页获取数据并将所有记录汇总在一个列表中。"""
        records, pages_meta = [], []
        for page_no, page_records, request_params, response in self.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
            page_start=page_start,
            page_end=page_end,
        ):
            records.extend(page_records)
            pages_meta.append(
                {"page": page_no, "request": request_params, "response": response}
            )
        return records, pages_meta

    # ------------------------------------------------------------------ 响应解析
    @classmethod
    def _extract_list(
        cls, payload: dict | list, data_path: tuple, list_key: str | Sequence[str] | None
    ) -> list:
        """根据 data_path/list_key 提取列表结构,兼容常见字段名。"""
        cur: object = payload
        if isinstance(cur, list):
            return cur
        for key in data_path:
            if isinstance(cur, dict):
                cur = cur.get(key)
            else:
                cur = None
            if cur is None:
                break
        if isinstance(cur, list):
            return cur
        if isinstance(cur, dict):
            if list_key:
                keys = (list_key,) if isinstance(list_key, str) else tuple(list_key)
                for k in keys:
                    if isinstance(cur.get(k), list):
                        return cur[k]
            for k in DEFAULT_LIST_KEYS:
                if isinstance(cur.get(k), list):
                    return cur[k]
            for v in cur.values():
                if isinstance(v, list):
                    return v
        return []
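用法示意(参数取值为假设;重试与列表提取行为见上面的实现):
```python
from api.client import APIClient

client = APIClient(base_url="https://pc.ficoo.vip/apiprod/admin/v1", token="your_token")
records, pages_meta = client.get_paginated(
    endpoint="/PayLog/GetPayLogListPage",  # 接口路径见 README 的 ODS 表映射总览
    params={"siteId": 2790685415443269},   # 请求参数为假设
    page_size=200,                         # 每页条数page/limit 字段名可按接口覆盖
    data_path=("data",),                   # 响应中列表所在路径
)
print(len(records), "records over", len(pages_meta), "pages")
```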

View File

@@ -0,0 +1,74 @@
# -*- coding: utf-8 -*-
"""本地 JSON 客户端,模拟 APIClient 的分页接口,从落盘的 JSON 回放数据。"""
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable, Tuple

from api.client import APIClient
from utils.json_store import endpoint_to_filename

class LocalJsonClient:
    """
    读取 RecordingAPIClient 生成的 JSON提供 iter_paginated/get_paginated 接口。
    """

    def __init__(self, base_dir: str | Path):
        self.base_dir = Path(base_dir)
        if not self.base_dir.exists():
            raise FileNotFoundError(f"JSON 目录不存在: {self.base_dir}")

    def iter_paginated(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> Iterable[Tuple[int, list, dict, dict]]:
        file_path = self.base_dir / endpoint_to_filename(endpoint)
        if not file_path.exists():
            raise FileNotFoundError(f"未找到匹配的 JSON 文件: {file_path}")
        with file_path.open("r", encoding="utf-8") as fp:
            payload = json.load(fp)
        pages = payload.get("pages")
        if not isinstance(pages, list) or not pages:
            pages = [{"page": 1, "request": params or {}, "response": payload}]
        for idx, page in enumerate(pages, start=1):
            response = page.get("response", {})
            request_params = page.get("request") or {}
            page_no = page.get("page") or idx
            records = APIClient._extract_list(response, data_path, list_key)  # type: ignore[attr-defined]
            yield page_no, records, request_params, response

    def get_paginated(
        self,
        endpoint: str,
        params: dict,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> tuple[list, list]:
        records: list = []
        pages_meta: list = []
        for page_no, page_records, request_params, response in self.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
        ):
            records.extend(page_records)
            pages_meta.append({"page": page_no, "request": request_params, "response": response})
        return records, pages_meta
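回放用法示意(模块导入路径为假设,目录需包含 RecordingAPIClient 落盘的 JSON
```python
from api.local_json_client import LocalJsonClient  # 模块路径为假设,以仓库实际为准

client = LocalJsonClient(r"C:\dev\LLTQ\export\JSON\some_run")  # 目录为示例
records, pages_meta = client.get_paginated(
    endpoint="/Site/GetAllOrderSettleList",
    params={},
    data_path=("data",),
    list_key="settleList",  # 列表字段见 README 的 ODS 表映射总览
)
```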

View File

@@ -0,0 +1,118 @@
# -*- coding: utf-8 -*-
"""包装 APIClient将分页响应落盘便于后续本地清洗。"""
from __future__ import annotations

from datetime import datetime
from pathlib import Path
from typing import Any, Iterable, Tuple

from api.client import APIClient
from utils.json_store import dump_json, endpoint_to_filename

class RecordingAPIClient:
    """
    代理 APIClient在调用 iter_paginated/get_paginated 时同时把响应写入 JSON 文件。
    文件名根据 endpoint 生成,写入到指定 output_dir。
    """

    def __init__(
        self,
        base_client: APIClient,
        output_dir: Path | str,
        task_code: str,
        run_id: int,
        write_pretty: bool = False,
    ):
        self.base = base_client
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.task_code = task_code
        self.run_id = run_id
        self.write_pretty = write_pretty
        self.last_dump: dict[str, Any] | None = None

    # ------------------------------------------------------------------ public API
    def iter_paginated(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> Iterable[Tuple[int, list, dict, dict]]:
        pages: list[dict[str, Any]] = []
        total_records = 0
        for page_no, records, request_params, response in self.base.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
        ):
            pages.append({"page": page_no, "request": request_params, "response": response})
            total_records += len(records)
            yield page_no, records, request_params, response
        self._dump(endpoint, params, page_size, pages, total_records)

    def get_paginated(
        self,
        endpoint: str,
        params: dict,
        page_size: int = 200,
        page_field: str = "page",
        size_field: str = "limit",
        data_path: tuple = ("data",),
        list_key: str | None = None,
    ) -> tuple[list, list]:
        records: list = []
        pages_meta: list = []
        for page_no, page_records, request_params, response in self.iter_paginated(
            endpoint=endpoint,
            params=params,
            page_size=page_size,
            page_field=page_field,
            size_field=size_field,
            data_path=data_path,
            list_key=list_key,
        ):
            records.extend(page_records)
            pages_meta.append({"page": page_no, "request": request_params, "response": response})
        return records, pages_meta

    # ------------------------------------------------------------------ internal
    def _dump(
        self,
        endpoint: str,
        params: dict | None,
        page_size: int,
        pages: list[dict[str, Any]],
        total_records: int,
    ):
        filename = endpoint_to_filename(endpoint)
        path = self.output_dir / filename
        payload = {
            "task_code": self.task_code,
            "run_id": self.run_id,
            "endpoint": endpoint,
            "params": params or {},
            "page_size": page_size,
            "pages": pages,
            "total_records": total_records,
            "dumped_at": datetime.utcnow().isoformat() + "Z",
        }
        dump_json(path, payload, pretty=self.write_pretty)
        self.last_dump = {
            "file": str(path),
            "endpoint": endpoint,
            "pages": len(pages),
            "records": total_records,
        }
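用法示意(模块导入路径为假设;落盘文件名由 endpoint 生成):
```python
from api.client import APIClient
from api.recording_client import RecordingAPIClient  # 模块路径为假设,以仓库实际为准

base = APIClient(base_url="https://pc.ficoo.vip/apiprod/admin/v1", token="your_token")
recorder = RecordingAPIClient(
    base,
    output_dir=r"C:\dev\LLTQ\export\JSON\run_1",  # 输出目录为示例
    task_code="ORDERS",
    run_id=1,
)
records, _ = recorder.get_paginated("/Site/GetAllOrderSettleList", params={})
print(recorder.last_dump)  # {'file': ..., 'endpoint': ..., 'pages': ..., 'records': ...}
```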


158
etl_billiards/cli/main.py Normal file
View File

@@ -0,0 +1,158 @@
# -*- coding: utf-8 -*-
"""CLI主入口"""
import sys
import argparse
import logging
from pathlib import Path
from config.settings import AppConfig
from orchestration.scheduler import ETLScheduler
def setup_logging():
"""设置日志"""
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s [%(levelname)s] %(name)s: %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
return logging.getLogger("etl_billiards")
def parse_args():
"""解析命令行参数"""
parser = argparse.ArgumentParser(description="台球场ETL系统")
# 基本参数
parser.add_argument("--store-id", type=int, help="门店ID")
parser.add_argument("--tasks", help="任务列表,逗号分隔")
parser.add_argument("--dry-run", action="store_true", help="试运行(不提交)")
# 数据库参数
parser.add_argument("--pg-dsn", help="PostgreSQL DSN")
parser.add_argument("--pg-host", help="PostgreSQL主机")
parser.add_argument("--pg-port", type=int, help="PostgreSQL端口")
parser.add_argument("--pg-name", help="PostgreSQL数据库名")
parser.add_argument("--pg-user", help="PostgreSQL用户名")
parser.add_argument("--pg-password", help="PostgreSQL密码")
# API参数
parser.add_argument("--api-base", help="API基础URL")
parser.add_argument("--api-token", "--token", dest="api_token", help="API令牌Bearer Token")
parser.add_argument("--api-timeout", type=int, help="API超时(秒)")
parser.add_argument("--api-page-size", type=int, help="分页大小")
parser.add_argument("--api-retry-max", type=int, help="API重试最大次数")
# 目录参数
parser.add_argument("--export-root", help="导出根目录")
parser.add_argument("--log-root", help="日志根目录")
# 抓取/清洗管线
parser.add_argument("--pipeline-flow", choices=["FULL", "FETCH_ONLY", "INGEST_ONLY"], help="流水线模式")
parser.add_argument("--fetch-root", help="抓取JSON输出根目录")
parser.add_argument("--ingest-source", help="本地清洗入库源目录")
parser.add_argument("--write-pretty-json", action="store_true", help="抓取JSON美化输出")
# 运行窗口
parser.add_argument("--idle-start", help="闲时窗口开始(HH:MM)")
parser.add_argument("--idle-end", help="闲时窗口结束(HH:MM)")
parser.add_argument("--allow-empty-advance", action="store_true", help="允许空结果推进窗口")
return parser.parse_args()
def build_cli_overrides(args) -> dict:
"""从命令行参数构建配置覆盖"""
overrides = {}
# 基本信息
if args.store_id is not None:
overrides.setdefault("app", {})["store_id"] = args.store_id
# 数据库
if args.pg_dsn:
overrides.setdefault("db", {})["dsn"] = args.pg_dsn
if args.pg_host:
overrides.setdefault("db", {})["host"] = args.pg_host
if args.pg_port:
overrides.setdefault("db", {})["port"] = args.pg_port
if args.pg_name:
overrides.setdefault("db", {})["name"] = args.pg_name
if args.pg_user:
overrides.setdefault("db", {})["user"] = args.pg_user
if args.pg_password:
overrides.setdefault("db", {})["password"] = args.pg_password
# API
if args.api_base:
overrides.setdefault("api", {})["base_url"] = args.api_base
if args.api_token:
overrides.setdefault("api", {})["token"] = args.api_token
if args.api_timeout:
overrides.setdefault("api", {})["timeout_sec"] = args.api_timeout
if args.api_page_size:
overrides.setdefault("api", {})["page_size"] = args.api_page_size
if args.api_retry_max:
overrides.setdefault("api", {}).setdefault("retries", {})["max_attempts"] = args.api_retry_max
# 目录
if args.export_root:
overrides.setdefault("io", {})["export_root"] = args.export_root
if args.log_root:
overrides.setdefault("io", {})["log_root"] = args.log_root
# 抓取/清洗管线
if args.pipeline_flow:
overrides.setdefault("pipeline", {})["flow"] = args.pipeline_flow.upper()
if args.fetch_root:
overrides.setdefault("pipeline", {})["fetch_root"] = args.fetch_root
if args.ingest_source:
overrides.setdefault("pipeline", {})["ingest_source_dir"] = args.ingest_source
if args.write_pretty_json:
overrides.setdefault("io", {})["write_pretty_json"] = True
# 运行窗口
if args.idle_start:
overrides.setdefault("run", {}).setdefault("idle_window", {})["start"] = args.idle_start
if args.idle_end:
overrides.setdefault("run", {}).setdefault("idle_window", {})["end"] = args.idle_end
if args.allow_empty_advance:
overrides.setdefault("run", {})["allow_empty_result_advance"] = True
# 任务
if args.tasks:
tasks = [t.strip().upper() for t in args.tasks.split(",") if t.strip()]
overrides.setdefault("run", {})["tasks"] = tasks
return overrides
def main():
"""主函数"""
logger = setup_logging()
args = parse_args()
try:
# 加载配置
cli_overrides = build_cli_overrides(args)
config = AppConfig.load(cli_overrides)
logger.info("配置加载完成")
logger.info(f"门店ID: {config.get('app.store_id')}")
logger.info(f"任务列表: {config.get('run.tasks')}")
# 创建调度器
scheduler = ETLScheduler(config, logger)
# 运行任务
task_codes = config.get("run.tasks")
scheduler.run_tasks(task_codes)
# 关闭连接
scheduler.close()
logger.info("ETL运行完成")
return 0
except Exception as e:
logger.error(f"ETL运行失败: {e}", exc_info=True)
return 1
if __name__ == "__main__":
sys.exit(main())

120
etl_billiards/config/defaults.py Normal file
View File

@@ -0,0 +1,120 @@
# -*- coding: utf-8 -*-
"""配置默认值定义"""
DEFAULTS = {
"app": {
"timezone": "Asia/Taipei",
"store_id": "",
"schema_oltp": "billiards",
"schema_etl": "etl_admin",
},
"db": {
"dsn": "",
"host": "",
"port": "",
"name": "",
"user": "",
"password": "",
"connect_timeout_sec": 20,
"batch_size": 1000,
"session": {
"timezone": "Asia/Taipei",
"statement_timeout_ms": 30000,
"lock_timeout_ms": 5000,
"idle_in_tx_timeout_ms": 600000,
},
},
"api": {
"base_url": "https://pc.ficoo.vip/apiprod/admin/v1",
"token": None,
"timeout_sec": 20,
"page_size": 200,
"params": {},
"retries": {
"max_attempts": 3,
"backoff_sec": [1, 2, 4],
},
"headers_extra": {},
},
"run": {
"tasks": [
"PRODUCTS",
"TABLES",
"MEMBERS",
"ASSISTANTS",
"PACKAGES_DEF",
"ORDERS",
"PAYMENTS",
"REFUNDS",
"COUPON_USAGE",
"INVENTORY_CHANGE",
"TOPUPS",
"TABLE_DISCOUNT",
"ASSISTANT_ABOLISH",
"LEDGER",
],
"window_minutes": {
"default_busy": 30,
"default_idle": 180,
},
"overlap_seconds": 120,
"idle_window": {
"start": "04:00",
"end": "16:00",
},
"allow_empty_result_advance": True,
},
"io": {
"export_root": r"C:\dev\LLTQ\export\JSON",
"log_root": r"C:\dev\LLTQ\export\LOG",
"manifest_name": "manifest.json",
"ingest_report_name": "ingest_report.json",
"write_pretty_json": True,
"max_file_bytes": 50 * 1024 * 1024,
},
"pipeline": {
# Run flow (FETCH_ONLY: fetch online and write to disk only; INGEST_ONLY: cleanse and load from local files; FULL: fetch + cleanse and load)
"flow": "FULL",
# Root directory for fetched JSON output (subdirectories created per task, run_id and timestamp)
"fetch_root": r"C:\dev\LLTQ\export\JSON",
# JSON input directory for local cleanse-and-load (empty means use this run's fetch directory)
"ingest_source_dir": "",
},
"clean": {
"log_unknown_fields": True,
"unknown_fields_limit": 50,
"hash_key": {
"algo": "sha1",
"salt": "",
},
"strict_numeric": True,
"round_money_scale": 2,
},
"security": {
"redact_in_logs": True,
"redact_keys": ["token", "password", "Authorization"],
"echo_token_in_logs": False,
},
"ods": {
# ODS offline rebuild/replay settings (development/operations use only)
"json_doc_dir": r"C:\dev\LLTQ\export\test-json-doc",
"include_files": "",
"drop_schema_first": True,
},
}
# Task code constants
TASK_ORDERS = "ORDERS"
TASK_PAYMENTS = "PAYMENTS"
TASK_REFUNDS = "REFUNDS"
TASK_INVENTORY_CHANGE = "INVENTORY_CHANGE"
TASK_COUPON_USAGE = "COUPON_USAGE"
TASK_MEMBERS = "MEMBERS"
TASK_ASSISTANTS = "ASSISTANTS"
TASK_PRODUCTS = "PRODUCTS"
TASK_TABLES = "TABLES"
TASK_PACKAGES_DEF = "PACKAGES_DEF"
TASK_TOPUPS = "TOPUPS"
TASK_TABLE_DISCOUNT = "TABLE_DISCOUNT"
TASK_ASSISTANT_ABOLISH = "ASSISTANT_ABOLISH"
TASK_LEDGER = "LEDGER"

View File

@@ -0,0 +1,175 @@
# -*- coding: utf-8 -*-
"""环境变量解析"""
import os
import json
from pathlib import Path
from copy import deepcopy
ENV_MAP = {
"TIMEZONE": ("app.timezone",),
"STORE_ID": ("app.store_id",),
"SCHEMA_OLTP": ("app.schema_oltp",),
"SCHEMA_ETL": ("app.schema_etl",),
"PG_DSN": ("db.dsn",),
"PG_HOST": ("db.host",),
"PG_PORT": ("db.port",),
"PG_NAME": ("db.name",),
"PG_USER": ("db.user",),
"PG_PASSWORD": ("db.password",),
"PG_CONNECT_TIMEOUT": ("db.connect_timeout_sec",),
"API_BASE": ("api.base_url",),
"API_TOKEN": ("api.token",),
"FICOO_TOKEN": ("api.token",),
"API_TIMEOUT": ("api.timeout_sec",),
"API_PAGE_SIZE": ("api.page_size",),
"API_RETRY_MAX": ("api.retries.max_attempts",),
"API_RETRY_BACKOFF": ("api.retries.backoff_sec",),
"API_PARAMS": ("api.params",),
"EXPORT_ROOT": ("io.export_root",),
"LOG_ROOT": ("io.log_root",),
"MANIFEST_NAME": ("io.manifest_name",),
"INGEST_REPORT_NAME": ("io.ingest_report_name",),
"WRITE_PRETTY_JSON": ("io.write_pretty_json",),
"RUN_TASKS": ("run.tasks",),
"OVERLAP_SECONDS": ("run.overlap_seconds",),
"WINDOW_BUSY_MIN": ("run.window_minutes.default_busy",),
"WINDOW_IDLE_MIN": ("run.window_minutes.default_idle",),
"IDLE_START": ("run.idle_window.start",),
"IDLE_END": ("run.idle_window.end",),
"IDLE_WINDOW_START": ("run.idle_window.start",),
"IDLE_WINDOW_END": ("run.idle_window.end",),
"ALLOW_EMPTY_RESULT_ADVANCE": ("run.allow_empty_result_advance",),
"ALLOW_EMPTY_ADVANCE": ("run.allow_empty_result_advance",),
"PIPELINE_FLOW": ("pipeline.flow",),
"JSON_FETCH_ROOT": ("pipeline.fetch_root",),
"JSON_SOURCE_DIR": ("pipeline.ingest_source_dir",),
"FETCH_ROOT": ("pipeline.fetch_root",),
"INGEST_SOURCE_DIR": ("pipeline.ingest_source_dir",),
}
def _deep_set(d, dotted_keys, value):
cur = d
for k in dotted_keys[:-1]:
cur = cur.setdefault(k, {})
cur[dotted_keys[-1]] = value
def _coerce_env(v: str):
if v is None:
return None
s = v.strip()
if s.lower() in ("true", "false"):
return s.lower() == "true"
try:
if s.isdigit() or (s.startswith("-") and s[1:].isdigit()):
return int(s)
except Exception:
pass
if (s.startswith("{") and s.endswith("}")) or (s.startswith("[") and s.endswith("]")):
try:
return json.loads(s)
except Exception:
return s
return s
def _strip_inline_comment(value: str) -> str:
"""去掉未被引号包裹的内联注释"""
result = []
in_quote = False
quote_char = ""
escape = False
for ch in value:
if escape:
result.append(ch)
escape = False
continue
if ch == "\\":
escape = True
result.append(ch)
continue
if ch in ("'", '"'):
if not in_quote:
in_quote = True
quote_char = ch
elif quote_char == ch:
in_quote = False
quote_char = ""
result.append(ch)
continue
if ch == "#" and not in_quote:
break
result.append(ch)
return "".join(result).rstrip()
def _unquote_value(value: str) -> str:
"""处理引号/原始字符串以及尾随逗号"""
trimmed = value.strip()
trimmed = _strip_inline_comment(trimmed)
trimmed = trimmed.rstrip(",").rstrip()
if not trimmed:
return trimmed
if len(trimmed) >= 2 and trimmed[0] in ("'", '"') and trimmed[-1] == trimmed[0]:
return trimmed[1:-1]
if (
len(trimmed) >= 3
and trimmed[0] in ("r", "R")
and trimmed[1] in ("'", '"')
and trimmed[-1] == trimmed[1]
):
return trimmed[2:-1]
return trimmed
def _parse_dotenv_line(line: str) -> tuple[str, str] | None:
"""解析 .env 文件中的单行"""
stripped = line.strip()
if not stripped or stripped.startswith("#"):
return None
if stripped.startswith("export "):
stripped = stripped[len("export ") :].strip()
if "=" not in stripped:
return None
key, value = stripped.split("=", 1)
key = key.strip()
value = _unquote_value(value)
return key, value
def _load_dotenv_values() -> dict:
"""从项目根目录读取 .env 文件键值"""
if os.environ.get("ETL_SKIP_DOTENV") in ("1", "true", "TRUE", "True"):
return {}
root = Path(__file__).resolve().parents[1]
dotenv_path = root / ".env"
if not dotenv_path.exists():
return {}
values: dict[str, str] = {}
for line in dotenv_path.read_text(encoding="utf-8", errors="ignore").splitlines():
parsed = _parse_dotenv_line(line)
if parsed:
key, value = parsed
values[key] = value
return values
def _apply_env_values(cfg: dict, source: dict):
for env_key, dotted in ENV_MAP.items():
val = source.get(env_key)
if val is None:
continue
v2 = _coerce_env(val)
for path in dotted:
if path == "run.tasks" and isinstance(v2, str):
v2 = [item.strip() for item in v2.split(",") if item.strip()]
_deep_set(cfg, path.split("."), v2)
def load_env_overrides(defaults: dict) -> dict:
cfg = deepcopy(defaults)
# Read .env first, then the real environment variables, so CLI still has the highest precedence
_apply_env_values(cfg, _load_dotenv_values())
_apply_env_values(cfg, os.environ)
return cfg
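# Illustrative usage (a sketch, not part of the module): shows value coercion and
# the DEFAULTS < .env < environment precedence. Run as `python -m config.env_parser`
# (assuming the package is named `config`) so the relative import resolves.
if __name__ == "__main__":
    from .defaults import DEFAULTS
    os.environ["STORE_ID"] = "123"               # digit string -> int via _coerce_env
    os.environ["RUN_TASKS"] = "ORDERS,PAYMENTS"  # comma list -> ["ORDERS", "PAYMENTS"]
    cfg = load_env_overrides(DEFAULTS)
    assert cfg["app"]["store_id"] == 123
    assert cfg["run"]["tasks"] == ["ORDERS", "PAYMENTS"]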

View File

@@ -0,0 +1,92 @@
# -*- coding: utf-8 -*-
"""配置管理主类"""
from copy import deepcopy
from .defaults import DEFAULTS
from .env_parser import load_env_overrides
class AppConfig:
"""应用配置管理器"""
def __init__(self, config_dict: dict):
self.config = config_dict
@classmethod
def load(cls, cli_overrides: dict = None):
"""加载配置: DEFAULTS < ENV < CLI"""
cfg = load_env_overrides(DEFAULTS)
if cli_overrides:
cls._deep_merge(cfg, cli_overrides)
# Normalize and validate
cls._normalize(cfg)
cls._validate(cfg)
return cls(cfg)
@staticmethod
def _deep_merge(dst, src):
"""深度合并字典"""
for k, v in src.items():
if isinstance(v, dict) and isinstance(dst.get(k), dict):
AppConfig._deep_merge(dst[k], v)
else:
dst[k] = v
@staticmethod
def _normalize(cfg):
"""规范化配置"""
# 转换 store_id 为整数
try:
cfg["app"]["store_id"] = int(str(cfg["app"]["store_id"]).strip())
except Exception:
raise SystemExit("app.store_id 必须为整数")
# DSN 组装
if not cfg["db"]["dsn"]:
cfg["db"]["dsn"] = (
f"postgresql://{cfg['db']['user']}:{cfg['db']['password']}"
f"@{cfg['db']['host']}:{cfg['db']['port']}/{cfg['db']['name']}"
)
# Clamp connect_timeout to 1-20 seconds
try:
timeout_sec = int(cfg["db"].get("connect_timeout_sec") or 5)
except Exception:
raise SystemExit("db.connect_timeout_sec 必须为整数")
cfg["db"]["connect_timeout_sec"] = max(1, min(timeout_sec, 20))
# Session parameters
cfg["db"].setdefault("session", {})
sess = cfg["db"]["session"]
sess.setdefault("timezone", cfg["app"]["timezone"])
for k in ("statement_timeout_ms", "lock_timeout_ms", "idle_in_tx_timeout_ms"):
if k in sess and sess[k] is not None:
try:
sess[k] = int(sess[k])
except Exception:
raise SystemExit(f"db.session.{k} 需为整数毫秒")
@staticmethod
def _validate(cfg):
"""验证必填配置"""
missing = []
if not cfg["app"]["store_id"]:
missing.append("app.store_id")
if missing:
raise SystemExit("缺少必需配置: " + ", ".join(missing))
def get(self, key: str, default=None):
"""获取配置值(支持点号路径)"""
keys = key.split(".")
val = self.config
for k in keys:
if isinstance(val, dict):
val = val.get(k)
else:
return default
return val if val is not None else default
def __getitem__(self, key):
return self.config[key]
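# Illustrative usage (a sketch): dotted-path reads over the merged configuration.
# Assumes STORE_ID is supplied via environment/.env, since _validate requires it.
if __name__ == "__main__":
    config = AppConfig.load({"run": {"tasks": ["ORDERS"]}})  # CLI layer wins
    print(config.get("run.tasks"))                 # -> ['ORDERS']
    print(config.get("db.session.timezone"))       # -> 'Asia/Taipei' by default
    print(config.get("api.retries.max_attempts"))  # -> 3
    print(config.get("no.such.key", "fallback"))   # -> 'fallback'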

View File

@@ -0,0 +1,112 @@
# -*- coding: utf-8 -*-
"""
数据库操作批量、RETURNING支持
"""
import re
from typing import List, Dict, Tuple
import psycopg2.extras
from .connection import DatabaseConnection
class DatabaseOperations(DatabaseConnection):
"""扩展数据库操作包含批量upsert和returning支持"""
def batch_execute(self, sql: str, rows: List[Dict], page_size: int = 1000):
"""批量执行SQL不带RETURNING"""
if not rows:
return
with self.conn.cursor() as c:
psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)
def batch_upsert_with_returning(self, sql: str, rows: List[Dict], page_size: int = 1000) -> Tuple[int, int]:
"""
批量 UPSERT 并统计插入/更新数
Args:
sql: 包含RETURNING子句的SQL
rows: 数据行列表
page_size: 批次大小
Returns:
(inserted_count, updated_count) 元组
"""
if not rows:
return (0, 0)
use_returning = "RETURNING" in sql.upper()
with self.conn.cursor() as c:
if not use_returning:
psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)
return (0, 0)
# Prefer vectorized execution
try:
inserted, updated = self._execute_with_returning_vectorized(c, sql, rows, page_size)
return (inserted, updated)
except Exception:
# Fall back to row-by-row execution
return self._execute_with_returning_row_by_row(c, sql, rows)
def _execute_with_returning_vectorized(self, cursor, sql: str, rows: List[Dict], page_size: int) -> Tuple[int, int]:
"""向量化执行使用execute_values"""
# 解析VALUES子句
m = re.search(r"VALUES\s*\((.*?)\)", sql, flags=re.IGNORECASE | re.DOTALL)
if not m:
raise ValueError("Cannot parse VALUES clause")
tpl = "(" + m.group(1) + ")"
base_sql = sql[:m.start()] + "VALUES %s" + sql[m.end():]
ret = psycopg2.extras.execute_values(
cursor, base_sql, rows, template=tpl, page_size=page_size, fetch=True
)
if not ret:
return (0, 0)
inserted = 0
for rec in ret:
flag = self._extract_inserted_flag(rec)
if flag:
inserted += 1
return (inserted, len(ret) - inserted)
def _execute_with_returning_row_by_row(self, cursor, sql: str, rows: List[Dict]) -> Tuple[int, int]:
"""逐行执行(回退方案)"""
inserted = 0
updated = 0
for r in rows:
cursor.execute(sql, r)
try:
rec = cursor.fetchone()
except Exception:
rec = None
flag = self._extract_inserted_flag(rec) if rec else None
if flag:
inserted += 1
else:
updated += 1
return (inserted, updated)
@staticmethod
def _extract_inserted_flag(rec) -> bool:
"""从返回记录中提取inserted标志"""
if isinstance(rec, tuple):
return bool(rec[0])
elif isinstance(rec, dict):
return bool(rec.get("inserted"))
else:
try:
return bool(rec["inserted"])
except Exception:
return False
# Backward-compatible alias
Pg = DatabaseOperations
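# How the counts are derived: `RETURNING (xmax = 0) AS inserted` uses the PostgreSQL
# convention that a freshly inserted row version has xmax = 0, while a row rewritten
# by ON CONFLICT ... DO UPDATE does not. Illustrative call site (table and columns
# are assumptions, not part of this module):
def _demo_upsert(dsn: str, rows: list):
    db = DatabaseOperations(dsn)
    sql = (
        "INSERT INTO billiards.dim_member (store_id, member_id, member_name) "
        "VALUES (%(store_id)s, %(member_id)s, %(member_name)s) "
        "ON CONFLICT (store_id, member_id) DO UPDATE SET member_name = EXCLUDED.member_name "
        "RETURNING (xmax = 0) AS inserted"
    )
    inserted, updated = db.batch_upsert_with_returning(sql, rows)
    db.commit()
    return inserted, updated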

View File

@@ -0,0 +1,63 @@
# -*- coding: utf-8 -*-
"""Database connection manager with capped connect_timeout."""
import psycopg2
import psycopg2.extras
class DatabaseConnection:
"""Wrap psycopg2 connection with session parameters and timeout guard."""
def __init__(self, dsn: str, session: dict = None, connect_timeout: int = None):
timeout_val = connect_timeout if connect_timeout is not None else 5
# PRD: database connect_timeout must not exceed 20 seconds.
timeout_val = max(1, min(int(timeout_val), 20))
self.conn = psycopg2.connect(dsn, connect_timeout=timeout_val)
self.conn.autocommit = False
# Session parameters (timezone, statement timeout, etc.)
if session:
with self.conn.cursor() as c:
if session.get("timezone"):
c.execute("SET TIME ZONE %s", (session["timezone"],))
if session.get("statement_timeout_ms") is not None:
c.execute(
"SET statement_timeout = %s",
(int(session["statement_timeout_ms"]),),
)
if session.get("lock_timeout_ms") is not None:
c.execute(
"SET lock_timeout = %s", (int(session["lock_timeout_ms"]),)
)
if session.get("idle_in_tx_timeout_ms") is not None:
c.execute(
"SET idle_in_transaction_session_timeout = %s",
(int(session["idle_in_tx_timeout_ms"]),),
)
def query(self, sql: str, args=None):
"""Execute a query and fetch all rows."""
with self.conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as c:
c.execute(sql, args)
return c.fetchall()
def execute(self, sql: str, args=None):
"""Execute a SQL statement without returning rows."""
with self.conn.cursor() as c:
c.execute(sql, args)
def commit(self):
"""Commit current transaction."""
self.conn.commit()
def rollback(self):
"""Rollback current transaction."""
self.conn.rollback()
def close(self):
"""Safely close the connection."""
try:
self.conn.close()
except Exception:
pass
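# Illustrative usage (a sketch; the DSN is made up):
if __name__ == "__main__":
    conn = DatabaseConnection(
        "postgresql://user:password@localhost:5432/LLZQ",
        session={"timezone": "Asia/Taipei", "statement_timeout_ms": 30000},
        connect_timeout=60,  # capped to 20 by the guard above
    )
    print(conn.query("SELECT now() AS ts")[0]["ts"])
    conn.close()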

View File

@@ -0,0 +1,99 @@
# -*- coding: utf-8 -*-
"""数据库批量操作"""
import psycopg2.extras
import re
class DatabaseOperations:
"""数据库批量操作封装"""
def __init__(self, connection):
self._connection = connection
self.conn = connection.conn
def batch_execute(self, sql: str, rows: list, page_size: int = 1000):
"""批量执行SQL"""
if not rows:
return
with self.conn.cursor() as c:
psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)
def batch_upsert_with_returning(self, sql: str, rows: list,
page_size: int = 1000) -> tuple:
"""批量UPSERT并返回插入/更新计数"""
if not rows:
return (0, 0)
use_returning = "RETURNING" in sql.upper()
with self.conn.cursor() as c:
if not use_returning:
psycopg2.extras.execute_batch(c, sql, rows, page_size=page_size)
return (0, 0)
# Try vectorized execution first
try:
m = re.search(r"VALUES\s*\((.*?)\)", sql, flags=re.IGNORECASE | re.DOTALL)
if m:
tpl = "(" + m.group(1) + ")"
base_sql = sql[:m.start()] + "VALUES %s" + sql[m.end():]
ret = psycopg2.extras.execute_values(
c, base_sql, rows, template=tpl, page_size=page_size, fetch=True
)
if not ret:
return (0, 0)
inserted = sum(1 for rec in ret if self._is_inserted(rec))
return (inserted, len(ret) - inserted)
except Exception:
pass
# Fallback: row-by-row execution
inserted = 0
updated = 0
for r in rows:
c.execute(sql, r)
try:
rec = c.fetchone()
except Exception:
rec = None
if self._is_inserted(rec):
inserted += 1
else:
updated += 1
return (inserted, updated)
@staticmethod
def _is_inserted(rec) -> bool:
"""判断是否为插入操作"""
if rec is None:
return False
if isinstance(rec, tuple):
return bool(rec[0])
if isinstance(rec, dict):
return bool(rec.get("inserted"))
return False
# --- pass-through helpers -------------------------------------------------
def commit(self):
"""Commit the transaction (delegated to the underlying connection)"""
self._connection.commit()
def rollback(self):
"""Roll back the transaction (delegated to the underlying connection)"""
self._connection.rollback()
def query(self, sql: str, args=None):
"""Run a query and return all rows"""
return self._connection.query(sql, args)
def execute(self, sql: str, args=None):
"""Execute arbitrary SQL"""
self._connection.execute(sql, args)
def cursor(self):
"""Expose the raw cursor for special operations"""
return self.conn.cursor()

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -0,0 +1,39 @@
-- Register the new ODS tasks in etl_admin.etl_task (replace store_id as needed).
-- Usage (example):
-- psql "$PG_DSN" -f etl_billiards/database/seed_ods_tasks.sql
-- or execute the contents of this file inside psql.
WITH target_store AS (
SELECT 2790685415443269::bigint AS store_id -- TODO: replace with the actual store_id
),
task_codes AS (
SELECT unnest(ARRAY[
'ODS_ASSISTANT_ACCOUNTS',
'ODS_ASSISTANT_LEDGER',
'ODS_ASSISTANT_ABOLISH',
'ODS_INVENTORY_CHANGE',
'ODS_INVENTORY_STOCK',
'ODS_PACKAGE',
'ODS_GROUP_BUY_REDEMPTION',
'ODS_MEMBER',
'ODS_MEMBER_BALANCE',
'ODS_MEMBER_CARD',
'ODS_PAYMENT',
'ODS_REFUND',
'ODS_COUPON_VERIFY',
'ODS_RECHARGE_SETTLE',
'ODS_TABLES',
'ODS_GOODS_CATEGORY',
'ODS_STORE_GOODS',
'ODS_TABLE_DISCOUNT',
'ODS_TENANT_GOODS',
'ODS_SETTLEMENT_TICKET',
'ODS_ORDER_SETTLE'
]) AS task_code
)
INSERT INTO etl_admin.etl_task (task_code, store_id, enabled)
SELECT t.task_code, s.store_id, TRUE
FROM task_codes t CROSS JOIN target_store s
ON CONFLICT (task_code, store_id) DO UPDATE
SET enabled = EXCLUDED.enabled;

View File

@@ -0,0 +1,16 @@
{
"folders": [
{
"path": ".."
},
{
"name": "LLZQ-server",
"path": "../../../LLZQ-server"
},
{
"name": "feiqiu-ETL-reload",
"path": "../../feiqiu-ETL-reload"
}
],
"settings": {}
}

View File

View File

@@ -0,0 +1,23 @@
# -*- coding: utf-8 -*-
"""数据加载器基类"""
import logging
class BaseLoader:
"""数据加载器基类"""
def __init__(self, db_ops, logger=None):
self.db = db_ops
self.logger = logger or logging.getLogger(self.__class__.__name__)
def upsert(self, records: list) -> tuple:
"""
执行 UPSERT 操作
返回: (inserted_count, updated_count, skipped_count)
"""
raise NotImplementedError("子类需实现 upsert 方法")
def _batch_size(self) -> int:
"""批次大小"""
return 1000
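# Minimal subclass sketch (table and column names are illustrative only):
class _DemoLoader(BaseLoader):
    def upsert(self, records: list) -> tuple:
        sql = (
            "INSERT INTO billiards.demo_table (demo_id) VALUES (%(demo_id)s) "
            "ON CONFLICT (demo_id) DO NOTHING"
        )
        self.db.batch_execute(sql, records, page_size=self._batch_size())
        return (len(records), 0, 0)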

View File

@@ -0,0 +1,114 @@
# -*- coding: utf-8 -*-
"""助教维度加载器"""
from ..base_loader import BaseLoader
class AssistantLoader(BaseLoader):
"""写入 dim_assistant"""
def upsert_assistants(self, records: list) -> tuple:
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.dim_assistant (
store_id,
assistant_id,
assistant_no,
nickname,
real_name,
gender,
mobile,
level,
team_id,
team_name,
assistant_status,
work_status,
entry_time,
resign_time,
start_time,
end_time,
create_time,
update_time,
system_role_id,
online_status,
allow_cx,
charge_way,
pd_unit_price,
cx_unit_price,
is_guaranteed,
is_team_leader,
serial_number,
show_sort,
is_delete,
raw_data
)
VALUES (
%(store_id)s,
%(assistant_id)s,
%(assistant_no)s,
%(nickname)s,
%(real_name)s,
%(gender)s,
%(mobile)s,
%(level)s,
%(team_id)s,
%(team_name)s,
%(assistant_status)s,
%(work_status)s,
%(entry_time)s,
%(resign_time)s,
%(start_time)s,
%(end_time)s,
%(create_time)s,
%(update_time)s,
%(system_role_id)s,
%(online_status)s,
%(allow_cx)s,
%(charge_way)s,
%(pd_unit_price)s,
%(cx_unit_price)s,
%(is_guaranteed)s,
%(is_team_leader)s,
%(serial_number)s,
%(show_sort)s,
%(is_delete)s,
%(raw_data)s
)
ON CONFLICT (store_id, assistant_id) DO UPDATE SET
assistant_no = EXCLUDED.assistant_no,
nickname = EXCLUDED.nickname,
real_name = EXCLUDED.real_name,
gender = EXCLUDED.gender,
mobile = EXCLUDED.mobile,
level = EXCLUDED.level,
team_id = EXCLUDED.team_id,
team_name = EXCLUDED.team_name,
assistant_status= EXCLUDED.assistant_status,
work_status = EXCLUDED.work_status,
entry_time = EXCLUDED.entry_time,
resign_time = EXCLUDED.resign_time,
start_time = EXCLUDED.start_time,
end_time = EXCLUDED.end_time,
update_time = COALESCE(EXCLUDED.update_time, now()),
system_role_id = EXCLUDED.system_role_id,
online_status = EXCLUDED.online_status,
allow_cx = EXCLUDED.allow_cx,
charge_way = EXCLUDED.charge_way,
pd_unit_price = EXCLUDED.pd_unit_price,
cx_unit_price = EXCLUDED.cx_unit_price,
is_guaranteed = EXCLUDED.is_guaranteed,
is_team_leader = EXCLUDED.is_team_leader,
serial_number = EXCLUDED.serial_number,
show_sort = EXCLUDED.show_sort,
is_delete = EXCLUDED.is_delete,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(
sql, records, page_size=self._batch_size()
)
return (inserted, updated, 0)

View File

@@ -0,0 +1,34 @@
# -*- coding: utf-8 -*-
"""会员维度表加载器"""
from ..base_loader import BaseLoader
class MemberLoader(BaseLoader):
"""会员维度加载器"""
def upsert_members(self, records: list, store_id: int) -> tuple:
"""加载会员数据"""
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.dim_member (
store_id, member_id, member_name, phone, balance,
status, register_time, raw_data
)
VALUES (
%(store_id)s, %(member_id)s, %(member_name)s, %(phone)s, %(balance)s,
%(status)s, %(register_time)s, %(raw_data)s
)
ON CONFLICT (store_id, member_id) DO UPDATE SET
member_name = EXCLUDED.member_name,
phone = EXCLUDED.phone,
balance = EXCLUDED.balance,
status = EXCLUDED.status,
register_time = EXCLUDED.register_time,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(sql, records, page_size=self._batch_size())
return (inserted, updated, 0)

View File

@@ -0,0 +1,91 @@
# -*- coding: utf-8 -*-
"""团购/套餐定义加载器"""
from ..base_loader import BaseLoader
class PackageDefinitionLoader(BaseLoader):
"""写入 dim_package_coupon"""
def upsert_packages(self, records: list) -> tuple:
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.dim_package_coupon (
store_id,
package_id,
package_code,
package_name,
table_area_id,
table_area_name,
selling_price,
duration_seconds,
start_time,
end_time,
type,
is_enabled,
is_delete,
usable_count,
creator_name,
date_type,
group_type,
coupon_money,
area_tag_type,
system_group_type,
card_type_ids,
raw_data
)
VALUES (
%(store_id)s,
%(package_id)s,
%(package_code)s,
%(package_name)s,
%(table_area_id)s,
%(table_area_name)s,
%(selling_price)s,
%(duration_seconds)s,
%(start_time)s,
%(end_time)s,
%(type)s,
%(is_enabled)s,
%(is_delete)s,
%(usable_count)s,
%(creator_name)s,
%(date_type)s,
%(group_type)s,
%(coupon_money)s,
%(area_tag_type)s,
%(system_group_type)s,
%(card_type_ids)s,
%(raw_data)s
)
ON CONFLICT (store_id, package_id) DO UPDATE SET
package_code = EXCLUDED.package_code,
package_name = EXCLUDED.package_name,
table_area_id = EXCLUDED.table_area_id,
table_area_name = EXCLUDED.table_area_name,
selling_price = EXCLUDED.selling_price,
duration_seconds = EXCLUDED.duration_seconds,
start_time = EXCLUDED.start_time,
end_time = EXCLUDED.end_time,
type = EXCLUDED.type,
is_enabled = EXCLUDED.is_enabled,
is_delete = EXCLUDED.is_delete,
usable_count = EXCLUDED.usable_count,
creator_name = EXCLUDED.creator_name,
date_type = EXCLUDED.date_type,
group_type = EXCLUDED.group_type,
coupon_money = EXCLUDED.coupon_money,
area_tag_type = EXCLUDED.area_tag_type,
system_group_type = EXCLUDED.system_group_type,
card_type_ids = EXCLUDED.card_type_ids,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(
sql, records, page_size=self._batch_size()
)
return (inserted, updated, 0)

View File

@@ -0,0 +1,134 @@
# -*- coding: utf-8 -*-
"""商品维度 + 价格SCD2 加载器"""
from ..base_loader import BaseLoader
from scd.scd2_handler import SCD2Handler
class ProductLoader(BaseLoader):
"""商品维度加载器dim_product + dim_product_price_scd"""
def __init__(self, db_ops):
super().__init__(db_ops)
# SCD2 处理器,复用通用逻辑
self.scd_handler = SCD2Handler(db_ops)
def upsert_products(self, records: list, store_id: int) -> tuple:
"""
加载商品维度及价格SCD
返回: (inserted_count, updated_count, skipped_count)
"""
if not records:
return (0, 0, 0)
# 1) 维度主表billiards.dim_product
sql_base = """
INSERT INTO billiards.dim_product (
store_id,
product_id,
site_product_id,
product_name,
category_id,
category_name,
second_category_id,
unit,
cost_price,
sale_price,
allow_discount,
status,
supplier_id,
barcode,
is_combo,
created_time,
updated_time,
raw_data
)
VALUES (
%(store_id)s,
%(product_id)s,
%(site_product_id)s,
%(product_name)s,
%(category_id)s,
%(category_name)s,
%(second_category_id)s,
%(unit)s,
%(cost_price)s,
%(sale_price)s,
%(allow_discount)s,
%(status)s,
%(supplier_id)s,
%(barcode)s,
%(is_combo)s,
%(created_time)s,
%(updated_time)s,
%(raw_data)s
)
ON CONFLICT (store_id, product_id) DO UPDATE SET
site_product_id = EXCLUDED.site_product_id,
product_name = EXCLUDED.product_name,
category_id = EXCLUDED.category_id,
category_name = EXCLUDED.category_name,
second_category_id = EXCLUDED.second_category_id,
unit = EXCLUDED.unit,
cost_price = EXCLUDED.cost_price,
sale_price = EXCLUDED.sale_price,
allow_discount = EXCLUDED.allow_discount,
status = EXCLUDED.status,
supplier_id = EXCLUDED.supplier_id,
barcode = EXCLUDED.barcode,
is_combo = EXCLUDED.is_combo,
updated_time = COALESCE(EXCLUDED.updated_time, now()),
raw_data = EXCLUDED.raw_data
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(
sql_base,
records,
page_size=self._batch_size(),
)
# 2) Price SCD2: billiards.dim_product_price_scd
# Only track history for price, category and name fields
tracked_fields = [
"product_name",
"category_id",
"category_name",
"second_category_id",
"cost_price",
"sale_price",
"allow_discount",
"status",
]
natural_key = ["store_id", "product_id"]
for rec in records:
effective_date = rec.get("updated_time") or rec.get("created_time")
scd_record = {
"store_id": rec["store_id"],
"product_id": rec["product_id"],
"product_name": rec.get("product_name"),
"category_id": rec.get("category_id"),
"category_name": rec.get("category_name"),
"second_category_id": rec.get("second_category_id"),
"cost_price": rec.get("cost_price"),
"sale_price": rec.get("sale_price"),
"allow_discount": rec.get("allow_discount"),
"status": rec.get("status"),
# The table has a raw_data jsonb column; reuse the raw_data passed in by the task
"raw_data": rec.get("raw_data"),
}
# We do not distinguish INSERT/UPDATE/SKIP here; it adds little to the ETL statistics
self.scd_handler.upsert(
table_name="billiards.dim_product_price_scd",
natural_key=natural_key,
tracked_fields=tracked_fields,
record=scd_record,
effective_date=effective_date,
)
# skipped_count is always 0 here (records that were actually dropped were filtered on the Task side)
return (inserted, updated, 0)
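# SCD2Handler itself is not shown in this diff. Conceptually it closes the current
# history row when any tracked field changes and opens a new one. A reduced sketch
# under assumptions (valid_from/valid_to/is_current columns and a trimmed column
# set; not the real handler):
def _scd2_upsert_sketch(db, record: dict, tracked_fields: list, effective_date):
    rows = db.query(
        "SELECT * FROM billiards.dim_product_price_scd"
        " WHERE store_id = %s AND product_id = %s AND is_current",
        (record["store_id"], record["product_id"]),
    )
    current = rows[0] if rows else None
    if current and all(current.get(f) == record.get(f) for f in tracked_fields):
        return "SKIP"  # nothing tracked changed; keep the current row open
    if current:
        db.execute(
            "UPDATE billiards.dim_product_price_scd"
            " SET valid_to = %s, is_current = FALSE"
            " WHERE store_id = %s AND product_id = %s AND is_current",
            (effective_date, record["store_id"], record["product_id"]),
        )
    # Open a new current row valid from effective_date
    db.execute(
        "INSERT INTO billiards.dim_product_price_scd"
        " (store_id, product_id, sale_price, valid_from, valid_to, is_current)"
        " VALUES (%s, %s, %s, %s, NULL, TRUE)",
        (record["store_id"], record["product_id"], record.get("sale_price"), effective_date),
    )
    return "INSERT"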

View File

@@ -0,0 +1,80 @@
# -*- coding: utf-8 -*-
"""台桌维度加载器"""
from ..base_loader import BaseLoader
class TableLoader(BaseLoader):
"""将台桌档案写入 dim_table"""
def upsert_tables(self, records: list) -> tuple:
"""批量写入台桌档案"""
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.dim_table (
store_id,
table_id,
site_id,
area_id,
area_name,
table_name,
table_price,
table_status,
table_status_name,
light_status,
is_rest_area,
show_status,
virtual_table,
charge_free,
only_allow_groupon,
is_online_reservation,
created_time,
raw_data
)
VALUES (
%(store_id)s,
%(table_id)s,
%(site_id)s,
%(area_id)s,
%(area_name)s,
%(table_name)s,
%(table_price)s,
%(table_status)s,
%(table_status_name)s,
%(light_status)s,
%(is_rest_area)s,
%(show_status)s,
%(virtual_table)s,
%(charge_free)s,
%(only_allow_groupon)s,
%(is_online_reservation)s,
%(created_time)s,
%(raw_data)s
)
ON CONFLICT (store_id, table_id) DO UPDATE SET
site_id = EXCLUDED.site_id,
area_id = EXCLUDED.area_id,
area_name = EXCLUDED.area_name,
table_name = EXCLUDED.table_name,
table_price = EXCLUDED.table_price,
table_status = EXCLUDED.table_status,
table_status_name = EXCLUDED.table_status_name,
light_status = EXCLUDED.light_status,
is_rest_area = EXCLUDED.is_rest_area,
show_status = EXCLUDED.show_status,
virtual_table = EXCLUDED.virtual_table,
charge_free = EXCLUDED.charge_free,
only_allow_groupon = EXCLUDED.only_allow_groupon,
is_online_reservation = EXCLUDED.is_online_reservation,
created_time = COALESCE(EXCLUDED.created_time, dim_table.created_time),
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(
sql, records, page_size=self._batch_size()
)
return (inserted, updated, 0)

View File

@@ -0,0 +1,64 @@
# -*- coding: utf-8 -*-
"""助教作废事实表"""
from ..base_loader import BaseLoader
class AssistantAbolishLoader(BaseLoader):
"""写入 fact_assistant_abolish"""
def upsert_records(self, records: list) -> tuple:
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.fact_assistant_abolish (
store_id,
abolish_id,
table_id,
table_name,
table_area_id,
table_area,
assistant_no,
assistant_name,
charge_minutes,
abolish_amount,
create_time,
trash_reason,
raw_data
)
VALUES (
%(store_id)s,
%(abolish_id)s,
%(table_id)s,
%(table_name)s,
%(table_area_id)s,
%(table_area)s,
%(assistant_no)s,
%(assistant_name)s,
%(charge_minutes)s,
%(abolish_amount)s,
%(create_time)s,
%(trash_reason)s,
%(raw_data)s
)
ON CONFLICT (store_id, abolish_id) DO UPDATE SET
table_id = EXCLUDED.table_id,
table_name = EXCLUDED.table_name,
table_area_id = EXCLUDED.table_area_id,
table_area = EXCLUDED.table_area,
assistant_no = EXCLUDED.assistant_no,
assistant_name = EXCLUDED.assistant_name,
charge_minutes = EXCLUDED.charge_minutes,
abolish_amount = EXCLUDED.abolish_amount,
create_time = EXCLUDED.create_time,
trash_reason = EXCLUDED.trash_reason,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(
sql, records, page_size=self._batch_size()
)
return (inserted, updated, 0)

View File

@@ -0,0 +1,136 @@
# -*- coding: utf-8 -*-
"""助教流水事实表"""
from ..base_loader import BaseLoader
class AssistantLedgerLoader(BaseLoader):
"""写入 fact_assistant_ledger"""
def upsert_ledgers(self, records: list) -> tuple:
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.fact_assistant_ledger (
store_id,
ledger_id,
assistant_no,
assistant_name,
nickname,
level_name,
table_name,
ledger_unit_price,
ledger_count,
ledger_amount,
projected_income,
service_money,
member_discount_amount,
manual_discount_amount,
coupon_deduct_money,
order_trade_no,
order_settle_id,
operator_id,
operator_name,
assistant_team_id,
assistant_level,
site_table_id,
order_assistant_id,
site_assistant_id,
user_id,
ledger_start_time,
ledger_end_time,
start_use_time,
last_use_time,
income_seconds,
real_use_seconds,
is_trash,
trash_reason,
is_confirm,
ledger_status,
create_time,
raw_data
)
VALUES (
%(store_id)s,
%(ledger_id)s,
%(assistant_no)s,
%(assistant_name)s,
%(nickname)s,
%(level_name)s,
%(table_name)s,
%(ledger_unit_price)s,
%(ledger_count)s,
%(ledger_amount)s,
%(projected_income)s,
%(service_money)s,
%(member_discount_amount)s,
%(manual_discount_amount)s,
%(coupon_deduct_money)s,
%(order_trade_no)s,
%(order_settle_id)s,
%(operator_id)s,
%(operator_name)s,
%(assistant_team_id)s,
%(assistant_level)s,
%(site_table_id)s,
%(order_assistant_id)s,
%(site_assistant_id)s,
%(user_id)s,
%(ledger_start_time)s,
%(ledger_end_time)s,
%(start_use_time)s,
%(last_use_time)s,
%(income_seconds)s,
%(real_use_seconds)s,
%(is_trash)s,
%(trash_reason)s,
%(is_confirm)s,
%(ledger_status)s,
%(create_time)s,
%(raw_data)s
)
ON CONFLICT (store_id, ledger_id) DO UPDATE SET
assistant_no = EXCLUDED.assistant_no,
assistant_name = EXCLUDED.assistant_name,
nickname = EXCLUDED.nickname,
level_name = EXCLUDED.level_name,
table_name = EXCLUDED.table_name,
ledger_unit_price = EXCLUDED.ledger_unit_price,
ledger_count = EXCLUDED.ledger_count,
ledger_amount = EXCLUDED.ledger_amount,
projected_income = EXCLUDED.projected_income,
service_money = EXCLUDED.service_money,
member_discount_amount = EXCLUDED.member_discount_amount,
manual_discount_amount = EXCLUDED.manual_discount_amount,
coupon_deduct_money = EXCLUDED.coupon_deduct_money,
order_trade_no = EXCLUDED.order_trade_no,
order_settle_id = EXCLUDED.order_settle_id,
operator_id = EXCLUDED.operator_id,
operator_name = EXCLUDED.operator_name,
assistant_team_id = EXCLUDED.assistant_team_id,
assistant_level = EXCLUDED.assistant_level,
site_table_id = EXCLUDED.site_table_id,
order_assistant_id = EXCLUDED.order_assistant_id,
site_assistant_id = EXCLUDED.site_assistant_id,
user_id = EXCLUDED.user_id,
ledger_start_time = EXCLUDED.ledger_start_time,
ledger_end_time = EXCLUDED.ledger_end_time,
start_use_time = EXCLUDED.start_use_time,
last_use_time = EXCLUDED.last_use_time,
income_seconds = EXCLUDED.income_seconds,
real_use_seconds = EXCLUDED.real_use_seconds,
is_trash = EXCLUDED.is_trash,
trash_reason = EXCLUDED.trash_reason,
is_confirm = EXCLUDED.is_confirm,
ledger_status = EXCLUDED.ledger_status,
create_time = EXCLUDED.create_time,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(
sql, records, page_size=self._batch_size()
)
return (inserted, updated, 0)

View File

@@ -0,0 +1,91 @@
# -*- coding: utf-8 -*-
"""券核销事实表"""
from ..base_loader import BaseLoader
class CouponUsageLoader(BaseLoader):
"""写入 fact_coupon_usage"""
def upsert_coupon_usage(self, records: list) -> tuple:
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.fact_coupon_usage (
store_id,
usage_id,
coupon_code,
coupon_channel,
coupon_name,
sale_price,
coupon_money,
coupon_free_time,
use_status,
create_time,
consume_time,
operator_id,
operator_name,
table_id,
site_order_id,
group_package_id,
coupon_remark,
deal_id,
certificate_id,
verify_id,
is_delete,
raw_data
)
VALUES (
%(store_id)s,
%(usage_id)s,
%(coupon_code)s,
%(coupon_channel)s,
%(coupon_name)s,
%(sale_price)s,
%(coupon_money)s,
%(coupon_free_time)s,
%(use_status)s,
%(create_time)s,
%(consume_time)s,
%(operator_id)s,
%(operator_name)s,
%(table_id)s,
%(site_order_id)s,
%(group_package_id)s,
%(coupon_remark)s,
%(deal_id)s,
%(certificate_id)s,
%(verify_id)s,
%(is_delete)s,
%(raw_data)s
)
ON CONFLICT (store_id, usage_id) DO UPDATE SET
coupon_code = EXCLUDED.coupon_code,
coupon_channel = EXCLUDED.coupon_channel,
coupon_name = EXCLUDED.coupon_name,
sale_price = EXCLUDED.sale_price,
coupon_money = EXCLUDED.coupon_money,
coupon_free_time = EXCLUDED.coupon_free_time,
use_status = EXCLUDED.use_status,
create_time = EXCLUDED.create_time,
consume_time = EXCLUDED.consume_time,
operator_id = EXCLUDED.operator_id,
operator_name = EXCLUDED.operator_name,
table_id = EXCLUDED.table_id,
site_order_id = EXCLUDED.site_order_id,
group_package_id = EXCLUDED.group_package_id,
coupon_remark = EXCLUDED.coupon_remark,
deal_id = EXCLUDED.deal_id,
certificate_id = EXCLUDED.certificate_id,
verify_id = EXCLUDED.verify_id,
is_delete = EXCLUDED.is_delete,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(
sql, records, page_size=self._batch_size()
)
return (inserted, updated, 0)

View File

@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
"""库存变动事实表"""
from ..base_loader import BaseLoader
class InventoryChangeLoader(BaseLoader):
"""写入 fact_inventory_change"""
def upsert_changes(self, records: list) -> tuple:
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.fact_inventory_change (
store_id,
change_id,
site_goods_id,
stock_type,
goods_name,
change_time,
start_qty,
end_qty,
change_qty,
unit,
price,
operator_name,
remark,
goods_category_id,
goods_second_category_id,
raw_data
)
VALUES (
%(store_id)s,
%(change_id)s,
%(site_goods_id)s,
%(stock_type)s,
%(goods_name)s,
%(change_time)s,
%(start_qty)s,
%(end_qty)s,
%(change_qty)s,
%(unit)s,
%(price)s,
%(operator_name)s,
%(remark)s,
%(goods_category_id)s,
%(goods_second_category_id)s,
%(raw_data)s
)
ON CONFLICT (store_id, change_id) DO UPDATE SET
site_goods_id = EXCLUDED.site_goods_id,
stock_type = EXCLUDED.stock_type,
goods_name = EXCLUDED.goods_name,
change_time = EXCLUDED.change_time,
start_qty = EXCLUDED.start_qty,
end_qty = EXCLUDED.end_qty,
change_qty = EXCLUDED.change_qty,
unit = EXCLUDED.unit,
price = EXCLUDED.price,
operator_name = EXCLUDED.operator_name,
remark = EXCLUDED.remark,
goods_category_id = EXCLUDED.goods_category_id,
goods_second_category_id = EXCLUDED.goods_second_category_id,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(
sql, records, page_size=self._batch_size()
)
return (inserted, updated, 0)

View File

@@ -0,0 +1,42 @@
# -*- coding: utf-8 -*-
"""订单事实表加载器"""
from ..base_loader import BaseLoader
class OrderLoader(BaseLoader):
"""订单数据加载器"""
def upsert_orders(self, records: list, store_id: int) -> tuple:
"""加载订单数据"""
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.fact_order (
store_id, order_id, order_no, member_id, table_id,
order_time, end_time, total_amount, discount_amount,
final_amount, pay_status, order_status, remark, raw_data
)
VALUES (
%(store_id)s, %(order_id)s, %(order_no)s, %(member_id)s, %(table_id)s,
%(order_time)s, %(end_time)s, %(total_amount)s, %(discount_amount)s,
%(final_amount)s, %(pay_status)s, %(order_status)s, %(remark)s, %(raw_data)s
)
ON CONFLICT (store_id, order_id) DO UPDATE SET
order_no = EXCLUDED.order_no,
member_id = EXCLUDED.member_id,
table_id = EXCLUDED.table_id,
order_time = EXCLUDED.order_time,
end_time = EXCLUDED.end_time,
total_amount = EXCLUDED.total_amount,
discount_amount = EXCLUDED.discount_amount,
final_amount = EXCLUDED.final_amount,
pay_status = EXCLUDED.pay_status,
order_status = EXCLUDED.order_status,
remark = EXCLUDED.remark,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(sql, records, page_size=self._batch_size())
return (inserted, updated, 0)

View File

@@ -0,0 +1,61 @@
# -*- coding: utf-8 -*-
"""支付事实表加载器"""
from ..base_loader import BaseLoader
class PaymentLoader(BaseLoader):
"""支付数据加载器"""
def upsert_payments(self, records: list, store_id: int) -> tuple:
"""加载支付数据"""
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.fact_payment (
store_id, pay_id, order_id,
site_id, tenant_id,
order_settle_id, order_trade_no,
relate_type, relate_id,
create_time, pay_time,
pay_amount, fee_amount, discount_amount,
payment_method, pay_type,
online_pay_channel, pay_terminal,
pay_status, remark, raw_data
)
VALUES (
%(store_id)s, %(pay_id)s, %(order_id)s,
%(site_id)s, %(tenant_id)s,
%(order_settle_id)s, %(order_trade_no)s,
%(relate_type)s, %(relate_id)s,
%(create_time)s, %(pay_time)s,
%(pay_amount)s, %(fee_amount)s, %(discount_amount)s,
%(payment_method)s, %(pay_type)s,
%(online_pay_channel)s, %(pay_terminal)s,
%(pay_status)s, %(remark)s, %(raw_data)s
)
ON CONFLICT (store_id, pay_id) DO UPDATE SET
order_settle_id = EXCLUDED.order_settle_id,
order_trade_no = EXCLUDED.order_trade_no,
relate_type = EXCLUDED.relate_type,
relate_id = EXCLUDED.relate_id,
order_id = EXCLUDED.order_id,
site_id = EXCLUDED.site_id,
tenant_id = EXCLUDED.tenant_id,
create_time = EXCLUDED.create_time,
pay_time = EXCLUDED.pay_time,
pay_amount = EXCLUDED.pay_amount,
fee_amount = EXCLUDED.fee_amount,
discount_amount = EXCLUDED.discount_amount,
payment_method = EXCLUDED.payment_method,
pay_type = EXCLUDED.pay_type,
online_pay_channel = EXCLUDED.online_pay_channel,
pay_terminal = EXCLUDED.pay_terminal,
pay_status = EXCLUDED.pay_status,
remark = EXCLUDED.remark,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(sql, records, page_size=self._batch_size())
return (inserted, updated, 0)

View File

@@ -0,0 +1,88 @@
# -*- coding: utf-8 -*-
"""退款事实表加载器"""
from ..base_loader import BaseLoader
class RefundLoader(BaseLoader):
"""写入 fact_refund"""
def upsert_refunds(self, records: list) -> tuple:
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.fact_refund (
store_id,
refund_id,
site_id,
tenant_id,
pay_amount,
pay_status,
pay_time,
create_time,
relate_type,
relate_id,
payment_method,
refund_amount,
action_type,
pay_terminal,
operator_id,
channel_pay_no,
channel_fee,
is_delete,
member_id,
member_card_id,
raw_data
)
VALUES (
%(store_id)s,
%(refund_id)s,
%(site_id)s,
%(tenant_id)s,
%(pay_amount)s,
%(pay_status)s,
%(pay_time)s,
%(create_time)s,
%(relate_type)s,
%(relate_id)s,
%(payment_method)s,
%(refund_amount)s,
%(action_type)s,
%(pay_terminal)s,
%(operator_id)s,
%(channel_pay_no)s,
%(channel_fee)s,
%(is_delete)s,
%(member_id)s,
%(member_card_id)s,
%(raw_data)s
)
ON CONFLICT (store_id, refund_id) DO UPDATE SET
site_id = EXCLUDED.site_id,
tenant_id = EXCLUDED.tenant_id,
pay_amount = EXCLUDED.pay_amount,
pay_status = EXCLUDED.pay_status,
pay_time = EXCLUDED.pay_time,
create_time = EXCLUDED.create_time,
relate_type = EXCLUDED.relate_type,
relate_id = EXCLUDED.relate_id,
payment_method = EXCLUDED.payment_method,
refund_amount = EXCLUDED.refund_amount,
action_type = EXCLUDED.action_type,
pay_terminal = EXCLUDED.pay_terminal,
operator_id = EXCLUDED.operator_id,
channel_pay_no = EXCLUDED.channel_pay_no,
channel_fee = EXCLUDED.channel_fee,
is_delete = EXCLUDED.is_delete,
member_id = EXCLUDED.member_id,
member_card_id = EXCLUDED.member_card_id,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(
sql, records, page_size=self._batch_size()
)
return (inserted, updated, 0)

View File

@@ -0,0 +1,82 @@
# -*- coding: utf-8 -*-
"""台费打折事实表"""
from ..base_loader import BaseLoader
class TableDiscountLoader(BaseLoader):
"""写入 fact_table_discount"""
def upsert_discounts(self, records: list) -> tuple:
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.fact_table_discount (
store_id,
discount_id,
adjust_type,
applicant_id,
applicant_name,
operator_id,
operator_name,
ledger_amount,
ledger_count,
ledger_name,
ledger_status,
order_settle_id,
order_trade_no,
site_table_id,
table_area_id,
table_area_name,
create_time,
is_delete,
raw_data
)
VALUES (
%(store_id)s,
%(discount_id)s,
%(adjust_type)s,
%(applicant_id)s,
%(applicant_name)s,
%(operator_id)s,
%(operator_name)s,
%(ledger_amount)s,
%(ledger_count)s,
%(ledger_name)s,
%(ledger_status)s,
%(order_settle_id)s,
%(order_trade_no)s,
%(site_table_id)s,
%(table_area_id)s,
%(table_area_name)s,
%(create_time)s,
%(is_delete)s,
%(raw_data)s
)
ON CONFLICT (store_id, discount_id) DO UPDATE SET
adjust_type = EXCLUDED.adjust_type,
applicant_id = EXCLUDED.applicant_id,
applicant_name = EXCLUDED.applicant_name,
operator_id = EXCLUDED.operator_id,
operator_name = EXCLUDED.operator_name,
ledger_amount = EXCLUDED.ledger_amount,
ledger_count = EXCLUDED.ledger_count,
ledger_name = EXCLUDED.ledger_name,
ledger_status = EXCLUDED.ledger_status,
order_settle_id = EXCLUDED.order_settle_id,
order_trade_no = EXCLUDED.order_trade_no,
site_table_id = EXCLUDED.site_table_id,
table_area_id = EXCLUDED.table_area_id,
table_area_name = EXCLUDED.table_area_name,
create_time = EXCLUDED.create_time,
is_delete = EXCLUDED.is_delete,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(
sql, records, page_size=self._batch_size()
)
return (inserted, updated, 0)

View File

@@ -0,0 +1,188 @@
# -*- coding: utf-8 -*-
"""小票详情加载器"""
from ..base_loader import BaseLoader
import json
class TicketLoader(BaseLoader):
"""
Loader for parsing Ticket Detail JSON and populating DWD fact tables.
Handles:
- fact_order (Header)
- fact_order_goods (Items)
- fact_table_usage (Items)
- fact_assistant_service (Items)
"""
def process_tickets(self, tickets: list, store_id: int) -> tuple:
"""
Process a batch of ticket JSONs.
Returns (inserted_count, error_count)
"""
inserted_count = 0
error_count = 0
# Prepare batch lists
orders = []
goods_list = []
table_usages = []
assistant_services = []
for ticket in tickets:
try:
# 1. Parse Header (fact_order)
root_data = ticket.get("data", {}).get("data", {})
if not root_data:
continue
order_settle_id = root_data.get("orderSettleId")
if not order_settle_id:
continue
orders.append({
"store_id": store_id,
"order_settle_id": order_settle_id,
"order_trade_no": 0,
"order_no": str(root_data.get("orderSettleNumber", "")),
"member_id": 0,
"pay_time": root_data.get("payTime"),
"total_amount": root_data.get("consumeMoney", 0),
"pay_amount": root_data.get("actualPayment", 0),
"discount_amount": root_data.get("memberOfferAmount", 0),
"coupon_amount": root_data.get("couponAmount", 0),
"status": "PAID",
"cashier_name": root_data.get("cashierName", ""),
"remark": root_data.get("orderRemark", ""),
"raw_data": json.dumps(ticket, ensure_ascii=False)
})
# 2. Parse Items (orderItem list)
order_items = root_data.get("orderItem", [])
for item in order_items:
order_trade_no = item.get("siteOrderId")
# 2.1 Table Ledger
table_ledger = item.get("tableLedger")
if table_ledger:
table_usages.append({
"store_id": store_id,
"order_ledger_id": table_ledger.get("orderTableLedgerId"),
"order_settle_id": order_settle_id,
"table_id": table_ledger.get("siteTableId"),
"table_name": table_ledger.get("tableName"),
"start_time": table_ledger.get("chargeStartTime"),
"end_time": table_ledger.get("chargeEndTime"),
"duration_minutes": table_ledger.get("useDuration", 0),
"total_amount": table_ledger.get("consumptionAmount", 0),
"pay_amount": table_ledger.get("consumptionAmount", 0) - table_ledger.get("memberDiscountAmount", 0)
})
# 2.2 Goods Ledgers
goods_ledgers = item.get("goodsLedgers", [])
for g in goods_ledgers:
goods_list.append({
"store_id": store_id,
"order_goods_id": g.get("orderGoodsLedgerId"),
"order_settle_id": order_settle_id,
"order_trade_no": order_trade_no,
"goods_id": g.get("siteGoodsId"),
"goods_name": g.get("goodsName"),
"quantity": g.get("goodsCount", 0),
"unit_price": g.get("goodsPrice", 0),
"total_amount": g.get("ledgerAmount", 0),
"pay_amount": g.get("realGoodsMoney", 0)
})
# 2.3 Assistant Services
assistant_ledgers = item.get("assistantPlayWith", [])
for a in assistant_ledgers:
assistant_services.append({
"store_id": store_id,
"ledger_id": a.get("orderAssistantLedgerId"),
"order_settle_id": order_settle_id,
"assistant_id": a.get("assistantId"),
"assistant_name": a.get("ledgerName"),
"service_type": a.get("skillName", "Play"),
"start_time": a.get("ledgerStartTime"),
"end_time": a.get("ledgerEndTime"),
"duration_minutes": int(a.get("ledgerCount", 0) / 60) if a.get("ledgerCount") else 0,
"total_amount": a.get("ledgerAmount", 0),
"pay_amount": a.get("ledgerAmount", 0)
})
inserted_count += 1
except Exception as e:
self.logger.error(f"Error parsing ticket: {e}", exc_info=True)
error_count += 1
# 3. Batch Insert/Upsert
if orders:
self._upsert_orders(orders)
if goods_list:
self._upsert_goods(goods_list)
if table_usages:
self._upsert_table_usages(table_usages)
if assistant_services:
self._upsert_assistant_services(assistant_services)
return inserted_count, error_count
def _upsert_orders(self, rows):
sql = """
INSERT INTO billiards.fact_order (
store_id, order_settle_id, order_trade_no, order_no, member_id,
pay_time, total_amount, pay_amount, discount_amount, coupon_amount,
status, cashier_name, remark, raw_data
) VALUES (
%(store_id)s, %(order_settle_id)s, %(order_trade_no)s, %(order_no)s, %(member_id)s,
%(pay_time)s, %(total_amount)s, %(pay_amount)s, %(discount_amount)s, %(coupon_amount)s,
%(status)s, %(cashier_name)s, %(remark)s, %(raw_data)s
)
ON CONFLICT (store_id, order_settle_id) DO UPDATE SET
pay_time = EXCLUDED.pay_time,
pay_amount = EXCLUDED.pay_amount,
updated_at = now()
"""
self.db.batch_execute(sql, rows)
def _upsert_goods(self, rows):
sql = """
INSERT INTO billiards.fact_order_goods (
store_id, order_goods_id, order_settle_id, order_trade_no,
goods_id, goods_name, quantity, unit_price, total_amount, pay_amount
) VALUES (
%(store_id)s, %(order_goods_id)s, %(order_settle_id)s, %(order_trade_no)s,
%(goods_id)s, %(goods_name)s, %(quantity)s, %(unit_price)s, %(total_amount)s, %(pay_amount)s
)
ON CONFLICT (store_id, order_goods_id) DO UPDATE SET
pay_amount = EXCLUDED.pay_amount
"""
self.db.batch_execute(sql, rows)
def _upsert_table_usages(self, rows):
sql = """
INSERT INTO billiards.fact_table_usage (
store_id, order_ledger_id, order_settle_id, table_id, table_name,
start_time, end_time, duration_minutes, total_amount, pay_amount
) VALUES (
%(store_id)s, %(order_ledger_id)s, %(order_settle_id)s, %(table_id)s, %(table_name)s,
%(start_time)s, %(end_time)s, %(duration_minutes)s, %(total_amount)s, %(pay_amount)s
)
ON CONFLICT (store_id, order_ledger_id) DO UPDATE SET
pay_amount = EXCLUDED.pay_amount
"""
self.db.batch_execute(sql, rows)
def _upsert_assistant_services(self, rows):
sql = """
INSERT INTO billiards.fact_assistant_service (
store_id, ledger_id, order_settle_id, assistant_id, assistant_name,
service_type, start_time, end_time, duration_minutes, total_amount, pay_amount
) VALUES (
%(store_id)s, %(ledger_id)s, %(order_settle_id)s, %(assistant_id)s, %(assistant_name)s,
%(service_type)s, %(start_time)s, %(end_time)s, %(duration_minutes)s, %(total_amount)s, %(pay_amount)s
)
ON CONFLICT (store_id, ledger_id) DO UPDATE SET
pay_amount = EXCLUDED.pay_amount
"""
self.db.batch_execute(sql, rows)
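# Minimal ticket shape accepted by process_tickets (field names match the parser
# above; values are illustrative). `db_ops` is assumed to be a DatabaseOperations
# instance wired elsewhere.
def _demo_process_ticket(db_ops):
    ticket = {
        "data": {"data": {
            "orderSettleId": 1001,
            "orderSettleNumber": "S20251130-0001",
            "payTime": "2025-11-30 01:02:03",
            "consumeMoney": 120, "actualPayment": 100,
            "memberOfferAmount": 20, "couponAmount": 0,
            "cashierName": "A01", "orderRemark": "",
            "orderItem": [],  # table/goods/assistant ledgers would go here
        }}
    }
    loader = TicketLoader(db_ops)
    ok, bad = loader.process_tickets([ticket], store_id=2790685415443269)
    return ok, bad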

View File

@@ -0,0 +1,118 @@
# -*- coding: utf-8 -*-
"""充值记录事实表"""
from ..base_loader import BaseLoader
class TopupLoader(BaseLoader):
"""写入 fact_topup"""
def upsert_topups(self, records: list) -> tuple:
if not records:
return (0, 0, 0)
sql = """
INSERT INTO billiards.fact_topup (
store_id,
topup_id,
member_id,
member_name,
member_phone,
card_id,
card_type_name,
pay_amount,
consume_money,
settle_status,
settle_type,
settle_name,
settle_relate_id,
pay_time,
create_time,
operator_id,
operator_name,
payment_method,
refund_amount,
cash_amount,
card_amount,
balance_amount,
online_amount,
rounding_amount,
adjust_amount,
goods_money,
table_charge_money,
service_money,
coupon_amount,
order_remark,
raw_data
)
VALUES (
%(store_id)s,
%(topup_id)s,
%(member_id)s,
%(member_name)s,
%(member_phone)s,
%(card_id)s,
%(card_type_name)s,
%(pay_amount)s,
%(consume_money)s,
%(settle_status)s,
%(settle_type)s,
%(settle_name)s,
%(settle_relate_id)s,
%(pay_time)s,
%(create_time)s,
%(operator_id)s,
%(operator_name)s,
%(payment_method)s,
%(refund_amount)s,
%(cash_amount)s,
%(card_amount)s,
%(balance_amount)s,
%(online_amount)s,
%(rounding_amount)s,
%(adjust_amount)s,
%(goods_money)s,
%(table_charge_money)s,
%(service_money)s,
%(coupon_amount)s,
%(order_remark)s,
%(raw_data)s
)
ON CONFLICT (store_id, topup_id) DO UPDATE SET
member_id = EXCLUDED.member_id,
member_name = EXCLUDED.member_name,
member_phone = EXCLUDED.member_phone,
card_id = EXCLUDED.card_id,
card_type_name = EXCLUDED.card_type_name,
pay_amount = EXCLUDED.pay_amount,
consume_money = EXCLUDED.consume_money,
settle_status = EXCLUDED.settle_status,
settle_type = EXCLUDED.settle_type,
settle_name = EXCLUDED.settle_name,
settle_relate_id = EXCLUDED.settle_relate_id,
pay_time = EXCLUDED.pay_time,
create_time = EXCLUDED.create_time,
operator_id = EXCLUDED.operator_id,
operator_name = EXCLUDED.operator_name,
payment_method = EXCLUDED.payment_method,
refund_amount = EXCLUDED.refund_amount,
cash_amount = EXCLUDED.cash_amount,
card_amount = EXCLUDED.card_amount,
balance_amount = EXCLUDED.balance_amount,
online_amount = EXCLUDED.online_amount,
rounding_amount = EXCLUDED.rounding_amount,
adjust_amount = EXCLUDED.adjust_amount,
goods_money = EXCLUDED.goods_money,
table_charge_money = EXCLUDED.table_charge_money,
service_money = EXCLUDED.service_money,
coupon_amount = EXCLUDED.coupon_amount,
order_remark = EXCLUDED.order_remark,
raw_data = EXCLUDED.raw_data,
updated_at = now()
RETURNING (xmax = 0) AS inserted
"""
inserted, updated = self.db.batch_upsert_with_returning(
sql, records, page_size=self._batch_size()
)
return (inserted, updated, 0)

View File

@@ -0,0 +1,6 @@
# -*- coding: utf-8 -*-
"""ODS loader helpers."""
from .generic import GenericODSLoader
__all__ = ["GenericODSLoader"]

View File

@@ -0,0 +1,67 @@
# -*- coding: utf-8 -*-
"""Generic ODS loader that keeps raw payload + primary keys."""
from __future__ import annotations
import json
from datetime import datetime, timezone
from typing import Iterable, Sequence
from ..base_loader import BaseLoader
class GenericODSLoader(BaseLoader):
"""Insert/update helper for ODS tables that share the same pattern."""
def __init__(
self,
db_ops,
table_name: str,
columns: Sequence[str],
conflict_columns: Sequence[str],
):
super().__init__(db_ops)
if not conflict_columns:
raise ValueError("conflict_columns must not be empty for ODS loader")
self.table_name = table_name
self.columns = list(columns)
self.conflict_columns = list(conflict_columns)
self._sql = self._build_sql()
def upsert_rows(self, rows: Iterable[dict]) -> tuple[int, int, int]:
"""Insert/update the provided iterable of dictionaries."""
rows = list(rows)
if not rows:
return (0, 0, 0)
normalized = [self._normalize_row(row) for row in rows]
inserted, updated = self.db.batch_upsert_with_returning(
self._sql, normalized, page_size=self._batch_size()
)
return inserted, updated, 0
def _build_sql(self) -> str:
col_list = ", ".join(self.columns)
placeholders = ", ".join(f"%({col})s" for col in self.columns)
conflict_clause = ", ".join(self.conflict_columns)
update_columns = [c for c in self.columns if c not in self.conflict_columns]
set_clause = ", ".join(f"{col} = EXCLUDED.{col}" for col in update_columns)
return (
f"INSERT INTO {self.table_name} ({col_list}) "
f"VALUES ({placeholders}) "
f"ON CONFLICT ({conflict_clause}) DO UPDATE SET {set_clause} "
f"RETURNING (xmax = 0) AS inserted"
)
def _normalize_row(self, row: dict) -> dict:
normalized = {}
for col in self.columns:
value = row.get(col)
if col == "payload" and value is not None and not isinstance(value, str):
normalized[col] = json.dumps(value, ensure_ascii=False)
else:
normalized[col] = value
if "fetched_at" in normalized and normalized["fetched_at"] is None:
normalized["fetched_at"] = datetime.now(timezone.utc)
return normalized
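# Illustrative wiring (the table name and columns are assumptions; any ODS table
# following the payload + primary-key pattern is wired the same way):
def _demo_ods_loader(db_ops):
    loader = GenericODSLoader(
        db_ops,
        table_name="billiards_ods.ods_payment",
        columns=["store_id", "payment_id", "payload", "fetched_at"],
        conflict_columns=["store_id", "payment_id"],
    )
    inserted, updated, _ = loader.upsert_rows([
        {"store_id": 1, "payment_id": 42, "payload": {"amount": 10}, "fetched_at": None},
    ])  # payload dict -> JSON string, fetched_at None -> now(UTC)
    return inserted, updated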

View File

@@ -0,0 +1,50 @@
# -*- coding: utf-8 -*-
"""数据类型解析器"""
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP
from dateutil import parser as dtparser
from zoneinfo import ZoneInfo
class TypeParser:
"""类型解析工具"""
@staticmethod
def parse_timestamp(s: str, tz: ZoneInfo) -> datetime | None:
"""解析时间戳"""
if not s:
return None
try:
dt = dtparser.parse(s)
if dt.tzinfo is None:
return dt.replace(tzinfo=tz)
return dt.astimezone(tz)
except Exception:
return None
@staticmethod
def parse_decimal(value, scale: int = 2) -> Decimal | None:
"""解析金额"""
if value is None:
return None
try:
d = Decimal(str(value))
return d.quantize(Decimal(10) ** -scale, rounding=ROUND_HALF_UP)
except Exception:
return None
@staticmethod
def parse_int(value) -> int | None:
"""解析整数"""
if value is None:
return None
try:
return int(value)
except Exception:
return None
@staticmethod
def format_timestamp(dt: datetime | None, tz: ZoneInfo) -> str | None:
"""格式化时间戳"""
if not dt:
return None
return dt.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")
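# Illustrative usage (a sketch):
if __name__ == "__main__":
    tz = ZoneInfo("Asia/Taipei")
    ts = TypeParser.parse_timestamp("2025-11-30 07:19:05", tz)  # naive -> tagged with tz
    print(TypeParser.format_timestamp(ts, tz))  # 2025-11-30 07:19:05
    print(TypeParser.parse_decimal("12.345"))   # Decimal('12.35'), ROUND_HALF_UP
    print(TypeParser.parse_int("7"))            # 7
    print(TypeParser.parse_int("abc"))          # None (parse failure -> None)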

View File

@@ -0,0 +1,25 @@
# -*- coding: utf-8 -*-
"""数据验证器"""
from decimal import Decimal
class DataValidator:
"""数据验证工具"""
@staticmethod
def validate_positive_amount(value: Decimal | None, field_name: str = "amount"):
"""验证金额为正数"""
if value is not None and value < 0:
raise ValueError(f"{field_name} 不能为负数: {value}")
@staticmethod
def validate_required(value, field_name: str):
"""验证必填字段"""
if value is None or value == "":
raise ValueError(f"{field_name} 是必填字段")
@staticmethod
def validate_range(value, min_val, max_val, field_name: str):
"""验证值范围"""
if value is not None:
if value < min_val or value > max_val:
raise ValueError(f"{field_name} 必须在 {min_val}{max_val} 之间")

View File

@@ -0,0 +1,62 @@
# -*- coding: utf-8 -*-
"""游标管理器"""
from datetime import datetime
class CursorManager:
"""ETL游标管理"""
def __init__(self, db_connection):
self.db = db_connection
def get_or_create(self, task_id: int, store_id: int) -> dict:
"""获取或创建游标"""
rows = self.db.query(
"SELECT * FROM etl_admin.etl_cursor WHERE task_id=%s AND store_id=%s",
(task_id, store_id)
)
if rows:
return rows[0]
# Create a new cursor row
self.db.execute(
"""
INSERT INTO etl_admin.etl_cursor(task_id, store_id, last_start, last_end, last_id, extra)
VALUES(%s, %s, NULL, NULL, NULL, '{}'::jsonb)
""",
(task_id, store_id)
)
self.db.commit()
rows = self.db.query(
"SELECT * FROM etl_admin.etl_cursor WHERE task_id=%s AND store_id=%s",
(task_id, store_id)
)
return rows[0] if rows else None
def advance(self, task_id: int, store_id: int, window_start: datetime,
window_end: datetime, run_id: int, last_id: int = None):
"""推进游标"""
if last_id is not None:
sql = """
UPDATE etl_admin.etl_cursor
SET last_start = %s,
last_end = %s,
last_id = GREATEST(COALESCE(last_id, 0), %s),
last_run_id = %s,
updated_at = now()
WHERE task_id = %s AND store_id = %s
"""
self.db.execute(sql, (window_start, window_end, last_id, run_id, task_id, store_id))
else:
sql = """
UPDATE etl_admin.etl_cursor
SET last_start = %s,
last_end = %s,
last_run_id = %s,
updated_at = now()
WHERE task_id = %s AND store_id = %s
"""
self.db.execute(sql, (window_start, window_end, run_id, task_id, store_id))
self.db.commit()
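# Illustrative usage (a sketch; db_conn is a DatabaseConnection, ids are made up):
def _demo_cursor(db_conn):
    from datetime import timedelta
    mgr = CursorManager(db_conn)
    state = mgr.get_or_create(task_id=1, store_id=2790685415443269)
    window_end = datetime.now()
    window_start = window_end - timedelta(minutes=30)
    # After a successful run, move the window forward; last_id only ever grows
    # because of GREATEST(COALESCE(last_id, 0), %s) in the UPDATE above.
    mgr.advance(1, 2790685415443269, window_start, window_end, run_id=99, last_id=12345)
    return state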

View File

@@ -0,0 +1,70 @@
# -*- coding: utf-8 -*-
"""运行记录追踪器"""
import json
from datetime import datetime
class RunTracker:
"""ETL运行记录管理"""
def __init__(self, db_connection):
self.db = db_connection
def create_run(self, task_id: int, store_id: int, run_uuid: str,
export_dir: str, log_path: str, status: str,
window_start: datetime = None, window_end: datetime = None,
window_minutes: int = None, overlap_seconds: int = None,
request_params: dict = None) -> int:
"""创建运行记录"""
sql = """
INSERT INTO etl_admin.etl_run(
run_uuid, task_id, store_id, status, started_at, window_start, window_end,
window_minutes, overlap_seconds, fetched_count, loaded_count, updated_count,
skipped_count, error_count, unknown_fields, export_dir, log_path,
request_params, manifest, error_message, extra
) VALUES (
%s, %s, %s, %s, now(), %s, %s, %s, %s, 0, 0, 0, 0, 0, 0, %s, %s, %s,
'{}'::jsonb, NULL, '{}'::jsonb
)
RETURNING run_id
"""
result = self.db.query(
sql,
(run_uuid, task_id, store_id, status, window_start, window_end,
window_minutes, overlap_seconds, export_dir, log_path,
json.dumps(request_params or {}, ensure_ascii=False))
)
run_id = result[0]["run_id"]
self.db.commit()
return run_id
def update_run(self, run_id: int, counts: dict, status: str,
ended_at: datetime = None, manifest: dict = None,
error_message: str = None):
"""更新运行记录"""
sql = """
UPDATE etl_admin.etl_run
SET fetched_count = %s,
loaded_count = %s,
updated_count = %s,
skipped_count = %s,
error_count = %s,
unknown_fields = %s,
status = %s,
ended_at = %s,
manifest = %s,
error_message = %s
WHERE run_id = %s
"""
self.db.execute(
sql,
(counts.get("fetched", 0), counts.get("inserted", 0),
counts.get("updated", 0), counts.get("skipped", 0),
counts.get("errors", 0), counts.get("unknown_fields", 0),
status, ended_at,
json.dumps(manifest or {}, ensure_ascii=False),
error_message, run_id)
)
self.db.commit()
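
A sketch of the run lifecycle as the scheduler below drives it: a run is created as PARTIAL (the `run_status_enum` has no RUNNING value) and finalized on completion. `db` is assumed to be an open `DatabaseConnection`:

```python
import uuid
from datetime import datetime

from orchestration.run_tracker import RunTracker

tracker = RunTracker(db)  # db: an open DatabaseConnection
run_id = tracker.create_run(
    task_id=7, store_id=2790685415443269, run_uuid=uuid.uuid4().hex,
    export_dir="/tmp/export", log_path="/tmp/etl.log", status="PARTIAL",
)
counts = {"fetched": 120, "inserted": 118, "updated": 0, "skipped": 2, "errors": 0}
tracker.update_run(run_id, counts=counts, status="SUCC", ended_at=datetime.now())
```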

View File

@@ -0,0 +1,234 @@
# -*- coding: utf-8 -*-
"""ETL 调度:支持在线抓取、离线清洗入库、全流程三种模式。"""
from __future__ import annotations
import uuid
from datetime import datetime
from pathlib import Path
from zoneinfo import ZoneInfo
from api.client import APIClient
from api.local_json_client import LocalJsonClient
from api.recording_client import RecordingAPIClient
from database.connection import DatabaseConnection
from database.operations import DatabaseOperations
from orchestration.cursor_manager import CursorManager
from orchestration.run_tracker import RunTracker
from orchestration.task_registry import default_registry
class ETLScheduler:
"""调度多个任务,按 pipeline.flow 执行抓取/清洗入库。"""
def __init__(self, config, logger):
self.config = config
self.logger = logger
self.tz = ZoneInfo(config.get("app.timezone", "Asia/Taipei"))
self.pipeline_flow = str(config.get("pipeline.flow", "FULL") or "FULL").upper()
self.fetch_root = Path(config.get("pipeline.fetch_root") or config["io"]["export_root"])
self.ingest_source_dir = config.get("pipeline.ingest_source_dir") or ""
self.write_pretty_json = bool(config.get("io.write_pretty_json", False))
        # Components
self.db_conn = DatabaseConnection(
dsn=config["db"]["dsn"],
session=config["db"].get("session"),
connect_timeout=config["db"].get("connect_timeout_sec"),
)
self.db_ops = DatabaseOperations(self.db_conn)
self.api_client = APIClient(
base_url=config["api"]["base_url"],
token=config["api"]["token"],
timeout=config["api"]["timeout_sec"],
retry_max=config["api"]["retries"]["max_attempts"],
headers_extra=config["api"].get("headers_extra"),
)
self.cursor_mgr = CursorManager(self.db_conn)
self.run_tracker = RunTracker(self.db_conn)
self.task_registry = default_registry
# ------------------------------------------------------------------ public
    def run_tasks(self, task_codes: list | None = None):
        """Run the configured tasks, or the explicitly passed list."""
        run_uuid = uuid.uuid4().hex
        store_id = self.config.get("app.store_id")
        if not task_codes:
            task_codes = self.config.get("run.tasks", [])
        self.logger.info("Starting tasks: %s, run_uuid=%s", task_codes, run_uuid)
        for task_code in task_codes:
            try:
                self._run_single_task(task_code, run_uuid, store_id)
            except Exception as exc:  # noqa: BLE001
                self.logger.error("Task %s failed: %s", task_code, exc, exc_info=True)
                continue
        self.logger.info("All tasks finished")
# ------------------------------------------------------------------ internals
def _run_single_task(self, task_code: str, run_uuid: str, store_id: int):
"""单个任务的抓取/清洗编排。"""
task_cfg = self._load_task_config(task_code, store_id)
if not task_cfg:
self.logger.warning("任务 %s 未启用或不存在", task_code)
return
task_id = task_cfg["task_id"]
cursor_data = self.cursor_mgr.get_or_create(task_id, store_id)
        # Run record
export_dir = Path(self.config["io"]["export_root"]) / datetime.now(self.tz).strftime("%Y%m%d")
log_path = str(Path(self.config["io"]["log_root"]) / f"{run_uuid}.log")
run_id = self.run_tracker.create_run(
task_id=task_id,
store_id=store_id,
run_uuid=run_uuid,
export_dir=str(export_dir),
log_path=log_path,
status=self._map_run_status("RUNNING"),
)
        # Prepare the directory for the fetch stage
fetch_dir = self._build_fetch_dir(task_code, run_id)
fetch_stats = None
try:
if self._flow_includes_fetch():
fetch_stats = self._execute_fetch(task_code, cursor_data, fetch_dir, run_id)
if self.pipeline_flow == "FETCH_ONLY":
counts = self._counts_from_fetch(fetch_stats)
self.run_tracker.update_run(
run_id=run_id,
counts=counts,
status=self._map_run_status("SUCCESS"),
ended_at=datetime.now(self.tz),
)
return
if self._flow_includes_ingest():
source_dir = self._resolve_ingest_source(fetch_dir, fetch_stats)
result = self._execute_ingest(task_code, cursor_data, source_dir)
self.run_tracker.update_run(
run_id=run_id,
counts=result["counts"],
status=self._map_run_status(result["status"]),
ended_at=datetime.now(self.tz),
)
if (result.get("status") or "").upper() == "SUCCESS":
window = result.get("window")
if window:
self.cursor_mgr.advance(
task_id=task_id,
store_id=store_id,
window_start=window.get("start"),
window_end=window.get("end"),
run_id=run_id,
)
except Exception as exc: # noqa: BLE001
self.run_tracker.update_run(
run_id=run_id,
counts={},
status=self._map_run_status("FAIL"),
ended_at=datetime.now(self.tz),
error_message=str(exc),
)
raise
    def _execute_fetch(self, task_code: str, cursor_data: dict | None, fetch_dir: Path, run_id: int):
        """Online fetch stage: pull via RecordingAPIClient and persist to disk; no transform/load."""
        recording_client = RecordingAPIClient(
            base_client=self.api_client,
            output_dir=fetch_dir,
            task_code=task_code,
            run_id=run_id,
            write_pretty=self.write_pretty_json,
        )
        task = self.task_registry.create_task(task_code, self.config, self.db_ops, recording_client, self.logger)
        context = task._build_context(cursor_data)  # type: ignore[attr-defined]
        self.logger.info("%s: fetch stage started, dir=%s", task_code, fetch_dir)
        extracted = task.extract(context)
        # Fetch complete; transform/load are intentionally skipped in this stage
        stats = recording_client.last_dump or {}
        fetched_count = 0
        if isinstance(extracted, dict):
            fetched_count = stats.get("records") or len(extracted.get("records", []))
        self.logger.info(
            "%s: fetch finished, file=%s, records=%s",
            task_code,
            stats.get("file"),
            fetched_count,
        )
        return {"file": stats.get("file"), "records": fetched_count, "pages": stats.get("pages")}
    def _execute_ingest(self, task_code: str, cursor_data: dict | None, source_dir: Path):
        """Local ingest: replay the JSON via LocalJsonClient through the task's normal ETL."""
        local_client = LocalJsonClient(source_dir)
        task = self.task_registry.create_task(task_code, self.config, self.db_ops, local_client, self.logger)
        self.logger.info("%s: local ingest started, source dir=%s", task_code, source_dir)
        return task.execute(cursor_data)
def _build_fetch_dir(self, task_code: str, run_id: int) -> Path:
ts = datetime.now(self.tz).strftime("%Y%m%d-%H%M%S")
return Path(self.fetch_root) / f"{task_code.upper()}-{run_id}-{ts}"
def _resolve_ingest_source(self, fetch_dir: Path, fetch_stats: dict | None) -> Path:
if fetch_stats and fetch_dir.exists():
return fetch_dir
if self.ingest_source_dir:
return Path(self.ingest_source_dir)
        raise FileNotFoundError("No JSON directory provided for local ingest")
def _counts_from_fetch(self, stats: dict | None) -> dict:
fetched = (stats or {}).get("records") or 0
return {
"fetched": fetched,
"inserted": 0,
"updated": 0,
"skipped": 0,
"errors": 0,
}
def _flow_includes_fetch(self) -> bool:
return self.pipeline_flow in {"FETCH_ONLY", "FULL"}
def _flow_includes_ingest(self) -> bool:
return self.pipeline_flow in {"INGEST_ONLY", "FULL"}
def _load_task_config(self, task_code: str, store_id: int) -> dict | None:
"""从数据库加载任务配置。"""
sql = """
SELECT task_id, task_code, store_id, enabled, cursor_field,
window_minutes_default, overlap_seconds, page_size, retry_max, params
FROM etl_admin.etl_task
WHERE store_id = %s AND task_code = %s AND enabled = TRUE
"""
rows = self.db_conn.query(sql, (store_id, task_code))
return rows[0] if rows else None
def close(self):
"""关闭连接。"""
self.db_conn.close()
    @staticmethod
    def _map_run_status(status: str) -> str:
        """
        Map a task-reported status onto etl_admin.run_status_enum
        (SUCC / FAIL / PARTIAL).
        """
        normalized = (status or "").upper()
        if normalized in {"SUCCESS", "SUCC"}:
            return "SUCC"
        if normalized in {"FAIL", "FAILED", "ERROR"}:
            return "FAIL"
        if normalized in {"RUNNING", "PARTIAL", "PENDING", "IN_PROGRESS"}:
            return "PARTIAL"
        # Unknown statuses are marked FAIL so they surface during triage
        return "FAIL"

View File

@@ -0,0 +1,68 @@
# -*- coding: utf-8 -*-
"""任务注册表"""
from tasks.orders_task import OrdersTask
from tasks.payments_task import PaymentsTask
from tasks.members_task import MembersTask
from tasks.products_task import ProductsTask
from tasks.tables_task import TablesTask
from tasks.assistants_task import AssistantsTask
from tasks.packages_task import PackagesDefTask
from tasks.refunds_task import RefundsTask
from tasks.coupon_usage_task import CouponUsageTask
from tasks.inventory_change_task import InventoryChangeTask
from tasks.topups_task import TopupsTask
from tasks.table_discount_task import TableDiscountTask
from tasks.assistant_abolish_task import AssistantAbolishTask
from tasks.ledger_task import LedgerTask
from tasks.ods_tasks import ODS_TASK_CLASSES
from tasks.ticket_dwd_task import TicketDwdTask
from tasks.manual_ingest_task import ManualIngestTask
from tasks.payments_dwd_task import PaymentsDwdTask
from tasks.members_dwd_task import MembersDwdTask
class TaskRegistry:
    """Task registration and factory."""
    def __init__(self):
        self._tasks = {}
    def register(self, task_code: str, task_class):
        """Register a task class."""
        self._tasks[task_code.upper()] = task_class
    def create_task(self, task_code: str, config, db_connection, api_client, logger):
        """Create a task instance."""
        task_code = task_code.upper()
        if task_code not in self._tasks:
            raise ValueError(f"Unknown task type: {task_code}")
        task_class = self._tasks[task_code]
        return task_class(config, db_connection, api_client, logger)
    def get_all_task_codes(self) -> list:
        """Return all registered task codes."""
        return list(self._tasks.keys())
# Default registry
default_registry = TaskRegistry()
default_registry.register("PRODUCTS", ProductsTask)
default_registry.register("TABLES", TablesTask)
default_registry.register("MEMBERS", MembersTask)
default_registry.register("ASSISTANTS", AssistantsTask)
default_registry.register("PACKAGES_DEF", PackagesDefTask)
default_registry.register("ORDERS", OrdersTask)
default_registry.register("PAYMENTS", PaymentsTask)
default_registry.register("REFUNDS", RefundsTask)
default_registry.register("COUPON_USAGE", CouponUsageTask)
default_registry.register("INVENTORY_CHANGE", InventoryChangeTask)
default_registry.register("TOPUPS", TopupsTask)
default_registry.register("TABLE_DISCOUNT", TableDiscountTask)
default_registry.register("ASSISTANT_ABOLISH", AssistantAbolishTask)
default_registry.register("LEDGER", LedgerTask)
default_registry.register("TICKET_DWD", TicketDwdTask)
default_registry.register("MANUAL_INGEST", ManualIngestTask)
default_registry.register("PAYMENTS_DWD", PaymentsDwdTask)
default_registry.register("MEMBERS_DWD", MembersDwdTask)
for code, task_cls in ODS_TASK_CLASSES.items():
default_registry.register(code, task_cls)
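
Extending the registry is a one-liner; `MyCustomTask` below is a hypothetical `BaseTask` subclass, and `config`/`db_ops`/`api_client`/`logger` are assumed to exist in the calling scope:

```python
from orchestration.task_registry import default_registry

default_registry.register("MY_CUSTOM", MyCustomTask)  # hypothetical task class
# Lookup is case-insensitive because codes are uppercased on both sides.
task = default_registry.create_task("my_custom", config, db_ops, api_client, logger)
print(default_registry.get_all_task_codes())
```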

View File

View File

@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
"""余额一致性检查器"""
from .base_checker import BaseDataQualityChecker
class BalanceChecker(BaseDataQualityChecker):
"""检查订单、支付、退款的金额一致性"""
    def check(self, store_id: int, start_date: str, end_date: str) -> dict:
        """
        Check balance consistency over the given time range.
        Invariant: order total = payment total - refund total.
        """
checks = []
        # Order total
sql_orders = """
SELECT COALESCE(SUM(final_amount), 0) AS total
FROM billiards.fact_order
WHERE store_id = %s
AND order_time >= %s
AND order_time < %s
AND order_status = 'COMPLETED'
"""
order_total = self.db.query(sql_orders, (store_id, start_date, end_date))[0]["total"]
        # Payment total
sql_payments = """
SELECT COALESCE(SUM(pay_amount), 0) AS total
FROM billiards.fact_payment
WHERE store_id = %s
AND pay_time >= %s
AND pay_time < %s
AND pay_status = 'SUCCESS'
"""
payment_total = self.db.query(sql_payments, (store_id, start_date, end_date))[0]["total"]
        # Refund total
sql_refunds = """
SELECT COALESCE(SUM(refund_amount), 0) AS total
FROM billiards.fact_refund
WHERE store_id = %s
AND refund_time >= %s
AND refund_time < %s
AND refund_status = 'SUCCESS'
"""
refund_total = self.db.query(sql_refunds, (store_id, start_date, end_date))[0]["total"]
        # Verify the balance
expected_total = payment_total - refund_total
diff = abs(float(order_total) - float(expected_total))
        threshold = 0.01  # one-cent tolerance
passed = diff < threshold
checks.append({
"name": "balance_consistency",
"passed": passed,
"message": f"订单总额: {order_total}, 支付-退款: {expected_total}, 差异: {diff}",
"details": {
"order_total": float(order_total),
"payment_total": float(payment_total),
"refund_total": float(refund_total),
"diff": diff
}
})
all_passed = all(c["passed"] for c in checks)
return {
"passed": all_passed,
"checks": checks
}
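
Running the checker standalone, as a hedged sketch (the `quality.balance_checker` module path is assumed; `db`/`logger` come from the caller):

```python
from quality.balance_checker import BalanceChecker  # assumed module path

checker = BalanceChecker(db, logger)
report = checker.check(store_id=2790685415443269,
                       start_date="2025-11-01", end_date="2025-12-01")
if not report["passed"]:
    for item in report["checks"]:
        logger.warning("DQ check %s failed: %s", item["name"], item["message"])
```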

View File

@@ -0,0 +1,19 @@
# -*- coding: utf-8 -*-
"""数据质量检查器基类"""
class BaseDataQualityChecker:
"""数据质量检查器基类"""
def __init__(self, db_connection, logger):
self.db = db_connection
self.logger = logger
    def check(self) -> dict:
        """
        Run the quality check (subclasses may add parameters such as a store_id or date range).
        Returns: {
            "passed": bool,
            "checks": [{"name": str, "passed": bool, "message": str}]
        }
        """
        raise NotImplementedError("Subclasses must implement check()")

View File

@@ -0,0 +1,5 @@
# Python dependencies
psycopg2-binary>=2.9.0
requests>=2.28.0
python-dateutil>=2.8.0
tzdata>=2023.0

View File

View File

@@ -0,0 +1,89 @@
# -*- coding: utf-8 -*-
"""SCD2 (Slowly Changing Dimension Type 2) 处理逻辑"""
from datetime import datetime
def _row_to_dict(cursor, row):
if row is None:
return None
columns = [desc[0] for desc in cursor.description]
return {col: row[idx] for idx, col in enumerate(columns)}
class SCD2Handler:
"""SCD2历史记录处理"""
def __init__(self, db_ops):
self.db = db_ops
def upsert(
self,
table_name: str,
natural_key: list,
tracked_fields: list,
record: dict,
effective_date: datetime = None,
) -> str:
"""
处理SCD2更新
Returns:
操作类型: 'INSERT', 'UPDATE', 'UNCHANGED'
"""
effective_date = effective_date or datetime.now()
where_clause = " AND ".join([f"{k} = %({k})s" for k in natural_key])
sql_select = f"""
SELECT * FROM {table_name}
WHERE {where_clause}
AND valid_to IS NULL
"""
with self.db.conn.cursor() as current:
current.execute(sql_select, record)
existing = _row_to_dict(current, current.fetchone())
if not existing:
record["valid_from"] = effective_date
record["valid_to"] = None
record["is_current"] = True
fields = list(record.keys())
placeholders = ", ".join([f"%({f})s" for f in fields])
sql_insert = f"""
INSERT INTO {table_name} ({', '.join(fields)})
VALUES ({placeholders})
"""
current.execute(sql_insert, record)
return "INSERT"
has_changes = any(existing.get(field) != record.get(field) for field in tracked_fields)
if not has_changes:
return "UNCHANGED"
update_where = " AND ".join([f"{k} = %({k})s" for k in natural_key])
sql_close = f"""
UPDATE {table_name}
SET valid_to = %(effective_date)s,
is_current = FALSE
WHERE {update_where}
AND valid_to IS NULL
"""
record["effective_date"] = effective_date
current.execute(sql_close, record)
record["valid_from"] = effective_date
record["valid_to"] = None
record["is_current"] = True
fields = list(record.keys())
if "effective_date" in fields:
fields.remove("effective_date")
placeholders = ", ".join([f"%({f})s" for f in fields])
sql_insert = f"""
INSERT INTO {table_name} ({', '.join(fields)})
VALUES ({placeholders})
"""
current.execute(sql_insert, record)
return "UPDATE"

View File

View File

@@ -0,0 +1,76 @@
# -*- coding: utf-8 -*-
"""Apply the PRD-aligned warehouse schema (ODS/DWD/DWS) to PostgreSQL."""
from __future__ import annotations
import argparse
import os
import sys
from pathlib import Path
PROJECT_ROOT = Path(__file__).resolve().parents[1]
if str(PROJECT_ROOT) not in sys.path:
sys.path.insert(0, str(PROJECT_ROOT))
from database.connection import DatabaseConnection # noqa: E402
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
description="Create/upgrade warehouse schemas using schema_v2.sql"
)
parser.add_argument(
"--dsn",
help="PostgreSQL DSN (fallback to PG_DSN env)",
default=os.environ.get("PG_DSN"),
)
parser.add_argument(
"--file",
help="Path to schema SQL",
default=str(PROJECT_ROOT / "database" / "schema_v2.sql"),
)
parser.add_argument(
"--timeout",
type=int,
default=int(os.environ.get("PG_CONNECT_TIMEOUT", 10) or 10),
help="connect_timeout seconds (capped at 20, default 10)",
)
return parser.parse_args()
def apply_schema(dsn: str, sql_path: Path, timeout: int) -> None:
if not sql_path.exists():
raise FileNotFoundError(f"Schema file not found: {sql_path}")
sql_text = sql_path.read_text(encoding="utf-8")
timeout_val = max(1, min(timeout, 20))
conn = DatabaseConnection(dsn, connect_timeout=timeout_val)
try:
with conn.conn.cursor() as cur:
cur.execute(sql_text)
conn.commit()
except Exception:
conn.rollback()
raise
finally:
conn.close()
def main() -> int:
args = parse_args()
if not args.dsn:
print("Missing DSN. Set PG_DSN or pass --dsn.", file=sys.stderr)
return 2
try:
apply_schema(args.dsn, Path(args.file), args.timeout)
except Exception as exc: # pragma: no cover - utility script
print(f"Schema apply failed: {exc}", file=sys.stderr)
return 1
print("Schema applied successfully.")
return 0
if __name__ == "__main__":
raise SystemExit(main())
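
The applier can also be called programmatically, e.g. from a deployment hook; the `scripts.apply_schema` module path is an assumption (the diff does not show the file name):

```python
import os

from scripts.apply_schema import PROJECT_ROOT, apply_schema  # assumed module path

apply_schema(
    dsn=os.environ["PG_DSN"],
    sql_path=PROJECT_ROOT / "database" / "schema_v2.sql",
    timeout=10,
)
```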

View File

@@ -0,0 +1,425 @@
# -*- coding: utf-8 -*-
"""Populate PRD DWD tables from ODS payload snapshots."""
from __future__ import annotations
import argparse
import os
import sys
import psycopg2
SQL_STEPS: list[tuple[str, str]] = [
(
"dim_tenant",
"""
INSERT INTO billiards_dwd.dim_tenant (tenant_id, tenant_name, status)
SELECT DISTINCT tenant_id, 'default' AS tenant_name, 'active' AS status
FROM (
SELECT tenant_id FROM billiards_ods.ods_order_settle
UNION SELECT tenant_id FROM billiards_ods.ods_order_receipt_detail
UNION SELECT tenant_id FROM billiards_ods.ods_member_profile
) s
WHERE tenant_id IS NOT NULL
ON CONFLICT (tenant_id) DO UPDATE SET updated_at = now();
""",
),
(
"dim_site",
"""
INSERT INTO billiards_dwd.dim_site (site_id, tenant_id, site_name, status)
SELECT DISTINCT site_id, MAX(tenant_id) AS tenant_id, 'default' AS site_name, 'active' AS status
FROM (
SELECT site_id, tenant_id FROM billiards_ods.ods_order_settle
UNION SELECT site_id, tenant_id FROM billiards_ods.ods_order_receipt_detail
UNION SELECT site_id, tenant_id FROM billiards_ods.ods_table_info
) s
WHERE site_id IS NOT NULL
GROUP BY site_id
ON CONFLICT (site_id) DO UPDATE SET updated_at = now();
""",
),
(
"dim_product_category",
"""
INSERT INTO billiards_dwd.dim_product_category (category_id, category_name, parent_id, level_no, status)
SELECT DISTINCT category_id, category_name, parent_id, level_no, status
FROM billiards_ods.ods_goods_category
WHERE category_id IS NOT NULL
ON CONFLICT (category_id) DO UPDATE SET
category_name = EXCLUDED.category_name,
parent_id = EXCLUDED.parent_id,
level_no = EXCLUDED.level_no,
status = EXCLUDED.status;
""",
),
(
"dim_product",
"""
INSERT INTO billiards_dwd.dim_product (goods_id, goods_name, goods_code, category_id, category_name, unit, default_price, status)
SELECT DISTINCT goods_id, goods_name, NULL::TEXT AS goods_code, category_id, category_name, NULL::TEXT AS unit, sale_price AS default_price, status
FROM billiards_ods.ods_store_product
WHERE goods_id IS NOT NULL
ON CONFLICT (goods_id) DO UPDATE SET
goods_name = EXCLUDED.goods_name,
category_id = EXCLUDED.category_id,
category_name = EXCLUDED.category_name,
default_price = EXCLUDED.default_price,
status = EXCLUDED.status,
updated_at = now();
""",
),
(
"dim_product_from_sales",
"""
INSERT INTO billiards_dwd.dim_product (goods_id, goods_name)
SELECT DISTINCT goods_id, goods_name
FROM billiards_ods.ods_store_sale_item
WHERE goods_id IS NOT NULL
ON CONFLICT (goods_id) DO NOTHING;
""",
),
(
"dim_member_card_type",
"""
INSERT INTO billiards_dwd.dim_member_card_type (card_type_id, card_type_name, discount_rate)
SELECT DISTINCT card_type_id, card_type_name, discount_rate
FROM billiards_ods.ods_member_card
WHERE card_type_id IS NOT NULL
ON CONFLICT (card_type_id) DO UPDATE SET
card_type_name = EXCLUDED.card_type_name,
discount_rate = EXCLUDED.discount_rate;
""",
),
(
"dim_member",
"""
INSERT INTO billiards_dwd.dim_member (
site_id, member_id, tenant_id, member_name, nickname, gender, birthday, mobile,
member_type_id, member_type_name, status, register_time, last_visit_time,
balance, total_recharge_amount, total_consumed_amount, wechat_id, alipay_id, remark
)
SELECT DISTINCT
prof.site_id,
prof.member_id,
prof.tenant_id,
prof.member_name,
prof.nickname,
prof.gender,
prof.birthday,
prof.mobile,
card.member_type_id,
card.member_type_name,
prof.status,
prof.register_time,
prof.last_visit_time,
prof.balance,
NULL::NUMERIC AS total_recharge_amount,
NULL::NUMERIC AS total_consumed_amount,
prof.wechat_id,
prof.alipay_id,
prof.remarks
FROM billiards_ods.ods_member_profile prof
LEFT JOIN (
SELECT DISTINCT site_id, member_id, card_type_id AS member_type_id, card_type_name AS member_type_name
FROM billiards_ods.ods_member_card
) card
ON prof.site_id = card.site_id AND prof.member_id = card.member_id
WHERE prof.member_id IS NOT NULL
ON CONFLICT (site_id, member_id) DO UPDATE SET
member_name = EXCLUDED.member_name,
nickname = EXCLUDED.nickname,
gender = EXCLUDED.gender,
birthday = EXCLUDED.birthday,
mobile = EXCLUDED.mobile,
member_type_id = EXCLUDED.member_type_id,
member_type_name = EXCLUDED.member_type_name,
status = EXCLUDED.status,
register_time = EXCLUDED.register_time,
last_visit_time = EXCLUDED.last_visit_time,
balance = EXCLUDED.balance,
wechat_id = EXCLUDED.wechat_id,
alipay_id = EXCLUDED.alipay_id,
remark = EXCLUDED.remark,
updated_at = now();
""",
),
(
"dim_table",
"""
INSERT INTO billiards_dwd.dim_table (table_id, site_id, table_code, table_name, table_type, area_name, status, created_time, updated_time)
SELECT DISTINCT table_id, site_id, table_code, table_name, table_type, area_name, status, created_time, updated_time
FROM billiards_ods.ods_table_info
WHERE table_id IS NOT NULL
ON CONFLICT (table_id) DO UPDATE SET
site_id = EXCLUDED.site_id,
table_code = EXCLUDED.table_code,
table_name = EXCLUDED.table_name,
table_type = EXCLUDED.table_type,
area_name = EXCLUDED.area_name,
status = EXCLUDED.status,
created_time = EXCLUDED.created_time,
updated_time = EXCLUDED.updated_time;
""",
),
(
"dim_assistant",
"""
INSERT INTO billiards_dwd.dim_assistant (assistant_id, assistant_name, mobile, status)
SELECT DISTINCT assistant_id, assistant_name, mobile, status
FROM billiards_ods.ods_assistant_account
WHERE assistant_id IS NOT NULL
ON CONFLICT (assistant_id) DO UPDATE SET
assistant_name = EXCLUDED.assistant_name,
mobile = EXCLUDED.mobile,
status = EXCLUDED.status,
updated_at = now();
""",
),
(
"dim_pay_method",
"""
INSERT INTO billiards_dwd.dim_pay_method (pay_method_code, pay_method_name, is_stored_value, status)
SELECT DISTINCT pay_method_code, pay_method_name, FALSE AS is_stored_value, 'active' AS status
FROM billiards_ods.ods_payment_record
WHERE pay_method_code IS NOT NULL
ON CONFLICT (pay_method_code) DO UPDATE SET
pay_method_name = EXCLUDED.pay_method_name,
status = EXCLUDED.status,
updated_at = now();
""",
),
(
"dim_coupon_platform",
"""
INSERT INTO billiards_dwd.dim_coupon_platform (platform_code, platform_name)
SELECT DISTINCT platform_code, platform_code AS platform_name
FROM billiards_ods.ods_platform_coupon_log
WHERE platform_code IS NOT NULL
ON CONFLICT (platform_code) DO NOTHING;
""",
),
(
"fact_sale_item",
"""
INSERT INTO billiards_dwd.fact_sale_item (
site_id, sale_item_id, order_trade_no, order_settle_id, member_id,
goods_id, category_id, quantity, original_amount, discount_amount,
final_amount, is_gift, sale_time
)
SELECT
site_id,
sale_item_id,
order_trade_no,
order_settle_id,
NULL::BIGINT AS member_id,
goods_id,
category_id,
quantity,
original_amount,
discount_amount,
final_amount,
COALESCE(is_gift, FALSE),
sale_time
FROM billiards_ods.ods_store_sale_item
ON CONFLICT (site_id, sale_item_id) DO NOTHING;
""",
),
(
"fact_table_usage",
"""
INSERT INTO billiards_dwd.fact_table_usage (
site_id, ledger_id, order_trade_no, order_settle_id, table_id,
member_id, start_time, end_time, duration_minutes,
original_table_fee, member_discount_amount, manual_discount_amount,
final_table_fee, is_canceled, cancel_time
)
SELECT
site_id,
ledger_id,
order_trade_no,
order_settle_id,
table_id,
member_id,
start_time,
end_time,
duration_minutes,
original_table_fee,
0::NUMERIC AS member_discount_amount,
discount_amount AS manual_discount_amount,
final_table_fee,
FALSE AS is_canceled,
NULL::TIMESTAMPTZ AS cancel_time
FROM billiards_ods.ods_table_use_log
ON CONFLICT (site_id, ledger_id) DO NOTHING;
""",
),
(
"fact_assistant_service",
"""
INSERT INTO billiards_dwd.fact_assistant_service (
site_id, ledger_id, order_trade_no, order_settle_id, assistant_id,
assist_type_code, member_id, start_time, end_time, duration_minutes,
original_fee, member_discount_amount, manual_discount_amount,
final_fee, is_canceled, cancel_time
)
SELECT
site_id,
ledger_id,
order_trade_no,
order_settle_id,
assistant_id,
NULL::TEXT AS assist_type_code,
member_id,
start_time,
end_time,
duration_minutes,
original_fee,
0::NUMERIC AS member_discount_amount,
discount_amount AS manual_discount_amount,
final_fee,
FALSE AS is_canceled,
NULL::TIMESTAMPTZ AS cancel_time
FROM billiards_ods.ods_assistant_service_log
ON CONFLICT (site_id, ledger_id) DO NOTHING;
""",
),
(
"fact_coupon_usage",
"""
INSERT INTO billiards_dwd.fact_coupon_usage (
site_id, coupon_id, package_id, order_trade_no, order_settle_id,
member_id, platform_code, status, deduct_amount, settle_price, used_time
)
SELECT
site_id,
coupon_id,
NULL::BIGINT AS package_id,
order_trade_no,
order_settle_id,
member_id,
platform_code,
status,
deduct_amount,
settle_price,
used_time
FROM billiards_ods.ods_platform_coupon_log
ON CONFLICT (site_id, coupon_id) DO NOTHING;
""",
),
(
"fact_payment",
"""
INSERT INTO billiards_dwd.fact_payment (
site_id, pay_id, order_trade_no, order_settle_id, member_id,
pay_method_code, pay_amount, pay_time, relate_type, relate_id
)
SELECT
site_id,
pay_id,
order_trade_no,
order_settle_id,
member_id,
pay_method_code,
pay_amount,
pay_time,
relate_type,
relate_id
FROM billiards_ods.ods_payment_record
ON CONFLICT (site_id, pay_id) DO NOTHING;
""",
),
(
"fact_refund",
"""
INSERT INTO billiards_dwd.fact_refund (
site_id, refund_id, order_trade_no, order_settle_id, member_id,
pay_method_code, refund_amount, refund_time, status
)
SELECT
site_id,
refund_id,
order_trade_no,
order_settle_id,
member_id,
pay_method_code,
refund_amount,
refund_time,
status
FROM billiards_ods.ods_refund_record
ON CONFLICT (site_id, refund_id) DO NOTHING;
""",
),
(
"fact_balance_change",
"""
INSERT INTO billiards_dwd.fact_balance_change (
site_id, change_id, member_id, change_type, relate_type, relate_id,
pay_method_code, change_amount, balance_before, balance_after, change_time
)
SELECT
site_id,
change_id,
member_id,
change_type,
NULL::TEXT AS relate_type,
relate_id,
NULL::TEXT AS pay_method_code,
change_amount,
balance_before,
balance_after,
change_time
FROM billiards_ods.ods_balance_change
ON CONFLICT (site_id, change_id) DO NOTHING;
""",
),
]
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description="Build DWD tables from ODS payloads (PRD schema).")
parser.add_argument(
"--dsn",
default=os.environ.get("PG_DSN"),
help="PostgreSQL DSN (fallback PG_DSN env)",
)
parser.add_argument(
"--timeout",
type=int,
default=int(os.environ.get("PG_CONNECT_TIMEOUT", 10) or 10),
help="connect_timeout seconds (capped at 20, default 10)",
)
return parser.parse_args()
def main() -> int:
args = parse_args()
if not args.dsn:
print("Missing DSN. Use --dsn or PG_DSN.", file=sys.stderr)
return 2
timeout_val = max(1, min(args.timeout, 20))
conn = psycopg2.connect(args.dsn, connect_timeout=timeout_val)
conn.autocommit = False
try:
with conn.cursor() as cur:
for name, sql in SQL_STEPS:
cur.execute(sql)
print(f"[OK] {name}")
conn.commit()
except Exception as exc: # pragma: no cover - operational script
conn.rollback()
print(f"[FAIL] {exc}", file=sys.stderr)
return 1
finally:
try:
conn.close()
except Exception:
pass
print("DWD build complete.")
return 0
if __name__ == "__main__":
raise SystemExit(main())
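
Because every step upserts with ON CONFLICT, the list is safe to re-run end to end; replaying just a subset of steps works too, as in this hedged sketch (module path assumed):

```python
import os

import psycopg2

from scripts.build_dwd import SQL_STEPS  # assumed module path

wanted = {"dim_product", "dim_product_from_sales"}
conn = psycopg2.connect(os.environ["PG_DSN"], connect_timeout=10)
try:
    with conn, conn.cursor() as cur:  # `with conn` commits on success
        for name, sql_text in SQL_STEPS:
            if name in wanted:
                cur.execute(sql_text)
                print(f"[OK] {name}")
finally:
    conn.close()
```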

View File

@@ -0,0 +1,322 @@
# -*- coding: utf-8 -*-
"""Recompute billiards_dws.dws_order_summary from DWD fact tables."""
from __future__ import annotations
import argparse
import os
import sys
from pathlib import Path
PROJECT_ROOT = Path(__file__).resolve().parents[1]
if str(PROJECT_ROOT) not in sys.path:
sys.path.insert(0, str(PROJECT_ROOT))
from database.connection import DatabaseConnection # noqa: E402
SQL_BUILD_SUMMARY = r"""
WITH table_fee AS (
SELECT
site_id,
order_settle_id,
order_trade_no,
MIN(member_id) AS member_id,
SUM(COALESCE(final_table_fee, 0)) AS table_fee_amount,
SUM(COALESCE(member_discount_amount, 0)) AS member_discount_amount,
SUM(COALESCE(manual_discount_amount, 0)) AS manual_discount_amount,
SUM(COALESCE(original_table_fee, 0)) AS original_table_fee,
MIN(start_time) AS first_time
FROM billiards_dwd.fact_table_usage
WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
AND (%(start_date)s IS NULL OR start_time::date >= %(start_date)s)
AND (%(end_date)s IS NULL OR start_time::date <= %(end_date)s)
AND COALESCE(is_canceled, FALSE) = FALSE
GROUP BY site_id, order_settle_id, order_trade_no
),
assistant_fee AS (
SELECT
site_id,
order_settle_id,
order_trade_no,
MIN(member_id) AS member_id,
SUM(COALESCE(final_fee, 0)) AS assistant_service_amount,
SUM(COALESCE(member_discount_amount, 0)) AS member_discount_amount,
SUM(COALESCE(manual_discount_amount, 0)) AS manual_discount_amount,
SUM(COALESCE(original_fee, 0)) AS original_fee,
MIN(start_time) AS first_time
FROM billiards_dwd.fact_assistant_service
WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
AND (%(start_date)s IS NULL OR start_time::date >= %(start_date)s)
AND (%(end_date)s IS NULL OR start_time::date <= %(end_date)s)
AND COALESCE(is_canceled, FALSE) = FALSE
GROUP BY site_id, order_settle_id, order_trade_no
),
goods_fee AS (
SELECT
site_id,
order_settle_id,
order_trade_no,
MIN(member_id) AS member_id,
SUM(COALESCE(final_amount, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS goods_amount,
SUM(COALESCE(discount_amount, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS goods_discount_amount,
SUM(COALESCE(original_amount, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS goods_original_amount,
COUNT(*) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS item_count,
SUM(COALESCE(quantity, 0)) FILTER (WHERE COALESCE(is_gift, FALSE) = FALSE) AS total_item_quantity,
MIN(sale_time) AS first_time
FROM billiards_dwd.fact_sale_item
WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
AND (%(start_date)s IS NULL OR sale_time::date >= %(start_date)s)
AND (%(end_date)s IS NULL OR sale_time::date <= %(end_date)s)
GROUP BY site_id, order_settle_id, order_trade_no
),
coupon_usage AS (
SELECT
site_id,
order_settle_id,
order_trade_no,
MIN(member_id) AS member_id,
SUM(COALESCE(deduct_amount, 0)) AS coupon_deduction,
SUM(COALESCE(settle_price, 0)) AS settle_price,
MIN(used_time) AS first_time
FROM billiards_dwd.fact_coupon_usage
WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
AND (%(start_date)s IS NULL OR used_time::date >= %(start_date)s)
AND (%(end_date)s IS NULL OR used_time::date <= %(end_date)s)
GROUP BY site_id, order_settle_id, order_trade_no
),
payments AS (
SELECT
fp.site_id,
fp.order_settle_id,
fp.order_trade_no,
MIN(fp.member_id) AS member_id,
SUM(COALESCE(fp.pay_amount, 0)) AS total_paid_amount,
SUM(COALESCE(fp.pay_amount, 0)) FILTER (WHERE COALESCE(pm.is_stored_value, FALSE)) AS stored_card_deduct,
SUM(COALESCE(fp.pay_amount, 0)) FILTER (WHERE NOT COALESCE(pm.is_stored_value, FALSE)) AS external_paid_amount,
MIN(fp.pay_time) AS first_time
FROM billiards_dwd.fact_payment fp
LEFT JOIN billiards_dwd.dim_pay_method pm ON fp.pay_method_code = pm.pay_method_code
WHERE (%(site_id)s IS NULL OR fp.site_id = %(site_id)s)
AND (%(start_date)s IS NULL OR fp.pay_time::date >= %(start_date)s)
AND (%(end_date)s IS NULL OR fp.pay_time::date <= %(end_date)s)
GROUP BY fp.site_id, fp.order_settle_id, fp.order_trade_no
),
refunds AS (
SELECT
site_id,
order_settle_id,
order_trade_no,
SUM(COALESCE(refund_amount, 0)) AS refund_amount
FROM billiards_dwd.fact_refund
WHERE (%(site_id)s IS NULL OR site_id = %(site_id)s)
AND (%(start_date)s IS NULL OR refund_time::date >= %(start_date)s)
AND (%(end_date)s IS NULL OR refund_time::date <= %(end_date)s)
GROUP BY site_id, order_settle_id, order_trade_no
),
combined_ids AS (
SELECT site_id, order_settle_id, order_trade_no FROM table_fee
UNION
SELECT site_id, order_settle_id, order_trade_no FROM assistant_fee
UNION
SELECT site_id, order_settle_id, order_trade_no FROM goods_fee
UNION
SELECT site_id, order_settle_id, order_trade_no FROM coupon_usage
UNION
SELECT site_id, order_settle_id, order_trade_no FROM payments
UNION
SELECT site_id, order_settle_id, order_trade_no FROM refunds
),
site_dim AS (
SELECT site_id, tenant_id FROM billiards_dwd.dim_site
)
INSERT INTO billiards_dws.dws_order_summary (
site_id,
order_settle_id,
order_trade_no,
order_date,
tenant_id,
member_id,
member_flag,
recharge_order_flag,
item_count,
total_item_quantity,
table_fee_amount,
assistant_service_amount,
goods_amount,
group_amount,
total_coupon_deduction,
member_discount_amount,
manual_discount_amount,
order_original_amount,
order_final_amount,
stored_card_deduct,
external_paid_amount,
total_paid_amount,
book_table_flow,
book_assistant_flow,
book_goods_flow,
book_group_flow,
book_order_flow,
order_effective_consume_cash,
order_effective_recharge_cash,
order_effective_flow,
refund_amount,
net_income,
created_at,
updated_at
)
SELECT
c.site_id,
c.order_settle_id,
c.order_trade_no,
COALESCE(tf.first_time, af.first_time, gf.first_time, pay.first_time, cu.first_time)::date AS order_date,
sd.tenant_id,
COALESCE(tf.member_id, af.member_id, gf.member_id, cu.member_id, pay.member_id) AS member_id,
COALESCE(tf.member_id, af.member_id, gf.member_id, cu.member_id, pay.member_id) IS NOT NULL AS member_flag,
-- recharge flag: no consumption side but has payments
(COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) = 0)
AND COALESCE(pay.total_paid_amount, 0) > 0 AS recharge_order_flag,
COALESCE(gf.item_count, 0) AS item_count,
COALESCE(gf.total_item_quantity, 0) AS total_item_quantity,
COALESCE(tf.table_fee_amount, 0) AS table_fee_amount,
COALESCE(af.assistant_service_amount, 0) AS assistant_service_amount,
COALESCE(gf.goods_amount, 0) AS goods_amount,
COALESCE(cu.settle_price, 0) AS group_amount,
COALESCE(cu.coupon_deduction, 0) AS total_coupon_deduction,
COALESCE(tf.member_discount_amount, 0) + COALESCE(af.member_discount_amount, 0) + COALESCE(gf.goods_discount_amount, 0) AS member_discount_amount,
COALESCE(tf.manual_discount_amount, 0) + COALESCE(af.manual_discount_amount, 0) AS manual_discount_amount,
COALESCE(tf.original_table_fee, 0) + COALESCE(af.original_fee, 0) + COALESCE(gf.goods_original_amount, 0) AS order_original_amount,
COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) - COALESCE(cu.coupon_deduction, 0) AS order_final_amount,
COALESCE(pay.stored_card_deduct, 0) AS stored_card_deduct,
COALESCE(pay.external_paid_amount, 0) AS external_paid_amount,
COALESCE(pay.total_paid_amount, 0) AS total_paid_amount,
COALESCE(tf.table_fee_amount, 0) AS book_table_flow,
COALESCE(af.assistant_service_amount, 0) AS book_assistant_flow,
COALESCE(gf.goods_amount, 0) AS book_goods_flow,
COALESCE(cu.settle_price, 0) AS book_group_flow,
COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) AS book_order_flow,
CASE
WHEN (COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) = 0)
THEN 0
ELSE COALESCE(pay.external_paid_amount, 0)
END AS order_effective_consume_cash,
CASE
WHEN (COALESCE(tf.table_fee_amount, 0) + COALESCE(af.assistant_service_amount, 0) + COALESCE(gf.goods_amount, 0) + COALESCE(cu.settle_price, 0) = 0)
THEN COALESCE(pay.external_paid_amount, 0)
ELSE 0
END AS order_effective_recharge_cash,
COALESCE(pay.external_paid_amount, 0) + COALESCE(cu.settle_price, 0) AS order_effective_flow,
COALESCE(rf.refund_amount, 0) AS refund_amount,
(COALESCE(pay.external_paid_amount, 0) + COALESCE(cu.settle_price, 0)) - COALESCE(rf.refund_amount, 0) AS net_income,
now() AS created_at,
now() AS updated_at
FROM combined_ids c
LEFT JOIN table_fee tf ON c.site_id = tf.site_id AND c.order_settle_id = tf.order_settle_id
LEFT JOIN assistant_fee af ON c.site_id = af.site_id AND c.order_settle_id = af.order_settle_id
LEFT JOIN goods_fee gf ON c.site_id = gf.site_id AND c.order_settle_id = gf.order_settle_id
LEFT JOIN coupon_usage cu ON c.site_id = cu.site_id AND c.order_settle_id = cu.order_settle_id
LEFT JOIN payments pay ON c.site_id = pay.site_id AND c.order_settle_id = pay.order_settle_id
LEFT JOIN refunds rf ON c.site_id = rf.site_id AND c.order_settle_id = rf.order_settle_id
LEFT JOIN site_dim sd ON c.site_id = sd.site_id
ON CONFLICT (site_id, order_settle_id) DO UPDATE SET
order_trade_no = EXCLUDED.order_trade_no,
order_date = EXCLUDED.order_date,
tenant_id = EXCLUDED.tenant_id,
member_id = EXCLUDED.member_id,
member_flag = EXCLUDED.member_flag,
recharge_order_flag = EXCLUDED.recharge_order_flag,
item_count = EXCLUDED.item_count,
total_item_quantity = EXCLUDED.total_item_quantity,
table_fee_amount = EXCLUDED.table_fee_amount,
assistant_service_amount = EXCLUDED.assistant_service_amount,
goods_amount = EXCLUDED.goods_amount,
group_amount = EXCLUDED.group_amount,
total_coupon_deduction = EXCLUDED.total_coupon_deduction,
member_discount_amount = EXCLUDED.member_discount_amount,
manual_discount_amount = EXCLUDED.manual_discount_amount,
order_original_amount = EXCLUDED.order_original_amount,
order_final_amount = EXCLUDED.order_final_amount,
stored_card_deduct = EXCLUDED.stored_card_deduct,
external_paid_amount = EXCLUDED.external_paid_amount,
total_paid_amount = EXCLUDED.total_paid_amount,
book_table_flow = EXCLUDED.book_table_flow,
book_assistant_flow = EXCLUDED.book_assistant_flow,
book_goods_flow = EXCLUDED.book_goods_flow,
book_group_flow = EXCLUDED.book_group_flow,
book_order_flow = EXCLUDED.book_order_flow,
order_effective_consume_cash = EXCLUDED.order_effective_consume_cash,
order_effective_recharge_cash = EXCLUDED.order_effective_recharge_cash,
order_effective_flow = EXCLUDED.order_effective_flow,
refund_amount = EXCLUDED.refund_amount,
net_income = EXCLUDED.net_income,
updated_at = now();
"""
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
description="Build/update dws_order_summary from DWD fact tables."
)
parser.add_argument(
"--dsn",
default=os.environ.get("PG_DSN"),
help="PostgreSQL DSN (fallback: PG_DSN env)",
)
parser.add_argument(
"--site-id",
type=int,
default=None,
help="Filter by site_id (optional, default all sites)",
)
parser.add_argument(
"--start-date",
dest="start_date",
default=None,
help="Filter facts from this date (YYYY-MM-DD, optional)",
)
parser.add_argument(
"--end-date",
dest="end_date",
default=None,
help="Filter facts until this date (YYYY-MM-DD, optional)",
)
parser.add_argument(
"--timeout",
type=int,
default=int(os.environ.get("PG_CONNECT_TIMEOUT", 10) or 10),
help="connect_timeout seconds (capped at 20, default 10)",
)
return parser.parse_args()
def main() -> int:
args = parse_args()
if not args.dsn:
print("Missing DSN. Set PG_DSN or pass --dsn.", file=sys.stderr)
return 2
params = {
"site_id": args.site_id,
"start_date": args.start_date,
"end_date": args.end_date,
}
timeout_val = max(1, min(args.timeout, 20))
conn = DatabaseConnection(args.dsn, connect_timeout=timeout_val)
try:
with conn.conn.cursor() as cur:
cur.execute(SQL_BUILD_SUMMARY, params)
conn.commit()
except Exception as exc: # pragma: no cover - operational script
conn.rollback()
print(f"DWS build failed: {exc}", file=sys.stderr)
return 1
finally:
conn.close()
print("dws_order_summary refreshed.")
return 0
if __name__ == "__main__":
raise SystemExit(main())
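
Since all filters are optional named parameters, a bounded refresh of one site and one day looks like this (a sketch; `scripts.build_dws` is an assumed module path):

```python
import os

from database.connection import DatabaseConnection
from scripts.build_dws import SQL_BUILD_SUMMARY  # assumed module path

conn = DatabaseConnection(os.environ["PG_DSN"], connect_timeout=10)
try:
    with conn.conn.cursor() as cur:
        cur.execute(SQL_BUILD_SUMMARY, {
            "site_id": 2790685415443269,
            "start_date": "2025-11-29",
            "end_date": "2025-11-29",
        })
    conn.commit()
finally:
    conn.close()
```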

View File

@@ -0,0 +1,258 @@
# -*- coding: utf-8 -*-
"""
从本地 JSON 示例目录重建 billiards_ods.* 表,并导入样例数据。
用法:
PYTHONPATH=. python -m etl_billiards.scripts.rebuild_ods_from_json [--dsn ...] [--json-dir ...] [--include ...] [--drop-schema-first]
依赖环境变量:
PG_DSN PostgreSQL 连接串(必填)
PG_CONNECT_TIMEOUT 可选,秒,默认 10
JSON_DOC_DIR 可选JSON 目录,默认 C:\\dev\\LLTQ\\export\\test-json-doc
ODS_INCLUDE_FILES 可选,逗号分隔文件名(不含 .json
ODS_DROP_SCHEMA_FIRST 可选true/false默认 true
"""
from __future__ import annotations
import argparse
import os
import re
import sys
import json
from pathlib import Path
from typing import Iterable, List, Tuple
import psycopg2
from psycopg2 import sql
from psycopg2.extras import Json, execute_values
DEFAULT_JSON_DIR = r"C:\dev\LLTQ\export\test-json-doc"
SPECIAL_LIST_PATHS: dict[str, tuple[str, ...]] = {
"assistant_accounts_master": ("data", "assistantInfos"),
"assistant_cancellation_records": ("data", "abolitionAssistants"),
"assistant_service_records": ("data", "orderAssistantDetails"),
"goods_stock_movements": ("data", "queryDeliveryRecordsList"),
"goods_stock_summary": ("data",),
"group_buy_packages": ("data", "packageCouponList"),
"group_buy_redemption_records": ("data", "siteTableUseDetailsList"),
"member_balance_changes": ("data", "tenantMemberCardLogs"),
"member_profiles": ("data", "tenantMemberInfos"),
"member_stored_value_cards": ("data", "tenantMemberCards"),
"recharge_settlements": ("data", "settleList"),
"settlement_records": ("data", "settleList"),
"site_tables_master": ("data", "siteTables"),
"stock_goods_category_tree": ("data", "goodsCategoryList"),
"store_goods_master": ("data", "orderGoodsList"),
"store_goods_sales_records": ("data", "orderGoodsLedgers"),
"table_fee_discount_records": ("data", "taiFeeAdjustInfos"),
"table_fee_transactions": ("data", "siteTableUseDetailsList"),
"tenant_goods_master": ("data", "tenantGoodsList"),
}
def sanitize_identifier(name: str) -> str:
"""将任意字符串转为可用的 SQL identifier小写、非字母数字转下划线"""
cleaned = re.sub(r"[^0-9a-zA-Z_]", "_", name.strip())
if not cleaned:
cleaned = "col"
if cleaned[0].isdigit():
cleaned = f"_{cleaned}"
return cleaned.lower()
def _extract_list_via_path(node, path: tuple[str, ...]):
cur = node
for key in path:
if isinstance(cur, dict):
cur = cur.get(key)
else:
return []
return cur if isinstance(cur, list) else []
def load_records(payload, list_path: tuple[str, ...] | None = None) -> list:
"""
尝试从 JSON 结构中提取记录列表:
- 直接是 list -> 返回
- dict 中 data 是 list -> 返回
- dict 中 data 是 dict取第一个 list 字段
- dict 中任意值是 list -> 返回
- 其余情况,包装为单条记录
"""
if list_path:
if isinstance(payload, list):
merged: list = []
for item in payload:
merged.extend(_extract_list_via_path(item, list_path))
if merged:
return merged
elif isinstance(payload, dict):
lst = _extract_list_via_path(payload, list_path)
if lst:
return lst
if isinstance(payload, list):
return payload
if isinstance(payload, dict):
data_node = payload.get("data")
if isinstance(data_node, list):
return data_node
if isinstance(data_node, dict):
for v in data_node.values():
if isinstance(v, list):
return v
for v in payload.values():
if isinstance(v, list):
return v
return [payload]
def collect_columns(records: Iterable[dict]) -> List[str]:
"""汇总所有顶层键,作为表字段;仅处理 dict 记录。"""
cols: set[str] = set()
for rec in records:
if isinstance(rec, dict):
cols.update(rec.keys())
return sorted(cols)
def create_table(cur, schema: str, table: str, columns: List[Tuple[str, str]]):
"""
创建表:字段全部 jsonb外加 source_file、record_index、payload、ingested_at。
columns: [(col_name, original_key)]
"""
fields = [sql.SQL("{} jsonb").format(sql.Identifier(col)) for col, _ in columns]
constraint_name = f"uq_{table}_source_record"
ddl = sql.SQL(
"CREATE TABLE IF NOT EXISTS {schema}.{table} ("
"source_file text,"
"record_index integer,"
"{cols},"
"payload jsonb,"
"ingested_at timestamptz default now(),"
"CONSTRAINT {constraint} UNIQUE (source_file, record_index)"
");"
).format(
schema=sql.Identifier(schema),
table=sql.Identifier(table),
cols=sql.SQL(",").join(fields),
constraint=sql.Identifier(constraint_name),
)
cur.execute(ddl)
def insert_records(cur, schema: str, table: str, columns: List[Tuple[str, str]], records: list, source_file: str):
"""批量插入记录。"""
col_idents = [sql.Identifier(col) for col, _ in columns]
col_names = [col for col, _ in columns]
orig_keys = [orig for _, orig in columns]
all_cols = [sql.Identifier("source_file"), sql.Identifier("record_index")] + col_idents + [
sql.Identifier("payload")
]
rows = []
for idx, rec in enumerate(records):
if not isinstance(rec, dict):
rec = {"value": rec}
row_values = [source_file, idx]
for key in orig_keys:
row_values.append(Json(rec.get(key)))
row_values.append(Json(rec))
rows.append(row_values)
insert_sql = sql.SQL("INSERT INTO {}.{} ({}) VALUES %s ON CONFLICT DO NOTHING").format(
sql.Identifier(schema),
sql.Identifier(table),
sql.SQL(",").join(all_cols),
)
execute_values(cur, insert_sql, rows, page_size=500)
def rebuild(schema: str = "billiards_ods", data_dir: str | Path = DEFAULT_JSON_DIR):
    parser = argparse.ArgumentParser(description="Rebuild billiards_ods.* tables and import JSON samples")
    parser.add_argument("--dsn", dest="dsn", help="PostgreSQL DSN (defaults to the PG_DSN env var)")
    parser.add_argument("--json-dir", dest="json_dir", help=f"JSON directory, default {DEFAULT_JSON_DIR}")
    parser.add_argument(
        "--include",
        dest="include_files",
        help="Restrict the import to these file names (comma-separated, without .json); default: all",
    )
    parser.add_argument(
        "--drop-schema-first",
        dest="drop_schema_first",
        action="store_true",
        help="Drop and recreate the schema first (default true)",
    )
    parser.add_argument(
        "--no-drop-schema-first",
        dest="drop_schema_first",
        action="store_false",
        help="Keep the existing schema; only dedupe on conflict during import",
    )
parser.set_defaults(drop_schema_first=None)
args = parser.parse_args()
dsn = args.dsn or os.environ.get("PG_DSN")
if not dsn:
print("缺少参数/环境变量 PG_DSN无法连接数据库。")
sys.exit(1)
timeout = max(1, min(int(os.environ.get("PG_CONNECT_TIMEOUT", 10)), 60))
env_drop = os.environ.get("ODS_DROP_SCHEMA_FIRST") or os.environ.get("DROP_SCHEMA_FIRST")
drop_schema_first = (
args.drop_schema_first
if args.drop_schema_first is not None
else str(env_drop or "true").lower() in ("1", "true", "yes")
)
include_files_env = args.include_files or os.environ.get("ODS_INCLUDE_FILES") or os.environ.get("INCLUDE_FILES")
include_files = set()
if include_files_env:
include_files = {p.strip().lower() for p in include_files_env.split(",") if p.strip()}
base_dir = Path(args.json_dir or data_dir or DEFAULT_JSON_DIR)
if not base_dir.exists():
print(f"JSON 目录不存在: {base_dir}")
sys.exit(1)
conn = psycopg2.connect(dsn, connect_timeout=timeout)
conn.autocommit = False
cur = conn.cursor()
if drop_schema_first:
print(f"Dropping schema {schema} ...")
cur.execute(sql.SQL("DROP SCHEMA IF EXISTS {} CASCADE;").format(sql.Identifier(schema)))
cur.execute(sql.SQL("CREATE SCHEMA {};").format(sql.Identifier(schema)))
else:
cur.execute(
sql.SQL("SELECT schema_name FROM information_schema.schemata WHERE schema_name=%s"),
(schema,),
)
if not cur.fetchone():
cur.execute(sql.SQL("CREATE SCHEMA {};").format(sql.Identifier(schema)))
json_files = sorted(base_dir.glob("*.json"))
for path in json_files:
stem_lower = path.stem.lower()
if include_files and stem_lower not in include_files:
continue
print(f"Processing {path.name} ...")
payload = json.loads(path.read_text(encoding="utf-8"))
list_path = SPECIAL_LIST_PATHS.get(stem_lower)
records = load_records(payload, list_path=list_path)
columns_raw = collect_columns(records)
columns = [(sanitize_identifier(c), c) for c in columns_raw]
table_name = sanitize_identifier(path.stem)
create_table(cur, schema, table_name, columns)
if records:
insert_records(cur, schema, table_name, columns, records, path.name)
print(f" -> rows: {len(records)}, columns: {len(columns)}")
conn.commit()
cur.close()
conn.close()
print("Rebuild done.")
if __name__ == "__main__":
rebuild()
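
The fallback chain in `load_records` is easiest to see with small payloads (pure illustration; the module path is assumed):

```python
from scripts.rebuild_ods_from_json import load_records  # assumed module path

print(load_records([{"a": 1}]))                          # [{'a': 1}]
print(load_records({"data": [{"a": 1}]}))                # [{'a': 1}]
print(load_records({"data": {"settleList": [{"a": 1}]}},
                   list_path=("data", "settleList")))    # [{'a': 1}]
print(load_records({"code": 0}))                         # [{'code': 0}]
```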

View File

@@ -0,0 +1,195 @@
# -*- coding: utf-8 -*-
"""
灵活的测试执行脚本,可像搭积木一样组合不同参数或预置命令(模式/数据库/归档路径等),
直接运行本文件即可触发 pytest。
示例:
python scripts/run_tests.py --suite online --flow FULL --keyword ORDERS
python scripts/run_tests.py --preset fetch_only
python scripts/run_tests.py --suite online --json-source tmp/archives
"""
from __future__ import annotations
import argparse
import importlib.util
import os
import shlex
import sys
from typing import Dict, List
import pytest
PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
# Ensure the project root is on sys.path so tests can import config / tasks, etc.
if PROJECT_ROOT not in sys.path:
sys.path.insert(0, PROJECT_ROOT)
SUITE_MAP: Dict[str, str] = {
"online": "tests/unit/test_etl_tasks_online.py",
"integration": "tests/integration/test_database.py",
}
PRESETS: Dict[str, Dict] = {}
def _load_presets():
preset_path = os.path.join(os.path.dirname(__file__), "test_presets.py")
if not os.path.exists(preset_path):
return
spec = importlib.util.spec_from_file_location("test_presets", preset_path)
if not spec or not spec.loader:
return
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module) # type: ignore[attr-defined]
presets = getattr(module, "PRESETS", {})
if isinstance(presets, dict):
PRESETS.update(presets)
_load_presets()
def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="ETL test runner (parameterizable)")
    parser.add_argument(
        "--suite",
        choices=sorted(SUITE_MAP.keys()),
        nargs="+",
        help="Preset test suites, multiple allowed (default: all of online/integration)",
    )
    parser.add_argument(
        "--tests",
        nargs="+",
        help="Custom test paths (can be mixed with --suite), e.g. tests/unit/test_config.py",
    )
    parser.add_argument(
        "--flow",
        choices=["FETCH_ONLY", "INGEST_ONLY", "FULL"],
        help="Override PIPELINE_FLOW (online fetch / local ingest / full pipeline)",
    )
    parser.add_argument("--json-source", help="Set JSON_SOURCE_DIR (JSON directory used for local ingest)")
    parser.add_argument("--json-fetch-root", help="Set JSON_FETCH_ROOT (output root for online fetch)")
    parser.add_argument(
        "--keyword",
        "-k",
        help="pytest -k keyword filter (e.g. ORDERS to run only matching cases)",
    )
    parser.add_argument(
        "--pytest-args",
        help="Extra pytest arguments, same format as the command line (e.g. \"-vv --maxfail=1\")",
    )
    parser.add_argument(
        "--env",
        action="append",
        metavar="KEY=VALUE",
        help="Custom environment variables, repeatable, e.g. --env STORE_ID=123",
    )
    parser.add_argument("--preset", choices=sorted(PRESETS.keys()) if PRESETS else None, nargs="+",
                        help="Pick one or more preset combos from scripts/test_presets.py")
    parser.add_argument("--list-presets", action="store_true", help="List available presets and exit")
    parser.add_argument("--dry-run", action="store_true", help="Print the command and env without running pytest")
    return parser.parse_args()
def apply_presets_to_args(args: argparse.Namespace):
if not args.preset:
return
for name in args.preset:
preset = PRESETS.get(name, {})
if not preset:
continue
for key, value in preset.items():
if key in ("suite", "tests"):
if not value:
continue
existing = getattr(args, key)
if existing is None:
setattr(args, key, list(value))
else:
existing.extend(value)
elif key == "env":
args.env = (args.env or []) + list(value)
elif key == "pytest_args":
args.pytest_args = " ".join(filter(None, [value, args.pytest_args or ""]))
elif key == "keyword":
if args.keyword is None:
args.keyword = value
else:
if getattr(args, key, None) is None:
setattr(args, key, value)
def apply_env(args: argparse.Namespace) -> Dict[str, str]:
env_updates = {}
if args.flow:
env_updates["PIPELINE_FLOW"] = args.flow
if args.json_source:
env_updates["JSON_SOURCE_DIR"] = args.json_source
if args.json_fetch_root:
env_updates["JSON_FETCH_ROOT"] = args.json_fetch_root
if args.env:
for item in args.env:
if "=" not in item:
raise SystemExit(f"--env 参数格式错误: {item!r},应为 KEY=VALUE")
key, value = item.split("=", 1)
env_updates[key.strip()] = value.strip()
for key, value in env_updates.items():
os.environ[key] = value
return env_updates
def build_pytest_args(args: argparse.Namespace) -> List[str]:
targets: List[str] = []
if args.suite:
for suite in args.suite:
targets.append(SUITE_MAP[suite])
if args.tests:
targets.extend(args.tests)
if not targets:
targets = list(SUITE_MAP.values())
pytest_args: List[str] = targets
if args.keyword:
pytest_args += ["-k", args.keyword]
if args.pytest_args:
pytest_args += shlex.split(args.pytest_args)
return pytest_args
def main() -> int:
os.chdir(PROJECT_ROOT)
args = parse_args()
    if args.list_presets:
        print("Available presets:")
        if not PRESETS:
            print("(none yet; edit scripts/test_presets.py to add some)")
        else:
            for name in sorted(PRESETS):
                print(f"- {name}")
        return 0
    apply_presets_to_args(args)
    env_updates = apply_env(args)
    pytest_args = build_pytest_args(args)
    print("=== Environment overrides ===")
    if env_updates:
        for k, v in env_updates.items():
            print(f"{k}={v}")
    else:
        print("(none; using system defaults)")
    print("\n=== Pytest arguments ===")
    print(" ".join(pytest_args))
    print()
    if args.dry_run:
        print("Dry-run mode; pytest was not executed")
        return 0
exit_code = pytest.main(pytest_args)
return int(exit_code)
if __name__ == "__main__":
sys.exit(main())

View File

@@ -0,0 +1,64 @@
# -*- coding: utf-8 -*-
"""Quick utility for validating PostgreSQL connectivity (ASCII-only output)."""
from __future__ import annotations
import argparse
import os
import sys
PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
if PROJECT_ROOT not in sys.path:
sys.path.insert(0, PROJECT_ROOT)
from database.connection import DatabaseConnection
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description="PostgreSQL connectivity smoke test")
parser.add_argument("--dsn", help="Override TEST_DB_DSN / env value")
parser.add_argument(
"--query",
default="SELECT 1 AS ok",
help="Custom SQL to run after connection (default: SELECT 1 AS ok)",
)
parser.add_argument(
"--timeout",
type=int,
default=10,
help="connect_timeout seconds passed to psycopg2 (capped at 20, default: 10)",
)
return parser.parse_args()
def main() -> int:
args = parse_args()
dsn = args.dsn or os.environ.get("TEST_DB_DSN")
if not dsn:
print("Missing DSN. Use --dsn or TEST_DB_DSN.", file=sys.stderr)
return 2
print(f"Trying connection: {dsn}")
try:
timeout = max(1, min(args.timeout, 20))
conn = DatabaseConnection(dsn, connect_timeout=timeout)
except Exception as exc: # pragma: no cover - diagnostic output
print("Connection failed:", exc, file=sys.stderr)
return 1
try:
result = conn.query(args.query)
print("Connection OK, query result:")
for row in result:
print(row)
conn.close()
return 0
except Exception as exc: # pragma: no cover - diagnostic output
print("Connection succeeded but query failed:", exc, file=sys.stderr)
try:
conn.close()
finally:
return 3
if __name__ == "__main__":
raise SystemExit(main())

View File

@@ -0,0 +1,122 @@
# -*- coding: utf-8 -*-
"""测试命令仓库:集中维护 run_tests.py 的常用组合,支持一键执行。"""
from __future__ import annotations
import argparse
import os
import subprocess
import sys
from typing import List
RUN_TESTS_SCRIPT = os.path.join(os.path.dirname(__file__), "run_tests.py")
# Presets that run automatically by default (adjust order/entries as needed)
AUTO_RUN_PRESETS = ["fetch_only"]
PRESETS = {
"fetch_only": {
"suite": ["online"],
"flow": "FETCH_ONLY",
"json_fetch_root": "tmp/json_fetch",
"keyword": "ORDERS",
"pytest_args": "-vv",
"preset_meta": "仅在线抓取阶段,输出到本地目录",
},
"ingest_local": {
"suite": ["online"],
"flow": "INGEST_ONLY",
"json_source": "tests/source-data-doc",
"keyword": "ORDERS",
"preset_meta": "从指定 JSON 目录做本地清洗入库",
},
"full_pipeline": {
"suite": ["online"],
"flow": "FULL",
"json_fetch_root": "tmp/json_fetch",
"keyword": "ORDERS",
"preset_meta": "先抓取再清洗入库的全流程",
},
}
def print_parameter_help() -> None:
    print("=== Parameter keys ===")
    print("suite           : preset suite list, e.g. ['online','integration']")
    print("tests           : custom pytest path list")
    print("flow            : PIPELINE_FLOW (FETCH_ONLY / INGEST_ONLY / FULL)")
    print("json_source     : JSON_SOURCE_DIR (JSON directory for local ingest)")
    print("json_fetch_root : JSON_FETCH_ROOT (output root for online fetch)")
    print("keyword         : pytest -k filter keyword")
    print("pytest_args     : extra pytest arguments (string)")
    print("env             : extra environment variables, e.g. ['KEY=VALUE']")
    print("preset_meta     : documentation only")
    print()
def print_presets() -> None:
if not PRESETS:
print("当前未定义任何预置,请在 PRESETS 中添加。")
return
for idx, (name, payload) in enumerate(PRESETS.items(), start=1):
comment = payload.get("preset_meta", "")
print(f"{idx}. {name}")
if comment:
print(f" 说明: {comment}")
for key, value in payload.items():
if key == "preset_meta":
continue
print(f" {key}: {value}")
print()
def resolve_targets(requested: List[str] | None) -> List[str]:
if not PRESETS:
raise SystemExit("预置为空,请先在 PRESETS 中定义测试组合。")
def valid(names: List[str]) -> List[str]:
return [name for name in names if name in PRESETS]
if requested:
candidates = valid(requested)
missing = [name for name in requested if name not in PRESETS]
if missing:
print(f"警告:忽略未定义的预置 {missing}")
if candidates:
return candidates
auto = valid(AUTO_RUN_PRESETS)
if auto:
return auto
return list(PRESETS.keys())
def run_presets(preset_names: List[str], dry_run: bool) -> None:
for name in preset_names:
cmd = [sys.executable, RUN_TESTS_SCRIPT, "--preset", name]
printable = " ".join(cmd)
if dry_run:
print(f"[Dry-Run] {printable}")
else:
print(f"\n>>> 执行: {printable}")
subprocess.run(cmd, check=False)
def main() -> None:
    parser = argparse.ArgumentParser(description="Test preset repository (configure once, trigger run_tests in batch)")
    parser.add_argument("--preset", choices=sorted(PRESETS.keys()), nargs="+", help="Presets to run")
    parser.add_argument("--list", action="store_true", help="Only list the parameter help and all presets")
    parser.add_argument("--dry-run", action="store_true", help="Only print the commands; do not run pytest")
args = parser.parse_args()
if args.list:
print_parameter_help()
print_presets()
return
targets = resolve_targets(args.preset)
run_presets(targets, dry_run=args.dry_run)
if __name__ == "__main__":
main()
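
Adding a combo is just another PRESETS entry using the keys documented above; a hypothetical example:

```python
# Hypothetical preset: run the integration suite against a staging database.
PRESETS["integration_staging"] = {
    "suite": ["integration"],
    "env": ["TEST_DB_DSN=postgresql://user:pass@staging:5432/LLZQ"],
    "pytest_args": "-vv --maxfail=1",
    "preset_meta": "Integration tests against the staging database",
}
```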

30
etl_billiards/setup.py Normal file
View File

@@ -0,0 +1,30 @@
# -*- coding: utf-8 -*-
"""
Setup script for ETL Billiards
"""
from setuptools import setup, find_packages
with open("requirements.txt", encoding="utf-8") as f:
requirements = f.read().splitlines()
setup(
name="etl-billiards",
version="2.0.0",
description="Modular ETL system for billiards business data",
author="Data Platform Team",
author_email="data-platform@example.com",
packages=find_packages(),
install_requires=requirements,
python_requires=">=3.10",
entry_points={
"console_scripts": [
"etl-billiards=cli.main:main",
],
},
classifiers=[
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
],
)
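After `pip install .`, the `console_scripts` entry point generates an `etl-billiards` launcher that is roughly equivalent to this stub (a sketch, not the generated file itself):

```python
# Rough equivalent of the generated "etl-billiards" launcher.
import sys

from cli.main import main

if __name__ == "__main__":
    sys.exit(main())
```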

View File


@@ -0,0 +1,81 @@
# -*- coding: utf-8 -*-
"""Assistant abolish task."""
import json
from .base_task import BaseTask, TaskContext
from loaders.facts.assistant_abolish import AssistantAbolishLoader
from models.parsers import TypeParser
class AssistantAbolishTask(BaseTask):
"""Sync voided (abolished) assistant service records."""
def get_task_code(self) -> str:
return "ASSISTANT_ABOLISH"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params(
{
"siteId": context.store_id,
"startTime": TypeParser.format_timestamp(context.window_start, self.tz),
"endTime": TypeParser.format_timestamp(context.window_end, self.tz),
}
)
records, _ = self.api.get_paginated(
endpoint="/AssistantPerformance/GetAbolitionAssistant",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
list_key="abolitionAssistants",
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
mapped = self._parse_record(raw, context.store_id)
if mapped:
parsed.append(mapped)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = AssistantAbolishLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_records(transformed["records"])
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_record(self, raw: dict, store_id: int) -> dict | None:
abolish_id = TypeParser.parse_int(raw.get("id"))
if not abolish_id:
self.logger.warning("Skipping record missing abolish id: %s", raw)
return None
return {
"store_id": store_id,
"abolish_id": abolish_id,
"table_id": TypeParser.parse_int(raw.get("tableId")),
"table_name": raw.get("tableName"),
"table_area_id": TypeParser.parse_int(raw.get("tableAreaId")),
"table_area": raw.get("tableArea"),
"assistant_no": raw.get("assistantNo") or raw.get("assistantOn"),  # fall back to "assistantOn" in case the upstream field name is misspelled
"assistant_name": raw.get("assistantName"),
"charge_minutes": TypeParser.parse_int(raw.get("pdChargeMinutes")),
"abolish_amount": TypeParser.parse_decimal(raw.get("assistantAbolishAmount")),
"create_time": TypeParser.parse_timestamp(
raw.get("createTime") or raw.get("create_time"), self.tz
),
"trash_reason": raw.get("trashReason"),
"raw_data": json.dumps(raw, ensure_ascii=False),
}
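For orientation, `get_paginated` walks the response envelope with `data_path` and then picks the record list via `list_key`. The envelope below is an assumption inferred from how these tasks call it, not a documented API contract:

```python
# Assumed envelope for data_path=("data",), list_key="abolitionAssistants".
response = {
    "code": 0,
    "data": {
        "total": 2,
        "abolitionAssistants": [
            {"id": 101, "tableName": "T1"},
            {"id": 102, "tableName": "T2"},
        ],
    },
}

# Manual equivalent of extracting one page:
node = response
for key in ("data",):                  # data_path navigation
    node = node[key]
records = node["abolitionAssistants"]  # list_key selection
assert [r["id"] for r in records] == [101, 102]
```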

View File

@@ -0,0 +1,102 @@
# -*- coding: utf-8 -*-
"""Assistant account task."""
import json
from .base_task import BaseTask, TaskContext
from loaders.dimensions.assistant import AssistantLoader
from models.parsers import TypeParser
class AssistantsTask(BaseTask):
"""Sync assistant account profiles."""
def get_task_code(self) -> str:
return "ASSISTANTS"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params({"siteId": context.store_id})
records, _ = self.api.get_paginated(
endpoint="/PersonnelManagement/SearchAssistantInfo",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
list_key="assistantInfos",
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
mapped = self._parse_assistant(raw, context.store_id)
if mapped:
parsed.append(mapped)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = AssistantLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_assistants(transformed["records"])
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_assistant(self, raw: dict, store_id: int) -> dict | None:
assistant_id = TypeParser.parse_int(raw.get("id"))
if not assistant_id:
self.logger.warning("Skipping record missing assistant id: %s", raw)
return None
return {
"store_id": store_id,
"assistant_id": assistant_id,
"assistant_no": raw.get("assistant_no") or raw.get("assistantNo"),
"nickname": raw.get("nickname"),
"real_name": raw.get("real_name") or raw.get("realName"),
"gender": raw.get("gender"),
"mobile": raw.get("mobile"),
"level": raw.get("level"),
"team_id": TypeParser.parse_int(raw.get("team_id") or raw.get("teamId")),
"team_name": raw.get("team_name"),
"assistant_status": raw.get("assistant_status"),
"work_status": raw.get("work_status"),
"entry_time": TypeParser.parse_timestamp(
raw.get("entry_time") or raw.get("entryTime"), self.tz
),
"resign_time": TypeParser.parse_timestamp(
raw.get("resign_time") or raw.get("resignTime"), self.tz
),
"start_time": TypeParser.parse_timestamp(
raw.get("start_time") or raw.get("startTime"), self.tz
),
"end_time": TypeParser.parse_timestamp(
raw.get("end_time") or raw.get("endTime"), self.tz
),
"create_time": TypeParser.parse_timestamp(
raw.get("create_time") or raw.get("createTime"), self.tz
),
"update_time": TypeParser.parse_timestamp(
raw.get("update_time") or raw.get("updateTime"), self.tz
),
"system_role_id": raw.get("system_role_id"),
"online_status": raw.get("online_status"),
"allow_cx": raw.get("allow_cx"),
"charge_way": raw.get("charge_way"),
"pd_unit_price": TypeParser.parse_decimal(raw.get("pd_unit_price")),
"cx_unit_price": TypeParser.parse_decimal(raw.get("cx_unit_price")),
"is_guaranteed": raw.get("is_guaranteed"),
"is_team_leader": raw.get("is_team_leader"),
"serial_number": raw.get("serial_number"),
"show_sort": raw.get("show_sort"),
"is_delete": raw.get("is_delete"),
"raw_data": json.dumps(raw, ensure_ascii=False),
}

View File

@@ -0,0 +1,79 @@
# -*- coding: utf-8 -*-
"""Base class for DWD tasks."""
import json
from typing import Any, Dict, Iterator, List, Optional, Tuple
from datetime import datetime
from .base_task import BaseTask
from models.parsers import TypeParser
class BaseDwdTask(BaseTask):
"""
Base class for DWD-layer tasks.
Reads data from ODS tables so subclasses can cleanse it and write fact/dimension tables.
"""
def _get_ods_cursor(self, task_code: str) -> Optional[datetime]:
"""
Return the fetched_at high-water mark of the last-processed ODS data.
Simplified for now; this should eventually read from the etl_cursor table.
Until then we rely on BaseTask's time-window logic, or subclasses manage it themselves.
"""
# TODO: wire up the real CursorManager; return None until then and let
# subclasses obtain a window via _get_time_window instead.
return None
def iter_ods_rows(
self,
table_name: str,
columns: List[str],
start_time: datetime,
end_time: datetime,
time_col: str = "fetched_at",
batch_size: int = 1000
) -> Iterator[List[Dict[str, Any]]]:
"""
Iterate ODS table rows in batches.
Args:
table_name: ODS table name
columns: columns to select (must include payload)
start_time: window start (inclusive)
end_time: window end (inclusive)
time_col: time filter column, defaults to fetched_at
batch_size: batch size
"""
offset = 0
cols_str = ", ".join(columns)
while True:
sql = f"""
SELECT {cols_str}
FROM {table_name}
WHERE {time_col} >= %s AND {time_col} <= %s
ORDER BY {time_col} ASC
LIMIT %s OFFSET %s
"""
rows = self.db.query(sql, (start_time, end_time, batch_size, offset))
if not rows:
break
yield rows
if len(rows) < batch_size:
break
offset += batch_size
def parse_payload(self, row: Dict[str, Any]) -> Dict[str, Any]:
"""Parse the payload JSON of an ODS row."""
payload = row.get("payload")
if isinstance(payload, str):
return json.loads(payload)
elif isinstance(payload, dict):
return payload
return {}
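A minimal subclass sketch showing the intended consumption pattern (table and column names illustrative). Note that `iter_ods_rows` pages with LIMIT/OFFSET ordered by the time column, which is fine for append-only ODS tables but can skip or repeat rows if data inside the window changes mid-run:

```python
# Illustrative only -- real DWD tasks parse payloads and upsert via loaders.
class ExampleDwdTask(BaseDwdTask):
    def get_task_code(self) -> str:
        return "EXAMPLE_DWD"

    def execute(self, cursor_data: dict | None = None) -> dict:
        window_start, window_end, _ = self._get_time_window(cursor_data)
        processed = 0
        for batch in self.iter_ods_rows(
            table_name="billiards_ods.example_records",   # hypothetical table
            columns=["site_id", "payload", "fetched_at"],
            start_time=window_start,
            end_time=window_end,
        ):
            for row in batch:
                if self.parse_payload(row):
                    processed += 1
        self.db.commit()
        return {"status": "SUCCESS", "counts": {"fetched": processed}}
```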

View File

@@ -0,0 +1,141 @@
# -*- coding: utf-8 -*-
"""Base ETL task (introduces the Extract/Transform/Load template method)."""
from __future__ import annotations
from dataclasses import dataclass
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo
@dataclass(frozen=True)
class TaskContext:
"""Runtime info passed uniformly to Extract/Transform/Load."""
store_id: int
window_start: datetime
window_end: datetime
window_minutes: int
cursor: dict | None = None
class BaseTask:
"""Task base class providing the E/T/L template."""
def __init__(self, config, db_connection, api_client, logger):
self.config = config
self.db = db_connection
self.api = api_client
self.logger = logger
self.tz = ZoneInfo(config.get("app.timezone", "Asia/Taipei"))
# ------------------------------------------------------------------ basic info
def get_task_code(self) -> str:
"""Return the task code."""
raise NotImplementedError("Subclasses must implement get_task_code")
# ------------------------------------------------------------------ E/T/L hooks
def extract(self, context: TaskContext):
"""Extract data."""
raise NotImplementedError("Subclasses must implement extract")
def transform(self, extracted, context: TaskContext):
"""Transform data (default: pass-through)."""
return extracted
def load(self, transformed, context: TaskContext) -> dict:
"""Load data and return statistics."""
raise NotImplementedError("Subclasses must implement load")
# ------------------------------------------------------------------ main flow
def execute(self, cursor_data: dict | None = None) -> dict:
"""Orchestrate Extract → Transform → Load."""
context = self._build_context(cursor_data)
task_code = self.get_task_code()
self.logger.info(
"%s: starting, window [%s ~ %s]",
task_code,
context.window_start,
context.window_end,
)
try:
extracted = self.extract(context)
transformed = self.transform(extracted, context)
counts = self.load(transformed, context) or {}
self.db.commit()
except Exception:
self.db.rollback()
self.logger.error("%s: execution failed", task_code, exc_info=True)
raise
result = self._build_result("SUCCESS", counts)
result["window"] = {
"start": context.window_start,
"end": context.window_end,
"minutes": context.window_minutes,
}
self.logger.info("%s: finished, counts=%s", task_code, result["counts"])
return result
# ------------------------------------------------------------------ helpers
def _build_context(self, cursor_data: dict | None) -> TaskContext:
window_start, window_end, window_minutes = self._get_time_window(cursor_data)
return TaskContext(
store_id=self.config.get("app.store_id"),
window_start=window_start,
window_end=window_end,
window_minutes=window_minutes,
cursor=cursor_data,
)
def _get_time_window(self, cursor_data: dict | None = None) -> tuple:
"""Compute the time window (start, end, minutes)."""
now = datetime.now(self.tz)
idle_start = self.config.get("run.idle_window.start", "04:00")
idle_end = self.config.get("run.idle_window.end", "16:00")
is_idle = self._is_in_idle_window(now, idle_start, idle_end)
if is_idle:
window_minutes = self.config.get("run.window_minutes.default_idle", 180)
else:
window_minutes = self.config.get("run.window_minutes.default_busy", 30)
overlap_seconds = self.config.get("run.overlap_seconds", 120)
if cursor_data and cursor_data.get("last_end"):
window_start = cursor_data["last_end"] - timedelta(seconds=overlap_seconds)
else:
window_start = now - timedelta(minutes=window_minutes)
window_end = now
return window_start, window_end, window_minutes
def _is_in_idle_window(self, dt: datetime, start_time: str, end_time: str) -> bool:
"""Return True if dt falls in the idle window (zero-padded HH:MM strings compare correctly as text; assumes the window does not cross midnight)."""
current_time = dt.strftime("%H:%M")
return start_time <= current_time <= end_time
def _merge_common_params(self, base: dict) -> dict:
"""
Merge the global/task-level parameter pools so filters can be overridden or appended centrally in config.
Supports:
- common keys under api.params
- task-level keys under api.params.<task_code_lower>
Call-site values win over task-level ones, which in turn win over common ones.
"""
merged: dict = {}
common = self.config.get("api.params", {}) or {}
if isinstance(common, dict):
merged.update(common)
task_key = f"api.params.{self.get_task_code().lower()}"
scoped = self.config.get(task_key, {}) or {}
if isinstance(scoped, dict):
merged.update(scoped)
merged.update(base)
return merged
def _build_result(self, status: str, counts: dict) -> dict:
"""Build the result dict."""
return {"status": status, "counts": counts}
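Concrete tasks only fill in the hooks; `execute()` supplies windowing, commit/rollback, and logging. A minimal sketch against a hypothetical endpoint (`transform` is inherited as a pass-through):

```python
# Illustrative only -- see the real tasks in this package for full examples.
class PingTask(BaseTask):
    def get_task_code(self) -> str:
        return "PING"

    def extract(self, context: TaskContext) -> dict:
        params = self._merge_common_params({"siteId": context.store_id})
        records, _ = self.api.get_paginated(
            endpoint="/Example/Ping",                 # hypothetical endpoint
            params=params,
            page_size=self.config.get("api.page_size", 200),
            data_path=("data",),
        )
        return {"records": records}

    def load(self, transformed: dict, context: TaskContext) -> dict:
        # No loader here; a real task would upsert and report its counts.
        return {"fetched": len(transformed["records"]), "inserted": 0,
                "updated": 0, "skipped": 0, "errors": 0}
```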

View File

@@ -0,0 +1,93 @@
# -*- coding: utf-8 -*-
"""Platform coupon redemption task."""
import json
from .base_task import BaseTask, TaskContext
from loaders.facts.coupon_usage import CouponUsageLoader
from models.parsers import TypeParser
class CouponUsageTask(BaseTask):
"""Sync platform coupon verification/redemption records."""
def get_task_code(self) -> str:
return "COUPON_USAGE"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params(
{
"siteId": context.store_id,
"startTime": TypeParser.format_timestamp(context.window_start, self.tz),
"endTime": TypeParser.format_timestamp(context.window_end, self.tz),
}
)
records, _ = self.api.get_paginated(
endpoint="/Promotion/GetOfflineCouponConsumePageList",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
mapped = self._parse_usage(raw, context.store_id)
if mapped:
parsed.append(mapped)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = CouponUsageLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_coupon_usage(
transformed["records"]
)
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_usage(self, raw: dict, store_id: int) -> dict | None:
usage_id = TypeParser.parse_int(raw.get("id"))
if not usage_id:
self.logger.warning("Skipping record missing coupon usage id: %s", raw)
return None
return {
"store_id": store_id,
"usage_id": usage_id,
"coupon_code": raw.get("coupon_code"),
"coupon_channel": raw.get("coupon_channel"),
"coupon_name": raw.get("coupon_name"),
"sale_price": TypeParser.parse_decimal(raw.get("sale_price")),
"coupon_money": TypeParser.parse_decimal(raw.get("coupon_money")),
"coupon_free_time": TypeParser.parse_int(raw.get("coupon_free_time")),
"use_status": raw.get("use_status"),
"create_time": TypeParser.parse_timestamp(
raw.get("create_time") or raw.get("createTime"), self.tz
),
"consume_time": TypeParser.parse_timestamp(
raw.get("consume_time") or raw.get("consumeTime"), self.tz
),
"operator_id": TypeParser.parse_int(raw.get("operator_id")),
"operator_name": raw.get("operator_name"),
"table_id": TypeParser.parse_int(raw.get("table_id")),
"site_order_id": TypeParser.parse_int(raw.get("site_order_id")),
"group_package_id": TypeParser.parse_int(raw.get("group_package_id")),
"coupon_remark": raw.get("coupon_remark"),
"deal_id": raw.get("deal_id"),
"certificate_id": raw.get("certificate_id"),
"verify_id": raw.get("verify_id"),
"is_delete": raw.get("is_delete"),
"raw_data": json.dumps(raw, ensure_ascii=False),
}

View File

@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
"""Inventory change task."""
import json
from .base_task import BaseTask, TaskContext
from loaders.facts.inventory_change import InventoryChangeLoader
from models.parsers import TypeParser
class InventoryChangeTask(BaseTask):
"""Sync inventory change records."""
def get_task_code(self) -> str:
return "INVENTORY_CHANGE"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params(
{
"siteId": context.store_id,
"startTime": TypeParser.format_timestamp(context.window_start, self.tz),
"endTime": TypeParser.format_timestamp(context.window_end, self.tz),
}
)
records, _ = self.api.get_paginated(
endpoint="/GoodsStockManage/QueryGoodsOutboundReceipt",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
list_key="queryDeliveryRecordsList",
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
mapped = self._parse_change(raw, context.store_id)
if mapped:
parsed.append(mapped)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = InventoryChangeLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_changes(transformed["records"])
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_change(self, raw: dict, store_id: int) -> dict | None:
change_id = TypeParser.parse_int(
raw.get("siteGoodsStockId") or raw.get("site_goods_stock_id")
)
if not change_id:
self.logger.warning("Skipping record missing inventory change id: %s", raw)
return None
return {
"store_id": store_id,
"change_id": change_id,
"site_goods_id": TypeParser.parse_int(
raw.get("siteGoodsId") or raw.get("site_goods_id")
),
"stock_type": raw.get("stockType") or raw.get("stock_type"),
"goods_name": raw.get("goodsName"),
"change_time": TypeParser.parse_timestamp(
raw.get("createTime") or raw.get("create_time"), self.tz
),
"start_qty": TypeParser.parse_int(raw.get("startNum")),
"end_qty": TypeParser.parse_int(raw.get("endNum")),
"change_qty": TypeParser.parse_int(raw.get("changeNum")),
"unit": raw.get("unit"),
"price": TypeParser.parse_decimal(raw.get("price")),
"operator_name": raw.get("operatorName"),
"remark": raw.get("remark"),
"goods_category_id": TypeParser.parse_int(raw.get("goodsCategoryId")),
"goods_second_category_id": TypeParser.parse_int(
raw.get("goodsSecondCategoryId")
),
"raw_data": json.dumps(raw, ensure_ascii=False),
}

View File

@@ -0,0 +1,115 @@
# -*- coding: utf-8 -*-
"""Assistant ledger task."""
import json
from .base_task import BaseTask, TaskContext
from loaders.facts.assistant_ledger import AssistantLedgerLoader
from models.parsers import TypeParser
class LedgerTask(BaseTask):
"""Sync assistant service ledgers."""
def get_task_code(self) -> str:
return "LEDGER"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params(
{
"siteId": context.store_id,
"startTime": TypeParser.format_timestamp(context.window_start, self.tz),
"endTime": TypeParser.format_timestamp(context.window_end, self.tz),
}
)
records, _ = self.api.get_paginated(
endpoint="/AssistantPerformance/GetOrderAssistantDetails",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
list_key="orderAssistantDetails",
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
mapped = self._parse_ledger(raw, context.store_id)
if mapped:
parsed.append(mapped)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = AssistantLedgerLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_ledgers(transformed["records"])
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_ledger(self, raw: dict, store_id: int) -> dict | None:
ledger_id = TypeParser.parse_int(raw.get("id"))
if not ledger_id:
self.logger.warning("Skipping record missing ledger id: %s", raw)
return None
return {
"store_id": store_id,
"ledger_id": ledger_id,
"assistant_no": raw.get("assistantNo"),
"assistant_name": raw.get("assistantName"),
"nickname": raw.get("nickname"),
"level_name": raw.get("levelName"),
"table_name": raw.get("tableName"),
"ledger_unit_price": TypeParser.parse_decimal(raw.get("ledger_unit_price")),
"ledger_count": TypeParser.parse_int(raw.get("ledger_count")),
"ledger_amount": TypeParser.parse_decimal(raw.get("ledger_amount")),
"projected_income": TypeParser.parse_decimal(raw.get("projected_income")),
"service_money": TypeParser.parse_decimal(raw.get("service_money")),
"member_discount_amount": TypeParser.parse_decimal(
raw.get("member_discount_amount")
),
"manual_discount_amount": TypeParser.parse_decimal(
raw.get("manual_discount_amount")
),
"coupon_deduct_money": TypeParser.parse_decimal(
raw.get("coupon_deduct_money")
),
"order_trade_no": TypeParser.parse_int(raw.get("order_trade_no")),
"order_settle_id": TypeParser.parse_int(raw.get("order_settle_id")),
"operator_id": TypeParser.parse_int(raw.get("operator_id")),
"operator_name": raw.get("operator_name"),
"assistant_team_id": TypeParser.parse_int(raw.get("assistant_team_id")),
"assistant_level": raw.get("assistant_level"),
"site_table_id": TypeParser.parse_int(raw.get("site_table_id")),
"order_assistant_id": TypeParser.parse_int(raw.get("order_assistant_id")),
"site_assistant_id": TypeParser.parse_int(raw.get("site_assistant_id")),
"user_id": TypeParser.parse_int(raw.get("user_id")),
"ledger_start_time": TypeParser.parse_timestamp(
raw.get("ledger_start_time"), self.tz
),
"ledger_end_time": TypeParser.parse_timestamp(
raw.get("ledger_end_time"), self.tz
),
"start_use_time": TypeParser.parse_timestamp(raw.get("start_use_time"), self.tz),
"last_use_time": TypeParser.parse_timestamp(raw.get("last_use_time"), self.tz),
"income_seconds": TypeParser.parse_int(raw.get("income_seconds")),
"real_use_seconds": TypeParser.parse_int(raw.get("real_use_seconds")),
"is_trash": raw.get("is_trash"),
"trash_reason": raw.get("trash_reason"),
"is_confirm": raw.get("is_confirm"),
"ledger_status": raw.get("ledger_status"),
"create_time": TypeParser.parse_timestamp(
raw.get("create_time") or raw.get("createTime"), self.tz
),
"raw_data": json.dumps(raw, ensure_ascii=False),
}

File diff suppressed because it is too large

View File

@@ -0,0 +1,89 @@
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.dimensions.member import MemberLoader
from models.parsers import TypeParser
import json
class MembersDwdTask(BaseDwdTask):
"""
DWD Task: Process Member Records from ODS to Dimension Table
Source: billiards_ods.ods_member_profile
Target: billiards.dim_member
"""
def get_task_code(self) -> str:
return "MEMBERS_DWD"
def execute(self, cursor_data: dict | None = None) -> dict:
self.logger.info(f"Starting {self.get_task_code()} task")
window_start, window_end, _ = self._get_time_window(cursor_data)
self.logger.info(f"Processing window: {window_start} to {window_end}")
loader = MemberLoader(self.db)
store_id = self.config.get("app.store_id")
total_inserted = 0
total_updated = 0
total_errors = 0
# Iterate ODS Data
batches = self.iter_ods_rows(
table_name="billiards_ods.ods_member_profile",
columns=["site_id", "member_id", "payload", "fetched_at"],
start_time=window_start,
end_time=window_end
)
for batch in batches:
if not batch:
continue
parsed_rows = []
for row in batch:
payload = self.parse_payload(row)
if not payload:
continue
parsed = self._parse_member(payload, store_id)
if parsed:
parsed_rows.append(parsed)
if parsed_rows:
inserted, updated, skipped = loader.upsert_members(parsed_rows, store_id)
total_inserted += inserted
total_updated += updated
self.db.commit()
self.logger.info(f"Task {self.get_task_code()} completed. Inserted: {total_inserted}, Updated: {total_updated}")
return {
"status": "SUCCESS",
"inserted": total_inserted,
"updated": total_updated,
"window_start": window_start.isoformat(),
"window_end": window_end.isoformat()
}
def _parse_member(self, raw: dict, store_id: int) -> dict | None:
"""Parse ODS payload into Dim structure"""
try:
# Handle both API structure (camelCase) and manual structure
member_id = TypeParser.parse_int(raw.get("id") or raw.get("memberId"))
if not member_id:
return None
return {
"store_id": store_id,
"member_id": member_id,
"member_name": raw.get("name") or raw.get("memberName"),
"phone": raw.get("phone") or raw.get("mobile"),
"balance": TypeParser.parse_decimal(raw.get("balance")) or 0,
"status": str(raw.get("status", "NORMAL")),
"register_time": TypeParser.parse_timestamp(raw.get("createTime") or raw.get("registerTime"), self.tz),
"raw_data": json.dumps(raw, ensure_ascii=False)
}
except Exception as e:
self.logger.warning(f"Error parsing member: {e}")
return None

View File

@@ -0,0 +1,72 @@
# -*- coding: utf-8 -*-
"""Member ETL task."""
import json
from .base_task import BaseTask, TaskContext
from loaders.dimensions.member import MemberLoader
from models.parsers import TypeParser
class MembersTask(BaseTask):
"""Member ETL task."""
def get_task_code(self) -> str:
return "MEMBERS"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params({"siteId": context.store_id})
records, _ = self.api.get_paginated(
endpoint="/MemberProfile/GetTenantMemberList",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
list_key="tenantMemberInfos",
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
parsed_row = self._parse_member(raw, context.store_id)
if parsed_row:
parsed.append(parsed_row)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = MemberLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_members(
transformed["records"], context.store_id
)
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_member(self, raw: dict, store_id: int) -> dict | None:
"""Parse a member record."""
try:
member_id = TypeParser.parse_int(raw.get("memberId"))
if not member_id:
return None
return {
"store_id": store_id,
"member_id": member_id,
"member_name": raw.get("memberName"),
"phone": raw.get("phone"),
"balance": TypeParser.parse_decimal(raw.get("balance")),
"status": raw.get("status"),
"register_time": TypeParser.parse_timestamp(raw.get("registerTime"), self.tz),
"raw_data": json.dumps(raw, ensure_ascii=False),
}
except Exception as exc:
self.logger.warning("Failed to parse member record: %s, raw data: %s", exc, raw)
return None

View File

@@ -0,0 +1,933 @@
# -*- coding: utf-8 -*-
"""ODS ingestion tasks."""
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple, Type
from loaders.ods import GenericODSLoader
from models.parsers import TypeParser
from .base_task import BaseTask
ColumnTransform = Callable[[Any], Any]
@dataclass(frozen=True)
class ColumnSpec:
"""Mapping between DB column and source JSON field."""
column: str
sources: Tuple[str, ...] = ()
required: bool = False
default: Any = None
transform: ColumnTransform | None = None
@dataclass(frozen=True)
class OdsTaskSpec:
"""Definition of a single ODS ingestion task."""
code: str
class_name: str
table_name: str
endpoint: str
data_path: Tuple[str, ...] = ("data",)
list_key: str | None = None
pk_columns: Tuple[ColumnSpec, ...] = ()
extra_columns: Tuple[ColumnSpec, ...] = ()
include_page_size: bool = False
include_page_no: bool = False
include_source_file: bool = True
include_source_endpoint: bool = True
include_record_index: bool = False
include_site_column: bool = True
include_fetched_at: bool = True
requires_window: bool = True
time_fields: Tuple[str, str] | None = ("startTime", "endTime")
include_site_id: bool = True
description: str = ""
extra_params: Dict[str, Any] = field(default_factory=dict)
conflict_columns_override: Tuple[str, ...] | None = None
class BaseOdsTask(BaseTask):
"""Shared functionality for ODS ingestion tasks."""
SPEC: OdsTaskSpec
def get_task_code(self) -> str:
return self.SPEC.code
def execute(self, cursor_data: dict | None = None) -> dict:  # cursor_data accepted for BaseTask interface parity (unused here)
spec = self.SPEC
self.logger.info("Starting %s (ODS)", spec.code)
store_id = TypeParser.parse_int(self.config.get("app.store_id"))
if not store_id:
raise ValueError("app.store_id is not configured; cannot run ODS tasks")
page_size = self.config.get("api.page_size", 200)
params = self._build_params(spec, store_id)
columns = self._resolve_columns(spec)
if spec.conflict_columns_override:
conflict_columns = list(spec.conflict_columns_override)
else:
conflict_columns = []
if spec.include_site_column:
conflict_columns.append("site_id")
conflict_columns += [col.column for col in spec.pk_columns]
loader = GenericODSLoader(
self.db,
spec.table_name,
columns,
conflict_columns,
)
counts = {"fetched": 0, "inserted": 0, "updated": 0, "skipped": 0, "errors": 0}
source_file = self._resolve_source_file_hint(spec)
try:
global_index = 0
for page_no, page_records, _, _ in self.api.iter_paginated(
endpoint=spec.endpoint,
params=params,
page_size=page_size,
data_path=spec.data_path,
list_key=spec.list_key,
):
rows: List[dict] = []
for raw in page_records:
row = self._build_row(
spec=spec,
store_id=store_id,
record=raw,
page_no=page_no if spec.include_page_no else None,
page_size_value=len(page_records)
if spec.include_page_size
else None,
source_file=source_file,
record_index=global_index if spec.include_record_index else None,
)
if row is None:
counts["skipped"] += 1
continue
rows.append(row)
global_index += 1
inserted, updated, _ = loader.upsert_rows(rows)
counts["inserted"] += inserted
counts["updated"] += updated
counts["fetched"] += len(page_records)
self.db.commit()
self.logger.info("%s ODS task finished: %s", spec.code, counts)
return self._build_result("SUCCESS", counts)
except Exception:
self.db.rollback()
counts["errors"] += 1
self.logger.error("%s ODS task failed", spec.code, exc_info=True)
raise
def _build_params(self, spec: OdsTaskSpec, store_id: int) -> dict:
base: dict[str, Any] = {}
if spec.include_site_id:
base["siteId"] = store_id
if spec.requires_window and spec.time_fields:
window_start, window_end, _ = self._get_time_window()
start_key, end_key = spec.time_fields
base[start_key] = TypeParser.format_timestamp(window_start, self.tz)
base[end_key] = TypeParser.format_timestamp(window_end, self.tz)
params = self._merge_common_params(base)
params.update(spec.extra_params)
return params
def _resolve_columns(self, spec: OdsTaskSpec) -> List[str]:
columns: List[str] = []
if spec.include_site_column:
columns.append("site_id")
seen = set(columns)
for col_spec in list(spec.pk_columns) + list(spec.extra_columns):
if col_spec.column not in seen:
columns.append(col_spec.column)
seen.add(col_spec.column)
if spec.include_record_index and "record_index" not in seen:
columns.append("record_index")
seen.add("record_index")
if spec.include_page_no and "page_no" not in seen:
columns.append("page_no")
seen.add("page_no")
if spec.include_page_size and "page_size" not in seen:
columns.append("page_size")
seen.add("page_size")
if spec.include_source_file and "source_file" not in seen:
columns.append("source_file")
seen.add("source_file")
if spec.include_source_endpoint and "source_endpoint" not in seen:
columns.append("source_endpoint")
seen.add("source_endpoint")
if spec.include_fetched_at and "fetched_at" not in seen:
columns.append("fetched_at")
seen.add("fetched_at")
if "payload" not in seen:
columns.append("payload")
return columns
def _build_row(
self,
spec: OdsTaskSpec,
store_id: int,
record: dict,
page_no: int | None,
page_size_value: int | None,
source_file: str | None,
record_index: int | None = None,
) -> dict | None:
row: dict[str, Any] = {}
if spec.include_site_column:
row["site_id"] = store_id
for col_spec in spec.pk_columns + spec.extra_columns:
value = self._extract_value(record, col_spec)
if value is None and col_spec.required:
self.logger.warning(
"%s missing required field %s, raw record: %s",
spec.code,
col_spec.column,
record,
)
return None
row[col_spec.column] = value
if spec.include_page_no:
row["page_no"] = page_no
if spec.include_page_size:
row["page_size"] = page_size_value
if spec.include_record_index:
row["record_index"] = record_index
if spec.include_source_file:
row["source_file"] = source_file
if spec.include_source_endpoint:
row["source_endpoint"] = spec.endpoint
if spec.include_fetched_at:
row["fetched_at"] = datetime.now(self.tz)
row["payload"] = record
return row
def _extract_value(self, record: dict, spec: ColumnSpec):
value = None
for key in spec.sources:
value = self._dig(record, key)
if value is not None:
break
if value is None and spec.default is not None:
value = spec.default
if value is not None and spec.transform:
value = spec.transform(value)
return value
@staticmethod
def _dig(record: Any, path: str | None):
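"""Resolve a dotted path in nested dicts.
Example: _dig({"a": {"b": 1}}, "a.b") -> 1. Returns None when any
segment is missing or a non-dict is reached along the way.
"""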
if not path:
return None
current = record
for part in path.split("."):
if isinstance(current, dict):
current = current.get(part)
else:
return None
return current
def _resolve_source_file_hint(self, spec: OdsTaskSpec) -> str | None:
resolver = getattr(self.api, "get_source_hint", None)
if callable(resolver):
return resolver(spec.endpoint)
return None
def _int_col(name: str, *sources: str, required: bool = False) -> ColumnSpec:
return ColumnSpec(
column=name,
sources=sources,
required=required,
transform=TypeParser.parse_int,
)
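# Illustrative spec (not registered): a required integer PK built with
# _int_col plus an extra column pulled from a nested "member.id" field.
#
#   OdsTaskSpec(
#       code="ODS_EXAMPLE",
#       class_name="OdsExampleTask",
#       table_name="billiards_ods.example_records",
#       endpoint="/Example/List",
#       list_key="exampleList",
#       pk_columns=(_int_col("id", "id", required=True),),
#       extra_columns=(ColumnSpec(column="member_id", sources=("member.id",)),),
#   )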
ODS_TASK_SPECS: Tuple[OdsTaskSpec, ...] = (
OdsTaskSpec(
code="ODS_ASSISTANT_ACCOUNTS",
class_name="OdsAssistantAccountsTask",
table_name="billiards_ods.assistant_accounts_master",
endpoint="/PersonnelManagement/SearchAssistantInfo",
data_path=("data",),
list_key="assistantInfos",
pk_columns=(_int_col("id", "id", required=True),),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
description="Assistant account profile ODS (SearchAssistantInfo -> assistantInfos raw JSON)",
),
OdsTaskSpec(
code="ODS_ORDER_SETTLE",
class_name="OdsOrderSettleTask",
table_name="billiards_ods.settlement_records",
endpoint="/Site/GetAllOrderSettleList",
data_path=("data",),
list_key="settleList",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Settlement record ODS (GetAllOrderSettleList -> settleList raw JSON)",
),
OdsTaskSpec(
code="ODS_TABLE_USE",
class_name="OdsTableUseTask",
table_name="billiards_ods.table_fee_transactions",
endpoint="/Site/GetSiteTableOrderDetails",
data_path=("data",),
list_key="siteTableUseDetailsList",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Table-fee billing ODS (GetSiteTableOrderDetails -> siteTableUseDetailsList raw JSON)",
),
OdsTaskSpec(
code="ODS_ASSISTANT_LEDGER",
class_name="OdsAssistantLedgerTask",
table_name="billiards_ods.assistant_service_records",
endpoint="/AssistantPerformance/GetOrderAssistantDetails",
data_path=("data",),
list_key="orderAssistantDetails",
pk_columns=(_int_col("id", "id", required=True),),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
description="Assistant service ledger ODS (GetOrderAssistantDetails -> orderAssistantDetails raw JSON)",
),
OdsTaskSpec(
code="ODS_ASSISTANT_ABOLISH",
class_name="OdsAssistantAbolishTask",
table_name="billiards_ods.assistant_cancellation_records",
endpoint="/AssistantPerformance/GetAbolitionAssistant",
data_path=("data",),
list_key="abolitionAssistants",
pk_columns=(_int_col("id", "id", required=True),),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
description="Assistant abolish record ODS (GetAbolitionAssistant -> abolitionAssistants raw JSON)",
),
OdsTaskSpec(
code="ODS_GOODS_LEDGER",
class_name="OdsGoodsLedgerTask",
table_name="billiards_ods.store_goods_sales_records",
endpoint="/TenantGoods/GetGoodsSalesList",
data_path=("data",),
list_key="orderGoodsLedgers",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Store goods sales ODS (GetGoodsSalesList -> orderGoodsLedgers raw JSON)",
),
OdsTaskSpec(
code="ODS_PAYMENT",
class_name="OdsPaymentTask",
table_name="billiards_ods.payment_transactions",
endpoint="/PayLog/GetPayLogListPage",
data_path=("data",),
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Payment transaction ODS (GetPayLogListPage raw JSON)",
),
OdsTaskSpec(
code="ODS_REFUND",
class_name="OdsRefundTask",
table_name="billiards_ods.refund_transactions",
endpoint="/Order/GetRefundPayLogList",
data_path=("data",),
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Refund transaction ODS (GetRefundPayLogList raw JSON)",
),
OdsTaskSpec(
code="ODS_COUPON_VERIFY",
class_name="OdsCouponVerifyTask",
table_name="billiards_ods.platform_coupon_redemption_records",
endpoint="/Promotion/GetOfflineCouponConsumePageList",
data_path=("data",),
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Platform/group-buy coupon redemption ODS (GetOfflineCouponConsumePageList raw JSON)",
),
OdsTaskSpec(
code="ODS_MEMBER",
class_name="OdsMemberTask",
table_name="billiards_ods.member_profiles",
endpoint="/MemberProfile/GetTenantMemberList",
data_path=("data",),
list_key="tenantMemberInfos",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Member profile ODS (GetTenantMemberList -> tenantMemberInfos raw JSON)",
),
OdsTaskSpec(
code="ODS_MEMBER_CARD",
class_name="OdsMemberCardTask",
table_name="billiards_ods.member_stored_value_cards",
endpoint="/MemberProfile/GetTenantMemberCardList",
data_path=("data",),
list_key="tenantMemberCards",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Member stored-value card ODS (GetTenantMemberCardList -> tenantMemberCards raw JSON)",
),
OdsTaskSpec(
code="ODS_MEMBER_BALANCE",
class_name="OdsMemberBalanceTask",
table_name="billiards_ods.member_balance_changes",
endpoint="/MemberProfile/GetMemberCardBalanceChange",
data_path=("data",),
list_key="tenantMemberCardLogs",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Member balance change ODS (GetMemberCardBalanceChange -> tenantMemberCardLogs raw JSON)",
),
OdsTaskSpec(
code="ODS_RECHARGE_SETTLE",
class_name="OdsRechargeSettleTask",
table_name="billiards_ods.recharge_settlements",
endpoint="/Site/GetRechargeSettleList",
data_path=("data",),
list_key="settleList",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Member recharge settlement ODS (GetRechargeSettleList -> settleList raw JSON)",
),
OdsTaskSpec(
code="ODS_PACKAGE",
class_name="OdsPackageTask",
table_name="billiards_ods.group_buy_packages",
endpoint="/PackageCoupon/QueryPackageCouponList",
data_path=("data",),
list_key="packageCouponList",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Group-buy package definition ODS (QueryPackageCouponList -> packageCouponList raw JSON)",
),
OdsTaskSpec(
code="ODS_GROUP_BUY_REDEMPTION",
class_name="OdsGroupBuyRedemptionTask",
table_name="billiards_ods.group_buy_redemption_records",
endpoint="/Site/GetSiteTableUseDetails",
data_path=("data",),
list_key="siteTableUseDetailsList",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Group-buy package redemption ODS (GetSiteTableUseDetails -> siteTableUseDetailsList raw JSON)",
),
OdsTaskSpec(
code="ODS_INVENTORY_STOCK",
class_name="OdsInventoryStockTask",
table_name="billiards_ods.goods_stock_summary",
endpoint="/TenantGoods/GetGoodsStockReport",
data_path=("data",),
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Stock summary ODS (GetGoodsStockReport raw JSON)",
),
OdsTaskSpec(
code="ODS_INVENTORY_CHANGE",
class_name="OdsInventoryChangeTask",
table_name="billiards_ods.goods_stock_movements",
endpoint="/GoodsStockManage/QueryGoodsOutboundReceipt",
data_path=("data",),
list_key="queryDeliveryRecordsList",
pk_columns=(_int_col("sitegoodsstockid", "siteGoodsStockId", required=True),),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
description="Stock movement ODS (QueryGoodsOutboundReceipt -> queryDeliveryRecordsList raw JSON)",
),
OdsTaskSpec(
code="ODS_TABLES",
class_name="OdsTablesTask",
table_name="billiards_ods.site_tables_master",
endpoint="/Table/GetSiteTables",
data_path=("data",),
list_key="siteTables",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Table dimension ODS (GetSiteTables -> siteTables raw JSON)",
),
OdsTaskSpec(
code="ODS_GOODS_CATEGORY",
class_name="OdsGoodsCategoryTask",
table_name="billiards_ods.stock_goods_category_tree",
endpoint="/TenantGoodsCategory/QueryPrimarySecondaryCategory",
data_path=("data",),
list_key="goodsCategoryList",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Stock goods category tree ODS (QueryPrimarySecondaryCategory -> goodsCategoryList raw JSON)",
),
OdsTaskSpec(
code="ODS_STORE_GOODS",
class_name="OdsStoreGoodsTask",
table_name="billiards_ods.store_goods_master",
endpoint="/TenantGoods/GetGoodsInventoryList",
data_path=("data",),
list_key="orderGoodsList",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Store goods master ODS (GetGoodsInventoryList -> orderGoodsList raw JSON)",
),
OdsTaskSpec(
code="ODS_TABLE_DISCOUNT",
class_name="OdsTableDiscountTask",
table_name="billiards_ods.table_fee_discount_records",
endpoint="/Site/GetTaiFeeAdjustList",
data_path=("data",),
list_key="taiFeeAdjustInfos",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Table-fee discount/adjustment ODS (GetTaiFeeAdjustList -> taiFeeAdjustInfos raw JSON)",
),
OdsTaskSpec(
code="ODS_TENANT_GOODS",
class_name="OdsTenantGoodsTask",
table_name="billiards_ods.tenant_goods_master",
endpoint="/TenantGoods/QueryTenantGoods",
data_path=("data",),
list_key="tenantGoodsList",
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=False,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
description="Tenant goods master ODS (QueryTenantGoods -> tenantGoodsList raw JSON)",
),
OdsTaskSpec(
code="ODS_SETTLEMENT_TICKET",
class_name="OdsSettlementTicketTask",
table_name="billiards_ods.settlement_ticket_details",
endpoint="/Order/GetOrderSettleTicketNew",
data_path=(),
list_key=None,
pk_columns=(),
include_site_column=False,
include_source_endpoint=False,
include_page_no=False,
include_page_size=False,
include_fetched_at=True,
include_record_index=True,
conflict_columns_override=("source_file", "record_index"),
requires_window=False,
include_site_id=False,
description="Settlement ticket detail ODS (GetOrderSettleTicketNew raw JSON)",
),
)
def _get_spec(code: str) -> OdsTaskSpec:
for spec in ODS_TASK_SPECS:
if spec.code == code:
return spec
raise KeyError(f"Spec not found for code {code}")
_SETTLEMENT_TICKET_SPEC = _get_spec("ODS_SETTLEMENT_TICKET")
class OdsSettlementTicketTask(BaseOdsTask):
"""Special handling: fetch ticket details per payment relate_id/orderSettleId."""
SPEC = _SETTLEMENT_TICKET_SPEC
def extract(self, context) -> dict:
"""Fetch ticket payloads only (used by fetch-only pipeline)."""
existing_ids = self._fetch_existing_ticket_ids()
candidates = self._collect_settlement_ids(
context.store_id or 0, existing_ids, context.window_start, context.window_end
)
candidates = [cid for cid in candidates if cid and cid not in existing_ids]
payloads, skipped = self._fetch_ticket_payloads(candidates)
return {"records": payloads, "skipped": skipped, "fetched": len(candidates)}
def execute(self, cursor_data: dict | None = None) -> dict:
spec = self.SPEC
context = self._build_context(cursor_data)
store_id = TypeParser.parse_int(self.config.get("app.store_id")) or 0
counts = {"fetched": 0, "inserted": 0, "updated": 0, "skipped": 0, "errors": 0}
loader = GenericODSLoader(
self.db,
spec.table_name,
self._resolve_columns(spec),
list(spec.conflict_columns_override or ("source_file", "record_index")),
)
source_file = self._resolve_source_file_hint(spec)
try:
existing_ids = self._fetch_existing_ticket_ids()
candidates = self._collect_settlement_ids(
store_id, existing_ids, context.window_start, context.window_end
)
candidates = [cid for cid in candidates if cid and cid not in existing_ids]
counts["fetched"] = len(candidates)
if not candidates:
self.logger.info(
"%s: window [%s ~ %s] has no new tickets to fetch",
spec.code,
context.window_start,
context.window_end,
)
return self._build_result("SUCCESS", counts)
payloads, skipped = self._fetch_ticket_payloads(candidates)
counts["skipped"] += skipped
rows: list[dict] = []
for idx, payload in enumerate(payloads):
row = self._build_row(
spec=spec,
store_id=store_id,
record=payload,
page_no=None,
page_size_value=None,
source_file=source_file,
record_index=idx if spec.include_record_index else None,
)
if row is None:
counts["skipped"] += 1
continue
rows.append(row)
inserted, updated, _ = loader.upsert_rows(rows)
counts["inserted"] += inserted
counts["updated"] += updated
self.db.commit()
self.logger.info(
"%s: ticket fetch finished, candidates=%s inserted=%s updated=%s skipped=%s",
spec.code,
len(candidates),
inserted,
updated,
counts["skipped"],
)
return self._build_result("SUCCESS", counts)
except Exception:
counts["errors"] += 1
self.db.rollback()
self.logger.error("%s: ticket fetch failed", spec.code, exc_info=True)
raise
# ------------------------------------------------------------------ helpers
def _fetch_existing_ticket_ids(self) -> set[int]:
sql = """
SELECT DISTINCT
CASE WHEN (payload ->> 'orderSettleId') ~ '^[0-9]+$'
THEN (payload ->> 'orderSettleId')::bigint
END AS order_settle_id
FROM billiards_ods.settlement_ticket_details
"""
try:
rows = self.db.query(sql)
except Exception:
self.logger.warning("Failed to query existing tickets; treating as empty set", exc_info=True)
return set()
return {
TypeParser.parse_int(row.get("order_settle_id"))
for row in rows
if row.get("order_settle_id") is not None
}
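# The `(payload ->> 'x') ~ '^[0-9]+$'` guard above is a PostgreSQL idiom:
# ->> extracts the JSON field as text, the regex confirms it is all digits,
# and only then is it cast to bigint, avoiding cast errors on null, empty,
# or non-numeric values. The same pattern recurs in the helpers below.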
def _collect_settlement_ids(
self, store_id: int, existing_ids: set[int], window_start, window_end
) -> list[int]:
ids = self._fetch_from_payment_table(store_id)
if not ids:
ids = self._fetch_from_payment_api(store_id, window_start, window_end)
return sorted(i for i in ids if i is not None and i not in existing_ids)
def _fetch_from_payment_table(self, store_id: int) -> set[int]:
sql = """
SELECT DISTINCT COALESCE(
CASE WHEN (payload ->> 'orderSettleId') ~ '^[0-9]+$'
THEN (payload ->> 'orderSettleId')::bigint END,
CASE WHEN (payload ->> 'relateId') ~ '^[0-9]+$'
THEN (payload ->> 'relateId')::bigint END
) AS order_settle_id
FROM billiards_ods.payment_transactions
WHERE ((payload ->> 'orderSettleId') ~ '^[0-9]+$'
OR (payload ->> 'relateId') ~ '^[0-9]+$')  -- parenthesized so the optional AND filter below binds to both branches
"""
params = None
if store_id:
sql += " AND COALESCE((payload ->> 'siteId')::bigint, %s) = %s"
params = (store_id, store_id)
try:
rows = self.db.query(sql, params)
except Exception:
self.logger.warning("Failed to read payment transactions for settlement ids; will fall back to the payment API", exc_info=True)
return set()
return {
TypeParser.parse_int(row.get("order_settle_id"))
for row in rows
if row.get("order_settle_id") is not None
}
def _fetch_from_payment_api(self, store_id: int, window_start, window_end) -> set[int]:
params = self._merge_common_params(
{
"siteId": store_id,
"StartPayTime": TypeParser.format_timestamp(window_start, self.tz),
"EndPayTime": TypeParser.format_timestamp(window_end, self.tz),
}
)
candidate_ids: set[int] = set()
try:
for _, records, _, _ in self.api.iter_paginated(
endpoint="/PayLog/GetPayLogListPage",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
):
for rec in records:
relate_id = TypeParser.parse_int(
(rec or {}).get("relateId")
or (rec or {}).get("orderSettleId")
or (rec or {}).get("order_settle_id")
)
if relate_id:
candidate_ids.add(relate_id)
except Exception:
self.logger.warning("Payment API call for settlement ids failed; skipping this fallback source for the current batch", exc_info=True)
return candidate_ids
def _fetch_ticket_payload(self, order_settle_id: int):
payload = None
try:
for _, _, _, response in self.api.iter_paginated(
endpoint=self.SPEC.endpoint,
params={"orderSettleId": order_settle_id},
page_size=None,
data_path=self.SPEC.data_path,
list_key=self.SPEC.list_key,
):
payload = response
except Exception:
self.logger.warning(
"Ticket API call failed, orderSettleId=%s", order_settle_id, exc_info=True
)
if isinstance(payload, dict) and isinstance(payload.get("data"), list) and len(payload["data"]) == 1:
# Local stubs/replays may wrap the response in a single-element list; unwrap it to match the real structure
payload = payload["data"][0]
return payload
def _fetch_ticket_payloads(self, candidates: list[int]) -> tuple[list, int]:
"""Fetch ticket payloads for a set of orderSettleIds; returns (payloads, skipped)."""
payloads: list = []
skipped = 0
for order_settle_id in candidates:
payload = self._fetch_ticket_payload(order_settle_id)
if payload:
payloads.append(payload)
else:
skipped += 1
return payloads, skipped
def _build_task_class(spec: OdsTaskSpec) -> Type[BaseOdsTask]:
attrs = {
"SPEC": spec,
"__doc__": spec.description or f"ODS ingestion task {spec.code}",
"__module__": __name__,
}
return type(spec.class_name, (BaseOdsTask,), attrs)
ENABLED_ODS_CODES = {
"ODS_ASSISTANT_ACCOUNTS",
"ODS_ASSISTANT_LEDGER",
"ODS_ASSISTANT_ABOLISH",
"ODS_INVENTORY_CHANGE",
"ODS_INVENTORY_STOCK",
"ODS_PACKAGE",
"ODS_GROUP_BUY_REDEMPTION",
"ODS_MEMBER",
"ODS_MEMBER_BALANCE",
"ODS_MEMBER_CARD",
"ODS_PAYMENT",
"ODS_REFUND",
"ODS_COUPON_VERIFY",
"ODS_RECHARGE_SETTLE",
"ODS_TABLES",
"ODS_GOODS_CATEGORY",
"ODS_STORE_GOODS",
"ODS_TABLE_DISCOUNT",
"ODS_TENANT_GOODS",
"ODS_SETTLEMENT_TICKET",
"ODS_ORDER_SETTLE",
}
ODS_TASK_CLASSES: Dict[str, Type[BaseOdsTask]] = {
spec.code: _build_task_class(spec)
for spec in ODS_TASK_SPECS
if spec.code in ENABLED_ODS_CODES
}
# Override with specialized settlement ticket implementation
ODS_TASK_CLASSES["ODS_SETTLEMENT_TICKET"] = OdsSettlementTicketTask
__all__ = ["ODS_TASK_CLASSES", "ODS_TASK_SPECS", "BaseOdsTask", "ENABLED_ODS_CODES"]
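An orchestrator consumes the generated classes through the registry; a sketch of the intended usage (the module import path is assumed):

```python
# Sketch of driving a generated ODS task; the constructor signature
# (config, db, api, logger) is inherited from BaseTask.
from tasks.ods_tasks import ODS_TASK_CLASSES  # module path assumed

def run_ods(code: str, config, db, api, logger) -> dict:
    task_cls = ODS_TASK_CLASSES[code]          # e.g. "ODS_MEMBER"
    task = task_cls(config, db, api, logger)
    return task.execute()
```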

View File

@@ -0,0 +1,91 @@
# -*- coding: utf-8 -*-
"""Order ETL task."""
import json
from .base_task import BaseTask, TaskContext
from loaders.facts.order import OrderLoader
from models.parsers import TypeParser
class OrdersTask(BaseTask):
"""Order data ETL task."""
def get_task_code(self) -> str:
return "ORDERS"
# ------------------------------------------------------------------ E/T/L hooks
def extract(self, context: TaskContext) -> dict:
"""Call the API to pull order records."""
params = self._merge_common_params(
{
"siteId": context.store_id,
"rangeStartTime": TypeParser.format_timestamp(context.window_start, self.tz),
"rangeEndTime": TypeParser.format_timestamp(context.window_end, self.tz),
}
)
records, pages_meta = self.api.get_paginated(
endpoint="/Site/GetAllOrderSettleList",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
list_key="settleList",
)
return {"records": records, "meta": pages_meta}
def transform(self, extracted: dict, context: TaskContext) -> dict:
"""Parse the raw order JSON."""
parsed_records = []
skipped = 0
for rec in extracted.get("records", []):
parsed = self._parse_order(rec, context.store_id)
if parsed:
parsed_records.append(parsed)
else:
skipped += 1
return {
"records": parsed_records,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
"""Write to fact_order."""
loader = OrderLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_orders(
transformed["records"], context.store_id
)
counts = {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
return counts
# ------------------------------------------------------------------ helpers
def _parse_order(self, raw: dict, store_id: int) -> dict | None:
"""Parse a single order record."""
try:
return {
"store_id": store_id,
"order_id": TypeParser.parse_int(raw.get("orderId")),
"order_no": raw.get("orderNo"),
"member_id": TypeParser.parse_int(raw.get("memberId")),
"table_id": TypeParser.parse_int(raw.get("tableId")),
"order_time": TypeParser.parse_timestamp(raw.get("orderTime"), self.tz),
"end_time": TypeParser.parse_timestamp(raw.get("endTime"), self.tz),
"total_amount": TypeParser.parse_decimal(raw.get("totalAmount")),
"discount_amount": TypeParser.parse_decimal(raw.get("discountAmount")),
"final_amount": TypeParser.parse_decimal(raw.get("finalAmount")),
"pay_status": raw.get("payStatus"),
"order_status": raw.get("orderStatus"),
"remark": raw.get("remark"),
"raw_data": json.dumps(raw, ensure_ascii=False),
}
except Exception as exc:
self.logger.warning("Failed to parse order: %s, raw data: %s", exc, raw)
return None

View File

@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
"""Group-buy/package definition task."""
import json
from .base_task import BaseTask, TaskContext
from loaders.dimensions.package import PackageDefinitionLoader
from models.parsers import TypeParser
class PackagesDefTask(BaseTask):
"""Sync group-buy package definitions."""
def get_task_code(self) -> str:
return "PACKAGES_DEF"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params({"siteId": context.store_id})
records, _ = self.api.get_paginated(
endpoint="/PackageCoupon/QueryPackageCouponList",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
list_key="packageCouponList",
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
mapped = self._parse_package(raw, context.store_id)
if mapped:
parsed.append(mapped)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = PackageDefinitionLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_packages(transformed["records"])
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_package(self, raw: dict, store_id: int) -> dict | None:
package_id = TypeParser.parse_int(raw.get("id"))
if not package_id:
self.logger.warning("Skipping package record missing package id: %s", raw)
return None
return {
"store_id": store_id,
"package_id": package_id,
"package_code": raw.get("package_id") or raw.get("packageId"),
"package_name": raw.get("package_name"),
"table_area_id": raw.get("table_area_id"),
"table_area_name": raw.get("table_area_name"),
"selling_price": TypeParser.parse_decimal(
raw.get("selling_price") or raw.get("sellingPrice")
),
"duration_seconds": TypeParser.parse_int(raw.get("duration")),
"start_time": TypeParser.parse_timestamp(
raw.get("start_time") or raw.get("startTime"), self.tz
),
"end_time": TypeParser.parse_timestamp(
raw.get("end_time") or raw.get("endTime"), self.tz
),
"type": raw.get("type"),
"is_enabled": raw.get("is_enabled"),
"is_delete": raw.get("is_delete"),
"usable_count": TypeParser.parse_int(raw.get("usable_count")),
"creator_name": raw.get("creator_name"),
"date_type": raw.get("date_type"),
"group_type": raw.get("group_type"),
"coupon_money": TypeParser.parse_decimal(
raw.get("coupon_money") or raw.get("couponMoney")
),
"area_tag_type": raw.get("area_tag_type"),
"system_group_type": raw.get("system_group_type"),
"card_type_ids": raw.get("card_type_ids"),
"raw_data": json.dumps(raw, ensure_ascii=False),
}
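
Every `_parse_*` helper leans on `TypeParser` to coerce loosely typed API fields. A minimal sketch of the three helpers used above, assuming they swallow bad input and return `None` (the real implementation is in `models/parsers.py` and may differ):

```python
# Illustrative TypeParser sketch; behavior is inferred from the call
# sites, not from models/parsers.py itself.
from datetime import datetime
from decimal import Decimal, InvalidOperation

class TypeParserSketch:
    @staticmethod
    def parse_int(value):
        try:
            return int(value) if value not in (None, "") else None
        except (TypeError, ValueError):
            return None

    @staticmethod
    def parse_decimal(value):
        try:
            return Decimal(str(value)) if value not in (None, "") else None
        except (InvalidOperation, ValueError):
            return None

    @staticmethod
    def parse_timestamp(value, tz):
        if not value:
            return None
        try:  # assumes "YYYY-MM-DD HH:MM:SS" strings from the API
            return datetime.strptime(str(value), "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
        except ValueError:
            return None
```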

View File

@@ -0,0 +1,138 @@
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.facts.payment import PaymentLoader
from models.parsers import TypeParser
import json
class PaymentsDwdTask(BaseDwdTask):
"""
DWD Task: Process Payment Records from ODS to Fact Table
Source: billiards_ods.ods_payment_record
Target: billiards.fact_payment
"""
def get_task_code(self) -> str:
return "PAYMENTS_DWD"
def execute(self) -> dict:
self.logger.info(f"Starting {self.get_task_code()} task")
window_start, window_end, _ = self._get_time_window()
self.logger.info(f"Processing window: {window_start} to {window_end}")
loader = PaymentLoader(self.db, logger=self.logger)
store_id = self.config.get("app.store_id")
total_inserted = 0
total_updated = 0
total_skipped = 0
# Iterate ODS Data
batches = self.iter_ods_rows(
table_name="billiards_ods.ods_payment_record",
columns=["site_id", "pay_id", "payload", "fetched_at"],
start_time=window_start,
end_time=window_end
)
for batch in batches:
if not batch:
continue
parsed_rows = []
for row in batch:
payload = self.parse_payload(row)
if not payload:
continue
parsed = self._parse_payment(payload, store_id)
if parsed:
parsed_rows.append(parsed)
if parsed_rows:
inserted, updated, skipped = loader.upsert_payments(parsed_rows, store_id)
total_inserted += inserted
total_updated += updated
total_skipped += skipped
self.db.commit()
self.logger.info(
"Task %s completed. inserted=%s updated=%s skipped=%s",
self.get_task_code(),
total_inserted,
total_updated,
total_skipped,
)
return {
"status": "SUCCESS",
"counts": {
"inserted": total_inserted,
"updated": total_updated,
"skipped": total_skipped,
},
"window_start": window_start,
"window_end": window_end,
}
def _parse_payment(self, raw: dict, store_id: int) -> dict | None:
"""Parse ODS payload into Fact structure"""
try:
pay_id = TypeParser.parse_int(raw.get("payId") or raw.get("id"))
if not pay_id:
return None
relate_type = str(raw.get("relateType") or raw.get("relate_type") or "")
relate_id = TypeParser.parse_int(raw.get("relateId") or raw.get("relate_id"))
# Attempt to populate settlement / trade identifiers
order_settle_id = TypeParser.parse_int(
raw.get("orderSettleId") or raw.get("order_settle_id")
)
order_trade_no = TypeParser.parse_int(
raw.get("orderTradeNo") or raw.get("order_trade_no")
)
if relate_type in {"1", "SETTLE", "ORDER"}:
order_settle_id = order_settle_id or relate_id
return {
"store_id": store_id,
"pay_id": pay_id,
"order_id": TypeParser.parse_int(raw.get("orderId") or raw.get("order_id")),
"order_settle_id": order_settle_id,
"order_trade_no": order_trade_no,
"relate_type": relate_type,
"relate_id": relate_id,
"site_id": TypeParser.parse_int(
raw.get("siteId") or raw.get("site_id") or store_id
),
"tenant_id": TypeParser.parse_int(raw.get("tenantId") or raw.get("tenant_id")),
"create_time": TypeParser.parse_timestamp(
raw.get("createTime") or raw.get("create_time"), self.tz
),
"pay_time": TypeParser.parse_timestamp(raw.get("payTime"), self.tz),
"pay_amount": TypeParser.parse_decimal(raw.get("payAmount")),
"fee_amount": TypeParser.parse_decimal(
raw.get("feeAmount")
or raw.get("serviceFee")
or raw.get("channelFee")
or raw.get("fee_amount")
),
"discount_amount": TypeParser.parse_decimal(
raw.get("discountAmount")
or raw.get("couponAmount")
or raw.get("discount_amount")
),
"payment_method": str(raw.get("paymentMethod") or raw.get("payment_method") or ""),
"pay_type": raw.get("payType") or raw.get("pay_type"),
"online_pay_channel": raw.get("onlinePayChannel") or raw.get("online_pay_channel"),
"pay_terminal": raw.get("payTerminal") or raw.get("pay_terminal"),
"pay_status": str(raw.get("payStatus") or raw.get("pay_status") or ""),
"remark": raw.get("remark"),
"raw_data": json.dumps(raw, ensure_ascii=False)
}
except Exception as e:
self.logger.warning("Error parsing payment: %s, raw data: %s", e, raw)
return None
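
`iter_ods_rows` is the shared ODS access path for the DWD tasks. A hedged sketch of how it could stream batches filtered by `fetched_at` (assumed implementation; the real helper lives on `BaseDwdTask` in `tasks/base_dwd_task.py`):

```python
# Assumed sketch of BaseDwdTask.iter_ods_rows; the cursor API and the
# default batch size are illustrative, not confirmed.
def iter_ods_rows(self, table_name, columns, start_time, end_time, batch_size=1000):
    sql = (
        f"SELECT {', '.join(columns)} FROM {table_name} "
        "WHERE fetched_at >= %s AND fetched_at < %s ORDER BY fetched_at"
    )
    with self.db.cursor() as cur:
        cur.execute(sql, (start_time, end_time))
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            yield batch
```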

View File

@@ -0,0 +1,111 @@
# -*- coding: utf-8 -*-
"""支付记录ETL任务"""
import json
from .base_task import BaseTask, TaskContext
from loaders.facts.payment import PaymentLoader
from models.parsers import TypeParser
class PaymentsTask(BaseTask):
"""支付记录 E/T/L 任务"""
def get_task_code(self) -> str:
return "PAYMENTS"
# ------------------------------------------------------------------ E/T/L hooks
def extract(self, context: TaskContext) -> dict:
"""调用 API 抓取支付记录"""
params = self._merge_common_params(
{
"siteId": context.store_id,
"StartPayTime": TypeParser.format_timestamp(context.window_start, self.tz),
"EndPayTime": TypeParser.format_timestamp(context.window_end, self.tz),
}
)
records, pages_meta = self.api.get_paginated(
endpoint="/PayLog/GetPayLogListPage",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
)
return {"records": records, "meta": pages_meta}
def transform(self, extracted: dict, context: TaskContext) -> dict:
"""解析支付 JSON"""
parsed, skipped = [], 0
for rec in extracted.get("records", []):
cleaned = self._parse_payment(rec, context.store_id)
if cleaned:
parsed.append(cleaned)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
"""写入 fact_payment"""
loader = PaymentLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_payments(
transformed["records"], context.store_id
)
counts = {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
return counts
# ------------------------------------------------------------------ helpers
def _parse_payment(self, raw: dict, store_id: int) -> dict | None:
"""解析支付记录"""
try:
return {
"store_id": store_id,
"pay_id": TypeParser.parse_int(raw.get("payId") or raw.get("id")),
"order_id": TypeParser.parse_int(raw.get("orderId")),
"order_settle_id": TypeParser.parse_int(
raw.get("orderSettleId") or raw.get("order_settle_id")
),
"order_trade_no": TypeParser.parse_int(
raw.get("orderTradeNo") or raw.get("order_trade_no")
),
"relate_type": raw.get("relateType") or raw.get("relate_type"),
"relate_id": TypeParser.parse_int(raw.get("relateId") or raw.get("relate_id")),
"site_id": TypeParser.parse_int(
raw.get("siteId") or raw.get("site_id") or store_id
),
"tenant_id": TypeParser.parse_int(raw.get("tenantId") or raw.get("tenant_id")),
"pay_time": TypeParser.parse_timestamp(raw.get("payTime"), self.tz),
"create_time": TypeParser.parse_timestamp(
raw.get("createTime") or raw.get("create_time"), self.tz
),
"pay_amount": TypeParser.parse_decimal(raw.get("payAmount")),
"fee_amount": TypeParser.parse_decimal(
raw.get("feeAmount")
or raw.get("serviceFee")
or raw.get("channelFee")
or raw.get("fee_amount")
),
"discount_amount": TypeParser.parse_decimal(
raw.get("discountAmount")
or raw.get("couponAmount")
or raw.get("discount_amount")
),
"pay_type": raw.get("payType"),
"payment_method": raw.get("paymentMethod") or raw.get("payment_method"),
"online_pay_channel": raw.get("onlinePayChannel")
or raw.get("online_pay_channel"),
"pay_status": raw.get("payStatus"),
"pay_terminal": raw.get("payTerminal") or raw.get("pay_terminal"),
"remark": raw.get("remark"),
"raw_data": json.dumps(raw, ensure_ascii=False),
}
except Exception as exc:
self.logger.warning("解析支付记录失败: %s, 原始数据: %s", exc, raw)
return None
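
All extract hooks funnel through `api.get_paginated`. Judging purely from the call sites, a sketch of the loop could look like this; the paging parameter names `pageIndex`/`pageSize` are guesses, and the real client under `api/` also handles retries, auth headers, and error codes:

```python
# Hedged pagination sketch, assumed from how the tasks call it.
import requests

def get_paginated(base_url, endpoint, params, page_size=200,
                  data_path=("data",), list_key=None):
    records, page = [], 1
    while True:
        query = dict(params, pageIndex=page, pageSize=page_size)
        body = requests.get(base_url + endpoint, params=query, timeout=30).json()
        node = body
        for key in data_path:          # descend to the page payload
            node = node.get(key) or {}
        rows = node.get(list_key) if list_key else node
        if not isinstance(rows, list) or not rows:
            break
        records.extend(rows)
        page += 1
    return records, {"pages_fetched": page - 1}
```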

View File

@@ -0,0 +1,93 @@
# -*- coding: utf-8 -*-
"""商品档案PRODUCTSETL任务"""
import json
from .base_task import BaseTask, TaskContext
from loaders.dimensions.product import ProductLoader
from models.parsers import TypeParser
class ProductsTask(BaseTask):
"""商品维度 ETL 任务"""
def get_task_code(self) -> str:
return "PRODUCTS"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params({"siteId": context.store_id})
records, _ = self.api.get_paginated(
endpoint="/TenantGoods/QueryTenantGoods",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
list_key="tenantGoodsList",
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
parsed_row = self._parse_product(raw, context.store_id)
if parsed_row:
parsed.append(parsed_row)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = ProductLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_products(
transformed["records"], context.store_id
)
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_product(self, raw: dict, store_id: int) -> dict | None:
try:
product_id = TypeParser.parse_int(
raw.get("siteGoodsId") or raw.get("tenantGoodsId") or raw.get("productId")
)
if not product_id:
return None
return {
"store_id": store_id,
"product_id": product_id,
"site_product_id": TypeParser.parse_int(raw.get("siteGoodsId")),
"product_name": raw.get("goodsName") or raw.get("productName"),
"category_id": TypeParser.parse_int(
raw.get("tenantGoodsCategoryId") or raw.get("goodsCategoryId")
),
"category_name": raw.get("categoryName"),
"second_category_id": TypeParser.parse_int(raw.get("goodsCategorySecondId")),
"unit": raw.get("goodsUnit"),
"cost_price": TypeParser.parse_decimal(raw.get("costPrice")),
"sale_price": TypeParser.parse_decimal(
raw.get("goodsPrice") or raw.get("salePrice")
),
"allow_discount": None,
"status": raw.get("goodsState") or raw.get("status"),
"supplier_id": TypeParser.parse_int(raw.get("supplierId"))
if raw.get("supplierId")
else None,
"barcode": raw.get("barcode"),
"is_combo": bool(raw.get("isCombo"))
if raw.get("isCombo") is not None
else None,
"created_time": TypeParser.parse_timestamp(raw.get("createTime"), self.tz),
"updated_time": TypeParser.parse_timestamp(raw.get("updateTime"), self.tz),
"raw_data": json.dumps(raw, ensure_ascii=False),
}
except Exception as exc:
self.logger.warning("解析商品记录失败: %s, 原始数据: %s", exc, raw)
return None
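
The loaders behind these tasks share one upsert pattern: a batch `INSERT ... ON CONFLICT DO UPDATE` keyed on `(store_id, <entity>_id)`. A simplified sketch using the product loader as the example; the table and column names here are assumptions, and the real loaders additionally return (inserted, updated, skipped) counts:

```python
# Minimal upsert sketch; billiards.dim_product and its column list are
# illustrative, not confirmed schema.
from psycopg2.extras import execute_values

def upsert_products(conn, rows):
    sql = """
        INSERT INTO billiards.dim_product
            (store_id, product_id, product_name, sale_price, raw_data)
        VALUES %s
        ON CONFLICT (store_id, product_id) DO UPDATE SET
            product_name = EXCLUDED.product_name,
            sale_price   = EXCLUDED.sale_price,
            raw_data     = EXCLUDED.raw_data
    """
    values = [(r["store_id"], r["product_id"], r["product_name"],
               r["sale_price"], r["raw_data"]) for r in rows]
    with conn.cursor() as cur:
        execute_values(cur, sql, values)
```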

View File

@@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-
"""退款记录任务"""
import json
from .base_task import BaseTask, TaskContext
from loaders.facts.refund import RefundLoader
from models.parsers import TypeParser
class RefundsTask(BaseTask):
"""同步支付退款流水"""
def get_task_code(self) -> str:
return "REFUNDS"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params(
{
"siteId": context.store_id,
"startTime": TypeParser.format_timestamp(context.window_start, self.tz),
"endTime": TypeParser.format_timestamp(context.window_end, self.tz),
}
)
records, _ = self.api.get_paginated(
endpoint="/Order/GetRefundPayLogList",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
mapped = self._parse_refund(raw, context.store_id)
if mapped:
parsed.append(mapped)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = RefundLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_refunds(transformed["records"])
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_refund(self, raw: dict, store_id: int) -> dict | None:
refund_id = TypeParser.parse_int(raw.get("id"))
if not refund_id:
self.logger.warning("跳过缺少退款ID的数据: %s", raw)
return None
return {
"store_id": store_id,
"refund_id": refund_id,
"site_id": TypeParser.parse_int(raw.get("site_id") or raw.get("siteId")),
"tenant_id": TypeParser.parse_int(raw.get("tenant_id") or raw.get("tenantId")),
"pay_amount": TypeParser.parse_decimal(raw.get("pay_amount")),
"pay_status": raw.get("pay_status"),
"pay_time": TypeParser.parse_timestamp(
raw.get("pay_time") or raw.get("payTime"), self.tz
),
"create_time": TypeParser.parse_timestamp(
raw.get("create_time") or raw.get("createTime"), self.tz
),
"relate_type": raw.get("relate_type"),
"relate_id": TypeParser.parse_int(raw.get("relate_id")),
"payment_method": raw.get("payment_method"),
"refund_amount": TypeParser.parse_decimal(raw.get("refund_amount")),
"action_type": raw.get("action_type"),
"pay_terminal": raw.get("pay_terminal"),
"operator_id": TypeParser.parse_int(raw.get("operator_id")),
"channel_pay_no": raw.get("channel_pay_no"),
"channel_fee": TypeParser.parse_decimal(raw.get("channel_fee")),
"is_delete": raw.get("is_delete"),
"member_id": TypeParser.parse_int(raw.get("member_id")),
"member_card_id": TypeParser.parse_int(raw.get("member_card_id")),
"raw_data": json.dumps(raw, ensure_ascii=False),
}

View File

@@ -0,0 +1,92 @@
# -*- coding: utf-8 -*-
"""台费折扣任务"""
import json
from .base_task import BaseTask, TaskContext
from loaders.facts.table_discount import TableDiscountLoader
from models.parsers import TypeParser
class TableDiscountTask(BaseTask):
"""同步台费折扣/调价记录"""
def get_task_code(self) -> str:
return "TABLE_DISCOUNT"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params(
{
"siteId": context.store_id,
"startTime": TypeParser.format_timestamp(context.window_start, self.tz),
"endTime": TypeParser.format_timestamp(context.window_end, self.tz),
}
)
records, _ = self.api.get_paginated(
endpoint="/Site/GetTaiFeeAdjustList",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
list_key="taiFeeAdjustInfos",
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
mapped = self._parse_discount(raw, context.store_id)
if mapped:
parsed.append(mapped)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = TableDiscountLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_discounts(transformed["records"])
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_discount(self, raw: dict, store_id: int) -> dict | None:
discount_id = TypeParser.parse_int(raw.get("id"))
if not discount_id:
self.logger.warning("跳过缺少折扣ID的记录: %s", raw)
return None
table_profile = raw.get("tableProfile") or {}
return {
"store_id": store_id,
"discount_id": discount_id,
"adjust_type": raw.get("adjust_type") or raw.get("adjustType"),
"applicant_id": TypeParser.parse_int(raw.get("applicant_id")),
"applicant_name": raw.get("applicant_name"),
"operator_id": TypeParser.parse_int(raw.get("operator_id")),
"operator_name": raw.get("operator_name"),
"ledger_amount": TypeParser.parse_decimal(raw.get("ledger_amount")),
"ledger_count": TypeParser.parse_int(raw.get("ledger_count")),
"ledger_name": raw.get("ledger_name"),
"ledger_status": raw.get("ledger_status"),
"order_settle_id": TypeParser.parse_int(raw.get("order_settle_id")),
"order_trade_no": TypeParser.parse_int(raw.get("order_trade_no")),
"site_table_id": TypeParser.parse_int(
raw.get("site_table_id") or table_profile.get("id")
),
"table_area_id": TypeParser.parse_int(
raw.get("tableAreaId") or table_profile.get("site_table_area_id")
),
"table_area_name": table_profile.get("site_table_area_name"),
"create_time": TypeParser.parse_timestamp(
raw.get("create_time") or raw.get("createTime"), self.tz
),
"is_delete": raw.get("is_delete"),
"raw_data": json.dumps(raw, ensure_ascii=False),
}

View File

@@ -0,0 +1,84 @@
# -*- coding: utf-8 -*-
"""台桌档案任务"""
import json
from .base_task import BaseTask, TaskContext
from loaders.dimensions.table import TableLoader
from models.parsers import TypeParser
class TablesTask(BaseTask):
"""同步门店台桌列表"""
def get_task_code(self) -> str:
return "TABLES"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params({"siteId": context.store_id})
records, _ = self.api.get_paginated(
endpoint="/Table/GetSiteTables",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
list_key="siteTables",
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
mapped = self._parse_table(raw, context.store_id)
if mapped:
parsed.append(mapped)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = TableLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_tables(transformed["records"])
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_table(self, raw: dict, store_id: int) -> dict | None:
table_id = TypeParser.parse_int(raw.get("id"))
if not table_id:
self.logger.warning("跳过缺少 table_id 的台桌记录: %s", raw)
return None
return {
"store_id": store_id,
"table_id": table_id,
"site_id": TypeParser.parse_int(raw.get("site_id") or raw.get("siteId")),
"area_id": TypeParser.parse_int(
raw.get("site_table_area_id") or raw.get("siteTableAreaId")
),
"area_name": raw.get("areaName") or raw.get("site_table_area_name"),
"table_name": raw.get("table_name") or raw.get("tableName"),
"table_price": TypeParser.parse_decimal(
raw.get("table_price") or raw.get("tablePrice")
),
"table_status": raw.get("table_status") or raw.get("tableStatus"),
"table_status_name": raw.get("tableStatusName"),
"light_status": raw.get("light_status"),
"is_rest_area": raw.get("is_rest_area"),
"show_status": raw.get("show_status"),
"virtual_table": raw.get("virtual_table"),
"charge_free": raw.get("charge_free"),
"only_allow_groupon": raw.get("only_allow_groupon"),
"is_online_reservation": raw.get("is_online_reservation"),
"created_time": TypeParser.parse_timestamp(
raw.get("create_time") or raw.get("createTime"), self.tz
),
"raw_data": json.dumps(raw, ensure_ascii=False),
}

View File

@@ -0,0 +1,69 @@
# -*- coding: utf-8 -*-
from .base_dwd_task import BaseDwdTask
from loaders.facts.ticket import TicketLoader
class TicketDwdTask(BaseDwdTask):
"""
DWD Task: Process Ticket Details from ODS to Fact Tables
Source: billiards_ods.settlement_ticket_details
Targets:
- billiards.fact_order
- billiards.fact_order_goods
- billiards.fact_table_usage
- billiards.fact_assistant_service
"""
def get_task_code(self) -> str:
return "TICKET_DWD"
def execute(self) -> dict:
self.logger.info(f"Starting {self.get_task_code()} task")
# 1. Get Time Window (Incremental Load)
window_start, window_end, _ = self._get_time_window()
self.logger.info(f"Processing window: {window_start} to {window_end}")
# 2. Initialize Loader
loader = TicketLoader(self.db, logger=self.logger)
store_id = self.config.get("app.store_id")
total_inserted = 0
total_errors = 0
# 3. Iterate ODS Data
# Query billiards_ods.settlement_ticket_details by fetched_at
batches = self.iter_ods_rows(
table_name="billiards_ods.settlement_ticket_details",
columns=["payload", "fetched_at", "source_file", "record_index"],
start_time=window_start,
end_time=window_end
)
for batch in batches:
if not batch:
continue
# Extract payloads
tickets = []
for row in batch:
payload = self.parse_payload(row)
if payload:
tickets.append(payload)
# Process Batch
inserted, errors = loader.process_tickets(tickets, store_id)
total_inserted += inserted
total_errors += errors
# 4. Commit
self.db.commit()
self.logger.info(f"Task {self.get_task_code()} completed. Inserted: {total_inserted}, Errors: {total_errors}")
return {
"status": "success",
"inserted": total_inserted,
"errors": total_errors,
"window_start": window_start.isoformat(),
"window_end": window_end.isoformat()
}
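
`parse_payload` turns an ODS row back into the original API record. Since the `payload` column may arrive as `jsonb` (already decoded to a dict by psycopg2) or as plain text, a tolerant sketch could look like this (assumed; the real helper sits on `BaseDwdTask`):

```python
# Assumed parse_payload sketch; access by key presumes dict-style rows.
import json

def parse_payload(self, row):
    payload = row.get("payload")
    if payload is None:
        return None
    if isinstance(payload, dict):   # jsonb columns decode to dict already
        return payload
    try:
        return json.loads(payload)  # text/json columns need decoding
    except (TypeError, ValueError):
        self.logger.warning("Skipping row with undecodable payload: %r", row)
        return None
```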

View File

@@ -0,0 +1,102 @@
# -*- coding: utf-8 -*-
"""充值记录任务"""
import json
from .base_task import BaseTask, TaskContext
from loaders.facts.topup import TopupLoader
from models.parsers import TypeParser
class TopupsTask(BaseTask):
"""同步储值充值结算记录"""
def get_task_code(self) -> str:
return "TOPUPS"
def extract(self, context: TaskContext) -> dict:
params = self._merge_common_params(
{
"siteId": context.store_id,
"rangeStartTime": TypeParser.format_timestamp(context.window_start, self.tz),
"rangeEndTime": TypeParser.format_timestamp(context.window_end, self.tz),
}
)
records, _ = self.api.get_paginated(
endpoint="/Site/GetRechargeSettleList",
params=params,
page_size=self.config.get("api.page_size", 200),
data_path=("data",),
list_key="settleList",
)
return {"records": records}
def transform(self, extracted: dict, context: TaskContext) -> dict:
parsed, skipped = [], 0
for raw in extracted.get("records", []):
mapped = self._parse_topup(raw, context.store_id)
if mapped:
parsed.append(mapped)
else:
skipped += 1
return {
"records": parsed,
"fetched": len(extracted.get("records", [])),
"skipped": skipped,
}
def load(self, transformed: dict, context: TaskContext) -> dict:
loader = TopupLoader(self.db)
inserted, updated, loader_skipped = loader.upsert_topups(transformed["records"])
return {
"fetched": transformed["fetched"],
"inserted": inserted,
"updated": updated,
"skipped": transformed["skipped"] + loader_skipped,
"errors": 0,
}
def _parse_topup(self, raw: dict, store_id: int) -> dict | None:
node = raw.get("settleList") if isinstance(raw.get("settleList"), dict) else raw
topup_id = TypeParser.parse_int(node.get("id"))
if not topup_id:
self.logger.warning("跳过缺少充值ID的记录: %s", raw)
return None
return {
"store_id": store_id,
"topup_id": topup_id,
"member_id": TypeParser.parse_int(node.get("memberId")),
"member_name": node.get("memberName"),
"member_phone": node.get("memberPhone"),
"card_id": TypeParser.parse_int(node.get("tenantMemberCardId")),
"card_type_name": node.get("memberCardTypeName"),
"pay_amount": TypeParser.parse_decimal(node.get("payAmount")),
"consume_money": TypeParser.parse_decimal(node.get("consumeMoney")),
"settle_status": node.get("settleStatus"),
"settle_type": node.get("settleType"),
"settle_name": node.get("settleName"),
"settle_relate_id": TypeParser.parse_int(node.get("settleRelateId")),
"pay_time": TypeParser.parse_timestamp(
node.get("payTime") or node.get("pay_time"), self.tz
),
"create_time": TypeParser.parse_timestamp(
node.get("createTime") or node.get("create_time"), self.tz
),
"operator_id": TypeParser.parse_int(node.get("operatorId")),
"operator_name": node.get("operatorName"),
"payment_method": node.get("paymentMethod"),
"refund_amount": TypeParser.parse_decimal(node.get("refundAmount")),
"cash_amount": TypeParser.parse_decimal(node.get("cashAmount")),
"card_amount": TypeParser.parse_decimal(node.get("cardAmount")),
"balance_amount": TypeParser.parse_decimal(node.get("balanceAmount")),
"online_amount": TypeParser.parse_decimal(node.get("onlineAmount")),
"rounding_amount": TypeParser.parse_decimal(node.get("roundingAmount")),
"adjust_amount": TypeParser.parse_decimal(node.get("adjustAmount")),
"goods_money": TypeParser.parse_decimal(node.get("goodsMoney")),
"table_charge_money": TypeParser.parse_decimal(node.get("tableChargeMoney")),
"service_money": TypeParser.parse_decimal(node.get("serviceMoney")),
"coupon_amount": TypeParser.parse_decimal(node.get("couponAmount")),
"order_remark": node.get("orderRemark"),
"raw_data": json.dumps(raw, ensure_ascii=False),
}
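
Note the defensive first line of `_parse_topup`: the API sometimes nests the settlement record under a `settleList` key and sometimes returns it flat, so the parser reads fields from whichever node applies. For illustration (values are made up):

```python
# Two payload shapes _parse_topup tolerates; both resolve to the same
# inner node before field extraction.
flat   = {"id": 101, "memberId": 7, "payAmount": "200.00"}
nested = {"settleList": {"id": 101, "memberId": 7, "payAmount": "200.00"}}
```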

View File


@@ -0,0 +1,33 @@
# -*- coding: utf-8 -*-
"""数据库集成测试"""
import pytest
from database.connection import DatabaseConnection
from database.operations import DatabaseOperations
# Note: these tests require a real database connection.
# Use a dedicated test database in CI/CD environments.
@pytest.fixture
def db_connection():
"""数据库连接fixture"""
# 从环境变量获取测试数据库DSN
import os
dsn = os.environ.get("TEST_DB_DSN")
if not dsn:
pytest.skip("未配置测试数据库")
conn = DatabaseConnection(dsn)
yield conn
conn.close()
def test_database_query(db_connection):
"""测试数据库查询"""
result = db_connection.query("SELECT 1 AS test")
assert len(result) == 1
assert result[0]["test"] == 1
def test_database_operations(db_connection):
"""测试数据库操作"""
ops = DatabaseOperations(db_connection)
# TODO: add real test cases for DatabaseOperations.
assert ops is not None
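
These tests self-skip when `TEST_DB_DSN` is unset, so they are safe to keep in the default suite. One way to run them against a scratch database (the DSN and file path here are illustrative, not part of the repo):

```python
# Hypothetical runner for the integration tests; adjust DSN and path.
import os
import pytest

os.environ.setdefault("TEST_DB_DSN", "postgresql://user:password@localhost:5432/llzq_test")
raise SystemExit(pytest.main(["tests/integration/test_database.py", "-v"]))
```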

Some files were not shown because too many files have changed in this diff.