Commit all changes before preparing the environment.
@@ -2,14 +2,14 @@

 ## Purpose

-A collection of ETL data pipelines. Each upstream data source maps to one subdirectory under `pipelines/`; currently only the Feiqiu platform (`feiqiu`) exists. A pipeline extracts data from the SaaS API and processes it layer by layer through ODS→DWD→Core→DWS before loading it into the database.
+A collection of ETL Connectors (data-source connectors). Each upstream data source maps to one subdirectory under `pipelines/` (i.e. one Connector); currently only the Feiqiu platform (`feiqiu`) exists. A Connector extracts data from the SaaS API and processes it layer by layer through ODS→DWD→Core→DWS before loading it into the database.

 ## Internal structure

-- `pipelines/feiqiu/` — Feiqiu platform ETL (api, cli, config, loaders, models, orchestration, scd, tasks, utils, quality, tests)
+- `pipelines/feiqiu/` — Feiqiu Connector (api, cli, config, loaders, models, orchestration, scd, tasks, utils, quality, tests)

 ## Roadmap

-- Extract the shared extract/load logic into an `etl_sdk` package for reuse across pipelines
+- Extract the shared extract/load logic into an `etl_sdk` package for reuse across Connectors
 - Split each platform's API client into a standalone `connectors` package for pluggable data-source integration
-- When adding a pipeline, create an isomorphic subdirectory under `pipelines/`
+- When adding a Connector, create an isomorphic subdirectory under `pipelines/`
apps/etl/connectors/feiqiu/.env (new file, 212 lines)
@@ -0,0 +1,212 @@
# ==============================================================================
# NeoZQYY ETL Connector (Feiqiu) configuration
# ==============================================================================
# Loaded from this file by the ETL env_parser.py
# Priority: DEFAULTS < this .env < environment variables < CLI arguments
# Sensitive values must never be committed; this file is excluded via .gitignore

# ------------------------------------------------------------------------------
# Store configuration
# ------------------------------------------------------------------------------
STORE_ID=2790685415443269
TIMEZONE=Asia/Shanghai

# ------------------------------------------------------------------------------
# Database configuration
# ------------------------------------------------------------------------------
# CHANGE 2026-02-15 | points at the test database by default; switch to etl_feiqiu in production
PG_DSN=postgresql://local-Python:Neo-local-1991125@100.64.0.4:5432/test_etl_feiqiu
PG_CONNECT_TIMEOUT=10

# Database schemas
SCHEMA_OLTP=ods
SCHEMA_ETL=meta

# ------------------------------------------------------------------------------
# API configuration (upstream SaaS API)
# ------------------------------------------------------------------------------
API_BASE=https://pc.ficoo.vip/apiprod/admin/v1/
API_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnQtdHlwZSI6IjQiLCJ1c2VyLXR5cGUiOiIxIiwiaHR0cDovL3NjaGVtYXMubWljcm9zb2Z0LmNvbS93cy8yMDA4LzA2L2lkZW50aXR5L2NsYWltcy9yb2xlIjoiMTIiLCJyb2xlLWlkIjoiMTIiLCJ0ZW5hbnQtaWQiOiIyNzkwNjgzMTYwNzA5OTU3Iiwibmlja25hbWUiOiLnp5_miLfnrqHnkIblkZjvvJrmganmgakxIiwic2l0ZS1pZCI6IjAiLCJtb2JpbGUiOiIxMzgxMDUwMjMwNCIsInNpZCI6IjI5NTA0ODk2NTgzOTU4NDUiLCJzdGFmZi1pZCI6IjMwMDk5MTg2OTE1NTkwNDUiLCJvcmctaWQiOiIwIiwicm9sZS10eXBlIjoiMyIsInJlZnJlc2hUb2tlbiI6InoxazVzWjlDeEFKYnFkNG1pT3NwUzBsQTRMYUNGcURkQjJBdFdsQk1DbDA9IiwicmVmcmVzaEV4cGlyeVRpbWUiOiIyMDI2LzIvMjIg5LiL5Y2IMTE6NTk6MzAiLCJuZWVkQ2hlY2tUb2tlbiI6ImZhbHNlIiwiZXhwIjoxNzcxNzc1OTcwLCJpc3MiOiJ0ZXN0IiwiYXVkIjoiVXNlciJ9.27D1QgKFYGgMKR9bS5NbCSl4kIf9oFVOQLsFl_ITxdI
API_TIMEOUT=20
API_PAGE_SIZE=200
API_RETRY_MAX=3

# ------------------------------------------------------------------------------
# Path configuration (updated to the NeoZQYY paths)
# ------------------------------------------------------------------------------
EXPORT_ROOT=C:/NeoZQYY/export/ETL/JSON
LOG_ROOT=C:/NeoZQYY/export/ETL/LOG
FETCH_ROOT=C:/NeoZQYY/export/ETL/JSON
WRITE_PRETTY_JSON=true

# ------------------------------------------------------------------------------
# Pipeline flow configuration
# ------------------------------------------------------------------------------
PIPELINE_FLOW=FULL

# ------------------------------------------------------------------------------
# Time-window configuration
# ------------------------------------------------------------------------------
OVERLAP_SECONDS=600
WINDOW_BUSY_MIN=30
WINDOW_IDLE_MIN=180
IDLE_START=04:00
IDLE_END=16:00
WINDOW_SPLIT_UNIT=day
WINDOW_SPLIT_DAYS=10
WINDOW_COMPENSATION_HOURS=2
ALLOW_EMPTY_RESULT_ADVANCE=true

# ------------------------------------------------------------------------------
# Snapshot configuration
# ------------------------------------------------------------------------------
SNAPSHOT_MISSING_DELETE=true
SNAPSHOT_ALLOW_EMPTY_DELETE=false

# ------------------------------------------------------------------------------
# Data-integrity check configuration
# ------------------------------------------------------------------------------
INTEGRITY_MODE=history
INTEGRITY_HISTORY_START=2025-07-01
INTEGRITY_INCLUDE_DIMENSIONS=true
INTEGRITY_AUTO_CHECK=false
INTEGRITY_AUTO_BACKFILL=false
INTEGRITY_COMPARE_CONTENT=true
INTEGRITY_CONTENT_SAMPLE_LIMIT=50
INTEGRITY_BACKFILL_MISMATCH=true
INTEGRITY_RECHECK_AFTER_BACKFILL=true

# Restrict to specific ODS task codes (comma-separated; empty means all)
# INTEGRITY_ODS_TASK_CODES=

# Force monthly window splitting (default true)
# INTEGRITY_FORCE_MONTHLY_SPLIT=true

# ------------------------------------------------------------------------------
# Verification configuration
# ------------------------------------------------------------------------------
VERIFY_SKIP_ODS_ON_FETCH=true
VERIFY_ODS_LOCAL_JSON=true

# ------------------------------------------------------------------------------
# Database session parameters (defaults.py → db.session.*)
# ------------------------------------------------------------------------------
# Session time zone (defaults to TIMEZONE)
# DB_SESSION_TIMEZONE=Asia/Shanghai

# SQL statement timeout (milliseconds, default 30000)
# DB_STATEMENT_TIMEOUT_MS=30000

# Lock wait timeout (milliseconds, default 5000)
# DB_LOCK_TIMEOUT_MS=5000

# Idle-in-transaction timeout (milliseconds, default 600000 = 10 minutes)
# DB_IDLE_IN_TX_TIMEOUT_MS=600000

# ------------------------------------------------------------------------------
# Cleaning configuration (defaults.py → clean.*)
# ------------------------------------------------------------------------------
# Log unknown fields (default true)
# CLEAN_LOG_UNKNOWN_FIELDS=true

# Cap on unknown-field log entries (default 50)
# CLEAN_UNKNOWN_FIELDS_LIMIT=50

# Hash algorithm (default sha1)
# CLEAN_HASH_ALGO=sha1

# Hash salt (default empty)
# CLEAN_HASH_SALT=

# Strict numeric validation (default true)
# CLEAN_STRICT_NUMERIC=true

# Money rounding scale (default 2 decimal places)
# CLEAN_ROUND_MONEY_SCALE=2

# ------------------------------------------------------------------------------
# Security configuration (defaults.py → security.*)
# ------------------------------------------------------------------------------
# Redact sensitive values in logs (default true)
# SECURITY_REDACT_IN_LOGS=true

# Key names to redact (JSON array, default ["token","password","Authorization"])
# SECURITY_REDACT_KEYS=["token","password","Authorization"]

# Echo the token in logs (default false; for debugging)
# SECURITY_ECHO_TOKEN_IN_LOGS=false

# ------------------------------------------------------------------------------
# IO file-size limit (defaults.py → io.max_file_bytes)
# ------------------------------------------------------------------------------
# Maximum bytes per file (default 50MB = 52428800)
# MAX_FILE_BYTES=52428800

# ------------------------------------------------------------------------------
# DWD layer configuration
# ------------------------------------------------------------------------------
DWD_FACT_UPSERT=true

# Fact-table UPSERT batch size (default 1000)
# DWD_FACT_UPSERT_BATCH_SIZE=1000

# Minimum batch size (shrunk automatically on lock contention, default 100)
# DWD_FACT_UPSERT_MIN_BATCH_SIZE=100

# Maximum number of retries (default 2)
# DWD_FACT_UPSERT_MAX_RETRIES=2

# Retry backoff (JSON array, seconds, default [1,2,4])
# DWD_FACT_UPSERT_RETRY_BACKOFF=[1,2,4]

# Lock wait timeout for fact-table backfill (milliseconds; empty falls back to DB_LOCK_TIMEOUT_MS)
# DWD_FACT_UPSERT_LOCK_TIMEOUT_MS=

# ------------------------------------------------------------------------------
# Task-list configuration
# ------------------------------------------------------------------------------
RUN_TASKS=PRODUCTS,TABLES,MEMBERS,ASSISTANTS,PACKAGES_DEF,ORDERS,PAYMENTS,REFUNDS,COUPON_USAGE,INVENTORY_CHANGE,TOPUPS,TABLE_DISCOUNT,ASSISTANT_ABOLISH,LEDGER
INDEX_LOOKBACK_DAYS=60

# ------------------------------------------------------------------------------
# DWS monthly/salary configuration (defaults.py → dws.*)
# ------------------------------------------------------------------------------
# Allow recalculating historical months (default false)
# DWS_MONTHLY_ALLOW_HISTORY=false

# Grace days for the previous month (default 5, i.e. on days 1-5 of the next month the previous month may still be computed)
# DWS_MONTHLY_PREV_GRACE_DAYS=5

# Number of historical months (default 0, i.e. no backtracking)
# DWS_MONTHLY_HISTORY_MONTHS=0

# Effective date of the new-hire cap (default 2026-03-01)
# DWS_MONTHLY_NEW_HIRE_CAP_EFFECTIVE_FROM=2026-03-01

# New-hire cap in days (default 25)
# DWS_MONTHLY_NEW_HIRE_CAP_DAY=25

# Maximum tier level for new hires (default 2)
# DWS_MONTHLY_NEW_HIRE_MAX_TIER_LEVEL=2

# Salary-calculation run days (default 5)
# DWS_SALARY_RUN_DAYS=5

# Allow running outside the salary cycle (default false)
# DWS_SALARY_ALLOW_OUT_OF_CYCLE=false

# Room-course unit price (default 138)
# DWS_SALARY_ROOM_COURSE_PRICE=138

# ------------------------------------------------------------------------------
# Run mode (defaults.py → run.data_source)
# ------------------------------------------------------------------------------
# Data-source mode: hybrid (default, API + local), online (API only), offline (local only)
# Can also be set indirectly via PIPELINE_FLOW (FULL→hybrid, FETCH_ONLY→online, INGEST_ONLY→offline)
# DATA_SOURCE=hybrid

# ------------------------------------------------------------------------------
# Extra API request headers (defaults.py → api.headers_extra)
# ------------------------------------------------------------------------------
# JSON object format, default empty
# API_HEADERS_EXTRA={}
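The precedence chain documented at the top of this file (DEFAULTS < .env < environment variables < CLI arguments) amounts to a layered dictionary merge where later layers win. The sketch below is a hypothetical illustration of that idea; `deep_merge` and `resolve_config` are illustrative names, not the project's actual `env_parser.py` API.

```python
# Hypothetical sketch of layered config precedence:
# DEFAULTS < .env file < process environment < CLI arguments.

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def resolve_config(defaults: dict, dotenv: dict, environ: dict, cli: dict) -> dict:
    """Apply the four layers in ascending priority order."""
    config: dict = {}
    for layer in (defaults, dotenv, environ, cli):
        config = deep_merge(config, layer)
    return config

if __name__ == "__main__":
    cfg = resolve_config(
        {"api": {"timeout": 30, "page_size": 200}},  # DEFAULTS
        {"api": {"timeout": 20}},                     # .env file
        {},                                           # environment variables
        {"api": {"timeout": 10}},                     # CLI arguments
    )
    print(cfg)  # {'api': {'timeout': 10, 'page_size': 200}}
```

Note how a higher layer overrides only the keys it sets (`timeout`) while lower-layer keys it does not mention (`page_size`) survive the merge.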
@@ -1,11 +1,15 @@
 # -*- coding: utf-8 -*-
 """CLI main entry point

-Supports two execution modes:
+Supports three execution modes:
 1. Legacy mode: specify a task list and run it directly
-2. Pipeline mode: specify a pipeline type and processing mode to run multi-layer ETL
+2. Flow mode: specify a predefined Flow type via --flow to run multi-layer ETL
+3. Layers mode: freely combine ETL layers via --layers (ODS,DWD,DWS,INDEX)

-Processing modes:
+Note: --flow and --layers are mutually exclusive and cannot be used together.
+--pipeline is a deprecated alias of --flow and emits a DeprecationWarning when used.
+
+Processing modes (--processing-mode):
 - increment_only: incremental only — run incremental data processing only
 - verify_only: verify and repair — skip increments, verify data consistency directly and backfill automatically
   - optional --fetch-before-verify: fetch data from the API before verifying
@@ -14,21 +18,27 @@
 Examples:
     # Legacy mode
     python -m cli.main --tasks ODS_MEMBER,ODS_ORDER

-    # Pipeline mode (incremental only)
-    python -m cli.main --pipeline api_full --processing-mode increment_only
-
-    # Pipeline mode (verify and repair, skip increments)
-    python -m cli.main --pipeline api_full --processing-mode verify_only
-
-    # Pipeline mode (verify and repair, fetching from the API first)
-    python -m cli.main --pipeline api_full --processing-mode verify_only --fetch-before-verify
-
-    # Pipeline mode (incremental + verify and repair)
-    python -m cli.main --pipeline api_full --processing-mode increment_verify
-
-    # Pipeline mode with a time window
-    python -m cli.main --pipeline api_ods_dwd --window-start "2026-02-01" --window-end "2026-02-02"
+    # Flow mode (incremental only)
+    python -m cli.main --flow api_full --processing-mode increment_only
+
+    # Flow mode (verify and repair, skip increments)
+    python -m cli.main --flow api_full --processing-mode verify_only
+
+    # Flow mode (verify and repair, fetching from the API first)
+    python -m cli.main --flow api_full --processing-mode verify_only --fetch-before-verify
+
+    # Flow mode (incremental + verify and repair)
+    python -m cli.main --flow api_full --processing-mode increment_verify
+
+    # Flow mode with a time window
+    python -m cli.main --flow api_ods_dwd --window-start "2026-02-01" --window-end "2026-02-02"
+
+    # --layers mode: freely combine layers
+    python -m cli.main --layers ODS,DWD --store-id 1
+
+    # Run only the DWS + INDEX layers
+    python -m cli.main --layers DWS,INDEX
 """
 import sys
 import argparse
@@ -46,11 +56,11 @@ from orchestration.cursor_manager import CursorManager
 from orchestration.run_tracker import RunTracker
 from orchestration.task_registry import default_registry
 from orchestration.task_executor import TaskExecutor
-from orchestration.pipeline_runner import PipelineRunner
+from orchestration.flow_runner import FlowRunner
 from api.client import APIClient

-# Pipeline choices
-PIPELINE_CHOICES = [
+# Flow choices
+FLOW_CHOICES = [
     "api_ods",      # API → ODS
     "api_ods_dwd",  # API → ODS → DWD
     "api_full",     # API → ODS → DWD → DWS summary → DWS index
@@ -84,6 +94,31 @@ def setup_logging():
     logging.basicConfig(level=logging.INFO, format=fmt, datefmt=datefmt)
     return logging.getLogger("etl_billiards")

+# Valid layer names for --layers
+VALID_LAYERS = {"ODS", "DWD", "DWS", "INDEX"}
+
+
+def parse_layers(raw: str) -> list[str]:
+    """Parse a comma-separated layer string and validate it.
+
+    Args:
+        raw: comma-separated layer names, e.g. "ODS,DWD"
+
+    Returns:
+        List of upper-cased layer names, e.g. ["ODS", "DWD"]
+
+    Raises:
+        ValueError: raised when an invalid layer name is present
+    """
+    layers = [l.strip().upper() for l in raw.split(",") if l.strip()]
+    if not layers:
+        raise ValueError("--layers must not be empty")
+    invalid = set(layers) - VALID_LAYERS
+    if invalid:
+        raise ValueError(f"invalid layer names: {invalid}; valid values: {sorted(VALID_LAYERS)}")
+    return layers
+
+
 def parse_args():
     """Parse command-line arguments"""
@@ -95,20 +130,29 @@ def parse_args():
     # Legacy task mode
     python -m cli.main --tasks ODS_MEMBER,ODS_ORDER --store-id 1

-    # Pipeline mode (incremental only)
-    python -m cli.main --pipeline api_ods_dwd --processing-mode increment_only
+    # Flow mode (incremental only)
+    python -m cli.main --flow api_ods_dwd --processing-mode increment_only

-    # Pipeline mode (verify and repair, skip increments)
-    python -m cli.main --pipeline api_full --processing-mode verify_only
+    # Flow mode (verify and repair, skip increments)
+    python -m cli.main --flow api_full --processing-mode verify_only

-    # Pipeline mode (verify and repair, fetching from the API first)
-    python -m cli.main --pipeline api_full --processing-mode verify_only --fetch-before-verify
+    # Flow mode (verify and repair, fetching from the API first)
+    python -m cli.main --flow api_full --processing-mode verify_only --fetch-before-verify

-    # Pipeline mode (incremental + verify and repair)
-    python -m cli.main --pipeline api_full --processing-mode increment_verify
+    # Flow mode (incremental + verify and repair)
+    python -m cli.main --flow api_full --processing-mode increment_verify

     # Specify a time window
-    python -m cli.main --pipeline api_ods --window-start "2026-02-01" --window-end "2026-02-02"
+    python -m cli.main --flow api_ods --window-start "2026-02-01" --window-end "2026-02-02"
+
+    # --layers mode: freely combine ETL layers (mutually exclusive with --flow)
+    python -m cli.main --layers ODS,DWD --store-id 1
+
+    # Run only the DWS + INDEX layers
+    python -m cli.main --layers DWS,INDEX
+
+    # --pipeline still works (deprecated; please migrate to --flow)
+    python -m cli.main --pipeline api_full
 """,
     )
@@ -117,11 +161,22 @@ def parse_args():
     parser.add_argument("--tasks", help="task list, comma-separated (legacy mode)")
     parser.add_argument("--dry-run", action="store_true", help="dry run (no commit)")

-    # Pipeline arguments (new)
+    # Flow arguments
     parser.add_argument(
-        "--pipeline",
-        choices=PIPELINE_CHOICES,
-        help="pipeline type: api_ods, api_ods_dwd, api_full, ods_dwd, dwd_dws, dwd_dws_index, dwd_index",
+        "--flow",
+        choices=FLOW_CHOICES,
+        help="Flow type (mutually exclusive with --layers): api_ods, api_ods_dwd, api_full, ods_dwd, dwd_dws, dwd_dws_index, dwd_index",
+    )
+    # --pipeline as a deprecated alias, mapped to a separate dest so usage can be detected
+    parser.add_argument(
+        "--pipeline",
+        choices=FLOW_CHOICES,
+        dest="pipeline_deprecated",
+        help="[deprecated] use --flow instead; behaves like --flow and emits a DeprecationWarning",
+    )
+    parser.add_argument(
+        "--layers",
+        help="free combination of ETL layers, comma-separated (ODS,DWD,DWS,INDEX); mutually exclusive with --flow/--pipeline",
     )
     parser.add_argument(
         "--processing-mode",
@@ -217,7 +272,7 @@ def parse_args():
     parser.add_argument("--export-root", help="export root directory")
     parser.add_argument("--log-root", help="log root directory")

-    # Data-source mode (new argument, replacing --pipeline-flow)
+    # Data-source mode (new argument, replacing the legacy --pipeline-flow)
     parser.add_argument(
         "--data-source",
         dest="data_source",
@@ -226,7 +281,7 @@ def parse_args():
         help="data-source mode: online (online fetch only) / offline (local ingest only) / hybrid (fetch + ingest)",
     )

-    # Fetch/clean pipeline (--pipeline-flow is deprecated; use --data-source)
+    # Data-source flow (--pipeline-flow is deprecated; use --data-source)
     parser.add_argument("--pipeline-flow", choices=["FULL", "FETCH_ONLY", "INGEST_ONLY"], help="[deprecated] use --data-source")
     parser.add_argument("--fetch-root", help="root directory for fetched JSON output")
     parser.add_argument("--ingest-source", help="source directory for local clean-and-ingest")
@@ -236,13 +291,21 @@ def parse_args():
     parser.add_argument("--idle-start", help="idle-window start (HH:MM)")
     parser.add_argument("--idle-end", help="idle-window end (HH:MM)")
     parser.add_argument("--allow-empty-advance", action="store_true", help="allow advancing the window on empty results")

+    # Force full update (skip ODS hash dedup + DWD change comparison; write unconditionally)
+    parser.add_argument(
+        "--force-full",
+        dest="force_full",
+        action="store_true",
+        help="force full processing: skip ODS hash dedup and DWD change comparison, write unconditionally",
+    )
+
     return parser.parse_args()

 def resolve_data_source(args) -> str:
     """Resolve the data_source argument, mapping the deprecated --pipeline-flow.

-    Priority: --data-source > --pipeline-flow > default hybrid
+    Priority: --data-source > --pipeline-flow (deprecated alias) > default hybrid
     """
     _FLOW_TO_DATA_SOURCE = {
         "FULL": "hybrid",
@@ -306,7 +369,7 @@ def build_cli_overrides(args) -> dict:
     if args.log_root:
         overrides.setdefault("io", {})["log_root"] = args.log_root

-    # Fetch/clean pipeline (legacy arguments kept for backward compatibility)
+    # Data-source flow (legacy --pipeline-flow kept for backward compatibility; config key names unchanged)
     if args.pipeline_flow:
         overrides.setdefault("pipeline", {})["flow"] = args.pipeline_flow.upper()
@@ -345,6 +408,10 @@ def build_cli_overrides(args) -> dict:
         overrides.setdefault("run", {}).setdefault("idle_window", {})["end"] = args.idle_end
     if args.allow_empty_advance:
         overrides.setdefault("run", {})["allow_empty_result_advance"] = True

+    # Force full update
+    if args.force_full:
+        overrides.setdefault("run", {})["force_full_update"] = True
+
     # Tasks
     if args.tasks:
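The `build_cli_overrides` hunks above all follow the same idiom: `dict.setdefault` creates a nested config section on first use and returns the existing one afterwards, so each CLI flag can write its leaf key without checking whether the section exists. A minimal illustration (values here are just examples):

```python
# dict.setdefault returns the existing inner dict if present,
# otherwise inserts and returns the given default — so nested
# sections are created lazily, one flag at a time.
overrides: dict = {}
overrides.setdefault("run", {})["force_full_update"] = True
overrides.setdefault("run", {})["allow_empty_result_advance"] = True  # reuses the same "run" dict
overrides.setdefault("io", {})["log_root"] = "C:/NeoZQYY/export/ETL/LOG"

print(overrides)
# {'run': {'force_full_update': True, 'allow_empty_result_advance': True},
#  'io': {'log_root': 'C:/NeoZQYY/export/ETL/LOG'}}
```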
@@ -379,11 +446,29 @@ def main():
     """Main function

     Resource lifecycles are managed centrally in the CLI layer (try/finally);
-    TaskExecutor / PipelineRunner receive the created resources via dependency injection.
+    TaskExecutor / FlowRunner receive the created resources via dependency injection.
     """
     logger = setup_logging()
     args = parse_args()

+    # Handle the deprecated --pipeline alias: merge it into args.flow (the attribute name is kept for old callers)
+    if args.pipeline_deprecated:
+        import warnings
+        warnings.warn(
+            "--pipeline is deprecated; use --flow instead",
+            DeprecationWarning,
+            stacklevel=2,
+        )
+        if args.flow:
+            print("error: --pipeline and --flow cannot be specified together", file=sys.stderr)
+            sys.exit(2)
+        args.flow = args.pipeline_deprecated
+
+    # Mutual-exclusion check for --layers and --flow
+    if getattr(args, "layers", None) and args.flow:
+        print("error: --layers and --flow are mutually exclusive; specify only one", file=sys.stderr)
+        sys.exit(2)
+
     try:
         # Load configuration
         cli_overrides = build_cli_overrides(args)
@@ -421,10 +506,10 @@
         data_source = resolve_data_source(args)

         # ── Determine the execution mode ──────────────────────────────────
-        if args.pipeline:
-            # Pipeline mode
-            logger.info("Execution mode: pipeline mode")
-            logger.info("Pipeline type: %s", args.pipeline)
+        if args.flow:
+            # Flow mode
+            logger.info("Execution mode: Flow mode")
+            logger.info("Flow type: %s", args.flow)
             logger.info("Processing mode: %s", args.processing_mode)

             # Parse the time window
@@ -461,13 +546,13 @@
             if args.verify_tables:
                 verify_tables = [t.strip().lower() for t in args.verify_tables.split(",") if t.strip()]

-            # Assemble the PipelineRunner and run it
-            runner = PipelineRunner(
+            # Assemble the FlowRunner and run it
+            runner = FlowRunner(
                 config, executor, registry,
                 db_conn, api_client, logger,
             )
             result = runner.run(
-                pipeline=args.pipeline,
+                pipeline=args.flow,
                 processing_mode=args.processing_mode,
                 data_source=data_source,
                 window_start=window_start,
@@ -478,7 +563,66 @@
                 verify_tables=verify_tables,
             )

-            logger.info("Pipeline run finished: %s", result.get("status"))
+            logger.info("Flow run finished: %s", result.get("status"))
+
+        elif getattr(args, "layers", None):
+            # --layers mode: free combination of layers
+            layers = parse_layers(args.layers)
+            logger.info("Execution mode: --layers mode")
+            logger.info("Layer list: %s", layers)
+            logger.info("Processing mode: %s", args.processing_mode)
+
+            # Parse the time window (shares the logic of --flow mode)
+            window_start = None
+            window_end = None
+
+            if args.window_start:
+                window_start = parse_datetime(args.window_start)
+            if args.window_end:
+                window_end = parse_datetime(args.window_end)
+
+            if window_start is None and window_end is None:
+                from datetime import timedelta
+                from zoneinfo import ZoneInfo
+
+                tz = ZoneInfo(config.get("app.timezone", "Asia/Shanghai"))
+                window_end = datetime.now(tz)
+                window_start = window_end - timedelta(hours=args.lookback_hours)
+                logger.info("Using lookback time window: %s ~ %s", window_start, window_end)
+
+            config.config.setdefault("run", {}).setdefault("window_override", {})
+            config.config["run"]["window_override"]["start"] = window_start
+            config.config["run"]["window_override"]["end"] = window_end
+
+            # Task filter
+            task_codes = None
+            if args.tasks:
+                task_codes = [t.strip().upper() for t in args.tasks.split(",") if t.strip()]
+
+            # Verify-table filter
+            verify_tables = None
+            if args.verify_tables:
+                verify_tables = [t.strip().lower() for t in args.verify_tables.split(",") if t.strip()]
+
+            # Assemble the FlowRunner and run it (with the layers argument)
+            runner = FlowRunner(
+                config, executor, registry,
+                db_conn, api_client, logger,
+            )
+            result = runner.run(
+                pipeline=None,
+                layers=layers,
+                processing_mode=args.processing_mode,
+                data_source=data_source,
+                window_start=window_start,
+                window_end=window_end,
+                window_split=args.window_split if args.window_split != "none" else None,
+                task_codes=task_codes,
+                fetch_before_verify=args.fetch_before_verify,
+                verify_tables=verify_tables,
+            )
+
+            logger.info("--layers run finished: %s", result.get("status"))
+
         else:
             # Legacy mode
@@ -5,8 +5,9 @@ DEFAULTS = {
     "app": {
         "timezone": "Asia/Shanghai",
         "store_id": "",
-        "schema_oltp": "billiards",
-        "schema_etl": "etl_admin",
+        # CHANGE 2026-02-15 | aligned with the six-layer architecture of the new etl_feiqiu database
+        "schema_oltp": "ods",
+        "schema_etl": "meta",
     },
     "db": {
         "dsn": "",
@@ -64,6 +65,9 @@ DEFAULTS = {
     "overlap_seconds": 600,
     "snapshot_missing_delete": True,
     "snapshot_allow_empty_delete": False,
     "snapshot_protect_early_cutoff": True,
+    # CHANGE 2026-02-18 | force full update: skip ODS hash dedup + DWD change comparison, write unconditionally
+    "force_full_update": False,
     "window_split": {
         "unit": "day",
         "days": 10,
@@ -86,6 +90,7 @@ DEFAULTS = {
         "max_file_bytes": 50 * 1024 * 1024,
     },
     "pipeline": {
+        # Legacy config section (kept for compatibility with old .env files and callers)
         # Run flow: FETCH_ONLY (online fetch to disk only), INGEST_ONLY (local clean-and-ingest), FULL (fetch + clean-and-ingest)
         "flow": "FULL",
         # Root directory for fetched JSON output (subdirectories created automatically by task, run_id, and time)
@@ -53,7 +53,7 @@ ENV_MAP = {
     "PIPELINE_FLOW": ("pipeline.flow",),
     "JSON_FETCH_ROOT": ("pipeline.fetch_root",),
     "JSON_SOURCE_DIR": ("pipeline.ingest_source_dir",),
-    "FETCH_ROOT": ("pipeline.fetch_root",),
+    "FETCH_ROOT": ("pipeline.fetch_root", "io.fetch_root"),
     "INGEST_SOURCE_DIR": ("pipeline.ingest_source_dir",),
     "INTEGRITY_MODE": ("integrity.mode",),
     "INTEGRITY_HISTORY_START": ("integrity.history_start",),
@@ -83,6 +83,37 @@ ENV_MAP = {
     "ODS_JSON_DOC_DIR": ("ods.json_doc_dir",),
     "ODS_INCLUDE_FILES": ("ods.include_files",),
     "ODS_DROP_SCHEMA_FIRST": ("ods.drop_schema_first",),
+    # ── Added 2026-02-16: parameters defined in defaults.py that previously had no ENV mapping ──
+    # Database session parameters
+    "DB_SESSION_TIMEZONE": ("db.session.timezone",),
+    "DB_STATEMENT_TIMEOUT_MS": ("db.session.statement_timeout_ms",),
+    "DB_LOCK_TIMEOUT_MS": ("db.session.lock_timeout_ms",),
+    "DB_IDLE_IN_TX_TIMEOUT_MS": ("db.session.idle_in_tx_timeout_ms",),
+    # Cleaning configuration
+    "CLEAN_LOG_UNKNOWN_FIELDS": ("clean.log_unknown_fields",),
+    "CLEAN_UNKNOWN_FIELDS_LIMIT": ("clean.unknown_fields_limit",),
+    "CLEAN_HASH_ALGO": ("clean.hash_key.algo",),
+    "CLEAN_HASH_SALT": ("clean.hash_key.salt",),
+    "CLEAN_STRICT_NUMERIC": ("clean.strict_numeric",),
+    "CLEAN_ROUND_MONEY_SCALE": ("clean.round_money_scale",),
+    # Security configuration
+    "SECURITY_REDACT_IN_LOGS": ("security.redact_in_logs",),
+    "SECURITY_REDACT_KEYS": ("security.redact_keys",),
+    "SECURITY_ECHO_TOKEN_IN_LOGS": ("security.echo_token_in_logs",),
+    # IO file-size limit
+    "MAX_FILE_BYTES": ("io.max_file_bytes",),
+    # Integrity check: force monthly splitting
+    "INTEGRITY_FORCE_MONTHLY_SPLIT": ("integrity.force_monthly_split",),
+    # DWD fact-table UPSERT batch parameters
+    "DWD_FACT_UPSERT_BATCH_SIZE": ("dwd.fact_upsert_batch_size",),
+    "DWD_FACT_UPSERT_MIN_BATCH_SIZE": ("dwd.fact_upsert_min_batch_size",),
+    "DWD_FACT_UPSERT_MAX_RETRIES": ("dwd.fact_upsert_max_retries",),
+    "DWD_FACT_UPSERT_RETRY_BACKOFF": ("dwd.fact_upsert_retry_backoff_sec",),
+    "DWD_FACT_UPSERT_LOCK_TIMEOUT_MS": ("dwd.fact_upsert_lock_timeout_ms",),
+    # Run mode (set directly, bypassing the legacy pipeline.flow mapping; config key names kept for old configs)
+    "DATA_SOURCE": ("run.data_source",),
+    # Extra API request headers (JSON object format)
+    "API_HEADERS_EXTRA": ("api.headers_extra",),
 }
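Each ENV_MAP value is a tuple of dotted config paths, so one environment variable can now feed several config keys (e.g. `FETCH_ROOT` writes both `pipeline.fetch_root` and `io.fetch_root`). A hypothetical sketch of how such tuples could be applied to a nested config dict — `set_path` and `load_env_overrides_sketch` are illustrative names, not the project's actual `env_parser` functions:

```python
# Map environment variables onto nested config keys via dotted paths.
ENV_MAP_SKETCH = {
    "FETCH_ROOT": ("pipeline.fetch_root", "io.fetch_root"),
    "DB_LOCK_TIMEOUT_MS": ("db.session.lock_timeout_ms",),
}

def set_path(cfg: dict, dotted: str, value) -> None:
    """Walk/create nested dicts along a dotted path and set the leaf value."""
    *parents, leaf = dotted.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

def load_env_overrides_sketch(environ: dict) -> dict:
    """Build an overrides dict from whichever mapped variables are present."""
    overrides: dict = {}
    for env_name, paths in ENV_MAP_SKETCH.items():
        if env_name in environ:
            for dotted in paths:            # one variable may target several keys
                set_path(overrides, dotted, environ[env_name])
    return overrides

ov = load_env_overrides_sketch({"FETCH_ROOT": "C:/NeoZQYY/export/ETL/JSON"})
print(ov)
# {'pipeline': {'fetch_root': 'C:/NeoZQYY/export/ETL/JSON'},
#  'io': {'fetch_root': 'C:/NeoZQYY/export/ETL/JSON'}}
```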
@@ -5,7 +5,7 @@ from copy import deepcopy
 from .defaults import DEFAULTS
 from .env_parser import load_env_overrides

-# pipeline.flow → run.data_source value mapping
+# pipeline.flow → run.data_source value mapping (legacy-key compatibility; config key names unchanged)
 _FLOW_TO_DATA_SOURCE = {
     "FULL": "hybrid",
     "FETCH_ONLY": "online",
@@ -76,7 +76,7 @@ class AppConfig:
         except Exception:
             raise SystemExit(f"db.session.{k} must be an integer number of milliseconds")

-        # ── legacy-key → new-key compatibility mapping ──
+        # ── legacy-key → new-key compatibility mapping (pipeline.* key names kept for old config files) ──
         pipeline = cfg.get("pipeline", {})
         run = cfg.setdefault("run", {})
         io = cfg.setdefault("io", {})
@@ -5,6 +5,16 @@

 ---

+## 2026-02-19
+
+### Full documentation refresh — schema names, tech stack, and task counts synced to the current project state
+
+- **Summary**: batch-updated 144 documentation files under `docs/`, replacing the old schema names (`billiards_ods`/`billiards_dwd`/`billiards_dws`/`etl_admin`) with the current six-layer architecture names (`ods`/`dwd`/`dws`/`meta`); updated `system_overview.md` to drop the retired PySide6 GUI and Flask references in favor of FastAPI + admin-web; corrected the ODS task count (16→23) and the DWD task count (5→2); updated `environment_setup.md` (install method pip→uv sync, DDL paths, run entry point); added the CLI arguments missing from `scheduling.md` (`--force-full`, `--window-split-unit`, `--window-split-days`, `--window-compensation-hours`); added an INDEX-layer data-flow diagram to `data_flow.md`
+- **Scope**: documentation (the whole `docs/` tree, including architecture/, etl_tasks/, operations/, database/, business-rules/, etc.)
+- **Risk**: very low (documentation-only change, no code modified)
+
+---
+
 ## 2026-02-15

 ### Audit-directory consolidation and API-doc field-classification fixes
@@ -6,10 +2,8 @@
 |-------------|------|
 | [`architecture/`](architecture/README.md) | Architecture docs — overall system architecture, data flow (ODS→DWD→DWS), module interactions |
 | [`api-reference/`](api-reference/) | API reference (standardized docs + JSON samples for 25 endpoints) |
-| [`audit/`](audit/README.md) | Unified audit directory (AI change audits + repository audit reports) |
-| [`audit/changes/`](audit/changes/) | Per-change AI audit records |
+| [`audit/`](audit/README.md) | Audit directory (historical archive; new records have moved to the root `docs/audit/`) |
 | [`audit/repo/`](audit/repo/) | Repository audit reports (auto-generated by `scripts/audit/`: file inventory, call flows, doc alignment) |
 | [`audit/audit_dashboard.md`](audit/audit_dashboard.md) | Audit dashboard — summary view auto-generated from the audit source records (timeline + module index) |
 | [`business-rules/`](business-rules/README.md) | Business-rule docs — index algorithms, DWS metric definitions, SCD2 processing rules, and other business logic |
 | [`database/`](database/README.md) | Unified database docs — layer overviews + table-level docs for ODS/DWD/DWS/ETL_Admin |
 | [`database/overview/`](database/overview/) | Layer overviews / quick-reference indexes (table lists, primary keys, row counts, business-domain classification) |
@@ -26,6 +24,7 @@
 ## Maintenance conventions

 - When a code change touches table structures or metric definitions, update `database/` in sync
-- Regenerate the audit dashboard with `python scripts/gen_audit_dashboard.py`; do not edit it by hand
+- Regenerate the audit dashboard with `python scripts/audit/gen_audit_dashboard.py` (run from the project root)
+- Audit change records have been consolidated under the root `docs/audit/changes/`
 - Regenerate audit reports with `python -m scripts.audit.run_audit`; do not edit them by hand
 - Documentation is UTF-8 encoded and written in Chinese
@@ -168,7 +168,7 @@ null
 | `cardTypeName` in `rechargeCardList` / `giveCardList` | Matches the card-type name in the membership card-type configuration |
 | Difference between `balance` and `principalBalance` | Reflects the bonus portion, matching the bonus amounts in recharge records |

-> This endpoint has no ODS table yet, so there is no ETL ingestion flow. It suits DWS-layer member-asset snapshot statistics; if persistence is needed later, create a new summary table under the `billiards_dws` schema.
+> This endpoint has no ODS table yet, so there is no ETL ingestion flow. It suits DWS-layer member-asset snapshot statistics; if persistence is needed later, create a new summary table under the `dws` schema.

 ### Amount validation relationships
@@ -7,15 +7,15 @@
                 │
                 ▼
 ┌───────────────────────────────────────┐
-│ ODS layer (billiards_ods)             │
+│ ODS layer (ods)                       │
 │ Operational data store — raw landing  │
 │ Keeps the source payload for replay   │
-│ 22 ODS tables for 22 API endpoints    │
+│ 23 ODS tables for 23 API endpoints    │
 └───────────────┬───────────────────────┘
                 │ DWD_LOAD_FROM_ODS
                 ▼
 ┌───────────────────────────────────────┐
-│ DWD layer (billiards_dwd)             │
+│ DWD layer (dwd)                       │
 │ Detail data — clean, standardize, join│
 │ Dimensions use SCD2 (slowly changing) │
 │ Facts written incrementally by time   │
@@ -23,28 +23,35 @@
                 │ DWS summary tasks
                 ▼
 ┌───────────────────────────────────────┐
-│ DWS layer (billiards_dws)             │
-│ Data services — summaries, metrics, indexes │
+│ DWS layer (dws)                       │
+│ Data services — summaries, metrics    │
 │ Assistant performance / daily finance / member analysis │
-│ Salary calculation / custom index algorithms │
+│ Salary calculation                    │
 └───────────────┬───────────────────────┘
+                │ INDEX tasks
+                ▼
+┌───────────────────────────────────────┐
+│ INDEX layer (dws)                     │
+│ Custom index algorithms               │
+│ WBI / NCI / RS / OS / MS / ML         │
+└───────────────────────────────────────┘
 ```

 ## ODS layer (operational data store)

-- Schema: `billiards_ods`
+- Schema: `ods`
 - Responsibility: fetch raw data from the upstream SaaS API and land it with the full source payload
 - Data sources: online API fetch (`APIClient`) or offline JSON replay (`LocalJsonClient`)
-- Task pattern: one ODS task per business entity (e.g. `ORDERS`, `PAYMENTS`, `MEMBERS`)
-- Loading: generic ODS loader, batched upsert + conflict handling
+- Task pattern: one ODS task per business entity (e.g. `ODS_PAYMENT`, `ODS_MEMBER`, `ODS_SETTLEMENT_RECORDS`), driven by declarative `BaseOdsTask` + `OdsTaskSpec` configuration, with task classes generated dynamically by the `_build_task_class()` factory
+- Loading: schema-aware writes via `_insert_records_schema_aware` — the target table structure is read from `information_schema` at runtime and matched by column name, then batch-upserted with conflict handling (configurable `ON CONFLICT` strategy)

-### Core business entities (16)
+### Core business entities (23 ODS tasks)

-Orders (settlement_records), payments (payment_transactions), refunds (refund_transactions), members (member_profiles), member balance changes (member_balance_changes), stored-value cards (member_stored_value_cards), assistants (assistant_accounts_master), assistant service records (assistant_service_records), assistant cancellation records (assistant_cancellation_records), tables (site_tables_master), goods (store_goods_master / tenant_goods_master), stock movements (goods_stock_movements), group-buy packages (group_buy_packages), group-buy redemptions (group_buy_redemption_records), table-fee discounts (table_fee_discount_records), table-fee transactions (table_fee_transactions), etc.
+Assistant account profiles (assistant_accounts_master), settlement records (settlement_records), table-fee billing transactions (table_fee_transactions), assistant service transactions (assistant_service_records), assistant cancellation records (assistant_cancellation_records), store goods sales (store_goods_sales_records), payment transactions (payment_transactions), refund transactions (refund_transactions), platform/group-buy coupon redemptions (platform_coupon_redemption_records), member profiles (member_profiles), member stored-value cards (member_stored_value_cards), member balance changes (member_balance_changes), recharge settlements (recharge_settlements), group-buy package definitions (group_buy_packages), group-buy package redemptions (group_buy_redemption_records), stock summary (goods_stock_summary), stock movements (goods_stock_movements), table dimension (site_tables_master), stock goods category tree (stock_goods_category_tree), store goods master (store_goods_master), table-fee discounts/adjustments (table_fee_discount_records), tenant goods master (tenant_goods_master), settlement ticket details (settlement_ticket_details).
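The schema-aware loading described above matches record keys against the target table's columns at runtime. The sketch below is a simplified illustration with a hard-coded column set and an illustrative function name; the real `_insert_records_schema_aware` reads `information_schema` and performs batched upserts with `ON CONFLICT` handling.

```python
# Project each record onto the table's columns: keys the table does not
# have are dropped, and columns missing from a record become None (NULL).
def rows_for_table(records: list[dict], table_columns: set[str]) -> tuple[list[str], list[tuple]]:
    ordered = sorted(table_columns)  # stable column order for the INSERT statement
    rows = [tuple(rec.get(col) for col in ordered) for rec in records]
    return ordered, rows

cols, rows = rows_for_table(
    [{"settlement_id": 1, "amount": 100, "upstream_extra": "ignored"}],
    {"settlement_id", "amount", "store_id"},
)
print(cols, rows)  # ['amount', 'settlement_id', 'store_id'] [(100, 1, None)]
```

Matching by column name rather than by position is what lets the upstream API add or drop fields without breaking the loader.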
 ## DWD layer (detail data)

-- Schema: `billiards_dwd`
+- Schema: `dwd`
 - Responsibility: clean, standardize, and join ODS raw data into analyzable detail data
 - Core task: `DWD_LOAD_FROM_ODS`
 - Quality check: `DWD_QUALITY_CHECK`
@@ -58,7 +65,7 @@
 - Table dimension (`dim_table`)
 - Package dimension (`dim_package`)

-Each dimension record carries `effective_from`, `effective_to`, and `is_current` fields, supporting historical-version lookups.
+Each dimension record carries `valid_from`, `valid_to`, and `is_current` fields, supporting historical-version lookups.
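The SCD2 fields above (`valid_from` / `valid_to` / `is_current`) follow the standard "close the current version, open a new one" pattern. A minimal in-memory illustration of that pattern — function and field names here are hypothetical; the project's actual implementation lives in its scd module and runs against PostgreSQL:

```python
from datetime import datetime

def apply_scd2(history: list[dict], business_key: str, new_attrs: dict, now: datetime) -> None:
    """Close the current row for the key if its attributes changed, then append a new current row."""
    current = next(
        (row for row in history if row["key"] == business_key and row["is_current"]),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return  # no change: keep the current version open
        current["valid_to"] = now       # close the old version
        current["is_current"] = False
    history.append({                    # open the new current version
        "key": business_key,
        "attrs": new_attrs,
        "valid_from": now,
        "valid_to": None,
        "is_current": True,
    })
```

Because closed rows are never deleted, the full attribute history of each business key can be replayed by filtering on `valid_from` / `valid_to`.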
 ### Fact-table processing
@@ -67,7 +74,7 @@
 ## DWS layer (data services)

-- Schema: `billiards_dws`
+- Schema: `dws`
 - Responsibility: compute summaries over DWD detail data and produce business metrics and analysis results

 ### Summary-task categories
@@ -82,7 +89,7 @@
 ### Custom index algorithms

-The system implements six custom business indexes, with parameters stored in `billiards_dws.cfg_index_parameters`:
+The system implements six custom business indexes, with parameters stored in `dws.cfg_index_parameters`:

 | Index | Full name | Description |
 |-------|-----------|-------------|
@@ -97,7 +104,7 @@
 ## ETL management layer

-- Schema: `etl_admin`
+- Schema: `meta`
 - Responsibility: scheduling-metadata management
 - Contents: cursor (watermark) records, task-run records, scheduling configuration
 - Key components: `cursor_manager.py` (watermark management), `run_tracker.py` (run records)