feat: P1-P3 full-stack integration — database foundation + DWS extensions + mini-program auth + engineering tooling

## P1 Database foundation
- zqyy_app: create auth/biz schemas, FDW connection to etl_feiqiu
- etl_feiqiu: create app-schema RLS views and the goods stock warning table
- Clean up residual assistant_abolish data

## P2 ETL/DWS extensions
- Add DWS assistant order contribution table (dws.assistant_order_contribution)
- Add assistant_order_contribution_task and its RLS view
- member_consumption gains recharge fields; assistant_daily gains penalty fields
- Update ODS/DWD/DWS task docs and business-rule docs
- Update core modules: consistency_checker, flow_runner, task_registry, etc.

## P3 Mini-program auth system
- Add xcx_auth routes/schemas (WeChat login + JWT)
- Add wechat/role/matching/application service layers
- zqyy_app auth table migrations + role/permission seed data
- auth/dependencies.py now supports mini-program JWT authentication

## Docs & audit
- Add DOCUMENTATION-MAP navigation doc
- Add 7 BD_Manual database change docs
- Update DDL baseline snapshots (etl_feiqiu 6 schemas + zqyy_app auth)
- Add full-stack integration audit record; update deployment checklist
- Add BACKLOG roadmap and FDW→Core migration plan

## Kiro engineering
- Add 5 specs (P1/P2/P3/full-stack integration/core business)
- Add audit automation scripts (agent_on_stop/build_audit_context/compliance_prescan)
- Add 6 hooks (compliance check/session log/commit audit, etc.)
- Add doc-map steering file

## Ops & testing
- Add ops scripts: migration verification/API health check/ETL monitoring/integration report
- Add property tests: test_dws_contribution / test_auth_system
- Remove stale export report files
- Update .gitignore exclusion rules
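The JWT side of the mini-program login summarized above can be illustrated with a small stdlib-only sketch. This is a stand-in, not the actual `auth/dependencies.py`: the real code presumably uses a JWT library plus FastAPI dependencies, and the secret, claim names, and helper functions here are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # hypothetical; a real deployment loads this from settings

def _b64(data: bytes) -> str:
    # URL-safe base64 without padding, as used in compact JWTs
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(openid: str, ttl_s: int = 3600) -> str:
    """Issue a compact HS256-style token for a WeChat openid."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"sub": openid, "exp": int(time.time()) + ttl_s}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str) -> dict:
    """Validate signature and expiry; return the payload claims."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    pad = "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload + pad))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

In the real route, `verify_token` would sit behind a dependency that reads the `Authorization: Bearer …` header and maps `sub` back to the mini-program user.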
Neo
2026-02-26 08:03:53 +08:00
parent fafc95e64c
commit b25308c3f4
224 changed files with 17660 additions and 32198 deletions


@@ -0,0 +1,53 @@
[00:59:54] ETL full-pipeline data consistency check started...
Log: 7c2227788c1c4e34800094446e970631.log
Successful ODS tasks: 21
Database connection succeeded (read-only mode)
[API→ODS] Starting per-table checks...
Check ODS_ASSISTANT_ACCOUNT → ods.assistant_accounts_master... ✅
Check ODS_ASSISTANT_LEDGER → ods.assistant_service_records... ✅
Check ODS_GOODS_CATEGORY → ods.stock_goods_category_tree... ❌ differences found
Check ODS_GROUP_BUY_REDEMPTION → ods.group_buy_redemption_records... ✅
Check ODS_GROUP_PACKAGE → ods.group_buy_packages... ❌ differences found
Check ODS_INVENTORY_CHANGE → ods.goods_stock_movements... ✅
Check ODS_INVENTORY_STOCK → ods.goods_stock_summary... ✅
Check ODS_MEMBER → ods.member_profiles... ❌ differences found
Check ODS_MEMBER_BALANCE → ods.member_balance_changes... ✅
Check ODS_MEMBER_CARD → ods.member_stored_value_cards... ❌ differences found
Check ODS_PAYMENT → ods.payment_transactions... ❌ differences found
Check ODS_PLATFORM_COUPON → ods.platform_coupon_redemption_records... ✅
Check ODS_RECHARGE_SETTLE → ods.recharge_settlements... ❌ differences found
Check ODS_REFUND → ods.refund_transactions... ✅
Check ODS_SETTLEMENT_RECORDS → ods.settlement_records... ❌ differences found
Check ODS_STORE_GOODS → ods.store_goods_master... ❌ differences found
Check ODS_STORE_GOODS_SALES → ods.store_goods_sales_records... ⚠️ no API JSON
Check ODS_TABLES → ods.site_tables_master... ✅
Check ODS_TABLE_FEE_DISCOUNT → ods.table_fee_discount_records... ❌ differences found
Check ODS_TABLE_USE → ods.table_fee_transactions... ✅
Check ODS_TENANT_GOODS → ods.tenant_goods_master... ❌ differences found
[ODS→DWD] Starting per-table checks...
Check dwd.dim_assistant → ods.assistant_accounts_master... ❌ differences found
Check dwd.dim_goods_category → ods.stock_goods_category_tree... ❌ differences found
Check dwd.dim_groupbuy_package → ods.group_buy_packages... ❌ differences found
Check dwd.dim_member → ods.member_profiles... ❌ differences found
Check dwd.dim_member_card_account → ods.member_stored_value_cards... ❌ differences found
Check dwd.dim_store_goods → ods.store_goods_master... ❌ differences found
Check dwd.dim_table → ods.site_tables_master... ❌ differences found
Check dwd.dim_tenant_goods → ods.tenant_goods_master... ❌ differences found
Check dwd.dwd_assistant_service_log → ods.assistant_service_records... ❌ differences found
Check dwd.dwd_groupbuy_redemption → ods.group_buy_redemption_records... ❌ differences found
Check dwd.dwd_member_balance_change → ods.member_balance_changes... ❌ differences found
Check dwd.dwd_payment → ods.payment_transactions... ❌ differences found
Check dwd.dwd_platform_coupon_redemption → ods.platform_coupon_redemption_records... ❌ differences found
Check dwd.dwd_recharge_order → ods.recharge_settlements... ✅
Check dwd.dwd_refund → ods.refund_transactions... ❌ differences found
Check dwd.dwd_settlement_head → ods.settlement_records... ✅
Check dwd.dwd_store_goods_sale → ods.store_goods_sales_records... ❌ differences found
Check dwd.dwd_table_fee_adjust → ods.table_fee_discount_records... ❌ differences found
Check dwd.dwd_table_fee_log → ods.table_fee_transactions... ❌ differences found
[DWD→DWS] Starting checks...
DWS tables: 34 total, 18 with data
✅ Report generated: C:\NeoZQYY\export\ETL-Connectors\feiqiu\REPORTS\consistency_check_20260225_005954.md
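The per-table verdicts above boil down to comparing a count from the API snapshot against the corresponding warehouse table. A minimal sketch of that classification follows; the helper names and example tables are illustrative, not the actual `consistency_checker` code:

```python
from typing import Dict, Optional, Tuple

def compare_counts(api_count: Optional[int], db_count: int) -> str:
    """Classify one API→ODS table pair the way the report marks it."""
    if api_count is None:
        return "⚠ no API JSON"  # no source snapshot to compare against
    if api_count == db_count:
        return "✓"  # counts agree: consistent
    return f"✗ differs (api={api_count}, db={db_count})"

def run_checks(pairs: Dict[str, Tuple[Optional[int], int]]) -> Dict[str, str]:
    """pairs maps target table -> (API snapshot count, warehouse row count)."""
    return {table: compare_counts(a, d) for table, (a, d) in pairs.items()}
```

The real checker presumably compares more than row counts (keys, checksums, per-day windows), but the three-way verdict shape is the same.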


@@ -0,0 +1,65 @@
"""One-off script: run the 2026-02-24 migrations cleanup_assistant_abolish + add_goods_stock_warning_info."""
import os
import sys
from pathlib import Path

import psycopg2
from dotenv import load_dotenv

# Load the repo-root .env
root = Path(__file__).resolve().parent.parent.parent
load_dotenv(root / ".env")
dsn = os.environ.get("TEST_DB_DSN")
if not dsn:
    print("ERROR: TEST_DB_DSN is not set", file=sys.stderr)
    sys.exit(1)

migrations = [
    root / "db/etl_feiqiu/migrations/2026-02-24__cleanup_assistant_abolish_residual.sql",
    root / "db/etl_feiqiu/migrations/2026-02-24__add_goods_stock_warning_info.sql",
]
conn = psycopg2.connect(dsn)
try:
    for mig in migrations:
        print(f"\n--- Running: {mig.name} ---")
        sql = mig.read_text(encoding="utf-8")
        # Drop the verification-SQL comment section (everything after "-- =====")
        exec_sql = sql.split("-- =====")[0]
        with conn.cursor() as cur:
            cur.execute(exec_sql)
        conn.commit()
        print("  done")
    # Verify Task 1
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM meta.etl_task WHERE task_code IN ('ODS_ASSISTANT_ABOLISH', 'ASSISTANT_ABOLISH')")
        print(f"\nTask 1 check - residual ASSISTANT_ABOLISH rows: {cur.rowcount}")
    # Verify Task 4
    with conn.cursor() as cur:
        cur.execute("""
            SELECT column_name, data_type FROM information_schema.columns
            WHERE table_schema = 'ods' AND table_name = 'store_goods_master'
              AND column_name IN ('warning_sales_day', 'warning_day_max', 'warning_day_min')
            ORDER BY column_name
        """)
        rows = cur.fetchall()
        print(f"Task 4 check - new ODS columns: {rows}")
    with conn.cursor() as cur:
        cur.execute("""
            SELECT column_name, data_type FROM information_schema.columns
            WHERE table_schema = 'dwd' AND table_name = 'dim_store_goods_ex'
              AND column_name IN ('warning_sales_day', 'warning_day_max', 'warning_day_min')
            ORDER BY column_name
        """)
        rows = cur.fetchall()
        print(f"Task 4 check - new DWD columns: {rows}")
finally:
    conn.close()
print("\nAll migrations executed")

File diff suppressed because one or more lines are too long


@@ -0,0 +1,56 @@
"""Verify the backend is reachable: health check + JWT auth + /api/tasks/flows returning 200."""
import json
import sys
import urllib.request

BASE = "http://localhost:8000"

def main():
    # 1. Health check
    try:
        r = urllib.request.urlopen(f"{BASE}/health")
        print(f"[OK] GET /health -> {r.status}")
    except Exception as e:
        print(f"[FAIL] GET /health -> {e}")
        sys.exit(1)
    # 2. Log in and obtain a JWT
    try:
        login_data = json.dumps({"username": "admin", "password": "admin123"}).encode()
        req = urllib.request.Request(
            f"{BASE}/api/auth/login",
            data=login_data,
            headers={"Content-Type": "application/json"},
        )
        resp = urllib.request.urlopen(req)
        body = json.loads(resp.read().decode())
        token = body["access_token"]
        print(f"[OK] POST /api/auth/login -> JWT obtained ({token[:20]}...)")
    except Exception as e:
        print(f"[FAIL] POST /api/auth/login -> {e}")
        sys.exit(1)
    # 3. Verify /api/tasks/flows
    try:
        req2 = urllib.request.Request(
            f"{BASE}/api/tasks/flows",
            headers={"Authorization": f"Bearer {token}"},
        )
        resp2 = urllib.request.urlopen(req2)
        data = json.loads(resp2.read().decode())
        print(f"[OK] GET /api/tasks/flows -> {resp2.status}")
        if isinstance(data, list):
            print(f"  Flows count: {len(data)}")
            for f in data[:5]:
                print(f"  - {f}")
        else:
            preview = json.dumps(data, indent=2, ensure_ascii=False)[:300]
            print(f"  Response preview: {preview}")
    except Exception as e:
        print(f"[FAIL] GET /api/tasks/flows -> {e}")
        sys.exit(1)
    print("\n=== Backend service checks passed ===")

if __name__ == "__main__":
    main()


@@ -0,0 +1,93 @@
"""Verify and repair the 2026-02-24 migration: run the ALTER TABLE statements directly."""
import os
import sys
from pathlib import Path

import psycopg2
from dotenv import load_dotenv

root = Path(__file__).resolve().parent.parent.parent
load_dotenv(root / ".env")
dsn = os.environ.get("TEST_DB_DSN")
if not dsn:
    print("ERROR: TEST_DB_DSN is not set", file=sys.stderr)
    sys.exit(1)

conn = psycopg2.connect(dsn)
conn.autocommit = True
try:
    with conn.cursor() as cur:
        # First check whether the columns already exist
        cur.execute("""
            SELECT column_name FROM information_schema.columns
            WHERE table_schema = 'ods' AND table_name = 'store_goods_master'
              AND column_name IN ('warning_sales_day', 'warning_day_max', 'warning_day_min')
        """)
        existing = [r[0] for r in cur.fetchall()]
        print(f"Existing ODS columns: {existing}")
        if len(existing) < 3:
            print("ODS columns incomplete; running ALTER TABLE...")
            cur.execute("""
                ALTER TABLE ods.store_goods_master
                ADD COLUMN IF NOT EXISTS warning_sales_day NUMERIC(18,2),
                ADD COLUMN IF NOT EXISTS warning_day_max INTEGER,
                ADD COLUMN IF NOT EXISTS warning_day_min INTEGER
            """)
            cur.execute("""
                COMMENT ON COLUMN ods.store_goods_master.warning_sales_day IS
                    'Average daily sales used for stock warnings. Source: goodsStockWarningInfo.sales_day';
                COMMENT ON COLUMN ods.store_goods_master.warning_day_max IS
                    'Upper bound of stock-warning days. Source: goodsStockWarningInfo.warning_day_max';
                COMMENT ON COLUMN ods.store_goods_master.warning_day_min IS
                    'Lower bound of stock-warning days. Source: goodsStockWarningInfo.warning_day_min';
            """)
            print("ODS ALTER done")
        cur.execute("""
            SELECT column_name FROM information_schema.columns
            WHERE table_schema = 'dwd' AND table_name = 'dim_store_goods_ex'
              AND column_name IN ('warning_sales_day', 'warning_day_max', 'warning_day_min')
        """)
        existing_dwd = [r[0] for r in cur.fetchall()]
        print(f"Existing DWD columns: {existing_dwd}")
        if len(existing_dwd) < 3:
            print("DWD columns incomplete; running ALTER TABLE...")
            cur.execute("""
                ALTER TABLE dwd.dim_store_goods_ex
                ADD COLUMN IF NOT EXISTS warning_sales_day NUMERIC(18,2),
                ADD COLUMN IF NOT EXISTS warning_day_max INTEGER,
                ADD COLUMN IF NOT EXISTS warning_day_min INTEGER
            """)
            cur.execute("""
                COMMENT ON COLUMN dwd.dim_store_goods_ex.warning_sales_day IS
                    'Average daily sales used for stock warnings. Source: goodsStockWarningInfo.sales_day';
                COMMENT ON COLUMN dwd.dim_store_goods_ex.warning_day_max IS
                    'Upper bound of stock-warning days. Source: goodsStockWarningInfo.warning_day_max';
                COMMENT ON COLUMN dwd.dim_store_goods_ex.warning_day_min IS
                    'Lower bound of stock-warning days. Source: goodsStockWarningInfo.warning_day_min';
            """)
            print("DWD ALTER done")
    # Final verification
    with conn.cursor() as cur:
        cur.execute("""
            SELECT column_name, data_type FROM information_schema.columns
            WHERE table_schema = 'ods' AND table_name = 'store_goods_master'
              AND column_name IN ('warning_sales_day', 'warning_day_max', 'warning_day_min')
            ORDER BY column_name
        """)
        print(f"\nFinal ODS check: {cur.fetchall()}")
        cur.execute("""
            SELECT column_name, data_type FROM information_schema.columns
            WHERE table_schema = 'dwd' AND table_name = 'dim_store_goods_ex'
              AND column_name IN ('warning_sales_day', 'warning_day_max', 'warning_day_min')
            ORDER BY column_name
        """)
        print(f"Final DWD check: {cur.fetchall()}")
finally:
    conn.close()


@@ -0,0 +1,119 @@
# -*- coding: utf-8 -*-
"""API health check script.

Task 1.3: log in for a JWT → verify the task registry → run sync-check.
"""
import sys

import requests

BASE_URL = "http://localhost:8000"
ADMIN_USER = "admin"
ADMIN_PASS = "admin123"

def main() -> int:
    ok = True
    # ── 1. Log in and obtain a JWT ──────────────────────────
    print("=" * 60)
    print("[1/3] POST /api/auth/login — obtain JWT")
    print("=" * 60)
    try:
        resp = requests.post(
            f"{BASE_URL}/api/auth/login",
            json={"username": ADMIN_USER, "password": ADMIN_PASS},
            timeout=10,
        )
    except requests.ConnectionError:
        print("✗ Cannot reach the backend; make sure uvicorn is running on :8000")
        return 1
    if resp.status_code != 200:
        print(f"✗ Login failed: HTTP {resp.status_code}")
        print(f"  Response: {resp.text[:500]}")
        return 1
    tokens = resp.json()
    jwt = tokens["access_token"]
    print(f"✓ Login succeeded, JWT (first 40 chars): {jwt[:40]}...")
    print(f"  token_type: {tokens['token_type']}")
    print()
    headers = {"Authorization": f"Bearer {jwt}"}
    # ── 2. Fetch the task registry ──────────────────────────
    print("=" * 60)
    print("[2/3] GET /api/tasks/registry — verify task registry")
    print("=" * 60)
    resp = requests.get(f"{BASE_URL}/api/tasks/registry", headers=headers, timeout=10)
    if resp.status_code != 200:
        print(f"✗ Fetching the registry failed: HTTP {resp.status_code}")
        print(f"  Response: {resp.text[:500]}")
        ok = False
    else:
        data = resp.json()
        groups = data.get("groups", {})
        total_tasks = sum(len(tasks) for tasks in groups.values())
        common_tasks = sum(
            1 for tasks in groups.values() for t in tasks if t.get("is_common")
        )
        if total_tasks == 0:
            print("✗ Task registry is empty!")
            ok = False
        else:
            print("✓ Task registry is non-empty")
            print(f"  Business domains: {len(groups)}")
            print(f"  Total tasks: {total_tasks}")
            print(f"  Common tasks: {common_tasks}")
            print(f"  Domain list: {', '.join(sorted(groups.keys()))}")
            # Print task counts per domain
            for domain in sorted(groups.keys()):
                tasks = groups[domain]
                n_common = sum(1 for t in tasks if t.get("is_common"))
                print(f"    {domain}: {len(tasks)} tasks ({n_common} common)")
    print()
    # ── 3. Sync-check ───────────────────────────────────────
    print("=" * 60)
    print("[3/3] GET /api/tasks/sync-check — backend vs ETL registry sync check")
    print("=" * 60)
    resp = requests.get(f"{BASE_URL}/api/tasks/sync-check", headers=headers, timeout=30)
    if resp.status_code != 200:
        print(f"✗ sync-check request failed: HTTP {resp.status_code}")
        print(f"  Response: {resp.text[:500]}")
        ok = False
    else:
        sc = resp.json()
        if sc.get("error"):
            print(f"⚠ sync-check returned an error: {sc['error']}")
            ok = False
        elif sc.get("in_sync"):
            print("✓ Backend registry fully in sync with the real ETL registry (in_sync=true)")
        else:
            print("✗ Backend registry out of sync with the real ETL registry (in_sync=false)")
            if sc.get("backend_only"):
                print(f"  Backend only (missing from ETL): {sc['backend_only']}")
            if sc.get("etl_only"):
                print(f"  ETL only (missing from backend): {sc['etl_only']}")
            ok = False
    print()
    # ── Summary ─────────────────────────────────────────────
    print("=" * 60)
    if ok:
        print("✓ All API health checks passed")
    else:
        print("✗ API health checks found problems; see details above")
    print("=" * 60)
    return 0 if ok else 1

if __name__ == "__main__":
    sys.exit(main())


@@ -0,0 +1,362 @@
"""
ETL full-pipeline integration report generator.

Fetches execution logs from the backend API, parses timing data and error
information, and writes a combined integration report to the directory named
by the SYSTEM_LOG_ROOT environment variable.
"""
import os
import re
import sys
from datetime import datetime
from pathlib import Path

from dotenv import load_dotenv

# Load the repo-root .env
load_dotenv(Path(__file__).resolve().parents[2] / ".env")
SYSTEM_LOG_ROOT = os.environ.get("SYSTEM_LOG_ROOT")
if not SYSTEM_LOG_ROOT:
    print("ERROR: SYSTEM_LOG_ROOT environment variable is not set", file=sys.stderr)
    sys.exit(1)

# ── Execution metadata (taken from the API history) ──
EXEC_ID = "1e1c93ff-2ab0-42e6-b529-ec14b551c91a"
EXEC_STATUS = "success"
EXEC_EXIT_CODE = 0
EXEC_STARTED = "2026-02-24T02:15:26.689731+08:00"
EXEC_FINISHED = "2026-02-24T02:50:39.679479+08:00"
EXEC_DURATION_MS = 2112989
TASK_COUNT = 41
FLOW = "api_full"
PROCESSING_MODE = "full_window"
WINDOW_START = "2025-11-01"
WINDOW_END = "2026-02-20"
WINDOW_SPLIT_DAYS = 30

# ── Log file path ──
SCRIPT_DIR = Path(__file__).resolve().parent
ERROR_LOG_PATH = SCRIPT_DIR / "_tmp_error_log.txt"
if not ERROR_LOG_PATH.exists():
    print(f"ERROR: log file not found: {ERROR_LOG_PATH}", file=sys.stderr)
    sys.exit(1)
log_text = ERROR_LOG_PATH.read_text(encoding="utf-8")
lines = log_text.splitlines()

# ── 1. Parse each task's start/end time ──
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")

def parse_ts(line: str):
    m = TS_RE.match(line)
    if m:
        return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return None

# Extract start and end time for every task; the regex literals below must
# stay in Chinese because they match the ETL log's Chinese message format
task_timings: dict[str, dict] = {}
START_RE = re.compile(r"开始执行(\w+) \(ODS\)|(\w+): ODS fetch\+load start")
COMPLETE_RE = re.compile(r"(\w+) ODS 任务完成:|(\w+): 完成,统计=|(\w+): 完成, 统计=|(\w+): 结果统计:")
FAIL_RE = re.compile(r"任务 (\w+) 失败:")
DWD_COMPLETE_RE = re.compile(r"DWD_LOAD_FROM_ODS: 完成")
DWS_COMPLETE_RE = re.compile(r"(DWS_\w+): 完成")

for line in lines:
    ts = parse_ts(line)
    if not ts:
        continue
    # ODS task start
    m = START_RE.search(line)
    if m:
        task = m.group(1) or m.group(2)
        if task and task not in task_timings:
            task_timings[task] = {"start": ts, "end": None, "status": "running"}
    # ODS task completion
    m = COMPLETE_RE.search(line)
    if m:
        task = m.group(1) or m.group(2) or m.group(3) or m.group(4)
        if task and task in task_timings:
            task_timings[task]["end"] = ts
            task_timings[task]["status"] = "success"
    # DWD start
    if "DWD_LOAD_FROM_ODS" in line and ("开始" in line or "start" in line.lower()):
        if "DWD_LOAD_FROM_ODS" not in task_timings:
            task_timings["DWD_LOAD_FROM_ODS"] = {"start": ts, "end": None, "status": "running"}
    # DWD completion
    m = DWD_COMPLETE_RE.search(line)
    if m and "DWD_LOAD_FROM_ODS" in task_timings:
        task_timings["DWD_LOAD_FROM_ODS"]["end"] = ts
        task_timings["DWD_LOAD_FROM_ODS"]["status"] = "success"
    # DWS task start (first occurrence only)
    for pattern in [r"(DWS_\w+):.*(?:开始|start)", r"开始执行.*(DWS_\w+)"]:
        m2 = re.search(pattern, line)
        if m2:
            task = m2.group(1)
            if task not in task_timings:
                task_timings[task] = {"start": ts, "end": None, "status": "running"}
    # DWS task completion
    m = DWS_COMPLETE_RE.search(line)
    if m:
        task = m.group(1)
        if task in task_timings:
            task_timings[task]["end"] = ts
            task_timings[task]["status"] = "success"
    # Task failure
    m = FAIL_RE.search(line)
    if m:
        task = m.group(1)
        if task in task_timings:
            task_timings[task]["end"] = ts
            task_timings[task]["status"] = "failed"
        else:
            task_timings[task] = {"start": ts, "end": ts, "status": "failed"}

# Compute durations
for task, info in task_timings.items():
    if info["start"] and info["end"]:
        info["duration_s"] = (info["end"] - info["start"]).total_seconds()
    else:
        info["duration_s"] = 0
# ── 2. Collect errors and warnings ──
errors: list[dict] = []
warnings: list[dict] = []
for i, line in enumerate(lines):
    ts = parse_ts(line)
    if "ERROR" in line:
        # Capture the error line plus up to 10 lines of traceback context
        context_lines = [line]
        for j in range(i + 1, min(i + 10, len(lines))):
            next_line = lines[j]
            if TS_RE.match(next_line) and "Traceback" not in next_line:
                break
            context_lines.append(next_line)
        errors.append({"ts": ts, "line": line.strip(), "context": "\n".join(context_lines)})
    elif "WARNING" in line:
        warnings.append({"ts": ts, "line": line.strip()})

# ── 3. Categorize errors ──
error_categories: dict[str, list] = {}
for err in errors:
    if "未知的任务类型" in err["line"]:  # matches the ValueError text in the log
        cat = "Task not registered"
    elif "member_birthday_manual" in err["context"]:
        cat = "Missing FDW table (root cause)"
    elif "InFailedSqlTransaction" in err["context"]:
        cat = "Cascading transaction failure"
    else:
        cat = "Other"
    error_categories.setdefault(cat, []).append(err)

# ── 4. Group timings by layer ──
ods_tasks = {k: v for k, v in task_timings.items() if k.startswith("ODS_")}
dwd_tasks = {k: v for k, v in task_timings.items() if k.startswith("DWD_")}
dws_tasks = {k: v for k, v in task_timings.items() if k.startswith("DWS_")}
# Top-5 slowest tasks
all_with_duration = [(k, v) for k, v in task_timings.items() if v["duration_s"] > 0]
top5 = sorted(all_with_duration, key=lambda x: x[1]["duration_s"], reverse=True)[:5]
# Total duration per layer
ods_total = sum(v["duration_s"] for v in ods_tasks.values())
dwd_total = sum(v["duration_s"] for v in dwd_tasks.values())
dws_total = sum(v["duration_s"] for v in dws_tasks.values())
# Success/failure counts
success_count = sum(1 for v in task_timings.values() if v["status"] == "success")
failed_count = sum(1 for v in task_timings.values() if v["status"] == "failed")
failed_tasks = [k for k, v in task_timings.items() if v["status"] == "failed"]

# ── 5. Generate the report ──
def fmt_duration(seconds: float) -> str:
    """Format seconds as 'Xm YYs' or 'Xh YYm ZZs'."""
    if seconds < 0:
        return "N/A"
    m, s = divmod(int(seconds), 60)
    h, m = divmod(m, 60)
    if h > 0:
        return f"{h}h {m:02d}m {s:02d}s"
    return f"{m}m {s:02d}s"

report_lines: list[str] = []

def w(line: str = ""):
    report_lines.append(line)

w("# ETL Full-Pipeline Integration Report")
w()
w(f"> Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
w()
# ── Execution summary ──
w("## Execution Summary")
w()
w("| Item | Value |")
w("|------|-------|")
w(f"| Execution ID | `{EXEC_ID}` |")
w(f"| Flow | `{FLOW}` (API → ODS → DWD → DWS → INDEX) |")
w(f"| Processing mode | `{PROCESSING_MODE}` |")
w(f"| Time window | {WINDOW_START} ~ {WINDOW_END} |")
w(f"| Window split | by day, {WINDOW_SPLIT_DAYS} days/slice (4 slices total) |")
w("| Forced full load | yes (`force_full`) |")
w(f"| Task count | {TASK_COUNT} (all tasks with `is_common=True`) |")
w(f"| Started | {EXEC_STARTED} |")
w(f"| Finished | {EXEC_FINISHED} |")
w(f"| Total duration | {fmt_duration(EXEC_DURATION_MS / 1000)} ({EXEC_DURATION_MS}ms) |")
w(f"| Exit code | {EXEC_EXIT_CODE} |")
w(f"| Final status | `{EXEC_STATUS}` |")
w(f"| Tasks succeeded | {success_count} / {success_count + failed_count} |")
w(f"| Tasks failed | {failed_count} |")
w()
# ── Performance report ──
w("## Performance Report")
w()
w("### Per-Layer Totals")
w()
w("| Layer | Tasks | Total | Average |")
w("|-------|-------|-------|---------|")
ods_count = len(ods_tasks)
dwd_count = len(dwd_tasks)
dws_count = len(dws_tasks)
w(f"| ODS | {ods_count} | {fmt_duration(ods_total)} | {fmt_duration(ods_total / max(ods_count, 1))} |")
w(f"| DWD | {dwd_count} | {fmt_duration(dwd_total)} | {fmt_duration(dwd_total / max(dwd_count, 1))} |")
w(f"| DWS+INDEX | {dws_count} | {fmt_duration(dws_total)} | {fmt_duration(dws_total / max(dws_count, 1))} |")
w()
w("### Top-5 Slowest Tasks")
w()
w("| Rank | Task | Duration | Status |")
w("|------|------|----------|--------|")
for rank, (task, info) in enumerate(top5, 1):
    w(f"| {rank} | `{task}` | {fmt_duration(info['duration_s'])} | {info['status']} |")
w()
w("### ODS Layer Task Timings")
w()
w("| Task | Start | End | Duration | Rows |")
w("|------|-------|-----|----------|------|")
# Extract fetched-row counts from the log (Chinese literal matches the log format)
fetch_counts: dict[str, int] = {}
for line in lines:
    m = re.search(r"(\w+) ODS 任务完成: \{'fetched': (\d+)", line)
    if m:
        fetch_counts[m.group(1)] = int(m.group(2))
for task in sorted(ods_tasks.keys()):
    info = ods_tasks[task]
    start_str = info["start"].strftime("%H:%M:%S") if info["start"] else "?"
    end_str = info["end"].strftime("%H:%M:%S") if info["end"] else "?"
    fetched = fetch_counts.get(task, "?")
    w(f"| `{task}` | {start_str} | {end_str} | {fmt_duration(info['duration_s'])} | {fetched} |")
w()
w("### DWD + DWS Layer Task Timings")
w()
w("| Task | Start | End | Duration | Status |")
w("|------|-------|-----|----------|--------|")
for task in sorted({**dwd_tasks, **dws_tasks}.keys()):
    info = task_timings[task]
    start_str = info["start"].strftime("%H:%M:%S") if info["start"] else "?"
    end_str = info["end"].strftime("%H:%M:%S") if info["end"] else "?"
    w(f"| `{task}` | {start_str} | {end_str} | {fmt_duration(info['duration_s'])} | {info['status']} |")
w()
# ── Debug report ──
w("## DEBUG Report")
w()
if not errors and not warnings:
    w("No errors or warnings.")
else:
    w(f"Found **{len(errors)}** ERRORs and **{len(warnings)}** WARNINGs.")
    w()
    w("### Error Categories")
    w()
    w("| Category | Count | Notes |")
    w("|----------|-------|-------|")
    for cat, errs in error_categories.items():
        if cat == "Task not registered":
            desc = "`ODS_ASSISTANT_ABOLISH` is not registered in `task_registry.py`"
        elif cat == "Missing FDW table (root cause)":
            desc = "relation `fdw_app.member_birthday_manual` does not exist"
        elif cat == "Cascading transaction failure":
            desc = "the root-cause error aborted the transaction; all later DWS tasks hit `InFailedSqlTransaction`"
        else:
            desc = "uncategorized"
        w(f"| {cat} | {len(errs)} | {desc} |")
    w()
    w("### Error Details")
    w()
    w("#### Error 1: ODS_ASSISTANT_ABOLISH task not registered")
    w()
    w("- Time: 02:15:59")
    w("- Error: `ValueError: 未知的任务类型: ODS_ASSISTANT_ABOLISH`")
    w("- Location: `orchestration/task_registry.py:96`")
    w("- Cause: `ODS_ASSISTANT_ABOLISH` is flagged `is_common=True` in the backend task registry, but the ETL `task_registry` has no implementation class registered for it")
    w("- Impact: only this task fails; other tasks are unaffected")
    w("- Suggestion: finish the task registration from the `assistant-abolish-cleanup` spec, or set the task's `is_common` to `False` in the backend registry")
    w()
    w("#### Error 2: missing FDW table cascades through DWS (root cause)")
    w()
    w("- Time: 02:50:36")
    w("- Root cause: `UndefinedTable: 关系 \"fdw_app.member_birthday_manual\" 不存在` (relation does not exist)")
    w("- Triggering task: `DWS_MEMBER_CONSUMPTION`")
    w("- Fallback attempt: the code tries to fall back to `dim_member.birthday`, but the fallback query runs inside the already-failed transaction and errors out as well")
    w("- Cascade: once the transaction is aborted, the following 10 tasks all fail with `InFailedSqlTransaction`:")
    w("  - `DWS_MEMBER_VISIT`")
    w("  - `DWS_FINANCE_DAILY`")
    w("  - `DWS_FINANCE_RECHARGE`")
    w("  - `DWS_FINANCE_INCOME_STRUCTURE`")
    w("  - `DWS_FINANCE_DISCOUNT_DETAIL`")
    w("  - `DWS_ASSISTANT_MONTHLY`")
    w("  - `DWS_ASSISTANT_FINANCE`")
    w("  - `DWS_WINBACK_INDEX`")
    w("  - `DWS_NEWCONV_INDEX`")
    w("  - `DWS_RELATION_INDEX`")
    w("- Suggestions:")
    w("  1. Create the `member_birthday_manual` table (or its FDW mapping) in the `zqyy_app` database")
    w("  2. Or change the fallback logic in `DWS_MEMBER_CONSUMPTION` to ROLLBACK after an FDW failure before retrying the degraded query")
    w("  3. Consider running each DWS task in its own transaction/connection so a single failure cannot cascade")
    w()
if warnings:
    w("### Warning Details")
    w()
    for warn in warnings:
        w(f"- `{warn['line']}`")
    w()
# ── Black-box test report placeholder ──
w("## Black-Box Test Report")
w()
w("_To be appended once the consistency check completes_")
w()
# ── Write the report ──
output_dir = Path(SYSTEM_LOG_ROOT)
output_dir.mkdir(parents=True, exist_ok=True)
date_str = datetime.now().strftime("%Y-%m-%d")
output_path = output_dir / f"{date_str}__etl_integration_report.md"
output_path.write_text("\n".join(report_lines), encoding="utf-8")
print(f"Report generated: {output_path}")
print(f"Task stats: {success_count} succeeded / {failed_count} failed / {success_count + failed_count} total")
print(f"Errors: {len(errors)}, warnings: {len(warnings)}")
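The report's second suggestion (ROLLBACK before retrying the degraded query) can be sketched as a generic helper. This is a minimal illustration of the pattern, not the actual `DWS_MEMBER_CONSUMPTION` code; the function name and SQL strings are assumptions:

```python
def query_with_fallback(conn, primary_sql: str, fallback_sql: str):
    """Run primary_sql; on failure, ROLLBACK first, then retry with fallback_sql.

    Without the rollback, the retry runs inside the aborted transaction and
    fails with InFailedSqlTransaction (the cascade described in the report).
    Works with any DB-API connection exposing cursor()/rollback().
    """
    try:
        with conn.cursor() as cur:
            cur.execute(primary_sql)
            return cur.fetchall()
    except Exception:
        conn.rollback()  # clear the aborted transaction before retrying
        with conn.cursor() as cur:
            cur.execute(fallback_sql)
            return cur.fetchall()
```

Suggestion 3 (one transaction per DWS task) would make the rollback unnecessary for cascade prevention, but the helper above is the smaller change.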

scripts/ops/etl_monitor.py

@@ -0,0 +1,209 @@
# -*- coding: utf-8 -*-
"""ETL full-pipeline integration monitor.

Polls execution status and logs every 30 seconds, detects errors/warnings,
and gives up after 30 minutes without new log output. Results are written as
JSON for the follow-up report generator.

Usage: python scripts/ops/etl_monitor.py <execution_id> <jwt_token>
"""
import json
import re
import sys
import time
import urllib.request
from datetime import datetime, timezone

BASE_URL = "http://localhost:8000"
POLL_INTERVAL = 30  # seconds
MAX_IDLE_MINUTES = 30
MAX_IDLE_SECONDS = MAX_IDLE_MINUTES * 60

# Match genuine error/warning lines; JSON stats such as "'errors': 0" are
# filtered by the explicit "'errors':" checks below (the original trailing
# negative lookahead was a no-op, so it has been dropped)
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL)\b", re.IGNORECASE)
TRACEBACK_PATTERN = re.compile(r"Traceback \(most recent call last\)")
EXCEPTION_PATTERN = re.compile(r"^\[stderr\].*Exception:", re.IGNORECASE)
WARNING_PATTERN = re.compile(r"\bWARNING\b", re.IGNORECASE)

def api_get(path: str, token: str) -> dict:
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=15) as resp:
        return json.loads(resp.read().decode())

def classify_log_lines(lines: list[str]) -> dict:
    """Classify log lines into errors and warnings."""
    errors, warnings = [], []
    for i, line in enumerate(lines):
        if TRACEBACK_PATTERN.search(line) or EXCEPTION_PATTERN.search(line):
            # Capture context: 5 lines before and after
            ctx_start = max(0, i - 5)
            ctx_end = min(len(lines), i + 6)
            errors.append({
                "line_no": i + 1,
                "text": line.strip(),
                "context": [l.strip() for l in lines[ctx_start:ctx_end]],
            })
        elif ERROR_PATTERN.search(line) and "'errors':" not in line:
            errors.append({"line_no": i + 1, "text": line.strip(), "context": []})
        elif WARNING_PATTERN.search(line) and "'errors':" not in line:
            warnings.append({"line_no": i + 1, "text": line.strip()})
    return {"errors": errors, "warnings": warnings}

def monitor(execution_id: str, token: str) -> dict:
    """Main polling loop; returns the full monitoring result."""
    print(f"[monitor] watching execution_id={execution_id}")
    print(f"[monitor] poll interval={POLL_INTERVAL}s, max idle={MAX_IDLE_MINUTES}min")
    start_time = datetime.now(timezone.utc)
    last_log_count = 0
    last_new_log_time = time.time()
    poll_count = 0
    all_log_text = ""
    final_status = "unknown"
    final_exit_code = None
    poll_history = []
    while True:
        poll_count += 1
        now = datetime.now(timezone.utc).isoformat()
        # Fetch logs
        try:
            logs_data = api_get(f"/api/execution/{execution_id}/logs", token)
        except Exception as e:
            print(f"[monitor] #{poll_count} {now} fetching logs failed: {e}")
            time.sleep(POLL_INTERVAL)
            continue
        log_text = logs_data.get("output_log") or ""
        lines = log_text.split("\n") if log_text else []
        current_count = len(lines)
        new_lines = current_count - last_log_count
        if new_lines > 0:
            last_new_log_time = time.time()
            all_log_text = log_text
        # Fetch execution status
        try:
            hist_data = api_get("/api/execution/history?limit=10", token)
            this_exec = next((h for h in hist_data if h["id"] == execution_id), None)
            status = this_exec["status"] if this_exec else "unknown"
            exit_code = this_exec.get("exit_code") if this_exec else None
            duration_ms = this_exec.get("duration_ms") if this_exec else None
        except Exception as e:
            print(f"[monitor] #{poll_count} {now} fetching status failed: {e}")
            status = "unknown"
            exit_code = None
            duration_ms = None
        poll_history.append({
            "poll": poll_count,
            "time": now,
            "log_lines": current_count,
            "new_lines": new_lines,
            "status": status,
        })
        # Print the newest few log lines
        if new_lines > 0:
            recent = lines[last_log_count:current_count]
            for line in recent[-3:]:
                print(f"  {line.strip()}")
        print(
            f"[monitor] #{poll_count} {now} | "
            f"log lines={current_count}(+{new_lines}) | status={status}"
        )
        last_log_count = current_count
        # Completion check
        if status in ("success", "failed", "cancelled"):
            final_status = status
            final_exit_code = exit_code
            print(f"[monitor] execution finished: status={status}, exit_code={exit_code}, duration_ms={duration_ms}")
            break
        # Idle-timeout check
        idle_seconds = time.time() - last_new_log_time
        if idle_seconds > MAX_IDLE_SECONDS:
            print(f"[monitor] timeout warning: no new logs for {MAX_IDLE_MINUTES} minutes")
            final_status = "timeout_warning"
            break
        time.sleep(POLL_INTERVAL)
    end_time = datetime.now(timezone.utc)
    # Classify the accumulated log
    all_lines = all_log_text.split("\n") if all_log_text else []
    classified = classify_log_lines(all_lines)
    result = {
        "execution_id": execution_id,
        "start_time": start_time.isoformat(),
        "end_time": end_time.isoformat(),
        "monitor_duration_s": (end_time - start_time).total_seconds(),
        "final_status": final_status,
        "final_exit_code": final_exit_code,
        "total_log_lines": len(all_lines),
        "total_polls": poll_count,
        "errors": classified["errors"],
        "warnings": classified["warnings"],
        "error_count": len(classified["errors"]),
        "warning_count": len(classified["warnings"]),
        "poll_history": poll_history,
        "full_log": all_log_text,
    }
    return result

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: python scripts/ops/etl_monitor.py <execution_id> <jwt_token>")
        sys.exit(1)
    exec_id = sys.argv[1]
    jwt = sys.argv[2]
    result = monitor(exec_id, jwt)
    # Write the result to JSON (a temporary location read by the report script)
    import os
    from pathlib import Path
    from dotenv import load_dotenv
    root_env = Path(__file__).resolve().parent.parent.parent / ".env"
    load_dotenv(root_env)
    log_root = os.environ.get("SYSTEM_LOG_ROOT")
    if not log_root:
        raise RuntimeError("SYSTEM_LOG_ROOT environment variable is not set")
    out_dir = Path(log_root)
    out_dir.mkdir(parents=True, exist_ok=True)
    date_str = datetime.now().strftime("%Y-%m-%d")
    out_path = out_dir / f"{date_str}__etl_monitor_result.json"
    with open(out_path, "w", encoding="utf-8") as f:
        # The full log is too large for the JSON; save it separately
        result_slim = {k: v for k, v in result.items() if k != "full_log"}
        json.dump(result_slim, f, ensure_ascii=False, indent=2)
    log_path = out_dir / f"{date_str}__etl_full_log.txt"
    with open(log_path, "w", encoding="utf-8") as f:
        f.write(result["full_log"])
    print(f"[monitor] result saved: {out_path}")
    print(f"[monitor] full log saved: {log_path}")
    print(f"[monitor] final status: {result['final_status']}, errors: {result['error_count']}, warnings: {result['warning_count']}")


@@ -0,0 +1,472 @@
"""
Fetch the ETL execution log from the API and extract fine-grained timing data.

Task 4.1: ops integration - performance timing analysis

Strategy:
- Every task uses the unified slice format "XXX: 开始执行(n/N)" / "XXX: 完成(n/N)"
- Tasks are classified into ODS / DWD / DWS / INDEX stages by name prefix
- Utility tasks log "开始执行工具类任务" / "工具类任务执行成功"
- Failed tasks log "任务 XXX 失败:"
"""
import os
import sys
import re
from collections import OrderedDict
from datetime import datetime

import requests
from dotenv import load_dotenv

# Load environment variables
load_dotenv(os.path.join(os.path.dirname(__file__), '..', '..', '.env'))
BACKEND_URL = "http://localhost:8000"
EXECUTION_ID = "41938155-db8c-4eec-9b81-9e5aef42fb8a"

# Task-to-layer mapping (from the task list in design.md)
INDEX_TASKS = {"DWS_WINBACK_INDEX", "DWS_NEWCONV_INDEX", "DWS_RELATION_INDEX"}
DWD_TASKS = {"DWD_LOAD_FROM_ODS"}
# DWS tasks = names starting with DWS_ that are not in INDEX_TASKS
# ODS tasks = names starting with ODS_

def classify_task(task_name):
    """Classify a task name into a pipeline stage."""
    if task_name in INDEX_TASKS:
        return "INDEX"
    if task_name in DWD_TASKS:
        return "DWD"
    if task_name.startswith("DWS_"):
        return "DWS"
    if task_name.startswith("ODS_"):
        return "ODS"
    return "OTHER"

def login():
    resp = requests.post(f"{BACKEND_URL}/api/auth/login",
                         json={"username": "admin", "password": "admin123"})
    resp.raise_for_status()
    return resp.json()["access_token"]

def get_logs(token):
    headers = {"Authorization": f"Bearer {token}"}
    resp = requests.get(f"{BACKEND_URL}/api/execution/{EXECUTION_ID}/logs",
                        headers=headers)
    resp.raise_for_status()
    data = resp.json()
    return data.get("error_log", "") or data.get("output_log", "")

def parse_timestamp(ts_str):
    try:
        return datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

def extract_timing_data(log_text):
    """Extract timing data for every task from the log.

    The regex literals below stay in Chinese because they match the ETL log's
    Chinese message format.
    """
    lines = log_text.split('\n')
    ts_re = re.compile(r'^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]')
    # Slice start: "XXX: 开始执行(1/4),窗口[...]"
    # also matches "XXX: 开始执行(1/4), 窗口[...]" (ASCII comma)
    slice_start_re = re.compile(
        r'(\w+): 开始执行\((\d+)/(\d+)\)[,,]\s*窗口\[([^\]]+)\]')
    # Slice completion: "XXX: 完成(1/4),已处理 30/111.17 天"
    slice_end_re = re.compile(
        r'(\w+): 完成\((\d+)/(\d+)\)[,,]\s*已处理')
    # Utility task start: "XXX: 开始执行工具类任务"
    util_start_re = re.compile(r'(\w+): 开始执行工具类任务')
    # Utility task completion: "XXX: 工具类任务执行成功"
    util_end_re = re.compile(r'(\w+): 工具类任务执行(成功|失败)')
    # Task failure: "任务 XXX 失败:"
    task_fail_re = re.compile(r'任务 (\w+) 失败')
    # Task skipped: "任务 XXX 未启用或不存在"
    task_skip_re = re.compile(r'任务 (\w+) 未启用或不存在')
    # Fetch-stage start (first log line of non-utility DWS tasks)
    fetch_start_re = re.compile(r'(\w+): (抓取阶段开始|ODS fetch\+load start)')
    # Task completion stats: "XXX: 完成,统计=" or "XXX: 完成, 统计="
    task_done_re = re.compile(r'(\w+): 完成[,,]\s*统计=')
    # ODS task completion: "ODS_XXX ODS 任务完成:"
    ods_done_re = re.compile(r'(\w+) ODS 任务完成')
    # Flow start/end
    flow_end_re = re.compile(r'所有任务执行完成|Flow 执行完成')
    flow_start_re = re.compile(r'开始执行 Flow: api_full|FLOW_API_FULL.*开始执行')
    # Data structure: task_name -> {layer, start, end, status, slices: [...]}
    tasks = OrderedDict()
    flow_start = None
    flow_end = None
    # Track slice start times
    pending_slices = {}  # (task_name, slice_idx) -> (ts, window)
    for line in lines:
        m = ts_re.match(line)
        if not m:
            continue
        ts = parse_timestamp(m.group(1))
        if not ts:
            continue
        content = line[m.end():].strip()
        # Strip the log-level prefix
        content = re.sub(
            r'^(INFO|WARNING|ERROR|DEBUG|CRITICAL)\s*\|\s*\w+\s*\|\s*', '', content)
        # Flow start
        if flow_start_re.search(content):
            if flow_start is None:
                flow_start = ts
        # Flow end
        if flow_end_re.search(content):
            flow_end = ts
        # Slice start
        sm = slice_start_re.search(content)
        if sm:
            task_name = sm.group(1)
            idx = int(sm.group(2))
            total = int(sm.group(3))
            window = sm.group(4)
            layer = classify_task(task_name)
            if task_name not in tasks:
                tasks[task_name] = {
                    'layer': layer, 'start': ts, 'end': None,
                    'status': 'running', 'total_slices': total, 'slices': []
                }
            pending_slices[(task_name, idx)] = (ts, window)
            continue
        # Slice completion
        sm = slice_end_re.search(content)
        if sm:
            task_name = sm.group(1)
            idx = int(sm.group(2))
            key = (task_name, idx)
            if key in pending_slices:
                start_ts, window = pending_slices.pop(key)
                dur = (ts - start_ts).total_seconds()
                if task_name in tasks:
                    tasks[task_name]['slices'].append({
                        'idx': idx, 'window': window,
                        'start': start_ts, 'end': ts, 'duration_sec': dur
                    })
                    tasks[task_name]['end'] = ts
            continue
        # Utility task start
        sm = util_start_re.search(content)
        if sm:
            task_name = sm.group(1)
            layer = classify_task(task_name)
            if task_name not in tasks:
                tasks[task_name] = {
                    'layer': layer, 'start': ts, 'end': None,
                    'status': 'running', 'total_slices': 0, 'slices': []
                }
            else:
                tasks[task_name]['start'] = ts
            continue
        # Utility task completion
        sm = util_end_re.search(content)
        if sm:
            task_name = sm.group(1)
            success = sm.group(2) == '成功'
            if task_name in tasks:
                tasks[task_name]['end'] = ts
                tasks[task_name]['status'] = 'success' if success else 'failed'
continue
# 抓取阶段开始(作为任务首次出现的标记)
sm = fetch_start_re.search(content)
if sm:
task_name = sm.group(1)
layer = classify_task(task_name)
if task_name not in tasks:
tasks[task_name] = {
'layer': layer, 'start': ts, 'end': None,
'status': 'running', 'total_slices': 0, 'slices': []
}
continue
# 任务完成统计: "XXX: 完成,统计=" 格式
sm = task_done_re.search(content)
if sm:
task_name = sm.group(1)
if task_name in tasks:
tasks[task_name]['end'] = ts
if tasks[task_name]['status'] == 'running':
tasks[task_name]['status'] = 'success'
continue
# ODS 任务完成: "ODS_XXX ODS 任务完成:" 格式
sm = ods_done_re.search(content)
if sm:
task_name = sm.group(1)
if task_name in tasks:
tasks[task_name]['end'] = ts
if tasks[task_name]['status'] == 'running':
tasks[task_name]['status'] = 'success'
continue
# 任务未启用/跳过
sm = task_skip_re.search(content)
if sm:
task_name = sm.group(1)
layer = classify_task(task_name)
tasks[task_name] = {
'layer': layer, 'start': ts, 'end': ts,
'status': 'skipped', 'total_slices': 0, 'slices': []
}
continue
# 任务失败
sm = task_fail_re.search(content)
if sm:
task_name = sm.group(1)
if task_name in tasks:
tasks[task_name]['end'] = ts
tasks[task_name]['status'] = 'failed'
else:
layer = classify_task(task_name)
tasks[task_name] = {
'layer': layer, 'start': ts, 'end': ts,
'status': 'failed', 'total_slices': 0, 'slices': []
}
# 计算每个任务的总耗时
for name, t in tasks.items():
if t['slices']:
t['total_duration_sec'] = sum(
s['duration_sec'] for s in t['slices'])
# 如果所有切片都完成了,标记为成功
if t['status'] == 'running' and t.get('total_slices', 0) > 0:
if len(t['slices']) >= t['total_slices']:
t['status'] = 'success'
            # ODS 任务切片全部完成即视为成功(其日志格式不含总切片数,
            # total_slices 为 0;本次运行固定切 4 片,故以 4 判断)
            if t['status'] == 'running' and t['layer'] == 'ODS' and len(t['slices']) == 4:
                t['status'] = 'success'
elif t['start'] and t['end']:
t['total_duration_sec'] = (t['end'] - t['start']).total_seconds()
else:
t['total_duration_sec'] = 0
return {'flow_start': flow_start, 'flow_end': flow_end, 'tasks': tasks}
def fmt_dur(seconds):
if seconds is None or seconds == 0:
return "<1s"
if seconds < 60:
return f"{seconds:.0f}s"
m = int(seconds // 60)
s = seconds % 60
return f"{m}m{s:.0f}s"
def fmt_ts(ts):
return ts.strftime('%H:%M:%S') if ts else 'N/A'
def generate_report(data):
"""生成 Markdown 计时报告"""
out = []
w = out.append
w("# ETL 执行精细计时报告\n")
w(f"- execution_id: `{EXECUTION_ID}`")
w(f"- Flow 开始: {data['flow_start']}")
w(f"- Flow 结束: {data['flow_end']}")
if data['flow_start'] and data['flow_end']:
total = (data['flow_end'] - data['flow_start']).total_seconds()
w(f"- 总耗时: {fmt_dur(total)}")
w("")
# 按阶段分组
layers = OrderedDict([('ODS', []), ('DWD', []), ('DWS', []), ('INDEX', [])])
for name, t in data['tasks'].items():
layer = t['layer']
if layer in layers:
layers[layer].append((name, t))
all_durations = [] # (name, dur, layer) for Top-5
layer_totals = {}
for layer, task_list in layers.items():
w(f"## {layer} 阶段\n")
if not task_list:
w(f"未检测到 {layer} 任务计时数据\n")
layer_totals[layer] = 0
continue
# 按耗时降序
task_list.sort(key=lambda x: x[1]['total_duration_sec'], reverse=True)
# 区分成功、失败、跳过
success_tasks = [(n, t) for n, t in task_list if t['status'] not in ('failed', 'skipped')]
failed_tasks = [(n, t) for n, t in task_list if t['status'] == 'failed']
skipped_tasks = [(n, t) for n, t in task_list if t['status'] == 'skipped']
layer_dur = 0
if success_tasks:
w("| 任务 | 切片数 | 总耗时 | 开始 | 结束 | 状态 |")
w("|------|--------|--------|------|------|------|")
for name, t in success_tasks:
dur = t['total_duration_sec']
layer_dur += dur
sc = len(t['slices']) or t.get('total_slices', 0)
status = '' if t['status'] == 'success' else ''
w(f"| {name} | {sc} | {fmt_dur(dur)} | {fmt_ts(t['start'])} | {fmt_ts(t['end'])} | {status} |")
all_durations.append((name, dur, layer))
w("")
if failed_tasks:
w(f"### {layer} 失败任务\n")
w("| 任务 | 失败时间 | 状态 |")
w("|------|---------|------|")
for name, t in failed_tasks:
w(f"| {name} | {fmt_ts(t['end'])} | ❌ 失败 |")
all_durations.append((name, 0, layer))
w("")
if skipped_tasks:
w(f"### {layer} 跳过任务\n")
for name, t in skipped_tasks:
w(f"- {name}(未启用)")
w("")
# 阶段总耗时 = 从第一个任务开始到最后一个任务结束
all_starts = [t['start'] for _, t in task_list if t['start']]
all_ends = [t['end'] for _, t in task_list if t['end']]
if all_starts and all_ends:
stage_wall = (max(all_ends) - min(all_starts)).total_seconds()
w(f"{layer} 阶段墙钟耗时(从首个任务开始到末个任务结束): **{fmt_dur(stage_wall)}**")
w(f"{layer} 阶段任务累计耗时: **{fmt_dur(layer_dur)}**")
layer_totals[layer] = stage_wall
else:
layer_totals[layer] = 0
w("")
# 切片详情(仅展示有切片且耗时 > 5s 的任务)
tasks_with_slices = [(n, t) for n, t in task_list
if t['slices'] and t['total_duration_sec'] > 5]
if tasks_with_slices:
w(f"### {layer} 窗口切片详情\n")
for name, t in tasks_with_slices:
w(f"#### {name}\n")
w("| 切片 | 窗口 | 耗时 | 开始 | 结束 |")
w("|------|------|------|------|------|")
for s in sorted(t['slices'], key=lambda x: x['idx']):
total_s = t.get('total_slices', '?')
win_short = s['window'].replace('+08:00', '')
w(f"| {s['idx']}/{total_s} | {win_short} | {fmt_dur(s['duration_sec'])} | {fmt_ts(s['start'])} | {fmt_ts(s['end'])} |")
w("")
# 阶段汇总
w("## 阶段耗时汇总\n")
w("| 阶段 | 墙钟耗时 | 占比 |")
w("|------|---------|------|")
total_wall = sum(layer_totals.values())
for layer in ['ODS', 'DWD', 'DWS', 'INDEX']:
dur = layer_totals.get(layer, 0)
pct = dur / total_wall * 100 if total_wall > 0 else 0
w(f"| {layer} | {fmt_dur(dur)} | {pct:.1f}% |")
w(f"| **合计** | **{fmt_dur(total_wall)}** | **100%** |")
w("")
# Top-5
w("## Top-5 耗时最长的任务\n")
top5 = sorted(all_durations, key=lambda x: x[1], reverse=True)[:5]
w("| 排名 | 任务 | 阶段 | 耗时 |")
w("|------|------|------|------|")
for i, (name, dur, layer) in enumerate(top5, 1):
w(f"| {i} | {name} | {layer} | {fmt_dur(dur)} |")
w("")
# 任务执行总结
w("## 执行统计\n")
total_tasks = len(data['tasks'])
success = sum(1 for t in data['tasks'].values() if t['status'] == 'success')
failed = sum(1 for t in data['tasks'].values() if t['status'] == 'failed')
skipped = sum(1 for t in data['tasks'].values() if t['status'] == 'skipped')
running = sum(1 for t in data['tasks'].values() if t['status'] == 'running')
w(f"- 总任务数: {total_tasks}")
w(f"- 成功: {success}")
w(f"- 失败: {failed}")
if skipped:
w(f"- 跳过(未启用): {skipped}")
if running:
w(f"- 状态不明: {running}")
w("")
if failed:
w("### 失败任务清单\n")
for name, t in data['tasks'].items():
if t['status'] == 'failed':
w(f"- **{name}** ({t['layer']}) — 失败时间 {fmt_ts(t['end'])}")
w("")
return '\n'.join(out)
def main():
print("=" * 60)
print("ETL 执行日志计时数据提取")
print("=" * 60)
# 1. 登录
print("\n[1] 登录获取 token...")
token = login()
print(f" Token: {token[:20]}...")
# 2. 获取日志
print("\n[2] 获取执行日志...")
log_text = get_logs(token)
print(f" 日志长度: {len(log_text)} 字符, {log_text.count(chr(10))}")
if not log_text:
print(" 错误: 日志为空!")
sys.exit(1)
# 保存原始日志
raw_path = os.path.join(os.path.dirname(__file__), '..', '..',
'export', 'temp_raw_execution_log.txt')
os.makedirs(os.path.dirname(raw_path), exist_ok=True)
with open(raw_path, 'w', encoding='utf-8') as f:
f.write(log_text)
print(f" 原始日志已保存: {raw_path}")
# 3. 解析
print("\n[3] 解析计时数据...")
data = extract_timing_data(log_text)
print(f" Flow: {data['flow_start']} ~ {data['flow_end']}")
layers = {}
for t in data['tasks'].values():
layers.setdefault(t['layer'], []).append(t)
for layer in ['ODS', 'DWD', 'DWS', 'INDEX']:
tl = layers.get(layer, [])
ok = sum(1 for t in tl if t['status'] == 'success')
fail = sum(1 for t in tl if t['status'] == 'failed')
print(f" {layer}: {len(tl)} 任务 ({ok} 成功, {fail} 失败)")
# 4. 生成报告
print("\n[4] 生成计时报告...")
report = generate_report(data)
report_path = os.path.join(os.path.dirname(__file__), '..', '..',
'export', 'temp_timing_report.md')
with open(report_path, 'w', encoding='utf-8') as f:
f.write(report)
print(f" 报告已保存: {report_path}")
# 5. 打印
print("\n" + "=" * 60)
print(report)
if __name__ == '__main__':
main()
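上面的解析逻辑依赖几条中文日志正则。下面是一个可独立运行的最小示例(日志行为虚构样例),演示时间戳正则与切片开始正则各分组的含义:

```python
import re
from datetime import datetime

ts_re = re.compile(r'^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]')
# 与上文 slice_start_re 同构:任务名 / 切片序号 / 切片总数 / 窗口范围
slice_start_re = re.compile(r'(\w+): 开始执行\((\d+)/(\d+)\)[,,]\s*窗口\[([^\]]+)\]')

line = ("[2026-02-24 12:30:00] DWS_ASSISTANT_DAILY: "
        "开始执行(2/4),窗口[2025-11-30 22:00 ~ 2025-12-30 22:00]")
ts = datetime.strptime(ts_re.match(line).group(1), "%Y-%m-%d %H:%M:%S")
m = slice_start_re.search(line)
print(ts, m.group(1), f"{m.group(2)}/{m.group(3)}", m.group(4))
```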

View File

@@ -0,0 +1,246 @@
# -*- coding: utf-8 -*-
"""
修复 ODS_ASSISTANT_LEDGER 误删记录(2025-11-21 ~ 2025-11-23)
背景:
run_id 8932(2026-02-24 00:24)快照对比时,recent endpoint 因数据保留期滚动
丢失了 2025-11-21~2025-11-23 的 67 条记录,_mark_missing_as_deleted 将其误标
为 is_delete=1。
修复策略:
1. 调 Former endpoint 拉取 2025-11-01 ~ 2025-11-24 的完整数据
2. 用 ODS 任务的 _insert_records_schema_aware 入库(content_hash 去重保证幂等)
3. 对比 ODS 中 is_delete=1 但 Former 返回 is_delete=0 的记录,INSERT 修正版本行
4. 完成后提示用户跑 DWD 加载
用法:
cd apps/etl/connectors/feiqiu
python ../../../../scripts/ops/fix_assistant_ledger_misdelete.py [--dry-run]
"""
from __future__ import annotations
import argparse
import json
import sys
from datetime import datetime
from pathlib import Path
from zoneinfo import ZoneInfo
# 加载环境变量
from dotenv import load_dotenv
_ROOT = Path(__file__).resolve().parents[2]
load_dotenv(_ROOT / ".env", override=False)
_FEIQIU_ENV = _ROOT / "apps" / "etl" / "connectors" / "feiqiu" / ".env"
if _FEIQIU_ENV.exists():
load_dotenv(_FEIQIU_ENV, override=False)
# 确保 ETL 模块可导入
sys.path.insert(0, str(_ROOT / "apps" / "etl" / "connectors" / "feiqiu"))
from config.settings import AppConfig
from api.client import APIClient
from database.connection import DatabaseConnection
TZ = ZoneInfo("Asia/Shanghai")
FORMER_ENDPOINT = "/AssistantPerformance/GetFormerOrderAssistantDetails"
TABLE = "ods.assistant_service_records"
STORE_ID = 2790685415443269
WINDOW_START = "2025-11-01 00:00:00"
WINDOW_END = "2025-11-24 00:00:00"
def parse_args():
p = argparse.ArgumentParser(description="修复 ODS_ASSISTANT_LEDGER 误删记录")
p.add_argument("--dry-run", action="store_true", help="仅查询不写入")
return p.parse_args()
def fetch_former_records(api: APIClient) -> list[dict]:
"""调 Former endpoint 拉取指定窗口的全部记录。"""
params = {
"siteId": STORE_ID,
"startTime": WINDOW_START,
"endTime": WINDOW_END,
}
all_records, _ = api.get_paginated(
endpoint=FORMER_ENDPOINT,
params=params,
page_size=200,
data_path=("data",),
list_key="orderAssistantDetails",
)
return all_records
def find_misdeleted_ids(db: DatabaseConnection) -> set[int]:
"""查询 ODS 中被误标 is_delete=1 的记录 ID窗口内最新版本"""
sql = """
SELECT DISTINCT ON (id) id, is_delete, fetched_at
FROM ods.assistant_service_records
WHERE create_time >= %s AND create_time < %s
ORDER BY id, fetched_at DESC NULLS LAST
"""
rows = db.query(sql, (WINDOW_START, WINDOW_END))
return {r["id"] for r in rows if r["is_delete"] == 1}
def get_table_columns(db: DatabaseConnection) -> list[str]:
"""获取 ODS 表的列名列表。"""
sql = """
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'ods' AND table_name = 'assistant_service_records'
ORDER BY ordinal_position
"""
return [r["column_name"] for r in db.query(sql)]
def insert_correction_rows(
db: DatabaseConnection,
former_records: list[dict],
misdeleted_ids: set[int],
columns: list[str],
dry_run: bool,
) -> int:
"""为误删记录插入修正版本行is_delete=0新 fetched_at
策略:从 Former API 返回的原始数据构造 ODS 行,
content_hash 基于 payload + is_delete=0 计算ON CONFLICT DO NOTHING 保证幂等。
"""
import hashlib
now = datetime.now(TZ)
corrected = 0
for rec in former_records:
rec_id = rec.get("id")
if rec_id is None:
continue
try:
rec_id = int(rec_id)
except (ValueError, TypeError):
continue
if rec_id not in misdeleted_ids:
continue
# 构造 payload JSON
payload_json = json.dumps(rec, ensure_ascii=False, sort_keys=True)
# content_hash = md5(payload_json + "|is_delete=0")
hash_input = payload_json + "|is_delete=0"
content_hash = hashlib.md5(hash_input.encode("utf-8")).hexdigest()
# 从 payload 提取 create_time
raw_ct = rec.get("create_time") or rec.get("createTime") or rec.get("Create_time")
create_time_val = None
if raw_ct:
try:
from dateutil import parser as dtparser
create_time_val = dtparser.parse(str(raw_ct))
except (ValueError, TypeError):
pass
if dry_run:
print(f" [DRY-RUN] 将修正 id={rec_id}, create_time={create_time_val}, content_hash={content_hash}")
corrected += 1
continue
# INSERT 修正行(含 create_time
sql = """
INSERT INTO ods.assistant_service_records
(id, payload, is_delete, content_hash, fetched_at, source_file, create_time)
VALUES (%s, %s, %s, %s, %s, %s, %s)
ON CONFLICT (id, content_hash) DO NOTHING
"""
from psycopg2.extras import Json as PgJson
db.execute(sql, (
rec_id,
PgJson(rec, dumps=lambda v: json.dumps(v, ensure_ascii=False)),
0,
content_hash,
now,
f"fix_misdelete_former_{WINDOW_START[:10]}_{WINDOW_END[:10]}",
create_time_val,
))
corrected += 1
return corrected
def main():
args = parse_args()
config = AppConfig.load()
dsn = config.get("db.dsn")
if not dsn:
raise RuntimeError("db.dsn 未配置")
print(f"=== 修复 ODS_ASSISTANT_LEDGER 误删记录 ===")
print(f"窗口: {WINDOW_START} ~ {WINDOW_END}")
print(f"Former endpoint: {FORMER_ENDPOINT}")
print(f"目标表: {TABLE}")
if args.dry_run:
print("[DRY-RUN 模式]")
print()
# 1. 连接数据库
db = DatabaseConnection(dsn, session={"timezone": "Asia/Shanghai"})
print("数据库连接成功")
# 2. 查询当前误删记录
misdeleted = find_misdeleted_ids(db)
print(f"ODS 中窗口内 is_delete=1 的记录数: {len(misdeleted)}")
if not misdeleted:
print("无需修复,退出")
db.close()
return
# 3. 调 Former endpoint 拉取数据
api = APIClient(
base_url=config.get("api.base_url"),
token=config.get("api.token"),
timeout=config.get("api.timeout", 20),
retry_max=config.get("api.retry_max", 3),
)
print(f"正在调用 Former endpoint...")
former_records = fetch_former_records(api)
print(f"Former endpoint 返回 {len(former_records)} 条记录")
# 4. 匹配Former 返回的记录中,哪些在 ODS 被误标为 is_delete=1
former_ids = set()
for rec in former_records:
rid = rec.get("id")
if rid is not None:
try:
former_ids.add(int(rid))
except (ValueError, TypeError):
pass
recoverable = misdeleted & former_ids
print(f"可修复记录数: {len(recoverable)} (ODS误删={len(misdeleted)}, Former返回={len(former_ids)})")
if not recoverable:
print("Former endpoint 未返回任何误删记录,退出")
db.close()
return
# 5. 获取表结构
columns = get_table_columns(db)
# 6. 插入修正版本行
corrected = insert_correction_rows(db, former_records, recoverable, columns, args.dry_run)
if not args.dry_run:
db.commit()
print(f"\n已插入 {corrected} 条修正版本行is_delete=0")
print("\n下一步:跑 DWD 加载以同步修正数据到 DWD 层")
print(" cd apps/etl/connectors/feiqiu")
print(' python -m cli.main --tasks DWD_LOAD_FROM_ODS --window-start "2025-11-01" --window-end "2025-11-24" --force-window-override')
else:
print(f"\n[DRY-RUN] 将修正 {corrected} 条记录")
db.close()
print("\n完成")
if __name__ == "__main__":
main()
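修正行的幂等性依赖 content_hash:同一 payload 无论重跑几次都会得到同一哈希,配合 ON CONFLICT DO NOTHING 即可安全重入。下面用与脚本同口径的哈希规则做一个纯 Python 演示(示例记录为虚构数据):

```python
import hashlib
import json

def content_hash(payload: dict, is_delete: int = 0) -> str:
    # sort_keys 保证键序无关:同一 payload 得到稳定哈希
    payload_json = json.dumps(payload, ensure_ascii=False, sort_keys=True)
    return hashlib.md5(
        f"{payload_json}|is_delete={is_delete}".encode("utf-8")
    ).hexdigest()

rec = {"id": 1001, "amount": 88.0, "assistant": "小王"}
# 键序不同的同一条记录,哈希一致
same = content_hash({"assistant": "小王", "amount": 88.0, "id": 1001})
```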

View File

@@ -10,6 +10,7 @@
docs/database/ddl/etl_feiqiu__dws.sql
docs/database/ddl/etl_feiqiu__app.sql
docs/database/ddl/zqyy_app__public.sql
docs/database/ddl/zqyy_app__auth.sql
docs/database/ddl/fdw.sql
用法cd C:\\NeoZQYY && python scripts/ops/gen_consolidated_ddl.py
@@ -256,6 +257,7 @@ def main():
# zqyy_app
write_schema_file(app_conn, "zqyy_app", "public", "小程序业务表")
write_schema_file(app_conn, "zqyy_app", "auth", "用户认证与权限")
# FDW
write_fdw_file()
@@ -269,7 +271,7 @@ def main():
old_file.unlink()
print(f"\n🗑️ 已删除旧文件:{old_file.name}")
print(f"\n✅ 完成,共 8 个文件")
print(f"\n✅ 完成,共 9 个文件")
if __name__ == "__main__":

View File

@@ -0,0 +1,255 @@
# 生成 ETL 全流程联调综合报告
# 输出路径:{SYSTEM_LOG_ROOT}/{date}__etl_integration_report.md
# 环境变量 SYSTEM_LOG_ROOT 缺失时报错终止。
# 用法:cd C:\NeoZQYY && python scripts/ops/gen_integration_report.py
import os
import sys
from pathlib import Path
from dotenv import load_dotenv
# 加载根 .env
load_dotenv(Path(__file__).resolve().parents[2] / ".env")
SYSTEM_LOG_ROOT = os.environ.get("SYSTEM_LOG_ROOT")
if not SYSTEM_LOG_ROOT:
print("ERROR: 环境变量 SYSTEM_LOG_ROOT 未设置,无法输出报告。", file=sys.stderr)
sys.exit(1)
output_dir = Path(SYSTEM_LOG_ROOT)
output_dir.mkdir(parents=True, exist_ok=True)
REPORT_DATE = "2026-02-24"
output_path = output_dir / f"{REPORT_DATE}__etl_integration_report.md"
# ── 报告内容 ──────────────────────────────────────────────────────────────────
REPORT = r"""# ETL 全流程联调报告
> 生成时间:2026-02-24
> execution_id: `41938155-db8c-4eec-9b81-9e5aef42fb8a`
> run_uuid: `f764a42487c34e0f9bd19e4fa9c57f03`
---
## 1. 执行概要
| 项目 | 值 |
|------|-----|
| Flow | `api_full`(API → ODS → DWD → DWS → INDEX) |
| 处理模式 | `full_window`(全窗口) |
| 时间窗口 | 2025-11-01 ~ 2026-02-20(自定义) |
| 窗口切分 | 30 天/切片,共 4 个切片 |
| force_full | 是 |
| 任务数 | 41 个常用任务(`is_common=True`) |
| 开始时间 | 2026-02-24 12:27:22 |
| 结束时间 | 2026-02-24 13:05:43 |
| 总耗时 | 38m22s |
| 退出码 | 0 |
| 最终状态 | **success** |
| 数据统计 | 225,760 fetched / 13,437 inserted / 225,648 updated |
### 窗口切片
| 切片 | 窗口范围 |
|------|---------|
| 1 | 2025-10-31 22:00 ~ 2025-11-30 22:00 |
| 2 | 2025-11-30 22:00 ~ 2025-12-30 22:00 |
| 3 | 2025-12-30 22:00 ~ 2026-01-29 22:00 |
| 4 | 2026-01-29 22:00 ~ 2026-02-20 02:00 |
---
## 2. 性能报告
### 2.1 阶段耗时汇总
| 阶段 | 墙钟耗时 | 占比 | 状态 |
|------|---------|------|------|
| ODS | 31m17s | 81.8% | ✅ 全部成功(21 任务 + 1 跳过) |
| DWD | 2m30s | 6.5% | ✅ 全部成功(1 任务) |
| DWS | 4m29s | 11.7% | ⚠️ 7 成功 / 8 失败 |
| INDEX | <1s | 0.0% | ❌ 3 任务全部失败(级联) |
| **合计** | **38m16s** | **100%** | 29 成功 / 11 失败 / 1 跳过 |
### 2.2 Top-5 耗时瓶颈
| 排名 | 任务 | 阶段 | 耗时 | 备注 |
|------|------|------|------|------|
| 1 | ODS_PLATFORM_COUPON | ODS | 10m41s | 数据量大,4 切片均 >2m |
| 2 | ODS_TABLE_USE | ODS | 4m23s | 每切片 ~1m |
| 3 | ODS_GROUP_BUY_REDEMPTION | ODS | 4m22s | 每切片 ~1m |
| 4 | ODS_PAYMENT | ODS | 4m7s | 每切片 ~1m |
| 5 | DWS_ASSISTANT_DAILY | DWS | 2m43s | 聚合计算密集 |
> Top-5 合计 26m16s,占总耗时 68.6%。ODS 阶段 API 拉取是主要瓶颈。
### 2.3 ODS 任务耗时明细
| 任务 | 切片数 | 总耗时 | 状态 |
|------|--------|--------|------|
| ODS_PLATFORM_COUPON | 4 | 10m41s | ✅ |
| ODS_TABLE_USE | 4 | 4m23s | ✅ |
| ODS_GROUP_BUY_REDEMPTION | 4 | 4m22s | ✅ |
| ODS_PAYMENT | 4 | 4m7s | ✅ |
| ODS_MEMBER_BALANCE | 4 | 2m8s | ✅ |
| ODS_SETTLEMENT_RECORDS | 4 | 1m45s | ✅ |
| ODS_TABLE_FEE_DISCOUNT | 4 | 54s | ✅ |
| ODS_INVENTORY_CHANGE | 4 | 48s | ✅ |
| ODS_MEMBER_CARD | 4 | 35s | ✅ |
| ODS_ASSISTANT_LEDGER | 4 | 24s | ✅ |
| ODS_MEMBER | 4 | 14s | ✅ |
| ODS_INVENTORY_STOCK | 4 | 9s | ✅ |
| ODS_ASSISTANT_ACCOUNT | 4 | 7s | ✅ |
| ODS_STORE_GOODS | 4 | 7s | ✅ |
| ODS_REFUND | 4 | 4s | ✅ |
| ODS_RECHARGE_SETTLE | 4 | 4s | ✅ |
| ODS_TENANT_GOODS | 4 | 4s | ✅ |
| ODS_TABLES | 4 | 2s | ✅ |
| ODS_GROUP_PACKAGE | 4 | 2s | ✅ |
| ODS_GOODS_CATEGORY | 4 | 1s | ✅ |
| ODS_STORE_GOODS_SALES | 4 | 1s | ✅ |
| ODS_ASSISTANT_ABOLISH | — | — | ⏭️ 跳过(未启用) |
### 2.4 DWD 任务耗时明细
| 任务 | 切片数 | 总耗时 | 状态 |
|------|--------|--------|------|
| DWD_LOAD_FROM_ODS | 4 | 2m30s | ✅ |
### 2.5 DWS 任务耗时明细
| 任务 | 切片数 | 总耗时 | 状态 |
|------|--------|--------|------|
| DWS_ASSISTANT_DAILY | 4 | 2m43s | ✅ |
| DWS_ASSISTANT_CUSTOMER | 4 | 1m32s | ✅ |
| DWS_GOODS_STOCK_MONTHLY | 4 | 1s | ✅ |
| DWS_BUILD_ORDER_SUMMARY | 0 | 1s | ✅ |
| DWS_GOODS_STOCK_DAILY | 4 | <1s | ✅ |
| DWS_GOODS_STOCK_WEEKLY | 4 | <1s | ✅ |
| DWS_ASSISTANT_SALARY | 4 | <1s | ✅ |
| DWS_MEMBER_CONSUMPTION | — | — | ❌ 失败(根因) |
| DWS_MEMBER_VISIT | — | — | ❌ 级联失败 |
| DWS_FINANCE_DAILY | — | — | ❌ 级联失败 |
| DWS_FINANCE_RECHARGE | — | — | ❌ 级联失败 |
| DWS_FINANCE_INCOME_STRUCTURE | — | — | ❌ 级联失败 |
| DWS_FINANCE_DISCOUNT_DETAIL | — | — | ❌ 级联失败 |
| DWS_ASSISTANT_MONTHLY | — | — | ❌ 级联失败 |
| DWS_ASSISTANT_FINANCE | — | — | ❌ 级联失败 |
### 2.6 INDEX 任务耗时明细
| 任务 | 状态 |
|------|------|
| DWS_WINBACK_INDEX | ❌ 级联失败 |
| DWS_NEWCONV_INDEX | ❌ 级联失败 |
| DWS_RELATION_INDEX | ❌ 级联失败 |
---
## 3. DEBUG 报告
### 3.1 错误摘要
共发现 **1 个根因错误**,导致 **10 个任务级联失败**。
### 3.2 WARNING:ODS_ASSISTANT_ABOLISH 未启用
- **级别**:低优先级
- **现象**:日志输出 `ODS_ASSISTANT_ABOLISH 未启用或不存在`
- **原因**:任务注册表中该任务标记为 `is_common=False`(非活跃),但仍在全选列表中
- **影响**:无,任务被正常跳过
- **建议**:无需处理,属于预期行为
### 3.3 ROOT CAUSE:DWS_MEMBER_CONSUMPTION 失败
**根因分析**:两个 BUG 叠加导致任务失败。
#### BUG-1FDW 外部表未部署
- **现象**:查询 `fdw_app.member_birthday_manual` 时报错(外部表不存在)
- **原因**`db/fdw/setup_fdw_reverse.sql` 中定义的反向 FDW 外部表未在测试库部署
- **影响**:主 SQL 执行失败,触发 rollback进入 `sql_fallback` 分支
#### BUG-2:sql_fallback 列名错误
- **现象**fallback SQL 引用了不存在的列 `tenant_member_id`
- **原因**:实际列名应为 `member_id`fallback SQL 未与表结构同步
- **影响**fallback 也失败,事务进入 `InFailedSqlTransaction` 状态
### 3.4 CASCADE FAILURE:10 个任务级联失败
- **触发点**DWS_MEMBER_CONSUMPTION 失败后,数据库连接的事务处于 `InFailedSqlTransaction` 状态
- **根因**`run_tasks` 的 `except` 块未调用 `self.db.rollback()`,导致后续所有任务在同一个已失败的事务中执行
- **级联任务**(共 10 个):
1. DWS_MEMBER_VISIT
2. DWS_FINANCE_DAILY
3. DWS_FINANCE_RECHARGE
4. DWS_FINANCE_INCOME_STRUCTURE
5. DWS_FINANCE_DISCOUNT_DETAIL
6. DWS_ASSISTANT_MONTHLY
7. DWS_ASSISTANT_FINANCE
8. DWS_WINBACK_INDEX
9. DWS_NEWCONV_INDEX
10. DWS_RELATION_INDEX
### 3.5 修复建议
| 优先级 | 修复项 | 说明 |
|--------|--------|------|
| **P0** | 修复 SQL 列名 | `tenant_member_id` → `member_id`(DWS_MEMBER_CONSUMPTION 的 sql_fallback) |
| **P0** | run_tasks except 块添加 rollback | `self.db.rollback()` 防止级联失败 |
| **P1** | 部署 FDW 外部表 | 执行 `db/fdw/setup_fdw_reverse.sql` 到测试库 |
---
## 4. 黑盒测试报告
> ⏳ 待补充 — 将在 Task 5.3 完成后追加。
>
> 预期内容:
> - API vs ODS通过数/总数
> - ODS vs DWD通过数/总数
> - DWD vs DWS表概览
> - 白名单差异统计
> - 失败表清单
> - 全链路检查报告路径引用
---
## 附录
### A. 完整 CLI 参数
```
--flow api_full
--processing-mode full_window
--window-start 2025-11-01
--window-end 2026-02-20
--window-split day
--window-split-days 30
--force-full
--tasks ODS_ASSISTANT_ACCOUNT ODS_ASSISTANT_LEDGER ODS_ASSISTANT_ABOLISH
ODS_SETTLEMENT_RECORDS ODS_TABLE_USE ODS_TABLE_FEE_DISCOUNT
ODS_TABLES ODS_PAYMENT ODS_REFUND ODS_PLATFORM_COUPON
ODS_MEMBER ODS_MEMBER_CARD ODS_MEMBER_BALANCE ODS_RECHARGE_SETTLE
ODS_GROUP_PACKAGE ODS_GROUP_BUY_REDEMPTION
ODS_INVENTORY_STOCK ODS_INVENTORY_CHANGE
ODS_GOODS_CATEGORY ODS_STORE_GOODS ODS_STORE_GOODS_SALES ODS_TENANT_GOODS
DWD_LOAD_FROM_ODS
DWS_BUILD_ORDER_SUMMARY DWS_ASSISTANT_DAILY DWS_ASSISTANT_MONTHLY
DWS_ASSISTANT_CUSTOMER DWS_ASSISTANT_SALARY DWS_ASSISTANT_FINANCE
DWS_MEMBER_CONSUMPTION DWS_MEMBER_VISIT
DWS_FINANCE_DAILY DWS_FINANCE_RECHARGE DWS_FINANCE_INCOME_STRUCTURE
DWS_FINANCE_DISCOUNT_DETAIL
DWS_GOODS_STOCK_DAILY DWS_GOODS_STOCK_WEEKLY DWS_GOODS_STOCK_MONTHLY
DWS_WINBACK_INDEX DWS_NEWCONV_INDEX DWS_RELATION_INDEX
```
### B. 精细计时报告
完整的窗口切片级计时数据见:`export/temp_timing_report.md`
"""
output_path.write_text(REPORT.strip(), encoding="utf-8")
print(f"✅ 报告已生成: {output_path}")
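报告 3.4/3.5 指出的 P0 修复点,是在任务循环的 except 块中补 `rollback()`。下面用一个假想的连接对象(非项目真实类,仅模拟 psycopg2 失败事务语义)最小化演示该模式如何阻断级联失败:

```python
class FakeDB:
    """模拟连接:事务一旦失败,后续 execute 持续抛错,直到 rollback。"""
    def __init__(self):
        self.failed = False
        self.rollbacks = 0

    def execute(self, ok: bool):
        if self.failed:
            raise RuntimeError("current transaction is aborted")
        if not ok:
            self.failed = True
            raise RuntimeError("query failed")

    def rollback(self):
        self.failed = False
        self.rollbacks += 1

def run_tasks(db, tasks):
    results = {}
    for name, ok in tasks:
        try:
            db.execute(ok)
            results[name] = "success"
        except RuntimeError:
            db.rollback()  # 关键修复:失败后立即回滚,后续任务才能在干净事务中执行
            results[name] = "failed"
    return results

db = FakeDB()
r = run_tasks(db, [("A", True), ("B", False), ("C", True)])
```

若去掉 except 中的 `db.rollback()`,任务 C 也会在已失败事务中抛错,复现报告中的级联失败。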

View File

@@ -0,0 +1,146 @@
# -*- coding: utf-8 -*-
"""
在测试库 test_etl_feiqiu 执行 SPI 种子数据脚本。
种子脚本:db/etl_feiqiu/seeds/seed_index_parameters.sql
目标表:dws.cfg_index_parameters(index_type='SPI')
使用方式:
python scripts/ops/run_seed_spi_params.py
"""
import os
import sys
from pathlib import Path
from dotenv import load_dotenv
import psycopg2
# 加载根 .env
_ROOT = Path(__file__).resolve().parents[2]
load_dotenv(_ROOT / ".env", override=False)
DSN = os.getenv("TEST_DB_DSN")
if not DSN:
print("ERROR: TEST_DB_DSN 未配置,请在根 .env 中设置")
sys.exit(1)
SEED_FILE = _ROOT / "db" / "etl_feiqiu" / "seeds" / "seed_index_parameters.sql"
def execute_seed(conn) -> bool:
"""执行种子数据脚本,返回是否成功"""
sql = SEED_FILE.read_text(encoding="utf-8")
if not sql.strip():
print("⚠️ 种子脚本为空,跳过")
return False
try:
cur = conn.cursor()
cur.execute(sql)
cur.close()
conn.commit()
print("✅ 种子数据脚本执行成功")
return True
except Exception as e:
conn.rollback()
print(f"❌ 种子数据脚本执行失败: {e}")
return False
def verify(conn) -> bool:
"""验证 SPI 参数插入结果"""
cur = conn.cursor()
checks = []
# 1. SPI 参数总数(应为 28 个)
cur.execute("""
SELECT COUNT(*) FROM dws.cfg_index_parameters
WHERE index_type = 'SPI'
""")
spi_count = cur.fetchone()[0]
checks.append((f"SPI 参数数量 = {spi_count}(期望 28", spi_count == 28))
# 2. 关键参数存在且值正确
key_params = [
("weight_level", 0.60),
("weight_speed", 0.30),
("weight_stability", 0.10),
("amount_base_spend_30", 500.0),
("compression_mode", 1.0),
]
for pname, expected in key_params:
cur.execute("""
SELECT param_value FROM dws.cfg_index_parameters
WHERE index_type = 'SPI' AND param_name = %s
ORDER BY effective_from DESC LIMIT 1
""", (pname,))
row = cur.fetchone()
if row:
actual = float(row[0])
ok = abs(actual - expected) < 1e-6
checks.append((f" {pname} = {actual}(期望 {expected}", ok))
else:
checks.append((f" {pname} 缺失", False))
# 3. 权重归一化Level 子分权重之和 = 1.0
cur.execute("""
SELECT SUM(param_value) FROM dws.cfg_index_parameters
WHERE index_type = 'SPI'
AND param_name IN ('w_level_spend_30', 'w_level_spend_90',
'w_level_ticket_90', 'w_level_recharge_90')
""")
level_sum = float(cur.fetchone()[0] or 0)
checks.append((f" Level 权重之和 = {level_sum:.2f}(期望 1.00", abs(level_sum - 1.0) < 1e-6))
# 4. 总分权重之和 = 1.0
cur.execute("""
SELECT SUM(param_value) FROM dws.cfg_index_parameters
WHERE index_type = 'SPI'
AND param_name IN ('weight_level', 'weight_speed', 'weight_stability')
""")
total_sum = float(cur.fetchone()[0] or 0)
checks.append((f" 总分权重之和 = {total_sum:.2f}(期望 1.00", abs(total_sum - 1.0) < 1e-6))
cur.close()
print("\n" + "=" * 50)
print("SPI 种子数据验证结果")
print("=" * 50)
all_ok = True
for name, ok in checks:
status = "" if ok else ""
print(f" {status} {name}")
if not ok:
all_ok = False
return all_ok
def main():
dsn_display = DSN.split("@")[1] if "@" in DSN else DSN
print(f"连接测试库: {dsn_display}")
print(f"种子脚本: {SEED_FILE.name}\n")
if not SEED_FILE.exists():
print(f"ERROR: 种子脚本不存在: {SEED_FILE}")
sys.exit(1)
conn = psycopg2.connect(DSN)
if not execute_seed(conn):
conn.close()
sys.exit(1)
all_ok = verify(conn)
conn.close()
if all_ok:
print("\n✅ SPI 种子数据执行完成,所有验证通过")
else:
print("\n⚠️ 部分验证未通过,请检查")
sys.exit(1)
if __name__ == "__main__":
main()
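verify() 中两处权重归一化校验可以抽成一个通用断言。下面是同口径的纯 Python 示意:总分权重取自上文种子期望值,Level 子分权重为假设的示例数值(实际值以种子脚本为准):

```python
def check_normalized(params: dict, tol: float = 1e-6) -> bool:
    # 一组权重有效的前提:各分量之和在容差内等于 1.0
    return abs(sum(params.values()) - 1.0) < tol

# 种子期望值(见上文 key_params)
total_weights = {"weight_level": 0.60, "weight_speed": 0.30, "weight_stability": 0.10}
# 假设值,仅作演示
level_weights = {"w_level_spend_30": 0.4, "w_level_spend_90": 0.3,
                 "w_level_ticket_90": 0.2, "w_level_recharge_90": 0.1}
```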

View File

@@ -0,0 +1,462 @@
# -*- coding: utf-8 -*-
"""
P1 数据库基础设施层端到端验证脚本。
检查项:
1. 业务库 auth / biz Schema 存在性
2. ETL 库 app Schema 及 35 张 RLS 视图存在性
3. 业务库 fdw_etl Schema 及外部表存在性 + 可查询性
4. RLS 视图 site_id 过滤正确性
5. app_user / app_reader 角色权限配置
用法:
python scripts/ops/validate_p1_db_foundation.py
"""
from __future__ import annotations
import os
import sys
from pathlib import Path
from dotenv import load_dotenv
# ── 环境变量加载 ──────────────────────────────────────────────
_ROOT = Path(__file__).resolve().parents[2]
load_dotenv(_ROOT / ".env", override=False)
PG_DSN = os.environ.get("PG_DSN")
APP_DB_DSN = os.environ.get("APP_DB_DSN")
_missing = []
if not PG_DSN:
_missing.append("PG_DSN")
if not APP_DB_DSN:
_missing.append("APP_DB_DSN")
if _missing:
    raise RuntimeError(
        f"必需环境变量缺失: {', '.join(_missing)}。"
        "请在根 .env 中配置 PG_DSN 和 APP_DB_DSN。"
    )
import psycopg2 # noqa: E402 — 延迟导入,确保环境变量校验先行
# ── 35 张 RLS 视图清单 ────────────────────────────────────────
EXPECTED_RLS_VIEWS: list[str] = [
    # DWD 层(11 张)
"v_dim_member",
"v_dim_assistant",
"v_dim_member_card_account",
"v_dim_table",
"v_dwd_settlement_head",
"v_dwd_table_fee_log",
"v_dwd_assistant_service_log",
"v_dwd_recharge_order",
"v_dwd_store_goods_sale",
"v_dim_staff",
"v_dim_staff_ex",
    # DWS 层(24 张)
"v_dws_member_consumption_summary",
"v_dws_member_visit_detail",
"v_dws_member_winback_index",
"v_dws_member_newconv_index",
"v_dws_member_recall_index",
"v_dws_member_assistant_relation_index",
"v_dws_member_assistant_intimacy",
"v_dws_assistant_daily_detail",
"v_dws_assistant_monthly_summary",
"v_dws_assistant_salary_calc",
"v_dws_assistant_customer_stats",
"v_dws_assistant_finance_analysis",
"v_dws_finance_daily_summary",
"v_dws_finance_income_structure",
"v_dws_finance_recharge_summary",
"v_dws_finance_discount_detail",
"v_dws_finance_expense_summary",
"v_dws_platform_settlement",
"v_dws_assistant_recharge_commission",
"v_cfg_performance_tier",
"v_cfg_assistant_level_price",
"v_cfg_bonus_rules",
"v_cfg_index_parameters",
"v_dws_order_summary",
]
# ── 辅助函数 ──────────────────────────────────────────────────
def _connect(dsn: str, label: str):
"""建立数据库连接,失败时输出脱敏信息。"""
try:
conn = psycopg2.connect(dsn)
return conn
except psycopg2.OperationalError as exc:
        # 脱敏:只显示 host/dbname,不泄露密码
safe = dsn.split("@")[-1] if "@" in dsn else "(unknown)"
print(f"❌ 无法连接 {label}{safe}: {exc}", file=sys.stderr)
raise
def _schema_exists(cur, schema_name: str) -> bool:
cur.execute(
"SELECT 1 FROM information_schema.schemata WHERE schema_name = %s",
(schema_name,),
)
return cur.fetchone() is not None
def _view_exists(cur, schema_name: str, view_name: str) -> bool:
cur.execute(
"SELECT 1 FROM information_schema.views "
"WHERE table_schema = %s AND table_name = %s",
(schema_name, view_name),
)
return cur.fetchone() is not None
def _foreign_table_exists(cur, schema_name: str, table_name: str) -> bool:
cur.execute(
"SELECT 1 FROM information_schema.tables "
"WHERE table_schema = %s AND table_name = %s AND table_type = 'FOREIGN'",
(schema_name, table_name),
)
return cur.fetchone() is not None
# ── 核心验证逻辑 ──────────────────────────────────────────────
def validate_p1_db_foundation() -> dict:
"""
返回验证结果字典:
{
"schemas": {"auth": bool, "biz": bool, "app": bool, "fdw_etl": bool},
"rls_views": {"app.v_dim_member": bool, ...},
"fdw_tables": {"fdw_etl.v_dim_member": bool, ...},
"rls_filtering": bool | None, # None = SKIP
"permissions": {"app_user": bool, "app_reader": bool},
"errors": [str, ...]
}
"""
result: dict = {
"schemas": {},
"rls_views": {},
"fdw_tables": {},
"rls_filtering": None,
"permissions": {},
"errors": [],
}
etl_conn = _connect(PG_DSN, "ETL 库")
app_conn = _connect(APP_DB_DSN, "业务库")
try:
_check_schemas(etl_conn, app_conn, result)
_check_rls_views(etl_conn, result)
_check_fdw_tables(app_conn, result)
_check_rls_filtering(etl_conn, result)
_check_permissions(etl_conn, app_conn, result)
finally:
etl_conn.close()
app_conn.close()
return result
def _check_schemas(etl_conn, app_conn, result: dict):
"""检查 auth / biz业务库和 app / fdw_etlETL 库 + 业务库Schema 存在性。"""
with app_conn.cursor() as cur:
for s in ("auth", "biz", "fdw_etl"):
ok = _schema_exists(cur, s)
result["schemas"][s] = ok
if not ok:
result["errors"].append(f"业务库缺少 Schema: {s}")
with etl_conn.cursor() as cur:
ok = _schema_exists(cur, "app")
result["schemas"]["app"] = ok
if not ok:
result["errors"].append("ETL 库缺少 Schema: app")
def _check_rls_views(etl_conn, result: dict):
"""检查 ETL 库 app Schema 中 35 张 RLS 视图是否存在。"""
with etl_conn.cursor() as cur:
for vname in EXPECTED_RLS_VIEWS:
ok = _view_exists(cur, "app", vname)
result["rls_views"][f"app.{vname}"] = ok
if not ok:
result["errors"].append(f"ETL 库缺少 RLS 视图: app.{vname}")
def _check_fdw_tables(app_conn, result: dict):
"""检查业务库 fdw_etl Schema 中外部表存在性 + 可查询性。
cfg_* 表无 RLS 过滤,可直接 SELECT count(*)。
    其余 RLS 表的远端视图需要 app.current_site_id,
    先获取一个有效 site_id 再统一查询。
"""
# 无 RLS 过滤的配置表
cfg_views = {
"v_cfg_performance_tier",
"v_cfg_assistant_level_price",
"v_cfg_bonus_rules",
"v_cfg_index_parameters",
}
# 先从一张 cfg 表确认 FDW 链路可用
with app_conn.cursor() as cur:
for vname in EXPECTED_RLS_VIEWS:
exists = _foreign_table_exists(cur, "fdw_etl", vname)
key = f"fdw_etl.{vname}"
if not exists:
result["fdw_tables"][key] = False
result["errors"].append(f"业务库缺少外部表: {key}")
continue
if vname in cfg_views:
# 无 RLS直接查询
try:
cur.execute(f"SELECT count(*) FROM fdw_etl.{vname}")
cur.fetchone()
result["fdw_tables"][key] = True
except Exception as exc:
app_conn.rollback()
result["fdw_tables"][key] = False
result["errors"].append(f"外部表 {key} 查询失败: {exc}")
else:
# RLS 表:远端需要 app.current_site_id此处仅验证存在性
# 可查询性在 _check_fdw_rls_queryability 中统一验证
result["fdw_tables"][key] = True # 存在即通过
    # 对 RLS 外部表做可查询性抽查(直连 ETL 库设置 site_id 后查询源视图)
    _check_fdw_rls_queryability(result)


def _check_fdw_rls_queryability(result: dict):
"""通过 ETL 库直连验证 RLS 外部表的可查询性。
FDW 远端会话无法继承本地 SET 的 session 变量,
因此改为:直连 ETL 库设置 site_id 后查询 app.v_* 视图,
间接证明 FDW 映射的源视图可查询。
"""
etl_conn = _connect(PG_DSN, "ETL 库FDW 可查询性验证)")
try:
with etl_conn.cursor() as cur:
# 获取一个有效 site_id
try:
cur.execute(
"SELECT DISTINCT site_id FROM dws.dws_member_consumption_summary LIMIT 1"
)
row = cur.fetchone()
except Exception:
etl_conn.rollback()
return # 无法获取 site_id跳过
if row is None:
return
site_id = row[0]
cur.execute("SET app.current_site_id = %s", (str(site_id),))
# 抽查一张 RLS 视图
rls_sample = "v_dws_member_consumption_summary"
try:
cur.execute(f"SELECT count(*) FROM app.{rls_sample}")
                cur.fetchone()  # 行数为 0 仅表示该 site_id 暂无数据,不算失败
except Exception as exc:
key = f"fdw_etl.{rls_sample}"
result["fdw_tables"][key] = False
result["errors"].append(
f"RLS 外部表源视图 app.{rls_sample} 查询失败: {exc}"
)
etl_conn.rollback()
finally:
        etl_conn.close()


def _check_rls_filtering(etl_conn, result: dict):
"""设置 site_id 后验证 RLS 视图过滤正确性。"""
# 从 DWS 表取一个实际存在的 site_idDWS 表使用 site_id 列)
with etl_conn.cursor() as cur:
try:
cur.execute(
"SELECT DISTINCT site_id FROM dws.dws_member_consumption_summary LIMIT 1"
)
row = cur.fetchone()
except Exception as exc:
result["rls_filtering"] = None
result["errors"].append(f"无法获取 site_id 样本: {exc}")
etl_conn.rollback()
return
if row is None:
result["rls_filtering"] = None
return
site_id = row[0]
# 新连接,设置 site_id 后查询
verify_conn = _connect(PG_DSN, "ETL 库RLS 验证)")
try:
with verify_conn.cursor() as cur:
cur.execute("SET app.current_site_id = %s", (str(site_id),))
cur.execute(
"SELECT site_id FROM app.v_dws_member_consumption_summary LIMIT 100"
)
rows = cur.fetchall()
if not rows:
result["rls_filtering"] = None # SKIP — 该 site_id 无数据
return
all_match = all(r[0] == site_id for r in rows)
result["rls_filtering"] = all_match
if not all_match:
result["errors"].append(
f"RLS 过滤失败: 设置 site_id={site_id}"
f"v_dws_member_consumption_summary 返回了其他门店数据"
)
finally:
        verify_conn.close()


def _check_permissions(etl_conn, app_conn, result: dict):
"""验证 app_readerETL 库)和 app_user业务库角色权限。"""
# ── app_reader对 app Schema 的 USAGE + 视图 SELECT ──
app_reader_ok = True
with etl_conn.cursor() as cur:
try:
cur.execute(
"SELECT has_schema_privilege('app_reader', 'app', 'USAGE')"
)
if not cur.fetchone()[0]:
app_reader_ok = False
result["errors"].append("app_reader 缺少 app Schema USAGE 权限")
# 抽查一张视图的 SELECT 权限
cur.execute(
"SELECT has_table_privilege('app_reader', 'app.v_dim_member', 'SELECT')"
)
if not cur.fetchone()[0]:
app_reader_ok = False
result["errors"].append(
"app_reader 缺少 app.v_dim_member SELECT 权限"
)
except Exception as exc:
etl_conn.rollback()
app_reader_ok = False
result["errors"].append(f"app_reader 权限检查异常: {exc}")
result["permissions"]["app_reader"] = app_reader_ok
# ── app_user对 auth / biz 的 USAGE ──
app_user_ok = True
with app_conn.cursor() as cur:
try:
for schema in ("auth", "biz"):
cur.execute(
"SELECT has_schema_privilege('app_user', %s, 'USAGE')",
(schema,),
)
if not cur.fetchone()[0]:
app_user_ok = False
result["errors"].append(
f"app_user 缺少 {schema} Schema USAGE 权限"
)
except Exception as exc:
app_conn.rollback()
app_user_ok = False
result["errors"].append(f"app_user 权限检查异常: {exc}")
    result["permissions"]["app_user"] = app_user_ok


# ── 输出格式化 ────────────────────────────────────────────────
def _icon(val) -> str:
if val is None:
return "⏭️"
return "" if val else ""
def print_report(result: dict):
"""打印结构化验证报告。"""
print("\n" + "=" * 60)
print(" P1 数据库基础设施层验证报告")
print("=" * 60)
# Schema 检查
print("\n📦 Schema 存在性")
for name, ok in result["schemas"].items():
print(f" {_icon(ok)} {name}")
# RLS 视图
print(f"\n👁️ RLS 视图(共 {len(EXPECTED_RLS_VIEWS)} 张)")
passed = sum(1 for v in result["rls_views"].values() if v)
failed = sum(1 for v in result["rls_views"].values() if not v)
print(f" ✅ 通过: {passed} ❌ 失败: {failed}")
for name, ok in result["rls_views"].items():
if not ok:
print(f"{name}")
# FDW 外部表
print(f"\n🔗 FDW 外部表(共 {len(EXPECTED_RLS_VIEWS)} 张)")
passed_fdw = sum(1 for v in result["fdw_tables"].values() if v)
failed_fdw = sum(1 for v in result["fdw_tables"].values() if not v)
print(f" ✅ 通过: {passed_fdw} ❌ 失败: {failed_fdw}")
for name, ok in result["fdw_tables"].items():
if not ok:
print(f"{name}")
# RLS 过滤
print(f"\n🔒 RLS 过滤正确性: {_icon(result['rls_filtering'])}", end="")
if result["rls_filtering"] is None:
print(" (无数据,跳过验证)")
else:
print()
# 权限
print("\n🔑 角色权限")
for role, ok in result["permissions"].items():
print(f" {_icon(ok)} {role}")
# 汇总
total_checks = (
len(result["schemas"])
+ len(result["rls_views"])
+ len(result["fdw_tables"])
+ (1 if result["rls_filtering"] is not None else 0)
+ len(result["permissions"])
)
total_pass = (
sum(1 for v in result["schemas"].values() if v)
+ sum(1 for v in result["rls_views"].values() if v)
+ sum(1 for v in result["fdw_tables"].values() if v)
+ (1 if result["rls_filtering"] is True else 0)
+ sum(1 for v in result["permissions"].values() if v)
)
total_skip = 1 if result["rls_filtering"] is None else 0
print(f"\n{'=' * 60}")
print(f" 汇总: {total_pass}/{total_checks} 通过", end="")
if total_skip:
print(f" ({total_skip} 项跳过)", end="")
print()
if result["errors"]:
print(f"\n⚠️ 失败详情(共 {len(result['errors'])} 项):")
for err in result["errors"]:
print(f"{err}")
    print("=" * 60 + "\n")


# ── 入口 ──────────────────────────────────────────────────────
if __name__ == "__main__":
result = validate_p1_db_foundation()
print_report(result)
# 退出码:有失败项则非零
has_failure = bool(result["errors"])
sys.exit(1 if has_failure else 0)