Commit before front-end/back-end integration testing (20260223)
293
docs/deployment/EXPORT-PATHS.md
Normal file
@@ -0,0 +1,293 @@
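# Export output path specification

> Last updated: 2026-02-19
> This document describes the unified structure of the `export/` directory, the purpose of each subdirectory, the corresponding `.env` variables, and how the code reads them.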

---

## Directory overview

```
export/
├── ETL-Connectors/feiqiu/
│   ├── JSON/      # Raw API JSON exports (written to disk by ODS fetches)
│   ├── LOGS/      # ETL run logs (one .log per run)
│   └── REPORTS/   # ETL quality-check / integrity reports (JSON format)
├── SYSTEM/
│   ├── LOGS/      # System-level ops logs (reserved)
│   ├── REPORTS/
│   │   ├── dataflow_analysis/   # Dataflow structure analysis reports (Markdown + intermediate collection artifacts)
│   │   ├── field_audit/         # Field audit reports (Markdown)
│   │   └── full_dataflow_doc/   # End-to-end dataflow documentation (Markdown)
│   └── CACHE/
│       └── api_samples/         # API sample cache (valid for 24h; used by gen_full_dataflow_doc)
└── BACKEND/
    └── LOGS/      # Backend structured logs (reserved; enabled after the 5.3 backend logging rework)
```
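
> `export/` is not kept in server deployments (it is excluded via `setup-server-git.py`); it is retained only on development machines.
> On servers, the output paths are configured independently in each environment's `.env` and point to the corresponding subdirectories under `repo/export/`.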
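
Because `export/` is excluded from server checkouts, the leaf directories may have to be created before the first run. The following is a minimal sketch of doing that with `pathlib`. The helper name `ensure_export_tree` and the constant `EXPORT_SUBDIRS` are hypothetical and not part of the repository; the list of directories is copied from the tree above.

```python
from pathlib import Path

# Leaf directories from the directory overview, relative to the export root.
EXPORT_SUBDIRS = [
    "ETL-Connectors/feiqiu/JSON",
    "ETL-Connectors/feiqiu/LOGS",
    "ETL-Connectors/feiqiu/REPORTS",
    "SYSTEM/LOGS",
    "SYSTEM/REPORTS/dataflow_analysis",
    "SYSTEM/REPORTS/field_audit",
    "SYSTEM/REPORTS/full_dataflow_doc",
    "SYSTEM/CACHE/api_samples",
    "BACKEND/LOGS",
]


def ensure_export_tree(export_root: Path) -> list[Path]:
    """Create every leaf directory under export_root (hypothetical helper).

    Existing directories are left untouched, so the call is safe to repeat.
    """
    created = []
    for rel in EXPORT_SUBDIRS:
        target = export_root / rel
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```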

---

## Environment variables and directory mapping

| Environment variable | Default value (dev machine) | Directory | Description |
|----------|------------------|----------|------|
| `EXPORT_ROOT` | `C:/NeoZQYY/export/ETL-Connectors/feiqiu/JSON` | `ETL-Connectors/feiqiu/JSON/` | Root directory for JSON written by ODS fetches |
| `LOG_ROOT` | `C:/NeoZQYY/export/ETL-Connectors/feiqiu/LOGS` | `ETL-Connectors/feiqiu/LOGS/` | ETL run logs |
| `FETCH_ROOT` | `C:/NeoZQYY/export/ETL-Connectors/feiqiu/JSON` | `ETL-Connectors/feiqiu/JSON/` | JSON output in FETCH_ONLY mode (usually the same as EXPORT_ROOT) |
| `ETL_REPORT_ROOT` | `C:/NeoZQYY/export/ETL-Connectors/feiqiu/REPORTS` | `ETL-Connectors/feiqiu/REPORTS/` | ETL quality-check / integrity reports |
| `SYSTEM_ANALYZE_ROOT` | `C:/NeoZQYY/export/SYSTEM/REPORTS/dataflow_analysis` | `SYSTEM/REPORTS/dataflow_analysis/` | Dataflow structure analysis reports |
| `FIELD_AUDIT_ROOT` | `C:/NeoZQYY/export/SYSTEM/REPORTS/field_audit` | `SYSTEM/REPORTS/field_audit/` | Field audit reports |
| `FULL_DATAFLOW_DOC_ROOT` | `C:/NeoZQYY/export/SYSTEM/REPORTS/full_dataflow_doc` | `SYSTEM/REPORTS/full_dataflow_doc/` | End-to-end dataflow documentation |
| `API_SAMPLE_CACHE_ROOT` | `C:/NeoZQYY/export/SYSTEM/CACHE/api_samples` | `SYSTEM/CACHE/api_samples/` | API sample cache |
| `SYSTEM_LOG_ROOT` | `C:/NeoZQYY/export/SYSTEM/LOGS` | `SYSTEM/LOGS/` | System-level ops logs |
| `BACKEND_LOG_ROOT` | `C:/NeoZQYY/export/BACKEND/LOGS` | `BACKEND/LOGS/` | Backend structured logs |
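
Every value in the table is the export root joined with a fixed subdirectory, so the `.env` lines for a new environment can be derived rather than typed by hand. The sketch below illustrates this; `ENV_TO_SUBDIR` and `render_env_lines` are hypothetical names introduced here, and only the variable names and subdirectories come from the table.

```python
# Environment variable -> subdirectory of export/, copied from the table above.
ENV_TO_SUBDIR = {
    "EXPORT_ROOT": "ETL-Connectors/feiqiu/JSON",
    "LOG_ROOT": "ETL-Connectors/feiqiu/LOGS",
    "FETCH_ROOT": "ETL-Connectors/feiqiu/JSON",
    "ETL_REPORT_ROOT": "ETL-Connectors/feiqiu/REPORTS",
    "SYSTEM_ANALYZE_ROOT": "SYSTEM/REPORTS/dataflow_analysis",
    "FIELD_AUDIT_ROOT": "SYSTEM/REPORTS/field_audit",
    "FULL_DATAFLOW_DOC_ROOT": "SYSTEM/REPORTS/full_dataflow_doc",
    "API_SAMPLE_CACHE_ROOT": "SYSTEM/CACHE/api_samples",
    "SYSTEM_LOG_ROOT": "SYSTEM/LOGS",
    "BACKEND_LOG_ROOT": "BACKEND/LOGS",
}


def render_env_lines(export_root: str) -> list[str]:
    """Return KEY=value lines for every path variable (hypothetical helper).

    Backslashes are normalized to forward slashes, matching the style
    used by the values in this document.
    """
    root = export_root.replace("\\", "/").rstrip("/")
    return [f"{name}={root}/{subdir}" for name, subdir in ENV_TO_SUBDIR.items()]


# Print the lines for the development machine's export root.
for line in render_env_lines("C:/NeoZQYY/export"):
    print(line)
```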

---

## Directory details and code integration

### 1. ETL-Connectors/feiqiu/JSON: raw API JSON export

Environment variables: `EXPORT_ROOT`, `FETCH_ROOT`

Config loading chain:
```
.env            EXPORT_ROOT=...
  → env_parser.py   ENV_MAP["EXPORT_ROOT"] → ("io.export_root",)
  → defaults.py     io.export_root defaults to "" (empty string, so .env configuration is mandatory)
  → AppConfig.get("io.export_root")
```
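
The chain above can be sketched as a small overlay of environment values onto nested defaults. This only illustrates the mechanism: the shapes of `ENV_MAP` and `DEFAULTS` are modeled on the chain, while `build_config` and `get` are hypothetical helpers, not the actual `env_parser.py` or `AppConfig` code.

```python
import copy

# Assumed shapes, modeled on the loading chain; the real modules may differ.
ENV_MAP = {
    "EXPORT_ROOT": ("io.export_root",),
    "LOG_ROOT": ("io.log_root",),
    "FETCH_ROOT": ("io.fetch_root",),
}
DEFAULTS = {"io": {"export_root": "", "log_root": "", "fetch_root": ""}}


def build_config(env: dict[str, str]) -> dict:
    """Overlay env values onto a copy of DEFAULTS via the dotted paths in ENV_MAP."""
    config = copy.deepcopy(DEFAULTS)
    for name, dotted_paths in ENV_MAP.items():
        if name not in env:
            continue  # keep the (empty) default
        for dotted in dotted_paths:
            *parents, leaf = dotted.split(".")
            node = config
            for key in parents:
                node = node.setdefault(key, {})
            node[leaf] = env[name]
    return config


def get(config: dict, dotted: str, default=None):
    """Look up a dotted key such as "io.export_root" (stand-in for AppConfig.get)."""
    node = config
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node
```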

Code usage:
- `dump_json()` in `utils/json_store.py` writes the JSON files
- `tasks/ods/ods_json_archive_task.py` calls `dump_json()` to write raw API responses to disk
- Directory layout: `{EXPORT_ROOT}/{TASK_CODE}/{TASK_CODE}-{run_id}-{timestamp}/`
- Each subdirectory contains `.json` files named by endpoint, plus a `manifest.json`

Example output:
```
export/ETL-Connectors/feiqiu/JSON/
└── ODS_PAYMENT/
    └── ODS_PAYMENT-abc123-20260219_1430/
        ├── payment_transactions.json
        └── manifest.json
```
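
The directory layout can be reproduced with a few lines of `pathlib`. This is a sketch under stated assumptions: `build_run_dir` and `write_endpoint_json` are hypothetical helpers rather than the real `dump_json()`, and the timestamp format `%Y%m%d_%H%M` is inferred from the example name `20260219_1430`.

```python
import json
from datetime import datetime
from pathlib import Path


def build_run_dir(export_root: Path, task_code: str, run_id: str, now: datetime) -> Path:
    """Return {export_root}/{task_code}/{task_code}-{run_id}-{timestamp}."""
    # Timestamp format inferred from the example output; an assumption.
    stamp = now.strftime("%Y%m%d_%H%M")
    return export_root / task_code / f"{task_code}-{run_id}-{stamp}"


def write_endpoint_json(run_dir: Path, endpoint: str, payload: object) -> Path:
    """Write one endpoint's payload as {endpoint}.json inside run_dir."""
    run_dir.mkdir(parents=True, exist_ok=True)
    target = run_dir / f"{endpoint}.json"
    # ensure_ascii=False keeps non-ASCII text readable in the archived file.
    target.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return target
```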

---

### 2. ETL-Connectors/feiqiu/LOGS: ETL run logs

Environment variable: `LOG_ROOT`

Config loading chain:
```
.env            LOG_ROOT=...
  → env_parser.py   ENV_MAP["LOG_ROOT"] → ("io.log_root",)
  → defaults.py     io.log_root defaults to "" (empty string, so .env configuration is mandatory)
  → AppConfig.get("io.log_root") or config["io"]["log_root"]
```

Code usage:
- The `_attach_run_file_logger()` method in `orchestration/task_executor.py`
  - Reads `self.config["io"]["log_root"]`
  - Creates `{LOG_ROOT}/{run_uuid}.log`
  - Each ETL run produces one log file named after its run_uuid
- The `configure_logging()` context manager in `utils/logging_utils.py`
  - Accepts a `log_file: Path` parameter and can write to the console and a file at the same time

Example output:
```
export/ETL-Connectors/feiqiu/LOGS/
├── 37f16195960649b384a6bf1e3bdb092c.log
├── 7b6d77fe8a6a4503a0ce4374e3864be9.log
└── ecc3684e2c794d1ba30b6748b999593c.log
```
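
A per-run file logger of this kind can be sketched with the standard `logging` module. The helpers below are illustrative and are not the project's `_attach_run_file_logger()`; the log format string is an assumption. The example file names look like `uuid.uuid4().hex` (32 hexadecimal characters), which is likewise an inference.

```python
import logging
import uuid
from pathlib import Path


def attach_run_file_logger(logger: logging.Logger, log_root: str, run_uuid: str) -> logging.FileHandler:
    """Add a FileHandler writing to {log_root}/{run_uuid}.log and return it."""
    if not log_root:
        # io.log_root defaults to "", so fail early with a clear message.
        raise ValueError("io.log_root is empty; set LOG_ROOT in .env")
    log_dir = Path(log_root)
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(log_dir / f"{run_uuid}.log", encoding="utf-8")
    # Assumed format; the real formatter is not described in this document.
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return handler


def detach_run_file_logger(logger: logging.Logger, handler: logging.FileHandler) -> None:
    """Remove and close the handler so the file is flushed and released."""
    logger.removeHandler(handler)
    handler.close()


# One id per ETL run, in the same shape as the example file names.
run_uuid = uuid.uuid4().hex
```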

---

### 3. ETL-Connectors/feiqiu/REPORTS: ETL quality-check / integrity reports

Environment variable: `ETL_REPORT_ROOT`

Current code behavior (adapted):
- The `write_report()` function in `quality/integrity_service.py`
  - Reads the `ETL_REPORT_ROOT` environment variable as the output root directory
  - Raises `KeyError` when the environment variable is missing
  - Supports overriding via the `report_path` parameter
- The `_default_report_path()` function in `quality/integrity_checker.py`
  - Reads the `ETL_REPORT_ROOT` environment variable
  - Raises `KeyError` when the environment variable is missing
- `DwdQualityTask` in `tasks/dwd/dwd_quality_task.py`
  - `REPORT_PATH` is read from the `ETL_REPORT_ROOT` environment variable
  - `load()` raises `RuntimeError` when the environment variable is missing
- `scripts/debug/generate_report.py`
  - `REPORTS_DIR` is read from the `ETL_REPORT_ROOT` environment variable
  - Raises `KeyError` when the environment variable is missing
- ETL internal scripts (`check/`, `repair/`, `debug/`, `scripts/`)
  - All read it via `os.environ.get("ETL_REPORT_ROOT")`
  - Raise `KeyError` when the environment variable is missing
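
Note that `os.environ.get()` on its own returns `None` for a missing variable instead of raising, so code that wants the `KeyError` described above has to raise it explicitly (or index `os.environ[...]`). A minimal sketch of that pattern follows; `require_env_path` and `default_report_path` are hypothetical names, and treating an empty string as missing is a choice made here, not something the source confirms.

```python
import os
from pathlib import Path


def require_env_path(name: str, environ=None) -> Path:
    """Return the path stored in environment variable `name`, or raise KeyError."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        # .get() returns None instead of raising, so raise explicitly.
        # An empty string is also treated as missing (an assumption).
        raise KeyError(f"{name} is not set; configure it in .env")
    return Path(value)


def default_report_path(filename: str, report_path=None, environ=None) -> Path:
    """An explicit report_path wins; otherwise fall back to ETL_REPORT_ROOT."""
    if report_path is not None:
        return Path(report_path)
    return require_env_path("ETL_REPORT_ROOT", environ) / filename
```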

---

### 4. SYSTEM/REPORTS/dataflow_analysis: dataflow structure analysis reports

Environment variable: `SYSTEM_ANALYZE_ROOT`

Code usage:
- The `resolve_data_dir()` function in `scripts/ops/gen_dataflow_report.py`
  - Reads the path via `_env_paths.get_output_path("SYSTEM_ANALYZE_ROOT")`
  - Raises `KeyError` when the environment variable is missing
  - Supports overriding via the `--output-dir` CLI argument
- The `resolve_output_dir()` function in `scripts/ops/analyze_dataflow.py`
  - Reads the path via `_env_paths.get_output_path("SYSTEM_ANALYZE_ROOT")`
  - Raises `KeyError` when the environment variable is missing
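
The "CLI argument overrides the environment variable" behavior can be sketched with `argparse`. This illustrates the pattern only: the `get_output_path` body below is a stand-in written for this example (the real `_env_paths` module is not shown in this document), and `resolve_dir` is a hypothetical name.

```python
import argparse
import os
from pathlib import Path


def get_output_path(name: str, environ=None) -> Path:
    """Stand-in for _env_paths.get_output_path: KeyError when the variable is unset."""
    env = os.environ if environ is None else environ
    if not env.get(name):
        raise KeyError(name)
    return Path(env[name])


def resolve_dir(argv, environ=None) -> Path:
    """--output-dir wins when given; otherwise use SYSTEM_ANALYZE_ROOT."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-dir", type=Path, default=None)
    args = parser.parse_args(argv)
    if args.output_dir is not None:
        # The CLI value is used as-is, so the env variable is not required here.
        return args.output_dir
    return get_output_path("SYSTEM_ANALYZE_ROOT", environ)
```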

Directory contents:
```
export/SYSTEM/REPORTS/dataflow_analysis/
├── bd_descriptions/                  # Business description JSON (one per table)
├── db_schemas/                       # Database table schema JSON (ODS + DWD)
├── field_mappings/                   # Field mapping JSON
├── json_trees/                       # API JSON structure trees
├── collection_manifest.json          # Collection manifest
└── dataflow_YYYY-MM-DD_HHMMSS.md     # Final report
```

Adapted: the code reads `SYSTEM_ANALYZE_ROOT` directly; no changes are needed.

---

### 5. SYSTEM/REPORTS/field_audit: field audit reports

Environment variable: `FIELD_AUDIT_ROOT`

Current code behavior (adapted):
- `scripts/ops/field_audit.py`
  - Reads the path via `_env_paths.get_output_path("FIELD_AUDIT_ROOT")`
  - Raises `KeyError` when the environment variable is missing
  - Supports overriding via the `--output` argument
- `scripts/ops/export_dwd_field_review.py`
  - Reads the path via `_env_paths.get_output_path("FIELD_AUDIT_ROOT")`
  - Raises `KeyError` when the environment variable is missing

---

### 6. SYSTEM/REPORTS/full_dataflow_doc: end-to-end dataflow documentation

Environment variable: `FULL_DATAFLOW_DOC_ROOT`

Current code behavior (adapted):
- `scripts/ops/gen_full_dataflow_doc.py`
  - `OUT` is read via `_env_paths.get_output_path("FULL_DATAFLOW_DOC_ROOT")`
  - `SAMPLE_DIR` is read via `_env_paths.get_output_path("API_SAMPLE_CACHE_ROOT")`
  - Raises `KeyError` when the environment variable is missing
- `scripts/ops/gen_dataflow_doc.py`
  - `OUT` is read via `_env_paths.get_output_path("FULL_DATAFLOW_DOC_ROOT")`
  - Raises `KeyError` when the environment variable is missing
- `scripts/ops/gen_api_field_mapping.py`
  - `INPUT_DOC` is read via `_env_paths.get_output_path("FULL_DATAFLOW_DOC_ROOT")`
  - Raises `KeyError` when the environment variable is missing

---

### 7. SYSTEM/CACHE/api_samples: API sample cache

Environment variable: `API_SAMPLE_CACHE_ROOT`

Current code behavior (adapted):
- `scripts/ops/gen_full_dataflow_doc.py`
  - `SAMPLE_DIR` is read via `_env_paths.get_output_path("API_SAMPLE_CACHE_ROOT")`
  - Raises `KeyError` when the environment variable is missing
  - Cached samples are valid for 24 hours; after that they are fetched from the API again
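
One common way to implement a 24-hour validity window is to compare a cached file's modification time with the current time. The sketch below assumes that approach; the source does not say how `gen_full_dataflow_doc.py` actually decides freshness, and `is_cache_fresh` is a hypothetical helper.

```python
import time
from pathlib import Path

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, per the note above


def is_cache_fresh(path: Path, now=None, ttl: float = CACHE_TTL_SECONDS) -> bool:
    """True when path exists and was modified less than ttl seconds ago.

    Freshness is judged by file mtime, which is an assumption of this sketch.
    `now` can be injected (seconds since the epoch) to make the check testable.
    """
    if not path.is_file():
        return False
    current = time.time() if now is None else now
    return (current - path.stat().st_mtime) < ttl
```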

---

### 8. BACKEND/LOGS: backend structured logs

Environment variable: `BACKEND_LOG_ROOT` (reserved)

Current status:
- The backend uses only uvicorn's default logging, with no file output
- Corresponds to LAUNCH-CHECKLIST item 5.3, "Backend structured logging"

When to enable:
- Once the backend adopts structured logging, set `BACKEND_LOG_ROOT` to point to this directory

---

### 9. SYSTEM/LOGS: system-level ops logs

Environment variable: `SYSTEM_LOG_ROOT` (reserved)

Current status:
- Reserved for log output from future system-level ops scripts
- For example, collector logs once the monitoring system (LAUNCH-CHECKLIST 7.2) goes live
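
---

## Configuration precedence

All path variables follow the project-wide configuration precedence, from lowest to highest:

```
defaults.py defaults (all path entries are empty strings) < root .env < application .env (e.g. feiqiu/.env) < environment variables < CLI arguments
```

Path variables of the ETL module are mapped to the `io.*` section of `AppConfig` through `ENV_MAP` in `env_parser.py`.
All `io.*` path defaults in `defaults.py` have been cleared to `""`, so if `.env` does not set them, downstream code fails on the empty path.
System-level scripts read the variables directly via `os.environ.get()` or `python-dotenv`.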
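
The precedence order amounts to merging layers from lowest to highest priority, with later layers overwriting earlier ones. A minimal sketch, assuming each layer has already been flattened to dotted keys (`merge_layers` and the sample layer contents are illustrative, not the project's loader):

```python
def merge_layers(*layers: dict) -> dict:
    """Merge flat config layers; later (higher-priority) layers override earlier ones."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if value is not None:  # None means "not provided at this layer"
                merged[key] = value
    return merged


# Layers listed from lowest to highest priority, mirroring the order above.
defaults = {"io.log_root": "", "io.export_root": ""}
root_env = {"io.log_root": "C:/NeoZQYY/export/ETL-Connectors/feiqiu/LOGS"}
app_env = {}       # e.g. feiqiu/.env
process_env = {}   # variables exported in the shell
cli_args = {"io.log_root": "D:/tmp/logs"}

effective = merge_layers(defaults, root_env, app_env, process_env, cli_args)
```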

---

## Code adaptation status

| Directory | Environment variable | Code adapted | Notes |
|------|----------|-----------|------|
| ETL JSON | `EXPORT_ROOT` | ✅ | `env_parser.py` → `io.export_root` |
| ETL LOGS | `LOG_ROOT` | ✅ | `env_parser.py` → `io.log_root` |
| ETL FETCH | `FETCH_ROOT` | ✅ | `env_parser.py` → `io.fetch_root` |
| ETL REPORTS | `ETL_REPORT_ROOT` | ✅ | `integrity_service.py` / `dwd_quality_task.py` adapted |
| dataflow_analysis | `SYSTEM_ANALYZE_ROOT` | ✅ | `gen_dataflow_report.py` already reads it |
| field_audit | `FIELD_AUDIT_ROOT` | ✅ | `field_audit.py` adapted |
| full_dataflow_doc | `FULL_DATAFLOW_DOC_ROOT` | ✅ | `gen_full_dataflow_doc.py` adapted |
| api_samples | `API_SAMPLE_CACHE_ROOT` | ✅ | `gen_full_dataflow_doc.py` adapted |
| SYSTEM LOGS | `SYSTEM_LOG_ROOT` | - | Reserved |
| BACKEND LOGS | `BACKEND_LOG_ROOT` | - | Reserved |

---

## Server environment configuration examples

Development machine (`C:\NeoZQYY\.env`):
```env
EXPORT_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/JSON
LOG_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/LOGS
FETCH_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/JSON
ETL_REPORT_ROOT=C:/NeoZQYY/export/ETL-Connectors/feiqiu/REPORTS
SYSTEM_ANALYZE_ROOT=C:/NeoZQYY/export/SYSTEM/REPORTS/dataflow_analysis
BACKEND_LOG_ROOT=C:/NeoZQYY/export/BACKEND/LOGS
```

Server test environment (`D:\NeoZQYY\test\repo\.env`):
```env
EXPORT_ROOT=D:/NeoZQYY/test/repo/export/ETL-Connectors/feiqiu/JSON
LOG_ROOT=D:/NeoZQYY/test/repo/export/ETL-Connectors/feiqiu/LOGS
FETCH_ROOT=D:/NeoZQYY/test/repo/export/ETL-Connectors/feiqiu/JSON
ETL_REPORT_ROOT=D:/NeoZQYY/test/repo/export/ETL-Connectors/feiqiu/REPORTS
SYSTEM_ANALYZE_ROOT=D:/NeoZQYY/test/repo/export/SYSTEM/REPORTS/dataflow_analysis
BACKEND_LOG_ROOT=D:/NeoZQYY/test/repo/export/BACKEND/LOGS
```

Server production environment (`D:\NeoZQYY\prod\repo\.env`):
```env
EXPORT_ROOT=D:/NeoZQYY/prod/repo/export/ETL-Connectors/feiqiu/JSON
LOG_ROOT=D:/NeoZQYY/prod/repo/export/ETL-Connectors/feiqiu/LOGS
FETCH_ROOT=D:/NeoZQYY/prod/repo/export/ETL-Connectors/feiqiu/JSON
ETL_REPORT_ROOT=D:/NeoZQYY/prod/repo/export/ETL-Connectors/feiqiu/REPORTS
SYSTEM_ANALYZE_ROOT=D:/NeoZQYY/prod/repo/export/SYSTEM/REPORTS/dataflow_analysis
BACKEND_LOG_ROOT=D:/NeoZQYY/prod/repo/export/BACKEND/LOGS
```
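
The example files above list six variables, while several scripts described earlier raise `KeyError` when `FIELD_AUDIT_ROOT`, `FULL_DATAFLOW_DOC_ROOT`, or `API_SAMPLE_CACHE_ROOT` is missing. A small pre-flight check can surface such gaps before a script runs. The sketch below is hypothetical: `REQUIRED_VARS` is taken from the rows marked ✅ in the adaptation table, and the parser handles only plain `KEY=value` lines rather than the full `python-dotenv` syntax.

```python
# Variables marked as adapted (✅) in the status table; reserved ones are omitted.
REQUIRED_VARS = (
    "EXPORT_ROOT", "LOG_ROOT", "FETCH_ROOT", "ETL_REPORT_ROOT",
    "SYSTEM_ANALYZE_ROOT", "FIELD_AUDIT_ROOT", "FULL_DATAFLOW_DOC_ROOT",
    "API_SAMPLE_CACHE_ROOT",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(text: str, required=REQUIRED_VARS) -> list[str]:
    """Return the required variables that are absent or empty in the .env text."""
    values = parse_env_text(text)
    return [name for name in required if not values.get(name)]
```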