Before WeChat mini program page migration verification; before P5 task processing

2026-03-09 01:19:21 +08:00
parent 263bf96035
commit 6e20987d2f
1112 changed files with 153824 additions and 219694 deletions
--- a/.kiro/specs/etl-unified-pipeline/.config.kiro
+++ b/.kiro/specs/etl-unified-pipeline/.config.kiro
@@ -0,0 +1 @@
+{"specId": "a277a91a-b35c-4d48-b4a2-09df0e47b71b", "workflowType": "requirements-first", "specType": "feature"}
--- a/.kiro/specs/etl-unified-pipeline/design.md
+++ b/.kiro/specs/etl-unified-pipeline/design.md
@@ -0,0 +1,834 @@
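+# Technical Design: ETL Unified Request Orchestration and Thread Model Refactoring
+
+> Corresponding requirements document: [requirements.md](./requirements.md)
+
+## 1. Architecture Overview
+
+This refactoring migrates the 21 existing ODS tasks from synchronous serial execution to a unified "serial requests + asynchronous processing + single-threaded database writes" pipeline architecture. Core design principles:
+
+- **Serialized requests**: all API requests queue through the global `RequestScheduler` and are sent strictly one after another, subject to `RateLimiter` throttling
+- **Parallel processing**: API responses are submitted to the multi-threaded `ProcessingPool` (field extraction, hash computation, etc.) without blocking the request thread
+- **Serialized writes**: all database writes are executed by a single `WriteWorker` thread, avoiding concurrent write conflicts
+- **Flexible configuration**: `PipelineConfig` supports global defaults plus task-level overrides
+- **Cancellable**: `CancellationToken` accepts an external cancellation signal and interrupts execution gracefully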
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                      FlowRunner                                  │
+│  ┌──────────────────────────────────────────────────────────┐   │
+│  │                  Unified_Pipeline                         │   │
+│  │                                                           │   │
+│  │  ┌─────────────┐    ┌──────────────┐    ┌─────────────┐  │   │
+│  │  │  Request     │───▶│ Processing   │───▶│  Write      │  │   │
+│  │  │  Scheduler   │    │ Pool         │    │  Worker     │  │   │
+│  │  │  (serial)    │    │ (N workers)  │    │  (1 thread) │  │   │
+│  │  └──────┬───────┘    └──────────────┘    └─────────────┘  │   │
+│  │         │                                                  │   │
+│  │  ┌──────▼───────┐    ┌──────────────┐                     │   │
+│  │  │  Rate        │    │ Cancellation │                     │   │
+│  │  │  Limiter     │    │ Token        │                     │   │
+│  │  │  (5-20s)     │    │ (external)   │                     │   │
+│  │  └──────────────┘    └──────────────┘                     │   │
+│  └──────────────────────────────────────────────────────────┘   │
+│                                                                  │
+│  ┌──────────────────────────────────────────────────────────┐   │
+│  │  DWD_Loader (parallel SCD2)                               │   │
+│  │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐            │   │
+│  │  │ Table1 │ │ Table2 │ │ Table3 │ │ Table4 │  ...       │   │
+│  │  │ SCD2   │ │ SCD2   │ │ SCD2   │ │ SCD2   │            │   │
+│  │  └────────┘ └────────┘ └────────┘ └────────┘            │   │
+│  └──────────────────────────────────────────────────────────┘   │
+└─────────────────────────────────────────────────────────────────┘
+```
+
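+### 1.1 Design Decisions and Rationale
+
+| Decision | Choice | Rationale |
+|------|------|------|
+| Serial vs parallel requests | Serial | The upstream Feiqiu API is not designed for concurrency, and parallel requests easily trigger risk control; serial requests plus rate limiting is the safest strategy |
+| Processing thread count | Default 2 | ODS data processing is lightweight CPU work (JSON parsing, hash computation); 2 threads are enough to absorb the backlog that builds up between requests |
+| Single-threaded writes | Single thread | Writing to PostgreSQL over a single connection avoids lock contention and transaction conflicts, and simplifies error handling and rollback logic |
+| Embedded vs standalone pipeline | Embedded in BaseOdsTask | The pipeline is the internal execution engine of BaseOdsTask; the external interfaces (TaskExecutor, FlowRunner) are completely unchanged |
+| DWD multithreading | Parallelism at the scheduling layer | `_merge_dim_scd2()` is only called in parallel at the scheduling layer; the method itself is unchanged, and each table has its own transaction |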
+
+## 2. Architecture
+
+### 2.1 Overall Data Flow
+
+```mermaid
+graph TD
+    A[FlowRunner.run] --> B[TaskExecutor.run_tasks]
+    B --> C{ODS task}
+    C --> D[BaseOdsTask.execute]
+    D --> E[UnifiedPipeline.run]
+    E --> F[RequestScheduler<br/>serial requests + RateLimiter]
+    F --> G[ProcessingPool<br/>multi-threaded processing]
+    G --> H[WriteWorker<br/>single-threaded DB writes]
+    H --> I[ODS table writes complete]
+    
+    I --> J{Detail_Mode enabled?}
+    J -->|yes| K[DetailFetcher<br/>second-level detail fetch]
+    K --> F
+    J -->|no| L[Return result]
+    K --> L
+    
+    L --> M[DWD_Loader]
+    M --> N[Multi-threaded SCD2 scheduling]
+    N --> O[DWD table writes complete]
+```
+
+### 2.2 Thread Model in Detail
+
+```
+Main thread (RequestScheduler)
+    │
+    │  for request in request_queue:
+    │      if cancel_token.is_cancelled: break
+    │      resp = api_client.post(endpoint, params)
+    │      processing_queue.put((request_id, resp))
+    │      rate_limiter.wait(cancel_token.event)
+    │
+    │  processing_queue.put(SENTINEL × worker_count)
+    │  wait for all workers to finish
+    │  write_queue.put(SENTINEL)
+    │  wait for the writer to finish
+    │
+    ├──▶ Worker Thread 1 ──┐
+    ├──▶ Worker Thread 2 ──┤
+    │                      │
+    │   processing_queue   │
+    │   ┌─────────────┐    │
+    │   │ (id, resp)   │───▶ field extraction / content_hash computation
+    │   │ (id, resp)   │     write_queue.put(processed_rows)
+    │   │ SENTINEL     │
+    │   └─────────────┘    │
+    │                      │
+    │                      ▼
+    │              Write Thread (single thread)
+    │              ┌─────────────┐
+    │              │ write_queue  │
+    │              │ batch=100    │──▶ UPSERT / INSERT
+    │              │ timeout=5s   │
+    │              │ SENTINEL     │
+    │              └─────────────┘
+    │
+    ▼
+PipelineResult (statistics)
+```
+
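+Key design points:
+- `processing_queue`: `queue.Queue(maxsize=queue_size)`, default 100; when it is full, `RequestScheduler` blocks (backpressure)
+- `write_queue`: `queue.Queue(maxsize=queue_size * 2)`, default 200
+- SENTINEL: the `None` object, which tells a thread to exit
+- Cancellation: the main thread checks `cancel_token`; the workers and the writer exit normally via SENTINEL
+- Batched writes: one write is executed once `batch_size` records (default 100) have accumulated or `batch_timeout` (default 5 seconds) has elapsed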
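+### 2.3 Cancellation Signal Propagation Chain
+
+```
+External trigger (Admin-web / CLI / timeout)
+    │ cancel_token.cancel()
+    ▼
+RequestScheduler
+    │ rate_limiter.wait() returns False early
+    │ main loop breaks, no new requests are sent
+    ▼
+ProcessingPool
+    │ exits normally via SENTINEL
+    │ all enqueued data is fully processed
+    ▼
+WriteWorker
+    │ exits normally via SENTINEL
+    │ all processed data is written to the DB
+    ▼
+Returns PipelineResult(cancelled=True, ...)
+```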
+
+## 3. Components and Interfaces
+
+### 3.1 PipelineConfig (Configuration Dataclass)
+
+File: `apps/etl/connectors/feiqiu/config/pipeline_config.py`
+
+```python
+from __future__ import annotations  # keeps the AppConfig annotation unevaluated at import time
+
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class PipelineConfig:
+    """Unified pipeline configuration: global defaults plus task-level overrides."""
+    workers: int = 2              # number of ProcessingPool worker threads
+    queue_size: int = 100         # capacity of the processing queue
+    batch_size: int = 100         # WriteWorker batch write threshold
+    batch_timeout: float = 5.0    # WriteWorker wait timeout (seconds)
+    rate_min: float = 5.0         # RateLimiter minimum interval (seconds)
+    rate_max: float = 20.0        # RateLimiter maximum interval (seconds)
+    max_consecutive_failures: int = 10  # abort threshold for consecutive failures
+
+    def __post_init__(self):
+        if self.workers < 1:
+            raise ValueError(f"workers must be >= 1, got: {self.workers}")
+        if self.queue_size < 1:
+            raise ValueError(f"queue_size must be >= 1, got: {self.queue_size}")
+        if self.batch_size < 1:
+            raise ValueError(f"batch_size must be >= 1, got: {self.batch_size}")
+        if self.rate_min > self.rate_max:
+            raise ValueError(
+                f"rate_min({self.rate_min}) must not exceed rate_max({self.rate_max})"
+            )
+
+    @classmethod
+    def from_app_config(cls, config: AppConfig, task_code: str | None = None) -> "PipelineConfig":
+        """Load from AppConfig, with pipeline.<task_code>.* task-level overrides."""
+        def _get(key: str, default):
+            # Precedence: task level, then global level, then default
+            if task_code:
+                val = config.get(f"pipeline.{task_code.lower()}.{key}")
+                if val is not None:
+                    return type(default)(val)
+            val = config.get(f"pipeline.{key}")
+            if val is not None:
+                return type(default)(val)
+            return default
+
+        return cls(
+            workers=_get("workers", 2),
+            queue_size=_get("queue_size", 100),
+            batch_size=_get("batch_size", 100),
+            batch_timeout=_get("batch_timeout", 5.0),
+            rate_min=_get("rate_min", 5.0),
+            rate_max=_get("rate_max", 20.0),
+            max_consecutive_failures=_get("max_consecutive_failures", 10),
+        )
+```
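The lookup order used by `_get` can be exercised on its own. The sketch below is a standalone illustration under stated assumptions: a plain `dict` stands in for `AppConfig`, `resolve` is a hypothetical helper name, and `ODS_MEMBER` is a made-up task code used only to show the fallback path.

```python
from __future__ import annotations


def resolve(config: dict, key: str, default, task_code: str | None = None):
    """Task-level value, else global value, else default; cast to the default's type."""
    candidates = []
    if task_code:
        candidates.append(f"pipeline.{task_code.lower()}.{key}")
    candidates.append(f"pipeline.{key}")
    for name in candidates:
        val = config.get(name)
        if val is not None:
            return type(default)(val)
    return default


# Hypothetical flat config standing in for AppConfig; values arrive as strings
config = {
    "pipeline.rate_min": "5.0",
    "pipeline.ods_group_package.rate_min": "8.0",
}
task_rate = resolve(config, "rate_min", 5.0, "ODS_GROUP_PACKAGE")  # task-level override
other_rate = resolve(config, "rate_min", 5.0, "ODS_MEMBER")        # falls back to global
workers = resolve(config, "workers", 2, "ODS_GROUP_PACKAGE")       # falls back to default
```

Casting with `type(default)` is what turns string values from environment-style sources into `int` or `float`; it only behaves sensibly while every default has the intended type.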
+
+### 3.2 CancellationToken (Cancellation Token)
+
+File: `apps/etl/connectors/feiqiu/utils/cancellation.py`
+
+```python
+import threading
+
+
+class CancellationToken:
+    """Thread-safe cancellation token wrapping threading.Event."""
+    def __init__(self, timeout: float | None = None):
+        self._event = threading.Event()
+        self._timer: threading.Timer | None = None
+        if timeout is not None and timeout > 0:
+            self._timer = threading.Timer(timeout, self.cancel)
+            self._timer.daemon = True
+            self._timer.start()
+
+    def cancel(self):
+        """Raise the cancellation signal."""
+        self._event.set()
+
+    @property
+    def is_cancelled(self) -> bool:
+        return self._event.is_set()
+
+    @property
+    def event(self) -> threading.Event:
+        return self._event
+
+    def dispose(self):
+        """Clean up the timeout timer."""
+        if self._timer is not None:
+            self._timer.cancel()
+            self._timer = None
+```
+
+### 3.3 RateLimiter (Rate Limiter)
+
+File: `apps/etl/connectors/feiqiu/api/rate_limiter.py`
+
+```python
+import random
+import threading
+import time
+
+
+class RateLimiter:
+    """Request interval controller whose wait can be interrupted by a cancellation signal."""
+    def __init__(self, min_interval: float = 5.0, max_interval: float = 20.0):
+        if min_interval > max_interval:
+            raise ValueError(
+                f"min_interval({min_interval}) must not exceed max_interval({max_interval})"
+            )
+        self._min = min_interval
+        self._max = max_interval
+        self._last_interval: float = 0.0
+
+    def wait(self, cancel_event: threading.Event | None = None) -> bool:
+        """Wait for a random interval. Returns False if interrupted by the cancellation signal.
+        The wait is split into 0.5s slices, and cancel_event is checked before each slice."""
+        interval = random.uniform(self._min, self._max)
+        self._last_interval = interval
+        remaining = interval
+        while remaining > 0:
+            if cancel_event and cancel_event.is_set():
+                return False
+            sleep_time = min(0.5, remaining)
+            time.sleep(sleep_time)
+            remaining -= sleep_time
+        return True
+
+    @property
+    def last_interval(self) -> float:
+        return self._last_interval
+```
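An alternative to polling in 0.5s slices is to block on the event itself. The following is only a sketch of that technique, not the `RateLimiter` defined in this design; `cancellable_wait` is a hypothetical name. It relies on `threading.Event.wait(timeout)` returning `True` as soon as the event is set and `False` when the timeout expires.

```python
import random
import threading
import time


def cancellable_wait(min_interval, max_interval, cancel_event):
    """Sleep for a random interval; return False if cancelled before it elapses."""
    interval = random.uniform(min_interval, max_interval)
    # Event.wait wakes up immediately when the event is set
    return not cancel_event.wait(timeout=interval)


cancel = threading.Event()
threading.Timer(0.1, cancel.set).start()  # simulate an external cancel after 0.1s
started = time.monotonic()
completed = cancellable_wait(5.0, 20.0, cancel)
elapsed = time.monotonic() - started

uninterrupted = cancellable_wait(0.01, 0.02, threading.Event())
```

With this approach the cancellation latency no longer depends on the slice length, at the cost of requiring an event object even when no cancellation is wanted.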
+
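+### 3.4 UnifiedPipeline (Unified Pipeline Engine)
+
+File: `apps/etl/connectors/feiqiu/pipeline/unified_pipeline.py`
+
+This is the core component. It encapsulates the complete "serial requests + asynchronous processing + single-threaded database writes" execution engine.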
+
+```python
+@dataclass
+class PipelineResult:
+    """Statistics for one pipeline run."""
+    status: str = "SUCCESS"           # SUCCESS / PARTIAL / CANCELLED / FAILED
+    total_requests: int = 0
+    completed_requests: int = 0
+    total_fetched: int = 0
+    total_inserted: int = 0
+    total_updated: int = 0
+    total_skipped: int = 0
+    request_failures: int = 0
+    processing_failures: int = 0
+    write_failures: int = 0
+    cancelled: bool = False
+    errors: list[dict] = field(default_factory=list)
+    timing: dict[str, float] = field(default_factory=dict)  # elapsed time per stage
+
+class UnifiedPipeline:
+    """Unified pipeline engine: serial requests + asynchronous processing + single-threaded DB writes."""
+
+    def __init__(
+        self,
+        api_client: APIClient,
+        db_connection,
+        logger: logging.Logger,
+        config: PipelineConfig,
+        cancel_token: CancellationToken | None = None,
+    ):
+        self.api = api_client
+        self.db = db_connection
+        self.logger = logger
+        self.config = config
+        self.cancel_token = cancel_token or CancellationToken()
+        self._rate_limiter = RateLimiter(config.rate_min, config.rate_max)
+
+    def run(
+        self,
+        requests: Iterable[PipelineRequest],
+        process_fn: Callable[[Any], list[dict]],
+        write_fn: Callable[[list[dict]], WriteResult],
+    ) -> PipelineResult:
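+        """Run the pipeline.
+        
+        Args:
+            requests: iterator of requests (generated by BaseOdsTask; carries endpoint, params, etc.)
+            process_fn: processing function that turns an API response into a list of records to write
+            write_fn: write function that writes records to the database in batches
+        """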
+        if self.cancel_token.is_cancelled:
+            return PipelineResult(status="CANCELLED", cancelled=True)
+
+        processing_queue = queue.Queue(maxsize=self.config.queue_size)
+        write_queue = queue.Queue(maxsize=self.config.queue_size * 2)
+        result = PipelineResult()
+
+        # Start the processing worker threads
+        workers = []
+        for i in range(self.config.workers):
+            t = threading.Thread(
+                target=self._process_worker,
+                args=(processing_queue, write_queue, process_fn, result),
+                name=f"pipeline-worker-{i}",
+                daemon=True,
+            )
+            t.start()
+            workers.append(t)
+
+        # Start the writer thread
+        writer = threading.Thread(
+            target=self._write_worker,
+            args=(write_queue, write_fn, result),
+            name="pipeline-writer",
+            daemon=True,
+        )
+        writer.start()
+
+        # Main thread: serial requests
+        self._request_loop(requests, processing_queue, result)
+
+        # Send one SENTINEL per worker to the processing queue
+        for _ in workers:
+            processing_queue.put(None)
+        for w in workers:
+            w.join()
+
+        # Send SENTINEL to the write queue
+        write_queue.put(None)
+        writer.join()
+
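+        # Determine the final status; keep FAILED if the request loop aborted on consecutive failures
+        if result.cancelled:
+            result.status = "CANCELLED"
+        elif result.status != "FAILED" and (
+            result.request_failures + result.processing_failures + result.write_failures > 0
+        ):
+            result.status = "PARTIAL"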
+
+        return result
+```
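The `_write_worker` referenced above is not shown in this document. The sketch below is one possible shape for its batching loop, written under assumptions: each queue item is a list of rows, `None` is the sentinel, and the function name and signature are hypothetical. It flushes when `batch_size` rows have accumulated or when `batch_timeout` elapses, and never hands `write_fn` more than `batch_size` rows.

```python
import queue
import time


def write_worker(write_queue, write_fn, batch_size=100, batch_timeout=5.0):
    """Drain write_queue, flushing when the batch is full or the timeout expires."""
    batch = []
    deadline = time.monotonic() + batch_timeout
    while True:
        try:
            rows = write_queue.get(timeout=max(0.0, deadline - time.monotonic()))
            timed_out = False
        except queue.Empty:
            rows = []          # nothing arrived in time: flush what has accumulated
            timed_out = True
        if rows is None:       # SENTINEL: flush the remainder and exit
            if batch:
                write_fn(batch)
            return
        batch.extend(rows)
        while len(batch) >= batch_size:
            write_fn(batch[:batch_size])
            batch = batch[batch_size:]
            deadline = time.monotonic() + batch_timeout
        if timed_out:
            if batch:
                write_fn(batch)
                batch = []
            deadline = time.monotonic() + batch_timeout


q = queue.Queue()
q.put([{"id": i} for i in range(250)])
q.put(None)
sizes = []
write_worker(q, lambda b: sizes.append(len(b)), batch_size=100, batch_timeout=0.05)
```

In this example the 250 queued rows are written as two full batches followed by the remainder when the sentinel arrives.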
+
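+### 3.5 BaseOdsTask Refactoring
+
+File: `apps/etl/connectors/feiqiu/tasks/ods/ods_tasks.py` (modifies an existing file)
+
+Refactoring strategy: inside `BaseOdsTask.execute()`, replace the existing synchronous loop with `UnifiedPipeline` while keeping all existing functionality (time window resolution, paginated fetching, schema-aware writes, snapshot soft deletion, content_hash deduplication).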
+
+```python
+class BaseOdsTask(BaseTask):
+    """ODS task base class after refactoring."""
+
+    def execute(self, cursor_data: dict | None = None) -> dict:
+        spec = self.SPEC
+        # ... existing window resolution and segmentation logic unchanged ...
+
+        # Build the PipelineConfig (supports task-level overrides)
+        pipeline_config = PipelineConfig.from_app_config(self.config, spec.code)
+        cancel_token = getattr(self, '_cancel_token', None)
+
+        pipeline = UnifiedPipeline(
+            api_client=self.api,
+            db_connection=self.db,
+            logger=self.logger,
+            config=pipeline_config,
+            cancel_token=cancel_token,
+        )
+
+        # Wrap the existing paginated request logic as a PipelineRequest iterator
+        # Wrap the existing _insert_records_schema_aware as write_fn
+        # Wrap the existing field extraction / hash computation as process_fn
+        result = pipeline.run(
+            requests=self._build_requests(spec, segments, store_id, page_size),
+            process_fn=self._build_process_fn(spec),
+            write_fn=self._build_write_fn(spec, source_file),
+        )
+
+        # ... snapshot soft-deletion logic unchanged ...
+        # ... result construction logic unchanged ...
+```
+
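+Key constraints:
+- All existing fields of the `OdsTaskSpec` dataclass stay unchanged
+- Methods such as `_insert_records_schema_aware()` and `_mark_missing_as_deleted()` stay unchanged
+- The interface through which `TaskExecutor` calls `task.execute(cursor_data)` stays unchanged
+- The registration code in `TaskRegistry` stays unchanged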
+
+### 3.6 OdsTaskSpec Extension (Detail_Mode Support)
+
+New optional fields are added to the existing `OdsTaskSpec` dataclass:
+
+```python
+@dataclass(frozen=True)
+class OdsTaskSpec:
+    # ... all existing fields unchanged ...
+
+    # Optional Detail_Mode configuration (new)
+    detail_endpoint: str | None = None          # detail API endpoint
+    detail_param_builder: Callable[[dict], dict] | None = None  # builds the detail request params
+    detail_target_table: str | None = None      # target table for detail data
+    detail_data_path: tuple[str, ...] | None = None  # data_path of the detail data
+    detail_list_key: str | None = None          # list_key of the detail data
+    detail_id_column: str | None = None         # column used to extract IDs from the list data
+```
+
+When `detail_endpoint` is `None`, the pipeline skips the detail fetch stage and behaves exactly like pure list mode.
+
+### 3.7 DWD Multi-threaded Scheduler
+
+File: `apps/etl/connectors/feiqiu/tasks/dwd/dwd_load_task.py` (modifies an existing file)
+
+Refactor the `DwdLoadTask.load()` method so that the existing serial `for dwd_table, ods_table in TABLE_MAP` loop becomes parallel scheduling through `concurrent.futures.ThreadPoolExecutor`:
+
+```python
+def load(self, extracted: dict[str, Any], context: TaskContext) -> dict[str, Any]:
+    now = extracted["now"]
+    parallel_workers = int(self.config.get("dwd.parallel_workers", 4))
+
+    # Split the tables into two groups: dimension tables and fact tables
+    dim_tables = [(d, o) for d, o in self.TABLE_MAP.items() 
+                  if self._table_base(d).startswith("dim_")]
+    fact_tables = [(d, o) for d, o in self.TABLE_MAP.items() 
+                   if not self._table_base(d).startswith("dim_")]
+
+    summary = []
+    errors = []
+
+    # Parallel SCD2 merge for dimension tables (each table has its own transaction and DB connection)
+    with ThreadPoolExecutor(max_workers=parallel_workers) as executor:
+        futures = {}
+        for dwd_table, ods_table in dim_tables:
+            if only_tables and ...:  # filtering logic unchanged
+                continue
+            future = executor.submit(
+                self._process_single_table, dwd_table, ods_table, now, context
+            )
+            futures[future] = dwd_table
+
+        for future in as_completed(futures):
+            dwd_table = futures[future]
+            try:
+                table_result = future.result()
+                summary.append(table_result)
+            except Exception as exc:
+                errors.append({"table": dwd_table, "error": str(exc)})
+
+    # Fact tables are processed in parallel in the same way
+    # ... similar logic ...
+
+    return {"tables": summary, "errors": len(errors), "error_details": errors}
+```
+
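+Key constraints:
+- The `_merge_dim_scd2()` method itself is unchanged
+- Each table uses its own database connection and transaction
+- A failure on one table does not affect the other tables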
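The per-table failure isolation can be tried out without a database. This is a minimal sketch under assumptions: `load_tables` and `fake_merge` are hypothetical names, `fake_merge` stands in for the real per-table SCD2 merge, and the table names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_tables(tables, merge_one, parallel_workers=4):
    """Run merge_one for every table in parallel, collecting failures per table."""
    summary, errors = [], []
    with ThreadPoolExecutor(max_workers=parallel_workers) as executor:
        futures = {executor.submit(merge_one, t): t for t in tables}
        for future in as_completed(futures):
            table = futures[future]
            try:
                summary.append(future.result())
            except Exception as exc:
                # One failing table is recorded and does not stop the others
                errors.append({"table": table, "error": str(exc)})
    return {"tables": summary, "errors": len(errors), "error_details": errors}


def fake_merge(table):
    # Hypothetical stand-in for the per-table SCD2 merge
    if table == "dim_broken":
        raise RuntimeError("merge failed")
    return {"table": table, "inserted": 1, "updated": 0}


report = load_tables(["dim_a", "dim_b", "dim_broken", "dim_c"], fake_merge)
```

`future.result()` re-raises whatever the worker raised, which is why the `try`/`except` sits around it rather than around `submit`. Results arrive in completion order, so callers that need a stable order should sort `report["tables"]`.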
+
+### 3.8 Task Log Manager
+
+File: `apps/etl/connectors/feiqiu/utils/task_log_buffer.py` (new)
+
+```python
+class TaskLogBuffer:
+    """Task-level log buffer: collects all logs of one task and emits them in one go when the task finishes."""
+
+    def __init__(self, task_code: str, parent_logger: logging.Logger):
+        self.task_code = task_code
+        self._buffer: list[LogEntry] = []
+        self._lock = threading.Lock()
+        self._parent = parent_logger
+
+    def log(self, level: int, message: str, *args, **kwargs):
+        """Buffer one log entry in a thread-safe way."""
+        with self._lock:
+            self._buffer.append(LogEntry(
+                timestamp=datetime.now(),
+                level=level,
+                task_code=self.task_code,
+                message=message % args if args else message,
+            ))
+
+    def flush(self) -> list[LogEntry]:
+        """Emit the buffered entries to the parent logger in timestamp order and return them."""
+        with self._lock:
+            entries = sorted(self._buffer, key=lambda e: e.timestamp)
+            for entry in entries:
+                self._parent.log(
+                    entry.level,
+                    "[%s] %s",
+                    entry.task_code,
+                    entry.message,
+                )
+            self._buffer.clear()
+            return entries
+```
+
+## 4. Data Model
+
+### 4.1 PipelineConfig Configuration Namespace
+
+A new `pipeline.*` namespace is added to `AppConfig`:
+
+| Config key | Type | Default | Description |
+|--------|------|--------|------|
+| `pipeline.workers` | int | 2 | Number of ProcessingPool worker threads |
+| `pipeline.queue_size` | int | 100 | Processing queue capacity |
+| `pipeline.batch_size` | int | 100 | WriteWorker batch write threshold |
+| `pipeline.batch_timeout` | float | 5.0 | WriteWorker wait timeout (seconds) |
+| `pipeline.rate_min` | float | 5.0 | RateLimiter minimum interval (seconds) |
+| `pipeline.rate_max` | float | 20.0 | RateLimiter maximum interval (seconds) |
+| `pipeline.max_consecutive_failures` | int | 10 | Abort threshold for consecutive failures |
+| `pipeline.<TASK_CODE>.workers` | int | - | Task-level override: worker thread count |
+| `pipeline.<TASK_CODE>.rate_min` | float | - | Task-level override: minimum interval |
+| `pipeline.<TASK_CODE>.rate_max` | float | - | Task-level override: maximum interval |
+| `dwd.parallel_workers` | int | 4 | Number of parallel threads in the DWD layer |
+
+Task-level override example (`.env`):
+```
+PIPELINE_WORKERS=2
+PIPELINE_RATE_MIN=5.0
+PIPELINE_ODS_GROUP_PACKAGE.RATE_MIN=8.0
+PIPELINE_ODS_GROUP_PACKAGE.RATE_MAX=25.0
+```
+
+CLI argument overrides:
+```
+--pipeline-workers 4
+--pipeline-batch-size 200
+--pipeline-rate-min 3.0
+--pipeline-rate-max 15.0
+```
+
+### 4.2 PipelineRequest Dataclass
+
+```python
+@dataclass
+class PipelineRequest:
+    """Description of one pipeline request."""
+    endpoint: str
+    params: dict
+    page_size: int | None = 200
+    data_path: tuple[str, ...] = ("data",)
+    list_key: str | None = None
+    segment_index: int = 0        # index of the window segment this request belongs to
+    is_detail: bool = False       # whether this is a detail request
+    detail_id: Any = None         # ID of the detail request
+```
+
+### 4.3 PipelineResult Dataclass
+
+```python
+@dataclass
+class PipelineResult:
+    """Result of one pipeline run."""
+    status: str = "SUCCESS"
+    total_requests: int = 0
+    completed_requests: int = 0
+    total_fetched: int = 0
+    total_inserted: int = 0
+    total_updated: int = 0
+    total_skipped: int = 0
+    total_deleted: int = 0
+    request_failures: int = 0
+    processing_failures: int = 0
+    write_failures: int = 0
+    cancelled: bool = False
+    errors: list[dict] = field(default_factory=list)
+    timing: dict[str, float] = field(default_factory=dict)
+    # Detail_Mode statistics (populated only when enabled)
+    detail_success: int = 0
+    detail_failure: int = 0
+    detail_skipped: int = 0
+```
+
+### 4.4 WriteResult Dataclass
+
+```python
+@dataclass
+class WriteResult:
+    """Result of a single batch write."""
+    inserted: int = 0
+    updated: int = 0
+    skipped: int = 0
+    errors: int = 0
+```
+
+### 4.5 LogEntry Dataclass
+
+```python
+@dataclass
+class LogEntry:
+    """A single log entry."""
+    timestamp: datetime
+    level: int
+    task_code: str
+    message: str
+```
+
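+### 4.6 Existing Data Models Unchanged
+
+The following existing data models stay unchanged:
+- `OdsTaskSpec`: only the optional Detail_Mode fields are added; all existing fields are unchanged
+- `TaskContext`: unchanged
+- `TaskMeta`: unchanged
+- `SnapshotMode`: unchanged
+- `ColumnSpec`: unchanged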
+
+
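+## 5. Correctness Properties
+
+*A property is a characteristic or behavior that should hold true across all valid executions of the system; in essence, it is a formal statement of what the system should do. Properties are the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
+
+The following properties are derived from the acceptance criteria in the requirements document. After removing redundancy and merging, 16 independent properties remain.
+
+### Property 1: Requests Are Strictly Serial
+
+*For any* set of API requests submitted to the RequestScheduler, the send timestamp of each request must be strictly later than the response-completion timestamp of the previous request, regardless of whether the requests come from the same ODS task or from different ODS tasks.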
+
+**Validates: Requirements 1.2, 1.6**
+
+### Property 2: RateLimiter Interval Range
+
+*For any* valid min_interval and max_interval configuration (min_interval <= max_interval), the actual wait time of RateLimiter.wait() always falls within [min_interval, max_interval] (allowing ±0.5s of system scheduling error).
+
+**Validates: Requirements 1.3**
+
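+### Property 3: PipelineConfig Construction and Validation
+
+*For any* set of configuration parameters, when workers >= 1, queue_size >= 1, batch_size >= 1 and rate_min <= rate_max, PipelineConfig should be constructed successfully and store all parameter values correctly; when any of these conditions is not met, a ValueError should be raised.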
+
+**Validates: Requirements 1.5, 4.1, 4.4, 4.5**
+
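+### Property 4: Configuration Layering and Task-Level Overrides
+
+*For any* combination of task code and configuration values, PipelineConfig.from_app_config() should follow the precedence: task-level configuration (`pipeline.<task_code>.*`) > global configuration (`pipeline.*`) > defaults. When a task-level value exists, it should override the global value; when it does not, the global value should be used.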
+
+**Validates: Requirements 1.4, 2.3, 2.6, 4.2, 4.3, 4.6, 7.2**
+
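+### Property 5: Pipeline Completion Semantics
+
+*For any* set of successful API requests (no failures, no cancellation), when UnifiedPipeline.run() returns, total_fetched in PipelineResult should equal the total number of records returned by all requests, and total_inserted + total_updated + total_skipped should equal total_fetched.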
+
+**Validates: Requirements 2.7**
+
+### Property 6: WriteWorker Batch Size Constraint
+
+*For any* configured batch_size, the number of records WriteWorker passes to write_fn in a single call does not exceed batch_size.
+
+**Validates: Requirements 2.5**
+
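+### Property 7: CancellationToken State Transitions
+
+*For any* CancellationToken instance, is_cancelled is initially False; after cancel() is called, is_cancelled becomes True and cannot be reverted. For a CancellationToken configured with a timeout, is_cancelled automatically becomes True once the timeout elapses.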
+
+**Validates: Requirements 3.1, 3.6**
+
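+### Property 8: No Enqueued Data Is Lost After Cancellation
+
+*For any* moment during pipeline execution at which the cancellation signal is triggered, when the pipeline returns: (a) no new API requests have been sent, (b) all data already submitted to processing_queue has been fully processed, (c) all processed data has been written to the database, (d) the returned result has status CANCELLED and cancelled set to True.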
+
+**Validates: Requirements 3.2, 3.3, 3.4, 3.5**
+
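+### Property 9: Output Equivalence Before and After Migration
+
+*For any* ODS task and identical input data (sequence of API responses), the database writes produced by running through UnifiedPipeline (inserted/updated/skipped counts and record contents) should be exactly the same as those of the pre-migration synchronous serial execution.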
+
+**Validates: Requirements 5.1, 5.3, 5.4, 5.5**
+
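+### Property 10: Detail_Mode Is Optional
+
+*For any* OdsTaskSpec, when detail_endpoint is None, the pipeline should skip the detail fetch stage, and detail_success/detail_failure/detail_skipped in the result are all 0; when detail_endpoint is configured, the pipeline should run the detail fetch after the list fetch completes, and detail requests follow the same rate-limiting rules as list requests.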
+
+**Validates: Requirements 6.1, 6.3, 6.4**
+
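+### Property 11: A Single Failure Does Not Interrupt the Whole Run
+
+*For any* single failed item during pipeline execution (API request failure, processing exception, detail API error, DWD single-table merge failure), the pipeline should continue with the subsequent items without interrupting the overall flow, and the failed item is correctly recorded in the errors list of the result.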
+
+**Validates: Requirements 6.5, 9.1, 9.2, 7.3, 7.4**
+
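+### Property 12: Consecutive Failures Trigger an Abort
+
+*For any* pipeline execution, when the number of consecutive failures exceeds the configured max_consecutive_failures, the pipeline should abort on its own initiative and return a result with status FAILED. While the number of consecutive failures has not exceeded the threshold, the pipeline should keep running.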
+
+**Validates: Requirements 9.5**
+
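+### Property 13: A Write Failure Rolls Back the Current Batch
+
+*For any* batch write operation, when write_fn raises a database exception, the transaction of the current batch should be rolled back (no partial writes), the records of that batch are marked as write failures, and subsequent batches are unaffected.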
+
+**Validates: Requirements 9.3**
+
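+### Property 14: Completeness of Result Statistics
+
+*For any* pipeline execution (including Detail_Mode and DWD multithreading), the statistics in the returned result should be complete and consistent: request_failures + processing_failures + write_failures should equal the length of the errors list; detail_success + detail_failure + detail_skipped should equal the total number of detail requests; in the DWD summary, the number of successful tables plus the number of failed tables should equal the total number of tables.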
+
+**Validates: Requirements 6.6, 8.2, 9.4, 7.5**
+
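+### Property 15: Log Buffers Are Isolated per Task
+
+*For any* log streams from multiple concurrent tasks, the flush() output of each TaskLogBuffer should contain only the log entries of its own task, sorted by timestamp in ascending order, and no logs from other tasks.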
+
+**Validates: Requirements 10.1, 10.4**
+
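+### Property 16: DWD Parallel and Serial Results Match
+
+*For any* set of SCD2 merge operations on DWD tables, the final result of multi-threaded parallel execution (inserted/updated counts for each table) should be exactly the same as the result of serial table-by-table execution.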
+
+**Validates: Requirements 7.1**
+
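+## 6. Error Handling
+
+### 6.1 Error Classification and Handling Strategy
+
+| Error type | Trigger | Handling | Scope |
+|----------|----------|----------|----------|
+| API request failure | HTTP error, timeout, API returns an error code | Retried by `APIClient`'s built-in retry (3 attempts with exponential backoff); once exhausted, the error is recorded and the next request proceeds | Single request |
+| Processing exception | Field extraction, hash computation, etc. raise an exception | Catch the exception, log the error (with the record identifier), mark it as a processing failure, and keep processing the queue | Single record |
+| Write failure | Database error (constraint violation, dropped connection, etc.) | Roll back the current batch transaction, log the error (with the batch size), and mark it as a write failure | Single batch |
+| Consecutive failures | N consecutive request/processing/write failures | Abort the pipeline, status=FAILED | Whole task |
+| Cancellation signal | cancel_token.cancel() triggered externally | Stop sending new requests, wait for enqueued data to finish processing, then exit | Whole task |
+| Configuration error | workers<1, rate_min>rate_max, etc. | Raise ValueError at construction; the task does not start | Whole task |
+| DWD single-table failure | Exception during the SCD2 merge | Roll back that table's transaction, record the error, and continue with the other tables | Single table |
+
+### 6.2 Consecutive Failure Counting Logic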
+
+```python
+consecutive_failures = 0
+
+for request in requests:
+    try:
+        response = api_client.post(...)
+        consecutive_failures = 0  # reset on success
+    except Exception:
+        consecutive_failures += 1
+        if consecutive_failures >= config.max_consecutive_failures:
+            result.status = "FAILED"
+            break
+```
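The counting logic above can be run end to end with a fake sender. This is a sketch under assumptions, not the project's request loop: `run_requests` and `flaky` are hypothetical names, and the status strings simply reuse the ones defined for `PipelineResult`. Like the snippet above, it aborts when the streak reaches the threshold.

```python
def run_requests(requests, send, max_consecutive_failures=10):
    """Send requests serially; abort once failures in a row reach the threshold."""
    consecutive_failures = 0
    completed, failed = 0, 0
    for request in requests:
        try:
            send(request)
        except Exception:
            failed += 1
            consecutive_failures += 1
            if consecutive_failures >= max_consecutive_failures:
                return "FAILED", completed, failed
        else:
            completed += 1
            consecutive_failures = 0  # any success resets the streak
    return ("PARTIAL" if failed else "SUCCESS"), completed, failed


def flaky(request):
    # Hypothetical sender: requests 3, 4 and 5 fail, everything else succeeds
    if request in (3, 4, 5):
        raise RuntimeError(f"request {request} failed")


outcome = run_requests(range(10), flaky, max_consecutive_failures=3)
tolerant = run_requests(range(10), flaky, max_consecutive_failures=4)
```

With a threshold of 3 the run stops at the third failure in a row; with a threshold of 4 the same failures are tolerated and the run finishes as `PARTIAL`.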
+
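+### 6.3 Transaction Management
+
+- ODS layer: the data of each window segment is committed together once the whole segment has been processed; if a segment fails, that segment is rolled back (existing semantics preserved)
+- DWD layer: each table has its own transaction; a rollback on one failed table does not affect the other tables
+- WriteWorker: each batch has its own transaction; a rollback on a failed batch does not affect subsequent batches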
+
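+## 7. Testing Strategy
+
+### 7.1 Test Frameworks
+
+- Unit tests: `pytest` (in `tests/unit/` inside the ETL module)
+- Property tests: `hypothesis` (in the monorepo-level `tests/`)
+- Each property test runs at least 100 iterations
+
+### 7.2 Property Test Plan
+
+Each correctness property maps to one property test, implemented with the `hypothesis` library:
+
+| Property | Test file | Generator strategy |
+|------|----------|-----------|
+| P1: Requests are strictly serial | `tests/test_pipeline_properties.py` | Generate random request sequences; record timestamps with FakeAPI |
+| P2: RateLimiter interval range | `tests/test_rate_limiter_properties.py` | Generate random (min, max) pairs; verify the wait() time |
+| P3: PipelineConfig construction | `tests/test_pipeline_config_properties.py` | Generate random combinations of config parameters (including invalid values) |
+| P4: Configuration layering and overrides | `tests/test_pipeline_config_properties.py` | Generate random multi-layer config dictionaries |
+| P5: Pipeline completion semantics | `tests/test_pipeline_properties.py` | Generate random record sets; verify that the counts agree |
+| P6: WriteWorker batch constraint | `tests/test_pipeline_properties.py` | Generate random batch_size values and record streams |
+| P7: CancellationToken state | `tests/test_cancellation_properties.py` | Generate random timeout values |
+| P8: No data lost after cancellation | `tests/test_pipeline_properties.py` | Generate random request sequences plus a random cancellation moment |
+| P9: Migration equivalence | `tests/test_migration_properties.py` | Generate random API responses; compare the old and new implementations |
+| P10: Detail_Mode is optional | `tests/test_detail_mode_properties.py` | Generate OdsTaskSpec instances with and without detail_endpoint |
+| P11: Single failure does not interrupt | `tests/test_pipeline_properties.py` | Generate request sequences containing random failures |
+| P12: Consecutive failures abort | `tests/test_pipeline_properties.py` | Generate consecutive-failure sequences plus random thresholds |
+| P13: Write failure rollback | `tests/test_pipeline_properties.py` | Generate batches containing random write failures |
+| P14: Completeness of result statistics | `tests/test_pipeline_properties.py` | Generate random execution results; verify count consistency |
+| P15: Log buffer isolation | `tests/test_log_buffer_properties.py` | Generate random log streams for multiple tasks |
+| P16: DWD parallel/serial match | `tests/test_dwd_parallel_properties.py` | Generate random table sets plus a mock SCD2 |
+
+Each test must carry a comment tag: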
+```python
+# Feature: etl-unified-pipeline, Property 1: Requests Are Strictly Serial
+```
+
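+### 7.3 Unit Test Plan
+
+Unit tests focus on concrete examples, boundary conditions, and integration points:
+
+| Test target | Test file | Coverage |
+|----------|----------|----------|
+| RateLimiter | `tests/unit/test_rate_limiter.py` | Boundaries: min=max, interruption by cancellation, min>max raises |
+| CancellationToken | `tests/unit/test_cancellation.py` | Boundaries: pre-cancelled, timeout=0, dispose |
+| PipelineConfig | `tests/unit/test_pipeline_config.py` | Boundaries: invalid parameters, CLI overrides |
+| UnifiedPipeline | `tests/unit/test_unified_pipeline.py` | Integration: FakeAPI + FakeDB end to end |
+| TaskLogBuffer | `tests/unit/test_task_log_buffer.py` | Boundaries: empty buffer, concurrent writes |
+| DWD multi-threaded scheduling | `tests/unit/test_dwd_parallel.py` | Integration: mock SCD2 + single-table failure |
+| Detail_Mode | `tests/unit/test_detail_mode.py` | Integration: full list→detail flow |
+
+### 7.4 Test Environment
+
+- Unit tests use FakeDB/FakeAPI and involve no real database connection
+- Property tests use the `hypothesis` library with at least 100 iterations
+- Integration tests (if needed) use the `test_etl_feiqiu` test database, connected via `TEST_DB_DSN`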
--- a/.kiro/specs/etl-unified-pipeline/requirements.md
+++ b/.kiro/specs/etl-unified-pipeline/requirements.md
@@ -0,0 +1,159 @@
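+# Requirements: ETL Unified Request Orchestration and Thread Model Refactoring
+
+## Introduction
+
+This work unifies the request orchestration and thread model of the Feiqiu Connector ETL system (`apps/etl/connectors/feiqiu/`). In the current system, every ODS task runs its API requests, data processing, and database writes synchronously and serially inside `BaseOdsTask.execute()`, with no rate limiting, no cancellation signal, and no parallel processing. The refactoring establishes a unified request orchestration framework and migrates all 21 ODS tasks to a "serial requests + asynchronous processing + single-threaded database writes" architecture. It supports global rate limiting (random 5-20 second intervals), external cancellation signals, and an optional two-level "list→detail" fetch mode, and it adds multi-threaded loading for the DWD layer.
+
+## Glossary
+
+- **ETL_System**: the Feiqiu Connector ETL system (`apps/etl/connectors/feiqiu/`), which pulls data from the Feiqiu SaaS API and loads it into the ODS → DWD → DWS layers in PostgreSQL
+- **Unified_Pipeline**: the unified request orchestration framework, a "serial requests + asynchronous processing + single-threaded database writes" execution engine shared by all ODS tasks
+- **Request_Scheduler**: the global request scheduler, which queues all API requests, sends them serially, and applies the rate-limiting rules
+- **Rate_Limiter**: the request interval controller, which controls the random wait between two adjacent API requests (uniformly distributed over 5-20 seconds by default) to avoid triggering upstream risk control
+- **Processing_Pool**: the asynchronous processing thread pool, whose worker threads consume API response data in parallel and perform CPU-intensive operations such as field extraction, data cleaning, and content_hash computation
+- **Write_Worker**: the single-threaded write worker, which gathers all processed results and performs the database writes, keeping writes serialized
+- **CancellationToken**: the cancellation token that external components (such as Admin-web or the CLI) set to tell a running task to stop
+- **ODS_Task**: collective name for the ODS-layer data fetch tasks, currently 21 in total, defined through the `OdsTaskSpec` dataclass, with task classes generated dynamically by `_build_task_class()`
+- **Detail_Mode**: the two-level detail fetch mode, which calls the detail API record by record after the list API has been fully fetched in order to obtain richer data; an optional capability
+- **Pipeline_Config**: the pipeline configuration, covering worker count, queue size, batch write threshold, rate-limit intervals, and similar parameters; each task can be configured independently
+- **BaseOdsTask**: the current ODS task base class (`tasks/ods/ods_tasks.py`), which encapsulates core logic such as time window resolution, paginated API fetching, schema-aware writes, and snapshot soft deletion
+- **TaskExecutor**: the task executor (`orchestration/task_executor.py`), which encapsulates the execution lifecycle of a single task (cursor management, run records, data source routing)
+- **FlowRunner**: the flow orchestrator (`orchestration/flow_runner.py`), which orders the execution of multi-layer tasks (ODS → DWD → DWS → INDEX)
+- **DWD_Loader**: the DWD-layer load task (`DwdLoadTask`), which performs SCD2 merges through `_merge_dim_scd2()` to turn raw ODS data into dimension and fact tables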
+
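+## Requirements
+
+### Requirement 1: Unified Request Scheduler
+
+**User story:** As an ETL operator, I want all API requests to be sent serially through a unified scheduler that follows the rate-limiting rules, so that we do not trigger upstream risk control and suffer IP bans or failed data fetches.
+
+#### Acceptance Criteria
+
+1. THE Request_Scheduler SHALL maintain a global request queue; the API requests of all ODS tasks enter this queue and wait to be sent
+2. THE Request_Scheduler SHALL send requests strictly serially: it waits until the HTTP response of the previous request has been fully received, then waits for the rate-limit interval, and then sends the next request in the queue
+3. THE Rate_Limiter SHALL insert a random wait of between 5 and 20 seconds (uniformly distributed) between every two adjacent requests
+4. THE Rate_Limiter SHALL support adjusting the minimum interval (default 5 seconds) and maximum interval (default 20 seconds) through Pipeline_Config; different tasks may configure different interval ranges
+5. IF the minimum interval is greater than the maximum interval when the Rate_Limiter is initialized, THEN THE Rate_Limiter SHALL raise a `ValueError` with a descriptive error message
+6. WHEN a single FlowRunner execution contains multiple ODS tasks, THE Request_Scheduler SHALL process each task's requests in task registration order, with only one HTTP request in flight at any moment
+7. THE Request_Scheduler SHALL log the request duration, response status code, and target endpoint after each request completes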
+
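+### Requirement 2: Asynchronous Processing and Single-Threaded Database Writes
+
+**User story:** As an ETL operator, I want the processing of API response data (field extraction, cleaning, hash computation) to run in parallel with request sending while database writes stay serialized, so that overall throughput improves without adding pressure on the API.
+
+#### Acceptance Criteria
+
+1. THE Unified_Pipeline SHALL submit the response data to the Processing_Pool task queue immediately after each API response returns, without blocking the Request_Scheduler's rate-limit wait timer
+2. THE Processing_Pool SHALL support multiple worker threads that consume response data from the processing queue in parallel and perform field extraction, data cleaning, content_hash computation, record-layer merging, and similar operations
+3. THE number of Processing_Pool worker threads SHALL be configured through Pipeline_Config (default 2); different tasks may be configured independently
+4. THE Write_Worker SHALL run as a separate thread that consumes data from the processed-results queue and performs all database INSERT/UPSERT operations
+5. THE Write_Worker SHALL support batched writes: it performs one batch write once the configured threshold (default 100 records) has accumulated or the wait timeout (default 5 seconds) has elapsed
+6. THE Write_Worker batch write threshold and wait timeout SHALL be configured through Pipeline_Config; different tasks may be configured independently
+7. WHEN all requests have been sent, THE Unified_Pipeline SHALL wait for the Processing_Pool and the Write_Worker to finish completely before returning the final result
+8. THE Unified_Pipeline SHALL guarantee the safety of multi-threaded database reads: worker threads in the Processing_Pool may read the database in parallel (for example, to query the latest content_hash), using separate read-only database connections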
+
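+### Requirement 3: External Cancellation Signal Support
+
+**User story:** As an ETL operator, I want to be able to send a cancellation signal from Admin-web or the CLI to interrupt a running ODS task, so that long-running tasks can always be stopped.
+
+#### Acceptance Criteria
+
+1. THE CancellationToken SHALL provide a thread-safe `cancel()` method and `is_cancelled` property for external components to trigger cancellation
+2. WHEN the CancellationToken is triggered, THE Request_Scheduler SHALL interrupt during the current request's rate-limit wait or response wait and send no further requests
+3. WHEN the CancellationToken is triggered, THE Processing_Pool SHALL finish all data processing tasks already submitted to the queue and not discard enqueued data
+4. WHEN the CancellationToken is triggered, THE Write_Worker SHALL write all processed data to the database before exiting, guaranteeing that processed data is persisted
+5. WHEN a task is interrupted by the cancellation signal, THE Unified_Pipeline SHALL return partial-completion statistics (number of completed requests, processed records, and written records), with the task status marked `CANCELLED`
+6. THE CancellationToken SHALL support automatic cancellation on timeout: a maximum execution time (seconds) can be specified at creation, and the cancellation signal is triggered automatically when it elapses
+7. IF the CancellationToken is already cancelled before the task starts, THEN THE Unified_Pipeline SHALL return an empty result immediately without sending any request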
+
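+### Requirement 4: Pipeline configuration system
+
+**User story:** As an ETL operator, I want the threading-model parameters (worker count, queue size, batch write threshold, rate-limit interval) to be flexible enough that different endpoints can use different settings, to suit the characteristics of each API.
+
+#### Acceptance criteria
+
+1. THE Pipeline_Config SHALL support the following configurable parameters: Processing_Pool worker thread count (`workers`, default 2), processing queue capacity (`queue_size`, default 100), Write_Worker batch write threshold (`batch_size`, default 100), Write_Worker wait timeout in seconds (`batch_timeout`, default 5.0), Rate_Limiter minimum interval in seconds (`rate_min`, default 5.0), and Rate_Limiter maximum interval in seconds (`rate_max`, default 20.0)
+2. THE Pipeline_Config SHALL follow the existing layered configuration system (root `.env` < `.env.local` < environment variables < CLI arguments) and be read through the `pipeline.*` namespace of `AppConfig`
+3. THE Pipeline_Config SHALL support task-level overrides: the `pipeline.<task_code>.*` namespace specifies settings for a particular task, falling back to the global `pipeline.*` defaults when unspecified
+4. IF `workers` or `queue_size` in Pipeline_Config is less than 1, THEN THE Unified_Pipeline SHALL raise `ValueError` with a descriptive error message
+5. IF `batch_size` in Pipeline_Config is less than 1, THEN THE Unified_Pipeline SHALL raise `ValueError` with a descriptive error message
+6. THE Pipeline_Config SHALL support overriding the global defaults at run time through the CLI arguments `--pipeline-workers`, `--pipeline-batch-size`, `--pipeline-rate-min`, and `--pipeline-rate-max`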
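
The fallback order (task-level `pipeline.<task_code>.*`, then global `pipeline.*`, then built-in defaults) and the validation rules can be sketched as follows. This sketch assumes configuration arrives as a flat dict with dotted keys; how `AppConfig` actually exposes values is not shown in this document, so `from_mapping` and the task code `ODS_MEMBER` are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PipelineConfig:
    workers: int = 2
    queue_size: int = 100
    batch_size: int = 100
    batch_timeout: float = 5.0
    rate_min: float = 5.0
    rate_max: float = 20.0

    def __post_init__(self):
        # Reject values that would leave the pipeline without threads or queue slots.
        for name in ("workers", "queue_size", "batch_size"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1, got {getattr(self, name)}")
        if self.rate_min > self.rate_max:
            raise ValueError("rate_min must be <= rate_max")

    @classmethod
    def from_mapping(cls, config, task_code=None):
        """Resolve each field: task-level key, then global key, then default."""
        values = {}
        for f in fields(cls):
            task_key = f"pipeline.{task_code}.{f.name}"
            global_key = f"pipeline.{f.name}"
            if task_code is not None and task_key in config:
                values[f.name] = config[task_key]
            elif global_key in config:
                values[f.name] = config[global_key]
        return cls(**values)


cfg = {"pipeline.workers": 4, "pipeline.ODS_MEMBER.workers": 1}
print(PipelineConfig.from_mapping(cfg, "ODS_MEMBER").workers)  # task-level value
print(PipelineConfig.from_mapping(cfg, "OTHER_TASK").workers)  # global value
```

Validating in `__post_init__` means an invalid value fails at construction time regardless of which layer supplied it.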
+
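+### Requirement 5: Migration of existing ODS tasks
+
+**User story:** As an ETL developer, I want all 21 existing ODS tasks migrated to the unified pipeline framework with fully equivalent functionality and no loss of existing capabilities.
+
+#### Acceptance criteria
+
+1. THE Unified_Pipeline SHALL fully preserve all existing BaseOdsTask functionality: time-window resolution (`_resolve_window`), window segmentation (`build_window_segments`), paginated API fetching (`iter_paginated`), schema-aware writes (`_insert_records_schema_aware`), snapshot soft deletion (`_mark_missing_as_deleted`), and content_hash deduplication (`skip_unchanged`)
+2. THE Unified_Pipeline SHALL retain every existing field definition of the OdsTaskSpec data class; migrated tasks are configured through the same `OdsTaskSpec` instances
+3. WHEN the migration is complete, THE ETL_System SHALL, for every ODS task run on the same input data, produce exactly the same database writes as before the migration (the same inserted/updated/skipped counts and the same record contents)
+4. THE Unified_Pipeline SHALL preserve the existing `endpoint_routing` logic (recent/former route splitting); request routing after migration behaves the same as in the existing system
+5. THE Unified_Pipeline SHALL preserve the existing metadata write logic for `source_file`, `source_endpoint`, `fetched_at`, and similar fields
+6. THE Unified_Pipeline SHALL remain compatible with the existing `TaskExecutor` execution lifecycle (cursor management, run records, data-source routing); after migration TaskExecutor requires no change in how it invokes tasks
+7. WHEN the migration is complete, THE registration code and metadata of all 21 ODS tasks in the TaskRegistry SHALL remain unchanged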
+
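+### Requirement 6: Two-level detail fetch mode
+
+**User story:** As an ETL developer, I want the unified pipeline framework to support a two-level fetch mode ("fetch the full list, then call the detail endpoint item by item"), so that businesses needing a second request, such as group-buy details, can be implemented within the framework.
+
+#### Acceptance criteria
+
+1. THE Unified_Pipeline SHALL support an optional Detail_Mode: after all pages of the list endpoint have been fetched and written to ODS, it extracts the ID list from the written records and calls the detail endpoint for each ID
+2. THE OdsTaskSpec SHALL add optional fields for Detail_Mode configuration: the detail endpoint, a builder function for detail request parameters, the target table name for detail data, and the data_path and list_key of the detail data
+3. WHEN the OdsTaskSpec has no Detail_Mode fields configured, THE Unified_Pipeline SHALL skip the detail fetch phase and behave exactly as in list-only mode
+4. THE Detail_Mode detail requests SHALL be queued through the Request_Scheduler and follow the same rate-limit rules as list requests
+5. IF the detail endpoint returns an error or times out for an ID, THEN THE Unified_Pipeline SHALL log the error (including the ID and error message) and continue with the next ID without interrupting the overall flow
+6. WHEN detail fetching completes, THE Unified_Pipeline SHALL include detail-fetch statistics in the task result (detail successes, detail failures, detail skips), recorded separately from the list-fetch statistics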
+
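+### Requirement 7: DWD-layer multi-threading optimization
+
+**User story:** As an ETL operator, I want DWD-layer load tasks to process the SCD2 merges of multiple tables in parallel across threads, shortening the total DWD execution time.
+
+#### Acceptance criteria
+
+1. THE DWD_Loader SHALL support running the SCD2 merges of multiple DWD tables in parallel, with each table's merge running in its own thread
+2. THE number of DWD_Loader parallel threads SHALL be controlled by a configuration parameter (default 4) read from `dwd.parallel_workers` in `AppConfig`
+3. THE DWD_Loader SHALL ensure each table's SCD2 merge runs in its own database transaction, so that a failure in one table does not affect the others
+4. WHEN the SCD2 merge of a DWD table fails, THE DWD_Loader SHALL log the error (including the table name and error message), mark that table as failed, and continue with the remaining tables
+5. THE DWD_Loader SHALL return a summary after all tables are processed: number of successful tables, number of failed tables, and per-table inserted/updated counts
+6. THE existing `_merge_dim_scd2()` method of DWD_Loader SHALL remain unchanged; the multi-threading optimization only invokes it in parallel at the scheduling layer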
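
The scheduling-layer parallelism described above could look roughly like the sketch below. It assumes a `merge_fn(dwd_table, ods_table)` callable that opens its own connection and transaction and returns per-table counts; `load_tables_parallel`, the summary dict shape, and the table names in the usage lines are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_tables_parallel(table_map, merge_fn, parallel_workers=4):
    """Run merge_fn for every table pair in a thread pool and summarize results."""
    summary = {"succeeded": 0, "failed": 0, "tables": {}, "errors": {}}
    with ThreadPoolExecutor(max_workers=parallel_workers) as pool:
        futures = {
            pool.submit(merge_fn, dwd, ods): dwd for dwd, ods in table_map.items()
        }
        for future in as_completed(futures):
            table = futures[future]
            try:
                # result() re-raises whatever merge_fn raised in its worker thread.
                summary["tables"][table] = future.result()
                summary["succeeded"] += 1
            except Exception as exc:
                # One failed table is recorded; the other futures keep running.
                summary["errors"][table] = str(exc)
                summary["failed"] += 1
    return summary


def fake_merge(dwd_table, ods_table):
    # Stand-in for the real per-table SCD2 merge.
    if dwd_table == "dim_bad":
        raise RuntimeError("boom")
    return {"inserted": 1, "updated": 0}


result = load_tables_parallel({"dim_a": "ods_a", "dim_bad": "ods_b"}, fake_merge, 2)
print(result["succeeded"], result["failed"])
```

The summary is only mutated in the thread that iterates `as_completed`, so it needs no lock.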
+
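+### Requirement 8: Observability and monitoring
+
+**User story:** As an ETL operator, I want the unified pipeline framework to provide sufficient runtime observability for monitoring execution state and troubleshooting.
+
+#### Acceptance criteria
+
+1. THE Unified_Pipeline SHALL log the following key metrics during task execution: current request queue depth, number of active Processing_Pool threads, Write_Worker pending-write queue depth, and completed requests / total requests
+2. THE Unified_Pipeline SHALL emit an execution summary when the task finishes: total elapsed time, time spent in the request, processing, and write phases, and record counts for each phase
+3. WHEN the Processing_Pool task queue reaches its capacity limit, THE Unified_Pipeline SHALL log a warning, and the Request_Scheduler pauses sending new requests until the queue has free slots (backpressure)
+4. WHEN the Write_Worker pending-write backlog exceeds `queue_size * 2`, THE Unified_Pipeline SHALL log a warning
+5. THE Unified_Pipeline SHALL integrate with the existing `EtlTimer` so that FlowRunner's timing report shows the request/processing/write phase durations of each ODS task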
+
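+### Requirement 9: Error handling and fault tolerance
+
+**User story:** As an ETL operator, I want the unified pipeline framework to have robust error handling, so that the failure of a single request or record does not affect the overall task.
+
+#### Acceptance criteria
+
+1. IF a single API request fails (HTTP error, timeout, or an error code returned by the API), THEN THE Request_Scheduler SHALL retry according to the existing `APIClient` retry policy (at most 3 times, exponential backoff); once retries are exhausted it logs the error and continues with the next request
+2. IF processing a record in the Processing_Pool raises an exception, THEN THE Processing_Pool SHALL log the error (including the record identifier and exception details), mark the record as failed, and continue with the other records in the queue
+3. IF a database error occurs while the Write_Worker performs a batch write, THEN THE Write_Worker SHALL roll back the current batch's transaction, log the error (including the batch size and error message), and mark the records in that batch as write failures
+4. WHEN the task finishes, THE Unified_Pipeline SHALL aggregate all errors in the execution result: request failures, processing failures, write failures, and an error summary for each failed item
+5. IF the number of consecutive failures during execution exceeds the configured threshold (default 10), THEN THE Unified_Pipeline SHALL abort the task and mark its status as `FAILED`, avoiding time wasted on futile retries
+6. THE Unified_Pipeline SHALL preserve BaseOdsTask's existing transaction semantics: the data of each window segment is committed together once the whole segment has been processed, and a failed segment is rolled back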
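
Criteria 1 and 5 interact: isolated failures are skipped, but a long run of consecutive failures aborts the task. A minimal sketch of that counter is below, assuming `send_fn` raises only after its own retries are exhausted. The function name and the result dict shape are illustrative, and the `PARTIAL` status is borrowed from the status list in the implementation plan.

```python
def run_requests(requests, send_fn, max_consecutive_failures=10):
    """Send requests one by one; abort once failures in a row exceed the threshold."""
    consecutive = 0
    succeeded = 0
    errors = []
    for request in requests:
        try:
            send_fn(request)
        except Exception as exc:
            errors.append((request, str(exc)))
            consecutive += 1
            if consecutive > max_consecutive_failures:
                return {"status": "FAILED", "succeeded": succeeded, "errors": errors}
            continue  # a single failure does not stop the remaining requests
        consecutive = 0  # any success resets the streak
        succeeded += 1
    status = "SUCCESS" if not errors else "PARTIAL"
    return {"status": status, "succeeded": succeeded, "errors": errors}
```

Resetting the counter on every success is what distinguishes a flaky endpoint, which finishes as partial, from an outage, which aborts early.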
+
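+### Requirement 10: Admin-web log output optimization
+
+**User story:** As an ETL operator, I want ETL execution logs in the Admin-web console to be grouped by task and shown in order, so that log lines from tasks running in parallel do not interleave and become hard to read and debug.
+
+#### Acceptance criteria
+
+1. THE ETL_System SHALL prefix each ODS task's execution log lines with a task identifier (the task code), so that lines can be attributed to their task
+2. THE Admin-web SHALL support filtering and grouping ETL execution logs by task code; users may view a single task's log or all logs
+3. THE Unified_Pipeline SHALL guarantee atomic log writes in a multi-threaded environment: each log message is emitted as one complete line, never truncated or interleaved by another thread's output
+4. THE ETL_System SHALL maintain a separate log buffer per task and, when the task finishes, emit that task's full log to Admin-web in chronological order in one pass, so that lines from different tasks do not interleave during execution
+5. THE Admin-web SHALL show logs in per-task sections on the ETL execution result page: each task's log is collapsed into its own block that, when expanded, shows the task's full execution log (timestamp, log level, message)
+6. WHEN multiple ODS tasks run in the same FlowRunner execution, THE Admin-web SHALL show a task timeline overview at the top (each task's start time, end time, and status), and users can click through to the corresponding task's log block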
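
The per-task buffering in criteria 1, 3 and 4 can be sketched with a lock-protected list that is flushed to a parent logger in one pass. The constructor signature, the integer return value of `flush()`, and the logger name in the usage lines are assumptions for illustration only.

```python
import logging
import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    timestamp: float
    level: int
    task_code: str
    message: str


class TaskLogBuffer:
    """Collects one task's log entries in memory and emits them together."""

    def __init__(self, task_code, parent_logger):
        self._task_code = task_code
        self._parent = parent_logger
        self._lock = threading.Lock()
        self._entries = []

    def log(self, level, message):
        entry = LogEntry(time.time(), level, self._task_code, message)
        with self._lock:  # worker threads may log concurrently
            self._entries.append(entry)

    def flush(self):
        with self._lock:
            entries, self._entries = self._entries, []
        # list.sort is stable, so entries with equal timestamps keep arrival order.
        entries.sort(key=lambda e: e.timestamp)
        for e in entries:
            self._parent.log(e.level, "[%s] %s", e.task_code, e.message)
        return len(entries)


buffer = TaskLogBuffer("ODS_MEMBER", logging.getLogger("etl.sketch"))
buffer.log(logging.INFO, "segment 1 done")
buffer.flush()
```

Deferring output until `flush()` trades live visibility for readability: nothing appears while the task runs, but each task's lines arrive as one contiguous block.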
+
--- a/.kiro/specs/etl-unified-pipeline/tasks.md
+++ b/.kiro/specs/etl-unified-pipeline/tasks.md
@@ -0,0 +1,229 @@
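+# Implementation plan: ETL unified request orchestration and threading-model refactor
+
+## Overview
+
+Migrate the ODS tasks of the Feiqiu Connector ETL system from synchronous serial execution to a unified pipeline architecture of "serial requests + asynchronous processing + single-threaded database writes". Implementation proceeds in component-dependency order: foundation components → core engine → task migration → DWD optimization → log optimization.
+
+## Tasks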
+
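+- [x] 1. Implement foundation components (PipelineConfig, CancellationToken, RateLimiter)
+  - [x] 1.1 Create `apps/etl/connectors/feiqiu/config/pipeline_config.py` and implement the `PipelineConfig` data class
+    - Define the fields `workers`, `queue_size`, `batch_size`, `batch_timeout`, `rate_min`, `rate_max`, and `max_consecutive_failures` with their defaults
+    - Implement `__post_init__` parameter validation (workers>=1, queue_size>=1, batch_size>=1, rate_min<=rate_max)
+    - Implement the `from_app_config(config, task_code)` class method with three-level fallback: task-level `pipeline.<task_code>.*` override → global `pipeline.*` → defaults
+    - _Requirements: 1.3, 1.4, 1.5, 2.3, 2.5, 2.6, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6_
+
+  - [x] 1.2 Write PipelineConfig property tests
+    - **Property 3: PipelineConfig construction and validation**: generate random combinations of configuration parameters (including invalid values); verify that valid parameters construct successfully and invalid ones raise ValueError
+    - **Property 4: Configuration layering and task-level override**: generate random multi-layer configuration dicts; verify the priority task level > global level > defaults
+    - Test file: `tests/test_pipeline_config_properties.py`
+    - **Validates: Requirements 1.4, 1.5, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6**
+
+  - [x] 1.3 Create `apps/etl/connectors/feiqiu/utils/cancellation.py` and implement the `CancellationToken` class
+    - Implement the thread-safe `cancel()` method and `is_cancelled` property on top of `threading.Event`
+    - Implement automatic cancellation on timeout (a `timeout` in seconds is passed at construction and fired through `threading.Timer`)
+    - Implement `dispose()` to clean up the timer
+    - _Requirements: 3.1, 3.6_
+
+  - [x] 1.4 Write CancellationToken property tests
+    - **Property 7: CancellationToken state transitions**: generate random timeout values; verify the initial state is False, the state is True and irreversible after cancel(), and the timeout fires automatically
+    - Test file: `tests/test_cancellation_properties.py`
+    - **Validates: Requirements 3.1, 3.6**
+
+  - [x] 1.5 Create `apps/etl/connectors/feiqiu/api/rate_limiter.py` and implement the `RateLimiter` class
+    - Validate `min_interval <= max_interval` at construction; otherwise raise `ValueError`
+    - Implement the `wait(cancel_event)` method: draw a random interval uniformly from `[min, max]` and split it into 0.5s slices, polling cancel_event in each slice
+    - Expose the `last_interval` property
+    - _Requirements: 1.3, 1.5_
+
+  - [x] 1.6 Write RateLimiter property tests
+    - **Property 2: RateLimiter interval range**: generate random (min, max) pairs; verify the actual wait time of wait() falls within [min, max] ± 0.5s
+    - Test file: `tests/test_rate_limiter_properties.py`
+    - **Validates: Requirement 1.3**
+
+  - [x] 1.7 Write foundation component unit tests
+    - Test files: `apps/etl/connectors/feiqiu/tests/unit/test_pipeline_config.py`, `tests/unit/test_cancellation.py`, `tests/unit/test_rate_limiter.py`
+    - Cover boundary conditions: RateLimiter min=max, CancellationToken pre-cancelled/timeout=0/dispose, PipelineConfig invalid parameters/CLI override
+    - _Requirements: 1.3, 1.5, 3.1, 3.6, 4.1, 4.4, 4.5_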
+
+- [x] 2. Checkpoint: foundation component verification
+  - Ensure all tests pass; ask the user if questions arise.
+
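+- [x] 3. Implement the core pipeline engine (UnifiedPipeline)
+  - [x] 3.1 Create the data classes `PipelineRequest`, `PipelineResult`, and `WriteResult`
+    - File: `apps/etl/connectors/feiqiu/pipeline/models.py`
+    - `PipelineRequest`: endpoint, params, page_size, data_path, list_key, segment_index, is_detail, detail_id
+    - `PipelineResult`: status, per-phase counts, errors list, timing dict, Detail_Mode statistics
+    - `WriteResult`: inserted, updated, skipped, errors
+    - _Requirements: 2.7, 6.6, 8.2, 9.4_
+
+  - [x] 3.2 Create `apps/etl/connectors/feiqiu/pipeline/unified_pipeline.py` and implement the `UnifiedPipeline` core engine
+    - Implement `__init__`: accept api_client, db_connection, logger, PipelineConfig, and CancellationToken, and initialize the RateLimiter
+    - Implement the main method `run(requests, process_fn, write_fn) -> PipelineResult`:
+      - Pre-cancellation check (return an empty result immediately if cancel_token is already cancelled)
+      - Create processing_queue (maxsize=queue_size) and write_queue (maxsize=queue_size*2)
+      - Start N worker threads (`_process_worker`) and 1 writer thread (`_write_worker`)
+      - Run `_request_loop` on the main thread: send requests serially, wait for rate limiting, check for cancellation, block on backpressure
+      - Send SENTINEL to tell the threads to exit, then join to wait for completion
+      - Compute the final status (SUCCESS/PARTIAL/CANCELLED/FAILED)
+    - _Requirements: 1.1, 1.2, 1.6, 2.1, 2.2, 2.4, 2.7, 2.8, 3.2, 3.7, 8.3_
+
+  - [x] 3.3 Implement the `_request_loop` request scheduling logic
+    - Iterate over the requests iterator, sending API requests one at a time
+    - After each request completes, log its elapsed time, status code, and endpoint
+    - Put the response data on processing_queue (blocking when full = backpressure)
+    - Call rate_limiter.wait(cancel_event) between requests; break if cancelled
+    - Track consecutive failures: reset to 0 on success, +1 on failure, abort once the count exceeds max_consecutive_failures
+    - _Requirements: 1.2, 1.7, 3.2, 8.1, 8.3, 9.1, 9.5_
+
+  - [x] 3.4 Implement the `_process_worker` processing thread logic
+    - Consume data from processing_queue and process it with process_fn
+    - Put the results on write_queue
+    - When processing a single record raises an exception: catch it, log the error, mark the record as failed, and continue
+    - Exit on receiving SENTINEL
+    - _Requirements: 2.1, 2.2, 9.2_
+
+  - [x] 3.5 Implement the `_write_worker` write thread logic
+    - Consume data from write_queue; once batch_size records accumulate or batch_timeout elapses, call write_fn for a batch write
+    - On write failure: roll back the current batch's transaction, log the error, mark the batch as failed, and continue with subsequent batches
+    - Log a warning when the queue backlog exceeds queue_size*2
+    - On receiving SENTINEL, flush the remaining data and then exit
+    - _Requirements: 2.4, 2.5, 2.6, 8.4, 9.3, 9.6_
+
+  - [x] 3.6 Write UnifiedPipeline property tests
+    - **Property 1: Requests are strictly serial**: record timestamps with FakeAPI; verify each request's send time > the previous response's completion time
+    - **Property 5: Pipeline completion semantics**: generate random record sets; verify total_fetched == total_inserted + total_updated + total_skipped
+    - **Property 6: WriteWorker batch size constraint**: generate random batch_size values and record streams; verify the record count of each write_fn call <= batch_size
+    - **Property 8: Enqueued data is not lost after cancellation**: generate random request sequences and random cancellation points; verify all enqueued data is processed and written
+    - **Property 11: A single failure does not interrupt the whole**: generate request sequences with random failures; verify subsequent items continue to be processed
+    - **Property 12: Consecutive failures trigger an abort**: generate consecutive failure sequences and random thresholds; verify the task aborts once the threshold is exceeded
+    - **Property 13: A write failure rolls back the current batch**: generate batches with random write failures; verify the rollback and that subsequent batches are unaffected
+    - **Property 14: Result statistics integrity**: verify the consistency relations among the count fields
+    - Test file: `tests/test_pipeline_properties.py`
+    - **Validates: Requirements 1.2, 1.6, 2.5, 2.7, 3.2, 3.3, 3.4, 3.5, 6.6, 8.2, 9.1, 9.2, 9.3, 9.4, 9.5**
+
+  - [x] 3.7 Write UnifiedPipeline unit tests
+    - Test file: `apps/etl/connectors/feiqiu/tests/unit/test_unified_pipeline.py`
+    - End-to-end tests with FakeAPI + FakeDB: normal flow, empty requests, pre-cancellation, backpressure triggered
+    - _Requirements: 2.7, 3.7, 8.1, 8.3_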
+
+- [x] 4. Checkpoint: core engine verification
+  - Ensure all tests pass; ask the user if questions arise.
+
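+- [x] 5. BaseOdsTask refactor and ODS task migration
+  - [x] 5.1 Extend the `OdsTaskSpec` data class with optional Detail_Mode fields
+    - In `apps/etl/connectors/feiqiu/tasks/ods/ods_tasks.py`, add to `OdsTaskSpec`: `detail_endpoint`, `detail_param_builder`, `detail_target_table`, `detail_data_path`, `detail_list_key`, `detail_id_column`
+    - All new fields default to `None` and do not affect the OdsTaskSpec instances of the 21 existing tasks
+    - _Requirements: 6.2, 6.3_
+
+  - [x] 5.2 Refactor the `BaseOdsTask.execute()` method to embed UnifiedPipeline
+    - Build `PipelineConfig.from_app_config(self.config, spec.code)` inside `execute()`
+    - Wrap the existing pagination request logic as `_build_requests()` → `Iterable[PipelineRequest]`
+    - Wrap the existing field extraction/hash computation as `_build_process_fn()` → `Callable`
+    - Wrap the existing `_insert_records_schema_aware` as `_build_write_fn()` → `Callable`
+    - Call `pipeline.run(requests, process_fn, write_fn)` in place of the existing synchronous loop
+    - Preserve snapshot soft deletion (`_mark_missing_as_deleted`), endpoint_routing, and metadata writes (source_file, source_endpoint, fetched_at)
+    - Keep the TaskExecutor call interface unchanged (the `task.execute(cursor_data)` signature stays the same)
+    - _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7_
+
+  - [x] 5.3 Implement the Detail_Mode detail fetch logic
+    - Implement the `_build_detail_requests()` method in `BaseOdsTask`: extract the ID list from records already written to ODS and generate a sequence of `PipelineRequest(is_detail=True)`
+    - Detail requests are queued through the RequestScheduler of the same UnifiedPipeline and follow the same rate-limit rules
+    - When a single detail request fails, log the error (including the ID and error message) and continue with the next one
+    - Populate the detail_success/detail_failure/detail_skipped statistics in PipelineResult
+    - _Requirements: 6.1, 6.4, 6.5, 6.6_
+
+  - [x] 5.4 Write Detail_Mode property tests
+    - **Property 10: Detail_Mode is optional**: generate OdsTaskSpec instances with and without detail_endpoint; verify that the detail phase is skipped when unconfigured, and that detail fetching runs and obeys rate limiting when configured
+    - Test file: `tests/test_detail_mode_properties.py`
+    - **Validates: Requirements 6.1, 6.3, 6.4**
+
+  - [x] 5.5 Write migration-equivalence property tests
+    - **Property 9: Output equivalence before and after migration**: generate random API response sequences and compare the database writes of UnifiedPipeline with those of the original synchronous serial implementation (inserted/updated/skipped counts and record contents)
+    - Test file: `tests/test_migration_properties.py`
+    - **Validates: Requirements 5.1, 5.3, 5.4, 5.5**
+
+  - [x] 5.6 Write Detail_Mode and migration unit tests
+    - Test file: `apps/etl/connectors/feiqiu/tests/unit/test_detail_mode.py`
+    - Coverage: full list→detail flow, skip when detail_endpoint is absent, a single detail failure does not interrupt the flow
+    - _Requirements: 6.1, 6.3, 6.5_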
+
+- [x] 6. Checkpoint: ODS migration verification
+  - Ensure all tests pass; ask the user if questions arise.
+
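+- [x] 7. DWD-layer multi-threading optimization
+  - [x] 7.1 Refactor the `DwdLoadTask.load()` method in `apps/etl/connectors/feiqiu/tasks/dwd/dwd_load_task.py`
+    - Read `dwd.parallel_workers` from `AppConfig` (default 4)
+    - Replace the existing serial `for dwd_table, ods_table in TABLE_MAP` loop with parallel scheduling via `concurrent.futures.ThreadPoolExecutor`
+    - Each table calls `_process_single_table()` in its own thread, using its own database connection and transaction
+    - The `_merge_dim_scd2()` method itself is not changed
+    - When a single table fails: catch the exception, log the error (including the table name and error message), mark the table as failed, and continue with the other tables
+    - After all tables are processed, return a summary: number of successful tables, number of failed tables, and per-table inserted/updated counts
+    - _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6_
+
+  - [x] 7.2 Write DWD parallel property tests
+    - **Property 16: DWD parallel and serial results match**: generate random table sets + mock SCD2; verify that multi-threaded parallel execution yields exactly the same results as serial table-by-table execution
+    - Test file: `tests/test_dwd_parallel_properties.py`
+    - **Validates: Requirement 7.1**
+
+  - [x] 7.3 Write DWD multi-threading unit tests
+    - Test file: `apps/etl/connectors/feiqiu/tests/unit/test_dwd_parallel.py`
+    - Coverage: normal parallel run with mock SCD2, a single-table failure does not affect other tables, the summary is correct
+    - _Requirements: 7.3, 7.4, 7.5_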
+
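+- [x] 8. Observability and log optimization
+  - [x] 8.1 Integrate runtime metric logging into UnifiedPipeline
+    - In `_request_loop`, periodically log: current request queue depth, number of active ProcessingPool threads, WriteWorker pending-write queue depth, and completed requests / total requests
+    - Before `run()` returns, compute and log the execution summary: total elapsed time, request/processing/write phase durations, and record counts for each phase
+    - Integrate with the existing `EtlTimer` so that FlowRunner's timing report shows the phase durations of each ODS task
+    - _Requirements: 8.1, 8.2, 8.5_
+
+  - [x] 8.2 Create `apps/etl/connectors/feiqiu/utils/task_log_buffer.py` and implement the `TaskLogBuffer` class
+    - Implement a thread-safe `log(level, message)` method that buffers log entries in an in-memory list
+    - Implement the `flush()` method: sort by timestamp in ascending order and emit everything to the parent logger in one pass, adding a `[task_code]` prefix
+    - Define the `LogEntry` data class (timestamp, level, task_code, message)
+    - _Requirements: 10.1, 10.3, 10.4_
+
+  - [x] 8.3 Write log buffer property tests
+    - **Property 15: Log buffers are isolated per task**: generate random multi-task log streams; verify that each TaskLogBuffer's flush() contains only that task's logs, in ascending timestamp order
+    - Test file: `tests/test_log_buffer_properties.py`
+    - **Validates: Requirements 10.1, 10.4**
+
+  - [x] 8.4 Write TaskLogBuffer unit tests
+    - Test file: `apps/etl/connectors/feiqiu/tests/unit/test_task_log_buffer.py`
+    - Coverage: flush of an empty buffer, concurrent multi-threaded writes, log prefix format
+    - _Requirements: 10.1, 10.3, 10.4_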
+
+- [x] 9. Checkpoint: DWD optimization and log verification
+  - Ensure all tests pass; ask the user if questions arise.
+
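+- [x] 10. Admin-web log display optimization
+  - [x] 10.1 Integrate TaskLogBuffer into BaseOdsTask and FlowRunner in `apps/etl/connectors/feiqiu/`
+    - Create a TaskLogBuffer instance in BaseOdsTask.execute(), replacing direct logger calls
+    - In FlowRunner, assign each task its own TaskLogBuffer and call flush() when the task finishes
+    - Guarantee atomic log writes in a multi-threaded environment (each log message is one complete line)
+    - _Requirements: 10.1, 10.3, 10.4_
+
+  - [x] 10.2 Implement task-grouped log display in `apps/admin-web/`
+    - Show logs in per-task sections on the ETL execution result page: each task is collapsed into its own block
+    - When expanded, a block shows the task's full execution log (timestamp, log level, message)
+    - Support filtering and grouping by task code
+    - Show a task timeline overview at the top (each task's start/end time and status), with click-through navigation
+    - _Requirements: 10.2, 10.5, 10.6_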
+
+- [x] 11. CLI argument extensions
+  - [x] 11.1 Add pipeline-related CLI arguments in `apps/etl/connectors/feiqiu/cli/`
+    - Add the `--pipeline-workers`, `--pipeline-batch-size`, `--pipeline-rate-min`, and `--pipeline-rate-max` arguments
+    - Inject the CLI argument values into AppConfig so that they take effect in PipelineConfig.from_app_config()
+    - _Requirements: 4.6_
+
+- [x] 12. Final checkpoint: full verification
+  - Ensure all tests pass; ask the user if questions arise.
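
As an illustration of task 1.5 in the list above, the cancellable rate-limit wait (a uniform random interval split into 0.5s polling slices) might look like the sketch below. The `poll` parameter and the boolean return value of `wait()` are assumptions added for the example; the plan itself only specifies `wait(cancel_event)` and `last_interval`.

```python
import random
import threading
import time


class RateLimiter:
    """Sleeps a random interval in [min_interval, max_interval], cancellable."""

    def __init__(self, min_interval=5.0, max_interval=20.0, poll=0.5):
        if min_interval > max_interval:
            raise ValueError("min_interval must be <= max_interval")
        self._min = min_interval
        self._max = max_interval
        self._poll = poll
        self.last_interval = None

    def wait(self, cancel_event=None):
        """Return True if the full interval elapsed, False if cancelled early."""
        interval = random.uniform(self._min, self._max)
        self.last_interval = interval
        deadline = time.monotonic() + interval
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return True
            step = min(self._poll, remaining)
            if cancel_event is None:
                time.sleep(step)
            elif cancel_event.wait(step):
                # Event.wait returns True as soon as the event is set.
                return False
```

Waiting on the event rather than sleeping means a cancellation is noticed immediately, not just at the next slice boundary; the slice length only bounds how often the deadline is rechecked.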
+
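+## Notes
+
+- Subtasks marked `*` are optional and may be skipped to speed up MVP delivery
+- Each task references specific requirement numbers for traceability
+- Checkpoint tasks provide incremental verification of each phase's correctness
+- Property tests verify general correctness properties; unit tests verify concrete examples and boundary conditions
+- Property tests live in the monorepo-level `tests/` directory; unit tests live in `tests/unit/` inside the ETL module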