Commit before frontend/backend integration testing (20260223)
174
docs/audit/prompt_logs/prompt_log_20260220_121721.md
Normal file
@@ -0,0 +1,174 @@
- [P20260220-121721] 2026-02-20 12:17:21 +0800
- summary: CONTEXT TRANSFER: We are continuing a conversation that had gotten too long. Here is a summary: --- ## TASK 1: Full Prod…
- prompt:
```text
CONTEXT TRANSFER: We are continuing a conversation that had gotten too long. Here is a summary:

---
## TASK 1: Full Production Integration Test (api_full, verify_only, custom window)

- **STATUS**: done
- **DETAILS**: ETL run `896b9647` completed successfully. 22 ODS tasks: 21 SUCC, 1 FAIL (`ODS_ASSISTANT_LEDGER`, a known JSONB bug; fix already in code). `verify_only` mode only executes ODS fetch + verification and skips DWD/DWS/INDEX.
- **FILEPATHS**: `export/ETL-Connectors/feiqiu/REPORTS/etl_timing_20260220_091414.md`, `export/ETL-Connectors/feiqiu/REPORTS/consistency_report_20260220_091414.md`

## TASK 2: Fix consistency report issues (missing DWD fields + whitelist)

- **STATUS**: done
- **DETAILS**: `principal_change_amount` fix: Added FACT_MAPPINGS expression. `update_time` fix: Added `KNOWN_NO_SOURCE` whitelist in `consistency_checker.py`.
- **FILEPATHS**: `apps/etl/connectors/feiqiu/tasks/dwd/dwd_load_task.py`, `apps/etl/connectors/feiqiu/quality/consistency_checker.py`

## TASK 3: ODS_ASSISTANT_LEDGER JSONB fix

- **STATUS**: done (code fix applied, awaiting next ETL run to verify)
- **DETAILS**: `_mark_missing_as_deleted` in `ods_tasks.py` now detects ALL JSONB columns via `cols_info` udt_name and wraps dict/list values with `Json()`.
- **FILEPATHS**: `apps/etl/connectors/feiqiu/tasks/ods/ods_tasks.py`

## TASK 4-5: Explain ETL modes and data pipeline

- **STATUS**: done
- **DETAILS**: Explained increment_only vs increment_verify vs verify_only modes, and full API→ODS→DWD→DWS→INDEX pipeline.

## TASK 6: Remove `pipeline` parameter, rename to `flow` everywhere

- **STATUS**: done
- **DETAILS**: Complete removal of `pipeline` parameter across entire codebase. All tests passing:
  - ETL unit tests: 677 passed, 5 skipped, 0 failures
  - Monorepo property tests: 171 passed, 2 skipped, 0 failures
  - Backend tests: import errors due to missing fastapi module (environment issue, not code)
  - Three one-time rename scripts deleted
- **KEY DECISIONS**:
  - `--pipeline-flow` (data_source deprecated param) intentionally KEPT: separate concept
  - `"pipeline.fetch_root"` and `"pipeline.ingest_source_dir"` are AppConfig keys, NOT renamed
  - Clean break, no backward compatibility
- **FILEPATHS**: `apps/etl/connectors/feiqiu/orchestration/flow_runner.py`, `apps/etl/connectors/feiqiu/orchestration/scheduler.py`, `apps/etl/connectors/feiqiu/cli/main.py`, `apps/backend/app/schemas/tasks.py`, `apps/backend/app/services/cli_builder.py`, `apps/backend/app/routers/tasks.py`, `apps/admin-web/src/types/index.ts`, `apps/admin-web/src/pages/TaskConfig.tsx`, `apps/admin-web/src/pages/TaskManager.tsx`, `apps/admin-web/src/App.tsx`, `apps/admin-web/src/components/ScheduleTab.tsx`, plus ~15 test files

## TASK 7: New `full_window` processing mode

- **STATUS**: in-progress
- **DETAILS**: User wants a new `processing_mode="full_window"` that:
  - Uses the actual time bounds from returned JSON data as `[window_start, window_end]` instead of cursor-calculated window
  - No `_run_verification` needed (API data is source of truth, no cursor drift risk)
  - Processes ALL layers: ODS → DWD → DWS → INDEX
  - Supports all existing parameters (window_split, tasks, etc.)
  - ODS entry still uses `content_hash` dedup + idempotent upsert (unchanged)
  - DWD/DWS/INDEX processing unchanged

- **ARCHITECTURE ANALYSIS COMPLETED** (code read, not yet modified):
  - `FlowRunner.run()` dispatches by `processing_mode`: `verify_only` branch vs `else` (increment_only/increment_verify). `full_window` needs a new branch or falls into `else` with skip-verification behavior.
  - `BaseOdsTask._resolve_window(cursor_data)` calculates time window: first calls `_get_time_window(cursor_data)` (uses CLI override or cursor+overlap), then applies `_get_max_fetched_at()` fallback. For `full_window`, the `_get_max_fetched_at` fallback should be SKIPPED; use the base window directly.
  - `processing_mode` is NOT currently passed to ODS task layer. It only reaches `FlowRunner.run()`. To make ODS aware, need to write it into config via `build_cli_overrides`.
  - DWD tasks get `window_start/window_end` from `TaskContext` built in `BaseTask._build_context()` which calls `_get_time_window(cursor_data)`.
  - `TaskExecutor.run_tasks()` doesn't pass `processing_mode` to individual tasks.

- **DESIGN DECISION** (discussed but NOT yet implemented):
  - In `FlowRunner.run()`: `full_window` goes through the increment ETL path (same as `increment_only`), but skips verification afterward
  - In `ods_tasks.py._resolve_window()`: when `processing_mode == "full_window"`, skip the `_get_max_fetched_at` fallback and use the `_get_time_window` base result directly (CLI-specified or default 24h lookback)
  - Need to pass `processing_mode` into config so ODS tasks can read it: add `overrides["run"]["processing_mode"] = args.processing_mode` in `build_cli_overrides()`

- **NEXT STEPS** (6 files need changes):
  1. `apps/etl/connectors/feiqiu/cli/main.py`:
     - Add `"full_window"` to `PROCESSING_MODE_CHOICES` (line ~73)
     - Update help text for `--processing-mode`
     - Add `overrides["run"]["processing_mode"] = args.processing_mode` in `build_cli_overrides()` so ODS tasks can read it
  2. `apps/etl/connectors/feiqiu/orchestration/flow_runner.py`:
     - In `run()` method, `full_window` should go through the `else` (increment ETL) branch but NOT trigger verification. Currently `increment_verify` is the only mode that triggers verification in the else branch, so `full_window` already works correctly by falling through; it just needs to be added to the docstring.
  3. `apps/etl/connectors/feiqiu/tasks/ods/ods_tasks.py`:
     - In `_resolve_window()`: when `self.config.get("run.processing_mode") == "full_window"`, skip the `_get_max_fetched_at` fallback block and return `base_start, base_end, base_minutes` directly after the override check
  4. `apps/backend/app/services/cli_builder.py`:
     - Add `"full_window"` to `VALID_PROCESSING_MODES` set
  5. `apps/backend/app/routers/tasks.py`:
     - Add `ProcessingModeDefinition(id="full_window", name="全窗口处理", description="用 API 返回数据的实际时间范围处理全部层,无需校验")` to `PROCESSING_MODE_DEFINITIONS`
  6. `apps/admin-web/src/pages/TaskConfig.tsx`:
     - Add `{ value: "full_window", label: "全窗口", desc: "用 API 返回数据的时间范围处理所有层" }` to `FALLBACK_PROCESSING_MODES`
  7. Run tests: ETL unit tests + monorepo property tests

- **FILEPATHS**:
  - `apps/etl/connectors/feiqiu/cli/main.py`
  - `apps/etl/connectors/feiqiu/orchestration/flow_runner.py`
  - `apps/etl/connectors/feiqiu/tasks/ods/ods_tasks.py`
  - `apps/backend/app/services/cli_builder.py`
  - `apps/backend/app/routers/tasks.py`
  - `apps/admin-web/src/pages/TaskConfig.tsx`

## KEY ARCHITECTURE:

- Frontend: React + Vite + Ant Design at `apps/admin-web/`
- Backend: FastAPI at `apps/backend/`
- ETL: `apps/etl/connectors/feiqiu/`
- ODS PK = `(id, content_hash)`: snapshot mode, content change = new row
- DWD has no cursor; dim tables use SCD2, fact tables use `fetched_at` window upsert
- DWS has no cursor; delete-before-insert by date range
- Four DB connections: `mcp_pg_etl`, `mcp_pg_etl_test`, `mcp_pg_app`, `mcp_pg_app_test`
- `pipeline_flow` / `--pipeline-flow` is a SEPARATE concept (data_source deprecated param), NOT part of the pipeline→flow rename
- `"pipeline.fetch_root"` and `"pipeline.ingest_source_dir"` are AppConfig configuration keys, NOT part of the pipeline→flow rename
- `processing_mode` currently only reaches `FlowRunner.run()`, NOT individual task classes. To make ODS aware, must write to config.
- `BaseTask._get_time_window(cursor_data)` handles CLI override and cursor-based window calculation
- `BaseOdsTask._resolve_window(cursor_data)` adds `_get_max_fetched_at` fallback on top of `_get_time_window`
- `TaskContext(window_start, window_end)` is how DWD/DWS tasks receive their time window

## USER CORRECTIONS AND INSTRUCTIONS:

- All output text must be in simplified Chinese
- `fetched_at` is ETL ingestion timestamp, NOT business update time
- ODS content_hash change = NEW ROW (snapshot), not in-place update
- `pipeline` is NOT a legacy/deprecated alias for `flow`; the user wants it completely removed, not kept for compatibility
- `--pipeline-flow` is a separate deprecated param (for data_source), should NOT be renamed
- Audit reminder pending for high-risk path changes
- Python scripts preferred over PowerShell for complex operations
- `test_audit_doc_alignment.py` has a pre-existing deadline flaky test; ignore it (not related to tasks)
- `full_window` mode: uses returned JSON data time bounds, NOT CLI-specified window. No verification needed. All layers processed. All existing params supported.

## Files to read

- `apps/etl/connectors/feiqiu/cli/main.py` (CRITICAL: need to add full_window to PROCESSING_MODE_CHOICES and pass processing_mode to config in build_cli_overrides)
- `apps/etl/connectors/feiqiu/tasks/ods/ods_tasks.py` (CRITICAL: need to modify _resolve_window for full_window mode, lines 299-330)
- `apps/etl/connectors/feiqiu/orchestration/flow_runner.py` (need to verify full_window falls through correctly in run() method)
- `apps/backend/app/services/cli_builder.py` (add full_window to VALID_PROCESSING_MODES)
- `apps/backend/app/routers/tasks.py` (add full_window to PROCESSING_MODE_DEFINITIONS)
- `apps/admin-web/src/pages/TaskConfig.tsx` (add full_window to FALLBACK_PROCESSING_MODES)

USER QUERIES (most recent first):
1. Retry, continue
2. Output:
Command executed successfully with no output.

Exit Code: 0
3. Output:
Command executed successfully with no output.

Exit Code: 0
4. Retry, continue
5. Output:
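Command executed successfully with no output.

Exit Code: 0
6. Output:
Command executed successfully with no output.

Exit Code: 0
7. There is also a new mode, the full_window task. Did you forget it? It directly uses the upper and lower time bounds of the returned JSON data as [window_start, window_end]. This is not the time window explicitly specified via CLI/API (--window-start/--window-end), but the time bounds of the returned JSON. Use this window to call the API and load the data. (When the content of a record with the same id changes (different content_hash), INSERT ... ON CONFLICT (id, content_hash) does not hit a conflict, so a new row is created rather than updated in place. The ODS layer is in snapshot mode: every version of the same business entity is kept as a separate row.) This should stay unchanged. No verification is needed, because the data source is the full data set returned by the API, so there is no risk of omissions caused by cursor drift. Right, this mode processes all layers (API, ODS, DWD, DWS, INDEX) and supports all existing parameters.
8. Output: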
Hook execution failed with exit code 1.

Error output:
[AUDIT REMINDER] Pending audit detected (root-file, dir:admin-web, dir:backend, dir:db, db-schema-change). Run /audit (Manual: Run /audit hook) to sync docs & write audit artifacts. (rate limit: 15min)

Exit Code: 1
9. Output:
Command executed successfully with no output.

Exit Code: 0
10. Output:
Command executed successfully with no output.

Exit Code: 0
---

METADATA:
The previous conversation had 6 messages.

INSTRUCTIONS:
Continue working until the user query has been fully addressed. Do not ask for clarification - proceed with the work based on the context provided.
IMPORTANT: you need to read the files listed in the "Files to read" section
|
||||