Commit before frontend/backend joint debugging, 20260223
This commit is contained in:
docs/audit/prompt_logs/prompt_log_20260220_110911.md (new file, 251 lines)
@@ -0,0 +1,251 @@
- [P20260220-110911] 2026-02-20 11:09:11 +0800
- summary: CONTEXT TRANSFER: We are continuing a conversation that had gotten too long. Here is a summary: --- ## TASK 1: Full Prod…
- prompt:

CONTEXT TRANSFER: We are continuing a conversation that had gotten too long. Here is a summary:

---
## TASK 1: Full Production Integration Test (api_full, verify_only, custom window)

**STATUS**: done

**DETAILS**: ETL run `896b9647` completed successfully. 22 ODS tasks: 21 SUCC, 1 FAIL (`ODS_ASSISTANT_LEDGER` — known JSONB bug, fix already in code). `verify_only` mode only executes ODS fetch + verification, skips DWD/DWS/INDEX.

**FILEPATHS**:
- `export/ETL-Connectors/feiqiu/REPORTS/etl_timing_20260220_091414.md`
- `export/ETL-Connectors/feiqiu/REPORTS/consistency_report_20260220_091414.md`

---
## TASK 2: Fix consistency report issues (missing DWD fields + whitelist)

**STATUS**: done

**DETAILS**:
- `principal_change_amount` fix: Added FACT_MAPPINGS expression `COALESCE(CAST(principal_after AS numeric),0) - COALESCE(CAST(principal_before AS numeric),0)`
- `update_time` fix: Confirmed the upstream API has NO update_time field. Kept as NULL with a comment. Added a `KNOWN_NO_SOURCE` whitelist in `consistency_checker.py`
- Whitelist logic fully implemented: `check_ods_vs_dwd_mappings()` now checks the `KNOWN_NO_SOURCE` dict before marking fields as "missing", and marks whitelisted fields as "pass(已知无源字段)" ("pass, known field without source")
- All 735 unit tests passing

**FILEPATHS**:
- `apps/etl/connectors/feiqiu/tasks/dwd/dwd_load_task.py`
- `apps/etl/connectors/feiqiu/quality/consistency_checker.py`
- `apps/etl/connectors/feiqiu/tests/unit/test_cli_args.py`
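The whitelist step described above can be sketched roughly as follows. This is a minimal illustration only: the table/field names and the real signature of `check_ods_vs_dwd_mappings()` in `consistency_checker.py` are assumptions, not the actual code.

```python
# Hypothetical sketch of the KNOWN_NO_SOURCE whitelist check.
# Table and field names are illustrative, not the real mappings.
KNOWN_NO_SOURCE = {
    "dwd_fact_ledger": {"update_time"},  # upstream API has no update_time
}

def classify_field(table: str, field: str, has_ods_source: bool) -> str:
    """Return the consistency-report status for one DWD field."""
    if has_ods_source:
        return "pass"
    if field in KNOWN_NO_SOURCE.get(table, set()):
        # whitelisted: known field without an upstream source
        return "pass(已知无源字段)"
    return "missing"

print(classify_field("dwd_fact_ledger", "update_time", False))
print(classify_field("dwd_fact_ledger", "mystery_field", False))
```

The point of the whitelist is that a deliberately source-less field no longer pollutes the report as "missing".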
---

## TASK 3: ODS_ASSISTANT_LEDGER JSONB fix

**STATUS**: done (code fix applied, awaiting next ETL run to verify)

**DETAILS**: `_mark_missing_as_deleted` in `ods_tasks.py` now detects ALL JSONB columns via `cols_info` udt_name and wraps dict/list values with `Json()`.

**FILEPATHS**: `apps/etl/connectors/feiqiu/tasks/ods/ods_tasks.py`
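The fix follows the standard psycopg2 pattern: Python dicts/lists must be wrapped in `psycopg2.extras.Json` before being bound to a JSONB parameter. The sketch below uses `json.dumps` as a stdlib stand-in so it runs anywhere; the `cols_info` shape (column name → udt_name) is an assumption based on the description above.

```python
import json

# Sketch of the JSONB adaptation in _mark_missing_as_deleted.
# The real code wraps values with psycopg2.extras.Json; json.dumps
# stands in here so the sketch stays stdlib-only.
def adapt_row(row: dict, cols_info: dict) -> dict:
    adapted = {}
    for col, value in row.items():
        if cols_info.get(col) == "jsonb" and isinstance(value, (dict, list)):
            adapted[col] = json.dumps(value)  # real code: Json(value)
        else:
            adapted[col] = value
    return adapted

row = {"id": 7, "payload": {"a": 1}, "note": "ok"}
cols_info = {"id": "int8", "payload": "jsonb", "note": "text"}
print(adapt_row(row, cols_info))
```

Detecting JSONB via `udt_name` (rather than hard-coding column names) is what makes the fix cover ALL JSONB columns, not just the one that failed.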
---

## TASK 4: Explain increment_only vs increment_verify vs verify_only modes

**STATUS**: done

**DETAILS**: Traced `flow_runner.py` code and explained all three modes:
- `verify_only`: Only verification (optional ODS fetch first), no DWD/DWS/INDEX ETL
- `increment_only`: Run ETL for all resolved layers, no verification
- `increment_verify`: Run ETL + verification after
- All modes run `_run_post_consistency_check()` unconditionally
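The three modes differ only in which phases run. A hypothetical dispatch sketch (phase names mirror the summary above, not the actual `flow_runner.py` code):

```python
# Hypothetical sketch of mode dispatch in a flow runner.
# Phase names are stand-ins for the real task functions.
def run(mode: str) -> list[str]:
    phases = []
    if mode == "verify_only":
        phases += ["ods_fetch", "verification"]       # no DWD/DWS/INDEX
    elif mode == "increment_only":
        phases += ["etl_all_layers"]                  # no verification
    elif mode == "increment_verify":
        phases += ["etl_all_layers", "verification"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    phases.append("post_consistency_check")           # runs unconditionally
    return phases

print(run("increment_verify"))
```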
---

## TASK 5: Explain full increment_verify data pipeline (API→ODS→DWD→DWS→INDEX)

**STATUS**: done

**DETAILS**: Traced complete data flow:
- ODS (22 tasks): API paginated fetch → `content_hash` dedup → snapshot soft-delete
- DWD (1 task `DWD_LOAD_FROM_ODS`): dim tables → SCD2 merge; fact tables → `fetched_at` window incremental upsert via FACT_MAPPINGS
- DWS (16 tasks): delete-before-insert aggregation from DWD
- INDEX (4 tasks): Custom index algorithms (WBI/NCI/RS/ML)
- Confirmed: DWD/DWS have NO cursor strategy. DWD compares ODS latest vs DWD current. DWS is a full recalculation per date range.
- Confirmed: ODS PK is `(id, content_hash)` — a content_hash change creates a NEW ROW (snapshot mode), NOT an in-place update. The user was correct.
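The snapshot semantics follow directly from the composite primary key: with PK `(id, content_hash)`, an upsert keyed on both columns inserts a new row whenever the content changes, and deduplicates when it does not (in SQL, `ON CONFLICT DO NOTHING`). A stdlib sketch of the idea; the hashing scheme and record shape are illustrative, not the actual ODS code:

```python
import hashlib
import json

# Sketch of ODS snapshot-mode dedup: PK is (id, content_hash), so a
# changed payload hashes differently and lands as a NEW row, while an
# identical payload is deduplicated.
def content_hash(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

ods_table: dict[tuple, dict] = {}  # simulates the (id, content_hash) PK

def upsert(record: dict) -> bool:
    """Insert a snapshot row; return True if a new row was created."""
    key = (record["id"], content_hash(record))
    if key in ods_table:      # same id + same content: dedup, no-op
        return False
    ods_table[key] = record   # changed content => new snapshot row
    return True

print(upsert({"id": 1, "amount": 100}))  # True: new row
print(upsert({"id": 1, "amount": 100}))  # False: duplicate, deduped
print(upsert({"id": 1, "amount": 120}))  # True: changed content, new row
```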
---

## TASK 6: Remove `pipeline` parameter, rename to `flow` everywhere

**STATUS**: in-progress

**DETAILS**:
- User wants to completely remove the `pipeline` parameter/field name across the entire codebase, replacing it with `flow`
- No backward compatibility needed — clean break

**What's been done:**
1. `apps/etl/connectors/feiqiu/orchestration/flow_runner.py` — DONE:
   - `run()` parameter `pipeline` → `flow`
   - Return dict key `"pipeline"` → `"flow"`
   - Docstring updated, CHANGE comment added
2. `apps/etl/connectors/feiqiu/orchestration/scheduler.py` — DONE:
   - `PIPELINE_LAYERS` → `FLOW_LAYERS` (module-level constant)
3. `apps/etl/connectors/feiqiu/cli/main.py` — PARTIALLY DONE:
   - Removed the `--pipeline` argument definition (the `add_argument("--pipeline", ...)` block)
   - Removed `--pipeline` from help text examples
   - BUT: The `pipeline_deprecated` handling block (lines ~445-456) still references `args.pipeline_deprecated`, which no longer exists — **this will crash**. The entire block needs to be deleted.
   - `runner.run(pipeline=args.flow, ...)` on line ~555 NOT yet changed to `runner.run(flow=args.flow, ...)`
   - `runner.run(pipeline=None, layers=layers, ...)` on line ~613 NOT yet changed to `runner.run(flow=None, layers=layers, ...)`
**What still needs to be done:**
- `apps/etl/connectors/feiqiu/cli/main.py`:
  - Delete the `if args.pipeline_deprecated:` block (lines ~445-456)
  - Change `runner.run(pipeline=args.flow, ...)` → `runner.run(flow=args.flow, ...)`
  - Change `runner.run(pipeline=None, layers=layers, ...)` → `runner.run(flow=None, layers=layers, ...)`
  - Remove `--pipeline` from the module docstring at top (lines ~10-11)
  - Remove the `--pipeline-flow` deprecated argument and its handling (lines ~284-285, ~319-325, ~372-375)
- `apps/backend/app/schemas/tasks.py`: `pipeline` field → `flow`
- `apps/backend/app/services/cli_builder.py`: `config.pipeline` → `config.flow`, `--pipeline` → `--flow` in cmd building
- `apps/backend/app/routers/tasks.py`: `config.pipeline` → `config.flow`
- `apps/admin-web/src/pages/TaskManager.tsx`: `config.pipeline` → `config.flow`
- `apps/admin-web/src/App.tsx`: `runningTask.config.pipeline` → `runningTask.config.flow`
- `apps/admin-web/src/pages/TaskConfig.tsx`: `pipeline: flow` → `flow: flow` (or rename the variable)
- `apps/admin-web/src/components/ScheduleTab.tsx`: `pipeline: 'api_full'` → `flow: 'api_full'`
- `apps/admin-web/src/types/` — check for a TypeScript interface with a `pipeline` field
- Test files (ALL need `pipeline` → `flow`):
  - `apps/etl/connectors/feiqiu/tests/unit/test_pipeline_runner_properties.py` — heavy usage
  - `apps/etl/connectors/feiqiu/tests/unit/test_debug_orchestration_properties.py`
  - `apps/etl/connectors/feiqiu/scripts/debug/debug_orchestration.py`
  - `tests/test_property_5_etl_param_parsing.py`
  - `apps/backend/tests/test_task_queue.py`
  - `apps/backend/tests/test_task_executor.py`
  - `apps/backend/tests/test_schedule_properties.py`
  - `apps/backend/tests/test_schedules_router.py`
  - `apps/backend/tests/test_scheduler.py`
  - `apps/backend/tests/test_queue_properties.py`
  - `apps/backend/tests/test_execution_router.py`
  - `apps/backend/tests/test_cli_builder.py`
  - `apps/backend/tests/test_tasks_router.py`
  - `apps/backend/tests/test_site_isolation_properties.py`
  - `apps/backend/tests/test_task_config_properties.py`
- After all changes: run ETL unit tests + backend tests + monorepo property tests
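Since Python scripts are preferred over PowerShell here, the remaining occurrences can be located with a small stdlib script. The root directories and extension list below are assumptions based on the paths in this summary; adjust as needed:

```python
import pathlib
import re

# Hypothetical helper: list files that still mention "pipeline" so the
# rename to "flow" can be verified as complete.
PATTERN = re.compile("pipeline", re.IGNORECASE)  # substring match catches
                                                 # PIPELINE_LAYERS etc. too
ROOTS = ["apps/etl", "apps/backend", "apps/admin-web", "tests"]
EXTS = {".py", ".ts", ".tsx"}

def find_leftovers(base: str = ".") -> dict[str, int]:
    hits: dict[str, int] = {}
    for root in ROOTS:
        top = pathlib.Path(base, root)
        if not top.exists():
            continue
        for path in top.rglob("*"):
            if path.is_file() and path.suffix in EXTS:
                text = path.read_text(encoding="utf-8", errors="ignore")
                count = len(PATTERN.findall(text))
                if count:
                    hits[str(path)] = count
    return hits

if __name__ == "__main__":
    for path, count in sorted(find_leftovers().items()):
        print(f"{count:4d}  {path}")
```

An empty result (after excluding deliberately named files like `test_pipeline_runner_properties.py`, if that file is also renamed) means the clean break is complete.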
**FILEPATHS**:
- `apps/etl/connectors/feiqiu/orchestration/flow_runner.py` (MODIFIED)
- `apps/etl/connectors/feiqiu/orchestration/scheduler.py` (MODIFIED)
- `apps/etl/connectors/feiqiu/cli/main.py` (PARTIALLY MODIFIED — broken state, needs completion)
- `apps/backend/app/schemas/tasks.py` (needs change)
- `apps/backend/app/services/cli_builder.py` (needs change)
- `apps/backend/app/routers/tasks.py` (needs change)
- All frontend `.tsx` files listed above
- All test files listed above

---
## TASK 7: New `full_window` processing mode

**STATUS**: not-started

**DETAILS**:
- User wants a new `processing_mode="full_window"` distinct from `increment_verify`
- ODS tasks should NOT use the cursor to calculate the time window; instead they should use the actual time bounds found in the returned JSON data
- No `_run_verification` needed (API data is the source of truth, so there is no cursor-drift risk)
- ODS entry still uses `content_hash` dedup + idempotent upsert (unchanged)
- DWD/DWS processing unchanged (DWD compares ODS vs DWD, DWS does a full recalculation)
- Implementation needs to happen in `flow_runner.py` (new branch in `run()`) and `ods_tasks.py` (window derivation from JSON response data instead of the cursor)
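Window derivation from the response payload could look roughly like this. The record shape and the `update_ts` field name are assumptions; the real JSON schema may differ:

```python
from datetime import datetime

# Hypothetical sketch of full_window derivation: take [window_start,
# window_end] from the min/max timestamps in the returned JSON records
# instead of computing them from the cursor.
def derive_window(records: list[dict], ts_field: str = "update_ts"):
    stamps = [
        datetime.fromisoformat(r[ts_field])
        for r in records
        if r.get(ts_field)
    ]
    if not stamps:
        return None  # empty payload: no window, nothing to load
    return min(stamps), max(stamps)

records = [
    {"id": 1, "update_ts": "2026-02-20T09:00:00"},
    {"id": 2, "update_ts": "2026-02-19T23:30:00"},
    {"id": 3, "update_ts": "2026-02-20T11:09:11"},
]
start, end = derive_window(records)
print(start, end)
```

Because the window is read off the data itself, there is no cursor that can drift ahead of the data, which is the stated reason `_run_verification` becomes unnecessary in this mode.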
**FILEPATHS**:
- `apps/etl/connectors/feiqiu/orchestration/flow_runner.py`
- `apps/etl/connectors/feiqiu/tasks/ods/ods_tasks.py`

---
## KEY ARCHITECTURE:
- Frontend: React + Vite + Ant Design at `apps/admin-web/` (localhost:5173)
- Backend: FastAPI at `apps/backend/` (localhost:8000)
- ETL: `apps/etl/connectors/feiqiu/`
- ODS PK = `(id, content_hash)` — snapshot mode, content change = new row
- DWD has no cursor; dim tables use SCD2, fact tables use `fetched_at` window upsert
- DWS has no cursor; delete-before-insert by date range
- Four DB connections: `mcp_pg_etl`, `mcp_pg_etl_test`, `mcp_pg_app`, `mcp_pg_app_test`
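The DWS delete-before-insert pattern keeps each aggregation idempotent per date range: rerunning the same range deletes and reinserts, never double-counts. A generic in-memory sketch (table and column names are illustrative only):

```python
# Generic sketch of DWS delete-before-insert by date range, shown as an
# in-memory simulation. In SQL this is DELETE ... WHERE stat_date BETWEEN
# start AND end, followed by INSERT of the recomputed aggregates.
def reload_range(dws_rows: list[dict], new_rows: list[dict],
                 start: str, end: str) -> list[dict]:
    """Delete all DWS rows in [start, end], then insert the recomputed ones."""
    kept = [r for r in dws_rows if not (start <= r["stat_date"] <= end)]
    return kept + new_rows  # rerunning the same range yields the same result

existing = [{"stat_date": "2026-02-18", "v": 1},
            {"stat_date": "2026-02-20", "v": 9}]
recomputed = [{"stat_date": "2026-02-20", "v": 10}]
print(reload_range(existing, recomputed, "2026-02-19", "2026-02-20"))
```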
## USER CORRECTIONS AND INSTRUCTIONS:
- All output text must be in simplified Chinese
- `fetched_at` is the ETL ingestion timestamp, NOT the business update time
- ODS content_hash change = NEW ROW (snapshot), not an in-place update — the user corrected the agent's mistake
- `pipeline` is NOT a legacy/deprecated alias for `flow` — the user wants it completely removed, not kept for compatibility
- `--pipeline-flow` is a separate deprecated param (for data_source) and should also be cleaned up
- An audit reminder is pending for high-risk path changes
- Python scripts are preferred over PowerShell for complex operations
## Files to read
- `apps/etl/connectors/feiqiu/cli/main.py` (CRITICAL — partially modified, in broken state, needs immediate fixes)
- `apps/etl/connectors/feiqiu/orchestration/flow_runner.py` (already modified, verify current state)
- `apps/backend/app/schemas/tasks.py` (needs `pipeline` → `flow`)
- `apps/backend/app/services/cli_builder.py` (needs `pipeline` → `flow`)
- `apps/backend/app/routers/tasks.py` (needs `pipeline` → `flow`)
- `apps/etl/connectors/feiqiu/tests/unit/test_pipeline_runner_properties.py` (heavy `pipeline` usage in tests)
- `apps/admin-web/src/pages/TaskConfig.tsx` (frontend `pipeline` field)
- `apps/admin-web/src/types/` (check for TypeScript interfaces)
USER QUERIES (most recent first):

1. Question: "A content change (different content_hash) → ON CONFLICT DO UPDATE, meaning the same row is updated in place rather than a new one created? I remember a different hash also creating a NEW record; it's snapshot mode, or am I misremembering? Or are there different table types with different handling? Confirmation: Task 1: execute. Task 2: a new task distinct from increment_verify, call it full_window: use the upper/lower time bounds of the returned JSON data as [window_start, window_end], not the window explicitly specified via CLI/API (--window-start/--window-end). Calling the API and loading with this window (content_hash dedup + idempotent upsert) should stay unchanged. No verification is needed, because the data source is the full set returned by the API, so there is no risk of omissions from cursor drift. Correct."

2. Output: Command executed successfully with no output. Exit Code: 0

3. Output: Command executed successfully with no output. Exit Code: 0

4. Question: identical to query 1 (duplicate entry).

5. Output: Hook execution failed with exit code 1. Error output: [AUDIT REMINDER] Pending audit detected (root-file, dir:admin-web, dir:backend, dir:db, db-schema-change). Run /audit (Manual: Run /audit hook) to sync docs & write audit artifacts. (rate limit: 15min) Exit Code: 1

6. Output: Command executed successfully with no output. Exit Code: 0

7. Output: Command executed successfully with no output. Exit Code: 0

8. Question: "DWD and DWS have no cursor strategy, right? DWD just reconciles the updated ODS content against DWD for updates/inserts? And DWS is a full refresh? ODS soft deletes and content-change updates are handled as new rows, right? Tasks: drop `pipeline`; no compatibility needed, remove the parameter completely, whether as a preset shortcut (alias-like) or anything else, and use FLOW everywhere. To distinguish from increment_verify, ODS should not use the cursor to compute the time window [window_start, window_end], but use the [window_start, window_end] from the returned JSON data. In that case there is no need for the _run_verification check, right?"

9. Output: Command executed successfully with no output. Exit Code: 0

10. Output: Command executed successfully with no output. Exit Code: 0

11. Output: Command executed successfully with no output. Exit Code: 0

12. "Wasn't pipeline replaced by Flow? Why are there still leftovers? Is increment_verify the most complete mode? Check for me how increment_verify's API - ODS - DWD - DWS (index) chain operates and how it processes the data."

13. Output: Hook execution failed with exit code 1. Error output: [AUDIT REMINDER] Pending audit detected (root-file, dir:admin-web, dir:backend, dir:db, db-schema-change). Run /audit (Manual: Run /audit hook) to sync docs & write audit artifacts. (rate limit: 15min) Exit Code: 1

14. Output: Command executed successfully with no output. Exit Code: 0

15. Output: Command executed successfully with no output. Exit Code: 0
---

METADATA:
The previous conversation had 8 messages.

INSTRUCTIONS:
Continue working until the user query has been fully addressed. Do not ask for clarification - proceed with the work based on the context provided.