chore(ops): three-layer prevention for reload hangs + full F1-5a walkthrough report

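Three-layer prevention for reload hangs (triggered when uvicorn's graceful
shutdown waited indefinitely during the walkthrough):
- Layer 1 (apps/backend/start_uvicorn.py, new): wrap the reload-excludes
  in Python strings so the ps1 command line carries only literal paths.
  This is the root fix for PowerShell's PSNativeCommandArgumentPassing
  expanding wildcards differently under different profiles (neither
  array splatting nor --% was reliable). Also set
  timeout-graceful-shutdown=5 explicitly: force-kill after 5 seconds
  instead of waiting forever
- Layer 2 (scripts/ops/backend-watchdog.ps1, new): standalone socket probe
  (TcpClient + hand-written HTTP/1.1 GET, Connection: close) that avoids
  false alarms from .NET HttpClient pool reuse and the system proxy;
  3s × 3 = 9s of failures triggers a restart; kills the process chain
  up to the pwsh backend window (closing the original window); stops
  itself after 3 restarts per hour
- Layer 3 (scripts/ops/start-admin.ps1): launch the watchdog at startup,
  add menu option [4] to restart only the backend, and kill the
  watchdog as well when exiting from the main menu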
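Layer 1 can be pictured with the sketch below. It is a hypothetical reconstruction, not the committed `start_uvicorn.py`: the glob patterns in `RELOAD_EXCLUDES` and the helper names `build_uvicorn_kwargs` and `main` are assumptions. It shows the idea of keeping wildcard strings inside Python and handing them to `uvicorn.run()` through its `reload_excludes` and `timeout_graceful_shutdown` keyword arguments (the latter requires a uvicorn release that supports a graceful-shutdown timeout), so the PowerShell command line only ever carries a literal script path and `--port 8000`.

```python
"""Hypothetical sketch of a uvicorn launcher that keeps reload globs inside Python."""
import argparse

# ASSUMPTION: illustrative patterns only; the real list lives in start_uvicorn.py.
RELOAD_EXCLUDES = ["*.log", "logs/*", ".venv/*", "tests/*"]


def build_uvicorn_kwargs(port: int) -> dict:
    """Collect uvicorn.run() keyword arguments; no wildcard ever reaches a shell."""
    return {
        "app": "app.main:app",  # import string, required when reload=True
        "port": port,
        "reload": True,
        "reload_excludes": list(RELOAD_EXCLUDES),  # copy so callers cannot mutate the constant
        # If graceful shutdown stalls, uvicorn gives up after 5 s instead of waiting forever.
        "timeout_graceful_shutdown": 5,
        "use_colors": False,  # equivalent of the old --no-use-colors flag
    }


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Launch uvicorn with reload settings kept in Python")
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args(argv)
    import uvicorn  # imported lazily so build_uvicorn_kwargs() works without uvicorn installed

    uvicorn.run(**build_uvicorn_kwargs(args.port))
```

A real launcher would end with an `if __name__ == "__main__": main()` guard so that `python start_uvicorn.py --port 8000` starts the server; the guard is left out here so loading the sketch never launches uvicorn.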

CLAUDE.md: add a "Backend reload hang prevention (mandatory)" section
with a tiered file-risk table, an SOP, and a startup-menu quick reference

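Walkthrough report (exhaustive, rigorous edition):
- 6 backend change points PASS (P1-P4 + GUC + ai_run_logs runtime fields)
- admin-web: on-site Playwright walkthrough of 7 pages → 5 incomplete UI
  items logged to F1-5b
- Mini-program board tabs: on-site walkthrough of 7 pages with
  weixin-devtools-mcp + DB data cross-check →
  board-finance: 5 of 6 items match the upper-bound clipping;
  board-customer: business-day logic takes effect;
  board-coach: design blind spot in the monthly aggregate table;
  5 sandbox coverage blind spots logged to F1-5b
- 8 walkthrough screenshots archived to
  docs/audit/changes/screenshots/2026-05-05_f1_5a_walkthrough/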

audit_dashboard refreshed to 153 audit entries

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Neo
2026-05-05 11:53:08 +08:00
parent 1baa21222b
commit 95a4500c75
14 changed files with 745 additions and 5 deletions
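The Layer 2 probe technique translates directly to Python; the sketch below assumes, as the start-admin.ps1 comments describe, that the backend serves `/health` on `127.0.0.1:8000`. Each call opens a fresh TCP connection, writes an HTTP/1.1 GET by hand with `Connection: close`, and reads only the status line, so no connection pool can hand back a stale socket and no system proxy sits in the path. `probe_health` and `parse_status_code` are illustrative names; the committed watchdog is PowerShell built on `TcpClient`.

```python
import socket


def parse_status_code(response: bytes) -> int:
    """Return the status code from the first line of a raw HTTP/1.x response, or -1."""
    status_line = response.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/1."):
        return -1
    try:
        return int(parts[1])
    except ValueError:
        return -1


def probe_health(host: str = "127.0.0.1", port: int = 8000,
                 path: str = "/health", timeout: float = 3.0) -> bool:
    """One-shot probe: new socket per call, hand-written GET, no pooling, no proxy."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(request)
            # Only the status line is needed; stop once the first CRLF has arrived.
            while b"\r\n" not in data:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                data += chunk
    except OSError:  # refused, reset, or timed out (socket.timeout subclasses OSError)
        return False
    return 200 <= parse_status_code(data) < 300
```

A hung backend typically still accepts the TCP connection but never answers, so the read timeout, not the connect, is what turns a hang into a failed probe.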

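The watchdog's restart decision is small enough to model as a pure state machine. The sketch assumes the thresholds from the commit message (3 consecutive failures at a 3 s probe interval, at most 3 restarts per rolling hour, then self-stop) and adds a hypothetical `grace_s` window in which probe results are ignored, matching the start-admin.ps1 note that the watchdog does not probe during the restart grace period. `WatchdogPolicy` and its return strings are invented for illustration; time is passed in explicitly so the logic can be exercised without sleeping.

```python
from collections import deque


class WatchdogPolicy:
    """Turn a stream of probe results into ok / wait / restart / stop decisions."""

    def __init__(self, fail_threshold: int = 3, max_restarts: int = 3,
                 window_s: float = 3600.0, grace_s: float = 0.0):
        self.fail_threshold = fail_threshold  # consecutive failures before a restart
        self.max_restarts = max_restarts      # restart budget per rolling window
        self.window_s = window_s
        self.grace_s = grace_s                # ASSUMPTION: ignore probes this long after a restart
        self.consecutive_failures = 0
        self.restarts: deque = deque()        # timestamps of past restarts
        self.grace_until = 0.0

    def on_probe(self, ok: bool, now: float) -> str:
        """Record one probe taken at time `now` and return the action to take."""
        if now < self.grace_until:
            return "wait"  # backend is still starting; ignore this result
        if ok:
            self.consecutive_failures = 0
            return "ok"
        self.consecutive_failures += 1
        if self.consecutive_failures < self.fail_threshold:
            return "wait"
        # Forget restarts that have left the rolling window.
        while self.restarts and now - self.restarts[0] >= self.window_s:
            self.restarts.popleft()
        if len(self.restarts) >= self.max_restarts:
            return "stop"  # budget exhausted: watchdog should exit instead of looping
        self.restarts.append(now)
        self.consecutive_failures = 0
        self.grace_until = now + self.grace_s
        return "restart"
```

A driver loop would run a health probe every 3 s, pass the result to `on_probe(ok, time.monotonic())`, kill and relaunch the backend on `"restart"`, and exit on `"stop"`. Capping restarts keeps a backend that crashes on import from being relaunched forever.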
@@ -3,6 +3,8 @@
# Open the browser automatically once the services have started
$ErrorActionPreference = "Stop"
# CHANGE 2026-05-05 | Hoist $watchdogProc outside the try block so the catch / exit fallback can access it
$watchdogProc = $null
try {
# CHANGE 2026-03-07 | Locate the project root: derive it from the .bat launch directory without traversing junctions
@@ -135,6 +137,12 @@ try {
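# 2. NO_COLOR has no effect on uvicorn → pass the --no-use-colors flag
# 3. PS 5.1 legacy console garbles UTF-8 arrows → every temp script starts with chcp 65001
# and sets NO_COLOR=1 to disable ANSI escape codes
# CHANGE 2026-05-05 | F1-5a walkthrough uncovered the reload hang:
# - add --timeout-graceful-shutdown 5: if graceful shutdown has not finished after 5 s, force-kill, so reload no longer waits forever
# - paired with backend-watchdog.ps1: automatic restart after 30 consecutive seconds of probe failures
# CHANGE 2026-05-05 v3 | Root fix for wildcard expansion:
# launch through apps/backend/start_uvicorn.py so every wildcard string stays inside Python;
# the PowerShell shell never touches a wildcard, which sidesteps PSNativeCommandArgumentPassing behavior entirely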
$beLines = @(
"`$env:NEOZQYY_ROOT = ${q}${ProjectRoot}${q}"
""
@@ -149,7 +157,9 @@ try {
"Set-Location -LiteralPath ${q}${backendDir}${q}"
"Write-Host ${q}=== Backend FastAPI ===${q} -ForegroundColor Green"
"Write-Host `"NEOZQYY_ROOT=`$env:NEOZQYY_ROOT`""
"& ${q}${venvPython}${q} -m uvicorn app.main:app --reload --port 8000 --no-use-colors"
""
"# Call start_uvicorn.py (reload-excludes, timeout-graceful-shutdown and other arguments are hard-coded in Python)"
"& ${q}${venvPython}${q} ${q}${backendDir}\start_uvicorn.py${q} --port 8000"
"Write-Host ${q}Backend exited, press any key to close...${q} -ForegroundColor Red"
"`$null = `$Host.UI.RawUI.ReadKey(${q}NoEcho,IncludeKeyDown${q})"
)
@@ -281,6 +291,62 @@ try {
else { Write-Host "tenant-admin wait timed out, please open http://localhost:5174 manually" -ForegroundColor Red }
}
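# ── Watchdog start/stop functions ──
# CHANGE 2026-05-05 | F1-5a walkthrough uncovered the reload hang:
# the watchdog runs in its own window, periodically probes /health, and on a hang force-kills and restarts the backend.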
function Start-Watchdog {
$watchdogScript = Join-Path $ProjectRoot "scripts\ops\backend-watchdog.ps1"
if (-not (Test-Path $watchdogScript)) {
Write-Host " !! Watchdog script not found: $watchdogScript" -ForegroundColor Yellow
return $null
}
Write-Host "[guard] Starting backend watchdog (background monitor, auto-restart on hang) ..." -ForegroundColor DarkCyan
# WindowStyle Minimized: run the watchdog window minimized so it stays out of the way
$proc = Start-Process $psExe `
-ArgumentList "-NoExit", "-ExecutionPolicy", "Bypass", "-File", $watchdogScript, "-ProjectRoot", $ProjectRoot `
-WindowStyle Minimized -PassThru
return $proc
}
function Stop-Watchdog {
param($WdProc)
if ($WdProc -and -not $WdProc.HasExited) {
Write-Host " Stopping watchdog (PID=$($WdProc.Id))..." -ForegroundColor Yellow
taskkill /PID $WdProc.Id /T /F 2>$null | Out-Null
}
}
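# ── Restart backend only (keep frontends) ──
# CHANGE 2026-05-05 | In testing, most changes touch only the backend and the frontends need no restart;
# this option skips the frontends to avoid the cost of WS reconnects + browser refreshes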
function Restart-BackendOnly {
param([ref]$BeProc)
$ErrorActionPreference = "Continue"
Write-Host ""
Write-Host "Restarting backend only (frontends admin-web/tenant-admin kept running)..." -ForegroundColor Yellow
if ($BeProc.Value -and -not $BeProc.Value.HasExited) {
Write-Host " Stopping old backend (PID=$($BeProc.Value.Id))..." -ForegroundColor Yellow
taskkill /PID $BeProc.Value.Id /T /F 2>$null | Out-Null
}
# Fallback: kill whatever is still listening on the port
$listeners = Get-NetTCPConnection -LocalPort 8000 -State Listen -ErrorAction SilentlyContinue
foreach ($l in $listeners) {
taskkill /PID $l.OwningProcess /T /F 2>$null | Out-Null
}
# Wait up to 15 s for the port to be released
$waited = 0
while ($waited -lt 15) {
Start-Sleep -Seconds 1
$waited++
$still = Get-NetTCPConnection -LocalPort 8000 -State Listen -ErrorAction SilentlyContinue
if (-not $still) { break }
}
$ErrorActionPreference = "Stop"
Write-Host " Port 8000 released, starting new backend..." -ForegroundColor Green
$BeProc.Value = Start-Process $psExe -ArgumentList "-NoExit", "-ExecutionPolicy", "Bypass", "-File", $beTmp -PassThru
Write-Host " New backend started (PID=$($BeProc.Value.Id))" -ForegroundColor Green
}
# ── Countdown function ──
function Show-Countdown {
param([int]$Seconds)
@@ -294,7 +360,10 @@ try {
# ── First start ──
$beProc = $null; $feProc = $null; $taProc = $null
$watchdogProc = $null
Start-AllServices -BeProc ([ref]$beProc) -FeProc ([ref]$feProc) -TaProc ([ref]$taProc)
# CHANGE 2026-05-05 | Start the watchdog (separate window, minimized)
$watchdogProc = Start-Watchdog
Wait-AndOpenBrowser
# ── Interactive menu loop ──
@@ -307,15 +376,22 @@ try {
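Write-Host " [1] Stop all services" -ForegroundColor Yellow
Write-Host " [2] Restart all services (5-second countdown in between)" -ForegroundColor Yellow
Write-Host " [3] Exit (stop services and close the window)" -ForegroundColor Yellow
Write-Host " [4] Restart backend only (keeps frontends; recommended during testing for reload hangs/Python changes)" -ForegroundColor Yellow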
Write-Host "========================================" -ForegroundColor Cyan
$choice = Read-Host "Enter an option (1/2/3)"
$choice = Read-Host "Enter an option (1/2/3/4)"
switch ($choice) {
"1" {
# CHANGE 2026-05-05 | [1] full stop = stop services and the watchdog (otherwise the watchdog would bring the backend back up)
Stop-Watchdog -WdProc $watchdogProc
$watchdogProc = $null
Stop-AllServices -BeProc $beProc -FeProc $feProc -TaProc $taProc
$beProc = $null; $feProc = $null; $taProc = $null
}
"2" {
# CHANGE 2026-05-05 | Stop the watchdog before restarting so it does not mistake the stop-start gap for a hang
Stop-Watchdog -WdProc $watchdogProc
$watchdogProc = $null
Stop-AllServices -BeProc $beProc -FeProc $feProc -TaProc $taProc
Show-Countdown -Seconds 5
# Regenerate log file names (avoids locks on the old files)
@@ -336,7 +412,8 @@ try {
"Set-Location -LiteralPath ${q}${backendDir}${q}"
"Write-Host ${q}=== Backend FastAPI ===${q} -ForegroundColor Green"
"Write-Host `"NEOZQYY_ROOT=`$env:NEOZQYY_ROOT`""
"& ${q}${venvPython}${q} -m uvicorn app.main:app --reload --port 8000 --no-use-colors"
""
"& ${q}${venvPython}${q} ${q}${backendDir}\start_uvicorn.py${q} --port 8000"
"Write-Host ${q}Backend exited, press any key to close...${q} -ForegroundColor Red"
"`$null = `$Host.UI.RawUI.ReadKey(${q}NoEcho,IncludeKeyDown${q})"
)
@@ -372,14 +449,22 @@ try {
$beProc = $null; $feProc = $null; $taProc = $null
Start-AllServices -BeProc ([ref]$beProc) -FeProc ([ref]$feProc) -TaProc ([ref]$taProc)
# CHANGE 2026-05-05 | Start the watchdog again once the restart has finished
$watchdogProc = Start-Watchdog
Wait-AndOpenBrowser
}
"3" {
Stop-AllServices -BeProc $beProc -FeProc $feProc -TaProc $taProc
Stop-Watchdog -WdProc $watchdogProc
$watchdogProc = $null
$running = $false
}
"4" {
Restart-BackendOnly -BeProc ([ref]$beProc)
# The watchdog keeps running (it skips probing during the restart grace period)
}
default {
Write-Host " Invalid option, please enter 1, 2, or 3" -ForegroundColor Red
Write-Host " Invalid option, please enter 1, 2, 3, or 4" -ForegroundColor Red
}
}
}
@@ -390,6 +475,12 @@ try {
Write-Host $_.ScriptStackTrace -ForegroundColor DarkRed
}
# CHANGE 2026-05-05 | Fallback kill of the watchdog before the main window exits, so it is not orphaned if the main window is force-closed
if ($watchdogProc -and -not $watchdogProc.HasExited) {
Write-Host " Fallback: stopping watchdog (PID=$($watchdogProc.Id))..." -ForegroundColor DarkGray
taskkill /PID $watchdogProc.Id /T /F 2>$null | Out-Null
}
Write-Host ""
Write-Host "Press any key to close this window..." -ForegroundColor DarkGray
$null = $Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyDown")
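The wait-for-port step in `Restart-BackendOnly` can be sketched in Python as follows, under one stated approximation: it treats "port is free" as "a TCP connect to 127.0.0.1 fails", whereas `Get-NetTCPConnection -State Listen` reads the OS listener table directly, so a listener bound only to another address would be missed. `is_listening` and `wait_port_free` are hypothetical helpers; the 15 s limit and 1 s poll interval mirror the PowerShell loop.

```python
import socket
import time


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if something on host:port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused or timed out: treat as nothing listening
        return False


def wait_port_free(port: int, max_wait_s: float = 15.0, interval_s: float = 1.0) -> bool:
    """Poll until nothing listens on `port`; return False if still busy at the deadline."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if not is_listening(port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Returning `False` on timeout lets the caller warn that port 8000 is still busy instead of unconditionally reporting it as released before launching the new backend.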