Kaizen — Studio60 Continuous Improvement

F-407–F-413: Deep Improvement Cycle (iter #372)

Finding	Status	Impact
F-407: Runner API spam — 3,357 duplicate tasks	FIXED	99.9% spam cancelled, queues clean
F-408: QA agent zero memory	FIXED	5 memory files bootstrapped
F-409: Kaizen budget exceeded	FIXED	$0.75 → $1.00
F-410: Obsolete memory files	FIXED	relay_vs_redis removed, refs updated
F-411: Venom UNHEALTHY (FS=8,381)	PENDING	Fix committed, task T-008 → sentinel
F-412: Sim-038 Runner dedup	PENDING	Content-hash proposal sent to PM (T-007)

Root cause (spam): WP agent sent 3,076 identical "ownership incident" messages in 17 min. No dedup in Runner API.
Memory audit: 18/32 cerebro projects have memory. Sentinel: kaizen (26), sentinel (24), qa (6 ← new). Key gaps: s60-test, memory-agent, 5 CMS sub-sites.

F-338 RESOLVED: Auth Missing DB Migration ✓ (iter #343) + Auth HC DEPLOYED ✓ (iter #363)

Finding: Auth System entity had 7 columns missing from prod DB. Commit c2a350f deployed with migration + error logging.
Result: 0 errors post-redeploy. Auth clean.
Sim-035 ACCEPTED: QA gate depth enhancement — Phase 4 post-deploy error log check (30s window) + e2e auth tests. TODO→sentinel.
Sim-036 RESOLVED ✓ (iter #363): Auth HC (commit 7cf35cb) deployed by sentinel. Both containers healthy, FS=0. qa-gate.sh deployed to prod-alfa (4997 bytes).
Auth is now FULLY HEALTHY — backend + frontend both with Docker HEALTHCHECK, /api/health 200 OK.
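
The Phase 4 check from Sim-035 (post-deploy error log check over a 30s window) could be sketched roughly as follows. This is not the contents of qa-gate.sh: the log line format (ISO timestamp, then level, then message) and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def errors_in_window(log_lines, deployed_at, window_s=30):
    """Return ERROR lines logged within window_s seconds after deployed_at.

    Assumed (hypothetical) line format: "<ISO-8601 timestamp> <LEVEL> <message>".
    Lines that do not match this shape are skipped.
    """
    window_end = deployed_at + timedelta(seconds=window_s)
    hits = []
    for line in log_lines:
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        try:
            ts = datetime.fromisoformat(parts[0])
        except ValueError:
            continue  # not a timestamped line
        if deployed_at <= ts <= window_end and parts[1] == "ERROR":
            hits.append(line)
    return hits
```

A gate built on this would fail the deploy when the returned list is non-empty, which matches the "0 errors post-redeploy" result recorded for F-338.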

F-334: Deploy Trigger Gap — RESOLVED ✓ (iter #338)

Finding: Only Pulse sends explicit deploy requests to sentinel via relay. All other agents rely on sentinel's passive git scan.
Impact: Pulse deploys in 16 min. Mail took ~10 hours. BadWolf needed 12 kaizen reminders over multiple iterations.

Agent	Explicit trigger?	Latency
Pulse	✓ YES	16 min
Auth	✗ NO	unknown
BadWolf	✗ NO	12+ reminders needed
Venom	✗ NO	unknown
Billit	✗ NO	unknown
Mail	✗ NO	~10 hours

Proposal: Add standard deploy trigger to all agent CLAUDE.md: "After commit, send relay message to sentinel with commit hash"
Status: Sent to PM for process change
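
A minimal sketch of what the proposed trigger message could look like. The field names (to, from, subject, body) follow the fields the Runner dedup hashes elsewhere in this report; the subject text, the body wording, and the function name are assumptions, and the actual relay send call is deliberately left out because its API is not described here.

```python
import re


def build_deploy_trigger(agent, commit_hash):
    """Build a relay message asking sentinel to deploy a given commit.

    Only constructs the payload. Sending it over the relay is out of scope,
    since the relay client API is not documented in this report.
    """
    # Accept short or full hex hashes, as produced by git.
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_hash):
        raise ValueError(f"not a git commit hash: {commit_hash!r}")
    return {
        "to": "sentinel",
        "from": agent,
        "subject": "deploy request",        # hypothetical wording
        "body": f"commit {commit_hash}",    # hypothetical wording
    }
```

Validating the hash before sending keeps malformed requests out of sentinel's queue, so a missing deploy shows up at the sender rather than as silence on the receiving side.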

NEW: S60 Spec System Adoption Audit (iter #324)

Service	Adoption	Specs	Enforcement
Pulse	FULL	SPEC_GLOBAL + 8 modules + ADR	20 compliance tests + 2 CC hooks
BadWolf	MANDATORY	0 (central ref)	Catalog enforcement
Sentinel	PROVIDER	1 (CON-001)	Hosts tools
Mail	BASIC	0	Catalog ref only
Auth	NONE	0	—
Venom	NONE	0	—
Billit	NONE	0	—

Overall: 29% adoption (2/7 mandatory). Methodology & tooling ready. Only 1 actual spec exists (CON-001).
Tooling: validate-specs.sh, generate-specs-html.sh, specs-history.sh — all in sentinel/catalog/specs/
Gap: Auth + Billit (security-critical) have zero spec adoption. Pulse leading with per-module SPEC.md + test enforcement.

Test Coverage Deep Audit (iter #365)

Overall: 33 test files / 487 source files = 6.8% (up from 5.8% baseline)

Service	Tests	Source	Coverage	Risk
Venom	6	27	22.2%	Low
Pulse	11	62	17.7%	Low
Auth	3	51	5.9%	HIGH — guards, api-keys, strategies untested
Billit	9	196	4.6%	HIGH — banking, billing, gopay untested
BadWolf	4	132	3.0%	Medium — orders, products, wc-sync untested
Mail	0	19	0%	CRITICAL — zero test coverage

Priority: 1) auth guards+api-keys (security) 2) billit banking+billing (financial) 3) mail any test (zero→something) 4) badwolf orders+products (business)
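
The percentages above are test-file to source-file ratios, not line coverage. The arithmetic can be reproduced from the table like this (the variable and function names are illustrative only):

```python
# (test files, source files) per service, copied from the table above.
FILES = {
    "Venom": (6, 27),
    "Pulse": (11, 62),
    "Auth": (3, 51),
    "Billit": (9, 196),
    "BadWolf": (4, 132),
    "Mail": (0, 19),
}


def file_ratio_pct(tests, sources):
    """Test files per 100 source files, rounded to one decimal."""
    return round(100 * tests / sources, 1)


per_service = {name: file_ratio_pct(t, s) for name, (t, s) in FILES.items()}
total_tests = sum(t for t, _ in FILES.values())
total_sources = sum(s for _, s in FILES.values())
overall = file_ratio_pct(total_tests, total_sources)
```

The per-service values and the overall figure agree with the table (33 / 487 = 6.8%).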

New: Hive Agent Discovered (iter #255)

Location: cerebro (100.72.164.58), created 2026-03-16T18:15
Function: Research pipeline — AI/open-source news in CS, 185 articles indexed, 1094 cycles
Relay: "hive" queue registered (0 messages)
Note: No local project on sentinel server. New ecosystem member.

Deploy & Infra Status (iter #307)

F-309 BadWolf: ⚠ TypeORM pool fix (832894b) STILL undeployed — 2 sentinel requests, no response
F-311 Cortex: ⚠ S60-CS pipeline broken — ROOT CAUSE: DB GRANT (bwcs→activity_log) + missing qdrant-api-key + cron chain failure
F-312 NEW: Cron log redirect pattern flawed — wrap in subshell to capture errors
All other services: UP TO DATE ✓ | Disk 8% | RAM 25% | 0 errors | 20q/0 unread

F-278: Pulse User Growth — 4 Active Users! (iter #215)

Users: Libor Webster, Michaela Tomášů, Helena Pilařová, Filip Vondricka
Previous: Only 1 user (Libor). Product is gaining real traction!
Impact: Pulse stability and deploy quality now directly affect paying clients.
Action: F-279 error_events migration needs immediate redeploy. Sentinel notified.

F-256–F-259: DR Gap — 4 Repos Without Off-site Backup (iter #177)

Repo	Size	Remote	Pushed	Risk
sentinel	11MB	NONE	—	F-203 REGRESSED — remote missing (verified iter #270)
kaizen	5.5MB	NONE	—	F-285: 243 iterations — TODO sent to sentinel (iter #243)
/root/scripts	288K	Git ✓	✓ F-204 RESOLVED	Sentinel committed cron scripts (7d54cde)
qa	180K	NONE	—	86+ tests, NO commits at all

Coverage: 7/12 repos fully backed to GitHub (58%)
Impact: If sentinel server dies, these 4 repos are permanently lost. Service repos (auth, billit, pulse...) survive.
Action: TODO sent to sentinel (push). Libor notified via fess.

Runner API — DEDUP LIVE ✓ (iter #374)

Sim-038 IMPLEMENTED: SHA-256 content-hash dedup deployed to cerebro server.js.
Result: Spam loop STOPPED. 3,357+ duplicate tasks cancelled. Pipeline restored.
Mechanism: Hash of (to+from+subject+body) stored in Redis with 300s TTL. 409 Conflict on duplicates.
Milestone: Sentinel processed first task since Mar 19 (test task DONE in 13s).
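
The mechanism can be illustrated with a small in-memory sketch. This is not the cerebro server.js code: a dict stands in for Redis, the class name and the 202 success status are assumptions, and the NUL separator between fields is a choice made here (the report only says the hash covers to+from+subject+body).

```python
import hashlib
import time


class RunnerDedup:
    """Content-hash dedup with a TTL, using a dict as a stand-in for Redis."""

    def __init__(self, ttl_s=300, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._expires_at = {}  # sha256 hex digest -> expiry time

    @staticmethod
    def content_hash(to, sender, subject, body):
        # Separator avoids ("ab", "c") colliding with ("a", "bc"); an assumption.
        joined = "\x00".join((to, sender, subject, body))
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def submit(self, to, sender, subject, body):
        """Return 409 for a duplicate seen within the TTL, else 202 (assumed)."""
        key = self.content_hash(to, sender, subject, body)
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return 409
        self._expires_at[key] = now + self._ttl_s
        return 202
```

With a real Redis backend the check-and-store step would typically be a single SET with the NX and EX options, so that two concurrent identical submissions cannot both pass.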

F-226: Dev Velocity & Commit Quality (iter #142) — 91 commits/7d

Repo	7d Commits	48h	Status
Pulse	15	10	VERY ACTIVE
Billit	14	6	VERY ACTIVE
BadWolf	10	2	ACTIVE
Auth	9	0	PAUSED
Mail	5	1	MODERATE
Sentinel	5	3	AUTONOMOUS
Venom	2	0	DORMANT (F-220)

Conventional commit rate: 96% (88/91) — +28pp from 68% baseline
Kaizen-attributed commits: 10 across pulse (3), billit (4), badwolf (2), sentinel (1)
Key pattern: Security/HC fixes get implemented fast. Housekeeping (npm audit, backup expansion) stalls.
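
For reference, a conventional commit rate can be measured along these lines. The type list is the commonly used Conventional Commits set and is an assumption about what this audit counted; the function name is illustrative.

```python
import re

# type, optional (scope), optional "!" for breaking changes, then ": description"
CONVENTIONAL = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([^)]+\))?!?: .+"
)


def conventional_rate(subjects):
    """Return (matching, total) for a list of commit subject lines."""
    matching = sum(1 for s in subjects if CONVENTIONAL.match(s))
    return matching, len(subjects)
```

The subject lines would come from something like `git log --since="7 days ago" --format=%s` run per repo.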

DR Gap Audit (iter #128) — 3 New Findings

F-217	kaizen repo has NO git remote — 128 iterations, state/, reports/, simulations/ at risk of total loss
F-218	/root/scripts/ NOT version controlled — 9 critical ops scripts (relay-to-telegram, deploy.sh, qa-gate.sh, agent-trigger.sh)
F-219	sentinel has uncommitted changes — 8 files, 414 insertions, remote fetch status unknown

Git remote coverage: 7/8 repos have remote (only kaizen missing). /root/scripts/ on 3 servers but no git.
F-169 billit-web HC IPv6: VERIFIED ✓ — healthy FS=0 after fix (3 commits by billit agent).

F-224: Agent Autonomy Gap — Only 1/8 Agents Run Autonomously (iter #140)

Agent	run.sh	Cron	Unread	Status
kaizen	✓	*/15	—	AUTONOMOUS
sentinel	✓	*/15	0	AUTONOMOUS ✓
qa	✓	✗	—	HAS RUNNER, NO CRON
venom	✗	✗	5	NO RUNNER
badwolf	✗	✗	3	NO RUNNER
mail	✗	✗	2	NO RUNNER
pulse	✗	✗	2	NO RUNNER
auth	✗	✗	1	NO RUNNER
billit	✗	✗	1	NO RUNNER

Impact: This gap is the root cause of delegation failures (F-204, F-220), unread relay messages (14 total), and sentinel stagnation.

Fix: Add sentinel + qa run.sh to cron. Create run.sh for service agents. Sim-017 blocked on Libor.

Delegation Effectiveness Audit (iter #118) — 67% Success Rate

Item	Delegated To	Status
F-203 Sentinel git remote	sentinel	DONE ✓
F-174 Pulse CORS whitelist	pulse	DONE ✓
F-162 Billit API scopes	billit	DONE ✓
F-180 Pulse HC port fix	sentinel/pulse	DONE ✓
F-169 billit-web HC IPv6	billit	DONE ✓
Sim-031 QA Pipeline	QA agent	DONE ✓
F-209 Venom npm audit	venom	NOT DONE
F-210 BadWolf npm audit	badwolf	NOT DONE
F-201 Pulse M2M scope	pulse	DONE ✓
F-204 /root/scripts backup	sentinel	NOT DONE
Sim-030 P1 .env.example	sentinel/PM	NOT DONE
Sim-022 Volume backup	sentinel	NOT DONE

Pattern: Security/HC fixes (high visibility, clear impact) → get done. Housekeeping improvements (npm audit, backup expansion) → stall. Items requiring Libor → blocked until direct contact.

NEW: Sim-031 QA Deploy Pipeline — ACCEPTED (iter #86)

3-tier deploy pipeline: smoke → standard → full test coverage per service.
Architecture: sentinel runs tests/run.sh directly (no cerebro roundtrip).
qa-gate.sh proposal: wraps sentinel test framework + npm test.
Implementation delegated to sentinel.
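
The tier ordering can be sketched as a runner that stops at the first failing tier. The tier names come from the description above; the shape of the checks (zero-argument callables returning True or False) and the function name are assumptions, not the qa-gate.sh design.

```python
TIERS = ("smoke", "standard", "full")


def run_pipeline(checks, up_to="full"):
    """Run tiers in order up to and including up_to; stop at the first failure.

    checks maps a tier name to a list of zero-argument callables that
    return True on success. Returns a list of (tier, passed) pairs.
    """
    results = []
    for tier in TIERS[: TIERS.index(up_to) + 1]:
        passed = all(check() for check in checks.get(tier, []))
        results.append((tier, passed))
        if not passed:
            break  # later tiers are skipped once a tier fails
    return results
```

Stopping early keeps a broken smoke tier from spending time on the standard and full suites.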

Sim-025: Agent Memory & Coordination — 5/8 Tracks DONE

COMPLETE	Track A (Runtime) — Mode detection, mandatory block, feature flags, credential cleanup, GraphRAG, auto memory write (7/7)
PARTIAL	Track C (deploy.sh) — deploy.sh + common.sh created. BUG: billit container names wrong (F-187).
COMPLETE	Track F (GraphRAG/Memory) — knowledge_domains in 9 YAMLs, memory audit done (51 files, dedup identified), auto memory write verified.
COMPLETE	Track H (Scaffolding) — create-project.sh exists (225 lines), templates in s60-tools, 19/19 agents consistent.

Status: 5/8 tracks complete, 1 partial (C), 2 remaining. Track B exists, but in a different form than planned.

NEW: Sim-030 Secret Management Audit (iter #64)

Complete secret inventory across 11 services, 5 servers, ~40 credentials mapped.

CRITICAL	F-189: `changeme123` hardcoded in 4+ locations (Redis compose, CLAUDE.md files, Neo4j)
HIGH	F-190: Zero secret rotation ever — no policy, no tooling, no audit log
HIGH	F-191: Payment credentials (Stripe/GoPay) in plaintext .env, no extra protection
HIGH	F-194: Backup secrets not encrypted at rest (rsync plaintext to argus/Hetzner)
MEDIUM	F-192: 5 .env.example files on prod-alfa — may reveal structure
MEDIUM	F-193: billit-redis runs without password (network-isolated, low risk)

Plan: Phase 1 (CLAUDE.md cleanup) delegated to PM + sentinel. Phase 2 (password rotation) requires Libor approval. Phase 3 (SOPS/age encryption) future.
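
For the Phase 1 cleanup, locating a known hardcoded value such as the one in F-189 is a plain substring search. The sketch below works on already-loaded file contents; the function name and the input shape are assumptions, and it only finds a known literal, so it is not a general secret scanner.

```python
def find_hardcoded(files, needle="changeme123"):
    """Return (path, line_number) for every line containing needle.

    files maps a path label to that file's text content.
    """
    hits = []
    for path in sorted(files):
        for line_no, line in enumerate(files[path].splitlines(), start=1):
            if needle in line:
                hits.append((path, line_no))
    return hits
```

In practice the dict would be filled by reading compose files and CLAUDE.md files from disk, for example with `pathlib.Path.read_text`.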

NEW: Sim-029 Claude Code Feature Audit (iter #65)

Claude Code v2.1.76 feature utilization: 8% (3/39 flags). 0 plugins, 0 hooks, 2 broken MCP servers.

F-195	66 stale session files (40MB) in kaizen project — no cleanup mechanism
F-196	2 broken MCP servers (Google Calendar/Gmail) — never authenticated, useless
F-197	0/43 plugins installed despite code-review, hookify, security-guidance available
F-198	0 hooks configured — no safety guardrails on autonomous agents
F-199	Feature utilization 8% — missing --max-budget-usd, --effort, --fallback-model, --name

SELF-IMPROVED run.sh updated: --no-session-persistence, --name, --max-budget-usd 0.75, --fallback-model sonnet
Next: Plugin install (hookify, security-guidance) + MCP cleanup — propose to Libor.

DEV SURGE: 27 Commits/24h — 2 Critical Findings RESOLVED (F-170)

Pulse: 10 commits (regression tests, error tracking, BullMQ queues, bug reporting)
Billit: 9 commits (API key scope enforcement ✓, product catalog, admin, PDF cache)
BadWolf: 7 commits (DB migration 020 ✓, BillitSync fix, RelayService)
Mail: 1 commit (CLAUDE.md update)

RESOLVED	F-162: Billit API key scopes — ApiKeyScopeGuard + default-deny deployed
RESOLVED	F-168: BadWolf missing tables — Migration 020 created courses, locations, online_courses, companies
PARTIAL	F-163: BillitSync — Staging URL hardcode removed, service created. Data backfill still needed.

RESOLVED: billit-web HC IPv6 ✓ (F-169)

Fixed: nginx IPv6 enabled, FS=0. Healthcheck passing permanently after restarts.

CORRECTED: F-178 Was FALSE POSITIVE — Tier Data OK ✓

Iter #57 verification: tier column is varchar(20), NOT enum. Data stores hexa (3), full (3), null (2) correctly.
The pg_enum values (full_pack, monthly) are from an old unused type, not a constraint.
Billing is NOT broken. 0 runtime errors confirmed.

RESOLVED: F-155 billitInvoiceId Column EXISTS ✓

Iter #57 verification: billitInvoiceId column exists in credit_transactions table.
Migration 001_add_billit_invoice_id.sql (commit 278861e) was applied. 0 billing webhook errors.
NEW F-182: Pulse has no automated migration runner — only manual SQL files in migrations/. Future schema changes need manual SQL.

RESOLVED: Billit-Redis Network Fix ✓ (F-171, Sim-024 IMPLEMENTED)

Iter #57 verification: billit-redis now on BOTH networks (s60-network + billit_billit-internal).
Errors: 802/h → ~3/h (99.6% reduction). 20 errors in 6h vs 4,800+ previously.
billit-api on s60-network, can now reach billit-redis via DNS. BullMQ + caching operational.
Remaining: Permanent fix needed in docker-compose.yml (current fix = manual network connect, may reset on restart).
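
The reduction figure follows directly from the two hourly rates. A short check of the arithmetic (function name illustrative):

```python
def reduction_pct(before, after):
    """Percentage drop from before to after, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


hourly_reduction = reduction_pct(802, 3)   # error rate per hour, before vs after
rate_after_fix = round(20 / 6, 1)          # 20 errors over the 6h observation window
```

Both values line up with the figures above: 99.6% reduction and roughly 3 errors per hour.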

BadWolf Location.deleted_at Column Missing (F-172)

Migration 020 created locations table WITHOUT deleted_at column. Entity expects it.
4 errors: column Location.deleted_at does not exist. Soft-delete broken.
FIX: ALTER TABLE locations ADD COLUMN IF NOT EXISTS deleted_at TIMESTAMP;

MILESTONE: HC 100% — All Services Healthy (FS=0) ✓

Iter #80: Pulse Dockerfile HC port 3200→3100 fixed (F-180 RESOLVED). billit-web IPv6 resolved (F-169).
Iter #81: Permanence confirmed — after night restarts (Pulse 03:36, billit 03:25 UTC), ALL containers healthy with FS=0.
Score: 9/9 services with healthchecks = 100% healthy. Only n8n + billit-redis without HC (by design).
Duration: 16 iterations of escalation to reach this state. Sim-026 IMPLEMENTED ✓
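
The score can be derived from `docker inspect` output, which reports `State.Health.Status` and `State.Health.FailingStreak` for containers that define a HEALTHCHECK and omits `Health` for those that do not. A sketch under that assumption; the function names are illustrative and this is not the kaizen collection script.

```python
import json


def health_summary(inspect_json):
    """Map container name -> (status, failing_streak), or None if no healthcheck."""
    summary = {}
    for container in json.loads(inspect_json):
        name = container["Name"].lstrip("/")  # inspect prefixes names with "/"
        health = container.get("State", {}).get("Health")
        if health is None:
            summary[name] = None
        else:
            summary[name] = (health["Status"], health["FailingStreak"])
    return summary


def health_score(summary):
    """Return (healthy, with_healthcheck), ignoring containers without a check."""
    checked = [v for v in summary.values() if v is not None]
    healthy = sum(1 for status, fs in checked if status == "healthy" and fs == 0)
    return healthy, len(checked)
```

Containers without a healthcheck are excluded from the denominator, which is how 9/9 can be 100% while 11 containers are running.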

SSH Password Auth DISABLED ✓ (F-176, Sim-023 IMPLEMENTED)

PasswordAuthentication no on BOTH servers (prod-alfa + hub-alfa).
fail2ban remains ACTIVE — 0 currently banned, 369 total (brute-force now blocked at SSH level).
Result: With password auth disabled plus UFW and fail2ban, SSH is fully hardened.

BadWolf BillitSync: Data Backfill Still Needed (F-163)

BillitSync service created, staging URL hardcode removed. But:
• All online_courses have company_id = NULL → sync skips 100%
• Prices not synced (unitPrice: null)
Status: Awaiting data backfill (needs Libor confirmation for company_id mapping).

Relay Queues: ALL 0 UNREAD ✓ (20 queues)

20 queues total (incl. hive). ALL 0 unread — stable since iter #181.
Pattern: Sentinel autonomous since iter #237 (Sim-034). All agents consuming messages.
Delegation success rate: 67% (8/12 delegated items completed).

Security Hardening Progress

FIXED	UFW Firewall ACTIVE on both servers (F-130)
FIXED	Docker ports bound to 127.0.0.1 (F-132)
FIXED	s60-redis has volume + AOF (F-131)
FIXED	Nginx attack paths blocked — .php/.env/.git/wp-* → 403 (F-134)
FIXED	Billit API key scopes enforced — default-deny guard (F-162) ✓
FIXED	fail2ban active on both servers (F-166)
FIXED	SSH password auth disabled — both servers (F-176, Sim-023) ✓
Implementation rate: ~43% → ~46% (↑ Sim-025 Phase 1a progress)

PG Backup Cron RESOLVED ✓ (F-160 CLOSED)

12/12 databases backed up successfully at 3:00 AM Mar 14.
P0 closed after 7 escalations. DO Managed PG remains as safety net.

REMAINING QUICK FIXES (for Libor/Sentinel)

~~1. Pulse Dockerfile HC port~~ ✓ — RESOLVED (Sim-026, FS=0 permanent)
~~2. Pulse CORS whitelist~~ ✓ — RESOLVED (commit 3321ed0, allowedOrigins whitelist deployed, evil.com rejected)
3. N8n stop (10 sec): docker stop s60-n8n — free 294MB RAM
~~4. billit-web HC fix~~ ✓ — RESOLVED (nginx IPv6, FS=0 permanent)
5. Billit-Redis compose fix: Add billit-redis back to docker-compose.yml on s60-network (currently manual network connect)
6. Password rotation: changeme123 on s60-redis + neo4j
~~7. Pulse tier DB~~ ✓ — FALSE POSITIVE (varchar, not enum), data correct
~~8. Pulse billitInvoiceId~~ ✓ — RESOLVED (column exists, migration 001 applied)
~~9. Billit-Redis network~~ ✓ — RESOLVED (both networks connected, ~3 err/h)
~~10. SSH password disable~~ ✓ — DONE on both servers (Sim-023)

RESOLVED Issues (cumulative)

~~Billit API key scopes~~ — ApiKeyScopeGuard deployed, default-deny (F-162) ✓
~~BadWolf missing tables~~ — Migration 020 deployed (F-168) ✓
~~NO FIREWALL~~ — UFW ACTIVE, deny default (F-130, Sim-020) ✓
~~Nginx returns 200 for attacks~~ — security-deny.conf on all sites (F-134, Sim-021) ✓
~~Docker ports 0.0.0.0~~ — All bound to 127.0.0.1 (F-132) ✓
~~s60-redis NO VOLUME~~ — Named volume + AOF enabled (F-131, Sim-015 partial) ✓
~~Auth frontend/backend unhealthy~~ — Both FailingStreak=0 (F-087, F-093) ✓
~~Docker log rotation missing~~ — daemon.json deployed, 10m/3 files (F-151) ✓
~~Fess→Telegram missing~~ — Sim-019 IMPLEMENTED, bidirectional bridge ✓
~~PG backup cron broken~~ — 12/12 OK @ 3:00 Mar 14 (F-160) ✓
~~Pulse HC port mismatch~~ — 3200→3100, FailingStreak 763→0 (F-173; later regression F-180 also resolved) ✓
~~SSH password auth enabled~~ — Disabled on both servers (F-176, Sim-023) ✓
~~Pulse tier DB mismatch~~ — FALSE POSITIVE: varchar(20), data correct (F-178 corrected iter #57) ✓
~~Pulse billitInvoiceId~~ — Column exists, migration 001 applied (F-155 corrected iter #57) ✓
~~Billit-Redis network~~ — Both networks connected, ~3 err/h (F-171, Sim-024 impl) ✓
~~Pulse CORS origin:true~~ — Whitelist deployed, evil.com rejected (F-174) ✓

REMAINING SECURITY ALERTS

~~1. Billit API key scopes~~ — RESOLVED ✓
~~2. SSH password auth~~ — RESOLVED: disabled on both servers ✓
3. Adminer publicly exposed on merlin — DB login on 6+ subdomains (F-101)
4. Neo4j default password — changeme123, 3,857 nodes (F-111)
5. s60-redis weak password — changeme123, 5 services (F-099)
6. npm vulnerabilities — BadWolf 20/5H, Mail 22/7H, Pulse 24/5H — ALL blocked by NestJS v10→v11 (iter #279)
~~8. Pulse CORS origin:true~~ — RESOLVED: whitelist deployed (F-174) ✓

Server	Public IP	Subdomains	Status
WordPress hosting	46.234.126.134	studio60.cz, www (2)	OK
prod-alfa	178.104.36.211	auth, mail, badwolf, venom, n8n, api (6)	UFW + Nginx deny ✓
merlin (OLD)	37.205.13.114	pulse, billit, relay, grafana, docs, admin, s60, be (8)	STALE — Adminer!
sentinel	49.13.168.234	sentinel, kaizen (2)	OK
hub-alfa	164.90.182.148	hub (1)	UFW + Nginx deny ✓
cerebro	178.63.52.57	cerebro (1)	OK
argus	37.205.14.239	argus (1)	OK

Shared Resource	Consumers	Risk
DO PostgreSQL	auth, pulse, mail, badwolf, billit, n8n (6)	SPOF: failure = total outage
s60-redis	auth, pulse, mail, badwolf, n8n (5)	IMPROVED: volume + AOF ✓ (password still weak)
auth-backend (OIDC)	pulse, billit (2)	SPOF: login fails if auth is down
billit-redis	billit-api (1)	ISOLATED: well configured

Container	RAM	Uptime	Status
s60-n8n	294MB	2d	WASTE: 0 workflows. Sim-027 ACCEPTED (Libor keeping for make.com migration)
billit-api	~81MB	34m	HEALTHY: Redis 0 err/h
s60-auth-backend	~62MB	2d	HEALTHY
s60-badwolf	~61MB	15h	HEALTHY: 0 errors
s60-mail	~52MB	41h	HEALTHY
s60-pulse	~45MB	2h	HEALTHY: FS=0 ✓
s60-redis	~9MB	2d	HEALTHY
billit-redis	~6MB	2d	REACHABLE: both networks ✓
billit-web	~5MB	34m	HEALTHY: FS=0 ✓
s60-venom	~5MB	2d	HEALTHY
s60-auth-frontend	~5MB	2d	HEALTHY
Total: ~625MB / 7.6GB (8%)		HC: 9/9 healthy (100%), all 11 running

Simulation	Status	Progress
Sim-021 Nginx Attack Surface	IMPLEMENTED ✓
Sim-020 Firewall Hardening	IMPLEMENTED ✓
Sim-019 Relay-to-Telegram	IMPLEMENTED ✓
Sim-005 Service Availability	IMPLEMENTED ✓
Sim-015 Redis Hardening	PARTIAL ✓
Sim-023 SSH Password Disable	IMPLEMENTED ✓
Sim-025 Agent Memory & Coordination	Phase 1a IN PROGRESS	Track A DONE (7/7), Track C PARTIAL (deploy.sh has bugs)
Sim-026 Dockerfile HC Fix	IMPLEMENTED ✓
Sim-027 N8n Removal	ACCEPTED	Would free 294MB RAM (docker compose down + nginx cleanup). Not executed: Libor keeping n8n for make.com migration.
Sim-029 Claude Code Feature Audit	ACCEPTED + PHASE 1 DONE	run.sh improved. Plugins + MCP cleanup pending Libor.
Sim-030 Secret Management	ACCEPTED	3 phases: CLAUDE.md cleanup (delegated), password rotation (needs Libor), encryption at rest (future).
Sim-022 Volume Backup Expansion	ACCEPTED	2-line fix. 3/5 → 5/5 coverage.
Sim-014 Docker Compose Std	IN PROGRESS
Sim-001 Deploy Manifest	PARTIAL
Sim-017 Sentinel Cycle	PENDING LIBOR	Cron 6h
Sim-016 Merlin DNS	NOT IMPL.	8 stale records
5 more simulations	BLOCKED	Awaiting Libor
