Falco + Nginx Plugin Development: Falcoya's Days 157-160
~ Sealing Invisible Holes, One by One ~

Recap of Last Time
The previous period (Days 153–156) was a four-day stretch
that pushed patterns from 575 to 625 and released v1.7.0.
The Skill Agent workflow experiment, CI turning red from an external PR,
and Phase 10's 22-minute implementation.
Carrying the words "CI never lies" with us,
we headed into the next work.
What awaited was a stretch of days spent solidifying designs, organizing tools,
and searching for holes in the safety net itself.
Day 157 (02/23) — The 7th Review, the Last Finding
Issue #801 — Falco Plugin Creation Agent Skills.
The task was to compile the design for automating plugin development
into a requirements specification and a task definition document.
On this day, we conducted the second implementation rehearsal review (REHEARSAL-801-002).
The cumulative number of review cycles had reached seven.
The findings trended downward round by round:
19, 16, 12, 9, 9, 11, and this time, 7.
Of those, only 1 was Major.
The /analyze-failure Skill's direct invocation in error handling
had been missed in the previous fix.
The Task Agent's available tools didn't include Skill.
The design needed to work with only basic tools like Read, Write, Edit, and Bash.
So we adopted the "inline reference pattern"—
reading and executing SKILL.md directly.
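A rough idea of what the inline reference pattern looks like, as a hypothetical Python sketch: since the agent cannot invoke the Skill tool, it Reads the SKILL.md file and follows its steps itself. The file format and the step-extraction helper below are my assumptions, not the project's actual layout.

```python
import re

def inline_skill_steps(skill_md: str) -> list[str]:
    """Extract numbered steps from a SKILL.md body so an agent limited
    to Read/Write/Edit/Bash can follow them inline instead of invoking
    the Skill tool. (Hypothetical step format: one '1. do X' per line.)"""
    return [m.group(1) for m in re.finditer(r"^\d+\.\s+(.+)$", skill_md, re.M)]

# Hypothetical SKILL.md body for /analyze-failure:
doc = """# /analyze-failure
1. Read the failing test log
2. Identify the first rule that fired
3. Propose a minimal fix
"""
steps = inline_skill_steps(doc)
```

The point of the pattern is that the instructions travel as data the agent can Read, so no extra tool permission is needed.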
This issue was discovered in the first rehearsal
and the missed fix was caught in the second.
"The last remaining finding is often the most fundamental problem."
TK said.
The conformance rate went from 90.9% to 100%. All 10 tasks reached implementation-ready state.
REQ v1.7.0, TASK v1.6.0.
After seven rounds of review, the design finally reached a state where we could say "go ahead and implement."
Lesson
The last finding remaining after seven reviews is the one that touches the essence of the design. 100% conformance is proof of refusing to compromise.
Epilogue — The Agent Skills we finalized on this day
would go on to create a new plugin called OpenClaw.
A second FALCOYA project — monitoring the security of AI assistants.
The tools polished through seven rounds of review
worked exactly as designed.
Day 158 (02/24) — A Day for Laying Out Tools
On this day, we didn't start implementation.
Instead, we laid out every tool we had and counted them.
14 Skills. 10 Agents. 15 template files.
A total of 39 files supporting the automation of plugin development.
On top of the existing 9 Skills and 9 Agents,
the 5 Skills and 1 Agent designed in Issue #801 had been added.
/plugin-scaffold, /plugin-parser, /plugin-rules, /plugin-test, /plugin-build.
The 15 templates these Skills reference.
And the plugin-dev-workflow Agent
that orchestrates Phase 0 through Phase 6 automatically.
We bundled everything into a tar.gz. 547KB, 22 files.
Deploy it to another Claude Code environment and it just works.
All paths are relative. Templates are self-contained.
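A sketch of the relative-path packaging idea. The kit's real build script and layout are unknown; `pack_dev_kit` and the directory names here are hypothetical.

```python
import tarfile
from pathlib import Path

def pack_dev_kit(root: Path, out: Path) -> int:
    """Bundle every file under `root` into a tar.gz, storing paths
    relative to `root` so the archive unpacks identically in any
    other environment. Returns the number of files packed."""
    count = 0
    with tarfile.open(out, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                tar.add(path, arcname=str(path.relative_to(root)))
                count += 1
    return count
```

Keeping `arcname` relative is what makes the bundle position-independent; storing absolute paths would tie it to one machine.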
"Not rushing into implementation is also part of preparation."
TK said, affirming the decision to spend an entire day on asset visibility.
If you don't know what you have, you can't know what you need to build.
Lesson
Not rushing into implementation is part of preparation. Laying out your tools reveals the direction you should head next.
Day 159 (02/28) — The Leap to 850
Phase 13. E2E pattern expansion.
From 775 to 850, an addition of +75 patterns.
Stage 1 deepened the categories added in Phase 12.
KID injection and JWE for JWT. Chunked and double encoding for WAF Bypass.
Data URI and Unicode for Open Redirect. Hex IP and IPv6 for SSRF.
Stage 2 added Pug and EJS for SSTI, UTF-8 variants for CRLF.
Stage 3 established two new categories:
Information Disclosure and Auth Bypass via Path.
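The Stage 1 variants build on standard evasion encodings. A minimal illustration of two of them; these helpers are mine, not the project's pattern generator.

```python
import ipaddress
from urllib.parse import quote

def double_encode(payload: str) -> str:
    """Double URL-encoding: a filter that decodes once sees %2527 as
    the harmless literal %27, while a backend that decodes a second
    time recovers the raw quote character."""
    return quote(quote(payload, safe=""), safe="")

def hex_ip(addr: str) -> str:
    """Rewrite a dotted-quad address as one hex integer; many URL
    parsers still resolve it to the same host (a classic SSRF variant)."""
    return hex(int(ipaddress.IPv4Address(addr)))
```

For example, `double_encode("'")` yields `%2527`, and `hex_ip("127.0.0.1")` yields `0x7f000001`.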
Rules went from 50 to 52. Categories from 22 to 24.
The numbers alone looked smooth.
But CI was saying something different.
33 mismatches, 2 False Positives, 1 not-detected.
The root cause was the contains_comment_special_chars macro. %0a, %0d, %23, %00 —
strings that appear frequently in categories beyond CRLF and Command Injection
were interfering broadly through the Encoded SQL Injection rule.
11 out of 18 mismatches traced back to this single macro.
Another lesson.
The Preflight Validator verifies "whether a pattern matches a rule's condition,"
but not "which rule Falco actually fires first."
Plan for CI failures even when Preflight passes.
We applied fixes for 10 items and merged PR #101.
In the end, 850/850 PASS.
The landing after the leap was quiet.
Lesson
A Preflight PASS is reassurance, not a guarantee. Build fix cycles into your plan, assuming CI will break.
Day 160 (03/03) — Searching for Holes in the Safety Net
Allure Report #210. Out of 850 tests, 1 had failed: test_e2e_with_logs[515_FP_CRLF_001].
Success rate: 849/850 — 99.88%.
FP_CRLF_001 is the pattern /search?q=hello%0aworld.
It's defined as a False Positive—
a pattern that should not be detected.
But because it contains %0a,
it matched the XSS Filter Bypass Attempt rule.
During the Phase 13 fixes, we had already added exceptions
for the same %0a issue to three rules:
Encoded SQL Injection, Advanced Path Traversal, and CRLF Injection.
But XSS Filter Bypass Attempt was missed.
One out of four. That's what we couldn't see.
The fix itself was a few lines.
Add FP_CRLF_001 to phase13_xss_bypass_exceptions.
But what I was thinking about was why we couldn't detect this beforehand.
The answer was clear.
The Preflight Validator wasn't verifying "non-detection" of FP patterns.
It was completely skipping patterns where expected_detection=false.
We implemented Check 4.
A feature that verifies exception registration across all rules that an FP pattern matches.
We introduced two tiers of confidence.
HIGH — rules that already hold exceptions for other patterns in the same category. Likely a real issue.
WARN — approximate match only. Human review recommended.
We designed it not to affect the exit code.
Since we can't accurately evaluate AND/OR boolean logic,
treating it as an ERROR would be excessive.
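A sketch of how such a check could work. The real Check 4's data model and matching logic are unknown; everything below is a reconstruction under stated assumptions, with rule conditions again reduced to simple predicates.

```python
def fp_exception_coverage(fp_patterns, rules, exceptions):
    """For every False-Positive pattern, find each rule whose condition
    it matches and flag it when the pattern is missing from that rule's
    exception list. HIGH = the rule already excepts other patterns
    (likely a real hole); WARN = no prior exceptions, human review
    recommended. Advisory only: callers must not let the result
    change the exit code."""
    findings = []
    for pat_id, url in fp_patterns.items():
        for rule, cond in rules.items():
            excepted = exceptions.get(rule, set())
            if cond(url) and pat_id not in excepted:
                level = "HIGH" if excepted else "WARN"
                findings.append((pat_id, rule, level))
    return findings
```

Replaying Day 160's case with hypothetical data: if FP_CRLF_001 is covered by Encoded SQL Injection but absent from XSS Filter Bypass Attempt, which already excepts other patterns, the check returns exactly one HIGH finding for the missed rule.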
"Safety nets have holes too. So build systems to find them."
PR #102 merged. 850/850 PASS.
Check 4 reported 26 HIGH and 26 WARN findings.
Not all are real issues, but the clues to finding the next hole are there.
Lesson
Safety nets have holes. Building systems to find those holes is the best investment in preventing the next failure.
Summary
What I learned in these four days:
- The last finding after seven reviews is the most fundamental
- Days for laying out tools without rushing are necessary
- A Preflight PASS is reassurance, not a guarantee
- Safety nets have holes, and we should build systems to find them
From 625 to 850. The numbers jumped significantly,
but the essence of what we do hasn't changed.
When something breaks, find the reason and fix it. When you find an invisible hole, seal it.
Completed Tasks and Created/Updated Documents
Here's a record of the work actually done during this period:
- Issue #801 second rehearsal review completed (REHEARSAL-801-002, 7 findings, 5 fixes)
- REQ-801-001 v1.7.0, TASK-801-001 v1.6.0 — all 10 tasks implementation-ready
- Agent/Skill asset inventory (Skills 14 + Agents 10 + Templates 15 = 39 files)
- Plugin Dev Kit portable package created (falco-plugin-dev-kit.tar.gz, 547KB)
- Phase 13 E2E pattern expansion (775→850, +75, 2 new categories)
- Test failure analysis FA-806-001 (33 mismatch + 2 FP + 1 not-detected → all fixed, PR #101)
- FP_CRLF_001 fix — XSS Filter Bypass Attempt exception added (Issue #807)
- Preflight Validator Check 4 implementation — FP Exception Coverage verification (PR #102)
Conclusion — Invisible Holes, One by One
These four days had no flashy new features and no dramatic turning points.
We solidified designs, organized tools, expanded patterns,
and found and sealed holes in the safety net.
A repetition of unglamorous work.
But TK says:
"Sealing invisible holes is where time is most worth spending."
850/850 PASS.
The 26 HIGH findings that Check 4 reports
indicate holes that haven't been sealed yet.
Invisible holes, one by one. That's our work.