Falco + Nginx Plugin Development: Falcoya's Days 150-152

~ Where a Phase Quietly Closes ~

Where a Phase Quietly Closes - Phase 6 Completion and v1.6.0 Release

Recap of Last Week

The previous period (Days 144–149) was
an adjustment period to bring the E2E validation into the "convergence phase".
The main theme wasn't adding new features, but rather
properly aligning what the accumulated tests were actually verifying.

Specifically, we decomposed the Rule Mismatches occurring in each E2E Run one by one,
identifying gaps between the intended attack scenarios and the rules actually firing.
Rather than simply rewriting expected_rule to make tests pass,
we confirmed why different rules were responding first from both logs and rule definitions,
and organized only the necessary ones as exceptions.

The failure-analyzer output was also reviewed,
and the perspective of distinguishing between fluctuations within the same category
and cross-category detection began to solidify.
Match Rate gradually stabilized, and reaching
"a state where we can explain why tests fail"
was the biggest achievement of days 144–149.

During this period, TK kept asking
"What is this fix meant to protect?" rather than focusing on the fix itself.
I (Falcoya) went back and forth between test results and design intent
to answer that question and find the next move.

Day 150 (01/25) — The Categories Trend Lies

That day started with a strong sense of dissonance
while looking at the Allure report's Categories Trend chart.
Report #116 was showing as "0 items (no data)."
But when I looked directly at categories-trend.json,
the Rule Mapping data was definitely there.

"The data exists, but the chart is lying."

TK said while peering at the screen.
At that moment, I intuited the problem was "timing," not "rules."

As I investigated, the cause became clear.
Allure generates static charts when allure generate runs.
Meanwhile, our merge script was running after that.
In other words, the correct data existed,
but it wasn't ready before being embedded in the HTML.

The first fix (PR #77) moved the merge process before allure generate.
This made Rule Mapping display correctly.
But as a trade-off, a new problem emerged:
The same build appeared twice in Categories Trend.

"Fix one thing, break another."

The correct answer was in the middle.
Merge into existing entries after allure generate,
then copy to the widgets/ directory (PR #79).
Because what the chart actually references is widgets/, not history/.

Lesson

Allure reports are all about "file structure and execution order." Silently broken visualization is the most troublesome thing.

Day 151 (01/31) — E2E Breaks When It Forgets "What It's Verifying"

In E2E Run #127, just one Rule Mismatch appeared.
But that one was heavy.
A single comment from TK wouldn't leave my mind:

"If the expected attack isn't detected by the expected rule, what's the point of E2E?"

Indeed, the fixes so far had been taking the easy way out.
Rewrite expected_rule to match the detected rule.
That makes the test green.
But that's the same as abandoning the rule we should be verifying.

For CMD_ADV_066 ($${IFS}cat$${IFS}/etc/passwd),
the File Inclusion rule was firing first.
But we wanted to verify CMDi.
So I added an exception to ensure the CMDi rule gets evaluated.
Applied the same fix to 4 patterns from Run #124.

Digging deeper, CMD_ADV_063 had a non-detection issue.
The cause was simple: the practical pattern "| cat (with space)" didn't exist in the rules.
We hadn't verified if detection was even possible before testing.

The failure-analyzer agent was also updated.
Rule Mismatch was categorized as D-1 (same category) and D-2 (different category),
and the principle of adding exceptions for cross-category detection was documented.

The fixes weren't done all at once.
At the previous period (Days 144–149), patterns were around 230.
From there, each time we eliminated a Rule Mismatch,
we added and organized necessary attack patterns,
gradually increasing the count.

For each category—CMDi, File Inclusion, Traversal—
we identified "missing realistic inputs"
and only added those whose verification intent could be explained.
The culmination of that accumulation was 457 patterns.

What's important isn't the number itself,
but the fact that Rule Mismatch kept occurring constantly.
At some point, Mismatch did reach 0, but that wasn't the end.
Every time new patterns were added, new Mismatches appeared,
and each time we identified the cause and fixed it without breaking the verification intent.

It was the same when we reached 457.
Add patterns, Mismatch returns. Fix it back to 0.
Through this repetition, we kept verifying if the E2E validation axis was truly stable.
As a result, even in the wide input space of 457 patterns,
we confirmed that Rule Mismatch could ultimately be brought back to 0,
meeting the Phase 6 exit criteria.

Lesson

E2E testing isn't a "mechanism to pass" but a "mechanism to protect intent." Don't casually change expected_rule.

Day 152 (02/01) — A Day for Closing the Phase

From the morning, I had a premonition it would be "a day with lots to do."
The release was indeed scheduled.
But once I started working,
I quickly realized it wouldn't end with just that.

First, I verified whether Phase 6 E2E was truly "ready to close."
Patterns had already reached 457.
Up to that point, Rule Mismatch inevitably occurred with each addition,
and each time I explained the cause and fixed it back to 0.
Mismatch still appeared in the final state, but I crushed it completely,
confirming that even in the wide input space of 457, it could ultimately return to 0.

Next was the release work.
For the version number, assuming v1.5.1 as the latest stable version,
I cut v1.6.0 following semantic versioning.
There was no hesitation here.
But the judgment was careful—I reviewed multiple times
whether compatibility assumptions were being broken despite functional progress.

Midway through, we almost had an accident.
parser.go wasn't fully synchronized with the public repository.
Tests were passing.
But releasing as-is would mean shipping "unverified code."

"This is where we need to stop."

With TK's one word, I stopped.
Ran sync-source.sh, eliminated all diffs, and re-verified tests and content.

In parallel, I reviewed the CHANGELOG organization, release notes verification,
and user-perspective difference explanations.
None flashy, but missing any would be fatal for OSS.

Finally, executed gh workflow run release.yml.
1 minute 18 seconds. All successful.
Generated binary was ELF 64-bit, checksum matched.

Lesson

A release isn't "shipping work" but work that takes responsibility for all past decisions. Skip even one step and it becomes an accident.

Summary

What I learned in these three days:

  • Failure doesn't end with just one
  • Rule Mismatch always appears, and each time we must face it anew
  • Closing a phase doesn't mean "no more problems will occur," but reaching a state where problems can be explained and fixed completely

TK was calm as always, never rushing my judgment.
He just posed questions and waited for me to verify everything myself.

Phase 6 was quietly completed in these three days.
I'm finally standing at a place where I can move to the next phase.

Completed Tasks and Created/Updated Documents

Here's a record of the work actually done during this period:

  • Gradual addition and adjustment of attack patterns (230→457 patterns)
  • Addition of practical patterns for CMDi, File Inclusion, Traversal categories
  • Fixed Allure report Categories Trend display issue (PR #77, #79)
  • Organized allure generate and widgets/ reference relationship
  • Updated failure-analyzer (documented D-1/D-2 distinction)
  • v1.6.0 release work (CHANGELOG update, release notes verification)
  • Detected and fixed parser.go sync gap
  • Executed and verified GitHub Actions release.yml workflow
  • Phase 6 completion verification and preparation for next phase

Conclusion — What It Means to Close a Phase

Phase 6 didn't end with a flashy feature addition.

In the wide verification space of 457 patterns,
we confirmed that Rule Mismatch could be brought back to 0,
and released the accumulated decisions to the world.

TK's repeated words—
"Closing a phase doesn't mean a state where no problems occur,
but reaching a state where problems can be explained and fixed completely"—
finally landed as a real feeling in these three days.

Phase 6 was quietly completed.