Detection-as-Code: How I Cut False Positives by 60%
When you inherit a detection platform with hundreds of rules in a multi-account AWS environment, the first instinct is to add more. More rules, more coverage, more alerts.
The problem: more alerts don't mean more security. They mean more noise, more analyst fatigue, and more real alerts buried under false positives.
This article explains how I led a systematic program to reduce false positive volume by 60% without sacrificing real coverage.
The Problem: Death by Alerts
The initial state:
- Hundreds of detections in production covering cloud, identity, SaaS, and endpoint telemetry
- Multi-account AWS environment with dozens of accounts, each with its own behavioral pattern
- Unsustainable alert volume: the security team couldn’t distinguish critical from noise
- No testing: rules deployed directly to production with no validation
BEFORE
┌──────────────────────────┐
│  CloudTrail   GuardDuty  │
│  Identity     SaaS Logs  │
│  Endpoint     VPC Flow   │
└────────────┬─────────────┘
             │
      ┌──────▼──────┐
      │    SIEM     │
      │  (hundreds  │
      │  of rules)  │
      └──────┬──────┘
             │
┌────────────▼────────────┐
│   AVALANCHE OF ALERTS   │
│  (60% false positives)  │
└─────────────────────────┘
Step 1: Telemetry Audit
Before touching a single rule, I ran a full telemetry audit. I discovered:
- Stale sources: critical log sources that had gone months without sending data. Nobody noticed because there was no source health monitoring
- Broken routing: a significant portion of detections weren’t reaching the security team due to a silent routing configuration failure
The audit revealed the problem wasn’t just the rules — it was the infrastructure underneath them.
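A source health check is the kind of control that would have caught the stale sources earlier. The following is a minimal sketch, not the original implementation: the source names, the silence thresholds, and the `find_stale_sources` function are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-source silence thresholds; tune these to each source's real cadence.
MAX_SILENCE = {
    "cloudtrail": timedelta(hours=1),
    "guardduty": timedelta(hours=24),
    "vpc_flow": timedelta(hours=1),
}
DEFAULT_MAX_SILENCE = timedelta(hours=6)


def find_stale_sources(last_seen, now=None):
    """Return the names of sources that have been silent longer than allowed.

    last_seen maps a source name to the timestamp of its most recent event,
    or to None if the source has never delivered data.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for source, seen_at in last_seen.items():
        limit = MAX_SILENCE.get(source, DEFAULT_MAX_SILENCE)
        if seen_at is None or now - seen_at > limit:
            stale.append(source)
    return sorted(stale)
```

Run on a schedule, a check like this turns "nobody noticed" into an alert of its own. The `last_seen` mapping would typically come from a max-timestamp query per log source.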
Step 2: Detection-as-Code
Every detection is treated as code: it lives in Git, ships with unit tests, and reaches production only through CI/CD.
Detection Structure
# detection: iam_role_creation_unusual_hours
# severity: medium
# tactic: persistence

# is_known_automation, is_outside_business_hours, and is_infrastructure_role
# are helper functions assumed to be defined elsewhere in the rule package.

def rule(event):
    return (
        event.get("eventName") == "CreateRole"
        and not is_known_automation(event)
        and is_outside_business_hours(event)
        and not is_infrastructure_role(event)
    )

def title(event):
    return f"IAM Role created outside business hours by {event.deep_get('userIdentity', 'arn')}"

def alert_context(event):
    return {
        "account": event.get("recipientAccountId"),
        "region": event.get("awsRegion"),
        "role_name": event.deep_get("requestParameters", "roleName"),
        "user": event.deep_get("userIdentity", "arn"),
    }
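The rule calls three helpers that are not shown. One way they could look is sketched below. Everything here is an assumption for illustration: the role names, the name prefixes, the 08:00 to 18:00 UTC weekday schedule, and the `_nested` and `_role_name` utilities. The role-name parsing also ignores IAM paths for brevity.

```python
from datetime import datetime

# Illustrative values only; a real catalog would be maintained per account.
AUTOMATION_ROLES = {"terraform", "ci-deploy"}
INFRA_ROLE_PREFIXES = ("infra-", "cicd-")
BUSINESS_HOURS_UTC = range(8, 18)  # 08:00 to 17:59 UTC, Monday to Friday


def _nested(event, *keys):
    """Walk nested mappings; return None as soon as a key is missing."""
    value = event
    for key in keys:
        if not hasattr(value, "get"):
            return None
        value = value.get(key)
    return value


def _role_name(arn):
    """Extract the role name from an IAM role or STS assumed-role ARN, else None."""
    parts = arn.split(":")[-1].split("/")
    if len(parts) > 1 and parts[0] in ("role", "assumed-role"):
        return parts[1]
    return None


def is_known_automation(event):
    arn = _nested(event, "userIdentity", "arn") or ""
    return _role_name(arn) in AUTOMATION_ROLES


def is_outside_business_hours(event):
    raw = event.get("eventTime")
    if not raw:
        return False  # no timestamp, so this condition cannot be evaluated
    # CloudTrail records eventTime in UTC, e.g. "2024-01-03T03:00:00Z"
    when = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
    return when.weekday() >= 5 or when.hour not in BUSINESS_HOURS_UTC


def is_infrastructure_role(event):
    role_name = _nested(event, "requestParameters", "roleName") or ""
    return role_name.startswith(INFRA_ROLE_PREFIXES)
```

Matching on the exact role name rather than a substring of the ARN avoids suppressing a look-alike such as `terraform-backdoor`.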
Unit Testing
Each rule has positive and negative test data:
def test_rule_fires_on_manual_creation():
    event = create_event(
        eventName="CreateRole",
        hour=3,  # 3 AM
        user="arn:aws:iam::123:user/human"
    )
    assert rule(event) is True

def test_rule_ignores_terraform():
    event = create_event(
        eventName="CreateRole",
        hour=3,
        user="arn:aws:iam::123:role/terraform"
    )
    assert rule(event) is False
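The tests depend on a `create_event` factory that is not shown. A minimal sketch of one possible version follows. The `FakeEvent` class, the fixed Wednesday date, and the default field values are assumptions; the only requirement taken from the rule code is that events expose both `get` and `deep_get`.

```python
class FakeEvent(dict):
    """A dict that also offers the deep_get accessor the rule code expects."""

    def deep_get(self, *keys, default=None):
        value = self
        for key in keys:
            if not isinstance(value, dict):
                return default
            value = value.get(key)
        return default if value is None else value


def create_event(eventName, hour, user, **overrides):
    """Build a CloudTrail-shaped test event at the given UTC hour."""
    event = FakeEvent(
        eventName=eventName,
        eventTime=f"2024-01-03T{hour:02d}:00:00Z",  # fixed weekday keeps tests deterministic
        awsRegion="us-east-1",
        recipientAccountId="123",
        userIdentity={"arn": user},
        requestParameters={"roleName": "example-role"},
    )
    event.update(overrides)
    return event
```

Pinning the date means a test about "3 AM" never changes its result depending on the day the suite runs.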
Automated Deployment
┌─────────┐    ┌──────────┐    ┌──────────┐
│   Git   │───▶│  CI/CD   │───▶│   SIEM   │
│  Push   │    │  Tests   │    │  Deploy  │
└─────────┘    └──────────┘    └──────────┘
     │              │               │
 Detection      Unit tests     Rule active
  as code       + linting      in production
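One gate the CI stage can enforce is that no rule ships without tests. The sketch below assumes a hypothetical repository layout with rules in `rules/<name>.py` and tests in `tests/test_<name>.py`; neither the layout nor the function names come from the original pipeline.

```python
from pathlib import Path


def rules_missing_tests(rules_dir, tests_dir):
    """Return rule module names that have no tests/test_<name>.py counterpart."""
    rules = {p.stem for p in Path(rules_dir).glob("*.py") if p.stem != "__init__"}
    tested = {p.stem[len("test_"):] for p in Path(tests_dir).glob("test_*.py")}
    return sorted(rules - tested)


def main(rules_dir="rules", tests_dir="tests"):
    """Print every untested rule and return a CI-friendly exit code."""
    missing = rules_missing_tests(rules_dir, tests_dir)
    for name in missing:
        print(f"missing tests for rule: {name}")
    return 1 if missing else 0
```

A CI job could call `sys.exit(main())` before the unit tests run, so an untested rule fails the build instead of reaching the deploy step.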
Step 3: Systematic Noise Reduction
The reduction followed three strategies:
1. Infrastructure Role Baselining
The #1 source of false positives: infrastructure roles doing normal things.
Terraform, CI/CD, Lambda functions — all generate events that look suspicious but are normal operations. I built a baselining system:
- Catalog of infrastructure roles per account
- Normal behavior pattern per role
- Dynamic suppression: if behavior matches baseline, no alert
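The three pieces above can be sketched as a small in-memory structure. This is an illustration under assumptions, not the production system: `RoleBaseline`, `should_alert`, and the idea of modeling "normal behavior" as a set of API call names per role are simplifications chosen to keep the example short.

```python
from collections import defaultdict


class RoleBaseline:
    """Per-account catalog of infrastructure roles and the API calls each normally makes."""

    def __init__(self):
        # (account_id, role_name) -> set of event names considered normal
        self._normal = defaultdict(set)

    def learn(self, account_id, role_name, event_name):
        """Record an observed call as part of the role's normal behavior."""
        self._normal[(account_id, role_name)].add(event_name)

    def is_expected(self, account_id, role_name, event_name):
        return event_name in self._normal.get((account_id, role_name), set())


def should_alert(rule_matched, baseline, account_id, role_name, event_name):
    """Suppress a rule match only when the behavior matches the role's baseline."""
    return rule_matched and not baseline.is_expected(account_id, role_name, event_name)
```

Keying the baseline on the account as well as the role matters in a multi-account environment: a role that creates IAM roles routinely in one account still alerts if it does so in another.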
2. Contextual Enrichment
Each alert is enriched with context before reaching the analyst:
- Identity: human, service, or automation?
- Account: production, staging, or sandbox?
- History: has this activity occurred before?
- Schedule: within the responsible team’s business hours?
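A minimal sketch of those four enrichments follows. The lookup tables passed in (`account_tiers`, `automation_roles`, `seen_before`), the single UTC business-hours window, and the function names are assumptions for illustration. The `userIdentity.type` value `AWSService` and the UTC `eventTime` format are standard CloudTrail fields.

```python
def _role_name(arn):
    """Extract the role name from an IAM role or STS assumed-role ARN, else None."""
    parts = arn.split(":")[-1].split("/")
    if len(parts) > 1 and parts[0] in ("role", "assumed-role"):
        return parts[1]
    return None


def classify_identity(user_identity, automation_roles):
    """Label the actor as 'service', 'automation', or 'human'."""
    if user_identity.get("type") == "AWSService":
        return "service"
    if _role_name(user_identity.get("arn", "")) in automation_roles:
        return "automation"
    return "human"


def enrich(event, account_tiers, automation_roles, seen_before, business_hours=range(9, 18)):
    """Return context fields to attach to an alert before an analyst sees it."""
    identity = event.get("userIdentity", {})
    hour = int(event["eventTime"][11:13])  # "2024-01-03T03:00:00Z" -> 3
    return {
        "identity_type": classify_identity(identity, automation_roles),
        "account_tier": account_tiers.get(event.get("recipientAccountId"), "unknown"),
        "seen_before": (identity.get("arn"), event.get("eventName")) in seen_before,
        "in_business_hours": hour in business_hours,
    }
```

In practice `business_hours` would be looked up per owning team and time zone; a single window is used here only to keep the sketch short.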
3. Output Routing Optimization
Not all alerts need the same destination:
| Severity | Destination | Response Time |
|---|---|---|
| Critical | PagerDuty + Slack | Immediate |
| High | Security Slack channel | < 1 hour |
| Medium | Auto-created ticket | < 24 hours |
| Low/Info | Dashboard only | Weekly review |
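Expressed as code, the table becomes a lookup that can be reviewed and tested like any rule. The destination identifiers below are hypothetical labels, not real integration names. The one deliberate design choice in this sketch is that an unknown severity raises an error instead of being dropped, since a silent routing failure is exactly what the audit uncovered.

```python
# Severity -> destinations, mirroring the routing table. Identifiers are illustrative.
ROUTES = {
    "critical": ["pagerduty", "slack"],
    "high": ["security_slack_channel"],
    "medium": ["ticket"],
    "low": ["dashboard"],
    "info": ["dashboard"],
}


def destinations_for(severity):
    """Return the destinations for a severity, failing loudly on unknown values."""
    try:
        return ROUTES[severity.lower()]
    except KeyError:
        raise ValueError(f"no route configured for severity {severity!r}") from None
```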
Step 4: Data Lake Optimization
Query performance was another bottleneck. I improved SIEM query performance by more than 90% through three changes:
- Re-architected legacy infrastructure from EC2 to event-driven Lambda
- Fixed a critical data loss bug in the pipeline
- Optimized partitioning and compression
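The article does not describe the actual partition layout, so the following is only a generic illustration of the idea: writing objects under Hive-style `key=value` prefixes lets a query engine skip every partition that a date or account filter excludes. The `partition_prefix` function and the chosen keys are assumptions.

```python
from datetime import datetime, timezone


def partition_prefix(log_type, account_id, event_time):
    """Build a Hive-style prefix from a timezone-aware event timestamp."""
    t = event_time.astimezone(timezone.utc)
    return f"{log_type}/account={account_id}/year={t:%Y}/month={t:%m}/day={t:%d}/"
```

With a layout like this, a query restricted to one account and one day reads a single prefix instead of scanning the whole data set.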
Results
After 6 months of systematic work:
| Metric | Before | After | Improvement |
|---|---|---|---|
| False positives | Baseline | -60% | Reduction |
| Query performance | Baseline | +90% | Speed |
| Coverage | X rules | >X rules | More detections |
| Testing | 0% | 100% of rules | Unit tested |
| Stale sources | Unknown | 0 | Monitored |
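Tracking a number like the false positive reduction requires recording a verdict for every triaged alert. A minimal sketch of the arithmetic, assuming hypothetical `(rule_id, verdict)` records where the verdict is either `"true_positive"` or `"false_positive"`:

```python
from collections import Counter


def false_positive_rate_by_rule(triaged_alerts):
    """Map each rule ID to the share of its triaged alerts that were false positives."""
    total, false_positives = Counter(), Counter()
    for rule_id, verdict in triaged_alerts:
        total[rule_id] += 1
        if verdict == "false_positive":
            false_positives[rule_id] += 1
    return {rule_id: false_positives[rule_id] / total[rule_id] for rule_id in total}


def relative_change(before, after):
    """Fractional change from before to after; a negative value is a reduction."""
    return (after - before) / before
```

Sorting rules by their false positive rate gives a natural tuning backlog, and `relative_change` applied to false positive counts from two comparable periods yields the headline reduction figure.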
Lessons
- Less noise > more rules. A detection generating 100 false positives is worse than no detection — it creates alert fatigue
- Audit your sources first. If your logs are stale or your alerts aren’t reaching you, it doesn’t matter how good the rules are
- Treat detections as code. Testing, code review, CI/CD. If it has no tests, it doesn’t deploy
- Context is everything. The same action can be an attack or normal operations. Without context, it’s noise
- Measure noise, not just coverage. Coverage metrics alone incentivize more rules, not better rules
This article reflects general detection engineering patterns. Specific implementation details are generic and do not represent any particular organization’s architecture.