We turned on auto-ban in production six months ago. Here's what the data shows about real-world attacks, false positives, and the thresholds that actually work.
When we shipped auto-ban in late 2025, the question wasn't whether it would work. The detection logic is straightforward — count failures, ban the device when the count crosses a threshold. The real question was whether our thresholds matched reality. Six months and roughly 50,000 blocked requests later, we have an answer.
Enravo Core's threat layer watches for three categories of signal: signature verification failures (the PoP signature didn't match), nonce replays (the same nonce was submitted twice within the window), and attestation failures (the device's app integrity check didn't pass). Each signal has its own threshold and its own response.
The thresholds (5 signature failures, 1 nonce replay, 3 attestation failures) weren't theoretical. They came from a calibration period in which we logged failures but didn't act on them, comparing what normal devices do with what we suspected attackers would do. The signature threshold of 5 is the most aggressive: most legitimate devices fail zero or one signature check in their lifetime.
Six months in, the data is surprisingly clean. About 94% of devices have zero security failures across their entire active lifetime. The remaining 6% break down into a tight pattern:
That last bucket is where it gets interesting. We expected most bans to be "buggy clients that didn't implement signing correctly." That happens — about 12% of bans turned out to be SDK integration issues. The other 88%? Genuine attack traffic.
We've seen four recurring patterns. None of them are clever. All of them get caught fast.
Pattern 1: Scripted token replay. The attacker captures a token from somewhere (a leaked log, a compromised browser extension) and starts hitting endpoints. They send the token in the Authorization header and omit the PoP signature entirely. Failure on the first request. Five more in quick succession because the script doesn't back off. Banned within a minute.
Pattern 2: Stale signature replay. The attacker captures a valid signed request and tries to replay it. The first replay fails on nonce reuse, which triggers an instant permanent ban. They never get to a second attempt.
Pattern 3: Repackaged app. Someone modifies the Rakton POS APK, re-signs it with their own certificate, and runs it on an emulator. Play Integrity flags the request immediately. After three attestation failures, the device is banned and flagged for review. We've reviewed every case and they've all been malicious.
Pattern 4: Brute-force key signing. Rare. Someone extracts what they think is the private key and tries to sign requests. The signatures don't validate because the key doesn't match the registered public key. Five failures, banned. We've seen this twice in six months.
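The nonce reuse check behind Pattern 2 could be sketched roughly as follows. This is a minimal in-memory illustration, not Enravo Core's actual implementation: the `NonceCache` class, its method names, and the `windowMs` parameter are all assumptions made for the example.

```typescript
// Hypothetical sketch: remembers nonces per device for a fixed validity window.
class NonceCache {
  // Maps "deviceId:nonce" to the timestamp (ms) at which the entry expires.
  private seen = new Map<string, number>();

  constructor(private windowMs: number) {}

  // Returns true when the nonce is fresh, false when it is a replay
  // of a nonce already seen from the same device within the window.
  checkAndStore(deviceId: string, nonce: string, now: number): boolean {
    this.evictExpired(now);
    const key = `${deviceId}:${nonce}`;
    if (this.seen.has(key)) {
      return false; // replay: the caller would record a nonceReplay failure
    }
    this.seen.set(key, now + this.windowMs);
    return true;
  }

  // Drops entries whose window has elapsed so the cache stays bounded.
  private evictExpired(now: number): void {
    this.seen.forEach((expiresAt, key) => {
      if (expiresAt <= now) {
        this.seen.delete(key);
      }
    });
  }
}
```

A cache like this only helps if requests older than the window are rejected on their timestamp as well, otherwise an attacker could simply wait for the entry to expire. A multi-instance deployment would also need a shared store rather than process memory.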
Originally, we proposed 10 failures as the signature threshold. Engineering's reasoning: clock drift, NTP sync delays, app upgrades mid-flight — give legitimate clients room to recover. Security's reasoning: 10 failures is enough probing for an attacker to learn things about the endpoint surface.
We ran 30 days of production data with the threshold logged but not enforced, then plotted the distribution of failures-per-device. The result was bimodal. Legitimate devices clustered at 0-2 failures. Attack traffic started at 4-5 and went up from there. Almost nothing landed in the 3-4 range. Setting the threshold at 5 catches the attack traffic without false positives.
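A log-only calibration pass like this one could be analyzed with something along these lines. The `FailureEvent` shape and the function names are hypothetical, and the sketch counts failures over the whole logging period rather than per time window, which is a simplification.

```typescript
// Hypothetical shape of a failure logged during the non-enforcing period.
interface FailureEvent {
  deviceId: string;
  timestamp: number; // ms since epoch
}

// Counts logged failures per device. Devices with zero failures never
// appear in the log, so they are absent from the result.
function countByDevice(events: FailureEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const event of events) {
    counts.set(event.deviceId, (counts.get(event.deviceId) ?? 0) + 1);
  }
  return counts;
}

// Builds a histogram where buckets[k] is the number of devices
// with exactly k failures.
function histogram(counts: Map<string, number>): number[] {
  const buckets: number[] = [];
  counts.forEach((n) => {
    while (buckets.length <= n) {
      buckets.push(0);
    }
    buckets[n] += 1;
  });
  return buckets;
}

// Number of devices that would have been banned at a candidate threshold.
function wouldBan(buckets: number[], threshold: number): number {
  return buckets.slice(threshold).reduce((sum, n) => sum + n, 0);
}
```

Comparing `wouldBan` across candidate thresholds, and then inspecting the devices that fall just above each one, is one way to see whether a threshold sits in the gap between the two modes.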
    // Production threshold configuration
    export const banPolicy = {
      signatureFailure: {
        threshold: 5,
        window: 60_000, // ms
        banDuration: 24 * 60 * 60_000, // 24h
      },
      nonceReplay: {
        threshold: 1,
        window: null,
        banDuration: null, // permanent
      },
      attestationFailure: {
        threshold: 3,
        window: 5 * 60_000,
        banDuration: null, // pending review
      },
    };

We could only calibrate this carefully because the platform owns the entire request path. Every failure is logged in the same format, traced to a specific device and policy. There's no "WAF says it's bad" black box upstream of us. The detection logic lives where the trust decisions live: inside the guard pipeline, with full context about the request.
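A policy object shaped like the production configuration could be enforced with a sliding-window counter per device and signal. The sketch below is an assumption about how such enforcement might look, not the platform's actual guard pipeline. The `BanTracker` class, the `Rule` and `Ban` types, and the in-memory storage are all hypothetical. Only the threshold values are taken from the article.

```typescript
type Signal = "signatureFailure" | "nonceReplay" | "attestationFailure";

interface Rule {
  threshold: number;
  window: number | null;      // ms; null means no window, count every failure
  banDuration: number | null; // ms; null means the ban does not expire on its own
}

// Same values as the article's configuration, repeated to keep this self-contained.
const banPolicy: Record<Signal, Rule> = {
  signatureFailure: { threshold: 5, window: 60_000, banDuration: 24 * 60 * 60_000 },
  nonceReplay: { threshold: 1, window: null, banDuration: null },
  attestationFailure: { threshold: 3, window: 5 * 60_000, banDuration: null },
};

interface Ban {
  signal: Signal;
  bannedAt: number;
  expiresAt: number | null;
}

class BanTracker {
  private failures = new Map<string, number[]>(); // "deviceId:signal" -> timestamps
  private bans = new Map<string, Ban>();          // deviceId -> active ban

  constructor(private policy: Record<Signal, Rule>) {}

  // Records one failure and returns a Ban if the threshold was reached.
  recordFailure(deviceId: string, signal: Signal, now: number): Ban | null {
    const rule = this.policy[signal];
    const key = `${deviceId}:${signal}`;
    const previous = this.failures.get(key) ?? [];
    // Keep only failures still inside the window (all of them if there is none).
    const recent = previous.filter((t) => rule.window === null || now - t < rule.window);
    recent.push(now);
    this.failures.set(key, recent);
    if (recent.length < rule.threshold) {
      return null;
    }
    const ban: Ban = {
      signal,
      bannedAt: now,
      expiresAt: rule.banDuration === null ? null : now + rule.banDuration,
    };
    this.bans.set(deviceId, ban);
    return ban;
  }

  // True while the device has an unexpired ban.
  isBanned(deviceId: string, now: number): boolean {
    const ban = this.bans.get(deviceId);
    if (!ban) {
      return false;
    }
    if (ban.expiresAt !== null && now >= ban.expiresAt) {
      this.bans.delete(deviceId);
      return false;
    }
    return true;
  }
}
```

Passing `now` explicitly instead of calling `Date.now()` inside the class keeps the logic deterministic, which makes it easy to replay logged failures through the tracker when testing a new threshold.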
Teams that buy threat detection as a third-party product give up this leverage. They get rules someone else calibrated against someone else's traffic. We've seen the result: too aggressive and legitimate users get blocked, too permissive and the rules don't catch anything new. Calibration has to live next to the data.
Three things we're tuning in the next quarter. First, lowering the attestation threshold from 3 to 2: every attestation failure we've seen has been an attack, so we can afford to be more aggressive. Second, adding a graduated response: 2 signature failures trigger a soft warning to the client, allowing well-behaved SDKs to back off before they get banned. Third, exposing the per-device failure history in admin dashboards so customer success teams can reach out before a ban happens.
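The graduated response in the second item might take a shape like the one below. Since the feature is still planned, everything here is an assumption: the `Action` type, the function names, and the backoff parameters are illustrative. Only the warning level of 2 and the ban threshold of 5 come from the article.

```typescript
type Action = "allow" | "warn" | "ban";

// Maps the number of recent signature failures to a response.
// warnAt = 2 and banAt = 5 mirror the numbers discussed in the article.
function decideAction(recentFailures: number, warnAt = 2, banAt = 5): Action {
  if (recentFailures >= banAt) {
    return "ban";
  }
  if (recentFailures >= warnAt) {
    return "warn";
  }
  return "allow";
}

// Client side: exponential backoff after consecutive warnings, capped at capMs.
// baseMs and capMs are hypothetical defaults.
function backoffDelayMs(consecutiveWarnings: number, baseMs = 1000, capMs = 60000): number {
  if (consecutiveWarnings <= 0) {
    return 0;
  }
  return Math.min(capMs, baseMs * Math.pow(2, consecutiveWarnings - 1));
}
```

An SDK that pauses for `backoffDelayMs` after each warning would let the server's failure window drain before the count reaches the ban threshold.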
The honest version of "security by architecture" is this: you don't get great defaults by reading a blog post. You get them by deploying conservative rules, watching what happens, and adjusting based on what your data tells you. Six months of auto-ban gave us calibration evidence we couldn't have generated any other way. The numbers in our config files are now backed by what actually attacked our customers.