Nine stages, seven independent layers, four invariants. This page is the canonical reference your security team and SAP Basis lead can read together.
Same identity, same intent, same context, same data — same response. No probabilistic security decisions.
Seven independent layers can each deny. A single bypass cannot leak data on its own.
The JWT's sub and tenant_id flow all the way into the SAP / IAM calls. No service-account sleight of hand.
When a dependency is down (Redis, audit DB, JWKS), the gateway denies requests rather than serving them. Degraded mode exists only as an explicit env opt-in, intended for non-production.
Rate-limit middleware
Per-user and per-tenant fixed-window counters in Redis. 429 with accurate Retry-After. The rate limiter alone fails open on a Redis hiccup; the trust system, which fails closed by default, is the real gate.
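A minimal sketch of a fixed-window counter with a Retry-After value, assuming an in-memory dict in place of Redis. The class name FixedWindowLimiter, the key format, and the injectable clock are illustrative assumptions, not AegisAI's actual code.

```python
import math
import time
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory stand-in for a Redis-backed fixed-window counter (illustrative)."""

    def __init__(self, limit, window_s=60, clock=time.time):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.counts = defaultdict(int)  # (key, window index) -> hits in that window

    def check(self, key):
        """Count one request for `key`; return (allowed, retry_after_seconds)."""
        now = self.clock()
        window = int(now // self.window_s)
        self.counts[(key, window)] += 1
        if self.counts[(key, window)] <= self.limit:
            return True, 0
        # Retry-After is the time remaining until the current window rolls over.
        window_end = (window + 1) * self.window_s
        return False, max(1, math.ceil(window_end - now))
```

With Redis, the same idea would typically be an atomic increment on a key that expires with the window, so stale windows clean themselves up; this sketch never evicts old entries.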
Request-ceiling middleware
ASGI-level caps on body size (default 64 KB), URL length (default 2 KB), and per-request wall time (default 10 s). 413 / 414 / 504 before the app runs.
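The ceilings can be sketched as a plain ASGI wrapper. This is an assumption-laden illustration: CeilingMiddleware and its parameters are hypothetical names, the body check only inspects the declared Content-Length (a streamed or chunked body would need a counting wrapper around receive), and the 504 path assumes the app has not started its response.

```python
import asyncio


class CeilingMiddleware:
    """Minimal ASGI sketch: reject oversize or slow requests around the app."""

    def __init__(self, app, max_body=64 * 1024, max_url=2 * 1024, timeout_s=10.0):
        self.app = app
        self.max_body = max_body
        self.max_url = max_url
        self.timeout_s = timeout_s

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        path = scope.get("raw_path") or scope["path"].encode()
        if len(path) + len(scope.get("query_string", b"")) > self.max_url:
            await self._reject(send, 414)
            return
        headers = dict(scope.get("headers", []))
        declared = headers.get(b"content-length", b"0")
        # Only the declared length is checked here; see the lead-in caveat.
        if declared.isdigit() and int(declared) > self.max_body:
            await self._reject(send, 413)
            return
        try:
            await asyncio.wait_for(self.app(scope, receive, send), self.timeout_s)
        except asyncio.TimeoutError:
            await self._reject(send, 504)

    @staticmethod
    async def _reject(send, status):
        await send({"type": "http.response.start", "status": status, "headers": []})
        await send({"type": "http.response.body", "body": b""})
```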
Authenticate
JWT verification with HS256 / RS256 / ES256 / PS256. JWKS rotation cached process-wide. iss, aud, exp, nbf, signature, and configured leeway all enforced.
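To make the claim checks concrete, here is a standard-library sketch of the HS256 case only. The function name verify_hs256 and its parameters are assumptions for illustration; RS256 / ES256 / PS256 and JWKS fetching need an asymmetric crypto library and are out of scope here.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(part):
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def verify_hs256(token, secret, issuer, audience, leeway_s=30, now=None):
    """Return the claims if every check passes; raise ValueError otherwise."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
        if not isinstance(header, dict) or not isinstance(claims, dict):
            raise ValueError("header and payload must be JSON objects")
    except Exception as exc:
        raise ValueError("malformed token") from exc
    # Pin the algorithm; never let the token header choose it (blocks alg=none).
    if header.get("alg") != "HS256":
        raise ValueError("unexpected alg")
    signed = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signed, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad signature")
    if claims.get("iss") != issuer:
        raise ValueError("bad iss")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise ValueError("bad aud")
    if "exp" not in claims or now > claims["exp"] + leeway_s:
        raise ValueError("expired")
    if "nbf" in claims and now < claims["nbf"] - leeway_s:
        raise ValueError("not yet valid")
    return claims
```

The signature is checked before any claim is trusted, and the leeway applies symmetrically to exp and nbf.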
Authorise
SAP BAPI_USER_GET_DETAIL walks ACTIVITYGROUPS and PROFILES; per-profile auth-objects fetched via SUSR_GET_PROFILE_AUTH_OBJECTS. Cross-cloud IAM (AWS / Azure / GCP) checked for any system the request touches.
Adaptive trust
Tenant-scoped Redis ledger. Signals: frequency, scope expansion, coverage growth, cross-user coordination, and revisit ratio. Trust score → trust level → rate-limit band and request restrictions.
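The score → level → band mapping might look like the sketch below. Every number here is hypothetical: this page does not give the real thresholds, band sizes, or level names beyond LOW, so LEVEL_FLOORS, BANDS, and the function names are assumptions chosen only to show the shape of the lookup.

```python
# Hypothetical thresholds and bands, chosen only for illustration.
LEVEL_FLOORS = [(0.75, "HIGH"), (0.40, "MEDIUM"), (0.0, "LOW")]
BANDS = {
    "HIGH": {"rpm": 120, "allow_aggregation": True},
    "MEDIUM": {"rpm": 60, "allow_aggregation": True},
    "LOW": {"rpm": 10, "allow_aggregation": False},  # LOW trust denies aggregation
}


def trust_level(score):
    """Map a score (clamped to [0, 1]) to the first level whose floor it meets."""
    clamped = min(1.0, max(0.0, score))
    for floor, level in LEVEL_FLOORS:
        if clamped >= floor:
            return level
    return "LOW"  # defensive default: fall back to the most restrictive level


def restrictions(score):
    """Return the rate-limit band and request restrictions for a score."""
    return BANDS[trust_level(score)]
```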
Policy
Priority-weighted expression set evaluated by a safe AST whitelist. Rejects any Call outside {len, any, all, min, max, abs, str, int, float, bool}, as well as lambdas, comprehensions, imports, and dunders. Deny-by-default; deny-wins-on-tie.
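A sketch of what an AST whitelist of this kind can look like, assuming Python 3.8+ and the standard ast module. The node list, the validate / evaluate names, and the decision to omit arithmetic and subscripts are assumptions for illustration; the real evaluator may allow a different node set.

```python
import ast

SAFE_CALLS = {"len": len, "any": any, "all": all, "min": min, "max": max,
              "abs": abs, "str": str, "int": int, "float": float, "bool": bool}
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not, ast.USub,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.In, ast.NotIn, ast.Name, ast.Load, ast.Constant, ast.Attribute,
    ast.List, ast.Tuple, ast.Call,
)


def validate(expr):
    """Parse `expr`; raise ValueError on anything outside the whitelist."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:  # statements such as `import os` land here
        raise ValueError("not a single expression") from exc
    for node in ast.walk(tree):
        # Lambdas, comprehensions, keyword args, etc. are simply not listed.
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Call):
            if not (isinstance(node.func, ast.Name) and node.func.id in SAFE_CALLS):
                raise ValueError("disallowed call")
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise ValueError("dunder name")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError("dunder attribute")
    return tree


def evaluate(expr, context):
    """Evaluate a validated expression with no builtins beyond SAFE_CALLS."""
    tree = validate(expr)
    env = {**context, **SAFE_CALLS, "__builtins__": {}}
    return bool(eval(compile(tree, "<policy>", "eval"), env))
```

Listing allowed nodes, rather than blocking known-bad ones, keeps the validator deny-by-default: a syntax form added in a future Python version is rejected until someone opts it in.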
Plan
Intent compiler validates and structures the request; QueryPlanner emits a SafeQuery with :named placeholders. Tenant isolation rendered as a row policy on every entity.
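A minimal sketch of a planner that emits :named placeholders and always appends the tenant predicate. The SafeQuery fields, the plan function, and the ALLOWED registry are hypothetical stand-ins; identifiers come only from the registry, and values travel only as bound parameters.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SafeQuery:
    sql: str
    params: dict = field(default_factory=dict)


# Hypothetical registry: entity -> columns a request may select or filter on.
ALLOWED = {"invoices": {"id", "amount", "vendor"}}


def plan(entity, columns, filters, tenant_id):
    """Build a parameterised SELECT scoped to one tenant."""
    if entity not in ALLOWED:
        raise ValueError("unknown entity")
    unknown = (set(columns) | set(filters)) - ALLOWED[entity]
    if unknown or not columns:
        raise ValueError(f"columns not in registry: {sorted(unknown)}")
    where = [f"{col} = :{col}" for col in filters]
    # Tenant isolation is appended unconditionally and never read from the request.
    where.append("tenant_id = :tenant_id")
    sql = f"SELECT {', '.join(columns)} FROM {entity} WHERE {' AND '.join(where)}"
    return SafeQuery(sql, {**filters, "tenant_id": tenant_id})


def run(conn: sqlite3.Connection, query: SafeQuery):
    """Execute with bound parameters; sqlite3 accepts :named placeholders."""
    return conn.execute(query.sql, query.params).fetchall()
```

Because tenant_id is not in the registry, a caller cannot pass it as a filter and override the value taken from the JWT.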
Execute
MODE-gated dispatch. MODE=PRODUCTION uses the real SAP adapter (or RDS / Synapse / BigQuery). Simulation and in-memory fixtures refuse to run in production.
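The gate itself is small. A sketch, assuming hypothetical backend identifiers (only the MODE=PRODUCTION value comes from this page):

```python
# Hypothetical backend identifiers; the real adapter names are not given here.
SIMULATED_BACKENDS = {"simulation", "in_memory"}
REAL_BACKENDS = {"sap", "rds", "synapse", "bigquery"}


def select_backend(mode, backend):
    """Return `backend` if it may run under `mode`; raise otherwise."""
    if backend not in SIMULATED_BACKENDS | REAL_BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    # Simulation and in-memory fixtures must never serve production traffic.
    if mode == "PRODUCTION" and backend in SIMULATED_BACKENDS:
        raise RuntimeError(f"{backend} backend refuses to run in PRODUCTION")
    return backend
```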
Mask
Schema-driven response firewall. Per-field classification × user clearance × auth-object gate. Drop / redact / hash / partial / aggregate. PII detected in untagged fields fails the response.
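A sketch of four of the five actions, assuming a hypothetical SCHEMA that maps each field to a classification level and a masking action. Two simplifications are deliberate: aggregate is omitted, and any untagged field fails the response (stricter than the PII-detection rule described above, which this sketch does not implement).

```python
import hashlib

# Hypothetical schema: field -> (classification level, action below clearance).
SCHEMA = {
    "vendor": (0, "keep"),
    "iban": (2, "partial"),
    "email": (2, "hash"),
    "salary": (3, "redact"),
    "ssn": (3, "drop"),
}


def mask_row(row, clearance):
    """Return a copy of `row` masked for a caller with the given clearance."""
    out = {}
    for name, value in row.items():
        if name not in SCHEMA:
            raise ValueError(f"untagged field: {name}")  # fail closed
        level, action = SCHEMA[name]
        text = str(value)
        if clearance >= level or action == "keep":
            out[name] = value
        elif action == "drop":
            continue  # field is omitted from the response entirely
        elif action == "redact":
            out[name] = "***"
        elif action == "hash":
            out[name] = hashlib.sha256(text.encode()).hexdigest()[:16]
        elif action == "partial":
            out[name] = "*" * max(0, len(text) - 4) + text[-4:]
    return out
```

An unsalted truncated hash is shown only for brevity; low-entropy values such as emails would need a keyed hash to resist guessing.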
Audit
Tamper-evident HMAC chain. SHA-256 of prev_hash || canonical_json(payload), HMAC-SHA256-signed under AEGIS_AUDIT_HMAC_KEY, written under a row-level lock on the previous row. Postgres-backed; SQLite fallback in dev.
Postgres-backed in production, SQLite-fallback in dev. Row-level lock on append; HMAC-signed for integrity beyond just hash chaining.
row_hash = sha256(prev_hash || canonical_json(payload))
hmac_sig = hmac_sha256(AEGIS_AUDIT_HMAC_KEY, row_hash)
An attacker who writes directly to the DB cannot extend the chain undetected — they would need the HMAC key, which lives in your KMS / Vault, not the database.
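The two formulas above can be exercised with the standard library. This sketch makes several assumptions that the page does not state: hashes are hex strings, the first row chains from an all-zero GENESIS value, canonical JSON means sorted keys with compact separators, and rows live in a Python list rather than a locked database table.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed prev_hash for the first row


def canonical_json(payload):
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def _hash_and_sign(prev_hash, payload, key):
    row_hash = hashlib.sha256(prev_hash.encode() + canonical_json(payload)).hexdigest()
    hmac_sig = hmac.new(key, row_hash.encode(), hashlib.sha256).hexdigest()
    return row_hash, hmac_sig


def append(chain, payload, key):
    """Append one audit row linked to the previous row's hash."""
    prev_hash = chain[-1]["row_hash"] if chain else GENESIS
    row_hash, hmac_sig = _hash_and_sign(prev_hash, payload, key)
    row = {"id": len(chain) + 1, "payload": payload, "prev_hash": prev_hash,
           "row_hash": row_hash, "hmac_sig": hmac_sig}
    chain.append(row)
    return row


def verify(chain, key):
    """Re-walk the chain; report the first row whose hash or signature is off."""
    prev_hash = GENESIS
    for checked, row in enumerate(chain):
        row_hash, hmac_sig = _hash_and_sign(prev_hash, row["payload"], key)
        intact = (row["prev_hash"] == prev_hash
                  and row["row_hash"] == row_hash
                  and hmac.compare_digest(row["hmac_sig"], hmac_sig))
        if not intact:
            return {"ok": False, "entries_checked": checked, "first_break_id": row["id"]}
        prev_hash = row_hash
    return {"ok": True, "entries_checked": len(chain)}
```

A chain rebuilt by someone without the key has internally consistent hashes but fails at the first row, because the recomputed HMAC does not match.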
# Admin-gated end-to-end re-walk:
GET /api/audit/verify
→ {"ok": true, "entries_checked": 18342}
# Unauthenticated low-info probe for k8s:
GET /api/audit/integrity
→ {"ok": true, "entries_checked": 18342}
A break returns the first offending row id. The Helm chart ships a CronJob that runs this every hour and pages on ok=false.
| Threat | Mitigation |
|---|---|
| Forged JWT | iss / aud / exp / nbf / signature verification; JWKS rotation |
| Token replay after expiry | exp + short leeway + clock sync |
| Cross-tenant read | tenant_id in JWT + row policy on every query |
| SQL injection via intent | Parameterised SafeQuery; no string interpolation |
| Scope expansion via broad intent | Planner subset check + LOW-trust aggregation deny |
| Field-level PII leak | ResponseFirewall mask from FieldTag |
| Inference via repeated queries | Trust coverage ratio + revisit + coordination |
| Coordinated cross-user attack | Global coverage-growth + density signals |
| Audit tamper | HMAC hash chain + verify endpoint |
| Trust ledger DoS (Redis down) | Fail-closed by default |
| Oversize / slow-loris | Ceiling middleware (body / path / timeout) |
| Request-rate abuse | Rate limit middleware (per user + per tenant) |
| Unauthenticated admin actions | require_admin dependency |
| Config drift between subsystems | Single DataSchema registry |
| Default-secret deployment | Startup-time refusal in PRODUCTION when readiness blockers exist |
Not yet mitigated and explicitly on the roadmap: cross-region encryption-at-rest key management, a formal penetration test, upstream DDoS protection, and an automatic kill-switch on chain-break.
If GET /api/audit/verify returns ok:false, immediately disable writes, capture first_break_id, isolate the DB, and compare row_hash against the last known-good backup. Do not truncate the audit log.
While Redis is down, the trust system denies every request by default. Bring Redis back; no data is lost, because trust state regenerates. To serve traffic during the outage, accept the risk and set AEGIS_TRUST_FAIL_OPEN_ON_REDIS_OUTAGE=1 on the degraded cluster only.
When the audit DB is unavailable in PRODUCTION, the pipeline returns 503 because AEGIS_AUDIT_STRICT is implicitly enabled. Fix the DB before re-enabling traffic.
Rotate IdP keys; AegisAI picks up the new keys via JWKS without a restart. Keep the old and new keys valid together for a 24-hour overlap so in-flight requests do not fail.
Check aegis.requests{status=429} in your OTel backend. If per-user, investigate the account. If tenant-wide, raise AEGIS_RATE_LIMIT_PER_TENANT_RPM or shard the tenant.
The gateway refuses to start in PRODUCTION when readiness blockers exist. Rotate JWT_SECRET and AEGIS_AUDIT_HMAC_KEY via the configured secrets provider, then redeploy.