Web Cache Deception Lab: Nginx + Cloudflare

Learn how cache deception works, craft deceptive URLs, configure Nginx, and test against Cloudflare. The guide covers cache key logic, path-confusion tricks, and detection methods for security professionals.

Introduction

Web Cache Deception (WCD) is an attack technique that tricks a caching layer (often a CDN or reverse proxy) into storing and serving sensitive responses (HTML pages, JSON API output, or credentials) as if they were public static assets. By manipulating the request URL, typically its path or extension, an attacker causes the cache to store a victim's private response and serve it to anyone who later requests the same deceptive URL.

Why it matters: A successful WCD can bypass authentication, leak internal configuration, or expose private API responses. Because CDNs such as Cloudflare sit at the edge of the network, the impact can be global, turning a single mis-configuration into a mass data-leak.

Real-world relevance: Security researchers have demonstrated WCD against major platforms, and bug bounty programs now list it as a high-severity issue. Understanding how the cache key is built, how normalization works, and how extensions are interpreted is essential for any web-security professional.

Prerequisites

  • Solid grasp of HTTP caching fundamentals (Cache-Control, ETag, Vary, etc.).
  • Familiarity with Nginx configuration syntax and basic server administration on Linux.
  • Access to a Cloudflare-protected domain (free tier works) and the ability to modify DNS records.
  • Command-line tools: curl, wget, dig, and a modern browser with dev-tools.

Core Concepts

Before diving into the lab, review the three pillars of cache deception:

  1. Cache key construction - The combination of scheme, host, path, query string, and selected request headers that uniquely identifies a cached entry.
  2. Normalization & extension handling - How the caching layer rewrites or sanitizes URLs (removing duplicate slashes, decoding %20, stripping extensions).
  3. Response classification - Whether the backend marks a response as cache-able (Cache-Control: public) or private, and whether the CDN respects that classification.

Diagram (described):

  • Client → (HTTPS) → Cloudflare Edge Node → (HTTP) → Nginx Origin.
  • Cloudflare builds a cache key: scheme://host + path + query string. Cookies and most request headers are not part of the key by default.
  • If a private response is stored under such a key, the next requester of the same URL receives it, whoever they are.
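
The key construction described above can be sketched as a small shell function. This is a simplified model for reasoning about collisions, not Cloudflare's or Nginx's actual implementation; the function name cache_key and the choice to sort query parameters are assumptions made for illustration, and request headers are left out entirely.

```bash
#!/bin/sh
# Hypothetical model of a cache key: scheme://host + path + sorted query.
# Illustrative only; real caches apply their own rules.
cache_key() (
    scheme=$1 host=$2 path=$3 query=$4
    # Hostnames are case-insensitive, so lower-case the host.
    host=$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')
    key="$scheme://$host$path"
    if [ -n "$query" ]; then
        # Sort parameters so ?b=2&a=1 and ?a=1&b=2 map to one entry.
        sorted=$(printf '%s\n' "$query" | tr '&' '\n' | sort | paste -sd '&' -)
        key="$key?$sorted"
    fi
    printf '%s\n' "$key"
)
```

Calling cache_key https lab.example.com /private/secret.css "" for two different users yields the same string, because nothing user-specific (such as a session cookie) enters the key. That is the property WCD relies on.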

Cache key construction and normalization

Both Nginx and Cloudflare construct the cache key from the request line, but they differ in how they treat certain characters:

  • Path normalization: duplicate slashes (//), dot-segments (/./, /../), and percent-encoded characters may be collapsed or decoded, and the cache and the origin do not always apply the same rules.
  • Extension handling: Many CDNs, Cloudflare included, decide cacheability by default from the file extension at the end of the path (.css, .js, .png) rather than from the response Content-Type.
  • Query string handling: By default Cloudflare includes the full query string in the key, but a custom cache key can be configured to ignore it.
  • Vary header influence: Caches that honor Vary add the listed request header values to the key. Cloudflare honors only Vary: Accept-Encoding by default, so Vary: Cookie from the origin does not separate users there.
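
To make the extension-based decision concrete, here is a minimal sketch of a classifier that looks only at the last path segment. The extension list in STATIC_EXTS and the function name classify_url are assumptions for illustration; each CDN ships its own default list.

```bash
#!/bin/sh
# Hypothetical extension list; real CDNs define their own defaults.
STATIC_EXTS="css js png jpg gif ico svg woff2"

# Prints "static" if the URL path ends in a listed extension, else "dynamic".
classify_url() (
    path=${1%%\?*}          # drop the query string
    path=${path%%\#*}       # drop any fragment
    last=${path##*/}        # final path segment
    case $last in
        *.*) ext=${last##*.} ;;
        *)   ext= ;;
    esac
    ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    for e in $STATIC_EXTS; do
        [ "$ext" = "$e" ] && { echo static; exit 0; }
    done
    echo dynamic
)
```

Under this model, /dashboard is dynamic and /dashboard.css is static, even though the origin may return the same HTML for both. An extension that appears only inside the query string (for example ?token=abc.css) does not change the classification, because the query is stripped before the path is inspected.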

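Example Nginx cache-key snippet:

proxy_cache_key "$scheme://$host$request_uri";

Note that $request_uri holds the original request URI, including the query string, exactly as the client sent it; it is not normalized. The normalized, decoded path is available in $uri. A key built from $request_uri therefore treats /a//b and /a/b as two different entries.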

Path-confusion and extension spoofing

Path-confusion attacks rely on the fact that many caches treat the part after the last dot as an indicator of content type, regardless of the actual response body. By appending a harmless static extension to a private URL, the attacker convinces the cache that the response is static.

Typical payloads:

  • /admin/config.json → /admin/config.json.css
  • /api/secret?token=abc → /api/secret?token=abc.css
  • /private/data → /private/data/.well-known/..%2F..%2F..%2F/private/data (encoded dot-segment trick)

When the origin returns Content-Type: application/json but the URL ends with .css, Cloudflare still stores it under a static key. Subsequent requests for /admin/config.json.css receive the JSON payload.
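
When testing many endpoints it helps to generate candidate URLs mechanically. The following sketch prints a few suffix-based variants for one private path; the function name wcd_variants and the suffix list are illustrative assumptions, and which variants the origin actually accepts depends entirely on its routing.

```bash
#!/bin/sh
# Print candidate deceptive URLs for one private path.
# The extension list and suffix shapes are illustrative assumptions.
wcd_variants() (
    base=$1
    for ext in css js png; do
        printf '%s.%s\n' "$base" "$ext"      # e.g. /private/secret.css
        printf '%s/x.%s\n' "$base" "$ext"    # e.g. /private/secret/x.css
    done
)
```

For example, wcd_variants /admin/config.json prints six paths, starting with /admin/config.json.css, that can be fed to curl one by one.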

Static resource impersonation

Many applications serve both static assets (images, CSS) and dynamic pages from the same host. If the caching layer cannot differentiate them correctly, a private HTML page can be cached as if it were a CSS file.

Example scenario:


curl -I -H "Cookie: session=alice" https://example.com/dashboard
# HTTP/1.1 200 OK
# Content-Type: text/html
# Cache-Control: private, max-age=0

# After the logged-in victim has been lured to /dashboard.css,
# an unauthenticated request for the same URL returns the cached page:
curl -I https://example.com/dashboard.css
# HTTP/1.1 200 OK (served from cache)
# Content-Type: text/html
# X-Cache-Status: HIT

Because the URL ends with .css, the CDN classifies it as static and stores the HTML response generated for the victim under the .css key. Any user requesting /dashboard.css then receives the private dashboard HTML.

Host header and Vary header abuse on CDNs

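CDNs normally include the Host header in the cache key, so responses for different virtual hosts stay separate and Vary: Host adds nothing. The risk arises when the origin selects the tenant from a value that is not in the key (for example, a forwarded-host header in a multi-tenant setup): the CDN can then serve data belonging to a different virtual host.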

Similarly, abusing Vary on cookies or custom headers can create unintended cache collisions. Example:


curl -H "Cookie: session=alice" https://example.com/profile.css
# Returns Alice's profile (private); the cache stores it under /profile.css

curl -H "Cookie: session=bob" https://example.com/profile.css
# The cache treats .css as static and does not key on Cookie → Bob receives Alice's data

Mitigation: Send Vary: Cookie when the response depends on authentication cookies, and pair it with Cache-Control: private, because some CDNs (Cloudflare included) do not key on Cookie even when Vary lists it.
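
A quick way to audit this is to check whether a response's Vary header actually lists the header you expect. The helper below is a minimal sketch that reads response headers on stdin; the name vary_has is hypothetical.

```bash
#!/bin/sh
# Succeeds if the Vary header in the response headers on stdin lists $1.
# Header names and values are compared case-insensitively.
vary_has() (
    want=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    tr -d '\r' | tr '[:upper:]' '[:lower:]' |
        sed -n 's/^vary:[[:space:]]*//p' | tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -qxF -- "$want"
)
```

Usage: curl -sI -H "Cookie: session=alice" https://example.com/profile | vary_has Cookie || echo "Vary: Cookie missing". A passing check only shows that the origin sends the header; it does not prove that the cache in front of it honors it.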

Bypassing cache rules with query parameters and cookies

Most CDNs consider the full query string when generating the cache key, but many developers deliberately strip it for static resources to improve cache hit-rate. Attackers can exploit this by appending a harmless query parameter that does not affect the backend logic, yet forces the CDN to treat the request as unique.

Example bypass:


curl -I "https://example.com/secret.docx?cachebuster=123"
# The query string is part of the key, so each new cachebuster value creates a fresh cache entry.
# Whether the response is then stored depends on the CDN honoring Cache-Control: private.

Conversely, if the CDN is configured to ignore query strings for .css files, an attacker can drop the query entirely, causing the private response to be cached under the generic static key.
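
The two behaviours can be compared with a tiny model of the key policy. The function name key_with_policy and the policy names "full" and "ignore" are assumptions for illustration, not CDN settings.

```bash
#!/bin/sh
# Model of a key policy: "ignore" drops the query string from the key,
# anything else (e.g. "full") keeps the URL unchanged.
key_with_policy() (
    policy=$1 url=$2
    case $policy in
        ignore) printf '%s\n' "${url%%\?*}" ;;
        *)      printf '%s\n' "$url" ;;
    esac
)
```

With the "full" policy, ?cachebuster=1 and ?cachebuster=2 produce two distinct keys, so each request starts from a cold entry. With "ignore", both collapse to the same key, and whichever response was stored first is served for every variant.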

Step-by-step lab: crafting deceptive URLs, configuring Nginx, testing against Cloudflare

This lab walks you through a complete WCD exploitation cycle.

  1. Set up a test domain (e.g., lab.example.com) and point it to a Cloudflare-proxied IP.
  2. Deploy a minimal Nginx origin that serves a private HTML page at /private/secret and a static CSS file at /static/style.css.
  3. Configure Nginx to send Cache-Control: private for the secret page and public for the CSS.
  4. Create deceptive URLs by appending .css to the secret path.
  5. Flush the Cloudflare cache (via API or dashboard) to ensure a clean start.
  6. Trigger the cache with curl and verify the CF-Cache-Status header (X-Cache-Status on a plain Nginx cache).
  7. Validate leakage by requesting the deceptive URL from a different client.

Full Nginx config:


server {
    listen 80;
    server_name lab.example.com;

    # Private endpoint - never cache.
    # Prefix match (not "=") so that /private/secret.css also reaches this
    # block, mimicking a lenient application router.
    location /private/secret {
        # Force text/html regardless of the URL extension.
        types { }
        default_type text/html;
        add_header Cache-Control "private, max-age=0";
        return 200 "<html><body>Sensitive data for Alice</body></html>";
    }

    # Public static asset
    location /static/ {
        root /usr/share/nginx/html;
        add_header Cache-Control "public, max-age=86400";
    }

    # Fallback - 404
    location / {
        return 404;
    }
}

Testing steps (bash):

# 1. Warm up the cache with the legitimate static file
curl -I https://lab.example.com/static/style.css

# 2. Request the secret page with a deceptive .css extension
curl -i https://lab.example.com/private/secret.css
# Expected: 200 OK, Content-Type: text/html, X-Cache-Status: MISS (first time)

# 3. Request the same URL from another client (no auth)
curl -i https://lab.example.com/private/secret.css
# Expected: 200 OK, X-Cache-Status: HIT - secret HTML is now publicly cached

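When using Cloudflare, you will see the CF-Cache-Status header instead of X-Cache-Status. The same logic applies. Note that Cloudflare honors Cache-Control: private from the origin by default, so the deceptive URL typically shows BYPASS rather than HIT. A HIT in step 3 indicates that the origin did not send the header for that URL, or that a rule overriding origin cache headers (for example, cache-everything with an edge TTL) is in effect.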
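
Steps 2 and 3 can be wrapped in a small helper that requests a URL twice and prints the cache status each time. This is a sketch under the assumption that the cache reports its status in CF-Cache-Status or X-Cache-Status; the function names cache_status and probe_twice are hypothetical.

```bash
#!/bin/sh
# Print the value of CF-Cache-Status or X-Cache-Status from the response
# headers on stdin; prints nothing if neither header is present.
cache_status() {
    tr -d '\r' | awk -F': *' '
        tolower($1) == "cf-cache-status" || tolower($1) == "x-cache-status" {
            print $2; exit
        }'
}

# Request the URL twice without credentials and report both statuses.
probe_twice() (
    url=$1
    first=$(curl -s -o /dev/null -D - "$url" | cache_status)
    second=$(curl -s -o /dev/null -D - "$url" | cache_status)
    printf 'first=%s second=%s\n' "${first:-none}" "${second:-none}"
)
```

Usage: probe_twice https://lab.example.com/private/secret.css. A MISS followed by a HIT on a URL that returns private content suggests the response was stored; BYPASS or DYNAMIC on both requests suggests it was not.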

Detecting successful deception with curl/wget and browser dev tools

Key indicators:

  • CF-Cache-Status: HIT (or X-Cache-Status: HIT) on a request that should be private.
  • Mismatched Content-Type (e.g., a .css URL answered with text/html, or a text/css header over an HTML body).
  • Presence of authentication cookies in the request but not reflected in the Vary header.
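
The Content-Type indicator can be checked mechanically. The sketch below flags a URL whose path looks static but whose Content-Type is HTML or JSON; the function name looks_deceptive and both pattern lists are assumptions chosen for illustration.

```bash
#!/bin/sh
# Succeeds when a URL with a static-looking extension came back with an
# HTML or JSON Content-Type, which hints that dynamic content was served.
looks_deceptive() (
    url=$1 ctype=$2
    path=${url%%\?*}    # ignore the query string
    case $path in
        *.css|*.js|*.png|*.jpg|*.gif|*.ico) ;;
        *) exit 1 ;;
    esac
    case $(printf '%s' "$ctype" | tr '[:upper:]' '[:lower:]') in
        text/html*|application/json*) exit 0 ;;
        *) exit 1 ;;
    esac
)
```

Usage: looks_deceptive "$URL" "$CONTENT_TYPE" && echo "extension/Content-Type mismatch". A mismatch alone is not proof of leakage; combine it with the cache status header before drawing conclusions.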

Example curl detection script:

#!/usr/bin/env bash
URL=$1
RESPONSE=$(curl -s -D - "$URL" -o /dev/null)
if echo "$RESPONSE" | grep -iq "CF-Cache-Status: HIT"; then
    echo "[!] Cache HIT detected - possible deception!"
    echo "$RESPONSE" | grep -i "content-type"
else
    echo "[-] No cache hit - likely safe."
fi

In Chrome dev tools, look at the Network tab, select the request, and inspect the Response Headers. Cloudflare adds CF-Cache-Status. A value of HIT combined with a private Cache-Control header is a red flag.

Tools & Commands

  • curl - fetch headers, manipulate Host, Cookie, and query strings.
  • wget - similar to curl but useful for recursive fetches.
  • dig - verify DNS points to Cloudflare.
  • Cloudflare API - curl -X POST " ... to flush caches.
  • Burp Suite / OWASP ZAP - intercept and replay requests with altered extensions.
  • ngrep or tcpdump - observe raw HTTP traffic between client and edge.

Defense & Mitigation

  • Strict Cache-Control: Use Cache-Control: private, no-store for any endpoint that depends on authentication.
  • Vary on Authorization: Include Vary: Authorization, Cookie when responses differ per user.
  • Validate extension against Content-Type at the edge: Enable Cloudflare's Cache Deception Armor, or use a Worker, so that a response is not cached when the URL extension does not match its Content-Type.
  • Separate static host: Serve static assets from a dedicated sub-domain (e.g., static.example.com) that never returns private data.
  • Query sanitization: Strip potentially confusing query parameters before caching (Cloudflare custom cache key).
  • Security testing: Include WCD checks in your CI pipeline using the detection script above.
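
The first mitigation is easy to verify from the outside: fetch an authenticated endpoint and confirm that its Cache-Control value forbids shared caching. The helper below is a minimal sketch; the function name cache_control_forbids_shared is hypothetical, and it only recognizes the private and no-store directives.

```bash
#!/bin/sh
# Succeeds if a Cache-Control header value contains "private" or "no-store".
cache_control_forbids_shared() (
    value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # Split the comma-separated directives into words.
    for d in $(printf '%s' "$value" | tr ',' ' '); do
        case $d in
            private|no-store) exit 0 ;;
        esac
    done
    exit 1
)
```

Usage: cache_control_forbids_shared "private, no-store" || echo "endpoint may be stored by a shared cache". The check covers the origin's intent only; a CDN rule that overrides origin headers can still cause the response to be stored.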

Common Mistakes

  • Assuming Cache-Control: private alone prevents caching - a CDN rule that overrides origin headers for static-looking URLs (for example, cache-everything with an edge TTL) can cause it to be ignored.
  • Relying on file extensions to infer content type - attackers can append a benign extension to any path.
  • Not adding Vary: Cookie when responses depend on session cookies.
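  • Purging a single URL and assuming the cache is clean - Cloudflare's “Purge by URL” removes only that exact URL, so other deceptive variants stay cached until they expire or “Purge Everything” is used.
  • Testing only with a browser - browsers resolve dot-segments and re-encode parts of the URL before sending; use curl (with --path-as-is where needed) for raw behavior.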

Real-World Impact

Enterprises that host mixed static/dynamic content on the same host are prime targets. A successful WCD can expose:

  • Internal API keys returned in JSON payloads.
  • User-specific dashboards leaking personal data.
  • Configuration files (.env, config.yaml) if served via a mis-configured route.

Case study (hypothetical): A SaaS provider used Cloudflare to accelerate its UI. The /account/settings page returned JSON with API tokens and was protected by a session cookie. By requesting /account/settings.json.css, an attacker caused Cloudflare to cache the JSON as a CSS asset. Within minutes, the token was discoverable by anyone who accessed the CSS URL, leading to a full account takeover.

My expert opinion: As CDNs add more aggressive edge-caching heuristics, the attack surface widens. Organizations must treat caching as a security layer, not just a performance optimization.

Practice Exercises

  1. Deploy the lab environment described above on a VPS and a free Cloudflare account. Verify that the secret page is not cached initially.
  2. Craft three deceptive URLs using different techniques (extension spoofing, encoded dot-segments, query-string addition). Document which ones result in a cache HIT.
  3. Modify the Nginx configuration to add Vary: Cookie. Re-run the exercises and note the change in behavior.
  4. Write a small Cloudflare Worker that rejects any request where the URL extension does not match the Content-Type header. Deploy and test.
  5. Integrate the detection Bash script into a CI job that scans your production URLs daily and alerts on unexpected cache hits.

Further Reading

  • RFC 7234 - Hypertext Transfer Protocol (HTTP/1.1): Caching
  • Cloudflare Docs: Cache-Control and Cache-Key
  • PortSwigger Web Cache Deception write-up (2023)
  • OWASP Testing Guide - Testing for Improper Cache Configuration (OTG-CACH-001)
  • “Cache Poisoning and Deception” - Black Hat Europe 2022 presentation slides

Summary

Web Cache Deception exploits the way CDNs and reverse proxies build cache keys. By manipulating URL paths, extensions, and headers, attackers can force private responses into public caches. This guide covered the underlying cache-key mechanics, common deception techniques, a hands-on lab with Nginx and Cloudflare, detection methods, and mitigation best practices. Mastering these concepts equips security professionals to audit, harden, and monitor modern edge-caching deployments.