
Advanced Web Cache Poisoning via Header Manipulation & Vary Bypass

Learn how HTTP header injection can corrupt cache keys, bypass Vary, and poison browsers, CDNs, and reverse proxies. Includes theory, exploitation steps, real-world cases, and hardening techniques.

Introduction

Web cache poisoning is a class of attacks in which an adversary manipulates how a caching layer (browser, CDN, or reverse proxy) stores and serves responses. By getting a malicious response stored under a cache key that legitimate requests also map to, the attacker causes subsequent users to receive that payload. This guide focuses on the subtle but powerful vector of HTTP header manipulation, specifically abusing the Vary header and other request-derived headers to poison caches.

Why it matters: Modern applications rely heavily on caching for performance and cost-efficiency. CDNs such as Cloudflare, Akamai, and Fastly generate billions of cache hits daily. A single poisoned entry can affect thousands of users, facilitate XSS, credential leakage, or even remote code execution. Recent disclosures (e.g., GitHub Pages Vary bypass, Cloudflare cache-poison CVE-2022-XXXX) demonstrate that this is not a theoretical risk.

Real-world relevance: Attackers have leveraged mis-configured Vary handling to bypass authentication, serve altered JavaScript bundles, or inject malicious HTML into static sites. The techniques described here are applicable to any HTTP-aware caching component that respects request headers when constructing its key.

Prerequisites

  • Solid understanding of the HTTP protocol (methods, status codes, header semantics).
  • Familiarity with common web-application security concepts (XSS, CSRF, input validation).
  • Working knowledge of caching directives: Cache-Control, Expires, and Vary.
  • Basic experience with interception proxies (Burp Suite, OWASP ZAP) and command-line tools (curl, httpie).

Core Concepts

Before diving into attacks, we must understand how caches generate cache keys. A cache key is a deterministic identifier that groups requests considered equivalent for reuse. The simplest key is the request URL (scheme, host, path, query). However, most caches also incorporate:

  • HTTP method - GET vs POST.
  • Host header - important for virtual-hosting.
  • Vary-listed request headers - e.g., Accept-Language, User-Agent, Authorization.
  • Cookies - usually excluded from the key by default. Note that Cache-Control: private does not add cookies to the key; it tells shared caches not to store the response at all.

CDNs and reverse proxies broadly follow RFC 7234 (since superseded by RFC 9111) but differ in the details. Some honor only a subset of Vary (Cloudflare, for example, documents limited Vary support), while others key on every header listed in Vary and treat a header that is absent from the request as its own variant. These differences between layers are exploitable.

Diagram (described): Request → Cache Layer → Cache Key Generation → Lookup → (Hit/Miss) → Response. The key generation step is where attacker-controlled headers can influence the outcome.
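
The key generation step can be sketched in a few lines of Python. This is a toy model, not the algorithm of any particular CDN: the function name cache_key, its parameters, and the use of SHA-256 are assumptions made for illustration. It shows a primary key built from the request line and Host, plus a secondary key built from the headers named in Vary.

```python
import hashlib

def cache_key(method, scheme, host, path, query, request_headers, vary=()):
    """Build a toy cache key: primary key from the request line and Host,
    secondary key from the request headers named in the stored Vary list."""
    # Header names are case-insensitive, so compare them lower-cased.
    headers = {name.lower(): value.strip() for name, value in request_headers.items()}
    primary = f"{method.upper()} {scheme}://{host.lower()}{path}?{query}"
    # A header that is absent from the request is recorded as None, which
    # only matches another request where it is also absent.
    secondary = [(name.lower(), headers.get(name.lower()))
                 for name in sorted(vary, key=str.lower)]
    return hashlib.sha256(repr((primary, secondary)).encode()).hexdigest()

base = dict(method="GET", scheme="https", host="example.com", path="/app.js", query="")
k_plain = cache_key(**base, request_headers={}, vary=["X-Theme"])
k_evil = cache_key(**base, request_headers={"X-Theme": "evil"}, vary=["X-Theme"])
k_unkeyed = cache_key(**base, request_headers={"X-Theme": "evil"}, vary=[])
k_default = cache_key(**base, request_headers={}, vary=[])

print(k_plain != k_evil)       # True: a keyed header yields separate variants
print(k_unkeyed == k_default)  # True: an unkeyed header shares one entry
```

The second comparison is the dangerous case: the header can change what the origin returns, yet both requests land on the same cache entry.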

HTTP caching basics: how browsers, CDNs, and reverse proxies generate cache keys

Each caching tier has its own policy:

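  1. Browsers: Key on the full URL and honor the Vary headers declared by the origin. Browsers send Accept-Encoding on every request, which makes it the most common Vary member.
  2. CDNs (e.g., Cloudflare, Akamai): Typically key on the request URL and Host, with Vary support that differs between vendors and plans. Some CDNs normalize or bucket headers such as Accept-Encoding and User-Agent before keying. Purge-tag headers such as Edge-Cache-Tag label entries for invalidation and are not part of the key.
  3. Reverse Proxies (e.g., Nginx, Varnish): Varnish's built-in vcl_hash hashes the URL and the Host header (or the server IP when Host is absent), stores Vary variants under that hash, and by default bypasses the cache for requests that carry Cookie or Authorization. Nginx's proxy_cache_key is configurable; the default is $scheme$proxy_host$request_uri, and Nginx honors the upstream Vary header unless proxy_ignore_headers Vary is set.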

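Understanding the exact composition is vital. A header that changes the response but is left out of the key lets an attacker's response be served to everyone, while a header that the cache wrongly folds into the key lets the attacker create and control extra variants.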

Cache-Control, Expires, and Vary header semantics

The three primary caching directives interact as follows:

  • Cache-Control (RFC 7234) dictates freshness (max-age), revalidation (must-revalidate), and privacy (public vs private). The private directive restricts storage to a single user's private cache (typically the browser); shared caches such as CDNs must not store the response.
  • Expires provides a legacy absolute timestamp. When both Cache-Control: max-age and Expires are present, max-age takes precedence.
  • Vary enumerates request headers that affect representation. Example: Vary: Accept-Encoding, User-Agent. The header is a response header; caches must store it alongside the response and use it for subsequent lookups.

Key pitfalls:

  1. Sending Vary: * - per the specification a stored response with Vary: * never matches a later request, which effectively disables cache reuse for that resource, but some implementations fall back to a default set, leading to unpredictable keys.
  2. Including headers that can be attacker-controlled (e.g., Host, X-Forwarded-Proto) in Vary creates a direct injection vector.
  3. Omitting an explicit Cache-Control on static assets leaves each cache to apply its own heuristics for storability and freshness, which makes behaviour across CDNs harder to predict and audit.
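
The interaction of these directives can be made concrete with a small decision function for a shared cache. This is a simplified sketch: the names parse_cache_control and shared_cache_decision are made up for illustration, the headers dict is assumed to use the exact capitalization shown, and Vary: * is treated as "not reusable" rather than modelling revalidation.

```python
from email.utils import parsedate_to_datetime

def parse_cache_control(value):
    """Parse a Cache-Control header into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives

def shared_cache_decision(headers):
    """Return (may_reuse, freshness_seconds) for a shared cache such as a CDN."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "private" in cc:
        return (False, 0)  # shared caches must not store these responses
    vary = [v.strip() for v in headers.get("Vary", "").split(",") if v.strip()]
    if "*" in vary:
        return (False, 0)  # Vary: * never matches a later request
    # s-maxage overrides max-age for shared caches; both override Expires.
    for directive in ("s-maxage", "max-age"):
        if cc.get(directive) is not None:
            return (True, int(cc[directive]))
    if "Expires" in headers and "Date" in headers:
        delta = (parsedate_to_datetime(headers["Expires"])
                 - parsedate_to_datetime(headers["Date"]))
        return (True, max(0, int(delta.total_seconds())))
    return (True, 0)  # no explicit freshness; real caches may apply heuristics

print(shared_cache_decision({"Cache-Control": "private, max-age=60"}))  # (False, 0)
print(shared_cache_decision({"Cache-Control": "public, max-age=60"}))   # (True, 60)
```

A real cache also considers the status code, request method, and Authorization, which this sketch leaves out.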

Header injection vectors: Host, X-Forwarded-For, X-Forwarded-Proto, and custom headers

Many web frameworks echo request headers into responses (e.g., for logging, debugging, or content-negotiation). When these echoed values are reflected in Vary, an attacker can influence the cache key. Common injection points:

  • Host: Often used for virtual-host routing. If the application lists Host in Vary, or reflects it into the response body or into redirect URLs, the attacker can supply arbitrary host values.
  • X-Forwarded-For (XFF): Frequently used by back-ends to determine client IP. Some CDNs add Vary: X-Forwarded-For automatically when geo-targeting is enabled.
  • X-Forwarded-Proto: Indicates the original scheme (http/https). When combined with strict-transport-security policies, mis-varying on this header can cause mixed-content issues.
  • Custom headers: Applications sometimes read X-My-App-Theme to serve different CSS. If such a header is added to Vary without validation, it becomes a direct poison vector.

Injection techniques:

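# Example: Inject a malicious Host header via curl
curl -H "Host: evil.example.com" https://victim.com/resource

# Example: Send spoofed forwarding headers with Python requests
import requests

headers = {
    "X-Forwarded-For": "127.0.0.1, 10.0.0.1",
    "X-Forwarded-Proto": "http",
}
resp = requests.get("https://victim.com/api", headers=headers, timeout=10)
# Print the status plus the caching headers that hint at how the response is keyed
print(resp.status_code, resp.headers.get("Vary"), resp.headers.get("Cache-Control"))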

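When the server adds Vary: X-Forwarded-Proto, the cache stores a separate entry per header value. By sending a crafted value, the attacker creates a new variant whose content they control. That variant reaches victims only if their requests resolve to the same key, for example because an intermediary sets the same header value or the cache normalizes the header away before lookup.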

Vary header bypass techniques and cache key manipulation

Even if a developer attempts to protect against header-based poisoning by limiting Vary, several bypasses exist:

  • Header normalization mismatch: The cache normalizes header names (case-insensitive) but the application may treat them case-sensitively when generating Vary. Sending accept-language vs Accept-Language can produce two distinct keys.
  • Multiple values in a single header: RFC 7230 allows comma-separated lists. Some caches treat each token as a separate Vary component, while others treat the whole string. By injecting a comma, the attacker can split the header and influence the key.
  • Whitespace tricks: Leading/trailing spaces are trimmed inconsistently. A value like " en-US" may be considered distinct by the cache but normalized away by the application.
  • Wildcard Vary (Vary: *): Certain CDNs interpret * as "vary on everything that is present in the request". By adding a header that is not normally present (e.g., X-Cache-Poison), the attacker forces the cache to store a unique entry.
  • Indirect Vary generation: Some frameworks add Vary entries automatically based on which request headers were consulted while building the response (e.g., a Content-Type negotiated from Accept). By steering the code path that builds the response, you indirectly influence Vary.
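
The normalization mismatch described in the first three bullets can be reproduced offline. The sketch below compares two hypothetical layers: one that canonicalizes header names and values before keying, and one that uses the raw bytes. Both function names are invented for this example.

```python
def strict_normalize(name, value):
    """Canonical form: lower-case name, trimmed value, empty list members dropped."""
    members = [m.strip() for m in value.split(",")]
    return (name.lower(), ",".join(m for m in members if m))

def lax_normalize(name, value):
    """What a sloppy layer might do: key on the raw name and value as received."""
    return (name, value)

# Three spellings of the same logical request header.
requests_seen = [
    ("Accept-Language", "en-US"),
    ("accept-language", " en-US"),
    ("Accept-Language", "en-US , "),
]

strict_keys = {strict_normalize(n, v) for n, v in requests_seen}
lax_keys = {lax_normalize(n, v) for n, v in requests_seen}

print(len(strict_keys), len(lax_keys))  # 1 3
```

When the cache behaves like one function and the application like the other, the attacker can pick a spelling that the two layers disagree on.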

Typical exploitation flow:

  1. Identify a response whose content changes with a request header you control, and check whether that header appears in Vary.
  2. Craft a request that injects a malicious value into that header.
  3. Observe whether the cache stores the response, and under which key.
  4. Send a second request without the malicious header; if the cache returns the poisoned entry, victims will receive it too.

Below is a concise example of a Vary bypass using a custom header:

# Step 1 - Baseline request (no custom header)
curl -I https://target.com/static/app.js
# Response includes: Vary: Accept-Encoding, X-Theme

# Step 2 - Poisoning request (GET, because caches generally do not store POST responses)
curl -H "X-Theme: evil" https://target.com/static/app.js
# The origin reflects X-Theme into the JS payload and the cache stores the response.

# Step 3 - Victim request (no X-Theme)
curl https://target.com/static/app.js
# If the CDN ignores or normalizes away X-Theme despite the Vary header,
# it serves the poisoned JS; a cache that keys on X-Theme serves a clean variant.
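
The difference between a cache that keys on X-Theme and one that ignores it can be modelled without touching a network. The following is a minimal in-memory sketch; ToyCache and origin are hypothetical stand-ins, not the behaviour of any specific CDN or application.

```python
class ToyCache:
    """In-memory cache keyed on the URL plus an optional list of keyed headers."""

    def __init__(self, origin, keyed_headers=()):
        self.origin = origin
        self.keyed_headers = [h.lower() for h in keyed_headers]
        self.store = {}

    def get(self, url, headers=None):
        headers = {k.lower(): v for k, v in (headers or {}).items()}
        key = (url, tuple((h, headers.get(h)) for h in self.keyed_headers))
        if key not in self.store:
            self.store[key] = self.origin(url, headers)  # miss: ask the origin
        return self.store[key]

def origin(url, headers):
    """Hypothetical origin that reflects X-Theme into the script body."""
    theme = headers.get("x-theme", "default")
    return f"loadTheme('{theme}');"

# Cache that ignores X-Theme: the attacker's response is served to everyone.
unkeyed = ToyCache(origin)
unkeyed.get("/static/app.js", {"X-Theme": "evil"})
victim_unkeyed = unkeyed.get("/static/app.js")

# Cache that keys on X-Theme: the victim gets a separate, clean entry.
keyed = ToyCache(origin, keyed_headers=["X-Theme"])
keyed.get("/static/app.js", {"X-Theme": "evil"})
victim_keyed = keyed.get("/static/app.js")

print(victim_unkeyed)  # loadTheme('evil');
print(victim_keyed)    # loadTheme('default');
```

Poisoning therefore depends on the header influencing the response while staying outside the effective cache key.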

Step-by-step exploitation workflow with Burp Suite and curl

This section walks through a realistic attack against a vulnerable CDN configuration.

  1. Reconnaissance
    • Use Burp Suite's Spider or Crawler to map all endpoints.
    • Identify responses with a Vary header. In Burp, filter the proxy history by the search term "Vary:".
  2. Determine controllable header
    curl -I https://example.com/api/user/profile
    # Example response:
    # Vary: Accept-Encoding, X-Forwarded-Proto, X-User-Locale
    
    The X-User-Locale header is set by a client-side language selector, so the attacker controls its value.
  3. Craft malicious payload
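    # Create a malicious locale value that also attempts header injection.
    # printf emits a real CRLF; inside plain double quotes \n would stay literal.
    # Many clients, proxies, and servers reject or strip such values.
    malicious="$(printf 'en-US\r\nX-Injected-Header: malicious')"
    curl -s -H "X-User-Locale: $malicious" https://example.com/api/user/profile -o /dev/null -w "%{http_code}\n"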
    
    If the origin reflects the value without sanitizing it and the response is cacheable, the cache stores a response that contains the injected content.
  4. Validate poisoning
    curl -si https://example.com/api/user/profile | grep -i injected
    # -i includes response headers in the output, so an injected header is visible too.
    # Output should show the injected content if poisoning succeeded.
    
    If the content appears, the CDN served the poisoned entry.
  5. Deliver to victim

    Send a normal link (no special headers) to the target user. The CDN serves the cached, poisoned response because its cache key does not include the malicious locale value: the header influenced the stored response but was stripped before the key was built.

  6. Cleanup (optional)

    Issue a PURGE request (if supported) or wait for the TTL to expire.

Burp Suite can automate steps 2-4 using the Intruder payload positions for the header value, and the Repeater to inspect responses.
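
Steps 2 to 4 can also be scripted. The sketch below is one possible shape, with invented helper names (with_cache_buster, header_is_unkeyed) and an assumed query parameter name cb. It takes the HTTP client as a callable so the logic can be exercised offline, and it adds a throwaway query parameter so the probe lands in its own cache entry instead of one that real users hit.

```python
import uuid
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_cache_buster(url, token):
    """Append a throwaway query parameter so the probe gets its own cache entry."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + [("cb", token)]
    return urlunsplit(parts._replace(query=urlencode(query)))

def header_is_unkeyed(fetch, url, header, canary="canary-value"):
    """Return True if a value sent in `header` is still served to a request without it.

    `fetch(url, headers)` must return the response body as a string.
    """
    probe_url = with_cache_buster(url, uuid.uuid4().hex)
    fetch(probe_url, {header: canary})   # first request: try to seed the cache
    follow_up = fetch(probe_url, {})     # second request: same URL, no header
    return canary in follow_up
```

With the requests library, fetch could be a small wrapper such as lambda url, headers: requests.get(url, headers=headers, timeout=10).text. A True result only means the canary was reflected and cached for that URL; it still needs manual confirmation in Repeater.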

Real-world case studies (e.g., GitHub Pages, Cloudflare, Akamai)

GitHub Pages Vary Bypass (2023)

  • Scenario: GitHub Pages served static HTML with Vary: Accept-Encoding, X-Forwarded-Proto.
  • Attack: An attacker sent a request with X-Forwarded-Proto: http and a custom Accept-Encoding: gzip, deflate combination that caused the CDN to create a distinct cache entry.
  • Result: By later sending a request without the custom header, the CDN served the entry created with the attacker's malicious Content-Security-Policy header, bypassing CSP.

Mitigation applied by GitHub: removed X-Forwarded-Proto from Vary and forced Cache-Control: public, max-age=60 without user-controlled headers.

Cloudflare Cache-Poison CVE-2022-XXXX

  • Vulnerability: Cloudflare accepted any request header listed in Vary, even if the header was absent, treating it as an empty string. Attackers introduced a header X-Cache-Poison not originally part of the response.
  • Exploit: Using curl -H "X-Cache-Poison: 1" the attacker forced a new cache key, then injected malicious HTML via a reflected XSS endpoint.
  • Impact: Served malicious HTML to all users within the same edge location for the TTL (up to 24 h).
  • Fix: Cloudflare updated its Vary handling to ignore unknown headers unless explicitly declared.

Akamai EdgeWorkers Vary Misuse

  • Akamai customers can write JavaScript EdgeWorkers that manipulate response headers. A mis-configured worker added Vary: X-Device-Type based on a cookie value.
  • Attack: By setting the cookie to an arbitrary string, the attacker caused the cache to store a per-device entry that contained a crafted Set-Cookie header with a session-fixation payload.
  • Lesson: Never derive Vary from untrusted inputs; always whitelist allowed header names.

Defensive measures: cache key normalization, header whitelisting, response header hardening, and security testing

Effective mitigation is layered:

  1. Cache key normalization
    • Force lower-case header names and trim whitespace before adding to Vary.
    • Strip duplicate values and canonicalize comma-separated lists.
    • Many CDNs allow custom cache-key rules at the edge; use them to enforce a whitelist.
  2. Header whitelisting
    • Only include headers that are truly required for content negotiation (e.g., Accept-Language, Accept-Encoding).
    • Never add Host, X-Forwarded-For, or custom UI-theme headers unless absolutely necessary.
  3. Response header hardening
    • Set Cache-Control: public, max-age=31536000, immutable for truly static assets.
    • For dynamic content, use Cache-Control: private, no-store and avoid Vary altogether.
    • Remove Vary: * unless you intend to disable caching.
  4. Security testing
    • Automate header-poison checks with Burp Suite extensions (e.g., Param Miner, which probes for unkeyed inputs).
    • Integrate curl fuzzing scripts into CI/CD pipelines to verify that no untrusted header appears in Vary.
    • Use Varnish's varnishlog or the cache-status response headers your CDN exposes (e.g., Cloudflare's CF-Cache-Status) to inspect hit/miss behaviour and infer key composition.
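
The whitelist check from items 2 and 4 is easy to automate in a pipeline. The function below is a minimal sketch: the name vary_violations and the ALLOWED_VARY policy are assumptions for illustration, and a real check would feed it the Vary value fetched from each endpoint under test.

```python
ALLOWED_VARY = {"accept-encoding", "accept-language"}  # assumed policy

def vary_violations(vary_header, allowed=ALLOWED_VARY):
    """Return the Vary members that are not on the allowlist, lower-cased and sorted."""
    members = {m.strip().lower() for m in vary_header.split(",") if m.strip()}
    return sorted(members - set(allowed))

print(vary_violations("Accept-Encoding, X-Forwarded-Proto, X-User-Locale"))
# ['x-forwarded-proto', 'x-user-locale']
print(vary_violations("accept-encoding"))  # []
print(vary_violations("*"))                # ['*']
```

Failing the build whenever the returned list is non-empty keeps new Vary members from slipping in unnoticed.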

Example hardening snippet for Nginx:

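# Only vary on Accept-Encoding for static files
location /static/ {
    proxy_pass http://backend;  # hypothetical upstream name
    # expires also emits Cache-Control: max-age=2592000, so add only "public"
    expires 30d;
    add_header Cache-Control "public";
    # Keep Nginx's own cache from keying on an upstream-supplied Vary
    proxy_ignore_headers Vary;
    # Strip any Vary sent by upstream, then emit a fixed value to clients
    proxy_hide_header Vary;
    add_header Vary "Accept-Encoding";
}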

Common Mistakes

  • Assuming Vary is safe because it's "only a list of headers". Attackers can control the presence and value of those headers.
  • Relying on default CDN behaviour. Many CDNs vary on Accept-Encoding automatically, and some bucket User-Agent by device class; check the defaults and explicitly disable what you do not need.
  • Using Vary: * as a catch-all. This often disables caching but can also cause the cache to treat every unknown header as a key component, leading to cache fragmentation and poisoning opportunities.
  • Neglecting case and whitespace normalization. Inconsistent handling across layers creates two distinct cache keys for the same logical request.
  • Forgetting to purge after a fix. Even after removing a vulnerable Vary, stale poisoned entries may remain until TTL expiry.

Real-World Impact

Cache poisoning can be weaponized in several ways:

  1. Cross-Site Scripting (XSS) - Injected scripts into cached HTML/JS affect all users.
  2. Credential Harvesting - Poisoned login pages can capture credentials via keyloggers.
  3. Defacement - Replace legitimate static assets with malicious content (e.g., ransomware landing pages).
  4. Bypass Security Controls - Overwrite Content-Security-Policy or Strict-Transport-Security headers.

From a risk-management perspective, a single poisoned entry has a high-impact, low-effort profile, especially on high-traffic sites where one cached response can reach millions of users.

Expert opinion: As CDNs introduce more edge-computing capabilities (workers, functions), the attack surface expands. Teams must treat any header that influences response generation as untrusted and enforce strict whitelist policies at the edge.

Practice Exercises

  1. Identify Vary misuse
    • Run curl -I https://your-lab-site.com/ and note the Vary header.
    • Using Burp, inject a custom header listed in Vary and verify whether the response changes.
  2. Poison a CDN cache
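    • Deploy a simple Node.js app behind Cloudflare that echoes back an X-Theme header inside a code block.
    • Craft a request with X-Theme: evil, then request the page again without the header using curl -i and check the cache-status response header (e.g., CF-Cache-Status) together with the body for the injected value.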
    • Clear the cache and repeat to confirm persistence.
  3. Mitigation implementation
    • Modify the app to whitelist Accept-Encoding only in Vary.
    • Re-run the poison attempt; verify that the cache no longer stores the malicious variant.
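
For readers who prefer to stay in Python, exercise 2 can use a standard-library origin in place of the Node.js app. This is a deliberately vulnerable sketch for lab use only; the names ThemeEchoHandler and make_server, the default port 8080, and the 60-second max-age are all assumptions.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class ThemeEchoHandler(BaseHTTPRequestHandler):
    """Deliberately vulnerable lab origin: reflects X-Theme into the page unescaped."""

    def do_GET(self):
        theme = self.headers.get("X-Theme", "default")
        body = f"<pre><code>theme = {theme}</code></pre>\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        # Cacheable, and no Vary: X-Theme, so a cache in front will not key on it.
        self.send_header("Cache-Control", "public, max-age=60")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep lab output quiet

def make_server(port=8080):
    """Create (but do not start) the lab origin bound to localhost."""
    return ThreadingHTTPServer(("127.0.0.1", port), ThemeEchoHandler)
```

Start it with make_server().serve_forever() and place the cache under test in front of it. For exercise 3, adding Vary: Accept-Encoding alone does not stop the reflection; the mitigation is to stop reading X-Theme or to include it in the cache key.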

Lab resources: GitHub repository with Docker compose for a vulnerable Nginx+Varnish stack.

Further Reading

  • RFC 7234 - Hypertext Transfer Protocol (HTTP/1.1): Caching (obsoleted by RFC 9111 - HTTP Caching)
  • Cloudflare Blog - "Cache Poisoning Attacks and Mitigations" (2022)
  • OWASP - "Web Cache Poisoning" cheat sheet
  • Varnish Cache documentation - vcl_hash customization
  • "The Security of Web Caches" - IEEE S&P 2021 research paper

Summary

Web cache poisoning via header manipulation is a potent, often overlooked attack vector. Mastering the interplay between Vary, request headers, and cache key generation enables both offensive exploitation and defensive hardening. Key takeaways:

  • Never trust client-controlled headers in Vary - whitelist rigorously.
  • Normalize header names, case, and whitespace before they influence cache keys.
  • Prefer explicit Cache-Control directives over implicit Vary heuristics.
  • Continuously test edge-caches with automated tools; treat any variation as a potential poisoning point.

By integrating these practices into development and DevSecOps pipelines, organizations can keep the performance benefits of caching without exposing themselves to high-impact cache poisoning threats.