Exploring Android Heap allocations in jemalloc 'new'

Rédigé par Nicolas Stefanski - 30/05/2023 - dans Exploit , Reverse-engineering - Téléchargement
When writing an exploit for a memory corruption vulnerability, knowing the heap allocator internals is often required to shape the heap as desired. This article will dive into one of Android libc allocators: jemalloc 'new' (jemalloc version 5 and superior). Whereas scudo is the latest allocator introduced in the platform, jemalloc 'new' is still very used today but not well documented.


Jemalloc 'new', jemalloc >= 5, is a libc heap allocator introduced in Android 10. While sharing the same name with jemalloc and some features, its backend is very different. There is not much public research on the subject1. Moreover, despite the introduction of the scudo allocator in Android 11, a lot of devices are still using jemalloc 'new', Samsung Galaxy S23 included. For all these reasons, a post on the subject seems relevant.

The current article will focus mainly on the Android implementation2 (b0dbd53 commit as of writing) corresponding to the 5.1.0 jemalloc branch (with custom changes).

This blogpost will describe the main concepts of jemalloc 'new' and differences with jemalloc 'old'. Jemalloc version >= 5 will be referred as jemalloc new whereas jemalloc old will refer to jemalloc version < 5.

Anakin Padme meme

How to know if a device uses jemalloc 'new' ?

Android versions are known to be very fragmented across devices. But even if two devices share the same Android version, they could have a different libc allocator. For instance, Google Pixel 4a 5G on Android 13 uses scudo as its allocator whereas Samsung Galaxy S23 uses jemalloc 'new' instead.

The easiest way to find the allocator in use is to look at the symbols in the libc. Look for symbols starting with je_. If hundreds are found, it should be jemalloc old or new. If the Android version is fairly new, it is mostly jemalloc 'new'.

$ readelf -s libc.so | grep je_ | wc -l

To be sure, look for symbols containing extent in them, jemalloc 'old' will return only around 10 results and jemalloc 'new' around 100.

$ readelf -s libc.so | grep extent | wc -l


Jemalloc 'new' uses several important objects:

  • arena (arena_t): main structure keeping track of allocations
  • extent (extent_t): virtual memory segment
  • region: small allocation
  • slab: an extent containing regions of the same size, the underlying structures is an extent_t
  • bin (bin_t): heap of extents/slab of a specific size class
  • tcache (tcache_t): cache of the most recent freed allocations for a given thread

As opposed to scudo or dlmalloc, metadata are not inlined before the address returned by malloc(), they are stored in a totally different place in memory.

Another noteworthy feature is the lack of memory unmapping. By default mapped extents are kept until the end of the process (except if the opt_retain option is set to false, the default is true).

jemalloc new overview diagram
Jemalloc 'new' main structures


Main structures


Arena structures manage parts of the memory independently to reduce concurrency locks. On Android, only 2 arenas are used. Each thread is assigned to one of these two arenas. As such, reusing an allocation made by one thread from another (for instance to exploit a use-after-free), requires the two threads to share the same arena. Its structure struct arena_s/arena_t is available in include/jemalloc/internal/arena_structs_b.h.

arenas maintain lists of allocated/free/cached chunks of memory. These basic chunks are called extents on jemalloc 'new'. They can be used as a whole for large allocations or split into smaller regions.

Available extents (i.e. free) are spread across 3 heaps:

  • dirty: extents newly freed
  • muzzy: extents decayed from dirty
  • retained: extents decayed from muzzy

These heaps store extents from most recently freed (dirty) to least recently freed (retained). Extents transition from a heap to the next one when a decay occurs.

Note: if the global variable opt_retain is set to 1 (default case), when a decay occurs in the retained heap, the extents stay there and are never unmapped. So by default, the mapped extents are never unmapped from the process memory.

// include/jemalloc/internal/arena_structs_b.h

struct arena_s {

// [...]

* Collections of extents that were previously allocated. These are
* used when allocating extents, in an attempt to re-use address space.
* Synchronization: internal.

extents_t extents_dirty;
extents_t extents_muzzy;
extents_t extents_retained;

// [...]


The extents_t structure keeps track of free extents per size class in its heaps. The full list of extents can also be accessed through the lru field which is a simple double linked list. The bitmap field is used to determine which heaps are non-empty.

// include/jemalloc/internal/extent_structs.h
struct extents_s {
	// [...]

	 * Quantized per size class heaps of extents.
	 * Synchronization: mtx.
	extent_heap_t		heaps[NPSIZES+1];

	 * Bitmap for which set bits correspond to non-empty heaps.
	 * Synchronization: mtx.
	bitmap_t		bitmap[BITMAP_GROUPS(NPSIZES+1)];

	 * LRU of all extents in heaps.
	 * Synchronization: mtx.
	extent_list_t		lru;

	// [...]

Arenas also keep track of slabs with free regions (small allocations) in heaps ordered by size class (bins). On Android 64-bit processes, the number of small size classes (NBINS) is 36.

// include/jemalloc/internal/arena_structs_b.h
struct arena_s {
	// [...]
	 * bins is used to store heaps of free regions.
	 * Synchronization: internal.
	bin_t			bins[NBINS];

	// [...]

There is a bin for each size class. These bin_t contain extents (also called slabs in the context of small allocations) with free regions but also a list of full slabs.

// include/jemalloc/internal/bin.h
typedef struct bin_s bin_t;
struct bin_s {
	// [...]

	 * Current slab being used to service allocations of this bin's size
	 * class.  slabcur is independent of slabs_{nonfull,full}; whenever
	 * slabcur is reassigned, the previous slab must be deallocated or
	 * inserted into slabs_{nonfull,full}.
	extent_t		*slabcur;

	 * Heap of non-full slabs.  This heap is used to assure that new
	 * allocations come from the non-full slab that is oldest/lowest in
	 * memory.
	extent_heap_t		slabs_nonfull;

	/* List used to track full slabs. */
	extent_list_t		slabs_full;

    // [...]

The slabcur field holds the last slab used that contains free regions. This field is used by jemalloc to speed up small allocations. If there are no more free regions in slabcur, the slab is inserted into the slabs_full list and another slab from slab_nonfull with free regions takes its place.

arena/bins/extents relations
arena/bins/extents relations

Extents, extents everywhere!

The main difference with the 'old' jemalloc is the use of extents instead of chunks. All allocations are backed by them (even jemalloc metadata). See this article3 for further information about jemalloc 'old'.

Jemalloc 'old' chunks had a fixed size of 2MB (on Android systems) whereas extents sizes can vary but are always a multiple of page size (4096 bytes). The memory reservation is done through mmap called in pages_map. Newly created extents are assigned a serial number (last extent's serial number + 1). These newly mapped extents can be split and merged depending on jemalloc needs. In case of split, children extents inherit their parent serial number. In case of a merge, the smallest serial number is kept. Two extents can be merged only if they are contiguous in memory and belong to the same arena.

References to all the created extents are kept in a global radix tree extents_rtree of type rtree_t. This tree is mostly used at the time of deallocation to find the extent corresponding to the pointer being freed.

Extents are used for two types of allocations:

  • Small allocations (<= 0x3800 bytes): in this case extents are called slabs and contain several regions of the same size (the number of region depends on the size class).
  • Large allocations (> 0x3800 bytes): the extent contains a single allocation.

There is no huge allocations like in jemalloc 'old', these are large allocations.

// include/jemalloc/internal/extent_structs.h

struct extent_s {
	uint64_t		e_bits; // metadata about the extent: serial number, is it a slab, ...
	void			*e_addr; // pointer to actual mmaped memory

	 * List linkage, used by a variety of lists:
	 * - bin_t's slabs_full
	 * - extents_t's LRU
	 * - stashed dirty extents
	 * - arena's large allocations
	ql_elm(extent_t)	ql_link;

	 * Linkage for per size class sn/address-ordered heaps, and
	 * for extent_avail
	phn(extent_t)		ph_link;

	union {
		/* Small region slab metadata. */
		arena_slab_data_t	e_slab_data;

		// [...]

Here are few noteworthy fields:

  • e_bits: extent metadata
  • e_addr: pointer to actual mapped memory
  • ql_link and ph_link: extents can be stored in lists and heaps, ql_link contains the list node of the current extent and ph_link the heap node
  • e_slab_data: if the extent is a slab, this field contains slab metadata (mainly a bitmap describing free/allocated regions in this slab)

Note: For simplicity and readability, in these diagrams, data are depicted inside extent_t. Metadata (extent_t) and actual data (result of mmap pointed by the e_addr field), are not stored contiguously and can even be very far appart from each other.

Small allocations

An allocation is considered small if its size is <= 0x3800. In this case, extents are called slabs and each slab contains regions of the same size.

For each allocation size class, there is a corresponding slab/extent size. Assuming an allocation of size N in a slab of size M, the number of regions in the slab is M/N.

Small allocation size classes for 64-bit processes on Android
# Allocation Size Slab/Extent Size



4096 (0x1000)


16 (0x10)

4096 (0x1000)


32 (0x20)

4096 (0x1000)


48 (0x30)

12288 (0x3000)


64 (0x40)

4096 (0x1000)


80 (0x50)

20480 (0x5000)


96 (0x60)

12288 (0x3000)


112 (0x70)

28672 (0x7000)


128 (0x80)

4096 (0x1000)


160 (0xa0)

20480 (0x5000)


192 (0xc0)

12288 (0x3000)


224 (0xe0)

28672 (0x7000)

# Allocation Size Slab/Extent Size


256 (0x100)

4096 (0x1000)


320 (0x140)

20480 (0x5000)


384 (0x180)

12288 (0x3000)


448 (0x1c0)

28672 (0x7000)


512 (0x200)

4096 (0x1000)


640 (0x280)

20480 (0x5000)


768 (0x300)

12288 (0x3000)


896 (0x380)

28672 (0x7000)


1024 (0x400)

4096 (0x1000)


1280 (0x500)

20480 (0x5000)


1536 (0x600)

12288 (0x3000)


1792 (0x700)

28672 (0x7000)

# Allocation Size Slab/Extent Size


2048 (0x800)

4096 (0x1000)


2560 (0xa00)

20480 (0x5000)


3072 (0xc00)

12288 (0x3000)


3584 (0xe00)

28672 (0x7000)


4096 (0x1000)

4096 (0x1000)


5120 (0x1400)

20480 (0x5000)


6144 (0x1800)

12288 (0x3000)


7168 (0x1c00)

28672 (0x7000)


8192 (0x2000)

4096 (0x1000)


10240 (0x2800)

20480 (0x5000)


12288 (0x3000)

12288 (0x3000)


14336 (0x3800)

28672 (0x7000)

These sizes are computed from information contained in include/jemalloc/internal/size_classes.h as described in this comment:

// include/jemalloc/internal/size_classes.h
 *   ...
 *   SIZE_CLASSES: Complete table of SC(index, lg_grp, lg_delta, ndelta, psz,
 *                 bin, pgs, lg_delta_lookup) tuples.
 *     index: Size class index.
 *     lg_grp: Lg group base size (no deltas added).
 *     lg_delta: Lg delta to previous size class.
 *     ndelta: Delta multiplier.  size == 1<<lg_grp + ndelta<<lg_delta
 *     psz: 'yes' if a multiple of the page size, 'no' otherwise.
 *     bin: 'yes' if a small bin size class, 'no' otherwise.
 *     pgs: Slab page count if a small bin size class, 0 otherwise.
 *     lg_delta_lookup: Same as lg_delta if a lookup table size class, 'no'

#define SIZE_CLASSES \
  /* index, lg_grp, lg_delta, ndelta, psz, bin, pgs, lg_delta_lookup */
    SC(  0,      3,        3,      0,  no, yes,   1,  3) \ // size == 8 
    SC(  1,      3,        3,      1,  no, yes,   1,  3) \ // size == 16 (0x10)
    SC(  2,      4,        4,      1,  no, yes,   1,  4) \ // size == 32 (0x20)
    SC(  3,      4,        4,      2,  no, yes,   3,  4) \ // size == 48 (0x30)
    SC(  4,      4,        4,      3,  no, yes,   1,  4) \ // size == 64 (0x40)

Each size class has a corresponding index size class, first parameter of the SC macro (often called szind in the source code). The SIZE_CLASSES macro is used for large allocations as well. The formula to compute the size of allocations in a size class is as follow:

size == 1<<lg_grp + ndelta<<lg_delta

For instance on Android in a 64-bit process, the allocation size of the size class 1 is:

1<<3 + 1<<3 => 16


Note: Several SIZE_CLASSES macro are defined in include/jemalloc/internal/size_classes.h depending on the underlying architecture. On Android aarch64 in a 64-bit process, the SIZE_CLASSES actually used is defined under the following conditions:

#   if (LG_SIZEOF_PTR == 3 && LG_TINY_MIN == 3 && LG_QUANTUM == 4 && LG_PAGE == 12)

For a 32-bit process, replace LG_SIZEOF_PTR with 2 and LG_QUANTUM with 3.

Extents/slabs keep track of free regions through e_slab_data containing a bitmap of available regions.

Small allocation diagram
Small allocation structures relations

Large allocations

An allocation is considered large if its size is strictly greater than 0x3800. Like small allocations, large ones have size classes available in the same macro in include/jemalloc/internal/size_classes.h (look for size classes with the column bin with a no value).

#define SIZE_CLASSES \
  /* index, lg_grp, lg_delta, ndelta, psz, bin, pgs, lg_delta_lookup */ \
    SC( 35,     13,       11,      3,  no, yes,   7, no) \ // last small alloc size class
    SC( 36,     13,       11,      4, yes,  no,   0, no) \ // first large alloc size class
    SC( 37,     14,       12,      1, yes,  no,   0, no) \
    SC( 38,     14,       12,      2, yes,  no,   0, no) \

Here a list of the first ones in a 64-bit process:

Large allocation size classes for 64-bit processes on Android
# Extent size in tcache
36 16384 (0x4000) yes
37 20480 (0x5000) yes
38 24576 (0x6000) yes
39 28672 (0x7000) yes
40 32768 (0x8000) yes
41 40960 (0xa000) yes
42 49152 (0xc000) yes
42 57344 (0xe000) yes
44 65536 (0x10000) yes
45 81920 (0x14000) no
46 98304 (0x18000) no
  ... no
231 8070450532247928832 (0x7000000000000000) no


When a large allocation of size N is required, the closest size class M such that M >= N is selected. An extent of size M + 0x1000 is then claimed.

The actual base of the allocation (address sent back to the user), is randomized between the extent base and base + 0x1000 with a granularity of 0x40 bytes (see extent_addr_randomize in include/jemalloc/internal/extent_inlines.h).

For instance, if an allocation of 0x9000 bytes is made (i.e. malloc(0x9000)):

  • The closest size class 0xa000 is selected.
  • An extent of size 0xb000 (size class + 0x1000) is retrieved.
  • A random pointer to the first page of this extent is returned, aligned on 64 bytes.

This is only true when an extent is extracted from a list of extents but not from a tcache (see tcache section). From a tcache, the base address is the same as before and not randomized again.

Extent selection algorithm

When a new slab or a large allocation is needed, an extent needs to be retrieved. The selection process follows these steps:

  1. The arena is retrieved from the thread local storage.
  2. Check if arena->extents_dirty contains a matching extent
    • On Android, find the oldest extent with the lowest address (described as a "first-fit" algorithm in the source code)
    • Split the extent in two parts if it is too big
  3. Check if arena->extents_muzzy contains a matching extent
    • Same process as arena->extents_dirty
  4. Check if arena->extents_retained contains a matching extent
    • Same process as arena->extents_muzzy
  5. Otherwise, create a new extent using mmap.

When an extent is created using mmap, its size depends on the value of the extent_grow_next field in the arena_t structure. extent_grow_next defines the index size class of the new created extent. After the extent creation, this value is incremented. The created extent will then be split into:

  • A first extent of the requested size that will be used immediatly
  • A second extent that will be placed in the arena->extents_retained heap.

The sz_pind2sz_tab global table, contains, for each index, the corresponding mmap size. As such, the easiest way to infer the new size, is to dump this table and to look at the index of extent_grow_next.

Here is an extract of this table:

var je_sz_pind2sz_tab = [
	0x1000, 0x2000, 0x3000, 0x4000, 0x5000, 0x6000, 0x7000,
	0x8000, 0xA000, 0xC000, 0xE000, 0x10000, 0x14000, 0x18000,
	0x1C000, 0x20000, 0x28000, 0x30000, 0x38000, 0x40000,
	0x50000, 0x60000, 0x70000, 0x80000, 0xA0000, 0xC0000,
	0xE0000, 0x100000, 0x140000, 0x180000, 0x1C0000, 0x200000,
	0x280000, 0x300000, 0x380000, 0x400000, 0x500000, 0x600000,
	0x700000, 0x800000, 0xA00000, 0xC00000, 0xE00000, 0x1000000,
	0x1400000, 0x1800000, 0x1C00000, 0x2000000, 0x2800000,
	0x3000000, 0x3800000, 0x4000000, 0x5000000, 0x6000000,
	0x7000000, 0x8000000, 0xA000000, 0xC000000, 0xE000000,
	0x10000000, 0x14000000, 0x18000000, 0x1C000000, 0x20000000,
	0x28000000, 0x30000000, 0x38000000, 0x40000000, 0x50000000,
	0x60000000, 0x70000000, 0x80000000, 0xA0000000, 0xC0000000,
	0xE0000000, 0x100000000, 0x140000000, 0x180000000,
	0x1C0000000, 0x200000000, 0x280000000, 0x300000000,
	0x380000000, 0x400000000, 0x500000000, 0x600000000,
	// ...

Note: Before jemalloc 5.2.1, the extent was selected from arena->extents_dirty using a best-fit algorithm (smallest matching size whatever the serial number). However, it seems that the implementation was leaking memory. Therefore it was replaced by the "first-fit" implementation in jemalloc 5.2.1 master repository (and not 5.2.0 as stated in the comment) and Android repository. See the comment below4.

// The best-fit selection is reported to possiblity cause a memory leak.
// This code has been completely removed from 5.2.0, so remove it from
// our tree rather than risk a leak.
// See https://github.com/jemalloc/jemalloc/issues/1454


Besides the associated arena, each thread keeps in its local storage a tcache_t/struct tcache_s structure (include/jemalloc/internal/tcache_structs.h). To speed up the allocation process, some free allocations are kept in the tcache before they are made available to other threads .

When performing an allocation, jemalloc will look in bins_small for small allocations and bins_large for large allocations. If no entry is found, the allocator uses the arena bound to the thread. By default, on Android, only small allocations and large allocations with sizes <= 0x10000 are put in bins_large. The largest allocation size that can be put in the large tcache is defined by the global variable opt_lg_tcache_max (max_size = 1 << opt_lg_tcache_max).

// include/jemalloc/internal/tcache_structs.h

struct tcache_s {
    // [...]
	/* Drives incremental GC. */
	ticker_t	gc_ticker;
	 * The pointer stacks associated with bins follow as a contiguous array.
	 * During tcache initialization, the avail pointer in each element of
	 * tbins is initialized to point to the proper offset within this array.
	cache_bin_t	bins_small[NBINS];

    // [...]
	/* The arena this tcache is associated with. */
	arena_t		*arena;
	/* Next bin to GC. */
	szind_t		next_gc_bin;
	// [...]

	 * We put the cache bins for large size classes at the end of the
	 * struct, since some of them might not get used.  This might end up
	 * letting us avoid touching an extra page if we don't have to.
	cache_bin_t	bins_large[NSIZES-NBINS];

Each entry of bins_small/bins_large corresponds to a specific allocation size class and is a type cache_bin_t (include/jemalloc/internal/cache_bin.h).

// include/jemalloc/internal/cache_bin.h

typedef struct cache_bin_s cache_bin_t;
struct cache_bin_s {
	/* Min # cached since last GC. */
	cache_bin_sz_t low_water;
	/* # of cached objects. */
	cache_bin_sz_t ncached;
    // [...]
	 * Stack of available objects.
	 * To make use of adjacent cacheline prefetch, the items in the avail
	 * stack goes to higher address for newer allocations.  avail points
	 * just above the available space, which means that
	 * avail[-ncached, ... -1] are available items and the lowest item will
	 * be allocated first.
	void **avail;

In cache_bin_s, avail is a dynamically allocated stack containing addresses and is always pointing to the end of the allocated buffer. The start address of the avail buffer can be retrieved with the max number of elements ncached_max : start_avail = &avail[-ncached_max]. This stack is negatively indexed so the first entry (most recently added) is located at avail[-ncached] and the last entry (least recently added) at avail[-1].

When an allocation is popped from the tcache, the entry at avail[-ncached] is returned and ncached is decreased. When an allocation is freed and pushed to the tcache, ncached is increased and the address is stored at avail[-ncached].

tcache structure diagram
tcache structure

Note: each thread has its own tcache and is associated to an arena, but its tcache can contain pointers allocated from extents or slabs of other arenas.

For each cache bin size class, a maximum number of cached objects is allowed and stored in the global array tcache_bin_info of type cache_bin_info_t.

// include/jemalloc/internal/cache_bin.h

typedef struct cache_bin_info_s cache_bin_info_t;
struct cache_bin_info_s {
	/* Upper limit on ncached. */
	cache_bin_sz_t ncached_max; // int32_t


The memory is not infinite and cached memory should be made available again to other threads if it is too old. This is the reason why tcaches->bins_{small,large} need to be flushed. This can be triggered by 2 events:

  • tcache->bins_XXX[i].ncached reaches tcache_bin_info[i].ncached_max, the cache is full.
  • tcache->g_ticket.tick starts with the value tcache->g_ticket.nticks and is decreased each time a tcache operation is performed. When tcache->g_ticket.tick reaches 0, a garbage collection is triggered: the bin pointed by the index tcache->next_gc_bin is flushed and tcache->next_gc_bin is increased modulo the total number of bins (tcache_event_hard in src/tcache.c).

tcache bins flushes are performed in two ways depending on the event triggering them:

  • Cache full: remove half of the oldest entries from the cache
  • GC triggered: remove 3/4 of the oldest entries from cache below the value tcache->bins_XXX[i].low_water and low_water takes the value of ncached

When a tcache is flushed:

  • Small allocation objects (i.e. regions) are put back to their corresponding arena->bin[i] and marked as free.
  • Large allocation objects (i.e. extents) are put back to the heap of available extents in their arena (the arena->extents_dirty heap).

Memory usage


Small allocations

Assuming szind the index class size, the process jemalloc uses to allocate a small region is as follows:

  • Look for a matching region directly in tcache->bins_small[szind] heap
  • Look for a slab with a matching free region in arena->bins[szind] heap
  • Create a new slab from an extent
    • Look for an extent in extents heaps
      • arena->extents_dirty
      • arena->extents_muzzy
      • arena->extents_retained
    • Create a new extent if no extent is available
    • split the extent/slab into regions and put it in arena->bins[szind]

Large allocations

Assuming szind is the index class size, the process jemalloc uses to perform a large allocation is as follows:

  • If the size is <= 0x10000, look for a matching large alloc directly in tcache->bins_large[szind] heap.
  • Look for an extent in extents heaps
    • arena->extents_dirty
    • arena->extents_muzzy
    • arena->extents_retained
  • Create a new extent if no extent is available.


When freeing the pointer ptr, the process is the following:

  • Look for the extent in which ptr belongs.
  • Find the corresponding size class index szind for this extent.
  • If its a small size class (<= 0x3800)
    • put ptr into the tcache->bins_small[szind] cache
    • if the cache bin is full, flush it
  • If its a large size class and its size is <= 0x10000
    • put ptr into the tcache->bins_large[szind] cache
    • if the cache_bin is full, flush it
  • If its a large size class and its size is > 0x10000
    • get the ptr extent and put it back to the heap of available extents, arena->extents_dirty


Jemalloc 'new' internals are quite complex and use a lot of different structures to increase speed and minimize memory fragmentation in a multithreaded process.

From an attacker point of view, here are a few takeaways:

  • When exploiting heap overflow vulnerabilities, as opposed to scudo, there is no canaries between allocations. Depending on the allocation size, the adjacent region or extent can be overwritten. However, the exploitation of such vulnerabilities on large allocations is hindered by the random offset added by jemalloc.
  • When exploiting use-after-free vulnerabilities, the tcache mecanism makes it easy to reuse an allocation on the same thread. However, this becomes tricky when trying to reuse an allocation from another thread as the tcache needs to be flushed and the two threads need to share the same arena.
  • Another significant behavior to take into account is the lack of extent unmapping, the opt_retain option being set to true by default.

The author would like to thank his colleagues at Synacktiv, Clément Berthaux and Jean-Baptiste Cayrou, for their feedback and support throughout the writing process of this post.