WordPress for Security Audit

Written by Antoine Gicquel - 15/12/2023 - in Pentest

WordPress is a major player in the CMS market, powering around 40% of websites today. This widespread adoption has made it an attractive target for security research, as the safety of millions of websites depends on it. In this article, we will study its core architecture in detail: project structure, authorization mechanisms, hooks, routing system, APIs and plugins.

WordPress Core

Compared to other CMSs, the codebase of WordPress Core is rather small (around 500,000 lines of PHP code in 1,300 files). Indeed, WordPress aims to be a simple and easy solution for building a blog, and additional use cases and features are meant to be handled by plugins. Nonetheless, WordPress natively provides the usual features such as a routing system, user and data management, and authentication.

 

Project structure

First, let’s examine its file structure. WordPress is neatly organized, and its webroot is meant to point to the root folder of the WordPress installation, which is composed of the following files and folders:

  • index.php : the entrypoint for all requests on the REST API and the customer-facing part of the website;
  • wp-admin/ : pages, libraries and resources specific to the admin dashboard;
    • admin-ajax.php : endpoint responsible for handling AJAX requests from the clients' browsers, used by some plugins and by the administration dashboard;
    • *.php : all the features the administration dashboard offers;
  • wp-includes/ : contains all core WordPress libraries code and resources;
  • wp-content/ : contains the themes, plugins and user-uploaded files;
  • wp-config.php : contains most of the configuration data and secrets of the application;
  • xmlrpc.php : deprecated, used to integrate WordPress with third-party applications before the REST API was developed;
  • wp-load.php : bootstraps WordPress Core;
  • the rest of the PHP files handle specific features of WordPress (wp-login.php for login, wp-activate.php for email verification, …).

When navigating a WordPress-powered site, this translates to the following sections:

[Figure: Customer-facing part of the website, as an unauthenticated user]
[Figure: Login form]
[Figure: Authenticated users dashboard]
[Figure: Customer-facing part of the website, as an authenticated user. Notice the top-bar added to the page]

 

 

Authentication and Authorizations

WordPress's authorizations are based on a capabilities mechanism [1]. Users are assigned roles, which aggregate the rights to perform specific actions, called capabilities. For example, giving a user the “Author” role gives them all the capabilities associated with the Author role. It is also possible to give a capability directly to a user, without going through a role. Roles can inherit from one another. If role A inherits from role B, then the set of capabilities associated with A contains those associated with B. By default, there is no menu in the dashboard to edit the capabilities associated with users and roles, but one can be added with plugins such as User Role Editor [2].
The WordPress Core function to check the capabilities of the user initiating the current request is current_user_can($cap). It returns whether the initiator of the request possesses the capability $cap. Beware of the false friend is_admin(), which returns true if the constant WP_ADMIN is defined (basically equivalent to checking whether the URI is prefixed by /wp-admin/, as every page in there sets the WP_ADMIN constant to true). It says nothing about the privileges of the user.
Capabilities are designated by a string identifier (e.g. "publish_posts"), and consist of no more than that. Adding capabilities to roles and users is really straightforward in the code: functions like add_cap and current_user_can only add, remove and check for the presence of strings in a list. As an attacker, obtaining one of the following capabilities is interesting:

  • unfiltered_html: allows you to insert arbitrary HTML (including script tags) in articles, opening the door to XSS or CSRF;
  • install_themes, install_plugins: allows you to install a theme or a plugin of your choice, equivalent to free PHP code execution;
  • edit_themes, edit_plugins: same thing, but with existing plugins and themes.
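
To make the "capabilities are just strings" point concrete, here is a minimal Python model of the check. It is a sketch, not WordPress code: the Role and User classes and the sample data are hypothetical, and it assumes that capabilities granted directly to a user are merged on top of those coming from their roles.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    caps: dict = field(default_factory=dict)  # capability name -> granted?

    def add_cap(self, cap: str, grant: bool = True) -> None:
        self.caps[cap] = grant


@dataclass
class User:
    login: str
    roles: list = field(default_factory=list)
    caps: dict = field(default_factory=dict)  # capabilities granted directly

    def can(self, cap: str) -> bool:
        # Merge role capabilities first, then per-user ones (assumed to win).
        allcaps = {}
        for role in self.roles:
            allcaps.update(role.caps)
        allcaps.update(self.caps)
        return allcaps.get(cap, False)


author = Role("author", {"publish_posts": True, "upload_files": True})
alice = User("alice", roles=[author])
print(alice.can("publish_posts"))    # True
print(alice.can("install_plugins"))  # False

alice.caps["unfiltered_html"] = True  # capability granted directly to the user
print(alice.can("unfiltered_html"))  # True
```

The whole authorization decision boils down to a dictionary lookup on a string, which is why a single missing or misspelled capability check in a plugin is enough to expose a privileged feature.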

On a side note, be careful with themes: theme files are included in every page, so if you introduce a bug in one of them, chances are that the website will stop responding to any further request. You might then lock yourself out, with no possibility to revert your changes from the administrator panel.

Plugins can register new capabilities and create roles with the function add_role(), granting the new role the newly defined capabilities. It is good practice to also grant the Administrator role your new capabilities, so a plugin defining new capabilities might contain the following:

function myplugin_define_roles_and_caps() {
    $plugin_role_id = "mypluginrole";
    $plugin_role_displayname = "MyPlugin Role";
    $plugin_caps = array("myplugin_list_X" => true, "myplugin_do_Y" => false);

    // Create the role
    add_role($plugin_role_id, $plugin_role_displayname, $plugin_caps);
    // Grant the administrator with all the capabilities defined by this plugin
    $admin = get_role("administrator");
    if ( $admin ) {
        $admin->add_cap("myplugin_list_X", true);
        $admin->add_cap("myplugin_do_Y", true);
    }
}

add_action( 'plugins_loaded', 'myplugin_define_roles_and_caps' );

This code would be placed in the plugin’s main file, which will be described in a following section about plugins.

Regarding authentication, WordPress uses a classic approach: you enter your username and password in the login form (located in wp-login.php at the webroot of your WordPress instance), the password is salted and hashed using MD5 or bcrypt depending on the versions of WordPress and PHP in use, and the result is checked against the hash stored in the database. If they match, WordPress grants you a cookie named wordpress_logged_in_[hash], attesting that you have correctly logged in as your user.
The content of the wordpress_logged_in_[hash] cookie is interesting. It is generated by the function wp_generate_auth_cookie, and relies on session tokens stored alongside an expiration date in the database. This session token is the only thing preventing us from directly using the password hash of a user to forge a cookie, as you can see in the code:

function wp_generate_auth_cookie( $user_id, $expiration, $scheme = 'auth', $token = '' ) {
    $user = get_userdata( $user_id );
    if ( ! $user ) {
        return '';
    }

    if ( ! $token ) {
        $manager = WP_Session_Tokens::get_instance( $user_id );
        $token   = $manager->create( $expiration );
    }

    $pass_frag = substr( $user->user_pass, 8, 4 ); // user_pass is the hash of the user password

    $key = wp_hash( $user->user_login . '|' . $pass_frag . '|' . $expiration . '|' . $token, $scheme );

    // If ext/hash is not present, compat.php's hash_hmac() does not support sha256.
    $algo = function_exists( 'hash' ) ? 'sha256' : 'sha1';
    $hash = hash_hmac( $algo, $user->user_login . '|' . $expiration . '|' . $token, $key );

    $cookie = $user->user_login . '|' . $expiration . '|' . $token . '|' . $hash;
    [...]
}
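
For experimenting outside of PHP, the computation above can be transcribed to Python. This is a sketch under assumptions: wp_hash() is modelled here as an HMAC-MD5 keyed with a secret salt from the configuration, the salt is passed explicitly as a parameter, and every value below is a made-up placeholder.

```python
import hashlib
import hmac


def wp_hash(data: str, salt: str) -> str:
    # Assumption: models wp_hash() as HMAC-MD5 keyed with a secret salt.
    return hmac.new(salt.encode(), data.encode(), hashlib.md5).hexdigest()


def generate_auth_cookie(login: str, pass_hash: str, expiration: int,
                         token: str, salt: str) -> str:
    pass_frag = pass_hash[8:12]  # same slice as substr($user->user_pass, 8, 4)
    key = wp_hash(f"{login}|{pass_frag}|{expiration}|{token}", salt)
    mac = hmac.new(key.encode(), f"{login}|{expiration}|{token}".encode(),
                   hashlib.sha256).hexdigest()
    return f"{login}|{expiration}|{token}|{mac}"


# Hypothetical values, for illustration only.
cookie = generate_auth_cookie(
    login="admin",
    pass_hash="$P$BexampleexampleexampleexampleXX",
    expiration=1700000000,
    token="session-token-from-database",
    salt="secret-salt-from-wp-config",
)
print(cookie.split("|")[:3])
```

Changing the token changes both the derived key and the final MAC, so a cookie cannot be recomputed without it.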

 

Database structure

In its default configuration, WordPress’s database schema, just like its codebase, is condensed and fits in a dozen tables:

  • wp_users : stores user data on the CMS, like login, password and registration date;
  • wp_posts : stores the actual content of posts and pages of your WordPress site, along with information on the pages (title, status, …);
  • wp_comments : stores the comments left by users on your posts, alongside additional data like the comment date, author account, author IP address, …;
  • wp_terms : lists the terms (think of it like categories or tags for posts) of your site;
  • wp_term_relationships : links posts to terms;
  • wp_term_taxonomy : stores descriptions and links associated with a term;
  • wp_{user,post,comment,term}meta : These tables follow a specific schema, with 3 main columns: XXXX_id acting as a foreign key to the wp_XXXX table, meta_key naming the attribute and meta_value containing its value. Note that these tables can also be used by plugins to add additional metadata like a rating feature.
    • wp_usermeta : stores attributes like users’ capabilities* and various options (rich_editing, comment_shortcuts, …);
    • wp_postmeta : stores attributes like the template page of a post;
    • wp_commentmeta : can be used by plugins to store any data needed alongside comments;
    • wp_termmeta : stores attributes such as a term’s descriptive image, used mainly by plugins;
  • wp_options : stores the settings of the instance, including:
    • basic data like canonical_url and date_format;
    • mail server credentials;
    • widgets*;
    • user roles* (administrator, author, …) and associated capabilities;
    • ...

Note that some of these fields, marked with a *, contain lists or dictionaries and are stored as serialized objects. Although no hardening is applied in the PHP configuration, the serialized data is only composed of arrays and basic types.
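
As an illustration of the meta tables and of the serialized fields, the following Python sketch recreates a tiny wp_usermeta table in an in-memory SQLite database and extracts the roles of a user. The rows are hypothetical sample data, the regular expression is a deliberately naive stand-in for a real PHP unserializer, and the default wp_ table prefix is assumed.

```python
import re
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE wp_usermeta ("
    "umeta_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "meta_key TEXT, meta_value TEXT)"
)
# Sample rows (hypothetical); the capabilities value is a PHP-serialized array.
db.executemany(
    "INSERT INTO wp_usermeta (user_id, meta_key, meta_value) VALUES (?, ?, ?)",
    [
        (1, "rich_editing", "true"),
        (1, "wp_capabilities", 'a:1:{s:13:"administrator";b:1;}'),
        (2, "wp_capabilities", 'a:1:{s:6:"author";b:1;}'),
    ],
)


def roles_of(user_id: int) -> list:
    row = db.execute(
        "SELECT meta_value FROM wp_usermeta WHERE user_id = ? AND meta_key = ?",
        (user_id, "wp_capabilities"),
    ).fetchone()
    if row is None:
        return []
    # Naive extraction of the string keys whose value is boolean true.
    return re.findall(r's:\d+:"([^"]+)";b:1;', row[0])


print(roles_of(1))  # ['administrator']
print(roles_of(2))  # ['author']
```

With write access to this table (through an SQL injection, for instance), editing this single serialized string is enough to change the roles of a user.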

 

Hooks (Actions & Filters)

First, some terminology needs to be defined here:

  • A hook [3] is a list of functions (“callbacks”), along with a string identifier used to refer to the hook. Each callback is given a priority index in the hook when it is registered, which determines its order of execution. When the hook is triggered, callbacks are called in the defined order, one after the other. Actions and filters are wrapper concepts around the same class, WP_Hook.
  • An action is a type of hook in which no data is passed from one callback to the next. Actions are meant to trigger, well… actions whenever something happens, without needing a return value.
do_action($hook_name, $arg1, $arg2, ...)
├─ callback1_priority_1($arg1, $arg2, ...)
├─ callback2_priority_1($arg1, $arg2, ...)
├─ callback3_priority_2($arg1, $arg2, ...)
├─ ...
├─ return ''
  • Filters, on the other hand, chain the callbacks and output the value of the final callback. They are meant to filter user-supplied values, or to inject data somewhere. For example, they are extensively used when creating menus, where any core class or plugin can register a callback which takes a menu as input, and outputs the menu extended with the options provided by the class/plugin.
apply_filters($hook_name, $value, $extra_arg1, $extra_arg2, ...)
├─ $value = callback1_priority_1($value, $extra_arg1, $extra_arg2, ...)
├─ $value = callback2_priority_1($value, $extra_arg1, $extra_arg2, ...)
├─ $value = callback3_priority_2($value, $extra_arg1, $extra_arg2, ...)
├─ ...
├─ return $value

Notice that $value is updated upon each callback execution, and the updated value is passed to the next callback.
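
The two diagrams above can be condensed into a small Python model. It is only a sketch of the mechanism: the Hook class, the "menu" hook name and the callbacks are hypothetical, and the default priority of 10 mirrors the default used by add_filter.

```python
from collections import defaultdict


class Hook:
    def __init__(self):
        # priority -> callbacks, kept in registration order
        self.callbacks = defaultdict(list)

    def add(self, callback, priority=10):
        self.callbacks[priority].append(callback)

    def apply_filters(self, value, *extra):
        for priority in sorted(self.callbacks):
            for callback in self.callbacks[priority]:
                value = callback(value, *extra)  # output feeds the next callback
        return value

    def do_action(self, *args):
        for priority in sorted(self.callbacks):
            for callback in self.callbacks[priority]:
                callback(*args)  # return value is ignored


hooks = defaultdict(Hook)


def add_filter(name, callback, priority=10):
    hooks[name].add(callback, priority)


add_action = add_filter  # actions and filters share the same registry


def apply_filters(name, value, *extra):
    return hooks[name].apply_filters(value, *extra)


add_filter("menu", lambda menu: menu + ["Plugin B"], priority=20)
add_filter("menu", lambda menu: menu + ["Plugin A"], priority=10)
print(apply_filters("menu", ["Home"]))  # ['Home', 'Plugin A', 'Plugin B']
```

Although "Plugin B" registers first, its callback runs last because of its higher priority index.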

Now, if you scroll through WordPress Core code, you might stumble on lines like this one:

do_action( 'wp_create_file_in_uploads', $file, $attachment_id ); // For replication

This seems totally normal, given what we saw earlier. The next thing you wonder is: what does this 'wp_create_file_in_uploads' action do? You start searching for calls to add_action('wp_create_file_in_uploads' and add_filter('wp_create_file_in_uploads' to see which callbacks are linked to this hook, and… nothing. Indeed, some of these hook calls (annotated with a // For replication comment) are placed in the code to act as anchor points for plugin developers to register callbacks, either to be notified or to tweak the behavior of WordPress to their needs. Nonetheless, this does not mean that nothing gets executed by default when such an action is called. There is a special hook, registered with the name 'all', which is executed every time an action or a filter is triggered. This hook is particularly useful for logging every other hook call, for example, but comes with a heavy performance impact.

Here is a very simplified version of the apply_filters() method of the class WP_Hook:

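public function apply_filters( $value, $args ) {
    // Track nested calls, and iterate over the registered priorities in order
    $nesting_level                      = $this->nesting_level++;
    $this->iterations[ $nesting_level ] = array_keys( $this->callbacks );

    do {
        $this->current_priority[ $nesting_level ] = current( $this->iterations[ $nesting_level ] );
        $priority = $this->current_priority[ $nesting_level ];
        foreach ( $this->callbacks[ $priority ] as $the_ ) {
            // Feed the output of the previous callback to the next one
            $args[0] = $value;
            $value   = call_user_func_array( $the_['function'], $args );
        }
    } while ( false !== next( $this->iterations[ $nesting_level ] ) );

    unset( $this->iterations[ $nesting_level ] );
    unset( $this->current_priority[ $nesting_level ] );
    $this->nesting_level--;

    return $value;
}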

We can see that behind the scenes of every hook is the (in)famous PHP function call_user_func or its sibling call_user_func_array. This means that if a WordPress plugin lets the user control the second argument passed to add_action / add_filter (i.e. the callback function name), it could come close to arbitrary code execution, depending on the functions available in the current scope and on the arguments passed to the hook. Similarly, if you control the first argument of such a call, you could register a fixed callback on a hook of your choice. Maybe you can find a hook which passes parameters in a format unexpected by your callback and trigger some interesting behavior? This could make for fun CTF challenges, digging in a WordPress instance with custom plugins to find interesting gadgets for this kind of exploitation. Left as an exercise to the reader!

To wrap up this part, keep in mind that actions and filters are just two ways of referring to the same objects of type WP_Hook and as such, a hook can act both as an action and as a filter. So don’t be surprised to see a mix of add_action and add_filter referring to the same hook name: it is intended, and both act on the same WP_Hook object behind the scenes.

 

Routing & Rewrite Rules

WordPress mixes two types of routing:

  • In general, all the requests related to the customer-facing (unauthenticated) part of a WordPress-powered website are rewritten (according to the Apache/Nginx/IIS server configuration) and then managed by the index.php file at the webroot of the application using the Rewrite API. This mechanism is what we will discuss in the following paragraph.
  • All the other requests, for example to the backend panel, are not rewritten and are directly handled by the corresponding PHP files, like any basic PHP website. All the files include the core WordPress classes and then immediately check if the user has permission to use the administrative feature before running any code.

In the general case, the request is processed by the web server and lands on index.php. As WordPress internally decides on the action to take based on each query’s GET parameters, any “pretty” request URI (e.g. /author/admin) must be converted to the form /index.php?var1=value1&var2=value2&... (here /index.php?author_name=admin). This conversion is done in the wp-includes/rewrite.php file with the help of the rewrite_rules option (stored in the wp_options table), a dictionary mapping regexps to parameters:

php > var_dump($rewrite_rules);
array(96) {
  ["^wp-json/?$"]=>
  string(22) "index.php?rest_route=/"
  ["^wp-json/(.*)?"]=>
  string(33) "index.php?rest_route=/$matches[1]"
  ["^index.php/wp-json/?$"]=>
  string(22) "index.php?rest_route=/"
  ["^index.php/wp-json/(.*)?"]=>
  string(33) "index.php?rest_route=/$matches[1]"
  ["^wp-sitemap\.xml$"]=>
  string(23) "index.php?sitemap=index"
...

The request URI is evaluated against each regexp one at a time, and upon first match, is converted to the mapped form. In this form, it can be processed by the rest of WordPress to invoke the right controller and serve the right page.
More specifically, it translates to the following code path, where a box in a box means the outer file includes the inner file:

[Figure: WordPress routing codepath]
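
The first-match conversion of a pretty URI can be sketched in a few lines of Python. This is a model, not the WordPress implementation: it only contains a subset of the rules dumped earlier, and the author rule is a hypothetical regexp added to cover the /author/admin example.

```python
import re

# Subset of the rules shown earlier, plus a hypothetical author rule
# (the exact regexp stored by WordPress is an assumption).
REWRITE_RULES = {
    r"^wp-json/?$": "index.php?rest_route=/",
    r"^wp-json/(.*)?": "index.php?rest_route=/$matches[1]",
    r"^wp-sitemap\.xml$": "index.php?sitemap=index",
    r"author/([^/]+)/?$": "index.php?author_name=$matches[1]",
}


def rewrite(request_path: str):
    path = request_path.lstrip("/")
    for pattern, target in REWRITE_RULES.items():
        match = re.match(pattern, path)  # rules are anchored at the start
        if match:
            # Substitute $matches[n] with the n-th captured group.
            return re.sub(
                r"\$matches\[(\d+)\]",
                lambda m: match.group(int(m.group(1))) or "",
                target,
            )
    return None  # no rule matched


print(rewrite("/wp-json/wp/v2/posts/123"))  # index.php?rest_route=/wp/v2/posts/123
print(rewrite("/wp-sitemap.xml"))           # index.php?sitemap=index
print(rewrite("/author/admin"))             # index.php?author_name=admin
```

Because the first matching rule wins, the order of the rules matters: a plugin inserting a greedy rule at the top of the list can shadow the rules placed after it.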

 

WordPress APIs

Throughout the years, WordPress has seen 3 evolutions of API systems: XML-RPC, admin-ajax and the REST API. Let’s detail each of them:

XML-RPC

This system is the oldest of all. It was originally made for integrating your WordPress blog with other blogging sites and applications, in order to receive updates, post comments and do everything you could do on the blog without using the blog’s UI.
Developers and external applications interact with a blog’s XML-RPC API by sending HTTP POST requests to the XML-RPC endpoint of a WordPress site, a file called xmlrpc.php usually located at the webroot of the WordPress server. As the name implies, the content of the POST request is an XML document describing what you want to do. For instance, if you wanted to create a post on a WordPress website using the XML-RPC mechanism, the content of your request would be the following:

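<?xml version="1.0"?>
<methodCall>
  <methodName>metaWeblog.newPost</methodName>
  <params>
    <!-- Blog ID -->
    <param>
      <value><int>1</int></value>
    </param>
    <param>
      <value><string>YOUR_USERNAME</string></value>
    </param>
    <param>
      <value><string>YOUR_PASSWORD</string></value>
    </param>
    <!-- Post content, passed as a struct -->
    <param>
      <value>
        <struct>
          <member><name>title</name><value><string>New Post Title</string></value></member>
          <member><name>description</name><value><string>Post Content</string></value></member>
          <member><name>categories</name><value><array><data><value><string>Category Name</string></value></data></array></value></member>
        </struct>
      </value>
    </param>
    <!-- Publish flag -->
    <param>
      <value><boolean>1</boolean></value>
    </param>
  </params>
</methodCall>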
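
Such a document does not have to be written by hand. As a sketch, Python's standard xmlrpc.client module can serialize the call; the parameter order used below (blog ID, username, password, content struct, publish flag) is what metaWeblog.newPost expects to my knowledge, and the credentials are placeholders. Nothing is sent over the network here, the snippet only builds and re-parses the body that would be POSTed to xmlrpc.php.

```python
import xmlrpc.client

# Assumed parameter order for metaWeblog.newPost:
# blog ID, username, password, content struct, publish flag.
params = (
    1,
    "YOUR_USERNAME",
    "YOUR_PASSWORD",
    {"title": "New Post Title", "description": "Post Content"},
    True,
)
body = xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")
print(body)

# Round-trip check: parse the document back into (params, method name).
decoded_params, method = xmlrpc.client.loads(body)
print(method, decoded_params[3]["title"])
```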

This request is passed on to a wp_xmlrpc_server class, implemented in wp-includes/class-wp-xmlrpc-server.php by default (you could specify your own implementation via the wp_xmlrpc_server_class filter). The methodName value is checked against a large mapping of strings to function names (like 'metaWeblog.newPost' => 'this:mw_newPost' to follow on the example of a blog post creation, where mw_newPost is a method of the wp_xmlrpc_server class) and the corresponding function is called using the apply_filters method which we studied earlier. As you can see, the authentication mechanism is quite rudimentary: no fancy token, header or cookie, only the username and password sent along with each request.
XML-RPC suffers from multiple downsides:

  • slight performance overhead: as the format of an XML-RPC request is quite verbose and needs to be parsed from XML, the computing power required on the server side is non-negligible;
  • security issues: XML-RPC is known for being an open door to credential brute forcing (using the system.multicall method [4]) and denial of service attacks;
  • limited functionality: with the arrival of admin-ajax and the REST API, XML-RPC has not received the same amount of attention from developers as the newer mechanisms, and thus lacks features compared to both of them.

Despite its age, XML-RPC is still enabled by default in WordPress. Disabling it in favor of the REST API is highly encouraged nowadays.

Admin-AJAX

This API is used by WordPress itself in the UI of the blog. Its main purpose is to be used together with JavaScript to trigger actions and update data on a panel without triggering a full reload of the page. While it is mainly used in the administration panel of a WordPress blog, the list of actions handled by the admin-ajax mechanism includes two unauthenticated ones (heartbeat and generate_password), and can be extended by plugins to handle other features.
To interact with this API, a developer needs to send a GET (or POST) request to the admin-ajax.php file, located in the wp-admin folder at the webroot of the WordPress blog. The specific action and its parameters are provided through classic GET or POST parameters, and authentication is handled the same way as everywhere else on the blog, that is through the wordpress_logged_in_[hash] cookie.
Using JavaScript, a request might look like the following:

var requestData = {
  action: 'my_custom_ajax_action', // The action to be performed on the server-side
  data_param1: 'value1',
  data_param2: 'value2',
};

// Send the Ajax request to admin-ajax.php
jQuery.post({
  url: '/wp-admin/admin-ajax.php', // Path to the admin-ajax.php endpoint
  data: requestData, // Data to be sent with the request
  dataType: 'json', // The expected data type of the response
  success: function(response) {
    // Handle the response from the server
    console.log('Response:', response);
  },
  error: function(error) {
    // Handle errors, if any
    console.error('Error:', error);
  }
});

When a request is received and if the action parameter value matches with an AJAX action (as defined in wp-admin/admin-ajax.php and implemented in wp-admin/includes/ajax-actions.php), this action is triggered, and the result is served as a response.
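
For plugin-defined actions, the dispatch can be modelled as below. This Python sketch rests on an assumption worth verifying against the instance you audit: to my knowledge, admin-ajax.php turns the action parameter into a hook name prefixed with wp_ajax_ for logged-in users and wp_ajax_nopriv_ for anonymous ones. The handler and its JSON output are hypothetical.

```python
handlers = {}  # hook name -> callback


def add_action(hook: str, callback) -> None:
    handlers[hook] = callback


def admin_ajax(params: dict, logged_in: bool) -> str:
    action = params.get("action", "")
    # Assumption: authenticated and anonymous requests use distinct prefixes.
    prefix = "wp_ajax_" if logged_in else "wp_ajax_nopriv_"
    callback = handlers.get(prefix + action)
    if callback is None:
        return "0"  # no handler registered for this action
    return callback(params)


# Hypothetical plugin exposing the same handler to both audiences.
def my_custom_ajax_action(params: dict) -> str:
    return '{"echo": "%s"}' % params.get("data_param1", "")


add_action("wp_ajax_my_custom_ajax_action", my_custom_ajax_action)
add_action("wp_ajax_nopriv_my_custom_ajax_action", my_custom_ajax_action)

request = {"action": "my_custom_ajax_action", "data_param1": "value1"}
print(admin_ajax(request, logged_in=False))  # {"echo": "value1"}
```

Under this model, being reachable through the logged-in prefix only means that the requester is authenticated, not that they hold any particular capability, so handlers that skip their own capability check are a natural thing to look for during an audit.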

REST API

The REST API is the newest of all API mechanisms on WordPress. It uses JSON to transfer data, and follows the CRUD (Create, Read, Update, Delete) model. Its endpoints are accessible with the URI prefix /wp-json/, and plugins can register their own API routes via the register_rest_route function. To ensure that plugins do not conflict with each other or with the core, API routes are composed of a “namespace” and a “path”. For instance, in the /wp-json/wp/v2/posts/123 route (which returns the content of the post with ID 123):

  • /wp-json/ is the URL prefix to trigger the WordPress REST API,
  • wp/v2 is WordPress Core’s namespace,
  • /posts/123 corresponds to the path (defined as /posts/(?P<id>[\d]+) in the call to register_rest_route).

Note that the Rewrite API will transform this route to /?rest_route=/wp/v2/posts/123, as seen in the paragraph about WordPress routing.
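
The decomposition of a route into prefix, namespace and path can be sketched as follows in Python. The ROUTES registry is a hypothetical stand-in for the routes declared through register_rest_route, and the matching logic is a simplification of what the REST server does.

```python
import re

# Hypothetical registry of (namespace, path regexp) pairs.
ROUTES = [("wp/v2", r"/posts/(?P<id>[\d]+)")]


def resolve(uri: str):
    if not uri.startswith("/wp-json/"):
        return None
    rest_route = uri[len("/wp-json"):]  # e.g. "/wp/v2/posts/123"
    for namespace, path in ROUTES:
        match = re.fullmatch("/" + re.escape(namespace) + path, rest_route)
        if match:
            return {
                "rest_route": rest_route,
                "namespace": namespace,
                "args": match.groupdict(),  # named groups become arguments
            }
    return None


print(resolve("/wp-json/wp/v2/posts/123")["args"])  # {'id': '123'}
```

The named group of the path regexp is what ends up as a request parameter, so the regexp itself acts as a first layer of input validation: here, a non-numeric ID never reaches the controller.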

When hitting the route /wp-json/, you are given the full list of registered API endpoints. This can point you towards interesting plugins and custom namespaces when looking for a quick win on a specific WordPress instance.
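
As a sketch of how this listing can be triaged, the Python helper below groups the routes of an already downloaded index by namespace and drops the namespaces assumed to be shipped with WordPress Core. The CORE_NAMESPACES set is an assumption that may vary between versions, and the sample index is a hypothetical, heavily trimmed response.

```python
# Namespaces assumed to belong to WordPress Core (may vary between versions).
CORE_NAMESPACES = {"wp/v2", "oembed/1.0", "wp-site-health/v1", "wp-block-editor/v1"}


def custom_routes(index: dict) -> dict:
    """Group routes by namespace, skipping the namespaces assumed to be Core's."""
    result = {}
    for namespace in index.get("namespaces", []):
        if namespace in CORE_NAMESPACES:
            continue
        prefix = "/" + namespace
        result[namespace] = sorted(
            route for route in index.get("routes", {})
            if route == prefix or route.startswith(prefix + "/")
        )
    return result


# Hypothetical, heavily trimmed body of the response to GET /wp-json/
sample_index = {
    "namespaces": ["oembed/1.0", "wp/v2", "myplugin/v1"],
    "routes": {
        "/": {},
        "/wp/v2/posts": {},
        "/myplugin/v1": {},
        "/myplugin/v1/export": {},
    },
}
print(custom_routes(sample_index))
# {'myplugin/v1': ['/myplugin/v1', '/myplugin/v1/export']}
```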


In this API, two types of authentication are accepted:

  • classic cookie authentication: like in every other part of WordPress, users can authenticate with the wordpress_logged_in_[hash] cookie. However, to prevent CSRF attacks, this authentication needs to be used along with a CSRF nonce [5], passed in the X-WP-Nonce request header. Note that this nonce mechanism is also used across a WordPress blog to protect actions on the administration panel. If you encounter such nonces, you should have a look at the documentation page;
  • basic authentication with an Application Password: recent versions of WordPress have introduced the concept of Application Passwords, which you can generate from your account settings and which work like API tokens. If your WordPress instance is served over HTTPS, you can then use these credentials to log in to the API with HTTP Basic authentication.
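
In terms of HTTP headers, the two options can be sketched as follows in Python. The helper names and every value are hypothetical placeholders; the snippet only builds the headers and does not send any request.

```python
import base64


def application_password_headers(username: str, app_password: str) -> dict:
    # HTTP Basic authentication: base64("username:password")
    credentials = f"{username}:{app_password}".encode()
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode()}


def cookie_auth_headers(cookie_name: str, cookie_value: str, nonce: str) -> dict:
    # Cookie authentication must be accompanied by the CSRF nonce header.
    return {"Cookie": f"{cookie_name}={cookie_value}", "X-WP-Nonce": nonce}


print(application_password_headers("admin", "abcd efgh ijkl mnop"))
print(cookie_auth_headers("wordpress_logged_in_HASH", "COOKIE_VALUE", "NONCE"))
```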

 

Themes and Plugins

Plugins

Plugins on WordPress are located under the wp-content/plugins/ folder. They are usually contained in their own folder, and are made of at least one main file, named after the plugin, which acts as the entrypoint of the plugin. From there, plugins can include other files, register callbacks on hooks, register REST routes, add roles and capabilities, … They are discovered by WordPress using the get_plugins() function in /wp-admin/includes/plugin.php.
For example, WordPress’ example plugin Hello Dolly [6] consists of the following very simple directory structure:

hello-dolly/
├── hello.php
└── readme.txt

A more complex plugin would be composed of more files and classes, with the main file including the others, just like any PHP project.
Note that these files are accessible from the webroot, for example at yoursite.com/wp-content/plugins/hello-dolly/readme.txt for the readme.txt file of Hello Dolly. Plugins downloaded from GitHub or from the WordPress store usually contain a LICENSE.txt, CHANGELOG.txt or some other standard file which could leak information about the version of the plugin. This is how tools like WPScan [7] discover the version of installed plugins.
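
A minimal version of such a check can be sketched in Python. It assumes that the plugin ships a readme.txt containing a "Stable tag:" header, which is the usual convention for plugins hosted on the WordPress store; the readme content below is a made-up sample, and fetching the file over HTTP is left out.

```python
import re


def stable_tag(readme_text: str):
    """Return the value of the 'Stable tag:' header, or None if absent."""
    match = re.search(
        r"^\s*Stable tag:\s*(\S+)", readme_text, re.IGNORECASE | re.MULTILINE
    )
    return match.group(1) if match else None


# Hypothetical readme.txt content, for illustration only.
sample_readme = """=== Hello Dolly ===
Contributors: example
Stable tag: 1.7.2
Tested up to: 6.4
"""
print(stable_tag(sample_readme))  # 1.7.2
```

The extracted version can then be compared against public vulnerability databases, keeping in mind that the header is declarative and may lag behind the code actually deployed.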

In order to access query parameters, plugins can use the global PHP variables $_GET and $_POST. WordPress automatically escapes quotes and backslashes in these arrays (using the add_magic_quotes function, which internally calls PHP's addslashes function) before any plugin can use them. However, as stated in the PHP documentation [8], this mechanism should not be relied upon for security. Alternatively, to access GET request parameters, it is possible to register a query variable with the query_vars hook, and then to retrieve its value with the get_query_var function.

To sanitize these variables, WordPress offers a variety of filtering and escaping functions, which were mentioned in the first article of this series. Each one of these functions removes or escapes a different set of characters from a string, to use in different contexts. To handle file upload, the function wp_check_filetype checks the uploaded file's extension and outputs its corresponding MIME type.

Non-exhaustive list of WordPress escaping and sanitizing methods (43 methods)
  • sanitize_email()
  • sanitize_file_name()
  • sanitize_hex_color()
  • sanitize_hex_color_no_hash()
  • sanitize_html_class()
  • sanitize_key()
  • sanitize_meta()
  • sanitize_mime_type()
  • sanitize_option()
  • sanitize_sql_orderby()
  • sanitize_term()
  • sanitize_term_field()
  • sanitize_text_field()
  • sanitize_textarea_field()
  • sanitize_title()
  • sanitize_title_for_query()
  • sanitize_title_with_dashes()
  • sanitize_user()
  • sanitize_url()
  • esc_attr()
  • esc_html()
  • esc_js()
  • esc_textarea()
  • esc_sql()
  • esc_url()
  • esc_url_raw()
  • wp_kses()
  • wp_kses_array_lc()
  • wp_kses_attr()
  • wp_kses_bad_protocol()
  • wp_kses_bad_protocol_once()
  • wp_kses_check_attr_val()
  • wp_kses_decode_entities()
  • wp_kses_hair()
  • wp_kses_hook()
  • wp_kses_html_error()
  • wp_kses_js_entities()
  • wp_kses_no_null()
  • wp_kses_normalize_entities()
  • wp_kses_post()
  • wp_kses_split()
  • wp_kses_stripslashes()
  • wp_kses_version()

 

Themes

Themes, on the other hand, are stored in wp-content/themes and are made of multiple template PHP files, which are used to render the pages of the website in place of the default theme, as well as a main file called functions.php. For example, a simple theme might contain the following file structure:

example-theme/
├── style.css
├── index.php
├── header.php
├── footer.php
└── functions.php

Themes might include templates for every page type (author.php when rendering /author/<author_username>, date.php when rendering posts by date, single.php when rendering a single post, …), as well as three specific templates for the header, footer and sidebar of the blog. The index.php file is used to render the pages when the specific template for a page is not found. Theme templates are included and rendered in wp-includes/template-loader.php, using this code:

$tag_templates = array(
    'is_embed'             => 'get_embed_template',
    'is_404'               => 'get_404_template',
    [...]
    'is_date'              => 'get_date_template',
    'is_archive'           => 'get_archive_template',
);
$template      = false;

// Loop through each of the template conditionals, and find the appropriate template file.
foreach ( $tag_templates as $tag => $template_getter ) {
    if ( call_user_func( $tag ) ) {
        $template = call_user_func( $template_getter );
    }

    // Stop at the first conditional which yields a template
    if ( $template ) {
        break;
    }
}

if ( ! $template ) {
    $template = get_index_template();
}

$template = apply_filters( 'template_include', $template );
if ( $template ) {
    include $template;
} elseif ( current_user_can( 'switch_themes' ) ) {
    $theme = wp_get_theme();
    if ( $theme->errors() ) {
        wp_die( $theme->errors() );
    }
}
return;

All the get_X_template functions are defined in /wp-includes/template.php, and use the X_template hooks to retrieve the right template. Themes can plug into these hooks using the add_filter function to declare their own templates.

Everything here holds for “classic” themes, but WordPress also supports “block” templates (introduced with version 5.0 and the Gutenberg Block editor), which are composed of HTML files instead of PHP, and include “blocks” using HTML comment syntax like:

<!-- wp:query-pagination {"paginationArrow":"arrow","align":"wide","layout":{"type":"flex","justifyContent":"space-between"}} -->
    <!-- wp:query-pagination-previous /-->
    <!-- wp:query-pagination-next /-->
<!-- /wp:query-pagination -->

These files are parsed using the WP_Block_Parser class and then rendered using the render_block function in wp-includes/blocks.php. This is a custom template engine, but its attack surface is nonexistent as it only processes theme files, which would be considered trusted. To put it differently, it cannot be riskier than the “classic” themes, which let the theme template files directly execute arbitrary PHP code on the server.

 

A note on static analysis

While this is not the main topic of this article, it seems relevant to include a word about automatic security analysis of the WordPress codebase, specifically through static analysis. As a reminder, static analysis consists of analyzing a program without executing it, as opposed to dynamic analysis, where a debugger is used to analyze the code paths of a running program. Static analysis tools often use a technique called “taint tracking”, where user input (more generally, any source) is “tainted” as unsafe data, and the taint is propagated through the operations performed on this input. An issue is raised when tainted data reaches a known vulnerable method (more generally, any sink), and the data path can be retrieved by following the taint.

Sadly, PHP puts multiple hurdles in the way of static analysis, such as:

  • multiple convoluted ways to call a function f, like f(), call_user_func("f"), "f"(), "F"(), …
  • dynamic typing, with loose comparisons and funny type juggling. This renders calls to $obj->method() hard to follow and taint, as the static analyzer doesn’t know the type of (or the class implemented by) $obj.

WordPress, like most CMSs, makes use of most of these language features. For example, the frequent use of call_user_func() by the hooks, which refer to a function by its name as a string instead of a PHP function object, makes static analysis really painful. SemGrep's tainting engine, for instance, has a hard time on the following examples:

<?php
/* semgrep-rule.yml
rules:
  - id: test
    languages:
      - php
    message: Success
    mode: taint
    pattern-sinks:
      - pattern-either:
        # Direct calls
        - pattern: dangerousFunction(...)
        # Let's also include the direct calls via `call_user_func`
        - pattern: call_user_func("dangerousFunction", ...)
        # We should also include calls via call_user_func_array, forward_static_call & forward_static_call_array
        # ...
        # This syntax is not supported by SemGrep while it is a valid PHP structure:
        # - pattern: "dangerousFunction"(...)
    pattern-sources:
      - pattern: $user_data
    severity: ERROR
*/
$user_data = 'bad_bad_bad';
$controller_func = 'theController';

function dangerousFunction($data) {
    echo $data;
}

function theController($data) {
    dangerousFunction($data);
}

/* This is trivial, and raises an ERROR as expected */
theController($user_data);

/* This does NOT raise anything, the taint is not followed on this call */
call_user_func($controller_func, $user_data);

/* This does NOT raise anything as well */
$controller_func($user_data);
?>

This behavior is not specific to SemGrep, and these function call mechanisms are not the only aspects on which static analysis tools struggle. As such, these tools will find trivial source-to-sink paths (like an add_action($_GET['param1'], $_GET['param2'])), but might miss more convoluted data flows, and you should not rely 100% on them when auditing WordPress or any CMS, be it the core or a plugin.

Conclusion

When facing a three man-day black-box audit of an up-to-date WordPress instance, go for the plugins and themes. Enumerate them by fuzzing the /wp-content/ directory, and then get a copy of the plugin/theme code either from GitHub or from the WordPress store. If it is not available on any of these platforms, you will have to perform the audit in black-box, but it is then likely to be a custom plugin/theme which has not yet received any security audit.

Regarding a white-box approach, while using static analysis tools with generic rules can help you unveil quick wins on large plugins, it is essential to understand the inner mechanisms of WordPress to conduct a more comprehensive assessment. With a proper development environment and a debugger, looking for cross-references, definitions or function names manually is straightforward and will save you a lot of time in your research.