Writing a decent win32 keylogger [2/3]

Written by Martin Balc'h - 21/12/2023 - in Tools, System

In this series of articles, we talk about the ins and outs of building a keylogger for Windows that supports all keyboard layouts and reconstructs Unicode characters correctly regardless of the language (excluding those using input method editors).

In the first part, after a brief introduction to the concepts of scan codes, virtual keys, characters and glyphs, we describe three different ways to capture keystrokes (GetKeyState, SetWindowsHookEx, GetRawInputData) and the differences between those techniques.

In the second part, we detail how Windows stores keyboard layout information in the kbd*.dll files and how to parse them.

In the third and last part, we go through the process of using the extracted information to convert scan codes into characters and all the tricky cases presented by different layouts, such as ligatures, dead keys, extra shift-states and SGCAPS.
Finally, we present our methodology for validating the reconstruction: a testing tool that automates the injection of scan codes, retrieves the reference text produced by Windows, and compares it with our reconstructed text.


In the previous article, we saw a few different techniques to capture key-presses on Windows. In this article, we explain how Windows translates scan-codes into characters and how we can parse keyboard layout DLLs to extract the data required to emulate that process.

Translating to characters

Now that we have seen a few ways to retrieve key presses and their context, let’s find out how Windows converts scan codes first to virtual keys, and then to characters. You can find information on Windows' keyboard input model here.

Here is a simplified overview of the process:

[Figure: overview of the Windows keyboard input model]

As we saw earlier, the activated input language will select a keyboard layout. Windows supports more than a hundred different layouts out of the box, with the option to create or import more. For each layout you will find a DLL whose name starts with ‘KBD’ located in C:\Windows\System32\, such as KBDFR.DLL, KBDUS.DLL, and so on.

You can find the list of usable keyboard layouts by enumerating the following registry key:

  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layouts

Those DLLs contain everything Windows needs to convert scan codes (from the keyboard) into characters (displayed inside application windows). We will now describe their contents and how you can go about parsing that into a more readily useable intermediary format.

KBD*.DLL structure

While keyboard layout DLLs can export two functions, only one of them is really useful for our purposes, and we will not cover the second (KbdNlsLayerDescriptor) in this article. If you want to read more about it, you can go here for the definitions and here for a reference implementation.

The only function of interest to us is KbdLayerDescriptor which is defined like this:

PKBDTABLES KbdLayerDescriptor(VOID);

By calling it, you get a pointer to a KBDTABLES structure, which is defined as:

typedef struct tagKbdLayer {
    // shift states modifiers info (shift, control, alt, alt-gr, etc.)
    PMODIFIERS pCharModifiers;

    // virtual keys to character conversion tables
    PVK_TO_WCHAR_TABLE pVkToWcharTable;

    // list of supported diacritics for this layout (aka dead-keys)
    PDEADKEY pDeadKey;

    // names of keys (eg: RETURN => ENTRÉE)
    PVSC_LPWSTR pKeyNames;
    PVSC_LPWSTR pKeyNamesExt;
    WCHAR *KBD_LONG_POINTER *KBD_LONG_POINTER pKeyNamesDead;

    // scan code to virtual keys conversion
    USHORT  *KBD_LONG_POINTER pusVSCtoVK;
    BYTE    bMaxVSCtoVK;
    PVSC_VK pVSCtoVK_E0;
    PVSC_VK pVSCtoVK_E1;

    // locale specific flags (ALT-GR, Left to right, ...)
    DWORD fLocaleFlags;

    // ligatures
    BYTE       nLgMax;
    BYTE       cbLgEntry;
    PLIGATURE1 pLigature;

    // type & subtype
    DWORD      dwType;
    DWORD      dwSubType;

} KBDTABLES, * PKBDTABLES;

Exploring the KBDTABLES structure

In order to understand the contents of this structure, we wrote a tool to dump the DLLs to JSON files which will be easier to work with afterwards.

You can also check out the XML files generated by kbdlayout.info for each keyboard layout. The “XML internal tables” file is essentially an XML representation of the data contained in the keyboard layout DLL. The “XML for processing” files offer a higher level view of the same data, in a more structured, easier to use format. Those tables were very useful to us to make sure our extraction process was correct, thanks Jan!

You can find the full source code of the tool here.

The program is written in C++ (C with 1 class to be truthful ;p) and uses the JSON library by Niels Lohmann which is very useful, powerful and easy to use (<3 single header libs).

Loading the dll & getting the pointer

Let’s start by loading the DLL, then retrieve a pointer to KbdLayerDescriptor and call it to get our KBDTABLES pointer.

// load FR keyboard layout
const char * dll = "KBDFR";
HMODULE hmod = LoadLibraryA(dll);
if(!hmod)
    return 1; // handle error: the DLL could not be loaded

// find the exported function KbdLayerDescriptor
FARPROC func = GetProcAddress(hmod, "KbdLayerDescriptor");
if(!func)
    return 1; // handle error: the export is missing

// cast the function pointer and call it
PKBDTABLES kbd = ((PKBDTABLES(*)())func)();

Keyboard layout locale flags

Now that we have our kbd pointer, we can start to extract information. We start by parsing the keyboard layout locale flags.

// create a json object we will fill with our extracted data
json j = json::object();
// parse flags (each entry is 1 when the flag is set, 0 otherwise)
j["flag_altgr"]     = (kbd->fLocaleFlags & KLLF_ALTGR)     ? 1 : 0;
j["flag_shiftlock"] = (kbd->fLocaleFlags & KLLF_SHIFTLOCK) ? 1 : 0;
j["flag_ltr"]       = (kbd->fLocaleFlags & KLLF_LRM_RLM)   ? 1 : 0;

There are 3 different locale flags:

  • 0x1: KLLF_ALTGR, if set, indicates that for this keyboard layout, the right hand ALT key should be handled as CONTROL + ALT
  • 0x2: KLLF_SHIFTLOCK, unused but if set, indicates that pressing the SHIFT key will reset the status of the CAPSLOCK key
  • 0x4: KLLF_LRM_RLM, only used for keyboard layouts with right-to-left scripts, inserts left-to-right marker (LRM) and right-to-left marker (RLM) on specific key presses (left/right shift/control and backspace combinations)

Parsing shift states & modifiers

Some layouts have more modifier keys than the common SHIFT, CONTROL, ALT and the relatively common ALT-GR. For instance, Japanese keyboards use a dedicated KANA key and Canadian Multilingual uses the right control key as an extra modifier key. So this information has to be stored in the keyboard layout. Here are the relevant structures:

typedef struct {
    BYTE Vk;
    BYTE ModBits;
} VK_TO_BIT, *PVK_TO_BIT;

typedef struct {
    PVK_TO_BIT pVkToBit;
    WORD       wMaxModBits;
    BYTE       ModNumber[];
} MODIFIERS, *PMODIFIERS;

The KBDTABLES structure contains a pointer to this MODIFIERS struct which contains:

  • a list (pVkToBit) of virtual keys which act as modifiers, with an associated modifier bit value
  • a list (ModNumber) of wMaxModBits + 1 values, indexed by the modifier bit-field, which map each combination of modifier bits to a column index (the shift state)

Here are example values taken from the French keyboard layout:

Now let us see what the ModNumber list means:

Modifier      Mod value   ModNumber   Comment
no modifier   0x0         0
shift         0x1         1
control       0x2         2
alt           0x4         15          no valid combo with ALT only
alt+shift     0x5         15          no valid combo with ALT+SHIFT only
alt+control   0x6         3           ALT-GR = ALT + CONTROL
So for the legacy AZERTY French keyboard layout there are at most 3 possible modifiers (shift, control and alt-gr) for a single key and the column order in the virtual keys to character tables (that we will describe later) will be:

  • 0 = no modifier
  • 1 = shift
  • 2 = control
  • 3 = alt-gr
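To make the two-step lookup concrete, here is a minimal, portable sketch of it: first OR together the bits of the modifier keys that are held down, then index ModNumber with the result. The types VkToBit and Modifiers and the three functions are hypothetical stand-ins written for this illustration (they are not the kbd.h structures), the virtual key values are the usual VK_SHIFT, VK_CONTROL and VK_MENU codes, and the entry for 0x3 (shift+control) is a placeholder because it is not listed in the table above.

```cpp
#include <cstdint>
#include <vector>

constexpr uint8_t SHFT_INVALID = 0x0F;   // the "15" seen in the table above

struct VkToBit { uint8_t vk; uint8_t modBits; };   // stand-in for VK_TO_BIT

struct Modifiers {                        // stand-in for MODIFIERS
    std::vector<VkToBit> vkToBit;
    std::vector<uint8_t> modNumber;       // wMaxModBits + 1 entries
};

// OR together the modifier bits of every modifier key currently held down.
uint8_t modifier_bits(const Modifiers& m, const std::vector<uint8_t>& pressedVks)
{
    uint8_t bits = 0;
    for (uint8_t vk : pressedVks)
        for (const VkToBit& e : m.vkToBit)
            if (e.vk == vk)
                bits |= e.modBits;
    return bits;
}

// Map the bit-field to a column index; anything past the table is invalid.
uint8_t shift_state_column(const Modifiers& m, uint8_t bits)
{
    return bits < m.modNumber.size() ? m.modNumber[bits] : SHFT_INVALID;
}

// Values taken from the French table above.
Modifiers french_modifiers()
{
    return Modifiers{
        { {0x10, 0x1}, {0x11, 0x2}, {0x12, 0x4} },   // VK_SHIFT, VK_CONTROL, VK_MENU
        { 0, 1, 2,
          SHFT_INVALID,                   // 0x3: placeholder, not listed above
          SHFT_INVALID, SHFT_INVALID,     // alt, alt+shift
          3 }                             // alt+control (ALT-GR)
    };
}
```

With these values, holding control and alt yields the bit-field 0x6, which ModNumber maps to column 3, while alt alone maps to SHFT_INVALID, meaning no character column exists for that combination.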

Now this is the code we wrote to dump those shift states to JSON:

// shift states & modifiers
j["shiftstates"] = json::array();
for(int i=0; i<=kbd->pCharModifiers->wMaxModBits; i++)
    j["shiftstates"].push_back(kbd->pCharModifiers->ModNumber[i]);

j["modifiers"] = json::array();
for(int i=0; kbd->pCharModifiers->pVkToBit[i].Vk; i++)
{
    json o = json();
    o["modbits"] = kbd->pCharModifiers->pVkToBit[i].ModBits;
    o["vk"] = kbd->pCharModifiers->pVkToBit[i].Vk;
    o["vkn"] = VKN(kbd->pCharModifiers->pVkToBit[i].Vk);
    j["modifiers"].push_back(o);
}

The VKN macro returns the ASCII string representation of the virtual key name (see vk_names.h and vk_names.py in the repo).

Parsing VkToWcharTable

Now things get a little more dicey. The VkToWcharTable structure pointed to in KBDTABLES is defined like this:

typedef struct tagKbdLayer {
    ...
    PVK_TO_WCHAR_TABLE pVkToWcharTable;
    ...
} KBDTABLES, * PKBDTABLES;

typedef struct _VK_TO_WCHAR_TABLE {
    PVK_TO_WCHARS1 pVkToWchars;
    BYTE           nModifications;
    BYTE           cbSize;
} VK_TO_WCHAR_TABLE, *PVK_TO_WCHAR_TABLE;

This refers to PVK_TO_WCHARS1, a pointer to a structure defined by a macro:
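As a rough, portable sketch of what that macro looks like: BYTE and WCHAR are replaced here by fixed-width stand-ins, which is our assumption for illustration, and the authoritative definition (which also uses KBD_LONG_POINTER) lives in kbd.h. The single expansion at the end is only there to check the resulting size.

```cpp
#include <cstdint>

// Portable stand-ins (assumptions) for the Windows types used in kbd.h.
using BYTE  = uint8_t;
using WCHAR = char16_t;   // WCHAR is a 16-bit code unit on Windows

// Each expansion declares a struct holding one virtual key, its attribute
// flags, and n candidate characters (one per shift-state column).
#define TYPEDEF_VK_TO_WCHARS(n) typedef struct _VK_TO_WCHARS##n { \
    BYTE  VirtualKey; \
    BYTE  Attributes; \
    WCHAR wch[n]; \
} VK_TO_WCHARS##n, *PVK_TO_WCHARS##n;

// Example expansion: an entry with two characters per key.
TYPEDEF_VK_TO_WCHARS(2)
static_assert(sizeof(VK_TO_WCHARS2) == 6,
              "2 header bytes + 2 characters of 2 bytes each");
```

With these stand-ins, an entry holding n characters occupies 2 + 2*n bytes, which is the per-entry size we expect to find in cbSize for the matching table.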

The macro is invoked ten times:

TYPEDEF_VK_TO_WCHARS(1)
TYPEDEF_VK_TO_WCHARS(2)
TYPEDEF_VK_TO_WCHARS(3)
TYPEDEF_VK_TO_WCHARS(4)
TYPEDEF_VK_TO_WCHARS(5)
TYPEDEF_VK_TO_WCHARS(6)
TYPEDEF_VK_TO_WCHARS(7)
TYPEDEF_VK_TO_WCHARS(8)
TYPEDEF_VK_TO_WCHARS(9)
TYPEDEF_VK_TO_WCHARS(10)

This results in 10 almost identical structures, named VK_TO_WCHARS1, VK_TO_WCHARS2, … up to VK_TO_WCHARS10, with the only difference between them being the size of the wch buffer. Those structures contain:

  • VirtualKey: a virtual key
  • Attributes: a set of flags for this entry of the conversion table from virtual key to character
  • wch: a list of characters that can be output when this virtual key is pressed (based upon the current shift state)

So to sum up, we have a pointer to multiple VK_TO_WCHAR_TABLE structures which each contain:

  • a pointer to VK_TO_WCHARS structures (of a specific size)
  • how many modifiers will be present for each entry (nModifications)
  • the size in bytes of one entry (cbSize), which is also the offset to the next entry in the VK_TO_WCHARS array

Here is our code to dump the tables to JSON. Note how we cheat by using only VK_TO_WCHARS10 pointers and move to the next entry by recasting the pointer at the right address, instead of using the pointer type that matches each table's entry size (which would complicate the code).

j["vk_to_wchars"] = json::array();
for(int i=0; kbd->pVkToWcharTable[i].cbSize; i++)
{
    json o = json();
    o["index"] = i+1;
    o["num_mods"] = kbd->pVkToWcharTable[i].nModifications;
    o["table"] = json::array();

    PVK_TO_WCHARS10 pvk2wch = (PVK_TO_WCHARS10)kbd->pVkToWcharTable[i].pVkToWchars;
    while(pvk2wch->VirtualKey)
    {
        json it = json();
        it["vk"] = pvk2wch->VirtualKey;
        it["vkn"] = VKN(pvk2wch->VirtualKey);
        it["attrs"] = pvk2wch->Attributes;
        it["wch"] = json::array();

        // k (not j) so the loop index does not shadow the json object j
        for(int k=0; k<kbd->pVkToWcharTable[i].nModifications; k++)
            it["wch"].push_back(pvk2wch->wch[k]);

        pvk2wch = (PVK_TO_WCHARS10)((char*)pvk2wch + kbd->pVkToWcharTable[i].cbSize);
        o["table"].push_back(it);
    }
    j["vk_to_wchars"].push_back(o);
}
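The recasting trick above works because every entry starts with the same two bytes (virtual key, attributes) and cbSize gives the distance to the next entry. Here is a minimal, portable sketch that makes the stride explicit by walking raw bytes instead of casting to VK_TO_WCHARS10. VkEntry and walk_vk_to_wchars are hypothetical names introduced for this illustration, and the sketch assumes the 16-bit characters start at byte offset 2 of each entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct VkEntry {                 // hypothetical decoded form of one table row
    uint8_t vk;
    uint8_t attrs;
    std::vector<char16_t> wch;   // one character per shift-state column
};

// Walks a zero-terminated array of VK_TO_WCHARSn-like records.
// nModifications = characters per entry, cbSize = bytes per entry.
std::vector<VkEntry> walk_vk_to_wchars(const uint8_t* base,
                                       uint8_t nModifications,
                                       uint8_t cbSize)
{
    assert(base != nullptr);
    // an entry must be large enough to hold its header and its characters
    assert(cbSize >= 2 + nModifications * sizeof(char16_t));

    std::vector<VkEntry> rows;
    for (const uint8_t* p = base; p[0] != 0; p += cbSize) {
        VkEntry e{p[0], p[1], {}};
        for (std::size_t k = 0; k < nModifications; k++) {
            char16_t c;
            // memcpy avoids reading past the entry through a wider struct type
            std::memcpy(&c, p + 2 + k * sizeof c, sizeof c);
            e.wch.push_back(c);
        }
        rows.push_back(e);
    }
    return rows;
}
```

Called with the pVkToWchars pointer (cast to a byte pointer), nModifications and cbSize of one VK_TO_WCHAR_TABLE, this returns the same rows as the loop above without ever touching bytes outside the current entry.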

Parsing VSCtoVK

Now let us talk about the data structures that allow us to convert scan codes into virtual keys. The first one is pusVSCtoVK, which is just a pointer to an array of USHORT, accompanied by bMaxVSCtoVK, which gives us the number of items in the array. The index is the scan code, and each USHORT value is the virtual key OR'ed with optional flags.

The parsing code is straightforward:

j["vsc_to_vk"] = json::array();
USHORT * vvk = kbd->pusVSCtoVK;
if(vvk)
{
    for(int i=0; i<kbd->bMaxVSCtoVK; i++)
    {
        json o = json();
        o["sc"] = i;
        o["vk"] = vvk[i] & 0xff;            // mask out the virtual key flags
        o["vkn"] = VKN(vvk[i] & 0xff);      // mask out the virtual key flags
        o["flags"] = vkftos(vvk[i]);        // utility function to convert the flags to json
        j["vsc_to_vk"].push_back(o);
    }
}

With our function vkftos defined like this:

json vkftos(int vk)
{
    json o = json::array();
    if(vk & KBDEXT)         o.push_back("KBDEXT");
    if(vk & KBDMULTIVK)     o.push_back("KBDMULTIVK");
    if(vk & KBDSPECIAL)     o.push_back("KBDSPECIAL");
    if(vk & KBDNUMPAD)      o.push_back("KBDNUMPAD");
    if(vk & KBDUNICODE)     o.push_back("KBDUNICODE");
    if(vk & KBDINJECTEDVK)  o.push_back("KBDINJECTEDVK");
    if(vk & KBDMAPPEDVK)    o.push_back("KBDMAPPEDVK");
    if(vk & KBDBREAK)       o.push_back("KBDBREAK");
    return o;
}

All flag values and defines can be found here.
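As a small illustration of how a single pusVSCtoVK item packs both pieces of information, here is a sketch that splits a raw value into its virtual key (low byte) and its flags (high byte). The flag values are the ones we read in kbd.h, so double-check them against your own copy of the header. VscEntry and split_vsc_entry are hypothetical names introduced for this example.

```cpp
#include <cstdint>

// Flag values as we read them in kbd.h (verify against your own copy).
constexpr uint16_t KBDEXT        = 0x0100;
constexpr uint16_t KBDMULTIVK    = 0x0200;
constexpr uint16_t KBDSPECIAL    = 0x0400;
constexpr uint16_t KBDNUMPAD     = 0x0800;
constexpr uint16_t KBDUNICODE    = 0x1000;
constexpr uint16_t KBDINJECTEDVK = 0x2000;
constexpr uint16_t KBDMAPPEDVK   = 0x4000;
constexpr uint16_t KBDBREAK      = 0x8000;

struct VscEntry {        // hypothetical decoded form of one pusVSCtoVK item
    uint8_t  vk;         // virtual key code (low byte)
    uint16_t flags;      // KBD* flags (high byte, left in place)
};

// Splits a raw table value into its virtual key and its flags.
constexpr VscEntry split_vsc_entry(uint16_t raw)
{
    return VscEntry{ static_cast<uint8_t>(raw & 0x00FF),
                     static_cast<uint16_t>(raw & 0xFF00) };
}
```

For instance, a raw value of 0x0C2E splits into virtual key 0x2E (VK_DELETE) with the KBDSPECIAL and KBDNUMPAD flags set.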

Now there are two more tables (pVSCtoVK_E0 and pVSCtoVK_E1) which map extended scan codes to virtual keys. Both tables work the same:

typedef struct tagKbdLayer {
    ...
    PVSC_VK pVSCtoVK_E0;
    PVSC_VK pVSCtoVK_E1;
    ...
} KBDTABLES, * PKBDTABLES;

typedef struct _VSC_VK {
    BYTE Vsc;
    USHORT Vk;
} VSC_VK, *PVSC_VK;

So we have an array of VSC_VK structures with two members, one for the virtual scan code and one for the associated virtual key. You have to keep reading items from the list until you reach an entry whose virtual scan code is zero.

Here is our parsing code:

j["vsc_to_vk_e0"] = json::array();
PVSC_VK vv0 = kbd->pVSCtoVK_E0;
for(int i=0; vv0 && vv0[i].Vsc; i++)
{
    json o = json();
    o["sc"] = vv0[i].Vsc;
    o["vk"] = vv0[i].Vk & 0xff;
    o["vkn"] = VKN(vv0[i].Vk & 0xff);
    o["flags"] = vkftos(vv0[i].Vk);
    j["vsc_to_vk_e0"].push_back(o);
}

j["vsc_to_vk_e1"] = json::array();
PVSC_VK vv1 = kbd->pVSCtoVK_E1;
for(int i=0; vv1 && vv1[i].Vsc; i++)
{
    json o = json();
    o["sc"] = vv1[i].Vsc;
    o["vk"] = vv1[i].Vk & 0xff;
    o["vkn"] = VKN(vv1[i].Vk & 0xff);
    o["flags"] = vkftos(vv1[i].Vk);
    j["vsc_to_vk_e1"].push_back(o);   // E1 entries go into the E1 array
}
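Since both extended tables are short, zero-terminated lists, looking up an extended scan code is a plain linear search. The sketch below shows the idea with a hypothetical VscVk stand-in for VSC_VK and a tiny illustrative E0 table. The two entries reflect the usual mapping of the right control and right alt keys, but they are sample data written for this example and not a dump of a real layout.

```cpp
#include <cstdint>

struct VscVk {           // stand-in for VSC_VK
    uint8_t  vsc;        // scan code that followed the E0 (or E1) prefix
    uint16_t vk;         // virtual key OR'ed with its flags
};

// Returns the raw Vk value (virtual key plus flags) for an extended scan
// code, or 0 when the table is missing or has no entry for it.
uint16_t lookup_extended(const VscVk* table, uint8_t scanCode)
{
    for (const VscVk* e = table; e && e->vsc != 0; ++e)
        if (e->vsc == scanCode)
            return e->vk;
    return 0;
}

// Illustrative sample table (assumption, not extracted from a DLL).
const VscVk kSampleE0[] = {
    {0x1D, 0x01A3},      // E0 1D -> VK_RCONTROL (0xA3) | KBDEXT (0x0100)
    {0x38, 0x01A5},      // E0 38 -> VK_RMENU    (0xA5) | KBDEXT (0x0100)
    {0x00, 0x0000},      // terminator: zero scan code
};
```

The same function works for pVSCtoVK_E1; only the table pointer changes.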

Parsing dead keys

An interesting feature that is not used by all keyboard layouts is the support of “dead keys”. A dead key is a key that will not output a character when pressed but will instead wait for the next key press to output one or more characters to the screen. One such example is the ^ key (circumflex accent) on a French keyboard:

first key   second key   output
^           e            ê
^           i            î
^           ' '          ^^
^           p            ^p
Such information is stored in the PDEADKEY pDeadKey variable whose structure is pretty simple:

typedef struct {
    DWORD  dwBoth;
    WCHAR  wchComposed;
    USHORT uFlags;
} DEADKEY, *PDEADKEY;

For each of those entries, the upper 16 bits of the dwBoth variable represent the 1st character (the ‘dead’ character) and the lower 16 bits will be the character that the dead key can be combined with (for example, the letter E). The wchComposed variable is the combination of both those characters (in our example ê). This table will contain the list of all valid combinations:

We can parse all that information like this:

j["deadkeys"] = json::array();
PDEADKEY pd = kbd->pDeadKey;
for(int i=0; pd && pd[i].dwBoth != 0; i++)
{
    json o = json();
    o["vk1"] = pd[i].dwBoth >> 16;
    o["vk2"] = pd[i].dwBoth & 0xffff;
    o["combined"] = pd[i].wchComposed;
    o["flags"] = pd[i].uFlags;
    j["deadkeys"].push_back(o);
}
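To show how the dwBoth packing is used once the table is extracted, here is a minimal sketch of a composition lookup: the dead character goes in the upper 16 bits, the base character in the lower 16 bits, and the table is scanned until the zero terminator. DeadKey, make_both and compose are hypothetical names introduced for this illustration, and the sample table only contains the two circumflex combinations from the example above.

```cpp
#include <cstdint>

struct DeadKey {             // stand-in for DEADKEY
    uint32_t both;           // (dead character << 16) | base character
    char16_t composed;       // resulting character
    uint16_t flags;
};

// Packs a dead character and a base character the way dwBoth stores them.
constexpr uint32_t make_both(char16_t dead, char16_t base)
{
    return (static_cast<uint32_t>(dead) << 16) | base;
}

// Returns the composed character, or 0 when the pair is not in the table.
char16_t compose(const DeadKey* table, char16_t dead, char16_t base)
{
    const uint32_t key = make_both(dead, base);
    for (const DeadKey* e = table; e && e->both != 0; ++e)
        if (e->both == key)
            return e->composed;
    return 0;
}

// Sample data for this example only (not a dump of a real layout).
const DeadKey kSampleDeadKeys[] = {
    { make_both(u'^', u'e'), u'\u00EA', 0 },   // ^ + e -> ê
    { make_both(u'^', u'i'), u'\u00EE', 0 },   // ^ + i -> î
    { 0, 0, 0 },                               // terminator
};
```

A return value of 0 corresponds to the "no valid combination" case, such as ^ followed by p in the table above, where both characters end up being output separately.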

Parsing ligatures

The only thing left to parse in order to fully emulate Windows character translation from key presses is ligatures. A ligature is the representation of two or more characters as a single glyph. The word “cœur” (which means “heart”) contains such a ligature: the characters o and e are merged into a single glyph œ. Funnily enough, we can’t type that word with a standard French keyboard 💔. As an additional note, TTF fonts have support for ligatures, and can sometimes automatically handle such cases to display the proper joined character without requiring the input text to use the specific Unicode code points for the ligature characters.

Now let us see an example from a keyboard layout that supports ligatures: Arabic. When you press the B key, the output will be لا, which is the combination of ل (Arabic letter LAM) and ا (Arabic letter ALEF). If you were to press backspace just after pressing the B key, you would only remove the ALEF character, not both, and only the LAM character would remain.

Here are the relevant data structures in the KBDTABLES struct:

typedef struct tagKbdLayer {
    ...
    BYTE       nLgMax;
    BYTE       cbLgEntry;
    PLIGATURE1 pLigature;
    ...
} KBDTABLES, *PKBDTABLES;

With PLIGATURE1 defined by the following macro for up to 5 characters long ligatures:

#define TYPEDEF_LIGATURE(n) typedef struct _LIGATURE##n { \
    BYTE  VirtualKey; \
    WORD  ModificationNumber; \
    WCHAR wch[n]; \
} LIGATURE##n, *PLIGATURE##n;

TYPEDEF_LIGATURE(1)
TYPEDEF_LIGATURE(2)
TYPEDEF_LIGATURE(3)
TYPEDEF_LIGATURE(4)
TYPEDEF_LIGATURE(5)

The nLgMax variable indicates the maximum number of characters for a single ligature for the current keyboard layout. The cbLgEntry variable indicates the size in bytes of a single ligature entry. We parse the ligature table like this (using only PLIGATURE5 pointers for shorter code):

j["ligatures"] = json::array();
PLIGATURE5 lg = (PLIGATURE5)((BYTE*)kbd->pLigature);
for(int i=0; lg && lg->VirtualKey; i++, lg = (PLIGATURE5)((BYTE*)kbd->pLigature + i*kbd->cbLgEntry))
{
    json o = json();
    o["vk"] = lg->VirtualKey;
    o["modnum"] = lg->ModificationNumber;
    o["chars"] = json::array();
    for(int k=0; k<kbd->nLgMax; k++)
        o["chars"].push_back(lg->wch[k]);
    j["ligatures"].push_back(o);
}
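As with the virtual key tables, the ligature entries can also be walked as raw bytes using cbLgEntry as the stride. The sketch below is a portable illustration under a few assumptions that we label explicitly: the entry layout is one byte of virtual key, one byte of padding, a 16-bit ModificationNumber, then the characters; and unused character slots are assumed to be filled with WCH_NONE (0xF000), which the sketch trims. Ligature and walk_ligatures are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr char16_t WCH_NONE = 0xF000;   // kbd.h marker for "no character"

struct Ligature {                       // hypothetical decoded form
    uint8_t  vk;
    uint16_t modNumber;
    std::vector<char16_t> chars;
};

// Walks a zero-terminated array of LIGATUREn-like records.
// nLgMax = character slots per entry, cbLgEntry = bytes per entry.
std::vector<Ligature> walk_ligatures(const uint8_t* base,
                                     uint8_t nLgMax,
                                     uint8_t cbLgEntry)
{
    std::vector<Ligature> out;
    if (!base)
        return out;                     // layout without ligatures
    // assumed layout: vk @0, padding @1, ModificationNumber @2, wch @4
    assert(cbLgEntry >= 4 + nLgMax * sizeof(char16_t));

    for (const uint8_t* p = base; p[0] != 0; p += cbLgEntry) {
        Ligature lg{p[0], 0, {}};
        std::memcpy(&lg.modNumber, p + 2, sizeof lg.modNumber);
        for (std::size_t k = 0; k < nLgMax; k++) {
            char16_t c;
            std::memcpy(&c, p + 4 + k * sizeof c, sizeof c);
            if (c == WCH_NONE)
                break;                  // assumed filler for shorter ligatures
            lg.chars.push_back(c);
        }
        out.push_back(lg);
    }
    return out;
}
```

For the Arabic example above, the entry for the B key would decode to the two characters U+0644 (LAM) and U+0627 (ALEF).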

… And that’s it, we’re done extracting data from those keyboard layout DLLs! There is more that we haven’t covered, such as key names, because we won’t be needing them to reconstruct our text.

In the next article, we will explain how to emulate Windows' scan code to character translation with the data we extracted from the keyboard layout DLLs. You can find it here.