Unleash the dino: Time-based strategies to improve password cracking

Written by Arnaud Van Straaten - 23/12/2020 - in Outils - Download
In this article we share technical details on how Kraqozorus automatically generates password cracking strategies that improve both the number of cracked hashes and time required to run the attacks.

Problem definition

 
During pentests, our experts have to penetrate the target's infrastructure in order to underline all the flaws that could lead to a compromission. Most of the time, accessing a user account, with more or less privileges, can bring new elements into focus.  New endpoints, features or sensitive assets are often discovered that way.
However, passwords are not stored as plaintext but in the form of a hash that needs to be cracked.
These hashes, depending on the function used to compute them, can have widely different performance characteristics.
For example, web services mostly use bcrypt, sha1, and md5, whereas kerberoasting gives TGS ( among other things ) and the /etc/shadow file usually contains sha512crypt hashes. We want pentesters to avoid spending time or effort thinking about which cracking jobs should be launched to maximize their chances of success within the alloted time frame - and we propose to automate that decision.

 

Sample performance

The Kraqominus rig

 
This article uses the following baseline for performance metrics.
  • Kraqozorus software
  • 1 GPU RTX 2080Ti
  • 1 CPU i7-4730K
  • JohnTheRipper Jumbo
  • Hashcat 6.1.1

 

This machine is nicknamed Kraqominus.
 

Sample cracking times

 
Here is the estimated time required to perform a 8-characters-brute-force attack (?a?a?a?a?a?a?a?a) on our Kraqominus rig, for each of these hash functions.
Samples
  Hash function  Estimated time   Mean Speed Hashcat  Mean Speed JohnTheRipper
 Bcrypt  694287.9   y   300 H/s  250 H/s
 Sha512crypt        770.0 y  270 KH/s  4 KH/s
 Sha1  4.4 d  15 GH/s  30 MH/s
 MD5  23.3 h  79 GH/s  50 MH/s
 NT  23.2 h  79.5 GH/s  65 MH/s
 
As can be seen, cracking time and speed vary according to the hash function. However, during an engagement, pentesters do not have unlimited time. A penetration test usually lasts about a week, which means they usually only have a few days to crack hashes so they should not waste time thinking about which attacks to run or monitoring the cracking jobs' progress.
In order to optimize the cracking process efficiency, we should:
  • run tailored attacks, using knowledge acquired during the engagement
  • run as many attacks as possible in the alloted time
 

Strategies

 
Kraqozorus provides strategies in order to:
  1. Simplify the cracking process
  2. Save time
  3. Increase the percentage of cracked passwords

 

Simply put, strategies are sets of attacks, such as:
  • Brute-force
  • Wordlists
  • Wordlists + rules
  • Wordlists combination
  • Wordlists + masks
  • Masklists
  • Recycling cracked passwords
  • Single mode

As a reminder, a rule-based attack consists of a bunch of mutations (modify, cut, extend, duplicate, ... ) on a given wordlist and a mask-based attack represents a subset of a brute-force attack, it's more specific, so by definition, faster.

Kraqozorus lets users manually create strategies, using their expert knowledge. It also can generate a strategy based on a time constraint.
 
Strategy creation form
Strategy creation page

 

 

Manual strategy

 
These strategies are created for general purpose, or specific uses. The goal of general purpose strategies is to have a default good set of attacks with high cracking potential. The cracked passwords can then be used to perform various analysis, such as finding common base words or most used masks.
 
To illustrate this point, 3 default strategies have been created with different purposes and they are used in the benchmark in the last paragraph.
Brute-force strategy:
- full 1 to 7
- lower, digits, upper 8-length words
- lower, digits 9-length words
Wordlists strategy:
- 4 wordlists (~106 GB): breachcompilation, hashes.org (2012 to mi-2020), probv2-huge and internal-all (custom-wordlist)
Mixed strategy:
- single mode with rules: single and jumbo 
- wordlists: m3g9tr0n, crackstation, entr0py, rockyou, probv2-huge, internal-all
- wordlists with rules: m3g9tr0n+hob064, crackstation+hob064, entr0py+hob064, rockyou+hob064, rockyou+10000randrules, internal-medium+internal-prior, breachcompialtion+dive
- brute-force: 3 to 7
- masks: lower*6+digit*2, lower/upper*1+lower*4+digit*3, upper*1+lower*3+digit*4

 

Specific strategies are usually created just after the recon step. Using OSINT, target profiling, wordlists containing common words, target information, etc. can be combined. These resources are target-specific and have a high cracking potential.  For example, projects like CeWL (https://github.com/digininja/CeWL), Bopscrk (https://github.com/R3nt0n/bopscrk) can help create dedicated wordlists, and, if the password policy with the common masks is correlated, highly efficient masks suited for the target can be obtained. Moreover, the Single password mangling rule is quite efficient with authentication data that includes the user name, or other user-specific information, with the hashes. This rule extracts the username and performs a bunch of mutations. As it is really common to reuse parts of the username in a password, this usually works well.
 
To summarize, manual strategies can be highly efficient but can take a variable runtime (based on hashformat), take time to be well built  and they require technical expertise/experience.
 

Generated strategy

 
Generated strategies are generated on-demand, based on a time constraint, cracking resources, and hash types. How can the software maximize the cracking efficiency without exceeding a dedicated time? How to select the most valuable attacks to run in the allotted time?
When abstracting this problem, we found an analogy with the 0-1 knapsack problem. A lot of algorithms exists for this NP-complete problem, from which we selected the following two approaches:
  • genetic optimization approach
  • "black box" solver approach
In both cases, the approach is to:
  • Select all potential attacks, and compute statistics regarding their efficiency (this is called an Item)
  • Identify overlaps between attacks
  • Run the optimizer, using the list of Items and the information on overlapping attacks
 

User interface

 
strategy generation
Strategy generation form

 

 
As seen in the screenshot above, the user inputs the following parameters:
  • allotted time, which is the maximum amount of time the attack is allowed to run
  • the hash format, in order for the optimizer to evaluate how much time each attack will take
  • the workers, they are the cracking units that form your cracking cluster
In order to account for various factors, such as workers initialization times, the optimizer is constrained to 95% of the maximum duration that the user selected.
 

Potential attacks selection

 
For each worker, the software filters all the available attacks stored in the database. Attacks can be ruled out if they do not work on the specified worker, as attacks are John the Ripper or Hashcat specific.
For each attack, based on previous usage data, the following statistics are computed:
  • the total amount of cracked password
  • if the hashformat is unsalted, the mean duration of this attack
  • it the hashformat is salted, it will use the ETA (estimated time) computed by the ratio between the keyspace size and the cracking speed. If the attack time is greater than the time bound, it will be rejected outright
 
At the end of this step, an Item is obtained with the following information:
  • the attack name
  • the total amount of passwords cracked by this attack
  • the average duration of this attack, using the specified hash format
 

Attack overlap detection

 
The next step is to check for attack overlaps. For example, if the strategy includes a mask attack that cracks alphanumeric passwords of length 6, it makes no sense to include a mask attack that only cracks numeric passwords, as they would already be tested. In order to avoid wasting time, a dictionary of "incompatible" attacks is generated, based on the attacks attributes (such as wordlists, rules, masks).
Currently, overlaps are only checked between attacks of the same type.
  • To resolve this problem for masks, Kraqozorus computes the intersections between each pair of masks. If this intersection is not empty, a bilateral incompatibility between them is created in the dictionary.
  • To create incompatibilities for wordlists, we use the label of a wordlist attack. By convention, it is structured as <name_wordlist>-<size_or_date> (for example, rockyou-small or custom_wordlist-2020). We then use the hypothesis that rockyou-large overlaps with rockyou-small, but not with custom_wordlist-2020.
  • In the case of rules based attacks, the wordlist overlap check is combined with a check based on the name of the rule.
  • In the case of wordlists + masks attacks, the wordlist overlap check is combined with the mask overlap check.

 

Moreover, attacks can overlap between several workers. Each duplicated attack will be assigned to the worker that is the most efficient for the given attack. The worker with the largest amount of attacks will have its strategy created first. Once computed, all attacks selected for this worker's strategy are removed from the pool of attacks that can be selected for the other workers. However, the time allowed to a hashcat worker is split in two parts. The first part is dedicated to mask attacks and the second to the others. If the mask attacks did not consume their alloted time, the remaining time is added to the time the second strategy can use. Each selected attack is blacklisted thus avoiding its repetition in next strategies.

 Below a snippet of code which shows a standalone tool to compute overlapping between masks.

#!/usr/bin/env python3

from typing                     import Tuple, Dict, List, Set, Optional
from json import loads
import functools

CHARSET_l = set(range(ord('a'), ord('z') + 1))  # lower 
CHARSET_u = set(range(ord('A'), ord('Z') + 1))  # upper
CHARSET_d = set(range(ord('0'), ord('9') + 1))  # digits
CHARSET_h = CHARSET_d | CHARSET_l  # lower hex
CHARSET_H = CHARSET_d | CHARSET_u  # upper hex
CHARSET_s = {ord(x) for x in " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"}  # special
CHARSET_a = CHARSET_l | CHARSET_u | CHARSET_d | CHARSET_s  # all
CHARSET_b = set(range(0, 256))  # bits
CHARSETS = {
    'l': CHARSET_l,
    'u': CHARSET_u,
    'd': CHARSET_d,
    'h': CHARSET_h,
    'H': CHARSET_H,
    's': CHARSET_s,
    'a': CHARSET_a,
    'b': CHARSET_b,
    '?': {ord('?')}
}

class ItemMask:
    ''' Class to store mask's properties '''
    def __init__(self, mask, custom_charset=()):
        self.mask = mask
        self.custom_charset = custom_charset

def _expand_mask(
    mask: str,
    custom_charsets: Tuple[Optional[str], Optional[str], Optional[str], Optional[str]],
) -> List[Set[int]]:
    ''' Expand the mask to obtain all the solution space '''
    out: List[Set[int]] = []
    i = 0
    while i < len(mask):
        cur = mask[i]
        if cur == '?' and i + 1 < len(mask):
            i += 1
            typ = mask[i]
            if typ in ('1', '2', '3', '4'):
                out.append(_expand_custom_charset(custom_charsets[int(typ) - 1]))
            else:
                out.append(CHARSETS.get(typ, set()))
        else:
            out.append({ord(cur)})
        i += 1
    return out

def _expand_custom_charset(
    cs: Optional[str]
) -> Set[int]:
    ''' expand the solution space of the mask depending on the custom charsets (?1,?2,?3,?4) '''
    if not cs:
        return set()
    out = set()
    i = 0
    while i < len(cs):
        cur = cs[i]
        if cur == '?' and i + 1 < len(cs):
           i += 1
           out |= CHARSETS.get(cs[i], set())
        else:
            out.add(ord(cur))
        i += 1
    return out

def _mask_overlap(
        mask1: List[Set[int]],
        mask2: List[Set[int]]
) -> Tuple[int, int]:
    ''' check if a collision exists between 2 masks '''
    card = functools.reduce(
        lambda acc, n: acc * len(n),
        mask1,
        1)
    # masks do not have the same length, can't overlap
    if len(mask1) != len(mask2):
        return (card, 0)
    common = 1
    for (c1, c2) in zip(mask1, mask2):
        common *= len(c1 & c2)
        if common == 0:
            break
    return (card - common, common)

def get_incompatibilities_mask_type(
    items: List[ItemMask]
) -> Dict[int, List[int]]:
    ''' get the incompatibilities dictionnary for the attack type involving a mask '''
    incompatibilities_mask: Dict[str, List[str]] = {k.mask: [] for k in items}
    for maskattack in items:
        incompatibilities_mask[maskattack.mask] = []
        # list of all maskattacks minus the selected maskattack
        potential_incompatibilities = items.copy()
        potential_incompatibilities.remove(maskattack)
        mask_expansion = _expand_mask(maskattack.mask, maskattack.custom_charset)
        for candidate in potential_incompatibilities:
            candidate_mask_expansion = _expand_mask(candidate.mask, candidate.custom_charset)
            _, common_mask = _mask_overlap(mask_expansion, candidate_mask_expansion)
            if common_mask != 0:
                incompatibilities_mask[maskattack.mask].append(candidate.mask)
    return incompatibilities_mask

def file_to_items(filein):
    ''' Create a list of ItemMask from the content of the json file '''
    items = []
    filein_content = {}
    with open(filein, 'r') as fin:
        filein_content = loads(fin.read())
    for item_mask in filein_content['masklist']:
        items.append(ItemMask(item_mask['mask'], (item_mask['charset1'], item_mask['charset2'],item_mask['charset3'],item_mask['charset4'])))
    return items

if __name__ == '__main__':
    """
    > test.json
    {
        "masklist": [
            {
                "mask": "?l?d?d?d?l",
                "charset1": "",
                "charset2": "",
                "charset3": "",
                "charset4": ""
            },
            {
                "mask": "?a?a?a?a?a",
                "charset1": "",
                "charset2": "",
                "charset3": "",
                "charset4": ""
            },
            {
                "mask": "?a?l?d?a?a",
                "charset1": "",
                "charset2": "",
                "charset3": "",
                "charset4": ""
            }
        ]
    }
    """
    filein = './test.json'
    items_mask = file_to_items(filein)
    result_incompatibilities = get_incompatibilities_mask_type(items_mask)
    print(result_incompatibilities)
    # {'?l?d?d?d?l': ['?a?a?a?a?a'], '?a?a?a?a?a': ['?l?d?d?d?l', '?a?l?d?a?a'], '?a?l?d?a?a': ['?a?a?a?a?a']}
    

Genetic optimizer

 

At this point, we have a list of Items and a dictionary of "incompatibilities". If there are less than 30 items in that list, the optimal solution is found by brute force.
 
solutions = List[]
store_all_solutions(solutions, Items)
remove_incompatibilities(solutions, Incompatibilities)
get_max_valuable_solution(solutions)

 

 
If the list is larger, the following algorithm is used. First of all, a greedy approximation is computed, taking into account possible overlaps. The pseudo code of the greedy algorithm is given below.
 
sorted_items = sort_ratio_value_weight(Items)
solution = Solution(items=List[], total_value=0, total_weight=0)
for item in sorted_items:
    if solution.total_weight + item.weight <= max_weight:
        solution.add_item(item)
        solution.add_weight(item.weight)
        solution.add_value(item.value)
 
The default parameters for the genetic algorithm are:
 
population_size = 100
threshold_max_population = population_size * 3
max_generations = 1000
crossover_probability = 0.9
mutation_probability = 0.25
backpack_capacity = list[Items].length
chromosomal_base = result of the greedy algorithm
avoid_local_max_every = max_generations * 0.05
 

Population generation

The result given by the greedy algorithm will serve as a chromosomal base for the initial population. A chromosom is defined as a list of bits, 1 when the attack is selected, 0 otherwise. 10% of the population is seeded with the greedy approximation, while the remaining 90% is seeded with random chromosoms. Overlaps are removed on this population by using the following process: when a gene at index x in the chromosom is set to 1, all the associated incompatibilities must be set to 0. If that is not the case, the chromosom is rejected, and a new random chromosom is generated.

Iterate through generations

 
For each generation, the program proceeds through the following steps:
  1. parent fitness computation
  2. parent selection
  3. crossover
  4. child fitness computation
  5. current best individual selection

 

Here, value is the total amount of cracked passwords, and weight the sum of the average duration of all attacks selected on the genotype.

 

Steps 1 through 4 are used to compute the value and weight of an individual. If its weight is greater than the maximum amount, the program will inhibit random genes as long as its weight is not valid. These parts are used as checkpoints.
The parent selection at step 2 works in two phases:
  • 10% of the new population is generated using the best 10% of the previous population, based on the postulate that if we select only the best individuals, the potential of a population will increase. However this method can increase the probability of reaching a non optimal local maximum.
  • The remaining 90% is generated by randomly choosing individuals in the parent population, where the probability of choosing a given individual is proportional to its fitness.

 

At step 3, two parents are randomly selected to produce two children. Depending on the crossover_probability parameter, a random value determines if these parents will create new individuals or not. If not, parents will be added, with no modification, to the next generation. Otherwise, two individuals will be created by mixing the parents in the following way. Each gene of a new individual is taken randomly from one of its parents in the genotypic order. If the gene is set to 1 and its index refers to an entry in the incompatibilities dictionary, all the incompatible genes will be set to 0 to avoid a genetic malformation. When the new individual genetic heritage is complete, a random mutation might be applied depending on the mutation_probability parameter. The mutation will express an inhibited gene or, contrariwise, will inhibit an expressed gene. In the first case, the genotype will be checked, if an incompatible gene was expressed, it will be inhibited to comply with the defined genetic rules.

 

The most valuable individual is extracted during the last step. If the same individual is selected for more than avoid_local_max_every generations, the population size is increased by 10%, as long as the threshold_max_population is not reached. If a better individual emerges, the population will be reset to its original size by pruning the worst individuals. At the end of the last generation, the best individual will be converted to a cracking strategy where its genotype represents the attacks.

 

Solver approach

 
This part is based on the Microsoft Z3 solver and the knapsack investment problem. The majority of the code came from https://github.com/hakank/hakank/blob/master/z3/knapsack_investments.py. The biggest modification is about the data modeling to adapt to our problem as much as possible. While in practice this solution is quite short and faster than the genetic optimizer, it is not guaranteed to complete in a reasonable amount of time, and requires the Z3 solver to be installed.

 

To summarize, generated strategies are quickly created, don't require a technical expertise and they are adapted to time constraints (hashformats, salts). However, they require data on past attacks so they can't be used at first.

Experiments

Generation time

As we can see in the following table, a result is shown for each worker with the algorithm, the generation time, the number of selected attacks, the total time of the solution and the max time. The difference between the number of selected attacks is not important in this case because it was on an experimental setup and some optimizations have been made since. However the generation time is much better for the solver than the genetic approach, so the user experience is improved.
Generation time for a 3h strategy
  Worker    Algorithm    Generation time (sec)    Selected attacks    Strategy time (hour)     Max time (hour)  
  John   Genetic   19.095   42/46   2.848   2.85
  John   Solver   0.590   40/46   2.626   2.85
  Hashcat-other    Genetic   45.568   110/117   1.975   2.28
  Hashcat-other    Solver   24.568   109/117   1.693   2.28
  Hashcat-mask    Genetic   13.463   24/31   0.305   0.87
  Hashcat-mask    Solver   0.091   24/31   0.305   0.87
 
Estimated time John: 2.848 h
 
Estimated time Hashcat: 2.280 h
 
Both methods are implemented in Kraqozorus, because while the Solver approach is more efficient, it requires a dependency that might not be available (z3).
 

Results on a real case

 
Let's take this recent leak (uploaded in october) from Hashes.org (we intentionaly used an outdated (august 2020) version of the wordlist hashes.org-2020.txt).
 
Sample
Hashlist from Hashes.org

 

We managed to crack reach the hashes.org "score" in a bit less than 12 hours after uploading the leak to the development box.

From this hashlist, a 5-hour strategy has been generated using the genetic algorithm. The goal is to compare its effectiveness with those of the existing strategies.

 
Result from generated strategy
Result from generated strategy

 

The worker time is not related with the total time elapsed of the task.
Experimental results
  Strategy    Time required     % cracked 
  Generated     4.82h   87.20
  Bruteforce    6.93h   08.31
  Wordlists    1.23h   79.01
  Mixed    5.60h   48.21
 
As intended, the generated strategy cracks more passwords than the other strategies in the allotted time frame. It even performs well compared to the preexisting strategy we used to reach 92.8%. One can observe that the Wordlist strategy also performs well. It is easily explained by the fact this is has been optimized over the course of several years.
 
When generating strategies, the main limiting factor is amount and quality of attacks available on the platform (about 210 attacks on the experimental rig).
 

Conclusion

 
With this new feature, we generated three default strategies (3h, 48h, 168h) for NT, NetNTLMv2, TGS, Sha512crypt and Bcrypt on the production server, replacing the previous, manually generated, strategies. This allows us to run a highly efficient strategy on a hashlist with just a few clicks. Moreover, stategies' quality can be improved later on by refreshing cracking history and/or adding new attacks.  Pentesters can generate strategies that are tailored to their time constraints, freeing them from the task of manually starting attacks against their hashlists.

 

Furthermore, each step of the cracking process increases the probability of success:
  1. Recon / OSINT / Leaks / Public wordlists / Cracking assets
  2. Cracking strategy
  3. Analyze cracked hashes (statistics, common masks, ...) , adapt assets and launch a new strategy