The Great Password Collider: Recovering Passwords from Hashes Without Heavy Computation

Often you need to recover a password when all you have is the hash. You could brute-force it on your own machine, but it’s much faster to use an existing database. Even public datasets contain tens of millions of hash–password pairs, and querying them through a cloud service takes only seconds.

There are several zettabytes of digital data in existence, but far from all of it is unique: duplicate copies are scattered across billions of devices and servers. Regardless of the data type, working with it involves solving the same fundamental tasks. These include reducing redundancy by partially removing duplicates (deduplication), integrity checking, creating incremental backups, and user authentication. Naturally, the last of these is of greatest interest to us, but all of these techniques are built on common data-processing methods that use hashing. There are also cloud services that speed up hash lookups, and it is no secret what they are used for.

At first glance it may seem odd that so many different tasks rely on the same procedure: computing and comparing checksums or hashes—fixed-length bit strings. Yet the approach is genuinely universal. Checksums act as digital fingerprints for files, keys, passwords, and other data—collectively called messages in cryptography. Hashes (or digests) let you compare these values, quickly detect any changes, and make access checks safer. For example, hashes enable verifying entered passwords without sending them in plaintext.

Mathematically, this is done by a hashing algorithm—an iterative transformation applied to data blocks into which the original message is split. The input can be anything, from a short password to a massive database. Each block is padded (often with zeros) or truncated to the required length, and the process repeats until a fixed-size digest is produced.

info

The maximum input size a hash function can handle is determined by how the algorithm encodes the data. Typically, the message length is stored as a 64-bit integer, so the usual limit is 2^64−1 bits—about two exabytes. In practice, this constraint isn’t relevant yet, even for very large data centers.
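As a rough illustration of the padding step and the 64-bit length field, here is a minimal sketch of MD5-style message padding. MD5 is picked only as a familiar example, the helper name md_pad is hypothetical, and other hash functions differ in details such as byte order.

```python
import struct

BLOCK_SIZE = 64  # MD5 processes the message in 64-byte (512-bit) blocks

def md_pad(message: bytes) -> bytes:
    """Pad a message the way MD5 does before it is split into blocks."""
    bit_length = (len(message) * 8) % (1 << 64)   # the length field is 64 bits wide
    padded = message + b"\x80"                    # a single 1 bit, followed by zeros
    padded += b"\x00" * ((56 - len(padded)) % BLOCK_SIZE)
    padded += struct.pack("<Q", bit_length)       # little-endian 64-bit message length
    return padded

padded = md_pad(b"password")
print(len(padded), padded.hex())
```

Whatever the input size, the padded message is always a whole number of 64-byte blocks, which is what lets the algorithm iterate block by block.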

Hashes are typically written in hexadecimal. This makes them much easier to compare by eye, and the representation is a quarter the length of the binary form. The shortest hashes come from Adler-32, CRC32, and other algorithms with 32-bit digests. The longest are from SHA-512. Beyond those, there are about a dozen other popular hash functions, most of which produce intermediate digest lengths: 160, 224, 256, and 384 bits. Work on functions with longer outputs continues, since the longer the digest, the more distinct values a hash function can produce.
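The relationship between digest length and its hexadecimal form is easy to check with Python's standard library. This is only a quick sketch: the sample message is arbitrary, and the list of algorithms is limited to those that hashlib and zlib normally ship with.

```python
import hashlib
import zlib

message = b"password"

# 32-bit checksums from zlib, printed as 8 hexadecimal characters
print("adler32", f"{zlib.adler32(message):08x}")
print("crc32  ", f"{zlib.crc32(message):08x}")

# Cryptographic digests: the hex string is always (digest bits / 4) characters long
for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
    digest = hashlib.new(name, message)
    print(name, digest.digest_size * 8, "bits,", len(digest.hexdigest()), "hex chars")
```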

Uniqueness: The Cornerstone of Security

A hash’s uniqueness (its resistance to collisions) is one of the key properties that determines the cryptographic strength of a hashing scheme. The space of possible passwords is effectively unbounded, but the set of hash values is always finite. Any hash function’s digests are only unique up to the size of its output space: 2^n for an n-bit hash. For example, CRC32 has just 2^32 possible outputs, so collisions are hard to avoid. Widely used functions such as MD5 and SHA-1 produce 128- and 160-bit digests, which greatly expands the space of distinct hashes, to 2^128 and 2^160, respectively.
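How small a 32-bit output space really is can be seen by hunting for a CRC32 collision directly. The sketch below draws random 8-byte messages until two of them share a checksum. By the birthday bound this should take on the order of 2^16 to 2^17 attempts, not 2^32. The function name and the fixed seed are assumptions made for the example.

```python
import random
import zlib

def find_crc32_collision(seed: int = 0):
    """Draw random 8-byte messages until two different ones share a CRC32."""
    rng = random.Random(seed)
    seen = {}  # checksum -> first message that produced it
    while True:
        message = rng.getrandbits(64).to_bytes(8, "big")
        checksum = zlib.crc32(message)
        other = seen.get(checksum)
        if other is not None and other != message:
            return other, message, checksum
        seen[checksum] = message

first, second, checksum = find_crc32_collision()
print(first.hex(), second.hex(), f"{checksum:08x}")
```

The same search against a 128-bit digest would need around 2^64 attempts, which is why digest length matters so much.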

info

Strictly speaking, cryptographic hash functions are subject to stricter requirements than cyclic code–based checksums (e.g., CRC). However, in practice these terms are often used interchangeably.

A collision is when different inputs (including passwords) produce the same hash. Collisions can occur naturally at scale or be deliberately engineered for attacks. This phenomenon underpins attacks on various cryptographic systems, especially authentication protocols. Typically, these systems hash the entered password or key and then send the resulting digest for comparison—often mixing in a salt (random data) at some stage or applying additional cryptographic transformations to strengthen security. The passwords themselves aren’t stored; only their digests are transmitted and compared. Crucially, hashing any password with the same function always yields a digest of fixed, known length.
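A minimal sketch of the store-and-compare scheme described above might look like this. It assumes a random 16-byte salt and a single SHA-256 pass purely for brevity (real systems typically use a deliberately slow key-derivation function), and the function names are hypothetical.

```python
import hashlib
import hmac
import os

def make_record(password: str):
    """Return the (salt, digest) pair that is stored instead of the password."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).digest()
    return salt, digest

def verify(password: str, salt: bytes, stored: bytes) -> bool:
    """Hash the entered password with the stored salt and compare digests."""
    candidate = hashlib.sha256(salt + password.encode()).digest()
    return hmac.compare_digest(candidate, stored)  # constant-time comparison

salt, stored = make_record("correct horse")
print(verify("correct horse", salt, stored), verify("wrong guess", salt, stored))
```

Because the salt differs per record, the same password yields a different digest each time, so a precomputed table for unsalted hashes no longer applies.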

Pseudo-Reversal

You can’t invert a hash and recover the original password directly, even if you ignore the salt, because hashing is a one-way function. From the resulting digest, you can’t determine the size or type of the input. What you can do instead is find a different password that produces the same hash. Thanks to hash collisions, this is easier: you might never discover the original password, but you can find another one that, when hashed with the same algorithm, yields the required digest.

All it takes is to compute about 2^128 password–hash pairs for a 128-bit digest, and many orders of magnitude more for longer digests. But numbers like 2 to a ridiculously large power are intimidating only if you’re thinking in terms of your own modest machine. The good news is that today the speed of recovering a password from its hash doesn’t have to depend on the attacker’s own compute power: in many cases you no longer need to run a long brute-force search. A lot of the work has already been done.
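To put the raw numbers in perspective, here is a back-of-the-envelope calculation. The rate of 10^10 hashes per second is an assumption chosen only for illustration, not a measured figure for any particular hardware.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
HASHES_PER_SECOND = 10**10  # assumed rate, for illustration only

def years_to_enumerate(bits: int, rate: int = HASHES_PER_SECOND) -> float:
    """Years needed to compute 2**bits hashes at the given rate."""
    return 2**bits / rate / SECONDS_PER_YEAR

for bits in (32, 64, 128, 160):
    print(f"2^{bits}: {years_to_enumerate(bits):.3e} years")
```

At that assumed rate a 32-bit space is exhausted in well under a second, while 2^128 works out to roughly 10^21 years, which is why nobody enumerates the full space and everyone relies on precomputed results for likely passwords instead.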

New optimization techniques emerge practically every year. Teams like HashClash, Distributed Rainbow Table Generator, and other international cryptographic computing projects drive this work. As a result, hashes have already been precomputed for every short combination of printable characters and for entries from common password lists. These can be quickly compared against a captured hash until an exact match is found.
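The idea of comparing a captured hash against precomputed values fits in a few lines. This is a toy sketch with a five-word list standing in for a real password dictionary. The word list and the function name are made up for the example.

```python
import hashlib

def build_lookup(words):
    """Precompute an MD5 -> password dictionary for a list of candidates."""
    return {hashlib.md5(word.encode()).hexdigest(): word for word in words}

wordlist = ["123456", "password", "qwerty", "admin123", "letmein"]
table = build_lookup(wordlist)

captured = hashlib.md5(b"admin123").hexdigest()  # stands in for a captured hash
print(table.get(captured))                       # a hit if the password is listed
print(table.get("0" * 32))                       # None: not in the table
```

The hashing cost is paid once when the table is built. After that, every lookup is a single dictionary access.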

What used to take weeks or months of CPU time can now be done in a few hours thanks to multi-core processors and GPU-accelerated brute-force tools using CUDA and OpenCL. Admins generate rainbow tables on idle servers, while others simply spin up a virtual cluster on Amazon EC2.

Lookup vs. Compute

Popular hashing algorithms are so fast that, by now, hash-to-password lookup tables (rainbow tables) exist for practically every function with a short digest. Meanwhile, for hash functions with 128-bit digests and above, weaknesses—either in the algorithms themselves or in specific implementations—are being found, which greatly simplifies attacks.
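A rainbow table trades storage for computation by keeping only the first and last password of long hash chains. The following is a deliberately tiny sketch of that idea over three-letter lowercase passwords. The reduction function, chain length, and all names are assumptions chosen for readability and do not reflect how any particular tool implements it.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase
PW_LEN = 3                        # toy password space: 26**3 candidates
SPACE = len(ALPHABET) ** PW_LEN
CHAIN_LEN = 50                    # hash/reduce steps per chain

def md5_hex(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()

def index_to_password(index: int) -> str:
    chars = []
    for _ in range(PW_LEN):
        index, rem = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(chars)

def reduce_hash(digest_hex: str, column: int) -> str:
    """Map a digest back into the password space; varies with the column."""
    return index_to_password((int(digest_hex, 16) + column) % SPACE)

def build_table(starts):
    """Store only chain end -> chain starts; the middle is recomputed on demand."""
    table = {}
    for start in starts:
        pw = start
        for col in range(CHAIN_LEN):
            pw = reduce_hash(md5_hex(pw), col)
        table.setdefault(pw, []).append(start)
    return table

def lookup(table, target_hex):
    # Assume the target sits at each column in turn and walk to the chain end.
    for col in range(CHAIN_LEN - 1, -1, -1):
        pw = reduce_hash(target_hex, col)
        for c in range(col + 1, CHAIN_LEN):
            pw = reduce_hash(md5_hex(pw), c)
        # On a matching end point, replay the chain from its start.
        for start in table.get(pw, []):
            candidate = start
            for c in range(CHAIN_LEN):
                if md5_hex(candidate) == target_hex:
                    return candidate
                candidate = reduce_hash(md5_hex(candidate), c)
    return None

starts = [index_to_password(i) for i in range(0, SPACE, 100)]
table = build_table(starts)
print(lookup(table, md5_hex("aaa")))  # "aaa" is a chain start, so it is covered
```

Only passwords that happen to fall on some chain can be recovered, so coverage depends on how many chains are generated and how long they are.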

In the 1990s, Ronald Rivest’s MD5 algorithm became extremely popular. It was widely used for user authentication on websites and for client applications connecting to servers. However, later research showed the algorithm is not sufficiently secure. In particular, it is vulnerable to collision attacks. In other words, it’s possible to deliberately craft two different data sequences whose hashes match exactly.

Because message digests are widely used in cryptography, relying on the MD5 algorithm today leads to serious problems in practice. For example, such an attack can be used to forge an X.509 digital certificate. It’s even possible to forge an SSL certificate in a way that lets an attacker pass off their fake as a trusted root CA certificate. Moreover, most trust stores still contain certificates that use MD5 for their signatures. As a result, the entire public key infrastructure (PKI) is vulnerable to such attacks.

You’ll only need to resort to an exhaustive brute-force attack when you’re up against genuinely strong passwords (a large set of random characters) and hash functions with long digests (160 bits or more) that don’t have any known serious weaknesses. The vast majority of short and dictionary-based passwords can be cracked in a few seconds using online services today.
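For comparison, this is what the exhaustive fallback looks like in its simplest form. The sketch assumes an unsalted MD5 hash and a small alphabet. The sample password k9z and the function name are made up for the example.

```python
import hashlib
import itertools
import string

def brute_force_md5(target_hex: str, alphabet: str, max_len: int):
    """Try every string up to max_len characters until the MD5 digest matches."""
    for length in range(1, max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            candidate = "".join(combo)
            if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None

alphabet = string.ascii_lowercase + string.digits
target = hashlib.md5(b"k9z").hexdigest()
print(brute_force_md5(target, alphabet, 3))
```

Each extra character multiplies the work by the alphabet size, which is why long random passwords remain out of reach for this approach.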

Frontline Fighters in the Cloud

1. The “Hash Killer” project has been around for almost eight years. It helps crack MD5, SHA‑1 (SHA‑160), and NTLM hashes. The current number of known hash–plaintext pairs is 43.7 million. You can upload multiple hashes at once for parallel analysis. Passwords containing Cyrillic or other non-Latin characters are sometimes recovered but may display with incorrect character encoding. The site also runs an ongoing password-cracking contest and offers utilities to make the job easier—such as tools to merge password lists, reformat them, and remove duplicates.

HashKiller doesn’t play well with Cyrillic, but it still knows Cyrillic passwords
Hash Killer cracked three out of five passwords in half a second

2. CrackStation supports nearly all hash types you’ll encounter in practice: LM, NTLM, MySQL 4.1+, MD2/4/5 + MD5-half, SHA-160/224/256/384/512, RIPEMD-160, and Whirlpool. You can upload up to ten hashes at a time for analysis. It searches an indexed database. For MD5, it contains 15 billion hash–plaintext pairs (about 190 GB), plus roughly 1.5 billion pairs for each of the other hash functions.

CrackStation can recover many dictionary passwords even from NTLM hashes

According to the developers, the database includes all words from the English Wikipedia and most popular passwords compiled from public lists. It also covers tricky variants with case changes, leetspeak, repeated characters, mirroring, and other tweaks. However, truly random passwords—even just five characters long—are a problem: in my tests, half of them weren’t found, even for LM hashes.

CrackStation struggles to crack random passwords five characters long or more, even from LM hashes

3. CloudCracker.net is a free service for instant password lookups from MD5 and SHA-1 hashes. The digest type is detected automatically by hash length. For now, CloudCracker only finds matches for hashes of certain English words and common passwords like admin123. It won’t recover even short passwords made of random characters, such as D358, from an MD5 digest.

The “cloud cracker” instantly finds dictionary passwords from their hashes
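Detecting the digest type from its length can be sketched as follows. How the service does it internally is not documented here, so the mapping and the function name are assumptions. Note that length alone is ambiguous: several algorithms share the same output size.

```python
# Hex-string length -> algorithms that produce a digest of that size
HEX_LENGTH_TO_CANDIDATES = {
    32: ["MD5", "MD4", "NTLM", "LM"],
    40: ["SHA-1", "RIPEMD-160"],
    56: ["SHA-224"],
    64: ["SHA-256"],
    96: ["SHA-384"],
    128: ["SHA-512", "Whirlpool"],
}

def guess_hash_types(value: str):
    """Return the algorithms whose hex digest has the same length as value."""
    value = value.strip().lower()
    if not value or any(ch not in "0123456789abcdef" for ch in value):
        return []  # not a hexadecimal string at all
    return HEX_LENGTH_TO_CANDIDATES.get(len(value), [])

print(guess_hash_types("5f4dcc3b5aa765d61d8327deb882cf99"))
print(guess_hash_types("not-a-hash"))
```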

4. The MD5Decode.com service maintains a database of passwords with known MD5 values. It also shows all other hashes that correspond to the recovered password: MD2, MD4, SHA (160–512), RIPEMD (128–320), Whirlpool-128, Tiger (128–192 in 3–4 passes), Snefru-256, GOST, Adler-32, CRC32, CRC32b, FNV-1 (32/64), JOAAT 8, HAVAL (128–256 in 3–5 passes). If the number of passes isn’t specified, the function computes a single-pass hash.

There’s no built‑in search on the site yet, but you can enter a password or its hash directly in your browser’s address bar by appending it to the site’s URL with the /encrypt/ prefix.

MD5Decode covers all hash types for dictionary passwords

5. The aptly named MD5Decrypt.org only lets you look up a password–MD5 hash match. On the plus side, it has its own database of 10 million pairs and can automatically search 23 partner databases. The site also offers a hash calculator to compute digests of an input message using MD4, MD5, and SHA-1.

MD5Decrypt can recover composite passwords built from dictionary words, but it only accepts one hash at a time for analysis

Another site, MD5Lab.com, is hosted behind Cloudflare in San Francisco. Its search is still clunky for now, though the database is growing quickly. Just keep it on your radar.

Googling Hashes

Not every service offers hash-to-password lookups for free. Some require registration and bombard you with ads, and many even advertise paid cracking services. A few really do run powerful clusters and queue submitted hashes as jobs, but there are plenty of chancers as well. They’ll charge you for results that are actually available for free, exploiting customers’ lack of awareness.

Rather than promoting legitimate services here, I’d suggest a different approach: use popular search engines to find hash–password pairs. Their crawlers scan the web daily and index new data, including fresh entries from rainbow tables.

So to start, just paste the hash into Google. If it corresponds to a dictionary password, it will usually show up on the first page of results. You can Google individual hashes manually, but for larger lists it’s more convenient to process them with the BozoCrack script.
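The trick behind this kind of search-based cracking can be sketched offline. The code below assumes the approach is to hash every word found on a results page and compare it with the target. Fetching the page is deliberately left out, and the sample text and function name are invented for the example.

```python
import hashlib
import re

def find_plaintext(target_md5: str, page_text: str):
    """Hash every whitespace-separated token and return the one that matches."""
    target_md5 = target_md5.lower()
    for token in re.findall(r"\S+", page_text):
        if hashlib.md5(token.encode()).hexdigest() == target_md5:
            return token
    return None

# Stand-in for the text of a search results page
page = "md5 lookup: 5f4dcc3b5aa765d61d8327deb882cf99 = password (dictionary word)"
print(find_plaintext("5f4dcc3b5aa765d61d8327deb882cf99", page))
```

No table and no brute force are involved: the only computation is one hash per word on the page.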

A Universal Approach

Among the many hash functions, MD5 and SHA‑1 are the most popular, but the same approach applies to other algorithms as well. For example, the Windows SAM registry hive stores two hashes for each password by default: the LM hash (a legacy DES-based scheme) and the NT hash (produced by running the Unicode password through MD4). Both hashes are 128 bits long, but LM is much weaker due to numerous simplifications in its design.
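The simplifications that make LM so weak happen before any cryptography is applied. The sketch below shows only that preprocessing: the password is uppercased, padded or truncated to 14 bytes, and split into two 7-byte halves that are then processed independently. The DES step is omitted, ASCII is assumed instead of the OEM code page, and the names are hypothetical.

```python
import string

def lm_halves(password: str):
    """Apply LM's pre-hash steps: uppercase, pad/truncate to 14 bytes, split in two."""
    raw = password.upper().encode("ascii", errors="replace")[:14]
    raw = raw.ljust(14, b"\x00")
    return raw[:7], raw[7:]

# Characters left once lowercase letters are folded away (+1 for the space)
LM_CHARS = len(string.ascii_uppercase + string.digits + string.punctuation) + 1
FULL_CHARS = LM_CHARS + len(string.ascii_lowercase)

print(lm_halves("Secret123"))
print(f"keyspace of one LM half:      {LM_CHARS ** 7:.2e}")
print(f"keyspace of 14 mixed-case chars: {FULL_CHARS ** 14:.2e}")
```

Since each half can be attacked separately, the effective search space is that of a 7-character uppercase password, not a 14-character one.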

Over time, both hash types are being replaced by stronger authentication mechanisms, yet many still rely on this legacy scheme as-is. By copying the SAM hive and decrypting it with the system key from the SYSTEM hive, an attacker can obtain the list of local accounts and their stored credential hashes.

From there, the attacker can find a character sequence that matches the administrator’s hash. That gives them full access to the OS while leaving fewer traces than a crude password reset. Remember, due to hash collisions, the working password doesn’t have to be the same as the real owner’s—but Windows won’t see any difference. As Bad Religion sang, “’Cause to you I’m just a number and a clever screen name.”

A similar issue exists in other authentication systems as well—for example, in WPA/WPA2, which is widely used to secure Wi‑Fi connections. When a wireless client connects to an access point, there’s a standard initial exchange that includes the 4‑way handshake. During this handshake, the password is never sent in cleartext, but key material derived from it (via a hash-based function) is transmitted over the air. The necessary frames can be captured by putting a Wi‑Fi adapter into monitor mode, typically using a modified or patched driver. Moreover, in many cases you don’t have to wait for the next connection: you can force it by sending broadcast deauthentication (deauth) frames to all associated clients. Within seconds, they’ll attempt to reconnect and perform the handshake.

After saving the handshake file(s), you can extract the password hash and either recover the actual password or find another one that the access point will accept just the same. Many online services can analyze not only a raw hash but also a file containing the recorded handshake. Typically, you need to provide the pcap file and the SSID of the chosen access point, since its identifier is used when deriving the PSK.
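Why the SSID is needed becomes clear from the key derivation itself. In WPA/WPA2-Personal the 256-bit PSK is computed with PBKDF2-HMAC-SHA1, using the SSID as the salt and 4096 iterations. The sketch below shows just that step. The passphrase and network names are invented, and checking a candidate against a captured handshake is not covered.

```python
import hashlib

def wpa_psk(passphrase: str, ssid: str) -> bytes:
    """Derive the 256-bit WPA/WPA2 pre-shared key from a passphrase and SSID."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)

# The same passphrase gives unrelated keys on differently named networks
print(wpa_psk("password", "HomeNetwork").hex())
print(wpa_psk("password", "CoffeeShop").hex())
```

Because the SSID acts as a salt, precomputed tables are only useful for the specific network names they were built for.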

The well-known CloudCracker.com, which has been written up everywhere in recent years, still charges for the service. GPUHASH.me accepts Bitcoin. That said, there are also free sites with similar functionality—for example, DarkIRCop.

Online services and rainbow tables still can’t recover every hash–password pair. But hash functions with short digests have already been beaten, and short or dictionary passwords are easy to uncover even from SHA‑1 hashes. What’s especially striking is the instant lookup of passwords by their digests using Google. It’s the simplest, fastest, and completely free option.
