Typosquatting permutations

Typosquatting permutation generation is the process of algorithmically enumerating all plausible misspellings and variations of a domain name. This guide explains the permutation categories, the tools that generate them, the combinatorial explosion problem, and how security teams prioritize the output.


What it is

Typosquatting permutation generation is the systematic production of every plausible variant of a domain name that an attacker might register or a user might accidentally type. Security tools enumerate these permutations so that brand protection teams can check registration status, inspect DNS records, and prioritize defensive action before a lookalike domain reaches end users.

The concept is straightforward. Given a domain like example.com, apply a set of string-manipulation rules to produce every candidate that could plausibly impersonate it. The resulting list typically runs into the thousands. Have I Been Squatted's twistrs library automates the process with a broad set of fuzzing algorithms, including adjacent-key substitution across multiple keyboard layouts, homoglyphs, vowel swap, TLD variation, and keyword-style permutations.

Permutation categories

Most domain fuzzing engines share a core set of permutation categories. Each models a different class of user error or attacker technique.

Character omission removes one character at a time. For a label of n characters, this produces n raw variants (slightly fewer after deduplication when the label contains doubled letters): example.com yields xample.com, eample.com, exmple.com, and so on. Omission models the most frequent keyboard error: skipping a key entirely.
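A minimal sketch of omission in Python. The function name `omissions` is illustrative and not taken from any particular library; it works on the bare label, without the TLD.

```python
def omissions(label: str) -> list[str]:
    """Drop one character at a time, collapsing duplicates and keeping order."""
    variants = (label[:i] + label[i + 1:] for i in range(len(label)))
    return list(dict.fromkeys(variants))


print(omissions("example"))
# A doubled letter yields the same string twice, so "google" gives 5 variants, not 6.
print(len(omissions("google")))
```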

Character insertion (addition) places an extra character adjacent to each position in the label. When limited to keys neighboring the original on a given keyboard layout, the output stays manageable. When expanded to the full alphabet, the count rises to (n + 1) × 26 raw candidates.
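The full-alphabet case can be sketched as follows. The helper name `insertions` and the choice of a lowercase a-z alphabet are assumptions for illustration; real engines may also insert digits or restrict the alphabet to neighboring keys.

```python
import string


def insertions(label: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """Insert every character of the alphabet at every position, including both ends."""
    return {
        label[:i] + ch + label[i:]
        for i in range(len(label) + 1)
        for ch in alphabet
    }


label = "example"
raw_count = (len(label) + 1) * 26       # candidates before deduplication
unique_count = len(insertions(label))   # the set collapses repeats such as "eexample"
print(raw_count, unique_count)
```

The raw count follows the (n + 1) × 26 formula; the set is slightly smaller because inserting a letter next to an identical letter gives the same string from two positions.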

Transposition swaps each pair of adjacent characters, producing n − 1 variants: exmaple.com, examlpe.com. This mirrors the common slip of pressing two keys in the wrong order.
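As a rough sketch, transposition is a single pass over adjacent pairs. The function name is hypothetical, and skipping identical neighbors is a design choice made here so that the original label is never emitted.

```python
def transpositions(label: str) -> list[str]:
    """Swap each pair of adjacent characters, skipping pairs of identical letters."""
    out = []
    for i in range(len(label) - 1):
        if label[i] != label[i + 1]:  # swapping identical letters changes nothing
            out.append(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    return out


print(transpositions("example"))
```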

Adjacent-key substitution (replacement) replaces each character with a neighbor on the keyboard layout. The letter s is adjacent to a, d, w, and x on QWERTY, so each position fans out to roughly k variants, where k is the number of keys adjacent to the original character (typically 3 to 6). Including multiple keyboard layouts multiplies the output further.
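The sketch below uses a deliberately partial QWERTY neighbor map covering only the letters of "example". The map contents and the function name are assumptions for illustration; a production engine would carry complete maps for each supported layout.

```python
# Partial, illustrative QWERTY adjacency (letters only). Not a complete layout.
QWERTY_NEIGHBORS = {
    "e": "wrsd",
    "x": "zcsd",
    "a": "qwsz",
    "m": "njk",
    "p": "ol",
    "l": "kop",
}


def adjacent_key_substitutions(label: str, layout: dict[str, str] = QWERTY_NEIGHBORS) -> list[str]:
    """Replace each character with each of its neighbors on the given layout."""
    out = []
    for i, ch in enumerate(label):
        for neighbor in layout.get(ch, ""):
            out.append(label[:i] + neighbor + label[i + 1:])
    return out


variants = adjacent_key_substitutions("example")
print(len(variants), variants[:4])
```

Supporting a second layout amounts to passing a different map and merging the results.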

Homoglyph replacement substitutes characters with visually similar alternatives, either within ASCII (l → 1, o → 0) or across Unicode scripts (Latin a → Cyrillic а). Homoglyph sets can be large; the Unicode Consortium's confusables list contains thousands of entries, making this category one of the biggest contributors to total permutation count. When rendered through internationalized domain names encoded as Punycode, these variants are valid DNS labels and difficult to distinguish visually from the original.
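A small sketch of homoglyph substitution and the resulting Punycode form. The four-entry table is a tiny hand-picked sample, not the Unicode confusables data, and the helper names are hypothetical. Encoding uses Python's built-in `idna` codec, which follows the older IDNA 2003 rules; registries may apply stricter policies.

```python
# Tiny illustrative sample; real engines load far larger confusable tables.
HOMOGLYPHS = {
    "l": ["1"],
    "o": ["0"],
    "a": ["\u0430"],  # Cyrillic small letter a
    "e": ["\u0435"],  # Cyrillic small letter ie
}


def homoglyph_variants(label: str) -> list[str]:
    """Replace one character at a time with each listed lookalike."""
    out = []
    for i, ch in enumerate(label):
        for glyph in HOMOGLYPHS.get(ch, []):
            out.append(label[:i] + glyph + label[i + 1:])
    return out


def to_a_label(label: str) -> str:
    """Return the ASCII (Punycode) form of a label as it would appear in DNS."""
    return label.encode("idna").decode("ascii")


for variant in homoglyph_variants("example"):
    print(variant, "->", to_a_label(variant))
```

ASCII-only variants pass through unchanged, while variants containing Cyrillic letters come out with the `xn--` prefix.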

Vowel swap replaces each vowel with every other vowel: example.com becomes axample.com, exomple.com, exampla.com. The category models phonetic confusion and produces a modest set of candidates.
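Vowel swap is one of the simplest categories to sketch. The vowel set "aeiou" and the function name are assumptions made here for illustration.

```python
VOWELS = "aeiou"


def vowel_swaps(label: str) -> list[str]:
    """Replace each vowel in the label with every other vowel."""
    return [
        label[:i] + v + label[i + 1:]
        for i, ch in enumerate(label)
        if ch in VOWELS
        for v in VOWELS
        if v != ch
    ]


print(vowel_swaps("example"))
```

Each vowel contributes four variants, so the three vowels in "example" give twelve candidates.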

Repetition doubles a single character: exxample.com, examplle.com. Users occasionally hold a key a fraction too long, making these variants plausible in practice.
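A minimal sketch of repetition, assuming a hypothetical helper named `repetitions`. Duplicates are collapsed because doubling either letter of an existing pair gives the same result.

```python
def repetitions(label: str) -> list[str]:
    """Double each character in turn, collapsing duplicates and keeping order."""
    variants = (label[:i] + ch + label[i:] for i, ch in enumerate(label))
    return list(dict.fromkeys(variants))


print(repetitions("example"))
```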

Hyphenation inserts or removes hyphens. Inserting a hyphen between each adjacent pair of characters in the label yields n − 1 variants; removing hyphens from domains that legitimately contain them generates additional candidates.
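Both directions of hyphenation can be sketched in a few lines. The function name is hypothetical, and skipping positions next to an existing hyphen (to avoid double hyphens) is a choice made for this sketch rather than a rule every engine follows.

```python
def hyphenations(label: str) -> list[str]:
    """Insert a hyphen between adjacent characters, and remove existing hyphens."""
    inserted = [
        label[:i] + "-" + label[i:]
        for i in range(1, len(label))
        if label[i - 1] != "-" and label[i] != "-"  # avoid creating "--"
    ]
    removed = [
        label[:i] + label[i + 1:]
        for i, ch in enumerate(label)
        if ch == "-"
    ]
    return inserted + removed


print(hyphenations("example"))
print(hyphenations("my-brand"))
```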

TLD swap replaces the top-level domain: example.com → example.net, example.co, example.cm. Full permutation engines often evaluate dozens to well over a thousand alternative suffixes, depending on configuration. Near-identical suffixes, such as .com and the ccTLDs .cm and .co, account for a disproportionate share of real-world typosquatting abuse.
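A sketch of TLD swapping against a short, hand-picked suffix list. The list and the function name are illustrative assumptions, and the split on the last dot only handles single-label suffixes; multi-label suffixes such as co.uk would need a public-suffix-aware parser.

```python
ALT_TLDS = ["net", "org", "co", "cm"]  # illustrative; real lists are far longer


def tld_swaps(domain: str, tlds: list[str] = ALT_TLDS) -> list[str]:
    """Replace the final suffix of the domain with each alternative TLD."""
    label, _, current = domain.rpartition(".")
    return [f"{label}.{tld}" for tld in tlds if tld != current]


print(tld_swaps("example.com"))
```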

Subdomain (dot insertion) places a period within the label, splitting it into a subdomain and a different parent domain: ex.ample.com. Exploitation requires the attacker to control the resulting parent domain (ample.com), so this category is less common in the wild but still worth monitoring.
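The sketch below pairs each dot-insertion variant with the parent domain an attacker would have to control, which is the name worth checking for registration. The function name is hypothetical, and the input is assumed to be a simple label.tld domain.

```python
def dot_insertions(domain: str) -> list[tuple[str, str]]:
    """Return (variant, parent_domain) pairs for each dot position in the label."""
    label, _, tld = domain.partition(".")
    return [
        (f"{label[:i]}.{label[i:]}.{tld}", f"{label[i:]}.{tld}")
        for i in range(1, len(label))
    ]


for variant, parent in dot_insertions("example.com"):
    print(variant, "requires control of", parent)
```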

Bitsquatting generates domains that differ by a single bit-flip in the ASCII representation of each character. A 10-character label produces roughly 70 raw variants (7 meaningful bits per character), of which 20 to 40 typically map to valid DNS characters.
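A minimal bitsquatting sketch: flip each of the seven low bits of every character and keep results that are still valid hostname characters. The names are illustrative. Flips that only change letter case are dropped, since DNS names are case-insensitive, and label-level rules such as no leading hyphen are not applied here.

```python
import string

DNS_CHARS = set(string.ascii_lowercase + string.digits + "-")


def bitsquats(label: str) -> list[str]:
    """Flip one bit at a time and keep candidates made of valid DNS characters."""
    out = []
    for i, ch in enumerate(label):
        for bit in range(7):  # 7 meaningful bits in an ASCII character
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped in DNS_CHARS:
                out.append(label[:i] + flipped + label[i + 1:])
    return out


print(bitsquats("example"))
```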

Dictionary/keyword squatting appends or prepends common words (login-, secure-, -support) to the base domain, modeling combosquatting and brand impersonation rather than accidental typos. The candidate count depends entirely on the size of the dictionary used.
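Keyword squatting reduces to combining the label with each dictionary word. The three-word dictionary and the function name are assumptions for illustration; emitting both hyphenated and joined forms is a choice made for this sketch.

```python
KEYWORDS = ["login", "secure", "support"]  # illustrative dictionary


def keyword_variants(label: str, keywords: list[str] = KEYWORDS) -> list[str]:
    """Prepend and append each keyword, with and without a hyphen."""
    out = []
    for kw in keywords:
        out += [f"{kw}-{label}", f"{label}-{kw}", f"{kw}{label}", f"{label}{kw}"]
    return out


print(keyword_variants("example"))
```

The output grows linearly with the dictionary: four candidates per keyword in this form.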

Scale and the combinatorial explosion

For a domain label of n characters, even a conservative set of fuzzing rules produces a large candidate list. Consider a 10-character label:

| Category | Approximate count |
| --- | --- |
| Omission | 10 |
| Transposition | 9 |
| Replacement (QWERTY) | ~40 |
| Insertion (adjacent keys) | ~55 |
| Vowel swap | ~12 |
| Repetition | 10 |
| Hyphenation | 9 |
| Bitsquatting | ~30 |
| Subtotal (basic) | ~175 |
| Homoglyphs (ASCII + Unicode) | hundreds to thousands |
| TLD swap | 50–1,500+ |
| Dictionary/keyword | dictionary-dependent |

Including Unicode homoglyphs and a broad TLD list pushes the total well past 2,000 candidates for a short domain. Longer labels, such as a 15-character brand name, scale roughly linearly in the positional categories and combinatorially in substitution categories, easily reaching 5,000 or more raw permutations.

Large-scale measurement research applying eight candidate-generation techniques to popular domain names, drawing from a corpus of over 3.3 billion DNS records and TLS certificate data, has identified more than 2.3 million registered typosquatting domains. That figure illustrates the gap between raw permutation volume and the subset that is actually registered, but it also shows how heavily adversaries exploit even a fraction of the permutation space.

Deduplication and filtering

Raw permutation lists contain substantial overlap. Repeating the x in example and inserting an x beside it both yield exxample, and swapping the vowel e for a produces the same string as flipping a single bit of e. Homoglyph variants may duplicate ASCII-only substitutions after Punycode encoding. Effective tooling deduplicates the output before performing DNS lookups.
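One way to deduplicate without losing information is to key candidates by string and record every category that produced them. The sketch below is an assumption about how such a merge could look, using two small hypothetical generators defined inline so the example is self-contained.

```python
import string
from collections import defaultdict


def repetitions(label: str) -> list[str]:
    return [label[:i] + ch + label[i:] for i, ch in enumerate(label)]


def insertions(label: str) -> list[str]:
    return [
        label[:i] + ch + label[i:]
        for i in range(len(label) + 1)
        for ch in string.ascii_lowercase
    ]


def merge(label: str, generators: dict) -> dict[str, set[str]]:
    """Map each unique candidate to the set of categories that generated it."""
    sources: dict[str, set[str]] = defaultdict(set)
    for name, generate in generators.items():
        for candidate in generate(label):
            if candidate != label:  # never emit the original
                sources[candidate].add(name)
    return sources


merged = merge("example", {"repetition": repetitions, "insertion": insertions})
print(len(merged), sorted(merged["exxample"]))
```

Keeping the category set alongside each candidate means one DNS lookup per unique string while the category remains available for later prioritization.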

Filtering also removes invalid candidates. DNS labels cannot exceed 63 characters, cannot start or end with a hyphen, and (outside of IDN) are restricted to a-z, 0-9, and hyphens. These constraints discard a measurable fraction of raw output, particularly from homoglyph and insertion categories.
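The label constraints above can be expressed as a single check. This is a sketch for ASCII labels only; the names are illustrative, and Unicode candidates are assumed to have been converted to their Punycode form before reaching this filter.

```python
import re

# 1 to 63 characters from a-z, 0-9, and hyphen, with no hyphen at either end.
LDH_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_valid_label(label: str) -> bool:
    """Return True if the label satisfies the letter-digit-hyphen rules."""
    return LDH_LABEL.fullmatch(label) is not None


candidates = ["example", "-example", "example-", "exa_mple", "a" * 64]
print([c for c in candidates if is_valid_label(c)])
```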

Prioritization

Generating permutations is the easy part; deciding which ones matter is harder. Teams typically prioritize by:

  • Registration status. Only registered domains pose an active threat. A DNS resolution check against each permutation filters the list from thousands to dozens.
  • Permutation category. Omission, transposition, and adjacent-key substitution model real typing errors and attract more accidental traffic than vowel swap or full-alphabet insertion.
  • MX and web signals. A registered variant with MX records or an active HTTP server is higher risk than a parked page. MX records in particular suggest phishing infrastructure.
  • Certificate issuance. A typosquat domain that appears in Certificate Transparency logs with a recently issued TLS certificate may be preparing to serve HTTPS content.
  • Levenshtein distance. Permutations with a single-edit distance from the original are more likely to receive mistyped traffic than those with two or more edits.
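The criteria above could be combined into a simple additive score. Everything in this sketch is hypothetical: the `Candidate` fields, the category names, and the weights are placeholders chosen for illustration, not values taken from any tool, and the signal fields are assumed to be filled in by earlier DNS, HTTP, and Certificate Transparency checks. Note that plain Levenshtein distance counts a transposition as two edits.

```python
from dataclasses import dataclass


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


@dataclass
class Candidate:
    domain: str
    category: str
    registered: bool = False
    has_mx: bool = False
    has_http: bool = False
    recent_cert: bool = False


TYPING_ERROR_CATEGORIES = {"omission", "transposition", "replacement"}


def score(c: Candidate, original: str) -> int:
    """Illustrative weights only; unregistered candidates score zero."""
    if not c.registered:
        return 0
    total = 1
    total += 2 if c.category in TYPING_ERROR_CATEGORIES else 0
    total += 3 if c.has_mx else 0
    total += 2 if c.has_http else 0
    total += 2 if c.recent_cert else 0
    total += 1 if levenshtein(c.domain, original) == 1 else 0
    return total


candidates = [
    Candidate("example-login.com", "keyword", registered=True, has_http=True),
    Candidate("exmple.com", "omission", registered=True, has_mx=True),
    Candidate("exampel.com", "transposition"),
]
ranked = sorted(candidates, key=lambda c: score(c, "example.com"), reverse=True)
print([(c.domain, score(c, "example.com")) for c in ranked])
```

In practice the weights would be tuned against incidents a team has actually seen.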

Tooling landscape

Have I Been Squatted's twistrs library (Rust, open-source) implements domain permutation generation across the categories above, including faux-TLD insertion, keyword squatting, double-vowel insertion, and related algorithms. Its compiled performance suits large-scale or continuous monitoring pipelines. Downstream integrations can resolve candidates against live DNS and add enrichment such as WHOIS, GeoIP, or fuzzy hashes of served content.

Permutation libraries focus on candidate generation and basic checks. Operationalizing the output (comparing against live registration data, correlating with Certificate Transparency logs, and triggering alerts) typically requires an additional monitoring layer.

Have I Been Squatted generates permutations across these categories for every monitored domain, checks each candidate against registration data, and enriches matches with DNS, HTTP, RDAP, and screenshot data. Certificate Transparency extended search surfaces newly issued certificates for permutation matches, closing the gap between domain registration and active abuse. The result is a prioritized view of which variants are registered, which are active, and which warrant defensive registration or takedown.
