Domain permutation analysis (Scoring and prioritization) - Threat intelligence

What it is

Domain permutation analysis is the systematic generation and evaluation of domain name variants that could be confused with a legitimate domain. Given an input like example.com, permutation tools produce hundreds or thousands of alternatives that an attacker might register for phishing, traffic interception, or brand abuse. It is the discovery engine behind typosquatting detection.

Permutation categories

Different transformation types model different attacker strategies and user mistakes.

Omission removes one character at a time: examle.com, exmple.com. These model keyboard slips where a user misses a key.

Insertion adds an extra character, typically a repeat of a neighboring one: examplle.com, exaample.com. These catch accidental double-taps or autocomplete errors.

Transposition swaps adjacent characters: exmaple.com, examlpe.com. This is one of the most common real-world typos.
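The three edit-style transformations described so far can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool implements them; the function names are hypothetical, and `repetitions` only models the character-doubling form of insertion shown in the examples.

```python
def omissions(label):
    """Drop one character at a time."""
    return {label[:i] + label[i + 1:] for i in range(len(label))}


def repetitions(label):
    """Double one character at a time (the double-tap form of insertion)."""
    return {label[:i] + label[i] + label[i:] for i in range(len(label))}


def transpositions(label):
    """Swap each pair of adjacent characters, skipping swaps that change nothing."""
    variants = set()
    for i in range(len(label) - 1):
        swapped = label[:i] + label[i + 1] + label[i] + label[i + 2:]
        if swapped != label:
            variants.add(swapped)
    return variants


if __name__ == "__main__":
    for generate in (omissions, repetitions, transpositions):
        print(generate.__name__, sorted(generate("example")))
```

Each function works on the label only ("example"), so the TLD has to be re-attached before the variants are checked.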

Replacement substitutes a character with a nearby key or visually similar alternative: ezample.com (keyboard proximity), examp1e.com (visual similarity).
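A replacement generator can be driven by lookup tables. The sketch below assumes two small, deliberately incomplete tables (a few QWERTY neighbors and a few digit lookalikes); real tools ship much larger mappings, and the names here are illustrative only.

```python
# Partial tables for illustration; a production mapping would cover every key.
QWERTY_NEIGHBORS = {"x": "zsdc", "a": "qwsz", "e": "wsdr", "m": "njk", "p": "ol", "l": "kop"}
VISUAL_LOOKALIKES = {"l": "1", "i": "1", "o": "0"}


def replacements(label, table):
    """Substitute one character at a time using the given lookup table."""
    return {
        label[:i] + substitute + label[i + 1:]
        for i, ch in enumerate(label)
        for substitute in table.get(ch, "")
    }


if __name__ == "__main__":
    print(sorted(replacements("example", QWERTY_NEIGHBORS)))
    print(sorted(replacements("example", VISUAL_LOOKALIKES)))
```

Keeping the tables separate from the generator makes it straightforward to tag each variant with the reason it was produced, which is useful later when scoring.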

Homoglyph substitution replaces characters with visually identical ones from other scripts: exаmple.com using Cyrillic а instead of Latin a. In a browser address bar, these are nearly indistinguishable.
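As a rough sketch, homoglyph variants can be generated from a lookalike table and then converted to the ASCII (Punycode) form that is actually registered in DNS. The table below is a tiny assumed sample of Cyrillic lookalikes, the helper names are hypothetical, and Python's built-in `idna` codec follows the older IDNA 2003 rules, which is adequate for illustration but not for every script.

```python
import unicodedata

# Assumed sample of Latin -> Cyrillic lookalikes; real tables are far larger.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440"}


def homoglyph_variants(label):
    """Replace one character at a time with its lookalike from another script."""
    return {
        label[:i] + HOMOGLYPHS[ch] + label[i + 1:]
        for i, ch in enumerate(label)
        if ch in HOMOGLYPHS
    }


def scripts(label):
    """Return the scripts used in a label, taken from Unicode character names."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def to_ascii(domain):
    """Encode an internationalized domain to its xn-- form."""
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    for variant in sorted(homoglyph_variants("example")):
        print(repr(variant), scripts(variant), to_ascii(variant + ".com"))
```

A label that mixes scripts, as reported by `scripts`, is a useful warning sign on its own, independent of which brand is being monitored.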

TLD swaps change the top-level domain: example.net, example.co, example.org. Users frequently misremember or mistype TLDs.

Hyphenation adds or removes hyphens: ex-ample.com, exam-ple.com. These exploit how quickly users scan URLs.
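TLD swaps and hyphenation are simple to sketch. The list of alternative TLDs below is an arbitrary assumption for the example, and the function names are hypothetical.

```python
def tld_swaps(label, tld, alternatives=("com", "net", "org", "co", "io")):
    """Pair the label with every alternative TLD except the original."""
    return {f"{label}.{alt}" for alt in alternatives if alt != tld}


def hyphenations(label):
    """Insert a hyphen between characters, or strip existing hyphens."""
    variants = {
        label[:i] + "-" + label[i:]
        for i in range(1, len(label))
        if label[i - 1] != "-" and label[i] != "-"  # avoid doubled hyphens
    }
    if "-" in label:
        variants.add(label.replace("-", ""))
    return variants


if __name__ == "__main__":
    print(sorted(tld_swaps("example", "com")))
    print(sorted(hyphenations("example")))
```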

Additional techniques include subdomain prepending (example.com.attacker.com), vowel swapping, and plural/singular variations.

Scoring and prioritization

Raw permutation output is too large to investigate manually. Scoring narrows the field by ranking candidates on multiple factors.

Registration status is the first filter: unregistered permutations are candidates for defensive registration but not immediate threats. Among registered domains, active DNS resolution separates live infrastructure from parked pages.
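A resolution check can be sketched with the standard library alone. Note the assumption: a successful lookup is only a proxy for "live", and a failed lookup does not prove a domain is unregistered, so registration itself would still need a WHOIS or RDAP query. The `resolver` parameter is a hypothetical hook added so the function can be exercised without network access.

```python
import socket


def resolves(domain, resolver=socket.getaddrinfo):
    """Return True if the domain currently resolves to at least one address."""
    try:
        return bool(resolver(domain, None))
    except (socket.gaierror, UnicodeError):
        # gaierror: lookup failed; UnicodeError: label cannot be IDNA-encoded.
        return False


def live_candidates(domains, resolver=socket.getaddrinfo):
    """Keep only the permutations that resolve, preserving input order."""
    return [d for d in domains if resolves(d, resolver)]


if __name__ == "__main__":
    print(live_candidates(["example.com", "exmaple.com", "examlpe.com"]))
```

For large candidate sets, sequential lookups like this are slow; batching or asynchronous resolution is the usual next step.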

String similarity scores (Levenshtein distance, Jaro-Winkler, visual similarity) quantify how confusable each variant is. A homoglyph with an edit distance of 1 that renders identically in a browser ranks higher than a three-character transposition.
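For reference, Levenshtein distance is short enough to write out directly. The sketch below is a plain dynamic-programming version; `similarity` is an assumed normalization to the range 0 to 1, not a standard metric. Plain Levenshtein counts an adjacent transposition as two edits, whereas the Damerau-Levenshtein variant counts it as one, and neither captures visual similarity, which needs a separate measure.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (char_a != char_b),   # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalize distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for variant in ("examle", "exmaple", "examp1e"):
        print(variant, levenshtein("example", variant), round(similarity("example", variant), 2))
```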

Infrastructure signals add further context. Hosting on a known-bad ASN, a recently issued TLS certificate matching the monitored brand, or web content mimicking a login page all elevate priority.
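One way to combine these factors is a weighted sum. The sketch below is purely illustrative: the `Candidate` fields, the weights, and the rule that unregistered domains score zero are all assumptions chosen for the example, and real weights would need tuning against labeled incidents.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    domain: str
    registered: bool
    resolves: bool
    similarity: float  # 0.0 to 1.0, from any string-similarity measure
    bad_asn: bool = False
    recent_brand_cert: bool = False
    login_page: bool = False


# Hypothetical weights; they sum to 1.0 so scores stay within [0, 1].
WEIGHTS = {
    "similarity": 0.40,
    "resolves": 0.15,
    "bad_asn": 0.15,
    "recent_brand_cert": 0.15,
    "login_page": 0.15,
}


def score(candidate):
    """Weighted sum of signals; unregistered domains are not an active threat."""
    if not candidate.registered:
        return 0.0
    total = WEIGHTS["similarity"] * candidate.similarity
    for signal in ("resolves", "bad_asn", "recent_brand_cert", "login_page"):
        if getattr(candidate, signal):
            total += WEIGHTS[signal]
    return total


def prioritize(candidates):
    """Sort candidates from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    queue = prioritize([
        Candidate("examlpe.com", registered=False, resolves=False, similarity=0.71),
        Candidate("exmaple.com", registered=True, resolves=False, similarity=0.71),
        Candidate("examp1e.com", registered=True, resolves=True, similarity=0.86,
                  bad_asn=True, login_page=True),
    ])
    for candidate in queue:
        print(f"{score(candidate):.2f}  {candidate.domain}")
```

A linear score is easy to explain to analysts, which matters when the output feeds a manual triage queue.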

Automation and tools

Have I Been Squatted's twistrs library automates permutation generation and supports basic resolution checks. Enrichment extends that further, pulling WHOIS data, checking CT logs, and crawling content before scored results feed into triage queues for analyst review. Have I Been Squatted builds on this approach, combining permutation generation with real-time domain monitoring and enrichment to surface lookalike domains that warrant attention.

Limitations

No permutation engine covers every creative variation an attacker might invent. Homoglyph coverage depends on script support, and combinatorial explosion means longer domain names produce impractically large candidate sets. Permutation analysis works best as one input to a broader detection pipeline rather than a standalone solution.
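The growth is easy to quantify for one transformation. The sketch below counts substitution-only variants, assuming a 37-character alphabet (a-z, 0-9, and hyphen) and ignoring placement rules such as no leading or trailing hyphen, so the figures are upper bounds for illustration.

```python
from math import comb

ALPHABET_SIZE = 37  # a-z, 0-9 and hyphen (assumed ASCII label alphabet)


def substitution_variants(length, edits):
    """Count labels reachable by replacing exactly `edits` positions."""
    return comb(length, edits) * (ALPHABET_SIZE - 1) ** edits


if __name__ == "__main__":
    for length in (7, 15, 30):
        print(length, substitution_variants(length, 1), substitution_variants(length, 2))
```

Under these assumptions a 7-character label has 252 single-substitution variants but 27,216 double-substitution variants, and a 30-character label has 563,760 of the latter, before any other transformation type is considered.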
