Domain permutation analysis
Domain permutation analysis systematically generates lookalike domain variants to identify potential typosquatting and brand impersonation. This guide covers permutation categories, scoring approaches, and how to prioritize thousands of candidates.
What it is
Domain permutation analysis is the systematic generation and evaluation of domain name variants that could be confused with a legitimate domain. Given an input like example.com, permutation tools produce hundreds or thousands of alternatives that an attacker might register for phishing, traffic interception, or brand abuse. It is the discovery step behind typosquatting detection.
Permutation categories
Different transformation types model different attacker strategies and user mistakes.
Omission removes one character at a time: examle.com, exmple.com. These model keyboard slips where a user misses a key.
Insertion adds an extra character, often a repeat of a neighboring one: examplle.com, exaample.com. These catch accidental double-taps or autocomplete errors.
Transposition swaps adjacent characters: exmaple.com, examlpe.com. This is one of the most common real-world typos.
Replacement substitutes a character with a nearby key or visually similar alternative: ezample.com (keyboard proximity), examp1e.com (visual similarity).
Homoglyph substitution replaces characters with visually identical ones from other scripts: exаmple.com using Cyrillic а instead of Latin a. In a browser address bar, these are nearly indistinguishable.
TLD swaps change the top-level domain: example.net, example.co, example.org. Users frequently misremember or mistype TLDs.
Hyphenation adds or removes hyphens: ex-ample.com, exam-ple.com. These exploit how quickly users scan URLs.
Additional techniques include subdomain prepending (example.com.attacker.com), vowel swapping, and plural/singular variations.
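The categories above can be sketched as small string transformations. The following is a minimal illustration, not any particular tool's implementation: the function names, the four-entry `HOMOGLYPHS` table, and the `ALTERNATE_TLDS` list are all assumptions made for the example, and it treats everything after the first dot as the TLD.

```python
# Illustrative homoglyph table (an assumption, far from complete):
# Latin letters mapped to look-alike Cyrillic code points.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440"}

# Hypothetical TLD list for the swap category.
ALTERNATE_TLDS = ("net", "org", "co")


def omissions(label):
    """Drop one character at a time."""
    return {label[:i] + label[i + 1:] for i in range(len(label))}


def repetitions(label):
    """Insertion modelled as a doubled character (accidental double-tap)."""
    return {label[:i] + label[i] + label[i:] for i in range(len(label))}


def transpositions(label):
    """Swap each pair of adjacent, non-identical characters."""
    return {
        label[:i] + label[i + 1] + label[i] + label[i + 2:]
        for i in range(len(label) - 1)
        if label[i] != label[i + 1]
    }


def hyphenations(label):
    """Insert a hyphen between two existing characters."""
    return {label[:i] + "-" + label[i:] for i in range(1, len(label))}


def homoglyphs(label):
    """Replace one character with a look-alike from HOMOGLYPHS."""
    return {
        label[:i] + HOMOGLYPHS[ch] + label[i + 1:]
        for i, ch in enumerate(label)
        if ch in HOMOGLYPHS
    }


def generate(domain):
    """Return {category: set of candidate domains} for a name like example.com."""
    label, _, tld = domain.partition(".")
    by_category = {
        "omission": omissions(label),
        "insertion": repetitions(label),
        "transposition": transpositions(label),
        "hyphenation": hyphenations(label),
        "homoglyph": homoglyphs(label),
    }
    result = {
        name: {f"{variant}.{tld}" for variant in variants if variant}
        for name, variants in by_category.items()
    }
    result["tld_swap"] = {f"{label}.{alt}" for alt in ALTERNATE_TLDS if alt != tld}
    return result
```

Calling `generate("example.com")` yields, among others, examle.com under omission and exmaple.com under transposition. Keeping the output grouped by category makes it possible to weight categories differently later on.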
Scoring and prioritization
Raw permutation output is too large to investigate manually. Scoring narrows the field by ranking candidates on multiple factors.
Registration status is the first filter: unregistered permutations are candidates for defensive registration but not immediate threats. Among registered domains, active DNS resolution separates live infrastructure from parked pages.
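As a rough sketch of the resolution check, the snippet below splits candidates into names that resolve and names that do not, using the standard library's `socket.getaddrinfo`. The helper names `resolves` and `split_by_resolution` are hypothetical, and the lookup function is injectable so the logic can be exercised without network access. A failed lookup does not prove a domain is unregistered; confirming registration status would need a WHOIS or RDAP query, which is not shown here.

```python
import socket


def resolves(domain, lookup=socket.getaddrinfo):
    """Return True if the name resolves to at least one address."""
    try:
        return len(lookup(domain, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the resolver found no address for the name.
        # UnicodeError: the name could not be encoded as a valid hostname.
        return False


def split_by_resolution(candidates, lookup=socket.getaddrinfo):
    """Partition candidates into (resolving, non-resolving) lists."""
    live, silent = [], []
    for domain in candidates:
        (live if resolves(domain, lookup) else silent).append(domain)
    return live, silent
```

At scale, a sequential loop like this would be slow; a real pipeline would more likely resolve names concurrently and rate-limit its queries.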
String similarity metrics (Levenshtein distance, Jaro-Winkler, visual similarity) quantify how confusable each variant is. A homoglyph variant at edit distance 1 that renders identically in a browser ranks higher than a variant that differs from the original in three characters.
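To make the edit-distance idea concrete, here is a minimal pure-Python Levenshtein distance with a normalized similarity and a ranking helper. The `similarity` normalization (one minus distance over the longer length) and the `rank` function are assumptions chosen for illustration. Plain Levenshtein counts a transposition as two edits and treats a homoglyph like any other substitution, so it says nothing about visual similarity on its own.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute (or match)
                )
            )
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalize distance to 0.0 (nothing in common) .. 1.0 (identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def rank(original, candidates):
    """Order candidates from most to least similar to the original."""
    return sorted(candidates, key=lambda c: similarity(original, c), reverse=True)
```

For example, `levenshtein("example.com", "examle.com")` is 1, while the transposition exmaple.com is at distance 2, so `rank` places the omission first.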
Infrastructure signals add further context. Hosting on a known-bad ASN, a recently issued TLS certificate matching the monitored brand, or web content mimicking a login page all elevate priority.
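One way to combine these factors is a simple additive score. The sketch below is illustrative only: the `Candidate` fields, the `priority` function, and every weight in it are hypothetical values picked for the example, not numbers taken from any real scoring model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    domain: str
    registered: bool
    resolves: bool
    similarity: float  # 0.0 (unrelated) to 1.0 (identical)
    bad_asn: bool = False
    recent_brand_cert: bool = False
    login_page: bool = False


def priority(c):
    """Additive score; all weights are hypothetical and would need tuning."""
    if not c.registered:
        return 0.0  # a defensive-registration candidate, not an active threat
    score = 40.0 * c.similarity
    score += 15.0 if c.resolves else 0.0
    score += 10.0 if c.bad_asn else 0.0
    score += 15.0 if c.recent_brand_cert else 0.0
    score += 20.0 if c.login_page else 0.0
    return score


def triage(candidates):
    """Highest-priority candidates first."""
    return sorted(candidates, key=priority, reverse=True)
```

With these weights, a resolving domain that serves a login page outranks a parked domain with a slightly higher string similarity, which matches the ordering described above.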
Automation and tools
Have I Been Squatted's twistrs library automates permutation generation and supports basic resolution checks. Enrichment extends this by pulling WHOIS data, checking Certificate Transparency (CT) logs, and crawling content, after which scored results feed into triage queues for analyst review. Have I Been Squatted builds on this approach, combining permutation generation with real-time domain monitoring and enrichment to surface lookalike domains that warrant attention.
Limitations
No permutation engine covers every creative variation an attacker might invent. Homoglyph coverage depends on script support, and combinatorial explosion means longer domain names produce impractically large candidate sets. Permutation analysis works best as one input to a broader detection pipeline rather than a standalone solution.
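A back-of-the-envelope count shows how candidate sets grow. The sketch below computes an upper bound on single-edit variants of a label, assuming a 37-character alphabet (a-z, 0-9, hyphen) and ignoring duplicates, homoglyphs, and TLD swaps; the function names are hypothetical. The two-edit figure is a loose bound obtained by applying a second round of edits to every first-round variant.

```python
ALPHABET_SIZE = 37  # assumption: a-z, 0-9, and hyphen


def single_edit_upper_bound(n, alphabet=ALPHABET_SIZE):
    """Upper bound on one-edit variants of an n-character label."""
    omissions = n
    transpositions = max(n - 1, 0)
    replacements = n * (alphabet - 1)
    insertions = (n + 1) * alphabet
    return omissions + transpositions + replacements + insertions


def two_edit_upper_bound(n, alphabet=ALPHABET_SIZE):
    """Loose bound: each first-round variant is at most n + 1 characters long."""
    return single_edit_upper_bound(n, alphabet) * single_edit_upper_bound(n + 1, alphabet)


if __name__ == "__main__":
    for length in (7, 12, 20):
        print(length, single_edit_upper_bound(length), two_edit_upper_bound(length))
```

Single edits grow linearly with label length, but stacking a second edit multiplies the counts, which is where the candidate set becomes impractical for longer names.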