What is combosquatting?
Combosquatting appends or prepends keywords to a legitimate brand name to create deceptive domains like paypal-login.com or secure-amazon.com. This guide covers the research that quantified combosquatting at internet scale, explains why these domains persist far longer than typosquats, and outlines detection strategies.
What it is
Combosquatting is a domain abuse technique where an attacker registers a domain that pairs a correctly spelled brand name with one or more additional keywords. paypal-login.com, amazon-security-alert.com, and microsoft-support.net are all combosquats. The brand name is intact; the deception comes from appending or prepending a word that implies official functionality, producing a domain that looks like a legitimate service endpoint.
The technique differs fundamentally from typosquatting. A typosquat exploits keyboard errors (transposition, omission, addition). Combosquatting exploits meaning. The added keyword makes the domain feel intentional rather than accidental. Victims do not need to mistype anything. They encounter these domains through phishing emails, malicious advertisements, or poisoned search results and click because the URL appears plausible. When the appended keyword is a trademarked brand name, combosquatting overlaps directly with brand impersonation and keyword squatting.
Scale of the problem
The first large-scale empirical study of combosquatting analyzed more than 468 billion DNS records collected from passive and active sources over nearly six years. Starting from the 500 most popular trademarked domain names in the United States, the researchers searched for domains that combined those trademarks with additional words.
The findings quantified the scale:
- Prevalence. Combosquatting domains were roughly 100 times more prevalent than typosquatting domains in the dataset, a conclusion corroborated by independent analysis of global DNS traffic.
- Persistence. Nearly 60% of abusive combosquatting domains remained active for more than 1,000 days (roughly 2.7 years) without takedown. Typosquatting domains tend to be flagged and removed faster because their misspellings trigger existing edit-distance heuristics.
- Growth. Year-over-year combosquatting activity increased throughout the study period, suggesting that attackers view the technique as effective and underpoliced.
- Attack diversity. The same registration pattern supports phishing, social engineering, affiliate abuse, trademark infringement, business email compromise, and advanced persistent threats.
The 1,000-day persistence figure is striking. It means a single combosquat registration can serve as infrastructure for multiple campaigns over several years. Combined with the sheer volume of possible brand-keyword combinations, this creates an asymmetry. Defenders face an effectively unbounded search space, while attackers can register domains cheaply and use them for extended periods before enforcement action takes effect.
Why combosquatting works
Combosquatting bypasses the mental model that equates "suspicious domain" with "misspelled domain":
- No typo required. The brand name is spelled correctly, so autocorrect and Levenshtein distance metrics provide no protection. A combosquat sits entirely outside the edit-distance radius of the legitimate domain.
- Keyword plausibility. Words like login, support, verify, and secure feel natural next to a brand name. Real services use similar structures for subdomains (support.example.com), training users to accept the pattern.
- Hyphen confusion. Hyphens mimic the visual structure of subdomain separators. google-accounts.com reads much like accounts.google.com at a glance, especially in the truncated URL bars of mobile browsers.
- Emotional targeting. Keywords are chosen to create urgency or imply authority. A user who receives a link to bankname-security-alert.com is primed to act quickly rather than verify the domain.
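The "no typo required" point can be made concrete with a small edit-distance calculation. The sketch below is illustrative only: the `within_typo_radius` helper and its threshold of 2 are assumptions chosen for the example, not a description of any particular detection product.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def within_typo_radius(label: str, brand: str, max_distance: int = 2) -> bool:
    """Hypothetical typosquat check: close to the brand, but not the brand itself."""
    return 0 < levenshtein(label, brand) <= max_distance


for label in ["paypa1", "payapl", "paypal-login", "secure-paypal"]:
    print(label, levenshtein(label, "paypal"), within_typo_radius(label, "paypal"))
```

Under these assumptions, paypa1 and payapl fall inside the radius (distances 1 and 2), while paypal-login and secure-paypal sit at distances 6 and 7 and pass unflagged, even though the brand name is fully present in both.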
Common keyword categories
Akamai published an analysis of the most frequently observed keywords in confirmed phishing domains. The top ten, ranked by prevalence, were: support, com, login, help, secure, www, account, app, verify, and service.
These keywords cluster into functional categories:
- Authentication. login, signin, verify, auth, sso, account
- Trust and safety. secure, safety, alert, protect, confirm
- Support. help, support, service, center
- Infrastructure. mail, cloud, app, api, cdn, dev
- Action. update, renew, activate, download
The appearance of com and www in the top ten highlights a notable tactic: embedding TLD-like strings inside the domain label (accountpaypal-com.info, com-apple.co) so that the URL resembles a legitimate address when read quickly. Country-code strings like jp, us, and uk also appear frequently, suggesting geographic targeting of specific markets.
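As a rough illustration, the categories above can be turned into a lookup that labels the non-brand tokens in a domain label. This is a minimal sketch: the `categorize_label` function, the `tld_like` and `country_code` category names, and the tokenization rule are assumptions made for the example. It only matches whole tokens, so keywords concatenated without a separator (for example loginsecure) would be missed.

```python
import re

# Keyword sets taken from the categories listed in the text; the last two
# category names are hypothetical labels for the TLD-like and country-code strings.
KEYWORD_CATEGORIES = {
    "authentication": {"login", "signin", "verify", "auth", "sso", "account"},
    "trust_and_safety": {"secure", "safety", "alert", "protect", "confirm"},
    "support": {"help", "support", "service", "center"},
    "infrastructure": {"mail", "cloud", "app", "api", "cdn", "dev"},
    "action": {"update", "renew", "activate", "download"},
    "tld_like": {"com", "www"},
    "country_code": {"jp", "us", "uk"},
}


def categorize_label(label: str, brand: str) -> dict:
    """Map each recognized non-brand token in a label to its keyword category."""
    # Cut the brand out so that glued forms like "accountpaypal" still split.
    remainder = label.lower().replace(brand.lower(), "-")
    tokens = [t for t in re.split(r"[^a-z0-9]+", remainder) if t]
    found = {}
    for token in tokens:
        for category, words in KEYWORD_CATEGORIES.items():
            if token in words:
                found.setdefault(category, []).append(token)
    return found


print(categorize_label("accountpaypal-com", "paypal"))
print(categorize_label("com-apple", "apple"))
```

For accountpaypal-com this reports one authentication keyword (account) and one TLD-like token (com), which is the kind of signal a triage step could weight more heavily than a bare brand match.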
Combosquatting vs. typosquatting
| | Typosquatting | Combosquatting |
|---|---|---|
| Brand name | Misspelled | Correctly spelled |
| User error | Requires typing mistake | No mistake needed |
| Delivery | User navigates directly | Often delivered via phishing |
| Domain length | Similar to target | Longer than target |
| Variant count | Bounded and deterministic | Effectively unbounded |
| Persistence | Shorter (detected by edit-distance heuristics) | Longer (nearly 60% survive >1,000 days) |
| Detection | String distance metrics | Substring and keyword matching |
The two techniques are not mutually exclusive. A single domain can combine a misspelled brand with a keyword (amaz0n-security.com), blending combosquatting with homoglyph substitution. Multiple squatting types also stack with TLD squatting (brand-security.co) or IDN homograph attacks in more sophisticated campaigns.
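One way to picture the overlap is a classifier that checks for the brand twice: once in the raw label and once after undoing common character substitutions. The sketch below is an assumption-laden illustration. The `LOOKALIKES` table is a tiny hypothetical mapping (real homoglyph tables are far larger and ambiguous, since 1 can stand for l or i), and the category names are invented for the example.

```python
# Hypothetical, deliberately tiny lookalike table: digit -> letter it imitates.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})


def classify(label: str, brand: str) -> str:
    """Classify a domain label relative to a brand (illustrative categories only)."""
    label = label.lower()
    brand = brand.lower()
    normalized = label.translate(LOOKALIKES)
    if label == brand:
        return "exact"
    if brand in label:
        return "combosquat"   # brand intact, extra characters around it
    if normalized == brand:
        return "lookalike"    # substituted characters only, no keyword
    if brand in normalized:
        return "hybrid"       # substituted brand plus extra keyword
    return "unrelated"


for label in ["amazon-security", "amaz0n", "amaz0n-security", "example"]:
    print(label, classify(label, "amazon"))
```

With this table, amaz0n-security is reported as a hybrid: a plain substring match misses it because the brand is altered, and a pure edit-distance check misses it because of the appended keyword.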
Detection challenges
Combosquatting is harder to enumerate than character-level permutation techniques like bitsquatting or vowel swap. Those methods produce a bounded, deterministic set of variants from a known domain. Combosquatting draws from an open vocabulary where any keyword may appear, in any position, with or without hyphens, across hundreds of TLDs. Exhaustive precomputation is not feasible, which makes defensive registration impractical as a primary defense.
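A back-of-the-envelope calculation shows how quickly the space grows even under restrictive assumptions. The sketch below considers only a single keyword, placed before or after the brand, with or without a hyphen. The keyword and TLD counts passed to `search_space` are hypothetical round numbers, not measurements.

```python
from itertools import product


def single_keyword_labels(brand, keywords, separators=("", "-")):
    """Yield every label that places one keyword before or after the brand."""
    for keyword, sep in product(keywords, separators):
        yield f"{brand}{sep}{keyword}"   # keyword appended
        yield f"{keyword}{sep}{brand}"   # keyword prepended


def search_space(n_keywords, n_tlds, n_separators=2):
    """Count single-keyword candidates: keywords x separators x 2 positions x TLDs."""
    return n_keywords * n_separators * 2 * n_tlds


labels = list(single_keyword_labels("paypal", ["login", "secure", "support"]))
print(len(labels), labels[:4])

# Hypothetical sizes: a 1,000-word keyword list across 500 TLDs.
print(search_space(n_keywords=1_000, n_tlds=500))
```

Three keywords already yield 12 labels for one brand, and the hypothetical 1,000-keyword list across 500 TLDs gives 2,000,000 candidates per brand before allowing two or more keywords per label, which multiplies the total again.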
Effective domain monitoring therefore relies on continuous observation rather than precomputed variant lists:
- Substring scanning. Monitoring newly registered domain feeds and Certificate Transparency logs for labels containing the brand name as a substring, optionally weighted by known high-risk keywords.
- Registration velocity. Flagging clusters of brand-keyword combinations registered in rapid succession, a pattern typical of campaign staging.
- WHOIS and RDAP metadata. Creation date, registrar, and privacy-shielding patterns help distinguish bulk abuse registrations from legitimate domains.
- Passive DNS and content analysis. Resolution activity and hosted-page inspection (brand logos, login forms, cloned content) reveal whether a combosquat is actively serving malicious content or sitting parked.
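The first two signals, substring scanning and registration velocity, can be sketched over a generic feed of newly observed domains. Everything named here is an assumption for illustration: the `Registration` record, the `HIGH_RISK` weights, the allowlist mechanism, and the 30-minute gap are hypothetical choices, and a real pipeline would read from a registration or Certificate Transparency feed rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical keyword weights; higher means more suspicious next to a brand.
HIGH_RISK = {"login": 3, "verify": 3, "secure": 2, "support": 2, "account": 2}


@dataclass
class Registration:
    domain: str
    created: datetime


def score(domain: str, brand: str, allowlist=frozenset()) -> int:
    """Return 0 if the brand is absent or the domain is allowlisted,
    otherwise 1 plus the weight of every high-risk keyword present."""
    host = domain.lower().rstrip(".")
    if host in allowlist or brand not in host:
        return 0
    return 1 + sum(w for kw, w in HIGH_RISK.items() if kw in host)


def velocity_clusters(regs, brand, max_gap=timedelta(minutes=30), min_size=3):
    """Group brand-matching registrations whose neighbours are at most
    max_gap apart, and keep groups with at least min_size members."""
    hits = sorted((r for r in regs if score(r.domain, brand) > 0),
                  key=lambda r: r.created)
    clusters, current = [], []
    for reg in hits:
        if current and reg.created - current[-1].created > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(reg)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters
```

A usage example under the same assumptions: feeding in paypal-login.com, paypal-verify.net, and secure-paypal.org registered within ten minutes of each other, plus mypaypalstore.net a day later, produces a single cluster of three, while the lone later registration scores 1 and is left for ordinary triage. Plain substring matching is intentionally naive here and will also match unrelated words that happen to contain the brand string.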
False positives remain a persistent challenge. Legitimate services, partners, resellers, and fan sites may register domains containing a brand name. Triage requires domain threat intelligence context beyond the domain string itself, following the same layered approach used for all lookalike domain categories.
Monitoring with Have I Been Squatted
Have I Been Squatted scans Certificate Transparency logs and registration feeds for brand-keyword combinations alongside character-level permutations. Detected domains are enriched with DNS, HTTP, RDAP, and screenshot data to distinguish active threats from benign registrations. This layered approach addresses the core combosquatting challenge: because the keyword space is unbounded, detection depends on continuous monitoring and contextual triage rather than static variant lists.