12/08/2018, 13:42
Typosquatting abuse in popular websites
-
Typosquatting is the act of purposefully registering a domain name that is a mistype of a popular domain name.
-
It is a concept that has been known and studied for over 15 years, yet it is still widely practiced to this day.
-
In typosquatting, an attacker abuses the fact that real human users may mistype a URL while typing it in their browser’s address bar or email client.
-
As such, a typosquatter can register vacebook.com and capture the traffic of users who mistype facebook.com and would otherwise receive an error in their browsers.
-
As a matter of fact, in May 2013, Facebook was awarded 2.8 million dollars in damages caused by typosquatting, as well as over 100 typosquatting domains that had been registered and monetized by typosquatters.
-
Reports on the first content-based longitudinal study of typosquatting abuse, consisting of over 900 GB of data gathered over a period of seven months.
-
Verifies whether previously discovered typosquatting trends still hold today.
-
Provides new results and insights into the typosquatting landscape, based on both the static and longitudinal aspects of the data.
-
Shows that the adoption of strict policies and easy dispute-resolution procedures by registries can decrease typosquatting abuse.
-
Data Gathering
-
Set up two automated crawlers, which were supplied with the Alexa top 500 domains of April 1, 2013 as input.
-
The first crawler generates the typosquatting domains for each authoritative domain in the input, according to five typo models: missing-dot, character-omission, character-permutation, character-substitution and character-duplication.
-
For each authoritative and generated domain, the crawler first determines whether the domain resolves to an IP address. If so, the crawler visits the web page hosted on the domain using PhantomJS, a headless JavaScript-enabled web browser.
-
After loading the web page, the crawler waits for 10 seconds, allowing the page to load dynamic content or perform a redirect.
-
Finally, the crawler saves the IP address, final URL, HTML body and a screenshot of the page to disk.
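The per-domain crawl steps above (resolve, render, wait, save) can be sketched roughly as follows. This is a minimal illustration, not the study's crawler: `render_page` is a hypothetical callable standing in for the PhantomJS step, and the file names and the `record.json` layout are assumptions made for the example.

```python
import json
import socket
from pathlib import Path

WAIT_SECONDS = 10  # the notes state a 10-second wait after page load


def resolve(domain):
    """Return an IPv4 address for the domain, or None if it does not resolve."""
    try:
        return socket.gethostbyname(domain)
    except (OSError, UnicodeError):
        return None


def crawl_domain(domain, out_dir, render_page, resolver=resolve):
    """Resolve, render and store one domain; returns the saved record or None.

    render_page(url, wait_seconds) is a hypothetical stand-in for the headless
    browser. It is assumed to return a dict with the keys "final_url" (str),
    "html" (str) and "screenshot" (bytes).
    """
    ip = resolver(domain)
    if ip is None:
        return None  # domain does not resolve, so there is no page to visit
    page = render_page("http://" + domain, WAIT_SECONDS)
    record = {"domain": domain, "ip": ip, "final_url": page["final_url"]}
    target = Path(out_dir) / domain
    target.mkdir(parents=True, exist_ok=True)
    (target / "record.json").write_text(json.dumps(record), encoding="utf-8")
    (target / "body.html").write_text(page["html"], encoding="utf-8")
    (target / "screenshot.png").write_bytes(page["screenshot"])
    return record
```

Passing the resolver and renderer in as arguments keeps the storage logic testable without network access; a daily run would simply call `crawl_domain` for every authoritative and generated domain.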
-
The crawler was configured to process the entire list of domains daily for a period of 7 months, starting on April 1, 2013 and running until October 31, 2013.
-
In total, 28,179 potential typosquatting domains were generated, out of which 17,172 resolved to an IP address at least once during our study.
-
The second crawler was configured to perform a WHOIS lookup for every domain ever successfully resolved by the HTTP crawler.
-
The WHOIS responses (if any) were parsed using Ruby Whois and then saved to disk.
-
Missing-dot typos: The dot following “www” is forgotten, e.g., wwwexample.com
-
Character-omission typos: One character is omitted, e.g., www.exmple.com
-
Character-permutation typos: Consecutive characters are swapped, e.g., www.examlpe.com
-
Character-substitution typos: Characters are replaced by their adjacent ones, given a specific keyboard layout, e.g., www.ezample.com where “x” was replaced by the QWERTY-adjacent “z”
-
Character-duplication typos: Characters are mistakenly typed twice, e.g., www.exaample.com
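The five typo models listed above can be illustrated with a short generator. This is a sketch under stated assumptions, not the study's implementation: typos are applied only to the label before the first dot, and the QWERTY adjacency map is a simplified approximation of a staggered keyboard (letters and digits only).

```python
# Simplified QWERTY layout (assumption): four rows, letters and digits only.
QWERTY_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]


def adjacent_keys(ch):
    """Return the keys assumed to neighbour ch on a staggered QWERTY layout."""
    for r, row in enumerate(QWERTY_ROWS):
        c = row.find(ch)
        if c == -1:
            continue
        candidates = [(r, c - 1), (r, c + 1),        # same row
                      (r - 1, c), (r - 1, c + 1),    # row above
                      (r + 1, c - 1), (r + 1, c)]    # row below
        return {QWERTY_ROWS[i][j] for i, j in candidates
                if 0 <= i < len(QWERTY_ROWS) and 0 <= j < len(QWERTY_ROWS[i])}
    return set()  # characters outside the map (e.g. "-") get no substitutions


def generate_typos(domain):
    """Map each typo model to the set of candidate typosquatting domains."""
    name, dot, suffix = domain.lower().partition(".")
    n = len(name)
    models = {
        "missing-dot": {"www" + name},
        "character-omission": {name[:i] + name[i + 1:] for i in range(n)},
        "character-permutation": {name[:i] + name[i + 1] + name[i] + name[i + 2:]
                                  for i in range(n - 1)},
        "character-substitution": {name[:i] + k + name[i + 1:]
                                   for i in range(n) for k in adjacent_keys(name[i])},
        "character-duplication": {name[:i + 1] + name[i:] for i in range(n)},
    }
    # Drop empty labels and variants identical to the original name.
    return {model: {v + dot + suffix for v in variants if v and v != name}
            for model, variants in models.items()}
```

For example, `generate_typos("example.com")` yields `wwwexample.com`, `exmple.com`, `examlpe.com`, `ezample.com` and `exaample.com` among its candidates, matching the examples in the list above.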
-
Authoritative: Pages redirecting to or displaying the authoritative domain without any abuse
-
Coinciding: Pages containing legitimate content that happens to reside on a typosquatting variant of an authoritative domain
-
Protected: Pages notifying the user that she made a typo and/or linking to the authoritative domain
-
Ad parking: Pages that have no content other than showing advertisements
-
Adult content: Pages showing adult/pornographic content
-
Affiliate abuse: Pages taking advantage of an affiliate program offered by another domain (see Section II-D1)
-
For sale: Pages that have no content other than being advertised as for sale
-
Hit stealing: Pages redirecting to a legitimate domain without abusing an affiliate program
-
Scam: Pages persuading the user to enter personal information or to download malware (see Section II-D2)
-
No content: Pages that have no content (e.g., blank pages or pages under construction)
-
Server error: Pages displaying an error caused by a server-side problem
-
Crawl error: Pages for which the crawler failed or that explicitly block the crawler’s IP address
-
Other: Unclassified pages and pages that do not fall into any of the above categories
- Affiliate abuse
- Scam
Affiliate abuse
-
We consider a page to be performing affiliate abuse when it redirects its visitors to a legitimate website, taking advantage of an affiliate program offered by that legitimate site.
-
Affiliate programs are arrangements in which a website owner (the advertiser) pays a commission to a third party (the affiliate) for sending traffic to her website.
-
For instance, amazon.com pays a commission for every purchase made by visitors coming from websites participating in its affiliate program. To identify which traffic comes from which affiliate, each affiliate is assigned a unique identifier that she should specify in the URLs toward which she forwards her visitors.
-
Users who mistype match.com as ma5ch.com (“t” substituted by the QWERTY-adjacent “5”) are eventually brought back to the match.com domain, but the typosquatting page appends an affiliate identifier to the URL when it redirects the user’s browser from the typosquatting domain to the authoritative one.
-
As such, the owners of the authoritative domain will now have to pay an affiliate commission to the typosquatter, for a visit that should have been theirs in the first place.
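The redirect pattern described above can be checked from the final URL the crawler records. The sketch below is illustrative only: the parameter names in `AFFILIATE_PARAMS` are hypothetical (every affiliate program defines its own identifier format), and the returned labels are a coarse simplification of the page categories in these notes.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical query-parameter names; real affiliate programs differ.
AFFILIATE_PARAMS = {"aff", "affid", "affiliate_id", "tag", "ref"}


def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_redirect(typo_domain, authoritative_domain, final_url):
    """Label where a visit to typo_domain ended up, based on the final URL."""
    parts = urlsplit(final_url)
    host = (parts.hostname or "").lower()
    if host_matches(host, typo_domain):
        return "no redirect"
    params = parse_qs(parts.query)
    if any(name.lower() in AFFILIATE_PARAMS for name in params):
        return "affiliate abuse"  # redirected with an affiliate identifier attached
    if host_matches(host, authoritative_domain):
        return "authoritative"    # brought back to the real site without an identifier
    return "other redirect"       # needs content-based classification
```

With the match.com example from the text, `classify_redirect("ma5ch.com", "match.com", "http://www.match.com/?affid=12345")` returns `"affiliate abuse"`, where `affid=12345` is a made-up identifier. A URL-only check like this cannot tell hit stealing from scam or ad parking, which is why the last label is left generic.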
Scam
-
A scam page is a page that tries to trick users into performing an action that is undesirable for the user and profitable for the attacker.
-
Two popular types of scams are “surveys” and malicious advertisements (malvertising). In surveys, users are asked to perform a series of steps in return for some reward, for example a $100 coupon for a big box store.
-
In malvertising, the scam page is trying to convince the user to willingly download and execute a malicious program. Fig. 1 shows the ad we got when purposefully mistyping youtube.com as outube.com. If the user downloads and installs the purported software update, she will be infected with malware (11/51 virus engines at virustotal.com identified the downloaded executable as malicious).
Malicious vs. Defensive Registrations
- Our data indicate that typosquatting is still very prevalent for the list of authoritative domains we considered. Out of these 500 domains, 477 have at least one malicious typosquatting domain. We considered a domain to be malicious when it was classified as such for at least 7 days during the data gathering period. These numbers indicate that, on the attack side, typosquatters have no trouble registering and exploiting typosquatting domains, despite long-standing anticybersquatting legislation [1].
- On the defense side, trademark owners can protect themselves against typosquatting by proactively making defensive typosquatting domain registrations whenever they register an authoritative domain. Many registrars provide a service to automatically register a wide range of possible cybersquatting domain names when a trademark owner wants to register a domain. Nevertheless, our data shows that only 156 of the authoritative domains in our list have defensive domain registrations, meaning that 344 domains (representing 68.8% of the 500 most popular sites of the Internet) have no defensive registrations whatsoever. Thus, anyone who makes a typo for these domains and does not receive an error is sure to land on a malicious typosquatting page.
- The top 3 of authoritative domains with the most defensive registrations consists of huffingtonpost.com with 57 defensive domains, americanexpress.com with 42 domains and bloomberg.com with 39 domains.
- The top 3 of authoritative domains with the most malicious typosquatting domains are adultfriendfinder.com with 132 typosquatting domains, constantcontact.com with 103 typosquatting domains and odnoklassniki.ru with 97 such domains. Alarmingly, out of the three banks in our top 500 list (bankofamerica.com, hdfcbank.com and icicibank.com), only bankofamerica.com has defensive registrations.
- This means that if a user enters a typo for the domain of one of the two other banks, she could easily land on a phishing page, thinking she entered the proper domain name of her bank. Although we did not encounter any phishing pages for these banks during our study, our data shows hdfcbank.com had 42 active malicious typosquatting domains, icicibank.com had 43, and bankofamerica.com had 46. Any of these domains could start hosting phishing pages at any time or redirect users to the websites of competing financial institutions.
- It is surprising to see that, at a time when companies are estimated to spend 7% of their information technology budgets on security, and global cyber crime costs are estimated at between $300 billion and $1 trillion [15], many companies do not bother to make any defensive registrations at all for their domains. In particular, one would expect the financial sector to take a leading role in protecting its reputation and its customers. It seems these companies are either not aware of the problem, or simply do not care about it. Large Internet companies such as Microsoft [21] and Facebook [22] are successfully contending with cybersquatters through defensive typosquatting registration.
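The 7-day rule and the 68.8% figure quoted in this section can be reproduced with a few lines of code. This is a sketch under assumptions: the set of categories counted as malicious is a guess for illustration (the notes do not enumerate it), and the 7 days are assumed not to need to be consecutive.

```python
from collections import defaultdict

# Assumption: which page categories count as malicious is not spelled out in these notes.
MALICIOUS_CATEGORIES = {"ad parking", "adult content", "affiliate abuse",
                        "for sale", "hit stealing", "scam"}


def malicious_domains(daily_labels, min_days=7):
    """Return domains labelled malicious on at least min_days distinct days.

    daily_labels is an iterable of (domain, day, category) tuples, one per
    daily crawl result.
    """
    days_seen = defaultdict(set)
    for domain, day, category in daily_labels:
        if category in MALICIOUS_CATEGORIES:
            days_seen[domain].add(day)
    return {domain for domain, days in days_seen.items() if len(days) >= min_days}


def share_without_defensive(total_domains, with_defensive):
    """Percentage of authoritative domains with no defensive registrations."""
    return 100.0 * (total_domains - with_defensive) / total_domains
```

Plugging in the numbers from the text, `share_without_defensive(500, 156)` gives the share of the 344 unprotected domains, which rounds to 68.8%.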