Domain keywords used to spot phishing sites
Words like "update," "security," "login," "billing," when combined with a legitimate base domain name -- or its misspelled variation -- are common indicators of phishing sites, said Andrew Hay, director of security research at San Francisco-based OpenDNS.
OpenDNS has assembled a list of these keywords, as well as a list of domains commonly targeted by spammers.
The approach borrows algorithms most commonly used in fields such as bioinformatics and data mining, and applies natural language processing techniques.
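The article does not say which algorithms NLPRank actually uses. As a rough illustration only, the sketch below uses edit distance, a string-comparison method familiar from sequence matching in bioinformatics, to spot a brand name or a near-misspelling of one sitting next to a phishing keyword. The keyword and brand lists, the function names, and the one-character threshold are all assumptions made for this example, not OpenDNS data.

```python
import re

# Hypothetical sample lists; the real OpenDNS lists are not given in the article.
KEYWORDS = {"update", "security", "login", "billing"}
BRANDS = {"paypal", "facebook"}


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def match_brand(token: str, max_distance: int = 1):
    """Return a brand the token contains or nearly equals, else None."""
    for brand in BRANDS:
        if brand in token or edit_distance(token, brand) <= max_distance:
            return brand
    return None


def looks_suspicious(domain: str) -> bool:
    """True if the domain mixes a brand (or near-misspelling) with a keyword."""
    # Simplification: treat the last label as the TLD and drop it,
    # then split the remaining labels on hyphens and underscores.
    labels = domain.lower().split(".")[:-1]
    tokens = [t for label in labels for t in re.split(r"[-_]", label) if t]
    has_brand = any(match_brand(t) is not None for t in tokens)
    has_keyword = any(k in t for t in tokens for k in KEYWORDS)
    return has_brand and has_keyword


print(looks_suspicious("securitycheck-paypal.com"))
print(looks_suspicious("paypal.com"))
```

A check this simple flags securitycheck-paypal dot com but not a name with no keyword in it, and it would also flag a legitimate subsite registered by the brand owner, so on its own it is only a first filter.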
A potentially fraudulent domain name is compared to these lists, and the probability of it being malicious is calculated based on the result and other factors.
Those other factors include the use of uncommon domain name registrars, unusual hosting locations, and Whois information that doesn't match that of the parent domain.
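One way to picture how several such signals could be folded into a single probability is a weighted score passed through a logistic function. This is a minimal sketch under that assumption; the article does not describe the actual NLPRank model, and the signal names, weights, and bias below are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class DomainSignals:
    keyword_brand_match: bool  # brand (or misspelling) plus a phishing keyword
    uncommon_registrar: bool   # registrar the brand owner does not normally use
    unusual_hosting: bool      # hosted outside the brand's usual locations
    whois_mismatch: bool       # Whois data differs from the parent domain


# Illustrative weights only; not taken from OpenDNS.
WEIGHTS = {
    "keyword_brand_match": 2.0,
    "uncommon_registrar": 1.5,
    "unusual_hosting": 1.5,
    "whois_mismatch": 2.0,
}
BIAS = -4.0  # keeps the score low when no signal fires


def phishing_probability(signals: DomainSignals) -> float:
    """Sum the weights of the signals that fired and squash to the 0..1 range."""
    score = BIAS + sum(w for name, w in WEIGHTS.items() if getattr(signals, name))
    return 1.0 / (1.0 + math.exp(-score))


keyword_only = DomainSignals(True, False, False, False)
all_signals = DomainSignals(True, True, True, True)
print(round(phishing_probability(keyword_only), 3))
print(round(phishing_probability(all_signals), 3))
```

With these made-up weights, a keyword match by itself stays well below even odds, while a keyword match backed by registrar, hosting, and Whois anomalies scores high. That mirrors the point that no single indicator is decisive.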
Keywords alone are not enough. The owners of the original, legitimate site might themselves register related domain names, either for special subsites or to help protect their users against typos and misspellings. Or they might grab back the phishing domains from criminals and redirect them to a legitimate site.
"But if, say, Facebook launched a new page, it would stand to reason that the site would be hosted on an IP address or data center controlled by Facebook," said Hay. And it would be registered with the company's usual registrar, not a small outfit in Kuala Lumpur.
So a combination approach of looking at keywords and registrar information can help spot new phishing sites while minimizing false positives, he said.
OpenDNS provides DNS-based security services in the cloud to more than 10,000 enterprises, schools, and religious institutions trying to protect employees or children from malicious sites, processing more than 70 billion Internet requests per day.
OpenDNS has been trying out the new technique, which it calls NLPRank, for the past two months.
In that time, Hay said, the company has found thousands of new phishing domains and manually added them to its block lists.
For example, last month OpenDNS released a report about a new set of PayPal phishing sites with domains like securitycheck-paypal dot com and x-paypal dot com. Both sites looked extremely realistic because they copied the text, colors, images and even the HTML itself from the real site.
"What we're doing right now is still proving the model," Hay said. For now, newly detected phishing sites are manually reviewed before being added to the system.
The algorithms are still being tweaked, and new indicators are being added, he said, to reduce the number of false positives.
There is no date set yet for when NLPRank will become an automated feature of the OpenDNS platform.
"We just want to make 100 percent sure that we're happy with the false positive rate, that we're not going to introduce any problems for customers," said Hay.