Let's approach website spamming objectively and discuss how to defend against bot attacks and identify weak spots. The most common types of spam include registration spam, comment spam, post spam, and contact form spam.
Nowadays, custom-made websites are rare. Most sites are built on well-known content management systems (CMS) such as WordPress or Joomla, or on discussion forum software such as phpBB or SMF. These systems are open source, which allows spam bot creators to design bots specifically for each platform.
1) Rewrite Registration Functions
Most CMSs are modular, which makes it easy to rename the registration functions so that spam bots written for the default installation can no longer find them. Almost every CMS has a support forum where non-programmers can often find exact instructions on how to change the registration functions.
2) Control Question
As bots get more sophisticated, the control questions must also be more sophisticated.
NO
- 3x5 = 15
- Copy [word] = word
YES
- First, third, and fifth letters of the word [hello] = hlo
- Capital of the Czech Republic = Prague
Spam bot creators are often from the former Soviet Union, China, or English-speaking countries, so using diacritics in your answers can help keep bots at bay.
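The letter-picking question and the answer check can be sketched in a few lines of Python. This is only an illustration: the function names letters_at, normalize, and check_answer are hypothetical, and a real CMS would wire the check into its own registration form. Diacritics are deliberately preserved during comparison, so an answer typed without them does not match.

```python
import unicodedata


def letters_at(word, positions):
    """Return the letters of `word` at the given 1-based positions, joined."""
    return "".join(word[p - 1] for p in positions)


def normalize(text):
    """Trim whitespace and ignore case, but keep diacritics intact."""
    return unicodedata.normalize("NFC", text).strip().casefold()


def check_answer(expected, given):
    """Compare the visitor's answer with the expected one."""
    return normalize(expected) == normalize(given)


if __name__ == "__main__":
    expected = letters_at("hello", [1, 3, 5])
    print(check_answer(expected, " HLO "))
    print(check_answer("Plzeň", "Plzen"))
```

Generating the question from a random word and random positions on each page load keeps the answer from being hard-coded into a bot.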
3) CAPTCHA (Turing Test)
CAPTCHA stands for "Completely Automated Public Turing Test to Tell Computers and Humans Apart." It displays an image with distorted text, which the user must type into an input field. The human brain can usually recognize the distorted text, while an internet bot relying on OCR technology often cannot.
CAPTCHA was a very effective tool for filtering out spam bots for a long time. However, with the improvement of OCR technology, more bots can bypass CAPTCHA.
Some companies that are fully dedicated to spam have taken it a step further. Their bots copy the CAPTCHA image and send it to a call center (often in India), where a person solves it and the answer is fed back to the bot.
The most widely used and supported CAPTCHA is currently Google's reCAPTCHA.
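With reCAPTCHA, the browser widget produces a token that the server must confirm with Google before accepting the form. The sketch below shows that server-side step using only the Python standard library. The helper names http_post_form and verify_recaptcha are hypothetical, and the secret key comes from your own reCAPTCHA account; treat this as an outline rather than a complete integration.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def http_post_form(url, fields):
    """POST form-encoded fields and return the decoded JSON response."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(url, data=data, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def verify_recaptcha(secret, token, remote_ip=None, post=http_post_form):
    """Return True only if the verification service reports success.

    `post` is injectable so the function can be tested without network access.
    """
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    result = post(VERIFY_URL, fields)
    return bool(result.get("success"))
```

The registration handler would call verify_recaptcha with the token submitted in the form and reject the request when it returns False.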
4) Filtering Non-Existent Emails
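Spam bots need many email addresses for registration. They often favor gmail.com addresses, though many of these addresses do not exist, and they also use addresses on made-up domains. You can therefore strengthen the registration process by checking the DNS records of the email domain, especially whether MX records exist and whether an SPF record is published and how strict it is. Note that a DNS check only confirms that the domain can receive mail. It cannot confirm that a particular mailbox exists, which still requires a confirmation email.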
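A minimal sketch of the MX check follows. It assumes the third-party dnspython package for the real lookup (Python's standard library has no MX query). The function names email_domain, lookup_mx, and domain_accepts_mail are hypothetical. The lookup is passed in as a parameter so the logic can be exercised without a network connection.

```python
def email_domain(address):
    """Return the lower-cased domain part of an address, or None if malformed."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain.lower()


def lookup_mx(domain):
    """Return the MX host names for `domain` (requires dnspython)."""
    import dns.exception
    import dns.resolver
    try:
        answers = dns.resolver.resolve(domain, "MX")
    except dns.exception.DNSException:
        return []  # no such domain, no MX answer, timeout, ...
    return [str(record.exchange) for record in answers]


def domain_accepts_mail(address, lookup=lookup_mx):
    """True if the address's domain publishes at least one usable MX host."""
    domain = email_domain(address)
    if domain is None:
        return False
    # A single "." host is a "null MX": the domain states it accepts no mail.
    return any(host != "." for host in lookup(domain))
```

An SPF check could be added the same way by querying the TXT records of the domain and looking for one that starts with "v=spf1".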
5) IP Addresses and Emails on a Blacklist
Of all the methods mentioned, the most effective has been checking IP addresses and emails against a spammer database maintained by the project Stop Forum Spam. Stop Forum Spam allows remote checking of registration details via API and offers modules and plugins for the most common CMS systems and forums. Other website and forum operators can automatically add spammers from their forums to the database, cutting off spammers completely.
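For sites without a ready-made plugin, a lookup against Stop Forum Spam might look roughly like the sketch below. The endpoint URL, the "json" flag, and the response fields ("success", "appears") reflect our understanding of the public API and should be checked against the project's current documentation before use. The function names build_url, is_listed, and check_registration are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.stopforumspam.org/api"  # assumed endpoint, verify in docs


def build_url(ip=None, email=None):
    """Build a lookup URL asking for a JSON response."""
    params = {}
    if ip:
        params["ip"] = ip
    if email:
        params["email"] = email
    if not params:
        raise ValueError("need an IP address or an email to look up")
    return API_URL + "?" + urllib.parse.urlencode(params) + "&json"


def is_listed(response):
    """True if any queried value appears in the spammer database.

    Assumes one object per field, e.g. {"success": 1, "ip": {"appears": 1}}.
    """
    if not response.get("success"):
        raise ValueError("lookup failed")
    return any(
        isinstance(response.get(key), dict) and response[key].get("appears")
        for key in ("ip", "email")
    )


def check_registration(ip, email):
    """Query the service over the network and report whether to block."""
    with urllib.request.urlopen(build_url(ip, email), timeout=10) as resp:
        return is_listed(json.loads(resp.read().decode("utf-8")))
```

If the service is unreachable, it is a design decision whether to let the registration through or hold it for manual review.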
Conclusion
We tested all these methods against spammers on our websites, and the most successful methods were:
- Control question
- Verification against the Stop Forum Spam project
Each method alone can filter out 99.96% of all spammers (based on a sample of 400,000 registration attempts on Joomla CMS and SMF forum). Together, these methods have so far filtered out 100% of spam.
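To put these figures in perspective, the arithmetic can be worked out as below. The calculation assumes, purely for illustration, that the two methods miss spammers independently of each other. That assumption is ours and was not measured.

```python
from fractions import Fraction

attempts = 400_000
miss_rate = 1 - Fraction(9996, 10_000)  # 0.04% slip past a single method

# Expected number of spam registrations that get past one method alone.
missed_by_one = attempts * miss_rate

# Expected number that get past both methods, IF misses are independent.
missed_by_both = attempts * miss_rate ** 2

print(missed_by_one)          # 160
print(float(missed_by_both))  # 0.064
```

So one method alone lets through about 160 of the 400,000 attempts, while both together would be expected to let through less than one, which is consistent with the result observed so far.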