About
Bad Bots are robots with bad intentions.
They are also known as attackers.
Usage
They walk through:
- web pages, looking for forms to fill in, in order to:
  - send email in bulk
  - create fake accounts (to crawl the backend, to send email via invites, …)
  - authenticate by password guessing
- sockets, in order to:
  - break into your system
  - use your system maliciously (sending SMTP messages, …)
Definition
Bad bots are:
- Bad User-Agent Strings
- Vulnerability scanners
- E-mail harvesters
- Content scrapers
- Link Ranking Bots
- Aggressive bots that scrape content
- Image Hotlinking Sites and Image Thieves
- Government surveillance bots
- Botnet Attack Networks (Mirai)
- Known WordPress theme detectors
- SEO companies that your competitors use to try to improve their SEO
- Link Research and Backlink Testing Tools
- Browser Adware and Malware (Yontoo etc)
Protection
- Use a double opt-in for your signup forms
Honeypot
A honeypot is an input field that only a program (bot) will see.
Form
This input field or checkbox is hidden from humans with styling (CSS), such as:
- putting the field off screen
- giving it the same color as the background
- or not displaying it
Example:
- A human should never fill in this text field because they cannot see it: the absolute position to the left moves it off the screen (position: absolute; left: -5000px;)
<div style="position: absolute; left: -5000px;" aria-hidden="true">
<input type="text" name="badbot_should_fill_it_humane_not" tabindex="-1" value="">
</div>
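On the server, a submission can then be rejected when the hidden field carries a value. The following is a minimal sketch in JavaScript, assuming the form fields arrive as a plain object; the function name isHoneypotFilled and the sample submissions are hypothetical, only the field name comes from the form above.

```javascript
// Name of the hidden field used in the form above.
const HONEYPOT_FIELD = "badbot_should_fill_it_humane_not";

// Returns true when the hidden field carries a non-blank value,
// which a human using the rendered page should never produce.
function isHoneypotFilled(fields) {
  const value = fields[HONEYPOT_FIELD];
  return typeof value === "string" && value.trim() !== "";
}

// Hypothetical submissions: one from a human, one from a bot.
const human = { EMAIL: "alice@example.com", [HONEYPOT_FIELD]: "" };
const bot = { EMAIL: "spam@example.com", [HONEYPOT_FIELD]: "http://spam.example" };
console.log(isHoneypotFilled(human)); // false
console.log(isHoneypotFilled(bot)); // true
```

Silently dropping such a submission, rather than returning an error, avoids telling the bot which field gave it away.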
Scraping
A honeypot link is:
- a hidden link in a page
- that is disallowed in robots.txt
Every user or bot that accesses the link disobeys the rule while scraping and is therefore a bad bot.
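The detection side can be sketched as follows. This is an illustration only: the trap path /trap/, the in-memory set and the function name checkRequest are assumptions, not part of any particular server.

```javascript
// Hypothetical trap path, assumed to be listed in robots.txt as:
//   User-agent: *
//   Disallow: /trap/
const TRAP_PREFIX = "/trap/";

// IP addresses that requested the trap link.
const flaggedIps = new Set();

// Records the client when it requests a trap URL and
// reports whether the client is flagged as a bad bot.
function checkRequest(ip, path) {
  if (path.startsWith(TRAP_PREFIX)) {
    flaggedIps.add(ip);
  }
  return flaggedIps.has(ip);
}

console.log(checkRequest("203.0.113.7", "/blog/post")); // false
console.log(checkRequest("203.0.113.7", "/trap/link")); // true
console.log(checkRequest("203.0.113.7", "/blog/post")); // true (still flagged)
```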
Challenge
A challenge is a test to prove that:
- you are human
- you are not a bot
A challenge may be presented:
- before loading the page, such as the IUAM challenge (I'm Under Attack Mode / JS), which checks whether the client supports JavaScript
- before filling in a form, such as a captcha
- or after detection
Note: in Cloudflare, the parameters __cf_chl_jschl_tk__ and __cf_chl_captcha_tk__ are added to the URL after a user successfully passes:
- an IUAM challenge
- or a captcha, respectively.
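When analysing access logs, these two parameters can be used to tell which challenge a request went through. A small sketch, assuming full URLs are available; the function name passedChallenge and the example URL are hypothetical.

```javascript
// Query parameters named in the note above.
const IUAM_PARAM = "__cf_chl_jschl_tk__";
const CAPTCHA_PARAM = "__cf_chl_captcha_tk__";

// Returns "iuam", "captcha" or null depending on which
// token parameter is present in the URL.
function passedChallenge(url) {
  const params = new URL(url).searchParams;
  if (params.has(IUAM_PARAM)) return "iuam";
  if (params.has(CAPTCHA_PARAM)) return "captcha";
  return null;
}

console.log(passedChallenge("https://example.com/page?__cf_chl_jschl_tk__=abc")); // iuam
```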
Captcha
A captcha is a visual challenge to prove that you are human.
The test can also be difficult for humans and is therefore a barrier to form submission (low sign-up rate, …).
A captcha doesn't stop human spammers. See double opt-in.
It should therefore be used:
- only if the fake account problem is extremely severe
- and, if that is the case, at a later stage of the sign-up process (i.e. towards the end of the process).
Otherwise, a reCAPTCHA can be used.
Agent
A human will generally use a real browser (as agent) to interact with the website and sign up.
A bot is not a browser and may not implement cookies or JavaScript.
You may implement rules such as:
- Do something afterwards only if the cookie is present
- Show the forms via JavaScript
Example:
- The external form HTML on the server
<form action="" method="post">
<input type="email" value="" name="EMAIL" class="email" placeholder="email address">
<input type="submit" value="Subscribe" name="subscribe" class="button">
</form>
- The HTML page (that does not have any form, only an anchor newsletter_form that marks where the form should be added)
<h2>My Form</h2>
<div id="newsletter_form"></div>
- This Web API fetch call adds the form
// Web API (fetch)
// Replace every namespace separator of the page id with a slash
let pagePath = parent.JSINFO.id.replace(/:/g, "/");
fetch(`/_export/code/${pagePath}?codeblock=1`, {
  method: 'GET', // *GET, PUT, DELETE, etc.
})
  .then(function (response) {
    // response.text() returns a promise that resolves to the body text
    return response.text();
  })
  .then(function (data) {
    document.getElementById('newsletter_form').innerHTML = data;
  });
// Or with jQuery, using jQuery load (https://api.jquery.com/load/):
// $('#newsletter_form').load('/_export/code/email/fake?codeblock=0');
Browser
A human will generally use the same browser to sign up and to confirm the email.
By setting a cookie or taking the browser fingerprint, we can see whether the signup and the confirmation were done with the same browser.
Browser fingerprinting is also used to identify the characteristics of botnets, because botnet connections are established by a different device every time. See device-tracking-by-web-sites-can-be-a-good-thing/
A bot (hacker) that logs into an account from a device that has never accessed the account before can potentially be identified.
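The signup and confirmation comparison can be sketched as below. It assumes a browser identifier is already available (a cookie value or a fingerprint hash); the in-memory map and the function names recordSignup and isSameBrowser are hypothetical.

```javascript
// Hypothetical store: confirmation token -> browser identifier seen at signup.
// The identifier is assumed to be a cookie value or a fingerprint hash.
const signupBrowser = new Map();

// Remembers which browser requested the signup.
function recordSignup(token, browserId) {
  signupBrowser.set(token, browserId);
}

// True when the confirmation comes from the browser that signed up.
function isSameBrowser(token, browserId) {
  return signupBrowser.has(token) && signupBrowser.get(token) === browserId;
}

recordSignup("token-1", "browser-abc");
console.log(isSameBrowser("token-1", "browser-abc")); // true
console.log(isSameBrowser("token-1", "browser-xyz")); // false
```

A mismatch is only a signal, since a human may sign up on one device and confirm on another.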
Computer / IP
A human will:
- generally use the same computer to sign up and to confirm the email
- not sign up multiple times from the same computer
- not submit a form more than once in a 24-hour period (Shopify shows a captcha challenge if this is the case)
By taking the browser fingerprint (and the IP address), we can monitor this behavior.
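The 24-hour rule can be sketched as follows. The key is assumed to be a fingerprint, an IP address or a combination of both; the function name needsChallenge and the in-memory map are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical store: client key -> timestamp of the last submission (ms).
const lastSubmission = new Map();

// Records the submission and returns true when the same key
// already submitted a form less than 24 hours before.
function needsChallenge(key, now = Date.now()) {
  const previous = lastSubmission.get(key);
  lastSubmission.set(key, now);
  return previous !== undefined && now - previous < DAY_MS;
}

console.log(needsChallenge("203.0.113.7", 0)); // false (first submission)
console.log(needsChallenge("203.0.113.7", 60000)); // true (one minute later)
```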
High Engagement
Because a bot will click on all links, it will end up with a high engagement score that no human could achieve:
- open rate of 100% (emails opened per email sent)
- click through rate of 100%
- and so on
A high engagement score within a short period of time is a big red flag.
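As an illustration, the check could look like the sketch below. The minimum number of emails (MIN_SENT) is an assumption chosen for the example, not a recommended value, and the function names are hypothetical.

```javascript
// Assumption for illustration: ignore subscribers with very few emails,
// because a human can easily reach 100% on one or two messages.
const MIN_SENT = 5;

// Computes the open rate and the click-through rate per email sent.
function engagement({ sent, opened, clicked }) {
  if (sent === 0) return { openRate: 0, clickRate: 0 };
  return { openRate: opened / sent, clickRate: clicked / sent };
}

// Flags a subscriber who opened and clicked every single email.
function isSuspicious(stats) {
  const { openRate, clickRate } = engagement(stats);
  return stats.sent >= MIN_SENT && openRate === 1 && clickRate === 1;
}

console.log(isSuspicious({ sent: 10, opened: 10, clicked: 10 })); // true
console.log(isSuspicious({ sent: 10, opened: 4, clicked: 1 })); // false
```

To cover the "short period of time" part, the statistics would be computed over a recent window only.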
Firewall
You can restrict access by IP address or MAC address.
You can therefore also restrict access by country. Example: How to restrict your traffic to a country with Firewalld / Iptable? (ie packet filtering by country)
Blacklist
Blacklists of IP addresses and domains are built from observed bad behavior.
When receiving a connection, you can check these lists and take action accordingly.
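Many public blacklists are published over DNS (DNSBL): the IPv4 octets are reversed and prefixed to the list's zone, and the resulting name resolves only when the address is listed. The sketch below only builds the name to look up; the zone dnsbl.example.org is a placeholder and the function name dnsblQueryName is hypothetical.

```javascript
// Builds the DNS name to query for an IPv4 address in a DNS-based blacklist.
// The zone passed in is a placeholder, not a real list.
function dnsblQueryName(ipv4, zone) {
  const octets = ipv4.split(".");
  const valid =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) throw new Error(`Not an IPv4 address: ${ipv4}`);
  return `${octets.reverse().join(".")}.${zone}`;
}

console.log(dnsblQueryName("192.0.2.99", "dnsbl.example.org"));
// 99.2.0.192.dnsbl.example.org
```

The name can then be resolved with any DNS client; a successful answer means the address is on the list.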
Tarpit
A tarpit is a network service that intentionally inserts delays in the protocol banner, slowing down clients by forcing them to wait. The cost is a socket but no high CPU or memory usage.
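A minimal sketch of the idea with Node.js, applied to SSH: the SSH protocol lets a server send arbitrary lines before its version banner, as long as they do not start with "SSH-", so the server below keeps sending such lines slowly and never sends the banner. The delay, the function names and the port in the usage comment are assumptions.

```javascript
const net = require("net");

// Builds one junk line. It must not start with "SSH-", otherwise
// an SSH client would treat it as the real version banner.
function junkLine() {
  return "x-" + Math.random().toString(16).slice(2) + "\r\n";
}

// Creates a TCP server that drips one junk line every delayMs
// and never completes the protocol banner.
function createTarpit(delayMs = 10000) {
  return net.createServer((socket) => {
    const timer = setInterval(() => socket.write(junkLine()), delayMs);
    const stop = () => clearInterval(timer);
    socket.on("close", stop);
    socket.on("error", stop);
  });
}

// Usage (port 2222 is an arbitrary choice):
// createTarpit().listen(2222);
```

Each trapped client costs one open socket and one timer, which matches the low CPU and memory cost described above.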
Port knocking
Port knocking opens a port with a firewall command (for instance iptables) only after it receives the correct sequence of connection attempts (knocks).
Example with knockd to manage an SSH port.
/etc/knockd.conf
[options]
UseSyslog
[openSSH]
sequence = 7000,8000,9000
seq_timeout = 5
command = /sbin/iptables -A INPUT -s %IP% -p tcp --dport 22 -j ACCEPT
tcpflags = syn
[closeSSH]
sequence = 9000,8000,7000
seq_timeout = 5
command = /sbin/iptables -D INPUT -s %IP% -p tcp --dport 22 -j ACCEPT
tcpflags = syn
Unban
If you ban an IP, you also need to manage the unban.
Example:
- The ycombinator unban service has this form:
http://domain/unban?ip=<ip address>
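One simple way to manage the unban is to give every ban an expiry time and to keep a manual unban function for support requests. A sketch with an in-memory map; the function names ban, unban and isBanned are hypothetical.

```javascript
// Hypothetical store: banned IP -> expiry timestamp (ms).
const bans = new Map();

// Bans an IP address for a limited duration.
function ban(ip, durationMs, now = Date.now()) {
  bans.set(ip, now + durationMs);
}

// Lifts a ban manually; returns true when the IP was banned.
function unban(ip) {
  return bans.delete(ip);
}

// Reports whether the IP is banned and drops expired bans on the way.
function isBanned(ip, now = Date.now()) {
  const expiry = bans.get(ip);
  if (expiry === undefined) return false;
  if (now >= expiry) {
    bans.delete(ip);
    return false;
  }
  return true;
}

ban("203.0.113.7", 60000, 0);
console.log(isBanned("203.0.113.7", 1000)); // true
console.log(isBanned("203.0.113.7", 60000)); // false (ban expired)
```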
Software protection
Third-party protection software looks through log files to find bad behavior (such as too many login attempts) and blocks the offending IP address.
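The principle can be sketched in a few lines. The log format, the threshold and the function name ipsToBlock are assumptions made for the example and do not reflect the behavior of any specific tool.

```javascript
// Hypothetical log format: "<timestamp> LOGIN_FAILED ip=<address>"
const FAILED_LOGIN = /LOGIN_FAILED ip=(\S+)/;

// Returns the IP addresses with more than maxAttempts failed logins.
function ipsToBlock(logLines, maxAttempts = 3) {
  const counts = new Map();
  for (const line of logLines) {
    const match = FAILED_LOGIN.exec(line);
    if (match) {
      counts.set(match[1], (counts.get(match[1]) || 0) + 1);
    }
  }
  return [...counts]
    .filter(([, attempts]) => attempts > maxAttempts)
    .map(([ip]) => ip);
}

const sampleLog = [
  "2024-01-01T10:00:00Z LOGIN_FAILED ip=203.0.113.7",
  "2024-01-01T10:00:01Z LOGIN_FAILED ip=203.0.113.7",
  "2024-01-01T10:00:02Z LOGIN_OK ip=198.51.100.4",
  "2024-01-01T10:00:03Z LOGIN_FAILED ip=203.0.113.7",
];
console.log(ipsToBlock(sampleLog, 2)); // [ '203.0.113.7' ]
```

The returned addresses would then be handed to the firewall, together with an unban delay.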
List: