Security - Bad Bot (Spambot, Attacker …)

About

Bad bots are robots or users (attackers) with bad intentions.

Usage

They walk through pages looking for forms and try to fill them in.

See Web Security - Fake Form Submission (Signup,..)

Definition

Bad bots are:

  • Bad User-Agent Strings
  • Vulnerability scanners
  • E-mail harvesters
  • Content scrapers
  • Link Ranking Bots
  • Aggressive bots that scrape content
  • Image Hotlinking Sites and Image Thieves
  • Government surveillance bots
  • Botnet Attack Networks (Mirai)
  • Known WordPress Theme Detectors (Updated Regularly)
  • SEO companies that your competitors use to try to improve their SEO
  • Link Research and Backlink Testing Tools
  • Browser Adware and Malware (Yontoo etc)

Protection

  • Use a double opt-in for your signup forms
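
A double opt-in can be sketched as follows, assuming a Node.js server. The names pendingSignups, confirmedEmails, startSignup and confirmSignup are hypothetical, and sending the email itself is left out. The signup only becomes active once the token sent to the email address comes back.

```javascript
const crypto = require('node:crypto');

// Hypothetical in-memory store: token -> email waiting for confirmation
const pendingSignups = new Map();
// Emails that have confirmed their subscription
const confirmedEmails = new Set();

// Step 1: the form is submitted, nothing is activated yet
function startSignup(email) {
  const token = crypto.randomBytes(16).toString('hex');
  pendingSignups.set(token, email);
  // The token would be sent to the email address inside a confirmation link
  return token;
}

// Step 2: the link in the email is followed
function confirmSignup(token) {
  const email = pendingSignups.get(token);
  if (email === undefined) {
    return false; // unknown or already used token
  }
  pendingSignups.delete(token);
  confirmedEmails.add(email);
  return true;
}
```

A bot that submits an address it does not control never receives the token, so the signup stays pending.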

Honeypot

A honeypot is an input field that only a program (bot) should see.

Form

This input field (or checkbox) is hidden from humans with styling (CSS) such as:

  • putting the field out of the screen
  • setting the same color as the background
  • or not displaying it

Example:

  • This text field should not be filled with any value by a human, who cannot see it: the absolute position to the left moves it off the screen (position: absolute; left: -5000px;)
<div style="position: absolute; left: -5000px;" aria-hidden="true">
       <input type="text" name="badbot_should_fill_it_humane_not" tabindex="-1" value="">
</div>
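
On the server side, the check could look like the sketch below. It assumes that the submitted form values arrive as a plain object; the function name isHoneypotFilled is hypothetical, and the field name is the one used in the form above.

```javascript
// Name of the hidden field, taken from the form above
const HONEYPOT_FIELD = 'badbot_should_fill_it_humane_not';

// fields: plain object holding the submitted form values (assumption)
function isHoneypotFilled(fields) {
  const value = fields[HONEYPOT_FIELD];
  // A human cannot see the field, so any non-empty value points to a bot
  return typeof value === 'string' && value.trim() !== '';
}
```

A submission where isHoneypotFilled returns true can be dropped silently, so that the bot gets no hint that it was detected.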

Scraping

A honeypot link is:

  • a hidden link in a page
  • that is disallowed in robots.txt

Any user or bot that accesses the link disobeys the rule while scraping and is therefore a bad bot.
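
The link trap can be sketched as follows. The path /trap/ is a hypothetical example, assumed to be listed as a Disallow rule in robots.txt, and the function names are hypothetical as well.

```javascript
// Hypothetical path, assumed to be listed as "Disallow: /trap/" in robots.txt
const HONEYPOT_PATH = '/trap/';

// IPs that requested the forbidden link
const badBotIps = new Set();

// Call this for every incoming request
function recordRequest(ip, path) {
  if (path.startsWith(HONEYPOT_PATH)) {
    badBotIps.add(ip);
  }
}

// True when the IP followed the honeypot link at least once
function isBadBot(ip) {
  return badBotIps.has(ip);
}
```

Well-behaved crawlers read robots.txt and never request the path, so only clients that ignore the rule end up in the set.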

Challenge

A challenge is a test to prove that:

  • you are human
  • you are not a bot

A challenge may be presented:

  • before loading the page, such as the IUAM challenge (I'm Under Attack Mode / JS), which checks whether the client supports JavaScript.
  • before filling a form such as captcha
  • or after detection

Note: in Cloudflare, the parameters __cf_chl_jschl_tk__ and __cf_chl_captcha_tk__ are added to the URL after a user successfully passes:

  • an IUAM challenge
  • or a captcha, respectively.
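
When reading access logs, these parameters can be spotted with a small helper such as the sketch below. The function name passedChallengeParams is hypothetical, and the parameter names are only the two given in the note; they may differ in other Cloudflare versions.

```javascript
// Query parameters named in the note above
const CHALLENGE_PARAMS = ['__cf_chl_jschl_tk__', '__cf_chl_captcha_tk__'];

// Returns the challenge parameters that are present in the URL
function passedChallengeParams(url) {
  const params = new URL(url).searchParams;
  return CHALLENGE_PARAMS.filter((name) => params.has(name));
}
```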

Captcha

A captcha is a visual challenge to prove that you are human.

The test can also be difficult for humans and is therefore a barrier to form submission (low sign-up rate, ..)

Captcha doesn't stop human spammers. See double opt-in.

It should therefore be used:

  • only if the fake account problem is extremely severe.
  • and if that is the case, at a later stage in the sign-up process (i.e. towards the end of the process).

Otherwise, reCAPTCHA can be used.

Agent

A human will generally use a real browser (as agent) to interact with the website and sign up.

A bot is not a browser and may not implement features such as cookies or JavaScript.

You may implement rules such as:

  • Do something afterwards only if the cookie is present.
  • Show the forms via JavaScript
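
The cookie rule can be sketched as follows. The cookie name visited is a hypothetical example, assumed to be set on the first page view, and the function receives the raw value of the HTTP Cookie request header.

```javascript
// Hypothetical cookie name, assumed to be set on the first page view
const VISIT_COOKIE = 'visited';

// cookieHeader: raw value of the HTTP Cookie request header, may be undefined
function hasVisitCookie(cookieHeader) {
  if (!cookieHeader) {
    return false;
  }
  return cookieHeader
    .split(';')
    .map((pair) => pair.trim().split('=')[0])
    .includes(VISIT_COOKIE);
}
```

A client that posts a form without ever sending the cookie back has most likely not loaded the page in a real browser.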

Example:

  • The external form HTML stored on the server
<form action="" method="post">
<input type="email" value="" name="EMAIL" class="email" placeholder="email address">
<input type="submit" value="Subscribe" name="subscribe" class="button">
</form>
  • The HTML page (it does not contain any form, only an anchor newsletter_form that marks where the form should be added)
<h2>My Form</h2>
<div id="newsletter_form"></div>
// Web API
// The page id uses ":" as separator, replace every occurrence to build the path
let pagePath = parent.JSINFO.id.replace(/:/g, "/");
fetch(`/_export/code/${pagePath}?codeblock=1`, {
    method: 'GET'
  })
  .then(function(response) {
    if (!response.ok) {
      throw new Error(`Form request failed with status ${response.status}`);
    }
    // response.text() returns a promise that resolves with the body
    return response.text();
  })
  .then(function(data) {
    // Inject the form HTML at the anchor
    document.getElementById('newsletter_form').innerHTML = data;
  })
  .catch(function(error) {
    console.error(error);
  });

// or jQuery
// With jQuery, you can also use load (https://api.jquery.com/load/)
// $('#newsletter_form').load('/_export/code/email/fake?codeblock=0');

Browser

A human will generally use the same browser to sign up and to confirm the email.

By setting a cookie or taking the browser fingerprint, we can see whether the signup and the confirmation were done with the same browser.

Browser fingerprinting is also used to identify the characteristics of botnets, because botnet connections are established by a different device every time. See device-tracking-by-web-sites-can-be-a-good-thing/

A bot (hacker) that logs into an account from a device that has never accessed the account before can potentially be identified.
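
A minimal sketch of this check is shown below. It assumes that the fingerprint is an opaque string computed elsewhere; the store knownDevices and the function names are hypothetical.

```javascript
// Hypothetical store: account id -> set of fingerprints seen for this account
const knownDevices = new Map();

// Remember the browser used for an account (for instance at signup)
function rememberDevice(accountId, fingerprint) {
  if (!knownDevices.has(accountId)) {
    knownDevices.set(accountId, new Set());
  }
  knownDevices.get(accountId).add(fingerprint);
}

// True when the fingerprint was already seen for this account
function isKnownDevice(accountId, fingerprint) {
  const devices = knownDevices.get(accountId);
  return devices !== undefined && devices.has(fingerprint);
}
```

Calling rememberDevice at signup and isKnownDevice at confirmation (or login) tells whether both steps came from the same browser.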

Computer / IP

A human will:

  • generally use the same computer to sign up and confirm the email.
  • not sign up multiple times from the same computer
  • not submit a form more than once in a 24-hour period (Shopify shows a captcha challenge if this is the case)

By taking the browser fingerprint (and IP), we can monitor this behavior.
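
The 24-hour rule can be sketched as follows. The client key is assumed to be the fingerprint, the IP or a combination of both; the store lastSubmission and the function name shouldChallenge are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical store: client key -> time of the last submission (ms)
const lastSubmission = new Map();

// Returns true when the same client already submitted within the last 24 hours
function shouldChallenge(clientKey, now = Date.now()) {
  const previous = lastSubmission.get(clientKey);
  lastSubmission.set(clientKey, now);
  return previous !== undefined && now - previous < DAY_MS;
}
```

When shouldChallenge returns true, the form could show a captcha instead of rejecting the submission outright.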

High Engagement

Because a bot will click on all links, it will end up with a high engagement score that no human could achieve.

A high engagement score within a short period of time is a big red flag.
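
Such a red flag can be detected with a sliding window over the click times, as in the sketch below. The function name and the thresholds passed to it are hypothetical and must be tuned on real traffic.

```javascript
// clickTimes: timestamps (ms) of the links clicked by one recipient (assumption)
// Returns true if more than maxClicks happened inside any window of windowMs
function isEngagementSuspicious(clickTimes, maxClicks, windowMs) {
  const sorted = [...clickTimes].sort((a, b) => a - b);
  let start = 0;
  for (let end = 0; end < sorted.length; end++) {
    // Move the start of the window until it spans at most windowMs
    while (sorted[end] - sorted[start] > windowMs) {
      start++;
    }
    if (end - start + 1 > maxClicks) {
      return true;
    }
  }
  return false;
}
```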

Unban


http://domain/unban?ip=<ip address> 
