Web - Robots.txt

About

robots.txt is one way to control web robots (crawlers): it tells them which parts of your website they are allowed to crawl.

Example

Disallow all

User-agent: *    # applies to all robots
Disallow: /      # disallow crawling of all pages

Disallow a subdirectory for a specific bot

# Group 1: Googlebot may not crawl anything under /nogooglebot/
User-agent: Googlebot
Disallow: /nogooglebot/

# Group 2: all other robots may crawl the entire site
User-agent: *
Allow: /

Sitemap: http://www.example.com/sitemap.xml

Delay between page views

The crawl delay is the number of seconds a bot should wait between page views. Note that Crawl-delay is a non-standard directive: some crawlers, such as Googlebot, ignore it.

Example:

User-agent: *
Disallow:
Allow: /*
Crawl-delay: 5
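
A crawler written in Python can read this value with the standard urllib.robotparser module. The sketch below is minimal and assumes the rules above; "MyBot" is a placeholder user agent name.

from urllib import robotparser

# Minimal sketch: parse the example above and read its Crawl-delay.
# "MyBot" is a placeholder user agent name.
rules = [
    "User-agent: *",
    "Disallow:",
    "Allow: /*",
    "Crawl-delay: 5",
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

# crawl_delay() returns the delay in seconds, or None when the file
# does not specify one for the given user agent.
print(parser.crawl_delay("MyBot"))  # 5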

Syntax

Disallow

An empty value for “Disallow” indicates that all URIs can be retrieved. At least one “Disallow” field must be present in each record of the robots.txt file.

The “Disallow” field specifies a partial URI that is not to be visited. This can be a full path or a partial path; any URI that starts with this value will not be retrieved. For example:

  • Disallow: /help disallows both /help.html and /help/index.html
  • Disallow: /help/ disallows /help/index.html but allows /help.html
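
This prefix matching can be verified locally with Python's standard urllib.robotparser module. A minimal sketch, assuming the rule Disallow: /help; "MyBot" is a placeholder user agent name:

from urllib import robotparser

# Minimal sketch: check which paths are blocked by "Disallow: /help".
# "MyBot" is a placeholder user agent name.
parser = robotparser.RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /help",
])

print(parser.can_fetch("MyBot", "/help.html"))        # False
print(parser.can_fetch("MyBot", "/help/index.html"))  # False
print(parser.can_fetch("MyBot", "/about.html"))       # True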

Test
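
A robots.txt file can also be tested end to end by fetching it and querying it with Python's standard urllib.robotparser module. A minimal sketch; the site URL and user agent are placeholders:

from urllib import robotparser

# Minimal sketch: download a live robots.txt and test a URL against it.
# The site URL and the user agent name are placeholders.
parser = robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()  # fetches and parses the file

print(parser.can_fetch("MyBot", "https://www.example.com/help/index.html"))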

Documentation

  • A Standard for Robot Exclusion: http://www.robotstxt.org/orig.html
  • Robots Exclusion Protocol (RFC 9309): https://www.rfc-editor.org/rfc/rfc9309

