Robots.txt Guide
Posted: April 30th, 2004, 12:56 am
This is a guide for some of you newer webmasters who have just started out and have not really spent much time in the Search Engine Optimization (SEO) department.
Robots.txt is a text file (obviously) that tells good robots where not to go on your site. Most robots follow the robots.txt file (examples of those that do: Googlebot, Yahoo, AltaVista, MSN). However, some robots, especially those used for spam, do not follow the rules placed in robots.txt.
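To see how a well-behaved robot actually reads these rules, you can use Python's standard urllib.robotparser module. This is just a local sketch (the domain and the /private/ path are placeholders, not from any real site):

```python
# Parse a small robots.txt locally and ask the same question a compliant
# crawler would ask before fetching each URL.
from urllib import robotparser

parser = robotparser.RobotFileParser()
# parse() accepts the file's lines directly, so nothing is fetched.
parser.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A good robot checks every URL against the rules before requesting it:
print(parser.can_fetch("Googlebot", "http://www.domain.com/private/page.html"))  # False
print(parser.can_fetch("Googlebot", "http://www.domain.com/index.html"))         # True
```

Note that this only tells you what a *polite* robot would do; as said above, spam bots simply ignore the file.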
The first rule of robots.txt is that it must be placed in the main directory; this means that you cannot place a robots.txt file in this directory:
http://www.domain.com/thisdirectory/
You can, however, place a robots.txt in this directory:
http://www.domain.com/
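In other words, a robot takes whatever URL it is about to crawl, keeps only the scheme and host, and appends /robots.txt. A quick sketch of that rule (the URLs are the placeholder examples above):

```python
# Derive the robots.txt location a crawler would fetch for any page URL.
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Keep only scheme + host and append /robots.txt."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.domain.com/thisdirectory/page.html"))
# http://www.domain.com/robots.txt
```

This is why a robots.txt buried in a subdirectory is never found: robots only ever look at the root.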
The robots will not visit your robots.txt file every time they visit your site; they usually check the file about once a week.
Let's get down to the basic syntax of the robots.txt file:
To exclude all robots from the entire server:

Code:
User-agent: *
Disallow: /

To allow all robots complete access:

Code:
User-agent: *
Disallow:

To exclude all robots from part of the server:

Code:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

To exclude a single robot:

Code:
User-agent: BadBot
Disallow: /

To allow a single robot (and exclude all the rest):

Code:
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /

Please note that the * (wildcard) can only be used in the User-agent: field, and not in the Disallow: field.
Currently, there is no Allow: field for robots.txt.
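You can check the last example (one robot in, everyone else out) with the same urllib.robotparser module; the page URL here is just a placeholder:

```python
# Verify that only WebCrawler is allowed and every other robot is shut out.
from urllib import robotparser

rules = [
    "User-agent: WebCrawler",
    "Disallow:",            # empty Disallow means "nothing is off limits"
    "",
    "User-agent: *",
    "Disallow: /",          # everyone else is barred from the whole site
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("WebCrawler", "http://www.domain.com/page.html"))  # True
print(parser.can_fetch("BadBot", "http://www.domain.com/page.html"))      # False
```

The parser picks the most specific matching User-agent group, which is why WebCrawler gets the empty Disallow rule while everyone else falls through to the catch-all * group.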