Robots.txt is a plain text file that tells well-behaved robots which parts of your site they should not visit. Most robots obey the robots.txt file (for example Googlebot and the crawlers from Yahoo, AltaVista and MSN). However, some robots, especially those used for spam, ignore the rules placed in robots.txt.
The first rule of robots.txt is that it must be placed in the root directory of your site. This means that you cannot place a robots.txt file in this directory:
http://www.domain.com/thisdirectory/
You can, however, place a robots.txt file in this directory:
http://www.domain.com/
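Because the file always lives at the root, a robot can work out where to look from any page address. A minimal sketch in Python, assuming a helper name (robots_txt_url) and a page name (page.html) made up for this illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the only place a robot looks for robots.txt for this page.

    robots_txt_url is a hypothetical helper name used for this sketch.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.domain.com/thisdirectory/page.html"))
# http://www.domain.com/robots.txt
```

Whatever directory the page is in, the result points at the root of the site.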
Robots do not re-read your robots.txt file every time they visit your site; they usually fetch it about once a week, so a change to the file may take a while to be noticed.
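To picture how a robot might decide when to fetch the file again, here is a rough sketch using Python's standard urllib.robotparser module. The one-week interval and the needs_refresh helper are assumptions for illustration only; real crawlers choose their own schedule.

```python
import time
from urllib.robotparser import RobotFileParser

ONE_WEEK = 7 * 24 * 60 * 60  # assumed refresh interval, in seconds


def needs_refresh(parser, now=None, max_age=ONE_WEEK):
    """Return True if the cached rules were never loaded or are too old.

    needs_refresh is a hypothetical helper; mtime() is 0 until rules are parsed.
    """
    now = time.time() if now is None else now
    last_checked = parser.mtime()
    return last_checked == 0 or now - last_checked > max_age


parser = RobotFileParser()
print(needs_refresh(parser))  # True: nothing has been loaded yet

parser.parse(["User-agent: *", "Disallow: /tmp/"])  # records the load time
print(needs_refresh(parser))  # False: the rules were just loaded
```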
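Let's get down to the basic syntax of the robots.txt file. Each record names a robot with User-agent: (* means all robots) and then lists, with Disallow:, the start of each path that robot should stay out of.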
To exclude all robots from the entire server
User-agent: *
Disallow: /
To allow all robots complete access
User-agent: *
Disallow:
To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
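If you want to check what these three Disallow lines actually block, Python's standard urllib.robotparser module can read the rules for you. This is only a sketch: the robot name ExampleBot, the is_allowed helper and the file names are made up for the test.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Ask the parser whether the (hypothetical) robot may fetch this path."""
    return parser.can_fetch(agent, "http://www.domain.com" + path)


print(is_allowed("/index.html"))        # True: not under any listed directory
print(is_allowed("/tmp/file.txt"))      # False: starts with /tmp/
print(is_allowed("/private/notes.html"))  # False: starts with /private/
```

Each Disallow value is matched against the start of the path, so everything below the listed directories is excluded as well.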
To exclude a single robot
User-agent: BadBot
Disallow: /
To allow a single robot and exclude all others
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /
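The two per-robot rule sets (one naming BadBot, one naming WebCrawler) can be tried out the same way with Python's standard urllib.robotparser module. A small sketch, where the allowed helper, the name OtherBot and the page page.html are assumptions for illustration:

```python
from urllib.robotparser import RobotFileParser

BLOCK_ONE = """\
User-agent: BadBot
Disallow: /
"""

ALLOW_ONE = """\
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /
"""


def allowed(rules, agent, url="http://www.domain.com/page.html"):
    """Parse the rules text and ask whether this robot may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(BLOCK_ONE, "BadBot"))      # False: BadBot is shut out
print(allowed(BLOCK_ONE, "OtherBot"))    # True: no record applies to it
print(allowed(ALLOW_ONE, "WebCrawler"))  # True: empty Disallow allows everything
print(allowed(ALLOW_ONE, "OtherBot"))    # False: falls under the * record
```

A robot uses the record that names it if there is one, and only falls back to the * record otherwise.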
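The original robots.txt standard has no Allow: field. Some major crawlers, such as Googlebot, support Allow: as an extension, and it is included in RFC 9309, but robots that only follow the original standard will ignore it.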