I knew how web crawlers/bots work to index your website. In fact, Google even lets you submit a Sitemap so it can index the pages on your site better. But what if you don't want some pages to be crawled? Well, today I learned that there is a way to request crawlers to ignore certain pages on a site. The trick is to place a 'robots.txt' file in the root directory of the site. This text file lists the folders and URLs that should not be crawled.
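For example, a robots.txt that asks crawlers to skip a folder and a single page might look like this (the /private/ folder and the page name here are just placeholder paths):

    User-agent: *
    Disallow: /private/
    Disallow: /temp-page.html

The "User-agent: *" line means the rules apply to all crawlers; you could instead name a specific bot there.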
The protocol, however, is purely advisory. It relies on the cooperation of the web robot, so marking an area of a site out of bounds with robots.txt does not guarantee privacy.
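Just to get a feel for the cooperative side of this, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The site URL and the crawler name are just examples, not anything from a real crawler:

    # Sketch: a cooperative crawler checking robots.txt before fetching a page.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # robots.txt lives at the site root
    rp.read()  # fetch and parse the file

    # can_fetch() returns False for paths the file asks this crawler to skip
    if rp.can_fetch("ExampleBot", "https://example.com/private/page.html"):
        print("Allowed to crawl this page")
    else:
        print("robots.txt asks crawlers to skip this page")

A polite crawler runs a check like this for every URL; a misbehaving one can simply ignore the file, which is exactly why robots.txt is no substitute for real access control.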
Wednesday, February 13, 2008