jdMorgan - 4:59 pm on Feb 23, 2007 (gmt 0)
This is one of the main reasons to properly identify your directory crawler. It is commonly understood that directories need to re-validate submitted links, and that since no actual crawling is going on, a request for robots.txt will probably *not* be made. If the targeted site's Webmaster sees a link to a familiar directory in the user-agent string, or can follow the link and be reminded of submitting to that directory, then he/she is less likely to get trigger-happy and ban the user-agent.

Legitimate directory administrators and search engines alike will do well to keep in mind that there is an awful lot of abuse going on, and Web sites are increasingly running with "shields up" to protect themselves from all the scraping and harvesting these days.

Using a meaningful, informative, and syntactically correct user-agent string is not only the polite thing to do, it's also a matter of keeping directory/search engine listings comprehensive by making sure the user-agent doesn't get shown the door when fetching pages.

Jim
> Remember, I'm not crawling an entire website submitted to my directory, I'm just checking the entry page, usually the index page.
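For a checker that only fetches the submitted entry page, identifying it might look roughly like the sketch below. This is only an illustration using Python's standard urllib: the product name "ExampleDirectory-LinkChecker", the version, the info URL on example.com, and all three function names are made up for the example and are not taken from any real directory.

```python
import urllib.error
import urllib.request


def build_user_agent(name, version, info_url):
    """Return a product/version token followed by a comment holding an info URL."""
    return f"{name}/{version} (+{info_url})"


def build_request(url, user_agent):
    """Prepare a request for one page that carries the identifying user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def check_entry_page(url, user_agent, timeout=10):
    """Fetch only the submitted entry page; no links are followed from it.

    Returns the HTTP status code, or None if the server could not be reached.
    """
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (403, 404, ...).
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, and similar network problems.
        return None


# Hypothetical identity: the URL should lead to a page explaining the checker.
UA = build_user_agent(
    "ExampleDirectory-LinkChecker", "1.0", "http://www.example.com/linkchecker.html"
)
```

Calling check_entry_page("http://www.example.org/", UA) would then make a single request for that one page. A 403 coming back from a site that used to answer 200 may be a hint that the user-agent has been banned there.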