Forum Moderators: goodroi
For example (without listing the direct URL):
<example.com/links2.html>
I’d check for a robots.txt file by just doing the basics:
<example.com/robots.txt>
Any other ideas?
The sites below are blocking a large number of crawlers, but I can't see how they're blocking a crawl of their directory pages at
[edited by: agerhart at 6:24 pm (utc) on Feb. 17, 2004]
[edit reason] please stop dropping URLs [/edit]
Also, they could be serving a robots.txt specific to the user-agent. You could use a tool like wget (wget -U useragent http://domain.tld/robots.txt) and try user-agent strings from your logfiles for various robots.
[edited by: pageoneresults at 12:51 pm (utc) on Feb. 18, 2004]
[edit reason] Delinked Example [/edit]
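If it helps, here's a minimal sketch of the same idea in Python (using the standard urllib instead of wget; domain.tld and the user-agent strings are placeholders, just like in the wget line above):

```python
import urllib.request

def build_request(host, user_agent):
    # Build a request for /robots.txt that presents the given User-Agent
    # string, the same as wget's -U option. "host" is a placeholder.
    return urllib.request.Request(
        f"http://{host}/robots.txt",
        headers={"User-Agent": user_agent},
    )

def fetch_robots(host, user_agent):
    # Fetch robots.txt as that user-agent; compare results across agents
    # to see whether the server is cloaking robots.txt per crawler.
    with urllib.request.urlopen(build_request(host, user_agent)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

You'd call fetch_robots once per user-agent string from your logs and diff the responses; if they differ, the site is serving crawler-specific robots.txt files.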