Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Robots.txt and SERPS
Blocking serps with robots.txt files and more

10+ Year Member

Msg#: 293 posted 6:22 pm on Feb 17, 2004 (gmt 0)

I've been running across a few sites and should have been keeping a list of them. I've searched their robots.txt files and couldn't find any rule blocking a search engine spider. Or I could be doing it wrong. I've noticed huge link-exchange directories with no PR while the home page has a PR of 5 or greater. That doesn't make sense unless you're blocking a search engine crawl. I'd like to know if there is another way of catching this type of activity. Maybe the directories are new, but most likely not.

An example, without listing the direct URL:
I'd check for a robots.txt file by just doing the basics:

Any other ideas?

The sites below are blocking a large number of crawlers, but I can't see how they're blocking a crawl of their directory pages at

[edited by: agerhart at 6:24 pm (utc) on Feb. 17, 2004]
[edit reason] please stop dropping URLs [/edit]
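
For anyone repeating the basic robots.txt check described above, here is a minimal sketch using Python's standard urllib.robotparser. The sample rules, the /links/ path, and the helper name is_blocked are made up for illustration and are not taken from any site discussed in this thread.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, path):
    """Return True if the given robots.txt text disallows `path` for `user_agent`."""
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


# Hypothetical robots.txt: one crawler is kept out of /links/, everyone else is allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /links/

User-agent: *
Disallow:
"""

if __name__ == "__main__":
    for agent in ("Googlebot", "Slurp"):
        print(agent, is_blocked(SAMPLE_ROBOTS_TXT, agent, "/links/page1.html"))
```

If a check like this finds no blocking rule for the directory pages, the cause is probably somewhere other than the robots.txt file you were served.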



10+ Year Member

Msg#: 293 posted 6:36 pm on Feb 17, 2004 (gmt 0)

They could be using a meta tag in the page head such as '<meta name="robots" content="noindex">'
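
A rough way to check a saved page for that tag, sketched with Python's standard html.parser. The class and function names are hypothetical, and it only looks at the generic "robots" meta name, not crawler-specific names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise to "".
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.append(part.strip().lower())


def robots_directives(html):
    """Return the robots meta directives found in `html`, in document order."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(robots_directives(page))
```

A page that returns "noindex" here can be crawled but is asking not to be listed, which would not show up in robots.txt at all.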

Also, they could be serving a robots.txt specific to the user-agent. You could use a tool like wget (wget -U useragent http*//domain.tld/robots.txt) and try user-agent strings from your logfiles for various robots.
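
The same idea as the wget call can be scripted. This is only a sketch using Python's standard urllib.request. The function names are hypothetical, domain.tld is the placeholder from the post above, and the user-agent strings in the usage comment are examples you would replace with ones from your own logfiles.

```python
import urllib.request


def fetch_robots(url, user_agent, timeout=10.0):
    """Fetch `url` while sending `user_agent` as the User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def group_by_body(bodies):
    """Group user-agents that received identical robots.txt content.

    `bodies` maps user-agent string -> response text. More than one group
    in the result suggests the server varies the file by user-agent.
    """
    groups = {}
    for agent, body in bodies.items():
        groups.setdefault(body.strip(), []).append(agent)
    return list(groups.values())


# Example usage (needs network access, so it is left as a comment):
#
# agents = ["Mozilla/5.0", "Googlebot/2.1 (+http://www.google.com/bot.html)"]
# bodies = {a: fetch_robots("http://domain.tld/robots.txt", a) for a in agents}
# print(group_by_body(bodies))
```

Keep in mind that a server can also key on the requesting IP address rather than the user-agent, in which case every response fetched this way would look the same.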

[edited by: pageoneresults at 12:51 pm (utc) on Feb. 18, 2004]
[edit reason] Delinked Example [/edit]


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved