Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

robots.txt restrictions in conflict with the site's Terms
Which of them should prevail?

10+ Year Member

Msg#: 385 posted 9:11 am on May 14, 2004 (gmt 0)

Hi all!

Very useful forum! The robots.txt file of a large web site I'm looking at is:

User-agent: *
Disallow: /admin
Disallow: /empdir
Disallow: /jobman
Disallow: /jobsearch
Disallow: /reports
Disallow: /talentmatch

The few pages I intend to parse (/seeker.epl) are at the root level, and that seems to be allowed, right? The HTML files have no META tags for "robots". But here is what I found on their Terms and Conditions page:

While using the Site or Site-related services, you agree not to do any of the following without our prior written authorization:
Use any search engine, software, tool, agent or other device or mechanism, including without limitation browsers, spiders, robots, avatars or intelligent agents (other than those made available by the Site or other generally available third party web browsers, e.g., Netscape Navigator or Microsoft Internet Explorer), to navigate or search the Site.

It is not my intention to do anything illegal with my spider or to abuse their bandwidth. The already public links I intend to collect will, after all, send people to their web site for the actual content.

Question is: Which restrictions should prevail: those from their Terms page or from robots.txt? Can I use the spider according to robots.txt and simply ignore the rest?

Thanks so much,



WebmasterWorld Administrator jatar_k, WebmasterWorld Top Contributor of All Time, 10+ Year Member

Msg#: 385 posted 5:38 pm on May 15, 2004 (gmt 0)

Welcome to WebmasterWorld cristiscu,

a good question, what about just contacting them and asking their permission?

I would think that as long as you follow robots.txt, you are behaving according to the rules they set for robots, but I am not 100% sure either.
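
For anyone who wants to double-check the "is /seeker.epl allowed?" part, here is a minimal sketch using Python's standard `urllib.robotparser`. It feeds in the rules quoted in the first post and asks about a few paths. The bot name `ExampleSpider` and the extra test paths are made up for illustration. Note this only answers what robots.txt says; it tells you nothing about the Terms page.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted in the first post of this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /empdir
Disallow: /jobman
Disallow: /jobsearch
Disallow: /reports
Disallow: /talentmatch
"""

# Hypothetical user-agent name, used only for this example.
BOT = "ExampleSpider"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# /seeker.epl is the page from the original question; the other two
# paths are illustrative examples that fall under the Disallow rules.
for path in ("/seeker.epl", "/jobsearch", "/admin/login"):
    verdict = "allowed" if parser.can_fetch(BOT, path) else "disallowed"
    print(f"{path}: {verdict}")
```

Under these rules the root-level page comes back as allowed and the listed directories as disallowed, which matches the reading in the first post.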


WebmasterWorld Senior Member receptional, WebmasterWorld Top Contributor of All Time, 10+ Year Member

Msg#: 385 posted 3:24 pm on May 18, 2004 (gmt 0)

without our prior written authorization

Makes sense then to agree with the previous poster - send them an email.

That said, there is no way Googlebot has time to read terms of use! That is exactly what robots.txt is designed for. So if they moan at you, they should also moan at Google... then again, they probably don't want to do that, which is their right.

Anyway - they know how to use robots.txt, so if they want to ban your user agent then (presumably) you will let them, by recognizing a command for your bot should one arrive... but getting an OK from them in an email would be the best solution for you if you can.
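
To make the "recognizing a command for your bot" point concrete, here is a rough sketch, again with Python's standard `urllib.robotparser`. The robots.txt text below is invented: it imagines the site later adding a section aimed at one spider. `ExampleSpider` and `OtherBot` are hypothetical names, not anything from the site in question.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt: the site has added a section banning one named bot.
UPDATED_ROBOTS_TXT = """\
User-agent: ExampleSpider
Disallow: /

User-agent: *
Disallow: /admin
"""


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The named bot is shut out everywhere; other bots fall back to the "*" rules.
for agent in ("ExampleSpider", "OtherBot"):
    for path in ("/seeker.epl", "/admin"):
        print(agent, path, allowed(UPDATED_ROBOTS_TXT, agent, path))
```

A spider that re-reads robots.txt on each run and checks its own name this way would pick up such a ban without anyone having to email anyone.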


10+ Year Member

Msg#: 385 posted 6:52 am on May 19, 2004 (gmt 0)

Thanks for the feedback.

I'm afraid asking them for permission wouldn't be advisable in this case. I'm talking here about a company with revenues of $1M a day and huge traffic, not about a guy whose site's bandwidth can be affected by my robot! And I wouldn't want to give them ideas about what I do. Anyway, my indexing engine would also use other sites.

As said before, no robot looks up and reads the terms on sites and, as long as you obey robots.txt and access is not denied, nobody can complain.


I also have another concern. It appears most of you guys have your own site(s) and do not usually welcome robots (Googlebot is a lucky exception ;-)). I understand your attitude: I have my own small non-profit site, and it bothers me when I see robots opening hundreds of connections and picking up email addresses for spam.

But, with so much info on the Internet, there is a huge need for friendly robots to index specific data and present it in a more intelligent way, and to create new value for the Internet user. I truly think site owners should not deny access to any kind of robot just because they can (actually, they can only say in robots.txt when spiders are not welcome). There are no laws at this time for these issues, but hopefully we'll see some in the near future.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved