Does Alltheweb respect robots.txt?

Showing restricted directories!

         

hanuman

3:53 am on Aug 25, 2002 (gmt 0)

10+ Year Member



Does Alltheweb respect robots.txt exclusions? If so, how does one explain that running a search on "mydomainname" brings up 52,875 web pages, with the first result being mydomainname.com/restricted_directory?

Any thoughts?

Rumbas

2:03 pm on Aug 25, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Yes, Fast respects robots.txt.

It could be that your robots.txt is not set up correctly. Are you sure that you disallowed the real Fast spiders [searchengineworld.com]?

What happens if you click the link? Do you get access to the area?

If you do, maybe you should protect it?
Restricted areas should always be protected with a password. If a user can type in the URL and get a page, so will the spider.
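
One way to check whether a robots.txt really blocks a given spider is to parse it locally and ask whether that user-agent may fetch the URL. The following is only a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `FAST-WebCrawler` user-agent token, and the `is_allowed` helper are assumptions for illustration, not the poster's actual file or a confirmed crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the rules and the user-agent token are assumptions.
ROBOTS_TXT = """\
User-agent: FAST-WebCrawler
Disallow: /restricted_directory

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://mydomainname.com/restricted_directory/page.html"
    for agent in ("FAST-WebCrawler", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

A check like this only shows what the file asks for. It says nothing about whether a crawler honors it, and it does not stop a visitor from opening the URL directly.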

hanuman

11:37 pm on Aug 25, 2002 (gmt 0)

10+ Year Member



Rumbas, the robots.txt is correctly written. The directory I am restricting is only meant to be excluded from search engines and spiders, not from the public.

However, Alltheweb seems to ignore the rule and is giving this directory the highest-ranked page among the site's 50K documents.

Strange, huh?

heini

11:45 pm on Aug 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hanuman, talk to them. Fast does a lot of experimenting with crawlers; they literally have an army of spiders, so perhaps something just went wrong.
Another option: did you put up the robots.txt from the start, or is it new? Could the listed pages have been in the index since before it was in place?