DepSpid

Despite claims, distributed crawler doesn't check robots.txt

         

Pfui

2:25 am on Jan 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mozilla/4.0 (compatible; DepSpid/5.07; +http://about.depspid.net)

From info via that URL:

"If you want to prevent all robots from accessing certain places, you should specify a User-Agent: * section in your robots.txt file. If you want to only prevent the DepSpid spider from accessing certain places, you will need to use a User-Agent: DepSpid section in your robots.txt file."
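For anyone who wants to see how those two sections would play out for a crawler that does honor robots.txt, here's a rough sketch using Python's standard urllib.robotparser. The robots.txt content, the /private/ path, example.com, and the "SomeOtherBot" name are all made up for illustration. It only models a well-behaved robot and says nothing about what DepSpid actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a catch-all section plus a DepSpid-only section.
# The /private/ path is invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: DepSpid
Disallow: /
"""


def allowed(robot_name, url, robots_txt=ROBOTS_TXT):
    """Return True if a compliant robot with this name may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the short robot name, not the full User-Agent header string.
    return parser.can_fetch(robot_name, url)


if __name__ == "__main__":
    print(allowed("DepSpid", "http://example.com/index.html"))
    print(allowed("SomeOtherBot", "http://example.com/index.html"))
    print(allowed("SomeOtherBot", "http://example.com/private/page.html"))
```

With this file, a compliant DepSpid would be shut out of everything, while other robots would only be kept out of /private/.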

That would be just dandy if it asked for robots.txt in the first place. However, in the last 24 hours, in three (attempted) crawls from three different ISPs (one .uk, two .de), it's 0 for 0.

01/10 22:44:09
/

01/10 22:52:56
/

01/11 16:43:41
/
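If you'd like to run the same check on your own logs, here's a minimal sketch. It assumes Apache "combined" log format, and the sample lines below are invented (documentation-range IPs, made-up status codes and a made-up "OtherBot"), not my actual log entries. The function name robots_check is just something I picked.

```python
import re

# Assumes Apache "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented sample lines for illustration only.
SAMPLE = [
    '192.0.2.10 - - [10/Jan/2007:22:44:09 +0000] "GET / HTTP/1.1" 403 199 '
    '"-" "Mozilla/4.0 (compatible; DepSpid/5.07; +http://about.depspid.net)"',
    '192.0.2.20 - - [10/Jan/2007:22:52:56 +0000] "GET / HTTP/1.1" 403 199 '
    '"-" "Mozilla/4.0 (compatible; DepSpid/5.07; +http://about.depspid.net)"',
    '192.0.2.30 - - [11/Jan/2007:09:00:00 +0000] "GET /robots.txt HTTP/1.1" '
    '200 120 "-" "OtherBot/1.0"',
]


def robots_check(lines, token):
    """Count requests whose User-Agent contains token.

    Returns (total_requests, robots_txt_requests).
    """
    total = 0
    robots = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or token.lower() not in match.group("agent").lower():
            continue
        total += 1
        # Ignore any query string when comparing the requested path.
        if match.group("path").split("?")[0] == "/robots.txt":
            robots += 1
    return total, robots


if __name__ == "__main__":
    print(robots_check(SAMPLE, "DepSpid"))
```

A second number of zero alongside a nonzero first number means the robot made requests but never asked for robots.txt in the lines you fed it.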

Annie
(just passing through; waves to all!:)

Mokita

6:59 pm on Jan 12, 2007 (gmt 0)

10+ Year Member



Hi Annie! Long time no see. You've been missed in this forum.

I've seen DepSpid on just one site so far, and it only asked for the default home page and didn't go any further.

It gave its referrer as a site which does have a link to mine. I expect that if it comes back to crawl, it will ask for robots.txt - if it doesn't, it'll be banned very quickly.

GaryK

9:39 pm on Jan 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I second that, Annie. We miss you. I hope everything is all right.