DepSpid

Despite claims, distributed crawler doesn't check robots.txt

Pfui

2:25 am on Jan 12, 2007 (gmt 0)

Mozilla/4.0 (compatible; DepSpid/5.07; +http://about.depspid.net)

From info via that URL:

"If you want to prevent all robots from accessing certain places, you should specify a User-Agent: * section in your robots.txt file. If you want to only prevent the DepSpid spider from accessing certain places, you will need to use a User-Agent: DepSpid section in your robots.txt file."
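To make the quoted advice concrete, here is a minimal sketch of a robots.txt that shuts DepSpid out entirely while keeping every other robot out of only one directory. The /private/ path and the SomeOtherBot agent are illustrative assumptions. The check uses Python's standard urllib.robotparser as a stand-in for how a compliant crawler might read the file; how DepSpid's own parser behaves is unknown.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a DepSpid-only section plus a catch-all section.
# The /private/ path is an illustrative assumption.
ROBOTS_TXT = """\
User-agent: DepSpid
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """Return True if the rules above let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("DepSpid", "SomeOtherBot/1.0"):  # SomeOtherBot is made up
        for url in ("http://example.com/", "http://example.com/private/page"):
            print(f"{agent:18} {url:35} allowed={allowed(agent, url)}")
```

With these rules DepSpid is refused everything, while SomeOtherBot is refused only /private/. One quirk if you test this way: CPython's parser compares only the part of the agent string before the first "/", so pass the product token (DepSpid or DepSpid/5.07) rather than the full Mozilla/4.0 (compatible; DepSpid/5.07; ...) header, or the lookup falls through to the * section.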

That would be just dandy if it asked for robots.txt in the first place. However, in the last 24 hours, across three (attempted) crawls from three different ISPs (one .uk, two .de), it's 0 for 3: not a single robots.txt request.

01/10 22:44:09 /
01/10 22:52:56 /
01/11 16:43:41 /
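The log check above is easy to automate. This is a minimal sketch, assuming the access log has already been parsed into (timestamp, path, user agent) records; the Hit class and the pages_before_robots helper are hypothetical, not part of any log tool. It flags every page request a given crawler made before it first asked for /robots.txt. Because DepSpid is distributed across ISPs, it matches on the user-agent token rather than on IP address.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One access-log request (hypothetical record layout)."""
    timestamp: str   # "MM/DD HH:MM:SS", same format as the log excerpt
    path: str
    user_agent: str


DEPSPID_UA = "Mozilla/4.0 (compatible; DepSpid/5.07; +http://about.depspid.net)"

# The three requests from the post, all made with the DepSpid user agent.
HITS = [
    Hit("01/10 22:44:09", "/", DEPSPID_UA),
    Hit("01/10 22:52:56", "/", DEPSPID_UA),
    Hit("01/11 16:43:41", "/", DEPSPID_UA),
]


def pages_before_robots(hits: list[Hit], token: str) -> list[Hit]:
    """Return page requests from agents containing `token` that came before
    that agent's first /robots.txt request (all of them if it never asked)."""
    token = token.lower()
    flagged = []
    # Zero-padded "MM/DD HH:MM:SS" strings sort chronologically within one year.
    for hit in sorted(hits, key=lambda h: h.timestamp):
        if token not in hit.user_agent.lower():
            continue
        if hit.path == "/robots.txt":
            break  # the crawler has now at least fetched the rules
        flagged.append(hit)
    return flagged


if __name__ == "__main__":
    flagged = pages_before_robots(HITS, "DepSpid")
    print(f"{len(flagged)} page request(s) without a prior robots.txt fetch")
```

Run against the three requests listed above, all three are flagged, since none was preceded by a robots.txt fetch. A real log would also mix in other agents; those are simply skipped.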

Annie
(just passing through; waves to all!:)

Mokita

6:59 pm on Jan 12, 2007 (gmt 0)

Hi Annie! Long time no see. You've been missed in this forum.

I've seen DepSpid on just one site so far, and it only asked for the default home page and went no further.

It gave as its referrer a site that does link to mine. I expect that if it comes back to crawl, it will ask for robots.txt; if it doesn't, it'll be banned very quickly.

GaryK

9:39 pm on Jan 13, 2007 (gmt 0)

I second that, Annie. We miss you. I hope everything is all right.
