
DotBot

         

keyplyr

12:29 am on Jul 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Requested robots.txt, where it is disallowed, then ignored it and took 40 HTML pages.

208.115.111.*** - - [28/Jul/2008:14:55:06 -0400] "GET /robots.txt HTTP/1.0" 301 243 "-" "DotBot/1.0.1 (http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"
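For anyone who wants to pick DotBot out of their own logs, here is a minimal Python sketch that splits a line in the Apache "combined" log format (which the line above appears to use) into fields and flags the DotBot user-agent token. The regex and the names `parse_line` and `is_dotbot` are illustrative assumptions, not part of any tool mentioned in this thread.

```python
import re
from typing import Optional

# Apache/NCSA "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one combined-format log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_dotbot(entry: dict) -> bool:
    """Flag requests whose User-Agent contains the DotBot token."""
    return "DotBot" in entry["agent"]


# The log line quoted in the post (last octet masked as in the original).
line = ('208.115.111.*** - - [28/Jul/2008:14:55:06 -0400] "GET /robots.txt HTTP/1.0" 301 243 '
        '"-" "DotBot/1.0.1 (http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"')
entry = parse_line(line)
print(entry["request"], entry["status"], is_dotbot(entry))
```

Running it prints `GET /robots.txt HTTP/1.0 301 True`. The status field for that robots.txt request is 301, a redirect.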

Now banned.
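The thread doesn't say how the ban was put in place; it is often done in the web server configuration. As an illustration of the logic only, here is a minimal Python sketch that refuses a request if the client IP falls inside a banned network or the User-Agent contains a banned token. The /24 range is a hypothetical choice: it covers every address matching the three visible octets of the masked log entry instead of guessing the hidden one. `is_banned` and the two lists are illustrative names.

```python
import ipaddress

# Hypothetical ban list. The log masks the last octet, so this covers the whole
# 208.115.111.0/24 range rather than guessing a single address.
BANNED_NETWORKS = [ipaddress.ip_network("208.115.111.0/24")]
# UA token taken from the logged User-Agent string.
BANNED_AGENT_TOKENS = ["DotBot"]


def is_banned(remote_ip: str, user_agent: str) -> bool:
    """Return True if the client IP is in a banned network or the UA has a banned token."""
    addr = ipaddress.ip_address(remote_ip)
    if any(addr in net for net in BANNED_NETWORKS):
        return True
    return any(token in user_agent for token in BANNED_AGENT_TOKENS)


print(is_banned("208.115.111.20", "Mozilla/5.0"))  # True: IP inside the banned /24
print(is_banned("192.0.2.1", "DotBot/1.0.1"))      # True: UA token match
print(is_banned("192.0.2.1", "Mozilla/5.0"))       # False: neither rule matches
```

Checking the IP range as well as the UA matters because a User-Agent string is trivially changed, while the source network is not.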

They have an info page that shows how to disallow their UA via the robots.txt standard, but since they don't obey it, what's the point? They also don't explain what the data is being used for.
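A robots.txt rule only keeps a crawler out if the crawler reads and honors it, which is the complaint here. For comparison, this sketch uses Python's standard `urllib.robotparser` to show what a compliant crawler would conclude from a file that disallows DotBot. The file contents are a hypothetical example, not DotBot's published instructions or the poster's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut DotBot out entirely, leave everyone else alone
# (an empty Disallow line means "allow everything").
ROBOTS_TXT = """\
User-agent: DotBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

ua = "DotBot/1.0.1 (http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"
print(parser.can_fetch(ua, "/index.html"))                  # DotBot: disallowed
print(parser.can_fetch("SomeOtherBot/2.0", "/index.html"))  # other bots: allowed
```

The first call prints `False` and the second `True`. `urllib.robotparser` matches a `User-agent` line against the part of the crawler's UA string before the first "/", by case-insensitive substring, so the single token `DotBot` covers the full logged string.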

keyplyr

5:38 am on Jul 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Seems to be obeying robots.txt now, so DotBot may have been operating on a cached copy from an earlier crawl. Regardless, it will stay banned until their purpose is made public.

IanTurner

10:10 pm on Sep 4, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Took 1,000 pages a day for 8 days in August on one of my sites; it would be nice if they were a little more open about what they are planning.
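At that rate, 1,000 pages a day for 8 days comes to about 8,000 pages. To get a per-day figure like this from your own logs, here is a minimal sketch that tallies requests per calendar day for one user-agent token, assuming Apache combined-format lines. `pages_per_day` is a hypothetical helper, and the sample lines are made up for illustration (documentation-range IPs, not real traffic).

```python
from collections import Counter
from datetime import datetime


def pages_per_day(log_lines, agent_token):
    """Count requests per calendar day for lines whose User-Agent contains agent_token.

    Assumes Apache combined log format: the timestamp sits between the first '['
    and ']', and the User-Agent is the last double-quoted field on the line.
    """
    counts = Counter()
    for line in log_lines:
        if line.count('"') < 2 or "[" not in line:
            continue  # not a combined-format line; skip it
        agent = line.rsplit('"', 2)[-2]
        if agent_token not in agent:
            continue
        stamp = line.split("[", 1)[1].split("]", 1)[0]
        # Keep the log's own UTC offset, so "day" means the server's local day.
        day = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").date()
        counts[day] += 1
    return counts


# Made-up sample lines for illustration only.
sample = [
    '192.0.2.10 - - [10/Aug/2008:01:00:00 -0400] "GET /a.html HTTP/1.0" 200 512 "-" "DotBot/1.0.1"',
    '192.0.2.10 - - [10/Aug/2008:23:59:59 -0400] "GET /b.html HTTP/1.0" 200 512 "-" "DotBot/1.0.1"',
    '192.0.2.10 - - [11/Aug/2008:00:00:01 -0400] "GET /c.html HTTP/1.0" 200 512 "-" "DotBot/1.0.1"',
    '198.51.100.7 - - [11/Aug/2008:09:30:00 -0400] "GET /a.html HTTP/1.1" 200 512 "-" "OtherBot/2.0"',
]

for day, n in sorted(pages_per_day(sample, "DotBot").items()):
    print(day, n)
```

With the sample lines this prints `2008-08-10 2` and `2008-08-11 1`; the OtherBot request is ignored.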

wilderness

10:47 pm on Sep 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Took a 1000 pages a day for 8 days in August on one of my sites, would be nice if they were a little more open about what they are planning.

Ian,
During my time here at WebmasterWorld, and in connection with my own websites, I've learned to accept and recognize that reputable bots and/or SEs do NOT use colos (colocation facilities) for their crawling.

Harvesting, on the other hand, is an entirely different issue (pun intended).

Don

incrediBILL

11:37 pm on Sep 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's not entirely true, Don. Smaller companies with reputable bots often use colos, and even as they grow they may still be in a colo but have their own dedicated IP block and DNS entries, which mask the facility.

It's those distributed bots I worry about.

wilderness

11:49 pm on Sep 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



smaller companies with reputable bots

oxymoron?

I've a couple of widget bots that I allow; however, I kinda doubt that a non-widget site that monitors its bot traffic would allow them to crawl.