"CMS crawler"

their URL gives "Access Denied"

         

Mokita

10:00 pm on Nov 15, 2006 (gmt 0)

10+ Year Member



User Agent: CMS crawler (+http://buytaert.net/crawler/)
IP: 217.67.229.133 - seems to be located in the Netherlands.

It only requested our default (index) page. But as they refuse to reveal who they are or what they are doing, they will eat 403s from now on.
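For anyone wanting to do the same, here is a rough sketch of the idea in Python. It is only an illustration, not what I actually run: the `BLOCKED_UA_SUBSTRINGS` list and the `status_for` function are names made up for this example, and in practice you would hook the check into whatever your server or framework provides.

```python
from typing import Optional

# Hypothetical blocklist: substrings matched case-insensitively
# against the User-Agent request header.
BLOCKED_UA_SUBSTRINGS = ["cms crawler"]


def status_for(user_agent: Optional[str]) -> int:
    """Return 403 for a blocked User-Agent, otherwise 200."""
    ua = (user_agent or "").lower()
    if any(fragment in ua for fragment in BLOCKED_UA_SUBSTRINGS):
        return 403
    return 200
```

Matching on a substring rather than the full string means the block should still catch the bot if the URL part of the UA changes, at the cost of possibly catching unrelated agents that happen to contain the same words.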

Mokita

9:17 am on Nov 16, 2006 (gmt 0)

10+ Year Member



<red face>I really need to curb my trigger-happy posting fingers. I was pretty incensed at getting the "Access denied" message when I tried to load the URL quoted in the UA, and finding nothing enlightening in Google, sounded off here before investigating deeper.</red face>

When I dug further, I found this:

[buytaert.net...]

The site belongs to a PhD student who is heavily involved with Drupal. The crawler seems to be counting how many sites use a CMS.

Based on the info in his FAQ, I won't be banning it.

GaryK

5:49 pm on Nov 16, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't feel so embarrassed. We've all been there and done that. :)

I'm not sure which I dislike more: UAs with no URL in them, or UAs with bad/misleading/outdated URLs in them.

Mokita

11:11 pm on Nov 17, 2006 (gmt 0)

10+ Year Member



Thanks Gary :)

GaryK

9:57 pm on Nov 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're welcome.

CMS crawler visited my sites last week too, from the same IP Address. As with your visit, it only took the default root page.

I've also seen Drupal (+http://drupal.org/) crawling from the same IP Address as recently as November 5, 2006. So that seems to confirm what you stated about the guy being involved with Drupal.

GaryK

7:04 pm on Nov 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



He's got a new UA this week:

[buytaert.net...]

Same IP Address. Same bad URL that returns an "Access Denied" error. Still no robots.txt. He's banned.
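In case it helps anyone checking their own logs, this is a minimal sketch of how one might test whether a given IP ever asked for robots.txt. It assumes the log has already been parsed into (ip, path) pairs; the `fetched_robots_txt` function and the second sample IP are made up for illustration.

```python
def fetched_robots_txt(requests, ip):
    """Return True if `ip` requested /robots.txt at least once.

    `requests` is an iterable of (ip, path) pairs, assumed to have been
    parsed from an access log beforehand. Query strings are ignored.
    """
    return any(
        req_ip == ip and path.split("?", 1)[0] == "/robots.txt"
        for req_ip, path in requests
    )


# Illustrative sample data only, not real log entries.
sample = [
    ("217.67.229.133", "/"),         # only the default page was requested
    ("192.0.2.10", "/robots.txt"),   # hypothetical well-behaved bot
    ("192.0.2.10", "/index.html"),
]
```

With the sample above, `fetched_robots_txt(sample, "217.67.229.133")` is False and `fetched_robots_txt(sample, "192.0.2.10")` is True.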