stat statcrawler@gmail.com

just noticed this today...

         

BillyS

12:56 pm on Oct 2, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I just happened to notice this today in my logs:

66.92.*.* - - [02/Oct/2004:08:31:18 -0400] "GET /robots.txt HTTP/1.0" 200 484 "-" "stat (statcrawler@gmail.com)"
66.92.*.* - - [02/Oct/2004:08:31:19 -0400] "GET / HTTP/1.0" 200 19826 "-" "stat statcrawler@gmail.com"

Could not find anything here or on the web yet. Anyone else see this?

[edited by: Brett_Tabke at 1:36 pm (utc) on Oct. 2, 2004]

Brett_Tabke

1:36 pm on Oct 2, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



someone running a bot from home with a fake agent name.

bull

2:15 pm on Oct 2, 2004 (gmt 0)

10+ Year Member



Use an anti-lowercase rule to get rid of many of these rogue ones

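# Match user agents that start with a lowercase letter or digit
RewriteCond %{HTTP_USER_AGENT} ^[a-z0-9]+
# ...but let these two through
RewriteCond %{HTTP_USER_AGENT} !^msnbot
RewriteCond %{HTTP_USER_AGENT} !^contype
# Forbid everything except robots.txt
RewriteRule !robots\.txt - [F]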
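
To see which requests a rule set like this would turn away, here is a minimal Python sketch of the same logic. It is only an approximation for reasoning about the conditions, not how Apache evaluates them; the function name `would_be_forbidden` and the example agent strings other than the one from the log are made up for illustration.

```python
import re

# Hypothetical helper that mirrors the rewrite conditions; not part of Apache.
LOWERCASE_START = re.compile(r"^[a-z0-9]+")   # first RewriteCond
ALLOWED_PREFIXES = ("msnbot", "contype")      # the two negated RewriteConds
ROBOTS = re.compile(r"robots\.txt")           # RewriteRule pattern (unanchored)


def would_be_forbidden(user_agent: str, path: str) -> bool:
    """Return True if the rules would answer 403 for this request."""
    # RewriteRule !robots\.txt : the rule only applies when the path
    # does NOT match, so robots.txt is always served.
    if ROBOTS.search(path):
        return False
    # Agents starting with an uppercase letter (most browsers) pass.
    if not LOWERCASE_START.search(user_agent):
        return False
    # Explicitly exempted lowercase agents pass as well.
    if user_agent.startswith(ALLOWED_PREFIXES):
        return False
    return True


if __name__ == "__main__":
    for ua, path in [
        ("stat (statcrawler@gmail.com)", "/"),
        ("stat (statcrawler@gmail.com)", "/robots.txt"),
        ("msnbot/0.3", "/"),
        ("Mozilla/4.0 (compatible)", "/"),
    ]:
        print(ua, path, would_be_forbidden(ua, path))
```

Note that the match is case-sensitive, as it is in mod_rewrite without the [NC] flag, so any agent beginning with a capital letter is untouched.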

uncle_bob

6:42 pm on Oct 2, 2004 (gmt 0)

10+ Year Member



Visited me yesterday. Odd that the UA is different for the robots.txt and / requests. I'm never keen on spiders that don't give a URL in their user-agent.
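
If you want to check your own logs for the same thing, a rough Python sketch along these lines would list hosts that sent more than one user-agent string. It assumes the Apache combined log format shown in the first post; the function name `agents_by_host` is just a placeholder.

```python
import re
from collections import defaultdict

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def agents_by_host(lines):
    """Return {host: set of agents} for hosts that used more than one agent."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # silently skip lines that are not in the expected format
            seen[match.group("host")].add(match.group("agent"))
    return {host: agents for host, agents in seen.items() if len(agents) > 1}


# The two log lines quoted at the top of this thread.
sample = [
    '66.92.*.* - - [02/Oct/2004:08:31:18 -0400] "GET /robots.txt HTTP/1.0" 200 484 "-" "stat (statcrawler@gmail.com)"',
    '66.92.*.* - - [02/Oct/2004:08:31:19 -0400] "GET / HTTP/1.0" 200 19826 "-" "stat statcrawler@gmail.com"',
]

if __name__ == "__main__":
    for host, agents in agents_by_host(sample).items():
        print(host, sorted(agents))
```

To run it against a real log, pass an open file object instead of `sample`.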

bull

8:06 pm on Oct 2, 2004 (gmt 0)

10+ Year Member



Odd the UA is different for robots.txt and / requests

Rogue Yahoo bots [webmasterworld.com] do the same ;-)

aerostat

9:01 am on Oct 7, 2004 (gmt 0)

10+ Year Member



stat is an experimental crawler for a next-generation, search-engine-like service that aims to be webmaster friendly. It's interesting that it only got noticed by now after it has crawled tens of millions of pages. I haven't seen any complaints (aside from a couple of curious inquiries) yet regarding its behavior.

If you think it's misbehaving on your site, send a note to statcrawler@gmail.com and it'll be dealt with.

Thanks

wilderness

10:54 am on Oct 7, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's interesting that it only got noticed by now after it has crawled tens of millions of pages.

When did it begin identifying itself with a UA?

That should provide an insight into how many webmasters view their logs and also take steps to ensure the protection of their data.

Bots and/or bot creators have been crawling for such a length of time that most have little regard for websites or webmasters' desires.

aerostat

11:50 pm on Oct 7, 2004 (gmt 0)

10+ Year Member



It's been using the same UA from the beginning of the current crawl, since early September. It adheres to the robots.txt instructions. The current minimum page fetch interval for a site is 30 seconds (there is no scheduled delay between the fetch of robots.txt and the first page). I suspect most webmasters won't mind, as there are zero complaints so far. The intention is to create a service that is mutually beneficial to the sites crawled and the service provider.
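
For anyone curious what that policy looks like in practice, here is a minimal Python sketch of the two rules described above: obey robots.txt, and wait at least 30 seconds between page fetches on the same site, with no delay charged for the robots.txt fetch itself. This is an illustration only and not the crawler's actual code; the class `PoliteScheduler`, its method names, and the example host are all hypothetical.

```python
import time
from urllib.robotparser import RobotFileParser

MIN_INTERVAL = 30.0  # seconds between page fetches per site, as stated above


class PoliteScheduler:
    """Hypothetical helper: robots.txt checks plus a per-host fetch delay."""

    def __init__(self, user_agent, min_interval=MIN_INTERVAL, clock=time.monotonic):
        self.user_agent = user_agent
        self.min_interval = min_interval
        self.clock = clock        # injectable so the logic can be tested
        self.rules = {}           # host -> parsed robots.txt
        self.last_page = {}       # host -> time of the last page fetch

    def load_robots(self, host, robots_txt):
        """Parse robots.txt text already fetched for this host."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.rules[host] = parser

    def allowed(self, host, url):
        """A URL is fetchable only once robots.txt is loaded and permits it."""
        parser = self.rules.get(host)
        return parser is not None and parser.can_fetch(self.user_agent, url)

    def wait_needed(self, host):
        """Seconds still to wait before the next page fetch on this host."""
        last = self.last_page.get(host)
        if last is None:
            return 0.0  # first page may follow robots.txt immediately
        return max(0.0, self.min_interval - (self.clock() - last))

    def record_fetch(self, host):
        """Call after each page fetch (not after fetching robots.txt)."""
        self.last_page[host] = self.clock()
```

A crawl loop would call `allowed()` before each request, sleep for `wait_needed()` seconds, fetch the page, and then call `record_fetch()`.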

BTW, we're not affiliated with either aerostat.com or aerostat.net.

Thanks.