
Search Engine Spider and User Agent Identification Forum

possible botnet
wilderness

WebmasterWorld Senior Member; wilderness is a WebmasterWorld Top Contributor of All Time; 10+ Year Member; Top Contributors Of The Month



 
Msg#: 4474787 posted 11:47 pm on Jul 11, 2012 (gmt 0)

It's very unlikely and highly suspicious that a private user would request robots.txt and nothing else.

68.47.129.zz - - [11/Jul/2012:22:45:09 +0100] "GET /robots.txt HTTP/1.1" 200 2680 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
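A quick way to spot visitors like this in your own logs is to group requests by IP and flag the IPs whose only request was for robots.txt. Below is a minimal Python sketch, assuming Apache's combined log format (as in the line above). The function name robots_only_ips and the 192.0.2.10 sample lines are hypothetical, and lines in any other format are simply skipped.

```python
import re
from collections import defaultdict

# Apache "combined" log format (assumed; matches the log line quoted above).
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def robots_only_ips(lines):
    """Return the IPs whose every parsed request in `lines` was for /robots.txt."""
    paths_by_ip = defaultdict(set)
    for line in lines:
        match = COMBINED_RE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        paths_by_ip[match.group("ip")].add(match.group("path"))
    return {ip for ip, paths in paths_by_ip.items() if paths == {"/robots.txt"}}


if __name__ == "__main__":
    # The first line is from this thread; the other two are hypothetical.
    sample = [
        '68.47.129.zz - - [11/Jul/2012:22:45:09 +0100] "GET /robots.txt HTTP/1.1" 200 2680 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"',
        '192.0.2.10 - - [11/Jul/2012:22:50:00 +0100] "GET /robots.txt HTTP/1.1" 200 2680 "-" "ExampleBot/1.0"',
        '192.0.2.10 - - [11/Jul/2012:22:50:01 +0100] "GET / HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    ]
    print(robots_only_ips(sample))  # {'68.47.129.zz'}
```

A real log covers a longer window, so an IP that fetched only robots.txt today may still turn up for pages tomorrow; run it over the span of days you care about.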

 

tangor

WebmasterWorld Senior Member; tangor is a WebmasterWorld Top Contributor of All Time; 5+ Year Member; Top Contributors Of The Month



 
Msg#: 4474787 posted 2:14 am on Jul 12, 2012 (gmt 0)

I have done that, from time to time, but very few times!

It depends on the site; sometimes I want to see what they are doing that I am not.

wilderness

Msg#: 4474787 posted 2:24 am on Jul 12, 2012 (gmt 0)

tangor,
I guess it's OK if I do it to someone else's site; however, don't expect the same tolerance for curiosity on my own ;)

tangor

Msg#: 4474787 posted 2:35 am on Jul 12, 2012 (gmt 0)

Been there, done that, had the panic, too... but then again, robots.txt is for everyone (or should be), and that's how I handle it. Get ugly with anything else on my site... well, that's a different story! :)

lucy24

WebmasterWorld Senior Member; lucy24 is a WebmasterWorld Top Contributor of All Time; Top Contributors Of The Month



 
Msg#: 4474787 posted 4:42 am on Jul 12, 2012 (gmt 0)

very unlikely and highly suspicious

There you go, talking about me again.

Another benign explanation: It's someone in a programming class, assigned to make a crawler. They're not interested in your content, so they just pick up robots.txt because the crawler has to bring back something to prove it's been there. For appearances' sake they might also pick up the front page. And then you never see them again.
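For a feel of what such a classroom crawler might do first, here is a minimal sketch using Python's standard urllib.robotparser module. It is only an illustration, not anyone's actual assignment: the robots.txt body, the StudentBot/0.1 name and the example.com URLs are hypothetical, and the file is parsed from a string so the example runs offline (a real crawler would call set_url() and read() to fetch it).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. A real crawler would fetch it with
# rp.set_url("https://example.com/robots.txt") followed by rp.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def check_urls(robots_text, user_agent, urls):
    """Return {url: True/False} saying whether robots_text allows each URL."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    return {url: rp.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    verdicts = check_urls(
        ROBOTS_TXT,
        "StudentBot/0.1",  # hypothetical crawler name
        ["https://example.com/", "https://example.com/private/page.html"],
    )
    for url, allowed in verdicts.items():
        print(url, "allowed" if allowed else "disallowed")
```

A crawler that stops after this step leaves exactly the footprint described above: one robots.txt request and, at most, the front page.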


Over on an unrelated forum, a couple of people I know were talking about an online programming class that one of them was taking and the other was thinking about taking. The one who was thinking about it said: "I have no interest in making a web crawler; I want to handle files and data."

That confirms a long-held suspicion that this is where some of them are coming from.

Incidentally, while looking up this post I found a discussion about a legitimate mail server that unfortunately lives right in the middle of a blocked range. The site administrator had to, in his words, "poke a hole" in the middle of the block so people could get their e-mail. I know the feeling.

dstiles

WebmasterWorld Senior Member; dstiles is a WebmasterWorld Top Contributor of All Time; 5+ Year Member



 
Msg#: 4474787 posted 7:30 pm on Jul 12, 2012 (gmt 0)

It depends on why they were blocked and how legitimate they are. I sometimes get clients saying "You've blocked email from X", and it turns out the idiots are running a mail server on a broadband line: often static, but it's still a stupid idea when the whole range is in at least one good RBL. I'm the one who then has to "poke a hole" in the banned block.
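Poking a hole amounts to an exception that is checked before the range block. Here is a minimal Python sketch with the standard ipaddress module, assuming hypothetical documentation ranges: 198.51.100.0/24 is blocked and 198.51.100.25 (standing in for the mail server) is let through. The helper punch_hole, also hypothetical, shows the alternative for rule sets that can't express exceptions: split the block into the smaller ranges that remain.

```python
import ipaddress

# Hypothetical example ranges (RFC 5737 documentation prefixes).
BLOCKED = [ipaddress.ip_network("198.51.100.0/24")]
HOLES = [ipaddress.ip_network("198.51.100.25/32")]  # e.g. a legitimate mail server


def is_blocked(ip_str, blocked=BLOCKED, holes=HOLES):
    """True if ip_str falls in a blocked range and not in any hole."""
    ip = ipaddress.ip_address(ip_str)
    if any(ip in hole for hole in holes):
        return False  # the explicit exception wins over the range block
    return any(ip in net for net in blocked)


def punch_hole(net, hole):
    """Split `net` into the smaller ranges left once `hole` is removed."""
    return sorted(net.address_exclude(hole))


if __name__ == "__main__":
    print(is_blocked("198.51.100.25"))  # False: inside the hole
    print(is_blocked("198.51.100.26"))  # True: the rest of the range is still blocked
    for remaining in punch_hole(BLOCKED[0], HOLES[0]):
        print(remaining)
```

Removing a single address from a /24 this way leaves eight smaller ranges, which is why the exception-first check is usually easier to maintain.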

wilderness

Msg#: 4474787 posted 8:38 pm on Jul 12, 2012 (gmt 0)

I'm the one who then has to "poke a hole" in the banned block.


Although this may seem like a real PITA, I do it frequently to let in visitors from European ranges who are widget folks referred to me.
The only problem with these kinds of accommodations for RIPE and APNIC ranges is that dynamic IPs in those countries change drastically from day to day (at least in most instances).

grandma genie



 
Msg#: 4474787 posted 4:44 pm on Aug 20, 2012 (gmt 0)

Came in today. Just robots.txt.

68.47.129.nn "GET /robots.txt HTTP/1.1" 200 1646 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"

lucy24

Msg#: 4474787 posted 7:37 pm on Aug 20, 2012 (gmt 0)

Well, if they'd asked for anything more, that MSIE 5 would have got them blocked at the gate, wouldn't it? :) Or at least rewritten. (Mine says "I'm sorry, but the server thinks you are a robot.")

:: detour to own htaccess as I realize I forgot to exempt robots.txt from the UA-based rewrite ::
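The real fix here lives in .htaccess (mod_rewrite), but the order of the checks is the point, so here is a Python sketch of just that decision logic under assumed rules: a hypothetical pattern treats MSIE 5.x user agents as robots and sends them to a hypothetical /robot-notice.html page, while /robots.txt is exempted before the UA check so every client can still read it.

```python
import re

# Hypothetical rule: treat ancient MSIE 5.x user agents as robots.
OLD_MSIE = re.compile(r"\bMSIE 5\.\d+")
EXEMPT_PATHS = {"/robots.txt"}  # robots.txt stays readable by everyone
ROBOT_NOTICE = "/robot-notice.html"  # hypothetical "you look like a robot" page


def rewrite_target(path, user_agent):
    """Return the path that would actually be served for this request."""
    if path in EXEMPT_PATHS:
        return path  # check the exemption before any UA-based rule
    if OLD_MSIE.search(user_agent):
        return ROBOT_NOTICE
    return path


if __name__ == "__main__":
    ua = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
    print(rewrite_target("/robots.txt", ua))  # /robots.txt
    print(rewrite_target("/", ua))            # /robot-notice.html
```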

wilderness

Msg#: 4474787 posted 10:35 pm on Aug 20, 2012 (gmt 0)

There always seems to be something from Little Rock.

68.47.129.xx - - [11/Jul/2012:22:45:09 +0100] "GET /robots.txt HTTP/1.1" 200 2680 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"

dstiles

Msg#: 4474787 posted 9:27 pm on Aug 21, 2012 (gmt 0)

I have some sites that block all Comcast traffic. If Comcast were in South America, CN, KR or UA/RU, it would be completely blocked. Terrible source of rubbish. Unfortunately, one of my customers gets legitimate traffic from some of Comcast's users. :(

wilderness

Msg#: 4474787 posted 9:38 pm on Aug 21, 2012 (gmt 0)

dstiles,
They have some commercial ranges that I wouldn't hesitate to deny (as do most major providers). Unfortunately, ARIN subnet searches are no longer possible.

I get regular Comcast users who are compliant, and I believe this particular IP is just a compromised machine.

dstiles

Msg#: 4474787 posted 8:34 pm on Aug 22, 2012 (gmt 0)

Lots of compromised machines about. This week's mail spam has doubled and there are a lot more bad site hits.

I've blocked 1060 Comcast IPs over the past couple of years, ranging from single hits to a few dozen per IP. My server is a small-time operation, so that is a very big number. If I get time, perhaps I'll see whether they group easily into IP ranges, and then block those ranges.

On the other hand, BT here in the UK has been blocked on over 1160 IPs, but the UK is my normal market. Maybe I should address them as well.
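Grouping individually blocked IPs into ranges can be roughed out by counting how many distinct blocked addresses fall into each /24. This is a minimal sketch, assuming IPv4 addresses only and a hypothetical threshold of three IPs per /24; the function name candidate_ranges and the sample addresses are hypothetical, and any candidate range should still be checked against the registry (ARIN, RIPE, etc.) before it is blocked.

```python
import ipaddress
from collections import Counter


def candidate_ranges(ips, prefix=24, min_ips=3):
    """Group distinct IPv4 addresses into /prefix networks and return the
    networks that contain at least min_ips of them, sorted by address."""
    counts = Counter(
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in set(ips)
    )
    return sorted(net for net, n in counts.items() if n >= min_ips)


if __name__ == "__main__":
    # Hypothetical blocked IPs (documentation prefixes).
    blocked = ["203.0.113.1", "203.0.113.7", "203.0.113.200", "198.51.100.4"]
    for net in candidate_ranges(blocked):
        print(net)  # 203.0.113.0/24
```

The standard library's ipaddress.collapse_addresses() can then merge adjacent, aligned candidates (two neighboring /24s into a /23, for example).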

jmccormac

WebmasterWorld Senior Member; 10+ Year Member; Top Contributors Of The Month



 
Msg#: 4474787 posted 8:45 pm on Aug 22, 2012 (gmt 0)

The odd thing is that botnets don't usually request robots.txt.

Regards...jmcc

grandma genie



 
Msg#: 4474787 posted 9:47 pm on Oct 20, 2012 (gmt 0)

Ooh, look. Came in today with an actual identity:

68.47.129.nn - - [20/Oct/2012:14:37:24 -0400] "GET /robots.txt HTTP/1.1" 200 1750 "-" "Mozilla/5.0 (compatible; CompSpyBot/1.0; +h**p://www.compspy.com/spider.html)"

Same IP as the one that showed up in August.

-- GG
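Catching an IP that comes back under a new name, as this one did, comes down to collecting the distinct user agents seen per IP. A minimal Python sketch, assuming combined-format log lines in which the User-Agent is the last quoted field; the function name ips_with_new_identity and the 192.0.2.1 / 198.51.100.7 sample lines are hypothetical (they reuse the two user agents quoted in this thread).

```python
import re
from collections import defaultdict

# Capture the client IP (first field) and the User-Agent (last quoted field)
# of an Apache combined-format log line.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*"(?P<ua>[^"]*)"\s*$')


def ips_with_new_identity(lines):
    """Return {ip: [user agents in first-seen order]} for IPs seen with 2+ UAs."""
    agents = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that don't end in a quoted field
        ip, ua = match.group("ip"), match.group("ua")
        if ua not in agents[ip]:
            agents[ip].append(ua)
    return {ip: uas for ip, uas in agents.items() if len(uas) > 1}


if __name__ == "__main__":
    # Hypothetical log lines (documentation IPs).
    sample = [
        '192.0.2.1 - - [20/Aug/2012:12:00:00 -0400] "GET /robots.txt HTTP/1.1" 200 1646 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"',
        '192.0.2.1 - - [20/Oct/2012:14:37:24 -0400] "GET /robots.txt HTTP/1.1" 200 1750 "-" "Mozilla/5.0 (compatible; CompSpyBot/1.0; +h**p://www.compspy.com/spider.html)"',
        '198.51.100.7 - - [20/Oct/2012:15:00:00 -0400] "GET / HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0"',
    ]
    print(ips_with_new_identity(sample))
```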

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved