Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

possible botnet

 11:47 pm on Jul 11, 2012 (gmt 0)

It is very unlikely, and highly suspicious, that a private user would request robots.txt and "nothing else".

68.47.129.zz - - [11/Jul/2012:22:45:09 +0100] "GET /robots.txt HTTP/1.1" 200 2680 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
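
For anyone who wants to look for the same pattern in their own logs, here is a rough sketch in Python. It assumes Apache combined log format like the line above; the function name `robots_only_clients` is made up for illustration, and the host is treated as an opaque string so masked addresses still work.

```python
import re
from collections import defaultdict

# Combined log format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def robots_only_clients(lines):
    """Return the sorted hosts whose every logged request was for /robots.txt."""
    paths_by_host = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if m:  # lines that do not parse are skipped silently
            paths_by_host[m.group("host")].add(m.group("path"))
    return sorted(
        host for host, paths in paths_by_host.items() if paths == {"/robots.txt"}
    )
```

A host that shows up here is not necessarily hostile; it only means nothing else was requested within the log window you fed in.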



 2:14 am on Jul 12, 2012 (gmt 0)

I have, from time to time, but very few time to times!

Depends on site, as I sometimes wish to see what they are doing that I am not, for example.


 2:24 am on Jul 12, 2012 (gmt 0)

I guess it's o.k. if I do it to another's site, however don't expect the same curiosity on my own ;)


 2:35 am on Jul 12, 2012 (gmt 0)

Been there, done that, had the panic, too... but then again, robots.txt is for everyone (or should be) and that's the way I roll on it (robots.txt). Get ugly with anything else on my site... well that's a different story! :)


 4:42 am on Jul 12, 2012 (gmt 0)

very unlikely and highly suspicious

There you go, talking about me again.

Another benign explanation: It's someone in a programming class, assigned to make a crawler. They're not interested in your content, so they just pick up robots.txt because the crawler has to bring back something to prove it's been there. For appearances' sake they might also pick up the front page. And then you never see them again.

Over on an unrelated forum, a couple of people I know were talking about an online programming class that one of them was taking and the other was thinking about taking. The one who was thinking about it said, quote, I have no interest in making a web-crawler; I want to handle files and data.

Confirming a long-held suspicion that that's where some of them are coming from.

Incidentally, while looking up this post I found a discussion about a legitimate mail server that unfortunately lives right in the middle of a blocked range. The site administrator had to, in his words, "poke a hole" in the middle of the block so people could get their e-mail. I know the feeling.


 7:30 pm on Jul 12, 2012 (gmt 0)

Depends on why blocked and how legit. I sometimes get clients saying "You've blocked email from X" and it turns out the idiots are running a mail server on a broadband line - often static, but it's still a stupid idea when the whole range is in at least one good RBL. I'm the one who then has to "poke a hole" in the banned block.
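
As an illustration of what "poking a hole" amounts to, here is a minimal Python sketch using the standard `ipaddress` module. The helper name `poke_hole` and the example addresses (from the documentation range 192.0.2.0/24) are assumptions for the example, not anyone's real block list.

```python
import ipaddress

def poke_hole(blocked, allowed):
    """Return the networks that cover `blocked` minus `allowed`, sorted by address."""
    block = ipaddress.ip_network(blocked)
    hole = ipaddress.ip_network(allowed)
    if not hole.subnet_of(block):
        return [block]  # the allowed range is outside the block: nothing to carve
    # address_exclude yields the smallest set of networks covering block minus hole
    return sorted(block.address_exclude(hole))

# Hypothetical case: a /24 is banned, but one mail server inside it must get through.
for net in poke_hole("192.0.2.0/24", "192.0.2.25/32"):
    print(net)
```

In practice most firewalls and servers let you put an allow rule ahead of the deny rule instead, which is simpler to maintain than a split list; the sketch just shows what the remaining blocked space looks like.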


 8:38 pm on Jul 12, 2012 (gmt 0)

I'm the one who then has to "poke a hole" in the banned block.

Although this may seem like a real PITA, I do it frequently to allow visitors from Euro ranges who are widget folks referred to me.
The only problem with these types of accommodations for RIPE and APNIC ranges is that those countries' dynamic IPs change drastically from day to day (at least in most instances).

grandma genie

 4:44 pm on Aug 20, 2012 (gmt 0)

Came in today. Just robots.txt.

68.47.129.nn "GET /robots.txt HTTP/1.1" 200 1646 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"


 7:37 pm on Aug 20, 2012 (gmt 0)

Well, if they'd asked for anything more, that MSIE 5 would have got them blocked at the gate, wouldn't it? :) Or at least rewritten. (Mine says "I'm sorry, but the server thinks you are a robot.")

:: detour to own htaccess as I realize I forgot to exempt robots.txt from the UA-based rewrite ::
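
The logic of "rewrite on a stale user agent, but leave robots.txt alone" can be sketched like this. It is Python for illustration only, not the poster's actual htaccess rule; the pattern `STALE_UA` and the function `should_rewrite` are hypothetical names.

```python
import re

# Hypothetical pattern: user agents claiming to be MSIE 5 or older.
STALE_UA = re.compile(r"MSIE [1-5]\.")

def should_rewrite(path, user_agent):
    """Return True if the request should be sent to the "you are a robot" page."""
    if path == "/robots.txt":
        return False  # exempt robots.txt so every client can read the rules
    return bool(STALE_UA.search(user_agent))
```

Putting the robots.txt check first is the point: the exemption has to be evaluated before any user-agent test, otherwise the rewrite wins.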


 10:35 pm on Aug 20, 2012 (gmt 0)

There always seems to be something from Little Rock.

68.47.129.xx - - [11/Jul/2012:22:45:09 +0100] "GET /robots.txt HTTP/1.1" 200 2680 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"


 9:27 pm on Aug 21, 2012 (gmt 0)

I have some sites that have a block on all Comcast traffic. If Comcast were South America, CN, KR or UA/RU they would be completely blocked. Terrible source of rubbish. Unfortunately one of my customers gets legit traffic from some of their users. :(


 9:38 pm on Aug 21, 2012 (gmt 0)

They have some commercial ranges that I wouldn't hesitate to deny (as do most major providers). Unfortunately ARIN sub-net-searches are no longer possible.

I get regular Comcast users who are compliant, and I believe this particular IP is just a compromised machine.


 8:34 pm on Aug 22, 2012 (gmt 0)

Lots of compromised machines about. This week's mail spam has doubled and there are a lot more bad site hits.

I've blocked 1060 Comcast IPs over the past couple of years, ranging from single hits to a few dozen per IP. My server is a small-time operation so this is very big. If I get time perhaps I'll see if they easily group into IP ranges - and then block those ranges.
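
One quick way to see whether a pile of blocked IPs clusters into ranges is to count them per network. A minimal sketch, assuming IPv4 addresses as plain strings and a /24 grouping; `busiest_ranges` and its parameters are invented for the example.

```python
import ipaddress
from collections import Counter

def busiest_ranges(ips, prefix=24, min_hits=2):
    """Count addresses per /prefix network and keep networks with at least min_hits."""
    counts = Counter(
        # strict=False lets a host address stand in for its enclosing network
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        for ip in ips
    )
    return [(str(net), n) for net, n in counts.most_common() if n >= min_hits]
```

A /24 is only a starting guess. The provider's real allocations can be larger or smaller, so it is worth checking whois before turning a busy /24 into a wider block.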

On the other hand, BT here in the UK has been blocked on over 1160 IPs - but UK is my normal market. Maybe I should address them as well.


 8:45 pm on Aug 22, 2012 (gmt 0)

The odd thing is that botnets don't usually request robots.txt.


grandma genie

 9:47 pm on Oct 20, 2012 (gmt 0)

Ooh, look. Came in today with an actual identity:

68.47.129.nn - - [20/Oct/2012:14:37:24 -0400] "GET /robots.txt HTTP/1.1" 200 1750 "-" "Mozilla/5.0 (compatible; CompSpyBot/1.0; +h**p://www.compspy.com/spider.html)"

Same IP as the one that showed up in August.

-- GG
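
Spotting a visitor that changes its user agent between visits, as this one did, can be automated along these lines. A rough Python sketch: it assumes the host is the first field and the user agent is the last quoted field of each log line, and `agents_by_host` is a made-up name.

```python
import re
from collections import defaultdict

# First field is the host; the last double-quoted field is taken as the user agent.
UA_RE = re.compile(r'^(?P<host>\S+) .*"(?P<ua>[^"]*)"\s*$')

def agents_by_host(lines):
    """Return {host: sorted user agents} for hosts seen with more than one agent."""
    seen = defaultdict(set)
    for line in lines:
        m = UA_RE.match(line)
        if m:
            seen[m.group("host")].add(m.group("ua"))
    return {host: sorted(uas) for host, uas in seen.items() if len(uas) > 1}
```

On a shared or dynamic address several real people can sit behind one IP, so more than one agent per host is a hint to look closer rather than proof of anything.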


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved