Deluged with unknown bots

         

montclairguy

8:59 pm on May 14, 2010 (gmt 0)

10+ Year Member



For several years now, I've been seeing automated processes chip away at my bandwidth and resources. I've firmed up some of the code in my auto-banning apps lately, and am just amazed at how often my site is being hit. In fact, I'm concerned that I may have missed some IPs for legitimate search crawlers, but nothing I see points me in that direction.

Here's a snippet of my error_log showing access denied to these things (and this is during a lull in activity!) Has anybody seen IPs similar to these?


[Fri May 14 16:44:15 2010] [error] [client 76.167.40.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:15 2010] [error] [client 75.36.45.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:16 2010] [error] [client 75.62.133.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:17 2010] [error] [client 71.104.93.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:18 2010] [error] [client 69.228.150.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:18 2010] [error] [client 76.167.40.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:18 2010] [error] [client 76.243.65.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:19 2010] [error] [client 76.243.65.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:22 2010] [error] [client 71.106.15.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:26 2010] [error] [client 69.232.155.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:26 2010] [error] [client 69.230.73.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:27 2010] [error] [client 76.212.213.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:28 2010] [error] [client 69.230.48.*] client denied by server configuration: /cgi-bin/our_search_engine.cgi
[Fri May 14 16:44:28 2010] [error] [client 69.230.48.*] client denied by server configuration: /cgi-bin/our_search_engine.cgi
[Fri May 14 16:44:30 2010] [error] [client 69.230.71.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:30 2010] [error] [client 75.36.45.*] client denied by server configuration: /cgi-bin/our_search_engine.cgi
[Fri May 14 16:44:30 2010] [error] [client 71.104.91.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:31 2010] [error] [client 75.36.45.*] client denied by server configuration: /cgi-bin/our_search_engine.cgi
[Fri May 14 16:44:31 2010] [error] [client 75.36.45.*] client denied by server configuration: /cgi-bin/our_search_engine.cgi
[Fri May 14 16:44:31 2010] [error] [client 76.243.65.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:34 2010] [error] [client 75.62.132.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:36 2010] [error] [client 69.232.155.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:39 2010] [error] [client 69.228.150.*] client denied by server configuration: /cgi-bin/our_search_engine.cgi
[Fri May 14 16:44:39 2010] [error] [client 69.228.150.*] client denied by server configuration: /cgi-bin/our_search_engine.cgi
[Fri May 14 16:44:44 2010] [error] [client 75.36.45.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:45 2010] [error] [client 69.230.63.*] client denied by server configuration: /cgi-bin/our_search_engine.cgi
[Fri May 14 16:44:45 2010] [error] [client 69.230.63.*] client denied by server configuration: /cgi-bin/our_search_engine.cgi
[Fri May 14 16:44:45 2010] [error] [client 76.230.110.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi
[Fri May 14 16:44:50 2010] [error] [client 76.212.213.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi


User-agents are always these:


"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 (.NET CLR 3.5.30729)"
"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8 (.NET CLR 3.5.30729)"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"


Anybody have any clue as to who or what these masquerading processes are? And, yes, I'm positive they are not humans visiting my site; they never request anything but specific pages, and never any of the graphics or style sheets, and never accept any cookies.
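For anyone wanting to compare notes, the error_log excerpt earlier in this post could be tallied per client and target script with something like the following. This is only a minimal sketch: it assumes the Apache 2.2-style "client denied by server configuration" line format shown above, and the function name `tally_denials` is made up for illustration.

```python
import re
from collections import Counter

# Matches error_log lines of the form quoted in this post (assumed format).
DENIED = re.compile(
    r"\[client (?P<ip>[\d.*]+)\] client denied by server configuration: (?P<path>\S+)"
)

def tally_denials(lines):
    """Count denied requests per (client, path) pair."""
    counts = Counter()
    for line in lines:
        m = DENIED.search(line)
        if m:
            counts[(m.group("ip"), m.group("path"))] += 1
    return counts

# Three lines copied from the excerpt above.
sample = [
    "[Fri May 14 16:44:15 2010] [error] [client 76.167.40.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi",
    "[Fri May 14 16:44:18 2010] [error] [client 76.167.40.*] client denied by server configuration: /cgi-bin/our_cart_app.cgi",
    "[Fri May 14 16:44:28 2010] [error] [client 69.230.48.*] client denied by server configuration: /cgi-bin/our_search_engine.cgi",
]

if __name__ == "__main__":
    # Busiest (client, path) pairs first.
    for (ip, path), n in tally_denials(sample).most_common():
        print(n, ip, path)
```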

[edited by: incrediBILL at 8:11 pm (utc) on May 15, 2010]
[edit reason] Obscured IPs [/edit]

wilderness

3:52 pm on May 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm sure everybody is just as confused as I, regarding what you have presented here.

1) The UAs are standard browser UAs.

2) Attempting to use error logs to identify visitors is not a reliable or usable reference; rather, you need actual raw visitor logs (of which you would only present a solitary line here, with the IP's Class D octet obscured).
2a) You've only provided error logs for specific areas of your CGI. What you need to do is look at your overall raw logs and see how these log lines interact with your website(s) as a whole. Are there not any references to HTML pages?

3) I did ARIN searches on a few of these IPs (that you provided) and they were large block ranges from major service providers. Some of these providers are prone to open proxies; however, nothing you have provided here would confirm that.

4) If these specific IPs and/or IP ranges (while consistently utilizing the same UA) are constantly revisiting the same page and/or request (this is NOT any confirmation that the visitor is a bot or any other automated request), then you may utilize an access restriction in mod_rewrite based upon BOTH the UA and an IP range.
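EX:
# Return 403 Forbidden when BOTH conditions match (consecutive RewriteConds are ANDed):
# the User-Agent ends with "SV1)" and the visitor IP starts with 123.456. (a placeholder prefix).
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} SV1\)$
RewriteCond %{REMOTE_ADDR} ^123\.456\.
RewriteRule .* - [F]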

In any event there is not any simple solution or answer to your inquiry.
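Point 2a above, looking at how these clients interact with the site as a whole, might be sketched as a per-IP profile of page hits versus asset hits taken from the raw access log. This is a rough illustration only: it assumes the standard Apache common/combined log format, the asset extension list is a guess, and the sample lines use made-up documentation addresses rather than anything from this thread.

```python
import re
from collections import defaultdict

# Common/combined log format: ip ident user [time] "METHOD path PROTO" ...
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)
# Assumed list of "supporting file" extensions a real browser would fetch.
ASSET_EXT = (".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico")

def asset_profile(lines):
    """Return {ip: [page_hits, asset_hits]} for every parseable line."""
    profile = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        path = m.group("path").split("?", 1)[0].lower()
        idx = 1 if path.endswith(ASSET_EXT) else 0
        profile[m.group("ip")][idx] += 1
    return dict(profile)

def pages_only(profile, min_pages=3):
    """IPs that requested at least min_pages pages and no assets at all."""
    return sorted(
        ip for ip, (pages, assets) in profile.items()
        if pages >= min_pages and assets == 0
    )

# Hypothetical sample lines (documentation IP ranges, invented paths).
sample = [
    '192.0.2.10 - - [14/May/2010:16:44:15 -0400] "GET /cgi-bin/cart.cgi?id=1 HTTP/1.1" 200 512 "-" "UA"',
    '192.0.2.10 - - [14/May/2010:16:44:16 -0400] "GET /cgi-bin/cart.cgi?id=2 HTTP/1.1" 200 512 "-" "UA"',
    '192.0.2.10 - - [14/May/2010:16:44:17 -0400] "GET /cgi-bin/cart.cgi?id=3 HTTP/1.1" 200 512 "-" "UA"',
    '198.51.100.5 - - [14/May/2010:16:45:00 -0400] "GET /index.html HTTP/1.1" 200 900 "-" "UA"',
    '198.51.100.5 - - [14/May/2010:16:45:01 -0400] "GET /style.css HTTP/1.1" 200 300 "-" "UA"',
    '198.51.100.5 - - [14/May/2010:16:45:01 -0400] "GET /logo.png HTTP/1.1" 200 700 "-" "UA"',
]

if __name__ == "__main__":
    print(pages_only(asset_profile(sample)))
```

A client that never fetches a single asset is not proof of a bot (caches and text browsers exist), so this is best treated as one signal among several.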

montclairguy

2:38 am on May 17, 2010 (gmt 0)

10+ Year Member



My apologies, as I thought I was fairly clear. I'll expound.

I presented the error log simply to display the frequency of requests and the IPs. I felt presenting you with my access log would be rather pointless as I stated, "they never request anything but specific pages, and never any of the graphics or style sheets, and never accept any cookies." The specific pages are the two (renamed) applications listed in the error log -- our custom shopping cart application, and our custom site search engine.

The IPs are soft-banned for 8 hours at a time for following a specific pattern of requesting URLs too quickly, without accepting cookies, with a blank referrer or a referrer ending in only a top level domain. I suppose I should have included that information in my original post, but I didn't wish to give out too much information to any blackhats, spammers, scrapers, etc. who may monitor these forums. Banning the IPs is not the issue and not really what I was asking opinions on how to accomplish, but thank you for your rewrite rule, nonetheless.
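The soft-ban pattern described above (rapid requests, no cookies, blank or bare-domain referrer, 8-hour ban) could look roughly like the sketch below. It is not the actual application: the class name, the 10-second window, and the 3-hit threshold are all assumptions made for illustration, and "referrer ending in only a top level domain" is interpreted here as a referrer URL with no path at all.

```python
import time
from collections import defaultdict, deque
from urllib.parse import urlparse

BAN_SECONDS = 8 * 60 * 60   # the 8-hour soft ban mentioned above
WINDOW_SECONDS = 10         # assumed meaning of "too quickly"
MAX_HITS_IN_WINDOW = 3      # assumed threshold

class SoftBanner:
    def __init__(self):
        self.recent = defaultdict(deque)  # ip -> times of suspicious hits
        self.banned_until = {}            # ip -> time the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, 0) > now

    @staticmethod
    def suspicious(referrer, has_cookie):
        """No cookie, and referrer blank or a bare domain with no path."""
        if has_cookie:
            return False
        ref = (referrer or "").strip()
        if ref in ("", "-"):
            return True
        return urlparse(ref).path == ""

    def record(self, ip, referrer, has_cookie, now=None):
        """Register one request; return True if the client is banned."""
        now = time.time() if now is None else now
        if self.is_banned(ip, now):
            return True
        if not self.suspicious(referrer, has_cookie):
            return False
        hits = self.recent[ip]
        hits.append(now)
        # Forget hits that fell out of the sliding window.
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= MAX_HITS_IN_WINDOW:
            self.banned_until[ip] = now + BAN_SECONDS
            hits.clear()
            return True
        return False
```

Passing `now` explicitly keeps the logic testable; in a live handler it would default to the current time.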

I was / am simply curious if anyone else sees activity in the same IP blocks, performing similar requests, at a similar frequency, using the user-agents provided.

montclairguy

2:53 am on May 17, 2010 (gmt 0)

10+ Year Member



Additionally, another interesting tidbit about these bots is that they're always out of a certain area of California. Even if these processes are being routed through open proxies, it's rather odd that they're only routed through open proxies in California.

I'm more of the opinion that they may be linked to a NetEnforcers- or Cyveillance-type operation, but that is truly just a hunch and I could be way off.

dstiles

8:15 pm on May 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Plugins for (eg) Firefox can be set to check for changes to specific pages every x hours, but given that the target is a cart, that is not very likely unless you have some kind of order tracking page.

If it's only one or two hits from each IP and then the IP changes then it's possibly a botnet. I see a lot of those from both broadband and server IPs. They often show with either a non-existent path/pagename or with a probing querystring (likely if the hit is a search page or other page commonly accessed with querystrings). There are browser header checks that can help determine this.

Botnet IPs are sometimes also used for spamming and show up in spam blacklists or lists of open proxies/servers. Check in robtex.com. This can also show you /24 rDNS entries - ie you can check to see if a /24 block has dynamic (or no) rDNS (probably dynamic) or if it has proper domain names - but note: sometimes domain names are used for business DSL rDNS so you'd have to check what the block owner's business is as well - ISP or host.
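The rDNS check described above can be roughed out in a few lines. This is a heuristic sketch only: the hint words in `DYNAMIC_HINTS` are guesses at common ISP naming habits, not an authoritative list, and the hostnames in the usage note are invented. A missing rDNS entry is treated as "probably dynamic", as suggested above.

```python
import re
import socket

# Assumed tokens that often appear in dynamic/broadband rDNS names.
DYNAMIC_HINTS = re.compile(
    r"(^|[.\-])(a?dsl|dyn(amic)?|dhcp|ppp\w*|pool|cable|dialup)([.\-\d]|$)",
    re.I,
)
# Three IP-like octets embedded in the hostname, e.g. 192-0-2.
OCTETS = re.compile(r"\d{1,3}[.\-]\d{1,3}[.\-]\d{1,3}")

def reverse_name(ip):
    """Return the rDNS hostname for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None

def looks_dynamic(hostname):
    """Heuristic: no rDNS, a dynamic-pool keyword, or an embedded IP."""
    if not hostname:
        return True
    return bool(DYNAMIC_HINTS.search(hostname) or OCTETS.search(hostname))
```

For example, `looks_dynamic("adsl-192-0-2-10.dsl.example.net")` is true while `looks_dynamic("mail.example.com")` is false. As noted above, a proper-looking domain name can still be business DSL, so the block owner needs checking as well.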

montclairguy

1:43 am on May 18, 2010 (gmt 0)

10+ Year Member



Thanks for the response. The hits are non-stop from the same IP ranges. They cycle dynamically to others and pound away all day and all night. There's no guessing going on, as to what they're trying to fetch (every URL requested exists.)

However, I recently saw an IP block disappear after about 3 years of hitting my site, in the 4.232.xxx.xxx range. I actually posted about this, here, four years ago.

[webmasterworld.com ]

wilderness

3:53 am on May 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



wrong thread

enigma1

8:24 am on May 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is my log of this; I think it may be of relevance. I get them all the time too, though on one particular domain.

209.249.53.* - - [04/May/2010:14:13:03 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1 + FairShare-http://fairshare.cc)"
67.202.9.* - - [04/May/2010:14:13:31 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
75.101.196.* - - [04/May/2010:14:13:32 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
204.236.214.* - - [04/May/2010:14:13:33 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.161.* - - [04/May/2010:14:13:33 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:13:34 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:13:35 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
67.202.37.* - - [04/May/2010:14:13:35 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
204.236.214.* - - [04/May/2010:14:13:36 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.171.* - - [04/May/2010:14:13:36 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.162.* - - [04/May/2010:14:13:37 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:13:38 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.85.* - - [04/May/2010:14:13:38 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.162.* - - [04/May/2010:14:13:39 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
75.101.196.* - - [04/May/2010:14:13:39 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.85.* - - [04/May/2010:14:13:41 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:13:41 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.161.* - - [04/May/2010:14:13:42 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.86.* - - [04/May/2010:14:13:42 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.162.* - - [04/May/2010:14:13:43 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:13:46 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.171.* - - [04/May/2010:14:13:47 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.171.* - - [04/May/2010:14:13:47 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
67.202.9.* - - [04/May/2010:14:13:48 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.85.* - - [04/May/2010:14:13:48 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:13:49 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:13:49 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
67.202.37.* - - [04/May/2010:14:13:50 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
67.202.37.* - - [04/May/2010:14:13:50 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.161.* - - [04/May/2010:14:13:51 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
67.202.37.* - - [04/May/2010:14:13:51 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
204.236.214.* - - [04/May/2010:14:13:52 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:13:53 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.162.* - - [04/May/2010:14:13:53 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:13:58 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.162.* - - [04/May/2010:14:13:59 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.161.* - - [04/May/2010:14:13:59 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:14:00 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
75.101.182.* - - [04/May/2010:14:14:01 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.171.* - - [04/May/2010:14:14:01 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
75.101.196.* - - [04/May/2010:14:14:02 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.86.* - - [04/May/2010:14:14:02 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:14:03 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.74.* - - [04/May/2010:14:14:03 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.162.* - - [04/May/2010:14:14:04 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.85.* - - [04/May/2010:14:14:04 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.161.* - - [04/May/2010:14:14:05 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
75.101.196.* - - [04/May/2010:14:14:05 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.171.* - - [04/May/2010:14:14:06 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
174.129.86.* - - [04/May/2010:14:14:06 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
75.101.196.* - - [04/May/2010:14:14:07 -0400] "GET / HTTP/1.0" 301 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

Now, I cannot be certain whether the very first entry is part of this (it is the only one that includes a link in the UA). If it is, then this process could very well be initiated by other means (a service some host provides and some other entity abuses; check that cc link). The entries are sequential in the log; in this case I happened to get no other accesses in between. All attempts were denied access due to incorrect headers, which is the first layer I validate a request against.

jdMorgan

2:15 pm on May 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just wondering why you're using a 301 redirect response on these accesses, instead of a 403-Forbidden...

Jim

montclairguy

2:43 pm on May 18, 2010 (gmt 0)

10+ Year Member



@wilderness
No, I did link to the correct thread in the post. That post concerns all of this. I wrote it 4 years ago.

@enigma1
Looks -very- similar, but all of yours appear to be coming from Amazon's (abused) "aws" cloud. I'm curious as to whether you see any from a user agent claiming to be Firefox 3.0.7 or 3.0.8.

enigma1

6:32 pm on May 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@Jim, there are several reasons:
- a typical 403 response prints out a few more bytes for the forbidden message
- a 403 is more direct, and the attacker immediately knows to try something else
- a 301 can be used to monitor additional behavior of the bot. For example, what happens if we output a 301 redirect header to another domain or page, or to an invalid unique URL? Are they going to follow it? Do they record it somewhere, so that perhaps we can find it later on using the SEs?

Now, assuming a 301 is followed by the bot, it could be possible to utilize such traffic in various ways to serve your interests. The likelihood of a bot doing an RFI or SQL injection and then following a 301 or 302 redirect header is quite high. I think they would want to see what they accomplished, wouldn't they? With a 403, on the other hand, nothing happens: they know they got caught, so they will likely try something else, and there isn't a benefit for the site owner. So I typically put a script in as the 403 handler and process the request at the application level, so I can call whatever functions I need from the framework in place (instead of the typical htaccess processing).

I think redirecting the junk traffic has its uses and can offer info in the long run on how to counter them. In my opinion, in most cases they use ready-made libraries or tools, most of which follow the basic elements of the RFC specs, like a redirect header.
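The "redirect to a unique URL and see who follows" idea above might be sketched like this. Everything here is hypothetical: the `/trap/` prefix, the class name, and the in-memory storage are placeholders, and a real handler would sit inside whatever framework serves the 403 script.

```python
import secrets

class RedirectTrap:
    """Issue one-off redirect targets and record which ones get followed."""

    def __init__(self, base="/trap/"):   # hypothetical URL prefix
        self.base = base
        self.issued = {}     # token -> IP that was sent the redirect
        self.followed = []   # (token, original_ip, follower_ip)

    def issue(self, ip):
        """Build a 301 response pointing at a unique, unguessable URL."""
        token = secrets.token_hex(8)
        self.issued[token] = ip
        return 301, {"Location": self.base + token}

    def check(self, path, ip):
        """Call on every request; True if path is a trap URL we issued."""
        if not path.startswith(self.base):
            return False
        token = path[len(self.base):]
        if token not in self.issued:
            return False
        self.followed.append((token, self.issued[token], ip))
        return True
```

Because each token is unique, a later hit on that URL from a different IP, or the token turning up in a search engine, ties back to the request that originally received it.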

@montclairguy yes, I have. They are referrer spam requests, so I won't put a full list here, but here is an entry with the referrer modified.

218.186.10.* - - [07/May/2010:10:20:57 -0400] "GET / HTTP/1.1" 301 5 "spam referrer here" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 (.NET CLR 3.5.30729) FBSMTWB"

But the spam referrer is another pattern I see often, usually from machines in the Far East. One reason behind it is that if the server publishes its logs, and the spammers then force the popular spiders to index them by redirecting the spiders to the log URL, they get some sort of references from the log pages. Another reason, of course, is for them to scan for the referrer later on through the SEs and see which servers have weaknesses, etc.

montclairguy

12:32 am on May 19, 2010 (gmt 0)

10+ Year Member



I used to do a redirect to my honeypot on the last access before soft banning. I only ever found a single site where the honeypot text showed up. Have you ever been successful in locating any of your redirected destination page's text in any SE results?

Oddly, the referrer is either always blank, or my domain name, for the bots I'm discussing, so I don't think they're looking to spam my logs (which I would never publish on a bet, anyway!)

jdMorgan

2:52 am on May 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



montclairguy,

Are the accesses detailed in your first post all GETs, or POSTs, or what? I certainly wouldn't put up with anyone trying to access my cart scripts, and especially POSTing to it...

Jim

enigma1

6:44 am on May 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@montclairguy I cannot see that, because the SEs already have the valid pages. Invalid pages won't be indexed. So the only thing I could do is redirect the rogue bot to a new page (not in the SEs yet) and see if it eventually gets indexed by the SEs.

As for the logs, it is not something the site owner publishes. What happens is that the software many hosts use automatically generates error logs that are web accessible. Or they have some statistics exposed for errors, along with referrer info.

montclairguy

10:19 pm on May 19, 2010 (gmt 0)

10+ Year Member



Jim,

They're all GETs, with cycling user agents:

69.230.63.* [19/May/2010:08:53:57 -0400] "GET /cgi-bin/our_cart.cgi?perfect_arguments_to_get_a_page HTTP/1.1" 403 2810 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

69.230.63.* [19/May/2010:08:56:49 -0400] "GET /cgi-bin/our_cart.cgi?perfect_arguments_to_get_a_page HTTP/1.1" 403 2810 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

69.230.63.* [19/May/2010:08:57:39 -0400] "GET /cgi-bin/our_cart.cgi?perfect_arguments_to_get_a_page HTTP/1.1" 403 2810 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8"


My guess is that they're either content scrapers, or content "monitors" (MarkMonitor, Cyveillance, etc.). I -really- wish I could find out who or what these things are. Reports to the abuse departments of the ISPs have gone unanswered, as I'm sure they don't consider this abuse. Looks like legitimate visits, I'm sure, in their eyes -- but they definitely are not humans browsing.
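The cycling user agents shown in the three log lines above are easy to flag mechanically: group requests by client and list any client that presents more than one UA string. A minimal sketch, assuming the UA is always the last double-quoted field on the line (true for the lines quoted here); the function name `ua_rotation` is made up.

```python
import re
from collections import defaultdict

# First field is the client; the last quoted field is taken as the UA.
LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<ua>[^"]*)"\s*$')

def ua_rotation(lines, min_agents=2):
    """Return {ip: set_of_user_agents} for clients using >= min_agents UAs."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE.match(line)
        if m:
            seen[m.group("ip")].add(m.group("ua"))
    return {ip: uas for ip, uas in seen.items() if len(uas) >= min_agents}

# The three lines quoted above.
sample = [
    '69.230.63.* [19/May/2010:08:53:57 -0400] "GET /cgi-bin/our_cart.cgi?perfect_arguments_to_get_a_page HTTP/1.1" 403 2810 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
    '69.230.63.* [19/May/2010:08:56:49 -0400] "GET /cgi-bin/our_cart.cgi?perfect_arguments_to_get_a_page HTTP/1.1" 403 2810 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"',
    '69.230.63.* [19/May/2010:08:57:39 -0400] "GET /cgi-bin/our_cart.cgi?perfect_arguments_to_get_a_page HTTP/1.1" 403 2810 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8"',
]

if __name__ == "__main__":
    for ip, agents in ua_rotation(sample).items():
        print(ip, len(agents), "distinct user agents")
```

With the last octet obscured, several real machines can collapse into one key, so against unmasked logs the same check would be stricter.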

montclairguy

1:43 pm on May 20, 2010 (gmt 0)

10+ Year Member



For whatever reason, after 5(+?) years of this, the bots stopped hammering away last night around 1:00 am eastern. My logs show very minimal attempts since that time. Very odd.

montclairguy

11:41 am on May 22, 2010 (gmt 0)

10+ Year Member



After a couple of days, they're back. I must have cycled out of their rotation briefly. It's very frustrating to be unable to track these things down.

wilderness

2:45 pm on May 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's very frustrating to be unable to track these things down.


Why waste your time awaiting rhyme or reason to materialize?

Five of these ranges (your opening inquiry) are from ATT/SBC PPPoX Pool.
Copy and paste "PPPoX Pool" (absent the quotes) into an ARIN search and gather a listing of them all, at least for your records.

Then deny these five complete "PPPoX Pool" ranges.
In my previous websites, I had them denied from 2002-2010.

One range is a large Verizon block; however, the examples you provided were limited to two Class B's and three Class C's.
Deny these ranges.
Verizon has large blocks that are NOT broken down into localized sub-nets.

One of the ranges was a Road Runner.
Unless you're able to focus on a localized sub-net, then deny these two Class B's as well.

Some of the other ATT ranges are from rback20b.irvnca, which are pests as well.

In the event you perceive these denials as extreme (as you did when I offered a solution in 2006), then add denials based on multiple commands utilizing both IP range AND URIs and/or UAs.
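The narrower form of denial, IP range AND URI and/or UA, could be expressed at the application level along these lines. A sketch only: the CIDR blocks are documentation placeholders and not the ranges discussed in this thread, and the protected path prefix and the "SV1)" UA pattern are assumptions borrowed from the examples earlier in the thread.

```python
import ipaddress
import re

# Placeholder ranges; substitute the blocks gathered from an ARIN search.
DENY_RANGES = [
    ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24")
]
PROTECTED_PREFIXES = ("/cgi-bin/",)    # assumed URIs worth protecting
SUSPECT_UA = re.compile(r"SV1\)$")     # UA ending in "SV1)"

def should_deny(ip, path, user_agent):
    """Deny only when the IP is in a listed range AND the URI or UA matches."""
    addr = ipaddress.ip_address(ip)
    if not any(addr in net for net in DENY_RANGES):
        return False
    return path.startswith(PROTECTED_PREFIXES) or bool(
        SUSPECT_UA.search(user_agent)
    )
```

Compared with denying whole ranges outright, this keeps ordinary page views from those providers working while still blocking the cart and search scripts.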

montclairguy

5:56 am on May 23, 2010 (gmt 0)

10+ Year Member



Why waste your time awaiting rhyme or reason to materialize?

I'm not sure a few posts here, to see if anyone has similar experiences or has identified these IPs, qualifies as a waste of time. I was hoping someone had information which I did not, and might come forward and say, "That's definitely so-and-so" (where so-and-so is not the ISP).

In the event you perceive these denials as extreme (as you did when I offered a solution in 2006), then add denials based on multiple commands utilizing both IP range AND URIs and/or UAs.

I do, without identification; however, as stated earlier in the thread, denying them is not the issue and not why I started this topic. I already have a custom solution in place to handle banning misbehaving processes for configurable periods of time, but I definitely thank you for your advice (again) nonetheless.

The remaining information you provide is intriguing, especially that you've banned some of these ranges for eight years. The class information is also enlightening; thank you for that.

Again, thanks for the information. I do appreciate everyone's opinion.