robots.txt? Yes
(Also noted in: "amazonaws.com plays host to wide variety of bad bots [webmasterworld.com]")
[edited by: Ocean10000 at 5:32 pm (utc) on Oct. 1, 2009]
[edit reason] Breaking Hyperlink [/edit]
The XML page cannot be displayed
Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later.
--------------------------------------------------------------------------------
The server did not understand the request, or the request was invalid. Error processing resource 'http://www.w3.org/TR/xhtm...
Here's the log with referrer:
75.101.192.195 - - [01/Oct/2009:07:10:09 -0600] "GET /robots.txt HTTP/1.0" 200 397 "-" "taptubot *** please read [taptu.com...] ***"
Then, this evening, I received new friends, apparently related to the above:
75.101.136.108 - - [01/Oct/2009:11:52:59 -0600] "GET //admin/ HTTP/1.1" 403 3077 "-" "Mozilla Firefox"
75.101.136.108 - - [01/Oct/2009:11:52:59 -0600] "POST //admin/record_company.php/password_forgotten.php?action=insert HTTP/1.1" 403 3522 "-" "Mozilla Firefox"
75.101.136.108 - - [01/Oct/2009:11:53:01 -0600] "GET //images/b6f04.php?cmd=uptime HTTP/1.0" 403 2985 "-" "lwp-trivial/1.41"
75.101.136.108 - - [01/Oct/2009:18:27:45 -0600] "GET //admin/ HTTP/1.1" 403 3077 "-" "Mozilla Firefox"
75.101.136.108 - - [01/Oct/2009:18:27:45 -0600] "POST //admin/record_company.php/password_forgotten.php?action=insert HTTP/1.1" 403 3522 "-" "Mozilla Firefox"
75.101.136.108 - - [01/Oct/2009:18:27:48 -0600] "GET //images/86032.php?cmd=uptime HTTP/1.0" 403 2985 "-" "lwp-trivial/1.41"
I'm guessing this is yet another Zen Cart exploit scanner, but at this time, Milw0rm is down or otherwise unreachable for me.
I'm wondering if there's a valid correlation between these three visits? The first seemingly drove by to see if the lights were on; the next two rang my doorbell to see who's home.
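For anyone who wants to compare visits like these side by side, here is a minimal Python sketch, assuming the logs are in Apache's combined format as the lines above appear to be. The function name `summarize_by_ip` and the regex are my own illustration, not anything from the posters' setups; it only groups hits per IP so the user agents and 403 counts are easier to eyeball.

```python
import re
from collections import defaultdict

# Assumed layout (Apache "combined" format):
# IP ident user [time] "request" status size "referer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize_by_ip(lines):
    """Group log lines by client IP: total hits, 403 count, distinct user agents."""
    summary = defaultdict(lambda: {"hits": 0, "denied": 0, "agents": set()})
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        entry = summary[match.group("ip")]
        entry["hits"] += 1
        if match.group("status") == "403":
            entry["denied"] += 1
        entry["agents"].add(match.group("agent"))
    return dict(summary)

# Three of the lines quoted in the post above.
SAMPLE = [
    '75.101.192.195 - - [01/Oct/2009:07:10:09 -0600] "GET /robots.txt HTTP/1.0" 200 397 "-" "taptubot *** please read [taptu.com...] ***"',
    '75.101.136.108 - - [01/Oct/2009:11:52:59 -0600] "GET //admin/ HTTP/1.1" 403 3077 "-" "Mozilla Firefox"',
    '75.101.136.108 - - [01/Oct/2009:11:53:01 -0600] "GET //images/b6f04.php?cmd=uptime HTTP/1.0" 403 2985 "-" "lwp-trivial/1.41"',
]

if __name__ == "__main__":
    for ip, info in summarize_by_ip(SAMPLE).items():
        print(ip, info["hits"], info["denied"], sorted(info["agents"]))
```

A single IP that switches user agents within a couple of seconds, as 75.101.136.108 does here, stands out in this kind of summary. It does not by itself prove a link to the earlier robots.txt fetch.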
We've been building pages for people who want certain portions of their domain to display properly on wireless phones for quite a while now.
Nearly every time I've seen this bot, it comes in looking for things like mobi, m, and pda first.
taptubot *** please read [taptu.com...] ***
I tried the URL and got the same message; 'unable to read. . . . . ', etc.
Then the UA switched to:
Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3
and it kept hammering my site, but with no query string. Just:
GET /m/ HTTP/1.0
GET /mobile/ HTTP/1.0
GET /mobi/ HTTP/1.0
GET /iphone/ HTTP/1.0
GET /pda/ HTTP/1.0
All from 72.44.42.161 ~ Duncansville, PA
Not sure what to do with it.
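One way to spot this pattern automatically is to count how many of those mobile-directory guesses a single IP makes. The following is only a rough sketch: the path set is copied from the requests listed above, while the function name `find_mobile_probers`, the threshold of 3, and the second visitor (192.0.2.10, a documentation-only address) are assumptions made up for illustration.

```python
# Paths taken from the requests listed in the post above.
MOBILE_PROBE_PATHS = {"/m/", "/mobile/", "/mobi/", "/iphone/", "/pda/"}

def find_mobile_probers(requests, threshold=3):
    """Return a sorted list of IPs that requested at least `threshold`
    distinct mobile-directory paths.

    `requests` is an iterable of (ip, path) pairs, e.g. pulled from a log.
    The default threshold of 3 is an arbitrary, illustrative choice.
    """
    seen = {}
    for ip, path in requests:
        if path in MOBILE_PROBE_PATHS:
            seen.setdefault(ip, set()).add(path)
    return sorted(ip for ip, paths in seen.items() if len(paths) >= threshold)

SAMPLE_REQUESTS = [
    ("72.44.42.161", "/m/"),
    ("72.44.42.161", "/mobile/"),
    ("72.44.42.161", "/mobi/"),
    ("72.44.42.161", "/iphone/"),
    ("72.44.42.161", "/pda/"),
    # Hypothetical ordinary visitor, included only for contrast.
    ("192.0.2.10", "/"),
    ("192.0.2.10", "/m/"),
]

if __name__ == "__main__":
    print(find_mobile_probers(SAMPLE_REQUESTS))
```

What to do with a flagged IP (ignore it, 403 it, or serve it a mobile page) is still a judgment call; the sketch only identifies the probing.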
@OnThePike
@JimmieT
I'm confused about how this or any bot-runner's host server could relay back to you/your server/IP(s). Do you read your logs online via a program on your server? Or did you click on links in your server's log file? Or--?
(FWIW, I never, ever click log-based links so there's no referer and thus no way a bot-runner could link itself to my server/IPs. If I even think about visiting a bot's link, I run it through Google first to learn what I can about it, and/or read the page without going to the site. If things look more okay than not, then I copy-paste the URL into my browser.)
@caribguy
If any of your sites rely on Twitter traffic... A ton of Twitter-related apps/hosts hail from amazonaws.com. They usually do not ID themselves as Twitter-anything but they're definitely tracking URLs in tweets because every time a site/page gets mentioned, we're swarmed.
(Not coincidentally, amazonaws and most of the simultaneous tweet-tracking hosts are already blocked as bad bot havens.)
...amazonaws and most of the simultaneous tweet-tracking hosts are already blocked as bad bot havens - Pfui
I'm confused about how this or any bot-runner's host server could relay back to you/your server/IP(s). Do you read your logs online via a program on your server? Or did you click on links in your server's log file? Or--?
I can then read the logs, do reverse lookup if needed, and decide whether to modify .htaccess if needed. I can copy/paste the bot URL into my home browser to find out more about it. My host IP will not be affected.
My site is strictly informational. There is no need to keep trying to ‘GET’ what I don’t have, yet, even after all the redirects or 403s, they continue. The referrer was Amazonaws, in this case. I think eventually they will give up and go away.
Jim
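The reverse-lookup step described above can be scripted. This is a minimal sketch, assuming Python's standard `socket.gethostbyaddr` is an acceptable stand-in for whatever lookup tool is actually used; the helper names `reverse_lookup` and `is_hosted_at` and the stub hostname are hypothetical. The resolver is passed in as a parameter so the check can be tried without touching the network.

```python
import socket

def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the hostname that reverse DNS reports for `ip`, or None on failure."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None
    return hostname

def is_hosted_at(ip, domain, resolver=socket.gethostbyaddr):
    """True if the reverse-DNS name of `ip` is `domain` or a subdomain of it."""
    hostname = reverse_lookup(ip, resolver)
    if hostname is None:
        return False
    hostname = hostname.lower().rstrip(".")
    return hostname == domain or hostname.endswith("." + domain)

if __name__ == "__main__":
    # Stub resolver with a made-up hostname, so this demo needs no network access.
    def stub_resolver(ip):
        return ("host.example.amazonaws.com", [], [ip])

    print(is_hosted_at("75.101.136.108", "amazonaws.com", resolver=stub_resolver))
```

Reverse DNS is controlled by whoever owns the IP block, so a result like this is a hint for deciding on an .htaccess rule, not proof of who is behind the requests.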
Thanks for explaining your process. I think I misunderstood your original post :)
Oh, and re amazonaws as host (never seen it as Referer) -- I've been watching that host for months now because the service is a haven for bad bots and iffy UAs, annoyingly undeterred by 403s. Expect more.
@keyplyr
Just musing here, but from the looks of AWS's Twitter fellow travelers, 403s don't seem to impact subsequent, apparently personal hits. A while back, I followed up on some of the bots' sites and our tweeted links appeared in their repackaged info.
Next time I see another swarm while it's happening, I'll re-check.