Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

CityGrid.com URL Validation

 9:48 pm on Dec 4, 2012 (gmt 0)

Amazon.com, Inc.
Amazon Web Services, Elastic Compute Cloud, EC2
1200 12th Avenue South

Nice of them to make it appear legit.



 8:45 pm on Dec 5, 2012 (gmt 0)

Why do you think it appears legit? Anything Amazon should be blocked with prejudice! They host hundreds of nasty little me-too bots. There are at least a couple of threads hereabouts giving full IP ranges.


 11:10 pm on Dec 5, 2012 (gmt 0)

The Search utility at the top of the page is a great tool for checking whether a topic has been discussed before. Oftentimes, if members wish to add more info to a previously discussed topic that has since been closed, they will link to that specific discussion in their post.

Amazon EC2 & its ranges may have been the most discussed topic within the Search Engine Spider and User Agent Identification Forum.


 1:13 am on Dec 6, 2012 (gmt 0)

I've just this instant (really) come from updating my shared htaccess*, and would have been awfully annoyed if it turned out I'd missed one. Gosh, what a lot you can shave off the filesize by putting multiple Deny-froms in a single line, as in

Deny from 1.3 1.8 1.45
Deny from
Deny from
Deny from

(Not AWS, obviously, but same principle. Cut the size by 1/3, and I didn't even consolidate very much.)

* Multiple domains in the same userspace. You can't do everything in a shared htaccess-- and some things you wouldn't want to-- but consolidating the mod_auth_whatsit is a huge timesaver.
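For anyone who would rather script the consolidation than do it by hand, here is a minimal sketch of the multiple-entries-per-line idea. The function name `pack_deny_lines` and the `per_line` limit are made up for illustration; the only thing taken from the post above is that one `Deny from` line can carry several space-separated entries.

```python
def pack_deny_lines(entries, per_line=8):
    """Group address entries into 'Deny from' lines, several per line.

    `entries` are strings exactly as you would type them after
    'Deny from' (partial IPs, CIDR ranges, and so on). They are not
    validated here; this only repacks them.
    """
    # Drop exact duplicates while keeping the original order.
    unique = list(dict.fromkeys(entries))
    return [
        "Deny from " + " ".join(unique[i:i + per_line])
        for i in range(0, len(unique), per_line)
    ]


if __name__ == "__main__":
    # Same three entries as the example line above, plus one duplicate.
    for line in pack_deny_lines(["1.3", "1.8", "1.45", "1.8"], per_line=3):
        print(line)
```

The saving comes from not repeating the directive name and the line break for every entry, so how much you gain depends on how many entries you allow per line.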


 1:25 am on Dec 6, 2012 (gmt 0)

When I consolidated mine, it went from 36k to 19k. (But I also broadened many ranges so that they swallowed smaller ranges, mostly Chinese.)
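A rough sketch of the range-merging part, assuming the list is kept in CIDR notation. It uses Python's standard `ipaddress` module; the function name `consolidate` and the sample ranges (reserved documentation blocks) are placeholders, not anyone's real blocklist. Note that this only merges ranges that are adjacent or already covered by a wider one. It does not broaden anything, so it never blocks an address that was not blocked before.

```python
import ipaddress


def consolidate(cidrs):
    """Merge adjacent networks and drop ones already covered by a wider range."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


if __name__ == "__main__":
    sample = [
        "192.0.2.0/25",      # these two halves are adjacent...
        "192.0.2.128/25",    # ...and merge into a single /24
        "198.51.100.0/24",
        "198.51.100.16/28",  # already inside the /24 above, so it is dropped
    ]
    for net in consolidate(sample):
        print("Deny from " + net)
```

Deliberately widening a range, as described above, is a judgment call about collateral blocking and is left to the person maintaining the list.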


 7:16 am on Dec 6, 2012 (gmt 0)

When I said legit, I meant that they are attempting to look like maybe a yellow pages bot or something.

I don't mean to annoy anyone here with my posts, but running a site with very heavy traffic, I like to share the odd user agents that come along for those who are bot blockers.

You never know when it will come through some proxy or something.


 8:28 pm on Dec 6, 2012 (gmt 0)

I have my system arranged such that it checks not only the actual IP but any proxy IPs as well. If a proxy IP is in a blocked range (server farm, high-abuse-rate IP, etc.), then the actual IP is blocked as well: it usually means the idiot using the real IP has a virus that has made the machine part of a botnet or some such.

An example of this occurred during the past few days - several different IPs came in with a variety of UAs etc but ALL with a single proxy IP that placed the requester at a UK server farm in the range 88.208.193.nnn. This is not the first time this server range has attacked via proxies: it happened several months ago as well.
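The check described above might look roughly like the sketch below. The post does not say how the proxy IPs are obtained, so this assumes they arrive as a comma-separated string in the style of an `X-Forwarded-For` header, which is an assumption on my part. The names `BLOCKED`, `is_blocked` and `should_block` are hypothetical, and the single blocked range is the 88.208.193.nnn block mentioned in the post, written as a /24.

```python
import ipaddress

# Blocked ranges; only the one range named in the post is listed here.
BLOCKED = [ipaddress.ip_network("88.208.193.0/24")]


def is_blocked(ip_text):
    """Return True if ip_text parses as an IP inside any blocked range."""
    try:
        ip = ipaddress.ip_address(ip_text.strip())
    except ValueError:
        # Not an IP at all (for example "unknown"); treat as not matching.
        return False
    return any(ip in net for net in BLOCKED)


def should_block(remote_addr, forwarded_for=""):
    """Block if the connecting IP or any listed proxy IP is in a blocked range."""
    proxies = [p for p in forwarded_for.split(",") if p.strip()]
    return any(is_blocked(c) for c in [remote_addr] + proxies)


if __name__ == "__main__":
    # Connecting IP looks harmless, but the proxy IP is in the blocked range.
    print(should_block("203.0.113.7", "88.208.193.10"))
    # No proxy information and a connecting IP outside every blocked range.
    print(should_block("203.0.113.7"))
```

Keep in mind that forwarded-for values are supplied by the client and can be forged, so a match is a reason to block, while the absence of one proves nothing.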


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved