Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Amazon
wilderness




msg:3229436
 11:56 pm on Jan 23, 2007 (gmt 0)

There's an old thread on this, now closed:
[webmasterworld.com...]

Today
216.182.238.102 - - [23/Jan/2007:14:54:56 -0800] "GET /robots.txt HTTP/1.1" 403 - "-" "O#*$!earch/1.x (www.o#*$!earch.com)"

 

thetrasher




msg:3230158
 2:37 pm on Jan 24, 2007 (gmt 0)

SMBot?! ([webmasterworld.com ])

Advertised website openisearch.com is hosted by "specificmedia".

GaryK




msg:3230562
 7:51 pm on Jan 24, 2007 (gmt 0)

O#*$!earch/1.x (www.o#*$!earch.com)

Are the odd characters part of the UA or yet another bug in the software?

For people using PHP's get_browser() function, adding this to their browscap.ini file without quotation marks will cause errors. That would make this a malicious bot. Would Amazon do something like that?

Am I missing anything, Don? Thanks.

thetrasher




msg:3230613
 8:35 pm on Jan 24, 2007 (gmt 0)

Gary, it's censorship by WebmasterWorld.

o#*$!earch is openisearch, but there is a "bad" word between O and E. I think Specificmedia knows about WebmasterWorld's censorship.

Amazon is not running bots from 216.182.224.0/20! They sell computer power and bandwidth to anyone. It's like a temporary virtual server. See here: [webmasterworld.com...]

wilderness




msg:3230621
 8:43 pm on Jan 24, 2007 (gmt 0)

Gary and trasher,
I read a recent announcement (I believe in the IAR [renamed] releases) where Amazon, eBay and another company were partnering in a venture that was SE related.

The bot name is " open I search", all one word.

I believe the forum censor is screening the alternative word for phallus.

GaryK




msg:3231065
 5:28 am on Jan 25, 2007 (gmt 0)

I always thought the dirty words filter used asterisks. Oh well.

wilderness




msg:3233707
 5:49 am on Jan 27, 2007 (gmt 0)

just a heads up.

216.182.233.215 - - [26/Jan/2007:20:37:04 -0800] "GET /robots.txt HTTP/1.0" 403 - "-" "complex_network_group/Nutch-0.9-dev (discovering the structure of the world-wide-web; [cantor.ee.ucla.edu...] nimakhaj@gmail.com)"

hybrid6studios




msg:3244974
 9:48 am on Feb 7, 2007 (gmt 0)

I'm pretty sure this is either the little brother of SMBot or its replacement. Can anyone else confirm that this is run by Specific Media? Sure smells like it. Since we discussed it and I started banning it, SMBot has completely quit hitting my sites, and OpenISearch has picked up where it left off, slamming my sites even worse than SMBot did.

Here are some interesting similarities with SMBot:

1) OpenISearch has the same format for the User-Agent:
- OpenISearch User-Agent: "OpenISearch/1.x (www.openisearch.com)"
- SMBot User-Agent: "SMBot/1.1 (www.specificmedia.com)"
2) The web sites are a very similar design style.
3) OpenISearch and SMBot both come from the same IP block (216.182.236.*, 216.182.237.*, 216.182.238.*) and server at Amazon Web Services (compute.amazonaws.com).
4) Both domains are registered to "Domains by Proxy".
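
To illustrate point 1, here is a small Python sketch that checks whether a User-Agent string follows the "Name/version (url)" shape both bots use. The pattern and the `parse_ua` helper are my own assumptions for illustration, not anything published by either bot's operator. A shared shape on its own is weak evidence, since other crawlers use similar layouts.

```python
import re

# Hypothetical pattern for the "Name/version (url)" layout seen in both UAs.
UA_SHAPE = re.compile(
    r'^(?P<name>[^/\s]+)/(?P<version>\S+) \((?P<url>[^()\s]+)\)$'
)

def parse_ua(user_agent):
    """Return name, version and url as a dict, or None if the shape differs."""
    match = UA_SHAPE.match(user_agent)
    return match.groupdict() if match else None

if __name__ == "__main__":
    for ua in (
        "OpenISearch/1.x (www.openisearch.com)",
        "SMBot/1.1 (www.specificmedia.com)",
    ):
        print(parse_ua(ua))
```

A User-Agent with free text inside the parentheses, like the Nutch one quoted earlier in this thread, does not fit this shape and returns None.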

I went to the site listed in the User-Agent, www.OpenISearch.com, and it's a front. It claims to be "The Ultimate Search Engine" that will have "more results than all other search engines combined". They're planning to overtake Google, Yahoo, and MSN? Have fun with that.

None of the links on the page are even working...it claims to be "Coming Soon". Hmmm...

Anyone else have info on OpenISearch/SMBot? Please contribute.

hybrid6studios




msg:3244989
 10:11 am on Feb 7, 2007 (gmt 0)

I went through my logs again and found more IP blocks that these bots have in common. Here's my complete list:

216.182.225.*
216.182.228.*
216.182.230.*
216.182.231.*
216.182.233.*
216.182.236.*
216.182.237.*
216.182.238.*
216.182.239.*

wilderness




msg:3245265
 3:11 pm on Feb 7, 2007 (gmt 0)

RewriteCond %{REMOTE_ADDR} ^216\.182\.2(2[4-9]|3[0-9])\. [OR]
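
For anyone who wants to sanity-check that condition before deploying it, here is a quick Python sketch that runs the same pattern against a few addresses. The alternation is written with an ordinary `|` (the forum software displays it as a broken bar). The `is_blocked` helper is hypothetical, and I am assuming Python's `re` and Apache's regex engine treat this simple pattern the same way.

```python
import re

# Same pattern as the RewriteCond, with "|" as the alternation character.
PATTERN = re.compile(r'^216\.182\.2(2[4-9]|3[0-9])\.')

def is_blocked(ip):
    """True if the address falls in 216.182.224.* through 216.182.239.*."""
    return PATTERN.search(ip) is not None

if __name__ == "__main__":
    # The first two addresses come from log lines in this thread;
    # the last two sit just outside the range on either side.
    for ip in ("216.182.238.102", "216.182.233.215",
               "216.182.223.1", "216.182.240.1"):
        print(ip, is_blocked(ip))
```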

hybrid6studios




msg:3246138
 9:07 am on Feb 8, 2007 (gmt 0)

Thanks for the info wilderness. I'm guessing you've had it hit a few of your sites?

wilderness




msg:3246601
 5:41 pm on Feb 8, 2007 (gmt 0)

I'm guessing you've had it hit a few of your sites?

In early December I added the range as a result of threads referenced in this thread.
OpenI has been relentlessly eating 403s from the IP range denial.
OpenI also catches a SetEnvIf for "Open".

In addition, I'm getting some slight traffic from the following (of course, the request below trips three rules: one for the IP range (same Class C as OpenI), one for Nutch, and one for "crawl").

216.182.236.zz - - [05/Feb/2007:18:47:20 -0800] "GET /robots.txt HTTP/1.0" 403 - "-" "complex_network_group/Nutch-0.9-dev (discovering the structure of the world-wide-web; [cantor.ee.ucla.edu...] nimakhaj@gmail.com)"

As a result of the four rules implemented in SetEnvIf (my SetEnvIf and deny from rules are not configured to allow the reading of robots.txt, whereas my Rewrites for specific IP ranges do allow access to robots.txt), neither bot is able to read robots.txt, and both are stuck in a 403 loop.
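
To make the overlap between rules concrete, here is a rough Python model of the behaviour described above. It is not the actual server config, which isn't shown in this thread: the rule labels and the `tripped_rules` and `status_for` helpers are invented for illustration. The substrings and the /20 range are the ones mentioned in the thread, and matching is case-sensitive, as plain SetEnvIf is.

```python
import ipaddress

# Range quoted earlier in the thread for the Amazon-hosted addresses.
BLOCKED_NET = ipaddress.ip_network("216.182.224.0/20")

# Substrings named in the post; the real SetEnvIf patterns are assumed.
UA_SUBSTRINGS = ("Open", "Nutch", "crawl")

def tripped_rules(ip, user_agent):
    """List every rule (hypothetical labels) that a request would trip."""
    rules = []
    if ipaddress.ip_address(ip) in BLOCKED_NET:
        rules.append("ip-range")
    for needle in UA_SUBSTRINGS:
        if needle in user_agent:  # case-sensitive substring match
            rules.append("ua:" + needle)
    return rules

def status_for(ip, user_agent, path):
    """No robots.txt exemption here, so the path never changes the result."""
    return 403 if tripped_rules(ip, user_agent) else 200

if __name__ == "__main__":
    ua = "OpenISearch/1.x (www.openisearch.com)"
    print(tripped_rules("216.182.238.102", ua))
    print(status_for("216.182.238.102", ua, "/robots.txt"))
```

Because the path is ignored, a blocked bot gets a 403 for robots.txt too, which is why it never learns that it is disallowed and keeps coming back.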

hybrid6studios




msg:3249148
 10:43 am on Feb 11, 2007 (gmt 0)

Thanks, Wilderness. So the updated range is 216.182.224.* - 216.182.239.* (for newbies not familiar with regex).
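
For those who prefer CIDR notation to wildcards, here is a short Python sketch using the standard `ipaddress` module. It collapses that range into network blocks and checks that the nine blocks I listed from my logs all fall inside the result. The variable names are just for illustration.

```python
import ipaddress

# First and last address of 216.182.224.* - 216.182.239.*
first = ipaddress.IPv4Address("216.182.224.0")
last = ipaddress.IPv4Address("216.182.239.255")

# Collapse the range into the smallest list of CIDR networks.
networks = list(ipaddress.summarize_address_range(first, last))
print(networks)

# Third octets of the blocks seen in the logs earlier in this thread.
observed = [225, 228, 230, 231, 233, 236, 237, 238, 239]
observed_blocks = [ipaddress.ip_network(f"216.182.{n}.0/24") for n in observed]

# Every observed /24 should sit inside the first summarized network.
all_inside = all(block.subnet_of(networks[0]) for block in observed_blocks)
print(all_inside)
```

The range collapses to the single block 216.182.224.0/20, the same one thetrasher quoted earlier.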


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved