216.182.238.102 - - [23/Jan/2007:14:54:56 -0800] "GET /robots.txt HTTP/1.1" 403 - "-" "O#*$!earch/1.x (www.o#*$!earch.com)"
O#*$!earch/1.x (www.o#*$!earch.com)
For people using PHP's get_browser() function, adding this to their browscap.ini file without quotation marks will cause errors. That would make this a malicious bot. Would Amazon do something like that?
Am I missing anything Don? Thanks.
o#*$!earch is openisearch, but there is a "bad" word between O and E. I think Specificmedia knows about WebmasterWorld's censorship.
Amazon is not running bots from 216.182.224.0/20! They sell computer power and bandwidth to anyone. It's like a temporary virtual server. See here: [webmasterworld.com...]
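If you want to check whether an address from your own logs falls inside that range, here is a minimal Python sketch using the standard ipaddress module. Only the 216.182.224.0/20 range comes from this thread; the function name and the sample addresses outside the range are made up for illustration.

```python
import ipaddress

# The range quoted above; everything else here is illustrative.
EC2_RANGE = ipaddress.ip_network("216.182.224.0/20")


def in_ec2_range(ip: str) -> bool:
    """Return True if the address is inside 216.182.224.0/20."""
    return ipaddress.ip_address(ip) in EC2_RANGE


if __name__ == "__main__":
    # The first two addresses appear in the log lines in this thread.
    for ip in ("216.182.238.102", "216.182.233.215", "10.0.0.1"):
        print(ip, in_ec2_range(ip))
```

A /20 covers 216.182.224.0 through 216.182.239.255, so all of the 216.182.233.*, 216.182.236.*, 216.182.237.* and 216.182.238.* hits reported here land inside it.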
216.182.233.215 - - [26/Jan/2007:20:37:04 -0800] "GET /robots.txt HTTP/1.0" 403 - "-" "complex_network_group/Nutch-0.9-dev (discovering the structure of the world-wide-web; [cantor.ee.ucla.edu...] nimakhaj@gmail.com)"
Here are some interesting similarities with SMBot:
1) OpenISearch has the same format for the User-Agent:
- OpenISearch User-Agent: "OpenISearch/1.x (www.openisearch.com)"
- SMBot User-Agent: "SMBot/1.1 (www.specificmedia.com)"
2) The web sites have a very similar design style.
3) OpenISearch and SMBot both come from the same IP block (216.182.236.*, 216.182.237.*, 216.182.238.*) and server at Amazon Web Services (compute.amazonaws.com).
4) Both domains are registered to "Domains by Proxy".
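To make point 1 concrete, the shared "Name/version (url)" shape can be matched with one regular expression. This is only a rough sketch: the pattern and the parse_ua helper are my own guess at the format, not anything published by either bot's operator.

```python
import re

# Hypothetical pattern for the "Name/version (url)" shape both bots use.
# The parenthesised part must be a single token with no spaces.
UA_PATTERN = re.compile(
    r"^(?P<name>[^/\s]+)/(?P<version>\S+) \((?P<url>[^)\s]+)\)$"
)


def parse_ua(user_agent: str):
    """Return name/version/url as a dict, or None if the shape differs."""
    match = UA_PATTERN.match(user_agent)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for ua in (
        "OpenISearch/1.x (www.openisearch.com)",
        "SMBot/1.1 (www.specificmedia.com)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ):
        print(ua, "->", parse_ua(ua))
```

Both bot strings parse cleanly, while a typical browser string does not because its parenthesised part contains spaces. Plenty of unrelated bots use a similar shape, so a match on its own proves nothing; it is just one of the similarities listed.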
Went to the site listed in the User-Agent, www.OpenISearch.com, and it's a front. Claims to be "The Ultimate Search Engine", that will have "more results than all other search engines combined". They're planning to overtake Google, Yahoo, and MSN? Have fun with that.
None of the links on the page are even working...it claims to be "Coming Soon". Hmmm...
Anyone else have info on OpenISearch/SMBot? Please contribute.
I'm guessing you've had it hit a few of your sites?
In early December I added the range as a result of threads referenced in this thread.
OpenI has been relentlessly eating 403s from the IP range denial.
OpenI also catches a SetEnvIf for "Open".
In addition, I'm getting some slight traffic from the following. (Of course, the entry below trips three rules: one for the IP range (same Class C as OpenI), one for "Nutch", and one for "crawl".)
216.182.236.zz - - [05/Feb/2007:18:47:20 -0800] "GET /robots.txt HTTP/1.0" 403 - "-" "complex_network_group/Nutch-0.9-dev (discovering the structure of the world-wide-web; [cantor.ee.ucla.edu...] nimakhaj@gmail.com)"
As a result of the four rules implemented in SetEnvIf (my SetEnvIf and deny from's are not configured to allow the reading of robots.txt, whereas my Rewrites for specific IP ranges do allow access to robots.txt), neither bot is able to read robots.txt, and both are stuck in a 403 loop.
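For anyone trying to picture why robots.txt gets a 403 too, here is a small Python model of that kind of rule set. It is not my actual config. The /20 range, plain case-sensitive substring matching, and the exempt_robots switch are all assumptions made for the sketch.

```python
import ipaddress

# Assumed stand-ins for the four rules: one IP range plus three
# User-Agent substrings. Matching is case-sensitive on purpose.
DENIED_RANGE = ipaddress.ip_network("216.182.224.0/20")
DENIED_UA_SUBSTRINGS = ("Open", "Nutch", "crawl")


def matched_rules(ip: str, user_agent: str) -> list:
    """List every deny rule that a request trips."""
    hits = []
    if ipaddress.ip_address(ip) in DENIED_RANGE:
        hits.append("ip-range")
    hits.extend(s for s in DENIED_UA_SUBSTRINGS if s in user_agent)
    return hits


def status(path: str, ip: str, user_agent: str, exempt_robots: bool = False) -> int:
    """Return 403 if any rule matches, unless robots.txt is exempted."""
    if exempt_robots and path == "/robots.txt":
        return 200
    return 403 if matched_rules(ip, user_agent) else 200


if __name__ == "__main__":
    ip = "216.182.238.102"
    ua = "OpenISearch/1.x (www.openisearch.com)"
    print(matched_rules(ip, ua))
    print(status("/robots.txt", ip, ua))
    print(status("/robots.txt", ip, ua, exempt_robots=True))
```

With no exemption, the bot never gets to see a Disallow line, so it has nothing to obey and just keeps collecting 403s. With the exemption, robots.txt is served and every other path is still refused.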