This morning 64.62.175.137 sucked me for in excess of 570 MB of bandwidth. The IP belongs to Hurricane Electric in California; whether it is them directly, I have no idea.
I have written to them asking for an explanation. Has anyone else seen such activity?
Thanks.
AddType application/x-httpd-php .htm .html
Options -Indexes
RewriteEngine on
RewriteCond %{HTTP_HOST} ^site\.com [OR]
RewriteCond %{HTTP_HOST} ^anothersite\.com [OR]
RewriteCond %{HTTP_HOST} ^www\.anothersite\.com
RewriteRule ^(.*) [site.com...] [L,R=301]
RewriteCond %{HTTP_USER_AGENT} ^Missigua [OR]
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer
RewriteRule ^.* - [F]
Do I need to add "_Bot"?
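A minimal sketch of why "_Bot" should not be needed, assuming mod_rewrite is enabled and the user-agent really does start with "OmniExplorer". The pattern has a leading anchor (^) but no trailing anchor ($), so it behaves as a begins-with match and anything after the prefix is ignored:

```apache
# Begins-with match: "^" pins the pattern to the start of the
# User-Agent string, and there is no "$", so whatever follows the
# prefix ("_Bot/1.09 ...", "_Bot/2.82 ...") does not matter.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer
RewriteRule .* - [F]
```

Adding "_Bot" would only make the match narrower; it would not block anything the shorter pattern misses.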
Relevant to our discussion:
Omni-Explorer
Omni-Explorer is a stealth-mode venture-backed startup based in Silicon Valley. Stay tuned to this site; we plan on launching shortly.
The Omni-Crawler
If you have found this page because of the Omni-crawler, please bear with us. We hope to be able to point many more users to the valuable content on your site shortly.
If you are finding our crawler overly burdensome on your site or prefer not to be included, you can exclude our crawler. Simply insert the following into your robots.txt file (if you don't know what one is, see the Robotstxt.org site).
The Omni-Explorer agent is: OmniExplorer_Bot/1.09. To prevent it from crawling your site, please put the following in your robots.txt file:
User-Agent: OmniExplorer_Bot/1.09
Disallow: *
We will also obey the delay directive (in seconds) for how long to wait between page views on your site:
Crawl-delay: 2
We know some of you got hit by an earlier version of our crawler that was particularly...well, hungry. For that, we sincerely apologize.
Please feel free to send us feedback. If you feel that the crawlers are not matching the behavior stated on this page, please include the HTTP log file lines and your robots.txt file (or site) so we can verify the issue. Thank you.
So ;)
You expect webmasters to trust in the possibility of a change in policy and methods given the history of the bot and/or
"stealth-mode venture-backed startup"
Actually, no. Not really. They tried to hit one of my sites again after I excluded them. They didn't get very far. :)
I just thought the "stealth-mode venture-backed startup" an interesting choice of words myself, and figured others might agree.
I also saw that someone earlier in the thread posted that they had seen a version 1.10 bot, so I'm wondering if it's the same "stealth-mode" group, or someone else who is mimicking their user-agent.
The Omni-Explorer site is now (sort-of) online:
What does that mean? Where did you pull the info, mdreher?
I see crazy crawling across 4-5 different sites I monitor as of this week that we've tried to block. With no discernible benefit from any research on this bot, or any proof that it obeys a robots.txt file, we are left with little choice but to waste time figuring out who it is and whether they are really legit. So then we try to block it... am I wrong here?
(URL necessary here to show original source.)
And no, I'm not with them - I don't endorse them. Just another person who got hit by them as well. Mine is a low-traffic site, and let's just say the sudden bandwidth spike made me take notice.
(Edit to clarify and reinforce Wilderness' understanding of my position.)
In terms of using robots.txt (as they recommend), there seem to be many versions, so I'm guessing you would have to list each one?
I’m going to try…
RewriteCond %{HTTP_USER_AGENT} ^Omni [NC]
jd01, what's the significance of the “no case”?
[google.com...]
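For reference, a short sketch of what the flag does, assuming the same mod_rewrite setup as the rules earlier in the thread. [NC] ("no case") on a RewriteCond makes the pattern comparison case-insensitive:

```apache
# With [NC], this one condition matches user-agents beginning with
# "Omni", "omni", "OMNI", and so on. Without the flag, only the
# exact capitalisation written in the pattern would match.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^omni [NC]
RewriteRule .* - [F]
```

Every version string reported so far starts with a capital "Omni", so the flag is a safety net rather than a requirement here.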
OmniExplorer_Bot/3.06d (+http://www.omni-explorer.com) WorldIndexer
Came back this morning and found this batch:
65.19.150.212
65.19.150.231
65.19.150.227
65.19.150.244
65.19.150.244
65.19.150.222
Their web site claims the IP address range is 65.19.150.193 - 65.19.150.254, which is either incorrect or someone else is using their agent name to slide in under the radar. They also claim to be a venture-backed Silicon Valley startup, but the domain is registered in Oregon.
Additional info provided:
The Omni-Explorer agent is: OmniExplorer_Bot/1.09. To prevent it from crawling your site, please put the following in your robots.txt file:
User-Agent: OmniExplorer_Bot
Disallow: /
We will also obey the delay directive (in seconds) for how long to wait between page views on your site:
Crawl-delay: 2
Considering that what crawled me today claims to be version 3.06d, the web site is horribly out of date even though it claims to have been updated this month.
# UA "OmniExplorer_Bot/1.07 (+http://www.omni-explorer.com) Internet Categorizer"
# UA "OmniExplorer_Bot/1.09 (+http://www.omni-explorer.com) Internet Categorizer"
# UA "OmniExplorer_Bot/1.09 (+http://www.omni-explorer.com)"
# UA "OmniExplorer_Bot/1.10 (+http://www.omni-explorer.com) Jobs Crawler"
# UA "OmniExplorer_Bot/1.18 (+http://www.omni-explorer.com) Torrent Crawler"
# UA "OmniExplorer_Bot/2.3 (+http://www.omni-explorer.com) WorldIndexer"
# UA "OmniExplorer_Bot/2.57 (+http://www.omni-explorer.com) WorldIndexer"
# UA "OmniExplorer_Bot/2.67 (+http://www.omni-explorer.com) WorldIndexer"
# UA "OmniExplorer_Bot/2.69 (+http://www.omni-explorer.com) WorldIndexer"
# UA "OmniExplorer_Bot/2.70 (+http://www.omni-explorer.com) WorldIndexer"
# UA "OmniExplorer_Bot/2.71 (+http://www.omni-explorer.com) WorldIndexer"
# UA "OmniExplorer_Bot/2.73 (+http://www.omni-explorer.com) WorldIndexer"
# UA "OmniExplorer_Bot/2.78a (+http://www.omni-explorer.com) WorldIndexer"
# UA "OmniExplorer_Bot/2.82 (+http://www.omni-explorer.com) WorldIndexer"
coming from these IP addresses:
64.62.175.130
64.62.175.131
64.62.175.137
64.71.131.109
64.71.131.117
65.19.150.206
65.19.150.207
65.19.150.208
65.19.150.209
65.19.150.210
65.19.150.211
65.19.150.212
65.19.150.213
65.19.150.214
65.19.150.220
65.19.150.221
65.19.150.222
65.19.150.223
65.19.150.224
65.19.150.225
65.19.150.226
65.19.150.227
65.19.150.228
65.19.150.229
65.19.150.230
65.19.150.231
65.19.150.232
65.19.150.233
65.19.150.234
65.19.150.235
65.19.150.236
65.19.150.237
65.19.150.238
65.19.150.239
65.19.150.240
65.19.150.241
65.19.150.242
65.19.150.243
65.19.150.244
65.19.150.245
65.19.150.246
65.19.150.247
65.19.150.248
65.19.150.249
65.19.150.250
65.19.150.251
65.19.169.228
65.19.169.229
65.19.169.230
65.19.150.250
65.19.169.242
65.19.169.252
65.19.169.254
All of the recent spidering activity has come from the 65.19.150.* block.
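If the user-agent turns out to be unreliable, the same mod_rewrite approach can key on the address instead. A rough sketch, assuming you are willing to refuse the whole 65.19.150.* block (which may include hosts unrelated to this bot) and that mod_rewrite is available:

```apache
# Forbid any request whose client address starts with 65.19.150.
# The dots are escaped so they match literal dots only.
# Note: this does not cover the 64.62.175.*, 64.71.131.* or
# 65.19.169.* addresses listed above; each would need its own
# condition, chained with [OR].
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^65\.19\.150\.
RewriteRule .* - [F]
```

As with the user-agent rules, the requests still reach the server and still show up in the logs, just with a 403 response.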
RewriteCond %{HTTP_USER_AGENT} ^Missigua [OR]
RewriteCond %{HTTP_USER_AGENT} ^Omni
RewriteRule ^.* - [F]
Should they be able to get past that rewrite? I see thousands of hits and still see many different names, like those volatilegx mentions:
OmniExplorer_Bot/1.09
OmniExplorer_Bot/2.93 ..etc
So maybe "^Omni" on its own doesn't cover every combo?
So maybe "^Omni" on its own doesn't cover every combo?
soquinn,
the line covers everything where the user-agent begins-with Omni
If the user-agent does not begin with Omni, then you'll need to find another term that works.
These methods (begins-with, ends-with and contains) were all explained in Msg#12 of this thread.
Perhaps you need to bookmark it or find a more thorough explanation?
It begins with Omni and is not case sensitive, so I'm puzzled. The ( ^ ) binds the match to the beginning of the User-Agent string, but I've had trouble finding out whether it still works with compound words and special characters such as underscores and slashes, and with the many versions volatilegx listed. Teach a man to fish, right?
The corrected code posted above will block Omni as stated, but what does that mean? It means that Omni will get a 403-Forbidden response from your server, and will not be able to access the requested page. However, you will still see Omni in your 'stats' as having visited your site. A closer examination of your raw server access log files will show that Omni got 403-Forbidden responses to all of its requests.
The above assumes that you have privileges to use mod_rewrite, and that it is configured properly.
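One rough way to check that assumption before blaming the pattern: put a throwaway rule in the .htaccess file in your document root and request the matching URL yourself. The path "rewrite-test" below is a made-up name for illustration, not anything from this thread:

```apache
# Sanity check for mod_rewrite in a document-root .htaccess file.
# "rewrite-test" is a hypothetical path; pick any name that does
# not exist on your site. In .htaccess context the pattern is
# matched against the path without its leading slash.
RewriteEngine On
RewriteRule ^rewrite-test$ - [F]
```

Requesting /rewrite-test should then return 403 if the rules are being processed. A 404 typically suggests the .htaccess file is being ignored, and a 500 typically suggests the module is not loaded or the directives are not permitted in .htaccess on that host. Remove the test rule once you have your answer.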
If you want to completely block all access to your server (so that they get no response at all, and you see no log entries) from that IP range, you'll need to do it at the server firewall.
If you want to completely block all access to your server (so that they get no response at all, and you see no log entries) by that user-agent, then you'll need a very expensive enterprise-class firewall.
Just wanted to adjust expectations, if needed...
Jim
Teach a man to fish, right?
Next requirement is teaching the man how to work the reel, :) then the depth finder, then showing him how to put the fish on the stringer and on and on.
EVERY UA example that Dan provided in Msg#44 BEGINS with Omni.
You could have anything from a simple syntax error to not having "RewriteEngine on".
On numerous occasions, I've made syntax errors in an IP range rewrite, and then a week or two later I notice that something isn't working properly. The two days that follow of going line-by-line through my extensive htaccess are very humbling.
I have a question for you, and it's not my desire to be facetious.
If you're unable to get a simple begins-with rewrite functioning, how do you expect to implement more complicated rewrites?
My suggestion is to get what you currently have functioning before attempting to understand other options.
Just wanted to adjust expectations, if needed...
Jim,
Is it possible that a webmaster attempting to implement rewrites doesn't understand access codes?
Don't answer ;)
[faqs.org...]
or
[members.tripod.com...]