Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 51 message thread spans 2 pages (page 1 of 2).
Hurricane Electric
570 MB in two hour period
OptiRex



 
Msg#: 2800 posted 11:28 am on Apr 1, 2005 (gmt 0)

Hi

This morning 64.62.175.137 pulled in excess of 570 MB of bandwidth from my site. The IP belongs to Hurricane Electric in California; whether it is them directly, I have no idea.

I have written to them asking for an explanation. Has anyone else seen such activity?

Thanks.

 

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 2800 posted 8:11 pm on Apr 2, 2005 (gmt 0)

I saw them once today with UA

OmniExplorer_Bot/1.09 (+http://www.omni-explorer.com) Cars Crawler

It's the third time, each time with a different word before "Crawler" (the bit shown in bold in the original post), and the previous times with /1.08

no request for robots.txt so consequently banned ;o)

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 2800 posted 11:41 pm on Apr 3, 2005 (gmt 0)

Another one today from the same range

64.62.175.133 OmniExplorer_Bot/1.09 (+http://www.omni-explorer.com) Rentals Crawler

ken_b

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 2800 posted 12:00 am on Apr 4, 2005 (gmt 0)

Yeah, the Cars Crawler hit my site yesterday too. Never saw them before.

Galtego

10+ Year Member



 
Msg#: 2800 posted 6:26 pm on Apr 21, 2005 (gmt 0)

After seeing over two thousand requests yesterday and today, looking at requests from their IP range when they are using a browser, it looks like they are building scraper directories.

annelisabeth

5+ Year Member



 
Msg#: 2800 posted 6:42 pm on May 23, 2005 (gmt 0)

Got hit for 300 megabytes yesterday.

I grepped my files and found these IP addresses:

64.71.131.107
64.71.131.108
64.71.131.109
64.71.131.110
64.71.131.111
64.71.131.112
64.71.131.114
64.71.131.115
64.71.131.120
64.71.131.121

The last two IP numbers were the ones with a straw into two of my sites. The others accessed now and then. I didn't see those accesses until May 16.

They were present in April in the 64.62.175. range. Usually hovering around 133-137

mrjpcool

10+ Year Member



 
Msg#: 2800 posted 6:12 am on May 28, 2005 (gmt 0)

That bot has used 4.4 gigs of bandwidth on one of our sites in the last 2 days. We have blocked these IP ranges to stop them:

65.19.169.2*
64.71.131.1*
64.62.175.****

We also blocked all requests where OmniExplorer is used as an agent.

Last week they hit us for 13 gigs in 3 days. This bot is really bad!

soquinn

10+ Year Member



 
Msg#: 2800 posted 4:56 am on May 31, 2005 (gmt 0)

mrjpcool, how did you block all requests where OmniExplorer is used as an agent? We've been hit too but with different names:

OmniExplorer_Bot/1.07
OmniExplorer_Bot/1.09
OmniExplorer_Bot/1.10

Are they legit?

hanuman

10+ Year Member



 
Msg#: 2800 posted 2:00 pm on May 31, 2005 (gmt 0)

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer [OR]
RewriteRule ^.* - [F]

in the .htaccess should do it

soquinn

10+ Year Member



 
Msg#: 2800 posted 2:49 pm on May 31, 2005 (gmt 0)

Thanks. I know they are eating up bandwidth, but does anyone know if they are legit?

soquinn

10+ Year Member



 
Msg#: 2800 posted 2:59 pm on May 31, 2005 (gmt 0)

also can you block more bots by just adding more lines like:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^anotherbadbot1 [OR]
RewriteCond %{HTTP_USER_AGENT} ^anotherbadbot2 [OR]
RewriteCond %{HTTP_USER_AGENT} ^anotherbadbot3 [OR]
RewriteRule ^.* - [F]

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2800 posted 8:22 pm on May 31, 2005 (gmt 0)

"also can you block more bots by just adding more lines like"

As many as you desire.
There are three options for how each line matches:

"^" begins with (does not require the full name, and is good for catching many UAs which have the same beginning)
"$" ends with
"contains" (no anchor at all) does NOT require any leading character, and the phrase may be any place in the UA.

Some examples:
[webmasterworld.com...]

SetEnvIf User-Agent ^Java keep_out

UA begins with the word Java.

SetEnvIf User-Agent ^Web keep_out

UA begins with the word Web.

SetEnvIf User-Agent Library$ keep_out

UA ends with the word Library.

"SetEnvIf User-Agent Library keep_out"
UA contains the word Library.
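
On its own, SetEnvIf only sets the variable; something still has to refuse the requests that carry it. A minimal sketch of the complete block, assuming the Order/Allow/Deny access directives used elsewhere in this thread and the same "keep_out" name (any name works as long as both lines agree):

```apache
# Flag any request whose User-Agent begins with "OmniExplorer".
SetEnvIf User-Agent ^OmniExplorer keep_out

# Let everyone in, then refuse the requests that carry the flag.
Order Allow,Deny
Allow from all
Deny from env=keep_out
```

SetEnvIfNoCase takes the same arguments and ignores upper/lower case, which may help if a bot varies the capitalization of its name.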

volatilegx

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 2800 posted 2:28 pm on Jun 1, 2005 (gmt 0)

I note that the SetEnvIf method of banning bots does not require Mod_Rewrite, which is not available on all Apache servers.

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2800 posted 2:50 pm on Jun 1, 2005 (gmt 0)

Hey Dan,
The examples and explanations that I provided (as related to begins with, ends with and contains) would also work with Mod_Rewrite (AFAIK).

The use of Mod_Rewrite for me didn't begin until I began reading this forum.
Previously I only used SetEnvIf and deny from.
Today I use a mixture of both; however, I still use SetEnvIf for most UA denies.

One noticeable difference between the two options is that SetEnvIf and deny from do not allow the denied IP or UA to view robots.txt.
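
If denied bots should still be able to fetch robots.txt under the SetEnvIf/deny approach, one option is a Files section that re-opens just that one file. This is a sketch only, assuming the Order/Allow/Deny directives and that the host permits Files sections and access directives in .htaccess:

```apache
# Site-wide: refuse flagged user agents.
SetEnvIf User-Agent ^OmniExplorer keep_out
Order Allow,Deny
Allow from all
Deny from env=keep_out

# Exception: everyone, flagged agents included, may read robots.txt.
<Files robots.txt>
    Order Allow,Deny
    Allow from all
</Files>
```

The Files section is applied after the surrounding directives, so its access settings are the ones that count for robots.txt.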

If there's one thing that many of the regulars have learned in this forum, it's that there are multiple methods to implement these procedures.

Don

GaryK

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 2800 posted 10:13 pm on Jun 1, 2005 (gmt 0)

I've also seen this crawler with Job Crawler, Jobs Crawler, and Internet Categorizer appended to the basic user agent.

niki_man

5+ Year Member



 
Msg#: 2800 posted 4:24 pm on Jun 5, 2005 (gmt 0)

Do I have to use this?
deny from 64.62.175.****

Can't I just use
deny from 64.62.175.*?

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2800 posted 6:09 pm on Jun 5, 2005 (gmt 0)

"Can't I just use
deny from 64.62.175.*?"

niki,
if you use
deny from 64.62.175.

that denies the entire 0-255 range of the last octet (the "D Class").
If that is what you desire, make sure you include the final period that ends the third octet (the "C Class").
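
The same range can be written more than one way. A sketch, assuming the Order/Allow/Deny directives; as far as I know, the "*" wildcard is not part of the "deny from" syntax, so the trailing-period or CIDR form is the one to use. The /28 slice is only an illustration built from the addresses reported earlier in this thread.

```apache
Order Allow,Deny
Allow from all

# Partial address: matches 64.62.175.0 through 64.62.175.255.
Deny from 64.62.175.

# The same 256 addresses in CIDR notation (use one form or the other).
Deny from 64.62.175.0/24

# A narrower slice, 64.62.175.128 through 64.62.175.143, which
# still covers the .133 to .137 addresses seen in April.
Deny from 64.62.175.128/28
```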

Don

mcneely

10+ Year Member



 
Msg#: 2800 posted 8:44 pm on Jun 5, 2005 (gmt 0)

Yes, this is quite a nuisance

64.71.131.*** OmniExplorer_Bot/1.09 (+http://www.omni-explorer.com) personals Crawler

hermosa

10+ Year Member



 
Msg#: 2800 posted 8:58 pm on Jun 6, 2005 (gmt 0)

Please explain further. Show exactly what goes into the file where I should save that code and the name under which we should save it. Thank you.

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2800 posted 10:13 pm on Jun 6, 2005 (gmt 0)

"Please explain further."

Was this inquiry about my previous response to niki?
I'm not sure, since you did not include any quoted material.

Or was your inquiry directed at someone else in this thread?

Jaunty Edward

5+ Year Member



 
Msg#: 2800 posted 5:43 am on Jun 7, 2005 (gmt 0)

Hi,
I was hit by this one too; it's taking GBs in days. When I use this script in the .htaccess, I myself cannot see the website. Maybe because IE also has "explorer" in the user agent.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer [OR]
RewriteRule ^.* - [F]

Any other ideas on how to block this bot?

Thanks

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2800 posted 11:16 am on Jun 7, 2005 (gmt 0)

"When i use this script in the .htaccess I myself can not see the website. May be becoz IE also has explorer into the UserAgent.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer [OR]
RewriteRule ^.* - [F]

Any other ideas how to block this bot."

Jaunty,
If the above rewrite is the only rewrite that you have in your htaccess,
then you would need to remove the trailing "[OR]".

Use of this trailing OR would likely provide you and any other visitor with a 500 Server Error because your website was not functioning, which is not the same as yourself being denied access.

Most web hosts these days are using CPanel, which offers an option for directly adding IP ranges.
Under Advanced Tools, open IP Deny Manager and insert the range (don't forget the ending "." if you are only using the first three octets).

This page might offer a more extensive explanation of SetEnvIf and deny from:
[webhelpinghand.com...]

This forum link offers some examples of those methods applied:
[webmasterworld.com...]
Many examples
[webmasterworld.com...]

Don

BTW, even the lines that soquinn provided in message #11 of this thread would NOT function:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^anotherbadbot1 [OR]
RewriteCond %{HTTP_USER_AGENT} ^anotherbadbot2 [OR]
RewriteCond %{HTTP_USER_AGENT} ^anotherbadbot3 [OR]
RewriteRule ^.* - [F]

because the trailing [OR] was improperly included in the last RewriteCond line. There are no additional conditions after that closing line which would warrant an OR option.
Instead, the proper syntax would be:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^anotherbadbot1 [OR]
RewriteCond %{HTTP_USER_AGENT} ^anotherbadbot2 [OR]
RewriteCond %{HTTP_USER_AGENT} ^anotherbadbot3
RewriteRule ^.* - [F]
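
An alternative that sidesteps the [OR] bookkeeping is to fold the names into a single condition using regex alternation. This is a sketch reusing the placeholder bot names from the lines above; the [NC] flag (case-insensitive match) is an addition of mine, not something from the thread. As I understand mod_rewrite, a trailing [OR] on the final RewriteCond has nothing left to chain to, so the rule can end up applying to every request, which would fit the lock-outs reported here; treat that as an assumption rather than something this thread confirms.

```apache
RewriteEngine On
# One condition, no [OR] needed: the UA begins with any listed name.
# The separator between names must be a real pipe character (|).
RewriteCond %{HTTP_USER_AGENT} ^(OmniExplorer|anotherbadbot1|anotherbadbot2|anotherbadbot3) [NC]
# Answer matching requests with 403 Forbidden.
RewriteRule .* - [F]
```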

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2800 posted 11:58 am on Jun 7, 2005 (gmt 0)

As for starting fresh in creating an htaccess file:
it varies from web host to web host, depending upon the default set-up of the host.
Some require additional lines to be inserted (I've been with the same host for over five years and they do not).
My server defaults to these beginning lines:

Options -Indexes
<Limit GET>
</Limit>

To provide some examples of what I've inserted:
Options -Indexes (host default)
<Limit GET> (host default)
SetEnvIf User-Agent Become keep_out
SetEnvIf Referer ^file keep_out
SetEnvIfNoCase Referer yellowbrick keep_out
order allow,deny
deny from 193.
deny from 194.
allow from all
deny from env=keep_out
</Limit> (host default)
RewriteRule ^robots\.txt$ - [L]
RewriteRule .*$ - [F]
RewriteCond %{HTTP_REFERER} ^www.addresses.com.* [OR]
RewriteCond %{HTTP_REFERER} ^www.alexa.com.* [OR]
RewriteCond %{HTTP_REFERER} ^XXXX:.*
RewriteRule .*$ - [F]
RewriteCond %{REMOTE_ADDR} ^12\.175\.0\.(3[2-9]|4[0-7])$ [OR]
RewriteCond %{REMOTE_ADDR} ^83\.(1[1-9][0-9]|2[0-5][0-9])\.
RewriteRule .*$ - [F]
end of quote

Please note that the "(host default)" notations above are NOT part of the functioning htaccess; they are annotations only.
Also, in the lines above where I use "keep_out", you may use any term you desire, just as long as you use the same term at the end of each SetEnvIf line and at the end of the group section (after "deny from env=").

There are others who participate in this forum who are much more knowledgeable about htaccess, Apache and regex than myself. The others are also more adept than myself in providing answers.

Jdmorgan's Apache forum is a good source (even though it contains many things you may not be interested in or may never use):
[webmasterworld.com...]

Jaunty Edward

5+ Year Member



 
Msg#: 2800 posted 12:22 pm on Jun 7, 2005 (gmt 0)

Hi wilderness,

Yes, your suggestion worked... at least I can see my site. Let's see if Omni comes again.

Thank you very much for the help.

For others with no knowledge about all this:
just make these the first three lines of your .htaccess file

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer
RewriteRule ^.* - [F]

That's it.

Thanks

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2800 posted 1:47 pm on Jun 7, 2005 (gmt 0)

"For others with no knowledge about all this."

I would suggest taking the time and going through the very extensive "Close To Perfect Htaccess" to help you understand how these procedures are implemented:

[webmasterworld.com...]

Don

soquinn

10+ Year Member



 
Msg#: 2800 posted 6:57 pm on Jun 7, 2005 (gmt 0)

wilderness, you are right I've had trouble with:


AddType application/x-httpd-php .htm .html
Options -Indexes
RewriteEngine on
RewriteCond %{HTTP_HOST} ^mymainsite\.com
RewriteCond %{HTTP_HOST} ^anothersite\.com
RewriteCond %{HTTP_HOST} ^www\.anothersite\.com
RewriteRule ^(.*) [mymainsite.com...] [L,R=301]
RewriteCond %{HTTP_USER_AGENT} ^Missigua [OR]
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer [OR]
RewriteRule ^.* - [F]

I'll try removing the [OR] from the second-to-last line... I keep getting 403 forbidden using MS explorer
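
For reference, a sketch of how those rules might be arranged so that each chain of conditions is complete. It assumes the intent is to send the three listed hostnames to one canonical host and to block the two bots; www.example.com is a placeholder for the redirect target, which is truncated in the post above. Without [OR], the three HTTP_HOST conditions are ANDed together, and no hostname can begin with both "mymainsite" and "anothersite", so the redirect as posted could never fire.

```apache
RewriteEngine On

# Block unwanted user agents first; the last condition carries no [OR].
RewriteCond %{HTTP_USER_AGENT} ^Missigua [OR]
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer
RewriteRule .* - [F]

# Redirect any of the alternate hostnames to the canonical one.
# www.example.com is a placeholder, not the poster's real target.
RewriteCond %{HTTP_HOST} ^mymainsite\.com [OR]
RewriteCond %{HTTP_HOST} ^anothersite\.com [OR]
RewriteCond %{HTTP_HOST} ^www\.anothersite\.com
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
```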

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2800 posted 11:12 pm on Jun 7, 2005 (gmt 0)

"I'll try removing the [OR] from the second last line... keep geting 403 forbidden using MS explorer"

soquinn,
Nothing you have in the lines you provided in msg #26 would deny MS Explorer.
The extra [OR] might generate a 500 error and make your site unavailable to ALL visitors (yourself included).

Sticky mail me your entire htaccess and I'll take a look.

Don

mdreher

5+ Year Member



 
Msg#: 2800 posted 12:14 am on Jun 9, 2005 (gmt 0)

They've also hit one of my websites as well - 65.19.169.236 was the IP that was used. Using the 1.07 version of the bot.

micha2305

10+ Year Member



 
Msg#: 2800 posted 3:23 pm on Jun 15, 2005 (gmt 0)

OmniExplorer_Bot/1.09 hit my site. The most stupid thing is that it actually "pushes buttons", i.e. it sends POST requests to the server. Stupid!

(I have a "form" where the user needs to push a button to get a verification email. Seems like I have to add a CAPTCHA thingy there.)
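
Until a CAPTCHA is in place, the form could also be shielded at the server level. A sketch, assuming mod_rewrite is available; it refuses only POST requests from this user agent and leaves everything else alone, so it is narrower than the full UA ban shown earlier in the thread:

```apache
RewriteEngine On
# Both conditions must hold (no [OR]): a POST request...
RewriteCond %{REQUEST_METHOD} ^POST$
# ...sent by a user agent that begins with "OmniExplorer".
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer
RewriteRule .* - [F]
```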

volatilegx

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 2800 posted 9:23 pm on Jun 15, 2005 (gmt 0)

Welcome to WebmasterWorld micha2305 and mdreher :)

Wow that's some new info! I wasn't aware that this bot makes POST requests.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved