
Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Are these considered as bad bots ?
pkKumar



 
Msg#: 4533935 posted 1:17 pm on Jan 7, 2013 (gmt 0)

While analyzing my log file I found these:

182.118.20.232 Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.8.0.11)+Gecko/20070312+Firefox/1.5.0.11;+360Spider

98.100.226.10 Mozilla/5.0+(textmode;+U;+Linux+i386;+en-US;+rv:3.0.110.0)+Gecko/20101006+EzineArticlesLinkScanner/3.0.0g

89.185.234.128 HTTP/1.0 - - - abc.com 301 0 0 231 44 483

78.158.11.226 HTTP/1.0 Lynx/2.8.5rel.1+libwww-FM/2.15FC+SSL-MM/1.4.1c+OpenSSL/0.9.7e-dev

Are these to be considered bad bots?

 

incrediBILL

WebmasterWorld Administrator incredibill us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4533935 posted 4:46 pm on Jan 7, 2013 (gmt 0)

Hi pkKumar and welcome to WebmasterWorld!

That depends, what were they doing?

The answer also depends on what you think is acceptable for something to do on your site.

None of them would've gained access to my server by default but that's a whole different story.

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4533935 posted 5:50 pm on Jan 7, 2013 (gmt 0)

Bad words in any UA!
From the lines you provided.

Spider
Link
Scanner
libwww
dev

Lynx/2.8.5rel.1+libwww-FM/2.15FC+SSL-MM/1.4.1c+OpenSSL/0.9.7e-dev


This is Lynx, a text-mode browser that is often used to view HTML the same way a robot sees it.
The tool is freely usable by anybody.

Someone else may advise you about the 89.182. range.
I can only guess it was a blank UA, since you provided none.
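A minimal Python sketch of that kind of word check, assuming you can get at the raw User-Agent string for each request. The word list is only the five words above (lower-cased), not a vetted blocklist, and the function name is made up for illustration. Plain substring matching is crude, so a word like "dev" or "link" can catch legitimate agents too.

```python
from typing import List, Optional

# The "bad words" picked out of the UAs above, lower-cased for matching.
# Illustrative only: not a complete or vetted blocklist.
BAD_UA_WORDS = ("spider", "link", "scanner", "libwww", "dev")


def flag_user_agent(ua: Optional[str]) -> List[str]:
    """Return the reasons a UA looks suspicious; an empty list means no match."""
    # A missing UA, or the "-" placeholder a log writes for one, counts as blank.
    if ua is None or ua.strip() in ("", "-"):
        return ["blank user-agent"]
    lowered = ua.lower()
    # Case-insensitive substring test against each word, in list order.
    return [word for word in BAD_UA_WORDS if word in lowered]
```

Run against the UAs in the first post, the 360Spider line flags "spider", the Lynx line flags "libwww" and "dev", and the "-" in the 89.185.234.128 log line counts as blank.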

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4533935 posted 10:46 pm on Jan 7, 2013 (gmt 0)

I allow the EzineArticles scanner: my daughter writes articles for them, and they come around to verify her site (not always successfully; they seem to have problems every now and then).

The others would not be accepted here.

keyplyr

WebmasterWorld Senior Member keyplyr us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4533935 posted 11:25 pm on Jan 7, 2013 (gmt 0)



Exmasters hosting
89.185.228.0 - 89.185.229.255
89.185.228.0/23
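keyplyr gives the same block both as a start/end range and in CIDR notation. A small sketch using Python's standard ipaddress module converts between the two forms, which helps when a whois record gives you one form and your blocking rules want the other (the helper names are hypothetical):

```python
import ipaddress


def cidr_to_range(cidr: str) -> tuple:
    """Return the (first, last) addresses of a CIDR block as strings."""
    net = ipaddress.ip_network(cidr, strict=True)
    return str(net.network_address), str(net.broadcast_address)


def range_to_cidrs(first: str, last: str) -> list:
    """Return the smallest list of CIDR blocks covering first..last inclusive."""
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(n) for n in nets]
```

For example, cidr_to_range("89.185.228.0/23") gives ("89.185.228.0", "89.185.229.255"), and the 202.85.192.0 - 202.85.207.255 range mentioned later in the thread comes out as a single /20.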

pkKumar



 
Msg#: 4533935 posted 4:21 am on Jan 8, 2013 (gmt 0)

Can these be blocked with robots.txt? Will they follow robots.txt? If not, you will say to block them in .htaccess, but mine is Windows shared hosting and the site is in ASP.NET. What's the alternative?

not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4533935 posted 4:34 am on Jan 8, 2013 (gmt 0)

You cannot block robots with robots.txt, but you can disallow areas of your site, or the whole site, for compliant robots only. Usually, compliant robots show a URL where you can read their policies. The robots.txt file does not force robots to obey your instructions. On Windows servers, look into using ISAPI instead of .htaccess; try a search here or ask your host.
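To make the "compliant robots only" point concrete: robots.txt is only honored because the crawler itself fetches it and checks it before requesting a page. Python's standard urllib.robotparser performs that check. The robots.txt content below is a made-up example, and a crawler matches on its product token (e.g. "360Spider"), not on the full UA string.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: keep 360Spider out entirely, everyone else out of /private/.
ROBOTS_TXT = """\
User-agent: 360Spider
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def compliant_bot_may_fetch(user_agent: str, url: str) -> bool:
    """What a well-behaved crawler decides before requesting url."""
    return parser.can_fetch(user_agent, url)
```

A bot that never runs this check simply ignores the file, which is why real blocking has to happen on the server.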

pkKumar



 
Msg#: 4533935 posted 4:51 am on Jan 8, 2013 (gmt 0)

@not2easy, the user agents I posted here don't mention any email address or website. That's the problem.

pkKumar



 
Msg#: 4533935 posted 7:07 am on Jan 8, 2013 (gmt 0)

Well, ISAPI is not an option, especially on a shared hosting server. I didn't find anything here, but according to a blog we can do that with request filtering in web.config, like this:
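<!-- Goes inside <system.webServer><security> in web.config; filteringRules needs IIS 7.5 or later -->
<requestFiltering>
<filteringRules>
<!-- Rejects any request whose User-Agent header contains one of the deny strings -->
<filteringRule name="Block bot" scanUrl="false" scanQueryString="false">
<scanHeaders>
<add requestHeader="user-agent" />
</scanHeaders>
<appliesTo />
<denyStrings>
<clear />
<!-- The blog's example string; replace it with the UA fragment you actually want to block, e.g. 360Spider -->
<add string="googlebot" />
</denyStrings>
</filteringRule>
</filteringRules>
</requestFiltering>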

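For IP it has:
<!-- Also inside <system.webServer><security>; needs IIS's IP and Domain Restrictions feature, which a shared host may have to enable -->
<!-- allowUnlisted="true": every address not listed below is allowed -->
<ipSecurity allowUnlisted="true">
<clear/>
<add ipAddress="111.111.111.111"/> <!-- Blocks a specific IP. -->
<add ipAddress="222.222.222.0" subnetMask="255.255.255.0"/> <!-- Blocks entire subnet. -->
</ipSecurity>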

Has anybody tried that?
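As far as I understand the subnetMask attribute, an entry matches when the client address and the listed address agree on every bit the mask keeps. A plain Python sketch of that arithmetic (my assumption about the semantics, not IIS code; DENY_ENTRIES and is_denied are made-up names mirroring the two entries above):

```python
import ipaddress

# Hypothetical mirror of the <ipSecurity> entries above as (ipAddress, subnetMask).
# An entry without a subnetMask is treated as a single address (mask /32).
DENY_ENTRIES = [
    ("111.111.111.111", "255.255.255.255"),
    ("222.222.222.0", "255.255.255.0"),
]


def _to_int(addr: str) -> int:
    """Dotted-quad IPv4 string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))


def is_denied(client_ip: str) -> bool:
    """True if client_ip equals a listed address once both are ANDed with its mask."""
    ip = _to_int(client_ip)
    for address, mask in DENY_ENTRIES:
        m = _to_int(mask)
        if ip & m == _to_int(address) & m:
            return True
    # allowUnlisted="true": anything that matched no entry is allowed.
    return False
```

So 222.222.222.45 is denied by the 255.255.255.0 entry, while 222.222.223.1 falls outside it and is allowed.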

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4533935 posted 10:49 am on Jan 8, 2013 (gmt 0)

The best thing you can do is find another host and method immediately: one that uses shared Apache hosting.

If you wait too long, you'll have so much time invested in alternative methods that you'll be unable to move later.

Has anybody tried that?


Just do a Google search on:

<denyStrings> filtering

pkKumar



 
Msg#: 4533935 posted 11:57 am on Jan 8, 2013 (gmt 0)

I know it's a silly question, but until now I have only heard of Linux and Windows shared hosting servers. I have no idea how Apache works.

One other thing: if you have an ASP.NET website, it is recommended to host it on a Windows server. Is that correct?

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4533935 posted 12:51 pm on Jan 8, 2013 (gmt 0)

One other thing: if you have an ASP.NET website, it is recommended to host it on a Windows server. Is that correct?


I have no clue!
I've never had an ASP site, and I've had websites for more than a decade.

All my hosts (shared) have been Apache based.
Having an Apache host doesn't require any real knowledge of Apache; the host does all that and presents you with a control panel.

99% of the participants in this forum use Apache htaccess to manage access restrictions.

All I'm suggesting is that you're fighting the long-time trend by going with Windoze and ASP. If you're looking for help in that regard, then you will be better off finding a forum that specializes in those topics.

Microsoft IIS Web Server and ASP.NET [webmasterworld.com]

There's a forum participant who could help you if he'd just come out of lurk mode.

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4533935 posted 8:31 pm on Jan 8, 2013 (gmt 0)

I've been using only IIS/ASP hosting for over twelve years and wish I'd never dropped Linux to begin with! By the time I realised the mistake I had a good-sized library of ASP code. :(

I get around the problem with a file of ASP code that traps various IPs, UAs and methods with the aid of a MySQL database of banned and blocked IPs (there is a difference); the database also holds many known ranges for broadband ISPs. Newly auto-blocked IPs (previously unencountered, typically 10 to 20 a day) are analyzed by me and the full range is added to the database.

The code file is included in every web page. It takes a bit of extra time to load and test, but not significantly (typically 15 to 40 milliseconds), even with the MySQL IP database behind it (I have a timer that monitors this) and logging the result(s) to half a dozen text-based logs.
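dstiles doesn't post his ASP code, so here is a hypothetical Python outline of the same flow: check the client IP against banned and blocked ranges, apply a couple of traps, queue trapped IPs for manual review, and time the whole check. The in-memory lists stand in for his MySQL tables; the function and variable names, the example ranges (borrowed from this thread) and the trap rules (unexpected method, blank UA) are all assumptions for illustration.

```python
import ipaddress
import time
from typing import List, Tuple

# Stand-ins for the MySQL tables. "Banned" and "blocked" are kept separate,
# since dstiles says there is a difference; the ranges are just examples.
BANNED_RANGES = [ipaddress.ip_network("89.185.228.0/23")]
BLOCKED_RANGES = [ipaddress.ip_network("202.85.192.0/20")]
ALLOWED_METHODS = {"GET", "HEAD", "POST"}

# IPs caught by a trap, waiting to be analyzed and have their range added.
new_ips_for_review: List[str] = []


def check_request(ip: str, ua: str, method: str) -> Tuple[str, float]:
    """Classify one request; return (verdict, elapsed milliseconds)."""
    start = time.perf_counter()
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BANNED_RANGES):
        verdict = "banned"
    elif any(addr in net for net in BLOCKED_RANGES):
        verdict = "blocked"
    elif method.upper() not in ALLOWED_METHODS or not ua.strip():
        # Trap hit: block this request and queue the IP for manual review.
        verdict = "auto-blocked"
        new_ips_for_review.append(ip)
    else:
        verdict = "allowed"
    elapsed_ms = (time.perf_counter() - start) * 1000
    return verdict, elapsed_ms
```

The elapsed time plays the role of his timer; a real version would also write each verdict to the logs.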

My method was adopted some years ago because, at the time, the only option for ASP was an expensive third-party ISAPI application. Now, I would recommend using the htaccess-like ISAPI module under IIS. Its advantages include control of image serving, which my method does not easily cover.

keyplyr - extend that IP range to 202.85.192.0 - 202.85.207.255

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4533935 posted 9:58 pm on Jan 8, 2013 (gmt 0)

Many thanks for coming out of lurk-mode ;)

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4533935 posted 10:23 pm on Jan 8, 2013 (gmt 0)

I know it's a silly question, but until now I have only heard of Linux and Windows shared hosting servers. I have no idea how Apache works.

You're mixing two unrelated things. There are two basic types of server, IIS and Apache. I don't know about IIS, but Apache can run on practically any operating system. You can also get a pseudo-server that runs Apache on your personal computer (Mac, Windows or Linux), so you can test your site exactly the way it would appear in real life, including things like site-absolute links and PHP files that you can't test by just opening pages in your local browser.

You don't need to know what OS your server uses. You only need to know whether it's Apache or not-Apache, because that determines whether you're making your rules in htaccess or, er, something else.

Since Apache is more widely used overall, you will find more people who know what to do about it. F'rinstance, we'd offer up at least three different ways to block a blank user-agent ;)
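The htaccess recipes aren't posted here, but the rule itself is simple. As a non-Apache illustration, here is a hypothetical Python WSGI middleware (the name block_blank_user_agent is invented for this sketch) that refuses requests whose User-Agent header is missing or empty:

```python
from typing import Callable, Iterable


def block_blank_user_agent(app: Callable) -> Callable:
    """WSGI middleware: answer 403 when the User-Agent header is missing or blank."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        ua = environ.get("HTTP_USER_AGENT", "")
        if not ua.strip():
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Anything with a non-blank UA goes on to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Wrap your application once, as app = block_blank_user_agent(app); every other request passes through unchanged.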

blend27

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4533935 posted 11:55 pm on Jan 8, 2013 (gmt 0)

I am in the same boat as dstiles, except I run ColdFusion (for over a decade and a half) on IIS/MSSQL/MySQL; no need for Apache.

I see a lot of Windows hosts offer "Mod-Rewrite for IIS" on IIS 6/7/7.5 on shared hosting plans. There are also several other flavors out there.

.htaccess (and other Apache bells and whistles) is a great tool, but I only use it when I want to block something and don't want to hear about it in my custom logs (like specific SE spiders from Asia). I don't use IIS logs, because all the data is stored in a well-optimized, encrypted schema, with custom, well-optimized CF apps written on top of that. This way I keep all the data; all I need to do is back up the DB and code and I am history, with the history ;).

@pkKumar,

Do yourself a favor and start analyzing request headers from your visitors (record them programmatically), then compare them and learn. Take notes on different UAs. 99% of the bad bots and scrapers have them a$* sideways/backwards and in the wrong order. I won't go into the details on this, but that should get you started!

P.S. It is all incrediBILL's fault: he got me thinking, and now I see the light! :) Easy prey for bot-blocker code logic.
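A minimal sketch of the "record them programmatically" step, assuming your server hands you the request headers as a list of (name, value) pairs in arrival order (many frameworks normalise or reorder them, so check yours). Each request becomes one JSON line you can append to a log and compare later; header_record is a hypothetical name.

```python
import json
import time
from typing import List, Tuple


def header_record(ip: str, headers: List[Tuple[str, str]]) -> str:
    """Serialize one request's headers, in the order received, as a JSON line."""
    return json.dumps({
        "time": int(time.time()),
        "ip": ip,
        # Keep the original order and capitalisation: blend27's point is that
        # bots often get these wrong compared with real browsers.
        "headers": [[name, value] for name, value in headers],
        # Lower-cased order alone, handy for grouping requests by header layout.
        "order": [name.lower() for name, _ in headers],
    })
```

Appending each line to a text file and grouping by the "order" field is one simple way to compare what real browsers send with what bots send.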

incrediBILL

WebmasterWorld Administrator incredibill us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4533935 posted 1:00 am on Jan 9, 2013 (gmt 0)

he got me thinking


Sorry about that.

Won't let it happen again! :)

Those headers can make a huge difference, because all browsers send pretty much the same headers all the time but bots do not, and it's much easier (for now) to trap tons of garbage via header analysis than via user agents. Once I started looking at them in more detail, I was able to block far more than before.

Just for fun I ran a test and disabled the data center blocking list, just to see how much the header tests would stop by themselves. It was amazing that a simple bit of code could block most of the current crawler crud without user agent parsing, data center block lists, blacklists or any of the other time-consuming crap.

Sadly, some have done a better job at faking headers, and you still need all the other stuff to block them, but at least they're currently the minority.
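A rough sketch of a header-presence test in the spirit of incrediBILL's description. He doesn't say which headers he checks, so the three below (Accept, Accept-Language, Accept-Encoding) are my assumption of what a normal browser page request carries; treat a hit as a reason to look closer, not as proof. The function name is hypothetical.

```python
from typing import Dict, List

# Assumed for this sketch: headers a mainstream browser normally sends with a
# page request. Not a standard and not an exhaustive list.
EXPECTED_BROWSER_HEADERS = ("accept", "accept-language", "accept-encoding")


def header_anomalies(headers: Dict[str, str]) -> List[str]:
    """List expected browser headers missing from a request that claims to be a browser."""
    # Header names are case-insensitive, so compare them lower-cased.
    lowered = {name.lower(): value for name, value in headers.items()}
    ua = lowered.get("user-agent", "")
    if not ua.startswith("Mozilla/"):
        return []  # non-browser UAs are left to the other checks
    return ["missing " + h for h in EXPECTED_BROWSER_HEADERS if h not in lowered]
```

As incrediBILL notes, bots that fake their headers well will pass this, so it complements the other lists rather than replacing them.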

blend27

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4533935 posted 3:19 am on Jan 9, 2013 (gmt 0)

some have done a better job at faking headers


They still send "Keep-Alive: 300" :)
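blend27's tell can be checked the same way. The sketch below only looks for a Keep-Alive request header with the value 300. The assumption (not stated in the thread) is that this value comes from older browsers, so it looks out of place next to a UA that claims to be a current one; the function name is hypothetical.

```python
from typing import Dict


def sends_legacy_keep_alive(headers: Dict[str, str]) -> bool:
    """True if the request carries a 'Keep-Alive: 300' header (name matched case-insensitively)."""
    for name, value in headers.items():
        if name.lower() == "keep-alive" and value.strip() == "300":
            return True
    return False
```

On its own this proves little; it is most useful combined with the UA the request claims.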

pkKumar



 
Msg#: 4533935 posted 9:16 am on Jan 9, 2013 (gmt 0)

You're mixing two unrelated things. There are two basic types of server, IIS and Apache.


OK, so in the background there are two types of web server, IIS and Apache. I have an ASP.NET site, so I guess IIS is the preferred choice. Can an ASP.NET site run on an Apache server?

keyplyr

WebmasterWorld Senior Member keyplyr us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4533935 posted 9:34 am on Jan 9, 2013 (gmt 0)

Can an ASP.NET site run on an Apache server?

AFAIK you'd need to switch to PHP or use another server-side language for the functions you currently use ASP for. That's why it was advised to switch over before you amass a lot of application-dependent documents.

The beauty of using Apache is that it is a standard that has been built upon over the years, using core modules. Unless your host has installed some odd custom configuration, these functions will always be applicable and are easily researched for solutions to your specific needs. And since approximately 3x as many webmasters use Apache as IIS, as Lucy said, you'll get more help with questions.

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4533935 posted 9:56 pm on Jan 9, 2013 (gmt 0)

Wilderness - Me? Lurking? :)

pkKumar - IIS is IIS and nothing else, although it does have .NET as a so-called "smart" "plugin" (I never use it myself). So IIS is not "preferred" but mandatory.

ASP code CAN run on an emulator (I think there are two or three), but it's not something I would ever consider. I looked at them a year or two ago and they were not reported as reliable. In fact, few emulators of this nature are ever completely reliable.

If you only have one or two websites to manage, it's worth re-coding for Apache or some other Linux/Unix server, probably using PHP for the code and MySQL as the database (if required). You can use PHP under IIS, but why? If you decide on PHP, then run it on the far safer (from hacks) Linux machine.

blend27

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4533935 posted 8:56 pm on Jan 10, 2013 (gmt 0)

BTW, there is also this: [iis.net...]

Based on my experience, it is not that easy to switch from ASP/IIS to Apache/PHP when you or your client are already invested.

It's a good place to start, though, if one has decided to go down the path of... what do you call it nowadays, LAMP ;)...

P.S. I know I am in a mostly non-MS environment; watch those keyboard buttons now..

Blend27

incrediBILL

WebmasterWorld Administrator incredibill us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4533935 posted 3:00 am on Jan 11, 2013 (gmt 0)

If you decide on PHP, then run it on the far safer (from hacks) Linux machine.


Seriously off the original topic, but in what universe is PHP safer on Linux? Poorly written PHP scripts are actually the biggest privilege-escalation threat on any server I've ever seen.

The .NET on Linux problem may be solved by MainSoft's Grasshopper software, but I've never tried it.

Anyway, this discussion is beyond the scope of Spider Identification.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved