
Forum Moderators: Ocean10000 & incrediBILL


Are these considered bad bots?

     

pkKumar

1:17 pm on Jan 7, 2013 (gmt 0)



While analyzing my log file I found these:

182.118.20.232 Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.8.0.11)+Gecko/20070312+Firefox/1.5.0.11;+360Spider

98.100.226.10 Mozilla/5.0+(textmode;+U;+Linux+i386;+en-US;+rv:3.0.110.0)+Gecko/20101006+EzineArticlesLinkScanner/3.0.0g

89.185.234.128 HTTP/1.0 - - - abc.com 301 0 0 231 44 483

78.158.11.226 HTTP/1.0 Lynx/2.8.5rel.1+libwww-FM/2.15FC+SSL-MM/1.4.1c+OpenSSL/0.9.7e-dev

Are these to be considered bad bots?

incrediBILL

4:46 pm on Jan 7, 2013 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Hi pkKumar and welcome to WebmasterWorld!

That depends, what were they doing?

The answer also depends on what you think is acceptable for something to do on your site.

None of them would've gained access to my server by default but that's a whole different story.

wilderness

5:50 pm on Jan 7, 2013 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Bad words in any UA!
From the lines you provided.

Spider
Link
Scanner
libwww
dev

Lynx/2.8.5rel.1+libwww-FM/2.15FC+SSL-MM/1.4.1c+OpenSSL/0.9.7e-dev


This is Lynx, a text-mode browser for viewing HTML in the same manner that a robot views HTML.
The tool is freely usable by anybody.

Another may advise you of the 89.185. range.
I can only guess it was a blank UA, since you provided nothing.
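
A minimal sketch of the word-list idea above, in Python for illustration only (the thread itself is about Apache and IIS). The function name `flag_user_agent` and the handling of a blank or "-" UA are assumptions made for this example; the word list is the one given in the post.

```python
# Words taken from the list in the post above; matching is a plain,
# case-insensitive substring test, which is deliberately broad.
BAD_WORDS = ("spider", "link", "scanner", "libwww", "dev")


def flag_user_agent(user_agent):
    """Return the list of reasons a user-agent string looks suspicious.

    An empty list means none of the listed words matched.
    Treating "" and "-" as a blank UA is an assumption about the log format.
    """
    ua = user_agent.strip().lower()
    if ua in ("", "-"):
        return ["blank"]
    return [word for word in BAD_WORDS if word in ua]


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.8.0.11)"
        "+Gecko/20070312+Firefox/1.5.0.11;+360Spider",
        "Lynx/2.8.5rel.1+libwww-FM/2.15FC+SSL-MM/1.4.1c+OpenSSL/0.9.7e-dev",
        "-",
    ]
    for sample in samples:
        print(flag_user_agent(sample), sample)
```

Substring matching this loose will also catch legitimate agents that happen to contain one of the words, so a list like this is probably best used to flag requests for review before denying them outright.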

dstiles

10:46 pm on Jan 7, 2013 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I allow ezine scanner - my daughter writes articles for them and they come around verifying her site - not always successfully: they seem to have a few problems every now and then.

The others would not be accepted here.

keyplyr

11:25 pm on Jan 7, 2013 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





Exmasters hosting
89.185.228.0 - 89.185.229.255
89.185.228.0/23
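
For anyone wanting to check such ranges themselves, here is a small Python sketch using the standard `ipaddress` module. The helper names `range_to_cidrs` and `in_any` are made up for this example; the addresses are the ones quoted in this thread.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert a first-last address range into the CIDR blocks that cover it."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
    )


def in_any(ip, networks):
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    exmasters = range_to_cidrs("89.185.228.0", "89.185.229.255")
    print(exmasters)                            # the CIDR form of the range
    print(in_any("89.185.229.10", exmasters))   # an address inside the range
    print(in_any("89.185.234.128", exmasters))  # the address from the log
```

Run as written, the range collapses to the single block 89.185.228.0/23. The logged address 89.185.234.128 tests as outside it, since .234 is not within .228 to .229.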

pkKumar

4:21 am on Jan 8, 2013 (gmt 0)



Can these be blocked with robots.txt, and will they follow robots.txt? If not, you will say to block them in .htaccess, but mine is a Windows shared hosting server. The site is in ASP.NET. What's the alternative?

not2easy

4:34 am on Jan 8, 2013 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



You cannot block robots with robots.txt, but you can disallow areas of your site, or the whole site, for compliant robots only. Usually, compliant robots show a URL where you can read their policies. The robots.txt file does not force robots to obey your instructions. On Windows servers, look into using ISAPI instead of .htaccess; try a search here or ask your host.
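
To illustrate what "compliant robots only" means, here is a sketch of how a well-behaved crawler might read a robots.txt, using Python's standard `urllib.robotparser`. The robots.txt content, the `/private/` path, the `SomeOtherBot` name and the `make_parser` helper are made up for this example. A non-compliant bot simply never runs a check like this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one named robot out entirely and keep
# every other compliant robot out of /private/.
ROBOTS_TXT = """\
User-agent: 360Spider
Disallow: /

User-agent: *
Disallow: /private/
"""


def make_parser(text):
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    for agent, path in [
        ("360Spider", "/index.html"),
        ("SomeOtherBot", "/index.html"),
        ("SomeOtherBot", "/private/report.html"),
    ]:
        print(agent, path, parser.can_fetch(agent, path))
```

The file only states your wishes. Enforcement still has to happen on the server, through .htaccess, ISAPI, or application code.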

pkKumar

4:51 am on Jan 8, 2013 (gmt 0)



@not2easy, the user agents I have posted here don't mention any email address or website. That's the problem.

pkKumar

7:07 am on Jan 8, 2013 (gmt 0)



Well, ISAPI is not an option, especially on a shared hosting server. I didn't find anything here, but according to a blog we can do that with request filtering in web.config, like this:
<requestFiltering>
  <filteringRules>
    <filteringRule name="Block bot" scanUrl="false" scanQueryString="false">
      <scanHeaders>
        <add requestHeader="user-agent" />
      </scanHeaders>
      <appliesTo />
      <denyStrings>
        <clear />
        <add string="googlebot" />
      </denyStrings>
    </filteringRule>
  </filteringRules>
</requestFiltering>

For IP it has:
<ipSecurity allowUnlisted="true">
  <clear/>
  <add ipAddress="111.111.111.111"/> <!-- Blocks a specific IP. -->
  <add ipAddress="222.222.222.0" subnetMask="255.255.255.0"/> <!-- Blocks entire subnet. -->
</ipSecurity>

Has anybody tried that ?

wilderness

10:49 am on Jan 8, 2013 (gmt 0)




Best thing you may do is find another host and method immediately.

One that uses Apache shared.

If you wait too long, you'll have so much time invested in alternative methods that you'll be unable to move later.

Has anybody tried that ?


Just do a Google search on

<denyStrings> filtering

pkKumar

11:57 am on Jan 8, 2013 (gmt 0)



I know it's a silly question, but I have only heard of Linux and Windows shared hosting servers till now. I have no idea how Apache works.

Another thing: if you have an ASP.NET website, then it is recommended to host it on a Windows server. Is that correct?

wilderness

12:51 pm on Jan 8, 2013 (gmt 0)




Another thing: if you have an ASP.NET website, then it is recommended to host it on a Windows server. Is that correct?


I have no clue!
Never had an asp site and I've had websites for more than a decade.

All my hosts (shared) have been Apache based.
Having an Apache host doesn't require any real knowledge of Apache, rather the host does all that and presents you with a control panel.

99% of the participants in this forum use Apache htaccess to manage access restrictions.

All I'm suggesting is that you're fighting the long-time trend by going with Windoze and ASP. If you're looking for help in that regard, then you will be better off locating a forum that specializes in those categories.

Microsoft IIS Web Server and ASP.NET [webmasterworld.com]

There's a forum participant that could help you if he'd just come out of lurk mode.

dstiles

8:31 pm on Jan 8, 2013 (gmt 0)




I've been using only IIS/ASP hosting for over twelve years and wish I'd never dropped linux to begin with! By the time I realised the mistake I had a good-sized library of ASP code. :(

I get over the problem with a file of ASP code that traps various IPs, UAs and methods with the aid of a MySQL database of banned and blocked IPs (there is a difference); the database also holds many known ranges for broadband ISPs. Newly auto-blocked (previously unencountered - typically 10 to 20 a day) IPs are analyzed by me and the full range added to the database.

The code file is included in every web page - takes a bit of extra time to load and test but not significant (typically 15 to 40 milliseconds), even with the MySQL IP database behind it (I have a timer that monitors this) and logging the result(s) to half a dozen text-based logs.
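
The lookup-plus-timer idea described above might look roughly like this. It is a sketch only: it uses Python and an in-memory SQLite table in place of the ASP code and MySQL database mentioned in the post, and the table layout, the function names, and the way "banned" and "blocked" are assigned to ranges are all assumptions for illustration (the post does not say how the two differ).

```python
import ipaddress
import sqlite3
import time


def build_db(ranges):
    """Create an in-memory table of IPv4 ranges stored as integer bounds.

    ranges: iterable of (cidr, status) pairs; the schema is hypothetical.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE ip_ranges (start_ip INTEGER, end_ip INTEGER, status TEXT)"
    )
    for cidr, status in ranges:
        net = ipaddress.ip_network(cidr)
        db.execute(
            "INSERT INTO ip_ranges VALUES (?, ?, ?)",
            (int(net.network_address), int(net.broadcast_address), status),
        )
    db.commit()
    return db


def lookup(db, ip):
    """Return the status of the first range containing ip, or None."""
    number = int(ipaddress.ip_address(ip))
    row = db.execute(
        "SELECT status FROM ip_ranges WHERE ? BETWEEN start_ip AND end_ip LIMIT 1",
        (number,),
    ).fetchone()
    return row[0] if row else None


def timed_lookup(db, ip):
    """Run lookup() and also report how long it took, in milliseconds."""
    started = time.perf_counter()
    status = lookup(db, ip)
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    return status, elapsed_ms


if __name__ == "__main__":
    db = build_db([("89.185.228.0/23", "banned"), ("202.85.192.0/20", "blocked")])
    for ip in ("89.185.229.7", "202.85.200.1", "8.8.8.8"):
        print(ip, timed_lookup(db, ip))
```

Storing each range as a pair of integers keeps the per-request test to a single indexed-friendly comparison, which is one plausible way to keep the added page time small.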

My method was adopted some years ago because, at the time, ASP had only an expensive third-party isapi application. Now, I would recommend using the htaccess-like isapi module under IIS. Advantages of that include control of image serving, which my method does not easily cover.

keyplyr - extend that IP range to 202.85.192.0 - 202.85.207.255

wilderness

9:58 pm on Jan 8, 2013 (gmt 0)




Many thanks for coming out of lurk-mode ;)

lucy24

10:23 pm on Jan 8, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



I know it's a silly question, but I have only heard of Linux and Windows shared hosting servers till now. I have no idea how Apache works.

You're mixing two unrelated things. There are two basic types of server, IIS and Apache. Don't know about IIS, but Apache can run on any operating system. You can also get a pseudo-server that runs Apache on your personal computer (Mac, Windows or Linux) so you can test your site exactly the way it would appear in real life. Including things like site-absolute links and php files that you can't do on your local browser.

You don't need to know what OS your server uses. You only need to know whether it's Apache or not-Apache, because that determines whether you're making your rules in htaccess or, er, something else.

Since Apache is more widely used overall, you will find more people who know what to do about it. F'rinstance, we'd offer up at least three different ways to block a blank user-agent ;)

blend27

11:55 pm on Jan 8, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am in the same boat as dstiles, except I run ColdFusion (for over a decade and a half) on IIS/MSSQL/MySQL, no need for Apache.

I see a lot of Windows hosts offer "Mod-Rewrite for IIS" on IIS 6/7/7.5 on shared hosting plans. There are also several other flavors out there.

.htaccess (and other Apache bells and whistles) is a great tool, but I only use it when I want to block something and don't want to hear about it (like specific SE spiders from Asia) in my custom logs. I don't use IIS logs, because all data is stored in a very well optimized & encrypted schema, and there are custom, very well optimized CF apps written on top of that. This way I keep all the data; all I need to do is back up DB and code and I am a history, with the history ;).

@pkKumar,

Do yourself a favor and start analyzing request headers from your visitors (programmatically record them), then compare them and learn. Take notes on different UAs. 99% of the bad bots and scrapers have them a$* sideways/backwards and in the wrong order. I won't go into the details on this, but that should get you started!

^ P.S. It is all incrediBILL's fault, he got me thinking, and I now see the light! :) Easy prey for a bot blocker code logic.

incrediBILL

1:00 am on Jan 9, 2013 (gmt 0)




he got me thinking


Sorry about that.

Won't let it happen again! :)

Those headers can make a huge difference, because all browsers pretty much send the same headers all the time but bots do not, and it's much easier to trap tons of garbage via header analysis (for now) than via user agents. Once I started looking at them in more detail, I was able to block far more stuff than I did before.

Just for fun I ran a test and disabled the data center blocking list just to see how much the header tests would stop all by themselves and it was amazing that a simple bit of code could really block most of the current crawler crud without user agent parsing, data center block lists, blacklists or any of the other time consuming crap.

Sadly, some have done a better job at faking headers and you need all the other stuff to still block them but at least they're currently the minority.

blend27

3:19 am on Jan 9, 2013 (gmt 0)




some have done a better job at faking headers


They still send "Keep-Alive: 300" :)
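
As a starting point for the header analysis discussed in the last few posts, here is a hedged Python sketch. Nobody in this thread has published their actual tests, so the list of "expected" browser headers below is purely an assumption for illustration; the only check taken from the thread is the "Keep-Alive: 300" remark. The function name `header_anomalies` is likewise made up.

```python
# Headers a typical browser request is assumed to carry. This list is an
# illustrative guess, not anyone's production rule set.
EXPECTED_HEADERS = ("user-agent", "accept", "accept-language", "accept-encoding")


def header_anomalies(headers):
    """Return a list of reasons a request's headers look un-browser-like.

    headers: list of (name, value) pairs in the order they were received.
    An empty result means none of these simple checks fired.
    """
    seen = {name.lower(): value for name, value in headers}
    reasons = []
    for name in EXPECTED_HEADERS:
        # A header that is absent or present but blank counts as missing.
        if not seen.get(name, "").strip():
            reasons.append("missing " + name)
    # Check mentioned in the thread: a request header of Keep-Alive: 300.
    if seen.get("keep-alive", "").strip() == "300":
        reasons.append("keep-alive 300")
    return reasons


if __name__ == "__main__":
    bare_request = [("Host", "abc.com"), ("User-Agent", "")]
    print(header_anomalies(bare_request))
```

Recording the full header list for every visitor, as suggested above, and comparing real browsers against the flagged requests is what turns a rough list like this into something worth blocking on.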

pkKumar

9:16 am on Jan 9, 2013 (gmt 0)



You're mixing two unrelated things. There are two basic types of server, IIS and Apache.


OK, so in the background there are two types of web server, IIS and Apache. I have an ASP.NET site, so I guess IIS is the preferred choice. Can an ASP.NET site run on an Apache server?

keyplyr

9:34 am on Jan 9, 2013 (gmt 0)




Can an ASP.NET site run on an Apache server?

AFAIK you'd need to switch to PHP or use another server side language for functions you currently use asp for. That's why it was advised to switch over before you amass a lot of application dependent documents.

The beauty of using Apache is that it is a standard that has been built upon over the years, using core modules. Unless your host has installed some odd custom configuration, these functions will always be applicable and are easily researched for solutions to your specific needs. And since approx. 3x as many webmasters use Apache as IIS, as Lucy said, you'll get more help with questions.

dstiles

9:56 pm on Jan 9, 2013 (gmt 0)




Wilderness - Me? Lurking? :)

pkKumar - IIS is IIS and nothing else, although it does have .NET as a so-called "smart" "plugin" - never use it myself. So: IIS is not "preferred" but mandatory.

ASP code CAN run on an emulator - I think there are two or three - but it's not something I would ever consider. I looked at them a year or two ago and they were not reported as reliable. In fact, few emulators of this nature are ever completely reliable.

If you only have one or two web sites to manage it's worth re-coding for apache or some other linux/unix server, probably using PHP to code and MySQL as the database (if required). You can use PHP under IIS but why? If you decide on PHP then run it on the far safer (from hacks) linux machine.

blend27

8:56 pm on Jan 10, 2013 (gmt 0)




BTW, there is also this: [iis.net...]

Based on my experience, it is not as easy to switch from ASP/IIS to Apache/PHP when you/your client are already invested.

Good place to start though if one has decided to go the path of..., what do you call it, LAMP nowadays ;)...

p.s. I know I am in a mostly non-MS env, watch them keyboard buttons now..

Blend27

incrediBILL

3:00 am on Jan 11, 2013 (gmt 0)




If you decide on PHP then run it on the far safer (from hacks) linux machine.


Seriously off the original topic but in what universe is PHP safer on Linux? Poorly written PHP scripts are actually the biggest threat to escalation on any server I've ever seen.

The .NET on Linux problem may be solved by MainSoft's Grasshopper software, but I've never tried it.

Anyway, this discussion is beyond the scope of Spider Identification.
 
