
Forum Moderators: Ocean10000 & incrediBILL


Are these considered bad bots?

     
1:17 pm on Jan 7, 2013 (gmt 0)

Junior Member

joined:Jan 7, 2013
posts:110
votes: 0


While analyzing my log file I found these:

182.118.20.232 Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.8.0.11)+Gecko/20070312+Firefox/1.5.0.11;+360Spider

98.100.226.10 Mozilla/5.0+(textmode;+U;+Linux+i386;+en-US;+rv:3.0.110.0)+Gecko/20101006+EzineArticlesLinkScanner/3.0.0g

89.185.234.128 HTTP/1.0 - - - abc.com 301 0 0 231 44 483

78.158.11.226 HTTP/1.0 Lynx/2.8.5rel.1+libwww-FM/2.15FC+SSL-MM/1.4.1c+OpenSSL/0.9.7e-dev

Are these to be considered bad bots?
4:46 pm on Jan 7, 2013 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


Hi pkKumar and welcome to WebmasterWorld!

That depends, what were they doing?

The answer also depends on what you think is acceptable for something to do on your site.

None of them would've gained access to my server by default but that's a whole different story.
5:50 pm on Jan 7, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


Bad words in any UA, from the lines you provided:

Spider
Link
Scanner
libwww
dev

Lynx/2.8.5rel.1+libwww-FM/2.15FC+SSL-MM/1.4.1c+OpenSSL/0.9.7e-dev


This is Lynx, a text-mode tool for viewing HTML in the same manner that a robot views HTML.
The tool is freely usable by anybody.

Someone else may be able to advise you on the 89.185 range.
I can only guess it had a blank UA, since you provided none.
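
A minimal sketch of the keyword check described above, in Python. The word list is copied from this post; the function name and the treatment of "-" as a blank UA are assumptions made for illustration, not anyone's production rule set.

```python
import re

# Substrings taken from the list in the post above; matching is case-insensitive.
SUSPECT_WORDS = ("spider", "link", "scanner", "libwww", "dev")
SUSPECT_RE = re.compile("|".join(re.escape(w) for w in SUSPECT_WORDS), re.IGNORECASE)


def flag_user_agent(user_agent):
    """Return the suspect words found in a user-agent string, in order of appearance.

    A missing, empty, or "-" user agent is reported as ["<blank>"]
    (treating "-" as blank is an assumption about the log format).
    """
    if not user_agent or user_agent.strip() in ("", "-"):
        return ["<blank>"]
    return [m.group(0).lower() for m in SUSPECT_RE.finditer(user_agent)]


if __name__ == "__main__":
    lynx = "Lynx/2.8.5rel.1+libwww-FM/2.15FC+SSL-MM/1.4.1c+OpenSSL/0.9.7e-dev"
    print(flag_user_agent(lynx))
```

Short substrings such as "dev" and "link" can also match harmless user agents, so a hit is probably best treated as a reason to look closer rather than an automatic block.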
10:46 pm on Jan 7, 2013 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


I allow ezine scanner - my daughter writes articles for them and they come around verifying her site - not always successfully: they seem to have a few problems every now and then.

The others would not be accepted here.
11:25 pm on Jan 7, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5821
votes: 64




Exmasters hosting
89.185.228.0 - 89.185.229.255
89.185.228.0/23
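
For anyone checking ranges like this by hand, here is a small Python sketch using the standard `ipaddress` module. The helper names are made up for illustration; the addresses are the ones quoted in this thread.

```python
import ipaddress

# The range quoted above, written as a network object.
EXMASTERS = ipaddress.ip_network("89.185.228.0/23")


def in_range(ip, network=EXMASTERS):
    """True if the dotted-quad address falls inside the given network."""
    return ipaddress.ip_address(ip) in network


def range_to_cidrs(first, last):
    """Convert a first-last address range into the CIDR blocks that cover it exactly."""
    blocks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(block) for block in blocks]


if __name__ == "__main__":
    print(range_to_cidrs("89.185.228.0", "89.185.229.255"))
    print(in_range("89.185.234.128"))
```

Note that the address from the first post, 89.185.234.128, is not inside 89.185.228.0/23, so it is worth confirming the host's full allocation before relying on this block alone.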
4:21 am on Jan 8, 2013 (gmt 0)

Junior Member

joined:Jan 7, 2013
posts:110
votes: 0


Can these be blocked with robots.txt, and will they even follow robots.txt? If not, you will say to block them in .htaccess, but mine is Windows shared hosting and the site is in ASP.NET. What's the alternative?
4:34 am on Jan 8, 2013 (gmt 0)

Moderator from US 

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:2572
votes: 48


You cannot block robots with robots.txt, but you can disallow areas of your site, or the whole site, for compliant robots only. Usually, compliant robots show a URL where you can read their policies. The robots.txt file does not force robots to obey your instructions. On Windows servers, look into using ISAPI instead of .htaccess; try a search here or ask your host.
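
To illustrate the "compliant robots only" point, here is a rough Python sketch of what a well-behaved robot does before fetching a page, using the standard `urllib.robotparser`. The robots.txt rules, the `/private/` path and the example.com URLs are all hypothetical; a robot that skips this step is not affected by the file at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named robot, keep everyone else out of /private/.
ROBOTS_TXT = """\
User-agent: 360Spider
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Answer the question a compliant robot asks itself before requesting a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("360Spider", "http://example.com/page.html"))
    print(allowed("SomeBot", "http://example.com/page.html"))
```

The check runs entirely on the robot's side, which is why server-side rules (ISAPI, .htaccess or similar) are still needed for robots that ignore it.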
4:51 am on Jan 8, 2013 (gmt 0)

Junior Member

joined:Jan 7, 2013
posts:110
votes: 0


@not2easy, the user agents I have posted here don't mention any email address or website. That's the problem.
7:07 am on Jan 8, 2013 (gmt 0)

Junior Member

joined:Jan 7, 2013
posts:110
votes: 0


Well, ISAPI is not an option, especially on a shared hosting server. I didn't find anything here, but according to a blog we can do that with request filtering in web.config, like this:
<requestFiltering>
  <filteringRules>
    <filteringRule name="Block bot" scanUrl="false" scanQueryString="false">
      <scanHeaders>
        <add requestHeader="user-agent" />
      </scanHeaders>
      <appliesTo />
      <denyStrings>
        <clear />
        <add string="googlebot" />
      </denyStrings>
    </filteringRule>
  </filteringRules>
</requestFiltering>

For IP addresses it has:
<ipSecurity allowUnlisted="true">
  <clear/>
  <add ipAddress="111.111.111.111"/> <!-- Blocks a specific IP. -->
  <add ipAddress="222.222.222.0" subnetMask="255.255.255.0"/> <!-- Blocks entire subnet. -->
</ipSecurity>

Has anybody tried that?
10:49 am on Jan 8, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


The best thing you can do is find another host and method immediately.

One that uses shared Apache.

If you wait too long, you'll have so much time invested in alternative methods that you'll be unable to move later.

Has anybody tried that ?


Just do a Google search on

<denyStrings> filtering
11:57 am on Jan 8, 2013 (gmt 0)

Junior Member

joined:Jan 7, 2013
posts:110
votes: 0


I know it's a silly question, but I have only heard of Linux and Windows shared hosting servers till now. I have no idea how Apache works.

Another thing: if you have an ASP.NET website then it is recommended to host it on a Windows server, is that correct?
12:51 pm on Jan 8, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


Another thing: if you have an ASP.NET website then it is recommended to host it on a Windows server, is that correct?


I have no clue!
Never had an asp site and I've had websites for more than a decade.

All my hosts (shared) have been Apache based.
Having an Apache host doesn't require any real knowledge of Apache, rather the host does all that and presents you with a control panel.

99.% of the participants in this forum use Apache htaccess to manage access restrictions.

All I'm suggesting is that you're fighting the longtime trend by going with Windoze and ASP. If you're looking for help in that regard, then you will be better off locating a forum that specializes in those categories.

Microsoft IIS Web Server and ASP.NET [webmasterworld.com]

There's a forum participant who could help you if he'd just come out of lurk mode.
8:31 pm on Jan 8, 2013 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


I've been using only IIS/ASP hosting for over twelve years and wish I'd never dropped linux to begin with! By the time I realised the mistake I had a good-sized library of ASP code. :(

I get over the problem with a file of ASP code that traps various IPs, UAs and methods with the aid of a MySQL database of banned and blocked IPs (there is a difference); the database also holds many known ranges for broadband ISPs. Newly auto-blocked (previously unencountered - typically 10 to 20 a day) IPs are analyzed by me and the full range added to the database.

The code file is included in every web page - takes a bit of extra time to load and test but not significant (typically 15 to 40 milliseconds), even with the MySQL IP database behind it (I have a timer that monitors this) and logging the result(s) to half a dozen text-based logs.
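
As a rough illustration of the idea only (not the actual ASP code), here is a Python sketch of a per-request lookup against a table of IP ranges, with the lookup timed in milliseconds. It swaps in an in-memory SQLite database for MySQL so that it runs standalone; the table layout, the function names and the "banned"/"blocked" status labels are assumptions.

```python
import ipaddress
import sqlite3
import time


def open_db():
    """Create an in-memory stand-in for the IP database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ranges (first INTEGER, last INTEGER, status TEXT)")
    return db


def add_range(db, cidr, status):
    """Store a CIDR block as integer start/end addresses with a status label."""
    net = ipaddress.ip_network(cidr)
    db.execute(
        "INSERT INTO ranges VALUES (?, ?, ?)",
        (int(net.network_address), int(net.broadcast_address), status),
    )


def check_ip(db, ip):
    """Return (status, elapsed_ms); status is None when the address is not listed."""
    start = time.perf_counter()
    number = int(ipaddress.ip_address(ip))
    row = db.execute(
        "SELECT status FROM ranges WHERE ? BETWEEN first AND last LIMIT 1",
        (number,),
    ).fetchone()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return (row[0] if row else None), elapsed_ms


if __name__ == "__main__":
    db = open_db()
    add_range(db, "89.185.228.0/23", "banned")
    print(check_ip(db, "89.185.229.10"))
```

Storing each range as a pair of integers keeps the lookup to a single comparison query, which is one plausible way to stay within a per-page budget of a few milliseconds.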

My method was adopted some years ago because, at the time, ASP had only an expensive third-party isapi application. Now, I would recommend using the htaccess-like isapi module under IIS. Advantages of that include control of image serving, which my method does not easily cover.

keyplyr - extend that IP range to 202.85.192.0 - 202.85.207.255
9:58 pm on Jan 8, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


Many thanks for coming out of lurk-mode ;)
10:23 pm on Jan 8, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12721
votes: 244


I know it's a silly question, but I have only heard of Linux and Windows shared hosting servers till now. I have no idea how Apache works.

You're mixing two unrelated things. There are two basic types of server, IIS and Apache. Don't know about IIS, but Apache can run on any operating system. You can also get a pseudo-server that runs Apache on your personal computer (Mac, Windows or Linux) so you can test your site exactly the way it would appear in real life, including things like site-absolute links and PHP files that you can't do in your local browser.

You don't need to know what OS your server uses. You only need to know whether it's Apache or not-Apache, because that determines whether you're making your rules in htaccess or, er, something else.

Since Apache is more widely used overall, you will find more people who know what to do about it. F'rinstance, we'd offer up at least three different ways to block a blank user-agent ;)
11:55 pm on Jan 8, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1667
votes: 36


I am in the same boat as dstiles, except I run ColdFusion (for over a decade and a half) on IIS/MSSQL/MySQL, no need for Apache.

I see a lot of Windows hosts offer "Mod-Rewrite for IIS" on IIS 6/7/7.5 on shared hosting plans. There are also several other flavors out there.

.htaccess (and other Apache bells and whistles) is a great tool, but I only use it when I want to block something and don't want to hear about it (like specific SE spiders from Asia) in my custom logs. I don't use IIS logs, because all data is stored in a very well optimized & encrypted schema, and there are custom, very well optimized CF apps written on top of that. This way I keep all the data; all I need to do is back up DB and code and I am a history, with the history ;).

@pkKumar,

Do yourself a favor and start analyzing request headers from your visitors (programmatically record them), then compare them and learn. Take notes on different UAs. 99% of the bad bots and scrapers have them a$* sideways/backwards and in the wrong order. I won't go into the details on this, but that should get you started!

^ P.S. It is all incrediBILL's fault, he got me thinking, and I now see the light! :) Easy prey for a bot blocker code logic.
1:00 am on Jan 9, 2013 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


he got me thinking


Sorry about that.

Won't let it happen again! :)

Those headers can make a huge difference because browsers pretty much send the same headers all the time but bots do not, and it's much easier to trap tons of garbage via header analysis (for now) than via user agents. Once I started looking at them in more detail, I was able to block a lot more stuff than before.

Just for fun I ran a test and disabled the data center blocking list just to see how much the header tests would stop all by themselves and it was amazing that a simple bit of code could really block most of the current crawler crud without user agent parsing, data center block lists, blacklists or any of the other time consuming crap.

Sadly, some have done a better job at faking headers and you need all the other stuff to still block them but at least they're currently the minority.
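
A very rough Python sketch of the header-analysis idea discussed here. The thread deliberately does not say which tests are used, so the list of "expected" headers below is purely an assumption for illustration, and the function names are hypothetical.

```python
# Hypothetical list of headers that mainstream browsers normally send;
# the actual tests used by the posters are not disclosed in this thread.
EXPECTED_HEADERS = ("user-agent", "accept", "accept-language", "accept-encoding")


def missing_browser_headers(headers):
    """Return the expected headers that are absent from a request (names compared case-insensitively)."""
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED_HEADERS if name not in present]


def looks_like_bot(headers):
    """Flag a request when any of the expected browser headers is missing."""
    return bool(missing_browser_headers(headers))


if __name__ == "__main__":
    request = {
        "Host": "abc.com",
        "User-Agent": "Lynx/2.8.5rel.1+libwww-FM/2.15FC+SSL-MM/1.4.1c+OpenSSL/0.9.7e-dev",
    }
    print(missing_browser_headers(request))
```

A presence check like this is only a starting point; recording the full header set per visitor and comparing it against known browsers, as suggested earlier in the thread, would catch more.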
3:19 am on Jan 9, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1667
votes: 36


some have done a better job at faking headers


They still send "Keep-Alive: 300" :)
9:16 am on Jan 9, 2013 (gmt 0)

Junior Member

joined:Jan 7, 2013
posts:110
votes: 0


You're mixing two unrelated things. There are two basic types of server, IIS and Apache.


OK, so in the background there are two types of web server, IIS and Apache. I have an ASP.NET site, so I guess IIS is the preferred choice. Can an ASP.NET site run on an Apache server?
9:34 am on Jan 9, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5821
votes: 64


Can asp.net site run on Apache server ?

AFAIK you'd need to switch to PHP or use another server side language for functions you currently use asp for. That's why it was advised to switch over before you amass a lot of application dependent documents.

The beauty of using Apache is that it is a standard that has been built upon over the years, using core modules. Unless your host has installed some odd custom configuration, these functions will always be applicable and are easily researched for solutions to your specific needs, and since approx 3x as many webmasters use Apache as IIS, as Lucy said, you'll get more help with questions.
9:56 pm on Jan 9, 2013 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


Wilderness - Me? Lurking? :)

pkKumar - IIS is IIS and nothing else, although it does have .NET as a so-called "smart" "plugin" - never use it myself. So: IIS is not "preferred" but mandatory.

ASP code CAN run on an emulator - I think there are two or three - but it's not something I would ever consider. I looked at them a year or two ago and they were not reported as reliable. In fact, few emulators of this nature are ever completely reliable.

If you only have one or two web sites to manage it's worth re-coding for apache or some other linux/unix server, probably using PHP to code and MySQL as the database (if required). You can use PHP under IIS but why? If you decide on PHP then run it on the far safer (from hacks) linux machine.
8:56 pm on Jan 10, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1667
votes: 36


BTW, there is also this: [iis.net...]

Based on my experience, it is not as easy to switch from ASP/IIS to Apache/PHP when you/your client are already invested.

Good place to start though if one has decided to go the path of..., what do you call it, LAMP nowadays ;)...

p.s. I know I am in a mostly non-MS env, watch them keyboard buttons now..

Blend27
3:00 am on Jan 11, 2013 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


If you decide on PHP then run it on the far safer (from hacks) linux machine.


Seriously off the original topic but in what universe is PHP safer on Linux? Poorly written PHP scripts are actually the biggest threat to escalation on any server I've ever seen.

The .NET on Linux problem may be solved by MainSoft's Grasshopper software, but I've never tried it.

Anyway, this discussion is beyond the scope of Spider Identification.