How to use installed Perl module WWW::RobotRules

         

aodonline

3:51 pm on Jan 5, 2004 (gmt 0)

10+ Year Member



On my server we have an Ensim CP. I have reseller access, so I can add and remove different services from my domains.

I noticed today that I can essentially make two different CGI-type directories by enabling certain services. This puzzles me, since I really don't know a lot about Perl and CGI scripts.

Once I realized that I now have two places to run Perl scripts from, I immediately decided to see if there was a difference between the two locations. I ran a script called perldiver.cgi in both the default directory, cgi-bin, and the other one, simply called perl.

There are quite a few more modules in the perl location, and one in particular grabbed my attention: WWW::RobotRules [perldoc.com...]

What I want to know is how to use this module on my site to block bad bots from accessing sections of my site and, if possible, to block certain programs from all of my site.

Yes, I know about mod_rewrite, but this possible solution is interesting since no one in the Apache forum has mentioned it.

bcolflesh

3:54 pm on Jan 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Did you read the description of that module? It's for parsing robots.txt files - not for bot blocking.

aodonline

4:42 pm on Jan 5, 2004 (gmt 0)

10+ Year Member



Read the SYNOPSIS. It’s for stopping bots from disobeying the robots.txt file.

See also: [perldoc.com...]

If you have a script that uses this module and make it an include on every page of your site, it EFFECTIVELY blocks the bot if you write your script to do so.

All I'm asking is if someone here has done this or has thought about it.

[edit] removed unnecessary comment about a post I had taken out of context [/edit]

[edited by: aodonline at 5:25 pm (utc) on Jan. 5, 2004]

bcolflesh

4:48 pm on Jan 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You may want to read the whole page - that module is used to parse robots.txt files based on the accepted standard. A script that queries a server can check its URLs against the parsed rules and then respect (or not) the contents of the robots.txt file.
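For anyone following along, here is a minimal sketch of that crawler-side use. The bot name, host, and robots.txt content are made up for illustration; a real crawler would fetch the file from the server first. It only uses the new(), parse(), and allowed() methods from the module's documentation.

```perl
use strict;
use warnings;
use WWW::RobotRules;

# Hypothetical crawler name, host, and robots.txt content (illustration only).
my $rules      = WWW::RobotRules->new('ExampleBot/1.0');
my $robots_url = 'http://www.example.com/robots.txt';
my $robots_txt = <<'END';
User-agent: *
Disallow: /private/
END

# parse() takes the URL the robots.txt came from, plus its content.
$rules->parse($robots_url, $robots_txt);

# A polite crawler asks allowed() before requesting each URL.
for my $url ('http://www.example.com/index.html',
             'http://www.example.com/private/data.html') {
    my $verdict = $rules->allowed($url) ? 'allowed' : 'disallowed';
    print "$url: $verdict\n";
}
```

Note that nothing here stops a request; the module only answers the question, and the script decides what to do with the answer.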

Reading is fundamental.

aodonline

5:28 pm on Jan 5, 2004 (gmt 0)

10+ Year Member



Ok so let me get this straight here.

You're saying the module is basically for someone running a bot of their own, who uses it to control their bot by parsing the robots.txt for allowed/disallowed rules?

If that's what you're saying, then would it not be possible to use the module in reverse on your own site?

I realize the module just parses the robots.txt, but how you use the results shouldn't matter.
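To make the "in reverse" idea concrete, here is a rough sketch of what I have in mind, not a tested solution. The site URL, the robots.txt content, the BadBot name, and the is_blocked() helper are all my own assumptions. It feeds the visitor's User-Agent to the module as if it were the bot's own name, then checks the requested path against the site's rules. Keep in mind that the User-Agent header can be faked, so this would only catch bots that identify themselves honestly.

```perl
use strict;
use warnings;
use WWW::RobotRules;

# Hypothetical site URL and robots.txt content; a real script would
# read the site's own robots.txt from disk instead.
my $site       = 'http://www.example.com';
my $robots_txt = <<'END';
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
END

# Hypothetical helper: true if robots.txt disallows $path for this User-Agent.
sub is_blocked {
    my ($user_agent, $path) = @_;
    my $rules = WWW::RobotRules->new($user_agent);
    $rules->parse("$site/robots.txt", $robots_txt);
    return !$rules->allowed("$site$path");
}

# In a CGI script these values come from the web server's environment;
# the fallbacks are only there so the sketch runs from the command line.
my $agent = $ENV{HTTP_USER_AGENT} || 'BadBot/2.1';
my $path  = $ENV{REQUEST_URI}     || '/index.html';

if (is_blocked($agent, $path)) {
    print "Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\nForbidden\n";
} else {
    print "Content-Type: text/plain\r\n\r\nWelcome\n";
}
```

Something like this would have to run on every request, for example as an include on each page, which is the part I'm hoping someone here has tried.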