Forum Moderators: coopster

Message Too Old, No Replies

Implementing old thread script?


jake58

8:55 pm on Dec 30, 2007 (gmt 0)

10+ Year Member



Howdy:

I am running Windows 2000 and Apache 2.2.4, with 5 domains.

I have been getting hundreds, if not thousands, of hits from bots each day. In addition, I have over 4,000 files for download, so I get download managers, search.live and Googlebot eating up my DSL bandwidth.

I have blocked a lot of the bots with PeerGuardian 2, but I still don't have a good way to block download managers.

Here is an old thread that seems like it would work, but I am not running Linux/Apache.

[webmasterworld.com...]

I do have php installed.

My question is, how do I install this script since I do not have .htaccess files?

thanks,

john

PHP_Chimp

9:21 pm on Dec 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How are all of your download files managed? Are they all called from a single page using GET, or does each have its own page?

Hopefully you are using a download.php?f=get_this sort of arrangement, as this will make it a lot easier for you to use that script: you can just run it at the top of the download.php page.

If you don't have that sort of setup then it will be more difficult to implement. Since all requests pass through .htaccess files, you can set things up to happen on any request to any part of your site through one file. You could place that script at the top of each page that is used to call a download...however, if you have a few thousand pages to alter, you may just decide that it isn't worth it.

Do you not have .htaccess because your host doesn't allow it?
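As a rough sketch of that download.php?f=get_this arrangement (the file and directory names here are assumptions, and the actual blocking script is the one from the linked thread, not shown):

```php
<?php
// Hypothetical download.php -- a sketch only. The blocking script from
// the linked thread would be pulled in first, e.g.:
// require_once(dirname(__FILE__) . '/block_bad.php');

// basename() strips any directory part, so "?f=../secret" cannot escape
// the downloads directory.
$file = isset($_GET['f']) ? basename($_GET['f']) : '';
$path = dirname(__FILE__) . '/downloads/' . $file;

if ($file === '' || !is_file($path)) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

// Send the file as an attachment so browsers download rather than display it.
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $file . '"');
header('Content-Length: ' . filesize($path));
readfile($path);
?>
```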

henry0

9:29 pm on Dec 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jake, I did not try this script (but plan to).

I believe that on your system you should be able to install it in a directory at root level
and use the following at the top of each script, but below the header if you do not have output buffering (OB) on.

Add (on each page):
require_once($_SERVER['DOCUMENT_ROOT']."/your_dir/bad_bot.php");

Further, those pages need to be .php,
or .html if it parses PHP.

I never run anything on a non-*nix machine,
so I am not even sure that my suggestion above will work!

jake58

9:50 pm on Dec 30, 2007 (gmt 0)

10+ Year Member



This is a windows/apache home server. No htaccess files.

I am just doing a directory listing of files within apache.

Anything else I have tried takes too long to display the files.

What is OB?

Could this go at the bottom of the httpd.conf file?

[edited by: dreamcatcher at 11:11 pm (utc) on Dec. 30, 2007]
[edit reason] no urls, thanks. [/edit]

lammert

12:50 am on Dec 31, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is a windows/apache home server. No htaccess files.

But there should be an httpd.conf file, which accepts the same commands as you would put in the .htaccess file.

jake58

12:54 am on Dec 31, 2007 (gmt 0)

10+ Year Member



So can I just add the PHP script to the bottom of the httpd.conf file, or is there some command needed?

jake58

1:38 am on Dec 31, 2007 (gmt 0)

10+ Year Member



I am beginning to think it might be better to use a smart type of firewall software. The reason is that these might not be download managers but website copiers.

Website copiers just need the IP address; wouldn't they bypass any PHP file running?

I have to do something quickly, because I just had to add an IP to the PeerGuardian list because it was using 100% of my bandwidth.

lammert

1:42 am on Dec 31, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think you are referring to the following piece of .htaccess code in the thread you mentioned:
<IfModule mod_php4.c>
php_value auto_prepend_file "/path/to/file/block_bad.php"
</IfModule>

You can place this in your httpd.conf file, or alternatively you can put a "require_once('somedir/block_bad.php')" in each of your PHP files as henry0 already pointed out. In both cases the script block_bad.php will be executed just before a requested page is parsed.

jake58

3:17 am on Dec 31, 2007 (gmt 0)

10+ Year Member



I do not have a mod_php4.c or mod_php5.c file or anything like it.

What is the module?

jake58

3:29 am on Dec 31, 2007 (gmt 0)

10+ Year Member



I think the windows version is phpapache5.dll and will try it.
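For what it's worth, on Apache 2.2 the PHP 5 module DLL is usually named php5apache2_2.dll; the exact name and paths depend on your PHP build, so treat the following httpd.conf lines as an example only (block_bad.php is the script from the linked thread):

```apache
# Paths and DLL name are examples -- adjust to your install.
LoadModule php5_module "C:/php/php5apache2_2.dll"
AddType application/x-httpd-php .php

# With PHP 5 loaded, the <IfModule> test becomes mod_php5.c:
<IfModule mod_php5.c>
    php_value auto_prepend_file "C:/www/scripts/block_bad.php"
</IfModule>
```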

jake58

8:42 am on Dec 31, 2007 (gmt 0)

10+ Year Member



I got it working in the httpd.conf file.

In my testing:

It has no effect on website copiers.

If a person downloads a single file from the directory listing, I am getting full bandwidth. With more than one, it slows greatly.

If I try to do a third, it stalls the third one until one of the other two is done, then starts the third.

I didn't get any kind of error screens.

One has to remember my workstation's IP address is the same as the web server's, and on the same DSL connection.

That may make some difference.

PHP_Chimp

12:05 pm on Dec 31, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To stop site copiers you may well just have to block them by their user-agent string.
You can do this through mod_rewrite in the httpd.conf. You may want to ask on the Apache [webmasterworld.com] forum if you need help setting up that side of things.

The problem with blocking UA strings is that they are set by the client itself...so there is nothing stopping people changing their UA string that should have been 'nasty site ripper' to 'Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.8.1.11) Gecko/20071204 Ubuntu/7.10 (gutsy) Firefox/2.0.0.11' or any other valid user-agent string. Some of the site rippers already come with IE6 pre-configured as the UA string.
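A minimal mod_rewrite sketch of that idea for httpd.conf (the user-agent substrings are illustrative examples of known rippers, not a vetted blocklist):

```apache
# Requires mod_rewrite to be loaded. [F] returns 403 Forbidden.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (HTTrack|WebCopier|WebZIP|Teleport) [NC]
RewriteRule .* - [F,L]
```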

There was a script posted on one of these forums about stopping scrapers. It would stop the rippers as well, as it only allowed a certain number of visits in a given time frame. However, I can't remember where or who posted it. I will have a look, but hopefully someone else remembers it and can point you in the right direction.
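Until someone finds that thread, the idea can be sketched roughly like this. This is not the original script being recalled; the filenames, limits, and flat-file storage are all made up for illustration (a database would scale better):

```php
<?php
// Allow at most $limit requests per IP within a $window-second frame.

// Pure helper: drop timestamps outside the window, then decide whether
// this hit is allowed; returns array(allowed, updated timestamp list).
function check_rate(array $hits, $now, $limit, $window) {
    $recent = array();
    foreach ($hits as $t) {
        if ($now - (int)$t < $window) {
            $recent[] = (int)$t;
        }
    }
    $allowed = count($recent) < $limit;
    if ($allowed) {
        $recent[] = $now;
    }
    return array($allowed, $recent);
}

// Per-request glue (only runs during a real web request).
if (isset($_SERVER['REMOTE_ADDR'])) {
    $logfile = dirname(__FILE__) . '/rate_' . md5($_SERVER['REMOTE_ADDR']) . '.log';
    $hits    = is_file($logfile) ? file($logfile) : array();

    list($ok, $hits) = check_rate($hits, time(), 30, 60);
    file_put_contents($logfile, implode("\n", $hits));

    if (!$ok) {
        header('HTTP/1.1 503 Service Unavailable');
        exit('Request limit reached - try again later.');
    }
}
?>
```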