Forum Moderators: phranque
I am wondering if anyone can help me with how to stop HTTrack.
I have tried putting this in htaccess:
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^httrack.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^httrack* [OR]
RewriteRule ^.* - [F]
It isn't working and so it's costing me around 5GB a day in bandwidth!
Any help would be very much appreciated.
Thanks
Steve
[edited by: jdMorgan at 2:44 pm (utc) on July 21, 2009]
[edit reason] No URLs, please. [/edit]
I have changed my htaccess file now and will check the logs tomorrow to see if it has stopped them doing this. Hopefully it will have.
Just a word of caution!
After making changes to htaccess, you should ALWAYS and IMMEDIATELY check your site(s) to ensure they still function and do not return a 500 error (site not working) due to a syntax error.
I have been monitoring my logs and this hasn't stopped HTTrack from downloading my site! :(
They were back again last night..
"HTTrack off-line browser 7.03 GB 23 Jul 2009 - 22:35"
Does anyone have any other ideas or suggestions on how I can stop this? I was wondering about bandwidth limiting. Is that possible?
Thanks In Advance
Steve
Before getting too excited, let's be sure the code is correct and that mod_rewrite is actually enabled. I noted a missing directive in the code above, so I'd suggest:
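Options +FollowSymLinks
# mod_rewrite directives are ignored unless the engine is switched on
RewriteEngine on
#
# Match any User-Agent containing "HTTrack", in any letter case
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]
# Leave the URL unchanged (-) and answer with 403 Forbidden
RewriteRule ^ - [F]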
If you don't want to install these add-ons, then you could use an on-line user-agent spoofer like WannaBrowser, although results with such on-line tools are sometimes inconsistent (especially when redirects are involved).
Also note that if you use a custom 403 error page, then the URL-path of that page will need to be excluded from the rule above. Otherwise, you'll get a 403 error loop.
Jim
Thanks, this htaccess stuff really is beyond my understanding! I have followed your advice and can see HTTrack's access being blocked now, so that's reassuring. :) Thank you.
As you noted, I do have a custom 403 error page loop, but I'm not sure how to deal with it. My error pages are served via the server configuration, which is administered by my host, i.e. the URL doesn't actually change when the error page is displayed.
I think I would prefer to just redirect these rippers back to Google if I knew how to!
Thanks so much for your help.
Steve
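A redirect in place of the 403 might look something like the sketch below. It assumes mod_rewrite is enabled and uses the Google home page purely as an example target. The robots.txt exclusion is an added assumption, so that file stays fetchable:

```apache
Options +FollowSymLinks
RewriteEngine on
#
# Leave robots.txt alone so it can still be fetched
RewriteCond %{REQUEST_URI} !^/robots\.txt$
# Match any User-Agent containing "HTTrack", in any letter case
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]
# Send a temporary (302) redirect to the example target instead of a 403
RewriteRule ^ http://www.google.com/ [R=302,L]
```

Since no 403 is returned in this variant, the custom error page loop does not come into play.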
In order to cure the 403 loop problem, you'll need to find the local path of the custom 403 page. Or perhaps you might want to replace it with your own custom 403 page by using the ErrorDocument directive, in which case you can define the path yourself.
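As a rough sketch of the ErrorDocument approach: this assumes your host allows ErrorDocument in .htaccess, and the path /errors/403.html is purely a hypothetical placeholder for wherever you save your own page.

```apache
# Hypothetical location; point this at wherever you actually save your 403 page.
# Use a local URL-path beginning with a slash rather than a full http:// URL,
# otherwise Apache issues a redirect and the client never sees the 403 status.
ErrorDocument 403 /errors/403.html
```

With a path you have chosen yourself, you know exactly which URL-path has to be excluded from the blocking rule.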
At any rate, symbolically, the cure for the loop (and another problem) is:
Options +FollowSymLinks
RewriteEngine on
#
# Exempt robots.txt and the custom 403 page so that neither is itself forbidden
RewriteCond %{REQUEST_URI} !^/(robots\.txt|<path-to-custom-403-page\.html>)$
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]
RewriteRule ^ - [F]
For more information about mod_rewrite and regular-expression patterns, see the resources cited in our Apache Forum Charter (link at top of this page).
Jim