Forum Moderators: DixonJones
A reverse DNS lookup revealed the following host: cache-ink2-cro-hsi.cableinet.co.uk.
Cableinet is the old name for Blueyonder, a UK cable ISP (i.e. Telewest).
Anyone got any idea what's going on here? Would you advise me to restrict this kind of activity?
It did request my robots.txt.
Thanks, Paul
It works the same way as IE's File > Save As, but quicker, in that it downloads all the .html, .js, .gif, .jpg etc. files (based on the options set) from your site.
I don't know of any way of stopping it, as it uses the standard HTTP port 80. The program works by examining the source code of the page and then requesting and downloading all assets. It then follows all links and does the same on those pages. I'm not sure what it does with server-side pages though (.asp, .jsp, .php etc.).
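To make that description concrete, here is a rough Python sketch of the same idea: parse a page, collect its assets, then follow same-site links and repeat. It is not how HTTrack itself is written; the names AssetLinkParser, crawl and SITE are made up for illustration, and the "site" is an in-memory dictionary so nothing is actually fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class AssetLinkParser(HTMLParser):
    """Collect asset URLs (img/script/link) and page links (a) from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            self.assets.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "a" and attrs.get("href"):
            # Drop any #fragment so the same page is not queued twice.
            absolute = urljoin(self.base_url, attrs["href"])
            self.links.append(urldefrag(absolute).url)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first walk of one host. fetch(url) returns HTML text or None."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    pages, assets = set(), set()
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue
        pages.add(url)
        html = fetch(url)
        if html is None:
            continue
        parser = AssetLinkParser(url)
        parser.feed(html)
        assets.update(parser.assets)
        for link in parser.links:
            # Stay on the starting host; off-site links are ignored.
            if urlparse(link).netloc == host and link not in pages:
                queue.append(link)
    return pages, assets


# Hypothetical two-page site held in memory instead of on a web server.
SITE = {
    "http://example.com/": '<a href="/about.html">About</a><img src="logo.gif">',
    "http://example.com/about.html": (
        '<script src="/menu.js"></script>'
        '<a href="/">Home</a><a href="http://other.example/">Elsewhere</a>'
    ),
}

pages, assets = crawl("http://example.com/", SITE.get)
print(sorted(pages))
print(sorted(assets))
```

A real copier would also save each file to disk and rewrite the links to point at the local copies; that part is left out here.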
Someone with a Blueyonder account has used it to take a copy of your entire site.
This might be because they like it so much and want to browse it offline. Or it might be because they intend to clone it in some way.
Either way, it's not nice.
If I remember correctly, HTTrack can be configured to ignore the robots.txt, but you should be able to block its user agent, which at least would make it more difficult for them.
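For what it's worth, the user-agent check itself is simple. Below is a minimal Python sketch of it as a WSGI wrapper; a server-level rule (for example in .htaccess) would apply the same kind of test. The names BLOCKED_AGENT_SUBSTRINGS, is_blocked and block_agents are invented for this example, and it assumes the copier sends a User-Agent header containing "HTTrack". Anyone who changes that header gets straight past it, so it only raises the bar a little.

```python
# Assumption: the unwanted client identifies itself with "HTTrack"
# somewhere in its User-Agent header. A changed header defeats this.
BLOCKED_AGENT_SUBSTRINGS = ("httrack",)


def is_blocked(user_agent):
    """Return True if the User-Agent contains a blocked substring."""
    ua = (user_agent or "").lower()
    return any(fragment in ua for fragment in BLOCKED_AGENT_SUBSTRINGS)


def block_agents(app):
    """Wrap a WSGI app so blocked user agents receive 403 Forbidden."""

    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def site(environ, start_response):
    """Stand-in for the real site: always answers 200 with a short body."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Welcome\n"]


protected_site = block_agents(site)
```

To try it locally, protected_site can be served with wsgiref.simple_server.make_server from the standard library.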
However, if your site makes a living from advertising you may not like it, nor if your site serves dynamic information that changes a lot (like a shop). There are probably a bunch of other reasons not to allow offline browsing, but my point was just that it is not always bad. It depends on the purpose of the site :)
If HTTrack can be configured to ignore robots.txt, I guess the best way for me to ban it is with .htaccess?
At slow connection speeds and with per-minute phone plus internet charges, there's no way I'm going to browse sites with the clock ticking away.
There is nothing wrong with downloading material from the web. Google does it all the time. It is how the material is used where a problem may occur. If it's available on the net, people will view it (and download the whole lot if they like it). As long as it's for personal use, no problem at all.