I discovered it because my webtracker logged an address that was something like c:/download/files etc., and that path wasn't on my computer. Of course, it also meant he was online at the time. Incidentally, could this be an advantage of having a webtracker instead of analysing logs?
It's cool to see people find your site worth saving, but it robs me of the satisfaction of seeing hits or drawing more traffic, since I only update once a week.
It might also indicate a problem: users might find my site too slow to load, so they prefer to read it off their hard disks, even when online!
Perhaps I should do something to prevent this?
And in Asia, too: I used to do it all the time when I was in China. Internet access paid by the minute can be expensive.
There is not a great deal you can do to prevent it, but to curtail downloading programs you could list their user-agent signatures in your robots.txt file. Even then, some of these programs may be set up to ignore robots.txt. Have a look at Brett's robots.txt tutorial here [searchengineworld.com].
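As a rough illustration of how such robots.txt entries work, here is a minimal Python sketch using the standard urllib.robotparser module to check what a well-behaved downloader would be allowed to fetch. The agent names "Teleport Pro" and "WebCopier" and the example.com URLs are placeholders (assumptions, not verified signatures); use the exact User-Agent strings you see in your own logs. It only stops programs that choose to honour robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt. The agent names are ASSUMED placeholders; replace them
# with the User-Agent strings the downloaders actually send to your server.
ROBOTS_TXT = """\
User-agent: Teleport Pro
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("Teleport Pro/1.29", "WebCopier", "Mozilla/5.0"):
        for url in ("http://www.example.com/index.htm",
                    "http://www.example.com/cgi-bin/counter"):
            print(agent, url, "allowed:", rp.can_fetch(agent, url))
```

The named downloaders are shut out of the whole site, while ordinary browsers are only kept out of /cgi-bin/.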
I agree with Xoc, though, that one way to foil "ripperofferers" would be to use hard (absolute) links rather than relative links.
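To see why absolute links send an offline reader back to your server, here is a small sketch using Python's urllib.parse.urljoin. It resolves links the way a browser would for a page saved at a hypothetical local path (file:///C:/download/files/index.htm): a relative link stays on the reader's disk, while an absolute one still points at the live site. www.example.com stands in for your domain. Some downloaders can rewrite absolute links when saving, so this is a deterrent rather than a guarantee.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical location of a saved copy on the reader's hard disk.
SAVED_PAGE = "file:///C:/download/files/index.htm"


def resolve(link: str, base: str = SAVED_PAGE) -> str:
    """Resolve a link from the saved page the way a browser would."""
    return urljoin(base, link)


def hits_server(link: str, base: str = SAVED_PAGE) -> bool:
    """True if following the link from the saved copy would contact a web server."""
    return urlparse(resolve(link, base)).scheme in ("http", "https")


if __name__ == "__main__":
    for link in ("page2.htm", "images/logo.gif",
                 "http://www.example.com/page2.htm"):
        print(link, "->", resolve(link), "| server hit:", hits_server(link))
```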
Here in Slovenia, internet access costs me $1 per hour plus $30 per month. In Europe, internet access is still paid by the hour in many, if not most, countries. And I have a 33.6 kbps modem/line.
I use Teleport Pro, which is excellent. I think this program can solve the "hard links" problem, so you can browse offline no matter what type of links are there, as long as the links are on the same domain (otherwise Teleport Pro won't download them, but I think you can configure that too).
There are several ways this program can save the site or page on the hard drive: one keeps it exactly as it is on the web (excellent for mirroring), and another prepares it for offline reading.
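The "same domain" rule can be sketched roughly as follows. This is not Teleport Pro's actual code, just a minimal Python illustration using the standard html.parser module: it collects href/src links from a page, resolves them against the page URL, and keeps only those on the starting host. The sample HTML and example.com URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def same_domain_links(page_url: str, html: str) -> list[str]:
    """Return absolute URLs (deduplicated, in page order) that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    result = []
    for link in collector.links:
        absolute = urljoin(page_url, link)
        if urlparse(absolute).netloc == host and absolute not in result:
            result.append(absolute)
    return result


if __name__ == "__main__":
    sample = """<a href="page2.htm">Next</a>
    <a href="http://www.example.com/about.htm">About</a>
    <img src="/images/logo.gif">
    <a href="http://www.other-site.com/">Elsewhere</a>"""
    for url in same_domain_links("http://www.example.com/index.htm", sample):
        print(url)
```

Relative and absolute links to the same host are both kept, which is why absolute links alone do not stop a tool that applies this kind of rule.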
We use WinHTTrack, a simple freeware program that can follow links and download one site, or any number of levels across many sites. It also downloads much faster than fetching and saving one page at a time, because, like LeechFTP, it runs several threads at once. It follows robots.txt.
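Why several threads help: on a slow line most of a download is spent waiting, so overlapping requests cuts the total time. The sketch below is a hedged illustration using Python's concurrent.futures; fake_fetch is a hypothetical placeholder that sleeps instead of touching the network, so it only demonstrates the principle, not how WinHTTrack or LeechFTP actually schedule their downloads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET: waits as if on a slow modem line."""
    time.sleep(0.1)
    return f"<html>contents of {url}</html>"


def fetch_all(urls, workers: int = 4) -> dict[str, str]:
    """Download several pages concurrently, as multi-threaded site grabbers do."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(fake_fetch, urls)))


if __name__ == "__main__":
    pages = [f"http://www.example.com/page{i}.htm" for i in range(8)]
    for n in (1, 4):
        start = time.perf_counter()
        fetch_all(pages, workers=n)
        print(f"{n} thread(s): {time.perf_counter() - start:.2f} s")
```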
We do not do it to steal code. I would guess that most offline browsing is not for any nefarious or cheating purpose. But if you run mainly a marketing or advertising site, you may have reason to suspect other motives when people download material en masse.
People do it to us regularly, and we are pleased they find our content useful enough to download for later reading, even the whole site. When we do find that people have published our material under their own name, we pursue it, and we use several methods to find such illegal copying. It's harder to find breaches of copyright when people make multiple copies to distribute offline to others. That is still illegal, but I don't think the act of downloading even a whole site for personal use is a problem at all.
Your logs still reflect the page views, at least the first time someone looks at a page. And server-based and browser-based caching already introduces random error into your page counts. You may also see the hits (e.g. my documents/yoursite/blah.htm) whenever an absolute location is called while the reader is online and they didn't download all the elements.
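The webtracker trick from the opening post (spotting c:/download/files in a logged address) can be automated. A hedged Python sketch, assuming your tracker records the page location the browser reports: it flags locations that are file: URLs or start with a Windows drive letter. The sample entries are invented, and your tracker's log format will differ.

```python
import re
from urllib.parse import urlparse

# A Windows drive-letter path such as c:/download/... or C:\My Documents\...
DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")


def looks_offline(location: str) -> bool:
    """Guess whether a logged page location comes from a copy on the reader's disk."""
    location = location.strip()
    # Check the drive letter first: urlparse would treat "c:" as a URL scheme.
    if DRIVE_PATH.match(location):
        return True
    return urlparse(location).scheme == "file"


if __name__ == "__main__":
    sample_log = [
        "http://www.example.com/index.htm",
        "c:/download/files/index.htm",
        "file:///C:/My%20Documents/yoursite/blah.htm",
    ]
    for entry in sample_log:
        print(entry, "| offline copy:", looks_offline(entry))
```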
As publishers ourselves, we have to accept that publishing information on the Web means allowing people to view it, whether online or offline, though not to breach copyright by copying code or reproducing content on other domains without clearance. It's the same deal whenever you publicly publish anything, such as a book.
Has anyone ever come across their site posted somewhere else without knowing about it? If you did, I'd like to know how you found it...