


How to protect a website from being downloaded?

Prevent downloading for offline browsing



10:53 pm on Feb 27, 2005 (gmt 0)

10+ Year Member


I have a pretty large website running from a database (with about 150,000 entries).

Now I see that some people want my database contents and are using some kind of downloading software to grab all my pages, probably so they can rip the content out of the downloaded pages afterwards. Not only are they stealing my database, they also produce a high load on my server and eat up my bandwidth.

How can I prevent that? Does anyone have experience in this field?

Thank you!


4:11 am on Feb 28, 2005 (gmt 0)

WebmasterWorld Senior Member txbakers is a WebmasterWorld Top Contributor of All Time 10+ Year Member

you can't.


5:36 am on Feb 28, 2005 (gmt 0)

WebmasterWorld Senior Member ogletree is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Never put the site up. You could track when people are requesting every page in order, the way a spider does, and only allow certain spiders to do that. A spider trap might help as well. There is no way to stop somebody from getting one page or a few pages, but you can detect when somebody is methodically fetching every page in a certain order and ban them by cookie or IP.
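
For illustration only (not something spelled out in the post), here is a minimal Python sketch of the spider-trap idea: a hidden URL, disallowed in robots.txt, that no human visitor would ever request -- any client that fetches it gets its IP banned from then on. The trap path and the in-memory ban set are assumptions.

```python
# Spider-trap sketch (illustrative only; the trap URL and in-memory ban
# set are assumptions). /trap/ would be linked invisibly in pages and
# disallowed in robots.txt, so polite spiders and humans never request
# it -- whole-site rippers that follow every link do.
TRAP_PATH = "/trap/"      # hypothetical hidden URL
banned_ips = set()        # a real site would persist this (file, DB, firewall)

def check_request(ip, path):
    """Return True if the request may proceed, False if the IP is banned."""
    if path.startswith(TRAP_PATH):
        banned_ips.add(ip)        # stepped in the trap: ban from now on
    return ip not in banned_ips

if __name__ == "__main__":
    print(check_request("10.0.0.1", "/page/1"))   # True
    print(check_request("10.0.0.1", TRAP_PATH))   # False -- just got banned
    print(check_request("10.0.0.1", "/page/2"))   # False -- still banned
```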


7:53 am on Feb 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Spiders sometimes go evil too (even Google's) and start sucking up bandwidth as if they were the only thing that mattered.

I use a throttle control to damp down both problems.

Anyone hitting my sites faster than a given rate gets put onto an escalating series of bans -- ultimately their IP address gets banned for 7 days.

Usually, the ten-minute ban (during which all incoming requests get sent a page saying "you are spidering too fast") is enough to stop most out-of-control spiders... they exhaust their cache of links and assume their job is done.

There are several levels of acceptable spidering (e.g. -- not the actual numbers -- more than 3 CGI executions in a second is a ban; more than 30 in a minute is also a ban).

That won't stop a well-behaved spider from getting the whole site, but (for a typical site of mine) that will take them a week or more. That solves the crazy bandwidth problem.

It also solves several other problems, as badly behaved spiders (like HTTrack) do not retry at a controlled rate -- they simply assume the site is closed to them and give up.
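
A minimal Python sketch of this kind of escalating throttle (the thresholds and ban lengths below are placeholders, just like the example numbers above -- not real recommendations):

```python
import time
from collections import defaultdict, deque

# Escalating throttle sketch. Each IP keeps a deque of recent request
# times; exceeding either rate puts the IP on an escalating ban.
PER_SECOND_LIMIT = 3
PER_MINUTE_LIMIT = 30
BAN_STEPS = [10 * 60, 60 * 60, 7 * 24 * 3600]   # 10 min, 1 hour, 7 days

hits = defaultdict(deque)       # ip -> timestamps of recent requests
ban_level = defaultdict(int)    # ip -> how many bans it has earned so far
banned_until = {}               # ip -> unix time the current ban expires

def allow_request(ip, now=None):
    """Return True if the request may proceed, False if the IP is banned."""
    if now is None:
        now = time.time()
    if banned_until.get(ip, 0) > now:
        return False                        # still serving a ban
    q = hits[ip]
    q.append(now)
    while q and q[0] < now - 60:            # keep only the last minute
        q.popleft()
    last_second = sum(1 for t in q if t > now - 1)
    if last_second > PER_SECOND_LIMIT or len(q) > PER_MINUTE_LIMIT:
        step = min(ban_level[ip], len(BAN_STEPS) - 1)
        banned_until[ip] = now + BAN_STEPS[step]
        ban_level[ip] += 1                  # escalate for the next offence
        return False
    return True
```

A request refused here would then get the short "you are spidering too fast" page instead of the real content.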


6:39 pm on Mar 1, 2005 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

Hello Philarmon,

I agree with what's been said so far. However, there are some solid steps you can take via .htaccess with mod_rewrite (for Apache servers) to ban known downloading agents. Be careful to research each user agent before you ban it; what's bad for one website may be a good thing for another.
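
The actual .htaccess/mod_rewrite rules aren't shown in the thread, so here is the same user-agent check sketched in Python instead; the agent substrings are purely illustrative and, as noted above, each one should be researched before banning.

```python
# User-agent ban sketch in Python (the thread's actual approach is
# .htaccess + mod_rewrite; this just shows the same check outside Apache).
# The substrings below are illustrative examples of offline downloaders,
# not a recommended ban list.
DOWNLOAD_AGENTS = ("httrack", "webzip", "webcopier", "offline explorer")

def is_download_agent(user_agent):
    """True if the User-Agent header looks like a whole-site downloader."""
    ua = (user_agent or "").lower()
    return any(name in ua for name in DOWNLOAD_AGENTS)

if __name__ == "__main__":
    print(is_download_agent("Mozilla/4.5 (compatible; HTTrack 3.0x)"))  # True
    print(is_download_agent("Mozilla/5.0 (Windows NT 10.0) Firefox"))   # False
```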



6:57 pm on Mar 1, 2005 (gmt 0)

10+ Year Member

Thanks for all the info, guys! I think I'll try both: the too-many-hits ban (although I have some concerns about SE spiders, which can spider a lot of pages at once pretty fast) and the user-agent ban.

You're great :)

