Forum Moderators: coopster & jatar k

Sites stealing my contents

How can I stop other sites from grabbing the contents of my site?

3:03 am on May 7, 2005 (gmt 0)

5+ Year Member

OK. I have a growing site that offers statistics and rankings for webmasters. As I expected, some of them have started pulling the pages that hold their stats, processing them with PHP, and showing only the items they want on their own sites.
Is there any way to stop that?
7:41 am on May 7, 2005 (gmt 0)

10+ Year Member

It depends on how you're handling the data. If it's stored in text files and you're using Apache, you can deny linking to or accessing those text files unless the request comes from your own server.

I'd need more info on exactly how you're implementing it to go any further.

7:50 pm on May 7, 2005 (gmt 0)

5+ Year Member

Thanks for your reply.
On my site, the data is pulled from the database in real time. These guys fetch the main page (or whatever pages they need) with PHP, then process it with implode and explode to show only the data they want. That is the method I need to prevent: I do not want anyone to be able to "include" the files from a script.
Is this possible?
8:35 pm on May 7, 2005 (gmt 0)

10+ Year Member

You can block them if you know their IP or their User Agent (if it is not a "normal" user agent), or if they can otherwise be distinguished from a normal user.
You can do that with Apache, PHP, ASP, and so on.
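As a rough illustration of blocking by IP or User Agent, here is a minimal sketch in Python (the same check could be written in PHP or as Apache rules). The network range, the User-Agent substrings, and the function name `is_blocked` are placeholders invented for this example, not values taken from the thread, and treating an empty User-Agent as suspicious is a policy assumption.

```python
import ipaddress

# Hypothetical blocklists: replace with addresses and agents seen in your own logs.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_UA_SUBSTRINGS = ["php/", "libwww", "curl"]


def is_blocked(remote_ip: str, user_agent: str) -> bool:
    """Return True if the request should be refused."""
    addr = ipaddress.ip_address(remote_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True

    ua = (user_agent or "").lower()
    # Assumption: a request with no User-Agent at all is treated as a script.
    if not ua:
        return True
    return any(marker in ua for marker in BLOCKED_UA_SUBSTRINGS)
```

A request handler would call `is_blocked(ip, ua)` before rendering the page and answer with a 403 when it returns True. This only catches scripts that do not bother to fake a browser User Agent.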
8:47 pm on May 7, 2005 (gmt 0)

5+ Year Member

Good. This leads somewhere. Normally, which user agent is sent when pages are grabbed the way I described?
9:00 pm on May 7, 2005 (gmt 0)

10+ Year Member

The problem is that grabbers usually send an MSIE, Mozilla, or Konqueror user agent.
Sometimes they send a specific user agent such as PHP/4.2.x (look for lists of webbots, fake UAs, grabbers...).
You first have to find something that sets them apart from normal users (ideally an IP or a host, a missing referrer, or a specific User Agent).
Routing access to the pages through a PHP redirect (rather than an HTML redirect) is a good way to limit grabbing.
9:03 pm on May 7, 2005 (gmt 0)

10+ Year Member

There are going to be around 100 user agents to block, and half of those programs let the user change which user agent is sent to the server.

On important data, I run some code that records how many times an IP has hit each area. If there are more than N hits to a specific area/target, it can either throttle them (it warns that they are hitting that area too often and, if the warning is ignored, bans them automatically, since crawlers don't read error messages) or silently redirect them from the intended page back to the main index.
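The hit-counting idea above could look roughly like the sketch below. It is written in Python for illustration and is not the poster's actual code: the class name `HitTracker`, the in-memory storage, and the default limits are all assumptions. A real site would keep the counters in a database or shared cache.

```python
import time
from collections import defaultdict, deque


class HitTracker:
    """Count hits per (IP, area) in a sliding window: warn once, then ban."""

    def __init__(self, max_hits=30, window_seconds=60.0):
        self.max_hits = max_hits          # the "N" from the post (assumed value)
        self.window = window_seconds      # assumed window length
        self.hits = defaultdict(deque)    # (ip, area) -> recent hit timestamps
        self.warned = set()               # (ip, area) pairs already warned
        self.banned = set()               # banned IPs

    def check(self, ip, area, now=None):
        """Record one hit and return 'ok', 'warn', or 'ban'."""
        now = time.time() if now is None else now
        if ip in self.banned:
            return "ban"

        key = (ip, area)
        recent = self.hits[key]
        recent.append(now)
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()

        if len(recent) <= self.max_hits:
            return "ok"
        if key not in self.warned:
            # First offence: show a warning and start a fresh window.
            self.warned.add(key)
            recent.clear()
            return "warn"
        # The warning was ignored: ban the IP for every area.
        self.banned.add(ip)
        return "ban"
```

The caller decides what each result means: show the warning page on "warn", and on "ban" either refuse the request or silently redirect to the main index. Warnings never expire in this sketch, which a production version would probably want to change.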

9:11 pm on May 7, 2005 (gmt 0)

ergophobe (WebmasterWorld Administrator, 10+ Year Member)

Your server logs should tell you what the UA is if you can ID who's doing it.
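For instance, a short script can tally which User Agents each IP sent. The sketch below is in Python and assumes the log uses Apache's "combined" format (with the Referer and User-Agent fields at the end); the regular expression and the function name `user_agents_by_ip` are illustrative, and requests containing escaped quotes are not handled.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format.
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def user_agents_by_ip(lines):
    """Map each client IP to a Counter of the User-Agent strings it sent."""
    counts = {}
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        counts.setdefault(match["ip"], Counter())[match["ua"]] += 1
    return counts
```

Passing an open log file to `user_agents_by_ip` and sorting the result by total hits should surface the heaviest clients and the User Agents they claim to be.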

There are some massive, long threads on this:

[webmasterworld.com...] (and its three predecessors; links are in the first post of each).


There is a more recent one around, but I can't find it at the moment.

11:13 pm on May 7, 2005 (gmt 0)

10+ Year Member

Or, if your server is Apache, just write a rule that prevents remote access to your files. You may have to change their extension, but PHP won't care.
11:20 pm on May 7, 2005 (gmt 0)

5+ Year Member

Thank you all. I guess blocking certain UAs is not practical then. I will give the Apache rules a try and come back with the results.
4:02 pm on May 9, 2005 (gmt 0)

5+ Year Member

Maybe require a membership and a login to view the data? Require visitors to be referred from your main page or something, and track logged-in users via sessions.
8:59 am on May 10, 2005 (gmt 0)

5+ Year Member

I thought of showing a confirmation page where you have to enter the randomly created code in a box to view the contents. But I didn't want to make it difficult for my visitors.
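A confirmation-code check of that kind might be sketched as follows. This is Python for illustration only: `session` stands in for whatever per-visitor session storage the site uses (a plain dict here), and the function names, code length, and alphabet are assumptions. Rendering the code as an image so scripts cannot read it is left out.

```python
import hmac
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def issue_code(session, length=6):
    """Create a random code, remember it in the session, and return it for display."""
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    session["confirm_code"] = code
    return code


def verify_code(session, submitted):
    """Check the visitor's input against the stored code; each code works once."""
    expected = session.pop("confirm_code", None)
    if expected is None:
        return False
    candidate = submitted.strip().upper()
    # Compare as bytes so non-ASCII input is rejected instead of raising an error.
    return hmac.compare_digest(expected.encode("utf-8"), candidate.encode("utf-8"))
```

To keep it from annoying regular visitors, the check could be shown only to clients that have already tripped some other signal, such as an unusual number of hits.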
