| 7:41 am on May 7, 2005 (gmt 0)|
It depends on how you're handling the data. If it's stored in text files and you're using Apache, you can block direct access to the text files when the request doesn't come through your own server.
I'd need more info on exactly how you're implementing it to go any further.
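For example, with the data in .txt files, an .htaccess fragment along these lines (the filename pattern is just an example) blocks direct HTTP requests while your own PHP scripts, which read the files from disk, are unaffected:

```apache
# .htaccess in the directory holding the data files.
# Denies all direct HTTP access to .txt files; PHP on this
# server reads them from the filesystem, so it is unaffected.
<Files "*.txt">
    Order allow,deny
    Deny from all
</Files>
```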
| 7:50 pm on May 7, 2005 (gmt 0)|
Thanks for your reply.
On my site, data is grabbed from the database in real time. So these guys open the main page (or whatever pages they need) with PHP, then process it with implode() and explode() to show only the data they want to show. What I need to prevent is that method: I do not want anyone to be able to "include" my pages from any script.
Is this possible?
| 8:35 pm on May 7, 2005 (gmt 0)|
You can block them if you know their IP or their User-Agent (if it is not a "normal" user agent), or if they can be distinguished in some other way from a normal user.
You can use Apache, PHP, ASP, etc. to do that.
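For instance, an untested .htaccess sketch with mod_rewrite (the UA string and IP here are placeholders, not real offenders):

```apache
# Refuse requests from a known bad User-Agent or a specific IP.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BadGrabber [NC,OR]
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.45$
RewriteRule .* - [F]
```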
| 8:47 pm on May 7, 2005 (gmt 0)|
Good, this leads somewhere. Normally, what user agent is sent when pages are grabbed the way I described?
| 9:00 pm on May 7, 2005 (gmt 0)|
The problem is that grabbers usually send an MSIE, Mozilla, or Konqueror user agent.
Sometimes they use a specific user agent such as PHP/4.2.x (search for lists of webbots, fake UAs, grabbers...).
You first have to find something that distinguishes them from normal users (the best being an IP or host, a missing referrer, or a specific User-Agent).
Inserting a PHP redirection (rather than an HTML redirection) in front of the pages is a good way to limit grabbing.
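A rough PHP sketch of that kind of check. The UA prefixes are examples only, and keep in mind a missing referrer will also catch visitors who type the URL directly:

```php
<?php
// Sketch only: refuse requests that have no referrer or a
// script-like User-Agent. Both signals can be forged, so this
// is a speed bump, not real protection.
function looks_like_grabber($referer, $ua)
{
    return $referer === '' || preg_match('/^(PHP|curl|Wget|libwww)/i', $ua) === 1;
}

// Only runs during a real web request (the guard keeps it quiet
// when the file is executed from the command line).
if (isset($_SERVER['REQUEST_METHOD'])) {
    $ref = isset($_SERVER['HTTP_REFERER'])    ? $_SERVER['HTTP_REFERER']    : '';
    $ua  = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if (looks_like_grabber($ref, $ua)) {
        header('Location: /index.php'); // PHP-level redirect, no HTML page served
        exit;
    }
}
```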
| 9:03 pm on May 7, 2005 (gmt 0)|
There are going to be around 100 user agents to block, and half of those programs let their users change the user agent that gets sent to the server.
On important data, I run some code that records how many times an IP has hit a given area; if there are more than N hits on a specific area/target, it can either throttle them (a warning that they're hitting the site too many times in that area, followed by an automatic ban if the warning is ignored; crawlers don't read error messages) or silently redirect them from the intended page back to the main index.
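Roughly like this, as a sketch only: a file-based counter with made-up paths and limits. A real version would also reset counts after a time window and add the warning step before banning:

```php
<?php
// Count hits per IP in a temp file and report when an IP has
// gone over the limit. /tmp and the limit of 100 are examples;
// a real site would use a database and expire old counts.
function too_many_hits($ip, $limit)
{
    $file  = '/tmp/hits_' . md5($ip);   // one counter file per IP
    $count = is_file($file) ? (int) file_get_contents($file) : 0;
    $count++;
    file_put_contents($file, (string) $count);
    return $count > $limit;
}

// Only runs during a real web request.
if (isset($_SERVER['REMOTE_ADDR']) && too_many_hits($_SERVER['REMOTE_ADDR'], 100)) {
    header('Location: /index.php');     // silent redirect back to the index
    exit;
}
```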
| 9:11 pm on May 7, 2005 (gmt 0)|
Your server logs should tell you what the UA is, if you can ID who's doing it.
There are some massively long threads on this:
[webmasterworld.com...] (and its [b]three[/b] predecessors; links in the first post of each).
There is a more recent one around, but I can't find it at the moment.
| 11:13 pm on May 7, 2005 (gmt 0)|
Or, if your server is Apache, just write a rule that prevents remote access to those files. You may have to change their extension, but PHP won't care.
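For example, if the data files are renamed with a .inc extension (any extension works, since PHP's include() reads from disk, not over HTTP), something like:

```apache
# Keep include files readable by PHP but not fetchable over HTTP.
<FilesMatch "\.inc$">
    Order allow,deny
    Deny from all
</FilesMatch>
```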
| 11:20 pm on May 7, 2005 (gmt 0)|
Thank you all. I guess blocking certain UAs isn't practical, then. I will give the Apache rules a try and come back with results.
| 4:02 pm on May 9, 2005 (gmt 0)|
Maybe require a membership and a login to view the data? Require visitors to come from your main page, and track them via sessions once they're logged in...
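A sketch of that kind of session gate ('user' and login.php are placeholder names):

```php
<?php
// Returns true when the session array says someone is logged in.
// The 'user' key is a placeholder for whatever your login sets.
function logged_in($session)
{
    return !empty($session['user']);
}

// On each protected page, something like:
//   session_start();
//   if (!logged_in($_SESSION)) {
//       header('Location: /login.php');   // bounce to the login form
//       exit;
//   }
//   ... show the data to the logged-in member ...
```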
| 8:59 am on May 10, 2005 (gmt 0)|
I thought of showing a confirmation page where you have to enter a randomly generated code in a box to view the contents, but I didn't want to make things difficult for my visitors.
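If you do try it, a minimal sketch for generating such a code (look-alike characters such as I, O, 0, and 1 are left out on purpose):

```php
<?php
// Build a short random confirmation code from an unambiguous
// character set. The default length of 5 is arbitrary.
function make_code($length = 5)
{
    $chars = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';
    $code  = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $chars[mt_rand(0, strlen($chars) - 1)];
    }
    return $code;
}

// On the confirmation page, roughly:
//   session_start();
//   $_SESSION['code'] = make_code();
//   ... display the code and an input box ...
// On submit, compare $_POST['code'] against $_SESSION['code'].
```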