PROBLEM
I have a hunch people are using programs to grab all the URLs contained within my search engine. The hard work I put into finding websites is being devoured by my competitors. They are probably scraping links off my site to add to their own search engines.
EXAMPLE
This host: 64.90.185.82.nyinternet.net is hitting our site every 2 seconds and grabbing just HTML pages (no .js or .gif files are being requested). I think it hit every page on our site within 10 minutes - I'm almost sure of that.
NEED HELP - RESOLUTION?
Would I use .htaccess to prevent competitors / programs from grabbing all of my pages? How can I block certain programs from hitting my site while making sure Google and the other major SEs can still visit? Should I develop something that will prevent an IP address from hitting "X" number of pages within "Y" seconds? Again, how would this impact Google and the other major crawling SEs?
The second alternative is to turn my static links to external sites into dynamic CGI links. That way these bots would not be able to grab all the links to the external sites, right?
Which do you think is the better resolution to this problem? How can I protect myself from my competition without also shutting out Google and the major search engines? Thanks so much for your help.
Brad
Hmmm... I performed the search and read up on spider traps. I can tell this spider trap thing isn't for a novice. (I know very, very little about programming - I'm a marketing guy :))
Would anyone be willing to help educate me on how to write a very basic script / htaccess / ??? to block this:
64.90.185.82.nyinternet.net
Or is this well beyond me, and should I even try? :) Hopefully there are some newbies out there like me who would like to learn alongside me...
Would anyone be willing to help educate me on how to write a very basic script / htaccess / ??? to block this: 64.90.185.82.nyinternet.net
Very simple.
In your .htaccess file, just add the lines below:
------ below lines -----
deny from 123.123.123.123
deny from nyinternet.net
------ above lines -----
Replace 123.123.123.123 with the IP number you want to block. Also remember that you can block as many IPs as you wish: just keep adding more lines like "deny from 123.123.123.123" or "deny from .hostname".
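For the visitor mentioned above, a fuller version might look like the sketch below. It assumes the classic Order/Allow/Deny access directives (Apache 1.3 through 2.2; Apache 2.4 replaces them with Require directives unless mod_access_compat is loaded), that your host permits these directives in .htaccess, and that the numbers at the front of the hostname are the visitor's actual IP address. Treat it as a starting point rather than a tested configuration.

```apache
# Let everyone in by default, then refuse the listed clients.
# With "Order Allow,Deny", a client matching both Allow and Deny is denied.
Order Allow,Deny
Allow from all

# Assumption: this is the real IP behind 64.90.185.82.nyinternet.net
Deny from 64.90.185.82

# Optional: refuse every host under this domain.
# Hostname rules make the server do a reverse DNS lookup on each request.
Deny from nyinternet.net
```

Because only the listed address and domain are refused, Google and the other major crawlers are not affected by these lines.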
Hope this helps.
1. First, create a text file and rename it to: .htaccess
2. Upload it to the server in ASCII mode (not binary).
3. I know I have to change the permissions on the file once it's on the server (but I don't know exactly to what). Do you know how I should CHMOD it?