scooter2.* 11 Altavista
scooter.* 11 Altavista
scooterr.* 11 Altavista
bigip1-snat.* 11 Altavista
vscooter.* 12 Altavista
*.sv.av.com 11 Altavista
The * is a wildcard and can match anything. The 11 and 12 are the hit counts.
63.173.190.16 <<--- this IP hit my spider trap 27 times in the past 3 months, and only search engines know about the trap.
DNS wildcards are easy. You can catch almost every spider Google has with this one DNS pattern:
*.googlebot.com
Agents can be faked, DNS cannot.
Just a thought.
john
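To make the wildcard idea concrete, here is a minimal sketch of matching a reverse DNS name against patterns like the ones listed above. The function names `host_matches_pattern` and `engine_for_host` are made up for illustration and are not from anyone's code in this thread.

```php
<?php
// Hypothetical helper: true if $host matches a pattern such as "*.googlebot.com".
// "*" stands for any run of characters; everything else must match literally.
function host_matches_pattern(string $host, string $pattern): bool
{
    // Quote regex metacharacters, then turn the quoted "\*" back into ".*".
    $regex = '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/i';
    return preg_match($regex, $host) === 1;
}

// Hypothetical helper: engine name for the first matching pattern, or null.
function engine_for_host(string $host, array $patterns): ?string
{
    foreach ($patterns as $pattern => $engine) {
        if (host_matches_pattern($host, $pattern)) {
            return $engine;
        }
    }
    return null;
}

// Patterns taken from the list in this thread.
$patterns = [
    'scooter.*'       => 'Altavista',
    '*.sv.av.com'     => 'Altavista',
    '*.googlebot.com' => 'Google',
];
```

Because the pattern is anchored at both ends, a host such as googlebot.com.evil.example does not match *.googlebot.com. One caveat: a reverse name is set by whoever controls the reverse zone for that IP, so a stricter check would also resolve the name forward again and confirm it comes back to the same IP.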
Plus some of the SE's are adding new IPs every few months. I'd rather have a failsafe and take my chances than have the search engine catch me with my pants down.
One of the best methods I have experienced had both options going. When the browser visits, a UA match is performed. For example, Mozilla to the left and the rest to the right.
If Mozilla has a term that matches something from the SE then it is given the proper page.
The "rest" go off by UA/IP match.
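A rough sketch of that split, as one possible reading of the description: agents starting with "Mozilla" only get a cheap string test, and everything else falls through to the UA/IP match. The function name `choose_page`, the term list, and the 'spider'/'visitor' labels are all assumptions for illustration.

```php
<?php
// Hypothetical two-stage check, one reading of the scheme described above.
// $se_terms:   substrings that identify search engines inside a user agent.
// $spider_ips: IPs already recorded by a spider trap.
function choose_page(string $ua, string $ip, array $se_terms, array $spider_ips): string
{
    $ua_has_se_term = false;
    foreach ($se_terms as $term) {
        if (stripos($ua, $term) !== false) {
            $ua_has_se_term = true;
            break;
        }
    }

    if (stripos($ua, 'Mozilla') === 0) {
        // Browser-style agent: string test only, no IP lookup.
        return $ua_has_se_term ? 'spider' : 'visitor';
    }

    // Everything else: fall back to the UA/IP match.
    if ($ua_has_se_term || in_array($ip, $spider_ips, true)) {
        return 'spider';
    }
    return 'visitor';
}
```

The point of the ordering is that ordinary browsers never pay for the IP lookup; only the non-Mozilla agents do.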
The bots can afford a slight slowdown but we're not going to give that slowdown to every single user that enters the site.
As for getting caught, I'd rather get caught by some user who is just trying to steal a page than by an SE robot. I'm not going to take that risk at all.
You have a spider trap, and this spider trap is submitted to search engines. When something visits the page, its IP and DNS name are recorded. The database isn't even on the same server as any of my cloaked sites; they just send requests to the spider database. Here's my code, and it loads extremely fast, in well under .01 seconds.

Like I said, it's stupid to stick with agents. IPs change, but their DNS names do not, and if they do, the spider trap will pick them up. I seriously doubt Altavista is going to change its entire network and server names anytime soon, and if it did I'd easily pick them up on next week's run of the spider trap. It's updated weekly, I've never had a spider miss it, and I've been cloaking for 2 years. Stay with your agents if you think that's safe, but if I check my logs and have never seen a spider miss the trap and get my normal site, I'd say I'm doing pretty good.

I couldn't find your code because the link was a 404, but here's my spider trap IP and DNS logging code:
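// $db is assumed to be an open mysqli connection to the spider database.
$new_ip  = $_SERVER['REMOTE_ADDR'];
$new_dns = gethostbyaddr($new_ip);   // comes back as the IP itself if no name is found
$found   = false;

$result = mysqli_query($db, "SELECT * FROM spiders ORDER BY id");
while ($myrow = mysqli_fetch_assoc($result)) {
    if ($new_ip == $myrow["ip"]) {
        // Known visitor: bump its hit counter.
        $found = true;
        $id    = (int) $myrow["id"];
        $count = (int) $myrow["count"] + 1;
        mysqli_query($db, "UPDATE spiders SET count='$count' WHERE id='$id'");
    }
}

if (!$found) {
    // New visitor: store the IP together with its reverse DNS name.
    // Escape both values, since the DNS name comes from outside.
    $ip_sql  = mysqli_real_escape_string($db, $new_ip);
    $dns_sql = mysqli_real_escape_string($db, (string) $new_dns);
    mysqli_query($db, "INSERT INTO spiders (ip, dns, engine) VALUES ('$ip_sql', '$dns_sql', 'IP')");
}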
So you aren't doing real time DNS resolution then? You do the lookup with the spider trap and save it, then reference it just like the list of IP addresses? (am I getting that right?)
If that is what you are doing, then the concern doesn't apply, the concern was over doing a DNS lookup for each page request in real time ...
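To illustrate the difference being asked about, here is a small sketch where each IP is resolved once and the stored name is reused afterwards, so later requests from the same IP cost no DNS query. The function name `cached_reverse_dns` and the plain array used as the store are assumptions; in the setup described above the store would be the spiders table.

```php
<?php
// Hypothetical cache: resolve each IP once, then reuse the stored name.
// $resolver defaults to PHP's gethostbyaddr(); it can be swapped out so the
// lookup can be stubbed or counted.
function cached_reverse_dns(string $ip, array &$cache, ?callable $resolver = null): string
{
    if (isset($cache[$ip])) {
        return $cache[$ip];          // stored earlier: no DNS query
    }
    $resolver = $resolver ?? 'gethostbyaddr';
    $name = $resolver($ip);
    // gethostbyaddr() returns the IP itself when no name is found and
    // false on malformed input; store the IP in both cases.
    $cache[$ip] = (is_string($name) && $name !== '') ? $name : $ip;
    return $cache[$ip];
}
```

With something like this, the slow part happens only the first time an IP is seen, which is the case the spider trap already covers.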
My site gets in the ballpark of a thousand hits a day. The difference in load time between being uncloaked and being cloaked is unnoticeable. Real-time DNS is recorded by the server itself; does that slow it down too when feeding the web page? No. It's sent when the user is trying to access the web page. If you're really convinced my DNS database slows it down, I'll throw it onto a site that gets 80,000 hits a day. I do work for a company that is one of the largest food chains in the world, so I'll test it on their site for half an hour and run speed tests on loading time vs. the old loading time.
PHP, and I would use sites that have HTML analyzers and loading-time checks. I work for an application service provider (ASP), so they have loads of stuff that checks HTML errors, load speed, bandwidth speed, etc. I'll run the test 10 times on the site without it, then 10 times with it. I'll tell you the results after I put it on.
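For anyone who wants to run the same kind of 10-and-10 comparison from a script, a minimal timing sketch could look like this. The names `time_runs` and `mean_seconds` are made up, and what goes inside the callback (fetching the page with and without the lookup) is left to the tester.

```php
<?php
// Hypothetical harness: call $fn $runs times and return each duration in seconds.
function time_runs(callable $fn, int $runs = 10): array
{
    $durations = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                      // monotonic clock, nanoseconds
        $fn();
        $durations[] = (hrtime(true) - $start) / 1e9;
    }
    return $durations;
}

// Arithmetic mean of a non-empty list of durations.
function mean_seconds(array $values): float
{
    return array_sum($values) / count($values);
}
```

Run it once with a callback that loads the page without the spider check and once with the check, then compare the two means. Ten runs is a small sample, so it is worth looking at the spread as well as the average.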