Block the Googlebot but keep current listing?

Jesse_Smith

3:42 am on Sep 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is it possible to ban the Googlebot from crawling your site, but at the same time keep the currently indexed URLs from being deleted from the index?

mcavic

4:50 am on Sep 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't think so - that seems counterintuitive from Googlebot's perspective.

BlueSky

4:57 am on Sep 21, 2003 (gmt 0)

10+ Year Member



From my experience, when I returned 404s the URLs stayed in the index at least until the bot checked a few times, but the cached pages dropped fairly fast. Googlebot indexed a bunch of pages I didn't want, so I banned that section of my site via robots.txt. The results were pretty much the same as with the 404s, though some URLs (no cache) are still showing after six weeks. When I returned 401s/403s, the URLs/pages for those dropped out very fast.
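For anyone curious, the robots.txt rule for banning a section looks roughly like this (the directory name here is just a placeholder):

```text
# placeholder directory - adjust to the section you want blocked
User-agent: Googlebot
Disallow: /private-section/
```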

Jesse_Smith

6:28 am on Sep 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't want to give it 401s/403s. I've just got sites with more than enough URLs, so when I get Googlebombed, my dedicated server lets me know it *server load in the 20s*. I just wanted to know if it's possible to stop it from crawling some of the 29 domains, but keep all indexed URLs indexed. The current 200,000+ indexed listings are way more than enough to get traffic.

dirkz

12:37 pm on Sep 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It still doesn't make sense to me. If Googlebot can't look at what is on your page, it shouldn't appear in any SERPs. Period. If you don't want your pages crawled, you could use meta tags like "index,nofollow" or robots.txt. You can fine-tune this down to single pages that shouldn't be crawled. Of course they won't appear in SERPs.
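For example, a meta tag like this keeps a page indexable while telling bots not to follow its links - though keep in mind the bot still has to fetch the page to see the tag, so robots.txt is the thing that actually stops crawling:

```html
<!-- page stays indexable; links on it are not followed -->
<meta name="robots" content="index,nofollow">
```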

What do you mean by "server load" - the load average shown by top? If your server load goes up to 20 just because a few Googlebots are making HTTP requests, then imho something's misconfigured or you are accessing DBs excessively (creating dynamic pages). If that's the case, try to optimize DB access or export static pages once or twice a day.
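A static export can be as simple as mirroring your own dynamic pages on a schedule. A sketch as a crontab line (the URL and paths are placeholders):

```text
# refresh static copies twice a day (URL and directory are placeholders)
0 3,15 * * * wget --quiet --mirror --directory-prefix=/var/www/static http://www.example.com/catalog/
```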

IMO, every URL in the index is too precious to throw away.

mcavic

2:32 pm on Sep 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Apache should not be raising your load average to 20. But if you're using PHP (or maybe other languages too), it's very possible that you have a bug where, when Googlebot hits a page with incorrect parameters, it goes into an infinite loop. And that could easily cause a load average of 20, even after Googlebot has left.
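That kind of bug is easy to hit in any language, not just PHP: a page loop that trusts a query-string parameter. A minimal sketch in Python (the parameter names and limits are made up for illustration):

```python
def render_listing(params):
    """Render a listing page; 'params' mimics a parsed query string.

    The buggy version of this kind of code loops until it reaches the
    requested position and never terminates when a bot asks for a value
    that can't be reached. Validating and bounding the parameters first
    avoids the runaway loop.
    """
    try:
        start = int(params.get("start", 1))        # e.g. ?start=abc -> ValueError
        per_page = int(params.get("per_page", 10))
    except ValueError:
        return "400 Bad Request"                   # garbage parameter
    if start < 1 or not (1 <= per_page <= 100):
        return "400 Bad Request"                   # out-of-range values
    # bounded loop: can only ever run per_page times
    items = [f"item-{i}" for i in range(start, start + per_page)]
    return ", ".join(items)
```

With the checks in place, a request like start=abc gets a quick 400 instead of pegging the CPU.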