
Google SEO News and Discussion Forum

    
Can we schedule crawl times for googlebot?
JustBarno
Msg#: 4227189 posted 6:19 pm on Nov 5, 2010 (gmt 0)

Short version: I would like to disable Google's indexing during certain days/times. I am also concerned about changing my robots.txt file and what the long term effects are (do they not come back?)

My company hosts monthly sales on our website, during which our traffic increases dramatically. Yesterday, the need for heavy load testing and optimization was pushed to the forefront during the perfect storm of heavy use and a Google crawl.

In just a couple of hours, during our heaviest period of use, Google downloaded well over a gigabyte of data (not to mention the stress on the SQL server). That was more than enough to push the system over the tipping point into horrible performance (1000% of typical load times).

Does anyone have experience with similar problems? Were you able to find acceptable solutions while still maintaining good search results? We are currently top 3 on all our important search terms and phrases and I would hate to lose that. But if our site doesn't work, that is worse.

 

tedster
WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4227189 posted 8:42 pm on Nov 5, 2010 (gmt 0)

Hello JustBarno, and welcome to the forums.

I've been looking for the Google reference and can't locate it right now, but the essence of the answer is that no, it's not a good idea. This video gets close: [youtube.com...]

Usually the crawl team does a good job with allocating crawl resources in a way that doesn't hurt the server. You can ask googlebot to crawl more slowly, but that often has other, negative repercussions.

From your description of the problem, it sounds like Google needs to retrieve the full page for every request - database calls and all that. Have you considered server-side caching and then replying with a 304 status if the page hasn't changed?
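Something along these lines, just a rough sketch in Python of the caching + 304 idea (the page_cache dict and render_page() helper are hypothetical placeholders, not anyone's real code):

# Sketch only: keep the rendered page and its Last-Modified time, and answer
# "304 Not Modified" when the crawler's If-Modified-Since header shows it
# already has the current copy. page_cache and render_page() stand in for the
# real database-backed rendering.
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime

page_cache = {}  # path -> (rendered_html, last_modified_datetime)

def render_page(path):
    """Placeholder for the expensive SQL-backed page render."""
    return "<html>...</html>", datetime.now(timezone.utc)

def application(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path not in page_cache:
        page_cache[path] = render_page(path)
    body, last_modified = page_cache[path]

    ims = environ.get("HTTP_IF_MODIFIED_SINCE")
    if ims:
        try:
            # HTTP dates have one-second resolution, so drop microseconds.
            if parsedate_to_datetime(ims) >= last_modified.replace(microsecond=0):
                start_response("304 Not Modified", [])
                return [b""]  # no body, no database work
        except (TypeError, ValueError):
            pass  # unparsable date header: just send the full page

    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Last-Modified", formatdate(last_modified.timestamp(), usegmt=True)),
    ])
    return [body.encode("utf-8")]

The point of caching at the rendered-HTML level is that a re-crawl of an unchanged page costs a lookup instead of a round of SQL queries.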

JustBarno
Msg#: 4227189 posted 10:40 pm on Nov 5, 2010 (gmt 0)

Thanks Tedster, that video was very informative. Normally I think you're right that it wouldn't hurt our server, but the problem is that we were already near the tipping point. I'll look into server-side caching, but our pages are almost constantly changing (new high bids, auctions closing, etcetera).

tedster
WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4227189 posted 12:10 am on Nov 6, 2010 (gmt 0)

Was this a one time problem, a once-in-a-while problem, or a regular problem? If it happens more than a little bit, Google will also be experiencing the server delays and should adjust their crawl rate without you taking any action. At least that's the way it's supposed to work, and it often does.

Sgt_Kickaxe
WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time
Msg#: 4227189 posted 7:04 pm on Nov 6, 2010 (gmt 0)

Optimize, optimize, optimize!

Your own site, that is. Employ caching, "minify" CSS and HTML, enable Gzip, minimize image file sizes and image use, get rid of clunky code, consolidate JavaScript into one file and load it at the end of the page, etc.

I'm sure you've done all of that, but run Firebug and PageSpeed to double-check. Before doing anything else, you want to reduce the size of... everything.
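For example, a quick way to confirm Gzip is actually being served and see what it saves on one page (a Python sketch; the URL is a made-up placeholder):

# Sketch only: fetch the same page with and without Accept-Encoding: gzip
# and compare the byte counts. The URL is a placeholder.
import gzip
import urllib.request

URL = "http://www.example.com/some-heavy-page"

def fetch(headers):
    req = urllib.request.Request(URL, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read(), resp.headers.get("Content-Encoding", "")

plain, _ = fetch({"Accept-Encoding": "identity"})
compressed, encoding = fetch({"Accept-Encoding": "gzip"})

if encoding == "gzip":
    print("gzip on: %d bytes uncompressed, %d bytes on the wire"
          % (len(gzip.decompress(compressed)), len(compressed)))
else:
    print("no gzip from the server; page is %d bytes" % len(plain))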

cls_wired
5+ Year Member
Msg#: 4227189 posted 10:13 am on Nov 7, 2010 (gmt 0)

As far as I know, Googlebot always reads robots.txt before loading your pages. Look in your log files to see how often Google does this. If it happens several times a day, you might be helped by a script that overwrites the robots.txt file during certain times, adding a crawl-delay instruction there. After the heaviest (peak) time has passed, it would clear the rule so the robot can continue crawling at its previous speed.
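For what it's worth, a minimal sketch of that swap, meant to be run from cron (the paths and the peak window are hypothetical, and note that Googlebot itself ignores Crawl-delay, so this mainly affects other well-behaved bots):

# Sketch of the robots.txt swap described above, run hourly from cron.
# Paths and the peak window are made up for illustration.
import shutil
from datetime import datetime

LIVE = "/var/www/html/robots.txt"            # the file bots actually fetch
NORMAL = "/var/www/conf/robots.normal.txt"   # everyday rules
PEAK = "/var/www/conf/robots.peak.txt"       # same rules plus "Crawl-delay: 30"

PEAK_HOURS = range(18, 22)  # e.g. 6pm-10pm server time during a sale

if __name__ == "__main__":
    source = PEAK if datetime.now().hour in PEAK_HOURS else NORMAL
    shutil.copyfile(source, LIVE)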

tedster
WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member
Msg#: 4227189 posted 5:22 pm on Nov 7, 2010 (gmt 0)

That might seem like an option; however, that is the kind of thing Matt Cutts warns about in the video I linked to above: Can I use robots.txt to optimize Googlebot's crawl? [youtube.com]

leadegroot
WebmasterWorld Senior Member, 10+ Year Member
Msg#: 4227189 posted 10:25 am on Nov 8, 2010 (gmt 0)

Well, if you are really desperate, you could do some IP/user-agent sniffing and serve a content-free 503 to googlebot at the high-traffic times.
Bit risky, though...
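Something like this WSGI middleware sketch, purely for illustration (the peak window and the wrapped app are placeholders, and a real check should verify Googlebot's IP by reverse DNS rather than trusting the User-Agent string):

# Illustration only: during peak hours, answer Googlebot with a bodyless 503
# plus Retry-After, and pass everyone else through to the real application.
from datetime import datetime

PEAK_HOURS = range(18, 22)  # hypothetical sale window

def throttle_googlebot(app):
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if "Googlebot" in agent and datetime.now().hour in PEAK_HOURS:
            start_response("503 Service Unavailable",
                           [("Retry-After", "3600")])
            return [b""]
        return app(environ, start_response)
    return middleware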
