Forum Moderators: phranque


How to effectively limit crawl rate

         

Skips

7:20 am on Mar 26, 2025 (gmt 0)

Top Contributors Of The Month



Hey everyone,

Lately, I've been receiving alerts from my server management software about hitting (and now exceeding) resource limits (I'm on a VPS). After digging through the logs, it turns out Googlebot's been really busy crawling my site.

Considering how much Google pushes for better website performance, it's kinda strange that their own bot can end up hammering a site like this. I'm guessing there should be a straightforward way to control the crawl rate. I had crawl limits set in Search Console before, but I just checked, and that option seems to be gone now.

Does anyone know the best way to manage this now that the old setting’s gone?

Thanks!

not2easy

11:54 am on Mar 26, 2025 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You can still file a 'special' request to slow crawling in your GSC account, the instructions are here: [search.google.com...]

In their words, you can use the special request "to report a problem with unusually high crawl rate, mentioning the optimal rate for your site in your request. You cannot request an increase in crawl rate, and it may take several days for the request to be evaluated and fulfilled."

lucy24

4:31 pm on Mar 27, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Meanwhile, add this line to your robots.txt:
Crawl-delay: 3
(or whatever number of seconds you find appropriate). Google will not honor this directive, but some robots do, so it can’t hurt.
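For completeness, Crawl-delay needs to sit inside a user-agent group to be recognized by the crawlers that support it. A minimal robots.txt might look like this (the 3-second value is just an example):

```text
# Googlebot ignores Crawl-delay, but some other crawlers honor it.
User-agent: *
Crawl-delay: 3
```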

Skips

7:48 am on Mar 29, 2025 (gmt 0)

Top Contributors Of The Month



Thank you for the tips - sounds good!

Skips

3:39 pm on Mar 29, 2025 (gmt 0)

Top Contributors Of The Month



@not2easy - Thanks! I went ahead and submitted a request to decrease the crawl rate using the link you provided — I didn’t know about that option. I've been getting 300K+ Googlebot requests per day on dynamic pages with lots of calculations happening on each request, sometimes with multiple requests hitting at the same time. It almost felt like Google was load-testing my server! :)

NickMNS

7:11 pm on Mar 29, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've been getting 300K+ Googlebot requests per day on dynamic pages with lots of calculations happening on each request

This is your issue, not the crawl rate. I'm assuming these pages are very similar and differ only by some inputs (query params), is that correct? If so, Google is not only wasting your server resources but also crawling pages for nothing, since much of that content will be treated as duplicate. In that case, consider adding a rel="canonical" link to tell Google that this group of pages is the same as the one you link to, so it only crawls and shows that one. Google may still crawl the other pages, but most likely at a much reduced rate.
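In concrete terms, each parameterized variant would carry a link element pointing at the one URL you want indexed (the URL here is just a placeholder):

```html
<!-- On /calculator?mode=a, /calculator?mode=b, etc. -->
<link rel="canonical" href="https://example.com/calculator">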

If the pages aren't similar at all, consider caching strategies, or, for the server-side calculations, try memoization to reduce the load on the server.
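To illustrate the memoization idea, here's a rough Python sketch (the function and TTL are placeholders, not anything from Skips' site): each result is cached for a short window, so repeated bot hits within that window reuse the stored value instead of recomputing.

```python
# Time-bounded memoization sketch: recompute only after the TTL expires.
import time
from functools import wraps

def ttl_memoize(ttl_seconds):
    def decorator(fn):
        cache = {}  # args -> (timestamp, value)
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # still fresh: skip the expensive work
            value = fn(*args)
            cache[args] = (now, value)
            return value
        return wrapper
    return decorator

@ttl_memoize(ttl_seconds=60)
def expensive_calculation(x):
    # stand-in for the real per-request computation
    return x * x
```

The trade-off is staleness: results can be up to one TTL out of date, so the window has to be shorter than the "frequently changing factors" Skips mentions.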

Skips

8:32 am on Mar 31, 2025 (gmt 0)

Top Contributors Of The Month



NickMNS, your assumption isn't correct - the pages aren't similar and duplicate content has never been an issue in the 10+ years the site's been online. I already use caching, but because the results depend on a variety of frequently changing factors, they need to be re-calculated fairly often. So while caching helps to some extent, it doesn’t eliminate the need for fresh calculations.

What's bothering me now is that Googlebot has never been this greedy. I suspect it's gathering data for Google's AI models, which is problematic for two reasons: it's overconsuming my server resources, and AI models benefit from the data without driving any traffic (or very little) back to the websites they take it from.

NickMNS

3:08 pm on Mar 31, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't know your site or your content, so I'm just throwing out ideas. Another strategy you could try is to delay carrying out your calculations until after you get a user interaction: a scroll, a click, or whatever makes sense for your use case. That would force Googlebot and other bots to use JS rendering to get the data, putting a bigger load on their resources and likely resulting in less crawling on their part, while greatly reducing the load on your server. But be sure to protect the endpoint that performs the calculations so the bots can't hit it directly; that can be done with a POST request or a randomly generated token.
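As a rough sketch of the token idea (the scheme, secret, and names here are my own invention, not anything site- or Google-specific): issue a short-lived HMAC-signed token to the rendered page, and have the calculation endpoint reject any request that doesn't carry a valid, fresh one.

```python
# Short-lived HMAC token sketch for protecting a calculation endpoint.
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # hypothetical; load from config in practice

def issue_token(session_id, now=None):
    """Embed this in the page; the JS sends it back with the request."""
    ts = str(int(now if now is not None else time.time()))
    sig = hmac.new(SECRET, f"{session_id}:{ts}".encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"

def verify_token(session_id, token, max_age=300, now=None):
    """Accept only tokens with a valid signature issued within max_age seconds."""
    try:
        ts, sig = token.split(":", 1)
    except ValueError:
        return False
    if not ts.isdigit():
        return False
    current = now if now is not None else time.time()
    if current - int(ts) > max_age:
        return False  # expired
    expected = hmac.new(SECRET, f"{session_id}:{ts}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)  # constant-time comparison
```

Combined with serving the endpoint via POST only, a scraper replaying GET requests from logs gets nothing, and stale tokens age out on their own.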

Skips

6:07 am on Apr 1, 2025 (gmt 0)

Top Contributors Of The Month



Yes, I understand and appreciate the effort to help. I could use a static cached version for search engines while keeping content updated for human users, but right now, I’m trying to figure out what’s happening with this Googlebot activity.

My request to lower the crawl rate doesn’t seem to have been processed, or at least the results are not what I had hoped them to be. I'm still receiving excessive resource usage alerts from the server. Meanwhile, organic traffic has dropped by roughly 40% since yesterday. So, Googlebot is aggressively crawling my site, yet it’s simultaneously sending significantly less traffic.

To me, this suggests only one thing: scraping data for their AI model. It otherwise makes no sense for a search engine to waste its own resources indexing content that isn’t appearing in search results. And it seems to be doing so in an excessively greedy manner, with complete disregard for content owners’ interests - pushing the limits of a $150/mo VPS while providing nothing in return. I am slightly annoyed.

lucy24

5:04 pm on Apr 1, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wouldn't it be fun if you could just start returning a 503 to all Googlebot page requests. That’ll teach ’em.

:: some day, somehow, somewhere ::
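For what it's worth, the 503 isn't entirely a joke: Google's own docs say Googlebot backs off when it gets 500/503/429 responses, at least for short stretches (keep it up too long and pages start dropping out of the index). A toy WSGI sketch of the idea, purely illustrative:

```python
# Toy WSGI app: answer Googlebot with 503 + Retry-After, everyone else normally.
# Don't actually leave this running - prolonged 503s can get pages deindexed.
def app(environ, start_response):
    ua = environ.get("HTTP_USER_AGENT", "")
    if "Googlebot" in ua:
        start_response("503 Service Unavailable",
                       [("Retry-After", "3600"),
                        ("Content-Type", "text/plain")])
        return [b"Come back later."]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, human."]
```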

Skips

7:09 pm on Apr 1, 2025 (gmt 0)

Top Contributors Of The Month



So, I have to hand it to Google - to my surprise, they did respond to my request, and not with some generic cliché from a no-reply address, but with an actual, detailed response about my domain, including the return email of the analyst handling the case so I could reach out for further clarification. Which I did! And once again, the person was very responsive and constructive. Looks like we’ve found a solution, involving some adjustments on our side (limiting the number of crawlable pages) and a crawl rate limit on theirs. I have to admit, I never thought I'd be chatting with a Google analyst - I had to contain myself to keep it professional and to the point, resisting the temptation to ask the thousands of questions every webmaster has, LOL

lucy24

5:42 pm on Apr 2, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wow! Now if only we could all figure out the specific magic word that got your problem transferred to a human :)

Skips

4:01 pm on Apr 3, 2025 (gmt 0)

Top Contributors Of The Month



Hard to tell :) Could be the sheer number of requests from the crawler that caught their attention. I still think, though, that it'd be easier and quicker if they hadn't removed the crawl rate limit option from Search Console...

NickMNS

4:12 pm on Apr 8, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Could be the sheer number of requests from the crawler that caught the attention

Not sure that 300k is that high a number, for Google. In early February, I had a couple of days where Google was crawling 2M a day. These days it's averaging about 100k a day, with spikes above 300k.

lucy24

5:44 pm on Apr 8, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's probably more useful to look at it by proportion of site size rather than by raw numbers, but I don’t know if the appropriate data exists.

:: business with multi-file search ::

Looking only at page requests, anywhere from 10-20% daily. And then you consider pages that, in G’s experience, change frequently, as opposed to ones that are fairly static. (I hope I am right in assuming that G can tell when the only change is in, say, a dynamic navigation header which would be affected by changes in something other than the current page.)