Forum Moderators: phranque


To crawl-delay or not?


paranoid android

5:06 pm on Jun 6, 2016 (gmt 0)

10+ Year Member



I run a fairly busy, popular news site. We get an average of 100k unique visitors per day, sometimes more when a story is particularly popular or goes viral.

Our hosts are sometimes unable to cope with surges in traffic.

Last night the server ground to a complete halt. According to the host's engineers, the cause was Bing, Google, etc. all crawling the site heavily at the same time. They then took it upon themselves to add a crawl delay of 30 to our robots.txt file.

I'm of the mind that if the server can't handle lots of crawling activity, it needs to be upgraded to one that can. Are there any negative SEO/ranking consequences to using a crawl-delay? I'm going to suggest they upgrade us to a more powerful server and remove the entry from our robots file.

topr8

5:38 pm on Jun 6, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I set a crawl delay and haven't noticed any adverse effects, though other members' experience may differ.

robzilla

6:03 pm on Jun 6, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm of the mind that if the server can't handle lots of crawling activity, it needs to be optimized :-) Upgrading hardware is a last resort for me. Have you profiled your pages to see where the bottlenecks are? Considered upgrading your software stack (e.g. upgrade to PHP7 if it's a PHP site), or implementing more aggressive/efficient caching?

lucy24

8:37 pm on Jun 6, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google, specifically, ignores the crawl-delay directive. If you want them to slow down, you have to set a crawl rate in WMT/GSC (Webmaster Tools / Search Console). And then they will sulk, because they want to make this decision for themselves.
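For reference, a Crawl-delay line is only a hint in robots.txt; each crawler decides for itself whether to act on it. As a minimal sketch, Python's standard `urllib.robotparser` (used here as a stand-in for a crawler that does read the directive) parses a file like the one the host presumably added. The exact file content is an assumption, since the thread only gives the value 30.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: a 30-second crawl delay applied to every user agent,
# with nothing disallowed.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The parser reports what the file *says* for each agent. Whether a real
# crawler honors the value is up to that crawler.
for agent in ("Googlebot", "bingbot"):
    print(agent, parser.crawl_delay(agent))
```

Note that the parser reports 30 for Googlebot as well: the file asks for it, but as noted above, Googlebot itself does not act on the directive.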

100k unique visitors per day

So, let's say conservatively 1M requests total (humans getting images, stylesheets, scripts and so on).

crawl delay ... 30

i.e. 2 robotic requests per minute, or 43200 per day per robot.

Nope, I don't think crawl-delay is your problem.

The problem I see is: What the ### is your host doing, unilaterally modifying your robots.txt? On rare and isolated occasions they might need to tweak your htaccess (or IIS/Nginx equivalent) if there's an error you can't pinpoint. But robots.txt? They might as well be messing with your actual site files.

Andy Langton

10:30 pm on Jun 6, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



i.e. 2 robotic requests per minute, or 43200 per day per robot.


1440 minutes in a day = 720 requests/day at crawl-delay: 30, no?

For the OP: I've never seen more than 10 requests/second from any legitimate bot, and Google is usually substantially slower than that. I suspect you might have had a rogue crawler masquerading as a search engine, or alternatively some sort of misconfiguration. Either way, crawl-delay is not the answer, if for no other reason than the one lucy24 mentioned: it's not a standard feature.

I would ask the host for more information on the user agents/IPs causing the issues.
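Once you have the IPs, the masquerading theory can be checked. Google and Bing both describe a forward-confirmed reverse DNS test: look up the hostname for the requesting IP, check that it belongs to the engine's crawler domain, then resolve that hostname and confirm it maps back to the same IP. A minimal sketch follows. The suffix list reflects the domains those engines have documented and is an assumption to re-check against their current docs; the `reverse`/`forward` parameters are hypothetical hooks that exist only so the check can be tested offline.

```python
import socket

# Crawler hostname suffixes (assumed; verify against each engine's docs).
CRAWLER_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def is_genuine_crawler(ip, claimed_bot,
                       reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for an IP claiming to be a crawler."""
    suffixes = CRAWLER_SUFFIXES.get(claimed_bot.lower())
    if not suffixes:
        return False  # unknown bot name: nothing to verify against
    try:
        host = reverse(ip)[0]  # PTR lookup: IP -> hostname
    except OSError:
        return False
    if not host.endswith(suffixes):
        return False  # hostname is not in the engine's crawler domain
    try:
        _, _, addresses = forward(host)  # A lookup: hostname -> IPs
    except OSError:
        return False
    return ip in addresses  # must resolve back to the same IP
```

The forward lookup matters because anyone controlling the reverse DNS for their own IP range can make a PTR record claim to be `something.googlebot.com`; only the forward resolution through the engine's own DNS confirms it.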

paranoid android

2:03 am on Jun 7, 2016 (gmt 0)

10+ Year Member



We already have fairly aggressive caching on the server, and the WordPress installation is optimised to run as efficiently and quickly as possible.
I am a little peeved that they went in and edited our robots file without consulting me first. Not only because they didn't bother to check whether I wanted that (we actually have a virtual robots file, which they inadvertently overrode with a physical one), but because they potentially had an adverse effect on SEO.

They seem to be of the mind that anything that can stop the server being hit too much is ideal. If that means giving us less traffic and/or exposure to search bots, then all the better for them.

Perhaps it's time I found a new host who cares about allowing you to grow as a site.

lucy24

3:12 am on Jun 7, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



1440 minutes in a day = 720 requests/day at crawl-delay: 30, no?

Whoops, hit the wrong button on my calculator! So it would be 2880 per day per robot (2 per minute, not one every other minute). Fortunately this only strengthens what I was trying to say: Crawl-Delay can't possibly be the problem.
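The corrected figure is easy to check. Assuming a crawler reads the value as seconds to wait between requests (as the thread does), the daily ceiling is 86,400 / 30 = 2,880 requests. The sketch below also compares that with lucy24's conservative estimate of 1M total requests per day from human visitors; the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_daily_requests(crawl_delay_seconds):
    """Upper bound on requests per day from one crawler that waits
    crawl_delay_seconds between requests."""
    return SECONDS_PER_DAY // crawl_delay_seconds


per_robot = max_daily_requests(30)
total_requests = 1_000_000  # lucy24's conservative daily estimate

print(per_robot)                            # 2880
print(f"{per_robot / total_requests:.2%}")  # 0.29%
```

Even a handful of well-behaved robots at that rate adds only a fraction of a percent each to the estimated daily load.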

In general I look with suspicion on robots who average out to more than one request per second. But obviously this would depend on how big the site is (in daily accesses, not in pagecount or filesize). If they really are hammering away too hard, you should expect to see a lot of 500-class errors (503, most likely) as they pass the server's Simultaneous Requests cap.
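Both signals mentioned here (a robot averaging more than one request per second, and a burst of 503s) can be pulled from the server's access log. A minimal sketch, assuming the common Apache/Nginx "combined" log format; the function names and the one-per-second threshold (taken from the rule of thumb above) are illustrative, and the rate is only a rough average over each agent's observed time window.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed "combined" log format: ip ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Per user agent: request count, rough average requests/second, 503 count."""
    stats = defaultdict(lambda: {"times": [], "503s": 0})
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines in some other format
        s = stats[m["agent"]]
        s["times"].append(datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"))
        if m["status"] == "503":
            s["503s"] += 1

    summary = {}
    for agent, s in stats.items():
        span = (max(s["times"]) - min(s["times"])).total_seconds()
        count = len(s["times"])
        summary[agent] = {
            "requests": count,
            # Treat windows shorter than one second as one second.
            "per_second": count / max(span, 1.0),
            "503s": s["503s"],
        }
    return summary


def flag_fast_agents(summary, threshold=1.0):
    """User agents averaging more than `threshold` requests per second."""
    return sorted(a for a, s in summary.items() if s["per_second"] > threshold)


# Usage (log path is hypothetical):
# with open("/var/log/apache2/access.log") as f:
#     print(flag_fast_agents(summarize(f)))
```

Any agent this flags can then be checked with the reverse DNS test before concluding that a real search engine is at fault.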

I am a little peeved

I absolutely would feel the same.