|I need a new server because of Google's crawl.|
Not sure what my point is, but I'm having to upgrade my server as a result of Google.
I've always prided myself on having a site that serves instantly, and I've generally thrown everything - hardware, hosting, and coding - at it to make that happen.
But lately my server's lagging - to the point where it's noticeable. Some digging revealed the culprit is a friend's site. His site has a huge, ever-increasing database, and he's gotten busier over the years. We determined that it was his site that was slowing the server. There's a particular monster query his site runs, and it's on a dynamic page, so there are a lot of parameter combinations that generate this query. One big query being hit all the time is slowing the server.
The page is for an informational lookup. My immediate reaction was to think scraper, and go looking for an IP to block. Which sure enough, we found - except the IP is googlebot.
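(Worth noting: the way to confirm an IP really is Googlebot, rather than a scraper spoofing the user-agent, is a reverse DNS lookup followed by a forward lookup to make sure the name resolves back to the same IP. A rough sketch in Python - the helper names are mine:)

```python
import socket

# Hostnames Google's crawlers reverse-resolve to.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def host_is_google(hostname):
    # Pure check: does the reverse-DNS hostname end in a Google domain?
    return hostname.rstrip(".").endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip):
    # Step 1: reverse DNS lookup on the IP.
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not host_is_google(hostname):
        return False
    # Step 2: forward-confirm the hostname resolves back to that IP,
    # so a scraper can't fake the PTR record alone.
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
```

Google documents this reverse-then-forward DNS check as the supported way to verify its crawler; a user-agent string on its own proves nothing.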
Yep, the dots all connect. I need a new server because it's slow. It's slow due to a specific type of query, and the vast majority of hits for that query are coming from Google. If Googlebot went away, I wouldn't need a new server because my server would still be fast.
There are no alternatives (we've tuned MySQL etc., looked at the query, considered tinkering with Google's crawl; none are appropriate long-term fixes), so a new server it is...because of Google.
Worse, it's cyclic - my sites have gotten slower as a result of this, so if Google actually uses site speed in its algo, then I'm surely getting smacked.
In the end, it's still profitable, but is there going to be a point where people realize the costs that Googlebot is creating for them, and attempt to do something about it? This guy actually suggested we robots.txt-exclude Google from most of the site (particularly all the long-tail data they're scraping from it). I advised against it - but the fix is going to cost thousands, and not everyone is going to make that same decision.
Caching? Or do the pages change too quickly?
Why not use something like:
in your robots.txt to reduce the number of requests?
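Presumably something along these lines (Crawl-delay is in seconds, and support varies by crawler):

```
User-agent: *
Crawl-delay: 10
```

One caveat: Googlebot has historically ignored Crawl-delay in robots.txt; its crawl rate is set through the Webmaster Tools console instead, so this mostly helps with other bots.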
I'd calculate how often site content is new/renewed versus how often it's indexed, then apply a crawl delay (if necessary) before spending money on hardware (a real cost), UNLESS there is a compelling reason for that REAL MONEY upgrade - one with benefits lasting until 2014ish and perhaps 6/12 months beyond.
Meanwhile, the database shouldn't be that much of a drain if all the indexing/table layout is concise. It might be time for other eyes to look over what's been done. Sometimes (and I have been guilty of this) we miss a trick, new or old, that will optimize speed. A qualified consultant might be the lesser cost for maximum results...
I've implemented caching and a few other things. And the site owner, who is actually a developer, assures me the queries are already optimized.
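For reference, the caching is along the lines of a simple time-to-live cache keyed on the query parameters (this is an illustrative sketch, not the actual code; the names are made up):

```python
import time

class TTLCache:
    """Tiny time-to-live cache for expensive query results."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.time()
        hit = self.store.get(key)
        if hit and hit[0] > now:
            return hit[1]          # still fresh: skip the expensive query
        value = compute()          # cache miss: run the monster query once
        self.store[key] = (now + self.ttl, value)
        return value
```

The catch is that each distinct parameter combination is its own key, so the long tail of combinations still misses the cache - which is exactly why a crawler walking every combination hurts so much.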
We could change the crawl speed, or we could block Google from certain portions of the site. But he does get a good amount of traffic from Google, and I'm extremely averse to screwing around with that stuff - there's too much risk of it blowing up. If we lose his Google traffic, the repercussions in a month would be more than the cost of a server.
He'll be kicking in for some of the cost - I've been hosting him for free for a decade, and without him on my servers things would still run fast. Him chipping in for some of the hardware will be cheaper and easier than if he has to buy his own server, get a colo, and start paying for hosting.
Given the revenue, it's well worth the cost of an upgrade. I just find it odd that to preserve that income we have to upgrade hardware simply to keep up with Google's crawler. If the profit wasn't there, I'd have blocked Google as a scraper (which, as I noted above, is actually what I was initially looking for).
How dynamic is the monster query, and how critical is it for Googlebot to see up-to-the-minute data?
Could you generate static versions of the most frequently requested pages at an off-peak time and serve those to Googlebot during peak times?
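The idea could be sketched roughly like this (the paths, hours, and function names are all hypothetical; the live renderer is whatever the site already does):

```python
import os
import time

STATIC_DIR = "/var/cache/static_pages"   # hypothetical pre-rendered copies
PEAK_HOURS = range(8, 22)                # hypothetical peak window
MAX_AGE = 24 * 3600                      # don't serve copies older than a day

def serve(path, user_agent, render_live, static_dir=STATIC_DIR,
          hour=None, now=None):
    # During peak hours, bots get last night's pre-rendered page if one
    # exists and is fresh enough; everyone else gets the live query.
    hour = time.localtime().tm_hour if hour is None else hour
    now = time.time() if now is None else now
    fname = path.strip("/").replace("/", "_") + ".html"
    cached = os.path.join(static_dir, fname)
    if "Googlebot" in user_agent and hour in PEAK_HOURS \
            and os.path.exists(cached) \
            and now - os.path.getmtime(cached) < MAX_AGE:
        with open(cached) as f:
            return f.read()
    return render_live(path)
```

One design caution: serving bots a different copy than users edges toward cloaking if the static version diverges from the live page, so the pre-rendered copy has to be a faithful snapshot.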
|Could you generate static versions of the most frequently requested pages at an off peak time and serve those to googlebot during peak times ? |
Not for less than the cost of a new server. It's not something I could do anyway; the site owner would have to do it, and he's already working at his real job :).
We've decided that throwing hardware at it is the easiest and lowest cost mid term solution.
The current hardware we're on may be a bit antique - dual Xeons (pre-'dual core' stuff), 8 gigs of 533MHz RAM, SCSI drives. Nevertheless, I've actually moved most of my high-processing apps off it over the years - outside this friend's site, the machine is still way overkill for what I need to do. But an upgrade to current hardware with fast drives, fast mobos, DDR3 RAM, and 16 cores - I expect a machine like that will be 10X faster than what we have now, so it should do us for the next 2-5 years.
The only point I'm still chewing on is that I currently have an identical cold-spare server for parts, and with a new machine I won't have a complete identical machine sitting in storage for when a piece of hardware dies at 3am.
It's really not a bad investment, especially considering that it's really my biggest business cost (we should be thankful that we have a business model like this) - it's just curious that I have to pay the money at all in order to satisfy Google's crawler.
|it's just curious that I have to pay the money at all in order to satisfy Google's crawler |
So it's certainly not free traffic, as was suggested in another thread [webmasterworld.com].