
Google Throttles Back Crawl Rate For SOPA Protest Day

   
2:54 pm on Jan 18, 2012 (gmt 0)

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



According to Google, for SOPA protest day only, Googlebot has been throttled back to minimise the effect on the participating sites.
3:21 pm on Jan 18, 2012 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Good move by Google. I was wondering what influences this blackout would reveal. Specifically I was all ready to test what ranking influence (if any) came from Wikipedia, Reddit and the other sites participating in the blackout so I would know which sites are best to spam.
3:27 pm on Jan 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Googlebot has been throttled back to minimise the effect on the participating sites.


But is this fair to other sites?

Marshall
3:38 pm on Jan 18, 2012 (gmt 0)



Since when has Google been bothered about making things 'fair'? They have their rules, and we have to abide by them - but they certainly aren't 'democratic' or fair.
4:28 pm on Jan 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was all ready to test what ranking influence (if any) came from Wikipedia, Reddit and the other sites participating in the blackout

Wikipedia isn't returning 503, they just have a protest message slapped over the top of their pages, so in their case there probably isn't much effect at all.
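
You can confirm that from the status code alone. Here's a quick Python sketch (the function name and URL are just illustrative, not anything official):

import urllib.error
import urllib.request

def blackout_status(url):
    """Return the HTTP status code a blacked-out page actually sends."""
    try:
        # 200 here means the overlay approach: the page itself "succeeds"
        return urllib.request.urlopen(url).getcode()
    except urllib.error.HTTPError as e:
        # urllib raises on 4xx/5xx; a true blackout surfaces as 503 here
        return e.code

print(blackout_status("https://en.wikipedia.org/"))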
5:09 pm on Jan 18, 2012 (gmt 0)



sanjuu wrote:
Since when has Google been bothered about making things 'fair'? They have their rules, and we have to abide by them - but they certainly aren't 'democratic' or fair.

I don't believe Google ever claimed to be democratic. Why even mention it?

--
Ryan
5:16 pm on Jan 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, Googlebot seems to have all but disappeared from my main site. So the throttling may apply to all sites, then?

Regards...jmcc
5:58 pm on Jan 18, 2012 (gmt 0)



It's going to be one of those years... huh... revolution > rebirth > Golden Era > rinse and repeat.
6:04 pm on Jan 18, 2012 (gmt 0)

5+ Year Member



Let's be real... this is not for the "participating sites". The main reason is to minimize the effect on Wikipedia, which G loves so much...

(googlebot activity dropped on my site too...)
6:05 pm on Jan 18, 2012 (gmt 0)

5+ Year Member



Why don't they throttle back Googlebot's activity on participating sites only?
6:17 pm on Jan 18, 2012 (gmt 0)

5+ Year Member



I don't think anyone is asking the right questions. The discussion is focused on Googlebot scaling back its crawling.

What about the fact that many people will not spend time on Wikipedia today? How will that affect Wikipedia's ranking in search results? Is one day of fewer page views and less time-on-site enough to lower a site's rankings? Why does slowing down Googlebot's crawl rate help (or hurt) here? And what about the visitors who are looking for content: where will they find it if Wikipedia is down along with other sites? Will this one day be enough to cause traffic shifts in the future? Maybe it is not only Googlebot that has taken a day off; maybe there are other things Google has put on hold today that we cannot see.

Those are the questions I would like to see discussed...
6:36 pm on Jan 18, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



My guess is they don't want to cache code and content that will not be on those pages tomorrow.

The extra code also skews both the internal and external linking profile of the sites.
6:55 pm on Jan 18, 2012 (gmt 0)



zerillos wrote:
Why don't they throttle back Googlebot's activity on participating sites only?

How would Google know which sites are participating in the blackout prior to crawling them?

--
Ryan
8:41 pm on Jan 18, 2012 (gmt 0)

5+ Year Member



0 visits from g-bot on all my properties for today.
9:44 pm on Jan 18, 2012 (gmt 0)

5+ Year Member



Well, they claim to know what quality is. It should be a piece of cake for their algo...
9:58 pm on Jan 18, 2012 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I was wondering what influences this blackout would reveal.

My take on this is that Google is not doing this to save electricity or to favor Wikipedia. This "throttling back" makes me think that Google doesn't want to distort many levels of the index that involve the algorithm, and my guess is that it may be throttling back more than crawl rate.

It also makes me wonder how something like constant multivariate testing (which I believe Google is doing) might be paused or slowed.

Some areas of speculation...

If user behavior, for example, is factored into the algo... and I think it is... then how might blacked-out sites affect an ongoing statistical model? I'm thinking that users backing out on a mass scale when expected pages aren't there could change an important ongoing metric.

g1smd suggests...
My guess is they don't want to cache code and content that will not be on those pages tomorrow.

That makes sense. Taking this further along the lines of my speculations... Google is not just one database. It's a system of interrelated databases on such a massive scale that it undoubtedly has rules on the order of operations, probably with databases just to manage those. So yes, I'd guess that letting code and content get out of sync with user behavior might also create big anomalies in the back end.
11:27 pm on Jan 18, 2012 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



According to Pierre Far's earlier article, if a website uses a 503 status during the protest, Googlebot will automatically scale back its crawling when it sees the increase in 503 responses. His article (from Monday) also shares some more hints on how Googlebot crawls and how indexing occurs.

See "Website outages and blackouts the right way"
https://plus.google.com/115984868678744352358/posts/Gas8vjZ5fmB
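
For anyone curious what the 503 approach looks like in practice, here's a minimal Python sketch (the handler name, port, and Retry-After value are my own examples, not from Pierre's article). Every request gets a 503 plus a Retry-After header, which signals that the outage is temporary:

from http.server import BaseHTTPRequestHandler, HTTPServer

class BlackoutHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 503 says "temporarily unavailable" rather than "gone"
        self.send_response(503)
        # Retry-After hints when to come back (in seconds: one day)
        self.send_header("Retry-After", "86400")
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<h1>Offline for the SOPA blackout. Back tomorrow.</h1>")

if __name__ == "__main__":
    HTTPServer(("", 8080), BlackoutHandler).serve_forever()

Per Pierre's article, Googlebot treats the rise in 503 responses as a temporary condition and scales back its crawl on its own, without de-indexing the pages.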
12:19 am on Jan 19, 2012 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Wikipedia is showing a 200 response... along with some kind of overlay. Apparently many other websites either didn't have the time or the capability to implement a 503. Therefore, at the end of the thread, Pierre Far posted this...

Pierre Far - Hello everyone. We realize many webmasters are concerned about the medium-term effects of today's blackout. As a precaution, the crawl team at Google has configured Googlebot to crawl at a much lower rate for today only so that the Google results of websites participating in the blackout are less likely to be affected. (5:05 AM)

Also, here's an http link to the Google+ discussion that will work with WebmasterWorld's linking system and redirect to https on Google+... [plus.google.com...]

For the algo hounds here, notice that he says "medium-term effects of today's blackout."

The discussion, btw, is an excellent reference on best ways to handle website outages.
12:20 am on Jan 19, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yeah, and my take is that a CSS + JS-driven overlay is not "the right way".
 
