Google Throttles Back Crawl Rate For SOPA Protest Day

     
engine (Administrator from GB) - 2:54 pm on Jan 18, 2012 (gmt 0)


According to Google, for SOPA protest day only, googlebot has been throttled back to minimise the effect on the participating sites.

goodroi (Administrator from US) - 3:21 pm on Jan 18, 2012 (gmt 0)


Good move by Google. I was wondering what influences this blackout would reveal. Specifically, I was all ready to test what ranking influence (if any) came from Wikipedia, Reddit and the other sites participating in the blackout so I would know which sites are best to spam.

Marshall (Senior Member from US) - 3:27 pm on Jan 18, 2012 (gmt 0)


engine wrote:
googlebot has been throttled back to minimise the effect on the participating sites.

But is this fair to other sites?

Marshall

sanjuu (Junior Member) - 3:38 pm on Jan 18, 2012 (gmt 0)


Since when has Google been bothered about making things 'fair'? They have their rules, and we have to abide by them - but they certainly aren't 'democratic' or fair.

Senior Member - 4:28 pm on Jan 18, 2012 (gmt 0)


goodroi wrote:
I was all ready to test what ranking influence (if any) came from Wikipedia, Reddit and the other sites participating in the blackout

Wikipedia isn't returning a 503; they just have a protest message slapped over the top of their pages, so in their case there probably isn't much effect at all.
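
If you want to see for yourself what a blacked-out site is sending, a status check takes only a few lines. A quick sketch in Python, standard library only; the URL and User-Agent string are just examples:

    import urllib.request
    import urllib.error

    def status_of(url):
        # HEAD is enough here; we only care about the status line, not the body.
        req = urllib.request.Request(url, method="HEAD",
                                     headers={"User-Agent": "status-check/1.0"})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status    # e.g. 200 for an overlay-style protest
        except urllib.error.HTTPError as e:
            return e.code             # e.g. 503 for a blackout done with a real error status

    print(status_of("https://en.wikipedia.org/"))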

Ryan (Full Member) - 5:09 pm on Jan 18, 2012 (gmt 0)


sanjuu wrote:
Since when has Google been bothered about making things 'fair'? They have their rules, and we have to abide by them - but they certainly aren't 'democratic' or fair.

I don't believe Google ever claimed to be democratic. Why even mention it?

--
Ryan

jmcc (Senior Member) - 5:16 pm on Jan 18, 2012 (gmt 0)


Well, Googlebot seems to have all but disappeared from my main site. So the throttling may apply to all sites, then?

Regards...jmcc

Junior Member - 5:58 pm on Jan 18, 2012 (gmt 0)


It's going to be one of those years... huh... revolution > rebirth > Golden Era > rinse and repeat.

zerillos (Preferred Member) - 6:04 pm on Jan 18, 2012 (gmt 0)


Let's be real... this is not for the "participating sites". The main reason is to minimize the effect on Wikipedia, which G loves so much...

(googlebot activity dropped on my site too...)

zerillos (Preferred Member) - 6:05 pm on Jan 18, 2012 (gmt 0)


Why don't they throttle back googlebot's activity on the participating sites only?

New User - 6:17 pm on Jan 18, 2012 (gmt 0)


I don't think anyone is asking the right questions yet. The discussion is all about googlebot scaling back its crawling.

What about the fact that many people won't spend time on Wikipedia today? How will that affect Wikipedia's ranking in search results? Is one day of fewer page views and less time-on-site enough to lower a site's rankings? How does slowing the googlebot crawl rate help (or hurt) with that? And what about visitors who are looking for content: where will they find it if Wikipedia is down along with the other sites? Will this one day be enough to cause traffic shifts in the future? Maybe it isn't only googlebot that has taken a day off; maybe Google put other things on hold today that we cannot see.

Those are the questions I'd like to discuss answers to...

g1smd (Senior Member) - 6:36 pm on Jan 18, 2012 (gmt 0)


My guess is they don't want to cache code and content that will not be on those pages tomorrow.

The extra code also skews both the internal and external linking profile of the sites.

Ryan (Full Member) - 6:55 pm on Jan 18, 2012 (gmt 0)


zerillos wrote:
why don't they throttle back googlebot's activity on participating sites only?

How would Google know which sites are participating in the blackout prior to crawling them?

--
Ryan

Junior Member - 8:41 pm on Jan 18, 2012 (gmt 0)


Zero visits from g-bot on all my properties today.

zerillos (Preferred Member) - 9:44 pm on Jan 18, 2012 (gmt 0)


Well, they claim to know what quality is. It should be a piece of cake for their algo...

robert_charlton (Forum Moderator, US) - 9:58 pm on Jan 18, 2012 (gmt 0)


goodroi wrote:
I was wondering what influences this blackout would reveal.

My take on this is that Google is not doing this to save electricity or to favor Wikipedia. This "throttling back" makes me think that Google doesn't want to distort many levels of the index that involve the algorithm, and my guess is that it may be throttling back more than crawl rate.

It also makes me wonder how something like constant multivariate testing (which I believe Google is doing) might be paused or slowed.

Some areas of speculation...

If user behavior, for example, is factored into the algo... and I think it is... then how might blacked-out sites affect an ongoing statistical model? I'm thinking users backing out on a mass scale when expected pages aren't there could change an important ongoing metric.

g1smd suggests...
My guess is they don't want to cache code and content that will not be on those pages tomorrow.

That makes sense. Taking this further along the lines of my speculations... Google is not just one database. It's a system of interrelated databases on such a massive scale that it undoubtedly has rules on the order of operations, probably with databases just to manage those. So yes, I'd guess that code and content getting out of sync with user behavior might also create big anomalies in the back end.

tedster (Senior Member) - 11:27 pm on Jan 18, 2012 (gmt 0)


According to Pierre Far's earlier article, if a website uses a 503 status during the protest, googlebot will automatically scale back when it sees the increase in 503 responses. His article (from Monday) also shares some hints on how googlebot crawls and how indexing occurs.

See "Website outages and blackouts the right way"
https://plus.google.com/115984868678744352358/posts/Gas8vjZ5fmB
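
Nobody outside Google knows the real crawl scheduler, so treat this as nothing more than a toy illustration of the behaviour described above: a crawler that stretches the delay between its requests as the share of recent 503 responses rises, and relaxes again once they stop. Every name in it is made up:

    import time
    from collections import deque

    def polite_crawl(urls, fetch, base_delay=1.0, window=20):
        # fetch() stands in for the HTTP client; it should return a status code.
        recent = deque(maxlen=window)   # rolling window of recent status codes
        for url in urls:
            recent.append(fetch(url))
            error_rate = recent.count(503) / len(recent)
            # No recent 503s -> base delay; all 503s -> roughly 16x the base delay.
            time.sleep(base_delay * 2 ** (4 * error_rate))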

robert_charlton (Forum Moderator, US) - 12:19 am on Jan 19, 2012 (gmt 0)


Wikipedia is showing a 200 response... along with some kind of overlay. Apparently many other websites either didn't have the time or the capability to implement a 503. Therefore, at the end of the thread, Pierre Far posted this...

Pierre Far - Hello everyone. We realize many webmasters are concerned about the medium-term effects of today's blackout. As a precaution, the crawl team at Google has configured Googlebot to crawl at a much lower rate for today only so that the Google results of websites participating in the blackout are less likely to be affected. (5:05 AM)

Also, here's an http link to the Google+ discussion that will work with WebmasterWorld's linking system and redirect to https on Google+... [plus.google.com...]

For the algo hounds here, notice that he says "medium-term effects of today's blackout."

The discussion, btw, is an excellent reference on best ways to handle website outages.
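
For reference, the 503 pattern that discussion recommends is small enough to sketch. A minimal stand-in using Python's built-in http.server; a real site would configure this in its web server instead, and the Retry-After value here is only an example:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    BLACKOUT_HTML = b"<h1>Blacked out today in protest of SOPA/PIPA</h1>"

    class BlackoutHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # 503 says "temporary outage" -- unlike serving a protest page as 200.
            self.send_response(503)
            self.send_header("Retry-After", "86400")  # hint: try again in ~24 hours
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(BLACKOUT_HTML)

        def do_HEAD(self):
            # Same status for HEAD probes, but no body.
            self.send_response(503)
            self.send_header("Retry-After", "86400")
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8000), BlackoutHandler).serve_forever()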

g1smd (Senior Member) - 12:20 am on Jan 19, 2012 (gmt 0)


Yeah, and my take is that a CSS + JS-driven overlay is not 'the right way'.
 
