Google Throttles Back Crawl Rate For SOPA Protest Day
engine
2:54 pm on Jan 18, 2012 (gmt 0)

According to Google, for SOPA protest day only, Googlebot has been throttled back to minimise the effect on the participating sites.

 

goodroi
3:21 pm on Jan 18, 2012 (gmt 0)

Good move by Google. I was wondering what influences this blackout would reveal. Specifically I was all ready to test what ranking influence (if any) came from Wikipedia, Reddit and the other sites participating in the blackout so I would know which sites are best to spam.

Marshall
3:27 pm on Jan 18, 2012 (gmt 0)

Googlebot has been throttled back to minimise the effect on the participating sites.


But is this fair to other sites?

Marshall

sanjuu
3:38 pm on Jan 18, 2012 (gmt 0)

Since when has Google been bothered about making things 'fair'? They have their rules, and we have to abide by them - but they certainly aren't 'democratic' or fair.

freejung
4:28 pm on Jan 18, 2012 (gmt 0)

I was all ready to test what ranking influence (if any) came from Wikipedia, Reddit and the other sites participating in the blackout

Wikipedia isn't returning a 503; they just have a protest message slapped over the top of their pages, so in their case there probably isn't much effect at all.
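This is easy to confirm from the outside. A quick sketch, assuming Python with the third-party requests library installed (the URLs are just examples):

import requests

# See what a blacked-out site actually returns to a plain HTTP client.
for url in ("http://en.wikipedia.org/", "http://www.reddit.com/"):
    r = requests.get(url, timeout=10)
    print(url, r.status_code)

A CSS/JS overlay still answers 200 with the full page HTML, so a crawler that doesn't execute JavaScript sees normal content. A genuine blackout answers 503.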

rlange
5:09 pm on Jan 18, 2012 (gmt 0)

sanjuu wrote:
Since when has Google been bothered about making things 'fair'? They have their rules, and we have to abide by them - but they certainly aren't 'democratic' or fair.

I don't believe Google ever claimed to be democratic. Why even mention it?

--
Ryan

jmccormac
5:16 pm on Jan 18, 2012 (gmt 0)

Well, Googlebot seems to have all but disappeared on my main site. Maybe it's happening on all sites, then?

Regards...jmcc

Lenny2
5:58 pm on Jan 18, 2012 (gmt 0)

It's going to be one of those years... huh... revolution > rebirth > Golden Era > rinse and repeat.

zerillos
6:04 pm on Jan 18, 2012 (gmt 0)

Let's be real... this is not for the "participating sites". The main reason is to minimize the effect on Wikipedia, which Google loves so much...

(googlebot activity dropped on my site too...)

zerillos
6:05 pm on Jan 18, 2012 (gmt 0)

Why don't they throttle back Googlebot's activity on the participating sites only?

crazybrain
6:17 pm on Jan 18, 2012 (gmt 0)

I don't think anyone is asking or discussing the right questions. The discussion is all about Googlebot scaling back its crawling.

What about the fact that many people will not spend time on Wikipedia today? How will that affect Wikipedia's ranking in search results? Is one day of fewer page views and less time-on-site enough to lower a site's rankings? How does slowing the Googlebot crawl rate help (or hurt) with that? And what about the visitors who are looking for content: where will they find it if Wikipedia is down along with the other sites? Will this one day be enough to cause traffic shifts in the future? Maybe it is not only Googlebot that has taken a day off; maybe Google has put other things on hold today that we cannot see.

Those are the questions I would like to see discussed...

g1smd
6:36 pm on Jan 18, 2012 (gmt 0)

My guess is they don't want to cache code and content that will not be on those pages tomorrow.

The extra code also skews both the internal and external linking profile of the sites.

rlange
6:55 pm on Jan 18, 2012 (gmt 0)

zerillos wrote:
why don't they throttle back googlebot's activity on participating sites only?

How would Google know which sites are participating in the blackout prior to crawling them?

--
Ryan

Donna
8:41 pm on Jan 18, 2012 (gmt 0)

0 visits from g-bot on all my properties today.

zerillos
9:44 pm on Jan 18, 2012 (gmt 0)

Well, they claim to know what quality is. It should be a piece of cake for their algo...

Robert Charlton
9:58 pm on Jan 18, 2012 (gmt 0)

I was wondering what influences this blackout would reveal.

My take on this is that Google is not doing this to save electricity or to favor Wikipedia. This "throttling back" makes me think that Google doesn't want to distort many levels of the index that involve the algorithm, and my guess is that it may be throttling back more than crawl rate.

It also makes me wonder: how would something like constant multivariate testing (which I believe Google is doing) be paused or slowed?

Some areas of speculation...

If user behavior, for example, is factored into the algo... and I think it is... then how might blacked-out sites affect an ongoing statistical model? I'm thinking that users backing out on a mass scale when expected pages aren't there could change an important ongoing metric.

g1smd suggests...
My guess is they don't want to cache code and content that will not be on those pages tomorrow.

That makes sense. Taking this further along the lines of my speculations... Google is not just one database. It's a system of interrelated databases on such a massive scale that it undoubtedly has rules about the order of operations, probably with databases just to manage those. So yes, I'd guess that code and content getting out of sync with user behavior might also create big anomalies on the back end.

tedster
11:27 pm on Jan 18, 2012 (gmt 0)

According to Pierre Far's earlier article, if a website serves a 503 status during the protest, Googlebot will automatically scale back when it sees the increase in 503 responses. His article (from Monday) also shares some hints about how Googlebot crawls and how indexing occurs.

See "Website outages and blackouts the right way"
https://plus.google.com/115984868678744352358/posts/Gas8vjZ5fmB
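In other words, the scaling back is reactive: the crawler sees a host start answering 503 and slows down on its own. A toy sketch of that kind of backoff behaviour, assuming Python with the requests library; this is illustrative only, not Google's actual implementation:

import time
import requests

def polite_crawl(urls, base_delay=1.0, max_delay=300.0):
    # Fetch URLs, backing off whenever the host starts answering 503.
    delay = base_delay
    fetched = []
    for url in urls:
        r = requests.get(url, timeout=10)
        if r.status_code == 503:
            # Retry-After may be seconds or an HTTP date; handle seconds only.
            retry_after = r.headers.get("Retry-After", "")
            if retry_after.isdigit():
                delay = min(float(retry_after), max_delay)
            else:
                delay = min(delay * 2, max_delay)
        else:
            delay = base_delay  # host looks healthy; resume the normal rate
            fetched.append((url, r))
        time.sleep(delay)
    return fetched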

Robert Charlton
12:19 am on Jan 19, 2012 (gmt 0)

Wikipedia is showing a 200 response... along with some kind of overlay. Apparently many other websites either didn't have the time or the capability to implement a 503. Therefore, at the end of the thread, Pierre Far posted this...

Pierre Far (5:05 AM) - Hello everyone. We realize many webmasters are concerned about the medium-term effects of today's blackout. As a precaution, the crawl team at Google has configured Googlebot to crawl at a much lower rate for today only so that the Google results of websites participating in the blackout are less likely to be affected.

Also, here's an http link to the Google+ discussion that will work with WebmasterWorld's linking system and redirect to https on Google+... [plus.google.com...]

For the algo hounds here, notice that he says "medium-term effects of today's blackout."

The discussion, btw, is an excellent reference on best ways to handle website outages.

g1smd
12:20 am on Jan 19, 2012 (gmt 0)

Yeah, and my take is that a CSS + JS-driven overlay is not "the right way".
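For contrast, "the right way" from Pierre Far's article is a 503 plus a Retry-After header on every URL. A minimal sketch, assuming a Python/Flask app (the framework choice and handler name are mine; only the status code and header come from the article):

from flask import Flask, make_response

app = Flask(__name__)

# During the blackout, every path returns 503 Service Unavailable plus a
# Retry-After header, telling crawlers the outage is temporary and when to
# come back, rather than serving them protest content with a 200.
@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def blackout(path):
    resp = make_response("Down for the SOPA blackout. Back tomorrow.", 503)
    resp.headers["Retry-After"] = "86400"  # in seconds; an HTTP date also works
    return resp

if __name__ == "__main__":
    app.run()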
