

Having a Google +1 button on a page will override robots.txt blocks!

     
5:16 pm on Sep 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


Now this one is interesting. I just found this answer from a Google employee in one of their forum threads - [google.com...]

The +1 Button is only intended to be used on pages that contain all public content. By putting the button on a page we're taking it as an indication from you that this page is public content. This means that we will fetch your page even if crawler directives indicate otherwise.


What is interesting to me is that, until now, I was thinking that if you block a page in robots.txt, Google wouldn't even crawl it. But it looks like Google will fetch the page, see whether it has a +1 button, and then disregard the robots.txt block. Wow, this is great!

Looks like Googlebot will get into every corner of your site.
5:26 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


I think whitelisting won't work with Google anymore. From what I gather by going through that thread and the OP's observations, it looks like some of their bots might not even identify themselves with a proper user agent.

These guys are getting more and more evil with every passing day!
5:36 pm on Sept 2, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12671
votes: 141


um, yea, that's pretty nasty.
5:36 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


My assumption was a bit different. I'm thinking that when a +1 vote is registered for a page, then that URL gets added to Google's crawling queue if it wasn't already there - and it then gets crawled even if there's a Disallow rule.

For one, that process would be a lot less resource-intensive than taking a full inventory of every URL on the web. And this process would only be a minor ignoring of the robots.txt protocol instead of a major violation.

Nevertheless, we'll need to be very cautious about any automated adding of +1 buttons across a website.
6:29 pm on Sept 2, 2011 (gmt 0)

Full Member

10+ Year Member

joined:Feb 23, 2003
posts:207
votes: 0


There's nothing nefarious about it. Google doesn't need to crawl all your pages (regardless of robots.txt) to find the +1 buttons. You've got the JavaScript, from Google, right on the page. They just check referrers for the +1 to ensure it's been crawled. That's all.
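
For reference, the +1 snippet Google documented at the time looked roughly like this (tag and script names from memory, so treat the exact details as approximate); the second line is the "JavaScript, from Google" that phones home whenever the page renders:

<g:plusone></g:plusone>
<script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>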
6:57 pm on Sept 2, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12671
votes: 141


Yes, but what about the various things I block because I don't want a lot of sort=, pagination=, display= and other types of potential duplicate content cluttering up the joint? I could rely on GWT's parameter exclusion (yea right) or rely on Google to figure it out (because THAT always works) or figure out a way to return a noindex (hunh?). Easier just to not use +1, except they're pretty much saying you *gotta* use +1 - at least that's what the clients hear.

They're messing with my systems. Do. Not. Want.
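
The kind of robots.txt rules being described here would look roughly like this (the parameter names are placeholders for whatever a given cart or CMS generates, and the * wildcard is a Googlebot extension rather than part of the original robots.txt protocol):

User-agent: *
# Keep parameterised duplicates of the same content out of the crawl
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?pagination=
Disallow: /*&pagination=
Disallow: /*?display=
Disallow: /*&display=

These are exactly the URLs that a +1 button rendered on every template would also end up sitting on.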
6:59 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


but what about the various things I block because I don't want a lot of sort=, pagination=, display= and other types of potential duplicate content cluttering up the joint?

Exactly right. I don't even want to see a crawler requesting those URLs or to spend the bandwidth responding.
7:32 pm on Sept 2, 2011 (gmt 0)

Preferred Member

joined:Jan 7, 2010
posts:443
votes: 0


Netmeg - Hard.Luck.On.You.

Another google trick - they are really showing their true colours el-rapido these days.
7:43 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member

joined:Apr 14, 2010
posts:3169
votes: 0


they are really showing their true colours el-rapido these days.


I agree, 110%. I'm sure Google has a plan in place for when webmasters revolt and say hey - stop making billions off my content yo!

Ignoring robots.txt directives sounds like grounds for a lawsuit. Why should you foot the bandwidth bill?

Worse is that analytics and some web hosts hide googlebot activity, as if they know it shouldn't be there...

update: from John Mu
Just to follow up on the Instant Preview questions.. We fetch the content for Instant Previews (provided it's not cached yet) on demand when the user requests it. When we do that, we need to be able to fetch the page the way that the user would see it, and for that we may fetch content that's otherwise disallowed by the robots.txt file.


Yup, lawsuit incoming. The explanation doesn't hold water, since it should not be possible to request an instant preview of a disallowed page in the first place. Chicken-before-the-egg problem, John.
8:14 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 28, 2002
posts:757
votes: 0


but what about the various things I block because I don't want a lot of sort=, pagination=, display= and other types of potential duplicate content cluttering up the joint?

Right, but the +1 button uses the rel="canonical" link, if one exists, to identify the target of the button. So all you have to do is properly canonicalize those pages, and the bot should request the URL the button actually points to, which will be the canonical unless specified otherwise.

Or you can explicitly specify the URL in the code for the button.

You still have a fair amount of control over this.

I can see how it's kind of rude in principle, but in actual practice why would you want a Google +1 button pointing at a URL that you don't want Google to see? Just make sure you're pointing the button at a URL that you like, which you probably ought to do anyway; otherwise when people share it on Google+ they'll be sharing all sorts of random parameterized URLs that you don't like.
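
To make those two options concrete, the markup would look roughly like this (example.com is a placeholder; the href attribute was the documented way to point the button at a specific URL, if memory serves):

<!-- Option 1: canonicalize the page and let the button fall back to it -->
<link rel="canonical" href="http://www.example.com/widgets/">

<!-- Option 2: spell the target URL out explicitly on the button itself -->
<g:plusone href="http://www.example.com/widgets/"></g:plusone>
<script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>

Either way, the URL the button reports to Google is one you chose, not whatever parameterized variant the visitor happened to land on.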
8:23 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 28, 2002
posts:757
votes: 0


Hmmm... I just realized that it's not entirely certain that my interpretation of the statement is correct. I just asked for clarification, we'll see what they say.
8:46 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Feb 12, 2006
posts:2492
votes: 22


you can understand their thinking though... google+1 is not for users, not really. it's not like the facebook button where people click it to share stuff with their friends... the whole point of google+1 is to let google know that you think the page is worthy of being in their index. everything else is just fluff. so why put the button on a page if you don't want it boosted in the index? there's no point.

the only reason to put it on a noindexed page is if google spreads some of the benefit throughout the rest of the site. but is that the way it works? i don't think it is. a click only counts towards that specific URL.
8:59 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


I do understand the thinking behind this, but I also think they haven't thought it through very well. Now, if the crawling uses a canonical link (we're waiting for that clarification) that would handle a lot of these edge cases that look so troubling.

However, Google does rush things into production without looking at cross-discipline ramifications. Remember how the first AJAX SERPs broke Analytics?
9:08 pm on Sept 2, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12671
votes: 141


Actually you can program the +1 button now to share to your google+ page, as I understand it.

I am not saying I would *intentionally* put the button on a noindexed page; I am talking about cases where I put the button on pages where alternative URLs (to the same content) can be generated, and where I usually block off those alternative URLs from being indexed. The canonical might help in some cases, but I already have canonical set and I still get crap in the index if I don't specifically block it out. Google is not 100% reliable in this.
9:12 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 28, 2002
posts:757
votes: 0


Actually you can program the +1 button now to share to your google+ page

That is currently the default behavior.
9:32 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member

joined:Apr 14, 2010
posts:3169
votes: 0


The canonical might help in some cases, but I already have canonical set and I still get crap in the index if I don't specifically block it out. Google is not 100% reliable in this.


Google is 100% reliable in that they will crawl all available data. The question is will they obey webmasters and not crawl what we say don't crawl. The answer appears to be no. I've set up several honeypot pages to see what Googlebot does in reality; the only unbiased answers come from testing for yourself.
9:51 pm on Sept 2, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 28, 2002
posts:757
votes: 0


the only unbiased answers come from testing for yourself

Good point, except that they say that they may crawl the page, which means even if they don't crawl your test pages, they may still crawl others in other contexts -- however, it would be interesting to see the results. Maybe I'll set up a test too, in my vast spare time...
9:52 pm on Sept 2, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 4, 2001
posts: 2143
votes: 7


Maybe I'm oversimplifying, but isn't anything that helps your ranking a good thing, even if it has some quirks? Remember, that old adage applies here: you get what you pay for, and Google+ is free.

Marshall
1:48 am on Sept 3, 2011 (gmt 0)

Full Member

joined:Mar 17, 2011
posts:275
votes: 0


Most sites now use a database to basically pull a page together. Back in the days when I used HTML to create every page, it would have been easy to delete something like the +1 from a specific page.

Now though, with a DB driven site, it is nearly impossible.

On another thread I mentioned I was deleting pages to try to improve my page rank. Every page on the DB portion of the site has +1. It wouldn't have made sense to use noindex or disallow to remove the page from the index, perhaps.
2:42 am on Sept 3, 2011 (gmt 0)

Preferred Member

5+ Year Member

joined:Aug 2, 2006
posts:375
votes: 1


No +1, no problem. Sounds needy.
3:34 am on Sept 3, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 19, 2004
posts: 168
votes: 1


Now though, with a DB driven site, it is nearly impossible.


Not so! Suppressing +1 should be easy peasy lemon squeezy on most any DB driven site. Just suppress +1 on pages you don't want in the index. Too much complaining here. Personally, I think that Google could honor the noindex command on a +1 page. They need to step up to the plate on this one. Mr. Cutts, why can't Google honor the noindex?
3:37 am on Sept 3, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


robots.txt was designed to let you block some or all well-behaved bots from crawling any given page.

Why should a recently introduced +1 button take precedence and jeopardise what has been well understood by everyone? I do agree that one shouldn't use the button on a page they don't want to share publicly. But any good bot should obey the most restrictive instruction when there are conflicts.

When you have two robots meta tags on a page by mistake, one telling bots to "index" and the other telling them to "noindex", doesn't Google say they will apply the most restrictive tag? Why should it be different in this case?

I sincerely feel it would be better if they applied the same logic here.
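
For reference, the conflicting-tags case described above looks like this in markup; Google has said that when directives conflict the most restrictive one wins, so this page would be treated as noindex:

<!-- two robots meta tags on the same page, by mistake -->
<meta name="robots" content="index, follow">
<meta name="robots" content="noindex, follow">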
5:04 am on Sept 3, 2011 (gmt 0)

Senior Member from LK 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 16, 2005
posts:2414
votes: 16


except they're pretty much saying you *gotta* use +1 - at least that's what the clients hear.


Why? It only affects SERPS for people who follow those who +1. Are these sites going to get lots of +1s from people who are followed by potential customers?

Having a +1 button does not seem to greatly change the number of +1s you get.

If it's just the clients' perception and you disagree but cannot change their minds, that is their problem - much the same as if they insisted on link exchanges with dodgy sites.

@indyank. This is different. It is more like what Facebook does with links - do they follow robots.txt? Incidentally, FB also censors links in comments, even to fairly mild material (the Wikipedia entry on a racy text adventure game, the IMDB page on a frightening but not particularly offensive film).

I agree that ideally Google would index them separately, but that would be expensive. It does not seem unreasonable to say that you should not put +1 buttons on pages you do not want indexed. I think what they need to do is document it better so people are clear about it when they add the button.
6:11 am on Sept 3, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


graeme_p, what is different? Do not mix up Facebook with a search engine bot. AFAIK Facebook doesn't have a bot that crawls websites or pages on websites the way the search robots do.

Yes, Google does mix up social and search a lot these days to confuse webmasters like you. However, whenever Googlebot crawls a page on a site, it is supposed to check robots.txt to find what is disallowed and obey the rules there. It is Googlebot which is supposed to determine whether a page can be crawled and indexed.

[edited by: indyank at 6:34 am (utc) on Sep 3, 2011]

6:18 am on Sept 3, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


I don't think canonical tags are a good solution to this issue for several reasons. Moreover, they are not widely used and they are just hints for googlebot.

robots.txt should take precedence over anything else.
6:25 am on Sept 3, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


Google, if you don't like users using the +1 button on pages they have blocked, help them remove it, warn them or ban them from using that button.

It isn't nice to find workarounds for bypassing robots.txt.
6:36 am on Sept 3, 2011 (gmt 0)

Full Member

joined:Mar 17, 2011
posts:275
votes: 0


Just suppress +1 on pages you don't want in the index.


Suppress? I'd love to know more about suppress. If you have the same sidebar for the whole site, how do you suppress something on that sidebar on one page? I don't see a "suppress" option in Drupal, WordPress or any of the other DB driven CMSs.

I am not saying it can't be done, but how?
8:25 am on Sept 3, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


A few lines of code can detect what page the user requested and then decide whether or not to show the button.

I recently did this for the Facebook "like" button on a MediaWiki-driven website. It should be equally simple to deploy on any script-driven site, as long as you know the URL format for all pages that should display the button, or for all pages that should not.

In this case the rule was "don't show the button on pages where the URL contains index.php or begins with Talk: or...". Additionally, we don't bother showing the button if it is a search engine bot requesting the page.
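
g1smd's version is server-side (template logic in MediaWiki), but the same show/hide decision can be sketched client-side as well. A rough sketch with placeholder URL patterns, assuming the g-plusone class and plusone.js loader are the right names (taken from Google's button documentation at the time, as best I recall):

<script type="text/javascript">
  // Placeholder patterns - adjust to your own URL scheme.
  // Skip the button on parameterised pages, index.php URLs and Talk: pages,
  // mirroring the kind of rule described above.
  var skip = /[?&](sort|display|pagination)=|index\.php|\/Talk:/;
  if (!skip.test(window.location.pathname + window.location.search)) {
    // Only emit the button markup and load Google's script on allowed pages.
    document.write('<div class="g-plusone"></div>');
    var s = document.createElement('script');
    s.type = 'text/javascript';
    s.async = true;
    s.src = 'https://apis.google.com/js/plusone.js';
    (document.getElementsByTagName('head')[0] || document.body).appendChild(s);
  }
</script>

The "don't show it to search engine bots" part still has to happen server-side, since a crawler fetching the raw HTML sees this script either way.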
9:20 am on Sept 3, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 3, 2003
posts: 805
votes: 0


And with regard to another thread: a +1 button on any page inside a spider-trap folder will catch and ban Google.
3:00 pm on Sept 3, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


The question is will they obey webmasters and not crawl what we say don't crawl. The answer appears to be no.

That's been true a long time, unfortunately. You'll find scores of reports in WW's "Search Engine Spider and User Agent Identification [webmasterworld.com]" forum.

For example, here's fresh info about non-obvious Twitter-mining:

Resolving "urlresolver" | Google IPs repeat no-robots runs
Recap post: [webmasterworld.com...]

And more GWT news:

Google Web Preview | Not just from bare IPs anymore... [webmasterworld.com...]

After spending too much unrecompensed time 'accommodating' GWT before G worked out their own bugs, I will no longer kick their tires for them via +1 or anything else. I'm seriously weary, and increasingly wary, of jumping through their we-cloak-but-you-can't hoops.