This 42 message thread spans 2 pages.
|Having Google +1 button on a page will override robots.txt blocks!|
Now this one is interesting. I just found this answer from a Google employee in one of their forum threads: [google.com...]
|The +1 Button is only intended to be used on pages that contain all public content. By putting the button on a page we're taking it as an indication from you that this page is public content. This means that we will fetch your page even if crawler directives indicate otherwise. |
What is interesting to me is that until now, I thought that if you blocked a page in robots.txt, Google wouldn't even crawl it. But it looks like Google will fetch the page, check whether it has a +1 button, and then disregard the robots.txt block. Wow, this is great!
Looks like Googlebot will get into every corner of your site.
|AFAIK Facebook doesn't have any bot that crawls websites, or pages on websites, the way the search robots do. |
Google crawls pages that the webmaster has indicated they want to encourage users to link to using their social stuff.
Facebook crawls pages that users share using their social stuff.
Not identical, but pretty similar. The difference is that Google's crawl also leads to indexing in the search engine, but why would you put a +1 button on a page you did not want in the SERPs?
|It should be equally simple to deploy on any script-driven site just as long as you know the URL format for all pages that should display the button or for all pages that should not display the button. |
Depending on the CMS you could also add some meta data to the page in the database, or use different page templates.
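For the metadata route, a minimal sketch might look like this. It assumes a hypothetical page record with a `show_plusone` column; the field names and the bare `g-plusone` markup are illustrative, not taken from any particular CMS.

```python
from dataclasses import dataclass
from html import escape

# Hypothetical page record as a CMS might store it in its database.
@dataclass
class Page:
    url: str
    title: str
    body_html: str
    show_plusone: bool = False  # hypothetical per-page metadata flag

PLUSONE_MARKUP = '<div class="g-plusone"></div>'

def render(page: Page) -> str:
    """Render a page, adding the +1 button only when the flag is set."""
    parts = [f"<h1>{escape(page.title)}</h1>", page.body_html]
    if page.show_plusone:
        parts.append(PLUSONE_MARKUP)
    return "\n".join(parts)
```

Defaulting the flag to `False` means no page gets the button, and with it the implied permission to crawl, unless someone deliberately opts it in.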
|And with regard to another thread: a +1 button on any page in a spider-trap folder will catch and ban Google. |
Good point for us to remember - and the best reason why Google should not do this: they will be blocked from indexing lots of good sites.
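A typical spider trap works roughly like the sketch below, which assumes a hypothetical `/trap/` folder and an in-memory ban list: any client that requests a URL disallowed in robots.txt gets banned. If a +1 button sits on a page in that folder and Google fetches it regardless of robots.txt, Googlebot is banned exactly like any misbehaving bot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that fences off the trap folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

banned_ips: set[str] = set()

def handle_request(ip: str, path: str) -> int:
    """Return an HTTP status; ban any client that fetches a disallowed URL."""
    if ip in banned_ips:
        return 403
    if not parser.can_fetch("*", path):
        # Only a client ignoring robots.txt requests this URL, so ban it.
        banned_ips.add(ip)
        return 403
    return 200
```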
I've just had another look at the +1 button code, and there is actually an option to provide a URL of your choosing as an attribute on the <div>.
I presume that Google would crawl those URLs rather than the ones we are all worrying about.
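A small helper can emit the button so that it always names an explicit target, such as the page's canonical URL. This sketch assumes the attribute is `data-href`, as I recall it from the (since retired) +1 button documentation. Whether Google then crawls only that target, rather than the page hosting the button, is not confirmed in this thread.

```python
from html import escape

def plusone_button(target_url: str) -> str:
    """Build +1 button markup pointing at an explicit target URL.

    Assumes the button reads its target from a data-href attribute;
    verify against the button's own documentation before relying on it.
    """
    return f'<div class="g-plusone" data-href="{escape(target_url, quote=True)}"></div>'
```

The URL is HTML-escaped, so query strings containing `&` stay valid inside the attribute.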
|I don't see suppress in Drupal, WordPress or any of the other DB-driven CMSs. |
I'm sure someone will develop a WP plugin for you. I gather that WP lacks a cloaking plugin as well. I guess what I'm saying is that the spoils usually go to those who know a bit more than how to use Drupal and WordPress out of the box (webmasters who write code).
Oh, how I long for the good ol' days of cloaking. Such fun...
|i presume that google would crawl those URLs, rather than the ones that we are all worrying about. |
My concern is that robots.txt has to be ignored to crawl these pages, which means Google is disobeying robots.txt. That should NOT be the default behavior, though I suppose they want all the help they can get for +1.
I tried +1, but it yielded zero benefit for me and hurt page load times considerably, so it wasn't for me.
|londrum wrote: |
You can understand their thinking, though. Google +1 is not for users, not really. It's not like the Facebook button, where people click it to share stuff with their friends. The whole point of Google +1 is to let Google know that you think the page is worthy of being in their index; everything else is just fluff. So why put the button on a page if you don't want it boosted in the index? There's no point.
Hmm... Good point. If I'm remembering correctly, the "improve our index" aspect seemed to be the focus of the +1 button when it was announced. The sharing-through-Google+ aspect seemed to come later.
Perhaps some/most of us have been incorrectly thinking of the +1 button as "a sharing tool that can help rankings" and not "a ranking tool that can be used to share"...
I'd rather that my "popularity" come from VISITORS, not some "rank-this" scheme. Thus no +1 on my sites, and thus no worries about this particular revelation of Google behavior. But I'm not surprised...
|I don't see suppress in Drupal, WordPress or any of the other DB-driven CMSs. |
I don't know enough about Drupal. In WordPress, I'd probably just put some PHP code directly into the template rather than messing around with trying to write an actual plugin.
In MODX, my preferred CMS, there are half a dozen ways you could do this, depending on exactly what you wanted to do. If you want page-by-page manual control, you could make the button code the default value of a template variable, then remove it as needed; that has the advantage of being very easy and would take less than 5 minutes. To control it with conditional logic, you could write a plugin that listens for the OnWebPagePrerender event and strips out the button as needed, or you could use a snippet to generate the button and omit it depending on request parameters before the page renders; that would be faster, because the results could be cached. Another approach would be to use snippets to dynamically generate your robots.txt, your XML sitemap, and your +1 button, and use the "Searchable" property to control all three on a page-by-page basis.
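The last idea, one per-page flag driving robots.txt, the sitemap and the button together, can be sketched outside any CMS. This is not MODX code: the `Resource` class, its `searchable` field (standing in for MODX's "Searchable" property) and the site root are hypothetical.

```python
from dataclasses import dataclass
from html import escape

@dataclass
class Resource:
    path: str         # e.g. "/widgets/blue-widget"
    searchable: bool  # stands in for a CMS "Searchable" checkbox

SITE = "https://www.example.com"  # hypothetical site root

def robots_txt(resources: list[Resource]) -> str:
    """Disallow every non-searchable page."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {r.path}" for r in resources if not r.searchable]
    return "\n".join(lines) + "\n"

def sitemap_xml(resources: list[Resource]) -> str:
    """List only searchable pages in the XML sitemap."""
    urls = "".join(
        f"<url><loc>{escape(SITE + r.path)}</loc></url>"
        for r in resources if r.searchable
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            f"{urls}</urlset>")

def plusone_for(resource: Resource) -> str:
    """Emit the +1 button only on searchable pages."""
    return '<div class="g-plusone"></div>' if resource.searchable else ""
```

Because all three outputs read the same flag, they cannot drift out of step. One caveat: robots.txt `Disallow` is a prefix match, so disallowing `/widgets` would also block `/widgets-archive`.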
My point is that in a good CMS such as MODX there would be a lot of ways to do this, some of them quite easy.
Are you certain of that? They haven't responded to my question. To me it wouldn't make sense to crawl the exact URL that triggered the script, for all the reasons discussed here; it would make much more sense to crawl the button's target.
Isn't it obvious?
If there is +1 button, it means Google can crawl that page.
Yes, it is obvious and just that simple - and equally unexpected, I would say.
The challenge now is writing code to suppress the +1 button whenever the content is accessed through a non-canonical URL, or on a "sort" or site-search results page that you don't want indexed... that kind of thing.
Once you know the issue, the code is certainly do-able. But you need to know that there's an issue. I'm quite happy that indyank brought this issue to light.
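One way to handle the non-canonical case is to compare each request against the page's canonical URL and show the button only on an exact match. This is a minimal sketch assuming the CMS already knows each page's canonical URL; the `/search` path is a hypothetical site-search location.

```python
from urllib.parse import urlsplit

SEARCH_PATH = "/search"  # hypothetical path of the site-search results page

def show_plusone(requested_url: str, canonical_url: str) -> bool:
    """Show the +1 button only when the request is for the canonical URL."""
    req, can = urlsplit(requested_url), urlsplit(canonical_url)
    if req.path == SEARCH_PATH or req.path.startswith(SEARCH_PATH + "/"):
        return False  # never invite a crawl of site-search results
    # Any difference in scheme, host, path or query string (e.g. ?sort=price)
    # marks this request as a non-canonical variant of the content.
    return ((req.scheme.lower(), req.netloc.lower(), req.path, req.query)
            == (can.scheme.lower(), can.netloc.lower(), can.path, can.query))
```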
I agree, thanks for the heads up. Then again, I see an assault on webmasters by Google, with all these new "innovations" arriving without early documentation. That leaves everyone confused about how to proceed, as in "we'll keep chipping away until everyone has no clue and we can do what we want to do... and then let TIME OF ACCEPTANCE be our defense if anyone makes a stink."
Then again, I don't give Google an inch more than is appropriate and that only grudgingly. Currently growing other options, and have been for the last six years.
And I have a number of Google IPs banned in .htaccess for bad behavior, and will keep them banned. There's "free" and there's "free ride", but nobody gets a free ride on me unless I get something back... and recently that hasn't been working, regardless of all the "hope and change", with me hoping and G making all the changes. (sigh)
I'm late to this thread, and there are plenty of people 1000x smarter than me. However, who actually has the audacity to think they can block Google from seeing their site? People just pointed out two in this thread. Based on my personal experience, I would assume they know about your site whether you realize it or not. Wouldn't the conclusion of this thread be something like: don't assume that your methods of denying a Google snoop are going to work? You don't know what they know; it's safer to assume they know everything. After all, they are ultimately out to get the bad guys, and if you're blocking them, isn't that a smoke signal? Whaddya got to hide from us? The brains of hundreds of Google engineers and employees on a crusade to keep the "bad guys" from cashing in outweigh webmasters trying to figure out what's behind that concrete wall. It's a false sense of security, that's all I'm saying, even with my modest intelligence.
The problem is that too many websites are coded before they have been fully designed. Additionally, URL structure and URL-space fail to be included as a core part of that design process.
Too often, sites are designed by thinking more about the files hosted on the server and less about the URL requests that will be made to fetch content. That's the wrong approach.
If you know the URL space that the pages and files comprising your site occupy, it's a simple job to code the button to show only where it should. Base the rules on the requested URL/requested URL format.
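Rules based on the requested URL format can be as simple as an ordered pattern table where the first match wins and anything unmatched gets no button. The patterns and the "any query string is a variant" policy below describe a hypothetical site's URL space, not a recommendation for every site.

```python
import re

# Hypothetical URL-space rules, checked in order; the first match decides.
RULES = [
    (re.compile(r"^/search(/|$)"), False),                 # site-search results
    (re.compile(r"^/(cart|account|admin)(/|$)"), False),   # private areas
    (re.compile(r"^/articles/[a-z0-9-]+$"), True),         # canonical articles
    (re.compile(r"^/products/\d+/[a-z0-9-]+$"), True),     # canonical products
]

def button_allowed(path: str, query: str = "") -> bool:
    """Decide from the requested URL format whether to show the +1 button."""
    if query:
        return False  # hypothetical policy: any query string marks a variant
    for pattern, allowed in RULES:
        if pattern.match(path):
            return allowed
    return False  # default-deny: unknown URL formats get no button
```

Defaulting to "no button" means a URL format nobody planned for never ends up inviting Google past robots.txt.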