
Canonical Question - About multiple query strings with similar content

   
2:21 pm on Mar 5, 2013 (gmt 0)

WebmasterWorld Administrator ianturner is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I'm using a page that has a parameter which can take any numerical value.

Now I only want this page to appear once in Google - as the parameter is not going to change the content or meaning of the content, just some of the values.

Is the canonical tag going to be the way to go on this one - or is there some other way of dealing with this irrelevant parameter issue as far as Google is concerned?

I have my own ideas, but I really want fresh input from others to make sure my thinking is along the right lines.
9:16 pm on Mar 5, 2013 (gmt 0)



I'd personally "cover all the bases" with:

1.) Setting WMT to ignore the parameter(s).
2.) Noindex the pages with the parameter(s).
3.) Put a canonical on the pages with parameters pointing to the main URL.
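
For 2 and 3, that would be something like this in the <head> of the parameterised version (just a rough sketch - the URL and parameter name here are placeholders, not yours):

<!-- on example.com/page.php?num=3 (the parameterised variant) -->
<meta name="robots" content="noindex">
<link rel="canonical" href="http://www.example.com/page.php">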

Basically jump up and down yelling:
"It's Not Here for You Google; Go Get the One Without Parameters!"
10:06 pm on Mar 5, 2013 (gmt 0)

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I would consider sending the request (if the parameter is present) via a redirect to the canonical URL, with the redirect also setting a cookie recording the numeric value. If the cookie is present, change the output of the page. No possibility of dupe content.
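
On Apache, roughly something like this would do it (a quick sketch only - "num" is a stand-in for the real parameter name, and the CO flag sets the cookie as part of the 301):

# remember the numeric value in a cookie, then 301 to the same URL with the query string stripped
RewriteCond %{QUERY_STRING} (?:^|&)num=([0-9]+)
RewriteRule .? http://www.example.com%{REQUEST_URI}? [CO=num:%1:.example.com,R=301,L]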
12:08 am on Mar 6, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I would suggest:
- what Andy said

OR

- what TOI said in #1 & #3 but not #2
12:42 am on Mar 6, 2013 (gmt 0)



Okay, now I'm all curious.

Why would you not noindex the pages with the additional parameters?
3:14 am on Mar 6, 2013 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Read your #1 and #2 together and you'll see there's a danger of telling Google to noindex a proper page that you actually want to see in their index.
3:49 am on Mar 6, 2013 (gmt 0)



Huh?

You set the parameters to be ignored, but in case Google doesn't ignore them, or glitches, on those same pages you also put a noindex and a rel=canonical pointing to the page without parameters.

The noindex is a backup in case the WMT setting doesn't work. It's not a "proper page" you want indexed. You want the pages without the parameters indexed, not the pages you've said to ignore the parameters on.

Here maybe I'm missing something:
example.com/this-page-could-have-parameters.php
^^^ This is the one you want indexed. See no parameters.

example.com/this-page-could-have-parameters.php?var=on-this-one-it-really-does-so-noindexing-this-one-doesnt-hurt-a-bit-its-a-backup-so-if-Google-glitches-you-dont-have-to-worry-about-a-duplicate
^^^ This one gets a noindex and rel=canonical pointing to the page without parameters.

Please, someone explain to me how that could possibly keep the page without parameters out of the index?
5:00 am on Mar 6, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



It's not "with parameter" vs. "without parameter". It's the value of the parameter. Depending on how your site is coded, URLs without the given parameter may not exist at all.

size: large
size: jumbo
size: king size

is all the same stuff. But if you don't specify a size, you won't get your coffee.
5:25 am on Mar 6, 2013 (gmt 0)



...as the parameter is not going to change the content or meaning of the content, just some of the values.

...or is there some other way of dealing with this irrelevant parameter issue as far as Google is concerned.

Sounds like the parameter is irrelevant to me since that's what's specifically stated and the value is also stated to have no bearing on the meaning of the content of the page.

So, in this case it wouldn't be at all about the value, but rather about the parameter itself. It also seems like it would be fairly easy to create and link to a page without it if it's irrelevant and there isn't one already.

Maybe I misread?
7:40 am on Mar 6, 2013 (gmt 0)

WebmasterWorld Administrator ianturner is a WebmasterWorld Top Contributor of All Time 10+ Year Member



You didn't misread - the page without the parameter just has it set to 1 by default and doesn't need it in the URL, so basic internal links are without the parameter.

However, I too have a concern with your item 2, as the pages with the parameter may well generate external inbound links, and in my view the value of those links would be lost if the pages were noindexed.
8:30 am on Mar 6, 2013 (gmt 0)



Nah, it's been tested quite a few times and noindexed pages pass link weight.

I have (and have had) thousands of pages noindexed on a site for years (they were back in the index for a while in between), and the rankings are actually better with the pages I don't need in the index, for one reason or another, noindexed.
8:43 am on Mar 6, 2013 (gmt 0)



Eric Enge: Can a NoIndex page accumulate PageRank?

Matt Cutts: A NoIndex page can accumulate PageRank, because the links are still followed outwards from a NoIndex page.

Eric Enge: So, it can accumulate and pass PageRank.

Matt Cutts: Right, and it will still accumulate PageRank, but it won't be showing in our Index. So, I wouldn't make a NoIndex page that itself is a dead end. You can make a NoIndex page that has links to lots of other pages.

[stonetemple.com...]
9:03 am on Mar 6, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



You're conflating two layers of GWT settings.

First question is: Does parameter affect page content? If no, then you tell them to ignore the parameter and take no further action.

All further complications, such as no-indexing any page that contains a given parameter at all, only kick in when you get to the second case: a parameter that does affect page content. You've already said it doesn't.

page.php?a=1&b=2&c=3
ignore c

link to page.php?a=1&b=2&c=3
= exactly the same as link to page.php?a=1&b=2

noindex page.php?a=1&b=2&c=3
= page.php?a=1&b=2 is removed from index


:: noting with interest that g### thinks I have a parameter called "newwindow" ... meaning that they've taken some linking site's "open in new window" code and applied it to a static html page, failing to notice that it's the other site's parameter, not mine ::
9:15 am on Mar 6, 2013 (gmt 0)



example.com/this-page-could-have-parameters.php?var=on-this-one-it-really-does-so-noindexing-this-one-doesnt-hurt-a-bit-its-a-backup-so-if-Google-glitches-you-dont-have-to-worry-about-a-duplicate

Using your example:
page.php?a=1&b=2&c=3
ignore c

link to page.php?a=1&b=2&c=3
= exactly the same as link to page.php?a=1&b=2
= noindexing when parameter c is included doesn't hurt a bit. It's a backup so if Google glitches you don't have to worry about a duplicate.
9:55 am on Mar 6, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Once you make the leap to extensionless URLs without parameters, and with very careful URL-format design, most of this confusion goes away.
11:59 am on Mar 6, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



noindexing when parameter c is included doesn't hurt a bit

There's a big difference between ignoring a parameter, and ignoring pages whose URL contains the parameter. You could end up doing exactly the opposite of what you intended.

Detour to raw logs suggests that they first invented "newwindow" --in search, not analytics-- way back in October, meaning I'll never know where it originally came from. Probably from one of those ### search-results pages that should never have been crawled in the first place, and now I'm stuck with their blasted parameters. Grumble.
12:07 pm on Mar 6, 2013 (gmt 0)



There's a big difference between ignoring a parameter, and ignoring pages whose URL contains the parameter.

You're not ignoring the pages, you're noindexing duplicates if for some reason Google decides to spider a page with a parameter you've told them to ignore. Links still count. PageRank still gets accumulated and passed. All you're doing is safeguarding against duplicate pages being indexed because a parameter you've said to ignore isn't actually being ignored for some reason.

There's no way it can have the opposite effect if it's implemented correctly, because the pages should not ever be accessed, and if they are then you're simply noindexing duplicates of the canonical version.

As far as your issue goes, why not just fix all the query strings with a couple lines of mod_rewrite and not worry about where they came from?

# if the original request line contained a query string...
RewriteCond %{THE_REQUEST} \?
# ...301 to the same URL-path with the query string stripped (the trailing "?")
RewriteRule .? http://www.example.com%{REQUEST_URI}? [R=301,L]

[edited by: TheOptimizationIdiot at 12:25 pm (utc) on Mar 6, 2013]

12:07 pm on Mar 6, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Nah, it's been tested quite a few times and noindex pages pass link weight.


however, this won't help the parameter-free version of the url accumulate PR.

Matt Cutts: Right, and it will still accumulate PageRank, but it won't be showing in our Index. So, I wouldn't make a NoIndex page that itself is a dead end. You can make a NoIndex page that has links to lots of other pages.


the noindexed urls (with parameters) will accumulate PR and pass PR to linked and followed urls on the noindexed pages.
since the url-with-parameters is noindexed, the link rel canonical element in that document will be ignored.
12:11 pm on Mar 6, 2013 (gmt 0)



Good golly. Ian can do whatever he wants.

They're both just "safeguards" against you saying "ignore the parameter" and it not being ignored.

If Google handles the "ignore" correctly, neither the rel=canonical nor the noindex will be seen/counted. If they don't ignore the parameter for some reason, then both will be found and have the same effect as the "ignore" should.

And where on earth do you get that a canonical is ignored on noindexed pages? Never mind, I give up.
12:36 pm on Mar 6, 2013 (gmt 0)



bold added
This is definitely an interesting question :-). Before the rel=canonical link element was announced, using noindex robots meta tags was one way that webmasters were directing us towards canonicals, so this is certainly something we know and understand. However, with the coming of the rel=canonical link element, the optimal way of specifying a canonical (apart from using a 301 redirect to the preferred URL) is to only use the rel=canonical link element.

One reason for this is that we sometimes find a non-canonical URL first. If this URL has a noindex robots meta tag, we might decide not to index anything until we crawl and index the canonical URL. Without the noindex robots meta tag (with the rel=canonical link element) we can start by indexing that URL and show it to users in search results.

http://productforums.google.com/forum/#!topic/webmasters/0sqRrolO_Ss

I would still use both, because it has exactly the desired effect and the canonical is recognized on noindexed pages, according to JohnMu quoted above.

bold added
When Google detects duplicate content, such as variations caused by URL parameters, we group the duplicate URLs into one cluster and select what we think is the "best" URL to represent the cluster in search results. We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL. Consolidating properties from duplicates into one representative URL often provides users with more accurate search results.

To improve this process, we recommend using the parameter handling tool to give Google information about how to handle URLs containing specific parameters. We'll do our best to take this information into account; however, there may be cases when the provided suggestions may do more harm than good for a site.

[support.google.com...]

They'll do their best to take the ignore into account.
Hmmm, I wonder why I would want "safeguards" (canonical & noindex) on the duplicate pages?
1:31 pm on Mar 6, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



i shudder and look for a solution based on http protocol whenever matt or john use the following or similar weasel words:
- "we might decide not to"
- "(we) select what we think is 'best'"
- "we'll do our best to take this into account"

the proper solution requires no beneficence on the part of google to get the intended url indexed with minimal loss of link equity.
respond with a cookie and a 301 to the canonical url.
there is no "might" or "think" involved.
no waiting around for G to "take things into account" (or not).
no link rel canonical required since you are there already and no GWT parameter magic since there are no parameters in the canonical url.
2:26 pm on Mar 6, 2013 (gmt 0)

WebmasterWorld Administrator ianturner is a WebmasterWorld Top Contributor of All Time 10+ Year Member



@g1smd - no it doesn't; the page already has extensionless URLs. This is a parameter that will be used to make the user's experience quicker and easier.

The only reason I'm not posting it is that it's coming out of some JavaScript generated in a div that is created using Ajax.

I may look at rewriting the code to use a POST, or at using the cookie and 301 to the canonical if that isn't possible.
3:03 pm on Mar 6, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Understood.

In that case, rel="canonical" pointing to the URL without any parameters seems the best way to go.

Make sure that any "share this" style add-ons encourage the user to share the canonical version.
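
For example (a rough sketch only - the element id and share endpoint are placeholders), the share links can read the canonical link element rather than location.href:

// use the canonical URL for sharing, falling back to the current location
var canonicalLink = document.querySelector('link[rel="canonical"]');
var shareUrl = canonicalLink ? canonicalLink.href : window.location.href;

// e.g. point a "tweet this" link at the canonical rather than the parameterised URL
document.getElementById("shareTwitter").href =
    "https://twitter.com/intent/tweet?url=" + encodeURIComponent(shareUrl);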
3:28 pm on Mar 6, 2013 (gmt 0)



If that's the case I would think you should be able to switch the JS generated in the div to just be something like: onClick="yourFunctionName(NUMBER);" or onSomeOtherEvent="yourFunctionName(NUMBER);"

Then include something like this on the page(s) from an external JS file:

var xmlhttp;
if (window.XMLHttpRequest) {
    // code for IE7+, Firefox, Chrome, Opera, Safari
    xmlhttp = new XMLHttpRequest();
} else {
    // code for IE6, IE5
    xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
}

function yourFunctionName(value_to_send) {
    xmlhttp.onreadystatechange = function() {
        if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
            // drop the server's response into the target div
            document.getElementById("yourDiv").innerHTML = xmlhttp.responseText;
        }
    };
    // send the value as POST data instead of a query-string parameter
    xmlhttp.open("POST", "ajax_test.asp", true);
    xmlhttp.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
    xmlhttp.send("yourParameter=" + encodeURIComponent(value_to_send));
}

I put it together in a hurry from these two pages, so I didn't "really think it through", but I hope it gives you some ideas or a direction anyway:

[w3schools.com...]
[w3schools.com...]
5:20 pm on Mar 6, 2013 (gmt 0)

WebmasterWorld Administrator ianturner is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I like w3schools too - one of the better resources out there.

Thanks for this - it has at least started my research into the solution.
5:42 pm on Mar 6, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Some bits of w3schools are out of date, so do tread with caution. There was a very interesting thread on the subject, here at WebmasterWorld, last Autumn sometime.
7:15 pm on Mar 6, 2013 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Some bits of w3schools are out of date, so do tread with caution.

g1smd - This might be the discussion you're thinking about, from back in Oct - Nov 2012. It contains a number of cautions about the current status of w3schools and perhaps deserves equal time now....

Best HTML Course for beginners?
http://www.webmasterworld.com/html/4511180.htm [webmasterworld.com]
7:26 pm on Mar 6, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yes, in particular the link to [w3fools.com...]
8:39 pm on Mar 6, 2013 (gmt 0)



Well, the w3fools site isn't quite accurate either. I got a little way down the page and thought they seemed a bit over the top, so I decided to check the accuracy of the information they present, and what do you know, the very first (and so far only) claim I checked:

www.w3schools.com/html/html_links.asp

w3schools says:

The name attribute specifies the name of an anchor. The name attribute is used to create a bookmark inside an HTML document.

w3fools responds:

This is misleading. Named anchors have been deprecated since HTML4 and replaced with element IDs. (Yes, that's right: you can link to any element with a href="#thing" as long as it has id="thing". Yes, it works everywhere.)

That response is, as they would say: Blatantly False.

From the w3schools site:
An anchor with an id inside an HTML document:
<a id="tips">Useful Tips Section</a>

Create a link to the "Useful Tips Section" inside the same document:
<a href="#tips">Visit the Useful Tips Section</a>

Or, create a link to the "Useful Tips Section" from another page:
<a href="http://www.w3schools.com/html_links.htm#tips">
Visit the Useful Tips Section</a>

[w3schools.com...]
http://www.w3schools.com/html/html_links.asp

I guess the Fools' rant about the w3schools site could now be as out of date as the w3schools site may have been in places for a while, which would definitely make them the Fools too. But of course this is the Internet, so if the Fools site generates traffic I guess the pot calling the kettle black, and even citing it as some "authority", is okay.

#SMH I find it tough to believe the "sticklers" here would recommend a site that's two years out of date, ranting about another site being inaccurate and not updated, when that site (the Fools) is doing exactly the same thing it was complaining about (being out of date and inaccurate). But, eh, whatever.
10:25 pm on Mar 6, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I haven't checked all the claims they make; haven't got the time. So the headline is that websites can contain errors, and sites that purport to list those errors can themselves contain more errors. Whatever you are doing, check multiple tutorial sites as well as the official W3C if code doesn't work the way you expect.