
Google SEO News and Discussion Forum

Rel Canonical or URL Parameter Handling or Both?
McMohan - msg:4555603 - 8:33 am on Mar 16, 2013 (gmt 0)

Here is the scenario -

An ecommerce site with thousands of "Sort by" pages (by Price, Type, etc.) and pagination pages indexed. There is no rel canonical tag in place, but URL Parameters in WMT was configured, with "No URLs" chosen for "Which URLs with this parameter should Googlebot crawl?"

I thought Google would not crawl and index these pages because of this setting, but it does. Three possibilities come to mind -

1. Add the rel canonical tag and disable URL Parameters.
2. Add the rel canonical tag and keep URL Parameters.
3. Add the rel canonical tag, keep URL Parameters, and change the setting to "Let Googlebot decide" for "Which URLs with this parameter should Googlebot crawl?"

Is there any other line of attack better suited? Thank you
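
For reference, here is a minimal sketch of how the canonical href for such pages might be computed, so each "Sort by" variant points back at the plain category URL. The parameter names ("sort", "order", "view") and the example.com URLs are hypothetical:

```typescript
// Hypothetical sketch: derive the canonical URL for a category page by
// stripping presentation-only parameters, keeping those that define content.
const NON_CANONICAL_PARAMS = new Set(["sort", "order", "view"]); // assumed names

function canonicalHref(pageUrl: string): string {
  const url = new URL(pageUrl);
  for (const param of [...url.searchParams.keys()]) {
    if (NON_CANONICAL_PARAMS.has(param)) url.searchParams.delete(param);
  }
  return url.toString();
}

// Emitted into <head> as: <link rel="canonical" href="...">
console.log(canonicalHref("https://example.com/widgets?page=2&sort=price"));
// -> https://example.com/widgets?page=2
```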

 

seoskunk - msg:4555700 - 9:11 pm on Mar 16, 2013 (gmt 0)

Let Googlebot decide - surely they can't penalise for that!

Andy Langton - msg:4555711 - 9:51 pm on Mar 16, 2013 (gmt 0)

Personally, I dislike the canonical attribute, although I dislike GWT parameter handling even more.

The main reason is that both are regarded as "hints" as to how Google should interpret your site; they are not absolute directives that will be obeyed. By far the most common reason for Google to ignore your "hint" is that the content at the two URLs is different when Google visits (a very common occurrence). They are not going to drop a URL or canonicalise it if they think they might discover new content there.

For sorting parameters this happens pretty much all the time, as it does with pagination. This is why GWT parameter handling isn't doing what you expect. To Google, "no URLs" means "no new content here"; if what they see looks like new content, they will carry on crawling and indexing.

Personally, I regard "sort by" and similar filtering on ecommerce sites as user interface features that should not create new URLs. I strongly favour javascript/ajax implementations for such things, rather than readable hyperlinks that imply new content. The canonical attribute is the poor second solution to this problem, which will work some of the time. Parameter handling is hit and miss and is not a real solution.
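
To make that concrete, a minimal sketch of the kind of client-side sort I mean - product cards carrying a (hypothetical) data-price attribute are re-ordered in place, so no new, crawlable URL is ever created:

```typescript
// Hypothetical sketch: "sort by price" as pure UI. Re-ordering existing DOM
// nodes leaves the address bar, and the set of crawlable URLs, untouched.
function sortByPrice(ascending: boolean): void {
  const list = document.querySelector("#product-list"); // assumed container id
  if (!list) return;
  const cards = Array.from(list.children) as HTMLElement[];
  cards.sort((a, b) => {
    const priceA = Number(a.dataset.price ?? 0);
    const priceB = Number(b.dataset.price ?? 0);
    return ascending ? priceA - priceB : priceB - priceA;
  });
  // appendChild moves each existing node to the end, applying the new order.
  for (const card of cards) list.appendChild(card);
}
```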

One side note is that the parameter handling feature is described in a very worrisome way by Google. They talk about "crawling" and "do not crawl". Not crawling implies not retrieving the content at all, rather than mapping a URL to a canonical, which is an indexing task. I assume this is just linguistic sloppiness on Google's part, although I have no test data to prove one way or another what the parameter handling option actually does when it works.

@seoskunk there's no "penalty" for duplicate content. You're just serving unnecessary content to Google and leaving it to sort through and make decisions - essentially handing over control of how your site is crawled and indexed to a third party, based on their particular criteria at the time. Fixing such problems has nothing to do with penalties, and everything to do with controlling your site's destiny. It's old-school tech SEO.

McMohan - msg:4555994 - 7:20 am on Mar 18, 2013 (gmt 0)

@Andy, thanks for your excellent post. Since there is no way to delete URL Parameter settings, I decided to go with "Let Googlebot decide". Between URL Parameters and rel canonical, we have given Googlebot the cues to decide how to treat these pages, and I don't see what more we can do to help.

TheOptimizationIdiot - msg:4555997 - 7:49 am on Mar 18, 2013 (gmt 0)

Andy Langton wrote:
I strongly favour javascript/ajax implementations for such things, rather than readable hyperlinks that imply new content. The canonical attribute is the poor second solution to this problem, which will work some of the time. Parameter handling is hit and miss and is not a real solution.

Exactly why I usually "totally cover the bases", which means: set WMT parameter handling to disregard the parameter, canonicalize anything sorted or duplicated, and also noindex those same pages.
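
As a sketch of what "covering the bases" looks like in the page itself (the URLs here are placeholders), a sorted or duplicate variant would carry both tags:

```typescript
// Hypothetical sketch: emit both a canonical pointing at the main URL and a
// robots noindex for a sorted/duplicate variant, per the approach above.
function headTagsForSortedPage(canonicalUrl: string): string {
  return [
    `<link rel="canonical" href="${canonicalUrl}">`,
    `<meta name="robots" content="noindex, follow">`,
  ].join("\n");
}

console.log(headTagsForSortedPage("https://example.com/widgets"));
```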

I like the JavaScript approach too. Something I don't usually highlight in posts, but do, is only include the canonical URLs in xml-sitemaps, which is Google-specific as far as I know. Google has stated that when a URL in an xml-sitemap is duplicated by a same-site URL that is not in the sitemap, they try to treat the sitemap URL as "more important", i.e. the canonical.
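
A minimal sketch of that sitemap policy - only canonical URLs go in, so no sorted or duplicate variant ever appears (the example.com URLs are placeholders):

```typescript
// Hypothetical sketch: build the xml-sitemap from the canonical URL list only.
function buildSitemap(canonicalUrls: string[]): string {
  const entries = canonicalUrls
    .map((loc) => `  <url><loc>${loc}</loc></url>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${entries}\n</urlset>`;
}

console.log(buildSitemap([
  "https://example.com/widgets",
  "https://example.com/gadgets",
]));
```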

Andy Langton wrote:
There's no "penalty" for duplicate content. You're just serving unnecessary content to Google and leaving it to sort through and make decisions - essentially handing over control of how your site is crawled and indexed to a third party, based on their particular criteria at the time. Fixing such problems has nothing to do with penalties, and everything to do with controlling your site's destiny. It's old-school tech SEO.

Absolutely. They "group" or "cluster" duplicate URLs together and then try to figure out which one to give the weight to and show, so you're never really penalized, but letting them figure it out on their own can have unexpected results.

Duplication can also "stall out" crawling, because multiple URLs with the same content take crawl cycles away from other URLs, but as far as onsite duplicate content goes there is no penalty.

lucy24 - msg:4556003 - 8:35 am on Mar 18, 2013 (gmt 0)

TheOptimizationIdiot wrote:
only include the canonical URLs in xml-sitemaps, which is Google-specific as far as I know

I hope you didn't mean that only g### looks at your xml sitemap, because that's definitely not the case :) Everyone with pretensions to robotitude asks for it.

TheOptimizationIdiot - msg:4556004 - 8:38 am on Mar 18, 2013 (gmt 0)

Huh?

Not only Google uses xml sitemaps, but Google does specifically state (in a YouTube video) that if a URL is in an xml sitemap and a duplicate of it is not, they take the URL in the xml sitemap as an indication of the canonical. That's the distinction I was making; I have no clue whether Bing, Yahoo! or any other search engine treats an xml-sitemap mention as an indication of the canonical when the duplicate is not present.

As far as Google and xml sitemaps go, I have recently (within the last three months) had pages mentioned nowhere online except an xml sitemap not only rank, but rank well. So their bot does spider, and their algo does "look at and evaluate", URLs mentioned exclusively within xml sitemaps. I haven't found evidence that any other search engine does that yet.

lucy24 - msg:4556011 - 9:51 am on Mar 18, 2013 (gmt 0)

Too many "only" and "specific" in the same sentence. It ends up sounding as if you meant the opposite of what you intended.

Definite sense of deja vu here.
