Forum Moderators: Robert Charlton & goodroi

How to structure URLs with different sort lists of the same products?

olly

12:28 am on Jun 9, 2009 (gmt 0)

10+ Year Member



Hi everyone

I wanted to put a situation forward and get a feel for what people think should be done to promote best possible rankings in a particular situation.

We have a site advertising hundreds of different item listings.

Our URLs are structured as follows using URL rewriting:
abc.com/categoryx/blue (this will default to page 1)
abc.com/categoryx/blue/2 (page 2)
abc.com/categoryx/blue/3 (page 3)

We have also introduced sorting on the above:
abc.com/categoryx/blue/1/pricing-high-low
abc.com/categoryx/blue/2/pricing-high-low (page 2)
Each of the above produces a listing of widgets with the first 100 or so words of description/content on each widget.
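The rewrite mapping described above can be sketched roughly as follows. This is a minimal illustration only, assuming the hypothetical URL shape from this thread (`categoryx`, `blue`, `pricing-high-low` are just the example values, not a real site's scheme):

```python
import re

# One pattern for all four URL shapes: /category/filter[/page][/sort]
URL_PATTERN = re.compile(
    r"^/(?P<category>[^/]+)"
    r"/(?P<filter>[^/]+?)"
    r"(?:/(?P<page>\d+))?"          # page defaults to 1 when omitted
    r"(?:/(?P<sort>[a-z][a-z-]*))?$"  # optional sort, e.g. pricing-high-low
)

def parse_listing_url(path):
    """Map a rewritten URL back to its underlying query parameters."""
    m = URL_PATTERN.match(path)
    if not m:
        return None
    return {
        "category": m.group("category"),
        "filter": m.group("filter"),
        "page": int(m.group("page") or 1),
        "sort": m.group("sort"),  # None means the default sort order
    }
```

So `/categoryx/blue/2/pricing-high-low` and `/categoryx/blue` both resolve to the same category and filter, differing only in page and sort, which is exactly why the rendered listings end up so similar.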

At any rate, the URLs above will have very similar content - not quite duplicated, but not distinct enough to constitute unique content. The same widget would be displayed on many of the different URLs above. There are lots of competitor websites advertising the same products with similar text as well.

We rank in the top 5 for all the major terms we compete for, i.e. "abc", in a highly competitive field. But for secondary terms such as "categoryx blue" we do not rank well at all. The page we would like to rank, and the term we want it to rank for, is:
abc.com/categoryx/blue (on a search for "categoryx blue")
Page 2 onwards never needs to be shown as a search result.

My question is: is it a mistake to have sorted pages of similar (but not identical) content on different URLs?
Should it rather be structured as:
(i) abc.com/categoryx?pg=2&type=blue&sort=high-low
or:
(ii) abc.com/categoryx-blue-2-high-low

With explicit GET parameters I thought Google might give more "power" to that page, and also see us as having fewer "thin" or non-unique content pages. This goes against having "clean" URLs, but would it perhaps give us fewer, more concentrated pages of content in Google's eyes?

The second alternative introduces fewer slashes, which might also be a factor? I have seen huge sites like TripAdvisor minimising their use of slashes.

Any thoughts or suggestions would be greatly appreciated. By the way, I did link internally to the paged lists with nofollow, but found this gave bad results.

tedster

12:48 am on Jun 9, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello olly and welcome to the forums.

Both (all three) approaches still create different URLs - even the query string counts as part of the URL. And using fewer slashes is not a ranking factor.

I would just make sure that sorted pages are not indexed at all - robots.txt works, as does meta robots noindex on the sorted content pages. Then write the URLs for sorts whatever way is easiest for you to maintain.
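The meta robots approach here boils down to a one-line decision per page. A minimal sketch, assuming the page template knows whether a sort was requested (the sort names and the noindex,follow choice are assumptions for illustration, not tedster's exact recipe):

```python
def robots_meta_tag(sort=None):
    """Return the meta robots tag to emit for a listing page.

    Sorted views carry noindex so only the default (unsorted) listing
    competes in the index; "follow" still lets the spider use the links.
    """
    if sort is not None:
        # Sorted duplicate of the default listing: keep it out of the index.
        return '<meta name="robots" content="noindex,follow">'
    # Default listing: no tag needed (index,follow is the default behaviour).
    return ""
```

The template then drops `robots_meta_tag(sort)` into the page head, and the unsorted page 1 stays the only indexable version.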

olly

1:08 am on Jun 9, 2009 (gmt 0)

10+ Year Member



Hi tedster

Thanks for the swift reply! With 24k+ posts your opinion most certainly carries some serious weight.

In your opinion, do you think using meta robots noindex will improve the ranking of the desired (indexed) pages? Is it possible that Google already disregards them, so there would be no benefit in using the tags?

On a side note, I often wonder whether something like this could fall under the category of "over tweaking"?

tedster

2:37 am on Jun 9, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, it is *possible* that Google already disregards the sort pages, but I would rather take my money into a casino than bet on it.

Even on high PR sites (8+) I have seen the introduction of sort pages into Google's index cause all kinds of trouble. So my point of view is that I know my site best and I will choose what Google does and does not have the option to include.

[edited by: tedster at 8:13 pm (utc) on June 9, 2009]

olly

12:15 pm on Jun 9, 2009 (gmt 0)

10+ Year Member



Thanks tedster

I have seen numerous reports of Google ignoring these directives. Perhaps they still cache the pages but do not count them when they calculate rankings? What are your thoughts on this?

Also, would you include a "nofollow" directive as well as the "noindex"?

Is there any advantage between robots.txt and the meta robots noindex directive?

Robert Charlton

6:53 pm on Jun 9, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Is there any advantage between robots.txt and the meta robots noindex directive?

You don't want to use robots.txt and the meta robots noindex directive simultaneously. See this current discussion for some distinctions between them and why it may appear that Google is ignoring "noindex"...

Robots.txt disallowed file shows up in SERPs & Google traffic drops
[webmasterworld.com...]

tedster

8:42 pm on Jun 9, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perhaps they may still cache pages but will not count these pages when they calculate rankings. What are your thoughts on this?

We do need to be precise with word choices here: having no cached page visible doesn't mean the page isn't indexed. Having a "noindex" robots tag means that the page is spidered (it must be in order to have the meta tag read) but its content is not included in the searchable index.

However, in a situation such as a noindex,follow meta tag - it is clear that Google must use the links on the page in their calculations, isn't it?

olly

9:34 pm on Jun 9, 2009 (gmt 0)

10+ Year Member



Wow, seriously impressed with the amount of participation in this forum :) Thanks guys.

having no cached page visible doesn't mean the page isn't indexed

I thought that if a page was indexed it was stored and indexed in the Google cache? Thus with noindex I assumed a page would be read by Google but not stored/cached. Am I misunderstanding something? Are you saying one needs to distinguish between Google's cache and their searchable index? I'm a little confused...

Robots.txt disallowed file shows up in SERPs & Google traffic drops
[webmasterworld.com...]
Thanks Robert, very interesting thread - I had not made the connection regarding the subtle distinction between robots.txt and meta directives. Can you think of a situation where this would be of practical concern, though? If Google still spiders a page (because nothing blocks it in robots.txt) that carries a noindex directive, yet does not index it, is it not as if it had never seen it at all?

Thanks again for your time on this, guys :)

aakk9999

7:27 am on Jun 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Alternatively, if the user can sort your pages dynamically by clicking on some sort criterion, then you could send the sort parameters via POST so that they are not part of your URL in the first place.

This does, however, mean that the user will not be able to bookmark a sorted page (i.e. bookmarking a sorted page will give unsorted results when it is accessed again), so you have to weigh up what you want to achieve.

Shaddows

9:55 am on Jun 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi olly

NOINDEX means it will not show in SERPs. Google still has a FULL COPY, and will treat it exactly like any other page for upstream and downstream calculations.

You can allow the page to show in SERPs but stop users accessing Google's cached version by using NOARCHIVE.

You can stop Google visiting a page using robots.txt. They will not see any robots meta tags in this case. The page MAY STILL SHOW IN SERPS as a bare URL, depending on inbound links. You will not get any ranking credit for anything on that page.

For sorted results you are (IMHO) best using NOINDEX, and you may find a use for the CANONICAL TAG (referencing the default-sort URL).
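The canonical-tag idea amounts to stripping the sort segment off the URL and pointing there. A minimal sketch, assuming the URL shape from earlier in this thread and hypothetical sort names (any real site would substitute its own list):

```python
# Hypothetical sort segments from this thread's example URLs.
SORT_SEGMENTS = ("pricing-high-low", "pricing-low-high")

def canonical_link(path, sorts=SORT_SEGMENTS):
    """Build a rel=canonical tag pointing a sorted URL at its
    default-sort counterpart; unsorted URLs point at themselves."""
    parts = path.rstrip("/").split("/")
    if parts and parts[-1] in sorts:
        parts = parts[:-1]            # drop the trailing sort segment
    target = "/".join(parts) or "/"
    return '<link rel="canonical" href="%s">' % target
```

So `/categoryx/blue/2/pricing-high-low` would declare `/categoryx/blue/2` as its canonical version, collapsing each sorted view onto its default-sort page.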

olly

9:55 am on Jun 10, 2009 (gmt 0)

10+ Year Member



Thanks aakk9999, that's not a bad idea...

So far we have 3 separate methods:
(i) sort via POST
(ii) robots.txt (prevent spidering altogether)
(iii) meta robots noindex (spider, but do not include in the index)

If we assume that we want just page 1 of the unsorted list to compete in Google, which of the above will give the best results?

olly

12:24 pm on Jun 10, 2009 (gmt 0)

10+ Year Member



After investigating some of our competitors' websites and thinking through the problem, I decided that the POST method was the best one to go for. The previous sort-page URLs now return 404.

My reasons were mainly that I don't think the sort pages should be included in Google's index, cache or any ranking calculations whatsoever. After all, their only utility is to help the user browse conveniently - they add nothing beyond the original page from a search engine's point of view.

I'm hoping this will effectively reduce the amount of duplicate/similar content in Google's store and allow the authoritative (or canonical) pages to compete without any penalty being applied (either on a sitewide-level or page-level).

aakk9999

11:41 pm on Jun 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Instead of the pages with sort parameters in the URL now returning 404, they could return a 301 redirect to page 1 of the unsorted results.

This way you are not losing visitors to the site (in case someone bookmarked previously sorted results), and on seeing the 301 Google will slowly drop the pages with sort parameters from its index. It also means that if there are any external links to the old URLs with sort parameters, you will still get that link juice.
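The 301 fallback can be sketched as a small routing check run before the 404 handler. Again this assumes the hypothetical URL shape and sort names from this thread; a real implementation would live in the site's rewrite or framework layer:

```python
# Hypothetical sort segments from this thread's example URLs.
SORT_SEGMENTS = ("pricing-high-low", "pricing-low-high")

def redirect_for_sorted(path, sorts=SORT_SEGMENTS):
    """Return (status, location) for a request: old sorted URLs get a
    301 to page 1 of the unsorted listing instead of a 404, so
    bookmarks keep working and external links keep passing credit."""
    parts = [p for p in path.strip("/").split("/") if p]
    if not (parts and parts[-1] in sorts):
        return (200, None)            # not a sorted URL: serve it normally
    category, filt = parts[0], parts[1]
    # Page 1 of the unsorted list is the canonical landing page.
    return (301, "/%s/%s" % (category, filt))
```

Note that this deliberately drops the page number as well as the sort, since olly only wants page 1 of the unsorted list competing in the index.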