
How to noindex sort pages - page.php?sortby

7:43 pm on Aug 31, 2011 (gmt 0)

New User

10+ Year Member

joined:Mar 23, 2004
posts:39
votes: 0


Trying to recover an ecommerce site from Panda, and I've found tons of dupes in G from sort pages... the sort control has 6 options, so basically I have 6 dupe pages of each category/product.

For the sort pages:

http://www.example.com/categorypage.php?sortby=price&asc


I've added:

<link rel="canonical" href="http://www.example.com/categorypage.php" /> 


to the headers but I'm wondering if I should add:

<meta name="ROBOTS" content="NOINDEX,FOLLOW">


to the sort page headers also... Will this also deindex http://www.example.com/categorypage.php itself?


Also, is there any robots.txt regex or something I can use to block everything after the ?

ie: Disallow: /*.php?*

Panda Sux!

Thx

[edited by: tedster at 8:01 pm (utc) on Aug 31, 2011]
[edit reason] switch to example.com [/edit]

3:09 am on Sept 1, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Disallow: /*.php?sortby

That's all you need to stop crawling of the sortby URLs. Rules in robots.txt are naturally treated as the start of a pattern, so the final asterisk is not needed. And if you never need any query string of any kind indexed, then Disallow: /*.php? would do the job.
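
A quick way to sanity-check which URLs a pattern blocks is to approximate Google's documented wildcard matching in a few lines of Python (a rough sketch, not Google's actual matcher; the sample URLs are hypothetical):

import re

def is_blocked(rule, path):
    # Rough approximation of Google's robots.txt matching:
    # '*' matches any run of characters, '$' anchors the end,
    # everything else is a prefix match against the path.
    pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(pattern, path) is not None

for path in ("/categorypage.php?sortby=price&asc",
             "/categorypage.php",
             "/categorypage.php?page=2"):
    print(path, is_blocked("/*.php?sortby", path))
# Only the sortby URL matches; the clean page and other
# query strings stay crawlable. With "/*.php?" the third
# URL would match as well.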

However, the noindex robots meta is also a good idea, since Google sometimes does index a URL even though they haven't crawled it.

Another step you could take is to use the parameter handling feature in Google Webmaster Tools, where you tell Google which URL parameters to ignore.

7:29 am on Sept 1, 2011 (gmt 0)

Senior Member from DE 

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 20, 2003
posts:877
votes: 4


I had the same problem and I'm using the canonical tag. It took some time until Google fixed it, but for me it seems to be the best way.

You can also use "noindex,follow", but don't use "noindex,follow" and the canonical tag at the same time.

I wouldn't use robots.txt to block URLs because it's a waste of link power.
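
As a sketch of what the canonical computation looks like server-side, assuming every query-string variant should collapse to the bare category URL (if some parameters, like pagination, must stay indexable, you'd need to whitelist them):

from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    # Drop the query string so every sort variant declares the
    # same clean URL as its canonical.
    s = urlsplit(url)
    return urlunsplit((s.scheme, s.netloc, s.path, "", ""))

print(canonical_url("http://www.example.com/categorypage.php?sortby=price&asc"))
# -> http://www.example.com/categorypage.php
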
7:43 am on Sept 1, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:6964
votes: 385


I wouldn't use robots.txt to block URLs because it's a waste of link power.

I question that statement since robots.txt has NOTHING to do with links!

But I'm willing to be educated with examples which indicate that robots.txt is injurious to link juice (power, et al.)

This is one of those don't pee on my leg and tell me it's raining kind of things.

3:55 pm on Sept 1, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


I wouldn't use robots.txt to block URLs because it's a waste of link power.

For me, it depends on how much crawling there is for those parameters. I'd rather that bots didn't even request those URLs, but I suppose at a very low crawl volume it might be OK.

5:10 pm on Sept 1, 2011 (gmt 0)

New User

5+ Year Member

joined:Sept 1, 2011
posts: 12
votes: 0


I wouldn't use robots.txt to block URLs because it's a waste of link power.

Well, if you block a page that gets external link power via robots.txt, that link power is lost. If you used a noindex,follow instead, it can be passed on. In this case, though, I'd assume you don't have that many external links pointing at a sort-by "price" URL.

I used a canonical before to tell Google it's all the same page, and once the duplicate variants were removed from the index, I blocked them with robots.txt. From my experience, stuff that was indexed and then immediately blocked via robots.txt tends to stay around in the index, somewhere deep down...
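
If you go the "noindex,follow" route instead (kept separate from the canonical tag, per the advice above), the meta tag only needs to go out on the sort variants, which also answers the earlier worry about deindexing the clean URL. A minimal sketch; the parameter names are assumed from the URLs in this thread:

from urllib.parse import urlsplit, parse_qs

SORT_PARAMS = {"sortby", "asc", "desc"}  # assumed parameter names

def robots_meta(url):
    # Emit NOINDEX,FOLLOW only when a sort parameter is present,
    # so the clean category URL itself is never deindexed.
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if SORT_PARAMS & set(params):
        return '<meta name="robots" content="noindex,follow">'
    return ""

print(robots_meta("http://www.example.com/categorypage.php?sortby=price&asc"))
# -> <meta name="robots" content="noindex,follow">
print(robots_meta("http://www.example.com/categorypage.php"))
# -> (empty: the clean URL carries no noindex)
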
6:23 am on Sept 2, 2011 (gmt 0)

Senior Member from DE 

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 20, 2003
posts:877
votes: 4


I question that statement since robots.txt has NOTHING to do with links!


Of course it has to do with links, because it creates dead ends in the linking scheme and prevents link power from flowing around (in contrast to "noindex,follow").

3:51 pm on Sept 2, 2011 (gmt 0)

New User

5+ Year Member

joined:June 21, 2011
posts:11
votes: 0


I've found Google's upgraded parameter handling tool in WMT works pretty well for this type of thing, although URLs do still linger.

4:49 pm on Sept 2, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12906
votes: 194


Eh, I pretty much block everything with a question mark in the URL in robots.txt, and the sites are doing quite well. FWIW.
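
For reference, the blanket rule that approach implies would presumably be:

Disallow: /*?

which matches any URL that contains a query string.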