
How to noindex sort pages - page.php?sortby

7:43 pm on Aug 31, 2011 (gmt 0)

New User

10+ Year Member

joined:Mar 23, 2004
posts:39
votes: 0


Trying to recover an ecommerce site from Panda, and I've found tons of dupes in G from sort pages... the sort control has 6 options, so basically I have 6 dupe pages of each category/product.

For the sort pages:

http://www.example.com/categorypage.php?sortby=price&asc


I've added:

<link rel="canonical" href="http://www.example.com/categorypage.php" /> 


to the headers but I'm wondering if I should add:

<meta name="ROBOTS" content="NOINDEX,FOLLOW">


to the sort page headers also... Will this also deindex http://www.example.com/categorypage.php itself?


Also, is there any robots.txt regex or something I can use to block everything after the ?

ie: Disallow: /*.php?*

Panda Sux!

Thx

[edited by: tedster at 8:01 pm (utc) on Aug 31, 2011]
[edit reason] switch to example.com [/edit]

3:09 am on Sept 1, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Disallow: /*.php?sortby

That's all you need to stop crawling of the sortby URLs. Rules in robots.txt are naturally treated as the start of a pattern, so the final asterisk is not needed. And if you never need any query string of any kind indexed, then Disallow: /*.php? would do the job.
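
A quick way to sanity-check which URLs a pattern blocks is to approximate Google's documented wildcard matching in a few lines of Python (a rough sketch, not Google's actual matcher; the sample URLs are hypothetical):

import re

def is_blocked(rule, path):
    # Rough approximation of Google's robots.txt matching:
    # '*' matches any run of characters, '$' anchors the end,
    # everything else is a prefix match against the path.
    pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(pattern, path) is not None

for path in ("/categorypage.php?sortby=price&asc",
             "/categorypage.php",
             "/categorypage.php?page=2"):
    print(path, is_blocked("/*.php?sortby", path))
# Only the sortby URL matches; the clean page and other
# query strings stay crawlable. With "/*.php?" the third
# URL would match as well.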

However, the noindex robots meta is also a good idea, since Google sometimes does index a URL even though they haven't crawled it.

Another step you could take is to use the parameter handling feature in Google Webmaster Tools, where you tell Google which URL parameters to ignore.

7:29 am on Sept 1, 2011 (gmt 0)

Senior Member from DE 

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 20, 2003
posts:877
votes: 4


I had the same problem and I'm using the canonical tag. It took some time until Google fixed it, but for me it seems to be the best way.

You can also use "noindex,follow", but don't use "noindex,follow" and the canonical tag at the same time.

I wouldn't use robots.txt to block URLs because it's a waste of link power.
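
As a sketch of what the canonical computation looks like server-side, assuming every query-string variant should collapse to the bare category URL (if some parameters, like pagination, must stay indexable, you'd need to whitelist them):

from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    # Drop the query string so every sort variant declares the
    # same clean URL as its canonical.
    s = urlsplit(url)
    return urlunsplit((s.scheme, s.netloc, s.path, "", ""))

print(canonical_url("http://www.example.com/categorypage.php?sortby=price&asc"))
# -> http://www.example.com/categorypage.php
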
7:43 am on Sept 1, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:6964
votes: 385


I wouldn't use robots.txt to block URLs because it's a waste of link power.

I question that statement since robots.txt has NOTHING to do with links!

But I'm willing to be educated with examples which indicate that robots.txt is injurious to link juice (power, et al.)

This is one of those don't pee on my leg and tell me it's raining kind of things.

3:55 pm on Sept 1, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


I wouldn't use robots.txt to block URLs because it's a waste of link power.

For me, it depends on how much crawling there is for those parameters. I'd rather that bots didn't even request those URLs, but I suppose at a very low crawl volume it might be OK.

5:10 pm on Sept 1, 2011 (gmt 0)

New User

5+ Year Member

joined:Sept 1, 2011
posts: 12
votes: 0


I wouldn't use robots.txt to block URLs because it's a waste of link power.

Well, if you block a page that gets external link power via robots.txt, that link power is lost. If you used a noindex,follow instead, it can be passed on. In this case, though, I'd assume you don't have that many external links pointing at a sort-by "price" URL.

I used a canonical before to tell Google it's all the same page, and once the duplicate variants were removed from the index, I blocked them with robots.txt. From my experience, stuff that was indexed and then immediately blocked via robots.txt tends to stay around in the index, somewhere deep down...
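
If you go the "noindex,follow" route instead (kept separate from the canonical tag, per the advice above), the meta tag only needs to go out on the sort variants, which also answers the earlier worry about deindexing the clean URL. A minimal sketch; the parameter names are assumed from the URLs in this thread:

from urllib.parse import urlsplit, parse_qs

SORT_PARAMS = {"sortby", "asc", "desc"}  # assumed parameter names

def robots_meta(url):
    # Emit NOINDEX,FOLLOW only when a sort parameter is present,
    # so the clean category URL itself is never deindexed.
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if SORT_PARAMS & set(params):
        return '<meta name="robots" content="noindex,follow">'
    return ""

print(robots_meta("http://www.example.com/categorypage.php?sortby=price&asc"))
# -> <meta name="robots" content="noindex,follow">
print(robots_meta("http://www.example.com/categorypage.php"))
# -> (empty: the clean URL carries no noindex)
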
6:23 am on Sept 2, 2011 (gmt 0)

Senior Member from DE 

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 20, 2003
posts:877
votes: 4


I question that statement since robots.txt has NOTHING to do with links!


Of course it has to do with links, because it creates dead ends in the linking scheme and prevents link power from flowing around (in contrast to "noindex,follow").

3:51 pm on Sept 2, 2011 (gmt 0)

New User

5+ Year Member

joined:June 21, 2011
posts:11
votes: 0


I've found Google's upgraded parameter handling tool in WMT works pretty well for this type of thing, although URLs do still linger.

4:49 pm on Sept 2, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12906
votes: 194


Eh, I pretty much block everything with a question mark in the URL in robots.txt, and the sites are doing quite well. FWIW.
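
For reference, the blanket rule that approach implies would presumably be:

Disallow: /*?

which matches any URL that contains a query string.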