tedster

msg:4357450 | 3:09 am on Sep 1, 2011 (gmt 0) |
Disallow: /*.php?sortby
That's all you need to stop crawling of the sortby URLs. Rules in robots.txt are treated as prefix patterns, so the final asterisk is not needed. And if you never need to see a query string of any kind indexed, then Disallow: /*.php? would do the job. However, the noindex robots meta tag is also a good idea, since Google sometimes indexes a URL even though it hasn't crawled it. Another step you could take is to use the feature in Webmaster Tools where you tell Google which parameters to ignore.
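To see why the trailing asterisk is redundant, here is a minimal sketch of wildcard matching in Python. It assumes Google-style rules (pattern anchored at the start of the path, `*` matching any run of characters, an optional trailing `$` anchoring the end). The function name `rule_matches` and the sample paths are hypothetical and only there for illustration; this is not how any crawler is actually implemented.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path plus query.

    Assumption: Google-style matching. The pattern is anchored at the start
    of the path, '*' matches any run of characters, and a trailing '$'
    anchors the end. Without '$' the rule behaves as a prefix match.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape the literal pieces and put '.*' wherever the rule had '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored_end:
        regex += "$"
    # re.match anchors at the start only, which gives the prefix behaviour.
    return re.match(regex, path) is not None


# Hypothetical sample paths.
sort_url = "/products.php?sortby=price"
print(rule_matches("/*.php?sortby", sort_url))    # rule without trailing '*'
print(rule_matches("/*.php?sortby*", sort_url))   # same rule with trailing '*'
print(rule_matches("/*.php?", "/products.php?page=2"))
print(rule_matches("/*.php?", "/products.php"))
```

Under these assumptions both forms of the sortby rule match the same URLs. Note that the rule only catches sortby when it is the first parameter after the question mark; a URL like /products.php?page=2&sortby=price would need the broader /*.php? rule or an extra pattern.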
|
doc_z

msg:4357512 | 7:29 am on Sep 1, 2011 (gmt 0) |
I had the same problem and I'm using the canonical tag. It took some time until Google fixed it, but for me it seems to be the best way. You can also use "noindex,follow", but don't use "noindex,follow" and the canonical tag at the same time. I wouldn't use robots.txt to block URLs because it's a waste of link power.
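As a rough sketch of the canonical approach, the sorted variants can all point at the same URL with the sort parameter stripped. The helper names `canonical_url` and `canonical_link_tag`, the example.com URLs, and the choice to ignore only `sortby` are assumptions for illustration, not something taken from a particular site.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str, ignored_params=("sortby",)) -> str:
    """Drop presentation-only parameters (here: sortby) from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored_params
    ]
    # Rebuild the URL without the ignored parameters and without a fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the page's <head>."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


# Hypothetical URLs.
print(canonical_link_tag("https://www.example.com/products.php?sortby=price"))
print(canonical_url("https://www.example.com/products.php?cat=5&sortby=price"))
```

With this setup every sort variant carries only the canonical tag, in line with the advice above not to combine it with "noindex,follow" on the same page.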
|
tangor

msg:4357518 | 7:43 am on Sep 1, 2011 (gmt 0) |
| I wouldn't use robots.txt to block URLs because it's a waste of link power. |
| I question that statement since robots.txt has NOTHING to do with links! But I'm willing to be educated with exemplars which indicate that robots.txt is injurious to link juice (power, et al.). This is one of those "don't pee on my leg and tell me it's raining" kind of things.
|
tedster

msg:4357646 | 3:55 pm on Sep 1, 2011 (gmt 0) |
| I wouldn't use robots.txt to block URLs because it's a waste of link power. |
| For me, it depends on how much crawling there is for those parameters. I'd rather that bots didn't even request those URLs, but I suppose at a very low level it might be OK.
|
schuon

msg:4357673 | 5:10 pm on Sep 1, 2011 (gmt 0) |
| I wouldn't use robots.txt to block URLs because it's a waste of link power. |
| Well, if you use robots.txt to block a page that gets external link power, that link power is lost. If you use noindex,follow instead, it can be passed on. In this case, though, I'd assume you don't have that many external links to the sort-by-"price" URLs. I used a canonical first to tell Google it's all the same page, and once the duplicate variants were removed from the index, I blocked them with robots.txt. In my experience, stuff that was indexed and then immediately blocked via robots.txt tends to stay around in the index, somewhere deep down...
|
doc_z

msg:4357872 | 6:23 am on Sep 2, 2011 (gmt 0) |
| I question that statement since robots.txt has NOTHING to do with links! |
| Of course it has to do with links, because it generates dead ends in the linking scheme and prevents link power from flowing around (in contrast to "noindex,follow").
|
jerednel

msg:4358003 | 3:51 pm on Sep 2, 2011 (gmt 0) |
I've found Google's upgraded parameter handling tool in WMT works pretty well for this type of thing, although URLs still linger.
|
netmeg

msg:4358022 | 4:49 pm on Sep 2, 2011 (gmt 0) |
Eh, I pretty much block everything with a question mark in the URL in robots.txt, and the sites are doing quite well. FWIW.
|
|