netmeg

msg:4328525 | 11:51 pm on Jun 20, 2011 (gmt 0) |
I took a client site to Magento Enterprise last year. I put in a bunch of redirects for old URLs (several thousand at least), and I also blocked off tons of stuff that didn't need to be in the search engines, like the sort and paging URLs and whatnot. And of course, I blocked the search results pages. Have you implemented the canonical URL plugin? It was written for the Community version, but it works on Enterprise.

I would take a look at what those errors actually are, and find a way to make sure the store isn't generating them anymore. If you block them with robots.txt, that's not going to make them fall out of the index; it'll just keep Google from going in to see them at all (and presumably you won't get any more). If you want them out of the index, you're going to have to remove them manually one by one (ouch) or figure out a way to serve a 404 or 410 status code for them.

I'm a little curious about what these pages really are, though; I don't think we have anything similar.
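For the 404/410 route, here is a minimal sketch, assuming Apache with mod_rewrite and Magento's stock .htaccess. The two paths are hypothetical placeholders, not URLs from this thread. The [G] flag makes Apache answer "410 Gone". Keep in mind that Google can only see that status if the same URLs are not also disallowed in robots.txt, and that listing URLs one by one like this only makes sense for a handful of them.

```apache
# Hypothetical dead URLs: answer "410 Gone" for each one individually.
# Place these after "RewriteEngine on" in Magento's .htaccess and before
# the rules that hand requests over to index.php.
RewriteRule ^old-category/discontinued-widget\.html$ - [G]
RewriteRule ^old-category/retired-gadget\.html$ - [G]
```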
|
emikey

msg:4328578 | 3:19 am on Jun 21, 2011 (gmt 0) |
Yikes. I don't like the idea of doing 32k redirects. Do you know of any way to do a bulk upload of redirects in Magento?
|
netmeg

msg:4328585 | 3:25 am on Jun 21, 2011 (gmt 0) |
The way we did it was to bypass Magento altogether and put them in .htaccess. The previous store was in a subdirectory, so we just installed a separate Apache in that directory and put an .htaccess file in it *just* for serving redirects. Seemed to work okay. If your pages aren't serving any particular purpose, or aren't helpful to your users, there's not much point redirecting them. Just get 'em out of there.
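A rough sketch of that kind of setup, assuming the old store lived under a hypothetical /oldstore/ directory and that mod_rewrite is available. The page names and the example.com targets are made up for illustration. Because this .htaccess defines its own rewrite rules, Apache does not apply the parent directory's (Magento's) rewrite rules to requests under /oldstore/ unless RewriteOptions Inherit is set.

```apache
# /oldstore/.htaccess (hypothetical location): serves redirects only.
RewriteEngine On

# One 301 per old URL that has a genuinely equivalent new page (hypothetical URLs).
RewriteRule ^blue-widget\.html$ https://www.example.com/widgets/blue-widget.html [R=301,L]
RewriteRule ^red-widget\.html$ https://www.example.com/widgets/red-widget.html [R=301,L]

# Every other old URL has no useful equivalent: "410 Gone" instead of a redirect.
RewriteRule ^ - [G]
```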
|
g1smd

msg:4328635 | 7:51 am on Jun 21, 2011 (gmt 0) |
I'm currently working on a site (not Magento, but a similar problem) that has exposed 100 000 URLs that should never have been indexed. Since the URLs all had a common pattern, the solution was to add a rule to .htaccess to send "410 Gone" for all of those. You could also set up redirects for URLs that actually have traffic, but only redirect to a new page if its content closely matches the content of the old page. In particular, do not "funnel" hundreds of URLs to one destination, and especially do not redirect users to the root home page. If you're tempted to do that, serve "410 Gone" instead and make sure the ErrorDocument has clickable links to the home page and to the major category pages.
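A minimal sketch of the single-rule approach, assuming Apache with mod_rewrite. The tag/ and sort-by/ prefixes and the gone.html file are hypothetical stand-ins for whatever common pattern and error page a given site has. An ErrorDocument that points to a local path keeps the 410 status while sending that page's content, so visitors still get links onward.

```apache
RewriteEngine On

# Hypothetical common pattern: every unwanted URL starts with tag/ or sort-by/.
# One rule answers "410 Gone" for all of them, however many there are.
RewriteRule ^(tag|sort-by)/ - [G]

# Body sent with every 410: a hypothetical static page that links to the
# home page and the major category pages.
ErrorDocument 410 /gone.html
```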
|
emikey

msg:4329031 | 10:22 pm on Jun 21, 2011 (gmt 0) |
Correct me if I'm wrong, but doesn't adding a large number of rules to your .htaccess slow down your server? This is great advice, and I really appreciate it!
|
incrediBILL

msg:4329034 | 10:28 pm on Jun 21, 2011 (gmt 0) |
| Since the URLs all had a common pattern, the solution was to add a rule to .htaccess to send "410 Gone" for all of those. |
I had a similar situation and put NOINDEX in those pages so Googlebot would stop trying to hit them and WMT (Webmaster Tools) wouldn't be cluttered with the junk. It took a couple of months to resolve, but Googlebot came around and it's all good now.
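If editing the page templates isn't convenient, the same NOINDEX signal can be sent as an HTTP header instead of a `<meta name="robots" content="noindex">` tag in the page head. A minimal sketch, assuming Apache 2.4 (for `<If>`) with mod_headers loaded; the dir parameter is a hypothetical example of a sort URL. As with the meta tag, Googlebot only sees the header if those URLs are not disallowed in robots.txt.

```apache
# Assumes Apache 2.4 (<If> blocks) and mod_headers.
# Sends the HTTP-header form of a robots "noindex" tag for any request whose
# query string contains a hypothetical "dir" sort parameter.
<If "%{QUERY_STRING} =~ /(^|&)dir=/">
    Header set X-Robots-Tag "noindex"
</If>
```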
|
g1smd

msg:4329064 | 11:22 pm on Jun 21, 2011 (gmt 0) |
| Correct me if I'm wrong, but doesn't adding a large number of rules to your .htaccess slow down your server? This is great advice, and I really appreciate it! |
Since the URLs all had a common pattern, the solution was to add a (one single) rule to .htaccess to send "410 Gone" for all of those.
|
emikey

msg:4330067 | 5:39 pm on Jun 23, 2011 (gmt 0) |
Can you send me an example of the .htaccess rule I could use?
|
g1smd

msg:4330125 | 7:21 pm on Jun 23, 2011 (gmt 0) |
It depends on the URLs themselves:
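RewriteEngine On
# Send "410 Gone" for / or /index.php when the query string contains a pid= parameter
RewriteCond %{QUERY_STRING} (^|&)pid=([^&]+)(&|$)
RewriteRule ^(index\.php)?$ - [G]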
|