Forum Moderators: Robert Charlton & goodroi


Canonicalization Question - What to do with pages I don't want?

         

Jez123

8:31 am on Jun 20, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Due to two bad errors, my site is now listed not only with versions of the same page both with and without the trailing "/", but also with old pages that were 301 redirected to newer pages on my server.

My question is: how do I deal with it? The .htaccess is now correct (it had been overwritten) and the URL base is once again correct (the "/" had been deleted in the WordPress settings!). What do I do with all the extra pages that I don't want? There are about 70 to 100. Is it worth noting them all and removing the URLs via WMT? Someone suggested that it could be dangerous to do that. I'd like some advice please.

aakk9999

9:34 am on Jun 20, 2012 (gmt 0)


Just 301 redirect them to the correct page: add the redirects to your .htaccess (and make sure you are not internally linking to these unwanted URLs).

Andy Langton

9:40 am on Jun 20, 2012 (gmt 0)


Just to second aakk9999's advice. What you have are non-canonical URLs that have been inadvertently created, so you need to canonicalise them :)

The other method would be to use a rel=canonical tag on the pages, but a 301 is a more brute force method.

You may find that it takes a long time for the pages to disappear, since this type of content is almost always flagged as low quality by Google and thus is not revisited very frequently. You can speed things up a little by submitting a sitemap of 'bad' URLs, but it's not usually worth the effort if you can sit out the inevitable wait.

Jez123

10:23 am on Jun 20, 2012 (gmt 0)


Thanks for the replies.

How do you submit a bad URL sitemap?

I have fixed the .htaccess and the problem should not continue, but as the site has been affected by Penguin I am worried that this will slow down my recovery.

So it's not worth using Google's Remove URL tool? It would take a while, as you can only submit one URL at a time.

Andy Langton

12:55 pm on Jun 20, 2012 (gmt 0)


The Google removal tool is really for cases where there is something sensitive in the SERPs that you need removed urgently - if someone had published a credit card number, for instance.

A 'bad url sitemap' isn't anything special - you just submit a normal sitemap with the URLs you want Google to crawl - in this case, those you want Google to canonicalise. You can submit a number of sitemaps, remember.

Jez123

1:58 pm on Jun 20, 2012 (gmt 0)


Sorry, I don't quite understand. So I submit another sitemap listing the duplicated URLs in the form I want Google to recognise? Even though they are already listed correctly in the old sitemap?

aakk9999

2:56 pm on Jun 20, 2012 (gmt 0)


No, you either submit an additional sitemap that contains the "wrong" URLs, or you append the wrong URLs to the current sitemap and re-submit it.

The idea is that, having seen the "wrong" URLs in the sitemap, Google will crawl them sooner, see the 301 redirect, and therefore process the redirect sooner.

After a little while you should remove these wrong URLs from the sitemap, as ultimately the sitemap should only contain valid URLs.
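
For anyone who would rather script the temporary sitemap than write it by hand, here is a minimal Python sketch. It only builds the XML text in the standard sitemaps.org format; the function name build_sitemap and the example URLs are made up for illustration, and you would still have to upload the file and submit it in WMT yourself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing each URL once, in order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            continue  # skip duplicates so each "wrong" URL appears only once
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serialising.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical "wrong" URLs; replace with the ones you want recrawled.
    wrong_urls = [
        "http://www.example.com/some-stuff",
        "http://www.example.com/old-page/",
    ]
    print(build_sitemap(wrong_urls))
```

Keeping the wrong URLs in a separate file like this makes it easy to delete that sitemap again once the redirects have been picked up.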

g1smd

8:09 pm on Jun 20, 2012 (gmt 0)


While the incorrect URLs are listed they may still bring some traffic. The redirect within your site will get the visitor to the right place.

It can take Google many months to delist redirected URLs. That's usually not a problem.

Make sure your mod_rewrite rules are in the right order. The wrong order will introduce multiple-step redirect chains for some non-canonical URL requests.
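
One rough way to see whether a request goes through more than one redirect is to follow the Location headers hop by hop and count them. The Python sketch below is only an illustration under stated assumptions: trace_redirects and fetch_location are made-up names, and the example rules are hypothetical (a www fix and a trailing-slash fix applied as two separate steps), not anyone's real .htaccess.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def trace_redirects(start_url, get_location, max_hops=10):
    """Follow redirects from start_url and return the list of URLs visited.

    get_location(url) must return the redirect target for url, or None
    when the URL answers without redirecting.
    """
    chain = [start_url]
    while len(chain) <= max_hops:
        target = get_location(chain[-1])
        if target is None:
            break
        target = urljoin(chain[-1], target)  # resolve relative Location values
        looped = target in chain
        chain.append(target)
        if looped:
            break  # redirect loop: stop instead of following it forever
    return chain


def fetch_location(url):
    """Send a HEAD request and return the Location header of a 3xx reply."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in (301, 302, 303, 307, 308):
            return response.getheader("Location")
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical rules: the www fix and the slash fix happen separately,
    # so a non-www, no-slash request needs two hops instead of one.
    rules = {
        "http://example.com/some-stuff": "http://www.example.com/some-stuff",
        "http://www.example.com/some-stuff": "http://www.example.com/some-stuff/",
    }
    chain = trace_redirects("http://example.com/some-stuff", rules.get)
    print(" -> ".join(chain))
    print("hops:", len(chain) - 1)
```

To test a live site, pass fetch_location instead of rules.get. More than one hop for a single non-canonical request suggests the rules could be reordered or combined so that it reaches the canonical URL in one step.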

Jez123

8:19 pm on Jun 20, 2012 (gmt 0)


"Make sure your mod_rewrite rules are in the right order. The wrong order will introduce multiple-step redirect chains for some non-canonical URL requests."


Thanks g1smd

OK, it's not my speciality. How do I know if it's right or wrong?

OK, this looks bad. After writing the line above, I see that Google has recognised the sitemap and says there is no error with it (though all the URLs should redirect). It's now counting more URLs. If they were redirecting properly, Google would report a sitemap error, wouldn't it?

Jez123

10:54 am on Jun 21, 2012 (gmt 0)


In WMT, if I fetch a page with / and click "success" it's all fine, but if I fetch a page without / and click "success" I get this error:

The page seems to redirect to itself. This may result in an infinite redirect loop. Please check the Help Center article about redirects.

HTTP/1.1 301 Moved Permanently
Date: Thu, 21 Jun 2012 10:49:24 GMT
Server: Apache
X-Powered-By: PHP/5.2.9
X-Pingback:
Set-Cookie: PHPSESSID=3bcf0a2c26d7499c3fa0d64eb7da337c; path=/
Location: http://www.example.com/some-stuff/
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8

Is this the usual message when redirecting from non / to /?

aakk9999

12:49 pm on Jun 21, 2012 (gmt 0)


There is something wrong in your .htaccess, as the page without / should not redirect. Why don't you post an extract from your .htaccess to the Apache forum, replacing your domain with example.com? Maybe someone will spot the error.
This is assuming the canonical page is the one without /.
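
Before posting to the Apache forum, it can help to check what a response header like the one quoted above actually says about the redirect. The Python sketch below parses a raw header dump and compares the Location value with the URL that was requested. The name redirect_summary is made up, and the requested URL is an assumption (the thread only shows the response, so the no-slash form of the Location is used here). It reports only what the header itself shows, not why WMT raised its warning.

```python
from email.parser import Parser


def redirect_summary(requested_url, raw_response):
    """Summarise the redirect described by a raw HTTP response header dump."""
    status_line, _, header_text = raw_response.strip().partition("\n")
    headers = Parser().parsestr(header_text, headersonly=True)
    location = headers.get("Location")
    return {
        "status": int(status_line.split()[1]),
        "location": location,
        # True only when the target is exactly the URL that was requested.
        "redirects_to_itself": location == requested_url,
        # True when the only change is an appended trailing slash.
        "only_adds_slash": location == requested_url + "/",
    }


if __name__ == "__main__":
    raw = "\n".join([
        "HTTP/1.1 301 Moved Permanently",
        "Server: Apache",
        "Location: http://www.example.com/some-stuff/",
        "Content-Length: 0",
    ])
    # Assumed request: the same address without the trailing slash.
    print(redirect_summary("http://www.example.com/some-stuff", raw))
```

If only_adds_slash comes back true, the header describes a plain no-slash to slash redirect, and which of the two forms should be canonical is then the thing to settle in the .htaccess.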