My pages listed in duplicate by Google

     
4:15 pm on June 11, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member

joined:July 1, 2004
posts: 780
votes: 12


Google seems to have listed all of my pages twice.

I noticed today that there are 2 copies of every page when I search site:example.com

I also note that the PR of some of the inner pages has dropped to 0

As well as the above, my WMT sitemap report shows that out of ~80 existing pages only 31 are indexed! (It was the full amount last time I checked, which was Friday I think.) Yet when I search site: I see that ~160 are in fact indexed and cached!

The site preview is showing odd things too. Not all of the page is showing in the clip, some parts have a red box around them, and some images or portions of the page are missing. If I click the cache link from the preview (and surely the page must be cached for the preview to show), I get a Google 404.

I am at a total loss - I have never seen such a mess. Has anyone seen this or have any advice please?
4:33 pm on June 11, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member

joined:July 1, 2004
posts: 780
votes: 12


Here's an example: these are the 2 URLs I get when I click the cache link on both duplicated versions of a page. I have changed the domain and product names:

webcache.googleusercontent.com/search?q=cache:yWru1V-6u28J:www.example.net/blue-widget/free/+&cd=1&hl=en&ct=clnk&gl=uk

webcache.googleusercontent.com/search?q=cache:AePae6j9cb0J:www.example.net/blue-widget/free+&cd=2&hl=en&ct=clnk&gl=uk

One is cached and the other is not, but they are both listed when I search site:mydomain.
4:50 pm on June 11, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Do they both point to the http protocol, not one to https?
4:55 pm on June 11, 2012 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 12, 2004
posts:608
votes: 1


Seems to be a trailing slash problem?
5:02 pm on June 11, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member

joined:July 1, 2004
posts: 780
votes: 12


@tedster There is no https, so no.

@levo Please explain more.
5:08 pm on June 11, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


levo got it. Notice that the first URL you pasted in ends with a "/" and the second one does not. In other words, the two URLs are different.
5:20 pm on June 11, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member

joined:July 1, 2004
posts: 780
votes: 12


Yes, I see it now. I need to somehow fix that with a redirect. Does it explain why pages are being dropped from the WMT sitemap?

Thanks both of you.
7:18 pm on June 11, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member

joined:July 1, 2004
posts: 780
votes: 12


The .htaccess seems to be redirecting url/ to url with no issues. So I am still very much at a loss with this. Is it possible that the .htaccess could be redirecting me OK but not Google's spiders?
8:48 pm on June 11, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 19, 2003
posts:859
votes: 3


When redirecting have you used a 301?
8:59 pm on June 11, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


If the correct single-step 301 redirect is in place, Google will figure it out.

Partial showing of previews is quite common. Allow a few weeks for Google to pull in all the elements of the page.

For URLs which now redirect, the preview will be the first thing to no longer show up. The snippet is often the next thing to go.

The cache view is not at all the same thing as a preview. There are many reasons why the cache view can be missing.

Make sure the site is technically 100% correct. Check all the reports in WMT and act on them, and then allow three months for things to fall into place.
9:20 pm on June 11, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member

joined:July 1, 2004
posts: 780
votes: 12


@g1smd I don't think I made it clear, but the redirect has always been in place. I don't understand how Google indexed them to start with.

@petehall, I assume it's a 301 but I'm not in a position to check right now. I will have a proper look at the .htaccess tomorrow. I didn't write it - I'm not that technical!
9:28 pm on June 11, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


If Google has indexed both URLs (with and without slash), either the redirect is a 302 redirect, or the redirect was missing for some time, or you have some sort of multiple-step redirection chain and intermediate URLs are being indexed.

You'll need to request both www and non-www URLs, both with and without slash, and look at the results in the Live HTTP Headers extension for Firefox.
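
To make "single-step" concrete, here is a rough sketch of .htaccess rules that fix a missing trailing slash and a non-www host name in one hop rather than two. It assumes that the trailing-slash form on www.example.net (the placeholder host from the cache URLs pasted earlier) is the canonical one, that mod_rewrite is available, and that these rules sit above any CMS rules in the file. None of that is confirmed in this thread, so treat it as an illustration only.

```apache
# Sketch only: www.example.net is a placeholder for the real canonical host.
RewriteEngine On

# 1. Extensionless URL requested without its trailing slash, on any host name:
#    add the slash and force the canonical host in a single 301.
#    The pattern skips the site root and anything whose last segment
#    contains a dot (images, .css, .php and so on).
RewriteRule ^(([^/]+/)*[^/.]+)$ http://www.example.net/$1/ [R=301,L]

# 2. Everything else requested on a non-canonical host name:
#    keep the path as it is and fix only the host, again in a single 301.
RewriteCond %{HTTP_HOST} !^(www\.example\.net)?$
RewriteRule ^(.*)$ http://www.example.net/$1 [R=301,L]
```

With rules in this order, a request for example.net/blue-widget/free should get one 301 straight to www.example.net/blue-widget/free/, with no intermediate URL left for Google to index. If the slashless form is the canonical one instead, the first rule would need to be reversed.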
10:16 pm on June 11, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12717
votes: 244


"The .htaccess seems to be redirecting url/ to url with no issues."

Is /url/ a "real" directory, a "real" file, or just a name hiding a mess of secret rewrites?

If you are redirecting, everyone ends up on the same page (both physically and metaphorically, ahem). But if you're rewriting, neither g### nor your human visitors knows that the content might be coming from the identical place either way.
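
A small sketch of the difference, using made-up paths (old-page, new-page and some-page are hypothetical, as is the host name) and assuming mod_rewrite in a root .htaccess:

```apache
# Sketch only: host name and paths are placeholders.
RewriteEngine On

# External redirect: the server answers with a 301 and the client makes a
# second request for the new URL. Visitors and crawlers end up seeing
# only the target URL.
RewriteRule ^old-page$ http://www.example.net/new-page/ [R=301,L]

# Internal rewrite: nothing is sent back to the client at this point. The
# server quietly hands the request to /index.php and the requested URL
# stays unchanged. Because the trailing slash is optional here, both
# /some-page and /some-page/ can return 200 with the same content.
RewriteRule ^some-page/?$ /index.php [L]
```

The second rule is the kind of thing that can leave two URLs for one page in the index, since neither request is ever told that the other form is the preferred one.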
7:55 am on June 12, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member

joined:July 1, 2004
posts: 780
votes: 12


@Lucy, there is nothing suspicious (I hope) about it. What I have tried to achieve is to redirect the old pages to the new WordPress version of the site.

I will post the code here (hope that's OK). Someone else suggested that it could be an issue that some redirects have / and some do not. Can that confuse the redirection? I am hoping someone can clarify.

#redirects from old site
redirect 301 /example1.htm http:// www.mytopsecretdomain.net/example1/
redirect 301 /example2.htm http:// www.mytopsecretdomain.net/
redirect 301 /example3.htm http:// www.mytopsecretdomain/example3/
redirect 301 /example4.htm http:// mytopsecretdomain/example4/
redirect 301 /example5.htm http:// mytopsecretdomain/example5/
redirect 301 /example6.htm http:// mytopsecretdomain/example6/

redirect 301 /site-map3.xml http:// mytopsecretdomain/about-us/blog-sitemap/
redirect 301 /contact.php http:// mytopsecretdomain/contact-us/
redirect 301 /blog/ /about-us/blog/

RedirectMatch 301 ^/example10(.*) http:// www.mytopsecretdomain.net/example10/
RedirectMatch 301 ^/example11(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example12(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example13(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example14(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example15(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example16(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example17(.*) http:// www.mytopsecretdomain.net/example17/
RedirectMatch 301 ^/example18/(.*) http:// www.mytopsecretdomain.net/example18/

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress
2:45 pm on June 12, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member

joined:July 1, 2004
posts: 780
votes: 12


Sorry to bump this but has anyone got any comments on the code posted please?
5:49 pm on June 12, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Yes. It's awful.

The required http: is missing from some rule targets. The canonical URL for the site root should end with a trailing slash.

You should convert all of the rules to use RewriteRule syntax. There can be problems when you mix mod_alias and mod_rewrite code in the same file.

Only capture (.*) if you intend to reuse that backreference.

Delete the <IfModule> containers.

RewriteBase / is the default and is not needed.

The -f and -d checks are extremely inefficient. There are better methods available and these are often discussed in previous threads.
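
As a rough sketch of what the points above could look like when applied, here is a shortened version of the posted file converted to RewriteRule syntax. The host www.example.net and the exampleN paths are placeholders, only a few representative rules are shown, and the WordPress -f and -d checks are left as they were because the better methods are not spelled out in this thread.

```apache
# Sketch only: www.example.net and the exampleN paths are placeholders.
RewriteEngine On

# Redirects from the old site. Every target is a full URL with protocol
# and host name, and the site root ends with a trailing slash.
RewriteRule ^example1\.htm$ http://www.example.net/example1/ [R=301,L]
RewriteRule ^example2\.htm$ http://www.example.net/ [R=301,L]
RewriteRule ^contact\.php$ http://www.example.net/contact-us/ [R=301,L]

# The capture is kept here because the backreference $1 is reused.
RewriteRule ^blog/(.*)$ http://www.example.net/about-us/blog/$1 [R=301,L]

# Prefix match with no capture, because nothing is reused in the target.
RewriteRule ^example11 http://www.example.net/ [R=301,L]

# WordPress front controller, without <IfModule> and without RewriteBase.
# The -f and -d checks are unchanged from the original file.
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```

One thing to watch when converting the remaining rules: a prefix pattern such as ^example10 whose target is /example10/ would match its own target and redirect in a loop, so patterns like that need an end anchor or a more specific match.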