

Google SEO News and Discussion Forum

    
My pages listed in duplicate by Google
Jez123




msg:4463994
 4:15 pm on Jun 11, 2012 (gmt 0)

Google seems to have listed all of my pages twice.

I noticed today that there are 2 copies of every page when I search site:example.com

I also note that the PR of some of the inner pages has dropped to 0.

As well as the above, my WMT sitemap report shows that out of ~80 existing pages only 31 are indexed! (It was the full amount last time I checked, which was Friday I think.) Yet when I search site: I see that ~160 are in fact indexed and cached!

The site preview is showing odd things too. Not all of the page shows in the clip, some parts have a red box around them, and some images or portions of the page are missing. If I click cache from the preview, which surely must exist for the preview to show at all, I get a Google 404.

I am at a total loss - I have never seen such a mess. Has anyone seen this or have any advice please?

 

Jez123




msg:4464003
 4:33 pm on Jun 11, 2012 (gmt 0)

Here's an example of 2 of the URLs I get when I click the cache link on both duplicated versions of the page - I have changed the domain and product names:

webcache.googleusercontent.com/search?q=cache:yWru1V-6u28J:www.example.net/blue-widget/free/+&cd=1&hl=en&ct=clnk&gl=uk

webcache.googleusercontent.com/search?q=cache:AePae6j9cb0J:www.example.net/blue-widget/free+&cd=2&hl=en&ct=clnk&gl=uk

One is cached and the other is not, but they are both listed when I search site:mydomain

tedster




msg:4464013
 4:50 pm on Jun 11, 2012 (gmt 0)

Do they both point to the http protocol, not one to https?

levo




msg:4464018
 4:55 pm on Jun 11, 2012 (gmt 0)

Seems to be a trailing slash problem?

Jez123




msg:4464020
 5:02 pm on Jun 11, 2012 (gmt 0)

@tedster There is no https, so no.

@levo Please explain more.

tedster




msg:4464027
 5:08 pm on Jun 11, 2012 (gmt 0)

levo got it. Notice that the first URL you pasted in ends with a "/" and the second one does not. In other words, the two URLs are different.

Jez123




msg:4464037
 5:20 pm on Jun 11, 2012 (gmt 0)

Yes, I see it now. I need to somehow fix that with a redirect. Does it explain why pages are being dropped from the WMT sitemap?

Thanks both of you.

Jez123




msg:4464090
 7:18 pm on Jun 11, 2012 (gmt 0)

The .htaccess seems to be redirecting url/ to url with no issues. So I am still very much at a loss with this. Is it possible that the .htaccess could be redirecting me OK but not Google's spiders?

petehall




msg:4464117
 8:48 pm on Jun 11, 2012 (gmt 0)

When redirecting have you used a 301?
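
The status code matters because mod_alias falls back to a temporary redirect when none is given. A minimal sketch of the difference, using the hypothetical paths /old-page.htm and /new-page/ on a placeholder hostname (these are alternatives for comparison, not lines to use together):

```apache
# No status given: mod_alias treats this as a temporary (302) redirect.
Redirect /old-page.htm http://www.example.net/new-page/

# Explicit permanent redirect, written with the numeric status.
Redirect 301 /old-page.htm http://www.example.net/new-page/

# The same permanent redirect, written with the keyword form.
Redirect permanent /old-page.htm http://www.example.net/new-page/
```

The same applies to mod_rewrite, where a bare R flag also produces a 302 unless it is written as R=301.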

g1smd




msg:4464121
 8:59 pm on Jun 11, 2012 (gmt 0)

If the correct single-step 301 redirect is in place, Google will figure it out.

Partial showing of previews is quite common. Allow a few weeks for Google to pull in all the elements of the page.

For URLs which now redirect, the preview will be the first thing to no longer show up. The snippet is often the next thing to go.

The cache view is not at all the same thing as a preview. There are many reasons why the cache view can be missing.

Make sure the site is technically 100% correct. Check all the reports in WMT and act on them and then allow three months for things to fall into place.

Jez123




msg:4464132
 9:20 pm on Jun 11, 2012 (gmt 0)

@g1smd I don't think I made it clear but the redirect has always been in place. I don't understand how Google indexed them to start with.

@petehall, I assume it's a 301 but I'm not in a position to check right now. I will have a proper look at the .htaccess tomorrow. I didn't write it - I'm not that technical!

g1smd




msg:4464136
 9:28 pm on Jun 11, 2012 (gmt 0)

If Google has indexed both URLs (with and without slash), then either the redirect is a 302, or the redirect was missing for some time, or you have some sort of multiple-step redirection chain and intermediate URLs are being indexed.

You'll need to request both www and non-www URLs both with and without slash and look at the results in the Live HTTP Headers extension for Firefox.
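
As an illustration of the single-step idea, here is a minimal sketch that fixes the hostname and a missing trailing slash in one hop instead of two. It assumes the www, with-slash form is the canonical one and uses www.example.net as a placeholder hostname; neither is confirmed for the site in question.

```apache
RewriteEngine On

# Extensionless URL without a trailing slash, requested on any hostname:
# send it straight to the canonical www + slash URL in a single 301.
RewriteRule ^(([^/]+/)*[^/.]+)$ http://www.example.net/$1/ [R=301,L]

# Anything else requested on a non-canonical hostname:
# keep the path and fix only the host, again in a single 301.
RewriteCond %{HTTP_HOST} !^(www\.example\.net)?$
RewriteRule ^(.*)$ http://www.example.net/$1 [R=301,L]
```

Putting the slash rule first is what avoids the chain: a request for example.net/blue-widget/free gets one redirect to the final URL rather than one for the host and another for the slash. Rules like these would need to sit before any catch-all rewrite, and should be tested against real extensionless files on the server, which this pattern would also redirect.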

lucy24




msg:4464162
 10:16 pm on Jun 11, 2012 (gmt 0)

"The .htaccess seems to be redirecting url/ to url with no issues."

Is /url/ a "real" directory, a "real" file, or just a name hiding a mess of secret rewrites?

If you are redirecting, everyone ends up on the same page (both physically and metaphorically, ahem). But if you're rewriting, neither g### nor your human visitors knows that the content might be coming from the identical place either way.
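
The distinction can be sketched in two rules. The paths old-page.htm and new-page and the hostname are hypothetical, chosen only to show the contrast:

```apache
RewriteEngine On

# External redirect: the server answers with a 301 and a Location header,
# so the browser or crawler makes a second request for the new URL.
RewriteRule ^old-page\.htm$ http://www.example.net/new-page/ [R=301,L]

# Internal rewrite: no R flag, so the client never sees a different URL
# while the server quietly serves the content from another path.
RewriteRule ^new-page/?$ /index.php [L]
```

Because the second pattern accepts the URL with or without the slash and answers both directly, /new-page and /new-page/ each return the same content under two different URLs, which is exactly the kind of duplication being discussed.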

Jez123




msg:4464306
 7:55 am on Jun 12, 2012 (gmt 0)

@Lucy, there is nothing suspicious (I hope) about it. What I have tried to achieve is to redirect old pages to the new WordPress migration of the site.

I will post the code here (hope that's OK). Someone else suggested that it could be an issue that some redirects have / and some do not. Can that confuse the redirection? I am hoping someone can clarify.

#redirects from old site
redirect 301 /example1.htm http:// www.mytopsecretdomain.net/example1/
redirect 301 /example2.htm http:// www.mytopsecretdomain.net/
redirect 301 /example3.htm http:// www.mytopsecretdomain/example3/
redirect 301 /example4.htm http:// mytopsecretdomain/example4/
redirect 301 /example5.htm http:// mytopsecretdomain/example5/
redirect 301 /example6.htm http:// mytopsecretdomain/example6/

redirect 301 /site-map3.xml http:// mytopsecretdomain/about-us/blog-sitemap/
redirect 301 /contact.php http:// mytopsecretdomain/contact-us/
redirect 301 /blog/ /about-us/blog/

RedirectMatch 301 ^/example10(.*) http:// www.mytopsecretdomain.net/example10/
RedirectMatch 301 ^/example11(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example12(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example13(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example14(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example15(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example16(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example17(.*) http:// www.mytopsecretdomain.net/example17/
RedirectMatch 301 ^/example18/(.*) http:// www.mytopsecretdomain.net/example18/

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

Jez123




msg:4464406
 2:45 pm on Jun 12, 2012 (gmt 0)

Sorry to bump this but has anyone got any comments on the code posted please?

g1smd




msg:4464508
 5:49 pm on Jun 12, 2012 (gmt 0)

Yes. It's awful.

The required http: is missing from some rule targets. The canonical URL for the site root should end with a trailing slash.

You should convert all of the rules to use RewriteRule syntax. There can be problems when you mix mod_alias and mod_rewrite code in the same file.

Only capture (.*) if you intend to reuse that backreference.

Delete the <IfModule> containers.

RewriteBase / is the default and is not needed.

The -f and -d checks are extremely inefficient. There are better methods available and these are often discussed in previous threads.
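
A minimal sketch of how a few of the posted redirects might look after following the points above. It is not a drop-in replacement: the hostname is the poster's placeholder, only a sample of the rules is shown, and the pattern for example18 is an assumption about what that rule was meant to do.

```apache
RewriteEngine On

# Single old pages: anchor the pattern, escape the dot, and give every
# target the full protocol and hostname.
RewriteRule ^example1\.htm$ http://www.mytopsecretdomain.net/example1/ [R=301,L]
RewriteRule ^example2\.htm$ http://www.mytopsecretdomain.net/ [R=301,L]
RewriteRule ^contact\.php$ http://www.mytopsecretdomain.net/contact-us/ [R=301,L]

# Moved folder: (.*) is captured because $1 reuses it in the target.
RewriteRule ^blog/(.*)$ http://www.mytopsecretdomain.net/about-us/blog/$1 [R=301,L]

# Retired section sent to the site root: prefix match, nothing captured,
# and the root target ends with a trailing slash.
RewriteRule ^example11 http://www.mytopsecretdomain.net/ [R=301,L]

# Section collapsed onto its own index page: require something after the
# slash so the pattern cannot match its own target and loop.
RewriteRule ^example18/.+ http://www.mytopsecretdomain.net/example18/ [R=301,L]
```

These rules would go before the WordPress block so the external redirects are evaluated before the catch-all rewrite to index.php. The sketch leaves the -f and -d checks in that block untouched.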

