
My pages listed in duplicate by Google

     

Jez123

4:15 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google seems to have listed all of my pages twice.

I noticed today that there are 2 copies of every page when I go site:example.com

I also note that the PR of some of the inner pages has dropped to 0

As well as the above, my WMT sitemap report shows that out of ~80 existing pages only 31 are indexed! (It was the full amount last time I checked, which was Friday I think.) Yet when I search site: I see that ~160 are in fact indexed and cached!

The site preview is showing odd things too. Not all the page is showing in the clip and some parts have a red box around them - some images are missing or portions of the page. From the preview if I click cache, which surely it must be to show the preview, I get a google 404.

I am at a total loss - I have never seen such a mess. Has anyone seen this or have any advice please?

Jez123

4:33 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's an example of 2 of the URLs when I click the cache on both duplicated versions of the page - I have changed the domain and product names:

webcache.googleusercontent.com/search?q=cache:yWru1V-6u28J:www.example.net/blue-widget/free/+&cd=1&hl=en&ct=clnk&gl=uk

webcache.googleusercontent.com/search?q=cache:AePae6j9cb0J:www.example.net/blue-widget/free+&cd=2&hl=en&ct=clnk&gl=uk

One is cached and the other is not but they are both listed when I search site:mydomain

tedster

4:50 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Do they both point to the http protocol, not one to https?

levo

4:55 pm on Jun 11, 2012 (gmt 0)

10+ Year Member



Seems to be a trailing slash problem?

Jez123

5:02 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@ tedster There is no https so, no.

@ levo please explain more.

tedster

5:08 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



levo got it. Notice that the first URL you pasted in ends with a "/" and the second one does not. In other words, the two URLs are different.

Jez123

5:20 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, I see it now. I need to somehow fix that with a redirect. Does it explain why pages are being dropped from the WMT sitemap?

Thanks both of you.

Jez123

7:18 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The .htaccess seems to be redirecting url/ to url with no issues. So I am still very much at a loss with this. Is it possible that the .htaccess could be redirecting me OK but not Google's spiders?

petehall

8:48 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When redirecting have you used a 301?

g1smd

8:59 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



If the correct single-step 301 redirect is in place, Google will figure it out.

Partial showing of previews is quite common. Allow a few weeks for Google to pull in all the elements of the page.

For URLs which now redirect, the preview will be the first thing to no longer show up. The snippet is often the next thing to go.

The cache view is not at all the same thing as a preview. There are many reasons why the cache view can be missing.

Make sure the site is technically 100% correct. Check all the reports in WMT and act on them and then allow three months for things to fall into place.

Jez123

9:20 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@g1smd I don't think I made it clear but the redirect has always been in place. I don't understand how google indexed them to start with.

@petehall, I assume it's 301 but not in a position to check right now. I will have a proper look at the .htaccess tomorrow. I didn't write it - I'm not that technical!

g1smd

9:28 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



If Google has indexed both URLs (with and without slash), either the redirect is a 302 redirect or the redirect was missing for some time or you have some sort of multiple step redirection chain and intermediate URLs are being indexed.

You'll need to request both www and non-www URLs both with and without slash and look at the results in the Live HTTP Headers extension for Firefox.
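
For illustration, a single-step canonical redirect might look something like the sketch below. It assumes that the www hostname with a trailing slash is the canonical form, which is a guess and not something confirmed in this thread. www.example.net is a placeholder hostname, and the first pattern assumes that any final path segment without a dot is a page and not a file.

```apache
# Sketch only. www.example.net is a placeholder hostname.
# Assumption: the canonical form is http://www.example.net/path/ (www, trailing slash).
RewriteEngine On

# 1) Extensionless path with no trailing slash: add the slash AND force www in one hop.
#    The final segment must contain no dot, so style.css or page.htm are left alone.
RewriteRule ^(([^/]+/)*[^/.]+)$ http://www.example.net/$1/ [R=301,L]

# 2) Anything else requested on a non-canonical hostname: fix the hostname only.
#    The "?" lets an empty Host header through, which avoids a loop for HTTP/1.0 clients.
RewriteCond %{HTTP_HOST} !^(www\.example\.net)?$ [NC]
RewriteRule ^(.*)$ http://www.example.net/$1 [R=301,L]
```

With rules along these lines, a request for example.net/blue-widget/free should be answered with one 301 straight to www.example.net/blue-widget/free/, with no intermediate URL left for Google to index.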

lucy24

10:16 pm on Jun 11, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



"The .htaccess seems to be redirecting url/ to url with no issues."

Is /url/ a "real" directory, a "real" file, or just a name hiding a mess of secret rewrites?

If you are redirecting, everyone ends up on the same page (both physically and metaphorically, ahem). But if you're rewriting, neither g### nor your human visitors knows that the content might be coming from the identical place either way.
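
To make the redirect versus rewrite distinction concrete, here is a minimal sketch. The paths and the script name render.php are hypothetical and are not taken from the poster's site.

```apache
# Sketch only. The paths and script name below are hypothetical.
RewriteEngine On

# External redirect: the client receives a 301 and makes a new request for the target URL.
RewriteRule ^old-page\.htm$ http://www.example.net/new-page/ [R=301,L]

# Internal rewrite: the client is told nothing about the change.
# Because the trailing slash is optional in the pattern, /new-page and /new-page/
# both return 200 with identical content, which is how duplicates can get indexed.
RewriteRule ^new-page/?$ /render.php?page=new-page [L]
```

In the first rule the address bar changes and only one URL survives. In the second, both spellings of the URL look like separate, valid pages from the outside.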

Jez123

7:55 am on Jun 12, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@Lucy, there is nothing suspicious (I hope) about it. What I have tried to achieve is to redirect old pages to the new WordPress migration of the site.

I will post the code here (hope that's OK). Someone else suggested that it could be an issue as some redirects have / and some do not. Can that confuse the redirection? I am hoping someone can clarify.

#redirects from old site
redirect 301 /example1.htm http:// www.mytopsecretdomain.net/example1/
redirect 301 /example2.htm http:// www.mytopsecretdomain.net/
redirect 301 /example3.htm http:// www.mytopsecretdomain/example3/
redirect 301 /example4.htm http:// mytopsecretdomain/example4/
redirect 301 /example5.htm http:// mytopsecretdomain/example5/
redirect 301 /example6.htm http:// mytopsecretdomain/example6/

redirect 301 /site-map3.xml http:// mytopsecretdomain/about-us/blog-sitemap/
redirect 301 /contact.php http:// mytopsecretdomain/contact-us/
redirect 301 /blog/ /about-us/blog/

RedirectMatch 301 ^/example10(.*) http:// www.mytopsercetdomain.net/example10/
RedirectMatch 301 ^/example11(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example12(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example13(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example14(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example15(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example16(.*) http:// www.mytopsecretdomain.net
RedirectMatch 301 ^/example17(.*) http:// www.mytopsecretdomain.net/example17/
RedirectMatch 301 ^/example18/(.*) http:// www.mytopsecretdomain.net/example18/

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

Jez123

2:45 pm on Jun 12, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry to bump this but has anyone got any comments on the code posted please?

g1smd

5:49 pm on Jun 12, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yes. It's awful.

The required http: is missing from some rule targets. The canonical URL for the site root should end with a trailing slash.

You should convert all of the rules to use RewriteRule syntax. There can be problems when you mix mod_alias and mod_rewrite code in the same file.

Only capture (.*) if you intend to reuse that backreference.

Delete the <IfModule> containers.

RewriteBase / is the default and is not needed.

The -f and -d checks are extremely inefficient. There are better methods available and these are often discussed in previous threads.
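
As a rough sketch of what some of these points could look like when applied to a few of the posted rules: www.example.net stands in for the anonymised domain, and the choice of which rules to show is only illustrative. The more efficient alternative to the -f and -d checks is not shown, because the post does not say which method is meant.

```apache
# Sketch only. www.example.net stands in for the poster's anonymised domain.
RewriteEngine On

# Old static pages: full protocol and hostname in every target, one 301 each.
RewriteRule ^example1\.htm$ http://www.example.net/example1/ [R=301,L]
RewriteRule ^example2\.htm$ http://www.example.net/ [R=301,L]
RewriteRule ^contact\.php$ http://www.example.net/contact-us/ [R=301,L]

# Capture used only because the backreference $1 is reused in the target.
RewriteRule ^blog/(.*)$ http://www.example.net/about-us/blog/$1 [R=301,L]

# Prefix match with no capture group, since nothing is reused.
# The site root target ends with a trailing slash.
RewriteRule ^example11 http://www.example.net/ [R=301,L]

# The WordPress rewrite block follows here, after all external redirects.
```

One thing to watch when converting the remaining rules: a pattern such as ^example10 also matches its own target /example10/, so if the old and new URLs are on the same host, that rule would need a tighter pattern to avoid redirecting to itself.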
 
