
Home / Forums Index / WebmasterWorld / Webmaster General

Webmaster General Forum

Duplicate home pages appearing in Google
Webmaster Tools showing errors from phantom dupes?!
WebDogOne




msg:4591940
 7:50 pm on Jul 10, 2013 (gmt 0)

Duplicate home pages problem: what are ?cat=1, ?cat= and ?fullsite=true?
I noticed on one of my client's sites that some pages appear to be duplicates of the home page. I found them when I saw an error message about duplicate titles and descriptions in Webmaster Tools. The pages show up as example.com/?cat=-1, example.com/?cat= and example.com/?fullsite=true.

These are not URLs we have created, and they do not show up anywhere on the server.

I checked Google's index using site:example.com and, sure enough, these URLs are in it.
Since they appear to be creating duplicate home pages, I was about to use the Remove URL tool, but then noticed that for these URLs the tool offered "Remove Site" rather than "Remove Page". That spooked me into thinking either A) there were multiple versions of the site, or B) removing these URLs would somehow remove the entire site.
Q: What are these?
Q: Where do they come from?
Q: How do I safely get rid of them?

Any and all comments welcome. Thanks.

[edited by: phranque at 8:21 pm (utc) on Jul 10, 2013]
[edit reason] exemplified domain [/edit]

 

phranque




msg:4591953
 8:26 pm on Jul 10, 2013 (gmt 0)

welcome to WebmasterWorld, WebDogOne!


google discovered these urls "somewhere" and it doesn't really matter where unless they were linked internally from your site.
use a tool such as xenu to crawl your site and see if they appear there.
if you are linking to non-canonical urls then you should fix that problem.
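Xenu does this across the whole site; as a rough single-page illustration of what to look for, here is a minimal Python sketch, assuming you already have a page's HTML as a string (fetching pages and following links site-wide is left out). It lists internal links that resolve to URLs carrying a query string. The names LinkCollector and query_string_links are hypothetical, and example.com is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def query_string_links(html, base_url):
    """Return absolute URLs of same-host links that carry a query string."""
    parser = LinkCollector()
    parser.feed(html)
    site = urlsplit(base_url).netloc
    found = []
    for href in parser.hrefs:
        # Resolve relative hrefs the same way a browser or crawler would.
        url = urljoin(base_url, href)
        parts = urlsplit(url)
        if parts.netloc == site and parts.query:
            found.append(url)
    return found


if __name__ == "__main__":
    page = ('<a href="/">Home</a><a href="?fullsite=true">Full site</a>'
            '<a href="http://other.example/?x=1">External</a>')
    print(query_string_links(page, "http://www.example.com/"))
```

Run on the sample markup, only the ?fullsite=true link is reported; the plain home link and the external link are skipped.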

then to solve the googlebot problem, add some external redirects so that any requests for non-canonical urls (such as extraneous query strings) are redirected with a 301 status code to the canonical url.
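On Apache or IIS this would normally be done with the server's own rewrite rules. Purely to illustrate the logic, here is a minimal sketch written as Python WSGI middleware, assuming a Python-served site and an allow-list of parameters the site really uses (the CanonicalRedirect class, the home_page app and the "page" parameter are all hypothetical). Any request whose query string contains a parameter outside the allow-list gets a 301 to the same path with those parameters removed.

```python
from urllib.parse import parse_qsl, quote, urlencode


class CanonicalRedirect:
    """Hypothetical WSGI middleware: 301-redirect requests carrying query
    parameters outside an allow-list to the same path minus those parameters."""

    def __init__(self, app, allowed_params=()):
        self.app = app
        self.allowed = frozenset(allowed_params)

    def __call__(self, environ, start_response):
        query = environ.get("QUERY_STRING", "")
        # keep_blank_values so that "?cat=" is seen as a parameter too.
        pairs = parse_qsl(query, keep_blank_values=True)
        kept = [(k, v) for k, v in pairs if k in self.allowed]
        if query and (not pairs or len(kept) != len(pairs)):
            path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
            location = quote(path or "/")
            if kept:
                location += "?" + urlencode(kept)
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]
        # Already canonical: pass the request to the real application.
        return self.app(environ, start_response)


def home_page(environ, start_response):
    """Stand-in application that always renders the home page."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"home page"]


app = CanonicalRedirect(home_page, allowed_params={"page"})
```

With this in place, /?cat=-1 and /?fullsite=true would both answer 301 with Location: /, while /?page=2 passes through untouched. For a quick local test, wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it. As noted above, the redirects need careful testing before going live.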

depending on your server (apache? IIS?) you should post any further questions about specific implementation details in the appropriate forum.

WebDogOne




msg:4591964
 8:43 pm on Jul 10, 2013 (gmt 0)

Thanks for the quick tips. I was leaning in that direction so will follow through that way.
btw - I really wasn't sure what was causing this (and still am not), so I thought this forum was appropriate.
Q: would these be viewed as duplicate content?
Q: would removing them using the Remove URL tool kill the site entirely?

aakk9999




msg:4591983
 9:40 pm on Jul 10, 2013 (gmt 0)

To answer your questions:

Q: would these be viewed as duplicate content?
Yes, if two URLs display the same page content, then this is viewed as duplicate content.

Q: would removing them using the Remove URL tool kill the site entirely?
If you specify the exact URL to be removed, then only this URL is removed (unless the specified URL is a folder, and you click on "Remove directory"). If you go this route, then the URL to be removed must either return 404/410 or be blocked by robots.txt. However, I would also read this before you proceed with URL removal: [support.google.com...]

However, before doing this, I would suggest that you take the steps phranque has recommended: crawl your site with a tool such as Xenu Link Sleuth to see whether these URLs are somehow being created internally. You may be surprised by what such a crawl can discover. For example, a missing leading slash in a link can cause unexpected URLs to be created, and the same can happen when URLs are "strung together" using JavaScript or similar.
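To see how a link can quietly produce such URLs, here is a small Python sketch using urllib.parse.urljoin, which resolves relative links by the same RFC 3986 rules browsers and crawlers follow. The page paths and hrefs are made-up examples on the placeholder domain.

```python
from urllib.parse import urljoin

# (page the link appears on, href exactly as written in the HTML)
EXAMPLES = [
    ("http://www.example.com/", "?cat=-1"),               # bare query string on the home page
    ("http://www.example.com/blog/my-post", "contact"),   # leading slash forgotten
    ("http://www.example.com/blog/my-post", "/contact"),  # leading slash present
]

for page, href in EXAMPLES:
    # Resolve the relative reference against the page it appears on.
    print(f"{href!r} on {page} -> {urljoin(page, href)}")
```

A bare "?cat=-1" on the home page resolves to http://www.example.com/?cat=-1, a home-page duplicate. "contact" without the leading slash becomes http://www.example.com/blog/contact instead of http://www.example.com/contact.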

There is also another way of addressing the duplicate content issue, which is to use a canonical link element on the home page. Before doing this, though, it would be a good idea to figure out whether the incorrect URLs are being created from within your client's site.
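As a sketch of what the canonical link element would look like, here is a hypothetical Python helper, canonical_link, that builds the tag for the requested URL with its query string and fragment stripped. The assumption is that, as on this home page, no parameter ever changes the content. On pages where a parameter does select content, that parameter would have to be kept.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(request_url):
    """Build a <link rel="canonical"> tag for request_url, dropping the
    query string and fragment (assumes parameters never change content)."""
    parts = urlsplit(request_url)
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))
    # Escape for safe use inside an HTML attribute value.
    return '<link rel="canonical" href="%s">' % escape(canonical, quote=True)


if __name__ == "__main__":
    for url in ("http://www.example.com/?cat=-1",
                "http://www.example.com?fullsite=true"):
        print(canonical_link(url))
```

Both example URLs produce <link rel="canonical" href="http://www.example.com/">, which goes in the page's <head>.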

g1smd




msg:4592003
 11:03 pm on Jul 10, 2013 (gmt 0)

Adding rel="canonical" is likely to help.

Redirects will need to be carefully coded and tested.

URLs with parameters are generally a nightmare. I made the decision several years ago to use extensionless URLs without parameters. This gives a LOT more control over exactly what can be indexed.

WebDogOne




msg:4592006
 11:18 pm on Jul 10, 2013 (gmt 0)

The thing is...there should be nothing at all generating these URL types. And, they aren't being picked up in Bing/Yahoo.
I am wondering if they are left overs from their prior site and just sitting in the Google database. However, why are they resolving to the home page?
The "fullsite" seems to indicate a mobile detect script.....or, perhaps there are cookies.... sigh...long night ahead

aakk9999




msg:4592027
 12:40 am on Jul 11, 2013 (gmt 0)

I am wondering if they are left overs from their prior site and just sitting in the Google database.
Yes, that is possible. It is also possible that Google is "adding" query strings to see whether this uncovers "new pages". I've seen this on one of my clients' sites, where Google added query strings often found in WordPress URLs (even though the client's site was not built on WordPress).

However, why are they resolving to the home page?
Many websites with dynamic page generation have the same problem. For example, if you try this (replace www.example.com with your client's domain name):

www.example.com?some-nonexisting-parm=1

You will most likely find that this will resolve to home page.

The reason is that most back-end scripts/applications look only for the parameters they *need* to generate the page (and raise an error only if a mandatory parameter is missing). Most scripts do not check whether additional, unneeded parameters are present, so appending spurious parameters still generates the page based on the URL (and whatever other parameters the script needs).
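A minimal Python sketch of the lenient handler described above. The render function, its "page" parameter and the PAGES table are hypothetical. It reads only the parameter it needs, so cat, fullsite, or anything else appended to the URL is never looked at.

```python
from urllib.parse import parse_qs

# Hypothetical content keyed by the one parameter the script actually uses.
PAGES = {"about": "About us", "contact": "Contact"}


def render(query_string):
    """Return (status, body); only 'page' is ever inspected."""
    params = parse_qs(query_string, keep_blank_values=True)
    page = params.get("page")
    if page is None:
        # No 'page' parameter: serve the home page. Any other parameters
        # ('cat', 'fullsite', ...) are silently ignored.
        return 200, "Home page"
    body = PAGES.get(page[0])
    return (200, body) if body else (404, "Not found")


if __name__ == "__main__":
    for qs in ("", "cat=-1", "fullsite=true", "page=about&utm_source=x"):
        print(repr(qs), "->", render(qs))
```

"", "cat=-1" and "fullsite=true" all return the same 200 home page, which is exactly how the phantom URLs end up as duplicates.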

If you do code the script to reject additional parameters, be careful: you may end up suppressing or redirecting pages that carry tracking parameters, so handle this with care.
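One way to be careful is to treat tracking parameters as a separate, always-allowed class. The Python sketch below does this with a hypothetical normalize_query helper. It assumes tracking parameters start with "utm_" (adjust the prefixes to whatever the site actually uses). It keeps allowed and tracking parameters, drops the rest, and reports what it dropped, so the caller can decide whether a redirect is warranted.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_query(url, allowed, tracking_prefixes=("utm_",)):
    """Return (canonical_url, unknown_param_names).

    Keeps parameters the page needs (in 'allowed') plus tracking parameters,
    so campaign links are not redirected away; everything else is dropped."""
    parts = urlsplit(url)
    kept, unknown = [], []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key in allowed or key.startswith(tracking_prefixes):
            kept.append((key, value))
        else:
            unknown.append(key)
    canonical = urlunsplit(
        (parts.scheme, parts.netloc, parts.path or "/", urlencode(kept), ""))
    return canonical, unknown


if __name__ == "__main__":
    print(normalize_query("http://www.example.com/?cat=-1&utm_source=news", {"page"}))
```

Here ?cat=-1&utm_source=news normalizes to http://www.example.com/?utm_source=news with ["cat"] reported as unknown. A caller would redirect only when that list is non-empty.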

lucy24




msg:4592038
 1:07 am on Jul 11, 2013 (gmt 0)

Check the URL parameters listed in GWT (Google Webmaster Tools) periodically, even if none of your URLs use parameters. If anything is listed that you don't use, mark it as "ignore this parameter".

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved