|Duplicate home pages appearing in Google|
Webmaster showing error from phantom dupes?!
Duplicate home pages problem > what are: ?cat=-1, ?cat=, ?fullsite=true
I noticed on one of my client's sites that there are pages that appear to be duplicates of the home page. I found them via an error message regarding duplicate Titles and Descriptions in Webmaster Tools. The pages show as example.com/?cat=-1, example.com/?cat= and example.com/?fullsite=true.
These are not URLs we have created and are not showing on the server.
I checked using a site:example.com search and, sure enough, these URLs are in Google's index.
Since they appear to be creating duplicate home pages, I was about to use the Remove URL tool, but then noticed that the URL was causing it to offer "Remove Site" rather than "Remove Page". That of course spooked me into thinking either (a) there were multiple versions of the site, or (b) removing these URLs would somehow remove the entire site.
Q: What are these?
Q: Where do they come from?
Q: How do I safely get rid of them?
Any and all comments welcome. Thanks.
[edited by: phranque at 8:21 pm (utc) on Jul 10, 2013]
[edit reason] exemplified domain [/edit]
welcome to WebmasterWorld, WebDogOne!
google discovered these urls "somewhere" and it doesn't really matter where unless they were linked internally from your site.
use a tool such as xenu to crawl your site and see if they appear there.
if you are linking to non-canonical urls then you should fix that problem.
then to solve the googlebot problem, add some external redirects so that any requests for non-canonical urls (such as extraneous query strings) are redirected with a 301 status code to the canonical url.
depending on your server (apache? IIS?) you should post any further questions about specific implementation details in the appropriate forum.
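To illustrate the redirect logic in a language-neutral way, here is a minimal Python sketch. The parameter names are assumptions taken from the URLs in the first post, and in practice this would usually be an .htaccess rule (Apache) or a URL Rewrite rule (IIS) rather than application code:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Assumed extraneous parameters, taken from the URLs reported in the
# first post; a real site would list its own.
EXTRANEOUS = {"cat", "fullsite"}

def canonical_url(url):
    """Return the canonical URL to 301 to, or None if the URL is already canonical."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in params if k not in EXTRANEOUS]
    if kept == params:
        return None  # no extraneous parameters; serve the page normally
    query = urlencode(kept)
    return f"{parts.scheme}://{parts.netloc}{parts.path}" + (f"?{query}" if query else "")

print(canonical_url("http://example.com/?cat=-1"))   # http://example.com/
print(canonical_url("http://example.com/"))          # None
```

The same decision - strip the extraneous query string, 301 to the result - is what the mod_rewrite or URL Rewrite rule would express on the server.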
Thanks for the quick tips. I was leaning in that direction so will follow through that way.
btw - I really wasn't sure what was causing this (and still am not), so I thought this forum appropriate.
Q: would these be viewed as duplicate content?
Q: would removing them using the Remove URL tool kill the site entirely?
To answer your questions:
|Q: would these be viewed as duplicate content? |
Yes, if two URLs display the same page content, then this is viewed as duplicate content.
|Q: would removing them using the Remove URL tool kill the site entirely? |
If you specify the exact URL to be removed, then only that URL is removed (unless the specified URL is a folder and you click on "Remove directory"). If you go this route, then the URL to be removed must either return 404/410 or be blocked by robots.txt. However, I would also read this before you proceed with URL removal: [support.google.com...]
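If you choose the robots.txt option mentioned above, the block might look like this - a sketch only: the wildcard syntax is honored by Googlebot, the parameter names are the ones reported in the first post, and note that blocking these URLs also prevents Googlebot from ever seeing a 301 you add later:

```
User-agent: Googlebot
Disallow: /*?cat=
Disallow: /*?fullsite=
```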
There is also another way of addressing duplicate content issue, which is to use canonical link element on the home page, but before doing this it would be a good idea to figure out whether incorrect URLs are somehow created from within your client's site.
Adding rel="canonical" is likely to help.
Redirects will need to be carefully coded and tested.
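For reference, the canonical link element suggested above is a single tag in the home page's head section (shown here with the thread's example.com placeholder; use the real canonical home page URL):

```html
<link rel="canonical" href="http://www.example.com/" />
```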
URLs with parameters are generally a nightmare. I made the decision several years ago to use extensionless URLs without parameters. This gives a LOT more control over exactly what can be indexed.
The thing is...there should be nothing at all generating these URL types. And, they aren't being picked up in Bing/Yahoo.
I am wondering if they are leftovers from their prior site and just sitting in the Google index. However, why are they resolving to the home page?
The "fullsite" seems to indicate a mobile detect script.....or, perhaps there are cookies.... sigh...long night ahead
|I am wondering if they are left overs from their prior site and just sitting in the Google database. |
Yes, possible. It is also possible that Google is "adding" query strings to see if this will uncover "new pages". I've seen this on one of my clients' sites, where Google added query strings often found in WordPress URLs (but the client's site was not built on WordPress).
|However, why are they resolving to the home page? |
Many websites with dynamic page generation have the same problem. For example, if you append a spurious query string to your client's home page URL (replace www.example.com with your client's domain name), you will most likely find that the URL still resolves to the home page.
The reason is that in most back-end scripts/applications, the script only looks for the parameters it *needs* to generate the page (and it will only error if a mandatory parameter is missing). Most scripts will not check whether additional, unneeded parameters are present, so appending spurious parameters will still generate the page based on the URL (and whatever parameters the script does need).
One must be careful when coding the script not to accept any additional parameters, because this way you may end up suppressing/redirecting pages that carry legitimate tracking parameters - hence handle this with care.
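A minimal sketch of that check in Python - the EXPECTED and TRACKING sets below are hypothetical examples, not taken from the thread; the point is that recognized tracking parameters are exempted from the redirect:

```python
from urllib.parse import parse_qsl

# Hypothetical parameter lists: EXPECTED holds what the script actually
# uses, TRACKING holds parameters that must NOT trigger a redirect.
EXPECTED = {"id", "page"}
TRACKING = {"utm_source", "utm_medium", "utm_campaign", "gclid"}

def should_redirect(query_string):
    """True if the query string contains parameters that are neither used
    by the script nor recognized tracking parameters."""
    names = {k for k, _ in parse_qsl(query_string, keep_blank_values=True)}
    return bool(names - EXPECTED - TRACKING)

print(should_redirect("cat=-1"))                   # True
print(should_redirect("id=5&utm_source=letter"))   # False
```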
Check parameters periodically in GWT, even if none of your URLs use parameters. If there's anything listed that you don't use, mark it as "ignore this parameter".