Welcome to WebmasterWorld

Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Google indexed page that doesn't exist on my site.
themaninthejar
msg:4560551 - 11:13 am on Apr 2, 2013 (gmt 0)

A duplicate page title warning in WMT alerted me to this problem. Google seems to have indexed a duplicate copy of one of my pages under the same URL, with the string ?words=widget appended directly after the .htm.

Searching Google for .htm?words=widget shows this appears to have happened to a few sites (although my site doesn't show in that SERP).

The page that Google has indexed does not exist in my server space, yet it is indexed under my domain and being reported as a duplicate title and description in WMT.

[edited by: goodroi at 11:23 am (utc) on Apr 2, 2013]
[edit reason] Please read the forum charter [/edit]

 

phranque
msg:4560555 - 11:29 am on Apr 2, 2013 (gmt 0)

The page that Google has indexed does not exist on my server space


When you say "doesn't exist", do you mean a request for the indexed URL returns a 404 Not Found or a 410 Gone status code response?

themaninthejar
msg:4560576 - 12:40 pm on Apr 2, 2013 (gmt 0)

No, I mean I can't find it with my FTP manager.

netmeg
msg:4560608 - 2:00 pm on Apr 2, 2013 (gmt 0)

Sounds like a parameter issue.

themaninthejar
msg:4560611 - 2:36 pm on Apr 2, 2013 (gmt 0)

Where would the parameter issue reside?

themaninthejar
msg:4560612 - 2:39 pm on Apr 2, 2013 (gmt 0)

Ah. I see that the moderator's edit has caused some confusion. The inserted word "widget" may lead you to believe it is a word I use on my site, even a keyword. In fact it is the surname of a mass murderer; it is not mentioned anywhere on my site and bears no relation to my subject matter.

tantalus
msg:4560623 - 2:54 pm on Apr 2, 2013 (gmt 0)

Put www.yoursite.com/mypage.htm?words=widget into a browser or a server header checker and find out what response you're getting. There's a checker on this site, I think, but I can't remember where, so try this: [urivalet.com...]

If it returns a 200 OK, then the problem is on your server side and Google is seeing the different URLs as duplicates (most probably picked up from a link pointing to your site).

themaninthejar
msg:4560626 - 3:13 pm on Apr 2, 2013 (gmt 0)

In a browser, this returns a page duplicated from my original, served under the URL with the ?words= suffix.

topr8
msg:4560662 - 4:27 pm on Apr 2, 2013 (gmt 0)

So the page does exist if it is called. Try going to

example.com/oneofyourpages.htm?a=anything

where oneofyourpages.htm is actually one of your pages. Assuming the page is displayed, the first thing I'd do is implement the canonical tag on your pages, and then look into using .htaccess to block all requests with parameters.
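One way to do the .htaccess part is a small mod_rewrite rule. This is only a sketch, assuming Apache with mod_rewrite enabled; a 301 back to the clean URL is generally safer than blocking outright, since it passes any accumulated signals back to the real page. The file pattern is illustrative:

```apache
# Sketch only: 301-redirect any request for a .htm/.html URL that
# carries a query string back to the same URL with the query removed.
RewriteEngine On
RewriteCond %{QUERY_STRING} .
RewriteRule ^(.+\.html?)$ /$1? [R=301,L]
```

The trailing `?` on the substitution is what strips the query string from the redirect target (on Apache 2.4+ the `QSD` flag does the same thing).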

lucy24
msg:4560789 - 11:04 pm on Apr 2, 2013 (gmt 0)

Most extensions do not use parameters, so any query string after .html or .jpg or what-have-you is simply ignored and the page is served up as-is. In theory you can change this behavior with the AcceptPathInfo setting (assuming Apache)-- but why the bleep should you have to?
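You can see why in the URL structure itself: a static file server maps only the path component to a file on disk, so the query string never changes which file gets served. A minimal sketch using the WHATWG URL API (the URLs are made up for illustration):

```javascript
// The query string lives in a separate URL component from the path,
// and a static file server looks only at the path when picking a file.
const clean = new URL("http://www.example.com/mypage.htm");
const dup = new URL("http://www.example.com/mypage.htm?words=widget");

console.log(clean.pathname); // "/mypage.htm"
console.log(dup.pathname); // "/mypage.htm"
console.log(clean.pathname === dup.pathname); // true
```

Both requests hit the same file, which is why the "duplicate" page renders identically and Google sees two URLs with one set of content.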

Your search engine is willfully and wantonly attaching parameters to URLs that by their nature cannot have parameters. My own wmt parameters page begins with the line

newwindow=true

attached to an html URL. (They'll only show you one, though they claim more.) I have to assume that g### picked it up via some linking site, where "newwindow" refers to the site's internal behavior; it's obviously meaningless in isolation.

Remember that indexing and crawling are separate functions; a search engine will happily index a page it has never seen. All you can do is go into gwt, pull up the parameters page and edit to say explicitly "no effect on page content".

rainborick
msg:4560824 - 2:38 am on Apr 3, 2013 (gmt 0)

Well, that's not really all you can do. You can add a rel="canonical" tag to the page, which would generally resolve the issue, or you could set up a 301 redirect. Then just use Webmaster Tools to do a Fetch As Googlebot, followed by a submit.

Google is pretty good at detecting and eventually correcting this issue on its own. But these corrective steps do speed up the process.

TheOptimizationIdiot
msg:4560825 - 2:46 am on Apr 3, 2013 (gmt 0)

There's actually a lot more that goes into the decision-making when you're dealing with 1,000,000,000,000+ URIs and running a business.

First, many people thought .html was better than .php (and some may still), so they parse .htm and .html pages as PHP and use parameters on them. Ignoring every .html URL with a parameter would break those sites.

Second, some of us are "nutty" enough to serve images (or JavaScript or CSS) from .php files, because we know browsers and search engines pay attention to the server headers and content, not the extension of a URI.

Third, when you run a major search engine against that insane number of pages and URIs, you hit a point of diminishing returns worrying about minutiae like newwindow=true. Far more cost-effective than trying to figure out in advance which parameters you don't need to crawl is to spider the URI and see if it returns a 200 OK; if it does, you do what Google does: group URIs with the same content together and return in the actual SERPs whichever one you determine to be the best or most authoritative.

When you really get into running a search engine on 1,000,000,000,000+ URIs/pages, there are plenty of reasons to spider and "let slide" things that those of us who don't deal with those numbers might think silly or easy to fix. At that scale they really aren't that important, especially considering how time-consuming finding and coding solutions for them would be, and how much better that time could be spent on something else.

(For example, not picking on you Lucy24, I would never have thought about coding for newwindow=true, and with the number of URIs they deal with, they might not have a clue anyone was silly enough to use it. Is the time spent digging through that many URIs to find the goofiness some people erroneously link with worth it, when they'll probably find a relative few at most? Or is the time of a search engineer with a doctorate and a $1,000-an-hour salary better spent elsewhere? In other words: how much would they profit by "eliminating" newwindow=true from the index, and how could that possibly exceed what they'd spend finding and coding for it and other webmaster silliness? I don't see how they could be bothered with it, personally.)

lucy24
msg:4560858 - 6:55 am on Apr 3, 2013 (gmt 0)

You can add a rel="canonical" tag to the page

Adding "rel='canonical'" won't do a particle of good if it was a static html page to start with, meaning that the parameters just go along for the ride and now every version of the page is calling itself canonical.

First, there are so many who thought .html was better than .php (and may still be) eliminating .html with a parameter would be silly since people thought they would do better by parsing .htm and .html pages as php and using parameters on them.

Contrariwise, calling php html and then turning around and attaching parameters is itself so silly, why would the search engine make extra work for itself by playing along? Take the html at face value, request htm(l) pages without parameters and index what you get. If the site owner ends up with un-indexed pages, surely that's their problem and not the search engine's.

topr8
msg:4560865 - 7:12 am on Apr 3, 2013 (gmt 0)

Adding "rel='canonical'" won't do a particle of good if it was a static html page to start with, meaning that the parameters just go along for the ride and now every version of the page is calling itself canonical.


It's first thing in the morning for me, so I might have misunderstood you, lucy24 - but from what you are saying, you misunderstand the canonical tag.

it should be used like this:

<link rel="canonical" href="http://www.example.com/mypage.html">

and it tells Google that if the page is requested with parameters, it should be treated as though it were the parameter-free URL.

calling php html and then turning around and attaching parameters is itself so silly


I would agree with this, though. I would have thought a basic rule could be to ignore parameters on URLs with an htm(l) extension.

bhonda
msg:4560902 - 10:04 am on Apr 3, 2013 (gmt 0)

request htm(l) pages without parameters

I could be off on one here, but what about JavaScript? Isn't it valid to use a query string parameter for use by JavaScript, which could change what is rendered or displayed on the page?
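That is valid: a client-side script can read the query string and change what's rendered, even on a static .htm page, which is one reason a crawler can't simply assume every parameter is meaningless. A minimal sketch (a fixed string stands in for window.location.search, which is what a browser script would actually read):

```javascript
// Stand-in for window.location.search in a browser.
const search = "?words=widget";

// A page script could branch on this value and render different
// content for different parameters, entirely client-side.
const params = new URLSearchParams(search);
console.log(params.get("words")); // "widget"
```

Whether Google's crawler honors such client-side variation is a separate question, but the URL mechanism itself is legitimate.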
