What chance does any webmaster have when Google appears to be so messed up? Perhaps Adam Lasnik or Google Guy could enlighten me as to how the situation I've observed can arise.
If this has been covered elsewhere, please point me in the right direction.
g1smd:
Another type of Supplemental Result is where the page is simply the previous version of the page. The current version is shown as a normal result, but if you search for keywords that were on the page some 8 to 30 months ago (and which are no longer on the current version of the page) then you see the same page as a Supplemental Result. The snippet will usually also show that same old content, but the cache will always be the one from recent days or weeks (except for a brief time last week when the old cache would show against the old results in several datacentres).
[webmasterworld.com...]
In short, this is intentional. The main purpose of the Supplemental index is for Google to be able to respond to a wider variety of obscure and complex queries.
Yes. That is how Google works. Current content found at a URL is in the normal index, and any content on older versions of the page (and not on the current version of the page) is in the Supplemental Index. However this is NOT duplicate content. Only one of the results is served at any one time. Which one is served depends on the query.
I will modify the statement of mine that Tedster quoted above, though. That was my best guess a year ago. Supplemental Results used to mainly span 8- to 30-month-old data, but over the last 6 months or so (Google has updated Supplemental Results several times) I would say it now mainly spans 3 to 15 months. I don't see anything (gfe-gv) older than (dated before) 2005-July right now. If I look at gfe-eh then I don't see anything older than (dated before) 2005-December, I think.
Supplemental Results that represent URLs that redirect, or are 404, or are for expired domains, can be safely ignored. Those will be dropped after one year. In the meantime your 301 redirect or your custom 404 page delivers the visitor through to the correct page anyway.
Supplemental Results that are for URLs that still return "200 OK" need to be investigated. Many times it will be a Duplicate Content problem (www/non-www, multiple domains, multiple parameter order, http/https, URL capitalisation issues {on IIS only} etc, maybe even too-similar titles/descriptions) and those are always a big problem for any site. However, sometimes it is just the new data / old data situation for the same URL, and that is not a problem. Google likes to hold on to the old version of a page so that someone who looked at it the day before you changed it, and now wants to look at it again, many weeks later, can still find it today.
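For the www/non-www part of that list, the usual fix is a site-wide 301 so that only one hostname ever gets indexed. A minimal sketch, assuming Apache with mod_rewrite in an .htaccess file (the thread doesn't say what server is in use, and "domain.com" is just the placeholder used above):

RewriteEngine On
# Send any request on the bare domain to the www hostname with a 301
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

The same idea works in reverse if you prefer the non-www hostname; the point is to pick one version and permanently redirect the other.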
My more recent thoughts are in: [webmasterworld.com...]
Make sure that every page of your site links back to the root index page, but always ensure that you link only to www.domain.com/ or to www.domain.com/folder/ and always omit the index filename itself from the URL.
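To back that up on the server side, you can also 301 any direct request for the index filename to the bare folder URL. A rough sketch, again assuming Apache mod_rewrite and an index file called index.html (substitute your own filename):

RewriteEngine On
# Only fires when the visitor actually requested /index.html,
# so the internal DirectoryIndex subrequest for / does not loop
RewriteCond %{THE_REQUEST} /index\.html [NC]
RewriteRule ^(.*/)?index\.html$ /$1 [R=301,L]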
Finally, in this long exposé on Duplicate Content issues, make sure that every page of your site has a unique title tag and a unique meta description - one that describes exactly what can be found on that particular page.
These searches are useful in finding out what is going on:
site:domain.com
site:domain.com inurl:www
site:domain.com -inurl:www
site:www.domain.com
site:www.domain.com inurl:www
site:www.domain.com -inurl:www
Unfortunately, our site is one of those sites hit very badly with supplemental results for our existing pages.
(I am referring to second-level pages; our home page is being indexed regularly.)
The reason our site went supplemental is a URL rewrite which cannot be 301 redirected. The site was written in ColdFusion, and adding a 301 redirect creates a redirect loop.
We have no links to our old URLs, only to our rewritten URLs, and we were sure that would take care of the problem.
However, Google has not been crawling our pages since May because of the supplemental issue.
What could we possibly do to get Google to understand our situation and get our supplemental pages crawled once and for all?
Please help us.
P.S. Should you need any clarification or have any questions, please ask; we are desperate to get this taken care of.
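For what it's worth, a redirect loop like the one described usually happens when the 301 rule also fires on the request after it has been rewritten back to the old-style URL internally. One common way around it is to key the redirect off the original request line rather than the rewritten URL. This is only a sketch, assuming Apache mod_rewrite sits in front of the ColdFusion app; the template and path names (product.cfm, /products/...) are invented here for illustration:

RewriteEngine On
# 301 only when the visitor actually asked for the old dynamic URL...
RewriteCond %{THE_REQUEST} \ /product\.cfm\?id=([0-9]+) [NC]
RewriteRule ^product\.cfm$ /products/%1/? [R=301,L]
# ...then map the clean URL back to the ColdFusion template internally (no redirect)
RewriteRule ^products/([0-9]+)/$ /product.cfm?id=$1 [L]

Because the condition looks at %{THE_REQUEST} (what the browser actually sent) and not at the internally rewritten URL, the second rule can hand the request back to product.cfm without triggering the 301 again.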
Last, and definitely least, you could try to use robots.txt to keep the bots out of the other "versions" of a page.
Whatever you do, the alternatives will turn Supplemental and hang around for a year before Google deletes them from view.
The problem is that both URLs pull the exact same page. I have to treat both URLs the same, since they serve exactly the same content. As far as linking goes, we only link to the rewritten URLs; nothing links to our old URLs.
How could I make the rewritten URLs stronger and more noticeable, so that Google starts indexing them and gets them out of the Supplemental index?
Imagine a shirt shop that messed up their URLs. They had a whole bunch of URLs like this:
/shirts.php?colour=blue&size=16
/shirts.php?colour=red&size=17
/shirts.php?colour=green&size=14
/shirts.php?colour=red&size=15
/shirts.php?colour=white&size=17
and then they found that another part of the site accessed the same five pages using:
/shirts.php?size=16&colour=blue
/shirts.php?size=17&colour=red
/shirts.php?size=14&colour=green
/shirts.php?size=15&colour=red
/shirts.php?size=17&colour=white
giving "exact" duplicate content.
You could put:
Disallow: /shirts.php?colour=
in the "robots.txt" file and the problem is solved.
Well, not in the best way, as you are throwing away some PR on the "other" version. But at least there is no more duplicate content exposed to indexing.
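If you can touch the server configuration, the tidier alternative to the robots.txt block is to 301 the parameter-order variants onto one canonical order, so the PR is consolidated rather than thrown away. A sketch only, assuming Apache mod_rewrite and the example shirt-shop URLs above:

RewriteEngine On
# Requests with size before colour get 301'd to the colour-first form
RewriteCond %{QUERY_STRING} ^size=([^&]+)&colour=([^&]+)$
RewriteRule ^shirts\.php$ /shirts.php?colour=%2&size=%1 [R=301,L]

That leaves only one URL per shirt exposed to indexing and passes whatever PR the "other" version had collected across to it.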