Forum Moderators: Robert Charlton & goodroi
This is my first post, so hello there.
We have been having a nightmare too - it all ties in with the dates mentioned, and pages were mostly non-supplemental until today. Then, wham, most pages gone and our visitor numbers devastated.
One of our competitors was affected the first time around, but has stayed rock solid ever since - wish we were so lucky. Can't see any real difference apart from the fact that they have a PR4; we are PR3. I've been fixing things like 404 server responses, uploading a new sitemap, and removing old 301 redirects. I can't think of anything else to fix this.
I've been searching for months for solutions to this, but like most people have had no real luck. Any meaningful advice gratefully accepted.
Regards,
Darren
I see another big clean up in supplemental Results there.
I see many recently created new Supplemental Results representing content from just a few months ago. These are cases where the content has been edited very recently (and searches for the new content point to a non-Supplemental Result), or where on-site duplicate content existed but the alternative (non-Supplemental) URLs have now been filtered out (in one case the filtering was done through robots directives to get certain URL formats delisted).
I did notice, when using the site: command, that very OLD pages that no longer exist were being displayed in G's search results. It is as if they were bringing in results from a year ago.
The site: query shows all 120,000 pages, but the main page is not in the first position.
On 72.14.207.104 the site: query now seems to show only 56,000 pages, and ranking is very bad.
I was not affected during the June 27th problem.
And Adsense earnings are of course way down :(
[edited by: HuhuFruFru at 3:25 pm (utc) on Aug. 17, 2006]
I think it's probably a good thing that we're seeing all this flip-flopping activity from Google at the moment. I have a hunch that they will look at a bank holiday weekend for an update, and (although it's a long shot) we in England have our August bank holiday starting Saturday 26th August. The mad scientists at Google are obviously planning something; the supplementals have shot up from the recent low a few days ago ... a sign, I think, that changes are coming soon.
On my sitemap diagnostics, I have been watching regular trawling and the 404 report keeps going up and down. I need a complete site visit from Googlebot and I'm hoping that next weekend will be the time. From my rather rusty memory, in the past the visits started mid-week and the odd bit of flux started showing on the Thursday / Friday.
Of course, the sting in the tail is that if you are not helped out by the changes ... it could be a long time before the next one comes along.
I don't know ... could be a load of rubbish, it just makes me feel good thinking ahead.
All the Best
Col :-)
Also, site:www.domain.com on gfe-eh.google.com only displays 33 pages out of 1,690 for one of my sites.
Two steps forward, one step back.
[edited by: Halfdeck at 5:03 pm (utc) on Aug. 17, 2006]
Today, all the .co.uk listings disappeared on gfe-eh; and the .com listings remain indexed and ranking as before.
This did not happen on previous updates, although other sites that were redirected earlier did get updated back then. It looks like Google wants to hang on to redirected URLs for at least a year for various purposes before dropping them.
So, measure the effectiveness of your redirects not by whether the redirected URLs still appear in the SERPs, but by how well the domain that you redirected to is faring: are all pages listed, are they fully indexed, are they NOT URL-only or Supplemental, and does most of the site appear in a site:domain.com search before the "click here to see omitted results" message appears?
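If you want a quick sanity check on the redirect side while you wait for the index to catch up, here is a minimal sketch (Python, standard library only; www.olddomain.co.uk and www.newdomain.com are placeholder names, so swap in a handful of your own URLs). It simply follows each old URL and reports whether you end up on the new domain. Note that it only confirms where you land, not whether the hop was a genuine 301 rather than a 302, so do check the header once with a server-header checker as well.

import urllib.request
from urllib.parse import urlparse

# Placeholder sample of URLs on the old, redirected domain.
OLD_URLS = [
    "http://www.olddomain.co.uk/",
    "http://www.olddomain.co.uk/products/widget.html",
]
NEW_HOST = "www.newdomain.com"  # where the redirects should point

for url in OLD_URLS:
    try:
        final = urllib.request.urlopen(url).geturl()  # follows any redirects
        host = urlparse(final).netloc
        print(url, "->", final, "OK" if host == NEW_HOST else "CHECK")
    except Exception as e:
        print(url, "-> error:", e)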
Matt Cutts suggested deep links etc should help the problem and many others had similar theories, so rather than sit on my hands I had a dabble at something that I knew couldn't hurt.
I fired out a press release (we had some large announcements anyhow) linking to 5 pages on our site. Not expecting much short term, I was surprised to see all five pages back fully indexed with an up-to-date version of each page (Aug 11).
An incredible result that seemed to prove that more links would solve everything.
EXCEPT:- This morning I've just checked and all the pages have gone back to supplemental with a cached date of 29th May!
Well, frustration reigns yet again. How poor is this for the users? It's pretty pathetic, Google. The profits you guys make are, in my humble opinion, hugely disproportionate to the quality of service you provide.
Internet users deserve better results than what you’re serving up.
You’re fortunate you’re still operating in an immature industry but you will be found out. I hope that before long a player with sufficient resources gets involved, cos there’s relatively easy money to be had.
Far easier than running an airline.
I would expect Matt Cutts to start making comments on it in only a few weeks time, or so, judging from the recent hints on his blog.
But the point is, why is it in the supps in the first place? That's the $million question. At the end of the day, it's unique content that's relevant to potential searches.
Why not just index pages and give them a fighting chance in the index, putting what it deems to be the most relevant up top?
Is it not possible that google can't actually cope at the minute and it's not the be all and end all in search engine technology?
For a page that goes 404, or whose domain expires, Google keeps a copy of the very last version of the page that they saw, as a Supplemental Result, and shows it in the results when the number of other pages returned is low. The cached copy will be quite old.
For a normal site, the current version of the page should be in the normal index, and the previous version of the page is held in the supplemental index.
If you use search terms that match the current content, then you see that current content in the title and snippet, in the cache, and on the live page.
If you search for terms that were only on the old version of the page, then you see those old search terms in the title and snippet, even though they are not in the cache, nor found on the live page. That result will be marked as supplemental.
There are also supplemental results where the result is for duplicate content of whatever Google considers to be the "main" site. These results seemingly hang around forever, with an old cache, a cache that often no longer reflects what is really on the page right now. Usually there is no "normal" result for that duplicate URL - just the old supplemental, based on the old data. On the other hand, the "main" URL will usually have both a normal result and a supplemental result (but not always).
If you have multiple URLs leading to the same content, "duplicate content", some of the URLs will appear as normal results and some will appear as Supplemental Results. The Supplemental Results will hang around for a long time, even if the page is edited or is deleted. Google might filter out some of the duplicates, removing them from their index: in that case what is left might just be a URL that is a Supplemental Result.
The fix for this is to make sure that every page has only one URL that can access it; make sure that any alternatives cannot be indexed. Run Xenu LinkSleuth over the site and make sure that you fix every problem found. Additionally do make sure that you have a site-wide 301 redirect from non-www to www as that is another form of duplicate content waiting to cause you trouble.
Also, make sure that every page has a unique title tag and a unique meta description, as failing to do so is another problem that can hurt a site.
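If you don't have Xenu to hand, a rough sketch along these lines (Python, standard library only; http://www.example.com/ is a placeholder start page, and the crawl is deliberately capped) will walk same-host links and flag any pages that share the same title tag, which is usually the quickest way to spot that kind of duplication. Treat it as a starting point, not a replacement for a proper link checker.

import urllib.request
from urllib.parse import urljoin, urlparse
from html.parser import HTMLParser
from collections import defaultdict

START = "http://www.example.com/"   # placeholder start page
HOST = urlparse(START).netloc
LIMIT = 200                         # stop after this many pages

class PageParser(HTMLParser):
    # Collects the <title> text and every href found on the page.
    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
    def handle_data(self, data):
        if self._in_title:
            self.title += data

titles = defaultdict(list)
seen, queue = set(), [START]
while queue and len(seen) < LIMIT:
    url = queue.pop(0)
    if url in seen:
        continue
    seen.add(url)
    try:
        html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    except Exception:
        continue  # broken link or non-HTML - Xenu territory
    parser = PageParser()
    parser.feed(html)
    titles[parser.title.strip()].append(url)
    for href in parser.links:
        full = urljoin(url, href).split("#")[0]
        if urlparse(full).netloc == HOST and full not in seen:
            queue.append(full)

# Any title shared by more than one URL is a candidate duplicate.
for title, urls in titles.items():
    if len(urls) > 1:
        print("Shared title:", repr(title))
        for u in urls:
            print("   ", u)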
Many thanks for that thorough reply, a nice summary of possible problems.
What do you suggest if none of these issues apply? All I can think of is that when we launched we weren't aware of the www / non-www problem, but we put this right with a 301 redirect months ago.
I may be over-simplifying this whole issue, but this whole supplemental thing seems very over-complicated, and it's not surprising that Google's indexing is in such a mess.
Why not just index everything, and rank what it deems more relevant over the less relevant for the search phrase entered? It seems a massive waste of resources to cache different versions of the pages and then revert back to old versions.
Let's say that Google deems a page to be duplicate enough to make it supplemental (wrongly, in some cases), and therefore punishes the content of that page by not showing it as a result for very specific search phrases. What if that page was the only page on the web that contained information on that search term?
The user isn't getting the best result possible.
They do this whether the page is 404, the domain expired, or the page content has been edited and updated: they keep an old copy for many months as a Supplemental Result.
Every URL for an active site has the new content in the normal index, and the old content in the Supplemental index.
For a non-active site/URL, the last known content is moved to the supplemental index a few weeks after the site/URL is no longer active.
When duplicate content is also dropped into the equation, things become very messy. Some URLs for an active site will already be Supplemental, and others will be filtered out as duplicates. This happens in several very specific circumstances:
- when a domain has content showing as "200 OK" at both www and non-www URLs;
- when a site has several domains (perhaps .com and .co.uk) all pointing to the same content, with "200 OK" status for all URL variations;
- where a "page" has several valid URLs that can reach it (like the 10 or more variations for each thread in a forum like PHPbb or vBulletin, or the navigation by various searches in shopping carts, where the path followed builds a URL and each product has dozens of different paths, and hence URLs, to reach it).
Having one canonical URL for every piece of content (especially for dynamic sites like forums and carts), 301 redirects from non-www to www, and excluding URLs that can only deliver proper content to users that are logged in (like "post new thread" and "reply" and "send PM" URLs in a forum) is key.
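One quick way to see whether any of those circumstances apply to a page of your own is to probe the obvious URL variations and look at the raw status codes. This sketch (Python, standard library; the example.com / example.co.uk URLs are placeholders, so list your own variations) deliberately does not follow redirects: anything that answers "200 OK" more than once is a duplicate waiting to be filtered or to go Supplemental, when it should instead be showing a 301 to the one canonical URL.

import urllib.request
import urllib.error

# Placeholder variations of one page - add www/non-www, other TLDs,
# alternative paths, session-ID versions, and so on.
VARIATIONS = [
    "http://example.com/widget.html",
    "http://www.example.com/widget.html",
    "http://example.co.uk/widget.html",
    "http://www.example.co.uk/widget.html",
]

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None makes urllib report the redirect rather than follow it.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect)
for url in VARIATIONS:
    try:
        resp = opener.open(url)
        print(url, "->", resp.status, "OK (an indexable duplicate?)")
    except urllib.error.HTTPError as e:
        print(url, "->", e.code, e.headers.get("Location", ""))
    except urllib.error.URLError as e:
        print(url, "-> no response:", e.reason)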
Matt Cutts was right all along; it's just that some of the Google-speak was too cryptic to understand what the long-term implications of some things really are. I can see that there are certain types of spam that these actions can severely cripple, as well as legitimate sites where the owner does not take enough care with their site architecture, or cannot interpret the symptoms of what is going wrong.
Just to say that these searches are very important:
site:domain.com
site:domain.com inurl:www
site:domain.com -inurl:www
site:www.domain.com
site:www.domain.com inurl:www
site:www.domain.com -inurl:www
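If you run these regularly, a tiny helper such as this (Python; "domain.com" is a placeholder, put your own domain in) will print all six queries together with a ready-made google.com search URL for each, so they can be pasted straight into the browser or bookmarked:

from urllib.parse import quote_plus

domain = "domain.com"  # placeholder - your own domain here
queries = [
    "site:" + domain,
    "site:" + domain + " inurl:www",
    "site:" + domain + " -inurl:www",
    "site:www." + domain,
    "site:www." + domain + " inurl:www",
    "site:www." + domain + " -inurl:www",
]
for q in queries:
    print(q)
    print("    http://www.google.com/search?q=" + quote_plus(q))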