Forum Moderators: Robert Charlton & goodroi
Major Change in Supplemental Result Handling today:
Over the last 18 to 24 months, I have written many times about how a page can appear as a normal result for search terms that are located on the current version of the page, and as a Supplemental Result when you search for words that were on the previous version of the page (but are no longer on the current version of the page).
In the latter case those "old" words also appear in the snippet. In both cases (old search and new search) the cache is usually just a few weeks old, so it never shows any of the words associated with the "old search".
As of today, the new search is still linking to the new cache, but the "old search" now brings up a cache that is dated just one or two days before the date of the last change of content on the page, and therefore the cache DOES now show the old words from the old content.
This is a new thing today, and Google has NOT worked like that at any time in the last two years or more. So, rather than get rid of old supplemental results, Google now gives them more space on their server, now actually keeping the old cache copy for them alive too.
I was hoping that old indexed data with no matching cached page was going to get deleted from Google's index in their current tidy-up.
However, what they have chosen to do is not to delete it, but to now keep an older copy of the cache to go with it. This is in addition to keeping a new copy of the cache in the normal index.
I have seen this effect on a large number of pages today. It doesn't happen for all sites; maybe not all of the data is complete yet?
Google begins to look more and more like archive.org every day.
So, if you alter a page, Google will return that page for the current content, but it will also return that page if you search for the previous version of the content. Before today, you could only see a modern copy of the cache. Now, you get to see either a new copy or the old copy depending on exactly what you searched for.
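The before/after behaviour described here can be sketched as a toy model. This is only an illustration of what I am seeing from the outside, not how Google actually stores anything: the IndexRecord class, the lookup function, the cache_follows_record switch, the words and the dates are all made up for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IndexRecord:
    terms: frozenset      # words indexed for this version of the page (hypothetical)
    crawled: date         # date this version was fetched (hypothetical)
    supplemental: bool    # True for the older, Supplemental record


def lookup(records, word, cache_follows_record):
    """Return (is_supplemental, cache_date) for the newest record containing word.

    cache_follows_record=False models the old behaviour: the cache link always
    shows the newest copy. True models the new behaviour: the cache link shows
    the copy that belongs to the record that matched.
    """
    newest = max(r.crawled for r in records)
    for record in sorted(records, key=lambda r: r.crawled, reverse=True):
        if word in record.terms:
            cache_date = record.crawled if cache_follows_record else newest
            return (record.supplemental, cache_date)
    return None  # word is on neither version of the page


# One URL, two versions: "red" was on the old page, "blue" is on the current one.
records = [
    IndexRecord(frozenset({"widgets", "blue"}), date(2006, 4, 1), False),
    IndexRecord(frozenset({"widgets", "red"}), date(2005, 6, 1), True),
]

print(lookup(records, "red", cache_follows_record=False))  # old word, new cache
print(lookup(records, "red", cache_follows_record=True))   # old word, old cache
print(lookup(records, "blue", cache_follows_record=True))  # current word, new cache
```

In this model the search for the old word comes back as Supplemental in both modes; the only thing that changes is which cache date goes with it.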
I do still see "old snippet" for old search, and "new snippet" for new search, but with both pointing to a "new cache", just as Google has done for the last couple of years.
Maybe it still continues in some DCs that I am not currently looking at?
The results are utterly awful. There is no "exact match" for "quoted searches" any more, either.
Some of the searches I do, now have thousands of results, rather than dozens, but none of the results actually fit the search query.
Many SERPs are stuffed full of supplementals. Some are 100% supplemental results.
Some sites have lost 99% of their indexed pages.
Many cache dates go back to January 2004.
This can't be the intended index? Can it? Tell me this is a joke.
Google has no record of one single page listed for this site. I need to mention that this is a 6 week old site.
I have older sites that have a few hundred listings that are showing the main page only, some sites showing 5 or 6 pages that were showing a few hundred.
What's going on?
At first, I thought I was suddenly banned, now I think there is something very major happening.
Can anyone enlighten me? I posted on MC's blog but it just seems I'm posting in between an argument between some other ppl and no one noticed.
Please? I'm very worried.
Thank you,
.::DC::.
64.233.161.99
64.233.161.104
64.233.161.107
64.233.161.147
64.233.167.99
64.233.167.104
64.233.167.147
64.233.179.99
64.233.179.104
64.233.179.107
64.233.187.99
64.233.187.104
72.14.207.99
72.14.207.104
72.14.207.107
216.239.37.99
216.239.37.104
216.239.37.107
216.239.39.99
216.239.39.104
216.239.39.107
It's tempting to think that he's been locked in the machine room until he and his team sort out the chaos that their over-zealous anti-spam filters are wreaking on the Google index. In reality, though, he's probably just taking advantage of the nice spring weather to chill by the beach for a few days; a mini celebration of the "fact" that thousands of naughty spammers are having their naughty pages wiped from the face of the planet.
The sooner Google realise that spam is pretty much the only content that you can never filter out, the better. Spammers will always find a way around.
The great irony is that it's just plain easy to create keyword-rich, original-looking content designed with the sole purpose of achieving good search engine rankings for AdWord revenues. Unfortunately, it is much, much harder to make genuinely useful content that is also search engine friendly. Until Google recognise this immutable fact and stop their futile meddling we're all in for a bumpy ride.
>>Until Google recognise this immutable fact and stop their futile meddling we're all in for a bumpy ride.
I think Google realised this a while ago, hence it tries using human behavioural patterns, as seen through analytics, to track down what they consider useless sites.
I imagine they take some benchmark sites and try to get some form of Bayesian or whatever recognition model together. If your site doesn't fit this model, you're out. Something like 99.9% of all users staying on the page for less than 1 second after loading.
Like with all stats there will be error margins and, as I said in a post before, a 0.01 (1%) error margin on 1 billion pages is still 10,000,000 pages. But maybe they get the error down as their sample is huge ..
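To put numbers on the error margin point, here is a quick sketch of the arithmetic. The function name misclassified_pages is just made up for the example, and the lower error rates are assumed values to show how the count shrinks; nobody outside Google knows the real figures.

```python
def misclassified_pages(total_pages, error_rate):
    """Number of pages wrongly classified at a given error rate (a fraction, not a percent)."""
    return round(total_pages * error_rate)


total = 1_000_000_000  # 1 billion pages, the figure used in the post

# 0.01 is the rate from the post; the smaller rates are assumptions for comparison.
for rate in (0.01, 0.001, 0.0001):
    print(f"error rate {rate}: {misclassified_pages(total, rate):,} pages")
```

Even at a hundredth of the original error rate, that is still 100,000 pages caught by mistake.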
It's like SpamAssassin with a huge sample size, but better: some will get through, most will land in the spam bin. The difference, though, is that the user behaviour is the data, not the web pages themselves. This is IMO harder to cheat, as you have to create convincing user behaviour. You might be able to do PHP delay scripts, but they would have to be statistically accurate ..
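Just to make the idea concrete, a minimal sketch of what a Bayesian check on user behaviour could look like. This is pure speculation on my part and not anything Google has confirmed: the function posterior_useless, the single "left in under 1 second" feature, and all of the probabilities are assumptions picked for illustration.

```python
import math


def posterior_useless(prior, p_short_given_useless, p_short_given_useful, visits):
    """Probability that a site is 'useless' given a list of visits.

    visits is a list of booleans: True if the visitor left in under 1 second.
    Visits are treated as independent (the naive Bayes assumption).
    """
    # Work in log space so that long visit lists do not underflow to zero.
    log_useless = math.log(prior)
    log_useful = math.log(1 - prior)
    for short in visits:
        log_useless += math.log(p_short_given_useless if short else 1 - p_short_given_useless)
        log_useful += math.log(p_short_given_useful if short else 1 - p_short_given_useful)
    # Normalise the two scores into a probability.
    top = max(log_useless, log_useful)
    useless = math.exp(log_useless - top)
    useful = math.exp(log_useful - top)
    return useless / (useless + useful)


# Assumed numbers: 90% of visits to useless sites are short, 20% for useful sites.
mostly_short = [True] * 9 + [False]
mostly_long = [False] * 9 + [True]
print(posterior_useless(0.5, 0.9, 0.2, mostly_short))
print(posterior_useless(0.5, 0.9, 0.2, mostly_long))
```

A delay script that keeps every visitor on the page for exactly the same time would only move the problem to another feature, which is why the faked behaviour would have to look statistically like the real thing.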
=====
>>It's tempting to think that he's been locked in the machine room until he and his team sort out the chaos that their over-zealous anti-spam filters are wreaking on the Google index. In reality, though, he's probably just taking advantage of the nice spring weather to chill by the beach for a few days; a mini celebration of the "fact" that thousands of naughty spammers are having their naughty pages wiped from the face of the planet.
--
I do think he's theoretically locked in that room. Some of the sites that are missing are high-quality, very old sites with deep links from other high-quality, very important sites.
I do still believe that something has gone very, very wrong. Maybe it all started with good intentions of a massive spam cleanup but something went haywire along the way .. IMO.
.::DC::.
There now appear to be three distinctly different batches of supplementals floating around.
While this multibillion dollar company tenaciously caring for my deleted stuff is heartwarming, I'd rather they come over and mow my lawn.
In case you miss some of your pages from 2004, an entirely different batch of supplementals has appeared on 216.239.59.104.
OMG, you're right. There are caches of pages that haven't even EXISTED for well over a year.
This is G o o g l e's cache of mysite/pages/contact.cfm as retrieved on 27 Sep 2004
I wonder why they're running out of space? LOL
.::DC::.
Yahoo fixes cache problem for $299 with Directory Submission [webmasterworld.com]
Can someone show Matt this - I'm sure the Google service team will be eager to beat this [ I hope :) ]
Previously, in order to see them you needed to make a search for words that were on the page at that earlier time, but which are NOT in the current version of the page right now. If you then search for words in your current content, you will likely see the same page URL returned, but NOT as a Supplemental Result, and with a cache from just a few weeks ago.
If Google has now dropped the current data for the page, and they have done that for some sites, you might only see the page as old data, with an old cache, as a Supplemental Result now.
On a site:www.mysite.com search our fully indexed (since recovery) site is now showing the "alt" text for our template header images rather than the meta description or a snippet from the content.
example:
>>>MySite Homepage, All your widget needs... Quick find:. Please Select, --------------------, - WIDGETS, - WADGETS, - WOGETS, - WEEGETS, - WINGETS ...<<<
Obviously this means the snippet is the same for most pages, so they only appear when clicking the additional link at the foot of the page.
All is OK for keyword searches and rank is unaffected, but it seems weird to use header "alt" text rather than useful info in the snippets for site: search.
I hope this does not lead to any duplicate content issues as headers must be the same in most websites.
Today, many DCs return zero results every time for this and several other similar queries for stuff that Google should have cleaned up long ago.
Now, is this a DC that has been cleaned up of old Supplemental Results, or is it a DC that has the Supplemental data missing and Google is going to add it back in again, in the next few days?
Time will tell.