Forum Moderators: Robert Charlton & goodroi
"We've all watched the MC video and read every thread on here when it comes to data refreshes, but I think we're still missing something. What is it that changes during that push time that wreaks such havoc?"
Possibly the push operation is not a system-wide atomic event. Dance, anyone ;)
If you only have a short-term disruption, that could be one answer.
Please note the use of the weasel words possibly and could.
[edited by: theBear at 1:51 pm (utc) on Aug. 19, 2006]
In some ways, it really does resemble the dances of old...how I hated them [I was digging the everflux idea and hope that we can return to that type of constantly updating model on Sept 22]. Yeah, I know that everflux is really just a continual update of data, but isn't that what the data refresh should be doing, albeit in a more compacted timeframe? I haven't seen much about refreshing with a new binary each time, and have yet to see any credible answers as to "why" the microfilters seem to exist only during this update time.
Granted, this particular refresh has treated my site a little differently than it did on Jun 17 and July 27, so I am not seeing filter-busting with the use of "" or &filter=0, so I can't really say for certain that microfilters are occurring again [someone that went from #1 to AWOL will have to comment on that] or even if the refresh is complete and this is the starting point until the next data refresh, with gfe-eh holding the most accurate results (g1, I agree with you now on that; almost everything about this datacenter looks better to me on the advanced queries).
Like every time this happens, now and hopefully in the future, I try to remove myself from the situation a bit, once understanding that a refresh/update/SERPS_in_blender/dance is occurring. It's good to get a general feel for what seems to be happening, and then back off. The more we try to track every little movement during times of fluctuation, the more stressful it becomes. Back to marketing...
Cygnus
In an ideal world, the various site searches should show all pages as www and no pages as non-www URLs.
In practice, many sites will show a selection of both www and non-www URLs. The aim here is to maximise the number of www pages listed, and to maximise the number of listed www pages that are not supplemental.
The number of non-www pages listed as supplemental is totally irrelevant. As long as non-www URLs redirect to their equivalent www pages for live pages and return a 404 error for pages that no longer exist, then all is well with the site. Google will hang on to those types of supplemental results (URLs that redirect or are gone) for a year or more. You cannot alter that. Don't bother measuring them.
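For the "non-www URLs redirect to their equivalent www pages" part, here is a minimal sketch of the usual Apache .htaccess approach, assuming mod_rewrite is available on the server and using example.com as a stand-in for your own domain:

```apache
# Minimal sketch: send every non-www request to the www equivalent
# with a permanent (301) redirect. "example.com" is a placeholder.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

With that in place, live non-www URLs return a 301 to the www version, and pages that no longer exist can simply be left to return a 404.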
Every page of a normal site has the current content as a normal URL, and the older version of the same page is stored as a supplemental result. You see the supplemental result when you search for older content that used to be on the page, but no longer is. You cannot change this.
For a site with duplicate content, none, some, or all URLs for that same content will be normal results, and none, some, or all alternative URLs will be shown as supplemental results. Aim to get all non-canonical URLs deindexed. What is left will be a mixture of normal and supplemental results. Let Google reindex the site. It will take a while for the supplemental status to be lifted for the remaining URLs.
The "nonsense" searches like site:www.domain.com -inurl:www (show all www pages that do not have a www in the URL) show non-www URLs whatever their status is, and www URLs that have a supplemental tag for at least one version of the cached content.
Some sites return zero for some of the searches. Those are the ones that have perfect canonicalisation. Most others show many anomalies.
The takeaway here is to measure how many normal www results there are, not how many supplemental or non-www results there are (while making sure that every piece of content has only one URL for it, there is a site-wide redirect from non-www to www, and each page has unique title and meta description tags).
Umm, "the takeaway here is..."; jeez I'm starting to sound like Matt Cutts. Oi! Google! Stop messing with my mind...
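The measurement described above (count the normal www results, ignore the rest) can be sketched in a few lines of Python. The (url, is_supplemental) pairs below are hypothetical stand-ins for what a site: search might return; there is no real API behind this, it just illustrates the tally.

```python
# Rough sketch of the measurement described above: count how many
# indexed URLs are normal www results, rather than counting the
# supplemental or non-www ones. The data is made up for illustration.
from urllib.parse import urlparse

def count_normal_www(results):
    """Return the number of www URLs that are not supplemental."""
    total = 0
    for url, is_supplemental in results:
        host = urlparse(url).hostname or ""
        if host.startswith("www.") and not is_supplemental:
            total += 1
    return total

results = [
    ("http://www.example.com/", False),          # normal www result
    ("http://www.example.com/page.html", True),  # supplemental www
    ("http://example.com/page.html", True),      # non-www: ignored
]
print(count_normal_www(results))  # prints 1
```

Only the first entry counts: the second is supplemental and the third is non-www, both of which the advice above says to stop measuring.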
My two largest sites dove in Google ranking on June 27 (traffic down about 80%), but one of them came back on July 27, and was indexed correctly. The other one still listed hundreds of pages before the home page.
On Aug. 17, both sites were indexed correctly (home page first), but both also dropped almost completely out of Google's search results. However, I also noticed the number of pages indexed for each site was down about 75%. The number of pages listed for each site is now actually more accurate, but it seems to have negatively affected my search ranking. Still confused.
I checked them out and most of them only have one small paragraph of text (it's a poetry site where kids post their own poetry). I assume this SR was due to not enough text on the page compared to the header/footer/menu so I deleted the footer links which were a repeat of the top menu anyway.
HOWEVER, I just checked again a few hours later and the SRs no longer show up in the inurl search so Google must be doing something with that command.
When I do site:example.com I get 6 listings out of 1,060, then the blurb about seeing omitted results. The strange thing is, when I click on the omitted-results blurb it then shows 10 results out of 837; click to the second page and it's 11-20 out of 783, etc. The home page can finally be found on page 3, and non-supplemental.
I know this isn't something new, just thought I would share.
Guess we will just have to join the waiting list with everyone else.
BTW, these results were from 72.14.207.104
Does anyone know if removing my domain completely from Google would result in a clean re-introduction at a later date (i.e. when googlebot next follows my incoming links)?
Thanks in advance
Col :-)
Jesus, get your stuff together or drop your search engine.
Supplemental results are NORMALLY results which are almost 100% the same as another page, but these days you just need the same description and the whole body of the page can be different and it's still a supplemental result (or 50% of the body text the same, in some cases). But over the last few days A LOT of good sites have become supplemental. I just pray that Vista (with its desktop search function) will arrive here in Jan-Feb, because if you tell them there is a problem they sometimes fix it the same day.
Here is what has been happening. The majority of the site's pages have continued to rank well. It has had steady traffic while all summer it has been losing pages, a few here and a few there. They mostly appear to be bottom level pages on the site. Occasionally, one or two would reappear and one or two others would disappear. Then recently Matt indicated all old (2005) supplementals would be updated by the end of August. The next day all the 2005 supplementals were gone. Replaced by other 2006 cached date supplementals to add up to about 435 pages for the site. This is close to the number of pages on the site. These new supplementals appeared in the sitemaps site:command, as well as across other IPs. The next day they were gone again. Then a couple of days ago in the sitemaps site:command and across some of the other IPs I follow the 2006 supplementals seemed to be back. The site was listed as having 435 but I could only get 160 or so to show. I tried clicking all different ways. Only 3 new supplemental pages showed at the bottom of the last page. The rest would not come up.
I thought this could be a glitch. This never changed until last evening and this morning. Now more pages are going away, a few here and a few there, each time I check, so the total number of pages is down to 384. The missing pages are likely more bottom-level pages. It isn't any of the 2006 supplementals that are still not showing, but that conclusion is based on the numbers for the pages that do show.
Here's the funny part. Traffic is still there. It must be a glitch somehow. I can understand numbers changing as Google works through its database changes. But it is strange when pages are listed but act like they aren't there. Is anyone else experiencing this problem?
Thanks,
Librarian
What may have triggered this is listing the non-www as the preferred version of the domain name in the Sitemaps preferred-domain area. We had already established that by using the 301. I may remove the choice in the Webmaster Tools area.
I thought perhaps more pages were being converted to the www version but not showing up yet.
Librarian
Google have been testing a new infrastructure that includes, among other things, an improved "site:" search. The new site: search command returns much more accurate results for sites with more than 1000 pages. So, where the old site: command might have returned 100,000 pages, the new, more accurate one, might return something like 10,000 pages.
As of last night, Google seem to have rolled this new infrastructure out across all DCs. They've done this before in recent weeks, only to revert it some hours later, so this may or may not stick. But the point is, the new, more accurate site: search is why some of you are noticing missing pages with no loss of traffic.
Now...if Google would just fix the bug/s they introduced on the 27th June maybe I could get my missing traffic back...
My site has about 5800 pages, and 5000 of them are in Supp hell. But
since Aug. 17 Google Germany has shown terrible results on the most preferred DCs, 72.14.221.104 and .99. It seems like Google now likes forums and, silliest of all, pages with classified ads like <edited>! And lots of nonsense. If this was a Data Push, then Google pushed the waste out of the dustbin. Let's cross our fingers and hope things get better.
Great job!
Edit: And this is not explicable by link popularity or on-page SEO.
<edit reason - no specifics, please>
[edited by: tedster at 7:02 pm (utc) on Aug. 21, 2006]
What is fantastic is Google's webmaster guidelines:-
Quality guidelines - basic principles: Make pages for users, not for search engines.
Yep, well that's what we did and the result is supplemental hell.
Isn't it about time you guys at Google rewrote that?
How about:-
Follow these guidelines and you'll probably not appear in our full index. This is reserved mainly for spammers.
I'm sorry to rant, but I just feel that google is not being fair to us or the user. The lack of feedback is an admission to me that they have big issues.
I don't ever recall seeing anyone say before that 72.14.221.x is their preferred IP.
Maybe Google are bringing more datacentres into the rotation.
Sorry gsm1,
Google has mostly used these two IPs over the last two weeks, in rotation with 66.249.99.x. But those latter IPs have been really rare in the last two weeks, so I said that they prefer the 72.14.221.x ones for Germany.
The presence of supplemental results may not necessarily be a bad sign. Google holds on to many 404 and expired pages, old versions of edited pages, and certain types of duplicate content as supplemental results. They cannot be removed.
Measure what is properly indexed, check the various site searches, and check my previous posts in this thread for some clues that I hope can help you.
I have sorted all of the problems for all of the sites that I watch. I realised several months ago that counting supplemental results is the wrong thing to do. This latest update has confirmed that. Supplemental results are results that Google are going to hold on to for a year before dropping (as long as the URL returns 404 or 301). I make sure that all content has only one canonical URL, and that that URL gets fully indexed. I don't worry that other URLs show as supplemental right now, just as long as those other URLs return a 301 or a 404 status code. Google will drop the 404 and 301 URLs after a year.
[edited by: Northstar at 8:23 pm (utc) on Aug. 21, 2006]
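The cleanup rule in the post above (every non-canonical URL should answer with a 301 or a 404 so Google will eventually drop it) can be sketched as a small check. The (url, status) pairs are made-up examples; in practice you would fetch the status codes yourself.

```python
# Hypothetical sketch of the rule described above: a non-canonical URL
# should redirect (301) or be gone (404). Anything serving content
# (200) is a duplicate-content problem. The data is made up.
def flag_bad_urls(non_canonical):
    """Return the non-canonical URLs that serve content instead of
    returning a 301 or 404 status code."""
    return [url for url, status in non_canonical
            if status not in (301, 404)]

checks = [
    ("http://example.com/page.html", 301),  # fine: redirects to www
    ("http://example.com/old.html", 404),   # fine: gone for good
    ("http://example.com/dupe.html", 200),  # problem: live duplicate
]
print(flag_bad_urls(checks))  # prints ['http://example.com/dupe.html']
```

Only the 200 entry is flagged; per the post above, the 301 and 404 URLs can be left alone and will drop out on their own after a year or so.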
Anyone seeing a correlation between this and the site's rankings going up soon after?
A week or so ago, I noticed the same thing, yet my rankings are still down--page 3-5. Googlebot activity has picked up drastically though (I got linked from a very popular site too) and the number of pages has increased back to about 100%, with still about 30% being supplementals.
Can you please run a site:domain.com and let's see if we figure something out?
I get about 4 results, but about 400 are not shown (Results 1 - 4 of about *** from www.domain.com for -inurl:www.) The results are the same with www and without. Even those 4 listed have the www.....com/page.html address in Google's listing. What to make of this?