Forum Moderators: open


Lost Index Files

         

Napoleon

11:10 am on Jun 23, 2003 (gmt 0)



Some people may not like this post, and criticism of it would not at all surprise me. However, I suggest that before reading it you ask yourself whether Google actually DESIRES the current fluctuating situation. Is it where they actually want to be? Is it what they want to project to webmasters and the public?

Against that background perhaps the following analysis and theory may fall into place more easily.

DATA ANALYSIS AND BACKGROUND
Last week I posted a message requesting members to sticky mail details of their own specific situations and sites with respect to the fluctuations.

After spending days analyzing, and watching the picture continue to change before my eyes, I eventually found a theory to hang my hat on. No doubt it will be challenged, but at least it currently fits the data bank I have (my own sites, plus a third party observation set I use, plus those that were submitted to me by the above).

Two general phenomena seem to be dominating the debate:

a) Index pages ranking lower than sub-pages for some sites on main keyword searches

b) Sites appearing much lower than they should on main keyword searches, yet ranking highly when the &filter=0 parameter is applied.

These problems are widespread and there is much confusion out there between the two (and some others).

The first has probably attracted most attention, no doubt because it is throwing up such obvious and glaring glitches in visible search returns (eg: contact pages appearing as the entry page to the site). The second is less visible to the searcher because it simply torpedoes the individual sites affected.

By the way, in case anyone is still unaware, the &filter=0 parameter reverses the filter which screens out duplicate content. Except it does more than that.... it is currently screening out many sites for no obvious reason (sites that are clearly clean and unique).
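For anyone who wants to compare the two views side by side, here is a minimal sketch of building both query URLs (the helper function and example query are mine; the only piece taken from observation is the filter=0 parameter itself):

```python
from urllib.parse import urlencode

def google_search_url(query, unfiltered=False):
    """Build a Google results URL; filter=0 disables the
    duplicate-content filter (plus, apparently, whatever else
    it is currently screening out)."""
    params = {"q": query}
    if unfiltered:
        params["filter"] = "0"
    return "http://www.google.com/search?" + urlencode(params)

print(google_search_url("blue widgets"))
print(google_search_url("blue widgets", unfiltered=True))
```

Running both searches for an affected site and diffing the positions is the quickest way to tell problem (b) apart from problem (a).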

So why is all this happening? Is there a pattern, and is there a relationship between these two and the other problems?

Well at first I wrestled with all sorts of theories. Most were shot down because I could always find a site in the data set that didn't fit the particular proposition I had in mind. I checked the obvious stuff: onsite criteria, link patterns, WHOIS data... many affected sites were simply 'clean' on anyone's interpretation.

Throughout though, there was the one constant: none of the sites affected were old (eg: more than 2 years) or at least none had old LINK structures.

This seemed ridiculous. There would be no logic to Google treating newer sites in this manner and not older ones. It is hardly likely to check the date when crawling! But the above fact was still there.

I have been toying with all sorts of ideas to resolve it... and the only one that currently makes any sense is the following.

THE GOOGLE TWILIGHT ZONE
In addition to WebmasterWorld I read a number of search blogs and portals. On one of these (GoogleWatch) a guy called Daniel Brandt quotes GoogleGuy as stating: "That is, we wind down the crawl after fetching 2B+ URLs, and the URL in question might not have been in that set of documents".

Now, assuming that is true (and it's published on the website so I would imagine it isn't invented), or even partially true, all sorts of explanations emerge.

1) The 2BN+ Set
If you are in here, as most long standing and higher PR sites will be, it is likely to be business as usual. These sites will be treated as if they were crawled by the old GoogleBot DEEP crawler. They will be stable.

2) The Twilight Set
But what of the rest? It sounds like Google may only have partial data for these, because the crawlers 'wound down' before getting the full picture. Wouldn't THAT explain some of the above?

To answer this question we need to consider Google's crawling patterns. One assumes that they broadly crawl down from high PR sites. They could also crawl down from older sites, sites they know about and sites they know both exist and are stable. That too would make sense.

You can probably see where this is heading.

If your site or its link structure is relatively new, and/or say PR5 or below, you may well reside in the twilight zone. Google will not have all the data (or all the data AT ONCE) and you will be experiencing instability.

I have sites in my observation set that enter and exit both the problem sets above (a) and (b). It's as though Google is getting the requisite data for a period and then losing some of it again. As if the twilight zone is a temporary repository, perhaps populated and over-written by regular FreshBot data.

The data most affected by this is the link data (including anchor text) – it seems to retain the cache of the site itself and certain other data. This omission would also partially explain the predominance of sub-pages: with the loss of this link data there is nothing to support the index page above those sub-pages (Google is having to take each page on totally stand-alone value).

IS IT A PROBLEM?
I also wonder whether Google sees all of this as a problem. I certainly do. Problem (a) is clearly visible to the searching public. They DON'T want to be presented with the links page for example when they enter a site! That is a poor search experience.

Do they see (b) as a problem? Again, I do. Sites are being filtered out when they have no duplicate content. Something isn't right. Google is omitting some outstanding sites, which will be noticeable in some cases.

The combination of (a) and (b) and perhaps other less well publicized glitches gives a clear impression of instability to anyone watching the SERPS closely (and that's a growing body of people). Together they are also disaffecting many webmasters who have slavishly followed their content-content-content philosophy. As I implied the other day, if following the Google content/link line gets them nowhere at all, they will seek other SEO avenues, which isn't good for Google in the long term.

WHY HAVE A TWILIGHT ZONE?
Some people speculate that there is a software flaw (the old 4 byte / 5 byte theory for URL IDs) and that consequently Google has a shortage of address space with which to store all the unique URL identifiers. Well... I guess that might explain why a temporary zone is appealing to Google. It could well be a device to get around that issue whilst it is being solved. Google though has denied this.
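A quick back-of-envelope shows why the 4-byte theory is tempting despite the denial (this is arithmetic, not evidence):

```python
# Capacity of an n-byte document ID. Pure speculation as applied to
# Google -- they have denied the shortage -- but note that a *signed*
# 4-byte ID caps out at ~2.1 billion, suspiciously close to the
# "2B+ URLs" figure quoted above, while 5 bytes gives ~1.1 trillion.
def id_capacity(num_bytes, signed=False):
    bits = num_bytes * 8 - (1 if signed else 0)
    return 2 ** bits

print(id_capacity(4))               # 4,294,967,296
print(id_capacity(4, signed=True))  # 2,147,483,648
print(id_capacity(5))               # 1,099,511,627,776
```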

However, it may equally be a symptom of the algorithmic and crawler changes we have seen recently. Ditching the old DeepBot and trying to cover the web with FreshBot was a fundamental shift. It is possible that for the time being Google has given up the chase of trying to index the WHOLE web... or at least FULLY index it at once. Possibly we are still in a transit position, with FreshBot still evolving to fully take on DeepBot responsibility.

If the latter is correct, then the problems above may disappear as Freshbot cranks up its activity (certainly (a)). In the future the 'wind down' may occur after 3BN, and then 4BN.... problem solved... assuming the twilight zone theory is correct.

At present though those newer (eg: 12 months+) links may be subject to ‘news’ status, and require refreshing periodically to be taken account of. When they are not fresh, the target site will struggle, and will display symptoms like sub-pages ranking higher than the index page. When they are fresh, they will recover for a time.

VERDICT?
Certainly evidence is mounting that we have a temporary zone in play. Perhaps problem (b) is simply an overzealous filter (very overzealous indeed!). However, problem (a) and other issues suggest a range of instability that affects some sites and not others. Those affected all seem to have the right characteristics to support the theory: relatively new link structure and/or not high PR.

The question that many will no doubt ask is: if this is correct, how long will it last? Obviously I can't answer that. All I have put forward is a proposition based upon a reasonable amount of data and information.

I must admit, I do struggle to find any other explanation for what is currently happening. Brett’s ‘algo tweak’ suggestion just doesn’t stack up against the instability, the site selection for that instability, or the non-application to longer established sites.

The above theory addresses all those, but as ever…. if anyone has a better idea, which accounts for all the symptoms I have covered (and stands up against a volume of test data), I’m all ears. Maybe GoogleGuy wishes to comment and offer a guiding hand through these turbulent times.

Anon27

4:10 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



Yes, but will it be there 5 minutes from now? : )

Most likely not...

But it is a good sign to see it again.

dvduval

4:13 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's been there for about 12 hours. I'll keep my fingers crossed.

Note: I used the Google "site submit" page a couple of times to make sure I was on the list (even though I have several hundred incoming links).

PollyG

4:18 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



I had the missing index file problem but it reappeared yesterday and now our results combined with the sub-page results are great.

I'm sure it's a coincidence but the index file re-appeared the day after updating it with some new product news.

skipfactor

4:30 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is the first time I've seen my index match its allinanchor ranking in the -fi SERPs for both of its 2 optimized keyphrases. Definitely a good sign.

my3cents

4:45 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



I'm seeing it too, but it seems to be for only particular keywords; others still show different paths to the same file, etc.

I guess this would debunk the theory that it is something as simple as flicking a switch: if the whole index could recognize when it had the same file indexed under different paths, then results for different terms at that dc would all show the effects of merging them into one.

It's a good sign none the less, let's hope this progresses instead of stalling and reverting again.

mrguy

4:47 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry,

I'm not seeing it for my watched SERPS.

Still looks all screwed up here!

mfishy

4:47 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



skipfactor,

I'm getting the same for the first time but for dc and va :)

The weirdest part about this update is that on some datacenters I'm #1 and #2 and on others I'm #1,000,000 :)

It's not a small fluctuation of SERPS

my3cents

4:53 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



no it's not small fluctuations, what I am seeing is that for certain terms on -fi , index pages have returned to where they have always been, while on other terms they are still several pages down.

I wonder if Napoleon is noticing the same thing?

my3cents

NovaW

4:55 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



With google it's like the clock chimes 12 and the princess turns back into a pumpkin.

Yesterday I posted that for the 1st time since Dominic - the rankings were back to normal across all datacenters. Well the clock chimed and it's back to where it was - buried.

What seems significant is although there has been a constant flux with steady improvement towards pre-dominic standings - yesterday was the 1st time I saw a consistent & fast change across all datacenters & then a change back 24hrs later.

It's kind of annoying to say the least.

Quite often I do a search, find a site, and then the following day find that site again by doing the same search. Maybe it's just me, but when you can't find the site 24hrs later it's irritating - I don't see how moving the results around like a yoyo is good for Google or anybody using Google.

Alphawolf

4:57 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



Brett,

Why does everyone always forget se history?

You can't forget what you weren't around to see. I've been a member here just 6 months now.

To many of us this is the first big change we have noticed, because our SE radar wasn't finely tuned until we joined WW.

AW

dvduval

5:10 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To many of us this is the first big change we have noticed, because our SE radar wasn't finely tuned until we joined WW.

Exactly. To make a similar comparison, there are always more undergraduate students than graduate students.

I hope WebmasterWorld never loses sight of that!

customdy

5:14 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



"With google it's like the clock chimes 12 and the princess turns back into a pumpkin. "

For us the clock only gets to chime 3 times... I know I need to relax and stop checking every 10 minutes, but I am seeing our SERPs on www change at least 4-5 times a day.

Results on fi looking good, for now anyway..

illad

5:15 pm on Jun 26, 2003 (gmt 0)



I think Brett's comment was amusing. He was trying to quell people's fears by pointing out other engines that attempted to "improve" their algo. Unfortunately these engines all failed horribly in their efforts and were brought to the brink of non-existence by their changes. Omen?

kstprod

5:16 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



Just to add, I am also seeing changes for the better. Not the best, but by far better than this past week.

Before the "mess" - index was #6
At the beginning of the "mess" - index was #6 in -in and -sj (other dc's gone)
This past week - index completely gone (or buried DEEP)
Today - index is back (with June 25 Fresh tags), but now at #22 in all DC's and www2 and www3

I didn't change one thing on my site during this time. I have absolutely NO idea what to make of this change. Don't get me wrong, #22 is better than buried, but my normal ol' #6 is way better than #22.

I sure wish we knew something concrete, so I knew if I should start working on getting that #22 closer to #1.

Hopefully, everyone will start seeing these changes and we can all get back to normal. :)

Napoleon

5:18 pm on Jun 26, 2003 (gmt 0)



>> I wonder if Napoleon is noticing the same thing? <<

Some affected sites are OK at some centers, and others at other centers. In and out, all over the place, as if each center is acting independently of the others.

It seems that once a site is in the twilight zone it just won't stick.... but it won't die either.

The man who knows (GG) is obviously saying nothing. That change of behaviour really has me wondering to be honest.
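For what it's worth, the per-datacenter bookkeeping people in this thread are doing by hand reduces to something like the sketch below (the datacenter labels and result lists are invented placeholders; actually fetching each center's SERP is left out):

```python
# Hand-rolled rank tracking across datacenters. The toy data mirrors
# the symptoms in this thread: present at one center, gone at another,
# and a sub-page outranking the index at a third.
DATACENTERS = ["www-fi", "www-in", "www-sj"]  # hypothetical labels

def rank_of(site, results):
    """1-based position of `site` in one datacenter's result list,
    or None if it is buried/absent."""
    for pos, url in enumerate(results, start=1):
        if site in url:
            return pos
    return None

serps = {
    "www-fi": ["example.com/", "other.com/"],
    "www-in": ["other.com/", "another.net/"],
    "www-sj": ["other.com/", "example.com/contact.html"],
}

for dc in DATACENTERS:
    print(dc, rank_of("example.com", serps[dc]))
```

Run daily, a table like this makes the in-and-out pattern Napoleon describes visible at a glance.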

chiyo

5:25 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



illad wrote:>>Unfortunately these engines all failed horribly in their efforts and were brought to the brink of non-existence by their changes. Omen? <<

This is a webmaster urban myth that still persists. While some of their demise may have been due to changing algos (and webmasters getting confused, upset and angry), much more significant was overportalization and the blending of paid ads straight into their SERPS. Google just provided a far more competitive service for users at the time by going back to basics.

Nope, no omen at all.

my3cents

5:55 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



nice post chiyo. I'm a google fan, like many, and I have hope this will work out for the best in the long term.

Let's not underestimate the power of word of mouth though. I'm noticing slightly increased traffic from other engines. With six degrees of separation, how long does it take Joe Searcher to spread the word when he starts having problems?

Look how fast word of mouth spread about how great the results were at google when spam was completely dominating INK and the first three pages were all duplicate sites?

I realize G is not nearly that bad, but all it takes is for 2% of the Joe Searchers out there to spread the word and before you know it....

.... I hope this gets ironed out soon, think of the thousands of clients alone that have been told by their SEO that something is wrong with Google, a lot of those folks are telling their friends and word is spreading. Whether there is anything to this or not, word of mouth spreads fast and people tend to believe their friends over advertising, marketing or stock brokers.

mfishy

6:04 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How can Google possibly be good anymore? My site doesn't come up #1! And it's the best damn posters affiliate page on the web. I even traded for backlinks to prove it. :)

Google is still very good. It has taken a few minor steps back as of late, however. Yahoo! is not to be taken lightly as a competitor either.

2_much

6:43 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ok, I've done my best to read through this thread but it's a bit long, so sorry if I missed some stuff.

Here's what I do. I'm not tracking every data center. I'm not looking at every fluctuation.

I am going to google.com and searching for my primary keywords. Then I check the anchor text of the backward links for each keyword, and do a quick check on the on-site criteria. I throw all this into an Excel sheet. I also check some other competitive affiliate keywords just to compare. I do it manually because I learn as I do it, instead of using a tool to compile the data.

Then I run it on a webmap tool.

Voila. It all makes sense.

I for one love trying to decode NFFC's posts. Lots of wisdom in them.

Anon27

6:58 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



A little OT, but the current situation with Google results hit the mass media this morning on New York radio during prime drive-time.

At approximately 7:50 AM, Elvis Duran of the Z100 Morning Zoo Crew (New York City's #1 station) announced his displeasure with the current Google search results, stating "Google is all screwed up". He gave several hilarious search results for certain search terms.

I heard it all live, then turned to my wife and said: "see, I told you!"

cline

7:16 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have some confirmation of the theory about duplicate title tags.

I've got a site badly affected by the lost index problem. It also has lots of duplicate title tags (an IT-department efficiency drive induced that). Changing the title tags has improved the site's ranking, but I still can't get Google to like the index page.

my3cents

7:26 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



word of mouth + media coverage

you do the math

As funny as I think that Zoo Crew story is, I hate to see it happening, just when Google had the market cornered.....

my3cents

steveb

9:38 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Why not post the detailed rationale of why you think it is an algo change"

Because obviously one doesn't exist. Again, these algo change comments are laughable. Dropping a glass on the floor and watching it shatter is not "changing its algo".

There is little evidence of an algo change *except* the higher valuation of "fresh". That is a very significant change, but you don't change an algo by botching a deepcrawl and publishing that failure for the world to see (backlinks). You don't change the algo by stupidly listing newhoo.com backlinks 5% of the time instead of dmoz.org ones.

The fact of the matter is that these current spammy/fresh results are temporary heaven for pure, content-less SEO nonsense. Put new pages on established domains. Rule the serps. End of story.

Google is objectively broken, and some people are temporarily benefitting from it. Fine, but it is simply ludicrous to suggest that a search engine that has a pagerank display on its toolbar has not assigned rank to any new pages since April 15th, and that what is displayed is calculated from *before* that. Etc., etc. There are at least four major, ongoing data failures at Google right now that at best render any algo change utterly trivial. It is exactly like talking about re-arranging deck chairs on the Titanic.

So can we get real here? There are two things to talk about: the ways Google is broken -- which is like beating a dead horse now -- and how to temporarily benefit from the extremely poor and volatile index Google is forced to put before the public. And again the secret to that is: make fresh pages... new pages, or simply move text around on older pages. This won't help the post-March 15th sites who are still in no man's land, but for established sites it is extremely easy to get top ten rankings for moderate search terms (not the really tough ones though).

Alphawolf

9:42 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



At approximately 7:50 AM, Elvis Duran of the Z100 Morning Zoo Crew (New York City's #1 station) announced his displeasure with the current Google search results, stating "Google is all screwed up". He gave several hilarious search results for certain search terms.

I heard it all live, then turned to my wife and said: "see, I told you!"

:)

That's cool. So, you listen to Z100, eh? I am a 92.3, 92.7 102.3 102.7 sorta guy myself. OK- I also hit 95.5 and 100.3 every now and again.

Best- your own CD. ;)

AW

mrbrad

9:56 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



As funny as I think that Zoo Crew story is, I hate to see it happening, just when Google had the market cornered.....

I think that story is GREAT! And I don't think it is an isolated incident anymore.

Over the last few days I have been hearing more people complain about Google.
The complaint I'm starting to hear is: "I found this great site on Google the other day, and when I went back to Google to show somebody else the site, it was gone; we couldn't find the site anymore."

I used to think the average surfer wouldn't notice SERPS being bad at Google, but now I am starting to think otherwise.

This is what happens to giant companies with no competition, they get careless and end up falling on their face.

We could be witnessing an awful train wreck here and another chapter in history ala Northern Light and Alta Vista.

tracylee

10:30 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



"That's cool. So, you listen to Z100, eh? I am a 92.3, 92.7 102.3 102.7 sorta guy myself. OK- I also hit 95.5 and 100.3 every now and again.

Best- your own CD. ;) "

Nah, XM all the way!

steve128

10:34 pm on Jun 26, 2003 (gmt 0)



>I heard it all live, then turned to my wife and said: "see, I told you!" <

My wife said...well there is always MSN,,,screech ..swerve...blank look . lol

Anon27

2:41 am on Jun 27, 2003 (gmt 0)

10+ Year Member



My wife said...well there is always MSN,,,screech ..swerve...blank look . lol

My wife is much less understanding. She wants to quit her job, and every morning for the past two months I have been trying to explain to her why things are very unstable...

When she heard it on the radio from the #1 DJ in NYC, it saved my marriage, well..., maybe not.

cabbie

3:03 am on Jun 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>When she heard it on the radio from the #1 DJ in NYC, it saved my marriage, well..., maybe not.<
LOL...ditto ANON27
A BIT OT, BUT my wife thinks I sit on my bum all day with my legs up on the desk, having fun. Why should she have to work when I am making enough?
"Because I don't make any money except from Google, and who knows how long that's going to be for."
Alan

Stefan

3:06 am on Jun 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry to get talking about lost index pages again, but for any of those experiencing my particular problem, which is directory sites linking/listing URLs of my domain in the form www.mysite.org/?theirdomain.com and thereby duplicating my site for the SEs: you might find that the webmasters responsible have no clue as to the damage they're causing. They put the link in that form because they're obsessed with affiliates or something, mash your SERPs, and have no idea that it's even happening. If you act riled enough and send many emails you will eventually make progress.

Thank you for letting me share this with all of you. I'm going back to the twilight zone now.
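For anyone facing the same thing, one defensive fix on the receiving end is to 301 any home-page request that arrives with a junk query string back to the canonical URL, so the engines fold the variants together. A sketch of the check (the domain is the placeholder from the post; in practice this logic would live in your server config or page scripts):

```python
from urllib.parse import urlsplit

CANONICAL_HOST = "www.mysite.org"  # placeholder domain from the post

def canonical_redirect(request_url):
    """Return the canonical URL to 301 to if `request_url` is a
    query-string duplicate of the home page, else None (serve as-is)."""
    parts = urlsplit(request_url)
    if parts.path in ("", "/") and parts.query:
        return "http://%s/" % CANONICAL_HOST
    return None

print(canonical_redirect("http://www.mysite.org/?theirdomain.com"))
print(canonical_redirect("http://www.mysite.org/"))
```

That way, even if the directory never fixes its link, the duplicates collapse back to one URL as they get recrawled.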

This 345 message thread spans 12 pages.