Forum Moderators: Robert Charlton & goodroi
How I noticed this is that we have a huge directory of content arranged alphabetically, with each letter being a separate page (a.html, for example). From my front page I have a.html linked, and then all the content links on that page. The content that starts with the letter 'a' is all indexed. The pages like b.html and c.html are also indexed, but the individual content pages aren't.
So, what this means is that Google is assigning an overall site PR which tells it how many levels down it will index. In my limited research, it seems that a site with a front page of PR 5 will get indexed three levels down, and a site of PR 6 will get indexed four levels down. The sites below PR 5 that I have looked at are barely getting spidered.
When doing this, keep in mind that your front page counts as a level. So if you are only PR 5 and you have a huge directory, it seems you shouldn't split it up into sections; just have one huge page with the links to it all. This of course totally hoses usability, but you will get spidered.
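The arithmetic behind that advice can be sketched quickly. This is a toy calculation, not anything from Google: assuming a fixed crawl-depth budget (3 levels for PR 5, 4 for PR 6, per the theory above) and a rough number of links per page, it shows how many pages fit inside the budget.

```python
def reachable_pages(fanout, depth):
    """Pages reachable within `depth` levels, where the home page itself
    counts as level 1 (matching the thread's convention) and each deeper
    level multiplies the page count by `fanout`."""
    return sum(fanout ** i for i in range(depth))

# Hypothetical fanout of 100 links per page:
# a 3-level crawl (the guess for PR 5) covers 1 + 100 + 10,000 pages;
# a 4-level crawl (PR 6) adds another million.
print(reachable_pages(100, 3))  # 10101
print(reachable_pages(100, 4))  # 1010101
```

The point is just that one extra crawled level dominates everything above it, which is why flattening a deep directory can matter so much under this theory.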
Also, externally linked pages will get spidered: a few of the pages listed under the other letters are indexed because they are linked from blogs and other sites. This is happening across the board on my site and the others I have looked at.
Count the levels getting spidered and you will notice how deep Google is going. For me it is three levels and that's it, except for the externally linked individual pages I have seen.
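Counting levels by hand gets tedious on a big site. A minimal sketch of doing it programmatically, assuming you already have a map of which pages link to which (the page names here are made up for illustration):

```python
from collections import deque

def link_depths(links, home):
    """Breadth-first search from the home page. Depth 1 is the home page
    itself, matching the thread's 'front page counts as a level' rule."""
    depth = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Toy site shaped like the one described: home -> letter pages -> content
links = {
    "/": ["a.html", "b.html"],
    "a.html": ["apples.html", "anchors.html"],
    "b.html": ["bees.html"],
}
depths = link_depths(links, "/")
# Under the theory above, a PR 5 site would see pages at depth > 3 drop out.
```

Comparing the computed depths against which URLs show up in a site: search would tell you where your own cutoff sits.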
[edited by: tedster at 6:16 pm (utc) on May 22, 2006]
[edit reason] formatting [/edit]
All our sites have followed a pattern of non indexing below a certain level.
We have 3 sites with identical structures on different regional domains, offering separate content and different page structures.
All pages are below 80k and accessible via a sitemap submitted through Google SiteMaps.
The total pages of each is around 57,000.
We have 5 levels
All pages are linked from at least 2/3
All pages have already [ or will have ] at least 1 IBL from a relevant site.
We have around 250 reciprocal links per site, mainly onto our home page.
We have around 60 IBL's into the home page
All sites rebounded from "supplemental hell" with full indexing around mid April.
All sites have had their indexing withdraw systematically to different levels: 4, 3, and 2 respectively.
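For anyone setting up the kind of sitemap submission mentioned above, generating the file is simple. A minimal sketch, using made-up example URLs; the element names follow the sitemaps.org protocol that Google SiteMaps accepts:

```python
from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Build a bare-bones XML sitemap string from a list of page URLs."""
    entries = "".join(
        f"<url><loc>{escape(u)}</loc></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        f"{entries}</urlset>"
    )

xml = build_sitemap(["http://example.com/", "http://example.com/a.html"])
```

Note that, per the reports in this thread, a submitted sitemap gets pages crawled but does not by itself seem to keep deep pages in the index.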
Is anyone seeing the same pattern?
Whitey
Is it possible that those having difficulties have done a major restructuring of their sites in the past? Changing page location in the directory structure, changing page location in the linkage structure, or, from my perspective the worst thing, renaming a page.
The history of these structural changes may be impacting how you are indexed now.
Certainly renamed pages seem to live forever in the Supplemental index and may impact how the current page is indexed. Of course Google certainly considers the directory path as part of a page's name to some extent. Likewise the linkage path to a certain page may be recorded historically.
Google clearly "remembers" a given page based solely on its content, then perhaps considers the number and types of restructurings of the navigation and directory structure leading to these pages as part of its ranking mechanism.
These historical changes may trigger scraper site filters (defective filters!). I think alphabetical naming structures for directory and linkage have caused numerous problems historically for many legitimate sites, simply because this structure is so similar to an automated scraper site structure. Didn't Wikipedia get booted from the index for a while? Just one example.
Perhaps having multiple linkage structures to all pages shows an effort at organization that an automated site typically might not have.
Perhaps content dependent (random, no structure) internal linking, if present, helps guarantee indexing as well. Links back up and across the site structure based on content with pertinent anchor text, etc. This seems to happen naturally over time in a site with very good content.
So the primary point was:
Could historical changes in site structure impact your current indexing difficulties?
(What to do? I have no idea!)
It is going to mean more link racing for webmasters. And now one-way links instead of recips. If everyone is seeking a one-way link, soon who will link to whom?
Does this help the machine crisis at G?
On one of my sites, there are three of us in the niche that are legit; the other 7 or so are spammers with a small percentage of on-topic pages (but they are stuffed/built in such a way as to look like more).
The three of us competing had a little IM meeting the other day. Somebody brought something up which I didn't like at the time, but that I'm now reconsidering: setting up a fake site with private registration (our sites are in our names or our companies' names) on one of those cheapy $3-a-month hosting plans, or even using some of the free blog services, and just having it point to stories on our three sites, with fake comments about the stories but without any links back to the fake site.
It's disturbing/misleading, but if this continues and Google has changed things to favor that kind of setup, we may have to resort to some things that many of us would prefer not to, but you have to fight the spammers (and now apparently Google) somehow.
That's probably the last thing you'll want to do. Knee-jerk reactions to changes in the algo have caused more problems for more people than almost anything else. At this point all you can do is make sure your own yard is cleaned up and wait for the next big update. If you are scrambling and making changes on the assumption that it was this or that, you may just be piling on the issues you'll have to contend with next month and the month after that.
I know, some of you can't wait. Well, that's when the risk factor increases and you do things that may not be in your best interest. Patience is key and if Google has problems with their indexing, they'll have to fix it or risk losing some of their market share.
Until we see the media reporting on this stuff on a regular basis, it doesn't matter as Google still has that 45%+ market share that doesn't know what the heck is going on with a small group of websites. Nor do they care. :(
I think for almost all the cases I have heard, the problems really lie in the PR and link destruction of the sites. Meaning that because of deindexing and changes in the algo, you are only going to get indexed so much by Google. I just really don't believe it comes from bad design and interlinking, but from an external change. So you need to get your main content higher up the hierarchy on your site for the short term, and then begin getting more "quality" inbound links again, so that once your PR (or whatever they call this new link rep) repairs, you can get indexed deeper. I think it is a pretty simple process, but you might have to wait a long time if you are only getting indexed one or two levels now. For those only getting indexed two levels, there isn't really a quick fix, as you can't put direct links to all your content on the front page. For that, it seems it will just have to be a lot of work getting good links.
Anyway looking through all the pages on my site that are cached shows March dates as well so the only fresh page they are doing is the homepage for now.
lifesupport:www.mydomain.com -pleasedontnuke :-)
The spammers are still doing just fine, so I doubt AS/AW has taken a hit.
I notice that I'm down to dropping 2-4 pages every day over the past few days on one site. That's about the number of pages I typically add in a day. It's almost like Google has a cache and it's slowly expiring anything older than a certain date.
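The symptom described above is exactly what cutoff-based cache eviction looks like. A toy sketch, pure speculation about Google's internals with made-up URLs and day numbers, just to illustrate the pattern:

```python
def expire_old_pages(index, cutoff_day):
    """Keep only pages last crawled on or after cutoff_day.
    `index` maps URL -> day number of the last crawl."""
    return {url: day for url, day in index.items() if day >= cutoff_day}

index = {"/old-1.html": 100, "/old-2.html": 101, "/fresh.html": 140}
index = expire_old_pages(index, cutoff_day=120)
# As the cutoff advances one day at a time, a few of the oldest pages
# fall out each day, roughly matching the steady 2-4 page drop reported.
```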
Traffic is back up to where it was before and I am feeling good. I would suggest what I did to anyone having problems getting indexed deeper after this stupid update. Raise your content up a level (keeping usability, of course) and you should get indexed. Whatever you do, the user should not be inconvenienced, but adding large article indexes at the bottom as a resource should be fine and should yield results quickly.
What happens when our links pages get removed?
Is this going to have a negative effect on our link partners? Their PR drops, so their pages start to get deindexed, and so on and so on?
A site: search reveals just TWO pages listed; the main index page, and the only internal page that has any external incoming links.
All the other pages are blown away, except for four pages with images (and very little text) that are listed as Supplemental Results in some searches.
The factors for this site? No idea, but PR 2 internal pages (PR 4 main index), "old skool" bloated HTML code, poor site navigation, and some duplicate titles and/or meta descriptions probably do not help matters at all.
Several other people that I know, people who have well-structured sites, breadcrumb navigation, lean HTML code with external CSS, unique title and meta description on every page, etc, have 90 to 100% of their pages indexed and are doing just fine.
So from what I can tell it's age and not beauty that counts. Your mileage may vary.
On the second site, an older one, pages have dropped from 10k to now just 521. I only just discovered that if I run site:www.mysite.com "my keyword", it will return all of the pages containing the keyword and then some, up to 20k and more. Different keywords give very different results, but so far, checking today's traffic, it is actually up around normal levels from the content pages, many of which do not show up in the standard site: command. The same test on the other site did not give the same result, so for the time being we think that site might be lost.