Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 200-message thread spans 7 pages; this is page 3.
Some big observations on dropped pages

 5:01 pm on May 22, 2006 (gmt 0)

I have been trying to figure out why my site dropped from 57,000 pages down to only 700. Today I noticed a huge pattern, and barring something major, I believe it is the reason for the dropped pages. First, I noticed that all pages up to three levels deep are indexed. Any pages indexed deeper than that are externally linked in some way.

I noticed this because we have a huge directory of content arranged alphabetically, with each letter on a separate page (a.html, for example). From my front page I link to a.html, and a.html carries the links to all of its content. The content that starts with the letter 'a' is all indexed. Pages like b.html and c.html are also indexed, but their individual content pages aren't.

So what this means is that Google is assigning an overall site PR that determines how many levels down it will index. In my limited research, it seems that a site with a PR 5 front page gets indexed three levels down, and a PR 6 site gets indexed four levels down. The sites below PR 5 that I have looked at are barely getting spidered.

When counting levels, keep in mind that your front page counts as one. So if you are only PR 5 and have a huge directory, it seems best not to split it into sections; just have one huge page with links to all of it. This of course totally hoses usability, but you will get spidered.

Also, externally linked pages will get spidered: a few of the pages listed under the other letters are indexed because they are linked from blogs and other sites. This is happening across the board on my site and on the others I have looked at.

Count the levels getting spidered and you will notice how deep they are going. For me it is three levels and that is it, apart from the externally linked individual pages I have seen.
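Counting levels like this amounts to a breadth-first search from the front page. A minimal sketch, assuming you already have your internal links as a Python dict; the `crawl_levels` and `pages_beyond` helpers and the example pages are hypothetical. Following the convention above, the front page is level 1. The depth cutoff is only the observation in this post (three levels at PR 5), not a documented Google rule.

```python
from collections import deque

def crawl_levels(links, home):
    """Return {page: level} via breadth-first search; the home page is level 1."""
    levels = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:  # BFS reaches each page first by its shortest click path
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels

def pages_beyond(levels, max_level):
    """Pages deeper than max_level, i.e. those an assumed depth cutoff would skip."""
    return sorted(p for p, lvl in levels.items() if lvl > max_level)

# Hypothetical site: the front page links only to a.html,
# and a.html links to its content and to b.html.
site = {
    "index.html": ["a.html"],
    "a.html": ["apple.html", "b.html"],
    "b.html": ["banana.html"],
}
levels = crawl_levels(site, "index.html")
print(levels)
print(pages_beyond(levels, max_level=3))  # ['banana.html']
```

In this toy layout, banana.html sits at level 4, which is the kind of page the first post reports as missing.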

[edited by: tedster at 6:16 pm (utc) on May 22, 2006]
[edit reason] formatting [/edit]



 7:41 am on May 24, 2006 (gmt 0)

So what would be the best way to do a sitemap for, oh, about 2,000 pages? Would you limit it to a certain number of links and split it up somehow, or do one big sitemap? Given the indexing problem, we would have to have a sitemap directly off the home page that links to the bulk of the pages below (keeping everything at the third level). We have 15 main sections and about 200 subsections, with around 1,700 articles within those subsections. Any ideas on how to do that?
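One possible layout: the home page (level 1) links to a set of HTML sitemap pages (level 2), and each of those lists a slice of the articles (level 3). A minimal sketch, assuming a hypothetical cap of 100 links per sitemap page and hypothetical URLs such as /sitemap-1.html; neither number nor naming comes from Google.

```python
import math

def plan_sitemap_pages(article_urls, max_links=100):
    """Split article URLs into sitemap pages of at most max_links each.

    Returns {sitemap_url: [article_url, ...]}. The home page is assumed to
    link to every sitemap page, so home (1) -> sitemap (2) -> article (3).
    """
    pages = {}
    page_count = math.ceil(len(article_urls) / max_links)
    for i in range(page_count):
        chunk = article_urls[i * max_links:(i + 1) * max_links]
        pages[f"/sitemap-{i + 1}.html"] = chunk
    return pages

articles = [f"/articles/{n}.html" for n in range(1700)]
plan = plan_sitemap_pages(articles)
print(len(plan))  # 17 sitemap pages, each linked from the home page
```

The trade-off from the first post still applies: raising `max_links` means fewer sitemap pages for the home page to link to, at some cost to usability.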


 8:46 am on May 24, 2006 (gmt 0)

Probably a complete coincidence, but I have a couple of .com sites that still have all their pages in a site: search, while my .co.uk sites have 90% missing. Am I missing something here?


 10:12 am on May 24, 2006 (gmt 0)

I can report an observation that may help.

All our sites have followed a pattern of non indexing below a certain level.

We have 3 sites with identical structures on different regional domains, offering separate content and different page structures.

All pages are below 80k and accessible via a sitemap submitted through Google Sitemaps.

Each site has around 57,000 pages in total.
We have 5 levels.
All pages are linked from at least 2/3
All pages already have (or will have) at least 1 IBL from a relevant site.
We have around 250 reciprocal links per site, mainly to our home page.
We have around 60 IBLs into the home page.

All sites rebounded from "supplemental hell" with full indexing around mid April.

Indexing on all sites has since withdrawn systematically to different levels: 4, 3, and 2.

Is anyone seeing the same pattern?
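With around 57,000 pages per site, any XML sitemap that covers all of them has to be split: the sitemaps.org protocol allows at most 50,000 URLs per file, and multiple files are tied together by a sitemap index. A minimal sketch, assuming the sitemaps.org 0.9 namespace and hypothetical file names under www.example.com.

```python
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit in the sitemaps.org protocol

def build_sitemaps(urls, base="https://www.example.com"):
    """Return (files, index_xml), where files maps file name -> <urlset> XML."""
    files = {}
    for n, start in enumerate(range(0, len(urls), MAX_URLS), start=1):
        entries = "".join(
            f"<url><loc>{escape(u)}</loc></url>" for u in urls[start:start + MAX_URLS]
        )
        files[f"sitemap-{n}.xml"] = (
            '<?xml version="1.0" encoding="UTF-8"?>'
            f'<urlset xmlns="{NS}">{entries}</urlset>'
        )
    # The index points at each file; it is what you would submit.
    index_entries = "".join(
        f"<sitemap><loc>{escape(base)}/{name}</loc></sitemap>" for name in files
    )
    index_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<sitemapindex xmlns="{NS}">{index_entries}</sitemapindex>'
    )
    return files, index_xml

urls = [f"https://www.example.com/page-{i}.html" for i in range(57_000)]
files, index_xml = build_sitemaps(urls)
print(list(files))  # ['sitemap-1.xml', 'sitemap-2.xml']
```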



 11:54 am on May 24, 2006 (gmt 0)

One thing I've seen hints of in these posts is a history of site-structure changes: linkage, directory layout, and perhaps even page naming.

Is it possible that those having difficulties have done a major restructuring of their sites in the past? Changing a page's location in the directory structure, changing its location in the linkage structure, or, from my perspective the worst thing, renaming a page.

The history of these structural changes may be impacting how you are indexed now.

Certainly renamed pages seem to live forever in the Supplemental index and may impact how the current page is indexed. Of course Google certainly considers the directory path as part of a page's name to some extent. Likewise the linkage path to a certain page may be recorded historically.

Google clearly "remembers" a given page based solely on its content, then perhaps considers the number and types of restructurings of the navigation and directory structure leading to these pages as part of its ranking mechanism.

These historical changes may trigger scraper site filters (defective filters!). I think alphabetical naming structures for directory and linkage have caused numerous problems historically for many legitimate sites, simply because this structure is so similar to an automated scraper site structure. Didn't Wikipedia get booted from the index for a while? Just one example.

Perhaps having multiple linkage structures to all pages shows an effort at organization that an automated site typically might not have.

Perhaps content-dependent (random, no structure) internal linking, if present, helps guarantee indexing as well: links back up and across the site structure based on content, with pertinent anchor text, etc. This seems to happen naturally over time on a site with very good content.

So the primary point was:

Could historical changes in site structure impact your current indexing difficulties?

(What to do? I have no idea!)


 12:05 pm on May 24, 2006 (gmt 0)

Well, I don't think this post will be alive for long, but as long as the mods don't delete it, I'd like to share a great secret (maybe not for all of us)... Google's "vanishing ladies trend" gives clever webmasters an opportunity: if they know which pages have dropped out of the index and those pages have great content, they can copy and paste that content and create new websites or blogs on different TLDs. The same thing applies with Yahoo... I personally rank at top positions on Yahoo with stolen content from huge websites that have been banned from them....
any comments?


 12:29 pm on May 24, 2006 (gmt 0)

"huge websites that have been banned from them"
And I can understand why they have been banned from Yahoo: some wiki-like sites from a small country in the Benelux, with zillions of spam links (90% from blog spam), rank at the top of Google's SERPs for city travel guide or city accommodation searches with 0 content, or better to say "66 nil point".


 12:40 pm on May 24, 2006 (gmt 0)

I have a comment. If you rank by stealing content I wish bad upon you.


 1:15 pm on May 24, 2006 (gmt 0)

It seems like this is going to create a lot of problems with stolen content from low PR sites. It may even encourage it.

It is going to be more link racing for webmasters. And now one-way links instead of recips. If everyone is seeking a one-way link soon, who will link to whom?

Does this help the machine crisis at G?


 1:26 pm on May 24, 2006 (gmt 0)

>> It is going to be more link racing for webmasters. And now one-way links instead of recips. If everyone is seeking a one-way link soon, who will link to whom?

In one of my sites' niches, there are three of us who are legit; the other 7 or so are spammers with a small percentage of on-topic pages (but they are stuffed/built in such a way as to look like more).

The three of us competing had a little IM meeting the other day. Somebody brought up something I didn't like at the time, but that I'm now reconsidering: setting up a fake site with private registration (our sites are in our names or our companies' names) on one of those cheap $3-a-month hosting plans, or even using some of the free blog services, and just having it point to stories on our three sites, without any links back to the fake site, with fake comments about the stories.

It's disturbing/misleading, but if this continues and Google has changed things to favor that kind of setup, we may have to resort to some things that many of us would prefer not to, but you have to fight the spammers (and now apparently Google) somehow.


 1:31 pm on May 24, 2006 (gmt 0)

>> But if this continues and Google has changed things to favor that kind of setup, we may have to resort to some things that many of us would prefer not to, but you have to fight the spammers (and now apparently Google) somehow.

That's probably the last thing you'll want to do. Knee-jerk reactions to changes in the algo have caused more problems for more people. At this point all you can do is make sure your yard is cleaned up and wait for the next big update. If you are scrambling and making changes assuming that it was this, or it was that, you may be just piling on the issues you'll have to contend with next month and the month after that.

I know, some of you can't wait. Well, that's when the risk factor increases and you do things that may not be in your best interest. Patience is key and if Google has problems with their indexing, they'll have to fix it or risk losing some of their market share.

Until we see the media reporting on this stuff on a regular basis, it doesn't matter as Google still has that 45%+ market share that doesn't know what the heck is going on with a small group of websites. Nor do they care. :(


 1:36 pm on May 24, 2006 (gmt 0)

How many of us have image URLs on our site maps? I know I have some image URLs on my site map, and I am considering taking them off.
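If you do take image URLs off a site map, filtering by file extension is one simple way to do it. A minimal sketch; the extension list is an assumption, and an image served from a URL without an image extension would slip through.

```python
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png", ".bmp", ".webp")  # assumed list

def drop_image_urls(urls):
    """Keep only URLs whose path (query string ignored) lacks an image extension."""
    return [u for u in urls if not urlparse(u).path.lower().endswith(IMAGE_EXTENSIONS)]

sitemap_urls = [
    "http://www.example.com/widgets.html",
    "http://www.example.com/images/widget.JPG",
    "http://www.example.com/photo.png?size=large",
]
print(drop_image_urls(sitemap_urls))  # ['http://www.example.com/widgets.html']
```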


 1:44 pm on May 24, 2006 (gmt 0)

pageoneresults makes a good point.

I agree - clean up anything overtly offensive to Google, and wait. If you expect to be back up there immediately, yet see no changes in the SERPs, there is going to be a temptation to keep on fiddling.

My bet is this is probably just a complete waste of effort.


 3:40 pm on May 24, 2006 (gmt 0)

Well, my changes brought almost instant results, and the fixes are not spammy in any way. I say if there is a quick, unspammy way to get things done, then do it. My site now has over 12k pages indexed on some of the datacenters, and the count is climbing almost hourly thanks to the index changes I made. Getting results in only two days is very satisfying... now I sound like a weight-loss commercial.

I think for almost all the cases I have heard of, the problems really lie in the loss of PR and links on these sites. That is, because of deindexing and changes in the algo, you are only going to get indexed so much by Google. I just really don't believe it comes from bad design and interlinking; it is an external change. So you need to get your main content higher up the hierarchy on your site for the short term, and then begin getting more "quality" inbound links again, so that once your PR (or whatever they call this new link rep) recovers, you can get indexed deeper. I think it is a pretty simple process, but you might have to wait a long time if you are only getting indexed one or two levels now. For those only getting indexed two levels, there isn't really a quick fix, as you can't put direct links to all your content on the front page. For them, it seems it will just take a lot of work getting good links.


 6:09 pm on May 24, 2006 (gmt 0)

I checked the cache of my homepage, and this is the third day in a row it has an updated cache, so they are busy spidering it again. Now I notice Google isn't indexing most of my site map, and my site map is one I made, not an auto-generated one. The caches of a few pages show March dates.

Looking through all the cached pages on my site, they show March dates as well, so for now the homepage is the only page they are refreshing.


 6:13 pm on May 24, 2006 (gmt 0)

One thing I would suggest to everyone who can't seem to get anything indexed is to install a blog on your domain name and just write in it every once in a while. After writing, make sure you ping Ping-o-Matic so it pings all the services. You will get links in all the blog places, and Google loves blogs too. You will get a higher PR on the blog, and you can link to your content from the blog. I registered a brand new domain name for my blog two months ago; within a week it was fully indexed, and now, after only two months, it is PR 5 with no work getting links. Just pinging Ping-o-Matic.
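Blog software normally sends this ping for you, but the request itself is a small XML-RPC call to the standard `weblogUpdates.ping` method with two arguments, the blog name and the blog URL. A minimal sketch using Python's standard `xmlrpc.client`; the endpoint `http://rpc.pingomatic.com/` is assumed to be Ping-o-Matic's XML-RPC address, and the blog name and URL are placeholders.

```python
import xmlrpc.client

PINGOMATIC_ENDPOINT = "http://rpc.pingomatic.com/"  # assumed XML-RPC endpoint

def ping_request_body(blog_name, blog_url):
    """Build the XML-RPC request body for weblogUpdates.ping (no network needed)."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")

def send_ping(blog_name, blog_url, endpoint=PINGOMATIC_ENDPOINT):
    """Send the ping; servers commonly reply with a struct holding 'flerror' and 'message'."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)

print(ping_request_body("My Blog", "http://www.example.com/blog/"))
# send_ping("My Blog", "http://www.example.com/blog/")  # uncomment to actually ping
```

Building the request body separately makes it easy to inspect what would be sent before hitting the real service.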


 6:29 pm on May 24, 2006 (gmt 0)

Hate to keep a running total, but now the datacenters are showing 35k pages indexed on our site. Woohoo!


 6:40 pm on May 24, 2006 (gmt 0)

I'm noticing that my content rich pages are ok but pages with less (+ unique) content have been nuked.

PageRank 5, indexed to 3 levels.

I did install a blog on a thousand page indexed site - and now the blog pages are virtually the only ones left!


 6:47 pm on May 24, 2006 (gmt 0)

One other note: Google must be saving a lot of money on AdSense over the last few weeks, and what an easy way to do it!


 10:39 am on May 25, 2006 (gmt 0)

I saw a slight improvement today, with indexed pages going from 2 to 10. The pages are high-level, content-rich, and not in the supplemental results. As for the other 300-ish pages of the site, I wait and hope... It's still progress, as yesterday I considered asking Google if they had a life support switch for my 2 surviving pages.

lifesupport:www.mydomain.com -pleasedontnuke :-)


 11:27 am on May 25, 2006 (gmt 0)

I'm down to 7. LOL.

Good thing I went Web 2.0 in my marketing after Google hosed me on 9/22. I can't take these "updations" anymore ;)


 1:46 pm on May 25, 2006 (gmt 0)

>> One other note - Google must be saving a lot of money from Adsense over the last few weeks - and what an easy way to do it!

The spammers are still doing just fine, so I doubt AS/AW has taken a hit.

I notice that I'm down to dropping 2-4 pages everyday over the past few days on one site. That's about the number of pages I typically add in a day. It's almost like Google has a cache and it's slowly expiring anything older than a certain date.


 5:49 pm on May 25, 2006 (gmt 0)

I think it is leveling out at indexing 40k of my pages now. A lot better than 700 :)

Traffic is back up to where it was before and I am feeling good. I would suggest what I did to anyone having problems getting indexed deeper after this stupid update. Raise your content up a level (keep usability of course) and you should get indexed. With whatever you do the user should not be inconvenienced, but doing large article indexes at the bottom for a resource should be fine and should yield results quickly.


 6:02 pm on May 25, 2006 (gmt 0)

Congrats on your success. Hope the pages stick. Our crawl has slowed WAY down now, though it did hit our indexes. I am sure when the ol' bot comes back we will get hit hard again with a full crawl; if not, at least we can get some of those missing articles back in.


 9:52 pm on May 25, 2006 (gmt 0)

For some reason, the only site of ours that has been left alone is one site that we have left alone. It seems rock steady for now. Surprisingly, it's the one most in need of a bit of cleanup. Many of the pages are old "science experiments." Also, the forum got compromised a while ago with tons of spam entries; they have been removed, but the pages still remain in the Google index. The page count is steady at 9,150, while the actual page count would be less than 500. Our other site has been getting deindexed a little more each day and is now down to just 658 pages, from 10k. Each day it's dumping a handful of good content.

What happens when our links pages get removed?

Is this going to have a negative effect on our link partners? So their PR drops, and then their pages start to get deindexed? And so on and so on?


 10:17 pm on May 25, 2006 (gmt 0)

Yep that is exactly what happens. It is a circular effect. As your pages get deindexed the people you are linked to then are threatened with lower pr and deindexing. Fun isn't it?
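The circular effect can be illustrated with the PageRank formula as published by Brin and Page in 1998, PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over the pages q linking to p, where C(q) is q's number of outlinks. Google's ranking in 2006 surely involved much more than this, so treat it as a toy model. A minimal sketch with a hypothetical three-page loop: your home page links to your links page, which links to a partner, which links back to you.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over pages q linking to p."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {p: 1.0 - d for p in pages}
        for src, targets in links.items():
            for t in targets:
                new[t] += d * pr[src] / len(targets)  # C(src) = number of outlinks
        pr = new
    return pr

def deindex(links, page):
    """Drop a page and every link pointing to it."""
    return {s: [t for t in ts if t != page] for s, ts in links.items() if s != page}

web = {
    "your-home": ["your-links-page"],
    "your-links-page": ["partner"],
    "partner": ["your-home"],
}
before = pagerank(web)["partner"]
after = pagerank(deindex(web, "your-links-page"))["partner"]
print(round(before, 3), round(after, 3))  # 1.0 0.15
```

Once the links page is gone, the partner keeps only the (1 - d) baseline, and anything the partner links to receives less in turn.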


 11:27 pm on May 25, 2006 (gmt 0)

It makes you wonder how far down this whole thing will spiral before they wake up and put the brakes on.
"The rich get richer the poor get the picture"


 12:38 am on May 26, 2006 (gmt 0)

I had a call from a friend who has a 40-page informational site; he happened to notice that his traffic has plummeted in the last week or two.

A site: search reveals just TWO pages listed; the main index page, and the only internal page that has any external incoming links.

All the other pages are blown away, except for four pages with images (and very little text) that are listed as Supplemental Results in some searches.

The factors for this site? No idea, but PR 2 internal pages (PR 4 main index), "old skool" bloated HTML code, poor site navigation, and some duplicate titles and/or meta descriptions probably do not help matters at all.

Several other people that I know, people who have well-structured sites, breadcrumb navigation, lean HTML code with external CSS, unique title and meta description on every page, etc, have 90 to 100% of their pages indexed and are doing just fine.
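Duplicate titles and meta descriptions are easy to check for yourself. A minimal sketch using Python's standard `html.parser`, assuming you already have each page's HTML in memory; the page paths and markup below are made up.

```python
from collections import defaultdict
from html.parser import HTMLParser

class HeadInfo(HTMLParser):
    """Collect the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def _repeated(groups):
    """Keep only the values shared by more than one page."""
    return {value: urls for value, urls in groups.items() if len(urls) > 1}

def find_duplicates(pages):
    """Return (duplicate titles, duplicate descriptions) as {text: [urls]}."""
    by_title, by_desc = defaultdict(list), defaultdict(list)
    for url, html in pages.items():
        info = HeadInfo()
        info.feed(html)
        info.close()
        if info.title.strip():
            by_title[info.title.strip()].append(url)
        if info.description:
            by_desc[info.description].append(url)
    return _repeated(by_title), _repeated(by_desc)

pages = {
    "/index.html": "<title>Widgets</title><meta name='description' content='All about widgets'>",
    "/blue.html": "<title>Widgets</title><meta name='description' content='Blue widgets'>",
    "/red.html": "<title>Red widgets</title><meta name='description' content='All about widgets'>",
}
titles, descriptions = find_duplicates(pages)
print(titles)        # {'Widgets': ['/index.html', '/blue.html']}
print(descriptions)  # {'All about widgets': ['/index.html', '/red.html']}
```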


 12:58 am on May 26, 2006 (gmt 0)

>> Several other people that I know, people who have well-structured sites, breadcrumb navigation, lean HTML code with external CSS, unique title and meta description on every page, etc, have 90 to 100% of their pages indexed and are doing just fine.

I have many sites like that which are less than 3 years old, and they are not doing fine. Yet every site I manage that's older than 3 years (including one with bloated HTML code generated by a 1997-model web page builder) is doing fine. While these older sites took an indexing hit initially, all are fine and dandy again. Over the next few weeks I will be redoing the code for the 1997 site, and what happens then could be interesting, but I bet it holds up fine.

So from what I can tell it's age and not beauty that counts. Your mileage may vary.


 1:34 am on May 26, 2006 (gmt 0)

The age part is a whole different subject. There is a site that was built in 1997 that has only two pages with no content and has been closed for two years. It has only seven incoming links according to Yahoo, yet it is in the top 5 for an extremely competitive keyword. In fact, there is no site younger than 5 years in the top twenty for that keyword. Needless to say, I am trying to buy that domain :)


 5:26 am on May 26, 2006 (gmt 0)

One of our sites has gone exactly this way - lost pages with site: command = lost pages

On the second, older site, pages have dropped from 10k to just 521. I only just discovered that if I run site:www.mysite.com "my keyword", it returns all of the pages containing the keyword and then some, up to 20k and more. Different keywords give very different results, but judging from today's traffic, it is actually up around normal levels from the content pages, many of which do not show up in the standard site: command. The same test on the other site did not give the same result, so for the time being we suspect that site might be lost.


 5:36 am on May 26, 2006 (gmt 0)

>> I think it is leveling out at indexing 40k of my pages now. A lot better than 700

Man, I feel like such a newbie to the web, and I've been doing this for years. I still don't know how you can get a site with 5k+ pages, let alone 40k+ lol

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved