This 200 message thread spans 7 pages.
Some big observations on dropped pages
I have been trying to figure out why my site dropped from 57,000 indexed pages down to only 700. Today I noticed a huge pattern, and barring something major, I believe it is the reason for the dropped pages. First, I noticed that all pages up to three levels deep are indexed. Any indexed pages deeper than that are externally linked in some way.
How I noticed this: we have a huge directory of content arranged alphabetically, with each letter being a separate page (a.html, for example). From my front page I have a.html linked, and then all the content links are on that page. The content that starts with the letter 'a' is all indexed. Pages like b.html and c.html are also indexed, but the individual content pages below them aren't.
So what this means is that Google assigns an overall site PR which tells it how many levels down it will index. In my limited research, it seems that a site with a front page of PR 5 will get indexed three levels down, and a site of PR 6 will get indexed four levels down. The sites below PR 5 that I have looked at are barely getting spidered.
When doing this, keep in mind that your front page counts as a level. So if you are only a PR 5, it seems that if you have a huge directory, you shouldn't split it up into sections; just have one huge page with links to it all. This of course totally hoses usability, but you will get spidered.
Also, externally linked pages will get spidered: a few of the pages listed under the other letters are indexed, as they are linked from blogs and other sites. This is what is happening across the board on my site and the others I have looked at.
Count your levels getting spidered and you will notice how deep they are going. For me, three levels and that is it except for the externally linked individual pages I have seen.
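The "count your levels" idea can be checked mechanically. A minimal sketch in Python (the URLs and link map are made up for illustration): given a map of which pages link to which, a breadth-first search gives each page's click depth from the front page, counting the front page itself as level 1, as the poster does.

```python
from collections import deque

def click_depths(links, home):
    """Breadth-first search from the homepage; depth 1 = homepage,
    depth 2 = pages it links to directly, and so on."""
    depths = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first (shortest) path found wins
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site: front page -> letter index pages -> article pages
links = {
    "/": ["/a.html", "/b.html"],
    "/a.html": ["/articles/apple.html"],
    "/b.html": ["/articles/banana.html"],
}
print(click_depths(links, "/"))
# {'/': 1, '/a.html': 2, '/b.html': 2,
#  '/articles/apple.html': 3, '/articles/banana.html': 3}
```

Run over a real crawl of your own site, a table like this makes it obvious which pages sit at level 4 and deeper, i.e. the ones this thread suggests are at risk.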
[edited by: tedster at 6:16 pm (utc) on May 22, 2006]
[edit reason] formatting [/edit]
Your observation does not hold true for me. My main page has a PR of 4. My site pages indexed have gone down to a grand total of three. The homepage, one second level page (other pages in this level are not indexed), and a FOURTH level page. I have a sitemap linked from the main page but it is not indexed.
As bizarre as it is frustrating.
|With no content in-between. |
Personally, I think that is where some problems may arise. It would typically be unnatural to jump that many levels and not have anything in between. That's like having a Nuked Grilled Cheese sandwich with no cheese. ;)
From my viewpoint, if you are structuring a site with 2, 3, 4 or more levels, it's the in-between that makes the site whole. For example...
The above will have a root level page with a list of links to those brand names' root level pages.
The above will have a root level page with a list of links to those products from that particular brand. That root level page could be paginated and have hundreds even thousands of links. Allowing spiders access to those pages is a key element in the horizontal portion of the pyramid.
Of course this is the core product page and is also a root level page with nothing below it (maybe). This will all be relative to the depth of the site and how you have it structured. In some instances, you may see inverted pyramids in your structure.
All of those root level pages are the driving force behind a well structured and deep website.
You say you have 7 categories? Well, that probably means 7 primary site maps. And don't just think of these as ordinary site maps. These are built for the user first, the indexing is a natural occurrence from the page being logically structured and linked to from multiple pyramids all leading back to the primary structure. It gets really deep when you think about it (pun intended).
|If your theory is correct, it would explain why I have lost all but my index page. |
New site? Nothing in between? All pages at the 4th level? Minimal PageRank (what is seen at the public level)? It may be the issue, but it is really difficult to determine. You would surely be led to believe that if you read here and there and everywhere else.
|However, the fact is that I rolled out this structure in March and all of my pages were indexed and traffic was decent. |
2006 March? Three months online? Nah, you're in that timeframe when the site is going to do all sorts of weird things. It has no history yet. Googlebot will index anything the first time around, or it used to anyway. Once it makes its rounds again and again, different things occur each time.
Also, if you rolled the site out in 2006 March, how did you get indexed so quickly and acquire decent traffic?
|So my rewrite code is technically good, just now maybe the rules have changed? |
It may be technically good, but is it structurally sound? That "nothing in between" would be a concern for me.
I think it also matters how the pages are interlinked on the website as far as how deep the spidering goes. Generally I think the first post is correct on my site, although I am currently going through the whole thing and improving the links [deleting worthless ones].
All I can say is I have found some bad links on my site; they used to be good when I initially linked to those domains, but the domains must have expired and been taken over by spam pages full of ads. I didn't know they were like this, or else I wouldn't have had the links there.
In a week or two I should have all the links checked; we'll see if that helps my site. It will help the users anyway.
Just out of interest, are any of you using Google Site Maps? I am, and it occurred to me that it is a potential cause of the problem.
pageoneresults: great feedback, thanks.
|Also, if you rolled the site out in 2006 March, how did you get indexed so quickly and acquire decent traffic? |
To explain that better: it was a pre-existing site that was updated. It had a customer base, a low PageRank, and steady (not huge) traffic. It was a 10-year-old, all-static HTML site that was converted to a data-driven PHP site. When Google first indexed the new site it added all the new pages, so for the last 2 months we have had the old results and the new ones mixed in the index.
As you indicate, I may need to re-think my site structure. I basically have the "drill down" effect on the site that you mention: off the home page you have links to the "brand level index pages" which, in turn, have links to all of the individual products, or brand sub-categories in some cases. So the pyramid is there for the users and, I would hope, the spiders to follow. However, thanks to my attempts at SEO with mod_rewrite ;-) I have essentially simulated a structure that doesn't have those middle layers from a pure URL perspective. I may have to re-do that and see if it helps.
|It was a 10 year old, all static html, site, it was converted to data driven, php site. When Google first indexed the new site it added all the new pages. So, for the last 2 months we have had the old results and the new ones mixed in the index results. |
This is the normal process; you can't change anything just yet unless you want to add insult to injury.
Did you set up a rewrite to take the old pages and 301 (permanently redirect) them to the new pages? The process of replacing old with new takes some time. It could be upwards of six months depending on other factors.
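When setting up a batch of 301s like this, one thing worth auditing is that no old URL redirects to another redirected URL (a chain) or back into itself (a loop), since chained hops waste crawl and dilute the redirect. A quick sketch of such an audit; the URLs in the example map are entirely made up:

```python
def audit_redirects(redirects):
    """redirects maps old URL -> new URL. Flag chains (a redirect whose
    target is itself redirected) and loops (a path that revisits itself)."""
    chains, loops = [], []
    for old in redirects:
        seen = {old}
        current = redirects[old]
        hops = 1
        while current in redirects:
            hops += 1
            if current in seen:       # came back to a URL we already saw
                loops.append(old)
                break
            seen.add(current)
            current = redirects[current]
        else:
            if hops > 1:              # reached a final URL in 2+ hops
                chains.append(old)
    return chains, loops

redirects = {
    "/old-product.html": "/products/widget.php",  # fine: a single hop
    "/old-a.html": "/old-b.html",                 # chain: two hops
    "/old-b.html": "/products/gadget.php",
    "/tmp.html": "/tmp.html",                     # loop: redirects to itself
}
chains, loops = audit_redirects(redirects)
print(chains, loops)  # ['/old-a.html'] ['/tmp.html']
```

Collapsing every chain so each old URL points directly at its final destination is a cheap fix before waiting out the months-long replacement process described above.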
>> Just out of interest, are any of you using Google Site Maps? I am, and it occurred to me that it is a potential cause of the problem.
Many of us have been using Google site maps for quite some time before BD, without any problems whatsoever (and in fact Google site maps was very helpful in identifying problems).
Can anyone see a pattern here? (or should I go get some fresh air?):
I only thought about this because I am about to launch a new site - which by its user-friendly design has sparse text (few keywords), and 4-level navigation. So, if I want this site to do well in Google - I should add loads of text, and re-do the navigation. The end result is not nearly so user-friendly - but should do better in Google.
Fine. Or is it?
Seems to me that if you have a spammy, keyword-laden, dumb-navigation website, then you will do better in the natural SERPs. However, your site will not be so user-friendly. The only alternative is to stick with the user-friendly approach, forget about natural SERPs, and buy AdWords. Follow this to its conclusion and you get ugly sites in the natural SERPs, and user-friendly sites on the right-hand side.
What a way to educate users!
ie if you want ugly sites, click on the list on the left - if you want well-navigated, user-friendly sites, click on the list on the right!
Going out now :-)
|Can anyone see a pattern here? (or should I go get some fresh air?): |
Probably go get some fresh air. ;)
The pattern? Yes. In this particular instance, much of it has to do with websites that don't have sufficient PageRank (or history) to get a deep crawl started and maintained.
It also has a lot to do with how those sites are structured and how they are funneling the PageRank throughout the site.
If you have a site that is PR4 or less, there is a good chance that your "deep level pages" are not going to get indexed as frequently. But, this is all going to be relative to many factors with site structure being at the top of the list.
Jumping 3 or 4 levels down without anything in between may not be a "best practice" option, as you miss out on all the stuff that glues those pages together.
pageone, the only problem is you automatically assume (like Matt Cutts) that something is wrong with structure or external linking. My structure is exactly like the pyramid you're talking about. It fans out and spreads the PR very well. I do, after all, have 800 pages indexed.
I use no link exchanges, and actually only have a few external links to extremely reputable places for resources. The site has been up for more than a year and is almost a PR of 6. The entire site - 800 pages now is indexed three levels deep. My homepage gets indexed almost daily, and the lower levels once a week. I know what I am doing, I just want the lowest article level pages (4th level) to be indexed because they have been indexed for the last 6 months!
So, for the sake of it I just put a link to a giant index of the articles on the bottom of the page so that Google will index them. I would bet quite a bit that they will now be indexed, and since my users will very rarely use that link to get to those articles, my pyramid structure will still be there for them, and users can now search for the valuable content that it is.
It just seems to me that your attitude is a little egotistical. Since we can't post our URLs, it is easy to criticize without seeing. Trust me when I say I have a very logical, hierarchical structure that has been helped by some very experienced people in the SEO business. Up until one night two weeks ago the indexing was almost perfect for me; then, because sites linking to us got hammered, it hammered us also. I could give you twenty examples if you would like to see. It was a cyclical problem, with PR degrading over a large number of sites. At the same time, PR has been handed to some sites like candy; my personal blog with only 12 incoming links (none of which are very good) now has a PR of 5. This is just stupid. Hey, at least I can brag about my PR.
|Just out of interest, are any of you using Google Site Maps? I am, and it occurred to me that it is a potential cause of the problem. |
"I use no link exchanges, and actually only have a few external links to extremely reputable places for resources. The site has been up for more than a year and is almost a PR of 6. The entire site - 800 pages now is indexed three levels deep. My homepage gets indexed almost daily, and the lower levels once a week. I know what I am doing, I just want the lowest article level pages (4th level) to be indexed because they have been indexed for the last 6 months!"
Our site has been around for 5 years with a very logical pyramid structure and lots of inbound links to both top-level and deep content. I can easily get content indexed by simply putting a link on some of the higher levels. It is just virtually impossible, with the sheer amount of content, to get everything linked at such a high level. Keep in mind that Yahoo and MSN have ZERO problems indexing WITHOUT any sort of site map (on site, or some feature offered, e.g. Google Sitemaps).
"It just seems to me that your attitude is a little egotistical. Since we can't post our urls it is easy to criticize without seeing."
This is the way I see you come across also. Not to be rude or anything. It just could be the way you discuss things - like myself who can come across a bit blunt sometimes. Since I don't know you I will assume that is the case. Although, some site owners DO need a kick in the pants in that their structure leaves something to be desired. There are many of us that know what we are doing to distribute PR and traffic flow.
Still, I would like anyone who can to answer the question: if all pages are NOT being indexed, then how can PR be successfully and accurately determined and distributed?
I thought the same thing was happening with my site, since the page count dropped from 10K to 600. I played around with the site:example.com keyword-in-my-url and returned more than 5K results for some searches.
It looks more like a large number of pages went supplemental, and are not being returned in regular site: search results... I haven't been around much lately, but think I read something to this end recently.
Just as a note, I had a tough time finding the pages, and they were only returned for two or three single keywords out of about 15 single or multiple keyword combinations I tried.
Hope this helps.
Added: This also explains how pages that don't appear to be indexed show PageRank =)
|pageone, the only problem is you automatically assume (like Matt Cutts) that something is wrong with structure or external linking. |
That's all we can do unless we do a full discovery. Most of what we do is based on assumption and I was merely providing examples. Those where the examples applied might want to take a look at things to see if they coincide with their unique issues.
|My structure is exactly like the pyramid you talking about. It fans out and spreads out the PR very well. I do afterall have 800 pages indexed. |
Great! I don't think I implied that it wasn't. I was just offering advice based on my observations and reading of the topic.
|I use no link exchanges, and actually only have a few external links to extremely reputable places for resources. |
Believe it or not, few external links can sometimes be a negative point. I'm not saying that is the case with you, but it is something for those reading along to consider.
|The site has been up for more than a year and is almost a PR of 6. |
How do you determine an "almost PR of 6"? ;) Is that a 5.9 on the Richter Scale?
|The entire site - 800 pages now is indexed three levels deep. |
Cool! That means it is being seated in the index.
|My homepage gets indexed almost daily, and the lower levels once a week. |
That's a potential issue. Once you see indexing occurring throughout the day is when you'll know you are getting deep crawled. In some instances, Googlebot will remain attached to your site, it's a marriage made in heaven. ;)
|I know what I am doing, I just want the lowest article level pages (4th level) to be indexed because they have been indexed for the last 6 months! |
I don't believe I ever implied that you didn't know what you were doing. These topics are for general discussion and don't apply to everyone. You take what you feel may apply to your situation and go from there.
If they've been indexed for the past 6 months and then all of a sudden have gone MIA, it's not the end of the world. And yes, dropping a link in the primary navigation structure to the articles section will more than likely keep those pages indexed. It's referred to as channeling the PageRank. I've seen some very effective implementations of channeling. ;)
|It just seems to me that your attitude is a little egotistical. Since we can't post our urls it is easy to criticize without seeing. |
Hmmm, are we having a bad day? Please, don't take it out on me. I'm trying to bring to light the various issues that may be affecting the sites and pages we are discussing.
|Trust me when I say I have a very logical, hierarchical structure that has been helped by some very experienced people in the SEO business. |
It's all relative. :)
|Up until one night two weeks ago the indexing was almost perfect for me, then because sites linking to us got hammered it hammered us also. |
So many factors involved when determining the root problem. In this case, you've lost backlinks which will surely have an overall effect on your site depending on the quality of those backlinks. And then you have to look at every single change you made to the site in the last 90-120 days. I'm sure you've done that, but have others who may be experiencing similar problems?
Remember, this topic is not about you, it's about hundreds and thousands of people who have similar issues that are somewhat related but can be influenced by so many different factors.
Thanks for the honesty, arubicus. I didn't want to come off that way. I freely admit that in the past I had a really bad site structure. I revamped it with the help of these forums and friends with more experience. What I was trying to convey is that the design was built with input from many trusted people and using Google's own suggestions. I myself am not an expert, but I have people around me who are, and they are confused too. So for the record, I am an optimization rookie, but all I know is that for a long time I was indexed from 4-5 levels down and now only to 3. And that is where I am just frustrated; it is like I am getting punished for my pyramid structure. I was working off traffic from my broad base, like everyone suggests, and then that base got taken out from under me. We are using other ways to get traffic, so we are still getting 500-1000 uniques a day, but the drop from 2-3k uniques is hard to take.
Yet another move by google to try and push up adwords revenue.
Enough is enough frankly,
So far, about 7 of the 12 sites we currently work on have been hit hard by this; the others have partial damage. All quality, all offering unique content, yet none of the new pages with the latest content feature.
Of the remaining 5 sites, two are PR 7 and the pages lower down have been hit; same issue with new pages being added and the content off them not being spidered.
In all, i think Google is now one big JOKE
What was once a relevant search engine is now just an AdWords advertising dumping ground. Its days are numbered; death due to its own greed, IMO.
pageone, thanks for the reply. I am not having a bad day by the way, in fact I am having a great day.
About the almost-PR of 6: there is a way to somewhat calculate that through a site I know, and so far it has worked for me every time, though I am sure it isn't near perfect. It does some matching against incoming links and gives you an idea of what you will be at the next update. And it says six now for my site. It can be wrong, though, so who knows.
Sorry if I was focusing this too much on my particular situation. This stuff affects everyone, so to all I apologize if you are annoyed at my posts.
On a note about the effect of lost links: I suspect that a certain part of it was caused by a good friend who has an extremely popular blog that regularly links to us. He just recently (two weeks ago) got deindexed quite a bit after a domain change (he did it as SEO-safely as he could). We then lost a lot of links from his deindexed pages, which in turn hurt other sites that were linking to us. The thing that is frustrating is that the root cause might be so far down the line that you won't be able to find it or do anything about it. It also seems it is unfairly punishing everyone else. I think that because the deindexing was so severe in this last update, it caused huge ripple effects, knocking out large chunks of quality links from reputable websites because the links were on others' deindexed pages. This effect could go on and on, obviously, and could explain why the drops are not happening for everyone at once. It all depends on how far down the line you are, or whether those linking to you have been deindexed yet.
Sorry that wasn't pointed at you. I read what I wrote and I am sorry that it appeared as if it were. It was more that I was agreeing with you. I was on the phone at the same time and trying to hurry through the post. Again sorry about that.
Yeah, Google is turning into a joke. Nothing has really been fixed since Big Daddy; pages are still disappearing. The index is whacked. I still see a lot of 404 pages in the top ten SERPs, and a lot of spam.
I have been noticing a pick up in traffic from msn though.... Even though I have not changed rank in msn recently.
I guess some searchers are starting to migrate over there to msn
|I guess some searchers are starting to migrate over there to msn. |
Hehehe, a sentiment you'll typically see amongst us Webmasters. Unfortunately, the general public probably has no clue as to what is going on. In fact, they probably don't even notice these things.
An exodus of a few hundred Webmasters from Google to MSN and/or Yahoo! could occur but, at this point, Google is well seated and it probably wouldn't have a drastic impact on their bottom line. That still leaves all of us wondering how we are going to make up for that 45%+ of market share that Google has. :(
Pageone, I also forgot to say that when I said indexed daily or weekly, I meant that the page was cached that way. I get Googlebot going through our site a ton every day. We use Omniture web analytics to make sure our traffic is flowing well through our site, and there doesn't seem to be any major kink. I really think that deindexing circle of death has been the culprit. I just don't know how long it will affect those linking to me. It is like getting a whole new set of links, and that is frustrating.
|Did you set up a rewrite to take the old pages and 301 (permanently redirect) them to the new pages? The process of replacing old with new takes some time. It could be upwards of six months depending on other factors |
Yes I did the 301 redirects on the old pages and redirected all pages that weren't carried forward to a custom, user friendly "you might want to try.." 404. Thanks to the input from this site, Matt's blog and other sources I did that much.
6 months was longer than I was thinking it would take to see the result of the seo effort, thanks for the heads up. Still it is unnerving to see all but one of your pages drop out of the index over night. ;-)
|you can't change anything just yet unless you want to add insult to injury. |
That is my inclination anyway, just want to be sure I am not missing some big hole I have left in my approach.
Thanks for keeping this thread on focus; it is a huge help to those of us who are just coming up to speed with this stuff.
I am also facing a similar problem. My site's home page has a PR of 2 and the site was completely indexed. But now it's like just the home page is indexed, and the rest of the pages are not even present in the cache. Could it be another update?
|Yes I did the 301 redirects on the old pages. |
|And redirected all pages that weren't carried forward to a custom, user friendly "you might want to try.." 404. |
Just to be sure and, for everyone following along, did you check and double check the server headers being returned by those pages that are being redirected?
In reference to the custom 404, I cringe when I see that. The reason being is many custom 404s are implemented incorrectly and the pages that should be returning a 404 are actually returning a 200.
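The "custom 404 that actually returns 200" misconfiguration is easy to test for yourself: request a URL you know doesn't exist and look only at the status code, not the page body. A minimal sketch in Python that spins up a throwaway local server mimicking the broken setup and then checks it (for a real site you'd point `status_of` at your own host and a deliberately bogus path):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class BadHandler(BaseHTTPRequestHandler):
    """Simulates the misconfiguration: a friendly 'not found' page
    served with a 200 status code instead of a 404."""
    def do_GET(self):
        body = b"<h1>Sorry, we couldn't find that page</h1>"
        self.send_response(200)  # should be 404 for unknown paths!
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # keep the demo quiet

def status_of(host, port, path):
    """Return just the HTTP status code for a GET request."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("GET", path)
    status = conn.getresponse().status
    conn.close()
    return status

server = HTTPServer(("127.0.0.1", 0), BadHandler)  # port 0 = any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

code = status_of("127.0.0.1", port, "/no-such-page.html")
print(code)  # prints 200: a "soft 404" that search engines will index
server.shutdown()
```

If a clearly nonexistent URL on your site comes back 200 like this, every broken link becomes an indexable near-duplicate page, which is exactly the situation being cringed at above.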
|6 months was longer than I was thinking it would take to see the result of the seo effort, thanks for the heads up. |
This is just "my experience" and I operate with a fairly small set of properties. This may not be the same for others and 6 months is just an educated guess. For some it could be a month, for others a few months, and it could even be as long a year for some. So many factors involved. :(
|Still it is unnerving to see all but one of your pages drop out of the index over night. ;-) |
Very unnerving and usually the death of a business who relies solely on the Internet to generate income. Free traffic is an added plus, unfortunately it's not guaranteed.
|That is my inclination anyway, just want to be sure I am not missing some big hole I have left in my approach. |
That's all you can do. You have to backtrack, check, double check, test, research, test, backtrack, etc. It's a never ending process and it will drive most over the edge. ;)
|Thanks for keeping this thread on focus, it is a huge help to those of use who are just coming up to speed with this stuff. |
You're quite welcome. It's unfortunate though because what we are discussing is typically not something your "average SEO" is going to be looking at. The list of factors that can affect your campaigns is a lengthy one. Once you've covered the ones you are familiar with, then you need to go on a mission and get answers to all those that you are not familiar with. It's a tedious process.
[edited by: pageoneresults at 5:55 pm (utc) on May 23, 2006]
I just checked my site and they cached the homepage yesterday, and yesterday it showed a cache from the day before that, so it looks like they are starting to rebuild the results like they say.
This is after last week my homepage not even being in the index at all. Anyway my site still isn't ranking well but at least I am back in so it is a start. Once I get the links that went bad cleaned up we will see.
|did you check and double check the server headers being returned by those pages that are being redirected? |
Yes, made extra sure the correct code is being returned in the header. Thanks.
Ok, I made the change on my site to list all 10,000 articles on one page from my homepage (not a main link, a secondary one just for Google). As I said I dropped from 57,000 to 700 in one day last week as they stopped spidering below three levels. Now, not even two days after I made this change, I have jumped to 10k pages indexed and almost all those articles are now indexed. My other fourth level pages are not. If that isn't funny I don't know what is. So for me, that pyramid structure did no good unless I had more links or whatever to be graced to four levels deep which would take a decent amount of time. Instead Google just gobbled up 10,000 unorganized (but alphabetical) articles on one page. Traffic is back up too. Just ridiculous.
We are losing about 10-20 pages a day now: from 10k to 900 to 800 to 740 today. We have built some test pages to hang straight off the index page to see if they get picked up... Traffic from internal pages is dropping...
What's the load time for a person on 56k for a page with 10,000 links?
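A rough back-of-the-envelope answer to that question. Both figures below are assumptions, not measurements: roughly 80 bytes of HTML per link, and about 5 KB/s of effective throughput from a 56 kbit/s modem after protocol overhead.

```python
links = 10_000
bytes_per_link = 80               # assumed: <a href="...">title</a> plus markup
page_bytes = links * bytes_per_link        # 800,000 bytes of raw HTML

modem_bytes_per_sec = 5_000       # 56 kbit/s modem, minus overhead (assumed)
seconds = page_bytes / modem_bytes_per_sec

print(f"{page_bytes / 1024:.0f} KB, about {seconds / 60:.1f} minutes")
# 781 KB, about 2.7 minutes
```

So even before rendering time, a dial-up visitor is looking at well over two minutes for the HTML alone, which is the usability cost of the one-giant-page workaround.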
I had a shopping cart template once that came with a sitemap generating page, I ran it once and it made a page with about 45,000 links. I thought that would be about as nasty of a page as could be made, so I didn't publish it. Now according to this testimonial it may have worked...whoda thunk it?
|I had a shopping cart template once that came with a sitemap generating page, I ran it once and it made a page with about 45,000 links. I thought that would be about as nasty of a page as could be made, so I didn't publish it. Now according to this testimonial it may have worked...whoda thunk it? |
I remember a shopping cart like that, too. They claimed to be SE friendly but what a crummy interface and one of my biggest gripes was the ridiculously huge sitemap. So we went with a different cart and generated a custom sitemap that managed to get each and every one of 120,000 items indexed. But that was in the bizarro world we lived in 12 months ago where usable sitemaps made sense and pages were added to the index instead of subtracted. I feel such nostalgia for the good old days.
So what would be the best way to do a sitemap for oh about 2000 pages? Would you limit to a certain amount of links and split it all up somehow or do one big site map? With the indexing problem we would have to have a sitemap directly off of the home page which links to the bulk of the pages below (keeping everything 3rd level). We have 15 main sections and about 200 or so sub sections with somewhere around 1700 or so articles within those sub sections. Any ideas on how to do that?
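One mechanical answer to the question above: cap each sitemap page at some number of links (100 per page is used here as an assumption, echoing the "fewer than 100 links per page" rule of thumb from Google's guidelines of that era) and generate as many pages as that takes. With one index page off the homepage linking to the sitemap pages, every article stays at level 3. A hypothetical sketch with made-up URLs:

```python
def split_sitemap(urls, per_page=100):
    """Chunk a flat URL list into sitemap pages of at most per_page links.
    Returns {sitemap page filename: list of URLs on that page}."""
    pages = {}
    for i in range(0, len(urls), per_page):
        name = f"sitemap-{i // per_page + 1}.html"
        pages[name] = urls[i:i + per_page]
    return pages

# 2000 hypothetical article URLs -> 20 sitemap pages. Structure:
# home (level 1) -> sitemap index / sitemap-N.html (level 2) -> article (level 3)
urls = [f"/articles/{n}.html" for n in range(2000)]
pages = split_sitemap(urls)
print(len(pages), len(pages["sitemap-1.html"]))  # 20 100
```

Grouping by the 15 main sections instead of blind chunks would give the same depth while keeping each sitemap page topically coherent, which is friendlier for both users and spiders.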