How I noticed this is that we have a huge directory of content arranged alphabetically, with each letter being a separate page (a.html, for example). From my front page I have a.html linked, and then all the content links on that page. The content that starts with the letter 'a' is all indexed. The pages like b.html and c.html are also indexed, but the individual content pages linked from them aren't.
So, what this means is that Google is giving an overall site PR which tells it how many levels down it will index. In my limited research it seems that a site with a front page of PR 5 will get indexed three levels down, and a site of PR 6 will get indexed four levels down. The sites below PR 5 that I have looked at are barely getting spidered.
When doing this, keep in mind that your front page counts as a level. So if you are only PR 5 and have a huge directory, it seems you shouldn't split it up into sections; just have one huge page with links to all of it. This of course totally hoses usability, but you will get spidered.
Also, externally linked pages will get spidered, as a few of the pages listed under the other letters are indexed, as they are linked in blogs and other sites. This is across the board what is happening on my site and the others I have looked at.
Count your levels getting spidered and you will notice how deep they are going. For me, three levels and that is it except for the externally linked individual pages I have seen.
[edited by: tedster at 6:16 pm (utc) on May 22, 2006]
[edit reason] formatting [/edit]
have been trying to figure out why my site dropped from 57,000 pages down to only 700.
How old is the site? Has this been a gradual trend or did you wake up one morning and poof, they were gone?
Also keep in mind that the site: command is currently not working (on hyphenated domains) and that has been confirmed by Google.
The vast majority of my content is on the fourth level of my site. By level I mean starting from the main page, not by the number of / in the url. For example on mine most go like 1. home page -> 2. category list -> 3. article list -> 4. article. So the pages on the "article lists" level are being indexed but not the ones on the "article" level.
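To make the "level" definition concrete, here is a minimal Python sketch that counts levels as clicks from the home page using a breadth-first search. The page names and the `links` dictionary are made up for illustration, and this is only a way to count levels as described above, not a claim about how Google measures depth.

```python
from collections import deque

def click_depths(links, home):
    """Return {page: level}. The home page is level 1 and each click
    away from it adds one level (shortest path, found by BFS)."""
    depths = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site: home -> category list -> article list -> article
links = {
    "home": ["categories"],
    "categories": ["articles-a"],
    "articles-a": ["article-1", "article-2"],
}
depths = click_depths(links, "home")
print(depths["article-1"])  # 4
```

Note that the URL of a page never enters into it; only the links do.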
Also, on the question of only some of the pages being indexed on one level, I said that if the pages are externally directly linked then they will get indexed regardless of the level they are on. I have about 100 of the fourth level articles being indexed because they are linked to directly. All the rest of my fourth level is not indexed, amounting to about 50,000 pages with my forum, the articles I mentioned earlier, and other resource pages lower than the third level.
My site is having the same pattern you mentioned.
The only thing I noticed is that the number of levels Google will crawl is not related to the home page PR (PR6 = 4 levels, PR5 = 3 levels). It will crawl pages three levels deep by PR. Meaning: my PR is 6, all pages with PR 4 are indexed, and pages with PR 3 are gone. I tested with another site I have with PR 5: all pages with PR 3 are indexed, and pages below that are gone, besides pages that have 3-4 inner links from crawled pages.
The best practice, I guess, is to have a huge site map with links to all the inner pages so the PR will be pushed there. This will also make all pages level 3.
Our site does the same thing. Levels 1/2/3 get the bulk of the indexing. Level 4, where the BULK of our content lies, is a small percentage. Pretty much, Google is indexing the "directory" navigation of the site, not the content. I can get any page indexed simply by adding a link to it on the home page. As soon as the link gets taken off, the page disappears. I can't realistically fit 2000 or so links on the home page. So the site is built for the visitors and split up into a feasible directory structure.
What I am finding hard to understand is that the bulk of our incoming links are to the deep content. Not many links to individual pages, but the sum of them is a ton. If those pages are not being indexed/crawled, then how is PR fairly calculated? What I mean is, is internal PR from those pages adding to the circulating PR, and are incoming links also adding to the PR circulation? Any thoughts on that matter?
I do know that my direct competitor has a pagerank of 6 and they are getting 4 levels indexed, while we are 5 and are getting 3 levels indexed.
I believe that answered your question right there. PageRank is like the Richter Scale. The difference between PR5 and PR6 is pretty impressive, especially with a site that is structured with deep links.
I don't think that is totally a coincidence. What I did is put a sitemap of all the past level 4 content into a level 2 page so now it is all level 3.
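As a rough illustration of that change, here is a small Python sketch that compares the level of an article before and after a sitemap page is linked from the home page. The page names and link structure are hypothetical, and the level counting is just the "clicks from the home page" definition used in this thread, not anything confirmed by Google.

```python
from collections import deque

def click_depths(links, home):
    """Level of each page, counting the home page as level 1 (BFS)."""
    depths = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical original structure: articles sit on level 4.
links = {
    "home": ["categories"],
    "categories": ["list-a"],
    "list-a": ["article-1", "article-2"],
}
before = click_depths(links, "home")["article-1"]

# Same site plus a sitemap linked from the home page (level 2)
# that links straight to every article.
with_sitemap = {page: list(targets) for page, targets in links.items()}
with_sitemap["home"].append("sitemap")
with_sitemap["sitemap"] = ["article-1", "article-2"]
after = click_depths(with_sitemap, "home")["article-1"]

print(before, after)  # 4 3
```

The old directory path is still there for visitors; the sitemap just gives a shorter route to the same pages.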
I believe pages should follow a logical directory/category structure. Within that structure you have root level pages that act as indexes into the content within each category. If the site is large enough, you'd have multiple site maps to control the flow of indexing. Internal linking structure is key in this instance.
My PR6 site has a 10 year old forum and it's pretty well indexed which is not bad since the posts are what....4th level?
I wish every site matched this theory, but the one with the few articles on the homepage ending up indexed is a perfect match.
As an addition, I think the new indexing is done from the homepage down, regardless of whether there is another page with a higher PR below it. I think this fits the behavior most are seeing. As an example, my blog is a PR of 5 and has all the posts and comments indexed, down to level 3. My friend's is a PR of 3 and only has the front page entries indexed. Forums, if linked from the front page, have a post level of 4, so if anyone has a PR of 5 on their home page and links to forums, check and see if your individual posts are still indexed. Mine were a week ago; now only a few remain, but all the topic level ones remain.
What I am going to do, is still keep the other structure, but maybe just have a link on the homepage to a big list that goes directly to the content instead of the directory structure. One for usability that would be prominent to users, and one maybe at the bottom for our friends the spiders. I know this is just an enlarged sitemap, but since Google seems to love sitemaps, they shouldn't mind my list of 10,000 articles on one page.
Since this new 'indexing depth' has just been released by Google, and PR hasn't been updated (other than the flawed update) for ~6 months, is it possible that Google's 'real' PR is now different than what's displaying on the toolbar? Will 'link exchange' sites soon be dropping PR with the next update?
But for the short term I can only get all my pages indexed by not doing this. That is what I am complaining about. This is almost forcing me to take away some of that structure to fit the new indexing.
I wouldn't look at it that way. There really is no short term for what you want and need to do. It's a gradual process and you can't force the algo. ;)
You can say "go get more links", but that takes time, and my business can not wait 5-6 months to get me up to a PR 6 so I can get those other pages indexed.
I won't say that. I'd say continue to build upon what you have and let nature take its course. With a little bit of direction from you of course. :)
I know this is just an enlarged sitemap, but since Google seems to love sitemaps, they shouldn't mind my list of 10,000 articles on one page.
Yikes! You definitely don't want a page of just links. There needs to be structure to that page, an outline.
Draw a map of your current site architecture. Put your home page at the top. Then list the primary categories under your home page. So now maybe you have the home page at top, then seven pages below the home page. Now, take those seven pages and spread those out to sub-categories. How many are there? Do they need to be spread out further (horizontally)? Think of your site as this huge pyramid. Within the pyramid will be other pyramids. All pyramids are linked naturally based on the architecture of the site.
For me, it's all about harnessing the power of the site structure for best overall indexing. If you are a new site, within the past 12 months, expect to see fluctuations while your deep level pages become seated in the index. In the mean time, you may have to do some PPC to stay in the game.
Another thing, you definitely need to make sure that the site has no major technical issues to contend with. A poorly implemented rewrite will do more harm than no rewrite at all. ;)
This is exactly the structure of our site. Makes no difference. The deeper the content, the less likely the indexing. For us to do a sitemap up to level three is virtually impossible.
The sitemap would be on level 2 and anything from that page would be level 3. Since level 4 is not getting indexed, we cannot put 2000+ pages in that one file. We could split and have 15 sitemaps (linked from the home page), one for each "category". Still, with the number of articles in each category, it would be impossible to put fewer than 100 links on the level 2 pages (which would make the articles level 3). Also, that means there would be 15 site maps off the home page sucking PR away from the normal site structure.
Hmmmm...seems like Google is really shooting itself in the foot. If this is a pattern then their revenues will sharply drop. And what happens when corps don't hit their projected earnings mark? Ask Dell.
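The arithmetic behind that can be sketched in a few lines of Python. The numbers come from the post above (2000+ articles, 15 category sitemaps, 100 links per page); treating 2000 as the exact count and spreading the articles evenly across categories are simplifying assumptions of mine.

```python
import math

total_articles = 2000       # "2000+" in the post, so really a lower bound
sitemap_pages = 15          # one sitemap per category
max_links_per_page = 100    # the per-page figure the post is working with

# Links each sitemap would carry if articles were spread evenly (an assumption).
links_per_sitemap = math.ceil(total_articles / sitemap_pages)

# Sitemaps needed to stay at or under the per-page figure.
sitemaps_needed = math.ceil(total_articles / max_links_per_page)

print(links_per_sitemap)  # 134
print(sitemaps_needed)    # 20
```

So with 15 sitemaps each one ends up over 100 links, and staying under 100 would take at least 20 sitemap pages linked from the home page.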
Google is really shooting itself in the foot
Maybe not.
They probably hope that the user will navigate from the indexed root page to the deep, non-indexed page by following the links in the site, if he/she is interested.
It gives Google the possibility to show more sites on the same topic on one search results page, because they show the root pages only.
For the user it effectively means more choices.
That should make no difference, unless I missed something and Google was displaying 15 pages from a site in their results. If you are talking about the dual listings, why not just show the BEST page from the site to free up space?
Even at that why not take a user DIRECTLY to the page? Isn't that what a search engine is for? MSN and Yahoo take a searcher directly to the best page.
I just don't think this is a good argument.
Having a sitemap that lists a lot of stuff is good.
Having multiple sitemaps is also good.
If Google is going to only devote so much crawl strength to you, then you just have to prioritize where you want it to go. Do you need a level two page crawled daily, or do you need 100 level four pages crawled once a month? Point your links and distribute your pagerank accordingly.
sit on my hands...approach or do something.
To comment on the main theory of this thread, I have a small site that has a directory structure of 1 level under the index. A wide and short pyramid. However, in an effort to be indexing friendly I created a deeper, keyword based, hierarchy using ModRewrite, so that all of my content now appears to be 3 to 4 levels under the root (my-site/brandname/producttype/product), with no content in-between. If your theory is correct, it would explain why I have lost all but my index page.
Reading PageOneResults' post:
Another thing, you definitely need to make sure that the site has no major technical issues to contend with. A poorly implemented rewrite will do more harm than no rewrite at all.