
Google SEO News and Discussion Forum

Some big observations on dropped pages
tsm26
msg:722450
5:01 pm on May 22, 2006 (gmt 0)

I have been trying to figure out why my site dropped from 57,000 indexed pages down to only 700. Today I noticed a huge pattern and, barring something major, I believe it is the reason for the dropped pages. First, all pages three levels deep or higher are indexed. Any page indexed deeper than that is externally linked in some way.

How I noticed this: we have a huge directory of content arranged alphabetically, with each letter on a separate page, a.html for example. My front page links to a.html, and that page then carries all the content links. The content that starts with the letter 'a' is all indexed. Pages like b.html and c.html are also indexed, but the individual content pages under them aren't.

So what this means is that Google is assigning an overall site PR that tells it how many levels down it will index. In my limited research, it seems a site with a PR 5 front page gets indexed three levels down, and a PR 6 site gets indexed four levels down. The sites below PR 5 that I have looked at are barely getting spidered.

When counting, keep in mind that your front page counts as a level. So if you are only a PR 5, it seems that if you have a huge directory, you shouldn't split it up into sections; just have one huge page with links to it all. This of course totally hoses usability, but you will get spidered.

Also, externally linked pages will get spidered: a few of the pages listed under the other letters are indexed because they are linked from blogs and other sites. This is what is happening, across the board, on my site and the others I have looked at.

Count the levels getting spidered and you will notice how deep Google is going. For me it is three levels and that is it, apart from the externally linked individual pages I have seen.

[edited by: tedster at 6:16 pm (utc) on May 22, 2006]
[edit reason] formatting [/edit]
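To make the "levels" observation testable, here is a minimal sketch, assuming you already have each page's internal links (the toy graph below is illustrative; a real check would crawl your own site). It computes click depth from the home page with a breadth-first search, which is the level count being discussed, with the home page at depth 0:

```python
from collections import deque

# Toy internal-link graph: keys are pages, values are pages they link to.
# Replace with links crawled from your own site.
site_links = {
    "/": ["/a.html", "/b.html"],           # home page
    "/a.html": ["/articles/apples.html"],  # letter index pages
    "/b.html": ["/articles/bees.html"],
    "/articles/apples.html": [],           # content pages
    "/articles/bees.html": [],
}

def click_depth(links, home="/"):
    """Breadth-first search from the home page; depth = clicks from home."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # BFS reaches each page by a shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

for page, d in sorted(click_depth(site_links).items(), key=lambda x: x[1]):
    print(f"level {d}: {page}")
```

tsm26 counts the home page itself as a level, so his "three levels" corresponds to depths 0 through 2 here.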

 

mistah
msg:722451
7:02 pm on May 22, 2006 (gmt 0)

I have noticed the same thing, except that my PR5 site is only being indexed two layers deep.

I think the only solution is to get more "quality" links into my site. Remember, there is a lot more to it than PR these days; Google also tries to assess the "quality" of those links.

hvacdirect
msg:722452
7:11 pm on May 22, 2006 (gmt 0)

They always say to design the site for people, not search engines. If something like this is true, then I guess the homepage should be your sitemap: a totally flat design.

tsm26
msg:722453
7:18 pm on May 22, 2006 (gmt 0)

I didn't mean to say there isn't more than just PR involved. In fact I meant the opposite: Google now seems to compute an overall ranking for a site that tells it how deep to crawl. Like the previous poster said, that makes it more viable to put a huge sitemap-style structure on the front page, which isn't very friendly. I am trying something on my end and will keep everyone posted on how it goes.

tsm26
msg:722454
7:21 pm on May 22, 2006 (gmt 0)

I also forgot to mention that when I said three levels down, that was including the home page. Not counting the home page as a layer, it is only two layers. I think that is what you meant, mistah: two layers from the home page, which adds credibility to what we are seeing. Can anyone else confirm this with their site?

pageoneresults
msg:722455
7:22 pm on May 22, 2006 (gmt 0)

I have been trying to figure out why my site dropped from 57,000 pages down to only 700.

How old is the site? Has this been a gradual trend or did you wake up one morning and poof, they were gone?

Also keep in mind that the site: command is currently not working (on hyphenated domains) and that has been confirmed by Google.

Joy6320
msg:722456
7:35 pm on May 22, 2006 (gmt 0)

I don't think your observation is correct. On one of my sites that has dropped pages, all pages on the 1st and 2nd levels have been indexed. At the 3rd level, about half the pages have been indexed. No indexing at the 4th level. If you were correct, I should have either all or none of the pages indexed at the 3rd level. How about this theory: some people have suggested that with Big Daddy a significant number of pages had to be reindexed. If that is correct, wouldn't the top levels of a website be indexed first?

trinorthlighting
msg:722457
8:42 pm on May 22, 2006 (gmt 0)

I went from 80 pages indexed and cached to 4 pages in one weekend on one of my sites. Wow!

I noticed that all the links to those pages disappeared as well, using the link: command.

80 page-one results in the Google SERPs, now replaced by a scraper site... way to go, Google.

tsm26
msg:722458
8:52 pm on May 22, 2006 (gmt 0)

The drop didn't happen gradually. Last Wednesday night my site had 57,000 pages indexed; the next morning, a little over 700.

The vast majority of my content is on the fourth level of my site. By level I mean counting from the main page, not the number of slashes in the URL. For example, mine mostly goes 1. home page -> 2. category list -> 3. article list -> 4. article. The pages on the "article list" level are being indexed, but not the ones on the "article" level.

Also, on the question of only some pages being indexed on one level: as I said, pages that are directly linked from external sites get indexed regardless of their level. About 100 of my fourth-level articles are indexed because they are linked to directly. All the rest of my fourth level is not indexed, amounting to about 50,000 pages across my forum, the articles I mentioned earlier, and other resource pages below the third level.

tsm26
msg:722459
9:00 pm on May 22, 2006 (gmt 0)

The drop in pages is not because Google is misreporting site:; we don't have a hyphenated domain name. The night this happened, our pageviews dropped from 10,352 one day to 4,540 the next, and unique visitors dropped from 4,410 to 1,496. Overall we have seen a 50-60% drop in page views and roughly a two-thirds drop in unique visitors. The deindexed pages were original articles that drove a lot of single-page visits but earned a ton of AdSense revenue.

joergnw10
msg:722460
9:22 pm on May 22, 2006 (gmt 0)

The theory doesn't work for my site: about half of level 2 is indexed and maybe 20% of level 3 (a lot of it supplemental). Even the level 2 pages that have links coming in from other sites are not indexed.

asher02
msg:722461
9:29 pm on May 22, 2006 (gmt 0)

tsm26, you are right.

My site shows the same pattern you mentioned.

The one thing I would add is that the depth Google will crawl seems related not to the level count (PR6 = 4 levels, PR5 = 3 levels) but to the PR of the pages themselves. My site is a PR6: all pages with PR4 are indexed, and pages with PR3 are gone. I tested another site of mine that is a PR5: all pages with PR3 are indexed, and pages below that are gone, apart from pages that have 3-4 internal links from already-crawled pages.

The best practice, I guess, is to have a huge sitemap with links to all the inner pages so PR gets pushed to them. That also makes every page level 3.
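asher02's pattern, expressed as a rule of thumb (a speculative reading of two data points, not a confirmed Google rule): a page stays indexed while its toolbar PR is within about two points of the home page's.

```python
def likely_indexed(page_pr, home_pr, margin=2):
    """asher02's observation as a predicate: a PR6 home page keeps PR4
    pages indexed and loses PR3 pages; a PR5 home page keeps PR3 pages.
    Purely speculative, inferred from two sites."""
    return page_pr >= home_pr - margin

print(likely_indexed(4, 6))  # True:  PR4 page on a PR6 site
print(likely_indexed(3, 6))  # False: PR3 page on a PR6 site
print(likely_indexed(3, 5))  # True:  PR3 page on a PR5 site
```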

arubicus
msg:722462
9:38 pm on May 22, 2006 (gmt 0)

To add to the theory:

Our site does the same thing. Levels 1/2/3 get the bulk of the indexing; level 4, where the BULK of our content lies, gets only a small percentage. Google is essentially indexing the "directory" navigation of the site, not the content. I can get any page indexed simply by adding a link to it from the home page; as soon as the link is taken off, the page disappears. There is no realistic way to fit 2,000 or so links on the home page, so the site stays built for visitors, split into a feasible directory structure.

What I find hard to understand is that the bulk of our incoming links point to the deep content. No single page has many links, but the sum of them is a ton. If those pages are not being indexed or crawled, then how is PR fairly calculated? That is, does the internal PR from those pages, plus their incoming links, still feed into the PR circulating through the site? Any thoughts on that?

jrs_66
msg:722463
10:38 pm on May 22, 2006 (gmt 0)

I can also confirm this theory. Levels 1, 2, and 3 are indexed, without exception. Most of my content is on level 4: no indexing.

tsm26
msg:722464
10:42 pm on May 22, 2006 (gmt 0)

Your guess is as good as mine as to how they calculate this overall "siterank". I do know that my direct competitor has a PageRank of 6 and gets 4 levels indexed, while we are a 5 and get 3 levels indexed. I don't think that is a total coincidence. What I did was put a sitemap of all the former level 4 content onto a level 2 page, so it is all level 3 now. We will see whether those pages get indexed over the next few days. In my opinion all this is just going to make usability go to crap, but if it fixes things even in the near term, I am all for it.

pageoneresults
msg:722465
10:49 pm on May 22, 2006 (gmt 0)

I do know that my direct competitor has a PageRank of 6 and gets 4 levels indexed, while we are a 5 and get 3 levels indexed.

I believe that answers your question right there. PageRank is like the Richter scale: the difference between PR5 and PR6 is pretty impressive, especially for a site structured around deep links.

I don't think that is a total coincidence. What I did was put a sitemap of all the former level 4 content onto a level 2 page, so it is all level 3 now.

I believe pages should follow a logical directory/category structure. Within that structure you have root-level pages that act as indexes into the content of each category. If the site is large enough, you'd have multiple site maps to control the flow of indexing. Internal linking structure is key in this instance.

Atomic
msg:722466
10:55 pm on May 22, 2006 (gmt 0)

I have a PR6 site that's well indexed for levels 1, 2, and 3. I have another site, a PR4, that is indexed for levels 1 and 2, and the 4 new articles on its homepage are also indexed. Then I have a PR3 site with only a few pages indexed. One of its level 2 pages has higher PR than the homepage because of all the links coming to it, yet it's not even in the index. That page happens to be the site's very popular forum, introduced in February: a new forum, but it caught on quickly, and we gave some high-profile niche personalities their own sections, which led to all the linking. Still, despite the targeted, organic linking and decent PR, when Big Daddy kicked in, none of its pages were in the index.

My PR6 site has a 10-year-old forum and it's pretty well indexed, which is not bad since the posts are, what, 4th level?

I wish every site matched this theory, but the one with the few articles on the homepage ending up indexed is a perfect fit.

tsm26
msg:722467
11:21 pm on May 22, 2006 (gmt 0)

Obviously there are much more complicated things going on than my simplified theory, so if people have more to add, that would be great.

As an addition, I think the new indexing is done from the homepage down, regardless of whether a page further down has higher PR. I think this fits the behavior most people are seeing. As an example, my blog is a PR5 and has all the posts and comments indexed, down to level 3. My friend's is a PR3 and only has the front-page entries indexed. Forums linked from the front page have their posts at level 4, so if anyone has a PR5 home page that links to a forum, check whether your individual posts are still indexed. Mine were a week ago; now only a few remain, though all the topic-level pages remain.

kidder
msg:722468
11:24 pm on May 22, 2006 (gmt 0)

So who is adding content and who is sitting on their hands?

tsm26
msg:722469
11:26 pm on May 22, 2006 (gmt 0)

pageoneresults, I think sites should follow a logical directory structure too, but in the short term I can only get all my pages indexed by not doing that. That is what I am complaining about: this is all but forcing me to take away some of that structure to fit the new indexing. You can say "go get more links", but that takes time, and my business cannot wait 5-6 months to get up to a PR 6 so those other pages get indexed.

What I am going to do is keep the existing structure, but add a link on the homepage to a big list that goes directly to the content instead of through the directory structure. One path for usability, prominent for users, and one, maybe at the bottom, for our friends the spiders. I know this is just an enlarged sitemap, but since Google seems to love sitemaps, they shouldn't mind my list of 10,000 articles on one page.
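A minimal sketch of the spider-facing list tsm26 describes, assuming article URLs and titles are available in a list (the entries below are made up). In practice you would likely split this across several pages rather than put 10,000 links on one, as discussed later in the thread:

```python
from html import escape

# Illustrative entries; in practice, pull these from your article database.
articles = [
    ("/articles/a/apples.html", "All About Apples"),
    ("/articles/b/bees.html", "Beekeeping Basics"),
]

def sitemap_page(entries, title="Article Index"):
    """Render one flat HTML page that links directly to every article."""
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(text)}</a></li>'
        for url, text in entries
    )
    return (f"<html><head><title>{escape(title)}</title></head>\n"
            f"<body>\n<h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul>\n"
            f"</body></html>")

with open("sitemap.html", "w") as f:
    f.write(sitemap_page(articles))
```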

jrs_66
msg:722470
11:40 pm on May 22, 2006 (gmt 0)

Is it possible...

Since this new 'indexing depth' has just been rolled out by Google, and PR hasn't been updated (other than the flawed update) for about 6 months, is it possible that Google's 'real' PR is now different from what the toolbar displays? Will 'link exchange' sites be dropping PR at the next update?

pageoneresults
msg:722471
11:42 pm on May 22, 2006 (gmt 0)

But in the short term I can only get all my pages indexed by not doing that. That is what I am complaining about: this is all but forcing me to take away some of that structure to fit the new indexing.

I wouldn't look at it that way. There really is no short term for what you want and need to do. It's a gradual process and you can't force the algo. ;)

You can say "go get more links", but that takes time, and my business cannot wait 5-6 months to get up to a PR 6 so those other pages get indexed.

I won't say that. I'd say continue to build upon what you have and let nature take its course. With a little bit of direction from you, of course. :)

I know this is just an enlarged sitemap, but since Google seems to love sitemaps, they shouldn't mind my list of 10,000 articles on one page.

Yikes! You definitely don't want a page of just links. There needs to be structure to that page, an outline.

Draw a map of your current site architecture. Put your home page at the top, then list the primary categories under it. So now maybe you have the home page at top and seven pages below it. Take those seven pages and spread them out into sub-categories. How many are there? Do they need to be spread out further (horizontally)? Think of your site as one huge pyramid. Within the pyramid are other pyramids, all linked naturally based on the architecture of the site.

For me, it's all about harnessing the power of the site structure for the best overall indexing. If yours is a new site, launched within the past 12 months, expect fluctuations while your deep-level pages become seated in the index. In the meantime, you may have to do some PPC to stay in the game.

Another thing: you definitely need to make sure the site has no major technical issues to contend with. A poorly implemented rewrite will do more harm than no rewrite at all. ;)

arubicus
msg:722472
11:50 pm on May 22, 2006 (gmt 0)

"Think of your site as this huge pyramid. Within the pyramid will be other pyramids. All pyramids are linked naturally based on the architecture of the site."

This is exactly the structure of our site. Makes no difference. For the deeper the content the less likely the indexing. For use to do a sitemap up to level three is virtually impossible.

The sitemap would be on level 2 and anything from that page would be level 3. Since level 4 is not getting indexed we cannot put 2000+ pages in that one file. We could split and have 15 sitemaps (linked from the home page) for each "category" Still with the number of articles in each category it would be impossible to put less than 100 links on the level 2 (which would make the articles level 3). Also that means there would be 15 site maps off the home pages sucking PR away from the normal site structure.
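The arithmetic behind arubicus's objection can be made explicit. If every hub page carries at most f links, a site of N pages needs f to the power D to be at least N to keep everything within D clicks of home, so the minimum uniform fan-out is the D-th root of N (a back-of-envelope model that ignores external links and uneven categories):

```python
import math

def min_links_per_hub(total_pages, max_depth):
    """Smallest uniform fan-out f with f ** max_depth >= total_pages,
    i.e. every page reachable within max_depth clicks of the home page."""
    return math.ceil(total_pages ** (1.0 / max_depth))

for depth in (2, 3, 4):
    print(f"50,000 pages within {depth} clicks: "
          f"about {min_links_per_hub(50_000, depth)} links per hub page")
```

For tsm26's roughly 50,000 pages, "everything at level 3" means two clicks from home, i.e. about 224 links per hub, which matches arubicus's point that the level 2 pages can't stay under 100 links.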

texasville
msg:722473
11:52 pm on May 22, 2006 (gmt 0)

>>>>Overall we have seen a 50-60% drop in page views, and a 400% drop in unique visitors. The pages deindexed were original articles that drove a lot of one page visits but got a ton of adsense revenue. <<<<<

Hmmmm...seems like Google is really shooting itself in the foot. If this is a pattern then their revenues will sharply drop. And what happens when corps don't hit their projected earnings mark. Ask Dell.

Vadim
msg:722474
12:39 am on May 23, 2006 (gmt 0)

Google is really shooting itself in the foot

Maybe not.
They probably hope that the user will navigate from the indexed root page to the deep, unindexed page by following links within the site, if he or she is interested.

It gives Google the ability to show more sites on the same topic on one search results page, because only the root pages are shown.

For the user, that effectively means more choices.
Vadim

arubicus
msg:722475
12:53 am on May 23, 2006 (gmt 0)

"It gives Google the possibility to show more sites of the same topic on the one search result page because they show the root pages only."

That should make no difference unless I missed something that google was displaying 15 pages from a site in their results. If you are talking about the duel listings why not just show the BEST page from the site to free up space.

Even at that why not take a user DIRECTLY to the page? Isn't that what a search engine is for? MSN and Yahoo take a searcher directly to the best page.

I just don't think this is a good argument.

texasville
msg:722476
1:30 am on May 23, 2006 (gmt 0)

I agree, arubicus. In my sector we sell "widgets" of ten different kinds. Google has deindexed all ten widget pages but still serves up my index page in the SERPs. That puts visitors an extra click away from what they want. It used to serve up mysite/blue-widget.html for a blue widget search, and the surfer went right to it.
Now, this is a small site, so it's not too bad, but for the huge sites with hundreds or thousands of pages, it has to be difficult for the surfer to find the page he was looking for.
If this keeps up, it will be a nightmare. I use the search engines to find pages dealing with different technical issues when building a site, like a CSS issue. If Google no longer indexes those pages from w3c or the other sites I use, I'll just have to use MSN or Y!, and so will the general surfer after a while.
And as I said, if they are dropping all those pages with AdSense, then they have to be losing revenue. Too many people just bail if they don't hit what they want right away. And the poster also said his traffic was way down. No revenue.

steveb
msg:722477
1:41 am on May 23, 2006 (gmt 0)

Having a directory structure is good.

Having a sitemap that lists a lot of stuff is good.

Having multiple sitemaps is also good.

If Google is only going to devote so much crawl strength to you, then you just have to prioritize where you want it to go. Do you need a level two page crawled daily, or do you need 100 level four pages crawled once a month? Point your links and distribute your PageRank accordingly.
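steveb's "distribute your PageRank accordingly" can be illustrated with the textbook PageRank iteration (the published formula with a 0.85 damping factor: a simplification, not Google's production system). Re-pointing internal links visibly shifts score between a hub and the deep pages:

```python
def pagerank(links, iterations=50, d=0.85):
    """Power iteration of the classic PageRank formula."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - d) / n
               + d * sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            for p in pages
        }
    return pr

# Toy site: home links to a hub and directly to one deep page.
links = {
    "home": ["hub", "deep1"],
    "hub": ["deep1", "deep2"],
    "deep1": ["home"],
    "deep2": ["home"],
}
for page, score in sorted(pagerank(links).items(), key=lambda x: -x[1]):
    print(f"{page}: {score:.3f}")
```

Moving the home -> deep1 link to point at deep2 instead (or at a monthly-crawl sitemap) redistributes the score, which is the prioritization steveb is describing.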

The_Zenker
msg:722478
1:51 am on May 23, 2006 (gmt 0)

I manage a small shop site that dropped from 400+ pages to only the main index page showing in the index. It is a hyphenated domain name, i.e. my-site.com, so I was hoping it was all related to the "fix" Google is working on. I have been following this thread, and others, to try and determine whether I should take the...
sit on my hands
...approach or do something.

To comment on the main theory of this thread: I have a small site with a directory structure one level under the index, a wide and short pyramid. However, in an effort to be indexing-friendly, I created a deeper, keyword-based hierarchy using mod_rewrite, so all of my content now appears to be 3 to 4 levels under the root (my-site/brandname/producttype/product), with no content in between. If your theory is correct, it would explain why I have lost all but my index page.

Reading pageoneresults' post:
Another thing, you definitely need to make sure that the site has no major technical issues to contend with. A poorly implemented rewrite will do more harm than no rewrite at all.

made me re-think this approach. However, the fact is that I rolled out this structure in March, and all of my pages were indexed and traffic was decent.
So my rewrite code is technically sound; maybe the rules have just changed?
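One caveat on The_Zenker's setup: the "levels" in this thread could mean click depth (links followed from the home page) or URL path depth (slash-separated segments), and a rewrite changes only the latter. A small sketch of the distinction, using a made-up URL in his pattern:

```python
from urllib.parse import urlparse

def url_path_depth(url):
    """Depth as it appears in the URL: count of path segments."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])

# Appears 3 levels deep by path...
print(url_path_depth("http://my-site.com/brandname/producttype/product"))  # 3
# ...but if the home page links straight to such pages, click depth is 1.
```

His pages are shallow by click depth but deep by path, so if they really were dropped for depth (rather than the hyphenated-domain site: glitch), that would hint the path, not the click count, is what got measured.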

ronburk
msg:722479
2:50 am on May 23, 2006 (gmt 0)

Point your links and distribute your PageRank accordingly.

The voice of reason, crying out in the wilderness :-).
