How I noticed this is that we have a huge directory of content arranged alphabetically, with each letter on a separate page (a.html, for example). My front page links to a.html, and all the content links are on that page. The content that starts with the letter 'a' is all indexed. Pages like b.html and c.html are also indexed, but their individual content pages aren't.
What this suggests is that Google assigns an overall site PR that determines how many levels down it will index. In my limited research it seems that a site with a front page of PR 5 will get indexed three levels down, and a site of PR 6 will get indexed four levels down. The sites below PR 5 that I have looked at are barely getting spidered.
When doing this, keep in mind that your front page counts as a level. So if you are only PR 5 and have a huge directory, it seems you shouldn't split it up into sections; just have one huge page with links to all of it. This of course totally hoses usability, but you will get spidered.
Also, externally linked pages will get spidered: a few of the pages listed under the other letters are indexed because they are linked from blogs and other sites. This is what is happening across the board on my site and the others I have looked at.
Count the levels that are getting spidered and you will notice how deep the crawl goes. For me it is three levels and that is it, except for the externally linked individual pages I have seen.
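For anyone who wants to count levels on their own site, here is a rough Python sketch of the idea. It does a breadth-first walk over an internal link map and reports each page's level, with the front page counted as level 1 as described above. The link map, the content page names, and the `MAX_LEVEL = 3` cutoff are illustrative assumptions based on one reading of the structure in this post, not anything Google has confirmed.

```python
from collections import deque

def click_depths(links, start):
    """Return each reachable page's level, counting the start page as level 1."""
    depths = {start: 1}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link map: the front page links to a.html, which lists the
# 'a' content and links on to the other letter pages.
links = {
    "index.html": ["a.html"],
    "a.html": ["apple.html", "b.html", "c.html"],
    "b.html": ["banana.html"],
    "c.html": ["cherry.html"],
}

MAX_LEVEL = 3  # assumed cutoff for a PR 5 front page, per the observation above

depths = click_depths(links, "index.html")
too_deep = sorted(page for page, level in depths.items() if level > MAX_LEVEL)

for page, level in sorted(depths.items(), key=lambda item: (item[1], item[0])):
    print(level, page)
print("Beyond the assumed cutoff:", too_deep)
```

In this toy map the letter pages sit at level 3 and the content under b.html and c.html lands at level 4, which matches the pattern I am seeing. Externally linked pages are not modelled here, since they give the spider another way in.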
[edited by: tedster at 6:16 pm (utc) on May 22, 2006]
[edit reason] formatting [/edit]
Was this the reason the pages are back in the index? I knew that some of my pages still had PR, even though they were not visible using the site: command.
If you read Matt's blog, he stated that the media bot will assist Googlebot.
[webmasterworld.com...]
I'd say a large proportion of webmasters are in this trap.
The problem is clearly identified on Matt Cutts's blog. When links about gambling, viagra, underwear, or fast cash appear on a travel site that contains travel content, Google is able to identify the offending page.
The page disappears, the links to your site disappear, and the whole thing propagates through the network.
Our experiment shows it with pinpoint accuracy, and the very large number of sites and the [ network in question ] recognise it.
However, it is only one of Google's many assaults on spam, so those need to be taken into account too. I also firmly believe there is collateral damage.
[edited by: Whitey at 2:49 pm (utc) on June 10, 2006]
If your page offends the linking guidelines, your pages disappear, along with the overall ranking previously supported by those pages.
Our experiment shows it with pinpoint accuracy, and the very large number of sites and the [ network we observed in question ] recognise it. There are some good big sites in the network, and spammy ones that should be removed; it's those ones that are causing the problem.
However, it is only one of Google's many assaults on spam that are causing page drops, and I firmly believe there is collateral damage as Google irons out its problems, not only related to this alert.
You lose your position even if you are 3 or 4 sites removed; it travels down the line.
Then you look at the principles behind the algo: possibly you have pages that fall under the same definition when the algo is applied to legitimate sites such as yours, and you then lose pages.
I don't know the 2nd part of it, but some experiments on this would likely reveal combinations of linking and content issues - even of an internal nature [ like "link islands" ].
However, I keep emphasising that Google does appear to have a lot of other technical errors occurring, which is confusing people, especially those who are making observations via the dysfunctional site: tool [ possibly still not fully fixed ].
Sorry this is on this thread; I think there's an overlap, and it probably should now go to the other thread, but there is a synergy, and I felt a need to alert folks to the *urgent* propagation component.
Amazingly, few people have participated in the discussion [ and also the discussion of the related network's forum ] - despite it being recognised as a major collapse issue by the administrators of this key network. Here it is again [webmasterworld.com...]
My site dropped from nearly a million pages to less than 100,000 indexed by Google overnight. I saw a similar drop at a competitor too. tsm26, you did an amazing job of quickly identifying the issue and trying a solution that ended up solving your problem. Too few folks are willing to try innovative solutions like you did because they are counter to SEO folklore. Nice work!
tsm26, one final question for you: after implementing your solution, did you get hurt in the results for the higher-up pages that probably target more competitive terms, i.e. your pages that remained in the index when your page count dropped?
Our remaining pages currently rank very well for some very competitive terms. I'm concerned that changing the site structure to get the more obscure pages indexed will suck PR from our competitive pages and that they will drop in the SERPs.
I'll try to make my question easy for you with multiple choice answers:
A. We never ranked for any competitive terms, so the change didn't matter.
B. We ranked well for competitive terms and still continue to after the change.
C. We ranked well for competitive terms and those terms got hurt in the SERPs after the change, but the traffic from the more obscure terms more than made up for that loss.
D. Something else. Please explain :-)
Thanks again!
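To make the worry about "sucking PR" concrete, here is a toy Python sketch using the simplified textbook PageRank formula (damping 0.85, fixed number of iterations). It is only an illustration: the page names and both link layouts are made up, and nobody outside Google knows how closely the real ranking follows this formula.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified textbook PageRank by power iteration (toy model only)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page with no outgoing links spreads its rank over all pages.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical "deep" layout: obscure pages sit behind a section page.
deep_links = {
    "home": ["money", "section"],
    "money": ["home"],
    "section": ["home", "obscure1", "obscure2"],
    "obscure1": ["home"],
    "obscure2": ["home"],
}

# Hypothetical "flat" layout: the front page links to everything directly.
flat_links = {
    "home": ["money", "obscure1", "obscure2"],
    "money": ["home"],
    "obscure1": ["home"],
    "obscure2": ["home"],
}

deep = pagerank(deep_links)
flat = pagerank(flat_links)
print("money page, deep layout:", round(deep["money"], 3))
print("money page, flat layout:", round(flat["money"], 3))
```

In this toy graph the "money" page scores lower in the flat layout, because the front page splits its vote across more links. Whether that shows up in real rankings is exactly the question above.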
If I sell a green widget on my red widget website's domain, will my green widget pages be dropped because the website is about red widgets?
< continued here: [webmasterworld.com...] >
[edited by: tedster at 5:02 pm (utc) on June 14, 2006]