Forum Moderators: Robert Charlton & goodroi
How I noticed this is that we have a huge directory of content arranged alphabetically, with each letter being a separate page (a.html, for example). From my front page I have a.html linked, and then all the content links on that page. The content that starts with the letter 'a' is all indexed. The pages like b.html and c.html are also indexed, but the individual content pages aren't.
So what this means is that Google assigns an overall site PR which tells it how many levels down it will index. In my limited research it seems that a site with a front page of PR 5 will get indexed three levels down, and a site of PR 6 will get indexed four levels down. The sites below PR 5 I have looked at are barely getting spidered.
When doing this, keep in mind that your front page counts as a level. So if you are only PR 5, it seems that if you have a huge directory you shouldn't split it up into sections; just have one huge page with links to it all. This of course totally hoses usability, but you will get spidered.
Also, externally linked pages will get spidered: a few of the pages listed under the other letters are indexed because they are linked from blogs and other sites. This is what is happening across the board on my site and the others I have looked at.
Count your levels getting spidered and you will notice how deep they are going. For me, it's three levels and that is it, except for the externally linked individual pages I mentioned.
[edited by: tedster at 6:16 pm (utc) on May 22, 2006]
[edit reason] formatting [/edit]
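Counting how deep your pages sit can be sketched as a breadth-first pass over the site's internal link graph. This is a hypothetical illustration (the page names and the depth cutoff are made up, not taken from the post):

```python
from collections import deque

def click_depth(links, home):
    """BFS from the home page; returns each page's level,
    where the home page itself counts as level 1 (as noted above)."""
    depth = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical site: front page -> a.html -> content pages
site = {
    "/": ["a.html", "b.html"],
    "a.html": ["apple.html", "anchor.html"],
    "apple.html": [],
}
levels = click_depth(site, "/")
# Pages deeper than whatever cutoff your PR seems to earn
# would be the ones at risk of not being spidered.
at_risk = [p for p, d in levels.items() if d > 2]
```

Pages past the cutoff would be the ones to watch, or to pick up external links for, as the post suggests.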
I think I know the root of the problem. All their algorithms are approximate (with numerical or statistical error), and when you run 30 different algorithms on billions of pages, numerical and statistical errors do accumulate.
Anyone who knows the basics of numerical analysis and/or probability/statistics will know what I'm talking about.
The accumulation of numerical and statistical error across 30 different algorithms and billions of pages makes Google less relevant, and at this point I'm seeing better results in Y! SERPs.
I had 80% of unique content pages dropped, with the majority of the remaining gone into supplemental.
Another extremely interesting find is that I am searching for the titles of my 'supplemental' pages and they don't even come up. Before, at least, G would rank supplementals if it didn't find more relevant pages in its clean index.
Has anyone else noticed this?
If Google can't properly index 'blogs' that they have created, then how do you expect them to correctly index the rest of the sites?
It signifies a major shift in G policy when indexing the web. They are no longer organising the world's information, only the information they see as important, effectively 'censoring' the rest.
Which again prompts the thought:
Why drop so many pages?
Lack of capacity or simply a bug?
There is no other answer.
This is not a shift in priorities as Mr Cutts claims; this is G in serious trouble.
Why do you think crawling is now prioritised?
Why did Bigdaddy shrink the 'Google' web instead of expanding it?
Why did Matt talk about indexing more pages before Bigdaddy, and then start saying that it's indexing only (or mainly) 'important' pages after the Bigdaddy bug was spotted?
With Google resolving canonical issues and even indexing more JavaScript, everyone should see more pages, not fewer, after the last update.
There are thousands of webmasters who keep reporting problems, with millions of sites hit by this.
And this is why this thread keeps going on.
:)
Your ignorance is forgiven as a newbie...
They do not drop pages; they are reindexing with certain priorities (see Matt Cutts' blog). That will take some time and patience from webmasters, and at this point it is useless for this thread to go on.....
1984bb, why are you reading and posting in this thread so frequently if it is "useless"? Hypocrisy and immaturity at its best.
On another note... I have seen a significant reduction in spidering on a site of mine during the past month, but there is no real reason for it in terms of IBLs or page "reputation". So I wonder if part of their "priority" crawling has to do with the condition of a site's indexing? That is, if pages are already indexed properly, use minimal redirects, and do not change frequently, then perhaps they are spidered less. That would not mean the page is less important, which many might automatically assume if they see googlebot visiting less. Just a thought.
Getting that supplemental mess cleaned up is a big task, so they really do need to prioritize.
Anyway, I hope he continues to write about what is going on; every bit helps.
4 distinct websites about different genres.
2 have been registered and indexed in Google longer than 3 years.
2 were indexed in Nov 2004
All have a 301 redirect in place from non-www to www. All take part in link exchanges to a moderate degree. All of the sites have some one-ways, some recips and some outbound links to high quality sites in the respective genre.
All are using Google sitemaps, and every site has a link to a sitemap page that lists every page in the site, and this link is near the top of the page code for each page.
A site: search shows more pages listed now than last week for all four sites. A site search for each site shows the sitemap.html listed 5th-8th on average.
A site: search shows the root URL with www as the first listing in Google.
Hope this helps...
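As an aside, the non-www to www 301 these sites use is typically set up in Apache with mod_rewrite. A minimal .htaccess sketch, with example.com standing in for the real domain:

```apache
RewriteEngine On
# Catch requests for the bare domain (case-insensitive)
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# Permanently redirect them to the www version, keeping the path
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```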
Another issue that has been discussed and applies to me is that many of the pages are similar across the website. I have the same 30 resources (apparel, travel, venues, etc.) under each of 50-odd regions. Some of them do not have substantial content (listings/ads) yet, so they would probably appear identical to a machine.
But why should this be a bad thing? It's a completely natural occurrence in a hierarchical-type resource website. There's only so creative one can be before it looks sillier than the so-called "duplication".
Who is it that we complain to? We were told to be patient. Now what?
One thing I should further add is that around Sept I applied the 301's to all four websites I mentioned previously and also removed all links pointing home with the keyword in the anchor.
Hope it helps...
I also see that only the first 2 levels beyond the home page are being indexed. Even the 2nd level is sketchy. I completely overhauled the site in Sept., 2005. It was indexed well right off the bat, so link structure is not the problem.
Possibly it could help to create a complete sitemap and link to it from a main part of every page as part of the navigation. Treat the sitemap as an important page in your site, as it really is a directory of your entire site. Place the link to the sitemap high up in the page code to further help establish importance. If Google deems it important, it's likely to deem the outgoing links on it important as well.
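Generating such a complete sitemap page can be as simple as the following sketch (the page list and titles are placeholders, not from the thread):

```python
def build_sitemap_html(pages):
    """pages: list of (url, title) tuples; returns a plain HTML
    sitemap page linking every page on the site."""
    items = "\n".join(
        f'<li><a href="{url}">{title}</a></li>' for url, title in pages
    )
    return f"<html><body><h1>Site Map</h1>\n<ul>\n{items}\n</ul></body></html>"

# Hypothetical page list
html = build_sitemap_html([
    ("/a.html", "A listings"),
    ("/b.html", "B listings"),
])
```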
If you mean by having regions / duplication the following scenario:
this-widget-nantucket.html
this-widget-katmandu.html
and each page is really about the same content, with reworded text for the region and Title, then I could see why you might be nailed with a duplicate penalty.
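For what it's worth, one common way (not necessarily what Google does) to gauge whether two reworded region pages look near-identical to a machine is word-shingle overlap. A hypothetical sketch, with made-up page text:

```python
def shingles(text, k=3):
    """Set of k-word shingles from a page's text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of the two pages' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Two region pages differing only in the place name
nantucket = "widget shops and widget repair services in Nantucket listed by town"
katmandu = "widget shops and widget repair services in Katmandu listed by town"
score = similarity(nantucket, katmandu)
```

A high score on pages like these is exactly the "identical to a machine" situation described above.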
(I review products for my site under different categories)
1. Changed the location from www.domain.com/product/ to www.domain.com/product.asp
2. Replaced underscores with hyphens in file/image names.
3. Moved many links that are currently not linking back to my site to my links page (they were on the side menu on every page, blog style).
4. Kept my 5 top traffic trades on the menu, and added only 5 other link-trade links on my menu.
three are straight trades --
Link 1 (PR 4)
Link 2 (PR 0)
Link 3 (PR 4)
two are a-b-c trades --
Link 4 (IN = 3 / OUT = PR 0)
Link 5 (IN = 3 / OUT = PR 0)
While i'm at it, my 5 traffic trades =
Link 1 (PR 4)
Link 2 (PR 4)
Link 3 (PR 3)
Link 4 (PR 4)
Link 5 (PR 0) *should I remove?*
So my question is the last one, should I remove the PR 0 traffic trade? The traffic isn't all that great.
Also, will the links on the links.asp page hurt me? Should I scrap them? A lot of them are not even hardlink trades, they're sort of traffic trades where if I send traffic, my link appears on their page in a hardlink format usually. Not an ideal way, so I may remove them if they hurt.
Thanks for any tips! Hopefully my changes show some results, as another page was dropped today :( Down to 5 listed, yikes.
Possibly it could help to create a complete sitemap and link to it from a main part on every page as part of the navigation.
Why should I? Thank you for the advice, you might be right, but why should I have to play games? Obviously the way I was doing things was just fine until now. And it's technically sound. Visitors find their way around.
Besides, such a sitemap would be a complete nightmare for any user.
If you mean by having regions / duplication the following scenario:
this-widget-nantucket.html
this-widget-katmandu.html
Something like that...
Something Region 1
--Region 1 apparel
--Region 1 food
--Region 1 lodging
Something Region 2
--Region 2 apparel
--Region 2 food
--Region 2 lodging
and each page is really about the same content, with reworded text for the region and Title, then I could see why you might be nailed with a duplicate penalty.
As I have mentioned elsewhere, why should this be suspect? It's a perfectly common, natural and legitimate practice - an obvious way to organize regional content in particular.
Google's policies are forcing people to do unnatural things to get around their "prevention" techniques.
I have been spending my valuable time putting together articles and resources for my visitors. I don't have time to play games with these people.
And they say in their webmaster guidelines not to create a site for search engines, but for visitors (wording to that effect). So it really shouldn't be a requirement to have a sitemap if your navigation is good enough for visitors to potter around your site and find the page they require.
I'm sure if MSN or Yahoo! started asking for a sitemap submission people would be up in arms.
Ha!
Why should I? Thank you for the advice, you might be right, but why should I have to play games?
You are right, but I thought you were asking for tips (and not a conversation about the ethics of Google).
If you choose to chase Google traffic then playing the game is inevitable.
Having said that, I don't think a sitemap link on every page is a hindrance to your visitors whatsoever; it doesn't have to be in your middle content area.
I have found Matt Cutts' blog to be very helpful; I am glad they allow someone to post as much as he has on this subject. It made me take a closer look at my site and see some problems which I think are hurting my rankings [such as the mysite.com vs. www.mysite.com problem, and bad outside links].
And that is the point. Now everyone looks for problems on their sites because of bad rankings and lost pages. But before Bigdaddy it worked very well with indexing, spidering and caching.
Why doesn't someone suggest that Google has some more problems with the new Bigdaddy infrastructure? E.g. there is Matt Cutts explaining that there is a new spidering technique based on incoming links, but on the other hand the Google Sitemaps team writes on their blog that they are looking into why so many pages are dropping out of the index (and that they are hopefully not the reason). That does not match.
As a database engineer I know what it means to handle a few million datasets. But Google handles billions, with cache + old cache + deleted pages and so on. There are not just 100 rows of code but millions, and each row has the potential to run into an error. So if one goes wrong it will hurt the whole system. Finding that small error is the task, and it will take time.
IMO
Firstly, it looks better and is perhaps easier for users to remember, and secondly it was aimed at helping rankings etc.
Well, that's all gone to pot, since Google hardly ranks any of my product pages now, and not one single new page since May 2005.
I'm now assuming that Google thinks my product pages are four directories deep when I could easily have them at the top level. Given my low PR, I assume that's why my indexed page count is low.
I suppose I might as well go back to using the first example if Google will then index all my product pages. Has anyone any experience of doing this and seeing more pages indexed (with a low PR)?
category pages are like this:
www.mysite.com/first-category.php
www.mysite.com/second-category.php
product pages are like this:
www.mysite.com/products/00001/brandname/first-model.html
www.mysite.com/products/00002/brandname/new-model.html
www.mysite.com/products/00003/brandname/other-model.html
They don't follow a hierarchical structure as such. What you end up getting is this:
www.mysite.com/first-category.php
www.mysite.com/products/00001/brandname/first-model.html
www.mysite.com/products/00002/brandname/new-model.html
www.mysite.com/second-category.php
www.mysite.com/products/00002/brandname/new-model.html
www.mysite.com/products/00003/brandname/other-model.html
As opposed to
www.mysite.com/first-category/00001/brandname/first-model.html
www.mysite.com/first-category/00002/brandname/new-model.html
www.mysite.com/second-category/00002/brandname/new-model.html
www.mysite.com/second-category/00003/brandname/other-model.html
which results in duplication of pages.
That's why I didn't go down the hierarchical route.
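The duplication point can be made concrete with a quick sketch: a product filed under two categories gets two distinct hierarchical URLs, but only one flat /products/ URL. The helper below is hypothetical, just to illustrate the two schemes:

```python
def product_url(product_id, brand, model, category=None):
    """Hypothetical URL builder for the two schemes discussed above."""
    if category:
        # Hierarchical: one URL per category the product appears in
        return f"/{category}/{product_id:05d}/{brand}/{model}.html"
    # Flat: a single canonical URL regardless of category
    return f"/products/{product_id:05d}/{brand}/{model}.html"

# A product listed in two categories:
categories = ["first-category", "second-category"]
hierarchical = {product_url(2, "brandname", "new-model", c) for c in categories}
flat = {product_url(2, "brandname", "new-model")}
# hierarchical has 2 URLs for the same content; flat has just 1
```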
I agree with Jo1ene. What is the problem in offering surfers the information they need in a well laid out form? After all, if you've got, say, a restaurant guide, does G want to see one massive page with every restaurant on it, or the guide split down into regions so it's easy for surfers to use? It really seems like G is asking webmasters to start going backward rather than forward.
www.mysite.com/00001-brandname-first-model.html
www.mysite.com/00002-brandname-new-model.html
It makes no difference to me, although those URLs will get remarkably long once I replace brandname and model with the real details.