Forum Moderators: open
Anyway, the site just took a major hit across all pages, and it's a rare time when we've wondered if we were affected by the algo element(s) that some refer to as sandboxing.
Any experience along these lines? We were not aware of an existing site being hurt *just* by virtue of adding 30% more pages. (No dup issues or anything obviously like that at play here.)
I can think of a few reasons why it could be related. If you added links to all those pages on currently ranked pages, you could have affected your keyword density, lost some PageRank (let's not start a discussion on whether that is important though), increased the ratio of links to text on the page, and changed many other factors.
Did you link to these new pages from all your current pages or was it just one link to the new section from the main page? More information will help identify the problem and although it may have something to do with the added pages, I seriously doubt it has anything to do with just the fact that they were added.
In the meantime, may we please have some more facts?
What's the PR of the index page? Yes! I know this might reignite the argument about PR, the toolbar reading, etc., but I think we need to look at everything here.
This is an excellent topic as we have been adding new pages to our sites but with great trepidation because of the fear it would drive our PR down. The other difficulty is trying to determine where the sandbox leaves off, and where the lack of an updated PR comes in.
Wouldn't linking to 500 pages with a PR0 have a negative overall effect?
IMHO, it depends on your site structure and where and how you add the new pages. Sometimes you can dilute the PR of the existing pages enough so that they lose their positions. And if the new pages don't rank well for any particular terms, then overall traffic may decrease.
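To make the "it depends on your site structure" point concrete, here is a minimal sketch of the textbook PageRank iteration run on a toy site before and after a batch of new subpages is added. Everything in it is an assumption for illustration: the page names, the `build_site` helper, the damping factor of 0.85, and the closed graph with no external links. It is not how Google actually scores a site, only a way to see how internal rank shifts when the link structure changes.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank over a dict of page -> outlinks.

    Ranks are normalized so they sum to 1 across the whole toy site.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank over every page.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def build_site(sections, new_subpages=0):
    """Hypothetical site: home links to every section, sections cross-link.

    Optionally hangs `new_subpages` pages off the first section; each new
    page links back to its section page and to the homepage.
    """
    links = {"home": list(sections)}
    for section in sections:
        links[section] = ["home"] + [s for s in sections if s != section]
    expanded = sections[0]
    for i in range(new_subpages):
        sub = f"{expanded}/sub{i}"
        links[expanded].append(sub)
        links[sub] = [expanded, "home"]
    return links


sections = ["triangular", "square", "round"]
before = pagerank(build_site(sections))
after = pagerank(build_site(sections, new_subpages=5))

# Compare each original page's share of the site's rank before and after.
for page in before:
    print(f"{page:12s} before={before[page]:.4f} after={after[page]:.4f}")
```

Changing where the new pages link (for example, only to their section page, or to every other section) changes which existing pages gain or lose share, which is the structural dependence described above.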
Let's say the site is about widgets. And let's say that people buy mainly on shape, but color and size are also important. So, sitewide nav/links appear for the main "shape" pages, i.e., square widgets, round widgets, elongated widgets, etc.
Each section then has its own specific links, further classifying the section's widgets by color and size. So in the "square widgets" section, there will be cross linking to square red widgets, square blue widgets, square small widgets, etc.
The new section I'm referring to, which was previously a single page on "triangular widgets," has now got the same sort of subpages that the square widgets section had (triangular widgets by color and size). All of the new pages in the triangular section now link back to the triangular widgets main page, and to the homepage.
This is the same format employed site wide...and I've never seen a site of similar structure hurt by the addition of a new section, or more pages to a section.
The damage to the entire site is extensive. Most pages have dropped on the order of 100-200 places in the SERPs. Most of the new pages have been spidered but don't appear anywhere in the SERPs.
It's as if the site were just hit by the Florida update. The only new event was the addition of the pages. I find it nearly impossible to believe that adding new pages is the issue, but maybe too many pages at once bothered them? Makes no sense to me.
I've more or less ruled out link text as a problem. The new pages are not the most original in the world because the site is essentially a directory sort of thing, but that was never an issue before. What text there is is unique to us (no feeds or standard aff blurbs).
Cabbie, the new pages only went up around the 12th of the month, so I can't say about the tweak on the 5th, and if anything, things seemed to loosen up again on the 22nd or thereabouts.
I made a post some months back about what to do if your site gets hit (dropped site checklist), but using my own darned checklist (which was supplemented by others), has been of little help. It's driving me crazy. It's not like we just added 10,000 auto-gen pages. These pages were hand coded. Yes we still do some of that. ;-)
Since then, just based on a gut feeling, I've been adding new pages at a slower rate, waiting for each batch of new pages to be cached before adding more. I don't know if it makes any difference, but I figured there would be less risk (if any ever existed) by going a bit slower.
Whilst waiting for your info, I'd cobbled together some thoughts of my own.
My own deep suspicion is that the villain is PR, or more precisely, the diminution of PR by a sudden increase in the new pages taking you below a threshold when the neo-sandbox effect comes into play.
Let's digress slightly. New sites on new domains get sandboxed for circa 8 months (according to observations made by others on this forum). It seems all new incoming links are 'frozen' and PR delayed. That seems to make sense as Google vets the links for 'unnatural accretion'.
New sites where the file names are changed but on an existing domain get put into limbo for three months. Once again PR falls below a threshold because inward links may have nowhere to go.
New sites which keep their file names on existing domains get updated immediately without loss of PR (Not sure about this one)
It seems to me therefore that loss of PR in all these cases is a common factor.
Am I missing a trick here?
>> Wouldn't linking to 500 pages with a PR0 have a negative overall effect?
Adding more content will not hurt your PR in the sense that it will damage your overall competitiveness.
I have added huge blocks of pages - 200, 300, 500 at a time when needed, also, at times when it was appropriate I parceled out the new content 15-20 articles a week so that there was always fresh content, but in "spider size" portions to keep freshbot interested.
It depends on the site, and the level of maintenance you want to assign to it - sometimes it is just easier to upload it and forget about it, other times it is better to do some handholding.
On the 22nd-23rd this site took a dive from hundreds of first page results to ~30 to ~300 place positions, with the average drop from first page to about page 9 or 10. No other changes to the site, and it remains a strong PR6 with about 25 to 30 internal pages also a PR6. All pages continue to be fully indexed, and all the new pages in the subsection have been added as well.
I, too, had wondered if the new pages had anything to do with the drop in rankings as nothing else has been done to the site (other than minor updates) and other sites, which I also oversee, weren't affected by the change around the 22nd.
The new pages were added between the 5th and the 15th, and were all crawled before the 22nd.
>> This is the same format employed site wide
>> The new pages are not the most original in the world
>> What text there is is...
Erhm.. something strikes me as being quite similar here... if i didn't know better i might think that these four phrases were just one phrase with only minor variations. I guess if i saw 500 similar phrases i wouldn't really be able to tell one from the other... get it? ;)
--
Added:
yes, i saw that: "No dup issues or anything obviously like that" - but obvious isn't really always...
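For anyone who wants to put a number on "minor variations", here is a rough sketch of one common textbook measure of near-duplication: Jaccard similarity over word shingles. Nobody in this thread knows what Google's dup filters actually compute, so treat this purely as an assumption for checking your own templated pages. The function names, the shingle size of 3, and the sample blurbs are all made up for illustration.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b, size=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical templated blurbs that differ by a single word.
red = "square red widgets for small gardens and patios"
blue = "square blue widgets for small gardens and patios"
other = "a field guide to european butterflies by region"

print(jaccard(red, blue))
print(jaccard(red, other))
```

Running every pair of pages in a section through something like this gives a quick sense of how much of the text is template and how much is genuinely different.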
No, what I meant was, it's not like this is CNNdotcom where every page is a new original piece or something. This is an information site organized a bit like a directory. It also contains contributed content. Similar in structure to Brett's pyramid thing. There are text blurbs for each entry, and we write every one of the entries ourselves; 10 or 20 entries per page...thousands of entries across the site. And there are also contributed articles (hundreds) from enthusiasts with interest/expertise.
Also, I used widgets to illustrate, but the site is not about things for sale. Mainly information. It's closer to a site about butterflies than it is to a site selling books or bbq's.
It's more a labor of love, that ended up being profitable too...revenues now come from advertising/sponsorships and more recently (last year) some specialized aff links.
I made some of the points I did because we've already looked at things like too many templated pages and stuff like that. But we don't see any dup issues here, and believe me, we've seen some pretty subtle dup problems in the past. (I'm one who believes that the dup filters have a *lot* to do with what has been going on over the last year.)
Added:
IF the site were about butterflies, which it's not, and we added a new section on European varieties, when the site had mainly covered N. American varieties, one might expect the new pages to be structured similarly to the old, as is the case here.
[edited by: caveman at 6:19 pm (utc) on Sep. 28, 2004]
We were hoping to get a bit of nice press for the new section.
I am, however, curious as to whether your new pages have been spidered. We've also been adding huge numbers of new pages, and have noticed that the speed for these pages to show up in Google has slowed down significantly.
Never seen the addition of new pages to a site hurt the site's PR. :) The percentage of links pointing to external pages is no different on the new pages to what exists on the old pages.
Re the dup thing, as noted, the content on the new pages is similar to the old content only to the extent that when a butterfly site adds new pages about new regions/varieties, the new pages are also about butterflies. I can't see that hurting the site.
We looked at whether more anchor text (by virtue of more new pages) pointing back to other key pages in the site might have caused a problem, but we're not going to have some of our links say "Regal Butterflies" and others say "Regal-ish Butterflies" just to avoid some screwy filter.
----
Here's the updated track for the search "bbc site:bbc.co.uk" - i'm not sure it's 100% related, but with a bit of effort (and a suspicious mind) i guess you could spot a trend anyway:
Oct 12, 2003 [webmasterworld.com]: 3,100,000 pages
Apr 09, 2004 [webmasterworld.com]: 823,000 pages
Jul 12, 2004 [webmasterworld.com]: 696,000 pages
Sep 28, 2004 [google.com]: 586,000 pages
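To spell out the trend in those four counts, here is a small sketch that works out the percentage change between each pair of consecutive readings and across the whole period. The figures are the ones listed above; the variable and function names are just for illustration.

```python
# Reported result counts for the query, in the order listed above.
observations = [
    ("Oct 12, 2003", 3_100_000),
    ("Apr 09, 2004", 823_000),
    ("Jul 12, 2004", 696_000),
    ("Sep 28, 2004", 586_000),
]


def percent_change(old, new):
    """Percentage change from old to new (negative means a drop)."""
    return (new - old) / old * 100.0


# Change between each consecutive pair of readings.
step_changes = [
    (later_date, percent_change(earlier, later))
    for (_, earlier), (later_date, later) in zip(observations, observations[1:])
]
for date, change in step_changes:
    print(f"up to {date}: {change:+.1f}%")

# Change from the first reading to the last.
overall = percent_change(observations[0][1], observations[-1][1])
print(f"overall: {overall:+.1f}%")
```

On these figures the reported count falls by roughly 81% from the first reading to the last, with most of that drop happening between the first two readings.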
I do think that there's something to size of sites playing a role in the algo...and also rate of growth (though I know that that's controversial)...but adding 500 pages of genuine listings to a 1400 page site, as a reason for essentially knocking out the entire site?
BTW, PR bars still show normal, but I don't trust 'em now anyway, so who knows.
How did you create these pages? Where did the information come from? Did you do a wee bit of plagiarism, i.e. re-write stuff already in print or on the net, add some fresh thoughts, that sort of thing?
How many of us have the cash or time to commission original material from an expert? We use copywriters, freelance authors, just about anyone with some writing skill.
Could you have been hit by Latent Semantic Indexing?
If your writers were not experts in the field, just ordinary folk with a writing gift, could they have written stuff which didn't look right when LSI is applied? Yes! a horror story, if LSI has advanced to penalising content wholesale.
But, if we think hard about Google's probable intention to rid the net of duplicate content (no! that's a certainty) and all the puff and wind which goes for content, it becomes a serious contender.
We all know good, rich, abundant text content is part of the "weighting" in the algo (or we think we do) - Google is always banging on about the need for us all to concentrate on giving the punter a rich satisfying diet. So it could be on the cards for an LSI filter to knock out anything which was 'unnatural'.
Phew! a worrying thought isn't it?
Back to you Caveman. How did you write your new stuff?
What was your root stock of information?
Here is a comment in another thread by Decaff:
Today...when a new site that is trying to "suddenly" compete in a competitive area shows up in the SERPs with hundreds of inbound links, a huge number of content pages but no real history...this raises a red flag as far as Google is concerned...this would fall under the "anomalies" aspect of data mining..
If your content is seen as 'anomalous' because it is (a) suddenly injected into the website and (b) looks like wind and puff when examined by LSI then surely Google will put it into limbo for a while?
Please answer my question about how you created the new content. Could it be construed as wind and puff?