| This 84 message thread spans 3 pages: 84 (  2 3 ) > > || |
|Can a Load of New Pages Hurt an Existing Site?|
Not sure I've Heard Comments on This
One of our mid sized sites became quite a bit larger two weeks ago, when we added about 500 pages of content...a section we had wanted to add for a while now. The new section provides greater detail and info on a subtopic of our site which was previously only covered by a single page.
Anyway, the site just took a major hit across all pages, and it's a rare time when we've wondered if we were affected by the algo element(s) that some refer to as sandboxing.
Any experience along these lines? We were not aware of an existing site being hurt *just* by virtue of adding 30% more pages. (No dup issues or anything obviously like that at play here.)
I cannot think of a single reason why Google would penalise more content; that wouldn't make sense, considering that a huge number of sites add 100-1000 pages a day.
I can think of a few reasons why it could be related. If you added links to all those pages on currently ranked pages, you could have affected your keyword density, lost some PageRank (let's not start a discussion on whether that is important though), increased the ratio of links to text on the page, and changed many other factors.
Did you link to these new pages from all your current pages or was it just one link to the new section from the main page? More information will help identify the problem and although it may have something to do with the added pages, I seriously doubt it has anything to do with just the fact that they were added.
Anything perceived as 'unnatural' by Google is at risk in my opinion of a penalty, or at any rate a period in the wilderness.
To put up 30% more pages in one hit may be regarded as 'unnatural'.
I've read somewhere else on this forum recently the view that pages should be added cautiously.
Don't know yet whether there is sufficient evidence to back up this theory. Perhaps this thread will encourage folk to come forward with their experiences.
In the meanwhile may we please have some more facts?
What's the PR of the index page? Yes! I know this might reignite the argument about PR, the toolbar reading, etc., but I think we need to look at everything here.
caveman - it sounds like a coincidence. I agree that there is no reason putting up new content and linking it correctly would affect your overall standings for an existing site, despite the dire warnings of the nay-sayers about sandboxes and penalty boxes. I tend to think it is too soon to judge the results based on a few days; wait it out and see the fresh results without the effect of the little SERPs update that has been occurring for the last few days. I'm seeing that it's taking about a week to 10 days for that type of content add to show up in the results.
I am in the same boat Caveman, a site that I added a lot of pages to over the last month has just taken a hit. I too wondered if the new pages had an effect, but I don't think so. This site took a hit on Aug 5 but had recovered nearly all traffic until this last update. I am putting it down to a reversion to the algo used on Aug 5th, as all my sites that suffered then suffered with this last one, and all my sites that benefited from Aug 5th gained with this last one. Too coincidental in my book.
I also have sites that have remained steady through all of them. It would be nice for all of them to have that stability.
It's the same here as well. On one of my sites I added over 100 new pages. They showed up in the results after 3 days, and some of them, though very fresh, hit the top. Finally my entire site was hit at the 23-24 update.
A load of new pages are ready to upload. I wonder should I join the boat or wait?
|I am in the same boat Caveman |
Were all the 500 pages spidered?
Wouldn't linking to 500 pages with a PR0 have a negative overall effect?
This is an excellent topic as we have been adding new pages to our sites but with great trepidation because of the fear it would drive our PR down. The other difficulty is trying to determine where the sandbox leaves off, and where the lack of an updated PR comes in.
|Wouldn't linking to 500 pages with a PR0 have a negative overall effect? |
IMHO, it depends on your site structure and where and how you add the new pages. Sometimes you can dilute the PR of the existing pages enough so that they lose their positions. And if the new pages don't rank well for any particular terms, then overall traffic may decrease.
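To put rough numbers on the dilution idea, here is a minimal sketch using the classic published PageRank formula, PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)). It is only a toy model, not Google's actual ranking; the site layout (a home page linking to every inner page, each inner page linking back), the page names, and the damping factor of 0.85 are all assumptions made for illustration.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)) on a toy link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {p: 1.0 - d for p in pages}
        for src, targets in links.items():
            if not targets:
                continue  # a page with no outlinks passes nothing on
            share = d * pr[src] / len(targets)
            for t in targets:
                new[t] += share
        pr = new
    return pr


def build_site(inner_pages):
    """Hypothetical structure: home links to every inner page, each links back to home."""
    links = {"home": list(inner_pages)}
    for page in inner_pages:
        links[page] = ["home"]
    return links


old_pages = ["a", "b", "c"]                  # existing inner pages (made-up names)
new_pages = [f"new{i}" for i in range(20)]   # a batch of added pages

before = pagerank(build_site(old_pages))
after = pagerank(build_site(old_pages + new_pages))

print(f"page 'a' before: {before['a']:.3f}  after: {after['a']:.3f}")
print(f"home     before: {before['home']:.3f}  after: {after['home']:.3f}")
```

In this particular layout the existing inner pages end up with a lower score after the new pages go up, because home now splits what it passes on across 23 links instead of 3, while home itself gains from the extra links pointing back at it. A different linking structure would shift the numbers differently, which is the "it depends" part.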
My apologies. Clearly I should have provided more info.
Let's say the site is about widgets. And let's say that people buy mainly on shape, but color and size are also important. So, sitewide nav/links appear for the main "shape" pages, i.e., square widgets, round widgets, elongated widgets, etc.
Each section then has its own specific links, further classifying the section's widgets by color and size. So in the "square widgets" section, there will be cross linking to square red widgets, square blue widgets, square small widgets, etc.
The new section I'm referring to, which was previously a single page on "triangular widgets," has now got the same sort of subpages that the square widgets section had (triangular widgets by color and size). All of the new pages in the triangular section now link back to the triangular widgets main page, and to the homepage.
This is the same format employed site wide...and I've never seen a site of similar structure hurt by the addition of a new section, or more pages to a section.
The damage to the entire site is extensive. Most pages have dropped on the order of 100-200 places in the SERPs. Most of the new pages have been spidered but don't appear anywhere in the SERPs.
It's as if the site were just hit by the Florida update. The only new event was the addition of the pages. I find it nearly impossible to believe that adding new pages is the issue, but maybe too many pages at once bothered them? Makes no sense to me.
I've more or less ruled out link text as a problem. The new pages are not the most original in the world because the site is essentially a directory sort of thing, but that was never an issue before. What text there is is unique to us (no feeds or standard aff blurbs).
Cabbie, the new pages only went up around the 12th of the month, so I can't say about the tweak on the 5th, and if anything, things seemed to loosen up again on the 22nd or thereabouts.
I made a post some months back about what to do if your site gets hit (dropped site checklist), but using my own darned checklist (which was supplemented by others), has been of little help. It's driving me crazy. It's not like we just added 10,000 auto-gen pages. These pages were hand coded. Yes we still do some of that. ;-)
I have added a similar percentage of new pages to an established high ranking site. The new pages were spidered and indexed within a few days. Many went to top 10 in the serps with no negative results suffered by the rest of the site.
Since then, just based on a gut feeling, I've been adding new pages at a slower rate, waiting for each batch of new pages to be cached before adding more. I don't know if it makes any difference, but I figured there would be less risk (if any ever existed) by going a bit slower.
Thanks Caveman for coming back with more information.
I for one will now see if I can come up with some helpful observations.
Whilst waiting for your info, I'd cobbled together some thoughts of my own.
My own deep suspicion is that the villain is PR, or more precisely, the diminution of PR by a sudden increase in the new pages taking you below a threshold when the neo-sandbox effect comes into play.
Let's digress slightly. New sites on new domains get sandboxed for circa 8 months (according to observations made by others on this forum). It seems all new incoming links are 'frozen' and PR delayed. That seems to make sense as Google vets the links for 'unnatural accretion'.
New sites where the file names are changed but on an existing domain get put into limbo for three months. Once again PR falls below a threshold because inward links may have nowhere to go.
New sites which keep their file names on existing domains get updated immediately without loss of PR. (Not sure about this one.)
It seems to me therefore that loss of PR in all these cases is a common factor.
Am I missing a trick here?
There's been some juggling in the SERPs over the past week or so. I wouldn't necessarily blame adding new content for the rankings shift.
|Wouldn't linking to 500 pages with a PR0 have a negative overall effect? |
Adding more content will not hurt your PR in the sense that it will damage your overall competitiveness.
I have added huge blocks of pages - 200, 300, 500 at a time when needed, also, at times when it was appropriate I parceled out the new content 15-20 articles a week so that there was always fresh content, but in "spider size" portions to keep freshbot interested.
It depends on the site, and the level of maintenance you want to assign to it - sometimes it is just easier to upload it and forget about it, other times it is better to do some handholding.
I, too, added a new section to a web site that took a hit on the 22nd-23rd. That new section increased site size about 25% (from about 170 pages to 220). I added a link to the main navigation bar used throughout the site, which goes to a page with links to the new sub pages. The design of the section involved a new (original) template, and each has original content, hand-coded. None of these new pages have any out bound links, nor any backlinks yet - just internal navigation.
On the 22nd-23rd this site took a dive from hundreds of first page results to ~30 to ~300 place positions, with the average drop from first page to about page 9 or 10. No other changes to the site, and it remains a strong PR6 with about 25 to 30 internal pages also a PR6. All pages continue to be fully indexed, and all the new pages in the subsection have been added as well.
I, too, had wondered if the new pages had anything to do with the drop in rankings as nothing else has been done to the site (other than minor updates) and other sites, which I also oversee, weren't affected by the change around the 22nd.
The new pages were added between the 5th and the 15th, and were all crawled before the 22nd.
Has the index page lost PR?
What's happened to the PR of the old pages?
Have any of the pages been PR0'd?
Of course you may not have the 'real' PR info yet.
Am I right in thinking the Google PR tool gives a false reading at this point?
I seem to have read that observation somewhere.
>> The new section (...) has now got the same sort of subpages that ...
>> This is the same format employed site wide
>> The new pages are not the most original in the world
>> What text there is is...
Erhm.. something strikes me as being quite similar here... if i didn't know better i might think that these four phrases were just one phrase with only minor variations. I guess if i saw 500 similar phrases i wouldn't really be able to tell one from the other... get it? ;)
yes, i saw that: "No dup issues or anything obviously like that" - but obvious isn't really always...
Does it really take 8 months to come out of a sandbox?! ... I was under the impression that it was 60 to 90 days :/
Whoever believes in sandbox theory will say none of the site is out of sandbox yet. And also there is no time frame.
very funny claus...
No, what I meant was, it's not like this is CNNdotcom where every page is a new original piece or something. This is an information site organized a bit like a directory. It also contains contributed content. Similar in structure to Brett's pyramid thing. There are text blurbs for each entry, and we write every one of the entries ourselves; 10 or 20 entries per page...thousands of entries across the site. And there are also contributed articles (hundreds) from enthusiasts with interest/expertise.
Also, I used widgets to illustrate, but the site is not about things for sale. Mainly information. It's closer to a site about butterflies than it is to a site selling books or bbq's.
It's more a labor of love, that ended up being profitable too...revenues now come from advertising/sponsorships and more recently (last year) some specialized aff links.
I made some of the points I did because we've already looked at things like too many templated pages and stuff like that. But we don't see any dup issues here, and believe me, we've seen some pretty subtle dup problems in the past. (I'm one who believes that the dup filters have a *lot* to do with what has been going on over the last year.)
IF the site were about butterflies, which it's not, and we added a new section on European varieties, when the site had mainly covered N. American varieties, one might expect the new pages to be structured similarly to the old, as is the case here.
[edited by: caveman at 6:19 pm (utc) on Sep. 28, 2004]
You have a deeply suspicious mind, my friend.
But, you may have a point.
Why caveman did you feel you had to upload 30% new content in one hit?
Isn't it more natural to upload as you go along?
It would appear Google puts the anchors on anything suspicious, until it can check whether it's OK or not.
Your 30% upload seems 'unnatural'. Well, to me at any rate.
Do you think you have hit the duplicate content button?
Do you think you have done this inadvertently?
Sorry, this is not criticism - we are just wanting to help by analysing the facts. We may be totally off beam...so excuse the observations.
Over to you Caveman
Midhurst, per my added note above, all we did was to expand a previously meager section of the site (single page) to be comparable in quality and structure to the other sections. Doing it in pieces would have made the new expanded section look odd/incomplete, so we waited until it worked well as a stand alone entity.
We were hoping to get a bit of nice press for the new section.
OK, Caveman, sorry about the last post.
Have now read your most recent post which overlapped with mine.
So let's have your comments about the PR thesis?
Caveman, I don't think it's the new content. I think it's a tightening of the dup content filter that we first saw on August 4th. Specifically, stuff that was not considered duplicate content prior, is now being filtered out. This is reducing the positive impact of internal linking and anchor text, thus eroding your position in the SERPS across the board.
I am, however, curious as to whether your new pages have been spidered. We've also been adding huge numbers of new pages, and have noticed that the speed for these pages to show up in Google has slowed down significantly.
Yeah most of the pages have been spidered.
Never seen the addition of new pages to a site hurt the site's PR. :) The percentage of links pointing to external pages is no different on the new pages to what exists on the old pages.
Re the dup thing, as noted, the content on the new pages is similar to the old content only to the extent that when a butterfly site adds new pages about new regions/varieties, the new pages are also about butterflies. I can't see that hurting the site.
We looked at whether more anchor text (by virtue of more new pages) pointing back to other key pages in the site might have caused a problem, but we're not going to have some of our links say "Regal Butterflies" and others say "Regal-ish Butterflies" just to avoid some screwy filter.
It seems i was wrong then, sorry about that. I could buy in to the theory that the "indexed volume per domain" knob - if such a knob exists - had been turned somehow, though.
Here's the updated track for the search "bbc site:bbc.co.uk" - i'm not sure it's 100% related, but with a bit of effort (and a suspicious mind) i guess you could spot a trend anyway:
Oct 12, 2003 [webmasterworld.com]: 3,100,000 pages
Apr 09, 2004 [webmasterworld.com]: 823,000 pages
Jul 12, 2004 [webmasterworld.com]: 696,000 pages
Sep 28, 2004 [google.com]: 586,000 pages
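For anyone who wants the trend as percentages, the four counts above can be run through a quick calculation. This is just arithmetic on the reported numbers; the variable and function names are made up for the example, and it says nothing about why the counts moved.

```python
# Reported result counts for the search, copied from the list above.
counts = [
    ("Oct 12, 2003", 3_100_000),
    ("Apr 09, 2004", 823_000),
    ("Jul 12, 2004", 696_000),
    ("Sep 28, 2004", 586_000),
]


def pct_change(old, new):
    """Percentage change from old to new (negative means a drop)."""
    return (new - old) / old * 100.0


# Change between each pair of consecutive snapshots.
steps = [
    (d0, d1, pct_change(c0, c1))
    for (d0, c0), (d1, c1) in zip(counts, counts[1:])
]
for d0, d1, change in steps:
    print(f"{d0} -> {d1}: {change:+.1f}%")

# Change from the first snapshot to the last.
overall = pct_change(counts[0][1], counts[-1][1])
print(f"Overall: {overall:+.1f}%")
```

Every step comes out negative, with by far the largest drop between the first two snapshots.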
Hey claus, that was a fair question....
I do think that there's something to size of sites playing a role in the algo...and also rate of growth (though I know that that's controversial)...but adding 500 pages of genuine listings to a 1400 page site, as a reason for essentially knocking out the entire site?
BTW, PR bars still show normal, but I don't trust 'em now anyway, so who knows.
"The new pages are not the most original in the world"
How did you create these pages? Where did the information come from? Did you do a wee bit of plagiarism, i.e. re-write stuff already in print or on the net, add some fresh thoughts, that sort of thing?
How many of us have the cash or time to commission original material from an expert? We use copywriters, freelance authors, just about anyone with some writing skill.
Could you have been hit by Latent Semantic Indexing?
If your writers were not experts in the field, just ordinary folk with a writing gift, could they have written stuff which didn't look right when LSI is applied? Yes! a horror story, if LSI has advanced to penalising content wholesale.
But, if we think hard about Google's probable intention to rid the net of duplicate content (no!, that's a certainty) and all the puff and wind which goes for content, it becomes a serious contender.
We all know good, rich, abundant text content is part of the "weighting" in the algo (or we think we do) - Google is always banging on about the need for us all to concentrate on giving the punter a rich satisfying diet. So it could be on the cards for an LSI filter to knock out anything which was 'unnatural'.
Phew! a worrying thought isn't it?
Back to you Caveman. How did you write your new stuff?
What was your root stock of information?
What I meant by a horror story was this.
I could understand new content being penalised under LSI if it didn't hold true; but to bash the whole site when a certain percentage of 'false' pages were added would be a total nightmare.
Let's hope it's just me being paranoid and you are able to debunk it rapido. But, thinking outside the box is probably the only way to get at the truth.
Come on Caveman please respond to my observations.
Here is a comment in another thread by Decaff:
Today...when a new site that is trying to "suddenly" compete in a competitive area shows up in the SERPs with hundreds of inbound links, a huge number of content pages but no real history...this raises a red flag as far as Google is concerned...this would fall under the "anomalies" aspect of data mining..
If your content is seen as 'anomalous' because it is (a) suddenly injected into the website and (b) looks like wind and puff when examined by LSI then surely Google will put it into limbo for a while?
Please answer my question about how you created the new content. Could it be construed as wind and puff?