
Ranking well on major keywords until I added many pages

         

ItsOnlyMe

12:16 am on Jan 3, 2009 (gmt 0)

10+ Year Member



Well, here's my first post after lurking for a couple months. Great resource BTW.

My website was ranking well on our favorite keywords, we'd moved up from 80+ or so in September to the 10th - 12th position. This was the result of a couple of months of studying SEO and re-writing my website (largely ignored for years) to match. All was well.

However, I made a big mistake, apparently, and I'm pretty much at a loss at the moment as to exactly why it was a mistake, how to fix it, or whether it will fix itself. I'm hoping that maybe someone here can give me some advice. Many, many hours of searching for the answer on my own and I'm basically where I started.

I'll try to keep this brief, which may be tough.

My site had 50 or so pages of unique content. Google's cache had about 80 pages in it, but many of them were pages that I had renamed; the old names still remained in the cache.

After reading in multiple places that generally the more unique pages you had, the better, I came up with the (not so) bright idea of adding a -bunch- of pages that would be appealing to my target audience. The content was mostly unique, and I felt it would not only bring my desired audience to my website, but would also help in the SERPs by giving me more pages.

I added links to 48 new pages, one for each (continental) state, to an existing, indexed page linked directly from my home page. These 48 state pages consisted entirely of links to the rest of the new pages.

In hindsight, I know that it was totally reckless. What I added amounted to 1450 pages, added to a 50-page site.

I made the change on Sunday the 13th of December. Monday morning I found in my logs that Googlebot was already crawling the new content. I gleefully did a search on my keyword, half expecting that I had moved up. No movement; I was still at about position 12. About 30 minutes later, after verifying that Googlebot was still crawling, I searched for my keywords again. To my horror, I didn't find my website until I reached page 7. I was sick.

I panicked and deleted the links from the existing page, and moved the other pages so that Googlebot would quit crawling them. A few hours later I decided that this could cause even more harm due to 404s, and put the pages back in place, minus the links (i.e. they existed but were no longer linked to from my site).

The new pages mainly consisted of names, addresses, a (google) map, and in many cases (maybe 80%) a list of services that these businesses provided. They should have been unique from the standpoint that no other page on the net (that I could find) had all the information in one spot.

Today is over two weeks later, and my site is still banished to the back forty, although the position flips between pages 5 and 8 from refresh to refresh.

I'm leaning towards having triggered some sort of duplicate content or spam filter, and I think it may be because all of these pages were "framed" in a template that included 2-3 customer testimonials in a sidebar, plus other ad verbiage at the top of the page that would have been largely the same on each page (although some of it rotated at random, so those portions would have been different to some degree). I'm thinking that maybe the small amount of content on some of these pages didn't offset the non-unique content from the template, and this looked like hundreds of pages that were almost all the same.
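To put a number on that hunch, something like this rough sketch could estimate how alike two of these pages look once the template text is counted. The file names and the 0.8 cutoff are made up for illustration; this isn't how Google works, just a crude similarity check.

```python
# Crude near-duplicate check: compare word shingles of two rendered pages.
# File names and the 0.8 cutoff are placeholders for illustration only.
import re

def shingles(text, k=5):
    """Return the set of k-word shingles from a page's visible text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets (1.0 = identical)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

page_a = open("listing-springfield.txt").read()   # hypothetical page text
page_b = open("listing-shelbyville.txt").read()   # hypothetical page text

sim = jaccard(shingles(page_a), shingles(page_b))
print(f"similarity: {sim:.2f}")
if sim > 0.8:  # arbitrary cutoff for "these look like near-duplicates"
    print("The template is probably drowning out the unique content.")
```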

Since the 14th, I've spent hundreds of hours adding more unique content to each of the pages. It's been difficult since there are so many of them, but I do believe they're much more unique (again, assuming that was the problem).

Googlebot has crawled many of the pages multiple times since the 15th of December, but almost every one of them has a version from 12/16/08 in the public cache. I don't understand why the cache isn't being updated even though Googlebot has received newer versions of the pages. For example, a page picked at random was requested on 12/15 (the bad day), 12/16, 12/24, 12/28, and 01/01. However, the cached page is dated 12/16 and is, of course, minus the changes I've made.

A site search initially returns 246 pages, but then after 124 of them, I get the dreaded "we have omitted some entries very similar" message. This is mainly why I'm concluding that it's a duplicate content issue. And keep in mind that there should be close to 1500 pages in the cache; I've confirmed that every one of them has been requested by Googlebot multiple times since 12/15.

Other than calling me stupid, which I already know, does anyone have any helpful suggestions for fixing this, short of trying to get all the pages removed? I've now put in so much time to improve the content that I'd feel like I was admitting defeat by removing them.

I'd like confirmation that I triggered some sort of filter or penalty, and some advice for fixing it. Maybe also a bit of explanation of why the cached pages aren't being updated, since getting the added unique content recognized would, I feel, help the problem.

My home page and many of my important interior pages moved up from PR3 to PR4 with this weekend's export, so this leaves me even more in a quandary.

BTW, until yesterday, the new pages were not in my sitemap. Yesterday, in frustration, I went ahead and restored the links and added all of the pages to my sitemap (and re-submitted), just in case that was somehow aiding the problem. Webmaster Tools now shows 1500+ pages in the sitemap, of which (it says) 46 pages are indexed (which doesn't match the site: search either).
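For what it's worth, the sitemap itself is just a flat list of <loc> entries; a trivial script along these lines would produce it (the domain and paths are placeholders, not my real URLs):

```python
# Minimal sitemap generation sketch. The domain and page list are
# placeholders; a real run would read the site's actual URL list.
from xml.sax.saxutils import escape

pages = [
    "http://www.example.com/",
    "http://www.example.com/states/illinois.html",
    "http://www.example.com/listings/acme-hardware-springfield-il.html",
]

lines = ['<?xml version="1.0" encoding="UTF-8"?>',
         '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
for url in pages:
    lines.append(f"  <url><loc>{escape(url)}</loc></url>")
lines.append("</urlset>")

with open("sitemap.xml", "w") as f:
    f.write("\n".join(lines))
```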

Sorry, it did get long.

darkyl

3:43 am on Jan 3, 2009 (gmt 0)

10+ Year Member



It's never a good idea to add so many pages all at once.

Anyway, I think you are right: the problem might be that you have many pages with thin content on them, consisting only of names, addresses, and a Google map.

You also say you added some "testimonials" that, while rotating, were the same for all the pages.

I was basically in a very similar situation: 2000 pages containing only names, addresses, and maps of stores. I also had something similar to your testimonials. The site wasn't ranking well and was penalized in the SERPs.

Then I realized that these testimonials contained more text than the "core" text of each page, so all pages had a majority of identical text with only a few different lines (names, addresses, etc.).

Removing those "testimonials" seems to have helped: after months of bad serp's, the site suddenly started to reappear with a 1000% boost in traffic. Now it goes up and down, as if Google is trying to evaluate if the site deserves those good serp's or not.

I think you're doing the right thing adding more unique text to each page.
Also, be very careful with your internal linking: avoid "mass" menus, duplicate menus, or pages with too many links, and try to build a hierarchical structure for the new pages (which might be worse for PR, but PR is useless if you get filtered out or penalized).
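By "hierarchical" I mean something like home -> state -> city -> store, so no single page carries hundreds of links. A rough sketch of the idea, with all names made up:

```python
# Sketch of a hierarchical link structure for a directory-style site:
# each level links only one step down, instead of one page linking to
# hundreds of listings. All names and paths here are invented.
listings = [
    {"state": "illinois", "city": "springfield", "slug": "acme-plumbing"},
    {"state": "illinois", "city": "chicago",     "slug": "bolt-electric"},
    {"state": "ohio",     "city": "columbus",    "slug": "cratchit-hvac"},
]

tree = {}
for item in listings:
    tree.setdefault(item["state"], {}).setdefault(item["city"], []).append(item["slug"])

for state, cities in tree.items():
    print(f"/states/{state}.html links to:")
    for city, slugs in cities.items():
        print(f"  /states/{state}/{city}.html links to:")
        for slug in slugs:
            print(f"    /listings/{slug}.html")
```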

Also, are the anchor texts for the links to the new pages similar between them?

ItsOnlyMe

4:57 am on Jan 3, 2009 (gmt 0)

10+ Year Member



First of all, thanks so much for your reply. I'm glad to know that someone else has experienced this, and you already gave me ideas on fixing it that hadn't dawned on me.

I'm pretty sure the biggest cause of this is that they are templated pages. I've spent all the time since my original post researching, and it appears that templates are, although good for visitors imho, bad for SEO.

Every page on my site uses the same template. It consists of a top page graphic, a banner, some small text blurbs like "we will finance your purchase", "highly recommended by professionals", etc. These are static across all pages.

The template displays 3 random testimonials in random order, from a pool of around 15.

The template has the exact same menu structure, with my "core" pages in it.

The template also does the footer, which has links to my privacy policy, a contact page, search page, and home page (I added rel="nofollow" to all of these links, as well as most of the menu links, earlier today).

The anchor texts are not very unique. Many of the businesses that I'm displaying information for are chains (probably 80% of the pages are spread amongst 4 chains). I ignorantly just used their name as the anchor text. Making the link text the name plus city and state should help here, I would hope, as that should be unique.
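Concretely, the plan is to stop using the bare chain name and build each anchor from the name plus city and state, roughly like this sketch (business names and URLs are invented):

```python
# Sketch of the anchor-text fix: append city and state so links to
# different locations of the same chain no longer share identical
# anchor text. Business names and URLs here are invented examples.
stores = [
    {"name": "Acme Hardware", "city": "Springfield", "state": "IL",
     "url": "/listings/acme-hardware-springfield-il.html"},
    {"name": "Acme Hardware", "city": "Peoria", "state": "IL",
     "url": "/listings/acme-hardware-peoria-il.html"},
]

for s in stores:
    anchor = f'{s["name"]} - {s["city"]}, {s["state"]}'  # unique per location
    print(f'<a href="{s["url"]}">{anchor}</a>')
```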

You've given me a bunch to chew on, and the link anchors are an obvious fix. I'm just not sure how to deal with the template issue.

Google says that we're supposed to build pages for users, and the template accomplishes that for me. I'll work on a new way to accomplish this; if nothing else, I'll display the new pages minus the template, but I'm trying to advertise to those that view the data I'm displaying. This sucks. I wish it were "kosher" to just not display the template when it's a bot visiting, but of course it's not. Sure is tempting, I must say.

How long after you eliminated your testimonials was it before you saw an improvement? Do you think it's a penalty with a set time frame, or is it a filter that auto corrects as new content gets cached?

Thanks Darkyl, you've been extremely helpful.

potentialgeek

5:36 am on Jan 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm thinking that maybe the small amount of content on some of these pages didn't offset the non-unique content from the template, and this looked like hundreds of pages that were almost all the same.

Right.

Since the 14th, I've spent hundreds of hours adding more unique content to each of the pages. It's been difficult since there are so many of them, but I do believe they're much more unique (again, assuming that was the problem).

I think you're better off removing all the thin pages and not putting them back online until they are thick. If you can't make them thick, keep them offline. Quality over quantity. Right now Google just sees the volume of pages as spam, and it will keep seeing it as spam as long as the pages are very thin and very similar. It's a very old spam trick to inflate the size of your site; the crackdown started in 2005 (or before).
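If it helps in deciding which pages to pull, one crude check is to count the words left over after stripping out everything that also appears in the shared template. The 150-word cutoff and the file names here are just guesses for illustration, not a known Google threshold:

```python
# Crude "thin page" check: count the words that remain after removing
# every word that also appears in the shared template. The threshold
# and file names are placeholders, not a known rule.
import re

def words(text):
    return re.findall(r"[a-z0-9']+", text.lower())

template_words = set(words(open("template.txt").read()))        # hypothetical
page_words = words(open("listing-springfield.txt").read())      # hypothetical

unique_count = sum(1 for w in page_words if w not in template_words)
print(f"{unique_count} words of non-template content")
if unique_count < 150:  # arbitrary cutoff
    print("Probably too thin to keep online for now.")
```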

Googlebot has crawled many of the pages multiple times since the 15th of December, but almost every one of them has a version from 12/16/08 in the public cache. I don't understand why the cache isn't being updated even though Googlebot has received newer versions of the pages.

Google stopped updating its public cache immediately after each crawl sometime in 2008 or before. Don't worry about it. Very few users click on the cache when they can click on the site.

A site search initially returns 246 pages, but then after 124 of them, I get the dreaded "we have omitted some entries very similar" message. This is mainly why I'm concluding that it's a duplicate content issue.

You're right again. So why keep the duplicate content online?

And keep in mind that there should be close to 1500 pages in the cache; I've confirmed that every one of them has been requested by Googlebot multiple times since 12/15.

You need pretty unique pages to get them all indexed. No way is Google going to waste its storage resources caching tons of duplicate content.

Other than calling me stupid, which I already know, does anyone have any helpful suggestions for fixing this, short of trying to get all the pages removed? I've now put in so much time to improve the content that I'd feel like I was admitting defeat by removing them.

Keep the work you've done, but remove all lame pages until they're valuable. In 2007 (if not earlier), Google started to target thin pages and thin sites. I deleted over 100 thin pages off my site when it was penalized and gradually the penalty was completely lifted.

p/g

darkyl

1:38 pm on Jan 3, 2009 (gmt 0)

10+ Year Member



The anchor texts are not very unique. Many of the businesses that I'm displaying information for are chains (probably 80% of the pages are spread amongst 4 chains). I ignorantly just used their name as the anchor text. Making the link text the name plus city and state should help here, I would hope, as that should be unique.

So, another point in common: anchor texts consisting of name+city, just like mine (my site also lists chains).
It makes sense, but similar anchors combined with thin-content pages seem to trigger some flags at Google.

How long after you eliminated your testimonials was it before you saw an improvement?

I started to see improved rankings 7-10 days after I removed the "testimonials", but the wait might be affected by many factors.

Do you think it's a penalty with a set time frame, or is it a filter that auto corrects as new content gets cached?

I don't think the penalty has a set time frame; Google will lift it as soon as it thinks your site is "reliable" again.

wheel

1:56 pm on Jan 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's never a good idea to add so many pages all at once.

I disagree. Google has specifically stated it's not a problem at that volume, and in the past I've added thousands of pages of *unique* content all at once and had no problems - in fact my traffic doubled.

So the problem's not adding the content. I think potentialgeek probably has it nailed when he talks about unique and valuable content. The bunch of thin pages you added is the first place I'd be looking.

darkyl

2:45 pm on Jan 3, 2009 (gmt 0)

10+ Year Member



I disagree. Google has specifically stated it's not a problem at that volume, and in the past I've added thousands of pages of *unique* content all at once and had no problems - in fact my traffic doubled.

You're right; I generalized while I was thinking about this specific case, similar to mine: many similar pages with thin content and similar anchor text on a directory-style site.
I've noticed that if I add a few stores (pages) at a time instead of hundreds all at once, they're less likely to be filtered out.

It might be (just a theory) that Google *thinks* that hundreds of new entries all at once don't seem natural in a "quality and reviewed directory" (which they prefer, as they've pointed out several times). So, in this case, the number of new pages that triggers some flag might be lower than usual.