90% of our pages went supplemental in the past month.


Bill_H

8:56 am on Apr 28, 2007 (gmt 0)

10+ Year Member



Hi:
We have a two-year-old website, well ranked in Google, that has seen 990+ of its 1200+ pages go supplemental in one month.

1. The website pages are dynamically generated, along with dynamic unique page titles, dynamic unique meta descriptions, etc. The website is on a shared IIS server using ASP.NET 1.1.

2. The URL format used in all external and internal links follows the SEO-friendly pattern domain.com/product-category/product-name-123456.aspx (where 123456 is the numeric product id).

3. A low-level IIS HttpHandler converts the SEO-friendly URL to an internal format of domain.com/product-category/Products.aspx?PId=123456&CId=1, but the browser and the bot see just the SEO-friendly URL (see the sketch at the end of this list).

Example:
External format SEO friendly URL http://www.example.com/product-category/specific-product-name-1234.aspx

Internal format http://www.example.com/product-category/Products.aspx?PId=1234&CId=1

4. So technically there are two URLs for every page - one SEO-friendly, one parameterized - and thus it is possible for every page to draw a duplicate content penalty, since two different URLs point to the exact same page.

5. BUT our robots.txt blocks the parameterized URLs (the domain/product-category/Products.aspx?PId=1234 format) by disallowing the domain/product-category/Products.aspx page. Yes, we block both upper- and lower-case variants, since IIS ignores case but robots.txt matching doesn't.

6. A sitemap submitted daily to Google lists all of (and only) the 1200+ SEO-friendly URLs.

7. Google DOES have 16 pages in its index with both the SEO-friendly URL and the internal parameterized URL - which logically draws a duplicate content penalty for those particular pages. But that does not explain why the remaining 990+ went supplemental.
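
To make point 3 concrete, here is a simplified sketch of what the rewrite does. This is illustrative code, not our actual handler - it's written as an HttpModule for brevity, and the names and the regex are made up for the example:

    // Simplified sketch: map the SEO-friendly URL to the internal
    // parameterized one. The trailing numeric id drives the lookup.
    using System;
    using System.Text.RegularExpressions;
    using System.Web;

    public class SeoRewriteModule : IHttpModule
    {
        // Captures the category segment and the trailing numeric product id,
        // e.g. /product-category/specific-product-name-1234.aspx
        static readonly Regex SeoUrl = new Regex(
            @"^/(?<cat>[^/]+)/[^/]*-(?<id>\d+)\.aspx$", RegexOptions.IgnoreCase);

        public void Init(HttpApplication app)
        {
            app.BeginRequest += new EventHandler(OnBeginRequest);
        }

        void OnBeginRequest(object sender, EventArgs e)
        {
            HttpContext ctx = ((HttpApplication)sender).Context;
            Match m = SeoUrl.Match(ctx.Request.Path);
            if (m.Success)
            {
                // The browser and the bot keep seeing the SEO-friendly URL;
                // only the server sees Products.aspx?PId=...&CId=...
                ctx.RewritePath("/" + m.Groups["cat"].Value +
                    "/Products.aspx?PId=" + m.Groups["id"].Value + "&CId=1");
            }
        }

        public void Dispose() { }
    }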

The immediately obvious conclusion would be a duplicate content penalty. But if duplicate content penalties were caused by the SEO-friendly URL and the internal parameterized URL showing the same content, then it would mean that Googlebot regularly violates robots.txt.

In our server logs, which currently only go back to January 1, there is no evidence of Googlebot having crawled any of the forbidden internal pages. Furthermore, in Google's Webmaster Tools, we have no indication that Googlebot has tried to access the forbidden pages and been blocked by robots.txt. And to repeat, there are only 16 pages in the Google cache for the website showing the parameterized URL.

Thoughts?

TIA,
Bill

[edited by: tedster at 5:55 pm (utc) on April 28, 2007]
[edit reason] switch to example.com - it will never be owned [/edit]

trinorthlighting

6:15 pm on Apr 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are your product descriptions manufacturers' descriptions that are duplicated elsewhere on the internet, such as on the manufacturers' sites? If so, that could be one cause.

Bill_H

6:16 pm on Apr 28, 2007 (gmt 0)

10+ Year Member



Nope, our product descriptions are unique.

TIA,
Bill

tedster

6:55 pm on Apr 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First, for a site of any significant size, a few supplementals are not easy to avoid. While they can be a sign of trouble, you may just be looking at low PR for the urls, or semantically near-duplicate on-page language compared to other urls on your site.

Only 16 duplicate urls will most likely just trip a filter, so that both urls do not show in one search. A true penalty is not the usual result.

-----

I've worked with several large sites that use such a rewrite scheme in .NET. In every case, they have had one or more of these liabilities in the beginning:

1. It looks like your url rewrite still keys off the product number alone. So if someone makes a typo earlier in the file name, the url will still resolve. For example, garbage-junk-1234.aspx will still resolve, with a 200 status code, as if it were specific-product-name-1234.aspx.

2. If a bad url does come in, be sure you are returning a 404 status code in the HTTP header (see the sketch after this list). On IIS, it's very common for so-called "custom error pages" to return a 302 status.

3. Be 100% sure that you have addressed capitalization issues thoroughly in every section of the url that follows the domain name. I suggest you "kick the tires" very hard in your url schema -- while checking the http status codes for each instance. Firefox with the Live HTTP Headers extension is a great free tool for this kind of QA.

4. Remember that you have the IIS level for error handling, and you also have the .NET engine's own error handling. Check out how .html extensions are handled in case you get a bad backlink.
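
To make points 1 and 2 concrete, here's a rough sketch of the check those sites ended up adding inside their rewrite code. GetCanonicalPath is a stand-in for whatever product-slug lookup you have, not a real API:

    // Sketch of a validation step inside the rewrite module.
    // GetCanonicalPath(id) is hypothetical - your own slug lookup goes here;
    // it should return null for ids that don't exist.
    static void EnforceCanonicalUrl(HttpContext ctx, int productId)
    {
        string canonical = GetCanonicalPath(productId);
        if (canonical == null)
        {
            // Unknown id: return a real 404, not a 200 or a 302 error page.
            ctx.Response.StatusCode = 404;
            ctx.Response.End();
        }
        else if (String.Compare(ctx.Request.Path, canonical, true) != 0)
        {
            // garbage-junk-1234.aspx -> 301 to specific-product-name-1234.aspx
            ctx.Response.StatusCode = 301;
            ctx.Response.Status = "301 Moved Permanently";
            // Location should be an absolute URL in production.
            ctx.Response.AddHeader("Location", canonical);
            ctx.Response.End();
        }
    }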

And finally, here's another active thread on the topic of url rewriting and Google. You may pick up some good tips there:

Rewriting Dynamic URLs w/o Having Duplicate Content [webmasterworld.com]

Bill_H

7:18 pm on Apr 28, 2007 (gmt 0)

10+ Year Member



Tedster:
Having a few pages go supplemental would not be too disturbing; having 1010+ (12 more today) of 1200 go supplemental is terribly disturbing when we only have 1200+ pages to begin with. We could not compete with the 800 lb bear in our space, so we focused on getting rankings for our individual product names and settled for being at the bottom of page one on our keyword. We can't compete with the bear or Amazon or wiki.

Anyway, it is our product pages that have gone supplemental. Some have a PR of 5 and some have really decent link juice, yet they are going supplemental like crazy. Prior to this, most were page one for their product names for at least a year.

TIA,
Bill

rekitty

4:52 am on Apr 30, 2007 (gmt 0)

10+ Year Member



Anyway, it is our product pages that have gone supplemental. Some have a PR of 5 and some have really decent link juice, yet they are going supplemental like crazy. Prior to this, most were page one for their product names for at least a year.

Bill, do you have external links directly into the product pages? I found a single external link can do wonders.

Also, did your PageRank hold up in the latest update? Are you *sure* you still have good juice incoming? Is it possible a bunch of your external links are on pages that have themselves gone supplemental?

JudgeJeffries

6:41 am on Apr 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I found a single external link can do wonders.

Anyone else found that one backlink is sufficient?

Bill_H

7:00 am on Apr 30, 2007 (gmt 0)

10+ Year Member



In answer to the poster about external link changes: I am sure there has been link fluctuation on a few of the pages; there always is some churn. But I am not talking about a few pages here and there. I am talking about over 1000 pages going supplemental in one month.

As far as the poster asking about one link being sufficient to stay out of supplementals, I doubt it is a critical factor or that it has a numerical decision point. We have some pages with a PR of 2-3 with dozens of decent links that have suddenly gone supplemental. Many of them were ranking above the fold, page one on Google for their keywords.

Cheers,
Bill

sigmon

10:53 am on Apr 30, 2007 (gmt 0)

10+ Year Member



Hello,

I believe the amount of content on the page is a factor - shouldn't you have a minimum of 250 words per page?

Bill_H

1:24 pm on Apr 30, 2007 (gmt 0)

10+ Year Member



Sigmon, do you have any facts to base that conclusion on, or are you just guessing? Frankly, I have hundreds of unique pages with 500-1000 words of text that have gone supplemental.

Cheers,
Bill

malachite

1:48 pm on Apr 30, 2007 (gmt 0)

10+ Year Member



I believe the amount of content on the page is a factor - shouldn't you have a minimum of 250 words per page?

If that were the case, we wouldn't see MFA (made-for-AdSense) sites topping SERPs; they'd go supplemental. But they don't.

And it doesn't explain why many long-established sites are seeing pages with 100% unique content and 1000+ words ending up in the supplemental index.

pageoneresults

1:52 pm on Apr 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmmm, I'd really be looking at the technical side of things. You have a lot going on, and one hiccup within the machine can cause major issues. Have you checked, double-checked, and triple-checked the responses being returned in the server headers? Are you sure that you are returning a 200, 301, 404, etc., when applicable?

A shift in your PR may also present some issues. Any changes in the displayed PR for your site over the past 60-90 days?

Is it possible that a technical implementation you are using is only supported by one of the SEs? For example, Google may support a particular protocol in robots.txt that the others do not. So you leave yourself open to being indexed by the others, but not by Google. And what about all the pages being indexed by the others? Those are surely going to get indexed by Google at some point. The chain of command is almost infinite. ;)

5. BUT our robots.txt blocks the parameterized URLs (the domain/product-category/Products.aspx?PId=1234 format) by disallowing the domain/product-category/Products.aspx page. Yes, we block both upper- and lower-case variants, since IIS ignores case but robots.txt matching doesn't.

The above is probably the culprit. Using robots.txt to prevent the indexing of content may not be the best solution.
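
For reference, the kind of robots.txt block Bill describes in point 5 would look something like this (paths assumed from his example):

    User-agent: *
    # Both case variants, because robots.txt matching is case-sensitive
    # even though IIS itself is not.
    Disallow: /product-category/Products.aspx
    Disallow: /product-category/products.aspx

The catch: Disallow only stops crawling. A disallowed URL that picks up external links can still appear in the index as a bare URL-only entry.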

In this case, your rewrite should be set up so that those dynamic pages cannot be browsed to. I've mentioned competitor sabotage in a few topics here and there; this is one of those areas that is ripe for the picking. And being on a Windows platform makes you even more vulnerable, due to the way it handles other things (see tedster's comments above). :(

If you are in a fairly competitive space, I'd take the above into serious consideration. Do some research and see if you can find any inbound links to the dynamic URIs.

Play_Bach

2:16 pm on Apr 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I just checked the Supplemental page count of two of my sites:

180 (of 635) pages indexed, the rest supplemental
160 (of 214) pages indexed, the rest supplemental

Is anybody else seeing a pattern where the supplemental pages seem to kick in around page 15+ of the Google results?

Bill_H

2:39 pm on Apr 30, 2007 (gmt 0)

10+ Year Member



PageOneResults:
The website is running on a shared IIS server, so we can't use .htaccess or ISAPI filters; we're stuck with a low-level HttpHandler. Using robots.txt just happens to be Google's recommendation.

We also have a 301 redirect occurring if Google tries to browse the internal URLs that are also blocked in robots.txt.
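
Roughly like this - a simplified sketch, since ASP.NET 1.1 has no built-in permanent redirect (Response.Redirect sends a 302); seoFriendlyUrl is an illustrative variable:

    // Issue the 301 by hand; Response.Redirect would send a 302.
    Response.StatusCode = 301;
    Response.Status = "301 Moved Permanently";
    Response.AddHeader("Location", seoFriendlyUrl); // absolute canonical URL
    Response.End();

Of course, a bot that honors the robots.txt block will never request these URLs in the first place, so it never actually sees that 301.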

Perhaps the most puzzling aspect is that there have been no substantive structural changes since last year; only the content is periodically updated. I suspect that somehow, in March or April, the algorithm for sending pages supplemental changed.

TIA,
Bill

Bill_H

2:41 pm on Apr 30, 2007 (gmt 0)

10+ Year Member



Play_Bach;
Where supplemental pages "kick in" is primarily a result of the total pages returned by the query. If the total page count is low, then the supplemental pages will get tacked onto the back of the non-supplemental pages. Google has repeatedly confirmed that. Browse Matt Cutts' blog.

Cheers,
Bill

pageoneresults

2:47 pm on Apr 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We also have a 301 redirect occurring if Google tries to browse the internal URIs.

And how are you doing that? Are you using IP or User-agent?

Perhaps the most puzzling aspect, is that there has been no substantive structural changes since last year.

That wouldn't matter. A shift in PR. A technical glitch. There are all sorts of factors that come into play.

The website is running on a shared IIS server, so we can't use .htaccess or ISAPI filters; we're stuck with a low-level HttpHandler. Using robots.txt just happens to be Google's recommendation.

That's a problem.

pageoneresults

2:51 pm on Apr 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I forgot to ask, have you seen a drop in traffic since the pages went Supplemental?

Play_Bach

2:52 pm on Apr 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Where supplemental pages "kick in" is primarily a result of the total pages returned by the query.

Sorry, I forgot to mention that I used site:example.com to just return my pages (not a general query).

centime

3:12 pm on Apr 30, 2007 (gmt 0)

10+ Year Member



Try using the noindex tag on the dynamic URL pages (sketch below).

I think a URL blocked by robots.txt can still end up in the index if the spider finds external links pointing to it - the block stops the page being crawled, but not the bare URL being indexed.
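
A minimal sketch of that tag, which would go in the <head> of the parameterized Products.aspx output (note that the robots.txt block would have to be lifted so the spider can actually fetch the page and see the tag):

    <meta name="robots" content="noindex, follow">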

As suggested above by pageoneresults, check for inbound links to the dynamic URLs.

Also, I actually believe that catalogue pages, for any site, will slowly be deprecated by SEs in favour of informational pages about the same product.

After all, how different can 100 pages, from 100 suppliers all selling yellow widgets, be?

Why would any SE give them all the same prominence, especially if the manufacturer's product page is available?

rekitty

6:51 am on May 1, 2007 (gmt 0)

10+ Year Member



Here's my two-part supplemental story. Hopefully it might help.

Problem 1: Sometime around Aug 06 a bunch of pages went supplemental.

Fix 1: Made titles, descriptions, headings, etc. unique, and the pages came back (see the sketch at the end of this post).

Problem 2: Slowly, from Oct to March, additional pages went supplemental each week on the fringes of the site, causing a loss of long-tail traffic while the head was still strong. This slowly eroded the site until a huge part was supplemental and traffic was impacted.

Fix 2: Apply PageRank directly to the forehead (links, links, links, deep and shallow).

We are back to record traffic now, awaiting the next curve ball Google throws our way.
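
For anyone on ASP.NET 1.1 like Bill, a rough sketch of what Fix 1 amounts to (1.1 has no Page.Title property, so the head controls are wired up by hand; the control names, productName, and productSummary are illustrative):

    <%-- In Products.aspx --%>
    <head>
        <title id="PageTitle" runat="server"></title>
        <meta id="PageDescription" runat="server" name="description" content="">
    </head>

    // In the code-behind: give every product a unique title and description.
    protected System.Web.UI.HtmlControls.HtmlGenericControl PageTitle;
    protected System.Web.UI.HtmlControls.HtmlGenericControl PageDescription;

    private void Page_Load(object sender, System.EventArgs e)
    {
        // productName / productSummary come from your product lookup (illustrative).
        PageTitle.InnerText = productName + " - Example Store";
        PageDescription.Attributes["content"] = productSummary;
    }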