This 58 message thread spans 2 pages.
|Fixing a Supplemental Results - duplicate URL problem|
I've been reading g1smd's information on supplementals, and I had a question.
Almost all of my 90,000 pages were deemed supplemental after a duplicate content issue with a script that google somehow found out about.
I tried to change the URL, and 301 the supplemental to the new URL. After about 2 weeks, the new URLs all became supplemental as well.
Today, desperate to get my ecommerce site back to "normal", I completely re-did all of my URLs, and actually deleted the old supplemental pages that used to 301, and the actual pages they 301'd to. So now all supplemental pages in Google return a custom generic 404 on my site. All products now have new URLs and directory structures.
Is this the right way to get my site back to normal?
[edited by: tedster at 6:35 am (utc) on May 4, 2007]
Moving to brand new URLs in the hope of shaking off Supplemental is doomed to failure. The new URLs flag to Google that you now have a new site. It takes a while for any PR benefit the old URLs had to start flowing to the new URLs. You have to start again with acquiring incoming links. All in all, if I read what you wrote correctly, it is a thoroughly bad idea.
The redirects were intended for sites that already had two or more pre-existing URLs serving the same content, all with a 200 OK status. Such sites should change things so that only one URL serves "200 OK" content, and all the others send a 301 redirect to that canonical URL.
If I have misread the situation, please point me to the discussion that you are quoting so that I can see the context.
>> actually deleted the old supplemental pages that used to 301 <<
Just a point on terminology.
If a URL returns a 301 response, then there can be no content delivered at that URL. The server doesn't get a chance to send any content. The redirect has already been issued to the browser and the browser is already re-requesting some other URL instead.
You can have content on the server, but it will never be served. The redirect kicks in first. The browser never gets to that content.
A URL returns either
- content and a 200 OK response, or
- it returns a 301 or 302 response, no content, and the URL of where to go, or
- it returns an error code like 401, 404, or 500, and an error page for the browser to show.
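As a rough illustration of the three cases above, here is a minimal Python sketch that sorts a status code into one of them. The function name `describe_response` and the labels it returns are made up for this example; only the status codes themselves come from the list.

```python
from http import HTTPStatus


def describe_response(status: int) -> str:
    """Sort an HTTP status code into the three outcomes listed above."""
    if status == HTTPStatus.OK:
        # 200: the URL serves content.
        return "content"
    if status in (HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.FOUND):
        # 301 or 302: the client is told where to go instead.
        return "redirect"
    if 400 <= status < 600:
        # 401, 404, 410, 500 and friends: an error page is shown.
        return "error"
    # Anything else (for example 204 or 304) is outside the three cases.
    return "other"
```

A URL that answers 301 lands in the "redirect" case, so any file still sitting on the server behind it is never what the visitor or the robot ends up with.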
Thank you for your response, it's very much appreciated!
Basically, I had the classic issue of almost all of my pages going into supplemental results. What happened was I had a URL rewrite that was working off of dynamic URLs. Google somehow found out about the dynamic URLs, and I got dinged with duplicate content for just about all of my pages. My traffic fell heavily.
Since each page was heavily concentrated on the keywords for it, there was no way I felt to ever get that page to rank for different keywords (I read your example before about red widgets and blue widgets). So I knew the only way to go "back to normal" was to just 301 the current supplemental results into new pages that I created, in hopes that my site would just bounce back, since I put a block in my robots.txt of my rewrite directory, and had only one fresh page returning a 200.
After only a day, Google started to pick up on the new URLs, and they ranked almost as well as the old URLs. After a few days, those new URLs *also* went into supplemental. I was devastated! I read everything you wrote (just about), and I found something that said if you 301 from a supplemental, you are basically going to move the supplemental ranking from that page to the new one.
I felt I had no other choice, since my traffic was down so drastically, so I simply made all the old pages a straight 404. I checked to make sure the headers did in fact return a 404, and they did. I had all new URLs for all of my pages. Google picked up on them right away, and it's been 2 days so far, and those pages are still ranking well, and are not supplemental.
I have my base href set, have all directories blocked in robots that should not be open, and even went so far as to just delete those dynamic pages, just to be 100% sure, there can be no more duplicate content.
[edited by: tedster at 6:34 am (utc) on May 4, 2007]
>>> If a URL returns a 301 response, then there can be no content delivered at that URL. The server doesn't get a chance to send any content. The redirect has already been issued to the browser and the browser is already re-requesting some other URL instead.
You can have content on the server, but it will never be served. The redirect kicks in first. The browser never gets to that content.
I made them all 404 because I thought that if you 301 from a current supplemental URL, then you are telling Google "take all the information from this current URL, and apply it to this new URL". That is why I thought my 301s turned into supplementals also. Is this incorrect?
Just an update....
Another 2 pages of URLs were added to Google. I'm happy that, so far, it seems to be working... I would be devastated if these new pages go into supplemental.
Pages generally go into supplemental due to duplicate content, right?
Thanks guys, this website is great!
I had a little issue with this, and found the neat new 410 Gone to be a nice solution.
Unlike a 404, which says, "We can't find it." and may be a temporary situation, requiring an HTTP compliant User Agent re-request the resource from the original location.
Or, a 301, which says, "We moved all the information here, permanently". The purpose of a 301 would be to 'capture' the link weight of inbound links, but it does not sound like there should be any anyway.
A 410 says, "Hey, it's GONE! We removed it! Do Not Ask Again!", in a definite, deliberate, permanent manner. (A 410 *must* be implemented by the server administrator, it is not a default setting, so it communicates rather clearly that the resource was purposely removed, not moved, relocated, or just missing.)
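To make the difference between the three answers concrete, here is a minimal sketch written as a plain Python WSGI callable. It is only an illustration: the paths in `REMOVED` and `MOVED` are hypothetical, and a real site would more likely configure this in the web server itself.

```python
from http import HTTPStatus

# Hypothetical paths, for illustration only.
REMOVED = {"/old/product-a"}                       # deliberately deleted: answer 410
MOVED = {"/old/product-b": "/products/product-b"}  # relocated: answer 301


def app(environ, start_response):
    """Minimal WSGI app: 410 for removed pages, 301 for moved ones, else 404."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if path in REMOVED:
        status = HTTPStatus.GONE
        body = b"This page was removed on purpose."
    elif path in MOVED:
        status = HTTPStatus.MOVED_PERMANENTLY
        headers.append(("Location", MOVED[path]))
        body = b""
    else:
        status = HTTPStatus.NOT_FOUND
        body = b"We can't find it."
    # The status line is the code plus its standard phrase, e.g. "410 Gone".
    start_response(f"{status.value} {status.phrase}", headers)
    return [body]
```

For a quick local try-out the callable can be handed to `wsgiref.simple_server.make_server`. Note that the 410 answer carries its own body here, so a custom page can be shown with a 410 just as with a 404.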
To answer your supplemental question:
URLs go supplemental for a number of reasons, the main, stated reason I have read is they do not have enough PageRank to be included in the main index, so rather than not indexing them the degree holders at Google decided on a supplemental index.
I hope things improve for you. However...
The last thing you want to do to your site is to change URLs. URLs should be considered permanent [w3.org] -- really permanent, and any requirement to change them --ever, for any reason-- should be considered as a failure. This is a failure in the sense that the original URL-structure of the site was not "designed" to last.
Now this isn't meant as a personal attack -- We (Webmasters) see this all the time on many sites, big and small. But it helps to think in terms of the Web as a vast library of information: If the librarian is constantly shuffling and renaming the books, how useful is the library? Not very.
Many mistakes are made at the outset. Even a simple error, like linking to "index.html", is a problem waiting to happen. Why? Well, in a couple of years you may decide to switch to a dynamic content-generation approach, and then you'll be faced with the problem that the index page needs to be named "index.php" or "index.asp" -- Neither the URL nor the filename actually need to be changed in this case, but that's a tangential subject, here. The point is that if you link to "/" instead of linking to "index.html," then as far as URLs are concerned, switching to using .php to generate your pages is a simple matter of changing the server's DirectoryIndex definition to point requests for "/" to "index.php" instead of "index.html" -- The server takes care of the new URL-to-filename mapping, and no one is the wiser.
The same applies for all page URLs -- There's no need to mention the "technology" of a page at all. While the file may be .html or .php, the URL need not even mention it! A URL designates content, and no-one but the Webmaster need be concerned with how that content is generated; If the generation method is changed but the content remains the same (more or less), why should that affect the URL? It shouldn't! Apache mod_rewrite, mod_alias, mod_negotiation, AcceptPathInfo, ISAPI rewrite on Windows IIS, and many other techniques can be used to 'hide' the page-generation technology so that it need not ever affect the URL.
Back to the main subject: If you have any more problems, don't rename pages. Instead, do the following:
Make sure each of your pages has a unique title and description. Just having those two factors duplicated can lead to Supplemental doom.
Make sure the unique content on each page is not overwhelmed by the navigation and page template content; If the navigation and "decorative" elements are larger than the unique informational content, you can expect problems -- at least in the short term while the search engines' back-end processing routines are attempting to figure out what is common navigation and what is "content" on each page.
And as for content, I've called it "unique content" above, but you need to make sure that it is unique. If your product descriptions are taken from the manufacturers' sites, and there are any other sites that use the same content -- well then, it's not "unique" at all, is it? -- Boom! Supplemental.
Eliminate all duplication; Each page should be reachable by one URL and one URL only. All others should be redirected to that one "canonical" URL. Avoid situations where the following duplicates exist:
That's eight duplicate pages, common on many, many sites -- even simple, static, .html-based sites. And those are just the ones I could think of right now!
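The "one URL and one URL only" rule can be sketched in a few lines of Python. The list of duplicates is not reproduced here, so the two variants handled below (a non-www host name, and an index file name instead of "/") are assumptions borrowed from other posts in this thread, and `CANONICAL_HOST`, `INDEX_NAMES`, `canonical_url`, and `redirect_target` are names invented for the example.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"          # assumed preferred host name
INDEX_NAMES = ("index.html", "index.php")   # assumed index file names


def canonical_url(url: str) -> str:
    """Map a duplicate URL onto the single canonical form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in INDEX_NAMES:
        if path.endswith("/" + name):
            path = path[: -len(name)]       # drop the file name, keep the slash
            break
    return urlunsplit((parts.scheme or "http", CANONICAL_HOST, path, parts.query, ""))


def redirect_target(url: str) -> Optional[str]:
    """Return where to 301 this URL, or None if it is already canonical."""
    target = canonical_url(url)
    return None if target == url else target
```

Only the URL for which `redirect_target` returns `None` should answer 200 OK; every other variant would be sent a 301 to the returned target. The sketch ignores ports, letter case, and other variants a real site may need to cover.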
Specifically addressing the static/dynamic URL issue: When rewriting static-looking search-engine-friendly URLs to dynamic script calls, it's a three-step process:
1. Change the links on your pages to look static.
2. Internally rewrite the static URLs, when requested from your server, into the dynamic form needed to invoke your script(s).
3. Externally redirect dynamic URLs, only if requested directly by a client (browser or robot), to the new static URL form.
The last step is a bit tricky, and how you do it depends on what kind of server you're using. If it's Apache, we have several threads in the Apache section of the WebmasterWorld Library that describe the process more fully.
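The three steps can be sketched in Python to show why the last one does not loop. This is not the Apache configuration the Library threads describe, just a toy dispatcher; the `/product.php?id=` and `/products/` URL shapes, the `PRODUCTS` table, and all function names are hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical catalogue: dynamic id -> static-looking slug.
PRODUCTS = {"42": "red-widget"}
SLUGS = {slug: pid for pid, slug in PRODUCTS.items()}


def static_link(product_id):
    """Step 1: links on the pages use the static-looking form."""
    return f"/products/{PRODUCTS[product_id]}"


def run_script(product_id):
    """Stand-in for the dynamic script that builds the page."""
    return 200, {}, f"Product page for {PRODUCTS[product_id]}"


def handle_client_request(path, query=""):
    """Entry point for requests that come directly from a browser or robot."""
    # Step 3: a dynamic URL requested directly is redirected externally.
    if path == "/product.php":
        product_id = parse_qs(query).get("id", [""])[0]
        if product_id in PRODUCTS:
            return 301, {"Location": static_link(product_id)}, ""
        return 404, {}, "Not found"
    # Step 2: a static URL is rewritten internally by calling the script
    # directly, so this path never passes through the redirect in step 3.
    prefix = "/products/"
    if path.startswith(prefix):
        product_id = SLUGS.get(path[len(prefix):])
        if product_id is not None:
            return run_script(product_id)
    return 404, {}, "Not found"
```

The point of the design is that the redirect test only ever sees what the client asked for. The internal rewrite invokes the script without re-entering the dispatcher, which is what keeps steps 2 and 3 from bouncing a request back and forth.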
Thanks for your reply!
>>> Back to the main subject: If you have any more problems, don't rename pages. Instead, do the following: <<<
Thank you for these suggestions. The only problem is these pages are already supplemental. The problem is:
That URL is specifically keyword targeted for "product a". If that page goes supplemental, and I fix the reason why it went supplemental in the first place, then it will still remain supplemental for life as a page targeted for the "product a" SERP. That's why I figured I had no choice but to fix the duplicate content issue, and go to completely new URLs. I hope you can see my logic behind this. Do you still think it's a bad idea? I can see it if the page is still in the normal index and has not gone supplemental yet, but if it's already supplemental, am I doomed?
>>> Make sure each of your pages has a unique title and description. Just having those two factors duplicated can lead to Supplemental doom. <<<
They all do. Each one is listed as the title of the product, and each description is either 100% unique from me writing it personally, or has a bit of the manufacturer's description but has user reviews and content like that on the page.
I've given up on re-writing the URLs, and just have my code automatically build static pages. I'm very nervous now about having duplicate content again :)
>>> The last thing you want to do to your site is to change URLs. URLs should be considered permanent -- really permanent, and any requirement to change them --ever, for any reason-- should be considered as a failure. This is a failure in the sense that the original URL-structure of the site was not "designed" to last. <<<
I completely understand that. But what is one to do if the whole site goes supplemental? This website feeds my family, so I need it up! :) Even if I fix the reason why it went supplemental in the first place, the supplemental pages never convert back to normal pages for that same keyword. Correct?
>>> Unlike a 404, which says, "We can't find it." and may be a temporary situation, requiring an HTTP compliant User Agent re-request the resource from the original location. <<<
I checked to make sure all 404 pages that are on my site now do in fact return a 404 header. Whew!
>>> Or, a 301, which says, "We moved all the information here, permanently". The purpose of a 301 would be to 'capture' the link weight of inbound links, but it does not sound like there should be any anyway. <<<
That is what I did originally. Here is what I did:
1) Pages all went supplemental
2) I re-did all those URLs, and did a 301 to the new page.
So the supplemental pages were 301 redirecting to new pages. Those new pages ALSO went supplemental! Doh! I read somewhere by g1smd that 301 moves everything known about a page from the old page to the new. So I figured it was moving over the "supplemental" as well... bad assumption?
I have no idea why the new pages went supplemental. I was blocking the directory in robots.txt that created the dynamic urls in the first place, and had a 301 on each old page. I figured the only thing to do was 404 all the old stuff, and just start fresh. Knock on wood, *so far* it's working, and I hope it stays like this! What do you think?
>>> URLs go supplemental for a number of reasons, the main, stated reason I have read is they do not have enough PageRank to be included in the main index, so rather than not indexing them the degree holders at Google decided on a supplemental index. <<<
When you have 100,000 + pages, how can you give all of them a good page rank? It's nearly impossible to give all of them good inbound links. :(
Thanks for all your help!
[edited by: tedster at 5:58 pm (utc) on May 5, 2007]
[edit reason] switch to example.com - it will never be owned [/edit]
|then it will still remain supplemental for life |
No, even though they are spidered less frequently, supplemental urls do get spidered and re-evaluated. They can lose their supplemental status.
|When you have 100,000 + pages... |
You're right - I have not worked with a big site that doesn't have a few supplemental urls. A few supplementals on a big site is not the kiss of death for the site by a long shot. In fact, many times those supplemental tags make plenty of sense to me.
>>> No, even though they are spidered less frequently, supplemental urls do get spidered and re-evaluated. They can lose their supplemental status. <<<
Did not know this! Is this rare or common if the page has been fixed for the reason it went supplemental in the first place?
|When you have 100,000 + pages, how can you give all of them a good page rank? It's nearly impossible to give all of them good inbound links. |
...and so, it's nearly impossible to keep them all in the main index.
(I would guess the creative webmaster could find a way to have supplemental pages and still rank for their desired terms. Here's a hint: there may be a way to use all those extra pages to your advantage, even if they aren't in the main index...)
I wasn't saying you should use a 404.
If you purposely removed the pages, and they will not return, a 410 is better. A 404 will cause SEs to continue requesting the URLs you have removed the resources from, because a 404 is not defined as a permanent situation.
In this situation, you may want to follow jdMorgan's advice and use a 301 'Permanent' redirect to the new SE Friendly URLs.
>>> In this situation, you may want to follow jdMorgan's advice and use a 301 'Permanent' redirect to the new SE Friendly URLs. <<<
But if you 301 from a supplemental to a new page, won't the supplemental status follow?
Indeed, a page can even show as a Supplemental Result for some search terms and as a normal result for some other search terms.
The Supplemental Result will usually be associated with search terms that were for an older version of the content on that page.
The normal results will reflect newer content, mostly the current version of the page.
>>> The normal results will reflect newer content, mostly the current version of the page. <<<
See this is the problem. If I don't change the content on the page, the supplemental page will still be returned for this search term. Since the URLs are built to target what's on the page, changing the content on them is not an option.
Edit: I guess just fixing the reason it went supplemental in the first place is the best option, and hope it returns back as a normal result.... eventually.
Remember to give us an update on your supplemental fixings
So far it looks like google added another 100 pages, or so it says, but I don't see which ones.
As far as normal listings, I don't see any new ones added since yesterday, but it shows the homepage was accessed on the 5th.
I'm going to see if I can 410 the pages, but have a default page there. If the pages are 410, does google remove it from its DB faster?
Also, I used to have supplemental results starting around Page 13 as of yesterday, now they start at page 9. I can't tell what happened, but I'm thinking google removed or moved some pages to supplemental that were returning a 404, and were pages I had removed as part of my fix.
None of the new pages are supplemental... yet, I hope it stays this way! I just did an inurl: search and could not find any of the new URLs in supplemental, so my fingers are crossed. I wonder why Google hasn't added any new pages in 2 days now, even though they accessed the homepage on the 5th?
|If the pages are 410, does google remove it from its DB faster? |
No, Google treats 410 and 404 identically.
I think it's important to remember supplemental does not mean supplemental for all searches, so where a page may be returned as supplemental for a site:site.com search, meaning the pages are a 'supplement' to the results presented for site:site.com, the same page may not be returned as supplemental for allinurl: because the new URLs would be the correct results for allinurl:newurl, rather than a supplement to the search.
I think if people (in general, not you specifically) think about what a 'supplement' to a search might be, it makes sense that the first 100 results for a site: search might be considered the 'best fit' heuristically (IOW: Here is an example of what is included in the site, site.com.) for the search, and the rest might be considered a supplement (IOW: If you didn't find what you were looking for in the first 100, here are some more topics, specifics, pages that might be *exactly* what you were looking for from the site, site.com.).
So some of my questions remain:
1) Since a 301 is "give everything you know about this page to this new page", If you have a supplemental page, and you 301 to a normal page, does the supplemental tag follow that page, due to the 301?
2) How likely is it that a page gets re-evaluated and put back into the normal index once it's supplemental?
3) With me changing the URLs, is this hurting my credibility with google?
4) Would it have been better to just leave the pages supplemental, and fix the duplicate content issue, and wait for the pages to be re-evaluated as per #2?
Sorry for all the questions, but I'm sure there are others on here who would like to know as well :) Thanks for all your help guys!
1. Don't know. Never really looked or thought about it.
2. Supplemental pages are re-crawled. Are the pages supplemental for all searches regarding the keywords/phrases/topics of the pages, or just for the site: search?
(If you are thinking you are ever going to get 100,000 pages returned in the 'regular' results for a site: search, keep thinking.)
3. Cool URIs don't change [w3.org]. Is it hurting your credibility? Don't know, shouldn't matter. Pick a set of meaningful URLs which have a clear hierarchy, structure, pattern, naming convention, etc. and stick with them. Get rid of the rest (dups.), either via redirect, or 410.
4. Fix the duplicate content issue. Pick a set of URLs. Don't think you will have all URLs returned in the regular results. Watch traffic, search terms, visits, entry pages, etc. not the little tag at the end of the URL for a site: search.
2) The problem I have is that each URL/page that went supplemental is targeted to a specific product/keyword. So if it's supplemental, getting it to show for a regular result is almost impossible, and changing the content on it is also impossible, since the url is in the format of [somesite.com...]
And the site: search does show the page as supplemental, so if I search for that keyword, the page doesn't show up until deep in the results - hence my traffic has fallen by half. :(
>>> (If you are thinking you are ever going to get 100,000 pages returned in the 'regular' results for a site: search, keep thinking.) <<<
I know of about 5 sites off the top of my head that have all 100 pages in a site: search as regular results. I'm jealous :)
>>> Get rid of the rest (dups.), either via redirect, or 410. <<<
I'm worried with a 301 redirect, it will move the supplemental tag along with it. With the 410, tedster says google treats 404 and 410 the same. If this is the case, I'll leave them as 404, since I currently have a custom 404 set. If there might be a difference with the 410, is it possible to get it set with a custom page?
>>> Fix the duplicate content issue. <<<
Done. I have set my base href, have completely deleted all pages that could be causing duplicates - and for good measure, blocked the directory in robots.txt.
One more question...
Let's say the new pages that I have created, and that Google has added, turn supplemental as well. What could be the cause?
Would it just be the low pagerank?
Before all of this duplicate content issue started happening, almost all of my pages were returning high results in google (first two pages), and I had no issues... I didn't even know what supplemental results were until google got hold of my dynamic pages!
Thank you for helping me understand all of this!
|I know of about 5 sites from the top of my head that have all 100 pages in site:search as regular results. I'm jealous. |
Still not the 100,000 I posted. 100 x 10 = 1000 ; 100 x 100 = 10,000.
I didn't know what your results / expectations were.
|With the 410, tedster says google treats 404 and 410 the same. |
Yes, they state they handle a 404 as if it is a 410, but a 410 is the correct response. Go ahead and leave the 404 if you like. Personally I would rather serve the correct response, because then if Google (or others) decides to handle Status Codes in an HTTP RFC Compliant manner I don't have to change anything.
They currently handle 404s incorrectly, treating them as 410s (a status code introduced in HTTP 1.1), since most hosts & webmasters do not bother with the technicalities of the web.
Would it be 'just the low PR'?
I would think the better question would be: What is causing the low PR, is it just the duplication splitting the PR passed to each page, or is there something else?
What do your inbound links look like?
Google changed their 'link weight' calculation and it could have taken some time to populate through the entire system, so is it possible some of your inbounds were devalued and contributed to the drop?
What is the age of your site compared to the others in your niche?
How original is your site compared to the others in your niche?
How do your inbound links compare to other sites in your niche?
And on, and on, and on...
It's also helpful to remember that a supplemental result is a url PLUS a cache date. The Supplemental Results [webmasterworld.com] reference thread in our Hot topic area is a good resource and touches on that fact. Many of the other issues discussed in this current thread are addressed there as well.
If you create a new url and 301 redirect the old url to it, but the new url is essentially sitting in the same position within the link structure and has only the backlink influence transferred by the 301 redirect, it is most likely that the new url will also become supplemental. This is not because a 301 "transfers" supplemental status, per se, but because the root cause has not been changed - the new url is in the exact same situation and so gets evaluated as supplemental for the same reasons.
>>> but the new url is essentially sitting in the same position within the link structure <<<
You mean a duplicate content issue?
No, he means the URL is not the root cause of the problem, so you can move them 'till the cows come home' and it won't do any good.
The problem is the low PageRank (or some other related scoring mechanism, causing the resources to be returned as supplemental), so you will need to address the exact same issues with a new URL, which may include, but are not limited to, the issues I have pointed out above, and the issues addressed in the thread Tedster indicated.
The real answer is you are going to have to do some SEO work to get your URLs out of the supplemental index for any meaningful period of time. That usually starts with determining a reasonable 'root cause' for them being supplemental in the first place, which may include, but is not limited to, the duplication of the content on your own site.
Removing the duplication issue is the easy answer. Increasing the PageRank of 100,000 pages OR finding an alternate method of ranking for your chosen terms is where the real work starts.
It is going to continue to get harder to rank a website for your chosen terms, so starting on a long-term plan to continue increasing the 'strength' of your site should be priority 1, and should include a plan to increase the heuristically perceived value of your website in relation to others in your niche.
>> 1) Since a 301 is "give everything you know about this page to this new page", If you have a supplemental page, and you 301 to a normal page, does the supplemental tag follow that page, due to the 301? <<
No. The new URL is evaluated for its own content, and incoming links, and so on. It is treated as a "new URL", and that might not be a good thing.
>> 2) How likely is it that a page gets re-evaluated and put back into the normal index once it's supplemental? <<
If the reasons for being Supplemental are removed, then it can make it back into the main index for some searches. Whatever happens, there will still be older versions of the same page at the same URL that will continue to be indexed as Supplemental Results.
>> 3) With me changing the URLs, is this hurting my credibility with Google? <<
You are signalling that you have a new site, and so you reset some of your age factors back to zero. That likely isn't a good thing. Cool URIs don't change.
>> 4) Would it have been better to just leave the pages supplemental, and fix the duplicate content issue, and wait for the pages to be re-evaluated as per #2? <<
Yes. You should fix the Duplicate Content issues and then wait for Google to reindex things.
When you fix a www vs non-www Duplicate Content issue, you will see many of the www URLs reindexed as normal results in weeks to months, while the (now redirected) non-www URLs slip into Supplemental and continue to show like that for a year or so. There is nothing you can do to change that behaviour, nor should you - if those Supplemental Results appear anywhere in the SERPs and a visitor clicks them, your redirect gets the visitor to the correct URL on your site anyway.