I don't know when this came out of pre-mod but it seems to have gotten buried.
[webmasterworld.com...] -- Post #25
[webmasterworld.com...] -- Post #37
[webmasterworld.com...] -- Post #400
[webmasterworld.com...] -- Post #414 and #417.
g1smd, I tried to send this to you by sticky but your inbox is full:
Thanks for your reply. Prior to creating the thread I had read all those posts of yours. While they do confirm the problems I'm describing they don't seem to provide any of the answers I was looking for. Perhaps that's why you posted the links i.e. to confirm the problems. Please confirm if that's the case or if there's something (answers) I'm missing. Thanks.
"2) What do you do if yoursite.com/index.htm is a duplicate of www.yoursite.com/index.htm. OK, you've got your 301 in place but how do you treat the removal of the index.htm page? "
I don't think anyone knows. I am trying to do the same and still waiting for a redirected .com/index.html to be removed. :(
Interesting thought about taking the whole site down for a couple of months then try to restart and cure,
anyone here done that?
Don't forget the inflated page count problem that
possibly triggers filters. That one has never been
I posted the links as "further reading", nothing more.
These are some of my opinions on the questions you've asked.
|1) Is using the removal tool to remove the whole site the same as using robots.txt to ban Google. If robots.txt is a low-strength medicine is that preferable i.e. ban googlebot, all google crawls stop, google stops showing your page, you re-allow Googlebot, it re-indexes, you've lost a few days traffic but have now got all your pages indexed and indexed correctly. (It can't be that simple, can it?) |
Disallowing Googlebot doesn't remove your pages from the index, while both disallowing and removing the site with URL Console... also doesn't remove your pages from the index, but hides them for six months. There is no way to remove pages for a few days and then get them back. All you can do to clean up your indexed pages is to set up proper 301 redirects and keep waiting.
I succeeded in cleaning up some URLs with 301s last month, so 301s are not completely broken.
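For anyone who wants to check what a robots.txt ban actually covers, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the www.example.com URL are made-up placeholders, and `may_crawl` is just an illustrative helper name. It only reports whether a crawler is allowed to fetch a URL; as noted above, that says nothing about whether Google drops the page from its index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that bans Googlebot from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/index.htm"  # placeholder URL
    print("Googlebot allowed:", may_crawl(ROBOTS_TXT, "Googlebot", page))
    print("Other bot allowed:", may_crawl(ROBOTS_TXT, "SomeOtherBot", page))
```

Re-allowing Googlebot is just a matter of removing the `Disallow` line; whether and when the pages come back is up to Google.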
|2) What do you do if yoursite.com/index.htm is a duplicate of www.yoursite.com/index.htm. OK, you've got your 301 in place but how do you treat the removal of the index.htm page? |
Use a 301, and ensure that the wrong URL also has some backlinks so it's likely to be crawled. Take care to have much stronger backlinks to the right version.
|4) If you use the removal tool and remove your whole site for a while... when your site comes back do you still benefit from all your backlinks? I'm assuming that removing your site from Google's index is not as serious as allowing your domain to expire :) |
You benefit from your backlinks even during the time period when the pages are removed - Google keeps crawling them, follows outbound links and credits PR to them.
It's been a long while since I have had anything positive to say about Google.
I think I might be seeing the light at the end of the supplemental tunnel. I hope I am not speaking too soon but I notice two things today on the above DC relating to my site (missing in action Sept 22nd).
1) The index.html is not listed anymore. This Supplemental is apparently out of the index.
2) The site is ranking for one of its phrases again.
Anyone else seeing improvements on "7"?
We did what wizard suggested for #2. After placing the 301 we dropped a strong link to the wrong URL so it would be crawled again. So far, we have had partial success. The new crawl has made the wrong URLs lose their title and description and they are now listed as URL-only. If what I read here is correct, this is a waiting stage until the link is spidered one more time to retrieve the title and description. At that time, we expect that the 301 will be complete and Google will list just one page with no dupes.
Thanks g1smd, I did particularly like your detailed explanation of the canonical problem.
I cleaned up 301s on one of my sites as well but a site:mysite.com -www still shows all the old non-www pages as being in supplemental. And because Google is not removing them from supplementals, I think 301 handling is broken at Google.
With respect to the home page - same problem. You can 301 it but the old page stays in supplemental and it's more of a problem than dupes of internal pages being stored.
obono, how many times was the wrong URL crawled after the 301? i.e. how many times (and for how long) did Google have to keep hitting the 301 for that page? But, most importantly, what do you get when you do a search for site:yoursite.com -www (if your dup problem was www and non-www versions of the same pages)?
|If what I read here is correct, this is a waiting stage until the link is spidered one more time to retrieve the title and description. At that time, we expect that the 301 will be complete and Google will list just one page with no dupes. |
That is exactly what I was hoping to read here but don't recall having ever seen any thread that suggested this was the case. On the contrary the suggestions have always been that the old URLs stay in supplementals (as per the link in my OP). I would appreciate if you could point me to any thread that suggests Google handles 301s properly.
My largest site had been supplemental (via the site: command) for many moons. Also had the 10x page count inflation problem, but that cleared up a few months ago. Site remained supplemental.
As of yesterday I noticed that the site is no longer supplemental, although traffic remains at a dozen or so G refs per day (last year 10,000 G refs per day).
I added this site to Google Base about ten days ago -- seems to be the most logical place to start in figuring out how it got desupplementalized.
Folks may want to try adding the index from supplemental sites to Google Base and see what happens in a couple of weeks.
Whole site come out of Supplemental - or just starting with some pages?
About 10% of the pages, including the index, are no longer supplemental via the site: command.
Interesting - I have seen this too :) - and on some DCs the homepage is top for the site: command, followed by the recently crawled pages, which makes it easier to see. Although the number of DCs with these results seems to be declining at the moment :(
I am trying not to get too optimistic at this stage though. I had a homepage crawled recently that had probably not been crawled since Jan/Feb time.
Have you submitted to Google Base or would you ascribe your partial desupplementalization to some other cause?
I don't track G bot consistently but he seems to have been showing up all through my supplemental period.
As of yesterday some of my desupplementalized pages had a newer cache, but most showed old 2004 caches. Today I am seeing mostly Nov 2005 cache dates.
No - not really interested in Google Base so have not submitted.
>>ascribe your partial desupplementalization to some other cause?
I am hoping that Google might get a handle on the problem ;)...
I am not seeing any traffic as a result. It is a fairly large site and only about 10-20 pages are no longer supplemental, but they are the logical pages, e.g. pages linked from the homepage. For a client's site, just the homepage is no longer supplemental.
However, my main site shows little improvements - so seems a bit random at this stage.
>>>I don't track G bot consistently but he seems to have been showing up all through my supplemental period.
Probably Mozilla Googlebot - this Googlebot does not tend to add pages to the index and has another purpose (which is not exactly clear)
oddsod, I would not know how many times the 'wrong' links had been crawled. I do not keep such detailed stats. We had 18 subdomains affected. About 15 turned to 'urls only' in less than a week.
Today, I can already see that 3 of them have been 301'd and no longer have the www.subdomain.domain.com problem. This is probably the 7th or 8th day since we placed the htaccess and dropped the 'misdirected' links. Of all the URLs only one went supplemental, maybe because we acted quickly. We are watching that one closely to see if it comes out of that index.
I am not very experienced on this but it seems you can only wait and let the spiders do their work at their own pace. Before implementing this I consulted with a few people here that seem to have a lot more knowledge and thought this might work...
|As of yesterday I noticed that the site is no longer supplemental |
How do you know? I mean, it may currently show no supplemental pages in SERPs but unless several months have passed and those supplementals haven't returned you can't be sure they're really gone.
|traffic remains at a dozen or so G refs per day (last year 10,000 G refs per day). |
Suggests, unfortunately, that they're still supplemental though you may not be able to figure that from SERPs queries. (It could be other algo changes that caused the drop but my gut feeling would be to blame supps first)
|I added this site to Google Base about ten days ago |
Added the site? How? Copied all the site content over to Google base? I fail to see the connection between Google base and anything. Please explain.
obono, what you describe is typically how it happens. The problem in the OP is really about these supplementals that seem to have been rectified but revert back to the original problem after a few months and those same pages go supplemental again. So, you can wait, repeat, wait, repeat, wait, repeat and your pages will still keep reappearing as supplementals.
I'm talking about getting them out of the real supplemental rather than getting them out of Google's public admission of supplemental.
Yes, the partial desupplementalization may very well be smoke and mirrors.
As for adding to Google Base, I simply created a Google Base account and added the site -- title, labels, url. Why this might have an effect on G proper, I dunno. I suppose that if a new temple were erected to Athena it might behoove the faithful to toss the priests a proverbial drachma and make one's ablutions. Do you suppose the oracle can recognize an apostate on sight (or by site)?
|Anyone else seeing improvements on "7"? |
|We did what wizard suggested for #2. After placing the 301 we dropped a strong link to the wrong URL so it would be crawled again. So far, we have had partial success. The new crawl has made the wrong URLs lose their title and description and they are now listed as URL-only. If what I read here is correct, this is a waiting stage until the link is spidered one more time to retrieve the title and description. At that time, we expect that the 301 will be complete and Google will list just one page with no dupes. |
Over the last few months it has taken Google an unreasonably long time to do it, but still, you can hardly do it any other way. I succeeded in moving some URLs with 301s recently, true, but I also have others that are still supplemental. But I find it good that at least a few redirects succeeded in removing supplementals.
|I cleaned up 301s on one of my sites as well but a site:mysite.com -www still shows all the old non-www pages as being in supplemental. And because Google is not removing them from supplementals, I think 301 handling is broken at Google. |
I don't deny it. In my previous post, I said:
|I succeeded in cleaning up some URLs with 301s last month, so 301s are not completely broken. |
I agree there _is_ a problem with 301, but recently, after months of waiting, some of my supplementals have gone, so I turned optimistic.
I have posted this before.
The 301 was added in March and the www pages became better indexed (more of them, and URL-only entries gained a title and description). This took just a couple of weeks.
The non-www pages stuck in the index, and many were supplemental results too. The non-www pages took several months to get rid of... and then after several more months Google just suddenly added many of them back into the index again (in August I think) without warning, and they have been impossible to get rid of since then.
When I do site:mydomain.com, only my homepage and a few pages show up properly, with most of the pages shown as URL-only, so it looks like this site has been mostly penalized. But when I do site:mydomain.com keyword, the internal pages show up with proper title and description. The cached pages are also quite recent, around 19-25 Nov.
Why is this the case? Is it an incomplete database merge at Google recently? Has anyone else seen this?
FYI, this website has been suffering from canonical URL problems and supplementals, so I put up 301 redirects from non-www to www, from double slash // and from /index.html to / about 2 or 3 weeks ago.
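The three redirect rules mentioned above (non-www to www, double slash, and /index.html to /) can be sketched as one small function that computes the URL each request should be 301'd to. This is only an illustration under some assumptions: `canonical_url` is a made-up name, example.com is a placeholder, and the site is assumed to live on the www host (so it would not suit the subdomain setups discussed earlier in the thread).

```python
import re
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, index_names=("index.html", "index.htm")):
    """Return the single URL that duplicates of a page should redirect to."""
    parts = urlsplit(url)

    # Rule 1: non-www host -> www host (assumes the www version is the right one).
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host

    # Rule 2: collapse runs of slashes in the path; an empty path becomes "/".
    path = re.sub(r"/{2,}", "/", parts.path) or "/"

    # Rule 3: /index.html (or /index.htm) -> the bare directory URL.
    for name in index_names:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for test_url in (
        "http://example.com/index.html",
        "http://www.example.com//widgets//index.htm",
        "http://www.example.com/widgets/page.html",
    ):
        print(test_url, "->", canonical_url(test_url))
```

If the function returns something different from the requested URL, that request is a candidate for a 301 to the returned URL; if it returns the same URL, the page is already canonical.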
I have a spare Supplemental of my Home Page which I would like to get rid of. Google must have left it by accident when penalising me for duplicate content.
I just found a hit in my logs from google custom and followed it back out of curiosity. I hadn't really seen a hit from it before and it didn't contain a search term. I started reading about google custom and in their promotion I saw this little tout and had to laugh.
"And Google's index is continuously scrubbed to eliminate duplicate URLs and links that no longer exist. "
I thought everybody would get a nice chuckle if they hadn't seen it.
ROFL. That certainly is funny!
One of my sites has always had problems of this nature.
I had tried to ban pages with robots.txt - but they are never really removed and Google seems to penalise me for having similar content displayed under 2 different URLs.
Since it was performing so badly, I have decided to take drastic action and I have renamed all of the dynamic pages that were causing this problem. Some of the old URLs have 302 redirects and the rest are just displaying the default 404 page.
I will let you know if it works.
Those 302 redirects will mess you up a lot more.
You should be using the 301 redirect.
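To make the difference concrete, here is a rough sketch of a tiny WSGI app that answers renamed URLs with a 301 (permanent) rather than a 302 (temporary). Everything in it is hypothetical: the `MOVED` table, the paths, and the www.example.com host are placeholders, not anyone's real setup.

```python
# Hypothetical table of renamed pages: old path -> new absolute URL.
MOVED = {
    "/old-page.php": "http://www.example.com/new-page.html",
    "/catalog.php": "http://www.example.com/catalog/",
}


def app(environ, start_response):
    """Send a permanent redirect for renamed pages, 404 for everything else."""
    path = environ.get("PATH_INFO", "/")
    target = MOVED.get(path)
    if target is not None:
        # 301 tells crawlers the move is permanent; a 302 says it is temporary.
        start_response(
            "301 Moved Permanently",
            [("Location", target), ("Content-Length", "0")],
        )
        return [b""]
    body = b"Not found"
    start_response(
        "404 Not Found",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

For a quick local look it could be served with the standard library's `wsgiref.simple_server.make_server`; on a real site the same mapping would more likely be written as server redirect rules.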
Let me see..
If I have 2 pages with the target words on both, both PR5, with different URLs,
and Google drops one, my SERP results for the "target word" decrease.
But the question is whether Google drops the page or applies some kind of penalty.
I don't think Google has any kind of penalty, just a way to discount the value of duplicate content. Google wants our real pages, without 10 dup pages in a CMS inflating the site's value.
Same problem with www: 2 pages add double value for the site. Some of the supplementals are dup content.
I won't change anything until this mess ends.
I changed hosting in Sept. and had 50000 supplementals in the SERPs. I used a 301 to a single page with the information
MOVED. Last night there were no supplementals. From Sept. until now I have never lost more than 5%-10% of visitors in a day or two. SERPs are up for some words, down for others.
I have decided to rest, go fishing and forget the Google mess until it ends.
Sorry for my bad English.