Google SEO News and Discussion Forum

Canonicals, persistent supplementals, broken removal tool
<sigh> Google, isn't it time you came out of beta?
oddsod

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 32146 posted 8:21 pm on Nov 22, 2005 (gmt 0)

There have been numerous threads about duplicate content and several related problems.

Problem 1: www and non-www are treated as two different sites, yet if you have a non-www robots.txt blocking Googlebot, Google applies that block to both the non-www and the www pages (see thread [webmasterworld.com]).
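Under the robots exclusion convention, each host name (www and non-www included) is supposed to be judged only by the robots.txt it serves itself, so a block on the non-www host should not, in principle, spill over onto the www pages. A minimal sketch using Python's standard `urllib.robotparser`; the `example.com` host names and both robots.txt bodies are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies; each host name serves its own file.
ROBOTS_BY_HOST = {
    # non-www host: shut Googlebot out completely
    "example.com": "User-agent: Googlebot\nDisallow: /\n",
    # www host: an empty Disallow line allows everything
    "www.example.com": "User-agent: *\nDisallow:\n",
}


def allowed(host: str, path: str, agent: str = "Googlebot") -> bool:
    """Check a URL only against the robots.txt served by its own host."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_BY_HOST[host].splitlines())
    return parser.can_fetch(agent, f"http://{host}{path}")


if __name__ == "__main__":
    for host in ("example.com", "www.example.com"):
        print(host, allowed(host, "/index.htm"))
```

This only models the convention; it says nothing about how Googlebot actually applied the two files, which is exactly what the problem above is about.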

Problem 2: Supplementals just don't go away. Matt keeps saying there's no problem, but the reality is that you may be able to get a page out of supplemental for a while, only for it to pop straight back in again [webmasterworld.com].

Problem 3: the Google removal tool. Apart from the fact that it's too easy to misuse (you state something in your robots.txt and competitors rush to use the removal tool to get you in trouble), what's the point of a tool that doesn't do what it says it will do?

Matt Cutts said in early October (in his post about moving to a new web host) that they would be doing something about 301 handling within the next "couple of months" or so. That's no promise, of course, and we have to work on the assumption that Google may never fix the broken bits.

I have some questions that I'm hoping those of you more experienced with the problem can help with. I appreciate that the removal tool is dangerous and that you can knock your site out for 90/180 days. But on some sites the problem is so bad that it really may be worth starting again.

1) Is using the removal tool to remove the whole site the same as using robots.txt to ban Google? If robots.txt is a low-strength medicine, is that preferable? I.e. ban Googlebot, all Google crawls stop, Google stops showing your pages, you re-allow Googlebot, it re-indexes, and you've lost a few days' traffic but now have all your pages indexed, and indexed correctly. (It can't be that simple, can it?)
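For reference, the robots.txt side of the ban-then-re-allow plan in question 1 might look like the sketch below, checked with Python's standard `urllib.robotparser`. It only controls crawling; whether Google actually drops and then cleanly re-indexes the pages is the open question. The URL and the `Slurp` agent used for comparison are illustrative:

```python
from urllib.robotparser import RobotFileParser


def robots_txt(ban_googlebot: bool) -> str:
    """Build robots.txt for either phase of the ban-then-re-allow idea."""
    if ban_googlebot:
        # Phase 1: block Googlebot everywhere; every other crawler keeps access.
        return ("User-agent: Googlebot\nDisallow: /\n"
                "\n"
                "User-agent: *\nDisallow:\n")
    # Phase 2: re-allow everyone (an empty Disallow value means "allow all").
    return "User-agent: *\nDisallow:\n"


def can_crawl(robots_body: str, agent: str, url: str) -> bool:
    """Evaluate one URL for one user agent against a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/page.htm"
    for phase in (True, False):
        body = robots_txt(ban_googlebot=phase)
        print("ban" if phase else "re-allow",
              can_crawl(body, "Googlebot", url),
              can_crawl(body, "Slurp", url))
```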

2) What do you do if yoursite.com/index.htm is a duplicate of www.yoursite.com/index.htm? OK, you've got your 301 in place, but how do you handle the removal of the index.htm page?

3) How do you get https pages removed? The removal tool doesn't do this. What other options are there?

4) If you use the removal tool and remove your whole site for a while... when your site comes back do you still benefit from all your backlinks? I'm assuming that removing your site from Google's index is not as serious as allowing your domain to expire :)

 

oddsod

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 32146 posted 9:25 am on Nov 27, 2005 (gmt 0)

<bump>

I don't know when this came out of pre-mod but it seems to have gotten buried.

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 32146 posted 7:12 pm on Nov 27, 2005 (gmt 0)

See also:

[webmasterworld.com...] -- Post #25

[webmasterworld.com...] -- Post #37

[webmasterworld.com...] -- Post #400

[webmasterworld.com...] -- Post #414 and #417.

oddsod

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 32146 posted 11:26 am on Nov 28, 2005 (gmt 0)

g1smd, I tried to send this to you by sticky but your inbox is full:

Thanks for your reply. Prior to creating the thread I had read all those posts of yours. While they do confirm the problems I'm describing, they don't seem to provide any of the answers I was looking for. Perhaps that's why you posted the links, i.e. to confirm the problems. Please let me know if that's the case or if there are answers I'm missing. Thanks.

cleanup

10+ Year Member



 
Msg#: 32146 posted 12:42 pm on Nov 28, 2005 (gmt 0)


"2) What do you do if yoursite.com/index.htm is a duplicate of www.yoursite.com/index.htm? OK, you've got your 301 in place, but how do you handle the removal of the index.htm page?"

I don't think anyone knows. I am trying to do the same and am still waiting for a redirected .com/index.html to be removed. :(

Interesting thought about taking the whole site down for a couple of months and then trying to restart and cure it. Has anyone here done that?

twebdonny



 
Msg#: 32146 posted 1:48 pm on Nov 28, 2005 (gmt 0)

Don't forget the inflated page count problem that possibly triggers filters. That one has never been addressed adequately.

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 32146 posted 7:50 pm on Nov 28, 2005 (gmt 0)

I posted the links as "further reading", nothing more.

Wizard

5+ Year Member



 
Msg#: 32146 posted 10:06 pm on Nov 28, 2005 (gmt 0)

Here are my opinions on the questions you've asked.

1) Is using the removal tool to remove the whole site the same as using robots.txt to ban Google? If robots.txt is a low-strength medicine, is that preferable? I.e. ban Googlebot, all Google crawls stop, Google stops showing your pages, you re-allow Googlebot, it re-indexes, and you've lost a few days' traffic but now have all your pages indexed, and indexed correctly. (It can't be that simple, can it?)

Disallowing Googlebot doesn't remove your pages from the index. Disallowing and also removing the site with the URL Console doesn't remove them from the index either; it just hides them for six months. There is no way to remove pages for a few days and then get them back. All you can do to clean up your indexed pages is set up proper 301 redirects and keep waiting.

I succeeded in cleaning up some URLs with 301s last month, so 301s are not completely broken.

2) What do you do if yoursite.com/index.htm is a duplicate of www.yoursite.com/index.htm? OK, you've got your 301 in place, but how do you handle the removal of the index.htm page?

Use a 301, and make sure the wrong URL also has some backlinks so it's likely to be crawled. Take care that the right version has much stronger backlinks.
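On Apache the non-www 301 usually lives in an .htaccess rewrite. To illustrate the same logic outside Apache, here is a minimal Python WSGI middleware sketch; `www.example.com` is a hypothetical canonical host, and the app is assumed to be mounted at the site root. It only fixes the host; folding /index.htm into / would be a separate rule.

```python
from urllib.parse import quote

CANONICAL_HOST = "www.example.com"  # hypothetical preferred host name


def canonical_host(app):
    """WSGI middleware: 301 every request for another host name to CANONICAL_HOST.

    Assumes the application is mounted at the site root (SCRIPT_NAME is empty).
    """
    def wrapper(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != CANONICAL_HOST:
            raw_path = environ.get("PATH_INFO", "") or "/"
            # PATH_INFO holds the raw path bytes decoded as latin-1 (PEP 3333),
            # so re-encode before percent-quoting.
            path = quote(raw_path.encode("latin-1"))
            query = environ.get("QUERY_STRING", "")
            location = f"http://{CANONICAL_HOST}{path}" + (f"?{query}" if query else "")
            # 301 = permanent: the www URL should replace the non-www one.
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Type", "text/plain")])
            return [b"Moved Permanently\n"]
        return app(environ, start_response)
    return wrapper
```

Any existing WSGI application can be wrapped as `application = canonical_host(application)`; requests already on the www host pass straight through.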

4) If you use the removal tool and remove your whole site for a while... when your site comes back do you still benefit from all your backlinks? I'm assuming that removing your site from Google's index is not as serious as allowing your domain to expire :)

You benefit from your backlinks even while the pages are removed: Google keeps crawling them, follows their outbound links and credits PR to them.

cleanup

10+ Year Member



 
Msg#: 32146 posted 5:43 pm on Nov 29, 2005 (gmt 0)

66.102.7.104

It's been a long while since I have had anything positive to say about Google.

I think I might be seeing the light at the end of the supplemental tunnel. I hope I am not speaking too soon, but I notice two things today on the above data center (DC) relating to my site (missing in action since Sept 22nd).

1) The index.html is not listed anymore. This supplemental is apparently out of the index.

2) The site is ranking for one of its phrases again.

Anyone else seeing improvements on "7"?

obono

10+ Year Member



 
Msg#: 32146 posted 5:54 pm on Nov 29, 2005 (gmt 0)

oddsod,

We did what Wizard suggested for #2. After placing the 301, we dropped a strong link to the wrong URL so it would be crawled again. So far we have had partial success. The new crawl has made the wrong URLs lose their title and description, and they are now listed as URL-only. If what I read here is correct, this is a waiting stage until the link is spidered one more time to retrieve the title and description. At that point we expect the 301 to be complete and Google to list just one page with no dupes.
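This approach relies on every "wrong" URL answering with a single 301 straight to the right URL, not a 302 or a chain of hops. A small Python sketch for checking that before waiting on the spiders; `http_head` uses the standard `http.client` (plain http only, and it never follows redirects itself), while the function names are illustrative:

```python
import http.client
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetcher returns (status code, Location header or None) WITHOUT following redirects.
Fetcher = Callable[[str], Tuple[int, Optional[str]]]
REDIRECT_CODES = {301, 302, 303, 307, 308}


def http_head(url: str) -> Tuple[int, Optional[str]]:
    """Real fetcher for plain-http URLs; http.client does not follow redirects."""
    parts = urlsplit(url)
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def redirect_chain(url: str, fetch: Fetcher = http_head,
                   max_hops: int = 5) -> List[Tuple[str, int]]:
    """Record every hop as (url, status) until a non-redirect response."""
    hops: List[Tuple[str, int]] = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # a Location header may be relative
    return hops


def is_clean_301(hops: List[Tuple[str, int]], expected: str) -> bool:
    """True if the old URL gives one 301 straight to `expected`, which returns 200."""
    return len(hops) == 2 and hops[0][1] == 301 and hops[1] == (expected, 200)
```

Passing a fake `fetch` (for example a dict lookup) lets the logic be checked without touching the network.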

oddsod

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 32146 posted 6:13 pm on Nov 29, 2005 (gmt 0)

Thanks g1smd, I did particularly like your detailed explanation of the canonical problem.

@ Wizard

I cleaned up 301s on one of my sites as well, but a site:mysite.com -www query still shows all the old non-www pages as supplemental. It's because Google is not removing them from the supplementals that I think 301 handling at Google is broken.

With respect to the home page, same problem. You can 301 it, but the old page stays in supplemental, and that is more of a problem than having dupes of internal pages stored.

obono, how many times was the wrong URL crawled after the 301? i.e. how many times (and for how long) did Google have to keep hitting the 301 for that page? But, most importantly, what do you get when you do a search for site:yoursite.com -www (if your dup problem was www and non-www versions of the same pages)?

If what I read here is correct, this is a waiting stage until the link is spidered one more time to retrieve the title and description. At that point we expect the 301 to be complete and Google to list just one page with no dupes.

That is exactly what I was hoping to read here, but I don't recall ever seeing a thread that suggested this was the case. On the contrary, the suggestions have always been that the old URLs stay in supplementals (as per the link in my OP). I would appreciate it if you could point me to any thread that suggests Google handles 301s properly.

Atticus



 
Msg#: 32146 posted 7:11 pm on Nov 29, 2005 (gmt 0)

My largest site had been supplemental (via the site: command) for many moons. It also had the 10x page count inflation problem, but that cleared up a few months ago. The site remained supplemental.

As of yesterday I noticed that the site is no longer supplemental, although traffic remains at a dozen or so G refs per day (last year it was 10,000 G refs per day).

I added this site to Google Base about ten days ago; that seems the most logical place to start in figuring out how it got desupplementalized.

Folks may want to try adding the index page of supplemental sites to Google Base and see what happens in a couple of weeks.

Dayo_UK

10+ Year Member



 
Msg#: 32146 posted 7:12 pm on Nov 29, 2005 (gmt 0)

Atticus

Has the whole site come out of supplemental, or is it just starting with some pages?

Atticus



 
Msg#: 32146 posted 7:17 pm on Nov 29, 2005 (gmt 0)

About 10% of the pages, including the index, are no longer supplemental via the site: command.

Dayo_UK

10+ Year Member



 
Msg#: 32146 posted 7:19 pm on Nov 29, 2005 (gmt 0)

Interesting, I have seen this too :) On some DCs the homepage is at the top of the site: command results, followed by the recently crawled pages, which makes it easier to see. The number of DCs with these results seems to be declining at the moment, though :(

I am trying not to get too optimistic at this stage, though. I had a homepage crawled recently that had probably not been crawled since around Jan/Feb.

Atticus



 
Msg#: 32146 posted 7:26 pm on Nov 29, 2005 (gmt 0)

Dayo,

Have you submitted to Google Base, or would you ascribe your partial desupplementalization to some other cause?

I don't track G bot consistently, but he seems to have been showing up all through my supplemental period.

As of yesterday some of my desupplimentalized pages had a newer cache, but most showed old 2004 caches. Today I am seeing mostly Nov 2005 cache dates.

Dayo_UK

10+ Year Member



 
Msg#: 32146 posted 7:29 pm on Nov 29, 2005 (gmt 0)

No, I'm not really interested in Google Base, so I have not submitted.

>>ascribe your partial desupplementalization to some other cause?

I am hoping that Google might get a handle on the problem ;)...

I am not seeing any traffic as a result. It is a fairly large site and only about 10-20 pages are no longer supplemental, but they are logical pages, e.g. pages linked from the homepage. On a client's site, just the homepage is no longer supplemental.

However, my main site shows little improvement, so it seems a bit random at this stage.

>>>I don't track G bot consistently, but he seems to have been showing up all through my supplemental period.

Probably the Mozilla Googlebot. This Googlebot does not tend to add pages to the index and has another purpose (which is not exactly clear).

obono

10+ Year Member



 
Msg#: 32146 posted 6:19 am on Nov 30, 2005 (gmt 0)

oddsod, I don't know how many times the 'wrong' links were crawled; I don't keep such detailed stats. We had 18 subdomains affected. About 15 turned to 'URL only' in less than a week.

Today I can already see that 3 of them have been 301'd and no longer have the www.subdomain.domain.com problem. This is probably the 7th or 8th day since we placed the .htaccess and dropped the 'misdirected' links. Of all the URLs, only one went supplemental, maybe because we acted quickly. We are watching that one closely to see if it comes out of that index.
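The actual .htaccess rule isn't posted, so the following is only a hypothetical Python sketch of the host mapping such a rule would perform for the www.subdomain.domain.com case: strip the leading "www." from a subdomain host so the 301 can point at the bare subdomain. `example.com` and the function name are illustrative:

```python
from typing import Optional


def strip_www_subdomain(host: str, base_domain: str) -> Optional[str]:
    """Map www.<sub>.<base_domain> to <sub>.<base_domain>.

    Returns None when no redirect is needed, including for the plain
    www.<base_domain> host, which is left alone.
    """
    host = host.split(":")[0].lower().rstrip(".")  # drop any port, case and trailing dot
    suffix = "." + base_domain.lower()
    if host.startswith("www.") and host.endswith(suffix):
        sub = host[len("www."):-len(suffix)]
        # Only handle exactly one label between "www." and the base domain.
        if sub and "." not in sub:
            return sub + suffix
    return None


if __name__ == "__main__":
    for h in ("www.blue.example.com", "blue.example.com", "www.example.com"):
        print(h, "->", strip_www_subdomain(h, "example.com"))
```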

I am not very experienced with this, but it seems you can only wait and let the spiders do their work at their own pace. Before implementing this I consulted with a few people here who seem to have a lot more knowledge, and they thought this might work...

oddsod

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 32146 posted 3:08 pm on Nov 30, 2005 (gmt 0)

As of yesterday I noticed that the site is no longer supplemental

How do you know? It may currently show no supplemental pages in the SERPs, but unless several months pass without those supplementals returning, you can't be sure they're really gone.

traffic remains at a dozen or so G refs per day (last year it was 10,000 G refs per day).

Unfortunately, that suggests they're still supplemental, even though you may not be able to tell from SERP queries. (Other algo changes could have caused the drop, but my gut feeling would be to blame the supps first.)

I added this site to Google Base about ten days ago

Added the site? How? Did you copy all the site content over to Google Base? I fail to see the connection between Google Base and any of this. Please explain.

obono, what you describe is typically how it happens. The problem in the OP is really about supplementals that seem to have been rectified but revert to the original problem after a few months, with those same pages going supplemental again. So you can wait, repeat, wait, repeat, wait, repeat, and your pages will still keep reappearing as supplementals.

I'm talking about getting them out of the real supplemental index, rather than out of what Google publicly admits is supplemental.

Atticus



 
Msg#: 32146 posted 5:42 pm on Nov 30, 2005 (gmt 0)

oddsod,

Yes, the partial desupplementalization may very well be smoke and mirrors.

As for adding to Google Base, I simply created a Google Base account and added the site: title, labels, URL. Why this might have an effect on G proper, I dunno. I suppose that if a new temple were erected to Athena, it might behoove the faithful to toss the priests a proverbial drachma and make one's ablutions. Do you suppose the oracle can recognize an apostate on sight (or by site)?

arbitrary

5+ Year Member



 
Msg#: 32146 posted 6:03 pm on Nov 30, 2005 (gmt 0)

Anyone else seeing improvements on "7"?

Not me.

Wizard

5+ Year Member



 
Msg#: 32146 posted 9:33 pm on Nov 30, 2005 (gmt 0)

We did what Wizard suggested for #2. After placing the 301, we dropped a strong link to the wrong URL so it would be crawled again. So far we have had partial success. The new crawl has made the wrong URLs lose their title and description, and they are now listed as URL-only. If what I read here is correct, this is a waiting stage until the link is spidered one more time to retrieve the title and description. At that point we expect the 301 to be complete and Google to list just one page with no dupes.

Over the last few months it has taken Google an unreasonably long time to do this, but you can hardly do it any other way. True, I recently succeeded in moving some URLs with 301s, but I also have others that are still supplemental. Still, I find it encouraging that at least a few redirects succeeded in removing supplementals.

I cleaned up 301s on one of my sites as well, but a site:mysite.com -www query still shows all the old non-www pages as supplemental. It's because Google is not removing them from the supplementals that I think 301 handling at Google is broken.

I don't deny it. In my previous post, I said:

I succeeded in cleaning up some URLs with 301s last month, so 301s are not completely broken.

I agree there _is_ a problem with 301s, but recently, after months of waiting, some of my supplementals have gone, so I've turned optimistic.

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 32146 posted 11:41 pm on Nov 30, 2005 (gmt 0)

I have posted this before.

The 301 was added in March and the www pages became better indexed (more of them, and URL-only entries gained a title and description). This took just a couple of weeks.

The non-www URLs stuck in the index, and many were supplemental results too. The non-www URLs took several months to get rid of... and then, after several more months, Google suddenly added them back into the index (in August, I think) without warning, and they have been impossible to get rid of since then.

sit2510

10+ Year Member



 
Msg#: 32146 posted 6:52 am on Dec 1, 2005 (gmt 0)

When I do site:mydomain.com, only my homepage and a few pages show up, with most pages shown as URL-only, so it looks as though the site has been mostly penalized. But when I do site:mydomain.com keyword, the internal pages show up with a proper title and description. The cached pages are also quite recent, around 19-25 Nov.

Why is this the case? Is it an incomplete Google database merge? Has anyone else seen this?

FYI, this website has been suffering from canonical URL and supplemental problems, so about 2 or 3 weeks ago I put up 301 redirects from non-www to www, and from double slash // and /index.html to /.
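The three rules described above (non-www to www, double slashes collapsed, index.html folded into its directory) can be expressed as one canonicalization function; any request whose URL differs from its canonical form would get the 301. A minimal Python sketch, assuming a hypothetical `www.example.com` host and treating both index.html and index.htm as default documents; for simplicity it also drops any explicit port:

```python
import re
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"          # hypothetical preferred host
BARE_HOST = CANONICAL_HOST[len("www."):]    # "example.com"
INDEX_NAMES = {"index.html", "index.htm"}   # assumed default documents to fold away


def canonical_url(url: str) -> str:
    """Non-www -> www, collapse repeated slashes, fold index.html/index.htm into its directory."""
    parts = urlsplit(url)
    host = parts.hostname or ""             # lower-cased, port dropped
    if host == BARE_HOST:
        host = CANONICAL_HOST
    path = re.sub(r"/{2,}", "/", parts.path) or "/"
    head, _, last = path.rpartition("/")
    if last in INDEX_NAMES:
        path = head + "/"
    # Fragments never reach the server, so none is emitted.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def needs_301(url: str) -> bool:
    """A request should be 301-redirected when its URL is not already canonical."""
    return canonical_url(url) != url
```

Routing every variant to a single `canonical_url` keeps the redirect one hop long instead of chaining, e.g. non-www to www and then index.html to /.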

dgdclynx

10+ Year Member



 
Msg#: 32146 posted 7:00 am on Dec 1, 2005 (gmt 0)

I have a spare Supplemental of my Home Page which I would like to get rid of. Google must have left it by accident when penalising me for duplicate content.

texasville

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 32146 posted 8:05 pm on Dec 1, 2005 (gmt 0)

I just found a hit in my logs from Google Custom and followed it back out of curiosity. I hadn't really seen a hit from it before, and it didn't contain a search term. I started reading about Google Custom, and in their promotion I saw this little tout and had to laugh.
"And Google's index is continuously scrubbed to eliminate duplicate URLs and links that no longer exist. "

I thought everybody would get a nice chuckle if they hadn't seen it.

oddsod

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 32146 posted 8:25 pm on Dec 1, 2005 (gmt 0)

ROFL. That certainly is funny!

glitterball

10+ Year Member



 
Msg#: 32146 posted 11:37 pm on Dec 1, 2005 (gmt 0)

One of my sites has always had problems of this nature. I had tried to ban pages with robots.txt, but they are never really removed, and Google seems to penalise me for having similar content displayed under 2 different URLs.

Since it was performing so badly, I have decided to take drastic action and have renamed all of the dynamic pages that were causing this problem. Some of the old URLs have 302 redirects and the rest just display the default 404 page.

I will let you know if it works.

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 32146 posted 11:42 pm on Dec 1, 2005 (gmt 0)

Those 302 redirects will mess you up a lot more.

You should be using the 301 redirect.
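For renamed dynamic pages like those described above, switching from 302 to 301 is just a different status line. A minimal WSGI sketch, assuming a hypothetical `RENAMED` table from each old URL (path plus exact query string) to its new absolute URL; anything not in the table falls through to the application, which can keep serving its 404:

```python
# Hypothetical map from an old dynamic URL (path + query) to its new absolute URL.
RENAMED = {
    "/page.php?id=1": "http://www.example.com/widgets.html",
    "/page.php?id=2": "http://www.example.com/gadgets.html",
}


def permanent_redirects(app):
    """WSGI middleware: answer renamed URLs with a 301, pass everything else through."""
    def wrapper(environ, start_response):
        old = environ.get("PATH_INFO", "") or "/"
        query = environ.get("QUERY_STRING", "")
        if query:
            old += "?" + query  # exact match only; parameter order matters
        new = RENAMED.get(old)
        if new is None:
            return app(environ, start_response)
        # 301 says the move is permanent, so the new URL should take over from
        # the old one. A 302 only signals a temporary move, which can leave the
        # old URL treated as the one to keep.
        start_response("301 Moved Permanently",
                       [("Location", new), ("Content-Type", "text/plain")])
        return [b"Moved Permanently\n"]
    return wrapper
```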

ruip

5+ Year Member



 
Msg#: 32146 posted 3:14 am on Dec 2, 2005 (gmt 0)

Let me see...
If I have 2 pages with the target words on both, both PR5, with different URLs, and Google drops one, my SERP results for the "target word" decrease.

But the question is whether Google drops the page or applies some kind of penalty.
I don't think Google has any kind of penalty, just a way to discount the value of identical content. Google wants our real pages, without 10 dup pages in a CMS inflating the site's value.

Same problem with www: 2 pages add double value for the site. Some of the supplementals are dup content.
I'm not changing anything until this mess ends.

I changed hosting in Sept. and had 50,000 supplementals in the SERPs. I used a 301 for a single page with the information MOVED. Last night there were no supplementals. From Sept. until now I never lost more than 5%-10% of visitors in a day or two. SERPs are up for some words and down for others.

I decided to rest, go fishing and forget the Google mess until it ends.

Sorry for my bad English.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved