| 7:08 pm on Apr 20, 2005 (gmt 0)|
Does anybody know what it means if there are two listings of your site:
1. http://site.com (no desc)
2. http://www.site.com
Is that a sign of a duplicate content penalty? I thought so and made the mistake of removing [site.com...] with the Google removal tool, which also removed the canonical one.
If you have more sites with these two listings what should you do to fix that?
| 7:09 pm on Apr 20, 2005 (gmt 0)|
Don't apologize. In my case I don't think it was the tool. If it was, then the tool does not do what it is supposed to do. Like I said earlier, going by what the tool is supposed to do, the worst that could have happened is that I would have lost those 5 pages only.
Only a few have done this incorrectly. Others who did it correctly still saw their whole site drop from the index. There are still the many who didn't do a darned thing and dropped from the index, and there are many sites of authority that didn't do a darned thing and dropped from the index also.
So a select few screwed up and removed their own site, so what about the rest? Spam filter? Look at the results for goodness sake! Much scraper spam crap, and authorities missing.
| 7:17 pm on Apr 20, 2005 (gmt 0)|
|1.http://site.com (no desc) |
Is that a sign of duplicate content penalty?
I would say no. It seems Google knows they are for the same page, if removing one using the removal tool removes both (almost as if Google treats them as the same url, despite toolbar pr differences--just taking one as canonical and devaluing the other to avoid duplicate appearances in the serps). This was not true of the 302 removals as the 302 urls were distinctly different from the canonical, therefore enabling you to remove the 302 without hurting the intended.
My 4 year old website has always had both versions listed, the www version always had the highest pagerank, and G devalued the other. That would be my logic, but I believe some have used 301s to redirect one to the other just in case. It just seems Google has likely advanced beyond penalizing for having a www version and non-www version in the serps.
| 7:28 pm on Apr 20, 2005 (gmt 0)|
1. Install a 301 rewrite rule set.
2. Add a randomly changing piece of content, related to the page context and the site, to the pages, starting with your homepage.
The first will heal the site and the second will break the duplicate content problem.
Note that number 2 is also theorised (by some folks, not me) to be non-functional in the case of a so-called 302 hijack.
We are happy with our progress. A large, high-PR site is apt to have been classified by Google as a spammer, so your mileage may depend upon how far along the duplication process is.
You may have to ask Google as noted in msg #116 in this thread by GoogleGuy to reinclude your site.
[edited by: theBear at 7:58 pm (utc) on April 20, 2005]
| 7:37 pm on Apr 20, 2005 (gmt 0)|
>> 1. http://site.com (no desc)
>> 2. http://www.site.com
>> Is that a sign of duplicate content penalty?
Let's get this straight. It isn't a "penalty" as such. It is simply that, when faced with multiple URLs delivering the exact same content, Google wants to list only one of them.
Google chooses one URL to list and drops the others. On the way out you may see some, or all, of the others as URL-only listings for a while.
The wider problem for a site is that page1.html might be associated with domain.com and page2.html might be associated with www.domain.com and so on. This can have consequences for the way that PR is distributed around your site, and you can see such split PR in operation on many such sites.
That is, if domain.com/page1.html links to domain.com/page2.html but for page2.html Google actually lists www.domain.com/page2.html, then that latter page isn't getting any PR from page1.html is it?
You'll see various pages switch allegiance from domain.com to www.domain.com, and back, on a random basis, and all sorts of other strange effects.
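One rough way to see whether a site's listings are split across hosts is to group the listed URLs by hostname. The sketch below assumes you have already collected the listed URLs by hand (for example from a site: search); the function names `listings_by_host` and `is_split` are made up for illustration and say nothing about how Google itself works.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def listings_by_host(listed_urls):
    """Group the paths of listed URLs under the hostname they were listed with."""
    by_host = defaultdict(list)
    for url in listed_urls:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        by_host[host].append(parts.path or "/")
    return dict(by_host)


def is_split(listed_urls):
    """True when the listings are spread over more than one hostname."""
    return len(listings_by_host(listed_urls)) > 1


# Hypothetical listings matching the page1/page2 example above.
listed = [
    "http://domain.com/page1.html",
    "http://www.domain.com/page2.html",
]
print(listings_by_host(listed))
print(is_split(listed))
```

If the grouping shows more than one host, internal links pointing at one hostname may not be counted for pages listed under the other.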
If one of the versions becomes a Supplemental Result then you could be in bigger trouble. Google does not update the search index or the snippet for those, and your page might start being returned as a result based on old content: for content that is no longer on the real page and no longer in the displayed cache either.
Google used to be able to consolidate listings and merge PR, and used to do this every few months. I haven't seen that happening since at least last Summer.
You can help the situation simply by using a 301 redirect from non-www to www and that will eventually fix the problem.
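A minimal sketch of the decision such a redirect makes, assuming www.domain.com is the preferred host and domain.com is the only alias. The names `CANONICAL_HOST`, `ALIAS_HOSTS` and `redirect_for` are illustrative only; on a real server this would normally be done with the server's own rewrite rules rather than application code.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.domain.com"  # assumed preferred hostname
ALIAS_HOSTS = {"domain.com"}       # assumed non-canonical hostnames


def redirect_for(url):
    """Return (301, target_url) when url is on an alias host, else None.

    Path and query string are preserved; any port or fragment is dropped.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in ALIAS_HOSTS:
        target = urlunsplit(
            (parts.scheme, CANONICAL_HOST, parts.path or "/", parts.query, "")
        )
        return 301, target
    return None


print(redirect_for("http://domain.com/page1.html"))
print(redirect_for("http://www.domain.com/page1.html"))
```

The point of a permanent (301) status, as opposed to a 302, is that it tells the crawler the www URL is the one to keep.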
As for removal, it seems that a request to take out anything with domain.com in it also takes out www.domain.com at the same time. Unrelated to the 302 problem that everyone else here is asking about, I have a friend who uses www on all the URLs of his site. In fact the non-www version cannot even be accessed. However, there was a URL-only listing for thesite.com/ in Google a few weeks ago when doing a site: search. I used the tool to get rid of that rogue result and the www index page disappeared too. It's no big deal as it is only really a splash page (sooo 1998) and the rest of the site is unharmed. I'm still wondering where Google got the non-www result from. The URL cannot even be accessed. There is nothing there.
[edited by: g1smd at 7:48 pm (utc) on April 20, 2005]
| 7:38 pm on Apr 20, 2005 (gmt 0)|
Did your site completely disappear from serps? When you searched for your company name, where did it rank in the serps prior to making the changes you mentioned above? I have seen my company name jump from position 75+ to position 4. So, there are still some dup penalty issues, despite my changing all the content several times since the first of the year.
| 7:42 pm on Apr 20, 2005 (gmt 0)|
thanks crob305 and thebear for your feedback.
I have rewrite rules to make all urls canonical with 301s. That's been there for a long time.
I once changed the copy of the home page of one of the sites with the problem. Still no luck.
| 7:47 pm on Apr 20, 2005 (gmt 0)|
No, the site remained visible, but its traffic was sinking fast.
We caught it before it totally crashed.
We had from 2 to 5 copies of 750 pages.
Pre 301 insertion, we also had a number of 302 leeches (a problem I consider completely fixable).
| 8:04 pm on Apr 20, 2005 (gmt 0)|
"If one of the versions becomes a Supplemental Result then you could be in bigger trouble. Google does not update the search index or the snippet for those, and your page might start being returned as a result based on old content: for content that is no longer on the real page and no longer in the displayed cache either."
Could cause a trickle down effect and bang whole site is gone.
If google is able to associate with and without www, it seems that they may have associated 302's in some way, then if the 302 goes supplemental they could both go or revert to older associations?
| 8:09 pm on Apr 20, 2005 (gmt 0)|
Don't forget that:
"a particular page returned in the search results might not be a supplemental result for all search queries that it could be returned for".
I hope you are also aware of that.
| 8:11 pm on Apr 20, 2005 (gmt 0)|
| 8:17 pm on Apr 20, 2005 (gmt 0)|
And since we are dealing with software controlling hardware that was written by wetware we are really really in trouble ;).
Just between us wetware types.
| 8:35 pm on Apr 20, 2005 (gmt 0)|
And also that the angle of the dangle is directly proportional to the heat of the beat.
| 9:29 pm on Apr 20, 2005 (gmt 0)|
No one on WebmasterWorld recommended doing anything bad. Some people did the bad thing on their own.
The main problem, and no reason to dance away from this, is Google's failure to handle this issue properly, both from a technology standpoint to start with, and then from a deal-with-webmasters one later. Google has adopted a cloak of secrecy about its many technological problems, and in doing so leaves webmasters more or less staggering around in the dark trying to fix problems caused by Google.
Of course Google owes nobody nothing, but that is not the point. The point is Google's database and search results are crippled by the combination of their bad technology (and thinking) and their cloak of secrecy.
If Google wants better search results, Google needs to learn from their mistakes. A small bit of evidence has shown up suggesting they have learned some, but more evidence abounds to suggest they have many more things to learn.
| 10:18 pm on Apr 20, 2005 (gmt 0)|
<Of course Google owes nobody nothing, but that is not the point.>
You are right. But honest decent publishers who follow Google's own webmaster guidelines expect at least fairness from Google. And it's not fair at all to remove or penalize sites just because somebody has decided to hijack their contents against the will of the owners of these sites.
| 10:41 pm on Apr 20, 2005 (gmt 0)|
|"a particular page returned in the search results might not be a supplemental result for all search queries that it could be returned for". |
This is what was so scary to me since my site is currently returned as a supplemental page for my own name.
| 10:53 pm on Apr 20, 2005 (gmt 0)|
Well said, steveb. I don't, however, entirely agree that Google owes nothing to anyone. Google doesn't manufacture all of those web pages it displays. It makes its money from resources supplied by us. It also promotes itself as the world's leading information organizer. I think that people who provide content and use the search engine deserve at least a minimal level of professionalism.
Some tour companies make their money guiding people around national parks. If they were to start forest fires, cause accidents, or lose their clients in the woods, they would be held accountable.
Then, there's Google. Overnight, it can ruin years of work of thousands of webmasters without as much as an apology. That might be legal but it ain't ethical. As a company, Google has some growing up to do.
| 11:07 pm on Apr 20, 2005 (gmt 0)|
Google owes us terms of:
Professional business conduct
To the same degree that Microsoft, the Telephone Company, and many others owe us. I built my business around needing the telephone company, Microsoft Windows and many other services. I give Google more than $200K income per year - I expect better from them.
Do I get calls from Microsoft, the telephone company, and others to help improve my business? YES! And I generate a lot less revenue for them.
I don't want Google to call me. I just don't want them to destroy my business overnight unless I have done evil.
| 12:15 am on Apr 21, 2005 (gmt 0)|
I fear we've lost the thread's "redirection issues" focus and lost GG's very helpful participation.
If he returns I'll apologize and buy beers for any participant in this thread who attends the New Orleans Conference.
| 1:11 am on Apr 21, 2005 (gmt 0)|
I submitted a reinclusion request earlier today. My site appears to be under some kind of penalty since Google accounts for exactly 1% of referrals.
I have to say that I was hesitant since this is like admitting that I did something wrong - which I never did. The site is approaching its 1st birthday in the end of May. Every night I write till I can't stand it any longer. At least one article a day, now 850 pages - over 500,000 words.
I'm starting to see good traffic from Yahoo. MSN has me on page 1 for a 230 million term (which I have to admit is probably an overstatement of my site's importance for this particular word). And my wife thinks I'm nuts when I try to explain why she can't find my website in Google (it's at 213 tonight, showing as supplemental).
I figure if I give up now, it's like admitting defeat. I'm just not a quitter, so... it's back to writing - then maybe a Margarita!
If Google lifts the penalty, I will make one promise... You'll see me at the next Conference (although I will be writing later that evening...)
| 1:35 am on Apr 21, 2005 (gmt 0)|
We are trying to figure this one out ourselves and waiting on a response from Google. If it were a spam penalty, I'll be darned if I know where to look, especially through the thousands of pages we have! I hate to do anything to the site unless I have a real good idea of what it is that is causing it. We rank very well in both MSN and Yahoo and don't want to risk those and lose everything (unless I do something that is easily reversible). We never exchange links and never cross link any of our own sites; on-page factors are mostly typical, and we see others that are far worse. Internal linking is a simple pyramid. There is a large amount of 302 hijacking from one particular search engine, which had 80% of our pages. Old 301 pages are popping back in (probably a side effect).
| 8:20 am on Apr 21, 2005 (gmt 0)|
Sorry, I got pulled into this thread: [webmasterworld.com...]
I should have known better, but oh well. bucaro/illusionist, I searched for your nicks in our user support tracking database but didn't see any emails. If you do a reinclusion request, please include your handles so I can try to get someone to find you. Very few people used the url removal tool to take out their own sites, so I can try to gather some people into one group and ask someone if we can do anything on our end.
For the person who asked about the url removal tool: it's removal for six months, not 90 days. I understand how someone thought it might help to try the url removal tool, but please don't use it on one's own site. arubicus, did you say you saw weird behavior with www vs. non-www or trailing slashes vs. without? Could you submit something to google.com/support so I can try to get someone to check it out? Use arubicus in the form somewhere. I'm going to be gone Friday and this weekend, but I'd be curious to hear of any remaining canonicalization issues.
Ugh. Very bleary. Going to bed now..
| 8:37 am on Apr 21, 2005 (gmt 0)|
Night Night GG
I have sent mine with my handle.
I admit I might have a problem with duplicate content - but trying to add lots of reviews etc.
But the whole site has gone - but the way it has gone I am wondering if it is a canonical url non-www problem.
| 8:38 am on Apr 21, 2005 (gmt 0)|
Thanks, Dayo_UK. Goodnight..
| 8:49 am on Apr 21, 2005 (gmt 0)|
"I'd be curious to hear of any remaining canonicalization issues"
Okay, it's not a 302 one, but I've been thinking of it as a similar "canonicalization issue"...
Google's database is overflowing with URL listings like:
where there is also a normal listing for
These occur from the combination of the unfortunate Google policy of URLs-are-pages and the bajillion puke scraper sites that scrape search results, where both Yahoo and MSN display results without the trailing slash.
It's my experience that when a page gets a second URL only listing, it drops in the results, which would then end up penalizing pages without a file extension, particularly if they are popular and get scraped often.
These URL only links fade fairly quickly, but still it would be nice to see Google recognize and combine these with the canonical page, rather than seemingly demerit the canonical page.
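A sketch of the kind of combining being asked for: treat the slash and no-slash forms of an extensionless URL as the same page by normalizing to the trailing-slash form. The function name `with_trailing_slash` and the URL http://www.domain.com/widgets are made up for illustration, and the rule "no dot in the last path segment means a directory-style URL" is a simplifying assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url):
    """Append a trailing slash when the last path segment has no file extension."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # An empty last segment means the path already ends in a slash.
    if last_segment and "." not in last_segment:
        path += "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


print(with_trailing_slash("http://www.domain.com/widgets"))
print(with_trailing_slash("http://www.domain.com/widgets/"))
print(with_trailing_slash("http://www.domain.com/page1.html"))
```

With a rule like this, a scraped link to the no-slash form would map onto the existing listing instead of creating a second URL-only one.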
| 9:29 am on Apr 21, 2005 (gmt 0)|
I believe I may have been confused about the re-inclusion thing. My site does show up doing a site command and it does show up when I do a www.my-site.com with no commands. However, it shows up with no title or description.
I presume then that a re-inclusion will not do any good since the site is in the index. The problem I have is that it is no longer in the ranking which is the real underlying problem. And it is a site which was in the top 5 for a 3 million page plus keyword for years.
And yes, we did see other domains listed using the site command. Now they are gone, but everything is still listed as supplemental.
Do I wait? Do a re-inclusion? Can it hurt to do a re-inclusion?
| 10:12 am on Apr 21, 2005 (gmt 0)|
<Very few people used the url removal tool to take out their own sites, so I can try to gather some people into one group and ask someone if we can do anything on our end.>
Though I'm not among those who took out their sites by mistake, I wish to thank you on their behalf for taking care of this matter.
Very kind of you GG. Much appreciated.
| 12:07 pm on Apr 21, 2005 (gmt 0)|
Can someone tell me what canonicalization stands for? Sorry to be so dumb.
| 12:15 pm on Apr 21, 2005 (gmt 0)|
<Very few people used the url removal tool to take out their own sites, so I can try to gather some people into one group and ask someone if we can do anything on our end.>
Although this is not about redirection, I used it to remove some duplicate domain.com versions of my pages via noindex tags, including my home page. Unfortunately, as I now know, it removes the www versions as well.
It's been 20 days or so and despite being re-spidered the www pages have not reappeared.
The 'Remove Individual pages' section of the google help page does not have a footer, similar to 'Remove your website', indicating a 90 day period.
Does anybody know whether the 90 days / 6 months applies to individual pages removed using noindex tags?
| 12:24 pm on Apr 21, 2005 (gmt 0)|
Hi Union Jack
Hard to define (as I am not an expert)
Following thread may help - read GG posts:-
But basically as I understand it Google finds the main url of the site (which is normally the page with the highest page rank) and perhaps where the site crawl starts.
However, I think, sometimes the wrong url can be picked. (E.g. if you have the site on the non-www as well - or your homepage is something like www.domain.com/home.php?sid=122323 - or all your links point to another page and therefore that page is seen as the most important and hence the canonical url)
Not 100% sure though
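To make the session-id case concrete, here is a minimal sketch of how a site owner might reduce such URLs to one form before linking to them. It assumes the session parameter is called `sid`, as in the example above; `SESSION_PARAMS` and `strip_session_params` are names invented for this sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAMS = {"sid"}  # assumed session-id parameter names


def strip_session_params(url):
    """Drop session-id query parameters so one page maps to one URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_session_params("http://www.domain.com/home.php?sid=122323"))
print(strip_session_params("http://www.domain.com/home.php?sid=1&page=2"))
```

Every distinct sid value otherwise looks like a separate URL serving the same content, which is the duplicate situation discussed earlier in the thread.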
| 12:47 pm on Apr 21, 2005 (gmt 0)|
GG - Yeah I will send it in here in a little bit.