Today I did a routine Google search to see whether any recent copycats had taken content from my site and posted it as their own. It turns out Google returned quite a list of sites. I was shocked - and then rather speechless when I noticed they were my OWN domains!
The fact is that I own at least two dozen unused domains that I might use later, basically when I expand my business to blue and red widgets, as well as yellow widgets with purple dots. But it seems that, due to a misconfiguration in httpd.conf (where my green widgets site is the first entry in the virtual hosts list), they all went live with the content of my green widgets site.
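For anyone wondering why the first virtual host matters: with Apache's name-based virtual hosting, a request goes to the vhost whose ServerName or ServerAlias matches the Host header, and if nothing matches, the first vhost listed for that IP and port is used. The Python sketch below models only that rule, assuming exact-match names (no wildcard aliases) and a single IP:port. The domain names, document roots and the `catch-all.invalid` placeholder are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualHost:
    server_name: str
    aliases: List[str] = field(default_factory=list)
    document_root: str = ""


def select_vhost(vhosts: List[VirtualHost], host_header: str) -> VirtualHost:
    """Return the vhost a simplified Apache would pick on one IP:port.

    An exact ServerName/ServerAlias match wins; if nothing matches,
    the first vhost in configuration order acts as the default.
    Wildcard aliases are deliberately not modelled here.
    """
    host = host_header.split(":")[0].strip().lower()  # ignore an optional :port
    for vh in vhosts:
        names = {vh.server_name.lower(), *(a.lower() for a in vh.aliases)}
        if host in names:
            return vh
    return vhosts[0]  # no match: fall through to the first-listed vhost


# Hypothetical config: only the live site has a vhost, so it is also the default.
live_only = [
    VirtualHost("www.green-widgets.example", ["green-widgets.example"], "/var/www/green"),
]

# Same config with a hypothetical placeholder vhost listed first.
with_catch_all = [
    VirtualHost("catch-all.invalid", [], "/var/www/placeholder"),
] + live_only

print(select_vhost(live_only, "red-widgets.example").document_root)
print(select_vhost(with_catch_all, "red-widgets.example").document_root)
print(select_vhost(with_catch_all, "www.green-widgets.example").document_root)
```

In this model, a parked domain that reaches Apache gets the green widgets content in the first setup and the placeholder in the second, while the real site still matches by name. Putting a dedicated default vhost (serving, say, a 404 or a "coming soon" page) at the top of the list is one common way to keep stray domains from mirroring the main site.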
They're obviously not live anymore. Maybe httpd.conf wasn't the only cause, but I'm assuming the problem was fixed when I moved to a different host nearly two months ago. All those pages are still in the Google index, though, and I'm sure they will stay there for quite some time.
So here's what I'm wondering: can this unfortunate episode hurt the Google rankings of my Green Widgets site? I know that what one site does should never harm another site, and I've heard that when there's duplicate content, the duplicate pages with lower authority/PR are simply dropped. But when all the domains serving the duplicate content clearly reside on the same server, might that be another story?
But that's not all - I'm worried that all the domains that leaked out with the wrong content now carry a penalty that could weigh heavily when I finally decide to launch one of them with the content it was meant for. Is that worry warranted?
Finally - what would you do in such a situation? Hit yourself with a brick? Write a friendly letter to Google? Not worry about it?
How come Google sees more than I can? I really don't get it.
Like I said, I didn't purposely put up multiple domains serving the same content. I don't even know how Google indexed them with the same content in the first place. And since I don't know how it happened, I'm afraid I also don't know how to avoid it. Here's the problem.
Sure, the Green Widgets site is the first virtual host in my httpd.conf, so Apache would send all unknown domains in that direction. But Apache only listens locally on my server. A lightweight TUX server listens on port 80, and it's technically impossible for it to forward any domain to Apache that isn't hardwired into its configuration (and the domains I'm having problems with certainly aren't).
So how on Earth did Google pull this off? I'm afraid I don't even have a conspiracy theory to offer, I'm simply stumped.
If the Google cache date says March 7, does that definitely mean Googlebot visited that day and found the cached content at that URL? Or is there an alternative explanation (for example, Googlebot visited that day, found nothing, and kept the OLD cache)? A couple of months ago I had a different configuration for a day or two, with Apache listening on port 80, so if Googlebot came by then, I'd understand. But two weeks ago? That's really quite impossible.
As for the cache date, it should be the date of the last spidering, whether the content was actually retrieved or Googlebot just got a no-change response (304 Not Modified) to an If-Modified-Since request. But Google has been known to have data glitches, too, so who can say for 100% sure, eh?
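To illustrate the "no-change response" mentioned above, here is a minimal Python sketch of the server side of an If-Modified-Since check: if the page hasn't changed since the date the crawler sends, the server answers 304 with no body, so a crawl can happen without any content being re-downloaded. The function name and example timestamps are hypothetical, and nothing here shows how Google itself records cache dates.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the page is unchanged since the client's date, else 200.

    `last_modified` must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200  # plain GET: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat naive dates as GMT
    # HTTP dates have one-second resolution, so compare at that granularity.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: no body, the client keeps its old copy
    return 200


# Hypothetical example: a page last changed at this (made-up) time.
page_changed = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
crawler_header = format_datetime(page_changed, usegmt=True)  # e.g. sent on a recrawl

print(conditional_status(page_changed, crawler_header))
print(conditional_status(page_changed, "Sun, 14 Jan 2024 12:00:00 GMT"))
```

The first call models a recrawl of an unchanged page (304) and the second a crawler whose copy is a day older than the page (200).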