Forum Moderators: Robert Charlton & goodroi
It makes it look like I have duplicate content.
So if my page's URL is http://www.example.com/page1.html
And I search for that page with this:
site:http://www.example.com/page1.html
I get results like this:
http://www.example.com/page1.html?ref=whatever.org
http://www.test.com/page1.html?ref=spammer.net
and so on...
It appears someone is duplicating my site and getting spidered, which is hurting me.
marc
[edited by: tedster at 10:06 pm (utc) on Dec. 9, 2008]
[edit reason] switch to example.com - it can never be owned [/edit]
Now when someone visits http://www.example.com/test1.html?ref=whatever.org, it just pops them to the correct URL.
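The redirect rule described here amounts to roughly the following. This is only a minimal Python sketch of the logic, not the actual server configuration used on the site; the function names `strip_ref` and `redirect_for` are hypothetical, and it assumes `ref` is the only unwanted query parameter.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_ref(url, param="ref"):
    """Return url with the given query parameter removed (others are kept)."""
    parts = urlsplit(url)
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def redirect_for(url, param="ref"):
    """Return (301, clean_url) if url carries the parameter, else None."""
    query = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    if any(k == param for k, _ in query):
        # Permanent redirect to the same page without the ref= tag.
        return 301, strip_ref(url, param)
    return None  # URL is already clean; serve the page normally.
```

For example, `redirect_for("http://www.example.com/test1.html?ref=whatever.org")` gives a 301 pointing at `http://www.example.com/test1.html`, while a URL without `ref=` is left alone.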
However, Google has LOADS of these pages indexed... any search for one of my subpages returns up to 3 variations of this per page.
Would I be wrong to assume that the sudden influx of these over the past 5 months caused my slow downturn in the SERPs? Does Google treat these as duplicates?
I sent Google a reinclusion request, hoping to have this corrected, but it hasn't helped thus far.
Should I painstakingly try to submit each of these for removal from Google's SERPs?
Thanks for the help.
marc
Actually, I did this two months ago... However, Google has LOADS of these pages indexed
Have you verified that your server actually returns a 301 status in the HTTP header? Google would normally have dropped those URLs by now if it's really a 301 status rather than some other kind of redirect.
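One way to check this yourself, rather than relying on a browser (which follows redirects silently), is to request the URL once and look at the raw status line. The sketch below is an illustration using Python's standard `http.client`, which does not follow redirects; the helper names `fetch_status` and `is_clean_301` are made up for this example, and some servers answer HEAD differently from GET, so treat the result as a first check only.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Send one HEAD request, without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_clean_301(status, location, expected):
    """True only for a 301 whose Location is exactly the canonical URL."""
    return status == 301 and location == expected
```

Calling `fetch_status("http://www.example.com/test1.html?ref=whatever.org")` and passing the result to `is_clean_301(status, location, "http://www.example.com/test1.html")` separates a true 301 from a 302 or from a 200 page that redirects with a meta refresh or JavaScript.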
Should I painstakingly try to submit each of these for removal from Google's SERPs?
I'd say no - that's not compatible with the 301 redirect approach, since a removal request would need a robots.txt disallow rule or a robots meta tag on those URLs. And switching strategies would lose any backlink juice that is there.
It appears someone is duplicating my site and getting spidered, which is hurting me.
They don't need to duplicate your content. I've seen this kind of thing a bit, and it's often related to a kind of log spamming. The other site just adds the query string to the end of a URL on your site to identify their domain as the source of the referral. By any chance are your logs open to public view?
You could also do a search like [site:spammer.net keyword string] to see if your content is also being served on that domain. But what you've described is just creating duplicate URLs on your domain.
Yes, I'm seeing the following when checking headers with an external tool:
HTTP Status Code: HTTP/1.1 301 Moved Permanently
In case it means anything, when you view the cached page for one of the offending URLs within Google, it goes all the way back to September... these are quite old. One of my subpages has 4 different ref= instances... all of them show a timestamp of a different day within September.
No, running this query does not find anything relevant:
[site:spammer.net keyword string]
And lastly, there are no public log files available. I've never used them.
One other thing of note... I started using a sitemap 3 weeks ago... hopefully that will help Google figure out what is important.
Oh, and I almost forgot... a couple of months ago, when I first figured this out, Google Webmaster Tools was reporting "multiple duplicate title tags", which revealed these ref= spammers... That disappeared from WMT quite a while ago... I just wish they would remove those pages from the SERPs and restore my rankings to the way they used to be before this started!
Thanks again for thinking about this... I'll ask around and find the proper location to send your Xmas gift :-)
marc
[edited by: Robert_Charlton at 2:34 am (utc) on Dec. 10, 2008]
It looks like all those extra URLs have been dropped into the Supplemental Index. Leave them alone now. Google will quietly drop them in their own timescale. Where they show in the SERPs they will bring traffic. Over time, they will drop out, often in large blocks, a month or two apart.