Forum Moderators: Robert Charlton & goodroi

Potential Duplicate urls?

         

dane120

11:23 am on Nov 12, 2007 (gmt 0)

10+ Year Member



After doing some fairly basic SEO on my site I am finding a strange occurrence on Google.

I now have all the urls of my site indexed within Google, however after doing a site search I have noticed that our affiliate tracking urls are being indexed.

The affiliate links point directly to my homepage, so Google has indexed both

www.example.com

and

www.example.com/?CAMPAIGN=widgets&KEYWORDS=widgets_text

Both of these pages contain exactly the same content. Will this hurt my homepage rankings because it looks like spam, and if so, how do I remove the affiliate url from Google?

[edited by: pageoneresults at 1:46 pm (utc) on Nov. 12, 2007]
[edit reason] Examplified URI References [/edit]

tedster

6:34 pm on Nov 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, that is potentially a duplicate url situation. You should block indexing of those alternate urls in some way - using a robots.txt disallow rule is probably the simplest approach.

Some people write a 301 redirect rule that removes the tracking information from the url, but they first need to be sure their analytics still picks up the original url requested for tracking purposes. The actual technical details are up to you, but definitely keep those urls out of the Google index.
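As a rough illustration of the 301 approach, here is a minimal Python sketch that strips the tracking parameters from a requested URL and returns the clean redirect target together with the removed values, so they can be logged before the redirect is sent. The parameter names CAMPAIGN and KEYWORDS are taken from the example URL in this thread; the function names are hypothetical, and how the redirect is actually issued depends on your server or framework.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these are the tracking parameters, as in the thread's example URL.
TRACKING_PARAMS = {"CAMPAIGN", "KEYWORDS"}


def strip_tracking(url):
    """Return (clean_url, tracking), where tracking holds the removed parameters."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    tracking = {k: v for k, v in pairs if k in TRACKING_PARAMS}
    kept = [(k, v) for k, v in pairs if k not in TRACKING_PARAMS]
    clean = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return clean, tracking


def redirect_for(url):
    """Return (status, location, tracking), or None when no redirect is needed."""
    clean, tracking = strip_tracking(url)
    if not tracking:
        return None
    # Hypothetical step: record `tracking` in your analytics here,
    # before the 301 response is sent, so the affiliate credit is not lost.
    return 301, clean, tracking
```

Non-tracking parameters are left in place, so a URL such as /page?id=3&CAMPAIGN=x would redirect to /page?id=3 rather than losing the id.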

jd01

6:48 pm on Nov 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Another method is to have the script used for tracking affiliates insert a 'noindex' meta tag when an affiliate URL is detected.
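A minimal Python sketch of that idea, assuming the affiliate URLs can be recognised by the CAMPAIGN or KEYWORDS parameters from the example above. The function names are hypothetical, and the returned tag would need to be written into the page's head section by whatever templating the site uses.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: an affiliate URL is any URL carrying one of these parameters.
AFFILIATE_PARAMS = {"CAMPAIGN", "KEYWORDS"}


def is_affiliate_url(url):
    """True when the query string contains any affiliate tracking parameter."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name in query for name in AFFILIATE_PARAMS)


def robots_meta_tag(url):
    """Return the meta tag to place in <head>, or an empty string."""
    if is_affiliate_url(url):
        return '<meta name="robots" content="noindex">'
    return ""
```

Note that a crawler has to fetch the page to see the noindex tag, so this method should not be combined with a robots.txt disallow on the same urls.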

Justin

dane120

10:21 am on Nov 13, 2007 (gmt 0)

10+ Year Member



Thanks for the help.

One further question: what is the correct syntax for blocking query strings in the robots.txt file?

tedster

4:46 pm on Nov 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To block Googlebot from crawling any URL that includes a ? (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):

User-agent: Googlebot
Disallow: /*?

[google.com...]

The robots.txt standard has not officially added wildcard pattern matching, but Google and Yahoo both support it.
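To see which urls a rule like Disallow: /*? would catch, here is a small Python sketch of wildcard matching as the search engines describe it: * stands for any sequence of characters, and a trailing $ anchors the end of the URL. The function name rule_matches is hypothetical, and this only tests a single pattern against a path; it is not a full robots.txt parser and may not reproduce every detail of how Googlebot evaluates rules.

```python
import re


def rule_matches(pattern, path):
    """Check a URL path (including any query string) against one Disallow pattern."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "match anything".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # Rules match from the start of the path, so re.match is the right call.
    return re.match(regex, path) is not None


# The affiliate url from this thread is blocked; the plain homepage is not.
print(rule_matches("/*?", "/?CAMPAIGN=widgets&KEYWORDS=widgets_text"))
print(rule_matches("/*?", "/"))
```

The first call reports a match and the second does not, which is the behaviour wanted here: the tracking urls are kept away from the crawler while the homepage itself stays crawlable.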