Forum Moderators: phranque


Internal Link Condoms

How are you addressing duplicate content?


pageoneresults

4:15 pm on Aug 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For those of you working with dynamic sites, how are you dealing with the possible indexing of duplicate content within your own site?

For example, you have provided the ability for the visitor to sort columns of information. Are those sort links spiderable?

lammert

1:36 am on Aug 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I generate meta robots tags on the fly in my PHP and SSI scripts. Only the version I want to get indexed has "index,follow,noarchive" in the header; all the others get "noindex,follow".
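A minimal sketch of that approach in PHP. The parameter name `sort` is an assumption for illustration; any URL parameter that creates a duplicate view of the same page would be handled the same way:

```php
<?php
// Generate a meta robots tag on the fly: only the canonical view
// (no sort parameter in the query string) is allowed to be indexed;
// every sorted duplicate gets "noindex,follow" so spiders still
// follow its links but do not store the page.
function meta_robots_tag(array $query): string {
    // 'sort' is an assumed name for the duplicate-creating parameter.
    $is_canonical = !isset($query['sort']);
    $content = $is_canonical ? 'index,follow,noarchive' : 'noindex,follow';
    return '<meta name="robots" content="' . $content . '">';
}
```

In the page template you would then emit `echo meta_robots_tag($_GET);` inside the `<head>` section.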

pageoneresults

3:01 am on Aug 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I generate meta robots tags on the fly in my PHP and SSI scripts.

Great idea, lammert. But what if I don't have that type of technical capability? What other options do I have available to me?

balam

3:08 am on Aug 15, 2006 (gmt 0)

10+ Year Member



> that type of technical capability

Is it your personal skill set you refer to, or technical limitations with regard to hosting?

Or put another way, how is the site dynamic in the first place?

pageoneresults

3:44 am on Aug 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is it your personal skill set you refer to, or technical limitations with regard to hosting?

It could be both. What options are available to me in the above scenario? Let's say that I cannot generate the metadata. Let's say that I cannot do something at the server level. What are my options?

lammert

11:41 pm on Aug 16, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Let's say that I cannot do something at the server level. What are my options?

Is this an academic question, or a real-life problem?

There are basically two approaches to fighting the duplicate content problem with dynamic websites. One approach is to rewrite all URLs to static versions without parameters and to make sure that every part of the website uses only one version of the URL for a given page. For your example, where the sort order of a table must be changed, a cookie can be used that stores the preferred sort order per visitor.
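A sketch of the cookie approach in PHP (the column names, cookie name, and form field are illustrative assumptions). Because the sort preference travels in a cookie rather than the URL, every sort order shares one spiderable URL:

```php
<?php
// Resolve the visitor's preferred sort order from a posted form value
// or a previously set cookie, falling back to a default. The sort
// preference never appears in the URL, so there is only one URL per
// page for search engines to find.
function resolve_sort(?string $posted, ?string $cookie): string {
    $allowed = ['name', 'price', 'date'];   // assumed sortable columns
    if ($posted !== null && in_array($posted, $allowed, true)) {
        return $posted;                      // whitelisted, safe for ORDER BY
    }
    if ($cookie !== null && in_array($cookie, $allowed, true)) {
        return $cookie;
    }
    return 'name';                           // default order
}

// In the page itself (hypothetical field and cookie names):
//   $sort = resolve_sort($_POST['sort'] ?? null, $_COOKIE['sort_order'] ?? null);
//   setcookie('sort_order', $sort, time() + 86400 * 30, '/');
```

The whitelist doubles as protection against unsafe values ending up in an `ORDER BY` clause.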

The other approach is to accept that the site is accessible via multiple URLs, but make the site immune to it. Adding live meta tags as I do is one possibility, but you could also use Google's wildcard extension to robots.txt. Wildcards in robots.txt are, however, not widely supported, so that solution will only work for one search engine.
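The robots.txt wildcard rule might look like this, assuming the sorted views are the only duplicates and are distinguished by a `sort` parameter (an assumed name). Note that this is Google's extension, not part of the original robots.txt convention:

```
User-agent: Googlebot
# Block any URL whose query string carries the (assumed) sort parameter
Disallow: /*sort=
```

Other spiders that do not understand the wildcard would simply ignore or misread the rule, which is why this only covers one search engine.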

If you have neither access to the page source code nor the ability to add a sophisticated robots.txt, the remaining options are rather limited. Googlebot in particular has a habit of following paths it shouldn't, so eventually it will find the duplicate URLs. Google claims that there is "almost nothing a competitor can do" to hurt a website in their index, so once Google has found the duplicates and stored them, you are out of luck removing them.