
Google SEO News and Discussion Forum

    
Trying to fix duplicate content issues.
Northstar

5+ Year Member
Msg#: 3073436 posted 12:39 pm on Sep 6, 2006 (gmt 0)

My directory CGI script is producing duplicate content, and I have no idea how to block search engines from indexing the duplicates. Here is my problem: the script produces both of these URLs, which go to the same page. The longer URL version is no longer linked anywhere within my site, but Google is still indexing it.

http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147

http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

Most of my Google traffic goes to the short "cid=147" version of the pages, so I want to keep those indexed. I don't want Google to index the "ct=category_widgets" pages. I can't add a noindex tag because the pages are produced dynamically and the tag would show up on both versions of the page. I can't use the normal robots.txt approach of blocking the CGI script because that would block both pages as well.
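For example, the only blanket robots.txt rule I could use would be something like this, and it would shut out both versions at once:

User-agent: *
Disallow: /cgi-bin/pseek/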

Is there any other way to block this duplicate page with robots.txt other than blocking the whole CGI script? Are there any other solutions to this problem?

<edit reason: use example.com>

[edited by: tedster at 1:33 pm (utc) on Sep. 6, 2006]

 

cavendish

5+ Year Member
Msg#: 3073436 posted 2:39 pm on Sep 6, 2006 (gmt 0)

You should set up a 301 redirect from the longer URL to the shorter one if both represent the same page (the same content). I would use .htaccess redirects.
You can find some examples here:
[webmasterworld.com...]
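Because the two URLs differ only in their query strings, a plain Redirect line won't do it; you need mod_rewrite. A minimal sketch, assuming mod_rewrite is available and /cgi-bin/pseek/ sits inside your document root (each category would need its own rule, or better, a lookup inside the script itself):

RewriteEngine On
# Send the long dirs.cgi URL for this category to its short dirs2.cgi equivalent
RewriteCond %{QUERY_STRING} ^lv=2&ct=category_widgets$
RewriteRule ^cgi-bin/pseek/dirs\.cgi$ /cgi-bin/pseek/dirs2.cgi?cid=147 [R=301,L]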

Best of luck,
Alex

Northstar

5+ Year Member
Msg#: 3073436 posted 7:38 pm on Sep 6, 2006 (gmt 0)

I read the other posting but still don't understand how to write the redirect. Can you or anyone give an example of how to write this?

I want to redirect

http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

to

http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147

g1smd

WebmasterWorld Senior Member & Top Contributor of All Time, 10+ Year Member
Msg#: 3073436 posted 10:03 pm on Sep 6, 2006 (gmt 0)

Far easier is to modify the script so that it detects which URL was requested and simply adds a <meta name="robots" content="noindex"> tag to all versions that you do not want indexed.

Use the noindex tag on alternative URLs where there are parameter differences, and use the 301 redirect where non-www URLs and/or alternative domains are the problem.
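A minimal sketch of that idea in Perl, assuming the script prints its own HTTP header and <head> section (the exact test on the environment variables is an assumption based on the URLs quoted above):

# Emit a noindex tag only for the long dirs.cgi / ct= versions of a page.
my $script = $ENV{'SCRIPT_NAME'}  || '';
my $query  = $ENV{'QUERY_STRING'} || '';

print "Content-type: text/html\n\n";
print "<html><head>\n";
if ($script =~ /dirs\.cgi$/ && $query =~ /(?:^|&)ct=/) {
    print qq{<meta name="robots" content="noindex,follow">\n};
}
print "</head><body>\n";
# ... rest of the normal page output ...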

Alex70

5+ Year Member
Msg#: 3073436 posted 10:02 am on Sep 7, 2006 (gmt 0)

I need your help or I may be an unemployed webmaster soon. :-(
The problem is:
10 sites selling widgets in 10 different cities across Europe.
The info in the templates is different from site to site, although they describe items that are very similar; titles and descriptions are original and unique.
But the code for all the templates is the same (hyperlinks, tables, pictures, the layout in general).
The websites share the same IP on a dedicated server.
The question: is there a possibility that Google might filter or penalize me for having identical code across those websites?
Any help would be very much appreciated!

soulful house

5+ Year Member
Msg#: 3073436 posted 12:14 pm on Sep 7, 2006 (gmt 0)

It should definitely be OK if the similarity is only in the code and not in the products/text content.

Northstar

5+ Year Member
Msg#: 3073436 posted 4:43 pm on Sep 7, 2006 (gmt 0)

I have already checked with the writers of my script and they told me there is no way to modify it to block the duplicate URLs.

Would it be possible to make a robots.txt file that only blocks URLs containing "dir.cgi", since that is the only major difference between the duplicate URLs? If so, how would I write it?

Northstar

5+ Year Member
Msg#: 3073436 posted 4:57 pm on Sep 7, 2006 (gmt 0)

Would this in my robots.txt work to block a specific CGI page?

Disallow: /dir.cgi/

Alex70

5+ Year Member
Msg#: 3073436 posted 4:58 pm on Sep 7, 2006 (gmt 0)

Not 100% sure, but something like this:

User-Agent: *
Disallow: /dir.cgi/

theBear

WebmasterWorld Senior Member 10+ Year Member
Msg#: 3073436 posted 5:41 pm on Sep 7, 2006 (gmt 0)

" I read the other posting but still don't understand how to write the redirect. Can you or anyone give a example of how to write this?

I want to redirect

http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

to

http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147 "

It can be done, but you really don't want to invoke yet another script or maintain a very large .htaccess file.

Where it gets done is inside the CGI script that serves the long URLs (dirs.cgi).

You can do the redirect there, or manipulate the meta tags emitted by the script to include noindex, follow, nocache, etc.

What follows is the heart of a redirector; of course, you need to provide all of the logic to construct the URL that gets put into $location.


print <<"--end--";
Status: 301 Moved Permanently
Location: $location

--end--
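For instance, the surrounding logic might look something like this; the %cid_for mapping and the parameter parsing are assumptions based on the two URLs quoted above, not part of the real script:

# Map category names from the long URLs to the cid values used by the short URLs.
my %cid_for = ( 'category_widgets' => 147 );   # assumed mapping, for illustration only

my ($ct) = ( ($ENV{'QUERY_STRING'} || '') =~ /(?:^|&)ct=([^&]+)/ );

if (defined $ct && exists $cid_for{$ct}) {
    my $location = "http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=$cid_for{$ct}";
    print "Status: 301 Moved Permanently\nLocation: $location\n\n";
    exit;
}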

theBear

WebmasterWorld Senior Member 10+ Year Member
Msg#: 3073436 posted 5:58 pm on Sep 7, 2006 (gmt 0)

User-Agent: *
Disallow: /cgi-bin/dirs2

This will result in Google not crawling, and therefore not indexing the content of, any of the dirs2 URLs.

However, if Google finds a link to any /cgi-bin/dirs2 URL, it will still create a URL-only listing in its index.

Please note that blocking the bot also stops it from following any of the links within the blocked pages.

Use with extreme caution: you can easily block things you really didn't want to block, and you will lose part of your internal link structure as well as any external link credit for IBLs to those dynamic pages (that is, if any other site links to your dirs2 URLs).

[edited by: theBear at 5:59 pm (utc) on Sep. 7, 2006]

Northstar

5+ Year Member
Msg#: 3073436 posted 6:51 pm on Sep 7, 2006 (gmt 0)

The one I want to block is "dirs.cgi". So if I use this robots.txt, it will block dirs.cgi and leave dirs2.cgi alone, correct?

User-Agent: *
Disallow: /cgi-bin/pseek/dirs.cgi

theBear

WebmasterWorld Senior Member 10+ Year Member
Msg#: 3073436 posted 6:57 pm on Sep 7, 2006 (gmt 0)

If you want to block dirs.cgi, and Google doesn't stumble on the period when parsing the rule, that should work.

I know what happens with the other one because I'm blocking several routines in some forum software that way.

Partial match blocking can have unintended fallout.
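For example (the extra filename here is purely hypothetical), a rule such as:

User-Agent: *
Disallow: /cgi-bin/pseek/dirs.cgi

blocks /cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets as intended, but it would also block something like /cgi-bin/pseek/dirs.cgi-old if that existed, because a Disallow line is a simple prefix match against the URL.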

And that also means that the redirector goes inside the dirs.cgi script if you choose to go that route.

[edited by: theBear at 7:01 pm (utc) on Sep. 7, 2006]

g1smd

WebmasterWorld Senior Member & Top Contributor of All Time, 10+ Year Member
Msg#: 3073436 posted 6:59 pm on Sep 7, 2006 (gmt 0)

>> I have already checked with the writers of my script and they told me there is no way to modify it to block the duplicate URLs. <<

They are wrong. There is a way; it's just that they are too lazy to actually do it. A script can be modified to do anything you want it to.

Tell them that either they can do it or you can outsource the work; maybe their attitude will change a tad.

theBear

WebmasterWorld Senior Member 10+ Year Member
Msg#: 3073436 posted 7:02 pm on Sep 7, 2006 (gmt 0)

g1smd, I didn't want to say that but I agree.

Northstar

5+ Year Member
Msg#: 3073436 posted 9:00 pm on Sep 7, 2006 (gmt 0)

I kind of thought they were just being lazy, but I didn't know for sure. They just keep telling me to change over to the static version and then I won't have a duplicate content issue. I keep telling them that Google indexed the dynamic content years ago, and changing everything would have a devastating effect on my traffic. They tell me Google would pick up the new static URLs very fast and it would have little effect on my traffic, but I don't believe them.

I will have another chat with them and see if they can change it or place a noindex tag on the duplicate URLs. I paid good money for the script, and I don't think it should have had a duplicate content issue in the first place.

g1smd

WebmasterWorld Senior Member & Top Contributor of All Time, 10+ Year Member
Msg#: 3073436 posted 9:04 pm on Sep 7, 2006 (gmt 0)

Google will pick up the new URLs within weeks, but the old URLs will then count as yet more duplicate content. You really want to avoid having that happen at all.

You'll also lose all of the backlink credit attached to your established URLs, and you will lose any "age" credited to the old URLs. The old URLs will also continue to appear as Supplemental Results for a full year after you make the move.

If they really understood how vitally important the footprint you leave in the search engines' index is to your rankings, they would not be arguing with you at all.
