Trying to fix duplicate content issues.

     
12:39 pm on Sep 6, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:July 6, 2006
posts:132
votes: 0


My directory cgi script is producing duplicate content and I have no idea how to block search engines from indexing the duplicates. Here is my problem: the script produces both of these URLs, which go to the same page. The longer URL version is no longer linked to anywhere within my site, but Google is still indexing it.

http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147

http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

Most of my Google traffic goes to the short "cid=147" version of the pages, so I want to keep those indexed. I don't want Google to index the "ct=category_widgets" pages. I can't add a noindex tag because the pages are produced dynamically and the tag would show up on both versions of the page. I can't use the normal robots.txt approach of blocking the cgi script because that would block both pages as well.

Is there any other way to block the duplicate pages with robots.txt other than blocking the whole cgi script? Are there any other solutions to this problem?

<edit reason: use example.com>

[edited by: tedster at 1:33 pm (utc) on Sep. 6, 2006]

2:39 pm on Sept 6, 2006 (gmt 0)

New User

5+ Year Member

joined:July 1, 2006
posts:15
votes: 0


You should set up a 301 redirect from the longer URL to the shorter one, since both represent the same page (the same content). I would use .htaccess redirects.
You can find some examples here:
[webmasterworld.com...]
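Something along these lines might do it -- an untested sketch only, assuming mod_rewrite is available and /cgi-bin is served from under your document root (if it's an Alias, the rule would have to go in the server config instead). The lv/ct pair and the cid it maps to are just an illustration; you'd need one rule (or a RewriteMap) per category:

# Untested sketch: map one example long URL onto its short equivalent.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^lv=2&ct=category_widgets$
RewriteRule ^cgi-bin/pseek/dirs\.cgi$ /cgi-bin/pseek/dirs2.cgi?cid=147 [R=301,L]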

Best of luck,
Alex

7:38 pm on Sept 6, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:July 6, 2006
posts:132
votes: 0


I read the other posting but still don't understand how to write the redirect. Can you or anyone give an example of how to write this?

I want to redirect

http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

to

http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147

10:03 pm on Sept 6, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Far easier is to modify the script so that it detects what URL was requested and simply adds a <meta name="robots" content="noindex"> tag to all versions that you do not want indexed.

Use the noindex tag on alternative URLs where there are parameter differences and use the 301 redirect where non-www URLs and/or alternative domains are the problem.
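
For example, the idea in Perl would be something along these lines -- a rough sketch only, with made-up variable names; the check belongs wherever the script prints its <head> section:

# Rough sketch: emit a noindex tag only for the duplicate "ct=..." URLs.
my $query = $ENV{'QUERY_STRING'} || '';
if ($query =~ /(?:^|&)ct=/) {
    print qq{<meta name="robots" content="noindex,follow">\n};
}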

10:02 am on Sept 7, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 1, 2005
posts:137
votes: 0


I need your help or I may be an unemployed webmaster soon. :-(
The problem is:
10 sites selling widgets in 10 different cities across Europe.
The info on the templates is different from site to site, although they describe items that are very similar; titles and descriptions are original and unique.
But the code for all the templates is the same (hyperlinks, tables, pictures, the layout in general).
The websites share the same IP on a dedicated server.
The question: is there a possibility that Google might filter or penalize me for having identical code across those websites?
Any help would be very much appreciated!
12:14 pm on Sept 7, 2006 (gmt 0)

New User

5+ Year Member

joined:July 24, 2006
posts:13
votes: 0


It should definitely be OK if the similarity is only in the CODE and not in the products/text content.
4:43 pm on Sept 7, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:July 6, 2006
posts:132
votes: 0


I have already checked with the writers of my script and they told me there is no way to modify it to block the duplicate URLs.

Would it be possible to make a robots.txt file that only blocks URLs containing "dir.cgi", since that is the only major difference between the duplicate URLs? If so, how would I write it?

4:57 pm on Sept 7, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:July 6, 2006
posts:132
votes: 0


Would this in my robots.txt work to block a specific cgi page?

Disallow: /dir.cgi/

4:58 pm on Sept 7, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 1, 2005
posts:137
votes: 0


Not 100% sure, but:

User-Agent: *
Disallow: /dir.cgi/

5:41 pm on Sept 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


" I read the other posting but still don't understand how to write the redirect. Can you or anyone give a example of how to write this?

I want to redirect

http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

to

http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147 "

It can be done, but you really don't want to invoke yet another script or have a very large .htaccess file.

Where it gets done is inside of dirs.cgi, the script that serves the long URLs.

You can do the redirect there, or manipulate the meta tags emitted by the script to include noindex,follow,nocache, etc.

What follows is the heart of a redirector; of course, you need to provide all of the logic to construct the URL that gets put into $location.


# Emit the 301 headers from the CGI script. The blank line before the
# terminator ends the header block; $location must hold the full target URL.
print <<"--end--";
Status: 301 Moved Permanently
Location: $location

--end--
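
For illustration only, the $location inside dirs.cgi might be built something like this -- the category-to-cid table here is made up, and the real mapping has to come from your script's own data:

# Illustrative only: map the long-URL category back to its cid number.
my %cid_for = ( category_widgets => 147 );   # made-up mapping
my ($ct) = (($ENV{'QUERY_STRING'} || '') =~ /(?:^|&)ct=([^&]+)/);
my $location;
$location = "http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=$cid_for{$ct}"
    if defined $ct && exists $cid_for{$ct};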

5:58 pm on Sept 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


User-Agent: *
Disallow: /cgi-bin/dirs2

That will result in Google not indexing anything served via the dirs2 URLs.

However, if Google finds a link to any /cgi-bin/dirs2 URL, it will still create a URL-only listing in its index.

Please note that blocking the bot also stops it from following any of the links within the blocked pages.

Use with extreme caution: you can easily block things you really didn't want to, and you will lose part of your internal link structure and any external link credit (that is, if any other site linked to one of your dirs2 URLs) for IBLs to those dynamic pages.

[edited by: theBear at 5:59 pm (utc) on Sep. 7, 2006]

6:51 pm on Sept 7, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:July 6, 2006
posts:132
votes: 0


The one I want to block is "dirs.cgi". So if I use this in robots.txt it will block dirs.cgi and leave dirs2.cgi alone, correct?

User-Agent: *
Disallow: /cgi-bin/pseek/dirs.cgi

6:57 pm on Sept 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


If you want to block dirs.cgi, and Google doesn't stumble on the period when parsing, that should work.

I know what happens with the other one because I'm blocking several routines in some forum software that way.

Partial match blocking can have unintended fallout.

And that also means that the redirector goes inside the dirs.cgi script if you choose to go that route.

[edited by: theBear at 7:01 pm (utc) on Sep. 7, 2006]

6:59 pm on Sept 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


>> I have already checked with the writers of my script and they told me there is no way to modify it to block the duplicate URLs. <<

They are wrong. There is a way; it's just that they are too lazy to actually do it. A script can be modified to do anything you want it to.

Tell them that either they do it or you will outsource the work, and maybe their attitude will change a tad.

7:02 pm on Sept 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


g1smd, I didn't want to say that but I agree.
9:00 pm on Sept 7, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:July 6, 2006
posts:132
votes: 0


I kind of thought they were just being lazy, but I didn't know for sure. They just keep telling me to change over to the static version and I won't have a duplicate content issue. I keep telling them that Google indexed the dynamic content years ago and changing everything would have a devastating effect on my traffic. They tell me Google would pick up the new static URLs very fast and it would have little effect on my traffic, but I don't believe them.

I will have another chat with them and see if they can change it or place a noindex tag in the duplicate URLs. I paid good money for the script, and I don't think it should have had a duplicate content issue in the first place.

9:04 pm on Sept 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Google will pick up the new URLs within weeks, but the old URLs will count as yet more duplicate content. You really want to avoid having that happen.

You'll also lose all the backlink credit to your established URLs, and you will lose any "age" credit attached to the old URLs. The old URLs will also continue to appear as Supplemental Results for a full year after you make the move.

If they really understood how vitally important the footprint you leave in a search engine's index is to your rankings, they would not be arguing with you at all.