

Trying to fix duplicate content issues.

     
12:39 pm on Sep 6, 2006 (gmt 0)

5+ Year Member



My directory CGI script is producing duplicate content, but I have no idea how to block search engines from indexing the duplicates. Here is my problem: the script produces both of these URLs, and they lead to the same page. The longer URL is no longer linked anywhere within my site, but Google is still indexing it.

http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147

http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

Most of my Google traffic goes to the short "cid=147" version of the pages, so I want to keep those indexed. I don't want Google to index the "ct=category_widgets" pages. I can't add a noindex tag because the pages are produced dynamically and the tag would show up on both versions of the page. I can't use a normal robots.txt rule to block the CGI because that would block both pages as well.

Is there any other way to block these duplicate pages with robots.txt other than blocking the whole CGI? Are there any other solutions to this problem?

<edit reason: use example.com>

[edited by: tedster at 1:33 pm (utc) on Sep. 6, 2006]

2:39 pm on Sep 6, 2006 (gmt 0)

5+ Year Member



You should set up a 301 redirect from the longer URL to the shorter one if both represent the same page (the same content). I would use .htaccess redirects.
You can find some examples here:
[webmasterworld.com...]
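
For this particular pair of URLs, a one-off rule along these lines might work (a minimal sketch, assuming Apache with mod_rewrite enabled and an .htaccess file in the web root; the lv=2&ct=category_widgets to cid=147 mapping is taken from the URLs quoted above):

RewriteEngine On
# Redirect the long parameter form to the short cid form with a 301.
RewriteCond %{QUERY_STRING} ^lv=2&ct=category_widgets$
RewriteRule ^cgi-bin/pseek/dirs\.cgi$ http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147 [R=301,L]

Note that every long/short URL pair needs its own pair of lines, so this only makes sense for a handful of URLs; a redirect inside the script itself scales better.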

Best of luck,
Alex

7:38 pm on Sep 6, 2006 (gmt 0)

5+ Year Member



I read the other posting but still don't understand how to write the redirect. Can you or anyone give an example of how to write this?

I want to redirect

http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

to

http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147

10:03 pm on Sep 6, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Far easier is to modify the script so that it detects which URL was requested and simply adds a <meta name="robots" content="noindex"> tag to all versions that you do not want indexed.

Use the noindex tag on alternative URLs where there are parameter differences and use the 301 redirect where non-www URLs and/or alternative domains are the problem.
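
For example, something along these lines inside the script that serves the long URLs (a rough sketch only, assuming a Perl CGI like the one discussed later in this thread; the lv/ct/cid parameter names come from the URLs quoted above):

#!/usr/bin/perl
# Sketch: emit a noindex meta tag only when the page was requested via the
# long-form parameters (lv=.../ct=...) rather than the short cid=... form.
use strict;
use warnings;

my $query   = $ENV{QUERY_STRING} || '';
my $noindex = ($query =~ /(?:^|&)(?:lv|ct)=/)
            ? qq{<meta name="robots" content="noindex">\n}
            : '';

print "Content-type: text/html\n\n";
print "<html><head>\n";
print $noindex;
print "<title>Category page</title>\n";
print "</head><body>\n";
# ... the rest of the page exactly as the script already builds it ...
print "</body></html>\n";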

10:02 am on Sep 7, 2006 (gmt 0)

5+ Year Member



I need your help or I may be an unemployed webmaster soon. :-(
The problem is:
10 sites selling widgets in 10 different cities across Europe.
The info on the templates is different from site to site, although they describe items that are very similar; the titles and descriptions are original and unique.
But the code for all the templates is the same (hyperlinks, tables, pictures, the layout in general).
The websites share the same IP on a dedicated server.
The question: is there a possibility that Google may filter or penalize me for having identical code across those websites?
Any help would be very much appreciated!
12:14 pm on Sep 7, 2006 (gmt 0)

5+ Year Member



It should definitely be OK if the similarity is only in the CODE and not in the products/text content.
4:43 pm on Sep 7, 2006 (gmt 0)

5+ Year Member



I have already checked with the writers of my script and they told me there is no way to modify it to block the duplicate URLs.

Would it be possible to make a robots.txt file that only blocks URLs containing "dir.cgi", since that is the only major difference between the duplicate URLs? If so, how would I write it?

4:57 pm on Sep 7, 2006 (gmt 0)

5+ Year Member



Would this in my robots.txt work to block a specific CGI page?

Disallow: /dir.cgi/

4:58 pm on Sep 7, 2006 (gmt 0)

5+ Year Member



not 100% sure

User-Agent: *
Disallow: /dir.cgi/

5:41 pm on Sep 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



" I read the other posting but still don't understand how to write the redirect. Can you or anyone give a example of how to write this?

I want to redirect

http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

to

http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147 "

It can be done but you really don't want to invoke yet another script or have a very large .htaccess file.

Where it gets done is inside of dirs2.cgi.

You can do the redirect there, or manipulate the meta tags emitted by the script to include noindex, follow, nocache, etc.

What follows is the heart of a redirector; of course, you need to provide all of the logic to construct the URL that gets put into $location.


print <<"--end--";
Status: 301 Moved Permanently
Location: $location

--end--
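
Fleshing that out a little (a sketch only; the ct-to-cid lookup table here is hypothetical and would have to come from wherever the script already stores its category IDs):

#!/usr/bin/perl
# Sketch of a 301 redirector placed inside the script that answers the long URLs.
use strict;
use warnings;

# Hypothetical mapping from the old ct= value to the cid= number.
my %cid_for_category = ( category_widgets => 147 );

my ($ct) = ($ENV{QUERY_STRING} || '') =~ /(?:^|&)ct=([^&]+)/;

if ($ct && exists $cid_for_category{$ct}) {
    my $location = "http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=$cid_for_category{$ct}";
    print <<"--end--";
Status: 301 Moved Permanently
Location: $location

--end--
    exit;
}

# ... otherwise fall through to the script's normal page output.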

5:58 pm on Sep 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



User-Agent: *
Disallow: /cgi-bin/dirs2

This will result in Google not indexing any of the dirs2 URLs.

However, if Google finds a link to any /cgi-bin/dirs2 URL, it will create a URL-only listing in its index.

Please note that blocking the bot also blocks following any of the links within the blocked files.

Use with extreme caution: you can easily block things you really didn't want to, and you will lose part of your internal link structure and any external link credit (that is, if any other site linked to one of your dirs2 URLs) for IBLs to that dynamic page.

[edited by: theBear at 5:59 pm (utc) on Sep. 7, 2006]

6:51 pm on Sep 7, 2006 (gmt 0)

5+ Year Member



The one I want to block is "dirs.cgi". So if I use this robots.txt, it will block dirs.cgi and leave dirs2.cgi alone, correct?

User-Agent: *
Disallow: /cgi-bin/pseek/dirs.cgi

6:57 pm on Sep 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you want to block dirs.cgi, and Google doesn't stumble over the period when parsing, that should work.

I know what happens with the other one because I'm blocking several routines in some forum software that way.

Partial match blocking can have unintended fallout.

And that also means that the redirector goes inside the dirs.cgi script if you choose to go that route.

[edited by: theBear at 7:01 pm (utc) on Sep. 7, 2006]

6:59 pm on Sep 7, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



>> I have already checked with the writers of my script and they told me there is no way to modify it to block the duplicate URLs. <<

They are wrong. There is a way; it's just that they are too lazy to actually do it. A script can be modified to do anything you want it to.

Tell them that either they can do it or you can outsource the work, and maybe their attitude will change a tad.

7:02 pm on Sep 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



g1smd, I didn't want to say that but I agree.
9:00 pm on Sep 7, 2006 (gmt 0)

5+ Year Member



I kind of thought they were just being lazy, but I didn't know for sure. They just keep telling me to change over to the static version and then I won't have a duplicate content issue. I keep telling them that Google indexed the dynamic content years ago and that changing everything would have a devastating effect on my traffic. They tell me Google would pick up the new static URLs very fast and it would have little effect on my traffic, but I don't believe them.

I will have another chat with them and see if they can change it or place a noindex tag in the duplicate URLs. I paid good money for the script, and I don't think it should have had a duplicate content issue in the first place.

9:04 pm on Sep 7, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Google will pick up the new URLs within weeks, but the old URLs will count as yet more duplicate content. You really want to completely avoid having that happen.

You'll also lose all your backlink credit to the established URLs, and you will lose any "age" credited to the old URLs. The old URLs will also continue to appear as Supplemental Results for a full year after you make the move.

If they really understood how vitally important the footprint you leave in a search engine's index is to your rankings, they would not be arguing with you at all.

 
