Most of my Google traffic goes to the short version "cid=147" pages, so I want to keep those indexed. I don't want Google to index the "ct=category_widgets" pages. I can't add a noindex tag, because the pages are produced dynamically and the tag would show up on both versions of the page. I can't use a normal robots.txt rule to block the cgi, because that would block both pages as well.
Is there any other way to block these duplicate pages with robots.txt other than blocking the whole cgi? Are there any other solutions to this problem?
Best of luck,
Use the noindex tag on alternative URLs where there are parameter differences and use the 301 redirect where non-www URLs and/or alternative domains are the problem.
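For instance (a minimal sketch; example.com stands in for the real domain, and the rewrite rules assume Apache with mod_rewrite enabled): the meta tag goes in the <head> of the alternative URL,

    <meta name="robots" content="noindex,follow">

and the non-www 301 can go in .htaccess:

    RewriteEngine On
    # send non-www requests to the www hostname with a 301
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]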
Would it be possible to make a robots.txt file that only blocks URLs containing "dir.cgi", since that is the only major difference in the duplicate URLs? If so, how would I write it?
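One possibility worth checking: Googlebot recognizes a non-standard wildcard extension to robots.txt, so a pattern along these lines should match any URL containing dir.cgi. Other spiders may ignore the wildcard, so test it before relying on it:

    User-agent: Googlebot
    # the * wildcard matches anything before "dir.cgi" in the path
    Disallow: /*dir.cgi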
I want to redirect the duplicate URLs to the short versions.
It can be done, but you really don't want to invoke yet another script or have a very large .htaccess file.
Where it gets done is inside dirs2.cgi.
You can do the redirect there, or manipulate the meta tags emitted by the script to include noindex,follow,nocache, and so on.
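A rough sketch of the meta tag option (the 'ct' parameter test is a placeholder for however your script detects the duplicate view):

    use CGI qw(param);

    # emit a noindex tag only on the duplicate "ct=..." view
    if ( param('ct') ) {
        print qq{<meta name="robots" content="noindex,follow">\n};
    }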
What follows is the heart of a redirector; of course, you need to provide all of the logic that constructs the URL placed into $location.
print "Status: 301 Moved Permanently\n";
print "Location: $location\n\n";
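For the robots.txt route being weighed against that, the block would look something like this (assuming every duplicate URL starts with /cgi-bin/dirs2):

    User-agent: *
    # prefix match: blocks every URL starting with /cgi-bin/dirs2
    Disallow: /cgi-bin/dirs2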
A robots.txt block like that will result in Google not indexing anything under /cgi-bin/dirs2.
However, if Google finds a link to any /cgi-bin/dirs2 URL, it will still create a URL-only listing in its index.
Please note that blocking the bot also prevents it from following any of the links within the blocked files.
Use with extreme caution: you can easily block things you really didn't want to, and you will lose part of your internal link structure as well as any external link credit for IBLs to the blocked dynamic pages (that is, if any other site linked to your dirs2 URLs).
I know what happens with the robots.txt block because I'm blocking several routines in some forum software that way.
Partial match blocking can have unintended fallout.
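For example (hypothetical paths), a prefix rule aimed at one script can catch another:

    User-agent: *
    # intended for dirs2.cgi, but the prefix also matches dirs.cgi
    Disallow: /cgi-bin/dirs

That rule blocks /cgi-bin/dirs2.cgi, but it also blocks /cgi-bin/dirs.cgi, because Disallow matches on the leading portion of the URL path.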
And that also means that the redirector goes inside the dirs.cgi script if you choose to go that route.
They are wrong. There is a way; they are just too lazy to actually do it. A script can be modified to do anything you want it to.
Tell them that it can be done, or that you can outsource the work, and maybe their attitude will change a tad.
I will have another chat with them and see if they can change it or place a noindex tag on the duplicate URLs. I paid good money for the script, and I don't think it should have had a duplicate content issue in the first place.
You'll also lose all your backlink credit to established URLs, and you will lose any "age" credited to the old URLs. The old URLs will also continue to appear as Supplemental Results for a full year after you make the move.
If they really understood how the footprint you leave in a search engine's index is vitally important to your rankings, then they would not be arguing with you at all.