OSC, dupe content, robots.txt and 301 redirects

Forum Moderators: goodroi

Need to ensure bots 301 to new SEO URL

spina45

3:26 am on Mar 9, 2007 (gmt 0)

Hi,

I have an OsCommerce shop which is very prone to producing duplicate content. I have used robots.txt to reduce some of the dupe content with statements like:

Disallow: /*osCsid
Disallow: /*&action
Disallow: /*&products
Disallow: /*&sort
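For readers unsure what these wildcard lines actually catch, here is a minimal Python sketch of how a crawler that supports the `*` extension might match them. It assumes `*` means "any run of characters" and that a rule matches from the start of the path plus query string. The sample URLs and the function names (`disallow_to_regex`, `is_blocked`) are made up for illustration, and real crawlers may differ in the details.

```python
import re

# The Disallow patterns quoted in the post above.
DISALLOW = ["/*osCsid", "/*&action", "/*&products", "/*&sort"]

def disallow_to_regex(pattern):
    """Turn a Disallow path with '*' wildcards (and an optional trailing '$') into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path_and_query, patterns=DISALLOW):
    """True if any Disallow pattern matches the URL path plus query string."""
    return any(disallow_to_regex(p).match(path_and_query) for p in patterns)

# Hypothetical osCommerce-style URLs, for illustration only.
print(is_blocked("/product_info.php?products_id=42&osCsid=abc123"))  # session-ID variant
print(is_blocked("/index.php?cPath=3&sort=2a"))                      # sorted listing variant
print(is_blocked("/product_info.php?products_id=42"))                # plain product URL
```

Under these assumptions the first two URLs are blocked and the plain product URL is not, because none of the four patterns matches a query string that begins with `?products_id`.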

But here's my question: I just had a developer install SEO URLs (I modified the stock version and it came out very nicely). Can I now add the line below to my robots.txt, and will SE spiders still be redirected to my new SEO URLs?

Disallow: /*product_info

Note: I'm 301 redirecting via .htaccess from the PHP product URLs matched by the line directly above to the new SEO URLs.

I guess the very short way of asking my question is this: If I disallow a URL in robots.txt, can SE spiders still be redirected from the disallowed URL to the new SEO URL?

jdMorgan

5:22 am on Mar 9, 2007 (gmt 0)

If the robots are disallowed from fetching those URL-paths, they won't fetch the old URLs, and so will never see the redirects from old to new URLs. I'd dump the robots.txt directives and just use the 301 redirects. Eventually, the SEs will get the picture and stop asking for the old unfriendly URLs -- as long as you don't continue to link to them, that is.

Jim

spina45

2:41 pm on Mar 9, 2007 (gmt 0)

Thank you for your reply

> If the robots are disallowed from fetching those URL-paths, they...will never see the redirects from old to new URLs.

Even if the redirects are done in .htaccess?

> I'd dump the robots.txt directives

Are you specifically talking about my main OSC PHP product URLs that have been replaced with new SEO URLs? Or, do you mean ALL dupe content URL variations?

> as long as you don't continue to link to them, that is.

I've changed all my *internal* links to the new SEO URLs, but there are still tons of external links (websites, blog posts, etc.) that are pointing to old and dupe content URLs.

jdMorgan

3:04 pm on Mar 9, 2007 (gmt 0)

>> If the robots are disallowed from fetching those URL-paths, they...will never see the redirects from old to new URLs.
> Even if the redirects are done in .htaccess?

If you tell a (robots.txt-compliant) robot (using a Disallow in robots.txt) not to fetch an old URL, then it won't request that old URL from your server, and so will never trigger the 301 redirect in your .htaccess that you intended to use to "tell it" the new URL. Simple as that.
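The order of operations described above can be sketched in a few lines of Python. This is only a toy model of a robots.txt-compliant crawler, not any search engine's actual code. The `REDIRECTS` table stands in for the .htaccess 301 rules, and the old and new URLs in it are hypothetical. `fnmatchcase` is used as a shortcut for `*` wildcard matching, which works here because the pattern contains no `?` or `[` characters.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the .htaccess rules: old PHP URL -> new SEO URL.
REDIRECTS = {
    "/product_info.php?products_id=42": "/blue-widget-p-42.html",
}

def crawl(url, disallow_patterns):
    """Return the events a robots.txt-compliant crawler would produce for one URL."""
    events = []
    # Step 1: the robot checks robots.txt BEFORE sending any request.
    if any(fnmatchcase(url, pattern + "*") for pattern in disallow_patterns):
        events.append(f"skip {url} (disallowed, no request sent)")
        return events
    # Step 2: only now does a request reach the server, where the 301 lives.
    target = REDIRECTS.get(url)
    if target is not None:
        events.append(f"GET {url} -> 301 {target}")
        events.append(f"GET {target} -> 200")
    else:
        events.append(f"GET {url} -> 200")
    return events

old_url = "/product_info.php?products_id=42"

# With the Disallow line in place, the redirect is never triggered.
for event in crawl(old_url, ["/*product_info"]):
    print(event)

# With the Disallow line removed, the robot follows the 301 to the new URL.
for event in crawl(old_url, []):
    print(event)
```

In the first run the crawler stops at the robots.txt check, so the server never gets a chance to answer with the 301. In the second run the request goes through and the new URL is discovered.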

>> I'd dump the robots.txt directives
> Are you specifically talking about my main OSC PHP product URLs that have been replaced with new SEO URLs? Or, do you mean ALL dupe content URL variations?

I mean all duplicate-content URLs -- Use the 301 redirects to "correct" them in the SE indexes, keeping in mind the above clarification.

>> as long as you don't continue to link to them, that is.
> I've changed all my *internal* links to the new SEO URLs, but there are still tons of external links (websites, blog posts, etc.) that are pointing to old and dupe content URLs.

In that case, the SE robots will continue to request these obsolete and non-SE-friendly URLs. Over time, as these old links disappear from the Web, the spidering frequency will decrease, but you'll still need the 301s in place to redirect them. You may be able to accelerate their "fading-out" by asking your major linking partners to update their links to your site.

In simplest terms, use robots.txt as a "fetching/bandwidth control", on-page meta-robots tags as "page content" control and SE indexing (results listing) control, and URL redirection as "URL control". In other words, robots.txt and meta-robots protect the contents of the box, and redirection simply changes the labeling on the outside of the box.

Jim

spina45

8:23 pm on Mar 9, 2007 (gmt 0)

Jim,

Thank you for the education. I normally figure most people on WW are more knowledgeable than I am -- and considering you're a moderator I guess...I am not worthy! :)

I have removed the "Disallow" lines based on your feedback. However, I've noticed that Yahoo and MSN don't seem to listen to robots.txt (or else they take a long time to act on it). Do MSN and Yahoo suppress rankings due to duplicate content, as Google allegedly does?

Finally, because I'm now allowing SE bots to fetch more PHP URLs (which redirect to the new SEO URLs), will it appear to SEs that I have more site content, and could that result in any ranking improvement?

Thank you.