I have an osCommerce shop that is very prone to producing duplicate content. I have used robots.txt to reduce some of the dupe content with statements like:
But here's my question: I just had a developer install SEO URLs (I modified the stock version and it came out very nicely). Can I now add the code below to my robots.txt, and will SE spiders still be redirected to my new SEO URL?
Note: I'm 301 redirecting via .htaccess from the PHP product URL directly above to the new SEO URL.
I guess the very short way of asking my question is this: If I disallow a URL in robots.txt, can SE spiders still be redirected from the disallowed URL to the new SEO URL?
> If the robots are disallowed from fetching those URL-paths, they...will never see the redirects from old to new URLs.
Even if the redirects are done in .htaccess?
> I'd dump the robots.txt directives
Are you specifically talking about my main OSC PHP product URLs that have been replaced with new SEO URLs? Or, do you mean ALL dupe content URL variations?
> as long as you don't continue to link to them, that is.
I've changed all my *internal* links to new SEO URLs, but there are still tons of external links (websites, blog posts, etc.) that are pointing to old and dupe content URLs.
If you tell a (robots.txt-compliant) robot (using a Disallow in robots.txt) not to fetch an old URL, then it won't request that old URL from your server, and so will never trigger the 301 redirect in your .htaccess that you intended to use to "tell it" the new URL. Simple as that.
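To make that concrete, here is a minimal sketch using Python's standard urllib.robotparser to ask the same question a compliant robot asks before fetching. The Disallow rule, the bot name, and both URLs are hypothetical examples, not the poster's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /product_info.php
"""


def would_fetch(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a robots.txt-compliant robot may request this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical old (PHP) and new (SEO) URLs for the same product.
old_url = "http://www.example.com/product_info.php?products_id=42"
new_url = "http://www.example.com/blue-widget-p-42.html"

# The old URL is disallowed, so the robot never requests it and
# never receives the 301 that .htaccess would have sent.
print(would_fetch(ROBOTS_TXT, old_url))

# The new URL is not covered by the rule, so it can be fetched directly.
print(would_fetch(ROBOTS_TXT, new_url))
```

The check happens on the robot's side before any request is made, which is why it makes no difference that the redirect lives in .htaccess.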
>> I'd dump the robots.txt directives
> Are you specifically talking about my main OSC PHP product URLs that have been replaced with new SEO URLs? Or, do you mean ALL dupe content URL variations?
I mean all duplicate-content URLs -- Use the 301 redirects to "correct" them in the SE indexes, keeping in mind the above clarification.
>> as long as you don't continue to link to them, that is.
> I've changed all my *internal* links to new SEO URLs, but there are still tons of external links (websites, blog posts, etc.) that are pointing to old and dupe content URLs.
In that case, the SE robots will continue to request these obsolete and non-SE-friendly URLs. Over time, as these old links disappear from the Web, the spidering frequency will decrease, but you'll still need the 301s in place to redirect them. You may be able to accelerate their "fading-out" by asking your major linking partners to update their links to your site.
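As a rough illustration of what the 301s accomplish for those lingering external links, the sketch below models the redirects as a simple lookup table and follows them the way a spider would. The table entries and the resolve helper are hypothetical; a real site would define these rules in .htaccess rather than in Python.

```python
# Hypothetical redirect table: each old or duplicate URL path maps to its 301 target.
REDIRECTS = {
    "/product_info.php?products_id=42": "/blue-widget-p-42.html",
    "/product_info.php?cPath=3&products_id=42": "/blue-widget-p-42.html",
    "/product_info.php?products_id=42&sort=2a": "/product_info.php?products_id=42",
}


def resolve(path, redirects, max_hops=5):
    """Follow 301 hops from path and return (final_path, hop_count)."""
    hops = 0
    while path in redirects:
        if hops >= max_hops:
            # Guard against redirect loops or overly long chains.
            raise RuntimeError("too many redirects")
        path = redirects[path]
        hops += 1
    return path, hops


# Every duplicate variation ends up at the same SEO URL.
for old_path in REDIRECTS:
    print(old_path, "->", resolve(old_path, REDIRECTS))
```

A chain of two hops, like the third entry, still arrives at the right place, but pointing each old URL straight at its final target keeps the number of requests down.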
In simplest terms, use robots.txt as a "fetching/bandwidth control", on-page meta-robots tags as "page content" control and SE indexing (results listing) control, and URL redirection as "URL control". In other words, robots.txt and meta-robots protect the contents of the box, and redirection simply changes the labeling on the outside of the box.
Thank you for the education. I normally figure most people on WW are more knowledgeable than I am -- and considering you're a moderator I guess...I am not worthy! :)
I have removed the "Disallow" directives based on your feedback. But since I've noticed that Yahoo and MSN don't seem to listen to robots.txt (or else they take a long time to act on it), do MSN and Yahoo suppress rankings due to duplicate content as Google allegedly does?
Finally, because I'm now allowing SE bots to fetch more PHP URLs (that redirect to new SEO URLs), will it appear to SEs that I have more site content, and will that result in any ranking enhancement?