Mod Rewrite brain twister

Forum Moderators: phranque

Message Too Old, No Replies

Mod Rewrite brain twister

Can't figure this out

Elric99

12:24 pm on Apr 16, 2008 (gmt 0)

I'm using Mod Rewrite on a number of my websites to make dynamic URLs static.

Is there any way that a search engine could tell that the pages are dynamic and using mod rewrite? Or are they 100% the same as static pages to anyone or anything viewing them?

Thanks

jdMorgan

1:23 pm on Apr 16, 2008 (gmt 0)

Internal rewrites, properly implemented with mod_rewrite, are invisible to search engines.

One way to view this is that the action of mod_rewrite doing an internal rewrite is no different than the basic function of the Web server. For a simple example, a typical Web server takes a request for the URL http://www.example.com/widgets.php and translates that to a request for a filepath such as /var/users/example/public/html/widgets.php

This action is essentially equivalent to a rewrite, and of course, search engines are unaware of the "/var/users/example/public/html" path and can't/won't really care if you change that path to something else, say "/var/users/example/public/html/newdir".

The main "exposure" is that often, no steps are taken to prevent search engines from indexing the "real" URL path, in this example, "http://www.example.com/newdir/widgets.php" and so, adding the rewrite creates a duplicate-content issue if that "real" URL is accidentally exposed during development by an incorrect link or perhaps by use of the Google Toolbar.

This can easily be prevented or "fixed" by an additional snippet of code that redirects direct client requests for /newdir back to the root of the domain, so that the "real" URL-path is not directly accessible.

The key to understanding these problems is to recognize and consider URL-spaces and filespaces as separate spaces, and to recognize that the basic function of a server is to map a "standard" URL-space to an arbitrary filespace. This allows browsers to request "resources" and "objects" using the standard URL "addressing system" of HTTP without regard to the hardware, operating system, or filesystem conventions of the server that hosts those resources.

Jim

Elric99

3:08 pm on Apr 16, 2008 (gmt 0)

Thanks very kindly for the reply. That is exactly what I needed to know. Great info.

I just wanted to ask about where you mentioned:

This can easily be prevented or "fixed" by an additional snippet of code that redirects direct client requests for /newdir back to the root of the domain, so that the "real" URL-path is not directly accessible.

Where could I get this code please?

And also, could robots.txt be used instead to do the same job?

Thank you,

jdMorgan

5:57 pm on Apr 16, 2008 (gmt 0)

> Where could I get this code please?

Try searching WebmasterWorld (link at top of page) for "redirect direct client requests" for several examples.

> And also, could robots.txt be used instead to do the same job?

Perhaps, depending on the nature of the URLs. However, if the "real" URL has ever been indexed by search engines, a 301 redirect is a better method, since it 'recovers' the traffic and the PageRank/Link-popularity of the URL.

Jim

g1smd

1:04 am on Apr 17, 2008 (gmt 0)

I've posted the code you require a number of times, and the code has been discussed at least several times every month for the last couple of years. There's many prior examples here.