Redirect dynamic url to another dynamic url

12:31 am on Dec 7, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 7, 2010
posts: 35
votes: 2


Hello to all,

This problem is giving me a hard time, so maybe you can help me...

I have this rule in my .htaccess:

RewriteRule ^media/([^/.]*)/?/([_A-Za-z0-9-]+).html/?$ itemout.php?id=$1
...which means that, for example, the link
http://mydomain.com/media/100/test.html will be passed through itemout.php, which serves the appropriate content for that URL.

Should I have restricted Googlebot in my robots.txt from indexing PHP files, with something like Disallow: /*.php$? Now I have duplicate content (Webmaster Tools reports about 200 links):

1. http://mydomain.com/media/100/test.html
2. http://mydomain.com/itemout.php?id=100

So, what should I do to 301 redirect to an SEO-friendly URL like the first one above?
4:22 am on Dec 7, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


It requires two rules to fix both problems:

# Externally redirect direct client requests for /itemout.php?id=100 as a URL-path
# to the correct URL http://example.com/media/100/test.html
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /itemout\.php\?id=([0-9]+)\ HTTP/
RewriteRule ^itemout\.php$ http://example.com/media/%1/test.html? [R=301,L]
#
# Internally rewrite requests for URL-path /media/100/test.html to
# internal server filepath plus query-string /itemout.php?id=100
RewriteRule ^media/([0-9]+)/test\.html$ /itemout.php?id=$1 [L]

Jim
9:08 am on Dec 7, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 7, 2010
posts: 35
votes: 2


Thank you, Jim, for the quick help...
But maybe I wasn't clear enough, so the problem still exists:

This:
RewriteRule ^itemout\.php$ http://example.com/media/%1/test.html? [R=301,L]

The test.html part is always different; it is generated dynamically by itemout.php.

If that is not possible, maybe this is the answer:
RewriteRule ^itemout\.php$ http://example.com/404.html? [R=301,L]

Would this redirect to a non-existent page, produce a 404 error, and so tell Google not to index such requests?
1:55 pm on Dec 7, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


No, that is a redirect, which will produce a 301 response, possibly followed by a 200-OK. *Never* redirect the URL of a missing file. Instead, you must rewrite it to a non-existent filepath on Apache 2.0 and below, or use the new [R=404] syntax on Apache 2.2 and later (which despite appearances, does not actually invoke a client redirect).

Because the URL-path-part "/test" is not present in the "/itemout.php" URLs, there is no way for mod_rewrite or Apache to "know" what its value should be if it is a variable that can change. This is because your /itemout.php rule "drops" that part of the URL when rewriting /media/100/test.html to itemout.php, so that information is lost.

Therefore, you will not be able to "recover" direct requests for the itemout.php-style URLs, and you will have to 404 them.

# Generate a 404-Not Found response for direct client requests for /itemout.php as a URL-path
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /itemout\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^itemout\.php$ /filepath-that-is-known-not-to-exist.xyz [L]
# Alternate RewriteRule for use only on Apache 2.2 and later
# RewriteRule ^itemout\.php$ - [R=404,L]
#
# Internally rewrite requests for URL-path /media/100/test.html to
# internal server filepath plus query-string /itemout.php?id=100
RewriteRule ^media/([0-9]+)/test\.html$ /itemout.php?id=$1 [L]

The first "404 RewriteRule" posted above works on any Apache Server, while the second one (currently commented-out) will only work on Apache 2.2 and later. At this time, I recommend using the first version for best "portability" among hosts and servers.

Jim
2:19 pm on Dec 7, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 7, 2010
posts: 35
votes: 2


Thanks again, Jim, that was very helpful...
I used the 404 approach as you suggested and now it is OK. Below is an excellent tool for checking server headers:

Check Server Headers [seoconsultants.com]
2:47 pm on Dec 7, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


An even better tool is the "Live HTTP Headers" checker -- an add-on for Firefox and all other Mozilla-based browsers. There are also a few other add-ons with header checking built in. These give a more accurate and more complete view of the HTTP headers because they work with your actual browser requests, rather than using a proxy to forward the requests through another server.

Jim
2:46 pm on Dec 16, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 7, 2010
posts: 35
votes: 2


Another problem has come up!

As in the example above, I now return a 404 for itemout.php and internally rewrite requests for the URL-path /media/100/anything.html to the
internal server filepath plus query string /itemout.php?id=100 with

RewriteRule ^media/([0-9]+)/?/([A-Za-z0-9-]+)\.html$ /itemout.php?id=$1 [L]

This works just fine.

I have another similar rule:
RewriteRule ^books/([^/.]*)/?/([_A-Za-z0-9-]+).html/?$ count.php?id=$1

and suddenly Google came up with a duplicate-title warning like this:

/books/100/anything.html
/books/100/anything.html?id=100

So now I am more than confused: where does the second link come from?

Anyway, I put the following rule in my robots.txt:
Disallow: /*?
in the hope that this will stop Google from indexing links containing a ? character.
Jim, any ideas?
3:45 pm on Dec 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Make sure that every RewriteRule has the [L] flag at the end.
Omitting it can cause random, difficult-to-fathom problems.

([^/.]*) - By not passing this part of the URL request to the backend script, you allow competitors to generate infinite URL-space on your site. This part of the request must be passed to the script for validation, and a 404 error generated if it does not fit the ONE valid value it should have (if not a 404, then at least a 301 redirect to the correct URL).

/?/ - are you saying some of your URL requests can have a double slash within? To rewrite these as valid is not a good idea.

.html/?$ - to allow a URL request both with and without a trailing slash to be rewritten promotes duplicate content and must be avoided at all costs.

Your existing rule incorporates three errors, each of which can lead to complete disaster for your site.

All protocol, sub-domain, domain, TLD, port, path, file, extension and parameter errors in URL requests must be trapped and either redirected to the correct URL or else have the correct HTTP error served. Only one valid request must be allowed to serve content.
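As one illustration of trapping hostname errors, here is a minimal sketch of a canonical-hostname redirect. It assumes www.example.com is the single canonical hostname (a placeholder, not taken from this thread) and that the code sits in the top-level .htaccess file, above any internal rewrites.

```apache
# Sketch only: www.example.com is an assumed canonical hostname.
# Redirect requests arriving on any other hostname to the canonical one,
# keeping the requested path (the query string is carried over automatically).
# The optional group lets requests with an empty Host header through,
# which avoids a redirect loop for HTTP/1.0 clients.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Redirects like this one belong before the internal rewrite rules, so that a rewritten internal filepath is never exposed to the client as a URL.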
4:57 pm on Dec 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> So now I am more than confused: where does the second link come from?

It came from either a bad redirect rule (either during development or still present) or a bad link out on the Web, possibly a maliciously-bad link posted by a competitor.

Full URL canonicalization as recommended above by g1smd will fix this.

Jim
7:39 pm on Dec 16, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 7, 2010
posts: 35
votes: 2


Thank you both!

I did this:
RewriteRule ^books/([^/.]*)/([A-Za-z0-9-]+).html$ count.php?id=$1 [L]

but it is still possible to request both of the links below:

/books/100/anything.html
/books/100/anything.html?id=100

In my PHP script, count.php, this code handles the id value:

$id = clean_string($_GET['id']);
(clean_string is a function that wraps mysql_real_escape_string; it is not important for this issue).

How can I block the ? character after .html? This thing is driving me crazy.

P.S.
I use this code in my PHP script, which handles a ? arriving in the URL:

$checkurl = $_SERVER['REQUEST_URI'];
// strpos() returns false when "?" is absent, so compare strictly with !==
if (strpos($checkurl, "?") !== false) {
    header('HTTP/1.1 404 Not Found');
    // call custom 404 page
    exit;
}

So this PHP routine will 404 such requests, but is it possible to do this in .htaccess itself?
8:27 pm on Dec 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


If there are NO parameters in URLs for any of your site, you can block all URL requests with parameters using just two simple lines of .htaccess code. If there are some known parameters, a small number of additional preceding RewriteCond lines may be needed.

On another note, you need to pass both $1 and $2 to your script, and validate the values of both.

Additionally, you allow mixed-case URL requests. Make sure your script fully validates the correct case of every character in the request and sends the correct HTTP error headers if it is incorrect.

Your script has to generate the 404 header for those. There is no way for .htaccess/mod_rewrite to "know" what records exist in the content database.
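A minimal sketch of a tightened rule along these lines follows. It assumes that ids are purely numeric and that the text part of the URL is lower-case only; "slug" is a hypothetical parameter name that count.php would have to read and validate, and it is not part of the original script.

```apache
# Sketch only: "slug" is a hypothetical parameter name, not from the original script.
# - ([0-9]+) accepts only a numeric id (assumption based on the example URLs)
# - exactly one slash between id and text part, no optional second slash
# - lower-case text part only, escaped dot, no optional trailing slash
RewriteRule ^books/([0-9]+)/([a-z0-9-]+)\.html$ /count.php?id=$1&slug=$2 [L]
```

Requests that do not match the pattern are not rewritten at all, so as long as no real file exists at that path they get the server's normal 404. The script still has to compare the text part with the one valid value stored for that id and send a 404 (or a 301 to the correct URL) when it differs.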
10:19 pm on Dec 16, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 7, 2010
posts: 35
votes: 2


Thank you for your patience.

"block all URL requests with parameters"... well, what kind of statement would that be?
10:43 pm on Dec 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


RewriteCond %{QUERY_STRING} .


will detect all requests where there are ANY query string parameters.

It is just a matter of setting the RewriteRule pattern to select the paths and files where you need to further look for query string data.
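Put together, the two lines might look like the sketch below. It assumes that no legitimate /books/ URL ever carries a query string, and it reuses the non-existent-filepath technique shown earlier in this thread; the pattern ^books/.+\.html$ is only an illustration and should be adjusted to the paths you actually serve.

```apache
# Sketch only: return a 404 for any /books/...html URL requested with a query string.
# This must appear BEFORE the internal rewrite to count.php.
RewriteCond %{QUERY_STRING} .
RewriteRule ^books/.+\.html$ /filepath-that-is-known-not-to-exist.xyz [L]
# Alternate RewriteRule for use only on Apache 2.2 and later
# RewriteRule ^books/.+\.html$ - [R=404,L]
```

The internally rewritten request for /count.php?id=100 also has a query string, but its path does not start with books/, so this rule leaves it alone.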

[edited by: jdMorgan at 10:36 pm (utc) on Dec 20, 2010]
[edit reason] Single typo correction [/edit]