
Apache Web Server Forum

    
Variable Spaces in Redirect and Rewrite
BBonanza




msg:3830230
 11:21 am on Jan 20, 2009 (gmt 0)

I have many pages cached with Google which are PR1-PR4 like this:

www.domain.co.uk/car_news/car_news.asp?variable=ford%20Mondeo%20news

I have used the rules below to 301 redirect then rewrite the above url to

www.domain.co.uk/car-news/ford%20mondeo%20news

***************************************************

#This is the URL rewrite rule
RewriteBase /
RewriteCond %{QUERY_STRING} ^variable=(.+)$
RewriteRule ^car_news/car_news\.asp$ car-news/%1? [NC,R]
RewriteRule ^car-news/([^/]+)?$ car_news/car_news.asp?variable=$1 [NC,L]

#This is the URL 301 Redirect rule
RewriteCond %{QUERY_STRING} ^variable=(.+)$
RewriteRule ^car_news/car_news.asp$ [domain.co.uk...] [L,R=301]

**************************************************

The spaces are all in the variable. How can I get rid of the spaces so that Google caches:

www.domain.co.uk/car-news/ford-mondeo-news

but rewrite to:

www.domain.co.uk/car_news/car_news.asp?variable=ford%20mondeo%20news

as this would preserve the PR.

The bit that is really stumping me is that the variables being pulled from the database DO have spaces and can't be changed.
How can I get this to work?

Please help - I am losing hair over this.

Thank you

 

jdMorgan




msg:3830406
 3:42 pm on Jan 20, 2009 (gmt 0)

The URL appearing in Google and in the browser's address bar depends on the link you publish on your page. So mod_rewrite can't really help with the basic problem.

If your code is in .htaccess, then your redirect rule can be modified to redirect the query string URLs containing %20 to SEO-friendly URLs using hyphens, but this is only a 'fix-up' and won't cure the problem. Replacing characters using mod_rewrite in .htaccess, without the benefit of a 'tolower' RewriteMap, is horribly inefficient.

If you want the code, see this thread [webmasterworld.com] in our library, but be aware that it may slow your server to a crawl unless you correct all links published on your pages before implementing it.

Jim

BBonanza




msg:3832927
 9:59 am on Jan 23, 2009 (gmt 0)

Thanks Jim. It seems I have a bit of a problem here: not only do I have the %20 issue on rewrite, I have now noticed that if a page with spaces in the URL is redirected and then rewritten, it puts %2520 in the URL, which Google may see as 2 different pages.

What I mean is this:

I have a 301 redirect and a rewrite for this page:
www.domain.co.uk/usedcars.asp?make=bmw&model=3 series (which has a space)

the redirect goes from the above page to:
www.domain.co.uk/used/bmw/3 series

which is then rewritten to the original.

The problem is that if I type:
www.domain.co.uk/used/bmw/3 series

into the browser it resolves to:
www.domain.co.uk/used/bmw/3%20series

but if I type:
www.domain.co.uk/usedcars.asp?make=bmw&model=3 series

into the browser it resolves to:
www.domain.co.uk/used/bmw/3%2520series

Which are 2 different pages.

It seems that if you follow just the rewrite you get %20, but if you follow the redirect and rewrite you get %2520, which I am sure Google will see as 2 pages.

I have almost given in to the fact that I won't get rid of the %20 in many of my URLs, but now I have the %2520 issue.

Is there any kind of fix for this?

Thanks

g1smd




msg:3832982
 11:24 am on Jan 23, 2009 (gmt 0)

The %25 is the % of %20 being encoded again.
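To see the effect in isolation, here is a minimal Python sketch. It uses Python's standard urllib only to demonstrate the encoding; it is not what the server runs, and the "3 series" value is simply the example from the post above.

```python
from urllib.parse import quote

model = "3 series"     # value as stored in the database
once = quote(model)    # the space is percent-encoded
twice = quote(once)    # the '%' itself is encoded again, as %25

print(once)
print(twice)
```

The first line printed contains %20 and the second contains %2520, which is the pair of URLs described above.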

You need to use URLs in the links on your pages that have hyphens between words. The URLs in the links are the URLs that Google will index.

Next you need a redirect such that if any requested URL contains %20 or %2520 then a 301 is issued to a URL with a hyphen in it.

At some point, you'll need a rewrite to connect the requested URLs, that is, URLs with hyphens in them, to the internal script that runs the site.

Anything less than that is just going to make things worse.
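As a rough illustration of the redirect target described above, the sketch below decodes a requested value until no percent-encoding remains and then joins the words with hyphens. The function name hyphenate is made up for this example, and lowercasing the result is an assumption based on the URLs shown earlier in the thread; on a real server this logic would live in the script or the rewrite rules, not in Python.

```python
from urllib.parse import unquote

def hyphenate(value: str) -> str:
    """Decode repeated percent-encoding, then join the words with hyphens."""
    previous = None
    while value != previous:      # undo %2520 -> %20 -> space
        previous = value
        value = unquote(value)
    return "-".join(value.lower().split())

for requested in ("ford%20Mondeo%20news", "ford%2520mondeo%2520news"):
    print(requested, "->", hyphenate(requested))
```

Note that the internal rewrite then has to turn the hyphens back into spaces for the script, which only works cleanly if the stored values contain no hyphens of their own.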

jdMorgan




msg:3833211
 4:07 pm on Jan 23, 2009 (gmt 0)

I'd suggest these changes:

# 301 Redirect with rewrite/redirect loop prevention
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /car_news/car_news\.asp\?variable=([^&\ ]+)\ HTTP/
RewriteRule ^car_news/car_news\.asp$ http://www.example.com/car-news/%1? [NE,R=301,L]
#
RewriteRule ^car-news/([^/]+)?$ car_news/car_news.asp?variable=$1 [NC,NE,L]

Note: This is two rules, replacing your three.
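The loop prevention relies on THE_REQUEST holding the request line exactly as the client sent it, which an internal rewrite does not change. A small Python sketch of that idea, applying the same pattern as the RewriteCond above to two request lines that are invented here for illustration:

```python
import re

# Same pattern as the RewriteCond above; "\ " is an escaped literal space.
OLD_URL = re.compile(
    r"^[A-Z]+\ /car_news/car_news\.asp\?variable=([^&\ ]+)\ HTTP/"
)

# Hypothetical request lines as a client would send them.
old_request = "GET /car_news/car_news.asp?variable=ford%20Mondeo%20news HTTP/1.1"
new_request = "GET /car-news/ford%20Mondeo%20news HTTP/1.1"

match = OLD_URL.search(old_request)
print(match.group(1))               # captured value, reused as %1 in the rule
print(OLD_URL.search(new_request))  # None: no redirect for the new URL
```

Because a request for the new URL never matches, the redirect is not issued again after the second rule has rewritten that request to the .asp script internally.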

Jim

[edited by: jdMorgan at 4:14 pm (utc) on Jan. 23, 2009]

BBonanza




msg:3834694
 1:38 pm on Jan 26, 2009 (gmt 0)

Jim, That works a treat. I can now modify all my scripts to suit.
Thank you for taking the time to help me out.

Do I alter the links on my website to go to the new rewrites or do I leave them to follow the redirect?

The reason I ask is that my site used to have 50,000 pages cached but now we only have 6000 cached by Google.
I believe Google is not as keen to cache my pages because they are dynamic pages; a close competitor recently updated their pages to rewritten ones and now they have 200,000 pages cached.

The thing is though, they have the links on their site going direct to the rewrites, not the dynamic URLs to be 301 redirected then rewritten.
Would that not cause dupe content?

Anyway thanks again.

jdMorgan




msg:3834707
 2:02 pm on Jan 26, 2009 (gmt 0)

The answer is simple, no matter what rewrites and/or redirects you may use: Always link to the URLs you want to appear in the visitor's browser address bar and in search engine results listings.

As I said above, the redirect is only a "fix-up" and won't actually cure the problem. Changing the links on your pages is the actual "cure."

The internal rewrite is invisible to the Web if it is properly implemented, and takes effect only inside your server; it simply maps requests for the new URLs to the correct script location in your filesystem.

So in the terms you used, you want to "link to the rewrite" so that only search engines using old URLs from their database, and visitors clicking on obsolete links (on other sites) or using obsolete bookmarks will invoke your "fix-up" redirect.

Jim

BBonanza




msg:3834744
 3:12 pm on Jan 26, 2009 (gmt 0)

This is my scenario, and I am sure you can understand my hesitancy because this scenario refers to thousands of cached pages not just a few.

I have this page ranked on the first page with Google:

www.domain.co.uk/car_sales/car.asp?mod=mondeo

It has a pr2 pagerank. The page will be rewritten to:

www.domain.co.uk/used/ford/mondeo

If I take off my site all the links to the former URL, how will Google ever know the dynamic URL has a permanent 301 redirect to the latter and pass the PR to the new URL?

Will Google not have 2 pages cached like this:

www.domain.co.uk/car_sales/car.asp?mod=mondeo - PR2
www.domain.co.uk/used/ford/mondeo - PR-Greybar

Which are the same page in essence.
I may be missing something very simple here, but I don't want to make an error which is difficult to undo.

Thanks

jdMorgan




msg:3834763
 3:53 pm on Jan 26, 2009 (gmt 0)

"I have this page ranked on the first page with Google:

www.domain.co.uk/car_sales/car.asp?mod=mondeo

It has a pr2 pagerank. The page will be rewritten to:

www.domain.co.uk/used/ford/mondeo"

No, /car_sales/car.asp?mod=mondeo should be externally redirected to /used/ford/mondeo

This addresses old URLs in the search engine databases, old links on the Web, and old bookmarks.

When a request arrives at your server for /used/ford/mondeo, regardless of whether this request required prior invocation of the external redirect, then it will be internally rewritten to your script using the now-internal-only path /car_sales/car.asp?mod=mondeo

Please do not confuse internal rewrites with external redirects, or the "direction" of the action. If these points are not clear, then you won't fully understand what is happening on your own site.

"If I take off my site all the links to the former URL how will Google ever know the dynamic URL has a permanent 301 redirect to the latter and pass the PR to the new URL?"

Because Google and all the other search engines keep URLs in a database, and they keep them for a long time -- years.

For pages with no inbound links from external sites, the search-engine-cached URLs and the "new links" on your own site will serve to trigger the PR transfer. Remember where this PR comes from, and this will be clearer: It comes from the linking page, and does not exist as any kind of "permanent credit" in an "account" at Google. Google states that they re-calculate the PR of all URLs in their index as an ongoing process; it is only the "Google Toolbar display PR" that is subject to update latency.

So, the new on-page links establish the PR of the new URLs (or equivalently, re-establish the PR of the 'pages' that now have new URLs). The external redirect fixes up client requests for the old URLs and, if links to those old URLs remain on other sites, tells Google to ascribe the PR of the old URLs to the new ones. The internal rewrite delivers incoming requests for the new URLs to the script filepath on your server, so that you can generate content for those new URLs.

Maybe someone else can jump in here and explain this better -- I'm evidently not doing so well at it... :(

Jim

BBonanza




msg:3834793
 4:17 pm on Jan 26, 2009 (gmt 0)

Thanks Jim - I am getting this now.
Please don't take my hesitancy as ignorance.

I asked the same question here:

http://www.helicontech.com/forum/forum_posts-TID-11459-PN-0-TPN-2.htm

5th post down.

and got totally the opposite response.
I feared that maybe I should get another opinion, because my competitor's website does exactly the opposite of what the other moderator advised.
This can muddle your mind sometimes :-)

Richard

[edited by: jdMorgan at 2:10 pm (utc) on Jan. 27, 2009]
[edit reason] De-linked [/edit]

g1smd




msg:3835059
 8:43 pm on Jan 26, 2009 (gmt 0)

Link to the folder-based URL format from within your own site. Links "define" URLs.

Requests for folder-based URLs are processed by the rewrite to get the content from the server.

Bots and people requesting dynamic URLs should be fed a redirect to make them request the new URL.

Bots remember old URLs for years, but the redirect will force them to request the new version of the URL.
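The points above can be sketched as a small Python simulation of what the server decides for each requested URL. This is only an illustration of the "direction" of each action, using the mondeo URLs from earlier in the thread; the MAKE_FOR_MODEL lookup and the handle function are hypothetical, and error handling is left out.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical model-to-make lookup; the real site would use its own data.
MAKE_FOR_MODEL = {"mondeo": "ford"}

def handle(requested: str) -> Tuple[str, str]:
    """Decide what the server does with the URL the client actually requested."""
    parts = urlsplit(requested)
    if parts.path == "/car_sales/car.asp":
        # Old dynamic URL: external 301 redirect, visible to bots and browsers.
        model = parse_qs(parts.query)["mod"][0]
        return "301 redirect", f"/used/{MAKE_FOR_MODEL[model]}/{model}"
    segments = parts.path.strip("/").split("/")
    if len(segments) == 3 and segments[0] == "used":
        # New folder-based URL: internal rewrite, invisible to the client.
        return "internal rewrite", f"/car_sales/car.asp?mod={segments[2]}"
    return "not handled", requested

print(handle("/car_sales/car.asp?mod=mondeo"))
print(handle("/used/ford/mondeo"))
```

A bot holding the old URL gets the redirect, requests the folder-based URL, and only then reaches the script through the rewrite. A visitor following a link on the site goes straight to the second case.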

g1smd




msg:3835060
 8:47 pm on Jan 26, 2009 (gmt 0)

On the "other" forum, no one has pointed out that placing rewrites before redirects is exposing the rewritten URL back into the outside world.

You need to list the redirects first and the rewrites last to avoid that problem.

BBonanza




msg:3835407
 8:34 am on Jan 27, 2009 (gmt 0)

I am certainly not upset with the other chap because you can see from the length of the topic that he has been trying to help me out.
The forum is a support forum for ISAPI Rewrite software, which we bought to give us the ability to rewrite our .asp pages on a Windows server.

You guys have given me the answers to issues which were puzzling me.
For that I am extremely grateful.

BBonanza




msg:3835527
 11:39 am on Jan 27, 2009 (gmt 0)

I have a map file which is used to attach car makes to the models. This works:

avensis toyota
auris toyota
3series bmw
x3 bmw

This doesn't:

avensis toyota
auris toyota
3 series bmw
x3 bmw

The space in the "3 series" crashes the map file.
I have to have it with the space because that is how it resides in our database.

Is there a map file character I can use to emulate the space?

Thanks

jdMorgan




msg:3835648
 2:10 pm on Jan 27, 2009 (gmt 0)

Maybe try escaping the space, as in "\ " and also try encoding it as "%20". You can also try enclosing the whole "3 series bmw" in quotes, just as shown here.

Only the latest version of ISAPI Rewrite is mod_rewrite compatible, so do be careful about versions and use the ISAPI Rewrite documentation to check what we write here if you're using ISAPI Rewrite on an MS server instead of mod_rewrite on Apache.

Jim

BBonanza




msg:3836312
 9:27 am on Jan 28, 2009 (gmt 0)

I tried all three methods, but to no avail.
I am using the latest version of ISAPI Rewrite and I will check the docs for issues.
Your earlier info did work fine though.

I was thinking that maybe I could use a map file for all the models without spaces and separate rules for the spaced models - there aren't that many.
My worry is conflicts could occur.

The other option is to just avoid the make of the vehicle in the URL, but that would be a great opportunity missed.

This is the dynamic URL:

www.domain.co.uk/folder/page.asp?model=avensis

This is our preferred outcome:

www.domain.co.uk/used/toyota/avensis/car

The map file works fine on the non-spaced models but dislikes the "3 series" and "grand cherokee" type models.

I could output:
www.domain.co.uk/used/avensis/car
in a heartbeat - without the need for a map file.

This is the map file code:

#redirect code
RewriteMap mapfile txt:mapfile.txt
RewriteMap lower int:tolower
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /folder/page\.asp\?model=([^&\ ]+)\ HTTP/
RewriteRule ^folder/page\.asp$ [domain.co.uk...] [NE,R=301,L]

#rewrite code
RewriteBase /
RewriteCond %{QUERY_STRING} ^make=(.+)&model=(.+)$
RewriteRule ^folder/page\.asp$ used/%1/%2/car? [NC,R]
RewriteRule ^used/([^/]+)/([^/]+)/car/?$ folder/page.asp?make=$1&model=$2 [NC,L]

Is the gain worth the trouble?
Thanks Jim

Richard

BBonanza




msg:3836485
 1:48 pm on Jan 28, 2009 (gmt 0)

Hi Guys,
The issue is now sorted and the map file works fine with the gaps.
FYI here is what works.
************************************************
Mapfile entries with spaces should look like:

3%20series bmw

And the config should be:

RewriteBase /
RewriteMap mapfile txt:mapfile.txt
RewriteMap lower int:tolower
RewriteMap escape int:escape

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /folder/page\.asp\?model=([^&\ ]+)\ HTTP/
RewriteRule ^folder/page\.asp$ [domain.co.uk...] [NE,R=301,L]
************************************************
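For anyone wondering why the escaped key helps, here is a rough Python sketch. It assumes a map reader that splits each line on whitespace into exactly one key and one value; that is an assumption made for illustration, not ISAPI Rewrite's or Apache's actual parser, and parse_map is a made-up name.

```python
from urllib.parse import quote

def parse_map(text: str) -> dict:
    """Read 'key value' lines, splitting on whitespace (assumed map format)."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and not line.lstrip().startswith("#"):
            table[fields[0]] = fields[1]
    return table

MAPFILE = "avensis toyota\nauris toyota\n3%20series bmw\nx3 bmw\n"
table = parse_map(MAPFILE)

model = "3 series"                  # value as it appears in the database
print(table.get(quote(model)))      # the escaped key matches the map entry
print(len("3 series bmw".split()))  # the unescaped entry has three fields
```

With the space written as %20 the entry is a plain two-field line again, and escaping the lookup key the same way lets it match.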

Thanks again for your help :-)

jdMorgan




msg:3836505
 2:29 pm on Jan 28, 2009 (gmt 0)

Good, now get rid of the redundant and highly-inefficient *double* call to the {lower} map!

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /folder/page\.asp\?model=([^&\ ]+)\ HTTP/
RewriteCond ${lower:%1} (.+)
RewriteRule ^folder/page\.asp$ http://www.example.co.uk/used/${mapfile:${escape:%1}}/%1/car? [NE,R=301,L]

Here, the second RewriteCond calls ${lower}, and replaces the previously-captured car model in %1 with the lowercase equivalent for subsequent use in the rule, likely saving a considerable amount of CPU time per request. This is one of the times where the fact that RewriteCond *replaces* previously-matched back-references comes in handy... :)
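A small Python sketch of that back-reference replacement, for illustration only. The request line and the MAKES lookup are hypothetical stand-ins for the real request and the mapfile, and the map's escape step is left out for brevity.

```python
import re

request_line = "GET /folder/page.asp?model=Avensis HTTP/1.1"  # hypothetical
MAKES = {"avensis": "toyota"}  # hypothetical stand-in for the mapfile

# First condition: capture the model from the raw request line.
captured = re.search(
    r"^[A-Z]+\ /folder/page\.asp\?model=([^&\ ]+)\ HTTP/", request_line
).groups()

# Second condition: test the lowercased capture against (.+); its match
# replaces the earlier back-reference, so %1 is now the lowercase value.
captured = re.search(r"(.+)", captured[0].lower()).groups()

# The rule can now use %1 twice without lowercasing it a second time.
target = f"/used/{MAKES[captured[0]]}/{captured[0]}/car"
print(target)
```

The lowercase conversion runs once, in the second condition, and both uses of %1 in the rule see the already-lowercased value.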

Jim

BBonanza




msg:3836531
 3:13 pm on Jan 28, 2009 (gmt 0)

It didn't work :(
I checked and double checked I had it right.

jdMorgan




msg:3836671
 5:59 pm on Jan 28, 2009 (gmt 0)

Well, it should work as long as I typed it right and you made the proper transformation from the 'generalized' parameter names posted here to the ones you're actually using.

RewriteMap expansions are explicitly allowed in RewriteConds and the syntax is given in the RewriteCond documentation. Just make sure the RewriteCond order is as I showed it, so that the map is called with %1 from the first RewriteCond, and then replaces that %1 value with the lowercased value. Also make sure that any additional RewriteConds you might have are added before the two RewriteConds posted here, so that %1 doesn't get replaced again by a subsequent RewriteCond not shown here.

Jim

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved