Apache Web Server Forum

SEO friendly URLs
JamieEff



 
Msg#: 4464350 posted 1:05 pm on Jun 12, 2012 (gmt 0)

Hi there

Newbie to the forum here (and to mod_rewrite as well!)

Basically I am trying to get blog urls re-written into something a little more acceptable to the search engines.

The url that is generated by the system is something like this:

http://www.example.com/blog-article.php?zzBlog=4 (where the 4 is the unique ID for a specific article)

I am looking to re-write it to look something like this:

http://www.example.com/blog/4.html

I have written the rule below to rewrite it:

RewriteEngine On
RewriteRule ^blog/([^-]*)\.html$ /blog-article.php?zzBlog=$1 [L]


but it doesn't seem to be working.

The .htaccess file is uploaded and in the root of the server.

Any suggestions gratefully received.

Many thanks

Jamie

 

g1smd




 
Msg#: 4464350 posted 5:29 pm on Jun 12, 2012 (gmt 0)

The first step is to alter the links on the page to be in the format you now want the URLs to be in. That step is crucial.

The next step is to rewrite requests in that format to point to the internal filepath where the content really resides.

The [^-]* part is a problem, meaning "not a hyphen, zero or more times". The * should be + (otherwise example.com/blog/.html would be valid) and the - should be something else, as you don't have a hyphen at the end of the part you want to capture.
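As a rough sketch of that fix, and assuming the article IDs are always plain numbers (the thread only shows the ID 4, so this is an assumption), the rule could capture one or more digits instead:

```apache
RewriteEngine On

# Internal rewrite: a request for /blog/4.html is served by /blog-article.php?zzBlog=4
# [0-9]+ requires at least one digit, so /blog/.html no longer matches
RewriteRule ^blog/([0-9]+)\.html$ /blog-article.php?zzBlog=$1 [L]
```

If the IDs can contain other characters, the character class would need widening to suit.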

The final part of the puzzle is to add a set of redirects that intercept external requests for parameterised URLs and redirect to the new 'friendly' URL. This rule will need a preceding RewriteCond looking at THE_REQUEST to ensure that only external requests are redirected and not those as a result of a previous internal rewrite.

There are hundreds of previous threads with example code to take inspiration from.

JamieEff



 
Msg#: 4464350 posted 9:39 am on Jun 13, 2012 (gmt 0)

Thanks for your quick reply g1smd, and I take your point about the previous threads. I did look before I posted, and have since as well, but the problem is that I am not entirely sure what I need to be searching for to find what I am looking to achieve.

I am 100% new to this and, although I know that a certain something can be achieved, it doesn't mean I know how to express it or ask for it in the right way.

I *think* that I am on the right track from your feedback and really appreciate it.

All the best

Jamie

g1smd




 
Msg#: 4464350 posted 9:52 am on Jun 13, 2012 (gmt 0)

Hopefully you have a test subdomain where you can try code out (subdomain is always better than folder, for many reasons). Don't do this on the live site!

Try the suggestions above out and see how you get on. Are there other rules in the htaccess file that might interfere?

Rule order is very important: rules that block access (no point redirecting something you then block), redirects next (to avoid exposing rewritten requests as new URLs) and finally internal rewrites to serve the content.
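A minimal sketch of that ordering, using the blog URLs from the first post. The "private/" folder is purely hypothetical and stands in for whatever you might want to block; the numeric-ID assumption is also mine:

```apache
RewriteEngine On

# 1. Rules that block access come first ("private/" is a hypothetical folder)
RewriteRule ^private/ - [F]

# 2. External redirects next: only direct client requests for the old URL match,
#    because THE_REQUEST is not changed by an internal rewrite
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /blog-article\.php\?zzBlog=([0-9]+)\ HTTP/
RewriteRule ^blog-article\.php$ http://www.example.com/blog/%1.html? [R=301,L]

# 3. Internal rewrites last: serve the friendly URL from the real script
RewriteRule ^blog/([0-9]+)\.html$ /blog-article.php?zzBlog=$1 [L]
```

The trailing ? on the redirect target drops the old query string from the new URL.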

JamieEff



 
Msg#: 4464350 posted 10:02 am on Jun 13, 2012 (gmt 0)

On the site that I am testing this on, this is the only rule in the .htaccess file, so I am pretty free in what I can do, which is great.

Just need to understand what I need to do and how to do it! lol

lucy24




 
Msg#: 4464350 posted 10:27 am on Jun 13, 2012 (gmt 0)

:: shuffling papers ::

The Redirect-to-Rewrite Two-Step

Problem: Your dynamically generated pages have long, ugly, hard-to-memorize URLs, probably containing query strings. You want them to have short pretty URLs.

The Solution comes in two parts.

Part 1. Redirect
When a user asks for the long ugly URL, redirect to the short pretty URL. Basic pattern:

RewriteCond %{THE_REQUEST} \?
RewriteCond %{QUERY_STRING} queryname=([a-z]+)
RewriteRule longcomplicatedURL http://www.example.com/blahblah/%1? [R=301,L]


The %1 is captured from the original query string, and the final ? means that you now get rid of the query string. In real life it will usually be a little more complicated, but that's the basic process.

Example:
user asks for
www.example.com/directory/morestuff/index.php?model=volvo

They get bounced over to
www.example.com/cars/volvo
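For this particular example, Part 1 might look something like the sketch below. It assumes the rule sits in the root .htaccess file, that model is the only parameter, and that model names are lowercase letters; none of that is stated above, so adjust to taste.

```apache
# Only act on requests that arrived from outside with a query string
RewriteCond %{THE_REQUEST} \?
# Capture the model name; %1 refers to this (the last matched) condition
RewriteCond %{QUERY_STRING} ^model=([a-z]+)$
# Redirect to the pretty URL; the trailing ? discards the old query string
RewriteRule ^directory/morestuff/index\.php$ http://www.example.com/cars/%1? [R=301,L]
```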

Part 2. Rewrite
You get an incoming request for a short pretty URL-- either from a new arrival or from someone who was redirected in Part 1. The server can't tell the difference.

RewriteRule blahblah/([a-z]+)$ longcomplicatedURL?queryname=$1 [L]

This time around, you're capturing part of the request and changing it into a query string.

Example:
user asks for
www.example.com/cars/volvo

They may think that's what they're getting-- it's what the browser's address bar says-- but behind the scenes the page content is really coming from
www.example.com/directory/morestuff/index.php?model=volvo
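A sketch of the matching Part 2 rule for this example, again assuming a root .htaccess file and lowercase-letter model names:

```apache
# Internal rewrite: /cars/volvo is served by the real script,
# with the captured name handed over as the model parameter
RewriteRule ^cars/([a-z]+)$ /directory/morestuff/index.php?model=$1 [L]
```

Because there is no R flag, the address bar keeps showing /cars/volvo.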

Now you see why Part 1 had to look at THE_REQUEST. It's for insurance. If something happens later on, your long complicated URL might pass through mod_rewrite again. If it does, you need to be sure it doesn't get re-redirected. Otherwise there will be an infinite loop.

Now wait a minute! Does this mean that if someone starts out asking for "longcomplicatedURL", they go through this whole rigamarole and then they end up right back where they started?

Yup. But they don't know it. They only know what the browser's address bar tells them. Even robots-- yes, even Google-- can't tell that they're being rewritten.

The Redirect part of the package-- Part 1-- is not technically necessary. The Rewrite-- Part 2-- will function without it. But redirecting everyone to the same URL means that everyone is now on the same page ... and it avoids nasty things like Duplicate Content.

But you're not done yet.

Part 0.
Before you do anything with Part 1 and Part 2, go over your current site carefully. Make sure that your own links point only to the short pretty URL. Requests for the long complicated URL should come only from outside-- from people with outdated bookmarks, or old links from other sites. Your own site will use only the pretty URLs.


;)

JamieEff



 
Msg#: 4464350 posted 7:43 pm on Jun 14, 2012 (gmt 0)

Thank you lucy24, I appreciate the input...

Am working my way through trying to make sense of it!

Jamie

JamieEff



 
Msg#: 4464350 posted 12:33 pm on Jun 20, 2012 (gmt 0)

Right, I am making some headway... I think!

I have set up the .htaccess file to include this:

RewriteEngine On
RewriteRule ^blogarticle/([^/]*)\.php$ /blog-article.php?zzBlog=$1 [L]

Where /blog-article.php?zzBlog=5 is rewritten to /blogarticle/5.php, which is great.

My next question (and I am not sure whether this is something for you guys or for the PHP forum) is: how would I go about taking the 5 and replacing it with the actual title of the article?

thanks

Jamie

lucy24




 
Msg#: 4464350 posted 1:07 pm on Jun 20, 2012 (gmt 0)

Ouch.
blogarticle/([^/]*)\.php$

This pattern has two problems. First, it allows requests in the form

blogarticle/.php

with empty names. Second, it makes the server backtrack: after capturing all the way to the end-- because Regular Expressions are greedy by default-- it then slams into "Oh, oops, I was supposed to leave room for a .php".

What you want instead is

[^/.]+

This will ensure the requested filename has a non-zero length (+ instead of *), and it will stop the RegEx before it hits the extension (by including "no periods" in the group).
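Dropped into the rule from the previous post, that change would look roughly like this (only the capture group differs; everything else is as posted):

```apache
# [^/.]+ : one or more characters that are neither a slash nor a period,
# so the capture stops at the dot and an empty name cannot match
RewriteRule ^blogarticle/([^/.]+)\.php$ /blog-article.php?zzBlog=$1 [L]
```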

As to the last question: Do you mean that the visible article name-- which will become a query string-- doesn't correspond directly to the filename you need to pull up? Yup, that's a php question, involving some type of lookup. But g1 probably knows how to do it. (I don't. I don't speak php.) He will also give you a sales pitch for going extensionless as long as you're changing your URLs anyway. (Me, if I see an extensionless URL my first response is to tell it to go put some clothes on ;))

g1smd




 
Msg#: 4464350 posted 7:06 pm on Jun 20, 2012 (gmt 0)

Do you need
/blogarticle/ in the URL? I think not.

www.example.com/b235-insert-witty-title-here

RewriteRule ^b([0-9]+)-(.*) /blogarticle.php?id=$1&name=$2 [L]

The PHP code needs to do a LOT of things.

Look in the database for the requested ID. If it does not exist, return 404 header and include the error page code and text.

If the ID does exist, look at the requested name and compare it with the name found in the database. If there is a difference send a 301 redirect header pointing to the correct URL for this ID.

If the requested ID does exist and the requested name is exactly correct, serve the metadata and page content.

JamieEff



 
Msg#: 4464350 posted 10:29 am on Jun 21, 2012 (gmt 0)

thanks to you both :D

lucy24, I took your advice and made the necessary changes you suggested.

@g1smd that all makes sense bar the name=$2 - I'm not sure what 'name' should be replaced with from the database... is it the contentid column or some other column (or indeed am I way off track...)? Also, I noticed you're in the same area as me... you been in D long?

thanks

Jamie

g1smd




 
Msg#: 4464350 posted 12:33 pm on Jun 21, 2012 (gmt 0)

Name or "slug" is the text part of the URL. Store it as a database record or generate it from the page title (by converting to lowercase, converting & to a hyphen, removing apostrophes and quotes, and converting spaces and punctuation to hyphens, all using RegEx).

Lived here forever.

JamieEff



 
Msg#: 4464350 posted 2:37 pm on Jun 23, 2012 (gmt 0)

Hey g1smd

Sorry, I think my ignorance is showing through - I understand that name *should* be the text, i.e. have-a-nice-day-blog-post.htm (or whatever); what I don't get is how to get the title that is created... if the title is stored as an entry in a table, which column from the table do I use to replace name in your example?

Sorry if I'm not being too clear.. :S

Jamie

g1smd




 
Msg#: 4464350 posted 4:06 pm on Jun 23, 2012 (gmt 0)

Usually it is the page name, i.e. the text found in the <title> element, but converted to lowercase, etc as described above.

JamieEff



 
Msg#: 4464350 posted 10:33 am on Jun 26, 2012 (gmt 0)

Thanks, but I am still struggling to understand where the data is being drawn from.

In your example,
RewriteRule ^b([0-9]+)-(.*) /blogarticle.php?id=$1&name=$2 [L]

I understand where the data is coming from regarding id=$1
but I don't know where the data is supposed to come from for name=$2

Am I supposed to be changing 'name' to a column title from the contents table of the database?

If that's the case, the Title data is stored in one row of that table, but which column would I use? There are:
ContentID 167
ContentGroup 1
ContentRefID Title
ContentUserID 0
ContentValue Page title goes here
ContentDate 2012-06-26 08:53:34
ContentPageName zzBlog
ContentPageSetParentID 81
ContentPageSetOrder 1
ParentID Null

Do I need to change name to something else from the table?

I'm really sorry, but I am completely lost ....... (and clearly out of my depth!)

g1smd




 
Msg#: 4464350 posted 1:09 pm on Jun 26, 2012 (gmt 0)

The data comes from the requested URL.

In turn that comes from the link that was clicked so the first job is to alter the on-page links to point to /345-witty-title-here. The rewrite kicks in after that link is clicked.

I assume you'd use the page title as the text after changing to lower case, converting & to hyphen, removing apostrophes and quotes, and converting punctuation and spaces to hyphens.

When the request is received check to see if the requested ID is valid and return 404 if not. If valid compare the requested name with the value in database for this ID and redirect to canonical form if they are different.

If the name is correct for this ID show the content.

JamieEff



 
Msg#: 4464350 posted 2:24 pm on Jun 26, 2012 (gmt 0)

Right, I think I understand what you are trying to say... the problem is that with the system that I am using I cannot control what the URL is.

Basically the url created would be something like

www.example.com/article.php?BlogItems=4

I am unable to tack on the page title to that... I was hoping I would be able to do that using the mod_rewrite somehow accessing the title from the database itself....

g1smd




 
Msg#: 4464350 posted 5:30 pm on Jun 26, 2012 (gmt 0)

Mod_rewrite cannot make URLs.

Mod_rewrite cannot change URLs.

Mod_rewrite cannot affect the links on the pages of your site.

Mod_rewrite looks at the requested URL after a link is clicked and rewrites the request to fetch the content from the real location of the content inside the server without revealing what that location is.

i.e. you link to href="/345-witty-title-here"

User clicks this link and browser asks for www.example.com/345-witty-title-here

Mod_rewrite sees the request, rewrites the internal pointer and the server then looks internally at /index.php?id=345&name=witty-title-here for the content to be served.
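A rule along these lines would perform that internal rewrite. It is only a sketch, and it assumes the text part of the URL is limited to lowercase letters, digits and hyphens:

```apache
# /345-witty-title-here is served internally by /index.php?id=345&name=witty-title-here
# $1 is the numeric ID, $2 is the text part taken straight from the requested URL
RewriteRule ^([0-9]+)-([a-z0-9-]+)$ /index.php?id=$1&name=$2 [L]
```

Note that name=$2 is filled from the request itself, not from the database; checking it against the database is the script's job.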

You will need to edit the PHP script to
1. link to the new URL format in place of the old, and
2. process the parameterised request checking the ID is valid, and the name is valid for the current ID before attempting to serve any content.

The final step is to add a redirect so that when a parameterised URL is requested, the user is redirected to the new format URL. This updates the URLs that search engines have indexed. Since the new URL contains some text that was not found in the old URL, this redirect also has to be generated by a PHP script hooked up to the database. The user's parameterised external URL request has to be rewritten to point to this special additional PHP script.
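One way that last rewrite might be sketched, using the /article.php?BlogItems=4 URL quoted earlier. The script name redirect-lookup.php is made up for illustration; it stands for the additional PHP script that looks up the title and sends the 301 itself:

```apache
# Match only direct client requests for the old parameterised URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /article\.php\?BlogItems=([0-9]+)\ HTTP/
# Hand the ID to a hypothetical lookup script (internal rewrite, no R flag);
# that script finds the text for this ID and issues the 301 to the new URL
RewriteRule ^article\.php$ /redirect-lookup.php?id=%1 [L]
```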

[edited by: g1smd at 5:56 pm (utc) on Jun 26, 2012]

lucy24




 
Msg#: 4464350 posted 5:53 pm on Jun 26, 2012 (gmt 0)

with the system that I am using I cannot control what the URL is.

Cannot control what the URL is, or cannot control what the filename is? The essence of rewriting is that the two don't have to be the same thing.

You may not be able to store a file under the name /short-and-zippy-title

But there shouldn't be anything preventing you from writing material that says "see further fascinating discussion next door in <a href = "/short-and-zippy-title">this other amazing blog post</a> which I also wrote"

... and then when your user clicks on the link, you fire up mod_rewrite to secretly serve up content from

/preliminaryblahblah/wordpress/longboringurl.php?query=stuff&otherquery=morestuff&thirdquery=stuffyoudontevencareabout

while the user's address bar never shows anything but

/short-and-zippy-title

It is a little trickier with a pre-made CMS because you have to be careful not to interfere with the system's own built-in rewrites. But there is always a way.

System
redhat


 
Msg#: 4464350 posted 7:00 am on Jan 21, 2013 (gmt 0)

The following 2 messages were cut out to new thread by engine. New thread at: apache/4537982.htm [webmasterworld.com]
3:08 pm on Jan 21, 2013 (utc 0)

jasimon9




 
Msg#: 4464350 posted 12:16 am on Jan 26, 2013 (gmt 0)

I have a question regarding the "canonical" method presented by lucy24 in "The Redirect-to-Rewrite Two-Step". Part 0 is to make sure links in my own site only use the pretty URLs. What happens if this does not occur?

The reason I ask is because there are a number of variations in the query string where there are optional GET vars, some of which still need to be used to pass information to the page. For example these are used in cases where the link is from a SERP, or from an email we have sent out, or other special functions.

I don't want to set up a multiplicity of pretty pages to handle all the cases. At present when the query string has different GET vars than my redirect code, the RewriteCond does not match and the URL is just passed through, and it works as it always has. So the simplest thing would be to just let that happen, and only rewrite the main cases. Or as an alternative, rewrite the URL and still include a query string for the special cases. But why bother? Why not just omit rewriting of these cases?

What is the downside of having some links in my site that still use the old URLs? The way it is written in Part 0, it looks important to have everything come through the pretty URL. So I am looking for understanding of what happens if you fix up most of the internal site links, but still have some as described?

g1smd




 
Msg#: 4464350 posted 9:37 am on Jan 26, 2013 (gmt 0)

Why not use the pretty format for the parts that fit into it, and add the extra stuff as parameters?

I assume you've a site with a mix of internal links like:

href="/foobar/54234-quux-wibble"
which used to be

href="/index.php?cat=foobar&item=54234&prod=quux-wibble"
and other links with more parameters such as

href="/index.php?cat=foobar&item=54234&prod=quux-wibble&extraparam=something&another=somestuff"

In this case, I would alter your PHP script so that the last link is rendered as

href="/foobar/54234-quux-wibble?extraparam=something&another=somestuff"

Use the QSA flag in the internal rewrite and you're all set.
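As a sketch of that internal rewrite, assuming categories are lowercase letters, item IDs are digits, and product names use lowercase letters, digits and hyphens:

```apache
# /category/id-name is mapped onto the three fixed parameters;
# QSA appends whatever query string came with the friendly URL
# (extraparam=something&another=somestuff) after them
RewriteRule ^([a-z]+)/([0-9]+)-([a-z0-9-]+)$ /index.php?cat=$1&item=$2&prod=$3 [QSA,L]
```

A request without any extra parameters goes through the same rule; QSA simply has nothing to append.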

If you know the parameter names that will be present, it's also possible to set redirects to the new ("mixed") format from the old (all parameters) format too.

lucy24




 
Msg#: 4464350 posted 2:04 pm on Jan 26, 2013 (gmt 0)

The key element of Rule 0 (did I really call it that?) is the same thing g1 keeps ragging on: link to the URLs you want people to see and use. Redirects are for links coming in from outside, where it's not in your power to change them ahead of time.

I should also say that I was thinking of, uh, semi-static pages. So you can say /round-blue-widgets in place of index.php?item=widget&shape=round&color=blue because that's really a page, even if it's constructed on the fly from various bits of php. But you can't be expected to generate a new pretty URL for every possible combination of items in your visitor's shopping cart.

jasimon9




 
Msg#: 4464350 posted 6:04 pm on Jan 26, 2013 (gmt 0)

g1smd: yes, I was thinking of doing it along the lines you suggest: redirect to the pretty name, but with some parameter still used. However, just letting some of the internal links stay as they are would work too. I was just wondering what the thinking was for "change all the internal links to the pretty name" as given in lucy24's Rule 0. I just wanted to know how "critical" that is. From your answer it appears not that critical and I could go either way.

lucy24: right, in that I think creating redirects for lots of combinations seems counterproductive.

My purpose is to tell the user and the search engine what the page is about.

It is not so much a shopping cart issue as it is "special cases".

For example, one of the pages in question is a landing page that has several variations, depending if they came to the page from a link on the site or from PPC. Also because we are A-B testing different versions of the page for SEO value. So that is controlled by an extra but optional parameter.

Another extra, optional parameter is used in links that we put in emails, which are intended to take the user to view a specific page on the site. But without special handling for that case, if the "login cookie" is present, they get redirected back to their logged in profile. So a parameter is used to detect this case to prevent that unwanted redirect.

So I am thinking to do one of two cases:

1. Just leave all these special cases in place, so that the link is like the following:

real-page-name.php?regularparm1=abc&regularparm2=xyz&noredirect=true&version=external

2. Alternatively, do some rewriting to the pretty URL, but leave some of the parameters, as g1smd suggests:

pretty-page-name.php?noredirect=true&version=external

#1 is obviously much less work, but #2 seems nicer.

I also like the idea of getting rid of the php out of the pretty names as you have suggested, but I want to get the basics working first. There will be php code that needs to be changed to accommodate this additional change because the site is template-driven, and I don't want to be working with that additional factor at the same time.

g1smd




 
Msg#: 4464350 posted 6:32 pm on Jan 26, 2013 (gmt 0)

#2 is much nicer as you can tell Google to ignore those extra parameters, i.e.
/pretty-page-name?noredirect=true&version=external
is really just
/pretty-page-name

If you go with #1 and tell Google to ignore the extra parameters, you're still left with two "base" URLs:
/pretty-page-name and
/real-page-name.php?regularparm1=abc&regularparm2=xyz
and ne'er the twain shall meet.

And, yes, while the file on the server hard drive needs the .php extension in order to function, there is no need to have .php in the URLs that users request.

jasimon9




 
Msg#: 4464350 posted 9:30 pm on Jan 26, 2013 (gmt 0)

How do you tell Google to ignore the other parameters?

But on principle I agree nicer to have the prettiest name possible.

Regarding php suffix: I do not disagree. Just trying to do one thing at a time.

jasimon9




 
Msg#: 4464350 posted 12:45 am on Jan 28, 2013 (gmt 0)

I have gotten the rewrite portion to work as g1smd suggested in his post of 9:37 am on Jan 26, 2013. Using a simple test page redirect.php that just displays its GET parameters, the following rules seem to do the job:


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /redirect\.php\?one=1&two=2&?([A-Za-z0-9=&]*)\ HTTP/
RewriteRule ^redirect\.php$ /redirected.php?%1 [R=301,L]
RewriteRule ^redirected\.php$ /redirect.php?one=1&two=2%1 [L,QSA]


At first I thought the QSA should work by itself without the back references, but I could not get that to work. With this method, the back reference has to be applied to both rewrite rules.

Again the objective is to rewrite a portion of the query string, but pass through some optional parts via the back reference. Please note "one=1" does not imply that the 1 is a parameter that changes, which would of course require the more usual handling.

The reason I am providing what I found to work is in part to see where I might be going wrong.

g1smd




 
Msg#: 4464350 posted 12:57 am on Jan 28, 2013 (gmt 0)

You don't want QSA on the redirect as that would append ALL of the original parameters on the end of the friendly URL. You want to append only some selected parameters.

The redirect target should include the protocol and hostname.

Add a blank line after each RewriteRule to aid human readability.

QSA is needed only on the rewrite. This will re-append, to the internal request, the parameters that were originally attached to the friendly URL request.

In the rewrite, %1 is always empty. The QSA flag does everything you need. Remove the %1.

The site should link to the friendly URL or to friendly URL plus the special parameters.

The redirect rule is there for agents that attempt to access the old style URLs.

Most requests are fulfilled by only the rewrite.


The RewriteCond pattern possibly needs a bit of a rejig. The &? for optional ampersand is problematical. It allows request ending &two=2someotherstuff to match without an & between the two=2 and the someotherstuff. That's probably bad.

If there are more parameters there WILL be an ampersand, perhaps
&two=2(&([A-Za-z0-9=-]+))?\ HTTP/ for one more parameter
&two=2(&([A-Za-z0-9=-]+(&[A-Za-z0-9=-]+)*))?\ HTTP/ for multiple parameters or
&two=2(&([a-z]+=[A-Za-z0-9-]+(&[a-z]+=[A-Za-z0-9-]+)*))?\ HTTP/ for multiple parameters, each with %2 to carry over.
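Pulling those points together, the test rules might end up looking something like this. It is a sketch only: it uses the "multiple parameters" pattern above and assumes www.example.com as the hostname.

```apache
# External redirect: only for direct requests for the old URL.
# %1 is the optional "&extra=..." part, %2 the same without its leading &
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /redirect\.php\?one=1&two=2(&([A-Za-z0-9=-]+(&[A-Za-z0-9=-]+)*))?\ HTTP/
RewriteRule ^redirect\.php$ http://www.example.com/redirected.php?%2 [R=301,L]

# Internal rewrite: restore the fixed parameters; QSA re-appends the
# parameters that came with the friendly URL, so no %1 is needed here
RewriteRule ^redirected\.php$ /redirect.php?one=1&two=2 [QSA,L]
```

When there are no extra parameters, %2 is empty and the redirect target ends in a bare ?, which clears the query string.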

lucy24




 
Msg#: 4464350 posted 2:18 am on Jan 28, 2013 (gmt 0)

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /redirect\.php\?one=1&two=2&?([A-Za-z0-9=&]*)\ HTTP/
RewriteRule ^redirect\.php$ /redirected.php?%1 [R=301,L]
RewriteRule ^redirected\.php$ /redirect.php?one=1&two=2%1 [L,QSA]


mod_rewrite Rule: Put a blank line before each Rule. This will help you remember something which this package strongly suggests you've forgotten: A RewriteCond applies only to the immediately following Rule-- whether or not that Rule actually executes. So the %1 in the second Rule is meaningless-- not just empty but potentially 500-meaningless-- because there was no RewriteCond to take it from.

Keeping the existing query string is the default when the target has no query of its own. You only need to say [QSA] explicitly when the target of your rule contains a ? --meaning that you've added a query-- and you also want to preserve the existing query. Note that once you've done this, you can't make any assumptions about which individual query comes where, so be careful with ^ and $ anchors in any Condition that looks at the QUERY_STRING.
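The three cases can be put side by side. The page names and the source parameter below are invented purely for illustration:

```apache
# No ? in the target: the request's query string is passed through untouched
RewriteRule ^page-a$ /real-page.php [L]

# ? in the target, no QSA: the request's query string is replaced by source=a
RewriteRule ^page-b$ /real-page.php?source=a [L]

# ? in the target plus QSA: the request's query string is appended after source=a
RewriteRule ^page-c$ /real-page.php?source=a [QSA,L]
```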

g1smd




 
Msg#: 4464350 posted 2:29 am on Jan 28, 2013 (gmt 0)

Put a blank line after* each Rule.
