.htaccess with mod rewrite or Wordpress-plugin?

URL rewriting, better choice in the long run

     
4:40 pm on Aug 4, 2014 (gmt 0)

Full Member

10+ Year Member Top Contributors Of The Month

joined:Feb 22, 2008
posts: 345
votes: 0


Hi,
let's say you have to solve a small URL-rewriting issue for a relaunch on WordPress. It has to do with file extensions like .php, .htm, .html, etc.

Let's say there are two working ways. Which would you choose with "the long run" of the site in mind, i.e., leaving as little room as possible for future problems, forced changes and dependencies:

-one of the available WordPress plugins that do a more or less small rewrite within WP, supplementing the usual dynamic URL creation

-rewriting the URL with mod_rewrite, server-internal, without a 301 (the relaunch with WP doesn't actually change URLs, i.e., domain.de/bla.html --> domain.de/bla.html)

Considering the long run, and not wanting to face the more or less "usual" issues like plugin/WP updates, htaccess seems preferable?
Dependency, security, speed? htaccess again better, I suppose?

This isn't only about one particular site; it's more of a basic question for me, one that will concern other sites in the future too.

Thanks,


deeper
7:00 pm on Sept 7, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15934
votes: 887


Had no idea this could be important.

You have to make sure the Regular Expression captures all characters that actually occur, and none that don't.

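\w = alphanumerics and also lowline _
Some regex flavors extend "alphanumeric" to letters in all scripts, including accented letters; in Apache's regex engine (PCRE) \w normally covers only ASCII letters, digits and the lowline. Either way, accented letters probably don't occur in your URLs.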

[\w-] = alphanumerics, lowline and hyphen

You will often see the simpler form
.+
but this is sloppy because the server then captures the entire thing-- including directory slashes and any .extension --and then has to backtrack to omit ".html".

20 named pages is OK. You can list them all as
(page1|page2|page3|etcetera)
but if there are only a few extensionless URLs in this location, it may be simpler to say
^([\w-]+)\.html
and let the server ignore the ones that don't match. A lot really comes down to probability. Can I assume you don't have many non-page files lying loose in the top-level directory? (Things like favicon.ico or robots.txt don't matter, because those requests are so rare by comparison with page requests.)
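
To make the "let the server ignore the ones that don't match" part concrete, here is a small sketch that runs the pattern against a few request paths. It uses Python's re module as a stand-in for Apache's regex engine, on the assumption that the two agree for a pattern this simple once re.ASCII is set. The sample paths are invented for illustration, and they carry no leading slash because that is how a RewriteRule in a root htaccess sees them.

```python
import re

# Pattern from the post above; re.ASCII limits \w to [A-Za-z0-9_].
TOP_LEVEL_HTML = re.compile(r'^([\w-]+)\.html', re.ASCII)

# Hypothetical request paths (no leading slash, as in per-directory context).
samples = [
    "page-with-text.html",
    "dir/page-with-pics.html",
    "favicon.ico",
    "robots.txt",
    "brochure.pdf",
]

def captured_name(path):
    """Return the captured page name, or None when the rule would not fire."""
    m = TOP_LEVEL_HTML.match(path)
    return m.group(1) if m else None

results = {path: captured_name(path) for path in samples}
for path, name in results.items():
    print(f"{path!r:30} -> {name}")
```

Only the top-level .html page yields a capture. The page inside dir/ falls through because [\w-] cannot cross the slash, and the non-page files fall through because nothing ending in .html follows the name.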
1:06 pm on Sept 8, 2014 (gmt 0)

Full Member

10+ Year Member Top Contributors Of The Month

joined:Feb 22, 2008
posts: 345
votes: 0


There are about 20 pages like example.com/dir/page-with-pics.html and about 40 like example.com/page-with-text.html.

The first case is the one I could list as
(page1|page2|page3|.....20 etcetera) ? Not the other 40 ones?

At the moment (!) there are not many non-page files in my top-level directory: Two .doc, one .pdf, css-directory, pics-directory.
But isn't it more important how things are after relaunch with WP, i.e., the WP directory structure which you can see here:
[lyfac.me...] (btw, this pic doesn't show about 20 .php-files in the main folder; it shows only folders).

I would prefer the best solution, for example in terms of reliability and performance. As this work only has to be done once, I don't mind naming 40 or 20 pages. So if this is the best choice, OK.
3:55 pm on Sept 8, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15934
votes: 887


Any time that the entire contents of a directory are affected, it is enough to express the pattern as

^(dir1/[\w-]+)\.html
or for multiple directories
^((?:dir1|dir2|dir3)/[\w-]+)\.html

and that will catch everything.

If all .html pages at the top level are to be handled by WP, you can say

^([\w-]+)\.html

Any non-page files in this location will escape the rule. The drawback is that on every request, the server has to capture the whole filename-- and then throw away the capture if it turns out there is no following .html. But then, the server would also have to throw away the capture if it turns out [\w-]+ is followed by a directory slash.

40 files is, in my estimation, too many to list by name-- checking for all those matches is more work for the server than having to throw away the occasional non-match. But it may be less complicated if you separate the rules: one rule for the pages in named directories, and then a second one for pages lying loose in the root. No matter what you do, the server will have to evaluate all requests. Nothing you can do about that. But at least they're conditionless rules.

If you have 40 .html files in the root, and 38 of them are to be handled by WP, then the best approach is to add a condition like

RewriteCond %{REQUEST_URI} !^/(page1|page2)

so the rule says "all URLs except the ones I specify".
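
A rough way to see which URLs such a condition lets through, again using Python's re module as a stand-in for Apache (an assumption, not a test against a real server). page1 and page2 are the placeholders from the condition above; the other URIs and the stricter variant are hypothetical additions for comparison only.

```python
import re

# Condition pattern from the post (the leading "!" in RewriteCond negates it).
LOOSE = re.compile(r'^/(page1|page2)')
# Hypothetical stricter variant, not from the post: pins the extension and the end.
STRICT = re.compile(r'^/(page1|page2)\.html$')

def rule_applies(request_uri, exclude):
    """Mimic 'RewriteCond %{REQUEST_URI} !pattern': True when the URI is NOT excluded."""
    return exclude.search(request_uri) is None

# REQUEST_URI keeps its leading slash, unlike the RewriteRule pattern.
uris = ["/page1.html", "/page2.html", "/page10.html", "/about.html"]
loose = {u: rule_applies(u, LOOSE) for u in uris}
strict = {u: rule_applies(u, STRICT) for u in uris}

for u in uris:
    print(f"{u:15} loose: {loose[u]!s:6} strict: {strict[u]}")
```

With real page names the loose form is usually enough. The difference only shows when one excluded name is a prefix of another page's name, as with page1 and page10 here.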


Oh yes and... In any rule that ends in an internal rewrite, make sure the .html has a closing anchor
\.html$
Otherwise you risk Duplicate Content if a request comes in with appended garbage at the end.
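
The effect of the closing anchor can be checked quickly outside the server. This is a sketch with Python's re module standing in for Apache's regex engine (assumed equivalent for these patterns); the request paths are made up.

```python
import re

UNANCHORED = re.compile(r'^([\w-]+)\.html', re.ASCII)   # no closing anchor
ANCHORED = re.compile(r'^([\w-]+)\.html$', re.ASCII)    # with closing anchor

# Hypothetical requests: one clean URL and three with appended garbage.
requests = ["about.html", "about.htmlx", "about.html/extra", "about.html.bak"]

unanchored_hits = [r for r in requests if UNANCHORED.match(r)]
anchored_hits = [r for r in requests if ANCHORED.match(r)]

print("without $:", unanchored_hits)
print("with $:   ", anchored_hits)
```

Without the anchor all four requests are rewritten to the same page, which is where the duplicate content would come from. With it, only about.html is.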
6:20 pm on Sept 9, 2014 (gmt 0)

Full Member

10+ Year Member Top Contributors Of The Month

joined:Feb 22, 2008
posts: 345
votes: 0


OK, let me summarize the facts, so we can find the best way:

-All existing pages have .html as extension.

-All future posts and pages will have no .html. Within WP there will only be pages and posts without .html: the existing pages can only have extensionless URLs in WP, and the future posts will have none because I want it that way.

-There are only two URL-patterns:
www.example.com/page1-with-text.html
www.example.com/dir/page-with-pics.html

-At the moment the WP main folder (not root of webserver) has about 40 pages, two .doc, two .pdf, pics- and CSS-folder.

Could you please give me the explicit code for both ways, the "file alternative without page listing" and the "condition alternative"? We can drop the "listing option".

Finally, could you give me your recommendation: which of the two would you choose? Obviously they are both "good".

Regarding DC and \.html$:
WP automatically adds a canonical tag pointing to the page itself, so requests with garbage at the end of the URL should already be taken care of.
Is it wise to do both? As you know, there's the German saying: "Zu viele Köche verderben den Brei" ("Too many cooks spoil the broth") :).
7:34 pm on Sept 9, 2014 (gmt 0)

Full Member

10+ Year Member Top Contributors Of The Month

joined:Feb 22, 2008
posts: 345
votes: 0


Forgot to mention: as internal links generated by WP will only cause potential DC issues, I prefer the internal URL rewrite.
5:50 am on Sept 10, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15934
votes: 887


not root of webserver

Never mind the server. Is WP installed at the root level of its domain, so all requests for the domain pass through the same htaccess file? If yes, then the rules are simply

RewriteRule ^((?:dir1|dir2|dir3)/[\w-]+)\.html$ /index.php?$1 [L]


and

RewriteRule ^([\w-]+)\.html$ /index.php?$1 [L]


These rules go immediately before the #WordPress# section.

Now, technically the two rules could be collapsed into a single rule

RewriteRule ^((?:(?:dir1|dir2|dir3)/)?[\w-]+)\.html$ /index.php?$1 [L]


(note the extra parentheses and question marks). But I am inclined to think that doing it this way creates more work for the server.

All of these rules mean: "Take any request for URLs ending in .html and send them directly to the WP engine". URLs that don't end in .html-- whether because they are extensionless or because they have some other extension-- simply won't match the rule.
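
For anyone who wants to convince themselves that the collapsed rule really matches the same requests as the two separate ones, here is a sketch in Python. The re module stands in for Apache's regex engine (assumed to behave the same for these patterns), dir1/dir2/dir3 are the placeholders from the rules above, and the sample paths are hypothetical. It says nothing about which form is less work for the server.

```python
import re

# The three patterns from the rules above.
DIR_RULE = re.compile(r'^((?:dir1|dir2|dir3)/[\w-]+)\.html$', re.ASCII)
ROOT_RULE = re.compile(r'^([\w-]+)\.html$', re.ASCII)
COMBINED = re.compile(r'^((?:(?:dir1|dir2|dir3)/)?[\w-]+)\.html$', re.ASCII)

def two_rules(path):
    """Try the directory rule, then the root rule; return the capture or None."""
    for rule in (DIR_RULE, ROOT_RULE):
        m = rule.match(path)
        if m:
            return m.group(1)
    return None

def one_rule(path):
    """Same check with the single collapsed pattern."""
    m = COMBINED.match(path)
    return m.group(1) if m else None

# Hypothetical request paths, without leading slash as htaccess sees them.
paths = [
    "dir1/page-with-pics.html",
    "page-with-text.html",
    "other/page.html",
    "dir1/sub/page.html",
    "notes.doc",
    "wp-login.php",
]

for p in paths:
    print(f"{p:28} two rules: {two_rules(p)!s:22} one rule: {one_rule(p)}")
```

Both forms capture the same thing for the two .html patterns and leave everything else alone, including .php and .doc files and directories that are not listed.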

"Zu viele Köche verderben den Brei"

Heh. I don't think it's the appropriate proverb here, though. In fact what you want to do is leave WP itself as little to do as possible.

:: nebulous mental image of server acting as a bunch of sous-chefs skimming off all the gruntwork so the master chef at WP only has to deal with the fancy stuff, possibly having to do with movie I was dragged out to see the week before last ::

Don't rely on WP-generated redirects or "canonical" tags if you can handle the issue cleanly in htaccess. Was it this thread or a different one where someone (not me) explained very eloquently what happens when WP receives a request for an URL that it has to redirect? NOTHING is more work for the server than a CMS.
9:30 pm on Sept 11, 2014 (gmt 0)

Full Member

10+ Year Member Top Contributors Of The Month

joined:Feb 22, 2008
posts: 345
votes: 0


Using Filezilla shows me the server like this:
At the top is the account root "/". It contains six folders:

-"family name" = old WP test installation (forget it)

-"prename" = current WP test installation. It contains all the individual WP folders and files, for example xmlrpc.php. It is assigned to a subdomain.

-"website" = HTML-Site, which is still online and live for all visitors, with the normal domain.

-Statistics, errorlogs, logs

Researching WP's canonical handling showed me that it makes sense, especially for a lot of special cases where wrong links with parameters or uppercase letters appear.
Furthermore, using BOTH canonical and redirects can cause bigger problems if you don't always know exactly what you're doing.
Therefore, in general, I prefer "less is more".

Do you think it makes sense to do both, and that they will hardly interfere with each other?
2:45 pm on Sept 15, 2014 (gmt 0)

Full Member

10+ Year Member Top Contributors Of The Month

joined:Feb 22, 2008
posts: 345
votes: 0


Regarding the already existing pages with .html: within WP they have no .html, so WP will automatically generate canonicals without .html too.

Will this produce duplicate content that isn't prevented by using .html$?