Apache Web Server Forum

    
Rewriting a directory folder, to a root folder
wiki.example.com/directory/Page_Name to wiki.example.com/Page_Name
chopin2256

5+ Year Member



 
Msg#: 3871627 posted 7:53 pm on Mar 16, 2009 (gmt 0)

I run MediaWiki on my site. My current URL structure is wiki.example.com/directory/Page_Name

I want the url structure to be wiki.example.com/Page_Name

I do not want to put the files in root directory. Currently, my htaccess script is as follows:

RewriteRule ^directory/?(.*)$ /wiki/index.php?title=$1 [L,QSA]

Where /wiki/ is the actual location of my wiki, and directory/ is the virtual path. The above works fine. But I am having trouble rewriting the urls to root, because my wiki files are in a subdirectory. I have two questions:

1. Is it possible to have my wiki pages serve in the root even though they are in a subdirectory?
2. If yes, how?

Thanks!

 

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3871627 posted 8:03 pm on Mar 16, 2009 (gmt 0)

Yes it is possible.

You need a rewrite that accepts 'root' URL requests and rewrites them to fetch content from the script in the sub-folder.

The difficult parts will be:
- restricting the rewrite so that it only captures valid URL requests that should be rewritten (i.e. it should not capture requests for robots.txt and such like),
- making sure that links to CSS, JS, and image files still work. If they are 'relative' links, the browser will resolve them against the new root-level URL and this change will 'break' them. They will need to be changed to root-relative links (beginning with /) instead.
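
A minimal sketch of such a rewrite, assuming the .htaccess file sits in the document root of wiki.example.com and the wiki script lives at /wiki/index.php as described in the first post. The two RewriteCond lines are one common way to skip requests for files and directories that really exist (robots.txt, CSS, images); they are an assumption here, not something the posters used:

```apache
RewriteEngine On

# Skip the rewrite when the request maps to an existing file (e.g. robots.txt)
RewriteCond %{REQUEST_FILENAME} !-f
# Skip the rewrite when the request maps to an existing directory (e.g. /wiki/)
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else requested in the root is handed to the wiki script as the page title
RewriteRule ^(.+)$ /wiki/index.php?title=$1 [L,QSA]
```

With rules along these lines, a request for /Page_Name would be served by /wiki/index.php?title=Page_Name, while /robots.txt and the files under /wiki/ would be served directly, provided they exist on disk.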

chopin2256




 
Msg#: 3871627 posted 8:35 pm on Mar 16, 2009 (gmt 0)

Ok I was wondering why it was going so slow, and the CSS was all messed up. I tried this:

RewriteRule ^(.*)$ /wiki/index.php?title=$1 [L,QSA]

And I got a 500 Internal Server error. Then I tried this:

RewriteRule ^(.*)$ wiki/index.php?title=$1 [L,QSA]

Without the leading slash in front of wiki, it worked; however, it was slow and the CSS was all broken. So it looks like I may need to put the CSS and image files in the root?

Also, how do I do this:

- restricting the rewrite so that it only captures valid URL requests that should be rewritten (i.e. it should not capture requests for robots.txt and such like),

jdMorgan

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3871627 posted 9:05 pm on Mar 16, 2009 (gmt 0)

For more information, read this thread from our library [webmasterworld.com].

Jim

[edited by: jdMorgan at 11:15 pm (utc) on Mar. 16, 2009]

chopin2256




 
Msg#: 3871627 posted 12:16 am on Mar 17, 2009 (gmt 0)

Thanks for the info. I have most of this working now, including the CSS and Image rewrites. However, I still do not understand this statement:

it should not capture requests for robots.txt

I don't see why this is a problem?

jdMorgan




 
Msg#: 3871627 posted 12:50 am on Mar 17, 2009 (gmt 0)

Does your wiki/index.php script generate a correct robots.txt file for the site? If not, then rewriting robots.txt requests to your script is a problem.

Jim

chopin2256




 
Msg#: 3871627 posted 1:15 am on Mar 17, 2009 (gmt 0)

My robots.txt file is in the root, created by me (not in the wiki folder), and I have it blocking only the /wiki/ folder. This way, only my root article pages will be spidered. I have nothing else in the root directory, just robots.txt and /wiki/. This is a subdomain, and its only purpose is to run the wiki.

If this setup is a problem, I still don't see why, despite the warnings you and MediaWiki give.

jdMorgan




 
Msg#: 3871627 posted 1:23 am on Mar 17, 2009 (gmt 0)

In that case, I suggest you test your code and your site by requesting robots.txt with your browser. If it works properly, and you see your robots.txt file contents in your browser, then fine.

Jim

[edited by: jdMorgan at 1:23 am (utc) on Mar. 17, 2009]

g1smd




 
Msg#: 3871627 posted 1:34 am on Mar 17, 2009 (gmt 0)

Yes, try to view the robots.txt in your browser. Can you see it, or does your script return 'junk' for that request?

chopin2256




 
Msg#: 3871627 posted 1:40 am on Mar 17, 2009 (gmt 0)

Ohhhh, whoops. I understand the warnings now. The robots.txt won't show. Arggg, is there any solution to this other than using another directory?

g1smd




 
Msg#: 3871627 posted 2:00 am on Mar 17, 2009 (gmt 0)

Yes. Make the rewrite less permissive so that only URL requests that are supposed to be handled by the script are actually going to match the URL pattern in the rewrite.

On a site I recently worked on, all URL paths to be fed to the rewrite were simply a ten digit number with no extension. A simple pattern ensured that only those matched, and everything else was not rewritten to the script.
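
As an illustration of that kind of tight pattern, here is a sketch in which /script.php and the id parameter are hypothetical names (the script used on that site is not given in the post):

```apache
RewriteEngine On

# Only paths that are exactly ten digits with no extension are rewritten;
# robots.txt, images, CSS and everything else fall through untouched.
RewriteRule ^([0-9]{10})$ /script.php?id=$1 [L]
```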

You need to study your URL patterns closely to derive a simple regex pattern that matches everything it needs to match while not matching robots.txt, images, CSS, JS, SE verification files, and other such files. You can 'negative match' specific names with a RewriteCond, but you need to be thorough.

Do you know what Google would have done with whatever you were actually returning for the robots.txt URL? No? Neither do I. That's why it is very important to be very careful with this stuff. Bad rewrites/redirects can drop your entire site out of Google.
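
A minimal sketch of the 'negative match' approach mentioned above, assuming a .htaccess file in the document root and the wiki at /wiki/index.php. The list of excluded names is illustrative only (favicon.ico is an assumption) and would need to be extended to cover every real file on the site:

```apache
RewriteEngine On

# Do not rewrite these specific root-level files
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteCond %{REQUEST_URI} !^/favicon\.ico$
# Do not rewrite anything already under the real wiki folder (script, skins, images)
RewriteCond %{REQUEST_URI} !^/wiki/
RewriteRule ^(.+)$ /wiki/index.php?title=$1 [L,QSA]
```

The /wiki/ exclusion also keeps the already rewritten request from being rewritten a second time.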

chopin2256




 
Msg#: 3871627 posted 4:19 am on Mar 17, 2009 (gmt 0)

Thanks for your help. Now I get it. I allowed for the robots.txt to be displayed. If I can view this file in the browser, is it safe to say that Google can read robots.txt correctly?

chopin2256




 
Msg#: 3871627 posted 12:53 pm on Mar 17, 2009 (gmt 0)

I still have one more problem....

Note these facts:

1. Wiki is in /wiki/
2. I am redirecting URLs successfully to root with this code:

RewriteRule ^(robots)\.txt - [L]
RewriteRule ^[^:]*[./] - [L]
RewriteRule ^(.+)$ wiki/index.php?title=$1 [PT,L,QSA]

The top line of code lets robots.txt through without rewriting it.
The second line of code is a negative match: any request containing a dot or a slash before any colon (files and directory folders) is left alone, allowing css/skins/images to work.
The last line redirects to root.

Now to the problem:

Edit and history pages do not display correctly. When I click on edit for instance, this is the url that is displayed:

[wiki.example.com...]

Let's say I am editing a page called "Page_Name". Usually if there is text on the page, it will say "Editing Page_Name" and of course, I could edit. But instead, I get a page with no text in the edit box, and I get this in the title on the page "Editing Wiki/index.php"

Any assistance on this would be greatly appreciated. I am trying my best here.

g1smd




 
Msg#: 3871627 posted 8:35 pm on Mar 17, 2009 (gmt 0)

*** I am redirecting URLs successfully to root with this code ***

That is not how it works. You have this backwards.

The code takes URL requests for URLs in the root and rewrites the URL request to get the content from a different internal filepath within the server.

That is, a rewrite does not 'make' a URL. URLs exist when they appear in a link that someone can click on. The rewrite connects that request for URL 'A' to the place on the server, at 'B', where that content resides.
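
To make the distinction concrete, here is a hedged sketch that puts the two kinds of rule side by side. It assumes a .htaccess file in the document root, and it assumes the old /directory/ URLs should be redirected to the new root URLs, which the thread does not state:

```apache
RewriteEngine On

# External redirect: the server answers with a 301 and the browser's
# address bar changes from /directory/Page_Name to /Page_Name.
RewriteRule ^directory/(.+)$ http://wiki.example.com/$1 [R=301,L]

# Internal rewrite: the browser still shows /Page_Name; the server
# fetches the content for that URL from /wiki/index.php instead.
RewriteCond %{REQUEST_URI} !^/(wiki/|robots\.txt$)
RewriteRule ^(.+)$ /wiki/index.php?title=$1 [L,QSA]
```

Only the first rule changes what the visitor sees. The second one never 'creates' the /Page_Name URL; that URL exists only because the wiki's own links point to it.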
