|Rewriting a directory folder to a root folder|
wiki.example.com/directory/Page_Name to wiki.example.com/Page_Name
I run MediaWiki on my site. My current URL structure is wiki.example.com/directory/Page_Name
I want the URL structure to be wiki.example.com/Page_Name
I do not want to put the files in the root directory. Currently, my .htaccess script is as follows:
RewriteRule ^directory/?(.*)$ /wiki/index.php?title=$1 [L,QSA]
Where /wiki/ is the actual location of my wiki, and directory/ is the virtual path. The above works fine, but I am having trouble rewriting the URLs to the root, because my wiki files are in a subdirectory. I have two questions:
1. Is it possible to have my wiki pages serve in the root even though they are in a subdirectory?
2. If yes, how?
Yes, it is possible.
You need a rewrite that accepts 'root' URL requests and rewrites them to fetch content from the script in the subfolder.
The difficult parts will be:
- restricting the rewrite so that it only captures valid URL requests that should be rewritten (i.e. it should not capture requests for robots.txt and the like),
- making sure that links to CSS, JS, and image files still work. If they are 'relative' links, then this rewrite will 'break' them; they will need to be changed to root-relative links (beginning with /) instead.
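A rough starting point might look something like this (a sketch only, not a drop-in solution; it assumes the wiki script is at /wiki/index.php and uses filesystem checks to pass real files and folders through untouched):

```apache
# .htaccess in the document root -- sketch, assuming the wiki lives in /wiki/
RewriteEngine On

# Pass requests for real files and directories through untouched
# (robots.txt, CSS, JS, images, the /wiki/ folder itself, ...)
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]

# Treat everything else as a wiki page title
RewriteRule ^(.+)$ /wiki/index.php?title=$1 [L,QSA]
```

Because the file/directory checks match on the second pass through the rules, the rewritten request for the real /wiki/index.php is passed through rather than rewritten again, which avoids a loop.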
OK, I was wondering why it was going so slow and the CSS was all messed up. I tried this:
RewriteRule ^(.*)$ /wiki/index.php?title=$1 [L,QSA]
And I got a 500 Internal Server Error. Then I tried this:
RewriteRule ^(.*)$ wiki/index.php?title=$1 [L,QSA]
Without the leading slash in front of wiki, and it worked; however, it was slow, and the CSS was all broken. So it looks like I may need to put the CSS and image files in the root?
Also, how do I do this:
|- restricting the rewrite so that it only captures valid URL requests that should be rewritten (i.e. it should not capture requests for robots.txt and such like), |
For more information, read this thread from our library [webmasterworld.com].
Thanks for the info. I have most of this working now, including the CSS and image rewrites. However, I still do not understand this statement:
|it should not capture requests for robots.txt |
I don't see why this is a problem?
Does your wiki/index.php script generate a correct robots.txt file for the site? If not, and you rewrite robots.txt requests to your script, it's a problem.
My robots.txt file is in the root, created by me (not in the wiki folder), and I have the robots only blocking the /wiki/ folder. This way, only my root article pages will be allowed to be spidered. I have nothing else in the root directory, just robots.txt and /wiki/. This is a subdomain, and its only purpose is to run the wiki.
If this setup is a problem, I still don't see why, despite the warnings you and MediaWiki give.
In that case, I suggest you test your code and your site by requesting robots.txt with your browser. If it works properly, and you see your robots.txt file contents in your browser, then fine.
Yes, try to view the robots.txt in your browser. Can you see it, or does your script return 'junk' for that request?
Ohhhh, whoops. I understand the warnings now. The robots.txt won't show. Arggg, is there any solution to this other than using another directory?
Yes. Make the rewrite less permissive so that only URL requests that are supposed to be handled by the script are actually going to match the URL pattern in the rewrite.
On a site I recently worked on, all URL paths to be fed to the rewrite were simply a ten-digit number with no extension. A simple pattern ensured that only those matched, and everything else was not rewritten to the script.
You need to study your URL patterns closely to derive a simple regex pattern that matches everything it needs to match while not matching robots.txt, images, CSS, JS, SE verification files, and other such files. You can 'negative match' specific names with a RewriteCond, but you need to be thorough.
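For example, the 'negative match' with RewriteCond might be sketched like this (the file names and extensions here are illustrative, not a complete list):

```apache
# Sketch: skip well-known files and static assets, rewrite the rest
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteCond %{REQUEST_URI} !^/favicon\.ico$
RewriteCond %{REQUEST_URI} !\.(css|js|png|jpg|gif|ico)$ [NC]
RewriteRule ^(.+)$ /wiki/index.php?title=$1 [L,QSA]
```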
Do you know what Google would have done with whatever you were actually returning for the robots.txt URL? No? Neither do I. That's why it is very important to be very careful with this stuff. Bad rewrites/redirects can drop your entire site out of Google.
Thanks for your help. Now I get it. I allowed for the robots.txt to be displayed. If I can view this file in the browser, is it safe to say that Google can read robots.txt correctly?
I still have one more problem....
Note these facts:
1. Wiki is in /wiki/
2. I am redirecting URLs successfully to root with this code:
RewriteRule ^(robots)\.txt - [L]
RewriteRule ^[^:]*[./] - [L]
RewriteRule ^(.+)$ wiki/index.php?title=$1 [PT,L,QSA]
The top line of code allows robots.txt through.
The second line of code excludes any request containing a dot or a slash (allowing css/skins/images to work).
The last line redirects to root.
Now to the problem:
Edit and history pages do not display correctly. Let's say I am editing a page called "Page_Name". When I click on edit, it usually says "Editing Page_Name" if there is text on the page, and of course I could edit. But instead, I get a page with no text in the edit box, and the title on the page reads "Editing Wiki/index.php".
Any assistance on this would be greatly appreciated. I am trying my best here.
|I am redirecting URLs successfully to root with this code|
That is not how it works. You have this backwards.
The code takes URL requests for URLs in the root and rewrites the URL request to get the content from a different internal filepath within the server.
That is, a rewrite does not 'make' a URL. URLs exist when they appear in a link that someone can click on. The rewrite connects that request for URL 'A' to the place on the server, at 'B', where that content resides.
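In MediaWiki's case, the URLs that appear in links come from the wiki's own configuration, not from the rewrite. A common pairing for this kind of setup is to tell MediaWiki to write root-style page links while keeping the script path where it really is (a sketch; the values shown are assumptions for a wiki installed in /wiki/ with pages served from the root):

```php
<?php
// LocalSettings.php (sketch) -- wiki files live in /wiki/,
// but page URLs should appear at the root of the subdomain
$wgScriptPath  = "/wiki";  // where index.php actually lives
$wgArticlePath = "/$1";    // the URL form MediaWiki writes into its links
```

If the wiki is still emitting /wiki/index.php-style links, edit and history pages will be requested at the wrong URL and can end up mangled by the root rewrite, which is consistent with the symptom described above.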