Google indexes everything. Looking into the contents of a zip file is trivial; one of my text editors does it by default. But Google does recognize the
Disallow: /*.xtn$
format in robots.txt. So you can ask them not to crawl certain filetypes. Or, in the alternative, let them crawl and apply a universal "noindex". Yes, they crawl files they can't index. "Well, how was
I to know that .midi is a sound file? It could have been a disguised page!"
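If you go the "noindex" route for raw files, the usual mechanism is an X-Robots-Tag response header, since you can't put a meta tag inside a zip. A minimal sketch, assuming Apache with mod_headers enabled (the extension list is purely illustrative):

<FilesMatch "\.(zip|rar|midi?)$">
Header set X-Robots-Tag "noindex"
</FilesMatch>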
The URLs don't lead directly to downloads; they lead to a preliminary page with the actual file download link, which the user clicks.
And the different download formats each have a page of their own? As long as it's a page, making it extensionless should not be a problem. The only question is how many pages are involved. With just a few, you can make individual RewriteRules: one set for the preliminary redirects, a second set for the rewrites. But if there are many, it will probably work better to detour to a PHP script that does the lookup in each direction. The rule itself would then look something like this:
# pick up the old-style query (name and extension captured for reuse)
RewriteCond %{QUERY_STRING} page=(\w+)\.(zip|rar)
# and hand the request to a script that knows the new page names
RewriteRule ^blahblah/getfile\.php /fixup.php?type=%2&name=%1 [L]
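Purely for illustration, the receiving script might look like this. Everything in it (the lookup table, the page names, the 301 choice) is a hypothetical sketch of one direction of the lookup, not your actual setup:

<?php
// hypothetical fixup.php: translate the old ?page=name.ext query
// into the new extensionless page and send the client there
$map = array(
    'whitepaper' => '/downloads/whitepaper',
    'manual' => '/downloads/manual',
);
$name = isset($_GET['name']) ? $_GET['name'] : '';
$type = isset($_GET['type']) ? $_GET['type'] : '';
if (isset($map[$name]) && in_array($type, array('zip', 'rar'), true)) {
    // old query URL -> new extensionless preliminary page
    header('Location: ' . $map[$name], true, 301);
} else {
    header('HTTP/1.1 404 Not Found');
}
exit;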
Position the RewriteRule among your specific redirects, even though on the surface it's only an internal rewrite. Add a THE_REQUEST condition (sketched below) if the "real" filename-plus-query is the same as the URL you're redirecting from; if it's different, the condition probably isn't needed.
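A sketch of that variant, again with placeholder names (the extra condition makes the rule fire only on the original client request, so the internally rewritten URL can't loop back into it):

RewriteCond %{THE_REQUEST} \ /blahblah/getfile\.php\?page=
RewriteCond %{QUERY_STRING} page=(\w+)\.(zip|rar)
RewriteRule ^blahblah/getfile\.php /fixup.php?type=%2&name=%1 [L]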
And then we get into the non-Apache question, which is: If each of those pages is just a link to the download, is there even enough content on the page to be worth indexing? You may be better off slapping a global "noindex" on all of them.
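If the preliminary pages are ordinary HTML, the cheapest way to do that is a robots meta tag in each page's head, or the same X-Robots-Tag header trick from above:

<meta name="robots" content="noindex">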