This 33 message thread spans 2 pages.
Mod Rewrite Not Working Properly. Why?
url rewrite not working
I'm very new to rewriting URLs. I've only been learning about it for two days!
I'm trying to rewrite my URLs, but it's not working. I have changed the URLs on my website to look like this:
<a href="/12/how-to-lose-weight-fast/">How to Lose Weight Fast</a>
My .htaccess file looks like this:
RewriteRule ^/([0-9]+)/[A-Za-z0-9-]+/?$ article.php?id=$1 [NC,L]
The rewrite is working, but article.php is loading slowly and without CSS styling.
Anyone know why?
Okay, another final question. Or maybe two. :-)
I have recently seen a professional htaccess file. To say the least, it was eye-opening. I had no idea that such things could be done with an htaccess file. Apparently, it's not just used for redirecting and rewriting but also for authenticating users, setting PHP handlers, overriding default server settings, and blocking all kinds of bad requests, users, bots and rippers.
At the moment, my htaccess file contains no blocks whatsoever. Is there a default list of bad requests, bots and rippers that every site should block?
I searched online for such a list and found quite a few. The problem is, most of them were quite old, and I didn't know if I could trust any of them. What's more, some of them seemed quite lengthy and therefore possibly over the top.
What's the best way of keeping an eye on who or what makes a request of my site? Do I just keep an eye on my cPanel raw access logs, or is there a more advanced way (e.g., special software)?
|my htaccess file contains no blocks whatsoever |
Holy ###. NO blocks? Welcome to mod_authz-thingummy. (Its exact name depends on which Apache version you've got. In 1.3-- which I devoutly hope you haven't got-- it was mod_access instead.)
Don't bother about importing other people's lists. Make your own. There are two prongs:
#1 The raw
Deny from 18.104.22.168
directive, using IP addresses in CIDR format. At first you will spend a lot of time counting on your fingers; after a while you'll get it internalized so when you see
22.214.171.124 - 126.96.36.199
you instantly translate it into the equivalent CIDR block.
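To make the finger-counting concrete, here is a minimal sketch. The ranges are made-up examples taken from the reserved documentation blocks, not real offenders, and the syntax is the same 2.2-style Deny used above.

```apache
# Assumed example range: 192.0.2.0 - 192.0.2.255
# 256 addresses = 2^8, so 8 bits vary and 32 - 8 = 24 bits are fixed
Deny from 192.0.2.0/24

# Assumed example range: 198.51.100.0 - 198.51.100.127
# 128 addresses = 2^7, so 7 bits vary and 32 - 7 = 25 bits are fixed
Deny from 198.51.100.0/25
```

A range only collapses to a single CIDR block when its size is a power of two and it starts on a multiple of that size. Anything else needs two or more Deny lines.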
#2 User-agent blocks using mod_setenvif, which always executes before mod_authz-whatever. You can look at all kinds of aspects of the request, but the most useful shortcut is BrowserMatch which means "look for this RegEx in the user-agent string":
BrowserMatch ^-?$ keep_out
BrowserMatch Ahrefs keep_out
BrowserMatch "America Online Browser" keep_out
BrowserMatch AppEngine keep_out
where quotation marks are used to preserve literal spaces, and the first thing on the alphabetical list is the null user-agent. (Technically - is if they don't send the User-Agent header at all, while "" [nothing] is if the header is empty. Cover your bets.) The variable called "keep_out" doesn't mean anything; give it any name you want and then proceed to
Deny from env=keep_out
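Put together, the two halves might look like the sketch below. The Order and Allow lines are my assumption about what the rest of the file needs so that everyone who is not flagged still gets in. "keep_out" is just the arbitrary variable name from above.

```apache
# Flag unwanted user-agents (mod_setenvif)
BrowserMatch ^-?$ keep_out
BrowserMatch Ahrefs keep_out

# Apache 2.2-style access control (mod_access_compat if you are on 2.4)
# Allow is evaluated first, then Deny; a Deny match wins
Order Allow,Deny
Allow from all
Deny from env=keep_out

# Apache 2.4 equivalent (mod_authz_core), to use instead of the three lines above:
# <RequireAll>
#     Require all granted
#     Require not env keep_out
# </RequireAll>
```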
If you want to start building up Deny lists, go next door to the Search Engine Spiders and User-Agent forum (SSID). There's always a running thread on server farms. Some people also deny whole countries. You'll also get a sales pitch on whitelisting (by user-agent, not IP). Personally I think this is only appropriate for huge sites that don't mind locking out the occasional human.
You can also use mod_rewrite for access control, but save it for the more complicated actions, especially the ones that are specific to your site.
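As a rough illustration of "more complicated", here is a sketch of a mod_rewrite block that depends on two things at once: who is asking and what they are asking for. "ExampleBot" and the downloads/ path are hypothetical names, not anything from a real site.

```apache
RewriteEngine On

# Refuse a particular user-agent, but only for one part of the site
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
# In htaccess the pattern is matched without the leading slash
# "-" means no substitution; [F] sends a 403 Forbidden
RewriteRule ^downloads/ - [F]
```

A plain "block this agent everywhere" is still simpler with BrowserMatch and Deny.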
|Do I just keep an eye on my cPanel raw access logs |
I'm not sure how you fit "cPanel" and "raw" into the same sentence. Raw means raw. If you've never looked, you will first have to find where your host keeps them. They may be aliased from your site's physical directory (where you go to upload stuff) or you may have to follow a different path, possibly using a different password. And you will almost certainly have to change the default number of days that they keep raw logs. A raw log is simply a text file; any text editor will open it.
Analytics programs like GA or Piwik are good for tracking real human visitors. Robots and lockouts can only be tracked in raw logs. As you start building up your Deny lists you'll see the 403 responses start accumulating.
The important thing to remember about htaccess is that it is a server configuration file. It contains "per directory" settings that override the main server configuration file (that you might not have access to). There are a huge number of things that you can stick in there. It's important to group similar things together to make the file easier to manage. Make sure that every part has a plain-English comment saying exactly what it does.
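One possible layout, purely as a sketch: the section names are my own, the Options line assumes your host allows that override, and the rewrite is the article rule from earlier in the thread written without the leading slash (in htaccess, mod_rewrite strips the directory prefix before matching).

```apache
# ---- 1. Server settings ----
# Do not list directory contents when there is no index file
Options -Indexes

# ---- 2. Access control ----
# Lock out requests with a missing or empty user-agent
BrowserMatch ^-?$ keep_out
Order Allow,Deny
Allow from all
Deny from env=keep_out

# ---- 3. Rewrites ----
# Map /12/how-to-lose-weight-fast/ to article.php?id=12
RewriteEngine On
RewriteRule ^([0-9]+)/[A-Za-z0-9-]+/?$ /article.php?id=$1 [L]
```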
There will be various requests your site should block. The list will be personal to your site. Start by blocking things that claim to be Google, Yahoo and Bing but come from implausible IPs. You'll quickly find a bunch of other malicious requests you need to block by looking in your server log files. You can block requests by URL path, query-string parameters, user agent, remote IP address, and several other attributes.
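A few of those criteria, sketched with mod_rewrite. Everything specific here is hypothetical: the 203.0.113.x range stands in for whatever implausible addresses turn up in your own logs, and the wp-login.php rule assumes a site that does not run WordPress.

```apache
RewriteEngine On

# Claims to be Googlebot but comes from a range you know is not Google's
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.
RewriteRule ^ - [F]

# By URL path: probes for a script this site does not have
RewriteRule (^|/)wp-login\.php$ - [F]

# By query string: any parameter whose value starts with a remote URL
RewriteCond %{QUERY_STRING} (^|&)[^=]*=https?:// [NC]
RewriteRule ^ - [F]
```

Multiple RewriteCond lines are ANDed by default, so the first rule only fires when both the user-agent and the address match.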
You'll find many lists of "things to block" published on the web. Many of these you'll never see on your site. Several things you need to block you'll never see listed elsewhere. However, this isn't something you fit and forget. These settings need regular review. This is probably the most time-consuming part of the site configuration.