
Apache Web Server Forum

This 33-message thread spans 2 pages; this is page 2.
Mod Rewrite Not Working Properly. Why?
url rewrite not working
Tehuti

5+ Year Member



 
Msg#: 4612283 posted 1:15 pm on Sep 24, 2013 (gmt 0)

I'm very new to rewriting URLs. I've only been learning about it for two days!

I'm trying to rewrite my URLs, but it's not working. I have changed the URLs on my website to look like this:

<a href="/id_number/hyphenated_article_title/">Anchor_text</a>

E.g.,

<a href="/12/how-to-lose-weight-fast/">How to Lose Weight Fast</a>

My .htaccess file looks like this:

RewriteEngine on
RewriteRule ^/([0-9]+)/[A-Za-z0-9-]+/?$ article.php?id=$1 [NC,L]


The rewrite is working, but article.php is loading slowly and without CSS styling.

Anyone know why?

 

Tehuti

5+ Year Member



 
Msg#: 4612283 posted 6:18 pm on Oct 2, 2013 (gmt 0)

Okay, another final question. Or maybe two. :-)

I recently saw a professionally written .htaccess file. It was eye-opening, to say the least. I had no idea that such things could be done with an .htaccess file. Apparently, it's used not just for redirecting and rewriting but also for authenticating users, setting PHP handlers, overriding default server settings, and blocking all kinds of bad requests, users, bots and rippers.

At the moment, my htaccess file contains no blocks whatsoever. Is there a default list of bad requests, bots and rippers that every site should block?

I searched online for such a list and found quite a few. The problem is, most of them were quite old, and I didn't know if I could trust any of them. What's more, some of them seemed quite lengthy and therefore possibly over the top.

Another question.

What's the best way of keeping an eye on who or what makes requests to my site? Do I just keep an eye on my cPanel raw access logs, or is there a more advanced way (e.g., special software)?

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month



 
Msg#: 4612283 posted 7:54 pm on Oct 2, 2013 (gmt 0)

my htaccess file contains no blocks whatsoever

Holy ###. NO blocks? Welcome to mod_authz-thingummy. (Its exact name depends on which Apache version you've got. In 1.3-- which I devoutly hope you haven't got-- it was mod_access instead.)

Don't bother about importing other people's lists. Make your own. There are two prongs:

#1 The raw
Deny from 11.22.33.44
directive, using IP addresses in CIDR format. At first you will spend a lot of time counting on your fingers; after a while you'll get it internalized so when you see
11.22.32.0 - 11.22.47.255
you instantly translate
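11.22.32.0/20
(The third octet runs from 32 to 47: a block of 16 values starting on a multiple of 16, so only its top 4 bits are fixed. With the first two octets fixed as well, that makes 8 + 8 + 4 = 20 fixed bits.)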

#2 User-agent blocks using mod_setenvif, which always executes before mod_authz-whatever. You can look at all kinds of aspects of the request, but the most useful shortcut is BrowserMatch which means "look for this RegEx in the user-agent string":

BrowserMatch ^-?$ keep_out
BrowserMatch Ahrefs keep_out
BrowserMatch "America Online Browser" keep_out
BrowserMatch AppEngine keep_out

where quotation marks are used to preserve literal spaces, and the first thing on the alphabetical list is the null user-agent. (Technically, "-" is for when they don't send the User-Agent header at all, while "" [nothing] is for when the header is empty. Hedge your bets.) The variable called "keep_out" doesn't mean anything; give it any name you want and then proceed to

Deny from env=keep_out
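
Putting the two prongs together, a minimal sketch might look like the following. It assumes Apache 2.2-style directives (Order/Allow/Deny) and reuses the placeholder IP range and the keep_out variable from above. The commented-out block at the end is an assumed equivalent for Apache 2.4, which would replace the 2.2 lines rather than sit beside them.

```apache
# Flag unwanted user-agents (mod_setenvif); the variable name is arbitrary.
BrowserMatch ^-?$ keep_out
BrowserMatch Ahrefs keep_out

# Apache 2.2-style access control: allow everyone, then subtract the denials.
# With "Order Allow,Deny", a request matching both Allow and Deny is refused.
Order Allow,Deny
Allow from all
Deny from 11.22.32.0/20
Deny from env=keep_out

# Assumed Apache 2.4 equivalent (mod_authz_core); use it instead of the
# lines above, not alongside them:
# <RequireAll>
#     Require all granted
#     Require not ip 11.22.32.0/20
#     Require not env keep_out
# </RequireAll>
```

Anything caught by either Deny line should then show up in the raw logs as a 403.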

If you want to start building up Deny lists, go next door to the Search Engine Spiders and User-Agent forum (SSID). There's always a running thread on server farms. Some people also deny whole countries. You'll also get a sales pitch on whitelisting (by user-agent, not IP). Personally I think this is only appropriate for huge sites that don't mind locking out the occasional human.

You can also use mod_rewrite for access control, but save it for the more complicated actions, especially the ones that are specific to your site.
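
As an illustration only, a site-specific mod_rewrite block could look something like this. The condition is hypothetical: it assumes the site never legitimately receives "base64" in a query string, so substitute whatever pattern your own logs suggest.

```apache
RewriteEngine On

# Hypothetical site-specific rule: this site has no use for "base64" in a
# query string, so any request carrying it gets a 403 Forbidden.
RewriteCond %{QUERY_STRING} base64 [NC]
RewriteRule ^ - [F]
```

The [F] flag sends the 403 and stops rewriting for that request, so no [L] is needed.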

Do I just keep an eye on my cPanel raw access logs

I'm not sure how you fit "cPanel" and "raw" into the same sentence. Raw means raw. If you've never looked, you will first have to find where your host keeps them. They may be aliased from your site's physical directory (where you go to upload stuff) or you may have to follow a different path, possibly using a different password. And you will almost certainly have to change the default number of days that they keep raw logs. It's simply a text file; any text editor will open it.

I don't know how many people do their own log wrangling, or if there are standard programs to do the work. Mine evolved over a couple of years, and my sites are so tiny I do it all in JavaScript-- up from my original system of running a battery of Regular Expressions through the text editor.

Analytics programs like GA or Piwik are good for tracking real human visitors. Robots and lockouts can only be tracked in raw logs. As you start building up your Deny lists you'll see the 403 responses start accumulating.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4612283 posted 10:04 pm on Oct 2, 2013 (gmt 0)

The important thing to remember about .htaccess is that it is a server configuration file. It contains per-directory settings that override the main server configuration file (which you might not have access to). There are a huge number of things that you can put in there. Group similar things together to make the file easier to manage, and make sure that every part has a plain-English comment describing exactly what it does.

There will be various requests your site should block. The list will be personal to your site. Start by blocking things that claim to be Google, Yahoo and Bing but come from implausible IPs. You'll quickly find a bunch of other malicious requests you need to block by looking in your server log files. You can block requests by URL path, attached parameters, user agent, remote IP address, and several others.
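
A rough sketch of the "claims to be Google but comes from an implausible IP" check, using mod_rewrite. The address range is an assumption (66.249.64.0 to 66.249.95.255 is commonly associated with Googlebot, but Google may crawl from other ranges), so compare it against your own logs before relying on it.

```apache
RewriteEngine On

# Assumption: genuine Googlebot requests arrive from 66.249.64.0 - 66.249.95.255.
# Refuse anything that uses the Googlebot name from outside that range.
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{REMOTE_ADDR} !^66\.249\.(6[4-9]|[78][0-9]|9[0-5])\.
RewriteRule ^ - [F]
```

The same two-condition pattern could be repeated for other search engines, each with its own user-agent string and address ranges.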

You'll find many lists of "things to block" published on the web. Many of the items on them you'll never see on your site, and several things you do need to block you'll never see listed elsewhere. This isn't something you fit and forget, however: these settings need regular review. It is probably the most time-consuming part of the site configuration.
