Forum Moderators: Ocean10000 & incrediBILL & phranque

How to Check Header Fields

     
6:40 pm on May 10, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 81
votes: 4


[webmasterworld.com...]

I understand all but the first method. I have access only to my raw access log and htaccess. How do you:
Check Header fields and block if abnormal

This has been mentioned before in this forum but never the "how" part. All the other methods you can observe in your raw access log. Thanks in advance.
8:54 pm on May 10, 2017 (gmt 0)

Senior Member from NL 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:1461
votes: 189


"Check header fields" technically encompasses several of the other methods. User-Agent and Referer, for example, are both HTTP headers [en.wikipedia.org].

Apache allows you to use these header fields to respond appropriately to HTTP requests. You may, for example, want to block requests with an empty User-Agent header field, because this is usually a bot. Certain things (like the aforementioned) are "abnormal" for everyone, other things only for your specific situation, so rules should ideally be tailored.

For implementation with RewriteCond, see: [httpd.apache.org...]
Server-Variables: These are variables of the form %{ NAME_OF_VARIABLE } where NAME_OF_VARIABLE can be a string taken from the following list...
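
As a rough sketch of what such rules can look like in .htaccess, assuming mod_rewrite is enabled on your host: the first rule refuses requests with a missing or empty User-Agent via the HTTP_USER_AGENT server variable, and the second uses the %{HTTP:header} form to test an arbitrary request header. Treating a missing Accept header as abnormal is only an illustrative assumption here, so check what your real visitors send before blocking on it.

```apache
RewriteEngine On

# Refuse requests whose User-Agent header is missing or empty.
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^ - [F]

# Any other request header can be tested with %{HTTP:Header-Name}.
# Assumption for illustration only: a request with no Accept header is abnormal on this site.
RewriteCond %{HTTP:Accept} ^$
RewriteRule ^ - [F]
```

The [F] flag answers with 403 Forbidden. Each RewriteCond applies only to the RewriteRule that immediately follows it, which is why the two checks are written as separate pairs.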
10:20 pm on May 10, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3275
votes: 160


There was a discussion here [webmasterworld.com] where this came up, and incrediBILL was kind enough to share tips that might help a person get started with capturing and logging headers. A fair understanding of PHP helps with setting it up.
4:12 am on May 11, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13740
votes: 459


Are you asking how, concretely?

It's easiest if all your pages already have a shared header or footer (SSI, php, whatever) and then you can just add the header-logging to things that already get done. Here is the version of iBill's LogHeader code that I currently run on all sites. It is obviously best if you understand what every word means--but it works fine if you don't. (Three guesses how I know this.)
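// Return a $_SERVER value, or false if it is not set.
function get_server($var)
{
    return isset($_SERVER[$var]) ? $_SERVER[$var] : false;
}

// Fallback for setups where PHP does not provide getallheaders().
if (!function_exists('getallheaders'))
{
    function getallheaders()
    {
        // Must start as an array; a string here breaks on newer PHP versions.
        $headers = array();
        foreach ($_SERVER as $name => $value)
        {
            if (substr($name, 0, 5) == 'HTTP_')
            {
                // HTTP_USER_AGENT becomes User-Agent, and so on.
                $headers[str_replace(' ', '-', ucwords(strtolower(str_replace('_', ' ', substr($name, 5)))))] = $value;
            }
        }
        return $headers;
    }
}

$ip = get_server('REMOTE_ADDR');
// One log file per day, opened in append mode.
$fh = fopen($_SERVER['DOCUMENT_ROOT'] . "/boilerplate/headers-". date('Ymd') . ".log","a");
fwrite($fh, date('Y-m-d:') . date("H:i:s\n"));
// The requested URL is not a header field, so log it explicitly.
$thispage = $_SERVER['REQUEST_URI'];
fwrite($fh, "URL: $thispage\n");
fwrite($fh, "IP: $ip\n");

// Write every request header as "Name: value".
foreach (getallheaders() as $name => $value)
{
    fwrite($fh, "$name: $value\n");
}

fwrite($fh, "----\n\n");
fclose($fh);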
Note the line
$fh = fopen($_SERVER['DOCUMENT_ROOT'] . "/boilerplate/headers-". date('Ymd') . ".log","a");

On my sites I tend to have a directory called /boilerplate/ so I throw the header logs in there too. Replace with anything that's convenient for you. (I don't know what happens if you specify a directory that doesn't already exist. Will it be created, or will the whole program fail?) The line looks for a "headers-" file whose name includes today's date, creates one if it doesn't already exist, and then writes to it.

Note that header logs will not correspond exactly to access logs, because header logs made by this function roll over at midnight, while your server logs probably roll over at some other time. Mine currently go until about 4AM. These logs are stored locally on your own site--not elsewhere on the server like access logs on typical shared hosting--so they won't disappear after a fixed time. You have to go in, download them and delete.

I added this line
$thispage = $_SERVER['REQUEST_URI'];
fwrite($fh, "URL: $thispage\n");
The name of the requested file isn't a regular header field, so you need to log it explicitly--assuming you want to know, which presumably you do.

This function runs on all requests for pages--including any custom error pages, meaning that if an image request is blocked, I'll be able to figure out why. I also rewrite robots.txt requests to robots.php so I can log headers on those too. (This turned out not to be all that useful, because robots don't always send the same headers when requesting a plain-text file as when requesting html, but oh well.)
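
For anyone wanting to try the robots.txt part, a minimal sketch of such a rewrite in the site's root .htaccess might look like the following. It assumes mod_rewrite is enabled and that robots.php sits in the document root; this is not necessarily the exact rule used on the sites described above.

```apache
RewriteEngine On

# Serve robots.php whenever robots.txt is requested, so the header-logging code runs.
# In a root .htaccess the pattern is matched without the leading slash.
RewriteRule ^robots\.txt$ /robots.php [L]
```

Since PHP sends text/html by default, robots.php would presumably need to set a text/plain Content-Type itself before printing the rules.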