I can read the Apache error_log and split each line into fields, but the result is inconsistent because some error messages have more fields than others. On top of that, splitting on the space character breaks down when the description is more than one word.
e.g.
[Sat Mar 17 15:44:28 2007] [error] [client 123.123.123.123] client denied by server configuration: /home/widgets/www/index.php, referer: www.example.com/index.php
That would split into
field1 -> [Sat Mar 17 15:44:28 2007]
field2 -> [error]
field3 -> [client 123.123.123.123]
then it gets messed up:
field4 -> client
field5 -> denied
field6 -> by
etc.
Is there a way to configure the log file to put quotes around the description string, e.g.
"client denied by server configuration"
so that I can avoid splitting inside the quotes?
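If the description were quoted, PHP could split on spaces while keeping quoted text together by using str_getcsv() with a space as the separator and a double quote as the enclosure. This is a minimal sketch that assumes a hypothetical log format in which every field is wrapped in double quotes (the default Apache error_log does not look like this). The helper name splitQuotedLine is made up for illustration, and the empty escape argument requires PHP 7.4 or later.

```php
<?php
// Sketch only: assumes a HYPOTHETICAL log format where each field is
// wrapped in double quotes. Apache's default error_log is not like this.
function splitQuotedLine(string $line): array
{
    // ' ' is the field separator and '"' the enclosure, so spaces inside
    // quotes do not split a field. The empty escape argument turns off
    // PHP's non-standard backslash escaping (PHP 7.4+).
    return str_getcsv($line, ' ', '"', '');
}

$fields = splitQuotedLine(
    '"Sat Mar 17 15:44:28 2007" "error" "client 123.123.123.123" "client denied by server configuration"'
);
print_r($fields);
```

Each quoted value comes back as one array element, including the multi-word description.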
Second problem.
I use ModSecurity, which I believe has its own console / parsing facility, but I'd like to extract data from its messages too. They also go into the Apache error_log:
[Sat Mar 17 16:05:28 2007] [error] [client 123.123.123.123] ModSecurity: Warning. Pattern match "(?:\\\\b(?:f(?:tp_(?:nb_)?f?(?:ge|pu)t|get(?:s?s|c)|scanf|write|open|read)|gz(?:(?:encod|writ)e|compress|open|read)|s(?:ession_start|candir)|read(?:(?:gz)?file|dir)|move_uploaded_file|(?:proc_|bz)open)|\\\\$_(?:(?:pos|ge)t|session))\\\\b" at RESPONSE_BODY. [id "970015"] [msg "PHP source code leakage"] [severity "WARNING"] [hostname "www.example.com"] [uri "/test/test.php?l=200&w=main&r=0"] [unique_id "hlLyJMPy7CBWuGicAAAAD"]
I guess what's needed is regexp-based extraction?
The first three fields are always the timestamp, the error level and the client IP, so those can be extracted easily enough.
Read line by line:
1. Find the first '[' and extract the data between it and the first ']'. Assign it to timestamp.
2. Find the second '[' and extract the data between it and the second ']'. Assign it to error_msg.
3. Find the third '[' and extract the data between it and the third ']'. Assign it to clientip.
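Steps 1 to 3 map naturally onto a single preg_match() call with one capture group per bracketed field. This is a sketch under the assumption that every line of interest starts with [timestamp] [level] [client ip]; lines that do not (for example server startup notices) return null. The helper name parseErrorLine and the extra message key are hypothetical; the other keys reuse the names from the steps above.

```php
<?php
// Hypothetical helper: pulls the three leading bracketed fields off an
// Apache error_log line. Returns null if the line does not start with
// [timestamp] [level] [client ip].
function parseErrorLine(string $line): ?array
{
    // [^\]]+ means "one or more characters that are not ']'", so spaces
    // inside the brackets do not matter.
    $pattern = '/^\[([^\]]+)\] \[([^\]]+)\] \[client ([^\]]+)\] (.*)$/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return [
        'timestamp' => $m[1],
        'error_msg' => $m[2],
        'clientip'  => $m[3],
        'message'   => $m[4], // everything after the third ']', spaces intact
    ];
}

$entry = parseErrorLine('[Sat Mar 17 15:44:28 2007] [error] [client 123.123.123.123] client denied by server configuration: /home/widgets/www/index.php, referer: www.example.com/index.php');
print_r($entry);
```

Because the rest of the line is captured as one string, a multi-word description is never split.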
If the line contains ModSecurity, it gets a bit trickier. How can I extract the id field?
4. Find '[id' and extract the characters up to the closing ']', i.e. "970015" is extracted. Similarly, find '[msg' and extract "PHP source code leakage".
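Rather than searching for each tag separately, every [name "value"] pair in a ModSecurity message can be collected in one pass with preg_match_all(). A sketch, assuming the tags always have the form [word "value"] with no double quote inside the value; the helper name parseModSecurityTags is hypothetical.

```php
<?php
// Hypothetical helper: collects every [name "value"] tag that ModSecurity
// appends to its message, e.g. [id "970015"], into an associative array.
function parseModSecurityTags(string $line): array
{
    // \w+ is the tag name; [^"]* is the quoted value without the quotes.
    preg_match_all('/\[(\w+) "([^"]*)"\]/', $line, $matches, PREG_SET_ORDER);
    $tags = [];
    foreach ($matches as $m) {
        $tags[$m[1]] = $m[2]; // if a tag name repeats, the last value wins
    }
    return $tags;
}

$line = '[Sat Mar 17 16:05:28 2007] [error] [client 123.123.123.123] ModSecurity: Warning. Pattern match "(?:\\\\b(?:f(?:tp_(?:nb_)?f?(?:ge|pu)t|get(?:s?s|c)|scanf|write|open|read)|gz(?:(?:encod|writ)e|compress|open|read)|s(?:ession_start|candir)|read(?:(?:gz)?file|dir)|move_uploaded_file|(?:proc_|bz)open)|\\\\$_(?:(?:pos|ge)t|session))\\\\b" at RESPONSE_BODY. [id "970015"] [msg "PHP source code leakage"] [severity "WARNING"] [hostname "www.example.com"] [uri "/test/test.php?l=200&w=main&r=0"] [unique_id "hlLyJMPy7CBWuGicAAAAD"]';
$tags = parseModSecurityTags($line);
echo $tags['id'], ' ', $tags['msg'], "\n";
```

The leading [timestamp], [error] and [client ...] fields have no quoted value, so the pattern skips them.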
In a different language I would extract that last bit using a function like this:
m_line := '[Sat Mar 17 16:05:28 2007] [error] [client 123.123.123.123] ModSecurity: Warning. Pattern match "(?:\\\\b(?:f(?:tp_(?:nb_)?f?(?:ge|pu)t|get(?:s?s|c)|scanf|write|open|read)|gz(?:(?:encod|writ)e|compress|open|read)|s(?:ession_start|candir)|read(?:(?:gz)?file|dir)|move_uploaded_file|(?:proc_|bz)open)|\\\\$_(?:(?:pos|ge)t|session))\\\\b" at RESPONSE_BODY. [id "970015"] [msg "PHP source code leakage"] [severity "WARNING"] [hostname "www.example.com"] [uri "/test/test.php?l=200&w=main&r=0"] [unique_id "hlLyJMPy7CBWuGicAAAAD"]'
// find the first occurrence of '[msg' and extract 50 characters starting just after '[msg '
$temp_1 := substr(m_line, at('[msg', m_line) + 5, 50)
// within $temp_1, keep everything before the first ']'
$msg := substr($temp_1, 1, at(']', $temp_1) - 1)
This would then set $msg = "PHP source code leakage"
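For comparison, the same substring approach can be written in PHP with strpos() and substr(). Note that PHP string offsets start at 0 and strpos() returns false when nothing is found. This sketch differs slightly from the pseudocode: it drops the fixed 50-character window, searches for the closing '"]' instead of a bare ']', and returns the value without its quotes. The helper name extractTag is hypothetical, and the sample line is the ModSecurity line above with the pattern text removed for brevity.

```php
<?php
// Hypothetical helper: returns the text between '[name "' and the next '"]',
// or null if the tag is not on the line.
function extractTag(string $line, string $name): ?string
{
    $open = '[' . $name . ' "';
    $start = strpos($line, $open);
    if ($start === false) {
        return null; // tag not present on this line
    }
    $start += strlen($open);             // skip past e.g. '[msg "'
    $end = strpos($line, '"]', $start);  // closing quote and bracket
    if ($end === false) {
        return null;
    }
    return substr($line, $start, $end - $start);
}

// Shortened sample: the long "Pattern match" text is omitted.
$line = '[Sat Mar 17 16:05:28 2007] [error] [client 123.123.123.123] ModSecurity: Warning. [id "970015"] [msg "PHP source code leakage"] [severity "WARNING"]';
echo extractTag($line, 'msg'), "\n";
```

Including the '[' in the search string keeps '[id "' from matching inside '[unique_id "'.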
How can I do that with PHP, or is there an easier way to parse an error log file?
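Putting the pieces together, the "read line by line" plan could look roughly like this: fgets() reads one line at a time, one regular expression pulls the three leading fields, and a second one collects the ModSecurity tags. This is a sketch under the same assumptions as the regexps above (one entry per line, [timestamp] [level] [client ip] prefix, [name "value"] tags); lines that do not match are skipped. The function name parseErrorLog is hypothetical and the path is whatever your error_log location is.

```php
<?php
// Sketch: reads an Apache error_log line by line and returns one array per
// entry that has the usual [timestamp] [level] [client ip] prefix.
function parseErrorLog(string $path): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $entries = [];
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n");
        if (!preg_match('/^\[([^\]]+)\] \[([^\]]+)\] \[client ([^\]]+)\] (.*)$/', $line, $m)) {
            continue; // e.g. startup notices without a [client ...] field
        }
        $entry = [
            'timestamp' => $m[1],
            'error_msg' => $m[2],
            'clientip'  => $m[3],
            'message'   => $m[4],
        ];
        // ModSecurity messages carry extra [name "value"] tags such as id and msg.
        if (strpos($m[4], 'ModSecurity:') === 0) {
            preg_match_all('/\[(\w+) "([^"]*)"\]/', $m[4], $tags, PREG_SET_ORDER);
            foreach ($tags as $t) {
                $entry[$t[1]] = $t[2];
            }
        }
        $entries[] = $entry;
    }
    fclose($fh);
    return $entries;
}

// Usage (placeholder path): $rows = parseErrorLog('/path/to/error_log');
```

Reading with fgets() keeps memory use flat even for a large log, since only one line is held at a time.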
[edited by: Frank_Rizzo at 4:56 pm (utc) on Mar. 17, 2007]
[edited by: dreamcatcher at 9:51 pm (utc) on Mar. 17, 2007]
[edit reason] Use example.com, thanks. Also, removed smiley code. [/edit]