Success! I have managed to get Splunk to group Rails logs into logical groups.
First, I changed my Rails web server to be a light forwarder, as I wanted the indexer to do the grouping (as per the behaviour breakdown from the previous part of this series). Then:
$SPLUNK_HOME/etc/system/local/inputs.conf
[default]
host = web.server.example.com
[monitor:///var/www/sites/web.application.example.com/shared/log/production.log*]
disabled = false
followTail = 0
sourcetype = rails_log
[monitor:///var/log/httpd]
disabled = false
followTail = 0
sourcetype = access_combined
I’m not sure if I need to have the [rails_log] declaration on the LightForwarder’s etc/system/local/props.conf file, but it’s there as an empty declaration just in case.
Before we go further, note that my Rails logging uses log4r and a custom formatter, so my logs look like this:
[ApplicationName][INFO] 100819 06:37:07 ::
Processing SessionsController#new (for 172.0.0.1 at 2010-08-19 06:37:07) [GET]
[ApplicationName][INFO] 100819 06:37:07 :: Parameters: {"unauthenticated"=>"true", "action"=>"new", "controller"=>"sessions"}
[ApplicationName][INFO] 100819 06:37:07 :: Rendering template within layouts/agent
[ApplicationName][INFO] 100819 06:37:07 :: Rendering sessions/new
[ApplicationName][INFO] 100819 06:37:07 :: Completed in 20ms (View: 12, DB: 5) | 200 OK [https://web.application.example.com/user/sign_in?unauthenticated=true]
[ApplicationName][INFO] 100820 01:01:25 ::
Processing Index#index (for 172.0.0.1 at 2010-08-20 01:01:25) [GET]
[ApplicationName][INFO] 100820 01:01:25 :: Parameters: {"action"=>"index", "controller"=>"indes"}
[ApplicationName][INFO] 100820 01:01:25 :: Filter chain halted as [:authenticate] rendered_or_redirected.
[ApplicationName][INFO] 100820 01:01:25 :: Completed in 3ms (View: 1, DB: 4) | 401 Unauthorized [https://web.application.example.com/]
The important thing to note here is that Rails inserts 2 linebreaks before putting “Processing blah blah” so we’ll use that to split. In the props.conf documentation, it recommends using the LINE_BREAKER item to handle multiline breaks, and it seemed easier to me as it uses just one regexp to break the input stream. The important part for this kind of splitting is to know:
* Wherever the regex matches, the start of the first matching group is considered the end of the
previous event, and the end of the first matching group is considered the start of the next event.
* The contents of the first matching group is ignored as event text.
What this means is that the LINE_BREAKER regexp (using PCRE) must have a match group that can be discarded for matching. If you grep over the default provided files (`grep -Rn “LINE_BREAKER” /opt/splunk/`), you’ll see that most of the options use a linebreak or multiples thereof as the first match group. We’ll do the same. Other than that, we just want to match on the “Processing” text as the start delimiter… which comes out to the below…
On the indexer’s $SPLUNK_HOME/etc/system/local/props.conf
[rails_log]
LINE_BREAKER = ([\r\n]).* \:\: [\r\n]+Processing
Now, on the index, the rails_log items are multiline events!