author | Costa Tsaousis <costa@netdata.cloud> | 2023-12-06 13:45:09 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-12-06 13:45:09 +0200 |
commit | 246e32e270c86cd68d8db48b2b49e8cd25f2bc73 (patch) | |
tree | e9375fdfe04df517ac246159d87a1ee307ad4664 /collectors | |
parent | a860bdb5dc15df6fb822860fb886fd6f1bf82707 (diff) |
Update README.md
Diffstat (limited to 'collectors')
-rw-r--r-- | collectors/log2journal/README.md | 529 |
1 files changed, 380 insertions, 149 deletions
diff --git a/collectors/log2journal/README.md b/collectors/log2journal/README.md index 85c36c68d6..f308b9a4b8 100644 --- a/collectors/log2journal/README.md +++ b/collectors/log2journal/README.md @@ -1,3 +1,4 @@ + # log2journal `log2journal` and `systemd-cat-native` can be used to convert a structured log file, such as the ones generated by web servers, into `systemd-journal` entries. @@ -68,12 +69,11 @@ This pipeline ensures a flexible and comprehensive approach to log processing, a ## Real-life example -We have an nginx server logging in this format: +We have an nginx server logging in this standard combined log format: ```bash - log_format access '$remote_addr - $remote_user [$time_local] ' + log_format combined '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' - '$request_length $request_time ' '"$http_referer" "$http_user_agent"'; ``` @@ -84,85 +84,185 @@ My nginx log uses this log format: log_format access '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' - '$request_length $request_time ' '"$http_referer" "$http_user_agent"'; I want to use `log2joural` to convert this log for systemd-journal. `log2journal` accepts a PCRE2 regular expression, using the named groups in the pattern as the journal fields to extract from the logs. -Prefix all PCRE2 group names with `NGINX_` and use capital characters only. - -For the $request, use the field `MESSAGE` (without NGINX_ prefix), so that -it will appear in systemd journals as the message of the log. - -Please give me the PCRE2 pattern. +Please give me the PCRE2 pattern to extract all the fields from my nginx +log files. 
``` ChatGPT replies with this: ```regexp -^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>[^"]+)" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)" + (?x) # Enable PCRE2 extended mode + ^ + (?<remote_addr>[^ ]+) \s - \s + (?<remote_user>[^ ]+) \s + \[ + (?<time_local>[^\]]+) + \] + \s+ " + (?<request> + (?<request_method>[A-Z]+) \s+ + (?<request_uri>[^ ]+) \s+ + (?<server_protocol>[^"]+) + ) + " \s+ + (?<status>\d+) \s+ + (?<body_bytes_sent>\d+) \s+ + "(?<http_referer>[^"]*)" \s+ + "(?<http_user_agent>[^"]*)" +``` + +Let's see what the above says: + +1. `(?x)`: enable PCRE2 extended mode. In this mode spaces and newlines in the pattern are ignored. To match a space you have to use `\s`. This mode allows us to split the pattern into multiple lines and add comments to it. +1. `^`: match the beginning of the line +2. `(?<remote_addr>[^ ]+)`: match anything up to the first space (`[^ ]+`), and name it `remote_addr`. +3. `\s`: match a space +4. `-`: match a hyphen +5. and so on... 
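To experiment with the pattern outside `log2journal`, the same idea can be sketched with Python's `re` module. This is only an approximation under stated assumptions: Python is a stand-in for PCRE2, it needs the `(?P<name>...)` spelling for named groups, and `re.VERBOSE` plays the role of `(?x)`. The names `NGINX_COMBINED`, `SAMPLE` and `parse_line` are hypothetical helpers invented for this sketch, and the uppercasing of keys merely imitates the output format shown in this README.

```python
import re

# Python translation of the extended-mode pattern (assumption: Python's `re`
# stands in for PCRE2; named groups use (?P<name>...) instead of (?<name>...)).
NGINX_COMBINED = re.compile(r"""
    ^
    (?P<remote_addr>[^ ]+) \s - \s
    (?P<remote_user>[^ ]+) \s
    \[
        (?P<time_local>[^\]]+)
    \]
    \s+ "
    (?P<request>
        (?P<request_method>[A-Z]+) \s+
        (?P<request_uri>[^ ]+) \s+
        (?P<server_protocol>[^"]+)
    )
    " \s+
    (?P<status>\d+) \s+
    (?P<body_bytes_sent>\d+) \s+
    "(?P<http_referer>[^"]*)" \s+
    "(?P<http_user_agent>[^"]*)"
""", re.VERBOSE)

# A sample line in the combined log format.
SAMPLE = ('1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] '
          '"GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"')


def parse_line(line):
    """Return a dict of uppercase field names to values, or None if no match."""
    match = NGINX_COMBINED.match(line)
    if match is None:
        return None
    return {name.upper(): value for name, value in match.groupdict().items()}
```

Calling `parse_line(SAMPLE)` returns one entry per named group, while a line that does not follow the combined format returns `None`, which is a quick way to check a pattern against real log lines before putting it in a configuration file.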
+ +We edit `nginx.yaml` and add it, like this: + +```yaml +pattern: | + (?x) # Enable PCRE2 extended mode + ^ + (?<remote_addr>[^ ]+) \s - \s + (?<remote_user>[^ ]+) \s + \[ + (?<time_local>[^\]]+) + \] + \s+ " + (?<request> + (?<request_method>[A-Z]+) \s+ + (?<request_uri>[^ ]+) \s+ + (?<server_protocol>[^"]+) + ) + " \s+ + (?<status>\d+) \s+ + (?<body_bytes_sent>\d+) \s+ + "(?<http_referer>[^"]*)" \s+ + "(?<http_user_agent>[^"]*)" ``` Let's test it with a sample line (instead of `tail`): ```bash -# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>[^"]+)" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' -MESSAGE=GET /index.html HTTP/1.1 +# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml +BODY_BYTES_SENT=4172 +HTTP_REFERER=- +HTTP_USER_AGENT=Go-http-client/1.1 +REMOTE_ADDR=1.2.3.4 +REMOTE_USER=- +REQUEST=GET /index.html HTTP/1.1 +REQUEST_METHOD=GET +REQUEST_URI=/index.html +SERVER_PROTOCOL=HTTP/1.1 +STATUS=200 +TIME_LOCAL=19/Nov/2023:00:24:43 +0000 + +``` + +As you can see, it extracted all the fields and converted their names to uppercase, as systemd-journal expects them. 
+ +To make sure the fields are unique for nginx and do not interfere with other applications, we should prefix them with `NGINX_`: + +```yaml +pattern: | + (?x) # Enable PCRE2 extended mode + ^ + (?<remote_addr>[^ ]+) \s - \s + (?<remote_user>[^ ]+) \s + \[ + (?<time_local>[^\]]+) + \] + \s+ " + (?<request> + (?<request_method>[A-Z]+) \s+ + (?<request_uri>[^ ]+) \s+ + (?<server_protocol>[^"]+) + ) + " \s+ + (?<status>\d+) \s+ + (?<body_bytes_sent>\d+) \s+ + "(?<http_referer>[^"]*)" \s+ + "(?<http_user_agent>[^"]*)" + +prefix: 'NGINX_' # <<< we added this +``` + +And let's try it: + +```bash +# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml NGINX_BODY_BYTES_SENT=4172 NGINX_HTTP_REFERER=- NGINX_HTTP_USER_AGENT=Go-http-client/1.1 NGINX_REMOTE_ADDR=1.2.3.4 NGINX_REMOTE_USER=- -NGINX_REQUEST_LENGTH=104 -NGINX_REQUEST_TIME=0.001 +NGINX_REQUEST=GET /index.html HTTP/1.1 +NGINX_REQUEST_METHOD=GET +NGINX_REQUEST_URI=/index.html +NGINX_SERVER_PROTOCOL=HTTP/1.1 NGINX_STATUS=200 NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000 ``` -As you can see, it extracted all the fields. - -The `MESSAGE` however, has 3 fields by itself: the method, the URL and the procotol version. Let's ask ChatGPT to extract these too: - +Now, all fields start with `NGINX_` but we want `NGINX_REQUEST` to be the `MESSAGE` of the log line, as we will see it by default in `journalctl` and the Netdata dashboard. 
Let's rename it: + +```yaml +pattern: | + (?x) # Enable PCRE2 extended mode + ^ + (?<remote_addr>[^ ]+) \s - \s + (?<remote_user>[^ ]+) \s + \[ + (?<time_local>[^\]]+) + \] + \s+ " + (?<request> + (?<request_method>[A-Z]+) \s+ + (?<request_uri>[^ ]+) \s+ + (?<server_protocol>[^"]+) + ) + " \s+ + (?<status>\d+) \s+ + (?<body_bytes_sent>\d+) \s+ + "(?<http_referer>[^"]*)" \s+ + "(?<http_user_agent>[^"]*)" + +prefix: 'NGINX_' + +rename: # <<< we added this + - new_key: MESSAGE # <<< we added this + old_key: NGINX_REQUEST # <<< we added this ``` -I see that the MESSAGE has 3 key items in it. The request method (GET, POST, -etc), the URL and HTTP protocol version. -I want to keep the MESSAGE as it is, with all the information in it, but also -extract the 3 items from it as separate fields. - -Can this be done? -``` - -ChatGPT responded with this: - -```regexp -^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)" -``` - -Let's test this too: +Let's test it: ```bash -# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' -MESSAGE=GET /index.html HTTP/1.1 # <<<<<<<<< MESSAGE +# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml +MESSAGE=GET /index.html 
HTTP/1.1 # <<< renamed ! NGINX_BODY_BYTES_SENT=4172 NGINX_HTTP_REFERER=- NGINX_HTTP_USER_AGENT=Go-http-client/1.1 -NGINX_HTTP_VERSION=1.1 # <<<<<<<<< VERSION -NGINX_METHOD=GET # <<<<<<<<< METHOD NGINX_REMOTE_ADDR=1.2.3.4 NGINX_REMOTE_USER=- -NGINX_REQUEST_LENGTH=104 -NGINX_REQUEST_TIME=0.001 +NGINX_REQUEST_METHOD=GET +NGINX_REQUEST_URI=/index.html +NGINX_SERVER_PROTOCOL=HTTP/1.1 NGINX_STATUS=200 NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000 -NGINX_URL=/index.html # <<<<<<<<< URL ``` -Ideally, we would want the 5xx errors to be red in our `journalctl` output. To achieve that we need to add a PRIORITY field to set the log level. Log priorities are numeric and follow the `syslog` priorities. Checking `/usr/include/sys/syslog.h` we can see these: +Ideally, we would want the 5xx errors to be red in our `journalctl` output and the dashboard. To achieve that we need to add a PRIORITY field to set the log level. Log priorities are numeric and follow the `syslog` priorities. Checking `/usr/include/sys/syslog.h` we can see these: ```c #define LOG_EMERG 0 /* system is unusable */ @@ -175,73 +275,207 @@ Ideally, we would want the 5xx errors to be red in our `journalctl` output. To a #define LOG_DEBUG 7 /* debug-level messages */ ``` -Avoid setting priority to 0 (`LOG_EMERG`), because these will be on your terminal (the journal uses `wall` to let you know of such events). A good priority for errors is 3 (red in `journalctl`), or 4 (yellow in `journalctl`). - -To set the PRIORITY field in the output, we can use `NGINX_STATUS` fields. We need a copy of it, which we will alter later. +Avoid setting priority to 0 (`LOG_EMERG`), because these will be on your terminal (the journal uses `wall` to let you know of such events). A good priority for errors is 3 (red), or 4 (yellow). + +To set the PRIORITY field in the output, we can use `NGINX_STATUS`. 
We will do this in 2 steps: a) inject the priority field as a copy of `NGINX_STATUS`, and then b) use a pattern on its value to rewrite it to the priority level we want. + +First, let's inject it: + +```yaml +pattern: | + (?x) # Enable PCRE2 extended mode + ^ + (?<remote_addr>[^ ]+) \s - \s + (?<remote_user>[^ ]+) \s + \[ + (?<time_local>[^\]]+) + \] + \s+ " + (?<request> + (?<request_method>[A-Z]+) \s+ + (?<request_uri>[^ ]+) \s+ + (?<server_protocol>[^"]+) + ) + " \s+ + (?<status>\d+) \s+ + (?<body_bytes_sent>\d+) \s+ + "(?<http_referer>[^"]*)" \s+ + "(?<http_user_agent>[^"]*)" + +prefix: 'NGINX_' + +rename: + - new_key: MESSAGE + old_key: NGINX_REQUEST + +inject: # <<< we added this + - key: PRIORITY # <<< we added this + value: '${NGINX_STATUS}' # <<< we added this +``` -We can instruct `log2journal` to duplicate `NGINX_STATUS`, like this: `log2journal --inject 'PRIORITY=${NGINX_STATUS}'`. Let's try it: +Let's see what this does: ```bash -# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --inject 'PRIORITY=${NGINX_STATUS}' +# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml MESSAGE=GET /index.html HTTP/1.1 NGINX_BODY_BYTES_SENT=4172 NGINX_HTTP_REFERER=- NGINX_HTTP_USER_AGENT=Go-http-client/1.1 -NGINX_HTTP_VERSION=1.1 -NGINX_METHOD=GET NGINX_REMOTE_ADDR=1.2.3.4 NGINX_REMOTE_USER=- -NGINX_REQUEST_LENGTH=104 -NGINX_REQUEST_TIME=0.001 +NGINX_REQUEST_METHOD=GET +NGINX_REQUEST_URI=/index.html +NGINX_SERVER_PROTOCOL=HTTP/1.1 NGINX_STATUS=200 
-PRIORITY=200 # <<<<<<<<< PRIORITY IS HERE NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000 -NGINX_URL=/index.html +PRIORITY=200 # <<< PRIORITY added + +``` +Now we need to rewrite it to the right priority based on its value. We will assign the priority 6 (info) when the status is 1xx, 2xx, 3xx, priority 5 (notice) when status is 4xx, priority 3 (error) when status is 5xx and anything else will go to priority 4 (warning). Let's do it: + +```yaml +pattern: | + (?x) # Enable PCRE2 extended mode + ^ + (?<remote_addr>[^ ]+) \s - \s + (?<remote_user>[^ ]+) \s + \[ + (?<time_local>[^\]]+) + \] + \s+ " + (?<request> + (?<request_method>[A-Z]+) \s+ + (?<request_uri>[^ ]+) \s+ + (?<server_protocol>[^"]+) + ) + " \s+ + (?<status>\d+) \s+ + (?<body_bytes_sent>\d+) \s+ + "(?<http_referer>[^"]*)" \s+ + "(?<http_user_agent>[^"]*)" + +prefix: 'NGINX_' + +rename: + - new_key: MESSAGE + old_key: NGINX_REQUEST + +inject: + - key: PRIORITY + value: '${NGINX_STATUS}' + +rewrite: # <<< we added this + - key: PRIORITY # <<< we added this + match: '^[123]' # <<< we added this + value: 6 # <<< we added this + + - key: PRIORITY # <<< we added this + match: '^4' # <<< we added this + value: 5 # <<< we added this + + - key: PRIORITY # <<< we added this + match: '^5' # <<< we added this + value: 3 # <<< we added this + + - key: PRIORITY # <<< we added this + match: '.*' # <<< we added this + value: 4 # <<< we added this ``` -Now that we have the `PRIORITY` field equal to the `NGINX_STATUS`, we can use instruct `log2journal` to change it to a valid priority, by appending: `--rewrite 'PRIORITY=/^5/3' --rewrite 'PRIORITY=/.*/6'`. These rewrite commands say to match everything that starts with `5` and replace it with priority `3` (error) and everything else with priority `6` (info). Let's see it: +Rewrite rules are processed in order and the first matching a field, stops by default processing for this field. This is why the last rule, that matches everything does not always change the priority to 4. 
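The ordered, first-match behaviour described here can be illustrated with a small Python sketch. It only models the semantics as this README states them and is not how `log2journal` is implemented; `REWRITE_RULES` and `rewrite_priority` are hypothetical names, and Python's `re` is again assumed as a stand-in for PCRE2.

```python
import re

# (pattern, replacement) pairs, listed in the same order as the YAML rules.
REWRITE_RULES = [
    (re.compile(r"^[123]"), "6"),  # 1xx, 2xx, 3xx -> info
    (re.compile(r"^4"), "5"),      # 4xx -> notice
    (re.compile(r"^5"), "3"),      # 5xx -> error
    (re.compile(r".*"), "4"),      # anything else -> warning
]


def rewrite_priority(status):
    """Apply the rules in order; the first rule that matches decides the value."""
    for pattern, value in REWRITE_RULES:
        if pattern.search(status):
            return value  # stop here, later rules are not evaluated
    return status  # no rule matched: keep the injected value unchanged
```

With these rules, a status of `200` stops at the first rule and becomes `6`, so the catch-all `.*` rule is reached only by values that none of the earlier rules matched.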
+ +Let's test it: ```bash -# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --inject 'PRIORITY=${NGINX_STATUS}' --rewrite 'PRIORITY=/^5/3' --rewrite 'PRIORITY=/.*/6' +# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml MESSAGE=GET /index.html HTTP/1.1 NGINX_BODY_BYTES_SENT=4172 NGINX_HTTP_REFERER=- NGINX_HTTP_USER_AGENT=Go-http-client/1.1 -NGINX_HTTP_VERSION=1.1 -NGINX_METHOD=GET NGINX_REMOTE_ADDR=1.2.3.4 NGINX_REMOTE_USER=- -NGINX_REQUEST_LENGTH=104 -NGINX_REQUEST_TIME=0.001 +NGINX_REQUEST_METHOD=GET +NGINX_REQUEST_URI=/index.html +NGINX_SERVER_PROTOCOL=HTTP/1.1 NGINX_STATUS=200 -PRIORITY=6 # <<<<<<<<<< PRIORITY changed to 6 NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000 -NGINX_URL=/index.html +PRIORITY=6 # <<< PRIORITY rewritten here ``` -Similarly, we could duplicate `${NGINX_URL}` to `NGINX_ENDPOINT` and then process it to remove any query string, or replace IDs in the URL path with constant names, thus giving us uniform endpoints independently of the parameters. +Rewrite rules are powerful. You can have named groups in them, like in the main pattern, to extract sub-fields from them, which you can then use in variable substitution. You can use rewrite rules to anonymize the URLs, e.g to remove customer IDs or transaction details from them. + +To complete the example, we can also inject a `SYSLOG_IDENTIFIER`. Generally your journal logs should always have 3 fields: `MESSAGE`, `PRIORITY` and `SYSLOG_IDENTIFIER`. 
These 3 fields make it a complete entry. Then you can add as many fields as required for your use case. + +```yaml +pattern: | + (?x) # Enable PCRE2 extended mode + ^ + (?<remote_addr>[^ ]+) \s - \s + (?<remote_user>[^ ]+) \s + \[ + (?<time_local>[^\]]+) + \] + \s+ " + (?<request> + (?<request_method>[A-Z]+) \s+ + (?<request_uri>[^ ]+) \s+ + (?<server_protocol>[^"]+) + ) + " \s+ + (?<status>\d+) \s+ + (?<body_bytes_sent>\d+) \s+ + "(?<http_referer>[^"]*)" \s+ + "(?<http_user_agent>[^"]*)" + +prefix: 'NGINX_' + +rename: + - new_key: MESSAGE + old_key: NGINX_REQUEST + +inject: + - key: PRIORITY + value: '${NGINX_STATUS}' + - key: SYSLOG_IDENTIFIER # <<< we added this + value: 'nginx-log' # <<< we added this + +rewrite: + - key: PRIORITY + match: '^[123]' + value: 6 + + - key: PRIORITY + match: '^4' + value: 5 + + - key: PRIORITY + match: '^5' + value: 3 + + - key: PRIORITY + match: '.*' + value: 4 +``` -To complete the example, we can also inject a `SYSLOG_IDENTIFIER` with `log2journal`, using `--inject SYSLOG_IDENTIFIER=nginx-log`, like this: +Let's see it: ```bash -# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --inject 'PRIORITY=${NGINX_STATUS}' --inject 'SYSLOG_IDENTIFIER=nginx' -rewrite 'PRIORITY=/^5/3' --rewrite 'PRIORITY=/.*/6' +# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml MESSAGE=GET /index.html HTTP/1.1 NGINX_BODY_BYTES_SENT=4172 NGINX_HTTP_REFERER=- NGINX_HTTP_USER_AGENT=Go-http-client/1.1 -NGINX_HTTP_VERSION=1.1 
-NGINX_METHOD=GET NGINX_REMOTE_ADDR=1.2.3.4 NGINX_REMOTE_USER=- -NGINX_REQUEST_LENGTH=104 -NGINX_REQUEST_TIME=0.001 +NGINX_REQUEST_METHOD=GET +NGINX_REQUEST_URI=/index.html +NGINX_SERVER_PROTOCOL=HTTP/1.1 NGINX_STATUS=200 -PRIORITY=6 NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000 -NGINX_URL=/index.html -SYSLOG_IDENTIFIER=nginx-log # <<<<<<<<< THIS HAS BEEN ADDED +PRIORITY=6 +SYSLOG_IDENTIFIER=nginx-log # <<< SYSLOG_IDENTIFIER added ``` @@ -249,105 +483,102 @@ Now the message is ready to be sent to a systemd-journal. For this we use `syste ```bash -# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --inject 'PRIORITY=${NGINX_STATUS}' --inject 'SYSLOG_IDENTIFIER=nginx' -rewrite 'PRIORITY=/^5/3' --rewrite 'PRIORITY=/.*/6' | systemd-cat-native +# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml | systemd-cat-native # no output # let's find the message -# journalctl -o verbose SYSLOG_IDENTIFIER=nginx -Sun 2023-11-19 04:34:06.583912 EET [s=1eb59e7934984104ab3b61f5d9648057;i=115b6d4;b=7282d89d2e6e4299969a6030302ff3e4;m=69b419673;t=60a783417ac72;x=2cec5dde8bf01ee7] +# journalctl -r -o verbose SYSLOG_IDENTIFIER=nginx-log +Wed 2023-12-06 13:23:07.083299 EET [s=5290f0133f25407aaa1e2c451c0e4756;i=57194;b=0dfa96ecc2094cecaa8ec0efcb93b865;m=b133308867;t=60bd59346a289;x=5c1bdacf2b9c4bbd] PRIORITY=6 _UID=0 _GID=0 - _BOOT_ID=7282d89d2e6e4299969a6030302ff3e4 - _MACHINE_ID=6b72c55db4f9411dbbb80b70537bf3a8 - _HOSTNAME=costa-xps9500 + _CAP_EFFECTIVE=1ffffffffff + 
_SELINUX_CONTEXT=unconfined + _BOOT_ID=0dfa96ecc2094cecaa8ec0efcb93b865 + _MACHINE_ID=355c8eca894d462bbe4c9422caf7a8bb + _HOSTNAME=lab-logtest-src _RUNTIME_SCOPE=system _TRANSPORT=journal - _CAP_EFFECTIVE=1ffffffffff + MESSAGE=GET /index.html HTTP/1.1 + NGINX_BODY_BYTES_SENT=4172 + NGINX_HTTP_REFERER=- + NGINX_HTTP_USER_AGENT=Go-http-client/1.1 + NGINX_REMOTE_ADDR=1.2.3.4 + NGINX_REMOTE_USER=- + NGINX_REQUEST_METHOD=GET + NGINX_REQUEST_URI=/index.html + NGINX_SERVER_PROTOCOL=HTTP/1.1 + NGINX_STATUS=200 + NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000 + SYSLOG_IDENTIFIER=nginx-log + _PID=114343 + _COMM=systemd-cat-nat + _AUDIT_SESSION=253 _AUDIT_LOGINUID=1000 - _AUDIT_SESSION=1 - _SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-59780d3d-a3ff-4a82-a6fe-8d17d2261106.scope + _SYSTEMD_CGROUP=/user.slice/user-1000.slice/session-253.scope + _SYSTEMD_SESSION=253 _SYSTEMD_OWNER_UID=1000 - _SYSTEMD_UNIT=user@1000.service - _SYSTEMD_USER_UNIT=vte-spawn-59780d3d-a3ff-4a82-a6fe-8d17d2261106.scope + _SYSTEMD_UNIT=session-253.scope _SYSTEMD_SLICE=user-1000.slice - _SYSTEMD_USER_SLICE=app-org.gnome.Terminal.slice - _SYSTEMD_INVOCATION_ID=6195d8c4c6654481ac9a30e9a8622ba1 - _COMM=systemd-cat-nat - MESSAGE=GET /index.html HTTP/1.1 # <<<<<<<<< CHECK - NGINX_BODY_BYTES_SENT=4172 # <<<<<<<<< CHECK - NGINX_HTTP_REFERER=- # <<<<<<<<< CHECK - NGINX_HTTP_USER_AGENT=Go-http-client/1.1 # <<<<<<<<< CHECK - NGINX_HTTP_VERSION=1.1 # <<<<<<<<< CHECK - NGINX_METHOD=GET # <<<<<<<<< CHECK - NGINX_REMOTE_ADDR=1.2.3.4 # <<<<<<<<< CHECK - NGINX_REMOTE_USER=- # <<<<<<<<< CHECK - NGINX_REQUEST_LENGTH=104 # <<<<<<<<< CHECK - NGINX_REQUEST_TIME=0.001 # <<<<<<<<< CHECK - NGINX_STATUS=200 # <<<<<<<<< CHECK - NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000 # <<<<<<<<< CHECK - NGINX_URL=/index.html # <<<<<<<<< CHECK - SYSLOG_IDENTIFIER=nginx-log # <<<<<<<<< CHECK - _PID=354312 - _SOURCE_REALTIME_TIMESTAMP=1700361246583912 + _SYSTEMD_USER_SLICE=-.slice + 
_SYSTEMD_INVOCATION_ID=c59e33ead8c24880b027e317b89f9f76 + _SOURCE_REALTIME_TIMESTAMP=1701861787083299 + +``` + +So, the log line, with all its fields parsed, ended up in systemd-journal. Now we can send all the nginx logs to systemd-journal like this: + +```bash +tail -F /var/log/nginx/access.log |\ + log2journal -f nginx.yaml |\ + systemd-cat-native +``` + +## Best practices + +**Create a systemd service unit**: Add the above commands to a systemd unit file. When you run it in a systemd unit file you will be able to start/stop it and also see its status. Furthermore you can use the `LogNamespace=` directive of systemd service units to isolate your nginx logs from the logs of the rest of the system. Here is how to do it: + +Create the file `/etc/systemd/system/nginx-logs.service` (change `/path/to/nginx.yaml` to the right path): ``` +[Unit] +Description=NGINX Log to Systemd Journal +After=network.target + +[Service] +ExecStart=/bin/sh -c 'tail -F /var/log/nginx/access.log | log2journal -f /path/to/nginx.yaml | systemd-cat-native' +LogNamespace=nginx-logs +Restart=always +RestartSec=3 + +[Install] +WantedBy=multi-user.target +``` -So, the log line, with all its fields parsed, ended up in systemd-journal. +Reload systemd to grab this file: -The complete example, would look like the following script. -Running this script with parameter `test` will produce output on the terminal for you to inspect. -Unmatched log entries are added to the journal with PRIORITY=1 (`ERR_ALERT`), so that you can spot them. +```bash +sudo systemctl daemon-reload +``` -We also used the `--filename-key` of `log2journal`, which parses the filename when `tail` switches output -between files, and adds the field `NGINX_LOG_FILE` with the filename each log line comes from. +Enable and start the service: -Finally, the script also adds the field `NGINX_STATUS_FAMILY` taking values `2xx`, `3xx`, etc, so that -it is easy to find all the logs of a specific status family. 
+```bash +sudo systemctl enable nginx-logs.service +sudo systemctl start nginx-logs.service +``` + +To see the logs of the namespace, use: ```bash -#!/usr/bin/env bash - -test=0 -last=0 -send_or_show='./systemd-cat-native' -[ "${1}" = "test" ] && test=1 && last=100 && send_or_show=cat - -pattern='(?x) # Enable PCRE2 extended mode -^ -(?<NGINX_REMOTE_ADDR>[^ ]+) \s - \s # NGINX_REMOTE_ADDR -(?<NGINX_REMOTE_USER>[^ ]+) \s # NGINX_REMOTE_USER -\[ - (?<NGINX_TIME_LOCAL>[^\]]+) # NGINX_TIME_LOCAL -\] -\s+ " -(?<MESSAGE> # MESSAGE - (?<NGINX_METHOD>[A-Z]+) \s+ # NGINX_METHOD - (?<NGINX_URL>[^ ]+) \s+ # NGINX_URL - HTTP/(?<NGINX_HTTP_VERSION>[^"]+) # NGINX_HTTP_VERSION -) -" \s+ -(?<NGINX_STATUS>\d+) \s+ # NGINX_STATUS -(?<NGINX_BODY_BYTES_SENT>\d+) \s+ # NGINX_BODY_BYTES_SENT -"(?<NGINX_HTTP_REFERER>[^"]*)" \s+ # NGINX_HTTP_REFERER -"(?<NGINX_HTTP_USER_AGENT>[^"]*)" # NGINX_HTTP_USER_AGENT -' - -tail -n $last -F /var/log/nginx/*access.log \ - | log2journal "${pattern}" \ - --filename-key 'NGINX_LOG_FILE' \ - --unmatched-key 'MESSAGE' \ - --inject-unmatched 'PRIORITY=1' \ - --inject 'PRIORITY=${NGINX_STATUS}' \ - --rewrite 'PRIORITY=/^5/3' \ - --rewrite 'PRIORITY=/.*/6' \ - --inject 'NGINX_STATUS_FAMILY=${NGINX_STATUS}' \ - --rewrite 'NGINX_STATUS_FAMILY=/^(?<first_digit>[0-9]).*$/${first_digit}xx' \ - --rewrite 'NGINX_STATUS_FAMILY=/^.*$/UNKNOWN' \ - --inject 'SYSLOG_IDENTIFIER=nginx-log' \ - | $send_or_show +journalctl -f --namespace=nginx-logs ``` +Netdata will automatically pick the new namespace and present it at the list of sources of the dashboard. + +You can also instruct `systemd-cat-native` to log to a remote system, sending the logs to a `systemd-journal-remote` instance running on another server. Check [the manual of systemd-cat-native](https://github.com/netdata/netdata/blob/master/libnetdata/log/systemd-cat-native.md). + + ## `log2journal` options ``` |