author: Costa Tsaousis <costa@netdata.cloud> 2023-12-06 13:45:09 +0200
committer: GitHub <noreply@github.com> 2023-12-06 13:45:09 +0200
commit: 246e32e270c86cd68d8db48b2b49e8cd25f2bc73
tree: e9375fdfe04df517ac246159d87a1ee307ad4664 /collectors
parent: a860bdb5dc15df6fb822860fb886fd6f1bf82707
Update README.md
Diffstat (limited to 'collectors')
-rw-r--r-- collectors/log2journal/README.md 529
1 files changed, 380 insertions, 149 deletions
diff --git a/collectors/log2journal/README.md b/collectors/log2journal/README.md
index 85c36c68d6..f308b9a4b8 100644
--- a/collectors/log2journal/README.md
+++ b/collectors/log2journal/README.md
@@ -1,3 +1,4 @@
+
# log2journal
`log2journal` and `systemd-cat-native` can be used to convert a structured log file, such as the ones generated by web servers, into `systemd-journal` entries.
@@ -68,12 +69,11 @@ This pipeline ensures a flexible and comprehensive approach to log processing, a
## Real-life example
-We have an nginx server logging in this format:
+We have an nginx server logging in this standard combined log format:
```bash
- log_format access '$remote_addr - $remote_user [$time_local] '
+ log_format combined '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
- '$request_length $request_time '
'"$http_referer" "$http_user_agent"';
```
@@ -84,85 +84,185 @@ My nginx log uses this log format:
log_format access '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
- '$request_length $request_time '
'"$http_referer" "$http_user_agent"';
I want to use `log2journal` to convert this log for systemd-journal.
`log2journal` accepts a PCRE2 regular expression, using the named groups
in the pattern as the journal fields to extract from the logs.
-Prefix all PCRE2 group names with `NGINX_` and use capital characters only.
-
-For the $request, use the field `MESSAGE` (without NGINX_ prefix), so that
-it will appear in systemd journals as the message of the log.
-
-Please give me the PCRE2 pattern.
+Please give me the PCRE2 pattern to extract all the fields from my nginx
+log files.
```
ChatGPT replies with this:
```regexp
-^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>[^"]+)" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+```
+
+Let's see what the above says:
+
+1. `(?x)`: enable PCRE2 extended mode. In this mode, spaces and newlines in the pattern are ignored, so to match a space you have to use `\s`. This mode allows us to split the pattern into multiple lines and add comments to it.
+2. `^`: match the beginning of the line.
+3. `(?<remote_addr>[^ ]+)`: match anything up to the first space (`[^ ]+`), and name it `remote_addr`.
+4. `\s`: match a whitespace character (here, a space).
+5. `-`: match a hyphen.
+6. and so on...
+
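The pattern can also be sanity-checked outside `log2journal`. The following is a minimal sketch in Python, assuming Python's `re` module behaves like PCRE2 for this particular pattern. Note that Python spells named groups `(?P<name>...)` instead of `(?<name>...)`, and that `parse_line` is a hypothetical helper written for this illustration, not part of `log2journal`.

```python
import re
from typing import Dict, Optional

# Same structure as the PCRE2 pattern above, translated to Python syntax.
# re.VERBOSE plays the role of (?x): whitespace in the pattern is ignored.
PATTERN = re.compile(r'''
    ^
    (?P<remote_addr>[^ ]+) \s - \s
    (?P<remote_user>[^ ]+) \s
    \[
      (?P<time_local>[^\]]+)
    \]
    \s+ "
    (?P<request>
      (?P<request_method>[A-Z]+) \s+
      (?P<request_uri>[^ ]+) \s+
      (?P<server_protocol>[^"]+)
    )
    " \s+
    (?P<status>\d+) \s+
    (?P<body_bytes_sent>\d+) \s+
    "(?P<http_referer>[^"]*)" \s+
    "(?P<http_user_agent>[^"]*)"
''', re.VERBOSE)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named groups with uppercase keys, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    return {name.upper(): value for name, value in match.groupdict().items()}
```

A line in the combined format yields one entry per named group, while a line carrying extra fields between `$body_bytes_sent` and `"$http_referer"` is not matched at all.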
+We edit `nginx.yaml` and add it, like this:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
```
Let's test it with a sample line (instead of `tail`):
```bash
-# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>[^"]+)" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"'
-MESSAGE=GET /index.html HTTP/1.1
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+BODY_BYTES_SENT=4172
+HTTP_REFERER=-
+HTTP_USER_AGENT=Go-http-client/1.1
+REMOTE_ADDR=1.2.3.4
+REMOTE_USER=-
+REQUEST=GET /index.html HTTP/1.1
+REQUEST_METHOD=GET
+REQUEST_URI=/index.html
+SERVER_PROTOCOL=HTTP/1.1
+STATUS=200
+TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+
+```
+
+As you can see, it extracted all the fields and converted their names to uppercase, as systemd-journal expects.
+
+To make sure the fields are unique for nginx and do not interfere with other applications, we should prefix them with `NGINX_`:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_' # <<< we added this
+```
+
+And let's try it:
+
+```bash
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
-NGINX_REQUEST_LENGTH=104
-NGINX_REQUEST_TIME=0.001
+NGINX_REQUEST=GET /index.html HTTP/1.1
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
```
-As you can see, it extracted all the fields.
-
-The `MESSAGE` however, has 3 fields by itself: the method, the URL and the procotol version. Let's ask ChatGPT to extract these too:
-
+Now all fields start with `NGINX_`, but we want `NGINX_REQUEST` to be the `MESSAGE` of the log line, since `MESSAGE` is what `journalctl` and the Netdata dashboard show by default. Let's rename it:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_'
+
+rename:  # <<< we added this
+  - new_key: MESSAGE  # <<< we added this
+    old_key: NGINX_REQUEST  # <<< we added this
```
-I see that the MESSAGE has 3 key items in it. The request method (GET, POST,
-etc), the URL and HTTP protocol version.
-I want to keep the MESSAGE as it is, with all the information in it, but also
-extract the 3 items from it as separate fields.
-
-Can this be done?
-```
-
-ChatGPT responded with this:
-
-```regexp
-^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"
-```
-
-Let's test this too:
+Let's test it:
```bash
-# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"'
-MESSAGE=GET /index.html HTTP/1.1 # <<<<<<<<< MESSAGE
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
+MESSAGE=GET /index.html HTTP/1.1 # <<< renamed !
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
-NGINX_HTTP_VERSION=1.1 # <<<<<<<<< VERSION
-NGINX_METHOD=GET # <<<<<<<<< METHOD
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
-NGINX_REQUEST_LENGTH=104
-NGINX_REQUEST_TIME=0.001
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
-NGINX_URL=/index.html # <<<<<<<<< URL
```
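Conceptually, `prefix` and `rename` are two small transformations on the extracted key/value pairs. The sketch below is only an illustration of the behaviour seen in the outputs above, not `log2journal`'s implementation. It assumes the prefix is applied before the rename, which is consistent with `old_key` referring to the already prefixed `NGINX_REQUEST`. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def apply_prefix(fields: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Prepend the prefix to every extracted key."""
    return {prefix + key: value for key, value in fields.items()}


def apply_rename(fields: Dict[str, str], renames: List[Tuple[str, str]]) -> Dict[str, str]:
    """Rename keys given as (old_key, new_key) pairs; keys that are absent are skipped."""
    result = dict(fields)
    for old_key, new_key in renames:
        if old_key in result:
            result[new_key] = result.pop(old_key)
    return result
```

Applying `apply_prefix(fields, "NGINX_")` and then `apply_rename(..., [("NGINX_REQUEST", "MESSAGE")])` turns `REQUEST` into `MESSAGE` and leaves every other key with the `NGINX_` prefix.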
-Ideally, we would want the 5xx errors to be red in our `journalctl` output. To achieve that we need to add a PRIORITY field to set the log level. Log priorities are numeric and follow the `syslog` priorities. Checking `/usr/include/sys/syslog.h` we can see these:
+Ideally, we would want the 5xx errors to be red in our `journalctl` output and the dashboard. To achieve that we need to add a PRIORITY field to set the log level. Log priorities are numeric and follow the `syslog` priorities. Checking `/usr/include/sys/syslog.h` we can see these:
```c
#define LOG_EMERG 0 /* system is unusable */
@@ -175,73 +275,207 @@ Ideally, we would want the 5xx errors to be red in our `journalctl` output. To a
#define LOG_DEBUG 7 /* debug-level messages */
```
-Avoid setting priority to 0 (`LOG_EMERG`), because these will be on your terminal (the journal uses `wall` to let you know of such events). A good priority for errors is 3 (red in `journalctl`), or 4 (yellow in `journalctl`).
-
-To set the PRIORITY field in the output, we can use `NGINX_STATUS` fields. We need a copy of it, which we will alter later.
+Avoid setting priority to 0 (`LOG_EMERG`), because such messages are broadcast to your terminal (the journal uses `wall` to let you know of such events). A good priority for errors is 3 (red), or 4 (yellow).
+
+To set the PRIORITY field in the output, we can use `NGINX_STATUS`. We will do this in 2 steps: a) inject the priority field as a copy of `NGINX_STATUS`, and then b) use a pattern on its value to rewrite it to the priority level we want.
+
+First, let's inject it:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_'
+
+rename:
+  - new_key: MESSAGE
+    old_key: NGINX_REQUEST
+
+inject:  # <<< we added this
+  - key: PRIORITY  # <<< we added this
+    value: '${NGINX_STATUS}'  # <<< we added this
+```
-We can instruct `log2journal` to duplicate `NGINX_STATUS`, like this: `log2journal --inject 'PRIORITY=${NGINX_STATUS}'`. Let's try it:
+Let's see what this does:
```bash
-# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --inject 'PRIORITY=${NGINX_STATUS}'
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
-NGINX_HTTP_VERSION=1.1
-NGINX_METHOD=GET
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
-NGINX_REQUEST_LENGTH=104
-NGINX_REQUEST_TIME=0.001
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
-PRIORITY=200 # <<<<<<<<< PRIORITY IS HERE
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
-NGINX_URL=/index.html
+PRIORITY=200 # <<< PRIORITY added
+
+```
+Now we need to rewrite it to the right priority based on its value. We will assign the priority 6 (info) when the status is 1xx, 2xx, 3xx, priority 5 (notice) when status is 4xx, priority 3 (error) when status is 5xx and anything else will go to priority 4 (warning). Let's do it:
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_'
+
+rename:
+  - new_key: MESSAGE
+    old_key: NGINX_REQUEST
+
+inject:
+  - key: PRIORITY
+    value: '${NGINX_STATUS}'
+
+rewrite:  # <<< we added this
+  - key: PRIORITY  # <<< we added this
+    match: '^[123]'  # <<< we added this
+    value: 6  # <<< we added this
+
+  - key: PRIORITY  # <<< we added this
+    match: '^4'  # <<< we added this
+    value: 5  # <<< we added this
+
+  - key: PRIORITY  # <<< we added this
+    match: '^5'  # <<< we added this
+    value: 3  # <<< we added this
+
+  - key: PRIORITY  # <<< we added this
+    match: '.*'  # <<< we added this
+    value: 4  # <<< we added this
```
-Now that we have the `PRIORITY` field equal to the `NGINX_STATUS`, we can use instruct `log2journal` to change it to a valid priority, by appending: `--rewrite 'PRIORITY=/^5/3' --rewrite 'PRIORITY=/.*/6'`. These rewrite commands say to match everything that starts with `5` and replace it with priority `3` (error) and everything else with priority `6` (info). Let's see it:
+Rewrite rules are processed in order and, by default, the first rule that matches a field stops further processing for that field. This is why the last rule, which matches everything, does not always change the priority to 4.
+
+Let's test it:
```bash
-# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --inject 'PRIORITY=${NGINX_STATUS}' --rewrite 'PRIORITY=/^5/3' --rewrite 'PRIORITY=/.*/6'
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
-NGINX_HTTP_VERSION=1.1
-NGINX_METHOD=GET
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
-NGINX_REQUEST_LENGTH=104
-NGINX_REQUEST_TIME=0.001
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
-PRIORITY=6 # <<<<<<<<<< PRIORITY changed to 6
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
-NGINX_URL=/index.html
+PRIORITY=6 # <<< PRIORITY rewritten here
```
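The first-match behaviour of the four `PRIORITY` rules above can be sketched in a few lines of Python. This is an illustration of the rule ordering only, under the assumption that a matching rule replaces the whole value. `RULES` and `rewrite_priority` are hypothetical names, not part of `log2journal`.

```python
import re

# (match, value) pairs in the same order as the rewrite rules in nginx.yaml.
RULES = [
    (r'^[123]', '6'),  # 1xx, 2xx, 3xx -> info
    (r'^4', '5'),      # 4xx -> notice
    (r'^5', '3'),      # 5xx -> error
    (r'.*', '4'),      # anything else -> warning
]


def rewrite_priority(status: str) -> str:
    """Return the value of the first rule whose pattern matches the status."""
    for pattern, value in RULES:
        if re.search(pattern, status):
            return value  # first match wins, later rules are not evaluated
    return status
```

Because evaluation stops at the first match, a status of `200` never reaches the catch-all rule; only values that start with none of the digits 1 to 5 fall through to priority 4.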
-Similarly, we could duplicate `${NGINX_URL}` to `NGINX_ENDPOINT` and then process it to remove any query string, or replace IDs in the URL path with constant names, thus giving us uniform endpoints independently of the parameters.
+Rewrite rules are powerful. You can have named groups in them, like in the main pattern, to extract sub-fields from the value, which you can then use in variable substitution. You can use rewrite rules to anonymize the URLs, e.g. to remove customer IDs or transaction details from them.
+
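To make the anonymization idea concrete, here is a small Python sketch of the kind of transformation such a rewrite could perform: drop the query string and replace purely numeric path segments with a placeholder. It illustrates the idea only and is not `log2journal` rewrite syntax. The function name, the `{id}` placeholder, and the assumption that IDs are numeric path segments are all choices made for this example.

```python
import re


def anonymize_uri(uri: str) -> str:
    """Strip the query string and replace numeric path segments with '{id}'."""
    path = uri.split('?', 1)[0]
    # A numeric segment is '/digits' followed by another '/' or the end of the path.
    return re.sub(r'/\d+(?=/|$)', '/{id}', path)
```

With this, requests for different customers collapse to one uniform endpoint, which makes them easier to group while keeping identifiers out of the journal.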
+To complete the example, we can also inject a `SYSLOG_IDENTIFIER`. Generally your journal logs should always have 3 fields: `MESSAGE`, `PRIORITY` and `SYSLOG_IDENTIFIER`. These 3 fields make it a complete entry. Then you can add as many fields as required for your use case.
+
+```yaml
+pattern: |
+ (?x) # Enable PCRE2 extended mode
+ ^
+ (?<remote_addr>[^ ]+) \s - \s
+ (?<remote_user>[^ ]+) \s
+ \[
+ (?<time_local>[^\]]+)
+ \]
+ \s+ "
+ (?<request>
+ (?<request_method>[A-Z]+) \s+
+ (?<request_uri>[^ ]+) \s+
+ (?<server_protocol>[^"]+)
+ )
+ " \s+
+ (?<status>\d+) \s+
+ (?<body_bytes_sent>\d+) \s+
+ "(?<http_referer>[^"]*)" \s+
+ "(?<http_user_agent>[^"]*)"
+
+prefix: 'NGINX_'
+
+rename:
+  - new_key: MESSAGE
+    old_key: NGINX_REQUEST
+
+inject:
+  - key: PRIORITY
+    value: '${NGINX_STATUS}'
+  - key: SYSLOG_IDENTIFIER  # <<< we added this
+    value: 'nginx-log'  # <<< we added this
+
+rewrite:
+  - key: PRIORITY
+    match: '^[123]'
+    value: 6
+
+  - key: PRIORITY
+    match: '^4'
+    value: 5
+
+  - key: PRIORITY
+    match: '^5'
+    value: 3
+
+  - key: PRIORITY
+    match: '.*'
+    value: 4
+```
-To complete the example, we can also inject a `SYSLOG_IDENTIFIER` with `log2journal`, using `--inject SYSLOG_IDENTIFIER=nginx-log`, like this:
+Let's see it:
```bash
-# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --inject 'PRIORITY=${NGINX_STATUS}' --inject 'SYSLOG_IDENTIFIER=nginx' -rewrite 'PRIORITY=/^5/3' --rewrite 'PRIORITY=/.*/6'
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
-NGINX_HTTP_VERSION=1.1
-NGINX_METHOD=GET
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
-NGINX_REQUEST_LENGTH=104
-NGINX_REQUEST_TIME=0.001
+NGINX_REQUEST_METHOD=GET
+NGINX_REQUEST_URI=/index.html
+NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
-PRIORITY=6
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
-NGINX_URL=/index.html
-SYSLOG_IDENTIFIER=nginx-log # <<<<<<<<< THIS HAS BEEN ADDED
+PRIORITY=6
+SYSLOG_IDENTIFIER=nginx-log # <<< SYSLOG_IDENTIFIER added
```
@@ -249,105 +483,102 @@ Now the message is ready to be sent to a systemd-journal. For this we use `syste
```bash
-# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --inject 'PRIORITY=${NGINX_STATUS}' --inject 'SYSLOG_IDENTIFIER=nginx' -rewrite 'PRIORITY=/^5/3' --rewrite 'PRIORITY=/.*/6' | systemd-cat-native
+# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml | systemd-cat-native
# no output
# let's find the message
-# journalctl -o verbose SYSLOG_IDENTIFIER=nginx
-Sun 2023-11-19 04:34:06.583912 EET [s=1eb59e7934984104ab3b61f5d9648057;i=115b6d4;b=7282d89d2e6e4299969a6030302ff3e4;m=69b419673;t=60a783417ac72;x=2cec5dde8bf01ee7]
+# journalctl -r -o verbose SYSLOG_IDENTIFIER=nginx-log
+Wed 2023-12-06 13:23:07.083299 EET [s=5290f0133f25407aaa1e2c451c0e4756;i=57194;b=0dfa96ecc2094cecaa8ec0efcb93b865;m=b133308867;t=60bd59346a289;x=5c1bdacf2b9c4bbd]
PRIORITY=6
_UID=0
_GID=0
- _BOOT_ID=7282d89d2e6e4299969a6030302ff3e4
- _MACHINE_ID=6b72c55db4f9411dbbb80b70537bf3a8
- _HOSTNAME=costa-xps9500
+ _CAP_EFFECTIVE=1ffffffffff
+ _SELINUX_CONTEXT=unconfined
+ _BOOT_ID=0dfa96ecc2094cecaa8ec0efcb93b865
+ _MACHINE_ID=355c8eca894d462bbe4c9422caf7a8bb
+ _HOSTNAME=lab-logtest-src
_RUNTIME_SCOPE=system
_TRANSPORT=journal
- _CAP_EFFECTIVE=1ffffffffff
+ MESSAGE=GET /index.html HTTP/1.1
+ NGINX_BODY_BYTES_SENT=4172
+ NGINX_HTTP_REFERER=-
+ NGINX_HTTP_USER_AGENT=Go-http-client/1.1
+ NGINX_REMOTE_ADDR=1.2.3.4
+ NGINX_REMOTE_USER=-
+ NGINX_REQUEST_METHOD=GET
+ NGINX_REQUEST_URI=/index.html
+ NGINX_SERVER_PROTOCOL=HTTP/1.1
+ NGINX_STATUS=200
+ NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
+ SYSLOG_IDENTIFIER=nginx-log
+ _PID=114343
+ _COMM=systemd-cat-nat
+ _AUDIT_SESSION=253
_AUDIT_LOGINUID=1000
- _AUDIT_SESSION=1
- _SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-59780d3d-a3ff-4a82-a6fe-8d17d2261106.scope
+ _SYSTEMD_CGROUP=/user.slice/user-1000.slice/session-253.scope
+ _SYSTEMD_SESSION=253
_SYSTEMD_OWNER_UID=1000
- _SYSTEMD_UNIT=user@1000.service
- _SYSTEMD_USER_UNIT=vte-spawn-59780d3d-a3ff-4a82-a6fe-8d17d2261106.scope
+ _SYSTEMD_UNIT=session-253.scope
_SYSTEMD_SLICE=user-1000.slice
- _SYSTEMD_USER_SLICE=app-org.gnome.Terminal.slice
- _SYSTEMD_INVOCATION_ID=6195d8c4c6654481ac9a30e9a8622ba1
- _COMM=systemd-cat-nat
- MESSAGE=GET /index.html HTTP/1.1 # <<<<<<<<< CHECK
- NGINX_BODY_BYTES_SENT=4172 # <<<<<<<<< CHECK
- NGINX_HTTP_REFERER=- # <<<<<<<<< CHECK
- NGINX_HTTP_USER_AGENT=Go-http-client/1.1 # <<<<<<<<< CHECK
- NGINX_HTTP_VERSION=1.1 # <<<<<<<<< CHECK
- NGINX_METHOD=GET # <<<<<<<<< CHECK
- NGINX_REMOTE_ADDR=1.2.3.4 # <<<<<<<<< CHECK
- NGINX_REMOTE_USER=- # <<<<<<<<< CHECK
- NGINX_REQUEST_LENGTH=104 # <<<<<<<<< CHECK
- NGINX_REQUEST_TIME=0.001 # <<<<<<<<< CHECK
- NGINX_STATUS=200 # <<<<<<<<< CHECK
- NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000 # <<<<<<<<< CHECK
- NGINX_URL=/index.html # <<<<<<<<< CHECK
- SYSLOG_IDENTIFIER=nginx-log # <<<<<<<<< CHECK
- _PID=354312
- _SOURCE_REALTIME_TIMESTAMP=1700361246583912
+ _SYSTEMD_USER_SLICE=-.slice
+ _SYSTEMD_INVOCATION_ID=c59e33ead8c24880b027e317b89f9f76
+ _SOURCE_REALTIME_TIMESTAMP=1701861787083299
+
+```
+
+So, the log line, with all its fields parsed, ended up in systemd-journal. Now we can send all the nginx logs to systemd-journal like this:
+
+```bash
+tail -F /var/log/nginx/access.log |\
+ log2journal -f nginx.yaml |\
+ systemd-cat-native
+```
+
+## Best practices
+
+**Create a systemd service unit**: Add the above commands to a systemd unit file. Running the pipeline as a systemd service lets you start and stop it and check its status. Furthermore, you can use the `LogNamespace=` directive of systemd service units to isolate your nginx logs from the logs of the rest of the system. Here is how to do it:
+
+Create the file `/etc/systemd/system/nginx-logs.service` (change `/path/to/nginx.yaml` to the right path):
```
+[Unit]
+Description=NGINX Log to Systemd Journal
+After=network.target
+
+[Service]
+ExecStart=/bin/sh -c 'tail -F /var/log/nginx/access.log | log2journal -f /path/to/nginx.yaml | systemd-cat-native'
+LogNamespace=nginx-logs
+Restart=always
+RestartSec=3
+
+[Install]
+WantedBy=multi-user.target
+```
-So, the log line, with all its fields parsed, ended up in systemd-journal.
+Reload systemd to grab this file:
-The complete example, would look like the following script.
-Running this script with parameter `test` will produce output on the terminal for you to inspect.
-Unmatched log entries are added to the journal with PRIORITY=1 (`ERR_ALERT`), so that you can spot them.
+```bash
+sudo systemctl daemon-reload
+```
-We also used the `--filename-key` of `log2journal`, which parses the filename when `tail` switches output
-between files, and adds the field `NGINX_LOG_FILE` with the filename each log line comes from.
+Enable and start the service:
-Finally, the script also adds the field `NGINX_STATUS_FAMILY` taking values `2xx`, `3xx`, etc, so that
-it is easy to find all the logs of a specific status family.
+```bash
+sudo systemctl enable nginx-logs.service
+sudo systemctl start nginx-logs.service
+```
+
+To see the logs of the namespace, use:
```bash
-#!/usr/bin/env bash
-
-test=0
-last=0
-send_or_show='./systemd-cat-native'
-[ "${1}" = "test" ] && test=1 && last=100 && send_or_show=cat
-
-pattern='(?x) # Enable PCRE2 extended mode
-^
-(?<NGINX_REMOTE_ADDR>[^ ]+) \s - \s # NGINX_REMOTE_ADDR
-(?<NGINX_REMOTE_USER>[^ ]+) \s # NGINX_REMOTE_USER
-\[
- (?<NGINX_TIME_LOCAL>[^\]]+) # NGINX_TIME_LOCAL
-\]
-\s+ "
-(?<MESSAGE> # MESSAGE
- (?<NGINX_METHOD>[A-Z]+) \s+ # NGINX_METHOD
- (?<NGINX_URL>[^ ]+) \s+ # NGINX_URL
- HTTP/(?<NGINX_HTTP_VERSION>[^"]+) # NGINX_HTTP_VERSION
-)
-" \s+
-(?<NGINX_STATUS>\d+) \s+ # NGINX_STATUS
-(?<NGINX_BODY_BYTES_SENT>\d+) \s+ # NGINX_BODY_BYTES_SENT
-"(?<NGINX_HTTP_REFERER>[^"]*)" \s+ # NGINX_HTTP_REFERER
-"(?<NGINX_HTTP_USER_AGENT>[^"]*)" # NGINX_HTTP_USER_AGENT
-'
-
-tail -n $last -F /var/log/nginx/*access.log \
- | log2journal "${pattern}" \
- --filename-key 'NGINX_LOG_FILE' \
- --unmatched-key 'MESSAGE' \
- --inject-unmatched 'PRIORITY=1' \
- --inject 'PRIORITY=${NGINX_STATUS}' \
- --rewrite 'PRIORITY=/^5/3' \
- --rewrite 'PRIORITY=/.*/6' \
- --inject 'NGINX_STATUS_FAMILY=${NGINX_STATUS}' \
- --rewrite 'NGINX_STATUS_FAMILY=/^(?<first_digit>[0-9]).*$/${first_digit}xx' \
- --rewrite 'NGINX_STATUS_FAMILY=/^.*$/UNKNOWN' \
- --inject 'SYSLOG_IDENTIFIER=nginx-log' \
- | $send_or_show
+journalctl -f --namespace=nginx-logs
```
+Netdata will automatically pick up the new namespace and present it in the list of sources on the dashboard.
+
+You can also instruct `systemd-cat-native` to log to a remote system, sending the logs to a `systemd-journal-remote` instance running on another server. Check [the manual of systemd-cat-native](https://github.com/netdata/netdata/blob/master/libnetdata/log/systemd-cat-native.md).
+
+
## `log2journal` options
```