Commit db94f0c

Merge pull request #14266 from mburke5678/logging-json-parsing
Logging json parsing
2 parents dd26120 + 0ef763d commit db94f0c

2 files changed: +75 -0 lines changed

logging/efk-logging-fluentd.adoc

Lines changed: 2 additions & 0 deletions
@@ -40,5 +40,7 @@ include::modules/efk-logging-fluentd-external.adoc[leveloffset=+1]
 
 include::modules/efk-logging-fluentd-throttling.adoc[leveloffset=+1]
 
+include::modules/efk-logging-fluentd-json.adoc[leveloffset=+1]
+
 
modules/efk-logging-fluentd-json.adoc

Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@

// Module included in the following assemblies:
//
// * logging/efk-logging-fluentd.adoc

[id="efk-logging-fluentd-json-{context}"]
= Configuring Fluentd JSON parsing

You can configure Fluentd to inspect each log message to determine if the message is in *JSON* format and merge the message into the JSON payload document posted to Elasticsearch. This feature is disabled by default.

You can enable or disable this feature by editing the `MERGE_JSON_LOG` environment variable in the *fluentd* daemonset.

[IMPORTANT]
====
Enabling this feature comes with risks, including:

* Possible log loss due to Elasticsearch rejecting documents because of inconsistent type mappings.
* Potential buffer storage leak caused by rejected message cycling.
* Overwriting of data for fields with the same names.

The features in this topic should be used only by experienced Fluentd and Elasticsearch users.
====

.Prerequisite

* Set cluster logging to the unmanaged state. In the managed state, the Cluster Logging Operator reverts any changes made to the `fluentd` configuration map (see the sketch below).
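
For reference, one way to satisfy this prerequisite is to edit the ClusterLogging custom resource (`oc edit ClusterLogging instance`) and set the management state to `Unmanaged`. This is a minimal sketch assuming the default resource name `instance`; adjust it for your cluster:

----
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
spec:
  managementState: "Unmanaged"
----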

.Procedure

Use the following command to enable this feature:

----
oc set env ds/fluentd MERGE_JSON_LOG=true <1>
----
<1> Set this to `false` to disable this feature or `true` to enable it.
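
To illustrate what merging does, consider a container log message that is itself valid JSON. With `MERGE_JSON_LOG=true`, Fluentd parses the message and merges its keys into the top level of the record posted to Elasticsearch. The field names and values below are invented for the example and abbreviated:

----
Log message written by the application:

{"level":"error","code":404,"detail":"page not found"}

Keys merged into the document posted to Elasticsearch (abbreviated):

{
  "level": "error",
  "code": 404,
  "detail": "page not found",
  "kubernetes": { ... },
  ...
}
----

Note that `code` keeps its numeric JSON type, which matters for the index mapping discussion in the next section.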

[id="efk-logging-fluentd-json-string-{context}"]
== Setting MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING

You can use link:https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/README.md[environment variables] to modify your Fluentd configuration.

However, if you set the `CDM_UNDEFINED_TO_STRING` environment variable to `true` when `MERGE_JSON_LOG` is also set to `true`, and you have already generated conflicting fields, Elasticsearch returns a *400* error until the indices roll over for the next day. The error occurs because when `MERGE_JSON_LOG=true`, Fluentd adds fields with data types other than *string*. When you set `CDM_UNDEFINED_TO_STRING=true`, Fluentd attempts to add those same fields as *string* values, which results in the Elasticsearch *400* error.

When Fluentd rolls over the indices for the next day's logs, it creates a brand-new index with updated field definitions, and the *400* error no longer occurs.
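
As a hypothetical illustration of such a conflict, suppose an application logs a field whose JSON value is a number. The field name below is invented for the example:

----
With MERGE_JSON_LOG=true, the first document of the day creates a
numeric mapping for the field:

{"statuscode": 200}

A later record handled under CDM_UNDEFINED_TO_STRING=true is submitted
with a string value, conflicting with the mapping already in the index:

{"statuscode": "200"}
----

Per the behavior described above, Elasticsearch rejects the second document with a *400* error; the conflict clears only when the next day's index is created with fresh mappings.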

Records that have *hard* errors, such as schema violations and corrupted data, cannot be retried. Fluentd sends those records for error handling. If you link:https://docs.fluentd.org/v1.0/articles/config-file#@error-label[add a `<label @ERROR>` section] to your Fluentd config, as the last `<label>`, you can handle these records as needed.

For example:

----
data:
  fluent.conf:

    ....

    <label @ERROR>
      <match **>
        @type file
        path /var/log/fluent/dlq
        time_slice_format %Y%m%d
        time_slice_wait 10m
        time_format %Y%m%dT%H%M%S%z
        compress gzip
      </match>
    </label>
----

This section writes error records to the link:https://www.elastic.co/guide/en/logstash/current/dead-letter-queues.html[Elasticsearch dead letter queue (DLQ) file]. See link:https://docs.fluentd.org/v0.12/articles/out_file[the fluentd documentation] for more information about the file output.

You can then edit the file to clean up the records manually, reformat the file for use with the Elasticsearch `/_bulk` index API, and use cURL to add those records. For more information on the Elasticsearch Bulk API, see link:https://www.elastic.co/guide/en/elasticsearch/reference/5.6/docs-bulk.html[the Elasticsearch documentation].
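
A minimal sketch of that last step, assuming the cleaned-up records have been reformatted into bulk (NDJSON) format in a hypothetical file named `dlq-records.json`, with Elasticsearch reachable at `localhost:9200`; the index and type names are placeholders:

----
Contents of dlq-records.json, one action line before each record:

{ "index" : { "_index" : "logging-index", "_type" : "fluentd" } }
{ "level" : "error", "detail" : "recovered record" }

Send the file with cURL:

curl -XPOST 'localhost:9200/_bulk' \
     -H 'Content-Type: application/x-ndjson' \
     --data-binary @dlq-records.json
----

The bulk API requires newline-delimited JSON, and the data file must end with a newline.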
