From 46387bafe6d66d42e1975d500197b1d3a6ca4b2f Mon Sep 17 00:00:00 2001
From: Amee Lepcha
Date: Wed, 18 Jun 2025 20:46:11 +0530
Subject: [PATCH 1/2] Update collect-multiline-logs.md

---
 .../send-data/reference-information/collect-multiline-logs.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/send-data/reference-information/collect-multiline-logs.md b/docs/send-data/reference-information/collect-multiline-logs.md
index 5bdaa0d426..882a0f8dd2 100644
--- a/docs/send-data/reference-information/collect-multiline-logs.md
+++ b/docs/send-data/reference-information/collect-multiline-logs.md
@@ -8,6 +8,10 @@ Sumo Logic Sources by default have multiline processing enabled. Multiline proc

 Multiline processing requires your logs to have line breaks or carriage returns between messages. If the logs are part of a larger individual message (for example, JSON array or XML) Sumo Logic will in most cases not be able to break these into individual logs.

+:::warning
+Line breaks or carriage returns are control characters (`\r`, `\r\n`) that create new lines but are often invisible in text editors.
+:::
+
 ## Multiline Processing Caveats

 Multiline messages that are more than 2,000 lines or 512KB in size will get flushed and collected as single log lines due to the default log message size limitations. Depending on the Collector's available memory, you may be able to increase this limit. Contact Support for assistance by navigating to **Help** > **Support** in the Sumo Logic

From 4cc69db158c61b28fef413ce5e14e5a598c055c3 Mon Sep 17 00:00:00 2001
From: Amee Lepcha
Date: Fri, 20 Jun 2025 00:59:12 +0530
Subject: [PATCH 2/2] Update collect-multiline-logs.md

---
 .../collect-multiline-logs.md | 27 +++++++++----------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/docs/send-data/reference-information/collect-multiline-logs.md b/docs/send-data/reference-information/collect-multiline-logs.md
index 882a0f8dd2..2deb28ba82 100644
--- a/docs/send-data/reference-information/collect-multiline-logs.md
+++ b/docs/send-data/reference-information/collect-multiline-logs.md
@@ -4,12 +4,10 @@ title: Collecting Multiline Logs
 description: Sumo Logic Sources can be configured to detect log boundaries automatically or with a regular expression.
 ---

-Sumo Logic Sources by default have multiline processing enabled. Multiline processing is used to ensure a log message that is made up of multiple lines, separated by a line break or carriage return, are properly grouped as a single log message when ingested into Sumo Logic.
-
-Multiline processing requires your logs to have line breaks or carriage returns between messages. If the logs are part of a larger individual message (for example, JSON array or XML) Sumo Logic will in most cases not be able to break these into individual logs.
+Sumo Logic Sources, by default, have multiline processing enabled. Multiline processing is used to ensure that a log message made up of multiple lines, with each line separated by a line break or carriage return, is correctly grouped as a single log message when ingested into Sumo Logic.

 :::warning
-Line breaks or carriage returns are control characters (`\r`, `\r\n`) that create new lines but are often invisible in text editors.
+The line breaks or carriage returns are control characters used to create new lines, usually represented by the escape sequences `\r` and `\r\n`, but are often invisible in text editors. Sumo Logic will not be able to split your log messages that do not contain these characters.
 :::

 ## Multiline Processing Caveats
@@ -28,19 +26,19 @@ Sources have the option to be configured to automatically infer log boundaries o

 ## Infer Boundaries

-By default, **Infer Boundaries** is selected when **Multiline Processing** is enabled. The Collector will attempt to detect a common pattern which denotes the first line of a multiline message. The Collector will look at each line coming in from a Source and attempt to match that line to the known expression. If the line matches then the Collector will mark this as the start of a new message and any additional lines that do not match the expression will be assumed as part of that message. Once the Collector detects another line matching the expression it will flush the previous lines as a single message and mark that next line as the start of a new message.
+By default, **Infer Boundaries** is selected when **Multiline Processing** is enabled. The Collector will attempt to detect a common pattern that denotes the first line of a multiline message. The Collector will look at each line coming in from a Source and attempt to match that line to the known expression. If the line matches, then the Collector will mark this as the start of a new message, and any additional lines that do not match the expression will be assumed as part of that message. Once the Collector detects another line matching the expression, it will flush the previous lines as a single message and mark that next line as the start of a new message.

-The Collector will attempt to use the first 1,000 lines, or as many lines as appear within 30 seconds, and an algorithm to try and determine a pattern that may denote a new message starting line. **Infer boundaries** works best if the log messages contain a common anchor to start the line, such as a timestamp, and the formatting of the messages being received by the source are in a consistent format.
+The Collector will attempt to use the first 1,000 lines, or as many lines as appear within 30 seconds, and an algorithm to try and determine a pattern that may denote a new message starting line. **Infer boundaries** works best if the log messages contain a common anchor to start the line, such as a timestamp, and the formatting of the messages being received by the source is in a consistent format.

 ## Boundary Regex

 You can specify the boundary between messages using a regular expression. Enter a regular expression for the full first line of every multi-line message in your log files.

-In cases where a single Source is being used to collect multiple different types of files of varying formats or if no consistent pattern is detected within the messages being received then it is possible for each line to be flushed as a single message or some messages to be improperly grouped into a single message.
+In cases where a single Source is being used to collect multiple different types of files of varying formats or if no consistent pattern is detected within the messages being received, then it is possible for each line to be flushed as a single message or some messages to be improperly grouped into a single message.

-Even when ingesting a single Source type, auto detection is not guaranteed to work for all cases, this is noted within the Source configuration with the following text: `Please note, Infer Boundaries may not be accurate for all log types`. In this case, a custom **Boundary Regex** expression may be required for detecting the start of each log message.
+Even when ingesting a single Source type, auto-detection is not guaranteed to work for all cases. This is noted within the Source configuration with the following text: `Please note, Infer Boundaries may not be accurate for all log types`. In this case, a custom **Boundary Regex** expression may be required for detecting the start of each log message.

-When the option for **Boundary Regex** is used with the multiline detection the Collector will use the supplied regular expression to try and match the first line of a multiline message.
+When the option for **Boundary Regex** is used with the multiline detection, the Collector will use the supplied regular expression to try and match the first line of a multiline message.

 :::note
 The expression supplied must match the entire first line of a message up to, and in some cases including, the trailing line feed or carriage return.
@@ -58,8 +56,7 @@ Acceptable boundary expressions may be:
 * `.*\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}.*\n`
 * `^.*\[CPU-ResourceMonitor-1\].*`

-Unacceptable boundary expressions would include the following since they
-do not match the entire first line:
+Unacceptable boundary expressions would include the following, since they do not match the entire first line:

 * `^\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}`
 * `[CPU-ResourceMonitor-1\]`
@@ -68,12 +65,12 @@ do not match the entire first line:

 ### How Does Multiline Work With Syslog Sources?

-Sumo Logic does not provide any options for multiline detection within Syslog Sources. For Syslog messages received over UDP Sumo Logic will treat all content contained within a single syslog request as a single message.
+Sumo Logic does not provide any options for multiline detection within Syslog Sources. For Syslog messages received over UDP, Sumo Logic will treat all content contained within a single syslog request as a single message.

-When syslog messages are received over TCP Sumo Logic will treat each line within a request as a new message. This is because TCP is received as a data stream and the Collector will flush a message whenever a line feed is detected.
+When syslog messages are received over TCP, Sumo Logic will treat each line within a request as a new message. This is because TCP is received as a data stream, and the Collector will flush a message whenever a line feed is detected.

 ### How Does Multiline Work With HTTP Sources?

-Multiline detection on an HTTP source only works within the confines of a single HTTP request. If you send multiple multiline messages within a single HTTP post request the multiline options will apply to those messages. If you send a multiline message as separate POST requests the multiline options do not apply.
+Multiline detection on an HTTP source only works within the confines of a single HTTP request. If you send multiple multiline messages within a single HTTP post request, the multiline options will apply to those messages. If you send a multiline message as separate POST requests, the multiline options do not apply.

-Sumo Logic cannot thread together multiple HTTP posts into a single message. This is due to there being no guarantee of the order of receipt (simply the nature of HTTP) and because there is no certainty that multiple clients are not sending to the same HTTP Source, which may cause additional issues with how the order of messages are received.
+Sumo Logic cannot thread together multiple HTTP posts into a single message. This is due to there being no guarantee of the order of receipt (simply the nature of HTTP) and because there is no certainty that multiple clients are not sending to the same HTTP Source, which may cause additional issues with how the order of messages is received.
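
The full-line matching rule in the Boundary Regex section of the patched document can be illustrated with a small Python sketch. It is not the Collector's implementation, which the document does not describe. The helper names `is_boundary` and `group_messages` and the sample log lines are hypothetical. Python's `re.fullmatch` stands in for the "must match the entire first line" requirement, tried both with and without a trailing line feed. Grouping every line into one message when nothing matches is a choice made for this sketch only.

```python
import re


def is_boundary(pattern: str, line: str) -> bool:
    """Return True if `pattern` matches the whole line, with or without its line feed."""
    text = line.rstrip("\r\n")
    compiled = re.compile(pattern)
    # Assumption: accept a match on the bare line or on the line plus "\n",
    # mirroring "up to, and in some cases including, the trailing line feed".
    return bool(compiled.fullmatch(text) or compiled.fullmatch(text + "\n"))


def group_messages(lines, pattern):
    """Group lines into messages; each boundary line starts a new message."""
    messages, current = [], []
    for line in lines:
        if is_boundary(pattern, line) and current:
            # A new first line was seen: flush the lines collected so far.
            messages.append("\n".join(current))
            current = []
        current.append(line.rstrip("\r\n"))
    if current:
        messages.append("\n".join(current))
    return messages


# Hypothetical sample input: one multiline error followed by a single-line entry.
LOG = [
    "2025-06-18 20:46:11 ERROR request failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 3, in <module>',
    "2025-06-18 20:46:12 INFO retrying",
]

# Matches the entire first line (timestamp plus the rest of the line).
FULL_LINE = r"^\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}.*"
# Listed as unacceptable in the document: matches only the timestamp prefix.
PREFIX_ONLY = r"^\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}"

if __name__ == "__main__":
    print(len(group_messages(LOG, FULL_LINE)))
    print(len(group_messages(LOG, PREFIX_ONLY)))
```

With `FULL_LINE`, the traceback lines stay attached to the timestamped line that precedes them. With `PREFIX_ONLY`, no line counts as a boundary under full-line matching, so the sketch never splits the input.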