How to properly handle timestamps in a rsyslog -> telegraf syslog input -> telegraf loki output -> greptime loki input workflow? #6269
Hey all, I'm working on configuring pipelines to get my logs into better formats. I'm at the very early stages of figuring out greptimedb, so if something is covered in the manual, I may not have found it yet, or just not seen how it applies. I have definitely been using the manual, though. :) My setup for logs is this:
That all works. I end up with my table full of syslog entries. Now I want to split that table out based on what kind of log is ingested. Postfix, monit, kernel, etc. From the manual, that means I need to set up a pipeline. Since I'm somewhat familiar with parsing postfix logs, I picked that as my initial pipeline attempt. Using this as my test data on the dashboard's pipeline configuration screen:

{ "line": "facility_code=\"2\" severity_code=\"6\" version=\"1\" timestamp=\"1749390620136427000\" procid=\"3474351\" message=\" DF9887C022F: to=<user@example.org>, orig_to=<root>, relay=none, delay=283818, delays=283788/0.03/30/0, dsn=4.4.1, status=deferred (connect to smtp.example.org[n.n.n.n]:25: Connection timed out)\""}

I figured out how to use the regex processor to split everything up into string fields. (Note that I have not tried to address any of the other fields you can get in other postfix logs. Just the ones in my test data.) But when I try to transform the timestamp into an epoch, I get a "Failed to exec pipeline: Type: String value not supported for Epoch" message. This is my pipeline yaml:

processors:
  - regex:
      fields:
        - line
      patterns:
        - 'facility_code="(?<facility_code>\d+?)"'
        - 'severity_code="(?<severity_code>\d+?)"'
        - 'version="(?<version>\d+)"'
        - 'timestamp="(?<timestamp>\d+)"'
        - 'procid="(?<process_id>\d+)"'
        - 'message="\s?(?<postfix_queue_id>[\d\w]{11})\:?'
        - 'to=<(?<to>.*?)>{1}'
        - 'orig_to=<(?<orig_to>.*?)>{1}'
        - 'relay=(?<relay>.*?)[,\s$]+'
        - 'delay=(?<delay>.*?)[,\s$]+'
        - 'delays=(?<delays>.*?)[,\s$]+'
        - 'dsn=(?<dsn>.*?)[,\s$]+'
        - 'status=(?<status>.*?)[,\s$]+'
        - '(?<message>.*)'
transform:
  - fields:
      - line_postfix_queue_id
      - line_to, to
      - line_orig_to, orig_to
      - line_relay, relay
      - line_status, status
      - line_message, message
    type: string
    index: fulltext
  - fields:
      - line_facility_code, facility_code
      - line_severity_code, severity_code
      - line_version, version
    type: uint64
    index: fulltext
  - fields:
      - line_timestamp, timestamp
    type: epoch, ns
    index: timestamp
  - fields:
      - line_process_id, process_id
      - line_delay, delay
    type: uint64
  - fields:
      - line_delays, delays
      - line_dsn, dsn
    type: string

If I move the timestamp back into the first transform and get rid of the epoch transform, it works. (Other than message, it just grabs the whole line, but that can be figured out on my own.) The value I get when timestamp is a string is just digits. There are no non-digit characters in it. And if I add it to the uint64 transform, it is converted properly. I did try converting the string to an int, and then the int to an epoch. It didn't error, but instead of an "epoch" field, I just see "null" in the test results.

My goal is for the message timestamp to be the time used when viewing and searching logs. How can I do that? Any help would be appreciated!
Replies: 1 comment 2 replies
Hi,
Thanks for trying out GreptimeDB :)
First of all, the values you extracted from the regex are all strings (I'm sure you've figured it out!).
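To make that point concrete, here is a minimal sketch in plain Python, not GreptimeDB's pipeline engine. It runs the timestamp pattern from the question against a shortened copy of the test line and shows that the capture is text until something converts it explicitly. The variable names are only illustrative, and treating the digits as nanoseconds is an assumption taken from the `epoch, ns` target in the question.

```python
import re
from datetime import datetime, timezone

# Shortened copy of the test line from the question.
line = (
    'facility_code="2" severity_code="6" version="1" '
    'timestamp="1749390620136427000" procid="3474351"'
)

# Python's re module spells named groups (?P<name>...) rather than (?<name>...).
match = re.search(r'timestamp="(?P<timestamp>\d+)"', line)
raw = match.group("timestamp")

# A regex capture is always text, even when it contains only digits.
print(type(raw).__name__)

# Reading it as an epoch needs an explicit conversion and an explicit unit.
# Assumption: the digits are nanoseconds since the Unix epoch.
nanos = int(raw)
seconds, remainder_ns = divmod(nanos, 1_000_000_000)
parsed = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(parsed.isoformat(), remainder_ns)
```

Nothing in the string itself says whether the digits are seconds, milliseconds, or nanoseconds, so the unit has to be stated somewhere.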
In the transform section, the type field specifies the data type in the database table. The pipeline engine will try to 'convert' the value from the context to the desired type. In this case, line_timestamp is a string, and the target type is a timestamp in the database (a nanosecond timestamp, to be specific; the type description here only describes the target type and carries no information about the input type). The pipeline engine has no idea what the string looks like (is it a date string, or epoch digits?), so i…