The SignalFx Entity ETL is a Node.js script which performs the following configurable steps:
- extracts entities like VMs, databases, load balancers, etc. from SignalFx that were added or updated since the last run
- transforms them into a format suitable for a target system
- loads transformed entities to the target system using HTTP requests
We recommend scheduling the script to run at regular intervals. See below for details.
git clone git@github.com:signalfx/entity-etl.git
cd entity-etl
Update the config file to match the requirements of the target system you want to use. See Configuration Parameters below for details.
We recommend putting the SignalFx Access Token in the SIGNALFX_ACCESS_TOKEN environment variable. Refer to the SignalFx documentation to find out how to obtain the Access Token.
If your target system requires some form of authentication, you can use environment variables for its credentials in the same way.
You can run the script either with a Node.js installation on your system or in a Docker container.
Prerequisites for running with Node.js:
- Node.js 12.14.x or later installed.
- Network connectivity to the Node Package Manager (npm) to install the script's dependencies (one-time operation).
- Network connectivity to SignalFx and the target system.
Steps:
- Install the script's dependencies first:
    npm install
- Run the script in one of the following modes:
  - node app: processes all entity types
  - node app awsEc2 gce: processes the specified types only (awsEc2 and gce in this case)
Prerequisites for running with Docker:
- Docker 19.x or later installed.
- Network connectivity to Docker Hub to build the Docker image with the Entity ETL script.
- Network connectivity to SignalFx and the target system.
Steps:
- Update the config file. This is optional; you can also provide an updated config.json file later.
- Build your own Docker image:
    docker build -t entity-etl .
- Run the script in the Docker container:
    docker run -v $PWD/data/cache:/app/data/cache -e SIGNALFX_ACCESS_TOKEN=$SIGNALFX_ACCESS_TOKEN entity-etl
  The above example assumes you updated the config.json file before building the Docker image. If you want to provide an updated config.json after the image is built, you can use the following command:
    docker run -v $PWD/data/cache:/app/data/cache -v $PWD/config.json:/app/config.json \
      -e SIGNALFX_ACCESS_TOKEN=$SIGNALFX_ACCESS_TOKEN entity-etl
  You can also specify a list of entity types to process:
    docker run -v $PWD/data/cache:/app/data/cache -v $PWD/config.json:/app/config.json \
      -e SIGNALFX_ACCESS_TOKEN=$SIGNALFX_ACCESS_TOKEN entity-etl node app awsEc2 gce
Note: the Entity ETL script uses the local filesystem to store cache data. To ensure this data is not lost between Docker container runs, we recommend mounting a volume as shown above (i.e. -v $PWD/data/cache:/app/data/cache) to persist the cache outside of the Docker container.
You can use a tool like cron to run the script at a regular interval.
Due to API limitations, we recommend an interval of 15 minutes.
Using Node.js:
- Use the crontab -e command to edit your crontab file.
- Enter the following line to run the script every 15 minutes:
    */15 * * * * cd /path/to/the/entity-etl/script && /usr/local/bin/node app > /tmp/entity-etl.log
Using Docker:
- Optionally, adjust the crontab file if you want to use an interval other than 15 minutes, and rebuild the Docker image afterwards.
- Use the following command to schedule the script execution:
    docker run -d -v $PWD/data/cache:/app/data/cache -v $PWD/config.json:/app/config.json \
      -e SIGNALFX_ACCESS_TOKEN=$SIGNALFX_ACCESS_TOKEN entity-etl crond -f
Parameter | Description | Allowed Values |
---|---|---|
logLevel | Script's log level. | trace, debug, info, warn, error, silent |
sfx.server | SignalFx API server URL. It is shown on your profile page in SignalFx. | https://api.<realm>.signalfx.com |
sfx.headers | List of HTTP headers attached to all requests sent to SignalFx. The X-SF-TOKEN is a mandatory header. Refer to the SignalFx documentation to check how to obtain the Access Token. | Note it is possible to use environment variables in header values like this: {{env.MY_SECRET}} |
sfx.entitiesTypesEndpoint | SignalFx API endpoint used to fetch supported entity types and associated metadata. | /v2/entities/types |
sfx.entitiesEndpoint | SignalFx API endpoint used to fetch entities of a given type updated after given time. Please note that {{placeholder}} syntax denotes a template. {{type}} will be replaced with the actual entity type and {{updatedFromMs}} will be replaced with the last known entity update time. It is also possible to use environment variables here, e.g. {{env.FIXED_UPDATED_FROM_MS}}. | /v2/entities?type={{type}}&updatedFromMs={{updatedFromMs}} |
target.method | Specifies HTTP method used to send transformed entities to a target system. | Any standard HTTP method supported by the target system. |
target.server | Target system server URL. | DNS name or IP address of the target system. |
target.headers | List of HTTP headers attached to all requests sent to the target system. | Note it is possible to use environment variables in header values like this: {{env.MY_SECRET}} |
target.entitiesEndpoint | Target system API endpoint used to store entities of a given type. Please note that {{placeholder}} syntax denotes a template. {{type}} will be replaced with the actual entity type and environment variables are replaced with their values here, e.g. {{env.FIXED_UPDATED_FROM_MS}}. | Any valid endpoint on the target.server that accepts transformed entities as an HTTP request body. |
target.maxBatchSize | Specifies max number of entities sent to the target system in a single HTTP request. | Positive integer. Maximum value is dictated by the target system limits. |
entitiesCacheTtlInHours | Entities fetched from SignalFx are cached so that the script can send only new or updated entities to the target system. This parameter specifies how long entities are kept in the cache. When entities are added, updated or retrieved from cache their TTL is updated. | Any positive integer. Recommended value: 8 hours. |
All of the above parameters are required (though the target.headers list may be empty if the target system does not need them).
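To illustrate what target.maxBatchSize implies, here is a minimal sketch of splitting a list of entities into batches so that each HTTP request carries at most that many entities. The function name toBatches is hypothetical and is not taken from the script; the script's actual batching code may differ.

```javascript
// Hypothetical helper (not part of the Entity ETL script): splits a list of
// entities into batches that contain at most maxBatchSize entities each.
function toBatches(entities, maxBatchSize) {
  if (!Number.isInteger(maxBatchSize) || maxBatchSize <= 0) {
    throw new RangeError('maxBatchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < entities.length; i += maxBatchSize) {
    batches.push(entities.slice(i, i + maxBatchSize));
  }
  return batches;
}

// Five entities with a batch size of 2 produce batches of 2, 2 and 1 entities.
const batchSizes = toBatches(['a', 'b', 'c', 'd', 'e'], 2).map((b) => b.length);
console.log(batchSizes);
```

Each batch would then be sent to the target system as a separate request.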
Here's a sample configuration file.
{
  "logLevel": "info",
  "sfx": {
    "server": "https://api.us1.signalfx.com",
    "headers": {
      "X-SF-TOKEN": "{{env.SIGNALFX_ACCESS_TOKEN}}"
    },
    "entitiesTypesEndpoint": "/v2/entities/types",
    "entitiesEndpoint": "/v2/entities?type={{type}}&updatedFromMs={{updatedFromMs}}"
  },
  "target": {
    "method": "PUT",
    "server": "http://localhost:9090/",
    "headers": {
      "Authentication": "Bearer {{env.MY_SECRET_TOKEN}}",
      "Content-Type": "application/json"
    },
    "entitiesEndpoint": "/sample/{{type}}?jwt={{env.MY_SECRET_TOKEN}}",
    "maxBatchSize": 10000
  },
  "entitiesCacheTtlInHours": 1
}
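The {{placeholder}} values in the configuration are filled in at run time. The sketch below shows the idea with a simplified, regex-based stand-in; it is not the script's implementation (the script uses Handlebars), and the function name resolvePlaceholders and the sample token value are made up for illustration. Unlike Handlebars, this stand-in does not HTML-escape values and leaves unknown placeholders untouched.

```javascript
// Simplified stand-in for template rendering (the script itself uses Handlebars).
// Replaces {{name}} and {{env.NAME}} placeholders with values from `context`.
function resolvePlaceholders(template, context) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, path) => {
    // Walk dotted paths such as "env.MY_SECRET" through the context object.
    const value = path
      .split('.')
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), context);
    // Keep unknown placeholders as they are so that mistakes are easy to spot.
    return value === undefined ? match : String(value);
  });
}

const sampleContext = {
  type: 'awsEc2',
  updatedFromMs: 1582304819692,
  env: { SIGNALFX_ACCESS_TOKEN: 'example-token' }, // in practice: process.env
};

console.log(resolvePlaceholders(
  '/v2/entities?type={{type}}&updatedFromMs={{updatedFromMs}}',
  sampleContext
));
console.log(resolvePlaceholders('{{env.SIGNALFX_ACCESS_TOKEN}}', sampleContext));
```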
The Entity ETL script stores entities fetched from SignalFx in cache files in the data/cache folder. Each entity type uses a separate file to store cached values. These files are maintained by the script. If you want to fetch all entities from SignalFx again so that they are all pushed to the target system, remove all files in the cache folder (but keep the folder itself in place).
See the entitiesCacheTtlInHours parameter description in the Configuration Parameters section for additional information.
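The TTL behavior described for entitiesCacheTtlInHours (the TTL is renewed whenever an entity is added, updated or retrieved) can be sketched as follows. This is an in-memory illustration only: the script itself keeps its cache in files, and the class name TtlCache and the injectable clock are assumptions made for this example.

```javascript
// In-memory sketch of a cache whose entries expire after a TTL that is
// renewed on every write and every successful read. Not the script's code.
class TtlCache {
  constructor(ttlInHours, now = () => Date.now()) {
    this.ttlMs = ttlInHours * 60 * 60 * 1000;
    this.now = now; // injectable clock, which makes the cache easy to test
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop the entry
      return undefined;
    }
    entry.expiresAt = this.now() + this.ttlMs; // a read renews the TTL
    return entry.value;
  }
}

// Usage with a fake clock (time in milliseconds).
let fakeNowMs = 0;
const entityCache = new TtlCache(1, () => fakeNowMs);
entityCache.set('i-0123456789abcdefg', { aws_region: 'us-west-2' });
fakeNowMs = 30 * 60 * 1000; // 30 minutes later: still cached, TTL renewed
console.log(entityCache.get('i-0123456789abcdefg') !== undefined);
```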
The Entity ETL script uses the Handlebars templating language.
The templates folder contains several template files used to convert entities to a format supported by the target system. Each file is used to convert entities of a single type, e.g. the awsEc2.hbs file is used to convert awsEc2 entities.
The targetBody.hbs file is a special template used to combine one or more converted entities into a single HTTP request body sent to the target system.
The provided templates are suitable for the ServiceNow CMDB table API. Feel free to adapt them to your requirements.
To see what fields are available for a given entity type, inspect the /v2/entities endpoint response. See the Entity API section for details.
To see the list of supported entity types issue the following request:
curl -H "X-SF-TOKEN: $SIGNALFX_ACCESS_TOKEN" https://api.us1.signalfx.com/v2/entities/types
To see what fields are available for a given entity type inspect the response to the following request:
curl -H "X-SF-TOKEN: $SIGNALFX_ACCESS_TOKEN" https://api.us1.signalfx.com/v2/entities?type=awsRds
The request returns entities that were updated in the last 15 minutes. If the response is empty, you may want to specify a broader time range using the query parameters listed below.
Supported query parameters:
- type - required. Examples: awsEc2, azureVm. The complete list of currently supported types is returned by the /v2/entities/types endpoint.
- updatedToMs - optional. Default: current epoch time in milliseconds.
- updatedFromMs - optional. Default: current epoch time in milliseconds minus 15 minutes.
Note 1: We use the curl tool in the above examples. Feel free to use any other HTTP client.
Note 2: You may need to escape the ampersand character (i.e. \&) depending on the shell you use.
Note 3: The above examples assume your SignalFx API server is api.us1.signalfx.com. The actual value may be different; refer to your profile page in SignalFx to check the API server address.
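If you prefer to build the request URL in code, the following sketch assembles the /v2/entities URL from the query parameters listed above using the standard URL class available in Node.js. The function name buildEntitiesUrl and the one-hour window are illustrative assumptions; the server address is the same example value used in the curl commands.

```javascript
// Builds a /v2/entities request URL. Only `type` is required; the time range
// parameters are added when provided, otherwise the API defaults apply.
function buildEntitiesUrl(server, type, { updatedFromMs, updatedToMs } = {}) {
  const url = new URL('/v2/entities', server);
  url.searchParams.set('type', type);
  if (updatedFromMs !== undefined) {
    url.searchParams.set('updatedFromMs', String(updatedFromMs));
  }
  if (updatedToMs !== undefined) {
    url.searchParams.set('updatedToMs', String(updatedToMs));
  }
  return url.toString();
}

// Example: ask for awsRds entities updated during a one-hour window.
const windowEndMs = 1582320323006;
const windowStartMs = windowEndMs - 60 * 60 * 1000;
console.log(buildEntitiesUrl('https://api.us1.signalfx.com', 'awsRds', {
  updatedFromMs: windowStartMs,
  updatedToMs: windowEndMs,
}));
```

Building the URL this way also avoids the shell escaping issue mentioned in Note 2.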
- 200 HTTP OK
Returns the list of entities of the requested type, together with a flag indicating whether the results are partial. If they are partial, the client should repeat the request with a narrower time range (the Entity ETL script implements this behavior).
Sample response structure:
{
  "items": [
    {
      "AWSUniqueId": "i-0123456789abcdefg_us-west-2_123456789123",
      "aws_account_id": "123456789123",
      "aws_architecture": "x86_64",
      "aws_arn": "arn:aws:ec2:us-west-2:123456789123:instance/i-0123456789abcdefg",
      "aws_availability_zone": "us-west-2c",
      "aws_hypervisor": "xen",
      "aws_image_id": "ami-087c2c50437d0b80d",
      "aws_instance_id": "i-0123456789abcdefg",
      "aws_instance_type": "t3a.medium",
      "aws_launch_time": "Tue Feb 18 18:14:10 UTC 2020",
      "aws_private_dns_name": "ip-111-11-1-11.us-west-2.compute.internal",
      "aws_region": "us-west-2",
      "aws_reservation_id": "r-123456789abcdefg",
      "aws_root_device_type": "ebs",
      "aws_state": "{Code: 80,Name: stopped}",
      "aws_state_reason": "{Code: Client.UserInitiatedShutdown,Message: Client.UserInitiatedShutdown: User initiated shutdown}",
      "aws_tag_Name": "sample-test",
      "updatedOnMs": 1582304819692
    },
    {
      "AWSUniqueId": "i-0123456789abcdefh_us-east-2_123456789123",
      "aws_account_id": "123456789123",
      "aws_architecture": "x86_64",
      "aws_arn": "arn:aws:ec2:us-east-2:123456789123:instance/i-0123456789abcdefh",
      "aws_availability_zone": "us-east-2b",
      "aws_hypervisor": "xen",
      "aws_image_id": "ami-0307f7ccf6ea35750",
      "aws_instance_id": "i-0123456789abcdefh",
      "aws_instance_type": "c4.large",
      "aws_launch_time": "Fri Mar 22 16:18:01 UTC 2019",
      "aws_private_dns_name": "ip-10-0-100-100.us-east-2.compute.internal",
      "aws_region": "us-east-2",
      "aws_reservation_id": "r-0123456789abcdefh",
      "aws_root_device_type": "ebs",
      "aws_state": "{Code: 16,Name: running}",
      "aws_tag_Name": "Sample ECS host",
      "aws_tag_aws_autoscaling_groupName": "Sample-ecs-host-ECSAutoScalingGroup-ABCDEFGHIJKLM",
      "aws_tag_aws_cloudformation_logical-id": "ECSAutoScalingGroup",
      "aws_tag_aws_cloudformation_stack-id": "arn:aws:cloudformation:us-east-2:123456789123:stack/Sample-ecs-host/abcdefgh-1234-5678-9012-a1b2c3d4e5f6",
      "aws_tag_aws_cloudformation_stack-name": "Sample-ecs-host",
      "updatedOnMs": 1582320323006
    }
  ],
  "partialResults": false
}
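The handling of the partialResults flag can be sketched as a recursive split of the time range. This is one possible strategy under stated assumptions, not necessarily how the Entity ETL script narrows the range. fetchWindow is a hypothetical function that performs the request for a given window and resolves to an object shaped like the response above; fakeFetchWindow simulates an API that returns at most 3 items per response. Whether the time bounds are inclusive is not stated here, so both halves share the midpoint and a real client may need to remove duplicates.

```javascript
// Sketch: fetch a time window and, when the results are partial, split the
// window in half and fetch each half separately.
async function fetchAllEntities(fetchWindow, fromMs, toMs) {
  const response = await fetchWindow(fromMs, toMs);
  // Stop when the results are complete or the window cannot be split further.
  if (!response.partialResults || toMs - fromMs <= 1) {
    return response.items;
  }
  const midMs = Math.floor((fromMs + toMs) / 2);
  const older = await fetchAllEntities(fetchWindow, fromMs, midMs);
  const newer = await fetchAllEntities(fetchWindow, midMs, toMs);
  return older.concat(newer);
}

// Simulated API for demonstration purposes: at most 3 items per response.
const sampleItems = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].map((t) => ({ updatedOnMs: t }));
async function fakeFetchWindow(fromMs, toMs) {
  const matching = sampleItems.filter(
    (e) => e.updatedOnMs >= fromMs && e.updatedOnMs < toMs
  );
  return { items: matching.slice(0, 3), partialResults: matching.length > 3 };
}

fetchAllEntities(fakeFetchWindow, 0, 16).then((items) => {
  console.log(items.map((e) => e.updatedOnMs).join(','));
});
```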
- 400 HTTP Bad request
The most likely reason is an unsupported value of the type query parameter.
Sample response:
{
  "code": 400,
  "message": "{\"error\": \"Invalid value of the query param 'type'\""
}
- 401 HTTP Unauthorized
The token used to issue the request is invalid. Note the difference between a User API Token and an Access Token: a valid organization-level access token should be used to run the sample script. Also make sure that the script is pointing to the correct SignalFx realm.
Sample response:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401 Unauthorized</h2>
<table>
...
</table>
</body>
</html>
- 403 HTTP Forbidden
The Entity API is enabled by SignalFx at the customer's request. The following message means that the API is not enabled in the current organization.
{
  "error": "The requested endpoint is not enabled in your organization."
}
Please contact SignalFx support to have the feature enabled in the selected organization.
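The response codes above can be turned into troubleshooting hints in a client. The sketch below is illustrative only: the function name describeEntityApiStatus is hypothetical and the hint texts simply paraphrase the descriptions in this section.

```javascript
// Maps an Entity API HTTP status code to a short troubleshooting hint.
function describeEntityApiStatus(status) {
  switch (status) {
    case 200:
      return 'OK: check partialResults and narrow the time range if it is true.';
    case 400:
      return 'Bad request: most likely an unsupported value of the type query parameter.';
    case 401:
      return 'Unauthorized: use an organization-level access token and check the realm.';
    case 403:
      return 'Forbidden: the Entity API is not enabled; contact SignalFx support.';
    default:
      return `Unexpected status ${status}: not covered by this section.`;
  }
}

console.log(describeEntityApiStatus(403));
```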
- Entity data may become available with a delay that depends on the current load on the SignalFx system. The Entity ETL script accounts for this delay.
- Depending on the data source, SignalFx may extract metadata updates with a delay. For example, for sources based on cloud providers' APIs, the usual pull interval is 15 minutes.