diff --git a/CODEOWNERS b/CODEOWNERS index 9ecd841656..672884605e 100644 --- a/CODEOWNERS +++ b/CODEOWNERS @@ -20,6 +20,9 @@ # DNS server /content/en/user-guide/tools/dns-server @simonrw @joe4dev @dfangl +# chaos engineering +/content/en/user-guide/chaos-engineering/ @viren-nadkarni + ###################### ### SERVICE OWNERS ### ###################### diff --git a/content/en/references/internal-endpoints.md b/content/en/references/internal-endpoints.md index b52c359774..efb9d6a3ae 100644 --- a/content/en/references/internal-endpoints.md +++ b/content/en/references/internal-endpoints.md @@ -18,13 +18,14 @@ The API path for the LocalStack internal resources is `/_localstack`. The following endpoints are available: | Endpoint | Description | -| ---------------------------------| --------------------------------------------------------------------------- | -| `/_localstack/health`| To check the available and running AWS services in LocalStack. You can use the endpoint to restart the LocalStack services. | -| `/_localstack/plugins` | Shows the [Plux plugins](https://github.com/localstack/localstack/blob/master/docs/localstack-concepts/README.md#plugins) information in LocalStack. | -| `/_localstack/init`| Shows the initialization status after setting up [Init hooks](https://docs.localstack.cloud/references/init-hooks/). | +| ------------------------------------ | --------------------------------------------------------------------------- | +| `/_localstack/health` | To check the available and running AWS services in LocalStack. You can use the endpoint to restart the LocalStack services. | +| `/_localstack/plugins` | Shows the [Plux plugins](https://github.com/localstack/localstack/blob/master/docs/localstack-concepts/README.md#plugins) information in LocalStack. | +| `/_localstack/init` | Shows the initialization status after setting up [Init hooks](https://docs.localstack.cloud/references/init-hooks/). 
| | `/_localstack/cloudformation/deploy` | Enables you to deploy CloudFormation templates locally through a web interface. | -| `/_localstack/diagnose`| Reports extensive and sensitive data from LocalStack instance, enabled via the `DEBUG=1` configuration variable. | +| `/_localstack/diagnose` | Reports extensive and sensitive data from LocalStack instance, enabled via the `DEBUG=1` configuration variable. | | `/_localstack/config` | Enables dynamic configuration updates at runtime, enabled via the `ENABLE_CONFIG_UPDATES` configuration variable. | +| `/_localstack/chaos` | Configuration endpoint for the [Chaos API]({{< ref "chaos-api" >}}). | | `/_localstack/state//save` | Get a snapshot of the given AWS service using the Persistence mechanism. | | `/_localstack/state//load` | Load the most recent snapshot of the given service using the Persistence mechanism. | | `/_localstack/state/reset` | Reset the state of the services using the Persistence mechanism. | diff --git a/content/en/tutorials/fault-injection-service-experiments/fis-experiment-1.png b/content/en/tutorials/fault-injection-service-experiments/fis-experiment-1.png deleted file mode 100644 index a2d832f597..0000000000 Binary files a/content/en/tutorials/fault-injection-service-experiments/fis-experiment-1.png and /dev/null differ diff --git a/content/en/tutorials/fault-injection-service-experiments/fis-experiment-2.png b/content/en/tutorials/fault-injection-service-experiments/fis-experiment-2.png deleted file mode 100644 index e1b66507aa..0000000000 Binary files a/content/en/tutorials/fault-injection-service-experiments/fis-experiment-2.png and /dev/null differ diff --git a/content/en/tutorials/fault-injection-service-experiments/fis-experiments.png b/content/en/tutorials/fault-injection-service-experiments/fis-experiments.png deleted file mode 100644 index 212f50d3ef..0000000000 Binary files a/content/en/tutorials/fault-injection-service-experiments/fis-experiments.png and /dev/null differ diff --git 
a/content/en/tutorials/fault-injection-service-experiments/index.md b/content/en/tutorials/fault-injection-service-experiments/index.md deleted file mode 100644 index e53c89b207..0000000000 --- a/content/en/tutorials/fault-injection-service-experiments/index.md +++ /dev/null @@ -1,404 +0,0 @@ ---- -title: "Chaos Engineering: Running Experiments with Fault Injection Service" -linkTitle: "Chaos Engineering: Running Experiments with Fault Injection Service" -weight: 8 -description: > - Conduct experiments on your AWS infrastructure to simulate faults and understand their effects, enhancing application resilience. -type: tutorials -teaser: "" -services: -- fis -- agw -- ddb -- lmb -platform: -- Java -deployment: -- awscli -tags: -- BASH -- FIS -- API Gateway -- DynamoDB -- Lambda -pro: true -leadimage: "fis-experiments.png" ---- - - -## Introduction - -Fault Injection Simulator (FIS) is a service designed for conducting controlled chaos engineering tests on AWS infrastructure. -Its purpose is to uncover vulnerabilities and improve system robustness. -FIS offers a means to deliberately introduce failures and observe their impacts, helping developers to better equip their systems against actual outages. -To read about the FIS service, refer to the dedicated [FIS documentation](https://docs.localstack.cloud/user-guide/aws/fis/). - -## Getting started - -This tutorial is designed for users new to the Fault Injection Simulator and assumes basic knowledge of the AWS CLI and our -[`awslocal`](https://github.com/localstack/awscli-local) wrapper script. -In this example, we will use the FIS to create controlled outages in a DynamoDB database. -The aim is to test the software's behavior and error handling capabilities. - -For this particular example, we'll be using a [sample application repository](https://github.com/localstack-samples/samples-chaos-engineering/tree/main/FIS-experiments). -Clone the repository, and follow the instructions below to get started. 
- -### Prerequisites - -The general prerequisites for this guide are: - -- LocalStack Pro with [LocalStack Auth Token]({{}}) -- [AWS CLI]({{}}) with the [`awslocal` wrapper]({{}}) -- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) - -Start LocalStack by using the `docker-compose.yml` file from the repository. -Ensure to set your Auth Token as an environment variable during this process. -The cloud resources will be automatically created upon the LocalStack start. - -{{< command >}} -$ LOCALSTACK_AUTH_TOKEN= -$ docker compose up -{{< /command >}} - -### Application Architecture - -The following diagram shows the architecture that this application builds and deploys: - -{{< figure src="fis-experiment-1.png" width="800">}} - -### Creating an experiment template - -Before starting any FIS experiments, it's important to verify that our application is functioning correctly. -Start by creating an entity and saving it. -To do this, use `cURL` to call the API Gateway endpoint for the POST method: - -{{< command >}} -$ curl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \ ---header 'Content-Type: application/json' \ ---data '{ - "id": "prod-2004", - "name": "Ultimate Gadget", - "price": "49.99", - "description": "The Ultimate Gadget is the perfect tool for tech enthusiasts looking for the next level in gadgetry. -Compact, powerful, and loaded with features." -}' - -Product added/updated successfully. - -{{< /command >}} - -You can use the file named `experiment-ddb.json` that contains the FIS experiment configuration. -This file will be used in the upcoming call to the [`CreateExperimentTemplate`](https://docs.aws.amazon.com/fis/latest/APIReference/API_CreateExperimentTemplate.html) API within the FIS resource. 
- -```bash -$ cat experiment-ddb.json -{ - "actions": { - "Test action 1": { - "actionId": "localstack:generic:api-error", - "parameters": { - "service": "dynamodb", - "api": "all", - "percentage": "100", - "exception": "DynamoDbException", - "errorCode": "500" - } - } - }, - "description": "Template for interfering with the DynamoDB service", - "stopConditions": [{ - "source": "none" - }], - "roleArn": "arn:aws:iam:000000000000:role/ExperimentRole" -} -``` - -This template is designed to target all APIs of the DynamoDB resource. -While it's possible to specify particular operations like `PutItem` or `GetItem`, the objective here is to entirely disconnect the database. - -As a result, this configuration will cause all API calls to fail with a 100% failure rate, each resulting in an HTTP 500 status code and a `DynamoDbException`. - -{{}} -$ awslocal fis create-experiment-template --cli-input-json file://experiment-ddb.json - -{ - "experimentTemplate": { - "id": "895591e8-11e6-44c4-adc3-86592010562b", - "description": "Template for interfering with the DynamoDB service", - "actions": { - "Test action 1": { - "actionId": "localstack:generic:api-error", - "parameters": { - "service": "dynamodb", - "api": "all", - "percentage": "100", - "exception": "DynamoDbException", - "errorCode": "500" - } - } - }, - "stopConditions": [ - { - "source": "none" - } - ], - "creationTime": 1699308754.415716, - "lastUpdateTime": 1699308754.415716, - "roleArn": "arn:aws:iam:000000000000:role/ExperimentRole" - } -} - -{{}} - -Take note of the `id` field in the response. -This is the ID of the experiment template that will be used in the next step. - -### Starting the experiment - -Following the creation of the experiment template, you can create a new experiment using the template's ID. 
- -{{< command >}} -$ awslocal fis start-experiment --experiment-template-id - -{ - "experiment": { - "id": "1b1238fd-316d-4956-93e7-5ada677a6f69", - "experimentTemplateId": "895591e8-11e6-44c4-adc3-86592010562b", - "roleArn": "arn:aws:iam:000000000000:role/ExperimentRole", - "state": { - "status": "running" - }, - "actions": { - "Test action 1": { - "actionId": "localstack:generic:api-error", - "parameters": { - "service": "dynamodb", - "api": "all", - "percentage": "100", - "exception": "DynamoDbException", - "errorCode": "500" - } - } - }, - "stopConditions": [ - { - "source": "none" - } - ], - "creationTime": 1699308823.74327, - "startTime": 1699308823.74327 - } -} - -{{< /command >}} - -Replace the `` placeholder with the ID of the experiment template that was created in the previous step. - -### Simulating an outage - -Once the experiment starts, the database becomes inaccessible. -This means users cannot retrieve or add new products, resulting in the API Gateway returning an Internal Server Error. -Downtime and data loss are critical issues to avoid in enterprise applications. - -Fortunately, encountering this issue early in the development phase allows developers to implement effective error handling and develop mechanisms to prevent data loss during a database outage. - -It's important to note that this approach is not limited to DynamoDB; outages can be simulated for any storage resource. - -### Setting up a solution - -{{< figure src="fis-experiment-2.png" width="800">}} - -A possible solution involves setting up an SNS topic, an SQS queue, and a Lambda function. -The Lambda function will be responsible for retrieving queued items and attempting to re-execute the `PutItem` operation on the database. -If DynamoDB remains unavailable, the item will be placed back in the queue for a later retry. 
- -{{< command >}} -$ curl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \ ---header 'Content-Type: application/json' \ ---data '{ - "id": "prod-1003", - "name": "Super Widget", - "price": "29.99", - "description": "A versatile widget that can be used for a variety of purposes. -Durable, reliable, and affordable." -}' - -A DynamoDB error occurred. -Message sent to queue. - -{{< /command >}} - -If we review the logs, it will show that the `DynamoDbException` has been managed effectively. - -```bash -2023-11-06T22:21:40.789 DEBUG --- [ asgi_gw_2] l.services.fis.handler : FIS handler called with configs: {'dynamodb': {None: [(100, 'DynamoDbException', '500')]}} -2023-11-06T22:21:40.789 INFO --- [ asgi_gw_2] localstack.request.aws : AWS dynamodb.PutItem => 500 (DynamoDbException) -2023-11-06T22:21:40.834 DEBUG --- [ asgi_gw_4] l.services.sns.publisher : Topic 'arn:aws:sns:us-east-1:000000000000:ProductEventsTopic' publishing '5520d37a-fc21-4a73-b1bf-f9b9afce5908' to subscribed -'arn:aws:sqs:us-east-1:000000000000:ProductEventsQueue' with protocol 'sqs' (subscription 'arn:aws:sns:us-east-1:000000000000:ProductEventsTopic:0a4abf8c-744a-404a-9ff9-f132e25d1b30') -``` - -This element will remain in the queue until the outage is resolved. 
- -### Stopping the experiment - -To stop the experiment, use the following command: - -{{< command >}} -$ awslocal fis stop-experiment --id - -{ - "experiment": { - "id": "1b1238fd-316d-4956-93e7-5ada677a6f69", - "experimentTemplateId": "895591e8-11e6-44c4-adc3-86592010562b", - "roleArn": "arn:aws:iam:000000000000:role/ExperimentRole", - "state": { - "status": "stopped" - }, - "actions": { - "Test action 1": { - "actionId": "localstack:generic:api-error", - "parameters": { - "service": "dynamodb", - "api": "all", - "percentage": "100", - "exception": "DynamoDbException", - "errorCode": "500" - }, - "startTime": 1699308823.750742, - "endTime": 1699309736.259625 - } - }, - "stopConditions": [ - { - "source": "none" - } - ], - "creationTime": 1699308823.74327, - "startTime": 1699308823.74327, - "endTime": 1699309736.259646 - } -} - -{{< /command >}} - -Replace the `` placeholder with the ID of the experiment that was created in the previous step. - -The experiment has been terminated, allowing the Product that initially failed to reach the database to finally be stored successfully. -This can be confirmed by scanning the database. - -{{< command >}} -$ awslocal dynamodb scan --table-name Products - -{ - "Items": [ - { - "name": { - "S": "Super Widget" - }, - "description": { - "S": "A versatile widget that can be used for a variety of purposes. -Durable, reliable, and affordable." - }, - "id": { - "S": "prod-1003" - }, - "price": { - "N": "29.99" - } - }, - { - "name": { - "S": "Ultimate Gadget" - }, - "description": { - "S": "The Ultimate Gadget is the perfect tool for tech enthusiasts looking for the next level in gadgetry. -Compact, powerful, and loaded with features." 
- }, - "id": { - "S": "prod-2004" - }, - "price": { - "N": "49.99" - } - } -], - "Count": 2, - "ScannedCount": 2, - "ConsumedCapacity": null -} - -{{< /command >}} - -### Configuring the latency - -The LocalStack FIS service can also introduce latency using the following experiment template: - -```bash -{ - "description": "template for testing delays in API calls", - "actions": { - "latency": { - "actionId": "localstack:generic:api-error", - "parameters": { - "latency": "4" - } - } - }, - "stopConditions": [ - { - "source": "none" - } - ], - "roleArn": "arn:aws:iam:000000000000:role/ExperimentRole" -} -``` - -Save this template as `latency-experiment.json` and use it to create an experiment definition through the FIS service: - -{{< command >}} -$ awslocal fis create-experiment-template --cli-input-json file://latency-experiment.json - -{ - "experimentTemplate": { - "id": "966f5632-4e2c-4567-b99c-436c333e523f", - "description": "template for testing delays in API calls", - "actions": { - "latency": { - "actionId": "localstack:generic:api-error", - "parameters": { - "latency": "4" - } - } - }, - "stopConditions": [ - { - "source": "none" - } - ], - "creationTime": 1699619228.208613, - "lastUpdateTime": 1699619228.208613, - "roleArn": "arn:aws:iam:000000000000:role/ExperimentRole" - } -} - -$ awslocal fis start-experiment --experiment-template-id -{{< /command >}} - -Replace the `` placeholder with the ID of the experiment template that was created in the previous step. - -While the experiment is active, you can use the same sample stack to observe and understand the effects of a 4-second delay on each service call. - -{{< command >}} -$ curl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \ ---header 'Content-Type: application/json' \ ---data '{ - "id": "prod-1088", - "name": "Super Widget", - "price": "29.99", - "description": "A versatile widget that can be used for a variety of purposes. -Durable, reliable, and affordable." 
-}' - -An error occurred (InternalError) when calling the GetResources operation (reached max retries: 4): Failing as per Fault Injection Simulator configuration - -{{< /command >}} diff --git a/content/en/tutorials/route53-failover-with-fis/index.md b/content/en/tutorials/route-53-failover/index.md similarity index 61% rename from content/en/tutorials/route53-failover-with-fis/index.md rename to content/en/tutorials/route-53-failover/index.md index a6b491d094..2cb206950c 100644 --- a/content/en/tutorials/route53-failover-with-fis/index.md +++ b/content/en/tutorials/route-53-failover/index.md @@ -1,13 +1,10 @@ --- -title: "Chaos Engineering: Route53 Failover with FIS" -linkTitle: "Chaos Engineering: Route53 Failover with FIS" -weight: 9 -description: > - Integrate FIS with Route 53 to create a resilient, self-repairing infrastructure, which manages traffic effectively during simulated disruptions. +title: "Chaos Engineering: Route53 Failover" +linkTitle: "Chaos Engineering: Route53 Failover" +description: Set up Route 53 failover to create a resilient, self-repairing infrastructure, which manages traffic effectively during simulated disruptions. type: tutorials teaser: "" services: -- fis - agw - ddb - lmb @@ -17,43 +14,34 @@ platform: deployment: - awscli tags: -- BASH -- FIS - Route53 - API Gateway - DynamoDB - Lambda +- Chaos Engineering pro: true leadimage: "route-53-failover.png" --- ## Introduction -LocalStack allows you to integrate & test [Fault Injection Simulator (FIS)](https://docs.localstack.cloud/user-guide/aws/fis/) with [Route53](https://docs.localstack.cloud/user-guide/aws/route53/) to automatically divert users to -a healthy secondary zone if the primary region fails, ensuring system availability and responsiveness. -Route53's health checks and -traffic redirection enhance architecture resilience and ensure service continuity during regional outages, crucial for uninterrupted -user experiences. 
+LocalStack allows you to integrate and test the [Chaos API]({{< ref "chaos-api" >}}) with [Route53]({{< ref "user-guide/aws/route53" >}}) to automatically divert users to a healthy secondary zone if the primary region fails, ensuring system availability and responsiveness. +Route53's health checks and traffic redirection enhance architecture resilience and ensure service continuity during regional outages, crucial for uninterrupted user experiences. {{< callout "note">}} -Route53 Failover with FIS is currently available as part of the **LocalStack Enterprise** plan. -If you'd like to try it out, -please [contact us](https://www.localstack.cloud/demo) to request access. +Route53 Failover and the Chaos API are currently available as part of the LocalStack Enterprise plan. +If you'd like to try them out, please [contact us](https://www.localstack.cloud/demo) to request access. {{< /callout >}} ## Getting started -This tutorial is designed for users new to the Route53 and FIS services. -In this example, there's an active-primary and -passive-standby configuration. -Route53 routes traffic to the primary region, which processes product-related requests through -API Gateway and Lambda functions, with data stored in DynamoDB. -If the primary region fails, Route53 redirects to the standby -region, maintained in sync by a replication Lambda function. +This tutorial is designed for users new to Route53 and the LocalStack Chaos API. +In this example, there's an active-primary and passive-standby configuration. +Route53 routes traffic to the primary region, which processes product-related requests through API Gateway and Lambda functions, with data stored in DynamoDB. +If the primary region fails, Route53 redirects to the standby region, maintained in sync by a replication Lambda function. For this particular example, we'll be using a [sample application repository](https://github.com/localstack-samples/samples-chaos-engineering/tree/main/route53-failover). 
-Clone the repository, and follow the -instructions below to get started. +Clone the repository, and follow the instructions below to get started. ### Prerequisites @@ -66,15 +54,14 @@ The general prerequisites for this guide are: - `dig` Start LocalStack by using the `docker-compose.yml` file from the repository. -Ensure to set your Auth Token as an environment variable -during this process. +Ensure to set your Auth Token as an environment variable during this process. {{< command >}} $ LOCALSTACK_AUTH_TOKEN= $ docker compose up {{< /command >}} -### Application Architecture +### Architecture The following diagram shows the architecture that this application builds and deploys: @@ -83,21 +70,17 @@ The following diagram shows the architecture that this application builds and de ### Creating the resources To begin, deploy the same services in both `us-west-1` and `us-east-1` regions. -The resources specified in the `init-resources.sh` -file will be created when the LocalStack container starts, using Initialization Hooks and the `awslocal` CLI tool. +The resources specified in the `init-resources.sh` file will be created when the LocalStack container starts, using [Initialization Hooks]({{< ref "references/init-hooks" >}}) and the `awslocal` CLI tool. The objective is to have a backup system in case of a regional outage in the primary availability zone (`us-west-1`). -We'll focus -on this region to examine the existing resilience mechanisms. +We'll focus on this region to examine the existing resilience mechanisms. {{< figure src="route53-failover-2.png" width="800">}} - The primary API Gateway includes a health check endpoint that returns a 200 HTTP status code, serving as a basic check for its availability. - Data synchronization across regions can be achieved with AWS-native tools like DynamoDB Streams and AWS Lambda. - Here, any changes to the -primary table trigger a Lambda function, replicating these changes to a secondary table. 
- This configuration is essential for high availability -and disaster recovery. + Here, any changes to the primary table trigger a Lambda function, replicating these changes to a secondary table. + This configuration is essential for high availability and disaster recovery. ### Configuring a Route53 hosted zone @@ -124,13 +107,10 @@ awslocal route53 create-health-check \ ) {{< /command >}} -This command creates a Route 53 health check for an HTTP endpoint (`12345.execute-api.localhost.localstack.cloud:4566/dev/healthcheck`) -with a 10-second request interval and captures the health check's ID. -The caller reference identifier in AWS resource creation or updates -prevents accidental duplication if requests are repeated. +This command creates a Route 53 health check for an HTTP endpoint (`12345.execute-api.localhost.localstack.cloud:4566/dev/healthcheck`) with a 10-second request interval and captures the health check's ID. +The caller reference identifier in AWS resource creation or updates prevents accidental duplication if requests are repeated. -To update DNS records in the specified Route53 hosted zone (`$HOSTED_ZONE_ID`), add two CNAME records: `12345.$HOSTED_ZONE_NAME` -pointing to `12345.execute-api.localhost.localstack.cloud`, and `67890.$HOSTED_ZONE_NAME` pointing to `67890.execute-api.localhost.localstack.cloud`. +To update DNS records in the specified Route53 hosted zone (`$HOSTED_ZONE_ID`), add two CNAME records: `12345.$HOSTED_ZONE_NAME` pointing to `12345.execute-api.localhost.localstack.cloud`, and `67890.$HOSTED_ZONE_NAME` pointing to `67890.execute-api.localhost.localstack.cloud`. Set a TTL (Time to Live) of 60 seconds for these records. {{< command >}} @@ -164,13 +144,10 @@ $ awslocal route53 change-resource-record-sets \ }' {{< /command >}} -Finally, we'll update the DNS records in the Route53 hosted zone identified by **`$HOSTED_ZONE_ID`**. -We're adding two CNAME records -for the subdomain `test.$HOSTED_ZONE_NAME`. 
-The first record points to `12345.$HOSTED_ZONE_NAME` and is linked with the earlier created -health check, designated as the primary failover target. -The second record points to `67890.$HOSTED_ZONE_NAME` and is set as the secondary -failover target. +Finally, we'll update the DNS records in the Route53 hosted zone identified by `$HOSTED_ZONE_ID`. +We're adding two CNAME records for the subdomain `test.$HOSTED_ZONE_NAME`. +The first record points to `12345.$HOSTED_ZONE_NAME` and is linked with the earlier created health check, designated as the primary failover target. +The second record points to `67890.$HOSTED_ZONE_NAME` and is set as the secondary failover target. {{< command >}} $ awslocal route53 change-resource-record-sets \ @@ -210,10 +187,8 @@ $ awslocal route53 change-resource-record-sets \ }' {{< /command >}} -This setup represents the basic failover configuration where traffic is redirected to different endpoints based on their health check -status. -To confirm that the CNAME record for `test.hello-localstack.com` points to `12345.execute-api.localhost.localstack.cloud`, -you can use the following `dig` command: +This setup represents the basic failover configuration where traffic is redirected to different endpoints based on their health check status. +To confirm that the CNAME record for `test.hello-localstack.com` points to `12345.execute-api.localhost.localstack.cloud`, you can use the following `dig` command: {{< command >}} $ dig @localhost test.hello-localstack.com CNAME @@ -233,74 +208,26 @@ test.hello-localstack.com. ### Creating a controlled outage Our setup is now complete and ready for testing. -To mimic a regional outage in the `us-west-1` region, we'll conduct an experiment that -halts all service invocations in this region, including the health check function. -Once the primary region becomes non-functional, -Route 53's health checks will fail. 
-This failure will activate the failover policy, redirecting traffic to the corresponding services -in the secondary region, thus maintaining service continuity. +To mimic a regional outage in the `us-west-1` region, we'll configure the [Chaos API]({{< ref "chaos-api" >}}) to halt all service invocations in this region, including the health check function. +Once the primary region becomes non-functional, Route 53's health checks will fail. +This failure will activate the failover policy, redirecting traffic to the corresponding services in the secondary region, thus maintaining service continuity. {{< command >}} -$ cat region-outage-experiment.json - -{ - "description": "template for internal server error for few regions i.e. us-west-1", - "actions": { - "regionUnavailable-us-west-1": { - "actionId": "localstack:generic:api-error", - "parameters": { - "region": "us-west-1", - "errorCode": "503" - } - } - }, - "stopConditions": [], - "roleArn": "arn:aws:iam:000000000000:role/ExperimentRole" -} - -{{< /command >}} - -This Fault Injection Simulator (FIS) experiment template is set up to mimic a `Service Unavailable` (503 error) in the `us-west-1` region. 
-To create the experiment template, use the following command: - -{{< command >}} -$ awslocal fis create-experiment-template --cli-input-json file://region-outage-experiment.json -{{< /command >}} - -Once the template is created, start the experiment using its ID: - -{{< command >}} -$ awslocal fis start-experiment --experiment-template-id - -{ - "experiment": { - "id": "651b5196-b244-4a8b-8ab6-d7b9e13998a0", - "experimentTemplateId": "d3a1a31b-c52e-49ec-8387-8f5eb75a11df", - "roleArn": "arn:aws:iam:000000000000:role/ExperimentRole", - "state": { - "status": "running" - }, - "actions": { - "regionUnavailable-us-east-1": { - "actionId": "localstack:generic:api-error", - "parameters": { - "region": "us-west-1", - "errorCode": "503" - } - } - }, - "stopConditions": [], - "creationTime": 1699902569.439826, - "startTime": 1699902569.439826 +$ curl -L -X POST 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \ +-H 'Content-Type: application/json' \ +-d ' +[ + { + "region": "us-west-1" } -} - +]' {{< /command >}} -Replace `` with the ID of the experiment template created in the previous step. -When the experiment is active, -Route 53's health checks will detect the failure and redirect traffic to the standby region as per the failover setup. -Confirm this redirection with: +This will cause all services to fail in the `us-west-1` region with a 503 Service Unavailable error. +Because of this, Route 53's health checks will detect the failure and redirect traffic to the standby region as per the failover setup. + +Confirm this redirection with the following command. +Notice that the secondary endpoint is returned in the CNAME answer. {{< command >}} $ dig @localhost test.hello-localstack.com CNAME @@ -360,7 +287,7 @@ Running the script will resolve the CNAME record for 'test.hello-localstack.com' {{< command >}} $ python3 dns-resolver.py -s{"price":"29.99","name":"Super Widget","description":"A versatile widget that can be used for a variety of purposes. 
+{"price":"29.99","name":"Super Widget","description":"A versatile widget that can be used for a variety of purposes. Durable, reliable, and affordable.","id":"prod-1088"} {{< /command >}} diff --git a/content/en/tutorials/route53-failover-with-fis/route-53-failover.png b/content/en/tutorials/route-53-failover/route-53-failover.png similarity index 100% rename from content/en/tutorials/route53-failover-with-fis/route-53-failover.png rename to content/en/tutorials/route-53-failover/route-53-failover.png diff --git a/content/en/tutorials/route53-failover-with-fis/route53-failover-1.png b/content/en/tutorials/route-53-failover/route53-failover-1.png similarity index 100% rename from content/en/tutorials/route53-failover-with-fis/route53-failover-1.png rename to content/en/tutorials/route-53-failover/route53-failover-1.png diff --git a/content/en/tutorials/route53-failover-with-fis/route53-failover-2.png b/content/en/tutorials/route-53-failover/route53-failover-2.png similarity index 100% rename from content/en/tutorials/route53-failover-with-fis/route53-failover-2.png rename to content/en/tutorials/route-53-failover/route53-failover-2.png diff --git a/content/en/tutorials/simulating-outages-in-your-application-stack/index.md b/content/en/tutorials/simulating-outages-in-your-application-stack/index.md deleted file mode 100644 index 50d1eea21f..0000000000 --- a/content/en/tutorials/simulating-outages-in-your-application-stack/index.md +++ /dev/null @@ -1,209 +0,0 @@ ---- -title: "Chaos Engineering: Simulating outages in your application stack" -linkTitle: "Chaos Engineering: Simulating outages in your application stack" -weight: 10 -description: > - Utilize the LocalStack Outages Extension to simulate service disruptions and assess how well your infrastructure can deploy and recover from unexpected situations. This tool helps you test the resilience of your system by creating controlled outage scenarios, allowing you to identify and improve upon weaknesses. 
-type: tutorials -teaser: "" -services: -- ecs -- ec2 -- agw -- ddb -platform: -- JavaScript -deployment: -- terraform -tags: -- BASH -- API Gateway -- DynamoDB -- ECS -pro: true -leadimage: "outages.png" ---- - -## Introduction - -[LocalStack Outages Extension](https://pypi.org/project/localstack-extension-outages/) can simulate outages for any AWS region or service. -You can install and use the Outages Extension through [LocalStack Extension mechanism](https://docs.localstack.cloud/user-guide/extensions/) to test infrastructure resilience by intentionally causing service outages and observing the system's recovery in scenarios with incomplete infrastructure is an effective approach. -This method evaluates the system's deployment mechanisms and its ability to handle and recover from infrastructure anomalies, a critical aspect of chaos engineering. - -{{< callout "note">}} -Outages Extension is currently available as part of the **LocalStack Enterprise** plan. -If you'd like to try it out, please [contact us](https://www.localstack.cloud/demo) to request access. -{{< /callout >}} - -## Getting started - -This guide is designed for users who are new to Outages Extension. -We'll simulate partial outages by interrupting specific services, such as halting an ECS instance creation or disrupting a database service. -By closely watching Terraform's responses and the status of AWS resources, you'll learn how Terraform manages these disruptions. - -For this particular example, we'll be using a Terraform configuration file from a [sample application repository](https://github.com/localstack-samples/samples-chaos-engineering/tree/main/extension-outages). -Clone the repository, and follow the instructions below to get started. 
- -### Prerequisites - -The general prerequisites for this guide are: - -- LocalStack Pro with [LocalStack CLI](https://docs.localstack.cloud/getting-started/installation/#localstack-cli) & [LocalStack Auth Token](https://docs.localstack.cloud/getting-started/auth-token/) -- [AWS CLI](https://docs.localstack.cloud/user-guide/integrations/aws-cli/) with the [`awslocal` wrapper](https://docs.localstack.cloud/user-guide/integrations/aws-cli/#localstack-aws-cli-awslocal) -- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) -- [Terraform](https://www.terraform.io/downloads.html) and [`tflocal` wrapper](https://docs.localstack.cloud/user-guide/integrations/terraform/#tflocal-wrapper-script). - -Start LocalStack by using the `docker-compose.yml` file from the repository. -Ensure to set your Auth Token as an environment variable during this process. - -{{< command >}} -$ LOCALSTACK_AUTH_TOKEN= -$ docker compose up -{{< /command >}} - -### Installing the extension - -To install the LocalStack Outages Extension, first set up your LocalStack Auth Token in your environment. -Once the token is configured, use the command below to install the extension: - -{{< command >}} -$ localstack extensions install localstack-extension-outages -{{< /command >}} - -Alternatively, you can enable automatic installation of the extension by setting the environment variable `EXTENSION_AUTO_INSTALL=localstack-extension-outages` when you start the LocalStack container. -This can be done by including it in your `docker` command line interface (CLI) or in your `docker-compose` configuration as an environment variable. - -Follow our [Managing Extensions documentation](https://docs.localstack.cloud/user-guide/extensions/managing-extensions/) for more information on how to install & manage extensions. - -### Running Terraform - -To get started, initialize & apply the Terraform configuration using the `tflocal` CLI to create the local resources. 
-The Terraform configuration file operates independently of the application, meaning the application won't be available during this phase. -To deploy the entire stack, including the application, refer to the [sample repository](https://github.com/localstack-samples/sample-terraform-ecs-apigateway). - -{{< command >}} -$ tflocal init -$ tflocal plan -$ tflocal apply -{{< /command >}} - -The following output would be retrieved: - -```bash -Apply complete! Resources: 57 added, 0 changed, 0 destroyed. - -Outputs: - -api_id = "3eed6d1d" -api_invoke_url = "https://3eed6d1d.execute-api.us-east-1.amazonaws.com" -api_invoke_url_foodstore_foods = "https://3eed6d1d.execute-api.us-east-1.amazonaws.com/foodstore/foods/{foodId}" -api_invoke_url_petstore_pets = "https://3eed6d1d.execute-api.us-east-1.amazonaws.com/petstore/domestic/pets/{petId}" -api_test_page = -container_security_group = "sg-db749514a062de41c" -ecs_cluster_name = "arn:aws:ecs:us-east-1:000000000000:cluster/ecs-cluster" -private_dns_namespace = "60bfac90" -vpc_id = "vpc-f9d6b124" -``` - -Next, you can update certain resources. -This includes increasing the number of tasks in the `task_definition` for the ECS service from 3 to 5 and upgrading the `openapi` specification version used by API Gateway from 3.0.1 to 3.1.0. - -### Simulating outages - -After running the Terraform `plan` command to preview these changes, you can simulate an outage affecting the ECS and API Gateway V2 services before applying the changes. 
-To do this, execute the following command: - -{{< command >}} -$ curl --location --request POST 'http://outages.localhost.localstack.cloud:4566/outages' \ ---header 'Content-Type: application/json' \ ---data-raw '[ - { - "service": "ecs", - "region": "us-east-1" - }, - { - "service": "apigatewayv2", - "region": "us-east-1" - } -]' -{{< /command >}} - -In the LocalStack logs, you'll notice that during the periods between successful calls, the controlled outages are marked by a `ServiceUnavailableException` accompanied by a 503 HTTP status code. -These exceptions specifically affect the targeted AWS APIs. - -```bash -2023-11-09T21:53:31.801 INFO --- [ asgi_gw_9] localstack.request.aws : AWS ec2.GetTransitGatewayRouteTableAssociations => 200 -2023-11-09T21:53:31.824 INFO --- [ asgi_gw_2] localstack.request.aws : AWS apigatewayv2.GetVpcLink => 503 (ServiceUnavailableException) -2023-11-09T21:53:31.828 INFO --- [ asgi_gw_6] localstack.request.aws : AWS servicediscovery.ListTagsForResource => 200 -2023-11-09T21:53:31.831 INFO --- [ asgi_gw_8] localstack.request.aws : AWS ec2.DescribeRouteTables => 200 -2023-11-09T21:53:31.834 INFO --- [ asgi_gw_7] localstack.request.aws : AWS servicediscovery.ListTagsForResource => 200 -2023-11-09T21:53:31.836 INFO --- [ asgi_gw_0] localstack.request.aws : AWS ec2.DescribePrefixLists => 200 -2023-11-09T21:53:31.842 INFO --- [ asgi_gw_1] localstack.request.aws : AWS ec2.DescribeSecurityGroups => 200 -2023-11-09T21:53:31.848 INFO --- [ asgi_gw_6] localstack.request.aws : AWS ec2.GetTransitGatewayRouteTablePropagations => 200 -2023-11-09T21:53:31.876 INFO --- [ asgi_gw_9] localstack.request.aws : AWS ec2.DescribeRouteTables => 200 -2023-11-09T21:53:31.879 INFO --- [ asgi_gw_5] localstack.request.aws : AWS ec2.DescribeRouteTables => 200 -2023-11-09T21:53:32.205 INFO --- [ asgi_gw_8] localstack.request.aws : AWS ecs.DescribeClusters => 503 (ServiceUnavailableException) -2023-11-09T21:53:32.280 INFO --- [ asgi_gw_3] localstack.request.aws : 
AWS ecs.DescribeTaskDefinition => 503 (ServiceUnavailableException) -2023-11-09T21:53:32.443 INFO --- [ asgi_gw_0] localstack.request.aws : AWS ecs.DescribeTaskDefinition => 503 (ServiceUnavailableException) -2023-11-09T21:53:32.584 INFO --- [ asgi_gw_6] localstack.request.aws : AWS apigatewayv2.GetVpcLink => 503 (ServiceUnavailableException) -2023-11-09T21:53:33.271 INFO --- [ asgi_gw_9] localstack.request.aws : AWS ecs.DescribeClusters => 503 (ServiceUnavailableException) -2023-11-09T21:53:33.473 INFO --- [ asgi_gw_2] localstack.request.aws : AWS ecs.DescribeTaskDefinition => 503 (ServiceUnavailableException) -2023-11-09T21:53:33.889 INFO --- [ asgi_gw_7] localstack.request.aws : AWS ecs.DescribeTaskDefinition => 503 (ServiceUnavailableException) -``` - -During infrastructure provisioning, depending on the tool and provider used, attempts may be made to reapply changes to resources following a failure, or the action might simply fail. - -### Simulating shutdowns - -To simulate the shutdown of an entire region, execute the following command: - -{{< command >}} -$ curl --location --request POST 'http://outages.localhost.localstack.cloud:4566/outages' \ ---header 'Content-Type: application/json' \ ---data-raw '[ - { - "service": "*", - "region": "us-east-1" - } -]' -{{< /command >}} - -### Other operations - -To stop outages, submit an empty list in the configuration using the following `POST` request: - -{{< command >}} -$ curl --location --request POST 'http://outages.localhost.localstack.cloud:4566/outages' \ ---header 'Content-Type: application/json' \ ---data-raw '[]' -{{< /command >}} - -To view the current configuration, use this `GET` request: - -{{< command >}} -$ curl --location --request GET 'http://outages.localhost.localstack.cloud:4566/outages' -{{< /command >}} - -To add a new service/region rule to the configuration, use a `PATCH` request as shown below: - -{{< command >}} -$ curl --location --request PATCH 
'http://outages.localhost.localstack.cloud:4566/outages' \ ---header 'Content-Type: application/json' \ ---data-raw '[{"service": "transcribe", "region": "us-west-1"}]' -{{< /command >}} - -To remove a service/region rule from the configuration, execute a `DELETE` request as follows: - -{{< command >}} -$ curl --location --request DELETE 'http://outages.localhost.localstack.cloud:4566/outages' \ ---header 'Content-Type: application/json' \ ---data-raw '[{"service": "transcribe", "region": "us-west-1"}]' -{{< /command >}} - -### Conclusion - -By closely watching Terraform's responses and the status of cloud resources, you'll learn how Terraform manages these disruptions. -It's important to note how it attempts to retry operations, whether it rolls back changes or faces partial failures, and how it logs these incidents. - -This is crucial for understanding the resilience of your infrastructure provisioning against challenging conditions. -It also aids in enhancing your IaC configurations, ensuring they are more robust and effective in handling faults and errors in real-life situations. 
diff --git a/content/en/tutorials/simulating-outages/arch-1.png b/content/en/tutorials/simulating-outages/arch-1.png new file mode 100644 index 0000000000..1833d942cf Binary files /dev/null and b/content/en/tutorials/simulating-outages/arch-1.png differ diff --git a/content/en/tutorials/simulating-outages/arch-2.png b/content/en/tutorials/simulating-outages/arch-2.png new file mode 100644 index 0000000000..46d4e4a103 Binary files /dev/null and b/content/en/tutorials/simulating-outages/arch-2.png differ diff --git a/content/en/tutorials/simulating-outages-in-your-application-stack/outages.png b/content/en/tutorials/simulating-outages/banner.png similarity index 100% rename from content/en/tutorials/simulating-outages-in-your-application-stack/outages.png rename to content/en/tutorials/simulating-outages/banner.png diff --git a/content/en/tutorials/simulating-outages/index.md b/content/en/tutorials/simulating-outages/index.md new file mode 100644 index 0000000000..c0c280cf3f --- /dev/null +++ b/content/en/tutorials/simulating-outages/index.md @@ -0,0 +1,237 @@ +--- +title: "Chaos Engineering: Simulating Outages using Chaos API" +linkTitle: "Chaos Engineering: Simulating Outages using Chaos API" +description: Use the Chaos API to simulate service disruptions and assess how well your infrastructure can deploy and recover from unexpected situations. +type: tutorials +teaser: "" +services: +- ecs +- ec2 +- agw +- ddb +platform: +- JavaScript +deployment: +- terraform +tags: +- API Gateway +- DynamoDB +- ECS +- Chaos Engineering +pro: true +leadimage: "banner.png" +aliases: + - /tutorials/simulating-outages-in-your-application-stack/ +--- + +## Introduction + +[LocalStack Chaos API]({{< ref "chaos-api" >}}) is capable of simulating infrastructure faults to allow conducting controlled chaos engineering tests on AWS infrastructure. +Its purpose is to uncover vulnerabilities and improve system robustness. 
+Chaos API offers a means to deliberately introduce failures and observe their impacts, helping developers to better equip their systems against actual outages. + +## Getting started + +In this tutorial we study the effects of outages on a sample AWS application. +We use the Chaos API to simulate the outage and design a mitigation to make the application resilient against database outages. + +This tutorial is designed for users new to the Chaos API and assumes basic knowledge of the AWS CLI and our [`awslocal`](https://github.com/localstack/awscli-local) wrapper script. +In this example, we will use the Chaos API to create controlled outages in a DynamoDB database. +The aim is to test the software's behavior and error handling capabilities. + +For this particular example, we'll be using a [sample application repository](https://github.com/localstack-samples/samples-chaos-engineering/tree/master/chaos-api). +Clone the repository, and follow the instructions below to get started. + +### Prerequisites + +The general prerequisites for this guide are: + +- LocalStack Pro with [LocalStack Auth Token]({{}}) +- [AWS CLI]({{}}) with the [`awslocal` wrapper]({{}}) +- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) + +Start LocalStack by using the `docker-compose.yml` file from the repository. +Ensure to set your Auth Token as an environment variable during this process. +The cloud resources will be automatically created upon the LocalStack start. + +{{< command >}} +$ LOCALSTACK_AUTH_TOKEN= +$ docker compose up +{{< /command >}} + +### Architecture + +The following diagram shows the architecture that this application builds and deploys: + +{{< figure src="arch-1.png" width="800">}} + +### Preflight checks + +Before starting any outages, it's important to verify that our application is functioning correctly. +Start by creating an entity and saving it. 
+To do this, use curl to call the API Gateway endpoint for the POST method: + +{{< command >}} +$ curl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \ +--header 'Content-Type: application/json' \ +--data '{ + "id": "prod-2004", + "name": "Ultimate Gadget", + "price": "49.99", + "description": "The Ultimate Gadget is the perfect tool for tech enthusiasts looking for the next level in gadgetry. +Compact, powerful, and loaded with features." +}' + +Product added/updated successfully. + +{{< /command >}} + +### Simulating the outage + +Next, we will configure the Chaos API to target all DynamoDB operations. +The Chaos API is powerful enough to refine outages to particular operations like `PutItem` or `GetItem`, but the objective here is to simulate a failure of the entire service. +The following configuration will cause DynamoDB API calls to fail with an 80% probability, each failure resulting in an HTTP 500 status code and a `SomethingWentWrong` error. + +{{< command >}} +$ curl --location --request PATCH 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \ +--header 'Content-Type: application/json' \ +--data ' +[ + { + "service": "dynamodb", + "probability": 0.8, + "error": { + "statusCode": 500, + "code": "SomethingWentWrong" + } + } +]' +{{< /command >}} + +This makes the database largely inaccessible. +External clients and LocalStack services can no longer reliably retrieve or add products, which results in the API Gateway returning an Internal Server Error. + +Downtime and data loss are critical issues to avoid in enterprise applications. +Fortunately, encountering this issue early in the development phase allows developers to implement effective error handling and develop mechanisms to prevent data loss during a database outage. + +### Designing a more resilient system + +{{< figure src="arch-2.png" width="800">}} + +A possible solution involves setting up an SNS topic, an SQS queue, and a Lambda function.
+The Lambda function will be responsible for retrieving queued items and attempting to re-execute the `PutItem` operation on the database. +If DynamoDB remains unavailable, the item will be placed back in the queue for a later retry. + +{{< command >}} +$ curl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \ +--header 'Content-Type: application/json' \ +--data '{ + "id": "prod-1003", + "name": "Super Widget", + "price": "29.99", + "description": "A versatile widget that can be used for a variety of purposes. +Durable, reliable, and affordable." +}' + +A DynamoDB error occurred. +Message sent to queue. + +{{< /command >}} + +If we review the logs, they show that the `DynamoDbException` has been handled gracefully. + +```text +2023-11-06T22:21:40.789 DEBUG --- [ asgi_gw_2] l.services.fis.handler : FIS handler called with configs: {'dynamodb': {None: [(100, 'DynamoDbException', '500')]}} +2023-11-06T22:21:40.789 INFO --- [ asgi_gw_2] localstack.request.aws : AWS dynamodb.PutItem => 500 (DynamoDbException) +2023-11-06T22:21:40.834 DEBUG --- [ asgi_gw_4] l.services.sns.publisher : Topic 'arn:aws:sns:us-east-1:000000000000:ProductEventsTopic' publishing '5520d37a-fc21-4a73-b1bf-f9b9afce5908' to subscribed +'arn:aws:sqs:us-east-1:000000000000:ProductEventsQueue' with protocol 'sqs' (subscription 'arn:aws:sns:us-east-1:000000000000:ProductEventsTopic:0a4abf8c-744a-404a-9ff9-f132e25d1b30') +``` + +This item will remain in the queue until the outage is resolved. + +### Ending the outage + +To stop the outage, clear the fault configuration with the following request: + +{{< command >}} +$ curl --location --request POST 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \ +--header 'Content-Type: application/json' \ +--data '[]' +{{< /command >}} + +With the outage now ended, the product that initially failed to reach the database is finally stored successfully. +This can be confirmed by scanning the database.
+ +{{< command >}} +$ awslocal dynamodb scan --table-name Products + +{ + "Items": [ + { + "name": { + "S": "Super Widget" + }, + "description": { + "S": "A versatile widget that can be used for a variety of purposes. +Durable, reliable, and affordable." + }, + "id": { + "S": "prod-1003" + }, + "price": { + "N": "29.99" + } + }, + { + "name": { + "S": "Ultimate Gadget" + }, + "description": { + "S": "The Ultimate Gadget is the perfect tool for tech enthusiasts looking for the next level in gadgetry. +Compact, powerful, and loaded with features." + }, + "id": { + "S": "prod-2004" + }, + "price": { + "N": "49.99" + } + } +], + "Count": 2, + "ScannedCount": 2, + "ConsumedCapacity": null +} + +{{< /command >}} + +### Introducing network latency + +The LocalStack Chaos API can also introduce a network latency for all connections. +This can be done with the following configuration: + +{{< command >}} +$ curl --location --request POST 'http://localhost.localstack.cloud:4566/_localstack/chaos/effects' \ +--header 'Content-Type: application/json' \ +--data '{ + "latency": 5000 +}' +{{< /command >}} + +With this configured, you can use the same sample stack to observe and understand the effects of a 5-second delay on each service call. + +{{< command >}} +$ curl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \ +--max-time 2 \ +--header 'Content-Type: application/json' \ +--data '{ + "id": "prod-1088", + "name": "Super Widget", + "price": "29.99", + "description": "A versatile widget that can be used for a variety of purposes. +Durable, reliable, and affordable." 
+}' + +An error occurred (InternalError) when calling the GetResources operation (reached max retries: 4) + +{{< /command >}} diff --git a/content/en/user-guide/aws/fis/index.md b/content/en/user-guide/aws/fis/index.md index dafbcbf7e6..bd1f3dad5c 100644 --- a/content/en/user-guide/aws/fis/index.md +++ b/content/en/user-guide/aws/fis/index.md @@ -3,7 +3,7 @@ title: "Fault Injection Service (FIS)" linkTitle: "Fault Injection Service (FIS)" description: > Get started with Fault Injection Service (FIS) on LocalStack -tags: ["Pro image"] +tags: ["Enterprise plan"] --- ## Introduction @@ -15,6 +15,10 @@ The full list of such possible fault injections is available in the [AWS docs](h LocalStack allows you to use the FIS APIs in your local environment to introduce faults in other services, in order to check how your setup behaves when parts of it stop working locally. The supported APIs are available on our [API coverage page](https://docs.localstack.cloud/references/coverage/coverage_fis/), which provides information on the extent of FIS API's integration with LocalStack. +{{< callout "tip" >}} +LocalStack also features its own powerful chaos engineering tool, [Chaos API]({{< ref "chaos-api" >}}). +{{< /callout >}} + ## Concepts FIS defines the following elements: @@ -26,6 +30,11 @@ FIS defines the following elements: Together this is termed as an Experiment. After the designated time, FIS restores systems to their original state and/or ceases introducing faults. +{{< callout "note" >}} +FIS experiment emulation is part of LocalStack Enterprise. +If you'd like to try it out, please [contact us](https://www.localstack.cloud/demo). +{{< /callout >}} + FIS actions can be categorized into two main types: 1. Single-time events — For example, the `aws:ec2:stop-instances` FIS action, which sends a [`StopInstances`](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_StopInstances.html) API to specific EC2 instances. 
@@ -232,20 +241,22 @@ The following actions are deprecated and marked for removal: - **`localstack:generic:api-error`**: Raise a custom HTTP error. This action accepts the following parameters. + Please migrate to the [Chaos API]({{< ref "chaos-api" >}}) which supports this capability and more. - `region`: The region name where faults will be introduced, e.g. `us-west-1`. - Default: region of the experiment + Default: region of the experiment - `service`: The service name to limit faults to, e.g. `kms`. - Default: all services + Default: all services - `operation`: The operation name for the specified service to limit faults to, e.g. `ListKeys` - `percentage`: The percentage of API calls to fail among matching calls. - Default: 100 + Default: 100 - `exception`: The name of the exception to raise for affected API calls. - Default: `InternalError` + Default: `InternalError` - `errorCode`: The HTTP error code to return for impacted API calls. - Default: 500 + Default: 500 - **`localstack:kms:inject-api-internal-error`**: Special case of the previous action which injects an InternalError for KMS operations. - **`localstack:log-debug`**: Prints a debug message in the LocalStack logs when experiment is started and stopped. - **`localstack:generic:latency`**: Introduces a latency in the network call. + Please migrate to the [Chaos API]({{< ref "chaos-api" >}}). ## Current Limitations diff --git a/content/en/user-guide/chaos-engineering/_index.md b/content/en/user-guide/chaos-engineering/_index.md index e02de7be1c..e40e5af456 100644 --- a/content/en/user-guide/chaos-engineering/_index.md +++ b/content/en/user-guide/chaos-engineering/_index.md @@ -8,12 +8,6 @@ cascade: type: docs --- -The best way to understand concepts is through practice, so dive into our chaos engineering tutorials. 
-Learn how to [build resilient software -by detecting potential outages with the Fault Injection Service]({{< ref "tutorials/fault-injection-service-experiments" >}}), create a -[strong architecture through Route53 failover experiments]({{< ref "tutorials/route53-failover-with-fis" >}}), and -[simulate outages in your application stack]({{< ref "tutorials/simulating-outages-in-your-application-stack" >}}) . - ## Introduction Chaos engineering via LocalStack is a method to enhance system resilience by deliberately introducing controlled disruptions. @@ -23,10 +17,11 @@ This technique takes different forms depending on the team: - Architects concentrate on the strength of system design - Operations teams investigate the dependability of infrastructure setup. -Integrating chaos tests early in the development process helps identify and mitigate potential flaws, leading to systems that are more robust under stress and can withstand -turbulent conditions. +Integrating chaos tests early in the development process helps identify and mitigate potential flaws, leading to systems that are more robust under stress and can withstand turbulent conditions. Chaos engineering in LocalStack encompasses the following features: - **Application behavior and error management** through Fault Injection Service (FIS) experiments. -- **Robust architecture** tested via failover scenarios using FIS. +- **Robust architecture** tested via failover scenarios using the Chaos API. - **Consistent infrastructure setup** under challenging conditions like outages, examined through automated provisioning processes. + +The best way to understand concepts is through practice, so dive into our [chaos engineering tutorials]({{< ref "tutorials" >}}). 
diff --git a/content/en/user-guide/chaos-engineering/chaos-api/index.md b/content/en/user-guide/chaos-engineering/chaos-api/index.md new file mode 100644 index 0000000000..472c8aba2a --- /dev/null +++ b/content/en/user-guide/chaos-engineering/chaos-api/index.md @@ -0,0 +1,191 @@ +--- +title: "Chaos API" +linkTitle: "Chaos API" +description: Simulate outages and network failures to test the resiliency of your infrastructure +tags: ["Enterprise plan"] +--- + +## Introduction + +LocalStack Chaos API allows you to mimic outages across any AWS region or service. +Intentionally triggering service outages and monitoring the system's response in situations where the infrastructure is compromised offers a powerful way to test. +This strategy helps gauge the effectiveness of the system's deployment procedures and its resilience against infrastructure disruptions, which is a key element of chaos engineering. + +You can use LocalStack Chaos API to cause API failures for any combination of the following: + +- Service +- Region +- Operation + +You can customise the HTTP error code and message that LocalStack responds with. +If required, you can make the failures occur probabilistically. + +Furthermore, the Chaos API can also be configured to add a network latency for all calls. + +{{< alert title="Note">}} +Chaos API is available as part of the LocalStack Enterprise plan. +If you'd like to try it out, please [contact us](https://www.localstack.cloud/demo) to request access. 
+{{< /alert >}} + +## Prerequisites + +The prerequisites for this guide are: + +- LocalStack Pro with [LocalStack CLI](https://docs.localstack.cloud/getting-started/installation/#localstack-cli) & [LocalStack Auth Token](https://docs.localstack.cloud/getting-started/auth-token/) +- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) +- [Python](https://www.python.org/downloads/) + +## Configuration + +The disruption types supported by Chaos API are broadly categorised into two groups. +**Service Faults** lead to an application-level HTTP error in an AWS service, and **Network Effects** introduce network-level effects to all connections. + +### Service Faults + +Service faults can be configured using the endpoint at `/_localstack/chaos/faults`. +The configuration schema consists of an array of one or more rules, where each rule specifies the conditions for the fault to occur. +When active, rules are evaluated sequentially on every request to LocalStack until the first match. + +The schema for the configuration is as follows. + +```json +[ + { + "region": "(str) Region name, e.g. 'ap-south-1'. If omitted, all regions are affected.", + "service": "(str) Name of the service, e.g. 'kinesis'. If omitted, all services are affected.", + "operation": "(str) Name of the operation, e.g. 'PutRecord'. If omitted, all operations are affected.", + "probability": "(num) Probability of invoking this rule, e.g. 0.5. If omitted, 1 is used.", + "error": { + "statusCode": "(int) HTTP status code to use in response, e.g. 503. If omitted, 503 is used.", + "code": "(str) Descriptive error code used in response. If omitted, 'ServiceUnavailable' is used." + } + }, + ... 
+] +``` + +The endpoint allows the following operations: +- `GET`: Get current configuration +- `POST`: Set a new configuration, overwriting the current one +- `PATCH`: Add a rule +- `DELETE`: Delete a rule + +An empty array `[]` disables the faults entirely, while an empty rule in the array `[{}]` causes all AWS operations to lead to faults. + +### Network Effects + +Network effects are configured using the endpoint `/_localstack/chaos/effects`. +Currently the Chaos API only supports a latency factor. + +```json +{ + "latency": "(int) Network latency in milliseconds. By default, 0 is used." +} +``` + +This endpoint allows the following operations: +- `GET`: Get current configuration +- `POST`: Set a new configuration, overwriting the current one + +## Examples + +To cause faults, make a POST request as follows: + +{{< command >}} +$ curl --location --request POST 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \ +--header 'Content-Type: application/json' \ +--data ' +[ + { + "service": "s3", + "region": "us-east-1" + }, + { + "service": "s3", + "region": "ap-south-1" + }, + { + "service": "lambda" + } +]' +{{< /command >}} + +In this example, S3 is affected in `us-east-1` and `ap-south-1`, and Lambda is affected in all regions. +All calls to these services in these regions will return a 503 Service Unavailable error. + +To see this in action, try to create an S3 bucket in `us-east-1`: + +{{< command >}} +$ awslocal s3 mb s3://test-bucket --region us-east-1 + +make_bucket failed: s3://test-bucket An error occurred (ServiceUnavailableException) when calling the CreateBucket operation (reached max retries: 4): Service 's3' not accessible due to an outage + +{{< /command >}} + +However, the same operation, when run in `eu-central-1`, works as expected. + +{{< command >}} +$ awslocal s3 mb s3://test-bucket --region eu-central-1 + +make_bucket: test-bucket + +{{< /command >}} + +Faults can be disabled by setting an empty rule list in the configuration.
+The following request will clear the current configuration: + +{{< command >}} +$ curl --location --request POST 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \ +--header 'Content-Type: application/json' \ +--data '[]' +{{< /command >}} + +To retrieve the current configuration, make the following GET call: + +{{< command >}} +$ curl --location --request GET 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' +{{< /command >}} + +To add a new rule to the current configuration, make a PATCH call as follows: + +{{< command >}} +$ curl --location --request PATCH 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \ +--header 'Content-Type: application/json' \ +--data ' +[ + { + "service": "kinesis", + "operation": "PutRecord", + "probability": 0.3, + "error": { + "statusCode": 400, + "code": "ProvisionedThroughputExceededException" + } + } +]' +{{< /command >}} + +This new rule will cause probabilistic failures for the Kinesis `PutRecord` operation. +Here, the returned error is also customised to be an HTTP 400 `ProvisionedThroughputExceededException`. + +To remove a rule from the configuration, make a DELETE call as follows: + +{{< command >}} +$ curl --location --request DELETE 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \ +--header 'Content-Type: application/json' \ +--data '[{"service": "lambda"}]' +{{< /command >}} + +The rule to be removed must be exactly the same as in the existing configuration. + +## Comparison with Fault Injection Service + +AWS [Fault Injection Service (FIS)]({{< ref "fis" >}}) also allows controlled chaos engineering experiments on infrastructure. + +When it comes to fault injection, the Chaos API overlaps with FIS to some extent, but it is designed for broader application. +For example, the FIS action `aws:fis:inject-api-internal-error` injects Internal Errors into requests, but only for EC2. +The Chaos API has no such limitation.
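To make the point about service coverage concrete, here is a minimal Python sketch that builds a Chaos API rule injecting an internal error into a non-EC2 service. It only assembles the JSON payload according to the schema described under Service Faults and does not contact LocalStack. The helper name `build_fault_rule` is hypothetical, and the target (KMS `ListKeys`) and the error code `InternalError` are assumptions chosen for illustration.

```python
import json


def build_fault_rule(service, operation=None, region=None,
                     probability=None, status_code=None, code=None):
    """Build one fault rule; fields left out fall back to the documented defaults."""
    rule = {"service": service}
    if operation is not None:
        rule["operation"] = operation
    if region is not None:
        rule["region"] = region
    if probability is not None:
        rule["probability"] = probability
    # Only emit the "error" object when at least one override is given.
    error = {}
    if status_code is not None:
        error["statusCode"] = status_code
    if code is not None:
        error["code"] = code
    if error:
        rule["error"] = error
    return rule


# Internal errors for KMS ListKeys in every region (illustrative values).
rules = [build_fault_rule("kms", operation="ListKeys",
                          status_code=500, code="InternalError")]
payload = json.dumps(rules, indent=2)
print(payload)
```

The printed JSON could then be sent as the body of a `PATCH` request to `/_localstack/chaos/faults`, in the same way as the curl examples above.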
+ +Another difference is that FIS is capable of running procedural experiments where it can invoke API actions that affect AWS resources. +For example, the action `aws:ec2:stop-instances` can stop EC2 instances. +The Chaos API focuses on declarative effects that impact the AWS API. diff --git a/content/en/user-guide/chaos-engineering/fault-injection-service/index.md b/content/en/user-guide/chaos-engineering/fault-injection-service/index.md index c873effc6d..47881246bc 100644 --- a/content/en/user-guide/chaos-engineering/fault-injection-service/index.md +++ b/content/en/user-guide/chaos-engineering/fault-injection-service/index.md @@ -1,53 +1,51 @@ --- -title: "Using the Fault Injection Service" -linkTitle: "Using the Fault Injection Service" -weight: 1 -description: Use LocalStack Outages Extension to mimic service outages by testing your infrastructure's ability to deploy robustly and recover from unexpected events. -tags: ["Pro image"] +title: "AWS Fault Injection Service" +linkTitle: "AWS Fault Injection Service" +description: Use Fault Injection Service to simulate faults in your infrastructure and test its fault tolerance +tags: ["Enterprise plan"] +aliases: + - /tutorials/fault-injection-service-experiments/ --- ## Introduction -The [AWS Fault Injection Service](https://aws.amazon.com/fis/) is a fully managed service designed to help you improve -the resilience of your applications by simulating -real-world outages and operational issues. -This service allows you to conduct controlled experiments on your AWS -infrastructure, injecting -faults and observing how your system responds under various conditions. -By using the Fault Injection Service, you can -identify weaknesses, -test recovery procedures, and ensure that your applications can withstand unexpected disruptions. -This proactive -approach to reliability -engineering enables you to enhance system robustness, minimize downtime, and maintain a high level of service -availability for your users. 
- -To see the FIS service in action within a more complex application stack, please refer to the [Chaos Engineering Tutorials]({{< ref "tutorials" >}}). - -### Prerequisites - -The general prerequisites for this guide are: - -- LocalStack Pro - with [LocalStack CLI](https://docs.localstack.cloud/getting-started/installation/#localstack-cli) & [LocalStack Auth Token](https://docs.localstack.cloud/getting-started/auth-token/) -- [AWS CLI](https://docs.localstack.cloud/user-guide/integrations/aws-cli/) with - the [`awslocal` wrapper](https://docs.localstack.cloud/user-guide/integrations/aws-cli/#localstack-aws-cli-awslocal) +The [Fault Injection Service](https://aws.amazon.com/fis/) is a fully managed service by AWS designed to help you improve the resilience of your applications by simulating real-world outages and operational issues. +This service allows you to conduct controlled experiments on your AWS infrastructure, injecting faults and observing how your system responds under various conditions. + +By using the Fault Injection Service, you can identify weaknesses, test recovery procedures, and ensure that your applications can withstand unexpected disruptions. +This proactive approach to reliability engineering enables you to enhance system robustness, minimize downtime, and maintain a high level of service availability for your users. + +To see the FIS in action within a more complex application stack, please refer to the [Chaos Engineering Tutorials]({{< ref "tutorials" >}}). + +{{< alert title="Note">}} +Fault Injection Service emulation is available as part of the LocalStack Enterprise plan. +If you'd like to try it out, please [contact us](https://www.localstack.cloud/demo) to request access. 
+{{< /alert >}} + +## Prerequisites + +The prerequisites for this guide are: + +- LocalStack Pro with [LocalStack CLI](https://docs.localstack.cloud/getting-started/installation/#localstack-cli) & [LocalStack Auth Token](https://docs.localstack.cloud/getting-started/auth-token/) +- [AWS CLI](https://docs.localstack.cloud/user-guide/integrations/aws-cli/) with the [`awslocal` wrapper](https://docs.localstack.cloud/user-guide/integrations/aws-cli/#localstack-aws-cli-awslocal) - [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) -Ensure to set your Auth Token as an environment variable before beginning: +Ensure that you set the Auth Token as an environment variable before beginning: {{< command >}} $ LOCALSTACK_AUTH_TOKEN= $ localstack start {{< /command >}} -### The Setup +## Getting Started -This guide is created with users who are new to FIS in mind, and assumes basic knowledge of the AWS CLI and -our `awslocal` wrapper script. +{{< callout "tip" >}} +For more information on LocalStack FIS, please refer to the [FIS service docs]({{< ref "user-guide/aws/fis" >}}). +{{< /callout >}} -The following demo will depict constructing various FIS experiments designed to trigger different types of failures in a -DynamoDB service. +This guide is created with users who are new to FIS in mind, and assumes basic knowledge of the AWS CLI and our `awslocal` wrapper script. + +The following demo will depict constructing various FIS experiments designed to trigger different types of failures in a DynamoDB service. Let's create a simple DynamoDB table called `Students` in the `us-east-1` region. 
@@ -116,7 +114,7 @@ $ awslocal dynamodb put-item --table-name Students --region us-east-1 --item '{ }' {{< /command >}} -And then we can look up one of the students by ID, also using the AWS local CLI: +And then we can look up one of the students by ID, also using the awslocal CLI: {{< command >}} $ awslocal dynamodb get-item --table-name Students --key '{"id": {"S": "1216"}}' @@ -143,49 +141,47 @@ $ awslocal dynamodb get-item --table-name Students --key '{"id": {"S": "1216"}}' {{< /command >}} -### Key concepts of FIS +## Key concepts of FIS -Some of the most important concepts associated with a FIS experiment, that we'll see in the following, are: +Some of the most important concepts associated with a FIS experiment are: **1. Experiment Templates**: Experiment templates define the actions, targets, and any stop conditions for your experiment. -They serve as -blueprints for conducting fault injection experiments, allowing you to specify what resources are targeted, what faults are injected, -and under what conditions the experiment should automatically stop. +They serve as blueprints for conducting fault injection experiments, allowing you to specify what resources are targeted, what faults are injected, and under what conditions the experiment should automatically stop. **2. Actions**: Actions are the specific fault injection operations that the experiment performs on the target resources. -These can be -injecting latency or throttling to API requests, completely blocking access to instances, etc. Actions define the type of fault, parameters for -the fault injection, and the targets affected. +These can be injecting latency or throttling to API requests, completely blocking access to instances, etc. +Actions define the type of fault, parameters for the fault injection, and the targets affected. **3. Targets**: Targets are the AWS resources on which the experiment actions will be applied. 
-To make things even more fine-grained, a specific operation -of the service can be targeted. +To make things even more fine-grained, a specific operation of the service can be targeted. **4. Stop Conditions**: Stop conditions are criteria that, when met, will automatically stop the experiment. **5. IAM Roles and Permissions**: To run experiments, AWS FIS requires specific IAM roles and permissions. -These are necessary for AWS FIS to -perform actions on your behalf, like injecting faults into your resources. +These are necessary for AWS FIS to perform actions on your behalf, like injecting faults into your resources. **6. -Experiment Execution**: When you start an experiment, AWS FIS executes the actions defined in the experiment template against the specified targets, -adhering to any defined stop conditions. -The execution process is logged, and detailed information about the experiment's progress and outcome is -provided. +Experiment Execution**: When you start an experiment, AWS FIS executes the actions defined in the experiment template against the specified targets, adhering to any defined stop conditions. +The execution process is logged, and detailed information about the experiment's progress and outcome is provided. + +## Examples -### Getting started with FIS +### Service Unavailability -#### Service Unavailability +{{< callout "warning" >}} +The `localstack:generic:api-error` action is deprecated and marked for removal. +Please use the [Chaos API]({{< ref "chaos-api" >}}) to achieve the same effect. +{{< /callout >}} In a file called `dynamodb-experiment.json` let's define a FIS experiment that causes all calls to the `GetItem` API of the DynamoDB service to return a 503 `Service Unavailable` response. This failure will happen 100% of the times the method is called. 
-```bash +```json { "actions": { "Test disruption": { @@ -296,7 +292,7 @@ An error occurred (DynamoDbException) when calling the GetItem operation (reache The logs now show several attempts of performing the `GetItem` operation, before returning the error message: -```bash +```text 2024-03-20T15:54:16.630 INFO --- [ asgi_gw_0] localstack.request.aws : AWS dynamodb.GetItem => 500 (DynamoDbException) 2024-03-20T15:54:16.707 INFO --- [ asgi_gw_1] localstack.request.aws : AWS dynamodb.GetItem => 500 (DynamoDbException) 2024-03-20T15:54:16.825 INFO --- [ asgi_gw_0] localstack.request.aws : AWS dynamodb.GetItem => 500 (DynamoDbException) @@ -330,13 +326,17 @@ Finally, the experiment can be stopped using the experiment's ID with the follow awslocal fis stop-experiment --id 1a01327a-79d5-4202-8132-e56e55c9391b ``` -#### Region Unavailability +### Region Unavailability + +{{< callout "warning" >}} +The `localstack:generic:api-error` action is deprecated and marked for removal. +Please use the [Chaos API]({{< ref "chaos-api" >}}) to achieve the same effect. +{{< /callout >}} This sort of experiment involves disabling entire regions to simulate regional outages and failovers. -Let's see what that would look like, -in a separate file, `regional-experiment.json`: +Let's see what that would look like, in a separate file, `regional-experiment.json`: -```bash +```json { "actions": { "regionUnavailable-us-east-1": { @@ -516,12 +516,17 @@ Just as with the earlier experiment, this one should be stopped by running the f awslocal fis stop-experiment --id e49283c1-c2e0-492b-b69f-9fbd710bc1e3 ``` -#### Service Latency +### Service Latency + +{{< callout "warning" >}} +The `localstack:generic:latency` action is deprecated and marked for removal. +Please use the [Chaos API]({{< ref "chaos-api" >}}) to achieve the same effect. +{{< /callout >}} Let's now add some latency to our DynamoDB API calls. 
First the definition of a new experiment template in another file, `latency-experiment.json`: -```bash +```json { "actions": { "latency": { @@ -596,10 +601,8 @@ $ awslocal fis start-experiment --experiment-template-id 1f6e0ce8-57ed-4987-a7e5 {{< /command >}} This FIS experiment introduces a delay of 5 seconds to all DynamoDB API calls within the `us-east-1` region. -Tables located in the `eu-central-1` region, -or any other service, remain unaffected. -To extend the latency effect to a regional level, the specific service constraint can be omitted, -thereby applying the latency to all resources within the selected region. +Tables located in the `eu-central-1` region, or any other service, remain unaffected. +To extend the latency effect to a regional level, the specific service constraint can be omitted, thereby applying the latency to all resources within the selected region. As always, remember to stop your experiment, so it does not cause unexpected issues down the line: @@ -609,10 +612,9 @@ awslocal fis stop-experiment --id dd598567-56e6-4d00-9ef5-15c7e90e7851 Remember to replace the IDs with your own corresponding values. -#### Experiment overview +### Experiment overview -If you want to keep track of all your experiments and make sure nothing is running in the background to hinder any other work, you can get an overview by using -the command: +If you want to keep track of all your experiments and make sure nothing is running in the background to hinder any other work, you can get an overview by using the command: {{< command >}} $ awslocal fis list-experiments @@ -655,5 +657,3 @@ $ awslocal fis list-experiments } {{< /command >}} - -For extra information or limitations of the LocalStack FIS service, please refer to the dedicated service [documentation]({{< ref "user-guide/aws/fis" >}}). 
diff --git a/content/en/user-guide/chaos-engineering/outages-extension/index.md b/content/en/user-guide/chaos-engineering/outages-extension/index.md index 4a6a18c107..6f2c572b52 100644 --- a/content/en/user-guide/chaos-engineering/outages-extension/index.md +++ b/content/en/user-guide/chaos-engineering/outages-extension/index.md @@ -1,13 +1,17 @@ --- title: "Outages Extension" linkTitle: "Outages Extension" -weight: 3 description: Use LocalStack Outages Extension to mimic service outages by testing your infrastructure's ability to deploy robustly and recover from unexpected events. tags: ["Enterprise plan"] --- ## Introduction +{{< callout "warning" >}} +Outages Extension has been deprecated. +It is recommended to migrate to the [Chaos API]({{< ref "chaos-api" >}}). +{{< /callout >}} + The [LocalStack Outages Extension](https://pypi.org/project/localstack-extension-outages/) allows you to mimic outages across any AWS region or service. By integrating the Outages Extension using the [LocalStack Extension mechanism](https://docs.localstack.cloud/user-guide/extensions/), you can assess your infrastructure's robustness. @@ -16,11 +20,6 @@ where the infrastructure is compromised offers a powerful way to test. This strategy helps gauge the effectiveness of the system's deployment procedures and its resilience against infrastructure disruptions, which is a key element of chaos engineering. -{{< callout >}} -Outages Extension is currently available as part of the **LocalStack Enterprise** plan. -If you'd like to try it out, please [contact us](https://www.localstack.cloud/demo) to request access. 
-{{< /callout >}} - ### Prerequisites The general prerequisites for this guide are: diff --git a/content/en/user-guide/chaos-engineering/special-configs/index.md b/content/en/user-guide/chaos-engineering/special-configs/index.md deleted file mode 100644 index 1138896103..0000000000 --- a/content/en/user-guide/chaos-engineering/special-configs/index.md +++ /dev/null @@ -1,117 +0,0 @@ ---- -title: "Special Configurations" -linkTitle: "Special Configurations" -weight: 4 -description: Set up LocalStack for chaos engineering using environment variables. -tags: ["Enterprise plan"] ---- - -## Introduction - -LocalStack allows users to inject intentional errors, particularly in Kinesis and DynamoDB. -You can introduce controlled chaos into your development environment enhance to enhance service resilience. -By configuring environment variables, you can simulate disruptions. -This simple setup helps improve the response mechanisms of these key AWS services, ensuring robust architecture under challenging conditions with minimal initial configuration. - -This guide demonstrates the `DYNAMODB_ERROR_PROBABILITY` and `KINESIS_ERROR_PROBABILITY` configuration flags. -The guide assumes basic knowledge of the AWS CLI and our [`awslocal`](https://github.com/localstack/awscli-local) wrapper script. - -## Kinesis Error Probability - -The `KINESIS_ERROR_PROBABILITY` setting allows users to introduce `ProvisionedThroughputExceededException` errors randomly into Kinesis API responses. -The value for this setting ranges from 0.0 (default) to 1.0. - -To demonstrate, set up LocalStack with `KINESIS_ERROR_PROBABILITY` at 0.5, indicating a 50% chance of receiving a `ProvisionedThroughputExceededException` from Kinesis. - -{{< command >}} -$ KINESIS_ERROR_PROBABILITY=0.5 localstack start -{{< /command >}} - -Next, create a Kinesis stream using the AWS CLI with the [`CreateStream`](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_CreateStream.html) API. 
-For example, to create a stream named "ProductsStream" with one shard, use: - -{{< command >}} -$ awslocal kinesis create-stream \ - --stream-name ProductsStream \ - --shard-count 1 -{{< /command >}} - -Assuming you have a product JSON converted into a Base64-encoded string, you can add this record to the stream like so: - -{{< command >}} -$ awslocal kinesis put-record \ - --stream-name ProductsStream \ - --partition-key "12345" \ - --data "eyJwcm9kdWN0SWQiOiIxMjMiLCJwcm9kdWN0TmFtZSI6IlN1cGVyV2lkZ2V0IiwicHJvZHVjdFByaWNlIjoiMTk5Ljk5In0=" -{{< /command >}} - -After performing similar operations repeatedly, you can check the logs to verify that the configuration is working as intended. -Remember, records will only be added during successful calls. - -```bash -2023-11-09T23:33:49.867 INFO --- [ asgi_gw_0] localstack.request.aws : AWS kinesis.CreateStream => 200 -2023-11-09T23:34:01.003 INFO --- [ asgi_gw_0] localstack.request.aws : AWS kinesis.PutRecord => 200 -2023-11-09T23:34:05.114 INFO --- [ asgi_gw_0] localstack.request.aws : AWS kinesis.PutRecord => 200 -2023-11-09T23:34:08.178 INFO --- [ asgi_gw_0] localstack.request.aws : AWS kinesis.PutRecord => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:34:08.346 INFO --- [ asgi_gw_0] localstack.request.aws : AWS kinesis.PutRecord => 200 -2023-11-09T23:34:09.726 INFO --- [ asgi_gw_0] localstack.request.aws : AWS kinesis.PutRecord => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:34:10.499 INFO --- [ asgi_gw_0] localstack.request.aws : AWS kinesis.PutRecord => 200 -2023-11-09T23:34:11.982 INFO --- [ asgi_gw_0] localstack.request.aws : AWS kinesis.PutRecord => 200 -``` - -## DynamoDB Error Probability - -The `DYNAMODB_ERROR_PROBABILITY` setting, similar to the Kinesis configuration, allows for random `ProvisionedThroughputExceededException` responses from the DynamoDB service. -It also accepts a decimal value between 0.0 (default) and 1.0. 
- -To start LocalStack with a high error probability for DynamoDB, set `DYNAMODB_ERROR_PROBABILITY` to 0.8: - -{{< command >}} -$ DYNAMODB_ERROR_PROBABILITY=0.8 localstack start -{{< /command >}} - -Next, create a DynamoDB table using the AWS CLI with the [`CreateTable`](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_CreateTable.html) API. -For example, to create a table named "Products" with a primary key of "ProductId", use: - -```bash -$ awslocal dynamodb create-table \ - --table-name Products \ - --attribute-definitions AttributeName=ProductId,AttributeType=S \ - --key-schema AttributeName=ProductId,KeyType=HASH \ - --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1 -``` - -You can add items to the table using the [`PutItem`](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_PutItem.html) API. -For example, to add a product with an ID of "123", a name of "SuperWidget", and a price of "199.99", use: - -```bash -awslocal dynamodb put-item \ - --table-name Products \ - --item '{ - "ProductId": {"S": "123"}, - "ProductName": {"S": "SuperWidget"}, - "ProductPrice": {"N": "199.99"} - }' -``` - -The logs will now show a higher frequency of `ProvisionedThroughputExceededException` errors, followed by successful attempts due to the `boto3` retry mechanism: - -```bash -2023-11-09T23:59:12.836 INFO --- [ asgi_gw_0] localstack.request.aws : AWS dynamodb.CreateTable => 200 -2023-11-09T23:59:27.889 INFO --- [ asgi_gw_1] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:59:27.968 INFO --- [ asgi_gw_0] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:59:28.089 INFO --- [ asgi_gw_1] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:59:28.410 INFO --- [ asgi_gw_0] localstack.request.aws : AWS dynamodb.PutItem => 200 -2023-11-09T23:59:35.845 INFO --- [ 
asgi_gw_1] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:59:35.911 INFO --- [ asgi_gw_0] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:59:36.028 INFO --- [ asgi_gw_1] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:59:36.249 INFO --- [ asgi_gw_0] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:59:36.673 INFO --- [ asgi_gw_1] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:59:37.484 INFO --- [ asgi_gw_0] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:59:39.101 INFO --- [ asgi_gw_0] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:59:42.326 INFO --- [ asgi_gw_1] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-09T23:59:48.737 INFO --- [ asgi_gw_1] localstack.request.aws : AWS dynamodb.PutItem => 400 (ProvisionedThroughputExceededException) -2023-11-10T00:00:01.606 INFO --- [ asgi_gw_1] localstack.request.aws : AWS dynamodb.PutItem => 200 -``` - -Despite these errors, the retry mechanism ensures that all items are eventually added to the DynamoDB table. 
diff --git a/content/en/user-guide/chaos-engineering/web-application-dashboard/index.md b/content/en/user-guide/chaos-engineering/web-application-dashboard/index.md index 37a7589ffc..897955f21e 100644 --- a/content/en/user-guide/chaos-engineering/web-application-dashboard/index.md +++ b/content/en/user-guide/chaos-engineering/web-application-dashboard/index.md @@ -1,20 +1,19 @@ --- title: "Chaos Engineering Dashboard" linkTitle: "Chaos Engineering Dashboard" -weight: 2 -description: Effortlessly design, activate, and manage fault injection experiments with the LocalStack user-friendly dashboard. -tags: ["Pro image"] +description: Effortlessly design, activate, and manage Fault Injection Service experiments +tags: ["Enterprise plan"] --- ## Introduction -LocalStack's Chaos Engineering dashboard offers streamlined testing for cloud applications, enabling you to simulate server -errors, service outages, regional disruptions, and network latency with ease, ensuring your app is ready for real-world challenges. -You can find this **Pro** feature in the web app by navigating to [**app.localstack.cloud/chaos-engineering**](https://app.localstack.cloud/chaos-engineering). +The Chaos Engineering dashboard in LocalStack offers streamlined testing for cloud applications, enabling you to simulate server errors, service outages, regional disruptions, and network latency with ease, ensuring your app is ready for real-world challenges. -## Web Application FIS Dashboard +You can find this feature in the LocalStack Web Application by navigating to [**app.localstack.cloud/chaos-engineering**](https://app.localstack.cloud/chaos-engineering). -LocalStack Web Application provides a dashboard for conducting FIS experiments in user stacks. +## FIS Dashboard + +The FIS Dashboard in LocalStack Web Application allows you to conduct Fault Injection Service experiments on infrastructure stacks. 
This control panel offers various FIS experiment options, which includes: - **500 Internal Error**: This experiment randomly terminates incoming requests, returning an `Internal Server Error` with a response code of 500. @@ -24,9 +23,6 @@ This control panel offers various FIS experiment options, which includes: {{< figure src="fis-dashboard.png" width="900" >}} -This LocalStack dashboard is not just an easy-to-use testing tool, it's a foundation for building reusable Fault Injection -Simulation (FIS) templates. -By defining experiments using this interface, you create a set of -customizable templates that can be seamlessly integrated into any future automation workflows. -It's a time-saving -feature, ensuring consistent and efficient testing across various stages of your application's development lifecycle. +This LocalStack dashboard is not just an easy-to-use testing tool, it's a foundation for building reusable Fault Injection Service templates. +By defining experiments using this interface, you create a set of customizable templates that can be seamlessly integrated into any future automation workflows. +It's a time-saving feature, ensuring consistent and efficient testing across various stages of your application's development lifecycle.