AWS AZ
recover_az
Rolls back the subnet(s), EKS instance(s), and ASG(s) that were affected by the fail_az action
fail_az
Simulates the loss of an AZ in an AWS Region for EKS clusters with managed nodegroups
Type | action |
Module | azchaosaws.eks.actions |
Name | fail_az |
Return | mapping |
This function simulates the loss of an AZ in an AWS Region for EKS clusters with managed nodegroups. All nodegroups within the tagged clusters are affected, as are the ASG(s) that are part of those managed nodegroups. For the network failure type, it applies a blackhole network ACL that denies all traffic. For the instance failure type, it force-stops normal instances, stops persistent spot instances, and cancels the spot requests and terminates the instances for one-time spot instances. Ensure your target clusters are tagged.
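The instance failure type treats three kinds of instances differently. The following is only an illustrative sketch of that decision, not the extension's actual code: the record fields `id`, `lifecycle`, and `spot_type`, and the function name `plan_instance_actions`, are hypothetical and chosen for this example.

```python
from typing import Dict, List


def plan_instance_actions(instances: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group instance IDs by the operation described for the instance failure type.

    Each record is a hypothetical, simplified dict:
      {"id": ..., "lifecycle": "normal" | "spot", "spot_type": "persistent" | "one-time"}
    """
    plan: Dict[str, List[str]] = {
        "force_stop": [],            # normal instances: stopped with force
        "stop": [],                  # persistent spot instances: stopped
        "cancel_and_terminate": [],  # one-time spot: request cancelled, instance terminated
    }
    for inst in instances:
        lifecycle = inst.get("lifecycle", "normal")
        spot_type = inst.get("spot_type")
        if lifecycle == "normal":
            plan["force_stop"].append(inst["id"])
        elif lifecycle == "spot" and spot_type == "persistent":
            plan["stop"].append(inst["id"])
        elif lifecycle == "spot" and spot_type == "one-time":
            plan["cancel_and_terminate"].append(inst["id"])
        else:
            raise ValueError(f"Unrecognised instance record: {inst!r}")
    return plan


# Made-up instance IDs, for illustration only.
example_plan = plan_instance_actions(
    [
        {"id": "i-normal", "lifecycle": "normal"},
        {"id": "i-persistent", "lifecycle": "spot", "spot_type": "persistent"},
        {"id": "i-onetime", "lifecycle": "spot", "spot_type": "one-time"},
    ]
)
```

Separating the planning step from the AWS calls, as sketched here, makes the per-instance decision easy to check without touching any real resources.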
Usage
JSON
{
  "name": "fail_az",
  "type": "action",
  "provider": {
    "type": "python",
    "module": "azchaosaws.eks.actions",
    "func": "fail_az",
    "arguments": {
      "az": "",
      "dry_run": true
    }
  }
}
YAML
name: fail_az
provider:
  arguments:
    az: ""
    dry_run: true
  func: fail_az
  module: azchaosaws.eks.actions
  type: python
type: action
Arguments
Name | Type | Default | Required | Title | Description |
---|---|---|---|---|---|
az | string | | Yes | Availability Zone | AZ to target |
tags | mapping | [{"Name": ""}] | No | Tags | Match only resources with these tags |
failure_type | string | network | No | Failure Type | Type of failure to apply: network, instance |
state_path | string | /tmp/fail_eks_az.json | No | Local Path to Operation | Path to a local file that holds the information of this operation, used by the recover_az action. Unless you need to run this action multiple times in the same experiment, you can ignore this field |
dry_run | boolean | false | No | Dry Run | Perform a dry run only |
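As a rough sketch of how the arguments above fit into the action declaration shown under Usage, the helper below assembles the same mapping in Python and checks the two constraints the table states (`az` is required; `failure_type` is `network` or `instance`). The helper name `build_fail_az_action` and the AZ value `ap-southeast-1a` are assumptions made for this example and are not part of the extension.

```python
import json
from typing import Any, Dict, List, Optional

VALID_FAILURE_TYPES = ("network", "instance")


def build_fail_az_action(
    az: str,
    failure_type: str = "network",
    dry_run: bool = False,
    tags: Optional[List[Dict[str, str]]] = None,
    state_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a fail_az action declaration (hypothetical helper, for illustration)."""
    if not az:
        raise ValueError("az is required")
    if failure_type not in VALID_FAILURE_TYPES:
        raise ValueError(f"failure_type must be one of {VALID_FAILURE_TYPES}")

    arguments: Dict[str, Any] = {
        "az": az,
        "failure_type": failure_type,
        "dry_run": dry_run,
    }
    # Optional arguments are only included when set, so the action's own
    # defaults apply otherwise.
    if tags is not None:
        arguments["tags"] = tags
    if state_path is not None:
        arguments["state_path"] = state_path

    return {
        "name": "fail_az",
        "type": "action",
        "provider": {
            "type": "python",
            "module": "azchaosaws.eks.actions",
            "func": "fail_az",
            "arguments": arguments,
        },
    }


# Example AZ name is an assumption; substitute the AZ you intend to target.
action = build_fail_az_action("ap-southeast-1a", dry_run=True)
action_json = json.dumps(action, indent=2)
```

Starting with `dry_run` set to true, as in the Usage examples, is a cautious way to try out a new experiment definition.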
Return structure
{
  "AvailabilityZone": str,
  "DryRun": bool,
  "Clusters": [
    {
      "ClusterName": str,
      "NodeGroups": [
        {
          "NodeGroupName": str,
          "AutoScalingGroups": [
            {
              "AutoScalingGroupName": str,
              "Before": {
                "SubnetIds": List[str],
                "AZRebalance": bool,
                "MinSize": int,
                "MaxSize": int,
                "DesiredCapacity": int
              },
              "After": {
                "SubnetIds": List[str],
                "AZRebalance": bool,
                "MinSize": int,
                "MaxSize": int,
                "DesiredCapacity": int
              }
            }
          ],
          "Subnets": [
            {
              "SubnetId": str,
              "VpcId": str,
              "Before": {
                "NetworkAclId": str,
                "NetworkAclAssociationId": str
              },
              "After": {
                "NetworkAclId": str,
                "NetworkAclAssociationId": str
              }
            },
            ...
          ],
          "Instances": [
            {
              "InstanceId": str,
              "Before": {
                "State": 'pending'|'running'
              },
              "After": {
                "State": 'stopping'|'stopped'
              }
            },
            ...
          ]
        }
        ...
      ]
    }
  ]
}
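To show how the nested return mapping might be consumed, here is a minimal sketch that flattens it into one summary row per nodegroup. It relies only on the keys listed in the structure above; the function name `summarize_fail_az` and all sample values are made up for illustration.

```python
from typing import Any, Dict, List


def summarize_fail_az(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one row per nodegroup listing the ASGs, subnets and instances touched."""
    rows: List[Dict[str, Any]] = []
    for cluster in result.get("Clusters", []):
        for nodegroup in cluster.get("NodeGroups", []):
            rows.append(
                {
                    "cluster": cluster["ClusterName"],
                    "nodegroup": nodegroup["NodeGroupName"],
                    "asgs": [
                        asg["AutoScalingGroupName"]
                        for asg in nodegroup.get("AutoScalingGroups", [])
                    ],
                    "subnets": [s["SubnetId"] for s in nodegroup.get("Subnets", [])],
                    "instances": [
                        i["InstanceId"] for i in nodegroup.get("Instances", [])
                    ],
                }
            )
    return rows


# Made-up sample that follows the documented shape (Before/After details omitted).
sample_result = {
    "AvailabilityZone": "ap-southeast-1a",
    "DryRun": True,
    "Clusters": [
        {
            "ClusterName": "demo-cluster",
            "NodeGroups": [
                {
                    "NodeGroupName": "demo-nodegroup",
                    "AutoScalingGroups": [{"AutoScalingGroupName": "demo-asg"}],
                    "Subnets": [{"SubnetId": "subnet-demo", "VpcId": "vpc-demo"}],
                    "Instances": [{"InstanceId": "i-demo"}],
                }
            ],
        }
    ],
}

summary_rows = summarize_fail_az(sample_result)
```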
Signature
def fail_az(
    az: str = None,
    dry_run: bool = None,
    failure_type: str = "network",
    tags: List[Dict[str, str]] = [{"Name": ""}],
    state_path: str = "fail_az.{}.json".format(__package__.split(".", 1)[1]),
    configuration: Configuration = None,
) -> Dict[str, Any]:
    pass
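The `state_path` default in the signature is computed from `__package__`. The short example below evaluates the same expression, assuming `__package__` resolves to `azchaosaws.eks` (the parent of the `azchaosaws.eks.actions` module named above); the wrapper function `default_state_path` is hypothetical. Note that the result differs from the `/tmp/fail_eks_az.json` default listed in the Arguments table, so it may be worth passing `state_path` explicitly if the file location matters.

```python
def default_state_path(package: str) -> str:
    """Evaluate the signature's default expression for a given package name."""
    # split(".", 1) splits once, so everything after the first dot is kept.
    return "fail_az.{}.json".format(package.split(".", 1)[1])


# Assumption: the module lives in the package "azchaosaws.eks".
resolved = default_state_path("azchaosaws.eks")
print(resolved)  # fail_az.eks.json
```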