Author: Cwen Yin (Software Engineer at PingCAP)
Transcreator: Yajing Wang; Editors: Calvin Weng, Caitin Chen, Tom Dewan
Chaos Engineering is a way to test a production software system’s robustness by simulating unusual or disruptive conditions. For many people, however, the transition from learning Chaos Engineering to practicing it on their own systems is daunting. It sounds like one of those big ideas that require a fully-equipped team to plan ahead. Well, it doesn’t have to be. To get started with chaos experimenting, you may be just one suitable platform away.
Chaos Mesh is an easy-to-use, open-source, cloud-native Chaos Engineering platform that orchestrates chaos in Kubernetes environments. This 10-minute tutorial will help you quickly get started with Chaos Engineering and run your first chaos experiment with Chaos Mesh.
For more information about Chaos Mesh, refer to our previous article or the chaos-mesh project on GitHub.
A preview of our little experiment
Chaos experiments are similar to the experiments we do in a science class. It’s perfectly fine to simulate turbulent situations in a controlled environment. In our case, we will simulate network chaos on a small web application called web-show. To visualize the chaos effect, web-show records the latency from its pod to the kube-controller pod (in the kube-system namespace) every 10 seconds.
The following clip shows the process of installing Chaos Mesh, deploying web-show, and creating the chaos experiment within a few commands:
Now it’s your turn! It’s time to get your hands dirty.
Let’s get started!
For our simple experiment, we use Kind (Kubernetes in Docker) to run a local Kubernetes cluster. Feel free to use Minikube or any existing Kubernetes cluster to follow along.
Prepare the environment
Before moving forward, make sure you have Git and Docker installed on your local computer, with Docker up and running. For macOS, it’s recommended to allocate at least 6 CPU cores to Docker. For details, see Docker configuration for Mac.
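If you want to double-check the prerequisites before installing, a small helper like the one below can do it. This is only a sketch: missing_tools is a hypothetical function name, not part of Chaos Mesh, and it only checks that the commands are on your PATH.

```shell
# Print the name of each required tool that is not on PATH.
# (missing_tools is a hypothetical helper, not part of Chaos Mesh.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: prints nothing when both tools are installed.
#   missing_tools git docker
# To confirm that the Docker daemon is actually running:
#   docker info >/dev/null 2>&1 && echo "Docker is running"
```

An empty result means Git and Docker were both found. The docker info check is separate because having the docker command installed does not guarantee that the daemon is up.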
- Get Chaos Mesh:
    git clone https://github.com/chaos-mesh/chaos-mesh.git
    cd chaos-mesh/
- Install Chaos Mesh with the install.sh script:
    ./install.sh --local kind
  install.sh is an automated shell script that checks your environment, installs Kind, launches Kubernetes clusters locally, and deploys Chaos Mesh. To see the detailed description of install.sh, you can include the --help option.
  Note:
  If your local computer cannot pull images from docker.io or gcr.io, use the local gcr.io mirror and execute ./install.sh --local kind --docker-mirror instead.
- Set the system environment variable:
    source ~/.bash_profile
Note:
Depending on your network, these steps might take a few minutes.
If you see an error message like this:
ERROR: failed to create cluster: failed to generate kubeadm config content: failed to get kubernetes version from node: failed to get file: command "docker exec --privileged kind-control-plane cat /kind/version" failed with error: exit status 1
increase the available resources for Docker on your local computer and execute the following command:
./install.sh --local kind --force-local-kube
When the process completes, you will see a message indicating that Chaos Mesh is successfully installed.
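Besides the success message, you may want to look at the Chaos Mesh pods yourself. The sketch below assumes that install.sh deploys Chaos Mesh into a namespace called chaos-testing. That name is an assumption and is not stated in this tutorial, so pass a different namespace if your setup differs. list_chaos_mesh_pods is a hypothetical helper name.

```shell
# List the pods in the namespace where Chaos Mesh is assumed to run.
# Assumption: the namespace is "chaos-testing"; override it with the first argument.
list_chaos_mesh_pods() {
  kubectl get pods --namespace "${1:-chaos-testing}"
}

# Usage:
#   list_chaos_mesh_pods
#   list_chaos_mesh_pods my-namespace
```

If the installation went well, the listed pods should eventually reach the Running status.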
Deploy the application
The next step is to deploy the application for testing. In our case here, we choose web-show because it allows us to directly observe the effect of network chaos. You can also deploy your own application for testing.
- Deploy web-show with the deploy.sh script:
    # Make sure you are in the Chaos Mesh directory
    cd examples/web-show && ./deploy.sh
  Note:
  If your local computer cannot pull images from docker.io, use the local gcr.io mirror and execute ./deploy.sh --docker-mirror instead.
- Access the web-show application. From your web browser, go to http://localhost:8081.
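The application can take a moment to start, so the page might not load on the first try. If you prefer the command line, a small retry loop can tell you when web-show answers. This is a sketch under assumptions: retry is a hypothetical helper, RETRY_DELAY is a variable introduced here for illustration, and the usage line assumes curl is installed.

```shell
# Retry a command up to N times, pausing RETRY_DELAY seconds (default 2)
# between attempts. Returns 0 as soon as the command succeeds.
# (retry and RETRY_DELAY are hypothetical names used for illustration.)
retry() {
  attempts=$1
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "${RETRY_DELAY:-2}"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (assumes curl is installed):
#   retry 15 curl --fail --silent --output /dev/null http://localhost:8081 \
#     && echo "web-show is up"
```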
Create the chaos experiment
Now that everything is ready, it’s time to run your chaos experiment!
Chaos Mesh uses CustomResourceDefinitions (CRDs) to define chaos experiments. A separate CRD object is designed for each experiment scenario, which greatly simplifies the definition of each object. Currently, the CRD objects implemented in Chaos Mesh include PodChaos, NetworkChaos, IOChaos, TimeChaos, and KernelChaos. Later, we’ll support more fault injection types.
In this experiment, we are using NetworkChaos for the chaos experiment. The NetworkChaos configuration file, written in YAML, is shown below:
apiVersion: pingcap.com/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay-example
spec:
  action: delay
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      "app": "web-show"
  delay:
    latency: "10ms"
    correlation: "100"
    jitter: "0ms"
  duration: "30s"
  scheduler:
    cron: "@every 60s"
For detailed descriptions of NetworkChaos actions, see the Chaos Mesh wiki. In plain terms, the configuration says:
- target: web-show
- mission: inject a 10ms network delay every 60s
- attack duration: 30s each time
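To get a feel for what this schedule means over time, you can sketch it as a simple timeline. The snippet below is a simplified model, not how the Chaos Mesh scheduler is implemented: it assumes a run starts at t = 0 and again every 60 seconds, and that each run keeps the delay active for 30 seconds. delay_active_at is a hypothetical helper name.

```shell
# Simplified model (an assumption, not Chaos Mesh's scheduler): a run starts
# at t = 0 and every 60 s after that, and each run injects the delay for 30 s.
delay_active_at() {
  period=60    # from cron: "@every 60s"
  duration=30  # from duration: "30s"
  if [ $(( $1 % period )) -lt "$duration" ]; then
    echo "delayed"
  else
    echo "normal"
  fi
}

# Print the modeled state at a few points in time (seconds).
for t in 0 15 30 45 60 75; do
  echo "t=${t}s: $(delay_active_at "$t")"
done
```

Under this model, the delay is on for the first half of every minute and off for the second half. Because web-show samples latency every 10 seconds, you would expect roughly three delayed samples per cycle.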
To start NetworkChaos, do the following:
- Run network-delay.yaml:
    # Make sure you are in the chaos-mesh/examples/web-show directory
    kubectl apply -f network-delay.yaml
- Access the web-show application. In your web browser, go to http://localhost:8081.
  From the line graph, you can tell that there is a 10 ms network delay every 60 seconds.
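If the graph does not change, it can help to inspect the experiment object itself. The sketch below uses the object name network-delay-example from the YAML file. It assumes that the object lives in the default namespace, because the manifest does not set one, and that kubectl accepts networkchaos as the resource name. show_network_chaos is a hypothetical helper name.

```shell
# Describe the NetworkChaos object created from network-delay.yaml.
# Assumptions: the object is in the "default" namespace (the manifest sets none)
# and "networkchaos" is the resource name that kubectl accepts.
show_network_chaos() {
  kubectl describe networkchaos "${1:-network-delay-example}" --namespace "${2:-default}"
}

# Usage:
#   show_network_chaos
```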
Congratulations! You just stirred up a little bit of chaos. If you are intrigued and want to try out more chaos experiments with Chaos Mesh, check out examples/web-show.
Delete the chaos experiment
Once you’re finished testing, terminate the chaos experiment.
- Delete network-delay.yaml:
    # Make sure you are in the chaos-mesh/examples/web-show directory
    kubectl delete -f network-delay.yaml
- Access the web-show application. From your web browser, go to http://localhost:8081.
From the line graph, you can see the network latency level is back to normal.
Delete Kubernetes clusters
After you’re done with the chaos experiment, execute the following command to delete the Kubernetes clusters:
kind delete cluster --name=kind
Note:
If you encounter the kind: command not found error, execute the source ~/.bash_profile command first and then delete the Kubernetes clusters.
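To confirm that the cluster is really gone, you can ask Kind which clusters still exist. This is a minimal sketch: kind_cluster_exists is a hypothetical helper name, and it assumes that kind get clusters prints one cluster name per line.

```shell
# Succeed if a Kind cluster with the given name (default: "kind") still exists.
# Assumes "kind get clusters" prints one cluster name per line.
kind_cluster_exists() {
  kind get clusters 2>/dev/null | grep -qx "${1:-kind}"
}

# Usage:
#   if kind_cluster_exists; then
#     echo "cluster still exists"
#   else
#     echo "cluster deleted"
#   fi
```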
Cool! What’s next?
Congratulations on your first successful journey into Chaos Engineering. How does it feel? Chaos Engineering is easy, right? But perhaps Chaos Mesh is not that easy to use yet. Is the command-line operation inconvenient, is writing YAML files manually a bit tedious, or is checking the experiment results somewhat clumsy? Don’t worry, Chaos Dashboard is on its way! Running chaos experiments on the web sure does sound exciting! If you’d like to help us build testing standards for cloud platforms or make Chaos Mesh better, we’d love to hear from you!
If you find a bug or think something is missing, feel free to file an issue, open a pull request (PR), or join us on the #sig-chaos-mesh channel in the TiDB Community Slack workspace.
GitHub: https://github.com/chaos-mesh