{"id":3455,"date":"2021-12-09T10:09:56","date_gmt":"2021-12-09T10:09:56","guid":{"rendered":"https:\/\/en.pingcap.com\/?p=3455"},"modified":"2024-08-20T06:46:24","modified_gmt":"2024-08-20T13:46:24","slug":"implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development","status":"publish","type":"post","link":"https:\/\/www.pingcap.com\/ko\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/","title":{"rendered":"Implementing Chaos Engineering in K8s: Chaos Mesh Principle Analysis and Control Plane Development"},"content":{"rendered":"\n<p><strong>Author:<\/strong>&nbsp;<a href=\"https:\/\/github.com\/mayocream\">Mayo Cream<\/a>&nbsp;(Kubernetes Member, CNCF Security TAG Member, OSS Contributor)<\/p>\n\n\n\n<p><a href=\"https:\/\/chaos-mesh.org\/docs\/\">Chaos Mesh<\/a>&nbsp;is an open-source, cloud-native Chaos Engineering platform built on Kubernetes (K8s) custom resource definitions (CRDs). Chaos Mesh can simulate various types of faults and has an enormous capability to orchestrate fault scenarios. 
You can use Chaos Mesh to conveniently simulate various abnormalities that might occur in development, testing, and production environments and find potential problems in the system.<\/p>\n\n\n\n<p>In this article, I&#8217;ll explore the practice of Chaos Engineering in Kubernetes clusters, discuss important Chaos Mesh features through analysis of its source code, and explain how to develop Chaos Mesh&#8217;s control plane with code examples.<\/p>\n\n\n\n<p>If you&#8217;re not familiar with Chaos Mesh, please review the&nbsp;<a href=\"https:\/\/chaos-mesh.org\/docs\/#architecture-overview\">Chaos Mesh documentation<\/a>&nbsp;to get a basic understanding of Chaos Mesh&#8217;s architecture.<\/p>\n\n\n\n<p>For the test code in this article, see the&nbsp;<a href=\"https:\/\/github.com\/mayocream\/chaos-mesh-controlpanel-demo\">mayocream\/chaos-mesh-controlpanel-demo<\/a>&nbsp;repository on GitHub.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-chaos-mesh-creates-chaos\"><a href=\"https:\/\/www.pingcap.com\/blog\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development#how-chaos-mesh-creates-chaos\"><\/a>How Chaos Mesh creates chaos<\/h2>\n\n\n\n<p>Chaos Mesh is a Swiss army knife for implementing Chaos Engineering on Kubernetes. This section explains how it works.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"privileged-mode\"><a href=\"https:\/\/www.pingcap.com\/blog\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development#privileged-mode\"><\/a>Privileged mode<\/h3>\n\n\n\n<p>Chaos Mesh runs privileged containers in Kubernetes to create failures. 
Chaos Daemon runs as a&nbsp;<code>DaemonSet<\/code>&nbsp;and adds extra&nbsp;<a href=\"https:\/\/kubernetes.io\/docs\/concepts\/policy\/pod-security-policy\/#capabilities\">capabilities<\/a>&nbsp;to its container through the Pod&#8217;s security context.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: apps\/v1\nkind: DaemonSet\nspec:\n  template:\n    metadata: ...\n    spec:\n      containers:\n        - name: chaos-daemon\n          securityContext:\n            {{- if .Values.chaosDaemon.privileged }}\n            privileged: true\n            capabilities:\n              add:\n                - SYS_PTRACE\n            {{- else }}\n            capabilities:\n              add:\n                - SYS_PTRACE\n                - NET_ADMIN\n                - MKNOD\n                - SYS_CHROOT\n                - SYS_ADMIN\n                - KILL\n                # CAP_IPC_LOCK is used to lock memory\n                - IPC_LOCK\n            {{- end }}<\/code><\/pre>\n\n\n\n<p>These Linux capabilities give the container the privileges it needs to create and access the&nbsp;<code>\/dev\/fuse<\/code>&nbsp;Filesystem in Userspace (FUSE) device. FUSE is the Linux userspace filesystem interface. It lets non-privileged users create their own file systems without editing the kernel code.<\/p>\n\n\n\n<p>According to&nbsp;<a href=\"https:\/\/github.com\/chaos-mesh\/chaos-mesh\/pull\/1109\">pull request #1109<\/a>&nbsp;on GitHub, Chaos Daemon uses cgo to call the Linux&nbsp;<code>makedev<\/code>&nbsp;macro, which computes the device number it then uses to create the FUSE device node.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ #include &lt;sys\/sysmacros.h&gt;\n\/\/ #include &lt;sys\/types.h&gt;\n\/\/ \/\/ makedev is a macro, so a wrapper is needed\n\/\/ dev_t Makedev(unsigned int maj, unsigned int min) {\n\/\/   return makedev(maj, min);\n\/\/ }\n\/\/ EnsureFuseDev ensures \/dev\/fuse exists. 
If not, it will create one\nfunc EnsureFuseDev() {\n    if _, err := os.Open(\"\/dev\/fuse\"); os.IsNotExist(err) {\n        \/\/ 10, 229 according to https:\/\/www.kernel.org\/doc\/Documentation\/admin-guide\/devices.txt\n        fuse := C.Makedev(10, 229)\n        syscall.Mknod(\"\/dev\/fuse\", 0o666|syscall.S_IFCHR, int(fuse))\n    }\n}<\/code><\/pre>\n\n\n\n<p>In&nbsp;<a href=\"https:\/\/github.com\/chaos-mesh\/chaos-mesh\/pull\/1453\">pull request #1453<\/a>, Chaos Daemon enables privileged mode by default; that is, it sets&nbsp;<code>privileged: true<\/code>&nbsp;in the container&#8217;s&nbsp;<code>SecurityContext<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"killing-pods\"><a href=\"https:\/\/www.pingcap.com\/blog\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development#killing-pods\"><\/a>Killing Pods<\/h3>\n\n\n\n<p><code>PodKill<\/code>,&nbsp;<code>PodFailure<\/code>, and&nbsp;<code>ContainerKill<\/code>&nbsp;belong to the&nbsp;<code>PodChaos<\/code>&nbsp;category.&nbsp;<code>PodKill<\/code>&nbsp;randomly kills a Pod. 
It does so by asking the Kubernetes API server to delete the Pod.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import (\n    \"context\"\n    v1 \"k8s.io\/api\/core\/v1\"\n    \"sigs.k8s.io\/controller-runtime\/pkg\/client\"\n)\ntype Impl struct {\n    client.Client\n}\nfunc (impl *Impl) Apply(ctx context.Context, index int, records &#91;]*v1alpha1.Record, obj v1alpha1.InnerObject) (v1alpha1.Phase, error) {\n    ...\n    err = impl.Get(ctx, namespacedName, &amp;pod)\n    if err != nil {\n        \/\/ TODO: handle this error\n        return v1alpha1.NotInjected, err\n    }\n    err = impl.Delete(ctx, &amp;pod, &amp;client.DeleteOptions{\n        GracePeriodSeconds: &amp;podchaos.Spec.GracePeriod, \/\/ PeriodSeconds has to be set specifically\n    })\n    ...\n    return v1alpha1.Injected, nil\n}<\/code><\/pre>\n\n\n\n<p>The&nbsp;<code>GracePeriodSeconds<\/code>&nbsp;parameter controls how long Kubernetes waits before it&nbsp;<a href=\"https:\/\/kubernetes.io\/docs\/concepts\/workloads\/pods\/pod-lifecycle\/#pod-termination-forced\">forcibly terminates a Pod<\/a>. For example, if you need to delete a Pod immediately, use the&nbsp;<code>kubectl delete pod --grace-period=0 --force<\/code>&nbsp;command.<\/p>\n\n\n\n<p><code>PodFailure<\/code> patches the Pod object to replace the container images in the Pod with a different image. Chaos Mesh only modifies the <code>image<\/code> fields of <code>containers<\/code> and <code>initContainers<\/code>. This is because most other fields of a Pod are immutable. 
For more details, see <a href=\"https:\/\/kubernetes.io\/docs\/concepts\/workloads\/pods\/#pod-update-and-replacement\" target=\"_blank\" rel=\"noreferrer noopener\">Pod update and replacement<\/a>.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>func (impl *Impl) Apply(ctx context.Context, index int, records []*v1alpha1.Record, obj v1alpha1.InnerObject) (v1alpha1.Phase, error) {\n    ...\n    pod := origin.DeepCopy()\n    for index := range pod.Spec.Containers {\n        originImage := pod.Spec.Containers[index].Image\n        name := pod.Spec.Containers[index].Name\n        key := annotation.GenKeyForImage(podchaos, name, false)\n        if pod.Annotations == nil {\n            pod.Annotations = make(map[string]string)\n        }\n        \/\/ If the annotation is already existed, we could skip the reconcile for this container\n        if _, ok := pod.Annotations[key]; ok {\n            continue\n        }\n        pod.Annotations[key] = originImage\n        pod.Spec.Containers[index].Image = config.ControllerCfg.PodFailurePauseImage\n    }\n    for index := range pod.Spec.InitContainers {\n        originImage := pod.Spec.InitContainers[index].Image\n        name := pod.Spec.InitContainers[index].Name\n        key := annotation.GenKeyForImage(podchaos, name, true)\n        if pod.Annotations == nil {\n            pod.Annotations = make(map[string]string)\n        }\n        \/\/ If the annotation is already existed, we could skip the reconcile for this container\n        if _, ok := pod.Annotations[key]; ok {\n            continue\n        }\n        pod.Annotations[key] = originImage\n        pod.Spec.InitContainers[index].Image = config.ControllerCfg.PodFailurePauseImage\n    }\n    err = impl.Patch(ctx, pod, client.MergeFrom(&amp;origin))\n    if err != nil {\n        \/\/ TODO: handle this error\n        return v1alpha1.NotInjected, err\n    }\n    return v1alpha1.Injected, nil\n}<\/code><\/pre>\n\n\n\n<p>The default container image that causes 
failures is&nbsp;<code>gcr.io\/google-containers\/pause:latest<\/code>.<\/p>\n\n\n\n<p><code>PodKill<\/code>&nbsp;and&nbsp;<code>PodFailure<\/code>&nbsp;control the Pod lifecycle through the Kubernetes API server.&nbsp;<code>ContainerKill<\/code>, by contrast, works through the Chaos Daemon that runs on each cluster node: Chaos Controller Manager acts as a gRPC client and sends the call to Chaos Daemon.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>func (b *ChaosDaemonClientBuilder) Build(ctx context.Context, pod *v1.Pod) (chaosdaemonclient.ChaosDaemonClientInterface, error) {\n    ...\n    daemonIP, err := b.FindDaemonIP(ctx, pod)\n    if err != nil {\n        return nil, err\n    }\n    builder := grpcUtils.Builder(daemonIP, config.ControllerCfg.ChaosDaemonPort).WithDefaultTimeout()\n    if config.ControllerCfg.TLSConfig.ChaosMeshCACert != \"\" {\n        builder.TLSFromFile(config.ControllerCfg.TLSConfig.ChaosMeshCACert, config.ControllerCfg.TLSConfig.ChaosDaemonClientCert, config.ControllerCfg.TLSConfig.ChaosDaemonClientKey)\n    } else {\n        builder.Insecure()\n    }\n    cc, err := builder.Build()\n    if err != nil {\n        return nil, err\n    }\n    return chaosdaemonclient.New(cc), nil\n}<\/code><\/pre>\n\n\n\n<p>When Chaos Controller Manager sends commands to Chaos Daemon, it creates a corresponding client based on the Pod information. For example, to control a Pod on a node, it creates a client by looking up the IP address of the Chaos Daemon on the node where the Pod is located (<code>FindDaemonIP<\/code>). If the Transport Layer Security (TLS) certificate configuration exists, Controller Manager adds the TLS certificate for the client; otherwise, it falls back to an insecure connection.<\/p>\n\n\n\n<p>If a TLS certificate is configured, Chaos Daemon loads it at startup to serve gRPC over TLS. 
Setting&nbsp;<code>ClientAuth<\/code>&nbsp;to&nbsp;<code>tls.RequireAndVerifyClientCert<\/code>&nbsp;makes the server require and verify a client certificate, which enforces mutual TLS (mTLS) authentication.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>func newGRPCServer(containerRuntime string, reg prometheus.Registerer, tlsConf tlsConfig) (*grpc.Server, error) {\n    ...\n    if tlsConf != (tlsConfig{}) {\n        caCert, err := ioutil.ReadFile(tlsConf.CaCert)\n        if err != nil {\n            return nil, err\n        }\n        caCertPool := x509.NewCertPool()\n        caCertPool.AppendCertsFromPEM(caCert)\n        serverCert, err := tls.LoadX509KeyPair(tlsConf.Cert, tlsConf.Key)\n        if err != nil {\n            return nil, err\n        }\n        creds := credentials.NewTLS(&amp;tls.Config{\n            Certificates: &#91;]tls.Certificate{serverCert},\n            ClientCAs:    caCertPool,\n            ClientAuth:   tls.RequireAndVerifyClientCert,\n        })\n        grpcOpts = append(grpcOpts, grpc.Creds(creds))\n    }\n    s := grpc.NewServer(grpcOpts...)\n    grpcMetrics.InitializeMetrics(s)\n    pb.RegisterChaosDaemonServer(s, ds)\n    reflection.Register(s)\n    return s, nil\n}<\/code><\/pre>\n\n\n\n<p>Chaos Daemon exposes the following gRPC methods:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ ChaosDaemonClient is the client API for ChaosDaemon service.\n\/\/\n\/\/ For semantics around ctx use and closing\/ending streaming RPCs, please refer to https:\/\/godoc.org\/google.golang.org\/grpc#ClientConn.NewStream.\ntype ChaosDaemonClient interface {\n    SetTcs(ctx context.Context, in *TcsRequest, opts ...grpc.CallOption) (*empty.Empty, error)\n    FlushIPSets(ctx context.Context, in *IPSetsRequest, opts ...grpc.CallOption) (*empty.Empty, error)\n    SetIptablesChains(ctx context.Context, in *IptablesChainsRequest, opts ...grpc.CallOption) (*empty.Empty, error)\n    SetTimeOffset(ctx context.Context, in *TimeRequest, opts ...grpc.CallOption) (*empty.Empty, error)\n    RecoverTimeOffset(ctx 
context.Context, in *TimeRequest, opts ...grpc.CallOption) (*empty.Empty, error)\n    ContainerKill(ctx context.Context, in *ContainerRequest, opts ...grpc.CallOption) (*empty.Empty, error)\n    ContainerGetPid(ctx context.Context, in *ContainerRequest, opts ...grpc.CallOption) (*ContainerResponse, error)\n    ExecStressors(ctx context.Context, in *ExecStressRequest, opts ...grpc.CallOption) (*ExecStressResponse, error)\n    CancelStressors(ctx context.Context, in *CancelStressRequest, opts ...grpc.CallOption) (*empty.Empty, error)\n    ApplyIOChaos(ctx context.Context, in *ApplyIOChaosRequest, opts ...grpc.CallOption) (*ApplyIOChaosResponse, error)\n    ApplyHttpChaos(ctx context.Context, in *ApplyHttpChaosRequest, opts ...grpc.CallOption) (*ApplyHttpChaosResponse, error)\n    SetDNSServer(ctx context.Context, in *SetDNSServerRequest, opts ...grpc.CallOption) (*empty.Empty, error)\n}<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"network-failure-injection\"><a href=\"https:\/\/www.pingcap.com\/blog\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development#network-failure-injection\"><\/a>Network failure injection<\/h3>\n\n\n\n<p>From&nbsp;<a href=\"https:\/\/github.com\/chaos-mesh\/chaos-mesh\/pull\/41\">pull request #41<\/a>, we know that Chaos Mesh injects network failures this way: it calls&nbsp;<code>pbClient.SetNetem<\/code>&nbsp;to encapsulate parameters into a request and send the request to the Chaos Daemon on the node for processing.<\/p>\n\n\n\n<p>The network failure injection code is shown below as it appeared in 2019. 
As the project developed, the functions were distributed among several files.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>func (r *Reconciler) applyPod(ctx context.Context, pod *v1.Pod, networkchaos *v1alpha1.NetworkChaos) error {\n    ...\n    pbClient := pb.NewChaosDaemonClient(c)\n    containerId := pod.Status.ContainerStatuses&#91;0].ContainerID\n    netem, err := spec.ToNetem()\n    if err != nil {\n        return err\n    }\n    _, err = pbClient.SetNetem(ctx, &amp;pb.NetemRequest{\n        ContainerId: containerId,\n        Netem:       netem,\n    })\n    return err\n}<\/code><\/pre>\n\n\n\n<p>In the&nbsp;<code>pkg\/chaosdaemon<\/code>&nbsp;package, we can see how Chaos Daemon processes requests.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>func (s *Server) SetNetem(ctx context.Context, in *pb.NetemRequest) (*empty.Empty, error) {\n    log.Info(\"Set netem\", \"Request\", in)\n    pid, err := s.crClient.GetPidFromContainerID(ctx, in.ContainerId)\n    if err != nil {\n        return nil, status.Errorf(codes.Internal, \"get pid from containerID error: %v\", err)\n    }\n    if err := Apply(in.Netem, pid); err != nil {\n        return nil, status.Errorf(codes.Internal, \"netem apply error: %v\", err)\n    }\n    return &amp;empty.Empty{}, nil\n}\n\/\/ Apply applies a netem on eth0 in pid related namespace\nfunc Apply(netem *pb.Netem, pid uint32) error {\n    log.Info(\"Apply netem on PID\", \"pid\", pid)\n    ns, err := netns.GetFromPath(GenNetnsPath(pid))\n    if err != nil {\n        log.Error(err, \"failed to find network namespace\", \"pid\", pid)\n        return errors.Trace(err)\n    }\n    defer ns.Close()\n    handle, err := netlink.NewHandleAt(ns)\n    if err != nil {\n        log.Error(err, \"failed to get handle at network namespace\", \"network namespace\", ns)\n        return err\n    }\n    link, err := handle.LinkByName(\"eth0\") \/\/ TODO: check whether interface name is eth0\n    if err != nil {\n        log.Error(err, \"failed to find 
eth0 interface\")\n        return errors.Trace(err)\n    }\n    netemQdisc := netlink.NewNetem(netlink.QdiscAttrs{\n        LinkIndex: link.Attrs().Index,\n        Handle:    netlink.MakeHandle(1, 0),\n        Parent:    netlink.HANDLE_ROOT,\n    }, ToNetlinkNetemAttrs(netem))\n    if err = handle.QdiscAdd(netemQdisc); err != nil {\n        if !strings.Contains(err.Error(), \"file exists\") {\n            log.Error(err, \"failed to add Qdisc\")\n            return errors.Trace(err)\n        }\n    }\n    return nil\n}<\/code><\/pre>\n\n\n\n<p>Finally, the&nbsp;<a href=\"https:\/\/github.com\/vishvananda\/netlink\"><code>vishvananda\/netlink<\/code>&nbsp;library<\/a>&nbsp;operates the Linux network interface to complete the job.<\/p>\n\n\n\n<p>From here,&nbsp;<code>NetworkChaos<\/code>&nbsp;manipulates the Linux host network to create chaos. It includes tools such as iptables and ipset.<\/p>\n\n\n\n<p>In Chaos Daemon&#8217;s Dockerfile, you can see the Linux tool chain that it depends on:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>RUN apt-get update &amp;&amp; \\ \n    apt-get install -y tzdata iptables ipset stress-ng iproute2 fuse util-linux procps curl &amp;&amp; \\\n    rm -rf \/var\/lib\/apt\/lists\/*<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"stress-test\"><a href=\"https:\/\/www.pingcap.com\/blog\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development#stress-test\"><\/a>Stress test<\/h3>\n\n\n\n<p>Chaos Daemon also implements&nbsp;<code>StressChaos<\/code>. After the Controller Manager calculates the rules, it sends the task to the specific&nbsp;<code>Daemon<\/code>. The assembled parameters are shown below. 
They are combined into command execution parameters and appended to the&nbsp;<code>stress-ng<\/code>&nbsp;command for execution.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Normalize the stressors to comply with stress-ng\nfunc (in *Stressors) Normalize() (string, error) {\n    stressors := \"\"\n    if in.MemoryStressor != nil &amp;&amp; in.MemoryStressor.Workers != 0 {\n        stressors += fmt.Sprintf(\" --vm %d --vm-keep\", in.MemoryStressor.Workers)\n        if len(in.MemoryStressor.Size) != 0 {\n            if in.MemoryStressor.Size&#91;len(in.MemoryStressor.Size)-1] != '%' {\n                size, err := units.FromHumanSize(string(in.MemoryStressor.Size))\n                if err != nil {\n                    return \"\", err\n                }\n                stressors += fmt.Sprintf(\" --vm-bytes %d\", size)\n            } else {\n                stressors += fmt.Sprintf(\" --vm-bytes %s\",\n                    in.MemoryStressor.Size)\n            }\n        }\n        if in.MemoryStressor.Options != nil {\n            for _, v := range in.MemoryStressor.Options {\n                stressors += fmt.Sprintf(\" %v \", v)\n            }\n        }\n    }\n    if in.CPUStressor != nil &amp;&amp; in.CPUStressor.Workers != 0 {\n        stressors += fmt.Sprintf(\" --cpu %d\", in.CPUStressor.Workers)\n        if in.CPUStressor.Load != nil {\n            stressors += fmt.Sprintf(\" --cpu-load %d\",\n                *in.CPUStressor.Load)\n        }\n        if in.CPUStressor.Options != nil {\n            for _, v := range in.CPUStressor.Options {\n                stressors += fmt.Sprintf(\" %v \", v)\n            }\n        }\n    }\n    return stressors, nil\n}<\/code><\/pre>\n\n\n\n<p>The Chaos Daemon server side processes the function&#8217;s execution command to call the official Go package&nbsp;<code>os\/exec<\/code>. 
For details, see the&nbsp;<a href=\"https:\/\/github.com\/chaos-mesh\/chaos-mesh\/blob\/98af3a0e7832a4971d6b133a32069539d982ef0a\/pkg\/chaosdaemon\/stress_server_linux.go#L33\"><code>pkg\/chaosdaemon\/stress_server_linux.go<\/code><\/a>&nbsp;file. There is also a counterpart file whose name ends with&nbsp;<code>_darwin<\/code>. The&nbsp;<code>*_darwin<\/code>&nbsp;files exist so that the program can still build and run on macOS without errors.<\/p>\n\n\n\n<p>The code uses the&nbsp;<a href=\"https:\/\/github.com\/shirou\/gopsutil\"><code>shirou\/gopsutil<\/code><\/a>&nbsp;package to look up the status of the process by its PID, and it reads the process&#8217;s stdout and stderr streams. I&#8217;ve seen this processing mode in&nbsp;<a href=\"https:\/\/github.com\/hashicorp\/go-plugin\"><code>hashicorp\/go-plugin<\/code><\/a>, and go-plugin does this better.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"io-fault-injection\"><a href=\"https:\/\/www.pingcap.com\/blog\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development#io-fault-injection\"><\/a>I\/O fault injection<\/h3>\n\n\n\n<p><a href=\"https:\/\/github.com\/chaos-mesh\/chaos-mesh\/pull\/826\">Pull request #826<\/a>&nbsp;introduces a new implementation of IOChaos that does not use sidecar injection. Instead, Chaos Daemon directly manipulates Linux namespaces through the underlying commands of the&nbsp;<a href=\"https:\/\/github.com\/opencontainers\/runc\">runc<\/a>&nbsp;container runtime and runs&nbsp;<a href=\"https:\/\/github.com\/chaos-mesh\/toda\">chaos-mesh\/toda<\/a>, a FUSE program written in Rust, to inject container I\/O chaos. The&nbsp;<a href=\"https:\/\/pkg.go.dev\/github.com\/ethereum\/go-ethereum\/rpc\">JSON-RPC 2.0<\/a>&nbsp;protocol is used to communicate between toda and the control plane.<\/p>\n\n\n\n<p>The new IOChaos implementation doesn&#8217;t modify the Pod resources. When you define an IOChaos experiment, a corresponding PodIoChaos resource is created for each Pod matched by the selector field. 
PodIoChaos&#8217;&nbsp;<a href=\"https:\/\/kubernetes.io\/docs\/concepts\/overview\/working-with-objects\/owners-dependents\/\">owner reference<\/a>&nbsp;is the Pod. At the same time, a set of&nbsp;<a href=\"https:\/\/kubernetes.io\/docs\/concepts\/overview\/working-with-objects\/finalizers\/\">finalizers<\/a>&nbsp;is added to PodIoChaos to release PodIoChaos resources before PodIoChaos is deleted.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Apply implements the reconciler.InnerReconciler.Apply\nfunc (r *Reconciler) Apply(ctx context.Context, req ctrl.Request, chaos v1alpha1.InnerObject) error {\n    iochaos, ok := chaos.(*v1alpha1.IoChaos)\n    if !ok {\n        err := errors.New(\"chaos is not IoChaos\")\n        r.Log.Error(err, \"chaos is not IoChaos\", \"chaos\", chaos)\n        return err\n    }\n    source := iochaos.Namespace + \"\/\" + iochaos.Name\n    m := podiochaosmanager.New(source, r.Log, r.Client)\n    pods, err := utils.SelectAndFilterPods(ctx, r.Client, r.Reader, &amp;iochaos.Spec)\n    if err != nil {\n        r.Log.Error(err, \"failed to select and filter pods\")\n        return err\n    }\n    r.Log.Info(\"applying iochaos\", \"iochaos\", iochaos)\n    for _, pod := range pods {\n        t := m.WithInit(types.NamespacedName{\n            Name:      pod.Name,\n            Namespace: pod.Namespace,\n        })\n        \/\/ TODO: support chaos on multiple volume\n        t.SetVolumePath(iochaos.Spec.VolumePath)\n        t.Append(v1alpha1.IoChaosAction{\n            Type: iochaos.Spec.Action,\n            Filter: v1alpha1.Filter{\n                Path:    iochaos.Spec.Path,\n                Percent: iochaos.Spec.Percent,\n                Methods: iochaos.Spec.Methods,\n            },\n            Faults: &#91;]v1alpha1.IoFault{\n                {\n                    Errno:  iochaos.Spec.Errno,\n                    Weight: 1,\n                },\n            },\n            Latency:          iochaos.Spec.Delay,\n            
AttrOverrideSpec: iochaos.Spec.Attr,\n            Source:           m.Source,\n        })\n        key, err := cache.MetaNamespaceKeyFunc(&amp;pod)\n        if err != nil {\n            return err\n        }\n        iochaos.Finalizers = utils.InsertFinalizer(iochaos.Finalizers, key)\n    }\n    r.Log.Info(\"commiting updates of podiochaos\")\n    err = m.Commit(ctx)\n    if err != nil {\n        r.Log.Error(err, \"fail to commit\")\n        return err\n    }\n    r.Event(iochaos, v1.EventTypeNormal, utils.EventChaosInjected, \"\")\n    return nil\n}<\/code><\/pre>\n\n\n\n<div style=\"--ub-icon-rotation:rotate(0deg);--ub-icon-size:40px;--ub-icon-justification:center;--ub-icon-border-top:  undefined;--ub-icon-border-right:  undefined;--ub-icon-border-bottom:  undefined;--ub-icon-border-left:  undefined\" class=\"wp-block-ub-icon\" id=\"ub-icon-\"><div class=\"ub_icon\"><div class=\"ub_icon_wrapper\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 24 24\" aria-hidden=\"true\"><path d=\"M6.6 6L5.4 7l4.5 5-4.5 5 1.1 1 5.5-6-5.4-6zm6 0l-1.1 1 4.5 5-4.5 5 1.1 1 5.5-6-5.5-6z\"><\/path><\/svg><\/div><\/div><\/div>\n\n\n\n<p>In the controller of the PodIoChaos resource, Controller Manager encapsulates the resource into parameters and calls the Chaos Daemon interface to process the parameters.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Apply flushes io configuration on pod\nfunc (h *Handler) Apply(ctx context.Context, chaos *v1alpha1.PodIoChaos) error {\n    h.Log.Info(\"updating io chaos\", \"pod\", chaos.Namespace+\"\/\"+chaos.Name, \"spec\", chaos.Spec)\n    ...\n    res, err := pbClient.ApplyIoChaos(ctx, &amp;pb.ApplyIoChaosRequest{\n        Actions:     input,\n        Volume:      chaos.Spec.VolumeMountPath,\n        ContainerId: containerID,\n        Instance:  chaos.Spec.Pid,\n        StartTime: chaos.Spec.StartTime,\n    })\n    if err != nil {\n        return err\n    }\n    chaos.Spec.Pid = res.Instance\n    chaos.Spec.StartTime = 
res.StartTime\n    chaos.OwnerReferences = &#91;]metav1.OwnerReference{\n        {\n            APIVersion: pod.APIVersion,\n            Kind:       pod.Kind,\n            Name:       pod.Name,\n            UID:        pod.UID,\n        },\n    }\n    return nil\n}<\/code><\/pre>\n\n\n\n<p>The&nbsp;<code>pkg\/chaosdaemon\/iochaos_server.go<\/code>&nbsp;file processes IOChaos. In this file, a FUSE program needs to be injected into the container. As discussed in issue&nbsp;#2305&nbsp;on GitHub, the&nbsp;<code>\/usr\/local\/bin\/nsexec -l -p \/proc\/119186\/ns\/pid -m \/proc\/119186\/ns\/mnt -- \/usr\/local\/bin\/toda --path \/tmp --verbose info<\/code>&nbsp;command is executed to run the toda program in the same PID and mount namespaces as the Pod.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>func (s *DaemonServer) ApplyIOChaos(ctx context.Context, in *pb.ApplyIOChaosRequest) (*pb.ApplyIOChaosResponse, error) {\n    ...\n    pid, err := s.crClient.GetPidFromContainerID(ctx, in.ContainerId)\n    if err != nil {\n        log.Error(err, \"error while getting PID\")\n        return nil, err\n    }\n    args := fmt.Sprintf(\"--path %s --verbose info\", in.Volume)\n    log.Info(\"executing\", \"cmd\", todaBin+\" \"+args)\n    processBuilder := bpm.DefaultProcessBuilder(todaBin, strings.Split(args, \" \")...).\n        EnableLocalMnt().\n        SetIdentifier(in.ContainerId)\n    if in.EnterNS {\n        processBuilder = processBuilder.SetNS(pid, bpm.MountNS).SetNS(pid, bpm.PidNS)\n    }\n    ...\n    \/\/ Calls JSON RPC\n    client, err := jrpc.DialIO(ctx, receiver, caller)\n    if err != nil {\n        return nil, err\n    }\n    cmd := processBuilder.Build()\n    procState, err := s.backgroundProcessManager.StartProcess(cmd)\n    if err != nil {\n        return nil, err\n    }\n    ...\n}<\/code><\/pre>\n\n\n\n<p>The following code sample builds the running commands. 
These commands rely on the same namespace isolation mechanism that runc uses:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ GetNsPath returns corresponding namespace path\nfunc GetNsPath(pid uint32, typ NsType) string {\n    return fmt.Sprintf(\"%s\/%d\/ns\/%s\", DefaultProcPrefix, pid, string(typ))\n}\n\/\/ SetNS sets the namespace of the process\nfunc (b *ProcessBuilder) SetNS(pid uint32, typ NsType) *ProcessBuilder {\n    return b.SetNSOpt(&#91;]nsOption{{\n        Typ:  typ,\n        Path: GetNsPath(pid, typ),\n    }})\n}\n\/\/ Build builds the process\nfunc (b *ProcessBuilder) Build() *ManagedProcess {\n    args := b.args\n    cmd := b.cmd\n    if len(b.nsOptions) &gt; 0 {\n        args = append(&#91;]string{\"--\", cmd}, args...)\n        for _, option := range b.nsOptions {\n            args = append(&#91;]string{\"-\" + nsArgMap&#91;option.Typ], option.Path}, args...)\n        }\n        if b.localMnt {\n            args = append(&#91;]string{\"-l\"}, args...)\n        }\n        cmd = nsexecPath\n    }\n    ...\n}<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"control-plane\"><a href=\"https:\/\/www.pingcap.com\/blog\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development#control-plane\"><\/a>Control plane<\/h2>\n\n\n\n<p>Chaos Mesh is an open-source chaos engineering system released under the Apache 2.0 license. As discussed above, it has rich capabilities and a good ecosystem. 
The maintainers have built several supporting projects around the core system:&nbsp;<a href=\"https:\/\/github.com\/chaos-mesh\/toda\"><code>chaos-mesh\/toda<\/code><\/a>, a FUSE-based I\/O chaos tool;&nbsp;<a href=\"https:\/\/github.com\/chaos-mesh\/k8s_dns_chaos\"><code>chaos-mesh\/k8s_dns_chaos<\/code><\/a>, a CoreDNS chaos plug-in; and&nbsp;<a href=\"https:\/\/github.com\/chaos-mesh\/bpfki\"><code>chaos-mesh\/bpfki<\/code><\/a>, a Berkeley Packet Filter (BPF)-based kernel error injection tool.<\/p>\n\n\n\n<p>Now, I&#8217;ll describe the server-side code required to build an end-user-oriented chaos engineering platform. This implementation is only an example, and not necessarily the best one. If you want to see the development practice on a real-world platform, you can refer to Chaos Mesh&#8217;s&nbsp;<a href=\"https:\/\/github.com\/chaos-mesh\/chaos-mesh\/tree\/master\/pkg\/dashboard\">Dashboard<\/a>. It uses the&nbsp;<a href=\"https:\/\/github.com\/uber-go\/fx\"><code>uber-go\/fx<\/code><\/a>&nbsp;dependency injection framework and the controller runtime&#8217;s manager mode.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"key-chaos-mesh-features\"><a href=\"https:\/\/www.pingcap.com\/blog\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development#key-chaos-mesh-features\"><\/a>Key Chaos Mesh features<\/h3>\n\n\n\n<p>As shown in the Chaos Mesh workflow below, we need to implement a server that sends YAML to the Kubernetes API. Chaos Controller Manager handles the complex rule verification and delivers the rules to Chaos Daemon. 
If you want to use Chaos Mesh with your own platform, you only need to hook into the step that creates the custom resources.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/download.pingcap.com\/images\/blog\/chaos-mesh-basic-workflow.jpg\" alt=\"Chaos Mesh's basic workflow\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Chaos Mesh&#8217;s basic workflow<\/em><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Let&#8217;s take a look at the example on the Chaos Mesh website:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import (\n    \"context\"\n    chaosv1alpha1 \"github.com\/chaos-mesh\/chaos-mesh\/api\/v1alpha1\"\n    \"sigs.k8s.io\/controller-runtime\/pkg\/client\"\n)\nfunc main() {\n    ...\n    delay := &amp;chaosv1alpha1.NetworkChaos{\n        Spec: chaosv1alpha1.NetworkChaosSpec{...},\n    }\n    \/\/ client.New returns both a client and an error\n    k8sClient, err := client.New(conf, client.Options{ Scheme: scheme.Scheme })\n    if err != nil {\n        panic(err)\n    }\n    k8sClient.Create(context.TODO(), delay)\n    k8sClient.Delete(context.TODO(), delay)\n}<\/code><\/pre>\n\n\n\n<p>Chaos Mesh provides Go API types corresponding to all of its CRDs. We use the&nbsp;<a href=\"https:\/\/github.com\/kubernetes-sigs\/controller-runtime\">controller-runtime<\/a>&nbsp;library developed by the Kubernetes&nbsp;<a href=\"https:\/\/github.com\/kubernetes\/community\/tree\/master\/sig-api-machinery\">API Machinery SIG<\/a>&nbsp;to simplify the interaction with the Kubernetes API.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"inject-chaos\"><a href=\"https:\/\/www.pingcap.com\/blog\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development#inject-chaos\"><\/a>Inject chaos<\/h3>\n\n\n\n<p>Suppose we want to create a&nbsp;<code>PodKill<\/code>&nbsp;resource programmatically. After the resource is sent to the Kubernetes API server, it passes through Chaos Controller Manager&#8217;s&nbsp;<a href=\"https:\/\/kubernetes.io\/docs\/reference\/access-authn-authz\/admission-controllers\/\">validating admission controller<\/a>, which verifies the data. 
If the admission controller fails to validate the input data when we create a chaos experiment, it returns an error to the client. For the specific parameters, see&nbsp;<a href=\"https:\/\/chaos-mesh.org\/docs\/simulate-pod-chaos-on-kubernetes\/#create-experiments-using-yaml-configuration-files\">Create experiments using YAML configuration files<\/a>.<\/p>\n\n\n\n<p>In the following example,&nbsp;<code>NewClient<\/code>&nbsp;creates a Kubernetes API client:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>package main\nimport (\n    \"context\"\n    \"controlpanel\"\n    \"log\"\n    \"github.com\/chaos-mesh\/chaos-mesh\/api\/v1alpha1\"\n    \"github.com\/pkg\/errors\"\n    metav1 \"k8s.io\/apimachinery\/pkg\/apis\/meta\/v1\"\n)\n\/\/ applyPodKill creates a PodChaos resource that kills one Pod matching the labels.\nfunc applyPodKill(name, namespace string, labels map&#91;string]string) error {\n    cli, err := controlpanel.NewClient()\n    if err != nil {\n        return errors.Wrap(err, \"create client\")\n    }\n    cr := &amp;v1alpha1.PodChaos{\n        ObjectMeta: metav1.ObjectMeta{\n            GenerateName: name,\n            Namespace:    namespace,\n        },\n        Spec: v1alpha1.PodChaosSpec{\n            Action: v1alpha1.PodKillAction,\n            ContainerSelector: v1alpha1.ContainerSelector{\n                PodSelector: v1alpha1.PodSelector{\n                    Mode: v1alpha1.OnePodMode,\n                    Selector: v1alpha1.PodSelectorSpec{\n                        Namespaces:     &#91;]string{namespace},\n                        LabelSelectors: labels,\n                    },\n                },\n            },\n        },\n    }\n    if err := cli.Create(context.Background(), cr); err != nil {\n        return errors.Wrap(err, \"create podkill\")\n    }\n    return nil\n}\nfunc main() {\n    \/\/ These arguments mirror the resource shown in the kubectl output below.\n    if err := applyPodKill(\"podkill\", \"dev\", map&#91;string]string{\"app\": \"nginx\"}); err != nil {\n        log.Fatal(err)\n    }\n    log.Println(\"apply podkill\")\n}<\/code><\/pre>\n\n\n\n<p>The log output of the running program is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>I1021 00:51:55.225502   23781 request.go:665] Waited for 1.033116256s due to client-side throttling, not priority and fairness, request: 
GET:https:\/\/***\n2021\/10\/21 00:51:56 apply podkill<\/code><\/pre>\n\n\n\n<p>Use kubectl to check the status of the&nbsp;<code>PodKill<\/code>&nbsp;resource:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ k describe podchaos.chaos-mesh.org -n dev podkillvjn77\nName:         podkillvjn77\nNamespace:    dev\nLabels:       &lt;none&gt;\nAnnotations:  &lt;none&gt;\nAPI Version:  chaos-mesh.org\/v1alpha1\nKind:         PodChaos\nMetadata:\n  Creation Timestamp:  2021-10-20T16:51:56Z\n  Finalizers:\n    chaos-mesh\/records\n  Generate Name:     podkill\n  Generation:        7\n  Resource Version:  938921488\n  Self Link:         \/apis\/chaos-mesh.org\/v1alpha1\/namespaces\/dev\/podchaos\/podkillvjn77\n  UID:               afbb40b3-ade8-48ba-89db-04918d89fd0b\nSpec:\n  Action:        pod-kill\n  Grace Period:  0\n  Mode:          one\n  Selector:\n    Label Selectors:\n      app:  nginx\n    Namespaces:\n      dev\nStatus:\n  Conditions:\n    Reason:  \n    Status:  False\n    Type:    Paused\n    Reason:  \n    Status:  True\n    Type:    Selected\n    Reason:  \n    Status:  True\n    Type:    AllInjected\n    Reason:  \n    Status:  False\n    Type:    AllRecovered\n  Experiment:\n    Container Records:\n      Id:            dev\/nginx\n      Phase:         Injected\n      Selector Key:  .\n    Desired Phase:   Run\nEvents:\n  Type    Reason           Age    From          Message\n  ----    ------           ----   ----          -------\n  Normal  FinalizerInited  6m35s  finalizer     Finalizer has been inited\n  Normal  Updated          6m35s  finalizer     Successfully update finalizer of resource\n  Normal  Updated          6m35s  records       Successfully update records of resource\n  Normal  Updated          6m35s  desiredphase  Successfully update desiredPhase of resource\n  Normal  Applied          6m35s  records       Successfully apply chaos for dev\/nginx\n  Normal  Updated          6m35s  records       Successfully update records of 
resource<\/code><\/pre>\n\n\n\n<p>The control plane also needs to query Chaos resources so that platform users can view the status of all chaos experiments and manage them. To achieve this, we can send&nbsp;<code>Get<\/code>&nbsp;or&nbsp;<code>List<\/code>&nbsp;requests through the REST API. In practice, however, the details matter. At our company, we&#8217;ve noticed that each time the controller requests the full set of resource data, the load on the Kubernetes API server increases.<\/p>\n\n\n\n<p>I recommend the controller-runtime tutorial&nbsp;<a href=\"https:\/\/zoetrope.github.io\/kubebuilder-training\/controller-runtime\/client.html\">How to use the controller-runtime client<\/a>&nbsp;(in Japanese). If you don&#8217;t understand Japanese, you can still learn a lot from the tutorial by reading the source code. It covers many details. For example, by default, controller-runtime looks for the cluster configuration in multiple locations: kubeconfig, flags, environment variables, and the service account automatically mounted in the Pod.&nbsp;<a href=\"https:\/\/github.com\/armosec\/kubescape\/pull\/21\">Pull request #21<\/a>&nbsp;for&nbsp;<a href=\"https:\/\/github.com\/armosec\/kubescape\"><code>armosec\/kubescape<\/code><\/a>&nbsp;uses this feature. The tutorial also covers common operations, such as how to paginate, update, and overwrite objects. 
I haven&#8217;t seen any English tutorial that is as detailed.<\/p>\n\n\n\n<p>Here are examples of&nbsp;<code>Get<\/code>&nbsp;and&nbsp;<code>List<\/code>&nbsp;requests:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>package controlpanel\nimport (\n    \"context\"\n    \"github.com\/chaos-mesh\/chaos-mesh\/api\/v1alpha1\"\n    \"github.com\/pkg\/errors\"\n    \"sigs.k8s.io\/controller-runtime\/pkg\/client\"\n)\n\/\/ GetPodChaos fetches a single PodChaos resource by name and namespace.\n\/\/ mgr is the controller-runtime manager; its setup is not shown here.\nfunc GetPodChaos(name, namespace string) (*v1alpha1.PodChaos, error) {\n    cli := mgr.GetClient()\n    item := new(v1alpha1.PodChaos)\n    if err := cli.Get(context.Background(), client.ObjectKey{Name: name, Namespace: namespace}, item); err != nil {\n        return nil, errors.Wrap(err, \"get cr\")\n    }\n    return item, nil\n}\n\/\/ ListPodChaos lists the PodChaos resources in a namespace that match the given labels.\nfunc ListPodChaos(namespace string, labels map&#91;string]string) (&#91;]v1alpha1.PodChaos, error) {\n    cli := mgr.GetClient()\n    list := new(v1alpha1.PodChaosList)\n    if err := cli.List(context.Background(), list, client.InNamespace(namespace), client.MatchingLabels(labels)); err != nil {\n        return nil, errors.Wrap(err, \"list cr\")\n    }\n    return list.Items, nil\n}<\/code><\/pre>\n\n\n\n<p>This example uses the manager&#8217;s client. Its cache mechanism avoids repeatedly fetching large amounts of data from the API server. 
The following&nbsp;<a href=\"https:\/\/zoetrope.github.io\/kubebuilder-training\/controller-runtime\/client.html\">figure<\/a>&nbsp;shows the workflow:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Get the Pod.<\/li>\n\n\n\n<li>Fetch the full data with a&nbsp;<code>List<\/code>&nbsp;request the first time.<\/li>\n\n\n\n<li>Update the cache when the watched data changes.<\/li>\n<\/ol>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/download.pingcap.com\/images\/blog\/list-request.jpg\" alt=\"List request\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>List request<\/em><\/p>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"orchestrate-chaos\"><a href=\"https:\/\/www.pingcap.com\/blog\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development#orchestrate-chaos\"><\/a>Orchestrate chaos<\/h3>\n\n\n\n<p>A container runtime that implements the Container Runtime Interface (CRI) provides strong underlying isolation that supports the stable operation of containers. But more complex and scalable scenarios require container orchestration. Chaos Mesh also provides&nbsp;<a href=\"https:\/\/chaos-mesh.org\/docs\/define-scheduling-rules\/\"><code>Schedule<\/code><\/a>&nbsp;and&nbsp;<a href=\"https:\/\/chaos-mesh.org\/docs\/create-chaos-mesh-workflow\/\"><code>Workflow<\/code><\/a>&nbsp;features. Based on the configured&nbsp;<code>Cron<\/code>&nbsp;schedule,&nbsp;<code>Schedule<\/code>&nbsp;triggers faults at regular intervals.&nbsp;<code>Workflow<\/code>&nbsp;orchestrates multiple fault tests, much like Argo Workflows.<\/p>\n\n\n\n<p>Chaos Controller Manager does most of the work for us. The control plane mainly manages these YAML resources. 
You only need to consider the features you want to provide to end users.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"platform-features\"><a href=\"https:\/\/www.pingcap.com\/blog\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development#platform-features\"><\/a>Platform features<\/h3>\n\n\n\n<p>The following figure shows Chaos Mesh Dashboard. We need to consider what features the platform should provide to end users.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/download.pingcap.com\/images\/blog\/chaos-mesh-dashboard.jpg\" alt=\"Chaos Mesh Dashboard\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><em>Chaos Mesh Dashboard<\/em><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>From the Dashboard, we know that the platform may have these features:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Chaos injection<\/li>\n\n\n\n<li>Pod crash<\/li>\n\n\n\n<li>Network failure<\/li>\n\n\n\n<li>Load test<\/li>\n\n\n\n<li>I\/O failure<\/li>\n\n\n\n<li>Event tracking<\/li>\n\n\n\n<li>Associated alarm<\/li>\n\n\n\n<li>Timing telemetry<\/li>\n<\/ul>\n\n\n\n<p>If you are interested in Chaos Mesh and would like to improve it, join its&nbsp;<a href=\"https:\/\/slack.cncf.io\/\">Slack channel<\/a>&nbsp;(#project-chaos-mesh) or submit your pull requests or issues to its&nbsp;<a href=\"https:\/\/github.com\/chaos-mesh\/chaos-mesh\">GitHub repository<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post describes the practice of Chaos Engineering in K8s clusters, discusses important Chaos Mesh features through analysis of its source code, and explains how to develop Chaos Mesh&#8217;s control plane with code 
examples.<\/p>","protected":false},"author":8,"featured_media":3456,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ub_ctt_via":"","footnotes":""},"categories":[6],"tags":[21],"class_list":["post-3455","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-engineering","tag-chaos-engineering"],"acf":[],"featured_image_src":"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png","author_info":{"display_name":"TiDB Team","author_link":"https:\/\/www.pingcap.com\/ko\/blog\/author\/pingcap\/"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Implementing Chaos Engineering in K8s<\/title>\n<meta name=\"description\" content=\"In this post, we will explore the practice of Chaos Engineering in K8s clusters, and discuss important Chaos Mesh features through the code.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pingcap.com\/ko\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Implementing Chaos Engineering in K8s\" \/>\n<meta property=\"og:description\" content=\"In this post, we will explore the practice of Chaos Engineering in K8s clusters, and discuss important Chaos Mesh features through the code.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pingcap.com\/ko\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/\" \/>\n<meta property=\"og:site_name\" content=\"TiDB\" \/>\n<meta property=\"article:publisher\" 
content=\"https:\/\/facebook.com\/pingcap2015\" \/>\n<meta property=\"article:published_time\" content=\"2021-12-09T10:09:56+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-08-20T13:46:24+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1501\" \/>\n\t<meta property=\"og:image:height\" content=\"501\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"TiDB Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:site\" content=\"@PingCAP\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TiDB Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"16\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/\"},\"author\":{\"name\":\"TiDB Team\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/b17c1fde961eebd318de8729d595df74\"},\"headline\":\"Implementing Chaos Engineering in K8s: Chaos Mesh Principle Analysis and Control Plane 
Development\",\"datePublished\":\"2021-12-09T10:09:56+00:00\",\"dateModified\":\"2024-08-20T13:46:24+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/\"},\"wordCount\":1807,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png\",\"keywords\":[\"Chaos Engineering\"],\"articleSection\":[\"Engineering\"],\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/\",\"url\":\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/\",\"name\":\"Implementing Chaos Engineering in 
K8s\",\"isPartOf\":{\"@id\":\"https:\/\/www.pingcap.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png\",\"datePublished\":\"2021-12-09T10:09:56+00:00\",\"dateModified\":\"2024-08-20T13:46:24+00:00\",\"description\":\"In this post, we will explore the practice of Chaos Engineering in K8s clusters, and discuss important Chaos Mesh features through the code.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#primaryimage\",\"url\":\"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png\",\"width\":1501,\"height\":501},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#breadcrumb\",\"itemL
istElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pingcap.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Implementing Chaos Engineering in K8s: Chaos Mesh Principle Analysis and Control Plane Development\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pingcap.com\/#website\",\"url\":\"https:\/\/www.pingcap.com\/\",\"name\":\"TiDB\",\"description\":\"TiDB | SQL at Scale\",\"publisher\":{\"@id\":\"https:\/\/www.pingcap.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pingcap.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pingcap.com\/#organization\",\"name\":\"PingCAP\",\"url\":\"https:\/\/www.pingcap.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png\",\"width\":811,\"height\":232,\"caption\":\"PingCAP\"},\"image\":{\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/facebook.com\/pingcap2015\",\"https:\/\/x.com\/PingCAP\",\"https:\/\/linkedin.com\/company\/pingcap\",\"https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/b17c1fde961eebd318de8729d595df74\",\"name\":\"TiDB 
Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/www.pingcap.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg\",\"contentUrl\":\"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg\",\"caption\":\"TiDB Team\"},\"url\":\"https:\/\/www.pingcap.com\/ko\/blog\/author\/pingcap\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Implementing Chaos Engineering in K8s","description":"In this post, we will explore the practice of Chaos Engineering in K8s clusters, and discuss important Chaos Mesh features through the code.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pingcap.com\/ko\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/","og_locale":"ko_KR","og_type":"article","og_title":"Implementing Chaos Engineering in K8s","og_description":"In this post, we will explore the practice of Chaos Engineering in K8s clusters, and discuss important Chaos Mesh features through the code.","og_url":"https:\/\/www.pingcap.com\/ko\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/","og_site_name":"TiDB","article_publisher":"https:\/\/facebook.com\/pingcap2015","article_published_time":"2021-12-09T10:09:56+00:00","article_modified_time":"2024-08-20T13:46:24+00:00","og_image":[{"width":1501,"height":501,"url":"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png","type":"image\/png"}],"author":"TiDB Team","twitter_card":"summary_large_image","twitter_creator":"@PingCAP","twitter_site":"@PingCAP","twitter_misc":{"Written by":"TiDB Team","Est. 
reading time":"16\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#article","isPartOf":{"@id":"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/"},"author":{"name":"TiDB Team","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/b17c1fde961eebd318de8729d595df74"},"headline":"Implementing Chaos Engineering in K8s: Chaos Mesh Principle Analysis and Control Plane Development","datePublished":"2021-12-09T10:09:56+00:00","dateModified":"2024-08-20T13:46:24+00:00","mainEntityOfPage":{"@id":"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/"},"wordCount":1807,"commentCount":1,"publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"image":{"@id":"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png","keywords":["Chaos Engineering"],"articleSection":["Engineering"],"inLanguage":"ko-KR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/","url":"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/","name":"Implementing Chaos Engineering in 
K8s","isPartOf":{"@id":"https:\/\/www.pingcap.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#primaryimage"},"image":{"@id":"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#primaryimage"},"thumbnailUrl":"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png","datePublished":"2021-12-09T10:09:56+00:00","dateModified":"2024-08-20T13:46:24+00:00","description":"In this post, we will explore the practice of Chaos Engineering in K8s clusters, and discuss important Chaos Mesh features through the code.","breadcrumb":{"@id":"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/"]}]},{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#primaryimage","url":"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png","width":1501,"height":501},{"@type":"BreadcrumbList","@id":"https:\/\/www.pingcap.com\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pingca
p.com\/"},{"@type":"ListItem","position":2,"name":"Implementing Chaos Engineering in K8s: Chaos Mesh Principle Analysis and Control Plane Development"}]},{"@type":"WebSite","@id":"https:\/\/www.pingcap.com\/#website","url":"https:\/\/www.pingcap.com\/","name":"\ud2f0DB","description":"TiDB | SQL at Scale","publisher":{"@id":"https:\/\/www.pingcap.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pingcap.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/www.pingcap.com\/#organization","name":"PingCAP","url":"https:\/\/www.pingcap.com\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/","url":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","contentUrl":"https:\/\/static.pingcap.com\/files\/2021\/11\/pingcap-logo.png","width":811,"height":232,"caption":"PingCAP"},"image":{"@id":"https:\/\/www.pingcap.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/facebook.com\/pingcap2015","https:\/\/x.com\/PingCAP","https:\/\/linkedin.com\/company\/pingcap","https:\/\/youtube.com\/channel\/UCuq4puT32DzHKT5rU1IZpIA"]},{"@type":"Person","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/b17c1fde961eebd318de8729d595df74","name":"TiDB Team","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/www.pingcap.com\/#\/schema\/person\/image\/","url":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","contentUrl":"https:\/\/static.pingcap.com\/files\/2022\/10\/17234942\/avatar.jpg","caption":"TiDB Team"},"url":"https:\/\/www.pingcap.com\/ko\/blog\/author\/pingcap\/"}]}},"grav_blocks":false,"card_markup":"<a class=\"card-resource bg-white\" 
href=\"https:\/\/www.pingcap.com\/ko\/blog\/implementing-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\/\"><div class=\"card-resource__image-container\"><img class=\"card-resource__image\" alt=\"implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development\" src=\"https:\/\/static.pingcap.com\/files\/2021\/12\/implement-chaos-engineering-in-k8s-chaos-mesh-principle-analysis-and-control-plane-development.png\" loading=\"lazy\" width=1501 height=501 \/><\/div><div class=\"card-resource__content-container\"><div class=\"card-resource__content-head\"><div class=\"card-resource__category\">Engineering<\/div><\/div><h5 class=\"card-resource__title\">Implementing Chaos Engineering in K8s: Chaos Mesh Principle Analysis and Control Plane Development<\/h5><\/div><\/a>","_links":{"self":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/3455","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/comments?post=3455"}],"version-history":[{"count":8,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/3455\/revisions"}],"predecessor-version":[{"id":18955,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/posts\/3455\/revisions\/18955"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media\/3456"}],"wp:attachment":[{"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/media?parent=3455"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/categories?post=3455"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pingcap.com\/ko\/wp-json\/wp\/v2\/t
ags?post=3455"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}