Gremlin brings Chaos Engineering as a Service to Kubernetes


The practice of Chaos Engineering developed at Amazon and Netflix a decade ago to help those web-scale companies test their complex systems for worst-case scenarios before they happened. Gremlin was started by a former employee of both companies to make it easier to perform this type of testing without a team of Site Reliability Engineers (SREs). Today, the company announced that it now supports chaos engineering-style testing on Kubernetes clusters.

The company made the announcement at the beginning of KubeCon, the Kubernetes conference taking place in San Diego this week.

Gremlin co-founder and CEO Kolton Andrus says the idea is to be able to test and configure Kubernetes clusters so they will not fail, or at least to reduce the likelihood of failure. He says that to do this it’s critical to run chaos testing in live environments, whether you’re testing Kubernetes clusters or anything else, but that doing so is also a bit dangerous. To mitigate the risk, he says, best practices suggest limiting the experiment to the smallest test possible that gives you the most information.

“We can come in and say I’m going to deal with just these clusters. I want to cause failure here to understand what happens in Kubernetes when these pieces fail. For instance, being able to see what happens when you pause the scheduler. The goal is being able to help people understand this concept of the blast radius, and safely guide them to running an experiment,” Andrus explained.
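To make the blast radius idea concrete, here is a minimal sketch in Python of how an experiment might be capped to a small slice of a system. It is not Gremlin's API or implementation; the function name `pick_blast_radius`, its parameters, and the pod names are all hypothetical and exist only for illustration.

```python
import random


def pick_blast_radius(targets, fraction=0.1, max_targets=1, seed=None):
    """Choose a small random subset of targets for one experiment.

    Hypothetical helper: `fraction` bounds the share of targets touched,
    and `max_targets` is a hard cap regardless of how many targets exist.
    """
    if not targets:
        return []
    rng = random.Random(seed)
    # Never pick more than the cap, and always pick at least one target.
    count = max(1, min(max_targets, int(len(targets) * fraction)))
    return rng.sample(targets, count)


# Illustrative pod names, not taken from any real cluster.
pods = [f"checkout-{i}" for i in range(20)]
chosen = pick_blast_radius(pods, fraction=0.1, max_targets=2, seed=7)
print(chosen)
```

The hard cap is the point of the sketch: even as the list of candidate targets grows, the number of things allowed to fail in a single experiment stays small.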

In addition, Gremlin is helping customers harden their Kubernetes clusters to help prevent failures with a set of best practices. “We clearly have the tooling that people need [to conduct this type of testing], but we’ve also learned through many, many customer interactions and experiments to help them really tune and configure their clusters to be fault tolerant and resilient,” he said.

The Gremlin interface is designed to facilitate this kind of targeted experimentation. You can select the areas where you want to apply a test, and you can see graphically which parts of the system are being tested. If things get out of control, there is a kill switch to stop the tests.

Gremlin Kubernetes testing screen. Screenshot: Gremlin

Gremlin launched in 2016 and is headquartered in San Jose. It offers both a freemium and a paid product. The company has raised almost $27 million, according to Crunchbase data.
