Securing Kafka, Istio-style, with higher-than-native mTLS performance in microservice environments

Kafka is emerging as a dominant messaging platform in microservice environments. Whereas one-to-one, request/response-style communications are handled using direct service-to-service communication, one-to-many/many-to-one Pub/Sub communications are better handled using a messaging platform like Kafka. Today's typical microservice deployments involve producer and consumer services running in highly ephemeral container orchestration environments such as Kubernetes, Mesos, or Docker Swarm. Kafka usually serves as an external data service that the producers and consumers use for rapidly communicating data.

Typical Microservices Environment with sidecars for Kafka security

To effectively secure communications in a microservices environment, it is necessary to secure both producers/consumers and the data flowing through Kafka. To secure producers and consumers, a promising new platform called Istio was just announced (May 2017) in the Kubernetes community. Istio is an open service mesh platform that manages and secures communication between services. Istio's approach is to deploy a sidecar container alongside each workload and use the mTLS protocol as the backbone to encrypt data in transit and exchange identities for API-level access control, all without changing a single line of application code.
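To make the sidecar's client side concrete, here is a minimal sketch using Python's standard `ssl` module (an illustration of the general mTLS pattern, not Banyan's or Istio's actual implementation). It builds a mutual-TLS context that presents a client certificate, verifies the broker's certificate against a private CA, and pins a modern AEAD ciphersuite; the file-path parameters are placeholders.

```python
import ssl

def mtls_client_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    """Mutual-TLS client context: present our identity and verify the peer's."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verify_mode is CERT_REQUIRED by default
    ctx.load_cert_chain(cert_file, key_file)       # our certificate + private key
    ctx.load_verify_locations(ca_file)             # CA that signed the broker's certificate
    # "ECDHE-RSA-AES128-GCM-SHA256" is the OpenSSL name for the TLS 1.2 suite
    # TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 mentioned later in this post.
    ctx.set_ciphers("ECDHE-RSA-AES128-GCM-SHA256")
    return ctx
```

A sidecar would then wrap each outbound connection with something like `ctx.wrap_socket(sock, server_hostname="broker.example.com")` (hostname is a placeholder) before the application's Kafka protocol traffic flows over it.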

At Banyan, we leverage such an Istio-style approach to protect Kafka, providing superior security, higher mTLS performance, and several operational benefits compared to using only the native security features in Kafka. In particular, this approach provides:

  • Deep visibility into, and independent audit of, ephemeral producers/consumers, enabled by decorating the mTLS certificates exchanged with Kafka brokers.
  • Higher-than-native mTLS performance, made possible primarily by a high-speed TLS library, fast ciphersuites, and other optimizations, as opposed to the native Java SSLEngine, which has several known performance limitations.
  • State-of-the-art encryption (e.g., TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) independent of the vagaries of Java versions on the server and of the language/TLS library implementations on the clients.
  • Fast CRUD-based policy matching at the topic level, using an easy-to-use RBAC framework for writing universal, high-level policies.
  • Separation of application development from security considerations, allowing developers to focus solely on application logic and velocity.
  • Fine-grained access controls, such as leased access for producers and consumers that is valid only for specific times and durations.
  • A highly secure PKI enabled by high-frequency certificate rotation, secure key management (e.g., in-memory keys), and secure bootstrapping.
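To illustrate how topic-level CRUD policy matching and leased access can fit together, here is a minimal sketch of a hypothetical policy model (the `Rule` shape, operation names, and role strings are illustrative assumptions, not Banyan's actual policy engine):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatch

# Hypothetical rule: grants one role a set of CRUD operations on a topic
# pattern, optionally bounded by a lease window (start/end timestamps).
@dataclass
class Rule:
    role: str
    topic_pattern: str            # e.g. "invoices.*"
    operations: frozenset         # subset of {"create", "read", "update", "delete"}
    lease_start: datetime = None  # None means "no start bound"
    lease_end: datetime = None    # None means "no expiry"

    def allows(self, role: str, topic: str, op: str, now: datetime) -> bool:
        if role != self.role or op not in self.operations:
            return False
        if not fnmatch(topic, self.topic_pattern):
            return False
        if self.lease_start is not None and now < self.lease_start:
            return False
        if self.lease_end is not None and now > self.lease_end:
            return False
        return True

def authorize(rules, role, topic, op, now=None):
    """Allow the operation if any rule matches at the current time."""
    now = now or datetime.now(timezone.utc)
    return any(r.allows(role, topic, op, now) for r in rules)
```

For example, a consumer role leased read access to `invoices.*` would pass `authorize(rules, "billing-consumer", "invoices.eu", "read")` while "delete" on the same topic, or any access after the lease expires, would be denied.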

Head on over to our article on Medium to learn more.

Our performance results show that for typical microservice deployments, where the number of concurrent connections is high (>= 64) and the record sizes are small (<= 1 KB), our system provides a ~200-300% improvement over the native TLS implementation in both throughput and response time. The results also show that native performance gets worse as record sizes shrink and the number of concurrent connections grows, and both conditions are the norm for fast producer-to-consumer messaging in microservices. Although there is a CPU cost to running a sidecar (~15% at 32 MB/s and 64 connections), the security, performance, and operational benefits of this approach easily outweigh that overhead for most deployments.

Sustained Throughput and Average Response Time Comparisons