ApacheCon North America 2020 Talk 2


by Shekhar Prasad Rajak — Posted on October 1, 2020


ApacheCon @Home 2020

Title: Cluster Management in Apache Ecosystem & Kubernetes


Apache already has powerful cluster and resource managers, so do we really need Kubernetes for deployment when using Apache projects?

Let’s find out which cluster management systems Apache already has, how cluster management works in each of the cases below, when no other cluster manager is needed on top of it, and when we can leverage the power of both the Apache cluster modes and Kubernetes for resource and cluster management.

  • Apache Spark Standalone: A simple cluster manager available as part of the Spark distribution.

  • Apache Mesos: A general-purpose, distributed, OS-level, push-based scheduler and resource manager.

  • Apache Hadoop YARN: A distributed computing framework for monolithic job scheduling and cluster resource management for Hadoop clusters (Apache/CDH/HDP).
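To make the options above concrete, here is a minimal sketch of how the choice of cluster manager shows up in the master URL that Spark is given (for example through `spark-submit --master`). The helper `spark_master_url`, the host names, and the default ports are illustrative assumptions rather than part of the talk; the URL schemes themselves (`spark://`, `mesos://`, `yarn`, `k8s://`) are the ones Spark documents.

```python
# Illustrative helper (not part of Spark): builds the master URL string that
# selects a cluster manager. Host names and default ports are assumptions.
DEFAULT_PORTS = {
    "standalone": 7077,   # usual Spark standalone master port
    "mesos": 5050,        # usual Mesos master port
    "kubernetes": 443,    # assumed HTTPS port of the Kubernetes API server
}


def spark_master_url(manager, host=None, port=None):
    """Return the Spark master URL for the given cluster manager."""
    if manager == "yarn":
        # YARN is located through the Hadoop configuration, not through a host.
        return "yarn"
    if manager not in DEFAULT_PORTS:
        raise ValueError(f"unknown cluster manager: {manager}")
    if host is None:
        raise ValueError(f"{manager} needs a host")
    port = port if port is not None else DEFAULT_PORTS[manager]
    if manager == "standalone":
        return f"spark://{host}:{port}"
    if manager == "mesos":
        return f"mesos://{host}:{port}"
    # Kubernetes: Spark talks to the API server over HTTPS.
    return f"k8s://https://{host}:{port}"


if __name__ == "__main__":
    print(spark_master_url("standalone", "spark-master.example"))
    print(spark_master_url("mesos", "mesos-master.example"))
    print(spark_master_url("yarn"))
    print(spark_master_url("kubernetes", "k8s-api.example", 6443))
```

The application code stays the same in all four cases; only the master URL (and the related deployment configuration) changes, which is what makes a side-by-side comparison of these managers practical.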

We will see some benchmarks and features that Kubernetes provides but that are missing (or not yet mature) in the Apache ecosystem, and how using one or both can still improve performance.

We will dive deep into the fundamentals of Kubernetes and of the Apache distribution, resource, and cluster management systems, including job scheduling, to get a clear idea of both ecosystems and why each is best suited to particular cases such as big data, machine learning, load balancing, and so on.

Many Kubernetes features will be compared side by side with the Apache ecosystem: applications are containerised in Pods, a Service is used as the load balancer, high availability comes from distributing Pods across worker nodes, and there are local storage, persistent volumes, networking, and more. On the Mesos side, for example, an Application Group models dependencies as a tree of groups whose components are started in dependency order, Mesos-DNS works as a basic load balancer, applications are distributed among slave nodes, scheduling uses a two-level mechanism, and isolation relies on modern kernel features such as “cgroups” in Linux and “zones” in Solaris.
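As a rough illustration of the two-level scheduling idea mentioned for Mesos, the toy sketch below separates the two decisions: the master decides which free resources to offer, and the framework decides which offers to use for its own tasks. All names here (`Offer`, `Task`, `master_make_offers`, `framework_schedule`) and the agent data are hypothetical; this is a simplified model, not the Mesos API.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


def master_make_offers(free_resources):
    """Level 1: the master turns free agent resources into offers."""
    return [
        Offer(agent, cpus, mem_mb)
        for agent, (cpus, mem_mb) in free_resources.items()
        if cpus > 0 and mem_mb > 0
    ]


def framework_schedule(offers, pending):
    """Level 2: the framework places its own tasks on the offers it received."""
    launched = []
    remaining = list(pending)
    for offer in offers:
        cpus, mem_mb = offer.cpus, offer.mem_mb
        for task in list(remaining):  # iterate over a copy while removing
            if task.cpus <= cpus and task.mem_mb <= mem_mb:
                launched.append((task.name, offer.agent))
                cpus -= task.cpus
                mem_mb -= task.mem_mb
                remaining.remove(task)
    return launched, remaining


if __name__ == "__main__":
    free = {"agent-1": (2.0, 4096), "agent-2": (1.0, 1024)}
    tasks = [Task("etl", 2.0, 2048), Task("web", 1.0, 512), Task("ml", 4.0, 8192)]
    launched, waiting = framework_schedule(master_make_offers(free), tasks)
    print(launched)
    print([t.name for t in waiting])
```

The point of the split is that the master never needs to understand a framework's placement logic; it only tracks and offers resources, while each framework applies its own policy to the offers.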

Along with the comparison and benchmarks, the talk will provide a practical guide to using Apache projects with Kubernetes. The audience will understand the software system design and the generic problems of processing requests through cluster and resource managers, and why a modular, microservice-based, loosely coupled software design matters, so that an application can easily run under container-level or OS-level cluster management systems.

This talk is clearly not meant to show who is winning, but how you can win in your own time, even in a dark situation.
