Introduction to Apache Zookeeper

What is Apache Zookeeper

Imagine we have two or more application servers, running instances of the same or different applications, that need to share some config data. Each instance can modify the data, and when it does, the others need to be aware of the update.

So how can we achieve this behavior? Let's think.

  1. We could use a common database. Applications read from it and write changes to it, and they poll it at a fixed interval to pick up changes made by others.

  2. We could use sockets. Each application has its own socket client, and a separate socket server hosts the config. Clients read, push changes, and get notified over the socket when data changes.

problems with 1st approach

  • applications won't get updates instantly, only at the next poll

  • frequent polling puts extra load on the database
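
To put rough numbers on the polling approach, here is a small sketch. The instance count and interval are made-up values for illustration, and `polling_cost` is a hypothetical helper, not part of any library.

```python
def polling_cost(instances, interval_s):
    """Return (queries per day, worst-case staleness in seconds)
    when every instance polls the database at a fixed interval."""
    seconds_per_day = 24 * 60 * 60
    queries_per_instance = seconds_per_day // interval_s
    # An update written right after a poll is only seen at the next poll,
    # so the worst-case delay is about one full interval.
    return instances * queries_per_instance, interval_s


# Assumed example: 10 instances, each polling every 5 seconds.
queries, staleness = polling_cost(instances=10, interval_s=5)
print(queries, staleness)  # 172800 5
```

Shortening the interval reduces staleness but raises the query count in proportion, which is the trade-off behind both problems listed above.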

problems with 2nd approach

  • you need to set up a socket server from scratch

  • implement logic on the application level for data reading and changes

  • availability issues, single point of failure

This is where Apache Zookeeper comes in: it addresses the problems of both approaches.

what is Zookeeper?

It is a distributed coordination service for distributed applications, often used as a distributed config manager.

when do we use it?

  • when we need to share config or coordination data between applications

  • update config data

  • watch for changes in data

concepts

  • you can think of its data model as a tree, much like a file system directory structure

  • so just like a tree, there are nodes and child nodes

  • a node can have both data and other child nodes

  • each node in the tree is called a znode

  • zookeeper data is kept in memory for low latency

  • zookeeper is replicated across a set of servers

  • clients maintain a TCP connection with a server to send requests, receive responses and get updates

  • zookeeper stamps each update with a number that reflects the order of all zookeeper transactions

  • every node in Zookeeper is identified by a path (/ being the root path/node)

  • zookeeper has the concept of ephemeral nodes. These znodes exist only as long as the session that created them is active. When the session ends, the znode is automatically deleted

  • zookeeper has watches: a client can watch a znode for changes

  • some simple APIs provided by zookeeper

create : creates a node at a location in the tree

delete : deletes a node

exists : tests if a node exists at a location

get data : reads the data from a node

set data : writes data to a node

get children : retrieves a list of children of a node

sync : waits for data to be propagated
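
To make the concepts and operations above more concrete, here is a toy in-memory model of a znode tree. It is a sketch for illustration only: `ToyZk` and its method names are hypothetical and are not the real Zookeeper client API, there is no networking or replication, and `sync` is not modeled. Sessions are represented by a plain string id, which is an assumption of this sketch.

```python
class ToyZk:
    """Toy in-memory model of a znode tree. Not the real Zookeeper API."""

    def __init__(self):
        self.nodes = {"/": b""}  # path -> data, "/" is the root znode
        self.owners = {}         # ephemeral path -> owning session id
        self.zxid = 0            # counter stamped on every update

    def _parent(self, path):
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=b"", session=None):
        parent = self._parent(path)
        if path in self.nodes:
            raise KeyError(f"node already exists: {path}")
        if parent not in self.nodes:
            raise KeyError(f"parent does not exist: {parent}")
        if parent in self.owners:
            raise KeyError("ephemeral znodes cannot have children")
        self.nodes[path] = data
        if session is not None:  # ephemeral: tied to the creating session
            self.owners[path] = session
        self.zxid += 1

    def exists(self, path):
        return path in self.nodes

    def get_data(self, path):
        return self.nodes[path]

    def set_data(self, path, data):
        if path not in self.nodes:
            raise KeyError(f"no such node: {path}")
        self.nodes[path] = data
        self.zxid += 1

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        names = (p[len(prefix):] for p in self.nodes
                 if p != path and p.startswith(prefix))
        return sorted(n for n in names if "/" not in n)  # direct children only

    def delete(self, path):
        if self.get_children(path):
            raise KeyError(f"node has children: {path}")
        del self.nodes[path]
        self.owners.pop(path, None)
        self.zxid += 1

    def close_session(self, session):
        # Ending a session removes every ephemeral znode it created.
        for path in [p for p, s in self.owners.items() if s == session]:
            self.delete(path)


zk = ToyZk()
zk.create("/app")
zk.create("/app/config", b"count=0")
zk.create("/app/worker-1", session="s1")  # ephemeral znode
print(zk.get_children("/app"))            # ['config', 'worker-1']
zk.set_data("/app/config", b"count=1")
zk.close_session("s1")
print(zk.get_children("/app"))            # ['config']
print(zk.get_data("/app/config"))         # b'count=1'
```

The ephemeral `/app/worker-1` znode disappears as soon as its session closes, which is the behavior that makes ephemeral nodes useful for tracking which instances are alive.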

Hands-on

we will be using docker for this

let's create a zookeeper container

   docker run --name my-zk zookeeper:latest

now let's enter the container from a new terminal

   docker exec -it my-zk /bin/bash

we are going to use zkCli.sh to interact with the zookeeper server as a client. To start the client, run the following (the version in the path depends on the image, so adjust it if yours differs)

/apache-zookeeper-3.8.0-bin/bin/zkCli.sh

create a node on zkCli terminal

create /test

set data on node

set /test "count=0"

or, if the node does not exist yet, we could create it and set its data with a single command

create /test "count=0"

now to retrieve the data

get /test 
/* count=0 */

watch for data change

now let's open up a new terminal, enter the container, and start zkCli.sh again as before

docker exec -it my-zk /bin/bash

watch for changes in /test node

addWatch /test

now, on the previous terminal, set new data for the /test node. The second terminal should print a notification for the change

set /test count=1
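
The watch mechanism can be pictured with a small toy model. This is only a sketch of the idea: `WatchedStore` and its methods are hypothetical names and not the Zookeeper client API. It assumes the watch stays registered after firing and that the notification carries an event type and a path, so the watcher reads the new value itself.

```python
from collections import defaultdict


class WatchedStore:
    """Toy store with data watches. Not the real Zookeeper API."""

    def __init__(self):
        self.data = {}
        self.watchers = defaultdict(list)  # path -> list of callbacks

    def add_watch(self, path, callback):
        self.watchers[path].append(callback)

    def get(self, path):
        return self.data[path]

    def set(self, path, value):
        self.data[path] = value
        # Notify every watcher of this path about the change.
        for callback in self.watchers[path]:
            callback("NodeDataChanged", path)


events = []
store = WatchedStore()
store.set("/test", "count=0")  # no watcher yet, so nothing is reported

# Plays the role of the second terminal: record the event and re-read the data.
store.add_watch("/test", lambda event, path: events.append((event, path, store.get(path))))

# Plays the role of the first terminal: change the data.
store.set("/test", "count=1")
print(events)  # [('NodeDataChanged', '/test', 'count=1')]
```

Unlike the polling approach, the watcher does no work until a change actually happens.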
