Propagating Changes in One Directory to Another

Ahmed Abdul Hamid edited this page Sep 3, 2019 · 15 revisions

Overview

In this use case, we create a Brooklin datastream that propagates changes made in one file system directory to another.

Prerequisites

Brooklin requires Java Development Kit 8+.
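As a quick sanity check before continuing, the sketch below extracts the major version from a Java version string so it can be compared against 8. The helper name `java_major` is hypothetical and not part of Brooklin. It assumes the usual `1.x` (Java 8 and earlier) or `x.y.z` (Java 9+) version formats.

```shell
# Hypothetical helper: print the major Java version for a version string
# such as "1.8.0_292" (Java 8) or "11.0.2" (Java 11).
java_major() {
  version=$1
  case $version in
    1.*) version=${version#1.} ;;   # legacy "1.x" scheme: drop the leading "1."
  esac
  echo "${version%%[!0-9]*}"        # keep only the leading digits
}

# Usage (requires java on PATH; it prints its version to stderr):
#   v=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   [ "$(java_major "$v")" -ge 8 ] && echo "Java version OK"
```

Note that `java -version` only reports the runtime found on `PATH`. It does not confirm that a full JDK is installed.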

Instructions

1. Set up ZooKeeper

  1. Download the latest stable release of ZooKeeper.

  2. Untar the ZooKeeper tarball

    tar -xzf zookeeper-3.4.14.tar.gz
    cd zookeeper-3.4.14 
  3. Start a ZooKeeper server

    bin/zkServer.sh start conf/zoo_sample.cfg &

2. Set up Brooklin

  1. Download the latest tarball (tgz) from Brooklin releases.
  2. Untar the Brooklin tarball
    tar -xzf brooklin-1.0.0.tgz
    cd brooklin-1.0.0 
  3. Run Brooklin
    bin/brooklin-server-start.sh config/dir-sync-example.properties >/dev/null 2>&1 &
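Because the start command above discards Brooklin's output, there is no immediate feedback on whether the server came up. One way to check is to wait for its port to accept connections. This is a minimal sketch, assuming bash (it relies on bash's `/dev/tcp` redirection), that Brooklin listens on port 32311 (the port used in the REST client URLs on this page), and that ZooKeeper listens on 2181 (the `clientPort` in `zoo_sample.cfg`). The function name `wait_for_port` is hypothetical.

```shell
# Hypothetical helper (bash only): retry a TCP connection to host:port once
# per second, up to the given number of attempts (default 10).
wait_for_port() {
  host=$1; port=$2; attempts=${3:-10}
  while [ "$attempts" -gt 0 ]; do
    # The subshell opens and immediately closes a connection; it fails if
    # nothing is listening on the port.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_port localhost 2181  30 && echo "ZooKeeper is accepting connections"
#   wait_for_port localhost 32311 30 && echo "Brooklin is accepting connections"
```

An open port only shows that a process is listening. It does not prove that the service is healthy.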

3. Create a datastream

  1. Create a datastream to sync changes made in a source directory to a destination directory.

    # Replace <src-dir> and <dest-dir> below with file paths of source and destination 
    # directories, respectively
    bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n first-dir-datastream -s <src-dir> -d <dest-dir> -dp 1 -c dirC -p 1 -t dirTP -m '{"owner":"test-user"}'

    Here are the options we used to create this datastream:

    -o CREATE                      The operation is datastream creation
    -u http://localhost:32311/     Datastream Management Service URI
    -n first-dir-datastream        Datastream name
    -s <src-dir>                   Datastream source (source directory path in this case)
    -d <dest-dir>                  Datastream destination (destination directory path in this case)
    -c dirC                        Connector name ("dirC" is the name we use to refer to DirectoryConnector in config)
    -t dirTP                       Transport provider name ("dirTP" is the name we use to refer to DirectoryTransportProvider in config)
    -p 1                           Number of source partitions
    -dp 1                          Number of destination partitions
    -m '{"owner":"test-user"}'     Datastream metadata (specifying datastream owner is mandatory)
    
  2. Verify the datastream creation by requesting all datastream metadata from Brooklin using the command line REST client.

    bin/brooklin-rest-client.sh -o READALL -u http://localhost:32311/
  3. You can also view some more information about the different Datastreams and DatastreamTasks by querying the health monitoring REST endpoint of the Datastream Management Service.

    curl -s "http://localhost:32311/health"
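A mistyped path in the CREATE command is easy to miss, so it can help to validate both directories before creating the datastream. The sketch below is a convenience check only. The function name `check_sync_dirs` is hypothetical, and the rules it enforces (both paths exist, are directories, and differ) are assumptions rather than documented Brooklin requirements.

```shell
# Hypothetical pre-flight check: both arguments must be existing, distinct
# directories. This is a convenience, not a documented Brooklin requirement.
check_sync_dirs() {
  src=$1; dest=$2
  [ -d "$src" ]  || { echo "source is not a directory: $src" >&2; return 1; }
  [ -d "$dest" ] || { echo "destination is not a directory: $dest" >&2; return 1; }
  # Compare physical paths so that symlinks to the same place are caught.
  [ "$(cd "$src" && pwd -P)" != "$(cd "$dest" && pwd -P)" ] || {
    echo "source and destination are the same directory" >&2; return 1; }
}

# Usage, from the Brooklin directory (same options as the CREATE command above):
#   check_sync_dirs "$SRC" "$DEST" &&
#     bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ \
#       -n first-dir-datastream -s "$SRC" -d "$DEST" \
#       -dp 1 -c dirC -p 1 -t dirTP -m '{"owner":"test-user"}'
```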

4. Try it out

  1. Add, modify, or delete files or directories in the source directory you specified when you created the datastream in step 3.

     Note: files and directories that were already present in the source directory before the datastream was created are not copied to the destination directory. Only changes made after the datastream is created are reflected in the destination.

  2. Observe the destination directory you specified when you created the datastream in step 3.

  3. If you wish to delete the datastream you created, you can do so by running:

    bin/brooklin-rest-client.sh -o DELETE -u http://localhost:32311/ -n first-dir-datastream
  4. Feel free to explore the various operations you can perform on datastreams using the REST client utility.

    bin/brooklin-rest-client.sh --help
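Since files that existed before the datastream was created are not copied, a plain recursive comparison of the two directories can be misleading. The sketch below lists only files modified after a marker file's timestamp that have no counterpart in the destination. The function name `unsynced_since` and the marker-file approach are hypothetical, and it assumes that synced files keep the same relative path under the destination directory.

```shell
# Hypothetical helper: list files under src that were modified after the
# marker file and are missing from dest.
# ASSUMPTION: synced files keep the same relative path under dest.
unsynced_since() {
  src=$1; dest=$2; marker=$3
  find "$src" -type f -newer "$marker" | while IFS= read -r f; do
    rel=${f#"$src"/}                  # path relative to the source directory
    [ -e "$dest/$rel" ] || echo "$rel"
  done
}

# Usage:
#   touch /tmp/datastream-created     # run this just before the CREATE command
#   ...change files in the source directory...
#   unsynced_since "$SRC" "$DEST" /tmp/datastream-created
```

Empty output suggests that every file changed since the marker has a counterpart in the destination. The check does not compare file contents or detect deletions.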

5. Stop Brooklin and ZooKeeper

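When you are done, run the following commands to stop Brooklin and ZooKeeper. Each command runs in a subshell, so your working directory does not change.

    # Run from the directory that contains brooklin-1.0.0
    (cd brooklin-1.0.0 && bin/brooklin-server-stop.sh)

    # Run from the directory that contains zookeeper-3.4.14
    (cd zookeeper-3.4.14 && bin/zkServer.sh stop conf/zoo_sample.cfg)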