From 484e3f84bbb6904529178b7eda01b16ddea39506 Mon Sep 17 00:00:00 2001
From: "rajiv.sambasivan@gmail.com"
Date: Fri, 8 May 2020 17:13:52 +0530
Subject: [PATCH] README fixes.

---
 README.md | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/README.md b/README.md
index bffe15e..c216248 100644
--- a/README.md
+++ b/README.md
@@ -11,35 +11,38 @@ ArangoML Pipeline is a common and extensible Metadata Layer for Machine Learning
 
 **News:**
 
-[ArangoML Pipeline Cloud](https://www.arangodb.com/2020/01/arangoml-pipeline-cloud-manage-machine-learning-metadata/) is offering a no-setup, free-to-try managed service for ArangpML Pipeline. A [ArangoML Pipeline Cloud tutorial](https://colab.research.google.com/github/arangoml/arangopipe/blob/master/examples/Arangopipe_with_TensorFlow_Beginner_Guide.ipynb#) is also available without any installation or Signup.
+[ArangoML Pipeline Cloud](https://www.arangodb.com/2020/01/arangoml-pipeline-cloud-manage-machine-learning-metadata/) is offering a no-setup, free-to-try managed service for ArangpML Pipeline. A [ArangoML Pipeline Cloud tutorial](https://colab.research.google.com/github/arangoml/arangopipe/blob/master/examples/Arangopipe_with_TensorFlow_Beginner_Guide.ipynb#) is also available without any installation or signup.
 
 ## Quick Start
-To get started with no installations of any sort (using ArangoML Pipeline Cloud)
+To get started without any installations (using ArangoML Pipeline Cloud)
 , click : [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/arangoml/arangopipe/blob/master/examples/Arangopipe_Feature_Examples.ipynb)
 
-The [examples folder](https://github.com/arangoml/arangopipe/tree/master/examples) contains examples more example notebooks that illustrate the features of **Arangopipe**. 
+The [examples folder](https://github.com/arangoml/arangopipe/tree/master/examples) contains notebooks that illustrate the features of **Arangopipe**.
+
+## Overview
+When machine learning pipelines are created, for example using [TensorFlow Extended](https://www.tensorflow.org/tfx/guide) or [Kubeflow](https://www.kubeflow.org/),
+the capture (and access to) of metadata across the pipeline is vital. Typically, each component of such an ML pipeline produces or requires metadata, for example:
 
-## Introduction
-When productizing Machine Learning Pipelines (e.g., [TensorFlow Extended](https://www.tensorflow.org/tfx/guide) or [Kubeflow](https://www.kubeflow.org/))
-the capture (and access to) of metadata across the pipeline is vital. Typically, each of the components of such ML pipeline produces/requires Metadata, for example:
 * Data storage: size, location, creation date, checksum, ...
 * Feature Store (processed dataset): transformation, version, base datasets ...
 * Model Training: training/validation performance, training duration, ...
 * Model Serving: model linage, serving performance, ...
 
-Instead of each component storing its own metadata, a common Metadata Layer allows for queries across the entire pipeline and more efficient management.
-[**ArangoDB**](https://www.arangodb.com) being a multi model database supporting both efficient document and graph data models within a single database engine is a great fit for such kind of common metadata layer for the following reasons:
+Instead of each component storing its metadata, a common metadata layer simplifies data management and permits querying the entire pipeline. 
+[**ArangoDB**](https://www.arangodb.com), being a multi model database, supporting both efficient document and graph data models within a single database engine, is a great fit for such a metadata layer, for the following reasons:
+
 * The metadata produced by each component is typically unstructured (e.g., TensorFlow's training metadata is different from PyTorch's metadata) and hence a great fit for document databases
 * The relationship between the different entities (i.e., metadata) can be neatly expressed as graphs (e.g., this model has been trained by *run_34* on *dataset_y*)
-* Querying the metadata can be easily expressed as a graph traversal (e.g., all models which have been derived from *dataset_y*)
+* Metadata queries are easily expressed as graph traversals (e.g., all models which have been derived from *dataset_y*)
 
 ## Use Cases
-ArangoML Pipeline benefits many different scenarios including:
-* Capture of Lineage Information (e.g., Which dataset influences which Model?)
-* Capture of Audit Information (e.g, A given model was training two months ago with the following training/validation performance)
-* Reproducible Model Training
-* Model Serving Policy (e.g., Which model should be deployed in production based on training statistics)
+ArangoML Pipeline can benefit many scenarios, such as:
+
+* Capture of lineage information (e.g., Which dataset influences which model?)
+* Capture of audit information (e.g, A given model was training two months ago with the following training/validation performance)
+* Reproducible model training
+* Model serving policy (e.g., Which model should be deployed in production based on training statistics)
 * Extension of existing ML pipelines through simple python/HTTP API
 
 ## Documentation