Add edge generic api #106

Merged: 4 commits, Apr 24, 2020
3 changes: 3 additions & 0 deletions README.md
@@ -158,6 +158,9 @@ If you would like to use **Arangopipe** with your pipelines, you would need to d
4. `pip install PyYAML==5.1.1`
5. `pip install pandas`

## Connecting to ArangoDB using Arangopipe
See [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/arangoml/arangopipe/blob/master/arangopipe_managed_service.ipynb) for details on connecting to an ArangoDB instance for use with your **Arangopipe** installation. To save the connection information from one session so it can be reused in another, see [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/arangoml/arangopipe/blob/master/examples/Arangopipe_Feature_Examples.ipynb) for an example.
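
For context, a minimal connection sketch. The module path for `ArangoPipeConfig` is assumed from the repository layout shown elsewhere in this diff; the host, credentials, and service names are the local-developer defaults used by the test suite and should be replaced with your own deployment details:

```python
# Minimal sketch: connect an Arangopipe installation to an ArangoDB instance.
# Host, credentials, and service names are placeholder defaults; adapt them.
# The ArangoPipeConfig import path is assumed from the repository layout.
from arangopipe.arangopipe_storage.arangopipe_api import ArangoPipe
from arangopipe.arangopipe_storage.arangopipe_admin_api import ArangoPipeAdmin
from arangopipe.arangopipe_storage.arangopipe_config import ArangoPipeConfig
from arangopipe.arangopipe_storage.managed_service_conn_parameters import ManagedServiceConnParam

mscp = ManagedServiceConnParam()
conn_params = {mscp.DB_SERVICE_HOST: "localhost",
               mscp.DB_ROOT_USER: "root",
               mscp.DB_ROOT_USER_PASSWORD: "open sesame",
               mscp.DB_SERVICE_END_POINT: "apmdb",
               mscp.DB_SERVICE_NAME: "createDB",
               mscp.DB_SERVICE_PORT: 8529,
               mscp.DB_CONN_PROTOCOL: "http"}

conn_config = ArangoPipeConfig().create_connection_config(conn_params)

# The admin object provisions the managed graph; ArangoPipe is the day-to-day API.
admin = ArangoPipeAdmin(reuse_connection=False, config=conn_config)
ap = ArangoPipe(config=admin.get_config())
```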




5 changes: 3 additions & 2 deletions arangopipe/Dockerfile_Torch_FE
@@ -12,9 +12,10 @@ FROM continuumio/miniconda3
MAINTAINER Joerg Schad <info@arangodb.com>
ENV GIT_PYTHON_REFRESH=quiet
RUN apt-get update && apt-get install -y curl
RUN pip install mlflow hyperopt sklearn2 jsonpickle python-arango jupyter matplotlib PyYAML==5.1.1 arangopipe==0.0.6.1
RUN pip install mlflow hyperopt sklearn2 jsonpickle python-arango jupyter matplotlib PyYAML==5.1.1 arangopipe==0.0.6.9.3
RUN mkdir -p /workspace
RUN conda install pytorch -c pytorch
#RUN conda install pytorch
RUN pip install torch==1.2.0 torchtext==0.4

WORKDIR /
COPY --from=0 / .
2 changes: 1 addition & 1 deletion arangopipe/arangopipe/__init__.py
@@ -1,4 +1,4 @@
# __init__.py

# Version of the arangopipe package
__version__ = "0.0.6.9.3"
__version__ = "0.0.6.9.4"
34 changes: 30 additions & 4 deletions arangopipe/arangopipe/arangopipe_storage/arangopipe_admin_api.py
@@ -393,8 +393,8 @@ def remove_vertex_from_arangopipe(self, vertex_to_remove, purge=True):

return

def add_edge_definition_to_arangopipe(self, edge_name, from_vertex_name,
to_vertex_name):
def add_edge_definition_to_arangopipe(self, edge_col_name, edge_name,
from_vertex_name, to_vertex_name):
rf = self.cfg['arangodb'][self.mscp.DB_REPLICATION_FACTOR]

if not self.db.has_graph(self.cfg['mlgraph']['graphname']):
@@ -415,16 +415,42 @@ def add_edge_definition_to_arangopipe(self, edge_name,

else:
if not self.emlg.has_edge_definition(edge_name):
self.db.create_collection(edge_name, edge = True,\
if not self.emlg.has_edge_collection(edge_col_name):
self.db.create_collection(edge_col_name, edge = True,\
replication_factor = rf)
self.emlg.create_edge_definition(edge_collection = edge_name,\

self.emlg.create_edge_definition(edge_collection = edge_col_name,\
from_vertex_collections=[from_vertex_name],\
to_vertex_collections=[to_vertex_name] )
else:
logger.error("Edge, " + edge_name + " already exists!")

return

def add_edges_to_arangopipe(self, edge_col_name, from_vertex_list,
to_vertex_list):
rf = self.cfg['arangodb'][self.mscp.DB_REPLICATION_FACTOR]

if not self.db.has_graph(self.cfg['mlgraph']['graphname']):
self.emlg = self.db.create_graph(self.cfg['mlgraph']['graphname'])
else:
self.emlg = self.db.graph(self.cfg['mlgraph']['graphname'])

#Check if all data needed to create an edge exists, if so, create it

if not self.emlg.has_edge_collection(edge_col_name):
msg = "Edge collection %s did not exist, creating it!" % (
edge_col_name)
logger.info(msg)
self.db.create_collection(edge_col_name, edge = True,\
replication_factor = rf)

ed = self.emlg.create_edge_definition(edge_collection = edge_col_name,\
from_vertex_collections= from_vertex_list,\
to_vertex_collections= to_vertex_list )

return

def remove_edge_definition_from_arangopipe(self, edge_name, purge=True):

if not self.db.has_graph(self.cfg['mlgraph']['graphname']):
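
For reviewers, a usage sketch of the revised edge API, assuming `admin` and `ap` objects created as in the connection example above; the collection and vertex names are illustrative:

```python
# Sketch of the generic edge API introduced in this change; names are illustrative.
# Create the two vertex collections the edge definition will connect.
admin.add_vertex_to_arangopipe("test_vertex_s")
admin.add_vertex_to_arangopipe("test_vertex_d")

# New signature: the edge collection name now precedes the edge definition name.
admin.add_edge_definition_to_arangopipe("test_edge_col", "test_edge",
                                        "test_vertex_s", "test_vertex_d")

# New helper: define an edge collection over lists of source and destination
# vertex collections, creating the edge collection if it does not exist yet.
admin.add_edges_to_arangopipe("bulk_edge_col",
                              ["test_vertex_s"], ["test_vertex_d"])

# Link two documents through the new edge collection.
sample_doc = {"name": "sample doc"}
v1 = ap.insert_into_vertex_type("test_vertex_s", sample_doc)
v2 = ap.insert_into_vertex_type("test_vertex_d", sample_doc)
edge_info = ap.insert_into_edge_type("test_edge_col", v1, v2)
```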
17 changes: 17 additions & 0 deletions arangopipe/arangopipe/arangopipe_storage/arangopipe_api.py
@@ -138,6 +138,23 @@ def lookup_entity(self, asset_name, asset_type):

return asset_info

def find_entity(self, attrib_name, attrib_value, asset_type):
aql = 'FOR doc IN %s FILTER doc.%s == @value RETURN doc' % (
asset_type, attrib_name)
# Execute the query
cursor = self.db.aql.execute(aql, bind_vars={'value': attrib_value})
asset_keys = [doc for doc in cursor]

asset_info = None
if len(asset_keys) == 0:
msg = "Asset %s with %s = %s was not found!" % (
asset_type, attrib_name, attrib_value)
logger.info(msg)
else:
asset_info = asset_keys

return asset_info

def lookup_dataset(self, dataset_name):
""" Return a dataset identifier given a name. This can be used to get the dataset id that is used to log run information associated with execution of the pipeline."""

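
A short sketch of the new `find_entity` lookup, again assuming an `ap` (`ArangoPipe`) instance as above; the collection name `datasets` and the attribute are illustrative:

```python
# find_entity runs an AQL filter over the given collection and returns the
# matching documents, or None when nothing matches. Names here are illustrative.
matches = ap.find_entity("name", "wine dataset", "datasets")
if matches is None:
    print("no matching asset found")
else:
    for doc in matches:
        print(doc["_key"], doc.get("name"))
```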
3 changes: 2 additions & 1 deletion arangopipe/arangopipe_frontend/app/package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion arangopipe/makefile
@@ -17,7 +17,7 @@ python_arangopipe:$(SRC)
upload_test_pypi:
twine upload --repository-url https://test.pypi.org/legacy/ -u rajiv.sambasivan -p $(TEST_PYPI_PASSWORD) dist/*
docker_APSI_build:$(DOCKER_SI_FILE)
docker build -t $(DOCKER_SI_IMG_NAME) -f $(DOCKER_SI_FILE) .
docker build -t $(DOCKER_SI_IMG_NAME) -f $(DOCKER_SI_FILE) .
docker_publish_SI_latest:
@echo 'starting docker SI build...'
docker login --username arangopipe --password $(DOCKER_PASSWORD)
2 changes: 1 addition & 1 deletion arangopipe/setup.py
@@ -11,7 +11,7 @@
# This call to setup() does all the work
setup(
name="arangopipe",
version="0.0.6.9.3",
version="0.0.6.9.4",
description="package for machine learning meta-data management and analysis",
long_description=README,
long_description_content_type="text/markdown",
2 changes: 1 addition & 1 deletion arangopipe/startup_commands.sh
@@ -6,7 +6,7 @@ echo "Waiting for arangod"
sleep 5
done
echo "arangod is up!"
foxx install /apmdb /aisis-foxx/aisis-foxx.zip -u root -p /aisis-foxx/passwd.txt
foxx install /createDB /aisis-foxx/aisis-foxx.zip -u root -p /aisis-foxx/passwd.txt
export PYTHONPATH=$PYTHONPATH:/workspace/experiments/examples/test_data_generator
python -c "from generate_model_data import generate_runs; generate_runs()"
npm start
10 changes: 5 additions & 5 deletions arangopipe/tests/CItests/arangopipe_testcases.py
@@ -462,20 +462,20 @@ def test_dataset_shift_negative(self):
def add_edge_to_arangopipe(self):
self.admin.add_vertex_to_arangopipe('test_vertex_s')
self.admin.add_vertex_to_arangopipe('test_vertex_d')
self.admin.add_edge_definition_to_arangopipe('test_edge',
self.admin.add_edge_definition_to_arangopipe('test_edge_col', 'test_edge',
'test_vertex_s', 'test_vertex_d')
return

def test_arangopipe_edge_add(self):
self.add_edge_to_arangopipe()
self.assertTrue(self.admin.has_edge('test_edge'))
self.assertTrue(self.admin.has_edge('test_edge_col'))

return

def remove_edge_from_arangopipe(self):
self.admin.add_vertex_to_arangopipe('test_vertex_s1')
self.admin.add_vertex_to_arangopipe('test_vertex_d1')
self.admin.add_edge_definition_to_arangopipe('test_edge_1',
self.admin.add_edge_definition_to_arangopipe('test_edge_col', 'test_edge_1',
'test_vertex_s1', 'test_vertex_d1')
self.admin.remove_edge_definition_from_arangopipe(
'test_edge_1', purge=True)
@@ -508,9 +508,9 @@ def add_edge_link(self):
sd = {'name': "sample doc"}
v1 = self.ap.insert_into_vertex_type('test_vertex_s3', sd)
v2 = self.ap.insert_into_vertex_type('test_vertex_s4', sd)
self.admin.add_edge_definition_to_arangopipe('test_edge',
self.admin.add_edge_definition_to_arangopipe('test_edge_col', 'test_edge',
'test_vertex_s3', 'test_vertex_s4')
ei = self.ap.insert_into_edge_type('test_edge', v1, v2)
ei = self.ap.insert_into_edge_type('test_edge_col', v1, v2)

return ei

47 changes: 32 additions & 15 deletions arangopipe/tests/container_tests/torch_arangopipe_testcases.py
@@ -16,26 +16,48 @@
import sys, traceback
from ch_torch_linear_regression_driver import run_driver
from arangopipe.arangopipe_storage.managed_service_conn_parameters import ManagedServiceConnParam
import yaml


class TestArangopipe(unittest.TestCase):


def __init__(self, *args, **kwargs):
super(TestArangopipe, self).__init__(*args, **kwargs)
self.test_cfg = self.get_test_config()
self.mscp = ManagedServiceConnParam()

return

def setUp(self):
conn_config = ArangoPipeConfig()
self.mscp = ManagedServiceConnParam()
conn_params = { self.mscp.DB_SERVICE_HOST : "localhost", \
self.mscp.DB_ROOT_USER : "root",\
self.mscp.DB_ROOT_USER_PASSWORD : "open sesame",\
self.mscp.DB_SERVICE_END_POINT : "apmdb",\
self.mscp.DB_SERVICE_NAME : "createDB",\
self.mscp.DB_SERVICE_PORT : 8529,\
self.mscp.DB_CONN_PROTOCOL : 'http'}
conn_params = { self.mscp.DB_SERVICE_HOST : self.test_cfg['arangodb'][self.mscp.DB_SERVICE_HOST], \
#self.mscp.DB_ROOT_USER : self.test_cfg['arangodb'][self.mscp.DB_ROOT_USER],\
#self.mscp.DB_ROOT_USER_PASSWORD : self.test_cfg['arangodb'][self.mscp.DB_ROOT_USER_PASSWORD],\
self.mscp.DB_SERVICE_END_POINT : self.test_cfg['arangodb'][self.mscp.DB_SERVICE_END_POINT],\
self.mscp.DB_SERVICE_NAME : self.test_cfg['arangodb'][self.mscp.DB_SERVICE_NAME],\
self.mscp.DB_SERVICE_PORT : self.test_cfg['arangodb'][self.mscp.DB_SERVICE_PORT],\
self.mscp.DB_CONN_PROTOCOL : self.test_cfg['arangodb'][self.mscp.DB_CONN_PROTOCOL]}


conn_config = conn_config.create_connection_config(conn_params)
self.admin = ArangoPipeAdmin(reuse_connection = False, config = conn_config)
the_config = self.admin.get_config()
self.ap = ArangoPipe(config = the_config)
self.provision_project()

return


def get_test_config(self):
file_name = os.path.join(os.path.dirname(__file__),
"../test_config/test_datagen_config.yaml")
with open(file_name, "r") as file_descriptor:
test_cfg = yaml.load(file_descriptor, Loader=yaml.FullLoader)

return test_cfg




def provision_project(self):
err_raised = False
@@ -441,12 +463,7 @@ def test_arangopipe_edge_link_add(self):
return


def tearDown(self):
#pass
self.admin.delete_arangomldb()
self.ap = None
self.admin = None
return


if __name__ == '__main__':
unittest.main()
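
The revised `setUp` reads its connection parameters from `../test_config/test_datagen_config.yaml` rather than hard-coding them. A sketch of producing such a file without guessing the literal YAML key strings (they are whatever the `ManagedServiceConnParam` constants evaluate to); the values are the local-developer defaults from the previous version of the test:

```python
# Sketch: generate the YAML consumed by get_test_config() above.
# Keys come from ManagedServiceConnParam so they match the library's constants;
# the values are illustrative local defaults.
import yaml
from arangopipe.arangopipe_storage.managed_service_conn_parameters import ManagedServiceConnParam

mscp = ManagedServiceConnParam()
cfg = {"arangodb": {mscp.DB_SERVICE_HOST: "localhost",
                    mscp.DB_SERVICE_END_POINT: "apmdb",
                    mscp.DB_SERVICE_NAME: "createDB",
                    mscp.DB_SERVICE_PORT: 8529,
                    mscp.DB_CONN_PROTOCOL: "http"}}

with open("test_datagen_config.yaml", "w") as fd:
    yaml.dump(cfg, fd)
```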
Binary file added arangopipe/tests/pytorch/.data/ag_news_csv.tar.gz
Binary file not shown.
4 changes: 4 additions & 0 deletions arangopipe/tests/pytorch/.data/ag_news_csv/classes.txt
@@ -0,0 +1,4 @@
World
Sports
Business
Sci/Tech
19 changes: 19 additions & 0 deletions arangopipe/tests/pytorch/.data/ag_news_csv/readme.txt
@@ -0,0 +1,19 @@
AG's News Topic Classification Dataset

Version 3, Updated 09/09/2015


ORIGIN

AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity. For more information, please refer to the link http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html .

The AG's news topic classification dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).


DESCRIPTION

The AG's news topic classification dataset is constructed by choosing 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. The total number of training samples is 120,000 and testing 7,600.

The file classes.txt contains a list of classes corresponding to each label.

The files train.csv and test.csv contain all the training samples as comma-separated values. There are 3 columns in them, corresponding to class index (1 to 4), title, and description. The title and description are escaped using double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".
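
A minimal sketch of reading this format with Python's standard `csv` module, which handles the double-quote escaping described above; the file path assumes the working directory contains the extracted dataset:

```python
# Sketch: peek at train.csv, whose rows hold class index, title, and description.
import csv

with open("train.csv", newline="", encoding="utf-8") as fd:
    for class_index, title, description in csv.reader(fd):
        print(class_index, title[:40])
        break  # only inspect the first row
```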