This repository has been archived by the owner on Nov 17, 2021. It is now read-only.

Server To NLP Tools Communication

heckej edited this page Apr 30, 2020 · 19 revisions

About cluster-connector

This Python module is an API for communicating with the Cluster Server. A tool written in Python can use it to support the server by providing certain services. The communication protocol is tailored to the services of the Natural Language Processing tool in this repository, which can detect offensive content, detect nonsense and compare questions. However, the protocol can be adapted to another purpose without much effort.

Getting the package

Installation

To install cluster-connector into your virtual environment:

  1. Open a terminal.
  2. Activate the virtual environment (for a conda environment use conda activate [environment-name]).
  3. Execute the following command:
pip install --index-url https://clusterdocs.azurewebsites.net/pkg/ cluster-connector

Alternatively you can download the wheels or tar.gz files from the ClusterConnector repository and install them manually from the directory that contains the downloaded file:

pip install cluster-connector-x.y.z.tar.gz

or

pip install cluster-connector-x.y.z-py3-none-any.whl

Upgrade

To upgrade to the newest version use:

pip install --index-url https://clusterdocs.azurewebsites.net/pkg/ cluster-connector --upgrade

In case you are sure there is a newer version (because we told you so) and pip won't let you upgrade, first uninstall cluster-connector using:

pip uninstall cluster-connector

Afterwards install it again using the instructions above.

Usage

Use the following import statement to make use of the connector module:

from cluster import connector

The Connector class can be instantiated as follows:

con = connector.Connector()
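A typical life cycle (poll for work, reply, close) might look like the sketch below. It runs against a stand-in class so that it works without a server. The method names has_task(), get_next_item(), reply() and close() are taken from this page, but their signatures, the shape of the task dictionary and the "nonsense" reply key are assumptions; check the API documentation of your installed version before relying on them.

```python
class StandInConnector:
    """Hypothetical stand-in for connector.Connector so the sketch runs offline."""

    def __init__(self, tasks):
        self._tasks = list(tasks)
        self.replies = []
        self.closed = False

    def has_task(self):
        return bool(self._tasks)

    def get_next_item(self):
        return self._tasks.pop(0)

    def reply(self, response):
        self.replies.append(response)

    def close(self):
        self.closed = True


# Assumed task shape: only the sentence and sentence_id keys appear on this page.
con = StandInConnector([{"sentence_id": 7, "sentence": "asdf qwer"}])
try:
    while con.has_task():
        task = con.get_next_item()
        # A real tool would run its NLP model on task["sentence"] here.
        con.reply({"sentence_id": task["sentence_id"], "nonsense": True})
finally:
    # Always close the connector, even if processing a task fails.
    con.close()
```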

Documentation

Documentation of the most recent version of the Python NLP Cluster API for connecting to the Cluster server can be found here. Note that this API is still under development: the implementation will change considerably, but the current interface should remain largely stable. The interface differs somewhat from the proposed interface, though much of it is the same. Older versions of the documentation can be found using the format /docs/version/x.y.z.

Changes

When a new version of cluster-connector is released, some information about the changes will be added here.

  • sentence and sentence_id keys correctly added to nonsense task.
  • When in debugging mode, the exceptions StopAsyncIteration and RuntimeError are no longer printed where they occurred before.
  • The method reply() no longer expects the argument to contain a sentence key as it mistakenly did before.
  • When in debugging mode, the exception StopAsyncIteration is no longer printed where it occurred before.
  • New authorization argument in Connector and WebsocketThread. This argument will become compulsory in a future release.
  • Security issue related to websocket connection temporarily fixed (same problem as in 1.0.1). A decent fix will result in a new major release.
  • Security issue related to websocket connection temporarily fixed. A decent fix will result in a new major release.

This version is a major update involving several incompatible changes related to some attributes and the task/message protocol.

  • Usage of websockets enabled by default.
  • Usage of requests no longer supported. Therefore the attributes use_websocket, prefetch and fetch_in_background have been removed. Also the specifications of some methods that made use of requests (get_next_item() and reply()) have been updated.
  • Support added for general offensiveness test for both questions and answers.
  • Nonsense protocol implemented generally.
  • Usage of websockets added as connection method instead of GET/POST requests. This is officially still under development, so it is disabled by default, but it is greatly encouraged to enable this feature, because future releases will quite probably drop support for GET/POST requests.
  • New class WebsocketThread added. This class is used in the Connector class and should not be used elsewhere in normal circumstances.
  • Problems while parsing tasks from the server have been fixed.
  • A method to test the validity of a task has been added.
  • The method has_item() may throw websocket related exceptions from now on.
  • Logging debugging information is now available. Use the following statements to enable this:
        import logging, sys
        logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
    
  • Bug solved: KeyError occurred because of case sensitivity in dictionary keys.
  • Bug solved: empty responses from server are now handled correctly.

The method has_task() now behaves as expected, i.e. it returns whether any tasks are available instead of always returning True.

What's behind?

Connector

The Connector class creates and starts a new WebsocketThread instance using _init_websocket_thread() which passes the following to the newly created thread:

  • self.websocket_uri: A string containing the uri of the websocket host with which a connection should be made.
  • self._websocket_exceptions: A reference to a queue in which raised exceptions should be saved to be passed on to the caller of this thread.
  • self.add_tasks: An instance method of the calling class that handles received messages.
  • self._reply_queue: A queue in which the messages to be sent can be found.
  • asyncio.get_event_loop(): The loop to be set as the asynchronous event loop of this thread. This is needed to run the asynchronous methods inside this class.
  • self._websocket_connection_timeout: The timeout to be set when connecting to the websocket host.

Among those the references to self._websocket_exceptions, self.add_tasks() and self._reply_queue are the ones that are used for the actual communication between the thread and the calling Connector instance.
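The three communication channels listed above can be illustrated with a minimal sketch that uses only the standard library. WorkerThread and Owner are hypothetical stand-ins for WebsocketThread and Connector; no real websocket is involved, and the task and reply dictionaries are made up for the example.

```python
import queue
import threading


class WorkerThread(threading.Thread):
    """Hypothetical stand-in for WebsocketThread (no real websocket involved)."""

    def __init__(self, exceptions, add_tasks, reply_queue):
        super().__init__(daemon=True)
        self._exceptions = exceptions    # worker -> owner: unhandled exceptions
        self._add_tasks = add_tasks      # worker -> owner: callback for received messages
        self._reply_queue = reply_queue  # owner -> worker: messages to be sent
        self.sent = []                   # stands in for "sent to the server"

    def run(self):
        try:
            # Pretend the server pushed one task.
            self._add_tasks([{"sentence_id": 1, "sentence": "hello"}])
            # Forward one reply that the owner queued.
            self.sent.append(self._reply_queue.get(timeout=1))
        except Exception as exc:
            # Hand the exception to the owner instead of losing it in the thread.
            self._exceptions.put(exc)


class Owner:
    """Hypothetical stand-in for Connector."""

    def __init__(self):
        self.tasks = []
        self._exceptions = queue.Queue()
        self._reply_queue = queue.Queue()
        self._thread = WorkerThread(self._exceptions, self.add_tasks, self._reply_queue)

    def add_tasks(self, tasks):
        self.tasks.extend(tasks)


owner = Owner()
owner._reply_queue.put({"sentence_id": 1, "nonsense": False})
owner._thread.start()
owner._thread.join()
```

Passing a bound method and two queues keeps the thread unaware of the owner's class, which is presumably why the real classes are wired this way.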

When a method is called that involves communication with the server, it should first check whether the websocket thread is still alive and connected, using the private method _checkout_websocket(). If the websocket thread has passed an exception via the _websocket_exceptions queue, that exception is raised. Because of that, every method calling _checkout_websocket() must mention in its documentation that it can raise exceptions. If the websocket thread has exited (i.e. it is no longer alive) for an unknown reason without passing any exception, a new thread is initialized.
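The check described above could be sketched as follows. This is not the real _checkout_websocket() implementation: the Owner class, its _checkout() method and the restarts counter are hypothetical names, and plain functions stand in for the websocket thread.

```python
import queue
import threading


class Owner:
    """Hypothetical owner of a background thread; not the real Connector."""

    def __init__(self, target):
        self._target = target
        self._exceptions = queue.Queue()
        self.restarts = 0
        self._thread = self._start_thread()

    def _start_thread(self):
        thread = threading.Thread(
            target=self._target, args=(self._exceptions,), daemon=True
        )
        thread.start()
        return thread

    def _checkout(self):
        """Raise an exception handed over by the thread, or restart a dead thread."""
        try:
            exc = self._exceptions.get_nowait()
        except queue.Empty:
            exc = None
        if exc is not None:
            raise exc
        if not self._thread.is_alive():
            # The thread exited without reporting anything: start a new one.
            self._thread = self._start_thread()
            self.restarts += 1


def failing(exceptions):
    exceptions.put(ConnectionError("connection lost"))


def silent(exceptions):
    return  # exits without reporting an exception


owner = Owner(failing)
owner._thread.join()
try:
    owner._checkout()
    raised = None
except ConnectionError as exc:
    raised = exc  # the thread's exception surfaces in the caller

restarted = Owner(silent)
restarted._thread.join()
restarted._checkout()  # no exception queued, so the dead thread is replaced
```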

When you are done with a Connector instance, the websocket connection still has to be closed using close(). This method sets self._websocket_thread.stop to True, which tells the websocket thread to close the websocket connection and return.
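The stop flag mechanism can be sketched with a plain thread. StoppableThread and Owner are hypothetical stand-ins, connection_open merely simulates an open websocket, and the join() call is an addition of this sketch that the page does not mention.

```python
import threading
import time


class StoppableThread(threading.Thread):
    """Hypothetical stand-in for WebsocketThread."""

    def __init__(self):
        super().__init__(daemon=True)
        self.stop = False
        self.connection_open = False

    def run(self):
        self.connection_open = True    # stands in for opening the websocket
        while not self.stop:
            time.sleep(0.01)           # stands in for exchanging messages
        self.connection_open = False   # close the connection before returning


class Owner:
    """Hypothetical stand-in for Connector."""

    def __init__(self):
        self._thread = StoppableThread()
        self._thread.start()

    def close(self):
        self._thread.stop = True       # ask the thread to finish
        self._thread.join(timeout=1)   # assumption: wait briefly for it to exit


owner = Owner()
try:
    pass  # work with the owner here
finally:
    owner.close()
```

Wrapping the work in try/finally ensures that close() runs even when processing fails.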

WebsocketThread

An instance of the WebsocketThread class receives the above-mentioned values and references from the calling class instance. They are used as follows:

  • _websocket_exceptions: Whenever a method in the WebsocketThread class raises an exception that it cannot handle, this exception is added to this exception queue. It is up to the calling class instance to check regularly whether any exceptions have been passed by the thread. In most cases, afterwards the WebsocketThread attribute self.stop is set to True, so it exits.
  • add_tasks: When the websocket receives tasks from the server, it adds them to the task list of the calling class instance using the provided method, so the tasks can be parsed according to the calling class's rules.
  • _reply_queue: An asynchronous method runs a loop to check whether any replies from the NLP tool are available and, if so, sends them to the server.
  • _websocket_connection_timeout: The thread starts a websocket connection, but if it fails to do so within the given timeout, it will pass an exception to the calling class instance and return (i.e. self.stop is set to True).
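The timeout behaviour in the last item can be sketched with asyncio.wait_for. LoopThread and never_connects are hypothetical names, a sleeping coroutine stands in for the websocket handshake, and the sketch creates a fresh loop with asyncio.new_event_loop() rather than passing asyncio.get_event_loop() as the real class does.

```python
import asyncio
import queue
import threading


class LoopThread(threading.Thread):
    """Hypothetical stand-in for WebsocketThread, reduced to the timeout logic."""

    def __init__(self, exceptions, loop, connect, timeout):
        super().__init__(daemon=True)
        self._exceptions = exceptions
        self._loop = loop
        self._connect = connect    # coroutine function that "opens the connection"
        self._timeout = timeout
        self.stop = False

    def run(self):
        # Make the given loop the event loop of this thread and run on it.
        asyncio.set_event_loop(self._loop)
        try:
            self._loop.run_until_complete(self._communicate())
        finally:
            self._loop.close()

    async def _communicate(self):
        try:
            await asyncio.wait_for(self._connect(), timeout=self._timeout)
        except asyncio.TimeoutError as exc:
            # Report the failure to the owner and let the thread end.
            self._exceptions.put(exc)
            self.stop = True
            return
        while not self.stop:
            await asyncio.sleep(0.01)  # stands in for exchanging messages


async def never_connects():
    await asyncio.sleep(10)  # simulates a host that does not answer in time


errors = queue.Queue()
thread = LoopThread(errors, asyncio.new_event_loop(), never_connects, timeout=0.05)
thread.start()
thread.join()
reported = errors.get_nowait()
```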

A public attribute of the WebsocketThread class is the following:

  • stop: Whenever this attribute is set to True, any method that is running in a loop must return immediately. The exception is _communicate_with_server(), which controls the websocket connection: once it notices that self.stop is True, it closes the connection before returning.

Collaborating on cluster-connector

Version numbering system

We try to follow the guidelines from Semantic Versioning 2.0.0.
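As a reminder of what those guidelines imply for the next version number, here is a small illustrative helper. The bump() function is hypothetical and not part of cluster-connector; it handles plain MAJOR.MINOR.PATCH strings only, without pre-release or build metadata.

```python
def bump(version, part):
    """Return the next version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":   # incompatible API changes
        return f"{major + 1}.0.0"
    if part == "minor":   # backwards-compatible new functionality
        return f"{major}.{minor + 1}.0"
    if part == "patch":   # backwards-compatible bug fixes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


next_patch = bump("1.0.1", "patch")
next_minor = bump("1.0.1", "minor")
next_major = bump("1.0.1", "major")
```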

Creating the package

To create both a .tar.gz and a .whl package use the following command. Make sure to update the version number in the setup.py file first.

python setup.py sdist bdist_wheel

Creating documentation

Documentation is created using pdoc via the following command in the package root directory:

pdoc cluster --html