This is a Flask web service that presents labeling tasks to visiting users and gives them a completion ID in exchange for finishing a task. The expectation is that those IDs are then redeemed on crowdworking websites such as Mechanical Turk.
To set up this repo, use pip to install the dependencies listed in requirements.txt, which is in the root of this repo:
pip install -r requirements.txt
Run the service with Python, passing it the path to a task config. Let's use the sample task config for now:
python main.py ./tasks/sample_task.json
This app uses a series of config files to describe the tasks that you want to run. These config files are all referenced from the master config file, which defines the task. Please use sample_task.json (found at /tasks/sample_task.json) as a starting point.
The master config file holds the high-level information about the task that you want your users to complete. It includes the following fields:
- output_path : The path to which the labels and data generated by the user are saved.
- participating_users_output_path: The path to which a list of all users who participated in your task (and their completion hashes) is saved.
- input_path : The path at which all the data that you want labeled is stored (More details on this format later)
- task_template_path : The path, INSIDE THE /TEMPLATES FOLDER, of the HTML page that contains the task for the user to complete
- admin_users_path : The path to the list of users who have admin access to add or remove tasks while the service is live
- samples_per_task : The number of samples that you want the user to label before they are "done"
- welcome_instructions : The instructions that are presented to the user on the splash page
- task_page_instructions : The instructions that are presented to the user on the task page
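As a rough illustration, the fields above can be assembled into a master config like the sketch below. Only the key names come from the list above. Every value (paths, sample count, instruction text) is a hypothetical placeholder rather than something copied from sample_task.json, and the value types are assumptions. The `missing_keys` helper is likewise hypothetical and not part of this repo.

```python
import json

# Key names come from the field list above; all values are hypothetical
# placeholders, and their types (e.g. an integer count) are assumptions.
task_config = {
    "output_path": "./output/labels.json",
    "participating_users_output_path": "./output/participants.json",
    "input_path": "./data/my_data.json",
    "task_template_path": "my_task.html",
    "admin_users_path": "./admin_users.json",
    "samples_per_task": 5,
    "welcome_instructions": "Welcome! Please read the instructions.",
    "task_page_instructions": "Label each sample shown on this page.",
}

REQUIRED_KEYS = {
    "output_path",
    "participating_users_output_path",
    "input_path",
    "task_template_path",
    "admin_users_path",
    "samples_per_task",
    "welcome_instructions",
    "task_page_instructions",
}


def missing_keys(config):
    """Return a sorted list of the required keys that config lacks."""
    return sorted(REQUIRED_KEYS - set(config))


# Serialize the config the way it would be stored in a .json file.
config_json = json.dumps(task_config, indent=2)
print(config_json)
print("missing keys:", missing_keys(task_config))
```

Checking for missing keys before starting the service is just a convenience; the app itself may or may not perform a similar check.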
One field that deserves particular attention is "input_path". It points to another file that contains the data that you want to have annotated. This is a JSON file that should contain, at the top level, a list of datasets. Each dataset contains an ID, a name, and a set of samples that you want to have annotated. Each dataset must have a unique ID, and each sample within a single dataset must also have a unique ID. The ID spaces for datasets and samples may overlap, and the ID spaces for samples in different datasets may overlap as well. Samples are selected randomly and loaded into your task page for annotation.
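The ID rules can be checked with a short script before you start the service. The sketch below is an assumption-heavy illustration: the key names "id", "name", "samples", and "text" are guesses at the layout, so check the sample data for the real names. The random draw shown (uniform, without replacement, across all datasets) is only one plausible reading of "selected randomly", not necessarily what the app does.

```python
import random

# Hypothetical input data. The keys "id", "name", "samples", and "text"
# are assumed for illustration only.
datasets = [
    {"id": 1, "name": "reviews", "samples": [
        {"id": 1, "text": "Great product"},
        {"id": 2, "text": "Arrived broken"},
    ]},
    # Sample ID 1 is reused here, which is allowed across datasets.
    {"id": 2, "name": "tweets", "samples": [
        {"id": 1, "text": "Loving the weather today"},
    ]},
]


def find_id_problems(datasets):
    """Return a list of messages describing duplicate-ID violations."""
    problems = []
    seen_dataset_ids = set()
    for dataset in datasets:
        if dataset["id"] in seen_dataset_ids:
            problems.append(f"duplicate dataset id: {dataset['id']}")
        seen_dataset_ids.add(dataset["id"])

        # Sample IDs only need to be unique within their own dataset.
        seen_sample_ids = set()
        for sample in dataset["samples"]:
            if sample["id"] in seen_sample_ids:
                problems.append(
                    f"duplicate sample id {sample['id']} "
                    f"in dataset {dataset['id']}"
                )
            seen_sample_ids.add(sample["id"])
    return problems


def pick_samples(datasets, samples_per_task):
    """Draw (dataset_id, sample_id) pairs at random, without replacement."""
    pool = [(d["id"], s["id"]) for d in datasets for s in d["samples"]]
    return random.sample(pool, min(samples_per_task, len(pool)))


print(find_id_problems(datasets))
print(pick_samples(datasets, 2))
```

Because sample IDs may repeat across datasets, a sample is identified here by the pair of its dataset ID and its own ID.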
The best way to learn is by doing! Play around with the sample data and the sample task to get an understanding of the data structures and how they translate into the app.
While the service is running, you can add and delete data on the fly using an admin user account. An admin user account can be created with the createUsers.py script in the /scripts folder. Then go to URL_TO_TASK/admin-cms (usually something like 127.0.0.1:5000/admin-cms) to access the admin content management system (CMS). Log in with the user name and password that you entered in the createUsers.py script. From here you can add or remove datasets and samples from the data you are working with. Note that on login your password is transmitted to the service in clear text, but it is never stored in clear text on the service. Don't reuse passwords.