* Bundle libpostal_data program, executable via Loader.load() for convenience (issue #939)
saudet committed Sep 4, 2020
1 parent ab34a7f commit bfbd6da
Showing 15 changed files with 108 additions and 20 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,4 +1,5 @@

* Bundle `libpostal_data` program, executable via `Loader.load()` for convenience ([issue #939](https://github.com/bytedeco/javacpp-presets/issues/939))
* Enable all stable target architectures in the presets for LLVM ([pull #937](https://github.com/bytedeco/javacpp-presets/pull/937))
* Virtualize `QObject` and its subclasses from Qt to allow customization ([issue bytedeco/javacpp#419](https://github.com/bytedeco/javacpp/issues/419))
* Bundle programs from Clang and LLVM, executable via `Loader.load()` for convenience ([issue #833](https://github.com/bytedeco/javacpp-presets/issues/833))
23 changes: 16 additions & 7 deletions libpostal/README.md
@@ -10,17 +10,22 @@ This directory contains the JavaCPP Presets module for:
libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data.
The goal of this project is to understand location-based strings in every language, everywhere.


Data Files
----------

libpostal needs to download a few gigabytes of data from S3. The basic files are on-disk representations of the data structures necessary to perform expansion.
For address parsing, since model training takes a few days, the libpostal team publishes the fully trained model to S3 and will update it automatically as new addresses get added to OSM, OpenAddresses, etc.
Same goes for the language classifier model. Data files are automatically downloaded when you run the build with enabled data download.
To check for and download any new data files, you can either run ```make```, or run:

```libpostal_data download all $YOUR_DATA_DIR```
```java
String libpostal_data = Loader.load(org.bytedeco.libpostal.libpostal_data.class);
ProcessBuilder pb = new ProcessBuilder("bash", libpostal_data, "download", "all", "/path/to/libpostal/data/");
pb.inheritIO().start().waitFor();
```

Replace `"/path/to/libpostal/data/"` with the path where the training data should be stored.

Replace $YOUR_DATA_DIR with the path where the training data should be stored.
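
The snippet above discards the exit status returned by `waitFor()`, so a failed download would go unnoticed. A minimal sketch of one way to surface it, assuming a hypothetical `DataDownloader` helper (not part of the presets) that receives the script path returned by `Loader.load(org.bytedeco.libpostal.libpostal_data.class)`:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the libpostal presets.
class DataDownloader {
    // Builds the same argument list that the README passes to ProcessBuilder.
    static List<String> downloadCommand(String scriptPath, String dataDir) {
        return Arrays.asList("bash", scriptPath, "download", "all", dataDir);
    }

    // Runs the command with inherited stdio and fails on a nonzero exit code.
    // Assumption: the script signals failure through its exit status.
    static void runOrFail(List<String> command) throws IOException, InterruptedException {
        int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode + ": " + command);
        }
    }
}
```

With the presets on the classpath, this could be called as `DataDownloader.runOrFail(DataDownloader.downloadCommand(Loader.load(org.bytedeco.libpostal.libpostal_data.class), "data/"))`.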

Documentation
-------------
@@ -36,7 +41,7 @@ Here is a simple example of the libpostal parser and normalization functionality
We can use [Maven 3](http://maven.apache.org/) to download and install automatically all the class files as well as the native binaries.
To run this sample code, after creating the `pom.xml` and `Example.java` source files below, simply execute on the command line:
```bash
$ mvn compile exec:java -Dexec.args="/PATH_TO_LIBPOSTAL_TRAINING_DATA_DIRECTORY"
$ mvn compile exec:java
```

### The `pom.xml` build file
@@ -45,15 +50,15 @@ To run this sample code, after creating the `pom.xml` and `Example.java` source
<modelVersion>4.0.0</modelVersion>
<groupId>org.bytedeco.libpostal</groupId>
<artifactId>example</artifactId>
<version>1.5.3</version>
<version>1.5.4-SNAPSHOT</version>
<properties>
<exec.mainClass>Example</exec.mainClass>
</properties>
<dependencies>
<dependency>
<groupId>org.bytedeco</groupId>
<artifactId>libpostal-platform</artifactId>
<version>1.1-alpha-1.5.3</version>
<version>1.1-alpha-1.5.4-SNAPSHOT</version>
</dependency>
</dependencies>
<build>
@@ -70,7 +75,11 @@ import static org.bytedeco.libpostal.global.postal.*;

public class Example {
public static void main(String[] args) throws Exception {
String dataDir = args.length >= 1 ? new String(args[0]) : "/PATH_TO_LIBPOSTAL_TRAINING_DATA_DIRECTORY";
String dataDir = args.length >= 1 ? new String(args[0]) : "data/";
String libpostal_data = Loader.load(org.bytedeco.libpostal.libpostal_data.class);
ProcessBuilder pb = new ProcessBuilder("bash", libpostal_data, "download", "all", dataDir);
pb.inheritIO().start().waitFor();

boolean setup1 = libpostal_setup_datadir(dataDir);
boolean setup2 = libpostal_setup_parser_datadir(dataDir);
boolean setup3 = libpostal_setup_language_classifier_datadir(dataDir);
6 changes: 5 additions & 1 deletion libpostal/samples/Example.java
@@ -4,7 +4,11 @@

public class Example {
public static void main(String[] args) throws Exception {
String dataDir = args.length >= 1 ? new String(args[0]) : "/PATH_TO_LIBPOSTAL_TRAINING_DATA_DIRECTORY";
String dataDir = args.length >= 1 ? new String(args[0]) : "data/";
String libpostal_data = Loader.load(org.bytedeco.libpostal.libpostal_data.class);
ProcessBuilder pb = new ProcessBuilder("bash", libpostal_data, "download", "all", dataDir);
pb.inheritIO().start().waitFor();

boolean setup1 = libpostal_setup_datadir(dataDir);
boolean setup2 = libpostal_setup_parser_datadir(dataDir);
boolean setup3 = libpostal_setup_language_classifier_datadir(dataDir);
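
The three `libpostal_setup_*_datadir` calls each return a `boolean`, which the sample stores but does not inspect in this hunk. A small sketch of how the results might be checked before parsing, assuming that `false` indicates a failed setup; `SetupCheck` and `firstFailure` are hypothetical names introduced only for illustration:

```java
// Hypothetical helper for illustration; not part of the libpostal presets.
class SetupCheck {
    // Returns the name of the first step whose result is false,
    // or null when every step succeeded.
    static String firstFailure(String[] names, boolean[] results) {
        for (int i = 0; i < results.length; i++) {
            if (!results[i]) {
                return names[i];
            }
        }
        return null;
    }
}
```

In `Example.java` this could be fed `setup1`, `setup2`, and `setup3`, with an exception thrown when the result is non-null, so that an empty or incomplete data directory is reported early.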
4 changes: 2 additions & 2 deletions libpostal/samples/pom.xml
@@ -2,15 +2,15 @@
<modelVersion>4.0.0</modelVersion>
<groupId>org.bytedeco.libpostal</groupId>
<artifactId>example</artifactId>
<version>1.5.3</version>
<version>1.5.4-SNAPSHOT</version>
<properties>
<exec.mainClass>Example</exec.mainClass>
</properties>
<dependencies>
<dependency>
<groupId>org.bytedeco</groupId>
<artifactId>libpostal-platform</artifactId>
<version>1.1-alpha-1.5.3</version>
<version>1.1-alpha-1.5.4-SNAPSHOT</version>
</dependency>
</dependencies>
<build>
@@ -1,4 +1,4 @@
// Targeted by JavaCPP version 1.5.3: DO NOT EDIT THIS FILE
// Targeted by JavaCPP version 1.5.4-SNAPSHOT: DO NOT EDIT THIS FILE

package org.bytedeco.libpostal.global;

@@ -1,4 +1,4 @@
// Targeted by JavaCPP version 1.5.3: DO NOT EDIT THIS FILE
// Targeted by JavaCPP version 1.5.4-SNAPSHOT: DO NOT EDIT THIS FILE

package org.bytedeco.libpostal;

@@ -25,6 +25,9 @@ public class libpostal_address_parser_options_t extends Pointer {
@Override public libpostal_address_parser_options_t position(long position) {
return (libpostal_address_parser_options_t)super.position(position);
}
@Override public libpostal_address_parser_options_t getPointer(long i) {
return new libpostal_address_parser_options_t(this).position(position + i);
}

public native @Cast("char*") BytePointer language(); public native libpostal_address_parser_options_t language(BytePointer setter);
public native @Cast("char*") BytePointer country(); public native libpostal_address_parser_options_t country(BytePointer setter);
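
The `getPointer(long i)` override added here, and repeated in the other generated struct classes below, copies the pointer object and offsets the copy, whereas `position(long)` repositions the receiver itself. A simplified sketch of that pattern, using a hypothetical `FakePointer` stand-in rather than JavaCPP's real `Pointer` class:

```java
// Simplified stand-in for a generated Pointer subclass; NOT JavaCPP's Pointer.
class FakePointer {
    long address;
    long position;

    FakePointer(long address) {
        this.address = address;
    }

    // Copy constructor: plays the role of `new T(this)` in the generated code.
    FakePointer(FakePointer other) {
        this.address = other.address;
        this.position = other.position;
    }

    // Mutates this object and returns it, like the generated position(long).
    FakePointer position(long position) {
        this.position = position;
        return this;
    }

    // Same shape as the generated getPointer(long i): copy, then offset the copy.
    FakePointer getPointer(long i) {
        return new FakePointer(this).position(position + i);
    }
}
```

Under these assumptions, `base.getPointer(2)` yields a distinct object two elements further along and leaves `base` where it was, which is convenient when walking an array of structs without disturbing the original reference.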
@@ -1,4 +1,4 @@
// Targeted by JavaCPP version 1.5.3: DO NOT EDIT THIS FILE
// Targeted by JavaCPP version 1.5.4-SNAPSHOT: DO NOT EDIT THIS FILE

package org.bytedeco.libpostal;

@@ -29,6 +29,9 @@ public class libpostal_address_parser_response_t extends Pointer {
@Override public libpostal_address_parser_response_t position(long position) {
return (libpostal_address_parser_response_t)super.position(position);
}
@Override public libpostal_address_parser_response_t getPointer(long i) {
return new libpostal_address_parser_response_t(this).position(position + i);
}

public native @Cast("size_t") long num_components(); public native libpostal_address_parser_response_t num_components(long setter);
public native @Cast("char*") BytePointer components(int i); public native libpostal_address_parser_response_t components(int i, BytePointer setter);
@@ -1,4 +1,4 @@
// Targeted by JavaCPP version 1.5.3: DO NOT EDIT THIS FILE
// Targeted by JavaCPP version 1.5.4-SNAPSHOT: DO NOT EDIT THIS FILE

package org.bytedeco.libpostal;

@@ -25,6 +25,9 @@ public class libpostal_duplicate_options_t extends Pointer {
@Override public libpostal_duplicate_options_t position(long position) {
return (libpostal_duplicate_options_t)super.position(position);
}
@Override public libpostal_duplicate_options_t getPointer(long i) {
return new libpostal_duplicate_options_t(this).position(position + i);
}

public native @Cast("size_t") long num_languages(); public native libpostal_duplicate_options_t num_languages(long setter);
public native @Cast("char*") BytePointer languages(int i); public native libpostal_duplicate_options_t languages(int i, BytePointer setter);
@@ -1,4 +1,4 @@
// Targeted by JavaCPP version 1.5.3: DO NOT EDIT THIS FILE
// Targeted by JavaCPP version 1.5.4-SNAPSHOT: DO NOT EDIT THIS FILE

package org.bytedeco.libpostal;

@@ -27,6 +27,9 @@ public class libpostal_fuzzy_duplicate_options_t extends Pointer {
@Override public libpostal_fuzzy_duplicate_options_t position(long position) {
return (libpostal_fuzzy_duplicate_options_t)super.position(position);
}
@Override public libpostal_fuzzy_duplicate_options_t getPointer(long i) {
return new libpostal_fuzzy_duplicate_options_t(this).position(position + i);
}

public native @Cast("size_t") long num_languages(); public native libpostal_fuzzy_duplicate_options_t num_languages(long setter);
public native @Cast("char*") BytePointer languages(int i); public native libpostal_fuzzy_duplicate_options_t languages(int i, BytePointer setter);
@@ -1,4 +1,4 @@
// Targeted by JavaCPP version 1.5.3: DO NOT EDIT THIS FILE
// Targeted by JavaCPP version 1.5.4-SNAPSHOT: DO NOT EDIT THIS FILE

package org.bytedeco.libpostal;

@@ -25,6 +25,9 @@ public class libpostal_fuzzy_duplicate_status_t extends Pointer {
@Override public libpostal_fuzzy_duplicate_status_t position(long position) {
return (libpostal_fuzzy_duplicate_status_t)super.position(position);
}
@Override public libpostal_fuzzy_duplicate_status_t getPointer(long i) {
return new libpostal_fuzzy_duplicate_status_t(this).position(position + i);
}

public native @Cast("libpostal_duplicate_status_t") int status(); public native libpostal_fuzzy_duplicate_status_t status(int setter);
public native double similarity(); public native libpostal_fuzzy_duplicate_status_t similarity(double setter);
@@ -1,4 +1,4 @@
// Targeted by JavaCPP version 1.5.3: DO NOT EDIT THIS FILE
// Targeted by JavaCPP version 1.5.4-SNAPSHOT: DO NOT EDIT THIS FILE

package org.bytedeco.libpostal;

@@ -33,6 +33,9 @@ public class libpostal_near_dupe_hash_options_t extends Pointer {
@Override public libpostal_near_dupe_hash_options_t position(long position) {
return (libpostal_near_dupe_hash_options_t)super.position(position);
}
@Override public libpostal_near_dupe_hash_options_t getPointer(long i) {
return new libpostal_near_dupe_hash_options_t(this).position(position + i);
}

public native @Cast("bool") boolean with_name(); public native libpostal_near_dupe_hash_options_t with_name(boolean setter);
public native @Cast("bool") boolean with_address(); public native libpostal_near_dupe_hash_options_t with_address(boolean setter);
@@ -1,4 +1,4 @@
// Targeted by JavaCPP version 1.5.3: DO NOT EDIT THIS FILE
// Targeted by JavaCPP version 1.5.4-SNAPSHOT: DO NOT EDIT THIS FILE

package org.bytedeco.libpostal;

@@ -25,6 +25,9 @@ public class libpostal_normalize_options_t extends Pointer {
@Override public libpostal_normalize_options_t position(long position) {
return (libpostal_normalize_options_t)super.position(position);
}
@Override public libpostal_normalize_options_t getPointer(long i) {
return new libpostal_normalize_options_t(this).position(position + i);
}

// List of language codes
public native @Cast("char*") BytePointer languages(int i); public native libpostal_normalize_options_t languages(int i, BytePointer setter);
@@ -1,4 +1,4 @@
// Targeted by JavaCPP version 1.5.3: DO NOT EDIT THIS FILE
// Targeted by JavaCPP version 1.5.4-SNAPSHOT: DO NOT EDIT THIS FILE

package org.bytedeco.libpostal;

@@ -26,6 +26,9 @@ public class libpostal_normalized_token_t extends Pointer {
@Override public libpostal_normalized_token_t position(long position) {
return (libpostal_normalized_token_t)super.position(position);
}
@Override public libpostal_normalized_token_t getPointer(long i) {
return new libpostal_normalized_token_t(this).position(position + i);
}

public native @Cast("char*") BytePointer str(); public native libpostal_normalized_token_t str(BytePointer setter);
public native @ByRef libpostal_token_t token(); public native libpostal_normalized_token_t token(libpostal_token_t setter);
@@ -1,4 +1,4 @@
// Targeted by JavaCPP version 1.5.3: DO NOT EDIT THIS FILE
// Targeted by JavaCPP version 1.5.4-SNAPSHOT: DO NOT EDIT THIS FILE

package org.bytedeco.libpostal;

@@ -27,6 +27,9 @@ public class libpostal_token_t extends Pointer {
@Override public libpostal_token_t position(long position) {
return (libpostal_token_t)super.position(position);
}
@Override public libpostal_token_t getPointer(long i) {
return new libpostal_token_t(this).position(position + i);
}

public native @Cast("size_t") long offset(); public native libpostal_token_t offset(long setter);
public native @Cast("size_t") long len(); public native libpostal_token_t len(long setter);
47 changes: 47 additions & 0 deletions libpostal/src/main/java/org/bytedeco/libpostal/libpostal_data.java
@@ -0,0 +1,47 @@
/*
* Copyright (C) 2020 Samuel Audet
*
* Licensed either under the Apache License, Version 2.0, or (at your option)
* under the terms of the GNU General Public License as published by
* the Free Software Foundation (subject to the "Classpath" exception),
* either version 2, or any later version (collectively, the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
* http://www.gnu.org/licenses/
* http://www.gnu.org/software/classpath/license.html
*
* or as provided in the LICENSE.txt file that accompanied this code.
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.bytedeco.libpostal;

import org.bytedeco.javacpp.Loader;
import org.bytedeco.javacpp.annotation.Platform;
import org.bytedeco.javacpp.annotation.Properties;

import org.bytedeco.libpostal.presets.*;

/**
* With this class, we can extract easily the {@code libpostal_data} program ready for execution.
* For example, we can check for and download any new data files all from Java in a portable fashion this way:
* <pre>{@code
* String libpostal_data = Loader.load(org.bytedeco.libpostal.libpostal_data.class);
* ProcessBuilder pb = new ProcessBuilder("bash", libpostal_data, "download", "all", "/path/to/libpostal/data/");
* pb.inheritIO().start().waitFor();
* }</pre>
*
* @author Samuel Audet
*/
@Properties(
value = @Platform(executable = "libpostal_data")
)
public class libpostal_data {
static { Loader.load(); }
}
