Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NullPointerException in Parser and poor Unicode support #70

Closed
gwenn opened this issue Feb 14, 2016 · 31 comments
Closed

NullPointerException in Parser and poor Unicode support #70

gwenn opened this issue Feb 14, 2016 · 31 comments

Comments

@gwenn
Copy link

gwenn commented Feb 14, 2016

With JavaCPP 1.1:

import org.bytedeco.javacpp.*;
import org.bytedeco.javacpp.annotation.*;

@Properties(target="org.sqlite.SQLite", value={
@Platform(include="sqlite3.h")
})
public class SQLite {
}
javac -cp javacpp.jar SQLite.java
java -jar javacpp.jar SQLite
Targeting org/sqlite/SQLite.java
Parsing sqlite3.h
Exception in thread "main" java.lang.NullPointerException
    at org.bytedeco.javacpp.tools.Parser.declarator(Parser.java:913)
    at org.bytedeco.javacpp.tools.Parser.function(Parser.java:1245)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2347)
    at org.bytedeco.javacpp.tools.Parser.group(Parser.java:1985)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2346)
    at org.bytedeco.javacpp.tools.Parser.extern(Parser.java:2272)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2345)
    at org.bytedeco.javacpp.tools.Parser.parse(Parser.java:2421)
    at org.bytedeco.javacpp.tools.Parser.parse(Parser.java:2517)
    at org.bytedeco.javacpp.tools.Builder.parse(Builder.java:67)
    at org.bytedeco.javacpp.tools.Builder.build(Builder.java:624)
    at org.bytedeco.javacpp.tools.Builder.main(Builder.java:773)
@saudet
Copy link
Member

saudet commented Feb 14, 2016

Try with the latest on the master branch. There's been a lot of fixes since 1.1.

@gwenn
Copy link
Author

gwenn commented Feb 14, 2016

Same error with JavaCPP version 1.2-SNAPSHOT

Targeting org/sqlite/SQLite.java
Parsing sqlite3.h
Exception in thread "main" java.lang.NullPointerException
    at org.bytedeco.javacpp.tools.Parser.declarator(Parser.java:1005)
    at org.bytedeco.javacpp.tools.Parser.function(Parser.java:1441)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2598)
    at org.bytedeco.javacpp.tools.Parser.group(Parser.java:2198)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2597)
    at org.bytedeco.javacpp.tools.Parser.extern(Parser.java:2523)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2596)
    at org.bytedeco.javacpp.tools.Parser.parse(Parser.java:2672)
    at org.bytedeco.javacpp.tools.Parser.parse(Parser.java:2769)
    at org.bytedeco.javacpp.tools.Builder.parse(Builder.java:68)
    at org.bytedeco.javacpp.tools.Builder.build(Builder.java:620)
    at org.bytedeco.javacpp.tools.Builder.main(Builder.java:769)

@gwenn
Copy link
Author

gwenn commented Feb 14, 2016

A quick and dirty fix:

diff --git a/src/main/java/org/bytedeco/javacpp/tools/Parser.java b/src/main/java/org/bytedeco/javacpp/tools/Parser.java
index 9b3c8bc..2cfe4d1 100644
--- a/src/main/java/org/bytedeco/javacpp/tools/Parser.java
+++ b/src/main/java/org/bytedeco/javacpp/tools/Parser.java
@@ -1002,7 +1002,10 @@ public class Parser {
                 }
                 info = infoMap.getFirst(cppType += ")");

-                String functionType = Character.toUpperCase(originalName.charAt(0)) + originalName.substring(1);
+                String functionType = null;
+                if (originalName != null) {
+                    functionType = Character.toUpperCase(originalName.charAt(0)) + originalName.substring(1);
+                }
                 if (info != null && info.pointerTypes.length > 0) {
                     functionType = info.pointerTypes[0];
                 } else if (typedef) {

It seems that the error is related to the following line:

  void (*(*xDlSym)(sqlite3_vfs*,void*, const char *zSymbol))(void);

@saudet saudet added the bug label Feb 14, 2016
@saudet
Copy link
Member

saudet commented Feb 14, 2016

Thanks!! The parser is (not so) quick and (but quite) dirty anyway. (BTW, the first thing I'd like to try at this point for the parser is to see if we couldn't use Clang for that: #51)

@saudet
Copy link
Member

saudet commented Feb 14, 2016

In any case, if you get sqlite working, let me know and we'll start a "presets" out of that! Thanks

@gwenn
Copy link
Author

gwenn commented Feb 14, 2016

I've created a branch here.
But JavaCPP Builder fails:
https://travis-ci.org/gwenn/sqlite-jna/builds/109200887
I'll try to fix it.

saudet added a commit that referenced this issue Feb 15, 2016
@saudet
Copy link
Member

saudet commented Feb 15, 2016

Thanks! I'll try to get a look at that sooner rather than later...

@gwenn
Copy link
Author

gwenn commented Feb 15, 2016

Ok,
https://travis-ci.org/gwenn/sqlite-jna/builds/109444251
JavaCPP Builder succeed.
And only some tests related to callback are failing.

May I ask you two questions:
a) Is there any way to tell JavaCPP that some SQLite functions expect UTF-8 encoded string (a not the default/current encoding) ?
b) Is there any way to create a FunctionPointer with an address set to -1 (which means TRANSIENT for SQLITE) ?
I tried:

    public static final Destructor TRANSIENT = new Destructor() {
        {
            this.address = -1; // SQLITE_TRANSIENT
        }

        @Override
        public void call(Pointer p) {
            throw new UnsupportedOperationException();
        }
    };

with no success.
Thanks.

@gwenn
Copy link
Author

gwenn commented Feb 16, 2016

Ok,
Callbacks fixed:
https://travis-ci.org/gwenn/sqlite-jna/builds/109658416

Regards.

@saudet
Copy link
Member

saudet commented Feb 17, 2016

Great, thanks! Are you by any chance comparing the overhead of each tool (JNA, JNR, etc)? It would be nice to have some benchmarks as reference.

For the function pointer, I was starting to consider adding support for native void allocate(int i) to FunctionPointer, but you're right, we can simply overload the method and use a cast for that:

void sqlite3_result_blob(sqlite3_context c, Pointer p, int i, FunctionPointer destructor);
void sqlite3_result_blob(sqlite3_context c, Pointer p, int i, @Cast("sqlite3_destructor_type") long destructor);

It's not type safe, but it isn't in C anyway. Does it still make sense to try to cram that into FunctionPointer?

As for the character encoding, String gets passed as "modified UTF-8", because that's what we get most efficiently with JNI. To get another encoding, we can use new BytePointer(String s, String charsetName) as a @Cast("const char*") BytePointer argument. We could probably add settings and stuff, but they would need to be dynamic in some way... If you'd like to try something, any experimentation would be welcome.

@gwenn
Copy link
Author

gwenn commented Feb 17, 2016

Sorry, I have no benchmark between the different binding tools.
Right now, I don't have access to www.ch-werner.de/javasqlite/ site but I think there is a benchmark
included.

And SQLite don't like the "modified UTF-8" produced by JNI. So only the BytePointer can be used.

@saudet
Copy link
Member

saudet commented Feb 17, 2016

From what I see on your branches, it doesn't look like the other tools found a better way to pass strings in a different encoding, right?

BTW, if String.getBytes(Charset) is more efficient, we could add that to BytePointer for ease of use.

@gwenn
Copy link
Author

gwenn commented Feb 17, 2016

@saudet
Copy link
Member

saudet commented Feb 17, 2016

Right, but we still need to allocate bytes on the heap. It doesn't do it in native code somehow.

Or are you saying there would be an advantage in setting the default encoding used by BytePointer? We could add a property for that, and do the same as JNA.

@gwenn
Copy link
Author

gwenn commented Feb 17, 2016

Maybe I am biased but I want my program to preserve data integrity (data not corrupted by a bad encoding) even if there is an additional cost.
I guess that many JNI bindings using GetStringUTFChars are buggy (including mine).

Something like that

static native void foo(@Encoding("UTF-8")String bar);

would be perfect.

@gwenn
Copy link
Author

gwenn commented Feb 19, 2016

Some benchmarks using the BenchmarkDriver available here:

=====================sqlite-jna==========================
Driver: org.sqlite.driver.JDBC
URL:jdbc:sqlite:/Users/gwen/Java/sqlite-jna/bench.sqlite

Scale factor value: 1
Number of clients: 10
Number of transactions per client: 10

Initializing dataset...DBMS: SQLite 3

* Starting Benchmark Run *

* Benchmark Report *
* Featuring <direct queries> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.485 seconds.
0 / 100 failed to complete.
Transaction rate: 206.18556701030928 txn/sec.

* Benchmark Report *
* Featuring <direct queries> <transactions> 
--------------------
Time to execute 100 transactions: 0.295 seconds.
0 / 100 failed to complete.
Transaction rate: 338.98305084745766 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.402 seconds.
0 / 100 failed to complete.
Transaction rate: 248.75621890547262 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <transactions> 
--------------------
Time to execute 100 transactions: 0.196 seconds.
0 / 100 failed to complete.
Transaction rate: 510.204081632653 txn/sec.
=====================sqlite-javacpp===========================
Driver: org.sqlite.driver.JDBC
URL:jdbc:sqlite:/Users/gwen/Java/sqlite-javacpp/bench.sqlite

Scale factor value: 1
Number of clients: 10
Number of transactions per client: 10

Initializing dataset...DBMS: SQLite 3

* Starting Benchmark Run *

* Benchmark Report *
* Featuring <direct queries> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.423 seconds.
1 / 100 failed to complete.
Transaction rate: 234.04255319148936 txn/sec.

* Benchmark Report *
* Featuring <direct queries> <transactions> 
--------------------
Time to execute 100 transactions: 0.274 seconds.
0 / 100 failed to complete.
Transaction rate: 364.963503649635 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.317 seconds.
0 / 100 failed to complete.
Transaction rate: 315.45741324921136 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <transactions> 
--------------------
Time to execute 100 transactions: 0.159 seconds.
0 / 100 failed to complete.
Transaction rate: 628.9308176100628 txn/sec.
==================sqlite-jni===============================
Driver: org.sqlite.driver.JDBC
URL:jdbc:sqlite:/Users/gwen/Java/sqlite-jni/bench.sqlite

Scale factor value: 1
Number of clients: 10
Number of transactions per client: 10

Initializing dataset...DBMS: SQLite 3

* Starting Benchmark Run *

* Benchmark Report *
* Featuring <direct queries> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.399 seconds.
0 / 100 failed to complete.
Transaction rate: 250.62656641604008 txn/sec.

* Benchmark Report *
* Featuring <direct queries> <transactions> 
--------------------
Time to execute 100 transactions: 0.28 seconds.
0 / 100 failed to complete.
Transaction rate: 357.1428571428571 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.325 seconds.
0 / 100 failed to complete.
Transaction rate: 307.6923076923077 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <transactions> 
--------------------
Time to execute 100 transactions: 0.175 seconds.
0 / 100 failed to complete.
Transaction rate: 571.4285714285714 txn/sec.
Java version: 1.8.0_60, vendor: Oracle Corporation
OS name: "mac os x", version: "10.11.3", arch: "x86_64", family: "mac"

@saudet
Copy link
Member

saudet commented Feb 20, 2016

So, the wrapper based on JavaCPP is on average faster than the custom JNI? Cool :) Thanks for taking the time!

Of course, I understand the need to use standard UTF-8, but it's not just about UTF-8. Specifying the encoding info in an annotation like that prevents the end user from using any other encodings. In the general case, not in the case of SQLite, we often need to be able to set it at runtime, don't we?

@gwenn
Copy link
Author

gwenn commented Feb 20, 2016

There are other libraries supporting only utf-8:

xmlChar, the libxml2 data type is a byte, those bytes must be assembled as UTF-8 valid strings.

Jansson uses UTF-8 as the character encoding. All JSON strings must be valid UTF-8

@saudet
Copy link
Member

saudet commented Feb 20, 2016

Ah, why did the JDK go with "modified UTF-8"...

@saudet
Copy link
Member

saudet commented Feb 21, 2016

BTW, if you make a blog post or something about your experience using JNA, JNR, JNI, JavaCPP, BridJ, etc let me know! Thanks

saudet added a commit that referenced this issue Apr 17, 2016
…IFIED_UTF8_STRING` for old behavior) (issue #70)
@saudet
Copy link
Member

saudet commented Apr 17, 2016

About UTF-8 encoding, I've switched to String.getBytes() and new String(byte[]) by default in the latest commit, to have the default charset used. This is usually UTF-8 on Linux, but we can launch the JVM with the -Dfile.encoding=UTF-8 option to make sure that is what gets selected.

The calls are a bit slow, about 300 ns, but GetStringUTFChars() isn't exactly fast either (about 100 ns), so I think it's a good trade-off. Things are more consistent that way.

Let me know if you encounter any problems with that! Thanks for all the tests you've done :)

@gwenn
Copy link
Author

gwenn commented Apr 17, 2016

Sorry, but for me, it is useless:
I don't want to use/change a global state/variable to say that sqlite3_bind_text expects UTF-8.

@saudet
Copy link
Member

saudet commented Apr 17, 2016

Well, ok, say we had an annotation and we could write something like the following:

static native void foo(@Encoding("UTF-8") String bar);

Other than saving a couple of keystrokes (that we could save using the parser anyway) what is the rationale for not doing it the way we already can as below?

static native void foo(BytePointer bar);
static void foo(String bar) { foo(new BytePointer(bar, "UTF-8")); }

Where we could wrap new BytePointer(bar, "UTF-8") in a helper method BTW. Also, sometimes, we can't get String objects from the native API, only BytePointer, so usually we do need overloads with BytePointer anyway.

In any case, if it's important enough, please do send a pull request, but unless I understand the benefit it's not something I'm going to take time to implement and maintain myself.

@saudet
Copy link
Member

saudet commented Apr 18, 2016

I've just had an idea though. We could hard code the encoding used for all String passed to and received from a given library via a macro like this @Platform(define = "STRING_BYTES_ENCODING UTF-8", ...). Now that's something that would make sense to me... What do you think?

@gwenn
Copy link
Author

gwenn commented Apr 18, 2016

Good idea.
And I only use the UTF-8 methods in my binding.
But SQLite supports two encodings by default: UTF-8 (sqlite3_bind_text) and UTF-16 (sqlite3_bind_text16)...
So your solution works in my case but not in all cases.

@saudet
Copy link
Member

saudet commented Apr 19, 2016

Right, but you have to understand that in general there isn't any good options with standard Java types like primitive arrays, String or even direct NIO buffers. For example, we can't resize a ByteBuffer or modify the content of a String, so when we need to support a native function with an output argument like that, it's just easier (and more efficient than whatever hack we can come up with) to just use BytePointer (or CharPointer for UTF-16 or IntPointer for UTF-32). There are other things Java can't do, like arrays with more than Integer.MAX_VALUE elements. We could go on and on, but the ultimate solution to everything remains the same: Use an appropriate subclass of Pointer. :)

@saudet saudet changed the title NullPointerException in Parser NullPointerException in Parser and poor Unicode support Dec 15, 2018
@saudet
Copy link
Member

saudet commented Jan 22, 2021

There is a good reason to provide UTF-16 support through JNI functions though: performance, see issue #442 (comment).

Thanks to @equeim, we should have this implemented soon! (Including "modified UTF-8" while we're at it.)

@saudet saudet removed the help wanted label Mar 1, 2021
@saudet
Copy link
Member

saudet commented Mar 1, 2021

Ok, it's done! Thanks to @equeim and his contribution in pull #460, we can now map String arguments to, for example, UTF-8 by annotating a class with something like @Platform(define = "STRING_BYTES_CHARSET \"UTF-8\", ...), while we can still use @AsUtf16 on individual String parameters to get fast access to the underlying UTF-16 code units.

@saudet
Copy link
Member

saudet commented Mar 1, 2021

BTW, we may want to update the guide here with a new section for that:
https://github.com/bytedeco/javacpp/wiki/Mapping-Recipes

@gwenn
Copy link
Author

gwenn commented Mar 8, 2021

Thanks. I will try to use sqlite3.*16 functions with @AsUtf16 annotation where possible. In these cases, I guess I can get rid of @Cast("const char*") BytePointer and use a String directly.

@saudet
Copy link
Member

saudet commented Mar 9, 2021

These changes have been released with JavaCPP 1.5.5. Enjoy!

Thanks. I will try to use sqlite3.*16 functions with @AsUtf16 annotation where possible. In these cases, I guess I can get rid of @Cast("const char*") BytePointer and use a String directly.

You can also provide overloads with CharPointer to allow passing UTF-16 data to and from off-heap memory as well.

@saudet saudet closed this as completed Mar 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants