Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generator: refactor StringAdapter to support std::u16string #448

Merged
merged 2 commits into from
Feb 1, 2021

Conversation

equeim
Copy link
Contributor

@equeim equeim commented Jan 9, 2021

Refactor StringAdapter to always operate on T internally (instead of char or wchar_t) and convert to requested pointer type when needed.

Also removed overloads for unsigned char*, wchar_t* and replaced signed int* with int*. As far as I understand, constructors, assign functions and conversion operators are used only to convert between std::basic_string and pointer types corresponding to String and *Pointer or *Buffer Java classes.

@saudet
Copy link
Member

saudet commented Jan 9, 2021

Refactor StringAdapter to always operate on T internally (instead of char or wchar_t) and convert to requested pointer type when needed.

It was already like that, no? Or do you mean you've created templates for the specialized functions? That's probably useful to extend to other types like char16_t, yes.

Also removed overloads for unsigned char*, wchar_t* and replaced signed int* with int*. As far as I understand, constructors, assign functions and conversion operators are used only to convert between std::basic_string and pointer types corresponding to String and *Pointer or *Buffer Java classes.

No, those are there for native functions. Functions like these do exists:

wchar_t* getMyWChars();
std::wstring getMyWString();

out.println("};");
out.println();
out.println("template<typename T> const T StringAdapter<T>::emptyStringLiteral[1] = { 0 };");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put that as a local variable so we don't need to allocate memory or have lines all over the place for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to completely get rid of it and call clear() in assign() when ptr is null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me!

@equeim
Copy link
Contributor Author

equeim commented Jan 10, 2021

It was already like that, no?

I mean that now non-const ptr is allocated using new char[] or new wchar_t[] regardless of T. If you have StringAdapter<char16_t> and call converting operator to unsigned short*, new array will be allocated using new wchar_t[] which is incorrect on Unix systems. This PR makes StringAdapter always allocate new array using new T[] and then reintepret_cast it to desired pointer (char16_t* to unsigned short* in this case).

No, those are there for native functions. Functions like these do exists:

Well the first one is irrelevant because it is not basic_string and StringAdapter will not be used in this case.
As for second one - I don't see how wchar_t* overload will be used in this case.
If we are converting std::wstring to IntPointer, ShortPointer or CharPointer, then int*, short* (which doesn't exist right now) and unsigned short* overload will be used.

@saudet
Copy link
Member

saudet commented Jan 11, 2021

It was already like that, no?

I mean that now non-const ptr is allocated using new char[] or new wchar_t[] regardless of T. If you have StringAdapter<char16_t> and call converting operator to unsigned short*, new array will be allocated using new wchar_t[] which is incorrect on Unix systems. This PR makes StringAdapter always allocate new array using new T[] and then reintepret_cast it to desired pointer (char16_t* to unsigned short* in this case).

But it still works like it did with wchar_t, right? I see it more like a genericization of the methods.

No, those are there for native functions. Functions like these do exists:

Well the first one is irrelevant because it is not basic_string and StringAdapter will not be used in this case.
As for second one - I don't see how wchar_t* overload will be used in this case.
If we are converting std::wstring to IntPointer, ShortPointer or CharPointer, then int*, short* (which doesn't exist right now) and unsigned short* overload will be used.

Right, I suppose we could add a WCharTPointer class at some point though. Any reason to remove it? We should be able to genericize it with P there as well. Actually, shouldn't we do exactly like VectorAdapter and have both types in the class template? That helps avoid some ambiguous conversions.

BTW, the reason I use unsigned short* and signed int* is because that's what the JNI types for char and int are, so please make sure those stay like that. Thanks!

@equeim
Copy link
Contributor Author

equeim commented Jan 11, 2021

But it still works like it did with wchar_t, right? I see it more like a genericization of the methods.

For std::wstring, yes. Generalization of array allocation function allows StringAdapter to work with any basic_string, not just std::string or std::wstring.

Actually, shouldn't we do exactly like VectorAdapter and have both types in the class template

Well, doing it like VectorAdapter would require adding second template parameter to StringAdapter and I'm not sure how to do it correctly in Generator. It would be easier to make conversion operator itself a function template (basically rename template<typename P> P* getPtrImpl to template<typename P> operator P*()).

Ideally, I would also like to add static check so that it would be a compilation error if P and T have different sizes. But that can potentially break existing code (you wouldn't be able to compile both testStdWString functions on the same platform, for example).

@saudet
Copy link
Member

saudet commented Jan 12, 2021

Well, doing it like VectorAdapter would require adding second template parameter to StringAdapter and I'm not sure how to do it correctly in Generator. It would be easier to make conversion operator itself a function template (basically rename template<typename P> P* getPtrImpl to template<typename P> operator P*()).

We don't need to change anything in Generator for this. A second template parameter should work the same as VectorAdapter. If you get any errors when doing that though, please let me know!

Ideally, I would also like to add static check so that it would be a compilation error if P and T have different sizes. But that can potentially break existing code (you wouldn't be able to compile both testStdWString functions on the same platform, for example).

When the types are fixed in the class template, that's pretty hard to get wrong, so I don't think we need to worry about that.

@equeim
Copy link
Contributor Author

equeim commented Jan 12, 2021

We don't need to change anything in Generator for this. A second template parameter should work the same as VectorAdapter. If you get any errors when doing that though, please let me know!

How to use it correctly though? It seems that Parser generates annotations that make Generator use only one template parameter (with second defaulting to be same as first). Looks like it uses @Cast instead for some reason.
I tried to do it manually like this:

std::vector<char16_t> testVector();
public static native @StdVector("unsigned short, char16_t") char[] testVector();
// Parser generated
// public static native @Cast("char16_t*") @StdVector CharPointer testVector();

And immediately hit compilation error:

invalid conversion from «__gnu_cxx::__alloc_traits<std::allocator<char16_t>, char16_t>::value_type*» {aka «char16_t*»} to «const short unsigned int*»

It looks like a bug in VectorAdapter's operator const P*() - it doesn't cast &vec[0] from T* to P*.

@saudet
Copy link
Member

saudet commented Jan 13, 2021

Right, there's probably just a cast missing there. I'm not able to find a working example at the moment. I had to use it in the past, but it looks like I'm not using it anymore as part of the presets, so it's not getting tested. So, don't worry too much about it :)

@saudet
Copy link
Member

saudet commented Jan 20, 2021

Any updates on moving the type template? It should be pretty easy to make it the same as VectorAdapter, but let's not worry about testing this too much if it's not used right now for StringAdapter either anyway.

@equeim
Copy link
Contributor Author

equeim commented Jan 20, 2021

Sorry for delay.
I though about it for some time, and I can't see a way to introduce second tempate parameter (while removing existing manual overloads) and don't break existing Java code. Right now StringAdapter works as it is because of these overloads. If we remove them and introduce template parameter, then StringAdapter will require specializing it manually or using Cast instead, like VectorAdapter. Meaning that this code:

static native @StdString BytePointer testStdString(@StdString BytePointer str);

will break because C++ compiler will complain about converting from StringAdapter<char> to signed char*.

@equeim
Copy link
Contributor Author

equeim commented Jan 20, 2021

I updated PR to illustrate the issue.

@saudet
Copy link
Member

saudet commented Jan 21, 2021

Well, it may not be fully backward compatible, but it seems to work fine when handled more closely to VectorAdapter, like this:

diff --git a/src/main/java/org/bytedeco/javacpp/annotation/StdString.java b/src/main/java/org/bytedeco/javacpp/annotation/StdString.java
index e2b0940..aaf0080 100644
--- a/src/main/java/org/bytedeco/javacpp/annotation/StdString.java
+++ b/src/main/java/org/bytedeco/javacpp/annotation/StdString.java
@@ -17,7 +17,7 @@ import org.bytedeco.javacpp.tools.Generator;
  */
 @Documented @Retention(RetentionPolicy.RUNTIME)
 @Target({ElementType.METHOD, ElementType.PARAMETER})
-@Cast({"std::basic_string", "&"}) @Adapter("StringAdapter")
+@Adapter("StringAdapter")
 public @interface StdString {
-    String value() default "char";
-}
\ No newline at end of file
+    String value() default "";
+}
diff --git a/src/main/java/org/bytedeco/javacpp/tools/Generator.java b/src/main/java/org/bytedeco/javacpp/tools/Generator.java
index 4755371..7e2af9e 100644
--- a/src/main/java/org/bytedeco/javacpp/tools/Generator.java
+++ b/src/main/java/org/bytedeco/javacpp/tools/Generator.java
@@ -1082,7 +1082,7 @@ public class Generator {
             out.println("#include <string>");
             out.println("template<typename P, typename T = P> class JavaCPP_hidden StringAdapter {");
             out.println("public:");
-            out.println("    StringAdapter(const P* ptr, typename std::basic_string<T>::size_type size, void* owner) : str(str2) { assign(ptr, size, owner); }");
+            out.println("    StringAdapter(const P* ptr, typename std::basic_string<T>::size_type size, void* owner) : str(str2) { assign((P*)ptr, size, owner); }");
             out.println();
             out.println("    StringAdapter(const std::basic_string<T>& str) : size(0), owner(NULL), ptr(NULL), str2(str), str(str2) { }");
             out.println("    StringAdapter(      std::basic_string<T>& str) : size(0), owner(NULL), ptr(NULL), str(str) { }");
diff --git a/src/test/java/org/bytedeco/javacpp/AdapterTest.java b/src/test/java/org/bytedeco/javacpp/AdapterTest.java
index e1fdcd3..e367d94 100644
--- a/src/test/java/org/bytedeco/javacpp/AdapterTest.java
+++ b/src/test/java/org/bytedeco/javacpp/AdapterTest.java
@@ -48,13 +48,13 @@ import static org.junit.Assert.*;
 public class AdapterTest {
 
     static native @StdString String testStdString(@StdString String str);
-    static native @StdString BytePointer testStdString(@StdString BytePointer str);
+    static native @StdString @Cast("char*") BytePointer testStdString(@StdString @Cast("char*") BytePointer str);
 
-    static native @StdWString CharPointer testStdWString(@StdWString CharPointer str);
-    static native @StdWString IntPointer testStdWString(@StdWString IntPointer str);
+    static native @StdString @Cast("wchar_t*") CharPointer testStdWString(@StdString @Cast("wchar_t*") CharPointer str);
+    static native @StdString @Cast("wchar_t*") IntPointer testStdWString(@StdString @Cast("wchar_t*") IntPointer str);
 
-    static native @StdString("char16_t") CharPointer testStdU16String(@StdString("char16_t") CharPointer str);
-    static native @StdString("char32_t") IntPointer testStdU32String(@StdString("char32_t") IntPointer str);
+    static native @StdString @Cast("char16_t*") CharPointer testStdU16String(@StdString @Cast("char16_t*") CharPointer str);
+    static native @StdString @Cast("char32_t*") IntPointer testStdU32String(@StdString @Cast("char32_t*") IntPointer str);
 
     static native String testCharString(String str);
     static native @Cast("char*") BytePointer testCharString(@Cast("char*") BytePointer str);
@@ -63,7 +63,7 @@ public class AdapterTest {
 
     static native IntPointer testIntString(IntPointer str);
 
-    static native @Const @ByRef @StdString byte[] getConstStdString();
+    static native @Const @ByRef @StdString @Cast("char*") byte[] getConstStdString();
 
     static class SharedData extends Pointer {
         SharedData(Pointer p) { super(p); }

So, what about this. We leave StringAdapter and @StdString like they are, but we introduce a new BasicStringAdapter that is pretty much like your modified StringAdapter in the last commit above, along with new annotations like @StdBasicString, @StdU16String, and @StdU32String?

@equeim
Copy link
Contributor Author

equeim commented Jan 21, 2021

So, @StdU16String and @StdU32String will use BasicStringAdapter, while @StdString and @StdWString will use StringAdapter?

@saudet
Copy link
Member

saudet commented Jan 22, 2021

Yes, that would be the idea. We could probably switch @StdWString to @StdBasicString as well since I'm not aware of many users, but that means making any changes there is probably moot. Thoughts?

Add BasicStringAdapter, which is new version of StringAdapter
that can work with any basic_string<T>.

Also add `@StdU16String` and `@StdU32String` annotations
to use with `std::u16string` and `std::u32string`.
@saudet saudet merged commit 7918521 into bytedeco:master Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants