Python API: Segfault with nodes() and rels() when using get_as_arrow() #3224

Closed
cmdlineluser opened this issue Apr 6, 2024 · 3 comments · Fixed by #2539
@cmdlineluser

import kuzu
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    
    users = tmp / "users.csv"
    users.write_bytes(b"A\nB\nC\n")
    
    follows = tmp / "follows.csv"
    follows.write_bytes(b"A,B\nB,C\n")

    db = kuzu.Database(tmp)
    c = kuzu.Connection(db)

    c.execute("CREATE NODE TABLE User(name STRING, PRIMARY KEY (name))")
    c.execute("CREATE REL TABLE Follows(FROM User TO User)")

    c.execute(f'COPY User FROM "{str(users)}"')
    c.execute(f'COPY Follows FROM "{str(follows)}"')

    df = c.execute("""
    MATCH (a:User)-[f:Follows* SHORTEST]->(b:User) 
    RETURN a.name, b.name, nodes(f) as nodes
    """).get_as_pl()
    
    print(df)
    # segfault

I guess it's an Arrow issue, as .get_as_arrow() also segfaults.

.get_as_df() returns a pandas DataFrame without error:

  a.name b.name                                              nodes
0      B      C                                                 []
1      A      B                                                 []
2      A      C  [{'_id': {'offset': 1, 'table': 0}, '_label': ...

properties(nodes(f), 'name') works as expected and returns a list[str], which suggests the {} dict/struct type may be the issue.
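Until the struct conversion is fixed, the observation above can be turned into a stopgap. This is only a sketch: `WORKAROUND_QUERY`, `node_names`, and `shortest_paths_names` are names made up for illustration, and `conn` is assumed to be a `kuzu.Connection` set up as in the reproduction. It relies solely on `execute()` and `get_as_pl()`, the calls already used in the report, and on the reported (not independently verified) behaviour that the `properties(...)` form returns a list of strings.

```python
# Query text from the report, with nodes(f) replaced by the
# properties(nodes(f), 'name') form that was reported to work.
WORKAROUND_QUERY = """
MATCH (a:User)-[f:Follows* SHORTEST]->(b:User)
RETURN a.name, b.name, properties(nodes(f), 'name') AS node_names
"""


def shortest_paths_names(conn):
    """Run the workaround query and return the result via get_as_pl().

    `conn` is assumed to behave like a kuzu.Connection: it must expose
    execute(query), whose result exposes get_as_pl().
    """
    return conn.execute(WORKAROUND_QUERY).get_as_pl()
```

This only sidesteps the struct-typed column; it does not return the full node structs (`_id`, `_label`, and so on) that `nodes(f)` would.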

@prrao87
Member

prrao87 commented Apr 6, 2024

Yep, that seems to be an Arrow issue, so once that's addressed, get_as_pl() should also work as intended.

@prrao87 prrao87 added the bug Something isn't working label Apr 6, 2024
@prrao87 prrao87 changed the title from "Python API: nodes() and rels() causing segfault when calling .get_as_pl()" to "Python API: Segfault with nodes() and rels() when using get_as_arrow()" Apr 6, 2024
@mxwli mxwli linked a pull request Apr 8, 2024 that will close this issue
@andyfengHKU
Contributor

I wonder if @cmdlineluser can double check if this fixes the issue.

@prrao87
Member

prrao87 commented Apr 11, 2024

@cmdlineluser you can test this out on your sample DB with the latest pre-release:

pip install --pre kuzu

Let us know!
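When re-testing, it may help to run the reproduction in a child process so that a segfault does not take down the shell or notebook doing the check. The sketch below uses only the standard library; `run_isolated` is a hypothetical helper name, and the signal check assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> tuple[bool, int]:
    """Run `code` in a fresh Python interpreter.

    Returns (killed_by_sigsegv, returncode). On POSIX, a child terminated
    by signal N has returncode == -N, so a segfault shows up as -SIGSEGV.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # keep the child's output out of our terminal
        timeout=timeout,
    )
    return proc.returncode == -signal.SIGSEGV, proc.returncode
```

Passing the reproduction script's source as `code` would then give `(True, ...)` if the crash is still present and `(False, 0)` if the script runs to completion.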
