### What happened?

I loaded an ordinary Polars DataFrame (no exotic data types) from Parquet and tried to write it as a Lance v2 table, but the write panicked with an index-out-of-bounds error. Here is my Python code:
```python
import asyncio
import os
import polars as pl
import lancedb

# 8/1/24 - The goal is to successfully make a Lance v2 table and compare the compression to Parquet

# Set environment variables to enable the new features
os.environ['LANCE_USE_FSST'] = '1'
os.environ['LANCE_USE_BITPACKING'] = '1'

# Check that the lancedb folder exists. If not, create it
if not os.path.exists('data'):
    os.makedirs('data')

async def main():
    df = pl.read_parquet('data/transactions.parquet')
    with await lancedb.connect_async("data") as conn:
        await conn.create_table(name="transactions_v1", data=df, use_legacy_format=True)
        await conn.create_table(name="transactions_v2", data=df, use_legacy_format=False)

# Run the async main function
asyncio.run(main())
```
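The script's stated goal is to compare Lance v2 compression against Parquet. Once both tables exist, one rough way to compare is to total the bytes on disk for the source Parquet file and for each table's directory. This is a minimal sketch: it assumes LanceDB stores a table named `name` under `data/<name>.lance` (worth checking locally), and `dir_size_bytes` and `compare_sizes` are hypothetical helpers, not LanceDB APIs.

```python
import os
from pathlib import Path
from typing import Dict, Union


def dir_size_bytes(path: Union[str, os.PathLike]) -> int:
    """Size of a single file, or the summed size of all files under a directory.

    A path that does not exist reports 0, because os.walk yields nothing for it.
    """
    p = Path(path)
    if p.is_file():
        return p.stat().st_size
    total = 0
    for root, _dirs, files in os.walk(p):
        for name in files:
            fp = Path(root) / name
            # Skip symlinks so linked data is not double-counted
            if fp.is_file() and not fp.is_symlink():
                total += fp.stat().st_size
    return total


def compare_sizes(paths: Dict[str, str]) -> Dict[str, int]:
    """Map each label to the on-disk size of its path."""
    return {label: dir_size_bytes(p) for label, p in paths.items()}


if __name__ == "__main__":
    # ASSUMPTION: LanceDB lays tables out as data/<table_name>.lance
    sizes = compare_sizes({
        "parquet": "data/transactions.parquet",
        "lance_v1": "data/transactions_v1.lance",
        "lance_v2": "data/transactions_v2.lance",
    })
    for label, size in sizes.items():
        print(f"{label}: {size / 1e6:.2f} MB")
```

The Lance numbers include any metadata files in the table directory, not just the data files, so treat the comparison as approximate.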
Error:

```
thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-encoding-0.15.0/src/encodings/physical/bitpack.rs:161:13:
index out of bounds: the len is 34689 but the index is 34689
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-encoding-0.15.0/src/encodings/logical/primitive.rs:421:32:
called `Result::unwrap()` on an `Err` value: JoinError::Panic(Id(2158), ...)
Traceback (most recent call last):
  File "/home/evan/Documents/hypersync_lancev2/write_lance.py", line 23, in <module>
    asyncio.run(main())
  File "/home/evan/.rye/py/cpython@3.12.2/install/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/evan/.rye/py/cpython@3.12.2/install/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/evan/.rye/py/cpython@3.12.2/install/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/evan/Documents/hypersync_lancev2/write_lance.py", line 20, in main
    await conn.create_table(name="transactions_v2", data=df, use_legacy_format=False)
  File "/home/evan/Documents/hypersync_lancev2/.venv/lib/python3.12/site-packages/lancedb/db.py", line 778, in create_table
    new_table = await self._inner.create_table(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_asyncio.RustPanic: rust future panicked: unknown error
```
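Both panics originate in `lance-encoding` 0.15.0, the first one in `encodings/physical/bitpack.rs`, and the script opts into bitpacking and FSST through `LANCE_USE_BITPACKING` and `LANCE_USE_FSST`. A natural first check is whether the v2 write still panics with bitpacking switched off. The sketch below reruns a script once per on/off combination of those two variables, each in a fresh subprocess so a panic only takes down that child process. It assumes the two `os.environ[...] = '1'` lines are removed from `write_lance.py` (otherwise the script overrides whatever the parent passes in) and that tables left over from a previous run are cleared before each run, for example through the `before_each` hook. `flag_combinations`, `build_env`, and `run_matrix` are hypothetical helpers.

```python
import itertools
import os
import subprocess
import sys

# Lance feature flags the repro script was setting to "1".
FLAGS = ("LANCE_USE_FSST", "LANCE_USE_BITPACKING")


def flag_combinations(flags=FLAGS):
    """Yield every on/off combination of the flags as a {name: bool} dict."""
    for values in itertools.product((False, True), repeat=len(flags)):
        yield dict(zip(flags, values))


def build_env(combo, base=None):
    """Copy the base environment, set each enabled flag to "1", remove the others."""
    env = dict(os.environ if base is None else base)
    for name, enabled in combo.items():
        if enabled:
            env[name] = "1"
        else:
            # Unset instead of "0": it is not confirmed here whether Lance
            # checks the value or only the presence of the variable.
            env.pop(name, None)
    return env


def run_matrix(script="write_lance.py", before_each=None):
    """Run `script` once per flag combination in a fresh interpreter.

    Returns {tuple of sorted (flag, enabled) pairs: exit code}; a non-zero
    code means that run failed (for example with a RustPanic).
    """
    results = {}
    for combo in flag_combinations():
        if before_each is not None:
            before_each()  # e.g. delete tables left over from the previous run
        proc = subprocess.run([sys.executable, script], env=build_env(combo))
        results[tuple(sorted(combo.items()))] = proc.returncode
    return results
```

If only the combinations with `LANCE_USE_BITPACKING` enabled fail, that would point at the bitpacking encoder rather than at the data itself.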
### Are there known steps to reproduce?
The dataset can be reproduced by running `pip install hypersync` and then this script, which downloads the data:
```python
import asyncio
import time

import hypersync
from hypersync import BlockField, ColumnMapping, DataType, TransactionField, TransactionSelection


async def historical_blocks_txs_sync():
    """
    Use hypersync to query blocks and transactions for a fixed block range
    and write them as Parquet files into the `data` directory.
    """
    # hypersync client with the default configuration
    client = hypersync.HypersyncClient(hypersync.ClientConfig())

    # set the block range
    from_block: int = 20000000
    to_block: int = 20025000

    # add +/-1 to the block range because the query is not inclusive to the block number
    query = hypersync.Query(
        from_block=from_block - 1,
        to_block=to_block + 1,
        include_all_blocks=True,
        transactions=[TransactionSelection()],
        field_selection=hypersync.FieldSelection(
            block=[e.value for e in BlockField],
            transaction=[e.value for e in TransactionField],
        ),
    )

    # Setting this number lower reduces client sync console error messages.
    query.max_num_transactions = 1_000  # for troubleshooting

    # configuration settings to predetermine the output types
    config = hypersync.StreamConfig(
        hex_output=hypersync.HexOutput.PREFIXED,
        column_mapping=ColumnMapping(
            transaction={
                TransactionField.GAS_USED: DataType.FLOAT64,
                TransactionField.MAX_FEE_PER_BLOB_GAS: DataType.FLOAT64,
                TransactionField.MAX_PRIORITY_FEE_PER_GAS: DataType.FLOAT64,
                TransactionField.GAS_PRICE: DataType.FLOAT64,
                TransactionField.CUMULATIVE_GAS_USED: DataType.FLOAT64,
                TransactionField.EFFECTIVE_GAS_PRICE: DataType.FLOAT64,
                TransactionField.NONCE: DataType.INT64,
                TransactionField.GAS: DataType.FLOAT64,
                TransactionField.MAX_FEE_PER_GAS: DataType.FLOAT64,
                TransactionField.VALUE: DataType.FLOAT64,
            },
            block={
                BlockField.GAS_LIMIT: DataType.FLOAT64,
                BlockField.GAS_USED: DataType.FLOAT64,
                BlockField.SIZE: DataType.FLOAT64,
                BlockField.BLOB_GAS_USED: DataType.FLOAT64,
                BlockField.EXCESS_BLOB_GAS: DataType.FLOAT64,
                BlockField.BASE_FEE_PER_GAS: DataType.FLOAT64,
                BlockField.TIMESTAMP: DataType.INT64,
            },
        ),
    )

    # write the query results as Parquet files under ./data
    return await client.collect_parquet('data', query, config)


# time the query
start_time = time.time()
data = asyncio.run(historical_blocks_txs_sync())
end_time = time.time()
print(f"Time taken: {end_time - start_time}")
```
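To narrow the panic down to one column of `transactions.parquet`, the columns can be bisected. Below is a minimal sketch of that idea: it takes the column names and a predicate `fails(columns)` that returns `True` when writing only those columns to a v2 table reproduces the panic. It assumes a single column is enough to trigger the failure, which the panic message does not confirm (the bug could also depend on a combination of columns or on the row count). `find_failing_column` and `fails` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence


def find_failing_column(columns: Sequence[str],
                        fails: Callable[[List[str]], bool]) -> Optional[str]:
    """Return one column that reproduces the failure on its own, or None.

    Halves the candidate set each round, keeping whichever half still fails.
    """
    candidates = list(columns)
    if not candidates or not fails(candidates):
        return None  # the full column set does not reproduce the failure
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left, right = candidates[:mid], candidates[mid:]
        if fails(left):
            candidates = left
        elif fails(right):
            candidates = right
        else:
            # Neither half fails alone: the failure needs columns from both halves
            return None
    return candidates[0]
```

A `fails` predicate could, for instance, write `df.select(columns)` to a freshly named table with `create_table(..., use_legacy_format=False)` and report whether that raised; running each attempt in its own process would also guard against a panic that does not surface as a Python exception.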
### LanceDB version

v0.7.14