Partitions not pushed down #2090

Closed
rtyler opened this issue Jan 18, 2024 · 1 comment
Labels: binding/rust (Issues for the Rust crate), bug (Something isn't working)

rtyler commented Jan 18, 2024

Environment

Delta-rs version: 0.16.5

Binding: Rust

Environment:

  • Cloud provider: AWS/S3
  • OS: Linux
  • Other:

Bug

The table is partitioned by ds. Whenever the DataFusion query includes criteria beyond the partition filter, the partition filter does not appear to be pushed down and a full table scan happens.

use deltalake::arrow::util::pretty::print_batches;
use deltalake::datafusion::execution::context::SessionContext;
use deltalake::datafusion::prelude::*;
use deltalake::datafusion::common::*;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    let ctx = SessionContext::new();
    let table = deltalake::open_table("s3://REDACTED")
        .await
        .expect("Failed to open the delta table");

    let df = ctx.read_table(Arc::new(table)).expect("Failed to load table");
    let batches = df.filter(col("ds").eq(Expr::Literal(ScalarValue::from("2024-01-18"))))?
                    // adding this filter results in an apparent full table scan.
                    //
                    // With this commented out the job runs in 13s
                    //.filter(col("event_name").eq(Expr::Literal(ScalarValue::from("REDACTED"))))?
                    .limit(0, Some(10))?
                    .collect()
                    .await
                    .expect("Failed to build dataframe");
    /*
     * This SQL query also appears to trigger a scan
    ctx.register_table("source", Arc::new(table)).expect("Failed to register table with datafusion");
    let batches = ctx
        .sql("SELECT * FROM source WHERE ds = '2024-01-18' AND event_name = 'REDACTED' LIMIT 10")
        .await
        .expect("Failed to execute query")
        .collect()
        .await
        .expect("Failed to collect batches");
    */

    print_batches(&batches).expect("Failed to print batches");
    Ok(())
}
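
For context, the behavior this report expects can be modeled roughly as below. This is a conceptual sketch in plain Rust, not delta-rs or DataFusion code: `DataFile`, `EqPredicate`, `split_predicates`, `prune_files`, and the helper constructors are hypothetical names introduced only for illustration, and only equality predicates joined by AND are assumed. The idea is that the predicate on ds should prune files from table metadata alone, with the extra event_name predicate applied only to the files that survive.

```rust
use std::collections::HashMap;

/// A data file in a hypothetical partitioned table: its path plus the
/// partition values recorded in table metadata (not read from the file).
#[derive(Debug, Clone)]
struct DataFile {
    path: String,
    partition_values: HashMap<String, String>,
}

/// A simple equality predicate such as `ds = '2024-01-18'`.
#[derive(Debug, Clone, PartialEq)]
struct EqPredicate {
    column: String,
    value: String,
}

/// Illustrative constructor for a file with a single `ds` partition value.
fn file(path: &str, ds: &str) -> DataFile {
    DataFile {
        path: path.to_string(),
        partition_values: HashMap::from([("ds".to_string(), ds.to_string())]),
    }
}

/// Illustrative constructor for an equality predicate.
fn pred(column: &str, value: &str) -> EqPredicate {
    EqPredicate {
        column: column.to_string(),
        value: value.to_string(),
    }
}

/// Splits an AND-ed list of predicates into (partition predicates, data predicates).
/// Partition predicates can be answered from metadata; data predicates need file contents.
fn split_predicates(
    predicates: &[EqPredicate],
    partition_columns: &[&str],
) -> (Vec<EqPredicate>, Vec<EqPredicate>) {
    predicates
        .iter()
        .cloned()
        .partition(|p| partition_columns.contains(&p.column.as_str()))
}

/// Keeps only the files whose partition values satisfy every partition predicate.
fn prune_files<'a>(files: &'a [DataFile], partition_preds: &[EqPredicate]) -> Vec<&'a DataFile> {
    files
        .iter()
        .filter(|f| {
            partition_preds
                .iter()
                .all(|p| f.partition_values.get(&p.column) == Some(&p.value))
        })
        .collect()
}
```

In this model, adding a second predicate on a non-partition column changes only the data-predicate list. The set of files returned by `prune_files` stays the same, which is why a full table scan in that case looks like a bug.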

What happened:

What you expected to happen:

How to reproduce it:

More details:

rtyler added the bug and binding/rust labels on Jan 18, 2024

rtyler commented Jan 20, 2024

I'm going to close this. I will re-open it if I can reproduce it, but I had a repro case last week and cannot reproduce it again 🤦

rtyler closed this as completed on Jan 20, 2024