Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Table.GenerateByPage to handle empty tables being returned #484

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

bgribaudo
Copy link
Contributor

Per Table.GenerateByPage's Helper Functions description, this method should call getNextPage until that callback returns null, then will combine all returned pages into a single table.

However, when the existing Table.GenerateByPage returns an empty table, its actual behavior differs from the above description (see below examples). If the callback returns an empty table, Table.GenerateByPage may output 0 rows or an extra, all-null row--neither of which seems desirable or in alignment with the above description.

This PR addresses these behavior anomalies. (Note: It would be good for someone with internal Power Query knowledge to validate this PR to ensure that it is done in a way that aligns with PQ's lazy paradigm and that it does not trigger extra streaming.)

Demo of Existing's Behavior Anomalies

let
    // Existing version of this method
    Table.GenerateByPage = 
        (getNextPage as function) as table =>
            let        
                listOfPages = List.Generate(
                    () => getNextPage(null),            // get the first page of data
                    (lastPage) => lastPage <> null,     // stop when the function returns null
                    (lastPage) => getNextPage(lastPage) // pass the previous page to the next function call
                ),
                // concatenate the pages together
                tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
                firstRow = tableOfPages{0}?
            in
                // if we didn't get back any pages of data, return an empty table
                // otherwise set the table type based on the columns of the first page
                if (firstRow = null) then
                    Table.FromRows({})
                // check for empty first table
                else if (Table.IsEmpty(firstRow[Column1])) then
                    firstRow[Column1]
                else
                    Value.ReplaceType(
                        Table.ExpandTableColumn(tableOfPages, "Column1", Table.ColumnNames(firstRow[Column1])),
                        Value.Type(firstRow[Column1])
                    )
    ,
    PageFunction = (previous) => 
        let
            /*
            // Actual: In this scenario, 2 rows are output, the 2nd holding a null column value because Response2 contained 0 rows.
            // Expected: Only 1 row (from Response1) should have been returned.
            Response1 = #table({"A"},{{1}}),
            Response2 = #table({"A"},{})
            */
            // Actual: In this scenario, 0 rows are output. Response1 contained no rows so Response2's data is skipped
            // Expected: 1 row returned (row from Response2)
            Response1 = #table({"A"},{}),
            Response2 = #table({"A"},{{1}})
        in
            if previous = null then Response1 
            else if previous = Response1 then Response2
            else null,
    Result = Table.GenerateByPage(PageFunction)
in
    Result

@CurtHagenlocher
Copy link
Contributor

We specifically can't use Table.Combine, as that will effectively break the streaming behavior for most cases.

FWIW, we somehow ended up with an older implementation as the "boilerplate" for this function. A slightly newer definition -- albeit one which still doesn't fix the "empty row" problem -- looks like this:

Table.GenerateByPage = (
    getNextPage as function,
    optional tableType as type
) as table =>
    let
        listOfPages = List.Generate(
            () => getNextPage(null),
            (lastPage) => lastPage <> null,
            (lastPage) => getNextPage(lastPage)
        ),
        tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
        firstRow = tableOfPages{0}?,
        keys = if tableType = null then Table.ColumnNames(firstRow[Column1])
            else Record.FieldNames(Type.RecordFields(Type.TableRow(tableType))),
        appliedType = if tableType = null then Value.Type(firstRow[Column1]) else tableType
    in
        if tableType = null and firstRow = null then
            Table.FromRows({})
        else
            Value.ReplaceType(
                Table.ExpandTableColumn(tableOfPages, "Column1", keys),
                appliedType
            )

What's better about it is that when you know the table type, it avoids the duplicate request for the first page.

@bgribaudo
Copy link
Contributor Author

@CurtHagenlocher,

Thanks for that revised logic! I was planning to look into the effects of the first-row-get-type logic today--but the revision you provided should save me that effort. :-)

Is switching from Table.Combine to filtering listOfPages using Table.IsEmpty any better from the streaming perspective (see PR revision made a few minutes ago)?

Without this change, `try Table.GenerateByPage(each error "bad", type table[A=any])` won't catch the `error "bad"` raised here (which simulates the first data page fetch attempt raising an error).
@bgribaudo
Copy link
Contributor Author

@CurtHagenlocher, do you like either of the last two commits ("Adjusting error propagation" or "Reverting last change to avoid triggering a row fetch")?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants