Please add comments to the associated issue #355.
There is an experimental implementation of a query facility for Ballerina, which builds on the table and stream features.
List comprehensions are a well-established feature of many programming languages. At a high level, queries can be viewed as a kind of comprehension. They consist of:
- iterating over some kind of collection
- for each member of the collection
- binding variables
- using those variables to perform a boolean test
- if the test succeeds, using those variables to produce some new values
- constructing a new collection using the values produced in the iteration
An SQL select statement is a kind of comprehension that iterates over the rows of a table. The XQuery FLWOR statement is a kind of comprehension that iterates over XML sequences.
C# provides a sophisticated language-integrated query (LINQ) facility, with an SQL-like syntax, which is a generalized and sophisticated kind of list comprehension.
This document proposes a simple form of language-integrated query for Ballerina, with a syntax similar to C#.
In its simplest form, a query has the following syntax:
query-expr := (foreach-clause [where-clause])+ select-clause
foreach-clause := "foreach" typed-binding-pattern "in" expression
where-clause := "where" expression
select-clause := "select" expression
A query-expr
is a kind of comprehension that builds on existing Ballerina language elements:
- the
foreach-clause
iterates and binds variables- it works like the existing
foreach
statement - the
in
expression must result in an iterable value, which is used to create an iterator object - an iterator object is created from the iterable value
- the typed-binding-pattern can use destructuring to bind multiple variables
- any variables bound by the typed-binding-pattern are in scope in the where-clause and the select-clause
- multiple foreach clauses are similar to a join
- it works like the existing
- the
where
clause uses a boolean expression to filter - the
select
clause produces values, which become members of the constructed collection
A common case is to query some kind of collection of records, where the collection might be a list, a table or a stream. In this case:
- the typed-binding-pattern would often include a mapping-binding-pattern
- the expression in the select-clause would typically be a mapping constructor expression
A query-expr
is not restricted to constructing lists. The type of value constructed is determined as follows:
- If there is a contextually expected type that allows only one basic type of structure, then the contextually expected type determines what is constructed. In this case,
the contextually expected type for the
query-expr
will determine the contextually expected type for the expression in the select clause. - Otherwise, the basic type of the value cthat is constructed comes from the basic type of the value being iterated over. In other words the static type of the
in
expression in the foreach-clause determines the basic type of what is constructed. The static type of theselect
expression determines the member type of what is constructed. The static typing of mapping constructor expr will need to be refined to work as expected in the select-clause, since the spec currently makes the static type of a mapping constructor expr bemap<T>
if there is no contextually expected type; instead we would want to say that when the query-expr has no contextually expected type, the contextually expected type for the select-expr consists of only a preference for a specific record/tuple type rather than a broader map/list. To make this work in an extensible way with objects representing collections, we will need a richerQueryable
interface similar toIteratable
that allows for construction. This could also support queries that mutate collections (see below).
Assuming
type Person record {|
string firstName;
string lastName;
string country;
|};
Person[] personList;
We could write
var x =
foreach var person in personList
where person.country == "UK"
select {
firstName: person.firstName,
lastName: person.lastName
};
or
var x =
foreach var { firstName: nm1, lastName: nm2, country: c } in personList
where c == "UK"
select { firstName: nm1, lastName: nm2 };
This can be shortened to:
var x =
foreach var { firstName, lastName, country } in personList
where country == "UK"
select { firstName: firstName, lastName: lastName };
but we should also have a shortcut mapping constructor syntax to parallel the shortcut mapping binding pattern syntax, so you can write:
var x =
foreach var { firstName, lastName, country } in personList
where country == "UK"
select { firstName, lastName };
The inferred type of x would be:
record {| string firstName; string lastName; |}[]
We also should make spread operator work in mapping constructors:
var x =
foreach var { country, ...rest } in personList
where country == "UK"
select { country, ...rest }
The syntax proposed above is inspired by C# LINQ (which was in turn inspired by SQL). C# uses from
instead of foreach
presumably following SQL. However I don't think from
reads well when it is before the select
, as it must be to ensure that declarations always lexically precede their usage.
var x =
from var { firstName, lastName, country } in personList
where country == "UK"
select { firstName, lastName };
More importantly, foreach
in a query is conceptually doing exactly the same thing as foreach in the existing foreach
statement, i.e. binding a variable to successive members of an iteration.
For uniformity, we could also allow our existing foreach
statement to have a where clause.
The reason to stick with select
rather a keyword that suggests addition (e.g. add
or insert
) is to avoid making it seem that this mutates an existing table, particularly if we have keywords update
and delete
that do mutate (see below).
A keyword-based syntax enables adding more clauses to provide richer query functionality. For example, C# provides join
, group
and orderby
clauses.
The proposed SQL-like syntax creates an important design problem. What type of structure does the query-expr
construct?
The proposed rule that a query-expr should by default create a collection of the same kind as it is iterating over is motivated by consistency with the map/filter functions, which always construct a result of the same basic type as their first argument.
In C# the query syntax is sugar that maps onto a variety of complex method calls making sophisticated use of parameterized types. The result will be an IEnumerable<T>
for some T. In our case, we want the query syntax to simply produce a table. However, the comprehension concept (and indeed syntax) works just as well for lists (and other iterable collections) as it does for tables.
With a comprehension, you have a source collection type and a result collection type. These do not have to be the same. It is straightforward to allow the expression in the from clause to be anything iterable, just as with foreach. The problem is how to decide what the type of the result collection should be.
An alternative would be use a keyword other than select when constructing xml or list values.
Note that the type of <T>E
is the intersection of T and the static type of E. So you can use <table>query-expr
to construct a table from a list, without having to specify the precise type of the table.
There is an alternative syntactic approach based on constructor syntax, which makes explicit the kind of thing that is explicit. This is the usual syntactic approach for list comprehensions e.g. Python. Applying this approach to lists for Ballerina, would give something like
list-comprehension-expr :=
"[" (foreach-clause [where-clause])+ select-clause "]"
foreach-clause := "foreach" typed-binding-pattern "in" expression
where-clause := "where" expression
select-clause := "select" expression
For example:
var list = [foreach var i in 1 ..< 10 where i % 2 = 0 select i]
Applying this to tables might give us something like this:
var x =
table [
foreach var { firstName, lastName, country } in personList
where country == "UK"
add { firstName, lastName }
]
I'm not sure if I like this better. It feels a little better to me with curly braces:
var x =
table {
foreach var { firstName, lastName, country } in personList
where country == "UK"
select { firstName, lastName }
}
Problems with this approach:
- does not extend to update/delete
- does not extend (at least not obviously) to XML
- does not extend to aggregations
We want something like this to work:
foreach var { firstName, lastName, country } in personList
where country == "UK"
orderby firstName, lastName
select { firstName, lastName }
foreach var person in personList
orderby person.firstName, person.lastName
select person
The orderby clause conceptually specifies a function from a row to a sequence of values, which is used to sort the rows.
We want to allow this to do the right i18n thing without building this into the core language (different locales need different sort orders). Probably the orderby clause needs to be able to reference a collation object that can be provided by a standard library module.
This should work for lists and xml as well.
The static type of the expressions in the orderby clause must be ordered, where the following rules determine whether a type is ordered:
- a subtype of a simple basic type is ordered
- strings are ordered as with codePointCompare
- numbers are ordered by <
- -0.0f is before +0.0f
- need to say how NaN is handled (IEEE has a totalOrder function)
- if T is ordered, T? is ordered; nil comes first
- if T is ordered, T[] is ordered; ordering is lexicographic
C# LINQ does this quite nicely:
join-clause :=
"join" typed-binding-pattern "in" iterable-expr
"on" left-expr "equals" right-expr
This does an inner equi-join (which is the most useful kind of join).
There's some SQL-funkiness as regards handling NULL.
group-clause :=
"group" construct-expr "by" group-expr "into" typed-binding-pattern
construct-expr
and group-expr
are evaluated once for each iteration yielding values c and g.
For each distinct (using ==
) value of g
- a list is constructed of all corresponding values of c,
- the list is bound to typed-binding-pattern
- the following clauses of the query are executed with the variables bound by typed-binding-pattern in scope
TBD: also variant without into
.
XXX: How to determine the result? Typically with into
the result will be a list.
TBD
- sum, average, count etc
- make sense for lists and xml too
- not in C# LINQ syntax
- not a comprehension
Something like this would be possible:
foreach var employee in employeeTable
sum employee.salary
i.e. use an aggregating keyword instead of select, but it's not very extensible,
and it would be weird for count
.
foreach var employee in employeeTable count
An alternative would be too put the aggregation up front:
count foreach var employee in employeeTable
Then sum
might become
sum foreach var employee in employeeTable select employee.salary
More thought is needed on this.
SQL has an INSERT INTO statement that allows rows to be added to a table.
We can provide similar functionality via a query statement that does not use the values produced to construct a new structure, but instead adds them to an existing structure. For example:
foreach var employee in employeeTable
where employee.rating >= 5
insert employee into goodEmployeeTable;
SQL provides statements that mutate the table being iterated over. This cannot be supported by Ballerina's existing iterator interface; a richer interface would be needed.
SQL has update statement (which mutates a table). We could do something similar, e.g.:
foreach var employee in employeeTable
where employee.rating >= 5
update { salary: salary + "1000" };
Semantics are that the row is replaced by a row with the specified fields changed. For this to work the expression after in
has to evaluate to a table.
SQL has a delete statement. We could do something similar, e.g.:
foreach var employee in employeeTable
where employee.rating < 2
delete;
or
delete foreach var employee in employeeTable
where employee.rating < 2;
This is different from the filter function in that it mutates the table. Syntactically, this has some similarities with count
.
For streaming query, there needs to be a way to group things into windows based on a timestamp-valued field in each value being iterated over.
Something like this:
window-clause :=
"window" binding-pattern
["sliding"|"fixed"] duration-expression
"on" timestamp-expression
Sketch of semantics:
- timestamp-expression evaluates to a timestamp which is used to compute the windows
- duration-expression gives the duration of the window
- binding-pattern gets bound to a record
{start: t1, end: t2 }
where t1 and t2 are the start and end timestamps of the window
A group-clause can then be used to group together rows using the variables bound in the binding-pattern of a window-clause.
foreach var r in tab
window w fixed 10 minutes on r.timestamp
group r by w.start into var grp
...
- refine static typing for list and mapping constructors to allow the contextually expected type to be a record or tuple type descriptor rather than just mapping or list type
- allow
foreach
statements to have awhere
clause - abbreviated form of mapping constructor (where field is just an identifier), analogous to abbreviated form of mapping binding pattern
- make spread operator work in mapping constructor