
Table proposal discussion #354

Closed
jclark opened this issue Oct 3, 2019 · 19 comments
Labels
enhancement (Enhancement to language design), lang (Relates to the Ballerina language specification)

Comments

@jclark
Collaborator

jclark commented Oct 3, 2019

This issue is to collect comments on the Tables proposal.

@jclark added the enhancement (Enhancement to language design) and lang (Relates to the Ballerina language specification) labels on Oct 3, 2019
@jclark
Collaborator Author

jclark commented Oct 3, 2019

Comment from @sanjiva: Another variant of persistent tables is to be able to create the persistent table within Ballerina and use the Ballerina type to create the schema for the underlying database engine.

Originally posted by @jclark in #286 (comment)

@jclark
Collaborator Author

jclark commented Oct 6, 2019

Overall I think this proposal is not satisfactory. I will explain the reasons in subsequent comments.

@jclark
Collaborator Author

jclark commented Oct 6, 2019

Compared to T[], one big thing that table<T> provides is the primary field concept. You could think of it as a map of records, where keys can be structured and where keys come from the records. This is useful functionality. The problem is that the immutable clone semantics makes it inconvenient to use and inconsistent with the other structural types in Ballerina. To update a field, you have to

  1. get a record representing the row
  2. clone it
  3. modify the relevant fields
  4. store it back

Whereas what you really want to do is simply:

  1. get a record representing the row
  2. modify the relevant fields
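
For concreteness, here is a minimal sketch of that direct flow, assuming a keyed table whose key fields are readonly (the design Ballerina later adopted) rather than this proposal's immutable-clone semantics; the Employee type and values are purely illustrative.

type Employee record {|
    readonly int id;
    string name;
    float salary;
|};

public function main() {
    table<Employee> key(id) employees = table [
        {id: 1, name: "Alice", salary: 1000.0},
        {id: 2, name: "Bob", salary: 1200.0}
    ];

    // 1. get the record representing the row
    Employee row = employees.get(1);
    // 2. modify the relevant field in place; no clone, no store-back
    row.salary = 1100.0;
}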

So why are we doing the immutable clone? One reason is to enable persistence/transactionality. The other reason is so that the table can enforce the primary key constraints. Otherwise you could get the following situation:

  • some variable r refers to a record
  • the record is stored in a table without copying
  • r is then used to change a field of the record that is part of the primary key, so that its key becomes the same as the key of another record already stored in the table

In #348, @hrennau makes the interesting suggestion of pushing the primary key concept into record. If we did this, then we could say that the fields corresponding to primary keys are immutable, thus preventing the situation described above.
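
As a hedged sketch of how that suggestion might look, using readonly record fields (as Ballerina later adopted them) to make the key fields immutable; the Customer type is illustrative.

type Customer record {|
    readonly int custId;   // primary-key field made immutable on the record itself
    string name;
|};

public function main() {
    Customer c = {custId: 1, name: "Anne"};
    c.name = "Anna";       // non-key fields remain freely mutable
    // c.custId = 2;       // compile-time error: cannot update a readonly record field
}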

@jclark
Collaborator Author

jclark commented Oct 6, 2019

Another, more minor, thing that table<T> brings us compared to T[] is a syntax that avoids repeating field/column names for every row. But we could imagine having a list-of-mappings syntax that provides this same feature but results in a value of type T[]. In other words, something like our table constructor syntax, but one that results in a list-of-records value.
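
A hedged sketch of the comparison; the Employee type is illustrative, and the commented-out constructor syntax is purely hypothetical.

type Employee record {|
    int id;
    string name;
|};

// Today, a list-of-records literal repeats the field names on every row.
Employee[] employees = [
    {id: 1, name: "Alice"},
    {id: 2, name: "Bob"},
    {id: 3, name: "Carol"}
];

// The imagined constructor would state the column names once while still
// producing an Employee[] value, e.g. (hypothetical syntax):
//     employees = rows { (id, name), (1, "Alice"), (2, "Bob"), (3, "Carol") };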

@jclark
Collaborator Author

jclark commented Oct 6, 2019

Another idea underlying the tables proposal was that it would be the gateway to providing a convenient SQL-like language-integrated feature for manipulating tabular data. But in fact it is possible to provide such a feature on top of lists without needing a table data type, as the Query proposal shows.
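
A hedged sketch of such a feature over a plain list, in the query syntax Ballerina later adopted; the Order type and data are illustrative.

type Order record {|
    int id;
    string customer;
    float amount;
|};

public function main() {
    Order[] orders = [
        {id: 1, customer: "Alice", amount: 120.0},
        {id: 2, customer: "Bob", amount: 40.0}
    ];

    // SQL-like filtering and projection over a T[] value; no table type involved.
    string[] bigSpenders = from var o in orders
        where o.amount > 100.0
        select o.customer;
}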

@jclark
Collaborator Author

jclark commented Oct 6, 2019

The most fundamental flaw in the proposal is this. It tries to both

  • treat tables as data, i.e. have the table basic type be part of anydata, and
  • allow transactional access to tables, using the persistent table concept

But it is simply not possible to do both of these things while at the same time keeping to Ballerina's explicit error handling principle. Transactional reads on data can fail in a way that needs to be reported as an error value, not as a panic (because such errors are expected and need to be handled by the programmer, typically using a retry). For example, a transactional read from a database can fail because of a deadlock. But anydata requires read operations, such as clone, that do not return error values.
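
A type-level sketch of the tension, with illustrative names (RowReader is hypothetical):

// anydata operations such as clone never include error in their return type.
function copyRow(anydata row) returns anydata {
    return row.clone();
}

// A transactional read, by contrast, must be able to report failure
// (e.g. a deadlock) as an error value that the caller handles explicitly.
type RowReader function (int key) returns map<anydata>|error;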

With transactions, each transaction gets its own distinct, isolated view of the table. This feels like something behavioural, not like pure data.

@jclark
Collaborator Author

jclark commented Oct 6, 2019

Another fundamental issue is that it doesn't really solve the problem of providing convenient access to a relational database. It is not sufficient to look at each table in isolation. You really want to be able to capture the underlying Entity/Relationship data model, which involves multiple tables, as done by e.g. JHipster.

@hrennau

hrennau commented Oct 12, 2019

I think that the Table proposal is very important for the evolution of the Ballerina language (expressiveness, degree of declarativeness). The proposal must neither be canceled nor finalized prematurely.

The main shortcoming of its current state may be the lack of "goals and requirements" beyond the very high-level paragraphs that open the proposal and the very specific SQL interop section that closes it. We might be more systematic and start by defining categories of goals, like (a) which kinds of entities to represent, (b) which categories of operations to support, (c) which logical operations to support, and (d) which kinds of operation support to offer.

How about an additional document equivalent to the "Use cases" or "Requirements" documents that prepare and guide the development of a W3C specification? Presently, important goals and requirements are scattered across discussions of various formal details; for example, the need to represent ordered SQL results pops up in the middle of the discussion of a formal detail. (While it is inevitable to let our minds shuttle, the resulting document should abstract from the process of its creation.) Currently, there is no coherent picture of goals and requirements, and the guidance given by stated goals and requirements is reduced because they are perceived in isolation rather than in context.

To sketch the direction which I imagine the "Requirements" document could take:
(1) What do we want to represent, from a Ballerina-internal view?
Describing the new concept from the perspective of what is already there. Tentatively: "A collection of records, constrained by a uniform record type including (or extended by) a uniform identity model (i.e. a list of primary keys)."
(2) What do we want to represent, from an external view?
Listing the entities from the world which might be covered. Tentatively: an SQL table, including its relationships with other tables (foreign key semantics); CSV data; a set of JSON documents constrained by a common grammar; a set of RDF resources constrained by a common class.
(3) Which operation categories to support?
For example: filtering (intra-collection); navigation (intra-collection); query (possibly cross-collection; e.g. à la SQL, à la SPARQL, ...).
(4) Which operations to support? …
(5) Which kinds of operation support?
(a) Reuse of existing query languages and of existing queries
(b) Enabling declarative expression of intent

Such a document would be written in an iterative way, enabling us to see new requirements readily in the context of the complete set of requirements already defined.

@snow6oy

snow6oy commented Oct 31, 2019

Hello, it seems the count function got lost in version 1.0.0. Can it be put back?

numOfRows = myTable.count();

@gimantha
Contributor

Hello, it seems the count function got lost in version 1.0.0. Can it be put back?

numOfRows = myTable.count();

Hi @snow6oy, it is already in the spec. Please refer to https://github.com/ballerina-platform/ballerina-spec/blob/master/lang/proposals/table/table.bal, line 11.

@jclark
Collaborator Author

jclark commented Nov 1, 2019

@snow6oy It's called length to be consistent with string and array (since table rows are ordered in this proposal).
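
So the earlier snippet becomes the following, in a hedged sketch using the constructor syntax Ballerina later adopted (Row and the data are illustrative):

type Row record {|
    int id;
|};

table<Row> myTable = table [{id: 1}, {id: 2}];
int numOfRows = myTable.length();   // 2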

@anupama-pathirage
Contributor

The row type descriptor of a table type descriptor that is the inherent type of a table value must be a closed record type, with the keys of the record type being the same as the column names.

The question below is about the above condition, which requires the keys of the record type to be the same as the column names.

When we are dealing with SQL queries, the result of a query may sometimes contain complex column names.

For example:

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;

The result of the above query will be a table with two columns named COUNT(CustomerID) and Country. But we cannot define a record with a key named COUNT(CustomerID). So when we think about SQL interop with Ballerina tables, how can we map such a column to a corresponding Ballerina table column name?

One alternative is to modify the SQL query with AS so that it produces a simple column name, as follows. But that may not be feasible in some cases.

SELECT COUNT(CustomerID) as IDCount, Country
FROM Customers
GROUP BY Country;

In the current Ballerina table implementation we don't have such a restriction, i.e. only the order of the data types is considered, and the corresponding Ballerina table can have any name for the fields in the constraint record type of the table.

@jclark
Collaborator Author

jclark commented Nov 27, 2019

@anupama-pathirage You can use a https://ballerina.io/spec/lang/2019R3/#QuotedIdentifier to define a record with any field name you want.
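
For example, a hedged sketch of a record whose field name matches the SQL column COUNT(CustomerID), using a quoted identifier (CountByCountry is an illustrative name):

type CountByCountry record {|
    int 'COUNT\(CustomerID\);
    string Country;
|};

// The field is then accessed the same way: row.'COUNT\(CustomerID\)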

@anupama-pathirage
Contributor

@jclark Thanks for the reference. I missed the spec definition for QuotedIdentifier, since only alphanumerics are allowed in the current implementation.

@grainier

@jclark, @hasithaa, @anupama-pathirage Presently, the main usage of tables is with the SQL client. Currently, when retrieving data, we return a CursorTable from the SQL client and assign it to a table<ConstrainType>. But with this proposal, we have to decouple the table concept from SQL and make it a more generic data type for tabular data.

To do that, what we can do is return an iterator from the SQL client and use that to construct the table. However, there are some implementation complexities with this approach. For example, the iterator should be aware of the value type it returns (i.e. record {| ConstrainType value; |}?), and that information is not available since we don't have generics/parametric types at the moment. Therefore we thought of making Iterator.next() return a RowType (map<anydata|error>) regardless of the data type returned from the DB.

type Iterable object {
    function __iterator() returns Iterator {
        ...
    }
};

// current implementation
type Iterator object {
    function next() returns record {| ConstrainType value; |}? {
        ...
    }
};

// proposed implementation
type Iterator object {
    function next() returns record {| map<anydata|error> value; |}? {
        ...
    }
};

With this, if a user of a SQL client needs to retrieve information from a DB into a table, they can use this Iterator to do that. WDYT?

@jclark
Collaborator Author

jclark commented Nov 29, 2019

@grainier It should be convenient for the user to be able to perform further processing on the results they get back from the SQL client, by e.g. using a query or using functional methods like foreach or map. So what we need is an Iterable with some more conveniences and a built-in type name. This is exactly what stream<T> in the upcoming Event Stream Processing proposal provides. So I was hoping to be able to use that for this application also. Do you think that would work?
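
A hedged sketch of what that could look like, in the query syntax Ballerina later adopted; Row and the stream parameter stand in for an SQL client result, and selectNames is an illustrative name.

type Row record {|
    int id;
    string name;
|};

function selectNames(stream<Row, error?> rows) returns string[]|error {
    // Language-integrated query refining the stream; no table type involved.
    return from var r in rows
        where r.id > 10
        select r.name;
}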

@hasithaa
Contributor

hasithaa commented Dec 12, 2019

@grainier It should be convenient for the user to be able to perform further processing on the results they get back from the SQL client, by e.g. using a query or using functional methods like foreach or map. So what we need is an Iterable with some more conveniences and a built-in type name. This is exactly what stream<T> in the upcoming Event Stream Processing proposal provides. So I was hoping to be able to use that for this application also. Do you think that would work?

Conceptually it should work. But we may have to special-case the stream value in the implementation when representing an SQL result set. This is the exact same problem we faced with the table (cursor tables vs in-memory tables).

For me, it is more natural to represent an SQL result set using an Iterable object. If we can find a way to extend type-param support to the lang.object module, we can easily provide functional iteration for iterator objects. That will give a more convenient API for users. (This is also related to #386.)

@jclark
Collaborator Author

jclark commented Dec 12, 2019

That would be true of the old design of stream<T>. But in the new design, there is very little about stream<T> that is specific to event streaming. It is just an Iterable<T> that is more convenient to use. (Actually that is not completely true: the timestamping support is specific to events.) A couple of other points:

  • an Iterable<T> isn't enough to do querying, and it is crucial to be able to use language-integrated query to refine the results of an SQL query
  • there is no convenient way to write the type Iterable<T>.

@jclark
Collaborator Author

jclark commented Feb 29, 2020

It is clear that this proposal is not the right approach. So closing this issue.

@jclark jclark closed this as completed Feb 29, 2020