-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add temp table bulk loading #2
Conversation
Wow! This project has been published for a grand total of one single day, and already you have delved into the codebase and implemented a non-trivial feature like this. I'm impressed! Since this is a non-trivial feature and there are several design decisions involved, I don't want to rush this PR. I'll try to have a close look at it over the coming days. In the meantime, could you provide some details about your design? Such as:
One thing I immediately notice (though I haven't considered it in light of the entire PR) is that you seem to have SQL files with the temp table definition. I may prefer just having the temp table definition directly in the config. As I see it, this could be a property under (Also, sorry for the mess in Db.fs. It was the first file I wrote when prototyping, and I haven't gotten around to properly cleaning it up.) |
Also, it seems that currently, there can only be one temp table per script ( |
By the way, since this may be a fairly major feature design-wise: If you want this implemented, but don't necessarily want to spend the time discussing and modifying the PR possibly at length, then just let me know and I may be able to look into implementing this myself. But if you'd rather be involved and have your name on the commits and not have the work you have done so far go to waste, then let's continue adapting this PR. |
Some very good points, documentation isn’t my one of my strengths, but I’ll try and answer your questions over the next few days. One thing I can say about temp tables in the config file. I added the temp table load to sqlclient because I was using it for an etl process for energy data. Currently that project has 516 different temp tables so adding that all that into the config file could make it huge and unmanageable. You also have sql primary key, clustered indexes and vast array of option on a temp-table which could be hard to add config values for. I also like to have the create table script and exec script to hand so I can easily edit both in sql management studio. The one reason I was on this project so quick is because the |
Is there any reason why you can't use user-defined table types?
I was thinking you could just have a single config property containing the entire definition, primary keys and all. In any case, I see your point. Perhaps both approaches can be supported - |
I'll do some research on temp tables in the coming days. I've never used them apart from as an ad-hoc replacement for user-defined table types in FSharp.Data.SqlClient. So it is possible I have a somewhat twisted view of what temp tables are and how they are used. Perhaps the "temp table as TVP" approach doesn't always make sense? Do you know? Also, do you know if temp tables make sense as "input" to stored procedures, or only scripts? |
After doing a bit more research and looking at how temp tables are used, as well as some pitfalls regarding which scope they're available in, I am perhaps thinking that the best and most flexible option is to require users to have temp table definitions in SQL files (meaning we can generate one temp table helper type per file), and then have the API allow users to:
Something like this: let test (connStr: string) : Async<..> =
async {
use conn = new SqlConnection()
do! conn.OpenAsync() |> Async.AwaitTask
do!
TempTables.MyTempTable
.WithConnection(conn)
// DTO variant
.AsyncCreateAndLoad(rows = myData |> List.map TempTables.MyTempTable.createRow)
// Parameter variant
.AsyncCreateAndLoad(
rows = (
myData |> List.map (fun x ->
TempTables.MyTempTable.createRow(
col1 = x.Col1,
col2 = x.Col2
)
)
)
return!
Scripts.SomethingThatUsesTempTable
.WithConnection(conn)
.WithParameters(..)
.AsyncExecute()
} This is just a draft. One caveat is that nothing requires you to create/define the temp table before calling scripts/procedures that use it. However, some benefits are:
I will work a bit more on the API design and see if there is an elegant way to make it safer (i.e., forcing users to load temp tables as part of the parameter list, just like with TVPs). |
As an alternative, the following API mirrors the current TVP functionality: MyScript
.WithConnection(connStr)
.WithParameters(
tempTable1 = [TempTables.MyTempTable1.create(Foo = 1)],
tempTable2 = [TempTables.MyTempTable2.create (Foo = 1, Bar = "test")]
) This would require the following:
A drawback of this approach is that temp tables are required to be created and loaded in a single script; it is not possible to share temp table data between scripts. The approach mentioned in the previous post is more flexible, at the expense of a little bit of safety. Given that temp tables may be more of an escape hatch and not the recommended general approach when using Facil, I don't necessarily think the "lack of safety" matters that much. I think it would be hard to cause a lot of real harm by forgetting to create/load the temp table before calling your script. It's something that fails fast. So I may actually prefer the approach in the previous comment. |
We currently use temp table in the SqlClient to stream data from one database to another. I cannot change the schema of the target database which is why I use temp table not table type parameters. I think I can change my code todo the following and stick to the other connection requirements such as keeping things in the C# internals
tempTable could also be a filename. I'm not 100% sure about yaml so I don't know if the format is correct
This would be a really help me slot this into existing code. |
Just want to let you know that I'm actively working on this now. I hope to have something working by the end of the weekend, but no promises. The API will be similar to what you suggest above, but the internals are rather different. |
Clean up Clean up
@cmeeren I've had a go at this myself and I've got the code to match what I said above:
I still haven't done the non-dto WithParameters correctly, it's hacked in for now. It's still very rough but if you like the way I'm going I can clean it up. |
Thanks. Don't worry about it. I'm close with an implementation that works with both DTO and non-DTO, and uses somewhat different internals/config. |
Non-DTO are in now in. Its become quite light touch PR - I'm managed to squeeze it in without changing too much. @cmeeren It's here if you need it, could save you a bit of time. |
Hi, Support for temp tables are now on Are you able to give it a try? I'll wait with releasing this until you have provided feedback. It's well tested and works, and I think you'll find it satisfies your requirements, but I'd like to be sure and hear your opinion before releasing. |
Note that I just updated the readme with info on temp tables. Also, I overlooked a compile error; the nupkg will be available here. |
By the way, thank you so much for this PR. I have credited you in the changelog and the commit (4a300f3). Even though I did not merge this PR, the work you did here definitely helped me pin down some critical implementation details and sped things up quite a bit. The reason I implemented it myself, was a combination of the following: 1) This is a large and non-trivial feature, 2) I had some clear opinions on the implementation details, 3) I have limited resources for OSS work at the moment, and 4) guiding this PR to match my preferred implementation would take a lot longer than simply doing it myself. Thanks again for your work, and let me know if the current implementation works for you. |
@cmeeren it's looking really good, I really like the extra config options. I didn't notice one thing, if you changed the render to it output this:
You could cut down the verbose typings at the user side for extra developer happiness 😄
I'll try and add this to a real project tonight and see how it goes. |
It is a deliberate design decision on my part to not support nested records like that for TVPs and temp tables. I could do it, but it would have to be named something other than Furthermore, if you already have the nested record, there's a trivial workaround using anonymous record copy-and-update: let myNestedRecord = { .. }
let myNestedRecordWithCorrectTempTableTypes =
{| myNestedRecord with MyTableType = myNestedRecord.MyTableType |> List.map Script1.tempTableInlined.create |}
..
.WithParameters(myNestedRecordWithCorrectTempTableTypes) This design decision is not set in stone, but I'd like to see how Facil currently feels before potentially changing it. |
Thinking about it, it may actually be better to just have the DTO overloads accept nested DTOs. I'll give it some more thought. |
Just dropping some of this into a real project I can see you are using c# style ResizeArrays and IEnumerable which comes from the c# library. I think most f# users would rather see the native f# list and seq than the c# types. For large resizearrays this could dramatical increase the memory needed as it's re-cast into f# list. Correct me if I’m wrong but are you using the c# project because you of the awaits and tasks. You may already know but there is a an TaskBuilder which makes working with .net tasks really easy and you wouldn’t need to c# project Also there is AsyncSeq which could be better?
|
All the decisions you mention are deliberately made, primarily for performance reasons.
I use Also, I wouldn't return I use C#'s native Using
AFAIK this is not true. To build up an F# list most efficiently, you'd have to prepend each item, building the list up in reverse. Then at the end, you'd have to reverse it, creating a new list. That would first of all effectively be a copying of the list, and second of all using an F# list would generally cause massively more allocations (one per item) compared to Note that I haven't benchmarked anything (primarily because as a general library I know nothing of the workloads users will have); this is just going by intuition. I've given it quite a bit of thought, but let me know if you think I'm in the wrong in any of the above.
In case it wasn't clear above: Note that I do not surface Note also that other performance-focused F# libraries also surface |
Sorry, I should have put this as an issue/suggestion rather than a comment on this PR. I agree with points you have made on performance on the dotnet side but most of the bottlenecks I've found working with SQL isn't the dotnet side but some IO lock in SQL or some-such. There is a trade off here with speed and building a toolset that hooks up with other parts of the ecosystem. Coming from someone who battles with the F# SqlClient everyday I'm hoping your new ace tool will be picked up and used by the community and all the help it brings. Just trying to pre-empt some future - "Why are we using X when we should be using Y" which is at the core of the project. |
I see. If you believe any of the points I mentioned in the previous comment are mistaken, please open an issue and we can surely discuss both the public API and implementation details. I'll add a FAQ entry about |
I have given this "nested DTO" feature some thought. While I agree it is useful to be able to just pass nested DTOs and "have it magically work", I see several problems with this: Overload resolutionOverload resolution doesn't work for the TVP/temp table Records vs. primitives for single-col TVPs/temp tablesFor single-column TVPs/temp tables, it's hard to know whether the user wants to pass a list of record with a single field, or just pass a list of primitive values. Both may be useful in different circumstances. Ability to drop DTO in configIn the config, one can omit DTO overloads for any table type. This would break if the table type is used in a procedure/script where the DTO overload is not omitted. I prefer keeping such interactions to a minimum to avoid confusion. Makes it harder for the user to chooseSince the user must currently call the relevant Less orthogonal APICurrently, since the proc/script DTO overloads accept the concrete TVP/temp table type, it works equally well regardless of how the user creates the TVP/temp table rows. If we support nested DTOs, we lock it down to a single way of creating the TVPs/temp tables. While it is theoretically possible to add multiple "nested DTO" overloads, it quickly becomes confusing and downright unmanageable; imagine if we add more The above points, together with the fact that calling the relevant |
By the way, please let me know if you (or anyone else reading this) think the temp table feature is ready to be released on NuGet (i.e., there are no immediate breaking changes that needs to be made to this feature). I'll wait for your go-ahead on this one. |
Published in 0.1.3. |
This is a feature I added to SqlClient which I would like to also include here
(Edit by @cmeeren: Closes #3)