
Guaranteed Backwards Compatibility #43

Open
GerHobbelt opened this issue Aug 13, 2019 · 3 comments
Labels
🤔question Further information is requested or this is a support question

Comments

@GerHobbelt
Collaborator

Considering

See the discussion at #15: James emphasized backwards compatibility there, which triggered this thought process:

  1. normally, you test for that kind of thing using Qiqqa test databases, running multiple versions of Qiqqa in parallel, and checking whether upgrading/downgrading via setup.exe actually flies (version checks or otherwise in there might block downgrade paths; currently the Qiqqa installer presents everything as an upgrade all the time, but that's okay as it allows moving back & forth without much trouble).

  2. then there's the choice of how far we take this:

    Does 'backwards compatibility' mean:

    1. when you upgrade, then downgrade, does all your activity from after the upgrade make it through the downgrade? ("full continuation across versions" - tough to accomplish / near impossible), or

    2. when you upgrade, then downgrade, do you get 'rolled back' to the point in time when you upgraded? ("rollback to the previous version's SNAPSHOT"?)

    3. when you upgrade (or downgrade), your major data sources are kept intact (using option 1 or 2 above), while the derivative data, e.g. search index caches, are rebuilt on every data format change: if we upgrade Lucene (upgrade LuceneNet #23), or downgrade it as we downgrade Qiqqa, we MUST regenerate the index as the Lucene file format has changed significantly. ("regenerate what you can that's unnoticeable")

    4. or you use a 'weaker' compatibility definition, where 'derivative data' includes BrainStorms, Expedition sessions, etc.: the Qiqqa Configuration and Qiqqa Library Database plus PDF collection (and very probably the OCR text output) are then considered sources which must be kept backwards compatible using option 1 or 2 above, while the rest is marked as 'incompatible' and regenerated upon user action/request. This would mean BrainStorms and such would not always survive an upgrade or downgrade where the brainstorm file format changes. ("regenerate as much as you can")

Depending on the choice made above (do we go with option 1, 2, 3 or 4?), there's the technical side of it:

  1. is an extremely tough nut to crack under all circumstances, and there's the question of new versions introducing data that old versions don't carry (take "add source location/URL to Qiqqa Library record & update source location/URL for an existing PDF" #27, which currently is only noticeable to hackers like me until extra work makes it visible via CSL; but how 'faithful' should the backwards-compatible move be: kill all the new URI info? You get the point. Some degree of "metadata loss" is always lurking when downgrading.)

  2. snapshotting can be done by

    1. creating an (archived?) backup of the Qiqqa files/libraries and then migrating the files themselves, but that only works out well for upgrades: downgrades need to unpack that archive or they'll be handling more-or-less-visibly incompatible data files. This would require more coding. (A minimal sketch of this archive-before-migrate idea follows this list.)

    2. using (file format) version identifiers in the Qiqqa data file system, such that every version of Qiqqa knows "where to find its own": upgrades then go and look for older versions to migrate upwards when "their own" doesn't exist yet, while downgrades can simply revert to looking at their own data "from before".

  3. regenerate-what-you-can is the same as option 2, except you trade extra (processing) time for code by regenerating indexes/caches instead of adding code to migrate them to the new file format/version.

  4. regenerate-even-more is like 3, and thus 2, except you don't write upgrade migration code for some of the Qiqqa derivative data files, e.g. brainstorms, expeditions, ...
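As a minimal sketch of the archive-before-migrate approach (option 2.i above): zip the metadata set once per old version before the in-place migration runs; a downgrade would then have to unpack the matching archive again. The directory and file names below are assumptions for illustration, not existing Qiqqa code.

using System.IO;
using System.IO.Compression;   // needs a reference to System.IO.Compression.FileSystem

// Hypothetical helper (not existing Qiqqa code): archive the library metadata
// once per old version before migrating the files in place (option 2.i).
static class PreMigrationSnapshot
{
    public static string Archive(string metadataDir, string oldQiqqaVersion)
    {
        string archivePath = Path.Combine(
            Path.GetDirectoryName(metadataDir),
            "metadata-snapshot-" + oldQiqqaVersion + ".zip");

        // Snapshot only once per version: a repeat run must not overwrite the
        // pristine pre-migration state with already-migrated files.
        if (!File.Exists(archivePath))
        {
            ZipFile.CreateFromDirectory(metadataDir, archivePath);
        }
        return archivePath;
    }
}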

Preference

My personal preference would be to go with option 2.ii - snapshotting with the version in the path (though do we need to snapshot/copy the entire bunch of metadata files, as there are (or will be) interrelations, or do we snapshot/copy single files? My preference is the first: one snapshot per version, upgrade what you need upgraded). A sketch of the version-directory lookup follows the storage layout below.

Add to that as much of option 3 as we can get away with (Lucene search index regeneration, OCR cache regeneration, i.e. re-run the OCR task when we change the formats there, or maybe the OCR engine: #35 + #34, autotags, ...).

It's very simple and thus the least error-prone, which is handy to have for a path less traveled: back-pedaling on a Qiqqa software update/upgrade.

The 'negative' consequence is the user's disk slowly filling with Qiqqa library/config snapshots, but that's a 'risk' I am very much okay with: here a 20K+ article library clocks in at LESS THAN 50MByte of metadata, SQLite3 library main database included.

Of course, it would get worse than that when we change the OCR text cache, but that can very probably be coped with nicely by blowing away the entire OCR cache and regenerating it from scratch.

The (Lucene) Search Index is also quite another matter, as that one clocks in at about ❗️700MByte❗️ for that same 20K+ article library, hence we should consider the Lucene 'database' as 'separate' from the rest of the Qiqqa library metadata (.library, .autotags, folder watcher, you name it).
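To make the "blow it away and regenerate" idea concrete, here is a rough sketch assuming we keep a tiny format stamp file next to the Lucene index; the stamp file name and version strings are made up, not actual Qiqqa files. When the running build expects a different index format, the index directory is discarded and a full re-index is scheduled.

using System.IO;

// Sketch of the "regenerate what you can" rule for the Lucene index: a tiny
// format stamp sits next to the index files; when the running build expects a
// different index format, the index is discarded and must be rebuilt from the
// OCR text cache / PDF store.
static class SearchIndexGuard
{
    // Returns true when the existing index can be used as-is; false when the
    // caller should schedule a full re-index.
    public static bool EnsureIndexFormat(string indexDir, string expectedFormat)
    {
        string stampFile = Path.Combine(indexDir, "index.format");
        string currentFormat =
            File.Exists(stampFile) ? File.ReadAllText(stampFile).Trim() : "";

        if (currentFormat == expectedFormat)
            return true;

        // Incompatible (or missing) index: blow it away rather than migrate it.
        if (Directory.Exists(indexDir))
            Directory.Delete(indexDir, recursive: true);
        Directory.CreateDirectory(indexDir);
        File.WriteAllText(stampFile, expectedFormat);
        return false;
    }
}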

Suggested storage layout

- documents     -- fixed hashed PDF store, no changes expected
- index         -- Lucene search index store; only migrate on Lucene changes
  - *           -- v79 and before Lucene v2.9.x search db files
  - v82/*       -- Lucene DB for when upgraded. Use Qiqqa version, 
                -- which is fastest changing
  - v97/*       -- future Lucene DB, yet another upgrade there
                -- (which happened with Qiqqa v97 release then)
- *             -- v79 and before Qiqqa library metadata: autotags, library, etc.
- v82/*         -- (v81 was experimental) for every Qiqqa version with ...
- v83/*         -- ... *any* format change, there's a version-named ...
- v84/*         -- ... directory for the metadata so you can rollback.
... etc. ...
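A rough sketch of the lookup logic that layout implies, assuming integer vNN directory names (the per-file migration hook is left as a comment; none of this is existing Qiqqa code): a build reads only "its own" vNN directory and, on the first run after an upgrade, copies the newest older snapshot forward and upgrades the copies, leaving that older snapshot untouched so a later downgrade still finds its files.

using System.IO;
using System.Linq;

// Pre-v80 metadata living directly in the base directory would be treated as
// the oldest snapshot of all.
static class VersionedMetadataStore
{
    public static string Resolve(string libraryBasePath, int qiqqaVersion)
    {
        string ownDir = Path.Combine(libraryBasePath, "v" + qiqqaVersion);
        if (Directory.Exists(ownDir))
            return ownDir;                      // normal case, incl. after a downgrade

        // Find the newest snapshot older than the running version (v82, v83, ...).
        var older = Directory.GetDirectories(libraryBasePath, "v*")
            .Select(d => new { Dir = d, Ver = ParseVersion(Path.GetFileName(d)) })
            .Where(x => x.Ver.HasValue && x.Ver.Value < qiqqaVersion)
            .OrderByDescending(x => x.Ver.Value)
            .FirstOrDefault();

        Directory.CreateDirectory(ownDir);
        if (older != null)
        {
            foreach (string file in Directory.GetFiles(older.Dir))
            {
                // copy, never move: the older snapshot stays intact
                File.Copy(file, Path.Combine(ownDir, Path.GetFileName(file)));
            }
            // ...then run the per-file format upgrades on the copies here.
        }
        return ownDir;
    }

    private static int? ParseVersion(string name)
    {
        if (name.StartsWith("v") && int.TryParse(name.Substring(1), out int v))
            return v;
        return null;
    }
}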
@GerHobbelt
Collaborator Author

To keep in mind when working on this:

Functional Risk Scenarios

  1. Qiqqa crashes/aborts for whatever reason and files in the library are damaged or get erased. Qiqqa then MUST NOT revert to previous releases' files and 'migrate' them so as to have a file that works.

    Current example as of this writing: nuke the new JSON config file and Qiqqa will load the old-skool binary-.NET-serialized file instead. (A sketch of a guard against exactly this is below.)
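One way to guard against that, sketched under the assumption of a marker file written after a successful migration (the file names are illustrative, not actual Qiqqa paths): if the new-format file is gone but the marker is still there, something destructive happened and the app must refuse to silently re-migrate.

using System.IO;

// After a successful migration a small marker is written next to the new JSON
// config. If the JSON file later goes missing while the marker is still there,
// something destructive happened (crash, manual delete) and we must not fall
// back to the old binary file without asking.
static class ConfigMigrationGuard
{
    public static bool MayMigrateLegacyConfig(string configDir)
    {
        string newConfig = Path.Combine(configDir, "Qiqqa.Configuration.json");
        string marker    = Path.Combine(configDir, "Qiqqa.Configuration.migrated");

        if (File.Exists(newConfig))
            return false;               // current file is present, nothing to do

        if (File.Exists(marker))
            return false;               // migrated before; do NOT silently re-migrate

        return true;                    // genuine first-time upgrade
    }

    public static void MarkMigrated(string configDir)
    {
        File.WriteAllText(Path.Combine(configDir, "Qiqqa.Configuration.migrated"), "ok");
    }
}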

Technical Risk Scenarios

  1. To stay backwards compatible with old Qiqqa releases, we MUST NOT CHANGE/AUGMENT the serializable C# classes, as old binary files then won't deserialize.

    There are several ways around this problem, but making them immutable and using some other classes for subsequent versions is a Code Smell in my opinion. As I'm moving to JSON-based data formats anyway, another possible solution is to have a separate binary which can (and always will) read old binary-format Qiqqa files and convert those to the new JSON formats (which also works out nicely for the [Obsolete]-marked serializer class code in Qiqqa 😄). The second part of the solution should be a test to ensure the JSON-backed storage can deserialize into old and new class instances: how does the JSON deserializer respond to extra or missing JSON fields?! (A minimal probe of that behaviour follows this list.)

    Also: as we keep the old releases' files immutable after an upgrade, do we upgrade all files during setup/install (maybe using a separate tool), or do we migrate on demand in Qiqqa? Brainstorms and such aren't necessarily loaded at execution start, hence the moment every file is migrated to the new release MAY be at very different times, which MAY make it easy to confuse such legal and wanted upgrade migrations with functional risk nr. 1 above: an inadvertent re-migrate after file loss due to a Qiqqa crash or other failure mode.
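As a quick probe of that question, here is a minimal sketch assuming Json.NET (Newtonsoft.Json) is the serializer in play; the ConfigV2 class and its fields are invented for illustration. By default Json.NET ignores unknown properties and leaves missing ones at their defaults, which is exactly the tolerance a versioned format needs; a test like this pins that behaviour down so a settings change doesn't silently break it.

using Newtonsoft.Json;

// "ConfigV2" is a made-up stand-in for a real Qiqqa settings class.
public class ConfigV2
{
    public string LibraryPath { get; set; }
    public int OcrThreads { get; set; } = 2;    // field that only exists in "v2"
}

public static class JsonCompatProbe
{
    public static void Run()
    {
        // Old-format JSON: the new OcrThreads field is missing -> stays at its default.
        var fromOld = JsonConvert.DeserializeObject<ConfigV2>(
            "{ \"LibraryPath\": \"C:/Qiqqa/base\" }");

        // Future-format JSON: an extra field is present -> ignored by default.
        var fromNew = JsonConvert.DeserializeObject<ConfigV2>(
            "{ \"LibraryPath\": \"C:/Qiqqa/base\", \"OcrThreads\": 8, \"SomeV3Field\": true }");

        System.Diagnostics.Debug.Assert(fromOld.OcrThreads == 2);
        System.Diagnostics.Debug.Assert(fromNew.OcrThreads == 8);
    }
}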

GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Aug 16, 2019
…to MSVS2019 solution by using a dummy project and prebuild script

- added skeleton projects for qiqqa diagnostic and migration work. Related to jimmejardine#43
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Aug 17, 2019
@GerHobbelt
Collaborator Author

Thought: create a minimal v80 derivative which has no changes in the serializable classes and installs in parallel with a regular Qiqqa install - to be used as a fast test base for 'is this stuff still backwards compatible' checks.

@GerHobbelt
Collaborator Author

🤔 Are we forever stuck with this collection of protobuf, .NET binary serialization and JSON serialization for the sake of backwards compatibility?! 🤔

When I look through the GetSatisFaction forum at https://getsatisfaction.com/qiqqa/topics/ it looks like both scenarios are true:

  1. Qiqqa is abandoned and folks have moved elsewhere. Great. </sarcasm>
  2. Folks pick up from where they got to several years ago, and your guess is as good as mine when it comes to pinpointing which Qiqqa version they last used and want to recover/continue from.
  3. Couple of folks I can see hanging on for better or worse. (Yeah, that's the third item of two.)

Otherwise it's unclear where the activity and usage of Qiqqa is out there. ❓

Anyway; on the tech front I'm not sure if zoning out the backwards-compat crap into another binary is such a swell idea. Heck, right now I don't know what is. Meanwhile, riding on the old cruft isn't giving me joy either, as there are a few spots I want to kick the tires of without getting bogged down in the protobuf/serialization funk. (A rough sketch of what such a split might look like is below.)
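Purely as a strawman for that 'separate binary/assembly' idea (every type name below is invented, nothing like this exists in the codebase yet): the mainline app would only see a small reader interface, while the protobuf / .NET-binary deserializers live in a dedicated 'Qiqqa.Legacy' assembly or standalone converter that merely feeds current-model records back.

using System.Collections.Generic;

public interface ILegacyLibraryReader
{
    // True if this reader recognizes the on-disk format under libraryPath.
    bool CanRead(string libraryPath);

    // Reads the old-format library and yields records in the current (JSON-era)
    // object model; the caller persists them in the new format.
    IEnumerable<PdfRecord> ReadAll(string libraryPath);
}

// Placeholder for the current-model record the legacy readers convert into.
public class PdfRecord
{
    public string Fingerprint { get; set; }
    public string MetadataJson { get; set; }
}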

To be continued... 🌵 <rant off>

GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Sep 5, 2019
…ad UX as it takes ages for the sort to complete and the view to update) - this resulted in work done but also the need to nuke the slower dictionary-based PDFDocument setup, which does a lot of casting and conversion back&forth at run-time. As a consequence, serialization will have to be done differently and 'upgrading' old-format records should have been coded into the DESERIALIZER/LOADER anyway, instead of some nasty hack at the (re)write side of things.

TODO: complete this work and add in backwards compat load code for issue jimmejardine#43 (jimmejardine#43)
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 2, 2019
…to MSVS2019 solution by using a dummy project and prebuild script

- added skeleton projects for qiqqa diagnostic and migration work. Related to jimmejardine#43
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 3, 2019
…ad UX as it takes ages for the sort to complete and the view to update) - this resulted in work done but also the need to nuke the slower dictionary-based PDFDocument setup, which does a lot of casting and conversion back&forth at run-time. As a consequence, serialization will have to be done differently and 'upgrading' old-format records should have been coded into the DESERIALIZER/LOADER anyway, instead of some nasty hack at the (re)write side of things.

TODO: complete this work and add in backwards compat load code for issue jimmejardine#43 (jimmejardine#43)
@GerHobbelt GerHobbelt added the 🤔question Further information is requested or this is a support question label Oct 4, 2019
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 4, 2019
…icated library so the mainline codebase doesn't keep cluttered with old stuff, just because we want to be able to load/import old Qiqqa libraries.
@GerHobbelt GerHobbelt added this to the Our Glorious Future milestone Oct 9, 2019
@GerHobbelt GerHobbelt modified the milestones: Our Glorious Future, v82 Nov 3, 2019
GerHobbelt added a commit that referenced this issue Nov 5, 2019
…to MSVS2019 solution by using a dummy project and prebuild script

- added skeleton projects for qiqqa diagnostic and migration work. Related to #43