
Guaranteed Backwards Compatibility #43

Open
GerHobbelt opened this issue Aug 13, 2019 · 3 comments
Labels
🤔question Further information is requested or this is a support question

Comments

@GerHobbelt
Collaborator

Considering

See the discussion at #15: James emphasized backwards compatibility there, which triggered this thought process:

  1. normally, you test for that kind of thing using Qiqqa test databases, running multiple versions of Qiqqa in parallel, and checking whether upgrading/downgrading via setup.exe actually flies (version checks or otherwise in there might block downgrade paths; currently the Qiqqa installer presents everything as an upgrade all the time, but that's okay as it allows moving back & forth without much trouble).

  2. then there's the choice of how far we take this:

    Does 'backwards compatibility' mean:

    1. when you upgrade, then downgrade, does all your activity from after the upgrade make it through the downgrade? ("full continuation across versions" - tough to accomplish / near impossible), or

    2. when you upgrade, then downgrade, do you get 'rolled back' to the point in time when you upgraded? ("rollback to the previous version's SNAPSHOT"?)

    3. when you upgrade (or downgrade), your major data sources are kept intact (using option 1 or 2 above), while the derivative data, e.g. search index caches, are rebuilt on every data format change: if we upgrade Lucene (upgrade LuceneNet #23), or downgrade it as we downgrade Qiqqa, we MUST regenerate the index as the Lucene file format has changed significantly. ("regenerate what you can that's unnoticeable")

    4. or you use a 'weaker' compatibility definition, where 'derivative data' includes BrainStorms, Expedition sessions, etc.: the Qiqqa Configuration and Qiqqa Library Database plus PDF collection (and very probably the OCR text output) are then considered sources which must be kept backwards compatible using option 1 or 2 above, while the rest is marked as 'incompatible' and regenerated upon user action/request. This would mean BrainStorms and such would not always survive an upgrade or downgrade where the brainstorm file format changes. ("regenerate as much as you can")

Depending on the choice made above (do we go with option 1, 2, 3 or 4?), there's the technical side of it:

  1. is an extremely tough nut to crack under all circumstances, and there's the question of new versions introducing data that old versions don't carry (take "add source location/URL to Qiqqa Library record & update source location/URL for an existing PDF" #27, which currently is only noticeable to hackers like me until extra work makes it visible via CSL; but how 'faithful' should the backwards-compatible move be: kill all the new URI info? You get the point. Some degree of "metadata loss" is always lurking when downgrading.)

  2. snapshotting can be done by

    1. creating an (archived?) backup of the Qiqqa files/libraries and then migrating the files themselves, but that only works out well for upgrades: downgrades need to unpack that archive or they'll be handling more-or-less-visibly incompatible data files. This would require more coding. (A minimal sketch of this archive-before-migrate idea follows this list.)

    2. using (file format) version identifiers in the Qiqqa data file system, such that every version of Qiqqa knows "where to find its own": upgrades then go and look for older versions to migrate upwards when "their own" doesn't exist yet, while downgrades can simply revert to looking at their own data "from before".

  3. regenerate-what-you-can is the same as option 2, except you trade extra (processing) time for code by regenerating indexes/caches instead of adding code to migrate them to the new file format/version.

  4. regenerate-even-more is like 3, and thus 2, except you don't write upgrade migration code for some of the Qiqqa derivative data files, e.g. brainstorms, expeditions, ...
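As a minimal sketch of the archive-before-migrate approach (option 2.i above): zip the metadata set once per old version before the in-place migration runs; a downgrade would then have to unpack the matching archive again. The directory and file names below are assumptions for illustration, not existing Qiqqa code.

using System.IO;
using System.IO.Compression;   // needs a reference to System.IO.Compression.FileSystem

// Hypothetical helper (not existing Qiqqa code): archive the library metadata
// once per old version before migrating the files in place (option 2.i).
static class PreMigrationSnapshot
{
    public static string Archive(string metadataDir, string oldQiqqaVersion)
    {
        string archivePath = Path.Combine(
            Path.GetDirectoryName(metadataDir),
            "metadata-snapshot-" + oldQiqqaVersion + ".zip");

        // Snapshot only once per version: a repeat run must not overwrite the
        // pristine pre-migration state with already-migrated files.
        if (!File.Exists(archivePath))
        {
            ZipFile.CreateFromDirectory(metadataDir, archivePath);
        }
        return archivePath;
    }
}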

Preference

My personal preference would be to go with option 2.ii - snapshotting with the version in the path (though do we need to snapshot/copy the entire bunch of metadata files, as there are (or will be) interrelations, or do we snapshot/copy single files? My preference is the first: one snapshot per version, upgrade what you need upgraded). A sketch of the version-directory lookup follows the storage layout below.

Add to that as much of option 3 as we can get away with (Lucene search index regeneration, OCR cache regeneration, i.e. re-run the OCR task when we change the formats there, or maybe the OCR engine: #35 + #34, autotags, ...).

It's very simple and thus the least error-prone, which is handy to have for a path less traveled: back-pedaling on a Qiqqa software update/upgrade.

The 'negative' consequence is the user's disk slowly filling with Qiqqa library/config snapshots, but that's a 'risk' I am very much okay with: here a 20K+ article library clocks in at LESS THAN 50MByte of metadata, SQLite3 library main database included.

Of course, it would get worse than that when we change the OCR text cache, but that can very probably be coped with nicely by blowing away the entire OCR cache and regenerating it from scratch.

The (Lucene) Search Index is also quite another matter, as that one clocks in at about ❗️700MByte❗️ for that same 20K+ article library, hence we should consider the Lucene 'database' as 'separate' from the rest of the Qiqqa library metadata (.library, .autotags, folder watcher, you name it).
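To make the "blow it away and regenerate" idea concrete, here is a rough sketch assuming we keep a tiny format stamp file next to the Lucene index; the stamp file name and version strings are made up, not actual Qiqqa files. When the running build expects a different index format, the index directory is discarded and a full re-index is scheduled.

using System.IO;

// Sketch of the "regenerate what you can" rule for the Lucene index: a tiny
// format stamp sits next to the index files; when the running build expects a
// different index format, the index is discarded and must be rebuilt from the
// OCR text cache / PDF store.
static class SearchIndexGuard
{
    // Returns true when the existing index can be used as-is; false when the
    // caller should schedule a full re-index.
    public static bool EnsureIndexFormat(string indexDir, string expectedFormat)
    {
        string stampFile = Path.Combine(indexDir, "index.format");
        string currentFormat =
            File.Exists(stampFile) ? File.ReadAllText(stampFile).Trim() : "";

        if (currentFormat == expectedFormat)
            return true;

        // Incompatible (or missing) index: blow it away rather than migrate it.
        if (Directory.Exists(indexDir))
            Directory.Delete(indexDir, recursive: true);
        Directory.CreateDirectory(indexDir);
        File.WriteAllText(stampFile, expectedFormat);
        return false;
    }
}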

Suggested storage layout

- documents     -- fixed hashed PDF store, no changes expected
- index         -- Lucene search index store; only migrate on Lucene changes
  - *           -- v79 and before Lucene v2.9.x search db files
  - v82/*       -- Lucene DB for when upgraded. Use Qiqqa version, 
                -- which is fastest changing
  - v97/*       -- future Lucene DB, yet another upgrade there
                -- (which happened with Qiqqa v97 release then)
- *             -- v79 and before Qiqqa library metadata: autotags, library, etc.
- v82/*         -- (v81 was experimental) for every Qiqqa version with ...
- v83/*         -- ... *any* format change, there's a version-named ...
- v84/*         -- ... directory for the metadata so you can rollback.
... etc. ...
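A rough sketch of the lookup logic that layout implies, assuming integer vNN directory names (the per-file migration hook is left as a comment; none of this is existing Qiqqa code): a build reads only "its own" vNN directory and, on the first run after an upgrade, copies the newest older snapshot forward and upgrades the copies, leaving that older snapshot untouched so a later downgrade still finds its files.

using System.IO;
using System.Linq;

// Pre-v80 metadata living directly in the base directory would be treated as
// the oldest snapshot of all.
static class VersionedMetadataStore
{
    public static string Resolve(string libraryBasePath, int qiqqaVersion)
    {
        string ownDir = Path.Combine(libraryBasePath, "v" + qiqqaVersion);
        if (Directory.Exists(ownDir))
            return ownDir;                      // normal case, incl. after a downgrade

        // Find the newest snapshot older than the running version (v82, v83, ...).
        var older = Directory.GetDirectories(libraryBasePath, "v*")
            .Select(d => new { Dir = d, Ver = ParseVersion(Path.GetFileName(d)) })
            .Where(x => x.Ver.HasValue && x.Ver.Value < qiqqaVersion)
            .OrderByDescending(x => x.Ver.Value)
            .FirstOrDefault();

        Directory.CreateDirectory(ownDir);
        if (older != null)
        {
            foreach (string file in Directory.GetFiles(older.Dir))
            {
                // copy, never move: the older snapshot stays intact
                File.Copy(file, Path.Combine(ownDir, Path.GetFileName(file)));
            }
            // ...then run the per-file format upgrades on the copies here.
        }
        return ownDir;
    }

    private static int? ParseVersion(string name)
    {
        if (name.StartsWith("v") && int.TryParse(name.Substring(1), out int v))
            return v;
        return null;
    }
}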
@GerHobbelt
Collaborator Author

To keep in mind when working on this:

Functional Risk Scenarios

  1. Qiqqa crashes/aborts for whatever reason and files in the library are damaged or get erased. Qiqqa then MUST NOT revert to previous releases' files and 'migrate' them so as to have a file that works.

    Current example as of this writing: nuke the new JSON config file and Qiqqa will load the old-skool binary-.NET-serialized file instead. (A sketch of a guard against exactly this is below.)
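One way to guard against that, sketched under the assumption of a marker file written after a successful migration (the file names are illustrative, not actual Qiqqa paths): if the new-format file is gone but the marker is still there, something destructive happened and the app must refuse to silently re-migrate.

using System.IO;

// After a successful migration a small marker is written next to the new JSON
// config. If the JSON file later goes missing while the marker is still there,
// something destructive happened (crash, manual delete) and we must not fall
// back to the old binary file without asking.
static class ConfigMigrationGuard
{
    public static bool MayMigrateLegacyConfig(string configDir)
    {
        string newConfig = Path.Combine(configDir, "Qiqqa.Configuration.json");
        string marker    = Path.Combine(configDir, "Qiqqa.Configuration.migrated");

        if (File.Exists(newConfig))
            return false;               // current file is present, nothing to do

        if (File.Exists(marker))
            return false;               // migrated before; do NOT silently re-migrate

        return true;                    // genuine first-time upgrade
    }

    public static void MarkMigrated(string configDir)
    {
        File.WriteAllText(Path.Combine(configDir, "Qiqqa.Configuration.migrated"), "ok");
    }
}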

Technical Risk Scenarios

  1. To stay backwards compatible with old Qiqqa releases, we MUST NOT CHANGE/AUGMENT the serializable C# classes, as old binary files then won't deserialize.

    There are several ways around this problem, but making them immutable and using some other classes for subsequent versions is a Code Smell in my opinion. As I'm moving to JSON-based data formats anyway, another possible solution is to have a separate binary which can (and always will) read old binary-format Qiqqa files and convert those to the new JSON formats (which also works out nicely for the [Obsolete]-marked serializer class code in Qiqqa 😄). The second part of the solution should be a test to ensure the JSON-backed storage can deserialize into old and new class instances: how does the JSON deserializer respond to extra or missing JSON fields?! (A minimal probe of that behaviour follows this list.)

    Also: as we keep the old releases' files immutable after an upgrade, do we upgrade all files during setup/install (maybe using a separate tool), or do we migrate on demand in Qiqqa? Brainstorms and such aren't necessarily loaded at execution start, hence the moment every file is migrated to the new release MAY be at very different times, which MAY make it easy to confuse such legal and wanted upgrade migrations with functional risk nr. 1 above: an inadvertent re-migrate after file loss due to a Qiqqa crash or other failure mode.
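As a quick probe of that question, here is a minimal sketch assuming Json.NET (Newtonsoft.Json) is the serializer in play; the ConfigV2 class and its fields are invented for illustration. By default Json.NET ignores unknown properties and leaves missing ones at their defaults, which is exactly the tolerance a versioned format needs; a test like this pins that behaviour down so a settings change doesn't silently break it.

using Newtonsoft.Json;

// "ConfigV2" is a made-up stand-in for a real Qiqqa settings class.
public class ConfigV2
{
    public string LibraryPath { get; set; }
    public int OcrThreads { get; set; } = 2;    // field that only exists in "v2"
}

public static class JsonCompatProbe
{
    public static void Run()
    {
        // Old-format JSON: the new OcrThreads field is missing -> stays at its default.
        var fromOld = JsonConvert.DeserializeObject<ConfigV2>(
            "{ \"LibraryPath\": \"C:/Qiqqa/base\" }");

        // Future-format JSON: an extra field is present -> ignored by default.
        var fromNew = JsonConvert.DeserializeObject<ConfigV2>(
            "{ \"LibraryPath\": \"C:/Qiqqa/base\", \"OcrThreads\": 8, \"SomeV3Field\": true }");

        System.Diagnostics.Debug.Assert(fromOld.OcrThreads == 2);
        System.Diagnostics.Debug.Assert(fromNew.OcrThreads == 8);
    }
}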

GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Aug 16, 2019
…to MSVS2019 solution by using a dummy project and prebuild script

- added skeleton projects for qiqqa diagnostic and migration work. Related to jimmejardine#43
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Aug 17, 2019
@GerHobbelt
Collaborator Author

Thought: create a minimal v80 derivative which has no changes in the serializable classes and installs in parallel with a regular Qiqqa install - to be used as a fast test base for 'is this stuff still backwards compatible' checks.

@GerHobbelt
Collaborator Author

🤔 Are we forever stuck with this collection of protobuf, .NET binary serialization and JSON serialization for the sake of backwards compatibility?! 🤔

When I look through the GetSatisFaction forum at https://getsatisfaction.com/qiqqa/topics/ it looks like both scenarios are true:

  1. Qiqqa is abandoned and folks have moved elsewhere. Great. </sarcasm>
  2. Folks pick up from where they got to several years ago, and your guess is as good as mine when it comes to pinpointing which Qiqqa version they last used and want to recover/continue from.
  3. Couple of folks I can see hanging on for better or worse. (Yeah, that's the third item of two.)

Otherwise it's unclear where the activity and usage of Qiqqa is out there. ❓

Anyway; on the tech front I'm not sure if zoning out the backwards-compat crap into another binary is such a swell idea. Heck, right now I don't know what is. Meanwhile, riding on the old cruft isn't giving me joy either, as there are a few spots I want to kick the tires of without getting bogged down in the protobuf/serialization funk. (A rough sketch of what such a split might look like is below.)
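Purely as a strawman for that 'separate binary/assembly' idea (every type name below is invented, nothing like this exists in the codebase yet): the mainline app would only see a small reader interface, while the protobuf / .NET-binary deserializers live in a dedicated 'Qiqqa.Legacy' assembly or standalone converter that merely feeds current-model records back.

using System.Collections.Generic;

public interface ILegacyLibraryReader
{
    // True if this reader recognizes the on-disk format under libraryPath.
    bool CanRead(string libraryPath);

    // Reads the old-format library and yields records in the current (JSON-era)
    // object model; the caller persists them in the new format.
    IEnumerable<PdfRecord> ReadAll(string libraryPath);
}

// Placeholder for the current-model record the legacy readers convert into.
public class PdfRecord
{
    public string Fingerprint { get; set; }
    public string MetadataJson { get; set; }
}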

To be continued... 🌵 <rant off>

GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Sep 5, 2019
…ad UX as it takes ages for the sort to complete and the view to update) - this resulted in work done but also the need to nuke the slower dictionary-based PDFDocument setup, which does a lot of casting and conversion back&forth at run-time. As a consequence, serialization will have to be done differently and 'upgrading' old-format records should have been coded into the DESERIALIZER/LOADER anyway, instead of some nasty hack at the (re)write side of things.

TODO: complete this work and add in backwards compat load code for issue jimmejardine#43 (jimmejardine#43)
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 2, 2019
…to MSVS2019 solution by using a dummy project and prebuild script

- added skeleton projects for qiqqa diagnostic and migration work. Related to jimmejardine#43
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 3, 2019
…ad UX as it takes ages for the sort to complete and the view to update) - this resulted in work done but also the need to nuke the slower dictionary-based PDFDocument setup, which does a lot of casting and conversion back&forth at run-time. As a consequence, serialization will have to be done differently and 'upgrading' old-format records should have been coded into the DESERIALIZER/LOADER anyway, instead of some nasty hack at the (re)write side of things.

TODO: complete this work and add in backwards compat load code for issue jimmejardine#43 (jimmejardine#43)
@GerHobbelt GerHobbelt added the 🤔question Further information is requested or this is a support question label Oct 4, 2019
GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Oct 4, 2019
…icated library so the mainline codebase doesn't keep cluttered with old stuff, just because we want to be able to load/import old Qiqqa libraries.
@GerHobbelt GerHobbelt added this to the Our Glorious Future milestone Oct 9, 2019
@GerHobbelt GerHobbelt modified the milestones: Our Glorious Future, v82 Nov 3, 2019
GerHobbelt added a commit that referenced this issue Nov 5, 2019
…to MSVS2019 solution by using a dummy project and prebuild script

- added skeleton projects for qiqqa diagnostic and migration work. Related to #43