Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't delete dataset in DB when annotations exist #7429

Merged
merged 20 commits into from
Dec 6, 2023

Conversation

frcroth
Copy link
Member

@frcroth frcroth commented Nov 8, 2023

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • Create an annotation for a dataset
  • Delete that dataset
  • Download the annotation (it still works)
  • Opening annotation or dataset (via old URL) leads to error

TODOs:

  • Fix compact writes of annotations
  • do use the deleted dataset?
  • Frontend: Disable the view button for annotations with dataset status = "Deleted by user."

Issues:


(Please delete unneeded items, merge only when none are left open)

  • Updated changelog
  • Needs datastore update after deployment

@frcroth frcroth self-assigned this Nov 8, 2023
@frcroth frcroth changed the title Dont delete dataset row when annotations exist WIP: Dont delete dataset row when annotations exist Nov 8, 2023
@frcroth frcroth marked this pull request as draft November 8, 2023 10:57
@fm3
Copy link
Member

fm3 commented Nov 11, 2023

I’ve given this some more thought, and I think maybe we should reuse the status field of datasets to store this information. Have a look at deactivateUnreportedDataSources, which is called when updating the dataset list as triggered by an update from a datastore server. If a datastore is deleted on disk (not by webknossos, but by someone with disk access), this is how webknossos "deactivates" it, implicitly setting isActive to false by setting a status message. That is a string (because it can also include different error messages from the datastore), but it has a couple of standard possible values, one of which is "No longer available on datastore." (Compare the uses of the status field throughout the code). We could add one status message like "Deleted by user" for this feature. What do you think?

Sorry that I haven’t said this sooner, I see you have already started implementing it with a new column. But maybe unifying this would be nice.

@philippotto
Copy link
Member

I just looked into this, but I think two things are missing:

  • I don't see a dataset status in the annotation objects (only on the DS object) which is why I cannot change the UI when rendering annotations for deleted DS.
  • opening the annotation (with a deleted dataset) rightfully throws an error. however, opening the DS directly in view mode (e.g., via an old URL) yields a 200 status. rendering crashes later for other reasons, but I would expect an error response similar to the case where an annotation is opened.

@philippotto philippotto self-assigned this Nov 14, 2023
@fm3
Copy link
Member

fm3 commented Nov 14, 2023

How do these things work for datasets that are "No longer available on the datastore"?

@philippotto
Copy link
Member

Good point. I checked that now and the error is controlled by the front-end. It only cares for the existence of dataset.dataSource.dataLayers. If that is undefined, the error is shown. Now, the error is properly shown for the same DS for which it didn't work before. I assume that it was a caching issue? possibly bound to the rescan interval of the disk?

By the way, annotations with the "No longer available on the datastore" case are normally shown in the annotations list. The error is then shown when clicking "open" which is fine, I guess.

So, from my point of view, the front-end wouldn't need any change. Avoiding outdated dataset.dataSource.dataLayers would be nice for the back-end (if my theory from above is correct). Alternatively, the front-end could start checking the status string, but I'm not sure whether that is actually helps?

@fm3
Copy link
Member

fm3 commented Nov 14, 2023

Good to hear! Yes, I think that behavior is ok.

If I understood it correctly, the new delete action should immediately delete the layers in the postgres database, so there should be no backend-internal caching for dataset.dataSource.dataLayers. Maybe you could have another look at that Felix, and see if you can reproduce the issue Philipp describes

@frcroth frcroth requested a review from fm3 November 15, 2023 09:01
@frcroth frcroth changed the title WIP: Dont delete dataset row when annotations exist Don't delete dataset row when annotations exist Nov 15, 2023
@frcroth frcroth changed the title Don't delete dataset row when annotations exist Don't delete dataset in DB when annotations exist Nov 15, 2023
@frcroth frcroth marked this pull request as ready for review November 15, 2023 09:06
Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff, I think this help create less confusion for users :)

I added some comments, I think this needs to be integrated with the existing deactivateUnreported logic a bit more.

app/controllers/WKRemoteDataStoreController.scala Outdated Show resolved Hide resolved
uploaderIdOpt: Option[ObjectId],
searchQuery: Option[String],
includeSubfolders: Boolean,
excludeRemovedOnDisk: Boolean = true)(implicit ctx: DBAccessContext): Fox[SqlToken] =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means the datasets marked as deleted are not included in the list query at all? I’d say they should still be included, just like the "no longer available on datastore" ones. I think there are already filters used by the frontend for active/inactive datasets. My guess is that the isUsable column is used for that. Also could you have another look at the deactivateUnreported logic in DatasetService? I’d guess that this will somehow interact. the deletedByUserStatus should probably be included in inactiveStatusList?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I implemented this now, but I noticed that the status "Deleted by user" is actually never (or only for a short amount of time) present in the DB, because I trigger the rechecking of the inbox which checks the directories and the dataset is actually deleted on disk, so it will be marked as "No longer available on datastore.". Is this sufficient or should there be a more explicit status? Otherwise I could delete the "Deleted by user" status again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see. I think it would be valuable to preserve the information that it was in fact deleted by the user. Could you adapt the existing functions so that "Deleted by user" is not overwritten by "No longer available on datastore"?

Also, I do not yet fully see why this added check inbox trigger is needed now. Wouldn’t its effects already have happened on the wk side (delete layers, set status, etc)? Maybe I’m missing something though. It’s not like the new trigger should hurt much (it does potentially take some seconds, though, depending on IO speed)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it now. This behavior was caused by a typo in an SQL query, which meant the status was never actually assigned. This is fixed now. A problem that persists is that the deleted dataset is still displayed in the dataset list. This status should be probably filtered (via a modified request).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Hmm, I would have expected that these datasets have isActive=false, so the current query would skip them together with the "unreported" ones.

Maybe this has something to do with the isUsable bool? Just noticed that. Looks somewhat redundant to be honest, but I haven’t looked into it that much. Maybe you need to change that value when updating the status

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not certain what the best UX here would be. I would have said isUnreported (meaning show them if the "missing" filter is active) is good. That is at least some explanation why the name is still reserved. But maybe you prefer something different? cc @philippotto

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have said isUnreported (meaning show them if the "missing" filter is active) is good. That is at least some explanation why the name is still reserved.

I agree with that (for "deleted" datasets where annotations exist). To avoid any potential misunderstandings, let me write out how I'd imagine this overall:

  • deleting a DS for which no annotation exists should completely delete the DS so that it cannot be found via the UI (also not via any filters)
  • deleting a DS for which annotations exist, should tell the user "we deleted the dataset, but since annotations exist for it, the dataset will still be listed under "unreported". if you want to delete the dataset completely, do ...".
    • the perceived behavior would be equal to when the user deletes a dataset from disk.

Does this make sense?

Copy link
Member

@fm3 fm3 Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. On "if you want to delete the dataset completely, do ..." – unfortunately there is currently no user-exposed way to do this. That would be a follow-up issue. So I’d omit that part of the message for the moment 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fair :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted by user is now also a cause for datasets to be unreported.

Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! As discussed in person, I’d ask for additional explanations in the frontend for the user,

  • before deletion: that the name may still be blocked
  • after deletion inside of "show error", that this was still referenced and is kept to not break things elsewhere

…shboard tab; also explain that the name is not guaranteed to be usable after deletion
@philippotto
Copy link
Member

Looking good! As discussed in person, I’d ask for additional explanations in the frontend for the user,

  • before deletion: that the name may still be blocked
  • after deletion inside of "show error", that this was still referenced and is kept to not break things elsewhere

Done!

Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@frcroth
Copy link
Member Author

frcroth commented Dec 6, 2023

@philippotto Does this need frontend review?

@philippotto
Copy link
Member

@philippotto Does this need frontend review?

I don't think so. The front-end diff is mostly changing some words. @fm3 had a look at this, I think :)

@frcroth frcroth merged commit bc835db into master Dec 6, 2023
2 checks passed
@frcroth frcroth deleted the delete-dataset-annotation branch December 6, 2023 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deleting datasets triggers SQL foreign-key violation
3 participants