"Bag of Images" object for unregistered fields of view #751
Comments
@ambrosejcarr thanks for generating a concrete issue and proposals on this somewhat nebulous problem. Some clarification:
From the beginning, we (and our spec) have supported registered data, meaning data that includes coordinates for each tile, not just resliced data that is already an aligned 5D tensor. Any departure from this is walking back from our original spec and should be discussed as such. Personally, I'm strongly in favor of keeping it.
Overall, I agree with @ambrosejcarr's assessment of the current situation: we have assumed that the raw data could come in with arbitrary locations for every .tif file. Fortunately, a lot of the refactoring work described above and the work put back on contributors can be avoided if we reconsider this assumption. I propose that we modify the assumption to match real-world data generation: within a FOV, images for each (channel, round) share the same x,y coordinates. This doesn't exactly solve the 5-D ImageStack problem, but it does:
As a contributor, I say that loading, resampling and rewriting entire data sets is too much to ask, now and in the future. Furthermore, reslicing a data set is a lot easier said than done. Specifically, implementing image translations, cropping, and dealing with FOV overlap is prone to introducing artifacts into downstream image processing pipelines. Also, the tools out there to do it are either moderately complicated (render) or moderately painful (FIJI). I look forward to more discussion here: this will definitely impact priorities for the next few months as SpaceTx data generation ramps up. |
I think there might have been some difference of interpretation there, but Deep and I are on the same page as you here: we can take registered data as you describe, without reslicing. Sorry if that wasn't clear. However, it may make more sense to reslice it ourselves outside of starfish, at least for the SpaceTx experiments, to triage the work necessary to treat data as it is acquired. Treating registered data directly could mean a general interpretation of complex transformations, which opens a scary box.
I agree with the "in the future" part of this, but can you help me understand why this is so burdensome? As far as I can tell, for a one-off experiment it would just mean a few dozen compute hours and an extra 1-60 TB (size on 1 FOV) of temporary storage, hence my proposal above that we (starfish team) might want to reslice it to free time to focus on infrastructure work. Certainly it seems easier to me to take registered data from users and reslice it on S3 than it does to adapt starfish in the short term to support data that's not pixel-aligned in 3D. What am I missing here? Thanks Brian! |
These problems are completely avoided by doing per-round, per-channel processing and handling the registration in the IntensityTable. And this seems to me to be pretty easy to implement and useful for several reasons beyond just handling SpaceTx data in the short term. I'm also a little concerned that a focus on building for resliced data will make registered (but not resliced) data second-class citizens in starfish. I'm biased here of course, because I'm generating this kind of data, but I'm not the only one! |
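The per-round, per-channel approach described above, with registration handled at the table level rather than in the pixels, might be sketched roughly as follows. This is only an illustration: `Tile`, `Spot`, `to_physical`, and all field names are hypothetical and are not starfish API. It assumes each tile records the physical x,y of its origin and a pixel size, and that registration is a pure translation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tile:
    """Hypothetical tile metadata: physical origin (um) and pixel size (um/pixel)."""
    round_index: int
    channel_index: int
    x_origin: float
    y_origin: float
    pixel_size: float


@dataclass(frozen=True)
class Spot:
    """A spot found in one tile, in that tile's own pixel coordinates."""
    x_px: float
    y_px: float
    intensity: float


def to_physical(tile: Tile, spots: List[Spot]) -> List[Tuple[int, int, float, float, float]]:
    """Map per-tile pixel coordinates to shared physical coordinates.

    Assumes translation-only registration; the image data is never resampled.
    Each row is (round, channel, x, y, intensity).
    """
    return [
        (
            tile.round_index,
            tile.channel_index,
            tile.x_origin + s.x_px * tile.pixel_size,
            tile.y_origin + s.y_px * tile.pixel_size,
            s.intensity,
        )
        for s in spots
    ]


# Two tiles of the same FOV whose stage positions differ by 0.5 um in x.
tile_a = Tile(round_index=0, channel_index=0, x_origin=100.0, y_origin=200.0, pixel_size=0.5)
tile_b = Tile(round_index=1, channel_index=0, x_origin=100.5, y_origin=200.0, pixel_size=0.5)

# Spots are found tile-by-tile, then placed in one coordinate space.
rows = to_physical(tile_a, [Spot(10, 20, 1.0)]) + to_physical(tile_b, [Spot(9, 20, 0.9)])
```

In this toy case the two spots sit at different pixel positions in their tiles but land on the same physical location once the tile origins are applied, which is the property decoding across rounds would rely on.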
Don't worry. Just gathering information to figure out what makes sense to implement now (and we won't make that decision unilaterally). It's more a question of "when". I think we could get your workflows functioning in current starfish without too much trouble (as you outline, thanks!), but I'm concerned about some of the ISS and expansion approaches. I think a separation of objects (ImageStack vs BagOfImages) might be the right way to differentiate the two types of workflows, which would keep both as first-class citizens. Your other points are very helpful. It's good to understand it's not a computational/cost issue, but an algorithmic one. I generally think your point about processing the data the way it is generated is a very compelling one, and I will advocate that starfish accommodate that. |
Because we requested registered & resliced data for the MVP, ImageStack makes the implicit assumption that contained data is registered, and thus that data can be treated as a 5-d tensor that lives in a consistent Euclidean space.
As users push back on this requirement, our attempts to use unregistered data are causing fragmentation of our object model. For example, we are discussing the need to assert that data lives in the same coordinate space to submit it to certain types of spot detection (see #695).
When data is not registered, it means:
- [...] IntensityTable is more complex and must be done tile-by-tile
- [...] ImageStack (for spot calling)
The purpose of this issue is to discuss a short- and long-term plan to solve this problem. My best guess at how to tackle this is below.
Long-term Proposal
- [...] Filter and SpotFinder components that currently take ImageStack objects. Given that users have not had trouble storing unregistered data in SpaceTx-format, the underlying data should be stored in the same way on disk and potentially also in-memory.
- Registration methods that we expose in the future would then take "bag of images" and emit ImageStacks.
- [...] ImageStack that all tiles hold the same physical coordinates.
- SpotFinder methods that require registration can take only ImageStack objects. More flexible spot finders can take either ImageStack or "bag of images".
- Filter methods will not offer 3d processing to "bag of images" objects.
Short-term (SpaceTx) Proposal:
Unregistered data is currently strictly out-of-scope for starfish. To avoid creating technical debt by introducing inappropriate flexibility into ImageStack, we should:
- [...] SpaceTx-format should retain the ability to store per-tile coordinates so that we can expand to work with unregistered data in the future.
- [...] ImageStack that all tiles hold the same physical coordinates (all data is registered & resliced)
cc @joshmoore @berl @kevinyamauchi @dganguli @shanaxel42 @ttung
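The condition that all tiles hold the same physical coordinates, which appears in both proposals above, could be checked along the lines of the sketch below. It is an illustration only: `TileCoordinates`, `share_physical_coordinates`, and the tolerance value are hypothetical and do not reflect the existing ImageStack or SpaceTx-format code. It assumes each tile stores its minimum and maximum physical x and y.

```python
import math
from typing import Iterable, NamedTuple


class TileCoordinates(NamedTuple):
    """Hypothetical per-tile physical extent, as stored alongside each tile."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def share_physical_coordinates(tiles: Iterable[TileCoordinates], tol: float = 1e-6) -> bool:
    """Return True if every tile has the same physical extent, within tol."""
    tiles = list(tiles)
    if not tiles:
        return True
    first = tiles[0]
    # Compare each field of each remaining tile against the first tile.
    return all(
        math.isclose(a, b, abs_tol=tol)
        for tile in tiles[1:]
        for a, b in zip(tile, first)
    )


# Three tiles with identical extents, then one extra tile shifted by 0.5 in x.
registered = [TileCoordinates(0.0, 10.0, 0.0, 10.0)] * 3
unregistered = registered + [TileCoordinates(0.5, 10.5, 0.0, 10.0)]
```

Under this sketch, an ImageStack constructor could refuse data for which the check returns False, while a "bag of images" object would accept the same tiles without running it.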