Read and comment before 3/22. #34
Thanks for the ping @iangrooms! I will put this in my queue.
Nice work, @iangrooms! In addition to my comments in the overleaf document, I have two more general comments/questions:
We are planning to eventually publish the gcm-filters paper in JOSS, where it will get a DOI and become citable. But, in my recollection, we explicitly decided to do this paper first. If we time things right, we can do the JOSS paper while this one is under review, and then update the citation before the JAMES paper goes to press. If not, we can go the Zenodo route. But either way, I think it's fine to just reference the GitHub url in our first draft. Edit: alternatively, if we feel the package is mostly ready, we could just go ahead and submit the JOSS paper in parallel with the JAMES one. That is allowed and encouraged by JOSS.
I agree with everything you said, but I also think that the deadline for having this sort of fully reproducible workflow is publication time, not submission time. Since several other papers in preparation are currently blocking on this one getting submitted (including the JOSS paper itself), I think we should defer polishing this stuff until after we get our first round of reviews back.
I managed to read through the paper this evening. Great work everyone! I think it's shaping up to be a very useful and exciting contribution. I left numerous comments and made small changes in overleaf. (I hesitated to make large changes without tracked changes enabled.) My three general comments about where we need to focus are:
I'm happy to work on 1 and 3 if others agree with this assessment.
Thanks everyone for reading and leaving comments. I think there is one major question for us to discuss: how much should we discuss the gcm-filters package in this paper? Should we say that "it's under development" or "we have developed it"? Either way, I think it would be quite convenient to talk about it, e.g. by saying that the package has a default setting for

Here's a list of changes to be made that I think don't require as much (or maybe any) discussion:
I've also left responses to several comments on the overleaf document about things that I didn't think needed changes or where I wasn't sure.
Of course I left something out: We should use the same color map for vorticity in Figures 5 (@sdbachman) and 6 (@ElizabethYankovsky). Can you coordinate to choose the same colormap and update one of the figures? |
I think we should say something like this: "An open-source Python package implementing this algorithm, called gcm-filters, is currently under development (see https://github.com/ocean-eddy-cpt/gcm-filters). An early version of the package was used to generate the results in this paper. A paper describing the software itself is in preparation for the Journal of Open Source Software, to coincide with the first release. In the present manuscript, our focus is the algorithm itself, not the implementation."
That seems reasonable to me, though I think the hardest part will be the rewrite of section 4, where I think you want to supply extra details about implementation. I think you might be the best person to work on that section and find the appropriate line between saying too much and not saying enough.
Just for the sake of discussion, I'd like to raise the possibility of just removing section 4. The paper is already long. In the JOSS paper we can do benchmarking more carefully and thoroughly. If this paper is indeed about the algorithm, rather than the implementation, is section 4 necessary? I don't feel strongly either way. Just seems like an option worth considering.
I'm actually not opposed, despite having spent a few hours on coding and running the tests. (@NoraLoose and @arthurBarthe also spent time on it.) For me the point was just to underscore that the algorithm is fast enough to be practically useful. If we drop the section, perhaps we could add some comments about timing into section 3, just to show that the cost is reasonable? On the other hand, whether people use it or not will probably only be weakly related to whatever timings we report in this paper, so it might not matter. I'm curious what others think.
The time we have spent on coding and running tests will not go to waste. We could report these results (and go deeper) in the JOSS paper, where we can naturally focus a bit more on the details of the implementation. Everyone who contributed to this paper will be invited to coauthor that one.
Some last-minute comments (the paper looks great!):

Introduction
- l. 164: "The background intuition for kernel-based spatial filters is developed entirely for functions on Euclidean spaces."
- l. 184: "in matrix form as Lf". Again this is probably a personal preference, but for me Lf is a common notation for applying the operator L to f regardless of whether a matrix multiplication is used in the implementation, so I think you could just as well write, "We write this operation as Lf - note that it is not necessary to actually construct the matrix corresponding to the operator L".
- l. 192: maybe say somewhere around here that in the case of Cartesian periodic boundaries the eigenvalues would be given by the discrete Fourier modes, but because we deal with more complex grids (at least that's my understanding) the eigenvectors are not that simple to obtain. This would go with my earlier comment about explaining the benefits of the new method.
- l. 242: "where dxmin is the smallest grid spacing". Maybe say earlier in 2.3.1 that the grid spacing is allowed to be irregular? More generally, I think this section is well written but could benefit from specifying (again, maybe at the beginning of 2.3.1) which grids we are allowed to work with.
- Figure 4: I think it's quite nice to see how the effective filter "adapts" when hitting a continent. Maybe worth quickly noting / commenting on this?
- l. 366: I'm not sure I entirely understood this.
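On the l. 192 point: the claim that the eigenvalues are given by the discrete Fourier modes is easy to check numerically. Below is a minimal sketch (not from the paper or the gcm-filters code) that assembles the standard second-difference Laplacian on a 1-D periodic Cartesian grid as a circulant matrix and compares its eigenvalues against the known Fourier-mode values -4 sin^2(pi k / N) / dx^2:

```python
import numpy as np

N, dx = 16, 1.0

# Second-difference Laplacian on a periodic 1-D grid, assembled as a
# symmetric circulant matrix (illustrative; the paper's point is that
# one never needs to form this matrix explicitly).
L = (np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)
     - 2.0 * np.eye(N)) / dx**2

# For periodic boundaries, the eigenvectors are discrete Fourier modes
# and the eigenvalues are -4 sin^2(pi k / N) / dx^2 for k = 0..N-1.
analytic = -4.0 * np.sin(np.pi * np.arange(N) / N) ** 2 / dx**2
numeric = np.linalg.eigvalsh(L)

assert np.allclose(np.sort(numeric), np.sort(analytic))
```

On a non-rectangular or masked grid this circulant structure is lost, which is exactly why the eigenvectors "are not that simple to obtain" there.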
Thanks for the comments @arthurBarthe. I will attempt to incorporate your comments about the intro into my rewrite. For the other comments:
- I updated it so that it now refers to the intuition developed in this section.
- The sentence currently reads "We write this operation in matrix form as
- I added a statement to this effect.
- I updated this so that it refers to quadrilateral grids. I don't really know what one would use for triangular grids. I don't want to list specific types of grids that this filter will work with; it should work for any grid where you have a discrete Laplacian.
- I added a comment.
- I will add a figure, per Scott's comment.
@iangrooms: Figures 5-6 should now be consistent; we're both using the red/blue colormap.
Thanks! Kudos for super speed. |
I will say that I did a lot of work that was targeted at section 4. Specifically, this work consisted of implementing a hierarchy of Laplacians -- with the most complex ones handling tripolar exchanges -- in order to "match" Ian's Fortran implementation on the POP grid so that we could do the timing tests. But on second thought I like your idea of moving the computational section to the JOSS paper. In a deeper analysis, we could maybe report and compare the computational speeds of the full hierarchy of Laplacians that I implemented (rather than just the most complex ones). I'm imagining something like the following:
If people agree that this would be an interesting analysis for the JOSS paper, I actually have results on all of this already, part of which I reported in #4 and some notebooks posted in this repo (for CPU and GPU). |
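For anyone skimming this thread who hasn't seen the code: a hedged sketch of what the simplest member of such a Laplacian hierarchy looks like. The function name and signature below are illustrative, not the gcm-filters API; the real hierarchy layers land masks, irregular spacing, and tripolar exchanges on top of a stencil like this:

```python
import numpy as np

def laplacian_periodic(f, dx=1.0):
    """Simplest member of a Laplacian hierarchy: a 5-point stencil on a
    regular, doubly periodic grid (illustrative, not the gcm-filters API)."""
    return (np.roll(f, 1, axis=0) + np.roll(f, -1, axis=0)
            + np.roll(f, 1, axis=1) + np.roll(f, -1, axis=1)
            - 4.0 * f) / dx**2

# Sanity checks: a constant field has zero Laplacian, and a Fourier mode
# in x is an eigenfunction with eigenvalue -4 sin^2(pi k / N) / dx^2.
N, k = 32, 3
f = np.cos(2 * np.pi * k * np.arange(N) / N)[:, None] * np.ones((1, N))
lam = -4.0 * np.sin(np.pi * k / N) ** 2
assert np.allclose(laplacian_periodic(np.ones((N, N))), 0.0)
assert np.allclose(laplacian_periodic(f), lam * f)
```

The more complex members of the hierarchy change only the neighbor-exchange step (masked boundaries, tripolar seam), which is why benchmarking them separately, as proposed above, isolates the cost of each grid complication.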
Thanks for the comments @NoraLoose! Again I would emphasize that no work will be "wasted" under this proposal; it's simply a different partition of results across two distinct papers. I like your matrix a lot...but I would also argue that it needs a third dimension: parallel scaling (e.g. number of CPU cores / GPUs). Ideally we would look at strong and weak scaling, as this is the correct way to benchmark an HPC application. A key problem with our current presentation is that I fear it compares performance for implementations that use different amounts of parallelism. In my recent CPU tests of gcm-filters with dask, I have found some subtle issues related to thread- vs. process-based parallelism that I don't yet understand. This could explain some of the anomalous results found by Ian (e.g. lack of parallel scaling for dask); it makes me mistrust the results in the current benchmarking table. I'd like to dig deeper on this, but I don't want to hold up this paper with that. So that's one motivation to consider splitting out the benchmarking results. If we go with this proposal, I think we can still describe the package in general terms in the current paper and say something about the fact that it is fast enough to use on real model output and works well on GPU.
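In case it helps with the follow-up benchmarking: a hedged config sketch of how one might pin dask to a specific scheduler so the timings compare like with like. The scheduler names are dask's own; the worker counts are illustrative, not a recommendation:

```python
import dask
from dask.distributed import Client

# Option 1: force the threaded scheduler. All tasks share one process,
# so pure-Python work is GIL-bound, but NumPy-heavy kernels that release
# the GIL can still run in parallel.
dask.config.set(scheduler="threads")

# Option 2: process-based parallelism via a local distributed cluster,
# one thread per worker (worker count here is illustrative).
client = Client(n_workers=4, threads_per_worker=1, processes=True)
```

Keeping the scheduler fixed across runs removes one confound when comparing against the Fortran timings.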
@rabernat everything you are saying sounds good to me! |
OK, I think all updates have been made. I will now open it up again for people to read and comment. Please use the comment system on overleaf for minor changes, and this github issue for bigger comments. Please get your comments in before Monday 3/22. You can use overleaf's History feature to see changes between this version and the first draft: Compare the two labels "First complete draft" and "Second complete draft." I also posted a pdf diff to the repository. Also, please add your grant info to the acknowledgments section. |
Thanks everyone for your comments. I think there are no outstanding issues (I made the minor changes suggested). Send me a message on Slack if there's anything you want to discuss. The only things left are
I will work on 2. I expect to be able to submit before the end of the week. |
It looks like we have a complete draft! Thanks everyone for making figures and putting in descriptions. I think the only things missing are (i) author affiliations, (ii) linking to the code & data in the acknowledgments, and (iii) funding info. Please fill in (i) and (iii), but we can leave (ii) until the end.
Please read and leave comments. For minor comments I recommend using the overleaf "comment" feature, but for more substantive comments please add them to this issue so we can all see them as they happen. Since we're all busy, I've given us a week to read & comment.
The author order is provisional, based on the following idea: grouped by affiliation, alphabetical within each institution. Please don't take offense; if you think we should change the order, please speak up. If you prefer to bring this up privately, just message me on Slack. If you're a postdoc and feel like you need an advocate, please talk to your mentor or any of the PIs whom you're comfortable speaking with.