Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use F90 interfaces and kinds to handle 4/8/d build issues, and only build one copy of the library #78

Closed
edwardhartnett opened this issue Nov 17, 2020 · 82 comments · Fixed by #250

Comments

@edwardhartnett
Copy link
Contributor

This will come after #77

Currently we build 3 flavors of the library, 4, 8, and D. Instead, use fortran interfaces and kinds to provide one library that handles all three cases.

Not sure when this should happen, we want to try this out on a simpler project first.

@edwardhartnett
Copy link
Contributor Author

@jbathegit when do we think this issue will be addressed?

@jbathegit
Copy link
Collaborator

@edwardhartnett I honestly have no idea. I haven't had any time to work on NCEPLIBS-bufr updates in the past couple of months, and I'm not sure how soon I'll be able to get back to it. I'd certainly welcome any help if someone else wants to work on this in the meantime.

@jbathegit
Copy link
Collaborator

@edwardhartnett @aerorahul @rmclaren @kgerheiser @jack-woollen

I've spent a bit of time looking into this. I've come up with a prototype that will work, but there are a couple of issues as noted below. For example, let's say I have an existing Fortran-77 library routine named writsa which has the following call signature:

call writsa( lunxx, lmsgt, msgt, msgl, xval, yval)

and where all of the arguments are currently implicitly typed (so the first four are integers and the last two are reals), and the third argument (msgt) is an array of size lmsgt. The goal here is to develop a single interface that can automatically handle any of 3 cases that might be passed in by a user:

  • the _4 case = 4-byte integers and 4-byte reals
  • the _8 case = 8-byte integers and 8-byte reals
  • the _d case = 4-byte integers and 8-byte reals

So I could pick, say, the second case as the new default (meaning that, going forward, the only compiled version of the library would be equivalent to the existing _8 build) and have something like the following:

module modi_writsa

    interface writsa
        module procedure writsa_4, writsa_8, writsa_d
    end interface

    contains

    subroutine writsa_4( lunxx, lmsgt, msgt, msgl, xval, yval )
        implicit none

        integer(kind=4), intent(in) :: lunxx
        integer(kind=4), intent(in) :: lmsgt
        integer(kind=4), intent(out) :: msgt(:)
        integer(kind=4), intent(out) :: msgl
        real(kind=4), intent(in) :: xval
        real(kind=4), intent(out) :: yval

        integer my_lunxx, my_lmsgt, my_msgl, ierr
        integer, allocatable :: my_msgt(:)
        real :: my_xval, my_yval

        my_lunxx = lunxx
        my_lmsgt = lmsgt
        my_xval = xval
        allocate(my_msgt(lmsgt),stat=ierr)
        if (ierr .ne. 0 ) then
          print *, 'allocate error in writsa_4'
          return
        endif

        call writsa_f ( my_lunxx, my_lmsgt, my_msgt, my_msgl, my_xval, my_yval )

        msgt(1:lmsgt) = my_msgt(1:lmsgt)
        msgl = my_msgl
        yval = my_yval
       
        return
    end subroutine

    subroutine writsa_8( lunxx, lmsgt, msgt, msgl, xval, yval )
        implicit none

        integer(kind=8), intent(in) :: lunxx
        integer(kind=8), intent(in) :: lmsgt
        integer(kind=8), intent(out) :: msgt(:)
        integer(kind=8), intent(out) :: msgl
        real(kind=8), intent(in) :: xval
        real(kind=8), intent(out) :: yval

        call writsa_f ( lunxx, lmsgt, msgt, msgl, xval, yval )

        return
    end subroutine

    subroutine writsa_d( lunxx, lmsgt, msgt, msgl, xval, yval )
        implicit none

        integer(kind=4), intent(in) :: lunxx
        integer(kind=4), intent(in) :: lmsgt
        integer(kind=4), intent(out) :: msgt(:)
        integer(kind=4), intent(out) :: msgl
        real(kind=8), intent(in) :: xval
        real(kind=8), intent(out) :: yval

        integer my_lunxx, my_lmsgt, my_msgl, ierr
        integer, allocatable :: my_msgt(:)

        my_lunxx = lunxx
        my_lmsgt = lmsgt
        allocate(my_msgt(lmsgt),stat=ierr)
        if (ierr .ne. 0 ) then
          print *, 'allocate error in writsa_d'
          return
        endif

        call writsa_f ( my_lunxx, my_lmsgt, my_msgt, my_msgl, xval, yval )

        msgt(1:lmsgt) = my_msgt(1:lmsgt)
        msgl = my_msgl
       
        return
    end subroutine

  end module

I would then rename the existing subroutine writsa as writsa_f, and that way users can still call the same name writsa as before, but now it will automatically re-direct them through the proper interface procedure (either writsa_4, writsa_8, or writsa_d, depending on how they have the variables declared within their own application code), and where each of these would then act as a conduit into the existing subroutine (now renamed as writsa_f) and where the call signature there is now clearly defined with all of the arguments being 8-byte integers and 8-byte reals as the default build of the library. I've tested this out and it works;

HOWEVER...

the main issue I see here is one of efficiency; specifically, in the writsa_4 and writsa_d cases, I have to copy all of the 4-byte values input by the user into corresponding 8-byte local variables, then call the single writsa_f routine with those 8-byte variables, then copy the output back into the original 4-byte values. This is even more complicated by the fact that one of the variables (msgt) is an array, which means I need to dynamically allocate the correct amount of space at run time in order to do the transfers. This seems like a lot of extra overhead, especially if we have to dynamically allocate temporary memory every time the subroutine is called. Is there maybe a cleaner way to do this that I'm missing here?

A secondary issue is that this, or whatever approach we end up using, will need to be replicated across hundreds of existing subroutines and functions in order to end up with a single build of the library that handles all of the existing _4, _8, and _d cases, plus all of the accompanying documentation, etc. Again, unless I'm missing something here that would make this process a lot more straightforward, this will be a lengthy task.

I'd greatly appreciate any comments, thoughts, advice, suggestions, etc. - thanks!

@kgerheiser
Copy link
Contributor

As you note there's no silver bullet when doing this in Fortran. Hopefully, a native method will be included in the next Fortran standard.

But for now another way to overload the method would be to overload writsa_f directly eliminating the need to copy data and then call the function. To do this directly in Fortran you would have to copy and paste all the code in three different places.

A method I've used before to reduce the code duplication is to use the pre-processor to #include the body of the function and the kinds:

#define INT_T int32
#define REAL_T real32
subroutine writsa_4( lunxx, lmsgt, msgt, msgl, xval, yval )
        implicit none

        integer(INT_T), intent(in) :: lunxx
        integer(INT_T), intent(in) :: lmsgt
        integer(INT_T), intent(out) :: msgt(:)
        integer(INT_T), intent(out) :: msgl
        real(REAL_T), intent(in) :: xval
        real(REAL_T), intent(out) :: yval
#include "writsa_body.f"
    end subroutine
#undef INT_T
#undef REAL_T

#define INT_T int32
#define REAL_T real64
subroutine writsa_d( lunxx, lmsgt, msgt, msgl, xval, yval )
        implicit none

        integer(INT_T), intent(in) :: lunxx
        integer(INT_T), intent(in) :: lmsgt
        integer(INT_T), intent(out) :: msgt(:)
        integer(INT_T), intent(out) :: msgl
        real(REAL_T), intent(in) :: xval
        real(REAL_T), intent(out) :: yval
#include "writsa_body.f"
    end subroutine
#undef INT_T
#undef REAL_T

But then you split up the function body and interface definition and rely on the pre-processor. Or it could be automated with the build system.

With bufr you would have to start at the lowest level functions and work your way up.

@aerorahul
Copy link
Contributor

As @kgerheiser notes, Fortran does not support templates.
Another possible approach would be to write the main code in a language that supports templates e.g. C++ and provide Fortran interfaces.

@rmclaren
Copy link
Contributor

Ive never used it but fypp (https://fypp.readthedocs.io/en/stable/fypp.html) seems to be a popular way to address these kinds of issues. I noticed that some JCSDA code uses it. Its a little ugly in my opinion but anyways...

@edwardhartnett
Copy link
Contributor Author

I'm not in favor of bringing in any new tools like fypp. In fact, I strongly oppose the idea.

There is always a strong urge to attempt to change Fortran in some way to overcome it's many shortcomings. (For example the UFS apparently has a home-rolled python to Fortran translator). In my experience, none of these band-aids on Fortran bring long-term benefit to the organization.

For example, we would not be very delighted to have to be using one of the many pre-processors I remember from Fortran 77 days, which gave F77 programmers new and wonderful innovations, now long-since forgotten. Will programmers at EMC in 25 years want to deal with fypp, our home-rolled python translator, or any other hack we make to Fortran to try and make it more easy to program?

Fortran is a long-term standard, but these other tools are local, temporary, and not really in it for the long haul. As the programmers that work on them retire or move on to new projects, these tools will be abandoned. But standard Fortran will march on, and NOAA will still want to run the millions of lines of existing Fortran code it has developed at great cost.

Future NOAA programmers, cursing our shortsightedness, will simply be removing these band-aids and returning to standard Fortran (or C/C++).

@edwardhartnett
Copy link
Contributor Author

I would be in favor of moving the core of this library (in a gradual way) to C. I think C++ is not really needed, and is much harder for Fortran programmers to learn and consequently harder to staff.

(It should not be assumed that any Fortran programmer can instantly learn either C or C++. There is a steep learning curve, especially when it comes to pointers and object oriented programming, which can be real stumbling blocks. Training and help is required. Also the programmer must be willing to put in the effort.)

I also think this problem could be solved in pure Fortran by making the functions all accept double-precision, and then having single precision calls which cast/copy the data to double-precision. Since no calculations are done by the library (i.e. no hairy physics equations being solved) having single precision vs. double is not going to affect performance.

@jbathegit do you know C or C++?

The way to solve this in C would be first to write a function using parameters and void pointers, which could take any of the data sizes needed, then write simple C wrappers for each actual case needed (i.e. one for single precision one for double), then write Fortran APIs to call those C wrappers. This way the void pointer is hidden from the Fortran layer, and it just sees C functions with nice, well-defined types. All of this is 100% standard C and Fortran.

@jbathegit
Copy link
Collaborator

Thanks everyone for your feedback and suggestions!

First of all, and to answer the question from @edwardhartnett, yes I'm proficient in C, but not so much in C++.

Secondly, a question for @rmclaren - in your example writsa_d, how does any actual space for the local variable _msgt get allocated? I see where you're declaring it as an array, but I don't see where you've actually allocated it to be a certain size, so how can _writsa safely write into that space without risking a segfault? Unless I'm missing something(?), this ends up being the same problem I had in my original writsa_d code (see above), where I needed to dynamically allocate my_msgt every time the function was called.

I was able to get around that by using the suggestion from @kgerheiser to incorporate an #include directive in a source file modi_writsa.F90:

module modi_writsa

    interface writsa
        module procedure writsa_4, writsa_8, writsa_d
    end interface

    contains

    subroutine writsa_4( lunxx, lmsgt, msgt, msgl, xval, yval )

        integer(kind=4), intent(in) :: lunxx
        integer(kind=4), intent(in) :: lmsgt
        integer(kind=4), intent(out) :: msgt(:)
        integer(kind=4), intent(out) :: msgl
        real(kind=4), intent(in) :: xval
        real(kind=4), intent(out) :: yval

#include "writsa_body"

    end subroutine

    subroutine writsa_8( lunxx, lmsgt, msgt, msgl, xval, yval )
        
        integer(kind=8), intent(in) :: lunxx
        integer(kind=8), intent(in) :: lmsgt
        integer(kind=8), intent(out) :: msgt(:)
        integer(kind=8), intent(out) :: msgl
        real(kind=8), intent(in) :: xval
        real(kind=8), intent(out) :: yval

#include "writsa_body"

    end subroutine

    subroutine writsa_d( lunxx, lmsgt, msgt, msgl, xval, yval )

        integer(kind=4), intent(in) :: lunxx
        integer(kind=4), intent(in) :: lmsgt
        integer(kind=4), intent(out) :: msgt(:)
        integer(kind=4), intent(out) :: msgl
        real(kind=8), intent(in) :: xval
        real(kind=8), intent(out) :: yval

#include "writsa_body"

    end subroutine

end module

But, as he noted, this separates the function body from the interface definition and relies on pre-processing, which in turn becomes a hairy mess for Doxygen to make sense of. Furthermore, while I was able to get a simple test code to compile and run via:

ifort -r8 -i8 modi_writsa.F90 test.f90

from the UNIX command line, I couldn't then get the same modi_writsa.F90 code to work within the existing CMake build procedure for NCEPLIBS-bufr; instead, it squawked with the following diagnostic:

undefined reference to writsa_

So I'm still kind of stuck here.

Please note that, whatever solution we come up, it has to work for any user and existing application code simply by calling the name writsa; otherwise, if we change that interface in any way, then every existing user has to correspondingly change his or her own application code, which is something we definitely want to avoid ;-)

@jbathegit
Copy link
Collaborator

jbathegit commented Jan 20, 2022

@edwardhartnett @kgerheiser @rmclaren @aerorahul @jack-woollen

Digging further, the undefined reference is due to the writsa reference being defined in the interface statement, which in turn is defined inside of a module, so I just have a use modi_writsa to bring it into scope within my simple test program. But when I'm building the NCEPLIBS-bufr library via CMake, there's no corresponding main program, and therefore no way I can think of to make the writsa reference visible within the library itself, aside from requiring every existing user to add use modi_writsa within their application code.

Am I missing something here? In other words, is there some straightforward way that I can make the writsa reference in the above interface statement visible throughout the compiled library so that an end user can see it by just linking the compiled library, and without requiring them to have to specify use modi_writsa in their application code?

@kgerheiser
Copy link
Contributor

There has to be a use <module> somewhere. For users there would be a single public facing module, which re-exports all the internal modules, like:

module bufr_mod
use modi_writsa
use ...
use ...
end module bufr_mod

And internally, each piece of code would use the individual modules that it needs.

@jbathegit
Copy link
Collaborator

jbathegit commented Jan 20, 2022

Thanks @kgerheiser, and yeah I did think of something like that, but I was hoping there might be some way for every user to not have to add any such new directive to their application code, or even worse within every individual source file of their code where any NCEPLIBS-bufr routine ever gets called.

I guess I was hoping there might be some sort of generic Fortran structure that I was overlooking and that would automatically get compiled and be in scope within the library without users having to modify their own application code in order to have visibility to that interface(?)

Anyway, thanks again for the confirmation, and if anyone else can think of a separate way around this, please let me know.

I guess my other question is a more general one for @edwardhartnett, in terms of how important this is? I get that it would be nice to be able to have one version of the compiled library that all application codes can link to (vs. separate _4, _8 and _d builds), but this is quite a bit of restructuring just for one subprogram out of many, and we haven't even addressed the issue of how Doxygen is going to behave with all of these #include *body directives linking in all of the various code snippets from all over the place. And converting any of these interfaces to C language would be just as time-consuming, so in the long run is it really worth doing all of this? In other words, are we getting enough proverbial bang-for-the buck from this time/resource investment?

@jbathegit
Copy link
Collaborator

Another issue I see with a single public-facing Fortran module such as bufr_mod is that any user trying to access NCEPLIBS-bufr from within application code written in C or Python would be out of luck. In other words, how are they going to be able to add a use bufr_mod to their application code if their code isn't Fortran?

@kgerheiser
Copy link
Contributor

@jbathegit you can still interface with C by using bind(c) and iso_c_binding types.

@jbathegit
Copy link
Collaborator

jbathegit commented Jan 21, 2022

@jbathegit you can still interface with C by using bind(c) and iso_c_binding types.

I'm confused. Given the above example module modi_writsa, how would I reference the name writsa from within C code? As far as I know, I can't add the bind(c) attribute to a name defined in a Fortran interface statement.

In other words, interface writsa bind(c) doesn't work, so how would I do this?

@edwardhartnett
Copy link
Contributor Author

How important is this? I would say quite important.

Right now we are doing this 4/8/d thing, which is a NOAA special - I've never seen anyone else do this or even try it. So we are well outside the normal with this.

Heading further down the road with 4/8/d is just putting off the problem. I have outlined a solution in C - can this work?

@aerorahul
Copy link
Contributor

Before this library is refactored/rewritten, what does the future use of BUFR look like in our systems and in general with the development around us?
Who are the primary users of this and the new library? What are their needs?
Is this library needed in the future or can it be combined with other existing solutions?

@edwardhartnett
Copy link
Contributor Author

I have heard the desire to phase out GRIB1 (although this still has not been achieved). There has been no talk of getting rid of BUFR, has there?

Our goal is to have world-class software so scientists can do world-class science. We will do whatever programming work we need to do, in order to achieve that.

If that means extra programming... well, we are programmers. It's what we do all day, every day.

image

@aerorahul
Copy link
Contributor

BUFR is not going anywhere. I am talking about use of bufr, not about its existence.

@edwardhartnett
Copy link
Contributor Author

OK, so this is our bufr solution for NOAA for the next several decades.

At the moment I am working hard on the GRIB libraries, and that will take some time. @jbathegit if you want to defer this entire conversation until I have free time to come help NCEPLIBS-bufr with this conversion away from 4/8/d, that's fine.

We are doing this for other libraries, and will learn as we go. So let's table this issue for a while until we have more experience.

Jeff, if you want to try out the C solution, I think that's where we will end up heading.

@jack-woollen
Copy link
Contributor

jack-woollen commented Jan 24, 2022 via email

@jbathegit
Copy link
Collaborator

@aerorahul raises a good question that I've raised myself in the past, which is what the future of BUFR looks like in our systems? He's correct that BUFR isn't going anywhere, and in fact we'll definitely continue receiving BUFR observational data from the outside world for many years/decades into the future. So we're always going to need to have software to be able to read and write BUFR when interacting with the rest of the world, because it is and will remain a WMO standard for a long time, and there's been no discussion whatsoever at the WMO level about changing that.

But as I've mentioned previously, and just because we'll have a continued need to be able to read/decode and write/encode BUFR in our interactions with the rest of the world, that doesn't mean we necessarily need to forever continue to use BUFR as our internal NCEP storage and archive format. In other words, there's no reason many of our internal codes couldn't switch to something else (direct JEDI/IODA array vectors?) if that were deemed preferable, and in that case it would greatly reduce the number of NCEP codes that we have to support with our BUFR software. But of course that's an enterprise-level decision that would have far-reaching implications into a lot of existing software and other internal practices. So, again, just throwing that out there as food-for-thought.

For my part, and to answer @jack-woollen's question, I have no idea how many existing codes or what percentage use the 8 or the d versions. I would imagine it's a fairly small number, because most BUFR values fit just fine into 32 bit fields. That said, there are a handful of newer BUFR descriptors whose range of values don't fit into 32 bits and therefore do require 8 bytes, and which is why I've suggested making that the new default and just tailoring all newly-developed interfaces to that build.

@jack-woollen
Copy link
Contributor

jack-woollen commented Jan 24, 2022 via email

@jbathegit
Copy link
Collaborator

The bufrlib inputs and outputs are all already 8 bytes. You don't need different libraries for that.

@jack-woollen please clarify what you mean by this. Specifically, when a user application calls, e.g. SUBROUTINE WRITSB(LUNIT), how does the subroutine know whether LUNIT was declared (or compiled) as a 4-byte or 8-byte integer in the calling program? This isn't mandated in the library itself, which just lets it be implicitly declared as an integer. Similarly, when an application code calls FUNCTION IUPBS01(MBAY,S01MNEM), the array MBAY is just implicitly declared internally as DIMENSION MBAY(*), so how does the function know whether each array member is 4 or 8 bytes long in the calling program? Since we never defined these interfaces explicitly in the past, that's part of the reason we now have this ambiguity.

Unless I'm missing something, the only things that are explicitly declared as 8 bytes within the library itself are the REAL values arrays that get passed into and out of subroutines such as UFBINT, UFBREP, etc. Pretty-much every other integer or real call parameter to the library doesn't have an explicit length specified.

@jack-woollen
Copy link
Contributor

jack-woollen commented Jan 24, 2022 via email

@CoryMartin-NOAA
Copy link

just because we'll have a continued need to be able to read/decode and write/encode BUFR in our interactions with the rest of the world, that doesn't mean we necessarily need to forever continue to use BUFR as our internal NCEP storage and archive format. In other words, there's no reason many of our internal codes couldn't switch to something else (direct JEDI/IODA array vectors?) if that were deemed preferable, and in that case it would greatly reduce the number of NCEP codes that we have to support with our BUFR software. But of course that's an enterprise-level decision that would have far-reaching implications into a lot of existing software and other internal practices.

Even if we still use BUFR files for archival/direct input to JEDI/IODA, that should still greatly simplify the codes that use BUFR libraries, no? We are already in the process of removing preprocessing (prepobs, etc.) code to put everything in JEDI UFO, so I think it's important to look at this with the lens of one unified software system (written in C++) will handle observation ingest once legacy systems are retired. The only other use that I could think of would be verification/MET+. Tagging @ADCollard @emilyhcliu for awareness/input.

@CoryMartin-NOAA
Copy link

Also to comment on @edwardhartnett 's quote "OK, so this is our bufr solution for NOAA for the next several decades." I think that is a chilling thought. This current BUFR library has pieces old enough to run for senate. This warrants thought on what the BUFR library at NCEP will look like in 2040.

@jbathegit
Copy link
Collaborator

On a related note, I don't think that any library subroutine or function which is never intended to be directly called from application codes should be on the list. Specifically, I noted

  • sntbbe.f
  • sntbde.f
  • sntbfe.f
  • stntbia.f

which I don't think should ever be called from outside of the library, and which I therefore don't think should need wrappers.

@jbathegit
Copy link
Collaborator

Saving a transcript of recent email discussions into this GitHub issue, for the record...

On Tue, Oct 18, 2022 at 3:49 PM Jeff Ator - NOAA Federal <jeff.ator@noaa.gov> wrote:
Hey Jack,

I agree we've probably considered enough "what ifs" to be able to move forward.  I'll continue working on this as well when I have time, but there are far fewer functions than subroutines, so I don't think it's fair to you that I only have to work on the former and not the latter.  So how about we both just work from your earlier list, and then we each just push our updates to the r8wraps branch on GitHub and use that forum to keep each other updated on where we're at?

-Jeff

On Tue, Oct 18, 2022 at 7:02 AM Jack Woollen - NOAA Affiliate <jack.woollen@noaa.gov> wrote:
Hey Jeff

That's a good one about the seg fault generated by x84 operating on a constant in the arg list. So another rule for input args is, to be "safe", always copy inputs, via x84, into a local 4byte scalar or array. I suppose this will work for a constant array in the arg list also. Probably don't see many of those. In any case I think we've looked at enough ins and outs to be fairly confident in moving forward with reworking the im8b blocks with the new strategy. I can work on this intermittently in the next month or so. One possible division of labor could be functions and subroutines. Since I already went through a lot of subroutines how about I continue with those, and you work on functions.

Jack

On Mon, Oct 17, 2022 at 5:09 PM Jeff Ator - NOAA Federal <jeff.ator@noaa.gov> wrote:
Hey Jack,

Now that I think about it some more, you're probably right that the LMSGOT assignment case would only work on a big-endian machine if it was declared locally as I*8.

You're also right that, for any input parameter, x84 could just remap the memory and x48 could just remap it back so there would be no "net" change to the input value.  However, we'd still have a problem if a calling program ever passed in a constant as that input parameter rather than a variable set to the same value, and in which case we'd end up with a segfault from trying to byte-shift protected memory.  In other words, we can't be sure a calling program will always pass in a value whose memory address we can modify.  So to be safe, I think we're always going to need to use x84 to copy any input parameter into a separate local array or scalar before making the recursive call.

On the output side, we could still pass any output parameter directly into the recursive call, but after doing so we'd need to make sure we always call x48 on any such parameter before we pass control back to the calling program, in case it's a big-endian machine.

Thanks,
-Jeff

On Fri, Oct 14, 2022 at 6:57 PM Jack Woollen - NOAA Affiliate <jack.woollen@noaa.gov> wrote:


On Fri, Oct 14, 2022 at 1:45 PM Jeff Ator - NOAA Federal <jeff.ator@noaa.gov> wrote:
Hey Jack,

I agree that using x84 with N=1 has no practical effect (i.e. "does nothing" ;-) on a little-endian machine like WCOSS2, but it seems to me it would if we were on a big-endian machine, because in that case the value of LO (i.e. the first byte of the low-order word) is 5, not 1.  Again, we need to make sure whatever changes we make to the library are endian-independent, and if we happened to be on a big-endian machine then there would be some byte-shifting involved.  So that's why I was trying to include that logic in my own prototype, because when you pass an I*8 integer across a subroutine interface in Fortran, what you're really passing is the address of the first byte of the underlying memory, so if that same memory is re-interpreted as I*4 on a big-endian machine, then you end up with the wrong half of the word if you don't use x84.

Absolutely right. So for endian sake we need to use x84 for all inputs to the recursive call.  And also x48 for all outputs back to the application, unless you use x84 to copy values to a separate array or scalar which won't be changed.

And that in turn brings up another possible bugaboo.  As you know, the idea behind x84 is so that we can remap an I*8 array as an I*4 array for use in the recursive call to the same subroutine inside of the IM8B wrapper.  However, if we happen to be on a big-endian machine, then x84 is actually shifting bytes, which means that we're modifying an array that was supposed to only be input to the subroutine.  In other words, if in a subroutine docblock we advertise an argument as being strictly "input", then we should never be modifying the underlying memory anywhere or for any purposes inside of that subroutine!

 If we use x84 going in and x48 going out it shouldn't be a problem. It would just change it and change it back, if the value wasn't changed by the recursive call. Alternatively, you could always use x84 to copy to a separate array (or scalar), then no need for x48 going out at all in that case, if there's no changes.

In the case of N=1 we can get around this problem by just using an assignment statement to a local variable inside of the subroutine (e.g. MY_LMSGOT = LMSGOT) and then pass that local variable (which is implicitly I*4) into the recursive subroutine call, because in that case it's an assignment by value (not by address), and so it will always work regardless of the endianness of the underlying machine.  But what if we were passing a contiguous array of I*8 integers into the subroutine as input?  In that case we'd need to remap the memory using x84, but if we do that then we're modifying an array that is strictly an input array!

Two things. One, I think the assignment only works if lmsgot is defined I*8. And two, see the response above.

The x-files give a lot of options, enough I think to handle any of the various situations presented by the subroutines modified for IM8B operation. It will certainly suffice for most, if not all, of them.

Thoughts?
-Jeff

On Fri, Oct 14, 2022 at 10:41 AM Jack Woollen - NOAA Affiliate <jack.woollen@noaa.gov> wrote:

As for x84 and x48 you're exactly right. X48 should be used in all cases when returning 8byte values from 4byte words. It is the equivalent of int8=int4. And x84 should be used for all 8byte word array input to 4byte arrays. In the case of scalar words no need to use x84 unless you want to copy the 8byte word into 4bytes with another name, i.e. call x84(i,j,1), as opposed to call x84(i,i,1). The latter case actually does nothing. The x-routines (x-files?) are generally able to work fine when the first 2 arguments are the same.

And for why ones are defined with -1, see directory /lfs/h2/emc/global/noscrub/Jack.Woollen/squawk/x84x48/bits. Output from running showbits (below) tells the story.

I'll start reworking subroutines on the list, asap.

Sincerely
Jack


+ ftn -g -traceback -o showbits.x showbits.f bits.f
+ showbits.x
 
 bits for int4=0
00000000000000000000000000000000 =      0
 
 bits for int8=0
0000000000000000000000000000000000000000000000000000000000000000 =      0
 
 bits for int4=-1
11111111111111111111111111111111 =     -1
 
 bits for int8=-1
1111111111111111111111111111111111111111111111111111111111111111 =     -1
 
 



On Thu, Oct 13, 2022 at 5:30 PM Jeff Ator - NOAA Federal <jeff.ator@noaa.gov> wrote:
Dear Jack,

I did look at your example, and I do appreciate you putting this together.  I can take care of setting the value of the logical "little" within CMake for the library, and I presume your plan going forward is that we would now call x48 whenever we need to shift the bytes for an I*4 to be interpreted as an I*8, and call x84 whenever we need to shift the bytes for an I*8 to be interpreted as an I*4, including for discrete (i.e. non-array) integer values?  In other words, for a single, discrete integer, we would just use N=1 as the last argument to x48 or x84, correct?

If so, then I agree we should be able to do recursive calls to the same subroutine within the wrappers, rather than having to have separate _8 subroutines.  And that's definitely a good thing, as we've already discussed.

On a more detailed note, and maybe I'm just being dense here, but why in x48 is the integer constant "ones" set to -1?  That was the one detail I couldn't quite follow.

Thanks,
-Jeff

On Wed, Oct 12, 2022 at 3:05 PM Jack Woollen - NOAA Affiliate <jack.woollen@noaa.gov> wrote:
Jeff

I don't think you looked at my example. All that is taken care of. Please look again.

Jack

On Wed, Oct 12, 2022 at 3:01 PM Jeff Ator - NOAA Federal <jeff.ator@noaa.gov> wrote:
Thanks Jack, but I don't know of any big-endian machines we can test on, and after diving down further into the weeds myself over the past couple of days, I'm not sure we can even keep the simplification we discussed earlier.  In other words, we may need to revert to our previous approach anyway, even for the cases where we're passing discrete integers.

As I'm sure you're aware, Fortran passes everything by address across subprogram interfaces, unless you explicitly specify the "value" attribute inside the subprogram, which isn't applicable here.  So even when you're just passing a single discrete integer, what you're really passing isn't the value, but the address of where the value is stored in memory.  So in the simple example of setting an I*8 integer to a value of 10 (=hex 0A) in the calling program, we have the following sequence of bytes reading left to right on a little-endian machine such as WCOSS2:

  byte1  byte2  byte3  byte4  byte5  byte6  byte7  byte8
|  0A  |  00  |  00  |  00  |  00  |  00  |  00  |  00  |

whereas on a big-endian machine we would instead have:

  byte1  byte2  byte3  byte4  byte5  byte6  byte7  byte8
|  00  |  00  |  00  |  00  |  00  |  00  |  00  |  0A  |

Now, and unless I'm missing something, when we pass either of these into a subroutine where the receiving variable is I*4, all that's happening is that the receiving variable receives the same starting address in memory and just interprets it as I*4.  So in the first case it maps to bytes 1-4 and gets the correct value of 10, but in the second case it gets the wrong value of 0.  In other words, the simplified logic we're using now only works because we happen to be testing on a little-endian machine, and it would fail if we were testing on a big-endian machine!  And the above diagram also explains the I*4 10-element array values I was seeing earlier for jdate_in, back when I was passing it an I*8 5-element array, and where on a big-endian machine we would have instead ended up with:

jdate_in(1) = 0
jdate_in(2) = 10
jdate_in(3) = 0
jdate_in(4) = 20
jdate_in(5) = 0
jdate_in(6) = 30
jdate_in(7) = 0
jdate_in(8) = 40
jdate_in(9) = 0
jdate_in(10) = 50

So bottom line, I think the only way we can really be safe is if we continue with the same approach we've been using up until now within the r8wraps branch, where we have a separate _8 subroutine (rather than recursively calling the same subroutine), and where all of the integer variables passed across a subprogram interface are explicitly declared as I*8 or I*4 and have the correct value explicitly stored in them outside of the subroutine interface (rather than relying on the subroutine to be able to correctly map the memory space or otherwise figure out what's what).  And that in turn should also take care of any endianness concerns.

If I'm missing something here, please let me know.  Again, I really appreciate all of your help with NCEPLIBS-bufr!


-Jeff


On Wed, Oct 12, 2022 at 11:07 AM Jack Woollen - NOAA Affiliate <jack.woollen@noaa.gov> wrote:
Jeff

I worked on the dumpbf problem, both the io array problem and the endian problem. Of course we need to run on a big endian platform to test that. But this approach should work there also. The directory /lfs/h2/emc/global/noscrub/Jack.Woollen/squawk/x84x48 contains the sample code for testing dumpbf. The script that runs it is called runt, main program runt.f. Check it out and see what it does.

Jack

On Fri, Oct 7, 2022 at 4:19 PM Jeff Ator - NOAA Federal <jeff.ator@noaa.gov> wrote:
Hey Jack,

If I take /lfs/h2/emc/obsproc/noscrub/jeff.ator/for_JWoollen_2/dumpbf.f and replace the line my_jdin(kk) = jdate_in((kk*2)-1) with CALL UPB( my_jdin(kk), 64, jdate_in, ibit ) and then compile and run with debugging on, I see that jdate_in has the following values inside of the IF(IM8B) block, since it's an I*8 5-element array in the calling program main.f and therefore an I*4 10-element array inside of dumpbf.f:

jdate_in(1) = 10
jdate_in(2) = 0
jdate_in(3) = 20
jdate_in(4) = 0
jdate_in(5) = 30
jdate_in(6) = 0
jdate_in(7) = 40
jdate_in(8) = 0
jdate_in(9) = 50
jdate_in(10) = 0

So I link to the libbufr_4 build of the library (since that's the single build that we're working towards, and so nbitw=32 and nbytw=4), and I call UPB with nbits = 64 and ibit = 0, 64, 128, 192, 256 for each of the kk=1,5 times through the loop, but I get back a value of 0 in each case for each of the my_jdin(kk) values each time through the loop.  I have to admit that I really struggle trying to follow all of the ISHFT, IREV and other bitwise logic deep down in the bowels of UPB and UPBB, so I'm not sure what I'm doing wrong here and would really appreciate if you could help me decipher what the problem may be.

Again, I think we need to have something that's endian-independent here rather than using straight assignment with the ((kk*2)-1) subscript notation, so thanks in advance for any help or guidance you can provide here.

Thanks,
-Jeff

On Fri, Oct 7, 2022 at 12:13 AM Jack Woollen - NOAA Affiliate <jack.woollen@noaa.gov> wrote:
Jeff
Yay, it does seem simpler. A couple things off the top of my bufr fried brain. I think if nbytw =8 it should not go into any "wrapper" block, regardless of the im8 setting. As for endian independence,  (eg nbytw=4 with im8=t) maybe we can just do byte reversal similar to what gets done to enable LE numbers to get packed properly into BE bufr bit strings.
Jack

On Thu, Oct 6, 2022 at 8:02 PM Jeff Ator - NOAA Federal <jeff.ator@noaa.gov> wrote:
Thanks Jack!  This looks great and should lead to much simpler library code in the long run.  In fact, based on what you've shown, and in cases where we're only passing discrete (i.e. non-array) integer values across subprogram interfaces, I think we can probably simplify even further by not even having any wrappers at all!  In other words, we'd only need wrappers for cases where:
  1. we need to change a value based on whether it's I*4 or I*8, such as in atrcpt.f when we need to do something like LMSGOT_4 = (LMSGOT_8 * 2) before recursively (re)calling the subprogram and passing in LMSGOT_4 instead of LMSGOT_8
  2. we need to pass an array of integers where each array member is its own discrete value, such as is the case with JDATE and JDUMP in subroutine dumpbf.f
But I think in all other subprograms we may not even need wrappers or recursive calls at all!  See my updated test program in /lfs/h2/emc/obsproc/noscrub/jeff.ator/for_JWoollen_2 for an example of what I'm talking about, and note here that there's no wrapper whatsoever needed for minimg.f since those arguments are all discrete (i.e. non-array) integers!

However, we still need one for dumpbf.f.  This is an example of the 2nd case noted above, and note that this situation is different from the case of the array of integers MSGIN we were looking at in atrcpt.f, and which was more analogous to the case of MBAY inside of the library where we're passing a contiguous BUFR message in memory across an array of integers.  In that case, we've already seen that the application program and called subroutine can each partition the array properly according to how it's defined within that particular block of code (i.e. whether it's windowed as I*8 array members or I*4 array members within that block) and return the expected values in each case.

However, in this case (for dumpbf.f), and unless I'm missing something, we still need a loop to individually copy each individual array member from I*8 arrays to I*4 arrays inside of the IF(IMB8) loop, and then back from I*4 arrays to I*8 arrays after the recursive call to DUMPBF.  The logic I have now works for both the Intel and GNU compilers; however, the ((kk*2)-1) subscript notation during the array copies is little-endian dependent, so a better approach would be to use logic that's endian-independent, e.g. replace the line my_jdin(kk) = jdate_in((kk*2)-1) with something like CALL UPB( my_jdin(kk), 64, jdate_in, ibit ), and replace the line jdump_out((kk*2)-1) = my_jdout(kk) with a similar call to UPB (or maybe UP8?)   But I couldn't quite get that to work, unless I missed something or wasn't quite calling it correctly, or else if you can think of a better way to do this?  Or maybe my brain is just fried from looking at this for most of the day? ;-)

I think this is the last case (i.e. an array of individual data values) that we need to figure out, and otherwise I think we're good to go with your new simplified approach.  Again, kudos to you for coming up with that - that's a great approach that seems to work nicely and is much simpler overall!  Please let me know if you can help come up with something endian-independent for the copy logic noted above, and note that if you do end up linking a copy of the library and using a call to UPB or something along those lines, then make sure you first include an explicit call to WRDLEN within this test version of DUMPBF, because otherwise you won't yet have nbitw, nbytw, etc. defined inside of the library! ;-)

Thanks,
-Jeff

On Wed, Oct 5, 2022 at 9:50 AM Jack Woollen - NOAA Affiliate <jack.woollen@noaa.gov> wrote:
Jeff

Thanks for the problem! I made some progress. Check out /lfs/h2/emc/global/noscrub/Jack.Woollen/squawk/rework. To run the test use atetest.

Jack

On Mon, Oct 3, 2022 at 12:45 PM Jeff Ator - NOAA Federal <jeff.ator@noaa.gov> wrote:
Sure thing Jack.  I just put together a short sample program which shows the error with the GNU compilers on WCOSS2.

See the code in /lfs/h2/emc/obsproc/noscrub/jeff.ator/for_JWoollen, which you can copy to your own area for testing.  If you run the compile script build.sh, you should see it squawk on line 17 of atrcpt.f with the following error message:

Error: Type mismatch in argument 'lmsgot_8' at (1); passed INTEGER(4) to INTEGER(8)

But if you then uncomment line 7 of atrcpt.f and re-run build.sh, it compiles cleanly, and you can then run the resulting a.out executable to see the expected output.

FWIW, the Intel compiler on WCOSS2 doesn't flag this, and it runs fine either way there.  But the library needs to work with the GNU compilers as well, so that's why I've now started using this approach in the r8wraps branch.

-Jeff

On Sat, Oct 1, 2022 at 1:35 PM Jack Woollen - NOAA Affiliate <jack.woollen@noaa.gov> wrote:
Jeff

I also tweaked how the arguments are defined and passed between the i4 and i8 wrappers, because the previous approach ended up generating error diagnostics with the GNU compiler when I was testing a stripped-down prototype on WCOSS2.

Maybe you can show me an example of the error you're talking about here. I'd like to take a look at it.

Thanks
Jack

@jbathegit
Copy link
Collaborator

@jack-woollen I copied a transcript of our recent email discussion to this GitHub issue, so we have a record of it here. And based on the outcome of that discussion, I also just updated the r8wraps branch to use X48 and X84 in all of the routines for which we've already developed I*8 wrappers. We've still got many more to go, but we're making progress!

Only detail I wanted to point out to you was in drfini.f. In that routine, we need to call X48 for both an array (MDRF) and for two scalars (LUNIT and NDRF), and the Gnu compiler squawks if you have different ranks for any argument between different calls to the same routine from within the same program block. So I just declared each of those two scalars as single-member arrays within drfini.f, and that resolved it. Take a look if you like and let me know if you have any questions.

@jack-woollen
Copy link
Contributor

jack-woollen commented Oct 24, 2022 via email

@jbathegit
Copy link
Collaborator

Jeff My only question is does it work? Jack

It did when I ran the CI tests for both Intel and Gnu. The drfini routine is called from within two of the CI test programs (test_OUT_4.f and test_OUT_6.f) in the test subdirectory of the library bundle, and they both passed.

@jack-woollen
Copy link
Contributor

jack-woollen commented Oct 24, 2022 via email

@jbathegit
Copy link
Collaborator

Hey @jack-woollen, maybe I'm missing something obvious, but from a user perspective what's the difference between copysb and ufbcpy? I see the former calls the latter for compressed subsets, but for documentation purposes I'd like to better understand why a user would call one vs. the other in the first place?

On a related note, I'd also like to better understand how ufbcup fits into the mix, and under what circumstances it would make sense for a user to call ufbcup instead of ufbcpy or copysb.

(Thanks! ;-)

@jack-woollen
Copy link
Contributor

jack-woollen commented Oct 25, 2022 via email

@edwardhartnett
Copy link
Contributor Author

I hope all these historical details make it into the appropriate doxygen documentation, so they are captured for future users and maintainers, who will be working on this code long after we have all retired. ;-)

@jbathegit
Copy link
Collaborator

jbathegit commented Oct 27, 2022

@edwardhartnett I'm trying to update the Doxygen documentation as I go, which is the main reason I asked those questions ;-)

@jack-woollen as I'm looking at ufbqcd and ufbqcp I'm seeing another potential issue. Each of those has a "real" as a calling argument (QCD is output from ufbqcd, whereas QCP is input to ufbqcp). But since it's not explicitly declared as "real*4" or "real*8" in either routine, then it seems we have a similar issue like we did before for integers, and where we ended up needing to use x48 and x84 to go back and forth between 4-byte and 8-byte values in order to be able to converge everything into a single library build.

In this case, there probably aren't too many codes out there which call either routine, so do you happen to know if those are currently compiled with 4-byte reals or 8-byte reals? If so, then maybe we just explicitly declare them accordingly within both of those routines - thoughts?

@jbathegit
Copy link
Collaborator

Or, another possible option...

By my understanding, QCP and QCD really only ever hold integer values anyway, so is there any reason we couldn't just change them to integers (IQCP and IQCD?) in these routines and then use x48 and x84? These updates are all going to be part of a new 12.0.0 (major version) release anyway, so there's an implicit understanding that folks may need to make some tweaks to their application codes to work with the updated library anyway, and so we could just document these real->integer type changes in the documentation and in the user guide and release notes so that everyone's aware.

Thoughts?

@jack-woollen
Copy link
Contributor

jack-woollen commented Oct 27, 2022 via email

@jbathegit
Copy link
Collaborator

Thanks Jack. I'm happy to leave them be if you're fairly confident that all of those codes are compiled using 4-byte reals.

Alternatively, did you see my other suggestion about maybe just converting QCD and QCP from reals to integers (IQCD and IQCP?) within those two routines and then using x48 and x84 like we do for other integers? Not sure why those variables may have even been declared as reals in the first place, since unless I'm missing something they really should only ever contain integer values anyway.

@jack-woollen
Copy link
Contributor

jack-woollen commented Oct 27, 2022 via email

@jbathegit
Copy link
Collaborator

Hey @jack-woollen, just an FYI that I pushed another commit up to the r8wraps branch. The updates in this latest commit include starting to modify the actual test codes to link to the _4 build of the library, even when the test codes themselves are compiled using 8-byte integers. In other words, we still run three separate tests of each of the test codes compiled using _4, _8, and _d, but each of those tests now links to the _4 build of the library! So far everything's working well, but I did want to bring your attention to one minor issue that we seem to have overlooked.

Specifically, we've only been worrying about adding _8 wrappers to user-callable library routines which have integer call arguments. So, for example, we hadn't added _8 wrappers to iupbs01 or iupbs3 because they didn't have any integer call arguments which required using x84 and/or x48. However, this leads to a problem when such routines internally call other user-callable library routines which do contain such wrappers.

For example, iupbs01 internally calls gets1loc and i4dy, and iupbs3 internally calls getlens. And in those cases, if the application program previously called setim8b because it was compiled using 8-byte integers, then those internal calls to gets1loc, i4dy, and getlens end up getting made with im8b still set to .true., even though they're being internally called from within iupbs01 or iupbs3 which now themselves are only compiled as _4 as part of the library. And, in turn, this means that those internally-called routines are now themselves calling x84 and x48 when they shouldn't be. These errors quickly showed up when I began testing out some of the test code mods noted above.

So to resolve this, we need to make sure that any user-callable subroutine or function in the library contains an internal _8 wrapper, even if the routine itself doesn't have any integer call arguments. That way, we know that im8b will have been switched to .false. before any internal calls are made to any other user-callable library routines. You can see where I made this change to iupbs01 and iupbs3 as part of this commit.

@jack-woollen
Copy link
Contributor

Testing is a wonderful thing! Sounds like a good solution. Clearly defining the 4byte and 8byte pathways through common subroutines. Kind of head spinning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants