Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide Interactive Datasets for REPL usage #76

Merged
merged 173 commits into from
Dec 27, 2019
Merged

Provide Interactive Datasets for REPL usage #76

merged 173 commits into from
Dec 27, 2019

Conversation

yeesian
Copy link
Owner

@yeesian yeesian commented Jan 23, 2019

closes #21 #56 #66 #79

This package is currently not very conducive for interactive REPL usage. This PR

  1. removes the need to register drivers before usage.
  2. introduces "interactive datasets" (IDataset) that does not need to be contained within a do-block. This makes it much easier to play with datasets interactively, rather than having to enclose everything within a do-block.
  3. gives featurelayers "first-class treatment": they can now be created and copied as "floating" objects (via createlayer() and copy(featurelayer)), independently of the choice of a dataset. For floating featurelayers, a corresponding dataset will be created in memory.

CHANGELOG

new

  • src/types.jl: introduce IDataset (short for "interactive datasets") which has a finalizer that closes the corresponding file when it goes out of scope. This is in contrast to Dataset which exists only within the scope of a do-block. They both fall under the type hierarchy
    • Dataset <: AbstractDataset and
    • IDataset <: AbstractDataset.
  • src/types.jl: introduce FeatureDefn, IFeatureDefnView <: AbstractFeatureDefn
  • src/types.jl: introduce GeomFieldDefn, IGeomFieldDefnView <: AbstractGeomFieldDefn
  • src/types.jl: introduce FieldDefn, IFieldDefnView <: AbstractGeomFieldDefn
  • src/types.jl: IFeatureDefnView, IFieldDefnView and IGeomFieldDefnView are introduced to model internal references to existing objects. They support only a restricted set of operations.
  • src/types.jl: enable more flags
  • unsafe_ methods will return objects that needs to be cleaned up, and are not meant to be used by users.
  • tests: use /vsimem/ instead of creating tmp folders and files, and update for new dataset APIs
  • src/featurelayer.jl: introduce nfield(featurelayer) and ngeom(featurelayer)

different

Now getspatialref(...), getgeom(...), and getspatialfilter(...) always makes a copy of the object.

breaking

  • fromWKT(data) and fromWKB(data) no longer takes in a spatialref argument.

call signatures

  • createlayer(dataset, name) -> createlayer(name = name, dataset = dataset)
  • create(filename, driver) -> create(filename, driver = driver) or create(driver, filename = filename)
  • colortable = getcolortable(band) -> getcolortable(band) do colortable
  • copywholeraster(source, dest, options) -> copywholeraster(source, dest, options = options)
  • transform!(coordtransform, xs, ys, zs) -> transform!(xs, ys, zs, coordtransform) (for consistency with transform!(geom, coordtransform))

renamed

  • registerdrivers(...) -> environment(...). It is no longer required for file io, since we provide a driver manager that registers all the drivers during module initialization.
  • setattr!(spatialref, path) -> setattrvalue!(spatialref, path) (consistent with setattrvalue!(spatialref, path, value))
  • copylayer(dataset, layer, name) -> copy(layer, name = name, dataset = dataset) (layer is implied)
  • createcopy(dataset, filename, driver) -> copy(dataset, filename = filename, driver = driver) (create is implied)

The name geomfield sounded too long and confusing (is it a geom? or a field?), so I've decided to rename it to just geom:

  • deletegeomfielddefn!(featuredefn, i) -> deletegeomdefn!(featuredefn, i)
  • gfd = creategeomfielddefn(...) -> creategeomdefn(...) do gfd
  • getgeomfielddefn(...) -> getgeomdefn(...)
  • getgeomfield(...) -> getgeom(...)
  • setgeomfield!(...) -> setgeom!(...)
  • ngeomfield(...) -> ngeom(...)

Standardize on get/add|fielddefn/geomdefn/feature:

  • addgeomfielddefn!(featuredefn, geomdefn) -> addgeomdefn!(featuredefn, geomdefn)
  • creategeomfield!(layer, geomdefn) -> addgeomdefn!(layer, geomdefn)
  • createfield!(layer, fielddefn) -> addfielddefn!(layer, fielddefn)
  • createfeature!(layer, feature) -> addfeature!(layer, feature)

Some method names are too generic-sounding, so I've lengthened them to be clearer.

  • find(...) -> findstylestring(...)

Too many methods were being called get<methodname> without any unifying reason. I've decided to try and rename methods such that all remaining methods that start with get suggests a method corresponding to a modifier (such as add, set, etc).

  • getblocksize(rasterband) -> blocksize(rasterband)
  • getdatatype(rasterband) -> pixeltype(rasterband)
  • getaccess(rasterband) -> accessflag(rasterband)
  • getnumber(rasterband) -> indexof(rasterband)
  • getmaskflags(rasterband) -> maskflags(rasterband)
  • getmaximum(rasterband) -> maximum(rasterband)
  • getminimum(rasterband) -> minimum(rasterband)
  • getfidcolname(layer) -> fidcolumnname(layer)
  • getgeomcolname(layer) -> geomcolumnname(layer)
  • getlayerdefn(layer) -> layerdefn(layer)
  • getcurvegeom(geom) -> curvegeom(geom)
  • getlineargeom(geom) -> lineargeom(geom)
  • getdim(geom) -> geomdim(geom)
  • getgeomname(geom) -> geomname(geom)
  • getpaletteinterp(colortable) -> paletteinterp(colortable)
  • getsampleoverview(...) -> sampleoverview(...)
  • getrgba(...) -> toRGBA(...)
  • getextent(...) -> envelope(...)
  • getenvelope(geom) -> envelope(geom)
  • getenvelope3d(geom) -> envelope3d(geom)
  • getcolumnname(...) -> columnname(...)
  • getcolumnusage(...) -> columnusage(...)
  • getcolumntype(...) -> columntype(...)

Some of the index methods are now prefixed with find to be suggestive of what is actually happening.

  • getrowindex(rat, pxvalue) -> findrowindex(rat, pxvalue)
  • getcolumnindex(...) -> findcolumnindex(...)
  • getfieldindex(...) -> findfieldindex(...)
  • getgeomfieldindex(...) -> findgeomindex(...)

At the end of it all, we have

julia> AG.get
getattrvalue          getdefaultRAT          getgeotransform!       getoverview            getsubtype
getband               getdriver              getjustify             getpart                getthreadconfigoption
getcategorynames      getfeature             getlayer               getpoint               gettype
getcolorentry         getfeaturedefn         getlinearbinning       getpoint!              getunit
getcolorentryasrgb    getfid                 getmaskband            getprecision           getunittype
getcolorinterp        getfield               getmediatype           getproj                getwidth
getcolortable         getfielddefn           getname                getscale               getx
getconfigoption       getgeom                getnativedata          getspatialfilter       gety
getcoorddim           getgeomdefn            getnodatavalue         getspatialref          getz
getdataset            getgeomtype            getnonlineargeomflag   getstylestring
getdefault            getgeotransform        getoffset              getstyletable

julia> AG.set
setattributefilter!   setfeature!            setjustify!            setparam!              setstyleignored!
setattrvalue!         setfid!                setlinearbinning!      setparams!             setstylestring!
setcategorynames!     setfield!              setmediatype!          setpoint!              setstyletable!
setcolorentry!        setfrom!               setname!               setpointcount!         setsubtype!
setcolorinterp!       setgeom!               setnativedata!         setprecision!          setthreadconfigoption
setcolortable!        setgeomignored!        setnextbyindex!        setproj!               settype!
setconfigoption       setgeomtype!           setnodatavalue!        setrowcount!           setunit!
setcoorddim!          setgeotransform!       setnonlineargeomflag!  setscale!              setunittype!
setdefault!           setignored!            setnullable!           setspatialfilter!      setvalue!
setdefaultRAT!        setignoredfields!      setoffset!             setspatialref!         setwidth!

julia> AG.add
addfeature    addfielddefn   addgeom!       addpart!       addstyle!
addfeature!   addfielddefn!  addgeomdefn!   addpoint!

which feels a lot more reasonable.

dropped

  • getfeaturesread(layer): not all drivers seem to update this count properly, and advanced users can call GDAL.getfeaturesread(layer.ptr)
  • starttransaction(layer): advanced users can call GDAL.starttransaction(layer.ptr)
  • rollbacktransaction(layer): advanced users can call GDAL.rollbacktransaction(layer.ptr)
  • committransaction(layer): advanced users can call GDAL.committransaction(layer.ptr)
  • synctodisk!(layer): advanced users can call GDAL.synctodisk(layer.ptr)
  • flushcache!(rasterband): advanced users can call GDAL.flushrastercache(rasterband.ptr)
  • setstyletabledirectly!(feature, styletable): use setstyletable!(feature, styletable) instead. advanced users can call GDAL.setstyletabledirectly(feature.ptr, styletable.ptr).
  • setgeomdirectly!(feature[, i], geom): use setgeom!(feature[, i], geom) instead. advanced users can call GDAL.setgeometrydirectly(feature.ptr[, i], geom.ptr).
  • addgeomdirectly!(geom1, geom1): use push!(geom1, geom2) instead. advanced users can call GDAL.addgeometrydirectly(geom1.ptr, geom2.ptr).
  • setspatialref!(geom, spatialref) and transform!(geom, spatialref): rather than doing setspatialref!(geom, spatialref1); transform!(geom, spatialref2), do createcoordtrans(spatialref1, spatialref2) do coordtransform; transform!(geom, coordtransform) end instead. advanced users can call GDAL.assignspatialreference(geom.ptr, spatialref.ptr) and GDAL.transformto(geom.ptr, spatialref.ptr) instead.

@yeesian yeesian requested a review from visr March 6, 2019 20:15
@yeesian yeesian changed the title [WIP] Provide Interactive Datasets for REPL usage Provide Interactive Datasets for REPL usage Mar 6, 2019
@yeesian yeesian changed the title Provide Interactive Datasets for REPL usage [WIP] Provide Interactive Datasets for REPL usage Jul 31, 2019
@yeesian
Copy link
Owner Author

yeesian commented Aug 1, 2019

This PR triggered the need to think very carefully about the interactions between (i) "scoped" versus (ii) "floating" objects, so I just want to write that up first. I'll provide an update on the changelog later.

Motivation

The problem with using do-blocks to manage context (see example with obj1) is that they are difficult to work with in an interactive way.

ArchGDAL.read(filename) do dataset1
    # dataset1 exists within this scope
end
# we do not have access to dataset1 from here on

In the example above, we do not get to see the state of obj unless we write
code to display information within the scope of the do-block. This makes it difficult to explore in a depth-first manner (see example with obj2)

dataset2 = ArchGDAL.read(filename)
# we have access to dataset2 from here on

Issues

The issue with having "floating" GDAL objects is that the ownership of GDAL objects is not properly reflected in the wrappers that we implement in this package (which only has a pointer/handle to the object itself, and does not reference any of the other objects it might have a relationship with). Therefore, if we are not careful, we might encounter "gotchas" such as the following example

dataset2 = ArchGDAL.read(filename)
featurelayer2 = ArchGDAL.getlayer(dataset2, 0)
# ...
# dataset2 is no longer referenced by anything from here on
# Julia's GC triggers, and closes dataset2
# now featurelayer2 is left "dangling" and cause segfaults

The issue arose because ArchGDAL.destroy() is registered with the finalizer for dataset2. That is unfortunate, since it can arise without the user ever calling ArchGDAL.destroy() or any unsafe_ methods, making it difficult to verify safe code/behavior [1].

References and Ownership

The use of do-block sets up a call stack that corresponds to the dependencies [2,3] between "scoped" objects (which is managed inside src/context.jl), so we did not have similar issues before:

ArchGDAL.read(filename) do dataset1
    # dataset1 exists within this scope
    featurelayer1 = ArchGDAL.getlayer(dataset1, 0)
    # dataset1 will continue to exist within this scope, so
    # any object (e.g. featurelayer1) created within this scope
    # will not be left dangling.
end

For "floating" objects, we need to think about (a) the objects that depend on them, and (b) the objects that they depend on. For (a), we need to ensure that they are not destroyed while there are still objects that depends on them. For (b), we need to ensure that the objects they depend on are not destroyed within their scope.

The analysis is complicated when there are interactions between "scoped" and "floating" objects: to ensure that "floating" objects are not destroyed by the GC prematurely, there should be at least one reference to them. On the other hand, to avoid memory leaks, we do not wish to remove destroy() from their finalizers and we do not want there to be a permanent reference to each "floating object".

(a) For objects that depend on "floating objects"

This might happen when they are needed in the construction of an object, or an object has ones of its attributes being set to the floating object as a value.

  • IGeometry: There are not a lot of situations in which this can arise, but they do exist. (One example is setgeomdirectly!().) My current approach is to simply ban other objects from depending on IGeometry in those situations where they might arise.
  • ISpatialRef: The most common class of objects to depend on ISpatialRef are geometries (see [3]). My current approach is to remove the dependency in those situations where they arise (e.g. by making a copy of the spatialref).
  • IDataset: This is tricky because basically everything depends on it (directly or indirectly). For FeatureLayer and RasterBand, I introduced a reference to the dataset. For most things (Feature, FeatureDefn, FieldDefn, GeomFieldDefn, etc), one approach is to keep all of them "scoped" so that only the dataset is allowed to be a "floating" object. I have not found this to be a big penalty on the REPL experience, and we can provide helper functions in the future to "export" them into formats that do not depend on the dataset (if we want them to persist). The only two remaining types are IGeometry and ISpatialRef, which I'll address in the next section.

(b) For objects that "floating objects" depend on

Rather than building up a web of dependencies, the "floating" objects maintain only the handle to their own GDAL object. Therefore, if we have methods that retrieves an internal reference, we make a clone/copy of it.

  • IDataset: I don't yet know of a situation in which this arises, so I assume it's not common enough for me to worry for now.
  • ISpatialRef: We can call getspatialref() on Geometry, FeatureLayer, and GeomFieldDefn, so I just made sure to clone the spatialref each time.
  • IGeometry: We can call getgeom(geom, i), getgeom(feature, i), or getgeom(feature), so I just made sure to clone the geometry each time.

Some Counter Measures

By the nature of its approach, ArchGDAL.jl has already been overly permissive with unsafe behavior (see [3,4]), and the introduction of "floating" objects introduces an additional source of risk, so here are some counter-measures I took to manage them.

  1. Be more vigilant about enforcing the ! convention for functions/methods that mutate their inputs.
  2. Be extremely conservative about "floating" objects (currently they are IDataset, IGeometry, and ISpatialRef). (Any PR that introduces a new one will need to perform the due diligence of analyzing and explaining its safety implications, similar to the previous section.)
  3. Be extremely conservative about functions that mutate their inputs. (Any PR that introduces a new one will need to perform the due diligence of analyzing and explaining its safety implications, similar to the previous section.)
  4. Removed addgeomdirectly!(geom1, geom2), setgeomfielddirectly!(feature, i, geom), setstylestringdirectly!(feature, style), setstyletabledirectly!(feature::Feature, styletable::StyleTable) since there are versions of the same functions that do not assume ownership of the inputs.
  5. Removed setspatialref!(geom, spatialref) and transform!(geom, spatialref), since it introduces a lot of safety issues without providing much functionality. We also removed spatialref from the list of arguments to fromWKT/fromWKB since it does not do much for now. The same thing can be achieved with transform!(geom, coordtransform). There is a more general argument that SRS should be associated with an entire layer or geomfielddefn rather than with individual geometries. In any case, we can re-introduce it in the future if appropriate. Moreover, there is always the corresponding GDAL function as a workaround for users who need it.
  6. Remove parametric methods that previously allowed the user to provide their own geoemtries. They will now return only one of Geometry and IGeometry (depending on the context).
  7. I have increased the test coverage by a bit (but it is still wanting, so I have opened Improve test coverage #82 for it).

Footnotes

[1]: A similar issue has been encountered in JuliaGeo/LibGEOS.jl#58.

[2]: By a dependency, we mean whether or not the "lifetime"/validity/existence of the objects depend on each other. Whenever we have code that looks like y = f(x), where x is the input and y is the output to the function f, then we say that x and y are "dependent" (they are related to each other through f).

[3]: The code in src/context.jl assumes that the dependency runs in one direction, from the output y to the input x. This is true for functions that do not mutate their inputs, but not true in general. Any methods that mutates their inputs will be suspect. For example,

  | | |_| | | | (_| |  |  Version 1.1.1 (2019-05-16)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using ArchGDAL

julia> geom = ArchGDAL.createpoint(1,2)
Geometry: POINT (1 2)

julia> ArchGDAL.importEPSG(4326) do spatialref
       ArchGDAL.setspatialref!(geom, spatialref)
       print(ArchGDAL.getspatialref(geom))
       end
Spatial Reference System: +proj=longlat +datum=WGS84 +no_defs
julia> ArchGDAL.getspatialref(geom)
Segmentation fault: 11

[4]: I'd like to have an API that guarantees safety (through type-checks/etc). Nonetheless, I do not have enough time (within the scope of this project) to perform a comprehensive analysis, so I'm going to let practicality overrule for now. Moreover, there is a large amount of GDAL code out in the wild, and I'd like to keep it easy to "port code" from another repository if we have

  • a sufficiently similar (if not the same!) object model,
  • corresponding functions for most GDAL functions, and
  • workarounds that are not ovely onerous.

@yeesian yeesian changed the title [WIP] Provide Interactive Datasets for REPL usage Provide Interactive Datasets for REPL usage Aug 7, 2019
@yeesian
Copy link
Owner Author

yeesian commented Aug 7, 2019

@visr this is now ready for review

@evetion
Copy link
Collaborator

evetion commented Dec 21, 2019

What's needed to merge this? I'm glad to help if someone can point me in the right direction.

@visr
Copy link
Collaborator

visr commented Dec 21, 2019

The code here is good to merge. But it switches to the GDAL.jl 1.0 release which doesn't have a working GEOS linked on Linux. That's why I didn't merge it yet, because it would break geometry operations that work now.

Right now there is a PR on Yggdrasil to add GDAL. I think we should finish that, use that build for GDAL.jl (which means support only julia 1.3+). If that is working, we can merge and tag this fairly quickly.

@evetion
Copy link
Collaborator

evetion commented Dec 25, 2019

@yeesian @rafaqz I've migrated the Array interface to the new interface and also incorporated #102. Please review.

Closes #102 and #101.

Copy link
Collaborator

@visr visr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great that this is finally coming to completion. We should probably take the changelog written in the first post and add that to the release notes, to make it easier for people to update to the new ArchGDAL.

@rafaqz
Copy link
Collaborator

rafaqz commented Dec 27, 2019

Looks good to me

count = sum(buffer[:, :, 1:1] .> 0)
@test total / count ≈ 76.33891347095299
@test total / (AG.height(ds) * AG.width(ds)) ≈ 47.55674749653172
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be in a different test set as it's not about setindex

yeesian and others added 24 commits December 27, 2019 15:58
push! -> addfeature!, addgeom!
pushfeature -> addfeature
writefeature -> createfeature
write! -> setfeature!
cherry-picked from #79
* GDAL.C.GDALDummyProgress -> GDAL.gdaldummyprogress

* OGRDatumType not supported for now

* replace all occurrences of GDAL.C. with their new counterparts

* update to new GDAL.jl API

* update to new GDAL.jl API

* OGRDatumType is removed in GDAL3.0

* Update .travis.yml

* Update appveyor.yml

* do not wrap GDAL pointers in an additional layer of Ptr

* add return keywords

* whitespacing

* tighten type signatures for constructors

* Ptr{GDAL.GDALDatasetH}(C_NULL) -> C_NULL
AppVeyor cancelled all other builds that I want to check
Co-authored-by: Rafael Schouten <rafaqz@users.noreply.github.com>
Co-authored-by: Rafael Schouten <rafaqz@users.noreply.github.com>
@visr visr mentioned this pull request Dec 27, 2019
5 tasks
@visr visr merged commit c5df7b7 into master Dec 27, 2019
@visr visr deleted the idataset branch December 27, 2019 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GDAL Environment management
4 participants