Skip to content

Commit

Permalink
Codify cross-platform file paths
Browse files Browse the repository at this point in the history
The netcdf-c code has to deal with a variety of platforms:
Windows, OSX, Linux, Cygwin, MSYS, etc.  These platforms differ
significantly in the kind of file paths that they accept.  So in
order to handle this, I have created a set of replacements for
the most common file system operations such as _open_ or _fopen_
or _access_ to manage the file path differences correctly.

A more limited version of this idea was already implemented via
the ncwinpath.h and dwinpath.c code. So this can be viewed as a
replacement for that code. And in path in many cases, the only
change that was required was to replace '#include <ncwinpath.h>'
with '#include <ncpathmgt.h>' and then replace file operation
calls with the NCxxx equivalent from ncpathmgr.h Note that
recently, the ncwinpath.h was renamed ncpathmgmt.h, so this pull
request should not require dealing with winpath.

The heart of the change is include/ncpathmgmt.h, which provides
alternate operations such as NCfopen or NCaccess and which properly
parse and rebuild path arguments to work for the platform on which
the code is executing. This mostly matters for Windows because of the
way that it uses backslash and drive letters, as compared to *nix*.
One important feature is that the user can do string manipulations
on a file path without having to worry too much about the platform
because the path management code will properly handle most mixed cases.
So one can for example concatenate a path suffix that uses forward
slashes to a Windows path and have it work correctly.

The conversion code is in libdispatch/dpathmgr.c, and the
important function there is NCpathcvt which does the proper
conversions to the local path format.

As a rule, most code should just replace their file operations with
the corresponding NCxxx ones defined in include/ncpathmgmt.h. These
NCxxx functions all call NCpathcvt on their path arguments before
executing the actual file operation.

In some rare cases, the client may need to directly use NCpathcvt,
but this should be avoided as much as possible. If there is a need
for supporting a new file operation not already in ncpathmgmt.h, then
use the code in dpathmgr.c as a template. Also please notify Unidata
so we can include it as a formal part or our supported operations.
Also, if you see an operation in the library that is not using the
NCxxx form, then please submit an issue so we can fix it.

Misc. Changes:
* Clean up the utf8 testing code; it is impossible to get some
  tests to work under windows using shell scripts; the args do
  not pass as utf8 but as some other encoding.
* Added an extra utf8 test case: test_unicode_path.sh
* Add a true test for HDF5 1.10.6 or later because as noted in
  PR #1794,
  HDF5 changed its Windows file path handling.
  • Loading branch information
DennisHeimbigner committed Mar 4, 2021
1 parent f388bce commit 0b7a538
Show file tree
Hide file tree
Showing 55 changed files with 1,331 additions and 1,285 deletions.
11 changes: 10 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -662,7 +662,6 @@ IF(USE_HDF5)
IF(HDF5_VERSION_STRING AND NOT HDF5_VERSION)
SET(HDF5_VERSION ${HDF5_VERSION_STRING})
ENDIF()

IF("${HDF5_VERSION}" STREQUAL "")
MESSAGE(STATUS "Unable to determine hdf5 version. NetCDF requires at least version ${HDF5_VERSION_REQUIRED}")
ELSE()
Expand Down Expand Up @@ -803,6 +802,15 @@ IF(USE_HDF5)
SET(HAS_PAR_FILTERS no CACHE STRING "")
ENDIF()

# Check to see if HDF5 library is 1.10.6 or greater.
# Used to control path name conversion
IF(${HDF5_VERSION} VERSION_LESS "1.10.6")

This comment has been minimized.

Copy link
@dopplershift

dopplershift Mar 31, 2021

Member

If you arrive here by taking the IF(HDF5_C_LIBRARY AND HDF5_HL_LIBRARY AND HDF5_INCLUDE_DIR) on line 611, and not the ELSE(...), HDF5_VERSION is unset and the build fails.

This comment has been minimized.

Copy link
@dopplershift

dopplershift Mar 31, 2021

Member

This is hit right now trying to build 4.8.0 packages for 64-bit Windows on conda-forge: conda-forge/libnetcdf-feedstock#116

This comment has been minimized.

Copy link
@WardF

WardF via email Apr 1, 2021

Member

This comment has been minimized.

Copy link
@dopplershift

dopplershift Apr 1, 2021

Member

I'm pretty sure you have to be specifying all of the hdf C lib, HL lib, and include dir to avoid the path where there HDF5 version is set.

SET(HDF5_UTF8_PATHS FALSE)
ELSE()
SET(HDF5_UTF8_PATHS TRUE)
ENDIF()
MESSAGE("-- Checking for HDF5 version 1.10.6 or later: ${HDF5_UTF8_PATHS}")

SET(H5_USE_16_API 1)
OPTION(NC_ENABLE_HDF_16_API "Enable HDF5 1.6.x Compatibility(Required)" ON)
IF(NOT NC_ENABLE_HDF_16_API)
Expand Down Expand Up @@ -2204,6 +2212,7 @@ configure_file(
SET(EXTRA_DIST ${EXTRA_DIST} ${CMAKE_CURRENT_SOURCE_DIR}/test_common.in)
SET(TOPSRCDIR "${CMAKE_CURRENT_SOURCE_DIR}")
SET(TOPBUILDDIR "${CMAKE_CURRENT_BINARY_DIR}")
SET(ISMSVC "${MSVC}")
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/test_common.in ${CMAKE_CURRENT_BINARY_DIR}/test_common.sh @ONLY NEWLINE_STYLE LF)


Expand Down
4 changes: 2 additions & 2 deletions configure.ac
Original file line number Diff line number Diff line change
Expand Up @@ -79,9 +79,9 @@ AC_MSG_NOTICE([checking supported formats])
# An explicit disable of netcdf-4 | netcdf4 is treated as if it was disable-hdf5
AC_MSG_CHECKING([whether we should build with netcdf4 (alias for HDF5)])
AC_ARG_ENABLE([netcdf4], [AS_HELP_STRING([--disable-netcdf4],
[(synonym for --enable-hdf5)])])
[(deprecated synonym for --enable-hdf5)])])
test "x$enable_netcdf4" = xno || enable_netcdf4=yes
AC_MSG_RESULT([$enable_netcdf4])
AC_MSG_RESULT([$enable_netcdf4 (deprecated; Please use with --disable-hdf5)])
AC_MSG_CHECKING([whether we should build with netcdf-4 (alias for HDF5)])
AC_ARG_ENABLE([netcdf-4], [AS_HELP_STRING([--disable-netcdf-4],
[(synonym for --disable-netcdf4)])])
Expand Down
2 changes: 1 addition & 1 deletion debug/cf.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ S3TEST=1
#CDF5=1
#HDF4=1

#TR=--trace
H518=1

for arg in "$@" ; do
case "$arg" in
Expand Down
2 changes: 1 addition & 1 deletion include/Makefile.am
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ ncbytes.h nchashmap.h ceconstraints.h rnd.h nclog.h ncconfigure.h \
nc4internal.h nctime.h nc3internal.h onstack.h ncrc.h ncauth.h \
ncoffsets.h nctestserver.h nc4dispatch.h nc3dispatch.h ncexternl.h \
ncpathmgr.h ncindex.h hdf4dispatch.h hdf5internal.h nc_provenance.h \
hdf5dispatch.h ncmodel.h isnan.h nccrc.h ncexhash.h ncxcache.h
hdf5dispatch.h ncmodel.h isnan.h nccrc.h ncexhash.h ncxcache.h ncfilter.h

if USE_DAP
noinst_HEADERS += ncdap.h
Expand Down
20 changes: 3 additions & 17 deletions include/hdf5internal.h
Original file line number Diff line number Diff line change
Expand Up @@ -201,23 +201,9 @@ extern int NC4_hdf5get_libversion(unsigned*,unsigned*,unsigned*);/*libsrc4/nc4hd
extern int NC4_hdf5get_superblock(struct NC_FILE_INFO*, int*);/*libsrc4/nc4hdf.c*/
extern int NC4_isnetcdf4(struct NC_FILE_INFO*); /*libsrc4/nc4hdf.c*/

#ifdef _WIN32
extern int nc4_find_default_chunksizes2(NC_GRP_INFO_T *grp, NC_VAR_INFO_T *var);

/* Maxinum length of a typical path in UTF-8.
* When converting from ANSI to UTF-8, the length will be up to 3 times,
* so round up 260*3 to 1024. (260=MAX_PATH) */
#define MAX_PATHBUF_SIZE 1024

/* Struct for converting ANSI to UTF-8. */
typedef struct pathbuf
{
void *ptr;
char buffer[MAX_PATHBUF_SIZE];
} pathbuf_t;

const char *nc4_ndf5_ansi_to_utf8(pathbuf_t *pb, const char *path);
void nc4_hdf5_free_pathbuf(pathbuf_t *pb);

#endif /* _WIN32 */
EXTERNL hid_t nc4_H5Fopen(const char *filename, unsigned flags, hid_t fapl_id);
EXTERNL hid_t nc4_H5Fcreate(const char *filename, unsigned flags, hid_t fcpl_id, hid_t fapl_id);

#endif /* _HDF5INTERNAL_ */
1 change: 1 addition & 0 deletions include/nchttp.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ typedef struct NC_HTTP_STATE {
const char** headset; /* which headers to capture */
NClist* headers;
NCbytes* buf;
char errbuf[1024]; /* assert(CURL_ERROR_SIZE <= 1024) */
} NC_HTTP_STATE;

extern int nc_http_open(const char* objecturl, NC_HTTP_STATE** state, size64_t* lenp);
Expand Down
97 changes: 86 additions & 11 deletions include/ncpathmgr.h
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@
* Copyright 2018, University Corporation for Atmospheric Research
* See netcdf/COPYRIGHT file for copying and redistribution conditions.
*/
#ifndef _NCWINIO_H_
#define _NCWINIO_H_
#ifndef _NCPATHMGR_H_
#define _NCPATHMGR_H_

#include "config.h"
#include <stdlib.h>
Expand All @@ -28,6 +28,7 @@
#endif
#endif

/* Define wrapper constants for use with NCaccess */
/* Define wrapper constants for use with NCaccess */
#ifdef _WIN32
#define ACCESS_MODE_EXISTS 0
Expand All @@ -49,16 +50,63 @@
#define ACCESS_MODE_RW (R_OK|W_OK)
#endif

/* Path Converter */
#ifdef _WIN32
#ifndef S_IFDIR
#define S_IFDIR _S_IFDIR
#define S_IFREG _S_IFREG
#endif
#ifndef S_ISDIR
#define S_ISDIR(mode) ((mode) & _S_IFDIR)
#define S_ISREG(mode) ((mode) & _S_IFREG)
#endif
#endif /*_WIN32*/

/*
WARNING: you should never need to explictly call this function;
rather it is invoked as part of the wrappers for e.g. NCfopen, etc.
This function attempts to take an arbitrary path and convert
it to a canonical form.
Assumptions about Input path:
1. It is a relative or absolute path
2. It is not a URL
3. It conforms to the format expected by one of the following:
Linux (/x/y/...), Cygwin (/cygdrive/D/...),
Windows (D:/...), or MSYS (/D/...), or relative (x/y...)
4. It is encoded in the local platform character set.
Note that for most systems, this is utf-8. But for Windows,
the encoding is most likely some form of ANSI code page, probably
the windows 1252 encoding.
Note that in any case, the path must be representable in the
local Code Page.
On output it produces a re-written path that has the following
properties:
1. The path is normalized to match the platform on which the code
is running (e.g. cygwin, windows, msys, linux). So for example
using a cygwin path under visual studio will convert e.g.
/cygdrive/d/x/y to d:\x\y. See ../unit_test/test_pathcvt.c
for example conversions.
It returns the converted path.
Note that this function is intended to be Idempotent: f(f(x) == f(x).
This means it is ok to call it repeatedly with no harm.
*/
EXTERNL char* NCpathcvt(const char* path);

/* path -> URL Path converter */
EXTERNL char* NCurlpath(const char* path);
/* Canonicalize and make absolute by prefixing the current working directory */
EXTERNL char* NCpathabsolute(const char* name);

/* Fix path in case it was escaped by shell */
EXTERNL char* NCdeescape(const char* name);
/* Convert from the local coding (e.g. ANSI) to utf-8;
note that this can produce unexpected results for Windows
because it first converts to wide character and then to utf8. */
EXTERNL int NCpath2utf8(const char* path, char** u8p);

#ifdef WINPATH
/* Wrap various stdio and unistd IO functions.
It is especially important to use for windows so that
NCpathcvt (above) is invoked on the path.
*/
#if defined(WINPATH)
/* path converter wrappers*/
EXTERNL FILE* NCfopen(const char* path, const char* flags);
EXTERNL int NCopen3(const char* path, int flags, int mode);
Expand All @@ -68,6 +116,10 @@ EXTERNL int NCremove(const char* path);
EXTERNL int NCmkdir(const char* path, int mode);
EXTERNL int NCrmdir(const char* path);
EXTERNL char* NCcwd(char* cwdbuf, size_t len);
EXTERNL char* NCcwd(char* cwdbuf, size_t len);
#ifdef HAVE_SYS_STAT_H
EXTERNL int NCstat(char* path, struct stat* buf);
#endif
#ifdef HAVE_DIRENT_H
EXTERNL DIR* NCopendir(const char* path);
EXTERNL int NCclosedir(DIR* ent);
Expand All @@ -78,8 +130,12 @@ EXTERNL int NCclosedir(DIR* ent);
#define NCopen2(path,flags) open((path),(flags))
#define NCremove(path) remove(path)
#define NCaccess(path,mode) access(path,mode)
#define NCmkdir(path,mode) mkdir(path,mode)
#define NCgetcwd(buf,len) getcwd(buf,len)
#ifdef HAVE_SYS_STAT_H
#define NCstat(path,buf) stat(path,buf)
#endif
#define NCcwd(buf, len) getcwd(buf,len)
#define NCmkdir(path, mode) mkdir(path,mode)
#define NCrmdir(path) rmdir(path)
#ifdef HAVE_DIRENT_H
#define NCopendir(path) opendir(path)
Expand All @@ -89,7 +145,26 @@ EXTERNL int NCclosedir(DIR* ent);

/* Platform independent */
#define NCclose(fd) close(fd)
#define NCfstat(fd,buf) fstat(fd,buf)

/**************************************************/
/* Following definitions are for testing only */

/* Possible Kinds Of Output */
#define NCPD_UNKNOWN 0
#define NCPD_NIX 1
#define NCPD_MSYS 2
#define NCPD_CYGWIN 3
#define NCPD_WIN 4
#define NCPD_REL 5 /* actual kind is unknown */

EXTERNL char* NCpathcvt_test(const char* path, int ukind, int udrive);

EXTERNL void printutf8hex(const char* s, char* sx);

EXTERNL int NChasdriveletter(const char* path);
/**************************************************/
/* From dutil.c */
EXTERNL char* NC_backslashEscape(const char* s);
EXTERNL char* NC_backslashUnescape(const char* esc);

#endif /* _NCWINIO_H_ */
#endif /* _NCPATHMGR_H_ */
2 changes: 0 additions & 2 deletions include/ncrc.h
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,6 @@ extern NCTriple* NC_rcfile_ith(NCRCinfo*,size_t);
/* From dutil.c (Might later move to e.g. nc.h */
extern int NC__testurl(const char* path, char** basenamep);
extern int NC_isLittleEndian(void);
extern char* NC_backslashEscape(const char* s);
extern char* NC_backslashUnescape(const char* esc);
extern char* NC_entityescape(const char* s);
extern int NC_readfile(const char* filename, NCbytes* content);
extern int NC_writefile(const char* filename, size_t size, void* content);
Expand Down
3 changes: 3 additions & 0 deletions include/ncuri.h
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,9 @@ EXTERNL void ncurifree(NCURI* ncuri);
/* Replace the protocol */
EXTERNL int ncurisetprotocol(NCURI*,const char* newprotocol);

/* Replace the path */
EXTERNL int ncurisetpath(NCURI*,const char* newpath);

/* Replace the constraints */
EXTERNL int ncurisetquery(NCURI*,const char* query);

Expand Down
2 changes: 1 addition & 1 deletion lib_flags.am
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ if USE_DAP
AM_CPPFLAGS += -I${top_srcdir}/oc2
endif

if HAVE_AWS
if ENABLE_S3_SDK
AM_LDFLAGS += -lstdc++
endif

Expand Down
45 changes: 6 additions & 39 deletions libdispatch/ddispatch.c
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ See LICENSE.txt for license information.
#include "ncbytes.h"
#include "ncrc.h"
#include "ncoffsets.h"
#include "ncpathmgr.h"

/* Required for getcwd, other functions. */
#ifdef HAVE_UNISTD_H
Expand All @@ -19,7 +20,6 @@ See LICENSE.txt for license information.
/* Required for getcwd, other functions. */
#ifdef _WIN32
#include <direct.h>
#define getcwd _getcwd
#endif

#if defined(ENABLE_BYTERANGE) || defined(ENABLE_DAP) || defined(ENABLE_DAP4)
Expand Down Expand Up @@ -57,61 +57,28 @@ NCDISPATCH_initialize(void)

/* Capture temp dir*/
{
char* tempdir;
char* p;
char* q;
char cwd[4096];
#ifdef _WIN32
char* tempdir = NULL;
#if defined _WIN32 || defined __MSYS__
tempdir = getenv("TEMP");
#else
tempdir = "/tmp";
#endif
if(tempdir == NULL) {
fprintf(stderr,"Cannot find a temp dir; using ./\n");
tempdir = getcwd(cwd,sizeof(cwd));
if(tempdir == NULL || *tempdir == '\0') tempdir = ".";
}
globalstate->tempdir= (char*)malloc(strlen(tempdir) + 1);
for(p=tempdir,q=globalstate->tempdir;*p;p++,q++) {
if((*p == '/' && *(p+1) == '/')
|| (*p == '\\' && *(p+1) == '\\')) {p++;}
*q = *p;
}
*q = '\0';
#ifdef _WIN32
#else
/* Canonicalize */
for(p=globalstate->tempdir;*p;p++) {
if(*p == '\\') {*p = '/'; };
tempdir = ".";
}
*q = '\0';
#endif
globalstate->tempdir= strdup(tempdir);
}

/* Capture $HOME */
{
char* p;
char* q;
char* home = getenv("HOME");

if(home == NULL) {
/* use tempdir */
home = globalstate->tempdir;
}
globalstate->home = (char*)malloc(strlen(home) + 1);
for(p=home,q=globalstate->home;*p;p++,q++) {
if((*p == '/' && *(p+1) == '/')
|| (*p == '\\' && *(p+1) == '\\')) {p++;}
*q = *p;
}
*q = '\0';
#ifdef _WIN32
#else
/* Canonicalize */
for(p=home;*p;p++) {
if(*p == '\\') {*p = '/'; };
}
#endif
globalstate->home = strdup(home);
}

/* Now load RC File */
Expand Down
10 changes: 0 additions & 10 deletions libdispatch/dfile.c
Original file line number Diff line number Diff line change
Expand Up @@ -1853,12 +1853,7 @@ NC_create(const char *path0, int cmode, size_t initialsz,
/* Skip past any leading whitespace in path */
const unsigned char* p;
for(p=(const unsigned char*)path0;*p;p++) {if(*p > ' ') break;}
#ifdef WINPATH
/* Need to do path conversion */
path = NCpathcvt((const char*)p);
#else
path = nulldup((const char*)p);
#endif
}

memset(&model,0,sizeof(model));
Expand Down Expand Up @@ -2009,12 +2004,7 @@ NC_open(const char *path0, int omode, int basepe, size_t *chunksizehintp,
/* Skip past any leading whitespace in path */
const char* p;
for(p=(const char*)path0;*p;p++) {if(*p < 0 || *p > ' ') break;}
#ifdef WINPATH
/* Need to do path conversion */
path = NCpathcvt(p);
#else
path = nulldup(p);
#endif
}

memset(&model,0,sizeof(model));
Expand Down
Loading

0 comments on commit 0b7a538

Please sign in to comment.