wishlist: dandi wtf #57

Open
yarikoptic opened this issue Mar 17, 2020 · 5 comments
Assignees
Labels
DX Developer eXperience

Comments

@yarikoptic
Member

Similar to datalad wtf, but with details pertinent to dandi. Here is an example of the datalad output:

DataLad 0.12.2 WTF (configuration, datalad, dependencies, environment, extensions, git-annex, location, metadata_extractors, python, system)

WTF

configuration <SENSITIVE, report disabled by configuration>

datalad

  • full_version: 0.12.2
  • version: 0.12.2

dependencies

  • appdirs: 1.4.3
  • boto: 2.44.0
  • cmd:7z: 16.02
  • cmd:annex: 7.20190819+git2-g908476a9b-1~ndall+1
  • cmd:bundled-git: 2.20.1
  • cmd:git: 2.20.1
  • cmd:system-git: 2.24.0
  • cmd:system-ssh: 7.9p1
  • exifread: 2.1.2
  • git: 3.0.5
  • gitdb: 2.0.5
  • humanize: 0.5.1
  • iso8601: 0.1.11
  • keyring: 17.1.1
  • keyrings.alt: 3.1.1
  • msgpack: 0.5.6
  • mutagen: 1.40.0
  • requests: 2.21.0
  • wrapt: 1.10.11

environment

  • GIT_PAGER: less --no-init --quit-if-one-screen
  • GIT_PYTHON_GIT_EXECUTABLE: /usr/lib/git-annex.linux/git
  • LANG: en_US
  • LANGUAGE: en_US:en
  • LC_ADDRESS: en_US.UTF-8
  • LC_COLLATE: en_US.UTF-8
  • LC_CTYPE: en_US.UTF-8
  • LC_IDENTIFICATION: en_US.UTF-8
  • LC_MEASUREMENT: en_US.UTF-8
  • LC_MESSAGES: en_US.UTF-8
  • LC_MONETARY: en_US.UTF-8
  • LC_NAME: en_US.UTF-8
  • LC_NUMERIC: en_US.UTF-8
  • LC_PAPER: en_US.UTF-8
  • LC_TELEPHONE: en_US.UTF-8
  • LC_TIME: en_US.UTF-8
  • PATH: /home/yoh/gocode/bin:/home/yoh/gocode/bin:/home/yoh/proj/dandi/dandi-cli/venvs/dev3/bin:/home/yoh/bin:/home/yoh/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/sbin:/usr/sbin:/usr/local/sbin

extensions

  • container:
    • description: Containerized environments
    • entrypoints:
      • datalad_container.containers_add.ContainersAdd:
        • class: ContainersAdd
        • load_error: None
        • module: datalad_container.containers_add
        • names:
          • containers-add
          • containers_add
      • datalad_container.containers_list.ContainersList:
        • class: ContainersList
        • load_error: None
        • module: datalad_container.containers_list
        • names:
          • containers-list
          • containers_list
      • datalad_container.containers_remove.ContainersRemove:
        • class: ContainersRemove
        • load_error: None
        • module: datalad_container.containers_remove
        • names:
          • containers-remove
          • containers_remove
      • datalad_container.containers_run.ContainersRun:
        • class: ContainersRun
        • load_error: None
        • module: datalad_container.containers_run
        • names:
          • containers-run
          • containers_run
    • load_error: None
    • module: datalad_container
    • version: 0.5.0

git-annex

  • build flags:
    • Assistant
    • Webapp
    • Pairing
    • S3
    • WebDAV
    • Inotify
    • DBus
    • DesktopNotify
    • TorrentParser
    • MagicMime
    • Feeds
    • Testsuite
  • dependency versions:
    • aws-0.20
    • bloomfilter-2.0.1.0
    • cryptonite-0.25
    • DAV-1.3.3
    • feed-1.0.0.0
    • ghc-8.4.4
    • http-client-0.5.13.1
    • persistent-sqlite-2.8.2
    • torrent-10000.1.1
    • uuid-1.3.13
    • yesod-1.6.0
  • key/value backends:
    • SHA256E
    • SHA256
    • SHA512E
    • SHA512
    • SHA224E
    • SHA224
    • SHA384E
    • SHA384
    • SHA3_256E
    • SHA3_256
    • SHA3_512E
    • SHA3_512
    • SHA3_224E
    • SHA3_224
    • SHA3_384E
    • SHA3_384
    • SKEIN256E
    • SKEIN256
    • SKEIN512E
    • SKEIN512
    • BLAKE2B256E
    • BLAKE2B256
    • BLAKE2B512E
    • BLAKE2B512
    • BLAKE2B160E
    • BLAKE2B160
    • BLAKE2B224E
    • BLAKE2B224
    • BLAKE2B384E
    • BLAKE2B384
    • BLAKE2BP512E
    • BLAKE2BP512
    • BLAKE2S256E
    • BLAKE2S256
    • BLAKE2S160E
    • BLAKE2S160
    • BLAKE2S224E
    • BLAKE2S224
    • BLAKE2SP256E
    • BLAKE2SP256
    • BLAKE2SP224E
    • BLAKE2SP224
    • SHA1E
    • SHA1
    • MD5E
    • MD5
    • WORM
    • URL
  • operating system: linux x86_64
  • remote types:
    • git
    • gcrypt
    • p2p
    • S3
    • bup
    • directory
    • rsync
    • web
    • bittorrent
    • webdav
    • adb
    • tahoe
    • glacier
    • ddar
    • git-lfs
    • hook
    • external
  • supported repository versions:
    • 5
    • 7
  • upgrade supported from repository versions:
    • 0
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
  • version: 7.20190819+git2-g908476a9b-1~ndall+1

location

  • path: /mnt/datasets/dandi
  • type: directory

metadata_extractors

  • annex:
    • load_error: None
    • module: datalad.metadata.extractors.annex
    • version: None
  • audio:
    • load_error: None
    • module: datalad.metadata.extractors.audio
    • version: None
  • datacite:
    • load_error: None
    • module: datalad.metadata.extractors.datacite
    • version: None
  • datalad_core:
    • load_error: None
    • module: datalad.metadata.extractors.datalad_core
    • version: None
  • datalad_rfc822:
    • load_error: None
    • module: datalad.metadata.extractors.datalad_rfc822
    • version: None
  • exif:
    • load_error: None
    • module: datalad.metadata.extractors.exif
    • version: None
  • frictionless_datapackage:
    • load_error: None
    • module: datalad.metadata.extractors.frictionless_datapackage
    • version: None
  • image:
    • load_error: None
    • module: datalad.metadata.extractors.image
    • version: None
  • xmp:
    • load_error: None
    • module: datalad.metadata.extractors.xmp
    • version: None

python

  • implementation: CPython
  • version: 3.7.3

system

  • distribution: debian/10.0
  • encoding:
    • default: utf-8
    • filesystem: utf-8
    • locale.prefered: UTF-8
  • max_path_length: 275
  • name: Linux
  • release: 4.19.0-5-amd64
  • type: posix
  • version: #1 SMP Debian 4.19.37-5+deb10u1 (2019-07-19)
@yarikoptic
Member Author

We just need to strip away the metadata_extractors section (at least for now, while there is no integration of any kind with datalad) and the git-annex section, and possibly "adopt" (copy) datalad/support/external_versions.py... I also wonder whether there is some sane way to make this whole wtf a reusable component, tunable for any given project... maybe an independent Python module? WDYT @jwodder?
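
As a rough illustration of what a trimmed-down, reusable report could look like, here is a minimal sketch that uses only the Python standard library. It is not datalad's external_versions.py and not an existing dandi API; the names get_version, collect_wtf and render_wtf are hypothetical, and the section layout only loosely follows the datalad output above.

```python
import os
import platform
from importlib import metadata


def get_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def collect_wtf(dependencies):
    """Collect python, system and dependency details for a given project.

    `dependencies` is the list of distribution names the project cares
    about, which is what would make the component tunable per project.
    """
    return {
        "python": {
            "implementation": platform.python_implementation(),
            "version": platform.python_version(),
        },
        "system": {
            "name": platform.system(),
            "release": platform.release(),
            "type": os.name,
        },
        "dependencies": {name: get_version(name) for name in dependencies},
    }


def render_wtf(report):
    """Render the report as plain text, one section per block."""
    lines = []
    for section, items in report.items():
        lines.append(section)
        for key, value in sorted(items.items()):
            lines.append(f"  - {key}: {value}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # The dependency list here is only an example.
    print(render_wtf(collect_wtf(["requests", "numpy", "h5py"])))
```

Passing the dependency list in as an argument, rather than hard-coding it, is what would let the same module serve projects other than dandi.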

@yarikoptic
Member Author

yarikoptic commented Nov 25, 2020

Oh, crazy, but in hindsight it makes so much sense: we should (ab)use https://github.com/duecredit/duecredit/ !!! We just need to add duecredit support to all related projects. That would kill three birds with one stone: citations, tracking of dependencies as pertinent to the specific invocation, and their versions.

ATM we only "inject" versioning for numpy, but it already works:

$> DANDI_CACHE=ignore python -m duecredit `which dandi` ls /tmp/bad.nwb /tmp/HardwareTests-V2-IP8.nwb
PATH                          SIZE    SESSION_START_TIME   IDENTIFIER                                                       SESSION_DESCRIPTION ND_TYPES                                                                                                      NWB  
/tmp/bad.nwb                  32.0 MB 2019-11-08/18:46:09  2ae7afd1a09f78c3d7c3311d71990095010fab706d91f9048986eef429991a70 PLACEHOLDER         CurrentClampSeries (73), CurrentClampStimulusSeries (73), Device (148), IntracellularElectrode (147), LabN... 2.2.4
/tmp/HardwareTests-V2-IP8.nwb 9.2 MB  2020-11-21/20:42:02  ac24acc942a5b87538bf15d140e06b4576481565b77b114877c4d26ba23fc09e PLACEHOLDER         Device (7), IntracellularElectrode (6), LabNotebook, LabNotebookDevice, StimulusSets, Subject, SweepTable,... 2.2.4
Summary:                      41.2 MB 2019-11-08/18:46:09>                                                                                                                                                                                                         
                                      2020-11-21/20:42:02<                                                                                                                                                                                                         

DueCredit Report:
- Scientific tools library / numpy (v 1.19.4) [1]

1 package cited
0 modules cited
0 functions cited

References
----------

[1] Van Der Walt, S., Colbert, S.C. & Varoquaux, G., 2011. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), pp.22–30.

@satra
Member

satra commented Nov 25, 2020

we can add duecredit. there are two things that come to mind:

  1. i think what would be useful for neuroscientists is dataset citation. this crowd would be less interested in citing software, although we should list that as well.

  2. the issue i have with duecredit for software with citing papers is that it misses a lot of contributors. the above example is a perfect one. that paper does not reflect numpy contributors or even the originator.

there is no good answer, but before investing too much time, we may want to be clear about the kinds of sections of citations that would be generated.

@yarikoptic
Member Author

re datasets: yes, ultimately we should aim for that. For DataLad datasets with some older aggregated metadata we already do that BTW, see datalad/datalad#3184

re misses: in the context of this issue, of primary interest is version information on all involved dependencies. As for "due credit" of all contributors -- someone smart could e.g. extend duecredit to provide a mode where it would list all contributors associated with github repository or smth like that. But it would not be "citeable" really. The best is to just use zenodo records per each (used) version (would also be a nice feature to add to duecredit, so it could automagically choose correct DOI according to the version). Eh -- we even had it "planned": duecredit/duecredit#117

@yarikoptic
Member Author

yarikoptic commented Nov 25, 2020

re datasets: it would actually be up to us to just add due.cite(Doi('...')) upon any operation on some dandiset ;)

edit: meanwhile it could be some free text based on the description etc., with a url to the dandiset if known, again with a simple due.cite(Text()) or due.cite(Url()) if it is just a boring url. I will submit a PR for a generic duecredit addition now, and we could extend on that later.
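
A minimal sketch of the idea above, assuming duecredit's due, Doi and Url are importable and falling back to inert stand-ins otherwise. The helper cite_dandiset, the path value "dandi", and the DOI and URL in the usage lines are hypothetical placeholders, not part of dandi or of any real dandiset.

```python
try:
    from duecredit import due, Doi, Url
except ImportError:
    # duecredit is optional: fall back to do-nothing stand-ins.
    class _InactiveDue:
        active = False

        def cite(self, *args, **kwargs):
            return None

    due = _InactiveDue()

    def Doi(doi):
        return doi

    def Url(url):
        return url


def cite_dandiset(identifier, doi=None, url=None, description=None):
    """Register a citation for a dandiset; return False if nothing to cite.

    Hypothetical helper: prefers a DOI and falls back to a plain URL.
    """
    if doi:
        entry = Doi(doi)
    elif url:
        entry = Url(url)
    else:
        return False
    due.cite(
        entry,
        description=description or f"Dandiset {identifier}",
        path="dandi",  # assumption: one shared path for all dandi citations
    )
    return True


# Placeholder identifier, DOI and URL, for illustration only.
cite_dandiset("000000", doi="10.0000/hypothetical-doi")
cite_dandiset("000000", url="https://example.org/dandiset/000000")
```

Unless duecredit is explicitly enabled (for example by running under python -m duecredit, as in the earlier comment), the calls are expected to be no-ops, so such a helper could be invoked unconditionally.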
