Consider computing and storing directory tree of datasets #1948

Open
sunu opened this issue Sep 1, 2021 · 2 comments
Comments

@sunu
Contributor

sunu commented Sep 1, 2021

This can help solve a bunch of related problems: #1946, #1947, alephdata/alephclient#34 and alephdata/alephclient#35

Properties we should consider storing for each directory in the tree: its total size, the number of files and folders it contains, the file types of its children, etc.
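As a rough illustration of the properties listed above, here is a minimal sketch that rolls file sizes, counts, and extensions up into every ancestor directory. It assumes the dataset can be flattened to `(path, size)` pairs; `DirStats` and `build_tree` are hypothetical names for illustration and not part of Aleph's data model.

```python
from collections import Counter
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class DirStats:
    size: int = 0        # total bytes of all files below this directory
    file_count: int = 0  # number of files below this directory (recursive)
    subdirs: set = field(default_factory=set)            # names of immediate child folders
    filetypes: Counter = field(default_factory=Counter)  # extension -> file count (recursive)


def build_tree(files):
    """Aggregate per-directory stats from an iterable of (posix_path, size) pairs."""
    tree = {}
    for path, size in files:
        p = PurePosixPath(path)
        ext = p.suffix.lower() or "(none)"
        child = None
        # p.parents runs from the closest directory up to the root (".")
        for parent in p.parents:
            stats = tree.setdefault(str(parent), DirStats())
            stats.size += size
            stats.file_count += 1
            stats.filetypes[ext] += 1
            if child is not None:
                stats.subdirs.add(child.name)
            child = parent
    return tree


if __name__ == "__main__":
    sample = [("docs/reports/a.pdf", 100), ("docs/notes.txt", 10)]
    for directory, stats in sorted(build_tree(sample).items()):
        print(directory, stats.size, stats.file_count, dict(stats.filetypes))
```

Storing the aggregates on every ancestor makes "filter by folder size" or "which folders hold PDFs" a lookup rather than a tree walk, at the cost of recomputing when files are added or removed.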

@brrttwrks

This would be beneficial in a number of ways. From an analytical perspective, knowing what types of files sit underneath a folder would help in the exploratory process: prioritizing which folders matter most or hold file types relevant to an investigation, and which folders are less important or full of irrelevant file types. Being able to filter by folder size would be useful as well. In general, I think this would make browsing more productive and more effective at finding leads.

From the UX perspective, by adding failure/error as one such property, we can similarly debug more easily, pinpoint problematic areas, and prioritize which aspects of the ingestion to improve, fix, or add to.
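To make the failure/error idea concrete, here is a small sketch that counts failed files per directory and ranks the directories by failure count. The `(path, ok)` record shape and the `failure_hotspots` name are assumptions for illustration only; they do not reflect how Aleph tracks ingest status.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def failure_hotspots(records):
    """Rank directories by failed files, from (posix_path, ok) pairs.

    Returns a list of (directory, failed, total) tuples, most failures first.
    Directories with no failures are left out.
    """
    totals = defaultdict(int)
    failed = defaultdict(int)
    for path, ok in records:
        # Credit the file to every ancestor directory, up to the root (".")
        for parent in PurePosixPath(path).parents:
            key = str(parent)
            totals[key] += 1
            if not ok:
                failed[key] += 1
    rows = [(d, failed[d], totals[d]) for d in totals if failed[d] > 0]
    # Most failures first; ties broken alphabetically by directory name
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

A listing like this would point at the subtrees where ingestion breaks most often, which is the kind of prioritization described above.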

@brrttwrks

I would add, though this relates to entities rather than documents, that a similar list of analytical properties for entities would enable a better experience. For example, if an entity has intervals, how many of each type does it have? If the intervals include payments: how many, how much, max and min, sum, and so on. I imagine emails might be interesting to add analytics to as well. My point is that having some analytics on both entities and documents would be useful.
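A possible shape for such entity analytics, sketched under loud assumptions: an entity's intervals are given as `(kind, amount)` pairs, with `amount` set to `None` for anything that is not a payment. The `interval_summary` name, the `"Payment"` label, and the input shape are all hypothetical and not taken from Aleph.

```python
from collections import Counter


def interval_summary(intervals, payment_kind="Payment"):
    """Summarize an entity's intervals given (kind, amount) pairs.

    Returns how many intervals of each kind exist, plus count, sum,
    min and max over the payment amounts.
    """
    intervals = list(intervals)
    counts = Counter(kind for kind, _ in intervals)
    amounts = [
        amount
        for kind, amount in intervals
        if kind == payment_kind and amount is not None
    ]
    payments = {
        "count": len(amounts),
        "sum": sum(amounts),
        # min()/max() raise on an empty list, so fall back to None
        "min": min(amounts) if amounts else None,
        "max": max(amounts) if amounts else None,
    }
    return {"by_kind": dict(counts), "payments": payments}
```

Amounts in mixed currencies would need converting or grouping before summing; the sketch ignores that.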


2 participants