Feat/attribute schema (#70)
* feat: adds initial implementation for attribute schema building and validation

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

* feat: adds schema validation to collection building in the cli package

Adds AttributeSchema to the AttributeStore interface for
fetching attribute schema descriptor information.

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

* chore: removes extra options from build schema subcommand

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

* fix: fixes json schema creation in FromTypes method for valid results

* chore: adds consistent import sorting and unit tests in oras client package

* chore: removed DefaultContentDeclararions and CommonAttributeMapping from schema

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

* fix: fixes unhandled error

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

* docs: updates README.md with schema and linked collection building instructions

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

* test: adds build command unit tests with dataset configuration inputs

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

* docs: adds fixes to CLI command examples

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

* chore: adds grammar fixes to various comments

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

* fix: remove unused imports from annotations.go and empty file from docs

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

* test: adds SchemaLoading unit test

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

* docs: fixes formatting of SchemaConfiguration in README.md

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>

Signed-off-by: Jennifer Power <barnabei.jennifer@gmail.com>
jpower432 committed Aug 27, 2022
1 parent 79e067d commit f6bd927
Showing 46 changed files with 1,917 additions and 402 deletions.
231 changes: 222 additions & 9 deletions README.md
@@ -44,17 +44,22 @@ uor-client-go version

### User Workflow

1. Create a directory with artifacts to publish to a registry as an OCI artifact. If the files reference each other, the client will replace the in-content linked files with the content address.
> WARNING: Currently, only JSON is supported for link replacement.
2. Use the `uor-client-go build` command to build the workspace as an OCI artifact in build-cache The default location is ~/.uor/cache. It an be set with the `UOR_CACHE` environment variable`.
1. Use `uor-client-go build schema` to build a schema to be used with a collection.
2. Use the `uor-client-go build collection` command to build the workspace as an OCI artifact in the build cache. The default location is `~/.uor/cache`. It can be set with the `UOR_CACHE` environment variable.
3. Use the `uor-client-go push` command to publish to a registry as an OCI artifact.
4. Use the `uor-client-go pull` command to pull the artifact back to a local workspace.
5. Use the `uor-client-go inspect` command to inspect the build cache to list information about references.

### Build a schema into an artifact
```
# This is an optional precursor to building a collection. Collections can reference an
# already built schema or no schema at all.
uor-client-go build schema schema-config.yaml localhost:5000/myschema:latest
```
### Build workspace into an artifact

```
uor-client-go build my-workspace localhost:5000/myartifacts:latest
uor-client-go build collection my-workspace localhost:5000/myartifacts:latest
```

```
@@ -81,25 +86,26 @@ uor-client-go pull localhost:5000/myartifacts:latest -o my-output-directory --at

## Getting Started

### Basic Collection Publishing
1. Create a new directory.
2. Add the content to be uploaded in the directory (can be files of any content types).
3. Create a JSON doc where the value of each key-value pair is the path to a file within the directory. Multiple JSON docs can be used to create deep graphs, but a graph must have only one root. Using multiple JSON docs in a build directory is for advanced use cases. Most use cases do not need more than one JSON doc.

Example json doc:

```
```bash
{
"fish": "fish.jpg",
"text": "subdir1/file.txt",
"fish2": "subdir1/fish2.jpg"
}
```

4. Create a dataset-config.yaml outside of the content directory that references the relative paths from within the content directory to each file. Add user defined key value pairs as subkeys to the `annotations`section. Each file should have as many attributes as possible. Multiple files can be referenced by using the `*` wildcard.
4. Create a dataset-config.yaml outside the content directory that references the relative paths from within the content directory to each file. Add user-defined key-value pairs as subkeys to the `annotations` section. Each file should have as many attributes as possible. Multiple files can be referenced by using the `*` wildcard.

Example dataset-config.yaml:

```
```bash
kind: DataSetConfiguration
apiVersion: client.uor-framework.io/v1alpha1
collection:
@@ -136,19 +142,226 @@ uor-client-go pull localhost:5000/myartifacts:latest -o my-output-directory --at
`uor-client-go inspect`

9. Optionally pull the collection back down to verify the content with `uor-client-go pull`:
`uor-client-go pull localhost:5000/test/dataset:latest -o my-output-directory`
`uor-client-go pull localhost:5000/test/dataset:latest -o my-output-directory`

10. Optionally pull a subset of the collection back down to verify the content with `uor-client-go pull`:

Example attribute-query.yaml:
```
```bash
kind: AttributeQuery
apiVersion: client.uor-framework.io/v1alpha1
attributes:
fiction: true
```
`uor-client-go pull localhost:5000/test/dataset:latest -o my-output-directory --attributes attribute-query.yaml`
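
To illustrate how a `*` wildcard entry in dataset-config.yaml might select files, here is a small Go sketch that uses the standard library's `path.Match`. It is an assumption that the client's matching resembles these semantics; under `path.Match`, `*` does not cross a `/`, and the client's own rules may differ. The helper name `matchFiles` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path"
)

// matchFiles returns the slash-separated relative paths that match
// pattern according to path.Match from the Go standard library.
// The client may implement different wildcard rules.
func matchFiles(pattern string, files []string) ([]string, error) {
	var matched []string
	for _, f := range files {
		ok, err := path.Match(pattern, f)
		if err != nil {
			return nil, err
		}
		if ok {
			matched = append(matched, f)
		}
	}
	return matched, nil
}

func main() {
	// File names taken from the example JSON doc above.
	files := []string{"fish.jpg", "subdir1/file.txt", "subdir1/fish2.jpg"}
	for _, pattern := range []string{"*.jpg", "subdir1/*"} {
		m, err := matchFiles(pattern, files)
		if err != nil {
			fmt.Fprintln(os.Stderr, "bad pattern:", err)
			os.Exit(1)
		}
		fmt.Printf("%s -> %v\n", pattern, m)
	}
}
```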

### Collection Publishing with Schema
1. Create a schema-configuration file to define attribute keys and types for corresponding collections:
Example schema-config.yaml

```bash
kind: SchemaConfiguration
apiVersion: client.uor-framework.io/v1alpha1
schema:
attributeTypes:
"animal": string
"size": number
"color": string
"habitat": string
"mammal": boolean
```
2. Build and save the schema:
```
uor-client-go build schema schema-config.yaml localhost:5000/myschema:latest
```
3. Push the schema to the remote registry:
```
uor-client-go push localhost:5000/myschema:latest
```
5. Create a new directory.
6. Add the content to be uploaded in the directory (can be files of any content types).
7. Create a JSON doc where the value of each key-value pair is the path to a file within the directory. Multiple JSON docs can be used to create deep graphs, but a graph must have only one root. Using multiple JSON docs in a build directory is for advanced use cases. Most use cases do not need more than one JSON doc.

Example json doc:

```bash
{
"fish": "fish.jpg",
"dog": "subdir1/dog.jpg"
}
```

8. Create a dataset-config.yaml outside the content directory that references the relative paths from within the content directory to each file. Add user-defined key-value pairs as subkeys to the `annotations` section. Each file should have as many attributes as possible. Multiple files can be referenced by using the `*` wildcard.

Example dataset-config.yaml:

```bash
kind: DataSetConfiguration
apiVersion: client.uor-framework.io/v1alpha1
collection:
schemaAddress: "localhost:5000/myschema:latest"
files:
- file: "fish.jpg"
attributes:
animal: "fish"
habitat: "ocean"
size: "small"
color: "blue"
mammal: false
- file: "subdir1/dog.jpg"
attributes:
animal: "dog"
habitat: "house"
size: "medium"
color: "brown"
mammal: true
- file: "*.jpg"
attributes:
custom: "customval"
```

9. Run the UOR client build command referencing the dataset config, the content directory, and the destination registry location. The attributes specified will be validated against the schema provided.
```
uor-client-go build collection my-workspace localhost:5000/test/dataset:latest --dsconfig dataset-config.yaml
```
10. Run the UOR push command to publish the collection:
```
uor-client-go push localhost:5000/test/dataset:latest
```

11. Optionally inspect the OCI manifest of the dataset:
`curl -H "Accept: application/vnd.oci.image.manifest.v1+json" <servername>:<port>/v2/<namespace>/<repo>/manifests/<digest or tag>`

12. Optionally inspect the cache:
`uor-client-go inspect`

13. Optionally pull the collection back down to verify the content with `uor-client-go pull`:
`uor-client-go pull localhost:5000/test/dataset:latest -o my-output-directory`

14. Optionally pull a subset of the collection back down to verify the content with `uor-client-go pull`:

Example attribute-query.yaml:
```bash
kind: AttributeQuery
apiVersion: client.uor-framework.io/v1alpha1
attributes:
mammal: true
```
`uor-client-go pull localhost:5000/test/dataset:latest -o my-output-directory --attributes attribute-query.yaml`
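
The validation mentioned in the build step can be pictured with the following Go sketch. It is a simplified stand-in, not the client's implementation: the functions `checkType` and `validate` are hypothetical, attribute types are modeled as plain strings, and attributes that the schema does not declare are ignored here (an assumption). Under these assumptions a quoted value such as `"medium"` would not satisfy an attribute declared as `number`.

```go
package main

import (
	"fmt"
	"sort"
)

// checkType reports whether value is compatible with the named
// attribute type ("string", "number", or "boolean").
func checkType(typeName string, value interface{}) bool {
	switch typeName {
	case "string":
		_, ok := value.(string)
		return ok
	case "number":
		switch value.(type) {
		case int, int64, float64:
			return true
		}
		return false
	case "boolean":
		_, ok := value.(bool)
		return ok
	}
	return false
}

// validate returns one message per attribute whose value does not match
// its declared type. Undeclared attributes are skipped (assumption).
func validate(types map[string]string, attrs map[string]interface{}) []string {
	var problems []string
	for key, value := range attrs {
		typeName, declared := types[key]
		if !declared {
			continue
		}
		if !checkType(typeName, value) {
			problems = append(problems, fmt.Sprintf("%s: expected %s, got %T", key, typeName, value))
		}
	}
	sort.Strings(problems) // stable output regardless of map iteration order
	return problems
}

func main() {
	types := map[string]string{
		"animal": "string", "size": "number", "color": "string",
		"habitat": "string", "mammal": "boolean",
	}
	attrs := map[string]interface{}{
		"animal": "dog", "size": 3, "color": "brown",
		"habitat": "house", "mammal": true,
	}
	fmt.Println(validate(types, attrs))

	attrs["size"] = "medium" // a string where the schema declares a number
	fmt.Println(validate(types, attrs))
}
```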

### Collection Publishing with Links
> IMPORTANT: Linked collections must have an attached schema.
1. Build the schema
```bash
vi schema-config.yaml
kind: SchemaConfiguration
apiVersion: client.uor-framework.io/v1alpha1
schema:
attributeTypes:
"animal": string
"size": number
"color": string
"habitat": string
"type": string
```

```bash
uor-client-go build schema schema-config.yaml localhost:5000/myschema:latest
uor-client-go push localhost:5000/myschema:latest
```

2. Build a leaf collection
```bash
mkdir leaf-workspace
echo "leaf" > leaf-workspace/leaf.txt
```
```bash
vi leaf-dataset-config.yaml
kind: DataSetConfiguration
apiVersion: client.uor-framework.io/v1alpha1
collection:
schemaAddress: localhost:5000/myschema:latest
files:
- file: "*.txt"
attributes:
animal: "fish"
habitat: "ocean"
size: "small"
color: "blue"
type: "leaf"
```
```
uor-client-go build collection leaf-workspace localhost:5000/leaf:latest --dsconfig leaf-dataset-config.yaml
uor-client-go push localhost:5000/leaf:latest
```
3. Build a collection and link the previously built collection
```bash
mkdir root-workspace
echo "root" > root-workspace/root.txt
```
```bash
vi root-dataset-config.yaml
kind: DataSetConfiguration
apiVersion: client.uor-framework.io/v1alpha1
collection:
linkedCollections:
- localhost:5000/leaf:latest
schemaAddress: localhost:5000/myschema:latest
files:
- file: "*.txt"
attributes:
animal: "cat"
habitat: "house"
size: "small"
color: "orange"
type: "root"
```
```bash
uor-client-go build collection root-workspace localhost:5000/root:latest --dsconfig root-dataset-config.yaml
uor-client-go push localhost:5000/root:latest
```
4. Pull the collection with the `--pull-all` flag
```bash
uor-client-go pull localhost:5000/root:latest
ls
root.txt
uor-client-go pull localhost:5000/root:latest --pull-all
ls
leaf.txt root.txt
```
5. Pull all with attributes
```bash
vi color-query.yaml
kind: AttributeQuery
apiVersion: client.uor-framework.io/v1alpha1
attributes:
"color": "orange"
```

```bash
uor-client-go pull localhost:5000/root:latest --pull-all --attributes color-query.yaml
ls
root.txt
```
```bash
vi size-query.yaml
kind: AttributeQuery
apiVersion: client.uor-framework.io/v1alpha1
attributes:
"size": "small"
```
```bash
uor-client-go pull localhost:5000/root:latest --pull-all --attributes size-query.yaml
ls
leaf.txt root.txt
```
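
The pull results above can be reproduced with a small in-memory model. The Go sketch below is illustrative only: the `file` and `collection` types and the `selectFiles` function are hypothetical, a query is treated as a set of key-value pairs that must all match, and linked collections are walked depth-first without cycle detection.

```go
package main

import "fmt"

// file is a hypothetical stand-in for a collection entry and its attributes.
type file struct {
	name  string
	attrs map[string]string
}

// collection is a hypothetical stand-in for a collection and its links.
type collection struct {
	files  []file
	linked []*collection
}

// matches reports whether every key-value pair in query is present in attrs.
// An empty or nil query matches everything.
func matches(attrs, query map[string]string) bool {
	for k, want := range query {
		if got, ok := attrs[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// selectFiles returns the names of the files that match query. When
// pullAll is true, linked collections are searched as well.
func selectFiles(c *collection, query map[string]string, pullAll bool) []string {
	var names []string
	if pullAll {
		for _, l := range c.linked {
			names = append(names, selectFiles(l, query, pullAll)...)
		}
	}
	for _, f := range c.files {
		if matches(f.attrs, query) {
			names = append(names, f.name)
		}
	}
	return names
}

func main() {
	leaf := &collection{files: []file{{"leaf.txt", map[string]string{
		"animal": "fish", "habitat": "ocean", "size": "small", "color": "blue", "type": "leaf"}}}}
	root := &collection{linked: []*collection{leaf}, files: []file{{"root.txt", map[string]string{
		"animal": "cat", "habitat": "house", "size": "small", "color": "orange", "type": "root"}}}}

	fmt.Println(selectFiles(root, nil, false))
	fmt.Println(selectFiles(root, nil, true))
	fmt.Println(selectFiles(root, map[string]string{"color": "orange"}, true))
	fmt.Println(selectFiles(root, map[string]string{"size": "small"}, true))
}
```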

# Glossary

`collection`: a collection of linked files represented as an OCI artifact
23 changes: 23 additions & 0 deletions api/v1alpha1/schema_types.go
@@ -0,0 +1,23 @@
package v1alpha1

import (
"github.com/uor-framework/uor-client-go/schema"
)

// SchemaConfigurationKind is the object kind of SchemaConfiguration.
const SchemaConfigurationKind = "SchemaConfiguration"

// SchemaConfiguration configures a schema.
type SchemaConfiguration struct {
TypeMeta `json:",inline"`
Schema SchemaConfigurationSpec `json:"schema"`
}

// SchemaConfigurationSpec defines the configuration spec to build a UOR schema.
type SchemaConfigurationSpec struct {
// Address is the remote location for the default schema of the
// collection.
Address string `json:"address"`
// AttributeTypes is a collection of attribute type definitions.
AttributeTypes schema.Types `json:"attributeTypes,omitempty"`
}
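
As a rough illustration of how these types map onto a configuration document, the following self-contained Go sketch decodes a JSON rendering of the README's schema-config example. The local `TypeMeta` definition and the use of `map[string]string` in place of `schema.Types` are assumptions made so the example compiles on its own. The `parseSchemaConfig` helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// TypeMeta is a local stand-in for the package's TypeMeta (assumed fields).
type TypeMeta struct {
	Kind       string `json:"kind,omitempty"`
	APIVersion string `json:"apiVersion,omitempty"`
}

// SchemaConfigurationSpec mirrors the struct above, with map[string]string
// standing in for schema.Types (assumption).
type SchemaConfigurationSpec struct {
	Address        string            `json:"address"`
	AttributeTypes map[string]string `json:"attributeTypes,omitempty"`
}

// SchemaConfiguration mirrors the struct above. The embedded TypeMeta has
// no JSON name, so encoding/json promotes its fields to the top level.
type SchemaConfiguration struct {
	TypeMeta `json:",inline"`
	Schema   SchemaConfigurationSpec `json:"schema"`
}

// parseSchemaConfig decodes data and checks the declared kind.
func parseSchemaConfig(data []byte) (SchemaConfiguration, error) {
	var cfg SchemaConfiguration
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, err
	}
	if cfg.Kind != "SchemaConfiguration" {
		return cfg, fmt.Errorf("unexpected kind %q", cfg.Kind)
	}
	return cfg, nil
}

func main() {
	doc := []byte(`{
	  "kind": "SchemaConfiguration",
	  "apiVersion": "client.uor-framework.io/v1alpha1",
	  "schema": {"attributeTypes": {"animal": "string", "size": "number", "mammal": "boolean"}}
	}`)
	cfg, err := parseSchemaConfig(doc)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid schema configuration:", err)
		os.Exit(1)
	}
	fmt.Println(cfg.APIVersion, cfg.Schema.AttributeTypes["size"])
}
```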