Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-12898 Allow uploading asset and referencing via Parameter #9138

Closed
wants to merge 8 commits into from

Conversation

bbende
Copy link
Contributor

@bbende bbende commented Aug 1, 2024

Summary

This pull request adds the ability to upload assets to parameter contexts and then reference them from parameters in the context. This feature is described in NIFI-12898 and the feature proposal.

A new framework extension - AssetManager - is responsible for storing/retrieving assets on the file system.

The following REST end-points are added:

Create/upload an asset:
POST /parameter-contexts/{id}/assets

List assets:
GET /parameter-contexts/{id}/assets

Get asset content:
GET /parameter-contexts/{id}/assets/{assetId}

The last two end-points are primarily intended to be used when a node joins the cluster for it to compare it's current assets with the coordinator and synchronize it's assets to match.

To reference an asset it is done through a normal parameter context update, where a parameter will send no value and instead provide an asset reference:

referencedAssets: [
  {
     "id": "assetId1"
  },
 {
     "id": "assetId2"
  },
]

The back-end will automatically make the value of the parameter become the concatenated paths of the referenced assets.

The following CLI commands were added:

nifi create-asset -pcid <param-context-id> -af <full-path-to-a-file>

nifi list-assets -pcid <param-context-id>

nifi add-asset-reference -pcid <param-context-id> -pn <param-name> -aid <asset-id>

nifi remove-asset-reference -pcid <param-context-id> -pn <param-name> -aid <asset-id>

When a parameter context is updated, any asset that was previously referenced by a parameter and no longer referenced after the update will be deleted.

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting together this new feature @bbende! I plan to take a closer look soon.

GitHub CodeQL flagged several issues related to using input parameters for file names, so that will require some closer evaluation.

@bbende
Copy link
Contributor Author

bbende commented Aug 2, 2024

Thanks @exceptionfactory ! I noticed the CodeQL issues as well...

I did sanitize the Filename header value before passing it to any code that would use it to write a File.

If that isn't enough then I suppose a fall back option would be to write the file names with the generated UUID and store some kind of mapping file along side the asset that contains the original filename.

The problem with this is that the value of the parameter that references the asset is going to be the concatenated paths of the assets and it's going to contain file names that the user has no idea what they are. Example, upload PostgresDriver.jar and it generates id 123, then reference this asset in a parameter named db_driver, and the value of the parameter will be '/path/to/assets/123'. So some looking at the parameter has no idea what file that is.

@exceptionfactory
Copy link
Contributor

Thanks for the reply @bbende. Several classes mitigate the path traversal problem using Java Path objects with normalization and checking that the resolved child path starts with the selected parent path.

It appears that the parent Parameter Context ID also needs to be checked. Instead of using the user-provided value, recommend looking up the Parameter Context and using the ID returned.

@bbende
Copy link
Contributor Author

bbende commented Aug 3, 2024

@exceptionfactory thanks for the pointers! I made a couple of improvements and seems to have resolved the CodeQL issues. I think most of them were actually somewhat false positives because the sanitize method was returning the input variable if null or blank, but that was causing CodeQL to think the unmodified user input was being potentially used later on.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this new feature @bbende! The majority of the code changes look good, and the initial changes to address the CodeQL path concerns also look good. I plan to do some additional testing, but on initial review, I just noted a couple minor things.

@bbende
Copy link
Contributor Author

bbende commented Aug 6, 2024

@exceptionfactory I pushed a commit that addresses the comments you had, and also another commit that adds a delete end-point to be able to clean up assets that were uploaded and maybe never referenced, let me know of anything else

markap14 and others added 7 commits August 9, 2024 10:18
- Remove AssetRequestReplicator and use existing UploadRequestReplicator
- Remove AssetManagerFactoryBean and expose AssetManager from Java based Spring config
- Switch multi-part form upload to direct octect-stream upload
- Adding CLI commands creating and referencing assets
- Add REST end-point for listing assets in a given param context
- Fix param context update merger
- Add REST end-point for retrieving content of an asset
- Add CLI commands for listing assets and getting asset content
- Refactor asset id to a generated uuid from param context id + asset name
- Add AssetSynchronizer and wire into StandardFlowService
- Protect against referencing assets from different context
- Implement syncing assets from the cluster coordinator
- Add system tests for replacing an asset and syncing assets
- Update flow diffing/synchronization to account for changes to referenced assets
- Change ParameterDTO from a list of String assetIds to a list of AssetReferenceDTOs
- Authorize node identities to read parameter contexts
- Add field to AssetDTO to indicate if content is missing
- Update synchronization to call createMissingAsset when content is missing
- Improve teardown of system tests to destroy environment if cluster is not in a good state
@markobean
Copy link
Contributor

Is this new feature accessible only via the API and not in the GUI?

@bbende
Copy link
Contributor Author

bbende commented Aug 9, 2024

@markobean at the moment yes, if someone picks up a UI JIRA to add it then it may be in the UI later on

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for building out these new features @bbende, the latest version looks good! +1 merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants