REPO-5191 Document how to create a custom metadata extractor (#1826)
Linked to bug fix found while writing these docs:
    Alfresco/alfresco-community-repo#227
    Alfresco/alfresco-transform-core#316
[skip ci] .md files only
alandavis committed Jan 6, 2021
1 parent 361a543 commit 502653b
Showing 5 changed files with 434 additions and 17 deletions.
19 changes: 17 additions & 2 deletions CHANGELOG.md
@@ -4,8 +4,23 @@
New Features
</h2>

<h2> Libraries
</h2>
<ul>
<li>Metadata Extract

The out of the box extraction of metadata is now generally performed asynchronously via a T-Engine connected to the
repository either as part of the Alfresco Transform Service or as a Local transformer. This provides better security,
scalability and reliability. The framework used for metadata extraction within the content repository remains,
allowing custom extractors / embedders of metadata to still function, as long as they don't extend the extractors
that have been removed. Ideally such custom code should be gradually moved into a T-Engine. For more information see
[Metadata Extractors](https://github.com/Alfresco/acs-packaging/blob/master/docs/metadata-extract-embed.md).
<li>Removal of Legacy transformers

In ACS 6, the Alfresco Transform Service and Local transformers were introduced to help offload the transformation
of content to a separate process. In ACS 7, the out of the box Legacy transformers and transformation framework have
been removed. This helps provide greater clarity around installation and administration of transformations and
technically a more scalable, reliable and secure environment.
<li>Removal of 3rd party libraries

With the offloading of both transforms and metadata extraction to T-Engines a number of 3rd party libraries
are no longer needed within the content repository. They do still exist within the T-Engines performing the
same tasks. Any AMPs that were making use of these will need to provide these libraries themselves. This will
90 changes: 90 additions & 0 deletions docs/build_and_release.md
@@ -0,0 +1,90 @@
# Build
The `acs-packaging` project uses _Travis CI_. \
The `.travis.yml` config file can be found in the root of the repository.


## Stages
Although a little unusual, builds of branches other than `master` and `release` branches are preceded by
cloning and building branches with the same name in upstream projects: `alfresco-community-repo` and
`alfresco-enterprise-repo`. This is done so that development can take place in parallel between projects
without having to do development releases to link them together. In fact, you don't even have to wait for
the upstream Travis build to complete. You just need to make sure the parent poms reference
each other and the `dependency.alfresco-community-repo.version` and
`dependency.alfresco-enterprise-repo.version` use the same SNAPSHOT values.

1. **test**: Java build with unit tests, integration tests and WhiteSource scan.
2. **docker_latest**:
3. **release**:
4. **publish**: Artifact deployment to AWS Release bucket.


ALL BELOW IS COPIED FROM ANOTHER PROJECT

## Branches
Travis CI builds differ by branch:
* `master` / `SP/*` / `HF/*` branches:
  - regular builds which include the _Build_ stage;
    > On the `master` branch only the _Build_ stage updates the `latest` T-Engines images on
    > both Quay and DockerHub:
    > - alfresco/alfresco-pdf-renderer
    > - alfresco/alfresco-imagemagick
    > - alfresco/alfresco-tika
    > - alfresco/alfresco-libreoffice
    > - alfresco/alfresco-transform-misc
    > - alfresco/alfresco-transform-core-aio
  - if the commit message contains the `[trigger release]` tag, the builds will also
    include the _Release_ stage;
  - PR builds where the latest commit contains the `[trigger release]` tag will execute dry runs
    of the release jobs (no artifacts will be published until the PR is actually merged).
* `ATS-*` branches:
  - regular builds which include only the _Build_ and _Tests_ stages;
* `company_release` branch:
  - builds that include the _Company Release_ stage only.
  - the `company_release` branch should be used for one-off events; once used (a build
    completes), the branch should be deleted.

All other branches are ignored.
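
The branch rules above can be summarised as a small lookup. The following is only an illustrative sketch: the class and method names (`BranchStagesSketch`, `stagesFor`) are hypothetical, the real logic lives in `.travis.yml`, and the dry-run behaviour of PR builds is not modelled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: mirrors the branch rules listed above, not the actual .travis.yml.
class BranchStagesSketch {
    static final String RELEASE_TAG = "[trigger release]";

    // Returns the stages a build of the given branch would include, per the list above.
    static List<String> stagesFor(String branch, String commitMessage) {
        List<String> stages = new ArrayList<>();
        if (branch.equals("master") || branch.startsWith("SP/") || branch.startsWith("HF/")) {
            stages.add("Build");
            // The tag may appear anywhere in the commit message.
            if (commitMessage.contains(RELEASE_TAG)) {
                stages.add("Release");
            }
        } else if (branch.startsWith("ATS-")) {
            stages.add("Build");
            stages.add("Tests");
        } else if (branch.equals("company_release")) {
            stages.add("Company Release");
        }
        // An empty list means the branch is ignored.
        return Collections.unmodifiableList(stages);
    }

    public static void main(String[] args) {
        System.out.println(stagesFor("master", "ATS-###: Release [trigger release]")); // [Build, Release]
        System.out.println(stagesFor("ATS-###_release_version", "Bump version"));     // [Build, Tests]
        System.out.println(stagesFor("some-other-branch", "Work in progress"));       // []
    }
}
```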


## Release process steps & info
Prerequisites:
- the `master` / `SP/*` / `HF/*` branch is green and it contains all the changes that should be
included in the next release.

Steps:
1. Create a new branch with the name `ATS-###_release_version` from the `master` / `SP/*`/ `HF/*`
   branch.
2. Update the project version if the current POM version is not the next desired release; use a
   maven command, i.e.
   ```bash
   mvn versions:set -DnewVersion=#.##.#-SNAPSHOT versions:commit
   ```
3. Update the project's dependencies (remove the `-SNAPSHOT` suffixes - only for dependencies, not
   for the local project version).
4. Create a new commit with the `[trigger release]` tag in its message. If no local changes have
   been generated by steps (2) and (3), then an empty commit should be created - e.g.
   ```bash
   git commit --allow-empty -m "ATS-###: Release T-Core (T-Engines) #.##.# [trigger release]"
   ```

   > The location of the `[trigger release]` tag in the commit message is irrelevant.

   > If for any reason your PR contains multiple commits, the commit with the `[trigger release]`
   > tag should be the last (newest) one. This will trigger the Release dry runs.
5. Open a new Pull Request from the `ATS-###_release_version` branch into the original
   `master` / `SP/*` / `HF/*` branch and wait for a green build; the **Release** stage on the PR build
   will only execute a _Dry_Run_ of the release.
6. Once it is approved, merge the PR, preferably through the **Rebase and merge** option. If the
   **Create a merge commit** (_Merge pull request_) or **Squash and merge** options are used, you
   need to ensure that the _commit message_ contains the `[trigger release]` tag (sub-string).

## Company Release process steps & info
Prerequisites:
- The **Release** stage is complete - i.e. the release commit is tagged and the release
artifacts are deployed on Nexus.

Steps:
1. Create a new `company_release` branch from the `master` / `SP/*`/ `HF/*` branch. This job uses
the latest branch git tag to identify the version that must be uploaded to the S3 release bucket.
2. Wait for a green build on the branch.
3. Delete the local and remote `company_release` branches.
21 changes: 14 additions & 7 deletions docs/creating-a-t-engine.md
@@ -19,6 +19,8 @@ technologies:
* Maven
* Docker

Custom T-Engines may also be used to extract or embed Metadata, as they are just a specialist form of transform.
For more information see [metadata-extract-embed.md](metadata-extract-embed.md).

## Developing and Debugging T-Engines

@@ -122,7 +124,7 @@ as the one used by the [Tika T-Engine](https://github.com/Alfresco/alfresco-tran

### The Controller Class

T-Engines generally extend an AbstractTransformerController and provide
T-Engines generally extend the AbstractTransformerController and provide
implementations of the following methods. You may find the
[HelloWorldController.java](https://github.com/Alfresco/alfresco-helloworld-transformer/blob/master/alfresco-helloworld-transformer-engine/src/main/java/org/alfresco/transformer/HelloWorldController.java)
example and the alfresco-transformer-base [README](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-transformer-base/README.md) useful.
@@ -135,12 +137,6 @@ and requests from the Transform Service via a message queue.
@Override
public void transformImpl(String transformName, String sourceMimetype, String targetMimetype,
Map<String, String> transformOptions, File sourceFile, File targetFile)

@PostMapping(value = "/transform", consumes = MULTIPART_FORM_DATA_VALUE)
public ResponseEntity<Resource> transform(HttpServletRequest request,
@RequestParam("file") MultipartFile sourceMultipartFile,
@RequestParam(value = "targetExtension") String targetExtension,
@RequestParam(value = "language") String language)
```


@@ -154,6 +150,17 @@ Method parameters:
* **sourceFile** the source as a file
* **targetFile** the target as a file

The helloworld example does all the actual transform processing in this method for simplicity, but if you look at
the core T-Engines, you will see they offload the actual work to a class which implements the `Transform`
interface. It has a `transform` method with identical parameters. This provides a better separation of
responsibilities, and the ability to combine transformers.
~~~
default void transform(String transformName, String sourceMimetype, String targetMimetype,
Map<String, String> transformOptions,
File sourceFile, File targetFile) throws Exception {
}
~~~
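
As a rough illustration of that separation, the sketch below shows a controller-style class whose `transformImpl` simply delegates to a `Transform` implementation. It is not taken from the core T-Engines: the `Transform` interface is re-declared locally (with the parameters shown above) so the example compiles on its own, and `UpperCaseTransform` and `DelegatingControllerSketch` are hypothetical names. A real controller would extend `AbstractTransformerController` instead.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

// Local stand-in for the Transform interface described above (same parameters),
// declared here only so the sketch compiles without alfresco-transformer-base.
interface Transform {
    default void transform(String transformName, String sourceMimetype, String targetMimetype,
                           Map<String, String> transformOptions,
                           File sourceFile, File targetFile) throws Exception {
    }
}

// Hypothetical transformer: writes the source text to the target in upper case.
class UpperCaseTransform implements Transform {
    @Override
    public void transform(String transformName, String sourceMimetype, String targetMimetype,
                          Map<String, String> transformOptions,
                          File sourceFile, File targetFile) throws IOException {
        String text = new String(Files.readAllBytes(sourceFile.toPath()), StandardCharsets.UTF_8);
        Files.write(targetFile.toPath(), text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
    }
}

// Hypothetical controller-like class; it only shows the delegation, nothing else.
class DelegatingControllerSketch {
    private final Transform delegate = new UpperCaseTransform();

    public void transformImpl(String transformName, String sourceMimetype, String targetMimetype,
                              Map<String, String> transformOptions, File sourceFile, File targetFile) {
        try {
            // All the real work happens in the Transform implementation.
            delegate.transform(transformName, sourceMimetype, targetMimetype,
                    transformOptions, sourceFile, targetFile);
        } catch (Exception e) {
            throw new IllegalStateException("Transform failed: " + transformName, e);
        }
    }

    public static void main(String[] args) throws IOException {
        File source = File.createTempFile("source", ".txt");
        File target = File.createTempFile("target", ".txt");
        Files.write(source.toPath(), "hello world".getBytes(StandardCharsets.UTF_8));

        new DelegatingControllerSketch().transformImpl("uppercase", "text/plain", "text/plain",
                Collections.emptyMap(), source, target);

        // prints "HELLO WORLD"
        System.out.println(new String(Files.readAllBytes(target.toPath()), StandardCharsets.UTF_8));
    }
}
```

Keeping the work in a separate class means the same `Transform` can be reused from more than one controller, or chained with others, without touching the HTTP or queue handling.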

##### getProbeTestTransform

```java