Add document indexing section in deployer #7192 #1866

Open
wants to merge 5 commits into
base: master

alhambrav

Ticket reference or full description of what's in the PR

Add document indexing section in deployer craftercms/craftercms#7192

- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/vnd.ms-excel
- application/vnd.ms-powerpoint
- application/vnd.openxmlformats-officedocument.presentationml.presentation

How do you add to this list?

source/reference/modules/deployer/index.rst (outdated)
^^^^^^^^^^^^^^^^^
Document Indexing
^^^^^^^^^^^^^^^^^
Crafter Deployer indexes documents via the following configured items:

CrafterCMS indexes content items as follows:

- Indexing of jacketed documents with anything that matches the configured pattern.

Note that indexing of documents in authoring and indexing of documents in delivery each have their own configuration.

Additionally, CrafterCMS indexes non-text-based content as follows:
[add notes on how we index filenames and meta-data where possible]

"""""""""
Mimetypes
"""""""""
The list of supported mimetypes determines what's considered a document that should be full-text indexed.

The following is the default list of MIME-types with full-text-search indexing enabled.
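
As a rough illustration of how a list of supported MIME types can decide whether a file is treated as a full-text-indexed document, here is a minimal Python sketch. The set holds only the four MIME types visible in this PR's diff excerpt, not the complete default list, and `is_full_text_indexed` is a hypothetical helper rather than part of Crafter Deployer.

```python
# Only the MIME types visible in this PR's diff excerpt; the real default
# list is longer and is not reproduced here.
SUPPORTED_MIME_TYPES = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.ms-excel",
    "application/vnd.ms-powerpoint",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def is_full_text_indexed(mime_type, supported=SUPPORTED_MIME_TYPES):
    """Hypothetical helper: True when the MIME type is in the supported set."""
    # Drop parameters such as "; charset=utf-8" and normalize case
    # before comparing against the configured set.
    base_type = mime_type.split(";", 1)[0].strip().lower()
    return base_type in supported
```

Under this model, adding a type to the list is just adding one more entry to the set; how that is actually done in the Deployer configuration is what the documentation still needs to spell out.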

""""""""""""""""""""""""
Remote Documents Pattern
""""""""""""""""""""""""
The ``remoteBinaryPathPatterns`` in the config determines what a remote document is via path pattern.

What config? Name it up here before the example.
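
To make the idea of classifying a remote document by path pattern concrete, here is a small Python sketch. It assumes the patterns are regular expressions matched against the content path; the pattern values and the `matches_any` helper are hypothetical and are not taken from the Deployer's defaults.

```python
import re

# Hypothetical patterns, for illustration only; not the Deployer's defaults.
REMOTE_BINARY_PATH_PATTERNS = [
    r"^/static-assets/remote/.*\.pdf$",
    r"^/remote-assets/.*$",
]


def matches_any(path, patterns):
    """Return True when the path matches at least one regex pattern."""
    return any(re.search(pattern, path) for pattern in patterns)
```

For example, `matches_any("/static-assets/remote/guide.pdf", REMOTE_BINARY_PATH_PATTERNS)` is true, while a path such as `/site/website/index.xml` matches neither pattern.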

""""""""""""""
Jacket Pattern
""""""""""""""
The ``metadataPathPatterns`` in the configuration determines if a document should be indexed with the metadata of the

Define the "configuration" filename (as above).

- The Deployer skips the document and leaves an error in the logs.
- The document is not re-processed unless it is updated/re-deployed or the re-process API is called.

The default behavior when a document cannot be indexed is that the Deployer logs the error and moves on. Process commits

This repeats the bullets above. We should pick one.
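
Whichever wording is kept, the skip-and-log behavior it describes can be sketched in Python as follows. This is only an illustration of the pattern, not the Deployer's implementation: `index_documents` and the `index_one` callback are hypothetical names.

```python
import logging

logger = logging.getLogger("indexing_sketch")


def index_documents(paths, index_one):
    """Index each path, logging and skipping any document that fails.

    `index_one` is a hypothetical callback that indexes a single document
    and raises an exception when it cannot.
    """
    indexed, skipped = [], []
    for path in paths:
        try:
            index_one(path)
            indexed.append(path)
        except Exception:
            # Log the error with its traceback and move on to the next document.
            logger.exception("Failed to index %s; skipping", path)
            skipped.append(path)
    return indexed, skipped
```

A skipped document is not retried within the same run, which mirrors the statement that it is only re-processed after an update, a re-deploy, or an explicit re-process request.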


If the deployment as a whole cannot be completed due to a catastrophic exception, then all content including documents
will be re-processed until the deployment succeeds. By default the Git Diff process is configured to update the processed
commits regardless of success or failure. Some deployments set this to false and force the processor chain to be

This repeats the paragraphs above.

The paragraphs on the different configurations should be merged into a single paragraph that describes this approach and details its pros and cons.

- The Deployer skips the document and leaves an error in the logs.
- The document is not re-processed unless it is updated/re-deployed or the re-process API is called.

The default behavior when a document cannot be indexed is that the Deployer logs the error and moves on. Process commits

Should it be "Processed" commits files ... ?

- Indexing of any remote document that matches the configured list of remote documents pattern
- Indexing of jacketed documents with anything that matches the configured pattern.

Note that indexing of documents in authoring and indexing of documents in delivery each have their own configuration.

Indexing is done differently in the authoring environment vs the delivery environment.

Authoring indexing is done to help content authors do their work and is controlled by CrafterCMS. The authoring search index is tuned to help authors and is not used by the project/site for delivery concerns.

On the other hand, delivery indexing is done to enable search and search-based features for the delivery project/site. This is configurable per project/site, and the index is tuned to help end-users use the project/site.

@alhambrav @sumerjabri Should we clarify here that we actually have three different indices: authoring, preview, and delivery?
