adding a page for mapping charechter filters
leanneeliatra committed Sep 20, 2024
1 parent 9230b00 commit fdb2ddd
Showing 1 changed file with 89 additions and 0 deletions.
89 changes: 89 additions & 0 deletions _analyzers/character-filters/mapping-character-filter.md
@@ -0,0 +1,89 @@
---
layout: default
title: Mapping Character Filter
parent: Character Filters
nav_order: 95
---

# Mapping character filter

The `mapping` character filter accepts a map of key-value pairs for character replacement. Whenever the filter encounters a string of characters matching a key, it replaces them with the corresponding value.

Matching is greedy: when more than one key matches at the same position, the longest match wins. A replacement value can also be an empty string, which removes the matched characters.

Because character filters run before the tokenizer, the mapping character filter is useful when specific text replacements must happen before tokenization.
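
To make the greedy matching and empty-replacement rules concrete, the following request is a minimal sketch. The mappings and the sample text are made up for illustration and are not part of the original page. Both `:)` and `:))` match at the same position, so the longer key should win, and the `-` key maps to an empty string, which removes hyphens:

```json
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        ":) => _happy_",
        ":)) => _very_happy_",
        "- => "
      ]
    }
  ],
  "text": "Call 555-1234 :))"
}
```

Assuming the rules behave as described, the returned text should be `Call 5551234 _very_happy_`.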

## Example of the mapping filter

The following example demonstrates a mapping filter that converts the Roman numerals I through V into the corresponding Arabic numerals (1 through 5):

```
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "I => 1",
        "II => 2",
        "III => 3",
        "IV => 4",
        "V => 5"
      ]
    }
  ],
  "text": "I have III apples and IV oranges"
}
```
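The filter produces the following text. The standalone `I` at the start of the sentence is also replaced because the filter matches character sequences, not whole words:

```
1 have 3 apples and 4 oranges
```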

## Configuring the mapping filter

There are two ways to configure the mappings:

1. `mappings`: Provide an array of key-value pairs in the form `key => value`. Every occurrence of a key in the input text is replaced with the corresponding value.
2. `mappings_path`: Specify the path to a UTF-8 encoded file containing key-value mappings. Each mapping must be on its own line in the format `key => value`. The path can be absolute or relative to the OpenSearch configuration directory.
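
As a sketch of the `mappings_path` option, the following request defines a mapping character filter that loads its rules from a file. The index name `numerals-index`, the filter name `roman_numeral_filter`, and the file path `analysis/roman_numerals.txt` are hypothetical. The file is assumed to exist under the OpenSearch configuration directory on each node and to contain one rule per line, such as `I => 1` and `II => 2`:

```json
PUT /numerals-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "roman_numeral_filter": {
          "type": "mapping",
          "mappings_path": "analysis/roman_numerals.txt"
        }
      }
    }
  }
}
```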

### Using a custom mapping character filter

You can create a custom mapping character filter by defining your own set of mappings. The following example creates a custom character filter that expands common abbreviations in text:

```
PUT /text-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_abbr_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "custom_abbr_filter"
          ]
        }
      },
      "char_filter": {
        "custom_abbr_filter": {
          "type": "mapping",
          "mappings": [
            "BTW => By the way",
            "IDK => I don't know",
            "FYI => For your information"
          ]
        }
      }
    }
  }
}
```
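To test the character filter, send the following request. It references `custom_abbr_filter` by name and uses the `keyword` tokenizer so that the filtered text is returned as a single token:

```
GET /text-index/_analyze
{
  "tokenizer": "keyword",
  "char_filter": [ "custom_abbr_filter" ],
  "text": "FYI, updates to the workout schedule are posted. IDK when it takes effect, but we have some details. BTW, the finalized schedule will be released Monday."
}
```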
This filter will produce the following text:
```
For your information, updates to the workout schedule are posted. I don't know when it takes effect, but we have some details. By the way, the finalized schedule will be released Monday.
```
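
The preceding request exercises only the character filter. To run the full `custom_abbr_analyzer` instead, you can reference the analyzer by name, as in the following sketch (the sample text is made up for illustration):

```json
GET /text-index/_analyze
{
  "analyzer": "custom_abbr_analyzer",
  "text": "BTW, see you Monday"
}
```

Because the character filter runs before the `standard` tokenizer, the response should contain the tokens `By`, `the`, `way`, `see`, `you`, and `Monday`.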
