How to diagnose search results that don't match as expected? (Chinese search terms) #476

judychu opened this issue Sep 21, 2020 · 0 comments

First of all, I'm using the lunr-languages plugin, so the behavior might differ from plain lunr. All documents, the index, and the search terms are in Traditional Chinese.

I have successfully created an index with the expected tokens. Two documents have a `name` field containing the same term, but the search results include only document B, not document A. Here is an example:

Part of the index:

```
{
  "version": "2.3.9",
  "fields": ["name"],
  "fieldVectors": [
    // Assume only 2 documents added.
    ["name/roasted-chicken", [3, 12.04, 4, 12.04, 5, 12.04, 6, 12.04]],
    ["name/lemon-salt-chicken", [11, 10.923, 12, 10.923, 13, 10.923, 14, 10.923, 15, 10.923]]
  ],
  "invertedIndex": [
    // Some tokens are trimmed here
    ["烤雞", {
      "_index": 4,
      "name": { "roasted-chicken": { "position": [[2, 2]] } }
    }],
    ["雞", {
      "_index": 14,
      "name": { "lemon-salt-chicken": { "position": [[5, 1]] } }
    }]
  ],
  "pipeline": ["stemmer"]
}
```
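The excerpt can be decoded by hand to see what the `fieldVectors` numbers correspond to. This is a minimal sketch that assumes lunr 2.x's serialization (the excerpt reports version 2.3.9). Under that assumption, each entry is a `"field/ref"` key plus a flat array of alternating term indices and weights. Each term index is the `_index` stored on that term's entry in `invertedIndex`. The helper names `termsByIndex` and `decodeFieldVectors` are hypothetical. Only the two terms present in the excerpt can be named; the rest print as trimmed.

```javascript
// Excerpt of the serialized index from this issue (only the parts used here).
const serialized = {
  fieldVectors: [
    ["name/roasted-chicken", [3, 12.04, 4, 12.04, 5, 12.04, 6, 12.04]],
    ["name/lemon-salt-chicken", [11, 10.923, 12, 10.923, 13, 10.923, 14, 10.923, 15, 10.923]],
  ],
  invertedIndex: [
    ["烤雞", { _index: 4, name: { "roasted-chicken": { position: [[2, 2]] } } }],
    ["雞", { _index: 14, name: { "lemon-salt-chicken": { position: [[5, 1]] } } }],
  ],
};

// Hypothetical helper: map each term's _index back to the term string.
function termsByIndex(invertedIndex) {
  const byIndex = new Map();
  for (const [term, posting] of invertedIndex) {
    byIndex.set(posting._index, term);
  }
  return byIndex;
}

// Hypothetical helper: split each flat [termIndex, weight, ...] array into pairs.
function decodeFieldVectors(index) {
  const byIndex = termsByIndex(index.invertedIndex);
  const result = {};
  for (const [fieldRef, elements] of index.fieldVectors) {
    const pairs = [];
    for (let i = 0; i < elements.length; i += 2) {
      const termIndex = elements[i];
      pairs.push({
        termIndex,
        term: byIndex.get(termIndex) ?? "(trimmed from excerpt)",
        weight: elements[i + 1],
      });
    }
    result[fieldRef] = pairs;
  }
  return result;
}

const decoded = decodeFieldVectors(serialized);
for (const [fieldRef, pairs] of Object.entries(decoded)) {
  console.log(fieldRef, pairs.map((p) => `${p.termIndex}:${p.term}=${p.weight}`).join(" "));
}
```

On the excerpt, `name/roasted-chicken` contains term 4 (`烤雞`) but not term 14 (`雞`). `name/lemon-salt-chicken` does contain term 14.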

Document A (roasted-chicken) and document B (lemon-salt-chicken) both have a `name` field containing the Chinese character "雞" (chicken), but only document B is returned:

```
ref: "lemon-salt-chicken"
score: 10.923
matchData: { 雞: { name: { position: [[5, 1]] } } }
```
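This result is consistent with the index excerpt. The Chinese tokenizer appears to have stored `烤雞` (roast chicken) as one token for document A, while document B got a separate `雞` token. Assuming lunr 2.x matching rules, a plain query term has to equal a whole indexed term, and only wildcards or fuzzy matching relax this. So `雞` would not match `烤雞`. The sketch below compares that exact lookup with a substring scan over the same excerpt. `exactMatches` and `containingMatches` are hypothetical helpers, not lunr APIs.

```javascript
// Inverted-index excerpt from this issue: [term, posting] pairs.
const invertedIndex = [
  ["烤雞", { _index: 4, name: { "roasted-chicken": { position: [[2, 2]] } } }],
  ["雞", { _index: 14, name: { "lemon-salt-chicken": { position: [[5, 1]] } } }],
];

// Collect document refs from a posting across all fields, skipping _index.
function refsOf(posting) {
  const refs = new Set();
  for (const [field, docs] of Object.entries(posting)) {
    if (field === "_index") continue;
    for (const ref of Object.keys(docs)) refs.add(ref);
  }
  return refs;
}

// Assumed behaviour of a plain query term: match whole indexed terms only.
function exactMatches(index, query) {
  const entry = index.find(([term]) => term === query);
  return entry ? [...refsOf(entry[1])] : [];
}

// Diagnostic only: documents with any indexed term that contains the query.
function containingMatches(index, query) {
  const refs = new Set();
  for (const [term, posting] of index) {
    if (term.includes(query)) {
      for (const ref of refsOf(posting)) refs.add(ref);
    }
  }
  return [...refs];
}

console.log("exact:", exactMatches(invertedIndex, "雞"));
console.log("containing:", containingMatches(invertedIndex, "雞"));
```

On the excerpt, the exact lookup finds only `lemon-salt-chicken`, while the substring scan finds both documents. If document A should match, one untested option is lunr 2.x's wildcard query syntax, for example `index.search("*雞*")`.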

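And here is my code in Gatsby.

gatsby-node.js:

```javascript
const lunr = require("lunr");
require("lunr-languages/lunr.stemmer.support")(lunr);
require("lunr-languages/lunr.zh")(lunr);

// `documents` holds the recipe nodes, each with a `slug` and a `name`
const index = lunr(function () {
  this.use(lunr.zh); // Chinese tokenizer and pipeline from lunr-languages
  this.ref(`slug`);
  this.field(`name`, { boost: 10 });
  this.metadataWhitelist = ["position"]; // keep term positions in matchData
  for (const doc of documents) {
    this.add(doc);
  }
});
```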

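search.js:

```javascript
// `Index` is lunr.Index; `data.RecipeIndex` is the index serialized in gatsby-node.js
const index = Index.load(data.RecipeIndex);
let rawsearch = index.search(q); // `q` is the user's query string
```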

I know it's quite difficult to troubleshoot non-Latin languages, but I have just two questions:

  1. What do the numbers under `fieldVectors` in the index mean? Are they related to relevance?
  2. Any hints for finding out why document A is not returned? I guess it's related to a low score, but I don't know how to check that.
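The first question can be checked numerically. Assuming lunr 2.3.x's builder, each weight in a field vector is the BM25 score of that term in that field (`k1 = 1.2`, `b = 0.75`). That score is multiplied by the field boost (10 here) and any document boost, then rounded to three decimals. The sketch below reimplements the calculation; it is not lunr's own code. The corpus statistics are hypothetical values that the issue does not give; the term indices suggest the real index held more than two documents. With 4 documents, an average `name` length of 4 tokens, and each term occurring once in a single document, the calculation yields the 12.04 and 10.923 seen in the excerpt.

```javascript
// Hypothetical reimplementation of lunr 2.x's per-term field weight.
const K1 = 1.2;
const B = 0.75;

// Inverse document frequency in the form lunr 2.x uses.
function idf(documentCount, documentsWithTerm) {
  const x = (documentCount - documentsWithTerm + 0.5) / (documentsWithTerm + 0.5);
  return Math.log(1 + Math.abs(x));
}

// BM25 weight of one term in one field, times boosts, rounded to 3 decimals.
function termWeight({ tf, fieldLength, averageFieldLength, documentCount, documentsWithTerm, fieldBoost = 1, docBoost = 1 }) {
  const lengthNorm = 1 - B + B * (fieldLength / averageFieldLength);
  let score = idf(documentCount, documentsWithTerm) * ((K1 + 1) * tf) / (K1 * lengthNorm + tf);
  score *= fieldBoost;
  score *= docBoost;
  return Math.round(score * 1000) / 1000;
}

// Hypothetical corpus statistics (not stated in the issue).
const common = { tf: 1, averageFieldLength: 4, documentCount: 4, documentsWithTerm: 1, fieldBoost: 10 };

console.log(termWeight({ ...common, fieldLength: 4 })); // roasted-chicken: 4 tokens
console.log(termWeight({ ...common, fieldLength: 5 })); // lemon-salt-chicken: 5 tokens
```

Under these assumptions, the two weights differ only because `lemon-salt-chicken` has one more token, and BM25 slightly penalizes longer fields. The weights do feed relevance: for the one-word query `雞`, the reported `score` of 10.923 matches the stored weight of `雞` in `lemon-salt-chicken`. Assuming lunr 2.x's query logic, which has no score cutoff, every document containing a query term is returned. Document A's absence therefore points to tokenization rather than a low score.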

Any response would be much appreciated. Thanks!
