Separate only the endings from predicates, excluding pre-final endings #163

Open
bab2min opened this issue Apr 24, 2024 · 1 comment
Labels
Feature addition · Good for contributing (an easy, good issue for a first contribution)

Comments

Owner

bab2min commented Apr 24, 2024

This feature was requested in bab2min/kiwipiepy#165.

bab2min added the "Feature addition" and "Good for contributing" labels Apr 24, 2024
Owner Author

bab2min commented Apr 24, 2024

This only requires adding a new bit flag, joinEp, to enum Match and adding the joining logic to the function below.

Kiwi/src/Kiwi.cpp

Lines 533 to 599 in 37bfa60

template<class TokenInfoIt>
TokenInfoIt joinAffixTokens(TokenInfoIt first, TokenInfoIt last, Match matchOptions)
{
    if (!(matchOptions & (Match::joinNounPrefix | Match::joinNounSuffix | Match::joinVerbSuffix | Match::joinAdjSuffix | Match::joinAdvSuffix))) return last;
    if (std::distance(first, last) < 2) return last;
    auto next = first;
    ++next;
    while (next != last)
    {
        TokenInfo& current = *first;
        TokenInfo& nextToken = *next;
        // XPN + (NN. | SN) => (NN. | SN)
        if (!!(matchOptions & Match::joinNounPrefix)
            && current.tag == POSTag::xpn
            && ((POSTag::nng <= nextToken.tag && nextToken.tag <= POSTag::nnb) || nextToken.tag == POSTag::sn)
        )
        {
            concatTokens(current, nextToken, nextToken.tag);
            ++next;
        }
        // (NN. | SN) + XSN => (NN. | SN)
        else if (!!(matchOptions & Match::joinNounSuffix)
            && nextToken.tag == POSTag::xsn
            && ((POSTag::nng <= current.tag && current.tag <= POSTag::nnb) || current.tag == POSTag::sn)
        )
        {
            concatTokens(current, nextToken, current.tag);
            ++next;
        }
        // (NN. | XR) + XSV => VV
        else if (!!(matchOptions & Match::joinVerbSuffix)
            && clearIrregular(nextToken.tag) == POSTag::xsv
            && ((POSTag::nng <= current.tag && current.tag <= POSTag::nnb) || current.tag == POSTag::xr)
        )
        {
            concatTokens(current, nextToken, setIrregular(POSTag::vv, isIrregular(nextToken.tag)));
            ++next;
        }
        // (NN. | XR) + XSA => VA
        else if (!!(matchOptions & Match::joinAdjSuffix)
            && clearIrregular(nextToken.tag) == POSTag::xsa
            && ((POSTag::nng <= current.tag && current.tag <= POSTag::nnb) || current.tag == POSTag::xr)
        )
        {
            concatTokens(current, nextToken, setIrregular(POSTag::va, isIrregular(nextToken.tag)));
            ++next;
        }
        // (NN. | XR) + XSM => MAG
        else if (!!(matchOptions & Match::joinAdvSuffix)
            && nextToken.tag == POSTag::xsm
            && ((POSTag::nng <= current.tag && current.tag <= POSTag::nnb) || current.tag == POSTag::xr)
        )
        {
            concatTokens(current, nextToken, POSTag::mag);
            ++next;
        }
        else
        {
            ++first;
            if (first != next) *first = std::move(*next);
            ++next;
        }
    }
    return ++first;
}
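
For anyone picking this up, here is a rough, self-contained sketch of what the joining rule might look like. It is not Kiwi's actual code: `Tag`, `Token`, `hasFlag`, `isPredicate`, `joinEpTokens`, and `joinEp` are hypothetical stand-ins, the bit value chosen for `Match::joinEp` is arbitrary, and the rule itself (predicate stem + EP keeps the stem's tag) is my reading of the issue title rather than a confirmed design. It reuses the same in-place compaction pattern as `joinAffixTokens` above.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily reduced stand-ins for Kiwi's POSTag and TokenInfo.
enum class Tag { nng, vv, va, vx, ep, ef, ec };

struct Token
{
    std::string form;
    Tag tag;
};

// Hypothetical flag set; in Kiwi, joinEp would be one more bit in enum Match.
enum class Match : uint32_t { none = 0, joinEp = 1u << 0 };

inline bool hasFlag(Match options, Match flag)
{
    return (static_cast<uint32_t>(options) & static_cast<uint32_t>(flag)) != 0;
}

// Assumption: "predicate" here means a verb, adjective, or auxiliary stem.
inline bool isPredicate(Tag t)
{
    return t == Tag::vv || t == Tag::va || t == Tag::vx;
}

// Merges each pre-final ending (EP) into the predicate stem before it,
// compacting the range in place and returning the new logical end.
template<class It>
It joinEpTokens(It first, It last, Match options)
{
    if (!hasFlag(options, Match::joinEp)) return last;
    if (std::distance(first, last) < 2) return last;
    auto next = first;
    ++next;
    while (next != last)
    {
        // (VV | VA | VX) + EP => tag of the stem is kept
        if (isPredicate(first->tag) && next->tag == Tag::ep)
        {
            // Naive concatenation; contracted surface forms are not handled.
            first->form += next->form;
            ++next;
        }
        else
        {
            ++first;
            if (first != next) *first = std::move(*next);
            ++next;
        }
    }
    return ++first;
}

// Convenience wrapper: applies the rule and trims the leftover tail.
inline std::vector<Token> joinEp(std::vector<Token> tokens, Match options)
{
    tokens.erase(joinEpTokens(tokens.begin(), tokens.end(), options), tokens.end());
    return tokens;
}
```

With this sketch, passing 먹/VV + 었/EP + 다/EF through `joinEp(tokens, Match::joinEp)` leaves two tokens, the merged stem tagged VV and the final ending 다/EF. Because the merged token keeps the stem's tag, consecutive pre-final endings are absorbed one after another. In the real function the merge would presumably go through `concatTokens`, and irregular-conjugation tags would need the same care as in the XSV and XSA branches.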
