Go port of kuroshiro, a Japanese language library for converting Japanese sentence to Hiragana, Katakana or Romaji with furigana and okurigana modes supported.

chanyeinthaw/kuroshiro.go

kuroshiro.go

kuroshiro.go is a Go port of kuroshiro, a Japanese language library for converting Japanese sentences to Hiragana, Katakana or Romaji, with furigana and okurigana modes supported.

Features

  • Japanese Sentence => Hiragana, Katakana or Romaji
  • Furigana and okurigana supported
  • Multiple romanization systems supported
  • Useful Japanese utils

Prerequisites

kuroshiro.go uses MeCab internally. For installation instructions, see the official MeCab website.

You need to tell Go where MeCab has been installed.

$ export CGO_LDFLAGS="-L/path/to/lib -lmecab -lstdc++"
$ export CGO_CFLAGS="-I/path/to/include"

If mecab-config is installed, run the following commands instead.

$ export CGO_LDFLAGS="`mecab-config --libs`"
$ export CGO_CFLAGS="-I`mecab-config --inc-dir`"
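If you would rather generate those two export lines from a script, the following is a minimal sketch in Go. It is not part of kuroshiro.go; the helper names cgoExports and mecabConfig are hypothetical, and the only external assumption is that mecab-config accepts the --libs and --inc-dir flags shown above. Note that %q applies Go string quoting, which matches shell double quoting only for ordinary paths without special characters.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// cgoExports (hypothetical helper) turns raw mecab-config output into the
// two export lines shown in the README. Trailing newlines are trimmed.
func cgoExports(libs, incDir string) []string {
	return []string{
		fmt.Sprintf("export CGO_LDFLAGS=%q", strings.TrimSpace(libs)),
		fmt.Sprintf("export CGO_CFLAGS=%q", "-I"+strings.TrimSpace(incDir)),
	}
}

// mecabConfig (hypothetical helper) runs mecab-config with a single flag
// and returns its standard output.
func mecabConfig(flag string) (string, error) {
	out, err := exec.Command("mecab-config", flag).Output()
	return string(out), err
}

func main() {
	// Fall back to the manual instructions when mecab-config is missing.
	if _, err := exec.LookPath("mecab-config"); err != nil {
		fmt.Println("mecab-config not found on PATH; set CGO_LDFLAGS and CGO_CFLAGS by hand")
		return
	}
	libs, err := mecabConfig("--libs")
	if err != nil {
		fmt.Println("mecab-config --libs failed:", err)
		return
	}
	incDir, err := mecabConfig("--inc-dir")
	if err != nil {
		fmt.Println("mecab-config --inc-dir failed:", err)
		return
	}
	for _, line := range cgoExports(libs, incDir) {
		fmt.Println(line)
	}
}
```

The printed lines can be pasted into a shell before running go get or go build.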

Usage

Install with go get

$ go get github.com/chanyeinthaw/kuroshiro.go

Then import the package and convert a sentence:

package main

import (
	"fmt"

	"github.com/chanyeinthaw/kuroshiro.go"
	"github.com/chanyeinthaw/kuroshiro.go/analyzer"
)

const input = "感じ取れたら手を繋ごう、重なるのは人生のライン and レミリア最高!"

func main() {
	// Create the MeCab-backed analyzer. Check the error before deferring
	// cleanup, so Destroy is never called on a failed analyzer.
	mecab, err := analyzer.NewMecab()
	if err != nil {
		panic(err)
	}
	defer mecab.Destroy()

	ks := kuroshiro.New(mecab)

	// Convert to hiragana in spaced mode.
	opts := kuroshiro.NewOptions().ConvertTo(kuroshiro.HIRAGANA).SetMode(kuroshiro.SPACED)
	result, err := ks.Convert(input, opts)
	if err != nil {
		panic(err)
	}

	fmt.Println(result)
}

Romanization System

kuroshiro supports three romanization systems:

nippon: Nippon-shiki romanization. Refer to ISO 3602 Strict.

passport: Passport-shiki romanization. Refer to Japanese romanization table published by Ministry of Foreign Affairs of Japan.

hepburn: Hepburn romanization. Refer to BS 4812 : 1972.

There is a useful webpage for you to check the difference between these romanization systems.

Notice for Romaji Conversion

Furigana lacks information on pronunciation, so it is impossible to convert furigana directly to romaji fully automatically (see なぜ フリガナでは ダメなのか?).

For this reason, kuroshiro does not handle chōon (long vowels) when converting furigana (kana) directly to romaji, under any romanization system. The only exception is the chōonpu (ー), which is handled.

For example, converting the kana "こうし" to romaji gives "kousi", "koushi" and "koushi" with the nippon, passport and hepburn romanization systems respectively.

The kanji -> romaji conversion with/without furigana mode is unaffected by this logic.
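To see why the results look like this, here is a minimal sketch of kana-by-kana romanization with no long-vowel handling. It does not use kuroshiro.go and is not the library's implementation; the kanaTable variable and the naiveRomaji function are hypothetical names, and the table covers only the three kana in the example.

```go
package main

import (
	"fmt"
	"strings"
)

// kanaTable holds per-kana spellings for the three systems. It is an
// illustrative table limited to こ, う and し, not the library's data.
var kanaTable = map[string]map[rune]string{
	"nippon":   {'こ': "ko", 'う': "u", 'し': "si"},
	"passport": {'こ': "ko", 'う': "u", 'し': "shi"},
	"hepburn":  {'こ': "ko", 'う': "u", 'し': "shi"},
}

// naiveRomaji maps each kana independently, so "こう" becomes "kou"
// rather than a long-vowel form. Unknown characters pass through unchanged.
func naiveRomaji(kana, system string) string {
	table := kanaTable[system]
	var b strings.Builder
	for _, r := range kana {
		if s, ok := table[r]; ok {
			b.WriteString(s)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	for _, system := range []string{"nippon", "passport", "hepburn"} {
		fmt.Printf("%s: %s\n", system, naiveRomaji("こうし", system))
	}
}
```

Because each kana is spelled on its own, う always comes out as "u", and the three systems differ only in how they spell し.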
