SpaCy Tutorial

Gateway to Natural Language Processing (NLP)

日本語はこちら

This tutorial will loosely follow spaCy's official tutorial.

Code

Introduction to spaCy
- Installation
- Tokenization
- Stop words
- Lemmatization
- Sentence Segmentation
- Part-of-speech (POS) tagger
- Named entity recognizer (NER)
- Syntactic dependency parser
Intermediate spaCy
- Word vectors
- Working with big dataset
- Pipelines
Advanced spaCy
- Using GPU
- Model training
- Transfer learning from BERT (text classifier)
- Annotation with Prodigy

NLP Frameworks

*** Based on my experience. Partially taken from PyText paper

When to use spaCy

End-to-end NLP analysis and machine learning
Preprocessing for downstream analysis and machine learning
Baseline for more complex custom models
The following tasks (The ones in bold are recommended tasks):
- ~~Automatic speech recognition~~
- Constituency parsing (partially supported)
- Coreference resolution (partially supported)
- Active learning annotation (thru Prodigy)
- Chunking (only noun phrase)
- ~~Crossmodal~~
- Data masking (possible with spaCy's models and Matcher)
- Dependency parsing
- ~~Dialogue~~
- Entity linking
- ~~Grammatical error correction~~
- Information extraction (possible with spaCy's models and Matcher)
- Intent Detection ~~and Slot Filling~~
- Language modeling (ULMFiT-like language model is experimental)
- Lemmatization
- ~~Lexical normalization~~
- ~~Machine translation~~
- ~~Missing elements~~
- ~~Multi-task learning~~
- ~~Multi-modal~~
- Named entity recognition
- Natural language inference (partially supported)
- Part-of-speech tagging
- Question answering (partially supported)
- ~~Relationship extraction~~
- Rule-based Matcher (you don't need a model for this :) )
- Semantic textual similarity
- ~~Semantic role labeling~~
- Sentiment analysis
- Sentence segmentation
- Stop words
- Tokenization (character, word, sub-word-level)
- ~~Summarization~~
- Text classification
- ~~Topic modeling~~
- Word Embedding (standard Word2Vec/GloVe, sense2vec, and contextualized)
- WordNet (partially supported)

SpaCyチュートリアル

自然言語処理（NLP）のゲートウェイ

このチュートリアルはspaCyの公式チュートリアルとMegagon Labsのスライド、オージス総研の記事を参考にしています。

コード

spaCy初級
- spaCyとGiNZAのインストール
- トークン化
- ストップワード
- 見出し語化
- 文単位分割
- 品詞（POS）タグ付け
- 固有表現抽出（NER）
- 依存構文解析のラベル付け
spaCy中級
- 単語ベクトル
- 大規模データ処理
- パイプライン処理
spaCy上級
- GPUの使用
- モデル学習
- 日本語BERTの転移学習（文書分類）
- Prodigyでのラベル付け

NLPのフレームワーク

*** 個人的な経験に基づく。PyTextの論文を一部抜粋

spaCyの使用例

自然言語の分析から機械学習まで全て
下流タスク（分析や機械学習）の前処理
より複雑なカスタムモデル用のベースライン
以下のタスク（太字が推奨タスク）:
- ~~Automatic speech recognition~~
- Constituency parsing (一部サポート)
- Coreference resolution (一部サポート)
- Active learning annotation (Prodigyで)
- Chunking (名詞句のみ)
- ~~Crossmodal~~
- Data masking (spaCyのモデルとMatcherで可能)
- Dependency parsing（依存構文解析のラベル付け）
- ~~Dialogue~~
- Entity linking
- ~~Grammatical error correction~~
- Information extraction (spaCyのモデルとMatcherで可能)
- Intent Detection ~~and Slot Filling~~
- Language modeling (ULMFiTのようなLMは試験的)
- Lemmatization（見出し語化）
- ~~Lexical normalization~~
- ~~Machine translation~~
- ~~Missing elements~~
- ~~Multi-task learning~~
- ~~Multi-modal~~
- Named entity recognition（固有表現抽出）
- Natural language inference (一部サポート)
- Part-of-speech tagging（品詞タグ付け）
- Question answering (一部サポート)
- ~~Relationship extraction~~
- Rule-based Matcher（ルールベースのマッチング） (モデル不要)
- Semantic textual similarity
- ~~Semantic role labeling~~
- Sentiment analysis
- Sentence segmentation
- Stop words（ストップワード）
- Tokenization（トークン化） (character, word, sub-word-level)
- ~~Summarization~~
- Text classification（文章分類）
- ~~Topic modeling~~
- Word Embedding (通常のWord2VecやGloVe、sense2vec、文脈を考慮した単語ベクトルなど)
- WordNet (一部サポート)

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
data		data
img		img
01_intro_to_spacy.ipynb		01_intro_to_spacy.ipynb
02_intermediate_spacy.ipynb		02_intermediate_spacy.ipynb
03_advanced_spacy.ipynb		03_advanced_spacy.ipynb
README.md		README.md
ja_01_intro_to_spacy.ipynb		ja_01_intro_to_spacy.ipynb
ja_02_intermediate_spacy.ipynb		ja_02_intermediate_spacy.ipynb
ja_03_advanced_spacy.ipynb		ja_03_advanced_spacy.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SpaCy Tutorial

Gateway to Natural Language Processing (NLP)

Code

NLP Frameworks

When to use spaCy

SpaCyチュートリアル

自然言語処理（NLP）のゲートウェイ

コード

NLPのフレームワーク

spaCyの使用例

About

Releases

Packages

Languages

yuibi/spacy_tutorial

Folders and files

Latest commit

History

Repository files navigation

SpaCy Tutorial

Gateway to Natural Language Processing (NLP)

Code

NLP Frameworks

When to use spaCy

SpaCyチュートリアル

自然言語処理（NLP）のゲートウェイ

コード

NLPのフレームワーク

spaCyの使用例

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages