When analyzing text, three major perspectives are usually considered.
- The first is quantitative measures, including the number of words or characters, the number of numeric values, word frequencies, and so on.
- The second is text readability, an important factor to consider, especially in quantitative analysis studies. Popular readability metrics include Flesch-Kincaid Grade Level, Flesch Reading Ease, Gunning Fog Index, Dale-Chall Readability, Automated Readability Index (ARI), Coleman-Liau Index, Linsear Write, and SMOG. The Python package py-readability-metrics can compute the scores for all of these metrics.
- The third is content analysis of the text, for which latent Dirichlet allocation (LDA) is a popular topic modeling method nowadays.
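
As a rough illustration of the first two perspectives, here is a minimal sketch that uses only the Python standard library. The function names `quantitative_measures` and `automated_readability_index`, the regex tokenization, and the sentence-splitting heuristic are all assumptions made for this example. The ARI formula is the standard one, but a package such as py-readability-metrics may tokenize differently and so report a different score.

```python
import re
from collections import Counter


def quantitative_measures(text):
    """Count characters, words, numbers and sentences, and tally word frequencies."""
    # Assumed tokenization: a token is a number (e.g. 2 or 2.5) or a run of letters.
    # Contractions such as "don't" are split into two tokens by this rule.
    tokens = re.findall(r"\d+(?:\.\d+)?|[a-z]+", text.lower())
    words = [t for t in tokens if t[0].isalpha()]
    numbers = [t for t in tokens if t[0].isdigit()]
    # Heuristic: a sentence ends at ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    return {
        "n_chars": len(text),
        "n_words": len(words),
        "n_numbers": len(numbers),
        "n_sentences": len(sentences),
        "word_freq": Counter(words),
    }


def automated_readability_index(text):
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    m = quantitative_measures(text)
    # For ARI, both word tokens and number tokens count as "words" here.
    n_tokens = m["n_words"] + m["n_numbers"]
    if n_tokens == 0 or m["n_sentences"] == 0:
        raise ValueError("text must contain at least one word and one sentence")
    # ARI counts letters and digits only, not spaces or punctuation.
    n_alnum = sum(ch.isalnum() for ch in text)
    return 4.71 * n_alnum / n_tokens + 0.5 * n_tokens / m["n_sentences"] - 21.43


sample = "The cat sat on the mat. The dog ate 2 bones!"
stats = quantitative_measures(sample)
print(stats["n_words"], stats["n_numbers"], stats["n_sentences"])
print(stats["word_freq"].most_common(3))
print(round(automated_readability_index(sample), 2))
```

On very short inputs like this sample the ARI value is not meaningful (it can even be negative). Readability formulas are intended for longer passages.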