AIDM-7340

This repository was created in the Semester 2 of 2022-2023. It stores the group project datasets&codes of the course AIDM-7340 AI for Digital Media @ Hong Kong Baptist University, School of Communication (HKBU). This group project was created by YU Minghao, ZHAI Yongqing, JIANG Zhuohao, SONG Yihui, FANG Ying, HE Yuchen.

Group Project Introduction

Wailing Wall or Tree Hole? An AI Exploration of Mourning Behaviors for the Deceased on Chinese Social Media - Taking Li Wenliang as an Example

Our usual custom for the deceased: grieve at the beginning and not mention it frequently afterwards. But on social media, certain mourning behaviors seem to have become a daily activity.

The most representative of these is Li Wenliang's Weibo comment area, and people call it as “Chinese Wailing Wall” .Even long after their pass, people are still flooding into their accounts and pouring lots of comments everyday. Moreover, they not only mourn, but also share their own stories and emotions in departed’s accounts.

· What prompted them to leave these messages?
· Are there any common characteristics of these messages?
—— One media phenomenon worthy of exploration

Our study intends to use AI-based classification models as aids to investigate the profound communicative characteristics behind this phenomenon, focusing primarily on the comments section for Dr. Li Wenliang, while also incorporating 11 other cases of large-scale online mourning in the Chinese internet as a control group(Appendix I). Through analysis of text content and development trends, we hope to shed light on this phenomenon.

Academic Poster

Visualization Result

Content Introduction

│   Chinese-Words-Spllit-Comparison.ipynb
                //  Word Splitting for Control Group using ERNIE model. 
│   Chinese-Words-Spllit-Liwenliang.ipynb
                //  Word Splitting for Li Wenliang using ERNIE model. 
│   Comments_Select_Sample.ipynb
                //  Select representative comments context for text analysis.
│   Comparison_ModelClassification.ipynb
                //  Sequence classification task on control group using trained BERT model. 
│   DataVisualization_Liwenliang.ipynb
                //  Data Visualization for Li Wenliang.
│   LiwenliangSampling.ipynb
                // Sampling measurements for manual labeling Li Wenliang.
│   Liwenliang_BERT_torch_tree_hole.ipynb
                // BERT classification model establishment for Tree Hole.
│   Liwenliang_BERT_torch_whaling wall.ipynb
                // BERT classification model establishment for Wailing Wall.
│   Liwenliang_ModelClassification.ipynb
                // Model Classification procedure for Li Wenliang.
│   README.md
│   WeiboScraping.ipynb
                // Main scraping functions.
│   
├───dataset
│       codebook.csv
                // Distinctive words for identifying Li Wenliang comments' categories.
                // Divided by Wailing Wall, Tree Hole and Not Related.
                // Used in sequence tokenizer tasks.
│       codebook.txt
                // TXT version of the words without label for better tokenization. 
│       codebook_comparison.txt
                // Distinctive words for control group.
│       comparison_group.csv
                // Manual labelled comparison group.
│       comparison_group_codebook.csv
                // Comparison group's codebook.
│       LiwenliangAllWordscount.csv
                // Word counts after segment. 
│       LiwenliangComments_Split_sample.csv
                // Complete dataset with comments split and classification marked.
                // Due to the file storage limitation, here only provide examples on Github.
                // Please assest OneDrive link below to fetch complete dataset.
│       LiwenliangSampleAll.csv
                // Liwenliang sample results for manual labelling.
│       LiwenliangSample_Tree_Hole.csv
                // Liwenliang sample results for Tree Hole model training.
│       LiwenliangSample_Whailing_Wall.csv
                // Liwenliang sample results for Wailing Wall model training.
│       
├───external
│       Alibaba-PuHuiTi-Medium.ttf 
                //External font file for Wordcloud generation.
│       
├───images
│       Average Comments per User on Each Day by Category (Li Wenliang).png
│       Distinctive Wordcloud by Category.png
│       Distribution of Comments.png
│       Number of Comments and Average Comments Counts on Each Day (Li Wenliang).png
│       Number of Comments by Category on Each Day (Li Wenliang).png
│       Overall Trend of the Proportion of Comments by Category (Li Wenliang).png
│       Probability_Distribution_of_Comments.png
│       Ratio of Tree Hole Comments to Wailing Wall (Li Wenliang).png
│       Top 20 Commenters.png
│       Top 20 Single Users’ Comments Counts by Categories.png
│       Trend of Comments' Uniqueness Ratio.png
│       Wordcloud by Category.png
│       
└───model_save
    ├───tree_hole
    └───whailing_wall
    // Empty folders just for interpretation due to the storage limitation of Github.
    // Please assest OneDrive link below to fetch trained models.

Comparison Group Exploration

├── word_counts_csv         
                // folder , the word count of each category for each account.
├── images                        
                // folder ,output images
├── external                    
                // folder, store font files used in the visualization images
├── every_words_difference_csv 
                // folder, all the selected tree-hole and wailing wall data for each account, sorted based on "difference" indicators  .
├── data_overview_csv 
                // folder, store CSV files for stacked barcharts visualization
├── comments_every_account 
                // folder, all comments (with category labels and corresponding word segmentation) of each account
├── Comments_All_Word_Clouds.ipynb   
                // Code, for displaying and generating a word cloud for the control group.
├── Comments Distribution Control Group.ipynb   
                // Compute the comments type count for the 11 control accounts, and generate a stacked bar chart.
├── Every_counts_words_flited.ipynb  
                 // Extract the comments of the tree holes and wailing wall of the 11 accounts in the control group, filter them using certain rules [1], and generate the corresponding CSV.

[1]After multiple trial adjustments, we have set the following criteria: in the current data set, words count should greater than 10, and we rank each account's words in descending order based on the "Difference" indicator and select the top 75% of words for further exploration. This ensures the details of each account's words and controls the data volume, guaranteeing a manageable time for manual classification and exploration.

"Difference" Function
Difference = |['Wailing Wall'] - ['Tree Hole'] / (['Wailing Wall'] + ['Tree Hole']) * 100|

The calculation rule of the "Difference" index is as follows: "Difference" is a measure of the "purity/confidence" of a word. If a word exists in a certain sentence, the higher the purity of this word, the more likely this sentence belongs to either "tree hole" or "wailing wall".

OneDrive External Link for Big Files:

(Including Models and Full Dataset)

https://1drv.ms/f/s!AmIVffcWFoDC2XVpcUQcAO4kuB7p?e=IKa18b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AIDM-7340

Group Project Introduction

Academic Poster

Visualization Result

Content Introduction

Comparison Group Exploration

OneDrive External Link for Big Files:

About

Releases

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
control_group_data_exploration/control_group_data_exploration		control_group_data_exploration/control_group_data_exploration
dataset		dataset
external		external
images		images
7340-550W-Appendix.docx		7340-550W-Appendix.docx
Chinese-Words-Spllit-Comparison.ipynb		Chinese-Words-Spllit-Comparison.ipynb
Chinese-Words-Spllit-Liwenliang.ipynb		Chinese-Words-Spllit-Liwenliang.ipynb
Comments_Select_Sample.ipynb		Comments_Select_Sample.ipynb
Comparison_ModelClassification.ipynb		Comparison_ModelClassification.ipynb
DataVisualization_Liwenliang.ipynb		DataVisualization_Liwenliang.ipynb
Github for dataset and whole anaylsis.txt		Github for dataset and whole anaylsis.txt
LiwenliangSampling.ipynb		LiwenliangSampling.ipynb
Liwenliang_BERT_torch_tree_hole.ipynb		Liwenliang_BERT_torch_tree_hole.ipynb
Liwenliang_BERT_torch_whaling wall.ipynb		Liwenliang_BERT_torch_whaling wall.ipynb
Liwenliang_ModelClassification.ipynb		Liwenliang_ModelClassification.ipynb
README.md		README.md
WeiboScraping.ipynb		WeiboScraping.ipynb

antiwarp2000/AIDM-7340

Folders and files

Latest commit

History

Repository files navigation

AIDM-7340

Group Project Introduction

Academic Poster

Visualization Result

Content Introduction

Comparison Group Exploration

OneDrive External Link for Big Files:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Languages