Notes: for the journal version we want to prove that we are either better with short training *and* eventually, or better eventually. We need to present results for both cases. For the small fake-news dataset, we need to show that we are better under at least one of the two training regimes.
Zero-Shot with CS Loss
(Test on 6k; note that this setup has data leakage.)
These results are cross-dataset, without any specific topic.
| Model | Epochs | Train Params | loss | accuracy | cs accuracy | f1 | cs f1 | precision | recall | Inter-space weight | Intra-space weight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BERT | 50 | 1538 | 0.6544 | 0.6230 | N/A | 0.6028 | N/A | 0.7227 | 0.3979 | N/A | N/A |
| Space-BERT | 50 | 4622 (3) | 0.5722 | 0.7679 | 0.5958 | 0.7657 | 0.5902 | 0.8322 | 0.6707 | 0.1 | 1e-5 |
| Space-BERT* | 50 | 98562 (64) | 0.4534 | 0.7975 | 0.6148 | 0.7974 | 0.5809 | 0.8040 | 0.7860 | 0.1 | 1e-5 |
| Space-BERT | 50 | 197122 (128) | 0.4567 | 0.7964 | 0.5863 | 0.7963 | 0.5299 | 0.7822 | 0.8208 | 0.1 | 1e-5 |
After intersection removal:

| Model | Epochs | Train Params | loss | accuracy | cs accuracy | f1 | cs f1 | precision | recall | Inter-space weight | Intra-space weight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BERT | 50 | 1538 | 0.6035 | 0.7736 | N/A | 0.6044 | N/A | 0.5985 | 0.6132 | N/A | N/A |
| Space-BERT | 50 | 4622 (3) | 0.5544 | 0.8292 | 0.5154 | 0.7180 | 0.4711 | 0.6985 | 0.7515 | 0.1 | 1e-5 |
| Space-BERT | 50 | 98562 (64) | 0.4666 | 0.7997 | 0.8071 | 0.7069 | 0.6153 | 0.6847 | 0.7798 | 0.1 | 1e-5 |
| Space-BERT | 50 | 197122 (128) | 0.5079 | 0.7750 | 0.8201 | 0.6892 | 0.5904 | 0.6729 | 0.7816 | 0.1 | 1e-5 |
\* Our assumption was right: the model with fewer parameters but higher CS scores generalizes better.
(Test on covid-fake.)
These results are cross-dataset with a specific (COVID) topic.
1. Train the model on some dataset (e.g. the Fake News Kaggle Competition or the COVID Fake News dataset).
2. Embed the whole fact-checking dataset into the same space (not the raw embeddings, but the centroid of the embeddings).
3. Predict whether each news item is fake or true.
4. After predicting, use the embedding centroid to extract the k nearest neighbors from the knowledge base.
5. Measure how well these nearest neighbors match the original fact-checking articles: by max/average cosine/Euclidean similarity between the neighbors' embeddings and the original fact-checking articles, and by the number of exact matches (i.e. recall and precision).

Notes: we should actually compare this with S-BERT; for now we just use the mean of the BERT embeddings as the centroid. We should also think about how to deal with the fact that using concept-space similarity requires N knowledge bases, one per label: otherwise vectors that fall in different spaces would be compared, since we only force vectors to be orthogonal to those from *different* concept spaces.
- precision = tp / (tp + fp) (how well we identify the true explanation)
- recall = tp / (tp + fn) (how well we distinguish the true explanation from false explanations)
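The retrieval evaluation above can be sketched as follows. This is a minimal illustration, not the actual pipeline code: the function names are hypothetical, embeddings are assumed to be plain NumPy arrays, and the centroid is the mean of BERT embeddings as stated in the notes.

```python
import numpy as np

def centroid(token_embeddings):
    # Mean of the BERT embeddings as a document centroid
    # (a placeholder until we compare against S-BERT).
    return token_embeddings.mean(axis=0)

def knn_cosine(query, kb, k=5):
    # Indices of the k nearest knowledge-base entries by cosine similarity.
    # In the concept-space setting, `kb` would be the knowledge base for
    # the predicted label, to avoid comparing across orthogonal spaces.
    q = query / np.linalg.norm(query)
    kb_norm = kb / np.linalg.norm(kb, axis=1, keepdims=True)
    sims = kb_norm @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]

def retrieval_precision_recall(retrieved, relevant):
    # Exact-match evaluation: precision = tp / (tp + fp), recall = tp / (tp + fn).
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall
```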
| Model | Train Dataset | Epochs | Train Params | Mean Cosine Similarity | Max Cosine Similarity | Mean Euclidean Distance | Max Euclidean Distance | precision | recall |
|---|---|---|---|---|---|---|---|---|---|
| BERT ✅ | Covid Fake | 50 | 1538 | 0. | 0. | 0. | 0. | 0. | 0. |
| BERT ✅ | Fake News Kaggle | 50 | 1538 | 0. | 0. | 0. | 0. | 0. | 0. |
| Space-BERT ✅ | Covid Fake | 50 | 4622 (3) | 0. | 0. | 0. | 0. | 0. | 0. |
| Space-BERT ✅ | Covid Fake | 50 | 98562 (64) | 0. | 0. | 0. | 0. | 0. | 0. |
| Space-BERT ✅ | Covid Fake | 50 | 197122 (128) | 0. | 0. | 0. | 0. | 0. | 0. |
| Space-BERT ✅ | Fake News Kaggle | 50 | 4622 (3) | 0. | 0. | 0. | 0. | 0. | 0. |
| Space-BERT ✅ | Fake News Kaggle | 50 | 98562 (64) | 0. | 0. | 0. | 0. | 0. | 0. |
| Space-BERT ✅ | Fake News Kaggle | 50 | 197122 (128) | 0. | 0. | 0. | 0. | 0. | 0. |
Some sanity-check tests used to make sure the model works as expected.
IMDB
| Experiment | Epochs | Train Params | Loss | Accuracy | F1-score (macro) | Precision | Recall | Inter-space weight | Intra-space weight |
|---|---|---|---|---|---|---|---|---|---|
| Space-DistilBERT (CE + inter-space loss) | 5 | 4622 | 0.8804 | 0.6141 | 0.5587 | 0.8957 | 0.2594 | 0 | 0 |
| Space-DistilBERT (CE loss) | 5 | 4622 | 0.4883 | 0.8080 | 0.8079 | 0.8262 | 0.7808 | 0 | 0 |
| Space-DistilBERT (CE loss) | 5 | 197122 | 0.3855 | 0.8322 | 0.8320 | 0.8093 | 0.8663 | 0 | 0 |
| Space-DistilBERT (CE + inter-space loss) | 5 | 197122 | 0.7847 | 0.7890 | 0.7889 | 0.8016 | 0.7687 | 0.1 | 0 |
| DistilBERT-base-cased | 5 | 592130 | 0.4612 | 0.7852 | 0.7819 | 0.8799 | 0.6614 | N/A | N/A |
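The "CE + inter-space loss" experiments combine cross-entropy with a penalty pushing the concept-space representations of different classes toward orthogonality, scaled by the inter-space weight. A minimal sketch under that assumption (the exact form of the penalty and the function names are assumptions, not the actual implementation):

```python
import numpy as np

def inter_space_loss(centroids):
    # Penalize non-orthogonality between class (concept-space) centroids:
    # sum of squared cosine similarities over all distinct class pairs.
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    gram = c @ c.T                       # pairwise cosine similarities
    off_diag = gram - np.diag(np.diag(gram))
    return np.sum(off_diag ** 2) / 2.0   # each pair counted once

def total_loss(ce_loss, centroids, inter_weight=0.1):
    # Cross-entropy plus the weighted inter-space orthogonality penalty;
    # `inter_weight` corresponds to the "Inter-space weight" column.
    return ce_loss + inter_weight * inter_space_loss(centroids)
```

With orthogonal centroids the penalty vanishes and the objective reduces to plain cross-entropy, which is consistent with the inter-space weight of 0 behaving like a CE-only run.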
Paper tables:
Same dataset benchmarking
| Data | Model | # Latent Dimensions | loss | accuracy | cs accuracy | f1 | cs f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|---|
| Fake COVID News | BERT | N/A | 0.6338 | 0.7145 | N/A | 0.6966 | N/A | 0.7528 | 0.7047 |
| | Space-BERT | 3 | 0.7875 | 0.7949 | 0.5234 | 0.7828 | 0.3436 | 0.8478 | 0.7855 |
| | Space-BERT | 64 | 0.5877 | 0.8645 | 0.7743 | 0.8627 | 0.7637 | 0.8725 | 0.8610 |
| | Space-BERT | 128 | 0.5727 | 0.8808 | 0.6528 | 0.8797 | 0.5967 | 0.8859 | 0.8782 |
| Liar (multi-label) | BERT | N/A | 1.7426 | 0.2221 | N/A | 0.1211 | N/A | 0.1261 | 0.1815 |
| | Space-BERT | 3 | 2.2877 | 0.2362 | 0.1824 | 0.1540 | 0.1079 | 0.1542 | 0.1977 |
| | Space-BERT | 64 | 2.3192 | 0.2580 | 0.2362 | 0.2034 | 0.1590 | 0.2201 | 0.2227 |
| | Space-BERT | 128 | 2.3651 | 0.2572 | 0.2081 | 0.2120 | 0.1267 | 0.2527 | 0.2248 |
| Liar (binary-label) | BERT | N/A | 0.7026 | 0.5666 | N/A | 0.3617 | N/A | 0.2833 | 0.5000 |
| | Space-BERT | 3 | 0.8476 | 0.5900 | 0.5838 | 0.5280 | 0.5825 | 0.5778 | 0.5515 |
| | Space-BERT | 64 | 0.8591 | 0.6267 | 0.5877 | 0.6002 | 0.5855 | 0.6185 | 0.6031 |
| | Space-BERT | 128 | 0.8747 | 0.6251 | 0.5744 | 0.5971 | 0.4194 | 0.6170 | 0.6007 |
| Kaggle Fake News | BERT | N/A | 0.6251 | 0.6550 | N/A | 0.6337 | N/A | 0.7400 | 0.4319 |
| | Space-BERT | 3 | 0.7444 | 0.8069 | 0.6408 | 0.8030 | 0.6396 | 0.8769 | 0.6946 |
| | Space-BERT | 64 | 0.5305 | 0.8685 | 0.6834 | 0.8672 | 0.6497 | 0.9132 | 0.8018 |
| | Space-BERT | 128 | 0.5016 | 0.8868 | 0.6436 | 0.8859 | 0.5900 | 0.9228 | 0.8334 |
| Fake News Net | BERT | N/A | 0.5502 | 0.7548 | N/A | 0.4302 | N/A | 0.3774 | 0.5000 |
| | Space-BERT | 3 | 0.7124 | 0.7557 | 0.2732 | 0.4336 | 0.2400 | 0.8777 | 0.5016 |
| | Space-BERT | 64 | 0.6754 | 0.7955 | 0.6468 | 0.6654 | 0.6090 | 0.7359 | 0.6453 |
| | Space-BERT | 128 | 0.6826 | 0.8028 | 0.7131 | 0.6833 | 0.6302 | 0.7475 | 0.6616 |
Cross-dataset benchmarking
| (Train) -> (Test) | Model | # Dims. | loss | accuracy | cs accuracy | f1 | cs f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|---|
| (GossipCop) -> (CovidFake) | BERT | N/A | 0.8384 | 0.5234 | N/A | 0.3436 | N/A | 0.2616 | 0.5000 |
| | Space-BERT | 3 | 0.7862 | 0.5234 | 0.6145 | 0.3436 | 0.6052 | 0.2617 | 0.5000 |
| | Space-BERT | 64 | 0.8823 | 0.5375 | 0.6064 | 0.3806 | 0.5874 | 0.6911 | 0.5151 |
| | Space-BERT | 128 | 0.9444 | 0.5373 | 0.5691 | 0.3797 | 0.4712 | 0.6954 | 0.5149 |
| (GossipCop) -> (Politifact) | BERT | N/A | 0.5990 | 0.5909 | N/A | 0.3714 | N/A | 0.2955 | 0.5000 |
| | Space-BERT | 3 | 0.5621 | 0.5909 | 0.4403 | 0.3714 | 0.3699 | 0.2955 | 0.5000 |
| | Space-BERT | 64 | 0.5385 | 0.6695 | 0.6259 | 0.6085 | 0.6258 | 0.6812 | 0.6181 |
| | Space-BERT | 128 | 0.5312 | 0.6941 | 0.6468 | 0.6395 | 0.6155 | 0.7172 | 0.6447 |
| (FakeNewsNet) -> (CovidFake) | BERT | N/A | 0.8171 | 0.5234 | N/A | 0.3436 | N/A | 0.2617 | 0.5000 |
| | Space-BERT | 3 | 0.7652 | 0.5234 | 0.6024 | 0.3436 | 0.5881 | 0.2617 | 0.5000 |
| | Space-BERT | 64 | 0.8256 | 0.5482 | 0.6191 | 0.4064 | 0.6116 | 0.6983 | 0.5265 |
| | Space-BERT | 128 | 0.8719 | 0.5480 | 0.5956 | 0.4056 | 0.5354 | 0.7008 | 0.5263 |
Ablation Study
Same dataset inter-space and intra-space loss ablation study
| Data | Model | # Latent Dimensions | loss | accuracy | cs accuracy | f1 | cs f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|---|
| GossipCop | Space-BERT | 3 | 1.9304 | 0.7566 | 0.7053 | 0.4307 | 0.6387 | 0.3783 | 0.5000 |
| | Space-BERT | 64 | 2.0755 | 0.2818 | 0.6857 | 0.2538 | 0.6366 | 0.5683 | 0.5175 |
| | Space-BERT | 128 | 2.0751 | 0.7448 | 0.7442 | 0.4436 | 0.6522 | 0.4861 | 0.4985 |
| Fake News Net | Space-BERT | 3 | 1.7721 | 0.7548 | 0.6196 | 0.4302 | 0.5962 | 0.3774 | 0.5000 |
| | Space-BERT | 64 | 0.7079 | 0.7815 | 0.7055 | 0.6016 | 0.6325 | 0.7203 | 0.5928 |
| | Space-BERT | 128 | 0.6910 | 0.7973 | 0.7321 | 0.6703 | 0.6370 | 0.7350 | 0.6500 |
Cross-dataset inter-space and intra-space loss ablation study