From 962b2139fcef78eb57904bdeede9b1296321001a Mon Sep 17 00:00:00 2001
From: Dave Van Veen <davemvanveen@gmail.com>
Date: Thu, 14 Sep 2023 21:34:22 -0700
Subject: [PATCH] add p values

---
 index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/index.html b/index.html
index 976c37f..b979bc5 100644
--- a/index.html
+++ b/index.html
@@ -374,7 +374,7 @@
 	<table align=center width=800px>
 		<tr>
 			<td align=left width=800px>
-                    Clinical reader study. Top: Study design comparing the summarization of GPT-4 vs. that of human experts on three attributes: completeness, correctness, and conciseness. Bottom: Results. GPT-4 summaries are rated higher than human summaries on completeness for all three summarization tasks and on correctness overall. Radiology reports highlight a trade-off between correctness (better) and conciseness (worse) with GPT-4. Highlight colors correspond to a value’s location on the color spectrum. Asterisks denote statistical significance by Wilcoxon signed-rank test. 
+                    Clinical reader study. Top: Study design comparing the summarization of GPT-4 vs. that of human experts on three attributes: completeness, correctness, and conciseness. Bottom: Results. GPT-4 summaries are rated higher than human summaries on completeness for all three summarization tasks and on correctness overall. Radiology reports highlight a trade-off between correctness (better) and conciseness (worse) with GPT-4. Highlight colors correspond to a value’s location on the color spectrum. Asterisks denote statistical significance by Wilcoxon signed-rank test: &ast;p-value &lt; 0.05, &ast;&ast;p-value &lt;&lt; 0.001.
 			</td>
 		</tr>
 	</table>