-
Notifications
You must be signed in to change notification settings - Fork 0
/
analysis.Rmd
150 lines (128 loc) · 4.17 KB
/
analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
---
title: "Survival Analysis of Ovarian Carcinoma Patients in Clinical Trials"
output:
html_document:
df_print: paged
pdf_document: default
editor_options:
markdown:
wrap: 72
---
```{r setup, include=FALSE, warning=F, message=F}
knitr::opts_chunk$set(echo = TRUE)
library(data.table)
library(tidyverse)
library(arules)
library(arulesViz)
library(survival)
library(survminer)
library(plotly)
```
```{r warning=FALSE}
data(jasa, package="survival")
jasa
```
```{r warning=FALSE}
data(ovarian)
ovarian %>%
mutate(fustat = as.integer(fustat)) ->
ovarian
head(ovarian)
```
```{r}
table(ovarian$rx)
table(ovarian$fustat)
table(ovarian$resid.ds)
```
fustat: censoring status <br>
0 = censored <br>
1 = uncensored <br>
<br>
rx: treatment type <br>
1 = cyclophosphamide <br>
2 = combined (cyclophosphamide + adriamycin) <br>
<br>
resid.ds <br>
1 = microscopic residual disease <br>
2 = macroscopic residual disease <br>
```{r}
summary(ovarian)
```
```{r warning=FALSE}
ggplot(ovarian, aes(x = age, y = futime/30.42, color = factor(fustat))) +
geom_point() +
scale_color_manual(values = c("darkgreen", "red"), labels = c("Censored", "Death")) +
labs(x = "Age", y = "Time to death (in months)", color = "Survival status")
```
```{r}
# Compute Kaplan-Meier survival curve
km.fit <- survfit(Surv(futime/30.42, fustat) ~ 1, data = ovarian)
summary(km.fit)
```
```{r}
# Kaplan Meier on uncensored patients
uncensored_patients <- ovarian[ovarian$fustat == 1, ]
km.uncensored <- survfit(Surv(futime/30.42, fustat) ~ 1, data = uncensored_patients)
ggsurvplot(km.uncensored,
xlab = "Time (in months)",
ylab = "Survival Probability",
legend.title = "Group",
legend.labs = c("All Patients")) +
ggtitle("Kaplan-Meier on Uncensored Patients")
```
Here it is easy to determine the survival rate/survival probability
because we have legitimate data of each patient where he/she is alive or
dead at a particular instance of time. But in theory this graph is wrong
as we are not considering the uncensored patients so the graph gives us
a wrong estimation of survival rate at a particular time.
```{r message=F, warning=F}
ggsurvplot(km.fit,
xlab = "Time (in months)",
ylab = "Survival Probability",
legend.title = "Group",
legend.labs = c("All Patients")) +
ggtitle("Kaplan-Meier Survival Curve")
```
Here the vertical lines indicate an event where the person gets censored
hence in order to plot the survival rate we use the law of total
probability through which we can estimate the survival rate/survival
probability. And if we don't consider censored patients the Kaplan-Meier
curve wrongly gives the survival rate. For eg. After 5 months from the
first graph the survival rate is 0.75 but after considering censored
patients it outputs 0.95. Hence this shows the importance of not
neglecting censored subjects.
```{r}
# Computing Kaplan-Meier survival curves by treatment regimen
ovarian_km_rx <- survfit(Surv(futime/30.42, fustat) ~ rx, data = ovarian)
ggsurvplot(ovarian_km_rx, color = "rx",
xlab = "Time (in months)",
ylab = "Survival Probability",
legend.title = "Treatment",
legend.labs = c("Standard", "Experimental")) +
ggtitle("Kaplan-Meier Survival Curve by Treatment")
```
From the plot, it appears that the combined approach seems to have the
potential to increase the survival rate.
```{r}
# Estimated survival probability after 2 years (24 months)
summary(km.fit, times = 24)
```
```{r}
# naive estimate for suvival after 2 years (24 months)
num_deaths <- filter(ovarian, fustat==1 & futime <= 24) %>%
nrow()
num_alive <- filter(ovarian, futime > 24) %>%
nrow()
binom.test(c(num_alive, num_deaths))
```
```{r}
ggsurvplot(ovarian_km_rx, conf.int = 0.95)
```
```{r}
summary(ovarian_km_rx)
```
```{r}
data.logrank <- survdiff(Surv(futime, fustat) ~ factor(rx), data = ovarian)
data.logrank
```
The results from the log-rank test did not reveal a significant difference in survival between the treatment types (rx=1 and rx=2) for ovarian carcinoma patients in the clinical trials. However, more thorough research and larger sample sizes might be required to offer more conclusive evidence. <br>