my_data <- read.csv2("plasma.csv")
my_data$group <- as.factor(my_data$group)

Today we run statistical tests on the plasma citrate dataset.

\(\huge{First \ \ step :}\)

We will see whether the concentration of citrate in plasma changes significantly over time.

Difference between 8am and 11am

We perform a Student's t-test on the paired differences:

test_1 <- t.test(my_data$X11am - my_data$X8am)

pvalues <- test_1$p.value
mean(my_data$X11am - my_data$X8am)
## [1] NA
round(pvalues, 3)
## [1] 0.096

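On average, the concentration increased by 9.4 between 8am and 11am. The call to mean() returns NA because some measurements are missing; the figure quoted here is the average over the subjects with both measurements.

The p-value is 0.096, so this increase is not significant at the usual 5% level; it would be declared significant only by accepting a 9.6% risk of being wrong.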
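Testing the differences with a one-sample t-test, as done above, is the same thing as a paired t-test on the two columns. The sketch below illustrates this on made-up numbers; the vectors before and after are hypothetical and are not taken from plasma.csv.

```r
# Hypothetical measurements, made up for illustration (not the plasma data)
before <- c(120, 98, 135, NA, 110, 125, 102)
after  <- c(131, 105, 129, 118, 121, NA, 115)

# One-sample t-test on the differences, as in the text
diff_test <- t.test(after - before)

# Paired t-test on the two columns
paired_test <- t.test(after, before, paired = TRUE)

# Both approaches drop incomplete pairs and should agree
same_statistic <- all.equal(unname(diff_test$statistic), unname(paired_test$statistic))
same_p_value   <- all.equal(diff_test$p.value, paired_test$p.value)
same_statistic
same_p_value
```

Either form can be used; the paired call just makes the design of the study more explicit to the reader.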

Difference between 11am and 3pm

We perform a Student's t-test on the paired differences:

test_2 <- t.test(my_data$X3pm - my_data$X11am)

pvalues_2 <- test_2$p.value
mean(my_data$X3pm - my_data$X11am)
## [1] NA
round(pvalues_2, 3)
## [1] 0.013

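On average, the concentration decreased by 15.3 between 11am and 3pm. The call to mean() returns NA because some measurements are missing; the figure quoted here is the average over the subjects with both measurements.

The p-value is 0.013, so this decrease is significant at the 5% level: the risk of wrongly concluding that the concentration changed is 1.3%.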
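The NA printed by mean() in both comparisons comes from missing measurements, whereas t.test() silently drops them. A minimal sketch of how to get a usable average, using a hypothetical vector d of differences rather than the real data:

```r
# Hypothetical differences with one missing value (not the plasma data)
d <- c(11, 7, -6, NA, 13)

mean_with_na    <- mean(d)                # NA: a single missing value propagates
mean_without_na <- mean(d, na.rm = TRUE)  # mean of the observed differences only
n_used          <- sum(!is.na(d))         # number of pairs a t-test would actually use

mean_with_na
mean_without_na
n_used
```

Reporting n_used alongside the mean makes it clear how many subjects each comparison is really based on.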

\(\huge{Second \ \ step :}\)

Difference between two groups

library(plyr)
library(ggplot2)
GenderPlot1_FLIP <- ggplot(my_data, aes(x = group, y = X8am)) + geom_boxplot() + coord_flip()
GenderPlot1_FLIP
## Warning: Removed 4 rows containing non-finite values (stat_boxplot).

Group 1 seems to have higher concentrations at 8am.

Let's run a test to check.

library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
group_by(my_data, group) %>%
  summarise(
    count = n(),
    mean = mean(X8am, na.rm = TRUE),
    sd = sd(X8am, na.rm = TRUE)
  )
## # A tibble: 3 x 4
##   group count  mean    sd
##   <fct> <int> <dbl> <dbl>
## 1 1         7  124.  21.1
## 2 2         3  105.  14.0
## 3 <NA>      4  NaN   NA
library("ggpubr")
## 
## Attaching package: 'ggpubr'
## The following object is masked from 'package:plyr':
## 
##     mutate
ggboxplot(my_data, x = "group", y = "X8am",
          color = "group", palette = c("#00AFBB", "#E7B800"),
          ylab = "X8am", xlab = "Groups")
## Warning: Removed 4 rows containing non-finite values (stat_boxplot).

# Shapiro-Wilk normality test for the 8am concentration in group 1
with(my_data, shapiro.test(X8am[group == "1"])) # p = 0.8698
## 
##  Shapiro-Wilk normality test
## 
## data:  X8am[group == "1"]
## W = 0.9662, p-value = 0.8698
# Shapiro-Wilk normality test for the 8am concentration in group 2
with(my_data, shapiro.test(X8am[group == "2"])) # p = 0.4822
## 
##  Shapiro-Wilk normality test
## 
## data:  X8am[group == "2"]
## W = 0.92827, p-value = 0.4822

Both p-values are greater than the 0.05 significance level, so neither group's distribution differs significantly from a normal distribution. In other words, we can assume normality, keeping in mind that with only 7 and 3 observations the test has little power to detect departures from it.
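The two Shapiro-Wilk calls can also be written as a single per-group computation. This is only a sketch on simulated values: the data frame demo and its contents are hypothetical and merely mimic the layout of my_data (a group factor with some missing labels and an X8am column).

```r
# Hypothetical data frame mimicking the structure of my_data
set.seed(1)
demo <- data.frame(
  group = factor(c(rep("1", 7), rep("2", 5), NA, NA)),
  X8am  = c(rnorm(12, mean = 115, sd = 20), NA, NA)
)

# Apply shapiro.test() within each group and keep only the p-values;
# rows whose group is NA are dropped by tapply()
p_by_group <- tapply(demo$X8am, demo$group,
                     function(x) shapiro.test(x)$p.value)
p_by_group
```

This avoids copying the call once per group, which is where stale comments tend to creep in.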

res.ftest <- var.test(X8am ~ group, data = my_data)
res.ftest
## 
##  F test to compare two variances
## 
## data:  X8am by group
## F = 2.2782, num df = 6, denom df = 2, p-value = 0.6722
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##   0.05792299 16.53937046
## sample estimates:
## ratio of variances 
##           2.278195

The F-test gives p = 0.6722, which is greater than the significance level alpha = 0.05. There is therefore no significant evidence that the variances of the two groups differ, and we can use the classic t-test, which assumes equal variances.
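To make the F-test less of a black box, the statistic and its two-sided p-value can be recomputed by hand: F is simply the ratio of the two sample variances, compared with an F distribution on n1 - 1 and n2 - 1 degrees of freedom. The sketch below checks this against var.test() on simulated samples; g1 and g2 are hypothetical and only reuse the group sizes seen above.

```r
# Hypothetical samples with the same sizes as the two groups (7 and 3)
set.seed(42)
g1 <- rnorm(7, mean = 120, sd = 20)
g2 <- rnorm(3, mean = 105, sd = 15)

# F statistic: ratio of the sample variances
f_stat <- var(g1) / var(g2)
df1 <- length(g1) - 1
df2 <- length(g2) - 1

# Two-sided p-value from the F distribution
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

# Compare with the built-in test
ref <- var.test(g1, g2)
f_matches <- all.equal(f_stat, unname(ref$statistic))
p_matches <- all.equal(p_value, ref$p.value)
f_matches
p_matches
```

With a denominator based on only 2 degrees of freedom, the F distribution is very spread out, which is why even a variance ratio above 2 is far from significant here.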

res <- t.test(X8am ~ group, data = my_data, var.equal = TRUE)
res
## 
##  Two Sample t-test
## 
## data:  X8am by group
## t = 1.4604, df = 8, p-value = 0.1823
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -11.44319  50.96700
## sample estimates:
## mean in group 1 mean in group 2 
##        124.4286        104.6667

The p-value of the test is 0.1823, which is above the significance level alpha = 0.05. We therefore cannot conclude that the mean 8am concentration differs between group 1 and group 2. With only 7 and 3 observations, the test has little power, so this result may simply reflect a lack of data.
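The remark about the amount of data can be made more concrete with a rough sample-size calculation. The sketch below plugs the group means and standard deviations printed above into power.t.test(). Treating the observed difference as the true effect, using the rounded standard deviations, and aiming for 80% power are all assumptions of this illustration, not results of the analysis.

```r
# Values read off the output above (rounded, so this is only approximate)
delta_hat <- 124.4286 - 104.6667                  # observed difference in means
sd_pooled <- sqrt((6 * 21.1^2 + 2 * 14.0^2) / 8)  # pooled SD from the group SDs

# Sample size per group needed to detect delta_hat
# with a two-sided two-sample t-test at alpha = 0.05 and 80% power (assumed target)
plan <- power.t.test(delta = delta_hat, sd = sd_pooled,
                     sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(plan$n)
n_per_group
```

If the required n_per_group is well above the current group sizes, a non-significant result is what one would expect even if the groups really differ by that much.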