| Data analysis | TD 1 -- Multivariate Normal and R

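# Exercise 1: IQ

Knowing that IQ is normally distributed with mean 100 and standard deviation 15, what is the probability of having an IQ above 120? Below 100?

```{r}
pnorm(120, mean = 100, sd = 15, lower.tail = FALSE)
```

or, after standardizing ($(120 - 100)/15 = 4/3$):

```{r}
1 - pnorm(4/3)
```

By symmetry of the normal distribution about its mean, the probability of an IQ below 100 is 0.5.

Visualization:

```{r}
library(ggplot2)
```

```{r}
# Density restricted to x > 120 (NA elsewhere), used to shade the upper tail
QI.sup.120 <- function(x) {
  ifelse(x > 120, dnorm(x, mean = 100, sd = 15), NA)
}

ggplot(data.frame(x = c(20, 180)), aes(x)) +
  stat_function(fun = dnorm, args = list(mean = 100, sd = 15)) +
  stat_function(fun = QI.sup.120, geom = "area", fill = "coral", alpha = 0.3) +
  # annotate() draws the label once; geom_text() would repeat it for each data row
  annotate("text", x = 127, y = 0.003, size = 4, fontface = "bold",
           label = round(pnorm(120, mean = 100, sd = 15, lower.tail = FALSE), 2)) +
  scale_x_continuous(breaks = c(80, 100, 120, 130)) +
  geom_vline(xintercept = 120, colour = "coral")
```

# Exercise 2: Bias of the maximum likelihood estimator of the variance

Show that the maximum likelihood estimator of the variance is biased and propose an unbiased estimator.

Proof: let $X_1, \dots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Using $\mathbb{E}[X_i^2] = \sigma^2 + \mu^2$ and $\mathbb{E}[\bar{X}^2] = \dfrac{\sigma^2}{n} + \mu^2$,

$$
\begin{aligned}
\mathbb{E}[\, \hat{\sigma}^2 \,] &= \mathbb{E}\left[\, \dfrac{1}{n} \sum_{i=1}^n (X_i - \bar X)^2 \right] \\
&= \mathbb{E}\left[\, \dfrac{1}{n} \sum_{i=1}^n X_i^2 - \bar{X}^2 \right] \\
&= \sigma^2 + \mu^2 - \dfrac{\sigma^2}{n} - \mu^2 \\
&= \dfrac{n-1}{n} \, \sigma^2
\end{aligned}
$$

Since $\mathbb{E}[\, \hat{\sigma}^2 \,] \neq \sigma^2$, the estimator is biased. Rescaling by $\dfrac{n}{n-1}$ gives the unbiased estimator

$$
S^2 = \dfrac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2
$$

# Exercise 3: Extreme values

Consider Fisher's iris data. Find the flowers whose measured widths and lengths are exceptionally large or small.
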
```{r}
library(tidyverse)
```

```{r}
data(iris)
head(iris)
```

```{r}
# Per-variable mean and standard deviation, with bounds at mean +/- 2 sd
parameters <- iris %>%
  select(-"Species") %>%
  gather(factor_key = TRUE) %>%
  group_by(key) %>%
  summarise(mean = mean(value), sd = sd(value)) %>%
  mutate(min = mean - 2 * sd, max = mean + 2 * sd)
parameters
```

```{r}
# Transposing puts the 4 variables in rows, so the length-4 bounds are recycled per variable
flower.outliers <- t((t(iris[, 1:4]) < parameters$min) + (t(iris[, 1:4]) > parameters$max))
# Number of measurements outside the bounds, for each flower
flower.outliers <- rowSums(flower.outliers)

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(colour = as.numeric(iris$Species), size = flower.outliers * 2 + 1)
```

# Exercise 4: Equiprobability ellipses

Generate 1000 observations from a two-dimensional normal distribution $\mathcal{N}(\mu, \Sigma)$ with

$$
\mu = \left(\begin{array}{c} 0 \\ 0 \end{array}\right), \qquad
\Sigma = \left(\begin{array}{cc} 2 & 1\\ 1 & 0.75 \end{array}\right)
$$

Then draw the equiprobability ellipses for the multiples of 5%.

```{r}
sigma <- matrix(c(2, 1, 1, 0.75), 2, 2)
mu <- c(0, 0)

# Upper-triangular Cholesky factor R, with t(R) %*% R = sigma
cholesky_sigma <- chol(sigma)
t(cholesky_sigma) %*% cholesky_sigma  # check: reproduces sigma

# Rows of Z %*% R have covariance sigma; sweep() adds mu to every row
# (a plain "+ mu" would recycle mu down the columns, which is wrong when mu != 0)
Y <- sweep(matrix(rnorm(2000), 1000, 2) %*% cholesky_sigma, 2, mu, "+")
plot(Y, xlab = "x", ylab = "y", pch = ".")
```

```{r}
# The quadratic form t(v) %*% solve(sigma) %*% v follows a chi-square law with 2 df,
# so the ellipse containing probability p is its contour at level qchisq(p, df = 2).
# Levels drawn here: 5%, 15%, ..., 95% (use by = 0.05 for every multiple of 5%)
Q <- qchisq(p = seq(0.05, 0.95, by = 0.1), df = 2)
x <- seq(-4, 4, length = 100)
y <- seq(-4, 4, length = 100)
sigmainv <- solve(sigma)
a11 <- sigmainv[1, 1]
a22 <- sigmainv[2, 2]
a12 <- sigmainv[1, 2]
# Quadratic form evaluated on the grid (mu = 0, so no centering is needed)
z <- outer(x, y, function(x, y) a11 * x^2 + a22 * y^2 + 2 * a12 * x * y)
image(x, y, z)
contour(x, y, z, col = "blue4", levels = Q, labels = seq(from = 0.05, to = 0.95, by = 0.1), add = TRUE)
```

```{r}
# Bivariate normal density surface: (2 * pi)^(-1) * det(sigma)^(-1/2) * exp(-z / 2)
persp(x, y, 1 / (2 * pi) * det(sigma)^(-1/2) * exp(-0.5 * z), col = "cornflowerblue", zlab = "f(x)")
```
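As a sanity check on the ellipses of Exercise 4, one can compare each nominal probability level with the share of simulated points that actually fall inside the corresponding ellipse. The sketch below is not part of the original exercise. It assumes the same $\mu$ and $\Sigma$, and relies on the squared Mahalanobis distance of a bivariate normal point following a chi-square law with 2 degrees of freedom. The seed and the names `d2` and `coverage` are arbitrary choices made for this illustration.

```{r}
# Illustrative check, not part of the original exercise
set.seed(42)                                   # arbitrary seed, for reproducibility only
sigma <- matrix(c(2, 1, 1, 0.75), 2, 2)
mu    <- c(0, 0)

# Simulate as in the exercise: rows of Z %*% R have covariance t(R) %*% R = sigma
n <- 1000
Z <- matrix(rnorm(2 * n), n, 2)
Y <- sweep(Z %*% chol(sigma), 2, mu, "+")

# Squared Mahalanobis distance of each point to mu
d2 <- mahalanobis(Y, center = mu, cov = sigma)

# A point lies inside the level-p ellipse when d2 <= qchisq(p, df = 2)
p        <- seq(0.05, 0.95, by = 0.1)
coverage <- sapply(qchisq(p, df = 2), function(q) mean(d2 <= q))
round(data.frame(nominal = p, observed = coverage), 3)
```

With 1000 points, the observed shares should sit within a few percentage points of the nominal levels; larger gaps would suggest an error in the simulation or in the contour levels.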