Basic_R.Rmd

# 0. 참고할 문서
1. [R 기본](http://github.com/rstudio/cheatsheets/raw/master/base-r.pdf)
2. [데이터 가공](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf)
3. [데이터 시각화](https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf)
4. [연동형 문서 작성](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf)


# 1. 일단 따라해봅시다

- **1. R 기본 문서를 참고해 주세요**  

## 1.1. 정의하기
```{r}
a<-10 #a는 10입니다.
b<-5 #b는 5입니다.
c<-"animal" #c는 animal입니다.
```
- 주의: 변수 이름은 문자로 시작해야합니다.  


## 1.2. 출력하기
```{r}
a
b
c
```


## 1.3. 사칙연산
```{r}
a+b #더하기
a-b #빼기
a/b #나누기
a*b #곱하기
a^2 #제곱
sqrt(a^2) #루트
```

## 1.4. 논리
```{r}
a==b #a는 b와 같다.
a!=b #a는 b와 같지 않다.
a>b #a는 b보다 크다.
a<b #a는 b보다 작다.
a>=b #a는 b보다 크거나 같다.
a<=b #a는 b보다 작거나 같다.
```

## 1.5. 데이터프레임  
```{r}
a<-1:10

a
```

```{r}

b<-c(1,2,3,4,5,10)

b
```

```{r}
c<-c("Konkuk","University")

c
```
- c는 '합치다'라는 뜻의 'combine'의 약자입니다.  

```{r}
example<-data.frame(name=c("철수","영희","건이"), sex=c("Male","Female","Male"),age=c(29,40,33), height=c(175,163,168))

example
```

```{r}
x<-example$height #example 데이터 중 height 데이터만 추출
x
```

```{r}
example[2,3]
example[3,3]
example[,3]
example[1,]
example[-1,]
example[c(2,3),]
```


## 1.6. 기본 통계
```{r}
sum(x) #합계
mean(x) #평균
median(x) #중간값
var(x) #분산
sd(x) #표준편차
max(x) #최대값
min(x) #최소값
range(x) #범위
```


# 2. 기초 R 문법
## 2.1. Working directory 확인  
```{r}
getwd()
```

## 2.2. Working directory 설정
```{r eval=FALSE}
setwd("/Users/Youngjun/Google Drive/wd")
```

- R studio > Tools > Global option > Default working directory 에서 바꿀수도 있습니다.

## 2.3. Working directory 안에 있는 파일 확인
```{r}
dir()
```

## 2.4. 패키지 인스톨
```{r eval=FALSE}
install.packages("dplyr")
install.packages("ggplot2")
install.packages("readxl")
```

## 2.5. 패키지 업데이트
```{r eval=FALSE}
update.packages(ask=FALSE)
```

## 2.6. 패키지 로딩
```{r eval=FALSE}
library(dplyr)
library(ggplot2)
```

## 2.7. 파일 읽기
```{r eval=FALSE}
read.csv("dataframe.csv")
readxl::read_excel("dataframe.xlsx")
```

## 2.8. 파일 쓰기
```{r eval=FALSE}
write.csv(example, "test.txt", row.names=FALSE)
```

# 3. 데이터 가공: dplyr 패키지

- **2. 데이터 가공 문서를 참고해 주세요**  

## 3.0. 패키지 로딩 및 데이터 읽기
```{r}
library(dplyr)
readxl::read_excel("example_df.xlsx")
df<-readxl::read_excel("example_df.xlsx")
```

## 3.1. 필요한 데이터만 추출하기: filter

```{r}
filter(df, Name=="Kim")
filter(df, weight > 500)
filter(df, wgrade != "C")
filter(df, qgrade != "3")
filter(df, qgrade=="1++" & wgrade=="A") #and
filter(df, qgrade=="1++" | wgrade=="A") #or
```


## 3.2. 파이프 함수의 사용: %>%  

- x %>% f(y)  becomes f(x,y)  

```{r}
df$month %>% mean()
mean(df$month)
```


## 3.3. 그룹 분석: group_by

```{r}
group_by(df, Name) %>% summarise(avg=mean(month))
group_by(df, Name) %>% summarise(month=mean(month), mean_weight=mean(weight), max_weight=max(weight))
group_by(df, qgrade) %>% summarise(month=mean(month))
```


# 4. 시각화: ggplot2 패키지
- **3. 데이터 시각화 문서 및 R graphic cookbook을 참고해 주세요**  

## 4.0. 패키지 로딩
```{r}
library(ggplot2)
```

## 4.1. 메인 데이터셋 지정: ggplot(data=df, aes(x,y))
```{r}
g<-ggplot(df, aes(x=month,y=weight))

g
```

## 4.2. 산점도: + geom_point()
```{r}
g + geom_point()
```


## 4.3. 막대그래프: + geom_bar()
```{r}
h<-ggplot(df, aes(qgrade))

h + geom_bar()

h + geom_bar() + scale_x_discrete(limits=c("3","2","1","1+","1++")) #순서 지정하기

h + geom_bar(width=0.5, fill="red") + scale_x_discrete(limits=c("3","2","1","1+","1++")) #너비 및 색 지정
```

## 4.4. 분포도 그리기: + geom_density()

```{r}
g<-ggplot(df, aes(month))

g + geom_density(kernel="gaussian")
```


# 5. dplyr + ggplot

```{r}
filter(df, month<40) %>% ggplot(aes(month)) + geom_density(kernel="gaussian") #40개월 미만의 개체의 분포도

filter(df, Name=="Kim") %>% ggplot(aes(wgrade)) + geom_bar(width=0.4, fill="#81BEF7") + coord_flip() #김씨네 육량등급
```

- html 컬러 차트: https://html-color-codes.info/Korean/

# 6. 연동형 문서 작성: R 마크다운

- **4. 연동형 문서 작성을 참고해 주세요**  
- https://gist.github.com/ihoneymon/652be052a0727ad59601  
- ***워드나 한글로 예술 작품을 만들 필요는 없습니다 -> 보고서는 정보 전달이 목적***
- R 마크다운 = 기존의 마크다운 + R 코드를 재현  
- 기본 데이터가 변하더라도 같은 형식의 보고서를 빠르게 작성할 수 있음

# 7. 결론

본 수업을 통해 기본적인 R 사용법을 배워보았습니다. 각자가 다뤄야 하는 데이터의 종류와 성격이 다르겠지만 기본적으로 데이터를 **가공하고-분석하고-해석하는** 일련의 과정들은 다르지 않을 것입니다. 지금은 비록 기본만 배웠지만 각자가 더 공부해 분석 및 보고서 작성과정에 들이는 시간을 줄일 수 있길 바래봅니다.