This is a basic introductory look at using R to generate descriptive statistics for a univariate data set. Here, we will use the historical data set from Michelson’s experiment to determine the speed of light in air. It is provided as an ASCII file containing header content followed by the observed speed of light for 100 trials.

We first need to read the data into R. Since the data is in a properly formatted ASCII file, we only need to tell R to ignore the first 60 lines, which are header information. R will then import the data into a list of class data.frame.

```
> C <- read.table("Michelso.dat", skip=60)
```
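To see the effect of skip without the original file at hand, here is a minimal sketch that writes a small stand-in file and reads it back. The temporary file, its three header lines, and the name demo are all made up for illustration; only read.table() and its skip argument come from the step above.

```r
# Hypothetical stand-in file: three header lines followed by four values.
path <- tempfile(fileext = ".dat")
writeLines(c("Header line 1",
             "Header line 2",
             "Header line 3",
             "299.85", "299.74", "299.90", "300.07"),
           path)

# skip = 3 drops the header lines; the remainder is parsed as a table.
demo <- read.table(path, skip = 3)

str(demo)       # one numeric column, automatically named V1
is.list(demo)   # a data frame is a list of equal-length columns
```

With the real file, the only change should be the file name and the number of header lines to skip.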

We can take a look at the dataset by simply typing its name at the prompt. Here you can see that R automatically assigned the name V1 to the single column of data.

```
> C
V1
1 299.85
2 299.74
3 299.90
4 300.07
...
```
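Printing all 100 rows is rarely necessary. The sketch below shows a few shorter ways to inspect the data frame. It assumes that R's built-in morley data set (Michelson's 1879 measurements, stored as km/s minus 299000) can stand in for Michelso.dat; the column name speed is a hypothetical choice.

```r
# Assumed stand-in for the file: rescale the built-in morley data
# to the same units as the listing above (thousands of km/s).
C <- data.frame(V1 = 299 + morley$Speed / 1000)

head(C, 4)            # first four rows only
dim(C)                # number of rows and columns
str(C)                # compact overview of the structure

names(C) <- "speed"   # hypothetical, more descriptive column name
head(C$speed, 4)
```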

The summary() command in R provides the summary statistics: Min, 1st Qu., Median, Mean, 3rd Qu. and Max. We call this function with the argument 'C$V1', which tells R to act on the named variable, V1, in the data.frame C. (The options() commands set the output number formatting to something realistic: scipen=100 discourages scientific notation and digits=10 prints up to ten significant digits.)

```
> options(scipen=100)
> options(digits=10)
> summary(C$V1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
299.6200 299.8075 299.8500 299.8524 299.8925 300.0700
```
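The individual pieces of that summary can also be requested directly. The following sketch uses quantile(), IQR() and fivenum() from base R, again assuming the built-in morley data set is an acceptable stand-in for the file.

```r
# Assumed stand-in for C$V1 (built-in morley data, rescaled).
x <- 299 + morley$Speed / 1000

# Minimum, quartiles and maximum; quantile() defaults to type 7 interpolation.
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
q

IQR(x)       # 3rd quartile minus 1st quartile
fivenum(x)   # Tukey's five-number summary; its hinges can differ from quartiles
```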

Standard deviation, trimmed mean and number of data points can be obtained individually.

```
> sd(C$V1)
[1] 0.07901054782
> mean(C$V1, trim=0.05)
[1] 299.8528889
> length(C$V1)
[1] 100
[1] 100
```
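It can help to see what these functions compute. The sketch below reproduces the standard deviation and the 5% trimmed mean by hand, assuming the built-in morley data as a stand-in for the file. The names sd_manual, trimmed_manual and k are illustrative.

```r
# Assumed stand-in for C$V1 (built-in morley data, rescaled).
x <- 299 + morley$Speed / 1000
n <- length(x)

# Sample standard deviation: the divisor is n - 1, not n.
sd_manual <- sqrt(sum((x - mean(x))^2) / (n - 1))

# 5% trimmed mean: sort, drop floor(n * 0.05) values from each end, average.
k <- floor(n * 0.05)
trimmed_manual <- mean(sort(x)[(k + 1):(n - k)])

c(sd = sd_manual, trimmed = trimmed_manual)
```

With 100 observations, trim=0.05 discards the five smallest and five largest values, so the trimmed mean is the average of the middle 90.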

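If we want to get skewness and kurtosis, we'll need the fBasics package installed. With method="moment", kurtosis is reported on the scale where a normal distribution scores 3, rather than as excess kurtosis.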

```
> install.packages("fBasics")
> library(fBasics)
...
> skewness(C$V1, method="moment")
[1] -0.01798640563
attr(,"method")
[1] "moment"
> kurtosis(C$V1, method="moment")
[1] 3.198586275
attr(,"method")
[1] "moment"
```
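If installing a package is not an option, moment-based skewness and kurtosis can be sketched in base R. Conventions differ between packages (for example, whether the standard deviation uses n or n - 1), so treat this as one common definition rather than a guaranteed match for fBasics. It assumes the built-in morley data as a stand-in for the file and scales by the sample standard deviation.

```r
# Assumed stand-in for C$V1 (built-in morley data, rescaled).
x <- 299 + morley$Speed / 1000
n <- length(x)
d <- x - mean(x)   # deviations from the mean
s <- sd(x)         # sample standard deviation (n - 1 divisor), an assumption

skew_moment <- sum(d^3) / (n * s^3)   # near 0 for symmetric data
kurt_moment <- sum(d^4) / (n * s^4)   # near 3 for normal data
kurt_excess <- kurt_moment - 3        # excess kurtosis, near 0 for normal data

c(skewness = skew_moment, kurtosis = kurt_moment, excess = kurt_excess)
```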

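To determine a confidence interval for the mean, we can use the one-sample t-test. We can leave the hypothesized mean (the mu argument) at its default of 0, since the true value is not known in our case and has no effect on the confidence interval. This is why the t statistic and p-value in the output below are not meaningful here.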

```
> t.test(C$V1, conf.level=0.99)
One Sample t-test
data: C$V1
t = 37950.9329, df = 99, p-value < 0.00000000000000022
alternative hypothesis: true mean is not equal to 0
99 percent confidence interval:
299.8316486 299.8731514
sample estimates:
mean of x
299.8524
```
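The interval reported by t.test() is the sample mean plus or minus a critical t value times the standard error. A minimal sketch of that calculation follows, assuming the built-in morley data as a stand-in for the file; the variable names are illustrative.

```r
# Assumed stand-in for C$V1 (built-in morley data, rescaled).
x <- 299 + morley$Speed / 1000
n <- length(x)
se <- sd(x) / sqrt(n)   # standard error of the mean

conf_level <- 0.99
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # two-sided critical value

ci_manual <- mean(x) + c(-1, 1) * t_crit * se
ci_manual

# The same interval extracted from the t-test object.
ci_ttest <- t.test(x, conf.level = conf_level)$conf.int
ci_ttest
```

Extracting $conf.int is also a convenient way to use the interval in later calculations instead of reading it off the printed output.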

Another method for obtaining much of this information in a single step can be found in the stat.desc() function from the pastecs package.

```
> install.packages("pastecs")
> library(pastecs)
...
> options(scipen=100)
> options(digits=4)
> stat.desc(C)
V1
nbr.val 100.0000000
nbr.null 0.0000000
nbr.na 0.0000000
min 299.6200000
max 300.0700000
range 0.4500000
sum 29985.2400000
median 299.8500000
mean 299.8524000
SE.mean 0.0079011
CI.mean.0.95 0.0156774
var 0.0062427
std.dev 0.0790105
coef.var 0.0002635
```
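Several of these rows are easy to rebuild in base R, which also clarifies what they mean. In particular, CI.mean.0.95 is the half-width of the 95% confidence interval, not the interval itself. The sketch below assumes the built-in morley data as a stand-in for the file; the formulas are consistent with the output above but are not taken from the pastecs source.

```r
# Assumed stand-in for C$V1 (built-in morley data, rescaled).
x <- 299 + morley$Speed / 1000
n <- length(x)
se <- sd(x) / sqrt(n)

desc <- c(
  nbr.val      = n,
  min          = min(x),
  max          = max(x),
  range        = diff(range(x)),
  median       = median(x),
  mean         = mean(x),
  SE.mean      = se,
  CI.mean.0.95 = qt(0.975, df = n - 1) * se,   # half-width of the 95% interval
  var          = var(x),
  std.dev      = sd(x),
  coef.var     = sd(x) / mean(x)
)
round(desc, 7)
```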

We'll look at the generation of some standard statistical plots for exploratory data analysis in a future post.

Caveat lector — All work and ideas presented here may not be accurate and should be verified before application.
