Category Archives: Statistics

WLF Curve Fitting

A short script to fit a WLF (Williams-Landel-Ferry) model to experimental shift-factor data. The fit is flaky: it depends on the initial guesses, and convergence is poor, so we simply stop the iteration once we have an acceptable fit. This is not robust!
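
For reference (the original post does not write it out), the WLF shift-factor relation being fitted, with a reference temperature of 104 °F as used in the script below, is

$$\log a_T = \frac{-C_1\,(T - T_{ref})}{C_2 + (T - T_{ref})}$$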


options(scipen = 10)

# Shift factor data: temperature (deg F) and measured shift factor aT
aT <- structure(list(Temp.F = c(64.4, 86, 104, 122, 149, 176, 203,
    230, 257, 284, 311), aT = c(16.7, 1.76, 1, 0.358, 0.00965, 0.000722,
    0.000000308, 0.0000000733, 0.00000000357, 0.000000000202, 0.0000000000191
    )), .Names = c("Temp.F", "aT"), class = "data.frame", row.names = c(NA,
    -11L))

# Fit the WLF model with a reference temperature of 104 deg F.
# Note that log() is the natural log, so the fitted C1 is ln(10) times
# the conventional base-10 WLF C1; C2 is unaffected.
at.nls <- nls(log(aT) ~ (-C1 * (Temp.F - 104)) / (C2 + (Temp.F - 104)),
    data = aT,
    start = list(C1 = 100, C2 = 202),
    trace = 1,
    # weights = c(10, rep(1, nrow(aT) - 1)),
    control = nls.control(maxiter = 3,
        tol = 1e-16,
        minFactor = 1/1000000000024,
        warnOnly = TRUE))

# Plot the data and the fit on a log scale
plot(log(aT$aT) ~ aT$Temp.F)
lines(predict(at.nls) ~ aT$Temp.F)

# Extract the fitted constants and replot on the original scale
C1 <- summary(at.nls)$coefficients[[1]]
C2 <- summary(at.nls)$coefficients[[2]]
plot(aT$aT ~ aT$Temp.F, log = "y", ylim = c(1e-20, 100), xlim = c(0, 500))
lines(exp(predict(at.nls)) ~ aT$Temp.F)
curve(exp((-C1 * (x - 104)) / (C2 + (x - 104))), add = TRUE)
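
If convergence is a persistent problem, one possible alternative (not part of the original script) is the Levenberg-Marquardt routine in the minpack.lm package, which tends to be less sensitive to starting values. A minimal sketch, assuming that package is installed and the aT data frame from above:

library(minpack.lm)

# Same WLF model, fit with Levenberg-Marquardt instead of Gauss-Newton
at.nlslm <- nlsLM(log(aT) ~ (-C1 * (Temp.F - 104)) / (C2 + (Temp.F - 104)),
                  data = aT,
                  start = list(C1 = 100, C2 = 202))
summary(at.nlslm)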

Average of multiple curves

This R script is useful for creating an average curve from multiple curves of experimental material property data. It simply breaks the combined data into discrete chunks and averages the values within each chunk; the size of the averaging window is adjustable. It also generates a lowess fit of the resulting dataset, which applies additional smoothing if desired.


# R Script to compute an average curve from material test data
options(digits = 5)

# Read raw data (data should be in 2 columns) and sort by x-value.
# Data should be offset to the origin, and only a single (0,0) point should be included.
Data <- read.csv("Z:/Christopher/15/141531/Materials/Planar_23.csv", header=FALSE)
Data.sort <- Data[order(Data$V1),]

# Define the number of averaged points to generate; "samples" is the
# number of raw data points averaged into each one
points <- 19
samples <- floor((length(Data.sort$V1) - 1) / points)

# Initialize the result matrix; the first row is reserved for the origin point (0,0)
Result <- matrix(nrow = points, ncol = 2)
Result[1,] <- c(0,0)

# Loop over the data, computing the average of each group of data points
# defined by the width of the sample window
i <- 1
for (x in 2:points) {
  V1 <- V2 <- 0.0
  for (y in 1:samples) {
    V1 <- V1 + Data.sort[y + i, 1]
    V2 <- V2 + Data.sort[y + i, 2]
  }
  i <- i + samples
  Result[x, ] <- c(V1 / samples, V2 / samples)
}

# Plot the raw data and the averaged curve in red
plot(Data.sort, pch = 1)
lines(Result, col = "red", lwd = 3)

# Create a lowess smoothed curve and plot in blue.
# This is helpful to remove noise in the average especially with a
# large number of sampled points
Smooth <- (lowess(Result, f = .3))
lines(Smooth, col = "blue", lwd = 3)

# Convert the lowess output to a two-row matrix (x and y), pin the first
# point to the origin, and export both curves as tab-delimited text files
Smooth <- do.call(rbind, lapply(Smooth, unlist))
Smooth[, 1] <- c(0, 0)
write(t(Result), file = "Planar_23_avg.txt", ncolumns = 2, sep = "\t")
write(Smooth, file = "Planar_23_smth.txt", ncolumns = 2, sep = "\t")

[Plot: raw data with the averaged curve (red) and the lowess-smoothed curve (blue)]
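
As an aside (not part of the original script), the same chunk-averaging can be written more compactly with cut() and tapply(). This sketch assumes the Data.sort and points objects defined above and, unlike the loop, does not pin the first averaged point to the origin:

# Assign each sorted point to one of "points" equal-width index bins,
# then average x and y within each bin
bins <- cut(seq_along(Data.sort$V1), breaks = points, labels = FALSE)
Result2 <- cbind(tapply(Data.sort$V1, bins, mean),
                 tapply(Data.sort$V2, bins, mean))
lines(Result2, col = "green", lwd = 2)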

Calculating Mooney-Rivlin Constants

In this post we will look at the procedure for determining the Mooney-Rivlin constants from simple tensile test data for an elastomeric solid. The definition and derivation of the material model are left to others. For our purposes, all we need to know is that the material model predicts an engineering stress under simple tension of

$$\sigma = \left(2C_1 + \frac{2C_2}{\lambda}\right)\left(\lambda - \frac{1}{\lambda^2}\right)$$

where C1 and C2 are the constants that need to be determined and $\lambda$ is the stretch ratio, defined as the ratio of the stretched length to the initial length of the sample. The stretch ratio can be written in terms of the strain measured in a simple tensile test as

$$\lambda = 1 + \epsilon$$

We’ll start with a set of published stress-strain data for a 40 Shore A material from GLS Corporation. The stress-strain curve from the literature is shown below.

Using g3data we can extract the points and tabulate them. For simplicity, we also convert the strain to stretch ratio and the stress to SI units. The data file can be downloaded here.

Strain    Stress (psi)    Stretch Ratio (λ)    Stress (Pa)
0.0         0.0             1                  0
0.5       141.2             1.5                9.74 x 10^5
1.0       204.9             2                  1.41 x 10^6
2.0       299.8             3                  2.07 x 10^6
3.0       367.6             4                  2.53 x 10^6
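
(If the linked data file is unavailable, the same values, already converted to SI units, can be entered into R directly; this is an equivalent alternative to step 1 below and is not part of the original post.)

> SS_Data <- data.frame(Strain = c(0, 0.5, 1.0, 2.0, 3.0),
+                       Stress = c(0, 974000, 1410000, 2070000, 2530000))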

Using R to perform the regression is easy:

1. Read in the data from the file G7940.dat.

> SS_Data <- read.table("c:/G7940.dat", header = TRUE)

2. Examine the imported data.

> SS_Data
  Strain  Stress
1    0.0       0
2    0.5  974000
3    1.0 1410000
4    2.0 2070000
5    3.0 2530000

3. The "attach" command allows us to access the variables directly without having to reference the original structure.

> attach(SS_Data)

4. A quick plot of the imported data can then be generated.

> plot(Strain, Stress)

5. Since the MR model uses the stretch ratio, not the strain, we convert the strains and then plot the stress vs. stretch ratio.

> Stretch = Strain + 1
> plot (Stretch, Stress)


6. Now for the curve fitting itself. We use the "nls" function, which stands for "nonlinear least squares". We provide the model given in the equation at the beginning of the post, where C1 and C2 are the two constants we wish to determine, along with initial guesses for those constants, and request that the trace be printed. Then we ask for a summary of the results, which prints the fitted values of our two constants along with some other helpful data.

> MR.fit <- nls(Stress ~ (2*C1+2*C2/Stretch)*(Stretch-1/(Stretch^2)),
+ start=list(C1=100,C2=100), trace=T)
1.361224e+13 :  100 100 
2117860971 :  239897.5 335348.9 
> summary(MR.fit)
Formula: Stress ~ (2 * C1 + 2 * C2/Stretch) * (Stretch - 1/(Stretch^2))
Parameters:
   Estimate Std. Error t value Pr(>|t|)    
C1   239898       8001   29.98 8.15e-05 ***
C2   335349      23840   14.07 0.000778 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 26570 on 3 degrees of freedom

Number of iterations to convergence: 1 
Achieved convergence tolerance: 5.802e-07 

7. Lastly, we would like to see how our curve fit matches the data. First we extract the coefficients into a vector "C". With the plot shown above still open, we add the fitted curve and clean it up with a title. Notice the method for calling the coefficients.

> C <- coef(MR.fit)
> curve((2*C[1]+2*C[2]/x)*(x-1/(x^2)), from=1.0, to = 4.0, add=TRUE)
> title(main="Mooney-Rivlin Fit to Simple Tensile Data")
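
As a quick sanity check (not part of the original post), the fitted values can be compared against the measured stresses, and the small-strain shear modulus implied by a Mooney-Rivlin fit, 2(C1 + C2), can be computed from the same coefficient vector:

> cbind(Stretch, Stress, Fitted = predict(MR.fit))
> 2 * (C[1] + C[2])   # implied small-strain shear modulus in Pa (roughly 1.15e6 here)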


Caveat lector — All work and ideas presented here may not be accurate and should be verified before application.

Basic R – Univariate Dataset Graphics in R, Exploratory Data Analysis

As a continuation of this post, we extend the analysis of univariate data by generating graphical views of the Michelson speed-of-light data.

The simplest method to view the distribution is the stem and leaf plot.


> stem(C$V1)

  The decimal point is 1 digit(s) to the left of the |

  2996 | 2
  2996 | 5
  2997 | 222444
  2997 | 566666788999
  2998 | 000001111111111223344444444
  2998 | 5555555566677778888888888999
  2999 | 0011233444
  2999 | 55566667888
  3000 | 000
  3000 | 7

We can also create a boxplot of the data. Notice the method used to insert a mathematical expression in the axis label.


> boxplot(C)
> title(ylab=expression(paste(m/sec," ",plain( x )," ",10^3)))
> title("Boxplot of Speed of Light")

Which gives:

[Boxplot of Speed of Light]

And a histogram gives us a graphical view similar to the stem and leaf plot.


> hist(C$V1, main="Histogram of Data", xlab=expression(paste(m/sec," ",plain( x )," ",10^3)))

[Histogram of the data]

It’s always a good idea to generate a Q-Q Plot to check for normality.


> qqnorm(C$V1)

[Normal Q-Q plot of the data]
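
Adding a reference line to the Q-Q plot (not shown in the original post) makes departures from normality easier to judge:

> qqline(C$V1)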


Caveat lector — All work and ideas presented here may not be accurate and should be verified before application.

Basic R — Descriptive Statistics of Univariate Data

This is a basic introductory look at using R to generate descriptive statistics for a univariate data set. Here we will use the historical dataset from Michelson’s experiment to determine the speed of light in air, provided as an ASCII file with header content and the observed speed of light for 100 trials.

We first need to read the data into R. Since the data is in a properly formatted ASCII file, we only need to tell R to skip the first 60 lines, which are header information. R will then import the data into an object of class data.frame.


>C <- read.table("Michelso.dat",skip=60)

We can take a look at the dataset by simply typing the dataset name at the prompt. Here you can see that R automatically assigned the variable V1 to the data.


> C
        V1
1   299.85
2   299.74
3   299.90
4   300.07
...

The summary() command in R provides the summary statistics: Min., 1st Qu., Median, Mean, 3rd Qu., and Max. We call this function with the argument C$V1, which tells R to act on the named variable, V1, in the data.frame C. (The options commands set the output number formatting to something realistic.)


> options(scipen=100)
> options(digits=10)
> summary(C$V1)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
299.6200 299.8075 299.8500 299.8524 299.8925 300.0700 

Standard deviation, trimmed mean and number of data points can be obtained individually.


>sd(C$V1)
[1] 0.07901054782
>mean(C$V1,trim=0.05)
[1] 299.8528889
>length(C$V1)
[1] 100

If we want skewness and kurtosis, we'll need the fBasics package installed:


> install.packages("fBasics")
> library(fBasics)
...
>skewness(C$V1, method="moment")
[1] -0.01798640563
attr(,"method")
[1] "moment"
>kurtosis(C$V1, method="moment")
[1] 3.198586275
attr(,"method")
[1] "moment"

To determine confidence intervals on the mean, we can use the one-sample t-test. We can ignore the hypothesized mean to test against, since in our case it is not known (nor relevant for confidence interval estimation).


> t.test(C$V1, conf.level=0.99)

	One Sample t-test

data:  C$V1 
t = 37950.9329, df = 99, p-value < 0.00000000000000022
alternative hypothesis: true mean is not equal to 0 
99 percent confidence interval:
 299.8316486 299.8731514 
sample estimates:
mean of x 
 299.8524
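
For reference (not part of the original post), the same interval can be reproduced by hand from the mean, the standard error, and the appropriate t quantiles; the result should match the t.test output above.

> mean(C$V1) + qt(c(0.005, 0.995), df = length(C$V1) - 1) * sd(C$V1)/sqrt(length(C$V1))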

Another method for obtaining much of this information in a single step can be found in the stat.desc() function from the pastecs package.


> install.packages("pastecs")
> library(pastecs)
...
> options(scipen=100)
> options(digits=4)
> stat.desc(C)
                        V1
nbr.val        100.0000000
nbr.null         0.0000000
nbr.na           0.0000000
min            299.6200000
max            300.0700000
range            0.4500000
sum          29985.2400000
median         299.8500000
mean           299.8524000
SE.mean          0.0079011
CI.mean.0.95     0.0156774
var              0.0062427
std.dev          0.0790105
coef.var         0.0002635

We'll look at the generation of some standard statistical plots for exploratory data analysis in a future post.


Caveat lector — All work and ideas presented here may not be accurate and should be verified before application.

Treating Attribute Data

When speaking of attribute data, in this case we are concerning ourselves with the measurement of a certain quantity that can take on one of two values: TRUE or FALSE; 0, 1; Pass, Fail; Heads, Tails; etc. — a binary output. Probability theory gives us a simple tool for the analysis of such a system: the binomial distribution.

Note: The binomial distribution is valid for sampling with replacement. It is a good approximation for sampling without replacement when the parent population is large. For small populations, the hypergeometric distribution should be used.

The binomial distribution is defined by the probability mass function (PMF)

$$P(n) = \binom{N}{n}\,p^n (1 - p)^{N - n}$$

which gives the discrete probability of n outcomes in N trials, where N is the number of independent experiments yielding a binary result, n is the number of occurrences of the specified binary outcome in those N trials, and p is the probability that the specified outcome will occur in a single trial.

When performing experiments to determine a product failure rate, or to verify that a product will meet a specified reliability rating, we are more interested in the probability that the specified number of outcomes or fewer will occur. For this we use the cumulative probability function for the binomial distribution, defined in our case as the probability that n or fewer outcomes will occur. This is simply the sum of the probabilities that x outcomes will occur for x = 0, 1, ..., n:

$$P(x \le n) = \sum_{x=0}^{n} \binom{N}{x}\, p^x (1 - p)^{N - x}$$

With this background, let's lay out the problem. A newly designed product must meet a certain reliability rating, defined as a maximum percentage of unit failures during operation. Our goal is to determine through testing, to a certain confidence level, whether the product meets this criterion. For this, we need to approach the binomial distribution from the inside out. What we are actually specifying when we define a reliability criterion is the probability that the device will fail, or p in the equations above. Much like a six-sided die, which has a probability of 1/6 of presenting a given number on each roll and will, on average, present that number one out of six times over a large number of trials, we wish our product to have a specified (typically low) probability of failure. The question then becomes: how do we test for this?

Having defined the desired probability of failure for the parent population, p, and recognizing that we will accept failure probabilities that are lower, but not higher, we can see that the cumulative probability $P(x \le n)$ gives the chance of n or fewer failures occurring in N trials of a product whose probability of failure is p. Thus the cumulative probability yields the likelihood that we would see n or fewer failures purely by chance, even if the true failure probability were as high as p. Therefore $1 - P(x \le n)$ is the probability that more than n failures would present in N trials where the probability of failure for each trial is p; we call this value our confidence.

Now we can specify our confidence that the probability of failure of each individual device is no greater than p. The question then becomes: how many trials, N, must be run, and how many failures, n, are allowable in those trials? By specifying either N or n, the other can be found directly from the formula. For example, if we specify that we want to determine to a 99% confidence that 5% or less of the devices in the parent population will fail during operation (p = 0.05), and we are willing to allow one failure during our testing (n = 1), then the planned number of test samples needs to be 130.

Unfortunately, the total number of trials cannot always be specified in advance, because the laws of chance may dictate that a failure occurs in the first few samples. It is therefore prudent to plan for at least one failure during testing. We can compute the number of trials required for both zero and one allowed failure, and halt the testing early if no failures have been recorded after the smaller number of trials. In the example above, that would equate to 90 test samples.
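
These sample counts are easy to reproduce in R with pbinom(), the cumulative binomial probability. The sketch below is not part of the original post, and the function name is only for illustration; it searches for the smallest N that satisfies the confidence requirement:

# Smallest N such that the probability of seeing n or fewer failures,
# when the true per-unit failure probability is p, is at most 1 - conf
req.samples <- function(p, conf, n) {
  N <- n + 1
  while (pbinom(n, N, p) > 1 - conf) N <- N + 1
  N
}
req.samples(0.05, 0.99, 0)   # 90 trials, no failures allowed
req.samples(0.05, 0.99, 1)   # 130 trials, one failure allowed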

It is useful to create a table similar to that shown below to better understand the testing requirements of a given situation. The table is set up in a slightly different format so that the problem can be stated another way: here we look at the confidence, C, in the reliability (the probability that a failure will not occur, 1 − p), given a number of observed failures, n, in the total opportunities, N.

[Table of Required Samples for Attribute Testing]

Caveat lector — All work and ideas presented here may not be accurate and should be verified before application.