Multivariable Regression for Concrete Compression Testing in R

Today’s project builds a linear model that describes how the ingredients of a concrete mix influence its ability to withstand compressive loads. The model was built in R.

The data comes from Chung-Hua University, Taiwan. The input variables are cement, slag, fly ash, water, superplasticizer (SP), coarse aggregate and fine aggregate, each measured in kg/m³ of concrete. The output variable is compressive strength after 28 days, measured in MPa.

Results show that water is the strongest influencer of compressive strength and slag is the weakest. Superplasticizer had little to no impact and was removed from the model entirely. The compressive strength was found to follow the equation below:

 Compressive strength (MPa)
 = 0.04970*(Cement) - 0.04519*(Slag) + 0.03859*(Fly ash) - 0.27055*(Water) - 0.06986*(Coarse Aggregate) - 0.05358*(Fine Aggregate)
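
One way to read the fitted equation is as a set of marginal effects: with the other ingredients held fixed, each coefficient is the predicted change in 28-day strength (in MPa) per additional kg/m³ of that ingredient. The sketch below simply does that arithmetic in R. The 10 kg/m³ increment is an arbitrary illustrative value, not something taken from the dataset.

```r
# Coefficients copied from the fitted equation (MPa per kg/m^3).
coefs <- c(Cement = 0.04970, Slag = -0.04519, Fly.ash = 0.03859,
           Water = -0.27055, CA = -0.06986, FA = -0.05358)

# Hypothetical increment: 10 kg/m^3 more of one ingredient, all else fixed.
delta <- 10

# Predicted change in compressive strength (MPa) for each ingredient.
effect <- coefs * delta
print(effect)
```

For example, adding 10 kg/m³ of water lowers the predicted strength by about 2.7 MPa, while the same amount of extra cement raises it by about 0.5 MPa.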

[Figure: Normalized Histogram of Residuals]

The coefficient of determination indicates a strong fit (R² = 0.8962), and the p-values are low for each variable. The normalized histogram shows an approximately normal distribution of residuals. This supports the linear model and makes systematic error unlikely.
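
As a minimal sketch of how these checks can be read off a fitted model in R: the example below extracts R², the per-variable p-values and the standardized residuals. Since the concrete data is not reproduced here, it uses R's built-in `mtcars` dataset purely as a stand-in. The model formula and the Shapiro-Wilk test are illustrative assumptions, not part of the original analysis.

```r
# Stand-in data: built-in mtcars, NOT the concrete dataset.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient of determination (R^2) and per-coefficient p-values.
s <- summary(fit)
r_squared <- s$r.squared
p_values <- s$coefficients[, "Pr(>|t|)"]

# Standardized residuals: roughly N(0, 1) if the model assumptions hold.
r <- rstandard(fit)

# Shapiro-Wilk test as a numeric complement to the histogram
# (a large p-value means no evidence against normality).
sw <- shapiro.test(r)

print(r_squared)
print(p_values)
print(sw$p.value)
```

A histogram is a quick visual check. A formal test or a Q-Q plot (`qqnorm(r)`) can back it up when the sample is small.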

The problem was approached by first fitting a multivariable linear regression on all the input variables:

[Figure: Initial Regression]

The coefficient of determination is high. Some of the p-values, however, do not show strong evidence against the null hypothesis, notably those for slag, fine aggregate and SP. Fortunately, the step() function automates variable selection: it compares candidate models by AIC and keeps only the variables that improve the model.
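
To make the role of `step()` more concrete: it does not filter on p-values. It compares candidate models by AIC and, by default, drops terms one at a time for as long as AIC keeps improving. The sketch below uses simulated data so it runs without the concrete dataset. The variable names `x1`, `x2` and `noise` and the true coefficients are made up for illustration.

```r
set.seed(1)  # simulated data, for illustration only
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
d$y <- 2 * d$x1 - 3 * d$x2 + rnorm(n)  # 'noise' has no true effect on y

# Full model with every candidate predictor.
full <- lm(y ~ x1 + x2 + noise, data = d)

# Backward stepwise selection by AIC; trace = 0 silences the step log.
reduced <- step(full, trace = 0)

print(formula(reduced))                             # terms kept by step()
print(c(full = AIC(full), reduced = AIC(reduced)))  # lower is better
```

Because the criterion is AIC rather than a significance threshold, a variable with a modest p-value can survive selection if dropping it would make the AIC worse.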

[Figure: Final Regression]

The coefficients are listed in the Estimate column. The full code is as follows:

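#Multivariable regression of Concrete Compression Test
#By Matthew Mano
#import data (file name is a placeholder; the original path is not shown)
concrete<-read.csv("concrete.csv")
#remove incomplete tests
concretec<-na.omit(concrete)
#generate linear model
concreter<-lm(CS~Cement+Slag+Fly.ash+Water+SP+CA+FA, data=concretec)
#get information of initial model
summary(concreter)
#remove unnecessary variables (stepwise selection by AIC)
#the name concretes is assumed; the original name is not shown
concretes<-step(concreter)
#get information of secondary model
summary(concretes)
#standardized residuals of the final model (assumed definition of r)
r<-rstandard(concretes)
#graphing residuals in histogram
hist(r, prob=TRUE, main="Normalized Histogram of Residuals", xlab="Standard Deviations")
#adding reference normal curve
curve(dnorm(x, mean=mean(r), sd=sd(r)), add=TRUE, col="red")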


The links to the code, csv file and original dataset are attached. If you have any ideas for improvement or would like to get in contact, please comment or email me directly at

Link to code & csv:
Link to original data:
