# Descriptive statistics for a numeric variable using SAS and R tidyverse

This post is part of the 'SASnR | Descriptive statistics' series.

When working with data, we frequently want to summarize the information for numeric variables to understand the central tendency and variation.

Both SAS and the R tidyverse provide features for computing such descriptive statistics.

Let us assume that we have the below input data for 19 students in a class.

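| Name    | Sex | Age | Height | Weight |
|---------|-----|-----|--------|--------|
| Alice   | F   | 13  | 56.5   | 84     |
| Barbara | F   | 13  | 65.3   | 98     |
| Carol   | F   | 14  | 62.8   | 102.5  |
| Jane    | F   | 12  | 59.8   | 84.5   |
| Janet   | F   | 15  | 62.5   | 112.5  |
| Joyce   | F   | 11  | 51.3   | 50.5   |
| Judy    | F   | 14  | 64.3   | 90     |
| Louise  | F   | 12  | 56.3   | 77     |
| Mary    | F   | 15  | 66.5   | 112    |
| Alfred  | M   | 14  | 69     | 112.5  |
| Henry   | M   | 14  | 63.5   | 102.5  |
| James   | M   | 12  | 57.3   | 83     |
| Jeffrey | M   | 13  | 62.5   | 84     |
| John    | M   | 12  | 59     | 99.5   |
| Philip  | M   | 16  | 72     | 150    |
| Robert  | M   | 12  | 64.8   | 128    |
| Ronald  | M   | 15  | 67     | 133    |
| Thomas  | M   | 11  | 57.5   | 85     |
| William | M   | 15  | 66.5   | 112    |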

Let us assume that we need the number of students and the mean and standard deviation of Height, computed separately for males and females, as shown below.

| Sex | n  | mean     | sd       |
|-----|----|----------|----------|
| F   | 9  | 60.58889 | 5.018328 |
| M   | 10 | 63.91    | 4.937937 |
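
For reference, the sd column is the sample standard deviation, which divides by n - 1 rather than n; this is the default for both the `std` keyword in SAS and the `sd()` function in R. The symbols below are introduced here only for illustration: $x_i$ are the heights within one group and $n$ is the number of students in that group.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
$$

As a quick check, the nine female heights in the input data sum to 545.3, so the mean is 545.3 / 9 ≈ 60.58889, which agrees with the first row of the table.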

There are multiple ways of obtaining this result in both SAS and R. Below is one such approach.

## SAS code

```sas
data class;
    set sashelp.class;
run;

proc sort data=class;
    by sex;
run;

proc summary data=class;
    by sex;
    var height;
    output out=stats01 n=n mean=mean std=sd;
run;
```

• proc summary (or proc means) can be used to obtain descriptive statistics for numeric variables
• As we need separate statistics for males and females, we specify the grouping variable on the by statement in proc summary; by-group processing expects the data to be sorted by that variable, which is why proc sort is run first
• The numeric variable(s) for which descriptive statistics have to be obtained should be specified on the var statement
• The name of the output dataset in which the statistics have to be stored has to be specified on the out= option of the output statement
• The required summary statistics have to be specified on the output statement. On the left hand side of the '=' sign, we specify the statistic keyword, and on the right hand side, the name of the variable in which the statistic has to be saved

## R tidyverse code

```r
#Load required libraries
library(tidyverse)
library(haven)

class <- haven::read_sas("class.sas7bdat")

groups <- group_by(class, Sex)

stats01 <- summarize(groups, n = n(), mean = mean(Height), sd = sd(Height))
```

• The tidyverse and haven packages are loaded using the library function
• The SAS dataset named "class" is read into R using the read_sas function of the haven package
• As we need a summary by sex, the group_by function is used to create groups based on the values of the Sex variable
• The summarize function is used to summarize the data within each group
• The functions n, mean, and sd are used to get the required summary statistics. On the RIGHT hand side of the '=' sign, we specify the function call that computes the statistic, and on the LEFT hand side, the name of the variable in which the statistic has to be saved (the reverse of the SAS output statement)
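
The steps above can also be written as a single pipeline. The sketch below is one possible variant, not the only way to do it: it assumes the dplyr package is installed, and instead of reading "class.sas7bdat" it types the Sex and Height columns in by hand from the input table so that the example runs without the SAS file.

```r
# Load dplyr for group_by(), summarize(), n() and the %>% pipe
library(dplyr)

# Assumption: Sex and Height are entered manually from the input table above,
# in the same order (9 females followed by 10 males)
class <- data.frame(
  Sex = c(rep("F", 9), rep("M", 10)),
  Height = c(56.5, 65.3, 62.8, 59.8, 62.5, 51.3, 64.3, 56.3, 66.5,
             69, 63.5, 57.3, 62.5, 59, 72, 64.8, 67, 57.5, 66.5)
)

# Group by Sex, then compute count, mean and sample standard deviation of Height
stats01 <- class %>%
  group_by(Sex) %>%
  summarize(n = n(), mean = mean(Height), sd = sd(Height))

print(stats01)
```

The pipe passes the result of each step as the first argument of the next, so the intermediate `groups` object is no longer needed. The printed result should match the expected table near the top of the post, although tibble printing may round the displayed digits.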