Descriptive statistics for a numeric variable using SAS and R tidyverse


This post is part of the 'SASnR | Descriptive statistics' series.

When working with data, we frequently want to summarize numeric variables to understand their central tendency and variation.

Both SAS and R provide features for computing such descriptive statistics.

Let us assume that we have the following input data for 19 students in a class.


Name     Sex  Age  Height  Weight
Alice    F    13   56.5    84
Barbara  F    13   65.3    98
Carol    F    14   62.8    102.5
Jane     F    12   59.8    84.5
Janet    F    15   62.5    112.5
Joyce    F    11   51.3    50.5
Judy     F    14   64.3    90
Louise   F    12   56.3    77
Mary     F    15   66.5    112
Alfred   M    14   69      112.5
Henry    M    14   63.5    102.5
James    M    12   57.3    83
Jeffrey  M    13   62.5    84
John     M    12   59      99.5
Philip   M    16   72      150
Robert   M    12   64.8    128
Ronald   M    15   67      133
Thomas   M    11   57.5    85
William  M    15   66.5    112


Let us assume that we need the number of students and the mean and standard deviation of Height, computed separately for males and females, as shown below.

Sex  n   mean      sd
F    9   60.58889  5.018328
M    10  63.91     4.937937
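
To make the numbers in this table concrete, here is a minimal sketch in base R that reproduces the row for females by hand. It assumes the nine female heights are copied from the input table, and the variable names (height_f, n_f, mean_f, sd_f) are made up for this illustration. The standard deviation here is the sample standard deviation (divisor n - 1), which is what R's sd function computes and what SAS's std keyword computes by default.

```r
# Heights of the nine female students, copied from the input table
height_f <- c(56.5, 65.3, 62.8, 59.8, 62.5, 51.3, 64.3, 56.3, 66.5)

# Number of observations and arithmetic mean
n_f    <- length(height_f)
mean_f <- sum(height_f) / n_f

# Sample standard deviation: sum of squared deviations divided by n - 1
sd_f <- sqrt(sum((height_f - mean_f)^2) / (n_f - 1))

print(c(n = n_f, mean = mean_f, sd = sd_f))

# Cross-check the hand computation against the built-in functions
stopifnot(isTRUE(all.equal(mean_f, mean(height_f))))
stopifnot(isTRUE(all.equal(sd_f, sd(height_f))))
```

The printed values should agree with the F row of the table up to rounding.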


There are multiple ways of obtaining this result in both SAS and R. Below is one such approach.

SAS code


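/* Copy the sample dataset into the WORK library */
data class;
    set sashelp.class;
run;

/* BY-group processing requires data sorted by the BY variable */
proc sort data=class;
    by sex;
run;

/* Compute n, mean and standard deviation of height within each sex */
proc summary data=class;
    by sex;
    var height;
    output out=stats01 n=n mean=mean std=sd;
run;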

 

  • Proc summary or proc means can be used to obtain descriptive statistics for numeric variables.
  • As we need separate statistics for males and females (by sex), we specify that using the by statement in proc summary. The data must be sorted by the same variable first, which is why proc sort is run beforehand.
  • The numeric variable(s) for which descriptive statistics have to be obtained should be specified on the var statement.
  • The name of the output dataset in which the statistics have to be stored is specified with the out= option of the output statement.
  • The required summary statistics are specified on the output statement. On the left-hand side of the '=' sign, we specify the statistic keyword and on the right-hand side, the name of the variable into which the statistic is saved.

R tidyverse code


# Load required libraries
library(tidyverse)
library(haven)

# Read the SAS dataset into R
class <- haven::read_sas("class.sas7bdat")

# Create groups based on the values of Sex
groups <- group_by(class, Sex)

# Compute n, mean and standard deviation of Height within each group
stats01 <- summarize(groups, n = n(), mean = mean(Height), sd = sd(Height))

  • The tidyverse and haven packages are loaded using the library function.
  • The SAS dataset named "class" is read into R using the read_sas function of the haven package.
  • As we need a summary by sex, the group_by function is used to create groups based on the values of the Sex variable.
  • The summarize function is used to summarize the data within groups.
  • The functions n, mean, and sd are used to get the required summary statistics. On the right-hand side of the '=' sign, we specify the statistic function and on the left-hand side, the name of the variable into which the statistic is saved. Note that this is the reverse of the SAS output statement.
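
The same group_by and summarize steps can also be chained with the pipe operator, which avoids the intermediate groups object. The sketch below is one possible variant, not the only way to write it. To keep it runnable without class.sas7bdat, it assumes an inline copy of the Sex and Height columns from the input table; the names class_inline and stats_piped are made up for this example.

```r
library(dplyr)

# Inline copy of the Sex and Height columns from the input table,
# used here in place of reading class.sas7bdat
class_inline <- data.frame(
  Sex    = c(rep("F", 9), rep("M", 10)),
  Height = c(56.5, 65.3, 62.8, 59.8, 62.5, 51.3, 64.3, 56.3, 66.5,
             69, 63.5, 57.3, 62.5, 59, 72, 64.8, 67, 57.5, 66.5)
)

# Group by Sex, then compute n, mean and sd of Height within each group
stats_piped <- class_inline %>%
  group_by(Sex) %>%
  summarize(n = n(), mean = mean(Height), sd = sd(Height))

print(stats_piped)
```

The result has one row per value of Sex, with the same n, mean and sd columns as stats01.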

 

The example class dataset (a SAS dataset) can be downloaded from here.

 




