When working with data, we frequently want to summarize numeric variables to understand their central tendency and variation.
To do this, we need language features that compute descriptive statistics for numeric variables.
Suppose we have the following input data for 19 students in a class.
| Name | Sex | Age | Height | Weight |
|---|---|---|---|---|
| Alice | F | 13 | 56.5 | 84 |
| Barbara | F | 13 | 65.3 | 98 |
| Carol | F | 14 | 62.8 | 102.5 |
| Jane | F | 12 | 59.8 | 84.5 |
| Janet | F | 15 | 62.5 | 112.5 |
| Joyce | F | 11 | 51.3 | 50.5 |
| Judy | F | 14 | 64.3 | 90 |
| Louise | F | 12 | 56.3 | 77 |
| Mary | F | 15 | 66.5 | 112 |
| Alfred | M | 14 | 69 | 112.5 |
| Henry | M | 14 | 63.5 | 102.5 |
| James | M | 12 | 57.3 | 83 |
| Jeffrey | M | 13 | 62.5 | 84 |
| John | M | 12 | 59 | 99.5 |
| Philip | M | 16 | 72 | 150 |
| Robert | M | 12 | 64.8 | 128 |
| Ronald | M | 15 | 67 | 133 |
| Thomas | M | 11 | 57.5 | 85 |
| William | M | 15 | 66.5 | 112 |
Suppose we need the number of students, the mean, and the standard deviation of Height, computed separately for females and males, as shown below.
| Sex | n | mean | sd |
|---|---|---|---|
| F | 9 | 60.58889 | 5.018328 |
| M | 10 | 63.91 | 4.937937 |
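To make explicit what the table reports, the following is a minimal sketch in base R that recomputes n, the mean, and the sample standard deviation (with the n - 1 denominator) from first principles. The heights are typed in from the input table; the helper name `describe` and the object name `manual` are hypothetical and only used for illustration.

```r
# Heights from the input table, in the same row order (9 females, then 10 males)
height <- c(56.5, 65.3, 62.8, 59.8, 62.5, 51.3, 64.3, 56.3, 66.5,
            69, 63.5, 57.3, 62.5, 59, 72, 64.8, 67, 57.5, 66.5)
sex <- rep(c("F", "M"), times = c(9, 10))

# Hypothetical helper: n, mean and sample SD computed by hand
describe <- function(x) {
  n <- length(x)
  m <- sum(x) / n
  s <- sqrt(sum((x - m)^2) / (n - 1))  # n - 1 denominator (sample SD)
  c(n = n, mean = m, sd = s)
}

# Apply the helper within each sex; the result has one row per sex
manual <- t(sapply(split(height, sex), describe))
print(manual)

# The built-in sd() uses the same n - 1 denominator, so the two should agree
builtin <- tapply(height, sex, sd)
print(all.equal(unname(manual[, "sd"]), as.vector(builtin)))
```

The hand-computed values should agree with the n, mean, and sd columns in the table above.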
There are multiple ways of obtaining this result in both SAS and R. Below is one such approach in each language, SAS first and then R.
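/* Copy the sample dataset into the WORK library */
data class;
  set sashelp.class;
run;

/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=class;
  by sex;
run;

/* Compute n, mean and standard deviation of height within each sex */
proc summary data=class;
  by sex;
  var height;
  output out=stats01 n=n mean=mean std=sd;
run;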
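# Load required libraries
library(tidyverse)
library(haven)
# Read the SAS dataset (assumes class.sas7bdat is in the working directory)
class <- haven::read_sas("class.sas7bdat")
# Group by sex, then compute n, mean and standard deviation of Height
groups <- group_by(class, Sex)
stats01 <- summarize(groups, n = n(), mean = mean(Height), sd = sd(Height))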
The example class dataset (a SAS dataset) can be downloaded from here.
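Since there are several ways to get this result, here is a sketch of one base R alternative that needs no additional packages. It assumes the data is available as a data frame with `Sex` and `Height` columns; to keep the example self-contained, those two columns are rebuilt inline from the input table rather than read from the SAS file. The object name `stats02` is hypothetical.

```r
# Sex and Height columns rebuilt from the input table (9 females, then 10 males)
class <- data.frame(
  Sex = rep(c("F", "M"), times = c(9, 10)),
  Height = c(56.5, 65.3, 62.8, 59.8, 62.5, 51.3, 64.3, 56.3, 66.5,
             69, 63.5, 57.3, 62.5, 59, 72, 64.8, 67, 57.5, 66.5)
)

# aggregate() applies the function to Height within each level of Sex
res <- aggregate(Height ~ Sex, data = class,
                 FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))

# The three statistics come back as a matrix column; flatten and rename
stats02 <- do.call(data.frame, res)
names(stats02) <- c("Sex", "n", "mean", "sd")
print(stats02)
```

Compared with the `group_by()` and `summarize()` approach, this trades readability for having no package dependencies; which one fits better depends on the rest of the workflow.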