*Copyright @ www.mycsg.in;
Understanding the _FREQ_ variable in PROC MEANS output
When `proc means` creates an output dataset, SAS automatically adds the variable `_FREQ_` (along with `_TYPE_`)
`_FREQ_` tells us how many input observations contributed to that summary row
This is not always the same as the `N` statistic for an analysis variable, because `N` counts only the non-missing values of that specific variable
Create a sample dataset
We set `height` to missing for the one student aged 16 so the difference between `_FREQ_` and `N` becomes visible
The variable `weight` remains non-missing for all observations
data class02;
  set sashelp.class;
  if age=16 then height=.;
run;
SAS Log
data class02;
  set sashelp.class;
  if age=16 then height=.;
run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS02 has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
`class02` still contains 19 observations
However, only 18 observations now have a non-missing value for `height`
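A quick way to confirm these counts is a small `proc sql` query. This is a minimal sketch that assumes `class02` was created as above; the column aliases `total_rows`, `nonmissing_height`, and `missing_height` are illustrative names and not part of the original example.

```sas
proc sql;
  select count(*)      as total_rows,        /* all rows, whether or not height is missing */
         count(height) as nonmissing_height, /* rows where height is not missing */
         nmiss(height) as missing_height     /* rows where height is missing */
  from class02;
quit;
```

The first column corresponds to what `_FREQ_` will report for the overall summary row, while the second and third correspond to `N` and `NMISS` for `height`.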
Create summary statistics and compare _FREQ_ with N and NMISS
`var height weight;` requests statistics for two numeric variables
`n=` stores the number of non-missing values for each analysis variable
`nmiss=` stores the number of missing values for each analysis variable
The automatically created `_FREQ_` variable stores the number of rows that contributed to the summary record itself
proc means data=class02;
  var height weight;
  output out=statsfreq2
    n=nheight nweight
    nmiss=nmissheight nmissweight
    mean=meanheight meanweight
    median=medianheight medianweight;
run;
SAS Log
proc means data=class02;
  var height weight;
  output out=statsfreq2
    n=nheight nweight
    nmiss=nmissheight nmissweight
    mean=meanheight meanweight
    median=medianheight medianweight;
run;

NOTE: There were 19 observations read from the data set WORK.CLASS02.
NOTE: The data set WORK.STATSFREQ2 has 1 observations and 10 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
`_FREQ_` should be 19 because all 19 input records contribute to the overall summary row
`nheight` should be 18 because one `height` value is missing
`nmissheight` should be 1 for the same reason
`nweight` should remain 19 because no weight values were set to missing
This example shows clearly that `_FREQ_` counts contributing rows while `N` counts non-missing values for a particular variable
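The counts described above can be inspected directly in the output dataset. The sketch below assumes `statsfreq2` was created by the `proc means` step shown earlier and that no `freq` or `weight` statement was used. It prints only the count columns and then checks that `N` plus `NMISS` equals `_FREQ_` for each analysis variable; the log messages are illustrative wording.

```sas
/* Show only the count columns from the output dataset */
proc print data=statsfreq2 noobs;
  var _freq_ nheight nmissheight nweight nmissweight;
run;

/* For each analysis variable, N + NMISS should equal _FREQ_ */
data _null_;
  set statsfreq2;
  if _freq_ = nheight + nmissheight and _freq_ = nweight + nmissweight then
    put 'Counts agree: N + NMISS equals _FREQ_ for both variables.';
  else
    put 'Counts do not add up: ' _freq_= nheight= nmissheight= nweight= nmissweight=;
run;
```

Because every contributing row has either a missing or a non-missing value for each analysis variable, the two counts together should account for all rows in `_FREQ_`.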
Key points to remember
`_FREQ_` is created automatically in output datasets produced by `proc means` and `proc summary`
`_FREQ_` counts how many observations contributed to the summary row
`N` counts non-missing values of a specific analysis variable
`NMISS` counts missing values of a specific analysis variable
When an analysis variable contains missing values, `_FREQ_` and `N` will differ for that variable
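The same distinction holds within groups when a `class` statement is used. The sketch below is an illustrative extension rather than part of the original example: it summarizes `class02` by `sex` (a variable carried over from `sashelp.class`), and the output dataset name `statsbysex` is hypothetical. The `nway` option keeps only the rows for the individual `sex` levels, so `_FREQ_` is reported once per group.

```sas
/* Summarize height separately for each level of sex */
proc means data=class02 nway noprint;
  class sex;
  var height;
  output out=statsbysex n=nheight nmiss=nmissheight;
run;

/* Compare the row count and the non-missing count within each group */
proc print data=statsbysex noobs;
  var sex _freq_ nheight nmissheight;
run;
```

In the group that contains the student whose `height` was set to missing, `_FREQ_` should be one greater than `nheight`; in the other group the two values should match.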