*Copyright @ www.mycsg.in;
What is delimited data
In the previous raw data lesson, we used column input, where each value is expected in a fixed range of columns
Delimited data is different because values are separated by a delimiter, such as a comma, tab, pipe, or ampersand
In this style, SAS reads from one delimiter to the next rather than from a fixed column position
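The sketch below shows that switching to another delimiter, here a pipe, only changes the `dlm=` value. The dataset name `pipe01` is hypothetical, and the records reuse values from the comma example later in this lesson. The tab-delimited line in the closing comment is an assumption about a typical external-file read, and its file path is purely illustrative.

```sas
/* Pipe-delimited in-stream data: the only change is the dlm= value */
data pipe01;
    infile datalines dlm='|';
    input name $ sex $ age height weight;
    datalines;
Alice|F|13|56.5|84
John|M|16|72|150
;
run;

/* For a tab-delimited external file, the tab is usually written as */
/* the hexadecimal constant '09'x. The path below is hypothetical:  */
/*   infile '/path/to/class.txt' dlm='09'x;                         */
```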
How does SAS read delimited raw data
For delimited data, we commonly use list input
On the `input` statement, we list the variable names in the order the values appear in the raw file
On the `infile` statement, we describe the source of the raw data and specify the delimiter when needed
On the `input` statement, each character variable name must be followed by a dollar sign (`$`)
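This hedged illustration shows what the dollar sign prevents. In the hypothetical `nodollar` step, `sex` is listed without `$`, so SAS treats it as numeric. When it meets `F` or `M`, SAS writes an invalid-data note to the log and stores a missing value for `sex`, then keeps reading the remaining values in order.

```sas
/* Deliberate mistake: sex is listed without a dollar sign */
data nodollar;
    infile datalines dlm=',';
    input name $ sex age;          /* sex is created as a numeric variable */
    datalines;
Alice,F,13
John,M,16
;
run;
/* The log shows invalid-data notes for sex, and sex is missing (.) */
```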
Read comma delimited data using list input
We create a dataset named `list01` from comma-separated in-stream data
`dlm=','` tells SAS that the comma separates values
The order of the variable names on the `input` statement must match the order of the values in each record
After running the code, inspect the dataset and verify that each raw value was assigned to the intended variable
data list01;
    infile datalines dlm=',';               /* commas separate the values  */
    input name $ sex $ age height weight;   /* $ marks character variables */
    datalines;
Alice,F,13,56.5,84
John,M,16,72,150
Jane,F,12,59.8,84.5
;
run;
SAS Log
data list01;
infile datalines dlm=',';
input name $ sex $ age height weight;
datalines;
NOTE: The data set WORK.LIST01 has 3 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
;
run;
`list01` contains three observations read from comma-delimited data
Notice that SAS did not need fixed column positions; the commas marked where each value ended, so values such as `Alice` and `John` could have different widths
Check the dataset attributes
`proc contents` helps us inspect variable names, types, lengths, and other metadata of the created dataset
This is a good practice when learning input styles because it confirms whether SAS assigned attributes as expected
proc contents data=list01;
run;
SAS Log
proc contents data=list01;
run;

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
Review the output and confirm which variables are character and which are numeric
Pay particular attention to the lengths assigned to the character variables `name` and `sex`
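To check the same attributes in a compact table, one option is to query the `dictionary.columns` metadata table with `proc sql`. This sketch first creates a small hypothetical dataset, `attr_demo`, with list input so that it runs on its own; for list input without an earlier `length` statement, `name` and `sex` are expected to show a length of 8.

```sas
/* Hypothetical dataset: two character variables and one numeric variable */
data attr_demo;
    infile datalines dlm=',';
    input name $ sex $ age;
    datalines;
Alice,F,13
;
run;

/* List name, type, and length; libname and memname are stored in uppercase */
proc sql;
    select name, type, length
        from dictionary.columns
        where libname = 'WORK' and memname = 'ATTR_DEMO';
quit;
```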
Space is the default delimiter in simple list input
When `dlm=` is not specified, simple list input uses blank spaces as delimiters
This works well when values are separated by one or more spaces and character values do not contain embedded blanks
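The sketch below shows why embedded blanks are a problem, using a hypothetical record for `Mary Ann`. With blank-delimited list input, `Mary` becomes `name` and `Ann` becomes `sex`, so every later value shifts one variable to the right and `F` lands in the numeric `age`, producing an invalid-data note. One straightforward fix, shown in the second step, is to use a delimiter that cannot appear inside the values, such as a comma. The dataset names `blank_bad` and `comma_ok` are illustrative.

```sas
/* Blank-delimited: the space inside "Mary Ann" is treated as a delimiter */
data blank_bad;
    infile datalines;
    input name $ sex $ age height weight;
    datalines;
Mary Ann F 14 62.5 98
;
run;
/* Result: name='Mary', sex='Ann', age missing (invalid 'F'), height=14 */

/* Comma-delimited: the embedded blank stays inside the name value */
data comma_ok;
    infile datalines dlm=',';
    input name $ sex $ age height weight;
    datalines;
Mary Ann,F,14,62.5,98
;
run;
```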
data list02;
    infile datalines;                       /* no dlm= option, so blanks separate values */
    input name $ sex $ age height weight;
    datalines;
Alice F 13 56.5 84
John M 16 72 150
Jane F 12 59.8 84.5
;
run;
SAS Log
data list02;
infile datalines;
input name $ sex $ age height weight;
datalines;
NOTE: The data set WORK.LIST02 has 3 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.09 seconds
      cpu time            0.01 seconds
;
run;
`list02` should contain the same logical information as `list01` even though the delimiter is now a blank space
This example shows why list input is convenient for clean delimited raw data
Reading values longer than the default character length
With simple list input, SAS assigns a default length of 8 to character variables when no prior length has been defined
If a character value is longer than 8 characters, only its first 8 characters are stored unless we define a longer length before the `input` statement
This is an important point to remember when reading names, descriptions, or codes longer than 8 characters
Example without defining a longer length
In this first example, the variable `city` is not given an explicit length before input
Inspect the result and check whether the nine-character city names are stored in full
data city_short;
    infile datalines dlm=',';
    input city $ score;          /* city gets the default character length of 8 */
    datalines;
Hyderabad,95
Bengaluru,88
Ahmedabad,91
;
run;
SAS Log
data city_short;
infile datalines dlm=',';
input city $ score;
datalines;
NOTE: The data set WORK.CITY_SHORT has 3 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
;
run;
Example with an explicit longer length
Now, we define `city` with length 20 before reading the raw data
This ensures that city names of up to 20 characters are stored without truncation
data city_long;
    length city $20;             /* placed before input so it defines the length */
    infile datalines dlm=',';
    input city $ score;
    datalines;
Hyderabad,95
Bengaluru,88
Ahmedabad,91
;
run;
SAS Log
data city_long;
length city $20;
infile datalines dlm=',';
input city $ score;
datalines;
NOTE: The data set WORK.CITY_LONG has 3 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
;
run;
Compare `city_short` and `city_long`: in `city_short`, each nine-character name is cut to its first 8 characters (for example, `Hyderabad` becomes `Hyderaba`), while `city_long` keeps the full names
This comparison reinforces why variable attributes should be considered before reading raw data
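Another common way to get the same effect, sketched here as an alternative rather than as this lesson's method, is to attach an informat with the colon modifier directly on the `input` statement. `:$20.` gives `city` a length of 20 when it is first defined, and the colon keeps list-input behavior, so each value is still read only up to the next delimiter. The dataset name `city_colon` is hypothetical.

```sas
/* Colon modifier plus informat: city is created with length 20, */
/* and each value is still read only up to the next comma        */
data city_colon;
    infile datalines dlm=',';
    input city :$20. score;
    datalines;
Hyderabad,95
Bengaluru,88
Ahmedabad,91
;
run;
```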
Key points to remember
Delimited data stores values separated by delimiters rather than fixed columns
List input reads values in order and assigns them to the variables listed on the `input` statement
The `dlm=` option identifies the delimiter when it is not a blank space
Character variables need a dollar sign (`$`) after their names on the `input` statement
Define a longer character length before the `input` statement when values may exceed the default length of 8
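The key points above can be combined in one short sketch. It assumes ampersand-delimited records (one of the delimiters mentioned at the start of this lesson) and a hypothetical dataset name, `summary01`: `length` comes before `input`, `dlm=` names the delimiter, and `$` marks the character variables.

```sas
/* Ampersand-delimited records with city values longer than 8 characters */
data summary01;
    length city $20;                   /* define the longer length before input */
    infile datalines dlm='&';          /* ampersand separates the values        */
    input name $ city $ score;         /* $ marks the character variables       */
    datalines;
Priya&Visakhapatnam&93
Rahul&Thiruvananthapuram&87
;
run;
```

Because the `length` statement is the first place `city` appears in the step, `city` also becomes the first variable in `summary01`; the order of the `input` statement still controls which value goes into which variable.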