*Copyright @ www.mycsg.in;

PROC IMPORT - Part 02: DATAROW=, NAMEROW=, GUESSINGROWS=, and Mixed-Type Columns

Overview

Part 01 covered the basic use of PROC IMPORT for reading delimited files and Excel workbooks, including DBMS=, DELIMITER=, SHEET=, GETNAMES=, and REPLACE.
This lesson covers four more advanced topics: controlling which row SAS reads variable names from (NAMEROW=), which row data starts from (DATAROW=), how many rows SAS examines when guessing variable types (GUESSINGROWS=), and what to do when a column contains a mix of numeric and character values.
These options are essential when working with real-world files that do not follow the simple header-on-row-1, data-from-row-2 pattern.

DATAROW= and NAMEROW=

By default, PROC IMPORT assumes variable names are on row 1 and data begins on row 2 of a file.
NAMEROW= specifies which row of the file contains the variable names. This is useful when the file has a title or metadata rows above the header.
DATAROW= specifies which row data values begin. This must be greater than or equal to NAMEROW=. Any rows between the header and DATAROW= are skipped.
NAMEROW= and DATAROW= are supported for Excel imports (DBMS=XLSX or DBMS=XLS). For delimited text files, the equivalent is handled differently — use a DATA step with FIRSTOBS= or skip rows using the INFILE statement.

*------------------------------------------------------------------------------;
* Syntax illustration - requires an actual Excel file to run ;
* Adjust the file path to match your environment ;
*------------------------------------------------------------------------------;
 
/*
proc import
    datafile="C:\data\study_results.xlsx"
    out=work.results
    dbms=xlsx
    replace;
    sheet="Data";
    getnames=yes;
    namerow=3;
    datarow=5;
run;
*/
 
*------------------------------------------------------------------------------;
* This tells PROC IMPORT: ;
*    - Row 3 of the Excel sheet contains the column headers ;
*    - Rows 1 and 2 are titles or metadata to ignore ;
*    - Data values start at row 5 (rows 3-4 are header/blank) ;
*------------------------------------------------------------------------------;

This is a syntax-only block — run it after substituting a real file path.
A common scenario: a sponsor-provided Excel file has two title rows, then a merged header row, then a blank row, then data. NAMEROW=3 and DATAROW=5 handles this exactly.
If NAMEROW and DATAROW are the same, SAS uses that single row as both the header and the start of data — which would result in the header row being read as a data observation. Always set DATAROW to the first actual data row.

GUESSINGROWS= - Controlling Type Detection

When PROC IMPORT reads a delimited file or Excel workbook, it scans a number of rows to determine whether each column is numeric or character.
By default, it scans the first 20 rows (for delimited files) or a small number of rows for Excel. If the first 20 rows all contain numeric values in a column, PROC IMPORT declares that column numeric — even if later rows contain character values.
GUESSINGROWS= controls how many rows PROC IMPORT examines before deciding the type. Setting it to MAX tells SAS to scan the entire file.
Using GUESSINGROWS=MAX is slower on large files but eliminates the risk of misclassifying a column that has mixed values or has character values only in later rows.

* Syntax illustration ;
 
/*
proc import
    datafile="C:\data\lab_results.csv"
    out=work.labs
    dbms=csv
    replace;
    getnames=yes;
    guessingrows=max;
run;
*/

Use GUESSINGROWS=MAX as the safe default for any file where you are not completely certain that the first few rows are representative of the entire column's content.
For very large files (millions of rows), consider setting GUESSINGROWS= to a large but finite number (e.g., 10000) rather than MAX, to balance accuracy against import time.
Always run PROC CONTENTS on the imported dataset to verify the type and length of each variable before proceeding with analysis.

The Mixed-Type Column Problem

The most common PROC IMPORT pitfall is a column that should be character but gets imported as numeric — or vice versa — because of the guessing logic.
A typical scenario in clinical data: a result column that is mostly numeric (e.g., 12.5, 8.3) but occasionally contains a character string (e.g., "LLOQ", "BQL", ">1000"). PROC IMPORT may guess numeric and then silently lose the character values, converting them to missing.
The solution is to force the column type using a LENGTH or ATTRIB statement in a subsequent DATA step, or to use an IMPORT statement that reads everything as character and then convert selectively.

*------------------------------------------------------------------------------;
* Step 1: Import with GUESSINGROWS=MAX to get the best guess ;
* (Using inline data to simulate what PROC IMPORT would produce) ;
*------------------------------------------------------------------------------;
 
data work.raw_import;
    input subject $ result_char $15.;
    datalines;
001 12.5
002 8.3
003 LLOQ
004 15.2
005 BQL
006 10.1
;
run;
 
*------------------------------------------------------------------------------;
* Step 2: Create a clean version with both forms of result ;
*------------------------------------------------------------------------------;
data work.clean;
    set work.raw_import;
    * Keep original character result ;
    result_c = result_char;
    * Convert to numeric where possible - missing otherwise ;
    result_n = input(result_char, ?? best12.);
    * Flag special values ;
    if result_n = . and result_char not in ("" ".") then special_flag = "Y";
    else special_flag = "N";
run;

The ?? modifier on the INPUT function suppresses the invalid data NOTE and sets result_n to missing when conversion fails — this is the correct approach for columns with a mix of numbers and special character codes.
RESULT_C preserves the original character value for any row that has a special code; RESULT_N contains the numeric value for rows where conversion succeeded.
SPECIAL_FLAG makes it easy to separate rows with special values from rows with ordinary numeric results downstream.
Inspect WORK.CLEAN to verify that LLOQ and BQL rows have result_n = missing and special_flag = Y, while numeric rows have a populated result_n.

Verifying an Import with PROC CONTENTS

After every PROC IMPORT run, verify the imported structure immediately using PROC CONTENTS before doing any further processing.
Key things to check: variable types (numeric vs character), variable lengths, and observation count.
A variable you expected to be character but PROC IMPORT made numeric will silently lose all non-numeric values — catching this early prevents downstream analysis errors.

proc contents data=work.clean varnum;
run;

The VARNUM option lists variables in their order of creation rather than alphabetically, which matches the column order in the source file — easier to compare against the original.
Look for the Type column in the output: 1 = numeric, 2 = character. Confirm this matches your expectations for each column.
Look for the Length column: character columns default to the longest value found during guessing. A very short length (e.g., 8) may truncate longer values — increase it in a subsequent DATA step if needed.

Key Points

NAMEROW= specifies which row holds the variable names when the file has title or metadata rows above the header.
DATAROW= specifies where data values begin — must be equal to or greater than NAMEROW=.
GUESSINGROWS=MAX forces SAS to scan the entire file before deciding variable types — use this whenever column content is not uniform in the first few rows.
Mixed-type columns (numbers plus special character codes) are the most common PROC IMPORT problem — import as character or use GUESSINGROWS=MAX, then convert selectively with INPUT and the ?? modifier.
Always run PROC CONTENTS after importing to verify that types and lengths match expectations before proceeding with any analysis.

*Copyright @ www.mycsg.in;

PROC IMPORT - Part 02: DATAROW=, NAMEROW=, GUESSINGROWS=, and Mixed-Type Columns

Overview

DATAROW= and NAMEROW=

SAS Log

GUESSINGROWS= - Controlling Type Detection

SAS Log

The Mixed-Type Column Problem

SAS Log

Dataset View

Verifying an Import with PROC CONTENTS

SAS Log

Key Points