PROC IMPORT - Part 02: DATAROW=, NAMEROW=, GUESSINGROWS=, and Mixed-Type Columns
Overview
Part 01 covered the basic use of PROC IMPORT for reading delimited files and Excel workbooks, including DBMS=, DELIMITER=, SHEET=, GETNAMES=, and REPLACE.
This lesson covers four more advanced topics: controlling which row SAS reads variable names from (NAMEROW=), which row data starts from (DATAROW=), how many rows SAS examines when guessing variable types (GUESSINGROWS=), and what to do when a column contains a mix of numeric and character values.
These options are essential when working with real-world files that do not follow the simple header-on-row-1, data-from-row-2 pattern.
DATAROW= and NAMEROW=
By default, PROC IMPORT assumes variable names are on row 1 and data begins on row 2 of a file.
NAMEROW= specifies which row of the file contains the variable names. This is useful when the file has a title or metadata rows above the header.
DATAROW= specifies which row data values begin. This must be greater than or equal to NAMEROW=. Any rows between the header and DATAROW= are skipped.
NAMEROW= and DATAROW= are supported for Excel imports (DBMS=XLSX or DBMS=XLS). For delimited text files, the equivalent is handled differently — use a DATA step with FIRSTOBS= or skip rows using the INFILE statement.
SAS Log
This is a syntax-only block — run it after substituting a real file path.
A common scenario: a sponsor-provided Excel file has two title rows, then a merged header row, then a blank row, then data. NAMEROW=3 and DATAROW=5 handles this exactly.
If NAMEROW and DATAROW are the same, SAS uses that single row as both the header and the start of data — which would result in the header row being read as a data observation. Always set DATAROW to the first actual data row.
GUESSINGROWS= - Controlling Type Detection
When PROC IMPORT reads a delimited file or Excel workbook, it scans a number of rows to determine whether each column is numeric or character.
By default, it scans the first 20 rows (for delimited files) or a small number of rows for Excel. If the first 20 rows all contain numeric values in a column, PROC IMPORT declares that column numeric — even if later rows contain character values.
GUESSINGROWS= controls how many rows PROC IMPORT examines before deciding the type. Setting it to MAX tells SAS to scan the entire file.
Using GUESSINGROWS=MAX is slower on large files but eliminates the risk of misclassifying a column that has mixed values or has character values only in later rows.
SAS Log
Use GUESSINGROWS=MAX as the safe default for any file where you are not completely certain that the first few rows are representative of the entire column's content.
For very large files (millions of rows), consider setting GUESSINGROWS= to a large but finite number (e.g., 10000) rather than MAX, to balance accuracy against import time.
Always run PROC CONTENTS on the imported dataset to verify the type and length of each variable before proceeding with analysis.
The Mixed-Type Column Problem
The most common PROC IMPORT pitfall is a column that should be character but gets imported as numeric — or vice versa — because of the guessing logic.
A typical scenario in clinical data: a result column that is mostly numeric (e.g., 12.5, 8.3) but occasionally contains a character string (e.g., "LLOQ", "BQL", ">1000"). PROC IMPORT may guess numeric and then silently lose the character values, converting them to missing.
The solution is to force the column type using a LENGTH or ATTRIB statement in a subsequent DATA step, or to use an IMPORT statement that reads everything as character and then convert selectively.
SAS Log
The ?? modifier on the INPUT function suppresses the invalid data NOTE and sets result_n to missing when conversion fails — this is the correct approach for columns with a mix of numbers and special character codes.
RESULT_C preserves the original character value for any row that has a special code; RESULT_N contains the numeric value for rows where conversion succeeded.
SPECIAL_FLAG makes it easy to separate rows with special values from rows with ordinary numeric results downstream.
Inspect WORK.CLEAN to verify that LLOQ and BQL rows have result_n = missing and special_flag = Y, while numeric rows have a populated result_n.
Dataset View
Verifying an Import with PROC CONTENTS
After every PROC IMPORT run, verify the imported structure immediately using PROC CONTENTS before doing any further processing.
Key things to check: variable types (numeric vs character), variable lengths, and observation count.
A variable you expected to be character but PROC IMPORT made numeric will silently lose all non-numeric values — catching this early prevents downstream analysis errors.
SAS Log
The VARNUM option lists variables in their order of creation rather than alphabetically, which matches the column order in the source file — easier to compare against the original.
Look for the Type column in the output: 1 = numeric, 2 = character. Confirm this matches your expectations for each column.
Look for the Length column: character columns default to the longest value found during guessing. A very short length (e.g., 8) may truncate longer values — increase it in a subsequent DATA step if needed.
Key Points
NAMEROW= specifies which row holds the variable names when the file has title or metadata rows above the header.
DATAROW= specifies where data values begin — must be equal to or greater than NAMEROW=.
GUESSINGROWS=MAX forces SAS to scan the entire file before deciding variable types — use this whenever column content is not uniform in the first few rows.
Mixed-type columns (numbers plus special character codes) are the most common PROC IMPORT problem — import as character or use GUESSINGROWS=MAX, then convert selectively with INPUT and the ?? modifier.
Always run PROC CONTENTS after importing to verify that types and lengths match expectations before proceeding with any analysis.