I'm wondering if the very specific thing I'm trying to do is possible; there's other ways to solve this (see below) so it's more a question of efficiency than anything.
For background, this is to be part of a macro that writes partial data step statements, so I have one absolute requirement: no step boundaries. With that in mind, I'm trying to efficiently get a list of variables on some dataset, with the kicker that this dataset can ideally be affected by clauses such as dataset(drop=...). Here's a toy macro to illustrate:
%MACRO LIST_VARS(dataset);
%let dsid = %sysfunc(open(&dataset));
%let vars = ;
%do i = 1 %to %sysfunc(attrn(&dsid, nvar));
%let vars = &vars %sysfunc(varname(&dsid, &i));
%end;
%let dsid = %sysfunc(close(&dsid));
&vars
%MEND;
/* Example use -- no clauses, no issue. */
%put %LIST_VARS(sashelp.class);
* Name Sex Age Height Weight ;
The problem is that if you include drop= or keep= clauses, the number of variables will be affected but their variable number will not:
/* Three variables retained: Name (#1), nAtBat (#3), and nAssts (#20) */
%put %LIST_VARS(sashelp.baseball(keep=na:));
* WARNING: Argument 2 to function VARNAME is out of range. ; /* Fires for varname(2) */
* Name nAtBat ; /* Note: no nAssts */
Specifically, while I'd assume the three variables can be retrieved as varname(1, 2, 3) they actually still occupy numbers 1, 3, 20 in this dataset as can be checked via PROC CONTENTS:
proc contents data=sashelp.baseball(keep=na:) varnum;
quit;
* # Variable ;
* 1 Name ;
* 3 nAtBat ;
* 20 nAssts ;
This creates a bit of a catch-22: to get these varnum I'd need the varname, and to get the varname I need the varnum. It is a brute-force option to keep incrementing variable numbers within varname until nvar (3) non-missing names are returned, but that isn't exactly elegant and will result in a bunch of warnings which I'd rather avoid. So, my question: is there any way to do this, i.e. how to efficiently figure out in this example that it's variables 1, 3, and 20 that were retained, in macro language?
---
To be sure, I've already worked around this issue in the following ways: (i) implement drop/keep functionality in pure macro (i.e. read the full variable list and subset manually) but I'd rather also support dataset clauses, and (ii) dump the clauses into a side-session view via %sysfunc(dosubl) to then process that, but that causes quite some performance overhead. The same would hold for running e.g. PROC CONTENTS in a side session. A third option could be to first read in the full dataset, get all variables, and then handle the clauses separately, but that would also require a lot of housekeeping and manual processing. Surely there is some quick function or trick that I'm missing?
Extract the list of variable names from SASHELP.VCOLUMN
Example:
proc sql;
select name from sashelp.vcolumn where libname='SASHELP' and memname='BASEBALL';
quit;
or if you just want variable names that begin with NA
proc sql;
select name from sashelp.vcolumn where libname='SASHELP' and memname='BASEBALL'
and upcase(name) eqt 'NA';
quit;
This should be easy to turn into a macro, if that's what you want.
Can you elaborate on how much performance overhead you anticipate. Are you going to call this macro thousands of times. From your description (this is to be part of a macro that writes partial data step statements) it seems to me that a few calls to DOSUBL to build the data step code would have acceptable performance.
(ii) dump the clauses into a side-session view via %sysfunc(dosubl) to then process that, but that causes quite some performance overhead
Fair, this is nice-to-have rather than performance-critical; the DOSUBL is still easily 95+% of the processing time (see example below). This introduces a bit of additional overhead because I use a separate library to avoid conflicts with "user space". While there is a bit of post-processing of the variable list it comes nowhere near the additional time needed for the side session steps. And indeed, the macro may end up getting called several hundreds or thousands of times over a large set of programs, at which point shaving half a second from each iteration will add up. Or at least, with my current solution it becomes less competitive to macro this up versus writing the code out by hand...
/* Desired version (but doesn't support clauses). */
%MACRO LIST_VARS(dataset);
%local dsid vars i;
%let dsid = %sysfunc(open(&dataset));
%do i = 1 %to %sysfunc(attrn(&dsid, nvar));
%let vars = &vars %sysfunc(varname(&dsid, &i));
%end;
%let dsid = %sysfunc(close(&dsid));
&vars
%MEND;
/* Supports clauses, loses much performance. */
%MACRO LIST_VARS_DOSUBL(dataset);
%local rc;
%let rc = %sysfunc(dosubl(%nrstr(
options dlcreatedir nonotes;
libname templib "%sysfunc(getoption(work))/templib" compress=yes;
data templib.tempds / view=templib.tempds;
set %unquote(&dataset);
run;
)));
%let rc = %sysfunc(libname(TEMPLIB, %sysfunc(getoption(work))/templib,, compress=yes));
%LIST_VARS(TEMPLIB.TEMPDS)
%let rc = %sysfunc(libname(TEMPLIB));
%MEND;
/* Benchmark */
%MACRO TIME_BOTH(dataset, its=10);
%local _ j t0 t1 t2;
%let t0 = %sysfunc(datetime());
%do j = 1 %to &its;
%let _ = %LIST_VARS(&dataset);
%end;
%let t1 = %sysfunc(datetime());
%do j = 1 %to &its;
%let _ = %LIST_VARS_DOSUBL(&dataset);
%end;
%let t2 = %sysfunc(datetime());
%put FIRST VERSION TOOK %sysfunc(putn(%sysevalf(&t1 - &t0), tod12.5));
%put SECOND VERSION TOOK %sysfunc(putn(%sysevalf(&t2 - &t1), tod12.5));
%MEND;
%TIME_BOTH(sashelp.class);
* FIRST VERSION TOOK 0:00:00.016 ;
* SECOND VERSION TOOK 0:00:01.568 ;
You can not retrieve the variable name with option DROP= KEEP= , why not create a temporary dataset ?
%MACRO LIST_VARS(dataset); data temp; set &dataset ; run; %let dsid = %sysfunc(open(temp)); %let vars = ; %do i = 1 %to %sysfunc(attrn(&dsid, nvar)); %let vars = &vars %sysfunc(varname(&dsid, &i)); %end; %let dsid = %sysfunc(close(&dsid)); /* &vars*/ %put &=vars ; %MEND; /* Example use -- no clauses, no issue. */ %LIST_VARS(sashelp.class(drop=name));
If you really want to work with DROP= KEEP=, try function VNEXT().
%MACRO LIST_VARS(dataset); data _null_; set &dataset.; length __vname__ $ 40 __list__ $ 2000; do until(lowcase(__vname__) = '__vname__'); __list__=catx(' ',__list__,__vname__); call vnext(__vname__); end; call symputx('vars',__list__,'G'); stop; run; %put &=vars; /* &vars*/ %MEND; /* Example use -- no clauses, no issue. */ %LIST_VARS(sashelp.class(drop=name));
Thanks, but this doesn't address the main requirement: the macro cannot end an ongoing data step. Throwing this into a side session makes this basically the solution I have now because there isn't really a need to actually process the clauses, just compile.
If there is a way to use something like call vnext in pure macro after a %sysfunc(open)) call, that would be exactly what I'm looking for!
It is not just that it complains about the varnum that is not there. The NVAR dataset attribute also does not provide the right number of variables needed to find all of the names that are there.
25 %put %LIST_VARS(sashelp.class); 1=Name 2=Sex 3=Age 4=Height 5=Weight 26 %put %LIST_VARS(sashelp.class(drop=sex)); WARNING: Argument 2 to function VARNAME referenced by the %SYSFUNC or %QSYSFUNC macro function is out of range. 1=Name 2= 3=Age 4=Height
This isn't entirely accurate -- the problem in your example is that you're querying four variables (nvar = 4) but they are indices 1, 3, 4, 5 while the loop goes 1, 2, 3, 4. Sex (#2) isn't on the dataset anymore, that's what gives the warning, that's what causes the empty return in that slot. If you were to drop the fifth variable, Weight, this example would work without issue, but only because the varnum would remain consecutive then.
Your solution is to keep looping until you retrieve the expected number of varname, but I'd rather have no warnings at all 😞
(it's also not very efficient if you keep few variables on a very large dataset)
What I have done is just add my own note to the log so users know they can ignore the warning.
%macro list_vars(dataset);
%local dsid var i n ;
%let dsid = %sysfunc(open(&dataset));
%let n = %sysfunc(attrn(&dsid, nvar));
%do %until(&i >= &n);
%let i=%eval(&i+1);
%let var=%sysfunc(varname(&dsid,&i));
%if %length(&var) %then %sysfunc(nliteral(&var));
%else %do;
%let n=%eval(&n+1);
%put NOTE: Warning from VARNAME function is because of dropped variables ;
%end;
%end;
%let dsid = %sysfunc(close(&dsid));
%mend list_vars;
Example
16 17 /* Example */ 18 %put %list_vars(sashelp.class); Name Sex Age Height Weight 19 %put %list_vars(sashelp.class(drop=sex)); WARNING: Argument 2 to function VARNAME referenced by the %SYSFUNC or %QSYSFUNC macro function is out of range. NOTE: Warning from VARNAME function is because of dropped variables Name Age Height Weight
Today I would also use DOSUBL for this. But before there was DOSUBL, I hit this problem. I see it as a bug that that VARNAME function does not honor the KEEP/DROP option. I've been told in that in SCL, it does honor KEEP/DROP.
Luckly, Søren Lassen posted some solutions to SAS-L. Basic approach is to open the dataset twice, once with the keep/drop option, and once without. With that, you can loop over the list of variables without the keep/drop applied, and use the VARNUM function to test if the variable exists in the dataset with the keep/drop applied. Code like:
%macro ListVars(data);
/***
Based on %VarList macro developed by Søren Lassen
Posted to SAS-L December 2010 and May 4, 2015
Macro is complicated by fact that the varname function does not honor keep/drop option.
Therefore need workaround to prevent er ror msg if user specifies keep/drop.
The work around is to iterate over list of variables in dataset
without keep/drop applied, and check if each variable is in the dataset
with keep/drop applied.
***/
%local
dsid /*id of &data, with any keep/drop options*/
dsid2 /*id of &data, without any keep/drop options*/
i
var /*name of ith variable in &data*/
return
rc
;
%*If &data had a rename option, macro would not return the renamed variables;
%*So user is not allowed to use rename option.;
%if %sysfunc(find(&data,rename,i)) %then %do;
%put ER%str()ROR: (%nrstr(%%)&sysmacroname) cannot include rename option on dataset.;
%goto mexit;
%end;
%let dsid=%sysfunc(open(&data)); /*dataset with any keep/drop options*/
%let dsid2=%sysfunc(open(%scan(&data,1,%str(%( )))); /*dataset without keep/drop options*/
%do i=1 %to %sysfunc(attrn(&dsid2,nvars));
%let var=%sysfunc(varname(&dsid2,&i));
%if (%sysfunc(varnum(&dsid,&var))) %then
%do;
%if %superq(return) = %str() %then %let return=&var;
%else %let return=&return &var;
%end;
%end;
%let rc=%sysfunc(close(&dsid));
%let rc=%sysfunc(close(&dsid2));
&return
%mexit:
%mend ListVars;
%put %listvars(sashelp.class) ;
%put %listvars(sashelp.class(keep=_numeric_)) ;
%put %listvars(sashelp.class(drop=_character_)) ;
So you can combine the two.
Say you have coded a macro that can get the answer when there are dataset options, but it requires generating a step boundary. Say something like this:
%macro list_vars_step(dataset,mvar=varlist);
%if not %symexist(&mvar) %then %global &mvar;
%let &mvar=;
data _null_;
if 0 then set &dataset;
do while(1=1);
length ___NAME___ $32 ;
call vnext(___NAME___);
if ___NAME___='___NAME___' then stop;
if ' '=resolve('%let &mvar=&&&mvar '||___NAME___||';') then;
end;
stop;
run;
%mend list_vars_step;
Now you can make your new macro to return the list of names without a step boundary use that macro in a DOSUBL() side session only when there are dataset options. Otherwise just use the VARNAME() function.
%macro list_vars(dataset);
%local rc vars dsid i;
%if %sysfunc(indexc(&dataset,())) %then %do;
%global varlist;
%let rc=%sysfunc(dosubl(%nrstr(options nonotes;%list_vars_step(&dataset))));
%let vars=&varlist;
%end;
%else %do;
%let dsid = %sysfunc(open(&dataset));
%do i=1 %to %sysfunc(attrn(&dsid, nvar));
%let vars=&vars %sysfunc(varname(&dsid,&i));
%end;
%let dsid=%sysfunc(close(&dsid));
%end;
&vars.
%mend list_vars;
NOTE the %GLOBAL statement. You cannot return macro variables from DOSUBL() into local macro variables.
339 /* Example */ 340 %put %list_vars(sashelp.class); Name Sex Age Height Weight 341 %put %list_vars(sashelp.class(drop=sex)); Name Age Height Weight
@Tom wrote in part:
NOTE the %GLOBAL statement. You cannot return macro variables from DOSUBL() into local macro variables.
DOSUBL() can return side session global macro variables into main session local macro variables. In this case, you can't use the %GLOBAL statement in the side session to create the macro variable, because it will throw (I think incorrectly):
ERROR: Attempt to %GLOBAL a name (VARLIST) which exists in a local environment.
But if you add an open-code %let statement inside the DOSUBL call, it will happily create a side session global macro variable, which will be returned to the main session local symbol table.
So code like:
%macro list_vars_step(dataset,mvar=varlist);
%if not %symexist(&mvar) %then %global &mvar;
%let &mvar=;
data _null_;
if 0 then set &dataset;
do while(1=1);
length ___NAME___ $32 ;
call vnext(___NAME___);
if ___NAME___='___NAME___' then stop;
if ' '=resolve('%let &mvar=&&&mvar '||___NAME___||';') then;
end;
stop;
run;
%mend list_vars_step;
%macro list_vars(dataset);
%local rc vars dsid i;
%if %sysfunc(indexc(&dataset,())) %then %do;
%local varlist;
%let rc=%sysfunc(dosubl(%nrstr(%let varlist=;%list_vars_step(&dataset) %put INSIDE DOSUBL ; %put _user_ ; %put ; ))); /*added code here*/
%put AFTER DOSUBL ; %put _user_ ; %put ; /*added code here*/
%let vars=&varlist;
%end;
%else %do;
%let dsid = %sysfunc(open(&dataset));
%do i=1 %to %sysfunc(attrn(&dsid, nvar));
%let vars=&vars %sysfunc(varname(&dsid,&i));
%end;
%let dsid=%sysfunc(close(&dsid));
%end;
&vars.
%mend list_vars;
%put %list_vars(sashelp.class(drop=sex));
Returns:
33 %mend list_vars; INSIDE DOSUBL GLOBAL VARLIST Name Age Height Weight LIST_VARS DATASET sashelp.class(drop=sex) LIST_VARS DSID LIST_VARS I LIST_VARS RC LIST_VARS VARLIST LIST_VARS VARS 34 35 %put %list_vars(sashelp.class(drop=sex)); AFTER DOSUBL LIST_VARS DATASET sashelp.class(drop=sex) LIST_VARS DSID LIST_VARS I LIST_VARS RC 0 LIST_VARS VARLIST Name Age Height Weight LIST_VARS VARS Name Age Height Weight
The log is a bit out of order, but you can see that inside the DOSUBL block there is a global macro variable VARLIST, the value of which is returned the local macro variable when DOSUBL completes.
Thanks.
%macro list_vars(dataset);
%local rc vars dsid i;
%if %sysfunc(indexc(&dataset,())) %then %do;
%let rc=%sysfunc(dosubl(%nrstr(
options nonotes;
%let vars=;
%list_vars_step(&dataset,mvar=vars)
)));
%end;
%else %do;
%let dsid = %sysfunc(open(&dataset));
%do i=1 %to %sysfunc(attrn(&dsid, nvar));
%let vars=&vars %sysfunc(varname(&dsid,&i));
%end;
%let dsid=%sysfunc(close(&dsid));
%end;
&vars.
%mend list_vars;
Results
199 /* Example */ 200 %let varlist=; 201 %put %list_vars(sashelp.class); Name Sex Age Height Weight 202 %put %list_vars(sashelp.class(drop=sex)); Name Age Height Weight 203 %put %list_vars(sashelp.class(rename=(sex=Gender))); Name Gender Age Height Weight
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.