Thank you for taking time to review my question and Greatly appreciate your time and effort on this.
Need to create a report if datasets have one more special characters are been entered in same cell or if entered in the one more columns in the datasets. I have 15 datasets where i would need to repeat this to find the special characters.
The report should contain all the entire row of dataset. if special character is records.
Allowed characters: all small & uppercase letters, numbers and []{}()!?/,.%+@$&*:=^_-\";<>~# ’ ≥ ≤ and ' ' space.
Example characters entered in the dataset is "Ⅱ、Ⅲ、vf、at、on。"
Where "、" is special character and the row should be printed.
Sounds like you want the VERIFY() function (or perhaps KVERIFY() since you seem to want to accept some characters that are not 7bit ASCII codes)
If you want to test whether any bytes in a character variable are not normal ASCII codes then you could use the range of codes from space to tilde.
proc print data=mydata;
where verify(mycol,collate(32,126));
run;
If you want to allow things like ≥ ≤ then you are not using a single byte encoding and so will need to use KVERIFY() instead.
proc print data=mydata;
where kverify(mycol,collate(32,126)||'E289A5E289A4'x);
run;
If you want to test multiple variables use cats() function.
proc print data=mydata;
where verify(cats(vara,varb,mycol),collate(32,126));
run;
To test all of the variables might be easier to use a data step so you could use variable list.
data report;
set mydata;
if verify(cats(of _character_),collate(32,126));
run;
Sounds like you want the VERIFY() function (or perhaps KVERIFY() since you seem to want to accept some characters that are not 7bit ASCII codes)
If you want to test whether any bytes in a character variable are not normal ASCII codes then you could use the range of codes from space to tilde.
proc print data=mydata;
where verify(mycol,collate(32,126));
run;
If you want to allow things like ≥ ≤ then you are not using a single byte encoding and so will need to use KVERIFY() instead.
proc print data=mydata;
where kverify(mycol,collate(32,126)||'E289A5E289A4'x);
run;
If you want to test multiple variables use cats() function.
proc print data=mydata;
where verify(cats(vara,varb,mycol),collate(32,126));
run;
To test all of the variables might be easier to use a data step so you could use variable list.
data report;
set mydata;
if verify(cats(of _character_),collate(32,126));
run;
That is probably NOT an actual space. Might be a non-breaking space which is normally 'A0'x in single byte encodings and should be 'C2A0'x in UTF-8 encoding.
You cannot use variable lists in WHERE. That is why I suggested using a data step so you could use an IF statement instead.
@possible wrote:
Make senses but in the data step, alot of incorrect data has been printed.
Not sure what you mean. Are you getting errors when running the data step? Or are you records selected that do not appear to have any invalid characters?
SAS might not like running CATS() on a lot of long variables since the result might be too long. You might need to work a little harder.
data report;
set have;
any=0;
array _char_ _character_;
do _n_=1 to dim(_char_) while (not any);
any = kverify(_char_[_n_],collate(32,126)||'C2A0'x);
end;
if any;
run;
Since you are picking observations ("records") based on the presence of invalid characters in ANY of the variables there will be some values printed that are fine.
If you see and example where none of the values appear to have any invalid characters then look at the HEX codes for the values. To make it easier you could remove the valid characters first using KCOMPRESS() and then only print the hexcodes for the characters that are left.
Say this issue is identified in observation 10 you could run this data step to see any non valid characters in that observation only.
data test;
set have (firstobs=10 obs=10);
length string $200;
string=kcompress(cats(of _character_),collate(26,126)||'C2A0'x));
len = lengthn(string);
string = putc(string,cats('$hex',2*len));
put string=;
run;
You might see things like TAB ('09'x) or LINEFEED ('0A'x) or perhaps some other invisible character.
In that case you will want to use the DO loop method to check each member of the array of character variables. Add an OUTPUT statement inside the DO loop to write out an observation per variable with issues. (or just have the data step write the report directly without saving. Which was easier in the days of plain text output.)
You will need to take some care to use variable names that do not actually appear in your dataset.
One to store the NAME of the variable with issues and another to store the VALUE of the variable.
You might also want to keep any KEY variables that you dataset has, or perhaps just keep the observations number.
data report;
length __obs__ 8 __name__ $32 __value__ $400 ;
set have;
__obs__+1;
array _char_ _character_;
do _n_=1 to dim(_char_) ;
if kverify(_char_[_n_],collate(32,126)||'C2A0'x) then do;
__name__ = vname(_char_[_n_]);
__value__ = _char_[_n_];
output;
end;
end;
keep __obs__ __name__ __value__;
run;
Look at the actual hex codes of the characters in the field. Most likely there is a TAB or other invisible character in addition to the ?. Or perhaps the printer is just displaying ? instead of some character it does not understand.
You might want to include that in the report. For example you might include the hex code of the first invalid character it finds.
data report;
length __obs__ __loc__ 8 __hex__ $8 __name__ $32 __value__ $400 ;
set have;
__obs__+1;
array _char_ _character_;
do _n_=1 to dim(_char_) ;
__loc__ = kverify(_char_[_n_],collate(32,126)||'C2A0'x) ;
if __loc__ then do;
__hex__ =ksubstr(_char_[_n_],__loc__,1);
__hex__ =putc(__hex__,cats('$hex',2*length(__hex__)));
__name__ = vname(_char_[_n_]);
__value__ = _char_[_n_];
output;
end;
end;
keep __obs__ __loc__ __hex__ __name__ __value__;
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.