BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
possible
Obsidian | Level 7

Thank you for taking time to review my question and Greatly appreciate your time and effort on this. 

 

Need to create a report if datasets have one more special characters are been entered in same cell or if entered in the one more columns in the datasets. I have 15 datasets where i would need to repeat this to find the special characters.

 

The report should contain all the entire row of dataset. if special character is records. 

 

Allowed characters: all small & uppercase letters, numbers and []{}()!?/,.%+@$&*:=^_-\";<>~# ’ ≥ ≤ and ' ' space.

 

Example characters entered in the dataset is "Ⅱ、Ⅲ、vf、at、on。" 

 

Where "、" is special character and the row should be printed. 

 

@Ksharp @Tom 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Sounds like you want the VERIFY() function (or perhaps KVERIFY() since you seem to want to accept some characters that are not 7bit ASCII codes)

 

If you want to test whether any bytes in a character variable are not normal ASCII codes then you could use the range of codes from space to tilde.

proc print data=mydata;
  where verify(mycol,collate(32,126));
run;

If you want to allow things like  ≥ ≤ then you are not using a single byte encoding and so will need to use KVERIFY() instead.

proc print data=mydata;
  where kverify(mycol,collate(32,126)||'E289A5E289A4'x);
run;

 

If you want to test multiple variables use cats() function.

proc print data=mydata;
  where verify(cats(vara,varb,mycol),collate(32,126));
run;

To test all of the variables might be easier to use a data step so you could use variable list.

data report;
  set mydata;
  if verify(cats(of _character_),collate(32,126));
run;

View solution in original post

12 REPLIES 12
Tom
Super User Tom
Super User

Sounds like you want the VERIFY() function (or perhaps KVERIFY() since you seem to want to accept some characters that are not 7bit ASCII codes)

 

If you want to test whether any bytes in a character variable are not normal ASCII codes then you could use the range of codes from space to tilde.

proc print data=mydata;
  where verify(mycol,collate(32,126));
run;

If you want to allow things like  ≥ ≤ then you are not using a single byte encoding and so will need to use KVERIFY() instead.

proc print data=mydata;
  where kverify(mycol,collate(32,126)||'E289A5E289A4'x);
run;

 

If you want to test multiple variables use cats() function.

proc print data=mydata;
  where verify(cats(vara,varb,mycol),collate(32,126));
run;

To test all of the variables might be easier to use a data step so you could use variable list.

data report;
  set mydata;
  if verify(cats(of _character_),collate(32,126));
run;
possible
Obsidian | Level 7
kverify step seems working fine as expected but one small issue is if the data has any space after the comma then the code is printing that data as well and how can I include all the variables instead of manually adding them. Can I use _all_ or any other function at mycol ?

example: device,The day was today
Tom
Super User Tom
Super User

That is probably NOT an actual space.  Might be a non-breaking space which is normally 'A0'x in single byte encodings and should be 'C2A0'x in UTF-8 encoding.

 

You cannot use variable lists in WHERE.  That is why I suggested using a data step so you could use an IF statement instead.

possible
Obsidian | Level 7
Make senses but in the data step, alot of incorrect data has been printed.
Tom
Super User Tom
Super User

@possible wrote:
Make senses but in the data step, alot of incorrect data has been printed.

Not sure what you mean.  Are you getting errors when running the data step? Or are you records selected that do not appear to have any invalid characters?

 

SAS might not like running CATS() on a lot of long variables since the result might be too long.  You might need to work a little harder.

data report;
  set have;
  any=0;
  array _char_ _character_;
  do _n_=1 to dim(_char_) while (not any);
     any = kverify(_char_[_n_],collate(32,126)||'C2A0'x);
  end;
  if any;
run;
possible
Obsidian | Level 7
Yes, if I use the data step then the records in the report doesn't have any invalid characters but they're still generated in the report.

I think in this case where step is best option and i may need to manually adjust the variables
Tom
Super User Tom
Super User

Since you are picking observations ("records") based on the presence of invalid characters in ANY of the variables there will be some values printed that are fine.

 

If you see and example where none of the values appear to have any invalid characters then look at the HEX codes for the values.  To make it easier you could remove the valid characters first using KCOMPRESS()  and then only print the hexcodes for the characters that are left.

 

Say this issue is identified in observation 10 you could run this data step to see any non valid characters in that observation only.

data test;
  set have (firstobs=10 obs=10);
  length string $200;
  string=kcompress(cats(of _character_),collate(26,126)||'C2A0'x));
  len = lengthn(string);
  string = putc(string,cats('$hex',2*len));
  put string=;
run;
  

You might see things like TAB ('09'x) or LINEFEED ('0A'x) or perhaps some other invisible character.

possible
Obsidian | Level 7
Thank you for helping. I was helping but I got a new request where only effected field must be printed and not entire row. this means if a variable has special characters then only that field with the values should be printed. How to handle this?
Tom
Super User Tom
Super User

In that case you will want to use the DO loop method to check each member of the array of character variables.  Add an OUTPUT statement inside the DO loop to write out an observation per variable with issues.  (or just have the data step write the report directly without saving.  Which was easier in the days of plain text output.)

 

You will need to take some care to use variable names that do not actually appear in your dataset.

One to store the NAME of the variable with issues and another to store the VALUE of the variable.

You might also want to keep any KEY variables that you dataset has, or perhaps just keep the observations number.

 

data report;
  length __obs__ 8 __name__ $32 __value__ $400 ;
  set have;
  __obs__+1;
  array _char_ _character_;
  do _n_=1 to dim(_char_) ;
    if kverify(_char_[_n_],collate(32,126)||'C2A0'x) then do;
      __name__ = vname(_char_[_n_]);
      __value__ = _char_[_n_];
      output;
    end;
  end;
  keep __obs__ __name__ __value__;
run;
possible
Obsidian | Level 7
Thank you. This is working. For some of the values, I see that "?" is been printed into output which is valid ASCII code and this record shouldn't be in the listing. Any suggestion?
Tom
Super User Tom
Super User

Look at the actual hex codes of the characters in the field.  Most likely there is a TAB or other invisible character in addition to the ?.  Or perhaps the printer is just displaying ? instead of some character it does not understand.

 

You might want to include that in the report.  For example you might include the hex code of the first invalid character it finds.

data report;
  length __obs__ __loc__ 8 __hex__ $8 __name__ $32 __value__ $400 ;
  set have;
  __obs__+1;
  array _char_ _character_;
  do _n_=1 to dim(_char_) ;
    __loc__ =  kverify(_char_[_n_],collate(32,126)||'C2A0'x) ;
    if __loc__ then do;
      __hex__ =ksubstr(_char_[_n_],__loc__,1);
      __hex__ =putc(__hex__,cats('$hex',2*length(__hex__)));
      __name__ = vname(_char_[_n_]);
      __value__ = _char_[_n_];
      output;
    end;
  end;
  keep __obs__ __loc__ __hex__ __name__ __value__;
run;
possible
Obsidian | Level 7
Thank you for the assists Tom. It worked.

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 970 views
  • 6 likes
  • 2 in conversation
OSZAR »