BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ciaohermosa
Calcite | Level 5

I have a dataset with variables for year, city, indicator, and indicator_options. I have aggregated them together to get the count of all those for indicator_options within the indicator, city and year. I am trying to suppress data that is between 1-5 and then also considering secondary/complementary suppression so that if there is only one missing value, the next lowest value is suppressed. I have been able to do the first round of suppression but am getting stuck with the secondary suppression.

 

I asked ChatGPT, which gave me a pretty good solution:

 

 

DATA trends;
    SET trends;
    BY options indicator city year;

    /* Step 1: Initialize variables for tracking */
    RETAIN missing_count next_min_value;

    /* Initialize for the first record in each group */
    IF first.year THEN DO;
        missing_count = 0;
        next_min_value = .;
    END;

    /* Step 2: Count the missing values in the group */
    IF count_supp = . THEN missing_count + 1;

    /* Step 3: Find the next lowest non-missing value (next smallest value) */
    IF count_supp NE . THEN DO;
        IF next_min_value = . OR count_supp < next_min_value THEN next_min_value = count_supp;
    END;

    /* Step 4: Modify the next smallest non-missing value if exactly one missing value exists in the group */
    IF last.year AND missing_count = 1 THEN DO;
        IF count_supp = next_min_value THEN count_supp = .; /* Set the next smallest non-missing value to missing */
    END;

    /* Step 5: Output the modified rows */
    count_supp_new = count_supp; /* Create the new column for modified values */
    OUTPUT;

    /* Drop temporary variables */
    DROP missing_count next_min_value;
RUN;

 

However, the problem arises with Step 4 which seems to just be looking at the last row, not the next lowest value within the group. I can't figure out how to modify the code so that it considers the whole group.

 

Here is the output with the code:

yearcityindicatoroptionscountcount_suppcount_supp_new
2017Parisjob titleConsultant888
2019Parisjob titleConsultant666
2021Parisjob titleConsultant4..
2023Parisjob titleConsultant88.

 

Here is what it should look like with the next lowest value marked as missing/suppressed.

yearcityindicatoroptionscountcount_suppcount_supp_new
2017Parisjob titleConsultant888
2019Parisjob titleConsultant66.
2021Parisjob titleConsultant4..
2023Parisjob titleConsultant888
1 ACCEPTED SOLUTION

Accepted Solutions
quickbluefish
Barite | Level 11

Here's a modified version that does that:

data have;
infile cards dsd truncover firstobs=1 dlm='09'x;
length year 3 city indicator options $30 count 3;
input year city indicator options count;
cards;
2017	Paris	job title	Consultant	8
2019	Paris	job title	Consultant	6
2021	Paris	job title	Consultant	4
2023	Paris	job title	Consultant	8
;
run;

data have;
length srt 3;
set have;
srt=_N_;
run;

proc sort data=have; by city indicator options count; run;

data want;
set have;
by city indicator options;
length nsupp count_supp count_supp_new 3;
if first.options then nsupp=0;
count_supp=count;
count_supp_new=count;
if count<5 or nsupp=1 then do;
	count_supp_new=.;
	if count<5 then count_supp=.;
	nsupp+1;
end;
drop nsupp;
run;

proc sort data=want; by srt; run;

proc print data=want; run;

View solution in original post

10 REPLIES 10
quickbluefish
Barite | Level 11

I'm not sure if your data are more complicated than what you've shown here, but if not, then this seems to work - suppresses counts <5 and will suppress the next largest if the number suppressed has not reached two yet:

data have;
infile cards dsd truncover firstobs=1 dlm='09'x;
length year 3 city indicator options $30 count 3;
input year city indicator options count;
cards;
2017	Paris	job title	Consultant	8
2019	Paris	job title	Consultant	6
2021	Paris	job title	Consultant	4
2023	Paris	job title	Consultant	8
;
run;

data have;
length srt 3;
set have;
srt=_N_;
run;

proc sort data=have; by city indicator options count; run;

data want;
set have;
by city indicator options;
length nsupp supp_count 3;
if first.options then nsupp=0;
supp_count=count;
if count<5 or nsupp=1 then do;
	supp_count=.;
	nsupp+1;
end;
drop nsupp;
run;

proc sort data=want; by srt; run;

proc print data=want; run;
 

quickbluefish_0-1744151009732.png

...would definitely test thoroughly, though.

 

ciaohermosa
Calcite | Level 5

Thank you so much for responding! This is really helpful and works for the case I provided and several others.

 

Yes my data is more complex/larger than the example I gave, but was just showing a snippet. It looks like your code worked for several scenarios, but I found an issue in one setting - see below.

 

Below is the output:

yearcityindicatorindicator_optionscountcount_suppcount_supp_new
2017Londonjob siteBuilding A3..
2019Londonjob siteBuilding A99.
2021Londonjob siteBuilding A2..
2023Londonjob siteBuilding A2..

 

But this is what I should get- any ideas on how to adjust the code to make this work?

 

yearcityindicatorindicator_optionscountcount_suppcount_supp_new
2017Londonjob siteBuilding A3..
2019Londonjob siteBuilding A999
2021Londonjob siteBuilding A2..
2023Londonjob siteBuilding A2..
quickbluefish
Barite | Level 11
I might not be understanding. What's the difference between 'count_supp' and 'count_supp_new'? Is the former supposed to contain everything >=5 and the latter supposed to contain everything >=5 that was also not removed for complementary reasons?
ciaohermosa
Calcite | Level 5

Count is the original count of the category. count_supp is the the results after the first round of suppression (marking everything <5 as suppressed - easy). Count_supp_new is for the second round, which would then check if there is one value <5 or one value that is already marked as suppressed, it suppresses the next lowest value within the group. It just separates it out into separate columns so I can view the changes for each step to check the results. The only outcome needs to be one final suppressed column. 

 

As I am writing this, I think your code does that. What I need instead I am realizing is a way to not run the secondary suppression if there are already 2 suppressed values present in count_supp. The goal overall of the secondary suppression is that someone looking at the data can't go back and figure out what the one suppressed number is by doing simple math. So for example, if the counts are 2, 10, 11, and 12, and only 2 is suppressed, you can figure out that 2 is the missing number because you know the sum is 35. What I need to do is suppress 2 and 10. But in the case where the values are 2, 3, 11, and 12, both 2 and 3 are suppressed in the first round of suppression, it doesn't matter that I know the sum is 28 because I don't know how the remaining 5 that aren't showing are divided between the 2 suppressed categories.

 

Hope that makes sense...sorry for the confusion!

quickbluefish
Barite | Level 11
Yes, does the updated code do what you're looking for? I'm very familiar with the concept as it's part of my daily work. It can definitely get a lot trickier when the table you create with these data also have both row and column totals in addition to the individual counts.
ciaohermosa
Calcite | Level 5

Unfortunately the updated code doesn't work. I think the issue is that the code is suppressing the next value, when what it needs to do is suppress the next lowest count_supp value within the group and also if there are already 2 suppressed (or missing = .) values within the group, then it should not do any further suppression.

quickbluefish
Barite | Level 11
OK, what are the raw numbers for which it's not working? It seems to be doing what you're asking for in both cases above (with raw counts of [3, 9, 2, 2] or [8,6,4,8]).
ciaohermosa
Calcite | Level 5

This worked! Sorry there was an issue with the way I was sorting the real data, but once that was corrected the code functioned. Thanks so much for the time you put into this and the quick response!

quickbluefish
Barite | Level 11

Here's a modified version that does that:

data have;
infile cards dsd truncover firstobs=1 dlm='09'x;
length year 3 city indicator options $30 count 3;
input year city indicator options count;
cards;
2017	Paris	job title	Consultant	8
2019	Paris	job title	Consultant	6
2021	Paris	job title	Consultant	4
2023	Paris	job title	Consultant	8
;
run;

data have;
length srt 3;
set have;
srt=_N_;
run;

proc sort data=have; by city indicator options count; run;

data want;
set have;
by city indicator options;
length nsupp count_supp count_supp_new 3;
if first.options then nsupp=0;
count_supp=count;
count_supp_new=count;
if count<5 or nsupp=1 then do;
	count_supp_new=.;
	if count<5 then count_supp=.;
	nsupp+1;
end;
drop nsupp;
run;

proc sort data=want; by srt; run;

proc print data=want; run;
quickbluefish
Barite | Level 11

Glad that helped.  I have always had trouble figuring out a way to automate this process, as I said, for tables that involve row and column totals -- see below for an example.  Here, we've suppressed the count that's below the threshold (4) and the next smallest one (6) to mask it, so the row total for Paris can't be used to back-calculate the 4.  However, because of the column totals, we can currently back calculate either of the suppressed numbers, which leaves the question of which numbers to suppress in those columns, which themselves, in turn, need to be masked so that you can't back calculate from those row totals.  It gets pretty confusing... 

quickbluefish_0-1744290876454.png

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1150 views
  • 0 likes
  • 2 in conversation
OSZAR »