BookmarkSubscribeRSS Feed
kadesh
Calcite | Level 5
Hi everyone, I need your help because I'm going crazy. I need to create a program that, starting from an imported dataset, searches for the first possible combination of amounts, which when added together give the searched amount. Let me explain better with an example. Suppose we have a dataset consisting of two columns. In the first column there is a list of amounts, which represent the invoices received, in the second column the amount to search for (which is repeated the same for the entire extension of the dataset). I need the program to find the combination of amounts in the first column that result in the amount found in the second column.
Example:
Column 1:
100
200
50
400
1000

Column 2:
1700
1700
1700
1700
1700
I need the program to find the combination that gives the result 1700. So it will be line 1 + line 2 + line 4 + line 5. If there are multiple combinations, he will have to stop at the first find, without continuing further.
Finally, I would need this combination to be reported in a new column called "combination".

I have made many attempts, but I can only use "nested cycles" which however allow limited management of the amounts. In my project, the invoices to be added to arrive at the amount to be searched for can even be hundreds.

Thanks for your help
12 REPLIES 12
Tom
Super User Tom
Super User

Your data structure seems strange.  Why do you have the same total repeated onto every observation instead of the total being in its own dataset with just one observation?

 

This is actually a complex problem that might be something you could solve using some of SAS's more advanced operations research procedures (not my area of expertise).

 

You could conceivably brute force your way to a solution my analyzing all combinations of 1 value, then all combinations of 2 values, etc until you find one that adds to the number in question.

 

What did you try?  Did you try using CALL ALLCOMBI() ?  

https://documentation.sas.com/doc/en/vdmmlcdc/8.1/lefunctionsref/n1ejs42wcawniyn1lfoj8r4taf1o.htm

 

Tom
Super User Tom
Super User

First let's make a dataset that looks like what you described.

data have;
  input item ;
  total=1700;
cards;
100
200
50
400
1000
;

Now let's convert it into a wide structure that is easier to work with.

proc transpose data=have out=wide(drop=_name_);
  by total;
  var item;
run;

Now we can use COMB() and CALL ALLCOMB() to brute force a way through all combinations until one adds up.

data want;
  set wide;
  array col col:;
  n=dim(col);
  found=0;
  length list $200;
  do k=1 to n until(found);
    do j=1 to comb(n,k) until(found);
      call allcomb(j,k,of col[*]);
      sum=0;
      list=' ';
      do i=1 to k;
        sum+col[i];
        list=catx(' ',list,col[i]);
      end;
      found = sum = total ;
    end;
  end;
  if not found then call missing(k,list);
  keep k list;
run;

Results

Tom_0-1728338371408.png

 

kadesh
Calcite | Level 5
Thank you very much, this solution works, unfortunately AllComb seems to have a limit of 33 items. The database I use can contain hundreds of invoices. I believe the only solution is through using OR OPT
ballardw
Super User

@kadesh wrote:
Thank you very much, this solution works, unfortunately AllComb seems to have a limit of 33 items. The database I use can contain hundreds of invoices. I believe the only solution is through using OR OPT

That sort of constraint would have been nice to have in the original question.

And right in the documentation of Call Allcomb is

If you need to find combinations of more than 33 items, use the CALL ALLCOMBI routine.

which would be similar.

 

Though I will say I have a hard time visualizing the usefulness

ballardw
Super User

How many rows of data are there?

 

One of the places where transpose to a wide data set and then use the various permutation/combination functions.

 

data have;
   input v t;
datalines;
100   1700
200   1700
50    1700
400   1700
1000  1700
;

proc transpose data=have out=trans (drop=_name_);
   by t;
run;
     
data want;
   set trans;
   array x(*) col: ;
   array c(5);  /*the 5 here should equal the number of values in the X array*/
   n=dim(x);
   do k=1 to n; /* number of values to consider, i.e 1, 2, 3, 4, at a time*/
      pcomb = comb(n,k); /* how many combinations of n choose k*/
      do j=1 to pcomb;  /* to the first combination. */
         /* make sure the target list is empty*/
         call missing(of c(*));
         call allcomb(j,k,of x(*));
         /* copy the selected values, the first k */
         do h=1 to k;
            c[h]=x[h];
         end;
         /* if the target value is met*/
         if t= sum(of c(*)) then output;
         
      end;
   end;
   drop n k pcomb j h;
run;

While this works for your example list it will have serious time problems if the number of values gets very large as we are dealing with factorial numbers here. For example if there are 40 values then there will be on the order of 658,000 loops just in the consideration of 40 selecting 5 elements at a time. If you have 40 values there will be 1.0995116E12 approximately loops executed. Might take a bit of time.

 

This does not have any useable output if the target number is not matched exactly. The variables C1, C2,...C5 will have some combination of the values if there is any.

 

 

Ksharp
Super User

If you have many obs like hundreds of obs ,you need help of SAS/OR . and can get all the solution.

Post it at OR forum and calling @RobPratt 

https://communities.sas.com/t5/Mathematical-Optimization/bd-p/operations_research

 

data have;
input Column1;
cards;
100
200
50
400
1000
;

proc optmodel;
set idx ;
num x{idx};
var v{idx} binary;

read data have into  idx=[_n_] x=Column1 ;
con con1:sum{i in idx} v[i]*x[i]=1700;
solve with clp/findallsolns;
create data want from [combination id]={s in 1.._NSOL_,i in idx} Column1=x[i] flag=v[i].sol[s];
quit;

proc print;run;

Ksharp_0-1728355589390.png

 

Ksharp
Super User
If you want only one solution to enhance efficience, could change
solve with clp/findallsolns;
into
solve with clp ;
kadesh
Calcite | Level 5

Using the ALLCOMBS function works but has a limit of 33 items, while the invoices to manage can be hundreds. Perhaps the only solution, as you suggested, is to use OR OPT, unfortunately I don't have the license for this tool. I'm afraid I'll have to give up...
Ksharp
Super User
You can use SAS/OR freely by SAS OnDemand for Academic:
https://welcome.oda.sas.com/login
Astounding
PROC Star

Years ago, I wrote a DATA step to find all combinations without using any functions.  I had to stop after about 25 values, because the program took so long to run.  Let me try to explain why.

Each value  you add doubles the number of possible combinations.  That value can either be omitted, giving you the same number of  combinations you had without the value, or included, giving you another equally sized set of combinations.  So with 100 values, you can have 2**100 combinations (I guess you could subtract one because you don't need to include the empty set of all values omitted.)

If you had a machine that could process 1M values per second, it could handle 20 values easily since 2**20 is roughly 1M.  Once you get up to 40 values, you would need 1M seconds (almost 12 days if I did the math right).  If you wrote a program to process 100 values, you would not live long enough to have it process every combination.  

ballardw
Super User
/* count all the 100 choose n for n=1 to 100*/
data combinations;
   do i=1 to 100;
      x= sum(x,comb(100,i));
   end;
run;

Roughly 1.2676506E30

Or 1.2676506002282E23 seconds at the million per second. Which wouldn't complete before the year 20,000.

 

Permutations where the ORDER of values gets considered is way more...

 


@Astounding wrote:

Years ago, I wrote a DATA step to find all combinations without using any functions.  I had to stop after about 25 values, because the program took so long to run.  Let me try to explain why.

Each value  you add doubles the number of possible combinations.  That value can either be omitted, giving you the same number of  combinations you had without the value, or included, giving you another equally sized set of combinations.  So with 100 values, you can have 2**100 combinations (I guess you could subtract one because you don't need to include the empty set of all values omitted.)

If you had a machine that could process 1M values per second, it could handle 20 values easily since 2**20 is roughly 1M.  Once you get up to 40 values, you would need 1M seconds (almost 12 days if I did the math right).  If you wrote a program to process 100 values, you would not live long enough to have it process every combination.  


 

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 1816 views
  • 4 likes
  • 6 in conversation
OSZAR »