Hi all,
I came across an interesting behavior in the SAS data step. Essentially, two identical calculations have different results.
data indata;
a = 3;
m = 1/a;
n = 1/a;
x = m-n;
y = m-n;
run;
proc print data=indata;
run;
The results are below.
As you can see, X and Y are different, despite both being set to the same operation of (m-n). I understand that there are issues with floating point values in programming where a subtraction of two identical fractions may not result in 0. My confusion is why x and y are not either both 0 or both the value of -1.8513E-17.
Cheers
I am in Technical Support. I am sorry I led you down the path of numeric precision when that isn't exactly the issue here. I am running the exact same version of 9.4M6 as you:
13 %put &sysvlong;
9.04.01M6P110718
and I am not replicating the same difference that you show.
While the behavior does seem strange and incorrect, the fact that we cannot replicate it in the same version and the fact that it does not appear to occur in later releases suggests that it would not be treated as a bug for M6. Ultimately, the way to account for the difference when using imprecise values is to use the ROUND function.
Since you are running on Windows 32-bit, the following SAS Note might provide some additional explanation:
6214 - Similar equations yield different results on Intel machines
If you would like to continue to pursue this through a case with Technical Support, please feel free to open a new one and include your complete log and output.
Are you sure you printed the same dataset as made by that code?
1    data indata;
2    a = 3;
3
4    m = 1/a;
5    n = 1/a;
6
7    x = m-n;
8    y = m-n;
9    run;

NOTE: The data set WORK.INDATA has 1 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

10
11   proc print data=indata;
12   run;

NOTE: There were 1 observations read from the data set WORK.INDATA.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
Yes. I just re-ran in a fresh session. I added two additional 'flags' to check equality.
data indata;
a = 3;
m = 1/a;
n = 1/a;
x = m-n;
y = m-n;
if m=n then FLAG1="Y";
if x=y then FLAG2="Y";
run;
proc print data=indata;
run;
What SAS version are you running? A coworker ran the above code on SAS EG as well, and observed similar results.
Hi Tom,
I reran the same code and got the result that the OP had.
However, if I insert the "put" command in between, suddenly both are 0 again. But outputting the values to log shouldn't change the calculation, should it?
Best,
I can't replicate your results. I ran on 9.4M7 (Windows) and 9.4M8 (Linux).
Hi Quentin,
Thanks for testing on your machine! It appears I have a different version.
Cheers
Pictures of code are not that helpful. The text itself is better. Same for the LOG. And preferably the OUTPUT also, but using ODS instead of plain old text output makes that difficult now.
Showing the LOG would help us be sure that you have actually created a NEW dataset. If the data step failed for some reason, such as having the file open in another window, then the printout is of something different. Perhaps in that different dataset X and Y were calculated differently.
Also it would be interesting to see if the pattern continues if you change the code.
Say by making more variables that should have the same result. Do they all get different values? Or are the two values you are getting now the only possible results?
Or perhaps just adding other statements that don't really do anything might make the data step compiler generate different code and cause it to work.
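One way to run that experiment, as a rough sketch only: repeat the same subtraction into a few more variables and print everything with the HEX16. format so the exact stored bits are visible. The dataset name CHECK and the extra variables Z and W are illustrative names, not from the original post. The values are displayed through PROC PRINT rather than a PUT statement, since a PUT inside the data step was reported above to change the outcome.

```sas
/* Illustrative diagnostic: CHECK, z and w are made-up names. */
data check;
a = 3;
m = 1/a;
n = 1/a;
x = m-n;
y = m-n;
z = m-n;   /* extra copies of the same expression */
w = m-n;
run;

/* HEX16. shows the full 8-byte floating-point representation, */
/* so differences hidden by the default format become visible. */
proc print data=check;
format m n x y z w hex16.;
run;
```

If M and N show identical hex values but X, Y, Z and W do not, that would point at how the subtraction itself is being evaluated rather than at the stored values of M and N.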
This is classic numeric precision in SAS. We recommend using the ROUND function.
data indata;
a = 3;
m = 1/a;
n = 1/a;
x = round(m-n,.01);
y = round(m-n,.01);
if m=n then FLAG1="Y";
if x=y then FLAG2="Y";
run;
proc print data=indata;
run;
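If rounding the stored values to two decimals is too coarse for your data, one variation is to keep X and Y as calculated and round only inside the comparison. This is a minimal sketch, assuming a rounding unit of 1E-12 is acceptable for your values; the dataset name INDATA2 and the rounding unit are illustrative choices, not a recommendation for every case.

```sas
/* Sketch: compare with a small rounding unit instead of .01. */
/* INDATA2 and the 1E-12 unit are assumptions for illustration. */
data indata2;
a = 3;
m = 1/a;
n = 1/a;
x = m-n;
y = m-n;
/* Round only for the comparison; X and Y keep their raw values. */
if round(x, 1e-12) = round(y, 1e-12) then FLAG2="Y";
else FLAG2="N";
run;

proc print data=indata2;
run;
```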
Hi,
Do you mind if I ask a further clarification question? It seems that this goes beyond the precision issue. If 1/3 cannot be represented precisely in binary, shouldn't the values of c and d BOTH be off by the same amount, so that their difference is zero regardless?
I tried rerunning the code, and it seems that simply writing the results to the log using put (as seen in my post above) changes the values of r and s. This seems too unintuitive to be explained by the numeric precision issue.
@FoxHope wrote:
Hi,
Do you mind if I ask a further clarification question? It seems that this goes beyond the precision issue. If 1/3 cannot be represented precisely in binary, shouldn't the values of c and d BOTH be off by the same amount, so that their difference is zero regardless?
Agree, what you are reporting doesn't make sense to me, and isn't the basic numeric precision issue.
Even if somehow these two statements could return different values (because of what, random noise in precision):
m = 1/a;
n = 1/a;
I can't think of an explanation for how these two statements could return different values (except, again, random noise in precision...)
x = m-n;
y = m-n;
And yes, your addition of a PUT statement shouldn't change values.
Something odd is happening. Did you quit SAS and restart a fresh session?
Hi Quentin,
Yes, I restarted my computer (just in case), restarted the session, and tried in both SAS 9.4 and SAS EG. I still get the same result that I posted earlier. I wonder if it's because I am running a slightly older version of SAS (9.4_M6), since other people can't replicate what I see (except the OP, who gets the same results as I do).
Best
You might try submitting to tech support. I can't think of an explanation. It's interesting that you both have the same version (9.4M6) showing this odd behavior. But I still have a hard time believing this is a weird bug.
Here is another example that may help clarify the numeric precision issue. Note that both X and Y appear to be 1 but they are slightly different when viewed in hex.
data test;
x=1;
y=1/10;

do i=1 to 9;
x+1;
y + 1/10;
end;
x=x/10;

put x hex16.;
put y hex16.;
run;
3FF0000000000000
3FEFFFFFFFFFFFFF
Whenever a numeric variable is used in an arithmetic operation, the operation is done in binary. In other words, the floating-point number is manipulated in binary, and the result we see is the decimal rendering of that binary result. SAS does not convert the number to decimal form and then perform the operation in decimal.
Therefore, no matter which SAS step you use (PROC SQL, the DATA step, or any other procedure), there is a chance that binary arithmetic will not give the same result as decimal arithmetic.
SAS uses floating-point representation because it is what the operating system can manipulate easily. Using floating point allows the operating system to manipulate the bytes SAS stores without the need to convert those bytes to a binary form. This saves a significant amount of processing time. Additionally, with floating-point representation SAS can store numbers of larger magnitude and greater precision in fewer bytes than a decimal representation would need.
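To make the binary storage concrete, here is a small sketch that writes two common fractions to the log in hex. The variable names TENTH and THIRD are illustrative. The values in the comments are what I would expect on platforms that use IEEE 754 doubles, such as Windows and Linux; other hosts (for example, mainframes) store numbers differently.

```sas
/* Sketch: view the stored bits of values that have no exact */
/* binary representation. TENTH and THIRD are made-up names. */
data _null_;
tenth = 0.1;
third = 1/3;
put tenth= hex16.;   /* IEEE 754: 3FB999999999999A */
put third= hex16.;   /* IEEE 754: 3FD5555555555555 */
run;
```

Neither value is exact: each hex pattern is the closest 64-bit approximation, which is why repeated arithmetic on such values can drift away from the decimal answer.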
That makes sense as the usual numeric precision issue.
But I don't see how that could explain:
a = 3;
m = 1/a;
n = 1/a;
Assigning different values to M and N. Can it?