You can use the FILENAME ZIP method to read and write ZIP archive files within your SAS code. You can also read and create .gz (gzip) files by using FILENAME ZIP with the GZIP option. In this article you will learn how this works, as demonstrated in a series of examples.
A ZIP archive can contain one or multiple files, optionally organized in a folder structure. To address and read a single member from the ZIP file, you can use folder-member syntax like this:
filename inzip ZIP "./projects/freddiemac.zip";
data fm;
/* Read text file directly from ZIP archive */
infile inzip(ri130701_13dn01.txt);
input @1 record_type $2. @;
/* continue processing */
run;
Alternatively, you can use the MEMBER= option on the FILENAME ZIP statement:
filename inzip ZIP "./projects/freddiemac.zip"
member="ri130701_13dn01.txt";
data fm;
/* Read text file directly from ZIP archive */
infile inzip;
input @1 record_type $2. @;
/* continue processing */
run;
You can think of a ZIP archive as a folder that contains other files and folders in a hierarchy. In this way it makes sense that you can navigate the ZIP contents by using the directory-related functions DOPEN and DREAD.
filename zipdemo ZIP "&ziproot./zipdemo.zip";
/* List the files in the ZIP */
/* Output to log */
data _null_;
fid=dopen("zipdemo");
if fid=0 then
stop;
memcount=dnum(fid);
do i=1 to memcount;
memname=dread(fid,i);
put memname=;
end;
rc=dclose(fid);
run;
Sample output:
memname=class.csv
memname=SciFi-AI.csv
NOTE: DATA statement used (Total process time):
real time 0.12 seconds
cpu time 0.07 seconds
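If you want the member list as data rather than log output, the same DOPEN/DREAD loop can write to a data set. The following is a minimal sketch that reuses the zipdemo fileref from above. The output data set name (contents) and the isFolder flag are illustrative choices, and the flag assumes that folder entries in the archive are reported with a trailing slash.

```sas
filename zipdemo ZIP "&ziproot./zipdemo.zip";

/* Store the ZIP member names in a data set instead of the log */
data contents(keep=memname isFolder);
  length memname $ 200 isFolder 8;
  fid=dopen("zipdemo");
  if fid=0 then
    stop;
  memcount=dnum(fid);
  do i=1 to memcount;
    memname=dread(fid,i);
    /* Assumption: folder entries end with a slash */
    isFolder=(substr(memname,length(memname),1)='/');
    output;
  end;
  rc=dclose(fid);
run;

proc print data=contents;
run;
```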
Modern Excel files (XLSX) use the ZIP format under the covers, and you can explore the structure with FILENAME ZIP:
filename titanic ZIP "&ziproot./titanic-full.xlsx";
/* List the files in the ZIP */
/* Output to log */
data _null_;
fid=dopen("titanic");
if fid=0 then
stop;
memcount=dnum(fid);
do i=1 to memcount;
memname=dread(fid,i);
put memname=;
end;
rc=dclose(fid);
run;
Result, including subfolder names within the XLSX zip structure:
memname=[Content_Types].xml
memname=_rels/.rels
memname=xl/workbook.xml
memname=xl/_rels/workbook.xml.rels
memname=xl/worksheets/sheet1.xml
memname=xl/worksheets/sheet2.xml
memname=xl/theme/theme1.xml
memname=xl/styles.xml
memname=xl/sharedStrings.xml
memname=xl/drawings/drawing1.xml
memname=xl/media/image1.png
memname=xl/webextensions/taskpanes.xml
memname=xl/webextensions/webextension1.xml
memname=xl/worksheets/_rels/sheet2.xml.rels
memname=xl/drawings/_rels/drawing1.xml.rels
memname=xl/webextensions/_rels/taskpanes.xml.rels
memname=docProps/core.xml
memname=docProps/app.xml
memname=xl/worksheets/sheet3.xml
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
You can use the FCOPY function to copy a member file out of a ZIP archive to another folder that your SAS session can reach. FCOPY requires two filerefs, one for the source (the ZIP member) and one for the target file:
filename zipdemo ZIP "&ziproot./zipdemo.zip" member='SciFi-AI.csv';
filename scifi "&ziproot./data/SciFi-AI.csv";
data _null_;
rc=fcopy('zipdemo','scifi');
run;
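FCOPY returns 0 when the copy succeeds and a nonzero value otherwise, so it can be worth checking the return code. Here is a minimal sketch of the same copy with a check added; the log messages are illustrative, and SYSMSG() is used to retrieve the text of the most recent error.

```sas
filename zipdemo ZIP "&ziproot./zipdemo.zip" member='SciFi-AI.csv';
filename scifi "&ziproot./data/SciFi-AI.csv";

data _null_;
  length msg $ 256;
  rc=fcopy('zipdemo','scifi');
  if rc=0 then
    put 'NOTE: Copied SciFi-AI.csv out of the ZIP archive.';
  else
    do;
      /* Retrieve the message text for the failed call */
      msg=sysmsg();
      put 'ERROR: FCOPY failed. ' rc= msg=;
    end;
run;
```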
But remember that you don't need to copy a text file out in order to read it with the DATA step:
filename zipdemo ZIP "&ziproot./zipdemo.zip";
data scifi;
/* Read directly just like a normal CSV */
infile zipdemo(SciFi-AI.csv) dsd firstobs=2;
length title $ 20 year 8 cost 8 boxoffice 8;
input title year cost boxoffice;
run;
proc print data=scifi (obs=10);
run;
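Writing works in the other direction too: point a ZIP fileref at a member and use it on a FILE statement. The sketch below writes SASHELP.CLASS as a CSV member of a ZIP archive. The archive path and member name are hypothetical, chosen only for illustration, and the header row is written by hand.

```sas
/* Hypothetical archive and member name, for illustration only */
filename newmem ZIP "&ziproot./data/class.zip" member='class.csv';

data _null_;
  set sashelp.class;
  /* DSD writes comma-delimited values */
  file newmem dsd;
  if _n_=1 then
    put 'Name,Sex,Age,Height,Weight';
  put name sex age height weight;
run;
```

You could then list the archive with DOPEN and DREAD, as shown earlier, to confirm that the member was written.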
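To create a gzip file, add the GZIP option to the FILENAME ZIP statement (the option requires SAS 9.4 Maintenance 5 or later). You can then write the compressed file with a DATA step or with FCOPY:
filename source "&ziproot./dailylog.txt";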
filename tozip ZIP "&ziproot./data/dailylog.txt.gz" GZIP;
filename tozip2 ZIP "&ziproot./data/dailylog2.txt.gz" GZIP;
/* read and rewrite text */
data _null_;
infile source;
file tozip ;
input;
put _infile_ ;
run;
/* OR, use FCOPY */
data _null_;
rc=fcopy('source','tozip2');
run;
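To read a gzip file, assign the fileref with the GZIP option and use it on the INFILE statement like any other text file:
filename fromzip ZIP "./projects/dailylog_20230821.txt.gz" GZIP;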
data logdata;
/* read directly from compressed file */
infile fromzip;
input date : yymmdd10. time : anydttme. ;
format date date9. time timeampm.;
run;
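If you need the uncompressed file on disk rather than in a data set, FCOPY can also run in the decompress direction. This is a minimal sketch that mirrors the FCOPY compression example above; the target path is a hypothetical location chosen for illustration.

```sas
filename fromzip ZIP "./projects/dailylog_20230821.txt.gz" GZIP;
/* Hypothetical target for the uncompressed copy */
filename target "./projects/dailylog_20230821.txt";

data _null_;
  /* A return code of 0 means the copy succeeded */
  rc=fcopy('fromzip','target');
  put rc=;
run;
```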
Use FOPEN, FOPTNUM, FOPTNAME, and FINFO to learn the properties of a specific ZIP member, such as its name, original file size, compressed size, and original date/time.
FILENAME F ZIP "C:\Users\sascrh\Downloads\Zillow_Neighborhoods.zip"
member="Zillow_Neighborhoods.mxd";
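data deets;
  format filetime datetime20.;
  /* Open the ZIP member; FOPEN returns 0 if it cannot be opened */
  fid = fopen("f","S");
  if fid then
    do;
      /* Loop over every information item available for this member */
      infonum=foptnum(fid);
      do i=1 to infonum;
        infoname=foptname(fid,i);
        select (infoname);
          when ('Filename') filename=finfo(fid,infoname);
          when ('Member Name') membername=finfo(fid,infoname);
          when ('Size') filesize=input(finfo(fid,infoname),15.);
          when ('Compressed Size') compressedsize=input(finfo(fid,infoname),15.);
          when ('CRC-32') crc32=finfo(fid,infoname);
          when ('Date/Time') filetime=input(finfo(fid,infoname),anydtdtm.);
          /* Ignore any other item; without OTHERWISE an unmatched value is an error */
          otherwise;
        end;
      end;
      /* Avoid dividing by zero for empty members */
      if filesize > 0 then
        compressedratio = compressedsize / filesize;
      output;
      fid = fclose(fid);
    end;
run;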
For a complete example and helpful SAS macros, see "Using FILENAME ZIP and FINFO to list the details in your ZIP files."
Series of "ZIP files" articles on blogs.sas.com
How do I read and write ZIP files in SAS? Ask the Expert webinar