In one of my data analysis projects I had to unzip many zipped CSV source files and combine them.
The files followed a similar pattern, with one monthly file per zipped file. The task at hand was to unzip them and create a single dataset/data frame for analysis. I had R installed on my local machine, so I chose to use R rather than shell scripting.
Below is how I did it.
1) Unzip all of them and move them from the landing path to the input path:
Explanation:
(i) In line 3 of the code, I first imported the tidyverse package, which I use throughout the analysis.
(ii) Next, I pulled the list of zipped files in the landing path, as we can see in line 6 of the snippet.
(iii) Then I set the directory where I needed the unzipped files.
(iv) In line 10 of the snippet, I set the working directory.
(v) Next, I started a for loop that unzips the files one after another.
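The steps above can be sketched roughly as follows. This is a minimal sketch and not the original snippet, so the line numbers mentioned in the explanation do not apply to it. The function name unzip_all and the arguments landing_path and input_path are hypothetical. It uses only base R, and instead of changing the working directory it passes the target directory to unzip() through the exdir argument.

```r
# Sketch only: `unzip_all`, `landing_path` and `input_path` are hypothetical names.
unzip_all <- function(landing_path, input_path) {
  # List the zipped files in the landing path, with their full paths
  zip_files <- list.files(landing_path, pattern = "\\.zip$", full.names = TRUE)

  # Create the input directory if it does not exist yet
  dir.create(input_path, showWarnings = FALSE, recursive = TRUE)

  # Unzip the archives one after another into the input directory
  for (zip_file in zip_files) {
    unzip(zip_file, exdir = input_path)
  }

  # Return the names of the files now present in the input directory
  list.files(input_path)
}
```

A call such as unzip_all("landing", "input") would then extract every .zip file found in a folder named landing into a folder named input; both folder names are placeholders.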
2) Combine all unzipped files into one data frame or CSV file
Get the list of all CSV files unzipped by the earlier code chunk.
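Listing the unzipped files might look like the sketch below. The directory name in input_path is a placeholder, and the pattern assumes the files end in .csv.

```r
# `input_path` is a hypothetical directory name; point it at your unzipped files.
input_path <- "input"

# Collect every file ending in .csv, keeping the directory in each path
csv_files <- list.files(input_path, pattern = "\\.csv$", full.names = TRUE)

print(csv_files)
```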
Once we have the list of files, we just need to read them and combine them. I used the code chunk below to do that.
I passed the list of files to a for loop.
For the first file, the dataset data frame does not exist yet, so the first if condition is met and the data frame is created by reading the file directly from file1.
From the second file onwards, the second if condition applies: the file is read into a temporary data frame called tempory, which is then bound to the existing dataset, and only the unique rows are written back to the dataset data frame.
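The loop described above could be sketched as follows. This is an illustration under assumptions, not the original code: the helper name combine_csv_files is hypothetical, the files are assumed to share the same columns, and the existence check is written as a NULL test with if/else so that the first file is read only once.

```r
# Sketch only: `combine_csv_files` is a hypothetical helper name.
combine_csv_files <- function(csv_files) {
  dataset <- NULL
  for (csv_file in csv_files) {
    if (is.null(dataset)) {
      # First file: no data frame yet, so create it directly from the file
      dataset <- read.csv(csv_file, stringsAsFactors = FALSE)
    } else {
      # Later files: read into a temporary data frame, bind it to the
      # existing data, and keep only the unique rows
      tempory <- read.csv(csv_file, stringsAsFactors = FALSE)
      dataset <- unique(rbind(dataset, tempory))
    }
  }
  dataset
}
```

Because unique() drops duplicate rows, a row that appears in two monthly files ends up in the combined data frame only once.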
Once we have the combined dataset, the code below can be used to write it to a CSV file called test.csv.
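A minimal sketch of that write step is shown here. The small data frame is a stand-in so the snippet runs on its own, and row.names = FALSE is an assumption that keeps R's row numbers out of the file.

```r
# Stand-in data frame so the snippet runs on its own;
# use your combined data frame instead.
dataset <- data.frame(month = c("jan", "feb"), value = c(10, 20))

# Write the data frame to test.csv in the current working directory
write.csv(dataset, "test.csv", row.names = FALSE)
```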
Test run:
I placed 3 zipped files in the landing path.
I loaded the tidyverse package using the library() function.
I ran the next chunk of code.
The 3 files were listed and the directories were set.
Once the for loop ran, the files were unzipped and the input directory was populated with the source files.
Now the list of files was ready and the working directory had been changed.
Once the for loop was triggered, the combined data frame was created; we glimpsed that data frame to verify it.
Once verified, we ran the command to write the data to a CSV file.
The code successfully unzipped and combined the 3 source files.