Summary - In this blog I will show how to use Talend Open Studio to download files from a GCS bucket or folder to a local machine. I have built in extra logic that first checks whether the file already exists on the local machine and downloads it only if it does not. You can adjust the logic to suit your needs.
Details - As of now, Talend supports connecting to GCS using an access key and a secret key.
- As a prerequisite, you should have access to GCP and Google Cloud Storage (GCS) buckets.
- You can start with GCP for free and receive trial credits.
- Once you are set up on GCP and have an initial project, you can create a bucket in GCS.
- Talend provides tGS* components to help integrate with GCP.
Screenshot: Enabling Interoperability
- You need to get your access key and secret key from GCP. To get these keys, enable Interoperability in the GCS panel (see the "Enabling Interoperability" screenshot).
Next, use the "create a new key" option to generate the keys and make a note of them.
- Now, in Talend Open Studio for Data Integration, select the tGSConnection component from the palette and enter the keys above. This component establishes the connection to GCP.
- The tGSList component can be used to list the items in a GCS bucket.
- The logic I have used is to list all items whose key (name) starts with a given prefix, iterate over each object, and check whether it already exists in my local folder. If it exists, I skip the file; otherwise I use tGSGet to download it from GCS to the local machine.
- Finally, I use tGSClose to close the connection to GCP.
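The skip-if-present logic described above can be sketched in plain Java outside Talend. This is only an illustration of the idea, not what the Talend components generate: the class `DownloadPlanner`, the method `keysToDownload`, and the sample keys are hypothetical, and the list of keys stands in for whatever tGSList would return.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Talend: decides which GCS keys still need downloading.
class DownloadPlanner {

    // Returns the keys whose file name is not yet present in localDir.
    static List<String> keysToDownload(List<String> keys, Path localDir) {
        List<String> missing = new ArrayList<>();
        for (String key : keys) {
            // The file name is the part of the key after the last "/"
            String fileName = key.substring(key.lastIndexOf('/') + 1);
            if (fileName.isEmpty()) {
                continue; // a key such as "reports/" is a folder placeholder
            }
            if (!Files.exists(localDir.resolve(fileName))) {
                missing.add(key); // in the job, this is where tGSGet would run
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        Path localDir = Files.createTempDirectory("gcs-demo");
        Files.createFile(localDir.resolve("a.csv")); // pretend a.csv was downloaded earlier

        List<String> keys = Arrays.asList("reports/", "reports/a.csv", "reports/b.csv");
        System.out.println(keysToDownload(keys, localDir)); // prints [reports/b.csv]
    }
}
```

Only `reports/b.csv` is selected, because `a.csv` already exists locally and `reports/` has no file name.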
Screenshot: Talend job
Screenshot: Components to connect to GCP and list bucket items from GCS
Screenshot: Specifying the key prefix so that only the relevant files are returned by tGSList
Screenshot: Java logic to get the file name from the key returned by GCS
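tJava is used to derive the file name from the complete key name returned by tGSList and to store it in globalMap under the key gs_filename.
Java code:
// Full object key of the current tGSList iteration, e.g. "folder/file.csv"
String key = (String) globalMap.get("tGSList_1_CURRENT_KEY");
// lastIndexOf also handles nested prefixes such as "a/b/file.csv"
int lastSlash = key.lastIndexOf("/");
if (lastSlash == -1) {
    // No folder prefix, so the key is already the file name
    globalMap.put("gs_filename", key);
} else {
    globalMap.put("gs_filename", key.substring(lastSlash + 1));
}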
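To try the key-to-file-name step outside Talend, here is a small standalone sketch. The class `GcsKeyNames` and the method `fileNameFromKey` are hypothetical names used only for illustration. This version takes the text after the last slash, so keys under nested folders also reduce to a bare file name.

```java
// Hypothetical standalone helper for experimenting with GCS object keys.
class GcsKeyNames {

    // Returns the part of the key after the last "/", or the whole key if there is none.
    static String fileNameFromKey(String key) {
        int lastSlash = key.lastIndexOf('/');
        return lastSlash == -1 ? key : key.substring(lastSlash + 1);
    }

    public static void main(String[] args) {
        System.out.println(fileNameFromKey("file.csv"));        // prints file.csv
        System.out.println(fileNameFromKey("folder/file.csv")); // prints file.csv
        System.out.println(fileNameFromKey("a/b/file.csv"));    // prints file.csv
        System.out.println(fileNameFromKey("folder/").isEmpty()); // prints true
    }
}
```

A key that ends with a slash yields an empty name, so a job that lists folder placeholder keys may want to skip those before checking for a local file.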
Screenshot: Path of the local file/folder to be checked for existence
Screenshot: Using NOT because we want to get the file only if it does not exist
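The NOT condition can be illustrated with a short sketch. It assumes the existence check is done by a tFileExist component named tFileExist_1 that stores its result in globalMap as a Boolean under `tFileExist_1_EXISTS`. Both the component name and the variable name are assumptions here, so check the Outline view of your own job. Under that assumption, a Run If condition might read `!((Boolean)globalMap.get("tFileExist_1_EXISTS"))`. The code below simulates globalMap with a plain HashMap and uses a null-safe variant of the same test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the Run If check; globalMap here is a plain HashMap.
class RunIfSketch {

    // True when the file was NOT found locally, i.e. the download branch should run.
    static boolean shouldDownload(Map<String, Object> globalMap) {
        // Boolean.TRUE.equals(...) avoids a NullPointerException if the variable is missing
        return !Boolean.TRUE.equals(globalMap.get("tFileExist_1_EXISTS"));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();

        globalMap.put("tFileExist_1_EXISTS", Boolean.FALSE);
        System.out.println(shouldDownload(globalMap)); // prints true

        globalMap.put("tFileExist_1_EXISTS", Boolean.TRUE);
        System.out.println(shouldDownload(globalMap)); // prints false
    }
}
```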
Screenshot: Specifying the output directory and key name in the tGSGet component