
Sunday, November 18, 2018

Download data from Google Cloud Storage (GCS) using Talend

Summary - In this blog I will show how to use Talend Open Studio to download files from a GCS bucket/folder to a local machine. I have built in some extra logic that first checks whether the file already exists on the local machine and downloads it only if it does not. You can adjust the logic to suit your needs.

Details - As of now, Talend supports connecting to GCS using an access key and a secret key.

  1. As a prerequisite, you need access to GCP and to Google Cloud Storage (GCS) buckets.
  2. You can start for free and get credits.
  3. Once you are set up on GCP and have an initial project launched, you can create a bucket in GCS.
  4. Talend has tGS* components to help with GCP integration.
  5. Enabling Interoperability: you have to get your access key and secret key from GCP. To get these keys, you have to enable Interoperability in the GCS panel. See the screenshot below.


Next, use the "create a new key" option to generate the keys, and make a note of them.
  • Now, in Talend Open Studio for Data Integration, select the tGSConnection component from the palette and enter the keys above. This component establishes the connection to GCP.
  • The tGSList component can be used to get the list of items in a GCS bucket.
  • The logic I have used is to get the list of all items whose key/name starts with a given prefix, iterate over each object, and check whether it already exists in a folder on my local machine. If it exists, I skip the file; otherwise I use tGSGet to download it from GCS to the local machine.
  • Finally, I use tGSClose to close the connection to GCP.
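The skip-if-present logic in the bullets above can be sketched in plain Java, outside Talend. This is only an illustration under assumptions: the class `MissingFileFilter`, the method `keysToDownload`, and the example keys are hypothetical names, and the list of keys stands in for what tGSList would return. The file name is derived with the same first-slash rule that the job's tJava code uses.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Talend: mirrors the
// tGSList -> local existence check -> tGSGet flow of the job.
final class MissingFileFilter {

    // Return only the keys whose file is not yet present in localDir.
    static List<String> keysToDownload(List<String> keys, Path localDir) {
        List<String> missing = new ArrayList<>();
        for (String key : keys) {
            // Same rule as the job: drop everything up to the first "/".
            int slash = key.indexOf('/');
            String fileName = (slash == -1) ? key : key.substring(slash + 1);
            if (!Files.exists(localDir.resolve(fileName))) {
                missing.add(key); // this is where the job would run tGSGet
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Example keys and target directory are made up for illustration.
        List<String> keys = Arrays.asList("reports/a.csv", "reports/b.csv");
        Path localDir = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(keysToDownload(keys, localDir));
    }
}
```

In the Talend job the same decision is made one object at a time inside the tGSList iteration, rather than by collecting a list first.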
Talend Job
Components to connect to GCP and get bucket items from GCS

Specifying the key prefix so that relevant files are returned by tGSList


Java logic to get the file name from the key returned by GCS

tJava is used to derive the file name from the complete key name returned by tGSList and store it in globalMap as gs_filename.
Java Code
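// Full object key of the item currently being iterated by tGSList
String strTemp = ((String)globalMap.get("tGSList_1_CURRENT_KEY"));
// Position of the first "/" in the key, or -1 if the key has no folder part
int index_of_slash = strTemp.indexOf("/");
if (index_of_slash == -1) {
    // No folder part: the key itself is the file name
    globalMap.put("gs_filename", strTemp);
} else {
    // Drop everything up to and including the first "/"
    globalMap.put("gs_filename", strTemp.substring(index_of_slash + 1));
}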
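Note that the code above cuts at the first "/", so a key with several folder levels keeps its inner folders in gs_filename. If you want only the bare file name for nested keys, one possible variant is to cut at the last "/" instead. The sketch below is an assumption about what you might need, not what the job does; `GcsKeyNames`, `baseName`, and the sample keys are hypothetical.

```java
// Hypothetical helper, not part of the original job.
final class GcsKeyNames {

    // Keep only the text after the last "/" so nested keys yield a bare file name.
    static String baseName(String key) {
        int lastSlash = key.lastIndexOf('/');
        return (lastSlash == -1) ? key : key.substring(lastSlash + 1);
    }

    public static void main(String[] args) {
        System.out.println(baseName("exports/2018/11/sales.csv")); // prints "sales.csv"
        System.out.println(baseName("sales.csv"));                 // prints "sales.csv"
    }
}
```

Inside tJava the same change would amount to replacing indexOf with lastIndexOf in the existing code.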


Path of the local file/folder to be checked for existence


Using NOT, as we want to get the file only if it does not exist
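To make the path and the NOT condition concrete, here is a small sketch that simulates globalMap with a plain HashMap. It assumes the existence check is done by a tFileExist component named tFileExist_1, whose result would be read from the "tFileExist_1_EXISTS" entry; the component name, the class `RunIfSketch`, and the directory "/tmp/gcs" are assumptions for illustration and may differ in your job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the expressions typed into the Talend components.
final class RunIfSketch {

    // Local path to check: output directory plus the file name stored by tJava.
    static String localPath(String outputDir, Map<String, Object> globalMap) {
        return outputDir + "/" + (String) globalMap.get("gs_filename");
    }

    // The NOT condition: true means the file is missing, so tGSGet should run.
    static boolean shouldDownload(Map<String, Object> globalMap) {
        // Assumes a tFileExist component named tFileExist_1 set this flag.
        return !((Boolean) globalMap.get("tFileExist_1_EXISTS"));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("gs_filename", "a.csv");
        globalMap.put("tFileExist_1_EXISTS", Boolean.FALSE);
        System.out.println(localPath("/tmp/gcs", globalMap)); // prints "/tmp/gcs/a.csv"
        System.out.println(shouldDownload(globalMap));        // prints "true"
    }
}
```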


Giving the output directory and key name in the tGSGet component