Showing posts with label gcp bigquery. Show all posts
Showing posts with label gcp bigquery. Show all posts

Sunday, November 18, 2018

Export google bigquery public dataset to json, avro, csv


Summary - In this blog we will see how we can export data from google bigquery public datasets to csv, avro and json format files.

Details - There are multiple ways you can export public datasets to files from google bigquery. You can use tools like talend who can do it for you or you can use inbuilt features from google bigquery to do same.
I will like to demonstrate how easy it is to export these datasets into multiple files of different formats using google bigquery export options
  1. Get access to google cloud platform - you can start for free and get credits.
  2. once you are setup with first project - go to bigquery from gcp console.
  3. navigate to your desired public dataset in the left hand panel.
  4. select the table you want to export and click on preview option to see sample data in right hand bottom panel.
  5. In screenshot below i have used "bigquery-public-data.new_york" dataset and it has tables which stores data related to yellow taxi trips. I have selected tlc_yellow_trips_2016 in left hand panel.


  • Once you are in preview panel - you will have an option of "EXPORT".
  • Click on this option and enter/your google cloud storage bucket and folder where you want to export the data to 
  • You can also select export file format and compression
  • It is advisable to use * in filename - which allows gcp bigquery job to split export into multi-part files - which makes it easier to further load these files into other databases or process them in parallel.
  • In example below i am exporting data into my bucket called ag_nycdata and folder is tlc_yellow_trips_2016
  • I have used multiple exports like csv without compression, csv with compression, json with and without compression and avro.
Once you start the export - the data gets exported to google cloud storage and files gets generated as expected - see below