Summary - In this post I load the same data (split across multiple files) from CSV, Avro and JSON formats into Google BigQuery tables and capture the load timings.
Details - I have Citi Bike trips data available in CSV, JSON and Avro formats in Google Cloud Storage (GCS).
- Get the data - I downloaded the data from the Google BigQuery public datasets; refer to the blog post export-google-bigquery-public-dataset.html for the download steps.
Load timings recorded (no other load was running on BigQuery at the time)
Bulk Load CSV files -
Launch Google Cloud Shell and run the command below.
bq load --source_format=CSV --autodetect pristine-nomad-222804:nyc_data.citibike_trips_csv gs://ag_nycdata/citibike_trips/citibike_trips*.csv.gzip
Avg size of each file - 15.6MB (compressed)
Avg rowcount in each file - 584544
Total rowcount loaded - 33319019
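Since the point of this exercise is to capture load timings, one way to record them is to wrap each load in a small timer. The sketch below is only an illustration: `time_load` is a hypothetical helper name (it is not part of the bq CLI), and it measures wall-clock seconds as seen from Cloud Shell, which includes client overhead and is not necessarily how the timings in this post were taken.

```shell
#!/usr/bin/env bash
# time_load is a hypothetical helper: run any command and print the
# elapsed wall-clock seconds on stderr, preserving the exit status.
time_load() {
  local start end status=0
  start=$(date +%s)
  "$@" || status=$?
  end=$(date +%s)
  echo "elapsed_seconds=$((end - start))" >&2
  return "$status"
}

# Example usage (needs the bq CLI and access to the bucket):
# time_load bq load --source_format=CSV --autodetect \
#   pristine-nomad-222804:nyc_data.citibike_trips_csv \
#   "gs://ag_nycdata/citibike_trips/citibike_trips*.csv.gzip"
```

The same wrapper works unchanged for the JSON and Avro loads, so all three formats are measured the same way.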
Bulk Load JSON files -
bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect pristine-nomad-222804:nyc_data.citibike_trips_json gs://ag_nycdata/citibike_trips/citibike_trips*.json.gzip
Avg size of each file - 18.8MB (compressed)
Avg rowcount in each file - 584544
Total rowcount loaded - 33319019
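The total row count above can be cross-checked after a load with a simple count query. This is a minimal sketch, assuming standard SQL and the table names used in this post; `count_sql` is a hypothetical helper that only builds the query text, and the commented bq call shows how it might be run.

```shell
#!/usr/bin/env bash
# count_sql is a hypothetical helper: print a standard SQL row-count
# query for a fully qualified table (project.dataset.table).
count_sql() {
  printf 'SELECT COUNT(*) AS row_count FROM `%s`' "$1"
}

# Example usage (needs the bq CLI and access to the dataset):
# bq --format=csv query --use_legacy_sql=false \
#   "$(count_sql pristine-nomad-222804.nyc_data.citibike_trips_json)"
```

The backticks around the table name are needed here because the project ID contains hyphens.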
Bulk Load AVRO files -
bq load --source_format=AVRO --autodetect pristine-nomad-222804:nyc_data.citibike_trips_avro gs://ag_nycdata/citibike_trips/citibike_trips*.avro
Avg size of each file - 72.7MB
Avg rowcount in each file - 584544
Total rowcount loaded - 33319019
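The per-file averages above also give a rough idea of how many files and how much data each load had to read. The sketch below only does that arithmetic with the figures from this post; because the averages are rounded, the results are estimates, and the `estimate` function name is my own.

```shell
#!/usr/bin/env bash
# Figures copied from this post; the averages are rounded, so treat
# the output as an estimate.
total_rows=33319019
avg_rows_per_file=584544

# estimate is a hypothetical helper: $1 = format label, $2 = average file size in MB
estimate() {
  awk -v fmt="$1" -v mb="$2" -v total="$total_rows" -v avg="$avg_rows_per_file" 'BEGIN {
    files = int(total / avg + 0.5)   # nearest whole number of files
    printf "%s files=%d approx_total_mb=%.1f\n", fmt, files, files * mb
  }'
}

estimate csv 15.6
estimate json 18.8
estimate avro 72.7
```

This works out to about 57 files per format, or roughly 889 MB of CSV, 1,072 MB of JSON and 4,144 MB of Avro in total, which is worth keeping in mind when comparing load timings across the three formats.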