Add five key-value pairs:
a. For the first key pair, for Key, enter parentProject, and for Value, enter your Google Cloud project name.
b. For the second key pair, for Key, enter query, and for Value, enter your query, for example: SELECT col1, col2, col3 FROM your_table WHERE col4 = 'yyy'. The row filter and the selection of columns to return are pushed down to BigQuery, improving performance and reducing costs.
c. For the third key pair, for Key, enter viewsEnabled, and for Value, enter true.
d. For the fourth key pair, for Key, enter materializationDataset, and for Value, enter a dataset where the GCP user has table creation permission. If you want to materialize the temporary table in a different project, you can also add an optional key pair: for Key, enter materializationProject, and for Value, enter the name of another project where you have dataset and table creation permission.
e. For the last key pair (optional), for Key, enter maxparallelism, and for Value, enter a number between 1 and 1000. This defines the number of streams that read from the BigQuery Storage API. For performance, the number of executor cores in your Glue job should be higher than the maxparallelism value. By default, the connector creates one partition per 400 MB in the table being read (before filtering). As of this writing, the Google Cloud BigQuery Storage API has a maximum of 1,000 parallel read streams. The sketch after this list shows how these options might look when passed programmatically.
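For reference, the following is a minimal sketch of how the same key-value pairs could be passed as connection options in a Glue PySpark script that reads from BigQuery through a marketplace connector. The connection name (bigquery-connection), project, dataset, query, and parallelism values are placeholders, so substitute your own; the exact read pattern may differ depending on how your connector and connection are set up.

```python
from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext()
glueContext = GlueContext(sc)

# Connection options mirroring the key-value pairs above.
# All names and values are placeholders for illustration only.
connection_options = {
    "connectionName": "bigquery-connection",    # assumed name of your Glue connection
    "parentProject": "your-gcp-project",        # your Google Cloud project name
    "query": "SELECT col1, col2, col3 FROM your_table WHERE col4 = 'yyy'",
    "viewsEnabled": "true",
    "materializationDataset": "your_dataset",   # dataset where the GCP user can create tables
    # "materializationProject": "other-project",  # optional: materialize in a different project
    "maxparallelism": "100",                    # keep this lower than the job's executor cores
}

# Read the query results into a DynamicFrame via the marketplace Spark connector.
bq_dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="marketplace.spark",
    connection_options=connection_options,
    transformation_ctx="bq_source",
)

print(f"Rows read from BigQuery: {bq_dyf.count()}")
```

Because the query, filter, and column list are part of the connection options, the pushdown described in step b happens on the BigQuery side before any data reaches the Glue job.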