Skip to main content

Migrating from BigQuery to ClickHouse

This guide shows how to migrate data from BigQuery to ClickHouse.

In this guide, we first export a table to Google's object store (GCS) and then import that data into ClickHouse Cloud. These steps need to be repeated for each table you wish to export from BigQuery to ClickHouse.

How long will exporting data to ClickHouse take?

Exporting data from BigQuery to ClickHouse is dependent on the size of your dataset. As a comparison, it takes about an hour to export the 4TB public Ethereum dataset from BigQuery to ClickHouse using this guide.

TableRowsFiles ExportedData SizeBigQuery ExportSlot TimeClickHouse Import
blocks16,569,4897314.53GB23 secs37 min15.4 secs
transactions1,864,514,4145169957GB1 min 38 sec1 day 8hrs18 mins 5 secs
traces6,325,819,30617,9852.896TB5 min 46 sec5 days 19 hr34 mins 55 secs
contracts57,225,83735045.35GB16 sec1 hr 51 min39.4 secs
Total8.26 billion23,5773.982TB8 min 3 sec> 6 days 5 hrs53 mins 45 secs

1. Export table data to GCS

In this step, we utilize the BigQuery SQL workspace to execute our SQL commands. Below, we export a BigQuery table named mytable to a GCS bucket using the EXPORT DATA statement.

EXPORT DATA
OPTIONS (
uri = 'gs://mybucket/mytable/*.parquet',
format = 'PARQUET',
overwrite = true
)
AS (
SELECT * FROM mytable
);

In the above query, we export our BigQuery table to the Parquet data format. We also have a * character in our uri parameter. This ensures the output is sharded into multiple files, with a numerically increasing suffix, should the export exceed 1GB of data.

This approach has a number of advantages:

  • Google allows up to 50TB per day to be exported to GCS for free. Users only pay for GCS storage.
  • Exports produce multiple files automatically, limiting each to a maximum of 1GB of table data. This is beneficial to ClickHouse since it allows imports to be parallelized.
  • Parquet, as a column-oriented format, represents a better interchange format since it is inherently compressed and faster for BigQuery to export and ClickHouse to query

2. Importing data into ClickHouse from GCS

Once the export is complete, we can import this data into a ClickHouse table. You can use the ClickHouse SQL console or clickhouse-client to execute the commands below.

You must first create your table in ClickHouse:

CREATE TABLE default.mytable
(
`timestamp` DateTime64(6),
`some_text` String
)
ENGINE = MergeTree
ORDER BY (timestamp);

After creating the table, we enable parallel inserts on selects to speed up our export:

SET parallel_distributed_insert_select = 1;

Finally, we can insert the data from GCS into our ClickHouse table using the INSERT INTO SELECT command, which inserts data into a table based on the results from a SELECT query.

To retrieve the data to INSERT, we can use the s3Cluster function to retrieve data from our GCS bucket since GCS is interoperable with Amazon S3:

INSERT INTO mytable
SELECT
timestamp,
ifNull(some_text, '') as some_text
FROM s3Cluster(
'default',
'https://storage.googleapis.com/mybucket/mytable/*.parquet.gz',
'<ACCESS_ID>',
'<SECRET>'
);

The ACCESS_ID and SECRET used in the above query is your HMAC key associated with your GCS bucket.

Use ifNull when exporting nullable columns

In the above query, we use the ifNull function with the some_text column to insert data into our ClickHouse table with a default value. You can also make your columns in ClickHouse Nullable, but this is not recommended as it may affect negatively performance.

3. Testing successful data export

To test whether your data was properly inserted, simply run a SELECT query on your new table:

SELECT * FROM mytable limit 10;

To export more BigQuery tables, simply redo the steps above for each additional table.

Further reading and support

In addition to this guide, we also recommend reading our blog post that shows how to use ClickHouse to speed up BigQuery.

If you are having issues transferring data from BigQuery to ClickHouse, please feel free to contact us at support@clickhouse.com.