How do I use AES to encrypt an Apache HBase table in Amazon EMR?

I want to use Advanced Encryption Standard (AES) to encrypt an Apache HBase table in Amazon EMR.

Resolution

Note: If you use Amazon Simple Storage Service (Amazon S3) as the data source instead of Hadoop Distributed File System (HDFS), then use server-side and client-side encryption to protect data.

Encrypt a new Apache HBase table

Complete the following steps:

  1. Open the Amazon EMR console.

  2. Choose a cluster that already has HBase. Or, create a new cluster with HBase.

  3. Use SSH to connect to the Amazon EMR cluster primary node.

  4. Run the keytool command to create a secret key for AES encryption:

    sudo keytool -keystore /etc/hbase/conf/hbase.jks -storetype jceks -storepass:file mysecurefile -genseckey -keyalg AES -keysize 128 -alias example-alias

    Note: Replace example-alias with your alias.
    Example output:

    Enter key password for <example-alias>
        (RETURN if same as keystore password):
    Warning:
    The JCEKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/hbase/conf/hbase.jks -destkeystore /etc/hbase/conf/hbase.jks -deststoretype pkcs12".

  5. Add the following properties to the hbase-site.xml file on each node in the Amazon EMR cluster. Provide the hbase.jks path and the keystore password in the hbase.crypto.keyprovider.parameters property. Also, specify your alias in the hbase.crypto.master.key.name property:

       
      <property> 
        <name>hbase.crypto.keyprovider.parameters</name>  
        <value>jceks:///etc/hbase/conf/hbase.jks?password=your_password</value>
      </property>
    
      <property>
        <name>hbase.crypto.master.key.name</name>
        <value>example-alias</value>
      </property>
    
      <property>
        <name>hbase.regionserver.hlog.reader.impl</name>
        <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
      </property>
    
      <property>
        <name>hbase.regionserver.hlog.writer.impl</name>
        <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
      </property>
    
      <property>
        <name>hfile.format.version</name>
        <value>3</value>
      </property>
    
      <property>
        <name>hbase.regionserver.wal.encryption</name>
        <value>true</value>
      </property>
    
      <property>
        <name>hbase.crypto.keyprovider</name>
        <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
      </property>

    Note: By default, HBase uses the MD5 algorithm to hash wrapped encryption keys. HBASE-25181 added an option to choose a different hash algorithm. For more information, see Add options for disabling column family encryption and choosing hash algorithm for wrapped encryption keys on the Apache website. To set the hash algorithm, use the hbase.crypto.key.hash.algorithm configuration option. The following example uses the SHA-512 algorithm:

    <property>
      <name>hbase.crypto.key.hash.algorithm</name>
      <value>SHA-512</value>
    </property>

  6. Copy the hbase.jks file to all cluster nodes, in the location that's specified in the hbase.crypto.keyprovider.parameters property:

    cd /etc/hbase/conf
    scp hbase.jks example-host-to-copy:/tmp
    ssh example-to-host
    sudo cp /tmp/hbase.jks /etc/hbase/conf/

    Note: Replace example-host-to-copy and example-to-host with your nodes' public DNS names.

  7. Restart all HBase services on the primary and core nodes. Then, repeat the hbase-regionserver stop and start commands on each core node.
    Note: When you stop and start HBase region servers, you might affect ongoing reads and writes to HBase tables on your cluster. Stop and start the HBase daemons only during downtime. It's a best practice to stop and start a test cluster before you stop and start a production cluster.

    Amazon EMR 5.30.0 and later versions:

    sudo systemctl stop hbase-master
    sudo systemctl stop hbase-regionserver

    sudo systemctl start hbase-master
    sudo systemctl start hbase-regionserver

    Amazon EMR 4.x to Amazon EMR 5.29.0 versions:

    sudo initctl stop hbase-master
    sudo initctl stop hbase-regionserver
    
    sudo initctl start hbase-master
    sudo initctl start hbase-regionserver
  8. Log in to the HBase shell:

    hbase shell

  9. Create a table with AES encryption:

    create 'example-table',{NAME=>'columnfamily',ENCRYPTION=>'AES'}

    Note: Replace example-table with your table name.
    Example output:

    0 row(s) in 1.6760 seconds
    => Hbase::Table - example-table

  10. Confirm that AES encryption is turned on:

    describe 'example-table'

    Note: Replace example-table with your table name.
    Example output:

    Table example-table is ENABLED
    example-table
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'columnfamily', BLOOMFILTER => 'ROW', ENCRYPTION => 'AES', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
    DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
    1 row(s) in 0.0320 seconds
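
Step 7 splits the restart commands by Amazon EMR release: systemctl for 5.30.0 and later, initctl for 4.x through 5.29.0. The following is a minimal sketch of that decision as a shell function. The function name hbase_restart_cmds is hypothetical and isn't part of Amazon EMR or HBase. The function only prints the commands so that you can review them before you run them on a node.

```shell
# Hypothetical helper (not part of Amazon EMR or HBase): print, but do not run,
# the HBase stop and start commands for a given Amazon EMR release.
hbase_restart_cmds() {
  release="${1#emr-}"        # accept "5.30.0" or "emr-5.30.0"
  major="${release%%.*}"     # text before the first dot
  rest="${release#*.}"       # text after the first dot
  minor="${rest%%.*}"        # second version component

  # Amazon EMR 5.30.0 and later use systemctl; earlier releases use initctl.
  if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 30 ]; }; then
    tool="systemctl"
  else
    tool="initctl"
  fi

  # Stop both daemons first, then start them, in the same order as step 7.
  for action in stop start; do
    for svc in hbase-master hbase-regionserver; do
      echo "sudo $tool $action $svc"
    done
  done
}
```

For example, hbase_restart_cmds 5.29.0 prints the four initctl commands. On core nodes, you would run only the hbase-regionserver lines.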

Encrypt an existing Apache HBase table

Complete the following steps:

  1. Describe the unencrypted table:

    describe 'example-table'

    Note: Replace example-table with the table that you want to describe.
    Example output:

    Table example-table is ENABLED
    example-table
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'columnfamily2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE',
    TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
    1 row(s) in 0.0140 seconds

  2. Turn on AES encryption:

    alter 'example-table',{NAME=>'columnfamily2',ENCRYPTION=>'AES'}

    Note: Replace example-table with your table and columnfamily2 with the column family that you want to encrypt.
    Example output:

    Updating all regions with the new schema...
    1/1 regions updated.
    Done.
    0 row(s) in 1.9000 seconds

  3. Confirm that the table is encrypted:

    describe 'example-table'

    Note: Replace example-table with your table.
    Example output:

    Table example-table is ENABLED
    example-table
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'columnfamily2', BLOOMFILTER => 'ROW', ENCRYPTION => 'AES', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
    DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
    1 row(s) in 0.0120 seconds

    Note: If you create a secondary index on the table, then write-ahead log (WAL) encryption might not work and you might receive a java.lang.NullPointerException error. To resolve this issue, set hbase.regionserver.wal.encryption to false in the hbase-site.xml file:

    <property>
      <name>hbase.regionserver.wal.encryption</name>
      <value>false</value>
    </property>
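
The confirmation in step 3 relies on reading the describe output by eye. The following is a minimal sketch of how you might script that check. The function name cf_is_aes_encrypted is hypothetical, and the sketch assumes that the describe output keeps the {NAME => '...', ...} layout shown in the examples above.

```shell
# Hypothetical helper: read HBase shell `describe` output on stdin and succeed
# only if the named column family lists ENCRYPTION => 'AES'.
cf_is_aes_encrypted() {
  cf="$1"
  # The shell can wrap one column family descriptor across several lines, so
  # join all lines, then split on '}' to get one descriptor per record.
  tr '\n' ' ' | tr '}' '\n' |
    grep -F "NAME => '$cf'" |
    grep -Eq "ENCRYPTION =>[[:space:]]*'AES'"
}
```

One possible way to use it, assuming that your HBase shell supports the non-interactive -n option, is to pipe the describe command through the shell: echo "describe 'example-table'" | hbase shell -n | cf_is_aes_encrypted columnfamily2 && echo "encrypted". The function returns a nonzero status when the column family is missing or isn't encrypted.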

Related information

Using the HBase shell

Transparent encryption in HDFS on Amazon EMR

AWS OFFICIAL. Updated 10 months ago