How do I resolve the "Unknown dataset URI pattern: dataset" error when I use Sqoop in Amazon EMR to export Amazon RDS data to Amazon S3 in Parquet format?

I want to resolve the “Unknown dataset URI pattern: dataset” error when I use Sqoop in Amazon EMR to export Amazon Relational Database Service (Amazon RDS) data to Amazon Simple Storage Service (Amazon S3) in Parquet format.

Resolution

The Unknown dataset URI pattern: dataset error affects Sqoop version 1.4.7. To resolve this error, complete the following steps to download and install kite-data-s3-1.1.0.jar:

  1. To connect to the Amazon EMR primary node, use SSH.

  2. To download kite-data-s3-1.1.0.jar, use wget:

    [hadoop@example-ip-address]$ wget https://repo1.maven.org/maven2/org/kitesdk/kite-data-s3/1.1.0/kite-data-s3-1.1.0.jar

    Note: Replace example-ip-address with your IP address.

  3. Check that the downloaded file is the correct size (1.7 MB):

    [hadoop@example-ip-address]$ du -h kite-data-s3-1.1.0.jar
    1.7M     kite-data-s3-1.1.0.jar

  4. Copy the downloaded file to the Sqoop library directory /usr/lib/sqoop/lib/:

    sudo cp kite-data-s3-1.1.0.jar /usr/lib/sqoop/lib/

  5. Grant the required permissions on the copied file:

    sudo chmod 755 /usr/lib/sqoop/lib/kite-data-s3-1.1.0.jar

  6. To import the data, run sqoop import with the s3n connector in the --target-dir value.

    Example:

    sqoop import --connect jdbc:mysql://mysql.cdfqbesrukqe.eu-west-1.rds.amazonaws.com:3306/dev --username admin -P --table hist_root --target-dir s3n://example-bucket/sqoop_parquet/demo --as-parquetfile -m 2 --split-by identifiers -- --schema onwatch

    Note: If you use the s3 connector, then the Unknown dataset URI pattern: dataset error appears. The value of --target-dir must be a path with at least three layers, for example s3n://example-bucket/example-namespace/example-dataset.
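
The --target-dir rule in the preceding note can be checked before you start the import. The following is a minimal sketch, not part of Sqoop or Kite: check_target_dir is a hypothetical helper name, and the pattern only tests for the s3n scheme followed by at least three non-empty layers (bucket, namespace, dataset).

```shell
# Hypothetical helper (not provided by Sqoop or Kite): validate the shape of a
# --target-dir value before passing it to sqoop import.
check_target_dir() {
  case "$1" in
    # s3n scheme, then bucket/namespace/dataset, each at least one character
    s3n://?*/?*/?*)
      echo "ok"
      ;;
    *)
      echo "invalid"
      return 1
      ;;
  esac
}
```

For example, check_target_dir s3n://example-bucket/sqoop_parquet/demo prints ok, while a value that starts with s3:// or that has only a bucket and one path layer prints invalid and returns a nonzero status.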

For more information, see Dataset, view, and repository URIs on the Kite SDK website.
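
If the error persists after you complete the steps, it can help to confirm that the JAR file from steps 4 and 5 is in place. The following is a minimal sketch under stated assumptions: check_kite_jar is a hypothetical helper name, and the check only confirms that the file exists, is not empty, and is readable by the current user. It does not verify the file contents.

```shell
# Hypothetical helper (not provided by Sqoop or Amazon EMR): confirm that the
# Kite S3 JAR file is present in the Sqoop library directory.
check_kite_jar() {
  # Default to the directory from step 4; pass another path to override it.
  lib_dir="${1:-/usr/lib/sqoop/lib}"
  jar="$lib_dir/kite-data-s3-1.1.0.jar"
  # -s: file exists and is not empty; -r: file is readable by the current user
  if [ -s "$jar" ] && [ -r "$jar" ]; then
    echo "found"
  else
    echo "missing"
    return 1
  fi
}
```

Run check_kite_jar with no arguments on the primary node to check the default Sqoop library directory.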