How can I use the KPL to put data records into a Kinesis data stream?

I want to use the Amazon Kinesis Producer Library (KPL) to write and put data records into an Amazon Kinesis data stream.

Resolution

Prerequisites:

  • A running Amazon Elastic Compute Cloud (Amazon EC2) Linux instance
  • An AWS Identity and Access Management (IAM) role attached to your instance
  • The AmazonKinesisFullAccess policy attached to the instance's IAM role
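
The role prerequisite can be checked from the instance before you start. The following is a minimal sketch, assuming that the AWS CLI is available on the instance; the helper name role_from_arn and the example ARN are hypothetical and are used only for illustration.

```shell
# Hypothetical helper: extract the role name from an assumed-role ARN such as
# arn:aws:sts::123456789012:assumed-role/my-role/i-0123456789abcdef0
role_from_arn() {
  local arn="$1"
  case "${arn}" in
    *:assumed-role/*)
      arn="${arn#*:assumed-role/}"   # drop everything through "assumed-role/"
      echo "${arn%%/*}"              # keep the role name, drop the session name
      ;;
    *)
      echo "not an assumed-role ARN: ${arn}" >&2
      return 1
      ;;
  esac
}

# On the instance, pass in the identity that the AWS CLI resolves:
# role_from_arn "$(aws sts get-caller-identity --query Arn --output text)"
```

If the helper returns an error, the AWS CLI is probably not using an instance role, and the role attachment is worth checking in the Amazon EC2 console.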

To use the KPL to put records into a Kinesis data stream, complete the following steps:

  1. Connect to your Linux instance.

  2. Install the latest version of the OpenJDK 8 developer package:

    sudo yum install java-1.8.0-openjdk-devel
  3. Confirm that Java is installed:

    java -version

    The output looks similar to the following example:

    java version "1.7.0_181"
    OpenJDK Runtime Environment (amzn-2.6.14.8.80.amzn1-x86_64 u181-b00)
    OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
  4. Run the following commands to set Java 1.8 as the default java and javac providers:

    sudo /usr/sbin/alternatives --config java
    sudo /usr/sbin/alternatives --config javac
  5. Add a repository with an Apache Maven package:

    sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
  6. Set the version number for the Maven packages:

    sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
  7. Use yum to install Maven:

    sudo yum install -y apache-maven

    To confirm that Maven is properly installed, run the following command:

    mvn -version

    The output looks similar to the following example:

    Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z)
    Maven home: /usr/share/apache-maven
    Java version: 1.7.0_181, vendor: Oracle Corporation
    Java home: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.181.x86_64/jre
    Default locale: en_US, platform encoding: UTF-8
    OS name: "linux", version: "4.14.33-51.37.amzn1.x86_64", arch: "amd64", family: "unix"
  8. Install git, and then download the KPL from the GitHub website:

    sudo yum install git
    git clone https://github.com/awslabs/amazon-kinesis-producer
  9. Open the amazon-kinesis-producer/java/amazon-kinesis-producer-sample/ directory, and then list the files:

    cd amazon-kinesis-producer/java/amazon-kinesis-producer-sample/
    ls
    default_config.properties  pom.xml  README.md  src  target
  10. Run a command similar to the following to create a Kinesis data stream:

    aws kinesis create-stream --stream-name kinesis-kpl-demo --shard-count 2
  11. Run list-streams to confirm that the stream is created:

    aws kinesis list-streams
  12. Open the SampleProducerConfig.java file in the cloned repository, and then modify the following fields:
    For public static final String STREAM_NAME_DEFAULT, enter the name of the Kinesis data stream that you previously created.
    For public static final String REGION_DEFAULT, enter the AWS Region that you're using.
    Example:

    cd src/com/amazonaws/services/kinesis/producer/sample
    vi SampleProducerConfig.java
    public static final String STREAM_NAME_DEFAULT = "kinesis-kpl-demo";
    public static final String REGION_DEFAULT = "us-east-1";
  13. To allow Maven to download all the directory's dependencies, run the following command in the amazon-kinesis-producer-sample directory:

    mvn clean package
  14. To run the producer and to send data into the Kinesis data stream, run the following command in the amazon-kinesis-producer-sample directory:

    mvn exec:java -Dexec.mainClass="com.amazonaws.services.kinesis.producer.sample.SampleProducer"
  15. To verify the number of records sent to the stream, check the Incoming Data (Count) graph on the Monitoring tab of the Kinesis console.
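
The check in step 15 can also be approximated from the command line. The following is a minimal sketch, assuming that the console graph is backed by the IncomingRecords metric in the AWS/Kinesis CloudWatch namespace, that GNU date is available (as on Amazon Linux), and that the instance role is allowed to read CloudWatch metrics. The helper name incoming_records_sum, the 15-minute default window, and the default Region are illustrative choices, not part of the KPL sample.

```shell
# Hypothetical helper: sum the IncomingRecords metric for a stream over the
# last N minutes (default 15). Assumes GNU date and a working AWS CLI.
incoming_records_sum() {
  local stream="$1" minutes="${2:-15}" region="${3:-us-east-1}"
  local start end
  start="$(date -u -d "${minutes} minutes ago" +%Y-%m-%dT%H:%M:%SZ)"
  end="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

  # One datapoint per minute; the JMESPath query adds the per-minute sums.
  aws cloudwatch get-metric-statistics \
    --region "${region}" \
    --namespace AWS/Kinesis \
    --metric-name IncomingRecords \
    --dimensions Name=StreamName,Value="${stream}" \
    --start-time "${start}" \
    --end-time "${end}" \
    --period 60 \
    --statistics Sum \
    --query 'sum(Datapoints[].Sum)' \
    --output text
}

# Example call, using the stream name from step 10:
# incoming_records_sum kinesis-kpl-demo 15 us-east-1
```

CloudWatch metrics can lag by a few minutes, so a result of zero immediately after the producer finishes does not necessarily mean that no records arrived.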

Note: The record count in the graph might be lower than the number of records that the producer sent. This occurs because the KPL uses aggregation, which combines multiple user records into a single Kinesis Data Streams record.
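
The effect of aggregation on the count can be illustrated with simple arithmetic. The following is a rough sketch that assumes, purely for illustration, a fixed number of user records in each aggregated record. In practice the ratio varies with record size and buffering, so treat the numbers as hypothetical. The helper name aggregated_record_count is also hypothetical.

```shell
# Hypothetical helper: number of Kinesis records needed for a given number of
# user records, assuming a fixed number of user records per aggregated record.
aggregated_record_count() {
  local user_records="$1" per_record="$2"
  # Integer ceiling division: a partly filled final record still counts.
  echo $(( (user_records + per_record - 1) / per_record ))
}

aggregated_record_count 1000 25   # prints 40
aggregated_record_count 1001 25   # prints 41
```

Under these assumptions, 1,000 user records would appear as only 40 incoming records on the stream.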

Related information

Writing to your Kinesis Data Stream using the KPL

Resharding, scaling, and parallel processing

AWS OFFICIAL
Updated 19 days ago