Explore how to access and analyze Arbitrum blockchain data using AWS Glue and Amazon Athena. Learn to run scalable queries on real-world data for DeFi, research, and more.
Arbitrum is now part of the AWS Public Blockchain Datasets—opening the door for developers, analysts, and researchers to access high-quality, queryable Arbitrum data directly from Amazon S3.
With data provided by SonarX and updated regularly, you can dive into L2 analytics using familiar tools like AWS Glue and Amazon Athena.
Background
The AWS Public Blockchain Datasets initiative started in 2022 to make blockchain data accessible at scale. The recent addition of Arbitrum—an Ethereum Layer 2 scaling solution—enables fast, cost-efficient data exploration for developers working on rollups, DeFi apps, or cross-chain analysis. SonarX, an AWS Partner specializing in blockchain data indexing, ensures the data remains accurate and well-structured.
Dataset Structure and Access
The Arbitrum dataset is hosted under:
s3://aws-public-blockchain/v1.1/sonarx/arbitrum/
The dataset is organized by date, with consistent schemas that include fields such as transaction hash, sender, receiver, block number, gas used, and timestamp. This structure makes it easy to target specific time periods or run large-scale analytics.
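To make the date-based layout concrete, here is a minimal sketch that builds the S3 prefix for one table and one day and can list it with the AWS CLI. The table name `blocks` and the `date=YYYY-MM-DD` folder naming are assumptions (they mirror other AWS Public Blockchain datasets), and `arbitrum_prefix` and `list_partition` are hypothetical helper names, so list the top-level prefix first to confirm what actually exists.

```shell
# Root of the Arbitrum dataset (taken from this post).
ARBITRUM_ROOT="s3://aws-public-blockchain/v1.1/sonarx/arbitrum"

# Build the S3 prefix for one table and one day.
# ASSUMPTION: partitions are named date=YYYY-MM-DD under each table folder.
arbitrum_prefix() {
  table="$1"
  day="$2"
  printf '%s/%s/date=%s/\n' "$ARBITRUM_ROOT" "$table" "$day"
}

# List the objects in one partition. The bucket is public, so
# --no-sign-request should let the AWS CLI read it without credentials.
list_partition() {
  aws s3 ls --no-sign-request "$(arbitrum_prefix "$1" "$2")"
}

# Print the prefix for an example table and day (no network call).
arbitrum_prefix blocks 2025-01-01
```

Running `aws s3 ls --no-sign-request s3://aws-public-blockchain/v1.1/sonarx/arbitrum/` shows which table folders are available; `list_partition blocks 2025-01-01` then lists the files for a single day if that table and partition exist.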
Practical Use Cases
Here’s how developers and analysts are using Arbitrum data on AWS:
- DeFi Monitoring: Track Arbitrum-native protocols, liquidity movements, and token flows.
- Cross-chain Research: Compare transaction volumes and behaviors between chains such as Ethereum and Arbitrum.
- Compliance & Security: Monitor for suspicious transactions or usage spikes.
- App Performance Analysis: Measure throughput, gas usage, and user activity trends.
Getting Started with AWS Glue and Athena
Let’s walk through how to prepare and query the dataset.
Note: You can run the following commands from your AWS CloudShell terminal.
Step 1: Create an IAM Role for AWS Glue and attach necessary permissions
# Create IAM Role for AWS Glue
aws iam create-role \
  --role-name AWSGlueServiceRole-Arbitrum \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }]
  }' > /dev/null
# Attach inline policy for S3 access
aws iam put-role-policy \
  --role-name AWSGlueServiceRole-Arbitrum \
  --policy-name S3Access \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::aws-public-blockchain",
        "arn:aws:s3:::aws-public-blockchain/*"
      ]
    }]
  }'
# Attach AWS-managed Glue policy
aws iam attach-role-policy \
  --role-name AWSGlueServiceRole-Arbitrum \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
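Before moving on, it can help to confirm that both policies actually landed on the role. The following is a minimal sketch, not part of the original walkthrough: `check_glue_role` is a hypothetical helper name, and it only checks policy names, not policy contents.

```shell
ROLE_NAME="AWSGlueServiceRole-Arbitrum"

# Return 0 only if the role carries both policies attached in Step 1.
check_glue_role() {
  # Names of AWS-managed policies attached to the role.
  managed="$(aws iam list-attached-role-policies --role-name "$ROLE_NAME" \
    --query 'AttachedPolicies[].PolicyName' --output text)" || return 1
  # Names of inline policies embedded in the role.
  inline="$(aws iam list-role-policies --role-name "$ROLE_NAME" \
    --query 'PolicyNames' --output text)" || return 1

  case "$managed" in
    *AWSGlueServiceRole*) ;;
    *) echo "managed policy AWSGlueServiceRole is not attached" >&2; return 1 ;;
  esac
  case "$inline" in
    *S3Access*) ;;
    *) echo "inline policy S3Access is missing" >&2; return 1 ;;
  esac
  echo "role $ROLE_NAME has both policies"
}
```

Call `check_glue_role` after the three commands above. New IAM roles can take a few seconds to propagate, so if a later Glue command reports that it cannot assume the role, waiting briefly and retrying is usually enough.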
Step 2: Create and Run a Glue Crawler
# Create Glue Crawler with daily schedule
aws glue create-crawler \
  --name arbitrum-blockchain-crawler \
  --role AWSGlueServiceRole-Arbitrum \
  --database-name sonarx_arbitrum \
  --targets '{"S3Targets": [{"Path": "s3://aws-public-blockchain/v1.1/sonarx/arbitrum/", "SampleSize": 2}]}' \
  --schedule "cron(0 5 * * ? *)"
# Start the initial run
aws glue start-crawler --name arbitrum-blockchain-crawler
The first time you start the crawler, it might take three to five minutes to fetch files from the dataset and assemble a fitting data schema. You can monitor the first run of the AWS Glue crawler from the console, or by running the following command:
aws glue get-crawler --name arbitrum-blockchain-crawler --query 'Crawler.State'
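Rather than re-running the status command by hand, you could poll until the crawler goes back to `READY`, which is the state Glue reports when a crawler is idle. This is a rough sketch: `wait_for_crawler` and `list_crawled_tables` are hypothetical helper names, and the default of 40 attempts at 15-second intervals is an arbitrary choice.

```shell
CRAWLER_NAME="arbitrum-blockchain-crawler"
DATABASE_NAME="sonarx_arbitrum"

# Poll the crawler state until it returns to READY or the attempts run out.
# Arguments: [max_attempts] [seconds_between_polls]
wait_for_crawler() {
  max_attempts="${1:-40}"
  interval="${2:-15}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    state="$(aws glue get-crawler --name "$CRAWLER_NAME" \
      --query 'Crawler.State' --output text)" || return 1
    echo "attempt $attempt: crawler state is $state"
    # READY means the crawler is idle, so the run has finished.
    if [ "$state" = "READY" ]; then
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "crawler did not reach READY after $max_attempts attempts" >&2
  return 1
}

# Print the names of the tables the crawler registered in the database.
list_crawled_tables() {
  aws glue get-tables --database-name "$DATABASE_NAME" \
    --query 'TableList[].Name' --output text
}
```

A typical call would be `wait_for_crawler && list_crawled_tables`. Note that `READY` only means the crawler is idle, not that the run succeeded; `aws glue get-crawler --name arbitrum-blockchain-crawler --query 'Crawler.LastCrawl.Status'` reports whether the last run ended as `SUCCEEDED`, `FAILED`, or `CANCELLED`.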
Step 3: Query Arbitrum Data with Athena
Here’s a sample query to retrieve smart contract deployment activity over the last 30 days:
-- Daily smart contract deployments on Arbitrum over the last 30 days
SELECT
  date,
  COUNT(*) AS contracts_created
FROM "sonarx_arbitrum"."traces"
WHERE created_address IS NOT NULL
  AND date >= date_format(date_add('day', -30, current_date), '%Y-%m-%d')
GROUP BY date
ORDER BY date ASC;
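If you prefer to stay in CloudShell instead of the Athena console, the same query can be submitted through the AWS CLI. The sketch below is one way to do that under a few assumptions: the results location `s3://your-athena-results-bucket/arbitrum/` is a placeholder you must replace with a bucket you own, `run_athena_query` is a hypothetical helper name, and your CloudShell identity needs Athena and Glue Data Catalog permissions.

```shell
# ASSUMPTION: replace with an S3 location you own; Athena writes results there.
ATHENA_OUTPUT="${ATHENA_OUTPUT:-s3://your-athena-results-bucket/arbitrum/}"
ATHENA_DATABASE="sonarx_arbitrum"

# The deployment query from this post, stored as a string.
DEPLOYMENTS_SQL="SELECT date, COUNT(*) AS contracts_created
FROM traces
WHERE created_address IS NOT NULL
  AND date >= date_format(date_add('day', -30, current_date), '%Y-%m-%d')
GROUP BY date
ORDER BY date ASC"

# Submit a SQL string, wait for it to finish, then print the result cells.
run_athena_query() {
  sql="$1"
  query_id="$(aws athena start-query-execution \
    --query-string "$sql" \
    --query-execution-context "Database=$ATHENA_DATABASE" \
    --result-configuration "OutputLocation=$ATHENA_OUTPUT" \
    --query 'QueryExecutionId' --output text)" || return 1

  while true; do
    state="$(aws athena get-query-execution --query-execution-id "$query_id" \
      --query 'QueryExecution.Status.State' --output text)" || return 1
    case "$state" in
      SUCCEEDED) break ;;
      FAILED|CANCELLED) echo "query $query_id ended as $state" >&2; return 1 ;;
      *) sleep "${POLL_SECONDS:-2}" ;;  # still QUEUED or RUNNING
    esac
  done

  aws athena get-query-results --query-execution-id "$query_id" \
    --query 'ResultSet.Rows[*].Data[*].VarCharValue' --output text
}
```

With the output location set, `run_athena_query "$DEPLOYMENTS_SQL"` submits the query and prints the results once Athena reports `SUCCEEDED`. Because the filter is on the `date` partition column, Athena should only need to scan the most recent 30 days of data, which keeps the cost of the query down.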
Conclusion
With Arbitrum now part of the AWS Public Blockchain Datasets, you can tap into fast, affordable L2 insights using scalable AWS services. Start exploring today and let us know what you build with Arbitrum data on AWS.