BTC transactions table inputs missing spend_output_index

0

As the title suggests, I queried the BTC transactions table and found that (after querying the first 100_000_000 rows) every one of the transactions inputs has spend_output_index = null.

Is this an error or is there another way to determine a spends output index?

The query I used was:

SELECT 
    COUNT(*)
FROM (
    SELECT 1
    FROM 
        transactions tx,
        UNNEST(tx.inputs) as t_input(input)
    WHERE 
        input.spent_transaction_hash IS NOT NULL AND 
        input.spend_output_index IS NULL
    LIMIT 100000000
) AS subquery;

And the results...

#	_col0
1	100000000
charles
asked 5 months ago251 views
2 Answers
1

Hello Charles,

Thank you very much for bringing this issue to Rajan and I's attention. The CloudFormation template and accompanying schema in the repository have been updated to reflect these changes.

Have a fantastic rest of your day.

profile pictureAWS
EXPERT
answered 4 months ago
0

I used the below doc to setup Athena to query the BTC transaction table

[+] https://aws.amazon.com/blogs/database/access-bitcoin-and-ethereum-open-datasets-for-cross-chain-analytics/

The above documentation contain a CloudFormation stack

[+] CF stack: https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/DBBLOG-2500/aws-public-blockchain.yaml

I then verified that that spend_output_index was, in fact, coming out to be null for all the rows in the BTC transaction table.

I then checked the source for the data. As per the above doc, the data is stored in a public S3 bucket with URI: s3://aws-public-blockchain/v1.0/btc/transactions/

Below is one of the sample records from this above S3 files:

{"hash":"77406ab7b318fd951a7eceeeb215b1ab6870f2b27d90c8642ec29e5a923200ce","version":"2","size":"304","block_hash":"00000000000000000001b120c6a88c9a3d52bd61b0cd6be3526077d34fae3dfc","block_number":"817878","index":"375","virtual_size":"147","lock_time":"0","input_count":"1","output_count":"1","is_coinbase":false,"output_value":0.00000331,"outputs":[{"address":"bc1pe727l62jdxgr3jk8qxruu4dh0k5lmqdzpapcxjr9htfc66rtt2ksgk20et","index":"0","required_signatures":"1","script_asm":"1 cf95efe952699038cac70187ce55b77da9fd81a20f43834865bad38d686b5aad","script_hex":"5120cf95efe952699038cac70187ce55b77da9fd81a20f43834865bad38d686b5aad","type":"witness_v1_taproot","value":0.00000331}],"block_timestamp":1700611323000,"date":"2023-11-22","last_modified":1700611367997.9,"fee":0.0000337,"input_value":0.00003701,"inputs":[{"address":"bc1pm8nr9cralwndxuap4g06ry0y6ptjur83uupxursdeuzumwrwr56qyudkl6","index":"0","required_signatures":"1","script_asm":"","script_hex":"","sequence":"4261412863","spent_output_index":"514","spent_transaction_hash":"64d83c205fef56d7772cb6b4189f8f3d55153fb1ce512391c488bd5d3965d07a","txinwitness":["e20ca15ddc137123056cef48433a2d8cb6ceb291c23fd4e5263f269fed48bceb6cc426b3930e09119c20218db174f0feef2d07ca38172cdd4e11a83c39ebf6f2","209cbb6921ba3ba7e3ed0f4a320c86cbb0352d6523fba1dc165f0c1f288ad639a6ac0063036f726401010a746578742f706c61696e00337b2270223a226272632d3230222c226f70223a226d696e74222c227469636b223a226d6d7373222c22616d74223a223230227d68","c19cbb6921ba3ba7e3ed0f4a320c86cbb0352d6523fba1dc165f0c1f288ad639a6"],"type":"witness_v1_taproot","value":0.00003701}]}

It can be observed that the above records contains the field in "inputs" column called "spent_output_index" (NOT "spend_output_index"). This means that due to this mismatch in the field name, the data in the Athena for spend_output_index is coming out to be null as there is no such field in the S3 source parquet files.

I deleted the CloudFormation stack and manually updated the below line in the CloudFormation template. I then restored this updated CloudFormation template and I can confirm that I was getting correct output in the Athena queries:

Original line

          Type: array<struct<index:bigint,spent_transaction_hash:string,spend_output_index:bigint,script_asm:string,script_hex:string,sequence:bigint,required_signatures:bigint,type:string,address:string,value:double>>

Updated line

          Type: array<struct<index:bigint,spent_transaction_hash:string,spent_output_index:bigint,script_asm:string,script_hex:string,sequence:bigint,required_signatures:bigint,type:string,address:string,value:double>>

Note, I have replaced "spend_output_index" with "spent_output_index" in the above line.

After making the above change the query was working fine and I was getting 0 rows with spent_output_index = null.

SELECT 
    COUNT(*)
FROM (
    SELECT 1
    FROM 
        transactions tx,
        UNNEST(tx.inputs) as t_input(input)
    WHERE 
        input.spent_transaction_hash IS NOT NULL AND 
        input.spent_output_index IS NULL
    LIMIT 100000000
) AS subquery;

output
=========

#	_col0
1	0
AWS
SUPPORT ENGINEER
Rajan_L
answered 5 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions