BTC transactions table inputs missing spend_output_index

0

As the title suggests, I queried the BTC transactions table and found that (after querying the first 100_000_000 rows) every one of the transactions inputs has spend_output_index = null.

Is this an error or is there another way to determine a spends output index?

The query I used was:

SELECT 
    COUNT(*)
FROM (
    SELECT 1
    FROM 
        transactions tx,
        UNNEST(tx.inputs) as t_input(input)
    WHERE 
        input.spent_transaction_hash IS NOT NULL AND 
        input.spend_output_index IS NULL
    LIMIT 100000000
) AS subquery;

And the results...

#	_col0
1	100000000
charles
demandé il y a 6 mois256 vues
2 réponses
1

Hello Charles,

Thank you very much for bringing this issue to Rajan and I's attention. The CloudFormation template and accompanying schema in the repository have been updated to reflect these changes.

Have a fantastic rest of your day.

profile pictureAWS
EXPERT
répondu il y a 5 mois
0

I used the below doc to setup Athena to query the BTC transaction table

[+] https://aws.amazon.com/blogs/database/access-bitcoin-and-ethereum-open-datasets-for-cross-chain-analytics/

The above documentation contain a CloudFormation stack

[+] CF stack: https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/DBBLOG-2500/aws-public-blockchain.yaml

I then verified that that spend_output_index was, in fact, coming out to be null for all the rows in the BTC transaction table.

I then checked the source for the data. As per the above doc, the data is stored in a public S3 bucket with URI: s3://aws-public-blockchain/v1.0/btc/transactions/

Below is one of the sample records from this above S3 files:

{"hash":"77406ab7b318fd951a7eceeeb215b1ab6870f2b27d90c8642ec29e5a923200ce","version":"2","size":"304","block_hash":"00000000000000000001b120c6a88c9a3d52bd61b0cd6be3526077d34fae3dfc","block_number":"817878","index":"375","virtual_size":"147","lock_time":"0","input_count":"1","output_count":"1","is_coinbase":false,"output_value":0.00000331,"outputs":[{"address":"bc1pe727l62jdxgr3jk8qxruu4dh0k5lmqdzpapcxjr9htfc66rtt2ksgk20et","index":"0","required_signatures":"1","script_asm":"1 cf95efe952699038cac70187ce55b77da9fd81a20f43834865bad38d686b5aad","script_hex":"5120cf95efe952699038cac70187ce55b77da9fd81a20f43834865bad38d686b5aad","type":"witness_v1_taproot","value":0.00000331}],"block_timestamp":1700611323000,"date":"2023-11-22","last_modified":1700611367997.9,"fee":0.0000337,"input_value":0.00003701,"inputs":[{"address":"bc1pm8nr9cralwndxuap4g06ry0y6ptjur83uupxursdeuzumwrwr56qyudkl6","index":"0","required_signatures":"1","script_asm":"","script_hex":"","sequence":"4261412863","spent_output_index":"514","spent_transaction_hash":"64d83c205fef56d7772cb6b4189f8f3d55153fb1ce512391c488bd5d3965d07a","txinwitness":["e20ca15ddc137123056cef48433a2d8cb6ceb291c23fd4e5263f269fed48bceb6cc426b3930e09119c20218db174f0feef2d07ca38172cdd4e11a83c39ebf6f2","209cbb6921ba3ba7e3ed0f4a320c86cbb0352d6523fba1dc165f0c1f288ad639a6ac0063036f726401010a746578742f706c61696e00337b2270223a226272632d3230222c226f70223a226d696e74222c227469636b223a226d6d7373222c22616d74223a223230227d68","c19cbb6921ba3ba7e3ed0f4a320c86cbb0352d6523fba1dc165f0c1f288ad639a6"],"type":"witness_v1_taproot","value":0.00003701}]}

It can be observed that the above records contains the field in "inputs" column called "spent_output_index" (NOT "spend_output_index"). This means that due to this mismatch in the field name, the data in the Athena for spend_output_index is coming out to be null as there is no such field in the S3 source parquet files.

I deleted the CloudFormation stack and manually updated the below line in the CloudFormation template. I then restored this updated CloudFormation template and I can confirm that I was getting correct output in the Athena queries:

Original line

          Type: array<struct<index:bigint,spent_transaction_hash:string,spend_output_index:bigint,script_asm:string,script_hex:string,sequence:bigint,required_signatures:bigint,type:string,address:string,value:double>>

Updated line

          Type: array<struct<index:bigint,spent_transaction_hash:string,spent_output_index:bigint,script_asm:string,script_hex:string,sequence:bigint,required_signatures:bigint,type:string,address:string,value:double>>

Note, I have replaced "spend_output_index" with "spent_output_index" in the above line.

After making the above change the query was working fine and I was getting 0 rows with spent_output_index = null.

SELECT 
    COUNT(*)
FROM (
    SELECT 1
    FROM 
        transactions tx,
        UNNEST(tx.inputs) as t_input(input)
    WHERE 
        input.spent_transaction_hash IS NOT NULL AND 
        input.spent_output_index IS NULL
    LIMIT 100000000
) AS subquery;

output
=========

#	_col0
1	0
AWS
INGÉNIEUR EN ASSISTANCE TECHNIQUE
Rajan_L
répondu il y a 6 mois

Vous n'êtes pas connecté. Se connecter pour publier une réponse.

Une bonne réponse répond clairement à la question, contient des commentaires constructifs et encourage le développement professionnel de la personne qui pose la question.

Instructions pour répondre aux questions