Adding Unique IDs to Transaction Logs in Polygon Using jq
I’m working on serving Polygon data via Outserv. As part of the process, I downloaded the first 15 million Polygon blocks and stored them as JSON. To avoid duplicates while loading this data into Outserv, each JSON object needs a unique ID. That’s true for Blocks and Transactions, which have a `hash`, and for Accounts, which have an `address`. But it’s not true for Logs, which don’t have any globally unique identifier.
Here’s what a sample block looks like in JSON (many details redacted for simplicity):
{
"Blks": [
{
"hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a",
"number": "0x1e3ce90",
"transactions": [
{
"hash": "0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b",
"blockNumber": "0x1e3ce90",
"fee": "0x352ad5e986f000",
"from": {
"address": "0x18f768455e7f5fb09fc491fd86bcc282bcdd5973"
},
"to": {
"address": "0xdef171fe48cf0115b1d80b88dc8eab59176fee57"
},
"logs": [
{
"logIndex": "0x2",
"address": "0x83000597e8420ad7e9edd410b2883df1b83823cf",
"topics": [
"0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
"0x00000000000000000000000018f768455e7f5fb09fc491fd86bcc282bcdd5973",
"0xabc",
"0x000000000000000000000000def171fe48cf0115b1d80b88dc8eab59176fee57"
],
"blockNumber": "0x1e3ce90",
"block": {
"hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a"
}
}
]
}
]
}
]
}
I had 36 GB of gzip-compressed JSON data in this format. We want to add a `lid` to each log: a composite of the parent transaction’s hash and the log’s index. Together, they are globally unique.
Log.lid = parent Transaction.hash + Log.logIndex
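As a tiny sanity check, the composition is plain string concatenation. This Python sketch uses the hash and index from the sample block above (the `|` separator matches what the final jq filter emits):

```python
# Compose a globally unique log ID from the sample values above.
# The "|" separator keeps the hash and the index visually distinct.
txn_hash = "0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b"
log_index = "0x2"

lid = txn_hash + "|" + log_index
print(lid)
# 0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b|0x2
```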
`jq` is an excellent tool for JSON manipulation. I’ve used it for simpler tasks in the past, but this seemed like a complex job. After hours of trial, error, and online searches, I found a great solution. Let’s dive into it!
Step 1: Get blocks
cat sample-input.json | jq '.Blks[]'
would return the block object.
Step 2: Select transactions
cat sample-input.json | jq '.Blks[].transactions[]'
would return the transaction object.
Step 3: Select logs
cat sample-input.json | jq '.Blks[].transactions[] | .logs[]'
would return the log object.
Step 4: Add a key to log
cat sample-input.json | jq '.Blks[].transactions[] | .logs[] | . += {"lid": "test"}'
would add a `lid` key to the log:
{
"logIndex": "0x2",
"address": "0x83000597e8420ad7e9edd410b2883df1b83823cf",
"topics": [
"0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
"0x00000000000000000000000018f768455e7f5fb09fc491fd86bcc282bcdd5973",
"0xabc",
"0x000000000000000000000000def171fe48cf0115b1d80b88dc8eab59176fee57"
],
"blockNumber": "0x1e3ce90",
"block": {
"hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a"
},
"lid": "test"
}
Step 5: Add parent Transaction hash
Now the challenge lies in getting the parent transaction’s hash. For that, we can capture the transaction in a variable using jq’s `as` binding; `select (.)` simply passes each transaction through so it can be bound to `$txn`.
cat sample-input.json | jq '.Blks[].transactions[] | select (.) as $txn | .logs[] | . += {"lid": $txn.hash}'
{
"logIndex": "0x2",
"address": "0x83000597e8420ad7e9edd410b2883df1b83823cf",
"topics": [
"0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
"0x00000000000000000000000018f768455e7f5fb09fc491fd86bcc282bcdd5973",
"0xabc",
"0x000000000000000000000000def171fe48cf0115b1d80b88dc8eab59176fee57"
],
"blockNumber": "0x1e3ce90",
"block": {
"hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a"
},
"lid": "0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b"
}
Step 6: Add logIndex
Now, we bind the log as `$log` in the same way, and append its `.logIndex` to the `lid` key.
cat sample-input.json | jq '.Blks[].transactions[] | select (.) as $txn | $txn.logs[] | select (.) as $log | . += {"lid": ($txn.hash + "|" + $log.logIndex)}'
Note the `(...)` parentheses around the value: inside an object construction, an expression that uses operators like `+` must be wrapped in parentheses.
{
"logIndex": "0x2",
"address": "0x83000597e8420ad7e9edd410b2883df1b83823cf",
"topics": [
"0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
"0x00000000000000000000000018f768455e7f5fb09fc491fd86bcc282bcdd5973",
"0xabc",
"0x000000000000000000000000def171fe48cf0115b1d80b88dc8eab59176fee57"
],
"blockNumber": "0x1e3ce90",
"block": {
"hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a"
},
"lid": "0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b|0x2"
}
Alright, this gives us a unique `lid` for each log. But it only outputs the logs; we also want to output the transactions and the blocks.
Step 7: Output everything
To output everything, we use the update operator `|=` instead of plain pipes. While `|` replaces the output with whatever the right-hand side produces, `|=` updates the value at the given path and passes the whole input through.
cat sample-input.json | jq '.Blks[].transactions[] |= select (.) as $txn | $txn.logs[] |= select (.) as $log | . += {"lid": ($txn.hash + "|" + $log.logIndex)}'
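The distinction matters: with plain pipes, the filter outputs whatever the last stage produces (just the logs), while `|=` writes the new value back into the surrounding document. A rough Python analogy of the two behaviours (the function names are mine, not part of the pipeline):

```python
def logs_only(txn):
    # Plain-pipe behaviour: emit only the enriched log objects.
    return [{**log, "lid": txn["hash"] + "|" + log["logIndex"]}
            for log in txn["logs"]]

def update_in_place(txn):
    # |= behaviour: enrich the logs where they live and keep the
    # whole transaction in the output.
    for log in txn["logs"]:
        log["lid"] = txn["hash"] + "|" + log["logIndex"]
    return txn

txn = {"hash": "0xtxnhash", "logs": [{"logIndex": "0x2"}]}
print(logs_only(txn))        # only the log objects
print(update_in_place(txn))  # the full transaction, logs enriched
```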
Step 8: Handle nulls
Not every block in Polygon has transactions, and not every transaction has logs. In those cases, we still want to output the blocks without any modification. For that, we use jq’s `if .. then .. else .. end` conditional.
# Check both the transactions length and the logs length
if ((.transactions | length > 0) and (.transactions[].logs | length > 0))
then (above modification)  # the entire modification logic within (...)
else .                     # output without any modification
end
cat sample-input.json | jq '.Blks[] |= if ((.transactions | length > 0) and (.transactions[].logs | length > 0)) then (.transactions[] |= select (.) as $txn | $txn.logs[] |= select (.) as $log | . += {"lid": ($txn.hash + "|" + .logIndex)}) else . end'
{
"Blks": [
{
"hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a",
"number": "0x1e3ce90",
"transactions": [
{
"hash": "0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b",
"blockNumber": "0x1e3ce90",
"fee": "0x352ad5e986f000",
"from": {
"address": "0x18f768455e7f5fb09fc491fd86bcc282bcdd5973"
},
"to": {
"address": "0xdef171fe48cf0115b1d80b88dc8eab59176fee57"
},
"logs": [
{
"logIndex": "0x2",
"address": "0x83000597e8420ad7e9edd410b2883df1b83823cf",
"topics": [
"0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
"0x00000000000000000000000018f768455e7f5fb09fc491fd86bcc282bcdd5973",
"0xabc",
"0x000000000000000000000000def171fe48cf0115b1d80b88dc8eab59176fee57"
],
"blockNumber": "0x1e3ce90",
"block": {
"hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a"
},
"lid": "0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b|0x2"
}
]
}
]
}
]
}
Now we have a globally unique log ID, `lid`, a composite of the parent transaction hash and the log index, added deep inside a multi-level nested JSON document. Thanks, `jq`!