Adding Unique IDs to Transaction Logs in Polygon Using jq

I’m working on serving Polygon data via Outserv. As part of that, I downloaded the first 15 million Polygon blocks and stored them as JSON. To avoid duplicates while loading this data into Outserv, each JSON object needs a unique ID. That’s already true for Blocks and Transactions, which have a hash, and for Accounts, which have an address. But it’s not true for Logs, which have no globally unique identifier.

This is what a sample Block looks like in JSON (lots of details redacted for simplicity):

{
  "Blks": [
    {
      "hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a",
      "number": "0x1e3ce90",
      "transactions": [
        {
          "hash": "0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b",
          "blockNumber": "0x1e3ce90",
          "fee": "0x352ad5e986f000",
          "from": {
            "address": "0x18f768455e7f5fb09fc491fd86bcc282bcdd5973"
          },
          "to": {
            "address": "0xdef171fe48cf0115b1d80b88dc8eab59176fee57"
          },
          "logs": [
            {
              "logIndex": "0x2",
              "address": "0x83000597e8420ad7e9edd410b2883df1b83823cf",
              "topics": [
                "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
                "0x00000000000000000000000018f768455e7f5fb09fc491fd86bcc282bcdd5973",
                "0xabc",
                "0x000000000000000000000000def171fe48cf0115b1d80b88dc8eab59176fee57"
              ],
              "blockNumber": "0x1e3ce90",
              "block": {
                "hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a"
              }
            }
          ]
        }
      ]
    }
  ]
}

I had 36 GB of gzip-compressed JSON data in this format. We want to add a lid to every log: a composite of the parent transaction’s hash and the log’s index. Together, those two values are globally unique.

Log.lid = parent Transaction.hash + Log.logIndex

jq is an excellent tool for JSON manipulation. I’ve used it for simpler tasks in the past, but this seemed like a complex job. After hours of trial, error and online searches, I arrived at a solution that works. Let’s dive into it!

Step 1: Get blocks

cat sample-input.json | jq '.Blks[]' would return the block object.
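To see what the [] iterator does on its own, here is the same idea on a tiny, made-up input (the Blks contents below are hypothetical): .Blks[] streams each element of the Blks array as a separate output.

```shell
# .Blks[] iterates the Blks array, emitting one block object per element;
# -c prints each result as compact, single-line JSON.
echo '{"Blks":[{"n":1},{"n":2}]}' | jq -c '.Blks[]'
# prints:
# {"n":1}
# {"n":2}
```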

Step 2: Select transactions

cat sample-input.json | jq '.Blks[].transactions[]' would return the transaction object.

Step 3: Select logs

cat sample-input.json | jq '.Blks[].transactions[] | .logs[]' would return the log object.

Step 4: Add a key to log

cat sample-input.json | jq '.Blks[].transactions[] | .logs[] | . += {"lid": "test"}' would add an lid key to the log.

{
  "logIndex": "0x2",
  "address": "0x83000597e8420ad7e9edd410b2883df1b83823cf",
  "topics": [
    "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
    "0x00000000000000000000000018f768455e7f5fb09fc491fd86bcc282bcdd5973",
    "0xabc",
    "0x000000000000000000000000def171fe48cf0115b1d80b88dc8eab59176fee57"
  ],
  "blockNumber": "0x1e3ce90",
  "block": {
    "hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a"
  },
  "lid": "test"
}
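The += used here is jq’s update-assignment: . += {...} merges the right-hand object into the current object and returns the result. A minimal sketch with made-up keys:

```shell
# . += {...} is shorthand for . |= . + {...}: it merges the right-hand
# object into the input object and outputs the merged result.
echo '{"a":1}' | jq -c '. += {"b":2}'
# prints: {"a":1,"b":2}
```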

Step 5: Add parent Transaction hash

Now the challenge lies in getting the parent transaction’s hash. For that, we can bind the transaction to a variable with jq’s as operator. (select(.) simply passes a truthy input through unchanged, so select (.) as $txn binds the current transaction to $txn; a plain . as $txn would work just as well.)

cat sample-input.json | jq '.Blks[].transactions[] | select (.) as $txn | .logs[] | . += {"lid": $txn.hash}'

{
  "logIndex": "0x2",
  "address": "0x83000597e8420ad7e9edd410b2883df1b83823cf",
  "topics": [
    "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
    "0x00000000000000000000000018f768455e7f5fb09fc491fd86bcc282bcdd5973",
    "0xabc",
    "0x000000000000000000000000def171fe48cf0115b1d80b88dc8eab59176fee57"
  ],
  "blockNumber": "0x1e3ce90",
  "block": {
    "hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a"
  },
  "lid": "0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b"
}
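The binding is what makes the parent’s data available at the child level: $txn captures the transaction before we descend into its logs. Here is the pattern in isolation, on a hypothetical parent/child document:

```shell
# Bind the parent object to $p before descending into its children;
# $p stays in scope inside the inner iteration.
echo '{"id":"t1","items":["a","b"]}' |
  jq -c '. as $p | .items[] | {parent: $p.id, item: .}'
# prints:
# {"parent":"t1","item":"a"}
# {"parent":"t1","item":"b"}
```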

Step 6: Add logIndex

Now we bind the log the same way and append its logIndex to the lid value.

cat sample-input.json | jq '.Blks[].transactions[] | select (.) as $txn | $txn.logs[] | select (.) as $log | . += {"lid": ($txn.hash + "|" + $log.logIndex)}'

Note the (...) parentheses around the concatenation: jq only allows simple expressions as object values, so a compound expression like this one must be wrapped in parentheses.

{
  "logIndex": "0x2",
  "address": "0x83000597e8420ad7e9edd410b2883df1b83823cf",
  "topics": [
    "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
    "0x00000000000000000000000018f768455e7f5fb09fc491fd86bcc282bcdd5973",
    "0xabc",
    "0x000000000000000000000000def171fe48cf0115b1d80b88dc8eab59176fee57"
  ],
  "blockNumber": "0x1e3ce90",
  "block": {
    "hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a"
  },
  "lid": "0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b|0x2"
}
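To see why the parentheses matter, here is the concatenation on its own (with made-up hex strings); without the parentheses, jq reports a syntax error because object values must be simple expressions:

```shell
# Compound expressions used as object values must be wrapped in (...).
# Without them, {"lid": "0xabc" + "|" + "0x2"} is a syntax error.
jq -cn '{"lid": ("0xabc" + "|" + "0x2")}'
# prints: {"lid":"0xabc|0x2"}
```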

Alright, this gives us a unique lid for each log. But we are only outputting the logs; we also want to output the enclosing transaction and block.

Step 7: Output everything

To output everything, we use the update operator |= instead of plain pipes. Where | replaces the output with the right-hand side’s result, path |= f updates the value at path in place and outputs the whole modified document.

cat sample-input.json | jq '.Blks[].transactions[] |= (select (.) as $txn | .logs[] |= (select (.) as $log | . += {"lid": ($txn.hash + "|" + $log.logIndex)}))'

Note that the path on the left of each |= must be relative to the current input, so the inner update is .logs[] rather than $txn.logs[]; jq rejects update paths rooted at a variable (an "Invalid path expression" error).
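The difference between | and |= is easiest to see side by side on a small, made-up input: the pipe replaces the output with the right-hand result, while the update operator modifies values in place and keeps the whole document:

```shell
# Plain pipe: the filter's output becomes the result; context is lost.
echo '{"a":[1,2]}' | jq -c '.a[] | . * 10'
# prints:
# 10
# 20

# Update operator: the values are modified in place; the document is kept.
echo '{"a":[1,2]}' | jq -c '.a[] |= . * 10'
# prints: {"a":[10,20]}
```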

Step 8: Handle nulls

Not every block in Polygon has transactions, and not every transaction has logs. In such cases, we still want to output the blocks without any modification. For that, we use jq’s if .. then .. else .. end conditional.

# Check both the transactions length and the logs length
if ((.transactions | length > 0) and (.transactions[].logs | length > 0))
then (the modification from Step 7)   # the entire update logic, wrapped in (...)
else .                                # output the block without any modification
end

cat sample-input.json | jq '.Blks[] |= if ((.transactions | length > 0) and (.transactions[].logs | length > 0)) then (.transactions[] |= (select (.) as $txn | .logs[] |= (. += {"lid": ($txn.hash + "|" + .logIndex)}))) else . end'

{
  "Blks": [
    {
      "hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a",
      "number": "0x1e3ce90",
      "transactions": [
        {
          "hash": "0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b",
          "blockNumber": "0x1e3ce90",
          "fee": "0x352ad5e986f000",
          "from": {
            "address": "0x18f768455e7f5fb09fc491fd86bcc282bcdd5973"
          },
          "to": {
            "address": "0xdef171fe48cf0115b1d80b88dc8eab59176fee57"
          },
          "logs": [
            {
              "logIndex": "0x2",
              "address": "0x83000597e8420ad7e9edd410b2883df1b83823cf",
              "topics": [
                "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
                "0x00000000000000000000000018f768455e7f5fb09fc491fd86bcc282bcdd5973",
                "0xabc",
                "0x000000000000000000000000def171fe48cf0115b1d80b88dc8eab59176fee57"
              ],
              "blockNumber": "0x1e3ce90",
              "block": {
                "hash": "0x8e904e2f8156609e1769f283c5cb373903e721109672920fce78e1f322146d4a"
              },
              "lid": "0xd52f34dbbf76786cb1fb63cf0509b46ecf521863e2f1faea259f7b17e07df57b|0x2"
            }
          ]
        }
      ]
    }
  ]
}
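As an aside, the length checks can be sidestepped by using map with if guards on each array: map is a no-op on an empty array, and the guards leave missing or null arrays untouched. This is an alternative sketch under the same data layout, not the approach the post followed:

```shell
# For each block, rewrite transactions only if present; for each transaction,
# rewrite logs only if present, stamping every log with its composite lid.
cat sample-input.json | jq '
  .Blks |= map(
    if .transactions then
      .transactions |= map(. as $txn |
        if .logs then
          .logs |= map(. += {"lid": ($txn.hash + "|" + .logIndex)})
        else . end)
    else . end)'
```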

Now we have a globally unique log ID lid, which is a composite of the parent transaction hash and the log index — done in a multi-level nested JSON. Thanks, jq!



Date
August 20, 2022