Terraform is an incredibly powerful infrastructure-as-code (IaC) tool that lets you define, deploy, and manage your application infrastructure across many providers; for example, you might spin up an AWS EC2 instance, or create and manage Lambda functions. One of its many abilities is the external data source, which lets you run a script (in our case a Python script) and use its output to drive your Terraform configuration.

In this example, we’ll create a basic Python script that generates some JSON describing Kafka topics, then use its output to create Kafka topic definitions in Terraform.

Writing the Python Script

topic_data.py

# Imagine we do something to parse/generate this information...
import json
import base64

if __name__ == "__main__":
    data = {
        "topics": {
            "streaming.example.topic.a": {
                "cluster_name": "cluster_a",
                "partitions_count": 2
            },
            "streaming.example.topic.b": {
                "cluster_name": "cluster_a",
                "partitions_count": 6
            }
        }
    }
    # We'll explain this line in a moment.
    output = {"output": base64.b64encode(json.dumps(data, indent=2).encode()).decode()}
    print(json.dumps(output, indent=2))

This isn’t the most impressive Python you’ve ever seen, but it’ll do for an example. When exposing data to Terraform, the script must adhere to the interface Terraform defines for external data sources (the external provider’s documentation covers the details). In essence: the program receives its input as JSON on stdin, and it must write its result to stdout as a JSON object whose values are all strings.
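
You can sanity-check this contract by running the script directly:

python topic_data.py

It should print a single JSON object whose only value is a string (our base64 payload). Our script ignores stdin, which is fine since it takes no arguments. If you do want to pass values in from Terraform, they arrive as a JSON object on stdin, populated from a query block on the data source (e.g. query = { environment = "prod" }). Here’s a minimal sketch of that, with a hypothetical environment key purely for illustration:

import json
import sys

if __name__ == "__main__":
    # Terraform delivers the data source's `query` arguments as a JSON object on stdin.
    query = json.load(sys.stdin)
    # "environment" is a hypothetical key, not part of the example above.
    environment = query.get("environment", "dev")
    # The result must be a JSON object whose values are all strings.
    print(json.dumps({"echoed_environment": environment}))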

Next, we’ll reference the Python script in our Terraform configuration as an external data source, calling it to retrieve our results.

Using the Python Script in a Terraform Configuration

In the most basic implementation, we can create an external data source, as in the example below. We define it much like any other Terraform resource, using the program argument to call our Python script.

One “interesting” quirk of Terraform external data sources is that the result can only be a flat JSON object of string keys and string values, so nested data like ours, with its integer

"partitions_count": 6

will cause Terraform to fall over. To get the data through, we do a little munging: first base64-encode the JSON payload in Python, then, on the Terraform side, base64decode it and finally jsondecode it to recover our structure (ugh).
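
(Strictly speaking, the base64 step is optional: returning {"output": json.dumps(data)} from Python and jsondecode(data.external.topic_data.result.output) in Terraform works too, since the JSON text is itself a single string value. The base64 round-trip just sidesteps any quoting and whitespace surprises along the way.)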

topics.tf

data "external" "topic_data" {
  program = ["python", "${path.module}/topic_data.py"]
}

output "topic_data_info" {
  value = jsondecode(base64decode(data.external.topic_data.result.output))
}

The above example defines an output which will dump out the JSON we “generated” in Python. By running

terraform plan

you’ll be able to see the output of the Python script, passed through Terraform and back out again.

❯ terraform plan
data.external.topic_data: Reading...
data.external.topic_data: Read complete after 0s [id=-]

Changes to Outputs:
  + topic_data_info = {
      + topics = {
          + "streaming.example.topic.a" = {
              + cluster_name     = "cluster_a"
              + partitions_count = 2
            }
          + "streaming.example.topic.b" = {
              + cluster_name     = "cluster_a"
              + partitions_count = 6
            }
        }
    }
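
One operational note: Terraform re-runs the external program on every plan and refresh, so the script should be fast, deterministic, and free of side effects; a script whose output changes between runs will show up as a perpetual diff in anything derived from it.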

We can then use this data to dynamically shape our resources where required; in my case, that means creating Kafka topics:

data "external" "topic_data" {
  program = ["python", "${path.module}/topic_data.py"]
}

locals {
  kafka_topics = jsondecode(base64decode(data.external.topic_data.result.output)).topics
}

resource "confluent_kafka_topic" "test_topic" {
  for_each = local.kafka_topics

  kafka_cluster {
    # `example.clusters` is a placeholder for however you look up cluster IDs in your configuration.
    id = example.clusters[each.value.cluster_name].id
  }

  topic_name       = each.key
  partitions_count = each.value.partitions_count

  # ...

}
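
One last practical refinement: if the script fails, it should exit non-zero and write the reason to stderr; Terraform then fails the data source read and surfaces the message, rather than ingesting half-formed JSON. A minimal sketch, where build_topic_data is a hypothetical stand-in for the real generation logic:

import base64
import json
import sys

def build_topic_data():
    # Hypothetical stand-in for whatever parses/generates the topic data.
    return {"topics": {}}

if __name__ == "__main__":
    try:
        data = build_topic_data()
    except Exception as exc:
        # A non-zero exit causes Terraform to fail the read and report stderr.
        print(f"topic_data.py: {exc}", file=sys.stderr)
        sys.exit(1)
    output = {"output": base64.b64encode(json.dumps(data).encode()).decode()}
    print(json.dumps(output))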

Conclusion

Using a Python script as an external data source for Terraform is a simple and powerful way to incorporate dynamic data into your infrastructure configuration. With just a few lines of code, you can use the output of a Python script to drive your Terraform configuration, making it easier to manage and maintain your infrastructure over time.