Here at Fathom, R&D is what we do. As a result of that, there are a lot of demonstrations of what we’ve done, how we’ve provided value to customers in the past and what we focus our efforts on. Recently, I revisited an old project of ours to get it ready for a demo with a prospective client - it was the classic scenario of “If you come back to this project in 6 months, will you remember what you did?”, only this time viewed from the other end of the telescope.
The team had done their due diligence - the documentation was there, the code repository was well maintained - but there was one problem we didn’t foresee at the time: all of our infrastructure for the project was in the cloud, or more specifically AWS. And while that wasn’t a problem in itself, pragmatically it meant that everything had been taken down in the meantime - our client wasn’t using the dev environments, and idle services were just lying around accumulating costs. This meant that I had to spend time familiarising myself with the code and the deployment procedure, and then spend more time debugging when, inevitably, it didn’t deploy as expected.
However, since we did that project, our team has implemented improvements in how we manage infrastructure, specifically taking an infrastructure as code approach. That’s why I’d like to spread the love and introduce you to Terraform :)
What is Terraform?
Terraform is a product by HashiCorp that allows you to manage infrastructure through code. Want to make an AWS Lambda function? Use Terraform. Want to deploy a server instance in EC2? Use Terraform. Want to create an S3 bucket? Use Terraform. Want to do all 3 and then immediately tear them down with just 2 commands? That’s impossible - actually I’m kidding, use Terraform. And it’s not limited to just AWS: Terraform can provision infrastructure across multiple cloud providers, including Azure, Google Cloud and Oracle.
I’m not a big believer in being zealous when it comes to tech - everything has its upsides AND downsides. The software industry changes so fast that you need to be prepared to learn and keep up with the best software. In the spirit of that, rather than telling you how great I find Terraform, let’s run through a simple example and recognise the pros and cons afterwards.
A Simple Example
Let’s say a client comes to you and says “I’d like a RESTful API that writes to and reads from a database” - it’s an abstract enough example of what a client may say but bear with me. For the API, let’s say we’re happy using API Gateway, the database will be DynamoDB and to perform the reads, writes and cleaning up the I/O we plan to use a single Lambda function.
Once we’ve established what our infrastructure looks like - 1 API, 1 DynamoDB table and 1 Lambda function - we can begin provisioning infrastructure using Terraform. Find yourself a nice new empty folder; here we’ll create our main.tf. Terraform scripts use the extension “.tf”, and Terraform will read all of the “.tf” files in the current working directory and merge them into a single script. We will only need 1 file for this example (main.tf), although as projects get more complex and you start to use modules (a separate topic), more files and folders become useful:
provider "aws" {
  access_key = "ACCESS_KEY_HERE"
  secret_key = "SECRET_KEY_HERE"
  region     = "eu-west-1"
}
The first entry in main.tf is the provider that you’re using. As an AWS partner, Fathom’s platform of choice is AWS and so our example is for AWS. By giving Terraform your access and secret keys you enable it to provision infrastructure for you on the AWS platform, with your permission. Having your secret key checked into source control is not best practice and there are alternative approaches, but we’ll use this method for simplicity in the example.
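As one of those alternative approaches, the AWS provider can pick up credentials from outside the configuration. The block below is a minimal sketch that would replace the provider block above rather than sit alongside it. It assumes a named profile in your shared AWS credentials file, and the profile name fathom-dev is purely hypothetical:

```hcl
# Sketch only: credentials are resolved outside of main.tf, so nothing
# secret ends up in source control.
provider "aws" {
  region = "eu-west-1"

  # Hypothetical profile name; it must exist in ~/.aws/credentials.
  profile = "fathom-dev"
}
```

The provider can also read the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, which is often the easier option on a build server.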
Now we can actually start to provision, starting with the API. Let’s say I create an OpenAPI file containing my specification for that API called “swagger.yaml”:
resource "aws_api_gateway_rest_api" "client-example-api" {
  name        = "client-example-api-dev"
  description = "Example development API"
  body        = "${file("./swagger/swagger.yaml")}"
}

resource "aws_api_gateway_deployment" "client-example-api" {
  rest_api_id       = "${aws_api_gateway_rest_api.client-example-api.id}"
  stage_name        = "dev"
  stage_description = "${md5(file("./swagger/swagger.yaml"))}"
  depends_on        = ["aws_api_gateway_rest_api.client-example-api"]
}
Above you can see that we’re provisioning a resource (Terraform’s name for anything you might provision from a provider) of type “aws_api_gateway_rest_api”, and we’re giving it a simple name, “client-example-api”. The body of that API is the previously mentioned swagger file. After making the API, we want to deploy it - this requires a deployment resource. The deployment resource is given the REST API ID so it knows which API to deploy, along with the stage name and description. The description is the MD5 hash of the swagger file, so it changes whenever the specification changes. It also includes a “depends_on” setting so that the API is created before Terraform attempts to create the deployment. Note that the “depends_on” directive is seldom needed in Terraform, as it can usually determine the correct order for provisioning resources from the references between them.
We can create the DynamoDB table as well, which is a little bit simpler, just by adding this to our main.tf:
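Attributes of the deployment can be referenced in the same way that the deployment references the API’s ID. As a small sketch (the output name api_invoke_url is just an illustrative choice, not something from the original project), an output block can print the URL of the deployed stage after an apply:

```hcl
# Sketch: expose the URL of the "dev" stage once the deployment exists.
# The reference to the deployment resource also tells Terraform to
# create the deployment before evaluating this output.
output "api_invoke_url" {
  value = "${aws_api_gateway_deployment.client-example-api.invoke_url}"
}
```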
resource "aws_dynamodb_table" "example-client-db" {
  name           = "example-client-db"
  read_capacity  = 1
  write_capacity = 1
  hash_key       = "ID"

  attribute {
    name = "ID"
    type = "S"
  }

  tags {
    Name        = "example-client-db"
    Environment = "dev"
  }
}
Here, our resource creates a DynamoDB table called example-client-db where the hash_key is a string called ID. We’ll need this table in our next part.
Now that you have the basic idea behind provisioning resources, we can have a look at the code for the Lambda function (things are about to get slightly more complicated):
resource "aws_lambda_function" "database-handler" {
  function_name = "database-handler"

  # The zip containing the Lambda function
  filename         = "./lambda/dbHandler/dbHandler.zip"
  source_code_hash = "${base64sha256(file("./lambda/dbHandler/dbHandler.zip"))}"

  # "index" is the filename within the zip file (index.js) and "handler"
  # is the name of the property under which the handler function was
  # exported in that file.
  handler = "index.handler"
  runtime = "nodejs8.10"
  timeout = 10

  # The IAM role used for this function
  role = "${aws_iam_role.database-handler-role.arn}"

  environment {
    variables = {
      region = "eu-west-1"

      # The table created earlier in main.tf
      table = "example-client-db"
    }
  }
}
# IAM role which dictates what other AWS services the Lambda function
# may access.
resource "aws_iam_role" "database-handler-role" {
  name = "database-handler-role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}
resource "aws_iam_policy" "database-handler-policy" {
  name        = "database-handler-policy"
  description = ""

  # GetItem and PutItem are the DynamoDB actions for reading and
  # writing a single item.
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:dynamodb:eu-west-1:ACCOUNT_ID:table/example-client-db"
    }
  ]
}
EOF
}
resource "aws_iam_role_policy_attachment" "database-handler-attach" {
  role       = "${aws_iam_role.database-handler-role.name}"
  policy_arn = "${aws_iam_policy.database-handler-policy.arn}"
}
resource "aws_lambda_permission" "database-handler-apigw" {
  statement_id = "AllowAPIGatewayInvoke"
  action       = "lambda:InvokeFunction"

  # Referencing the function (rather than repeating its name) makes sure
  # it is created before the permission.
  function_name = "${aws_lambda_function.database-handler.function_name}"
  principal     = "apigateway.amazonaws.com"

  # The /*/* portion grants access from any method on any resource
  # within the API Gateway "REST API".
  source_arn = "${aws_api_gateway_rest_api.client-example-api.execution_arn}/*/*"
}
So that’s a pretty long piece of code; let’s break it down. The first resource (aws_lambda_function) is the Lambda function itself. This creates a Lambda function using the code contained in a local zip file - a zip stored in an S3 bucket could also be used, but this is simpler for our example.
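For comparison, here is a rough sketch of the S3 variant. The bucket name, object key and resource name are all hypothetical, and it assumes the zip has already been uploaded to that bucket by some other means:

```hcl
# Sketch: the same handler, sourced from S3 instead of a local zip.
resource "aws_lambda_function" "database-handler-s3" {
  function_name = "database-handler-s3"

  # Hypothetical bucket and key; the object must already exist.
  s3_bucket = "example-client-lambda-artifacts"
  s3_key    = "dbHandler/dbHandler.zip"

  handler = "index.handler"
  runtime = "nodejs8.10"
  timeout = 10

  # Reuses the IAM role defined for the original function.
  role = "${aws_iam_role.database-handler-role.arn}"
}
```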
The next 3 resources (aws_iam_role, aws_iam_policy, aws_iam_role_policy_attachment) deal with AWS IAM which is the AWS Identity and Access Management service. This essentially breaks down to:
- Create a role for Lambda to use while invoking this function; let’s call it the “database-handler-role”.
- Make a policy to define the limits of what this role can do. In this case, we’re saying that this Lambda function should be allowed to get and put items on a specific DynamoDB table - example-client-db. If you want to use the above Terraform code, don’t forget to replace ACCOUNT_ID in the ARN with your own account ID.
- Attach the policy to the role so that the role is allowed to do only what we say it can. Terraform can retrieve the name of the role and the ARN of the policy for us. This is very convenient and allows for a certain level of abstraction when designing your Terraform files.
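The same trick can remove the hard-coded table ARN, and with it the ACCOUNT_ID placeholder. The following is a sketch of a variation on the policy resource (it would replace the one above, not sit alongside it). It assumes the table is defined in the same configuration, and it uses the item-level read and write actions dynamodb:GetItem and dynamodb:PutItem:

```hcl
resource "aws_iam_policy" "database-handler-policy" {
  name        = "database-handler-policy"
  description = "Lets the database-handler Lambda read and write one table"

  # The table ARN is interpolated from the DynamoDB resource, so the
  # region and account ID never need to be typed out by hand.
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem"
      ],
      "Effect": "Allow",
      "Resource": "${aws_dynamodb_table.example-client-db.arn}"
    }
  ]
}
EOF
}
```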
Lastly, the aws_lambda_permission resource gives the API we’ve already defined permission to invoke the Lambda function. Without this, our API wouldn’t be able to call our Lambda function, resulting in a 500 error.
With all of this in our main.tf we can finally provision some infrastructure.
Install Terraform for your given OS, then navigate to the folder where your main.tf is kept and execute the command terraform init. This will initialise Terraform in that folder and download the plugin for the provider you’ve declared.
To apply our changes to our infrastructure we can use terraform apply. After some processing, Terraform will show you the resources it is creating and will ask you to confirm. NOTE: once you say yes to this, Terraform will attempt to make these resources; should it fail to make some of them, Terraform will not roll back any of the successful ones. It may take a few minutes to provision the infrastructure. But once it’s up, go ahead to your AWS console and check out your API :)
Please note that you can use the command terraform plan to view the changes that Terraform proposes to make in advance of actually making them with terraform apply. It is generally considered best practice to run terraform plan first, even though terraform apply will ask for confirmation before proceeding to create any infrastructure.
So I have my project up and running in minutes and I can do a tech demo quickly, but right now my client doesn’t need it - how do I take it down? Use terraform destroy to take down your provisioned infrastructure.
If you wish to make changes to existing infrastructure, change your scripts and run Terraform again. Generally, Terraform will update the infrastructure with minimal disruption. terraform plan is your friend here in terms of telling you what will happen and if any existing services will be restarted when applying the plan.
What Are The Advantages?
The advantages of using any infrastructure orchestration tool should be obvious given the example above - it’s quicker, cheaper and simpler than setting up and tearing down your infrastructure manually. But what makes Terraform different from the likes of Puppet, Chef, Ansible or SaltStack?
- Terraform is a declarative infrastructure orchestration tool. In other words, you just tell Terraform what you want - not how to get it. How to get it is taken care of by Terraform itself. The major advantage of this is that it avoids the nitty gritty of imperative tools like Chef, where you have to define a set of procedures (in Chef’s case - recipes) that are gone through to set up the infrastructure.
- Terraform is platform agnostic. Despite the AWS-heavy example given above, Terraform works with a number of cloud providers. While this is true of some other tools similar to Terraform, it isn’t true of another declarative infrastructure tool - CloudFormation, which only works with AWS. Please note that while Terraform supports multiple providers, its scripts are not provider agnostic: a resource written for one provider has to be rewritten to target another.
- Terraform has a large community behind it with a lot of support resources.
- The Terraform files which describe the infrastructure are written in a very simple, easy to read language called HashiCorp Configuration Language (HCL).
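To illustrate the point about scripts not being provider agnostic, here is a rough sketch of the same idea, a storage bucket, on two providers. It assumes an AWS provider is already configured as in the example. The Google project ID and the bucket names are hypothetical, and the Google provider would also need credentials that are not shown here:

```hcl
provider "google" {
  project = "example-client-project" # hypothetical project ID
  region  = "europe-west1"
}

# An AWS bucket...
resource "aws_s3_bucket" "demo-assets" {
  bucket = "example-client-demo-assets" # hypothetical bucket name
}

# ...and its Google Cloud counterpart: a different resource type with
# different argument names, even though the intent is the same.
resource "google_storage_bucket" "demo-assets" {
  name     = "example-client-demo-assets" # hypothetical bucket name
  location = "EU"
}
```

One tool and one workflow cover both clouds, but each resource still has to be written against its own provider.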
Are There Any Disadvantages?
Like I said earlier, I’m not a big fan of believing a piece of software has absolutely no or minimal drawbacks and when presented with one I’m skeptical at best. That being said, Terraform is a great tool with only a few minor annoyances:
- Terraform is still developing: the latest stable release at time of writing is version 0.11.11. It’s still a very powerful tool, but there are limitations, which are being resolved over time.
- Terraform describes itself as “Infrastructure As Code”, which is funny considering it isn’t actually written in a programming/coding language - it’s more like “Infrastructure As Config Files”. While this might seem pedantic, it’s actually quite limiting because a fully-fledged programming language would give you a level of abstraction that simply can’t be achieved just using a configuration language. And there are some very convoluted constructs implemented in HCL to overcome these limitations. Conditional or iterated provisioning is really awkward to set up. We’ve found in our use of Terraform that we’ve been somewhat restricted in the level of abstraction the configuration language can provide. One way in which we’ve overcome that is using Embedded JavaScript, but that’s a whole other post.
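To give a flavour of that awkwardness, here is a sketch of the usual count idiom in 0.11-style syntax. The variable names, resource names and table names are all made up for illustration:

```hcl
# 0.11 treats booleans in variables as strings, hence "true" in quotes.
variable "create_table" {
  default = "true"
}

variable "table_names" {
  type    = "list"
  default = ["example-client-db-a", "example-client-db-b"]
}

# Conditional provisioning: a count of 1 or 0 stands in for an "if".
resource "aws_dynamodb_table" "optional-db" {
  count          = "${var.create_table ? 1 : 0}"
  name           = "example-client-optional-db"
  read_capacity  = 1
  write_capacity = 1
  hash_key       = "ID"

  attribute {
    name = "ID"
    type = "S"
  }
}

# Iterated provisioning: one table per list entry, picked out by index.
resource "aws_dynamodb_table" "per-name-db" {
  count          = "${length(var.table_names)}"
  name           = "${element(var.table_names, count.index)}"
  read_capacity  = 1
  write_capacity = 1
  hash_key       = "ID"

  attribute {
    name = "ID"
    type = "S"
  }
}
```

Both cases lean on count doing double duty as an “if” and a “for”, which is exactly the kind of construct a general-purpose language would express more directly.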
Conclusions
There’s no doubt that Terraform is a powerful tool with a huge amount to offer. In the context of bringing up and tearing down infrastructure over a short period of time, and having repeatable infrastructure deployment, its value can’t be overstated. It’s easy to use, even for those unfamiliar with DevOps. In the DevOps community, people are using Terraform and it has gone, in the words of one author, “from good to the best”.
However, it is still developing and, compared to something like Puppet, which is around 9 years older, it is relatively young with less flexibility in some areas.
Try Terraform out, feel free to connect and message me on LinkedIn about it! https://bit.ly/2TYdOfl