Microsoft Azure has a nice service for scheduling tasks called Azure Automation. While Azure Automation can do other things as well, such as acting as a PowerShell DSC pull server, we'll focus on runbooks and scheduling. Runbooks are scripts that do things, e.g. run maintenance and reporting tasks. Runbooks often, but not necessarily, manipulate objects in Azure. Runbooks are serverless like Azure Functions and the pricing is therefore similar: you pay for the computing resources you use and don't have to pay for a server that's mostly idle.
On the surface Azure Automation, runbooks and schedules seem deceptively simple, and getting something running in the Azure Portal is actually relatively easy. However, several Azure technologies are involved in making the pieces of the puzzle come together:
- Managed identities: these are Azure-managed users that don't have a password. They can be system-managed (Azure's own term is system-assigned), which means that they're tied to the Azure Automation account you've created, or user-managed (user-assigned), which means their lifecycle is decoupled from their respective Azure Automation account. This means you can use the same user-managed identity in multiple places. Use of managed identities is recommended over service principals. In the context of Azure Automation the managed identity is the user under which automation runbooks run.
- Roles: these are a collection of permissions. Roles are linked with users, groups and identities.
- Built-in roles: Azure comes with a bunch of these, and in many cases they are sufficient. But for runbooks you can typically narrow down the permissions much more, because their actions are very predictable.
- Custom roles: I recommend creating custom roles for automation jobs. You can start by cloning a built-in role or by creating a new role from scratch. In either case keeping the scope and the number of permissions to a minimum is good policy for security reasons, and it reduces the risk of runbooks accidentally destroying something (e.g. due to a bug or a supply chain attack).
- Permissions: these are the actions (e.g. "Microsoft.Compute/virtualMachines/start/action" or "Microsoft.Compute/*/read") that the role includes and excludes.
- Scopes: these define the scope of the permissions in the role. The scope can be limited to a single resource, a resource group or an entire Azure subscription.
- RBAC: the managed identity linked with the Azure Automation account needs to be assigned an Azure role that allows it to perform whatever operations the runbook(s) require. Without a proper role assignment runbooks will launch, but will fail to do their job.
- Runbooks: runbooks can be written in PowerShell or Python, or created using a graphical editor. They handle logging in to Azure and run with the privileges of the managed identity the Azure Automation account is linked with. Note that some runbooks may not work with managed identities, or may work with system-managed identities but not with user-managed identities. So be prepared to modify a runbook to make it work in your particular use case. Runbooks are linked with a schedule.
- Schedules: runbooks can be scheduled to run at a certain time of the day and have basic recurrence options (daily, weekly, etc).
When starting with Azure Automation I recommend doing it manually first. Once you're able to make it work manually, you can codify your work much more easily. That said, here comes the sample Terraform code to set up Azure Automation to start and stop VMs on a schedule, something that seems to be a very common use case. First we need some plumbing:
data "azurerm_subscription" "primary" {
}

resource "azurerm_resource_group" "development" {
  name     = "${var.resource_prefix}-rg"
  location = "northeurope"
}
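The plumbing above refers to `var.resource_prefix` and to the azurerm provider without showing where they come from. A minimal sketch of the missing pieces could look like the following. The variable description is my own wording, no provider versions are pinned, and depending on the azurerm provider version you may also have to supply a subscription ID to the provider, so treat this as an assumption rather than as the original setup:

```hcl
terraform {
  required_providers {
    # Used for all azurerm_* resources and data sources
    azurerm = {
      source = "hashicorp/azurerm"
    }
    # Used for the local_file data source that reads the runbook script
    local = {
      source = "hashicorp/local"
    }
  }
}

# The azurerm provider requires a features block, even an empty one
provider "azurerm" {
  features {}
}

# Assumed declaration; the article only shows the variable being used
variable "resource_prefix" {
  type        = string
  description = "Prefix for the names of the resources created here"
}
```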
Then we create the user-managed identity which the Azure Automation account will use and assign it a custom role that has the permissions to start and stop VMs:
resource "azurerm_role_definition" "stop_start_vm" {
  name        = "StopStartVM"
  scope       = data.azurerm_subscription.primary.id
  description = "Allow stopping and starting VMs in the primary subscription"

  permissions {
    actions = [
      "Microsoft.Network/*/read",
      "Microsoft.Compute/*/read",
      "Microsoft.Compute/virtualMachines/start/action",
      "Microsoft.Compute/virtualMachines/restart/action",
      "Microsoft.Compute/virtualMachines/deallocate/action",
    ]
    not_actions = []
  }
}

resource "azurerm_user_assigned_identity" "development_automation" {
  resource_group_name = azurerm_resource_group.development.name
  location            = azurerm_resource_group.development.location
  name                = "development-automation"
}

resource "azurerm_role_assignment" "development_automation" {
  scope              = data.azurerm_subscription.primary.id
  role_definition_id = azurerm_role_definition.stop_start_vm.role_definition_resource_id
  principal_id       = azurerm_user_assigned_identity.development_automation.principal_id
}
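The assignment above grants the role across the whole subscription. If all the VMs you want to manage live in a single resource group, a narrower assignment may be enough. Here is a minimal sketch, assuming the VMs are in the `development` resource group created earlier. The resource label `development_automation_rg` is made up for illustration, and the block is meant as a replacement for the subscription-wide assignment, not an addition to it:

```hcl
# Hypothetical alternative: assign the custom role at resource group scope only
resource "azurerm_role_assignment" "development_automation_rg" {
  scope              = azurerm_resource_group.development.id
  role_definition_id = azurerm_role_definition.stop_start_vm.role_definition_resource_id
  principal_id       = azurerm_user_assigned_identity.development_automation.principal_id
}
```

The role definition itself can stay scoped to the subscription, because a role that is assignable at subscription level can also be assigned to resource groups inside that subscription.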
With the user-managed identity in place we can create the Azure Automation account:
resource "azurerm_automation_account" "development" {
  name                = "development"
  location            = azurerm_resource_group.development.location
  resource_group_name = azurerm_resource_group.development.name
  sku_name            = "Basic"

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.development_automation.id]
  }
}
As you can see, the Azure Automation account is linked with the user-managed identity in the identity block.
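For comparison, here is a rough sketch of the same setup with a system-managed identity. In that case there is no separate identity resource: Azure creates the identity together with the Automation account, and the role assignment reads the principal ID from the account's identity block. The resource labels and the account name `development-system` are made up for illustration:

```hcl
# Hypothetical variant: Automation account with a system-assigned identity
resource "azurerm_automation_account" "development_system" {
  name                = "development-system"
  location            = azurerm_resource_group.development.location
  resource_group_name = azurerm_resource_group.development.name
  sku_name            = "Basic"

  identity {
    type = "SystemAssigned"
  }
}

# The principal ID is exported by the identity block of the account
resource "azurerm_role_assignment" "development_system" {
  scope              = data.azurerm_subscription.primary.id
  role_definition_id = azurerm_role_definition.stop_start_vm.role_definition_resource_id
  principal_id       = azurerm_automation_account.development_system.identity[0].principal_id
}
```

The trade-off is the one described earlier: the identity disappears with the account and cannot be shared with other resources.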
Now the Simple-Azure-VM-Start-Stop.ps1 runbook can be added:
data "local_file" "simple_azure_vm_start_stop" {
  filename = "${path.module}/scripts/SimpleAzureVMStartStop.ps1"
}

resource "azurerm_automation_runbook" "simple_azure_vm_start_stop" {
  name                    = "Simple-Azure-VM-Start-Stop"
  location                = azurerm_resource_group.development.location
  resource_group_name     = azurerm_resource_group.development.name
  automation_account_name = azurerm_automation_account.development.name
  log_verbose             = true
  log_progress            = true
  description             = "Start or stop virtual machines"
  runbook_type            = "PowerShell"
  content                 = data.local_file.simple_azure_vm_start_stop.content
}
The runbook is from here, but small modifications were made to make it work with user-managed identities. In particular, the Azure connection part was changed from this simplistic version:
try {
    $null = Connect-AzAccount -Identity
}
catch {
    --- snip ---
}
to a more complex version:
param(
    --- snip ---
    [Parameter(Mandatory = $true)]
    $AccountId,
    --- snip ---
)
--- snip ---
try {
    # Ensures you do not inherit an AzContext in your runbook
    Disable-AzContextAutosave -Scope Process
    # Connect to Azure with user-assigned managed identity
    $AzureContext = (Connect-AzAccount -Identity -AccountId $AccountId).context
    # Set and store context
    $AzureContext = Set-AzContext -SubscriptionName $AzureContext.Subscription -DefaultProfile $AzureContext
}
catch {
    --- snip ---
}
The -AccountId parameter somewhat confusingly expects to get the Client ID of the managed identity.
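Because the client ID and the principal ID are easy to mix up, it can help to expose both as Terraform outputs while debugging. This is only a sketch and the output names are made up. The client ID is the value the runbook's -AccountId parameter expects, while the principal ID is the one used in the role assignment:

```hcl
# Hypothetical output: the value to pass to the runbook as AccountId
output "automation_identity_client_id" {
  description = "Client ID of the user-managed identity"
  value       = azurerm_user_assigned_identity.development_automation.client_id
}

# Hypothetical output: the value the role assignment refers to
output "automation_identity_principal_id" {
  description = "Principal (object) ID of the user-managed identity"
  value       = azurerm_user_assigned_identity.development_automation.principal_id
}
```

After an apply, `terraform output` prints both values so you can compare them with what the Azure Portal shows for the identity.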
Now, with the runbook in place we can create the schedules:
resource "azurerm_automation_schedule" "nightly_vm_backup_start" {
  name                    = "nightly-vm-backup-start"
  resource_group_name     = azurerm_resource_group.development.name
  automation_account_name = azurerm_automation_account.development.name
  frequency               = "Day"
  interval                = 1
  timezone                = "Etc/UTC"
  start_time              = "2022-08-11T01:00:00+00:00"
  description             = "Start VMs every night for backups"
}

resource "azurerm_automation_schedule" "nightly_vm_backup_stop" {
  name                    = "nightly-vm-backup-stop"
  resource_group_name     = azurerm_resource_group.development.name
  automation_account_name = azurerm_automation_account.development.name
  frequency               = "Day"
  interval                = 1
  timezone                = "Etc/UTC"
  start_time              = "2022-08-11T01:30:00+00:00"
  description             = "Stop VMs every night after backups"
}
The final step is to link the schedules with the runbook using job schedules. Note that the parameters to the runbook are passed here. Also note that the keys (parameter names) have to be lowercase even if they contain uppercase letters in the PowerShell code (e.g. AccountId -> accountid):
resource "azurerm_automation_job_schedule" "nightly_vm_backup_start" {
  resource_group_name     = azurerm_resource_group.development.name
  automation_account_name = azurerm_automation_account.development.name
  schedule_name           = azurerm_automation_schedule.nightly_vm_backup_start.name
  runbook_name            = azurerm_automation_runbook.simple_azure_vm_start_stop.name

  parameters = {
    resourcegroupname = azurerm_resource_group.development.name
    accountid         = azurerm_user_assigned_identity.development_automation.client_id
    vmname            = "testvm"
    action            = "start"
  }
}

resource "azurerm_automation_job_schedule" "nightly_vm_backup_stop" {
  resource_group_name     = azurerm_resource_group.development.name
  automation_account_name = azurerm_automation_account.development.name
  schedule_name           = azurerm_automation_schedule.nightly_vm_backup_stop.name
  runbook_name            = azurerm_automation_runbook.simple_azure_vm_start_stop.name

  parameters = {
    resourcegroupname = azurerm_resource_group.development.name
    accountid         = azurerm_user_assigned_identity.development_automation.client_id
    vmname            = "testvm"
    action            = "stop"
  }
}
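The schedules above recur daily. As a sketch of the other recurrence options, a weekday-only start schedule might look roughly like this. The resource label, the schedule name and the start time are made up. The `week_days` argument only applies when the frequency is "Week", and as far as I know Azure expects the start time to be a few minutes in the future when the schedule is created, so the timestamp needs to be replaced with a suitable one:

```hcl
# Hypothetical variant: start VMs on weekdays only
resource "azurerm_automation_schedule" "weekday_vm_start" {
  name                    = "weekday-vm-start"
  resource_group_name     = azurerm_resource_group.development.name
  automation_account_name = azurerm_automation_account.development.name
  frequency               = "Week"
  interval                = 1
  timezone                = "Etc/UTC"
  # Placeholder timestamp; replace with a time in the future
  start_time              = "2022-08-15T06:00:00+00:00"
  week_days               = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
  description             = "Start VMs on weekday mornings"
}
```

Such a schedule would still need its own job schedule resource to link it with the runbook, in the same way as the two nightly schedules.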
With the code above you should be able to schedule the startup and shutdown of a VM called "testvm" successfully. If that is not the case, go to Azure portal -> Automation Accounts -> Development -> Runbooks -> Simple-Azure-VM-Start-Stop, edit the runbook and use the "test pane" to debug what is going on. You can get script input, output, errors and all that good stuff from there, and you can trigger the script with various parameters for testing purposes.
This code is also available as a generalized module on GitHub.