Puppet types and providers development part 5: self.instances

May 30, 2021 

self.instances is a provider class method that produces an array containing all resources of a type found on the system. For example, a yum package provider might run "rpm -qa" in self.instances to get a list of the packages installed on the system. This serves two main purposes:

  • Allows using "puppet resource <type>" to produce Puppet code for all resources of <type> found on the system.
  • Data created by self.instances can be passed to self.prefetch to cache resource properties. The cached properties are then reused for every resource managed by that provider. For example, a yum package provider might build a list of the packages installed on a system when Puppet encounters the first package resource that uses the "yum" provider. That cache would then be reused, removing the need to check each package's state individually.
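
The hand-off from self.instances to self.prefetch can be sketched in plain Ruby. This is only an illustration under stated assumptions: FakeProvider, FakeResource and discovered_instances are hypothetical stand-ins so that the snippet runs without Puppet. In a real provider the method would be self.prefetch(resources) and it would call instances instead.

```ruby
# Hypothetical stand-in for a provider instance. Real providers inherit
# from Puppet::Provider.
FakeProvider = Struct.new(:name, :properties)

# Hypothetical stand-in for a resource in the catalog.
class FakeResource
  attr_reader :name
  attr_accessor :provider

  def initialize(name)
    @name = name
  end
end

# Plays the role of self.instances: one provider object per package
# "found" on the system (canned data, nothing is actually queried).
def discovered_instances
  [
    FakeProvider.new('bash', { ensure: :present }),
    FakeProvider.new('vim', { ensure: :present })
  ]
end

# Plays the role of self.prefetch: attach each discovered provider to the
# catalog resource with the same name, if the catalog has one.
def prefetch(resources)
  discovered_instances.each do |prov|
    resource = resources[prov.name]
    resource.provider = prov if resource
  end
end

# The catalog is assumed to be a Hash of resource name => resource.
catalog = {
  'bash' => FakeResource.new('bash'),
  'nano' => FakeResource.new('nano')
}
prefetch(catalog)
```

After prefetch has run, the "bash" resource carries the cached properties, while "nano" has no provider attached because it was not found on the system.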

As discussed in "Puppet types and providers development part 4: caching resource properties to improve performance", self.instances and self.prefetch are not really suitable for scenarios where you need to manage (multiple) remote systems using the same provider. Things like connection details would be given in the Puppet resource definition, so they live at the provider instance level, and provider class level methods like self.instances and self.prefetch do not have access to that level.

Implementing self.instances in the provider is actually quite easy once you have created the type. The example below shows a generalized basic pattern:

  def self.instances
    # Detected resources will be added to this array
    my_resources = []

    # Do something provider-specific here to get the data from the system.
    # The data could be a textual list of packages installed on a system or a Hash
    # created from JSON data received from a remote server. Put it into "my_data",
    # which is assumed to be an array where each entry contains all the properties
    # and parameters for a single resource.
    my_data = <provider-specific code that collects the data>

    # Loop through the data ("my_data") you collected and convert the properties
    # and parameters of each resource (e.g. a package or a remote resource) into a
    # Hash, then add that Hash to the resources array in the format that Puppet
    # expects.
    my_data.each do |rd|
      my_resource = {}
      my_resource[:ensure] = <rd determines if this is :present or :absent>
      my_resource[:property_a] = <get the value from rd>
      my_resource[:property_b] = <get the value from rd>
      my_resource[:param_c] = <get the value from rd>
      my_resource[:param_d] = <get the value from rd>

      # Wrap the Hash in a provider instance and add it to the resources array
      my_resources << new(my_resource)
    end

    # Return the array of provider instances
    my_resources
  end
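
To make the pattern more concrete, here is a minimal sketch that runs without Puppet. Everything in it is an assumption made for illustration: FakePackageProvider is a hypothetical stand-in for a class derived from Puppet::Provider, raw_package_list returns canned text instead of running a real command, and the "name version" line format is made up rather than actual "rpm -qa" output.

```ruby
# Hypothetical stand-in for a real provider class. Its constructor simply
# stores the Hash it is given.
class FakePackageProvider
  attr_reader :property_hash

  def initialize(property_hash)
    @property_hash = property_hash
  end

  # Canned data in a made-up "name version" format. A real provider would
  # query the system here.
  def self.raw_package_list
    "bash 5.1.8\nvim 8.2.2637\n"
  end

  # Convert each line into a Hash and wrap it in a provider instance.
  def self.instances
    raw_package_list.each_line.map do |line|
      name, version = line.split
      new({ name: name, ensure: :present, version: version })
    end
  end
end

packages = FakePackageProvider.instances
packages.each { |pkg| puts pkg.property_hash.inspect }
```

Using map instead of appending to an array by hand is a style choice. Both approaches return the same array of provider instances.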

You can easily test self.instances with

puppet resource --modulepath <modules-dir> <your-type>

where <modules-dir> contains the module with your provider and <your-type> is the name of your type. Note that by default "puppet resource" will only output the properties of the resources it detects, not their parameters. If you want to output one or more parameters, you need to use --param:

puppet resource --modulepath <modules-dir> --param param_a --param param_b <your-type>

This logic seems to be based on the assumed differences between properties and parameters. Properties are actual characteristics of the resource you're managing. Parameters are essentially metadata that just helps drive logic in the provider. In other words, you can manage the properties of a resource but not its parameters.

A bit more information on self.instances is available in Gary Larizza's "Seriously, What Is This Provider Doing?" blog post as well as in the self.instances method documentation. I recommend reading both of them.

Samuli Seppänen