Puppet types and providers development part 6: mysteries of self.prefetch

May 30, 2021 

This blog post is a part of this blog post series:

I will open this blog post with a quote from the famous Gary Larizza:

After wading the waters of self.prefetch, I’m PRETTY SURE its implementation might have come to uncle Luke after a long night in Reed’s chem lab where he might have accidently synthesized mescaline.


After reading Gary's rather elaborate explanation about self.prefetch I still did not fully get what self.prefetch actually does and the official documentation surely does not help:

This method may be implemented by a provider in order to pre-fetch resource properties. If implemented it should set the provider instance of the managed resources to a provider with the fetched state (i.e. what is returned from the instances method).


This post tries to shed more light on this seemingly magical Puppet provider class function. I recommend reading the previous parts of this blog post series before diving into the murky depths of self.prefetch.

So, the purpose of self.prefetch is to create the @property_hash for the provider the first time a Puppet resource of that type is encountered. Then, when more resources are found, the cached values can be reused, thus improving performance. For example, a yum provider might use self.instances to create a list of packages installed with yum. That data would then be used by the individual "package" resources using the "yum" provider to figure out what the current state of that package is (present, absent, etc.).

A typical generalized implementation of self.prefetch and its supporting methods looks like this:

  # The resource setter and getter methods need to be
  # present or you will be in a world of hurt in
  # self.prefetch when trying to figure out why it
  # does not work
  mk_resource_methods

  # self.instances method is often used to produce
  # the raw material for self.prefetch
  def self.instances
    # Implementation omitted
  end

  # The "resources" parameter passed on to
  # self.prefetch contains those Puppet
  # resources in the catalog that belong to this
  # particular type. In practice the resources
  # are passed in as instance of the type, e.g.
  # Puppet::Type::Mytype.
  # Note that the resources are not guaranteed
  # to be using this particular provider, which
  # explains some of the conditional logic below.
  def self.prefetch(resources)
    # Get the list of resources present on the
    # system from self.instances and assign
    # it to a variable to avoid having to call
    # self.instances again.
    my_instances = instances

    # Loop through the resources array that
    # contains all the resources in the catalog
    # that belong to this type. They may or may
    # not belong to this provider.
    resources.keys.each do |name|
      # Find which provider each type instance
      # should use. For example, if you are
      # managing packages with providers "pip"
      # and "yum" at the same time then some
      # type instances will use the "pip" provider
      # and some will use the "yum" provider. So,
      # in that case you would need to determine
      # which of those packages should use the
      # "pip" and which should use "yum" provider.
      # The code below finds the first resource in
      # self.instances whose properties match a type
      # instance in the catalog and sets it to a
      # variable.
      provider = my_instances.find do |my_instance|
        my_instance.property_a == resources[name][:property_a] &&
          my_instance.property_b == resources[name][:property_b]
      end
      # If a match was found, the type instance in
      # the catalog is set to use this provider.
      if provider
        resources[name].provider = provider
      end
    end
  end

Note that the @property_hash only contains cached values for the resources that are actually in the catalog: the other resources that self.instances produces are dropped. This makes sense as there's no point in caching something that won't be used by anything.
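To see this caching effect without a Puppet installation, here is a minimal plain-Ruby sketch of the same pattern. FakeResource and FakeProvider are hypothetical stand-ins for a type instance and a provider class, not Puppet classes, and the package names are made up for illustration.

```ruby
# Plain-Ruby stand-ins (hypothetical, not Puppet classes) that mimic how
# self.prefetch hands discovered provider instances to catalog resources.

# Stand-in for a type instance in the catalog.
class FakeResource
  attr_reader :name
  attr_accessor :provider

  def initialize(name)
    @name = name
  end
end

# Stand-in for a provider class.
class FakeProvider
  attr_reader :property_hash

  def initialize(property_hash)
    @property_hash = property_hash
  end

  def name
    @property_hash[:name]
  end

  # Pretend these three packages were discovered on the system.
  def self.instances
    %w[vim git curl].map { |pkg| new(name: pkg, ensure: :present) }
  end

  # Attach a discovered instance to each catalog resource that matches it.
  def self.prefetch(resources)
    discovered = instances
    resources.each do |name, resource|
      match = discovered.find { |instance| instance.name == name }
      resource.provider = match if match
    end
  end
end

# Only "git" and "htop" are in the (fake) catalog.
catalog = {
  'git'  => FakeResource.new('git'),
  'htop' => FakeResource.new('htop')
}
FakeProvider.prefetch(catalog)

# Names of catalog resources that received a prefetched provider.
cached = catalog.values.map(&:provider).compact.map(&:name)
p cached
```

Of the three discovered packages, only "git" ends up attached to a resource: "vim" and "curl" are not in the catalog and are dropped, while "htop" is in the catalog but was not found on the system, so it gets no prefetched provider.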

While self.prefetch is typically married with self.instances, that is not obligatory. In fact, if querying all resources on the system is a particularly expensive operation you may want to actually check the catalog (the "resources" parameter passed to self.prefetch) to understand what queries you really have to make and what you can avoid. This can make a big performance difference if there are potentially tons of resources to query and each query is slow.
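As a rough illustration of that idea, the plain-Ruby sketch below queries only the names that appear in the catalog instead of listing everything. SelectiveProvider, query_one, SYSTEM_STATE and the Resource struct are all hypothetical names invented for this example; in a real provider query_one would be whatever slow call you are trying to avoid.

```ruby
# Hypothetical sketch: let the catalog decide which queries to make.
class SelectiveProvider
  attr_reader :property_hash

  # Counts lookups so the saving is visible; purely illustrative.
  @query_count = 0
  class << self
    attr_reader :query_count
  end

  # Pretend system state. In a real provider each lookup would be slow.
  SYSTEM_STATE = {
    'john' => { role: 'testrole' },
    'jane' => { role: 'testrole' },
    'bob'  => { role: 'admin' }
  }.freeze

  def initialize(property_hash)
    @property_hash = property_hash
  end

  # Stand-in for an expensive single-item query.
  def self.query_one(name)
    @query_count += 1
    state = SYSTEM_STATE[name]
    state && new(state.merge(name: name, ensure: :present))
  end

  # Query only the resources that are actually in the catalog.
  def self.prefetch(resources)
    resources.each do |name, resource|
      found = query_one(name)
      resource.provider = found if found
    end
  end
end

# Stand-in for a type instance in the catalog.
Resource = Struct.new(:name, :provider)

catalog = { 'john' => Resource.new('john') }
SelectiveProvider.prefetch(catalog)
p SelectiveProvider.query_count
```

With one resource in the catalog, only one query is made even though the pretend system holds three entries. Whether this beats a single bulk listing depends on how the cost of one query compares to the cost of listing everything.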

If you need to understand self.prefetch even better, I suggest adding lots of "p" calls in strategic places and checking what you get. The output will be messy, but quite enlightening in an esoteric way. Here are some highlights of my own quest with an early version of the keycloak_role_mapper provider. The test catalog I was working with using "puppet apply" was this:

keycloak_role_mapping { 'john-roles':
    realm => 'foobar',
    name  => 'john',
    role  => 'testrole',
}

keycloak_role_mapping { 'jane-roles':
    realm => 'foobar',
    name  => 'jane',
    role  => 'testrole',
}

Here's the dummy self.prefetch implementation I wrote to truly understand the garbage that gets passed to self.prefetch:

  def self.prefetch(resources)
    p "Resources: #{resources.class}"
    resources.each do |resource|
      p "Resource: #{resource.class}"
      resource.each do |value|
        p "Value: #{value.class}"
      end
    end
  end

This is what comes out from "puppet apply":

Notice: Compiled catalog for localhost.localdomain in environment production in 0.08 seconds
"Resources: Hash"
"Resource: Array"
"Value: String"
"Value: Puppet::Type::Keycloak_role_mapping"
"Resource: Array"
"Value: String"
"Value: Puppet::Type::Keycloak_role_mapping"
Notice: Applied catalog in 3.31 seconds

The keys in "resources" are the namevars of the resources of this particular type (keycloak_role_mapping). If we do

p resources.keys

we get this in return:

["john", "jane"]

The other interesting part is the resource Array. The first entry in it contains the namevar ("john" or "jane") again. The second part is actually a reference to the Puppet type, in this case the instance of Puppet::Type::Keycloak_role_mapping. It looks horrible, but you can actually extract all the type parameters and properties from it. For example this code

p "Title: #{resource[1].title}"
p "Name: #{resource[1][:name]}"
p "Role: #{resource[1][:role]}"

will give you this output:

"Title: john-roles"
"Name: john"
"Role: testrole"

"Title: jane-roles"
"Name: jane"
"Role: testrole"
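Since iterating a Ruby Hash yields [key, value] pairs, the resource[1] indexing can be avoided by giving the block two parameters. The sketch below shows this in plain Ruby; FakeMapping is a hypothetical Struct standing in for the Puppet::Type::Keycloak_role_mapping instance, chosen only because a Struct supports both the .title and [:role] access styles used above.

```ruby
# Hypothetical stand-in for a type instance. A Struct supports both
# resource.title and resource[:role] style access.
FakeMapping = Struct.new(:title, :name, :role)

resources = {
  'john' => FakeMapping.new('john-roles', 'john', 'testrole'),
  'jane' => FakeMapping.new('jane-roles', 'jane', 'testrole')
}

# Hash#map yields [key, value] pairs; two block parameters destructure them.
summary = resources.map do |name, resource|
  "#{name}: title=#{resource.title} role=#{resource[:role]}"
end
puts summary
```

The same two-parameter form works with resources.each inside self.prefetch, which tends to read better than indexing into the pair.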

So, self.prefetch is fed all the Puppet type instances found in the catalog that belong to this particular type. Those are then used to map the actual resources in the system (e.g. "packages installed with yum") with the resources defined in Puppet code (e.g. package resources that use the "yum" provider).

In other words, self.prefetch gets a list of resources of this type (e.g. "package"), which may be using multiple providers (e.g. "yum" or "pip"). It then checks the actual state of the system (which it often gets from self.instances) and selects the first resource that uses this provider and matches this type instance.

Samuli Seppänen