Improve resource handling for secret strings and objects #1084

@michaeltlombardi

Summary of the new feature / enhancement

As a resource developer,
I want to indicate in my resource instance schema which properties require secret strings or objects,
so that users know the property value must be protected.

As a user,
I want to be able to understand which properties require secrets,
so that I can effectively and safely author a configuration document or invoke the resource directly.

As a resource developer and integrating developer,
I want strong contracts for how secrets should be passed to resources,
so that I can ensure my own code is adhering to safe-and-secure-by-design principles.

Currently, secure strings and objects are passed to resources as string and object values, respectively. There's no way for a resource to understand whether it has been passed a string that should be redacted except where the resource itself understands a given property to be sensitive.

Further, there's no way to indicate in a resource instance JSON Schema whether the property requires or accepts sensitive values. Therefore, users and integrating tools have no way of understanding which resource properties are sensitive except by investigating the resource manually.

Instead, resources should be able to indicate support/requirement for secret values in their JSON Schema and rely on a strong contract for how those values are sent to the resource.

Proposed technical implementation details (optional)

I see five potential ways to address this problem. These options aren't mutually exclusive:

  1. Reusable schemas - Define secureString and secureObject schemas to reusably reference in instance schemas.
  2. Canonical properties - Define canonical properties for common input types, starting with _token and _credential.
  3. Extended vocabulary - Extend the JSON Schema vocabulary to support indicating sensitive properties.
  4. Pass sensitive property names as metadata - Add a standard way for DSC to indicate to a resource which properties were known to contain secure strings or secure objects.
  5. Pass sensitive property names as canonical write-only property - Add a canonical property like _propertiesWithSensitiveValues which resources can define in their instance schema to indicate support for redacting properties that are marked as sensitive by DSC even when the resource doesn't consider them sensitive by design.

Reusable schemas

Without extending the vocabulary of the JSON Schema or defining new canonical properties, we could define the following reusable subschemas:

$id: .../secureString.json
writeOnly: true
type: object
additionalProperties: false
required: [secureString]
properties:
  secureString: { type: string }
---
$id: .../secureObject.json
writeOnly: true
type: object
unevaluatedProperties: false
required: [secureObject]
properties:
  secureObject: { type: object, unevaluatedProperties: false }

These could be referenced in an instance JSON Schema:

type: object
properties:
  alwaysSecureString:
    $ref: .../secureString.json
  maybeSecureString:
    oneOf:
    - $ref: .../secureString.json
    - type: string

In this model, the resource expects to receive the value as an object wrapping the sensitive string or object, like:

{
  "secureString": "actual secret"
}

or

{
  "secureObject": {
    // actual secret object
  }
}
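
On the resource side, handling this wrapper could look something like the following. This is only a minimal sketch: the helper name `unwrap_secure` is hypothetical, and it assumes the wrapper object carries exactly one key, as the reusable subschemas above require.

```python
from typing import Any, Tuple


def unwrap_secure(value: Any) -> Tuple[Any, bool]:
    """Return (inner_value, was_wrapped) for a property value.

    Hypothetical helper: a value counts as wrapped only when it is an object
    with a single `secureString` or `secureObject` key of the expected type.
    """
    if isinstance(value, dict) and len(value) == 1:
        if isinstance(value.get("secureString"), str):
            return value["secureString"], True
        if isinstance(value.get("secureObject"), dict):
            return value["secureObject"], True
    # Anything else is treated as a plain (non-secure) value.
    return value, False
```

A resource could use the returned flag to decide whether the value must be kept out of its messages.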

The main drawback to this model is that any validation for the wrapped string or object has to be defined by extending the referenced schema. Consider the example snippet:

type: object
properties:
  token:
    $ref: .../secureString.json
    properties:
      secureString:
        minLength: 8
  credential:
    $ref: .../secureObject.json
    properties:
      secureObject:
        properties:
          username:
            type: string
            minLength: 3
            maxLength: 30
            pattern: '^[^\/]+$'
          password:
            type: string
            minLength: 8

To schematize the properties of a secure object or add further validation to the secure string, you need to extend the referenced schema. This isn't ergonomic or obvious, but the same approach works for properties that may be secret, like:

type: object
properties:
  token:
    oneOf:
      - $ref: .../secureString.json
        properties:
          secureString: { $ref: '#/$defs/tokenValidation' }
      - type: string
        $ref: '#/$defs/tokenValidation'
  credential:
    oneOf:
      - $ref: .../secureObject.json
        properties:
          secureObject:
            required: [username, password]
            properties:
              username:
                $ref: '#/$defs/username'
              password:
                type: string
                $ref: '#/$defs/passwordValidation'
      - type: object
        required: [username, password]
        properties:
          username:
            $ref: '#/$defs/username'
          password:
            $ref: .../secureString.json
            properties:
              secureString:
                $ref: '#/$defs/passwordValidation'
$defs:
  tokenValidation:
    minLength: 8
  username:
    type: string
    minLength: 3
    maxLength: 30
    pattern: '^[^\/]+$'
  passwordValidation:
    minLength: 8

Canonical properties

It might be easiest and most ergonomic to support two new canonical properties:

  • _token for secure strings used to send tokens to an API.
  • _credential to pass a secure object to send an identity and secret to an API.

Consider the following definitions:

$id: .../token.json
writeOnly: true
type: object
properties:
  secureString:
    type: string
    minLength: 1
---
$id: .../credential.json
writeOnly: true
oneOf:
- title: Secret Object Credential
  type: object
  required: [secureObject]
  additionalProperties: false
  properties:
    secureObject:
      type: object
      required: [username, password]
      additionalProperties: false
      properties:
        username: { $ref: '#/$defs/username' }
        password: { $ref: '#/$defs/password' }
- title: Object with Secure String Credential
  type: object
  required: [username, password]
  additionalProperties: false
  properties:
    username: { $ref: '#/$defs/username' }
    password:
      type: object
      required: [secureString]
      properties:
        secureString: { $ref: '#/$defs/password' }
$defs:
  username:
    type: string
    minLength: 1
  password:
    type: string
    minLength: 1

Then, a resource could reference these canonical properties:

title: my resource
type: object
properties:
  _token: { $ref: .../token.json }
  _credential: { $ref: .../credential.json }

And accept the following inputs:

{ "_token": { "secureString": "<token>" } }
{
  "_credential": {
    "secureObject": {
      "username": "<username>",
      "password": "<password>"
    }
  }
}
{
  "_credential": {
    "username": "<username>",
    "password": { "secureString": "<password>" }
  }
}
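
Because `_credential` accepts two shapes, a resource would likely normalize them before use. The following is a rough sketch of that step, not a proposed API: the `Credential` type and `parse_credential` function are hypothetical names, and the input is assumed to have already passed schema validation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credential:
    """Hypothetical in-memory form of a `_credential` value."""
    username: str
    # repr=False keeps the secret out of repr() output, and so out of most logs.
    password: str = field(repr=False)


def parse_credential(value: dict) -> Credential:
    """Accept either input shape shown above and return one normalized form."""
    if "secureObject" in value:
        # Shape 1: the whole credential is wrapped in a secure object.
        inner = value["secureObject"]
        return Credential(inner["username"], inner["password"])
    # Shape 2: a plain object whose password is a wrapped secure string.
    return Credential(value["username"], value["password"]["secureString"])
```

Either accepted input then yields the same `Credential`, so the rest of the resource only deals with one shape.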

With a strong contract around the canonical resource properties, we could inform developers how they should implement their resources to deserialize this data (and to never emit it). When we implement development kits, we could provide idiomatic, ergonomic options for developers to incorporate tokens and credentials into their resource design.

Note that resources that need to reuse these semantics for multiple properties could reference the canonical property schema under a different property name, like:

title: my resource
properties:
  serviceToken: { $ref: .../token.json }
  databaseToken: { $ref: .../token.json }

We don't currently have a model for canonical properties with variant names, but we could have a handler for names matching patterns:

patternProperties:
  '^_(token|\w+Token)$': { $ref: .../token.json }
  '^_(credential|\w+Credential)$': { $ref: .../credential.json }
properties:
  _serviceToken:
    title: Service token
    description: This token is used to authenticate to the foo service.
  _databaseCredential:
    title: Database credential
    description: This credential is used to authenticate to the local DB.
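
To illustrate which property names such a handler would pick up, here is a small sketch that applies the same two patterns. The function name `canonical_kind` is hypothetical. JSON Schema patterns follow ECMAScript regular expression rules, so the sketch uses `re.ASCII` and `fullmatch` to approximate that behavior; it is not an exact equivalent.

```python
import re
from typing import Optional

# The same patterns as in the patternProperties example above.
TOKEN_NAME = re.compile(r"^_(token|\w+Token)$", re.ASCII)
CREDENTIAL_NAME = re.compile(r"^_(credential|\w+Credential)$", re.ASCII)


def canonical_kind(property_name: str) -> Optional[str]:
    """Return "token", "credential", or None for a property name."""
    if TOKEN_NAME.fullmatch(property_name):
        return "token"
    if CREDENTIAL_NAME.fullmatch(property_name):
        return "credential"
    return None
```

With these patterns, `_token`, `_serviceToken`, and `_databaseCredential` are recognized, while `serviceToken` (no underscore prefix) and `_tokens` are not.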

We can probably write some linting/recommendations to check for apparent definitions of token and credential properties and recommend using canonical properties / patterns instead. The pattern properties can also be defined as reusable references, e.g.

# compose from individual properties
allOf:
 - $ref: .../patternProperties/tokens.json
 - $ref: .../patternProperties/credentials.json
properties:
  _serviceToken:
    title: Service token
    description: This token is used to authenticate to the foo service.
  _databaseCredential:
    title: Database credential
    description: This credential is used to authenticate to the local DB.
---
# Apply all known pattern properties:
$ref: .../patternProperties.json
properties:
  _exist: {}
  _inDesiredState: {}
  _serviceToken: {}
  _credential: {}

The downside to the patternProperties approach/support is that to avoid accidentally indicating any property matching those patterns as valid, you need to use the propertyNames keyword to define the set of valid property names, like:

$ref: .../patternProperties.json
properties:
  _exist: {}
  _inDesiredState: {}
  _serviceToken: {}
  _credential: {}
  nonCanonicalProperty: {}
propertyNames:
  enum: [_exist, _inDesiredState, _serviceToken, _credential, nonCanonicalProperty]

That can be a little onerous, but we could again provide some authoring help to smooth that over in the editor and with RDKs.
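
One form that authoring help could take is generating the `propertyNames` list from the declared properties. A minimal sketch, assuming the instance schema has been loaded into a dictionary; `close_property_names` is a hypothetical helper, not an existing DSC or RDK function:

```python
def close_property_names(schema: dict) -> dict:
    """Return a copy of the schema whose propertyNames enum lists exactly
    the names declared under `properties`, in declaration order."""
    closed = dict(schema)  # shallow copy so the input schema isn't modified
    closed["propertyNames"] = {"enum": list(schema.get("properties", {}))}
    return closed
```

Running this over the example above would produce the same `enum` list that was written by hand.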

Extended JSON Schema vocabulary

We could also define a new JSON Schema keyword that indicates whether a property needs to be handled securely, like x-dsc-secure, where defining it as true indicates different things depending on the subschema it's applied to:

  • If applied to a string, the value needs to be sent as a secure string object wrapping the current subschema.
  • If applied to an object, the value needs to be sent as a secure object wrapping the current subschema.

The main drawback to this is that we would need to implement a custom applicator keyword (not annotation, validation, or format), because it changes the validation behavior for the subschema rather than simply extending the validation for the currently described subschema (consider the difference between x-dsc-secure and x-isAscii, where the latter indicates that the string value must only contain ASCII characters).

I believe we can do this in the jsonschema crate, but that leaves the following problems:

  1. A resource implemented in an arbitrary language would need to have a JSON Schema library that supports custom keywords and reimplement the keyword behavior themselves. If there's an RDK for that language, it would need to provide the keyword and handling for the developer.
  2. We don't have a way to get VS Code (or other editors) to understand the custom keyword.

This is primarily a problem for validation and applicator keywords, but not annotation keywords (which can be ignored when they aren't understood). Ignoring the x-dsc-secure keyword would fundamentally break understanding and validation of data for any property using that keyword.
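
To make the applicator behavior concrete, the sketch below rewrites a schema that uses x-dsc-secure into the equivalent wrapped form. It is an illustration only, not how the jsonschema crate would implement the keyword. The function name `expand_secure` is hypothetical, the wrapper shape is borrowed from the reusable schemas option, and only subschemas nested under `properties` are visited.

```python
from typing import Any

# Which wrapper key applies to which declared type.
WRAPPER_KEYS = {"string": "secureString", "object": "secureObject"}


def expand_secure(schema: Any) -> Any:
    """Return a copy of the schema with every `x-dsc-secure: true` subschema
    replaced by an object schema wrapping the original subschema."""
    if not isinstance(schema, dict):
        return schema
    inner = {key: value for key, value in schema.items() if key != "x-dsc-secure"}
    if isinstance(inner.get("properties"), dict):
        inner["properties"] = {
            name: expand_secure(sub) for name, sub in inner["properties"].items()
        }
    if schema.get("x-dsc-secure") is not True:
        return inner
    declared = schema.get("type")
    if not isinstance(declared, str) or declared not in WRAPPER_KEYS:
        raise ValueError("x-dsc-secure is only sketched for string and object types")
    key = WRAPPER_KEYS[declared]
    return {
        "type": "object",
        "required": [key],
        "additionalProperties": False,
        "properties": {key: inner},
    }
```

A tool that ignores the keyword would validate against the unexpanded subschema and so expect a bare string where a wrapper object is required.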

Pass sensitive property names as metadata

We could define a metadata field that resources can optionally support as write-only metadata input where the field tells the resource which property names should be handled as sensitive data, even for properties that aren't sensitive by design. In this model:

  1. DSC checks whether the resource instance schema supports indicating properties with sensitive data. If the resource doesn't support this behavior, DSC just invokes the resource as normal. Otherwise, it continues through the following steps.
  2. DSC analyzes the resource instance for any properties where the value is a secure object or secure string.
  3. If any properties contain secure values, DSC inserts the array containing the names of those properties into _metadata before invoking the resource.
  4. The resource is responsible for checking the list of sensitive property names and not emitting those values in any messaging.

The property could be defined with a JSON Schema such as:

title: Properties with sensitive values
description: >-
  This field contains an array of property names for the resource instance where the property
  contained one or more secure strings or secure objects. Resources accepting this metadata
  must not emit these values in any messages.
writeOnly: true
type: array
items:
  title: Name of property with sensitive values
  description: >-
    Each item in the array is the name of a resource instance property where the user passed a
    value for that property containing one or more secure strings or secure objects.
  type: string

This could be inserted at the top level of the _metadata or nested under the Microsoft.DSC field. I propose naming this metadata field itself propertiesWithSensitiveValues. If the field is inserted at the top level, it should be prefixed with an underscore. If inserted within the Microsoft.DSC field, it doesn't need a prefix.

For example, given the following configuration snippet:

- type: Example/Resource
  name: Example with sensitive property values
  properties:
    token: "[secret('api_token')]"
    credential: "[params('service_credential')]"
    foo: "[secret('foo')]"
    baz: true

DSC would send either of the following snippets as the input JSON (formatted as YAML for readability):

  • Top-level metadata:

    token: token_returned_from_secret_vault
    credential:
      username: parameterized_username
      password: parameterized_password
    foo: foo_returned_from_secret_vault
    baz: true
    _metadata:
      _propertiesWithSensitiveValues:
        - token
        - credential
        - foo
  • Nested in Microsoft.DSC metadata:

    token: token_returned_from_secret_vault
    credential:
      username: parameterized_username
      password: parameterized_password
    foo: foo_returned_from_secret_vault
    baz: true
    _metadata:
      Microsoft.DSC:
        propertiesWithSensitiveValues:
          - token
          - credential
          - foo

The resource would then be responsible for adhering to the implied contract and not emitting
values for those named properties.
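
The DSC side of this flow could be sketched as follows. This is a simplified illustration under stated assumptions: `add_sensitive_metadata` is a hypothetical name, DSC is assumed to already know which properties resolved from secure values (passed in as `sensitive_names`), and the support check from the first step is reduced to a boolean.

```python
from typing import Iterable


def add_sensitive_metadata(
    instance: dict,
    sensitive_names: Iterable[str],
    resource_supports_field: bool,
    nested: bool = True,
) -> dict:
    """Return the input to send to the resource, adding the list of
    sensitive property names to `_metadata` when applicable."""
    wanted = set(sensitive_names)
    # Keep the names in the order the properties appear in the instance.
    names = [name for name in instance if name in wanted]
    if not resource_supports_field or not names:
        return instance  # invoke the resource as normal
    result = dict(instance)
    metadata = dict(result.get("_metadata", {}))
    if nested:
        dsc = dict(metadata.get("Microsoft.DSC", {}))
        dsc["propertiesWithSensitiveValues"] = names
        metadata["Microsoft.DSC"] = dsc
    else:
        # A top-level field takes the underscore prefix.
        metadata["_propertiesWithSensitiveValues"] = names
    result["_metadata"] = metadata
    return result
```

The `nested` flag switches between the two placements shown above; the original instance dictionary is left unmodified.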

Pass sensitive property names as a canonical write-only property

This option builds on the proposal for the metadata but defines it as a canonical property instead.

In this model:

  1. DSC checks whether the resource instance schema defines the _propertiesWithSensitiveValues canonical property. If the resource doesn't have this canonical property, DSC just invokes the resource as normal. Otherwise, it continues through the following steps.
  2. DSC analyzes the resource instance for any properties where the value is a secure object or secure string.
  3. If any properties contain secure values, DSC inserts the array containing the names of those properties into _propertiesWithSensitiveValues before invoking the resource.
  4. The resource is responsible for checking the list of sensitive property names and not emitting those values in any messaging.

We could define the canonical property JSON Schema like this:

title: Properties with sensitive values canonical property
description: >-
  This property contains an array of property names for the resource instance where the property
  contained one or more secure strings or secure objects. Resources defining this canonical
  property are indicating that they adhere to the property contract and won't emit any values
  for the received property names in any messaging.

writeOnly: true
type: array
items:
  title: Name of property with sensitive values
  description: >-
    Each item in the array is the name of a resource instance property where the user passed a
    value for that property containing one or more secure strings or secure objects.
  type: string

For an example of how this would work in DSC, given the following resource instance definition in a configuration snippet:

- type: Example/Resource
  name: Example with sensitive property values
  properties:
    token: "[secret('api_token')]"
    credential: "[params('service_credential')]"
    foo: "[secret('foo')]"
    baz: true

DSC would send the resource the following input JSON (formatted as YAML for readability):

token: token_returned_from_secret_vault
credential:
  username: parameterized_username
  password: parameterized_password
foo: foo_returned_from_secret_vault
baz: true
_propertiesWithSensitiveValues:
  - token
  - credential
  - foo

The resource would be responsible for adhering to the contract of the canonical property and not
emitting the values of the token, credential, or foo properties in any messages.
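
On the resource side, honoring that contract might look like the sketch below. The function name `redact_for_messages` and the placeholder string are hypothetical; the sketch assumes the resource builds its messages from a copy of the input instance.

```python
REDACTED = "<redacted>"  # hypothetical placeholder; any non-secret marker works


def redact_for_messages(instance: dict) -> dict:
    """Return a copy of the instance that is safe to include in messages."""
    sensitive = set(instance.get("_propertiesWithSensitiveValues", []))
    return {
        name: (REDACTED if name in sensitive else value)
        for name, value in instance.items()
        # The canonical property is write-only, so it's dropped as well.
        if name != "_propertiesWithSensitiveValues"
    }
```

For the input above, the copy keeps `baz: true` and replaces the values of `token`, `credential`, and `foo` with the placeholder.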

Additional considerations

For both the reusable subschemas and canonical properties, we could probably intelligently handle sending the correct data type to the resource and introduce warnings/errors when users try to send simple (non-secure) objects or strings to those properties. Alternatively, we could emit a warning but wrap the value for the resource anyway.
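
The warn-and-wrap alternative could be sketched like this. It is a rough illustration, not a proposed implementation: `ensure_wrapped` is a hypothetical name, the wrapper keys follow the reusable schemas option, and the warning deliberately names only the property and the value's type, never the value itself.

```python
import warnings
from typing import Any


def ensure_wrapped(property_name: str, value: Any) -> dict:
    """Return the value in secure-wrapper form, warning if it arrived plain."""
    if isinstance(value, dict) and ("secureString" in value or "secureObject" in value):
        return value  # already wrapped, pass through unchanged
    warnings.warn(
        f"Property '{property_name}' expects a secure value but received a "
        f"plain {type(value).__name__}; wrapping it before invoking the resource."
    )
    key = "secureObject" if isinstance(value, dict) else "secureString"
    return {key: value}
```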

We should never leak secrets to messages or output data from DSC itself. We can't prevent resources from leaking secrets.

Summary

I think the best short-term option is to support the canonical property for indicating which properties contain sensitive values even when the resource didn't consider them sensitive by design. This model would immediately enable resource authors to clearly indicate adherence to this contract.

In the medium term, I still see value in the _token and _credential canonical properties, but I don't think we need to wrap the data in secure strings or objects when sending to the resource. Instead, we could warn the user when they're not passing secure data for those properties. I think we could probably benefit from defining the pattern properties, but those are lower priority (though would eventually simplify resource instance schema authoring).

We should ensure that we keep an awareness of the data sensitivity for resources referencing these subschemas or defined with these canonical properties so we don't leak those values.

Labels: Issue-Enhancement (The issue is a feature or idea); Schema-Impact (Change requires updating a canonical schema for configs or manifests)
