Improve resource handling for secret strings and objects

### Summary of the new feature / enhancement

> As a resource developer,
> I want to indicate in my resource instance schema which properties require secret strings or objects,
> so that users know the property value must be protected.

> As a user,
> I want to be able to understand which properties require secrets,
> so that I can effectively and safely author a configuration document or invoke the resource directly.

> As a resource developer and integrating developer,
> I want strong contracts for how secrets should be passed to resources,
> so that I can ensure my own code is adhering to safe-and-secure-by-design principles.

Currently, secure strings and objects are passed to resources as string and object values, respectively. There's no way for a resource to know whether it has been passed a string that should be redacted, except where the resource itself understands a given property to be sensitive.

Further, there's no way to indicate in a resource instance JSON Schema whether the property _requires_ or _accepts_ sensitive values. Therefore, users and integrating tools have no way of understanding which resource properties are sensitive except by investigating the resource manually.

Instead, resources should be able to indicate support/requirement for secret values in their JSON Schema and rely on a strong contract for how those values are sent to the resource. 

### Proposed technical implementation details (optional)

I see five potential ways to address this problem. These options aren't mutually exclusive:

1. [Reusable schemas](#proposal-reusable-schemas) - Define `secureString` and `secureObject` schemas to reusably reference in instance schemas.
1. [Canonical properties](#proposal-canonical-properties) - Define canonical properties for common input types, starting with `_token` and `_credential`.
1. [Extended vocabulary](#proposal-extended-json-schema-vocabulary) - Extend the JSON Schema vocabulary to support indicating sensitive properties.
1. [Pass sensitive property names as metadata](#proposal-pass-sensitive-property-names-as-metadata) - Add a standard way for DSC to indicate to a resource which properties were known to contain secure strings or secure objects.
1. [Pass sensitive property names as canonical write-only property](#proposal-pass-sensitive-property-names-as-a-canonical-write-only-property) - Add a canonical property like `_propertiesWithSensitiveValues` which resources can define in their instance schema to indicate support for redacting properties that are marked as sensitive by DSC even when the resource doesn't consider them sensitive by design.

## Reusable schemas

<a id="proposal-reusable-schemas"></a>

Without extending the vocabulary of the JSON Schema or defining new canonical properties, we could define the following reusable subschemas:

```yaml
$id: .../secureString.json
readOnly: true
type: object
additionalProperties: false
required: [secureString]
properties:
  secureString: { type: string }
---
$id: .../secureObject.json
readOnly: true
type: object
unevaluatedProperties: false
required: [secureObject]
properties:
  secureObject: { type: object, unevaluatedProperties: false }
```

These could then be referenced in an instance JSON Schema:

```yaml
type: object
properties:
  alwaysSecureString:
    $ref: .../secureString.json
  maybeSecureString:
    oneOf:
    - $ref: .../secureString.json
    - type: string
```

In this model, the resource expects to receive the value as an object wrapping the sensitive string or object, like

```json
{
  "secureString": "actual secret"
}
```

or

```jsonc
{
  "secureObject": {
    // actual secret object
  }
}
```
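
As a minimal sketch of how a resource might consume these wrapped values, the following Python helper unwraps either shape and reports whether the value arrived as a secure wrapper. The function name `unwrap_secure` and the rule that a wrapper must have exactly one key are assumptions made for illustration, not part of any existing DSC API.

```python
from typing import Any

SECURE_STRING_KEY = "secureString"
SECURE_OBJECT_KEY = "secureObject"


def unwrap_secure(value: Any) -> tuple[Any, bool]:
    """Return (inner_value, was_secure) for a property value.

    Assumption: a value counts as secure only when it is an object with
    exactly one key, either `secureString` wrapping a string or
    `secureObject` wrapping an object.
    """
    if isinstance(value, dict) and len(value) == 1:
        if isinstance(value.get(SECURE_STRING_KEY), str):
            return value[SECURE_STRING_KEY], True
        if isinstance(value.get(SECURE_OBJECT_KEY), dict):
            return value[SECURE_OBJECT_KEY], True
    # Anything else is passed through unchanged and treated as non-secure.
    return value, False
```

A resource could use the boolean to decide whether the value must be kept out of trace messages and returned state.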

The main drawback to this model is the need for defining validation to apply to the string or object. Consider the example snippet:

```yaml
type: object
properties:
  token:
    $ref: .../secureString.json
    properties:
      secureString:
        minLength: 8
  credential:
    $ref: .../secureObject.json
    properties:
      secureObject:
        properties:
          username:
            type: string
            minLength: 3
            maxLength: 30
            pattern: '^[^\/]+$'
          password:
            type: string
            minLength: 8
```

To define the _properties_ of a secure object or add further validation to
the secure string, you need to _extend_ the referenced schema. This isn't ergonomic or obvious,
but it works equally well for properties that _may_ be secret, like:

```yaml
type: object
properties:
  token:
    oneOf:
      - $ref: .../secureString.json
        properties:
          secureString: { $ref: '#/$defs/tokenValidation' }
      - type: string
        $ref: '#/$defs/tokenValidation'
  credential:
    oneOf:
      - $ref: .../secureObject.json
        properties:
          secureObject:
            required: [username, password]
            properties:
              username:
                $ref: '#/$defs/username'
              password:
                type: string
                $ref: '#/$defs/passwordValidation'
      - type: object
        required: [username, password]
        properties:
          username:
            $ref: '#/$defs/username'
          password:
            $ref: .../secureString.json
            properties:
              secureString:
                $ref: '#/$defs/passwordValidation'
$defs:
  tokenValidation:
    minLength: 8
  username:
    type: string
    minLength: 3
    maxLength: 30
    pattern: '^[^\/]+$'
  passwordValidation:
    minLength: 8
```

## Canonical properties

<a id="proposal-canonical-properties"></a>

It might be easiest and most ergonomic to support two new canonical properties:

- `_token` for secure strings used to send tokens to an API.
- `_credential` to pass a secure object to send an identity and secret to an API.

Consider the following definitions:

```yaml
$id: .../token.json
readOnly: true
type: object
properties:
  secureString:
    type: string
    minLength: 1
---
$id: .../credential.json
readOnly: true
oneOf:
- title: Secret Object Credential
  type: object
  required: [secureObject]
  additionalProperties: false
  properties:
    secureObject:
      type: object
      required: [username, password]
      additionalProperties: false
      properties:
        username: { $ref: '#/$defs/username' }
        password: { $ref: '#/$defs/password' }
- title: Object with Secure String Credential
  type: object
  required: [username, password]
  additionalProperties: false
  properties:
    username: { $ref: '#/$defs/username' }
    password:
      type: object
      required: [secureString]
      properties:
        secureString: { $ref: '#/$defs/password' }
$defs:
  username:
    type: string
    minLength: 1
  password:
    type: string
    minLength: 1
```

Then, a resource could use these canonical properties:

```yaml
title: my resource
type: object
properties:
  _token: { $ref: .../token.json }
  _credential: { $ref: .../credential.json }
```

And accept the following inputs:

```json
{ "_token": { "secureString": "<token>" } }
{
  "_credential": {
    "secureObject": {
      "username": "<username>",
      "password": "<password>"
    }
  }
}
{
  "_credential": {
    "username": "<username>",
    "password": { "secureString": "<password>" }
  }
}
```
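
Because `_credential` accepts two shapes, a resource needs to normalize them before use. The sketch below shows one way that could look in Python. The helper name `read_credential` is hypothetical, and it assumes the input has already been validated against the credential schema above.

```python
def read_credential(credential: dict) -> tuple[str, str]:
    """Return (username, password) from either accepted `_credential` shape.

    Assumes the input was already validated against the credential schema,
    so the expected keys are present.
    """
    if "secureObject" in credential:
        # Shape 1: the whole credential is wrapped in a secure object.
        inner = credential["secureObject"]
        return inner["username"], inner["password"]
    # Shape 2: a plain object whose password is a wrapped secure string.
    return credential["username"], credential["password"]["secureString"]
```

In both cases the returned password is sensitive and should never be written to messages or output.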

With a strong contract around the canonical resource properties, we could inform developers how they should implement their resources to deserialize this data (and to never emit it). When we implement development kits, we could provide idiomatic, ergonomic options for developers to incorporate tokens and credentials into their resource design.

Resources that need these semantics for more than one property could reference the canonical property schema under a different property name, like:

```yaml
title: my resource
properties:
  serviceToken: { $ref: .../token.json }
  databaseToken: { $ref: .../token.json }
```

We don't _currently_ have a model for canonical properties with variant names, but we _could_ have a handler for names matching patterns:

```yaml
patternProperties:
  '^_(token|\w+Token)$': { $ref: .../token.json }
  '^_(credential|\w+Credential)$': { $ref: .../credential.json }
properties:
  _serviceToken:
    title: Service token
    description: This token is used to authenticate to the foo service.
  _databaseCredential:
    title: Database credential
    description: This credential is used to authenticate to the local DB.
```
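
To illustrate which property names those patterns would pick up, here is a small Python sketch that classifies names with the same regular expressions. The function name `canonical_kind` is hypothetical, and `re.ASCII` is used on the assumption that `\w` should behave as it does in ECMA 262 regular expressions without the Unicode flag.

```python
import re
from typing import Optional

# The same patterns as in the `patternProperties` example.
TOKEN_NAME = re.compile(r"^_(token|\w+Token)$", re.ASCII)
CREDENTIAL_NAME = re.compile(r"^_(credential|\w+Credential)$", re.ASCII)


def canonical_kind(property_name: str) -> Optional[str]:
    """Return "token" or "credential" when the name matches a pattern."""
    if TOKEN_NAME.match(property_name):
        return "token"
    if CREDENTIAL_NAME.match(property_name):
        return "credential"
    return None
```

Names without the leading underscore, such as `serviceToken`, don't match either pattern and would be treated as ordinary properties.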

We can probably write some linting/recommendations to check for apparent definitions of token and credential properties and recommend using canonical properties / patterns instead. The pattern properties can also be defined as reusable references, e.g.

```yaml
# compose from individual properties
allOf:
 - $ref: .../patternProperties/tokens.json
 - $ref: .../patternProperties/credentials.json
properties:
  _serviceToken:
    title: Service token
    description: This token is used to authenticate to the foo service.
  _databaseCredential:
    title: Database credential
    description: This credential is used to authenticate to the local DB.
---
# Apply all known pattern properties:
$ref: .../patternProperties.json
properties:
  _exist: {}
  _inDesiredState: {}
  _serviceToken: {}
  _credential: {}
```

The downside to the `patternProperties` approach/support is that to avoid accidentally indicating _any_ property matching those patterns as valid, you need to use the `propertyNames` keyword to define the set of valid property names, like:

```yaml
$ref: .../patternProperties.json
properties:
  _exist: {}
  _inDesiredState: {}
  _serviceToken: {}
  _credential: {}
  nonCanonicalProperty: {}
propertyNames:
  enum: [_exist, _inDesiredState, _serviceToken, _credential, nonCanonicalProperty]
```

That can be a little onerous, but we could again provide some authoring help to smooth that over in the editor and with RDKs.

## Extended JSON Schema vocabulary

<a id="proposal-extended-json-schema-vocabulary"></a>

We could also define a new JSON Schema keyword that indicates whether a property needs to be handled securely, like `x-dsc-secure`, where defining it as `true` indicates different things depending on the subschema it's applied to:

- If applied to a string, the value needs to be sent as a secure string object wrapping the current subschema.
- If applied to an object, the value needs to be sent as a secure object wrapping the current subschema.

The main drawback to this is that we would need to implement a custom _applicator_ keyword (not annotation, validation, or format), because it changes the validation behavior for the subschema rather than simply extending the validation for the currently described subschema. Consider the difference between `x-dsc-secure` and `x-isAscii`, where the latter only indicates that the string value must contain nothing but ASCII characters.

I believe we _can_ do this in the [jsonschema crate](https://docs.rs/jsonschema/0.33.0/jsonschema/#custom-keywords), but that leaves the following problems:

1. A resource implemented in an arbitrary language would need to have a JSON Schema library that supports custom keywords and reimplement the keyword behavior themselves. If there's an RDK for that language, it would need to provide the keyword and handling for the developer.
1. We don't have a way to get VS Code (or other editors) to understand the custom keyword.

This is primarily a problem for validation and applicator keywords, but not annotation keywords (which can be ignored when they aren't understood). Ignoring the `x-dsc-secure` keyword would fundamentally break understanding and validation of data for any property using that keyword.
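
To make the applicator-like behavior concrete, the following Python sketch rewrites a subschema marked with `x-dsc-secure: true` into the wrapped form that would actually be validated. This is only an illustration of the semantics described above: the function name `expand_secure_keyword` is hypothetical, and a real implementation would live in a JSON Schema library's custom keyword support rather than in a preprocessing step.

```python
import copy


def expand_secure_keyword(schema: dict) -> dict:
    """Rewrite a subschema marked `x-dsc-secure: true` into its wrapped form.

    Subschemas without the keyword are returned unchanged.
    """
    if schema.get("x-dsc-secure") is not True:
        return schema
    inner = copy.deepcopy(schema)
    del inner["x-dsc-secure"]
    kind = inner.get("type")
    if kind == "string":
        wrapper_key = "secureString"
    elif kind == "object":
        wrapper_key = "secureObject"
    else:
        raise ValueError("x-dsc-secure only applies to string or object subschemas")
    # The original subschema now validates the wrapped value, not the property.
    return {
        "type": "object",
        "required": [wrapper_key],
        "additionalProperties": False,
        "properties": {wrapper_key: inner},
    }
```

A validator that ignores the keyword would check the property against the unexpanded subschema and reject the wrapped value, which is why the keyword can't be treated as an annotation.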

## Pass sensitive property names as metadata

<a id="proposal-pass-sensitive-property-names-as-metadata"></a>

We could define a metadata field that resources can optionally support as write-only metadata input where the field tells the resource which property names should be handled as sensitive data, even for properties that aren't sensitive by design. In this model:

1. DSC checks whether the resource instance schema supports indicating properties with sensitive data. If the resource doesn't support this behavior, DSC just invokes the resource as normal. Otherwise, it continues through the following steps.
1. DSC analyzes the resource instance for any properties where the value is a secure object or secure string.
1. If any properties contain secure values, DSC inserts an array containing the names of those properties into `_metadata` before invoking the resource.
1. The resource is responsible for checking the list of sensitive property names and _not_ emitting those values in any messaging.

The property could be defined with a JSON Schema such as:

```yaml
title: Properties with sensitive values
description: >-
  This field contains an array of property names for the resource instance where the property
  contained one or more secure strings or secure objects. Resources accepting this metadata
  must not emit these values in any messages.
writeOnly: true
type: array
items:
  title: Name of property with sensitive values
  description: >-
    Each item in the array is the name of a resource instance property where the user passed a
    value for that property containing one or more secure strings or secure objects.
  type: string
```

This _could_ be inserted at the top level of the `_metadata` or nested under the `Microsoft.DSC` field. I propose naming this metadata field itself `propertiesWithSensitiveValues`. If the field is inserted at the top level, it should be prefixed with an underscore. If inserted within the `Microsoft.DSC` field, it doesn't need a prefix.

For example, given the following configuration snippet:

```yaml
- type: Example/Resource
  name: Example with sensitive property values
  properties:
    token: "[secret('api_token')]"
    credential: "[parameters('service_credential')]"
    foo: "[secret('foo')]"
    baz: true
```

DSC would send either of the following snippets as the input JSON (formatted as YAML for readability):

- Top-level metadata:

  ```yaml
  token: token_returned_from_secret_vault
  credential:
    username: parameterized_username
    password: parameterized_password
  foo: foo_returned_from_secret_vault
  baz: true
  _metadata:
    _propertiesWithSensitiveValues:
      - token
      - credential
      - foo
  ```

- Nested in `Microsoft.DSC` metadata:

  ```yaml
  token: token_returned_from_secret_vault
  credential:
    username: parameterized_username
    password: parameterized_password
  foo: foo_returned_from_secret_vault
  baz: true
  _metadata:
    Microsoft.DSC:
      propertiesWithSensitiveValues:
        - token
        - credential
        - foo
  ```

The resource would then be responsible for adhering to the implied contract and _not_ emitting
values for those named properties.
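
On the resource side, honoring that contract could look like the following Python sketch, which reads the list from either proposed location and masks the named properties before anything is written to a message. The helper names and the `<redacted>` placeholder are assumptions for illustration only.

```python
REDACTED = "<redacted>"  # hypothetical placeholder text


def sensitive_names(instance: dict) -> set[str]:
    """Collect sensitive property names from either proposed metadata location."""
    metadata = instance.get("_metadata", {})
    names = metadata.get("_propertiesWithSensitiveValues")
    if names is None:
        dsc_metadata = metadata.get("Microsoft.DSC", {})
        names = dsc_metadata.get("propertiesWithSensitiveValues", [])
    return set(names)


def redact_for_message(instance: dict) -> dict:
    """Return a copy of the instance that is safe to include in messages."""
    names = sensitive_names(instance)
    return {
        key: (REDACTED if key in names else value)
        for key, value in instance.items()
        if key != "_metadata"
    }
```

The resource would keep using the original instance for its own work and pass only the redacted copy to logging or tracing code.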

## Pass sensitive property names as a canonical write-only property

<a id="proposal-pass-sensitive-property-names-as-a-canonical-write-only-property"></a>

This option builds on the proposal for the metadata but defines it as a canonical property instead.

In this model:

1. DSC checks whether the resource instance schema defines the `_propertiesWithSensitiveValues` canonical property. If the resource doesn't have this canonical property, DSC just invokes the resource as normal. Otherwise, it continues through the following steps.
1. DSC analyzes the resource instance for any properties where the value is a secure object or secure string.
1. If any properties contain secure values, DSC inserts an array containing the names of those properties into `_propertiesWithSensitiveValues` before invoking the resource.
1. The resource is responsible for checking the list of sensitive property names and _not_ emitting those values in any messaging.

We could define the canonical property JSON Schema like this:

```yaml
title: Properties with sensitive values canonical property
description: >-
  This property contains an array of property names for the resource instance where the property
  contained one or more secure strings or secure objects. Resources defining this canonical
  property are indicating that they adhere to the property contract and won't emit any values
  for the received property names in any messaging.

writeOnly: true
type: array
items:
  title: Name of property with sensitive values
  description: >-
    Each item in the array is the name of a resource instance property where the user passed a
    value for that property containing one or more secure strings or secure objects.
  type: string
```

For an example of how this would work in DSC, given the following configuration snippet:

```yaml
- type: Example/Resource
  name: Example with sensitive property values
  properties:
    token: "[secret('api_token')]"
    credential: "[parameters('service_credential')]"
    foo: "[secret('foo')]"
    baz: true
```

DSC would send the resource the following input JSON (formatted as YAML for readability):

```yaml
token: token_returned_from_secret_vault
credential:
  username: parameterized_username
  password: parameterized_password
foo: foo_returned_from_secret_vault
baz: true
_propertiesWithSensitiveValues:
  - token
  - credential
  - foo
```

The resource would be responsible for adhering to the contract of the canonical property and not
emitting the values of the `token`, `credential`, or `foo` properties in any messages.
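
The DSC side of those steps can be sketched in Python as follows. This assumes, purely for illustration, that resolved property values still carry `secureString` and `secureObject` wrappers internally and that DSC strips them before invoking the resource. The function names are hypothetical and this is not how DSC is actually implemented.

```python
from typing import Any

CANONICAL_NAME = "_propertiesWithSensitiveValues"


def _unwrap(value: Any) -> tuple[Any, bool]:
    """Recursively strip secure wrappers and report whether any were found."""
    if isinstance(value, dict):
        if len(value) == 1:
            for key, kind in (("secureString", str), ("secureObject", dict)):
                if isinstance(value.get(key), kind):
                    inner, _ = _unwrap(value[key])
                    return inner, True
        found = False
        result = {}
        for key, item in value.items():
            result[key], hit = _unwrap(item)
            found = found or hit
        return result, found
    if isinstance(value, list):
        pairs = [_unwrap(item) for item in value]
        return [inner for inner, _ in pairs], any(hit for _, hit in pairs)
    return value, False


def prepare_input(properties: dict, instance_schema: dict) -> dict:
    """Build the resource input, adding the canonical property when supported."""
    resolved = {}
    sensitive = []
    for name, value in properties.items():
        resolved[name], was_secure = _unwrap(value)
        if was_secure:
            sensitive.append(name)
    # Step 1: only resources that define the canonical property receive it.
    supported = CANONICAL_NAME in instance_schema.get("properties", {})
    if supported and sensitive:
        resolved[CANONICAL_NAME] = sensitive
    return resolved
```

A property is listed when a secure value appears anywhere inside it, which matches the "one or more secure strings or secure objects" wording in the schema description.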

## Additional considerations

For both the reusable subschemas and canonical properties, we could probably intelligently handle sending the correct data type to the resource and introduce warnings/errors when users try to send simple (non-secure) objects or strings to those properties. Alternatively, we could emit a warning but wrap the value for the resource anyway.
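
The warn-and-wrap alternative could be sketched like this in Python. The function name `ensure_secure_string` and the warning text are assumptions for illustration, and a real implementation would emit the warning through DSC's own messaging rather than Python's `warnings` module.

```python
import warnings


def ensure_secure_string(name: str, value):
    """Return the value in wrapped form, warning when a plain string was passed."""
    if isinstance(value, dict) and isinstance(value.get("secureString"), str):
        # Already wrapped: pass through unchanged.
        return value
    if isinstance(value, str):
        # The message names the property but never includes the value itself.
        warnings.warn(
            f"Property '{name}' expects a secure string but received a plain string.",
            stacklevel=2,
        )
        return {"secureString": value}
    raise TypeError(f"Property '{name}' must be a string or a secure string.")
```

The stricter alternative mentioned above would raise an error in the plain-string branch instead of wrapping the value.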

We should never leak secrets to messages or output data from DSC itself. We can't _prevent_ resources from leaking secrets.

## Summary

I think the best short-term option is to support the canonical property for indicating which properties contain sensitive values even when the resource doesn't consider them sensitive by design. This model would immediately enable resource authors to clearly indicate adherence to this contract.

In the medium term, I still see value in the `_token` and `_credential` canonical properties, but I don't think we need to wrap the data in secure strings or objects when sending to the resource. Instead, we could warn the _user_ when they're not passing secure data for those properties. I think we could probably benefit from defining the pattern properties, but those are lower priority (though would eventually simplify resource instance schema authoring).

We should ensure that we keep an awareness of the data sensitivity for resources referencing these subschemas or defined with these canonical properties so we don't leak those values.

