Description
Summary of the new feature / enhancement
As a resource developer,
I want to indicate in my resource instance schema which properties require secret strings or objects,
so that users know the property value must be protected.
As a user,
I want to be able to understand which properties require secrets,
so that I can effectively and safely author a configuration document or invoke the resource directly.
As a resource developer and integrating developer,
I want strong contracts for how secrets should be passed to resources,
so that I can ensure my own code is adhering to safe-and-secure-by-design principles.
Currently, secure strings and objects are passed to resources as string and object values, respectively. There's no way for a resource to know whether it has been passed a string that should be redacted, except where the resource itself understands a given property to be sensitive.
Further, there's no way to indicate in a resource instance JSON Schema whether the property requires or accepts sensitive values. Therefore, users and integrating tools have no way of understanding which resource properties are sensitive except by investigating the resource manually.
Instead, resources should be able to indicate support/requirement for secret values in their JSON Schema and rely on a strong contract for how those values are sent to the resource.
Proposed technical implementation details (optional)
I see five potential ways to address this problem. These options aren't mutually exclusive:
- Reusable schemas - Define `secureString` and `secureObject` schemas to reusably reference in instance schemas.
- Canonical properties - Define canonical properties for common input types, starting with `_token` and `_credential`.
- Extended vocabulary - Extend the JSON Schema vocabulary to support indicating sensitive properties.
- Pass sensitive property names as metadata - Add a standard way for DSC to indicate to a resource which properties were known to contain secure strings or secure objects.
- Pass sensitive property names as canonical write-only property - Add a canonical property like `_propertiesWithSensitiveValues` which resources can define in their instance schema to indicate support for redacting properties that are marked as sensitive by DSC even when the resource doesn't consider them sensitive by design.
Reusable schemas
Without extending the vocabulary of the JSON Schema or defining new canonical properties, we could define the following reusable subschemas:
$id: .../secureString.json
readOnly: true
type: object
additionalProperties: false
required: [secureString]
properties:
  secureString: { type: string }
---
$id: .../secureObject.json
readOnly: true
type: object
unevaluatedProperties: false
required: [secureObject]
properties:
  secureObject: { type: object, unevaluatedProperties: false }
These could be referenced in an instance JSON Schema:
type: object
properties:
  alwaysSecureString:
    $ref: .../secureString.json
  maybeSecureString:
    oneOf:
      - $ref: .../secureString.json
      - type: string
In this model, the resource expects to receive the value as an object wrapping the sensitive string or object, like
{
  "secureString": "actual secret"
}
or, for a secure object, the same wrapper shape under the `secureObject` key.
The main drawback to this model is the need for defining validation to apply to the string or object. Consider the example snippet:
type: object
properties:
  token:
    $ref: .../secureString.json
    properties:
      secureString:
        minLength: 8
  credential:
    $ref: .../secureObject.json
    properties:
      secureObject:
        properties:
          username:
            type: string
            minLength: 3
            maxLength: 30
            pattern: '^[^\/]+$'
          password:
            type: string
            minLength: 8
To accomplish schematizing the properties of a secure object or adding further validation to
the secure string, you need to extend the referenced schema. This isn't ergonomic or obvious,
but is equally supportable for properties that may be secret, like:
type: object
properties:
  token:
    oneOf:
      - $ref: .../secureString.json
        properties:
          secureString: { $ref: '#/$defs/tokenValidation' }
      - type: string
        $ref: '#/$defs/tokenValidation'
  credential:
    oneOf:
      - $ref: .../secureObject.json
        properties:
          secureObject:
            required: [username, password]
            properties:
              username:
                $ref: '#/$defs/username'
              password:
                type: string
                $ref: '#/$defs/passwordValidation'
      - type: object
        required: [username, password]
        properties:
          username:
            $ref: '#/$defs/username'
          password:
            $ref: .../secureString.json
            properties:
              secureString:
                $ref: '#/$defs/passwordValidation'
$defs:
  tokenValidation:
    minLength: 8
  username:
    type: string
    minLength: 3
    maxLength: 30
    pattern: '^[^\/]+$'
  passwordValidation:
    minLength: 8
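To make the "maybe secure" case concrete, here is a minimal Python sketch of how a resource might deserialize a property that accepts either a plain string or the `secureString` wrapper. The `Secret` class and `read_maybe_secure_string` function are hypothetical names used only for illustration; they aren't part of DSC or any RDK.

```python
class Secret:
    """Hypothetical holder for a sensitive string that hides it from repr() and str()."""

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        """Return the underlying secret; call only where the value is actually needed."""
        return self._value

    def __repr__(self) -> str:
        return "Secret(<redacted>)"


def read_maybe_secure_string(raw):
    """Accept a plain string or a {"secureString": "..."} wrapper object."""
    if isinstance(raw, str):
        return raw
    if (
        isinstance(raw, dict)
        and set(raw) == {"secureString"}
        and isinstance(raw["secureString"], str)
    ):
        return Secret(raw["secureString"])
    raise ValueError("expected a string or a secureString wrapper object")
```

Wrapping the secure variant in a distinct type means accidental logging of the value prints a placeholder instead of the secret.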
Canonical properties
It might be easiest and most ergonomic to support two new canonical properties:
- `_token` for secure strings used to send tokens to an API.
- `_credential` to pass a secure object to send an identity and secret to an API.
Consider the following definitions:
$id: .../token.json
readOnly: true
type: object
properties:
  secureString:
    type: string
    minLength: 1
---
$id: .../credential.json
readOnly: true
oneOf:
  - title: Secret Object Credential
    type: object
    required: [secureObject]
    additionalProperties: false
    properties:
      secureObject:
        type: object
        required: [username, password]
        additionalProperties: false
        properties:
          username: { $ref: '#/$defs/username' }
          password: { $ref: '#/$defs/password' }
  - title: Object with Secure String Credential
    type: object
    required: [username, password]
    additionalProperties: false
    properties:
      username: { $ref: '#/$defs/username' }
      password:
        type: object
        required: [secureString]
        properties:
          secureString: { $ref: '#/$defs/password' }
$defs:
  username:
    type: string
    minLength: 1
  password:
    type: string
    minLength: 1
Then, a resource could use these canonical properties:
title: my resource
type: object
properties:
  _token: { $ref: .../token.json }
  _credential: { $ref: .../credential.json }
And accept the following inputs:
{ "_token": { "secureString": "<token>" } }
{
  "_credential": {
    "secureObject": {
      "username": "<username>",
      "password": "<password>"
    }
  }
}
{
  "_credential": {
    "username": "<username>",
    "password": { "secureString": "<password>" }
  }
}
}
With a strong contract around the canonical resource properties, we could inform developers how they should implement their resources to deserialize this data (and to never emit it). When we implement development kits, we could provide idiomatic, ergonomic options for developers to incorporate tokens and credentials into their resource design.
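As a rough illustration of what that deserialization guidance could look like, the following Python sketch normalizes both accepted `_credential` shapes into a single `(username, password)` pair. The function name `read_credential` is hypothetical, and the sketch assumes the input has already been validated against the credential schema.

```python
from typing import Tuple


def read_credential(raw: dict) -> Tuple[str, str]:
    """Return (username, password) from either accepted `_credential` shape.

    Assumes `raw` already passed validation against the credential schema.
    """
    if "secureObject" in raw:
        # Shape 1: the whole credential is wrapped in a secure object.
        inner = raw["secureObject"]
        return inner["username"], inner["password"]
    # Shape 2: plain object whose password is a secure string wrapper.
    return raw["username"], raw["password"]["secureString"]
```

A resource using this helper never needs to branch on the input shape again, which keeps the handling of the secret in one place.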
Note that for resources that need to reuse these semantics for multiple properties in the same resource, they could just reference the canonical resource schema even with a different property name, like:
title: my resource
properties:
  serviceToken: { $ref: .../token.json }
  databaseToken: { $ref: .../token.json }
We don't currently have a model for canonical properties with variant names, but we could have a handler for names matching patterns:
patternProperties:
  '^_(token|\w+Token)$': { $ref: .../token.json }
  '^_(credential|\w+Credential)$': { $ref: .../credential.json }
properties:
  _serviceToken:
    title: Service token
    description: This token is used to authenticate to the foo service.
  _databaseCredential:
    title: Database credential
    description: This credential is used to authenticate to the local DB.
We can probably write some linting/recommendations to check for apparent definitions of token and credential properties and recommend using canonical properties / patterns instead. The pattern properties can also be defined as reusable references, e.g.
# compose from individual properties
allOf:
  - $ref: .../patternProperties/tokens.json
  - $ref: .../patternProperties/credentials.json
properties:
  _serviceToken:
    title: Service token
    description: This token is used to authenticate to the foo service.
  _databaseCredential:
    title: Database credential
    description: This credential is used to authenticate to the local DB.
---
# Apply all known pattern properties:
$ref: .../patternProperties.json
properties:
  _exist: {}
  _inDesiredState: {}
  _serviceToken: {}
  _credential: {}
The downside to the `patternProperties` approach/support is that to avoid accidentally indicating any property matching those patterns as valid, you need to use the `propertyNames` keyword to define the set of valid property names, like:
$ref: .../patternProperties.json
properties:
  _exist: {}
  _inDesiredState: {}
  _serviceToken: {}
  _credential: {}
  nonCanonicalProperty: {}
propertyNames:
  enum: [_exist, _inDesiredState, _serviceToken, _credential, nonCanonicalProperty]
That can be a little onerous, but we could again provide some authoring help to smooth that over in the editor and with RDKs.
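To show how the name patterns behave, and why an unlisted name can match by accident, here is a small Python sketch that classifies property names with the same two regular expressions. The `classify` function is hypothetical. JSON Schema recommends the ECMA-262 regex dialect, so the sketch uses `re.ASCII` to keep `\w` close to that behavior; treat it as an approximation rather than an exact equivalent.

```python
import re

# Same patterns as the patternProperties keys in the proposal.
TOKEN_NAME = re.compile(r"^_(token|\w+Token)$", re.ASCII)
CREDENTIAL_NAME = re.compile(r"^_(credential|\w+Credential)$", re.ASCII)


def classify(name: str) -> str:
    """Return which canonical pattern, if any, a property name matches."""
    if TOKEN_NAME.search(name):
        return "token"
    if CREDENTIAL_NAME.search(name):
        return "credential"
    return "other"


for name in ["_token", "_serviceToken", "_databaseCredential",
             "nonCanonicalProperty", "_notReallyAToken"]:
    print(name, "->", classify(name))
```

The last name illustrates the downside: any property whose name happens to fit the pattern is treated as a token unless `propertyNames` restricts the allowed set.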
Extended JSON Schema vocabulary
We could also define a new JSON Schema keyword that indicates whether a property needs to be handled securely, like `x-dsc-secure`, where defining it as `true` indicates different things depending on the subschema it's applied to:
- If applied to a string, the value needs to be sent as a secure string object wrapping the current subschema.
- If applied to an object, the value needs to be sent as a secure object wrapping the current subschema.
The main drawback to this is that we would need to implement a custom applicator keyword (not annotation, validation, or format), because it changes the validation behavior for the subschema rather than simply extending the validation for the currently described subschema (consider the difference between `x-dsc-secure` and `x-isAscii`, where the latter indicates that the string value must only contain ASCII characters).
I believe we can do this in the jsonschema crate, but that leaves the following problems:
- A resource implemented in an arbitrary language would need to have a JSON Schema library that supports custom keywords and reimplement the keyword behavior themselves. If there's an RDK for that language, it would need to provide the keyword and handling for the developer.
- We don't have a way to get VS Code (or other editors) to understand the custom keyword.
This is primarily a problem for validation and applicator keywords, but not annotation keywords (which can be ignored when they aren't understood). Ignoring the `x-dsc-secure` keyword would fundamentally break understanding and validation of data for any property using that keyword.
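One way to picture why `x-dsc-secure` behaves like an applicator is to treat it as a schema rewrite: the marked subschema is replaced by a wrapper schema that applies the original subschema to the wrapped value. The Python sketch below is only an illustration of that idea. The `expand_secure` function is hypothetical, and the exact wrapper shape (including `additionalProperties: false`) is an assumption borrowed from the reusable schemas described earlier, not a defined behavior.

```python
import copy


def expand_secure(subschema: dict) -> dict:
    """Rewrite a subschema marked `x-dsc-secure: true` into its wrapper form.

    Subschemas without the keyword are returned unchanged.
    """
    if subschema.get("x-dsc-secure") is not True:
        return subschema
    # Everything except the marker keyword applies to the wrapped value.
    inner = {k: copy.deepcopy(v) for k, v in subschema.items() if k != "x-dsc-secure"}
    kind = inner.get("type")
    if kind == "string":
        key = "secureString"
    elif kind == "object":
        key = "secureObject"
    else:
        raise ValueError("x-dsc-secure only applies to string or object subschemas")
    return {
        "type": "object",
        "required": [key],
        "additionalProperties": False,
        "properties": {key: inner},
    }
```

A validator or editor that ignores the keyword would validate against the unexpanded subschema and reject the wrapped value, which is the breakage described above.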
Pass sensitive property names as metadata
We could define a metadata field that resources can optionally support as write-only metadata input where the field tells the resource which property names should be handled as sensitive data, even for properties that aren't sensitive by design. In this model:
- DSC checks whether the resource instance schema supports indicating properties with sensitive data. If the resource doesn't support this behavior, DSC just invokes the resource as normal. Otherwise, it continues through the following steps.
- DSC analyzes the resource instance for any properties where the value is a secure object or secure string.
- If any properties contain secure values, DSC inserts the array containing the names of those properties into `_metadata` before invoking the resource.
- The resource is responsible for checking the list of sensitive property names and not emitting those values in any messaging.
The property could be defined with a JSON Schema such as:
title: Properties with sensitive values
description: >-
  This field contains an array of property names for the resource instance where the property
  contained one or more secure strings or secure objects. Resources accepting this metadata
  must not emit these values in any messages.
writeOnly: true
type: array
items:
  title: Name of property with sensitive values
  description: >-
    Each item in the array is the name of a resource instance property where the user passed a
    value for that property containing one or more secure strings or secure objects.
  type: string
This could be inserted at the top level of `_metadata` or nested under the `Microsoft.DSC` field. I propose naming this metadata field itself `propertiesWithSensitiveValues`. If the field is inserted at the top level, it should be prefixed with an underscore. If inserted within the `Microsoft.DSC` field, it doesn't need a prefix.
For example, given the following configuration snippet:
- type: Example/Resource
  name: Example with sensitive property values
  properties:
    token: "[secret('api_token')]"
    credential: "[params('service_credential')]"
    foo: "[secret('foo')]"
    baz: true
DSC would send either of the following snippets as the input JSON (formatted as YAML for readability):
- Top-level metadata:

  token: token_returned_from_secret_vault
  credential:
    username: parameterized_username
    password: parameterized_password
  foo: foo_returned_from_secret_vault
  baz: true
  _metadata:
    _propertiesWithSensitiveValues:
      - token
      - credential
      - foo

- Nested in `Microsoft.DSC` metadata:

  token: token_returned_from_secret_vault
  credential:
    username: parameterized_username
    password: parameterized_password
  foo: foo_returned_from_secret_vault
  baz: true
  _metadata:
    Microsoft.DSC:
      propertiesWithSensitiveValues:
        - token
        - credential
        - foo
The resource would then be responsible for adhering to the implied contract and not emitting
values for those named properties.
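The DSC-side steps for the metadata option could be sketched roughly as follows. This is a Python illustration, not DSC's implementation: the `build_resource_input` function, its parameters, and the idea that the engine already tracks a set of property names resolved from secure values are all assumptions made for the example.

```python
def build_resource_input(properties: dict, sensitive_names: set,
                         supports_metadata: bool = True,
                         nested: bool = False) -> dict:
    """Return the input for a resource, adding sensitive-name metadata if supported.

    `sensitive_names` is assumed to be tracked by the engine while it resolves
    secure strings and secure objects for the instance.
    """
    payload = dict(properties)
    if not supports_metadata:
        return payload  # Resource doesn't opt in: invoke as normal.
    # Keep the order in which the properties appear in the instance.
    names = [name for name in properties if name in sensitive_names]
    if not names:
        return payload  # Nothing sensitive, so no metadata is inserted.
    metadata = dict(payload.get("_metadata", {}))
    if nested:
        dsc = dict(metadata.get("Microsoft.DSC", {}))
        dsc["propertiesWithSensitiveValues"] = names
        metadata["Microsoft.DSC"] = dsc
    else:
        metadata["_propertiesWithSensitiveValues"] = names
    payload["_metadata"] = metadata
    return payload
```

Calling it with `nested=False` or `nested=True` produces the two payload layouts shown above.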
Pass sensitive property names as a canonical write-only property
This option builds on the proposal for the metadata but defines it as a canonical property instead.
In this model:
- DSC checks whether the resource instance schema defines the `_propertiesWithSensitiveValues` canonical property. If the resource doesn't have this canonical property, DSC just invokes the resource as normal. Otherwise, it continues through the following steps.
- DSC analyzes the resource instance for any properties where the value is a secure object or secure string.
- If any properties contain secure values, DSC inserts the array containing the names of those properties into `_propertiesWithSensitiveValues` before invoking the resource.
- The resource is responsible for checking the list of sensitive property names and not emitting those values in any messaging.
We could define the canonical property JSON Schema like this:
title: Properties with sensitive values canonical property
description: >-
  This property contains an array of property names for the resource instance where the property
  contained one or more secure strings or secure objects. Resources defining this canonical
  property are indicating that they adhere to the property contract and won't emit any values
  for the received property names in any messaging.
writeOnly: true
type: array
items:
  title: Name of property with sensitive values
  description: >-
    Each item in the array is the name of a resource instance property where the user passed a
    value for that property containing one or more secure strings or secure objects.
  type: string
For an example of how this would work in DSC, given the following configuration snippet:
- type: Example/Resource
  name: Example with sensitive property values
  properties:
    token: "[secret('api_token')]"
    credential: "[params('service_credential')]"
    foo: "[secret('foo')]"
    baz: true
DSC would send the resource the following input JSON (formatted as YAML for readability):
token: token_returned_from_secret_vault
credential:
  username: parameterized_username
  password: parameterized_password
foo: foo_returned_from_secret_vault
baz: true
_propertiesWithSensitiveValues:
  - token
  - credential
  - foo
The resource would be responsible for adhering to the contract of the canonical property and not emitting the values of the `token`, `credential`, or `foo` properties in any messages.
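On the resource side, honoring the contract could be as simple as redacting the named properties before anything is logged or returned. The Python sketch below assumes the input shape shown above; the `redact` function and the `<redacted>` placeholder are hypothetical choices for illustration.

```python
def redact(instance: dict) -> dict:
    """Return a copy of the instance that is safe to include in messages."""
    sensitive = set(instance.get("_propertiesWithSensitiveValues", []))
    safe = {}
    for name, value in instance.items():
        if name == "_propertiesWithSensitiveValues":
            continue  # Write-only property: never echo it back.
        safe[name] = "<redacted>" if name in sensitive else value
    return safe
```

A resource would call this once at its messaging boundary and keep using the original instance for the actual work.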
Additional considerations
For both the reusable subschemas and canonical properties, we could probably intelligently handle sending the correct data type to the resource and introduce warnings/errors when users try to send simple (non-secure) objects or strings to those properties. Alternatively, we could emit a warning but wrap the value for the resource anyway.
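The "warn but wrap anyway" alternative might look something like this Python sketch. The function name `wrap_for_secure_string` and the warning text are hypothetical, and the sketch only covers the secure string case.

```python
import warnings


def wrap_for_secure_string(name: str, value):
    """Ensure a value is sent in the secureString wrapper, warning if it was plain."""
    if isinstance(value, dict) and "secureString" in value:
        return value  # Already in the expected wrapper shape.
    # Never include the value itself in the warning text.
    warnings.warn(
        f"property '{name}' expects a secure string but received a plain value"
    )
    return {"secureString": value}
```

The warning names the property but deliberately leaves out the value, so the warning itself can't leak the secret.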
We should never leak secrets to messages or output data from DSC itself. We can't prevent resources from leaking secrets.
Summary
I think the best short-term option is to support the canonical property for indicating which properties contain sensitive values even when the resource didn't consider them sensitive by design. This model would immediately enable resource authors to clearly indicate adherence to this contract.
In the medium term, I still see value in the `_token` and `_credential` canonical properties, but I don't think we need to wrap the data in secure strings or objects when sending to the resource. Instead, we could warn the user when they're not passing secure data for those properties. I think we could probably benefit from defining the pattern properties, but those are lower priority (though would eventually simplify resource instance schema authoring).
We should ensure that we keep an awareness of the data sensitivity for resources referencing these subschemas or defined with these canonical properties so we don't leak those values.