[meta] HDP files strategies of integrity and authenticity (hash, digital signatures, ...) #17

Related:
- **_[meta] HDP Declarative Programming (working draft) #16_**


---

First things first: one primary goal of HDP files is to allow people to exchange both how datasets are referenced and how the data is allowed to be manipulated. As a consequence, this means auditability. Also, HDP files (at least the ones meant for end users) should remain usable when printed on paper (think of a judge attaching HDP instructions that, in the worst case, someone would have to type in again). **HDP files should be human readable.**

Note that the data itself can be (and by default is!) considered sensitive. But the ideal (that is, what is being optimized for) is that people can exchange HDP files without fear, even if the files leak or need to be audited. This means that even if we could make it easier to embed passwords or direct access to private resources in a file, **we're likely to make it intentionally hard**, so the average user simply won't know how to do it.

>> **File-based encryption of (typical) HDP files is not a goal, but integrity (and in some cases, authenticity) is required**

## 1. So what's the point of integrity checks?

One core feature of HDP is having a common vocabulary that allows HDP files to be translated between different human natural languages, and to do so in such a way that, whatever natural language the file was originally written in, it can ideally still be restored to its original form.

In other words: if an HDP file is translated on-the-fly whenever a user does not understand, say, Modern Standard Arabic, multiple teams could exchange files (maybe even working on the same filesystem!) even if most people don't speak the same language.

But then some points for improvement come up:
1. **How to check that an on-the-fly translation did not change the file?**
2. **What if tools make some easy-to-catch mistakes and the original file is no longer reversible on-the-fly?**
3. **What if the tools that generate the hash get upgraded and the new hashes do not match the old ones?** (Note that for this case, since HDP has much, MUCH more moving parts than a static file, users could upgrade old files or at least use an external, file-based tool to test integrity.)

Note: the HDP files themselves may intentionally need changes (once not just Latin is the reference language, but all other core languages are equally valid). So some way to check can help humans avoid out-of-sync states.
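To make questions 1 and 2 a bit more concrete, here is a minimal Python sketch of a round-trip check: translate the keys of a document into another language, translate them back, and compare with the original. The vocabulary, the key names, and the functions `translate_keys`, `invert` and `is_round_trip_safe` are all invented for illustration; this is not how any HDP tool actually does it.

```python
# Toy vocabulary, invented for illustration only; not real HDP keywords.
REFERENCE_TO_PT = {"fontem": "fonte", "objectivum": "objetivo"}


def translate_keys(document: dict, vocabulary: dict) -> dict:
    """Return a copy of `document` with top-level keys renamed via `vocabulary`."""
    return {vocabulary.get(key, key): value for key, value in document.items()}


def invert(vocabulary: dict) -> dict:
    """Build the reverse mapping, refusing vocabularies that are not one-to-one."""
    reverse = {target: source for source, target in vocabulary.items()}
    if len(reverse) != len(vocabulary):
        raise ValueError("vocabulary is not reversible: two terms share a translation")
    return reverse


def is_round_trip_safe(document: dict, vocabulary: dict) -> bool:
    """True if translating and then translating back reproduces the original."""
    translated = translate_keys(document, vocabulary)
    restored = translate_keys(translated, invert(vocabulary))
    return restored == document


original = {"fontem": "https://example.org/data.csv", "objectivum": "out.csv"}
print(is_round_trip_safe(original, REFERENCE_TO_PT))

# A document that already uses a translated term cannot be restored as it was.
print(is_round_trip_safe({"fonte": "https://example.org/data.csv"}, REFERENCE_TO_PT))
```

A check like this only says whether the translation is reversible. It says nothing about who changed the file, which is a separate concern.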

### 1.1 Some non-cryptographic hashing

Actually, to make it feasible to translate from and to other languages, we need some integrity check. This is why we need to get it working as soon as possible.

It's not rocket science. Even something MD5-like would do. This is meant to catch non-intentional errors.

We may actually use some weak hashing integrity check (and explicitly say that it is weak) so that users don't get a false sense of security.
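As a rough sketch of what "weak, and labeled as weak" could look like, the Python snippet below computes a CRC32 over a normalized copy of the text and puts the word `weak` in the result itself. The normalization rules (ignore line-ending style and trailing spaces), the `crc32-weak:` prefix, and the function names are assumptions made for this example, not something HDP defines.

```python
import zlib


def normalize(text: str) -> str:
    """Drop differences assumed not to matter: line endings and trailing spaces."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines).strip("\n") + "\n"


def weak_checksum(text: str) -> str:
    """CRC32 of the normalized text, labeled so nobody mistakes it for security."""
    crc = zlib.crc32(normalize(text).encode("utf-8"))
    return f"crc32-weak:{crc:08x}"


unix_copy = "key: value\nother: 1\n"
windows_copy = "key: value  \r\nother: 1\r\n"
typo_copy = "key: valve\nother: 1\n"

# Same content saved by different editors: checksums match.
print(weak_checksum(unix_copy) == weak_checksum(windows_copy))

# An accidental one-letter change: checksums differ.
print(weak_checksum(unix_copy) == weak_checksum(typo_copy))
```

CRC32 only catches accidental changes. Anyone who wants to tamper with a file can trivially recompute it, which is exactly why the label says `weak`.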

### 1.2 Authenticated signatures

Authenticated signatures, whether with a secret (think of a password-like string) or with public key authentication, are still worth having. Note that it is always possible to just do this over entire source files (without using any HDP internal hashing to selectively ignore parts that don't matter), but at some point we may also release some way to do authenticity/integrity checks that also consider the internals.

But the main point here is that if the default is not user friendly enough, or if it could actually make the user experience miserable (like having to keep track of several secrets just to check authenticity, which then encourages bad usage), we may enforce it for everyone.

Also, we're aware that the way the average user base is likely to share files is Google Drive, Dropbox, etc. (rather than Git, such as private repositories on GitHub/GitLab/Gitee). Even without considering "state-sponsored attacks", someone could simply steal a collaborator's access to that cloud storage. So it may actually be desirable to use such features if the files themselves are saved outside a secure network.
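For the "secret (password-like string)" variant applied to an entire file, a minimal Python sketch using only the standard library could look like this. It uses HMAC-SHA256 as one possible choice. The function names `sign` and `verify` and the placeholder secret are made up for the example, and nothing here is an actual HDP feature.

```python
import hashlib
import hmac


def sign(file_bytes: bytes, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 tag computed over the whole file."""
    return hmac.new(secret, file_bytes, hashlib.sha256).hexdigest()


def verify(file_bytes: bytes, secret: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(file_bytes, secret), tag)


# Placeholder secret, shared out of band and never stored inside the file.
secret = b"shared-secret-agreed-out-of-band"
content = b"key: value\n"
tag = sign(content, secret)

print(verify(content, secret, tag))                  # untouched file
print(verify(content + b"extra: 1\n", secret, tag))  # file was modified
print(verify(content, b"wrong-secret", tag))         # verifier has another secret
```

Unlike a plain checksum, someone who steals access to the cloud storage but not the secret cannot produce a valid tag for a modified file. The cost is the one mentioned above: every collaborator now has a secret to keep track of.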

## 2. Reflective quote "What's your threat model?" (Extra: memes added)

There are so many potential threat models that, at least in my personal opinion, we could either go for user simplicity (while still being operational) or go for full military-grade authenticity, like using FIPS-compliant GPG smartcards ready for use on air-gapped networks.

On image: meme about threat models

![Captura de tela de 2021-03-21 22-10-25](https://user-images.githubusercontent.com/812299/111928626-f5491d80-8a92-11eb-8e37-fd2b5eff1b02.png)




Note that I'm very aware that (especially for potential users who create HDP files or process HDP files from others) the ideal usage (think of an information manager working as a data hub for MANY other working groups) is the extreme of an air-gapped network. But our point here is that HDP files themselves shouldn't require the same level of sensitivity as the data itself. We may not be able to provide the most user-friendly implementations, but whoever processes the data or prepares HDP files to be exchanged should make sure that consumers have some friendly way to check authenticity.


On image: meme about how we should not use ways to check authenticity (which is different from encryption) that the average end user could use wrong.


![vault-no30-company-safe-acces-okay-so-the-password-is-2739121](https://user-images.githubusercontent.com/812299/111928718-2f1a2400-8a93-11eb-81ba-5a52a0b7c7cd.png)


## 3. Opinionated idea about not using security by obscurity or "strong algorithms" used wrong.

This is directed at people who would think that AES 256 is 2 times stronger than AES 128. This video (https://www.youtube.com/watch?v=ySQl0NhW1J0) is from 2009, but for those who understand English, it can give an idea of how just using strong algorithms can still go wrong.

I also really like the idea that we try to focus on acceptable security that is less likely to be used wrong. Note that a good part of HDP itself, by allowing multiple natural languages, meets criterion 2, "Speak the user’s language!":

> Source: https://www.usenix.org/sites/default/files/conference/protected-files/hotsec15_slides_green.pdf

![Captura de tela de 2021-03-21 22-25-58](https://user-images.githubusercontent.com/812299/111929404-0004b200-8a95-11eb-8f59-89914d87cf7f.png)

In other words, HDP itself, as one way to exchange what is meant to be done, is in general unlikely to implement features that are unsafe for the average user. When implementing ones that can go wrong is not avoidable, we still keep simplicity by default while allowing those who have advanced threat models to fit HDP into their current workflow.
