[meta] HDP Declarative Programming (working draft) #16

@fititnt

Trivia:

  • HDP naming:
    • HDP = 'HDP Declarative Programming' is the default name.
    • HDP = 'Humanitarian Declarative Programming' is an alternative name for when the intent is strictly humanitarian.
      • The definition of humanitarian is out of scope.

The triggering motivation

Context: HXL-Data-Science-file-formats aimed to use HXL as a file format for direct input into software for data mining, machine learning, and statistical analysis. Since HXL is a solution for fast sharing of humanitarian data, one problem becomes how to make authorization to access data and/or a minimal level of anonymization equally fast (ideally in real time).

  1. The initial motivation for HDP was to express "acceptable use policies" (AUP) on how data may be processed in a form understood both by humans and by machines. The humans in question range from judges able to enforce or authorize usage to local community representatives, who could write rules knowing that machines could enforce them.
    1. The typical usage scenario in this context already involves a severe lack of trust between different groups of humans.
      1. By no means do we assume that implemented systems will not make mistakes (especially because I'm from @EticaAI, and we mostly advocate for avoiding misuse of A/IS), so from the start we are planning ways to allow auditing without actually needing to access sensitive data.
      2. Auditing rules that could be reviewed even by people outside the country of origin, or without knowledge of the language, would make things easier,
        1. even if this means a human who knows the native language but not programming has to create a quick file saying what a term means
        2. ... and even if it means planning ahead for a way to digitally sign such translation tables
  2. It is possible that the typical usage of HDP (if it actually goes beyond proofs of concept or internal usage) will be both to reference datasets (even if they are not ready to use but could be built on request) and to give instructions to process them.
    1. To be practical, a syntax for abstract "acceptable use policies" (AUP) without some implementation would not be usable, so an implementation is actually a requirement.
    2. At this moment (2021-03-16) it is not clear what should be highly optimized for end users of HDP and what could be available in just a few languages (like English and Portuguese), but the idea of already trying to allow different natural languages to express references to datasets works as a benchmark.

Some drafted goals/restrictions (as of 2021-03-16):

  1. Both the documentation on how to write HDP and the proofs of concept that implement it are dedicated to the public domain. BSD-0 can be used as an alternative.

    1. No licenses or pre-authorizations to use are necessary.
  2. Be in the language of the user who creates the file. This means that the underlying tool should allow exchanging HDP files (which in practice describe how to find datasets or manipulate them) written, for example, in Portuguese or Russian, and others who do not understand those languages could still (with the help of HDP) convert the key terms.

    1. v0.7.5 already drafted this, but versions later than v0.8.0 should improve the proofs of concept.
      1. At the moment the core_vocab already has the 6 UN official languages and, because this project was born via @HXL-CPLP, the Portuguese language.
      2. Note that special care is taken with HDP keywords and instructions likely to be used by people who, de facto, need to formally approve how data can be used. Since the data columns are often in the native language, one or two humans with both technical skills and a way to understand the native language may need to create a filter, label it with the tasks that accomplish what those users want, and then digitally sign it.
    2. The inner steps of commands delegated to underlying tools (the wonderful HXL Python library is a great example!) are not aimed at average end users. So, for the purpose of this goal, and to make localization easier (since the number of tools to abstract could increase over time), we strictly do not provide translations for them.
      1. Code editors may already show usage tips in English for the underlying tools; at least for some languages (like Portuguese), we at HXL-CPLP may translate the help messages. Something similar could be done in other languages with volunteers.
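
To make the key-term conversion in goal 2 concrete, here is a minimal sketch, assuming an HDP file can be loaded as nested dictionaries and lists. The vocabulary entries, the language codes, and the function names are all hypothetical and are not taken from the real core_vocab. Only keys are converted; values (including any commands delegated to underlying tools) are left untouched.

```python
from typing import Any, Dict

# Hypothetical vocabulary: internal term -> localized key per language.
# Illustrative entries only; this is NOT the real core_vocab.
CORE_VOCAB: Dict[str, Dict[str, str]] = {
    "fontem": {"eng": "source", "por": "fonte"},
    "datum": {"eng": "data", "por": "dados"},
    "descriptionem": {"eng": "description", "por": "descrição"},
}


def build_lookup(source_lang: str, target_lang: str) -> Dict[str, str]:
    """Map each key term in source_lang to its equivalent in target_lang."""
    return {
        terms[source_lang]: terms[target_lang]
        for terms in CORE_VOCAB.values()
        if source_lang in terms and target_lang in terms
    }


def translate_keys(node: Any, lookup: Dict[str, str]) -> Any:
    """Recursively rename known key terms; unknown keys and all values stay as-is."""
    if isinstance(node, dict):
        return {lookup.get(k, k): translate_keys(v, lookup) for k, v in node.items()}
    if isinstance(node, list):
        return [translate_keys(item, lookup) for item in node]
    return node


if __name__ == "__main__":
    document = {"fonte": {"descrição": "Exemplo"}, "dados": [{"fonte": "x"}]}
    print(translate_keys(document, build_lookup("por", "eng")))
```

Because free-text values are never rewritten, a reader in another language can follow the structure of the file while the original wording stays available for a human translator.
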
  3. The syntax of HDP must be planned so that it is intentionally hard for the average user to save information that would make the file itself a secret (like passwords or direct URLs to private resources).

    1. Users working on human rights, or collaborating in the middle of urgent disasters, may share their HDP files through consumer cloud file sharing. Since the HDP files themselves may be shared across several small groups (so that users know a dataset exists without having requested it), in the worst case only the data of the one affected group is exposed, and the potential damage is mitigated by default.
    2. In general the idea is to allow some level of indirection for things that need to be kept private while still maintaining usability.
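
One way a tool could support goal 3 is a lint step that warns when a file appears to embed a secret. The sketch below assumes an HDP file loaded as nested dictionaries and lists; the list of suspicious key names, the `local-alias:` reference style, and the function name are hypothetical illustrations and not part of any HDP specification.

```python
from typing import Any, List
from urllib.parse import urlsplit

# Hypothetical key names that suggest a secret was written into the file.
SUSPECT_KEYS = {"password", "passwd", "secret", "token", "api_key"}


def find_secret_risks(node: Any, path: str = "") -> List[str]:
    """Return human-readable warnings for values that look like secrets."""
    risks: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if str(key).lower() in SUSPECT_KEYS:
                risks.append(f"{child}: key name suggests a stored secret")
            risks.extend(find_secret_risks(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            risks.extend(find_secret_risks(item, f"{path}/{index}"))
    elif isinstance(node, str):
        try:
            parts = urlsplit(node)
        except ValueError:
            return risks  # not parseable as a URL, so nothing to flag here
        if parts.scheme in ("http", "https", "ftp") and parts.netloc:
            risks.append(f"{path}: direct URL; consider an indirect reference")
    return risks


if __name__ == "__main__":
    document = {
        "source": "https://example.org/private.csv",
        "auth": {"password": "x"},
        "ref": "local-alias:team-a/survey",  # hypothetical indirection style
    }
    for warning in find_secret_risks(document):
        print(warning)
```

A check like this only catches obvious cases; the stronger protection described above comes from a syntax that offers indirection so users rarely need to write such values at all.
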
  4. Be offline by default and, when applicable, be air-gapped network friendly

    1. In some cases people may want to use HDP to manage files on a local network because they need to work offline (or the files are too big), while sharing the same HDP files with others who could still use an online version.
    2. Since a collection of HDP files could allow really big projects, sooner or later those who need access to data from several other groups may fear that such a level of abstraction could lead to them being targeted. While this problem is not unique to HDP, and most documentation is likely to focus on helping those who consume sensitive data in the last mile, at least allowing use on air-gapped networks seems reasonable.
    3. Please note that each case is different. "Being offline by default" doesn't mean that all resources must be downloaded (in fact, this would go against the interest of those who would like to share data with others, even trusted ones), but that command line tools or projects that reference resources to be loaded from outside the network need at least some explicit authorization.
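
The "explicit authorization" rule in goal 4 could look roughly like the sketch below: local references resolve freely, while remote ones are refused unless the caller opts in. The function names, the exception, and the `allow_network` flag are assumptions made for illustration; nothing here performs an actual download.

```python
from pathlib import Path
from typing import Tuple
from urllib.parse import urlsplit


class NetworkNotAuthorized(Exception):
    """Raised when a remote reference is used without explicit opt-in."""


def is_remote(reference: str) -> bool:
    """Treat http, https, and ftp references as leaving the local network."""
    return urlsplit(reference).scheme in ("http", "https", "ftp")


def resolve(reference: str, allow_network: bool = False) -> Tuple[str, str]:
    """Classify a reference, refusing remote ones unless explicitly allowed."""
    if is_remote(reference):
        if not allow_network:
            raise NetworkNotAuthorized(
                f"{reference!r} needs network access; pass allow_network=True"
            )
        return ("remote", reference)
    return ("local", str(Path(reference)))


if __name__ == "__main__":
    print(resolve("data/survey.csv"))
    try:
        resolve("https://example.org/survey.csv")
    except NetworkNotAuthorized as error:
        print("refused:", error)
```

Keeping the default at `allow_network=False` means the same HDP file behaves identically on an air-gapped machine and on a connected one until someone deliberately opts in.
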
  5. "Batteries included", i.e. try to offer from the start tools that do syntax checks of HDP files.
    1. If you use a code editor that supports JSON Schema, v0.7.5 already has an early version that warns about misuse. At the moment it still requires writing with the internal terms (Latin). But if the schema eventually becomes generated from the internal core_vocab, other languages would have such support too.
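
As a rough illustration of generating a schema from a vocabulary, the sketch below builds a small JSON Schema (draft-07) per language. The vocabulary entries, the language codes, and the assumption that an HDP file is a flat object are all hypothetical; the real schema and core_vocab may be structured quite differently.

```python
import json
from typing import Dict

# Hypothetical vocabulary: internal term -> localized key per language.
# Illustrative entries only; this is NOT the real core_vocab.
VOCAB: Dict[str, Dict[str, str]] = {
    "fontem": {"eng": "source", "por": "fonte"},
    "datum": {"eng": "data", "por": "dados"},
}


def build_schema(lang: str) -> dict:
    """Build a JSON Schema whose allowed keys are the terms localized to lang.

    Terms with no translation for lang fall back to the internal term.
    """
    properties = {}
    for internal_term, translations in VOCAB.items():
        key = translations.get(lang, internal_term)
        properties[key] = {"description": f"Localized form of '{internal_term}'"}
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        # Unknown keys are rejected so that editors can flag typos.
        "additionalProperties": False,
    }


if __name__ == "__main__":
    print(json.dumps(build_schema("por"), indent=2, ensure_ascii=False))
```

An editor with JSON Schema support could then load the generated file for the user's language and flag unknown keys while typing.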

  6. The average HDP file should be optimized for being printed on paper as is, and should have ways to express complex but common items of acceptable use policy as some sort of constant (as of 2021-03-16, not sure if this is the best approach). This means the ideal maximum characters per line and typical indentation level should be carefully planned ahead. (This hint is based on suggestions we heard.)

    1. Yes, we're in 2021, but the friendlier HDP files (especially the ones about authorization) are to being saved as PDF or even printed on paper, the better. Even in places that allow attaching digital archives, while the authorization can be public, the attachments may require extra authorization (like being a lawyer or at least requesting the files in person).
    2. Ideally the end result could be concise enough to discourage large amounts of text in the files themselves (even if it means we develop something like "custom constants" that are part of the HDP specification itself, like a tag that means 'authorized with full non-anonymised access, strictly for use by Red Cross/MSF' or 'destroy any copy once no longer necessary').
      1. Such constants can both help to make rules concise (so in the worst case, if people have to retype letter by letter something that is little more than a customization of well-known HDP example/template files, it is possible) and also allow automatic translation.
  7. Other ideas exist, but as much as possible, both through the syntax of HDP files (for which it may be easier to have translations only of the core key terms) and, if necessary, through the creation of constants to abstract concepts, the exact file (either digitally signed or accompanied by a literal PDF of a judge's authorization, so the "authorization" could be a link to such a file) should be understandable even outside the original country.

    1. Whether because custom filters need to be created, because the language used is a totally new one, or because the original source wrote a term wrong, the idea here is to allow a human who accepts and digitally signs an extra HDP file to take full responsibility for mistakes.
    2. Again, the idea that average HDP files do not need to point to resources or reference passwords is also ideal when the process is done on paper, and the decisions (and who created the underlying rules, if they are something more specific) can be audited.
      1. Note that some types of auditing could be a human reading the new rule or, once filters start to show common patterns, testing the filters someone else creates against example datasets. While not as ideal as human review, as long as some example datasets for that language already exist (think, for example, of one that simulates malformed spreadsheets containing personal information), they could be run against the proposed rule to help its initial author. (These extra validations don't need to be public.)
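
The automated part of that auditing idea can be sketched as follows: run a proposed filter against a synthetic example dataset seeded with known personal information and report anything that survives. The column names merely imitate HXL-style hashtags, and the dataset, the filter, and the function names are hypothetical illustrations, not real data or a real HDP filter.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, str]

# Synthetic values planted in the example dataset; none of this is real data.
SENSITIVE_MARKERS = ("maria@example.org", "+55 11 99999-0000")

# Simulates a malformed spreadsheet: personal data also hides in a notes column.
EXAMPLE_ROWS: List[Row] = [
    {"#adm1+name": "Coast", "#affected": "120", "#contact+email": "maria@example.org"},
    {"#adm1+name": "Inland", "#affected": "45", "notes": "call +55 11 99999-0000"},
]


def proposed_filter(rows: List[Row]) -> List[Row]:
    """Hypothetical filter under review: keep only two non-personal columns."""
    keep = ("#adm1+name", "#affected")
    return [{k: row[k] for k in keep if k in row} for row in rows]


def audit_filter(
    filter_fn: Callable[[List[Row]], List[Row]],
    rows: List[Row],
    markers: Sequence[str],
) -> List[Tuple[int, str, str]]:
    """Return (row index, column, marker) for every planted value that leaks."""
    leaks = []
    for index, row in enumerate(filter_fn(rows)):
        for column, value in row.items():
            for marker in markers:
                if marker in str(value):
                    leaks.append((index, column, marker))
    return leaks


if __name__ == "__main__":
    print(audit_filter(proposed_filter, EXAMPLE_ROWS, SENSITIVE_MARKERS))
    print(audit_filter(lambda rows: rows, EXAMPLE_ROWS, SENSITIVE_MARKERS))
```

An empty result only shows that the planted values were removed; it complements human review of the rule rather than replacing it.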
