Formalize sqibble #83

@DominikRafacz

Description

sqibble is an idea that has not been formalized yet; formalizing it could benefit the package in numerous ways.

We can define a sqibble as a tibble containing at least one column of type sq. Exactly one of the columns of type sq has the special role of being the "sequence" column. A sqibble also has an attribute column_roles, which is a named character vector with at least one element: an element named sequence whose value is the name of the "sequence" column (which usually is also "sequence").
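The definition above could be sketched as a small constructor that validates a tibble and attaches the attribute. This is only an illustration under assumptions: new_sqibble is a hypothetical name, the checks are one possible reading of the definition, and the example columns (id, sequence) are made up.

```r
library(tibble)
library(tidysq)

# Hypothetical constructor (not part of tidysq): validates the tibble
# against the definition and attaches the column_roles attribute.
new_sqibble <- function(x, column_roles = c(sequence = "sequence")) {
  stopifnot(
    is_tibble(x),
    is.character(column_roles),
    # the "sequence" role is mandatory
    "sequence" %in% names(column_roles),
    # every role must point at an existing column
    all(column_roles %in% colnames(x)),
    # the "sequence" column must be of type sq
    inherits(x[[column_roles[["sequence"]]]], "sq")
  )
  attr(x, "column_roles") <- column_roles
  x
}

some_sqibble <- new_sqibble(
  tibble(
    id = c("seq1", "seq2"),
    sequence = sq(c("ACGT", "TTGA"), alphabet = "dna_bsc")
  ),
  column_roles = c(sequence = "sequence", name = "id")
)

attr(some_sqibble, "column_roles")
```

Keeping the roles in an attribute rather than in the column names means a column can be renamed without losing its role, as long as the attribute is updated alongside it.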

Other columns in the sqibble can also have roles specified. In this case, the mapping between a column's role (the role name is determined by the functions that use and generate the column) and its actual name (which can change) is stored in the column_roles attribute. Another role likely to be used frequently is "name", the column that holds the name of each sequence.

By specifying roles in this way, we will be able to create a function (working title: extract_role_column) that extracts the column with the required role from a sqibble. If no such column is available, the function will either return a column of NA together with a warning or raise an error; the user will be able to specify the security level (as with other functions).
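A minimal sketch of how extract_role_column might behave, assuming the column_roles attribute described earlier. The on_missing parameter and its values are assumptions standing in for whatever mechanism the package would actually use to choose between a warning and an error; the example data is made up.

```r
library(tibble)
library(tidysq)

# Sketch only: `on_missing` is a hypothetical parameter name.
extract_role_column <- function(x, role, on_missing = c("warning", "error")) {
  on_missing <- match.arg(on_missing)
  roles <- attr(x, "column_roles")
  if (!is.null(roles) && role %in% names(roles) &&
      roles[[role]] %in% colnames(x)) {
    return(x[[roles[[role]]]])
  }
  msg <- sprintf("no column with role '%s'", role)
  if (on_missing == "error") stop(msg, call. = FALSE)
  warning(msg, call. = FALSE)
  # fall back to a column of NA with one entry per row
  rep(NA, nrow(x))
}

some_sqibble <- tibble(
  id = c("seq1", "seq2"),
  dna = sq(c("ACGT", "TTGA"), alphabet = "dna_bsc")
)
attr(some_sqibble, "column_roles") <- c(sequence = "dna", name = "id")

seq_names <- extract_role_column(some_sqibble, "name")
missing_col <- suppressWarnings(extract_role_column(some_sqibble, "organism"))
```

Here the caller never needs to know that the "name" column is actually called id; only the role matters.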

Why do we need such a formalization? It will allow us to write functions that operate on such objects instead of functions that take several vectors, one of which is a sequence vector. An example of such a function is the current write_fasta, which takes two vectors: x and name. With a formalization like the one described above, the function will instead be able to take a single parameter, a sqibble. The requirement will be for the sqibble to have columns with the roles "sequence" (which, recall, is a general requirement on sqibble) and "name". A call to

write_fasta(some_sqibble)

will then be equivalent to a call to

write_fasta(x = some_sqibble %>% extract_role_column("sequence"), name = some_sqibble %>% extract_role_column("name"))

which currently, if we are using unformalized sqibbles, looks like this:

write_fasta(x = some_sqibble %>% pull("whatever-name-sequence-column-has-i-have-no-freaking-idea"), name = some_sqibble %>% pull("whatever-name-name-has"))

Formalizing sqibble in this way could make the package easier to use and more convenient for potential developers to extend.
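The single-argument write_fasta call discussed above could be approximated today with a thin wrapper. This is a sketch under assumptions: write_fasta_sqibble is a hypothetical name (not part of tidysq), the role lookup reads the column_roles attribute directly, and the example data and temporary file are made up.

```r
library(tibble)
library(tidysq)

# Hypothetical wrapper: resolves the "sequence" and "name" roles and
# forwards the matching columns to the existing write_fasta().
write_fasta_sqibble <- function(x, file, ...) {
  roles <- attr(x, "column_roles")
  stopifnot(all(c("sequence", "name") %in% names(roles)))
  write_fasta(
    x = x[[roles[["sequence"]]]],
    name = x[[roles[["name"]]]],
    file = file,
    ...
  )
}

some_sqibble <- tibble(
  id = c("seq1", "seq2"),
  dna = sq(c("ACGT", "TTGA"), alphabet = "dna_bsc")
)
attr(some_sqibble, "column_roles") <- c(sequence = "dna", name = "id")

fasta_path <- tempfile(fileext = ".fasta")
write_fasta_sqibble(some_sqibble, fasta_path)
fasta_lines <- readLines(fasta_path)
```

The columns are deliberately named id and dna rather than name and sequence to show that the wrapper depends only on roles, not on column names.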

Labels

discussion: Development direction idea to quarrel over
enhancement: We don't do that here... yet
refactor: Because we have too much free time
