Formalize sqibble #83

Open
DominikRafacz opened this issue Feb 28, 2021 · 2 comments
Labels
discussion Development direction idea to quarrel over enhancement We don't do that here... yet refactor Because we have too much free time

Comments

@DominikRafacz
Collaborator

sqibble is a non-formalized idea, and the package could benefit in numerous ways from formalizing it.

We can define a sqibble as a tibble containing at least one column of type sq. Additionally, exactly one of the columns of type sq has the special role of being the "sequence" column. A sqibble also has an attribute column_roles, which is a named character vector with at least one element. This element has the name sequence and a value equal to the name of the "sequence" column (which is usually "sequence").

Other columns in the sqibble can also have roles specified. In this case, the mapping between a column's role (the role name is determined by the functions that use and generate the column) and its actual name (which can potentially change) is done using the column_roles attribute. Another frequently used role will likely be "name", for a column that holds the name of each sequence.
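
To make the definition above concrete, here is a minimal sketch of attaching and validating the column_roles attribute. It is only an illustration under stated assumptions: new_sqibble is a hypothetical name that does not exist in the package, and a plain data.frame with a character column stands in for a tibble with an sq column so that the snippet runs in base R.

```r
# Hypothetical constructor (not part of the package): attaches column_roles
# after checking the rules from the definition above.
# Assumption: a data.frame with a character column stands in for a tibble
# with a column of type sq, so the "is of type sq" check is omitted here.
new_sqibble <- function(data, column_roles = c(sequence = "sequence")) {
  stopifnot(
    is.data.frame(data),
    is.character(column_roles),
    !is.null(names(column_roles)),
    # the "sequence" role is mandatory
    "sequence" %in% names(column_roles),
    # every role must point at an existing column
    all(column_roles %in% colnames(data))
  )
  attr(data, "column_roles") <- column_roles
  data
}

sqb <- new_sqibble(
  data.frame(id = c("seq1", "seq2"), sq = c("ACGT", "TTGA"),
             stringsAsFactors = FALSE),
  column_roles = c(sequence = "sq", name = "id")
)
attr(sqb, "column_roles")
```

Here the role names ("sequence", "name") stay fixed, while the actual column names ("sq", "id") are free to differ.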

By specifying roles in this way, we will be able to create a function (working title: extract_role_column) that extracts the column with the required role from a sqibble. If no such column is available, the function will either return a column of NA with a warning or raise an error; the user will be able to specify the security level (as with other functions).
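
A rough sketch of how such a function might behave is given below. Only the working title extract_role_column comes from this proposal; the on_missing parameter and its values are assumptions made for illustration, and a plain data.frame again stands in for a real sqibble.

```r
# Sketch only: on_missing is a hypothetical parameter standing in for the
# user-selectable level mentioned above.
extract_role_column <- function(x, role, on_missing = c("error", "warning")) {
  on_missing <- match.arg(on_missing)
  roles <- attr(x, "column_roles")
  if (!is.null(roles) && role %in% names(roles) &&
      roles[[role]] %in% colnames(x)) {
    # role is mapped to an existing column: return that column
    return(x[[roles[[role]]]])
  }
  msg <- sprintf("no column with role '%s'", role)
  if (on_missing == "error") stop(msg, call. = FALSE)
  warning(msg, call. = FALSE)
  # fall back to a column of NA with one entry per row
  rep(NA, nrow(x))
}

# Stand-in data: a data.frame instead of a tibble with an sq column
sqb <- data.frame(id = c("seq1", "seq2"), sq = c("ACGT", "TTGA"),
                  stringsAsFactors = FALSE)
attr(sqb, "column_roles") <- c(sequence = "sq", name = "id")

extract_role_column(sqb, "sequence")
```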

Why do we need such a formalization? It will allow us to write functions that operate on such objects instead of functions that take several vectors, one of which is a sequence vector. An example of such a function is the current write_fasta, which takes two vectors: x and name. With a formalization like the one described above, the function could instead take a single parameter, a sqibble. The requirement will be for the sqibble to have columns with the roles "sequence" (which, recall, is a general requirement on a sqibble) and "name". A call to

write_fasta(some_sqibble)

will then be equivalent to a call to

write_fasta(x = some_sqibble %>% extract_role_column("sequence"), name = some_sqibble %>% extract_role_column("name"))

which currently, with non-formalized sqibbles, looks like this:

write_fasta(x = some_sqibble %>% pull("whatever-name-sequence-column-has-i-have-no-freaking-idea"), name = some_sqibble %>% pull("whatever-name-name-has"))

This would make the package easier to use and would also be more convenient for potential developers.
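
As a rough illustration of the single-argument call described above, the sketch below resolves both roles from column_roles and then writes FASTA-style lines. The name write_fasta_sqibble is hypothetical, the writeLines call is only a stand-in for the existing two-vector write_fasta(x = ..., name = ...), and a plain data.frame stands in for a real sqibble.

```r
# Hypothetical wrapper: look up the "sequence" and "name" columns by role,
# then pass the two vectors on to a writer.
write_fasta_sqibble <- function(x, file) {
  roles <- attr(x, "column_roles")
  # both roles are required for writing FASTA
  stopifnot(all(c("sequence", "name") %in% names(roles)))
  sequences <- x[[roles[["sequence"]]]]
  seq_names <- x[[roles[["name"]]]]
  # Stand-in for the two-vector write_fasta call: interleave header lines
  # (">name") with sequence lines
  lines <- as.vector(rbind(paste0(">", seq_names), as.character(sequences)))
  writeLines(lines, file)
  invisible(x)
}

sqb <- data.frame(id = c("seq1", "seq2"), sq = c("ACGT", "TTGA"),
                  stringsAsFactors = FALSE)
attr(sqb, "column_roles") <- c(sequence = "sq", name = "id")

out <- tempfile(fileext = ".fasta")
write_fasta_sqibble(sqb, out)
readLines(out)
```

The caller never needs to know that the sequence column happens to be called "sq" here; only the role mapping does.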

@DominikRafacz
Collaborator Author

By the way, read_fasta currently returns a sqibble whose sequence column is named sq, which can sometimes be problematic or confusing. Perhaps it would be better to use the name "sequence"?

@ErdaradunGaztea ErdaradunGaztea added this to To discuss in Abstract ideas via automation Feb 28, 2021
@ErdaradunGaztea ErdaradunGaztea added discussion Development direction idea to quarrel over enhancement We don't do that here... yet refactor Because we have too much free time labels Feb 28, 2021
@michbur michbur self-assigned this Mar 8, 2021
@michbur
Member

michbur commented Mar 8, 2021

> By the way -- read_fasta currently returns a sqibble with the name sq for the sequence column, which can sometimes be problematic or confusing, perhaps it would be better to use the name "sequence"?

We should use "name" or "id" for the name column and "sequence" for the sequence column. Thoughts on the naming convention, @leonjessen?

3 participants