read-seqs input parameter improvement #36

Open
penuts7644 opened this issue Feb 4, 2019 · 2 comments

@penuts7644
Contributor

Hi Quentin,

According to the documentation for the -read-seqs parameter, the input CSV file should be formatted with the sequence index as the first column and the sequence as the second, separated by a semicolon ';'.

I would expect to be able to pass in a CSV file with multiple semicolon-separated columns and have IGoR use only the first two. What actually happens is that each line is split only at the first semicolon found in that line, so the second column is merged with all remaining columns.

Example:

A line such as index;sequence;other_data is read as: index as the first column and sequence;other_data as the second column.
I would expect the following instead: index as the first column and sequence as the second column.
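
The difference between the two behaviours can be sketched in a few lines of Python. This only illustrates the splitting logic described above; it is not IGoR's actual parsing code, and both function names are hypothetical.

```python
def split_on_first_semicolon(line):
    """Mimic the reported behaviour: split once, at the first ';' only."""
    index, rest = line.rstrip("\n").split(";", 1)
    return index, rest


def split_first_two_columns(line):
    """Expected behaviour: split on every ';' and keep the first two fields."""
    fields = line.rstrip("\n").split(";")
    return fields[0], fields[1]


line = "0;ACGTACGT;other_data"
print(split_on_first_semicolon(line))  # ('0', 'ACGTACGT;other_data')
print(split_first_two_columns(line))   # ('0', 'ACGTACGT')
```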

Is there a reason for this behaviour?

Cheers, Wout

@qmarcou
Owner

qmarcou commented Mar 5, 2019

Hi @penuts7644,
Nope, there is no good reason other than this: by assuming the CSV has only two columns, the user cannot make mistakes in the column ordering.
I agree this is not very handy, and I will try to make a change for a slightly more flexible format.
Best
Quentin
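
In the meantime, one possible workaround is to trim the file to its first two columns before passing it to -read-seqs. The following is a minimal sketch using Python's standard csv module; the function name and file paths are hypothetical, and it assumes the index and sequence are already the first two fields of every row.

```python
import csv


def keep_first_two_columns(src_path, dst_path, delimiter=";"):
    """Copy src_path to dst_path, keeping only the first two fields of each row.

    Returns the number of rows written. Rows with fewer than two fields
    (for example blank lines) are skipped.
    """
    rows_written = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        for row in reader:
            if len(row) < 2:
                continue  # blank or malformed line
            writer.writerow(row[:2])
            rows_written += 1
    return rows_written
```

Calling keep_first_two_columns("all_columns.csv", "two_columns.csv") would then produce a file in which every line has the form index;sequence.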

@decenwang

Hi All, @qmarcou @penuts7644

  1. Another question. When I input the sequences in FASTA format with the 'igor -read_seqs' command line, I did not assign an index for the sample. In the /tmp file, I found that each sequence had automatically been given a number followed by a semicolon, e.g. 0; 1; 2; ... By definition these numbers are the sequence indices, not the DNA index/barcode. Since all the sequences come from the same sample, do I really need to assign an index for each sample (that is, for all the sequences of each sample)? In any case, I hope IGoR could recognize the index by itself if I could supply an index file before inputting the sequences, with either a single index or dual indices.
  2. If we use paired-end (PE) sequencing, fastq-dump splits the reads and Trimmomatic trims them, so we end up with separate read 1 and read 2 files for each sample. Should both read 1 and read 2 of a sample be analyzed, or should I analyze only read 1 or only read 2?
  3. Could you please add a plugin or functionality to translate DNA into peptide? TCR chains are special: they need the help of MHC I/II, namely the anchor residues. Special amino acids (e.g. Arg, Glu) may differ in number among cohorts.
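
For point 3, translation can also be done outside IGoR. Below is a minimal sketch in Python, assuming the standard genetic code and a reading frame given by the caller; the function name is hypothetical and this is not an IGoR feature.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string into a peptide string.

    Stop codons become '*', codons with ambiguous bases become 'X',
    and trailing bases that do not fill a codon are ignored.
    """
    dna = dna.upper()
    peptide = []
    for i in range(frame, len(dna) - 2, 3):
        peptide.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(peptide)


print(translate("TGTGCCAGCAGC"))  # CASS
```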

Thanks a million!

Repository owner deleted a comment from decenwang Apr 9, 2019