Merging qname from BAM file to readId.x #435

Closed
hannalee809 opened this issue Jun 27, 2024 · 4 comments

Comments

@hannalee809

Hi!

I ran bambu on my three replicates together (se.multiSample) and am having trouble merging the read IDs from the BAM files with the se.multiSample output. I have been trying to merge on "readId.x" from metadata(se.multiSample)$readToTranscriptMaps[[1]] and "qname" from the BAM file. Any suggestions or thoughts would be really helpful, thank you!

@andredsim
Collaborator

Hi there,

Could you paste the head() of the two tables you are trying to merge, so that I can get a better idea of the problem?

Kind Regards,
Andre Sim

@hannalee809
Author

Hi,

Thank you for your response! The two tables I am trying to merge are unique_read_info, which contains information from the BAM file, and merged_data, which contains metadata(se.multiSample)$readToTranscriptMaps[[1]] merged with the fullLengthCounts. Here is the head() of each table.

Screenshot 2024-07-01 at 8 16 44 AM

Screenshot 2024-07-01 at 8 16 55 AM

The qname and readId.x values do not match up when I run the samples together as se.multiSample, but they do match when I run each sample individually. In case it is helpful, here is the code I ran for se.multiSample:

Screenshot 2024-07-01 at 8 22 26 AM

@andredsim
Collaborator

Hi, thanks for sharing these. It looks like you are running bambu correctly, as I do not see any issues there.

I just have a few more questions.

Is unique_read_info from one BAM file (i.e. RNA_cell_naive1.bam)? And does that file match the first name in names(metadata(se.multiSample)$readToTranscriptMaps)? Because you used metadata(se.multiSample)$readToTranscriptMaps[[1]], you are selecting the read map for the first BAM file in that list, and the order might differ from the input order.
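To illustrate the point about ordering, here is a minimal sketch of picking the read map by sample name instead of by position. It uses a toy named list in place of metadata(se.multiSample)$readToTranscriptMaps; the read IDs, the bam_path value, and the assumption that the list names are BAM file names (with or without the .bam extension) are all illustrative and should be checked against your own object.

```r
# Toy stand-in for metadata(se.multiSample)$readToTranscriptMaps:
# a named list with one read table per BAM file (contents are made up).
read_maps <- list(
  RNA_cell_naive2 = data.frame(readId = c("read-c", "read-d"),
                               stringsAsFactors = FALSE),
  RNA_cell_naive1 = data.frame(readId = c("read-a", "read-b"),
                               stringsAsFactors = FALSE)
)

# Hypothetical path of the BAM file that unique_read_info was built from.
bam_path <- "path/to/RNA_cell_naive1.bam"

# Normalise both sides by dropping directories and a trailing ".bam".
sample_name <- sub("\\.bam$", "", basename(bam_path))
map_names <- sub("\\.bam$", "", basename(names(read_maps)))

# Look the sample up by name rather than trusting position [[1]].
idx <- match(sample_name, map_names)
stopifnot(!is.na(idx))
read_map <- read_maps[[idx]]

print(names(read_maps))
print(idx)
```

In this toy list the wanted sample sits in position 2, so read_maps[[1]] would have returned the reads of a different BAM file.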

When you try to merge unique_read_info and merged_data, what is the output? Is it an error, or perhaps an empty table? For the examples you show, could you check whether "2bfee94f-8ea0-466e-aec4-ff14000f8cd1" %in% unique_read_info$qname?
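A rough sketch of those checks on toy data is below. The table names follow the thread, but the rows, the mapq column, and the TXNAME values are invented for illustration; only the qname and readId.x column names come from the discussion.

```r
# Toy stand-ins for the two tables (all values are made up).
unique_read_info <- data.frame(
  qname = c("read-a", "read-b", "read-c"),
  mapq = c(60L, 12L, 60L),
  stringsAsFactors = FALSE
)
merged_data <- data.frame(
  readId.x = c("read-b", "read-c", "read-z"),
  TXNAME = c("tx1", "tx2", "tx3"),
  stringsAsFactors = FALSE
)

# Is one specific read ID present in the BAM-derived table?
has_read <- "read-b" %in% unique_read_info$qname

# Both key columns should have the same type (e.g. both character).
key_classes <- c(class(merged_data$readId.x), class(unique_read_info$qname))

# How many IDs do the two tables share at all?
n_shared <- length(intersect(merged_data$readId.x, unique_read_info$qname))

# Inner join on the differently named key columns.
joined <- merge(merged_data, unique_read_info,
                by.x = "readId.x", by.y = "qname")

print(has_read)
print(key_classes)
print(n_shared)
print(nrow(joined))
```

If n_shared is 0, the merge returns an empty table rather than an error, which usually points to the wrong sample's read map or to mismatched ID types.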

How did you merge the full-length counts and the read-to-transcript map? They do not have any columns that directly key into each other. There are ways around this, but it is a bit messy because a read can have multiple equal matches.
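One way to handle reads with multiple equal matches is to expand the map to one row per read and transcript pair before joining anything. The sketch below assumes the read map has a list column named equalMatches holding integer transcript indices; that layout, the txIndex column name, and the toy values are assumptions to verify against your own readToTranscriptMaps.

```r
# Toy read map: read-b matches two transcripts equally, read-c matches none.
read_map <- data.frame(readId = c("read-a", "read-b", "read-c"),
                       stringsAsFactors = FALSE)
read_map$equalMatches <- list(1L, c(2L, 3L), integer(0))

# Number of matched transcripts per read.
n_matches <- lengths(read_map$equalMatches)

# Long format: repeat each read ID once per matched transcript index.
# "txIndex" is a hypothetical column name chosen for this example.
read_long <- data.frame(
  readId = rep(read_map$readId, times = n_matches),
  txIndex = unlist(read_map$equalMatches),
  stringsAsFactors = FALSE
)

print(read_long)
```

Reads without any match drop out of the long table, and reads with several matches appear on several rows, so any later count per read needs to account for that duplication.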

What is the final output you want to produce, or the question you want to answer? If I know that, I may be able to suggest an alternative way of getting there.

Kind Regards,
Andre Sim

@hannalee809
Author

Hi!

After looking more thoroughly into the code, I was able to resolve the merging issue! Essentially, I merged the metadata with the rowData of se.multiSample, which gave me the columns needed to merge with the full-length counts. Merging several data frames did get complicated, and I had to be more careful with the process. Thank you so much for your support!
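For anyone landing here later, this is a rough sketch of one way such a chain of merges could look. It is not the exact code used in this thread. It runs on toy data and assumes that the transcript indices in the read map refer to row positions in rowData(se.multiSample), that rowData has TXNAME and GENEID columns, and that the full-length counts are a matrix with transcript names as row names; txIndex and fullLengthCount are hypothetical column names.

```r
# Toy stand-in for as.data.frame(rowData(se.multiSample)).
row_data <- data.frame(
  TXNAME = c("tx1", "tx2", "tx3"),
  GENEID = c("g1", "g1", "g2"),
  stringsAsFactors = FALSE
)
# Make the row position an explicit key column.
row_data$txIndex <- seq_len(nrow(row_data))

# Toy read map already expanded to one row per read/transcript pair.
read_long <- data.frame(
  readId = c("read-a", "read-b", "read-b"),
  txIndex = c(1L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Toy full-length counts matrix: transcripts in rows, one sample column.
full_length_counts <- matrix(
  c(5, 0, 2), ncol = 1,
  dimnames = list(c("tx1", "tx2", "tx3"), "RNA_cell_naive1")
)
counts_df <- data.frame(
  TXNAME = rownames(full_length_counts),
  fullLengthCount = full_length_counts[, "RNA_cell_naive1"],
  stringsAsFactors = FALSE
)

# Step 1: attach transcript annotation to each read via the row index.
read_tx <- merge(read_long, row_data, by = "txIndex")

# Step 2: attach the counts via the transcript name.
read_tx_counts <- merge(read_tx, counts_df, by = "TXNAME")

print(read_tx_counts)
```

The resulting table carries readId, which can then be merged with qname from the matching BAM file.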
