Confidence intervals #17

Open
Homap opened this issue Mar 7, 2023 · 4 comments

Homap commented Mar 7, 2023

Hi,

How are the confidence intervals calculated for pairwise LD measured as r^2?

My LD estimates correspond very well with patterns of genetic diversity and recombination rate. However, I see a very large confidence interval for r^2. What does that mean? Can I trust the r^2 values?

I have 20 unphased chromosomes (10 diploid individuals), so I used the genotype data for the LD calculation.

Thank you!

Contributor

hewm2008 commented Mar 9, 2023

The very large confidence interval for r^2 may be caused by one of the following two reasons:
1. Your SNP dataset is too small (the number of SNPs is too low).
2. The quality of the genome assembly is too poor (for example, the lengths of the N gaps between scaffolds are incorrect).

You can now use the parameters [-bin1 1000 -bin2 1000 -break 1000] to smooth the curve:
perl Plot_OnePop.pl -inFile LDdecay.stat.gz -output OUT -bin1 1000 -bin2 1000 -break 1000

At first glance at your data, I think it is due to the small sample size (only 10 individuals), which also leaves you with too few SNPs.
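
To illustrate why a small sample gives such a wide interval, here is a minimal sketch. It assumes r is treated as an ordinary correlation between genotype dosages and that the interval comes from the Fisher z-transform. This is only one common approximation and is not necessarily how the LowCI/HighCI columns in PopLDdecay are computed; the function name `fisher_ci_r2` is made up for this example.

```python
import math

def fisher_ci_r2(r, n, z_crit=1.96):
    """Approximate 95% CI for r^2 from the Fisher z-transform of r.

    r: correlation between the two loci (|r| < 1)
    n: number of sampled individuals (must be > 3)
    Assumption: r behaves like a Pearson correlation of n observations.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    lo_r = math.tanh(z - z_crit * se)
    hi_r = math.tanh(z + z_crit * se)
    # If the interval for r spans zero, the lower bound for r^2 is 0.
    lo = 0.0 if lo_r <= 0.0 <= hi_r else min(lo_r ** 2, hi_r ** 2)
    hi = max(lo_r ** 2, hi_r ** 2)
    return lo, hi

r = math.sqrt(0.1)  # corresponds to r^2 = 0.1
for n in (10, 100, 1000):
    lo, hi = fisher_ci_r2(r, n)
    print(f"n={n:5d}  r^2=0.10  CI=({lo:.3f}, {hi:.3f})")
```

Under these assumptions, with 10 individuals the interval for r^2 = 0.1 runs from 0 to roughly 0.6, and it only becomes narrow with hundreds of individuals.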

Author

Homap commented Mar 9, 2023

Thanks for your reply. The genome assembly is good: it was anchored into chromosomes using linkage maps and further improved using optical mapping.

So the problem must be the small sample size, that is, 20 chromosomes.

In most cases the interval is very wide. For example, when LD is about 0.1, the low CI is about 0.08 and the high CI is about 0.92, which covers almost the whole range of LD. However, the LD estimate is consistently close to the low CI.

As I said, the patterns of LD correspond very well with what you would expect based on genetic diversity and recombination rate. In this case, can I trust the LD calculation? I average LD over windows of about 200 kb, so I can get a variance estimate across each window, but that is not the same thing as a confidence interval for each pairwise comparison. I would very much appreciate any suggestions or recommendations.
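
For reference, the window-level uncertainty I have in mind could be sketched as below: a percentile bootstrap over the pairwise r^2 values in one window. The values in `window_r2` are made-up numbers and the function name is only illustrative. Pairs within a window share SNPs and are not independent, so an interval like this is probably too narrow; resampling SNPs or individuals instead of pairs would be more defensible.

```python
import random
import statistics

def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the mean of pairwise r^2 values in one window."""
    rng = random.Random(seed)
    k = len(values)
    # Resample the pairs with replacement and record the mean each time.
    means = sorted(
        statistics.fmean(rng.choices(values, k=k)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(values), lo, hi

# Hypothetical pairwise r^2 values from one ~200 kb window (made-up numbers).
window_r2 = [0.05, 0.12, 0.08, 0.30, 0.02, 0.10, 0.15, 0.07, 0.09, 0.11]
mean_r2, lo, hi = bootstrap_mean_ci(window_r2)
print(f"mean r^2 = {mean_r2:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```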

Thank you!

Author

Homap commented Mar 9, 2023

Another question: how does sample size affect the calculation of LD decay? Should I expect a smaller or larger LD decay value with a small sample size?

Contributor

hewm2008 commented Mar 9, 2023

The R^2 estimate is always close to the low CI.
I guess this is mainly because your sample size is too small. A small sample size inflates R^2 but also reduces its reliability, so you get a bigger R^2 that sits close to the LowCI value.
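
The upward bias can be seen in a small simulation. This is only a sketch under simplifying assumptions: two completely independent loci, phased 0/1 haplotypes (your data are unphased genotypes), and an allele frequency of 0.5. The function names are made up for this example. The true r^2 here is zero, so any positive average is pure sampling bias, and it is expected to be on the order of 1/n for n sampled chromosomes.

```python
import random

def r2_haplotypes(a, b):
    """r^2 between two biallelic loci from 0/1 haplotype vectors."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    return d * d / (pa * (1 - pa) * pb * (1 - pb))

def mean_r2_unlinked(n_hap, freq=0.5, reps=2000, seed=1):
    """Average r^2 between two independent loci sampled in n_hap haplotypes."""
    rng = random.Random(seed)
    total, used = 0.0, 0
    while used < reps:
        a = [int(rng.random() < freq) for _ in range(n_hap)]
        b = [int(rng.random() < freq) for _ in range(n_hap)]
        # Skip draws where a locus is monomorphic in the sample (r^2 undefined).
        if 0 < sum(a) < n_hap and 0 < sum(b) < n_hap:
            total += r2_haplotypes(a, b)
            used += 1
    return total / reps

for n_hap in (20, 100, 500):
    print(n_hap, round(mean_r2_unlinked(n_hap), 4))
```

With 20 chromosomes the background level is therefore already around 0.05, so the whole decay curve is shifted upward compared with a larger sample.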

As for calculating the average value in windows, I haven't tried it. You can try it and see whether it works better.
