Breeding-Value-Prediction

Alternative Methods to Breeding Value Prediction in Loblolly Pine

Abstract

Phenotypic variation in forest trees may be partitioned into genetic and environmental components, which are then used to estimate the heritability of traits as the proportion of total phenotypic variation attributed to genetic variation.
Applied tree breeding programs can use matrices of relationships, based either on recorded pedigrees in structured breeding populations or on genotypes of molecular genetic markers, to model genetic covariation among related individuals and predict genetic values for individuals for whom no phenotypic measurements are available.
This study tests the hypothesis that genetic covariation among individuals of similar genetic value will be reflected in shared patterns of gene expression. We collected gene expression data by high-throughput sequencing of RNA isolated from pooled seedlings of parents with known genetic value, and compared alternative approaches to data analysis to test this hypothesis.

Background of samples

All information about samples is located in the RNA Seq Data repo

Step 1 - Transcript normalization and SNP filtering

The counts were normalized in multiple ways; however, the following approach was used for prediction:

Using technical replicate counts and asreml to normalize for batch, index, lane, and pedigree:

markdown

For more normalization schemes using DESeq2, edgeR, and sommer on the biological and technical replicates, see the repo folder step3.normalization.
SNPs were filtered in multiple ways:

markdown

The final data sets used for prediction were restructured to be in identical order:

markdown
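As a minimal sketch of what "identical order" means in practice, the Python snippet below reorders several tables so that row i refers to the same family in each. It is not the repository's R code; the function name `align_by_family`, the family IDs, and the toy values are all hypothetical.

```python
def align_by_family(families, *tables):
    """Reorder each {family_id: row} table to follow the order in `families`.

    Raises KeyError if a table is missing one of the requested families.
    """
    aligned = []
    for table in tables:
        missing = [f for f in families if f not in table]
        if missing:
            raise KeyError(f"families missing from a table: {missing}")
        aligned.append([table[f] for f in families])
    return aligned


# Hypothetical toy data: each table is keyed by family ID in a different order.
counts = {"fam_B": [5.1, 2.0], "fam_A": [4.8, 2.3], "fam_C": [5.5, 1.9]}
snps = {"fam_C": [0, 1, 2], "fam_A": [2, 2, 0], "fam_B": [1, 0, 1]}
phenos = {"fam_A": 1.2, "fam_C": -0.4, "fam_B": 0.3}

# Use the families present in all three tables, in one fixed (sorted) order.
order = sorted(set(counts) & set(snps) & set(phenos))
count_rows, snp_rows, pheno_vals = align_by_family(order, counts, snps, phenos)
```

Keeping a single shared ID vector and indexing every table by it avoids silent row mismatches between predictors and phenotypes.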

Step 2 - Prediction of EW families with LGEP

The EW vs. LGEP

Organizing test and train data sets

EW and LGEP families were subset into train and test objects

markdown
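The subsetting step can be sketched as follows, assuming each family carries a group label of either LGEP (training) or EW (test), as the Step 2 title suggests. The function `split_by_group` and the family IDs are hypothetical and stand in for whatever objects the linked markdown builds.

```python
def split_by_group(family_group, train_group, test_group):
    """Return (train_ids, test_ids) from a {family_id: group_label} mapping."""
    train = sorted(f for f, g in family_group.items() if g == train_group)
    test = sorted(f for f, g in family_group.items() if g == test_group)
    return train, test


# Hypothetical labels; the real assignments live in the repository's data files.
family_group = {"fam_1": "LGEP", "fam_2": "EW", "fam_3": "LGEP", "fam_4": "EW"}
train_ids, test_ids = split_by_group(family_group, "LGEP", "EW")
```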

Conduct prediction on EW

Family mean estimates of counts and SNPs were used for prediction with OmicKriging and glmnet (lasso/ridge):

EW predictions
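A family mean estimate can be illustrated as a per-feature average over the replicates of each family. The sketch below assumes a plain arithmetic mean and uses hypothetical names (`family_means`, `replicates`); the repository may instead derive family means from a fitted model.

```python
from collections import defaultdict
from statistics import fmean


def family_means(rows):
    """Average feature vectors per family.

    rows: iterable of (family_id, [feature values]) with one entry per replicate.
    Returns {family_id: [mean of each feature across that family's replicates]}.
    """
    grouped = defaultdict(list)
    for fam, values in rows:
        grouped[fam].append(values)
    # zip(*reps) transposes the replicates so each column is one feature.
    return {fam: [fmean(col) for col in zip(*reps)] for fam, reps in grouped.items()}


# Hypothetical toy input: two replicates of fam_A, one of fam_B, two features each.
replicates = [("fam_A", [4.0, 10.0]), ("fam_A", [6.0, 14.0]), ("fam_B", [1.0, 3.0])]
means = family_means(replicates)
```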

Step 5 - Prediction of 70-fold CV

Construct 70 test groups

Instead of predicting across batches, here we split the complete data set for 7-fold CV repeated 10 times, giving 70 test groups. The CV groups were split so that each test fold contained individuals spread across the phenotypic range:

create 70 fold
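One way to build folds that each span the phenotypic range is to rank families by phenotype, cut the ranking into bins of seven neighbours, and send one family from each bin to each fold. The sketch below does this 10 times with different random assignments. It is an assumed scheme rather than the repository's script; `stratified_folds`, the seed, and the toy phenotypes are hypothetical, and the 56 families are taken from the LOO description.

```python
import random


def stratified_folds(phenotypes, n_folds=7, n_repeats=10, seed=1):
    """Build n_folds * n_repeats test groups spread across the phenotype ranking.

    phenotypes: {family_id: phenotypic value}
    Returns a list of test groups, each a list of family IDs.
    """
    rng = random.Random(seed)
    ranked = sorted(phenotypes, key=phenotypes.get)
    groups = []
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_folds)]
        # Each bin holds n_folds neighbours in the ranking; every fold
        # receives at most one family from each bin.
        for start in range(0, len(ranked), n_folds):
            bin_ids = ranked[start:start + n_folds]
            fold_order = rng.sample(range(n_folds), len(bin_ids))
            for fam, fold in zip(bin_ids, fold_order):
                folds[fold].append(fam)
        groups.extend(folds)
    return groups


# Hypothetical phenotypes for 56 families.
phenos = {f"fam_{i:02d}": float(i) for i in range(56)}
test_groups = stratified_folds(phenos)
```

With 56 families and 7 folds, every test group holds 8 families, one from each eighth of the ranking, and the remaining 48 families form the training set for that group.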

Conduct prediction on each of the test folds using all data

prediction_script

Visualize predictions

70 Fold CV visualization

Step 6 - Prediction using LOO

Predictions were conducted using a maximum training size of 55 families to predict the 56th family, using OmicKriging (OK), lasso, and ridge. Script:

LOO Script
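The leave-one-out loop itself can be sketched as below: each family is held out once and predicted from the other 55. The model here is a deliberately trivial placeholder (the training mean) standing in for OmicKriging, lasso, or ridge; `leave_one_out`, `mean_predictor`, and the toy phenotypes are hypothetical names and data.

```python
def leave_one_out(ids):
    """Yield (training IDs, held-out ID) for every family in turn."""
    for i, held_out in enumerate(ids):
        yield ids[:i] + ids[i + 1:], held_out


def mean_predictor(train_ids, phenotypes):
    """Placeholder model: predict the mean phenotype of the training families."""
    return sum(phenotypes[f] for f in train_ids) / len(train_ids)


# Hypothetical phenotypes for 56 families.
phenos = {f"fam_{i:02d}": float(i) for i in range(56)}
ids = sorted(phenos)

predictions = {}
for train_ids, held_out in leave_one_out(ids):
    predictions[held_out] = mean_predictor(train_ids, phenos)
```

Swapping `mean_predictor` for a real model only requires a function that takes the training IDs and returns a prediction for the held-out family.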

Visualize prediction of LOO

LOO markdown

The part below is defunct; the scripts are still there but are not used.

Estimate anova scores for features using LGEP and then conduct prediction on EW

Generate anova scores using LGEP as training

Using the biological replicate data sets, ANOVA scores were estimated for each feature (SNP/transcript):

LGEP_ANOVA

Conduct prediction on EW

Family mean estimates of counts and SNPs were used for prediction with OmicKriging and glmnet (lasso/ridge):

EW predictions

The part below is defunct; the scripts are still there but are not used.

First construct the 70 test groups, estimate ANOVA scores, and then conduct predictions

Conduct prediction on each of the test folds across pvals

Just as when predicting on the EW families, predictions were carried out for each of the 70 unique test groups:

predict 70-fold

Visualize prediction of 70-fold

70-fold-cv markdown
