Alternative Methods to Breeding Value Prediction in Loblolly Pine
-
Phenotypic variation in forest trees can be partitioned into genetic and environmental components, which are then used to estimate the heritability of a trait: the proportion of total phenotypic variation attributable to genetic variation.
-
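As a minimal sketch of the heritability ratio described above, assuming phenotypic variance is simply the sum of a genetic and an environmental component (no interaction or covariance terms). The function name and the variance values are hypothetical and are not taken from this project's data.

```python
def heritability(var_genetic: float, var_environmental: float) -> float:
    """Proportion of phenotypic variance attributed to genetic variance.

    Assumes var_phenotypic = var_genetic + var_environmental.
    """
    var_phenotypic = var_genetic + var_environmental
    if var_phenotypic <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return var_genetic / var_phenotypic


# Made-up variance components, for illustration only.
h2 = heritability(var_genetic=2.0, var_environmental=6.0)
print(h2)  # 0.25
```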
Applied tree breeding programs can use matrices of relationships, based either on recorded pedigrees in structured breeding populations or on genotypes of molecular genetic markers, to model genetic covariation among related individuals and predict genetic values for individuals for whom no phenotypic measurements are available.
-
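One common way to build a marker-based relationship matrix is the VanRaden-style estimator sketched below. This is an illustrative assumption, not necessarily the estimator used in this project: genotypes are assumed to be coded as 0/1/2 allele counts, and the function name and toy data are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """VanRaden-style relationship matrix from 0/1/2 allele counts.

    genotypes: list of rows, one row per individual, one column per marker.
    Returns a square list-of-lists matrix (individuals x individuals).
    """
    n_ind = len(genotypes)
    n_mark = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_ind) for j in range(n_mark)]
    # Center each genotype by its expected value, 2p.
    centered = [[row[j] - 2 * freqs[j] for j in range(n_mark)] for row in genotypes]
    # Scale so the matrix is comparable to a pedigree relationship matrix.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(a * b for a, b in zip(row_i, row_k)) / scale for row_k in centered]
        for row_i in centered
    ]


# Toy genotypes: individuals 0 and 1 are identical, individual 2 differs.
toy = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
G = genomic_relationship_matrix(toy)
```

In this toy example the two identical individuals share the same diagonal and off-diagonal value, while their relationship with the third individual is lower.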
This study tests the hypothesis that genetic covariation among individuals of similar genetic value will be reflected in shared patterns of gene expression. We collected gene expression data by high-throughput sequencing of RNA isolated from pooled seedlings of parents with known genetic value, and compared alternative approaches to data analysis to test this hypothesis.
- All information about samples is located in the RNA Seq Data repo
The counts were normalized in multiple ways; the following approach was used for prediction:
- Using technical replicate counts and asreml to normalize for batch, index, lane, and pedigree:
-
For additional normalization schemes using DESeq2, edgeR, and sommer on the biological and technical replicates, see the repo folder step3.normalization.
-
SNPs were filtered in multiple ways:
- The final data sets used for prediction were restructured to be in identical order:
- The EW vs. LGEP
-
EW and LGEP families were subset into train and test objects
Family mean estimates of counts and SNPs were used for prediction with OmicKriging and glmnet (lasso/ridge):
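As a rough illustration of collapsing replicate-level data to one value per family and feature, the sketch below takes a plain arithmetic mean across replicates. That is an assumption: the project's family mean estimates may instead come from a fitted model. The function name, record layout, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def family_means(records):
    """Average each feature across the replicates of a family.

    records: iterable of (family_id, feature_values) pairs, one per replicate,
    where feature_values is a list with one entry per feature.
    """
    by_family = defaultdict(list)
    for family_id, values in records:
        by_family[family_id].append(values)
    # zip(*rows) turns the replicate rows of a family into feature columns.
    return {fam: [mean(col) for col in zip(*rows)] for fam, rows in by_family.items()}


# Made-up replicate records with two features each.
records = [
    ("fam_A", [10.0, 4.0]),
    ("fam_A", [14.0, 6.0]),
    ("fam_B", [3.0, 9.0]),
]
means = family_means(records)
print(means["fam_A"])  # [12.0, 5.0]
```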
Instead of predicting across batches, the complete data set was split for 7-fold cross-validation (CV), repeated 10 times. The CV groups were constructed so that each test fold contained individuals spread across the phenotypic range:
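One way to obtain folds that each span the phenotypic range is to rank families by phenotype and deal consecutive blocks of seven out to the seven folds in random order. The sketch below shows that idea under stated assumptions: the function names, seeds, and phenotype values are hypothetical, and the project's own fold construction may differ.

```python
import random


def stratified_folds(phenotypes, n_folds=7, seed=0):
    """Assign each family to a fold so every fold spans the phenotypic range.

    phenotypes: dict mapping family_id -> phenotype value.
    Returns a dict mapping family_id -> fold index in range(n_folds).
    """
    rng = random.Random(seed)
    ranked = sorted(phenotypes, key=phenotypes.get)
    assignment = {}
    # Walk the ranked list in blocks of n_folds; within each block,
    # deal the families out to the folds in random order.
    for start in range(0, len(ranked), n_folds):
        block = ranked[start:start + n_folds]
        folds = list(range(n_folds))
        rng.shuffle(folds)
        for family_id, fold in zip(block, folds):
            assignment[family_id] = fold
    return assignment


def repeated_cv(phenotypes, n_folds=7, n_repeats=10):
    """Return one family -> fold assignment per repeat, each with its own seed."""
    return [stratified_folds(phenotypes, n_folds, seed=r) for r in range(n_repeats)]


# Made-up phenotypes for 56 hypothetical families.
phenos = {f"fam_{i:02d}": float(i) for i in range(56)}
splits = repeated_cv(phenos)
n_test_groups = sum(len(set(a.values())) for a in splits)
print(n_test_groups)  # 70
```

With 56 families, each of the 7 folds holds 8 families per repeat, and 7 folds times 10 repeats gives 70 test groups.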
Predictions were made with OmicKriging (OK), lasso, and ridge, using a maximum training size of 55 families to predict the 56th. Script:
**The part below is defunct; the scripts are still there but are not used.**
Estimate ANOVA scores for features using LGEP and then conduct prediction on EW
Using the biological replicate data sets, ANOVA scores were estimated for each feature (SNP/transcript):
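A per-feature ANOVA score can be sketched as a one-way F statistic, computed separately for each SNP or transcript. The version below assumes the grouping factor is family, with biological replicates as observations within each family; that grouping, the function name, and the values are assumptions made for illustration.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a single feature.

    groups: list of lists, the feature's values grouped by family
    (at least two groups, and more observations than groups).
    """
    all_values = [v for g in groups for v in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Note: ms_within is zero when replicates are identical within every group.
    return ms_between / ms_within


# Made-up values for one feature in two families, three replicates each.
score = anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(score)  # 13.5
```

Features could then be ranked by this score, with higher values indicating more between-family variation relative to within-family variation.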
Family mean estimates of counts and SNPs were used for prediction with OmicKriging and glmnet (lasso/ridge):
The part below is defunct; the scripts are still there but are not used.
First construct the 70 test groups, estimate ANOVA scores, and then conduct predictions
Just as when predicting on the EW families, predictions were carried out for each of the 70 unique test groups: