Read processing with expHTS

Assuming each individual raw read file is named sample_name_R1.fastq and sample_name_R2.fastq, create a foulder sample_name for each individual containing the raw reads. Also, create file samples.txt containing a first column with "sample_name" and a second column with a file name where the cleaned reads will be deposited. For example:

SAMPLES_ID	SEQUENCE_ID
sample_name	sample_name_clean

Then, run expHTS:

expHTS preprocess -f samples.txt -A adapters_f_rc.txt

Generate pseudo-references using pseudo-it

Concatenate your R1, R2 and SE into a single file. We used the OryCun2.0 reference as the scaffolding reference. Run pseudo-it like so:

for s in *SE.fastq.gz;
do 
	echo "Single End is $s"
	echo "Read 1 is ${s/SE.fastq.gz/R1.fastq.gz}"
	echo "Read 2 is ${s/SE.fastq.gz/R2.fastq.gz}"
	echo "Output is ${s/_clean_SE.fastq.gz/_pseudoref}"
	python /usr/local/bin/pseudo-it.py -1 ${s/SE.fastq.gz/R1.fastq.gz} -2 ${s/SE.fastq.gz/R2.fastq.gz} -s $s -np 16 --nct 4 --nt 4 4 oryCun2.fa ${s/_clean_SE.fastq/_pseudoref};
done

Mapping reads to pseudo-reference

Index reference.

bwa index pseudoreference.fa

Map reads to each pseudo-reference.

Map single and paired end read data:

bwa mem -M -t 6 $my_path $my_first_read_file $my_second_read_file > $my_output_file
bwa mem -M -t 6 $my_path $single_read_file > $my_output_file

Create and sort bam files:

samtools view -b $mysam > $mybam
samtools sort $mybam -o $mysortedbam

Merge paired and single end bam files

samtools merge $final_bam $pe $se_file

Add read groups with Picard.

java -jar /usr/local/bin/picard.jar AddOrReplaceReadGroups INPUT="$name2"_merge.bam OUTPUT="$name2"_merge_addRG.bam RGID=M02585.25.1 RGLB= RGPL=illumina RGPU=H3LYTBCXX.2 RGSM=$name3

Remove Duplicates with Picard.

java -jar /usr/local/bin/picard.jar MarkDuplicates INPUT=$input OUTPUT=$output METRICS_FILE=$metrics REMOVE_DUPLICATES=true ASSUME_SORTED=true

Realign indels with GATK.

samtools index $input > $input.bai

java -jar /usr/local/bin/GenomeAnalysisTK.jar -nt 12 -T RealignerTargetCreator -R $path_to_reference -I $inputbam -o $output.realigner.intervals

java -Xmx2g -jar /usr/local/bin/GenomeAnalysisTK.jar -T IndelRealigner -R $path_to_reference -I $inputbam -targetIntervals $input.realigner.intervals -o $output.realigned.bam

Calculate coverage and capture statistics.

list.txt containts the name of each sample and respective pseudo-reference.

parallel --colsep '\t' 'java -jar /usr/local/bin/picard.jar CalculateHsMetrics BAIT_INTERVALS=/scratch/3/mafalda/SpeciesForReference/{4}/{3}_141216_Lepus_Ex1_MJ_EZ_HX1_capture_targets.interval_list TARGET_INTERVALS=/scratch/3/mafalda/SpeciesForReference/{4}/{3}_141216_Lepus_Ex1_MJ_EZ_HX1_capture_targets.interval_list INPUT={1}.realigned.bam OUTPUT={1}.realigned.picard.stats REFERENCE_SEQUENCE={2} PER_TARGET_COVERAGE={1}.per.target.coverage' :::: list.txt

Go back to main page

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

1.pseudoreferences_and_mapping.md

1.pseudoreferences_and_mapping.md

Read processing with expHTS

Generate pseudo-references using pseudo-it

Mapping reads to pseudo-reference

Files

1.pseudoreferences_and_mapping.md

Latest commit

History

1.pseudoreferences_and_mapping.md

File metadata and controls

Read processing with expHTS

Generate pseudo-references using pseudo-it

Mapping reads to pseudo-reference