<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta content="en-us" http-equiv="Content-Language"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="no-cache, no-store, must-revalidate" http-equiv="Cache-Control"/>
<meta content="no-cache" http-equiv="Pragma"/>
<meta content="0" http-equiv="Expires"/>
<title>
Trouble-shooting fastq_mergepairs problems
</title>
<link href="stylesx.css" rel="stylesheet" type="text/css"/>
<style type="text/css">
body.c4 {background-color:#c0c0c0;}
div.c3 {position:absolute; top:45px; left:20px; width:830px; background-color:#ffffff; border-width:10px; border-style:solid;border-color:white;}
span.c2 {font-weight: bold}
div.c1 {position:absolute; top:10px; left:20px; width:850px; height:60px;}
.TopButtonPara { color:white; background-color:rgb(50,100,150); border-color:rgb(50,100,150); font-family:Arial, Helvetica, sans-serif; font-weight:normal; font-size:9pt; text-align:center; border-width:4px; border-style:solid; }
.TopButton { color:white; }
a.TopButton:link { text-decoration:none; }
a.TopButton:visited { text-decoration:none; }
a.TopButton:hover { color:orange; }
.NewButtonPara { color:white; background-color:rgb(50,100,150); border-color:rgb(50,100,150); font-family:Arial, Helvetica, sans-serif; font-weight:normal; font-size:9pt; text-align:center; border-width:4px; border-style:solid; }
.NewButton { color:white; }
a.NewButton:link { text-decoration:none; }
a.NewButton:visited { text-decoration:none; }
a.NewButton:hover { color:orange; }
.SideButtonPara { color:white; font-family:Arial, Helvetica, sans-serif; font-size:9pt; font-weight:normal; text-align:center; line-height:18px; }
.SideButton { color:white; }
a.SideButton:link { text-decoration:none; }
a.SideButton:visited { text-decoration:none; }
a.SideButton:hover { color:orange; }
</style>
</head>
<body style="background-color:#c0c0c0;">
<div>
<a href="https://drive5.com/usearch">
<img alt="USEARCH v12" src="usearch12_banner.jpg" style="position:absolute; top:40px; left:10px; padding:0px; border:0px;"/>
</a>
</div>
<div style="position:absolute; top:115px; left:10px; width:850px; background-color:#ffffff; min-height:500px">
<div style="position:relative; float:left; background-color:#696969; width:125px; left: 0px; min-height:500px; padding:5px; height: 125px;">
<div class="SideButtonPara" style="text-align:center; padding-top:5px;">
<a class="SideButton" href="index.html">
Docs home
</a>
<br/>
<hr style="border:0; border-bottom: 1px solid white;"/>
<a class="SideButton" href="cmds.html">
Commands
</a>
<br/>
<a class="SideButton" href="topics.html">
Topics
</a>
<br/>
<a class="SideButton" href="citation.html">
Publications
</a>
<br/>
</div>
</div>
<div class="ManText" style="left:20px; position: absolute; left:135px; width:695px; background-color:white; padding:10px">
<h1>
Trouble-shooting fastq_mergepairs problems
</h1>
<p>
<span class="ManText">
<strong>
See also
</strong>
<br/>
<a href="cmd_fastq_mergepairs.html">
fastq_mergepairs command
</a>
<br/>
<a href="merge_options.html">
fastq_mergepairs options
</a>
<br/>
<a href="merge_report.html">
Reviewing a fastq_mergepairs report to check for problems
</a>
<br/>
<a href="merge_tabbed_check.html">
Using the tabbedout file to investigate merging problems
</a>
<br/>
<a href="merge_check.html">
Validating merged reads to check for problems
</a>
<br/>
<br/>
The primary tools for trouble-shooting merge problems are the report files generated by the -report, -tabbedout and -alnout options.
<br/>
<br/>
The report file gives a summary showing whether many pairs failed to merge and, if so, the most common reasons for failure, e.g. because no alignment was found or because there were too many mismatches (differences) in the alignment.
<br/>
<br/>
To investigate examples, take a small subset of pairs which failed to merge and examine the tabbedout and alnout files. Example command lines:
</span>
</p>
<p class="ManCode">
usearch -fastq_mergepairs *R1*.fastq
<span class="ManCode">
-fastqout_notmerged_fwd fwd.fq -fastqout_notmerged_rev rev.fq
<br/>
<br/>
usearch -fastx_subsample fwd.fq -reverse rev.fq -fastqout subf.fq -output2 subr.fq -sample_size 100
<br/>
<br/>
usearch -fastq_mergepairs subf.fq -reverse subr.fq -tabbedout out.txt -report report.txt \
<br/>
-alnout aln.txt
<br/>
<br/>
</span>
<span class="Text">
The format of the tabbedout file is not documented in detail (and is subject to change in different usearch builds), but is fairly self-explanatory. Each read pair is one line in the file. The read label is the first field. Subsequent fields are separated by tabs. Each field reports the results of one step in the merging process:
</span>
</p>
<p class="ManCode">
<span class="ManCode">
M00967:15:000000000-A2G1J:1:1101:18083:3926 aln=123-128-121 diffs=15
<span class="auto-style2">
toomanydiffs result=notmerged
</span>
</span>
</p>
<p class="ManText">
This shows that the pair failed to merge because there were too many (15) mismatches in the alignment.
</p>
<p class="ManText">
The aln= field has three values separated by dashes: number of unaligned bases in the forward read, alignment length, and number of unaligned bases in the merged read. In the example above, there are 15 mismatches in the alignment (diffs=15) which exceeds the maximum (toomanydiffs) and the pair is therefore discarded (result=notmerged). You can find the alignment in the alnout file, or extract just this read pair (
<a href="DELETE_URL">
fastx_getseq
</a>
) so that you get just one alignment.
</p>
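<p class="ManText">
As a rough illustration of the field layout described above, the following Python sketch splits one tabbedout line into its label, its key=value fields and its bare flag fields. It assumes tab-separated fields exactly as in the example line; since the format is not documented in detail and may change between builds, treat the function names (parse_tabbedout_line, parse_aln) and the handling of fields as hypothetical rather than as a specification.
</p>

```python
def parse_tabbedout_line(line):
    """Split one tabbedout line into (label, key=value fields, bare flags).

    Assumes fields are separated by tabs and the read label comes first.
    """
    parts = line.rstrip("\n").split("\t")
    label, fields = parts[0], parts[1:]
    values = {}  # fields of the form key=value, e.g. diffs=15
    flags = []   # bare fields, e.g. toomanydiffs
    for field in fields:
        field = field.strip()
        if not field:
            continue
        if "=" in field:
            key, value = field.split("=", 1)
            values[key] = value
        else:
            flags.append(field)
    return label, values, flags


def parse_aln(value):
    """Split an aln= value such as '123-128-121' into three integers."""
    first, aln_length, third = (int(x) for x in value.split("-"))
    return first, aln_length, third


# The example line from this page, with tabs between the fields.
example = (
    "M00967:15:000000000-A2G1J:1:1101:18083:3926"
    "\taln=123-128-121\tdiffs=15\ttoomanydiffs\tresult=notmerged"
)
label, values, flags = parse_tabbedout_line(example)
print(label)
print(values["result"], values["diffs"], flags)
print(parse_aln(values["aln"]))
```

<p class="ManText">
Applied to every line of a tabbedout file, the same function could be used to count how often each flag (such as toomanydiffs) appears among pairs with result=notmerged.
</p>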
<p class="auto-style1">
<strong>
Trouble-shooting pairs that do not align
<br/>
</strong>
If the reads do not align, this may be because there are too many differences in the alignment due to read errors, because the alignment is very short, or because there is no overlap since the amplicon is too long. Long amplicons may be caused by PhiX reads or by amplifying a different gene or region; you can check for this by aligning the forward and reverse reads to a database of known genes (e.g. SILVA for 16S).
</p>
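<p class="auto-style1">
To see why a long amplicon leaves no overlap, it can help to do the arithmetic explicitly. The sketch below is a simple geometric estimate, not something usearch reports: the function name expected_overlap and the read and amplicon lengths are hypothetical values chosen for illustration.
</p>

```python
def expected_overlap(fwd_len, rev_len, amplicon_len):
    """Estimate how many bases the forward and reverse reads share.

    Each read can cover at most the whole amplicon, so its length is
    capped at amplicon_len. A result of zero or less means the reads
    cannot overlap, so no alignment is possible.
    """
    fwd_covered = min(fwd_len, amplicon_len)
    rev_covered = min(rev_len, amplicon_len)
    return fwd_covered + rev_covered - amplicon_len


# Hypothetical 2x250 reads, for illustration only.
print(expected_overlap(250, 250, 460))  # 40: a short overlap
print(expected_overlap(250, 250, 550))  # -50: amplicon too long to merge
print(expected_overlap(250, 250, 200))  # 200: reads cover the whole amplicon
```

<p class="auto-style1">
A small positive value corresponds to the very short alignments mentioned above, which are harder to detect reliably.
</p>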
<p class="auto-style1">
You can check whether a plausible alignment exists by using
<a href="DELETE_URL">
ublast
</a>
to align the reads. Use
<a href="DELETE_URL">
fastx_getseq
</a>
to extract the pair into two FASTQ files, each containing just one read (fwd.fq and rev.fq). Use
<a href="DELETE_URL">
fastx_revcomp
</a>
to put the reverse read on the same strand as the forward read, and use the -strand plus option of ublast to exclude matches on the wrong strand. For example:
</p>
<p class="auto-style1">
<span class="ManCode">
usearch -fastx_getseq R1s.fq -label M00967:15:0000 -fastqout fwd.fq
</span>
</p>
<p class="auto-style1">
<span class="ManCode">
usearch -fastx_getseq R2s.fq -label M00967:15:0000 -fastqout rev.fq
<br/>
<br/>
usearch -fastx_revcomp rev.fq -fastqout rev_rc.fq
<br/>
<br/>
usearch -ublast fwd.fq -db rev_rc.fq -strand plus -evalue 1e-2 -alnout hits.txt
</span>
<br/>
</p>
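<p class="auto-style1">
For readers unfamiliar with why the reverse read must be reverse-complemented before a plus-strand search, the following Python sketch shows the operation conceptually on a single FASTQ record. It is an illustration under simple assumptions (plain A, C, G, T and N letters only) and is not a description of how fastx_revcomp is implemented; the function names are hypothetical.
</p>

```python
# Translation table mapping each DNA letter to its complement.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the order of the bases."""
    return seq.translate(_COMPLEMENT)[::-1]


def revcomp_fastq_record(seq, qual):
    """Reverse-complement a read and keep each quality with its base.

    The quality string is only reversed, because a quality score
    belongs to a position and has no complement.
    """
    return reverse_complement(seq), qual[::-1]


seq, qual = revcomp_fastq_record("ACGTTN", "IIIHG#")
print(seq)   # NAACGT
print(qual)  # #GHIII
```

<p class="auto-style1">
After this transformation the reverse read reads in the same direction as the forward read, so a genuine overlap appears as a plus-strand match between the end of the forward read and the start of the converted reverse read.
</p>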
</div>
</div>
</body>
</html>