<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta content="en-us" http-equiv="Content-Language"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="no-cache, no-store, must-revalidate" http-equiv="Cache-Control"/>
<meta content="no-cache" http-equiv="Pragma"/>
<meta content="0" http-equiv="Expires"/>
<title>
Read quality filtering
</title>
<link href="stylesx.css" rel="stylesheet" type="text/css"/>
<style type="text/css">
body.c4 {background-color:#c0c0c0;}
div.c3 {position:absolute; top:45px; left:20px; width:830px; background-color:#ffffff; border-width:10px; border-style:solid;border-color:white;}
span.c2 {font-weight: bold}
div.c1 {position:absolute; top:10px; left:20px; width:850px; height:60px;}
.TopButtonPara { color:white; background-color:rgb(50,100,150); border-color:rgb(50,100,150); font-family:Arial, Helvetica, sans-serif; font-weight:normal; font-size:9pt; text-align:center; border-width:4px; border-style:solid; }
.TopButton { color:white; }
a.TopButton:link { text-decoration:none; }
a.TopButton:visited { text-decoration:none; }
a.TopButton:hover { color:orange; }
.NewButtonPara { color:white; background-color:rgb(50,100,150); border-color:rgb(50,100,150); font-family:Arial, Helvetica, sans-serif; font-weight:normal; font-size:9pt; text-align:center; border-width:4px; border-style:solid; }
.NewButton { color:white; }
a.NewButton:link { text-decoration:none; }
a.NewButton:visited { text-decoration:none; }
a.NewButton:hover { color:orange; }
.SideButtonPara { color:white; font-family:Arial, Helvetica, sans-serif; font-size:9pt; font-weight:normal; text-align:center; line-height:18px; }
.SideButton { color:white; }
a.SideButton:link { text-decoration:none; }
a.SideButton:visited { text-decoration:none; }
a.SideButton:hover { color:orange; }
</style>
</head>
<body style="background-color:#c0c0c0;">
<div>
<a href="https://drive5.com/usearch">
<img alt="USEARCH v12" src="usearch12_banner.jpg" style="position:absolute; top:40px; left:10px; padding:0px; border:0px;"/>
</a>
</div>
<div style="position:absolute; top:115px; left:10px; width:850px; background-color:#ffffff; min-height:500px">
<div style="position:relative; float:left; background-color:#696969; width:125px; left: 0px; min-height:500px; padding:5px; height: 125px;">
<div class="SideButtonPara" style="text-align:center; padding-top:5px;">
<a class="SideButton" href="index.html">
Docs home
</a>
<br/>
<hr style="border:0; border-bottom: 1px solid white;"/>
<a class="SideButton" href="cmds.html">
Commands
</a>
<br/>
<a class="SideButton" href="topics.html">
Topics
</a>
<br/>
<a class="SideButton" href="citation.html">
Publications
</a>
<br/>
</div>
</div>
<div class="ManText" style="left:20px; position: absolute; left:135px; width:695px; background-color:white; padding:10px">
<h1>
Read quality filtering
</h1>
<p>
<span class="ManText">
<b>
See also
<br/>
</b>
<a href="quality_score.html">
Quality scores
</a>
<br/>
<a href="exp_errs.html">
Expected errors
</a>
<br/>
<a href="avgq.html">
Average Q is a bad idea!
</a>
<br/>
<a href="global_trimming.html">
Global trimming
</a>
<br/>
<a href="fastq_choose_filter.html">
Choosing FASTQ filter parameters
</a>
<br/>
<br/>
Raw reads generated by a next-generation sequencing machine such as 454 or Illumina carry a predicted error probability for each base, indicated by
<a href="quality_score.html">
quality (Q) scores
</a>
. In many applications it is important to filter reads to reduce the number of errors, especially in marker gene sequencing experiments such as 16S or ITS, where it is very challenging to distinguish true biological sequences and between-sample variation from sequencing errors and PCR artifacts (chimeras and point mutations introduced during amplification).
</span>
</p>
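<p class="ManText">
As a rough illustration of how Q scores relate to error probabilities, the sketch below decodes a FASTQ quality string using the standard Phred relation P = 10^(-Q/10). It assumes an ASCII offset of 33, which is common but not universal, and the function names and the example quality string are hypothetical. It is not part of USEARCH.
</p>
<pre>
```python
def decode_quality(qual, offset=33):
    """Decode a FASTQ quality string into integer Q scores.

    The ASCII offset of 33 is an assumption; some older files use 64.
    """
    return [ord(c) - offset for c in qual]


def phred_to_error_prob(q):
    """Convert a Phred Q score to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Hypothetical quality string: 'I' is Q40, '5' is Q20, '+' is Q10.
qual = "II5+"
scores = decode_quality(qual)
probs = [phred_to_error_prob(q) for q in scores]

for q, p in zip(scores, probs):
    print(f"Q{q}: predicted error probability {p:.4f}")
```
</pre>
<p class="ManText">
A drop of 10 in Q corresponds to a tenfold increase in the predicted error probability, so a few low-Q bases can dominate the errors in an otherwise high-quality read.
</p>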
<p>
<span class="ManText">
In USEARCH, quality filtering is done with the
<a href="cmd_fastq_filter.html">
fastq_filter
</a>
command. I strongly recommend using
<a href="exp_errs.html">
expected error filtering
</a>
.
</span>
</p>
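<p class="ManText">
The idea behind expected error filtering can be sketched in a few lines of Python. The expected number of errors in a read is the sum of its per-base error probabilities, and a read is discarded if that sum exceeds a chosen threshold. This is a minimal sketch and not the fastq_filter implementation: the function names, the threshold of 1.0, the ASCII offset of 33 and the example reads are all assumptions made for illustration.
</p>
<pre>
```python
def expected_errors(qual, offset=33):
    """Sum of per-base error probabilities for one read's quality string.

    Assumes Phred scores with ASCII offset 33.
    """
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def filter_reads(records, max_ee=1.0):
    """Yield (header, seq, qual) tuples whose expected errors do not exceed max_ee.

    Keeping reads that sit exactly on the threshold is an assumption of this sketch.
    """
    for header, seq, qual in records:
        if expected_errors(qual) > max_ee:
            continue  # too many expected errors, discard the read
        yield header, seq, qual


# Hypothetical reads: 'I' encodes Q40 and '+' encodes Q10.
reads = [
    ("@read1", "ACGT", "IIII"),             # E = 4 * 0.0001 = 0.0004
    ("@read2", "ACGT", "++++"),             # E = 4 * 0.1 = 0.4
    ("@read3", "ACGTACGTACGT", "+" * 12),   # E = 12 * 0.1 = 1.2
]
kept = [header for header, _, _ in filter_reads(reads, max_ee=1.0)]
print(kept)
```
</pre>
<p class="ManText">
In this example only the third read is discarded. Because expected errors are summed over the whole read, the measure penalizes long reads with many mediocre bases as well as short reads with a few very poor ones.
</p>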
<p class="ManText">
You can use
fastx_learn
to estimate the error rate after filtering.
</p>
<p>
<span class="ManText">
There is an important difference between Q scores in pyrosequencing reads from 454 and Q scores in Illumina reads. In effect, 454 ignores the possibility of substitution errors and Illumina ignores indels. With 454, the Q score reflects the estimated probability that the length of the current homopolymer is wrong, and with Illumina it reflects the probability that the base call is wrong. For Illumina this is reasonable, because indel errors are very rare. But with 454, substitution errors are quite common, occurring at a frequency comparable to homopolymer errors. This means that 454 Q scores are less predictive of read errors than Illumina Q scores.
<br/>
</span>
</p>
</div>
</div>
</body>
</html>