
Step 1: makeblastdb -dbtype: prot for protein sequences, nucl for nucleotide sequences
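Here is a minimal sketch of step 1, assuming the reference sequences sit in a single FASTA file. The hypothetical helper make_blast_db maps the sequence type to makeblastdb's -dbtype value and builds the database under the name you later pass to -db. Only makeblastdb and its -in, -dbtype and -out options come from BLAST+. The function name and the DRY_RUN switch, which prints the command instead of running it, are illustrative additions.

```shell
#!/usr/bin/env bash
# Sketch: build a BLAST database with makeblastdb (BLAST+).
# make_blast_db and DRY_RUN are hypothetical; -in, -dbtype and -out are
# real makeblastdb options.

# make_blast_db FASTA SEQ_TYPE DB_NAME
# SEQ_TYPE "protein" maps to -dbtype prot, "nucleotide" to -dbtype nucl.
make_blast_db() {
    local fasta=$1 seq_type=$2 db_name=$3 dbtype
    case "$seq_type" in
        protein|prot)    dbtype=prot ;;
        nucleotide|nucl) dbtype=nucl ;;
        *) echo "unknown sequence type: $seq_type" >&2; return 1 ;;
    esac
    local cmd=(makeblastdb -in "$fasta" -dbtype "$dbtype" -out "$db_name")
    if [[ -n ${DRY_RUN:-} ]]; then
        echo "${cmd[*]}"    # print the command instead of running it
    else
        "${cmd[@]}"
    fi
}
```

For example, make_blast_db plant_rna.fa nucleotide plant_rna would build a nucleotide database named plant_rna, like the one searched in the example below. The file name plant_rna.fa is hypothetical.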
Step 2: blastn -query $query -out $outname -db $database_name -evalue 1e-5
blastp -query $query -out $outname -db $database_name (optional: -evalue 1e-5)
* blastp -query $name.fasta -out $name.txt -db $database_name -evalue 1e-5 -max_target_seqs 3 -outfmt "7 sacc"
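-query takes one FASTA file, which may hold many sequences, and -out takes one file name. With several query files, -query *.fasta would expand into multiple arguments, which blastp does not accept, so the usual approach is a shell loop. The sketch below assumes the query files end in .fasta and sit in the current directory. run_blastp_batch and DRY_RUN are hypothetical names, and the BLAST options are the ones from the blastp line above.

```shell
#!/usr/bin/env bash
# Sketch: one blastp run per query file, because -query and -out each take
# a single file name. run_blastp_batch and DRY_RUN are hypothetical names.

# run_blastp_batch DB_NAME
# The body runs in a subshell ( ... ) so that nullglob does not leak out.
run_blastp_batch() (
    db=$1
    shopt -s nullglob                # no *.fasta files -> loop runs zero times
    for query in *.fasta; do
        out="${query%.fasta}.txt"    # sample1.fasta -> sample1.txt
        cmd=(blastp -query "$query" -out "$out" -db "$db"
             -evalue 1e-5 -max_target_seqs 3 -outfmt "7 sacc")
        if [[ -n ${DRY_RUN:-} ]]; then
            echo "${cmd[*]}"         # print the command instead of running it
        else
            "${cmd[@]}" || exit 1    # stop at the first failed search
        fi
    done
)

# Usage: run_blastp_batch "$database_name"
```

Each query file gets its own result file, so the outputs for different samples stay separate.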
Step 3, example: blastn -db plant_rna -query test.fa -out test.out -evalue 0.00001 -max_target_seqs 5 -num_threads 4 -outfmt "7 qacc sacc evalue length pident"
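blastn: hardly needs explaining. It aligns a nucleotide query against a nucleotide database.
-db: the database to search (see my previous post for details)
-query: the input query sequence(s), in FASTA format
-out: the output file
-evalue: the E-value cutoff. Hits with a larger E-value are not reported.
-max_target_seqs: the maximum number of aligned target (subject) sequences to keep. I used to use -b 5 -v 5 for this. In legacy blastall, -v set the number of one-line descriptions and -b the number of alignments shown. Their BLAST+ counterparts are -num_descriptions and -num_alignments, which apply to the report-style formats (-outfmt 0 to 4). -max_target_seqs is intended for the tabular and XML formats and cannot be combined with those two options.
-num_threads: the number of CPU threads to use (depends on your system; replaces the old -a option)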
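-outfmt "7 qacc sacc evalue length pident": this is the slickest new feature in BLAST+. It controls the output format directly, so you no longer need a separate parser. 7 means tabular output with comment lines. You choose which fields to output by listing them after the 7, separated by spaces, and you wrap the whole specification in double quotes. In this example, qacc is the query sequence accession, sacc the subject (target) sequence accession, evalue the E-value, length the alignment length, and pident the percentage of identical matches. The other available fields are listed below: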
*** Formatting options
-outfmt: alignment view options:
  0 = pairwise,
  1 = query-anchored showing identities,
  2 = query-anchored no identities,
  3 = flat query-anchored, show identities,
  4 = flat query-anchored, no identities,
  5 = XML Blast output,
  6 = tabular,
  7 = tabular with comment lines,
  8 = Text ASN.1,
  9 = Binary ASN.1,
  10 = Comma-separated values
Options 6, 7, and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers. The supported format specifiers are:
  qseqid means Query Seq-id
  qgi means Query GI
  qacc means Query accession
  sseqid means Subject Seq-id
  sallseqid means All subject Seq-id(s), separated by a ';'
  sgi means Subject GI
  sallgi means All subject GIs
  sacc means Subject accession
  sallacc means All subject accessions
  qstart means Start of alignment in query
  qend means End of alignment in query
  sstart means Start of alignment in subject
  send means End of alignment in subject
  qseq means Aligned part of query sequence
  sseq means Aligned part of subject sequence
  evalue means Expect value
  bitscore means Bit score
  score means Raw score
  length means Alignment length
  pident means Percentage of identical matches
  nident means Number of identical matches
  mismatch means Number of mismatches
  positive means Number of positive-scoring matches
  gapopen means Number of gap openings
  gaps means Total number of gaps
  ppos means Percentage of positive-scoring matches
  frames means Query and subject frames separated by a '/'
  qframe means Query frame
  sframe means Subject frame
When not provided, the default value is: 'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std'
Default = `0'
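Format 7 is plain tab-separated text with "#" comment lines, so standard Unix tools are enough to post-process it. The sketch below assumes output produced with -outfmt "7 qacc sacc evalue length pident", as in the example command above. The columns are therefore qacc, sacc, evalue, length and pident. The function names and the cut-off values are hypothetical. count_hits counts result lines. A single subject can contribute more than one line, one per HSP, so the count can exceed -max_target_seqs.

```shell
#!/usr/bin/env bash
# Sketch: post-process output from -outfmt "7 qacc sacc evalue length pident".
# Columns: 1=qacc 2=sacc 3=evalue 4=length 5=pident.
# filter_hits and count_hits are hypothetical helper names.

# filter_hits FILE MIN_PIDENT MIN_LENGTH
# Skip the "#" comment lines and keep hits that pass both cut-offs.
filter_hits() {
    awk -F '\t' -v min_id="$2" -v min_len="$3" \
        '!/^#/ && NF >= 5 && $5 >= min_id && $4 >= min_len' "$1"
}

# count_hits FILE
# Print "query<TAB>number of result lines" per query, in first-seen order.
count_hits() {
    awk -F '\t' '!/^#/ && NF >= 5 {
        if (!($1 in n)) order[++k] = $1
        n[$1]++
    }
    END { for (i = 1; i <= k; i++) print order[i] "\t" n[order[i]] }' "$1"
}
```

For example, filter_hits test.out 90 100 would keep only hits with at least 90% identity over at least 100 aligned positions. Output from -outfmt 6 has the same columns without the comment lines, so both functions work on it unchanged.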
