HMMER download and installation
For Mac OS X, Linux, and other UNIX systems, compile and install from source code:
% wget ftp://selab.janelia.org/pub/software/hmmer3/3.0/hmmer-3.0.tar.gz
% tar zxf hmmer-3.0.tar.gz
% cd hmmer-3.0
% ./configure
% make
% make check
For Windows systems, download the precompiled binary package, decompress it, and use it directly.
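After installing on either platform, it can help to confirm that the HMMER programs are reachable from the command line. The following is a minimal sketch using Python's standard `shutil.which`; the helper name `find_hmmer_tools` is hypothetical, and the program names are the ones listed in the next section.

```python
import shutil

# Program names as listed in this article
HMMER_TOOLS = ["phmmer", "jackhmmer", "hmmbuild", "hmmsearch", "hmmscan",
               "hmmalign", "hmmconvert", "hmmemit", "hmmfetch", "hmmpress",
               "hmmstat"]


def find_hmmer_tools(tools=HMMER_TOOLS):
    """Return a dict mapping each program name to its full path,
    or to None when the program is not found on PATH."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    for name, path in find_hmmer_tools().items():
        print("%-12s %s" % (name, path or "NOT FOUND"))
```

If a program is reported as not found, the folder containing the HMMER binaries probably still needs to be added to the PATH environment variable.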
Programs included in HMMER
phmmer: similar to BLASTP; searches a protein sequence database with a single protein query sequence;
>phmmer tutorial/HBB_HUMAN uniprot_sprot.fa
jackhmmer: similar to PSI-BLAST; iteratively searches a protein sequence database with a protein query sequence;
>jackhmmer tutorial/HBB_HUMAN uniprot_sprot.fa
hmmbuild: builds an HMM from a multiple sequence alignment;
hmmsearch: searches a sequence database with an HMM;
hmmscan: searches an HMM database with a query sequence;
hmmalign: aligns sequences to an HMM to produce a multiple sequence alignment;
>hmmalign globins4.hmm tutorial/globins45.fa
hmmconvert: converts between HMM file formats;
hmmemit: generates (samples) sequences from an HMM;
hmmfetch: retrieves an HMM from an HMM database by name or accession number;
hmmpress: formats (compresses and indexes) an HMM database so that hmmscan can search it;
hmmstat: displays summary statistics for an HMM database.
Search a sequence database using an HMM
Use hmmbuild to build the HMM. The input is a multiple sequence alignment file in Stockholm or aligned FASTA format (for example, tutorial/globins4.sto). The command is as follows:
>hmmbuild globins4.hmm tutorial/globins4.sto
globins4.hmm is the output HMM file.
Use hmmsearch to search a protein sequence database in FASTA format. The command is as follows:
>hmmsearch globins4.hmm uniprot_sprot.fasta >globins4.out
globins4.out is the output result file.
*This example is taken from the official tutorial.
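The two steps above can also be driven from a script, which is convenient on Windows where shell redirection habits differ. This is only a sketch under assumptions: the function names are hypothetical, the file names follow the tutorial commands above, and it uses nothing beyond the plain `hmmbuild <hmmfile> <alignment>` and `hmmsearch <hmmfile> <seqdb>` invocations shown in this article.

```python
import os
import shutil
import subprocess


def hmmbuild_cmd(hmm_out, alignment):
    """Argument list equivalent to: hmmbuild <hmm_out> <alignment>"""
    return ["hmmbuild", hmm_out, alignment]


def hmmsearch_cmd(hmm, seqdb):
    """Argument list equivalent to: hmmsearch <hmm> <seqdb>"""
    return ["hmmsearch", hmm, seqdb]


def build_and_search(alignment, hmm, seqdb, result_file):
    """Run hmmbuild, then hmmsearch, saving the search report to result_file."""
    subprocess.run(hmmbuild_cmd(hmm, alignment), check=True,
                   stdout=subprocess.DEVNULL)
    with open(result_file, "w") as out:
        # Same effect as the shell redirection '> globins4.out'
        subprocess.run(hmmsearch_cmd(hmm, seqdb), check=True, stdout=out)


if __name__ == "__main__":
    inputs = ["tutorial/globins4.sto", "uniprot_sprot.fasta"]
    tools_ok = shutil.which("hmmbuild") and shutil.which("hmmsearch")
    if tools_ok and all(os.path.exists(p) for p in inputs):
        build_and_search(inputs[0], "globins4.hmm", inputs[1], "globins4.out")
    else:
        print("HMMER programs or input files not found; nothing was run")
```

Passing the arguments as a list avoids quoting problems when a path contains spaces.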
Search an HMM database using protein sequences
Build an HMM database. An HMM database is a single file containing multiple HMMs. It can be downloaded from Pfam, SMART, or TIGRFAMs, or you can build it yourself from multiple sequence alignments, for example:
>hmmbuild globins4.hmm tutorial/globins4.sto
>hmmbuild fn3.hmm tutorial/fn3.sto
>hmmbuild Pkinase.hmm tutorial/Pkinase.sto
>cat globins4.hmm fn3.hmm Pkinase.hmm >minifam
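The `cat` command is not available in a plain Windows command prompt, so the concatenation step may need a substitute there. A minimal Python sketch of the same operation follows; the function name `concat_hmms` is hypothetical, and it simply appends the files byte for byte in the order given.

```python
import shutil


def concat_hmms(hmm_files, database):
    """Concatenate individual HMM files into one flat-file database.

    Equivalent to: cat file1 file2 ... > database
    """
    with open(database, "wb") as out:
        for path in hmm_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return database


# Example call, using the file names from the commands above:
# concat_hmms(["globins4.hmm", "fn3.hmm", "Pkinase.hmm"], "minifam")
```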
Use hmmpress to format the database, which compresses it and creates an index. The command is as follows:
>hmmpress minifam
This step finishes quickly, and the output is as follows:
Working… done.
Pressed and indexed 3 HMMs (3 names and 2 accessions).
Models pressed into binary file: minifam.h3m
SSI index for binary model file: minifam.h3i
Profiles (MSV part) pressed into: minifam.h3f
Profiles (remainder) pressed into: minifam.h3p
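Since hmmscan depends on these four auxiliary files, a quick check that they all exist can save a confusing error later. The sketch below is based only on the file suffixes shown in the output above; the helper name `missing_pressed_files` is hypothetical.

```python
import os

# Suffixes of the files hmmpress reports writing (see the output above)
PRESSED_SUFFIXES = (".h3m", ".h3i", ".h3f", ".h3p")


def missing_pressed_files(database):
    """Return the hmmpress output files that do not exist for `database`."""
    return [database + suffix for suffix in PRESSED_SUFFIXES
            if not os.path.exists(database + suffix)]


if __name__ == "__main__":
    missing = missing_pressed_files("minifam")
    if missing:
        print("Run hmmpress first; missing: " + ", ".join(missing))
    else:
        print("minifam is pressed and ready for hmmscan")
```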
Use hmmscan to search the HMM database. The command is as follows:
>hmmscan minifam tutorial/7LESS_DROME
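The default hmmscan report is meant for reading, not for parsing. HMMER also offers a `--tblout <file>` option that writes one whitespace-delimited line per hit, which is easier to process in a script. The parser below is a rough sketch, not an official reader: the names `Hit` and `parse_tblout` are hypothetical, and the column positions (target name first, query name third, full-sequence E-value and score fifth and sixth, free-text description after 18 fixed columns) are an assumption that should be checked against the user guide for your HMMER version.

```python
from collections import namedtuple

# Only a few of the columns are kept in this sketch
Hit = namedtuple("Hit", "target query evalue score description")


def parse_tblout(lines):
    """Parse lines of a --tblout file into a list of Hit tuples.

    Comment lines (starting with '#') and blank lines are skipped.
    Assumes 18 whitespace-delimited columns followed by a description.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 18)  # description may contain spaces
        description = fields[18] if len(fields) > 18 else ""
        hits.append(Hit(fields[0], fields[2],
                        float(fields[4]), float(fields[5]), description))
    return hits


# Example use, after running something like:
#   hmmscan --tblout hits.tbl minifam tutorial/7LESS_DROME
# with open("hits.tbl") as fh:
#     for hit in parse_tblout(fh):
#         print(hit.target, hit.evalue)
```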
I ran into this problem too. I searched online for a long time without finding a suitable solution, so I wrote a script myself. It converts each aligned FASTA file to Stockholm format and then runs hmmbuild on it. The code is as follows:
import glob  # Everything here comes from the standard library
import os

# Put the aligned FASTA files you want to build HMMs from in the same folder as this script, then run it to run hmmbuild on each
os.chdir(os.path.dirname(os.path.abspath(__file__)))  # abspath avoids an empty path when run from the script's own folder
fs = glob.glob('*.fasta')  # Collect the FASTA files. If yours use a different suffix, change the pattern here, e.g. to '*.fa*'
for f in fs:
    hmm = os.path.splitext(f)[0] + '.hmm'
    stockholm = os.path.splitext(f)[0] + '.sto'
    with open(f, 'r') as fhandle:  # Read the FASTA file into a list of '>header\nsequence' strings
        fastas = ['>' + tmp.replace('\n', '\r', 1).replace('\n', '').replace('\r', '\n') for tmp in filter(None, fhandle.read().split('>'))]
    for i in range(len(fastas)):
        fastas[i] = fastas[i].split('\n')
        fastas[i][0] = fastas[i][0].split()[0][1:10]  # Sequence name, truncated to 9 characters
        tmp = []
        for j in range((len(fastas[i][1]) + 79) // 80):  # Split the sequence into 80-column blocks (no empty trailing block)
            tmp.append(fastas[i][1][80 * j : 80 * j + 80])
        fastas[i][1] = tmp
    with open(stockholm, 'w') as out:  # Write the Stockholm file, one interleaved block at a time
        out.write('# STOCKHOLM 1.0\n\n')
        for j in range(len(fastas[0][1]) - 1):
            for i in range(len(fastas)):
                out.write('%-12s%s\n' % (fastas[i][0], fastas[i][1][j]))
            out.write('\n')
        for i in range(len(fastas)):
            out.write('%-12s%s\n' % (fastas[i][0], fastas[i][1][-1]))
        out.write('//\n')
    os.system('hmmbuild --amino %s %s' % (hmm, stockholm))  # Run hmmbuild; adjust its options here
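The script above assumes that every FASTA file already holds a valid alignment, meaning that all sequences have the same length and distinct names. It may be worth checking this before converting, especially because truncating names to 9 characters can create duplicates. The following is a small sketch of such a check; `read_aligned_fasta` and `check_alignment` are hypothetical helpers, not part of the original script or of HMMER.

```python
def read_aligned_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples.

    The name is the first word of the header; line breaks inside
    the sequence are removed.
    """
    records = []
    for block in text.split(">"):
        if not block.strip():
            continue
        header, _, body = block.partition("\n")
        words = header.split()
        name = words[0] if words else ""
        records.append((name, "".join(body.split())))
    return records


def check_alignment(records):
    """Raise ValueError unless names are unique and all rows have equal length."""
    names = [name for name, _ in records]
    if len(set(names)) != len(names):
        raise ValueError("duplicate sequence names")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError("sequences differ in length: %s" % sorted(lengths))
    return True
```

To test the effect of the 9-character truncation, pass the truncated names to `check_alignment` instead of the full ones.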
1. Start with existing bioinformatics tools. Learn how to use existing software, web servers, databases, and so on to serve biological research. Don't duplicate work: don't develop your own tool if a ready-made one will do.
2. Become familiar with command-line environments such as DOS and Linux, to the point where you can write simple shell scripts, install command-line programs, and run routine workflows. Learning how to find and install software is the most important and fundamental skill. In fact, many problems are easily solved once you find the right software package.
3. Become familiar with a simple scripting language. I personally recommend Python; for the specific reasons, please see my post. Small scripts are very useful when no ready-made tool exists or when data needs to be converted between formats. For general applications you do not need to write much code yourself: other people have most likely already encountered the problems we usually run into, so a large number of toolkits are available online. As for other programming languages, once you have mastered one, the rest come easily; R, Perl, and so on are all similar.
4. Learn the basics of algorithms and data structures so that you can understand the internal mechanisms of many programs and thus know their strengths and weaknesses; this also helps when writing your own programs. If you have the energy, go on to study statistics, machine learning, and so on.
5. Expand, research, analyze, and develop within your own biological field.