CLC bio accelerates Sean Eddy's HMMER algorithm with innovative technology

29 Jul 2007

At the 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) & 6th European Conference on Computational Biology (ECCB), this week in Vienna, CLC bio released an accelerated version of Sean Eddy's popular HMMER software suite for protein sequence analysis.

Senior Scientific Officer at CLC bio, Dr. Jannick Bendtsen, states:
'Our implementation of HMMER runs on CLC Bioinformatics Cell, which uses innovative SIMD technology to unleash already existing powers in the computer processor to accelerate HMMER 25 times. Thus, searches previously taking more than 3 hours can be done in less than 10 minutes. This implementation works both with a graphical interface through our commercial workbenches, for example CLC Combined Workbench, or through a command-line interface - and it works on Mac OS X, Windows, and Linux.'
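
As a quick check of the figures quoted above, the following minimal sketch divides a baseline run time by the reported 25-fold acceleration. The function name `accelerated_minutes` is purely illustrative and is not part of any CLC bio product.

```python
def accelerated_minutes(baseline_hours: float, speedup: float) -> float:
    """Return the run time in minutes after applying a speed-up factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours * 60.0 / speedup


# A 3-hour search accelerated 25 times, as in the quote above.
print(accelerated_minutes(3, 25))
```

At a 25-fold speed-up, a 3-hour search takes a little over 7 minutes, which is consistent with the "less than 10 minutes" figure for searches only slightly longer than 3 hours.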

With CLC Bioinformatics Cell, users are able to utilize hidden computational power in their computer or server. CLC Bioinformatics Cell uses SIMD technology to parallelize and thereby accelerate bioinformatics algorithms on computers powered by Intel and AMD processors, as well as on computer clusters. The Bioinformatics Cell is optimized for Smith-Waterman searches, ClustalW alignments, and HMMER protein sequence analyses.

Professor Ronald Worthington, Pharmaceutical Sciences at Southern Illinois University, previously stated,
'One of the big things that genomics companies do, is constantly screen the new data that's coming online to look for targets of interest using the HMMER software with the Pfam database - but it's really, really slow. Some Pfam searches we have done went on for weeks, even with dual-processor computers. If CLC bio can produce an SIMD-enabled HMMER, it will be huge for the genomics industry and academics.'
Quote from Genome Technology Magazine, May 2007

CLC Bioinformatics Cell

CLC bio

High-performance Computing

Would you like to perform fast and precise database searches, using Smith-Waterman at the speed of BLAST? With a CLC Bioinformatics Cell you’re able to utilize the hidden computational power in your computer or server. With the Cell you can speed up a Smith-Waterman search that previously took two hours to around one minute!

The Cell includes the fastest Smith-Waterman implementation ever made on standard hardware: nucleotide searches are accelerated up to 110 times and protein searches are accelerated up to 50 times on new computers. The Cell thus removes the argument for using BLAST in situations where you need not just some of the answers but all of the answers.

Integrated

The Cell is fully integrated with the most comprehensive bioinformatics desktop software applications on the market: CLC Gene Workbench, CLC Protein Workbench, and CLC Combined Workbench.

The Cell thus benefits from all the features of CLC bio’s workbenches, such as the graphical user interface, the option of running multiple analyses in batch runs, the data storage options, and the graphical and tabular overviews of the results. Using high-performance computing has never been easier or more effective. Command-line execution is also an option, which means you can integrate high-performance calculations into your existing bioinformatics workflow of scripts and in-house programs.

Flexible

This innovative approach to hardware-accelerated bioinformatics provides flexibility in a variety of ways:

Physical location: No need for a cooled server room or new expensive hardware to perform high-performance computing. All you need is your computer.
Operating system: As long as your computer has an Intel or AMD processor, the Cell works on Windows, Linux, and Mac OS X.
Software use: The Cell can be used as an integrated part of all CLC bio’s workbenches, or through a command-line interface.
Customization: With CLC Developer Kit, you can create your own workflows including the functionalities of the Cell.

Utilize the full power of your computer cluster

If you have a computer cluster, or a computer with multiple CPUs or cores, the Cell provides you with a solution that accelerates your calculations in proportion to the number of Cells implemented. Thus, a computer cluster with 1 0 Cells will provide a speed-up factor of 0.

Smith-Waterman searches: when you need ALL the answers

Finding homologous DNA, RNA, or protein sequences in a database can be done in many ways. A Smith-Waterman-based search is the only method that identifies all true hits, but the algorithm is very slow when working on large datasets, and most scientists therefore use the much faster BLAST search algorithm.

But the speed of BLAST comes at the expense of quality. The sensitivity of BLAST is in fact so low that there is a significant risk of missing important sequences of interest. Compared to Smith-Waterman-based searches, up to 50% of the search hits are thus not found using BLAST.

With the Cell, you can speed up a Smith-Waterman search previously taking half a day to around 0 minutes, and the Cell thus removes the argument for using BLAST, at least in situations where you need not just some of the answers but all of the answers. The Cell includes the fastest Smith-Waterman implementation ever made on standard hardware: nucleotide searches are accelerated up to 110 times and protein searches are accelerated up to 45 times on most modern computers.

ClustalW alignments

ClustalW is one of the most widely used methods for multiple sequence alignment. When aligning many sequences or long sequences, however, the speed of the algorithm is not impressive.

Because ClustalW is so widely used, we have implemented a SIMD version of it in the Cell, resulting in an acceleration of 4-5 times on most computers.
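
As background to the Smith-Waterman searches described above, the following is a minimal, unoptimized sketch of the local-alignment scoring recurrence that the algorithm is built on. It is a plain scalar version for illustration only, not CLC bio's SIMD implementation; the function name and the scoring values (match 2, mismatch -1, linear gap penalty -2) are assumptions chosen for the example.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the scoring
    matrix in memory. Scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Two sequences that share the local match "ACG".
print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Every cell of the scoring matrix must be computed, so the work grows with the product of the two sequence lengths. That cost is what makes a full database search slow, and it is the kind of computation that SIMD instructions can process several cells at a time.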
