<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/rss.css" type="text/css"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
    xmlns:cc="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:extra="http://www.w3.org/1999/xhtml"
    xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel rdf:about="http://almob.org/feeds/latestarticles/journal?quantity=&amp;format=rss&amp;version=">
        <title>Algorithms for Molecular Biology - Latest Articles</title>
        <link>http://www.almob.org</link>
        <description>The latest research articles published by Algorithms for Molecular Biology</description>
        <dc:date>2009-06-24T00:00:00Z</dc:date>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="http://www.almob.org/content/4/1/9" />
                <rdf:li rdf:resource="http://www.almob.org/content/4/1/8" />
                <rdf:li rdf:resource="http://www.almob.org/content/4/1/7" />
                <rdf:li rdf:resource="http://www.almob.org/content/4/1/6" />
                <rdf:li rdf:resource="http://www.almob.org/content/4/1/5" />
                <rdf:li rdf:resource="http://www.almob.org/content/4/1/4" />
                <rdf:li rdf:resource="http://www.almob.org/content/4/1/3" />
                <rdf:li rdf:resource="http://www.almob.org/content/4/1/2" />
                <rdf:li rdf:resource="http://www.almob.org/content/4/1/1" />
                <rdf:li rdf:resource="http://www.almob.org/content/3/1/16" />
            </rdf:Seq>
        </items>
        <extra:info rdf:parseType="Literal">
            <html:div style="font:14px Verdana, Geneva, Arial, Helvetica, sans-serif" xmlns:html="http://www.w3.org/1999/xhtml">
                <html:span style="font-weight:bold">
                    This is an RSS newsfeed from BioMed Central
                </html:span>
                <html:br />
                <html:span style="font-size: 12px;">
                    It is intended to be used with an RSS reader. For more information about RSS newsfeeds from BioMed Central, visit
                    <html:br />
                    <html:a href="http://www.biomedcentral.com/info/about/rss/" style="color:#3333CC; font-size:12px;">
                        http://www.biomedcentral.com/info/about/rss/
                    </html:a>
                    <html:br />
                </html:span>
            </html:div>
        </extra:info>
        <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </channel>
        <item rdf:about="http://www.almob.org/content/4/1/9">
        <title>Review of Bioinformatics: A Computing Perspective edited by Shuba Gopal, Anne Haake, Rhys Price Jones and Paul Tymann</title>
        <description>No description available</description>
        <link>http://www.almob.org/content/4/1/9</link>
                <dc:creator>Dae-Won Kim</dc:creator>
                <dc:creator>Hong-Seog Park</dc:creator>
                <dc:source>Algorithms for Molecular Biology 2009, 4:9</dc:source>
        <dc:date>2009-06-24T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1748-7188-4-9</dc:identifier>
        <prism:publicationName>Algorithms for Molecular Biology</prism:publicationName>
        <prism:issn>1748-7188</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>9</prism:startingPage>
        <prism:publicationDate>2009-06-24T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.almob.org/content/4/1/8">
        <title>A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series</title>
        <description>Background:
The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series, obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters.
Methods:
In this work, we propose e-CCC-Biclustering, a biclustering algorithm that finds and reports all maximal contiguous column coherent biclusters with approximate expression patterns in time polynomial in the size of the time series gene expression matrix. This polynomial time complexity is achieved by manipulating a discretized version of the original matrix using efficient string processing techniques. We also propose extensions to deal with missing values, discover anticorrelated expression patterns and different ways to compute the errors allowed in the expression patterns. We propose a statistical test to score the biclusters according to the relevance of their approximate expression pattern together with a method to filter highly overlapping biclusters.
Results:
We present results in real data showing the effectiveness of e-CCC-Biclustering and its relevance in the discovery of regulatory modules describing the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress. In particular, the results show the advantage of considering approximate patterns when compared to state of the art methods that require exact matching of gene expression time series.
Discussion:
The identification of co-regulated genes, involved in specific biological processes, remains one of the main avenues open to researchers studying gene regulatory networks. The ability of the proposed methodology to efficiently identify sets of genes with similar expression patterns is shown to be instrumental in the discovery of relevant biological phenomena, leading to more convincing evidence of specific regulatory mechanisms.
Availability:
A prototype implementation of the algorithm together with the dataset and examples used in the paper is available in http://kdbio.inesc-id.pt/software/e-CCC-Biclustering.</description>
        <link>http://www.almob.org/content/4/1/8</link>
                <dc:creator>Sara Madeira</dc:creator>
                <dc:creator>Arlindo Oliveira</dc:creator>
                <dc:source>Algorithms for Molecular Biology 2009, 4:8</dc:source>
        <dc:date>2009-06-04T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1748-7188-4-8</dc:identifier>
        <prism:publicationName>Algorithms for Molecular Biology</prism:publicationName>
        <prism:issn>1748-7188</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>8</prism:startingPage>
        <prism:publicationDate>2009-06-04T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>PDF</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.almob.org/content/4/1/7">
        <title>Ranking differentially expressed genes from Affymetrix gene expression data: methods with reproducibility, sensitivity, and specificity</title>
        <description>Background:
To identify differentially expressed genes (DEGs) from microarray data, users of the Affymetrix GeneChip system need to select both a preprocessing algorithm to obtain expression-level measurements and a way of ranking genes to obtain the most plausible candidates. We recently recommended suitable combinations of a preprocessing algorithm and gene ranking method that can be used to identify DEGs with a higher level of sensitivity and specificity. However, in addition to these recommendations, researchers also want to know which combinations enhance reproducibility.
Results:
We compared eight conventional methods for ranking genes: weighted average difference (WAD), average difference (AD), fold change (FC), rank products (RP), moderated t statistic (modT), significance analysis of microarrays (samT), shrinkage t statistic (shrinkT), and intensity-based moderated t statistic (ibmT) with six preprocessing algorithms (PLIER, VSN, FARMS, multi-mgMOS (mmgMOS), MBEI, and GCRMA). A total of 36 real experimental datasets was evaluated on the basis of the area under the receiver operating characteristic curve (AUC) as a measure for both sensitivity and specificity. We found that the RP method performed well for VSN-, FARMS-, MBEI-, and GCRMA-preprocessed data, and the WAD method performed well for mmgMOS-preprocessed data. Our analysis of the MicroArray Quality Control (MAQC) project&apos;s datasets showed that the FC-based gene ranking methods (WAD, AD, FC, and RP) had a higher level of reproducibility: The percentages of overlapping genes (POGs) across different sites for the FC-based methods were higher overall than those for the t-statistic-based methods (modT, samT, shrinkT, and ibmT). In particular, POG values for WAD were the highest overall among the FC-based methods irrespective of the choice of preprocessing algorithm.
Conclusion:
Our results demonstrate that to increase sensitivity, specificity, and reproducibility in microarray analyses, we need to select suitable combinations of preprocessing algorithms and gene ranking methods. We recommend the use of FC-based methods, in particular RP or WAD.</description>
        <link>http://www.almob.org/content/4/1/7</link>
                <dc:creator>Koji Kadota</dc:creator>
                <dc:creator>Yuji Nakai</dc:creator>
                <dc:creator>Kentaro Shimizu</dc:creator>
                <dc:source>Algorithms for Molecular Biology 2009, 4:7</dc:source>
        <dc:date>2009-04-22T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1748-7188-4-7</dc:identifier>
        <prism:publicationName>Algorithms for Molecular Biology</prism:publicationName>
        <prism:issn>1748-7188</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>7</prism:startingPage>
        <prism:publicationDate>2009-04-22T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.almob.org/content/4/1/6">
        <title>Evolving DNA motifs to predict GeneChip probe performance</title>
        <description>Background:
Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI&apos;s GEO database to indicate the quality of individual HG-U133A probes. Low correlation indicates a poor probe.
Results:
Regular expressions can be automatically created from a Backus-Naur form (BNF) context-free grammar using strongly typed genetic programming.
Conclusion:
The automatically produced motif is better at predicting poor DNA sequences than an existing human generated RE, suggesting runs of Cytosine and Guanine and mixtures should all be avoided.</description>
        <link>http://www.almob.org/content/4/1/6</link>
                <dc:creator>W Langdon</dc:creator>
                <dc:creator>A Harrison</dc:creator>
                <dc:source>Algorithms for Molecular Biology 2009, 4:6</dc:source>
        <dc:date>2009-03-19T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1748-7188-4-6</dc:identifier>
        <prism:publicationName>Algorithms for Molecular Biology</prism:publicationName>
        <prism:issn>1748-7188</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>6</prism:startingPage>
        <prism:publicationDate>2009-03-19T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.almob.org/content/4/1/5">
        <title>A Linear Programming Approach for estimating the structure of a Sparse Linear Genetic Network from transcript profiling data</title>
        <description>Background:
A genetic network can be represented as a directed graph in which a node corresponds to a gene and a directed edge specifies the direction of influence of one gene on another. The reconstruction of such networks from transcript profiling data remains an important yet challenging endeavor. A transcript profile specifies the abundances of many genes in a biological sample of interest. Prevailing strategies for learning the structure of a genetic network from high-dimensional transcript profiling data assume sparsity and linearity. Many methods consider relatively small directed graphs, inferring graphs with up to a few hundred nodes. This work examines large undirected graph representations of genetic networks, graphs with many thousands of nodes where an undirected edge between two nodes does not indicate the direction of influence, and the problem of estimating the structure of such a sparse linear genetic network (SLGN) from transcript profiling data.
Results:
The structure learning task is cast as a sparse linear regression problem which is then posed as a LASSO (l1-constrained fitting) problem and solved finally by formulating a Linear Program (LP). A bound on the Generalization Error of this approach is given in terms of the Leave-One-Out Error. The accuracy and utility of LP-SLGNs are assessed quantitatively and qualitatively using simulated and real data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative provides gold standard data sets and evaluation metrics that enable and facilitate the comparison of algorithms for deducing the structure of networks. The structures of LP-SLGNs estimated from the INSILICO1, INSILICO2 and INSILICO3 simulated DREAM2 data sets are comparable to those proposed by the first and/or second ranked teams in the DREAM2 competition. The structures of LP-SLGNs estimated from two published Saccharomyces cerevisiae cell cycle transcript profiling data sets capture known regulatory associations. In each S. cerevisiae LP-SLGN, the number of nodes with a particular degree follows an approximate power law suggesting that its degree distribution is similar to that observed in real-world networks. Inspection of these LP-SLGNs suggests biological hypotheses amenable to experimental verification.
Conclusion:
A statistically robust and computationally efficient LP-based method for estimating the topology of a large sparse undirected graph from high-dimensional data yields representations of genetic networks that are biologically plausible and useful abstractions of the structures of real genetic networks. Analysis of the statistical and topological properties of learned LP-SLGNs may have practical value; for example, genes with high random walk betweenness, a measure of the centrality of a node in a graph, are good candidates for intervention studies and hence integrated computational &#8211; experimental investigations designed to infer more realistic and sophisticated probabilistic directed graphical model representations of genetic networks. The LP-based solutions of the sparse linear regression problem described here may provide a method for learning the structure of transcription factor networks from transcript profiling and transcription factor binding motif data.</description>
        <link>http://www.almob.org/content/4/1/5</link>
                <dc:creator>Sahely Bhadra</dc:creator>
                <dc:creator>Chiranjib Bhattacharyya</dc:creator>
                <dc:creator>Nagasuma Chandra</dc:creator>
                <dc:creator>I. Saira Mian</dc:creator>
                <dc:source>Algorithms for Molecular Biology 2009, 4:5</dc:source>
        <dc:date>2009-02-24T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1748-7188-4-5</dc:identifier>
        <prism:publicationName>Algorithms for Molecular Biology</prism:publicationName>
        <prism:issn>1748-7188</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>5</prism:startingPage>
        <prism:publicationDate>2009-02-24T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.almob.org/content/4/1/4">
        <title>An image processing approach to computing distances between RNA secondary structures dot plots</title>
        <description>Background:
Computing the distance between two RNA secondary structures can contribute to understanding the functional relationship between them. When used repeatedly, such a procedure may lead to finding a query RNA structure of interest in a database of structures. Several methods are available for computing distances between RNAs represented as strings or graphs, but none utilize the RNA representation with dot plots. Since dot plots are essentially digital images, there is a clear motivation to devise an algorithm for computing the distance between dot plots based on image processing methods.
Results:
We have developed a new metric dubbed &apos;DoPloCompare&apos;, which compares two RNA structures. The method is based on comparing dot plot diagrams that represent the secondary structures. When analyzing two diagrams and motivated by image processing, the distance is based on a combination of histogram correlations and a geometrical distance measure. We introduce, describe, and illustrate the procedure by two applications that utilize this metric on RNA sequences. The first application is the RNA design problem, where the goal is to find the nucleotide sequence for a given secondary structure. Examples where our proposed distance measure outperforms others are given. The second application locates peculiar point mutations that induce significant structural alterations relative to the wild type predicted secondary structure. The approach reported in the past to solve this problem was tested on several RNA sequences with known secondary structures to affirm their prediction, as well as on a data set of ribosomal pieces. These pieces were computationally cut from a ribosome for which an experimentally derived secondary structure is available, and on each piece the prediction conveys similarity to the experimental result. Our newly proposed distance measure shows benefit in this problem as well when compared to standard methods used for assessing the distance similarity between two RNA secondary structures.
Conclusion:
Inspired by image processing and the dot plot representation for RNA secondary structure, we have managed to provide a conceptually new and potentially beneficial metric for comparing two RNA secondary structures. We illustrated our approach on the RNA design problem, as well as on an application that utilizes the distance measure to detect conformational rearranging point mutations in an RNA sequence.</description>
        <link>http://www.almob.org/content/4/1/4</link>
                <dc:creator>Tor Ivry</dc:creator>
                <dc:creator>Shahar Michal</dc:creator>
                <dc:creator>Assaf Avihoo</dc:creator>
                <dc:creator>Guillermo Sapiro</dc:creator>
                <dc:creator>Danny Barash</dc:creator>
                <dc:source>Algorithms for Molecular Biology 2009, 4:4</dc:source>
        <dc:date>2009-02-09T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1748-7188-4-4</dc:identifier>
        <prism:publicationName>Algorithms for Molecular Biology</prism:publicationName>
        <prism:issn>1748-7188</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>4</prism:startingPage>
        <prism:publicationDate>2009-02-09T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.almob.org/content/4/1/3">
        <title>Lossless filter for multiple repeats with bounded edit distance</title>
        <description>Background:
Identifying local similarity between two or more sequences, or identifying repeats occurring at least twice in a sequence, is an essential part in the analysis of biological sequences and of their phylogenetic relationship. Finding such fragments while allowing for a certain number of insertions, deletions, and substitutions, is however known to be a computationally expensive task, and consequently exact methods can usually not be applied in practice.
Results:
The filter TUIUIU that we introduce in this paper provides a possible solution to this problem. It can be used as a preprocessing step to any multiple alignment or repeats inference method, eliminating a possibly large fraction of the input that is guaranteed not to contain any approximate repeat. It consists of verifying several strong necessary conditions that can be checked quickly. We implemented three versions of the filter. The first is simply a straightforward extension to the case of multiple sequences of an application of conditions already existing in the literature. The second uses a stronger condition which, as our results show, enables it to filter considerably more with negligible (if any) additional time. The third version uses an additional condition and pushes the sensibility of the filter even further with a non-negligible additional time in many circumstances; our experiments show that it is particularly useful with large error rates. The latter version was applied as a preprocessing of a multiple alignment tool, obtaining an overall time (filter plus alignment) on average 63 and at best 530 times smaller than before (direct alignment), with in most cases a better quality alignment.
Conclusion:
To the best of our knowledge, TUIUIU is the first filter designed for multiple repeats and for dealing with error rates greater than 10% of the repeats length.</description>
        <link>http://www.almob.org/content/4/1/3</link>
                <dc:creator>Pierre Peterlongo</dc:creator>
                <dc:creator>Gustavo Akio Tominaga Sacomoto</dc:creator>
                <dc:creator>Alair Pereira do Lago</dc:creator>
                <dc:creator>Nadia Pisanti</dc:creator>
                <dc:creator>Marie-France Sagot</dc:creator>
                <dc:source>Algorithms for Molecular Biology 2009, 4:3</dc:source>
        <dc:date>2009-01-30T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1748-7188-4-3</dc:identifier>
        <prism:publicationName>Algorithms for Molecular Biology</prism:publicationName>
        <prism:issn>1748-7188</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>3</prism:startingPage>
        <prism:publicationDate>2009-01-30T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.almob.org/content/4/1/2">
        <title>CHSMiner: a GUI tool to identify chromosomal homologous segments</title>
        <description>Background:
The identification of chromosomal homologous segments (CHS) within and between genomes is essential for comparative genomics. Various processes including insertion/deletion and inversion could cause the degeneration of CHSs.
Results:
Here we present CHSMiner, a Java program that detects CHSs based on shared gene content alone. It implements a fast greedy search algorithm and rigorous statistical validation, and its friendly graphical interface allows interactive visualization of the results. We tested the software on both simulated and biologically realistic data and compared its performance with similar existing software and data sources.
Conclusion:
CHSMiner is characterized by its integrated workflow, fast speed and convenient usage. It will be useful for both experimentalists and bioinformaticians interested in the structure and evolution of genomes.</description>
        <link>http://www.almob.org/content/4/1/2</link>
                <dc:creator>Zhen Wang</dc:creator>
                <dc:creator>Guohui Ding</dc:creator>
                <dc:creator>Zhonghao Yu</dc:creator>
                <dc:creator>Lei Liu</dc:creator>
                <dc:creator>Yixue Li</dc:creator>
                <dc:source>Algorithms for Molecular Biology 2009, 4:2</dc:source>
        <dc:date>2009-01-15T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1748-7188-4-2</dc:identifier>
        <prism:publicationName>Algorithms for Molecular Biology</prism:publicationName>
        <prism:issn>1748-7188</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>2</prism:startingPage>
        <prism:publicationDate>2009-01-15T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.almob.org/content/4/1/1">
        <title>Auto-validating von Neumann rejection sampling from small phylogenetic tree spaces</title>
        <description>Background:
In phylogenetic inference one is interested in obtaining samples from the posterior distribution over the tree space on the basis of some observed DNA sequence data. One of the simplest sampling methods is the rejection sampler due to von Neumann. Here we introduce an auto-validating version of the rejection sampler, via interval analysis, to rigorously draw samples from posterior distributions over small phylogenetic tree spaces.
Results:
The posterior samples from the auto-validating sampler are used to rigorously (i) estimate posterior probabilities for different rooted topologies based on mitochondrial DNA from human, chimpanzee and gorilla, (ii) conduct a non-parametric test of rate variation between protein-coding and tRNA-coding sites from three primates and (iii) obtain a posterior estimate of the human-Neanderthal divergence time.
Conclusion:
This solves the open problem of rigorously drawing independent and identically distributed samples from the posterior distribution over rooted and unrooted small tree spaces (3 or 4 taxa) based on any multiply-aligned sequence data.</description>
        <link>http://www.almob.org/content/4/1/1</link>
                <dc:creator>Raazesh Sainudiin</dc:creator>
                <dc:creator>Thomas York</dc:creator>
                <dc:source>Algorithms for Molecular Biology 2009, 4:1</dc:source>
        <dc:date>2009-01-07T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1748-7188-4-1</dc:identifier>
        <prism:publicationName>Algorithms for Molecular Biology</prism:publicationName>
        <prism:issn>1748-7188</prism:issn>
        <prism:volume>4</prism:volume>
        <prism:startingPage>1</prism:startingPage>
        <prism:publicationDate>2009-01-07T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.almob.org/content/3/1/16">
        <title>HuMiTar: A sequence-based method for prediction of human microRNA targets</title>
        <description>Background:
MicroRNAs (miRs) are small noncoding RNAs that bind to complementary/partially complementary sites in the 3&apos; untranslated regions of target genes to regulate protein production of the target transcript and to induce mRNA degradation or mRNA cleavage. The ability to perform accurate, high-throughput identification of physiologically active miR targets would enable functional characterization of individual miRs. Current target prediction methods include traditional approaches that are based on specific base-pairing rules in the miR&apos;s seed region and implementation of cross-species conservation of the target site, and machine learning (ML) methods that explore patterns that contrast true and false miR-mRNA duplexes. However, in the case of the traditional methods research shows that some seed region matches that are conserved are false positives and that some of the experimentally validated target sites are not conserved.
Results:
We present HuMiTar, a computational method for identifying common targets of miRs, which is based on a scoring function that considers base-pairing for both seed and non-seed positions for human miR-mRNA duplexes. Our design shows that certain non-seed miR nucleotides, such as 14, 18, 13, 11, and 17, are characterized by a strong bias towards formation of Watson-Crick pairing. We contrasted HuMiTar with several representative competing methods on two sets of human miR targets and a set of ten glioblastoma oncogenes. Comparison with the two best performing traditional methods, PicTar and TargetScanS, and a representative ML method that considers the non-seed positions, NBmiRTar, shows that HuMiTar predictions include the majority of the predictions of the other three methods. At the same time, the proposed method is also capable of finding more true positive targets as a trade-off for an increased number of predictions. Genome-wide predictions show that the proposed method is characterized by a signal-to-noise ratio of 1.99 and a computational complexity that is linear with respect to the length of the mRNA sequence. The ROC analysis shows that HuMiTar obtains results comparable with PicTar, which are characterized by high true positive rates that are coupled with moderate values of false positive rates.
Conclusion:
The proposed HuMiTar method constitutes a step towards providing an efficient model for studying translational gene regulation by miRs.</description>
        <link>http://www.almob.org/content/3/1/16</link>
                <dc:creator>Jishou Ruan</dc:creator>
                <dc:creator>Hanzhe Chen</dc:creator>
                <dc:creator>Lukasz Kurgan</dc:creator>
                <dc:creator>Ke Chen</dc:creator>
                <dc:creator>Chunsheng Kang</dc:creator>
                <dc:creator>Peiyu Pu</dc:creator>
                <dc:source>Algorithms for Molecular Biology 2008, 3:16</dc:source>
        <dc:date>2008-12-22T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/1748-7188-3-16</dc:identifier>
        <prism:publicationName>Algorithms for Molecular Biology</prism:publicationName>
        <prism:issn>1748-7188</prism:issn>
        <prism:volume>3</prism:volume>
        <prism:startingPage>16</prism:startingPage>
        <prism:publicationDate>2008-12-22T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <cc:License rdf:about="http://creativecommons.org/licenses/by/2.0/">
        <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks" />
    </cc:License>
</rdf:RDF>
