I develop and apply data mining and systems
biology bioinformatics approaches to
, with a focus on neuromuscular disease and
the functional analysis of omics data. My most recent tool is CellWhere,
which facilitates the graphical display of gene association networks organised
by subcellular localization.
PROJECTS AND INTERESTS
High-throughput "omics" technologies provide tremendous scope to identify
characteristics by which patients can be grouped. This can be used to predict
how a disease will progress differently from person to person,
or to predict which drug(s) will work best for which patients.
I'm interested in using network biology to identify clusters of interacting
genes or proteins that influence the severity of a disease or that can affect
a patient’s response to a given therapy.
Muscle, neuromuscular disease, and data mining
I'm interested in trying to make sense of muscle omics data. Which RNAs,
proteins, and other molecules are important to the normal function of the
muscle? What does it mean when
the level of one or more of these molecules is consistently changed in disease
or under experimental conditions? How do molecular expression profiles relate
to pathways, organelles, and metabolic components of the cell? How can we extract
the most information from existing datasets, and how can we best compare new datasets
with old? Achieving better approaches to these questions can help to suggest
new therapeutic avenues for neuromuscular disorders.
I also want to make bioinformatics resources more accessible to muscle
biologists, and to increase the ease of interpretation in the visual display of
Aside from muscle research and integrative systems
biology, I have a background in
mining large datasets.
Pathways, Networking, and
Since the creation of the
and standardized data formats such
as that of the Proteomics Standards Initiative, it is now trivial to obtain (e.g.
from resources such as
) a carefully curated list of experimentally-determined protein-protein interactions
(PPIs) for a given organism. Resources for heavily studied model organisms such as yeast and
mouse, and human cell lines, now list tens of thousands of PPIs, and these lists are quite
comprehensive in terms of covering all 'known' (published) interactions.
I've been using the whole interactome, consisting of all PPIs and of interactions
with other molecule types, to
help explore muscle function and especially to add another layer of information to muscle
proteomic data. My work has made use of the resources listed above, but in addition
I'm working on ways to
adapt analyses specifically towards muscle cell types. An example of this is our
CellWhere tool, which visually displays PPI networks, organizing them according to protein
subcellular location. Alongside its generic function giving the most frequently annotated subcellular
locations of given proteins, it also facilitates the highlighting of subcellular locations
that are of special pertinence to a particular research project. In our own work, we
often use the CellWhere tool to prioritize locations such as the muscle contractile apparatus,
or the neuromuscular junction.
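The location-based organisation described above can be illustrated with a minimal Python sketch. This is not CellWhere's actual implementation: the function names (`pick_location`, `layout_by_location`), the placeholder protein identifiers, and the toy annotations are all assumptions made for illustration. The idea is simply that each protein is assigned its most frequently annotated location, unless one of its annotations matches a location the user has flagged as a priority.

```python
from collections import Counter, defaultdict


def pick_location(annotations, priority=()):
    """Return a single location for a protein from its annotated locations."""
    if not annotations:
        return "unknown"
    # A location flagged as relevant to the project wins over frequency.
    for loc in priority:
        if loc in annotations:
            return loc
    # Otherwise fall back to the most frequently annotated location.
    return Counter(annotations).most_common(1)[0][0]


def layout_by_location(edges, annotations, priority=()):
    """Group every protein that appears in a PPI edge list by its location."""
    placed = {}
    for a, b in edges:
        for protein in (a, b):
            placed[protein] = pick_location(annotations.get(protein, []), priority)
    groups = defaultdict(list)
    for protein, loc in sorted(placed.items()):
        groups[loc].append(protein)
    return dict(groups)


# Hypothetical toy data: placeholder proteins and invented annotations.
edges = [("P1", "P2"), ("P2", "P3")]
annotations = {
    "P1": ["cytoplasm", "cytoplasm", "nucleus"],
    "P2": ["membrane", "neuromuscular junction", "membrane"],
    "P3": [],
}

groups = layout_by_location(edges, annotations, priority=["neuromuscular junction"])
for location, proteins in groups.items():
    print(location, proteins)
```

In this sketch the priority list is checked in order, so a project could rank several locations of interest; a real tool would also need to map synonymous annotation terms onto a shared vocabulary before counting.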
CellWhere publication (L. Zhu et al., Nucleic Acids Res. 43, W571–W575, 2015).
CellWhere is online.
I have a long-term goal of providing network exploration tools and analyses that are
adapted to the specific details of muscle cell function.
Muscle Gene Sets
More than ten thousand samples of muscle transcriptomic data have been
uploaded to the public Gene Expression Omnibus in the past ten years,
representing many millions of dollars of research expenditure and incalculable
hours of research effort. These data ought to serve as a massive reference set
for ongoing and future studies of dysferlinopathy and other neuromuscular disorders.
One way to distil the data and render them more accessible to bench researchers
is to extract from each study lists of genes ("gene sets") that were differentially
expressed. With careful curation, each transcriptomic dataset may yield multiple
comparisons, not only relating to the primary focus of that study, such as a
pathology or an experimental treatment, but also more general comparisons not
necessarily envisaged by the study’s authors, but relating to factors such as
age, sex, and muscle group.
Muscle gene sets may be used in several ways, including:
(1) to aid in the interpretation of new omics data by allowing their comparison
with previous data; (2) to uncover overlap between pathologies or treatments,
thereby identifying common signatures and possible biomarkers; and (3) to determine
which genes are frequently differentially expressed in muscle experiments and
disease, thus identifying potentially important contributors to muscle function
and pathology. We have extracted several hundred gene sets from published muscle
data, focused on in vitro studies, and are now extending this to in vivo studies.
Preliminary meta-analysis shows that muscle function ontologies are enriched among
the more frequently differentially expressed genes. We have applied muscle gene
sets to several research problems, including inflammatory response in dysferlinopathy,
myoblast regenerative capacity in muscle ageing, and the identification of
disease-contributing genetic variants.
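The three uses listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's pipeline: the function names, the placeholder gene identifiers, the gene set labels, and the universe size are all hypothetical. It shows a Jaccard index for comparing two gene sets, a one-sided hypergeometric test for whether their overlap is larger than expected by chance, and a count of how often each gene appears across sets.

```python
from collections import Counter
from math import comb


def jaccard(a, b):
    """Jaccard index: shared genes divided by all genes in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_pvalue(a, b, universe_size):
    """One-sided hypergeometric P(overlap >= observed) for two gene sets
    drawn from a universe of `universe_size` measurable genes."""
    a, b = set(a), set(b)
    observed = len(a & b)
    size_a, size_b = len(a), len(b)
    total = comb(universe_size, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe_size - size_a, size_b - k)
        for k in range(observed, min(size_a, size_b) + 1)
    )
    return tail / total


def gene_frequency(gene_sets):
    """Count in how many gene sets each gene appears."""
    counts = Counter()
    for genes in gene_sets.values():
        counts.update(set(genes))
    return counts


# Hypothetical gene sets with placeholder gene identifiers.
gene_sets = {
    "set_A": {"G1", "G2", "G3"},
    "set_B": {"G2", "G3", "G4"},
    "set_C": {"G3", "G5"},
}

print(jaccard(gene_sets["set_A"], gene_sets["set_B"]))
print(overlap_pvalue(gene_sets["set_A"], gene_sets["set_B"], universe_size=10))
print(gene_frequency(gene_sets).most_common(2))
```

The choice of universe matters in practice: it should be the genes that could have been detected on the platform, not the whole genome, otherwise the overlap p-values will be too optimistic.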
Our work on muscle gene sets is ongoing in collaboration with the
team of Silvio Bicciato at the University of Padova.
One of the more frequent and devastating muscle diseases is Duchenne muscular dystrophy
(DMD). This disease is caused by DNA mutations in the coding sequence, 79 exons in length,
of a very large filamentous structural protein called dystrophin. Generally, mutations
that result in DMD are those that render the coding sequence meaningless, whereas
other mutations that only cause loss of parts of the coding sequence can sometimes result
in a truncated but still partially functional dystrophin protein. These latter
mutations are responsible for a milder disease known as Becker muscular dystrophy.
Usually the DMD-causing mutations disrupt the codon reading frame of the mRNA transcript.
However, due to the intricacies of the codon alignments among the 79 exons, it is
possible to restore the reading frame by the use of small DNA-analogue drug molecules
that target exons surrounding the mutation. The goal of this exon skipping approach is
to produce a truncated dystrophin protein that will reduce the severity of DMD towards
that of Becker muscular dystrophy (BMD). The strategy is a leading possibility among the few
therapeutic approaches that may be successful for DMD.
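The reading-frame logic behind exon skipping comes down to arithmetic modulo 3: removing a stretch of exons keeps the frame only if the total number of nucleotides removed is a multiple of three. The Python sketch below illustrates this under simplifying assumptions. The exon lengths are invented for illustration and are not the real dystrophin exon lengths, the function names are hypothetical, and only single-exon skips immediately flanking the deletion are considered.

```python
def is_in_frame(lengths, removed):
    """True if removing these exons leaves the codon reading frame intact."""
    return sum(lengths[exon] for exon in removed) % 3 == 0


def single_skip_candidates(lengths, deleted):
    """Flanking exons whose additional skipping would restore the frame."""
    deleted = sorted(deleted)
    flanking = [deleted[0] - 1, deleted[-1] + 1]
    return [
        exon
        for exon in flanking
        if exon in lengths and is_in_frame(lengths, deleted + [exon])
    ]


# Hypothetical exon lengths in nucleotides (NOT real dystrophin values).
lengths = {1: 90, 2: 100, 3: 62, 4: 120, 5: 80}

deletion = [2]
print(is_in_frame(lengths, deletion))             # deletion alone: 100 nt removed
print(single_skip_candidates(lengths, deletion))  # flanking exons that fix the frame
```

Here deleting exon 2 removes 100 nucleotides and shifts the frame, while also skipping exon 3 brings the total to 162 nucleotides, which is divisible by three. A real analysis would additionally have to consider multi-exon skips, the position of the start and stop codons, and whether the resulting truncated protein retains essential domains.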
Successfully targeting a given exon, however, is not always easy, and depends on
identifying a DNA-analogue sequence that will bind strongly to its target exon at a
location that blocks the splicing machinery's capacity to recognise that stretch of
pre-mRNA as exonic. I've used previous experimental data together with computational
approaches to create a predictive algorithm to help researchers design new exon-skipping
drug sequences, and we have validated this algorithm with the help of
Toshifumi Yokota's team at the University of Alberta.
Our exon skipping algorithm was
published in PLoS ONE.
Data analysis and tool development
I'm involved concurrently with the omics data analysis aspects of
around a dozen collaborative projects, working with
researchers (both at the Center for Stratified Medicine and internationally) who study
neuromuscular disorders such as Amyotrophic Lateral Sclerosis, Duchenne and Becker muscular dystrophies,
and Dysferlinopathy (LGMD2B).
In tool development, I'm mainly interested in better ways to explore omics
data, in particular tools that can be adapted towards muscle research.
This includes tools to visualize pathways and networks, such as our CellWhere tool,
and also tools to extract information from previous datasets, such as our
current Muscle Gene Sets project, which is under development. I have also developed
algorithms for the design of better exon skipping drugs for DMD.
I'm always interested in analyzing new datasets
and always ready to consider new collaborations in tool development.