Caractérisation des bases moléculaires de l’isolement
reproducteur post-zygotique intrinsèque chez le Grand
Corégone (Coregonus clupeaformis)
Thesis
Anne-Marie Dion-Côté
Doctorate in Biology
Philosophiae doctor (Ph.D.)
Québec, Canada
© Anne-Marie Dion-Côté, 2016
Supervised by:
Louis Bernatchez, research supervisor
Résumé
Alors que les technologies de séquençage à très haut débit ont permis de réaliser
d’importants progrès pour documenter l’architecture génomique de la spéciation, les
mécanismes moléculaires responsables de l’isolement reproducteur demeurent nébuleux.
L’objectif principal de cette thèse est d’apporter un nouvel éclairage sur les bases
moléculaires de l’isolement reproducteur dans un système d’espèces naissantes : le Grand
Corégone. En particulier, l’approche était d’examiner les mécanismes associés à la stabilité
du génome et à sa dérégulation dans un contexte de divergence et d’hybridation. Par une
méthode de transcriptomique, nous avons documenté une profonde dérégulation à l’échelle
du transcriptome chez les hybrides entre les formes naine et normale du Grand Corégone,
incluant la dérépression des éléments transposables et des transcrits non-codants. Par la
suite, nous avons observé que les hybrides montrent des signes importants de
déstabilisation de la méiose et de la mitose, malgré l’absence de changement de caryotype
entre les formes parentales. Finalement, nous avons observé un polymorphisme sub-chromosomique
important chez trois paires de Grands Corégones sympatriques,
principalement associé à la période d’allopatrie. Ces travaux montrent que des changements
importants à l’échelle du génome entier surviennent tôt au cours du processus de
divergence, et sont susceptibles de conduire à de profonds dérèglements chez les hybrides.
Enfin, cette thèse montre qu’une approche dite intégrative favorise une meilleure
compréhension des mécanismes liés à la divergence et à la spéciation.
Abstract
While high throughput sequencing has led to significant progress towards the
understanding of the genomic architecture of speciation, molecular mechanisms responsible
for reproductive isolation remain unclear. The main objective of this thesis is to help
elucidate the molecular basis of reproductive isolation in a nascent species complex: the
Lake Whitefish. In particular, the approach was to examine the mechanisms associated with
genome stability and instability in a context of divergence and hybridization. Using
transcriptomics, we documented a profound deregulation at the transcriptome scale in
hybrids between dwarf and normal Lake Whitefish, including transposable element and
non-coding RNA derepression. Then, we found that hybrids show significant signs of
meiosis and mitosis breakdown, despite the absence of karyotype changes between parental
forms. Finally, we observed conspicuous sub-chromosomal polymorphism in three
sympatric Lake Whitefish species pairs, mainly associated with earlier allopatry. This work
shows that genome-scale reorganization occurs early during divergence, and may lead to
profound dysregulation in hybrids. This thesis also shows that integrative biology promotes
a better understanding of the mechanisms linked to divergence and speciation.
Table of contents
Résumé
Abstract
Table of contents
List of abbreviations
Acknowledgements
Foreword
Chapter 1: General introduction
1.1 Speciation as a mechanism generating biodiversity
1.2 Barriers to reproduction
1.3 Intrinsic post-zygotic reproductive isolation revealed by hybrids
1.4 Geographical context of speciation
1.5 Recent progress in speciation genetics
1.6 Mechanisms of intrinsic post-zygotic reproductive isolation
1.6.1 Genetic mechanisms of intrinsic post-zygotic reproductive isolation
1.6.2 Transcriptional mechanisms of intrinsic post-zygotic reproductive isolation
1.6.3 Chromosomal mechanisms of intrinsic post-zygotic reproductive isolation
1.7 The role of genome stability in speciation
1.7.1 Transposable elements
1.7.2 Aneuploidy
1.7.3 Heterochromatin
1.8 The Lake Whitefish
1.9 Objectives of the thesis
Chapter 2: RNA-seq reveals transcriptomic shock involving transposable elements
reactivation in hybrids of young lake whitefish species
2.1 Résumé
2.2 Abstract
2.3 Introduction
2.4 Results
2.5 Discussion
2.6 Material and methods
2.7 Acknowledgements
2.8 Tables
2.9 Figures
2.10 Supplementary methods
2.11 Supplementary table
2.12 Supplementary figures
Chapter 3: Reproductive isolation in a nascent species pair is associated with
aneuploidy in hybrid offspring
3.1 Résumé
3.2 Abstract
3.3 Introduction
3.4 Material and methods
3.5 Results
3.6 Discussion
3.7 Data accessibility
3.8 Acknowledgements
3.9 Tables
3.10 Figures
3.11 Supplementary tables
Chapter 4: Cytogenetics and missed information from genome sequencing: standing
chromosomal variation associated with reproductive isolation in Lake Whitefish species pairs
4.1 Résumé
4.2 Abstract
4.3 Introduction
4.4 Material and methods
4.5 Results
4.6 Discussion
4.7 Data accessibility
4.8 Acknowledgements
4.9 Tables
4.10 Figures
4.11 Supplementary code
4.12 Supplementary tables
4.13 Supplementary figures
Chapter 5: Conclusion
5.1 Review of the main results
5.2 Perspectives
5.3 Towards an integrative approach to the study of speciation
Chapter 6: Bibliography
List of tables
Table 2.1. Differential expression summary.
Table 2.2. GO enrichment (biological processes, FDR < 0.05) for differentially expressed
transcripts between pure normal and dwarf whitefish embryos (FDR < 0.01, fold-change > 2).
Table 2.3. Over-expression of transposable elements and non-coding transcripts in
malformed backcrosses.
Table S 2.1. GO enrichment (biological processes, FDR < 0.05) among commonly
over-expressed transcripts in malformed backcrosses as compared to all other groups
(FDR < 0.01, fold-change > 2).
Table 3.1. Individuals sampled in this study.
Table 3.2. Summary statistics of chromosome number per cross-type and group.
Table S 3.1. Summary statistics for individual fish.
Table S 3.2. ANOVA summary on variation coefficients of chromosome counts per individual.
Table S 3.3. Tukey HSD post-hoc test.
Table S 3.4. Fligner-Killeen test (on individual median chromosome counts).
Table 4.1. Number of individuals analyzed per lake and ecotype with their average
phenotypic characteristics.
Table 4.2. Factors (supplementary variables) with a significant effect on dimensions 1
and 3 from the multiple factor analysis.
Table 4.3. Significant barycenter position estimates of the factor levels for which main
factors had a significant effect on dimensions 1 and 3 from the multiple factor analysis
(Table 1).
Table 4.4. Chromosome markers significantly correlated (p < 0.05) to dimensions 1 and 3
from the multiple factor analysis.
Table S 4.1. Individual characteristics and markers scored.
Table S 4.2. Individual characteristics and imputed marker values using the imputeMFA()
function.
List of figures
Figure 2.1. Annotation summary of the assembled transcriptome.
Figure 2.2. Transposable elements annotation.
Figure 2.3. Unique transcription profile observed in backcrosses, and malformed
backcrosses in particular.
Figure 2.4. Over-expressed transposable element super-families in malformed backcrosses
in all comparisons (n = 125, FDR < 0.01, fold-change > 2).
Figure S 2.1. Distribution of variation coefficient of read counts for each contig.
Figure S 2.2. Distribution of log2 fold-change (logFC) for transcripts differentially
expressed between dwarf and normal embryos (FDR < 0.01, FC > 2).
Figure S 2.3. Distribution of log2 fold-change (log2FC) for transcripts differentially
expressed in malformed backcross.
Figure S 2.4. D/A ratio distribution in hybrids.
Figure S 2.5. Carbohydrate metabolism (partial) is down-regulated in malformed hybrids.
Figure S 2.6. KEGG map of the TCA cycle.
Figure S 2.7. KEGG map of the oxidative phosphorylation pathway.
Figure S 2.8. KEGG map of fatty acid metabolism.
Figure S 2.9. KEGG map of the purine metabolism.
Figure S 2.10. KEGG map of the pyrimidine metabolism.
Figure 3.1. Karyotypes of pure parental forms and abnormal metaphases of malformed
backcrosses.
Figure 3.2. Meiotic breakdown in malformed backcrosses reflected by the analysis of
mitotic chromosomes.
Figure 3.3. Mitotic and meiotic chromosomal instability occurs in backcrosses based on
the analysis of mitotic chromosomes.
Figure 4.1. Example of chromatin structure polymorphism in a normal and a dwarf
individual from East Lake.
Figure 4.2. B chromosome identified in Lake Whitefish by a) C-banding, b) CMA3/DAPI
staining and c) FISH with rDNA 28S (red) and rDNA 5S (green) probes.
Figure 4.3. Representative rDNA sites polymorphism shown by fluorescent in situ
hybridization (FISH) with probes for 5S and 28S rDNA.
Figure 4.4. Partial consensus ideogram for all three species pairs showing chromosome
shape and all markers identified on chromosomes and scored.
Figure 4.5. Multiple factor analysis performed with FactoMineR.
Figure 4.6. Multiple factor analysis correlation circle for dimensions 1 and 3.
Figure S 4.1. Multiple factor analysis performed with FactoMineR.
Figure S 4.2. Multiple factor analysis correlation circle for dimensions 1 and 3.
List of abbreviations
DNA, cDNA, gDNA, rDNA: Deoxyribonucleic acid, complementary DNA, genomic DNA, ribosomal DNA
ANOVA: Analysis of variance
RNA, mRNA, miRNA, ncRNA: Ribonucleic acid, messenger RNA, micro RNA, non-coding RNA
BC: Backcross
BDM: Bateson-Dobzhansky-Muller
BLAST: Basic local alignment search tool (sequence alignment algorithm)
bp: Base pair
BP: Before present
BWA: Burrows-Wheeler Aligner (sequence alignment tool)
cm: Centimetre
DNMT: DNA methyltransferase
EDTA: Ethylenediaminetetraacetic acid
eQTL: Expression quantitative trait locus
FC: Fold-change (ratio of change in expression)
FDR: False discovery rate
FPKM: Fragments per kilobase of transcript per million mapped reads
g, kg, mg, µg, ng: Gram, kilogram, milligram, microgram, nanogram
GO: Gene ontology
MFA: Multiple factor analysis
MMR: Mismatch repair
NF: Fundamental number (nombre fondamental)
nr: Non-redundant database (non-redundant sequence database)
ORF: Open reading frame
PCA, PC: Principal component analysis, principal component
pH: Potential of hydrogen (measure of acidity)
RNA-seq: RNA sequencing
RRBS: Reduced representation bisulfite sequencing
SNP: Single-nucleotide polymorphism
TCA: Tricarboxylic acid
U: Unit (enzyme activity)
UV: Ultraviolet
YBP: Years before present
Sequencing of the human genome did not provide
conclusive answers to old biological mysteries
– instead, it showed that the right questions
had not even begun to be asked.
T Ryan Gregory (2005) The Evolution of the Genome
Acknowledgements
I would first like to warmly thank my supervisor, Louis Bernatchez. Louis, you gave me the
finest gift a mentor can offer: the freedom to explore and to make unexpected discoveries,
along with the means to do so. I also thank you for your support, both scientific and
moral. You listened at the right moments and pushed me when I needed it. A thousand thanks!
I want to thank the members of my advisory committee: Christian Landry, Julie Turgeon and
Nadia Aubin-Horth. All three of you placed your trust in me at different stages, and I
greatly appreciated the many scientific, professional and personal exchanges we had. I owe
you a great deal!
I would also like to thank my closest collaborators, without whom this thesis would
probably not exist. Sébastien Renaut, Eric Normandeau, Radka Symonová, Petr Ráb, Fabien
Lamaze, Šárka Pelikánová: thank you for your support, your enthusiasm and your boldness. I
also had the great pleasure of collaborating on other studies from the laboratory. Thank
you to Ciro Rico and Martin Laporte for giving me that opportunity.
The Bernatchez laboratory is a large family in which many people support one another. In
no particular order, thank you to Christopher Sauvage, Charles Perrier, Gregory Maes, Kim
Praebel, Anne Dalziel, Jean-Sébastien Moore, Scott Pavey, Ben Sutherland and Bérénice
Bougas for everything you did for me. Thanks also to Madoka Krick, Laura Benestan, Clément
Rougeux, François-Olivier Hébert, Alysse Perreault-Payette, Simon Bernatchez, Élianne
Valiquette and Geneviève Ouellet-Cauchon. Thanks as well to Guillaume Côté, Serge Higgins,
Jean-Christopher Terrien, Jade Larivière, Lucie Papillon, Alain Goulet and Richard Janvier
for their immense technical support. Thank you to Thierry Gosselin, Vincent Bourret,
Caroline Côté, Marie Filteau, Pierre-Alexandre Gagnaire and Julie Jeukens for their sound
advice.
During my stays in the Czech Republic, I had the great pleasure of collaborating with
extremely generous people. Thank you to Zuzáná Majtanova, Petra Sejnohová and Alexandr
Sember. A very special thought for Jana Čechová.
I am also happy to thank Karine Jacquet, a great friend, and Jean-Yves Masson, my master's
supervisor. Thank you for your questions, your suggested answers, your support and your
encouragement throughout my doctorate. Thank you also to my friend Emilie, without whom I
might never have landed where I am.
A very special thank-you to my "proofreading committee": Jean-Sébastien Moore, Emilie
Castonguay and Mathilde Dion-Côté.
I also thank all the organizations that supported me financially: the CRSNG, Québec-Océan,
the Department of Biology of Université Laval, the FRQNT, the Richard-Bernard foundation,
IBIS, the Société Provancher and AÉLIÉS.
Finally, no words could express all the gratitude I feel towards my family and friends for
their support and patience over the years, and in particular towards Michel, my father,
Constance, my mother, Charles-Olivier, Mathilde and, more recently, Marianne. Thank you.
Foreword
This thesis is organized into 5 chapters, including the general introduction (Chapter 1)
and the conclusion (Chapter 5). Chapters 2, 3 and 4 are published or in the process of
being published in scientific journals. In addition, I contributed to 2 other articles
published by members of the laboratory. These are included in the Appendix.
Chapter 2 is published under the reference: Dion-Côté A-M, Renaut S, Normandeau E,
Bernatchez L. 2014. RNA-seq reveals transcriptomic shock involving transposable
elements reactivation in hybrids of young lake whitefish species. Molecular Biology and
Evolution 31:1188–1199.
AMDC and LB designed the project. LB supervised the project. SR performed the crosses and
collected the biological material. AMDC produced the data. AMDC and EN analyzed the
data. AMDC wrote the manuscript in collaboration with SR, EN and LB.
Chapter 3 is published under the reference: Dion-Côté A-M, Symonová R, Ráb P,
Bernatchez L. 2015. Reproductive isolation in a nascent species pair is associated with
aneuploidy in hybrid offspring. Proceedings of the Royal Society B: Biological Sciences
282:20142862–20142862.
AMDC and LB designed the project. LB and PR supervised the project. AMDC acquired the
data with the support of SP. AMDC analyzed the data in collaboration with RS.
AMDC wrote the manuscript in collaboration with RS, PR and LB.
Chapter 4 was submitted to the journal Molecular Ecology: Dion-Côté A-M, Symonová R,
Lamaze FC, Pelikánová S, Ráb P, Bernatchez L. Standing chromosomal variation
associated with rapid divergence between nascent Lake Whitefish species.
AMDC and LB designed the project. LB supervised the project. AMDC acquired the data
with the support of SP. AMDC, RS and FCL analyzed the data. AMDC wrote the
manuscript in collaboration with RS, FCL, PR and LB.
Chapter 1: General introduction
1.1 Speciation as a mechanism generating biodiversity
Speciation is the evolutionary process by which biodiversity is generated. The species can
thus be regarded as the fundamental unit of biodiversity. The mechanisms that promote
speciation, oppose it, or modulate its rate are therefore of central interest in
evolutionary biology. Several species concepts exist, each with its strengths and
weaknesses depending on the study system (Coyne and Orr 2004). In this thesis, we adopt the
biological species concept because it best captures the properties of the system under
study. Species are thus defined as "groups of natural populations, actually or potentially
interbreeding, that are genetically isolated from other such groups" (Mayr 1942).
Speciation is therefore the evolutionary process by which barriers to reproduction are
progressively established, leading to the emergence of genetically divergent and
reproductively isolated evolutionary lineages. These barriers may be ecological or genetic
in nature or, more likely, a combination of both, since any biological entity is
determined by the interaction of these factors (Coyne and Orr 2004).
1.2 Barriers to reproduction
The establishment of barriers to reproduction is essential to the process of divergence
and speciation. Indeed, in the absence of such barriers, populations can freely exchange
alleles (gene flow), which prevents divergence (Slatkin 1987). Reproductive isolation can
occur before or after fertilization, and is accordingly termed pre-zygotic or
post-zygotic. Pre-zygotic barriers reduce the probability of fertilization and thus
promote divergence by reducing gene flow between populations. Examples include host choice
in phytophagous insects and assortative mating in the threespine stickleback (McKinnon et
al. 2004; Nosil 2007).
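The homogenizing effect of gene flow described above can be illustrated with a minimal numerical sketch. It assumes a deliberately simplified deterministic model that is not taken from this thesis: two populations exchange a fraction m of individuals symmetrically each generation, with no selection, drift or mutation. The function names and parameter values are hypothetical and serve only as illustration.

```python
def migrate(p1, p2, m):
    """One generation of symmetric migration between two populations.

    p1, p2: frequency of a given allele in each population.
    m: fraction of each population replaced by migrants from the other
       (assumed constant; selection, drift and mutation are ignored).
    """
    return (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1


def divergence_after(p1, p2, m, generations):
    """Absolute allele-frequency difference left after some generations."""
    for _ in range(generations):
        p1, p2 = migrate(p1, p2, m)
    return abs(p1 - p2)


# Start from complete differentiation (allele fixed in one population only)
# and compare no gene flow with two illustrative migration rates.
for m in (0.0, 0.01, 0.1):
    print(f"m = {m}: difference after 50 generations = "
          f"{divergence_after(1.0, 0.0, m, 50):.4f}")
```

Under these assumptions the difference between populations shrinks by a factor (1 - 2m) every generation, so it persists indefinitely only when m = 0. Real systems also involve selection and drift, which this sketch leaves out on purpose.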
Post-zygotic reproductive isolation mechanisms can in turn be classified as extrinsic, if
they depend on the environment, or intrinsic, if they are instead genetic and independent
of the environment. Although intuitive, this classification is not absolute. Indeed, some
mechanisms can show characteristics of both intrinsic and extrinsic post-zygotic
reproductive isolation. For example, asynchronous hatching of hybrid larvae probably rests
on intrinsic genetic mechanisms (Robison et al. 2001), but is also disadvantageous from an
extrinsic standpoint because of the mismatch with food resources (Cushing 1990). Although
pre-zygotic barriers appear to contribute more to total reproductive isolation (Coyne and
Orr 1997), post-zygotic barriers are the most likely to be irreversible (Muller 1939;
Muller 1942). This thesis focuses mainly on the molecular mechanisms of intrinsic
post-zygotic isolation.
1.3 Intrinsic post-zygotic reproductive isolation revealed by hybrids
Hybrids offer an extraordinary window onto the study of speciation because of their unique
properties relative to the parental forms (Maheshwari and Barbash 2011). In nature, genomic
approaches in hybrid zones have shown that some portions of the genome are exchanged less
between divergent lineages (that is, they introgress less), revealing that these regions
are involved in adaptation and reproductive isolation (Abbott et al. 2013). Under
controlled conditions, hybrid crosses make it possible to reveal complex epistatic
interactions between parental genomes, particularly at the transcriptional level (Landry et
al. 2007). F2 hybrids (from a cross between two F1 hybrids) and backcrosses (from a cross
between an F1 hybrid and one of the F0 parental forms) frequently show hybrid breakdown,
whereas first-generation hybrids (F1) may show few or no signs of reproductive isolation
(e.g. Ellison and Burton 2008; Dey et al. 2014; Stelkens et al. 2015). This phenomenon
appears to result from epistatic (or non-additive) interactions between divergent alleles
of the parental genomes (Rieseberg et al. 1999; Burton et al. 2006). In addition, hybrids
frequently show epigenetic deregulation, which can in turn induce changes in gene
expression and chromosome segregation defects (Comai et al. 2003; Michalak 2009). Finally,
because parents contribute unequally to their offspring, for example through maternally
transmitted mitochondrial DNA, hybrids are likely to show unique incompatibilities and
imbalances (Blier et al. 2001; Maheshwari and Barbash 2011; Burton et al. 2013). The study
of hybridization thus reveals the mechanisms involved in intrinsic post-zygotic
reproductive isolation between divergent lineages.
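The three cross types discussed above (F1, F2 and backcross) can be compared with a short sketch of expected genotype proportions. It assumes a single divergent locus with simple Mendelian segregation, with hypothetical allele labels D (dwarf) and N (normal). It does not model epistasis or any result of this thesis. It only shows that F1 hybrids are uniformly heterozygous, whereas F2 and backcross progeny segregate homozygous genotypes; across many loci this produces genotype combinations absent from F1 hybrids.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Gamete allele frequencies for a diploid genotype such as 'DN'."""
    return {allele: Fraction(genotype.count(allele), 2)
            for allele in set(genotype)}


def cross(parent1, parent2):
    """Expected offspring genotype proportions at one locus.

    Assumes Mendelian segregation and random union of gametes.
    Genotypes are written with alleles in sorted order ('DN', not 'ND').
    """
    offspring = Counter()
    for (a, freq_a), (b, freq_b) in product(gametes(parent1).items(),
                                            gametes(parent2).items()):
        offspring["".join(sorted(a + b))] += freq_a * freq_b
    return dict(offspring)


f1 = cross("DD", "NN")        # pure dwarf x pure normal
f2 = cross("DN", "DN")        # F1 x F1
backcross = cross("DN", "DD")  # F1 x dwarf parental form

for name, result in (("F1", f1), ("F2", f2), ("Backcross", backcross)):
    print(name, {g: str(f) for g, f in sorted(result.items())})
```

Exact fractions are used instead of floating-point numbers so that the proportions can be read directly as Mendelian ratios.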
1.4 Geographical context of speciation
Historically, geographical barriers to reproduction, which limit gene flow, were regarded
as the most important drivers of speciation (Mayr 1954a). Speciation was therefore studied
in its geographical context, and a system was characterized according to the degree of
geographical isolation between diverging lineages. This isolation can be complete
(allopatry), partial (parapatry) or absent (sympatry), which influences the relative
impact of the evolutionary forces at play (Coyne and Orr 2004).
In the absence of gene flow, isolated populations are free to accumulate genetic
differences through divergent selection or randomly through genetic drift (Haldane 1930;
Coyne and Orr 2004). Indeed, the effect of drift and selection on the development of
reproductive isolation between isolated populations has been reproduced in the laboratory
(Rice and Hostert 1993). In nature, the strongest evidence supporting a role for
geographical isolation in the origin of new species comes from the geographical
concordance between sister species and known geographical or climatic barriers (Coyne and
Orr 2004). For example, the Great Lakes region of North America constitutes a suture zone
as a result of the Pleistocene glaciation. Populations of different species were isolated
in different glacial refugia, which prevented the homogenizing effect of gene flow for
several tens of thousands of years and thereby favored the emergence of sister species
(April et al. 2012).
In the strict sense, sympatric speciation occurs where populations overlap geographically
and are therefore free to interbreed. Ecological sympatric speciation thus requires a
source of divergent selection, a form of reproductive isolation, and a genetic mechanism
linking the two (Nosil 2012). Empirical examples of purely sympatric speciation are
relatively rare, because it is often difficult to completely reject the hypothesis of
allopatry. Some "classic" examples of sympatric ecological speciation are in fact more
complex, for example because of an underestimated role of demography or a contribution of
geographical isolation. For instance, a case of sympatric speciation in the threespine
stickleback appears instead to result from two successive waves of colonization of fresh
waters by marine populations (Taylor and McPhail 2000). Similarly, the classic example of
sympatric divergence in the insect Rhagoletis pomonella appears to have been facilitated
by gene flow from populations originating in Mexico into populations in the United States
(Feder et al. 2005).
Finally, the mode of divergence generally lies between strict allopatry and pure sympatry,
and these two situations are better viewed as the extremes of a continuum of gene flow
(Jiggins 2006). Another possibility is divergence in a mixed context, that is, a phase of
allopatry followed by secondary contact (Nosil 2012). Mixed modes of speciation are likely
to accelerate the fixation of certain rearrangements between diverging populations (Feder
et al. 2011). Regions of secondary contact are therefore particularly interesting for the
study of divergence and speciation, as mentioned above for northeastern North America.
1.5 Recent progress in the genetics of speciation
A "speciation gene" can be defined as any gene contributing to the evolution of
reproductive isolation between diverging populations (Nosil and Schluter 2011). The
genetic bases of speciation can be divided into those linked to adaptation on the one hand
and those linked to reproductive isolation on the other. This classification is imperfect,
since adaptation to divergent environments can lead to reproductive isolation, but it
helps to clarify the following point. With advances in sequencing technologies, the last
decade has been rich in discoveries on the genomic architecture of speciation and, in
particular, of adaptation (e.g. Colosimo et al. 2005; Hoekstra et al. 2006; Ellegren 2013;
Renaut et al. 2013; Seehausen et al. 2014). By contrast, relatively modest progress has
been made in identifying the direct molecular bases of reproductive isolation and, more
specifically, of hybrid incompatibilities. Admittedly, the literature contains several
examples of "speciation genes" directly involved in reproductive isolation (Presgraves
2010; Maheshwari and Barbash 2011; Nosil and Schluter 2011), but, with a few exceptions
(e.g. Ellison and Burton 2008; Gagnaire, Normandeau, and Bernatchez 2013), these examples
were identified between relatively distant lineages (e.g. Russo et al. 1995; Maheshwari
and Barbash 2011). It therefore remains difficult to assess whether these "speciation
genes" actually contributed to the initial phases of the divergence process (Via and West
2008; Via 2009).
Moreover, the advances made possible by high-throughput sequencing technologies tend to
rely largely on single nucleotide polymorphisms (SNPs). Yet the characterization of the
best genome assemblies, such as that of humans, has shown that structural and copy number
variations are more frequent than simple substitutions (Alkan et al. 2011). Unfortunately,
many of these variations, such as repeated sequences, transposable element insertions and
small inversions, remain extremely difficult to characterize with current technologies
(Treangen and Salzberg 2011; Ekblom and Wolf 2014). Other strategies must therefore be
used to complement high-throughput sequencing in order to capture the full range of
genetic mechanisms involved in speciation, particularly those linked to reproductive
isolation.
1.6 Mechanisms of intrinsic post-zygotic reproductive isolation
Intrinsic post-zygotic isolation manifests as hybrid sterility and hybrid inviability,
which are not mutually exclusive (Coyne and Orr 2004, p. 253). Four main mechanisms can
lead to such hybrid incompatibilities: the presence of endosymbionts, genetic
incompatibilities, differences in ploidy level and chromosomal rearrangements (Coyne and
Orr 2004). Genetic incompatibilities and chromosomal rearrangements are the two mechanisms
addressed in this thesis.
1.6.1 Genetic mechanisms of intrinsic post-zygotic reproductive isolation
Intrinsic post-zygotic isolation leads to hybrid inviability or sterility. But how can
hybrid inviability or sterility evolve when they are so clearly selected against (Orr and
Presgraves 2000)? Conceptually, even Darwin recognized that it is difficult for a
population to cross an adaptive valley to reach another adaptive optimum (Orr 1996). This
paradox is resolved by the Dobzhansky-Muller model (Dobzhansky 1937; Muller 1942). The
model posits that two isolated populations can accumulate different mutations at epistatic
loci. Upon secondary contact or hybridization, the interaction between these loci may be
suboptimal, leading to hybrid sterility or mortality. Second-generation hybrids (F2 or
backcross hybrids) and later generations (F3, etc.) tend to show more pronounced
manifestations of genetic incompatibilities than first-generation (F1) hybrids, because
meiotic recombination in F1 hybrids can break up co-adapted gene complexes. The
Dobzhansky-Muller model has guided many studies in the field of speciation. A number of
genes clearly involved in Dobzhansky-Muller incompatibilities are now known, from
invertebrates to mammals (Barbash et al. 2003; Presgraves et al. 2003; Ellison and Burton
2008; Mihola et al. 2009).
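The logic of the Dobzhansky-Muller model, and the reason incompatibilities tend to appear
after the F1 generation, can be illustrated with a minimal sketch. Everything in it is an
assumption made for illustration and is not taken from the studies cited above: two
unlinked loci, one derived allele fixed in each parental population (A in one, B in the
other), and a hypothetical rule under which an incompatibility is expressed only when an
individual is homozygous for one derived allele and carries at least one copy of the other.

```python
from fractions import Fraction
from itertools import product

def gametes(genotype):
    """Gamete frequencies for a two-locus genotype such as ("Aa", "Bb").
    Assumes the two loci are unlinked (independent assortment)."""
    freqs = {}
    for a, b in product(genotype[0], genotype[1]):
        freqs[(a, b)] = freqs.get((a, b), Fraction(0)) + Fraction(1, 4)
    return freqs

def cross(parent1, parent2):
    """Offspring genotype frequencies from a cross between two genotypes."""
    offspring = {}
    for (g1, p1), (g2, p2) in product(gametes(parent1).items(),
                                      gametes(parent2).items()):
        # Sorting puts the uppercase (derived) allele first, e.g. "aA" -> "Aa".
        geno = ("".join(sorted(g1[0] + g2[0])), "".join(sorted(g1[1] + g2[1])))
        offspring[geno] = offspring.get(geno, Fraction(0)) + p1 * p2
    return offspring

def is_incompatible(geno):
    """Hypothetical rule: homozygous for one derived allele while carrying
    at least one copy of the derived allele at the other locus."""
    a, b = geno
    return (a == "AA" and "B" in b) or (b == "BB" and "A" in a)

def incompatible_fraction(offspring):
    """Total frequency of offspring genotypes expressing the incompatibility."""
    return sum((p for g, p in offspring.items() if is_incompatible(g)), Fraction(0))

PARENT_1 = ("AA", "bb")   # population 1 fixed the derived allele A
PARENT_2 = ("aa", "BB")   # population 2 fixed the derived allele B
F1 = ("Aa", "Bb")         # every F1 hybrid is a double heterozygote

f1_frac = incompatible_fraction(cross(PARENT_1, PARENT_2))
f2_frac = incompatible_fraction(cross(F1, F1))
bc_frac = incompatible_fraction(cross(F1, PARENT_1))

print("F1:", f1_frac)         # F1: 0
print("F2:", f2_frac)         # F2: 5/16
print("Backcross:", bc_frac)  # Backcross: 1/4
```

Under these assumptions no F1 hybrid expresses the incompatibility, whereas segregation in
the next generation exposes it in a fraction of F2 and backcross offspring. The exact
fractions depend entirely on the assumed rule and carry no empirical weight.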
1.6.2 Transcriptional mechanisms of intrinsic post-zygotic reproductive isolation
Transcriptional profiles can be regarded as intermediate phenotypes between the genetic
bases and the phenotype in the usual sense (Aubin-Horth and Renn 2009). Gene expression
profiles result from the co-evolution of dozens of proteins (transcription factors), of
gene regulatory sequences and of epigenetic modifications such as histone modifications
and DNA methylation (Wray 2003). In the 1970s, King and Wilson (1975), drawing on the
example of the chimpanzee and the human, hypothesized that differences in the regulation
of gene expression could play a key role in evolution and, in particular, in speciation.
In hybrids, such regulatory differences can lead to substantial non-additivity, or even
transgressive expression relative to parental levels, and may thus be linked to
reproductive isolation (Wray 2003; Landry et al. 2007). Beyond epistatic genetic
interactions, several studies report substantial epigenetic deregulation in hybrids,
associated with deregulated gene expression (Michalak 2009). The study of transcriptional
profiles in diverging lineages and their hybrids therefore helps to clarify the mechanisms
underlying intrinsic post-zygotic reproductive isolation.
1.6.3 Chromosomal mechanisms of intrinsic post-zygotic reproductive isolation
The role of chromosomal rearrangements in speciation, and mainly whether they are causes
or consequences of divergence, remains hotly debated in evolutionary biology (Faria and
Navarro 2010). There are two main families of theoretical models concerning the role of
chromosomal rearrangements in speciation. Classical models, in particular White's
stasipatric model (1969; 1978a), rely on the appearance of chromosomal changes that are
deleterious (underdominant) in hybrids (Faria and Navarro 2010). These rearrangements can
destabilize meiosis in heterozygotes to the point that hybrids are sterile (White 1978a;
King 1993; Searle 1998). However, a strongly underdominant rearrangement is unlikely to
increase in frequency, because it should be rapidly eliminated by selection (Brown and
O'Neill 2010). Other models therefore had to be developed to understand the role of
chromosomal rearrangements in speciation.
Rieseberg (2001) and Noor et al. (2001) proposed that inversions contribute instead by
limiting recombination in the regions involved, thereby favoring divergence. These models,
essentially genic in nature, posit that chromosomal rearrangements, and inversions in
particular, cause local reductions in gene flow that translate into increased genetic
differentiation within the rearranged regions (Noor et al. 2001; Rieseberg 2001).
Rearrangements can limit recombination in heterozygotes through two mechanisms: 1) by
limiting chromosomal crossovers, or 2) by leading to the production of unbalanced
(inviable) gametes once recombined. In both cases, regions close to the rearrangement tend
to recombine less in hybrids, and therefore to diverge (Faria and Navarro 2010).
Nevertheless, these models concern only inversions and neglect other types of chromosomal
rearrangements such as chromosome fusions and fissions, heterochromatin additions and
deletions, and translocations (King 1993). The role of these other types of rearrangement
in reproductive isolation is considered neutral, if not unclear (King 1993). There is thus
currently no fully satisfactory model that makes clear predictions while accounting for
the various chromosomal phenomena associated with speciation (Faria and Navarro 2010).
1.7 The role of genome stability in speciation
Genome stability is defined as a state in which the sequence and structure of the genome
remain constant, and in which the genome is able to carry out its functions correctly,
such as gene expression and the faithful segregation of chromosomes. By contrast, genomic
instability is a state of deregulation in which mutations occur at a higher frequency,
whether through changes in DNA sequence, chromosomal rearrangements or aneuploidy
(Aguilera and Gómez-González 2008). The concept of genome stability is central to
molecular biology but little used in evolutionary biology. Yet it is a key property of the
genome that is likely to influence many evolutionary processes, notably through the
essential supply of variation via mutation within populations (Lynch 2007). Several
factors, both intrinsic and extrinsic, can threaten genome stability. For example, failure
of cell cycle checkpoints favors the transmission of DNA repair or replication errors,
thereby leading to mutations. Mutagens such as UV radiation can cause DNA breaks which,
when left unrepaired, can also lead to mutations or chromosomal rearrangements (Aguilera
and Gómez-González 2008).
Furthermore, a number of genes involved in intrinsic post-zygotic reproductive isolation
are associated with the maintenance or breakdown of genome stability (Presgraves 2010;
Maheshwari and Barbash 2011). An elegant study in yeast showed in 2003 that the DNA
mismatch repair pathway was responsible for hybrid sterility between divergent strains,
directly linking genome stability mechanisms to reproductive isolation (Greig et al.
2003). Other examples illustrating the importance of genome stability in the context of
divergence and speciation are discussed below.
1.7.1 Transposable elements
Transposable elements are DNA sequences capable of moving ("jumping") within the host
genome via an RNA intermediate (Class 1: retrotransposons) or via their own excision
(Class 2: DNA transposons) (Feschotte and Pritham 2007; Levin and Moran 2011). In most
metazoans, only a tiny fraction of the genome codes for proteins (Lander et al. 2001;
Mouse Genome Sequencing Consortium et al. 2002). The rest of the genome contains various
structural elements (e.g. centromeres and telomeres) and regulatory elements (e.g.
promoters and miRNAs), as well as transposable elements. Transposable elements must be
kept in a silenced state, otherwise they pose a serious threat to genome stability and
consequently to the fitness of the individual (Levin and Moran 2011). More than thirty
years ago, Barbara McClintock suggested that hybridization could lead to the derepression
of transposable elements (McClintock 1984). The consequences of transposable element
derepression are unpredictable, but can range from deregulated transcription to the
disruption of reading frames, and even cancer (Feschotte 2008; Levin and Moran 2011).
Several examples of transposable element reactivation resulting from interspecific
hybridization have been reported in the literature, sometimes in association with DNA
demethylation (e.g. O'Neill and Graves 1998; Ungerer et al. 2006; Kelleher et al. 2012).
Hybridization is thus likely to lead to genomic instability linked to the derepression of
transposable elements.
1.7.2 Aneuploidy
Aneuploidy can be defined as the presence of an abnormal number of chromosomes, resulting
from errors in chromosome segregation (Compton 2011; Gordon et al. 2012). The mechanisms
responsible for aneuploidy are numerous and complex. The key point is that they can lead
to the non-disjunction of one or more chromosomes during meiosis or mitosis (Compton
2011). Depending on when this non-disjunction occurs, the individual will be entirely
aneuploid or mosaic, that is, composed of both euploid and aneuploid cells. The
consequences of aneuploidy are variable and depend on the organism studied and the type of
aneuploidy involved. Moreover, the propensity to produce aneuploid cells varies enormously
among species, from 1 cell in 1 million in yeast to nearly 1 in 100 in humans (Compton
2011). Whereas triploidy (an entire extra chromosome complement) is relatively well
tolerated in plants, most trisomies (a single extra chromosome) are inviable in humans
(Siegel and Amon 2012).
The molecular mechanisms by which aneuploidy reduces individual fitness remain poorly
understood. Nevertheless, several authors agree that chromosome imbalance leads to an
imbalance in protein stoichiometry (Torres et al. 2008; Compton 2011; Siegel and Amon
2012). In addition, aneuploidy appears to be strongly associated with genomic instability
and to lead to increased DNA damage (Sheltzer et al. 2011). Finally, when supernumerary
chromosomes are present, the transcription and translation machineries may become
limiting, which creates pressure on the cell to restore homeostasis (Torres et al. 2007).
With the exception of a few cases in model species such as yeast and mouse (Forejt et al.
2012; Hauffe et al. 2012; Charron et al. 2014), aneuploidy has very rarely been reported,
let alone studied, in the context of reproductive isolation.
1.7.3 Heterochromatin
Heterochromatin is a form of nuclear DNA compacted through modifications of the DNA
molecule itself (e.g. cytosine methylation) and of the histones around which DNA is
wrapped (e.g. hypoacetylation and methylation) (Grewal and Jia 2007). Heterochromatin
plays a key role in many nuclear processes, particularly in maintaining genome stability,
notably through the regulation of transcription, meiotic recombination and chromosome
segregation (Eissenberg and Elgin 2014). Heterochromatin typically consists of highly
repeated DNA sequences such as satellites, transposable elements and centromeres. For this
reason, heterochromatic sequences remain extremely difficult to characterize with current
sequencing approaches and tend to be missing even from the most complete genome
assemblies, such as the human assembly (Ekblom and Wolf 2014).
Heterochromatin, and DNA methylation in particular, plays a central role in maintaining
genome stability (Weber and Schübeler 2007). DNA is methylated early in development by a
family of enzymes essential to development, the DNA methyltransferases (DNMTs) (Jones
2012). The function of this methylation varies with context (transcription start site,
gene body, centromere), but its main roles are to repress transcription and to maintain
heterochromatin. DNA methylation also represses transposable elements. Finally,
deregulation of DNA methylation has been associated with increased genome instability,
including chromosomal rearrangements and aneuploidy (Grewal and Jia 2007).
1.8 The lake whitefish
The lake whitefish (Coregonus clupeaformis) is a North American lacustrine fish. Some
populations are characterized by the sympatric occurrence of a normal form and a derived
dwarf form. In the St. John River basin in northeastern North America, two glacial
lineages are thought to have been geographically isolated by advancing glaciers about
60,000 years ago, according to the most recent estimates based on complete mitochondrial
genome sequencing (Jacobsen et al. 2012). Following the retreat of the glaciers (~12,000
years ago), these two lineages, known as Atlantic and Acadian, colonized the newly formed
lakes and thus came into sympatry through secondary contact (Bernatchez et al. 2010).
Intraspecific competition for resources (Landry et al. 2007), ecological niche
availability (Pigeon et al. 1997) and divergent selection pressures (Lu and Bernatchez
1999) are together thought to have favored the divergence of the limnetic dwarf form from
the benthic normal form. The two ecotypes differ in a range of phenotypic traits, such as
adult size and weight, age at sexual maturity and lifestyle. Whereas the benthic normal
form can exceed 40 cm in length and 1 kg in weight and matures at 3-4 years, the limnetic
dwarf form reaches only about 20 cm and 100 g, and matures as early as 1-2 years
(Bernatchez et al. 2010). Transcriptomic studies using DNA microarrays have also shown
substantial differences in gene expression between the dwarf and normal forms (Derome et
al. 2006; St-Cyr et al. 2008; Whiteley et al. 2008; Jeukens and Bernatchez 2011). In a
context of sympatric divergence, it is the establishment of effective reproductive
barriers that consolidates such a level of divergence.
There is direct and indirect evidence of pre-zygotic isolation in whitefish.
Because of their difference in size, dwarf and normal whitefish are more likely to mate
with individuals of their own form (McKinnon et al. 2004). In addition, unpublished data
suggest that the dwarf and normal forms spawn at different times and places in some lakes.
Furthermore, because controlled crosses between the dwarf and normal forms of lake
whitefish are possible, we know that strong extrinsic and intrinsic post-zygotic barriers
exist between them. Embryonic mortality is much higher in F1 hybrid and backcross crosses
than in pure dwarf and normal crosses (~50% and ~70% respectively, versus ~20%; Lu and
Bernatchez 1998; Rogers and Bernatchez 2006). Among backcross embryos, a substantial
proportion (10-30%) develops abnormally and shows a malformed phenotype (Renaut and
Bernatchez 2011; Dion-Côté et al. 2015). In addition, the emergence of backcross larvae is
asynchronous compared with that of pure larvae, which can lead to a mismatch with
available trophic resources (Rogers and Bernatchez 2006). Finally, transcriptomic studies
have shown substantial transcriptional deregulation in these malformed backcross hybrids,
associated with the downregulation of genes essential to development (Renaut and
Bernatchez 2011).
The lake whitefish is thus an excellent system for studying the molecular bases of
reproductive isolation. First, studies have shown that the observed phenotypic differences
are largely genetically determined (Bernatchez et al. 2010). Second, because these forms
diverged recently on an evolutionary timescale (~12,000 years), the incompatibilities
identified are more likely to have actually contributed to the divergence process rather
than to have accumulated afterwards (Via and West 2008; Via 2009). Finally, the
possibility of producing hybrid crosses in the laboratory makes it possible to study
reproductive barriers in hybrids directly.
1.9 Objectives of the thesis
The general objective of my thesis was to shed new light on the molecular bases of
post-zygotic isolation in a system of incipient species. My approach targets the
mechanistic causes of reproductive isolation, as opposed to the search for speciation
genes. More specifically, I investigate the importance of genome stability and genome
reorganization early in the speciation process. My work provides elements of an answer to
two topical questions in speciation research (Marie Curie SPECIATION Network 2012). First,
which barriers contribute to reproductive isolation at the onset of speciation? Second,
which environmental and genetic conditions favor divergence and, ultimately, speciation?
In the second chapter, we set out to directly test the hypothesis of transposable element
derepression in hybrids. Previous work in our laboratory had suggested that transposable
elements might be derepressed in lake whitefish hybrids (Renaut and Bernatchez 2011).
Using RNA sequencing, we confirmed that transposable elements and non-coding RNAs were
derepressed in malformed backcross hybrids. This study also confirmed transgressive
transcription in hybrids and uncovered new candidate genes for adaptive divergence between
dwarf and normal whitefish.
The third chapter aimed to test the hypothesis that genomic instability contributes to
reproductive isolation in the lake whitefish system. To do so, we directly examined the
chromosomes of healthy and malformed backcross hybrids, which revealed widespread
aneuploidy. More precisely, we found that chromosome counts varied widely from cell to
cell within healthy backcross individuals, an observation consistent with mitotic
instability of chromosome segregation. In addition, among malformed backcrosses we found
individuals with highly variable chromosome counts, some being haploid, diploid, triploid
and even nearly tetraploid. This observation strongly suggests pronounced meiotic
instability in their F1 hybrid parent, presumably leading to non-disjunction of the entire
chromosome complement.
In the fourth and final chapter, we applied classical and molecular cytogenetic techniques
to test for chromosomal rearrangements in three pairs of incipient lake whitefish species.
This approach revealed unexpected sub-chromosomal polymorphism, probably reflecting an
ongoing process of genomic reorganization. We also applied a novel multivariate
statistical strategy to the cytogenetic data, which allowed us to detect patterns of
chromosomal divergence between glacial lineages, between lakes and between ecotypes.
Chapter 2: RNA-seq reveals transcriptomic shock
involving transposable elements reactivation in hybrids of
young lake whitefish species
2.1 Summary
Identifying the molecular bases of reproductive isolation between diverging lineages is an
essential step toward understanding speciation in natural populations. Reproductive
barriers can lead to hybrid breakdown, a syndrome documented in several systems and
potentially involving the reactivation of transposable elements. In eastern North America,
two lake whitefish lineages colonized several postglacial lakes ~12,000 years ago, and a
limnetic dwarf species evolved repeatedly from the benthic normal species. Reproductive
isolation between these forms is incomplete: viable hybrids can be generated in the
laboratory, but significant mortality is observed in backcross hybrids, associated with a
malformed phenotype, suggesting hybrid breakdown. Using RNA sequencing, the objective of
this study was to identify genes misexpressed in these hybrids and to rigorously test the
hypothesis of transposable element reactivation. We compared the transcriptional profiles
of pure embryos, F1 hybrids, and healthy and malformed backcross hybrids at a late
embryonic stage. For the first time, pronounced expression differences, consistent with
the adaptive divergence documented in previous studies of adults, were identified between
pure embryos. Profound transcriptome-wide deregulation was observed in malformed
backcrosses, with more than 15% of transcripts differentially expressed in all
comparisons, compared with 1.5% between the parental forms. Strong evidence supporting the
reactivation of transposable elements and non-coding transcripts is presented. We suggest
that hybrid breakdown probably results from numerous genomic incompatibilities, plausibly
including transposable elements. Combined with previous studies, these results reveal a
synergy among several reproductive barriers that contributes to maintaining divergence
between these two young whitefish species.
2.2 Abstract
Identifying the molecular basis of reproductive isolation among diverging lineages
represents an essential step toward understanding speciation in natural populations.
Postzygotic barriers can lead to hybrid breakdown, a syndrome that has been documented
in several systems, potentially involving the reactivation of transposable elements. In
northeastern North America, two lake whitefish lineages repeatedly colonized
postglacial lakes ~12,000 years ago, and a dwarf limnetic species has evolved multiple
times from the normal benthic species. Reproductive isolation is incomplete between them;
viable hybrids can be generated in the laboratory but significant mortality occurs and is
associated with a malformed phenotype in backcross embryos, thus revealing a hybrid
breakdown syndrome. By means of RNA-seq analyses, the objective of this study was to
determine which genes were misregulated in hybrids and rigorously test the hypothesis of
transposable element reactivation. We compared the transcriptomic landscape in pure
embryos, F1-hybrids, and healthy and malformed backcrosses at the late embryonic stage.
Extensive expression differences consistent with previously documented adaptive
divergence between pure normal and dwarf embryos were identified for the first time.
Pronounced transcriptome-wide deregulation in malformed backcrosses was observed, with
over 15% of transcripts differentially expressed in all comparisons, compared with 1.5%
between pure parental forms. Convincing evidence for the reactivation of transposable elements
and noncoding transcripts in malformed backcrosses is presented. We propose that hybrid
breakdown likely results from extensive genomic incompatibilities, plausibly encompassing
transposable elements. Combined with previous studies, these results reveal synergy among
many reproductive barriers, thus maintaining divergence between these two young
whitefish species.
2.3 Introduction
Biodiversity is generated and maintained in different geographical settings through the
process of speciation. In recent years, considerable efforts have been deployed towards the
understanding of ecological sympatric speciation, a particular case of speciation where
populations diverge, while still potentially freely interbreeding (Rundle and Nosil 2005;
Schluter 2009; Nosil and Feder 2012). At the other end of the continuum, allopatric
speciation occurs when populations diverge while geographically separated, and is
considered by modern evolutionary biologists to be the null-hypothesis for speciation
(Coyne and Orr 2004). Theory describes well the buildup and maintenance of accumulated
divergence, with the Bateson-Dobzhansky-Muller (BDM) model of genetic
incompatibilities at the heart of the process. Still, little is known about the specific
molecular targets most likely maintaining young diverging lineages apart following
secondary contact in natural populations, the ‘ultimate test’ of allopatric speciation (Coyne
and Orr 2004). Hence, key questions regarding the number and type of genes most likely to
be involved in the speciation process remain, and accordingly, a better understanding of the
mechanisms underlying recent speciation events is needed, especially in natural populations
(Nosil and Schluter 2011; Marie Curie SPECIATION Network 2012).
Classically, reproductive barriers have been divided into pre- and post-zygotic. Intrinsic
post-zygotic barriers, as described by the BDM model of genetic incompatibilities, result in
hybrid sterility or inviability. These incompatibilities arise from accumulated genetic
divergence, either neutral (Lynch 2007) or adaptive (Rundle and Nosil 2005). When
secondary contact occurs, recombination can disrupt co-adapted alleles occurring in
parental forms, and according to the BDM model of genetic incompatibilities, this can lead
to a breakdown of epistatic interactions, particularly so in post-F1-hybrid generations.
Ultimately, this may result in strong post-zygotic intrinsic reproductive barriers
(Dobzhansky 1937; Muller 1942; Coyne and Orr 2004). Clearly, determining the genetic
basis of intrinsic post-zygotic reproductive barriers will help unravel how biodiversity is
maintained as such barriers invariably prevent further gene flow among lineages, and thus
complete the speciation process (Muller 1939; Muller 1942). The best-documented
examples of the molecular mechanisms underlying post-zygotic reproductive barriers mostly
involve one or a few genes in old species (i.e. several hundreds of thousands to a few
millions of years), where speciation is nearly complete (Wittbrodt et al. 1989; Schartl et al.
1994; Barbash et al. 2003; Presgraves et al. 2003, but see Christie and Macnair 1984;
Wright et al. 2013). Moreover, the combined action of time and linkage disequilibrium may
eventually hinder the discovery of the causative mutations, and associated post-zygotic
barriers (Via and West 2008). Therefore, investigating the molecular mechanisms
underlying reproductive isolation in recently diverged natural populations may provide a
better comprehension of the speciation process.
Recent work has provided evidence regarding the role of gene regulatory networks,
epigenetics and transposable elements in speciation, leading to a broader interpretation of
the BDM model (reviewed in Abbott et al. 2013). In particular, numerous studies have
suggested a crucial role for transposable elements in speciation (Rieseberg et al. 1995;
O'Neill and Graves 1998; Labrador et al. 1999; Ungerer et al. 2006; Symonová et al. 2013).
These studies generally support McClintock’s hypothesis of a “genomic shock”, which
posits that transposable elements reactivation in hybrids results from the stress triggered by
the admixture of two diverged genomes in a single individual (McClintock 1984). Early
studies in Drosophila melanogaster have shown that a DNA transposon (P element) can
lead to sterility in F1-progeny of dysgenic crosses (Kidwell et al. 1977; Rubin et al. 1982),
likely as a result of the reactivation of multiple types of transposable elements (Khurana et
al. 2011). Similarly, a recent study has shown the transcriptional reactivation of
transposable elements in interspecific Drosophila hybrids (Kelleher et al. 2012). In
marsupials, amplification of retro-elements has been associated with genome-wide under-methylation and chromosome remodeling in hybrids (O'Neill and Graves 1998). Hence,
evidence supporting the hypothesis that transposable element reactivation represents an
important genetic mechanism underlying post-zygotic hybridization barriers is rapidly
mounting.
Lake whitefish represents an excellent model to study early post-zygotic barriers because
reproductive isolation is recent and remains incomplete in many populations. Natural
populations from northeastern North America experienced a recent allopatric period during
the Pleistocene glaciation (~60,000 yr BP), followed by secondary contact in newly formed
lakes once the ice sheet retreated (Bernatchez et al. 2010; Jacobsen et al. 2012). Lakes from
the St. John River basin (Maine, USA and Québec, Canada) were colonized by two glacial
lineages, so-called Atlantic and Acadian, the former evolving rapidly and repeatedly into a
dwarf limnetic species from the normal benthic species (Pigeon et al. 1997). Early
transcriptomic studies using heterologous cDNA microarrays have documented gene
expression divergence linked to phenotypic divergence between dwarf and normal
whitefish, both at the juvenile and adult stage, yet not at the embryonic one (Derome et al.
2006; St-Cyr et al. 2008; Whiteley et al. 2008; Nolte et al. 2009; Renaut et al. 2009;
Jeukens et al. 2010). Given that hybrids can be generated and maintained under laboratory
conditions, earlier studies have found strong post-zygotic barriers between dwarf and
normal whitefish, including an increased mortality of hybrid embryos, patterns of
transgressive segregation comprising incoherent hybrid hatching times, as well as signs of
segregation distortion in backcrosses, suggestive of BDM incompatibilities (Lu and
Bernatchez 1998; Rogers and Bernatchez 2006; Rogers and Bernatchez 2007; Renaut et al.
2009; Renaut and Bernatchez 2011; Gagnaire et al. 2013). A proportion of backcross
embryos (30-50% depending on crosses) has also been characterized by a unique
malformed phenotype and displays extensive gene expression variance and under-expression of essential development genes (Renaut and Bernatchez 2011). Altogether, these
observations suggest extensive genomic incompatibilities between normal and dwarf
whitefish.
In the current study, the molecular phenotype of malformed backcross embryos was further
characterized to disentangle the molecular mechanisms underlying post-zygotic hybrid
incompatibilities. It was hypothesized that transposable elements would be reactivated in
malformed backcross embryos, and would be accompanied by the down-regulation of key
metabolic pathways in response to genomic stress. To test this hypothesis, gene expression
was measured by means of RNA-sequencing in pure, F1-hybrids, healthy backcross, and
malformed backcross embryos. Our results suggest a severe genomic shock involving
transposable elements reactivation in the hybrid progeny, despite the very young age of
these incipient species.
2.4 Results
Transcriptome sequencing and assembly
A total of 1.26 × 10^9 100bp paired-end reads were sequenced, totaling 1.26 × 10^11
nucleotides. On average, 5.24 × 10^7 ± 6.23 × 10^6 paired-end reads were sequenced per
library. Filtered individual reads (9.30 × 10^8 remaining reads) were assembled into 77,697
contigs with a minimum size of 200bp using ABySS first for each library, then all libraries
together, and finally with CLC genomics workbench (0.7 overlap, 98% similarity). The
final transcriptome also included 1,350 contigs smaller than 200bp, and annotated as
transposable elements, for a total of 79,047 contigs (total assembly size: 67,143,916 bp).
This corresponds to roughly 2% of the expected whitefish genome size (3Gb, Booke 1968;
Hardie and Hebert 2003). The smallest contig was 94bp and the largest 10,610bp, with an
average size of 849bp and a median of 565bp. A blast of the assembled transcriptome
against itself (blastn, % identity >96%, overlap >100 bp) returned 5,907 hits,
suggesting the assembly of numerous alternative transcripts, closely related paralog
assembly, splitting of exons, or more likely a combination of the three.
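As a quick consistency check on the assembly figures reported above, the total contig count, the mean contig length and the fraction of the genome represented can be recomputed from the quoted numbers. The Python sketch below is purely illustrative and is not part of the original analysis; all variable names are hypothetical.

```python
# Figures quoted in the text above (variable names are illustrative).
n_contigs_ge_200 = 77_697        # contigs of at least 200 bp
n_te_contigs_lt_200 = 1_350      # shorter contigs kept because annotated as TEs
total_assembly_bp = 67_143_916   # total assembly size
genome_size_bp = 3_000_000_000   # expected whitefish genome size (3 Gb)

n_contigs = n_contigs_ge_200 + n_te_contigs_lt_200
mean_contig_bp = total_assembly_bp / n_contigs
genome_fraction = total_assembly_bp / genome_size_bp

print(n_contigs)                  # 79047
print(round(mean_contig_bp))      # 849
print(f"{genome_fraction:.1%}")   # 2.2%
```

The mean follows directly from total size divided by contig count; the median cannot be recovered this way because it depends on the full length distribution.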
Transcriptome annotation
Approximately half of the contigs (n = 38,636, 49%) were successfully annotated by tblastx
(Figure 2.1). Among these annotations, 22,139 were unique (57.3%), consistent with the
assembly of alternative transcripts, exon splitting, or closely related paralogs. An open reading frame (ORF) of size ≥300bp was found for 71.3% of the contigs in the annotated
portion (27,546 contigs out of 38,636), whereas an ORF was found for only 19.6% of the
un-annotated portion (7,917 contigs out of 40,411). Hence, contigs for which we found no
tblastx hit and no ORF ≥ 300bp were annotated as putatively non-coding transcripts.
Annotations in Figure 2.2 show that expressed transposable elements are largely composed
of DNA transposons (n = 2,122, 72%) relative to retrotransposons (n = 823, 28%). Among
transposable element super-families, the Mariner/Tc1 DNA transposons were the most
prevalent (n = 2,003, 68%). The second most abundant are gypsy elements (n = 205, 7.0%),
a class of LTR retrotransposons, followed by CR1 (n = 190, 6.5%), which are non-LTR
retrotransposons. Other super-families (n = 550, 18.7%) individually account for less than
5% of the annotated expressed transposable elements.
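The rule used above to flag putatively non-coding transcripts (no tblastx hit and no ORF ≥ 300bp) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: an ORF is assumed here to run from an ATG to the first in-frame stop codon on either strand, ORFs left open at a contig end are ignored, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_orf_bp(seq: str) -> int:
    """Length in bp (stop codon excluded) of the longest ATG-to-stop ORF in six frames."""
    seq = seq.upper()
    best = 0
    # Scan the forward strand and its reverse complement.
    for strand in (seq, seq.translate(_COMPLEMENT)[::-1]):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i - start)
                    start = None
    return best


def is_putatively_noncoding(seq: str, has_tblastx_hit: bool, min_orf_bp: int = 300) -> bool:
    """Apply the rule from the text: no tblastx hit and no ORF of at least min_orf_bp."""
    return (not has_tblastx_hit) and longest_orf_bp(seq) < min_orf_bp


# Hypothetical contigs: one with a 303-bp ORF, one with a 6-bp ORF.
coding_like = "ATG" + "GCA" * 100 + "TAA"
short_contig = "ATGGCATAA"
print(is_putatively_noncoding(coding_like, has_tblastx_hit=False))   # False
print(is_putatively_noncoding(short_contig, has_tblastx_hit=False))  # True
```

A different ORF definition (for example, allowing ORFs truncated at contig ends) would change which contigs fall below the 300bp threshold.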
Transcriptional divergence between pure normal and dwarf embryos
To provide an overview of the transcriptomic landscape and reduce the number of
dimensions of this large dataset, a principal component analysis (PCA) was performed with
the prcomp function in the R environment, based on normalized read counts obtained from
edgeR. While normal and dwarf replicates closely cluster on the first two PCs (Figure
2.3a), PC3 shows that the dwarf transcriptomes form a distinct and more cohesive group as
compared to the normal transcriptomes (Figure 2.3b). However, PC3 explained a relatively
small proportion of the overall variance found in the dataset (< 7.5%).
Table 2.1 provides a summary of the number of differentially expressed transcripts between
all groups for all contigs (below diagonal) and transposable elements (TE, above diagonal).
Briefly, 1,166 transcripts were differentially expressed between dwarf and normal whitefish
embryos, including 314 by at least five fold and 28 by more than 1000 fold (Figure S 2.2).
Enriched Gene Ontology (GO) biological processes included terms related to muscle
development and vision, in addition to cardiac development (Table 2.2) and the term
“Mitotic chromosome condensation”. Phosphofructokinase, a key enzyme in glycolysis that
catalyzes an irreversible reaction, was up-regulated in dwarf as compared to normal
whitefish (Figure S 2.5).
Gene expression in healthy hybrids
As depicted in Figure 2.3a, PC1 did not differentiate F1-hybrids and healthy backcrosses
from pure embryos. However, healthy backcrosses displayed extensive transcriptional
variability, comprising a range of expression largely outside the scope of the pure embryos
on PC2. Also, PC3 showed that both F1-hybrids and healthy backcrosses were more similar
to normal than dwarf embryos, possibly reflecting maternal effects (see Discussion).
Differential expression analysis confirmed observations based on PCA. More transcripts
were differentially expressed in F1-hybrids as compared to dwarfs (n = 653) than to
normals (n = 108). Similarly, healthy backcrosses had more differentially expressed
transcripts as compared to dwarf (n = 967) than to normal (n = 248).
Transcriptome-wide deregulation in malformed backcrosses
As shown in Figure 2.3a, PC1 clearly distinguishes malformed backcrosses from all other
healthy embryos. In addition, malformed backcrosses show pronounced heterogeneity
among replicates relative to other cross types. A large number of transcripts were
differentially expressed in malformed backcrosses compared to pure crosses, ranging from
17.5% (n = 13,217, compared to normal) to 21.0 % (n = 15,815, compared to dwarf) of the
expressed transcripts.
The proportions of over-expressed and under-expressed transcripts in malformed
backcrosses are presented in Table 2.3. The “all contigs” column shows that the proportion
of over- vs. under-expressed transcripts does not significantly differ from a 1:1 ratio in any
comparison. Among over-expressed transcripts in malformed backcrosses relative to all
other groups (“Common” column, n = 4,379), 25 enriched GO terms were found, most of
which grouped into broader categories defined as protein synthesis, inflammatory response
and stress response (Table S 2.1). In addition, enzymes from core metabolic pathways were
down-regulated in malformed embryos. For example, many down-regulated enzymes were
found in the carbohydrate metabolism pathway (Figure S 2.5). Similar trends of down-regulation of housekeeping metabolic pathways were also observed for the TCA cycle,
oxidative phosphorylation, fatty acid elongation and purine and pyrimidine metabolism
(Figure S 2.6 to Figure S 2.10).
Transposable elements and non-coding RNA reactivation in malformed backcrosses
In contrast to the even ratio of expression observed among all contigs, transposable
elements showed a highly significant excess of over-expression (p < 0.0001 in all
comparisons, Fisher’s exact test) with an approximate 2:1 ratio in all comparisons (Table
2.3). Depending on the comparison, several transcripts were over-expressed by a factor of
10 or more relative to other embryo groups, including one transposable element showing a
232 fold increase compared to dwarf (Figure S 2.3). Among the over-expressed
transposable elements in malformed backcrosses that were common to all comparisons,
retrotransposons were over-represented (n = 51 out of 125 over-expressed transposable
elements [40.8%] vs n = 823 out of 2,945 annotated transposable elements [27.9%]; p =
0.032, Fisher’s exact test). However, only two super-families of retrotransposons (L1, n =
45 contigs out of 2,945 annotated transposable elements [1.5%] vs n = 3 contigs out of 125
commonly over-expressed transposable elements [2.4%]; R2, n = 11 out of 2,945 annotated
transposable elements [0.37%] vs n = 3 out of 125 commonly over-expressed transposable
elements [2.4%]) were significantly over-represented (p = 1.81e-05 and p = 0.0033
respectively, Fisher’s exact test).
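The over-representation tests reported above rely on Fisher's exact test applied to 2x2 tables of counts. The Python sketch below shows how a two-sided p-value is obtained from the hypergeometric distribution. It is a generic illustration with toy counts: the exact layout of the contingency tables used in the study is not given in this section, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Toy table: 3 of 4 items in group A carry the feature, 1 of 4 in group B.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

With the toy table, the tables at least as extreme as the observed one have probabilities 1/70, 16/70, 16/70 and 1/70, which sum to 34/70.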
Similarly to transposable elements, putatively non-coding RNA displayed a highly
significant excess of over-expression in malformed backcrosses, again with an approximate
2:1 ratio compared to all other cross types (Table 2.3). Finally, a contig homologous to the
DNA methyltransferase DNMT1 was significantly under-expressed in malformed
backcrosses in all comparisons (fold-change = 0.43, 0.40, 0.41, and 0.43 vs normal, dwarf,
F1-hybrids, and healthy backcrosses, respectively; all FDR ≤ 0.00001). This gene is of
particular interest because it has been shown to be involved in the transcriptional repression
of transposable elements (see Discussion).
2.5 Discussion
The goal of this study was to identify the transcriptional mechanisms underlying the
genomic shock previously described in hybrids between dwarf and normal whitefish.
Specifically, we tested the hypothesis that transposable elements reactivation may be linked
to the genomic shock observed in malformed backcross embryos. Key transcriptional
differences were identified between normal and dwarf embryos, consistent with well-described phenotypes in adult normal and dwarf whitefish. Transcriptome-wide
deregulation observed in malformed backcross embryos was associated with a shutdown of
core metabolic pathways. In particular, observations supporting transposable elements and
non-coding transcripts reactivation in malformed backcrosses are presented, giving a new
angle to our understanding of the genetic mechanisms underlying post-zygotic barriers in
lake whitefish.
Transcriptome assembly and annotation
We assembled the most comprehensive and complete transcriptome published to date for a
coregonid species (Jeukens et al. 2010; Renaut et al. 2010), providing a useful tool to the
salmonid research community. The number of contigs assembled was relatively high (n =
79,047), which can be explained in several ways. First, all individuals were used to
assemble the reference transcriptome, including hybrids, in order to consider “biologically
aberrant” transcripts that may be expressed only in hybrids. In addition, salmonids are
pseudo-tetraploids, having undergone a recent whole genome duplication (~60 MYA;
Crête-Lafrenière et al. 2012), potentially increasing the number of assembled contigs.
Evidence also suggests exon splitting among assembled contigs (data not shown), but
investigating the details of this phenomenon was beyond the scope of the current study.
Considering the difficulties regarding gDNA assembly with next-generation DNA
sequencing data encountered in whitefish and also in other salmonids (Quinn et al. 2008;
Hébert et al. 2013), we believe our assembly efforts are satisfactory for the questions being
addressed here.
Non-coding transcripts can account for up to ~50% of the poly-adenylated fraction of the
transcriptome in well-annotated species such as human and yeast (Nagalakshmi et al. 2008;
Djebali et al. 2012). Here, putatively non-coding transcripts accounted for 39.8% of the
assembled contigs. This suggests that the vast majority of the contigs in the un-annotated
portion are indeed non-coding RNAs, and reflects substantial expression of non-coding
RNAs, as previously found in different model species (Clark et al. 2011).
Transcriptional and regulatory divergence between dwarf and normal whitefish embryos
We present the first significant transcriptional differences between dwarf and normal
whitefish embryos. Specifically, we found that about 1.5% of the transcriptome was
differentially expressed (n = 1,166), consistent with regulatory divergence between these
young species very early in their ontogeny. In contrast, previous work using heterologous
microarrays developed for Atlantic salmon (Salmo salar) identified few differentially
expressed genes between normal and dwarf embryos (Renaut et al. 2009, n = 5; Renaut and
Bernatchez 2011, n = 2).
In a study comparing RNA-sequencing to gene expression arrays in human, Marioni et al.
(2008) detected more differentially expressed transcripts by RNA-seq than micro-arrays
(but see Kogenaru et al. 2012). Hence, the discrepancy between this study and previous
work could be caused by a combination of factors. First, RNA-seq has a marked increased
dynamic range compared to micro-arrays (Nagalakshmi et al. 2008). Other factors inherent
to micro-array technology used by Renaut et al. (2009) and Renaut and Bernatchez (2011)
include probe absence on the micro-array itself or divergence with the studied species, loop
design and multiple comparisons (Kerr and Churchill 2001). Very few studies, if any in
non-model species, have made comparisons of both methods with similar samples,
making it difficult to conclude definitively what causes the observed discrepancy.
Nevertheless, our work suggests a significantly increased power of RNA-sequencing in
detecting specific differentially expressed genes compared to micro-arrays, in non-model
species.
While common garden experiments have previously shown that phenotypic divergence is
partially genetically determined in lake whitefish adults (Rogers et al. 2002), our results
now illustrate that genetic divergence also has a transcriptional impact very early in
ontogeny. Gene ontology analysis among differentially expressed transcripts revealed that
skeletal muscle development genes were enriched between dwarf and normal embryos.
This result is consistent with previous transcriptomic studies on adult white muscle
(Derome et al. 2006; St-Cyr et al. 2008; Jeukens et al. 2010), phenotypic differences in
adult size, and growth rate between dwarf and normal whitefish (Bernatchez et al. 2010). A
new candidate physiological process potentially involved in phenotypic divergence
between dwarf and normal embryos was also identified, namely cardiac development. This
is of particular interest, since Trudel et al. (2001) found that dwarf whitefish have lower
food conversion efficiency than normal whitefish, suggesting a higher dwarf metabolic rate.
Since cardiac output is tightly linked to metabolic demand in fishes, cardiac physiological
changes are expected in lake whitefish (Farrell and Jones 1992). Along these lines,
previous work has shown a trend towards increased ventricle size in dwarf compared to
normal lake whitefish (Evans et al. 2013). Our finding that genes associated with cardiac
development are differentially expressed between dwarf and normal embryos thus suggests
that changes in heart development may contribute to metabolic divergence. Alternatively,
gene expression differences observed between dwarf and normal embryos could be the
result of different stages of development or tissue compositions rather than being due to
genetic differences. However, the number of genes that were differentially expressed
between pure dwarf and normal whitefish is larger (n = 1,166) than the total number of
genes that have been reported to differ in expression across all major embryonic
developmental stages in zebrafish (n = 732, Mathavan et al. 2005). While the results on
zebrafish and ours cannot be directly compared as they were obtained with different
methods, they certainly provide an indication that most of the genes differentially expressed
between dwarf and normal embryos are indeed the result of genetic differences rather than
different stages of development. In addition, the fact that embryos of both forms used in this
study corresponded to a phenotypically well-described developmental stage, and that genes
underlying these differences are in agreement with the adaptive phenotypic divergence
between these two forms again argues against the sole effect of tissue composition or
development effect. Hence, we do not rule out a possible effect of tissue composition or
development, potentially amplifying observed patterns. However, we argue that this is not
the main explanation for such important gene expression divergence and misregulation.
Gene expression in healthy hybrids
The hypothesis that gene expression patterns in hybrids are more similar to normal than
dwarf embryos was verified, as expected based on previous observations in juveniles (Nolte
et al. 2009). This is likely the consequence of the genetic background of backcrosses (~75%
normal background), and maternal effects on gene expression in F1-hybrids, as found in
hybrids of another salmonid (Bougas et al. 2010). In backcrosses, this could also result
from segregation distortion disfavoring dwarf alleles in a normal genetic background, and
this phenomenon has been previously documented in lake whitefish hybrids (Rogers and
Bernatchez 2006; Gagnaire et al. 2013). Results presented here are supported by a previous
study of inheritance patterns in juvenile lake whitefish (Renaut et al. 2009). This study
revealed primarily additive patterns of gene expression in F1-hybrids, while backcross
hybrids were characterized by extensive non-additive gene expression and considerable
gene expression variance (Renaut et al. 2009; Renaut and Bernatchez 2011). This could
result from a combination of dominance effects and disruption of co-adapted alleles
through recombination events, ultimately leading to gene misexpression (Landry et al.
2007). On the other hand, genes with large pleiotropic effects, generally assumed to be
transcription factors, could also lead to extensive patterns of gene misexpression. Along
those lines, previous eQTL mapping studies in whitefish have localized several mapping
hotspots (up to 53 eQTL mapping to a single locus), consistent with master regulator
localization (Derome et al. 2008; Whiteley et al. 2008). Admittedly, our study design was
not suitable for direct testing of these alternative hypotheses. Considering the large number
of differentially expressed genes in hybrids, multiple genetic mechanisms are likely
contributing to these gene expression profiles, as observed in hybrids of another salmonid
(Bougas et al. 2010).
Evidence for a transcriptome-wide shock in malformed backcrosses
Hybrid incompatibilities, reflected by the disruption of gene expression pathways, can lead
to increased mortality and transgressive phenotypes in recombinant backgrounds
(Dobzhansky 1940; Muller 1942). This phenomenon has been documented before in lake
whitefish backcrosses through the characterization of a malformed phenotype unique to
recombinant hybrids, and by transcriptomic studies using microarrays (Renaut et al. 2009;
Renaut and Bernatchez 2011). It was thus hypothesized that malformed backcross hybrids
would show extensive transcriptional variance and differential expression, and observations
confirmed this pattern.
Given that both healthy and malformed backcross embryos come from the same mother,
our results cannot be attributed to maternal effects specific to malformed embryos. In
addition, the same malformed phenotype was observed in two completely independent
backcrosses, although in variable proportions (normal female X F1-hybrid male, dwarf
female X F1-hybrid male, Dion-Côté, unpublished). PCA showed a unique transcriptional
profile in malformed backcrosses and high heterogeneity among different pools of embryos
(Figure 2.3). Differential expression with pure parental crosses also revealed extensive
transcriptional deregulation in these embryos. In addition, over-expressed transcripts in
malformed backcrosses were enriched in biological processes associated with stress and
inflammatory responses, protein synthesis and lipid metabolism, consistent with a genomic
shock response (Table S 2.1). We also observed that genes in core metabolic pathways
were down-regulated in these malformed backcrosses, complementing previous observation
of down-regulation of essential developmental genes (Renaut and Bernatchez 2011).
Finally, the majority of the transcripts analyzed (n = 799 out of 1,166, 68.5%) showed a
non-additive level of expression in malformed backcrosses (|d/a ratio| > 1.5, Figure S 2.4).
Altogether, these results provide new insights into the malformed phenotype observed in
backcrosses by showing that the genomic shock involves a displacement of cellular activity
from basal metabolism to stress response. In summary, our results are in line with genome-wide transcriptional BDM incompatibilities, but could alternatively result from a few major
regulatory genes impacting the whole genome (Landry et al. 2007; Maheshwari and
Barbash 2011). However, as discussed above, it appears more likely that multiple
intertwined genetic mechanisms generate these complex transcriptional patterns, including
transposable elements reactivation, as discussed below.
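The non-additivity criterion quoted above (|d/a ratio| > 1.5) can be illustrated with a short Python sketch. It assumes the standard quantitative-genetics definitions, which this section does not spell out: a is half the difference between the two parental means and d is the deviation of the hybrid mean from the mid-parent value. The expression scale (raw or log-transformed) and the function names are likewise assumptions.

```python
def d_over_a(parent1: float, parent2: float, hybrid: float) -> float:
    """Dominance-to-additive ratio for one transcript, given mean expression levels."""
    midparent = (parent1 + parent2) / 2
    a = (parent1 - parent2) / 2   # additive effect (assumed definition)
    d = hybrid - midparent        # dominance deviation (assumed definition)
    if a == 0:
        raise ValueError("d/a is undefined when the parental means are equal")
    return d / a


def is_non_additive(parent1: float, parent2: float, hybrid: float,
                    threshold: float = 1.5) -> bool:
    """Flag a transcript as non-additive using the |d/a| threshold from the text."""
    return abs(d_over_a(parent1, parent2, hybrid)) > threshold


# Hypothetical expression values on an arbitrary scale.
print(d_over_a(10.0, 6.0, 8.0))          # 0.0 -> hybrid at the mid-parent value
print(d_over_a(10.0, 6.0, 12.0))         # 2.0 -> hybrid beyond the higher parent
print(is_non_additive(10.0, 6.0, 12.0))  # True
```

Under these definitions, |d/a| = 1 corresponds to a hybrid matching one parent, so a threshold of 1.5 selects transcripts whose hybrid expression falls outside the parental range.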
An alternative explanation could be that the transcriptome-wide deregulation in malformed
backcrosses results from different stages of development or tissue composition. Again, we
do not rule out the possible contribution of these factors; however, we argue that they are
not the main explanation for the observed pattern of transcriptome-wide deregulation. The
total number of genes found to be differentially expressed across all major embryonic
developmental stages in zebrafish (n = 732, Mathavan et al. 2005) is one order of
magnitude lower than observed here (n = 13,217 to n = 15,815). In addition, a previous
study found evidence of transcriptional deregulation, including transposable element
reactivation, in individually dissected tissues from adult backcrosses (Renaut et al. 2010).
Moreover, we found evidence of increased variance and elevated transgressivity in
malformed backcrosses as compared to other groups, consistent with genetic
incompatibilities. For all these reasons, we believe that the magnitude of the changes
observed, in terms of number of transcripts differentially expressed and the fold-change
involved cannot be explained solely by differential tissue distribution or sampling stage.
Transposable elements and non-coding transcripts reactivation in hybrids
A preliminary study in lake whitefish has presented evidence that transposable elements
were reactivated in adult backcrosses (Renaut et al. 2010). These elements must be tightly
regulated in order to prevent their mobilization, which can lead to several deleterious
effects such as chromosomal rearrangements (Lönnig and Saedler 2002), epigenetic
alterations (Slotkin and Martienssen 2007), and the generation of DNA damage and
apoptosis (Belgnaoui et al. 2006; Khurana et al. 2011). Transposable element regulation is
achieved typically through cytosine methylation mediated by DNA methyltransferases
(DNMT) and by RNA-based mechanisms, mainly by PIWI protein and PIWI-interacting
RNA (piRNA) in the gonads, leading to their stable repression in the progeny (Slotkin and
Martienssen 2007). Our results clearly show that transposable elements and non-coding
transcripts are reactivated in malformed embryos, often by a factor of 10 or more, and as
much as 232 fold. Over 6,000 non-coding transcripts are differentially expressed in all
comparisons, with over 62% of these being over-expressed; such differences are very
unlikely to be the mere result of different tissue composition, as discussed above.
Moreover, the reactivation of transposable elements is expected to lead to gene
deregulation because they can influence epigenetic marks in their genetic neighbourhood
(Slotkin and Martienssen 2007). We believe that transposable elements could be a key
component of post-zygotic isolation mechanisms in the lake whitefish system given that (1)
around 14,000 transcripts are differentially expressed in malformed backcrosses, regardless
of the comparison, (2) among these, over 8,000 transcripts are commonly differentially
expressed in all comparisons, (3) many transposable elements are reactivated by over 10
fold, including one by as much as 232 fold, and finally (4) a very strict cut-off was used to
identify these (FC > 2, FDR 0.01). Hence, we propose that transposable elements represent
very strong alternative candidates, in addition to genes involved in classical BDM
incompatibilities, to explain the genome-wide gene expression disruption (Whitelaw and
Martin 2001) in malformed hybrids.
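The cut-off mentioned above combines a fold-change threshold with a false discovery rate (FDR) threshold. The Python sketch below illustrates one way such a filter can work. It makes several assumptions that this section does not confirm: that the FDR is a Benjamini-Hochberg adjusted p-value, that the fold-change threshold is applied symmetrically (FC > 2 or FC < 0.5), and that the FDR comparison is strict. Function names are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes_cutoff(fold_change: float, fdr: float,
                  min_fc: float = 2.0, max_fdr: float = 0.01) -> bool:
    """True when expression changes more than min_fc-fold in either direction and FDR < max_fdr."""
    return (fold_change > min_fc or fold_change < 1 / min_fc) and fdr < max_fdr


# Hypothetical raw p-values for four transcripts.
print([round(q, 3) for q in benjamini_hochberg([0.001, 0.008, 0.039, 0.041])])
# [0.004, 0.016, 0.041, 0.041]

# A fold-change of 0.43 with a very small FDR passes the symmetric filter.
print(passes_cutoff(0.43, 0.00001))  # True
```

Requiring both conditions is conservative: a transcript with a large fold-change but a weak statistical signal is excluded, as is a highly significant but small change.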
Reactivation of transposable elements in malformed backcrosses has been observed
previously in other inter-specific crosses (Michalak 2009), but to our knowledge, this has
not been demonstrated in such young vertebrate species (~12,000 years). We have shown
here that many super-families of transposable elements were reactivated in malformed
hybrids. Similarly to the P element system in D. melanogaster, this could suggest that a
given lineage possesses a transposable element for which the other lineage is naive, leading
to piRNA pathway shutdown and consequent reactivation of multiple super-families of
transposable elements (Khurana et al. 2011). On the other hand, our results also support
those obtained for Drosophila interspecific crosses where transposable element reactivation
was shown to result from piRNA pathway protein divergence (Kelleher et al. 2012).
Unfortunately, at this stage of our research program, it is impossible to distinguish between
these two hypotheses. Interestingly, the DNA methyltransferase DNMT1, partly
responsible for repression of repetitive elements, including transposable elements, through
DNA methylation (Jones 2012), was down-regulated in malformed backcrosses. DNMT1
under-expression may therefore also contribute to the reactivation of transposable elements
and non-coding transcripts. Altogether, our data argue for a severe genomic shock between
recently diverged normal and dwarf whitefish, resulting from divergence in regulatory
mechanisms combined with the reactivation of transposable elements and non-coding
transcripts in hybrids.
Multiple factors contributing to speciation in lake whitefish
The present work adds to a long-term research program that has highlighted multiple pre- and post-zygotic barriers between dwarf and normal whitefish, some of which can be
indirectly linked to the short period of geographic isolation (Bernatchez et al. 2010). Post-zygotic barriers include incoherent hatching time in hybrids (Rogers and Bernatchez 2006),
decreased sperm performance in backcrosses (Whiteley et al. 2009), genome-wide
segregation distortion (Rogers and Bernatchez 2006; Gagnaire et al. 2013), and increased
hybrid mortality throughout development (Lu and Bernatchez 1998; Rogers and Bernatchez
2006; Renaut and Bernatchez 2011). Here, major gene expression disruption previously
documented in hybrids was confirmed (Renaut and Bernatchez 2011), and new insights
regarding core metabolic pathways down-regulation and transposable element reactivation
in hybrids were provided. These transcriptional mechanisms appear to be related to a
malformed phenotype and premature death in hybrid embryos.
Despite their young evolutionary age, dwarf and normal whitefish are at an advanced
(albeit incomplete) stage along the speciation continuum, most likely because multiple
mechanisms are acting synergistically to maintain and promote their divergence. This is
consistent with the multifarious hypothesis advanced by Nosil et al. (2009), suggesting that
speciation is likely to proceed faster and further where selection is acting on multiple traits.
As argued by us and others, ~60,000 years of allopatry (~15,000-20,000 generations),
followed by secondary contact and ecological divergence ~12,000 years ago (~3,000-4,000
generations) have generated sufficient genomic divergence between normal and dwarf
whitefish to lead to one of the most complete speciation events involving young sympatric
species pairs of North temperate freshwater fishes (Hendry 2009; Bernatchez et al. 2010).
Other systems provide compelling evidence that multiple barriers can act synergistically to
promote speciation. Studies in the walking stick Timema cristinae have revealed that both
geographical and ecological components are likely to have played a role (Nosil 2007).
Nevertheless, speciation is less advanced in Timema as compared to dwarf and normal
whitefish, as suggested by a lack of evidence for hybrid inviability, despite longer
divergence time (~ 1-2 MY vs 60,000 Y, Nosil 2007). Likewise, many reproductive barriers
have also been documented in the Ficedula flycatcher system, including strong intrinsic
post-zygotic barriers, despite little ecological divergence (Saetre and Saether 2010).
Similarly to the lake whitefish, Ficedula flycatchers have experienced allopatric isolation
during the Pleistocene glaciation (~ 1.5-2 MYA), followed by secondary contact in central
Europe. As portrayed by the Ficedula, Timema and lake whitefish systems, mixed
geographic modes of divergence tend to accelerate the divergence process (Nosil and Feder
2012). Where multiple factors, either adaptive or not, act synergistically, divergence is
likely to be more pronounced, and speciation to proceed faster (Nosil 2007). Clearly, none
of these factors should be neglected, whether ecological or non-adaptive, to realize a
holistic understanding of the mechanisms underlying incipient speciation. As White
(1978b) stressed more than 35 years ago: “speciation is the result of the combined action
and interaction of many processes, and any model that relies exclusively on a single process
is bound to be simplistic”.
2.6 Material and methods
Crosses and sampling
Protocols for crosses and rearing were described previously (Nolte et al. 2009; Renaut et al.
2009; Renaut and Bernatchez 2011). While individuals used in the current study come from
the same crosses generated during Fall 2006, biological samples are distinct and represent
an older developmental stage than the one previously investigated. Briefly, pure half-sib
families were created by crossing one female with five males. F1-hybrids were obtained by
crossing one normal female (the same as pure normal) to five dwarf males (the same as
pure dwarf). Backcrosses were made with a F1-hybrid laboratory female crossed to five
normal males. Because gamete availability was limited, it was impossible to do the
complementary crosses. However, earlier studies have shown similar mortality and
phenotypes for both types of crosses (Lu and Bernatchez 1998; Rogers and Bernatchez
2006). Embryos were individually sampled and inspected between 62 and 69 days post-fertilization (between 310 and 345 degree-days, Renaut and Bernatchez 2011) and stored in
RNAlater® at -[…]°C. This corresponds to the post-phylotypic stage, where organs are well
formed, heartbeat is visible, and eyes and dorsal line are pigmented. Malformed embryos
showed strong deformities, with tail generally curved and no heartbeat detectable visually,
as in Renaut and Bernatchez (2011). This phenotype does not correspond to any earlier
stage of development observed in normally developing embryos, while still depicting
characteristics of the phylotypic stage, including eye and dorsal line pigmentation.
Malformed embryos can be easily distinguished from dead ones. To confirm viability, a
comparative group of malformed embryos was followed for several weeks, and continued
development was observed (see Renaut and Bernatchez 2011).
RNA extraction, library preparation and sequencing
Whole embryos were homogenized using a TissueLyser II (Qiagen, Hilden, Germany) and
total RNA was extracted using the PureLink® RNA Mini Kit following the manufacturer's
instructions (Ambion, Life technologies, Carlsbad, United States). RNA was quantified
with a NanoDrop2000 spectrophotometer (Thermo Scientific, Waltham, United States), and
2µg of total RNA was digested with 4U of DNase I (Invitrogen, Life technologies,
Carlsbad, United States) for 15min. at room temperature. DNaseI was inactivated by the
addition of […] mM EDTA (final concentration) and incubated at […]°C for […] min. RNA
quality was assessed using the Experion RNA StdSens kit (Bio-Rad, Hercules, United
States). Only high quality samples (intact rRNA and no detectable trace of gDNA) were
kept for subsequent steps. RNA concentration was measured with Quant-iT™ RiboGreen®
RNA Assay Kit (Invitrogen, Life technologies, Carlsbad, United States) and RNA stored at
-[…]°C.
Each sequencing library was prepared from a pool of three embryos, each contributing 400ng of total RNA, for a total of 1200ng. Five libraries were prepared for each group (normal, F1-hybrid, healthy backcross, malformed backcross), except dwarf (n = 4, due to sample
limitation). Libraries were individually tagged using the TruSeq RNA sample preparation
kit V2 (Illumina, San Diego, California) following the manufacturer’s instructions with one
minor modification: briefly, RNA was eluted, fragmented and primed at […]°C for […] min. in
order to get the appropriate size range (300-700 bases). Library size and concentration were
evaluated using the Experion RNA HighSens kit (Bio-Rad, Hercules, United States).
Sequencing was performed on the Illumina HiSeq 2000 platform for 100 cycles with
paired-ends (6 libraries per lane for a total of 4 lanes), at the McGill University and
Genome Quebec Innovation Centre, Montreal, Canada.
De novo assembly and annotation
Tags and adaptor sequences were removed (Genome Quebec Innovation Centre) and reads
were trimmed with a quality threshold of 2% error-rate per base using CLC genomics
workbench (CLCbio, Aarhus, Denmark). The first assembly was done with cleaned reads
of ≥ 90bp and comprising no more than one ambiguous base (1N) using ABySS 1.3.3
(Simpson et al. 2009), one library at a time, with k-mer sizes of 34, 44, 54 and 64 bp
(abyss-pe q=10 e=5 c=5 s=200 d=50 j=4). Scaffolds were removed as they represented less
than 1% of the contigs. Contigs > 80bp were kept and segmented into 100- and 200-base
fragments with 75% overlap. Fragmented contigs from all the libraries were then combined
into a single file and contigs < 80bp were again removed. All contigs were reassembled
with k-mer sizes of 34, 44, 54 and 64 bases. To reduce redundancy, contigs of ≥ 200 bases
and k-mer coverage ≥ 500 X were kept and reassembled together (70% overlap, 98%
identity) using CLC Genomics Workbench. Contigs of < 200 bases but with k-mer
coverage ≥ 1000 X were also kept for subsequent analysis of repetitive elements.
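The fragmentation step described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names are hypothetical, and the handling of the contig tail (a final window flush with the contig end) and of contigs shorter than a window (kept whole) are assumptions, since the text only specifies the fragment sizes and the 75% overlap.

```python
def fragment_contig(seq, size, overlap=0.75):
    """Cut seq into windows of `size` bases sharing `overlap` of their length."""
    if len(seq) <= size:
        return [seq]
    # 75% overlap means each window starts 25% of a window further along.
    step = max(1, int(round(size * (1 - overlap))))
    starts = list(range(0, len(seq) - size + 1, step))
    # Assumption: add a last window flush with the contig end so no base is lost.
    if starts[-1] != len(seq) - size:
        starts.append(len(seq) - size)
    return [seq[s:s + size] for s in starts]


def fragment_all(contigs, sizes=(100, 200), min_len=80):
    """Fragment every contig longer than min_len into each window size."""
    fragments = []
    for seq in contigs:
        if len(seq) <= min_len:  # source: only contigs > 80 bp were kept
            continue
        usable = [s for s in sizes if len(seq) > s]
        if not usable:
            # Assumption: contigs too short to cut are kept whole, once.
            fragments.append(seq)
        for size in usable:
            fragments.extend(fragment_contig(seq, size))
    return fragments
```

For a 250-base contig this yields seven 100-base windows (start positions 0, 25, ..., 150) and two 200-base windows (start positions 0 and 50).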
Transposable elements were annotated using tblastx against the Repbase database (Jurka et
al. 2005). Then, blastx was used against (1) Swissprot (http://www.uniprot.org), (2)
Ensembl database for Danio rerio (Danio Zv9), (3) the combined Ensembl databases for
Gasterosteus (BROADS1), Oryzias (HdrR), Takifugu (FUGU 4.0) and Tetraodon
(TETRAODON 8.0) (http://www.ensembl.org), and finally (4) nr
(http://www.ncbi.nlm.nih.gov, updated May 23 2012). The best blast result with an
e-value of at most 10⁻⁶ was kept for each contig (Figure 2.1). Finally, the function getorf from the
EMBOSS package was used in order to detect open reading frames (ORF) (Rice et al.
2000). The longest ORF of ≥ 300bp for each contig was kept for subsequent analysis.
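The annotation logic can be summarised with a short Python sketch. It assumes a cascade in which the first database (in the order listed in the text and in Figure 2.1) returning a hit below the e-value threshold provides the annotation. That ordering rule is an interpretation of the text, and the function and constant names are hypothetical. The fallback classification by ORF length follows the caption of Figure 2.1.

```python
# Database priority as listed in the text (assumed to act as a cascade).
DATABASE_ORDER = [
    "Repbase",
    "Swissprot",
    "Ensembl - Danio rerio",
    "Ensembl - Other fishes",
    "nr",
]
E_VALUE_CUTOFF = 1e-6  # best hit must have an e-value of at most 10^-6
MIN_ORF_BP = 300       # ORF length threshold for non-annotated contigs


def annotate(best_hits, longest_orf_bp):
    """Return the annotation source for one contig.

    best_hits maps a database name to the e-value of its best blast hit;
    databases without a hit are simply absent from the mapping.
    """
    for database in DATABASE_ORDER:
        e_value = best_hits.get(database)
        if e_value is not None and e_value <= E_VALUE_CUTOFF:
            return database
    # No database annotated the contig: fall back on ORF detection.
    if longest_orf_bp >= MIN_ORF_BP:
        return "putatively coding"
    return "putatively non-coding"
```

Under these assumptions, a contig with significant hits in both Swissprot and nr would be attributed to Swissprot, which appears earlier in the cascade.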
Read mapping and differential expression analysis
Burrows-Wheeler Aligner (BWA, v0.6.2-r126) was used to map reads back to assembled
contigs (Li and Durbin 2009). Default parameters were used with the following exceptions:
-n 3 (maximum edit distance), -e 3 (maximum number of gap extensions), -l 16 (seed size),
-R 30 (number of equally best hits to proceed to suboptimal alignment). For differential
expression analyses, edgeR (Robinson and Oshlack 2010), an R Bioconductor package
(http://www.bioconductor.org/), was used. Library sizes were normalized and only contigs
for which we had ≥ 1 count per million reads in at least 3 samples were kept, leaving
75,431 contigs to be analyzed for differential expression out of 79,047 that were assembled
(95.4%, 74,273 contigs ≥ 200bp and 1,158 contigs < 200bp repetitive elements). This
filter was applied because weakly expressed genes are more likely to be spuriously called
differentially expressed (Nagalakshmi et al. 2008). All groups were compared to each other in a pairwise
manner, and differentially expressed contigs were determined with a strict false-discovery
rate (FDR) < 0.01 and fold-change (FC) > 2. Gene ontology (GO) enrichment was
performed using Blast2GO with the default parameters (Conesa et al. 2005). To validate the
fold-changes computed by edgeR, we compared them with those obtained with DEGseq
(Wang et al. 2010). Fragments per kilobase of transcript per million mapped reads (FPKM) values were used to
calculate fold-change with DEGseq using the samwrapper function (Tusher et al. 2001).
Pearson correlation coefficient was > 0.95 between fold-change computed with edgeR and
DEGseq for all comparisons (data not shown).
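The expression filter and the differential expression thresholds can be sketched in plain Python as follows. This is an illustration only, not the edgeR implementation: counts per million are computed here from raw library sizes (an assumption), the function names are hypothetical, and the fold-change threshold is assumed to apply in both directions (over- and under-expression).

```python
import math


def counts_per_million(counts, library_sizes):
    """Scale raw read counts of one contig by the size of each library."""
    return [c * 1e6 / n for c, n in zip(counts, library_sizes)]


def passes_expression_filter(counts, library_sizes, min_cpm=1.0, min_samples=3):
    """Keep a contig with >= 1 count per million in at least 3 samples."""
    cpm = counts_per_million(counts, library_sizes)
    return sum(value >= min_cpm for value in cpm) >= min_samples


def is_differentially_expressed(log2_fc, fdr, min_fc=2.0, max_fdr=0.01):
    """Apply the thresholds from the text: FDR < 0.01 and fold-change > 2.

    Assumption: the fold-change cut-off is symmetric, i.e. |log2FC| > 1.
    """
    return fdr < max_fdr and abs(log2_fc) > math.log2(min_fc)
```

With four libraries of two million reads each, a contig with counts of 4, 4, 4 and 0 passes the filter (2 counts per million in three samples), whereas a contig with counts of 1, 1, 1 and 10 does not.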
2.7 Acknowledgements
We wish to thank Jean-Christophe Therrien and Serge Higgins (LARSA) for their technical
support during the crossing and maturation of embryos. We would also like to thank Anne
C. Dalziel, François-Olivier Hébert, Scott A. Pavey and Christopher Sauvage for inspiring
discussions and comments. The manuscript was also improved by constructive comments
of three anonymous referees. This work was supported by a Natural Sciences and
Engineering Research Council of Canada (NSERC) discovery grant and a Canada Research
Chair in Genomics and Conservation of Aquatic Resources to L.B., an NSERC postgraduate
scholarship to A-M. D-C., and an NSERC postdoctoral fellowship to S.R. A-M. D-C. also
received a stipend from Québec-Océan. This project is a contribution to the research
program of Québec-Océan.
2.8 Tables
Table 2.1. Differential expression summary. Number of differentially expressed contigs for
each comparison (FDR < 0.01, fold-change > 2). Above the main diagonal (shadowed) are
the number of contigs annotated as transposable elements (TE) and below are all contigs.
BC: backcross.
              Normal   Dwarf    F1-hybrid  Healthy BC  Malformed BC
Normal        -        83       22         26          350
Dwarf         1,166    -        31         64          400
F1-hybrid     108      653      -          40          354
Healthy BC    248      967      436        -           317
Malformed BC  13,217   15,815   13,710     13,455      -
Table 2.2. GO enrichment (biological processes, FDR < 0.05) for differentially expressed
transcripts between pure normal and dwarf whitefish embryos (FDR < 0.01, fold-change >
2).
GO-ID       Term                                                                           FDR       # Contigs
GO:0048769  sarcomerogenesis                                                               < 0.0001  13
GO:0030240  skeletal muscle thin filament assembly                                         < 0.0001  13
GO:0030241  skeletal muscle myosin thick filament assembly                                 < 0.0001  13
GO:0055003  cardiac myofibril assembly                                                     < 0.0001  14
GO:0048739  cardiac muscle fiber development                                               < 0.0001  13
GO:0045214  sarcomere organization                                                         < 0.0001  14
GO:0055008  cardiac muscle tissue morphogenesis                                            < 0.0001  14
GO:0043056  forward locomotion                                                             < 0.0001  9
GO:0051592  response to calcium ion                                                        < 0.0001  13
GO:0007512  adult heart development                                                        0.0001    9
GO:0018298  protein-chromophore linkage                                                    0.0001    5
GO:0007076  mitotic chromosome condensation                                                0.0008    6
GO:0090212  negative regulation of establishment of blood-brain barrier                    0.0011    3
GO:0007186  G-protein coupled receptor signaling pathway                                   0.0036    18
GO:0015949  nucleobase-containing small molecule interconversion                           0.0045    3
GO:0045859  regulation of protein kinase activity                                          0.0049    21
GO:0060041  retina development in camera-type eye                                          0.0066    9
GO:0001701  in utero embryonic development                                                 0.0138    13
GO:0006956  complement activation                                                          0.0155    4
GO:0006565  L-serine catabolic process                                                     0.0156    2
GO:0001580  detection of chemical stimulus involved in sensory perception of bitter taste  0.0156    2
GO:0009586  rhodopsin mediated phototransduction                                           0.0156    2
GO:0046724  oxalic acid secretion                                                          0.0156    2
GO:0002576  platelet degranulation                                                         0.0174    6
GO:0006775  fat-soluble vitamin metabolic process                                          0.0207    5
GO:0009268  response to pH                                                                 0.0344    3
GO:0048871  multicellular organismal homeostasis                                           0.0353    6
GO:0030049  muscle filament sliding                                                        0.0353    6
GO:0019265  glycine biosynthetic process, by transamination of glyoxylate                  0.0375    2
GO:0006941  striated muscle contraction                                                    0.0381    7
GO:0072521  purine-containing compound metabolic process                                   0.0484    27
GO:0006813  potassium ion transport                                                        0.0500    7
Table 2.3. Over-expression of transposable elements and non-coding transcripts in
malformed backcrosses.
                       Normal                  Dwarf                   F1-Hybrid               Healthy BC              Common
All contigs            13,217 (50.6%/49.4%)    15,815 (51.8%/48.2%)    13,710 (50.6%/49.4%)    13,455 (50.5%/49.5%)    8,096 (54.1%/45.9%)
Transposable elements  350 (70.6%/29.4%)***    400 (70.3%/29.7%)***    354 (63.6%/36.4%)***    317 (66.6%/33.4%)***    177 (70.6%/29.4%)***
Non-coding             6,007 (65.5%/34.5%)***  7,111 (71.3%/28.7%)***  6,312 (67.3%/32.7%)***  6,315 (62.8%/37.2%)***  3,553 (71.3%/28.7%)***
***: All p-values < 0.0001 (Fisher’s exact test).
2.9 Figures
[Figure 2.1 image. Panel a: annotation flow chart in the order Repbase, Swissprot, Ensembl - Danio rerio, Ensembl - Other fishes, nr, followed by ORF detection. Panel b: pie chart with Swissprot n = 28,675; Repbase n = 2,948; Danio rerio n = 2,909; nr n = 2,741; Other fishes n = 1,363; Putatively coding n = 7,917; Putatively non-coding n = 31,494.]
Figure 2.1. Annotation summary of the assembled transcriptome. a. Schematic view of the
annotation procedure. b. Pie chart showing the contribution of each database to the
annotation. “Other fishes” category includes the combined Ensembl databases for
Gasterosteus, Oryzias, Takifugu and Tetraodon. Non-annotated contigs were classified as
putatively coding if an ORF ≥ 300bp was detected, or putatively non-coding if an ORF ≥
300bp could not be detected.
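[Figure 2.2 image. Panel a: DNA transposons n = 2,122; Non-LTR Retro-transposons n = 558; LTR Retrotransposons n = 265. Panel b: Mariner/Tc1 n = 2,003; Gypsy n = 205; hAT n = 57; L1 n = 45; Others n = 231; SINE, REX and CR1 with n = 105, n = 190 and n = 112 (label-to-value assignment unclear in the extracted text).]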
Figure 2.2. Transposable elements annotation. a. Transposable element type representation
(n = 2,945 contigs) and b. Transposable element super-family representation according to
Kapitonov and Jurka (2008).
[Figure 2.3 image: two scatter plots. Panel a: PC1 (32.3%) against PC2 (21.3%). Panel b: PC1 (32.3%) against PC3 (7.5%). Legend: Normals, Dwarfs, F1-hybrids, Healthy BC, Malformed BC.]
Figure 2.3. Unique transcription profile observed in backcrosses, and malformed
backcrosses in particular. Principal component analysis as performed by the R function
prcomp. a. PC1 (32.3% of the variance) and PC2 (21.3 % of the variance). b. PC1 (32.3%
of the variance) and PC3 (7.5% of the variance). Normal whitefish embryos are depicted by
empty triangles, dwarfs by filled triangles, F1-hybrids by checked squares, healthy
backcrosses by empty squares, and malformed backcrosses by filled squares. BC:
backcrosses.
[Figure 2.4 image: pie chart. Mariner/Tc1 n = 68; L1 n = 11**; CR1 n = 9; REX n = 9; SINE n = 8; Gypsy n = 8; hAT n = 3; R2 n = 3**; Others n = 5.]
Figure 2.4. Over-expressed transposable element super-families in malformed backcrosses
in all comparisons (n = 125, FDR < 0.01, fold-change > 2). L1 and R2 super-families are
significantly over-represented as compared to the whole annotated transcriptome (**: p <
0.005 Fisher’s exact test).
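As an illustration of the over-representation test, the sketch below computes a one-sided Fisher's exact test (equivalent to the upper tail of a hypergeometric distribution) in plain Python. The exact contingency table and sidedness used in this study are not stated, so this is an assumption, and the function name is hypothetical. The example values for L1 (11 of 125 over-expressed contigs, against 45 of 2,945 annotated transposable element contigs) are read from Figures 2.2 and 2.4 and serve only as an example.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    N: annotated transposable element contigs (background)
    K: background contigs belonging to the super-family of interest
    n: contigs in the over-expressed set (drawn from the background)
    k: over-expressed contigs belonging to the super-family
    Returns P(X >= k) for X following a hypergeometric distribution.
    """
    upper = min(n, K)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / comb(N, n)


# Example with the L1 super-family values read from the figures.
p_l1 = enrichment_p_value(k=11, n=125, K=45, N=2945)
```

Exact integer arithmetic is used throughout, so the large binomial coefficients do not lose precision before the final division.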
2.10 Supplementary methods
Expression inheritance patterns
Type of inheritance was computed as previously described, including the modified formula
taking into account the 75% normal background and 25% dwarf background in backcrosses
(Renaut et al. 2009). Briefly, the distribution of dominance effects (d/a ratio) was calculated
to disentangle additivity, dominance and non-additivity in hybrids for genes that are
differentially expressed between pure normal and dwarf embryos. Additivity is observed
when a hybrid level expression corresponds to the average of the pure forms (|d/a ratio| <
0.5 is considered additive). Dominance is seen when the hybrid level of expression is closer
to one of the pure forms rather than the other (complete dominance is observed when |d/a
ratio| = 1). To include incomplete dominance, we defined |d/a ratio| values between
0.5 and 1.5 as dominant. Finally, non-additivity is observed when the |d/a ratio| is > 1.5.
A negative d/a value reflects a level of expression closer to the dwarf level of expression.
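The classification thresholds above can be expressed as a minimal Python sketch. The formula shown is the standard midparent form for an F1 hybrid, with a defined as half the difference between normal and dwarf expression and d as the deviation of the hybrid from the midparent value. This is an assumption for illustration: the modified formula for backcrosses (75% normal, 25% dwarf background) from Renaut et al. (2009) is not reproduced here, and the function names are hypothetical.

```python
def dominance_ratio(normal, dwarf, hybrid):
    """d/a ratio for an F1 hybrid (midparent form, assumed for illustration).

    a: half the difference between the pure forms (non-zero, because only
       genes differentially expressed between normal and dwarf are used)
    d: deviation of the hybrid from the midparent value
    """
    a = (normal - dwarf) / 2.0
    d = hybrid - (normal + dwarf) / 2.0
    return d / a


def classify_inheritance(d_over_a):
    """Apply the thresholds from the text to a d/a ratio."""
    magnitude = abs(d_over_a)
    if magnitude < 0.5:
        return "additive"
    if magnitude <= 1.5:
        return "dominant"
    return "non-additive"
```

With this sign convention, a hybrid whose expression equals the dwarf level gives d/a = -1 (complete dominance of the dwarf form), in line with the statement that negative values reflect expression closer to the dwarf level.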
2.11 Supplementary table
Table S 2.1. GO enrichment (biological processes, FDR < 0.05) among commonly over-expressed transcripts in malformed backcrosses as compared to all other groups (FDR <
0.01, fold-change > 2).
GO-ID       Term                                                                   FDR       # Contigs
Protein synthesis
GO:0006418  tRNA aminoacylation for protein translation                            <0.0001   16
GO:0019509  L-methionine salvage from methylthioadenosine                          0.0001    6
GO:0070981  L-asparagine biosynthetic process                                      0.0067    3
GO:0032057  negative regulation of translational initiation in response to stress  0.0193    3
GO:0015824  proline transport                                                      0.0378    4
GO:0052548  regulation of endopeptidase activity                                   0.0402    15
Inflammatory response
GO:0042742  defense response to bacterium                                          0.0002    9
GO:0032496  response to lipopolysaccharide                                         0.0118    14
GO:0019371  cyclooxygenase pathway                                                 0.0127    4
GO:0002690  positive regulation of leukocyte chemotaxis                            0.0130    6
GO:0002551  mast cell chemotaxis                                                   0.0193    3
GO:0070555  response to interleukin-1                                              0.0363    7
GO:0050930  induction of positive chemotaxis                                       0.0378    3
Stress response
GO:0071157  negative regulation of cell cycle arrest                               0.0187    4
GO:0009408  response to heat                                                       0.0247    8
GO:0036293  response to decreased oxygen levels                                    0.0270    15
Other
GO:0014003  oligodendrocyte development                                            0.0017    8
GO:0051001  negative regulation of nitric-oxide synthase activity                  0.0017    4
GO:0010757  negative regulation of plasminogen activation                          0.0067    3
GO:0001539  ciliary or flagellar motility                                          0.0073    5
GO:0072376  protein activation cascade                                             0.0160    6
GO:0051918  negative regulation of fibrinolysis                                    0.0193    3
GO:0045766  positive regulation of angiogenesis                                    0.0217    8
GO:0016338  calcium-independent cell-cell adhesion                                 0.0270    6
GO:0042761  very long-chain fatty acid biosynthetic process                        0.0270    4
GO:0019367  fatty acid elongation, saturated fatty acid                            0.0378    4
GO:0006027  glycosaminoglycan catabolic process                                    0.0378    3
GO:0034625  fatty acid elongation, monounsaturated fatty acid                      0.0378    3
2.12 Supplementary figures
[Figure S 2.1 image: five histograms (panels a to e); x-axis: Variation Coefficient, 0 to 1.0; y-axis: Frequency, 0 to 4000.]
Figure S 2.1. Distribution of variation coefficient of read counts for each contig. Median is
shown by arrowhead; a. normal b. dwarf c. F1-hybrids d. healthy backcrosses e. malformed
backcrosses.
Figure S 2.2. Distribution of log2 fold-change (logFC) for transcripts differentially
expressed between dwarf and normal embryos (FDR < 0.01, FC > 2). Positive values mean
the transcript is over-expressed in dwarf embryos as compared to normal while negative
values mean it is under-expressed in dwarf embryos as compared to normal.
Figure S 2.3. Distribution of log2 fold-change (log2FC) for transcripts differentially
expressed in malformed backcross. Transcripts that are a. under-expressed or b. over-expressed as compared to normal; c. under-expressed or d. over-expressed as compared to
dwarf; e. under-expressed or f. over-expressed as compared to F1-hybrid; g. under-expressed or h. over-expressed as compared to healthy backcross (“common”, FDR<0.01,
FC > 2). In white are the proportions for all contigs, in dark gray are the proportions for
transposable elements, and in light gray are shown the overlaps between the two
distributions.
[Figure S 2.4 image: three histograms (panels a to c); x-axis: d/a ratio, binned from <-10 to >10; y-axis: Frequency; legend: Additivity, Dominance, Non-additivity.]
Figure S 2.4. d/a ratio distribution in hybrids. a. F1-hybrids b. healthy backcrosses c.
malformed backcrosses.
[Figure S 2.5 image: glycolysis pathway diagram. Glucose (Hexokinase) Glucose-6-phosphate (Phosphoglucose isomerase) Fructose 6-phosphate (Phosphofructokinase) Fructose 1,6-bisphosphate (Aldolase) Glyceraldehyde 3-phosphate and Dihydroxyacetone phosphate, interconverted by Triose phosphate isomerase; Glyceraldehyde 3-phosphate (Glyceraldehyde-3-phosphate dehydrogenase) 1,3 bisphosphoglycerate (Phosphoglycerate kinase) 3-phosphoglycerate (Phosphoglycerate mutase) 2-phosphoglycerate (Enolase) Phosphoenol pyruvate (Pyruvate kinase) Pyruvate, feeding the TCA Cycle. Enzymes are given in parentheses between their substrate and product.]
Figure S 2.5. Carbohydrate metabolism (partial) is down-regulated in malformed hybrids.
Shown in bold italic are under-expressed enzymes (FDR < 0.01, fold-change > 2) in
malformed embryos as compared to all other groups. Other enzymes are not differentially
expressed. Circle: Phosphofructokinase is the enzyme up-regulated in dwarf compared to
normal embryos.
Figure S 2.6. KEGG map of the TCA cycle. Each unique under-expressed enzyme in
malformed backcrosses in all comparisons is shown by a unique color (FDR < 0.01, fold-change > 2).
Figure S 2.7. KEGG map of the oxidative phosphorylation pathway. Each unique under-expressed enzyme in malformed backcrosses in all comparisons is shown by a unique color
(FDR < 0.01, fold-change > 2).
Figure S 2.8. KEGG map of fatty acid metabolism. Each unique under-expressed enzyme in
malformed backcrosses in all comparisons is shown by a unique color (FDR < 0.01, fold-change > 2).
Figure S 2.9. KEGG map of the purine metabolism. Each unique under-expressed enzyme
in malformed backcrosses in all comparisons is shown by a unique color (FDR < 0.01, fold-change > 2).
Figure S 2.10. KEGG map of the pyrimidine metabolism. Each unique under-expressed
enzyme in malformed backcrosses in all comparisons is shown by a unique color (FDR <
0.01, fold-change > 2).
Chapter 3: Reproductive isolation in a nascent species
pair is associated with aneuploidy in hybrid offspring
3.1 Summary
Speciation can occur when the genomes of two populations accumulate genetic
incompatibilities and/or chromosomal rearrangements that limit hybridization in nature.
Chromosome stability is critical for survival and for faithful transmission of the genome,
and hybridization can compromise it. However, this hypothesis has rarely been tested
between recently diverged populations. Here, we test for chromosomal instability in
hybrids between two nascent species, the ‘dwarf’ and the ‘normal’ lake whitefish
(Coregonus clupeaformis). We examined the chromosomes of pure embryos and of healthy
and malformed backcross embryos. While pure individuals showed chromosome numbers
matching the expected diploid number (2n = 80), healthy backcrosses showed evidence of
mitotic instability through increased intra-individual variance in chromosome counts. In
malformed backcrosses, severe aneuploidy was found, corresponding to multiples of the
haploid number (1n = 40, 2n = 80, 3n = 120), suggesting meiotic breakdown in their F1
hybrid parent. However, no chromosomal rearrangement could be identified between the
parental forms. Genomic instability through aneuploidy therefore appears to contribute to
reproductive isolation between dwarf and normal lake whitefish, despite their very recent
divergence (~15,000-20,000 generations). Genetic incompatibilities could thus accumulate
early during speciation, limiting hybridization between nascent species.
3.2 Abstract
Speciation may occur when the genomes of two populations accumulate genetic
incompatibilities and/or chromosomal rearrangements that prevent inter-breeding in nature.
Chromosome stability is critical for survival and faithful transmission of the genome, and
hybridization can compromise this. However, this hypothesis has rarely been tested
between recently diverged populations. Here, we test for chromosomal instability in
hybrids between nascent species, the ‘dwarf’ and ‘normal’ lake whitefish (Coregonus
clupeaformis). We examined chromosomes in pure embryos, and healthy and malformed
backcross embryos. While pure individuals displayed chromosome numbers corresponding
to the expected diploid number (2n = 80), healthy backcrosses showed evidence of mitotic
instability through an increased variance of chromosome numbers within an individual. In
malformed backcrosses extensive aneuploidy corresponding to multiples of the haploid
number (1n = 40, 2n = 80, 3n = 120) was found, suggesting meiotic breakdown in their F1
parent. However, no detectable chromosome rearrangements between parental forms were
identified. Genomic instability through aneuploidy thus appears to contribute to
reproductive isolation between dwarf and normal lake whitefish, despite their very recent
divergence (~15-20,000 generations). Genetic incompatibilities may therefore accumulate early
during speciation, limiting hybridization between nascent species.
3.3 Introduction
A fundamental goal in modern evolutionary biology is to characterize the barriers that
promote and secure divergence between nascent species, thus resulting in reproductive
isolation and ultimately speciation (Coyne and Orr 2004; Seehausen et al. 2014). Pre-zygotic barriers have been shown to contribute more to total reproductive isolation than
post-zygotic barriers between sympatric species pairs (Coyne and Orr 1997). However,
intrinsic post-zygotic reproductive barriers are thought to be permanent and contribute
significantly to speciation in an irreversible fashion (Muller 1939; Muller 1942). Among
post-zygotic reproductive barriers, it is now clear that nucleotide divergence and genome
re-organization through chromosomal rearrangements are intrinsically associated (Noor et
al. 2001; Rieseberg 2001; Lynch 2007; Faria and Navarro 2010). Although it is challenging
to study these processes in non-model species, unraveling how nucleotide and
chromosomal divergence accumulate and interact to lead to reproductive isolation is crucial
to the understanding of speciation.
While the cytogenetic impact of inter-specific hybridization has long been studied, it has
only rarely been investigated between nascent species (Pinney 1918; White 1969; King
1993; Coyne and Orr 1997). One notable exception is the study of chromosomal races in
the house mouse (Mus musculus complex; Muller 1939; Muller 1942; Pialek et al. 2005;
Hauffe et al. 2012), including a recent study showing that chromosome asynapsis between
subspecies hybrids is responsible for infertility (Noor et al. 2001; Rieseberg 2001; Lynch
2007; Faria and Navarro 2010; Bhattacharyya et al. 2013). Given the scarcity of studies
examining lineages in early stages of divergence, it is hard to draw any conclusions
regarding the cytogenetic impact of hybridization and its role in reproductive isolation, and
how it varies across taxa or the divergence time of the system under scrutiny. Indeed, as
divergence time increases, the initial genetic changes leading to reproductive isolation will
be mixed with subsequent genetic changes that accumulate over time (Via 2009). This will
make it more difficult to detect causative mutations leading to reproductive isolation,
including the role of chromosomal stability in early speciation (Faria and Navarro 2010).
To decipher the initial causes of divergence, and specifically the role of chromosome
changes, it is thus necessary to look at the very first stages of speciation.
The geographical and ecological contexts under which divergence has occurred in the lake
whitefish (Coregonus clupeaformis) system are well understood, thus making it an ideal
system in which to study the early stages of speciation. The Acadian and Atlantic lake
whitefish lineages were geographically separated ~60,000 YBP or ~12,000-15,000
generations ago (Bernatchez et al. 2010; Jacobsen et al. 2012), during which time,
according to the Bateson-Dobzhansky-Muller model (BDM), they could freely accumulate
genetic incompatibilities (Bateson 1909; Dobzhansky 1937; Muller 1942). This
geographical isolation was followed by secondary contact in newly formed lakes after the
Laurentian ice sheet retreated ~12,000 YBP (~3-4,000 generations ago). Following
secondary contact, the Acadian lineage evolved repeatedly by character displacement into a
‘dwarf’ limnetic form while the Atlantic lineage maintained the ‘normal’ benthic form
(Bernatchez 2004; Bernatchez et al. 2010). Gene flow between these sympatric nascent
species is still possible (Renaut et al. 2011; Gagnaire et al. 2013) despite the existence of
hybrid incompatibilities leading to a dramatic reduction in embryonic survival in first and
second generation hybrids (Lu and Bernatchez 1998; Rogers and Bernatchez 2006; Renaut
and Bernatchez 2011). This mortality is associated with the appearance of a ‘malformed’,
slow growing phenotype in ~30-50% of backcross individuals, with the remaining embryos
developing normally (‘healthy’ phenotype; Renaut and Bernatchez 2011). Consistent with
predictions from a BDM model integrating transcriptional data (Landry et al. 2007),
previous studies have documented a much higher variance in gene expression in malformed
backcrosses compared to parental forms (Renaut and Bernatchez 2011; Dion-Côté et al.
2014). Transposable elements are also reactivated in both healthy adult backcrosses and
malformed backcross embryos, potentially leading to genome instability (Renaut et al.
2010; Dion-Côté et al. 2014). While earlier studies suggested a primary role for gene
expression dysregulation in the appearance of this malformed phenotype (Renaut and
Bernatchez 2011; Dion-Côté et al. 2014), the molecular basis remains unclear.
Importantly, mixed geographic modes of divergence (i.e. allopatry followed by secondary
contact) are predicted to favor chromosome rearrangements between diverging lineages
(Feder et al. 2011). These chromosome changes can lead to chromosomal incompatibilities
in hybrids, either because they will result in unbalanced gametes or disrupt meiosis (King
1993). Accumulating evidence shows that genetic and chromosomal incompatibilities
among species lead to dysregulation involving gene expression, transposable element
reactivation and epigenetic inconsistencies in hybrids, all of which also affect genome
stability (Ferree and Barbash 2009; Brown and O'Neill 2010). Intriguingly, many
aneuploidy events (genome instability in the form of unbalanced segregation of
chromosomes) in metazoans lead to similar phenotypes involving significant growth delays
combined with malformations (Lindsley et al. 1972; Torres et al. 2008). Hence, the fact that
this malformed phenotype occurs only in post-F1 lake whitefish hybrids, combined with
extensive transcriptional dysregulation in backcrosses raises the question of whether
aneuploidy might occur in the hybrid progeny of lake whitefish.
In this context, our goal was to test the hypothesis that genomic instability in the form of
aneuploidy accompanies hybrid breakdown in the backcross progeny of dwarf and normal
lake whitefish. We directly tested if ‘healthy’ and ‘malformed’ backcrosses display higher
chromosomal instability compared to dwarf and normal lake whitefish by examining
embryonic metaphase chromosomes. We reasoned that increased intra-individual variance
in chromosome numbers would indicate increased mitotic instability, while increased inter-individual variance would be consistent with meiotic breakdown (Barbero 2011). As
predicted, we found that healthy backcrosses display higher chromosomal instability
compared to pure embryos, and this effect is amplified in malformed backcrosses.
Moreover, we found haploid, diploid and triploid individuals among malformed
backcrosses, suggesting meiotic breakdown in their F1 parents. Yet, conventional
karyotyping of the parental forms did not reveal any chromosomal rearrangements. Thus,
chromosomal instability occurred in hybrids despite the absence of any obvious
chromosomal rearrangements between dwarf and normal genomes. Our results thus support
the hypothesis that chromosomal instability in hybrids, possibly resulting from the
accumulation of minute chromosomal or genetic divergence in allopatry, represents a
strong post-zygotic reproductive barrier in this nascent species complex.
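The reasoning above separates two quantities: the spread of counts within one embryo (mitotic instability) and the spread of a per-embryo summary among embryos (meiotic breakdown). The following is a minimal Python sketch of the second quantity, using the per-individual medians reported in Table S 3.1 for normal embryos and malformed backcrosses. The function and variable names are illustrative assumptions, and the published analyses were run in R, not with this code.

```python
from statistics import variance

# Median chromosome count per individual, copied from Table S 3.1.
normal_medians = [79, 79, 80, 80, 77, 79.5, 80, 78, 79, 76]
malformed_medians = [41.5, 145, 38, 80, 109, 121, 77.5, 81.5, 39.5, 115]


def inter_individual_variance(medians):
    """Sample variance of per-individual medians (spread among embryos)."""
    return variance(medians)


var_normal = inter_individual_variance(normal_medians)
var_malformed = inter_individual_variance(malformed_medians)

# A much larger value in malformed backcrosses is what the meiotic
# breakdown hypothesis predicts; this sketch is descriptive only and
# does not replace the Fligner-Killeen test used in the study.
print(f"normal: {var_normal:.1f}, malformed backcross: {var_malformed:.1f}")
```

Intra-individual variance would be computed the same way on the raw counts of a single embryo, which are available from the Dryad deposit rather than from the summary tables.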
3.4 Material and Methods
Crosses and sampling
Dwarf lake whitefish (Acadian lineage) were caught on their spawning grounds in a
tributary draining into Lake Témiscouata (…°…′N, …°…′W) and normal lake whitefish
(Atlantic lineage) were caught near Lake Aylmer (…°…′N, …°…′W) during Fall […].
Sperm and eggs were collected in the field and brought to the lab for artificial fertilization.
Additionally, two lab-reared mature F1-hybrid males (produced from a dwarf mother from
Lake Témiscouata and a normal father from Lake Aylmer) from a previous study were used
(Renaut et al. 2009). In total, 8 partially half-sib backcross families (i.e. four half-sib
families from the same F1-hybrid father, four half-sib families from the other F1-hybrid
father) were produced and used in this study. Due to the limited availability of sexually
mature fish, it was impossible to create all complementary crosses (i.e. normal mother x
dwarf father). However, previous work has documented similar mortality for both types of
crosses (Lu and Bernatchez 1998; Rogers and Bernatchez 2006). Moreover, the malformed
phenotype also occurs in the reciprocal backcross (i.e. with an F1-hybrid female; Renaut and
Bernatchez 2011). A complete description of the embryos sampled in this study can be
found in Table 3.1. It should be noted that the malformed phenotype was found in all
backcross families, although only a subset was sampled.
All eggs were incubated in the same slowly flowing water system (4.5-5.5 °C), and reared in
a common environment at the LARSA (Laboratoire de recherche en sciences aquatiques,
Laval University). The Laval University animal care committee (CPAUL) reviewed and
approved all experimental procedures (Protocol 82178).
Chromosome preparation and microscopy
Healthy (pure dwarf, pure normal and backcross) and malformed (only found among
backcrosses) individuals were sampled. It should be noted that the malformed phenotype
segregates within all backcross families. As previously documented within the same long-term research program, the malformed phenotype is easily identified by the strong
deformities seen, including a curved tail and no visually detectable heartbeat. Malformed
individuals still display characteristics of the phylotypic stage, including eye and dorsal line
pigmentation, but still do not resemble any earlier stage of development in normally
developing embryos (see Renaut & Bernatchez 2011 for more details).
Chromosome suspensions from embryos were prepared following a previously published
method (Völker et al. 2005) using early-eyed stage embryos (~150-180 degree-days, i.e. 30
to 36 days of development at 5 °C). Chromosome suspensions from 4 wild dwarf
individuals (from Lake Témiscouata, 2 males and 2 females) and 4 lab-reared normal
individuals (from Lake Aylmer, undetermined sex) were prepared using leucocyte culture
as described elsewhere (Fujiwara et al. 2001). Unfortunately, these individuals could not be
sexed, as there is currently no sex marker available for Coregonus (Yano et al. 2012) and
we did not detect a heteromorphic sex chromosome in either dwarf or normal individuals, a
common situation in salmonids (Davidson et al. 2009).
Metaphase chromosomes were stained with Giemsa-Romanowski dye (pH 6.8-7.0, Dr.
Kulich Pharma, Hradec Králové, Czech Republic) following standard protocols and
examined using a Provis AX70 Olympus microscope. Images were captured with a CCD
camera (DP30W Olympus). A total of 402 metaphase spreads from 41 embryos were
examined, in addition to 64 metaphases from the 8 adult fish (see Table 3.1). With the
exception of 1 embryo for which only 4 observations could be made, at least 5 metaphases
were examined per embryo, with an average of 9.8 observations per individual (see Table
3.2). In adults, 8 metaphases were karyotyped per individual.
Statistical analyses
All statistical analyses were performed using R v.2.15.1 (R Core Team 2012). We first
tested whether intra-individual variance of chromosome counts was dependent on the
experimental group (i.e. pure dwarf, pure normal, backcross healthy, or backcross
malformed). We thus performed an ANOVA on log-transformed individual coefficient of
variation of chromosome numbers. Coefficients of variation were used to control for the
apparent correlation between chromosome number variance and ploidy level. The Tukey
HSD test was then used to identify the comparisons responsible for the significant
differences among groups.
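The choice of the coefficient of variation (CV) and the ANOVA step can be illustrated with a short Python sketch. It is not the R code used in the study: the function names are assumptions, the two CV inputs are means and standard deviations taken from Table S 3.1, and the small ANOVA input is an illustrative subset of three individuals per group that will not reproduce the published F value.

```python
import math
from statistics import mean


def coefficient_of_variation(mean_count, sd_count):
    """CV = SD / mean, which rescales the spread by the ploidy level."""
    return sd_count / mean_count


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group -> values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    ss_within = sum(
        (x - mean(values)) ** 2 for values in groups.values() for x in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Table S 3.1: haploid N28F11-41 (39.0 +/- 2.6) and triploid N28F11-42
# (117.2 +/- 5.2). The triploid has twice the SD but the smaller CV.
haploid_cv = coefficient_of_variation(39.0, 2.6)
triploid_cv = coefficient_of_variation(117.2, 5.2)

# Illustrative subset only (first three individuals per group in Table S 3.1),
# given as (mean, SD) pairs and converted to log-transformed CVs.
subset = {
    "NxN": [(80.2, 2.8), (79.9, 1.6), (79.8, 2.4)],
    "BC healthy": [(79.8, 1.7), (71.0, 13.5), (78.6, 3.6)],
}
log_cv = {
    group: [math.log(coefficient_of_variation(m, s)) for m, s in pairs]
    for group, pairs in subset.items()
}
f_value, df_between, df_within = one_way_anova_f(log_cv)
print(f"F({df_between},{df_within}) = {f_value:.2f}")
```

The log transform is applied to the CVs before the ANOVA, as in the text; a post-hoc test such as Tukey HSD would then be run on the same values and is not sketched here.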
We then tested the hypothesis that the variance in median chromosome numbers per
individual is significantly different between groups. Specifically, we wanted to know if the
variance in median chromosome counts per individual was higher in malformed
backcrosses. We applied the Fligner-Killeen test for homogeneity of variance on median
chromosome counts per individual, first on all groups, and then using pair-wise
comparisons among all groups. A false-discovery rate (FDR) correction was applied to p-values using the function p.adjust.
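The FDR step can be sketched in Python as a Benjamini-Hochberg adjustment, which is what R's p.adjust computes with method "BH" (or its alias "fdr"). That this was the method used is an assumption, since the text only says "FDR"; the function name below is also an assumption. The input p-values are the six pairwise values listed in Table S 3.4.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # rank 1 = smallest p-value
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Pairwise Fligner-Killeen p-values, in the order of Table S 3.4.
comparisons = ["BCM-DxD", "BCM-NxN", "BCH-BCM", "BCH-DxD", "BCH-NxN", "DxD-NxN"]
raw_p = [1.41e-12, 6.53e-15, 2.74e-15, 0.8134, 0.05478, 0.04164]

for name, q in zip(comparisons, benjamini_hochberg(raw_p)):
    print(f"{name}: {q:.3g}")
```

With these inputs the adjusted values agree with the FDR column of Table S 3.4 to within rounding of the reported p-values; only the three comparisons involving malformed backcrosses remain below 0.05.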
3.5 Results
Adult Giemsa-stained karyotypes corresponded to previously described karyotypes
(Phillips et al. 1996). Both have a diploid chromosome number of 2n = 80, including 10
pairs of meta-/sub-metacentric chromosomes and 30 pairs of acrocentric chromosomes of gradually
decreasing size, with the exception of one distinguishable large pair. No obvious
differences were detected between the karyotypes of the two forms (Figure 3.1a and 1b).
Summary statistics of chromosome number per group can be found in Table 3.2. The
complete summary statistics of chromosome number per individual are in supplementary
data (Table S 3.1). In pure dwarf and normal embryos, counts were centered on the
expected diploid number (2n = 80; Figure 3.2a and 2b; Phillips et al. 1996). Dwarf
embryos had a mean of 81.7 ± 11.7 chromosomes/metaphase and a median of 79
chromosomes/metaphase while normal embryos had a mean of 78.3 ± 5.6
chromosomes/metaphase and a median of 79 chromosomes/metaphase (Table 3.2). Among
pure embryos, no counts exceeded 86 chromosomes, with the exception of a single
suspected triploid dwarf individual (Figure 3.3a, mean = 108.1 ± 13.5
chromosomes/metaphase, median = 111.5 chromosomes/metaphase). Pure embryos
displayed some variance around the diploid number (2n=80, Figure 3.2a,b and Figure 3.3),
which was expected as chromosome suspensions from embryos are more difficult to spread
than those from other tissues, and therefore more difficult to count (Völker and Ráb 2015
and Figure 3.1).
In healthy backcrosses (Figure 3.2c), counts were also centered on 2n = 80 (median = 78
chromosomes/metaphase) but with a lower mean (73.7 ± 13.3 chromosomes/metaphase)
compared to pure embryos (Table 3.2). Metaphases with 20 to 86 chromosomes were
found, but all of these individuals seemed diploid (with a median chromosome number
close to 80), although with increased variance in chromosome number.
In sharp contrast, a clear tri-modal distribution was found in malformed backcrosses
(Figure 3.2d and Figure 3.3a), with chromosome numbers concentrated around multiples of
the haploid number (1n = 40, 2n = 80, 3n = 120). The mean chromosome number was
lower than in pure embryos (mean = 76.0 ± 34.7 chromosomes/metaphase), while the
median was equal to that of healthy backcrosses (median = 78 chromosomes/metaphase, Table
3.2). Malformed backcrosses could be separated according to their ploidy (Figure 3.1c,d,e
and Figure 3.3a). Three malformed backcross individuals were clear haploids (1n = 40),
one individual was a triploid (3n = 120) and one individual was almost tetraploid (Figure 3.1e,
mean = 139 ± 20.8 chromosomes/metaphase, median = 145 chromosomes/metaphase).
Metaphases with as few as 32 chromosomes to as many as 158 chromosomes were found in
malformed backcrosses. Chromosome fragments were also found in malformed
backcrosses, although these were relatively rare (Figure 3.1e, arrowheads).
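As a rough illustration of how counts cluster around multiples of the haploid number, the Python sketch below assigns each malformed backcross to the nearest multiple of 1n = 40 from its median count in Table S 3.1. The nearest-multiple rule and the function name are assumptions made for illustration, not the authors' procedure. The rule ignores within-individual spread, so individuals with highly variable counts (for example D68F12-18, SD = 33.2) are not reliably classified by it.

```python
HAPLOID_NUMBER = 40  # 1n = 40, as stated in the text


def nearest_ploidy(median_count, haploid_number=HAPLOID_NUMBER):
    """Nearest whole multiple of the haploid number (at least 1)."""
    return max(1, round(median_count / haploid_number))


# Median chromosome count per malformed backcross, from Table S 3.1.
malformed_medians = {
    "D63F12-10": 41.5,
    "D63F12-7": 145,
    "D63F12-8": 38,
    "D68F12-17": 80,
    "D68F12-18": 109,
    "D72F12-24": 121,
    "N18F11-49": 77.5,
    "N18F11-51": 81.5,
    "N28F11-41": 39.5,
    "N28F11-42": 115,
}

calls = {
    individual: nearest_ploidy(median)
    for individual, median in malformed_medians.items()
}
haploid_like = sorted(ind for ind, ploidy in calls.items() if ploidy == 1)
print("haploid-like individuals:", haploid_like)
```

The three haploid-like individuals found this way match the three haploids reported above, and the individual with a median of 145 falls closest to 4n, in line with its description as almost tetraploid.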
An ANOVA testing for differences in the intra-individual coefficient of variation of
chromosome counts revealed a significant difference among groups (F3,36= 4.911, p =
0.0058). There was a significant difference between healthy backcrosses and pure normal
and dwarf embryos (Tukey HSD test, p ≤ 0.05), but no comparison involving malformed
backcrosses was significant (Figure 3.3b). This is because there were three haploids among
malformed backcrosses, which had very small variance of chromosome numbers. In
addition, we found a significant difference in the variance of the median chromosome
number among groups (Fligner-Killeen test, χ2=120.0222, df=3, p<2.20E-16). The variance
of median chromosome number in malformed backcrosses was significantly different from
pure normal, pure dwarf and healthy backcrosses, after correction for multiple testing
(Figure 3.3c, Fligner-Killeen test, FDR<0.05, p ≤ 0.05). Complete statistical analyses can
be found in the supplementary material (Table S 3.4).
3.6 Discussion
In this study, we investigated the role of chromosomal instability in reproductive isolation
between nascent lake whitefish species pairs by measuring the chromosome numbers of
normal, dwarf, and healthy and malformed backcrosses. Increased intra-individual variance
of chromosome number was found in healthy backcrosses. This strongly supports the
hypothesis of mitotic chromosome segregation problems, resulting in extra or missing
chromosomes after mitotic cell division (Barbero 2011). However, malformed backcrosses
did not display evidence for mitotic chromosome instability compared to pure embryos.
This is likely because three stable haploid malformed backcrosses were sampled, thus
reducing the variance within the group. Even more strikingly, higher inter-individual
variance was found in malformed backcrosses than any other group, i.e. extensive
aneuploidy, with an extra or missing haploid complement in the majority of individuals.
This result is consistent with meiotic nondisjunction in their F1-hybrid parent (Barbero
2011). Yet, karyotypes from parental forms did not reveal any obvious differences at the
whole chromosome level. This suggests that aneuploidy in hybrids is caused by minute
sub-chromosomal incompatibilities or genetic incompatibilities acting through mitotic and
meiotic mechanisms. The accumulation of these incompatibilities may have been facilitated
by the geographic isolation between the two pure forms for approximately 12,000-15,000
generations. Clearly, such incompatibilities cause substantial reproductive isolation
between lake whitefish lineages, since 30-50% of post-F1-hybrids are malformed and die
during their early development (Rogers and Bernatchez 2006; Renaut and Bernatchez
2011). Our results are especially noteworthy considering the very young age of these
lineages on an evolutionary timescale (Jacobsen et al. 2012).
Importantly, our results were collected from 8 partially half-sib backcross families, arguing
that these results are not only due to a ‘family effect’, but apply more generally to these
populations. Moreover, the malformed phenotype associated with aneuploidy was observed
in two different cohorts (crosses from Renaut and Bernatchez 2011 and this study). Finally,
the malformed phenotype has also been observed in post-F2 hybrids (Dion-Côté and
Bernatchez, unpublished data). These independent observations strongly support the
hypothesis of segregating sub-chromosomal or genetic incompatibilities between lake
whitefish lineages leading to aneuploidy in their hybrid progeny, and reproductive
isolation.
We note that this extensive aneuploidy would have been very difficult to interpret or even
detect from whole genome sequence data alone, stressing the importance of cytogenetics in
the post-genomic era. While these approaches have been largely neglected since the advent
of modern sequencing techniques, we have shown here that they provide key information
regarding genome organization and stability that are difficult to detect from sequence data.
Potential mechanisms underlying chromosome segregation breakdown
We predicted that lake whitefish hybrids would display higher chromosomal instability
compared to pure parental forms. Our data support this prediction and here, we discuss
four, non-mutually exclusive, candidate mechanisms that are potentially responsible for
chromosomal segregation breakdown, namely (1) chromosomal rearrangements, (2) the
mismatch repair pathway, (3) centromere divergence and (4) heterochromatin
decondensation.
The most parsimonious explanation for both mitotic and meiotic breakdown in backcrosses
is that significant chromosomal rearrangements have occurred between the Atlantic and
Acadian lake whitefish lineages, thus interfering with proper meiotic and mitotic
chromosomal segregation (Barbero 2011). However, karyotyping suggests that this is not the
case, at least at the whole-chromosome scale as both karyotypes are essentially the same
(Figure 3.1a and 1b). Yet, our results cannot rule out the possibility that more subtle
changes at the sub-chromosomal level might be involved, including heterochromatin and
rDNA genes additions/deletions, with potential consequences for gene expression
regulation. It is noteworthy that such sub-chromosomal changes have been recently
detected in another Coregonus species pair from Europe, where no major karyotypic
differences were found (Symonová et al. 2013).
Alternatively, the highly conserved mismatch repair (MMR) pathway of DNA repair
(Harfe and Jinks-Robertson 2000) may be responsible for meiotic
breakdown. Indeed, this mechanism is responsible for induced aneuploidy between
incipient species of yeast (Saccharomyces, Greig et al. 2003). Meiotic crossovers are
critical for balanced chromosome segregation in meiosis as they maintain a tight connection
between homologous chromosomes during meiosis I. When divergent chromosomes are
combined in yeast hybrids, the MMR pathway prevents these crossovers, thus resulting in
aneuploid progeny. Thus, the MMR pathway may underlie the meiotic breakdown in lake
whitefish F1-hybrids if nucleotide divergence is high enough in regions targeted by the
meiotic recombination machinery. However, this mechanism does not provide an
explanation for mitotic breakdown in healthy backcrosses.
A third candidate mechanism leading to increased chromosomal instability in hybrids was
originally proposed by Henikoff et al. (2001) based on the observation of
concerted, rapid evolution of centromeres and their associated proteins. Centromeres are
defined by repetitive sequences, including transposable elements. Centromeres are thus
rapidly evolving due to their labile nature, despite their highly conserved and critical role in
chromosome segregation. This could lead to chromosomal incompatibilities, even between
allopatric populations of the same species (Henikoff et al. 2001; Lynch 2007). Hence, it is
plausible that aneuploidy in lake whitefish backcrosses may result from the disruption of
the chromosome segregation machinery via centromere incompatibilities.
A fourth possibility is that aneuploidy results from heterochromatin decondensation due to
mis-regulation in lake whitefish hybrids, which could significantly affect chromosome
segregation in both mitosis and meiosis (Grewal and Jia 2007). Indeed, accumulating
studies support a role for heterochromatin regulation and associated proteins in
reproductive isolation (O'Neill and Graves 1998; Bayes and Malik 2009; Ferree and
Barbash 2009; Cattani and Presgraves 2012). Therefore, heterochromatin deregulation in
lake whitefish hybrids could also disrupt mitotic and meiotic chromosome pairing, inducing
aneuploidy in backcrosses.
We cannot yet conclusively state which of these molecular mechanisms is responsible for
the aneuploidy in lake whitefish backcrosses. However, the heterochromatin
decondensation hypothesis is especially promising, as previous work in our system has
found a massive reactivation of both transposable elements and non-coding RNAs in
malformed backcrosses, consistent with heterochromatin disruption (Renaut and
Bernatchez 2011; Dion-Côté et al. 2014). Also, we previously found that the Gene
Ontology category “chromosome condensation” was enriched among genes differentially
expressed between dwarf and normal embryos, suggesting divergence in the regulation of
chromosome compaction (Dion-Côté et al. 2014). We also note that the reactivation of
transposable elements may lead to aneuploidy via chromosomal rearrangements
(mechanism 1). Further studies looking specifically at sub-chromosomal structure and
heterochromatin regulation in dwarf and normal lake whitefish, as well as their hybrids,
will help to disentangle these potentially non-mutually exclusive mechanisms.
Development canalization and ‘aneuploidy syndrome’
The fact that healthy backcrosses appear to develop normally, and can eventually reproduce
(Dion-Côté and Bernatchez, unpublished data) despite mitotic instability questions how
such intra-individual variation is buffered through development. Indeed, Waddington
(1942) elegantly suggested that developmental pathways are under strong
selective pressure (or canalized), thus buffering against genetic and environmental variations.
Hence, the phenotype of healthy backcrosses appears canalized despite higher
chromosomal variation compared to pure embryos.
However, this canalization is broken down in malformed backcrosses. The malformed
phenotype occurs in conjunction with more variable cytogenetic backgrounds and also in
putative diploid individuals (Figure 3.3a). Moreover, malformed backcrosses did not show
statistically significant mitotic instability (p = 0.15 vs dwarf embryos and p=0.09 vs normal
embryos, Tukey HSD post-hoc test, Table S 3.3). As explained above, this is because three
haploid malformed backcrosses had a much smaller variance of chromosome number
(within an individual) than other malformed backcrosses (Figure 3.3a, Table S 3.1). These
haploid individuals may suffer from this ‘aneuploidy syndrome’, but not hybrid
incompatibilities per se as they bear only one genome (likely the pure maternal).
How can one explain this consistent malformed phenotype despite high cytogenetic and
transcriptional variability (Renaut and Bernatchez 2011; Dion-Côté et al. 2014)? Lindsley
et al. (1972) found that different types of hyperploidy in Drosophila
resulted in a common phenotype combining rough eyes and abnormal wings, bristles and
abdomen, which they described as a ‘hyperploidy syndrome’. A recent study also found a
consistent transcriptional profile within species that was independent of the specific
chromosome aberration investigated (Sheltzer et al. 2012). In general, many organisms for
which the aneuploidy effect has been studied were found to display developmental
abnormalities, in addition to a transcriptional signature involving protein synthesis,
inflammatory and stress responses (Torres et al. 2008; Dürrbaum et al. 2014). Not
surprisingly, this signature was also found in lake whitefish malformed backcrosses (Dion-Côté et al. 2014), in addition to a down-regulation of essential developmental genes
(Renaut and Bernatchez 2011).
Aneuploidy may contribute to the mis-regulated transcriptional landscapes previously
described in lake whitefish backcross embryos; alternatively, transcriptional mis-regulation
in hybrids may lead to aneuploidy. While the causal relationship remains difficult to
establish, it appears that the malformed phenotype and associated transcriptional response
that we identified in lake whitefish hybrids mirror what has been found in other organisms.
Developmental canalization breakdown in malformed backcrosses is thus associated with
an ‘aneuploidy syndrome’ involving increased chromosomal instability and a distinctive
transcriptional response.
Implications for the study of speciation
Although cytogenetic studies looking at early diverging lineages are scarce, reproductive
isolation through chromosomal instability has been observed in hybrids across several taxa.
In yeast (Saccharomyces paradoxus), a recent study found that chromosomal differences
lead to chromosomal instability in the progeny of diverging strains (Charron et al. 2014).
Importantly, the divergence of these strains occurred under a similar biogeographical
context as the lake whitefish, i.e. a phase of geographic isolation followed by recent
secondary contact between lineages. This further supports our interpretation that the
conditions under which divergence occurred in lake whitefish have facilitated the
accumulation of incompatibilities, which may be either of chromosomal or genetic nature,
leading to chromosomal instability in hybrids. In the house mouse (Mus musculus
domesticus), numerous studies have clearly shown that hybridization between certain
chromosomal races leads to chromosomal mis-segregation and hence reduction in litter size
(Forejt et al. 2012; Hauffe et al. 2012). Combined with our results, these studies suggest
that chromosomal instability can occur in the hybrid progeny of early diverging lineages
across a broad range of taxa.
However, it should be noted that divergence time among these lineages is much greater
than for the lake whitefish, given that mouse races diverged several hundred
thousand to a million years ago (Macholán et al. 2012) and that yeast produces multiple
generations per year. Moreover, it should be stressed that the chromosomal instability we
have documented appears to occur in the absence of detectable chromosomal
rearrangement. To our knowledge, our study is thus the first to investigate the cytogenetic
impact of hybridization among such recently diverged lineages (~12-15,000 generations), at
least in vertebrates. There are few, if any, examples of such striking incompatibilities in
lineages as young as the lake whitefish, and it is possible that this is only because the
cytogenetic consequences of hybridization have been overlooked.
As a consequence, chromosomal speciation models, including the cytogenetic impact of
hybridization, have been somewhat neglected in the past decade. This is due to the
combination of the presumed minor contribution of chromosome rearrangements to early
speciation stages (with the exception of inversions; e.g. Noor et al. 2001; Rieseberg 2001;
Feder and Nosil 2009) and theoretical issues concerning the fixation of strongly
underdominant chromosomal rearrangements (Faria and Navarro 2010). However, the
conditions promoting the fixation of new chromosomal rearrangements were present in the
lake whitefish system, including: a mixed geographic mode of divergence (Bernatchez et al.
2010), small effective population size (Ne ~ 1000, Campbell and Bernatchez 2004),
geographical isolation of lineages (Bernatchez and Dodson 1991), and possibly meiotic
drive (Gagnaire, Normandeau, Pavey, et al. 2012). Unfortunately, it is not possible to
determine at this stage of our research program whether the chromosomal instability observed
in lake whitefish hybrids is the result of genetic or chromosomal incompatibilities, and
hence a case of chromosomal speciation. Yet, both genetic and subtle chromosomal
changes may be involved, and future research should help to disentangle these alternative
hypotheses.
Our results are critical to the understanding of how reproductive isolation has emerged in
the lake whitefish system, and other nascent species. We show that genomic instability,
through aneuploidy, transcriptional dysregulation and transposable element reactivation, can interact
and efficiently limit hybridization early in the divergence process, and thus contribute to
speciation. Future work looking at systems where conditions promoting the appearance and
fixation of chromosome rearrangements are found will help to draw conclusions regarding
the generality of our observations.
3.7 Data accessibility
Raw chromosome counts are available on Dryad (http://dx.doi.org/10.5061/dryad.jt008).
3.8 Acknowledgements
We are grateful to Jean-Yves Masson, Anne C. Dalziel and Ben Sutherland for inspiring
discussions, and Serge Higgins, Guillaume Côté and Jean-Christophe Therrien for their
support during the crossing and rearing of embryos at the LARSA. We also thank Jana
Čechová, Alain Goulet and Richard Janvier for their technical help during chromosome
suspension preparation and microscopy. Glenn Yannic, Gaétan Daigle and Charles Bordet
also contributed significantly to statistical approaches. This work was supported by a
Natural Sciences and Engineering Research Council of Canada (NSERC) discovery grant
and Canada Research Chair in Genomics and Conservation of Aquatic Resources to L.B.
and an NSERC postgraduate scholarship to A.-M.D.-C. A.-M.D.-C. also received financial
support from Québec-Océan for a short training and a FRQNT international internship.
This project is a contribution to the research program of Québec-Océan. R.S. and P.R. were
supported by the project 14-02940S of the Czech Science Foundation.
3.9 Tables
Table 3.1. Individuals sampled in this study. The family name reflects the number of the
parent and the direction of the cross (female x male). N: normal, D: dwarf.

Type        Family       n healthy   n malformed   Total
NxN         N14 x N1     2           0
            N15 x N7     2           0
            N17 x N8     2           0
            N18 x N9     1           0
            N19 x N13    3           0             10
DxD         D55 x D79    4           0
            D57 x D80    3           0
            D58 x D78    1           0
            D59 x D75    3           0             11
Backcross   D63 x F1-2   3           3
            D64 x F1-1   2           0
            D68 x F1-2   1           2
            D72 x F1-2   0           1
            N17 x F1-2   1           0
            N18 x F1-1   0           2
            N25 x F1-1   3           0
            N28 x F1-1   0           2             20
Total                    31          10            41
Table 3.2. Summary statistics of chromosome number per cross-type and group. N: normal,
D: dwarf, BC: backcross.

Group          n individuals   Mean   SD     Median   n metaphases
NxN            10              78.3   5.6    79       102
DxD            11              81.7   11.7   79       95
BC healthy     10              73.7   13.3   78       103
BC malformed   10              76.0   34.7   78       102
3.10 Figures
Figure 3.1. Karyotypes of pure parental forms and abnormal metaphases of malformed
backcrosses. Pure karyotypes are composed of 10 metacentric pairs and 30 acrocentric pairs
of decreasing size. a. Normal individual from Lake Aylmer. b. Dwarf individual from Lake
Témiscouata. c. Haploid metaphase from a malformed backcross. d. Triploid metaphase
from a malformed backcross. e. Nearly tetraploid metaphase from a malformed backcross.
Arrowheads denote chromosome fragments. Bar = 10 µm.
[Figure 3.2: histograms in four panels (a-d). x-axis: chromosomes per metaphase (0-160); y-axis: frequency.]
Figure 3.2. Meiotic breakdown in malformed backcrosses reflected by the analysis of
mitotic chromosomes. Histogram showing the distribution of chromosome counts in lake
whitefish embryos. a: normal fish (n = 10), b: dwarf fish (n = 11), c: healthy backcrosses (n
= 10), d: malformed backcrosses (n = 10).
[Figure 3.3: three panels. a: chromosome count per metaphase for each individual (groups: normal, dwarf, healthy backcross, malformed backcross). b: coefficient of variation per group. c: median chromosome count per metaphase per group.]
Figure 3.3. Mitotic and meiotic chromosomal instability occurs in backcrosses based on the
analysis of mitotic chromosomes. a. Mean chromosome counts and standard deviation per
individual for each group. b. Boxplot showing the distribution of individual coefficient of
variation of chromosome number per group. Different letters indicate statistically
significant differences (Tukey HSD test, p ≤ 0.05). c. Boxplot showing the distribution of
median chromosome number per group. Different letters indicate statistically significant
differences (Fligner-Killeen test, FDR<0.05, p ≤ 0.05).
3.11 Supplementary tables
Table S 3.1. Summary statistics for individual fish.

Individual   Group          Mean    SD     Median
N14N1-33     NxN            80.2    2.8    79
N14N1-34     NxN            79.9    1.6    79
N15N7-25     NxN            79.8    2.4    80
N15N7-27     NxN            77.1    11.7   80
N17N8-16     NxN            76.0    4.9    77
N17N8-17     NxN            79.4    2.5    79.5
N18N9-7      NxN            78.6    5.9    80
N19N13-1     NxN            78.5    2.1    78
N19N13-3     NxN            78.3    2.9    79
N19N13-4     NxN            75.2    5.7    76
D55D79-1     DxD            78.8    2.5    78.5
D55D79-2     DxD            79.9    4.8    81
D55D79-5     DxD            78.3    3.5    78.5
D55D79-6     DxD            77.2    4.3    78.5
D57D80-14    DxD            76.8    3.0    77
D57D80-16    DxD            77.3    2.7    79
D57D80-17    DxD            76.2    2.9    77
D58D78-28    DxD            76.4    4.5    79
D59D75-35    DxD            78.5    5.6    80
D59D75-36    DxD            78.2    3.2    78
D59D75-27    DxD            108.1   13.5   111.5
D63F12-15    BC healthy     79.8    1.7    80
D63F12-3     BC healthy     71.0    13.5   75
D63F12-5     BC healthy     78.6    3.6    78
D64F11-2     BC healthy     66.4    21.8   78.5
D64F11-5     BC healthy     72.9    10.7   77
D68F12-14    BC healthy     76.8    4.5    77
N17F12-1     BC healthy     72.7    14.7   75.5
N25F11-19    BC healthy     69.4    20.2   78
N25F11-20    BC healthy     76.6    4.9    76
N25F11-21    BC healthy     74.9    14.0   81
D63F12-10    BC malformed   42.4    3.1    41.5
D63F12-7     BC malformed   139.0   20.8   145
D63F12-8     BC malformed   38.6    3.5    38
D68F12-17    BC malformed   77.3    11.1   80
D68F12-18    BC malformed   89.9    33.2   109
D72F12-24    BC malformed   102.8   32.3   121
N18F11-49    BC malformed   78.0    2.9    77.5
N18F11-51    BC malformed   81.2    3.6    81.5
N28F11-41    BC malformed   39.0    2.6    39.5
N28F11-42    BC malformed   117.2   5.2    115

The name of each individual reflects the parents and the direction of the cross, followed by
the number of the individual (female x male + number). Two F1 males were involved, F11 and
F12. BC: backcross.
Table S 3.2. ANOVA summary on variation coefficients of chromosome counts per
individual.

Source of variation   df   Sum of squares   Mean squares   F value   P
Group                 3    6.54             2.18           4.41      0.01
Residuals             37   18.28            0.49
Total                 40   24.82
Table S 3.3. Tukey HSD post-hoc test.

Comparison   Difference   Low     High    p adj
BCM-BCH      -0.15        -1.00   0.70    0.96
DxD-BCH      -0.82        -1.64   0.01    0.05
NxN-BCH      -0.92        -1.76   -0.07   0.03
DxD-BCM      -0.67        -1.49   0.16    0.15
NxN-BCM      -0.77        -1.61   0.08    0.09
NxN-DxD      -0.10        -0.93   0.72    0.99

DxD: pure dwarf
NxN: pure normal
BCH: backcross healthy
BCM: backcross malformed
Table S 3.4. Fligner-Killeen test (on individual median chromosome counts).
Comparison            Chi-squared    df    p-value       FDR
DxD, NxN, BCH, BCM    120.0222       3     < 2.20E-16    N/A
BCM-DxD               50.1651        1     1.41E-12      2.83E-12
BCM-NxN               60.7344        1     6.53E-15      1.96E-14
BCH-BCM               62.4471        1     2.74E-15      1.64E-14
BCH-DxD               0.0557         1     0.8134        0.81
BCH-NxN               3.6888         1     0.05478       0.07
DxD-NxN               4.1497         1     0.04164       0.06
DxD: pure dwarf
NxN: pure normal
BCH: Backcross healthy
BCM: Backcross malformed
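The table does not state which false discovery rate procedure produced the FDR column. Assuming it is the Benjamini-Hochberg correction applied to the six pairwise p-values, the sketch below (plain Python, with a hypothetical function name) reproduces the reported values up to rounding of the inputs.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Pairwise p-values as reported in Table S 3.4 (assumed BH correction)
pairwise = {
    "BCM-DxD": 1.41e-12,
    "BCM-NxN": 6.53e-15,
    "BCH-BCM": 2.74e-15,
    "BCH-DxD": 0.8134,
    "BCH-NxN": 0.05478,
    "DxD-NxN": 0.04164,
}
fdr = dict(zip(pairwise, benjamini_hochberg(list(pairwise.values()))))
for name, q in fdr.items():
    print(f"{name}: {q:.3g}")
```

With the rounded p-values above, BCM-DxD comes out at about 2.82E-12 rather than the tabulated 2.83E-12, which is consistent with the table having been computed from unrounded p-values.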
Chapter 4: Cytogenetics and missed information from
genome sequencing: standing chromosomal variation
associated with reproductive isolation in Lake Whitefish
species pairs
4.1 Summary
The role of chromosomal rearrangements in speciation remains debated, even though the
demographic conditions associated with divergence should promote their appearance. To
study the role of chromosomal changes in relation to speciation, we examine two lineages of
Lake Whitefish (Coregonus clupeaformis) that recently colonized post-glacial lakes
following a period of allopatry during the Pleistocene. A limnetic dwarf form evolved
repeatedly from the normal benthic form and became reproductively isolated. It has been
predicted that the accumulation of genetic incompatibilities in allopatry may have
facilitated divergence in sympatry. Lake Whitefish hybrids suffer from mitotic and meiotic
instability, possibly resulting from divergence in chromosome structure in allopatry. Here,
we test this hypothesis using a cytogenetic approach and test for chromosomal differences
and parallelism among three sympatric Lake Whitefish pairs. While the karyotype is
constant between the dwarf and normal whitefish ecotypes (2n = 80, NF = 98), there is
pervasive sub-chromosomal polymorphism involving potentially important regulators of
chromosome segregation. Multivariate analyses support the hypothesis that chromosomal
rearrangements arose in allopatry, but find no evidence of parallelism among lakes. Thus,
chromosomal rearrangements in the Lake Whitefish system could contribute to divergence
by destabilizing mitotic and meiotic segregation in hybrids. The sub-chromosomal
structures studied, such as centromeres, repeated elements and heterochromatin, remain
difficult to sequence and assemble, supporting the view that cytogenetics remains a highly
complementary approach to sequencing for detecting the genomic bases of speciation.
4.2 Abstract
The role of chromosome rearrangements in speciation remains a debated topic, although
demographic conditions associated with divergence should promote their appearance. To
address the role of chromosome changes in relation to speciation, we study two Lake
Whitefish (Coregonus clupeaformis) lineages that recently colonized post-glacial lakes
following allopatry during the Pleistocene. A dwarf limnetic form evolved repeatedly from the
normal benthic form, becoming reproductively isolated. The accumulation of genetic
incompatibilities in allopatry is hypothesized to have facilitated divergence in sympatry.
The hybrids of these species pairs are known to experience mitotic and meiotic instability
that may be the result of the structural divergence of chromosomes during allopatry. Here
we test this hypothesis using cytogenetics and test for the presence of chromosome
differences and parallelism among sympatric species pairs from three lakes. While
karyotypes are the same in dwarf and normal ecotypes (2n = 80, NF = 98), extensive
sub-chromosomal polymorphisms involving potentially important regulators of chromosome
segregation occur. Multivariate analyses support the hypothesis that chromosomal
rearrangements appeared during allopatry, and find no evidence for parallelism among
lakes. Thus, chromosome rearrangements in the Lake Whitefish system may contribute to
divergence by destabilizing mitotic and meiotic chromosome segregation in hybrids. The
chromosome structures detected here, such as centromeres, repetitive elements and
heterochromatin, are still difficult to sequence and assemble, arguing that cytogenetics and
sequencing are complementary approaches for detecting the genomic bases of speciation.
4.3 Introduction
Understanding the role of genetic and chromosomal changes associated with divergence is
a major focus in evolutionary biology (Brown and O'Neill 2010; Marie Curie
SPECIATION Network 2012). Thanks to the advent of massively parallel sequencing
technologies, considerable progress has been made in the past decade towards the
understanding of the genetic basis of adaptation and speciation (Seehausen et al. 2014).
Evidence in plants, yeast, mammals and fishes suggests that chromosome structure changes
are associated with divergence and reproductive isolation (e.g. Fishman et al. 2013;
Symonová et al. 2013; Charron et al. 2014). However, important chromosome structures
such as centromeres, repetitive elements and heterochromatin are still largely missing from
even the most complete genome assemblies (Hoskins et al. 2007; Altemose et al. 2014;
Ekblom and Wolf 2014), hindering our understanding of population divergence and
speciation. Still, these structures play key roles in gene expression regulation, chromosome
segregation and genome stability, and can greatly influence the genomic landscape of
speciation, modulate variations in recombination rate along chromosomes, and impact
hybrid fitness (Grewal and Jia 2007; Brown and O'Neill 2010). While advances in
long-read technologies hold great promise for the comprehension of these structures, they remain
challenging to characterize (Kim et al. 2014). As a consequence, fundamental questions
about how chromosomal rearrangements are shaped by neutral and selective processes and
whether they contribute to speciation remain, despite a century of study (Brown and O'Neill
2010; Faria and Navarro 2010).
Alternatively, higher orders of chromosome and chromatin structure can be characterized
by cytogenetic approaches. Robust models describing the role of chromosomal inversions
have been formulated (Noor et al. 2001; Rieseberg 2001; Hoffmann and Rieseberg 2008).
However, the contribution of other types of rearrangements that can be targeted by
cytogenetics (e.g. heterochromatin addition/deletion, copy number variation) to divergence
and reproductive isolation remains overlooked, although this type of polymorphism
represents a substantial source of variation within a species (King 1993; Kidd et al. 2008).
Moreover, heterochromatin is involved in recombination regulation and chromosome
segregation (Grewal and Jia 2007; Brown and O'Neill 2010), so this type of polymorphism
may even provide a source of variable reproductive isolation if it is associated with hybrid
incompatibilities (Cutter 2012). Therefore, modern cytogenetic techniques are
complementary to next-generation sequencing techniques and can help to reveal how the
genome is structurally organized into chromosomes, how it is shaped by divergence, and
how chromosome structure contributes to speciation.
Salmonids typically display substantial interspecific chromosome rearrangements (Phillips
and Ráb 2001), which may be the result of the “plasticity” of their genome conferred by
their ancestral tetraploid state (Mable et al. 2011). Teleosts experienced a whole genome
duplication (WGD – 3R) event preceding their diversification ~350 MYA and salmonids
underwent an additional WGD (4R) about 60 MYA (Crête-Lafrenière et al. 2012). The
rediploidization process of the salmonids is incomplete, as evidenced by residual tetrasomic
inheritance and the formation of meiotic tetravalents (Allendorf et al. 2015). Furthermore,
cytogenetic studies investigating multiple individuals have often revealed intraspecific
polymorphism, typically resulting from Robertsonian fusion and fission of chromosomes,
but that may also involve additions and deletions of heterochromatin (reviewed in Phillips
and Ráb 2001). For example, the largest metacentric chromosome of Lake Whitefish shows
length polymorphism, possibly as a result of a varying heterochromatin content and/or level
of compaction (Phillips et al. 1996). In Lake Trout (Salvelinus namaycush), large blocks of
heterochromatin are polymorphic and heritable (Phillips and Ihssen 1986). However, the
role of these sub-chromosomal rearrangements, which may accompany and influence the
speciation process, has rarely been examined in a context of early divergence.
Here, we use the Lake Whitefish system as a model to examine chromosomal divergence
during speciation and test for parallelism in this process. During the Pleistocene glaciation,
two Lake Whitefish lineages (the Atlantic and Acadian lineages) underwent geographical
isolation ~60,000 YBP (or ~15-20,000 generations ago) in northeastern North America
(Jacobsen et al. 2012). These two glacial lineages came into secondary contact when they
repeatedly colonized newly formed lakes following the retreat of the Laurentide ice sheet
~12,000 YBP (3-4,000 generations ago) (Bernatchez and Dodson 1991). Competitive
interactions and niche availability presumably contributed to the divergence of a derived,
dwarf limnetic form from the ancestral normal benthic form (Landry et al. 2007; Landry
and Bernatchez 2010), with variable levels of genetic divergence and gene flow (Lu and
Bernatchez 1999; Renaut et al. 2012; Gagnaire et al. 2013). Despite very low nucleotide
sequence divergence between dwarf and normal Lake Whitefish (Hébert et al. 2013),
significant post-zygotic reproductive isolation between the two ecotypes has been
documented, whereby F1-hybrids and backcrosses have a higher embryonic mortality rate
relative to pure parental forms (Lu and Bernatchez 1998; Rogers and Bernatchez 2006). In
backcrosses, hybrid breakdown involves the appearance of a characteristic malformed
phenotype, gene expression deregulation and transposable element derepression (Renaut et
al. 2009; Renaut and Bernatchez 2011; Dion-Côté et al. 2014). Moreover, we have recently
shown that healthy and malformed backcrosses experience mitotic instability and meiotic
breakdown respectively (Dion-Côté et al. 2015), suggesting a role for chromosome
rearrangements in Lake Whitefish. Nevertheless, it is unknown whether chromosome structure
remodeling has accompanied divergence between normal and dwarf Lake Whitefish, and
whether these potential changes may have played a role in reproductive isolation.
In this study, we document karyotype and chromatin structure in three sympatric
pairs of normal and dwarf Lake Whitefish, specifically targeting sub-chromosomal markers
associated with heterochromatin and repetitive DNA. We test the hypothesis that genetic
and phenotypic divergence in sympatric normal and dwarf Lake Whitefish is accompanied
by chromosomal and chromatin structure divergence, possibly arising during ancestral
allopatry. While karyotypes are stable among all populations examined (Dion-Côté et al.
2015), we observe that sub-chromosomal structures associated with heterochromatin and
repetitive regions are highly polymorphic. By applying multivariate analyses, we identify
markers associated with glacial lineage and inter-lake divergence. These observations
support the presence of ‘standing chromosomal variation’ at the time of colonization, but
we cannot reject the hypothesis of de novo chromosomal changes following postglacial
colonization.
4.4 Material and methods
Sampling and chromosome suspension preparation
We sampled sympatric species pairs from three lakes of the St. John River basin: Cliff Lake
(ME, USA), Témiscouata Lake (Québec, Canada) and East Lake (Québec, Canada), which are
part of a long-term research program on Lake Whitefish (Bernatchez et al. 2010). In Cliff and
Témiscouata lakes the dwarf form is derived from the Acadian lineage while the normal
form is derived from the Atlantic lineage. Contrary to Cliff Lake and Témiscouata Lake,
the dwarf and normal forms from East Lake are thought to have originated from only one
glacial lineage (Pigeon et al. 1997). A total of 30 individuals were sampled (see Table 4.1
for a summary and Table S 4.1 for more details).
Chromosome suspensions were prepared as described by Fujiwara et al. (2001) with some
modifications (Dion-Côté et al. 2015). Between 0.2 and 2 ml of fresh blood was sampled
with heparinized syringes and kept on ice for up to 12 hours. White blood cells were
transferred to 5 ml of freshly prepared cell culture medium (Medium 199 [Life Technologies],
10 % FBS [Sigma], 0.01 % LPS [Sigma], 60 µg/ml kanamycin [Sigma], 18 µg/ml
phytohemagglutinin [Sigma], 0.5 X Antibiotic Antimycotic [Sigma] and 1.75 µl of 10 %
β-mercaptoethanol per 100 ml of medium). Cells were incubated for 6 days at 20 °C with gentle
mixing every 24 h. Colchicine (25 µl of a 1 % solution) was added to the cell suspension 45
min before collection. Cells were hypotonized for 20 min in 2 ml of 0.075 M KCl at room
temperature, and then fixed by the addition of an equal volume of fixative (3:1
methanol:acetic acid). Three washes with fixative were performed before dropping the
suspensions on slides (SuperFrost quality).
Giemsa, Chromomycin A3 and C-Band Staining
Metaphase spreads were stained for 10 min in 3% Giemsa-Romanowski (Dr. Kulich
Pharma, Hradec Králové, Czech Republic) in phosphate buffer (pH 6.8-7.0) and then
thoroughly rinsed with dH2O. Chromosomes were then sequentially stained with
Chromomycin A3 (CMA3) and C-banding (with DAPI as a counterstain) according to
Rabová et al. (2015). CMA3 stains GC-rich heterochromatin and often colocalizes with
rDNA genes, while C-banding stains constitutive heterochromatin (i.e. that remains
compacted through interphase) that is thought to be associated with repeats, including
centromeres (Comings 1978). Chromosomes were examined using a Provis AX70 Olympus
microscope, and images taken with a CCD camera (DP30W Olympus) equipped with
standard filters. To reduce the effects of technical artifacts, at least 10 metaphases per
individual were examined, and only consistent signals among metaphases were reported.
Fluorescent In Situ Hybridization (FISH)
We amplified the whole 5S and adjacent non-transcribed DNA segments (~170 bp) using
previously published primers (5S-A: TACGCCCGATCTCGTCCGATC, 5S-B:
CAGGCTGGTATGGCCGTAAGC; Pendas et al. 1995). A ~240 bp fragment of the 28S
was also amplified using published primers (28S-C1: ACCCGCTGAATTTAAGCAT,
Dayrat et al. 2001; 28S-D2: TCCGTGTTTCAAGACGGG, Chombard et al. 1998). PCR
product identities were confirmed by Sanger sequencing (Macrogen Inc., Netherlands).
PCR products were purified on agarose gel and labeled with biotin-dUTP or
digoxigenin-dUTP using the Roche Nick Translation kit according to the manufacturer’s instructions
(Roche, Mannheim, Germany). Chromosomes were prepared for hybridization according to
Cremer et al. (2008) following a minimal aging of chromosomes for 3 hours at 37 °C.
Labeled probes were hybridized for 24 hours at 37 °C. Cy-3-Streptavidin (Invitrogen, San
Diego, USA) and anti-digoxigenin-fluorescein (Roche, Mannheim, Germany) were used to
detect biotin-dUTP and digoxigenin-dUTP labeled probes, respectively.
Multivariate analyses
We used multivariate analyses to test whether chromosome changes are associated with
glacial lineages, ecotype or lake. If chromosome changes occur randomly, individuals
should also cluster randomly in the multivariate space. The input dataset included
individual characteristics (lineage, lake and ecotype), phenotypic supplementary variables
and cytogenetic markers (Table S 4.1). We also included phenotypic variables (weight,
length, Fulton’s condition index and the number of gill rakers), which may better reflect the
proportion of dwarf or normal ancestry of each individual, given a certain level of gene flow
in all three lakes. These measures were considered in the imputation of
missing data (see below). Cytogenetic markers identified were transformed into a
presence/absence matrix, where a homozygote absent is coded as “0”, a heterozygote is “1”
and a homozygote present is “2”. Since acrocentric chromosomes cannot be readily
distinguished from one another, we counted the number of chromosomes that had CMA3
and rDNA 28S (by FISH) signals (numbered 0 to 6). Missing or ambiguous data were
coded as “NA”. The final dataset comprised 39 polymorphic markers (markers found in ≥ 2
individuals were kept) in 29 individuals for which phenotypic data were available. There
were 143 missing data points (“NA”) out of 1131 entries (12.6%, Supplementary Table 1).
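The coding scheme and the reported proportion of missing data can be illustrated with a minimal Python sketch. The genotype labels, function names and toy matrix below are hypothetical and are not taken from the thesis data; only the 0/1/2 coding and the dataset dimensions (29 individuals, 39 markers, 143 missing entries) come from the text.

```python
# Hypothetical genotype labels mapped to the 0/1/2 coding described in the text.
CODES = {"absent": 0, "heterozygote": 1, "present": 2}

def encode(genotype):
    """Return 0, 1 or 2; anything missing or ambiguous becomes None ("NA")."""
    return CODES.get(genotype)

def missing_fraction(matrix):
    """Fraction of None entries in a list-of-rows matrix."""
    cells = [value for row in matrix for value in row]
    return sum(value is None for value in cells) / len(cells)

# Toy example: 3 individuals x 4 markers (invented values, not thesis data)
raw = [
    ["absent", "present", "heterozygote", "ambiguous"],
    ["present", "present", None, "absent"],
    ["heterozygote", "absent", "absent", "present"],
]
coded = [[encode(g) for g in row] for row in raw]
print(coded)
print(missing_fraction(coded))

# Size of the real dataset as reported: 29 individuals x 39 markers, 143 "NA"
n_entries = 29 * 39
print(n_entries, round(100 * 143 / n_entries, 1))
# → 1131 12.6
```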
Missing data were imputed with the function imputeMFA() from the missMDA package
(version 1.7.3) in R (version 3.2.1). The advantage of this method is that it imputes missing
data based on other known variables (chromosome markers, length, weight, Fulton’s
condition index and gill raker number). Without this preliminary step, downstream analyses
would have replaced missing data by the mean of all of the individuals, thus potentially
blurring the signal. The imputation included three groups of variables (continuous and
categorical): (1) the glacial lineage (categorical; Atlantic or Acadian, sensu Bernatchez and
Dodson (1991)), lake (categorical; Cliff, East or Témiscouata Lake), and ecotype; (2) the
weight, length, Fulton’s condition index and the number of gill rakers; (3) chromosome
markers. The first five principal components were used for the imputation (ncp = 5) at a
threshold for imputation of 1 × 10⁻⁹, using the “Regularized” method. The resulting complete
dataset (Table S 4.2) was used for subsequent analyses (multiple factorial analysis, see
below). We did not include sex as a variable, as there was no evidence of a true sex
chromosome in Lake Whitefish (data not shown).
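To illustrate why imputing from correlated variables differs from mean replacement, the sketch below implements a deliberately simplified iterative PCA imputation in plain Python. It is not imputeMFA(): it uses a single principal component, no regularization and no variable groups, and all function names are hypothetical. Missing cells start at the column mean and are repeatedly replaced by their one-component PCA reconstruction until the change falls below a threshold.

```python
def first_axis(centered, n_iter=200):
    """Leading principal axis of a centered matrix, by power iteration."""
    p = len(centered[0])
    v = [1.0] * p
    for _ in range(n_iter):
        scores = [sum(row[j] * v[j] for j in range(p)) for row in centered]
        w = [sum(row[j] * s for row, s in zip(centered, scores)) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def impute_rank1(data, tol=1e-9, max_iter=5000):
    """Fill None entries by iterating a one-component PCA reconstruction."""
    n, p = len(data), len(data[0])
    missing = [(i, j) for i in range(n) for j in range(p) if data[i][j] is None]
    filled = [list(row) for row in data]
    # Initial guess: mean of the observed values in the same column.
    for i, j in missing:
        observed = [row[j] for row in data if row[j] is not None]
        filled[i][j] = sum(observed) / len(observed)
    for _ in range(max_iter):
        means = [sum(row[j] for row in filled) / n for j in range(p)]
        centered = [[row[j] - means[j] for j in range(p)] for row in filled]
        v = first_axis(centered)
        change = 0.0
        for i, j in missing:
            score = sum(centered[i][k] * v[k] for k in range(p))
            new_value = means[j] + score * v[j]  # rank-1 reconstruction of the cell
            change = max(change, abs(new_value - filled[i][j]))
            filled[i][j] = new_value
        if change < tol:
            break
    return filled

# Toy data (invented): the second column is exactly twice the first.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None]]
completed = impute_rank1(toy)
print(round(completed[3][1], 3))
```

In this toy case mean replacement would give 4 for the missing cell, whereas the iterative reconstruction moves towards 8, the value implied by the relationship between the two columns.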
We then performed a Multiple Factorial Analysis (MFA) on the chromosome markers
using the MFA() function from the FactoMineR package (version 1.30, Lê et al. 2008).
Variables from groups 1 and 2 (characteristics and phenotype above) were coded as
supplementary, to test a potential relationship among them and the different chromosome
markers, and thus did not contribute to the dimension definition. The chromosome markers
were coded in two groups: we grouped the continuous aCMA3 and a28S markers together
(from 0 to 6 sites) and all the other markers together (from 0 to 2 for zygosity). The
function dimdesc() from the FactoMineR package was used to retrieve supplementary
variables (factors and factor levels) significantly linked to dimensions constructed by the
MFA. The function applies an ANOVA model with one factor for each dimension. F-tests
were used to detect the influence of each supplementary factor on the dimension (Lineage,
Lake, Ecotype, Lake/ecotype). Then, t-tests were used for each factor level (Atlantic,
Acadian; Témiscouata, East, Cliff; Normal, Dwarf; Témiscouata.Dwarf,
Témiscouata.Normal, etc.). This function also provides the estimate of the barycenter
position (centroid) for each factor level with a significant association to MFA dimensions.
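The one-factor ANOVA applied to each dimension can be sketched as follows. This is an illustrative plain-Python calculation, not the dimdesc() implementation: the function name and the coordinates are invented, and R2 is assumed here to be the ratio of between-group to total sum of squares. P-values are omitted because the F distribution is not in the Python standard library.

```python
def one_way_anova(values, labels):
    """Return (R2, F) for a one-factor ANOVA of values grouped by labels."""
    n = len(values)
    grand_mean = sum(values) / n
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = ss_total - ss_between
    k = len(groups)
    f_value = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_between / ss_total, f_value

# Invented coordinates of six individuals on one dimension (not thesis data)
coords = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
lineage = ["Acadian", "Acadian", "Acadian", "Atlantic", "Atlantic", "Atlantic"]
r2, f_value = one_way_anova(coords, lineage)
print(round(r2, 3), round(f_value, 1))
# → 0.857 24.0
```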
4.5 Results
Karyotypes are stable among Lake Whitefish species pairs
Conventional Giemsa staining confirmed that the Lake Whitefish karyotype is of the
salmonid type A sensu Phillips et al. (2001), and is conserved in all six Lake Whitefish
populations from the three lakes (2n = 80, NF = 98). This karyotype includes 10 pairs of
meta/sub-metacentric chromosomes, one pair of large acrocentric chromosomes and 29
pairs of subtelo/acrocentric chromosomes of decreasing size (Booke 1968; Phillips et al.
1996; Dion-Côté et al. 2015). Subtle karyotype polymorphism was revealed and
subsequently included in the multivariate analysis. As shown in Figure 4.1, the length of
the p-arm of chromosome 1 was polymorphic in all lakes. Notably, the short form was near
fixation in dwarf fish compared to normal fish from Cliff Lake (marker “1p”, Table S 4.1).
The p-arm of chromosome 10 was also polymorphic in all three lakes (marker “10p”). In
some instances, it was clearly sub-metacentric, while in others it was sub-telomeric /
acrocentric (Figure 4.1). In addition, we identified a B or supernumerary chromosome in
one dwarf individual from Cliff Lake (Figure 4.2). B chromosomes, usually derived from A
chromosomes, occur in some individuals of a population and do not segregate in a
Mendelian fashion (Jones 1995; Camacho 2005). This bi-armed B chromosome was not
present in all cells examined, and was also never found in more than one copy.
Additionally, it was positively stained with CMA3 and C-bands, suggesting the presence of
repeated elements and heterochromatin, a common feature of B chromosomes (Camacho
2005).
Heterochromatin revealed by CMA3 and C-banding
Chromatin structure and polymorphism were then characterized among lakes and species
pairs. Several chromosomes showed polymorphic CMA3 banding patterns indicative of
GC-rich heterochromatin (Figure 4.1). Telomeric signals were present on chromosomes 3,
4, 5, 9 and 10, and on the centromere/p-arm of one to six acrocentric chromosomes
(summarized in Figure 4.4). A strong telomeric CMA3 signal on acrocentric chromosomes
was found only in two normal individuals from the Atlantic lineage, one from Témiscouata
Lake and the other from Cliff Lake (marker “aCMA-telo”, Table S 4.1). A large CMA3
banding pattern on the p-arm of chromosome 4 was also found in two individuals from
Cliff Lake, one dwarf and one normal. The same individuals were also the only ones
showing a CMA3 band on the q-arm of chromosome 4. Although there was no clear
association between these variations and specific glacial lineage, ecotype or lake, these
contributed to form significant lake and glacial lineage clusters when combined with other
chromosomal structures in subsequent multivariate analyses (see below).
Both monomorphic and polymorphic C-bands, indicating constitutive heterochromatin,
were found. Most chromosomes had centromeric C-bands in all individuals, although the
staining intensity was stronger in meta/sub-metacentric chromosomes compared to
acrocentric chromosomes (Figure 4.1). Three monomorphic heterochromatin patterns were
also found: 1) a large heterochromatin block on the q-arm of chromosome 2, close to the
centromere; 2) three bands on the q-arm of chromosomes 5 and 6; and 3) a double
interstitial C-band on the large acrocentric chromosome 11. The remaining bands were all
polymorphic, and none were completely differentially fixed among glacial lineages, lakes
or ecotypes. All markers identified are summarized in Figure 4.
Polymorphism of 5S and 28S ribosomal DNA genes
Ribosomal DNA genes (rDNA) are organized as tandem repeats often associated with
transposable elements (e.g. Cioffi et al. 2010; Symonová et al. 2013; Vergilino et al. 2013).
Several sites located on different chromosomes hybridized with the 5S probe, most of
which co-localized with C-bands (Figure 4.3). There were three 5S rDNA sites on
chromosome 2: the two distal were polymorphic, similarly to C-bands (Figure 4.1 and
Figure 4.3). The signal was much weaker for other sites. One small interstitial band was
found on chromosome 1, a centromeric signal on chromosome 10 (which almost colocalized with rDNA 28S), and centromeric and interstitial signals on different acrocentric
chromosomes. Polymorphic rDNA 28S signals were also detected on the p-arm of
chromosomes 3, 4 and 10. Finally, zero to six rDNA 28S signals were found on the
p-arm/centromere of acrocentric/subtelomeric chromosomes. These rDNA 28S sites tended to
strongly colocalize with the CMA3 staining (see multiple factorial analysis below,
summarized in Figure 4.4).
Multiple factorial analysis (MFA) reveals lineage but not ecotype divergence
To detect patterns of cytogenetic variation among all fish analyzed, a multiple factorial
analysis (MFA) was applied to chromosome markers, after imputation of missing data
(Table S 4.2). Together, dimensions 1 and 3 (20.99% and 12.46% of the variance
respectively) revealed glacial lineage and lake differentiation (Figure 4.5). Sympatric dwarf
and normal whitefish within each lake also tended to diverge (minimal overlap between
ellipses), although not in parallel and not significantly (see below). This can be seen by
examining confidence ellipses around the barycenter position estimate (centroid or average
position) for each variable analyzed in each quadrant (Figure 4.5). The second dimension
(14.73% of the variance, Figure S 4.1 and Figure S 4.2) was not significantly associated
with glacial lineage, lake or ecotype.
Dimension 1 correlated significantly with the variables “Lake/ecotype”, “Lake” and
“Lineage” (R2 = 0.72, 0.57, 0.25 respectively, F-test, all p-values < 0.01; Table 4.2). Glacial
lineage factor levels “Atlantic” and “Acadian”, were significantly correlated with this
dimension (T-test, p-value < 0.01; Table 4.3). Cliff and East Lakes were also differentiated
by dimension 1 (T-test, p-value < 0.01, <0.001 respectively; Table 4.3). The East Lake
dwarf whitefish significantly correlated with dimension 1 (T-test, p-value < 0.01). Finally,
Fulton’s condition index and length were significantly correlated with dimension 1 (R2
= 0.47 and 0.44 respectively, F-test, p-value < 0.05).
Dimension 3 correlated significantly with the variables “Lake/ecotype” and “Lake” (R2 =
0.46, 0.24 respectively, F-test, p-values < 0.05; Table 4.2). The variable level Cliff.normal
had a significant effect on dimension 3 (T-test, p-value < 0.001; Table 4.3), and
Témiscouata and Cliff lakes (T-test, p-values < 0.05; Table 4.3) were resolved on the third
dimension. Together, dimensions 1 and 3 differentiated glacial lineages and the three lakes,
but did not identify parallel divergence between ecotypes among lakes.
Identification of markers associated with divergence
To identify the chromosome markers most correlated to divergence, we retrieved markers
that were most correlated to dimensions 1 and 3 (Table 4.4, Figure 4.6). For example,
acrocentric signals of CMA3 and rDNA 28S (a28S and aCMA) were positively correlated
with each other, and with the first dimension (R2 = 0.91, p-value < 0.001; R2 = 0.83,
p-value < 0.001, respectively). This dimension also resolved glacial lineages and Cliff from East
lake (Table 4.3, Figure 4.5). Thus, individuals from the Atlantic lineage tended to show
more CMA3 and 28S rDNA sites on the centromeres of their acrocentric chromosomes.
Twenty-five different markers (out of 39) were significantly correlated with dimensions 1
and 3 (Table 4.4). Therefore, many markers covary and are associated with divergence
between glacial lineages, among lakes, and between ecotypes within lakes.
4.6 Discussion
We evaluated the relationship between chromosomal divergence and the rapid genetic and
phenotypic divergence among Lake Whitefish species-pairs, specifically targeting
cytogenetic markers associated with heterochromatin and repetitive DNA. By
implementing a statistical multivariate framework to study highly polymorphic
sub-chromosomal markers, we found that chromosome reorganization was primarily associated
with earlier allopatric divergence among glacial lineages and more-recent inter-lake
divergence. Dwarf and normal ecotypes showed a trend towards divergence within lakes,
although not in parallel. Together, these observations support the hypothesis that substantial
chromosome structure changes are associated with divergence in the Lake Whitefish
system, but via unique mechanisms in each lake.
High chromosomal polymorphism in the Lake Whitefish system
The karyotypes we report are consistent with previous work in other populations of Lake
Whitefish, and show that Robertsonian rearrangements are not present in comparison to
previously reported karyotypes from other populations (Booke 1968; Phillips et al. 1996;
Dion-Côté et al. 2015). However, substantial sub-chromosomal polymorphism was
identified, mainly involving labile or rapidly evolving heterochromatin structures and
repeated DNA. Such sub-chromosomal polymorphisms have previously been shown to be
heritable (Phillips and Ihssen 1986), thus representing true genetic/epigenetic variation.
Several characteristics of the Lake Whitefish system may contribute to this high level of
polymorphism, as discussed below.
Firstly, pronounced cytogenetic polymorphism is common in salmonids, including
Coregonus, and is hypothesized to be due to ancestral tetraploidy (Phillips and Ráb 2001).
Indeed, length polymorphism has been identified in the p-arm of chromosome 1 in
Coregonus (Phillips et al. 1996; Jankun and Ráb 1997), and polymorphism of rDNA genes
has been observed among lineages of Salmo trutta (Caputo et al. 2009), and Coregonus
albula (Jankun et al. 2003). These observations are consistent with karyotype and genetic
flexibility in fishes, especially in salmonids, in which a number of genome duplications
have occurred (Phillips and Ráb 2001; Ravi and Venkatesh 2008; Mable et al. 2011).
Secondly, the demographic history of the Lake Whitefish species pairs is expected to favor
the spread of chromosome rearrangements to some extent. This is because allopatry, as
found among the Atlantic and Acadian glacial lineages, is predicted to favor the differential
fixation of alleles, such as chromosome rearrangements (Coyne and Orr 2004). In addition,
simulations have shown that mixed geographic modes of divergence, that is geographical
isolation followed by secondary contact in sympatry, as is the case of Lake Whitefish,
promote the fixation of chromosome rearrangements (Feder et al. 2011). Finally,
colonization events that lead to population bottlenecks may rapidly lead to genetic
differentiation, including through chromosomal rearrangements, even if they are slightly
deleterious (King 1993; Faria and Navarro 2010; Feder et al. 2011). Hence, in addition to
the labile nature of salmonid genomes, the observation of high chromosomal polymorphism
corroborates expectations based on the historical biogeography of the Lake Whitefish.
Thirdly, several lines of evidence indicate that Lake Whitefish hybrids experience genomic
instability. In general, introgressive hybridization can promote genome reshuffling in fish
(Pereira et al. 2014). Introgressive hybridization occurs among species-pairs of Lake
Whitefish in natural populations (Gagnaire et al. 2013), and is associated with the
derepression of non-coding RNAs and transposable elements in Lake Whitefish hybrids
(Renaut et al. 2010; Dion-Côté et al. 2014). When transposable elements are derepressed,
they can promote genome rearrangements and propagate in the genome of their host (Levin
and Moran 2011). In addition, we have found that aneuploidy, an extreme form of genome
instability that can promote chromosome rearrangements in meiosis, occurs in backcrosses
between dwarf and normal Lake Whitefish (Dion-Côté et al. 2015). Therefore, we
hypothesize that gene flow between dwarf and normal Lake Whitefish may contribute to
maintaining chromosomal polymorphisms in Lake Whitefish.
Overall, the intrinsic properties of salmonid genomes, historical biogeography of the Lake
Whitefish, and ongoing introgressive hybridization are all expected to increase genome
lability and consequently polymorphism. The sampling of purely allopatric populations and
other sympatric species pairs should help to test these hypotheses and disentangle the
relative contributions of ancestral chromosomal polymorphisms and de novo chromosome
reorganization in association with divergence.
A statistical multivariate strategy to analyze cytogenetic polymorphism
Our statistical multivariate strategy helped to resolve patterns in this highly polymorphic
dataset and identify markers associated with divergence. Importantly, this approach allows
the use of discrete and continuous data, such as presence/absence of a specific marker or
the number of rDNA sites. In addition, it is possible to include supplementary phenotypic
measures (e.g. length, weight) or environmental data (e.g. lake) to test their association with
cytogenetic patterns. A similar method based on principal coordinates analysis (PCoA)
was recently published by Peruzzi and Altınordu. However, that method relies
on continuous variables such as total haploid chromosome length and centromeric
asymmetry, which are difficult to obtain in non-model systems. In addition, it
does not allow handling of discrete data, such as the presence or absence of
cytogenetic markers. To our knowledge, this is the first time that multiple factor
analysis (MFA) has been applied to cytogenetic data, and the code is made readily
available to the community (see Supplementary Code, section 4.11).
Standing chromosomal variation shaped by divergence
Heterochromatin and rDNA polymorphism is mostly shared among the three species pairs
sampled, so we suggest that the ancestral Lake Whitefish population already had high
levels of polymorphism, or standing chromosomal variation. Multivariate analyses
revealed three nested levels of divergence based upon cytogenetic polymorphisms detected
with 5S and 28S rDNA, CMA3 and C-Band staining: 1) between glacial lineages, 2) among
lakes, and 3) between sympatric species pairs within lakes, which are consistent with well-documented population genetic structure in the system (Pigeon et al. 1997; Lu et al. 2001;
Campbell and Bernatchez 2004; Bernatchez et al. 2010).
The most striking level of divergence revealed by multivariate analyses is between glacial
lineages. Genetic divergence, including through chromosome rearrangements, is facilitated
by geographic isolation (Coyne and Orr 2004; Kawakami et al. 2009). Therefore,
geographical isolation between the Atlantic and Acadian lineages either promoted the
differential fixation of ancestral chromosomal rearrangements (standing chromosomal
variation), or allowed de novo chromosome rearrangements to arise within a
glacial lineage after geographical separation.
The second strongest signal revealed by the multivariate approach was among lakes. Two
non-mutually exclusive hypotheses may explain divergence among lakes, similar to the
divergence among glacial lineages. First, population bottlenecks associated with lake
colonization may have led to differential fixation of ancestral cytogenetic variants (Mayr
1954b). Alternatively, considering that the markers used in this study are associated with
heterochromatic repeats and are thus extremely labile, there may have been de novo
remodeling following lake colonization. Such rapid remodeling (<15,000 years) of sub-chromosomal structures associated with ecological divergence has been previously
documented in another young Coregonus species pair in Europe using a similar strategy
(Symonová et al. 2013). If rapid remodeling occurred in the Lake Whitefish, then the
sharing of chromosomal variation between sympatric ecotypes would be best explained by
ongoing gene flow (Gagnaire et al. 2013).
Thirdly, ecotypes within lakes showed a trend towards divergence; however, this was not
significant in most cases and did not occur in parallel among lakes. The absence of
parallelism and incomplete differentiation between ecotypes can be explained by: 1) the
short time since divergence, 2) the unique combination of evolutionary forces in each lake,
and 3) possible de novo chromosomal reorganization. Lake Whitefish ecotypes began to
diverge after lake colonization, < 12,000 YBP (or < 3,000-4,000 generations). This is very little
time to fix chromosome rearrangements that may span large regions of the genome (e.g.
on the order of megabases). Yet, stochastic processes experienced by small populations (i.e. genetic drift), as
expected for Lake Whitefish, may accelerate fixation (Coyne and Orr 2004). In addition,
ongoing gene flow may tend to eliminate differences between ecotypes. Finally, there are
unique combinations of evolutionary forces at play within each lake, including differences
in abiotic factors, which could contribute to non-parallelism among lakes (Landry et al.
2007). Potential de novo reorganization may also contribute to generating unique
chromosome patterns within each lake.
While we cannot exclude the contribution of de novo chromosomal rearrangements after
post-glacial lake colonization, our data suggest there was already a high level of standing
chromosomal variation segregating between both Lake Whitefish glacial lineages under
study. Indeed, most polymorphism is shared among all lakes and ecotypes. Allopatry and
post-colonization history also appear to have shaped chromosome organization.
Heterochromatin architecture divergence and reproductive isolation
Michalak (2009) suggested that heterochromatin structure and function may be involved in
rapid divergence and hybrid breakdown. Heterochromatin plays pivotal roles in
transcriptional regulation and chromosome segregation (Grewal and Jia 2007), and may
influence crossover localization during meiosis (John and King 1985). Several indirect
observations suggest that heterochromatin divergence between normal and dwarf forms,
and disruption in their hybrids, may occur. Here, we observed that cytogenetic markers
associated with heterochromatin and repetitive DNA tend to differ between sympatric
species pairs. We previously found that DNMT1, an enzyme involved in heterochromatin
maintenance (DNA methylation specifically), is down-regulated in malformed backcross
embryos (Dion-Côté et al. 2014). Consistent with heterochromatin disruption in these
malformed backcrosses, we also reported global transcriptional deregulation, transposable
element derepression, and non-coding RNA upregulation (Dion-Côté et al. 2014).
Finally, we have previously shown that Lake Whitefish hybrids suffer from mitotic and
meiotic instability (Dion-Côté et al. 2015). As a first step to directly test the role of
heterochromatin in the divergence of Lake Whitefish species pairs, we are currently
studying DNA methylation patterns in dwarf and normal Lake Whitefish, and inheritance
patterns in reciprocal hybrids.
The potential for Variable Reproductive Isolation in the Lake Whitefish system
Variable reproductive isolation (VRI) is characterized by heritable polymorphism for
hybrid incompatibilities within a species (Cutter 2012). Several observations support the
idea that the Lake Whitefish may be a good non-model system to test for the importance
and extent of VRI between nascent species. Genome-wide
approaches in the Lake Whitefish system have uncovered very few differentially fixed
alleles between ecotypes (Renaut et al. 2010; Gagnaire et al. 2013; Hébert et al. 2013). In
addition, independent crosses made in a controlled environment revealed variable survival
rate in hybrids (Lu and Bernatchez 1998; Rogers and Bernatchez 2006; Renaut and
Bernatchez 2011). The present study identified a high level of chromosome polymorphism
segregating in three natural species pairs, which may be associated with mitotic and meiotic
instability in hybrids (Dion-Côté et al. 2015). While the direct cause of this chromosomal
instability remains unclear, our data suggest this could be due to polymorphic sub-chromosomal rearrangements. If VRI associated with chromosome changes occurs in this
system, chromosome divergence patterns should correlate with the genetic and phenotypic
divergence gradient previously documented (Lu and Bernatchez 1999; Campbell and
Bernatchez 2004; Rogers and Bernatchez 2007; Gagnaire et al. 2013). The distance
between barycenter position estimates of the two ecotypes within each lake shows a trend
consistent with this hypothesis (Témiscouata < East < Cliff), but deeper sampling,
including Webster Lake and Indian Lake not analyzed here, should help clarify this point.
In conclusion, we have uncovered extensive sub-chromosomal polymorphism in the Lake
Whitefish system, by combining classic and molecular cytogenetics. Polymorphic
chromosomal markers were correlated with geographical isolation, lake colonization and
ecotype divergence. A large body of work has shown that natural selection drove ecological
and morphological divergence in the Lake Whitefish system. As in Mimulus species
(Fishman et al. 2013), chromosome rearrangements may have consolidated reproductive
isolation in the Lake Whitefish system. Echoing Valente et al. (2014), we encourage the
speciation research community to apply cytogenetics as a complement to sequencing efforts, in order to
document polymorphism that is potentially important from an evolutionary standpoint but
cannot readily be detected even with the best sequencing techniques currently available.
4.7 Data accessibility
Complementary data, including 5S and 28S rDNA sequencing files and raw microscopy
images for all techniques used (Giemsa, C-bands, CMA3 and FISH) are available on Dryad
(doi:10.5061/dryad.tg0mt).
4.8 Acknowledgements
We would like to thank Martin Laporte for insightful discussions, and Clément Rougeux,
Anne C. Dalziel and Ben J.G. Sutherland for reading an earlier version of this manuscript.
We are grateful to Alex Bernatchez for counting gill rakers. This work was supported by a
Natural Sciences and Engineering Research Council of Canada (NSERC) discovery grant
and Canada Research Chair in Genomics and Conservation of Aquatic Resources to L.B.,
and a NSERC postgraduate scholarship to A.-M.D.-C. A.-M.D.-C. also received financial
support from the FRQ-NT for international training through Québec-Océan. RS and PR
were supported by the project 14-02940S of the Czech Science Foundation. This is a
contribution to the Québec-Océan research program.
4.9 Tables
Table 4.1. Number of individuals analyzed per lake and ecotype with their average
phenotypic characteristics. Fulton: Fulton condition index.
Lake         Ecotype  Lineage   n  Average weight (g)  Average length (cm)  Average Fulton  Average n gill rakers
Témiscouata  Dwarf    Acadian   5  82.00 ± 9.90        21.04 ± 0.90         0.88 ± 0.04     22.60 ± 3.51
Témiscouata  Normal   Atlantic  5  197.42 ± 111.41     26.56 ± 5.00         0.97 ± 0.06     22.80 ± 0.45
East         Dwarf    Acadian   4  48.98 ± 22.98       17.68 ± 2.18         0.84 ± 0.14     21.50 ± 2.12
East         Normal   Acadian   4  310.38 ± 251.40     29.80 ± 6.49         1.01 ± 0.10     24.25 ± 2.50
Cliff        Dwarf    Acadian   5  138.68 ± 21.97      23.86 ± 1.61         1.02 ± 0.10     24.75 ± 1.26
Cliff        Normal   Atlantic  6  427.33 ± 25.42      33.97 ± 0.89         1.10 ± 0.03     24.00 ± 1.41
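The summary values in Table 4.1 can be checked against the individual records in Table S 4.1. The sketch below does this for the five Témiscouata dwarf individuals. It assumes the standard definition of the Fulton condition index, K = 100 × W / L^3 (W in g, L in cm), and assumes that the ± values are sample standard deviations; neither is stated explicitly in the text, and the function names are illustrative only.

```r
# Weight (g) and length (cm) of the five Témiscouata dwarf fish (Table S 4.1)
weight    <- c(TD1 = 72.5, TD2 = 76.1, TD3 = 84.4, TD7 = 79.1, TD9 = 97.9)
length_cm <- c(TD1 = 20.6, TD2 = 20.3, TD3 = 20.8, TD7 = 20.9, TD9 = 22.6)

# Assumed definition of the Fulton condition index: K = 100 * W / L^3
fulton_k <- function(w, l) 100 * w / l^3
k <- fulton_k(weight, length_cm)

# Mean and sample standard deviation, as assumed for the "mean ± SD" cells
summarise_trait <- function(x) c(mean = mean(x), sd = sd(x))

round(summarise_trait(weight), 2)
round(summarise_trait(length_cm), 2)
round(summarise_trait(k), 2)
```

Under these assumptions the weight summary rounds to 82.00 ± 9.90, matching the first row of Table 4.1, which suggests the table reports sample standard deviations.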
Table 4.2. Factors (supplementary variables) with a significant effect on dimensions 1 and
3 from the multiple factor analysis. R2 and p-values (F-test) were calculated by the
dimdesc() function from the FactoMineR package. No factor had a significant effect on
dimension 2.
Factor        R2      p-value
Dimension 1
Lake/ecotype  0.7146  < 0.0001
Lake          0.5739  < 0.0001
Lineage       0.2523  0.0055
Dimension 3
Lake/ecotype  0.4572  0.0108
Lake          0.235   0.0307
Table 4.3. Significant barycenter position estimates of the factor levels for which main
factors had a significant effect on dimensions 1 and 3 from the multiple factor analysis
(Table 1). R2 and p-values (t-test) were calculated by the dimdesc() function from the
FactoMineR package. Note that no factors had a significant effect on dimension 2.
Factor level  Estimate  p-value
Dimension 1
Cliff         0.975     0.0016
Atlantic      0.6185    0.0055
Acadian       -0.6185   0.0055
East:Dwarf    -1.8441   0.0001
East          -1.2682   < 0.0001
Dimension 3
Cliff:Normal  1.0816    0.0008
Cliff         0.442     0.0476
Témiscouata   -0.5881   0.0106
Table 4.4. Chromosome markers significantly correlated (p < 0.05) to dimensions 1 and 3
from the multiple factor analysis.
Marker         Correlation  p-value
Dimension 1
a28S           0.9125       < 0.0001
aCMA           0.8295       < 0.0001
2q5S2          0.516        0.0042
10p            0.435        0.0184
aCInter        0.4101       0.0271
8pC1           0.4094       0.0274
10pC1          0.4074       0.0283
2qC2           0.3745       0.0453
4pCMA1         -0.4126      0.0261
3pCMA1         -0.4172      0.0244
aCsmallDouble  -0.4333      0.0189
4pC1           -0.4497      0.0144
4p28S1         -0.4887      0.0071
3p28S1         -0.5259      0.0034
3pC1           -0.5351      0.0028
Dimension 3
5qCMA1         0.7062       < 0.0001
3pCMA1         0.6711       0.0001
4qC2           0.6268       0.0003
aCMA.telo      0.5227       0.0036
3p28S1         0.4989       0.0059
3pC1           0.4183       0.0239
4pC2           0.3897       0.0367
1q5S           -0.4383      0.0174
a5S            -0.535       0.0028
10p28S1        -0.5905      0.0007
2qC3           -0.6855      < 0.0001
10pCMA1        -0.6901      < 0.0001
2q5S3          -0.7084      < 0.0001
4.10 Figures
[Figure 4.1 image: panels A to D; scale bar 10 µm]
Figure 4.1. Example of chromatin structure polymorphism in a normal and a dwarf
individual from East Lake. The same slides were sequentially stained with Giemsa (not
shown), Chromomycin A3 (CMA3) and C-bands. Identifiable chromosomes are
numbered, followed by acrocentric chromosomes with markers scored, and then remaining
acrocentric chromosomes by decreasing size. A) C-banded karyotype of a normal
individual from East Lake. Note the long form of chromosome 1. B) CMA3 karyotype of
the same normal individual from East Lake. C) C-banded karyotype of a dwarf individual
from East Lake. Note the short form of chromosome 1. D) CMA3 karyotype of the same dwarf
individual from East Lake.
Figure 4.2. B chromosome identified in Lake Whitefish by A) C-banding, B) CMA3/DAPI
staining and C) FISH with rDNA 28S (red) and rDNA 5S (green) probes.
[Figure 4.3 image: panels A and B; scale bar 10 µm]
Figure 4.3. Representative rDNA site polymorphism shown by fluorescence in situ
hybridization (FISH) with probes for 5S and 28S rDNA. 5S rDNA is shown in green and
28S rDNA is shown in red. Note the heterozygous state of chromosome 2. A) FISH
karyotype of a normal individual from East Lake. B) FISH karyotype of a dwarf individual
from East Lake.
[Figure 4.4 image: ideogram of chromosomes 1 to 11 and marker-bearing acrocentric chromosomes, with markers labeled by arm and technique; legend: C-Band (strong), C-Band (weak), CMA3, DAPI, FISH 28S, FISH 5S]
Figure 4.4. Partial consensus ideogram for all three species pairs showing chromosome
shape and all markers identified on chromosomes and scored. Markers are identified
according to the chromosome arm on which they are, technique used and distance from
centromere (1 for the closest, then 2, etc.). The eleven readily identifiable chromosomes are
numbered 1 to 11, followed by nine acrocentric chromosomes bearing markers (20
chromosomes missing).
[Figure 4.5 image: individuals plotted on Dim 1 (20.99%) and Dim 3 (12.46%); panels: Lake.Ecotype, Lineage, Ecotype, Lake]
Figure 4.5. Multiple factor analysis performed with FactoMineR. Dimensions 1 and 3 are
shown. Each quadrant shows the same data with the ellipses around factor levels shown:
“lineage”, “lake”, “ecotype” or “Lake/ecotype”. Data points represent single individuals.
Figure 4.6. Multiple factor analysis correlation circle for dimensions 1 and 3. Vectors for
all variables are shown, but only the five most structuring chromosome markers are
identified. See Figure 4.4 for a physical representation of the
markers identified and Table 4.4 for correlation values. Dashed vectors represent
supplementary variables that are not used to compute the MFA (length, weight, Fulton, gill
raker number).
4.11 Supplementary Code
# Multiple Factor Analysis - MFA
Multiple Factor Analysis is dedicated to datasets where variables are structured into groups.
Several sets of variables (continuous or categorical) are therefore simultaneously studied.
The core of MFA is based on a factorial analysis (PCA in the case of quantitative variables,
MCA in the case of qualitative variables) in which the variables are weighted. **These
weights are identical for the variables of the same group (and vary from one group to
another)**.
```{r}
rm (list=ls())
```
## 0- Load libraries
```{r}
require(FactoMineR, quietly = TRUE, warn.conflicts = FALSE)
require(missMDA, quietly = TRUE, warn.conflicts = FALSE)
require(dplyr, quietly = TRUE, warn.conflicts = FALSE)
suppressMessages(library(gplots))
```
## 1- Import and format data
```{r}
Input <- "analyses_master_10072015.csv"
df = read.table(Input, header=T, row.names =1, sep=",")
cNAMES = colnames(df)
rNAMES = row.names(df)
row.names(df) = rNAMES
# Remove "X" added to column titles by R
colnames(df) = gsub("X","",cNAMES)
# We need to transform data to numerical values
cols = c(10:ncol(df))
df[,cols] = as.data.frame(apply(df[,cols], 2, function(x) as.numeric(x)))
```
## 2- Impute the incomplete data set with imputeMFA
Impute the missing values of a dataset with Multiple Factor Analysis (MFA). The variables
are structured a priori into groups of variables. The variables can be continuous or
categorical but within a group the nature of the variables is the same.
```{r}
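# df.nosex is not created anywhere in this script; it is assumed here to be df
# without the Sex column (4 qualitative + 4 quantitative + 39 marker columns)
df.nosex <- df[, colnames(df) != "Sex"]
res.impute.nosex <- imputeMFA(df.nosex, group=c(4,4,39), type=c("n","c","c"),
                              method="Regularized", ncp=5, threshold=1e-09, maxiter=1000000)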
# write imputed data in a .csv file
write.csv(cbind(df.nosex[,c(1:4)],(res.impute.nosex$tab.disj[,c(1:13)])),file="imputed_data.csv")
```
## 3- Perform a Multiple Factor Analysis (MFA)
**Variable group definition**
Group 1: SUPPLEMENTARY qualitative variables = Race, Ecotype, Lake, Lake.ecotype
Group 2: SUPPLEMENTARY quantitative variables = Mass, Length, Fulton, Gill rakers
Group 3: ACTIVE 2 NOT SCALED quantitative variables (0:6 positive chromosomes)
Group 4: ACTIVE 37 NOT SCALED quantitative variable (0,1,2) = remaining. Coded as
quantitative in order to keep the relationship among alleles, i.e. the heterozygote (1) is more
similar to both homozygotes (0,2) than homozygotes to each other
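The rationale for coding markers as quantitative values can be illustrated with base R alone. The sketch below is not part of the original analysis; the object names are hypothetical. It compares Euclidean distances between the three genotypes under the 0/1/2 coding and under a purely categorical (indicator) coding.

```r
# Marker coded as a quantitative variable:
# 0 = homozygote absent, 1 = heterozygote, 2 = homozygote present
geno <- c(hom_absent = 0, het = 1, hom_present = 2)
d_quanti <- as.matrix(dist(geno))

# The same three genotypes coded as unordered categories
# (one indicator column per genotype)
dummy <- diag(3)
rownames(dummy) <- names(geno)
d_quali <- as.matrix(dist(dummy))

# Quantitative coding: the heterozygote lies between the two homozygotes
d_quanti["het", c("hom_absent", "hom_present")]
d_quanti["hom_absent", "hom_present"]

# Categorical coding: every pair of genotypes is equally distant
d_quali
```

With the quantitative coding the heterozygote is one unit from each homozygote while the homozygotes are two units apart; with the indicator coding that ordering is lost, which is the relationship the group definition above aims to preserve.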
```{r}
res.MFA.nosex=MFA(res.impute.nosex$completeObs, group=c(4,4,2,37),
type=c("n","c","s","s"), name.group=c("supp. quali", "supp. quanti", "Continuous markers",
"Discrete Markers"), num.group.sup=c(1,2), ncp=5, graph=F)
plot(res.MFA.nosex, habillage=1, title="Lake", cex=0.8)
plot(res.MFA.nosex, habillage=2, title="Lake:Ecotype", cex=0.8)
plot(res.MFA.nosex, habillage=3, title="Ecotype", cex=0.8)
plot(res.MFA.nosex, habillage=4, title="Lineage", cex=0.8)
```
## 4- Retrieve significant supplementary variables for each dimension
The significant supplementary variables are those with an absolute v.test > 1.96
```{r}
write.infile(dimdesc(res.MFA.nosex, axes=c(1:3)), file="dim.des.csv")
v.test=res.MFA.nosex$quali.var.sup$v.test
v.test
write.csv(v.test, file="v.test.csv")
```
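The 1.96 cut-off corresponds to the two-sided 5% critical value of the standard normal distribution. A minimal sketch of the filtering step follows; the v.test values and level names are invented for illustration and are not results from this study.

```r
# Two-sided 5% critical value of the standard normal distribution
critical <- qnorm(1 - 0.05 / 2)

# Hypothetical v.test matrix (rows = factor levels, columns = dimensions);
# the values are invented for illustration only
v_test_demo <- matrix(c( 2.8, -0.4,
                        -2.8,  0.4,
                         1.1,  2.3),
                      ncol = 2, byrow = TRUE,
                      dimnames = list(c("level_A", "level_B", "level_C"),
                                      c("Dim.1", "Dim.2")))

# Keep the level/dimension combinations whose absolute v.test exceeds the cut-off
significant <- abs(v_test_demo) > critical
which(significant, arr.ind = TRUE)
```

The same comparison can be applied to the `v.test` object retrieved above to list the factor levels that are significantly displaced along each dimension.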
## 5- Plot 0.95 confidence ellipses for supplementary qualitative variables in the first 3 dimensions
```{r}
plotellipses(res.MFA.nosex, magnify = 2.5, axis=c(1,2), cex=0.4)
plotellipses(res.MFA.nosex, magnify = 2.5, axis=c(1,3), cex=0.4)
plotellipses(res.MFA.nosex, magnify = 2.5, axis=c(2,3), cex=0.4)
```
## 6- Retrieve the most influential chromosome markers in the first 3 dimensions
```{r}
# Create PDF with significant dimensions c(1:2)
cairo_pdf(filename="MFA_var_dim1.2.pdf")
plot(res.MFA.nosex, axes=c(1,2), choix="var", habillage="group", cex=1, shadow=TRUE,
select="contrib 5", title="5 most important variables for axis 1 & 2 construction")
dev.off()
# Create PDF with significant dimensions (here 1&3)
cairo_pdf(filename="MFA_var_dim1.3.pdf")
plot(res.MFA.nosex, axes=c(1,3), choix="var", habillage="group", cex=1, shadow=TRUE,
select="contrib 5", title="5 most important variables for axis 1 & 3 construction")
dev.off()
# Retrieve loadings
# Loadings are obtained by dividing each variable's coordinates on a dimension
# by the square root of that dimension's eigenvalue.
n.dim <- ncol(res.MFA.nosex$quanti.var$coord)
loadings <- sweep(res.MFA.nosex$quanti.var$coord, 2,
                  sqrt(res.MFA.nosex$eig[1:n.dim, 1]), FUN="/")
write.csv(loadings, file = "loadings.markers.csv", quote=F)
```
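The relationship between variable coordinates and loadings used above can be verified on a small example. The sketch below uses an ordinary scaled PCA from base R (`prcomp`) on simulated data as a stand-in for the MFA; it is an illustration of the arithmetic only, not a reproduction of the analysis.

```r
set.seed(1)
# Simulated data: 20 individuals, 3 quantitative variables
X <- matrix(rnorm(60), nrow = 20, ncol = 3)

# Scaled PCA; pca$rotation holds the loadings (eigenvectors)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
eig <- pca$sdev^2

# Variable coordinates = loadings multiplied by sqrt(eigenvalue);
# in a scaled PCA these equal the variable/component correlations
coord <- sweep(pca$rotation, 2, sqrt(eig), FUN = "*")
all.equal(coord, cor(X, pca$x), check.attributes = FALSE)

# Dividing the coordinates by sqrt(eigenvalue) recovers the loadings
loadings <- sweep(coord, 2, sqrt(eig), FUN = "/")
all.equal(loadings, pca$rotation, check.attributes = FALSE)
```

Both comparisons should hold up to numerical tolerance, which is the same identity the `sweep()` call applies to `res.MFA.nosex$quanti.var$coord`.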
4.12 Supplementary tables
Table S 4.1. Individual characteristics and markers scored. Weight is in g and Length is in
cm. The number of signals is reported for markers « a28S » and « aCMA ». For all other
markers, 0 stands for homozygote absent, 1 stands for heterozygote, and 2 stands for
homozygote present.
Individual  Sex  Lake   Lake/ecotype  Ecotype  Race      Weight (g)  Length (cm)  Fulton  Gills
TN5         M    Temis  Temis:Normal  Normal   Atlantic  94.4        21.3         0.98    23
TN9         F    Temis  Temis:Normal  Normal   Atlantic  112.9       22.8         0.95    22
TN13        F    Temis  Temis:Normal  Normal   Atlantic  368.9       34           0.94    23
TN14        F    Temis  Temis:Normal  Normal   Atlantic  239.8       28.2         1.07    23
TN15        M    Temis  Temis:Normal  Normal   Atlantic  171.1       26.5         0.92    23
TD1         M    Temis  Temis:Dwarf   Dwarf    Acadian   72.5        20.6         0.83    17
TD2         F    Temis  Temis:Dwarf   Dwarf    Acadian   76.1        20.3         0.91    22
TD3         M    Temis  Temis:Dwarf   Dwarf    Acadian   84.4        20.8         0.94    26
TD7         F    Temis  Temis:Dwarf   Dwarf    Acadian   79.1        20.9         0.87    25
TD9         F    Temis  Temis:Dwarf   Dwarf    Acadian   97.9        22.6         0.85    23
ED13        M    East   East:Dwarf    Dwarf    Acadian   80.3        20.8         0.89    20
ED21        M    East   East:Dwarf    Dwarf    Acadian   44.6        17.4         0.85    NA
ED23        M    East   East:Dwarf    Dwarf    Acadian   46          16.7         0.99    NA
ED26        F    East   East:Dwarf    Dwarf    Acadian   25          15.8         0.63    23
EN17        F    East   East:Normal   Normal   Acadian   681.7       39.2         1.13    25
EN21        F    East   East:Normal   Normal   Acadian   168.6       26           0.96    27
EN11        M    East   East:Normal   Normal   Acadian   144.2       25           0.92    21
EN12        M    East   East:Normal   Normal   Acadian   247         29           1.01    24
CD2         F    Cliff  Cliff:Dwarf   Dwarf    Acadian   149.6       24.7         0.99    25
CD6         M    Cliff  Cliff:Dwarf   Dwarf    Acadian   141         24.3         0.98    25
CD8         F    Cliff  Cliff:Dwarf   Dwarf    Acadian   105.3       21           1.14    23
CD10        F    Cliff  Cliff:Dwarf   Dwarf    Acadian   133.1       24.6         0.89    26
CD19        M    Cliff  Cliff:Dwarf   Dwarf    Acadian   NA          NA           NA      NA
CD22        F    Cliff  Cliff:Dwarf   Dwarf    Acadian   164.4       24.7         1.09    NA
CN3         M    Cliff  Cliff:Normal  Normal   Atlantic  438.4       34.5         1.07    23
CN4         F    Cliff  Cliff:Normal  Normal   Atlantic  437.9       34.8         1.04    26
CN7         M    Cliff  Cliff:Normal  Normal   Atlantic  376.2       32.3         1.12    24
CN9         M    Cliff  Cliff:Normal  Normal   Atlantic  442.6       34.3         1.10    25
CN14        F    Cliff  Cliff:Normal  Normal   Atlantic  429.5       33.7         1.12    24
CN15        F    Cliff  Cliff:Normal  Normal   Atlantic  439.4       34.2         1.10    22
Individual  a28S  aCMA  1p  1pDAPI  10p
TN5         2     6     1   2       1
TN9         NA    3     1   2       2
TN13        3     4     1   2       0
TN14        3     5     1   1       1
TN15        5     5     0   2       NA
TD1         NA    2     1   0       0
TD2         4     4     1   1       2
TD3         3     2     2   1       1
TD7         2     2     2   2       1
TD9         NA    2     2   2       0
ED13        1     2     1   2       0
ED21        0     0     1   2       1
ED23        0     3     0   1       0
ED26        1     2     0   2       1
EN17        3     4     1   2       0
EN21        NA    1     2   2       1
EN11        3     3     1   2       1
EN12        NA    1     NA  1       0
CD2         NA    2     0   2       2
CD6         5     4     0   2       1
CD8         NA    3     0   2       0
CD10        4     5     1   2       2
CD19        NA    2     NA  1       0
CD22        3     3     0   2       2
CN3         NA    2     2   2       1
CN4         NA    5     1   2       1
CN7         4     6     1   2       0
CN9         NA    5     2   2       2
CN14        3     6     2   2       2
CN15        3     3     2   2       1
a5S a5SaCMA1
2
0
NA
NA
0
2
1
0
1
0
1
2
1
0
NA
NA
0
1
2
0
0
0
0
2
2
0
NA
NA
0
1
2
0
0
1
0
0
0
0
2
2
0
2
2
0
NA
NA
0
0
1
0
NA
NA
0
NA
NA
0
0
0
0
NA
NA
0
0
0
0
NA
NA
0
1
0
0
NA
NA
0
NA
NA
0
0
1
1
NA
NA
0
0
2
0
0
0
0
Individual  1q5S  2qC2  2qC3  2q5S2  2q5S3  3pC1  3pCMA1  3p28S1  4pC1  4pC2  4pCMA1  4p28S1  4qC2  4qC3  4qCMA1  5pC1  5pCMA1
TN5         0     2     2     2      2      0     1       1       0     0     1       1       0     1     0       2     0
TN9         NA    2     2     NA     NA     0     0       NA      0     2     1       NA      0     2     0       2     2
TN13        1     2     2     2      2      1     1       1       0     0     0       1       0     2     0       2     0
TN14        0     2     2     2      2      1     1       1       0     2     0       1       0     2     0       2     0
TN15        NA    2     2     2      2      0     0       1       0     0     0       1       0     0     0       0     0
TD1         NA    2     2     NA     NA     2     2       NA      0     2     0       NA      0     2     0       2     2
TD2         1     2     2     2      2      0     0       1       0     0     0       0       0     2     0       2     0
TD3         0     2     2     2      2      1     1       1       0     2     1       1       0     2     0       2     0
TD7         1     2     2     2      2      0     0       0       0     0     2       2       0     2     0       0     0
TD9         NA    2     2     NA     NA     1     1       NA      0     2     0       NA      0     2     0       2     0
ED13        0     0     0     0      0      1     1       2       2     2     0       0       2     2     0       2     0
ED21        0     2     2     2      2      2     2       2       1     2     1       2       0     1     0       2     1
ED23        NA    2     1     NA     NA     1     1       NA      0     1     1       NA      0     0     0       0     0
ED26        0     1     1     1      1      2     1       1       2     2     1       0       0     2     0       2     0
EN17        1     2     2     2      2      2     1       1       1     2     0       0       2     2     0       2     0
EN21        NA    2     2     NA     NA     1     1       NA      0     1     1       NA      0     2     0       2     1
EN11        0     1     1     1      1      1     1       1       NA    NA    0       1       NA    NA    0       NA    0
EN12        NA    2     1     NA     NA     1     1       NA      0     0     0       NA      0     2     0       0     0
CD2         NA    2     2     NA     NA     1     1       NA      2     2     0       NA      0     0     0       2     0
CD6         0     2     2     2      2      0     0       0       0     0     0       0       0     2     0       2     0
CD8         NA    2     2     NA     NA     0     0       NA      0     2     0       NA      0     2     0       2     0
CD10        0     2     1     2      1      0     0       0       0     2     0       0       1     2     0       2     0
CD19        NA    2     2     NA     NA     1     1       NA      0     2     0       NA      1     2     1       2     0
CD22        0     2     2     2      2      1     1       1       0     2     0       0       0     2     0       2     0
CN3         NA    2     0     NA     NA     1     1       NA      0     0     0       NA      0     0     0       2     0
CN4         NA    2     0     NA     NA     1     1       NA      0     2     1       NA      0     2     1       2     0
CN7         0     2     0     2      0      1     2       2       0     2     0       1       2     2     0       2     2
CN9         NA    2     0     NA     0      2     2       NA      0     2     0       NA      2     2     0       2     0
CN14        0     2     0     2      0      0     0       0       0     2     0       0       0     2     0       2     0
CN15        0     2     0     2      0      1     1       1       0     0     0       0       0     0     0       2     0
Individual  5qCMA1  6pC1  7pC1  7qC1  7qC2  8pC1  8qC1  10pCMA1  10pC1  10p28S1  10c5S  aCLargeDouble  aCInter  aCsmallDouble
TN5         0       2     NA    NA    NA    NA    NA    1        1      2        0      0              2        0
TN9         0       2     2     2     2     2     2     2        2      NA       NA     2              2        0
TN13        0       2     NA    NA    NA    NA    NA    2        2      2        0      2              2        2
TN14        0       2     0     0     1     1     0     2        2      2        0      2              2        0
TN15        0       0     0     0     1     1     0     2        2      2        2      0              2        0
TD1         0       2     NA    NA    NA    NA    NA    2        2      NA       NA     0              0        2
TD2         0       2     2     1     2     2     2     2        2      2        1      2              2        2
TD3         0       2     2     0     2     2     2     2        0      1        0      0              0        2
TD7         0       0     0     0     0     0     0     2        0      1        0      0              1        2
TD9         0       2     1     0     2     NA    NA    2        0      NA       NA     0              1        2
ED13        0       2     2     0     2     2     2     2        0      2        1      2              0        2
ED21        0       2     NA    NA    NA    NA    NA    2        2      2        0      0              1        2
ED23        0       0     1     0     1     0     0     1        2      1        NA     0              2        2
ED26        0       2     2     2     2     2     2     2        0      2        1      2              NA       2
EN17        0       2     2     0     2     2     2     2        0      2        2      2              0        2
EN21        0       2     2     0     2     2     2     2        0      NA       NA     2              2        2
EN11        0       NA    NA    NA    NA    NA    NA    2        2      2        1      NA             NA       NA
EN12        0       2     0     0     0     0     0     1        2      NA       NA     0              0        2
CD2         0       2     0     0     0     2     2     2        0      NA       NA     0              2        0
CD6         0       2     0     0     2     2     2     2        2      2        0      0              2        0
CD8         0       2     2     0     2     2     2     2        2      NA       NA     0              1        2
CD10        0       2     0     0     0     2     2     2        2      2        NA     0              2        1
CD19        0       2     NA    NA    NA    NA    NA    2        2      NA       NA     0              2        0
CD22        0       2     2     0     2     2     2     2        2      2        0      0              2        1
CN3         NA      2     2     0     2     0     2     2        2      NA       NA     0              2        0
CN4         0       2     2     0     2     2     2     2        2      NA       NA     0              0        2
CN7         2       2     1     0     0     2     2     0        2      0        0      0              2        2
CN9         0       2     0     0     2     2     2     1        1      NA       NA     0              0        2
CN14        0       2     2     0     2     2     2     2        2      2        1      0              2        2
CN15        0       2     0     0     0     0     0     2        2      2        0      NA             NA       NA
Table S 4.2. Individual characteristics and imputed marker values using the imputeMFA()
function. Weight is in g and Length is in cm. The number of signals is reported for markers
« a28S » and « aCMA ». For all other markers, 0 stands for homozygote absent, 1 stands
for heterozygote, and 2 stands for homozygote present, except when the value was imputed.
Individua
TN5
TN9
TN13
TN14
TN15
TD1
TD2
TD3
TD7
TD9
ED13
ED21
ED23
ED26
EN17
EN21
EN11
EN12
CD2
CD6
CD8
CD10
CD22
CN3
CN4
CN7
CN9
CN14
CN15
Lake
Temi
Temi
Temi
Temi
Temi
Temi
Temi
Temi
Temi
Temi
East
East
East
East
East
East
East
East
Cliff
Cliff
Cliff
Cliff
Cliff
Cliff
Cliff
Cliff
Cliff
Cliff
Cliff
Lake/ecotyp
Temis:Norma
Temis:Norma
Temis:Norma
Temis:Norma
Temis:Norma
Temis:Dwarf
Temis:Dwarf
Temis:Dwarf
Temis:Dwarf
Temis:Dwarf
East:Dwarf
East:Dwarf
East:Dwarf
East:Dwarf
East:Normal
East:Normal
East:Normal
East:Normal
Cliff:Dwarf
Cliff:Dwarf
Cliff:Dwarf
Cliff:Dwarf
Cliff:Dwarf
Cliff:Normal
Cliff:Normal
Cliff:Normal
Cliff:Normal
Cliff:Normal
Cliff:Normal
Ecotyp
Normal
Normal
Normal
Normal
Normal
Dwarf
Dwarf
Dwarf
Dwarf
Dwarf
Dwarf
Dwarf
Dwarf
Dwarf
Normal
Normal
Normal
Normal
Dwarf
Dwarf
Dwarf
Dwarf
Dwarf
Normal
Normal
Normal
Normal
Normal
Normal
Race
Atlanti
Atlanti
Atlanti
Atlanti
Atlanti
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Acadia
Atlanti
Atlanti
Atlanti
Atlanti
Atlanti
Atlanti
Mas Lengt Fulto Gills
94.4 21.3 0.98 23.0
112. 22.8 0.95 22.0
368.
34
0.94 23.0
239. 28.2 1.07 23.0
171. 26.5 0.92 23.0
72.5 20.6 0.83 17.0
76.1 20.3 0.91 22.0
84.4 20.8 0.94 26.0
79.1 20.9 0.87 25.0
97.9 22.6 0.85 23.0
80.3 20.8 0.89 20.0
44.6 17.4 0.85 22.0
46
16.7 0.99 21.9
25
15.8 0.63 23.0
681. 39.2 1.13 25.0
168.
26
0.96 27.0
144.
25
0.92 21.0
247 29
1.01 24.0
149. 24.7 0.99 25.0
141 24.3 0.98 25.0
105.
21
1.14 23.0
133. 24.6 0.89 26.0
164. 24.7 1.09 23.8
438. 34.5 1.07 23.0
437. 34.8 1.04 26.0
376. 32.3 1.12 24.0
442. 34.3 1.10 25.0
429. 33.7 1.12 24.0
439. 34.2 1.10 22.0
a28 aCM
2.00 6.00
3.57 3.00
3.00 4.00
3.00 5.00
5.00 5.00
2.52 2.00
4.00 4.00
3.00 2.00
2.00 2.00
2.80 2.00
1.00 2.00
0.00 0.00
0.00 3.00
1.00 2.00
3.00 4.00
1.87 1.00
3.00 3.00
1.01 1.00
2.97 2.00
5.00 4.00
3.44 3.00
4.00 5.00
3.00 3.00
2.90 2.00
3.70 5.00
4.00 6.00
3.56 5.00
3.00 6.00
3.00 3.00
1p 1pDAP 10p a5S a5S.inte aCMA.tel 1q5
1.0
2.00 1.0 1.0
2.00
0.00
0.00
1.0
2.00 2.0 1.6
1.40
0.00
0.44
1.0
2.00 0.0 2.0
1.00
0.00
1.00
1.0
1.00 1.0 1.0
0.00
1.00
0.00
0.0
2.00 1.2 2.0
1.00
0.00
0.31
1.0
0.00 0.0 0.8
1.04
0.00
0.54
1.0
1.00 2.0 1.0
2.00
0.00
1.00
2.0
1.00 1.0 0.0
0.00
0.00
0.00
2.0
2.00 1.0 2.0
2.00
0.00
1.00
2.0
2.00 0.0 0.9
1.11
0.00
0.58
1.0
2.00 0.0 1.0
2.00
0.00
0.00
1.0
2.00 1.0 0.0
1.00
0.00
0.00
0.0
1.00 0.0 0.0
0.00
0.00
0.0
2.00 1.0 2.0
2.00
0.00
0.00
1.0
2.00 0.0 2.0
2.00
0.00
1.00
2.0
2.00 1.0 1.3
1.76
0.00
0.49
1.0
2.00 1.0 0.0
1.00
0.00
0.00
1.7
1.00 0.0 0.8
1.35
0.00
0.35
0.0
2.00 2.0 0.0
0.33
0.00
0.0
2.00 1.0 0.0
0.00
0.00
0.00
0.0
2.00 0.0 0.2
0.53
0.00
0.01
1.0
2.00 2.0 0.0
0.00
0.00
0.00
0.0
2.00 2.0 1.0
0.00
0.00
0.00
2.0
2.00 1.0 0.2
0.74
0.00
0.10
1.0
2.00 1.0 0.3
0.89
0.00
0.20
1.0
2.00 0.0 0.0
1.00
1.00
0.00
2.0
2.00 2.0 0.2
0.87
0.00
0.18
2.0
2.00 2.0 0.0
2.00
0.00
0.00
2.0
2.00 1.0 0.0
0.00
0.00
0.00
Individua
TN5
TN9
TN13
TN14
TN15
TD1
TD2
TD3
TD7
TD9
ED13
ED21
ED23
ED26
EN17
EN21
EN11
EN12
CD2
CD6
CD8
CD10
CD22
CN3
CN4
CN7
CN9
CN14
CN15
2qC
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
0.00
2.00
2.00
1.00
2.00
2.00
1.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2.00
2qC 2q5S 2q5S 3pC 3pCMA 3p28S 4pC
2.00 2.00 2.00 0.00
1.00
1.00 0.00
2.00 1.77 2.20 0.00
0.00
0.80 0.00
2.00 2.00 2.00 1.00
1.00
1.00 0.00
2.00 2.00 2.00 1.00
1.00
1.00 0.00
2.00 2.00 2.00 0.00
0.00
1.00 0.00
2.00 1.97 1.94 2.00
2.00
0.77 0.00
2.00 2.00 2.00 0.00
0.00
1.00 0.00
2.00 2.00 2.00 1.00
1.00
1.00 0.00
2.00 2.00 2.00 0.00
0.00
0.00 0.00
2.00 1.97 2.00 1.00
1.00
0.70 0.00
0.00 0.00 0.00 1.00
1.00
2.00 2.00
2.00 2.00 2.00 2.00
2.00
2.00 1.00
1.00 1.40 1.00 1.00
1.00
1.58 0.00
1.00 1.00 1.00 2.00
1.00
1.00 2.00
2.00 2.00 2.00 2.00
1.00
1.00 1.00
2.00 1.38 1.51 1.00
1.00
1.27 0.00
1.00 1.00 1.00 1.00
1.00
1.00 0.81
1.00 1.62 1.03 1.00
1.00
1.51 0.00
2.00 1.71 1.11 1.00
1.00
0.78 2.00
2.00 2.00 2.00 0.00
0.00
0.00 0.00
2.00 1.76 1.34 0.00
0.00
0.63 0.00
1.00 2.00 1.00 0.00
0.00
0.00 0.00
2.00 2.00 2.00 1.00
1.00
1.00 0.00
0.00 2.00 0.43 1.00
1.00
1.07 0.00
0.00 2.00 0.51 1.00
1.00
0.86 0.00
0.00 2.00 0.00 1.00
2.00
2.00 0.00
0.00 2.01 0.00 2.00
2.00
0.91 0.00
0.00 2.00 0.00 0.00
0.00
0.00 0.00
0.00 2.00 0.00 1.00
1.00
1.00 0.00
4pC 4pCMA 4p28S 4qC
0.00
1.00
1.00 0.00
2.00
1.00
0.54 0.00
0.00
0.00
1.00 0.00
2.00
0.00
1.00 0.00
0.00
0.00
1.00 0.00
2.00
0.00
1.12 0.00
0.00
0.00
0.00 0.00
2.00
1.00
1.00 0.00
0.00
2.00
2.00 0.00
2.00
0.00
0.97 0.00
2.00
0.00
0.00 2.00
2.00
1.00
2.00 0.00
1.00
1.00
1.38 0.00
2.00
1.00
0.00 0.00
2.00
0.00
0.00 2.00
1.00
1.00
0.67 0.00
1.79
0.00
1.00 0.71
0.00
0.00
1.39 0.00
2.00
0.00
0.29 0.00
0.00
0.00
0.00 0.00
2.00
0.00
0.12 0.00
2.00
0.00
0.00 1.00
2.00
0.00
0.00 0.00
0.00
0.00
0.63 0.00
2.00
1.00
0.16 0.00
2.00
0.00
1.00 2.00
2.00
0.00
0.20 2.00
2.00
0.00
0.00 0.00
0.00
0.00
0.00 0.00
4qC 4qCMA 5pC 5pCMA 5qCMA
1.00
0.00
2.00
0.00
0.00
2.00
0.00
2.00
2.00
0.00
2.00
0.00
2.00
0.00
0.00
2.00
0.00
2.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
2.00
0.00
2.00
2.00
0.00
2.00
0.00
2.00
0.00
0.00
2.00
0.00
2.00
0.00
0.00
2.00
0.00
0.00
0.00
0.00
2.00
0.00
2.00
0.00
0.00
2.00
0.00
2.00
0.00
0.00
1.00
0.00
2.00
1.00
0.00
0.00
0.00
0.00
0.00
0.00
2.00
0.00
2.00
0.00
0.00
2.00
0.00
2.00
0.00
0.00
2.00
0.00
2.00
1.00
0.00
2.00
0.00
2.00
0.00
0.00
2.00
0.00
0.00
0.00
0.00
0.00
0.00
2.00
0.00
0.00
2.00
0.00
2.00
0.00
0.00
2.00
0.00
2.00
0.00
0.00
2.00
0.00
2.00
0.00
0.00
2.00
0.00
2.00
0.00
0.00
0.00
0.00
2.00
0.00
0.31
2.00
1.00
2.00
0.00
0.00
2.00
0.00
2.00
2.00
2.00
2.00
0.00
2.00
0.00
0.00
2.00
0.00
2.00
0.00
0.00
0.00
0.00
2.00
0.00
0.00
Individual  6pC1   7pC1   7qC1   7qC2   8pC1   8qC1   10pCMA1  10pC1  10p28S1  10c5S  aCLargeDouble  aCInter  aCsmallDouble
TN5         2.00   0.41   0.54   0.94   0.84   0.33   1.00     1.00   2.00     0.00   0.00           2.00     0.00
TN9         2.00   2.00   2.00   2.00   2.00   2.00   2.00     2.00   2.21     0.94   2.00           2.00     0.00
TN13        2.00   0.99   0.50   1.43   1.24   0.87   2.00     2.00   2.00     0.00   2.00           2.00     2.00
TN14        2.00   0.00   0.00   1.00   1.00   0.00   2.00     2.00   2.00     0.00   2.00           2.00     0.00
TN15        0.00   0.00   0.00   1.00   1.00   0.00   2.00     2.00   2.00     2.00   0.00           2.00     0.00
TD1         2.00   1.02   0.06   1.25   1.09   1.16   2.00     2.00   1.39     0.19   0.00           0.00     2.00
TD2         2.00   2.00   1.00   2.00   2.00   2.00   2.00     2.00   2.00     1.00   2.00           2.00     2.00
TD3         2.00   2.00   0.00   2.00   2.00   2.00   2.00     0.00   1.00     0.00   0.00           0.00     2.00
TD7         0.00   0.00   0.00   0.00   0.00   0.00   2.00     0.00   1.00     0.00   0.00           1.00     2.00
TD9         2.00   1.00   0.00   2.00   1.36   1.40   2.00     0.00   1.46     0.31   0.00           1.00     2.00
ED13        2.00   2.00   0.00   2.00   2.00   2.00   2.00     0.00   2.00     1.00   2.00           0.00     2.00
ED21        2.00   0.95   0.23   0.84   0.57   0.70   2.00     2.00   2.00     0.00   0.00           1.00     2.00
ED23        0.00   1.00   0.00   1.00   0.00   0.00   1.00     2.00   1.00     -0.05  0.00           2.00     2.00
ED26        2.00   2.00   2.00   2.00   2.00   2.00   2.00     0.00   2.00     1.00   2.00           1.15     2.00
EN17        2.00   2.00   0.00   2.00   2.00   2.00   2.00     0.00   2.00     2.00   2.00           0.00     2.00
EN21        2.00   2.00   0.00   2.00   2.00   2.00   2.00     0.00   1.84     1.02   2.00           2.00     2.00
EN11        2.08   1.83   0.49   1.92   1.92   1.92   2.00     2.00   2.00     1.00   1.54           0.97     1.80
EN12        2.00   0.00   0.00   0.00   0.00   0.00   1.00     2.00   1.33     0.41   0.00           0.00     2.00
CD2         2.00   0.00   0.00   0.00   2.00   2.00   2.00     0.00   1.74     0.06   0.00           2.00     0.00
CD6         2.00   0.00   0.00   2.00   2.00   2.00   2.00     2.00   2.00     0.00   0.00           2.00     0.00
CD8         2.00   2.00   0.00   2.00   2.00   2.00   2.00     2.00   1.81     0.25   0.00           1.00     2.00
CD10        2.00   0.00   0.00   0.00   2.00   2.00   2.00     2.00   2.00     0.10   0.00           2.00     1.00
CD22        2.00   2.00   0.00   2.00   2.00   2.00   2.00     2.00   2.00     0.00   0.00           2.00     1.00
CN3         2.00   2.00   0.00   2.00   0.00   2.00   2.00     2.00   1.40     0.29   0.00           2.00     0.00
CN4         2.00   2.00   0.00   2.00   2.00   2.00   2.00     2.00   1.57     0.58   0.00           0.00     2.00
CN7         2.00   1.00   0.00   0.00   2.00   2.00   0.00     2.00   0.00     0.00   0.00           2.00     2.00
CN9         2.00   0.00   0.00   2.00   2.00   2.00   1.00     1.00   1.48     0.53   0.00           0.00     2.00
CN14        2.00   2.00   0.00   2.00   2.00   2.00   2.00     2.00   2.00     1.00   0.00           2.00     2.00
CN15        2.00   0.00   0.00   0.00   0.00   0.00   2.00     2.00   2.00     0.00   -0.85          1.22     1.43
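Per-individual marker scores such as these can be summarized by lake and ecotype before any multivariate analysis. The sketch below is only an illustration and is not part of the original analysis. It averages the aCInter scores within each group, assuming that the scores follow the same order as the listed individuals and that the two-letter ID prefix encodes lake (T, E, C) and ecotype (N, D); the names `ACINTER` and `group_means` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# aCInter scores as listed in the table.
# ASSUMPTION: scores and individuals appear in the same order.
ACINTER = {
    "TN5": 2.00, "TN9": 2.00, "TN13": 2.00, "TN14": 2.00, "TN15": 2.00,
    "TD1": 0.00, "TD2": 2.00, "TD3": 0.00, "TD7": 1.00, "TD9": 1.00,
    "ED13": 0.00, "ED21": 1.00, "ED23": 2.00, "ED26": 1.15,
    "EN17": 0.00, "EN21": 2.00, "EN11": 0.97, "EN12": 0.00,
    "CD2": 2.00, "CD6": 2.00, "CD8": 1.00, "CD10": 2.00, "CD22": 2.00,
    "CN3": 2.00, "CN4": 0.00, "CN7": 2.00, "CN9": 0.00, "CN14": 2.00,
    "CN15": 1.22,
}


def group_means(scores):
    """Average a marker score within each two-letter ID prefix."""
    groups = defaultdict(list)
    for individual, score in scores.items():
        # ASSUMPTION: prefix = lake letter (T, E, C) + ecotype letter (N, D).
        groups[individual[:2]].append(score)
    return {prefix: mean(values) for prefix, values in groups.items()}


if __name__ == "__main__":
    for prefix, value in sorted(group_means(ACINTER).items()):
        print(f"{prefix}: {value:.2f}")
```

Grouping on the ID prefix keeps the example independent of any metadata file; with real data, lake and ecotype would more safely come from explicit columns.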
4.13 Supplementary figures
[Figure S 4.1: four scatter-plot panels of Dim 1 (20.99%) against Dim 2 (14.73%). Panels: Lineage (Acadian, Atlantic); Lake (East, Temis, Cliff); Ecotype (Dwarf, Normal); Lake.Ecotype (East:Normal, East:Dwarf, Cliff:Dwarf, Cliff:Normal, Temis:Dwarf, Temis:Normal).]
Figure S 4.1. Multiple factor analysis performed with FactoMineR. Dimensions 1 and 2 are
shown. Each quadrant shows the same data with the ellipses around factor levels shown:
“lineage”, “lake”, “ecotype” or “Lake/ecotype”. Data points represent single individuals.
Figure S 4.2. Multiple factor analysis correlation circle for dimensions 1 and 3. Vectors for
all variables are shown, but only the five most structuring chromosome markers are
identified. Marker names correspond to Figure 4 for a physical representation of the
markers identified and Table 4 for correlation values. Dashed vectors represent
supplementary variables that are not used to compute the MFA (length, weight, Fulton, gill
raker number).
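Fulton's condition factor, one of the supplementary variables named in the caption, is conventionally computed as K = 100 W / L^3, with weight W in grams and length L in centimeters. The sketch below assumes those units, which are not stated here; the function name `fulton_k` is hypothetical.

```python
def fulton_k(weight_g: float, length_cm: float) -> float:
    """Fulton's condition factor K = 100 * W / L**3.

    ASSUMPTION: weight is in grams and length in centimeters, the usual
    convention; the source does not state the units it used.
    """
    if weight_g <= 0 or length_cm <= 0:
        raise ValueError("weight and length must be positive")
    return 100.0 * weight_g / length_cm ** 3


if __name__ == "__main__":
    # Hypothetical fish: 100 g and 20 cm long.
    print(fulton_k(100.0, 20.0))
```

Because K scales with the cube of length, it is only comparable among fish measured with the same length definition (for example fork length versus total length).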
Chapter 5: Conclusion
5.1 Summary of the main results
The general objective of this thesis was to examine the molecular bases of intrinsic
post-zygotic reproductive isolation from a new angle. More specifically, our approach was
to study molecular mechanisms as a whole, as opposed to searching for “speciation genes”.
We evaluated the contribution of certain mechanisms that have been neglected in the study
of non-model systems of incipient species, such as the derepression of transposable
elements, certain types of chromosomal rearrangements and chromosome segregation. The Lake
Whitefish system offers three key advantages for these objectives: the presence of natural
replicates, a divergence that is recent on an evolutionary timescale, and the possibility
of performing hybrid crosses in the laboratory. Hybrids lend themselves well to this kind
of study because they reveal not only the differences between the parental forms, but also
the mechanistic consequences of that divergence.
The first experimental chapter aimed to test the hypothesis of an association between the
derepression of transposable elements and reproductive isolation between young divergent
lineages (Dion-Côté et al. 2014). Using massively parallel RNA sequencing, we were able to
confirm this hypothesis and also to observe a transcriptome-wide deregulation in malformed
hybrids, including the expression of non-coding RNAs. Interestingly, the transcriptome
deregulation and the derepression of transposable elements and non-coding RNAs were
associated with under-expression of the transcript encoding the DNMT1 protein, suggesting
hypomethylation of the genome. We also observed expression differences between dwarf and
normal whitefish at the earliest developmental stage examined to date. These expression
differences are supported by earlier studies, since they were consistent with the known
phenotypic divergence. These results suggest a “genomic shock” (Comai et al. 2003; Landry
et al. 2007) in hybrids, linked to expression differences between the parental forms,
despite their extremely recent divergence on an evolutionary timescale.
It should be noted that a recent study of F1 hybrids between Drosophila melanogaster and
D. simulans showed that a significant proportion of the observed expression differences
probably resulted from developmental delay and from differences in the representation of
tissue types in the hybrids, rather than from strictly genetic incompatibilities (Wei et
al. 2014). Malformed backcross hybrids of Lake Whitefish show substantial developmental
delays (Renaut and Bernatchez 2011) and probably differ from healthy embryos in tissue
composition. However, the proportion of genes involved in our study (> 10,000 transcripts,
or > 12% of the transcriptome) and the magnitude of the expression differences observed
(up to > 1000-fold) are much greater than those reported by Wei et al. (2014) in
Drosophila hybrids. It therefore remains reasonable to conclude that the massive
expression differences observed in malformed backcross hybrids result from a
transcriptome-wide deregulation, which is associated with the derepression of transposable
elements.
We then tested the hypothesis that genomic instability is associated with reproductive
isolation between dwarf and normal whitefish (Dion-Côté et al. 2015). To do so, we
directly examined the chromosomes of pure embryos and of healthy and malformed backcross
embryos. This approach allowed us to detect widespread aneuploidy in both healthy and
malformed backcross embryos. Moreover, a statistical analysis of variance shows that this
aneuploidy probably stems from mitotic mechanisms in healthy backcross embryos and from
meiotic mechanisms in malformed backcross embryos. The proximate causes of this
instability remain poorly understood, but the results suggest the presence of genetic or
chromosomal incompatibilities that destabilize chromosome segregation in F1 and backcross
hybrids.
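The distinction between mitotic and meiotic origins can be illustrated with a simple variance partition. A meiotic error is expected to give every cell of an embryo the same abnormal chromosome count, so variation lies among embryos. A mitotic error is expected to produce mosaic embryos, so variation lies among cells within an embryo. The sketch below uses hypothetical counts and a hypothetical function name, and is not the statistical analysis performed in the cited study.

```python
from statistics import mean, pvariance


def partition_variance(counts_by_embryo):
    """Return (among_embryo_variance, mean_within_embryo_variance).

    counts_by_embryo maps an embryo ID to the chromosome counts scored in
    several of its cells.
    """
    embryo_means = [mean(counts) for counts in counts_by_embryo.values()]
    among = pvariance(embryo_means)
    within = mean([pvariance(counts) for counts in counts_by_embryo.values()])
    return among, within


# HYPOTHETICAL data: counts scattered around a diploid number of 80.
meiotic_like = {"e1": [80, 80, 80], "e2": [79, 79, 79], "e3": [81, 81, 81]}
mitotic_like = {"e1": [79, 80, 81], "e2": [80, 79, 81], "e3": [81, 80, 79]}

if __name__ == "__main__":
    print("meiotic-like:", partition_variance(meiotic_like))
    print("mitotic-like:", partition_variance(mitotic_like))
```

In the first data set all variance is among embryos, and in the second all variance is within embryos. Real data would fall between these extremes and would call for a formal test.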
These conclusions rest on crosses made with only two males, because F1 hybrid breeders
were scarce. However, these two males were crossed with four dwarf females and four normal
females, so our observations rest on more than eight families. Moreover, the malformed
phenotype was also observed in families in which the female was an F1 hybrid, in
completely independent crosses (Renaut and Bernatchez 2011). It therefore remains
reasonable to conclude that chromosomal instability is characteristic of backcross hybrids
in Lake Whitefish, and thus contributes to reproductive isolation.
Finally, we characterized the chromosomes of three pairs of dwarf and normal whitefish
using a strategy combining classical and molecular cytogenetics (Dion-Côté et al., in
preparation). This approach revealed substantial subchromosomal polymorphism, associated
mainly with heterochromatin and with the labile, and therefore rapidly remodeled, fraction
of the genome. Multivariate analyses show that this polymorphism is associated with
significant divergence between glacial lineages and among the three lakes, as well as
slight divergence between ecotypes, although in a non-parallel fashion. These observations
are largely supported by earlier genetic studies and by the historical biogeography of the
system (Bernatchez et al. 2010). Although more extensive sampling would be needed to draw
conclusions about the relative impact of the evolutionary forces involved, these results
suggest an ongoing genomic reorganization that is closely associated with the divergence
process in the Lake Whitefish system.
Causal relationships between the observations made in my thesis and reproductive isolation
between the Lake Whitefish forms remain difficult to establish. For example, does the
derepression of transposable elements in hybrids trigger a series of mechanisms associated
with genome instability that destabilize development, or is it the reverse? Similarly, do
chromosomal rearrangements destabilize meiosis in hybrids, or is this instead due to
genetic incompatibilities that have not yet been detected? To answer these questions, RNA
interference and targeted genome-editing tools such as the CRISPR/Cas9 system hold great
promise (e.g. Barreto et al. 2014; Li et al. 2014). With these tools, it becomes
conceivable to silence genes (for example DNMT1) or to place a gene in a different genetic
background in order to test its effect directly (for example by adding a transposable
element). It should thus be possible in the medium term to clarify the causal
relationships between candidate genes and reproductive isolation in non-model species, and
in Lake Whitefish in particular.
In addition, this thesis provides partial answers to two major questions in speciation
research (Marie Curie SPECIATION Network 2012). The first concerns the nature of the
barriers involved at the onset of the speciation process (Chapter 1). My work suggests
that complex genetic and chromosomal incompatibilities may be at work at the very
beginning of the divergence process, that is, in very young species such as the Lake
Whitefish forms. To our knowledge, this is the first evidence of this kind in a system
that diverged only a few thousand generations ago. The second question concerns the
contribution of the environmental and genetic conditions that favor divergence and,
ultimately, speciation (Chapter 1). This thesis supports the promoting role of allopatry
and its positive impact on divergence in a context of secondary contact. Whereas previous
studies highlighted the role of ancestral genetic variation in the adaptive divergence of
Lake Whitefish ecotypes, my work also suggests the presence of ancestral chromosomal
variation in this system, associated with divergence in allopatry and, probably, with
reproductive isolation. My thesis thus proposes a role for genome reorganization and
stability in reproductive isolation very early in the speciation process (< 60,000 years).
5.2 Perspectives
Taken together, the results of my work suggest a significant role for the regulation of
genome stability in the speciation process and raise important questions about the
involvement of chromatin structure, and of heterochromatin in particular, early in
divergence. Indeed, the three experimental chapters indirectly address the role of
heterochromatin, and particularly of DNA methylation, in the establishment of reproductive
barriers. First, we found that the mRNA encoding a protein responsible for DNA methylation
(DNMT1) was under-expressed in malformed backcross embryos. This was also associated with
a derepression of transposable elements and an over-expression of non-coding RNAs, which
are normally silenced through, among other mechanisms, DNA methylation (Chapter 2).
Second, healthy and malformed backcross embryos show clear signs of disrupted chromosome
segregation in mitosis and meiosis, a function in which heterochromatin plays an important
role (Chapter 3). Finally, the glacial lineages and, to some extent, dwarf and normal
whitefish within the same lake tend to show divergent heterochromatin patterns on their
chromosomes (Chapter 4). These differences are likely to destabilize meiotic recombination
in hybrids or, at the very least, to influence the location of recombination sites (John
and King 1985; King 1993). One hypothesis arising from this thesis is therefore that
chromatin structure may differ appreciably between the normal and dwarf forms, leading to
deregulation in hybrids.
Ongoing work aims to test this hypothesis directly with a strategy that combines
genotyping-by-sequencing with bisulfite treatment (RRBS, Reduced Representation Bisulfite
Sequencing) to study DNA methylation in natural populations and in experimental crosses
(Madoka Krick, personal communication; see Smith et al. 2015 for an example in the
threespine stickleback). We could also develop chromatin immunoprecipitation studies (see
Kratochwil and Meyer 2014 for an example in cichlids) targeting, for example, methylation
of lysine 9 of histone H3 (anti-H3K9me), since this is the main mark by which
heterochromatin is defined (Grewal and Jia 2007). With these new approaches now accessible
to non-model systems, it will be possible to document directly the differences in
chromatin structure between the dwarf and normal forms of Lake Whitefish, and to assess
the stability of that structure in hybrids.
More generally, my work shows that an integrative approach fosters a better understanding
of the molecular mechanisms associated with speciation. Despite the sequencing of dozens
of genomes and even substantial resequencing efforts, the fact remains that: 1) relatively
few key or “magic” speciation genes have been identified, particularly in non-model
species (Presgraves 2010; Maheshwari and Barbash 2011), and 2) reproductive isolation,
like adaptation (Yeaman 2015), appears to have a polygenic architecture in which the
effects of many genes of minor contribution add up to produce a strong reproductive
barrier. Furthermore, certain types of genetic variation, such as transposable elements or
additions and deletions of repeat-associated heterochromatin, remain very difficult to
sequence and characterize with current methods (Hoskins et al. 2007; Wei, Grenier, et al.
2014). With the exponential improvement of sequencing technologies and the contribution of
an integrative approach to speciation, the coming years are likely to bring significant
discoveries.
5.3 Toward an integrative approach to the study of speciation
Integrative biology is both an approach and an attitude toward the practice of science,
which aims to unify research across many levels of biological complexity: from molecule to
cell, from tissue to organism, from population to biodiversity (Wake 2003; Aubin-Horth and
Renn 2009). The study of the molecular bases of speciation and reproductive isolation is
therefore a research field likely to benefit directly from this approach, the goal being
to understand the molecular changes responsible for divergence and, ultimately, for
patterns of biodiversity. A first step, already under way, toward a comprehensive
understanding of speciation is to integrate the importance of changes in DNA sequence with
changes in genome structure (e.g. chromosomal inversions, transposable element insertions,
chromosome fusions and fissions), since these two levels of organization are intrinsically
linked (Lynch 2007). In this context, the function of chromatin should not be neglected,
because it too can give rise to hybrid incompatibilities (Michalak 2009; Brown and O'Neill
2010). The explosion of sequencing technologies, coupled with techniques traditionally
reserved for “model” systems (for example chromatin immunoprecipitation), will no doubt
enable significant advances in the coming years. This progress is already beginning to
profoundly challenge current paradigms in speciation genetics, including that of the
“speciation gene”.
Chapter 6: Bibliography
Abbott R, Albach D, Ansell S, Arntzen JW, Baird SJE, Bierne N, Boughman J, Brelsford
A, Buerkle CA, Buggs R, et al. 2013. Hybridization and speciation. J Evol Biol 26:229–246.
Aguilera A, Gómez-González B. 2008. Genome instability: a mechanistic view of its
causes and consequences. Nat Rev Genet 9:204–217.
Alkan C, Coe BP, Eichler EE. 2011. Genome structural variation discovery and
genotyping. Nat Rev Genet 12:363–375.
Allendorf FW, Bassham S, Cresko WA, Limborg MT, Seeb LW, Seeb JE. 2015. Effects of
Crossovers Between Homeologs on Inheritance and Population Genomics in
Polyploid-Derived Salmonid Fishes. Journal of Heredity 106:217–227.
Altemose N, Miga KH, Maggioni M, Willard HF. 2014. Genomic Characterization of
Large Heterochromatic Gaps in the Human Genome Assembly. PLoS Comput Biol
10:e1003628.
April J, Hanner RH, Dion-Côté A-M, Bernatchez L. 2012. Glacial cycles as an allopatric
speciation pump in north-eastern American freshwater fishes. Molecular Ecology 22:409–
422.
Aubin-Horth N, Renn SCP. 2009. Genomic reaction norms: using integrative biology to
understand molecular mechanisms of phenotypic plasticity. Molecular Ecology 18:3763–
3780.
Barbash DA, Siino DF, Tarone AM, Roote J. 2003. A rapidly evolving MYB-related
protein causes species isolation in Drosophila. P Natl Acad Sci USA 100:5302–5307.
Barbero JL. 2011. Sister chromatid cohesion control and aneuploidy. Cytogenet Genome
Res 133:223–233.
Barreto FS, Schoville SD, Burton RS. 2014. Reverse genetics in the tide pool: knock-down
of target gene expression via RNA interference in the copepod Tigriopus californicus.
Molecular Ecology Resources 15:868–879.
Bateson W. 1909. Heredity and variation in modern lights. In: Seward AC, editor. Darwin
and Modern Science. Cambridge: Cambridge University Press. pp. 85–101.
Bayes JJ, Malik HS. 2009. Altered Heterochromatin Binding by a Hybrid Sterility Protein
in Drosophila Sibling Species. Science 326:1538–1541.
Belgnaoui SM, Gosden RG, Semmes OJ, Haoudi A. 2006. Human LINE-1 retrotransposon
induces DNA damage and apoptosis in cancer cells. Cancer Cell Int. 6:13.
Bernatchez L, Dodson JJ. 1991. Phylogeographic structure in mitochondrial DNA of the
lake whitefish (Coregonus clupeaformis) and its relation to Pleistocene glaciations.
Evolution 45:1016–1035.
Bernatchez L, Renaut S, Whiteley AR, Derome N, Jeukens J, Landry L, Lu G, Nolte AW,
Østbye K, Rogers SM, et al. 2010. On the origin of species: insights from the ecological
genomics of lake whitefish. Philosophical Transactions of the Royal Society B: Biological
Sciences 365:1783–1800.
Bernatchez L. 2004. Ecological theory of adaptive radiation: an empirical assessment from
coregonine fishes (Salmoniformes). In: Hendry AP, Stearns SC, editors. Evolution
illuminated: salmon and their relatives. Oxford: Oxford University Press. pp. 175–207.
Bhattacharyya T, Gregorova S, Mihola O, Anger M, Sebestova J, Denny P, Simecek P,
Forejt J. 2013. Mechanistic basis of infertility of mouse intersubspecific hybrids. P Natl
Acad Sci USA 110:E468–E477.
Blier PU, Dufresne F, Burton RS. 2001. Natural selection and the evolution of
mtDNA-encoded peptides: evidence for intergenomic co-adaptation. Trends Genet. 17:400–406.
Booke HE. 1968. Cytotaxonomic studies of coregonine fishes of the Great Lakes, USA:
DNA and karyotype analysis. J Fish Res Board Can 25:1667–1687.
Bougas B, Granier S, Audet C, Bernatchez L. 2010. The transcriptional landscape of
cross-specific hybrids and its possible link with growth in brook charr (Salvelinus fontinalis
Mitchill). Genetics 186:97–107.
Brown JD, O'Neill RJ. 2010. Chromosomes, conflict, and epigenetics: chromosomal
speciation revisited. Annu. Rev. Genom. Human Genet. 11:291–316.
Burton RS, Ellison CK, Harrison JS. 2006. The Sorry State of F2 Hybrids: Consequences
of Rapid Mitochondrial DNA Evolution in Allopatric Populations. Am Nat 168:S14–S24.
Burton RS, Pereira RJ, Barreto FS. 2013. Cytonuclear Genomic Interactions and Hybrid
Breakdown. Annu. Rev. Ecol. Evol. Syst. 44:281–302.
Camacho JPM. 2005. B chromosomes. In: Gregory TR, editor. The Evolution of the
Genome. Burlington, MA, USA: Academic Press. pp. 223–286.
Campbell D, Bernatchez L. 2004. Generic scan using AFLP markers as a means to assess
the role of directional selection in the divergence of sympatric whitefish ecotypes.
Molecular Biology and Evolution 21:945–956.
Caputo V, Giovannotti M, Nisi Cerioni P, Splendiani A, Olmo E. 2009. Chromosomal
study of native and hatchery trouts from Italy (Salmo trutta complex, Salmonidae):
conventional and FISH analysis. Cytogenet Genome Res 124:51–62.
Cattani MV, Presgraves DC. 2012. Incompatibility between X chromosome factor and
pericentric heterochromatic region causes lethality in hybrids between Drosophila
melanogaster and its sibling species. Genetics 191:549–559.
Charron G, Leducq J-B, Landry CR. 2014. Chromosomal variation segregates within
incipient species and correlates with reproductive isolation. Molecular Ecology 23:4362–
4372.
Chombard C, Boury-Esnault N, Tillier S. 1998. Reassessment of homology of
morphological characters in tetractinellid sponges based on molecular data. Systematic
Biology 47:351–366.
Christie P, Macnair MR. 1984. Complementary lethal factors in two North American
populations of the yellow monkey flower. Journal of Heredity 75:510–511.
Cioffi MB, Martins C, Bertollo LA. 2010. Chromosome spreading of associated
transposable elements and ribosomal DNA in the fish Erythrinus erythrinus. Implications
for genome change and karyoevolution in fish. BMC Evol Biol 10:271.
Clark MB, Amaral PP, Schlesinger FJ, Dinger ME, Taft RJ, Rinn JL, Ponting CP, Stadler
PF, Morris KV, Morillon A, et al. 2011. The reality of pervasive transcription. PLoS Biol
9:e1000625.
Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G, Dickson M, Grimwood J,
Schmutz J, Myers RM, Schluter D, Kingsley DM. 2005. Widespread parallel evolution in
sticklebacks by repeated fixation of Ectodysplasin alleles. Science 307:1928–1933.
Comai L, Madlung A, Josefsson C, Tyagi A. 2003. Do the different parental “heteromes”
cause genomic shock in newly formed allopolyploids? Philosophical Transactions of the
Royal Society B: Biological Sciences 358:1149–1155.
Comings DE. 1978. Mechanisms of chromosome banding and implications for
chromosome structure. Annual Review of Genetics 12:25–46.
Compton DA. 2011. Mechanisms of aneuploidy. Curr Opin Cell Biol 23:109–113.
Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. 2005. Blast2GO: a
universal tool for annotation, visualization and analysis in functional genomics research.
Bioinformatics 21:3674–3676.
Coyne J, Orr HA. 1997. “Patterns of speciation in Drosophila” revisited. Evolution
51:295–303.
Coyne J, Orr HA. 2004. Speciation. Sunderland, MA: Sinauer Associates, Inc.
Cremer M, Grasser F, Lanctôt C, Müller S, Neusser M, Zinner R, Solovei I, Cremer T.
2008. Multicolor 3D fluorescence in situ hybridization for imaging interphase
chromosomes. Methods Mol Biol 463:205–239.
Crête-Lafrenière A, Weir LK, Bernatchez L. 2012. Framing the Salmonidae family
phylogenetic portrait: a more complete picture from increased taxon sampling. PLoS ONE
7:e46662.
Cushing DH. 1990. Plankton production and year-class strength in fish populations: an
update of the match/mismatch hypothesis. Advances in marine biology 26:249–293.
Cutter AD. 2012. The polymorphic prelude to Bateson–Dobzhansky–Muller
incompatibilities. Trends Ecol Evol 27:210–219.
Davidson WS, Huang TK, Fujiki K, Schalburg von KR, Koop BF. 2009. The Sex
Determining Loci and Sex Chromosomes in the Family Salmonidae. Sex Dev 3:78–87.
Dayrat B, Tillier A, Lecointre G, Tillier S. 2001. New clades of euthyneuran gastropods
(Mollusca) from 28S rRNA sequences. Molecular Phylogenetics and Evolution 19:225–
235.
Derome N, Bougas B, Rogers SM, Whiteley AR, Labbe A, Laroche J, Bernatchez L. 2008.
Pervasive sex-linked effects on transcription regulation as revealed by expression
quantitative trait loci mapping in lake whitefish species pairs (Coregonus sp., Salmonidae).
Genetics 179:1903–1917.
Derome N, Duchesne P, Bernatchez L. 2006. Parallelism in gene transcription among
sympatric lake whitefish (Coregonus clupeaformis Mitchill) ecotypes. Molecular Ecology
15:1239–1249.
Dey A, Jin Q, Chen Y-C, Cutter AD. 2014. Gonad morphogenesis defects drive hybrid
male sterility in asymmetric hybrid breakdown of Caenorhabditis nematodes. Evol. Dev.
16:362–372.
Dion-Côté A-M, Renaut S, Normandeau E, Bernatchez L. 2014. RNA-seq reveals
transcriptomic shock involving transposable elements reactivation in hybrids of young lake
whitefish species. Molecular Biology and Evolution 31:1188–1199.
Dion-Côté A-M, Symonová R, Ráb P, Bernatchez L. 2015. Reproductive isolation in a
nascent species pair is associated with aneuploidy in hybrid offspring. Proceedings of the
Royal Society B: Biological Sciences 282:20142862–20142862.
Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J,
Lin W, Schlesinger F. 2012. Landscape of transcription in human cells. Nature 489:101–
108.
Dobzhansky T. 1937. Genetics and the origin of species. New York: Columbia University
Press.
Dobzhansky T. 1940. Speciation as a stage in evolutionary divergence. Am Nat:312–321.
Dürrbaum M, Kuznetsova AY, Passerini V, Stingele S, Stoehr G, Storchová Z. 2014.
Unique features of the transcriptional response to model aneuploidy in human cells. BMC
Genomics 15:139.
Eissenberg JC, Elgin SCR. 2014. Heterochromatin and Euchromatin. eLS.
Ekblom R, Wolf JBW. 2014. A field guide to whole-genome sequencing, assembly and
annotation. Evol Appl 7:1026–1042.
Ellegren H. 2013. Genome sequencing and population genomics in non-model organisms.
Trends Ecol Evol 29:51–63.
Ellison CK, Burton RS. 2008. Interpopulation hybrid breakdown maps to the mitochondrial
genome. Evolution 62:631–638.
Evans ML, Chapman LJ, Mitrofanov I, Bernatchez L. 2013. Variable extent of parallelism
in respiratory, circulatory, and neurological traits across lake whitefish species pairs. Ecol
Evol 3:546–557.
Faria R, Navarro A. 2010. Chromosomal speciation revisited: rearranging theory with
pieces of evidence. Trends Ecol Evol 25:660–669.
Farrell AP, Jones DR. 1992. The Heart. In: Hoar WS, Randall DJ, Farrell AP, editors. Fish
Physiology. Vol. XIIA. Academic Press. pp. 1–88.
Feder JL, Gejji R, Powell THQ, Nosil P. 2011. Adaptive chromosomal divergence driven
by mixed geographic mode of evolution. Evolution 65:2157–2170.
Feder JL, Nosil P. 2009. Chromosomal Inversions and Species Differences: When Are
Genes Affecting Adaptive Divergence and Reproductive Isolation Expected to Reside
Within Inversions? Evolution 63:3061–3075.
Feder JL, Xie X, Rull J, Velez S, Forbes A, Leung B, Dambroski H, Filchak KE, Aluja M.
2005. Mayr, Dobzhansky, and Bush and the complexities of sympatric speciation in
Rhagoletis. P Natl Acad Sci USA 102 Suppl 1:6573–6580.
Ferree PM, Barbash DA. 2009. Species-specific heterochromatin prevents mitotic
chromosome segregation to cause hybrid lethality in Drosophila. PLoS Biol 7:e1000234.
Feschotte C, Pritham EJ. 2007. DNA transposons and the evolution of eukaryotic genomes.
Annual Review of Genetics 41:331–368.
Feschotte C. 2008. Transposable elements and the evolution of regulatory networks. Nat
Rev Genet 9:397–405.
Fishman L, Stathos A, Beardsley PM, Williams CF, Hill JP. 2013. Chromosomal
Rearrangements and the Genetics of Reproductive Barriers in Mimulus (Monkey Flowers).
Evolution 67:2547–2560.
Forejt J, Pialek J, Trachtulec Z. 2012. Hybrid male sterility in mouse subspecific crosses.
In: Macholán M, Baird SJE, Munclinger P, Pialek J, editors. Evolution of the house mouse.
Cambridge: Cambridge University Press. pp. 482–503.
Fujiwara A, Nishida-Umehara C, Sakamoto T, Okamoto N, Nakayama I, Abe S. 2001.
Improved fish lymphocyte culture for chromosome preparation. Genetica 111:77–89.
Gagnaire P-A, Normandeau E, Bernatchez L. 2012. Comparative Genomics Reveals
Adaptive Protein Evolution and a Possible Cytonuclear Incompatibility between European
and American Eels. Molecular Biology and Evolution 29:2909–2919.
Gagnaire P-A, Normandeau E, Pavey SA, Bernatchez L. 2012. Mapping phenotypic,
expression and transmission ratio distortion QTL using RAD markers in the Lake
Whitefish (Coregonus clupeaformis). Molecular Ecology 22:3036–3048.
Gagnaire P-A, Pavey SA, Normandeau E, Bernatchez L. 2013. The Genetic Architecture of
Reproductive Isolation During Speciation-with-Gene-Flow in Lake Whitefish Species Pairs
Assessed by RAD Sequencing. Evolution 67:2483–2497.
Gordon DJ, Resio B, Pellman D. 2012. Causes and consequences of aneuploidy in cancer.
Nat Rev Genet 13:189–203.
Greig D, Travisano M, Louis EJ, Borts RH. 2003. A role for the mismatch repair system
during incipient speciation in Saccharomyces. J Evol Biol 16:429–437.
Grewal SIS, Jia S. 2007. Heterochromatin revisited. Nat Rev Genet 8:35–46.
Haldane JBS. 1930. A mathematical theory of natural and artificial selection. (Part VI,
Isolation.). Math. Proc. Camb. Phil. Soc. 26:220.
Hardie DC, Hebert PDN. 2003. The nucleotypic effects of cellular DNA content in
cartilaginous and ray-finned fishes. Genome 46:683–706.
Harfe BD, Jinks-Robertson S. 2000. DNA mismatch repair and genetic instability. Annual
Review of Genetics 34:359–399.
Hauffe HC, Gimenez MD, Searle JB. 2012. Chromosomal hybrid zones in the house
mouse. In: Macholán M, Baird SJE, Munclinger P, Pialek J, editors. Evolution of the house
mouse. Cambridge: Cambridge University Press. pp. 407–430.
Hendry AP. 2009. Ecological speciation! Or the lack thereof? Can J Fish Aquat Sci
66:1383–1398.
Henikoff S, Ahmad K, Malik HS. 2001. The Centromere Paradox: Stable Inheritance with
Rapidly Evolving DNA. Science 293:1098–1102.
Hébert FO, Renaut S, Bernatchez L. 2013. Targeted sequence capture and resequencing
implies a predominant role of regulatory regions in the divergence of a sympatric lake
whitefish species pair (Coregonus clupeaformis). Molecular Ecology 22:4896–4914.
Hoekstra HE, Hirschmann RJ, Bundey RA, Insel PA, Crossland JP. 2006. A single amino
acid mutation contributes to adaptive beach mouse color pattern. Science 313:101–104.
Hoffmann AA, Rieseberg LH. 2008. Revisiting the Impact of Inversions in Evolution:
From Population Genetic Markers to Drivers of Adaptive Shifts and Speciation? Annu.
Rev. Ecol. Evol. Syst. 39:21–42.
Hoskins RA, Carlson JW, Kennedy C, Acevedo D, Evans-Holm M, Frise E, Wan KH, Park
S, Mendez-Lago M, Rossi F, et al. 2007. Sequence finishing and mapping of Drosophila
melanogaster heterochromatin. Science 316:1625–1628.
Jacobsen MW, Hansen MM, Orlando L, Bekkevold D, Bernatchez L, Willerslev E, Gilbert
MTP. 2012. Mitogenome sequencing reveals shallow evolutionary histories and recent
divergence time between morphologically and ecologically distinct European whitefish
(Coregonus spp.). Molecular Ecology 21:2727–2742.
Jankun M, Ocalewicz K, Pardo BG, Martínez P, Woznicki P, Sánchez L. 2003.
Localization of 5S rRNA loci in three coregonid species (Salmonidae). Genetica 119:183–
186.
Jankun M, Ráb P. 1997. Multiple polymorphism of chromosome no. 1 in the karyotype of
whitefish, Coregonus lavaretus (Salmonidae) from lake system Saimaa, Finland.
Caryologia 50:185–195.
Jeukens J, Bernatchez L. 2011. Regulatory versus coding signatures of natural selection in a
candidate gene involved in the adaptive divergence of whitefish species pairs (Coregonus
spp.). Ecol Evol 2:258–271.
Jeukens J, Renaut S, St-Cyr J, Nolte AW, Bernatchez L. 2010. The transcriptomics of
sympatric dwarf and normal lake whitefish (Coregonus clupeaformis spp., Salmonidae)
divergence as revealed by next-generation sequencing. Molecular Ecology 19:5389–5403.
Jiggins CD. 2006. Sympatric speciation: why the controversy? Curr Biol 16:R333–R334.
John B, King M. 1985. The inter-relationship between heterochromatin distribution and
chiasma distribution. Genetica 66:183–194.
Jones PA. 2012. Functions of DNA methylation: islands, start sites, gene bodies and
beyond. Nat Rev Genet 13:484–492.
Jones RN. 1995. B chromosomes in plants. New Phytologist 131:411–434.
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. 2005.
Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res
110:462–467.
Kapitonov VV, Jurka J. 2008. A universal classification of eukaryotic transposable
elements implemented in Repbase. Nat Rev Genet 9:411–412.
Kawakami T, Butlin RK, Adams M, Saint KM, Paull DJ, Cooper SJB. 2009. Re-examination
of a proposed case of stasipatric speciation: phylogeography of the Australian
morabine grasshoppers (Vandiemenella viatica species group). Molecular Ecology 18:3429–
3442.
Kelleher ES, Edelman NB, Barbash DA. 2012. Drosophila interspecific hybrids phenocopy
piRNA-pathway mutants. PLoS Biol 10:e1001428.
Kerr MK, Churchill GA. 2001. Statistical design and the analysis of gene expression
microarray data. Genet. Res. 77:123–128.
Khurana JS, Wang J, Xu J, Koppetsch BS, Thomson TC, Nowosielska A, Li C, Zamore
PD, Weng Z, Theurkauf WE. 2011. Adaptation to P element transposon invasion in
Drosophila melanogaster. Cell 147:1551–1563.
Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague
B, Alkan C, Antonacci F, et al. 2008. Mapping and sequencing of structural variation from
eight human genomes. Nature 453:56–64.
Kidwell MG, Kidwell JF, Sved JA. 1977. Hybrid dysgenesis in Drosophila melanogaster: a
syndrome of aberrant traits including mutation, sterility and male recombination. Genetics
86:813–833.
Kim KE, Peluso P, Babayan P, Yeadon PJ, Yu C, Fisher WW, Chin C-S, Rapicavoli NA,
Rank DR, Li J, et al. 2014. Long-read, whole-genome shotgun sequence data for five model
organisms. Sci. Data 1:140045.
King M. 1993. Species evolution: the role of chromosome change. Cambridge: Cambridge
University Press
King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science
188:107–116.
Kogenaru S, Yan Q, Guo Y, Wang N. 2012. RNA-seq and microarray complement each
other in transcriptome profiling. BMC Genomics 13:1–1.
Kratochwil CF, Meyer A. 2014. Mapping active promoters by ChIP-seq profiling of
H3K4me3 in cichlid fish - a first step to uncover cis-regulatory elements in ecological
model teleosts. Molecular Ecology Resources 15:761–771.
Labrador M, Farré M, Utzet F, Fontdevila A. 1999. Interspecific hybridization increases
transposition rates of Osvaldo. Molecular Biology and Evolution 16:931–937.
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K,
Doyle M, FitzHugh W, et al. 2001. Initial sequencing and analysis of the human genome.
Nature 409:860–921.
Landry CR, Hartl DL, Ranz JM. 2007. Genome clashes in hybrids: insights from gene
expression. Heredity 99:483–493.
Landry L, Bernatchez L. 2010. Role of epibenthic resource opportunities in the parallel
evolution of lake whitefish species pairs (Coregonus sp.). J Evol Biol 23:2602–2613.
Landry L, Vincent WF, Bernatchez L. 2007. Parallel evolution of lake whitefish dwarf
ecotypes in association with limnological features of their adaptive landscape. J Evol Biol
20:971–984.
Lê S, Josse J, Husson F. 2008. FactoMineR: an R package for multivariate analysis. Journal
of statistical software.
Levin HL, Moran JV. 2011. Dynamic interactions between transposable elements and their
hosts. Nat Rev Genet 12:615–627.
Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics 25:1754–1760.
Li M, Yang H, Zhao J, Fang L, Shi H, Li M, Sun Y. 2014. Efficient and heritable gene
targeting in tilapia by CRISPR/Cas9. Genetics 197:591–599.
Lindsley DL, Sandler L, Baker BS, Carpenter AT, Denell RE, Hall JC, Jacobs PA, Miklos
GL, Davis BK, Gethmann RC, et al. 1972. Segmental aneuploidy and the genetic gross
structure of the Drosophila genome. Genetics 71:157–184.
Lönnig W-E, Saedler H. 2002. Chromosome rearrangements and transposable elements.
Annual Review of Genetics 36:389–410.
Lu G, Basley DJ, Bernatchez L. 2001. Contrasting patterns of mitochondrial DNA and
microsatellite introgressive hybridization between lineages of lake whitefish (Coregonus
clupeaformis); relevance for speciation. Molecular Ecology 10:965–985.
Lu G, Bernatchez L. 1998. Experimental evidence for reduced hybrid viability between
dwarf and normal ecotypes of lake whitefish (Coregonus clupeaformis Mitchill). P Roy Soc
Lond B Bio 265:1025–1030.
Lu G, Bernatchez L. 1999. Correlated trophic specialization and genetic divergence in
sympatric lake whitefish ecotypes (Coregonus clupeaformis): support for the ecological
speciation hypothesis. Evolution:1491–1505.
Lynch M. 2007. The origins of genome architecture. Sunderland, MA: Sinauer Associates
Inc
Mable BK, Alexandrou MA, Taylor MI. 2011. Genome duplication in amphibians and fish:
an extended synthesis. Journal of Zoology 284:151–182.
Macholán M, Baird SJE, Munclinger P, Pialek J eds. 2012. Evolution of the House mouse.
Cambridge: Cambridge University Press
Maheshwari S, Barbash DA. 2011. The genetics of hybrid incompatibilities. Annual
Review of Genetics 45:331–355.
Marie Curie SPECIATION Network. 2012. What do we need to know about speciation?
Trends Ecol Evol 27:27–39.
Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. 2008. RNA-seq: An assessment
of technical reproducibility and comparison with gene expression arrays. Genome Research
18:1509–1517.
Mathavan S, Lee SGP, Mak A, Miller LD, Murthy KRK, Govindarajan KR, Tong Y, Wu
YL, Lam SH, Yang H, et al. 2005. Transcriptome Analysis of Zebrafish Embryogenesis
Using Microarrays. PLoS Genet 1:e29.
Mayr E. 1942. Systematics and the Origin of Species from the Viewpoint of a Zoologist.
New York: Columbia University Press
Mayr E. 1954a. Geographic speciation in tropical echinoids. Evolution 8:1–18.
Mayr E. 1954b. Change of genetic environment and evolution.
McClintock B. 1984. The significance of responses of the genome to challenge. Science
226:792–801.
McKinnon JS, Mori S, Blackman BK, David L, Kingsley DM, Jamieson L, Chou J,
Schluter D. 2004. Evidence for ecology's role in speciation. Nature 429:294–298.
Michalak P. 2009. Epigenetic, transposon and small RNA determinants of hybrid
dysfunctions. Heredity 102:45–50.
Mihola O, Trachtulec Z, Vlcek C, Schimenti JC, Forejt J. 2009. A Mouse Speciation Gene
Encodes a Meiotic Histone H3 Methyltransferase. Science 323:373–375.
Mouse Genome Sequencing Consortium, Waterston RH, Lindblad-Toh K, Birney E,
Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, et al. 2002.
Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.
Muller HJ. 1939. Reversibility in evolution considered from the standpoint of genetics.
Biological Reviews 14:261–280.
Muller HJ. 1942. Isolating mechanisms, evolution and temperature. Biol. Symp 6:71–125.
Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M. 2008. The
transcriptional landscape of the yeast genome defined by RNA sequencing. Science
320:1344–1349.
Nolte AW, Renaut S, Bernatchez L. 2009. Divergence in gene regulation at young life
history stages of whitefish (Coregonus sp.) and the emergence of genomic isolation. BMC
Evol Biol 9:59.
Noor MA, Grams KL, Bertucci LA, Reiland J. 2001. Chromosomal inversions and the
reproductive isolation of species. P Natl Acad Sci USA 98:12084–12088.
Nosil P, Feder JL. 2012. Genomic divergence during speciation: causes and consequences.
Philosophical Transactions of the Royal Society B: Biological Sciences 367:332–342.
Nosil P, Harmon LJ, Seehausen O. 2009. Ecological explanations for (incomplete)
speciation. Trends Ecol Evol 24:145–156.
Nosil P, Schluter D. 2011. The genes underlying the process of speciation. Trends Ecol
Evol 26:160–167.
Nosil P. 2007. Divergent host plant adaptation and reproductive isolation between ecotypes
of Timema cristinae walking-sticks. Am Nat 169:151–162.
Nosil P. 2012. Ecological Speciation. (Harvey PH, May RM, Godfray H, Dunne JA,
editors.). New York: Oxford University Press
O'Neill MJ, Graves JA. 1998. Undermethylation associated with retroelement activation
and chromosome remodelling in an interspecific mammalian hybrid. Nature 393:68–72.
Orr HA, Presgraves DC. 2000. Speciation by postzygotic isolation: forces, genes and
molecules. BioEssays 22:1085–1094.
Orr HA. 1996. Dobzhansky, Bateson, and the genetics of speciation. Genetics 144:1331–
1335.
Pendas AM, Moran P, Martinez JL, Garcia Vazquez E. 1995. Applications of 5S rDNA in
Atlantic salmon, brown trout, and in Atlantic salmon × brown trout hybrid identification.
Molecular Ecology 4:275–276.
Pereira CSA, Aboim MA, Ráb P, Collares-Pereira MJ. 2014. Introgressive hybridization as
a promoter of genome reshuffling in natural homoploid fish hybrids (Cyprinidae,
Leuciscinae). Heredity 112:343–350.
Peruzzi L, Altınordu F. 2014. A proposal for a multivariate quantitative approach to infer
karyological relationships among taxa. Comparative Cytogenetics 8:337–349.
Phillips RB, Ihssen PE. 1986. Inheritance of Q band chromosomal polymorphisms in lake
trout. Journal of Heredity 77:93–97.
Phillips RB, Ráb P. 2001. Chromosome evolution in the Salmonidae (Pisces): an update.
Biological Reviews 76:1–25.
Phillips RB, Reed KM, Ráb P. 1996. Revised karyotypes and chromosome banding of
coregonid fishes from the Laurentian Great Lakes. Can J Zool 74:323–329.
Pialek J, Hauffe HC, Searle JB. 2005. Chromosomal variation in the house mouse. Biol J
Linn Soc 84:535–563.
Pigeon D, Chouinard A, Bernatchez L. 1997. Multiple modes of speciation involved in the
parallel evolution of sympatric morphotypes of lake whitefish (Coregonus clupeaformis,
Salmonidae). Evolution 51:196–205.
Pinney E. 1918. A Study of the relation of the behavior of the chromatin to development
and heredity in teleost hybrids. Journal of Morphology 31:225–291.
Presgraves DC, Balagopalan L, Abmayr SM, Orr HA. 2003. Adaptive evolution drives
divergence of a hybrid inviability gene between two species of Drosophila. Nature
423:715–719.
Presgraves DC. 2010. The molecular evolutionary basis of species formation. Nat Rev
Genet 11:175–180.
Quinn NL, Levenkova N, Chow W, Bouffard P, Boroevich KA, Knight JR, Jarvie TP,
Lubieniecki KP, Desany BA, Koop BF, et al. 2008. Assessing the feasibility of GS FLX
Pyrosequencing for sequencing the Atlantic salmon genome. BMC Genomics 9:404.
R Core Team. 2012. R: A language and environment for statistical computing. Available
from: http://www.R-project.org/
Rábová M, Völker M, Pelikánová Š, Ráb P. 2015. Sequential Chromosome Banding in
Fishes. In: Ozouf-Costaz C, Pisano E, Foresti F, Foresti de Almeida Foresto L, editors. Fish
Cytogenetic Techniques: Ray-Fin Fishes and Chondrichthyans. CRC Press.
Ravi V, Venkatesh B. 2008. Rapidly evolving fish genomes and teleost diversity. Curr
Opin Genet Dev 18:544–550.
Renaut S, Bernatchez L. 2011. Transcriptome-wide signature of hybrid breakdown
associated with intrinsic reproductive isolation in lake whitefish species pairs (Coregonus
spp. Salmonidae). Heredity 106:1003–1011.
Renaut S, Grassa CJ, Yeaman S, Moyers BT, Lai Z, Kane NC, Bowers JE, Burke JM,
Rieseberg LH. 2013. Genomic islands of divergence are not affected by geography of
speciation in sunflowers. Nature Communications 4:1827.
Renaut S, Maillet N, Normandeau E, Sauvage C, Derome N, Rogers SM, Bernatchez L.
2012. Genome-wide patterns of divergence during speciation: the lake whitefish case study.
Philosophical Transactions of the Royal Society B: Biological Sciences 367:354–363.
Renaut S, Nolte A, Bernatchez L. 2009. Gene expression divergence and hybrid
misexpression between lake whitefish species pairs (Coregonus spp. salmonidae).
Molecular Biology and Evolution 26:925–936.
Renaut S, Nolte AW, Bernatchez L. 2010. Mining transcriptome sequences towards
identifying adaptive single nucleotide polymorphisms in lake whitefish species pairs
(Coregonus spp. Salmonidae). Molecular Ecology 19 Suppl 1:115–131.
Renaut S, Nolte AW, Rogers SM, Derome N, Bernatchez L. 2011. SNP signatures of
selection on standing genetic variation and their association with adaptive phenotypes along
gradients of ecological speciation in lake whitefish species pairs (Coregonus spp.).
Molecular Ecology 20:545–559.
Rice WR, Hostert EE. 1993. Laboratory experiments on speciation: what have we learned
in 40 years? Evolution.
Rieseberg LH, Archer MA, Wayne RK. 1999. Transgressive segregation, adaptation and
speciation. Heredity 83 (Pt 4):363–372.
Rieseberg LH, Vanfossen C, Desrochers AM. 1995. Hybrid speciation accompanied by
genomic reorganization in wild sunflowers. Nature 375:313–316.
Rieseberg LH. 2001. Chromosomal rearrangements and speciation. Trends Ecol Evol
16:351–358.
Robinson MD, Oshlack A. 2010. A scaling normalization method for differential
expression analysis of RNA-seq data. Genome Biology 11:R25.
Robison BD, Wheeler PA, Sundin K, Sikka P, Thorgaard GH. 2001. Composite interval
mapping reveals a major locus influencing embryonic development rate in rainbow trout
(Oncorhynchus mykiss). J. Hered. 92:16–22.
Rogers SM, Bernatchez L. 2006. The genetic basis of intrinsic and extrinsic post-zygotic
reproductive isolation jointly promoting speciation in the lake whitefish species complex
(Coregonus clupeaformis). J Evol Biol 19:1979–1994.
Rogers SM, Bernatchez L. 2007. The genetic architecture of ecological speciation and the
association with signatures of selection in natural lake whitefish (Coregonus sp.
Salmonidae) species pairs. Molecular Biology and Evolution 24:1423–1438.
Rogers SM, Gagnon V, Bernatchez L. 2002. Genetically based phenotype-environment
association for swimming behavior in lake whitefish ecotypes (Coregonus clupeaformis
Mitchill). Evolution 56:2322–2329.
Rubin GM, Kidwell MG, Bingham PM. 1982. The molecular basis of P-M hybrid
dysgenesis: the nature of induced mutations. Cell 29:987–994.
Rundle HD, Nosil P. 2005. Ecological speciation. Ecology Letters 8:336–352.
Russo CA, Takezaki N, Nei M. 1995. Molecular phylogeny and divergence times of
drosophilid species. Molecular Biology and Evolution 12:391–404.
Saetre G-P, Saether SA. 2010. Ecology and genetics of speciation in Ficedula flycatchers.
Molecular Ecology 19:1091–1106.
Schartl A, Dimitrijevic N, Schartl M. 1994. Evolutionary origin and molecular biology of
the melanoma-inducing oncogene of Xiphophorus. Pigment Cell Res. 7:428–432.
Schluter D. 2009. Evidence for ecological speciation and its alternative. Science 323:737–
741.
Searle JB. 1998. Speciation, chromosomes, and genomes. Genome Research 8:1–3.
Seehausen O, Butlin RK, Keller I, Wagner CE, Boughman JW, Hohenlohe PA, Peichel CL,
Saetre G-P, Bank C, Brännström Å, et al. 2014. Genomics and the origin of species. Nat
Rev Genet 15:176–192.
Sheltzer JM, Blank HM, Pfau SJ, Tange Y, George BM, Humpton TJ, Brito IL, Hiraoka Y,
Niwa O, Amon A. 2011. Aneuploidy Drives Genomic Instability in Yeast. Science
333:1026–1030.
Sheltzer JM, Torres EM, Dunham MJ, Amon A. 2012. Transcriptional consequences of
aneuploidy. P Natl Acad Sci USA 109:12644–12649.
Siegel JJ, Amon A. 2012. New Insights into the Troubles of Aneuploidy. Annu. Rev. Cell
Dev. Biol. 28:189–214.
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I. 2009. ABySS: A
parallel assembler for short read sequence data. Genome Research 19:1117–1123.
Slatkin M. 1987. Gene flow and the geographic structure of natural populations. Science
236:787–792.
Slotkin RK, Martienssen R. 2007. Transposable elements and the epigenetic regulation of
the genome. Nat Rev Genet 8:272–285.
Smith G, Smith C, Kenny JG, Chaudhuri RR, Ritchie MG. 2015. Genome-wide DNA
methylation patterns in wild samples of two morphotypes of threespine stickleback
(Gasterosteus aculeatus). Molecular Biology and Evolution 32:888–895.
St-Cyr J, Derome N, Bernatchez L. 2008. The transcriptomics of life-history trade-offs in
whitefish species pairs (Coregonus sp.). Molecular Ecology 17:1850–1870.
Stelkens RB, Schmid C, Seehausen O. 2015. Hybrid Breakdown in Cichlid Fish. PLoS
ONE 10:e0127207.
Symonová R, Majtánová Z, Sember A, Staaks GB, Bohlen J, Freyhof J, Rábová M, Ráb P.
2013. Genome differentiation in a species pair of coregonine fishes: an extremely rapid
speciation driven by stress-activated retrotransposons mediating extensive ribosomal DNA
multiplications. BMC Evol Biol 13:42.
Taylor EB, McPhail JD. 2000. Historical contingency and ecological determinism interact
to prime speciation in sticklebacks, Gasterosteus. Proceedings of the Royal Society B:
Biological Sciences 267:2375–2384.
Torres EM, Sokolsky T, Tucker CM, Chan LY, Boselli M, Dunham MJ, Amon A. 2007.
Effects of Aneuploidy on Cellular Physiology and Cell Division in Haploid Yeast. Science
317:916–924.
Torres EM, Williams BR, Amon A. 2008. Aneuploidy: Cells Losing Their Balance.
Genetics 179:737–746.
Treangen TJ, Salzberg SL. 2011. Repetitive DNA and next-generation sequencing:
computational challenges and solutions. Nat Rev Genet.
Trudel M, Tremblay A, Schetagne R, Rasmussen JB. 2001. Why are dwarf fish so small?
An energetic analysis of polymorphism in lake whitefish (Coregonus clupeaformis). Can J
Fish Aquat Sci 58:394–405.
Tusher VG, Tibshirani R, Chu G. 2001. Significance analysis of microarrays applied to the
ionizing radiation response. P Natl Acad Sci USA 98:5116–5121.
Ungerer MC, Strakosh SC, Zhen Y. 2006. Genome expansion in three hybrid sunflower
species is associated with retrotransposon proliferation. Curr Biol 16:R872–R873.
Valente GT, Conte MA, Fantinatti BEA, Cabral-de-Mello DC, Carvalho RF, Vicari MR,
Kocher TD, Martins C. 2014. Origin and Evolution of B Chromosomes in the Cichlid Fish
Astatotilapia latifasciata Based on Integrated Genomic Analyses. Molecular Biology and
Evolution 31:2061–2072.
Vergilino R, Elliott TA, Desjardins-Proulx P, Crease TJ, Dufresne F. 2013. Evolution of a
transposon in Daphnia hybrid genomes. Mobile DNA 4:1–1.
Via S, West J. 2008. The genetic mosaic suggests a new role for hitchhiking in ecological
speciation. Molecular Ecology 17:4334–4345.
Via S. 2009. Natural selection in action during speciation. P Natl Acad Sci USA 106 Suppl
1:9939–9946.
Völker M, Ráb P, Kullmann H. 2005. Karyotype Differentiation in Chromaphyosemion
Killifishes (Cyprinodontiformes, Nothobranchiidae). I: Chromosome Banding Patterns of
C. alpha, C. kouamense and C. lugens. Genetica 125:33–41.
Völker M, Ráb P. 2015. Direct chromosome preparation from embryos and larvae. In:
Ozouf-Costaz C, Pisano E, Foresti F, Foresti de Almeida Foresto L, editors. Fish
Cytogenetic Techniques (Chondrichthyans and Teleosts). CRC Press.
Waddington CH. 1942. Canalization of development and the inheritance of acquired
characters. Nature 150:563–565.
Wake MH. 2003. What is "Integrative Biology"? Integrative and Comparative Biology
43:239–241.
Wang L, Feng Z, Wang X, Wang X, Zhang X. 2010. DEGseq: an R package for identifying
differentially expressed genes from RNA-seq data. Bioinformatics 26:136–138.
Weber M, Schübeler D. 2007. Genomic patterns of DNA methylation: targets and function
of an epigenetic mark. Curr Opin Cell Biol 19:273–280.
Wei KH-C, Clark AG, Barbash DA. 2014. Limited gene misregulation is exacerbated by
allele-specific upregulation in lethal hybrids between Drosophila melanogaster and
Drosophila simulans. Molecular Biology and Evolution 31:1767–1778.
Wei KH-C, Grenier JK, Barbash DA, Clark AG. 2014. Correlated variation and population
differentiation in satellite DNA abundance among lines of Drosophila melanogaster. P Natl
Acad Sci USA 111:18793–18798.
White M. 1969. Chromosomal rearrangements and speciation in animals. Annual Review
of Genetics 3:75–98.
White M. 1978a. Modes of speciation. San Francisco: W. H. Freeman and Company
White M. 1978b. Chain processes in chromosomal speciation. Systematic Biology 27:285–
298.
Whitelaw E, Martin DI. 2001. Retrotransposons as epigenetic mediators of phenotypic
variation in mammals. Nat Genet 27:361–365.
Whiteley AR, Derome N, Rogers SM, St-Cyr J, Laroche J, Labbe A, Nolte A, Renaut S,
Jeukens J, Bernatchez L. 2008. The phenomics and expression quantitative trait locus
mapping of brain transcriptomes regulating adaptive divergence in lake whitefish species
pairs (Coregonus sp.). Genetics 180:147–164.
Whiteley AR, Persaud KN, Derome N, Montgomerie R, Bernatchez L. 2009. Reduced
sperm performance in backcross hybrids between species pairs of whitefish (Coregonus
clupeaformis). Can J Zool 87:566–572.
Wittbrodt J, Adam D, Malitschek B, Mäueler W, Raulf F, Telling A, Robertson SM,
Schartl M. 1989. Novel putative receptor tyrosine kinase encoded by the
melanoma-inducing Tu locus in Xiphophorus. Nature 341:415–421.
Wray GA. 2003. The Evolution of Transcriptional Regulation in Eukaryotes. Molecular
Biology and Evolution 20:1377–1419.
Wright KM, Lloyd D, Lowry DB, Macnair MR, Willis JH. 2013. Indirect evolution of
hybrid lethality due to linkage with selected locus in Mimulus guttatus. PLoS Biol
11:e1001497.
Yano A, Nicol B, Jouanno E, Quillet E, Fostier A, Guyomard R, Guiguen Y. 2012. The
sexually dimorphic on the Y-chromosome gene (sdY) is a conserved male-specific
Y-chromosome sequence in many salmonids. Evol Appl 6:486–496.
Yeaman S. 2015. Local Adaptation by Alleles of Small Effect. The American Naturalist
186:S74–S89.