Oxford Handbook of Genetics

Disclaimer
Oxford University Press makes no representation, express or implied, that the drug dosages in this book are correct. Readers must therefore always check the product information and clinical procedures with the most up to date published product information and data sheets provided by the manufacturers and the most recent codes of conduct and safety regulations. The authors and the publishers do not accept responsibility or legal liability for any errors in the text or for the misuse or misapplication of material in this work. Except where otherwise stated, drug dosages and recommendations are for the non-pregnant adult who is not breastfeeding.

Contents

Chromosome basics

DNA basics

DNA and protein synthesis: the genetic code

DNA sequence variation and mutation

Genetic investigations

Genetic linkage

Chromosomes (see Fig. 3.1) are found in a cell’s nucleus and derive their name from the Greek for coloured body ‘chroma soma’.

They carry genes and are composed mainly of chromatin, in which the DNA helix is wrapped around core histones (proteins) to form a ‘beads on a string’ configuration (see Fig. 3.2).

The human genome is made up of ~3.2 × 10⁹ base pairs, less than 10% of which actually code for proteins (the rest are probably important for maintaining chromosome structure and regulating gene expression).

A normal human cell has 22 pairs of numbered chromosomes and 2 sex chromosomes (normally XX in a female and XY in a male), i.e. 46 chromosomes in total. This is called a diploid complement (2n).

A single set of chromosomes (e.g. sperm or ovum) is a haploid complement (n).

If there is an abnormal number of chromosomes in a cell, this is termed aneuploidy.

The gain of a single chromosome is referred to as trisomy.

Chromosomes have a centromere (where the two chromatin coils appear to join, and where the two coils are pulled apart in meiosis) and two arms ending in telomeres (important in maintaining chromosomal stability).

The shorter arm of the chromosomes is called p and the longer arm q.

Fig. 3.1
Classification of chromosomes depending on the position of centromeres. (Reproduced from Young (2005), Medical Genetics, Fig. 1.5, p. 7, with permission from Oxford University Press.)

Fig. 3.2
Stages in the packaging of DNA to form chromosome number 17. (Reproduced with permission from Strachan, T. and Read, A.P. (2004) Human Molecular Genetics 3, Garland Science, London.)

Human heredity is encoded by the nucleic acid DNA.

A nucleic acid is a chain of nucleotides; a nucleotide is a purine or pyrimidine base with attached sugar and phosphate entities.

In DNA the sugar is deoxyribose (hence deoxyribonucleic acid): in RNA the sugar is ribose (hence ribonucleic acid).

DNA contains four types of bases—two purines, adenine (A) and guanine (G), and two pyrimidines, cytosine (C) and thymine (T).

RNA contains uracil (U) in place of thymine.

The DNA double helix (structure determined by Watson and Crick in 1953) maintains a constant width and is faithfully replicated, because purines always face pyrimidines in the complementary A–T and G–C base pairs. Thus, DNA:

serves as a template for replication that re-establishes the double helix

can be ‘read’ and ‘copied’ in the process of producing proteins.

Each strand of DNA has a 5′ (five prime) end and a 3′ (three prime) end, so named because at the 5′ end a phosphate group is attached to the 5th carbon atom of the sugar component, whereas at the 3′ end a hydroxyl group is attached to the 3rd carbon atom of the sugar component. The two paired strands run in opposite directions (they are antiparallel).
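As a minimal sketch of complementary base pairing (A–T and G–C), the Python snippet below derives the partner strand of a short sequence. Because the two strands of the double helix run in opposite directions, the complemented bases are reversed so that the result is again written 5′ to 3′. The function name and the example sequence are illustrative assumptions and do not come from the text.

```python
# Complementary base pairs in DNA: a purine (A, G) always faces a pyrimidine (T, C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    The partner strand runs in the opposite direction, so the
    complemented bases are reversed to keep the 5'->3' convention.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3.upper()))


# Hypothetical example sequence (not a real gene fragment).
print(reverse_complement("ATGC"))
```

Applying the function twice returns the starting sequence, which reflects how each strand can serve as a template for re-establishing the other.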

The human genome (the entire human genetic complement) consists of ~21 000 genes.

A gene is the fundamental unit of heredity and is a sequence of DNA involved in producing a polypeptide chain. It is functionally defined by its protein product. A gene contains:

coding segments (exons)

intervening sequences (introns)

regulatory elements, e.g. promoter.

In genetic disorders an individual can be described by their:

Genotype: their genetic constitution.

Phenotype: their appearance, or other characteristics, which result from the interaction of their genetic constitution with the environment.

A mutation is a heritable structural change in the sequence of DNA resulting in a change to an individual’s genotype which may, or may not, alter his/her phenotype.

Protein synthesis involves transcription of the coding DNA (cDNA) into messenger RNA (mRNA) and then translation of the mRNA into a polypeptide (see Fig. 3.3). The sequence of the ‘sense’ strand of DNA, read from the ‘upstream’ 5′ end to the ‘downstream’ 3′ end, is transcribed by RNA polymerase to make mRNA (the complementary strand serves as the template). The non-coding introns are then removed by ‘splicing’. The mRNA then moves to the cytoplasm, where it is read on the surface of ribosomes with the help of transfer RNA (tRNA). Each tRNA carries three nucleotide bases (an anticodon) that complement a set of three bases on the mRNA (a codon). Each codon specifies a particular amino acid (see Table 3.1), which is then added to the growing protein chain.
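The steps from sense strand to codons can be sketched in a few lines of Python. This is a simplified illustration only: the toy gene, the exon coordinates, and the function names are hypothetical, and real splicing depends on sequence signals rather than on coordinates supplied by hand.

```python
def transcribe(sense_dna: str) -> str:
    """mRNA carries the sense-strand sequence, with U in place of T."""
    return sense_dna.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments (0-based, end-exclusive) and drop the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def codons(mrna: str) -> list[str]:
    """Split a spliced mRNA into consecutive three-base codons."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical toy gene: two exons (upper case) separated by one intron (lower case).
gene = "ATGGAG" + "gtaagtccag" + "GTGTAA"
pre_mrna = transcribe(gene)
mature_mrna = splice(pre_mrna, exons=[(0, 6), (16, 22)])
print(codons(mature_mrna))
```

Each codon in the resulting list would then be matched by the anticodon of a tRNA carrying the corresponding amino acid.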

Fig. 3.3
Diagrammatic representation of how information in the sense strand of DNA is converted into a polypeptide chain. For the sake of simplicity, four codons are shown separated by short gaps. (Reproduced from Young (2005), Medical Genetics, Fig. 1.5, p. 7, with permission from Oxford University Press.)

Table 3.1
Triplet codons and their corresponding amino acids and STOP sequences

First base    Second base
              T             C             A             G
T             TTT = Phe     TCT = Ser     TAT = Tyr     TGT = Cys
              TTC = Phe     TCC = Ser     TAC = Tyr     TGC = Cys
              TTA = Leu     TCA = Ser     TAA = STOP    TGA = STOP
              TTG = Leu     TCG = Ser     TAG = STOP    TGG = Trp
C             CTT = Leu     CCT = Pro     CAT = His     CGT = Arg
              CTC = Leu     CCC = Pro     CAC = His     CGC = Arg
              CTA = Leu     CCA = Pro     CAA = Gln     CGA = Arg
              CTG = Leu     CCG = Pro     CAG = Gln     CGG = Arg
A             ATT = Ile     ACT = Thr     AAT = Asn     AGT = Ser
              ATC = Ile     ACC = Thr     AAC = Asn     AGC = Ser
              ATA = Ile     ACA = Thr     AAA = Lys     AGA = Arg
              ATG = Met     ACG = Thr     AAG = Lys     AGG = Arg
G             GTT = Val     GCT = Ala     GAT = Asp     GGT = Gly
              GTC = Val     GCC = Ala     GAC = Asp     GGC = Gly
              GTA = Val     GCA = Ala     GAA = Glu     GGA = Gly
              GTG = Val     GCG = Ala     GAG = Glu     GGG = Gly

Because there are more codons (61 plus 3 STOP codons) than there are amino acids (20), almost all amino acids are represented by more than one codon, i.e. the code is degenerate, particularly at the third base (see Table 3.1). See Table 3.2 for a list of amino acid abbreviations.
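The degeneracy of the code can be checked directly from Table 3.1. The sketch below rebuilds the table as a Python dictionary, counts how many codons specify each amino acid, and translates a short coding sequence. The compact one-letter string and the function name are conveniences introduced here for illustration; the codon assignments themselves are those of Table 3.1.

```python
from collections import Counter

BASES = "TCAG"
# One-letter amino acids ordered by first, second, then third base
# (each in the order T, C, A, G); '*' marks a STOP codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate codon by codon, stopping at the first STOP codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


codons_per_amino_acid = Counter(CODON_TABLE.values())
sense_codons = len(CODON_TABLE) - codons_per_amino_acid["*"]
print(sense_codons, "codons specify", len(codons_per_amino_acid) - 1, "amino acids")
print("Leu:", codons_per_amino_acid["L"], "Met:", codons_per_amino_acid["M"],
      "Trp:", codons_per_amino_acid["W"])
print(translate("CCTGAGGAG"))  # codons for Pro, Glu, Glu
```

Only methionine and tryptophan are specified by a single codon, which is why the text says "almost all" amino acids have more than one.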

Table 3.2
Amino acid abbreviations and genetic notations

Abbreviation    Amino acid       Genetic notation
Ala             Alanine          A
Arg             Arginine         R
Asn             Asparagine       N
Asp             Aspartic acid    D
Cys             Cysteine         C
Gln             Glutamine        Q
Glu             Glutamic acid    E
Gly             Glycine          G
His             Histidine        H
Ile             Isoleucine       I
Leu             Leucine          L
Lys             Lysine           K
Met             Methionine       M
Phe             Phenylalanine    F
Pro             Proline          P
Ser             Serine           S
Thr             Threonine        T
Trp             Tryptophan       W
Tyr             Tyrosine         Y
Val             Valine           V
STOP            Nonsense         X

Genetic results are reported according to internationally agreed standards.

c. Refers to a numbered nucleotide in a gene sequence

p. Refers to a numbered amino acid in the protein product

Nucleotide substitutions are described by a number representing the nucleotide in the coding DNA (cDNA) sequence, followed by a letter representing the original nucleotide (A, C, G, T), then > and the mutated nucleotide, e.g. in the β-globin gene (HBB) c.17A>T means that adenine at nucleotide 17 is changed to thymine:

Codon    5        6        7
HbA      CCT      GAG      GAG
         (Pro)    (Glu)    (Glu)
HbS      CCT      GTG      GAG
         (Pro)    (Val)    (Glu)

If this results in an amino acid substitution, the mutation is termed a missense mutation.
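The c.17A>T example can be traced step by step. The sketch below assumes that nucleotide 1 is the first base of codon 1, which is the numbering under which nucleotide 17 falls in codon 6 as in the example above. Only codons 5 to 7 given in the text are used, and the helper names are hypothetical.

```python
# Codons 5-7 of the beta-globin coding sequence as given in the text (HbA).
HBA_CODONS = {5: "CCT", 6: "GAG", 7: "GAG"}
# Only the codons needed for this example, taken from Table 3.1.
AMINO_ACID = {"CCT": "Pro", "GAG": "Glu", "GTG": "Val"}


def locate(nucleotide: int) -> tuple[int, int]:
    """Map a 1-based coding nucleotide to (codon number, 0-based offset in codon).

    Assumption: nucleotide 1 is the first base of codon 1.
    """
    return (nucleotide - 1) // 3 + 1, (nucleotide - 1) % 3


def substitute(codons: dict[int, str], nucleotide: int, ref: str, alt: str) -> dict[int, str]:
    """Return a copy of `codons` with one base replaced, checking the reference base."""
    codon_number, offset = locate(nucleotide)
    codon = codons[codon_number]
    if codon[offset] != ref:
        raise ValueError(f"expected {ref} at c.{nucleotide}, found {codon[offset]}")
    mutated = dict(codons)
    mutated[codon_number] = codon[:offset] + alt + codon[offset + 1:]
    return mutated


hbs_codons = substitute(HBA_CODONS, 17, "A", "T")
for number in sorted(hbs_codons):
    before, after = HBA_CODONS[number], hbs_codons[number]
    print(number, before, AMINO_ACID[before], "->", after, AMINO_ACID[after])
```

Nucleotide 17 is the middle base of codon 6, so the substitution turns GAG (Glu) into GTG (Val) and leaves the neighbouring codons unchanged.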

In protein annotation this is written with a number representing the amino acid in the translated protein product, the first letter preceding the number being the wild-type amino acid, and the letter after being the altered amino acid, e.g. in sickle cell disease (see Haemoglobinopathies, p. 130), HbS differs structurally from HbA due to a missense mutation in the β-globin gene, denoted by p.E6V (glutamic acid at amino acid 6 is changed to valine). These molecular differences affect the way the protein chain folds, illustrated diagrammatically in Fig. 3.4.
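Reading a protein-level description such as p.E6V amounts to splitting it into the wild-type letter, the position, and the altered letter, then looking the letters up in Table 3.2. The following is a minimal sketch that handles only this simple single-letter substitution form; the function name and the regular expression are assumptions made for illustration.

```python
import re

# One-letter notations and names from Table 3.2 (X denotes a STOP codon).
ONE_LETTER = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "Q": "glutamine", "E": "glutamic acid", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
    "X": "STOP",
}


def describe_protein_change(notation: str) -> str:
    """Spell out a substitution written as p.<wild-type><position><altered>."""
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z])", notation)
    if match is None:
        raise ValueError(f"not a simple substitution: {notation!r}")
    wild_type, position, altered = match.groups()
    return (f"{ONE_LETTER[wild_type]} at amino acid {position} "
            f"is changed to {ONE_LETTER[altered]}")


print(describe_protein_change("p.E6V"))
```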

Fig. 3.4
Folding of HbA (top) and HbS (bottom).

If the nucleotide substitution does not alter the encoded amino acid, it is termed a silent or synonymous substitution, although this could still cause problems by affecting splicing, etc.

Most splice site mutations occur in introns. Mutations in introns are referred to by the nearest nucleotide in an exon, e.g. in CFTR (the Cystic Fibrosis Transmembrane conductance Regulator gene, see Chapter 5, Cystic fibrosis, p. 104) c.621+1G>T, the first nucleotide (G) in the intron 3′ to nucleotide 621 in the cDNA is replaced by T and, in c.1717–1G>A, the last nucleotide (G) in the intron 5′ to nucleotide 1717 in the cDNA is replaced by A.
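The structure of these descriptions can be made explicit with a small parser. This sketch covers only single-nucleotide substitutions of the forms shown in the text, with or without an intronic offset. The class, field names, and pattern are hypothetical, and the typographic dash is normalised to a plain hyphen before matching.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    exon_nucleotide: int  # nearest nucleotide in the cDNA
    intron_offset: int    # 0 = exonic; +n = n bases into the intron 3' of it; -n = n bases 5' of it
    ref: str              # original nucleotide
    alt: str              # mutated nucleotide


PATTERN = re.compile(r"c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])")


def parse_substitution(notation: str) -> Substitution:
    """Parse descriptions such as c.17A>T, c.621+1G>T, or c.1717-1G>A."""
    # Treat an en dash or minus sign as a plain hyphen.
    normalised = notation.replace("\u2013", "-").replace("\u2212", "-")
    match = PATTERN.fullmatch(normalised)
    if match is None:
        raise ValueError(f"unsupported description: {notation!r}")
    position, offset, ref, alt = match.groups()
    return Substitution(int(position), int(offset or 0), ref, alt)


for example in ("c.17A>T", "c.621+1G>T", "c.1717\u20131G>A"):
    print(example, parse_substitution(example))
```

A non-zero offset marks the change as intronic, counted from the nearest exonic nucleotide in the direction given by its sign.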

For deletions and insertions, the nucleotide number is followed by del/ins and the letter for the relevant nucleotide, e.g. c.394delT means the nucleotide T at position 394 in the cDNA is deleted and c.3905–3906insT means a T is inserted after nucleotide 3905 in the cDNA. Insertions/deletions whose length is not a multiple of three (e.g. single nucleotides or pairs of nucleotides) cause a shift in the reading frame (frameshift mutation) and usually result in protein truncation.

In protein annotation, the term delta (or a small triangle) is used to denote a deletion, e.g. in CFTR, the ΔF508 mutation means a deletion of phenylalanine at amino acid 508 resulting from a three-nucleotide deletion. Although this particular terminology is not current, it is still in widespread use.
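Whether an insertion or deletion shifts the reading frame is a matter of simple arithmetic: the frame is preserved only when the net change in length is a multiple of three. The sketch below applies that rule to the examples in the text; the function name is hypothetical, and the rule is a simplification that ignores effects such as disruption of splice sites.

```python
def is_frameshift(deleted: int = 0, inserted: int = 0) -> bool:
    """True if the net change in coding length is not a multiple of three."""
    return (inserted - deleted) % 3 != 0


examples = {
    "c.394delT (one nucleotide deleted)": is_frameshift(deleted=1),
    "a single T inserted after nucleotide 3905": is_frameshift(inserted=1),
    "three-nucleotide deletion (as in the ΔF508 mutation)": is_frameshift(deleted=3),
}
for description, frameshift in examples.items():
    print(description, "->", "frameshift" if frameshift else "in frame")
```

The three-nucleotide deletion removes exactly one codon, which is why it leads to loss of a single phenylalanine rather than a truncated protein.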

Some mutations, e.g. ΔF508, are well characterized as pathogenic changes (see Table 3.3). On the other hand, interpreting the clinical significance of a newly identified ‘private mutation’, i.e. a mutation unique to a given individual or family, can sometimes be very difficult.

Table 3.3
The spectrum of known pathogenic mutations in humans. (Reproduced with permission of Oxford University Press, Young, Medical Genetics, Table 1.3, 2005.)

Type of mutation                         Proportion of total (%)
Point mutations
  Missense                               47
  Nonsense                               11
  Splice-site                            10
  Regulatory                             1
Deletions and insertions
  Gross deletions                        5
  Small deletions                        16
  Gross insertions and duplications      1
  Small insertions                       6
Other rearrangements                     3

Data obtained from the Human Gene Mutation Database (www.hgmd.org). Stenson, P.D., Ball, E.V., Mort, M. et al. The human gene mutation database (HGMD): 2003 update. Human Mutation, 21, 2003, 577–81. Reprinted with permission of John Wiley & Sons, Inc.
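The percentages in Table 3.3 can be totalled by group as a quick consistency check. The grouping below follows the table headings, and treating ‘Other rearrangements’ as a category of its own is an assumption about the table layout.

```python
# Percentages as listed in Table 3.3.
SPECTRUM = {
    "Point mutations": {
        "Missense": 47, "Nonsense": 11, "Splice-site": 10, "Regulatory": 1,
    },
    "Deletions and insertions": {
        "Gross deletions": 5, "Small deletions": 16,
        "Gross insertions and duplications": 1, "Small insertions": 6,
    },
    # Assumption: listed as its own category rather than under the group above.
    "Other rearrangements": {"Other rearrangements": 3},
}

group_totals = {group: sum(rows.values()) for group, rows in SPECTRUM.items()}
for group, total in group_totals.items():
    print(f"{group}: {total}%")
print("Total:", sum(group_totals.values()), "%")
```

On these figures, point mutations account for roughly two-thirds of the listed pathogenic mutations, with missense changes alone making up nearly half.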

A mutation that results in an altered amino acid sequence in the encoded protein is termed a missense mutation. Not all missense mutations are pathogenic as the nature of the amino acid change, and its precise location in the three-dimensional protein structure, will determine whether there is any effect on protein function (see HbS mutation under nucleotide substitution above).

Mutations that result in protein truncation are nearly always pathogenic. They include single nucleotide substitutions that encode STOP codons (nonsense mutations), frameshift mutations, in which the reading frame is lost, and also large deletions/insertions.

Splicing is the process by which the introns are removed from the primary transcript, and the exons are joined together. Splice-site mutations disrupt this process.

Some genes have alternative splice variants, where a single gene gives rise to more than one mRNA sequence that may have different tissue distributions. Mutations may abolish a splice acceptor or donor site or impair the efficiency of splicing, resulting in abnormal ratios of splice variants.

A triplet repeat expansion is a mutation caused by an increase in the number of copies of a repeated trinucleotide, e.g. (CTG)n and (CAG)n. Examples of diseases caused by triplet repeat expansions are myotonic dystrophy (see Chapter 5, Myotonic dystrophy, p. 152), fragile X syndrome (see Chapter 5, Fragile X syndrome, p. 122) and Huntington disease (see Chapter 5, Huntington disease, p. 144).

Triplet repeat mutations are unstable and the number of repeats may increase when transmitted from parent to child. This can cause significant worsening across generations, with the disease appearing at an earlier age or with greater severity. This phenomenon is called anticipation.
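Counting the copies of a repeated trinucleotide is straightforward to sketch in code. The sequences and repeat numbers below are invented purely for illustration and imply no clinical thresholds; the function name is also hypothetical.

```python
import re


def longest_repeat_run(sequence: str, unit: str = "CAG") -> int:
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical alleles: the repeat tract is longer in the child than in the parent.
parent_allele = "TTC" + "CAG" * 5 + "GGA"
child_allele = "TTC" + "CAG" * 8 + "GGA"

print("parent:", longest_repeat_run(parent_allele), "repeats")
print("child:", longest_repeat_run(child_allele), "repeats")
```

An increase in the repeat count between generations, as in this made-up pair, is the molecular change that underlies anticipation.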

Loss-of-function mutations include nucleotide substitutions that introduce a stop codon, out-of-frame deletions resulting in a truncated protein, and specific mutations that cause loss of function of the protein by disturbing the conformation or charge of a site critical in the interaction of the protein with other molecules. Most mutations in recessively inherited disease are loss-of-function.

Haploinsufficiency arises when the normal phenotype requires the protein product of two alleles, and reduction to 50% of the gene product (as a result of a loss-of-function mutation) results in an abnormal phenotype.

Gain-of-function mutations are site-specific and usually result in constitutive activation of a specific protein function. In achondroplasia, FGFR3 is mutated in a specific position (c.1138G>A or c.1138G>C). As a result, its normal signalling function is constitutively activated (i.e. activated even in the absence of bound fibroblast growth factor (FGF)), resulting in shortening of the long bones.

A dominant-negative mutation is a mutation in one copy of a gene resulting in a mutant protein that has not only lost its own function, but also prevents the wild-type protein produced from the other copy of the gene from functioning normally. It commonly acts by producing an altered polypeptide (subunit) that prevents or impairs the assembly of a multimeric protein (an assembly of two or more protein molecules), e.g. assembly of collagen triple helices in osteogenesis imperfecta (OI), also known as ‘brittle bone disease’ (see Fig. 3.5).

Fig. 3.5
Different mutational mechanisms in osteogenesis imperfecta (‘brittle-bone disease’). The normal type I collagen molecule is a trimer made up of two proα1 chains and one proα2 chain. A mutant proα1 that becomes incorporated in the collagen trimers (dominant-negative effect) has more severe consequences than a mutant chain that is not synthesized (haploinsufficiency or loss-of-function effect). (Reproduced with permission from Suri, M. and Young, I.D. (2004). Genetics for pediatricians. Remedica, London.)
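The difference in severity described in Fig. 3.5 can be illustrated with a simple probability model. The sketch assumes, purely for illustration, that both proα1 alleles are expressed equally, that chains are drawn at random into trimers, and that proα1 supply limits the number of trimers formed. None of these assumptions comes from the text, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def normal_trimer_fraction(mutant_share: Fraction) -> Fraction:
    """Fraction of trimers whose two pro-alpha-1 chains are both normal.

    Assumes the two pro-alpha-1 chains of each trimer are drawn independently
    from a pool in which `mutant_share` of the chains are mutant.
    """
    share = {"normal": 1 - mutant_share, "mutant": mutant_share}
    return sum(
        share[first] * share[second]
        for first, second in product(share, repeat=2)
        if first == second == "normal"
    )


# Dominant-negative: half of the pro-alpha-1 chains are mutant and are incorporated.
dominant_negative_normal = normal_trimer_fraction(Fraction(1, 2))
# Mutant chain not synthesized: half the usual chain supply, but every trimer is normal.
null_allele_normal = Fraction(1, 2) * normal_trimer_fraction(Fraction(0))

print("Normal collagen, dominant-negative allele:", dominant_negative_normal)
print("Normal collagen, non-synthesized allele:", null_allele_normal)
```

Under these assumptions, an incorporated mutant chain leaves about a quarter of the usual amount of normal collagen, compared with about half when the mutant chain is simply not made.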

Genetic analysis will be described briefly here, moving up through increasing levels of magnification.

Approaches to genetic investigation fall broadly into two groups: (i) genome-wide scans, e.g. chromosome analysis (karyotyping) and genomic microarray analysis (‘molecular karyotyping’); and (ii) highly focused analysis of an individual gene. In the latter group, the decision to select one of the ~21 000 genes for mutation analysis is usually made on the basis of a presumptive clinical diagnosis, e.g. Duchenne muscular dystrophy (DMD gene), cystic fibrosis (CFTR gene).

Chromosome analysis (karyotyping) requires living, dividing cells, such as:

Lymphocytes from venous blood (usually) or bone marrow precursors

Fibroblasts (skin)

Chorionic villi or amniotic cells.

After preparation of these cells, usually by arresting cell division during metaphase when the chromosomes are in their most compact state, the chromosomes can be stained using a variety of stains, the most common of which is Giemsa. After denaturing treatment with trypsin, this dye is added and binds to DNA, giving the characteristic appearance of a ‘G-banded karyotype’ (see Fig. 3.6). Analysis is done using light microscopy by a skilled cytogeneticist and is labour-intensive (it takes several hours to complete an individual karyotype analysis).

Each of the ~500 bands seen in a G-banded karyotype corresponds to ~6–8 Mb (megabases) of DNA. Given the labelling system of the long and short arms, and the banding appearance, it is then easy to identify, for example, 11p15.5 as being on the short arm ‘p’ of chromosome 11 in band 15.5 (see Fig. 3.7).
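The band size quoted here follows from dividing the genome size by the number of bands, and a band label such as 11p15.5 can be split into its parts mechanically. The sketch below uses the approximate figures given in this chapter. The function name and the pattern are hypothetical and cover only simple labels of this form.

```python
import re

GENOME_BP = 3.2e9  # approximate genome size quoted earlier in the chapter
BANDS = 500        # approximate number of bands quoted in the text

average_band_mb = GENOME_BP / BANDS / 1e6
print(f"Average DNA per band: ~{average_band_mb:.1f} Mb")


def parse_band(location: str) -> dict:
    """Split a label such as 11p15.5 into chromosome, arm, and band."""
    match = re.fullmatch(r"(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)", location)
    if match is None:
        raise ValueError(f"unsupported band label: {location!r}")
    chromosome, arm, band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short (p)" if arm == "p" else "long (q)",
        "band": band,
    }


print(parse_band("11p15.5"))
```

The average of about 6.4 Mb per band is consistent with the ~6–8 Mb range given in the text.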

Fig. 3.7
Banding pattern of chromosome 11. (Reproduced from Francke, U. (1994). Digitized and differentially shaded human chromosome ideograms for genomic applications, Cytogenet. Cell Genet., 65, 206–19, and with the permission of S. Karger, A.G. Basel.)

Fig. 3.6
Normal male Giemsa banded karyotype. (Reproduced from Firth, Hurst, and Hall (2005), Oxford Desk Reference: Clinical Genetics, with permission.)

Fluorescent in situ hybridization (FISH) uses a DNA probe to identify a specific chromosomal abnormality that is too small to see with a light microscope, e.g. 22q11 deletion syndrome/DiGeorge syndrome. A DNA probe is made, with a complementary base sequence to a previously identified sequence in a target gene. A fluorescent dye is then attached to the DNA probe. When added to a chromosome spread, the probe hybridizes to the complementary sequence which will then fluoresce under UV light.

Although FISH does not give an overall analysis of a chromosome, it can be used in a prenatal setting to give a rapid result for the three common trisomies, i.e. trisomy 21, 18, and 13.

Genomic microarray analysis is essentially a method for very high-resolution ‘chromosome analysis’ and is sometimes termed ‘molecular karyotyping’. Genomic arrays use comparative genomic hybridization (CGH) of a mixture of test DNA from a diagnostic sample (labelled green) and normal DNA from a control with normal chromosomes (labelled red). An alteration in the green to red fluorescence ratio will therefore show whether the test sample has more or less DNA than the control from a particular chromosomal region, identifying ‘duplications’ or ‘deletions’.
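The green to red ratio is often easier to reason about on a log2 scale, where equal amounts of test and control DNA give a value of 0. The sketch below uses idealized signals proportional to copy number. The threshold of 0.3 and the function names are hypothetical choices for illustration, not values taken from the text or from any particular array platform.

```python
import math


def log2_ratio(test_signal: float, control_signal: float) -> float:
    """log2 of the test (green) to control (red) fluorescence ratio."""
    return math.log2(test_signal / control_signal)


def call_region(ratio: float, threshold: float = 0.3) -> str:
    """Classify a region; the default threshold is an illustrative assumption."""
    if ratio <= -threshold:
        return "deletion"
    if ratio >= threshold:
        return "duplication"
    return "normal"


# Idealized case: signal proportional to copy number, control has two copies.
for test_copies in (1, 2, 3):
    ratio = log2_ratio(test_copies, 2)
    print(test_copies, "copies:", round(ratio, 2), call_region(ratio))
```

In this idealized setting one copy against two gives a log2 ratio of -1 and three copies against two gives about 0.58, whereas real data are noisier and need platform-specific calling rules.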

Genomic microarrays vary in terms of their resolution, e.g. some arrays report at a resolution of ~1 Mb and others down to resolutions of ~100 kb or less. Routine karyotyping by light microscopy has a resolution of ~5–10 Mb.

SNP arrays provide a powerful tool for genome-wide genotyping using single nucleotide polymorphisms (SNPs). This approach is commonly used in research studies, e.g. in genome-wide association studies of common diseases such as diabetes, obesity, and hypertension. High resolution SNP arrays can also be used for molecular karyotyping.

Genomic array analysis is a relatively new tool and sometimes gives results that, with current levels of knowledge, may be novel and difficult to interpret.

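Multiplex ligation-dependent probe amplification (MLPA) is a technique for identifying small deletions and duplications. Each MLPA probe consists of two oligonucleotides that are ligated by a thermostable ligase if they bind to the target sequence. Only ligated probes are amplified, so the amount of product reflects the number of copies of the target sequence.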

Deletions (or duplications) of exons are an important class of intragenic mutation (e.g. BRCA1) that may be missed by conventional sequencing. Unless a strategy such as MLPA is used to detect ‘dosage’, routine sequencing methods will simply read the sequence from the normal allele, giving a normal result and failing to identify, for example, that an entire exon is deleted from the disease allele.

Chromosomal microdeletions can also be tested for using MLPA kits, e.g. 22q11 deletion syndrome, Williams’ syndrome, and telomeric deletions.

In quantitative fluorescent PCR (QFPCR), small sections (markers) of DNA from the sample are amplified, labelled with fluorescent tags, and the amounts are measured by electrophoresis. QFPCR is used to test for gene dosage and can be used to test for aneuploidy of whole chromosomes, e.g. chromosomes 13, 18, and 21.

Most DNA sequencing methodology in common use is based upon the Sanger dideoxy method, named after its inventor, the double Nobel prize-winner Dr Fred Sanger. The automation of sequencing, using fluorescent labelling of the four nucleotides, has created a faster, cheaper alternative to the original process.

‘Next generation’ sequencing using new technologies is being developed and is currently being gradually introduced into large-scale research facilities. These machines have a huge sequencing capacity and will enable much faster and cheaper sequencing in the medium to long term, bringing the prospect of the ‘$1000 genome’ ever closer.

If two loci are positioned on the same chromosome, the distance between them will affect the chance of a crossover, or recombination, occurring between them during meiosis (gamete formation). This linkage can be measured as the recombination fraction, denoted by θ: the greater the value of θ, the further apart the two loci are. Genetic distance is expressed in centiMorgans (cM), where a distance of 1 cM between loci means that a crossover will occur between them once per 100 meioses (θ = 0.01).
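The recombination fraction is simply the proportion of meioses in which a crossover separates the two loci. The sketch below computes θ from counts and converts it to centiMorgans using the 1 cM per 1% relationship stated above, which is a reasonable approximation only for closely linked loci. The counts and function names are hypothetical.

```python
def recombination_fraction(recombinants: int, meioses: int) -> float:
    """Proportion of informative meioses showing recombination between two loci."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("counts must satisfy 0 <= recombinants <= meioses, meioses > 0")
    return recombinants / meioses


def approx_centimorgans(theta: float) -> float:
    """Approximate genetic distance, assuming 1 cM per 1% recombination.

    This linear conversion is only a fair approximation for small theta.
    """
    return theta * 100


# Hypothetical family data: 3 recombinants observed in 100 meioses.
theta = recombination_fraction(3, 100)
print(f"theta = {theta:.2f}, distance is roughly {approx_centimorgans(theta):.0f} cM")
```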

Linkage analysis is occasionally used in the clinical setting, for example for prenatal diagnosis of well-characterized monogenic disorders, e.g. cystic fibrosis, where the mutation on only one allele is known and, using linkage, it is possible to ‘track’ the presumed second mutation on the other allele.

In preimplantation genetic haplotyping (PGH), a series of single nucleotide polymorphisms (SNPs) located in and around the disease gene is used to track that gene. This technique enables preimplantation genetic diagnosis (PGD) for monogenic disorders because the SNP haplotype for a given gene can be analysed, which is technically more straightforward and generalizable than developing customized assays for individual ‘private’ mutations. The technical challenges are formidable and PGH has limited availability.
