DNA exists within our cells as chromosomes. Each chromosome is a single molecule containing regions that carry the information to produce, or "encode", proteins or RNA molecules. These regions are called genes, and they are the most basic functional genetic units in our chromosomes. Current estimates put the number of human protein-coding genes at roughly 20 000 (earlier estimates were around 35 000). Some are expressed continuously ("house-keeping genes"), some only when the cell is undergoing certain processes or only in cells that have matured in a particular way, and some in response to an environmental stimulus. The transcription start site marks where transcription of a gene begins. Sequences "before" or 5' to the start site are called upstream, and those after or 3' are called downstream. Pseudogenes, remnants of duplicated genes that no longer function because of mutation, are sometimes found in humans.

When considering all of our DNA, including the genes and the many other sequences that do not encode proteins, we are talking about our genome. The term also applies to viruses, although a viral genome contains much less DNA (or RNA) than a human genome.

A cistron is the smallest unit of DNA that can encode a protein. A cistron does not include any regulatory or non-coding sequences.

Prokaryotic cells generally group closely related genes, and genes that are activated or inactivated at the same time, near each other. These genes, together with their controlling elements, are called operons and may be transcribed as a single mRNA which is polycistronic, or capable of encoding several proteins. Polycistronic messenger RNA (mRNA) consists of gene sequences separated by intercistronic sequences. Preceding the first gene is a leader sequence and following the last gene is a trailer sequence. The DNA between prokaryotic genes is called intergenic DNA.
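The layout of a polycistronic mRNA described above (leader, genes separated by intercistronic sequences, trailer) can be sketched as a small data structure. This is only an illustration: the class name, field names and the toy sequences are assumptions made for the example and do not come from any real operon.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class PolycistronicMRNA:
    """Hypothetical model: leader + cistrons separated by spacers + trailer."""
    leader: str
    cistrons: list[str]
    spacers: list[str]  # intercistronic sequences, one fewer than cistrons
    trailer: str

    def __post_init__(self) -> None:
        if len(self.spacers) != len(self.cistrons) - 1:
            raise ValueError("need exactly one spacer between each pair of cistrons")

    def sequence(self) -> str:
        """Assemble the full transcript in 5' to 3' order."""
        parts = [self.leader]
        for i, cistron in enumerate(self.cistrons):
            parts.append(cistron)
            if i < len(self.spacers):
                parts.append(self.spacers[i])
        parts.append(self.trailer)
        return "".join(parts)

    def cistron_spans(self) -> list[tuple[int, int]]:
        """Return (start, end) of each cistron, 0-based and end-exclusive."""
        spans = []
        pos = len(self.leader)
        for i, cistron in enumerate(self.cistrons):
            spans.append((pos, pos + len(cistron)))
            pos += len(cistron)
            if i < len(self.spacers):
                pos += len(self.spacers[i])
        return spans


# Toy sequences, invented for illustration only.
mrna = PolycistronicMRNA(
    leader="GGAGG",
    cistrons=["AUGAAAUAA", "AUGCCCUGA"],
    spacers=["ACACAC"],
    trailer="UUUU",
)
full = mrna.sequence()
spans = mrna.cistron_spans()
```

Each span indexes one protein-coding unit inside the single transcript, which is what makes the message polycistronic.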

Eukaryotic cells organise their genome very differently. DNA encoding a gene's precursor mRNA (pre-mRNA) is organised into regions called exons (EXpressed sequences) which may be spread across thousands of nucleotide base pairs (bp). The areas between exons in a gene are called introns (INtervening sequences).
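The relationship between exons, introns and the mature message can be sketched in a few lines. The function name, the coordinates and the toy pre-mRNA below are assumptions chosen for illustration; real genes are far longer and their exon boundaries come from annotation.

```python
from __future__ import annotations


def join_exons(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exon segments (0-based, end-exclusive) in order.

    Everything lying between the listed exons is treated as intron
    and left out of the result.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy pre-mRNA: exon 1, one intron, exon 2 (sequences invented).
exon1 = "AUGGCC"
intron = "GUAAGUCUAACAG"
exon2 = "UUUUAA"
pre_mrna = exon1 + intron + exon2

exons = [(0, 6), (19, 25)]
mature = join_exons(pre_mrna, exons)
```

Here `mature` contains only the two exon segments joined end to end, with the intervening sequence gone.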

Introns are not removed by chance, but with the aid of sequence-specific splicing signals. Most introns start (5') with the sequence GU and end (3') with AG; these are referred to as the splice donor and splice acceptor sites, respectively. Another important sequence is the branch site, located 20-50 base pairs upstream (5') of the splice acceptor site and containing a conserved A.
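The three signals just described can be checked with a short function. This is a deliberately simplified sketch: it accepts any A in the 20-50 window as a possible branch site, whereas a real branch site sits within a wider sequence context that is not modelled here. The function name and test sequences are assumptions for illustration.

```python
def has_basic_splice_signals(intron: str, min_dist: int = 20, max_dist: int = 50) -> bool:
    """Check an intron (RNA, 5' to 3') for GU...AG plus a candidate branch A.

    Simplification: any A lying min_dist..max_dist bases upstream of the
    acceptor AG counts as a candidate branch site.
    """
    if not (intron.startswith("GU") and intron.endswith("AG")):
        return False
    acceptor = len(intron) - 2            # index of the A in the terminal AG
    first = max(2, acceptor - max_dist)   # do not look inside the donor GU
    last = acceptor - min_dist            # inclusive upper bound of the window
    if last < first:
        return False                      # intron too short to hold a branch site
    return "A" in intron[first:last + 1]


# Invented test sequences; C is used as neutral filler.
good = "GU" + "C" * 10 + "A" + "C" * 30 + "AG"     # A lies 31 bases upstream of AG
no_branch = "GU" + "C" * 40 + "AG"                 # no A in the window
too_close = "GU" + "C" * 30 + "A" + "C" * 5 + "AG" # A only 6 bases upstream
```

Calling the function on `good` succeeds, while `no_branch` and `too_close` fail because no A falls inside the expected window.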

Five small nuclear RNA molecules (snRNA) and their proteins form a complex called the spliceosome. When an snRNA is associated with proteins, the complex is known as a small nuclear ribonucleoprotein (snRNP; "snurp"). The five snRNPs which form the spliceosome are called U1, U2, U4, U5 and U6. The splice donor site is attached to the branch site to form a lasso or "lariat". Through an enzymatic process the intron is then removed and the exons joined together.

As with many things in biology, there is more than one way for introns to be spliced. Another form of intron removal involving a spliceosome is called alternative splicing and is shown below. This relies upon alternative splice sites within exons. This process can produce more than one protein through different ways of splicing the same pre-mRNA. Interestingly, eukaryotes carry a lot of DNA that does not appear to encode any protein. This is often called junk DNA.
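The idea of an alternative splice site within an exon, as described above, can be sketched as follows. The toy pre-mRNA, the two donor positions and the function name are all invented for illustration; the sketch only shows how choosing a different donor site changes the resulting message.

```python
def splice_out(pre_mrna: str, donor: int, acceptor_end: int) -> str:
    """Remove pre_mrna[donor:acceptor_end] and join the flanking sequence.

    The removed segment must start with GU and end with AG.
    """
    removed = pre_mrna[donor:acceptor_end]
    if not (removed.startswith("GU") and removed.endswith("AG")):
        raise ValueError("segment lacks GU...AG splice signals")
    return pre_mrna[:donor] + pre_mrna[acceptor_end:]


# Exon 1 contains an internal GU (start of "GUGAAG") that can act as
# an alternative donor site. All sequences are hypothetical.
exon1 = "AUGGCC" + "GUGAAG"
intron = "GUAAGUCUAACAG"
exon2 = "UUUUAA"
pre_mrna = exon1 + intron + exon2

acceptor_end = len(exon1) + len(intron)  # first base of exon 2
donors = {"long": len(exon1), "short": 6}

isoforms = {name: splice_out(pre_mrna, pos, acceptor_end)
            for name, pos in donors.items()}
```

The "long" isoform keeps all of exon 1, while the "short" isoform loses the six bases downstream of the alternative donor, so one pre-mRNA gives two different messages.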

But intron removal can occur in the absence of a spliceosome or, in fact, any protein-based enzyme at all. These introns are removed by self-splicing and rely upon the action of catalytic RNA molecules called ribozymes. Self-splicing introns are divided into two groups based on the chemistry behind the splicing. Group I introns are found in protozoa, fungal mitochondria, bacteriophage T4 and bacteria. Group II introns exist in mitochondrial and chloroplast (plastid) genes.

The region of mRNA that encodes the protein is called the coding sequence (cds) and is a duplicate of the exon region of the DNA since the introns are removed from the mRNA. Human genes are usually monocistronic, meaning that each mRNA encodes a single protein.

Regulatory sequences on the DNA, called enhancers, permit the binding of proteins that control gene expression. Enhancer sequences may be kilobase pairs away from the exons.