"[A] repeating sequence of nucleotides that forms a transcription or a regulatory signal"[1] is a box.



This is an image of Bob, the guinea pig. Credit: selbst.

Genetics involves the expression, transmission, and variation of inherited characteristics.

Def. a "branch of biology that deals with the transmission and variation of inherited characteristics, in particular chromosomes and DNA"[2] is called genetics.

Theoretical box genetics

Def. one "of two specific regions in a promoter"[3] is called a box.

AGC boxes

This is a digital photograph of Arabidopsis thaliana. Credit: Alberto Salguero Quiles en Getafe (Madrid), España.

"[T]he AGC box (10), GCC element (11), or AGCCGCC sequence (13), is an ethylene-responsive element found in the promoters of a large number of [pathogenesis related] PR genes".[4]

ATA boxes

The ATA box is a variant of the TATA box that appears in the globin and other genes. Unlike the TATA box, the ATA box lacks the first thymine (T) of the TATA sequence, and it may be tissue specific.

An ATA box may have the sequence AAATAT.[5]

CAAT boxes

As representative of the Metazoa here is an image of a twaite shad. Credit: Hans Hillewaert.

A CCAAT box (sometimes abbreviated as CAAT box or CAT box) is a distinct pattern of nucleotides along the template strand of DNA in eukaryotes.

CArG boxes

The diagram shows a model for epigenetic regulation of SRF binding to CArG box chromatin. Credit: Oliver G. McDonald, Brian R. Wamhoff, Mark H. Hoofnagle, and Gary K. Owens.

CArG boxes are present in the promoters of smooth muscle cell genes.

"CArG box [CC(A/T)6GG] DNA [consensus] sequences present within the promoters of SMC genes play a pivotal role in controlling their transcription".[6]

C/D boxes

This example of a C/D box is a small nucleolar RNA 73 (snoRNA U73). Credit: Rfam database (RF00071).

"Located within the introns of very long transcripts extending downstream of SNRPN, there are clusters of paternally expressed C/D box–containing snoRNAs that are highly expressed in the brain5,6."[7]

For "box C/D snoRNAs, boxes C and D and an adjoining stem form a vital structure, known as the box C/D motif."[8]

"The [C and D] box elements are essential for snoRNA production [transcription] and for snoRNA-directed modification of rRNA nucleotides."[8]

CENP-B boxes

The 17-bp motif of the CENP-B box repeats in DNA monomers. Credit: Jun-ichirou Ohzeki, Megumi Nakano, Teruaki Okada, Hiroshi Masumoto.

"The human α-satellite consensus sequence contains only three CpG sequences within its 171 base-pair sequence [23]. Interestingly, two of the three CpG sequences in the α-satellite consensus sequence are located within site 1 (5′-pTpTpCpG-3′) and site 3 (5′-pCpGpGpG-3′) of the CENP-B box (Fig. 1A;[9])."[9]

CGCG boxes

"The minimum DNA-binding elements are 6-bp CGCG box, (A/C/G)CGCG(C/G/T)."[10]

"The promoter regions are assumed to be within ∼1 kb upstream of the starting transcription site (for the known genes) or the first ATG (for the predicted genes). These genes are related to ethylene signaling (EIN3) and ABA signaling (a putative ABA responsive protein), light perception (phytochrome A, phyA), stress responsive such as the DNA repairing protein, heat shock protein, touch protein (TCH 4), and CaM-regulated ion channel. CaM genes (CaM2 andCaM3) and AtSR6 also contains CGCGcis-elements in their promoter regions."[10]


Notation: let chromo stand for chromatin organization modifier.[11]

A "[c]hromatin organization modifier (chromo) domain is a conserved region of around 50 amino acids found in a variety of chromosomal proteins, which appear to play a role in the functional organization of the eukaryotic nucleus."[12]

DREB boxes

It "is likely that the present OsDREBL may preferably bind to other elements such as the ethylene responsive element GCC box (AGCCGCC), rather than the CRT/DREB box (TACCGACAT), since the two boxes were very similar."[13]

"Based on DNA-binding data, this group [one AP2/ERF domain only] has been subdivided into dehydration-responsive element-binding (DREB)-like proteins, which interact with the dehydration-responsive, or cold-repeat, element (consensus sequence TACCGACAT) (Liu et al. 1998; Stockinger et al. 1997) and the EREBP-like (ethylene responsive element binding protein) proteins binding the GCC box (consensus sequence AGCCGCC) (Ohme-Takagi and Shinshi 1995; Sakuma et al. 2002)."[14]

Enhancer boxes

This is an image of Dendromus mysticalis, the chestnut climbing mouse. Credit: Kenneth Worm.

An E-box (Enhancer Box) is a DNA sequence which usually lies upstream of a gene in a promoter region.

E2 boxes

"The most dramatic impact on immunoglobulin gene enhancer activity was observed upon mutation of sites that contain an E2-box motif (G/ACAGNTGN)."[15]

"The E box sites that are most important are those of the E2 box class (GCAGXTGG/T). Two E2 box sites are present in the immunoglobulin heavy chain gene enhancer, p.E2 and /~E5, and one is present in the kappa enhancer, designated KE2 [29-31]."[16]

F boxes

"The F-box is a protein motif of approximately 50 amino acids that functions as a site of protein-protein interaction."[17]

"The F-box of Elongin A binds Elongin C (El C). The association of Elongins B and C with A increases Elongin A transcriptional activity."[17]

"SCF complexes generally recognize substrates after they are phosphorylated on specific epitopes [10]. Phosphorylation is one of the major mechanisms used by cells to rapidly transduce signals. SCF complexes are therefore ideal for dynamic processes that require an abrupt change to be made irreversible (at least in the short term) via the degradation of key proteins. Examples of such processes are cell-cycle phase transitions - during which the cell-cycle regulators that were required for the previous phase are degraded as the cell enters the new phase - and shifts in transcription that last for a longer time period than otherwise because a transcriptional inhibitor is degraded. There is a wide variety of SCF targets that include cell-cycle regulators, for example, G1-phase cyclins, cyclin-dependent kinase inhibitors, DNA replication factors, and transcription factors that promote cell-cycle progression, as well as non-cell-cycle functions, such as a cytoskeletal regulator, cell-surface receptors, transcription-factor inhibitors, and non-cell-cycle transcription factors (Table 2)."[17]

"Second, Elongin A, the transcriptionally active subunit of the Elongin (SIII) complex - which facilitates transcription elongation by RNA polymerase II [16] - is an F-box protein (Figure 2c). Elongin A was isolated by virtue of its ability to increase the catalytic rate of transcript elongation by RNA polymerase II in vitro [16]. Binding of the other components of the complex, Elongin B and C, increases the specific activity of Elongin A. The F-box motif of Elongin A is in the smallest region shown to be sufficient for Elongin A to bind Elongin C in both yeast and humans [17,18]. Elongin C has homology to Skp1; the F-box-Elongin C interaction may therefore be evolutionarily conserved."[17]

Forkhead boxes

Forkhead "is named for the Drosophila fork head protein, a transcription factor which promotes terminal rather than segmental development."[18]

Fur boxes

Notation: Let Fur stand for ferric uptake regulation.

"The Fur protein [...] acts as a transcriptional repressor of iron-regulated promoters by virtue of its Fe2+-dependent DNA binding activity (5, 25, 32, 33)."[19]

"Under iron-rich conditions Fur binds the divalent ion, acquires a configuration able to bind target DNA sequences (generally known as Fur boxes or iron boxes, [...]), and inhibits transcription from virtually all the genes and operons repressed by the metal."[19]

When "iron is scarce, the equilibrium is displaced to release Fe2+, the RNA polymerase accesses cognate promoters, and the genes for the biosynthesis of siderophores and other iron-related functions are expressed (41, 55)."[19]

The "sequence 5′-GATAATGATAATCATTATC-3′ [is] the functional target of the Fur protein."[19]

Many "iron-regulated promoters appear to have not just one Fur box but multiple, sometimes overlapping, boxes (42, 53, 87)"[19]

The "same 19-bp sequence [5′-GATAATGATAATCATTATC-3′] can be viewed as a combination of three adjacent repeats, 5′-NATA/TAT-3′."[19]

G boxes

The "perfect palindrome 5'-GCCACGTGGC-3' which is also known as the G-box motif."[20]

"TAF-1 can bind to the G-box and related motifs and that it functions as a transcription activator."[20]

"A G-box-related motif, containing the core sequence CACGTG is also present in the 5' regions of two other classes of light-responsive genes".[20]

GC boxes

A GC box is a distinct pattern of nucleotides found in the promoter region of some eukaryotic genes upstream of the TATA box and approximately 110 bases upstream from the transcription initiation site.

GCC boxes

The "EREBP-like (ethylene responsive element binding protein) proteins [bind] the GCC box (consensus sequence AGCCGCC) (Ohme-Takagi and Shinshi 1995; Sakuma et al. 2002)."[14]

H boxes

An H box has a consensus sequence of 3'-ACACCA-5'.[21]

HMG boxes

"Upstream Binding Factor (UBF) is important for activation of ribosomal RNA transcription and belongs to a family of proteins containing nucleic acid binding domains, termed HMG-boxes, with similarity to High Mobility Group (HMG) chromosomal proteins."[22]

"Most HMG box proteins contain two or more HMG boxes and appear to bind DNA in a relatively sequence-aspecific manner (5, 13, 15, 16 and references therein). [...] they all appear to bind to the minor groove of the A/T A/T C A A A G-motif (10, 14, 18-20)."[23]

"In mammals, the Tcf/Lef family consists of four genes: Tcf‐1, Lef‐1, Tcf‐3 and Tcf‐4. All TCF/LEF proteins display several common structural features (48,49). They contain a nearly identical DNA‐binding domain, the HMG box, recognizing the consensus sequence A/T A/T CAAA."[24]

HY boxes

A core responsive element is the hypertrophy region HY box, located between nucleotides (nts) -89 and -60 relative to the transcription start site.[25]

MADS boxes

"The MADS-box encodes a novel type of DNA-binding domain found so far in a diverse group of transcription factors from yeast, animals, and seed plants."[26]

"The MADS-box comprises 180 nucleotides, encoding 60 amino acids [...] MADS is an acronym for the four DNA-binding proteins MCM1 [minichromosome maintenance gene 1], AGAMOUS [...], DEFICIENS [...], and SRF [serum response factor]."[26]

The "[Antirrhinum majus mutant squamosa(squa)] SQUA is a member of a family of transcription factors which contain the MADS-box, a conserved DNA binding domain."[27]

The "MADS-box is [...] AGAGGGAAAGTACAACTGAAGAGGATAGAGAACAAGATCAATAGACAGGTGACTTT CTCAAAGAGGAGAGGTGGATTGTTGAAAAAAGCTCATGAGCTCTCTGTGCTTTGTG ATGCTGAAGTGGCTCTTATTGTCTTCTCTAATAAGGGGAAGCTATTTGAGTATTCT ACTGAT",[27] which has 174 nucleotides (nts) and begins with the nucleotides for the amino acids RGK.[27] The six nucleotides following the MADS box are "TCTTGC"[27] which may be the additional six needed to get to 180 nts.

P boxes

"As VRI [target gene: vrille (VRI)] accumulates in the nucleus during the mid to late day, it binds VRI/PDP1ϵ binding sites (V/P-boxes) [consensus A(/G)TTA(/T)T(/C):GTAAT(/C)], to repress Clk and cry transcription (Hardin, 2004)."[28]

"REV-ERBα and RORa are nuclear receptors rather than bZIP transcription factors like VRI and PDP1ϵ, and they regulate transcription by binding RORE elements rather than V/P-boxes (Bell-Pedersen et al., 2005)."[28]

Pribnow boxes

"Although the first five bases of the conserved sequence are identical to the first five bases of the Pribnow box (TATAA), the sixth base of the Pribnow box is a 100 per cent conserved T (refs 15-17) while the 100 per cent conserved A found here is actually more similar to eukaryotic promoter sequences20."[29]

"Two domains upstream of the start site of transcription have been identified for which a consensus sequence has been formulated(1-5). These domains are the -35 sequence (5'-T-T-G-A-C-A) and the Pribnow box (5'-T-A-T-A-A-T) in the -10 region. Both domains are in close contact with the RNA polymerase during initiation of RNAsynthesis (2,6)."[30]

Prolamin boxes

"The BPBF [barley prolamin-box (P-box) binding factor] expressed in bacteria as a GST-fusion binds a P-box 5′-TGTAAAG-3′ containing oligonucleotide derived from the promoter region of anHor2gene."[31]

"The primary structure of hordein [barley prolamins] polypeptides is closely related to that of prolamins from other grass species from the Pooideae subfamily, such as wheat and rye (Shewry & Tatham 1990;Shewry et al. 1995). The close evolutionary relationship is also manifested by the conservation of a putative regulatory element in their gene promoters, the endosperm box (Forde et al. 1985;Kreis et al. 1985). This conserved region consists of two motifs, a 7 bp element (5′TGTAAAG3′) termed the Prolamin Box (P-box) or endosperm motif (EM) followed at a distance of up to 8 nucleotides by the GCN4-like motif (GLM) which has the 5′(G/A)TGA(G/C)TCA(T/C)3′ consensus sequence (reviewed by Müller et al. 1995)."[31]

Pyrimidine boxes

"Functional analyses of a number of hydrolase gene promoters, induced by gibberellin (GA) in aleurone cells following germination, have identified a GA-responsive complex as a tripartite element containing a pyrimidine box motif 5′-CCTTTT-3′."[32]

"Although this GARC [GA responsive complex] may not always be tripartite, most often it includes three sequence motifs, the TAACAAA box or GA responsive element (GARE), the pyrimidine box CCTTTT, and the TATCCAC box (Skriver et al., 1991;Gubler and Jacobsen, 1992; Rogers et al., 1994)."[32]

The "complementary strand of the pyrimidine box element (5′-CTTTT-3′) in GA-induced hydrolase gene promoters was identical to the core sequence (5′-AAAAG-3′) recognized by PBF in prolamin gene promoters (P-box: 5′-T/AAAAG-3′; Vicente-Carbajosa et al., 1997; Mena et al., 1998; Yanagisawa and Schmidt, 1999)".[32]

TACTAAC boxes

"The comparison of the two rp51 genes [...] suggests that this homology might be more extensive if one allows up to 3 bases to differ from a larger consensus sequence, ATTTACTAAC."[33]

"A consensus sequence TACTAA(C/T) was derived for the branch site of Dictyostelium introns."[34]

T boxes

The T box is a DNA-binding domain.[35]

"T-box genes encode transcription factors involved in the regulation of developmental processes."[35]

TATA boxes

This image is a drawing of Haloquadratum walsbyi. Credit: Rotational.

The TATA box (also called Goldberg-Hogness box)[36] is a DNA sequence (cis-regulatory element) found in the promoter region of genes in archaea and eukaryotes;[37] approximately 24% of human genes contain a TATA box within the core promoter.[38]

The TATA box is a "binding site of either general transcription factors or histones.

U boxes

"The U box is a domain of ∼70 amino acids that is present in proteins from yeast to humans."[39]

"The prototype U box protein, yeast Ufd2, was identified as a ubiquitin chain assembly factor that cooperates with a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2), and a ubiquitin-protein ligase (E3) to catalyze ubiquitin chain formation on artificial substrates."[39]

"The UFD2 protein and its homologs in other eukaryotes share a conserved domain designated the ‘U box’."[40]

"The U box mediates the interaction of UFD2 with ubiquitin conjugated proteins [...] the U box is a derived version of the RING-finger domain that lacks the hallmark metal-chelating residues of the latter [5,6] but is likely to function similarly to the RING-finger in mediating ubiquitin-conjugation of protein substrates."[40]

X boxes

"The so-called X (or X1) box in the promoter of the human MHC class II DRA gene is the binding site for a ubiquitous mammalian sequence-specific DNA-binding protein called RFX, NF-X, NF-Xc, or RFX1 (4,19,23,24,27)."[41]

"RFX is MDBP [...] the MDBP (RFX) recognition site region in the DRA promoter can be considered to extend from positions -100 to -112 [...] a possible binding site for MDBP which begins 88 bp after the first residue of the presumably full-length RFX1 (MDBP) cDNA (26). This site (RFX+88) is as follows: 5'-GTTGGCATGGCAAC-3'."[41]

Y boxes

The "Y-box [is] an inverted CCAAT box, in the promoter region of many genes; this domain is highly conserved in evolution [1, 3, 4]."[42]


