Molecular cloning of full-length coding sequences and characterization of α chains for donkey (Equus asinus) type I collagen

The donkey (Equus asinus) is a good source for collagen production. However, information on the mRNA and protein sequences of donkey collagen had not been reported. In this work, the cDNA sequences encoding the proα1 and proα2 chains of donkey type I procollagen were determined from six and seven overlapping RT-PCR products, respectively. Further characterization of the deduced amino acid sequences detailed the propeptides, telopeptides and triple-helical regions of the donkey type I procollagen and collagen chains. The two proα chains of donkey type I procollagen share high similarity with the corresponding sequences of the other mammalian species examined in this study. Because lysine and proline are significant for the structure and function of collagen, the distribution patterns of these two characteristic residues in the α chains of donkey type I collagen were examined. The mRNA expression levels of type I collagen in donkey tissues were evaluated by quantitative real-time PCR.


INTRODUCTION
The collagen family constitutes one of the major components of the extracellular matrix. Besides providing structural support to the body, collagen plays important roles in a broad range of physiological processes, such as development and cell adhesion (Borchiellini et al., 1996; Liu et al., 1997; Aumailley and Gayraud, 1998). Collagen and its derivatives have broad applications in the food, pharmaceutical and cosmetic industries. Type I collagen is the most abundant member of the collagen family and the major matrix protein in skin (Miller and Gay, 1987). It is normally a heterotrimer of two identical α1 chains and one α2 chain [α1(I)₂α2(I)], and occasionally a homotrimer of α1 chains [α1(I)₃], with a characteristic triple-helical structure (Van der Rest and Garrone, 1991). The α1(I) and α2(I) polypeptide chains are first synthesized individually as the precursor proα1(I) and proα2(I) chains, which carry additional N- and C-terminal propeptides; after several steps of post-translational modification, they assemble into trimeric procollagen molecules (Bellamy and Bornstein, 1971; Alvares et al., 1999). In the extracellular space, the N- and C-propeptides are cleaved off by procollagen N- and C-propeptidases at their specific cleavage sites in the type I procollagen chains (Ovens et al., 2000; Tuderman et al., 1978). The remaining structure, consisting of the triple-helical region flanked by the non-helical N- and C-telopeptides, forms the mature type I collagen molecule. The triple-helical region is the predominant feature of type I collagen. Accordingly, each polypeptide chain of the type I collagen molecule is dominated by a proline-rich Gly-X-Y repeating sequence, in which glycine occupies every third position and the X-position is often occupied by proline. Proline and lysine residues in the Y-position are often hydroxylated during post-translational modification, a process associated with the normal assembly and function of type I collagen. Hydroxyproline is involved in the formation of intramolecular hydrogen bonds and in the triple-helical conformation of collagen. In addition, collagen-derived peptides containing hydroxyproline often show physiological activities (Knight et al., 1999; Laskin et al., 1986; Ohara et al., 2010). Hydroxylysine provides glycosylation sites during post-translational modification and takes part in the covalent cross-linking of collagen molecules (Gelse et al., 2003; Kadler et al., 2007).

The donkey (Equus asinus) is a good source for collagen production, and donkey-derived collagen has performed well in the health food and supplement industry. Preventing the adulteration of donkey-derived collagen products with collagen from less desirable species is important for health, religious and economic reasons, and mRNA and protein sequences are important information for developing authentication assays for raw and processed donkey collagen. In addition, a traditional medicine made from donkey hide, which is rich in type I collagen, is widely used in Asia to improve hematopoiesis (Wu et al., 2007). However, the mechanism of this traditional medicine is still not fully understood. Since some collagen-derived peptides have been shown to have physiological activities (Monboisse et al., 1990; Mizuno and Kuboki, 2001; Postlethwaite and Kang, 1976; Ohara et al., 2010), it can be hypothesized that the particular activity of donkey collagen products derives from sequences that distinguish donkey collagen from that of other species. Therefore, the complete sequence of donkey type I collagen, the major collagen type in skin, is needed both to identify characteristic peptides with potential activity and to further define the medical mechanism of the traditional medicine made from donkey hide.

*Corresponding author. E-mail: yxzhang@ecust.edu.
Abbreviations: CDS, coding sequence; PCR, polymerase chain reaction; qRT-PCR, quantitative real-time PCR; cDNA, complementary DNA.
In this work, we determined the complete coding cDNA sequences of both donkey proα1(I) and proα2(I) chains and further characterized the deduced complete amino acid sequences of donkey type I procollagen and collagen. Transcript levels of donkey type I collagen in its main expressing tissues were also examined.

Tissue collection
Fresh donkey skin, lung and liver samples were collected and immediately immersed in RNAlater (Qiagen GmbH, Germany) according to the manufacturer's instructions. Tissue samples were stored at −20°C.

RNA extraction and first-strand cDNA synthesis
Tissues were disrupted with a Pellet Pestle Cordless Motor (Kimble, New Jersey, USA) and sterile grinding pestles (Bio Basic Inc., CAN). Total RNA was extracted with Trizol (Invitrogen, Burlington, USA) according to the manufacturer's protocol. First-strand cDNA was synthesized from 5 μg of total RNA with an RNase H-negative reverse transcriptase (Superscript III; Invitrogen) using oligo(dT)20 (Toyobo, Osaka, JP) plus random hexamer (Takara, Dalian, CHN) primers.

Quantitative real-time PCR (qRT-PCR) analysis of donkey type I collagen mRNA expression
Quantitative real-time PCR (qRT-PCR) was performed to determine the expression levels of type I collagen in donkey tissues. Primers for donkey type I collagen were designed from the cDNA sequence obtained in this work: the sense primer was 5'-CGTCTGGTACGGCGAAAG-3' and the antisense primer was 5'-TCAGGCGCAGGAAAGTCA-3'. The housekeeping gene β-actin was selected as an internal standard to normalize the abundance of qRT-PCR products; the sense and antisense primers for donkey β-actin were 5'-CTGGCACCACACCTTCTAC-3' and 5'-ACATGATCTGGGTCATCTT-3'. Each 20 μl reaction contained 1 μl cDNA template, 0.8 μl of 10 μM sense primer, 0.8 μl of 10 μM antisense primer, 10 μl SYBR Green real-time PCR master mix (Toyobo) and 7.4 μl sterile water. qRT-PCR was performed on an FTC-2000 detector (FungLyn Biotech, Shanghai, CHN) with the following program: 94°C for 4 min; 40 cycles of 94°C for 20 s, 50°C for 30 s and 72°C for 15 s; and a final extension at 72°C for 7 min. Results were analyzed using the comparative Ct (2^−ΔΔCt) method (Livak and Schmittgen, 2001). Standard curves were constructed to confirm that type I collagen and β-actin amplified with similar PCR efficiencies. The expression level of type I collagen in donkey skin was used as the calibrator, and each sample was tested in triplicate.

Table 1. Primers used for the cloning of donkey COL1A1 and COL1A2 coding cDNA. All primers are listed in their order of use. Nucleotides are numbered from the start of the untranslated regions of the cDNA sequences obtained in this work.
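The comparative Ct calculation used for the qRT-PCR data can be sketched as follows. This is a minimal illustration, assuming that type I collagen and β-actin amplify with similar, close to 100%, efficiency (the premise the standard curves were meant to check). The function name and the Ct values in the usage lines are hypothetical and are not data from this study.

```python
from statistics import mean


def relative_expression(target_ct, ref_ct, calib_target_ct, calib_ref_ct):
    """Relative expression of a target gene by the 2^-ddCt method.

    Each argument is a list of replicate Ct values (e.g. the triplicate
    wells for one tissue). Assumes target and reference genes amplify
    with similar, close to 100%, efficiency.
    """
    # dCt: normalize the target (type I collagen) to the reference (beta-actin)
    d_ct_sample = mean(target_ct) - mean(ref_ct)
    d_ct_calibrator = mean(calib_target_ct) - mean(calib_ref_ct)
    # ddCt: compare the sample tissue with the calibrator tissue (skin here)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** (-dd_ct)


# Hypothetical Ct values, for illustration only.
skin_col, skin_actin = [20.0, 20.1, 19.9], [15.0, 15.0, 15.0]
lung_col, lung_actin = [21.0, 21.1, 20.9], [15.0, 15.0, 15.0]
print(round(relative_expression(lung_col, lung_actin, skin_col, skin_actin), 3))  # 0.5
```

Averaging replicate Ct values before taking differences is one common choice; the calibrator tissue evaluates to 1.0 by construction.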

Cloning and sequencing of entire CDSs in donkey COL1A1 and COL1A2 cDNAs
To clone the entire coding sequences (CDSs) of the donkey COL1A1 and COL1A2 cDNAs, a strategy of joining overlapping PCR fragments was adopted (Figure 1). Six overlapping PCR fragments were used to cover the donkey COL1A1 cDNA, including the entire translated region. To amplify the full-length CDS of COL1A1, primers Col1a11-1 and Col1a13-2 were designed outside the start and stop codons, respectively. Similarly, for the donkey COL1A2 cDNA, seven overlapping PCR fragments were amplified to cover the full-length translated region, with primers Col1a21-1 and Col1a24-2 designed flanking the CDS. The entire CDS is 4392 bp long in the donkey COL1A1 cDNA (Figure 2a) and 4095 bp long in the COL1A2 cDNA (Figure 2b). When aligned with the cDNA sequences used for primer design, the CDS of donkey COL1A1 shows 94.08, 93.70 and 93.54% nucleotide identity with the corresponding sequences of cattle, dog and human, respectively (the horse COL1A1 cDNA sequence in GenBank is incomplete). The CDS of donkey COL1A2 shows 92.11, 92.90, 99.54 and 92.34% identity with the corresponding sequences of cattle, dog, horse and human, respectively. The donkey COL1A1 and COL1A2 cDNA sequences containing the entire CDSs have been submitted to GenBank under accession nos. FJ594763 and FJ594764.
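Percent identity values such as those quoted for the donkey CDSs depend on how aligned columns are counted, and the text does not state which convention the alignment software used. The sketch below shows one common convention, assumed here for illustration only: identical columns divided by the columns where neither sequence has a gap. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence is gapped.

    Other conventions divide by the full alignment length or by the length
    of the shorter ungapped sequence, which gives different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Toy aligned fragments (not real collagen cDNA): 4 matches in 5 ungapped columns.
print(percent_identity("ATGC-A", "ATGCTT"))  # 80.0
```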

Characterization of deduced polypeptide chains of donkey type I collagen
The proα1(I) chain deduced from the donkey COL1A1 cDNA consists of 1463 amino acids, and the proα2(I) chain deduced from the donkey COL1A2 cDNA contains 1364 amino acids. To further characterize the deduced donkey type I procollagen and collagen polypeptide chains, the donkey sequences were aligned and compared with those of other mammalian species. Both proα1(I) and proα2(I) chains are highly conserved among species in the multiple amino acid sequence alignment (Figure 3). The predicted donkey proα1(I) chain shares similarities of 96.58% with cattle (GenBank accession no. P02453), 96.93% with dog (GenBank accession no. Q9XSJ7), 96.72% with human (GenBank accession no. P02452), 91.87% with mouse (GenBank accession no. P11087) and 91.46% with rat (GenBank accession no. P02454). No similarity value is given for donkey and horse proα1(I) because the horse amino acid sequence (GenBank accession no. XP_001499636) is incomplete. The predicted donkey proα2(I) chain shows similarities of 94.87% with cattle (GenBank accession no. P02465), 95.39% with dog (GenBank accession no. O46392), 99.71% with horse (GenBank accession no. XP_001492989), 93.56% with human (GenBank accession no. NP_000080), 89.94% with mouse (GenBank accession no. Q01149) and 90.52% with rat (GenBank accession no. P02466).
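The deduced chain lengths follow directly from the CDS lengths reported above: a complete CDS is a whole number of codons, and the final stop codon is not translated. The short check below assumes, consistent with the numbers, that the reported CDS lengths include the stop codon; the function name is illustrative.

```python
def protein_length_from_cds(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a CDS of cds_bp nucleotides."""
    if cds_bp % 3 != 0:
        raise ValueError("a complete CDS length must be a multiple of 3")
    codons = cds_bp // 3
    # The terminal stop codon does not encode an amino acid.
    return codons - 1 if includes_stop else codons


for gene, cds_bp in {"COL1A1": 4392, "COL1A2": 4095}.items():
    print(gene, cds_bp, "bp ->", protein_length_from_cds(cds_bp), "aa")
# COL1A1 4392 bp -> 1463 aa
# COL1A2 4095 bp -> 1364 aa
```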
The signal peptidase and N-propeptidase cleavage sites are highly conserved in both proα1(I) and proα2(I) chains among the mammalian species compared in this work (Figure 3). The N-propeptide consists of 139 amino acids in the donkey proα1(I) chain and 57 amino acids in the donkey proα2(I) chain. The greatest […] type I procollagen by propeptidases, the mature α1(I) chain is composed of 1056 amino acids and the mature α2(I) chain contains 1038 amino acids. The proline-rich Gly-X-Y triple-helical region is the predominant structure of type I collagen. The Gly-X-Y pattern is maintained from amino acid 1 to 1014 in both deduced α chains of donkey type I collagen, and the length of this triplet repeat is the same in both α chains of all species compared here (Figure 3). In the donkey α1(I) chain, the N- and C-telopeptides flanking the triple-helical region are composed of 16 and 26 amino acids, respectively; in the donkey α2(I) chain, the N-telopeptide has 9 amino acids and the C-telopeptide 15.
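The domain lengths reported above are internally consistent: the two telopeptides plus the 1014-residue Gly-X-Y region add up to the mature chain lengths, and 1014 residues correspond to 338 triplets. The sketch below performs that arithmetic and includes a simple Gly-X-Y test; `DOMAINS` and `is_gly_x_y` are illustrative names, and the test only checks glycine periodicity, not whether a segment actually forms a helix.

```python
# Domain lengths (amino acids) of the mature donkey alpha chains, from the text.
DOMAINS = {
    "alpha1(I)": {"N-telopeptide": 16, "triple helix": 1014, "C-telopeptide": 26},
    "alpha2(I)": {"N-telopeptide": 9, "triple helix": 1014, "C-telopeptide": 15},
}


def is_gly_x_y(region: str) -> bool:
    """True if the length is a multiple of 3 and every third residue,
    starting at the first, is glycine (G)."""
    return len(region) % 3 == 0 and all(
        region[i] == "G" for i in range(0, len(region), 3)
    )


for chain, parts in DOMAINS.items():
    total = sum(parts.values())
    triplets = parts["triple helix"] // 3
    print(f"{chain}: {total} aa in mature chain, {triplets} Gly-X-Y triplets")
# alpha1(I): 1056 aa in mature chain, 338 Gly-X-Y triplets
# alpha2(I): 1038 aa in mature chain, 338 Gly-X-Y triplets

print(is_gly_x_y("GPPGAPGEK"), is_gly_x_y("GPPAGP"))  # True False
```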

Distribution of proline and lysine in the donkey type I collagen
The hydroxyproline and hydroxylysine contents of the Gly-X-Y triple-helical region are important for the intramolecular and intermolecular stability of type I collagen. Hydroxyproline in the Gly-X-Y region is thought to be associated with triple-helix formation, and hydroxylysine takes part in the cross-linking of collagen molecules. Hydroxylation occurs at Y-position proline and lysine residues in the Gly-X-Y repeats of procollagen chains before helix formation (Gelse et al., 2003; Kadler et al., 2007). In the Gly-X-Y region of the donkey α1(I) chain, 116 of 237 proline residues and 24 of 36 lysine residues are in the Y-position. In the triple-helical region of the donkey α2(I) chain, 98 of 205 proline residues and 22 of 31 lysine residues are in the Y-position. For each α(I) chain, comparison of the proline residues in the Gly-X-Y regions of the mammalian species examined here shows that, although their specific locations vary, the total numbers and the ratios of X- to Y-position prolines remain similar. This suggests that the overall distribution of Y-position prolines in the Gly-X-Y region matters more for type I collagen molecule formation than their exact locations. In contrast, the total numbers, X/Y-position distributions and locations of lysine residues are highly conserved in the type I collagen triple-helical regions. In particular, in the Gly-X-Y region of the α1(I) chain, the number and specific locations of lysine residues are almost identical among the species examined, except in the rat, whose corresponding region has one additional X-position lysine. This observation suggests that the conservation of lysine residues in the Gly-X-Y region is important for the function and conformation of type I collagen.
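Counts such as "116 of 237 prolines in the Y-position" come from walking the Gly-X-Y region triplet by triplet. The text does not describe the tool used, so the following is only a minimal sketch of that kind of tally, assuming the input string starts at the first Gly of the repeat (residue 1 of the triple-helical region). The function name and the toy sequence are hypothetical.

```python
from collections import Counter


def xy_position_counts(helix: str, residues: str = "PK") -> dict:
    """Tally selected residues at the X and Y positions of a Gly-X-Y region.

    `helix` must begin with the first Gly of the repeat and its length must
    be a multiple of 3; a triplet not starting with Gly raises an error.
    """
    if len(helix) % 3 != 0:
        raise ValueError("Gly-X-Y region length must be a multiple of 3")
    counts = {r: Counter() for r in residues}
    for i in range(0, len(helix), 3):
        gly, x, y = helix[i:i + 3]
        if gly != "G":
            raise ValueError(f"expected Gly at residue {i + 1}, found {gly}")
        if x in counts:
            counts[x]["X"] += 1
        if y in counts:
            counts[y]["Y"] += 1
    return counts


# Toy sequence of four triplets (GPP GPK GAP GKP), not a real collagen segment.
c = xy_position_counts("GPPGPKGAPGKP")
print(dict(c["P"]), dict(c["K"]))  # {'X': 2, 'Y': 3} {'Y': 1, 'X': 1}
```

Applied to the 1014-residue helical region of each deduced chain, the Y-position fraction is simply `c["P"]["Y"] / (c["P"]["X"] + c["P"]["Y"])`.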
Hydroxyproline-containing peptides derived from collagen have been shown to resist gastrointestinal digestion, so they may be absorbed directly and exert physiological activity after oral administration. The amount and sequences of the hydroxyproline-containing peptides produced by digestion vary with the species from which the collagen is derived (Ohara et al., 2007; Iwai et al., 2005), which is consistent with the observation that the specific locations of Y-position prolines in the Gly-X-Y regions differ among the species aligned here. Such differences may contribute to the particular activity of the traditional medicine rich in donkey collagen-derived peptides. With the complete protein sequences available, potential active peptides in donkey type I collagen can be predicted by examining the locations of Y-position prolines and their characteristic flanking amino acids.
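One hypothetical first step toward the prediction described above is to list each Y-position proline (a candidate hydroxyproline site) together with a few flanking residues. The window size `flank` is an arbitrary illustrative choice. Real prediction of digestion-resistant peptides would also need protease cleavage specificity and hydroxylation data, which this sketch ignores. The function name and toy input are hypothetical.

```python
def y_proline_windows(helix: str, flank: int = 2) -> list[tuple[int, str]]:
    """Return (1-based position, surrounding peptide) for each Y-position Pro.

    `helix` must start at the first Gly of the Gly-X-Y repeat, so the Y
    position is the third residue of every triplet (0-based index 2, 5, 8, ...).
    """
    hits = []
    for i in range(2, len(helix), 3):
        if helix[i] == "P":
            start, end = max(0, i - flank), min(len(helix), i + flank + 1)
            hits.append((i + 1, helix[start:end]))
    return hits


# Toy sequence (GPP GAP GEK), not a real donkey collagen segment.
print(y_proline_windows("GPPGAPGEK"))  # [(3, 'GPPGA'), (6, 'GAPGE')]
```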
Lysine residues in the N- and C-telopeptides have been shown to be involved in the intermolecular covalent cross-linking of collagen molecules during collagen fibril formation (Bank et al., 1999; Eyre et al., 1984). In the donkey α1(I) chain, the N- and C-telopeptides each contain one lysine residue, whereas in the donkey α2(I) chain only the N-telopeptide has a lysine residue. The number and specific locations of all these lysine residues are identical in all species examined here. This strict conservation suggests that the telopeptide lysines are important for the normal function and structure of type I collagen.
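The conservation claim for telopeptide lysines amounts to checking that every aligned sequence has lysine in exactly the same alignment columns. A minimal sketch of that check follows; it assumes the telopeptides have already been aligned, and the function names and toy strings are hypothetical, not real telopeptide sequences.

```python
def lysine_columns(aligned_seq: str) -> tuple[int, ...]:
    """1-based alignment columns occupied by lysine (K)."""
    return tuple(i + 1 for i, aa in enumerate(aligned_seq) if aa == "K")


def lysines_conserved(alignment: dict[str, str]) -> bool:
    """True if every aligned sequence has Lys in exactly the same columns."""
    patterns = {lysine_columns(seq) for seq in alignment.values()}
    return len(patterns) == 1


# Toy aligned strings for two hypothetical species.
print(lysines_conserved({"species_a": "AADKAA", "species_b": "ASDKAA"}))  # True
print(lysines_conserved({"species_a": "AADKAA", "species_b": "AKDAAA"}))  # False
```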

Tissue expression of donkey type I collagen
qRT-PCR was performed to determine the mRNA expression patterns of donkey type I collagen in its main expressing tissues. Type I collagen is the most abundant component of the skin extracellular matrix (Gelse et al., 2003), and skin is an important raw material in the industrial production of donkey collagen. Type I collagen is also expressed in mammalian lung and liver, and its increased expression is involved in the pathology of pulmonary and hepatic fibrosis (Ratziu et al., 1998; Friedman, 2000; Zhang et al., 1994). In the work described here, the transcript levels of donkey type I collagen were determined in these tissues. Type I collagen expression was detected in all three tissues, with the highest level in the skin. Transcript levels in the lung and liver were 44% and 69% lower, respectively, than in the skin (Figure 4).
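To make the Figure 4 result easier to read, the reported reductions can be converted into relative expression levels and the ΔΔCt differences they would imply. This back-calculation assumes the 2^−ΔΔCt model with roughly 100% amplification efficiency; it is an interpretation aid, not additional data, and `implied_ddct` is an illustrative name.

```python
import math


def implied_ddct(percent_lower: float) -> float:
    """ddCt for an expression level `percent_lower` % below the calibrator,
    assuming 2^-ddCt quantification with ~100% PCR efficiency."""
    relative_level = 1 - percent_lower / 100
    return -math.log2(relative_level)


# Reductions relative to skin as reported in the text (Figure 4).
for tissue, pct in {"lung": 44, "liver": 69}.items():
    print(f"{tissue}: relative level {1 - pct / 100:.2f}, implied ddCt ~ {implied_ddct(pct):.2f}")
# lung: relative level 0.56, implied ddCt ~ 0.84
# liver: relative level 0.31, implied ddCt ~ 1.69
```

In other words, under these assumptions the collagen signal in liver crossed threshold roughly 1.7 cycles later than in skin after normalization to β-actin.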

Conclusion
This study reports donkey COL1A1 and COL1A2 cDNAs containing the complete coding sequences (CDSs), the first donkey collagen mRNA sequences to be published. Further characterization of the deduced amino acid sequences detailed the triple-helical and non-helical regions of the donkey type I procollagen and collagen chains. Proline and lysine in the α chains of different mammalian species were also compared to infer what the distribution patterns of these characteristic residues mean for the structure and function of type I collagen in mammals. Transcript levels of donkey type I collagen in its main expressing tissues were measured by qRT-PCR, and the highest expression level was detected in the skin. Donkeys supply a significant share of the animal collagen used in the health food industry, and the present work may provide useful information for research on and authentication of farm animal products and collagen-derived foods.

Figure 1. Flow chart of the cloning procedure. A strategy of joining overlapping PCR fragments was adopted to clone the entire CDSs of the donkey COL1A1 and COL1A2 cDNAs. Six and seven overlapping PCR fragments, respectively, were used to cover the donkey COL1A1 and COL1A2 cDNAs, including the entire translated region.

Figure 4. mRNA expression levels of donkey type I collagen in skin, lung and liver tissues, determined by qRT-PCR and normalized to the β-actin gene. Expression levels are presented relative to that in skin. Error bars indicate the SD in the qRT-PCR assay.