Complete nucleotide sequence and organization of the mitogenome of endangered Eumenis autonoe (Lepidoptera: Nymphalidae)

Eumenis autonoe, a member of the lepidopteran family Nymphalidae (superfamily Papilionoidea), is an endangered species found only on one isolated remote island, Jeju, in South Korea, on Halla Mt., at altitudes higher than 1,400 m. In this study, the complete mitochondrial genome (mitogenome) of E. autonoe is reported. The 15,489-bp long E. autonoe genome evidenced the typical gene content found in animal mitogenomes and harbors a gene arrangement identical to that of all other sequenced lepidopteran insects, which differs from the most common type found in insects due to the movement of tRNA Met to a position 5'-upstream of tRNA Ile. As has been observed in many other lepidopteran insects, no typical ATN codon for the COI gene is available. Thus, we also designated the CGA (arginine) found at the beginning of the COI gene as a lepidopteran COI starter, in accordance with previous suggestions. The 678-bp long A+T-rich region, which is the second longest among sequenced lepidopteran insects, harbored 10 identical 27-bp long tandem repeats plus one 13-bp long incomplete final repeat. Such a repeat sequence has, thus far, only rarely been detected in lepidopteran mitogenomes. The E. autonoe A+T-rich region harbored a poly-T stretch of 19 bp and a conserved ATAGA motif located at the end of the region, which have been suggested to function as structural signals for minor-strand mtDNA replication. Phylogenetic reconstruction using the concatenated amino acid and nucleotide sequences of the 13 protein-coding genes (PCGs) consistently supported a close relationship between Bombycoidea and Geometroidea among the six available lepidopteran superfamilies (Tortricoidea, Pyraloidea, Papilionoidea, Bombycoidea, Geometroidea and Noctuoidea). Among the true butterflies (Pieridae, Nymphalidae, Lycaenidae and Papilionidae), a closer relationship between Lycaenidae and Pieridae, excluding Nymphalidae, was consistently recovered, although this result deviated from the traditional view.


INTRODUCTION
Eumenis autonoe, a member of the lepidopteran family Nymphalidae (superfamily Papilionoidea), is listed as a first-degree endangered wild animal in Korea (Kim, 2005). Historically, this species has been distributed throughout the Northern hemisphere, including Europe, regions (Joo and Kim, 2002). Thus, this species is a very rare example that provides us with an opportunity to evaluate the historical adaptation of butterfly species to the cooler temperatures prevailing on the Korean Peninsula. Nevertheless, the on-going global warming is predicted to further diminish the current population, which is already quite small. This may be particularly true in that true butterfly species are vulnerable to local extinction as a result of global warming (Wettstein and Schmid, 1999). Therefore, it seems essential to accumulate a minimal but significant amount of genetic information on this species, which remains largely unknown from a genetic perspective. Such information is expected to be helpful for long-term conservation objectives, as well as for future use. In this regard, full-length mitochondrial genome (mitogenome) information is an important component of this knowledge. For example, the sequence information can be utilized in future population-level work, including genetic diversity estimates, population relationships between donor and recipient populations, primer design and so on.
Mitogenome sequences have already been determined in a variety of insects. These include 16 lepidopteran species, belonging to six superfamilies. However, only three of these species are true butterflies (Papilionoidea), belonging to the Papilionidae, Pieridae and Lycaenidae, respectively; another major true butterfly family, the Nymphalidae, is not represented. In this study, therefore, the complete mitogenome sequence of E. autonoe, belonging to the Nymphalidae, was determined. The mitogenome sequence was described via comparison to other insect mitogenomes, in particular to those of the 16 sequenced lepidopteran species, in terms of whole genome organization, arrangement, the major characteristics of individual genes and the composition of non-coding regions, including the A+T-rich region. Thus, the newly sequenced E. autonoe mitogenome is anticipated to enrich our understanding of the comparative biology of insect mitogenomes, particularly those of lepidopteran species.
Furthermore, the concatenated amino acid and nucleotide sequences of the 13 protein-coding genes (PCGs) of the E. autonoe mitogenome were utilized in order to reconstruct the phylogenetic relationships among the available lepidopteran superfamilies (Papilionoidea, Bombycoidea, Geometroidea, Noctuoidea, Tortricoidea and Pyraloidea) as well as among the true butterfly families within Papilionoidea (Pieridae, Nymphalidae, Lycaenidae and Papilionidae).

Genomic DNA extraction
An adult specimen of E. autonoe was collected from Halla Mt., Korea, in May 2008. Total genomic DNA was extracted with the Wizard™ Genomic DNA Purification Kit (Promega, USA), in accordance with the manufacturer's instructions.

Mitochondrial DNA amplification by long PCR
In order to sequence the complete mitogenome of E. autonoe, 500-700 bp of the E. autonoe ND5 and srRNA genes (SF1 and SF2 in Figure 1, respectively) were initially sequenced. The primers for SF1 and SF2 were designed via the alignment of several insect mitogenomes sequenced in their entirety. Based on the sequence information, two pairs of primers specific to E. autonoe were designed to amplify two overlapping long fragments (LF1 and LF2 in Figure 1) using LA Taq™ (Takara Biomedical, Japan) under the following conditions: initial denaturation for 1 min at 94°C, followed by 30 cycles of 10 s at 94°C and 15 min at 60-61°C, and a subsequent 10 min final extension at 72°C. The primer sequences for these short and long fragments are listed in Table 1. These PCR products were then utilized in the construction of a shotgun library. In brief, the DNAs were sheared into 1-5 kb fragments with a Hydroshear (Gene Machine, USA) and the DNA fraction was collected using a Chromaspin TE 1000 column. The DNA fraction was then cloned into the pUC118 vector (Takara Biomedical, Japan) and each of the resultant plasmid DNAs was isolated using a Wizard Plus SV Minipreps DNA Purification System (Promega, USA). DNA sequencing was conducted with an ABI PRISM® Big-Dye® Terminator v 3.1 Cycle Sequencing Kit and an ABI PRISM™ 3100 Genetic Analyzer (PE Applied Biosystems, USA).

Gene identification and tRNA structures
Sequences from overlapping fragments were assembled with the neighboring fragments using CLUSTAL X software (Thompson et al., 1997). The 13 PCGs, two rRNA genes and the A+T-rich region were determined by aligning the DNA or amino acid sequences with homologous regions of known full-length insect mitogenome sequences, again using CLUSTAL X. The nucleotide sequences of the PCGs were translated on the basis of the invertebrate mtDNA genetic code. The 22 tRNA genes were identified by their proposed cloverleaf secondary structure and anticodon sequences with the aid of tRNAscan-SE 1.21, using invertebrate codon predictors and a cove score cut-off of 1 (Lowe and Eddy, 1997); the folding of the predicted tRNA sequences was further confirmed by visual inspection. The sequence data were deposited in the GenBank database under accession no. GQ868707.
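The translation step can be illustrated with a minimal Python sketch. It assumes that "the invertebrate mtDNA genetic code" corresponds to NCBI translation table 5; the function name `translate` and the example sequence are hypothetical and are not part of the study's pipeline.

```python
# Invertebrate mitochondrial genetic code (NCBI translation table 5, assumed here).
# Codons are ordered with bases T, C, A, G at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon.

    A trailing partial codon (for example an incomplete stop codon) is ignored;
    codons containing characters other than A, C, G, T are reported as 'X'.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


# Hypothetical example: ATA codes for Met, TGA for Trp and AGA for Ser in this code.
example_protein = translate("ATGATATGAAGATAA")
```

In this table CGA translates to arginine, which is consistent with the CGA (arginine) codon discussed for the COI gene.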

Comparative mitochondrial gene analyses
The A+T content of each gene and of the whole genome was calculated via DNA frequency analysis (http://kd.lab.nig.ac.jp/mishima/nucfreq1.html). Nucleotide composition at each codon position of the PCGs was calculated using PAUP ver. 4.0b10 (Swofford, 2002). Gene overlap and intergenic spacer sequences were hand-counted. Nucleotide composition bias, termed "compositional skew", was calculated for the PCGs on each of the two strands and for the whole genome with the EditSeq program included in the Lasergene software package (www.dnastar.com), using the following formulas proposed by Perna and Kocher (1995): GC-skew = (G-C)/(G+C) and AT-skew = (A-T)/(A+T), where C, G, A and T are the frequencies of the four bases.
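As a minimal sketch of the skew formulas given above (not the EditSeq procedure used in the study), the calculation can be written in Python. The function name `base_skews` and the example sequence are hypothetical.

```python
from collections import Counter


def base_skews(seq: str) -> dict:
    """Return A+T content, AT-skew and GC-skew for one strand of a sequence.

    AT-skew = (A - T) / (A + T) and GC-skew = (G - C) / (G + C).
    Ambiguous characters are ignored; a sequence lacking A/T or G/C
    would raise ZeroDivisionError.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    return {
        "at_content": (a + t) / (a + t + g + c),
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }


# Hypothetical toy sequence: 3 A, 4 T, 1 G, 2 C.
example_skews = base_skews("AAATTTTGCC")
```

Negative values indicate an excess of T over A (AT-skew) or of C over G (GC-skew) on the strand that was supplied.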

Phylogenetic analysis
A phylogenetic reconstruction of relationships within the Lepidoptera was conducted on the basis of complete mitogenomes using Bayesian Inference (BI) and maximum likelihood (ML) algorithms. The alignment of the amino acid sequences of the 13 individual PCGs from the 17 sequenced lepidopteran species, with three dipteran outgroups, Bactrocera oleae (Nardi et al., 2003), Drosophila yakuba (Clary and Wolstenholme, 1985) and Anopheles gambiae (Beard et al., 1993), was constructed using CLUSTAL X software (Thompson et al., 1997) within BioEdit (Hall, 1999), under default conditions. The alignment of the nucleotide sequences of the 13 individual PCGs, on the other hand, was constructed using RevTrans ver. 1.4 (Wernersson and Pedersen, 2003), which aligns coding sequences on the basis of protein alignments.
The well-aligned blocks from the amino acid and nucleotide sequences were selected with GBlocks 0.91b (Castresana, 2000), with the maximum number of contiguous non-conserved positions set to eight. These were subsequently concatenated into amino acid (3,482 sites, which is 90% of the original sites) and nucleotide (10,608 sites, which is 91% of the original sites) sequence alignments. These alignments are available upon request.
Substitution model selection was conducted via a comparison of Akaike Information Criterion (AIC) scores (Akaike, 1974), calculated using the ProtTest program ver. 1.4 (Abascal et al., 2005) for the amino acid sequence alignment and Modeltest ver. 3.7 (Posada and Crandall, 1998) for the nucleotide sequence alignment. The mtRev-24 (Adachi and Hasegawa, 1996) + I + G + F model was selected for the BI and ML analyses of the amino acid sequences, in the absence of the recently developed mtArt model (Abascal et al., 2007) in the MrBayes package. The GTR (Lanave et al., 1984) + I + G model, on the other hand, was selected as the best-fitting model for the nucleotide sequences to be utilized in the BI and ML analyses. The BI analyses were conducted using MrBayes ver. 3.1 (Huelsenbeck and Ronquist, 2001) under the following conditions: 1,000,000 generations, four chains (one cold chain and three hot chains) and a burn-in step of the first 10,000. The confidence values of the BI tree were expressed as Bayesian posterior probabilities in percentages (BPP). The ML analysis was conducted using PHYML (Guindon et al., 2005) under the following conditions: the proportion of invariable sites as "estimated", the number of substitution rate categories as four, the gamma distribution parameter as "estimated" and the starting tree as a BIONJ distance-based tree. The confidence values of the ML tree were evaluated via a bootstrap test with 100 iterations.
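The AIC comparison underlying the model selection can be sketched as follows. This is an illustration only, assuming the standard definition AIC = 2k - 2 lnL; the log-likelihoods and parameter counts below are hypothetical placeholders, not values produced by ProtTest or Modeltest in this study.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(candidates: dict) -> str:
    """Return the name of the candidate with the lowest AIC.

    `candidates` maps a model name to (log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical values for illustration only; not results from this study.
candidate_models = {
    "JC": (-52000.0, 0),
    "HKY+G": (-50500.0, 5),
    "GTR+I+G": (-50100.0, 10),
}
selected = best_model(candidate_models)
```

With these placeholder numbers the parameter-rich model is preferred because its gain in likelihood outweighs the penalty of 2 per additional parameter.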

General mitogenome features of E. autonoe
The complete mtDNA sequence of E. autonoe was 15,489 bp in length (Table 2). The complete mitogenome consists of 2 rRNAs, 22 tRNAs, 13 PCGs and one major non-coding A+T-rich region. As is the case in many insect mitogenomes, the major strand coded for a somewhat higher number of genes (9 PCGs and 14 tRNAs), whereas somewhat fewer genes were coded in the minor strand (4 PCGs, 8 tRNAs and 2 rRNA genes), as is shown in Figure 1. The genome size of 15,489 bp was well within the range detected in the completely sequenced lepidopteran insects, with sizes ranging from 15,140 bp in Artogeia melete to 15,928 bp in Bombyx mandarina (Table 3). E. autonoe harbors 3,728 codons, excluding termination codons, and this number is most similar to those of Antheraea pernyi (3,732) and Phthonandria atrilineata (3,724) (Table 3). All individual E. autonoe mitochondrial genes were well within the size range detected in the respective genes of other lepidopteran insects (data not shown).
Thus far, the complete mitogenome sequences of all lepidopteran insects, including E. autonoe, evidence identical orientation and gene order (Figure 1). However, this differs from the most common type that has been suggested as ancestral for insects (Boore et al., 1998). The difference between the two involves the movement of tRNA Met to a position 5'-upstream of tRNA Ile, which resulted in the following order: tRNA Met, tRNA Ile and tRNA Gln.

Nucleotide composition and base bias
The nucleotide composition of the mitogenome of E. autonoe is also biased toward A+T content at 79.1%, as is the case with other lepidopteran mitogenome sequences (Table 3). This value is well within the range found in the sequenced lepidopteran insects, where these values range from 77.9% in Ochrogaster lunifer to 82.7% in Coreana raphaelis (Table 3). The composition of the PCGs of E. autonoe was A, 32.3%; T, 44.5%; C, 11.5% and G, 11.5% (Table 4). Both guanine and cytosine were quite rare and together accounted for only 23.2% (A+T at 76.8%). This value is well within the range detected in the sequenced lepidopteran insects, but was the second highest of these values, next to O. lunifer (24.4%) (Table 4). The analysis of the base composition at each codon position of the concatenated 13 PCGs of E. autonoe showed that the third codon position (88.1%) harbored a higher A+T content than the first (72.2%) and second (69.9%) codon positions, and a similar pattern was also detected in other sequenced lepidopteran species (Table 4).
To evaluate the degree of the base bias, the base skew was measured, and it was determined that the AT-skew and GC-skew in the whole genome of E. autonoe (measured from the major strand) were -0.016 and -0.243, respectively (Table 5), thereby indicating that more Ts and more Cs are encoded in the entire mitogenome of E. autonoe. In other lepidopteran insects, the AT-skew values ranged between -0.047 (C. raphaelis) and 0.059 (Bombyx mori). Thus, the frequency of adenine varied only slightly from species to species in the Lepidoptera, in that the values varied within ± 0.1 (Table 5). In most other insects, a slight A-skew has been reported and, indeed, the only exceptional cases that can be referenced were Reticulitermes (Isoptera) at 0.30 and Locusta migratoria (Orthoptera) at 0.18 (Cameron and Whiting, 2007; Flook et al., 1995). Thus, lepidopteran insects are typical among insects in terms of AT-skew in the whole genome. Unlike the AT-skew estimates, the GC-skew estimates in the sequenced lepidopteran species, including that of E. autonoe (-0.243), were all negative, with somewhat large absolute values (Table 5). Such a trend has also been noted in previous studies on Lepidoptera (Jiang et al., 2009) and Diptera (Junqueira et al., 2004), although the reason for this observed skewness remains to be clearly elucidated. With regard to the PCGs, the degree of base bias was calculated separately for the two strands of the E. autonoe mitogenome (Table 5). The major strand, which encodes 9 PCGs (ND2, ND3, ND6, COI, COII, COIII, ATP6, ATP8 and CytB), exhibited a T-skew at -0.136, whereas the minor strand, which encodes only 4 PCGs (ND1, ND4, ND4L and ND5), evidenced a T-skew at -0.191 (Table 5). Thus, the major strand, which encodes more PCGs, is considered to be less biased toward T. Similarly, all other lepidopteran species evidenced a more pronounced T-skew in the minor strand than in the major strand. With regard to GC-skew, the major-strand PCGs of E. autonoe evidenced a value of -0.184, whereas those of the minor strand exhibited a value of 0.342. Thus, the frequencies of Gs and Cs differ profoundly depending on the strand. This trend is identical in other lepidopteran species (Table 5). Thus, mutational pressures that favor Gs and Ts are more severe on the minor-strand PCGs, although such strand-based inequalities in base frequencies are yet to be clearly understood.
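The codon-position analysis described above can be sketched in Python as follows. This is not the PAUP procedure used in the study; the function name and the toy coding sequence are hypothetical.

```python
def at_content_by_codon_position(cds: str) -> list:
    """Return the A+T fraction at the first, second and third codon positions.

    The coding sequence is assumed to start in frame; a trailing partial
    codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    fractions = []
    for position in range(3):
        bases = cds[position:usable:3]  # every third base, starting at this position
        at_count = sum(base in "AT" for base in bases)
        fractions.append(at_count / len(bases))
    return fractions


# Hypothetical toy sequence with codons ATA, GCT and CGA.
first, second, third = at_content_by_codon_position("ATAGCTCGA")
```

In this toy example the third codon position is entirely A or T, mirroring the direction (though not the magnitude) of the bias reported in Table 4.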

PCGs
Table 2 shows the start and stop codons of the 13 PCGs in the mitogenome of E. autonoe. Among the 13 PCGs, 7 begin with ATG, 2 with ATT and 3 with ATC. However, no canonical ATN initiator was detected in the start region of the COI gene (Figure 2). The only plausible traditional start codon for the COI gene is ATC, which was located 19 bp inside the 5' end of the tRNA Tyr gene. However, this ATC sequence required eight to nine additional amino acids, resulting in a peculiar alignment compared with other lepidopteran insects. Thus, the ATC sequence may not be the start codon for the COI gene. Rather, CGA (arginine) was designated as the start codon, as has been suggested previously for lepidopteran insects (Kim et al., 2009). This codon is present within a highly conserved region throughout all sequenced lepidopteran insects, including E. autonoe (Figure 2). Furthermore, it has been previously demonstrated that the CGA is well conserved in 39 lepidopteran insect species, encompassing eight families. Thus, it has been previously suggested that the sequence may be functionally constrained and may represent a synapomorphic characteristic for Lepidoptera (Kim et al., 2009). TAA functions as a stop codon in 10 genes, but three of the 13 PCGs harbored an incomplete stop codon consisting of a single thymine (COII, ND5 and ND4). Such an incomplete stop codon is frequently detected in insect mitogenomes (Cha et al., 2007; Hong et al., 2008; Kim et al., 2009). Incomplete stop codons of this kind are thought to be completed to TAA by post-transcriptional modifications occurring during the mRNA maturation process, such as polyadenylation (Anderson et al., 1981; Ojala et al., 1981), but more experimental data will be required for a more decisive conclusion.
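The proposed completion of an incomplete stop codon by polyadenylation can be illustrated with a small Python sketch. It only models the idea that added A residues turn a terminal T (or TA) into TAA; the function name and the toy sequence are hypothetical.

```python
def complete_stop_codon(cds: str) -> str:
    """Pad a coding sequence that ends in a partial stop codon (T or TA) with A
    residues, mimicking polyadenylation, so that it ends in a complete TAA.

    Sequences whose length is already a multiple of three are returned unchanged.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        return cds
    partial = cds[-remainder:]
    if partial not in ("T", "TA"):
        raise ValueError("sequence does not end in a partial TAA stop codon")
    return cds + "A" * (3 - remainder)


# Hypothetical toy transcript ending in a single T.
completed = complete_stop_codon("ATGAAAT")
```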

rRNA and tRNA genes
The lrRNA and srRNA genes of the E. autonoe mitogenome were 1,335 and 775 bp in length, respectively. As has been observed in other insects, including lepidopteran insects (Boore et al., 1998; Kim et al., 2009), these genes are located between tRNA Leu (CUN) and tRNA Val and between tRNA Val and the A+T-rich region, respectively. The A+T content of the lrRNA and srRNA genes is 83.7 and 85.3%, respectively, which is consistent with the A+T content observed for these genes in other lepidopteran insects (Table 3).
A total of 22 tRNA genes (one specific for each amino acid and two for leucine and serine) were identified within the mitogenome. The tRNAs were interspersed throughout the mitogenome and ranged in length from 60 to 71 bp (Table 2). All tRNAs but tRNA Ser (AGN) were shown to be folded into the expected cloverleaf secondary structures (Figure 3). The unusual tRNA Ser (AGN) lacked the DHU loop (Figure 3). This incomplete tRNA Ser (60 bp) structure has been detected in the mitogenomes of other animals, including insects (Wolstenholme, 1992).
A total of 26 unmatched base pairs were detected in the E. autonoe mt tRNAs, but 20 of them were G-U pairs, which form a weak bond in the tRNAs. The remaining six were atypical pairings: one mismatch in tRNA Ala (U-U), two in tRNA Leu (UUR) (one U-U and one A-C), one in tRNA Lys (C-U) and two in tRNA Ser (UCN) (two U-U) (Figure 3). This number of mismatches in the E. autonoe tRNAs is well within the range reported for other lepidopteran insect tRNAs: 11 in O. lunifer (Salvato et al., 2008), 8 in C. raphaelis (Kim et al., 2006), 2 in Eriogyna pyretorum (Jiang et al., 2009) and one in P. atrilineata (Yang et al., 2009). The postulated tRNA cloverleaf structure harbored an invariable 7 bp in the aminoacyl stem, 5 bp in the anticodon stem and 7 bp in the anticodon loop, but also contained a variable length of the dihydrouridine (DHU) arm and the TΨC arm, particularly within the loops.
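The distinction drawn above between G-U pairs and atypical pairings can be made explicit with a short Python sketch. The stem pairs in the example are hypothetical and are not taken from Figure 3.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(base5: str, base3: str) -> str:
    """Classify one stem pair as 'watson-crick', 'wobble' (G-U) or 'mismatch'.

    DNA-style T is treated as U so that gene sequences can be used directly.
    """
    pair = (base5.upper().replace("T", "U"), base3.upper().replace("T", "U"))
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


def count_pairs(pairs) -> dict:
    """Tally the three pair classes over an iterable of (5' base, 3' base) tuples."""
    counts = {"watson-crick": 0, "wobble": 0, "mismatch": 0}
    for base5, base3 in pairs:
        counts[classify_pair(base5, base3)] += 1
    return counts


# Hypothetical stem pairs for illustration.
example_counts = count_pairs([("G", "C"), ("G", "U"), ("U", "U"), ("A", "C"), ("A", "T")])
```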

Intergenic spacer sequences and overlapping sequences
The genes of E. autonoe were separated by a total of 139 bp of intergenic spacer sequence, spread over ten regions ranging in size from 1 to 50 bp. The majority of intergenic spacer sequences were short (1-2 bp), but two locations have relatively long intergenic spacer sequences. The longest one, located between tRNA Gln and ND2 in the E. autonoe mitogenome (50 bp), is consistently detected with similar sizes in the sequenced lepidopteran insects, ranging from 40 bp in Parnassius bremeri to 72 bp in Ochrogaster lunifer (Table 6). In contrast, other holometabolous insects such as Coleoptera, Diptera and Neuroptera do not harbor such a long intergenic spacer sequence between the ND2 gene and the neighboring tRNA Met gene. Instead, only a very short intergenic spacer sequence, ranging in size from 1 to 5 bp, is present in a few insects, such as the coleopterans Tribolium castaneum (Friedrich and Muqim, 2003) and Hydroscapha granulum (unpublished, GenBank accession number AM493667) and the dipterans Mayetiola destructor (unpublished, GenBank accession number NC013066), Culicoides arakawae (unpublished, GenBank accession number AB361004) and Ceratitis capitata (Spanos et al., 2000). Thus, this intergenic spacer sequence appears to be synapomorphic in Lepidoptera. More interestingly, the sequence alignment of this intergenic spacer sequence to the neighboring ND2 gene revealed a sequence homology of 74% in E. autonoe (Figure 4). Previously, Kim et al. (2009) also detected substantially high sequence homology in this region. Careful analysis of this region from the newly sequenced lepidopteran insect Lymantria dispar (unpublished, GenBank accession number FJ617240) also evidenced substantially high sequence homology, at 60% (Figure 4). This indicates that the spacer sequence may have originated from a partial duplication of the ND2 gene, but the non-coding nature of this region may have allowed for a rapid sequence divergence from the original ND2 gene.
The other relatively long intergenic spacer sequence is the 16-bp long sequence located between tRNA Ser (UCN) and ND1 (Table 6). This intergenic spacer sequence was also detected in all sequenced lepidopteran insects, ranging in size from 16 bp in P. bremeri, A. melete and the current E. autonoe to 38 bp in Ostrinia nubilalis (Table 6). Within this 16-bp long intergenic spacer sequence exists the 7-bp long ATACTAA motif, which is conserved in all lepidopteran species thus far sequenced (Figure 5). This 7-bp long motif has been suggested to be a possible binding site for the mitochondrial transcription termination peptide, in that the intergenic spacer sequence is located just past the final PCG, CytB, within the major strand (Cameron and Whiting, 2008; Taanman, 1999).
The E. autonoe mitochondrial genes overlap by a total of 51 bp at 11 locations, with the longest overlap, measuring 16 bp, located between tRNA Phe and ND5. Similarly sized overlapping sequences were also detected between tRNA Phe and ND5 in A. pernyi, C. boisduvalii, E. pyretorum, P. bremeri, Ostrinia furnacalis and O. nubilalis, and a somewhat larger one, 29 bp, was detected in Adoxophyes honmai (Table 6).
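A percentage such as the 74% quoted above can be obtained from a pairwise alignment along the following lines. How the study counted gapped columns is not stated, so the convention used here (a column with a gap on one side counts as a non-match, and columns gapped on both sides are skipped) is an assumption, as are the function name and the toy alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical columns between two aligned, equal-length sequences.

    Assumed convention: columns gapped in both sequences are skipped, and a
    column gapped in only one sequence counts as a non-match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared


# Hypothetical toy alignment: 4 identical columns out of 6.
identity = percent_identity("ATG-CA", "ATGTCT")
```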
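Although spacers and overlaps were hand-counted in this study, the same quantities can be derived from an annotation table. The following Python sketch assumes 1-based inclusive gene coordinates sorted by start position and ignores the wrap-around junction of the circular genome; the gene names and coordinates are hypothetical.

```python
def spacers_and_overlaps(genes):
    """Return (spacers, overlaps) between consecutive genes.

    `genes` is a list of (name, start, end) tuples with 1-based inclusive
    coordinates, sorted by start. Each result entry is
    (upstream gene, downstream gene, length in bp).
    """
    spacers, overlaps = [], []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(genes, genes[1:]):
        gap = start_b - end_a - 1
        if gap > 0:
            spacers.append((name_a, name_b, gap))
        elif gap < 0:
            overlaps.append((name_a, name_b, -gap))
    return spacers, overlaps


# Hypothetical coordinates for illustration only.
example_genes = [("geneA", 1, 68), ("geneB", 66, 130), ("geneC", 181, 250)]
example_spacers, example_overlaps = spacers_and_overlaps(example_genes)
total_spacer_bp = sum(length for _, _, length in example_spacers)
total_overlap_bp = sum(length for _, _, length in example_overlaps)
```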

A+T-rich region
The A+T-rich region of the E. autonoe mitogenome is located between the srRNA and tRNA Met (Figure 1 and Table 2) and exhibits the highest A+T content (94.5%) of any region of the E. autonoe mitogenome (Table 3). The 678-bp long A+T-rich region of the E. autonoe mitogenome is the second longest among the completely sequenced lepidopteran insects, after the 747-bp long region of B. mandarina (Yukuhiro et al., 2002). This region contains a tandem repeat consisting of 10 identical 27-bp copies and one partial copy of 13 bp, which lacks the final 14 bp of the 27-bp copy (Figure 6A). The repeat unit consists of 26 A+T nucleotides and one C nucleotide, providing a very high A+T content (96.3%).
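The repeat structure described above (10 full copies plus a 13-bp partial copy, 283 bp in total) can be checked programmatically. The sketch below counts consecutive copies of a known unit from a given start position; the 27-bp unit used in the example is a made-up sequence with 26 A+T nucleotides and one C, not the actual E. autonoe repeat, and the flanking bases are likewise hypothetical.

```python
def count_tandem_copies(seq: str, unit: str, start: int = 0):
    """Count consecutive full copies of `unit` in `seq` beginning at `start`.

    Returns (number of full copies, length of a trailing partial copy).
    """
    full_copies = 0
    pos = start
    while seq.startswith(unit, pos):
        full_copies += 1
        pos += len(unit)
    partial = 0
    while (partial < len(unit) and pos + partial < len(seq)
           and seq[pos + partial] == unit[partial]):
        partial += 1
    return full_copies, partial


# Hypothetical 27-bp unit (26 A+T, one C); not the real repeat sequence.
unit = "AATTATATAATTTACATATTAATTAAT"
region = "GGC" + unit * 10 + unit[:13] + "GGCC"
copies, partial_len = count_tandem_copies(region, unit, start=3)
repeat_span = copies * len(unit) + partial_len      # total length of the repeat array
unit_at_percent = 100.0 * sum(b in "AT" for b in unit) / len(unit)
```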
The presence of a tandem repeat in the mitochondrial A+T-rich region has been reported frequently in other insects (Cameron and Whiting, 2007), but is rare in sequenced lepidopteran insects. Careful analysis of the newly sequenced L. dispar mitogenome, which harbored a 435-bp long A+T-rich region, revealed the presence of a tandem repeat consisting of two duplicated 45-bp copies, the second copy of which harbored one substituted nucleotide (unpublished, GenBank accession number FJ617240). Another example among the complete lepidopteran mitogenomes is the 747-bp long A+T-rich region of the Japanese B. mandarina (Yukuhiro et al., 2002). It harbored a tandem triplication of a ≈ 126 bp fragment consisting of identical first and second copies, as well as a third copy with one nucleotide substitution and an AT insertion. Each of the 126 bp elements consisted of a ≈ 64 bp unit and a ≈ 62 bp repeated unit; the ≈ 64 bp unit consisted of a 44 bp core sequence flanked by 10 bp perfect inverted repeats, and the ≈ 62 bp unit consisted of a 50 bp core sequence flanked by 6 bp perfect inverted repeats (Arunkumar et al., 2006). Liu et al. (2008) also reported the presence of a tandem repeat composed of six duplicated 38-bp copies, containing a ≈ 20 bp core motif flanked by 9-bp perfect inverted repeats, in the 552-bp long A. pernyi A+T-rich region. Such repeats were considered characteristic of Antheraea, in that the partial mitogenome sequences of A. roylei and A. proylei also harbored highly similar repeat elements (Arunkumar et al., 2006; Liu et al., 2008). A common interpretation of the origin and persistence of repeat units within the A+T-rich region is tandem duplication occurring via slipped-strand mispairing during replication (Moritz and Brown, 1987).
[Legend: The 22 tRNAs are denoted by one-letter symbols; L*, L, S* and S denote tRNA Leu (UUR), tRNA Leu (CUN), tRNA Ser (AGN) and tRNA Ser (UCN), respectively. N2, C1, C2, A8, A6, C3, N3, N5, N4, 4L, N6, CB and N1 represent ND2, COI, COII, ATP8, ATP6, COIII, ND3, ND5, ND4, ND4L, ND6, CytB and ND1, respectively. Species names are abbreviated using one letter from the genus name and three letters from the species name; full species names are presented in Table 3.]
The remaining sequences of the A+T-rich region were composed of non-repetitive sequences, but harbored several poly-runs of T, A and TA. A BLAST search conducted to detect any relationship of the repeat sequence to other organisms or sequences proved only minimally successful, but the 27-bp long repeat sequence showed, interestingly, a high degree of sequence homology (74%) with a stretch of sequence located in the lrRNA gene (Figure 6B). The mechanism responsible for the location of such a similar sequence stretch in different locations within a mitogenome may be scrutinized further as more mitogenome sequence information is accumulated.
The A+T-rich region of the insect mitogenome, which is equivalent to the control region of the vertebrate mitogenome, has been demonstrated to harbor the replication origin for both strands in Drosophila species (Clary and Wolstenholme, 1987; Fauron and Wolstenholme, 1980), and the region located immediately downstream of a poly-T stretch at the 3'-end of the A+T-rich region has been identified as the position of the minor-strand replication origin in B. mori (Saito et al., 2005). Thus, the poly-T stretch has been suggested to function as a possible recognition site for the initiation of replication of the minor strand of mtDNA. The E. autonoe A+T-rich region harbored a 19-bp long T stretch upstream of the 5'-end of the srRNA (Figure 7). This poly-T stretch is quite well conserved in all sequenced lepidopteran insects, ranging in size from 16 bp in L. dispar to 22 bp in B. mandarina (Figure 7). Additionally, immediately downstream of the poly-T stretch in the A+T-rich region is another conserved motif, ATAGA, which is very well conserved in all sequenced lepidopteran insects, including E. autonoe (Figure 7). Previously, this motif has also been suggested to play some regulatory role together with the poly-T stretch (Kim et al., 2009). Nevertheless, this motif is conserved only in lepidopteran insects, and not in the Coleoptera and Diptera (data not shown). Thus, more experimental data are required before a firm conclusion can be drawn.
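A search for the poly-T stretch and the adjacent ATAGA motif can be sketched with a regular expression. The sketch reads the sequence left to right in whatever strand orientation is supplied; the minimum run length of 10, the 10-bp search window and the toy sequence are assumptions made for illustration and are not thresholds used in the study.

```python
import re


def find_poly_t_and_motif(region: str, min_t: int = 10, motif: str = "ATAGA",
                          window: int = 10):
    """Locate poly-T runs and report whether `motif` lies just downstream.

    Returns a list of (start index, run length, motif_found) tuples, where
    motif_found is True if `motif` occurs within `window` bases after the run.
    """
    region = region.upper()
    hits = []
    for match in re.finditer("T{%d,}" % min_t, region):
        downstream = region[match.end():match.end() + window]
        hits.append((match.start(), match.end() - match.start(), motif in downstream))
    return hits


# Hypothetical toy sequence containing a 19-bp T run followed by ATAGA.
toy_region = "AATTA" + "T" * 19 + "ATAGA" + "AATTAAT"
poly_t_hits = find_poly_t_and_motif(toy_region)
```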

Phylogenetic relationships
The 17 available lepidopteran mitogenomes, including that of E. autonoe, belong to the Obtectomera, representing six lepidopteran superfamilies (Tortricoidea, Pyraloidea, Papilionoidea, Bombycoidea, Geometroidea and Noctuoidea). Among them, the superfamilies Papilionoidea, Bombycoidea, Geometroidea and Noctuoidea are referred to as the Macrolepidoptera. The phylogenetic relationships of macrolepidopteran superfamilies have been the subject of substantial controversy, and no relationships have yet been clearly elucidated (Minet, 1991, 1994; Nielsen, 1989; Scott, 1986). One of the most compelling hypotheses in this regard is a closer relationship between the Papilionoidea and the Geometroidea, with the unresolved relationships of this group to each of Noctuoidea and Bombycoidea, resulting in a trichotomy (Minet, 1991; Nielsen, 1989; Figure 8A). The phylogenetic reconstruction of the relationships within Apoditrysia, using both the concatenated amino acid and the nucleotide sequences of the PCGs, revealed an unexpected clustering of the Bombycoidea, Geometroidea and Noctuoidea, excluding the Papilionoidea (Figures 8B and C). In particular, the Geometroidea was identified as a sister taxon of the Bombycoidea with high nodal support in the BI analyses using both amino acid and nucleotide data at 99% (Figures 8B and C), and with relatively high nodal support in the ML analyses using amino acid data at 64% (Figure 8B) and nucleotide data at 63% (Figure 8C). On the other hand, the close relationship of Noctuoidea to the Bombycoidea + Geometroidea group is not always clear, in that this grouping was strongly supported only by the BI analyses using both amino acid and nucleotide data at 100% (Figures 8B and C), whereas it was poorly supported by the ML analyses using both datasets (Figures 8B and C). Nevertheless, a close relationship between the Bombycoidea and the Geometroidea is worth noting, in that this result deviates from the traditional view and from those of other previously conducted phylogenetic studies, namely: a sister group relationship between the Geometroidea and Papilionoidea, leaving Noctuoidea and Bombycoidea unresolved from the group (Minet, 1994); a close relationship between Noctuoidea and Papilionoidea, leaving Geometroidea and Bombycoidea unresolved from the group, based on five nuclear PCGs (Regier et al., 2008); and a close relationship between Geometroidea and Papilionoidea and between Noctuoidea and Bombycoidea, based on the mitochondrial ND1 gene, nuclear rRNA genes and morphological data (Weller and Pashley, 1995). Only a recent phylogenetic analysis using the complete mitogenomes of the sequenced lepidopteran insects also supported a close relationship between the Geometroidea and Noctuoidea group and the Bombycoidea, in a fashion similar to that of this study (Jiang et al., 2009). This result suggests that macrolepidopteran evolution may be more complex than is currently understood. As more sequence information from diverse taxonomic groups becomes available, more comprehensive conclusions could be drawn.
Figure 8. (A) Hypothesis of Minet (1991) and Nielsen (1989). (B) Bayesian Inference phylogram of apoditrysian superfamilies obtained with an amino acid dataset. (C) Bayesian Inference phylogram of apoditrysian superfamilies obtained with a nucleotide dataset. Numbers at each node specify BPP by BI analysis (first value) and bootstrap percentages of 100 pseudoreplicates from ML analysis (second value), respectively. The dipterans Drosophila yakuba (Clary and Wolstenholme, 1985), Anopheles gambiae (Beard et al., 1993) and Bactrocera oleae (Nardi et al., 2003) were employed as a co-outgroup. The scale bar indicates the number of substitutions per site.
A substantial debate has raged regarding the phylogenetic relationships existing among true butterfly families (Ehrlich and Ehrlich, 1967;Kristensen, 1976;Robbins, 1988;Scott, 1986).One of the most widely accepted relationships was that of Kristensen (1976), in which the Pieridae were identified as a sister to the Nymphalidae and Lycaenidae group, with the Papilionidae established as the basal lineage.This relationship was well supported by the recent work of Wahlberg et al. (2005), wherein the data matrix from substantially long DNA fragments and morphological features were collectively utilized, along with the elaborated phylogenetic algorithms.
The phylogenetic analysis of the four families of true butterflies (Papilionoidea), each represented by a single species, showed the monophyly of Papilionoidea with very high nodal support: 100% by BI and 96% by ML using the amino acid sequence data (Figure 8B), and 100% by both BI and ML using the nucleotide sequence data (Figure 8C). With regard to the internal relationships among the families of Papilionoidea, all analyses positioned Nymphalidae as a sister to the Pieridae and Lycaenidae group, with the Papilionidae established as the basal lineage (Figures 8B and C). This relationship is consistent among all analyses, with very high respective nodal support (Figures 8B and C). One result of our analysis that is consistent with the most widely accepted relationships of true butterflies is the placement of Papilionidae as the basal lineage (Kristensen, 1976). However, the phylogenetic relationships of true butterflies obtained from the full-length mitogenome sequences in this study overall show a highly unconventional clustering, which has never previously been suggested in the relevant literature. One possible reason for the observed result is the shortage of available species, in that only a single species representing each family of the true butterflies is currently available. This may be the case, particularly because all current hypotheses are based on multiple genera within each family (Wahlberg et al., 2005).
Collectively, the information currently available at least supports a strong clustering of Geometroidea and Bombycoidea, excluding Papilionoidea, among the macrolepidopteran superfamilies. The monophyly of Papilionoidea and the monophyly of Bombycoidea were also well supported by the results of this study. All phylogenetic analyses consistently placed Nymphalidae as a sister to the Pieridae and Lycaenidae group. Nevertheless, this topology has never been previously proposed. Thus, we are reluctant to draw a definitive conclusion regarding the phylogenetic relationships among butterflies, considering the rather limited taxonomic diversity sampled. In order to further evaluate the phylogenetic relationships among the macrolepidopteran insects and among the true butterflies, a larger number of complete mitogenome sequences that encompass more of the taxonomic diversity will be required.

Figure 1. Circular map of the mitochondrial genome of Eumenis autonoe. COI, COII and COIII refer to the cytochrome oxidase subunits; CytB refers to cytochrome B; ATP6 and ATP8 refer to subunits 6 and 8 of F0 ATPase; ND1-6 refer to components of NADH dehydrogenase. tRNAs are denoted as one-letter symbols consistent with the IUPAC-IUB single letter amino acid codes. Gene names that are not underlined indicate a clockwise transcriptional direction, whereas underlines indicate a counter-clockwise transcriptional direction. The E. autonoe mitogenome was sequenced in four overlapping fragments (SF1, SF2, LF1 and LF2), shown as single lines within the circle.

Figure 2. Alignment of the initiation context of the COI genes of lepidopteran insects, including that of Eumenis autonoe. The first four or five codons for COI and their amino acids are shown on the right-hand side of the figure. Underlined nucleotides indicate the adjacent partial sequence of tRNA Tyr. Arrows indicate the transcriptional direction. Boxed nucleotides indicate the currently proposed translation initiators for the COI gene of lepidopteran insects. The start codon for E. autonoe was designated as CGA.

Figure 3. Predicted secondary cloverleaf structures for the 22 tRNA genes of Eumenis autonoe. The tRNAs are labeled with the abbreviations of their corresponding amino acids. Nucleotide sequences from 5' to 3' are indicated for tRNA Ala. Dashes (-) indicate Watson-Crick base-pairing and centered asterisks (*) indicate G-U base-pairing. Arms of tRNAs (clockwise from top) are the amino acid acceptor (AA) arm, the TΨC (T) arm, the anticodon (AC) arm and the dihydrouridine (DHU) arm.

Figure 4. Alignment of the intergenic spacer sequence located between tRNA Gln and ND2 and the neighboring partial ND2 gene from several lepidopteran insects, including Eumenis autonoe. Only lepidopteran species showing a sequence homology of more than 60% between the intergenic spacer sequence and the ND2 gene are presented. Asterisks indicate consensus sequences in the alignment. Sequence homology between the spacer and the ND2 gene is shown in parentheses next to the species name. The nucleotide position is indicated at the beginning and end sites of the sequence.

Figure 5. Alignment of the internal spacer region located between ND1 and tRNA Ser (UCN) from all sequenced lepidopteran insects. The boxed nucleotides indicate the conserved heptanucleotide region (TTAGTAT) detected in all sequenced lepidopteran insects. Underlined and dotted nucleotides, respectively, indicate the adjacent partial sequences of the ND1 gene and tRNA Ser (UCN) gene. The arrows indicate the transcriptional direction.

Figure 6. Tandem repeat units detected in the Eumenis autonoe A+T-rich region. (A) Alignment among the repeats and (B) alignment between the repeat and neighboring srRNA. Nucleotide positions of the sequences are provided at each end of the sequence.

Figure 7. Alignment of partial A+T-rich region and srRNA. The shaded nucleotides indicate the poly-T stretch and the boxed nucleotides indicate the conserved ATAGA motif. The direction of replication is indicated by arrows. The nucleotide position is indicated at the beginning and end sites of the sequence with respect to each mitogenome.

Table 1. List of primers used to amplify and sequence the mitogenome of Eumenis autonoe.

Table 3. Contd… a Termination codons were excluded in total codon count. b Protein coding genes. Bar (-) indicates lack of sequence information on the A+T-rich region in the genome.

Table 4. Base composition at each codon position of the concatenated 13 PCGs in the lepidopteran mitogenomes.
Stop codons were excluded from the count.
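As an illustration of the kind of tally summarized in Table 4, the following is a minimal Python sketch of per-codon-position base composition. It is not the procedure used in this study: the function name `codon_position_composition` is hypothetical, and the sketch assumes the input is an in-frame coding sequence from which the stop codon has already been removed (any trailing partial codon is ignored).

```python
from collections import Counter


def codon_position_composition(cds):
    """Return base frequencies at codon positions 1, 2 and 3.

    Assumption (not from the source): `cds` is an in-frame coding
    sequence with the stop codon already removed by the caller.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    composition = []
    for pos in range(3):
        # Every third base starting at this codon position.
        counts = Counter(seq[pos:usable:3])
        total = sum(counts[b] for b in "ACGT")  # ambiguous bases are skipped
        composition.append(
            {b: (counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return composition


# Toy example with three codons (ATG ATA ATT), not real sequence data.
example = codon_position_composition("ATGATAATT")
```

For the concatenated 13 PCGs, each gene would need to be trimmed to its reading frame before concatenation so that codon positions stay aligned.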

Table 5. Composition and skewness in the lepidopteran mitogenomes.
*The skewness of the whole PCGs and the whole genome was calculated from the major strand.
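The skewness values in Table 5 can be illustrated with a short Python sketch. This section does not state the formula, so the sketch assumes the commonly used definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), computed on a single strand; the function name `composition_and_skew` is hypothetical.

```python
from collections import Counter


def composition_and_skew(seq):
    """Return A+T content, AT skew and GC skew for one strand.

    Assumption (not stated in this section): AT skew = (A - T)/(A + T)
    and GC skew = (G - C)/(G + C). Ambiguous bases are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[b] for b in "ATGC")
    total = a + t + g + c

    def ratio(num, den):
        # Avoid division by zero for empty or single-base-class input.
        return num / den if den else 0.0

    return {
        "at_content": ratio(a + t, total),
        "at_skew": ratio(a - t, a + t),
        "gc_skew": ratio(g - c, g + c),
    }


# Toy example, not real sequence data: 3 A, 1 T, 1 G, 1 C.
example = composition_and_skew("AAATGC")
```

Because skew is strand-specific, the sequence passed in should be the major-strand sequence when reproducing whole-genome or whole-PCG values of the kind reported in the table.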