An instrument for measuring performance in geometry based on the van Hiele model

In this paper we present the process of constructing a test for assessing student performance in geometry corresponding to the first year of Secondary Education. The main goal was to detect student errors in the understanding of geometry in order to develop a proposal according to the Van Hiele teaching model, explained in this paper. Our research methodology took into account reliability using Cronbach's alpha coefficient, as well as the construct validity, with the extraction of 13 factors that accounted for a high percentage of variance. This result leads us to conclude that the instrument constructed has the appropriate technical and pedagogical features to be considered an original and significant contribution to the field of geometry teaching. The final version of the test constructed after the extraction of factors is shown in the Appendix.

Aware of the problems involved in this sphere of teaching and learning, we believe that more educational research is needed to address and solve the dysfunctions detected in the different reports assessing our educational system. We therefore set out to investigate the most representative theories in this area of study, which led to the need to design an instrument to measure performance in geometry of students in the first year of Compulsory Secondary Education in Spain (Spanish acronym: ESO), which corresponds to students aged 12 to 13. This is the year and stage of education on which our research into geometry education focuses. The findings of this work are important for their repercussions on student learning and improved teaching. Our concern as mathematics teachers is to incorporate into our teaching practice the contributions that have come from the field of educational research, specifically regarding geometry teaching, and to investigate and propose efficient teaching models.
As regards the specific research objectives, we took an educational research approach based on the need to solve the problems of comprehension in geometry learning that come up daily in the classroom. This study presents the results of the design and validation of an instrument that allows us to probe the mistakes that students make in the geometry block of the curriculum. To this end, we devised a test (Appendix) to measure the students' knowledge before and after completing the geometry subject matter in the first year of ESO. With this instrument, and in successive stages of research, our objective was to detect student errors in understanding at both moments of learning.
When devising the measuring instrument, we took as a reference the curricular framework for the first year of compulsory secondary education in Spain. We likewise took into account the related literature considered relevant by the scientific community and, above all, the students' errors in comprehension.
The curricular proposals for learning geometry that were analyzed are based mainly on the Van Hiele Model (1957, 1986), analyzed by Corberán et al. (1994), Jaime (1993), Gutiérrez et al. (1991, 1994), and Peeg et al. (1997), in addition to assessment tests designed to evaluate the contents in this area, such as those by Gutiérrez et al. (1991), Mayberry (1981, 1983), and Usiskin (1982), and other contributions to geometry teaching such as those by Alsina et al. (1997) and Rico et al. (2002). Furthermore, other studies, such as those by Guillén (2000) and Huerta (1999), suggest focusing the study on the detection of errors as part of the research stage in the Van Hiele Model and adapting the theoretical frame to the reality of the classrooms in which the research is to be carried out. Thus, in our case the instrument had 13 items with three responses to choose from and was applied to a sample of 177 students in two schools of secondary education (Table 1).
Sánchez-García and Cabello 1195
After a detailed analysis of the theories related to geometry learning in order to make a theoretical selection of the items for the final test, it can be said that these theories are linked to two focal points. The first focuses on the cognitive processes that deal with the comprehension of geometric concepts (Hoffer, 1983; Jaime, 1993; Van Hiele, 1957, 1986). The second draws together all the theories emphasizing the formation of geometric concepts, such as the Theory of Figural Concepts (Fischbein, 1993), the Concept Image-Concept Definition model (Vinner and Hershkowitz, 1980) and the Visualization-based Interpretation of Geometric Concepts (Duval, 1995, 1998). These approaches are not independent, but rather complement each other. Without underestimating the influence of the theories underlying any of these models, in this study our central focus is the basic level defined by the Van Hiele model, which has to do with the aspect of visualization (the image of the concept) in the forming of geometric concepts, and which we describe briefly in the next section. We also drew on other relevant studies about the forming of geometric concepts, which serve as a reference point in shaping the thematic nucleus underlying our research.

THEORETICAL BACKGROUND
The theory known as the Van Hiele model emerged in Holland in 1957. Pierre and Dina Van Hiele defended their doctoral dissertations and carried out successive research studies until they developed a model which explains how students gradually construct geometric thinking in five consecutive levels, guided by the teacher through five stages of learning. Their contribution to the field of geometry learning has been discussed by different authors, most notably Fuys et al. (1988) and Pegg (1992).
The theoretical construction of the model has a dual nature: a descriptive aspect, in which the levels of geometric thinking that the student goes through are specified, and a prescriptive aspect, which establishes the learning stages the teacher must guide the students through so that they can acquire a determined level of knowledge.
The descriptive aspect is hierarchically ordered into five levels of thinking (Patkin, 2014), although according to Patkin and Barkai (2014), owing to doubts that have emerged from mathematics educators (including van Hiele himself), today the usual practice is to speak of the following four levels: (i) recognition or visualization, (ii) analysis or description, (iii) informal deduction, and (iv) formal deduction. In addition to the specific thinking that takes place at each level, Van Hiele identified five general properties that characterize the model: sequential, advancement, intrinsic and extrinsic, linguistics, and mismatch. These properties are significant to educators because they provide guidance for making instructional decisions (Crowley, 1987).
On the other hand, the prescriptive aspect consists of five learning stages: (i) inquiry; (ii) directed orientation; (iii) explanation; (iv) free orientation; and (v) integration.
According to Clements (2003), progression from one level to the next can be facilitated through planned instruction. Therefore, geometry instruction should be designed with these stages in mind (Choi-Koh, 1999).
Authors such as Koester (2003) and Patkin and Sarfaty (2012) have focused on investigating the different levels of knowledge acquisition in geometry and the difficulties involved. An important line of research in this field has also been noted, related to the work done by Vinner and Hershkowitz (1980) and Hershkowitz (1990, 1999), which is focused on knowledge of the cognitive processes that take place in students' minds when they are learning a new geometric concept. In this sense, the authors differentiate the image of the concept from its definition. The results of their studies led them to formulate a theory focused on the concept, which is called mental drawing, defined as the set of all the drawings that have been associated with the concept in the learner's mind. Thus, the image of the concept is composed of its mental drawing together with the properties that have been associated with that drawing in the mind of the learner. The definition of the concept is a verbal definition that accurately explains the concept.
These authors affirm that knowing the conceptual images that the students have in their minds is very important for teaching, since it gives teachers a better understanding of students' learning processes and helps them to suggest improvements for teaching to help avoid the formation of images of erroneous concepts.
In parallel, Fischbein (1993) developed the Theory of Figural Concepts to refer to mental entities that have both conceptual and figural characteristics. From the point of view of this model, images and concepts interact constantly during the reasoning process and sometimes produce conflictive situations until the geometric concept is formed. Duval (1993, 1995, 1998) developed his theory by focusing on the framework of acquisition of geometric concepts based on different systems of representation located in the field of semiotics, a field subject to important cognitive activity. For this author, the construction of a geometric concept is directly related to the construction of its graphic representation.
From our point of view, at the basic level of recognition or visualization (the stage at which the students in our sample find themselves) students have not yet formed the figural concept; they know only images without knowledge of the concepts and think that the figures possess irrelevant attributes such as their orientation on the plane; that is why they find themselves in a conflictive situation in which the concept loses strength in relation to the image.
For this reason, and as indicated earlier, in order to detect the level of formation of geometric concepts and errors in comprehension as part of the inquiry stage of the Van Hiele model and to adapt the theoretical framework to the reality of the classroom, our study focuses on the basic level of recognition or visualization in the Van Hiele model, analyzing the conceptual images that students show at this level. The objective is highly relevant, since we are not aware of any previous research that has addressed the relation between the recognition and visualization of the concept (Van Hiele level 0) on the one hand, and the curricular design proposal for geometry in Spain on the other. Thus, this study revolves around drawing up and validating a test for measuring the performance in geometry of students in their first year of compulsory secondary education. In the next section, we describe the process of validating the instrument.

MATERIALS AND METHODS
This research was carried out using a quantitative analytical empirical approach with a non-experimental design in order to validate the instrument in question.

Technical characteristics: Sample, process, reliability and validity
As shown in Table 1, the target population comprised male and female students in their first year of compulsory secondary education (ESO) aged between 12 and 13. The schools were selected by convenience. Following the current terminology, the sample can be described as incidental, because it was the one available for the study at the time it was being carried out (Pereda, 1987).
The first question posited was the number of sample units necessary and, as pointed out by Morales (2012), the question to be answered is "What is the study for?" The literature is abundant in this sense and different authors maintain different criteria. The issue was related mainly to educational research. In our case, the fitting question is: How many students are needed to construct and analyze a measurement instrument and to subject the instrument to item and factor analysis, as well as interpret its reliability and validity?
Nunnally (1978) and Afifi and Clark (1990) proposed that there should be at least between five and ten subjects per item. This may be the most widespread criterion for carrying out factor analysis when trying to establish construct validity (Argibay, 2006). Other authors, such as Guilford (1954) and Kline (1986, 1994), consider a small sample adequate as long as the number is not much below 200.
To determine the number of sample units we adopted the criterion established by the above-mentioned authors. Since the recommendation is to use between 150 and 200 subjects, we assume that our sample, comprised of 177 students, is sufficient and adequate, and thus acceptable. The sample characteristics are shown in Table 1.
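The subjects-per-item rule cited above can be turned into a quick arithmetic check. The following is a minimal sketch and not part of the original study: the function name `recommended_sample_range` and its parameters are hypothetical, and the item counts passed to it (13 for the final test, 32 for the initial pool of questions) are taken from the text purely for illustration.

```python
def recommended_sample_range(n_items, min_per_item=5, max_per_item=10):
    """Return (low, high) sample sizes under a subjects-per-item rule.

    Hypothetical helper: the defaults encode the "five to ten subjects
    per item" recommendation mentioned in the text.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return n_items * min_per_item, n_items * max_per_item


# Illustration with the item counts mentioned in the paper.
print(recommended_sample_range(13))  # (65, 130)
print(recommended_sample_range(32))  # (160, 320)
```

Under these assumptions, a sample of 177 students exceeds the upper bound for 13 items and falls inside the range for 32 questions.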
The process began with an updated review of the literature concerning the contents to be assessed. This analysis allowed us to identify a series of variables that appeared as constants in the assessment of the geometrical concepts to be measured. Taking as a reference the theoretical foundation described in the previous section, and aware of the complexity involved in constructing an instrument to assess the acquisition of geometric concepts, we decided to use a multiple choice test as the measurement instrument because it would allow us to collect and quantify the information in a simple way. In the wording of the questions for the pilot test, we took into account the specifications of Blaxter et al. (2000) and followed the indications of Spector (1992) for the stages used to devise it.
Once the items were selected (32 questions with three responses to choose from [96 items]), a group of experts evaluated the suitability of each item for the attribute to be measured. This is the most common way to assess the quality of the contents, especially in the educational context (Prieto and Muñiz, 2000; Prieto and Delgado, 2010). This is why we turned to a group of experts in the contents of the instrument in order to evaluate its items, instructions, and design (Millman and Greene, 1989).
Once the contents of the items had been analyzed, some of the items were eliminated (19 questions with three responses to choose from [57 items]) and the content of others was changed in line with the indications from the group of experts. A cognitive analysis of the items was also carried out in order to learn the strategies that the students would use to answer them. Since these procedures alone are not sufficient (Visser et al., 2000), an attempt was made to strengthen the process using the analyses presented in the following sections.
Subsequently, with the sample selected, and before imparting the teaching programmed for the contents in geometry, a pilot test was carried out and the test was given to 177 students in order to analyze its internal consistency. This process yielded a Cronbach's alpha coefficient of 0.938, which according to authors such as Webb (1983) or Kline (2000) more than satisfies the reliability requirements for the instrument.
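For readers who wish to reproduce this kind of internal consistency check outside SPSS, Cronbach's alpha can be computed directly from its standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below is illustrative only: the function name `cronbach_alpha` is hypothetical, and the data are made up and are not the scores of the 177 students.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows (one score per item)."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up data: 3 respondents, 2 dichotomous items (1 = correct, 0 = incorrect).
toy_scores = [[1, 1], [1, 0], [0, 0]]
print(round(cronbach_alpha(toy_scores), 3))  # 0.667
```

Because the coefficient depends on the variances observed in a particular sample, the same items can yield different values with different groups of students.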
It can thus be concluded that the results obtained show very high reliability, which according to the interpretation of Cronbach's coefficient would fall within the coefficient value between 0.85 and 0.90, whose meaning is catalogued as almost perfect. Nonetheless, in order to be as rigorous as possible, we should also take into account the reflections of Morales (1995) to the effect that when reliability is calculated in the human sciences, one should specify what kind of reliability it is. A low coefficient does not necessarily mean that the instrument is bad and should not be used. Reliability is not a characteristic of the instrument, but rather a characteristic of the results, of the scores obtained by a specific sample. It is important to underscore this fact, because reliability is commonly referred to as if it were a characteristic of the instrument. The same instrument can measure and classify one sample very well but do so very badly, with a high margin of error, when applied to another sample. Likewise, the same instrument can measure well if the subjects differ widely among themselves but show low reliability if the sample is homogeneous. Reliability must thus be calculated for each sample, regardless of the reliability of the instrument.
The data obtained from applying our test were analyzed using the statistical program SPSS 20.0. Subsequently, items were empirically selected after each one had been administered to the sample of subjects selected to this effect. Statistical analysis of the test began by analyzing the reliability of each item (Morales, 2003; Lukas, 1998; Muñiz and Yela, 2003). The term "difficulty index" is usually used to indicate the ratio of correct to incorrect answers to an item in the student sample in question, but to be consistent with the formula it was agreed to call it the "facility index." This analysis, together with the evaluation by experts, allowed us to shorten the test while keeping a wide range of facility indices that gave the test a rating of average difficulty, desirable for measuring student learning. Subsequently, the psychometric characteristics of the test were measured in order to verify its reliability and validity and then draw up the definitive version, even though most of the items discriminated satisfactorily.
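The facility index mentioned above can be illustrated with a short computation. This sketch assumes the common definition of the index as the proportion of respondents who answer an item correctly, which may differ from the exact formula used in the study; the function name `facility_indices` and the response matrix are hypothetical.

```python
def facility_indices(responses):
    """Proportion of correct answers per item.

    `responses` is a list of respondent rows holding 1 (correct) or
    0 (incorrect) for each item. Assumed definition: correct / total.
    """
    n_respondents = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[j] for row in responses) / n_respondents
        for j in range(n_items)
    ]


# Made-up answers from 4 students to 2 items.
toy_responses = [[1, 0], [1, 1], [0, 0], [1, 0]]
print(facility_indices(toy_responses))  # [0.75, 0.25]
```

Values close to 1 indicate easy items and values close to 0 indicate hard ones, so a test of average difficulty would show indices spread around the middle of that range.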
Before presenting the reliability study, it is worthwhile to recall that reliability expresses the accuracy of the measuring instrument and validity means that the instrument actually measures what we want it to measure.
The reliability of the results of the test, that is, its internal consistency as measured by Cronbach's alpha coefficient, was 0.938. Cronbach (1960) affirmed that only those tests with a reliability coefficient of at least 0.90 should be used for educational purposes (which is our case), whereas Nunnally (1978) proposed a minimum value of 0.70. After reviewing the works of several authors, Webb (1983) proposed an interpretation of the reliability coefficient that would fall within a coefficient value between 0.85 and 0.90. If we take that as a reference, it can be concluded that our results show very high reliability. Factor analysis was then carried out, with the fit of the data checked using the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's sphericity test.
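As an illustration of the second of these checks, Bartlett's sphericity statistic can be obtained from the determinant of the item correlation matrix R using the usual chi-square approximation, chi2 = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, where n is the sample size and p the number of items. The sketch below is an assumption-laden example rather than the procedure run in SPSS: the helper names are hypothetical, the correlation matrix is made up, and only n = 177 comes from the text.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_subjects):
    """Return (chi-square statistic, degrees of freedom)."""
    p = len(corr)
    det = determinant(corr)
    if det <= 0:
        raise ValueError("correlation matrix must be positive definite")
    statistic = -((n_subjects - 1) - (2 * p + 5) / 6) * math.log(det)
    return statistic, p * (p - 1) // 2


# Made-up 2 x 2 correlation matrix; n = 177 as in the study sample.
stat, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], 177)
print(round(stat, 2), df)
```

A large statistic relative to its degrees of freedom suggests that the items are correlated enough for factor analysis to be appropriate; the p-value would be read from the chi-square distribution, which this sketch does not compute.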
Some authors, such as Lukas (1998) and Muñiz (2003), report that attempts are being made to extract certain inferences from the test data. Validity refers to the appropriateness, meaningfulness and usefulness of these inferences. It is traditionally defined as the degree to which a test measures what it is supposed to measure. This definition, however, needs to be qualified by distinguishing between content validity, criterion validity and construct validity, as stated previously.
The content validity of the test seems to be guaranteed by the study carried out by the group of experts, the result of which was the elimination of certain items.
Finally, construct validity, the most important level of validation (Cronbach and Meehl, 1955), aimed at determining the degree to which the test measures the theoretical trait in question, is a complex process that usually includes several procedures. As Messick (1980) states: "…construct validity is indeed the unifying concept of validity that integrates criterion and content considerations into a common framework for testing rational hypotheses" (p. 1015).
In regard to the technical demands of a study of these characteristics, we can affirm that all of them have been fulfilled. There is usually consensus about the importance of construct validity (Cronbach, 1988; Chacón and Moreno, 2000; Loevinger, 1957; Messick, 1995; Moss, 1992; Pérez-Gil; Tenopyr, 1977); in addition, factor analysis is usually run to analyze the structure being measured (Cronbach, 1980, 1982, 1988; Guion, 1977; Tenopyr, 1977; Thurstone, 1931, 1947). Besides observing the high correlation between the items, the two tests mentioned earlier were carried out to analyze the appropriateness of applying factor analysis (García et al., 2000; Pérez, 2009). In line with the technical data obtained, Tables 2 and 3 show the descriptive statistics.
This structure, which maintains an unequivocal relation to the curriculum of the first year of compulsory secondary education in Spain, is what allows us to confirm the construct validity shown in Table 3.
The analyses performed yielded very satisfactory results. The definitive version of the test (Appendix) consists of 13 items with three responses to choose from.

RESULTS
This paper describes the process of devising an instrument (test) for measuring performance in geometry of students in their first year of compulsory secondary education (ESO) in Spain. After analyzing the theoretical framework and other preliminary assessment tests in this field, the test was given to a sample of 177 students in order to validate it. The empirical treatment and validation process consisted of two stages: the first, aimed at obtaining the number of factors and percentage of explained variance, yielded a test comprised of 13 factors, and the second, confirmatory analysis, determined that the test was comprised of 13 factors and 13 items.
The items on the test are related to the theory of acquisition of geometrical concepts and to the contents included in the prescriptive curricular design in Spain. It can be affirmed that the test is related to the attribute that it is intended to measure and fits the target population. Likewise, the data obtained from the (exploratory and confirmatory) factor analyses showed adequate values, allowing us to establish a definitive version consisting of 13 items grouped into 13 factors with eigenvalues higher than one that explain 73.516% of the variance.
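The retention rule referred to here (keeping factors with eigenvalues higher than one, often called the Kaiser criterion) and the resulting percentage of explained variance can be sketched as follows. The eigenvalues below are invented for illustration and are not those obtained in the study, and the function name `kaiser_retained` is hypothetical.

```python
def kaiser_retained(eigenvalues, threshold=1.0):
    """Return the eigenvalues above `threshold` and the % of variance they explain.

    For a correlation matrix the eigenvalues sum to the number of
    variables, so the percentage is taken over their total.
    """
    retained = [ev for ev in eigenvalues if ev > threshold]
    explained = 100.0 * sum(retained) / sum(eigenvalues)
    return retained, explained


# Made-up eigenvalues for a 4-variable correlation matrix.
retained, explained = kaiser_retained([2.5, 1.2, 0.8, 0.5])
print(len(retained), round(explained, 1))  # 2 74.0
```

In this toy case two factors are retained and together account for 74% of the total variance.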
The instrument was administered in a pilot study. A quasi-experimental study with one non-randomized study group (n = 137) was also conducted using a pre- and post-test design with the purpose of searching for differences before and after the Van Hiele learning process and detecting persistent errors (Cabello et al., 2014). The instrument allowed us to observe the initial concepts that the students had in relation to the contents of geometry and their most common errors. There were significant differences between the average performance of each of the two groups of the study in favor of the experimental group taught with the Van Hiele Model (F = 0.317, p = 0.006). Eight persistent errors were also detected after implementing the learning process, only two of which were detected in the control group. Therefore, the results confirm that the instrument is valid to detect the initial errors of students in relation to the geometrical concepts that they have acquired. We think it is important to have real data on the initial situation of the students before implementing the Van Hiele Model for effective teaching. The relevance of persistent error detection is that it allows us to determine the efficacy of the teaching methodology and that it indicates the errors that students make. The positive effects of employing the Van Hiele model in geometric concepts acquisition are clear (Alebous, 2016), but a tool is needed to detect geometric

Table 1. Percentages of distribution by sex in the sample (ages 12 to 13).

Table 2. Explanatory percentage of variance in the factor analysis.