Measurement scales for AIDS-related knowledge and stigma in South Africa: An evaluation using item response theory

AIDS-related knowledge and stigma are key issues in combatting the HIV/AIDS pandemic, primarily because of their relationship with HIV/AIDS testing behavior. Previous studies exploring these issues in southern Africa have employed the 11-item AIDS-related knowledge scale and the 9-item stigma scale, but there has been limited psychometric testing of these scales. Using Item Response Theory (IRT), the two scales were investigated within the context of construction workers in South Africa. The IRT evaluation of these scales offers advantages over classical test theory (CTT) tests as it permits a more nuanced understanding of the performance of individual items. Survey data from 512 construction workers in the Western Cape, South Africa, were used for the evaluation. Based on the tests, a revised 9-item AIDS-related knowledge scale and a revised 8-item AIDS-related stigma scale were developed. The slope estimates and threshold parameters for the knowledge scale indicated a robust scale which is most reliable for respondents with low to moderate levels of AIDS knowledge, and less so for those with high knowledge levels. Similar estimates for the stigma scale indicated good reliability at moderate to high levels of AIDS-related stigma, declining when stigma was at low levels. The analysis indicates that the scale items are most precise when used in populations (1) with lower levels of education, (2) that are more likely to adhere to traditional or non-scientific beliefs about the origin and causes of HIV and AIDS, and (3) that, as a consequence of the first two, are more likely to exhibit high levels of stigma towards those with HIV/AIDS. The results have various policy and programmatic implications for epidemiological efforts at addressing the pandemic, particularly interventions intended to boost serostatus testing behaviour, such as voluntary counselling and testing (VCT). Greater measurement integrity for applied scales improves the overall rigour of such interventions, thereby ensuring better targeting of high risk populations and more focused allocation and utilization of health financial, technical and human resources, two critically important factors in addressing the pandemic in resource-poor contexts.


INTRODUCTION
Against the backdrop of a sub-Saharan region ravaged by HIV/AIDS, the construction industry in South Africa is identified as one of the sectors most adversely affected by the pandemic (Ambert, 2002; BER, 2003; Bowen et al., 2014; Harinarain and Haupt, 2014). This is largely due to the fragmented nature of the construction industry (Meintjes et al., 2007), which is overwhelmingly comprised of small firms and a migratory workforce (IOM, 2010); the geographical spread of construction work; and the diversity of project types (Johnson and Budlender, 2002). It is also one of the sectors least responsive to the pandemic (Meintjes et al., 2007). Construction workers thus constitute a high-risk group for HIV/AIDS. In order to control disease transmission and provide proper care, HIV testing is essential (Denison et al., 2008; Kaufman et al., 2015). Workers' attitudes to testing (their testing behavior) are therefore important. This behavior is positively influenced by workers' level of AIDS-related knowledge (MacPhail et al., 2009; Shisana et al., 2014; Abiodun et al., 2014). Conversely, AIDS-related stigma is a major barrier to willingness to test, to take preventative measures, or to undergo treatment (Mahajan et al., 2008; Deacon et al., 2009; Mbatha, 2013).
Several measurement scales have been developed to investigate HIV/AIDS stigma in southern Africa (Nyblade and MacQuarrie, 2006; Siyam'kela Project, 2003; Maughan-Brown, 2004; Kalichman et al., 2005; Holzemer et al., 2007; Uys et al., 2009). Of these, we consider the 9-item stigma scale developed by Kalichman et al. (2005) to be particularly useful in that: (1) it was developed for use in the general South African population, (2) it is brief, and (3) it is available in three of the South African official languages.
Earlier work by Kalichman and Simbayi (2004) resulted in an 11-item AIDS-related knowledge scale for use in South Africa. A variety of studies in the general population have used the Kalichman and Simbayi (2004) knowledge scale and the Kalichman et al. (2005) stigma scale as the bases for their studies (Pitpitan et al., 2012; Scott-Sheldon et al., 2013). Bowen et al. (2014) have employed variations of these scales in application to workers in the construction industry.
Despite their extensive use, little evidence exists for proper evaluation of the psychometric properties of these scales, and none specifically in the context of the construction industry. Moreover, where such evaluation occurs, it is based mainly on methods derived from Classical Test Theory (CTT). While these have merit for establishing scale properties, they say very little about the specific items that constitute these scales. In this study, Item Response Theory (IRT) was used to evaluate the psychometric properties of the items that constitute these two scales using data from a survey of construction workers in the Western Cape province of South Africa.

Item response theory: A brief overview
Item Response Theory (IRT) is a cluster of methods and techniques used in the development, evaluation, improvement, and scoring of multi-item scales (Embretson and Reise, 2000). Unlike CTT methods, IRT methods focus specifically on the individual items, examining their properties within and across individuals and generating rich item-level information, which enables scale assessment at a much more granular level than is possible using CTT methods (Hambleton and Swaminathan, 1985). The analytic basis of IRT is the item characteristic curve (ICC), which indicates the relationship between the probability of a person's response to an item and his/her level on the construct/trait being measured by the item (such as knowledge or stigma). For dichotomous data, one-parameter (1PL), two-parameter (2PL), or three-parameter (3PL) logistic IRT models may be used, although the 2PL model is most frequently applied.
A 1PL model only estimates the difficulty of the items, and assumes that all scale items are invariant (constant) in terms of differentiating across individuals with varying degrees of knowledge/competency. In a 2PL model, both item difficulty and differentiation are assumed to be variant and thus estimated. In a 3PL model both these parameters as well as a third which relates to guessing are estimated.
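The three logistic models described above can be written as a single response function. The following is a minimal sketch, not the IRTPRO implementation: the function name `icc` and the example parameter values are illustrative assumptions, and the 1PL case is represented here by fixing the slope at 1 (software may instead estimate one common slope for all items).

```python
import math

def icc(theta, b, a=1.0, c=0.0):
    """Probability of the keyed response under a logistic IRT model.

    theta: latent trait level; b: threshold (difficulty);
    a: slope (discrimination); c: lower asymptote (guessing).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative (hypothetical) parameter values, not estimates from this study.
p_1pl = icc(0.0, b=-1.0)                 # slope fixed, no guessing
p_2pl = icc(0.0, b=-1.0, a=2.0)          # item-specific slope
p_3pl = icc(0.0, b=-1.0, a=2.0, c=0.2)   # slope plus guessing parameter
```

Setting `c=0` recovers the 2PL model, and additionally holding `a` constant across items recovers the 1PL model, which is why the three models form a nested family.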
Overall, the utility of a scale is measured by examining its reliability, which is an indication of its measurement precision. This precision is depicted graphically by the Item Information Curve (IIC), which indicates the range of levels of the latent trait at which the item is operating most reliably and thus most effectively.
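For a 2PL item, the information curve has a standard closed form, I(θ) = a²P(θ)(1 − P(θ)), which peaks at θ = b with height a²/4. The sketch below evaluates it on a grid; the function names and the parameter values (a = 2.0, b = -1.0) are hypothetical and chosen only for illustration.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of the keyed response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Evaluate a hypothetical item on a grid from -4.0 to 4.0 in steps of 0.1.
grid = [x / 10.0 for x in range(-40, 41)]
info = [item_information(t, a=2.0, b=-1.0) for t in grid]
peak_theta = grid[info.index(max(info))]  # trait level of greatest precision
```

An item is therefore most informative for respondents whose trait level lies near its threshold, and steeper slopes concentrate that information in a narrower range.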

Purpose of this study
The aim of this study was to assess, using IRT, the psychometric properties of the AIDS-related knowledge and stigma scales, within the context of construction workers in South Africa.

Background to the AIDS knowledge and AIDS stigma scales
In examining traditional beliefs about the cause of AIDS and AIDS-related stigma, Kalichman and Simbayi (2004) developed the 11-item AIDS-related knowledge scale (hereafter termed the knowledge scale), with items adapted from Carey and Schroder (2002). All items collect information about respondent knowledge of HIV casual contagion, HIV transmission/prevention, and HIV disease processes and are scored for the number of correct answers (response options are "Yes", "No", or "Don't know", with "Don't know" responses scored as incorrect). Items for the 11-item scale are shown in Table 1.
Kalichman et al. (2005) also developed a scale to measure AIDS-related stigma in South Africa (hereafter termed the stigma scale). An initial pool of items was adapted from measures described by Pequegnat et al. (2001), Bauman et al. (2002), and Herek et al. (2002), together with three scale items drawn from a National Institute of Mental Health (NIMH) International Collaborative HIV/STD Prevention Trial. The initial pool of 24 items was subsequently refined to a 9-item scale, with dichotomous response options of either "Agree" or "Disagree". Higher scores indicated higher levels of AIDS-related stigmatizing attitudes. Items for this scale are shown in Table 1.
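The scoring rule for the knowledge scale (count of correct answers, with "Don't know" treated as incorrect) can be sketched as follows. The item identifiers and the answer key are hypothetical placeholders, since the actual key is given by the scale items in Table 1.

```python
def score_knowledge(responses, answer_key):
    """Count correct answers; "Don't know" or a missing answer never matches the key."""
    return sum(
        1 for item, correct in answer_key.items()
        if responses.get(item) == correct
    )

# Hypothetical three-item key and one respondent, for illustration only.
answer_key = {"k1": "No", "k2": "No", "k3": "Yes"}
respondent = {"k1": "No", "k2": "Don't know", "k3": "Yes"}
total = score_knowledge(respondent, answer_key)  # 2 of 3 correct
```

Because each item is reduced to correct versus incorrect, the resulting data are dichotomous, which is what allows the logistic IRT models to be applied.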

Survey data participants
A survey was administered to 512 site-based unskilled and skilled workers and office-based staff from 6 firms on 18 sites in the Western Cape, South Africa. Respondents were workers and staff who were available for participation on the day of the site visit by the field researchers. Most participants were male (91%) and almost two-thirds (62%) were "African". Participant age ranged from 18 to 69 years (mean = 36, SD = 10.86), with most being in the 21-30 year age group. Over a quarter (29%) had at most primary level education, whilst 52% had secondary level education. Sixty-two per cent were permanent employees, as opposed to contract and occasional employees. In terms of the language versions of the questionnaires that were administered, 41% were in English, 14% were in Afrikaans, and 46% were in isiXhosa (an indigenous African language). Regarding HIV/AIDS status, 27% reported never having been tested, and 7% reported being HIV+.

Missing values
To begin with, missing value analysis was performed on the original dataset comprising 512 cases. For the knowledge items, missing values ranged from 0.8 to 2.0%, and Little's MCAR test (χ² = 202.61, df = 146, p = 0.001) indicated that these values were not missing completely at random. Similarly, for the stigma items, missing values ranged from 2.5 to 3.3%, and Little's MCAR test (χ² = 146.28, df = 101, p = 0.002) indicated that the stigma item missing values were also not missing completely at random. The extent of missing values was low for both scales (less than 5%), and accordingly listwise deletion (Graham, 2012) was adopted. This produced a 457-case final dataset with demographic characteristics almost identical to the original set, and this was the basis for the IRT analysis.
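The per-item missing rate and the listwise deletion step can be sketched as below. This is an illustration on a toy dataset with hypothetical item names, not the SPSS procedure used in the study; missing responses are assumed to be coded as `None`.

```python
def missing_rate(records, item):
    """Percentage of cases with no response recorded for the given item."""
    return 100.0 * sum(r[item] is None for r in records) / len(records)

def listwise_delete(records, items):
    """Keep only cases that have a response on every listed item."""
    return [r for r in records if all(r[i] is not None for i in items)]

# Toy data: four cases, two hypothetical items, None marks a missing response.
records = [
    {"k1": 1, "s1": 0},
    {"k1": None, "s1": 1},
    {"k1": 0, "s1": None},
    {"k1": 1, "s1": 1},
]
complete = listwise_delete(records, ["k1", "s1"])
rate_k1 = missing_rate(records, "k1")

# Share of the original sample retained in the study: 457 of 512 cases.
retained = 457 / 512  # roughly 0.89
```

Listwise deletion removes a case if any single item is missing, so the proportion of cases lost can exceed the per-item missing rates, as the retained share above shows.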
IBM SPSS Ver. 22.0 for Mac (IBM Corporation, 2013a) was used for the base statistical analysis. IBM AMOS Ver. 22.0 for Mac (IBM Corporation, 2013b) was used for the confirmatory factor analysis and IRTPRO 2.1 for Windows (Cai et al., 2011) was used for IRT testing.

Testing IRT model assumptions
The validity of IRT models is contingent upon satisfaction of a number of assumptions, such as appropriate dimensionality, model calibration and functional form. These assumptions were tested during the IRT modeling process.

Assessing dimensionality
The initial basis of all IRT analysis is unidimensionality, that is, a single latent trait measured by all scale items. To evaluate the assumption of unidimensionality, both the knowledge scale and the stigma scale were subject to confirmatory factor analysis (CFA) using structural equation modelling. The following indices were used to assess the fit of the CFA models: χ²/df ratio (less than 4); Bentler CFI (comparative fit index) (0.95 and greater); RMSEA (root mean square error of approximation) (0.05 and less); and Hoelter (critical N (CN) index) (200 and greater).
Good model fit was found for the 9 stigma items, χ²/df ratio = 2.12, CFI = 0.95, RMSEA = 0.049, 90% CI [0.03, 0.07], and Hoelter (95%) = 321, and all factor loadings were significant (p < 0.01) except for Item 4 ('safe to work with others, including children'). Inspection of the modification indices revealed that the error terms of the "dirty" and "cursed" items needed to correlate. Specification of this path led to a significant improvement in the model (Δχ²(1) = 13.22, p < 0.01). The factor loading for Item 4 was once again not statistically significant. Further testing revealed that a CFA model with the item deleted was not significantly different from the model with the item included, suggesting the item was of minimal value to the scale. This issue will be explored further in the IRT analysis.
Based on the CFA results, the 11-item knowledge scale and the 9-item stigma scale were both considered to satisfy the assumption of unidimensionality, and hence deemed appropriate for IRT analysis.
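The fit criteria listed above amount to a set of threshold checks. As a minimal sketch, the cutoffs can be applied to the indices reported for the 9-item stigma model; the dictionary keys are illustrative names, not output fields of AMOS.

```python
# Cutoffs as stated in the text; key names are illustrative assumptions.
CUTOFFS = {
    "chi2_df": lambda v: v < 4.0,    # chi-square/df ratio less than 4
    "cfi": lambda v: v >= 0.95,      # comparative fit index 0.95 and greater
    "rmsea": lambda v: v <= 0.05,    # RMSEA 0.05 and less
    "hoelter": lambda v: v >= 200,   # Hoelter critical N 200 and greater
}

# Indices reported for the 9-item stigma CFA model.
stigma_fit = {"chi2_df": 2.12, "cfi": 0.95, "rmsea": 0.049, "hoelter": 321}

results = {name: check(stigma_fit[name]) for name, check in CUTOFFS.items()}
acceptable = all(results.values())
```

On the reported values every criterion is met, although CFI and RMSEA sit close to their respective cutoffs.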

IRT model calibrations
To begin with, the 1PL and 2PL IRT models were fitted to the 11-item knowledge and 9-item stigma scales (Table 2). Dealing first with the knowledge scale, the threshold parameters for the 1PL model ranged from b = -1.52 to +0.02, while for the 2PL model, the slope parameters ranged from a = 0.84 to 2.40, and the threshold parameters ranged from b = -1.36 to +0.01. Based on the observed ranges for the threshold parameters, it appeared that the knowledge scale is likely to perform optimally with persons possessing low levels of AIDS-related knowledge.
The slope and threshold parameters for Item 6 ('a person must have many different partners to get AIDS') of the knowledge scale were notably different to those associated with the other items in this scale, indicative of differential reliability. This issue is explored more fully later.
For the stigma scale, the threshold parameters for the 1PL model ranged from b = 0.28 to 2.34, while for the 2PL model the slope parameters ranged from a = 0.22 to 2.74, and the threshold parameters ranged from b = 0.42 to 3.52. The threshold parameter ranges described above suggested that the stigma scale was likely to perform best on persons presenting higher levels of stigma. The exception was Item 4 ('safe for people who have AIDS to work with others, including children'), which had notably different slope and threshold parameters.
For the knowledge scale, the 1PL model appeared to cover a wider range of the threshold parameters than did the other model, whereas for the stigma scale the 2PL model covered a wider range. Final choice of the model and scale was influenced by the item characteristic curves (ICC plots) and model-data fit statistics.

Functional form
To evaluate whether or not the response option choice for each item in the two scales conformed to the 1PL and 2PL models, the Item Characteristic Curve (ICC) for each item was inspected. Figure 1 shows the ICC for Item 5 ('women-to-men transmission') for the 2PL model fit to the knowledge scale. Similarly, Figure 2 depicts the ICC for Item 2 ('people who have AIDS are cursed') for the 2PL model fit to the stigma scale. These ICCs were typical of the other ICC plots within their respective scales.
For Item 5 of the knowledge scale (Figure 1), the 50% probability level (the point at which the "switch" from an incorrect to a correct answer occurs regarding whether women can give AIDS to men) aligned with a trait (knowledge) level of roughly -1.35. In other words, a fairly low level of knowledge was required for a participant to select the correct answer. All other items in the knowledge scale displayed similar profiles. The item with the highest level of knowledge required to "switch" to a correct answer was Item 6, which considered whether a person must have had many different partners to get AIDS. For this item the required knowledge level was very close to 0.00 on the trait continuum. The relatively high (albeit still low overall) level of knowledge required to answer this question pointed to possible confusion on the part of participants, namely, the belief that one has to be "promiscuous" to become infected.
For Item 2 within the 9-item stigma scale (Figure 2), the 50% probability level (switching from disagreeing to agreeing with the statement that persons with AIDS are cursed) aligned with a trait (stigma) level of roughly 1.45. In other words, a comparatively high level of stigma was required for a participant to agree with the statement. All items of the stigma scale displayed similar profiles, with the exception of Item 4, where the trace lines did not intersect in the acceptable range. This indicated a potential problem with this question. The item with the lowest level of stigma required to "switch" to an "in agreement" answer was Item 5, which contended that persons who have AIDS must expect restrictions on their freedom.
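Under the 2PL model, the trait level at which an item reaches a given response probability is θ = b + ln(p / (1 − p)) / a, so the 50% point coincides with the threshold b regardless of the slope. The sketch below uses the approximate 50% points read from Figures 1 and 2; the slope value `A_ASSUMED` is hypothetical, since the individual slopes are reported in Table 2 rather than in the text.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of the keyed response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def theta_at(prob, a, b):
    """Trait level at which a 2PL item reaches the given response probability."""
    return b + math.log(prob / (1.0 - prob)) / a

A_ASSUMED = 1.5  # hypothetical slope, for illustration only

# 50% points: knowledge Item 5 (about -1.35) and stigma Item 2 (about 1.45).
theta_half_knowledge = theta_at(0.5, A_ASSUMED, -1.35)
theta_half_stigma = theta_at(0.5, A_ASSUMED, 1.45)

# Stigma level at which agreement with stigma Item 2 becomes 80% likely.
theta_80_stigma = theta_at(0.8, A_ASSUMED, 1.45)
```

The 80% point depends on the slope: a more discriminating item moves from unlikely to likely endorsement over a shorter stretch of the trait continuum.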

Item level fit
To evaluate the absolute fit of the models to each item, the S-χ² item-fit statistic for dichotomous data was examined (Table 2).
For the knowledge scale, the item-fit statistics indicated a satisfactory fit, in that only one (Item 6: 'different partners') of the eleven items was not well represented by the estimated parameters for the 1PL model. All items were well represented for the 2PL model.
For the stigma scale, the item-fit statistics indicated a poor fit, in that five of the nine items were not well represented by the estimated parameters for the 1PL model. In contrast, the 2PL model was a good fit, with all items well represented.
In summary, the analyses provided support for the use of the 2PL model for both the 11-item knowledge scale and the 9-item stigma scale, and this was adopted hereafter. For the stigma scale, item 4 was clearly problematic and required deeper investigation.

Model level fit (comparison)
To compare the relative fit of the models to the sample data, the following methods were employed: the Likelihood Ratio Test (LRT); the Bayesian Information Criterion (BIC); the Akaike Information Criterion (AIC); and the M2 limited information goodness-of-fit statistic, its associated p-value, and the RMSEA index. The various fit statistics are given in Table 3.
For the knowledge scale, the LRT results suggested that the omission of items 5 and 6 improved the explanation of the item responses over that of the scale with all eleven items by 17.9% (Δχ²(22 - 18) = 4852.34 - 3983.71 = 868.63, p < 0.01). This result was reflected by the BIC, AIC and M2 statistics, confirming that the reduced knowledge model without items 5 and 6 should be the model of choice.
For the stigma scale, the LRT results suggested that the omission of item 4 yielded a 17.4% better fit than a scale with all nine items (Δχ²(18 - 16) = 3268.91 - 2700.69 = 568.22, p < 0.01). The other fit statistics further confirmed the argument for exclusion of Item 4 from the stigma scale.
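The chi-square differences reported above can be reproduced with a few lines of arithmetic. This sketch assumes the four reported values are -2 log-likelihoods and that the degrees of freedom equal the difference in the number of estimated parameters; the upper-tail probability uses a closed form that holds only for even degrees of freedom, which covers both comparisons here.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def lrt(fit_full, fit_reduced, params_full, params_reduced):
    """Return (chi-square difference, df, p-value) for two model fit values."""
    delta = fit_full - fit_reduced
    df = params_full - params_reduced
    return delta, df, chi2_sf_even_df(delta, df)

knowledge = lrt(4852.34, 3983.71, 22, 18)  # 11-item vs 9-item knowledge scale
stigma = lrt(3268.91, 2700.69, 18, 16)     # 9-item vs 8-item stigma scale
```

Both differences are far beyond the critical values for 4 and 2 degrees of freedom, consistent with the reported p < 0.01.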
As determined above, the model assumptions were tenable, and hence a description of the item properties, including the extent of psychometric information (precision) available, could be made for each item and the scales. The model parameter estimates for a 9-item knowledge scale and an 8-item stigma scale are provided in Table 4.
For the reduced knowledge scale, slope estimates ranged from 1.35 (item 4) to 2.68 (item 2), indicating that most items have a similar relationship with knowledge. The item threshold parameters ranged from -1.36 (item 4) to -0.60 (item 11), with the majority located around an underlying knowledge level of -0.65, indicating that the knowledge scale was most useful/reliable for respondents with relatively low levels of knowledge about AIDS.
For the reduced stigma scale, the slope estimates ranged from 0.79 (item 5) to 2.69 (item 1), suggesting most items had a similar relationship with stigma. The majority of the item thresholds were located around an underlying stigma level of 1.50, indicating that this stigma scale was most useful in distinguishing individuals with moderate/high rather than extreme (low) levels of stigma.
Examination of the Item Information Function (IIF) curves for the nine items in the reduced knowledge scale (Figure 3) revealed item 2 ('can a person get AIDS by sharing kitchens or bathrooms with someone who has AIDS?') as providing the most information (precision) to the scale, whereas item 4 ('can men give AIDS to women?') provided the least information (note that item numbering from the original 11-item scale was retained). The IIF curves for the eight items in the reduced stigma scale (Figure 4; all original item numbers retained) indicated that item 1 ('people who have AIDS are dirty') provided the most information, while the item providing the least information was item 5 ('people who have AIDS must expect some restriction on their freedom').
To comprehend the workings of the scale as a whole, the IIFs can be summed at each trait level to create a total information function (TIF). Figures 5 and 6 depict the TIFs for the reduced 9-item knowledge and 8-item stigma scales, respectively. The TIF for the reduced 9-item knowledge scale (Figure 5) indicated that this scale did not provide relatively uniform information. Rather, the TIF provided the most information in the knowledge range of around -2.00 to 0.25. In essence, this reduced knowledge scale was most useful when administered to participants with low to moderate levels of AIDS-related knowledge (range -2.00 to 0.25) but was not very useful where participants possessed either very low levels of AIDS-related knowledge (less than -2.00) or good to very good levels of knowledge (greater than 0.25). The marginal reliability for the 9-item reduced knowledge scale was 0.71.
The TIF for the 8-item reduced stigma scale (Figure 6) indicated that this scale also did not provide relatively uniform information for the range above -0.50, but rather provided the most information in the stigma range of around 0.50 to 2.50. In essence, this reduced stigma scale was most useful with participants exhibiting moderate to high levels of AIDS-related stigma (range 0.50 to 2.50), but was not very useful where participants exhibited very low levels of AIDS-related stigma (less than 0.00). The marginal reliability for the 8-item reduced stigma scale was 0.56.
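The relationship between item information, total information and measurement precision can be sketched as follows. The TIF is the sum of the 2PL item information functions at a given trait level, and the conditional standard error of the trait estimate is 1/√TIF. The item parameters below are hypothetical values placed in the low-knowledge region; they are not the Table 4 estimates.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def total_information(theta, items):
    """Total information: sum of item information values at theta."""
    return sum(item_information(theta, a, b) for a, b in items)

def standard_error(theta, items):
    """Conditional standard error of the trait estimate at theta."""
    return 1.0 / math.sqrt(total_information(theta, items))

# Hypothetical (slope, threshold) pairs clustered at low trait levels.
items = [(1.4, -1.3), (2.0, -0.9), (2.6, -0.7), (1.8, -0.6)]

se_low = standard_error(-1.0, items)   # near the item thresholds
se_high = standard_error(2.0, items)   # far above the item thresholds
```

When thresholds cluster in one region of the trait continuum, as they do for both scales here, the standard error is small in that region and grows quickly outside it, which is what the non-uniform TIFs in Figures 5 and 6 depict.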

Conclusions
This study examined the psychometric properties of the 11-item knowledge scale and the 9-item stigma scale. We have argued for the use of IRT analysis as a logical extension of, and enhancement to, psychometric validation of scales by the application of CTT. The CTT and IRT analyses both demonstrated the unidimensionality of the knowledge and stigma scales. The CTT analysis indicated that item 4 was problematic for the stigma scale ("safe to work with others, including children"), and this was confirmed by the IRT analysis. Regarding the knowledge scale, the IRT analysis supported the removal of item 5 ("women-to-men" transmission) and item 6 ("different partners"). The removal of these items resulted in statistically significant improvement in both scales.
The 2PL model worked best for both scales, with both item difficulty and discrimination being variable across items. The location parameters for both scales indicated each is reliable, though over a defined range of the applicable latent trait. The knowledge scale was most reliable for lower levels of AIDS-related knowledge, with marginal reliability of 0.71. Conversely, the AIDS-related stigma scale performed best at the higher levels of the stigma trait, with marginal reliability of 0.56.
These findings have important implications for application of the scales in different populations and socio-cultural contexts. The knowledge scale appears to work most reliably for individuals with low levels of AIDS knowledge, suggesting it has greater application in similar populations. There are a variety of reasons why AIDS knowledge might be low or moderate in any community. Firstly, and as indicated in the extant literature (Kalichman and Simbayi, 2004; Agyemang et al., 2012), there is a significant relationship between levels of formal education and AIDS knowledge, with higher levels of education being associated with better AIDS knowledge. Secondly, there is evidence to suggest that greater adherence to traditional/supernatural/non-scientific beliefs about HIV and AIDS is associated with lower levels of AIDS knowledge (Aggleton and Chase, 2001; Kalichman and Simbayi, 2004). Taken together, there is reason to conclude that the current knowledge scale would work best in populations with low levels of formal education and high levels of adherence to traditional/non-scientific and even supernatural explanations for the origins and transmission of HIV/AIDS. This would make it an arguably good fit in communities that are already at high risk because their risky sexual behaviour is not attenuated by valid AIDS knowledge.
In contrast, in communities where AIDS knowledge is high as a result of better education and exposure to media, and/or where there is diminished currency for traditional/non-scientific explanations about HIV/AIDS, the scale would possibly be of reduced reliability.
Regarding the stigma scale, the analysis suggests that the scale would work best in communities where AIDS-related stigma is moderate to high. As there is an established inverse relationship between education and stigma and between AIDS knowledge and stigma (Kalichman et al., 2005; Mall et al., 2013), this suggests that the scale would work most reliably in populations for which the knowledge scale would be applicable. However, in contexts where stigma is less obvious and blatant and more nuanced, the scale would undoubtedly lose some reliability. Likewise, in a measurement context where social desirability is a greater factor, and hence stigma beliefs are more attenuated, the scale would arguably be weaker.
It is generally acknowledged that voluntary counselling and testing is pivotal to properly addressing the prevalence and incidence of the HIV pandemic. As has been demonstrated elsewhere, the testing behaviour of individuals, that is, their propensity for serostatus testing, is affected by a number of factors such as their gender, education, employment status, level of AIDS knowledge, adherence to customary/non-scientific beliefs, and levels of AIDS-related stigma (Stein and Nyamathi, 2000; Anderson and Beutel, 2007; Scott-Sheldon et al., 2013). Of these, AIDS knowledge and stigma have been shown to be particularly notable, and accordingly have received considerable attention as a means towards improving testing behaviour. There is thus considerable merit in ensuring that the scales used to measure these factors are robust enough for application in the contexts in which they are applied, and that they are sufficiently reliable for the specific demographic and socio-cultural characteristics of the specific communities for which they are designated.
The current study utilized a sample of construction workers in South Africa, but there are numerous key populations outside the construction sector which share similar demographic and socio-cultural characteristics, and for which there is heightened concern about HIV prevalence and incidence. One example of this is truck drivers, while another would be mine workers (Department of Public Works (DPW), 2004). The current analysis would suggest that the tested knowledge and stigma scales would have considerable currency for these key populations as well, alongside a range of other populations.
Finally, greater measurement integrity for applied scales improves the overall rigour of programmatic interventions for which these are used, thereby ensuring better targeting of high risk populations and more focused allocation and utilization of health financial, technical and human resources, two critically important factors in addressing the pandemic in resource-poor contexts. To the best of our knowledge, no IRT analyses of AIDS-related knowledge and stigma scales for application within either the construction industry or the general population have previously been reported. It is thus hoped that this work will contribute towards reducing this evidentiary deficit and help enhance current epidemiological efforts in South Africa and other low to middle income countries with populations of similar socio-cultural and HIV risk features.

Limitations
There are several limitations to this study that circumscribe the applicability of its findings. Firstly, the study relied on a sample which is quite sharply defined in terms of various demographic, attitudinal and behavioural characteristics, some of which are arguably very specific to the construction industry. While other populations may share similar characteristics, the specific configuration found in this sample may constrain generalisability of the results more than is hoped by the authors. Secondly, the issue of differential item functioning was not explored in this study. Such an examination is arguably important for various reasons, not least the use of the survey instrument in three distinct languages. It is thus likely that item functioning might vary with the specific language applied, and this issue would be that much more important for use of the scales in diverse multilingual populations. Thirdly, the knowledge and stigma scales rely to some extent on localized understandings and representations of AIDS, and may not necessarily transfer to other contexts and populations where non-scientific beliefs about AIDS are similarly widespread but which are of a very different social, cultural or religious character.