Debate participation and academic achievement among high school students in the Houston Independent School District: 2012 - 2015

Competitive debate programs exist across the globe, and participation in debate has been linked to improved critical thinking skills and academic performance. However, few evaluations have adequately addressed self-selection into the activity when examining its impact on achievement. This study evaluated the relationship between participating in a debate program and academic performance among high school students (N=35,788; 1,145 debaters and 34,643 non-debaters) using linked debate participation and academic record data from the Houston Independent School District. Academic performance was indicated by cumulative GPA and performance on the SAT college entrance exam. Selection into debate was addressed using propensity score methods informed by sociodemographic characteristics and 8th grade standardized test scores, which account for pre-debate achievement. Debate participation was associated with 0.66 points (95% Confidence Interval (CI): 0.64, 0.68) higher GPA, 52.43 points (95% CI: 50.47, 54.38) higher SAT Math, and 57.05 points (95% CI: 55.14, 58.96) higher SAT Reading/Writing scores. Findings suggest that competitive debate is associated with better academic outcomes for students.


INTRODUCTION
There are persistent gaps in academic achievement and college readiness in urban public school districts, especially among lower-income and minority students (Banerjee, 2016). Policy makers and educators have promoted extracurricular learning as a way to address these achievement disparities (Marsh and Kleitman, 2002). However, there is limited quantitative evidence that extracurricular programs improve academic outcomes, and college readiness in particular, for lower-income and/or minority secondary school students. Research is particularly needed in districts that predominantly serve Latino/Hispanic students, the fastest-growing group in K-12 schools (US Department of Education, 2020). De facto segregation by race and ethnicity persists in the US public school system. According to the US Department of Education, 95% of Hispanic and 96% of Black students attend a school that is at least 25% racial/ethnic minority; in comparison, only 52% of non-Hispanic white students do so (US Department of Education, 2020).
*Corresponding author. E-mail: bmezuk@umich.edu. Tel: 734.615.9204.
Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
Competitive debate is a co-curricular activity centered on the communication of evidence-based argumentation. Pairs of students work together to debate both sides of policy-relevant topics (e.g., government support for renewable energy), and in the process practice academic skills including reading and interpreting complex nonfiction text, developing and responding to arguments orally and in writing, collaborative and cooperative learning, and time management (Mitchell, 1998). Debate leagues continue to grow worldwide, with international tournaments drawing debaters from up to 60 countries (English-Speaking Union, 2020). In addition, there is a large body of qualitative evidence supporting the positive impact of debate on critical thinking skills, school engagement, and personal development (Louden, 2010).
There are major challenges to isolating and quantifying the impact of extracurricular activities on academic performance. Although an extensive literature, both quantitative and qualitative, describes the salience of a wide range of extracurricular activities (e.g., music, sports, theater) for adolescent development (Eccles et al., 2003; Eccles and Barber, 1999; Gibbs et al., 2015; Marsh and Kleitman, 2002), causal evidence that specific activities enhance academic performance is limited. This stems from two methodological issues. First, it is difficult to identify an appropriate comparison group (Marsh and Kleitman, 2002), because program evaluators often have data only on students who participate in the activity. Second, it is difficult to account for self-selection into the activity (Hunt, 2005): programs often have data on students only after they have begun participating, so evaluators cannot account for pre-activity academic performance when estimating the impact of the program. Large academic administrative data systems, which can be linked to information on participation in extracurricular activities, provide an opportunity to address both of these limitations (Mezuk et al., 2011).
Using such large administrative data systems, a handful of studies have quantitatively evaluated the relationship between participating in a policy debate league and academic achievement in urban school districts. Mezuk et al. (2011) found that Chicago high school students who participated in debate were more likely to graduate from high school, performed better on the ACT college entrance exam, and gained more in GPA over the course of high school than comparable students who did not participate. A more recent report found that debate was associated with gains in standardized test scores and a lower likelihood of absenteeism among middle school students in Baltimore (Shackelford, 2019). Both the Mezuk et al. (2011) and Shackelford (2019) studies used propensity score methods to account for the non-random assignment (that is, self-selection) of students into debate programs; both found that better-achieving students were more likely to self-select into debate, but that debate participation was still associated with academic outcomes after accounting for this self-selection. In addition, other quantitative reports have examined the relationship between debate participation and indicators of psychosocial development (e.g., self-efficacy, civic engagement) and have reported positive correlations (Anderson and Mezuk, 2015; Kalesnikava et al., 2019). In sum, quantitative studies of debate participation in urban school districts show that although there is differential self-selection into debate, as with other extracurricular activities (Hunt, 2005), debate participation is still associated with better academic performance after accounting for this self-selection.
The present study aims to extend this work by assessing the relationship between debate participation and indicators of academic achievement and college readiness among a large sample of high school students from a district that serves a predominantly Hispanic/Latino student population. Data come from the Houston Urban Debate League (HUDL) and the Houston Independent School District (HISD), the largest school district in Texas, with records spanning 2012 to 2015. We use quasi-experimental propensity score methods to account for the non-random sorting of students into debate and thereby attempt to isolate the influence of participation in this activity on academic achievement.

Data sources
Two sources of de-identified data on three 9th grade cohorts (2012/13 through 2014/15) were merged to form the sample: 1) academic records from HISD and 2) debate participation records from HUDL. The debater group consisted of all HUDL participants ("debaters") during this period, as indicated by debate tournament participation records. The comparison sample of non-debaters was created by drawing a 30% random sample of 9th grade students who did not debate in each academic year (2012/13 to 2014/15), which equated to approximately 11,000 students from each 9th grade cohort. The resulting analytic sample consisted of 35,788 students: 1,145 debaters (that is, students who participated in at least one debate tournament) and 34,643 non-debaters.
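The per-cohort sampling of non-debaters can be sketched in Python as below. This is an illustrative sketch only, not the study's code: the record layout (hypothetical keys `cohort` and `debater`), the seed, and the rounding of the sample size are assumptions. All debaters are kept, and non-debaters are sampled at 30% separately within each 9th grade cohort.

```python
import random
from collections import defaultdict


def sample_non_debaters(students, fraction=0.30, seed=2012):
    """Keep every debater; draw a simple random sample of non-debaters within each cohort.

    `students` is a list of dicts with hypothetical keys "cohort" and "debater" (bool).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    debaters = []
    non_debaters_by_cohort = defaultdict(list)
    for student in students:
        if student["debater"]:
            debaters.append(student)
        else:
            non_debaters_by_cohort[student["cohort"]].append(student)

    sampled = []
    for cohort in sorted(non_debaters_by_cohort):  # visit cohorts in a fixed order
        pool = non_debaters_by_cohort[cohort]
        k = round(fraction * len(pool))  # 30% of this cohort's non-debaters
        sampled.extend(rng.sample(pool, k))
    return debaters + sampled


# Toy data: 5 debaters and 100 non-debaters in each of three cohorts.
cohorts = ("2012/13", "2013/14", "2014/15")
toy = [{"cohort": c, "debater": i < 5} for c in cohorts for i in range(105)]
print(len(sample_non_debaters(toy)))  # 105 (15 debaters + 3 x 30 non-debaters)
```

Sampling within cohort, rather than from the pooled three years, guarantees each cohort contributes the same fraction of its non-debaters, which matches the per-year description in the text.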
All demographic and academic performance variables were derived from HISD administrative records. Sociodemographic characteristics included sex, age in 9th grade, race/ethnicity (coded as Hispanic/Latino, Black, non-Hispanic white, Asian, Native American, and other for analysis), cohort year (2012/13, 2013/14, and 2014/15), and whether the student qualified for free/reduced-price lunch, which served as a proxy for economic disadvantage. Finally, to account for differential self-selection of students into debate as a function of academic performance, we indexed pre-debate (that is, 8th grade) achievement by performance on the Reading and Math sections of the State of Texas Assessments of Academic Readiness (STAAR), a statewide standardized exam (Texas Education Agency, 2021). Although the exact score thresholds on the STAAR sections vary from year to year, for 8th grade, scores of 1700-1759 on the Reading section and 1700-1828 on the Math section indicate that the student "Meets" academic readiness thresholds for those subjects; higher scores indicate that the student "Masters" those subjects (Texas Education Agency, 2020).
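Using the 8th grade ranges quoted above, the readiness categories can be coded as in the Python sketch below. The cut points are passed in as a parameter because they vary by year. The dictionary name and the "Below Meets" label for scores under the Meets range are hypothetical, since the text names only the Meets and Masters categories.

```python
# Illustrative 8th grade "Meets" ranges quoted in the text; actual STAAR cut scores vary by year.
MEETS_RANGES = {"reading": (1700, 1759), "math": (1700, 1828)}


def staar_category(scale_score, subject, meets_ranges=MEETS_RANGES):
    """Map an 8th grade STAAR scale score to the readiness categories described in the text."""
    low, high = meets_ranges[subject]
    if scale_score < low:
        # Hypothetical catch-all label; the text does not name the categories below "Meets".
        return "Below Meets"
    if scale_score <= high:
        return "Meets"
    return "Masters"  # scores above the "Meets" range


print(staar_category(1745, "reading"), staar_category(1745, "math"), staar_category(1790, "reading"))
# Meets Meets Masters
```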

Outcome assessment
We examined two academic outcomes: cumulative GPA (that is, the last recorded GPA for each student, modeled as a continuous variable) and performance on the Math and Evidence-Based Reading and Writing sections of the SAT college entrance exam. For students who took the SAT multiple times, only the highest score was used. The format of the SAT changed during the study period (The College Board, 2015): from 2005 to 2016, the SAT was scored out of a total of 2400 points across three sections (Math, Critical Reading, and Writing), each worth 800 points. We converted these scores to the current (2016-present) SAT format, which has two sections (Math and Reading/Writing), each worth 800 points for a total possible score of 1600, according to College Board concordance guidelines (The College Board, 2016). The College Board has identified benchmarks that represent "college readiness" (that is, a 75% likelihood of earning at least a "C" in a first-semester college course related to each section); these are scores of ≥480 on the Reading/Writing section and ≥530 on the Math section (The College Board, 2016). SAT performance was examined both as a continuous outcome (that is, average expected score on each section) and as a binary outcome (that is, whether the student met the college-readiness benchmark for the section).
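Selecting each student's best SAT attempt and coding the two college-readiness benchmarks could look like the Python sketch below. It assumes scores have already been converted to the 2016-present section scale (the College Board concordance tables are not reproduced here). It also interprets "highest score" as the attempt with the highest Math + Reading/Writing total; taking the maximum of each section separately ("superscoring") is an equally plausible reading of the text.

```python
MATH_BENCHMARK = 530  # college-readiness benchmark for SAT Math, from the text
RW_BENCHMARK = 480    # college-readiness benchmark for SAT Reading/Writing, from the text


def sat_outcomes(attempts):
    """attempts: list of (math, reading_writing) section scores on the 200-800 scale."""
    # Assumption: keep the attempt with the highest total (ties go to the earliest attempt).
    math_score, rw_score = max(attempts, key=lambda attempt: attempt[0] + attempt[1])
    return {
        "sat_math": math_score,                                # continuous outcome
        "sat_rw": rw_score,                                    # continuous outcome
        "meets_math_benchmark": math_score >= MATH_BENCHMARK,  # binary outcome
        "meets_rw_benchmark": rw_score >= RW_BENCHMARK,        # binary outcome
    }


print(sat_outcomes([(500, 470), (520, 490)]))
# {'sat_math': 520, 'sat_rw': 490, 'meets_math_benchmark': False, 'meets_rw_benchmark': True}
```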

Treatment of missing data
Data in this study all come from administrative sources (e.g., debate tournament records and school records), and as such, some variables have substantial missing data. Because these data are unlikely to be missing completely at random, restricting the analysis to cases with complete data on all covariates (n=16,704) would have produced a biased sample (Leyrat et al., 2019). To address this problem, we used Multiple Imputation by Chained Equations (MICE) (van Buuren and Groothuis-Oudshoorn, 2011). We imputed 10 complete datasets from the original data, with a maximum of 10 iterations per imputation, using the R package mice (version 3.6.0). We verified the plausibility of the imputed values (e.g., ensuring there were no implausible values of age in 9th grade) using diagnostic plots comparing the marginal distributions of observed and imputed data.
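The chained-equations idea can be made concrete with the Python sketch below, which imputes two partially observed numeric variables by cycling between them. Each variable with missing values is regressed on the other (using its observed rows), and its missing entries are redrawn from the fitted line plus random noise. This is a deliberately simplified stand-in, not the mice package's algorithm: mice defaults to predictive mean matching for numeric data and also draws the regression parameters, and the study imputed many more variables. The function names and the two-variable setup are hypothetical; the defaults of 10 datasets and 10 iterations mirror the text.

```python
import random
import statistics


def _simple_ols(x, y):
    """Intercept, slope, and residual SD of a one-predictor least-squares fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / max(len(resid) - 2, 1)) ** 0.5
    return intercept, slope, sd


def impute_chained(a, b, n_iter=10, seed=0):
    """Fill None values in two paired numeric columns by alternating stochastic regressions."""
    rng = random.Random(seed)
    miss_a = [v is None for v in a]
    miss_b = [v is None for v in b]
    # Initialize each missing entry with its column's observed mean.
    mean_a = statistics.fmean(v for v in a if v is not None)
    mean_b = statistics.fmean(v for v in b if v is not None)
    cur_a = [mean_a if m else v for v, m in zip(a, miss_a)]
    cur_b = [mean_b if m else v for v, m in zip(b, miss_b)]
    for _ in range(n_iter):
        for target, predictor, miss in ((cur_a, cur_b, miss_a), (cur_b, cur_a, miss_b)):
            observed = [i for i, m in enumerate(miss) if not m]
            intercept, slope, sd = _simple_ols(
                [predictor[i] for i in observed], [target[i] for i in observed]
            )
            for i, m in enumerate(miss):
                if m:  # redraw only originally missing entries; observed values never change
                    target[i] = intercept + slope * predictor[i] + rng.gauss(0.0, sd)
    return cur_a, cur_b


def multiply_impute(a, b, m=10, n_iter=10):
    """Return m completed (a, b) datasets, each from a different random seed."""
    return [impute_chained(a, b, n_iter=n_iter, seed=k) for k in range(m)]
```

For example, `multiply_impute([1.0, 2.0, None, 4.0, 5.0, 6.0], [2.1, None, 6.2, 8.1, 9.9, 12.2])` returns 10 completed versions of the two columns whose imputed entries differ across datasets, which is what lets the later pooling step reflect imputation uncertainty.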

Analysis
First, we compared the sociodemographic characteristics of debaters and non-debaters using chi-squared tests for categorical variables and t-tests for continuous variables. This comparison shows the degree to which debaters differed from students who did not debate, including differential self-selection into the activity, and provides a measure of the program's reach (that is, which types of students are engaging in the debate league, and which are not).
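For categorical characteristics, the debater versus non-debater comparison is a Pearson chi-squared test of independence. The dependency-free Python sketch below computes the statistic, its degrees of freedom, and the p-value using the closed-form series for the chi-squared upper tail that holds for integer degrees of freedom. It omits the continuity correction; R's `chisq.test()` applies Yates' correction to 2 x 2 tables by default, so results for such tables will differ slightly. The t-tests for continuous variables are not shown, and the example counts are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def chi_squared_test(table):
    """Pearson chi-squared test of independence for an r x c table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Made-up counts: rows = debaters / non-debaters, columns = female / male.
stat, df, p = chi_squared_test([[10, 20], [30, 40]])
print(round(stat, 4), df)  # 0.7937 1
```

The same function handles the six-category race variable, since the degrees of freedom follow from the table's dimensions.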
Ko and Mezuk 221
Next, we used inverse probability of treatment weighting (IPTW) (Austin and Stuart, 2015) to account for selection bias in our estimates of the relationship between debate participation and the two outcome indicators of academic achievement (SAT performance and GPA). IPTW addresses selection bias by weighting each observation by the inverse of the probability (that is, propensity) that the student debated: for example, students who were very likely to debate, and did in fact debate, are down-weighted, whereas students who were very unlikely to debate, but did in fact debate, are up-weighted. This weighting creates a "pseudo-population" in which debaters and non-debaters are balanced on their observed characteristics. In this manner, IPTW generates estimates of the debate-achievement relationship that are less biased than those from standard multivariable regression (Austin and Stuart, 2015).
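One common way to check whether weighting has produced a balanced pseudo-population is to compare standardized mean differences (SMDs) of each covariate before and after weighting, a diagnostic discussed by Austin and Stuart (2015). The text does not report which balance checks were run. The Python sketch below computes a weighted SMD from weighted means and weighted variances in a pooled standard deviation; other conventions (e.g., unweighted variances in the denominator) are also used. Values near zero indicate balance, and the toy values are made up.

```python
import math


def weighted_mean_var(values, weights):
    """Weighted mean and (population-style) weighted variance."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var


def standardized_mean_difference(x_debaters, x_others, w_debaters=None, w_others=None):
    """SMD of one covariate between debaters and non-debaters; weights default to 1 (unweighted)."""
    if w_debaters is None:
        w_debaters = [1.0] * len(x_debaters)
    if w_others is None:
        w_others = [1.0] * len(x_others)
    m1, v1 = weighted_mean_var(x_debaters, w_debaters)
    m0, v0 = weighted_mean_var(x_others, w_others)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)


# Toy covariate values: unweighted, the groups differ; up-weighting the low-valued
# non-debater makes the weighted non-debater distribution match the debaters'.
deb = [1, 3]
non = [1, 3, 3, 3]
print(standardized_mean_difference(deb, non))                         # about -0.535
print(standardized_mean_difference(deb, non, w_others=[3, 1, 1, 1]))  # 0.0
```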
To generate the propensity score (that is, the probability that a student debated), we used a two-step process. First, within each of the 10 imputed datasets, we fit a logistic regression model predicting debate participation (1=yes, 0=no) from observed sociodemographic characteristics (that is, sex, age in 9th grade, race, 9th grade cohort/year, and free/reduced lunch status) and pre-debate achievement (that is, 8th grade STAAR reading and math scores). Next, from this logistic regression model, we estimated the predicted probability (that is, propensity; possible range: 0 (very unlikely to debate) to 1 (very likely to debate)) for each student in the sample. We generated the IPT weight for each student by taking the inverse of this probability (1/predicted probability of debate participation).
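The propensity and weighting steps can be sketched in plain Python as below. Here the logistic regression is fit by simple batch gradient ascent rather than the iteratively reweighted least squares that R's `glm()` uses; in practice one would call `glm()` (or a statistics library) within each imputed dataset. The text describes the weight as 1/(predicted probability of debating). This sketch instead assumes the standard average-treatment-effect form described by Austin and Stuart (2015), in which debaters receive 1/p and non-debaters receive 1/(1 - p). The learning rate, step count, and toy data are arbitrary.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_steps=2000):
    """Logistic regression by batch gradient ascent on the mean log-likelihood.

    X: list of covariate rows (an intercept column is added here); y: 1 = debated, 0 = did not.
    """
    rows = [[1.0] + list(r) for r in X]
    n, k = len(rows), len(rows[0])
    beta = [0.0] * k
    for _ in range(n_steps):
        grad = [0.0] * k
        for r, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, r)))
            for j in range(k):
                grad[j] += (yi - p) * r[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, x):
    """Predicted probability of debating for covariate row x."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def iptw(p, debated):
    """ATE-style weight (assumption): 1/p for debaters, 1/(1 - p) for non-debaters."""
    return 1.0 / p if debated else 1.0 / (1.0 - p)


# Toy example with one standardized covariate (e.g., an 8th grade test score).
X = [[-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]]
y = [0, 0, 1, 0, 1, 1]
beta = fit_logistic(X, y)
weights = [iptw(propensity(beta, x), yi == 1) for x, yi in zip(X, y)]
```

Under this form, a debater with p = 0.25 gets weight 4, while a non-debater with the same propensity gets 1/0.75, so students whose observed participation is "surprising" given their covariates count more in the pseudo-population.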
We then used these weights to fit regression models of academic achievement (that is, GPA and SAT performance) on debate participation, using a two-step procedure. First, for each of the 10 imputed datasets, we fit a generalized linear model estimating the effect of debate participation on each outcome (that is, GPA, SAT Math score, and SAT Reading/Writing score), using IPTW and adjusting for sex, age in 9th grade, race, 9th grade cohort, free/reduced lunch status, and 8th grade STAAR reading and math scores. Second, the parameter estimates (beta coefficients) and standard errors were pooled across the 10 imputed datasets into a single set of values for each indicator of achievement. Three alternative specifications of this model were considered: (1) unadjusted for covariates, using IPTW; (2) unadjusted for covariates, using IPTW with all interaction terms included in the propensity score model; and (3) adjusted for all covariates, using IPTW with all interaction terms included in the propensity score model. However, model fit was poor for these alternative models, and the R² was consistently highest for the fully adjusted model using IPTW with no interaction terms in the propensity score model.
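The pooling step across imputed datasets is conventionally done with Rubin's rules. The pooled estimate is the mean of the m per-dataset estimates, and its variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The Python sketch below implements these rules for a single coefficient with a normal-approximation 95% interval. The mice package's `pool()` function additionally uses a t reference distribution with adjusted degrees of freedom, and the text does not state exactly how intervals were formed. The example inputs are arbitrary.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)                     # pooled point estimate
    u_bar = statistics.fmean(se ** 2 for se in std_errors)  # average within-imputation variance
    b = statistics.variance(estimates)                      # between-imputation variance (m - 1 denominator)
    total_variance = u_bar + (1 + 1 / m) * b
    return q_bar, math.sqrt(total_variance)


def normal_ci(estimate, se, z=1.959963984540054):
    """Normal-approximation 95% confidence interval."""
    return estimate - z * se, estimate + z * se


# Arbitrary estimates from 3 imputed datasets (the study used 10).
est, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(est, round(se, 4))  # 2.0 1.5275
```

The (1 + 1/m) inflation is why pooled standard errors are larger than any single dataset's: the spread of estimates across imputations is treated as extra uncertainty about the missing values.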
All data analyses were conducted in R (version 3.5.2) using RStudio, and all p-values refer to two-tailed tests. This study was reviewed and deemed exempt from human subjects regulation by the Institutional Review Board at the University of Michigan. It was approved by the Office of Research and Accountability at HISD.

RESULTS
As shown in Table 1, nearly two-thirds of the sample was Hispanic/Latino and three-quarters qualified for free/reduced lunch, a proxy indicator of socioeconomic disadvantage. This is consistent with the overall demographics of HISD (Houston Independent School District, 2021), suggesting that our sample was representative of the district as a whole. Debaters were slightly younger in 9th grade and were more likely to be female and Asian or non-Hispanic white compared to non-debaters; there was no difference in free/reduced lunch status. Although 8th grade STAAR scores were significantly higher for debaters, consistent with differential self-selection of higher-achieving students into the activity, even among debaters these higher scores were still only in the "Meets" academic readiness category.
Using IPTW to account for self-selection into debate, the average cumulative GPA for debaters was 0.66 points (95% Confidence Interval (CI): 0.64, 0.68) higher than that of comparison students. Similarly, debate participation was associated with a 52.43-point (95% CI: 50.47, 54.38) higher score on the Math section and a 57.05-point (95% CI: 55.14, 58.96) higher score on the Reading/Writing section of the SAT. As shown in Figure 1, debate participants were significantly more likely than non-debaters to meet the college-readiness benchmark on the Reading/Writing section of the SAT (odds ratio: 1.18, 95% CI: 1.13, 1.23), but not on the Math section.
Figure 1: Error bars indicate 95% confidence intervals. College-readiness benchmarks are the scores identified by the College Board as indicating a 75% likelihood of earning at least a C in first-semester courses related to the section (e.g., a quantitatively oriented course for the Math SAT).
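The odds ratio for the Reading/Writing benchmark is the exponentiated coefficient from a model of a binary outcome. Assuming (the text does not say) that its 95% CI is a Wald interval on the log-odds scale, the log-odds estimate and its standard error can be backed out of the reported interval, as in the Python sketch below. Because the published values are rounded, the recovered numbers are approximate; the function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def log_scale_from_ci(or_lower, or_upper, z=Z95):
    """Log-odds estimate and SE implied by a Wald CI that is symmetric on the log scale."""
    lo, hi = math.log(or_lower), math.log(or_upper)
    return (lo + hi) / 2, (hi - lo) / (2 * z)


def odds_ratio_with_ci(beta, se, z=Z95):
    """Exponentiate a log-odds coefficient and its Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Reading/Writing benchmark result from the text: OR 1.18 (95% CI 1.13, 1.23).
beta, se = log_scale_from_ci(1.13, 1.23)
print(round(beta, 3), round(se, 4))                         # 0.165 0.0216
print([round(v, 2) for v in odds_ratio_with_ci(beta, se)])  # [1.18, 1.13, 1.23]
```

The round trip reproducing 1.18 from the interval endpoints is a quick consistency check that the reported estimate and CI agree on the log-odds scale.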
The substantive impact of our analytic decision to use IPTW on our inferences is shown in Table 2. This table presents estimates from 1) complete case analysis (that is, not using MICE) with standard generalized linear models (that is, not using IPTW), 2) imputed data with standard linear models (that is, not using IPTW), and 3) imputed data analyzed with IPTW models. Across all three modeling approaches, debate participation was significantly associated with both GPA and SAT outcomes; the IPTW results indicate that the relationship between debate and academic achievement was robust to differential self-selection based on observed sociodemographic characteristics and 8th grade (pre-debate) achievement as indicated by STAAR performance.

DISCUSSION
Competitive academic debate programs exist in thousands of communities around the globe, including recent growth in urban school districts in the United States (International Debate Education Association, 2017). Prior research has described the benefits of debate participation for outcomes such as critical thinking skills (Green and Klug, 1990; Kennedy, 2007), as well as self-efficacy and various indicators of social/emotional development (Anderson and Mezuk, 2015; Fine, 2004; Kalesnikava et al., 2019) that are in turn correlated with school engagement (Bellon, 2000). The present study, one of the largest quantitative evaluations of debate participation and achievement among high school students conducted to date, extends this work by providing evidence that debate participation is associated with better academic performance and college readiness, even after accounting for observed self-selection. These findings are consistent with those of prior studies in Chicago (Mezuk, 2009; Mezuk et al., 2011), which found that debate participants were more likely to reach college-readiness benchmarks on the ACT college entrance exam; the present study, the first to examine the relationship between debate participation and performance on the SAT college entrance exam, similarly found stronger effects on the Reading/Writing section than on the Math section. Findings are also consistent with research among middle school students in Baltimore (Shackelford, 2019), which found positive impacts of debate on school engagement and standardized test scores upon entering high school. In sum, this study adds to the growing literature showing that debate participation is associated with improved academic outcomes for adolescents in large urban districts.
Findings should be interpreted in light of the study's strengths and limitations. Consistent with prior work on debate, and on extracurricular activities in general, students with stronger academic performance in middle school differentially self-selected into this high school debate program (Hunt, 2005; Mezuk et al., 2011). Although this study used propensity score weighting to account for this self-selection when estimating the relationship between debate participation and achievement, the ability of IPTW methods to mimic an experimental design rests on strong, and generally untestable, assumptions about unmeasured confounders and measurement error. Therefore, while our approach reduces the bias that self-selection on observed characteristics introduces into our inferences, we cannot exclude the possibility of residual confounding due to unmeasured factors (e.g., participation in other extracurricular activities in high school, parental/familial characteristics, and non-cognitive skills such as grit (Heckman et al., 2006; Im et al., 2016; Shelly, 2011)). Strengths include the large sample with a racially/ethnically diverse student body, the longitudinal design, and the use of pre-debate achievement indicators in the IPTW models to minimize bias from self-selection into debate.
The Hispanic/Latino population is the largest ethnic minority group in the United States, currently representing approximately 27% of K-12 public school students (US Department of Education, 2020). This is one of the first quantitative studies to examine the relationship between debate participation and academic outcomes in a predominantly Latino/Hispanic school district, and these findings are consistent with prior work examining co-curricular activities and school engagement among Latino/Hispanic students. For example, Diaz (2005) reported that Latino high school students who engaged in more extracurricular activities reported higher levels of school engagement, although this was a general phenomenon and not specific to any particular activity. Similarly, LeCroy and Krysik (2008) reported that having more pro-academic peers was associated with both higher GPA and greater school engagement among Latino middle school students. As the number of Hispanic/Latino students grows, debate leagues have worked to ensure their programming is accessible to these students; for example, several leagues offer Spanish-language debate competitions (e.g., leagues in Minnesota (Minnesota Urban Debate League, 2021) and New York (Zimmerman, 2019)).
In sum, the present study adds to the literature illustrating the role of time-intensive, academically oriented extracurricular activities like debate in supporting school achievement for students in urban districts (Moriana et al., 2006). It demonstrates the potential of large administrative data systems to support rigorous evaluations of the impact of such programs on student achievement at scale (Mezuk et al., 2011). When viewed in combination with the large body of qualitative and ethnographic work that has explored the various ways competitive debate relates to adolescent development (Asad and Bell, 2014; Branham, 1995; Fine, 2004), these findings emphasize the salience of this activity for student engagement with learning both inside and outside the classroom (Louden, 2010).