Prepared by Armando Chacon and Pablo Peña
Steam mostly, A.C.
In Inoma B.C.
The assessment summarized in this document was carried out with information provided to the authors by Inoma. The authors are solely responsible for any methodological errors. They were not paid by Inoma or any other organization. Supplementary materials may be requested by contacting the authors at email@example.com.
Inoma is a Mexican nonprofit organization dedicated to improving educational outcomes by complementing conventional education with online educational games. This paper presents the results of an impact assessment of Inoma's online educational games on primary school pupils in the state of Puebla who were exposed to them. In their first stage, which is the subject of this evaluation, Inoma's games were designed to develop students' math skills as measured by the national Enlace examination.1
To assess the impact of its online games, Inoma conducted a randomized controlled trial (PCA, by its Spanish acronym) with students from 3rd to 6th grade in a group of public schools in the metropolitan area of the City of Puebla (AMCP), in the state of Puebla, Mexico. The PCA was conducted between February and June 2012.
Inoma had the support of the local authorities of the state of Puebla to ensure that students were systematically exposed to its online games. For the PCA, the state Department of Education allowed students in treatment schools to play Inoma's games for one hour a week during class time. Schools already devote one hour a week to developing their students' digital skills (learning to use a computer and the Internet) in their media classrooms. In treatment schools, that weekly hour could be spent playing Inoma's games, although this was not mandatory.
The state Department of Education determined that 184 primary schools in the AMCP met the conditions for the PCA.2 According to official records, all had media classrooms equipped with computers connected to the Internet. In a first stage, Inoma randomly chose which 60 of the 184 schools would form the sample in which the PCA would be applied. In a second stage, 30 of those schools were randomly selected for treatment. However, nine of the treatment schools were considered to have withdrawn from the trial because their representatives did not attend the first information session, so the same number of additional schools were randomly selected to replace those that had apparently withdrawn. As a result, the sample was expanded to 69 schools: 33 in the treatment group and 36 in the control group.
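The two-stage draw described above can be sketched as follows. This is only an illustration: the school identifiers, the seed, and the function name `two_stage_draw` are hypothetical, and the sketch does not reproduce the later replacement of schools that appeared to have withdrawn.

```python
import random

def two_stage_draw(school_ids, sample_size=60, n_treatment=30, seed=2012):
    """Draw a sample of schools, then split it into treatment and control."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the draw is reproducible
    # Stage 1: choose which eligible schools enter the trial sample.
    sample = rng.sample(school_ids, sample_size)
    # Stage 2: choose which sampled schools are offered the games.
    treatment = set(rng.sample(sample, n_treatment))
    control = [s for s in sample if s not in treatment]
    return sorted(treatment), control

# Hypothetical identifiers standing in for the 184 eligible schools.
eligible = [f"school_{k:03d}" for k in range(1, 185)]
treatment, control = two_stage_draw(eligible)
print(len(treatment), len(control))
```

Drawing the sample first and the treatment group second keeps the two random decisions separate, so schools outside the sample are never candidates for treatment.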
1 The test is administered at the end of each academic year to students in 3rd through 6th grade of primary school, in 1st through 3rd year of lower secondary school, and in the 3rd year of high school.
2 Some schools have two shifts, morning and evening. For the evaluation, different shifts in the same school were treated as different schools.
Inoma subsequently learned that most schools did not have adequate computers or Internet connectivity. Consequently, the media classrooms in treatment schools had to be repaired. Nine schools in the treatment group were removed for lack of infrastructure. In addition, two more schools in the treatment group refused to participate and withdrew from the trial. To maintain balance in the design of the PCA, some randomly selected schools were removed from the control group. The final sample consisted of 44 schools: 22 in the treatment group and 22 in the control group.
Table 1 shows the sample used for the evaluation. The information presented there is based on Enlace results. It considers only schools whose students took Enlace 2012 in 4th, 5th or 6th grade and Enlace 2011 in the previous grade (3rd, 4th or 5th). Of the 4,138 primary schools in the state of Puebla with students who took Enlace, 184 are in the AMCP and were assumed to have working media classrooms. Of these, 36 were assigned to the control group and 33 to the treatment group. However, 14 schools in the control group and 9 in the treatment group were removed from the trial, and 2 refused to participate.
Table 1 also shows the number of students in 4th, 5th and 6th grade in 2012 who took Enlace in both 2012 and 2011, when they were in 3rd, 4th and 5th grade. More than 350,000 students took Enlace, of whom more than 40,000 were in the AMCP. The schools considered fit for the PCA had more than 15,000 students. The final control group had 4,396 students and the final treatment group had 5,584.
Since the media classrooms were not operational before the PCA, Inoma had to repair those in treatment schools. The treatment was therefore not limited to exposing students to online educational games; it also included repairing the media classrooms so that their computers worked properly and had Internet connectivity. In theory, the repaired media classrooms could have had an impact of their own, not attributable to students having played Inoma's online games.
To obtain results faster, Inoma prepared reduced versions of the Enlace test, or "mini Enlaces".3 These tests consisted of between 14 and 19 items and were administered in both control and treatment schools in February, April and June 2012.4 In principle, administering tests of this kind, based on previous Enlace exams, could by itself lead to better performance on the real test: students practice more and teachers become more aware of their students' level of knowledge. Such tests were administered in both groups of schools, control and treatment. Consequently, because the control group took part in this additional Enlace practice, it can be considered to have received a treatment of sorts as well.
Students were not required to play Inoma's games. Because of this, only the "intent-to-treat" impact can be estimated. The treatment group was offered treatment, but not all individuals necessarily took it up. The impact is averaged over all students who were offered treatment, including those who chose not to take it. The relevance of the intent-to-treat estimate depends on the intervention expected in practice. If the actual policy were not mandatory, or were mandatory but without verification of compliance, then the intent-to-treat estimate would be informative about it.
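A minimal sketch of what an intent-to-treat comparison averages over. The numbers are invented for illustration and are not data from the study; the function name `intent_to_treat` is hypothetical.

```python
from statistics import mean

def intent_to_treat(offered_changes, control_changes):
    """Difference in mean outcome between everyone offered treatment and the controls."""
    return mean(offered_changes) - mean(control_changes)

# Hypothetical score changes (in standard deviations); not data from the study.
# Each offered student is a (played, change) pair; players and non-players both count.
offered = [(True, 0.30), (True, 0.20), (False, 0.00), (False, -0.10)]
control = [0.00, 0.05, -0.05, 0.00]

itt = intent_to_treat([change for _, change in offered], control)
players_only = mean(change for played, change in offered if played) - mean(control)
print(round(itt, 3), round(players_only, 3))
```

The players-only figure is shown only for contrast. Because playing was voluntary, it mixes any effect of the games with self-selection, which is why the text restricts itself to the intent-to-treat estimate.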
Inoma could monitor which students in treatment schools registered on its online platform. On realizing that by mid-April only a modest group of students in treatment schools had registered, Inoma offered incentives to increase participation: notebook computers for the teacher and the school that registered the largest share of students. Strictly speaking, this additional incentive for less responsive schools should also be considered part of the treatment.
The type of interaction that took place between teachers and students in the media classrooms while Inoma's online games were in use is unknown. Teachers could have let students explore the games on their own and decide how to play them, or they could have actively guided them. What teachers said or did during play sessions may have affected students' attitudes and level of engagement; for example, were the games perceived as an obligation or as fun? This aspect of the treatment is completely unknown: a black box.
3 Enlace results become available several months after the tests are administered.
4 The Enlace Mathematics tests had between 53 and 71 questions in 2012, depending on the grade.
IV. Impact metrics
The Department of Education of the state of Puebla shared with Inoma the scores of all primary school pupils on the 2011 and 2012 Enlace tests. Since the intervention took place between February and June 2012, changes in test scores between 2011 and 2012 can be used as a metric of impact. It was not possible to include 3rd graders in the evaluation because Enlace is administered to students in grades 3 to 9 and grade 12. In 2011, the 3rd graders of 2012 were enrolled in 2nd grade, so they were not tested.
There are several considerations regarding the use of Enlace test scores as an impact metric. First, the difficulty of Enlace varies from year to year. Consequently, both the mean and the variance of test scores could change even if students' skills remained constant. To control for variations in difficulty, the impact metric is defined as the change in standardized test scores. That is, for each grade in each year, the mean was subtracted from the test scores, which were then divided by the standard deviation. Then, for each student, the difference between the standardized scores in 2011 and 2012 was calculated. The average difference is zero, and any difference is expressed in Enlace standard deviations.
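The standardization step can be illustrated with a short sketch. The raw scores are hypothetical, and the use of the population standard deviation (`pstdev`) rather than the sample version is an assumption; the text does not say which was used.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]

# Hypothetical raw scores for the same four students in one grade cohort.
raw_2011 = [400.0, 500.0, 600.0, 700.0]
raw_2012 = [450.0, 520.0, 640.0, 790.0]

z_2011 = standardize(raw_2011)   # standardized within grade and year
z_2012 = standardize(raw_2012)
delta = [b - a for a, b in zip(z_2011, z_2012)]  # change per student
print([round(d, 3) for d in delta])
```

Because each year's scores are centered separately, the changes average to zero by construction; a positive change means a student moved up relative to the cohort, not that the raw score rose.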
Second, Enlace is not perfect and provides only a noisy measure of what it is meant to capture. To keep the test short, a small set of questions must be selected from a much larger pool of theoretically equivalent questions. From the students' perspective, there is an element of chance in their performance on Enlace.
Some students are luckier than others in a particular year because they happened to study more of the topics that appeared on the test. That is, the noise in Enlace causes "regression to the mean" across tests: some of the lucky students who did well one year are expected to do worse the following year. The same applies at the other end of the distribution: some of those who performed poorly are expected to do better because they were unlucky the previous year. The empirical strategy for assessing the impact of Inoma's online games should account for this regression to the mean.
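A small simulation makes the mechanism concrete. It assumes, purely for illustration, that each score is a stable skill plus independent luck of the same variance; none of these numbers come from Enlace.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the illustration is reproducible

# Hypothetical students: a stable skill plus independent luck in each year.
skill = [rng.gauss(0, 1) for _ in range(10_000)]
year1 = [s + rng.gauss(0, 1) for s in skill]
year2 = [s + rng.gauss(0, 1) for s in skill]

# Select the top and bottom 10% on the year-1 score only.
order = sorted(range(len(skill)), key=lambda i: year1[i])
bottom, top = order[:1000], order[-1000:]

top_y1, top_y2 = mean(year1[i] for i in top), mean(year2[i] for i in top)
bot_y1, bot_y2 = mean(year1[i] for i in bottom), mean(year2[i] for i in bottom)
print(round(top_y1, 2), round(top_y2, 2), round(bot_y1, 2), round(bot_y2, 2))
```

Nobody's skill changes between the two simulated years, yet the top group falls back and the bottom group recovers. A comparison of score changes that ignored the starting score would mistake this for an effect.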
Third, in some respects the unit of analysis is the classroom, not the student. Students in the same classroom share teachers and resources. Consequently, Enlace outcomes are not independent among students in the same classroom. To account for the potential correlation of error terms, standard errors in the regression analysis should be clustered by classroom.
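To see why clustering matters, consider the simplest possible case: the standard error of a mean. The sketch below is an intercept-only special case with no small-sample correction, not the regression used in this evaluation, and the values and classroom labels are hypothetical.

```python
from math import sqrt
from statistics import mean

def naive_and_clustered_se(values, clusters):
    """Standard error of the mean, ignoring and then respecting cluster membership."""
    n, mu = len(values), mean(values)
    resid = [v - mu for v in values]
    # Naive: treats every student as an independent observation.
    naive = sqrt(sum(e * e for e in resid) / n) / sqrt(n)
    # Clustered: residuals are summed within each classroom before squaring,
    # so shared classroom shocks are not counted as independent information.
    sums = {}
    for e, c in zip(resid, clusters):
        sums[c] = sums.get(c, 0.0) + e
    clustered = sqrt(sum(s * s for s in sums.values())) / n
    return naive, clustered

# Hypothetical score changes for nine students in three classrooms.
values = [0.1, 0.2, 0.15, -0.3, -0.25, -0.2, 0.5, 0.45, 0.4]
rooms = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
naive, clustered = naive_and_clustered_se(values, rooms)
print(round(naive, 3), round(clustered, 3))
```

When students in a classroom move together, as in this example, the clustered standard error is larger than the naive one, so ignoring the classroom structure would overstate statistical significance.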
Fourth, the contents of Enlace are not exactly the same each year. The test in one year may be more closely related to the contents of Inoma's games than in other years. This variation cannot be taken into account in an assessment of a single year. Inoma may have been lucky or unlucky in 2012. This limitation should be borne in mind when the results are analyzed.
Fifth, Enlace measures some skills that Inoma does not aim to improve. That is, Inoma's games focus on a subset of what Enlace measures. However, using the whole test should not be a problem. In effect, Enlace is a noisy measure of Inoma's target set of skills. Since the noise is on the left-hand side of the equation, in the dependent variable, it should not bias the impact estimates.5
The mini Enlace scores could have been used as the impact metric. However, the actual Enlace results are preferred for several reasons. First, there was considerable attrition across the three mini Enlaces. A significant number of students could not be correctly identified from one test to the next, which reduced the usable sample in a non-random way.6 In Enlace, students are correctly identified by their Clave Única de Registro de Población (CURP). Second, the mini Enlace scores lack external validity. Inoma created, administered and graded them. By contrast, the federal Department of Education creates and grades Enlace, and it is administered by every school in the country. Third, the mini Enlaces may not have been taken very seriously, especially in control schools. Since nothing was at stake, it is not clear that they are comparable with Enlace. By contrast, Enlace results do have implications for schools: part of teachers' compensation is determined by Enlace results. Finally, results based on the mini Enlaces cannot be easily interpreted. They do not translate directly into commonly used units. Using Enlace means that an impact of x standard deviations is well defined. With the mini Enlaces, an impact of x standard deviations does not correspond to changes in Enlace standard deviations, which are a benchmark for other interventions.
5 Measurement error produces attenuation bias when it occurs in right-hand-side variables.
6 Fewer than 80% of those who took the February mini Enlace were identified among those who took the June mini Enlace.
V. Empirical strategy
Because of the problems that arose with the design of the PCA, which led some schools to be excluded and others to refuse to participate, a quasi-experimental approach was adopted. Instead of comparing the means of the final treatment and control groups, we opted for a regression analysis that included all schools in the state of Puebla and applied the difference-in-differences (DD) technique. In short, the change in the performance of students in the treatment group is compared with the change in the performance of students in the control group, without excluding any school.7
In the DD approach, the first difference is taken between years (the change in standardized test scores between 2011 and 2012) and the second between groups (treatment versus control). To control for regression to the mean in each student's scores, the regressions include a third-degree polynomial of the standardized 2011 test score, and all students in the state of Puebla were used. Standard errors are clustered by classroom, since some schools have several classrooms in the same grade.
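The two differences can be sketched in their simplest form, as a comparison of mean changes. This is a deliberately stripped-down illustration with invented scores: it omits the cubic polynomial in the 2011 score, the school-category dummies, and the clustered standard errors used in the actual regressions.

```python
from statistics import mean

def diff_in_diff(treated, control):
    """Each argument is a list of (z_2011, z_2012) pairs, one per student."""
    # First difference: change between years for each student.
    d_treated = [after - before for before, after in treated]
    d_control = [after - before for before, after in control]
    # Second difference: mean change in treatment minus mean change in control.
    return mean(d_treated) - mean(d_control)

# Hypothetical standardized scores; not data from the study.
treated = [(-0.5, -0.2), (0.0, 0.1), (0.8, 0.9)]
control = [(-0.4, -0.4), (0.1, 0.0), (0.9, 1.0)]
dd = diff_in_diff(treated, control)
print(round(dd, 3))
```

Anything that is constant for a student across the two years cancels in the first difference, and anything that shifts all students equally between years cancels in the second.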
Separate regressions were run for boys and girls, and for each available grade (4th, 5th and 6th). As a robustness check, results in Spanish are explored in addition to the results in Mathematics.
The regressions consider all schools in the state of Puebla, including those that had been excluded from the treatment or control groups, as well as the two that dropped out of the treatment group. The regressions include fixed effects for each of those categories.
The DD approach implies that no controls are needed for time-invariant characteristics of students or schools. Examples of such student characteristics are parents' educational level, material resources at home, preferences, and IQ. For schools, those characteristics could include location, facilities, curriculum, teacher preparation, and teacher incentives. All these potential determinants of performance drop out because they are subtracted away when considering each student's change in Enlace from one year to the next. The regression equation is:
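The equation itself did not survive in this copy. A specification consistent with the description above (a cubic in the 2011 score plus one dummy per school category) and with the coefficient names used afterwards would look roughly as follows. The symbols α, β_1 to β_3, δ, θ and ε, and the pairing of φ and γ with the two exclusion dummies, are assumptions for illustration and not the authors' notation; only λ, the coefficient on the final-treatment dummy, is identified in the text. The superscript et denotes the dummy for schools excluded from the treatment group.

```latex
\[
\begin{aligned}
\Delta y_i ={}& \alpha + \beta_1 y_i + \beta_2 y_i^{2} + \beta_3 y_i^{3} \\
             & + \delta\, d^{s}_i + \phi\, d^{ec}_i + \gamma\, d^{et}_i
               + \theta\, d^{dt}_i + \lambda\, d^{t}_i + \varepsilon_i
\end{aligned}
\]
```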
where y is the standardized 2011 score and Δy is the change in the standardized score between 2011 and 2012.8 The subscript i indicates the student. d_s is a dummy for students in schools in the sample of 69 schools participating in the PCA, d_ec is a dummy for students in schools that were excluded from the control group, d_et is a dummy for students in schools that were excluded from the treatment group, d_dt is a dummy for students in schools that dropped out of the treatment group, and d_t is a dummy for students in schools included in the final treatment group.
7 If the PCA had been implemented correctly, a comparison of means and the DD approach would identify the same parameter. That would not be the case if the PCA were not implemented correctly.
8 Standardized scores have a mean of zero and a standard deviation of 1. The standardization is done separately for each test and each grade.
The coefficient λ is the parameter of interest: the impact of treatment. It is estimated by comparing the final treatment and control groups. The estimates of φ and γ can be interpreted as "decoys". If randomization and treatment were implemented properly, the estimates of φ and γ should not be statistically different from zero. Otherwise, they could be interpreted as evidence of selection in the sample.
Table 2 shows the regression results for Mathematics and Table 3 shows the regression results for Spanish. Both tables present the information in a similar way. The upper panel shows the results for boys and the lower panel those for girls. Each panel includes three columns, one for each grade analyzed: 4th, 5th and 6th. The regressions in Tables 2 and 3 include all students in the state of Puebla who had scores in both 2011 and 2012.9 The regression specification is described in equation (1). The statistical significance of the coefficients was calculated using robust standard errors clustered by classroom.10
9 The results of students in 4th grade are compared with their results in 3rd grade. Similar comparisons apply to students in 5th and 6th grade.
10 Classrooms were identified using the group variable within the same grade and school.
The first result that stands out is that the coefficient on the "final treatment" dummy is positive and significant in some cases. There is a positive impact with 95% confidence for 5th grade boys and 6th grade girls. Nor can a positive impact be rejected at the 90% confidence level for 6th grade boys. The point estimates that are significant are not negligible. They imply sizable improvements: between 0.114 and 0.180 Enlace standard deviations.
The estimates for students in schools excluded from the control or treatment groups, and for schools that dropped out of treatment, are puzzling. In four cases they are positive and significant with 95% confidence, with point estimates ranging from 0.243 to 0.381. That is, in some cases students in schools that were originally in the PCA and were later excluded did better, on average, than students in the final treatment group. These findings cast doubt on the validity of the experimental design.
Table 2 also indicates that the sample for the PCA is not representative of all schools in the state of Puebla. The coefficient on belonging to the PCA sample is negative and significant with 95% confidence for girls in 5th and 6th grade.
In summary, the results on the impact of Inoma's online games in Mathematics are inconclusive but promising. Having access to Inoma's online games appears to have had a positive impact.
Table 3 shows a similar analysis for Spanish. Although there is no a priori reason to expect an impact on performance in Spanish, the results show some evidence of improvement. The estimated impact is positive and significant with 95% confidence for 6th grade boys, and with 90% confidence for 5th grade boys and 6th grade girls. There is also evidence that 5th grade students in schools excluded from treatment did better. The impact in Spanish is surprising. In principle, it is possible that Inoma's games freed up resources from the study of Mathematics for the study of Spanish.
VII. Caveats and concerns
There are several caveats and concerns about the results presented in Tables 2 and 3. The first and most important is that the intensity of treatment is unclear, although it was probably modest. Exposure time was short: a few sessions with only a handful of games. Inoma's platform was a new technology, and there may be a learning curve for teachers trying to promote it. Most likely there was not enough time to use it properly.
The second is that it is unclear how the experiment was perceived by teachers and school principals. Did they perceive it as a public policy for whose results they would be held responsible, or simply as an academic experiment without consequences in which they played a passive, if not indifferent, role? It is not clear what they expected the consequences of the assessment to be. For example, would they remain responsible for using Inoma's new online games only if students performed well? Was it perceived as a strategy to replace traditional methods that would jeopardize the status of current teachers? These factors could affect the interpretation of the results as a guide to policy.
The third is that likely contact between the principals of treatment and control schools casts doubt on the purity of the experiment. Some principals were aware that they were participating in "some kind of competition." That perception may have affected their behavior, but it is unclear in what direction.
VIII. Discussion of findings
The comparison of the final treatment and control groups by itself gives evidence of a positive effect of exposure to Inoma's online games. Despite the short exposure time and the non-mandatory nature of the treatment, a positive impact cannot be rejected with 95% confidence for some students. The point estimates in these cases are not negligible: between 0.114 and 0.180 standard deviations in math scores.
A wide range of interventions has been studied empirically in the academic literature in terms of their impact on educational outcomes. Beyond interventions that affect the availability of desks, teachers' knowledge of the subjects they teach, and teacher absenteeism, there is no further empirical guidance about what actually works. In fact, the greater the methodological rigor of a study, the more likely it is to reject a positive impact of the intervention under study.11 In this context, the evidence on Inoma's online games is extremely encouraging.
Inoma took a giant step in the right direction by attempting an experimental evaluation of the impact of its online games. The results presented here indicate that playing its online games may lead to significant academic gains.
11 See "School Resources and Educational Outcomes in Developing Countries: A Review of the Literature from 1990 to 2010" by Paul W. Glewwe, Eric A. Hanushek, Sarah D. Humpage, and Renato Ravina.