How to Prepare Your Data for SGP Analysis

The term ‘big data’ has become a popular buzzword in both the scientific and general media. It describes datasets so large that they cannot be easily managed or analyzed using traditional methods. Despite the growing interest in big data, most SGP analyses are still conducted with datasets that are small compared to those used in other areas of education and science research.

For example, the sgpData package contains five years’ worth of assessment results for each student in a sample of schools across the US. This is a lot of data, but it is nowhere near the volume of data generated by Facebook interactions (see “How much data does SGP need?” for more information).

There are two common formats used to represent longitudinal (time-dependent) student assessment data: WIDE format and LONG format. In WIDE format, each row represents a unique student and each column holds a variable recorded for that student at a particular time. In LONG format, each assessment a student takes is stored as a separate row, so a student appears once per testing occasion. The sgpData package includes exemplar WIDE and LONG format data sets (sgpData_WIDE and sgpData_LONG) to assist users with data preparation.
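The difference between the two layouts is easiest to see by converting one into the other. The following is a minimal sketch in plain Python, not part of the SGP software; the column names (ID, SS_2021, YEAR, SCALE_SCORE) and the scores are hypothetical values chosen for illustration only.

```python
# Hypothetical WIDE records: one row per student, one score column per year.
wide_rows = [
    {"ID": "1001", "SS_2021": 512, "SS_2022": 530, "SS_2023": 548},
    {"ID": "1002", "SS_2021": 475, "SS_2022": None, "SS_2023": 501},
]


def wide_to_long(rows, id_key="ID", prefix="SS_"):
    """Return one record per student per assessment, skipping missing scores."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            # Only score columns carry the prefix; the suffix is the year.
            if column.startswith(prefix) and value is not None:
                long_rows.append({
                    "ID": row[id_key],
                    "YEAR": int(column[len(prefix):]),
                    "SCALE_SCORE": value,
                })
    return long_rows


long_rows = wide_to_long(wide_rows)
for record in long_rows:
    print(record)
```

Note that the student with a missing 2022 score simply has no 2022 row in the LONG result, whereas the WIDE layout has to carry an empty cell.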

If you plan to run SGP analyses on a routine basis, the LONG format is recommended for all data set operations. In particular, the higher-level wrapper functions in the SGP package operate on LONG data, whereas the lower-level functions studentGrowthPercentiles and studentGrowthProjections require WIDE data. LONG data is also generally easier to prepare and store than WIDE data.

To prepare your data for SGP analysis, make sure the assessments are aligned in terms of dates and scores, so that the calculations return accurate and valid results. The sgpData package contains tools to align student assessments for each test type. Additionally, a list of test dates is provided so that users can select the dates most appropriate for their context.

Once the assessments have been aligned, the next step is to create a longitudinal student database: a table that stores all current and past assessments for each student. This table will be the basis for all future SGP analyses that you conduct.
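As a rough illustration of such a table, the sketch below groups LONG-format records by student and orders each student's assessments chronologically. It is plain Python rather than SGP code, and the field names and scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical LONG records, deliberately out of chronological order.
records = [
    {"ID": "1001", "YEAR": 2023, "SCALE_SCORE": 548},
    {"ID": "1002", "YEAR": 2021, "SCALE_SCORE": 475},
    {"ID": "1001", "YEAR": 2021, "SCALE_SCORE": 512},
    {"ID": "1001", "YEAR": 2022, "SCALE_SCORE": 530},
    {"ID": "1002", "YEAR": 2023, "SCALE_SCORE": 501},
]


def build_history(long_records):
    """Group records by student ID and sort each student's records by year."""
    history = defaultdict(list)
    for record in long_records:
        history[record["ID"]].append(record)
    for student_records in history.values():
        student_records.sort(key=lambda r: r["YEAR"])
    return dict(history)


history = build_history(records)
print([r["YEAR"] for r in history["1001"]])  # [2021, 2022, 2023]
```

Keeping one ordered list per student makes it straightforward to append a new year's results without restructuring the existing data.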

Lastly, you will need to decide if you want to use the most recent assessment or an earlier one in your calculations. Most SGP analysts use the most recent assessment to determine a student’s current SGP; however, this is not always the case. For example, some users prefer to calculate SGP for their students based on a combination of the most recent and at least one prior assessment from an earlier testing window.
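One way to make this choice explicit is to separate each student's most recent record from the earlier ones and decide how many priors to keep. The sketch below is plain Python under assumed field names; the function split_current_and_priors and its max_priors parameter are hypothetical helpers, not part of the SGP software.

```python
def split_current_and_priors(student_records, max_priors=1):
    """Return (current, priors): the latest record and up to max_priors earlier ones."""
    ordered = sorted(student_records, key=lambda r: r["YEAR"])
    if not ordered:
        return None, []
    current = ordered[-1]
    earlier = ordered[:-1]
    # Guard against max_priors == 0, because earlier[-0:] would return everything.
    priors = earlier[-max_priors:] if max_priors > 0 else []
    return current, priors


# Hypothetical assessment history for a single student.
student = [
    {"YEAR": 2021, "SCALE_SCORE": 512},
    {"YEAR": 2022, "SCALE_SCORE": 530},
    {"YEAR": 2023, "SCALE_SCORE": 548},
]

current, priors = split_current_and_priors(student, max_priors=2)
print(current["YEAR"], [p["YEAR"] for p in priors])
```

Setting max_priors to 1 keeps only the immediately preceding assessment, while a larger value draws on more of the student's history when it is available.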

SGP is a useful tool for educators who want to understand how their students are performing in comparison to their peers, and especially for teachers who want to ensure that their students are making progress towards their academic goals. Doing this well requires quality data that is timely and accurate. SGP is a free, easy-to-use statistical program that can help educators do just that.