Providing Student Data
To track the impact that Project Success has on the participating schools, we need a set of historical student data as well as data for consecutive semesters.
This data is divided into several comma-separated value (.csv) files. The image below illustrates each data file and how the files interact. The table below indicates in general terms what type of data each file includes. For detailed information about each data file, please use the navigation menu on the left.
File Name | Data Type | Data Examples |
---|---|---|
university_info.csv | The university information | e.g., OPEID, Name, Code |
university_layout.csv | The university structure: all programs offered at the institution each term. | e.g., College, Department, Major |
students.csv | All students at first enrollment by term as well as demographic attributes. | e.g., ACT, SAT, High School GPA |
terms.csv | All students attending the institution by term. | e.g., Status, Program, Cumulative GPA |
awarded_degrees.csv | All students awarded degrees by term. | |
Data Process
The following diagram illustrates the process applied to each set of files that a school submits to ECMC. This process ensures that the data is correct, that the data provided in different files is logically related, and that Data Warehouse (DW) numbers reconcile with official school numbers.
The DW team carries out the following steps for each submission of a new set of files from the school to the AWS upload site:
The files are run through the validation engine, which determines whether each file adheres to the rules specified in this data dictionary for its file type:
- All the required columns are provided.
- The columns are formatted according to the specification and data type.
- The required columns contain data.
If the files pass the validation step, they move to the next step of the process. If not, Validation Error Reports are sent back to the school with the information the school needs to fix the reported issues and resubmit the files.
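The three validation checks described above can be sketched as follows. This is only an illustration, not the actual validation engine: the column names `student_id` and `hs_gpa`, the `SPEC` layout, and the error wording are all hypothetical, and the real rules are the ones documented for each file type in this data dictionary.

```python
import csv
import io


def is_decimal(value):
    """Return True if the text parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


# Hypothetical specification: column name -> whether it is required
# and a function that checks the format of a non-empty value.
SPEC = {
    "student_id": {"required": True, "check": str.isdigit},
    "hs_gpa": {"required": False, "check": is_decimal},
}


def validate_csv(text, spec):
    """Return a list of error strings; an empty list means the file passed."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []

    # Check 1: all required columns are provided.
    errors = [
        f"missing column: {name}"
        for name, rule in spec.items()
        if rule["required"] and name not in header
    ]
    if errors:
        return errors

    for row in reader:
        for name, rule in spec.items():
            if name not in header:
                continue  # optional column that was not supplied
            value = (row.get(name) or "").strip()
            if not value:
                # Check 3: required columns must contain data.
                if rule["required"]:
                    errors.append(f"line {reader.line_num}: {name} is empty")
            elif not rule["check"](value):
                # Check 2: values must match the expected format.
                errors.append(
                    f"line {reader.line_num}: {name} has bad format: {value!r}"
                )
    return errors
```

A list of messages like this is roughly the kind of content a Validation Error Report would need to carry so that the school can locate and fix each issue.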
The validated files are then checked for linkage, i.e., the logical relationship between the data provided in each file. In this step, the DW team uploads the data into the DW and runs reports that check whether the data in the different files links to each other and to the school's previously submitted data. As described in this data dictionary, the data in each file should follow the same standards, and records in one file should have matching records in the files they depend on. For example, the enrollment information provided in the Term file can be linked to a student only if the corresponding student information was provided in the Student file.
Based on these reports, the DW team determines whether the school passes this step. If not, a set of Missing Data Reports is generated and sent to the school. These reports identify information that is provided in one file but missing from one or more other files, and that is needed to link the files together.
The school is requested to resubmit all the files with the missing data included. This resubmission triggers the whole process from the start.
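The linkage rule for the Term and Student files can be pictured as a set difference. The sketch below assumes, purely for illustration, that both files share a key column named `student_id`; the actual linking columns are defined on the individual file pages, and the real check also takes previously submitted data into account.

```python
import csv
import io


def unlinked_ids(students_csv, terms_csv, key="student_id"):
    """Return sorted IDs that appear in the term data but not in the student data."""

    def ids(text):
        return {row[key].strip() for row in csv.DictReader(io.StringIO(text))}

    return sorted(ids(terms_csv) - ids(students_csv))


# Student 1003 has enrollment data but no student record.
students = "student_id\n1001\n1002\n"
terms = "student_id,term\n1001,Fall 2018\n1003,Fall 2018\n"
print(unlinked_ids(students, terms))  # ['1003']
```

Any ID returned here is the kind of entry a Missing Data Report would list.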
Once the school passes the Data Linkage step, it moves to the Data Quality Assurance (QA) step. In this step, the DW team generates various reports that summarize the enrollment, graduation, and incoming student numbers provided in the data files. A summary report is then prepared using the numbers from the different reports. The DW team then carefully evaluates the results to identify any inconsistencies in how the data was loaded and the causes behind them. For example, a student may appear with multiple date-of-birth values in the student file.
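The date-of-birth inconsistency mentioned above is straightforward to detect before submitting. The following is a minimal sketch, assuming hypothetical column names `student_id` and `date_of_birth`; it is not the DW team's actual report.

```python
import csv
import io
from collections import defaultdict


def conflicting_birth_dates(students_csv, id_col="student_id", dob_col="date_of_birth"):
    """Map each student ID to its birth dates when more than one distinct value is found."""
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(students_csv)):
        seen[row[id_col].strip()].add(row[dob_col].strip())
    return {sid: sorted(dobs) for sid, dobs in seen.items() if len(dobs) > 1}


data = (
    "student_id,date_of_birth\n"
    "1001,2000-01-15\n"
    "1001,2000-01-16\n"
    "1002,1999-05-02\n"
    "1002,1999-05-02\n"
)
print(conflicting_birth_dates(data))  # {'1001': ['2000-01-15', '2000-01-16']}
```

Repeated rows with the same birth date, as for student 1002, are not flagged; only conflicting values are.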
Finally, the reports, along with the conclusions drawn from the evaluation, are sent to the school. The school is asked to evaluate the reports on its end and check whether the numbers align with its official reporting numbers (the numbers reported by the school's registrar's office) and with its understanding of the data.
If no discrepancy is found in the QA step and both ECMC and the school are satisfied with the numbers, we mark the school as QA complete. If not, we ask the school to fix the reported issues and resubmit the necessary corrections. This may lead to further QA iterations until all issues are resolved.
This whole process is repeated every submission cycle.
Historical Data
We request that each school provide at least 5 years of historical data for this data set. This will allow us to measure the impact that Project Success has on student progress.
New File Submissions
After the historical data files are loaded into the Data Warehouse, we request that each school submit the data for the two most recent semesters at each submission. For example, if a school is submitting files on 6/1/2019, the school should submit all four files (university_layout.csv, students.csv, terms.csv, and awarded_degrees.csv) with Fall 2018 and Spring 2019 data. This will allow us to validate and verify the data in the Data Warehouse and ensure the correctness of the analytics run against each school's data.
Below is a list of the upcoming submission dates with the data required for each submission:
Due Date | Terms Required |
---|---|
6/1/2019 | Fall 2018 and Spring 2019 |
9/1/2019 | Spring 2019 and Summer 2019 |
1/15/2020 | Summer 2019 and Fall 2019 |
6/1/2020 | Fall 2019 and Spring 2020 |
9/1/2020 | Spring 2020 and Summer 2020 |
1/15/2021 | Summer 2020 and Fall 2020 |
6/1/2021 | Fall 2020 and Spring 2021 |
9/1/2021 | Spring 2021 and Summer 2021 |
1/15/2022 | Summer 2021 and Fall 2021 |
6/1/2022 | Fall 2021 and Spring 2022 |
9/1/2022 | Spring 2022 and Summer 2022 |
1/15/2023 | Summer 2022 and Fall 2022 |
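The table above follows a repeating pattern across the three due dates in each year. The sketch below encodes that pattern as inferred from the listed rows; the function name `required_terms` is hypothetical, and only the dates that appear in the table are confirmed by this document.

```python
from datetime import date


def required_terms(due):
    """Return the two terms expected for a due date, following the table's pattern."""
    year = due.year
    # Pattern inferred from the table: (month, day) -> terms required.
    schedule = {
        (1, 15): (f"Summer {year - 1}", f"Fall {year - 1}"),
        (6, 1): (f"Fall {year - 1}", f"Spring {year}"),
        (9, 1): (f"Spring {year}", f"Summer {year}"),
    }
    try:
        return schedule[(due.month, due.day)]
    except KeyError:
        raise ValueError(f"{due.isoformat()} is not a scheduled due date") from None


print(required_terms(date(2020, 1, 15)))  # ('Summer 2019', 'Fall 2019')
```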