Build a machine learning model to predict student performance using Amazon SageMaker Canvas

There has been a paradigm change in the mindshare of education customers who are now willing to explore new technologies and analytics. Universities and other higher learning institutions have collected massive amounts of data over the years, and now they are exploring options to use that data for deeper insights and better educational outcomes.

You can use machine learning (ML) to generate these insights and build predictive models. Educators can also use ML to identify challenges in learning outcomes, increase success and retention among students, and broaden the reach and impact of online learning content.

However, higher education institutions often lack ML professionals and data scientists. With this fact, they are looking for solutions that can be quickly adopted by their existing business analysts.

Amazon SageMaker Canvas is a low-code/no-code ML service that enables business analysts to perform data preparation and transformation, build ML models, and deploy these models into a governed workflow. Analysts can perform all these activities with a few clicks and without writing a single piece of code.

In this post, we show how to use SageMaker Canvas to build an ML model to predict student performance.

Solution overview

For this post, we discuss a specific use case: how universities can predict student dropout or continuation ahead of final exams using SageMaker Canvas. We predict whether the student will drop out, enroll (continue), or graduate at the end of the course. We can use the outcome from the prediction to take proactive action to improve student performance and prevent potential dropouts.

The solution includes the following components:

Data ingestion – Importing the data from your local computer to SageMaker Canvas
Data preparation – Clean and transform the data (if required) within SageMaker Canvas
Build the ML model – Build the prediction model inside SageMaker Canvas to predict student performance
Prediction – Generate batch or single predictions
Collaboration – Analysts using SageMaker Canvas and data scientists using Amazon SageMaker Studio can interact while working in their respective settings, sharing domain knowledge and offering expert feedback to improve models

The following diagram illustrates the solution architecture.

Prerequisites

For this post, you should complete the following prerequisites:

Have an AWS account.
Set up SageMaker Canvas. For instructions, refer to Prerequisites for setting up Amazon SageMaker Canvas.
Download the following student dataset to your local computer.

The dataset contains student background information like demographics, academic journey, economic background, and more. The dataset contains 37 columns, out of which 36 are features and 1 is a label. The label column name is Target, and it contains categorical data: dropout, enrolled, and graduate.

The dataset comes under the Attribution 4.0 International (CC BY 4.0) license and is free to share and adapt.

Data ingestion

The first step for any ML process is to ingest the data. Complete the following steps:

On the SageMaker Canvas console, choose Import.
Import the Dropout_Academic Success - Sheet1.csv dataset into SageMaker Canvas.
Select the dataset and choose Create a model.
Name the model student-performance-model.

Data preparation

For ML problems, data scientists analyze the dataset for outliers, handle the missing values, add or remove fields, and perform other transformations. Analysts can perform the same actions in SageMaker Canvas using the visual interface. Note that major data transformation is out of scope for this post.

In the following screenshot, the first highlighted section (annotated as 1 in the screenshot) shows the options available with SageMaker Canvas. IT staff can apply these actions on the dataset and can even explore the dataset for more details by choosing Data visualizer.

The second highlighted section (annotated as 2 in the screenshot) indicates that the dataset doesn’t have any missing or mismatched records.

Build the ML model

To proceed with training and building the ML model, we need to choose the column that needs to be predicted.

On the SageMaker Canvas interface, for Select a column to predict, choose Target.

As soon as you choose the target column, it will prompt you to validate data.

Choose Validate, and within few minutes SageMaker Canvas will finish validating your data.

Now it’s the time to build the model. You have two options: Quick build and Standard build. Analysts can choose either of the options based on your requirements.

For this post, we choose Standard build.

Apart from speed and accuracy, one major difference between Standard build and Quick build is that Standard build provides the capability to share the model with data scientists, which Quick build doesn’t.

SageMaker Canvas took approximately 25 minutes to train and build the model. Your models may take more or less time, depending on factors such as input data size and complexity. The accuracy of the model was around 80%, as shown in the following screenshot. You can explore the bottom section to see the impact of each column on the prediction.

So far, we have uploaded the dataset, prepared the dataset, and built the prediction model to measure student performance. Next, we have two options:

Generate a batch or single prediction
Share this model with the data scientists for feedback or improvements

Prediction

Choose Predict to start generating predictions. You can choose from two options:

Batch prediction – You can upload datasets here and let SageMaker Canvas predict the performance for the students. You can use these predictions to take proactive actions.
Single prediction – In this option, you provide the values for a single student. SageMaker Canvas will predict the performance for that particular student.

Collaboration

In some cases, you as an analyst might want to get feedback from expert data scientists on the model before proceeding with the prediction. To do so, choose Share and specify the Studio user to share with.

Then the data scientist can complete the following steps:

On the Studio console, in the navigation pane, under Models, choose Shared models.
Choose View model to open the model.

They can update the model either of the following ways:

Share a new model – The data scientist can change the data transformations, retrain the model, and then share the model
Share an alternate model – The data scientist can select an alternate model from the list of trained Amazon SageMaker Autopilot models and share that back with the SageMaker Canvas user.

For this example, we choose Share an alternate model and assume the inference latency as the key parameter shared the second-best model with the SageMaker Canvas user.

The data scientist can look for other parameters like F1 score, precision, recall, and log loss as decision criterion to share an alternate model with the SageMaker Canvas user.

In this scenario, the best model has an accuracy of 80% and inference latency of 0.781 seconds, whereas the second-best model has an accuracy of 79.9% and inference latency of 0.327 seconds.

Choose Share to share an alternate model with the SageMaker Canvas user.
Add the SageMaker Canvas user to share the model with.
Add an optional note, then choose Share.
Choose an alternate model to share.
Add feedback and choose Share to share the model with the SageMaker Canvas user.

After the data scientist has shared an updated model with you, you will get a notification and SageMaker Canvas will start importing the model into the console.

SageMaker Canvas will take a moment to import the updated model, and then the updated model will reflect as a new version (V3 in this case).

You can now switch between the versions and generate predictions from any version.

If an administrator is worried about managing permissions for the analysts and data scientists, they can use Amazon SageMaker Role Manager.

Clean up

To avoid incurring future charges, delete the resources you created while following this post. SageMaker Canvas bills you for the duration of the session, and we recommend logging out of Canvas when you’re not using it. Refer to Logging out of Amazon SageMaker Canvas for more details.

Conclusion

In this post, we discussed how SageMaker Canvas can help higher learning institutions use ML capabilities without requiring ML expertise. In our example, we showed how an analyst can quickly build a highly accurate predictive ML model without writing any code. The university can now act on those insights by specifically targeting students at risk of dropping out of a course with individualized attention and resources, benefitting both parties.

We demonstrated the steps starting from loading the data into SageMaker Canvas, building the model in Canvas, and receiving the feedback from data scientists via Studio. The entire process was completed through web-based user interfaces.

To start your low-code/no-code ML journey, refer to Amazon SageMaker Canvas.

About the author

Ashutosh Kumar is a Solutions Architect with the Public Sector-Education Team. He is passionate about transforming businesses with digital solutions. He has good experience in databases, AI/ML, data analytics, compute, and storage.

Vedere AI