About the customer
Springer Nature advances discovery by publishing robust and insightful research, supporting the development of new areas of knowledge, making ideas and information accessible around the world, and leading the way on open access.
Key to this is their ability to provide the best possible service to the whole research community by:
- helping authors to share their discoveries
- providing quality publishing support to societies
- taking the lead on key issues that matter to funders and policymakers
- enabling researchers to find, access, and understand the work of others
- supporting librarians and institutions with innovations in technology and data
For this project, we worked together with Springer Nature’s daughter company, Macmillan Education, a global publisher with a local presence in over 130 countries worldwide. Its mission is to respond to educational needs at a local level, whether this is through teacher training, online resources or just recommending an appropriate textbook.
The Challenge
Today, Macmillan Education holds over 66 million student responses from the different courses it offers. Recently, the company defined a set of initial goals for strategically harnessing this data for its own and its customers' benefit, for instance to support learning-path recommendations based on the assessment engine it added in 2022.
The Goal
Macmillan Education has identified multiple areas to activate the data to increase its market competitiveness.
Specifically for this project, the goals for adaptive learning on Google Cloud were:
- Offer teachers performance insights and data-driven customisation opportunities to make their lives easier and to improve class performance and learning success
- Offer students more engaging, personalised experiences that improve their performance and learning success
- Offer personalised content recommendations from the courses repository to both teachers and their students, based on profiles and past performance
- Turn these additions into core features of the products
The Solution
Devoteam G Cloud worked together with Macmillan to make their platforms smarter using adaptive learning techniques. To achieve this, Devoteam G Cloud set up an MLOps workflow on GCP to deploy the AI/ML models that will be produced. A data exploration phase gave insight into the behaviors of the learner. These insights can now be used to suggest a future learning path for the learner.
This project kickstarted the use of ML in Macmillan Education's platform. It will make their products smarter and the experience more productive and personalised for both teachers and learners.
The high-level activities that Devoteam G Cloud performed included:
- Setup of the GCP environment
- Setup of a data science stack on GCP
- Data exploration & analysis
- MLOps setup
- Model development & deployment
The solution consists of predicting activity scores for specific users. The model takes as inputs:
- a user embedding, i.e. a high-dimensional vector representation of the user,
- the user's short-term history, characterised by a list of activity embeddings and the scores obtained on these activities,
- the embedding of the activity for which a score would be predicted.
To make the actual predictions, three models were developed: a variational autoencoder, a neural matrix factorization model, and an adaptive model.
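To make the shape of these inputs concrete, here is a minimal sketch in Python. It is not one of the three models used in the project: the `ModelInput` class, the `cosine` helper, and the `baseline_predict` function are hypothetical names introduced for illustration, and the predictor is a deliberately simple stand-in (a similarity-weighted average of past scores) that assumes scores lie in [0, 1].

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple

Vector = List[float]


@dataclass
class ModelInput:
    """Hypothetical container for the three inputs described above."""
    user_embedding: Vector                # vector representation of the user
    history: List[Tuple[Vector, float]]   # (activity embedding, score in [0, 1])
    target_activity: Vector               # activity whose score is to be predicted


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def baseline_predict(inp: ModelInput) -> float:
    """Stand-in predictor: average of past scores, weighted by how similar
    each past activity is to the target activity. The user embedding is
    carried in the input but ignored by this simple baseline."""
    scores = [score for _, score in inp.history]
    if not scores:
        return 0.5  # assumption: neutral guess when there is no history
    weights = [max(cosine(emb, inp.target_activity), 0.0) for emb, _ in inp.history]
    total = sum(weights)
    if total == 0:
        return sum(scores) / len(scores)  # fall back to the plain mean
    return sum(w * s for w, s in zip(weights, scores)) / total


example = ModelInput(
    user_embedding=[0.1, 0.3],
    history=[([1.0, 0.0], 0.9), ([0.0, 1.0], 0.2)],
    target_activity=[1.0, 0.0],
)
prediction = baseline_predict(example)
```

In this toy example the target activity is identical to the first past activity and orthogonal to the second, so the prediction is driven entirely by the first score. A learned model would replace `baseline_predict` while keeping the same three inputs.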
The customer's strong commitment was an early indicator of success. They believed in the project from the outset, and as we ran experiments and shared our results with them, they came to understand the potential of embeddings: once created, embeddings can deliver value across many different use cases.
This knowledge sharing generated new ideas and new demands from Springer Nature’s side, which were met during the second phase of the project.
The Methodology
First, we performed a data exploration and data quality analysis in JupyterLab notebooks on Vertex AI Workbench. This allowed us to create report-like documents containing pieces of code, visual results, and descriptive texts.
Throughout this exercise, we interacted frequently with Springer Nature in an agile way to keep them up to date with our latest discoveries about the state of their data. This benefited both sides:
- Springer Nature gained in-depth insights into their data
- Devoteam G Cloud deepened its understanding of the domain-specific knowledge hidden within the data itself
The modeling phase was carried out with Vertex AI Pipelines, which split the whole process into small dedicated components, ranging from data extraction, through data transformation and preparation, to model training, testing, and deployment. This ensured the traceability and transparency of every model and experiment.
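The idea of small dedicated components can be illustrated without any pipeline framework. The sketch below is plain Python, not Vertex AI Pipelines code: the step functions, the sample records, and the `run_pipeline` helper are all hypothetical, and the "model" is a trivial mean baseline. It only shows how chaining single-purpose steps and recording each one yields a traceable run.

```python
from typing import Any, Callable, Dict, List, Tuple


def extract() -> List[Dict[str, Any]]:
    """Hypothetical extraction step returning raw records."""
    return [
        {"user": "u1", "score_pct": 80},
        {"user": "u2", "score_pct": 40},
        {"user": "u3", "score_pct": None},  # missing score
    ]


def transform(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop records without a score and rescale percentages to [0, 1]."""
    return [
        {"user": r["user"], "score": r["score_pct"] / 100}
        for r in rows
        if r["score_pct"] is not None
    ]


def train(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Stand-in training step: the 'model' is just the mean score."""
    mean_score = sum(r["score"] for r in rows) / len(rows)
    return {"kind": "mean_baseline", "mean_score": mean_score}


def run_pipeline(
    steps: List[Tuple[str, Callable[..., Any]]]
) -> Tuple[Any, List[Dict[str, str]]]:
    """Run steps in order, feeding each output to the next step,
    and record which step produced which kind of artifact."""
    artifact: Any = None
    lineage: List[Dict[str, str]] = []
    for index, (name, step) in enumerate(steps):
        artifact = step() if index == 0 else step(artifact)
        lineage.append({"step": name, "output_type": type(artifact).__name__})
    return artifact, lineage


model, lineage = run_pipeline(
    [("extract", extract), ("transform", transform), ("train", train)]
)
```

Because each step has a single responsibility and its output is logged, any resulting model can be traced back through the exact sequence of steps that produced it. A managed pipeline service adds persistence, scheduling, and artifact storage on top of this basic pattern.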
Beyond showing metrics and graphs, translating these mathematical concepts into actionable insights and real-life impact was also part of this project.
The Result
The Root Mean Square Error (RMSE) was used as the performance indicator. It measures how far the predictions are from the true scores, penalising large errors more heavily than small ones. Since scores lie between 0 and 1 (0% to 100%), the indicator’s value is easy to interpret.
Across the 1.4 M+ test scores, the RMSE is ~0.191, so predictions typically deviate from the students’ true scores by roughly 0.19 on the 0-to-1 scale. From our analysis, the model is very good at predicting whether a student will fail or pass a given activity, making it especially useful for Springer Nature in fulfilling its mission of responding to educational needs.
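For readers who want to reproduce this kind of evaluation, the following is a minimal sketch of both measurements in Python. The RMSE is the square root of the mean squared difference between true and predicted scores. The `pass_fail_accuracy` function and its pass mark of 0.5 are assumptions made for illustration; the pass threshold used in the project is not stated here, and the sample scores are invented.

```python
from math import sqrt
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error between true and predicted scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def pass_fail_accuracy(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    threshold: float = 0.5,  # assumed pass mark, not taken from the project
) -> float:
    """Share of cases where prediction and truth fall on the same side
    of the pass mark."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    agreements = sum(
        (t >= threshold) == (p >= threshold) for t, p in zip(y_true, y_pred)
    )
    return agreements / len(y_true)


# Invented scores on the 0-to-1 scale, for illustration only.
true_scores = [0.9, 0.2, 0.6]
predicted_scores = [0.7, 0.4, 0.4]

error = rmse(true_scores, predicted_scores)
accuracy = pass_fail_accuracy(true_scores, predicted_scores)
```

The two numbers answer different questions: RMSE summarises how far off the predicted score is, while pass/fail accuracy checks only whether the prediction lands on the correct side of the pass mark. A model can have a noticeable RMSE and still classify pass versus fail well.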
As this project serves the education industry and is mainly intended to help students and teachers throughout the academic year, it makes sense to start using it in production when a new year begins. Production use will therefore start at the beginning of the next academic year, when more insightful metrics such as click-through rate will be collected and model performance will be monitored continuously.