
Invited sessions

Mitchell Stevens (CAROL, Stanford University)

Personalization, Prediction, Tracking: Parsing Responsible Use of Student Data in Higher Education

Responsible use of data in educational environments entails commitments to honor the integrity, discretion, and humanity of students. It also obliges instructors and organizations to improve practice in light of accumulating information and knowledge. Yet the line between positive and harmful uses of education data can be very difficult to discern. In my remarks, I will summarize key insights from the responsible use project, an ongoing transnational effort to accrete ethical guidelines for this domain.

Alina von Davier (ACTNext)

An Illustration of an AI-based Educational Assistant and Its Underlying Learning Analytics


While the iconic R2-D2 may not yet be a fixture in our homes, many of us have intelligent assistants, such as Siri or Google Home, that help us manage our daily lives. This technology is also quickly becoming commonplace in schools. As AI assistants learn from reliable data about a student (such as quality assessment data from testing companies like ACT) and engage with the cognitive theory of learning that governs these data, they can explicate the dependencies across knowledge domains in a relational way, thereby enabling better assessment of, and advice to, students across a myriad of situations. This trend is supported by the growing need for learning and assessment systems (LAS) that capture the broad range of learner behavior needed to evaluate complex skills such as problem solving, communication, and collaboration, in addition to academic skills. Hence, AI assistants become one type of LAS that interacts with the learner directly, helps the learner progress in a personalized way, and provides embedded assessments of both academic and socio-emotional learning skills.

A key feature of such an AI assistant/LAS is the use of interfaces that enable rich, immersive interactions and can capture multimodal process data, i.e., time series of multiple data media, including audio, video, and log files of student activity. However, analyzing such data poses a significant challenge: how do we extract meaningful evidence of construct competency from complex performances captured in varied and unstructured multimodal data? In addition, analyzing each data modality in isolation may produce incongruities, and without appropriate use of context it may be difficult to interpret student activity, since students show significant behavioral variation over time. To address these challenges, we present a methodology that uses advances in computational psychometrics and artificial intelligence. This approach exploits concept hierarchies that reflect the nature of the data and the goals of the assessment. Most importantly, capturing data in realistic tasks and settings with multiple modalities (which capture performance holistically) makes this approach more ecologically valid for assessing complicated competencies (like collaboration) than, say, inferences of latent traits drawn from traditional tests. We illustrate the proposed framework in a variety of settings, including a prototype developed at ACT called the Companion App and other virtual-world simulations designed to assess complex behavior.

Shivani Rao (LinkedIn)

An overview of AI problems applied to the domain of Online Learning

Online learning platforms have grown tremendously in recent years, with an impact ranging from K-12 to lifelong learning. The data collected through these technologies present a golden opportunity to develop data-driven methods to improve education. In this talk, we look at the AI problems, such as search and discovery, micro-content for microlearning, and video understanding, that emerge as data is collected about learners and their learning patterns on the learning platform.

Adam Blum (OpenEd)

Methods of Intersystem Measurement of Instructional Resource Efficacy

With the explosion of digital resources for education from a long tail of educators and authors, the importance of measuring the efficacy of this wide array of instructional resources is finally getting airplay. To measure true mastery of the subject matter, assessment of knowledge needs to work across publisher, site, and system boundaries. This implies the need for a standard for establishing learning events: both instructional resource consumption events and post hoc assessment events. Such a standard separates mastery of skills from the modality of instruction and the method of measurement. We will survey the standards and protocols commonly available for doing this today and make specific recommendations about which ones learning scientists can implement most effectively.

Once we are able to tie assessment back to instructional resource consumption, we need to agree on methods of measuring the efficacy of resources given this linkage. We present some simple efficacy algorithms that yield repeatably valid ratings. We then describe several enhancements to this basic approach that determine efficacy for various popular ways of segmenting learner populations, including aptitude quintiles and socioeconomic markers. We conclude by presenting a machine learning model that predicts the efficacy of a resource using all resource metadata and all non-PII student metadata attributes (including several computed ones). The model produces a "contextual efficacy" prediction of a resource's efficacy for a given student. We present the results of deploying these contextual efficacy recommendations to the broad OpenEd student user base.
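The abstract does not spell out OpenEd's "simple efficacy algorithms". A minimal sketch of one plausible version, assuming efficacy is the difference in mean post hoc assessment score between students who consumed a resource before the assessment and students who did not, optionally computed within segments such as aptitude quintiles. The function name and input format are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def resource_efficacy(consumed, scores, segment=None):
    """Estimate a resource's efficacy as a difference in mean assessment scores.

    consumed: set of student ids who used the resource before the assessment.
    scores:   dict mapping student id -> post hoc assessment score (0..1).
    segment:  optional dict mapping student id -> segment label
              (e.g. aptitude quintile); if given, efficacy is computed per segment.
    Returns a dict mapping segment label (None when unsegmented) -> efficacy.
    """
    groups = defaultdict(lambda: ([], []))  # label -> (users' scores, non-users' scores)
    for student, score in scores.items():
        label = segment.get(student) if segment else None
        users, non_users = groups[label]
        (users if student in consumed else non_users).append(score)
    result = {}
    for label, (users, non_users) in groups.items():
        # Skip segments where one arm is empty: no comparison is possible.
        if users and non_users:
            result[label] = mean(users) - mean(non_users)
    return result
```

For example, `resource_efficacy({"a", "c"}, {"a": 0.9, "b": 0.7, "c": 0.6, "d": 0.4})` gives an overall efficacy of roughly 0.2. A raw difference like this is confounded by which students choose to use a resource. Comparing within segments such as aptitude quintiles is one simple way to reduce that bias.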

Zachary Pardos (UC Berkeley)

Approaches to Scalable Personal Guidance in MOOCs and On Campus


Emma Brunskill (Stanford University)

AI for Adaptive Curriculum


I’ll discuss our recent work on using AI to help select the right activity at the right time for a particular student in a scalable way. 

Parallel Session 1a: Recommendation and Prediction in Online Environments

Yuchi Huang, David Edwards, Lu Ou and Saad Khan

GMMC: Generating Multimodal Micro Content

Video is a powerful and effective learning tool. However, creating high-quality video content remains a laborious and expensive task, which makes it challenging to align a limited set of available video resources with ever-evolving learner needs. To address this challenge, we propose an innovative approach for real-time automated generation of ‘micro’ multimodal content tailored to specific skill gaps identified along a learning progression. Our approach uses advanced computer vision, NLP, and machine learning algorithms to tag and segment open-source long-form video content (e.g., lectures, documentaries) to create a large repository of fine-grained atomic video units (clips) of learning content. Using deep-learning-based algorithms and graph ranking models, these atomic units can then be assembled in countless combinations to create semantically coherent ‘micro’ videos targeted at a very specific facet of the learning topic.
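The abstract does not say which graph ranking model GMMC uses. A minimal sketch, assuming the clips tagged for a target skill form a weighted similarity graph and that weighted PageRank (computed by power iteration) scores how central each clip is. The clip names, weights and helper names are hypothetical. Arranging the selected clips into a coherent order is a separate step and is not shown.

```python
def pagerank(weights, damping=0.85, iters=100, tol=1e-10):
    """Weighted PageRank by power iteration.

    weights: dict node -> dict neighbor -> nonnegative edge weight
             (e.g. semantic similarity between two video clips);
             every neighbor must also appear as a key.
    Returns dict node -> score; scores sum to 1.
    """
    nodes = list(weights)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = sum(weights[u].values())
            if out == 0:
                # Dangling node: spread its mass uniformly over all nodes.
                for v in nodes:
                    new[v] += damping * rank[u] / n
                continue
            for v, w in weights[u].items():
                new[v] += damping * rank[u] * w / out
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def assemble_micro_video(weights, k):
    """Select the k highest-ranked clips as candidates for one micro video."""
    rank = pagerank(weights)
    return sorted(rank, key=rank.get, reverse=True)[:k]
```

In this toy setup, a clip that no other clip links to receives only the teleport share, so it is never chosen ahead of well-connected clips.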


Priya Venkat, Sanghamitra Deb and William Ford

Using weak supervision techniques to improve student experiences at Chegg


With 1.6 million subscribers and over 150 million content views, Chegg is a centralized hub where students come to get help with writing, science, math, and other educational needs. To improve students’ learning, we present them with personalized content. Student needs are unique, depending on learning style, study environment, location, and many other factors. Most students engage with only a subset of the products and content available at Chegg. To recommend personalized content to students, we have developed a generalized machine learning pipeline that handles training data generation and model building for a wide range of problems. We generate a knowledge base with a hierarchy of concepts and associate it with student-generated content, such as chatroom data, equations, chemical formulae, and reviews. Collecting training data is a key bottleneck in developing NLP models, and employing subject matter experts to provide annotations is prohibitively expensive. Instead, we use weak supervision and active learning techniques, with tools such as Snorkel, an open source project from Stanford, to make training data generation dramatically easier. With these methods, training data is generated using broad-stroke filters and high-precision rules. The rules are modeled probabilistically to incorporate dependencies. Features for classification tasks are generated from question answering systems and text summarization techniques. The resulting structured information is then used to improve product features and enhance the recommendations made to students.
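Weak supervision starts from labeling functions: small rules that either vote for a label or abstain. The sketch below shows the idea in plain Python. It substitutes a simple majority vote for the probabilistic label model that Snorkel fits to combine the rules. The labeling functions, label names and example texts are hypothetical.

```python
import re
from collections import Counter

ABSTAIN, OTHER, CHEMISTRY = -1, 0, 1


def _words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def lf_chem_keywords(text):
    # Broad-stroke filter: chemistry vocabulary.
    chem = {"mole", "molarity", "titration", "stoichiometry"}
    return CHEMISTRY if _words(text) & chem else ABSTAIN


def lf_formula(text):
    # Crude rule: capital letter, optional letter, then a digit (H2O, CO2).
    return CHEMISTRY if re.search(r"\b[A-Z][A-Za-z]?\d", text) else ABSTAIN


def lf_writing(text):
    # Votes for the non-chemistry class on writing-help vocabulary.
    return OTHER if _words(text) & {"essay", "thesis", "paragraph"} else ABSTAIN


LFS = [lf_chem_keywords, lf_formula, lf_writing]


def weak_label(text, lfs=LFS):
    """Combine labeling-function votes by majority; abstain on no votes or ties."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]
```

Majority vote treats every rule as equally reliable. Snorkel's label model instead estimates the accuracy of each rule from their agreements and disagreements, which is how noisy broad-stroke filters and high-precision rules can be weighted differently.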


Shamya Karumbaiah and Ryan S Baker

Predicting Quitting in Students Playing a Learning Game


Identifying struggling students in real time gives a virtual learning environment an opportunity to intervene meaningfully with supports aimed at improving student learning and engagement. In this paper, we present a detailed analysis of quit prediction modeling for students playing a learning game called Physics Playground, designed for secondary school students to learn physics concepts through interactive gameplay. From the game's interaction log data, we engineered a comprehensive set of aggregated features at varying levels of granularity. The student+level+visit features represent a student’s progress in their current visit to a level. The student+level features represent a student’s experience with the level so far, across all their previous visits. The student features represent the student’s progress through the game across all the levels played so far. The level features represent the general properties of a particular level. We then trained two kinds of models: individualized level-specific models and a single level-agnostic model. Contrary to our initial expectation, our results suggest that the level-agnostic model achieves superior predictive performance. Visualizing the feature importances, we observe that the level-agnostic model relies on high-level, intuitive features that generalize across levels, whereas the level-specific models tend to select features related to fine-grained gameplay activities. We further enhanced the level-agnostic model with level-related and student-related features; this model can now be used in future work to automatically trigger cognitive and affective supports that motivate students to pursue a game level until completion.
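The four feature granularities can be illustrated with a short aggregation routine. This minimal sketch assumes each log event carries a student id, level id, visit number and duration in seconds. The event schema and the two aggregates (event count and total time) are hypothetical stand-ins for the paper's richer feature set.

```python
from collections import defaultdict


def aggregate(events):
    """Aggregate (student, level, visit, seconds) events at four granularities.

    Returns a dict with keys 'student_level_visit', 'student_level',
    'student' and 'level'; each maps a group key to
    {'events': count, 'seconds': total time}.
    """
    keyers = {
        "student_level_visit": lambda s, l, v: (s, l, v),
        "student_level": lambda s, l, v: (s, l),
        "student": lambda s, l, v: s,
        "level": lambda s, l, v: l,  # pooled over all students
    }
    features = {
        name: defaultdict(lambda: {"events": 0, "seconds": 0.0}) for name in keyers
    }
    for student, level, visit, seconds in events:
        for name, key in keyers.items():
            bucket = features[name][key(student, level, visit)]
            bucket["events"] += 1
            bucket["seconds"] += seconds
    return {name: dict(groups) for name, groups in features.items()}
```

For real-time quit prediction, the student and student+level aggregates would be computed only from events that happened before the moment being predicted. This prevents the model from seeing the future.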

Parallel Session 1b: Tutoring


David Lang, Sigtryggur Kjartansson, Jayadev Bhaskaran and Luciana Benotti

Modeling Student Response Times: Towards Efficient One-on-one Tutoring Dialogues


In this paper, we investigate the task of modeling how long it would take a student to respond to a tutor question during a tutoring dialogue. Solving such a task has applications in educational settings such as intelligent tutoring systems, as well as in platforms that help busy human tutors keep students engaged. Knowing how long it would normally take a student to respond to different types of questions could help tutors optimize their own time while handling multiple dialogues concurrently, and decide when to prompt a student again. We study this problem using data from a service that offers tutor support for math, chemistry, and physics through an instant messaging platform. From more than 7,000 hours of tutorial dialogues between students and tutors on this app-based platform, we create a dataset of ~240K questions. We explore several strong baselines for this task and compare them with human performance. We find that a bag-of-words model with historical features of the tutoring dialogue outperforms a bag-of-words model alone. We also find that complex traits like sentiment or entrainment perform poorly in this environment. Lastly, we conduct thought experiments on how machine learning algorithms would behave if the content of a student's response could be anticipated. We also document that humans perform poorly at this classification task and have low interrater agreement.
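The abstract compares a bag-of-words model with one that adds historical features of the dialogue. This minimal sketch shows how the two feature vectors could be built. It assumes the historical features are the number of prior student turns, their mean response time and the most recent one. The vocabulary, function names and choice of history statistics are hypothetical.

```python
import re
from statistics import mean


def bag_of_words(text, vocabulary):
    """Count occurrences of each vocabulary word in the tutor question."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tokens.count(word) for word in vocabulary]


def history_features(previous_response_times):
    """Summaries of how quickly this student has replied so far in the dialogue."""
    if not previous_response_times:
        return [0.0, 0.0, 0.0]  # no history yet
    return [
        float(len(previous_response_times)),
        float(mean(previous_response_times)),
        float(previous_response_times[-1]),
    ]


def featurize(question, previous_response_times, vocabulary, use_history=True):
    """Bag-of-words features, optionally concatenated with dialogue-history features."""
    features = bag_of_words(question, vocabulary)
    if use_history:
        features += history_features(previous_response_times)
    return features
```

Either vector can be fed to the same regressor or classifier. Toggling `use_history` then isolates the contribution of the dialogue history, which mirrors the comparison reported above.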

Katherine Stasaski and Marti Hearst

Foreign Language Tutoring via Grounded Dialogue


Our goal is an intelligent tutoring system that interacts via dialogue with students learning a foreign language. An image grounds student-tutor interactions and can be referenced by both the student and the model. The model can use dialogue to determine student understanding or identify misconceptions by asking for more detailed and precise student explanations or by referencing the image. We focus on the domain-independent language of tutoring and on tutoring techniques. To this end, we collect data from crowdworkers role-playing as tutors and students, where tutors are given all the information necessary to complete the question. Crowdworkers respond to the dialogue as the tutor or student and classify their response into a given category. Once the data is collected, we experiment with various deep learning architectures to represent both the image information and the past dialogue conversation. Additionally, we experiment with architecture components to produce the next utterance in a student-tutor dialogue. We attempt to model the student’s past interactions to inform the dialogue and the progression of the student’s exercises. We present quantitative and qualitative results for these models to examine which representations yield better tutoring responses.

Zoha Zargham, Sakshi Bhargava and Sanghamitra Deb

Personalization at Chegg

Providing students with help that makes learning efficient is Chegg’s primary goal. In this presentation, we will talk about personalizing the student experience in Chegg Study, Chegg Tutors, and textbook rental. Chegg Study is very popular among students, who rely on Chegg’s experts for step-by-step guided solutions to their problems. We use a ranking system to surface the best experts to answer a question posed by a student. The ranks are based on historical performance, subject matter expertise, activity, and similar features. The preliminary model is a linear combination of these features with hand-tuned weights, chosen with the goal of optimizing retention. Following the success of this model, we created a supervised model using the same features, which is currently being tested. Chegg Tutors requires online, real-time recommendation. The real-time recommendation model combines slowly varying features, such as reviews, successful lessons, and active days, with fast-varying features, such as tutors’ online/offline status and responsiveness. We use a Redis pub/sub architecture to update the ranks in real time. Chegg’s textbook business has one of the largest customer bases among all Chegg products. Chegg uses various algorithms to recommend books to students; one of them is classic association rule mining based on students’ historical co-purchasing behavior. The results from this algorithm not only increase sales but can also inform pricing strategies, inventory management, and search result improvements.
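To make the association-rule step concrete, here is a minimal sketch of pairwise rules mined from co-purchase baskets using the standard support, confidence and lift measures. The basket data, thresholds and function name are hypothetical. The abstract does not describe Chegg's production implementation.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.1, min_confidence=0.5):
    """Mine rules A -> B from baskets (each an iterable of items).

    support(A, B)    = fraction of baskets containing both A and B
    confidence(A->B) = support(A, B) / support(A)
    lift(A->B)       = confidence(A->B) / support(B)
    Returns a list of (A, B, support, confidence, lift), highest confidence first.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each co-purchased pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[rhs] / n)
                rules.append((lhs, rhs, support, confidence, lift))
    rules.sort(key=lambda r: r[3], reverse=True)
    return rules
```

A lift above 1 means that buying the left-hand book makes the right-hand book more likely than its base rate. Rules with high confidence and lift are natural candidates for "students also bought" recommendations.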

Parallel Session 2a: Learning Analytics


Petr Johanes

Putting the Philosophy of Modeling to Work for Learning Analytics


Over the last decade, the field of learning analytics has radically expanded in scope, as evidenced by researchers designing platforms that handle ever more learner data and predict increasingly diverse aspects of learning. With this expansion has also come renewed interest in epistemology, the branch of philosophy that considers the nature of knowledge and knowing. This is because designing learning analytics requires taking an epistemological position on knowing as well as learning (Knight, Buckingham Shum, & Littleton, 2014). Put another way, the learning analytics we design embody our epistemological stances (Knight, Shibani, & Shum, 2018; Sandoval, 2004). A very common epistemic feature of learning analytics is modeling: modeling (e.g., a cognitive process) for the purpose of designing learning analytics; modeling (e.g., the learning effects on performance) as the outcome of research with learning analytics; and modeling (e.g., different data sets or learner behaviors) as a way to probe learning analytics. Yet the learning analytics community has generally neither brought insights from vibrant debates in the philosophy of science around modeling and simulation into learning analytics practice nor contributed from this practice to those philosophical discussions. This talk serves to bridge these two worlds. The talk (1) outlines the main questions and recent developments around the philosophy of modeling and simulation, (2) productively links these to common practices in learning analytics, and (3) suggests ways that learning analytics research can be enhanced by, and also contribute to, philosophical debates going forward.

Ryan Montgomery and Eric Greenwald

Learning and Analytics, Centered around Evidence


Frequently, learning analytics are applied as an afterthought to the development of the learning experience. This is logistically convenient: it allows the educational experiences and the learning analytics to be developed by different groups of people with different expertise, at different times, relying on iteration to bring the two work products into alignment later. In this lightning talk, we will discuss a more coordinated approach to the design of learning and assessment experiences. The Lawrence Hall of Science, in partnership with Amplify Science, has developed a comprehensive K-8 science curriculum, built from the ground up to address the Next Generation Science Standards (NGSS). We divided the NGSS’ large set of student learning objectives (SLOs) into coherent learning progressions to guide students through them in a way that allows complex understandings to build on more basic ones. Drawing on the principles of evidence-centered design (ECD), our teams of curriculum developers created educational units to teach these learning progressions, coordinating with members of the learning analytics team, who developed assessments that would provide evidence of student understanding. We then used the expected student responses, as examples of what a student should be able to accomplish given the instructional sequences, to ensure that instruction stayed focused on the original SLOs. We describe an example of the types of coherent SLO+Assessment+Evidence+Instruction units developed, and we estimate the magnitude of the effect that this integrated ECD approach has on our student learning outcomes.

Jamie Poskin

Measuring Teacher and Student Learning


When teachers learn, students learn better. One learning analytics challenge is tracking teacher learning and student learning intensively and over time, because these data are difficult to collect. TeachFX is a pedagogical intervention that records individual classes and differentiates between teacher talk and student talk. Teachers receive metrics including the proportions of teacher and student talk, the proportion of group work, the length of wait time after asking questions, who talks to whom, and changes in all of these measures over time. By receiving this feedback within a few hours of completing a lesson, teachers can make immediate adjustments and learn what helps increase the ratio and quality of student talk. Teachers can also share data with coaches or colleagues in teacher teams. The student talk ratio and qualitative data from class transcripts indicate the level and nature of student learning (Cohen & Lotan, 2014; Hattie, 2012). Thus, the student experience is also revealed to the teacher, whether working individually, with a coach, or in a team. This demonstration session shows how the TeachFX tool works and the data it generates in individual classrooms, collectively for a school or district, and over time at all levels of aggregation. The demonstration also presents early trends in teacher talk, student talk, and teacher and student learning. The implications for special needs populations are particularly strong.
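Two of the listed metrics, talk proportions and wait time, can be illustrated with a short computation. This minimal sketch assumes that speaker diarization has already produced time-ordered (speaker, start, end, is_question) segments. It also assumes that wait time is the silence between the end of a teacher question and the next speech. The input format and that definition are assumptions, not TeachFX's documented method.

```python
def talk_metrics(segments):
    """Compute talk proportions and mean wait time from labeled segments.

    segments: list of (speaker, start_s, end_s, is_question) sorted by start time,
              where speaker is "teacher" or "student" and is_question marks a
              teacher question.
    Returns a dict with teacher_share and student_share (fractions of total
    talk time) and mean_wait_s (None if no question was followed by more speech).
    """
    talk = {"teacher": 0.0, "student": 0.0}
    waits = []
    for i, (speaker, start, end, is_question) in enumerate(segments):
        talk[speaker] += end - start
        if speaker == "teacher" and is_question and i + 1 < len(segments):
            # Wait time: silence before anyone starts talking again.
            waits.append(max(0.0, segments[i + 1][1] - end))
    total = talk["teacher"] + talk["student"]
    return {
        "teacher_share": talk["teacher"] / total if total else 0.0,
        "student_share": talk["student"] / total if total else 0.0,
        "mean_wait_s": sum(waits) / len(waits) if waits else None,
    }
```

Computing these values per lesson and storing them over time would support the longitudinal views described above. Other metrics, such as who talks to whom, would need richer input than this sketch assumes.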

Parallel Session 2b: MOOCs

David Lang

Predicting Clickstream Engagement in MOOCs using Transcript Level Features

Many online course providers have recently started publishing transcripts of their online videos to maintain compliance with the Americans with Disabilities Act. At the same time, one of the challenges of MOOCs (Massive Open Online Courses) is that the quality of course content is difficult to evaluate before a course launches. Learning platforms must either have internal reviewers evaluate video quality or risk their reputation by releasing unvetted courses. We propose using transcript-level features to predict course engagement as measured by clickstream data. We used these features to model four user behaviors: plays, pauses, seeks to, and seeks from. We find that a simple lasso model with bag-of-words features can predict seek behaviors with a 10% reduction in mean squared error compared with a null model. We also find that course vocabulary tends to be a significant predictor. None of the features we explored produced substantial gains in predicting video pause and play behavior.
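The comparison above pits a lasso regression on bag-of-words features against a null model that always predicts the mean. This minimal sketch implements that comparison with plain coordinate descent, so it needs no libraries. The function names, toy data and `alpha` values are hypothetical. In practice a library implementation such as scikit-learn's `Lasso` would be used, and MSE would be measured on held-out videos rather than in-sample as here.

```python
def soft_threshold(x, t):
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_lasso(X, y, alpha, iters=200):
    """Coordinate descent for (1/2n)||y - b0 - Xb||^2 + alpha * ||b||_1.

    X: list of rows (feature lists), y: list of targets.
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center features and target so the intercept drops out of the updates.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    b = [0.0] * p
    residual = yc[:]  # equals yc - Xc @ b while b is all zeros
    for _ in range(iters):
        for j in range(p):
            z = sum(row[j] ** 2 for row in Xc) / n
            if z == 0:
                continue  # constant feature carries no information
            # Correlation of feature j with the partial residual (b_j added back).
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha) / z
            for i in range(n):
                residual[i] -= Xc[i][j] * (new_bj - b[j])
            b[j] = new_bj
    intercept = y_mean - sum(x_mean[j] * b[j] for j in range(p))
    return intercept, b


def mse(y, preds):
    return sum((a - p) ** 2 for a, p in zip(y, preds)) / len(y)


def mse_reduction_vs_null(X, y, alpha):
    """Relative in-sample MSE reduction of the lasso fit over predicting the mean."""
    intercept, b = fit_lasso(X, y, alpha)
    preds = [intercept + sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    null = mse(y, [sum(y) / len(y)] * len(y))
    return 1 - mse(y, preds) / null
```

A large `alpha` shrinks every word coefficient to zero, which collapses the lasso into the null model. A reduction of 0.10 from this function would correspond to a 10% MSE improvement of the kind reported.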

Varun Ganapathi, Byung-Hak Kim and Ethan Vizitei

Predicting and Improving Student Performance with Machine Learning

MOOCs have the opportunity to truly democratize education, but only if students actually complete the courses. We applied machine learning to predict the future performance of students in a MOOC. We found that simple models like logistic regression could make surprisingly accurate predictions of a student's future performance from only a few weeks of observed website behavior. However, a more sophisticated approach produced even more accurate predictions. We recast student performance prediction as a sequential event prediction problem, which led to GritNet, our new algorithm. GritNet is based on a deep learning model called a bidirectional long short-term memory (BLSTM) network. Our results, based on real Udacity students' graduation data, show that GritNet consistently outperforms the standard logistic-regression-based method. We then show how the predictions can be leveraged to improve student performance directly. We discuss several experimental interventions, ranging from personalized notifications to adaptive content, that could improve student performance.
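Recasting performance prediction as sequential event prediction starts with turning each student's clickstream into a sequence. This minimal sketch covers only the input side. It assumes each log entry is a (timestamp, event) pair that is encoded as an integer event id plus a bucketed time gap since the previous event, which is a form a recurrent model such as a BLSTM could consume. The event names, bucket edges and helper names are hypothetical, and the network itself is not shown.

```python
from bisect import bisect_right

# Hypothetical gap buckets in seconds: <1 min, <1 h, <1 day, <1 week, longer.
GAP_EDGES = [60, 3600, 86400, 604800]


def build_vocab(students):
    """Assign an integer id to every event type seen (0 is reserved for padding)."""
    events = sorted({event for log in students.values() for _, event in log})
    return {event: i + 1 for i, event in enumerate(events)}


def encode(log, vocab):
    """Turn one student's [(timestamp_s, event), ...] log into (event_id, gap_bucket) pairs."""
    sequence = []
    previous = None
    for timestamp, event in sorted(log):
        gap = 0 if previous is None else timestamp - previous
        sequence.append((vocab[event], bisect_right(GAP_EDGES, gap)))
        previous = timestamp
    return sequence
```

Bucketing the gaps keeps long pauses (for example, a week away from the course) visible to the model without feeding it raw timestamps. The resulting sequences would then be padded to a common length before training.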

Yu Su

Assessing Self-Learning Outcomes for Complex/Abstract Concepts under Virtual Reality Environment


Study Purpose
The purpose of this study is to investigate self-learning outcomes for complex/abstract concepts in a Virtual Reality (VR)-based environment using Head-Mounted Displays (HMDs), compared with a video-based environment. The results would inform which video-based instruction on Massive Open Online Courses (MOOCs) can be replaced by VR-based counterparts for significantly better learning outcomes.

Research Questions
1. Are self-learning outcomes for complex/abstract concepts under VR-based instruction significantly better than under video-based instruction on MOOCs?
2. Do high- and low-achievement students differ in their response behaviors and learning outcomes under the same VR-based instruction?

Best practices in efficacy research

Digital Promise

Empirical Education

Khan Academy

Rosetta Stone
