Email overload: Using machine learning to manage messages, commitments

As email continues to serve not only as an important means of communication but also as an official record of information and a tool for managing tasks, schedules, and collaborations, making sense of everything moving in and out of our inboxes will only get more difficult. The good news is there’s a method to the madness of staying on top of email, and Microsoft researchers are drawing on how people actually manage their inboxes to create tools that support them. Two teams working in the space will present papers at this year’s ACM International Conference on Web Search and Data Mining (WSDM), February 11–15 in Melbourne, Australia.

“Identifying the emails you need to pay attention to is a challenging task,” says Partner Researcher and Research Manager Ryen White of Microsoft Research, who manages a team of about a dozen scientists and engineers and typically receives 100 to 200 emails a day. “Right now, we end up doing a lot of that on our own.”

According to the McKinsey Global Institute, professionals spend 28 percent of their time on email, so thoughtful support tools have the potential to make a tangible difference.

“We’re trying to bring in machine learning to make sense of a huge amount of data to make you more productive and efficient in your work,” says Senior Researcher and Research Manager Ahmed Hassan Awadallah. “Efficiency could come from a better ability to handle email, getting back to people faster, not missing things you would have missed otherwise. If we’re able to save some of that time so you could use it for your actual work function, that would be great.”

Email deferral: Deciding now or later

Awadallah has been studying the relationship between individuals and their email for years, exploring how machine learning can better support users in their email responses and help make information in inboxes more accessible. During these studies, he and fellow researchers began noticing varying behavior among users. Some tackled email-related tasks immediately, while others returned to messages multiple times before acting. The observations led them to wonder: How do users manage their messages, and how can we help them make the process more efficient?

“There’s this term called ‘email overload,’ where you have a lot of information flowing into your inbox and you are struggling to keep up with all the incoming messages,” explains Awadallah, “and different people come up with different strategies to cope.”

In “Characterizing and Predicting Email Deferral Behavior,” Awadallah and his coauthors reveal the inner workings of one such common strategy: email deferral, which they define as seeing an email but waiting until a later time to address it.

The team’s goal was twofold: to gain a deep understanding of deferral behavior and to build a predictive model that could help users in their deferral decisions and follow-up responses. The team—a collaboration between Awadallah, Susan Dumais, and Bahareh Sarrafzadeh (the paper’s lead author and an intern at the time) of Microsoft Research and Christopher Lin, Chia-Jung Lee, and Milad Shokouhi of the Microsoft Search, Assistant and Intelligence group—devoted significant resources to the former.

“AI and machine learning should be inspired by the behavior people are doing right now,” says Awadallah.

Figure: The probability of deferring an email as a function of the user’s workload, measured by the number of unhandled emails. The number of unhandled emails is one of many features Awadallah and his coauthors used in training their deferral prediction model.

The team interviewed 15 subjects and analyzed the email logs of 40,000 anonymous users, finding that people defer for several reasons: They need more time and resources to respond than they have in that moment, or they’re juggling more immediate tasks. They also factor in who the sender is and how many others have been copied. Some of the more interesting reasons revolved around perception and boundaries: people deliberately delayed replies, or chose not to, to set expectations about how quickly they respond to messages.

The researchers used this information to create a dataset of features—such as the message length, the number of unanswered emails in an inbox, and whether a message was human- or machine-generated—to train a model to predict whether a message is deferred. The model has the potential to significantly improve the email experience, says Awadallah. For example, email clients could use such a model to remind users about emails they’ve deferred or even forgotten about, saving them the effort they would have spent searching for those emails and reducing the likelihood of missing important ones.
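
As an illustration, here is a minimal sketch of what a feature-based deferral classifier along these lines might look like, assuming scikit-learn; the synthetic data, the feature set, and the choice of a gradient-boosted model are assumptions for illustration, not the paper’s exact configuration.

```python
# Minimal sketch of a deferral predictor trained on per-message features.
# The features and labels below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.integers(10, 2000, n),  # message length (characters)
    rng.integers(0, 50, n),     # unanswered emails currently in the inbox
    rng.integers(0, 2, n),      # 1 if machine-generated, 0 if human-sent
])
y = rng.integers(0, 2, n)       # 1 if the user deferred the message

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```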

“If you have decided to leave an email for later, in many cases, you either just rely on memory or more primitive controls that your mail client provides like flagging your message or marking the message unread, and while these are useful strategies, we found that they do not provide enough support for users,” says Awadallah.

Commitment detection: A promise is a promise

Alongside the deluge of incoming email are the outgoing messages containing promises we make—promises to provide information, set up meetings, or follow up with coworkers—and losing track of them has ramifications.

“Meeting your commitments is incredibly important in collaborative settings and helps build your reputation and establish trust,” says White.

Current commitment detection tools, such as those available in Cortana, are pretty effective, but there’s room for further advancement. In their paper “Domain Adaptation for Commitment Detection in Email,” White, lead author Hosein Azarbonyad, who was interning with Microsoft at the time of the work, and coauthor Robert Sim, a principal applied scientist at Microsoft Research, tackle one particular obstacle: bias in the datasets available for training commitment detection models.

Researcher access is generally limited to public corpora, which tend to be specific to the industry they come from. In this case, the team used public datasets of email from the energy company Enron and an unspecified tech startup referred to as “Avocado.” They found a significant disparity between models trained and evaluated on the same collection of emails and models trained on one collection and applied to another; the latter performed markedly worse.
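
For intuition, a minimal sketch of that in-domain versus cross-domain comparison might look like the following, assuming scikit-learn; the tiny example emails and labels are invented stand-ins for the Enron and Avocado corpora.

```python
# Sketch of the evaluation protocol: train a commitment classifier on one
# corpus, then score it on the same corpus and on the other one.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented stand-ins for the two corpora (label 1 = contains a commitment).
enron_texts = ["I will send the gas forecast.", "The energy desk meets at noon.",
               "I shall review the pipeline numbers.", "Lunch was good."]
enron_labels = [1, 0, 1, 0]
avocado_texts = ["I'll push the new build tonight.", "The release branch is frozen.",
                 "I will file the bug report.", "Nice demo today."]
avocado_labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(enron_texts, enron_labels)
# With real data, the in-domain score would use held-out emails from the
# same corpus rather than the training set itself.
print("in-domain:", clf.score(enron_texts, enron_labels))
print("cross-domain:", clf.score(avocado_texts, avocado_labels))
```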

“We want to learn transferable models,” explains White. “That’s the goal—to learn algorithms that can be applied to problems, scenarios, and corpora that are related to but different from those used during training.”

To accomplish this, the group turned to transfer learning, which has been effective in other scenarios where datasets aren’t representative of the environments in which they’ll ultimately be deployed. In their paper, the researchers train their models to remove bias by identifying and devaluing certain information using three approaches: feature-level adaptation, sample-level adaptation, and an adversarial deep learning approach that uses an autoencoder.

Emails contain a wide variety of words and phrases, some more likely to signal a commitment—“I will,” “I shall,” “let you know”—than others. In the Enron corpus, domain-specific words like “Enron,” “gas,” and “energy” may be overweighted in any model trained on it. Feature-level adaptation attempts to replace or transform these domain-specific terms, or features, with similar domain-specific features in the target domain, explains Sim. For instance, “Enron” might be replaced with “Avocado,” and “energy forecast” might be replaced with a relevant tech industry term. Sample-level adaptation, meanwhile, elevates emails in the training dataset that resemble emails in the target domain and downweights those that aren’t very similar. So if an Enron email is “Avocado-like,” the researchers give it more weight during training, as in the sketch below.
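
Here is a minimal sketch of that sample-level idea, assuming scikit-learn: a domain classifier scores how “Avocado-like” each Enron email is, and those scores become sample weights for the commitment model. The toy emails are invented, and the paper’s actual weighting scheme may differ.

```python
# Sketch of sample-level adaptation via instance weighting.
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

source = ["I will send the gas forecast tomorrow.",     # Enron-like
          "The energy desk meets at noon.",
          "I shall follow up on the pipeline numbers.",
          "Lunch was good."]
labels = [1, 0, 1, 0]  # 1 = contains a commitment
target = ["I'll push the new build tonight.",           # Avocado-like
          "The release branch is frozen."]

vec = TfidfVectorizer().fit(source + target)
X_src, X_tgt = vec.transform(source), vec.transform(target)

# Domain classifier: 0 = source domain, 1 = target domain.
dom = LogisticRegression().fit(vstack([X_src, X_tgt]),
                               [0] * len(source) + [1] * len(target))

# Weight each source email by how target-like it looks, then train the
# commitment classifier with those weights.
weights = dom.predict_proba(X_src)[:, 1]
clf = LogisticRegression().fit(X_src, labels, sample_weight=weights)
```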

Figure: General schema of the proposed neural autoencoder model used for commitment detection.

The most novel—and most successful—of the three techniques is the adversarial deep learning approach, which, in addition to training the model to recognize commitments, also trains it to perform poorly at telling which collection an email comes from; this is the adversarial aspect. Essentially, the network receives negative feedback whenever it correctly identifies an email’s domain, training it to be bad at recognizing the source of a particular email and thereby minimizing or removing domain-specific features from the model.

“There’s something counterintuitive to trying to train the network to be really bad at a classification problem, but it’s actually the nudge that helps steer the network to do the right thing for our main classification task, which is, is this a commitment or not,” says Sim.
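
A minimal sketch of that adversarial idea follows, assuming PyTorch and using a standard gradient-reversal formulation rather than the paper’s autoencoder-based architecture; the layer sizes, names, and toy data are invented.

```python
# Sketch of adversarial domain adaptation with a gradient reversal layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign going back."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

encoder = nn.Sequential(nn.Linear(300, 64), nn.ReLU())  # shared email encoder
commit_head = nn.Linear(64, 2)  # main task: commitment or not
domain_head = nn.Linear(64, 2)  # adversary: which corpus the email is from

x = torch.randn(8, 300)               # toy batch of email embeddings
y_commit = torch.randint(0, 2, (8,))  # commitment labels
y_domain = torch.randint(0, 2, (8,))  # domain labels (e.g., Enron vs. Avocado)

z = encoder(x)
# The reversed gradient pushes the encoder to confuse the domain head,
# stripping domain-specific features while the commitment head still learns.
loss = (F.cross_entropy(commit_head(z), y_commit)
        + F.cross_entropy(domain_head(GradReverse.apply(z)), y_domain))
loss.backward()
```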

Empowering users to do more

The two papers are aligned with the greater Microsoft goal of empowering individuals to do more, targeting a space where there is ample opportunity for increased productivity and efficiency.

Reflecting on his own email habits, which have him checking his inbox frequently throughout the day, White questions the cost-benefit of some of that behavior.

“If you think about it rationally, it’s like, ‘Wow, this is a thing that occupies a lot of our time and attention. Do we really get the return on that investment?’” he says.

He and other Microsoft researchers are confident they can help users feel better about the answer with the continued exploration of the tools needed to support them.
