Projects

STATISTICS

Inferential Anomaly Estimation on Models Regressing Unobserved Streamed Data

Designed a nonparametric inferential algorithm to track the evolving behavior of regression models deployed on data streams and to continuously approximate the probability that they output anomalous values as the stream undergoes unknown distribution shifts. The method is agnostic to the choice of regression model and to the dimensionality of the data stream.

Abstract:

In most regression modelling tasks, the designer expects the model to operate on pre-defined finite ranges of the output-space, where outputs outside these sub-regions are improper. Ideally, the model will learn a mapping from the input-space onto a distribution naturally bounded within the desired ranges, and this assumption will be extensively validated before deployment. However, recent industry applications have raised concerns over long-term distribution shifts in the data feeding model pipelines, which force models to operate unexpectedly on deviating data and consequently lead to decaying performance over time. Decaying performance can manifest as biased outputs regressed onto output-space sub-regions considered anomalous for the application, making them “anomalous regressions”. This is a particular concern for models deployed on high-volume data streams, where regressions are continuously output at high rates and used downstream with little to no human supervision. Large counts of anomalous regressions can thus propagate quickly, silently harming their applications. Given the growing complexity of modern models, their learned mappings may not be obtainable, which, together with the unknown nature of evolving distribution shifts on yet-unobserved data, makes it impossible to compute the probability of a model producing anomalous regressions as it continuously executes. We thus propose a new inferential perspective on this problem. We introduce a statistical method and a corresponding algorithm to continuously approximate the evolving probability of a model generating anomalous regressions as it executes. The method is independent of the underlying model’s nature and does not need to directly track distribution shifts or determine the model’s learned mapping. Lastly, we present a visualization technique to monitor in real time the evolving risk of a model generating anomalous regressions under potentially unknown shifts as it scales and executes on future, yet-unobserved data.

Indep_Res__Inferential_Anomaly_Estimation.pdf

MACHINE LEARNING

Advancing Humor-Focused Sentiment Analysis through Improved Contextualized Embeddings and Model Architecture

Natural Language Processing (NLP) paper surveying word-embedding techniques and novel Deep Learning architectures to improve current performance on humor-focused sentiment analysis. 

Link: https://arxiv.org/abs/2011.11773

Abstract:

Humor is a natural and fundamental component of human interactions. When correctly applied, humor allows us to express thoughts and feelings conveniently and effectively, increasing interpersonal affection, likeability, and trust. However, understanding the use of humor is a computationally challenging task from the perspective of humor-aware language processing models. As language models become ubiquitous through virtual assistants and IoT devices, the need to develop humor-aware models rises exponentially. To further improve the state-of-the-art capacity to perform this particular sentiment-analysis task, we must explore models that incorporate contextualized and nonverbal elements in their design. Ideally, we seek architectures accepting nonverbal elements as additional embedded inputs to the model, alongside the original sentence-embedded input. This survey thus analyses the current state of research in techniques for improved contextualized embedding incorporating nonverbal information, as well as newly proposed deep architectures to improve context retention on top of popular word-embedding methods.

pre_print_Advancing_Humor_Focused_Sentiment_Analysis.pdf

Predicting World Cup Outcome

Project developed with four colleagues to analyze the statistical impact of on-pitch and off-pitch factors on winning World Cup matches. The project culminated in the development of a series of Machine Learning models to predict match outcomes from historical and recent performance.

COMPUTER SCIENCE

Bunny Ears

App developed with three colleagues to help children learn to read.

The user captures an image of printed text with the camera, and the app converts it into an internal text file. The user then records themselves reading the text, and the app uses the audio to score the reading for similarity and pronunciation.
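
As a rough illustration of the similarity-scoring step only, the sketch below compares the captured text with a transcript of the recording at the word level. It is not the app's actual implementation: it assumes the audio has already been transcribed by some speech-to-text step, the function names normalize and reading_similarity are hypothetical, and pronunciation scoring is not covered.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def reading_similarity(reference: str, transcript: str) -> float:
    """Return a word-level similarity in [0, 1] between the printed text
    (reference) and what the reader said (transcript).

    Hypothetical helper: uses difflib's ratio, 2 * matches / total words.
    """
    ref_words = normalize(reference)
    spoken_words = normalize(transcript)
    if not ref_words and not spoken_words:
        return 1.0  # two empty texts are treated as identical
    return SequenceMatcher(None, ref_words, spoken_words).ratio()


if __name__ == "__main__":
    printed = "The cat sat on the mat."
    spoken = "the cat on the mat"  # reader skipped one word
    print(f"similarity: {reading_similarity(printed, spoken):.2f}")
```

Comparing whole words rather than characters means a skipped or substituted word lowers the score, while differences in case or punctuation do not.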

See the project's website: https://devpost.com/software/bunny-ears-noyxcj