My research interests are in Natural Language Processing (NLP), Machine Learning and Data Science. I'm interested in applying statistical methods to detect and interpret the underlying topics in large volumes of text data. I also develop text analysis methods to solve problems in other scientific areas such as (computational) social and legal science.

Social Media and Computational Social Science

The daily interaction of billions of users with online social platforms such as Facebook, Twitter, Reddit or Instagram has made enormous amounts of user-generated content available. The volume and diversity of this data (e.g. text, images, videos or interactions with other users such as 'retweets' or 'likes') have enabled studies in computational social science and sociolinguistics to analyse human behaviour on a large scale and automatically infer latent user attributes. In particular, user-generated content in social media can complement traditional methods for extracting and studying user socioeconomic attributes such as occupation [1], income [2] and socioeconomic class [3]. I am interested in studying language use in social media to infer user characteristics using interpretable machine learning models while modelling the complex non-linear nature of the data. These approaches have real-world applications in targeted advertising, health interventions and recommender systems.

[1] D. Preoţiuc-Pietro, V. Lampos and N. Aletras (2015). An Analysis of the User Occupational Class through Twitter Content. In ACL.
[2] D. Preoţiuc-Pietro, S. Volkova, V. Lampos, Y. Bachrach and N. Aletras (2015). Studying User Income through Language, Behaviour and Affect in Social Media. PLOS ONE.
[3] V. Lampos, N. Aletras, J. K. Geyti, B. Zou and I. J. Cox (2016). Inferring the Socioeconomic Status of Social Media Users based on Behaviour and Language. In ECIR.

Legal Text Mining

In his work investigating the potential use of information technology in the legal domain, Lawlor surmised that computers would one day be able to analyse and predict the outcomes of judicial decisions [1]. He also stated that reliable prediction of the activity of judges would depend on a scientific understanding of the ways in which the law and the facts affect the relevant decision-makers, i.e. the judges. Text-based predictive systems of judicial decisions can offer lawyers and judges a useful assistive tool [2]. Such systems may be used to rapidly identify cases and extract patterns that correlate with certain outcomes. They can also be used to prioritise the decision process for cases where law violations seem very likely. This may reduce the delays imposed by the courts and encourage more applications by individuals who might otherwise be discouraged by the expected waiting times.

[1] R. C. Lawlor (1963). What computers can do: analysis and prediction of judicial decisions. American Bar Association Journal.
[2] N. Aletras, D. Tsarapatsanis, D. Preoţiuc-Pietro and V. Lampos (2016). Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective. PeerJ Computer Science.

Understanding Large Document Collections

Much of the information in digital libraries is stored in an unstructured way and is not organised by any automated system. This is often overwhelming for users, making it difficult to find specific information or explore such collections. A particular family of unsupervised statistical methods, namely topic models, has been used extensively in Natural Language Processing and Information Retrieval for analysing and organising large document collections. Topic models have been integrated into document browsing systems, allowing humans to navigate through and identify relevant information on a large scale [1]. The output of topic models, often represented by lists of the most probable words, needs post-processing to make it interpretable for users [2,3,4].
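
As a small illustration of the "list of the most probable words" representation mentioned above, the following Python sketch ranks the words of a single topic by probability. It is only a minimal example under stated assumptions: the words, the probability values and the helper name `top_words` are hypothetical and do not come from any fitted model or from the cited papers.

```python
# Hypothetical word probabilities for one topic. The words and values are
# made up for illustration; a real topic model would estimate them from data.
topic_word_probs = {
    "court": 0.21,
    "judge": 0.17,
    "law": 0.12,
    "case": 0.09,
    "rights": 0.05,
    "the": 0.03,
}


def top_words(word_probs, n=3):
    """Return the n most probable words of a topic, most probable first."""
    ranked = sorted(word_probs.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


print(top_words(topic_word_probs))
```

A raw word list like this is what a user would typically see for each topic, which is why further post-processing such as textual or image labels can help with interpretation.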

[1] N. Aletras, T. Baldwin, J. H. Lau and M. Stevenson (2017). Evaluating Topic Representations for Exploring Document Collections. Journal of the Association for Information Science and Technology (JASIST).
[2] N. Aletras and M. Stevenson (2013). Representing Topics Using Images. In NAACL-HLT.
[3] N. Aletras and M. Stevenson (2014). Labelling Topics using Unsupervised Graph-based Methods. In ACL.
[4] N. Aletras and A. Mittal (2017). Labeling Topics with Images using Neural Networks. In ECIR.

© Nikolaos Aletras 2018.