
Machine Learning for social scientists: theory and application

When: Wednesday 10 May, 11:00-13:30
Venue: Birkbeck Central


We welcome Giovanni Cerulli, a researcher at IRCrES-CNR (Research Institute on Sustainable Economic Growth, National Research Council of Italy, Unit of Rome), to Birkbeck to discuss machine learning for social scientists, covering both theory and application.

This event is planned to take place in person on Wednesday 10 May in Room 206, Birkbeck Central.

Session 1 (11:00-12:00) An introduction to machine learning

Dr Giovanni Cerulli will present the fundamentals of machine learning (ML), focusing on definitions, usefulness, and fields of application. After introducing the “curse of dimensionality” (or sparseness problem), he will present a taxonomy of ML models (i.e., learners) and then introduce the main trade-offs that arise when applying ML to the social sciences, in particular the prediction-inference and complexity-interpretability trade-offs. Next, he will introduce the overfitting phenomenon and discuss the bias-variance trade-off in prediction. The session ends with methods for estimating prediction accuracy, including k-fold cross-validation for optimally tuning ML models.
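For readers who would like a feel for these ideas before the session, the following is a minimal Python sketch (not taken from the session material) that uses k-fold cross-validation to compare models of increasing complexity. The synthetic data, the polynomial degrees, and the variable names are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=60).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.3, size=60)

# 5-fold cross-validation: each fold is held out once for testing.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

train_mse = {}
cv_mse = {}
for degree in (1, 3, 15):  # low, moderate, and high model complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())

    # Out-of-sample error, averaged over the held-out folds.
    scores = cross_val_score(model, x, y, cv=cv,
                             scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

    # In-sample error, computed on the same data used for fitting.
    model.fit(x, y)
    train_mse[degree] = float(np.mean((model.predict(x) - y) ** 2))

    print(f"degree={degree:2d}  train MSE={train_mse[degree]:.3f}  "
          f"CV MSE={cv_mse[degree]:.3f}")
```

In a setup like this, the in-sample error typically keeps falling as the degree grows, while the cross-validated error tends to stop improving or to worsen once the model starts fitting noise. That gap is the overfitting the session describes.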

Session 2 (12:15-13:30) Machine learning in practice

Dr Giovanni Cerulli will present two related Stata commands, r_ml_stata_cv and c_ml_stata_cv, for fitting popular machine learning methods in both a regression and a classification setting. Using the Stata/Python integration platform introduced in Stata 16, these commands provide optimal hyperparameter tuning via K-fold cross-validation with grid search. More specifically, they use the Python Scikit-learn application programming interface (API) to carry out both cross-validation and outcome/label prediction. He will provide examples based on real case studies.
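As a rough idea of what grid search with K-fold cross-validation looks like on the Python side, here is a minimal Scikit-learn sketch. It is not the implementation of r_ml_stata_cv or c_ml_stata_cv. The learner (a regression tree), the simulated data, and the grid of max_depth values are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Simulated regression data (an assumption; a real study would load its own).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0,
                       random_state=0)

# Candidate hyperparameter values to compare (hypothetical grid).
param_grid = {"max_depth": [2, 4, 6, 8]}

# Grid search: every candidate is scored by 10-fold cross-validation.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

# The best candidate is refit on all the data and used for prediction.
best_depth = search.best_params_["max_depth"]
predictions = search.predict(X)
print("best max_depth:", best_depth)
```

For a classification setting, the same pattern applies with a classifier such as DecisionTreeClassifier and a classification score such as "accuracy".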

Main reference

Cerulli G. (2022), Machine learning using Stata/Python, The Stata Journal, 22, 4, 772–810.

Biography 

Giovanni Cerulli is a researcher/director at IRCrES-CNR, Research Institute on Sustainable Economic Growth, National Research Council of Italy, Unit of Rome. His research interests are in applied econometrics, with a special focus on causal inference, program evaluation, and machine learning applied to various fields of the social and epidemiological sciences. Giovanni has developed original causal inference models, such as dose-response models and treatment models with social interaction, and has provided Stata implementations of them. He has developed around twenty Stata commands for causal inference and machine learning, working on Stata/Python/R integration for this purpose. Giovanni is the author of the book Econometric Evaluation of Socio-Economic Programs: Theory and Applications (Springer, 2015; second edition 2022). He has published papers in several quality scientific journals and is currently editor-in-chief of the International Journal of Computational Economics and Econometrics.
