Past Event

Machine Learning for the Social Sciences: Summer Course, Session D

May 28, 2019 - July 5, 2019
4:00 PM - 6:00 PM

The course will run for the 6-week duration of the Columbia Summer Session D, from May 28th through July 5th, 2019.

QMSS S 5073 Machine Learning for Social Science is open to the public but requires registration with SPS prior to course registration. For more information on SPS application and registration, please visit the SPS website and explore your options. Further registration information coming soon.

Michael Parrott

M W 4:00 pm-6:10 pm

Course Goals:

Social scientists need to fully engage with the machine learning approaches found in computer science, engineering, AI, tech, and industry. This course will provide a comprehensive overview of machine learning as it is applied in a number of domains. Every effort will be made to draw comparisons and contrasts between the machine learning approach and the more traditional regression-based approaches of the social sciences. Emphasis will also be placed on opportunities to synthesize the two approaches.

The course will start with an introduction to Python, the scikit-learn package, and GitHub. It will then cover data exploration, visualization in matplotlib, preprocessing, feature engineering, variable imputation, and feature selection. Supervised learning methods will be considered next, including OLS models, linear models for classification, support vector machines, decision trees, random forests, and gradient boosting. These will be followed by calibration, model evaluation, strategies for dealing with imbalanced datasets, non-negative matrix factorization, and outlier detection. The course will then turn to unsupervised techniques: PCA, discriminant analysis, manifold learning, clustering, mixture models, and cluster evaluation. Lastly, we will consider neural networks, including convolutional neural networks for image classification and recurrent neural networks.

Prerequisites are basic probability and statistics, basic linear algebra and calculus. The course will use Python, so students who have programmed in at least one programming language will find it easier to keep up.