ECTS: 3
Course content:
Introduction (Data Science Basics, Standard Workflow, Roles and Skills, Team Organization Models)
Data 1 (Collection, Sorting, Filtering, Transformation, Tidy data)
Data 2 (Aggregation, Grouping, Summarizing, Relational Data)
Visualization (Scatterplots, Heatmaps, Maps, Networks, Parameter settings)
Linear Regression 1 (One variable LR, Multiple LR, Understanding the model)
Linear Regression 2 (Correlation and Multicollinearity, Making Predictions)
Classification 1 (Confusion Matrix, ROC curves, Logistic Regression)
Classification 2 (Trees, CART, Random Forests)
Clustering (Hierarchical clustering, k-means, Recommendation systems)
Text Analytics (Pre-processing, Bag of Words, Predict Sentiment)
Demo: Process Mining – Political Events Analysis
Continuous Assessment
Skills to be acquired:
Assessment method:
50% Continuous Assessment: A test of about 1 hour with closed questions (mostly multiple choice). The questions will not require the use of software, but they will require a basic understanding of how the software tools could contribute. The test covers the material of sessions 1-10.
50% Assignment: Students will have 1 month to prepare a report according to an exercise definition (checklist) that will be delivered during the last session. The definition will point students to a rich source of data and will outline the basic techniques that should be used to analyze those data. The specific evaluation criteria will be described in the definition, but students should generally expect them to relate to the analytic techniques used rather than to the actual solution (analysis) proposed in the report.
Bibliography and recommended reading:
The course is largely based on the book: Dimitris Bertsimas, Allison O'Hair and Bill Pulleyblank, The Analytics Edge, Dynamic Ideas, 2016. ISBN: 978-0989910897
Another excellent book that describes most of the techniques we will discuss in an intuitive way is: Evans, J. R. (2016). Business analytics. Pearson Higher Ed.[1]
A more manager-oriented approach can be found in the book (free or by donation): Caffo, B., Peng, R. D., & Leek, J. (2016). Executive data science: A guide to training and managing the best data scientists. Leanpub - https://leanpub.com/eds
If you’ve never programmed before, you might find Hands-On Programming with R by Garrett Grolemund (https://rstudio-education.github.io/hopr/) to be a useful companion to this course.
If you get stuck with R in particular, start with Google. Typically adding “R” to a query is enough to restrict it to relevant results: if the search isn’t useful, it often means that there aren’t any R-specific results available. Google is particularly useful for error messages. If you get an error message and you have no idea what it means, try googling it! Chances are that someone else has been confused by it in the past, and there will be help somewhere on the web. If Google doesn’t help, try Stack Overflow. Start by spending a little time searching for an existing answer, including [R] in your search to restrict it to questions and answers that use R.
[1] Of course, for each technique (Linear Regression, Logistic Regression, Trees, Clustering, etc.) there is a plethora of dedicated textbooks, but their level of detail is beyond the scope of this class.