Introduction to Python for Data Analysis

This repository contains the material for a lecture on Python offered to the master PFA, the Data Scientist University Diploma (DU), and the master IMAPP hosted at Université Clermont-Auvergne (UCA). No prior Python knowledge is required, although it is very helpful. Some basic mathematics, such as simple vector operations or statistics, is recommended.

General Scope of the Lecture

The main goal of this lecture is to make participants familiar with Python and its data analysis tools so that they can extend their knowledge on their own. Practical exercises and small projects are also proposed to provide a few working examples of data manipulation at different levels of complexity.

What This Lecture Is: A basic and practical introduction to Python together with some of the most important data analysis tools, namely numpy, matplotlib, and pandas.

What This Lecture Isn't: Neither a formal introduction to Python nor an extensive demonstration of all features available in the tools mentioned above.

Content of the Lecture -- full PDF

There is a lot of information in this lecture. To help you focus on important aspects, each chapter starts with a list of expected skills that you should take away, ranked with three levels: basic, medium, expert.

0. Practical Introduction to Jupyter Notebooks. This section is not present in the final PDF but is presented during the lecture.

1. Practical Introduction to Python. This first section is dedicated to basic object types and operations in Python. Functions will also be described, but object-oriented programming will not be covered.

2. Introduction to numpy. Differences between usual Python objects and numpy objects will be introduced.

3. Three tools to know. This section gives a glimpse of the matplotlib, pandas, and scipy packages, which together enable powerful data analysis.

4. Multidimensional data manipulation. Non-trivial operations for multidimensional data using the full power of numpy. Most of these operations can be performed with existing tools, but it is instructive to do them once with native numpy.

5. Introduction to image processing. First steps in image processing (definition, plotting, operations), including basic filter applications (noising, sharpening, border detection).
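To give a flavor of chapters 2 and 5, here is a minimal sketch that is not taken from the lecture material. It contrasts how the `+` and `*` operators behave on a Python list and on a numpy array, then applies a crude border detection to a tiny synthetic image using only native numpy slicing. The image and all variable names are assumptions made up for illustration.

```python
import numpy as np

# Plain Python list: "+" concatenates and "*" repeats the list.
values = [1, 2, 3]
list_sum = values + values        # [1, 2, 3, 1, 2, 3]
list_twice = values * 2           # [1, 2, 3, 1, 2, 3]

# numpy array: the same operators act element by element.
array = np.array(values)
array_sum = array + array         # array([2, 4, 6])
array_twice = array * 2           # array([2, 4, 6])

# Hypothetical 6x6 grayscale image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Crude border detection: absolute difference between horizontally
# adjacent pixels. The result has one column fewer than the image.
borders = np.abs(image[:, 1:] - image[:, :-1])

print(list_sum, array_sum)
print(borders)
```

Only the column where the image jumps from dark to bright is non-zero in `borders`. The lecture itself may use different filters (for instance convolution kernels); this slicing approach is just one simple way to do it.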

Other practical examples: Depending on the remaining time (and people's preferences), we can go through some of the following topics. Some of them can also serve as student projects.

  • Fourier analysis
  • Principal component analysis (PCA)
  • Random Forest regression
  • Gaussian processes
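As an illustration of the first topic in this list, the following sketch recovers the dominant frequency of a synthetic signal with numpy's FFT routines. It is an assumption about what such an exercise could look like, not the lecture's own example; the sampling rate, the signal, and the variable names are all hypothetical.

```python
import numpy as np

# Hypothetical signal: 2 seconds sampled at 100 Hz, made of a strong
# 5 Hz component and a weaker 12 Hz component.
sampling_rate = 100                       # samples per second
n_samples = 200
t = np.arange(n_samples) / sampling_rate  # time axis in seconds
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# Real-input FFT: amplitude spectrum and the matching frequency axis.
spectrum = np.abs(np.fft.rfft(signal))
frequencies = np.fft.rfftfreq(n_samples, d=1 / sampling_rate)

# The largest peak of the spectrum gives the dominant frequency.
dominant_frequency = frequencies[np.argmax(spectrum)]
print(dominant_frequency)
```

With these settings the frequency resolution is 0.5 Hz, so both components fall exactly on a frequency bin and the strongest peak sits at 5 Hz.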

List of Previous Exams with Corrections

How to Get Prepared

1. Get familiar with Python. I recommend two resources: the W3Schools tutorial (both basic and complete) and https://www.learnpython.org (code can be run directly within your web browser).

2. Install Python with Anaconda. To run Python on your machine, you first need to install it. I recommend Anaconda, which also includes Jupyter Notebook.

3. Install Git. Git is version control software that can be installed following these instructions. The whole repository can then be cloned with the command git clone https://github.com/rmadar/lecture-python

4. Get familiar with notebooks. Notebooks provide a convenient environment that combines code, notes, and plots, which makes them very effective for learning something and experimenting with it. You can check out this video or this post.