Chapter 2 Python

Python is a general-purpose programming language and finds application in a broad range of domains, including web development, artificial intelligence and data science. Since it is characterized as a high-level programming language, it is considered relatively easy to learn.

You can download Python from the official Python website. To make editing code easier, we recommend choosing an editor, such as Visual Studio Code; a tutorial on how to use Python with Visual Studio Code is available. Please make sure you have a running Python installation, as we cannot offer full and individual support with setting up the work environment.

To keep Python code structured and readable, a set of coding standards, PEP 8, has been defined. Throughout the course we require you to comply with these coding standards. If you are using Visual Studio Code, you can use the Flake8 extension to check your code against them.

The contents within this section can be found within the Python Cookbook, authored by Beazley & Jones (2013). Additionally, there is a wide range of online resources available for self-studying purposes, such as W3 Schools.

2.1 Data Structures

Before diving into the actual coding, it is vital to understand the different data types employed by Python. The following section provides a first overview.

Strings str Strings are fundamental data types used to represent textual data. They are sequences of Unicode characters enclosed within either single quotes, double quotes or triple quotes. Strings are immutable, meaning once defined, their contents cannot be changed. This immutability allows for efficient handling of string objects in Python.
# Example String
ex_string = "Hello World!"
Numeric Types int, float Python supports several numeric data types to represent numerical values. An integer (int) is a whole number without decimals; it can be positive, negative, or zero. In Python, integers have unlimited precision, meaning they can be of any size as long as the system's memory allows. A float is a number with a decimal part, stored with limited precision. Python provides various arithmetic operations and functions for working with numeric data types. These operations include addition, subtraction, multiplication, division, exponentiation, modulus, and floor division.
# Example numerics
ex_int = 5
ex_float = 2/3
Sequences list, tuple, range, set A sequence is an ordered collection of elements or items. Sequences allow you to store and manipulate multiple values in a single variable. Python provides several built-in collection types, each with its own characteristics and use cases. Lists, tuples, ranges, and strings are sequences: they are ordered and can be indexed. A set, in contrast, is an unordered collection of unique items and is therefore not a sequence in the strict sense. Items might be changeable (list, set) or unchangeable (tuple, range, string). A list is a built-in data structure used to store a collection of items. Lists are ordered, mutable (modifiable), and can contain elements of different data types, including integers, floats, strings, and even other lists. Lists are defined using square brackets [ ], and elements within the list are separated by commas.
# Example Sequence
ex_list = [1, 2, 3, 4]
ex_tuple = (1, 2)
ex_range = range(1, 10)
ex_set = set(ex_list)
Mappings dict A mapping is a collection of key-value pairs where each key is associated with a value. The concept of mapping is implemented in Python through dictionaries. Since Python 3.7, dictionaries preserve the order in which items were inserted. Dictionaries are also known as associative arrays, hash tables, or simply maps in other programming languages.
# Example Mapping
ex_dictionary = {"key1": "value1",
                 "key2": "value2"}
Booleans True, False Booleans are used to evaluate logical expressions and control the flow of program execution based on conditions. They are extensively used in programming for implementing branching logic, loop control, and decision-making constructs.
# Example Boolean
ex_boolean_t = True
ex_boolean_f = False
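
The difference between immutable and mutable types described above can be tried out directly. The following is a minimal sketch; the variable names are chosen for illustration only and do not appear elsewhere in this chapter.

```python
# Strings are immutable: item assignment raises a TypeError.
greeting = "Hello World!"
try:
    greeting[0] = "J"
except TypeError as error:
    string_error = str(error)

# "Changing" a string therefore means building a new string object.
new_greeting = "J" + greeting[1:]

# Lists are mutable: they can be modified in place.
numbers = [1, 2, 3, 4]
numbers.append(5)

# Sets keep each value only once.
unique_numbers = set([1, 2, 2, 3, 3])

# Dictionaries are mutable and values are looked up by key.
lookup = {"key1": "value1", "key2": "value2"}
lookup["key3"] = "value3"
value = lookup["key1"]

print(new_greeting, numbers, unique_numbers, value)
```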

To identify the data type of a variable use type().

type(ex_list)
<class 'list'>
type(ex_int)
<class 'int'>

2.2 Operators

Operators in Python are symbols that perform operations on variables and values. Python supports various types of operators, including arithmetic operators, comparison operators, logical operators, assignment operators, and more. These operators are used to manipulate data, make decisions, and perform calculations within Python programs.
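
As a brief illustration of the arithmetic operators mentioned above, the following sketch applies each of them to two example integers. The values 7 and 2 are arbitrary choices.

```python
a = 7
b = 2

total = a + b            # addition: 9
difference = a - b       # subtraction: 5
product = a * b          # multiplication: 14
quotient = a / b         # division always returns a float: 3.5
floor_quotient = a // b  # floor division drops the remainder: 3
remainder = a % b        # modulus: 1
power = a ** b           # exponentiation: 49

# Integers have unlimited precision, so very large results are exact.
big = 2 ** 100

print(total, difference, product, quotient, floor_quotient, remainder, power)
```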

Comparison Operators

A comparison operator is used to compare two values. It tests whether they are equal, unequal, or how they are ordered, and returns a Boolean.

== equal to
>, < greater than, less than
>=, <= greater than or equal to, less than or equal to
!= not equal to
x = 1
y = 3

print(x == y)
False
print(y > 3)
False

Logical Operators

Logical operators are used to combine conditional statements and return a Boolean result based on the logical relationship between them. Logical operators can be used to link a set of conditions.

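and True if both Boolean expressions are True
or True if at least one Boolean expression is True
^ True if exactly one of the two Boolean expressions is True (exclusive or)
in True if the operand is contained in a collection, such as a list
not Reverses the value of a Boolean expression
&, |, ~ Bitwise counterparts of and, or, not. On plain Booleans, & and | give the same result as and / or, whereas ~ does not (use not instead). In pandas, &, | and ~ combine Boolean masks element-wise.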
x = 1
y = 3

print(x == 1 and y == 3)
True
print(x == 1 or y < 3)
True
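
The remaining logical operators can be tried out in the same way. This is a small sketch using the same x and y as above; the result variable names are illustrative.

```python
x = 1
y = 3

both = (x == 1) and (y == 3)        # True: both conditions hold
either = (x == 1) or (y < 3)        # True: the first condition holds
exactly_one = (x == 1) ^ (y == 3)   # False: both hold, not exactly one
negated = not (x == y)              # True: x == y is False
member = y in [1, 2, 3]             # True: 3 is contained in the list

print(both, either, exactly_one, negated, member)
```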

When combining multiple operators, parentheses make the intended order of evaluation explicit. Parentheses have the highest precedence and cause the expressions inside them to be evaluated first. Among the logical operators, not binds more tightly than and, which in turn binds more tightly than or. If two operators have the same precedence, the expression is generally evaluated from left to right.
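
The effect of parentheses can be seen in a short sketch. The conditions and numbers below are arbitrary examples chosen so that the two variants give different results.

```python
x = 1
y = 3

# Without parentheses, "and" is evaluated before "or":
# x == 1 or (y == 3 and y < 3)
without_parentheses = x == 1 or y == 3 and y < 3

# With parentheses, the "or" part is evaluated first.
with_parentheses = (x == 1 or y == 3) and y < 3

# The same applies to arithmetic operators.
no_brackets = 2 + 3 * 4      # multiplication first
brackets = (2 + 3) * 4       # addition first

print(without_parentheses, with_parentheses, no_brackets, brackets)
```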

2.3 Syntax

Comments

Comments are utilized to clarify the code’s purpose, improve its readability, and assist both other developers and your future self in understanding it better. Since comments are ignored by the Python interpreter when running the program, they do not alter its functionality.

You add a comment by prefixing it with #. Commenting your code is especially useful if you want to reuse it later or make it understandable for other programmers. Comments are also useful for temporarily disabling lines of code without deleting them, which can be helpful for debugging or testing different sections of code. Write clear and concise comments that explain the intention of the code. Avoid redundant or unnecessary comments that simply restate what the code is doing.

Comments that span multiple lines can be written as triple-quoted strings. When such a string is the first statement of a function, it documents the function and is called a docstring. Brief documentation on the usage of docstrings is available online.

# This is a comment

'''
This is a multi-line comment
'''
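
A docstring is placed directly below the function header. The following sketch uses a hypothetical function add, which is not part of the course material, to show where the docstring goes and how it can be read back.

```python
def add(a, b):
    """Return the sum of a and b."""
    return a + b


# The docstring is stored on the function and shown by help(add).
print(add.__doc__)
result = add(2, 3)
```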

Case Sensitivity & Indentation

Python is case-sensitive, meaning it distinguishes between uppercase and lowercase letters. This applies to variable names, function names, keywords, and any other identifiers in Python code.

Indentation plays a crucial role in Python's syntax for defining the beginning and the end of code blocks. Consistent indentation (PEP 8 recommends four spaces per level) is required to maintain the structure of the code and determine which statements belong to which block.

Consequently, ignoring indentation or capitalization results in errors.
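
Both kinds of errors can be provoked on purpose. The sketch below catches them so that the script keeps running; the variable names and the faulty code string are made up for this illustration.

```python
value = 1
Value = 2  # a different variable, because names are case-sensitive

# Using a capitalization that was never defined raises a NameError.
try:
    print(VALUE)
except NameError as error:
    name_error = str(error)

# A block that is not indented cannot even be compiled.
bad_code = "if True:\nprint('not indented')"
try:
    compile(bad_code, "<example>", "exec")
except IndentationError as error:
    indentation_error = type(error).__name__

print(name_error)
print(indentation_error)
```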

2.4 Loops

Loops are used to execute a block of code repeatedly as long as a certain condition is true. Python supports two main types of loops: for loops and while loops. These loops allow you to automate repetitive tasks and iterate over collections or sequences of data.

For loops are used to iterate over a sequence (such as a list, tuple, string, or range) and execute a block of code for each element in the sequence. The loop variable takes on each value in the sequence one by one.

a_list = [1, 2, 3, 4, 5]

for element in a_list:
    print(element)  
1
2
3
4
5

While loops are used to repeatedly execute a block of code as long as a specified condition is True. The loop stops as soon as the condition becomes False.

iterator = 1

while iterator <= 5:
    print(iterator)
    iterator += 1
1
2
3
4
5

Python provides loop control statements such as break, continue, and pass to modify the behavior of loops. break terminates the loop prematurely, continue skips the rest of the current iteration and moves on to the next one, and pass acts as a placeholder that does nothing.
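
The three control statements can be combined in one small sketch. The numbers and the list name collected are arbitrary choices for illustration.

```python
collected = []

for number in range(1, 11):
    if number % 2 == 0:
        continue  # skip even numbers and go on with the next one
    if number > 7:
        break     # leave the loop entirely at the first odd number above 7
    collected.append(number)

for number in range(3):
    pass  # placeholder: the loop runs but does nothing

print(collected)
```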

2.5 Conditionals

Python supports the if, elif (short for “else if”), and else statements for implementing conditional logic.

a = 5
b = 3

if a > b:
    print("a is greater than b")
elif a < b:
    print("a is smaller than b")
else:
    print("a and b are equal")
a is greater than b

2.6 Functions

Functions are blocks of reusable code that perform a specific task. Functions allow you to break down your program into smaller, manageable parts, making your code more organized, readable, and modular. You can define your own functions or use built-in functions provided by Python or external libraries.

Since we follow the principle of avoiding redundant code, we want to write functions whenever possible. As a rough rule, a function is helpful once we are copy-pasting code 3 times or more.

You define a function using the def keyword, followed by the function name and parentheses (). Any parameters (inputs) to the function are listed within the parentheses. The function body, containing the code to be executed when the function is called, is indented. To execute a function, you “call” it by using its name followed by parentheses ().

# A basic function
def hello():
    print("Welcome!")

hello()
Welcome!

Information can be passed into functions as arguments. Parameters are listed inside the parentheses after the function name in the definition, and the corresponding arguments are passed inside the parentheses when the function is called.

# A function with a single positional argument
def hello(name):
    print(f"Welcome {name}!")

hello("Lisa")
Welcome Lisa!

You can add an arbitrary number of arguments, separated by commas. A function can also define a default value for an argument, which is used whenever no value for that argument is provided in the function call.

# A function with a single optional argument
def hello(name="Somebody"):
    print(f"Welcome {name}!")

hello()
Welcome Somebody!

We differentiate between positional arguments and keyword arguments. A positional argument is matched to a parameter based on its position in the function call, as in hello("Agustina"), while a keyword argument is matched by explicitly naming the parameter, as in hello(name="Somebody").

# A function with a positional and optional keyword argument
def hello(name_1, name_2="Somebody"):
    print(f"Welcome {name_1} and {name_2}!")

hello("Lisa")
Welcome Lisa and Somebody!
hello("Lisa", "Florian")
Welcome Lisa and Florian!
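
The calls above pass all values by position. As a sketch of the keyword form, the variant below returns the greeting instead of printing it (an assumption made here so the results can be compared) and is called with keyword arguments.

```python
def hello(name_1, name_2="Somebody"):
    return f"Welcome {name_1} and {name_2}!"


# Positional first argument, keyword second argument
mixed = hello("Lisa", name_2="Florian")

# With keywords, the order in the call no longer matters.
reordered = hello(name_2="Florian", name_1="Lisa")

# Note: positional arguments must come before keyword arguments,
# so hello(name_1="Lisa", "Florian") would be a SyntaxError.
print(mixed)
print(reordered)
```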

The number of arguments passed into a function can also be handled flexibly, so that the function takes as many arguments as the caller provides and processes them accordingly. We therefore prefix a parameter name with an asterisk, conventionally written as *args. *args is useful when you want to create flexible functions that can accept a varying number of positional arguments. It's commonly used when working with functions that delegate to other functions or when building APIs that need to handle arbitrary inputs.

# A function with a variable number of input names
def hello(*names):
    print(f"Welcome {names}!")

hello("Lisa", "Ryan", "Florian")
Welcome ('Lisa', 'Ryan', 'Florian')!
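
As the output shows, the collected arguments arrive as a tuple. One possible way to produce a nicer greeting is to join the names into a single string. This is a sketch, and the variant returns the text instead of printing it so that it can be reused.

```python
def hello(*names):
    # names is a tuple containing all positional arguments
    if not names:
        return "Welcome!"
    return "Welcome " + ", ".join(names) + "!"


message = hello("Lisa", "Ryan", "Florian")

# An existing list can be unpacked into separate arguments with *.
guests = ["Lisa", "Ryan"]
unpacked_message = hello(*guests)

print(message)
print(unpacked_message)
```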

**kwargs is useful when you want to create flexible functions that can accept a varying number of keyword arguments.

# A function with a variable number of keyword arguments
def hello(**names):
    # names is a dictionary mapping each keyword to its value
    for key, value in names.items():
        print(key, ":", value)
        print(f"Welcome {value}!")

hello(name_1="Lisa", name_2="Ryan", name_3="Florian")
name_1 : Lisa
Welcome Lisa!
name_2 : Ryan
Welcome Ryan!
name_3 : Florian
Welcome Florian!

You can play around with writing functions in order to understand how they work, what is possible and what is not.

2.7 Dataframes

When working with large amounts of structured data, the data is commonly stored in a pandas DataFrame. pandas is an open-source Python library. It offers powerful and flexible data structures, particularly Series (1-dimensional) and DataFrame (2-dimensional), that allow you to work with structured data easily and efficiently.

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet, where each column can represent a different feature, and each row represents an individual record or observation.

Both axes of a DataFrame (rows and columns) are labelled, and arithmetic operations align on both row and column labels. A DataFrame can be thought of as a dict-like container for Series objects.

Since we are utilizing pandas DataFrames, we import the library first.

# Import modules
import pandas as pd

A DataFrame can easily be created from a dictionary, with each key representing a column and each key's value holding the list of entries for that column.

data = {"name": ["Lisa", "Florian", "Moritz"], 
        "grade": [2.3, 1, 1.7], 
        "profession": ["PhD", "PhD", "Student"]}
        
dataframe = pd.DataFrame(data)
print(dataframe)
      name  grade profession
0     Lisa    2.3        PhD
1  Florian    1.0        PhD
2   Moritz    1.7    Student

Alternatively, we can construct a DataFrame from external files, such as .csv files.

df = pd.read_csv('data.csv')
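
Since the file data.csv is not provided here, the following sketch reads the same kind of content from an in-memory text buffer instead. The CSV text is made up to mirror the example data above; pd.read_csv accepts a file path or a file-like object such as io.StringIO.

```python
import io

import pandas as pd

# Stand-in for the contents of a CSV file (illustrative data)
csv_text = (
    "name,grade,profession\n"
    "Lisa,2.3,PhD\n"
    "Florian,1.0,PhD\n"
    "Moritz,1.7,Student\n"
)

# The first line is used as the header and becomes the column names.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```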

We can inspect the data types of the DataFrame's columns.

dataframe.dtypes
name           object
grade         float64
profession     object
dtype: object

2.7.1 Horizontal Filtering

Horizontal filtering of a DataFrame typically involves selecting specific columns of the DataFrame. When working with large amounts of data, you may also have a large number of features (columns) in your dataset, while not all of them are relevant for your ongoing analysis. Horizontal filtering of the DataFrame allows you to select only the columns that are necessary for your analysis, making your dataset more manageable and improving computational efficiency.

We first extract all columns of the DataFrame at hand.

# Extract all available columns
dataframe.columns
Index(['name', 'grade', 'profession'], dtype='object')

We then apply a filter to the columns of the DataFrame to only show the columns name and grade.

dataframe[["name", "grade"]]
      name  grade
0     Lisa    2.3
1  Florian    1.0
2   Moritz    1.7

selected_columns = ["name", "grade"]
dataframe.loc[:, selected_columns]

2.7.2 Vertical Filtering

Vertical filtering in a DataFrame refers to selecting specific rows based on a defined set of conditions. It is especially relevant when you want to work with only the subset of the data that meets specific conditions. Vertical filtering allows you to extract rows that satisfy these conditions, enabling focused analysis on relevant portions of your dataset. Correspondingly, vertical filtering is also used for cleaning the data. Single rows can be removed if they contain missing values, outliers or errors to ensure the quality and integrity of your dataset.

We do so by creating a "mask" of the original DataFrame that indicates whether each row meets our defined condition. The mask is Boolean: it is either True or False for each row. This mask is then used to filter the complete DataFrame. All rows that received the Boolean value True (that is, all rows that fulfill the condition) remain within the filtered DataFrame. All rows that received the Boolean value False (that is, all rows that do not fulfill the condition) are removed from the result.

dataframe[dataframe["name"] == "Lisa"]
   name  grade profession
0  Lisa    2.3        PhD

We can make the filtering procedure more dynamic by employing a variable instead of a static name.

filter_name = "Lisa"

dataframe[dataframe["name"] == filter_name]
   name  grade profession
0  Lisa    2.3        PhD

We can extend our filter to match several values. Instead of filtering for a single scalar value, we filter for values within a list of values.

filter_name = ["Lisa", "Agustina"]

dataframe[dataframe["name"].isin(filter_name)]
   name  grade profession
0  Lisa    2.3        PhD

Using Boolean operators, we can also combine multiple conditions. When using multiple conditions for filtering, it is necessary to group the single conditions with parentheses based on their logical structure.

filter_name = ["Lisa", "Florian"]

dataframe[((dataframe["name"].isin(filter_name)) | (dataframe["name"] == "Moritz")) & (dataframe["grade"] < 2)]
      name  grade profession
1  Florian    1.0        PhD
2   Moritz    1.7    Student

Recall that you need to store the filtered DataFrame in a new variable if you want to proceed working with the filtered data.
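
The following sketch makes the mask explicit and stores the filtered result in a new variable. The variable names mask, good_grades and other_grades are illustrative, and the data is the example data from above.

```python
import pandas as pd

data = {"name": ["Lisa", "Florian", "Moritz"],
        "grade": [2.3, 1, 1.7],
        "profession": ["PhD", "PhD", "Student"]}
dataframe = pd.DataFrame(data)

# The mask holds one Boolean value per row.
mask = dataframe["grade"] < 2

# Filtering returns a new object; the original stays unchanged.
good_grades = dataframe[mask]

# ~ inverts the mask element-wise and selects the remaining rows.
other_grades = dataframe[~mask]

print(mask.tolist())
print(good_grades)
```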

2.8 Aggregating

Aggregating information from a DataFrame in Pandas involves summarizing or calculating statistics across rows and / or columns. You can apply built-in aggregation functions, such as sum, mean, median directly to the columns of a DataFrame.

dataframe.describe()
          grade
count  3.000000
mean   1.666667
std    0.650641
min    1.000000
25%    1.350000
50%    1.700000
75%    2.000000
max    2.300000
dataframe.groupby("profession")\
         .count()
            name  grade
profession             
PhD            2      2
Student        1      1

More complex ways of aggregation are possible as well; however, they require you to define explicitly how the data is supposed to be aggregated.

dataframe["grade"].agg(lambda x: x.max() - x.min())

Be aware that the aggregation you have chosen might only work for numeric values. You can therefore either explicitly select the (numeric) subset of columns you want to aggregate, or specify within the function call that only columns of a certain type should be aggregated.
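
Both options can be sketched with the example data from above. select_dtypes and the numeric_only parameter are part of pandas; the result variable names are illustrative.

```python
import pandas as pd

data = {"name": ["Lisa", "Florian", "Moritz"],
        "grade": [2.3, 1, 1.7],
        "profession": ["PhD", "PhD", "Student"]}
dataframe = pd.DataFrame(data)

# Option 1: select the numeric columns explicitly, then aggregate.
numeric_part = dataframe.select_dtypes(include="number")
grade_range = numeric_part.agg(lambda column: column.max() - column.min())

# Option 2: pick a single numeric column after grouping.
mean_grade = dataframe.groupby("profession")["grade"].mean()

# Option 3: let the aggregation ignore non-numeric columns.
mean_by_profession = dataframe.groupby("profession").mean(numeric_only=True)

print(grade_range)
print(mean_by_profession)
```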