28 changes: 10 additions & 18 deletions 01-Fundamentals.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -220,6 +220,16 @@ You can add more key-value pairs via `my_dict[new_key] = new_value` syntax. If t
sentiment['dog'] = 5
```

### Dictionary vs. List

Both data types help you organize values, but they differ in how you access those values: you access a list's values via a numerical index, and a dictionary's values via a key.
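As a minimal sketch of this difference, the snippet below stores the same scores in a list and in a dictionary. The names `scores_list` and `scores_dict` and their values are made up for illustration.

```python
# Hypothetical example data: the same three scores stored two ways.
scores_list = [8, 2, 7.5]
scores_dict = {'happy': 8, 'sad': 2, 'joy': 7.5}

# A list is accessed by numerical position, counting from 0.
print(scores_list[2])

# A dictionary is accessed by key.
print(scores_dict['joy'])
```

Both lines retrieve the same value, but the dictionary version states *which* score is wanted rather than *where* it sits.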

![](https://open.oregonstate.education/app/uploads/sites/6/2016/10/II.8_1_dict_list_indices_values.png#fixme)

![](https://open.oregonstate.education/app/uploads/sites/6/2016/10/II.8_2_dict_keys_values.png#fixme)

Source: <https://open.oregonstate.education/computationalbiology/chapter/dictionaries/>

### Application for Data Cleaning

Suppose that you want to do some data recoding. You want to look at the "case_control" column of `simple_df` and change "case" to "experiment" and "control" to "baseline". This correspondence relationship can be stored in a dictionary. You can use the `.replace()` method for Series objects with a dictionary as an input argument.
Expand Down Expand Up @@ -314,21 +324,3 @@ isinstance(simple_df, list)
## Exercises

Exercise for week 1 can be found [here](https://colab.research.google.com/drive/1nskVV4XFDVjkN_6OIQJDtDettOEr-n5W?usp=sharing).

## Appendix: Why Dictionaries?

If we didn't have a tool such as Dictionary, we could have tried to implement the following via Pandas Dataframes:

```{python}
sentiment_df = pd.DataFrame(data={'word': ["happy", "sad", "joy", "embarrassed", "restless", "apathetic", "calm"],
'sentiment': [8, 2, 7.5, 3.6, 4.1, 3.8, 7]})
sentiment_df
```

But to access a word's sentiment value, you have to write a complex syntax:

```{python}
sentiment_df.loc[sentiment_df.word == "joy", "sentiment"]
```

Besides the cumbersome syntax, it is not very fast: the program has to find which row "joy" is at. Whereas, in the dictionary data structure, the lookup is immediate. The time it takes for dictionary to take a key and retrieve a value *does not depend on the size of the dictionary,* whereas it does for the Dataframe implementation.
125 changes: 49 additions & 76 deletions 02-Iteration.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ It turns out that we can iterate over *many* types of data structures in Python.

- DataFrame

- Ranges (to be introduced in exercises)
- Ranges

- Dictionary

Expand Down Expand Up @@ -62,7 +62,7 @@ for rate in heartrates:
    print("Current heartrate:", rate)
```

Here is what the Python interpretor is doing:
Here is what the Python interpreter is doing:

1. Assign `heartrates` as a list.
2. Enter For-Loop: `rate` is assigned to the next element of `heartrates`. If it is the first time, `rate` is assigned as the first element of `heartrates`.
Expand Down Expand Up @@ -142,110 +142,83 @@ Let's see this example step by step:

If it doesn't load properly, here is the [link](https://pythontutor.com/render.html#code=import%20math%0Aheartrates%20%3D%20%5B68,%2054,%2072,%2066,%2090,%20102,%2049%5D%0Aprint%28%22Before%3A%22,%20heartrates%29%0A%0Afor%20index,%20m%20in%20enumerate%28heartrates%29%3A%0A%20%20print%28%22Index%3A%22,%20index,%20%22%20%20%20m%3A%22,%20m%29%0A%20%20heartrates%5Bindex%5D%20%3D%20math.log%28m%29%0A%20%20%23heartrates%5Bindex%5D%20%3D%20math.log%28heartrates%5Bindex%5D%29%20%23this%20is%20okay%20also.%0A%20%20%0Aprint%28%22After%3A%22,%20heartrates%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false).

## Conditional Statements
## For-Loops on other iterable data structures

As we iterate through values in a data structure, it is quite common to want your code to do different things depending on the value. For instance, suppose you are recoding `heartrates`, and each numerical value should become "low" if it is above 0 and at most 60, "medium" if it is above 60 and at most 100, "high" if it is above 100, and "unknown" otherwise (for example, when it is 0 or negative).
### Tuple

Stepping back, we have been working with a *linear* way of executing code: we have been unconditionally executing every line of code in our program. Here, we create a "control flow" via **conditional statements**, in which your code will run a certain section *if* some conditions are met.
You can loop through a Tuple just like you did with a List, but remember that you can't modify it!

Here is what the syntax looks like for conditional statements:
### String

```
if <expression1>:
    block of code 1
elif <expression2>:
    block of code 2
else:
    block of code 3

block of code 4
```
You can loop through a String by iterating over each character within the String, including spaces.

There are three possible ways the code can run:
```{python}
message = "I am hungry"
for text in message:
    print(text)
```

1. If `<expression1>` is evaluated as `True`, then `block of code 1` will be run. When done, it will continue to `block of code 4`.
2. If `<expression1>` is evaluated as `False`, then it will ask if `<expression2>` is `True` or not. If `True`, then `block of code 2` will be run. When done, it will continue to `block of code 4`.
3. If `<expression1>` and `<expression2>` are both evaluated as `False`, then `block of code 3` is run. When done, it will continue to `block of code 4`.
However, Strings are **immutable**, similar to Tuples. So if you iterate via `enumerate()`, you won't be able to modify the original String.
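A small sketch of what immutability means in practice. The `try`/`except` construct is used here only so the example keeps running after the error, and the variable `shouted` is a made-up name for illustration.

```python
message = "I am hungry"

# Trying to change a character in place raises a TypeError,
# because Strings are immutable.
try:
    message[0] = "U"
except TypeError as error:
    print("Cannot modify a String:", error)

# Instead, build a new String from the characters you loop over.
shouted = ""
for letter in message:
    shouted = shouted + letter.upper()

print(message)  # the original String is unchanged
print(shouted)
```

The loop never touches `message`; it assembles a separate String and leaves the original as it was.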

An important takeaway is that *only one block of code can be run*.
### Dictionary

Let's see how this applies to the data recoding example. Let's assume the data we want to recode is a single value in a variable `rate`, not the entire list `heartrates`:
When you loop through a Dictionary, you loop through the Keys of the Dictionary:

```{python}
rate = heartrates[0]
print(rate)

if rate > 0 and rate <= 60:
    rate = "low"
elif rate > 60 and rate <= 100:
    rate = "medium"
elif rate > 100:
    rate = "high"
else:
    rate = "unknown"

print(rate)
sentiment = {'happy': 8, 'sad': 2, 'joy': 7.5, 'embarrassed': 3.6, 'restless': 4.1, 'apathetic': 3.8, 'calm': 7}
for key in sentiment:
    print("key:", key)
```

You don't always need multiple `if`, `elif`, `else` statements when writing conditional statements. In its simplest form, a conditional statement requires only an `if` clause:
The `.items()` method for Dictionary is similar to the `enumerate()` function: it returns a view of tuples (a `dict_items` object rather than a list), and within each tuple the first element is a key and the second element is a value.

```{python}
x = -12
sentiment.items()
```

if x < 0:
    x = x * -1

print(x)
```{python}
for key, value in sentiment.items():
    print(key, "corresponds to ", value)
```

Then, you can add an `elif` or `else` statement, if you like. Here's `if`-`elif`:
### Ranges

```{python}
x = .25
**Ranges** are a collection of sequential numbers, such as:

if x < 0:
    x = x * -1
elif x >= 0 and x < 1:
    x = 1 / x

print(x)
- 1, 2, 3, 4, 5

```
- 1, 3, 5

Here's `if`-`else`. The `in` operator asks whether an element (102) is found in an iterable data structure such as `heartrates`, and evaluates to `True` if so:
- 10, 15, 20, 25, 30

```{python}
if 102 in heartrates:
    print("Found 102.")
else:
    print("Did not find 102.")
```
It seems natural to treat Ranges as Lists, but the neat thing about them is that only the bare minimum information is stored: the start, the end, and the step size. This can be a huge reduction in memory. If you need a sequence of numbers between 1 and 1 million, you can either store all 1 million values in a list, or you can use a Range that holds only the start (1), the end (1 million), and the step size (1). That's a big difference!
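One rough way to see the memory difference is `sys.getsizeof()` from the standard library. This is only a sketch: the exact byte counts depend on your Python version and platform, and the variable names are chosen for illustration.

```python
import sys

# A Range from 1 to 1 million, and the same numbers stored as a List.
numbers_range = range(1, 1_000_001)
numbers_list = list(numbers_range)

# The Range reports a small, fixed size no matter how many numbers it covers.
print("Range size in bytes:", sys.getsizeof(numbers_range))

# The List grows with the number of elements it holds.
print("List size in bytes:", sys.getsizeof(numbers_list))
```

Note that `sys.getsizeof()` on a List counts only the List's own bookkeeping, not the integers it points to, so the List's true footprint is larger still.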

Finally, let's put the data recoding example within a For-Loop:
You can create a Range in the following ways:

```{python}
heartrates = [68, 54, 72, 66, 90, 102]
- `range(stop)`, which starts at 0 and ends at `stop` - 1.

- `range(start, stop)`, which starts at `start` and ends at `stop` - 1.

- `range(start, stop, step)`, which starts at `start`, increases by `step` each time, and stops before reaching `stop`.

When you display a Range object, it just shows the input values you gave it, not the numbers it contains.

for index, rate in enumerate(heartrates):
    if rate > 0 and rate <= 60:
        heartrates[index] = "low"
    elif rate > 60 and rate <= 100:
        heartrates[index] = "medium"
    elif rate > 100:
        heartrates[index] = "high"
    else:
        heartrates[index] = "unknown"

print(heartrates)
```{python}
range(5, 50, 5)
```

Let's see this in action step by step:
Convert to a list to see its actual values:

<iframe width="800" height="500" frameborder="0" src="https://pythontutor.com/iframe-embed.html#code=heartrates%20%3D%20%5B68,%2054,%2072,%2066,%2090,%20102%5D%0A%0Afor%20index,%20rate%20in%20enumerate%28heartrates%29%3A%0A%20%20if%20rate%20%3E%200%20and%20rate%20%3C%3D%2060%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22low%22%0A%20%20elif%20rate%20%3E%2060%20and%20rate%20%3C%3D%20100%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22medium%22%0A%20%20elif%20rate%20%3E%20100%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22high%22%0A%20%20else%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22unknown%22%0A%20%20%20%20%0Aprint%28heartrates%29&amp;codeDivHeight=400&amp;codeDivWidth=350&amp;cumulative=false&amp;curInstr=0&amp;heapPrimitives=nevernest&amp;origin=opt-frontend.js&amp;py=311&amp;rawInputLstJSON=%5B%5D&amp;textReferences=false">
```{python}
list(range(5, 50, 5))
```

</iframe>
Using a Range in a For-Loop is straightforward:

If it doesn't load properly, you can find it [here](https://pythontutor.com/render.html#code=heartrates%20%3D%20%5B68,%2054,%2072,%2066,%2090,%20102%5D%0A%0Afor%20index,%20rate%20in%20enumerate%28heartrates%29%3A%0A%20%20if%20rate%20%3E%200%20and%20rate%20%3C%3D%2060%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22low%22%0A%20%20elif%20rate%20%3E%2060%20and%20rate%20%3C%3D%20100%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22medium%22%0A%20%20elif%20rate%20%3E%20100%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22high%22%0A%20%20else%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22unknown%22%0A%20%20%20%20%0Aprint%28heartrates%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false)
```{python}
for i in range(5, 50, 5):
    print(i)
```
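The three forms of `range()` listed earlier can be compared side by side by converting each to a List. The variable names below are made up for illustration. Notice that with a step size other than 1, the last value is simply the last one that falls before `stop`, which is not necessarily `stop` - 1.

```python
# range(stop): starts at 0 and stops before 5.
one_argument = list(range(5))
print(one_argument)      # [0, 1, 2, 3, 4]

# range(start, stop): starts at 2 and stops before 6.
two_arguments = list(range(2, 6))
print(two_arguments)     # [2, 3, 4, 5]

# range(start, stop, step): starts at 1, goes up by 3, stops before 10.
three_arguments = list(range(1, 10, 3))
print(three_arguments)   # [1, 4, 7]
```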

## Exercises

Expand Down
110 changes: 110 additions & 0 deletions 03-Conditionals.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
# Conditional Statements

As we develop more complex code, it is quite common to want your code to do different things depending on the value. For instance, suppose you are recoding `heartrates`, and each numerical value should become "low" if it is above 0 and at most 60, "medium" if it is above 60 and at most 100, "high" if it is above 100, and "unknown" otherwise (for example, when it is 0 or negative).

Stepping back, we have been working with a *linear* way of executing code: we have been unconditionally executing every line of code in our program. Here, we create a "control flow" via **conditional statements**, in which your code will run a certain section *if* some conditions are met.

Here is what the syntax looks like for conditional statements:

```
if <expression1>:
    block of code 1
elif <expression2>:
    block of code 2
else:
    block of code 3

block of code 4
```

There are three possible ways the code can run:

1. If `<expression1>` is evaluated as `True`, then `block of code 1` will be run. When done, it will continue to `block of code 4`.
2. If `<expression1>` is evaluated as `False`, then it will ask if `<expression2>` is `True` or not. If `True`, then `block of code 2` will be run. When done, it will continue to `block of code 4`.
3. If `<expression1>` and `<expression2>` are both evaluated as `False`, then `block of code 3` is run. When done, it will continue to `block of code 4`.

An important takeaway is that *only one block of code can be run*.
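To make this concrete, here is a small sketch in which more than one condition is true at the same time. The value of `x` and the `label` variable are made up for illustration.

```python
x = 150

# Both x > 100 and x > 50 are True here, but Python runs only the
# block belonging to the first condition that is True, then skips the rest.
if x > 100:
    label = "above 100"
elif x > 50:
    label = "above 50"
else:
    label = "50 or below"

print(label)
```

Because the conditions are checked from top to bottom, their order matters: swapping the first two conditions would give `x = 150` the label "above 50" instead.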

Let's see how this applies to the data recoding example. Let's assume the data we want to recode is a single value in a variable `rate`, not the entire list `heartrates`:

```{python}
heartrates = [68, 54, 72, 66, 90, 102, 49]

rate = heartrates[0]
print(rate)

if rate > 0 and rate <= 60:
    rate = "low"
elif rate > 60 and rate <= 100:
    rate = "medium"
elif rate > 100:
    rate = "high"
else:
    rate = "unknown"

print(rate)
```
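It can help to check what happens exactly at the thresholds. The sketch below runs the same conditions over a few hand-picked boundary values; `test_rates`, `test_rate`, and `labels` are hypothetical names used only for this check.

```python
# Hypothetical boundary values chosen to probe each threshold.
test_rates = [-5, 0, 60, 61, 100, 101]
labels = []

for test_rate in test_rates:
    if test_rate > 0 and test_rate <= 60:
        labels.append("low")
    elif test_rate > 60 and test_rate <= 100:
        labels.append("medium")
    elif test_rate > 100:
        labels.append("high")
    else:
        labels.append("unknown")

print(labels)
```

A value of exactly 60 is labeled "low" and exactly 100 is labeled "medium", because those conditions use `<=`. A value of 0 falls through to "unknown", because the first condition requires `test_rate > 0`.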

You don't always need multiple `if`, `elif`, `else` statements when writing conditional statements. In its simplest form, a conditional statement requires only an `if` clause:

```{python}
x = -12

if x < 0:
    x = x * -1

print(x)
```

Then, you can add an `elif` or `else` statement, if you like. Here's `if`-`elif`:

```{python}
x = .25

if x < 0:
    x = x * -1
elif x >= 0 and x < 1:
    x = 1 / x

print(x)

```

Here's `if`-`else`. The `in` operator asks whether an element (102) is found in an iterable data structure such as `heartrates`, and evaluates to `True` if so:

```{python}
if 102 in heartrates:
    print("Found 102.")
else:
    print("Did not find 102.")
```
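The `in` operator also works on other iterable data structures. As a brief sketch, with shortened example data and made-up variable names for the results:

```python
heartrates = [68, 54, 72, 66, 90, 102, 49]
sentiment = {'happy': 8, 'sad': 2, 'joy': 7.5}
message = "I am hungry"

# List: is the element present?
in_list = 102 in heartrates
# Dictionary: `in` checks the keys...
key_in_dict = 'joy' in sentiment
# ...and does not look at the values.
value_in_dict = 7.5 in sentiment
# String: is this piece of text contained in the String?
in_string = "hungry" in message

print(in_list, key_in_dict, value_in_dict, in_string)
```

Only the Dictionary value check is `False`: 7.5 is stored as a value, not as a key.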

Finally, let's put the data recoding example within a For-Loop:

```{python}
heartrates = [68, 54, 72, 66, 90, 102]

for index, rate in enumerate(heartrates):
    if rate > 0 and rate <= 60:
        heartrates[index] = "low"
    elif rate > 60 and rate <= 100:
        heartrates[index] = "medium"
    elif rate > 100:
        heartrates[index] = "high"
    else:
        heartrates[index] = "unknown"

print(heartrates)
```

Let's see this in action step by step:

<iframe width="800" height="500" frameborder="0" src="https://pythontutor.com/iframe-embed.html#code=heartrates%20%3D%20%5B68,%2054,%2072,%2066,%2090,%20102%5D%0A%0Afor%20index,%20rate%20in%20enumerate%28heartrates%29%3A%0A%20%20if%20rate%20%3E%200%20and%20rate%20%3C%3D%2060%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22low%22%0A%20%20elif%20rate%20%3E%2060%20and%20rate%20%3C%3D%20100%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22medium%22%0A%20%20elif%20rate%20%3E%20100%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22high%22%0A%20%20else%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22unknown%22%0A%20%20%20%20%0Aprint%28heartrates%29&amp;codeDivHeight=400&amp;codeDivWidth=350&amp;cumulative=false&amp;curInstr=0&amp;heapPrimitives=nevernest&amp;origin=opt-frontend.js&amp;py=311&amp;rawInputLstJSON=%5B%5D&amp;textReferences=false">

</iframe>

If it doesn't load properly, you can find it [here](https://pythontutor.com/render.html#code=heartrates%20%3D%20%5B68,%2054,%2072,%2066,%2090,%20102%5D%0A%0Afor%20index,%20rate%20in%20enumerate%28heartrates%29%3A%0A%20%20if%20rate%20%3E%200%20and%20rate%20%3C%3D%2060%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22low%22%0A%20%20elif%20rate%20%3E%2060%20and%20rate%20%3C%3D%20100%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22medium%22%0A%20%20elif%20rate%20%3E%20100%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22high%22%0A%20%20else%3A%0A%20%20%20%20heartrates%5Bindex%5D%20%3D%20%22unknown%22%0A%20%20%20%20%0Aprint%28heartrates%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false)

## Exercises

Exercise for week 3 can be found [here](https://colab.research.google.com/drive/1kPVyALVVn7__x0q6kfE9PKRpGNQgtu0a?usp=sharing).
File renamed without changes.
File renamed without changes.
File renamed without changes.
7 changes: 4 additions & 3 deletions _quarto.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,10 @@ book:
- index.qmd
- 01-Fundamentals.qmd
- 02-Iteration.qmd
- 03-Functions.qmd
- 04-Iteration_Styles.qmd
- 05-Reference_vs_Copy.qmd
- 03-Conditionals.qmd
- 04-Functions.qmd
- 05-Iteration_Styles.qmd
- 06-Reference_vs_Copy.qmd
- references.qmd

sidebar:
Expand Down