IT 117: Intermediate Scripting
Class 14 - Midterm Review
Tips and Examples
Review
Microphone
Homework 7
I have posted homework 7
here.
It is NOT due this coming Sunday.
Instead it is due Sunday, March 24th.
This is to give you time to study for the midterm.
And to give me time to score it.
Midterm
The Midterm exam for this course will be held on Tuesday,
March 19th.
That is the first Tuesday after the Spring Break.
The exam will be given in this room.
It will consist of questions like those on the quizzes along with questions
asking you to write short segments of Python code.
60% of the points on this exam will consist of questions from the Ungraded Class
Quizzes.
There will be 15 of these questions worth 4 points each.
The other 40% of points will come from four questions that ask you to
write a short segment of code.
Each of the code questions is worth 10 points.
To study for the code questions you should know
- Dictionaries
- Sets
- How to use the os and
sys modules
- How to write a regular expression
A good way to study for the code questions is to review the Class Exercises
and homework solutions.
Today's class will be a review session.
On the exam, you will only be responsible for the material in the Class Notes for
today's class.
You will find the Midterm review Class Notes
here.
If for some reason you cannot take the exam on the date mentioned above
you must contact me to make alternate arrangements.
The Midterm is given on paper.
I scan each exam paper and upload the scans to Gradescope.
I score the exam on Gradescope.
You will get an email from Gradescope with your score when I am done.
The Midterm is a closed book exam.
You are not allowed to use any resource, other than what is in your head, while taking the exam.
Cheating on the exam will result in a score of 0 and will be reported to the Administration.
Remember your Oath of Honesty.
To prevent cheating, certain rules
will be enforced during the exam.
Quiz 5
Let's look at the answers to
Quiz 5
No Class Exercise or Class Quiz
Today is a review session.
There will be no Class Exercise or Class Quiz today.
Questions
Are there any questions before I begin?
Tips and Examples
Studying for the Midterm with Flashcards
- 60% of the points on the Midterm come from Class Quiz Questions
- You can use the flashcards you create for these questions
when studying for the Midterm
- But not all Class Quiz Questions will appear on the Midterm
- If the flashcard question covers a topic not in the Midterm Review ...
- you do not have to study it for the exam
- You should remove these flashcards from your collection
Review
Dictionaries
- Dictionaries
have multiple entries
- Each entry has two parts, a
key
and a value
- You use the key to get the value
- So an entry in a Python dictionary is a key-value pair
- You cannot have a dictionary entry that is a key with no value ...
- or a value with no key
Dictionary Literals
- A literal
is a value written directly inside the code
- Dictionary literals contain a number of entries separated by commas ...
- and enclosed in curly braces, { }
- The key and value are separated by a colon, :
>>> digit_names = {1 : "one", 2 : "two", 3 : "three"}
>>> digit_names
{1: 'one', 2: 'two', 3: 'three'}
Creating an Empty Dictionary
- An empty dictionary is a dictionary with no entries
- You create an empty dictionary like this
new_dict = {}
Getting Values from a Dictionary
Adding Elements to a Dictionary
- Dictionaries have no append method like the one lists use to add elements
- Instead, you use the [ ] to assign a value
to a new key
- This creates a new entry in the dictionary
>>> email_addresses
{}
>>> email_addresses["joe"] = "joe@gmail.com"
>>> email_addresses
{'joe': 'joe@gmail.com'}
- The entry will only be added ...
- if the key is not already used in the dictionary
- If the key is already in the dictionary ...
- it will change the value associated with that key
>>> email_addresses
{'joe': 'joe@gmail.com'}
>>> email_addresses["joe"] = "bigmanjoe@hotmail.com"
>>> email_addresses
{'joe': 'bigmanjoe@hotmail.com'}
Changing a Dictionary Value
- The way you change a value in a dictionary is similar to what you do in a list
- You use an assignment statement with the [ ]
operator
- The left hand side of the assignment statement ...
- is the dictionary variable followed by
[ ] ...
- with a key inside
- The new value appears on the right hand side
>>> students
{'01234': 'John Doe', '023413': 'Alan Smith'}
>>> students["023413"] = "Al Smith"
>>> students
{'01234': 'John Doe', '023413': 'Al Smith'}
- This means that when you see a statement like this
email_addresses["joe"] = "joe@gmail.com"
- You cannot tell by looking at it whether you have added an entry ...
- or changed the value associated with an existing entry
- The key determines what happens
- If the key is not already in the dictionary ...
- a new entry has been added
- If the key is already in the dictionary ...
- an existing entry gets a new value
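Here is a short sketch of this point. The helper function set_entry is made up for illustration and is not part of the class notes. It checks the key with the in operator before assigning, so it can report which of the two things the assignment did.

```python
def set_entry(dictionary, key, value):
    """Assign value to key and report whether an entry was added or changed."""
    # The key decides what the assignment does, so check it first
    if key in dictionary:
        action = "changed"
    else:
        action = "added"
    dictionary[key] = value
    return action

email_addresses = {}
first = set_entry(email_addresses, "joe", "joe@gmail.com")
second = set_entry(email_addresses, "joe", "bigmanjoe@hotmail.com")
print(first, second)        # added changed
print(email_addresses)      # {'joe': 'bigmanjoe@hotmail.com'}
```

The assignment statement inside the function is the same both times. Only the contents of the dictionary differ.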
Looping Through a Dictionary
- You can loop through a dictionary using a
for
loop
- The loop variable is assigned each key in turn ...
- as it loops through the statements in the code block
- You can use the key to get the value
>>> scores = {"amy" : 100, "bill": 95, "dave" : 60, "sally" : 95}
>>> for name in scores:
...     print(name, scores[name])
...
dave 60
amy 100
sally 95
bill 95
- You can print the keys in sorted order using the sorted function
- sorted is a built-in function ...
- that takes anything you can use in a for loop ...
- and returns a sorted list
- If you run sorted on a dictionary it will return a sorted list of its keys
>>> for name in sorted(scores):
...     print(name, scores[name])
...
amy 100
bill 95
dave 60
sally 95
When To Use a Dictionary
- Lists and tuples are used when the elements are not particularly unique
- In a list of quiz scores there is nothing special about an individual score
- But every student is different
- Each student has a unique identity
- Dictionaries are used to store data about unique things
Lists versus Dictionaries
- Both lists and dictionaries are objects that hold multiple values
- In a list you access a value by its
index
- In a dictionary you access a value by its key
- You can think of a list as a collection of values
- While a dictionary is a collection of
variables
- A variable is a place in memory with a name ...
- that holds a value
The in and not in Operators
- The
in
and not in
operators work with dictionaries
- The same way they work in sequences
- They tell you whether the dictionary contains a key
>>> digit_names = {"one":1, "two":2, "three":3, "four":4, "five":5 }
>>> "one" in digit_names
True
>>> 1 in digit_names
False
- The
not in
operator returns True
if the first
operand is not a key in the dictionary
>>> "one" not in digit_names
False
>>> 1 not in digit_names
True
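A common use of these operators is counting things. The sketch below is an illustration, not code from the class notes, and the function name count_words is made up. It uses not in to decide whether a word needs a new entry or an updated count.

```python
def count_words(words):
    """Return a dictionary mapping each word to the number of times it appears."""
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = 1      # first time: add a new entry
        else:
            counts[word] += 1     # seen before: change the existing value
    return counts

counts = count_words(["red", "blue", "red", "green", "red"])
for word in sorted(counts):
    print(word, counts[word])
```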
Deleting Elements from A Dictionary
- To delete an entry in a dictionary you use a delete statement
- The general format of such a statement is
del DICTIONARY_NAME[KEY]
- Where DICTIONARY_NAME is a variable that points to
a dictionary object
- And KEY is the key of the entry to be deleted
>>> words_integers
{'three': 3, 'five': 5, 'two': 2, 'one': 1, 'four': 4}
>>> del words_integers["five"]
>>> words_integers
{'three': 3, 'two': 2, 'one': 1, 'four': 4}
- If you use a key that does not exist ...
- a KeyError exception is raised
>>> del words_integers["six"]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'six'
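One way to avoid the KeyError is to test for the key before deleting. This is a minimal sketch. The function name delete_entry is an assumption made for this example.

```python
def delete_entry(dictionary, key):
    """Delete the entry for key if it exists. Return True if something was deleted."""
    if key in dictionary:
        del dictionary[key]
        return True
    return False

words_integers = {"one": 1, "two": 2, "three": 3}
print(delete_entry(words_integers, "two"))   # True
print(delete_entry(words_integers, "six"))   # False, and no exception
print(words_integers)                        # {'one': 1, 'three': 3}
```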
Getting the Number of Elements in a Dictionary
- The len function returns the number of entries
in a dictionary
>>> email_addresses
{'Chris': 'chrisk@yahoo.com', 'Alan': 'alanh@gmail.com'}
>>> len(email_addresses)
2
>>> words_integers
{'five': 5, 'one': 1, 'six': 6, 'two': 2, 'four': 4, 'three': 3}
>>> len(words_integers)
6
Attendance
Sets in Mathematics
- A set is an unordered collection of distinct objects
- You can put anything into a set
- But you can't add a value that is already there
- If you try, the set will not change
- The set with nothing in it is called the
empty set
Set Membership
- If the value x is in set A
we say that x is a
member
of A
Subsets and Supersets
- If you have two sets, A and
B ...
- and all the values in A are also in
B
- Then A is a
subset
of B
- Another way of describing this situation is that B is a
superset
of A
- The situation is shown in the following diagram
Union of Sets
- Again start with two sets A and
B
- The set of all the elements of A and all the elements of
B is called the
union
of A and B
- In the diagram below the union of A and
B
is shown in red
Intersection of Sets
- The set of elements which are members of A
...
- and also members of B ...
- is the
intersection
of A and B
- In the diagram below, the intersection of A
and B is shown in red
Difference between Sets
- The set of all elements of A not in
B is the
difference
between A and B
- In the diagram below, the difference between A
and B is shown in red
Sets in Python
- A
set
in Python is an object that holds an unordered collection of
unique items
- The items inside a set can be of any data type ...
- as long as the data type is
immutable
Creating a Set in Python
- You create a set in Python using the built-in set function
- set takes a single argument
- That argument must be iterable
- A Python object is iterable if you can use it in a for loop
- Here is an example
>>> num_list = [1,2,3]
>>> num_set = set(num_list)
>>> num_set
{1, 2, 3}
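Because a set holds only unique items, building a set from a list throws away duplicates. The values below are invented for illustration. A string is also iterable, so set will accept one and give you its distinct characters.

```python
scores = [95, 100, 95, 60, 100]
unique_scores = set(scores)          # duplicates disappear
letters = set("hello")               # a string is iterable too

print(len(scores), len(unique_scores))   # 5 3
print(sorted(unique_scores))             # [60, 95, 100]
print(sorted(letters))                   # ['e', 'h', 'l', 'o']
```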
Set Literals
- A list
literal
has values, separated by commas, inside square brackets
>>> list_1 = [1, 2, 3, 4, 5]
>>> type(list_1)
<class 'list'>
- A set literal uses curly braces
>>> nonsense = {"foo", "bar", "bletch"}
>>> type(nonsense)
<class 'set'>
The Empty Set
- We can use empty square brackets to create an empty list
>>> empty = []
>>> type(empty)
<class 'list'>
- But we cannot use empty curly braces to create an empty set
- The empty curly braces are an empty dictionary
>>> empty = {}
>>> type(empty)
<class 'dict'>
- When the creators of Python got to set literals
- They ran out of symbols to enclose the elements
- So they had to reuse { }
- So how do you create an empty set?
- You run
set
with no arguments
>>> set_1 = set()
>>> set_1
set()
- That is why an empty set
empty_set = set()
looks like this
>>> empty_set
set()
Adding Elements to a Set
- Sets are
mutable
objects so they can be changed at any time
- The add method adds a single element to a set
- So if we start with an empty set
>>> s1 = set()
>>> s1
set()
- We can use add to add individual elements
>>> s1.add(1)
>>> s1
{1}
- If you add an element that is already in the set
- Nothing will change
>>> s1.add(1)
>>> s1
{1}
- But it won't raise an
exception
Removing Elements from a Set
- To remove an element from a set use one of two methods, discard or remove
- Both methods take a single argument
- The value that is to be removed
>>> numb_set
{1, 2, 3, 4, 5}
>>> numb_set.discard(2)
>>> numb_set
{1, 3, 4, 5}
>>> numb_set.remove(4)
>>> numb_set
{1, 3, 5}
- The only difference is what happens when you remove a value that is not
in the set
- discard does nothing and raises no exception
>>> numb_set.discard(2)
>>> numb_set
{1, 3, 5}
- But remove will raise an exception
>>> numb_set.remove(4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 4
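In a script, the difference matters because an uncaught exception stops the program. The sketch below is only an illustration. It shows discard running quietly and remove wrapped in a try/except block.

```python
numb_set = {1, 3, 5}

# discard: no error even though 2 is not in the set
numb_set.discard(2)

# remove: raises KeyError when the value is missing, so catch it
try:
    numb_set.remove(4)
    removed = True
except KeyError:
    removed = False

print(removed)             # False
print(sorted(numb_set))    # [1, 3, 5]
```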
The Size of a Set
- The
len
function gives the size of a set
>>> set_1 = {1, 2, 3}
>>> len(set_1)
3
>>> set_2 = {3, 2, 1}
>>> len(set_2)
3
>>> set_3 = {"one", "two", "three", "four"}
>>> len(set_3)
4
When Are Sets Equal?
- If two sets have the same elements
- They are equal
>>> set_1 = {1, 2, 3}
>>> set_2 = {3, 2, 1}
>>> set_1 == set_2
True
for Loops with Sets
- Sets are iterable
- This means that they can be used in a
for
loop
- The general format of a
for
loop looks like this
for LOOP_VARIABLE in ITERABLE_OBJECT:
    STATEMENT
...
- If you use a set in a
for
loop you will get each element in the
set
>>> names = {"amy", "bill", "dave", "sally"}
>>> for name in names:
...     print(name)
...
dave
amy
sally
bill
- To print the elements in sorted order use the
sorted
function
>>> for name in sorted(names):
...     print(name)
...
amy
bill
dave
sally
Testing for Set Membership
- The
in
operator
tells you if a value is contained in a set
>>> set_1
{1, 2, 3, 4, 5}
>>> 7 in set_1
False
>>> 8 in set_1
False
>>> 3 in set_1
True
- The
not in
operator tells you if an element is not
in a set
>>> 8 not in set_1
True
>>> 3 not in set_1
False
Union of Sets in Python
- We can form the union of two sets in Python by using the
union method
>>> A = {1, 4, 8, 12}
>>> B = {1, 2, 6, 8}
>>> A.union(B)
{1, 2, 4, 6, 8, 12}
- The union operation is symmetrical
- This means that
A.union(B)
is the same as
B.union(A)
Intersection of Sets in Python
- Set objects in Python have an intersection method
>>> A
{8, 1, 12, 4}
>>> B
{8, 1, 2, 6}
>>> A.intersection(B)
{8, 1}
- Intersection is also symmetrical so
A.intersection(B)
is the same as
B.intersection(A)
Difference between Sets in Python
- In Python, we can use the set difference method
>>> A
{8, 1, 12, 4}
>>> B
{8, 1, 2, 6}
>>> A.difference(B)
{12, 4}
- Set difference is not a symmetric operation
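A quick check of this, using the same sets A and B. The variable names a_minus_b and b_minus_a are made up for the example.

```python
A = {1, 4, 8, 12}
B = {1, 2, 6, 8}

a_minus_b = A.difference(B)   # elements of A that are not in B
b_minus_a = B.difference(A)   # elements of B that are not in A

print(sorted(a_minus_b))      # [4, 12]
print(sorted(b_minus_a))      # [2, 6]
print(a_minus_b == b_minus_a) # False

# Union and intersection, by contrast, give the same result either way
print(A.union(B) == B.union(A))                 # True
print(A.intersection(B) == B.intersection(A))   # True
```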
Subsets and Supersets in Python
- We can tell if one set is a subset of another using the
issubset method
- If we have two sets
>>> A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
>>> B = {1, 3, 5, 7, 9}
- We can ask if one set is the subset of another like this
>>> A.issubset(B)
False
>>> B.issubset(A)
True
- We can ask if one set is a superset of another using the
issuperset method
>>> A.issuperset(B)
True
>>> B.issuperset(A)
False
min and max with Sets
- To find the set element with the maximum value ...
- you can use the
max
built-in function
>>> B = {1, 3, 5, 7, 9}
>>> max(B)
9
- To find the set element with the minimum value ...
- use the
min
function
>>> min(B)
1
The os Module
- The os module allows you to do many of the things the operating system can do
os.getcwd()
os.listdir(path)
os.chdir(path)
Running Unix Commands within Python
os.environ
The os.path Module
- The os.path module contains functions which operate
on pathnames
- os.path is part of the os module
- If you import os you will also import os.path
os.path.isfile(path) and os.path.isdir(path)
os.path.basename(path)
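The following sketch puts several of these functions together. It is an illustration under a few assumptions: the function name describe is made up, and os.path.join (a real os.path function not covered above) is used to build each pathname.

```python
import os

def describe(path):
    """Return the last part of a pathname and what kind of thing it names."""
    name = os.path.basename(path)
    if os.path.isfile(path):
        kind = "file"
    elif os.path.isdir(path):
        kind = "directory"
    else:
        kind = "missing"
    return name, kind

cwd = os.getcwd()
print(describe(cwd))

# Count the files and directories in the current directory
file_count = 0
dir_count = 0
for entry in os.listdir(cwd):
    kind = describe(os.path.join(cwd, entry))[1]
    if kind == "file":
        file_count += 1
    elif kind == "directory":
        dir_count += 1
print(file_count, "files and", dir_count, "directories")
```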
The sys Module
- Python scripts run inside two environments
- The operating system
- The Python interpreter
- The sys module contains variables and
functions ...
- that let you interact with the Python interpreter
- You must import the sys module before you can use it
>>> import sys
Getting Values from the Command Line
- A script can get the values it needs from the command line
- The sys module contains the variable
argv
- sys.argv is a list variable that contains all the
command line arguments
- The first element of sys.argv is the pathname used to run the script
Leaving a Running Script
- sys.exit() stops a running script
- You can use it to stop a script before it gets to the end of the code
- Why would you want to do this?
- There are many reasons
- The most common is when your script encounters an error
- Like not getting the command line arguments it needs
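Here is a minimal sketch of that situation. The function name get_filename and the usage message are assumptions made for this example. Passing 1 to sys.exit is a common convention for signalling an error and is not something these notes require.

```python
import sys

def get_filename(argv):
    """Return the first real argument, or quit with a usage message."""
    # argv[0] is the script pathname, so a length of 1 means no arguments
    if len(argv) < 2:
        print("Usage:", argv[0], "FILENAME")
        sys.exit(1)
    return argv[1]

# In a real script you would call get_filename(sys.argv)
# A sample list is used here so the example runs anywhere
print(get_filename(["script.py", "data.txt"]))   # data.txt
```

sys.exit works by raising a SystemExit exception, which is why the script stops at that point instead of running the rest of the code.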
The Characters in Regular Expressions
- A regular expression
is a string of characters forming a pattern
- This pattern is compared against a string ...
- looking for a match
- If part of the string matches the regular expression pattern ...
- we have a match
- A regular expression is a string composed of
- Ordinary characters
- Meta-characters
- Character classes
Ordinary Characters in Regular Expressions
- Ordinary characters are characters which are not meta-characters
- An ordinary character will match itself
- So the regular expression "cat" will match the string "cat"
- Regular expressions are case sensitive
- Upper case characters only match upper case characters
- And the same for lower case
- Digits are ordinary characters
- So the regular expression "5" matches the string "5"
- . matches any single character
except the newline
- * matches zero or more occurrences
of the previous character
- * in regular expressions is similar to the
* in Unix
- But there is an important difference
- * in Unix matches 0 or more occurrences of
any character
- * in regular expressions matches 0 or more
occurrences ...
- of the character that comes before it
- To get the same effect as * in Unix you must use
.*
- Note that I said zero or more occurrences
- The + meta-character is like
*
- It is used to indicate repetition of the previous character
- But * means zero or more occurrences
- + means one or more occurrences
- ? is also a repetition meta-character
- It means zero or one occurrences of the previous character
- In other words, it means the previous character is optional
- The backslash, \ , is a meta-character
- It turns off the special meaning of the character that immediately follows it
- It performs the same function as the backslash on the Bash command line
- If you wanted to search for a meta-character
- You would have to put \ in front of it
- The \ is also used in character classes
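The sketch below tries each of these meta-characters using the search function from Python's re module. The helper function matches and the test strings are made up for illustration. The patterns are written as raw strings (with r in front) so Python leaves the backslash alone.

```python
import re

def matches(pattern, string):
    """Return True if pattern matches anywhere in string."""
    return re.search(pattern, string) is not None

print(matches(r"c.t", "cut"))         # True:  . matches any one character
print(matches(r"ca*t", "ct"))         # True:  * allows zero occurrences of a
print(matches(r"ca+t", "ct"))         # False: + needs at least one a
print(matches(r"colou?r", "color"))   # True:  ? makes the u optional
print(matches(r"3.14", "3514"))       # True:  . matches the 5
print(matches(r"3\.14", "3514"))      # False: \. matches only a real period
```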
Character Classes
- Character classes match a single occurrence of any character from a set of characters
- A character class is represented by a \ in front of
a single letter
- When the letter is lower case the character class matches one
occurrence ...
- of any character in the set
- When the letter is upper case the character class matches one
occurrence ...
- of any character not in the lowercase class of the same letter
\d and \D Character Classes
- \d matches a single digit
- \d can be used with a repetition
meta-character ...
- to match many occurrences of a digit
- \D matches any single character that is not a
digit
The \w and \W Character Classes
- \w matches any single alphanumeric
character ...
- and the underscore, _
- The alphanumeric characters are the letters and the digits
- \W matches any single character that is
not a letter ...
- or an underscore, _ ...
- or a digit
The \s and \S Character Classes
- \s matches any
whitespace character
- \S matches any character that is not whitespace
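To see what each class matches, the sketch below runs them against one invented line of text. It uses the findall function from the re module, which returns a list of every match. Treat it as an illustration, not as the way the exam questions will be phrased.

```python
import re

line = "Room 214 opens at 9 AM"

print(re.findall(r"\d", line))    # ['2', '1', '4', '9']
print(re.findall(r"\d+", line))   # ['214', '9']
print(re.findall(r"\D+", line))   # ['Room ', ' opens at ', ' AM']
print(re.findall(r"\w+", line))   # ['Room', '214', 'opens', 'at', '9', 'AM']
print(len(re.findall(r"\s", line)))   # 5
print(re.findall(r"\S+", line))   # ['Room', '214', 'opens', 'at', '9', 'AM']
```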
Getting Strings from a Match
- Regular expressions can be used to get parts of the matching string
- To extract part of a string from a match we need two things
- The ( ) meta-characters
- The group method of a match
object
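A small sketch of the two pieces working together. The pattern and the sample text are made up for this example. Each pair of parentheses marks a group, and the group method takes the number of the group you want, counting from 1.

```python
import re

text = "Write to joe@gmail.com today"
match = re.search(r"(\w+)@(\w+)\.com", text)

# re.search returns None when nothing matches, so check before using group
if match:
    print(match.group(1))   # joe
    print(match.group(2))   # gmail
    print(match.group(0))   # joe@gmail.com (group 0 is the whole match)
```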
Repetition in Regular Expressions
- If we want to match a certain number of digits
- We can use many instances of \d
- Like this
\d\d\d
- But there is another way
- We can follow \d with curly braces,
{ }
- And put the number of repetitions we want inside the braces
- Like this
\d{3}
- This works with all character classes
- And all ordinary characters
Specifying a Range of Repeating Characters
- We can also use curly braces to specify a range of repetitions
- When we do this, the curly braces contain two integers ...
- separated by a comma
- The first integer is the minimum number of repetitions ...
- and the second is the maximum
- If we wanted to match either 1, 2, or 3 digits we would write
\d{1,3}
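The sketch below tries both forms on invented strings. Notice what happens to the four digit number in the second example: \d{1,3} takes at most three digits at a time, so the fourth digit becomes a separate match.

```python
import re

# Exactly 3 digits, a dash, then exactly 4 digits
phone = re.search(r"\d{3}-\d{4}", "Call 555-1234 now")
print(phone.group())    # 555-1234

# Between 1 and 3 digits
numbers = re.findall(r"\d{1,3}", "7 42 365 2024")
print(numbers)          # ['7', '42', '365', '202', '4']
```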
Creating Custom Character Classes
- Character classes
are sets of characters
- Python provides 6 predefined character classes
- \d matches any digit
- \D matches any character
not a digit
- \w matches any alphanumeric
character and _
- \W matches any character
not an alphanumeric or _
- \s matches any whitespace
character
- \S matches any character
not a whitespace
- But Python lets you define your own character classes
- We do this using the [ ] meta-characters
- The characters you place inside the square brackets
- Are the characters in the character class
- If we wanted to match at least one occurrence of an even digit we would write
[02468]+
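Here is that pattern in use, along with a second custom class for vowels. Both test strings are made up for illustration.

```python
import re

even_runs = re.findall(r"[02468]+", "2024 and 135 and 86")
print(even_runs)    # ['2024', '86']

vowels = re.findall(r"[aeiou]", "regular")
print(vowels)       # ['e', 'u', 'a']
```

The string 135 produces no match because none of its digits are inside the square brackets.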
Ranges of Characters in a Character Class
Greedy versus Non-greedy Matching
- There are two repetition meta-characters that can match many characters
- By default, any search using these meta-characters will always be "greedy"
- That means that the match will always be as long as possible
- Most of the time this is what you want
- But sometimes it isn't
- To make the match non-greedy, that is, as short as possible, put the meta-character ? ...
- after the repetition meta-characters *
or +
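A minimal sketch of the difference, using an invented line of HTML. Both patterns look for a < followed by one or more characters and then a >. The only change is the ? after the +.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

greedy = re.search(r"<.+>", text).group()
non_greedy = re.search(r"<.+?>", text).group()

print(greedy)       # <b>bold</b> and <i>italic</i>
print(non_greedy)   # <b>
```

The greedy version runs all the way to the last > in the string. The non-greedy version stops at the first > it can.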