IT 117: Intermediate Scripting 
			Class 11
		
	
	
	
	
	
	
	
	Tips and Examples
	
	
	Review
	
	
	New Material
	
	
	
	Graded Quiz
	You can connect to Gradescope to take the weekly graded quiz
	   today during the last 15 minutes of the class.
	
	Once you start the quiz you have 15 minutes to finish it.
	You can only take this quiz today.
	There is no makeup for the weekly quiz because Gradescope does not permit it.
	
	Solution to Homework 4
	I have posted a solution to homework 4
		here.
	
	Let's take a look.
	Homework 6
	I have posted homework 6 
		here.
	
	It is due this coming Sunday at 11:59 PM.
	Midterm
	The Midterm exam for this course will be held on Tuesday,
		March 25th.
	
	That is the first Tuesday after Spring Break.
	The exam will be given in this room.
	It will consist of questions like those on the quizzes along with questions
		asking you to write short segments of Python code.
	
	60% of the points on this exam will consist of questions from the Weekly
	   Graded Quizzes.
	
	There is a link to the answers to the graded quizzes on the class web page.
	There will be 15 of these questions worth 4 points each.
	The other 40% of points will come from four questions that ask you to 
		write a short segment of code.
	
	Each of the code questions is worth 10 points.
	To study for the code questions you should know
	
		- Dictionaries
- Sets
- How to use the os and sys modules
- How to write a regular expression
A good way to study for the code questions is to review the Class Exercises
	   and homework solutions.
	
	The last class before the exam, Thursday, March 13th, will be a review session.
	On the exam, you will only be responsible for the material in the Class Notes
		for that review class.
	
	You will find the Midterm review Class Notes 
		here.
	
	If for some reason you cannot take the exam on the date mentioned above
		you must contact me to make alternate arrangements.
	
	The Midterm is given on paper.
	I scan each exam paper and upload the scans to Gradescope.
	I score the exam on Gradescope.
	You will get an email from Gradescope with your score when I am done.
	The Midterm is a closed book exam.
	You are not allowed to use any resource, other than what is in your head, while taking the exam.
	Cheating on the exam will result in a score of 0 and will be reported to the Administration.
	Remember your Oath of Honesty.
 
	To prevent cheating, certain rules
		will be enforced during the exam.
	
	
	Questions
	Are there any questions before I begin?
	Tips and Examples
	
	Making Script Executable
	
		- All scripts submitted for this course must be 
			executable ...
		
- or you will lose points
- You make a file executable by doing two things: adding a hashbang line and setting its permissions
			
		
- The hashbang line must be the very first line of the script
- The first two characters of this first line must be #!
- This must be followed by the absolute pathname of the Python 3 interpreter
- For this course the hashbang line must be
			
#! /usr/bin/python3
 
- You can make a file executable by running the following Unix
			command on it
			
chmod 755 FILENAME 
- FILENAME is a placeholder
- You replace this with the name of the file you are making
			executable
		
- If I wanted to make the file hw2.py
			executable, I would write the following on the Unix
			command line
			
chmod  755  hw2.py 
- You can also change the permissions in FileZilla
- Connect to pe15 using FileZilla
- Go to the homework directory for the assignment
- Right-click on the homework script
- Drag down to "Permissions" in the menu that appears
- Enter 755
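The two requirements above can also be checked from Python. What follows is only a sketch and not part of the required workflow for this course. The function names has_hashbang and make_executable are made up for illustration, and the tempfile module is used only to create a throwaway script.

```python
import os
import stat
import tempfile

def has_hashbang(path):
    """Return True if the first two characters of the file are #!"""
    with open(path) as script:
        return script.readline().startswith("#!")

def make_executable(path):
    """Do the same thing as the Unix command: chmod 755 path"""
    os.chmod(path, 0o755)

# Create a throwaway script to experiment on
with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = os.path.join(tmp_dir, "demo.py")
    with open(script_path, "w") as script:
        script.write("#! /usr/bin/python3\nprint('hello')\n")

    make_executable(script_path)
    hashbang_ok = has_hashbang(script_path)
    # S_IMODE keeps only the permission bits of the file mode
    mode = stat.S_IMODE(os.stat(script_path).st_mode)

print("hashbang present:", hashbang_ok)
print("permissions:", oct(mode))
```

Note that 755 is an octal number, which is why it is written 0o755 in Python.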
Getting Your First IT Job
	
		- Students sometimes come to me asking for advice on how to get an IT
			job
		
- I have not had a job in industry for many years
- So I usually refer them to  
			Career Services
		
- They provide a web site called 
			Handshake
			where companies can find interns
		
- In IT, as in many fields, your first job is the stepping stone
			to your career in the field
		
- But it can be hard to find an internship or a job
			when you have no IT experience
		
- If this is a situation you face, there are things
			you can do
		
- If you have a job, check with your current employer
- They must have computers somewhere and maybe you can
			help in keeping them running
		
- If they have an IT Department ask if there is something you
			can do for them in your free time
		
- You can do something similar with a local organization
			like a church, temple, mosque or youth group
		
- They probably use a computer for some of the work they do
		    or perhaps they need a web page
		
- Volunteer to do some IT work for them in return for a letter
			talking about the work you did
		
- You can cite this work in your resume
- Another place to find volunteer opportunities is
			Volunteer Match
		
- They have virtual opportunities in many different areas
- Or perhaps you can find some open source project that needs 
			help
		
- The Free Software Foundation
			is based in Boston and often needs volunteers
		
- Many people who are looking to hire people say they are
			having a hard time ...
		
- finding people who take their work seriously
- Whenever you get a job, no matter how menial ...
- be sure to do your best ...
- and take the work seriously
- Employers don't want people who don't give a damn
- The economy goes through cycles and sometimes jobs are hard 
			to find
		
- Just get in the habit of constantly looking for opportunities
- But above all don't give up
		- When solving a problem, one of your best resources ...
- is the people you work with
- Sometimes you get too wrapped up in a problem ...
- and can't see something that is obvious to others
- Other times you need a function or technique ...
- that you have never used before
- Nobody in IT knows everything
- Of course you could always Google for an answer
- But someone you work with might be able to explain it ...
- saving you hours of searching
- All of us who work in IT are critically dependent on our peers
- This is why I urge all of you to keep in touch with those you
			meet in your classes ...
		
- or on the job
- Most of the jobs I have had in my life ...
- have come from leads from people I know
- Make sure you have the email or text address of everyone
			you study or work with ...
		
- and keep in regular contact
- Maybe go out for a beer with them once a month
- You won't regret it
Review
	Working with the Operating System
	
		- Certain operations can only be performed by the operating system
- For example
			
				- Creating files
- Renaming files
- Deleting files
- Creating directories
 
- All of the things you can do at the command line ...
- can be done within Python
- Python can ask the operating system to perform these tasks for 
			you
		
The os Module
	
		- When you need the operating system to do something ...
- use Python's os module
- Of course you must import it first
			
>>> import os 
- Whenever you need to do something with a file ...
- other than reading or writing ...
- you need the os module
os.getcwd()
	
	os.listdir(path)
	
	os.chdir(path)
	
	os.rename(old_name, new_name)
	
		- You can change the name of a file with os.rename
			
>>> os.chdir("/home/ghoffmn/tmp")
>>> os.listdir(".")
['test.txt', 'dir1']
>>> os.rename("test.txt", "file.txt")
>>> os.listdir(".")
['dir1', 'file.txt']
- os.rename() also works on directories
			
>>> os.rename("dir1", "test_dir")
>>> os.listdir(".")
['test_dir', 'file.txt']
os.remove(path)
	
		- To delete a file use os.remove()
			
>>> os.remove("file.txt")
>>> os.listdir(".")
['test_dir']
- os.remove() does not work on directories
			
>>> os.remove("test_dir")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 21] Is a directory: 'test_dir'
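The difference between files and directories shown above can be handled in code. Here is a minimal sketch, assuming a helper I have named delete_path, which is not part of the os module. It uses the tempfile module and os.path.join only to build a scratch directory, so nothing in your own directories is touched.

```python
import os
import tempfile

def delete_path(path):
    """Delete a file with os.remove() or an empty directory with os.rmdir()."""
    if os.path.isdir(path):
        os.rmdir(path)       # os.remove() would raise an exception here
    else:
        os.remove(path)

with tempfile.TemporaryDirectory() as tmp_dir:
    old_name = os.path.join(tmp_dir, "test.txt")
    new_name = os.path.join(tmp_dir, "file.txt")
    dir_name = os.path.join(tmp_dir, "test_dir")
    open(old_name, "w").close()       # create an empty file
    os.mkdir(dir_name)                # create an empty directory

    os.rename(old_name, new_name)
    after_rename = sorted(os.listdir(tmp_dir))

    delete_path(new_name)
    delete_path(dir_name)
    after_delete = os.listdir(tmp_dir)

print(after_rename)
print(after_delete)
```

os.listdir() does not promise any particular order, which is why the sketch sorts the list before printing it.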
os.rmdir(path)
	
	os.mkdir(path)
	
	Running Unix Commands within Python
	
	os.environ
	
	The os.path Module
	
		- The os.path module contains functions 
			which operate on pathnames
		
- It is part of the os module and does not
			have to be imported ...
		
- if you have already imported os
os.path.isfile(path) and os.path.isdir(path)
	
	os.path.basename(path)
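The three os.path functions named in the headings above can be tried out together. This is only a sketch. The file name notes.txt is made up, and tempfile and os.path.join are used only to create a scratch file to test against.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "notes.txt")
    open(file_path, "w").close()             # create an empty file

    file_is_file = os.path.isfile(file_path)  # True for an ordinary file
    file_is_dir = os.path.isdir(file_path)    # False, it is not a directory
    dir_is_dir = os.path.isdir(tmp_dir)       # True for a directory
    name_only = os.path.basename(file_path)   # strips the leading directories

print(file_is_file, file_is_dir, dir_is_dir, name_only)
```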
	
	The sys Module
	
		- Python scripts run inside two environments
			
				- The operating system
- The Python interpreter
 
- The sys module contains variables 
			and functions ...
		
- that interact with the Python interpreter		
- You must import the sys module before you can use it
			
>>> import sys 
Getting Values from the Command Line
	
		- A script can get the values it needs from the command line
- The sys module contains the variable 
			argv
		
- sys.argv is a list variable 
			that contains all the command line arguments ...
		
- as well as the pathname used to run the program
			
$ cat argv.py 
#! /usr/bin/python3
import sys
print('sys.argv:', sys.argv)
$ ./argv.py foo bar blecth
sys.argv: ['./argv.py', 'foo', 'bar', 'blecth']
Leaving a Running Script
	
		- You can leave a script before the end of the code ...
- by using the sys.exit() function
		
- Why would you want to do this?
- There are many reasons
- The most common is when you encounter an error ...
- that prevents the script from proceeding
Usage Messages
	
		- If a script does not get the arguments it needs it should print a
			usage message
		
- A usage message tells the user ...
- they have not given the command line arguments needed
- In this class, usage messages must have the form
			
Usage: SCRIPT_NAME ARGUMENT_1 ARGUMENT_2 ... 
- For example, let's say the script list_dir.py 
			needs the name of a directory from the command line
- If it does not get it, it should print a usage message that indicates the 
			argument it needs
			
$ ./list_dir.py 
Usage: list_dir.py DIRECTORY_NAME 
- The message is printed by the following code fragment
			
if len(sys.argv) < 2:
    print("Usage:", os.path.basename(sys.argv[0]), "DIRECTORY_NAME")
    sys.exit()
- Let's examine this code
- The first line checks the number of tokens on the command line
- You might have thought that the value should be 1, not 2
- But sys.argv is a list containing all command line 
			strings ...
		
- including the pathname of the script
		
- So the pathname used to run the script can be obtained from the expression
			
sys.argv[0] 
- The second line prints the message
- It uses the os.path module function 
			basename() to strip away everything ...
		
- except the name of the script
- If I had not done this the usage message would read
			
$ ./list_dir.py 
Usage: ./list_dir.py DIRECTORY_NAME 
- The third line ends the running of the script
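The same check can be wrapped in functions so it works for any number of arguments. This is a sketch under my own assumptions: the names usage_message and check_arguments are made up, and the command lines are simulated with ordinary lists so the code can be run without a terminal. A real script would pass sys.argv and call sys.exit() when the check fails.

```python
import os

def usage_message(argv, *argument_names):
    """Build the usage message from the script pathname in argv[0]."""
    script_name = os.path.basename(argv[0])
    return " ".join(["Usage:", script_name] + list(argument_names))

def check_arguments(argv, *argument_names):
    """Return True if argv holds every required argument.
    Otherwise print the usage message and return False."""
    # argv[0] is the script pathname, so we need one more entry than arguments
    if len(argv) < len(argument_names) + 1:
        print(usage_message(argv, *argument_names))
        return False
    return True

# Simulated command lines
missing = check_arguments(["./list_dir.py"], "DIRECTORY_NAME")
present = check_arguments(["./list_dir.py", "/tmp"], "DIRECTORY_NAME")
print(missing, present)
```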
Attendance
	New Material
	Regular Expressions
	
	Working with Regular Expressions
	
		- Regular expressions are a language used to specify patterns
- You could use regular expressions to find the Sox games ...
- in the range of dates I mentioned above
		
- Regular expressions are very powerful ...
- but they are difficult to learn
- They work in a way that takes some time to get used to
- Some of the characters used in regular expressions
			look like Unix meta-characters
		
- But they work very differently from Unix
- It is easy to get frustrated when first using regular expressions ...
- and to give up on them as not worth the effort
- But they are well worth the time needed to understand them
- The trick is to start slowly ...
- and build up experience as you go along
What You Need to Remember
	
		- In the sections below I will discuss regular expressions
- I will also show you some Python code that uses regular expressions
- You do NOT have to remember the Python code
- I will not ask you for it on a test or quiz
- The Python code for regular expressions can be Googled when you need it
- But for this course you will have to learn how to create
			patterns ...
		
- in the regular expression language
		
The Characters in Regular Expressions
	
		- A 
			regular expression
			is a string of characters forming a pattern
		
- This pattern is compared against a string
- If some characters in the string fit the pattern, we have a match
- A regular expression is a string composed of
			
				- Ordinary characters
- Meta-characters
- Character classes
 
Ordinary Characters in Regular Expressions
	
		- Ordinary characters are characters which are not meta-characters
- An ordinary character will match itself
- So the regular expression "a" will match a string like "abc"
- And "bcd" will match "abcde"
- And so on
- When I say "a" matches "abc" ...
- I mean that "a" matches one of the characters in the string
- Regular expressions are case sensitive
- Upper case characters only match upper case characters
- And the same for lower case
- Digits are ordinary characters
- So the regular expression "5" matches the string "256"
Using Regular Expressions to Find a Match
	
		- There is more than one way to use regular expressions
- But the simplest way is to use them to find a line ...
- that matches a pattern written as a regular expression
- When used this way they are like the Unix command grep
- You run grep with two arguments
				- A string you are trying to match
- A list of files to search for matches
 
- So if I have a file with scores from Red Sox games
			
2011-07-02      Red Sox @  Astros       Win 7-5
2011-07-03      Red Sox @  Astros       Win 2-1
2011-07-04      Red Sox vs Blue Jays    Loss 7-9
2011-07-05      Red Sox vs Blue Jays    Win 3-2
... 
- The following grep command will find all games the Sox won
$ grep Win red_sox.txt 
2011-07-02      Red Sox @  Astros       Win 7-5
2011-07-03      Red Sox @  Astros       Win 2-1
2011-07-05      Red Sox vs Blue Jays    Win 3-2
... 
- We can do the same thing with regular expressions
- To use regular expressions you must import the re 
			module
			
import re 
- This module contains functions and classes that work with regular expressions
- You can find a match in Python using the re module's
			search function
		
- The search function takes two arguments
			
				- A regular expression
- A line you are trying to match
 
- Normally, you would use search inside a 
			for loop
- Looping through the lines in a file ...
- looking for lines that match the pattern
- But to show you how Python works with regular expressions ...
- I will do something different here
		
- I will use search with a 
			string literal
			...
		
- so you can see how search works
- If search finds a match it returns a match object
- search will return this match object ...
- if the regular expression finds matching characters ...
		
- anywhere in the line
- It will return None if it cannot find a match
- None
			is like zero for objects
		
- We use search in an assignment statement like this
			
>>> match = re.search("man", "A man, a plan, a canal. Panama")
- Here the regular expression is "man"
- And the line is "A man, a plan, a canal. Panama"
- match is an object variable
- If search finds a match in the line ...
- match will hold a pointer to the match object
- If it does not find a match, the value of match
			will be None
		
- In this case a match was found
			
>>> print(match)
<_sre.SRE_Match object; span=(2, 5), match='man'> 
- We can use match in an if statement
- Because Python treats a match object as True and None as False
>>> if match:
...     print("Found match")
... else:
...     print("No match found")
... 
Found match
- Here is an example of search not finding
			a match
			
>>> match = re.search("xxxxxxxx", "A man, a plan, a canal. Panama")
>>> print(match)
None
>>> if match:
...     print("Found match")
... else:
...     print("No match found")
... 
No match found
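As mentioned above, search is normally used inside a for loop. Here is a minimal sketch that does the work of the grep command shown earlier. To keep it self-contained, the lines are stored in a list instead of being read from red_sox.txt.

```python
import re

# A few lines in the format of the Red Sox file shown above
lines = [
    "2011-07-02      Red Sox @  Astros       Win 7-5",
    "2011-07-03      Red Sox @  Astros       Win 2-1",
    "2011-07-04      Red Sox vs Blue Jays    Loss 7-9",
    "2011-07-05      Red Sox vs Blue Jays    Win 3-2",
]

wins = []
for line in lines:
    # search returns a match object or None
    # and None counts as False in an if statement
    if re.search("Win", line):
        wins.append(line)

for line in wins:
    print(line)
```

To work on a real file you would loop over the file object instead of the list.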
Pattern Objects
	
		- search is a function in the
			re module
		
- If it finds a match, it creates a match object ...
- and returns a pointer to it
		
- This match object is  defined in the re
			module
		
- But in order to search for the match ...
- search creates another object ...
- also defined in re
- A pattern object
- Whenever I use regular expressions in Python ...
- I do not use the search function
		
- Instead I create a pattern object from a regular expression
- To do this, I use the compile 
			function ...
		
- which is also contained in the re module
- I use it in an assignment statement like this
			
>>> pattern = re.compile("man")
- This pattern object has a search
			method ...
		
- which I can use to find a match
			
>>> match = pattern.search("A man, a plan, a canal. Panama")
>>> if match:
...     print("Found match")
... else:
...     print("No match found")
... 
Found match
- You won't need to remember this for quizzes or exams
- I am showing you this so you can understand what I am doing ...
- in the code below
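A pattern object can be reused on as many lines as you like, and the match object it returns can tell you what was matched. The sketch below uses the match object methods group() and span(), which are not discussed in these notes, so treat it as optional background.

```python
import re

pattern = re.compile("man")

match = pattern.search("A man, a plan, a canal. Panama")
if match:
    matched_text = match.group()   # the characters that matched
    start, end = match.span()      # where the match begins and ends
    print(matched_text, start, end)

# The same pattern object can be used again on another line
no_match = pattern.search("xxxxxxxx")
print(no_match)
```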
A Test Function for Regular Expressions
	
		- To experiment with regular expressions we need a test function
- This function will have two parameters
			
				- A regular expression string
- A line to be matched
 
- The pattern string will be turned into a pattern object ...
- and the search method on this object ...
- will look for the match
- Here is the code
			
import re

def regex_test(regular_expression, line):
    # Turn the regular expression string into a pattern object
    pattern_object = re.compile(regular_expression)
    # search returns a match object, or None if nothing matches
    match_object = pattern_object.search(line)
    if match_object:
        print("Regular expression:", regular_expression)
        print("Matches:", line)
    else:
        print("Regular expression:", regular_expression)
        print("Does NOT match", line)
- Here it is in operation
			
>>> regex_test("man", "A man, a plan, a canal, Panama")
Regular expression: man
Matches: A man, a plan, a canal, Panama
>>> regex_test("xxx", "A man, a plan, a canal, Panama")
Regular expression: xxx
Does NOT match A man, a plan, a canal, Panama
		- . matches any single character ...
- except 
			newline
		
- It works the same way as the ? meta-character
			on the Unix command line
		
- Here is an example
			
>>> regex_test("th.n", "And then I went home")
Regular expression: th.n
Matches: And then I went home
>>> regex_test("th.n", "I am better than you")
Regular expression: th.n
Matches: I am better than you
>>> regex_test("th.n", "I wish I were thinner")
Regular expression: th.n
Matches: I wish I were thinner
- . only matches a single character
- So you must use one . for every character 
			you are trying to match
			
>>> regex_test("t..n", "And then I went home")
Regular expression: t..n
Matches: And then I went home            
>>> regex_test("t..n", "Is there a taint of scandal?")
Regular expression: t..n
Matches: Is there a taint of scandal?
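It can help to see exactly which characters a pattern matched, not just whether the line matched. The sketch below assumes a helper I have named matched_text, which is not part of the re module. It relies on the match object's group() method.

```python
import re

def matched_text(regular_expression, line):
    """Return the characters that matched, or None when there is no match."""
    match_object = re.search(regular_expression, line)
    if match_object:
        return match_object.group()
    return None

print(matched_text("th.n", "And then I went home"))
print(matched_text("t..n", "Is there a taint of scandal?"))
# . must match exactly one character, so there is nothing to find here
print(matched_text("th.n", "thn"))
```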
		- * matches zero or more occurrences
			of the previous character
		
- * in regular expressions ...
- is similar to the same character on the Unix command line ...
- but there is an important difference
- * in Unix matches 0 or more occurrences of 
			any character ...
		
- but the regular expression * will only 
			match ...
		
- the character that comes before it
- So the * in regular expressions is 
			more selective ...
		
- than the * in Unix
- This makes it more powerful
- It will match multiple instances of the character that comes before it
			
>>> regex_test("t*n", "1234 tttttn abcd")
Regular expression: t*n
Matches: 1234 tttttn abcd 
- But it will also match no instances of the character that comes before it
			
>>> regex_test("t*n", "1234 n abcd")
Regular expression: t*n
Matches: 1234 n abcd
- Notice there is no "t" in the line ...
- but it still matches the line
- You can get the same effect as * in Unix ...
- but you must use .* in regular expressions to do this
			
>>> regex_test("t.*n", "abcd tan efg")
Regular expression: t.*n
Matches: abcd tan efg
>>> regex_test("t.*n", "xx the zzn")
Regular expression: t.*n
Matches: xx the zzn
>>> regex_test("t.*n", "123 train 456")
Regular expression: t.*n
Matches: 123 train 456
>>> regex_test("t.*n", "---think---")
Regular expression: t.*n
Matches: ---think---
- So * means one thing in Unix ...
- and another thing in regular expressions
- This is one of the reasons it takes time to get used to regular expressions
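The difference between * and .* is easier to see when you look at the characters that were actually matched. This is a sketch that assumes a helper I have named matched_text, which is not part of the re module.

```python
import re

def matched_text(regular_expression, line):
    """Return the characters that matched, or None when there is no match."""
    match_object = re.search(regular_expression, line)
    if match_object:
        return match_object.group()
    return None

# t* only repeats the t that comes before the *
print(matched_text("t*n", "1234 tttttn abcd"))
print(matched_text("t*n", "1234 n abcd"))

# .* repeats the . which stands for any character
print(matched_text("t.*n", "123 train 456"))
print(matched_text("t.*n", "xx the zzn"))
```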
		- The + meta-character is like 
			*
		
- Because it is used to indicate repetition of the previous character
- * matches zero or more occurrences ...
- but + matches one or more occurrences
			
>>> regex_test("ab+c", "xxx  abccccc  yyy")
Regular expression: ab+c
Matches: xxx  abccccc  yyy
>>> regex_test("ab+c", "xxx abbbbbccccc zzz")
Regular expression: ab+c
Matches: xxx abbbbbccccc zzz
- It will not match zero occurrences of the character it follows
			
>>> regex_test("ab+c", "xxx  accccc zzz")
Regular expression: ab+c
Does NOT match xxx  accccc zzz
- Unlike the * meta-character
			
>>> regex_test("ab*c", "xxx accccc zzz")
Regular expression: ab*c
Matches: xxx accccc zzz
		- ? is also a repetition meta-character
- It means zero or one occurrences of the previous character
- In other words, it means the previous character is optional
			
>>> regex_test("ab?c", "qqq abc jjj")
Regular expression: ab?c
Matches: qqq abc jjj
>>> regex_test("ab?c", "123 ac 456")
Regular expression: ab?c
Matches: 123 ac 456
>>> regex_test("ab?c", "786 abbc vvv")
Regular expression: ab?c
Does NOT match 786 abbc vvv
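The three repetition meta-characters can be compared side by side by running each pattern against the same set of strings. This is a minimal sketch. The variable names are my own.

```python
import re

patterns = ["ab*c", "ab+c", "ab?c"]
lines = ["ac", "abc", "abbc"]

# Record whether each pattern matches each line
results = {}
for regular_expression in patterns:
    for line in lines:
        found = re.search(regular_expression, line) is not None
        results[(regular_expression, line)] = found

for regular_expression in patterns:
    matched = [line for line in lines if results[(regular_expression, line)]]
    print(regular_expression, "matches", matched)
```

Only * accepts all three strings, + rejects the string with no "b", and ? rejects the string with two.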
		- The backslash, \ , is a meta-character
- It turns off the special meaning of the character that immediately 
			follows it
		
- It performs the same function as the backslash on the Unix command line
- To search for a meta-character, put a \ in front of 
			it
			
>>> regex_test("a\+b", "345 a+bcde")
Regular expression: a\+b
Matches: 345 a+bcde
- If you don't turn off the meta-character you won't get a match
			
>>> regex_test("a+b", "906 a+bcde")
Regular expression: a+b
Does NOT match 906 a+bcde
- To match more than one meta-character
- Put \ in front of each
			
>>> regex_test("a\+\+\+b", "567  a+++bcde")
Regular expression: a\+\+\+b
Matches: 567  a+++bcde 
- The \ is also used in character classes
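Two Python features make backslashes easier to work with, though neither is required for this course. A raw string, written with an r in front of the quotes, tells Python to leave the backslashes alone, and recent versions of Python warn about a pattern like "a\+b" when the r is missing. The re module's escape function adds the backslashes for you. The sketch below shows both.

```python
import re

line = "567  a+++bcde"

# A raw string keeps the backslashes exactly as typed
escaped_by_hand = r"a\+\+\+b"

# re.escape() puts a backslash in front of characters
# that have a special meaning in a regular expression
escaped_by_python = re.escape("a+++b")

found_by_hand = re.search(escaped_by_hand, line) is not None
found_by_python = re.search(escaped_by_python, line) is not None

# Without the backslash, + means "one or more a" so there is no match
found_unescaped = re.search("a+b", "906 a+bcde") is not None

print(escaped_by_python, found_by_hand, found_by_python, found_unescaped)
```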
Character Classes
	
		- A character class is a set of characters
- Character classes match a single occurrence of a character
			within the set
		
- There are 6 character classes built into regular expressions
- Their names all have the same format
- A \ in front of a single letter
- If the letter following \ is 
			lower case ...
		
- it will match a single character in the set
- But if it is upper case ...
- it matches a single character not in the set
\d and \D Character Classes
	
		- \d matches a single digit
			
>>> regex_test("\d", "1234")
Regular expression: \d
Matches: 1234
- \d can be used with a repetition meta-character
- To match many occurrences of a digit
			
>>> regex_test("\d*a", "1234abc")
Regular expression: \d*a
Matches: 1234abc
- \D matches any single character that is not a digit
			
>>> regex_test("\D", "1a234")
Regular expression: \D
Matches: 1a234
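\d becomes useful when you combine it with ordinary characters. Here is a sketch that pulls the date out of a line from the Red Sox file. It assumes a helper I have named matched_text, which is not part of the re module, and it uses raw strings so Python leaves the backslashes alone.

```python
import re

def matched_text(regular_expression, line):
    """Return the characters that matched, or None when there is no match."""
    match_object = re.search(regular_expression, line)
    if match_object:
        return match_object.group()
    return None

line = "2011-07-04      Red Sox vs Blue Jays    Loss 7-9"

# Four digits, a dash, two digits, a dash, two digits
print(matched_text(r"\d\d\d\d-\d\d-\d\d", line))

# \D+ matches a run of characters that are not digits
print(matched_text(r"\D+", "1a234"))

# No digits at all means no match
print(matched_text(r"\d+", "no digits here"))
```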
The \w and \W Character Classes
	
		- \w matches any single alphanumeric character 
			...
		
- and the underscore, _
		
- The alphanumeric characters are the letters and the digits
			
>>> regex_test("\w","---a------------")
Regular expression: \w
Matches: ---a------------
>>> regex_test("\w+","---1234abc------")
Regular expression: \w+
Matches: ---1234abc------
- \W matches any single character that is 
			not a letter ...
		
- or an underscore _ ...
- or a digit
			
>>> regex_test("\W+","###" )
Regular expression: \W+
Matches: ###
- Why does \w match the underscore, _?
- Perhaps because it is a legal first character ...
- in variable and function names
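Because \w includes the underscore, \w+ will pick up a whole variable name in one piece. The sketch below shows this. The helper matched_text and the sample line are my own inventions.

```python
import re

def matched_text(regular_expression, line):
    """Return the characters that matched, or None when there is no match."""
    match_object = re.search(regular_expression, line)
    if match_object:
        return match_object.group()
    return None

# The underscore does not end the match
print(matched_text(r"\w+", "total_count = 5"))

# The dashes are skipped because they are not in the set
print(matched_text(r"\w+", "---1234abc------"))

# \W+ matches a run of characters that are not in the set
print(matched_text(r"\W+", "abc###def"))
```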
The \s and \S Character Classes
	
		- \s matches any
			whitespace character
			
>>> regex_test("a\sb", "----a b----")
Regular expression: a\sb
Matches: ----a b----
- \S matches any character that is not whitespace
			
>>> regex_test("\S+", "abcd")
Regular expression: \S+
Matches: abcd
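Whitespace is more than just the space character. The sketch below shows \s matching a tab as well as a space. It assumes a helper I have named matched_text, which is not part of the re module.

```python
import re

def matched_text(regular_expression, line):
    """Return the characters that matched, or None when there is no match."""
    match_object = re.search(regular_expression, line)
    if match_object:
        return match_object.group()
    return None

# \s matches a space
print(matched_text(r"a\sb", "----a b----"))

# \s also matches a tab, written \t in a Python string
tab_match = matched_text(r"a\sb", "a\tb")
print(tab_match == "a\tb")

# \S+ matches the first run of characters that are not whitespace
print(matched_text(r"\S+", "  abcd  efg"))

# With no whitespace between a and b there is no match
print(matched_text(r"a\sb", "ab"))
```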
Class Exercise
	
	Class Quiz