CIS 191: Linux and Unix
Class 4
October 7th, 2015
Next week
• Lecture on Makefiles
• Xiuruo out of office (OOO)
Running at
• In Ubuntu, you’ll probably need to install at
– sudo apt-get install at
– It should just work after this…
• In OSX, at relies on the atrun daemon to manage its jobs
– See man atrun
“The atrun utility runs commands queued by at(1). It is invoked
periodically by launchd(8) as specified in the
com.apple.atrun.plist property list. By default the property list
contains the Disabled key set to true, so atrun is never invoked.
Execute the following command as root to enable atrun:
launchctl load -w /System/Library/LaunchDaemons/com.apple.atrun.plist”
Outline
Language Theory Overview
Grep Regular Expressions
Examples of Grep Regular Expressions
Sed
Languages
• A set of strings of symbols
• These symbols form an “alphabet”
• The language is “decided” by some process which
decides if a string is in the language or not
Regular Languages
• A regular language is a set that can be decided by
viewing a single character at a time, using a fixed amount
of memory!
– Specifically, regular languages are languages that can be decided
by a DFA (deterministic finite automaton); you’ll learn more
about this in CIS 262 if you haven’t taken it already.
• It doesn’t matter how long the string is!
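To make the “fixed amount of memory” idea concrete, here is a minimal bash sketch of a hand-written DFA. The function name ends_in_b is hypothetical. The DFA accepts strings over {a, b} that end in ‘b’. It reads one character at a time and remembers only its current state, however long the input is.

```shell
#!/usr/bin/env bash
# Hypothetical helper: a two-state DFA over the alphabet {a, b}.
# State 0 = "last character was not b", state 1 = "last character was b".
# Succeeds (returns 0) only for nonempty strings of a's and b's ending in 'b'.
ends_in_b() {
  local input=$1 state=0 i ch
  for (( i = 0; i < ${#input}; i++ )); do
    ch=${input:i:1}          # read exactly one character
    case $ch in
      a) state=0 ;;          # an 'a' moves to the non-accepting state
      b) state=1 ;;          # a 'b' moves to the accepting state
      *) return 1 ;;         # symbol outside the alphabet: reject
    esac
  done
  [ "$state" -eq 1 ]         # accept iff we finished in state 1
}

for s in ab aab ba abab ''; do
  if ends_in_b "$s"; then echo "'$s' accepted"; else echo "'$s' rejected"; fi
done
```

The only memory used is the single variable state, so the amount of storage is the same for a 3-character string and a 3-million-character string.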
Regular Expressions
• A regular expression exactly describes a regular language
– That is, every regular language can be described by some
regular expression
– And every regular expression describes a regular language
Regular Expressions Illustrated
• Suppose A and B are regular languages.
Regular Extensions
• A few extensions to classical regular expressions that stay
within regular languages
– If A is an RE, then A+ matches one or more copies of A
– If A is an RE, then A? matches zero or one copy of A
Core regex in one page
• ABC
– A, then B, then C, in sequence: exactly one copy of each
• A|B
– A or B
• *
– >= 0 copies
• +
– >= 1 copies
• ?
– 0 or 1 copies
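The operators above can be tried out directly with grep. This sketch assumes GNU grep. The -x flag makes the regex match the whole line, and -E selects extended syntax so that +, ?, and | are operators.

```shell
#!/usr/bin/env bash
# $words is left unquoted on purpose so printf prints one word per line.
words='ac abc abbc'
star=$(printf '%s\n' $words | grep -xE 'ab*c')         # * : zero or more b's
plus=$(printf '%s\n' $words | grep -xE 'ab+c')         # + : one or more b's
opt=$(printf '%s\n' $words | grep -xE 'ab?c')          # ? : zero or one b
alt=$(printf '%s\n' cat dog cow | grep -xE 'cat|dog')  # | : either alternative

# Unquoted expansions join each result onto one line for display.
echo 'ab*c    :' $star
echo 'ab+c    :' $plus
echo 'ab?c    :' $opt
echo 'cat|dog :' $alt
```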
Truly Regular Expressions
• abc matches only the string “abc”
• (ab)* matches the empty string “”, “ab”, “abab”, …
• (a|b)+ matches any nonempty string made up only of
‘a’s and ‘b’s
• (a*b)+ matches any string that has any number of ‘a’s
followed by a single ‘b’, at least once
– In other words, any string of ‘a’s and ‘b’s which ends in a ‘b’.
• a(b|c)*a matches any string which starts and ends with
an ‘a’ and has only ‘b’s and ‘c’s in between.
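These examples describe whole strings, while grep normally looks for a match anywhere in a line. The -x flag restricts grep to whole-line matches. The following sketch assumes GNU grep, and the helper name full_matches is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each remaining argument that the extended
# regex in $1 matches in full (-x anchors the match to the whole line).
full_matches() {
  local re=$1
  shift
  printf '%s\n' "$@" | grep -xE "$re"
}

full_matches '(ab)*'    '' ab abab aba   # '' stands for the empty string
full_matches '(a|b)+'   ab ba aab abc
full_matches '(a*b)+'   b aab abab aba
full_matches 'a(b|c)*a' aa abca abda
```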
More Regular Expression Extensions
• There are a number of extensions that allow for more
concise representation
– . (dot) matches any single character (any character at all)
– [cde] matches any single character listed between the square
brackets (here: c, d, or e)
– [h-l] matches any single character in the range h through l
• To match any character not in the list, place a caret (^) first inside
the brackets.
– [^0-9] matches anything that is not a digit.
– If A is an RE, then A{n,m} matches anywhere between n and m
copies of A, inclusive.
– A{n} matches exactly n copies of A.
• On this slide, ., [, ], {, and } are metacharacters.
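Here is a small sketch of bracket expressions and intervals, assuming GNU grep. The -o flag prints only the matched text. The sample string is made up for illustration.

```shell
#!/usr/bin/env bash
text='call 555-0123 or x9'
phone=$(printf '%s\n' "$text" | grep -oE '[0-9]{3}-[0-9]{4}')  # {n}: exactly n digits
words=$(printf '%s\n' "$text" | grep -oE '[^0-9 -]+')          # ^ negates: no digits, spaces, or hyphens
pair=$(printf '%s\n' "$text" | grep -oE '[h-l]{1,2}')          # 1 to 2 characters from the range h-l

echo "phone: $phone"
echo "words:" $words
echo "pair:  $pair"
```

A hyphen placed last inside the brackets, as in [^0-9 -], is treated as a literal character rather than a range.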
Metacharacters
• Several predefined shortcuts (character classes) are
provided.
– [[:space:]], or ‘\s’, matches any whitespace character.
– [[:alnum:]] matches any letter or digit; ‘\w’ matches any
“word” character
• By which we mean letters and digits; most implementations
(including GNU grep) also count the underscore (_) in \w
– [[:digit:]] matches any digit (0-9); the Perl-style shorthand
‘\d’ is not supported by grep -E or sed -r
– ^ matches “beginning-of-line”
– $ matches “end-of-line”
– \< and \> match word boundaries
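Here is a sketch of the anchors and word boundaries. It assumes GNU grep, where \< and \> are supported. The three sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
lines=$'The cat sat\nconcatenate\nthe end'
starts=$(printf '%s\n' "$lines" | grep -E '^The')         # ^ anchors to the start of the line
ends=$(printf '%s\n' "$lines" | grep -E 'end$')           # $ anchors to the end of the line
word=$(printf '%s\n' "$lines" | grep -E '\<cat\>')        # whole word only: "concatenate" is skipped
spaced=$(printf '%s\n' "$lines" | grep -cE '[[:space:]]') # -c counts lines containing whitespace

printf '%s\n' "$starts" "$ends" "$word" "$spaced"
```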
Metacharacters
• \\ matches backslash (\)
– Since \ is normally used to specify other metacharacters
• \* matches an asterisk
– Since * normally means “zero or more copies” of what precedes it
• \. matches a dot
• Metacharacters need to be preceded by a backslash in
order to match the literal character
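This sketch assumes GNU grep and shows the difference escaping makes. The patterns are single-quoted so that Bash passes the backslashes through to grep unchanged.

```shell
#!/usr/bin/env bash
text='a.b axb a*b'
any=$(printf '%s\n' "$text" | grep -oE 'a.b')     # unescaped . matches any character
dot=$(printf '%s\n' "$text" | grep -oE 'a\.b')    # \. matches only a literal dot
star=$(printf '%s\n' "$text" | grep -oE 'a\*b')   # \* matches only a literal asterisk
slash=$(printf '%s\n' 'C:\dir' | grep -oE '\\')   # \\ matches one literal backslash

echo "any:" $any
echo "dot:   $dot"
echo "star:  $star"
echo "slash: $slash"
```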
“Regular” Expressions: a Misnomer
• Just about any name but “regular” would have been
better!
– Many extensions describe non-regular languages
– The syntax and behavior are different for just about every system
involving regular expressions!
– What needs escaping changes based on implementation
• In fact, Vim has four different settings for this.
– See “:help magic”
– The way we describe or apply regular expressions and gather
the matches differs across settings.
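grep itself shows how “what needs escaping” changes from one setting to another. This sketch assumes GNU grep. By default grep uses basic regular expressions (BRE), where + is an ordinary character. With -E it uses extended regular expressions (ERE), where + means “one or more”.

```shell
#!/usr/bin/env bash
bre=$(printf '%s\n' aaa 'a+' | grep -x 'a+')    # BRE: + is a literal character
ere=$(printf '%s\n' aaa 'a+' | grep -xE 'a+')   # ERE: + means "one or more a's"

echo "grep  'a+' matched: $bre"
echo "grep -E 'a+' matched: $ere"
```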
New Skill
xkcd.com/208
Our focus: grep and sed
• As we’ve discussed, grep applies a regular expression to
each line of its input file or files
• sed is a stream editor
– More on this soon…
Outline
Language Theory Overview
Grep Regular Expressions
Examples of Grep Regular Expressions
Sed
Motivating Examples
• We’re usually searching for a particular kind of text
– An integer, maybe with a minus sign in front
– A decimal number (for example 2.718)
– A first name followed by a last name
• Or maybe Last, First
– An email address
– Sentences beginning with the word “The”, ending with
punctuation.
– A phone number
– Prime numbers
• This really does exist, but it relies on backreferences and is rather
inefficient…
Integers and Decimals
• Integers start with an optional -, followed by one or more
digits. The perfect regular expression is therefore…
– -?[[:digit:]]+
– -?\d+ (Perl-style \d; not supported by grep -E)
• How about decimals? First, we need a characterization.
– There is an optional minus sign, then an optional string of digits,
followed by a decimal point (.), then one or more digits.
– -?[[:digit:]]*\.[[:digit:]]+
– -?\d*\.\d+ (Perl-style \d; not supported by grep -E)
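Both POSIX-class patterns can be run on a sample line. This sketch assumes GNU grep, and the sample text is made up. The -e flag is needed because each pattern begins with -, which grep would otherwise read as an option.

```shell
#!/usr/bin/env bash
text='x = -42, pi = 3.14, e = .5'
ints=$(printf '%s\n' "$text" | grep -oE -e '-?[[:digit:]]+')
decimals=$(printf '%s\n' "$text" | grep -oE -e '-?[[:digit:]]*\.[[:digit:]]+')

echo "integers:" $ints
echo "decimals:" $decimals
```

The integer pattern also splits 3.14 into 3 and 14, which is why decimals need their own pattern.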
Names
• Let’s begin with a characterization of First Name Last
Name format.
– A capital letter, followed by any number of letters, then a space,
then another capital followed by any number of letters
• Now, let’s come up with the regular expression
– [A-Z]\w*\s[A-Z]\w*
• Do you see any potential issues with this approach?
– What about hyphenated names? Multiple names? Middle
initials? Middle names written out?
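One of these issues can be reproduced with the pattern above. This sketch assumes GNU grep, which accepts \w and \s in extended mode. The sample sentence is made up.

```shell
#!/usr/bin/env bash
text='Ada Lovelace met Mary-Jane Watson'
found=$(printf '%s\n' "$text" | grep -oE '[A-Z]\w*\s[A-Z]\w*')
printf '%s\n' "$found"
```

The hyphen is not a word character, so “Mary-” never matches. The only match for the second name is “Jane Watson”, not the full name.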
Aside: Solve the Problem You Want to
• Many regular expressions will match the target
– But some are easier to construct (and to understand) than
others.
• If you know a little more about the text you will be
handling, you can sometimes take shortcuts
– This will become more apparent when we get to replacing
(rather than just matching) text.
• Modifying the problem is a major theme throughout
computer science, and in this course as well!
Aside #2: Evil Regular Expressions!!!
• There are two main kinds of RE engines.
– NFA (Nondeterministic Finite Automaton) engines step through
the regex and may backtrack on the input text
– DFA (Deterministic Finite Automaton) engines always move
forward in the string character by character
– Nonbacktracking NFA engines do exist…
– See http://swtch.com/~rsc/regexp/regexp1.html for more
details on the differences.
• For backtracking engines, the runtime can increase drastically with:
– Repetitions of overlapping alternations
– Repetitions within repetitions
– Repetitions containing both wildcards and normal characters
Aside #2: Some evil examples
• Can you figure out why these might be “evil”?
– (x*)*
– (x.)*
– (x|xx)*
– (x|x?)*
– The prime number checker we mentioned earlier
• Think about how they behave on the string
– xxxxxxxxxxxxxxxxy
• For a backtracking engine, matching (x*)* is exponential: each ‘x’
can be consumed either by the inner x* or by a new repetition of the
outer (x*)*, so every additional ‘x’ doubles the number of potential
matching paths!
ReDoS
• Regular expression denial of service
• Uses an evil regex to attack a service that accepts arbitrary
regexes
• https://en.wikipedia.org/wiki/ReDoS
Outline
Language Theory Overview
Grep Regular Expressions
Examples of Grep Regular Expressions
Sed
grep with extended regex
• Generally, we want to use extended regular expressions
(as we discussed earlier)
– So when you call grep, call it with the -E flag
ps aux
• Lists all processes
• You can look up a particular process using grep…
ps aux
$ ps aux | grep yes | less
ps aux with word boundaries
$ ps aux | grep -w yes | less
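The output of ps differs from machine to machine. For a reproducible result, this sketch feeds grep a few fixed lines instead, assuming GNU grep. With -w, a match counts only when it forms a whole word.

```shell
#!/usr/bin/env bash
loose=$(printf '%s\n' yes yesterday eyes | grep yes)     # substring match: all three lines
strict=$(printf '%s\n' yes yesterday eyes | grep -w yes) # whole-word match: only "yes"

echo "grep yes:   " $loose
echo "grep -w yes:" $strict
```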
C identifiers
• Suppose we want to find all uses of the function strfry
in the directory chef
• We can use Bash expansions and grep together!
$ grep -E strfry *.c
chef.c: strfry(p_str);
chef.c: cond ? strfry(uuname) : uuname
recipes.c: is_strfry_ingredient(p_src)
C Identifiers
• But grep included results that we didn’t want, such as
is_strfry_ingredient
• What can we do?
– Include word boundaries!
$ grep -E '\<strfry\>' *.c
chef.c: strfry(p_str);
chef.c: cond ? strfry(uuname) : uuname
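The word-boundary search can be reproduced in a scratch directory. This sketch assumes GNU grep, and the file contents are made up to mirror the slide. It also shows why the pattern must be quoted. Unquoted, Bash strips the backslashes, so grep receives <strfry> and finds nothing.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)   # scratch directory standing in for the chef directory
printf '%s\n' 'strfry(p_str);' 'is_strfry_ingredient(p_src)' > "$dir/chef.c"

# Each command substitution runs in a subshell, so the cd does not leak out.
loose=$(cd "$dir" && grep -E 'strfry' *.c)           # substring match: both lines
strict=$(cd "$dir" && grep -E '\<strfry\>' *.c)      # quoted: grep sees \< and \>
unquoted=$(cd "$dir" && grep -E \<strfry\> *.c)      # unquoted: grep sees <strfry>, no match
rm -r "$dir"

printf 'loose:\n%s\nstrict:\n%s\nunquoted: [%s]\n' "$loose" "$strict" "$unquoted"
```

The underscore counts as a word character, so the “strfry” inside is_strfry_ingredient has no word boundary before it.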
Grepping for Hardware…
• Another common scenario: attempting to find a
particular piece of hardware
• The lspci command will spit out a list of available PCI
(Peripheral Component Interconnect) devices
$ lspci | grep -i Network
Ethernet controller: Intel 82566MM Gigabit
Network controller: Intel PRO/Wireless
Grepping for Hardware
• Which kernel modules are related?
$ lsmod | grep -i iwl
iwl4965      202721  0
iwl_legacy   146875  1 iwl4965
mac80211     267163  2 iwl4965,iwl_legacy
cfg80211     170485  3 iwl4965,iwl_legacy,mac80211
Display only the matching text
• Generally, when grep finds a match, it will display the
entire line
• Most of the time this is what you want!
• But when you are trying to extract a match from the text
– Like when you are looking for an address or a phone number…
• You may want to only display the match.
• You can do this with the -o option
– grep -oE 'regular expression' file_list
– displays each match on its own line
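Here is a sketch comparing default output with -o, assuming GNU grep. The phone numbers in the sample line are made up.

```shell
#!/usr/bin/env bash
text='Call 215-555-0123 or 610-555-0199 before noon.'
pattern='[0-9]{3}-[0-9]{3}-[0-9]{4}'
line=$(printf '%s\n' "$text" | grep -E "$pattern")   # default: the whole matching line
only=$(printf '%s\n' "$text" | grep -oE "$pattern")  # -o: each match on its own line

printf 'without -o:\n%s\nwith -o:\n%s\n' "$line" "$only"
```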
Greedy Matching
• Let’s write a regular expression to match all instances of
HTML tags of the form <p>, <em>, <title>…
– <.*>
• What if we run this on
– <strong>Hi! I’m an example!</strong>
• We get a single match, the entire line:
– <strong>Hi! I’m an example!</strong>
What went wrong?
• grep matches expressions greedily.
• This means that it will match as much as it can: if the match
can be extended further along the line, it will be, even if a
shorter match has already been found!
• While some syntaxes (such as Perl’s) allow for lazy matching,
grep’s extended regex syntax does not!
• You can use Perl syntax with grep -P, but we are not
allowing that for assignments in this class.
A right answer (without greed)
• <strong>Hi! I’m an example!</strong>
• What if we try the following expression:
– <[^>]*>
• We’ll match a <, then every character that is not a closing
angle bracket (>), followed by a closing angle bracket.
• Hallelujah! Success! We get
– <strong>
– </strong>
• Just as we expected.
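Both patterns can be run on a line like the slide’s. This sketch assumes GNU grep. The apostrophe is dropped from the sample text to keep the shell quoting simple.

```shell
#!/usr/bin/env bash
html='<strong>Hi! I am an example!</strong>'
greedy=$(printf '%s\n' "$html" | grep -oE '<.*>')   # .* runs on to the last > on the line
tags=$(printf '%s\n' "$html" | grep -oE '<[^>]*>')  # [^>]* cannot cross a >, so each tag stops early

printf 'greedy:\n%s\nnegated class:\n%s\n' "$greedy" "$tags"
```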
Outline
Scheduled Jobs
Language Theory Overview
Grep Regular Expressions
Examples of Grep Regular Expressions
Sed
Sed Introduction
• The man page for sed describes it as “a stream editor for
filtering and transforming text.”
• You should always run sed with the -r option, which
allows for extended regular expressions (GNU sed also accepts -E,
which is the flag BSD and OS X sed use)
– Noticing a pattern here?
• You also always want to give sed its regular expressions
in single quotes, which tells Bash not to expand dollar
signs, asterisks, question marks, and so on
Sed Syntax
• sed substitution commands take the form
– s/regex/replacement/flags
• The g flag tells sed not to stop after the first replacement
on each line
– Think “globally”
• Patterns can be captured in parentheses, and used in the
replacement with backreferences
– Sort of like storing matched information in variables…
– Tell sed to store this information using extra parentheses in your
expression. Refer to them later with \1 for first group, \2 for
second group…
Regular Expression Parenthesis Groups
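• Groups are numbered by the position of their opening parenthesis,
from left to right, so an outer group comes before the groups nested
inside it.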
• Recall the Name example from earlier
– [A-Z]\w*\s[A-Z]\w*
• If we rewrite the expression as
– (([A-Z]\w*)\s([A-Z]\w*))
• Group “1” matches the full name
• Group “2” matches the first name
• Group “3” matches the last name
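The group numbering can be checked by printing all three backreferences. This sketch assumes GNU sed, which accepts \w and \s with -r. The sample name is made up.

```shell
#!/usr/bin/env bash
groups=$(printf '%s\n' 'Ada Lovelace' |
  sed -r 's/(([A-Z]\w*)\s([A-Z]\w*))/1=[\1] 2=[\2] 3=[\3]/')
echo "$groups"
```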
Sed Examples
$ echo "hello" | sed -r 's/lo/p/'
help
$ echo "Here is a sentence" | sed -r 's/is/was/'
Here was a sentence
$ echo "This is a sentence" | sed -r 's/is/is not/'
This not is a sentence
$ echo "This is a sentence" | sed -r 's/is/XXX/'
ThXXX is a sentence
$ echo "This is a sentence" | sed -r 's/is/is not/g'
This not is not a sentence
$ echo "This is a sentence" | sed -r 's/\<is\>/is not/g'
This is not a sentence
Another Sed example
• Consider translating a list of phone numbers from
(xxx)-xxx-xxxx to xxx-xxx-xxxx
• We need to replace the parenthesized part of the numbers
with its contents…
• sed -r 's/\(([0-9]{3})\)/\1/' numbers
– The extra (unescaped) parentheses tell sed to store the matched number
– \1 inserts the matched text as a backreference
• But there’s a simpler solution… Remove the parentheses!
– sed -r 's/[()]//g' numbers
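Here is a sketch of both approaches applied to two made-up numbers, assuming GNU sed. The heredoc-free printf stands in for a numbers file. A third variant drops the g flag: without it, sed removes only the first parenthesis on each line.

```shell
#!/usr/bin/env bash
numbers=$'(215)-555-0123\n(610)-555-0199'
captured=$(printf '%s\n' "$numbers" | sed -r 's/\(([0-9]{3})\)/\1/')  # keep the area code via \1
stripped=$(printf '%s\n' "$numbers" | sed -r 's/[()]//g')             # delete every parenthesis
first_only=$(printf '%s\n' "$numbers" | sed -r 's/[()]//')            # no g: only the first one

printf '%s\n---\n%s\n---\n%s\n' "$captured" "$stripped" "$first_only"
```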
Another Example
• Consider changing a list of names from (Last, First) to
(First, Last)
• As usual, we need to characterize the input first
– A capital letter, followed by any number of letters, then a
comma and a space; finally, one more capital letter and any
number of other letters.
• And the sed expression?
– sed -r 's/([A-Z]\w*),\s([A-Z]\w*)/\2, \1/g'
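Here is the swap applied to two made-up names, assuming GNU sed for \w and \s.

```shell
#!/usr/bin/env bash
swapped=$(printf '%s\n' 'Lovelace, Ada' 'Hopper, Grace' |
  sed -r 's/([A-Z]\w*),\s([A-Z]\w*)/\2, \1/g')   # \2 = first name, \1 = last name
printf '%s\n' "$swapped"
```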