Yes, it is time for the obligatory ``hello world'' program:
The point of a ``hello world'' program is not what the program does, but how you get it to execute. This really depends on the system you are using, but for this class, I will assume it is a Unix system. In fact, I will assume you are using one of the RCI or ICI machines.
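As a minimal sketch of the kind of file this walkthrough produces, assuming the perl interpreter lives at /usr/bin/perl (an assumption: the path differs between systems, and `which perl` will report the local one):

```perl
#!/usr/bin/perl
# Sketch only. Save this text in a file named "hello", make it
# executable with:   chmod u+x hello
# and run it with:   hello    (or ./hello if "." is not in your PATH)
use strict;
use warnings;

my $message = "Hello, world!\n";   # the text to display
print $message;                    # print writes to STDOUT by default
```

The first line is the hash-bang line discussed below; everything after it is the perl program proper.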
Your program should be ready to go. Just type hello at your account prompt and it should run. I won't insult you by including the output of the program.
While there are two interesting points to make about the program itself, I first want to explain what the above sequence of actions did. (This is the part where people's eyes start glazing over. This is also the part where they shouldn't. Pay attention here and you will be able to figure out how to write and run a perl program on any machine.)
The first four steps took you through the creation of a perl program. This is nothing fancy. If you have edited files on a Unix system in the past, you didn't do anything new. However, notice that I added something to the beginning of the program. That line that starts with the tic-tac-toe thing (#) and an exclamation point (!). (It is pronounced hash-bang in Unixese.)
This new line of text is not part of the program itself. It is a command to the Unix operating system telling it to take the rest of the file and hand it to the program listed after the hash-bang. If you don't follow that, just keep in mind that it tells Unix that the rest of the file is written in perl.
The ``chmod'' thing in step five might be a little cryptic if you only use Unix to read your email. It changes the protection of the file you just created so that it can be executed by you, the user. (chmod u+x hello means that the user that created the file gets execution access to the hello file.)
So, if you ever are stuck with the task of running a perl program on a non-Unix machine, you know that you have to find a way to make the file executable and make it so the operating system knows what language the file is in.
When you executed the hello program that you wrote, the perl interpreter started (because of the first line in the file) and it was handed the rest of the file as input.
Notice I said ``perl interpreter.''
Like awk and most BASICs, and unlike a compiled language like C, perl is an interpreted language. Programs written in a compiled language are first compiled into ``machine code'' and the resulting file can be run, independently of any other program, directly by the machine's CPU. The compiler is not required to run the compiled program.
Every time an interpreted language is run, it requires the presence of another program that can read the language and make sense of it for the CPU. (Some will argue that perl compiles your program into an internal representation, but the bottom line is that your perl programs require the constant presence of another program, the interpreter, to run.)
Oh, yeah. I said there were two interesting things about the program that were worth mentioning. Here they are:
That is the last time I plan to get into that much detail. For the rest of this, I am going to assume that you have had some experience writing programs. If you have no such experience, I would recommend you start your programming experience with a different language, as perl is rather a mess as languages go.
Let's see, where to start... It is often customary to give the briefest version of a simple program to show how easy it is to program in perl. The truth is that such an example is often a lie. It is filled with shorthand that only works in limited (but useful!) situations. So, I am going to write the full-blown version of a simple program and go through the rest of the class explaining it using other examples along the way. (I expect the class to go on for the rest of an hour. If time permits, I am going to have a small project for you to do on your own. It will be a question and answer session that will make things clearer to you.)
This program will take any sentence you give it (one line at a time) and produce it with the words reversed. Try it.
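A long-form sketch of a word-reversing program along these lines. It is one plausible shape, not necessarily the listing the class used. The subroutine name reverse_words is made up for illustration, and the interpreter path is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse the order of the words in one line of text.
sub reverse_words {
    my ($line) = @_;
    chomp($line);                         # drop the trailing newline
    my @words    = split(/\s+/, $line);   # break the line at whitespace
    my @reversed = reverse(@words);       # last word first
    return join(' ', @reversed);          # glue the words back together
}

# Read one line at a time until end of input (Ctrl-D on a Unix terminal).
while (defined(my $line = <STDIN>)) {
    print STDOUT reverse_words($line), "\n";
}
```

Typing "the quick brown fox" should give back "fox brown quick the".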
Here is the same program with all of perl's shorthand:
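One way a shorthand version might look. This is a sketch of the style, not necessarily the original listing. It leans on the default variable $_ and on the <> operator, which reads from the files named on the command line or from STDIN if there are none.

```perl
#!/usr/bin/perl
# <> reads each input line into $_. split with no arguments breaks $_
# at whitespace, ignoring leading whitespace and the trailing newline.
print join(' ', reverse split), "\n" while <>;
```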
For now, don't pay too much attention to the shorthand version. Concentrate on the long version. There is quite a lot going on in it and you will have to understand that first.
Let's start with the variables and literals...
Variables in perl are much like variables in other languages. They are simply symbolic places to store information to be used throughout the program. C requires you to declare your variables to be a certain type. Integer, long, ... C needs to know it all. This is true to some extent with perl, but perl is not nearly as strict.
Perl needs to know what ``kind'' of variable you need and it figures out (for the most part) what type the variable is. It will also convert the value in the variable to whatever form it needs to be in.
The first kind of variable is a scalar. It always begins with a $. It holds single values, be they integers, floating point numbers, or strings.
Here is an example that demonstrates how to do variable assignments in perl and how type conversion happens automatically.
The first line assigns the string 'Life =' to the scalar variable $a. Next the numeric value 35 is assigned to $b. Then, $c gets the string '7' and $d gets the numeric value 42. Notice that $c had to be converted from a string to a number before the addition could be done.
Next, $e gets a string with $a and $d replaced with their current values. Notice that the number in $d needs to be converted back into a string. Finally, the result is printed to the screen.
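The steps just described can be sketched as follows. The values come from the description above; the exact text of the string built in $e is an assumption.

```perl
use strict;
use warnings;

# Variable names follow the text. ($a and $b are also used by perl's
# sort function, so other names are safer in real programs.)
my $a = 'Life =';    # a string
my $b = 35;          # a number
my $c = '7';         # a string that happens to look like a number
my $d = $b + $c;     # '7' is converted to the number 7, so $d is 42
my $e = "$a $d";     # 42 is converted back to text inside the string
print "$e\n";        # prints "Life = 42"
```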
The next kind of variable is an array. It holds several values (scalars). It begins with an @ when you are referring to the whole (or part of the whole) array and begins with a $ when you are referring to one of the scalars it contains.
Here is the same program but using arrays:
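A sketch of how the scalar example (the string 'Life =', the number 35 and the string '7') might look with a single array. The array name @life is made up for illustration.

```perl
use strict;
use warnings;

my @life;                          # @ names the whole array
$life[0] = 'Life =';               # $ because each element is a scalar
$life[1] = 35;
$life[2] = '7';
$life[3] = $life[1] + $life[2];    # 42
$life[4] = "$life[0] $life[3]";    # "Life = 42"
print "$life[4]\n";
```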
Here is another version of the same program that illustrates another way to assign scalars to an array:
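A sketch of the list-assignment form, again with the made-up array name @life: several elements are filled in one statement instead of one at a time.

```perl
use strict;
use warnings;

my @life = ('Life =', 35, '7');    # fills elements 0, 1 and 2 at once
$life[3] = $life[1] + $life[2];    # 42
print "$life[0] $life[3]\n";       # prints "Life = 42"
```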
You can think of an array as a column in a spreadsheet. The variable name is the column header, and the index in the square brackets is the row number. One difference is that the row numbers start at zero instead of one, usually. An array can hold as many values as there is memory on the computer.
You can also assign whole or partial arrays to other arrays directly:
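A small sketch of whole and partial array assignment. The array names and contents are invented for illustration.

```perl
use strict;
use warnings;

my @days    = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri');
my @copy    = @days;            # copy the whole array
my @midweek = @days[1, 2, 3];   # copy a slice: elements 1, 2 and 3
my @first2  = @days[0 .. 1];    # .. builds the index list (0, 1)
print "@midweek\n";             # prints "Tue Wed Thu"
```

Note the @ in front of the slices: a slice is several values, so it is written like an array even though it uses square brackets.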
And if you have a mind just as twisted as the implementors of the language, you would have probably guessed that this works as you might hope:
The last kind of variable I plan to cover is a hash table. It associates two scalars with each other. It is a lot like an array except you can use strings (or any other scalar) as index values. It is often used to associate strings with numbers. (Like associating a month with the numeric form of the month.)
When you are referring to the whole hash table, you put a % in front of it. When you are referring to a single scalar in the hash table, you put a $ in front of it. (Do you see the trend yet? When you need a scalar value, you had better have a $ in front of the variable, no matter what kind of variable it is.)
The part that is in the curly braces ({}) is called the ``key'' and the part being assigned (or retrieved) is the value.
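A minimal sketch using the month example mentioned above. The hash name %month_number is made up for illustration.

```perl
use strict;
use warnings;

my %month_number;                  # % names the whole hash table
$month_number{'January'}  = 1;     # key 'January', value 1
$month_number{'February'} = 2;
$month_number{'March'}    = 3;

my $n = $month_number{'March'};    # retrieve the value stored under a key
print "March is month $n\n";
```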
Here is another program that illustrates another way to assign to a hash table:
Like arrays, whole hash tables can be assigned to other hash tables. In fact, the `=>' symbol is actually an alternative comma. So, when you assign to a hash table, you are actually doing this: %hash_table = (key1, value1, key2, value2, ...);
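A sketch of list assignment to a hash table, first with => and then with plain commas. All names are illustrative.

```perl
use strict;
use warnings;

my %month_number = (
    'January'  => 1,
    'February' => 2,
    'March'    => 3,
);

# The same table built with ordinary commas: key, value, key, value, ...
my %same = ('January', 1, 'February', 2, 'March', 3);

# Whole hash tables can be copied just like arrays.
my %copy = %month_number;
print "February is month $copy{'February'}\n";
```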
You can do a couple of other things that are often useful. You can get all the keys or all the values out of the hash table.
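A sketch using perl's built-in keys and values functions. The hash contents are illustrative.

```perl
use strict;
use warnings;

my %month_number = ('January' => 1, 'February' => 2, 'March' => 3);

my @names   = keys(%month_number);     # every key, as one array
my @numbers = values(%month_number);   # every value, as one array

# The order is not the order of insertion, but keys and values
# come back in matching order as long as the hash is not changed.
print "$names[0] is month $numbers[0]\n";
```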
This is dangerous in situations where the contents of the hash table can outstrip the memory of the machine. (There are situations where a hash table can hold more information than memory can hold... but we will not be covering that in this tutorial.)
A safer thing to do is to iterate through all the keys and values in the hash, one at a time. Like this:
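A sketch of that one-pair-at-a-time approach, using perl's built-in each function. The hash contents and the counter are illustrative.

```perl
use strict;
use warnings;

my %month_number = ('January' => 1, 'February' => 2, 'March' => 3);
my $count = 0;

# each returns the next (key, value) pair, and an empty list when
# there are no more pairs, which ends the loop.
while (my ($name, $number) = each(%month_number)) {
    print "$name is month $number\n";
    $count++;
}
```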
However, I have not talked about while loops yet. That is the next topic.
If all our perl programs could do was execute from the top of the file to the end, we would be able to do very little with perl. Fortunately, we can control the flow of execution through a program.
Usually, different parts of code are executed based on the truth or falsity of some condition. So... we need to talk about conditions, what it means to be true or false. That is actually the hard part. The control structures are easy. Here is a simple condition:
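A sketch of such a condition, testing whether $value is three. The messages are made up.

```perl
use strict;
use warnings;

my $value = 3;

if ($value == 3) {
    print "The value is three.\n";
}
else {
    print "The value is not three.\n";
}
```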
The behavior of this program is rather clear. If the value of the variable $value is three, then the first print statement is executed. If not, the second gets done.
So, now we know that ``=='' means numeric equality. What are the other comparisons? Here they all are:
Comparison | Numeric | String | Return Value |
---|---|---|---|
Equal | == | eq | True if $a is equal to $b |
Not Equal | != | ne | True if $a is not equal to $b |
Less than | < | lt | True if $a is less than $b |
Greater than | > | gt | True if $a is greater than $b |
Less or Equal | <= | le | True if $a is less than or equal to $b |
Greater or Equal | >= | ge | True if $a is greater than or equal to $b |
Comparison | <=> | cmp | 0 if equal, 1 if $a greater, -1 if $b greater |
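Why there are two columns of operators is easiest to see with values that compare differently as numbers and as strings. A small sketch with made-up values:

```perl
use strict;
use warnings;

my ($x, $y) = (10, 9);

my $numeric_order = $x <=> $y;   #  1: the number 10 is greater than 9
my $string_order  = $x cmp $y;   # -1: the text "10" sorts before "9"

if ($x > $y)  { print "As numbers, $x is greater than $y.\n"; }
if ($x lt $y) { print "As strings, '$x' comes before '$y'.\n"; }
```

Both messages print, because the string comparison looks at the characters '1' and '9' first.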
You can also string a bunch of comparisons together using the logical operators. They are the same as they are in C.
$a && $b | True if $a and $b are true. |
$a || $b | True if $a or $b is true. |
! $a | True if $a is false. |
You can actually use the words and, or and not instead of the symbols. That is less common.
It is easy to say that something is ``true'' or ``false'' but what is true or false to perl? It turns out that this is a really important question. Some things return strings... others numeric values... others are just not defined. What does perl do when each of these things is the condition on which a decision is being made?
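The easiest way to think about it is that the number zero, the string '0', an empty string, and an undefined value are considered false. Everything else is true. (That includes strings such as '0.0' and '00': they are numerically zero, but they are not the string '0'.)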
Just to make sure you follow me, consider the following code. If we replace condition with things in the left column, we get the output in the right.
(Undef is a subroutine that always returns an undefined value. And, yes, it is very useful.)
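A quick way to check what perl treats as true is to run each candidate through an if. This is a sketch with a made-up list of test values. Note that the strings '0.0' and '00' count as true even though they are numerically zero; only the exact string '0' is false.

```perl
use strict;
use warnings;

my @candidates = (0, '0', '', undef, 1, -1, 'hello', '0.0', '00', ' ');
my $true_count = 0;

foreach my $value (@candidates) {
    # undef has no printable form, so show it by name
    my $shown = 'undef';
    if (defined($value)) {
        $shown = "'$value'";
    }

    if ($value) {
        print "$shown is true\n";
        $true_count++;
    }
    else {
        print "$shown is false\n";
    }
}
```

The first four values come out false and the other six true.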
Construct | What it does |
---|---|
if (condition) { statements } | If condition is true, execute the statements. |
if (condition) { statements } else { statements } | If condition is true, execute the first set of statements. If condition is false, execute the second. |
if (condition) { statements } elsif (condition) { statements } else { statements } | If the first condition is true, execute the first set of statements. If the second condition is true, execute the second set of statements. Otherwise execute the last set of statements. You can have as many elsifs as you want. (No, I did not misspell elsif.) |
statement if condition; | If condition is true, execute the statement. (Not statements.) |
while (condition) { statements } | As long as condition is true, execute the statements. (Only check condition before starting the statements each time.) There is also an until loop which is logically opposite to while. However, showing that loop at the same time tends to confuse people. |
for (initial; condition; increment) { statements } | This is really shorthand for this: initial; while (condition) { statements; increment; } As it turns out, there is no reason that you must make each part of a for loop as the example shows. They can actually be quite strange. You can even leave parts out. |
foreach $item (@list) { statements } | Execute statements once for each item in the array @list. The array can also be a literal list, such as (1, 2, 3). |
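A compact sketch that exercises these control structures on a made-up list of numbers:

```perl
use strict;
use warnings;

my @list = (1, 2, 3, 4, 5);

# foreach: once for each item in the list
my $sum = 0;
foreach my $item (@list) {
    $sum += $item;
}

# for: initialize, test, increment (@list in a comparison is its length)
my $evens = 0;
for (my $i = 0; $i < @list; $i++) {
    if ($list[$i] % 2 == 0) {
        $evens++;
    }
}

# while: the same kind of loop written out by hand
my $odds = 0;
my $j    = 0;
while ($j < @list) {
    $odds++ if $list[$j] % 2;    # single statement with a trailing if
    $j++;
}

# if / elsif / else
if ($sum > 20) {
    print "big\n";
}
elsif ($sum > 10) {
    print "medium\n";
}
else {
    print "small\n";
}
```

With this list the sum is 15, so the elsif branch runs.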
There are only two more things in that original program that I will need to cover before we can (almost) completely understand it. The first is something called file handles. File handles are the ``variables'' used to reference files, the terminal, sockets, or pipes.
There are three default file handles that are available without having to do any extra work. People who are familiar with Unix programming should recognize them: STDIN, STDOUT, and STDERR. What each of these file handles refers to depends on how you start the program. We will concern ourselves with just five situations. All the others require us to know more about perl than we do right now. (Remember, I am assuming that you are using perl on a Unix machine.)
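Situation | What's going on | Example Unix Command |
---|---|---|
Simple Command | STDIN reads from the keyboard; STDOUT and STDERR write to the terminal. | % perlprogram |
Input Redirect | STDIN reads from the file instead of the keyboard. | % perlprog < file |
Output Redirect | STDOUT writes to the file instead of the terminal. | % perlprog > file |
Input from Pipe | STDIN reads whatever otherprog writes to its STDOUT. | % otherprog \| perlprog |
Output to Pipe | STDOUT is fed to the STDIN of otherprog. | % perlprog \| otherprog |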
Any reasonable combination of these things is possible, as they would be for any program on a Unix system.
If you think about it, the two basic things you would like to be able to do to a file handle are read from it and write to it. This would enable you to interact with the user or process the contents of a file.
There are several ways to accomplish this. Here are just two:
To read from a file handle one line at a time, do something like this:
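A sketch of line-at-a-time reading. The subroutine name count_lines is made up. It accepts any file handle, and here it is handed STDIN.

```perl
use strict;
use warnings;

# Read a file handle one line at a time and count the lines.
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    # <$fh> returns the next line, or undef at end of input
    while (defined(my $line = <$fh>)) {
        chomp($line);            # remove the trailing newline
        $count++;
    }
    return $count;
}

print "Read ", count_lines(\*STDIN), " lines\n";
```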
As it turns out, STDIN is read only. (Well, you should certainly assume that it is.) So, it doesn't make sense to write to it.
To write something out of a file handle, do something like this:
As it turns out, by default, print will print to STDOUT. So, this will do the same as the above:
By the way, STDERR is where you would put the error messages your program should generate if there is a problem.
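The three forms side by side, as a small sketch with made-up message text:

```perl
use strict;
use warnings;

# Name the file handle right after print. There is no comma after it.
# print returns a true value when it succeeds.
my $ok = print STDOUT "This line goes to standard output.\n";

# With no file handle, print uses STDOUT.
print "So does this one.\n";

# Error messages go to STDERR, so they still reach the terminal
# when STDOUT has been redirected to a file or a pipe.
print STDERR "Something went wrong.\n";
```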
Now, go back and see if you can understand the original program. You should know that there are functions used that I did not cover, but their function should be obvious.
Here is the program again:
Things to notice:
Regular expressions are hard to explain to people. To encourage experimentation, I have written a regular expression simulator. It is a Java application that will take a regular expression that you write and some input text and tell you how the regular expression worked. Don't try it just yet, though. Let me try to motivate them first.
Regular expressions are used in all kinds of places, but let's look at them as they would function in conditions first. Look at the following code:
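A sketch of a plain string comparison of this kind. The variable's contents are made up.

```perl
use strict;
use warnings;

my $text = 'stop';

# eq is true only when the two strings are exactly the same
if ($text eq 'stop') {
    print "Stopping\n";
}
```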
This code fragment will print ``Stopping'' if the variable $text is equal to ``stop''. But what if $text was a complete sentence and we wanted to stop if the string ``stop'' occurred anywhere in that sentence?
A regular expression will let you specify a pattern to match against. It is much more powerful than just a simple comparison. Here is the program that does what we want:
The =~ is the perl pattern match operator. The left side is the text you are checking and the right is the regular expression that represents the pattern you are checking against.
In this case, if the text has the string ``stop'' anywhere in it (even in the middle of a word) the pattern match operator will return true and ``Stopping'' will be printed.
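A sketch of the pattern match in a condition, with made-up sample sentences. The second one shows the match succeeding in the middle of a word.

```perl
use strict;
use warnings;

my $text  = 'Please stop the car.';
my $other = 'That train is unstoppable.';

if ($text =~ /stop/) {
    print "Stopping\n";
}

# "stop" also occurs inside "unstoppable", so this matches too
if ($other =~ /stop/) {
    print "Stopping again\n";
}
```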
Patterns can get more complicated. In fact, they can get so complicated that it is nearly impossible for anyone but the programmer to understand them.
Here are some things you can stick in your patterns that match special things.
You can use `-' between letters to denote a set of letters. For example, [a-d] is the same as [abcd]. You can do the same with numbers, upper case letters and any sequence of characters in the ASCII character set.
If you put a caret (^) just after the open square bracket, it negates the meaning. For example, [^a-d] would match everything except a, b, c, or d.
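A small sketch of both kinds of set, run against a made-up list of words:

```perl
use strict;
use warnings;

my @words       = ('cab', 'dog', 'xyz');
my $with_a_to_d = 0;
my $with_other  = 0;

foreach my $word (@words) {
    if ($word =~ /[a-d]/) {      # at least one of a, b, c or d
        print "$word contains a letter from a to d\n";
        $with_a_to_d++;
    }
    if ($word =~ /[^a-d]/) {     # at least one character that is not
        print "$word contains something other than a to d\n";
        $with_other++;
    }
}
```

Here 'cab' passes only the first test, 'xyz' only the second, and 'dog' both.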
For example:
This is very useful if you are processing lines of text from a file or user input and you want to pull parts of the text out... How about pulling the subject out of a mail message:
Or shorter:
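A sketch of both forms, assuming a header line of the usual "Subject: ..." shape. The sample line and variable names are made up, and the shorter form shown is just one common idiom.

```perl
use strict;
use warnings;

my $line = "Subject: Lunch on Friday\n";

# Long form: ^ anchors the match at the start of the line, and the
# parentheses capture the rest of the line (without the newline) in $1.
my $subject = '';
if ($line =~ /^Subject: (.*)/) {
    $subject = $1;
}
print "The subject is: $subject\n";

# Shorter: in list context a match returns the captured pieces directly.
my ($short) = $line =~ /^Subject: (.*)/;
```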
There are several more, but this is confusing enough.
Besides $n, there are three other variables that have useful values after a match. $` will have the part of the string that was before the match, $' has the part that was after, and $& has the part that actually matched. For example:
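A sketch with a made-up sentence. The three variables are copied into ordinary scalars right after the match, because the next successful match overwrites them.

```perl
use strict;
use warnings;

my $text = 'Please stop the car.';
my ($before, $matched, $after) = ('', '', '');

if ($text =~ /stop/) {
    ($before, $matched, $after) = ($`, $&, $');
}

print "before: [$before]\n";    # [Please ]
print "match:  [$matched]\n";   # [stop]
print "after:  [$after]\n";     # [ the car.]
```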
Try the perl regular expression simulator now and see how regular expressions operate.
If we look back at the ``split'' that was in the original program, we can now get an understanding of what it does. Here is the line:
Split will take a regular expression and divide a string into pieces. It breaks the string at the places where the regular expression matches. In the above case, the regular expression matches on one or more occurrences of a whitespace character. It removes the whitespace characters and returns all the pieces in between as the elements of an array. So, we have something that breaks a sentence into words. (Of course, it acts funny with punctuation.)
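A sketch of split on whitespace, using a made-up sentence with a double space and a trailing period:

```perl
use strict;
use warnings;

my $sentence = "The quick  brown fox.";

# /\s+/ matches one or more whitespace characters, so the double
# space between "quick" and "brown" counts as a single break.
my @words = split(/\s+/, $sentence);

foreach my $word (@words) {
    print "[$word]\n";
}
```

The last piece comes back as "fox." with the period attached, which is the funny behavior with punctuation.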
Write a perl program that will take a file with each line in the following format:
For each line, it should print out the information with the proper labels. For example, if the line of the file is this:
It is a simple enough program, but you should try to make it as concise as possible. (short, but understandable)