Perltropolis

www.cmpnet.com
The Technology Network

By Dennis Watson,
Jul 16, 1999 (1:36 PM)
URL: http://www.byte.com/column/BYT19990716S0001

Saying Perl is a programming language is like saying Manhattan is a city. Sure, Manhattan is a city, but for many people it is so much more. You can choose to see it as only a city, but perhaps you are cheating yourself out of the fullness of Manhattan. Perl is to programming languages as Manhattan is to cities. It wasn't the first, but it has risen (often on the shoulders of its brethren) to a higher level. It is an icon and a beacon.

Like any decent cosmopolitan city, Perl is multifaceted. You can't know it in just one sitting. Just when you think you've got it down, you discover another avenue that reveals a new scene for exploration.

This month, I hope to explore some of Perl's boroughs in an effort to expose it as the rich, featureful, cosmopolitan development environment it is.

The CPAN module is the all-in-one solution for finding and installing Perl modules. CPAN is the Comprehensive Perl Archive Network, a loosely integrated set of FTP sites that mirror all of the available Perl modules. CPAN also has a scripts directory and, of course, the Perl source itself. But the CPAN module is used mainly to locate and install Perl modules.

Installing CPAN

The current distributions of Perl, 5.004_05 and higher, come with the CPAN module already bundled. You can invoke it from the command line, usually as root, like so:

# perl -MCPAN -e shell

The first time it is invoked, it will ask a series of questions, such as: what is a good build directory, and what is your favorite CPAN site. The entire process is actually quite friendly. If you don't know of a CPAN site, it has a built-in list of sites to choose from. If you do manage to monkey it up, don't worry. The settings are in a file called Config.pm in the CPAN directory of your Perl lib directory. On Linux this is /usr/lib/perl5/CPAN/Config.pm. You can go in there and edit it with your favorite text editor.
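If you are not sure where that file lives on your system, a short script can look for it. This is only a sketch: it assumes the configuration is stored as CPAN/Config.pm somewhere on Perl's @INC search path, as described above, and the helper name find_cpan_config is made up for illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Look for CPAN/Config.pm in the given directories (default: @INC).
# Returns the first path found, or nothing if there is none.
# find_cpan_config is a hypothetical helper, not part of CPAN.pm.
sub find_cpan_config {
    my @dirs = @_ ? @_ : @INC;
    foreach my $dir (@dirs) {
        next if ref $dir;    # @INC may hold code references; skip them
        my $path = File::Spec->catfile($dir, 'CPAN', 'Config.pm');
        return $path if -f $path;
    }
    return;
}

my $found = find_cpan_config();
print defined $found
    ? "CPAN settings: $found\n"
    : "No CPAN/Config.pm found on \@INC\n";
```

If it reports a path, that is the file to open in your text editor.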

After you get past the configuration questions, you will get a "cpan>" prompt. Believe it or not, the best thing to do at this point is type "q" and quit the shell. Then invoke it again. It should spring to life again without asking the config questions and it should say something like:

There's a new CPAN.pm version (v1.50) available!
You might want to try
  install Bundle::CPAN
  reload cpan
without quitting the current session. It should be a seamless upgrade
while we are running...

There will probably also be some messages, such as: "install Bundle::libnet as soon as possible". The CPAN module has examined your installation and determined that it might be able to do a better job if these modules were installed or upgraded. A bundle is a set of related or dependent modules that the CPAN module can download and install together. Bundle::libnet is an especially important bundle for CPAN since it provides networking modules such as Net::FTP that CPAN can use to fetch files (LWP, which CPAN will also use if it is installed, is a separate package). Install it as soon as you can.

Be prepared for a lot of questions when installing Bundle::libnet as well. It will ask for your domain and nearest SMTP server. Again, don't worry. It makes some guesses when it can and you can always open up another xterm and use your tools to find the answer or call your system administrator.

Typical Usage

Let's examine a typical session with CPAN.

# perl -MCPAN -e shell

cpan shell -- CPAN exploration and modules installation (v1.40)
ReadLine support enabled

cpan> help

command   arguments       description
a         string                  authors
b         or              display bundles
d         /regex/         info    distributions
m         or              about   modules
i         none                    anything of above

r          as             reinstall recommendations
u          above          uninstalled distributions
See manpage for autobundle, recompile, force, look, etc.

make                      make
test      modules,        make test (implies make)
install   dists, bundles, make install (implies test)
clean     "r" or "u"      make clean
readme                    display the README file

reload    index|cpan    load most recent indices/CPAN.pm
h or ?                  display this menu
o         various       set and query options
!         perl-code     eval a perl command
q                       quit the shell subroutine

cpan>

Cool, it fired up in shell mode and I asked it for some help with the commands. You can get more information on each command by typing "help command" from the cpan> prompt. For the complete story, you should enter "perldoc CPAN" at the Unix prompt. There are quite a few commands, but the ones I use most are:

m /module_name/
readme module_name
install module_name

Let's assume I am interested in MP3s and I want to find Perl modules about them. A search with "m" is inexact, since it only looks at module names: Audio::CD might be a module worth investigating, but this simple search will not turn it up. So let's search on MP3.

cpan> m /MP3/

...snip lots of messages here...

CPAN: LWP loaded ok
Fetching with LWP:

...snip more messages here...

Module id = MPEG::MP3Info
    DESCRIPTION  Extract/edit tag information in MP3 files
    CPAN_USERID  CNANDOR (Chris Nandor)
    CPAN_VERSION 0.71
    CPAN_FILE    CNANDOR/MPEG-MP3Info-0.71.tar.gz
    DSLI_STATUS  bdpf (beta,developer,perl,functions)
    INST_FILE    (not installed)

cpan>

Here I've turned up a module about MP3s. If there were a bunch of them, I would see a list of matches displaying just the name and author of the module. In this case, I got a pretty detailed blurb about the module. I can see the name and author and if I had it installed already, it would show me the current installed version in the INST_FILE section.

But I need to know more about this thing before I go installing it.

cpan> readme MPEG::MP3Info
...lots of output formatted by your favorite pager...

The "readme" command followed by the module name fetches and displays the readme file from the module distribution. This is usually enough to determine if the module fits your needs and perhaps gives you a few caveats.

Great. We decide that we like this module and we want to download and install it. This entire process also happens from within the CPAN module shell.

cpan> install MPEG::MP3Info
...snip lots of informational messages...

The CPAN module takes advantage of the standard Perl installation procedure of "perl Makefile.PL; make; make test; make install". It knows that these are the standard commands issued to install a module, so it performs them for you. It also knows to check the output of each step and it will not proceed if there is an error somewhere along the way.
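The stop-on-first-failure behavior can be sketched in a few lines of Perl. This is not the CPAN module's actual code, just an illustration of the idea; run_steps is a hypothetical helper, and it assumes each step is a shell command whose zero exit status means success.

```perl
use strict;
use warnings;

# Run each command in order and stop at the first one that fails.
# Returns the failing command, or nothing if every step succeeded.
# run_steps is a hypothetical helper for illustration only.
sub run_steps {
    my @steps = @_;
    foreach my $step (@steps) {
        print "==> $step\n";
        # system() returns 0 when the command exits successfully
        return $step if system($step) != 0;
    }
    return;
}

# The standard sequence from the text. It only makes sense inside an
# unpacked module distribution, so the call is left commented out.
# my $failed = run_steps('perl Makefile.PL', 'make', 'make test', 'make install');
# die "Build stopped at: $failed\n" if defined $failed;
```

Because the loop returns as soon as one command fails, a broken "make test" means "make install" is never attempted.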

The newer versions of the CPAN module can even check for dependencies. This hinges on the module writer including a list of dependencies in their Makefile.PL script, but I expect it will become more common soon. If CPAN detects dependencies, it will ask you if you want it to download and build them first. Try that with InstallSomething Pro.

If you can't already tell, I love the CPAN module and it's just getting better. It also has a gateway to WAIT servers that lets you do more detailed searches for modules.

Mouths always drop and "Wow"s fly into the air whenever I show someone how to use CPAN. Try it once and I think you'll agree. I still use the by-hand method to build things like mod_perl or DBD::Oracle, but for everything else, it's the CPAN module.

Before there was Perl there was sed, awk, grep, find, and all of the other shell tools. Call me spoiled, but when I write a shell script these days, I think to myself, "I wish I could do this like I do it in Perl." Too often, though, the script is a one-off or I am simply hacking something together at the command line and it would be overkill to write a Perl script to do a simple task. But if Perl is there and it makes sense, why not use it?

For example, here is a one-liner that fetches the recent uploads to CPAN and displays them. It uses the LWP::Simple module to make the request and print to send it to STDOUT.

> perl -MLWP::Simple -e 'print get("ftp://gatekeeper.dec.com/pub/plan/perl/CPAN/RECENT.html")' | more

Note that I pipe the output to more. I am using Perl like a Unix shell command, as if it were awk or cat. The concept here is that Perl can do small things, too. Using Perl as a pager like more does not make sense, but using it to fetch a document from the Web does. So use Perl for the parts it is best at and use pipes to glue it together with other traditional Unix tools.

Perl even provides command-line options that make it easier to use as a shell command. The -n option wraps a while (<>) loop around the code you supply. Remember that the <> operator with no filehandle specified reads from the files named on the command line, or from STDIN if there are none. This type of -n one-liner is good for writing filters. It lets you take advantage of Perl's powerful and familiar regular expression dialect.

% cat /etc/passwd | perl -ne 'print if /^dwatson/'
dwatson:x:1111:222:Dennis Watson:/home/dwatson:/bin/tcsh

In essence, our silly one-liner could be rewritten like this:

% cat /etc/passwd | perl -e 'while (<>) { print if /^dwatson/; }'
dwatson:x:1111:222:Dennis Watson:/home/dwatson:/bin/tcsh

Using Perl inside Emacs

If you use Emacs and, like me, cannot remember all of the differences between Emacs regular expressions and Perl regular expressions, you can invoke Perl (and thus Perl regular expressions) on an Emacs buffer region. Select a region using C-space or C-@ and the arrow keys. Then use C-u M-| to invoke a shell command on the region. Enter perl -pe "expression" in the minibuffer and hit return. Emacs feeds the region to Perl, which munges the stream and prints it. Emacs then replaces the original block with the output of the Perl command.

dos2unix

We can instruct Perl to modify files in place and make "just in case" backups for us. The -i switch makes Perl read each file named on the command line, write the script's output to a new copy, and leave that copy under the original file name; give -i an extension and the original contents are kept as a backup with that extension. The -p switch is like -n in that it wraps your script with a while (<>) loop, but it also adds a "print $_" to the end of the loop. This makes it the choice when writing sed-like scripts that modify each line. Combine the two like 'perl -pi' and you have a powerful and simple tool for munging files. Here is a useful example of a DOS to Unix text file conversion script.

% perl -pi.dos -e 's/\cM$//' index.html
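For the curious, here is roughly what that command line does, written out as a script. It is a sketch that relies on Perl's $^I variable, the in-program counterpart of the -i switch; the subroutine name dos2unix and its argument order are choices made for this example.

```perl
use strict;
use warnings;

# Strip a trailing control-M from every line of each file, in place,
# keeping the original contents in "<file><backup extension>".
# dos2unix is a hypothetical helper name for this sketch.
sub dos2unix {
    my ($backup_ext, @files) = @_;
    return unless @files;    # otherwise <> would wait on STDIN
    # $^I switches on in-place editing for <>; @ARGV names the files
    local ($^I, @ARGV) = ($backup_ext, @files);
    while (<>) {
        s/\cM$//;    # same substitution as the command line version
        print;       # goes to the rewritten file, not the terminal
    }
    return;
}

# Convert any files named on the command line, with .dos backups
dos2unix('.dos', @ARGV) if @ARGV;
```

The loop body is exactly what -p generates around the -e code: read a line, run the substitution, print the result.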

This one-liner strips the control-M from the end of each line, turning DOS line endings into nice Unix line endings. We are left with a cleaned-up index.html and a backup file with the old contents called index.html.dos. Of course, this is only one way to accomplish this task. Chop or chomp could also be used here and we could perform much more complex substitutions on the file.

Don't you wish you could just try random things out in Perl? Do you write test scripts with lots of prints so you can test a new idea? How about a Perl shell where you could simply type in commands ad hoc and see the results? Enter "perl -d -e 1". Perl -d invokes the Perl debugger. Some readers are looking for the next page link. Don't go away! No need to be shy of debuggers, especially since the program we are running is the equivalent of this.

#!/usr/bin/perl

That's what the -e 1 part does. We are essentially invoking the debugger on a null script. What's the point? Well, the debugger lets you type in arbitrary Perl commands; it executes them and returns control to you. This is basically what a shell does. With our Perl shell, we can test almost any combination of Perl commands.

[dwatson@othelo conf]$ perl -d -e 1

Loading DB routines from perl5db.pl version 1.01
Emacs support available.

Enter "h" or "h h" for help.

main::(-e:1):   1
DB $pi = 3.14159

DB p $pi
3.14159
DB sub area { return $pi*$_[0]**2 }

DB p area(10)
314.159
DB

This is a pretty simple example of creating a variable and a subroutine and trying it out. The p command is short for "print" in the debugger. I often run Perl like this just to use it like a calculator. Perhaps that is using a back-hoe to dig my flower garden, but I'm comfortable in Perl, so why mess around with bc or its equivalent? One thing to note is that "use strict" will get ignored in the debugger.

This example is pretty mundane. More often, you will find yourself using Perl this way to test out a new module, just to experiment, or (gasp!) hack. It is a great way to figure out how to get something to work in a script. Just try it out in the shell first and if at first you don't succeed, keep trying.

Here is a somewhat more interesting use of the Perl shell. I am calling up the LWP::Simple module and using one of the functions there to check on the Byte.com servers.

DB use LWP::Simple

DB p head('http://www.byte.com')
text/htmlApache/1.3.4 (Unix) mod_perl/1.16
DB

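Hey, they use mod_perl. Cool. (The output runs together because head() returns a list of header values and "p" prints them with no separator.)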

Often I have a critical script that I need to edit. Perhaps I'm in a firefight of some sort and something is really broken and I need to edit this Perl script to fix it. But what if I've made a typo? Then running the script might make the problem worse. Or perhaps the file is a startup.pl script for a mod_perl server and I need to check it for correctness before I restart the server.

Perl -wc is like lint for Perl. In fact, I alias it as "alias plint 'perl -wc '". If I simply need to check the syntax of a script, I run it with "perl -wc". The -w turns on warnings and the -c tells Perl to only parse the code and verify its syntax without running it (BEGIN blocks and "use" statements are still executed, since they run at compile time). This will usually catch the errors caused by typos or other oversights. Of course, I always "use strict" in my scripts so that protection is available and gets checked also.

Here's what it looks like checking a mod_perl startup script called startup.pl:

% perl -wc startup.pl
Subroutine new redefined at /usr/lib/perl5/site_perl/HTML/Mason/ApacheHandler.pm line 52.
Subroutine handle_preview_request redefined at /usr/lib/perl5/site_perl/HTML/Mason/Preview.pm line 141.
startup.pl syntax OK

Hmm, it has output a couple of warnings about subroutines being redefined. These come from the extra pedantic level of the -w flag; without it, Perl will not speak up this way. But the crying is deserved because there is probably something weird going on in the code and I would be better off if I fixed it. The final message tells us what we really need to know: that the syntax is OK. There are some questionable bits, like subroutines that get redefined, but the script will run. Still, it may not run as expected. Run-time errors like "divide by zero" cannot be checked this way, but "perl -wc" and "use strict" will cover a lot of error-checking ground.
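If you want the same check from inside another script, say a deploy step that refuses to restart a server when startup.pl does not compile, something like the following would do. It is a sketch: syntax_ok is a hypothetical helper, and it simply runs the current Perl with -wc and looks at the exit status.

```perl
use strict;
use warnings;

# Return true if "perl -wc FILE" reports that the file compiles.
# syntax_ok is a hypothetical helper name for this sketch.
sub syntax_ok {
    my ($file) = @_;
    # $^X is the path of the perl running this script; the list
    # form of system() hands the arguments over without a shell
    return system($^X, '-wc', $file) == 0;
}

# Check every script named on the command line
foreach my $script (@ARGV) {
    my $verdict = syntax_ok($script) ? 'syntax OK' : 'DOES NOT COMPILE';
    print "$script: $verdict\n";
}
```

Warnings such as "Subroutine new redefined" still scroll past on STDERR, but they do not change the exit status; only a failure to compile does.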

Did I mention that Perl comes with its own debugger? A lot of people seem to shy away from debuggers, opting instead to debug the "old-fashioned" way -- with lots of print statements littered all over the code. I think this is often a waste of time, especially when the bug is tricky. You are putting print statements where you think the bug is, but it's a bug and it's hiding where you aren't looking. In my experience, the bug becomes obvious somehow when you start stepping through the code and examining the data structures.

I think most people have heard that debuggers are hard to use. They are afraid of some imaginary learning curve. But like most things, you can get by quite well knowing a few simple commands. Let's take a look at an admittedly contrived script and walk through it in the debugger.

#!/usr/bin/perl
# file: test.pl
use strict;

my $faves = {'red', 'yellow', 'green', 'cyan'};

foreach (@{$faves}) {
  print "$_\n";
}

Emacs Debugger Bug

You can run the debugger inside Emacs with M-x perldb. It will ask you how to run the script. Tell it to run it like "perl test.pl" if your script is called test.pl. Recent distributions of Linux I have worked with seem confused. Emacs prompts you in the minibuffer like this:

Run perldb (like this): perl -e 0

I'm not sure why it does this. This will invoke the debugger like the Perl shell discussed earlier, but we want to debug a script. So run it like this:

Run perldb (like this): perl test.pl

I will illustrate using the Perl debugger from the command line. The debugger commands are the same as if you were running it in Emacs or some other environment. Start the debugger on the hypothetical test.pl script like this:

% perl -d test.pl

Loading DB routines from perl5db.pl version 1.01
Emacs support available.

Enter h or `h h' for help.

main::(test.pl:5):      my $faves = {'red', 'yellow', 'green', 'cyan'};
DB l
5==>    my $faves = {'red', 'yellow', 'green', 'cyan'};
6
7:      foreach (@{$faves}) {
8:        print "$_\n";
9       }
10
11
DB b 8
DB

Perl starts up in debug mode and displays the first line of the program to be debugged. It has stopped on this line and is awaiting our command. The "l" command makes it print a short listing of the program near the region we are currently stopped at. This helps us see what is about to happen and keep a sense of balance. It puts an arrow on the line where we are stopped. I then set a break point on line eight's print statement with the "b 8" command. If we let the debugger continue running with a "c" command, it would stop for us at this break point. You can also set conditional break points so it will stop only when a variable has a certain value.

DB p hello
hello
DB p $faves

DB s
main::(test.pl:7):      foreach (@{$faves}) {
DB l
7==>    foreach (@{$faves}) {
8:b       print "$_\n";
9       }
DB p $faves
HASH(0x82a585c)
DB

The "p" command is a shortcut for Perl's print. You can see how I printed "hello", then the value of the variable "$faves". At this point in the program, $faves is unassigned so nothing is printed. Then I take a step via the "s" command. The "s" command will step into a subroutine or simply step to the next statement, as it has here. The "n" command is like "s", except that it steps over subroutine calls and stops at the next statement in the current level. I do not use "n" here but I thought it deserved mention.

You can see that we have moved off of line five and are at the beginning of the foreach loop on line seven. At this point, the assignment of $faves should have happened, so I try to print it. It tells me that $faves is a hash reference. Really?

DB x $faves
0  HASH(0x8296d84)
   'green' => 'cyan'
   'red' => 'yellow'
DB

The "x" command will dump any data structure in a pretty-printed manner. Here we see the name/value pairs of the hash reference $faves. The line "0 HASH(0x8296d84)" is the debugger's way of telling us that $faves points to a structure at memory location 0x8296d84 and that it considers that structure a hash.

Our example wants an array reference, so let's change $faves into one and continue.

DB $faves = ['red', 'yellow', 'green', 'cyan']

DB x $faves
0  ARRAY(0x82444ac)
   0  'red'
   1  'yellow'
   2  'green'
   3  'cyan'
DB c
red
main::(test.pl:8):        print "$_\n";
DB c
yellow
main::(test.pl:8):        print "$_\n";
DB q

I reassign $faves as a properly formatted array reference and dump it again. Now we can see that it is an array reference. I let the program continue with the "c" command. It prints the first value then runs into the break point we set earlier. Repeating the continue command makes it loop around, print another fave, and break. At this point, I am satisfied that the program will work properly once I change $faves into an array reference, so I quit the debugger with the "q" command.
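For reference, here is a sketch of test.pl after the fix, with the faulty version kept alongside so the difference is visible. The variable names $broken and $fixed and the eval check are additions for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $broken = {'red', 'yellow', 'green', 'cyan'};   # braces build a hash reference
my $fixed  = ['red', 'yellow', 'green', 'cyan'];   # brackets build an array reference

# ref() reports the kind of reference, the same thing "p $faves" hinted at
print ref($broken), "\n";   # HASH
print ref($fixed),  "\n";   # ARRAY

# Dereferencing the hash reference as an array fails at run time
my $ok = eval { my @list = @{$broken}; 1 };
print "broken version dies: $@" unless $ok;

# The corrected loop from test.pl
foreach (@{$fixed}) {
    print "$_\n";
}
```

The only change the real script needs is swapping the braces for square brackets on the line that assigns $faves.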

We have only touched on the available functionality of the Perl debugger, but this is often enough to get by. Enter "h" at the debugger to get a list of all the available commands. Entering "h command" will get you a more detailed help message for that command.

One of the better things about Java is javadoc. Java programmers get all this documentation about their class libraries all formatted and hyperlinked simply by running javadoc over their .java files. Perl programmers have this capability as well in the form of POD. POD is short for Plain Old Documentation. Javadoc encourages programmers to sprinkle their inline comments with small javadoc tags. POD can be used this way as well, but it is more often structured as a man-like page at the end of a script or package.

Perl installs the command "perldoc" in the same directory as the Perl binary itself when you install it. Perldoc offers a way to search the documentation of the modules installed on your system. If you want to find out more about the CPAN module, type "perldoc CPAN" from the Unix command line. Perldoc will find the appropriate module, extract and format the POD, and feed it through a pager program for you. You can give perldoc any module name in the same format as you would use it in a script. So you can find out about Text::Template via 'perldoc Text::Template'. Unfortunately, perldoc is case-sensitive, and, of course, you can only get documentation on modules you have installed.

If you want nice HTML files like those Java programmers, take a look at pod2html. Pod2html installs in the bin directory with Perl. It formats POD into HTML files and can link to other HTML files, providing nice, linked, browsable docs.

You can also get documentation on built-in Perl functions. Give perldoc the -f option and a function name like "perldoc -f grep", and it will provide the appropriate documentation. You can get the output in different formats too. "Perldoc -t" will format the output as plain text. I often use this to dump a large POD to a text file so I can review it leisurely with an editor. For example, "perldoc -t DBI > dbi.txt" will dump the lengthy DBI POD into a text file dbi.txt.
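To make the man-page-style layout concrete, here is a small sketch of a script that carries its own POD. The script name greet.pl and the greeting subroutine are invented for this example; the =head1 and =cut directives are standard POD.

```perl
#!/usr/bin/perl
# file: greet.pl (hypothetical example)
use strict;
use warnings;

# Build the greeting; falls back to "world" when no name is given
sub greeting {
    my ($name) = @_;
    $name = 'world' unless defined $name;
    return "Hello, $name!";
}

print greeting($ARGV[0]), "\n";

=head1 NAME

greet.pl - print a friendly greeting

=head1 SYNOPSIS

  perl greet.pl [name]

=head1 DESCRIPTION

Prints "Hello, name!" for the name given on the command line,
or "Hello, world!" if no name is given.

=cut
```

Perl skips everything from the first =head1 line to =cut when it runs the script. Saved as greet.pl, "perldoc ./greet.pl" should show the formatted page, and "pod2html --infile=greet.pl --outfile=greet.html" should produce an HTML version.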

The Perl Development Environment

I hope I've uncovered some heretofore unseen avenues in Perltropolis. All of these features, except perhaps the CPAN module, are available as part of the standard Perl installation. Becoming familiar with these tools should enrich your experience with Perl. Perl has become more like a development environment than just a programming language.

Dennis Watson is a senior software engineer at TechWeb. Dennis builds websites, maintains an Oracle database, herds cats, and hacks Perl for a living.
