Input and Output

1. The Standard Channels

Perl has three standard channels for basic input and output. Each has a predefined file handle:

File handle  Description                  Default
STDOUT       The standard output channel  terminal in which the Perl script is running
STDERR       The standard error channel   terminal in which the Perl script is running
STDIN        The standard input channel   characters from the keyboard

A file handle is rather like another Perl data type; by convention (though not necessarily) it is written in uppercase. As the term suggests, it gives us a handle on a file, either for reading data in or for writing it out. But what we write to or read from need not be a file; it could be a process, or a network connection.

When we print in Perl, by default we are printing to STDOUT. To put it another way, the following two statements are equivalent:

    print "Run of $0 at ", scalar(localtime), "\n"; 
    print STDOUT "Run of $0 at ", scalar(localtime), "\n"; 
Notice that where we do include a filehandle in a print statement, there is no comma following it. This is because the filehandle has a special role in the print; it must be distinguished from the (comma-separated) list of items to be printed. In this example, $0 is a special Perl variable holding the name of the current program; then comes a call to localtime in scalar context, and finally another string, a newline (giving output such as "Run of script.pl at Mon May 20 10:03:33 2002").

Error messages provide another example: when a script dies, the die message goes to STDERR, which by default is the window or terminal from which the script was run. These defaults can be overridden; all the standard filehandles can be redirected. In a CGI context they all change anyway. STDOUT becomes the client browser: if you print "Content-type: text/plain\n\n"; in a CGI script, you are printing it to STDOUT, and that means sending it back down the wire to the browser. STDERR is typically the web server's error log, and STDIN receives html form data submitted with the POST method.
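The redirection mentioned above can also be done from inside a script. Here is a minimal sketch, assuming a temporary file (created with the core File::Temp module) stands in for a real error log. The original STDERR is duplicated first so that it can be restored afterwards, and both an explicit print STDERR and a warn end up in the log.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file standing in for a real error log (an assumption for this sketch)
my ($tmp, $errlog) = tempfile(UNLINK => 1);
close $tmp;

# Keep a copy of the original STDERR so we can put it back later
open(my $saved_stderr, '>&', \*STDERR) or die "Can't dup STDERR: $!";
open(STDERR, '>>', $errlog)            or die "Can't redirect STDERR: $!";

print "This still goes to STDOUT\n";
print STDERR "An explicit error message\n";   # goes to the log file
warn "A warning\n";                           # warn writes to STDERR as well

# Reopening STDERR closes (and so flushes) the log handle, then restores the original
open(STDERR, '>&', $saved_stderr) or die "Can't restore STDERR: $!";

open(my $in, '<', $errlog) or die "Can't read $errlog: $!";
my @logged = <$in>;
close $in;

print "The log caught ", scalar(@logged), " lines\n";   # prints "The log caught 2 lines"
```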

To read from STDIN, we typically use the angle-bracket line-reading operator. In scalar context, angle brackets around any filehandle read one line from it; that is the case when we assign the result to a scalar:

    $line = <STDIN>;
Strictly speaking, the angle brackets read one record from the file handle. A record is terminated by a separator, which by default is a newline, so this does indeed read in one line. (This too can be changed: another special Perl variable, $/, holds the record separator.) The effect of the line of code above is to suspend execution of the script until the user presses the Return key, at which point all the characters they have entered, including that final newline, are stored in $line.
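To see $/ at work, here is a small sketch that reads semicolon-terminated records instead of lines. It assumes Perl 5.8 or later, which can open a string as an in-memory file, so no file on disk is needed. Setting $/ with local confines the change to one block, so later reads go back to using newlines.

```perl
use strict;
use warnings;

# Made-up data held in a string and opened as an in-memory file
my $data = "alpha;beta;gamma";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

my @records;
{
    local $/ = ';';            # records now end at ';' instead of "\n"
    while (my $record = <$fh>) {
        chomp $record;         # chomp removes the current $/ (here ';')
        push @records, $record;
    }
}                              # $/ reverts to "\n" when the block ends
close $fh;

print join(', ', @records), "\n";   # prints "alpha, beta, gamma"
```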


Exercise

Start a new script io.pl, enter and try out the following:

    print STDOUT "Enter your full name: ";
    $name = <STDIN>; 
    print STDOUT "Hello $name!"; 

Notice what happens when you run this: the exclamation mark appears on a new line. This is precisely because the line-reading operator included the newline, which was therefore held in the variable $name. To overcome this, Perl provides the chomp function:

    chomp($name = <STDIN>); 

This is equivalent to reading in $name first (as above) and then calling chomp($name). chomp is a "soft" operation: it removes a newline only if there is one at the end; otherwise it leaves its argument unchanged. For a hard "chop off the last character regardless", there is the chop function. Add a chomp to your program and observe the change.
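The difference between chomp and chop shows up when the string has no trailing newline. A quick sketch with made-up strings:

```perl
use strict;
use warnings;

my $with_newline    = "Fred\n";
my $without_newline = "Fred";

# chomp removes a trailing newline only if one is there
my $chomp1 = $with_newline;    chomp $chomp1;   # "Fred"
my $chomp2 = $without_newline; chomp $chomp2;   # unchanged: "Fred"

# chop removes the last character, whatever it is
my $chop1 = $with_newline;     chop $chop1;     # "Fred"
my $chop2 = $without_newline;  chop $chop2;     # "Fre": the d is gone

print "[$chomp1] [$chomp2] [$chop1] [$chop2]\n";   # prints "[Fred] [Fred] [Fred] [Fre]"
```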

Set up a while loop to keep reading lines entered at the keyboard. Echo the text back to the user unless they type "quit", "bye" or similar, in which case terminate the loop.


2. I/O to Files

File I/O is very straightforward in Perl: we use the open function with two arguments, the file handle we are going to use and the name of the file. The name can be prefixed by a mode operator, which indicates whether the file is being opened for reading or writing. Some examples:

Example                            Mode    Notes
open LOGFILE, 'log.txt';           input   file must exist
open LOGFILE, '< log.txt';         input   same as above; mode made explicit
open OUTPUT, "> $file";            output  create $file if it doesn't exist; overwrite it if it does
open STDERR, ">> logs/error.log";  append  create error.log if it doesn't exist; don't overwrite: add new text at the end
open FILE, '+< data.log';          update  open for reading and writing

Notice that the append example redirects STDERR. If a script containing that redirect dies, the message from die will be appended to the logs/error.log file. (In the other examples, the file handles are whatever we want them to be: FILE, OUTPUT, etc. are not reserved file handles.) The space between the chevron mode operator and the file name path is not essential; however, on some operating systems "<file" is itself a valid file name, so for safety it is good policy to include the space.
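Since Perl 5.6, open also accepts three arguments, with the mode given separately from the file name, so the name can never be mistaken for a mode and the spacing question does not arise. It is also common to use a lexical variable (my $fh) as the file handle instead of a bareword. A sketch, assuming a temporary scratch file from the core File::Temp module in place of a real log:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration (its name is chosen by File::Temp)
my ($tmp, $file) = tempfile(UNLINK => 1);
close $tmp;

# '>' : create, or overwrite if it exists
open(my $out, '>', $file) or die "Can't write $file: $!";
print $out "first line\n";
close $out;

# '>>' : append to the end
open(my $log, '>>', $file) or die "Can't append to $file: $!";
print $log "second line\n";
close $log;

# '<' : read; the file must already exist
open(my $in, '<', $file) or die "Can't read $file: $!";
my @lines = <$in>;    # reading it all at once is fine for a two-line file
close $in;

print scalar(@lines), " lines\n";   # prints "2 lines"
```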

Where the file name includes a directory path, that path must be valid or the operation will fail. (It is perfectly possible to create directories from within a Perl script, but open requires that the directory already exists.) Moreover, on unix, the user the Perl script runs as must have the appropriate permissions on the directory and file. This can be a problem for a CGI script that needs to write to its own log file, since it usually runs as the web server's user; you may need to perform an operation like chmod 0666 on the file.
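A sketch of creating a missing directory before opening a log file in it. It assumes the core File::Path module (its make_path function creates intermediate directories as needed) and uses a temporary base directory from File::Temp; the logs/cgi path and the script.log name are hypothetical.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);   # temporary base directory for the sketch
my $dir  = "$base/logs/cgi";        # hypothetical nested log directory

# open won't create missing directories, so create them first
make_path($dir) unless -d $dir;

my $logfile = "$dir/script.log";
open(my $log, '>>', $logfile) or die "Can't open $logfile: $!";
print $log "Log at ", scalar(localtime), "\n";
close $log;

# On unix, let other users (such as the web server) write to the log too
chmod(0666, $logfile) or warn "Can't chmod $logfile: $!";
```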

In Windows, there is a minor problem: the directory separator is a backslash, and Perl interprets backslashes as escape characters within double-quoted strings. However, you can use backslashes within single quotes, or use unix-style forward slashes even for a Windows path. So both of these are OK:

    open (LOGFILE, '> C:\apache\logs\error.log') or die "Can't open error.log: $!"; 
    open (LOGFILE, "> C:/apache/logs/error.log") or die "Can't open error.log: $!"; 

Once we have successfully opened a file handle for writing, we can print to it. (If the > is omitted, the file is opened for reading only: the open fails if the file does not exist, and any attempt to print to it fails if it does.) As soon as we have finished with the file, it is good practice to close it, although all file handles are closed automatically when the script finishes:

    print LOGFILE "Log at ", scalar(localtime), "\n"; 
    close LOGFILE; 
Remember that open is a regular function, so there is a comma after the file handle, its first argument; print, for the reasons discussed above, has none. If you are going to be printing a lot to a particular file handle, you may find it convenient to select it; it then becomes the "current file", and need not be stated explicitly:

    open (LOGFILE, '>> C:/apache/logs/script.log') or die "Can't open script.log: $!";
    select LOGFILE;
    print "Log at ", scalar(localtime), "\n";
Notice this is not the same as redirecting STDOUT; STDOUT remains whatever it was, and we can indeed explicitly print STDOUT in the middle of all this if we need. If we want to revert to STDOUT being the "current file", we can select STDOUT; at the appropriate point.
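select also returns the previously selected file handle, which gives a tidy way to switch back without assuming it was STDOUT. A sketch, assuming an in-memory file (Perl 5.8 or later) in place of a real log file:

```perl
use strict;
use warnings;

my $buffer = '';
open(my $log, '>', \$buffer) or die "Can't open in-memory log: $!";

my $previous = select($log);                   # $log is now the "current file"
print "Log at ", scalar(localtime), "\n";      # no handle given: goes to $log
print STDOUT "This still goes to STDOUT\n";    # an explicit handle overrides select
select($previous);                             # restore the earlier default
close $log;

print "The log holds: $buffer";
```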

To read from a file, the standard practice is to use the angle operator in the condition of a while loop, reading it line by line. Using the angle operator in list context, for instance by assigning it to an array variable as in @lines = <INPUT>;, reads the whole file in at once, making each line of the INPUT file an element of the array. This is not recommended if there is any prospect of the file being large, since it could place an impossible burden on memory.
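The two approaches side by side, in a sketch that uses a short in-memory file (Perl 5.8 or later) in place of a real one. Note that while (my $line = <$in>) is implicitly treated as while (defined(my $line = <$in>)), so a last line consisting of just 0 does not end the loop early.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";   # stands in for the contents of a file

# Line by line: only the current line is held in memory
open(my $in, '<', \$text) or die "Can't open: $!";
my $count = 0;
while (my $line = <$in>) {
    chomp $line;
    $count++;                     # process $line here
}
close $in;

# List context: the whole file is read at once, one element per line
open(my $all, '<', \$text) or die "Can't open: $!";
my @lines = <$all>;
close $all;

print "$count lines one at a time, ", scalar(@lines), " lines at once\n";   # prints "3 lines one at a time, 3 lines at once"
```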

The example below reads from one file and writes to another. If you work a lot with both unix and DOS files, you will be familiar with the line endings of one OS not being displayed properly in the other, at least in simple text editors like Notepad (unix ends each line with a newline, DOS and Windows with a carriage return followed by a newline). This simple script strips the line ending from each line of a file and rewrites it, this time with the line endings of the current OS. (If you are in Windows and want to run it, try the file unix.txt; opening it in Notepad will reveal the unix line endings.)

    $in = "unix.txt";
    $out = "$in.dos";

    open(OLD, "< $in");
    open(NEW, "> $out");

    while ($line = <OLD>) {
        $line =~ s/\r?\n\z//;   # strip "\n" or "\r\n" (chomp alone leaves the "\r" of a DOS file read on unix)
        print NEW "$line\n";
    }

    close NEW;
    close OLD;
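A caveat about using chomp for this job: it removes only the "\n" at the end of a line. On Windows, Perl's text-mode I/O turns "\r\n" into "\n" as it reads, so that is enough; but on unix, a line read from a DOS file keeps its "\r". A quick check on a made-up DOS-style line shows the difference, and a substitution that strips either ending:

```perl
use strict;
use warnings;

my $dos_line = "hello\r\n";    # a DOS line as read on unix

my $chomped = $dos_line;
chomp $chomped;                # removes "\n" only, leaving "hello\r"

my $stripped = $dos_line;
$stripped =~ s/\r?\n\z//;      # removes "\r\n" or "\n", leaving "hello"

printf "chomp leaves %d characters, s/// leaves %d\n",
    length($chomped), length($stripped);   # prints "chomp leaves 6 characters, s/// leaves 5"
```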

Exercises

A. Perl has a special variable to hold any arguments passed to a script at the command line: @ARGV (much as, e.g., Java has String[] args in a main() method). Modify the above code so that the name of the input file is passed in at the command line; that is, if we call the script strip.pl, it might be invoked as:

prompt> perl strip.pl inputfile
The output file should have the extension ".new" added to whatever file name is passed in. Make the script die with a usage message if no file name is given at the command line, and die with an error message if the open fails.

B. In fact there is also a file handle ARGV, which treats all the arguments passed at the command line as file names and behaves like a single file handle on all those files, as if they were concatenated together. In other words, while($line = <ARGV>) reads every line from all the files named at the command line, silently passing from one to the next. Another variable, $ARGV, holds the name of the file currently being processed.
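A sketch of ARGV and $ARGV at work, assuming two small temporary files (from the core File::Temp module) stand in for file names given at the command line. Assigning to @ARGV inside the script has the same effect as naming the files when running it. The sketch only counts the lines read from each file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two scratch files standing in for files named on the command line
my @names;
for my $text ("a1\na2\n", "b1\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $text;
    close $fh;
    push @names, $name;
}
@ARGV = @names;    # as if run as: perl script.pl file1 file2

my %lines_in;      # file name => number of lines read from it
while (my $line = <ARGV>) {
    $lines_in{$ARGV}++;    # $ARGV names the file currently being read
}

print "$_: $lines_in{$_}\n" for @names;
```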

Modify the script so that it uses ARGV and $ARGV to generate a new file that joins these files into one: winter, horse, conscience and sylvia. Each time it passes on to the next file, have it print a message to standard output such as "Now dealing with file: horse".

C. Colouring in html files is achieved like this: <td bgcolor="#003366">, which gives a background colour to a table cell. Here 00 represents the red component of the colour, 33 its green component, and 66 its blue component. Any hexadecimal value from 00 to FF is permissible, but only certain values combine to make colours considered "browser safe": 00, 33, 66, 99, CC and FF. Write a Perl script that iterates through these values to create an html file holding a colour chart like the one below.

Table of browser safe colours

                    Blue = 00  Blue = 33  Blue = 66  Blue = 99  Blue = CC  Blue = FF
Red = 00 Green = 00 #000000    #000033    #000066    #000099    #0000CC    #0000FF
Red = 00 Green = 33 #003300    #003333    #003366    #003399    #0033CC    #0033FF
Red = 00 Green = 66 #006600    #006633    #006666    #006699    #0066CC    #0066FF
Red = 00 Green = 99 #009900    #009933    #009966    #009999    #0099CC    #0099FF
Red = 00 Green = CC #00CC00    #00CC33    #00CC66    #00CC99    #00CCCC    #00CCFF
Red = 00 Green = FF #00FF00    #00FF33    #00FF66    #00FF99    #00FFCC    #00FFFF
Red = 33 Green = 00 #330000    #330033    #330066    #330099    #3300CC    #3300FF
Red = 33 Green = 33 #333300    #333333    #333366    #333399    #3333CC    #3333FF
Red = 33 Green = 66 #336600    #336633    #336666    #336699    #3366CC    #3366FF
Red = 33 Green = 99 #339900    #339933    #339966    #339999    #3399CC    #3399FF
Red = 33 Green = CC #33CC00    #33CC33    #33CC66    #33CC99    #33CCCC    #33CCFF
Red = 33 Green = FF #33FF00    #33FF33    #33FF66    #33FF99    #33FFCC    #33FFFF
Red = 66 Green = 00 #660000    #660033    #660066    #660099    #6600CC    #6600FF
Red = 66 Green = 33 #663300    #663333    #663366    #663399    #6633CC    #6633FF
Red = 66 Green = 66 #666600    #666633    #666666    #666699    #6666CC    #6666FF
Red = 66 Green = 99 #669900    #669933    #669966    #669999    #6699CC    #6699FF
Red = 66 Green = CC #66CC00    #66CC33    #66CC66    #66CC99    #66CCCC    #66CCFF
Red = 66 Green = FF #66FF00    #66FF33    #66FF66    #66FF99    #66FFCC    #66FFFF
Red = 99 Green = 00 #990000    #990033    #990066    #990099    #9900CC    #9900FF
Red = 99 Green = 33 #993300    #993333    #993366    #993399    #9933CC    #9933FF
Red = 99 Green = 66 #996600    #996633    #996666    #996699    #9966CC    #9966FF
Red = 99 Green = 99 #999900    #999933    #999966    #999999    #9999CC    #9999FF
Red = 99 Green = CC #99CC00    #99CC33    #99CC66    #99CC99    #99CCCC    #99CCFF
Red = 99 Green = FF #99FF00    #99FF33    #99FF66    #99FF99    #99FFCC    #99FFFF
Red = CC Green = 00 #CC0000    #CC0033    #CC0066    #CC0099    #CC00CC    #CC00FF
Red = CC Green = 33 #CC3300    #CC3333    #CC3366    #CC3399    #CC33CC    #CC33FF
Red = CC Green = 66 #CC6600    #CC6633    #CC6666    #CC6699    #CC66CC    #CC66FF
Red = CC Green = 99 #CC9900    #CC9933    #CC9966    #CC9999    #CC99CC    #CC99FF
Red = CC Green = CC #CCCC00    #CCCC33    #CCCC66    #CCCC99    #CCCCCC    #CCCCFF
Red = CC Green = FF #CCFF00    #CCFF33    #CCFF66    #CCFF99    #CCFFCC    #CCFFFF
Red = FF Green = 00 #FF0000    #FF0033    #FF0066    #FF0099    #FF00CC    #FF00FF
Red = FF Green = 33 #FF3300    #FF3333    #FF3366    #FF3399    #FF33CC    #FF33FF
Red = FF Green = 66 #FF6600    #FF6633    #FF6666    #FF6699    #FF66CC    #FF66FF
Red = FF Green = 99 #FF9900    #FF9933    #FF9966    #FF9999    #FF99CC    #FF99FF
Red = FF Green = CC #FFCC00    #FFCC33    #FFCC66    #FFCC99    #FFCCCC    #FFCCFF
Red = FF Green = FF #FFFF00    #FFFF33    #FFFF66    #FFFF99    #FFFFCC    #FFFFFF

3. I/O to Processes

Those familiar with unix will be used to 'piping' processes together: combining commands so that the output of one is fed into the next. This uses the vertical bar, |, the pipe:

prompt> ls -l /etc | more
prompt> ps -aux | grep httpd
In the first example, we ask for a listing of the directory /etc in long format, piping the result to the 'more' command. This shows one page of text at a time, pausing at a --More-- prompt; the upshot is that the directory listing does not flash by all at once, and we can go through it at our own pace. The second asks for a listing of all processes, which again can be quite long; this time the output is piped to the grep command, which keeps only the lines containing the string 'httpd'. In this way we can immediately see whether a web server (HTTP daemon) is running.

When two commands are piped together, one process generates output and the other takes it as input. When opening a file handle on a process in Perl, we can make that handle write to the other process or read from it, simply by placing the pipe bar appropriately:

    open OUTPUT, "| process";    # this Perl script outputs to process
    open INPUT, "process |";     # this Perl script receives input from process
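Perl 5.8 and later also accept a list form of these pipe opens, assuming a unix-like system: the mode ('-|' to read from the process, '|-' to write to it) is a separate argument and the command is given as a list, which bypasses the shell. In this sketch the child process is another copy of perl ($^X holds the path of the running perl), so it does not depend on any particular unix command.

```perl
use strict;
use warnings;

# "process |" direction: read what the child prints, line by line
open(my $from_child, '-|', $^X, '-e', 'print "one\ntwo\n"')
    or die "Can't start child: $!";
my @got = <$from_child>;
close $from_child or die "Child failed: $?";

# "| process" direction: what we print becomes the child's input;
# this child upper-cases each line and writes it to the shared STDOUT
open(my $to_child, '|-', $^X, '-ne', 'print uc($_)')
    or die "Can't start child: $!";
print $to_child "hello\n";
close $to_child or die "Child failed: $?";   # close waits for the child to finish

print "Read ", scalar(@got), " lines from the first child\n";
```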

To give a very simple illustration of this, here is a Perl script that achieves the same as the second of the unix examples above, by opening a file handle PS on the ps -aux command. We read from PS just as if it were a file, line by line. The ps command puts variable amounts of whitespace between the items of information on each process, so each line is split with the regular expression /\s+/, "one or more whitespace characters". The results of the split are assigned to a list of eleven variables; because the split is limited to eleven fields, the last of these, $cmd, receives the rest of the line: the command and its arguments. This is checked against the string httpd, and if it matches, the whole ps line is printed:

    open (PS, "ps -aux |") or die "Can't run ps: $!";

    while ($line = <PS>) {
        # split into at most 11 fields, so $cmd keeps the full command line
        ($user, $pid, $cpu, $mem, $vsz, $rss, $tty, $stat, $start, $time, $cmd)
            = split(/\s+/, $line, 11);
        print $line if $cmd =~ /httpd/;
    }
    close PS;
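Because the layout of ps output differs between systems, it can help to try the split on a single line first. A sketch with a made-up line in the style of ps -aux output (not real output), showing what the optional third argument to split, the limit, changes:

```perl
use strict;
use warnings;

# A made-up line in the style of ps -aux output (for illustration only)
my $line = "www  1234  0.0  0.5  12345  6789 ?  S  10:03  0:00 /usr/sbin/httpd -k start\n";
chomp $line;

my @fields  = split(/\s+/, $line);       # no limit: each argument is its own field
my @limited = split(/\s+/, $line, 11);   # limit 11: the last field keeps the rest of the line

print "Without a limit, field 11 is '$fields[10]'\n";      # '/usr/sbin/httpd'
print "With a limit of 11, field 11 is '$limited[10]'\n";  # '/usr/sbin/httpd -k start'
```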

© INGENIO.co.uk