Beginning Haskell the practical way part two: File I/O

In my last post about Haskell I showed how you can interact with the user via command line arguments without the need for complicated libraries, so you can quickly start hacking and explore the language. In this post I want to cover file interaction. Since on UNIX-like systems everything is a file, this is probably enough for most tasks you'll want to accomplish, although there may well be better libraries for networking or databases. As mentioned in my previous post, Haskell uses a concept called monads to encapsulate computations that perform I/O or have other side effects, and that might therefore fail at run time. Furthermore, Haskell is a lazy language, so it only evaluates expressions whose results are really needed. These properties make I/O a bit unintuitive at first, but in the end they allow you to write powerful and expressive code. As a rule of thumb, if your data is not needed for some kind of output, it won't be loaded, and code operating on it won't be executed: main is itself an IO action, and only the values that the actions it performs actually demand get evaluated.
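To make the laziness claim a bit more concrete, here is a minimal sketch in pure code. The names expensive, firstOnly, demo and firstSquares are made up for this illustration and don't come from any library.

```haskell
-- A value that would crash the program if it were ever evaluated.
expensive :: Int
expensive = error "expensive was evaluated"

-- Returns the first component and never inspects the second one.
firstOnly :: (a, b) -> a
firstOnly (x, _) = x

-- 'expensive' is never demanded here, so 'error' never fires.
demo :: Int
demo = firstOnly (1, expensive)

-- Laziness also lets us take a finite prefix of an infinite list.
firstSquares :: [Int]
firstSquares = take 3 (map (^ 2) [1 ..])
```

Loaded into the interactive shell, demo evaluates to 1 and firstSquares to [1,4,9], even though one of the values involved would crash and the other list never ends.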

Reading files

First we'll need to import System.IO, which holds the functions for file interaction we'll use. Probably the most basic function is openFile, which takes a file name as its first argument and an IOMode (ReadMode, WriteMode, AppendMode or ReadWriteMode) as its second. To demonstrate its use, we'll write a program similar to cat, which simply outputs a specified file to stdout:

import System.Environment (getArgs)
import System.IO

cat :: String -> IO ()
cat s = do
    file <- openFile s ReadMode
    fileData <- hGetContents file
    putStr fileData
    hClose file

main :: IO ()
main = do
    args <- getArgs
    case args of
        x:_ -> cat x
        _   -> putStrLn "Please specify a file name"

If you've read my last post, the main function won't be a problem for you. The cat function is also pretty basic: it has the type cat :: String -> IO (), so even without further information about what exactly it does, we know that it takes a string and results in an IO action. In its first line we acquire a handle for the file we want to output, which we bind to the variable file with the <- operator. After that we use the same operator to read the file's contents into a variable via hGetContents, which takes a handle as its parameter, print the data to stdout via putStr fileData, and finally close the file handle with hClose file. The order matters here: hGetContents reads lazily, so the handle has to stay open until putStr has consumed the data. However, we can improve our program by using fileData <- readFile s, which frees us from opening and closing the file handle manually.
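The readFile variant mentioned above could look roughly like this. The function names catSimple and catWithFile are my own; the second one is an additional sketch using withFile from System.IO, which is not part of the original program but takes care of closing the handle for us.

```haskell
import System.IO

-- cat rewritten with readFile: no handle to open or close by hand.
catSimple :: FilePath -> IO ()
catSimple path = do
    fileData <- readFile path
    putStr fileData

-- Alternative sketch: withFile closes the handle when the callback
-- returns, even if an exception is thrown while printing.
catWithFile :: FilePath -> IO ()
catWithFile path =
    withFile path ReadMode $ \handle -> do
        fileData <- hGetContents handle
        -- Printing has to happen inside the callback, because the
        -- contents are read lazily and the handle is closed afterwards.
        putStr fileData
```

Either function can replace cat in the main function above without further changes.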

Writing files

Similarly we can write a simple copy program; here we use writeFile instead of opening a file in WriteMode and putting the contents into it with hPutStr:

import System.Environment (getArgs)
import System.IO

cp :: String -> String -> IO ()
cp inFile outFile = do
    fileData <- readFile inFile
    writeFile outFile fileData


main :: IO ()
main = do
    args <- getArgs
    case args of
        inFile:outFile:_ -> cp inFile outFile
        _                -> putStrLn "Please specify an input and an output file name"
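For comparison, here is a sketch of the handle-based route that writeFile saves us from, plus a small example for AppendMode. The names cpHandles and appendLine are hypothetical and only serve this illustration.

```haskell
import System.IO

-- Variant of cp that manages the output handle explicitly.
cpHandles :: FilePath -> FilePath -> IO ()
cpHandles inFile outFile = do
    fileData <- readFile inFile
    out <- openFile outFile WriteMode  -- creates the file or truncates it
    hPutStr out fileData
    hClose out                         -- flushes the buffered output

-- AppendMode keeps the existing contents and writes at the end.
appendLine :: FilePath -> String -> IO ()
appendLine path line =
    withFile path AppendMode $ \handle -> hPutStrLn handle line
```

Like the writeFile version, this assumes that the input and the output are different files: since readFile is lazy, the input may still be open when the output is opened, and opening the same file for writing may then fail.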

Note that this won't work reliably on binary files, because readFile and writeFile treat the contents as text in the current locale's encoding. Most of the time one is dealing with text files anyway, so I just refer you to the Haskell wiki (http://www.haskell.org/haskellwiki/DealingWithBinaryData). The last pretty common IO operation one might want to use is the function interact. If you check its type in the interactive Haskell shell, you'll note that it's interact :: (String -> String) -> IO (); so interact takes a function f :: String -> String and results in an IO action, namely reading from stdin, applying f and writing the result of f to stdout. However, I don't see that much application for this function: I would mostly use it with files redirected into stdin and stdout redirected into other files, and for that readFile and writeFile suffice.
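A minimal sketch of interact in action might look as follows. The names shout and shoutStdin are invented for this example; the point is only that the String -> String part stays an ordinary pure function.

```haskell
import Data.Char (toUpper)

-- The pure part: an ordinary String -> String function.
shout :: String -> String
shout = map toUpper

-- The IO part: read all of stdin, apply shout, write the result to stdout.
shoutStdin :: IO ()
shoutStdin = interact shout
```

With main = shoutStdin and a compiled binary called, say, shout, you would run it as ./shout < input.txt > output.txt, which is exactly the redirection use case described above.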

Processing the input

Unless you just want to capture some data, you'll need to process your input. If you're familiar with other programming languages, the way this is done in Haskell might seem a bit counter-intuitive, but I believe that you'll find it much more natural in the end. The scenario I found myself confronted with most frequently is line-based input, like reading a configuration file with one entry per line, CSV or similar. This can be done quite easily with Haskell's lines and unlines functions: the first splits a string into the list of its lines, the second joins a list of strings into a new string in which each input string is on its own line. For example, lines "a\nb\n" gives ["a","b"], and unlines ["a","b"] gives "a\nb\n". Let's write a simple grep program to demonstrate this:

import System.Environment (getArgs)
import System.IO
import Data.List (isInfixOf)

grep :: String -> String -> IO ()
grep pattern file = do
    fileData <- readFile file
    putStr $ unlines $ filter (isInfixOf pattern) $ lines fileData

grepStdin :: String -> IO ()
grepStdin pattern = interact (unlines . filter (isInfixOf pattern) . lines)


main :: IO ()
main = do
    args <- getArgs
    case args of
        pattern : file : _ -> grep pattern file
        pattern : _        -> grepStdin pattern
        _                  -> putStrLn "Please specify a pattern and a file name"

So this program simply splits the data it has read into lines, applies a filter which checks whether our supplied pattern (which is more accurately just a substring) occurs in each line, puts the matching lines back together and prints them. You also see the usage of interact: we construct a function String -> String just by composing lines, filter and unlines, which gives us a pretty concise and clear way to implement grepStdin. We are now able to read and write files and interact with users, so we can write our first useful Haskell programs, like simple administration scripts, and learning the language will become much more natural.
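The same lines-based pattern also covers the configuration file scenario mentioned earlier. The following is only a sketch under assumptions of my own: the format (one key = value entry per line, # for comments) and the names trim, parseEntry, parseConfig and loadConfig are all made up for this example.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Parse one "key = value" line; lines without '=' yield Nothing.
parseEntry :: String -> Maybe (String, String)
parseEntry line =
    case break (== '=') line of
        (key, '=' : value) -> Just (trim key, trim value)
        _                  -> Nothing

-- Parse a whole file's contents, skipping blank lines, '#' comments
-- and lines that are not valid entries.
parseConfig :: String -> [(String, String)]
parseConfig contents =
    [ entry
    | rawLine <- lines contents
    , let line = trim rawLine
    , not (null line)
    , take 1 line /= "#"
    , Just entry <- [parseEntry line]
    ]

-- Read and parse a configuration file.
loadConfig :: FilePath -> IO [(String, String)]
loadConfig path = fmap parseConfig (readFile path)
```

The result is a plain association list, so a single setting can be looked up with the standard lookup function, for example lookup "port" on the list returned by loadConfig.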
