Saturday, March 17, 2007

Solitaire

I was getting mighty sick of socket programming, so I did what any good programmer does. Procrastinate by surfing the web. I happened upon haskell.org's collection of solutions to the Ruby quiz puzzles, in Haskell. The random page function is neato. I took a stab at the first one, Bruce Schneier's solitaire cypher.

I spent a lot of time reading up on list manipulation functions in Data.List. That library is awesome. I didn't really grok the utility of zipWith until I caught myself thinking, I have two lists i would like to combine. Another function I put to good use was groupBy. groupBy is sort of odd, but i think there's a fair amount of utility there. I couldn't' think of a clever way to break a list up into a list of lists of 5 elements. it's *almost* like map, but needs to look at more of the list. I suppose there's something like reverse $ foldl (\acc val -> if (length head acc < 5) then (val:(head acc)):(tail acc) else [val]:acc but that seems opaque.

I ran into a couple of problems, I need to learn to read specifications very carefully. I wasted a fair amount of time by missing the part that both Jokers are considered 53 when used in a numeric context, I had used one Joker as 53 and another as 54. There were also some fiddly little details with converting from numeric to ascii codes... off by one error essentially.

Anyway, check out the quizzes. they're nice sized problems. Challenging enough to be more than a toy, but not so large that they're daunting. Keep a window on the haskell libraries open too. There's great stuff in the library you'll never realize you need till you're in the thick of a problem. Well, it worked well for me anyway.

Friday, March 9, 2007

Simple Socket Programming 2. Revenge of the chat.

See Part 1.

Chat is a much more interesting server application than webserver. Each thread of control must communicate with every other thread. Or, at least, with some sort of authority that communicates with every other thread.

I'm going to be explicit about where declarations come from. It's a lot easier to sort out where to look for documentation with this style of import.



> import Control.Concurrent (forkIO)
> import Control.Concurrent.STM (STM, TVar, atomically, newTVar, readTVar, writeTVar)
> import Control.Exception (bracket, finally)
> import Network (PortID(..), accept, sClose, listenOn)
> import System.IO (Handle, hClose, hFlush, hGetLine, hPutStrLn)



There are a whole different set of problems to solve when clients can talk to each other. The original threaded server simply used the bracket function to take care of all of the looping. Chat requires each thread to have access to every other thread.



> threaded talk =
> do { state <- atomically (newTVar [])
> ; bracket (listenOn $ PortNumber 8000)
> (sClose)
> (loop state) }
> where loop s sock = do { c <- accept sock
> ; threadCreator s talk c
> ; loop s sock}




Three changes, First the creation of state. This state var is passed down to each client thread. This is the key to interprocess communication. Every thread can read and write this variable, so they have a way to talk to each other. The second big change is simplifying the where clause. loop is a little bigger than I'm comfortable with, but i think the code is clear. handle was pulled out, and promoted to a top-level function, threadCreator. Third, the protocol was removed from the server. Threaded and threadCreator only concern themeelves with socket level interaction. Protocol for using the sockets is dealt with by whatever talk function is passed in.




> threadCreator state talk (h, n, p) =
> do { manipState state (\x -> h:x)
> ; forkIO $ finally (talk h state)
> (do { manipState state
> (\x -> filter (h /=) x)
> ; hClose h})}



ThreadCreator has a lot more work to do. When threaded accepts a connection threadCreator puts that connection into the global state. Conceptually, when we get a connection, cons it on to the list of all the connections. Then, when we're done talking to a client threadCreator removes the closed handle from the list of connections. manipState does the work of adding and removing the connections from the list.

ThreadCreator got some complexity from exception handling. before, a thread could lose it's socket and just die. Now we have special resources that must be cleaned up. ThreadCreator has one wonderful property that i feel more than makes up for it's clumsiness. ThreadCreator cleans up all of the resources it creates with no special handling. Everything you ever need to know about manipulating the global state is right there. All of the side effects are locked down to six lines of code. Anyone who's some time chasing malloc/free or new/delete pairs will appreciate how important this really is.



> manipState state op =
> atomically $ do { ls <- readTVar state
> ; writeTVar state $ op ls}




ManipState is a simple helper. perhaps it should go in a where block of threadCreator, but I'm wary of extensive where blocks. The point is, given some state, read the state, operate on it then write the state back.




> tell h s = hPutStrLn h s >> hFlush h
> echo h s = tell h "echo!" >>
> loop
> where loop = hGetLine h >>= tell h >> loop




Here we have a simple echo talk protocol to verify everything compiles and appears to work. running threaded echo in ghci fires up the server. It appears to echo back whatever i type at it. It also appears well behaved on disconnect. The last step is to really chat, with threads talking to each other.

> tellAll h s = mapM (\x -> tell x s) h

> chat :: Handle -> TVar [Handle]->IO a
> chat h s = tell h "chat!" >>
> loop
> where loop = do { msg <- hGetLine h
> ; ls <- atomically(readTVar s)
> ; tellAll ls msg
> ; loop }

Try threaded chat from a ghci prompt, or main = threaded chat for ghc.

The chat client handler thread reads a message, then writes that message to every open handle. The one big problem with this approach is simultaneous messages. There's a race condition where to threads may write to the output handle at the same time. This could garble client 1's message "foo" and client 2's message "bar" resulting in something like "fboo" and "ar". More serious is the case where client 1 disconnects, just as client 2 is writing to the handle. client 2 will get an io exception and be disconnected.

These problems could be solved with more state, and more error checking. I'll address them in part 3. As a preview, haskell threads are so inexpensive i think a good solution would be one thread for reading every handle, and another thread for writing every handle. That approach needs more sophisticated shared state. The basic idea is each reader thread calls a dispatch function, much like talk. The dispater writes to one or more TChan channels. The writer threads take data from the tchan, and put it on the socket.

Jason

Haskell hacking notes or Prelude to the chat server

I picked up a couple of haskell programming practices this week. They saved me a bit of time, perhaps someone else will find them useful.

Avoid where. Where is handy, it's nice for short simple declarations. Don't use it for multiple functions, or functions with pattern match. This is of course guideline, do whatever you want, but i had several huge where blocks, and I paid a price for that. In general, I'm going to keep where blocks short and sweet.

Write the type declaration first. Perhaps type in a trivial version, load it, then cut and paste the type out of GHCI. Several times, I could have saved myself hours by figuring out the type up front.

Anyway, that was the good stuff, the rest is just rambling. I'm the fire-aim-fire-aim kind of programmer. I do something, then test. Haskell is great for my style of small incremental changes style of programming. Get a simple program working, then you keep it working by replacing hard coded values with functions. as long as it compiles you're in good shape. This last week i've come up with surprisingly long strings of code that typecheck, and run right off the bat. It's always satisfying to see progress.

There were two fundamental problems holding me back this week. First, I don't think the documentation for thread communication is very good. To be fair, there is good documentation available, but the easy to find stuff isn't the best. When you don't know anything, it's hard to evaluate the quality of documentation. My plan is to write about transforming the threaded server, developed earlier, into a chat server.

The second thing that held me back was my shaky grasp of monads. i was constructing a type like IO(STM (TVar [a])). I should have been constructing a type like IO (TVar [a]). I didn't really understand I was carrying around an unnecessary layer of computation. It's still non-obvious (to me) how to carry around those types, but I'm getting there.

Saturday, March 3, 2007

Simple Socket Programming

I think one of the things that made perl so popular was the abundance of fantastic documentation about how to do stuff. I've been fooling with Haskell recently, the approach to documentation is different. For example, if you want to write a simple haskell server, you'll likely find this example: A simple TCP server <edit> In addition to the simple tcp server take a look at Stephan Walter's interact for TCP. It's a good trick for getting single threaded servers working quickly.</edit>


That's a server, but if you're as clueless as I am, it's a bit intimidating. You need to know about sockets. You need to know about exceptions. You need to know about handles. You need to know about concurrency. You need to know about software transactional memory. I wanted something with a shallower learning curve.

I've used the bind syntax rather than do syntax. I feel the do syntax is great sugar, but gets in my when way when trying to understand at the basic level. So, without further ado, a very simple server:

>import Network
>import System.IO
>import Control.Exception
>import Control.Concurrent
>
>oneShot = listenOn (PortNumber 8000) >>=
> \sock -> accept sock >>= talk >> sClose sock
> where talk (handle, name, port)
> = hPutStrLn handle "hi!" >> hClose handle

One shot in english: First, we create a server socket with listenOn. Second, we use that socket to accept and use a connection, Finally we close the connection. The talk function talks to the client. Talk gets a handle, we can use any of the System.IO h functions for client communication. Reading and writing works using the handle. Capturing the socket in the lambda \sock -> allows closing the socket at the end of the run.

Here's the same program again, with do syntax:

>oneShot2 = do
> sock <- listenOn (PortNumber 8000)
> conn <- accept sock
> talk conn
> sClose sock
> where talk (h, n, p) = do
> hPutStrLn h "hi!"
> hClose h

Only a handful functions need to be considered. listenOn, accept and sClose are all from the Network library. hPutStrLn and hClose come from System.IO. there's good stuff in those libraries, worth a look.

oneShot only accepts a single connection, then exits. Version two can handle many clients


>manyShot = bracket
> (listenOn $ PortNumber 8000)
> (sClose)
> (loop)
> where loop sock = accept sock >>= talk >> loop sock
> talk (h, n, p) = hPutStrLn h "hi!" >> hClose h


Two big changes between oneShot and manyShot. First, looping over the accept lets manyShot handle client after client.

Second is the use of exception handling. bracket takes three arguments, how to initialize, how to clean up, and what work to actually do. If there's an exception in loop, bracket will take care of cleaning up the server socket.

The next problem to resolve is threading. Currently only one client at a time can be handled.


>threaded = bracket
> (listenOn $ PortNumber 8000)
> (sClose)
> (loop)
> where loop sock = accept sock >>= handle >> loop sock
> handle (h, n, p) =
> forkIO (hPutStrLn h "hi!" >> hClose h)


forkIO takes a computation and runs it in another thread. This version can handle simultaneous requests.

A few important notes. I ran all of these examples in ghci in Aquamacs on OS X. I should have included withSocketsDo but i don't run windows. be sure to check that out if you're interested in developing cross platform servers.

Also, if you're compiling at every step, you will need a line like this:


>main = oneShot

to test the individual servers. compile with ghc --make for maximum simplicity.

Jason