<![CDATA[Tim McNamara - Always Learning]]>
<![CDATA[Avoid DIY Encryption by Learning About Encryption]]>Ever since I read about a pretty fundamental flaw with the aes gem, I have been bugged by the questions, "So, what's the right way to do it? And, if we should encrypt all the things, why is it so hard?" This post tries to walk you through what's going on, so that you can feel more confident about adding encryption to your applications.

Things are better these days

There have been some very good efforts to increase the usability of cryptography recently. It used to be the case that you needed to compose several primitives together, each with their own parameters, to encrypt a message. If one of them was used improperly, bad people could easily make bad things happen.

Bernstein et al's NaCl (pronounced "salt") secret_box API, which is growing nicely via the Sodium project, is a massive step forward. Python's cryptography package and the Fernet format spec make it especially easy to send encrypted messages securely over the wire.

Q: How do I encrypt something with a password?

I think this is the question that led to the bug. Given that's how computer users are trained to enter passwords, it's not a big step for someone to produce code like this and expect it to work:

require 'aes'

message = "Super secret message"  
key = "password"

encrypted = AES.encrypt(message, key)  

The problem here—irrespective of Ruby's silent error handling of #hex that the article discusses—is that a password is not a key.

Deriving a key from a password

We can use a password to derive a key, which is what PBKDF2, bcrypt, scrypt and other key derivation functions do. That derived key is what can be used to encrypt a message. PBKDF2 uses HMAC with a hash function under the hood; we will use SHA-256.

Using cryptography, the whole key derivation process looks like this:

import os  
from cryptography.hazmat.backends import default_backend  
from cryptography.hazmat.primitives.hashes import SHA256  
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

password = b"password"  
salt = os.urandom(16) # bytes, i.e. 128 bits (minimum as per NIST SP 800-132)  
n_iter = 100*1000

kdf = PBKDF2HMAC(  
  algorithm=SHA256(),
  length=32, # bytes, i.e. 256 bits 
  salt=salt,
  iterations=n_iter,
  backend=default_backend()
)

key = kdf.derive(password)  

I am not a cryptographer, but here are a few remarks about some of the code in this snippet:

  • salt needs to be randomly generated to be useful
  • n_iter/iterations should be very high. To some extent, increasing the iteration count can offset a weak password.
  • You will need to transmit salt and iterations along with your message to enable the recipient to derive the same key later on

How to encrypt a message with a key

Now that we have a usable key, how do we use it to actually encrypt something?

Here I present two options. The first is almost certainly how you should do it; the second is an implementation of a scheme proposed on the Security Stack Exchange that seemed quite interesting. I'm including the second to demonstrate that it's quite difficult to build a strong fortification from the jenga-like crypto primitives that higher-level developers have struggled with for decades.

Example 1: Using cryptography.fernet

Given the key that we have already generated, a few more imports allow us to generate URL-safe hidden messages that can be sent over the wire with some level of confidence that the contents will remain hidden:

from base64 import urlsafe_b64encode as base64  
from cryptography.fernet import Fernet

# [include snippet above]

encryptor = Fernet(base64(key))  
hidden_message_token = encryptor.encrypt(b"yolo")  

hidden_message_token is a base64-encoded token that complies with the Fernet spec. It includes a timestamp, the ciphertext and some other metadata that enables the token to be authenticated.
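
If you're curious, you can inspect that structure yourself. This is a rough sketch based on my reading of the Fernet spec (a version byte, then a big-endian 64-bit timestamp), not something production code needs:

import struct
from base64 import urlsafe_b64decode

raw = urlsafe_b64decode(hidden_message_token)
assert raw[0:1] == b"\x80"                  # Fernet version byte
timestamp, = struct.unpack(">Q", raw[1:9])  # seconds since the Unix epoch
# the IV, the ciphertext and the HMAC make up the rest of the token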

To allow someone else to decrypt hidden_message_token from the password, they need to be able to regenerate key. That means you will also need to send salt and iterations along with hidden_message_token. Once key is regenerated, they create another Fernet instance, and then all they need is the following:

message = encryptor.decrypt(hidden_message_token)  
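
Spelled out end to end, the recipient's side might look something like this sketch. It assumes salt, iterations and hidden_message_token have arrived intact in whatever format you chose, and that the recipient knows password:

from base64 import urlsafe_b64encode as base64
from cryptography.fernet import Fernet, InvalidToken
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.hashes import SHA256
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

# salt, iterations and hidden_message_token were received from the sender
kdf = PBKDF2HMAC(
  algorithm=SHA256(),
  length=32,
  salt=salt,
  iterations=iterations,
  backend=default_backend()
)
key = kdf.derive(password)

encryptor = Fernet(base64(key))
try:
    message = encryptor.decrypt(hidden_message_token)
except InvalidToken:
    # wrong password, wrong salt/iterations, or a tampered token
    raise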

Example 2: DIY crypto

Somewhere you learn that Fernet uses the following under the hood:

All encryption ... is done with AES 128 in CBC mode.

What? Surely not! Surely that's not what happens. And, in a sense, you would be right. NaCl/libsodium offers AES-256-GCM. And to make things a tad more complex, ChaCha20-Poly1305 would probably be the first choice.

So anyway, you decide to take a look online for how to encrypt messages with a password and you find a promising answer. Here is an implementation of the setup and encryption phases of that answer.

import os
from cryptography.hazmat.primitives import hashes, padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.backends import default_backend

message = "Papa Bear, this is Little Bear. The Monkey has the Green Banana."  
password = "yolo"


## SETUP STEP
# 

crypto_backend = default_backend()  
k1 = os.urandom(16) # 128 bits  
iv = os.urandom(16) # 128 bits  
salt = os.urandom(16) # 128 bits, minimum NIST SP 800-132  
n_iters = 100*1000


kdf = PBKDF2HMAC(  
  algorithm=hashes.SHA256(),
  length=32,
  salt=salt,
  iterations=n_iters,
  backend=crypto_backend
)

intermediate_key = kdf.derive(password)  
k2 = intermediate_key[:16]  
k3 = intermediate_key[16:]

cipher = Cipher(algorithms.AES(k2), modes.CBC(iv), backend=crypto_backend)  
encryptor = cipher.encryptor()  
ct = encryptor.update(k1) + encryptor.finalize()

# At this stage, we can store ct, k3, salt and iv
# for later use to encrypt and decrypt data


## ENCRYPTION STEP

# assume that we have loaded ct, k3, salt and iv
# from storage; we need to regenerate k1 and k2
# from the password

kdf = PBKDF2HMAC(  
  algorithm=hashes.SHA256(),
  length=32,
  salt=salt,
  iterations=n_iters,
  backend=crypto_backend
)

intermediate_key_2 = kdf.derive(password)  
k2_2 = intermediate_key_2[:16]
k3_2 = intermediate_key_2[16:]

assert k3 == k3_2 # means that we know the password is ok

cipher = Cipher(algorithms.AES(k2_2), modes.CBC(iv), backend=crypto_backend)  
decryptor = cipher.decryptor()  
k1_2 = decryptor.update(ct) + decryptor.finalize()

assert k1 == k1_2 # just checking

file_cipher = Cipher(algorithms.AES(k1_2), modes.CBC(iv), backend=crypto_backend)

# to encrypt, pad the plaintext to AES's 16-byte block size (CBC needs
# whole blocks), then use file_cipher and remember to call finalize()

padder = padding.PKCS7(algorithms.AES.block_size).padder()
padded_message = padder.update(message) + padder.finalize()

file_encryptor = file_cipher.encryptor()
hidden_message = file_encryptor.update(padded_message) + file_encryptor.finalize()
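
For completeness, decrypting hidden_message is the mirror image of the encryption step: decrypt with k1, then strip the PKCS7 padding. This sketch is mine, not part of the Stack Exchange answer:

file_decryptor = file_cipher.decryptor()
padded_plaintext = file_decryptor.update(hidden_message) + file_decryptor.finalize()

unpadder = padding.PKCS7(algorithms.AES.block_size).unpadder()
plaintext = unpadder.update(padded_plaintext) + unpadder.finalize()

assert plaintext == message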

There's quite a lot going on here. I am sure that I will receive an email about how terrible this code is. That's probably why it's best to stick with the higher-level APIs.

]]>
/avoid-diy-encryption/c4d50e3b-8b6c-4a7a-996d-c11dfc401719Sat, 14 Jan 2017 01:39:00 GMT
<![CDATA[IO Completion Ports (IOCP) and Asynchronous I/O through STDIN, STDOUT and STDERR]]>tl;dr Can't be done directly. You have two options: a) mock async I/O with threads, or b) redirect STDIN, STDOUT & STDERR to other file handles that support overlapped (aka non-blocking/async) I/O, such as named pipes.

Background

tokio-rs 0.1 is out! Yay! This is great news for networking, but I wonder what life is like for file I/O?

Its getting started examples use an echo server, but I really wanted to learn how to create an efficient worker to fit with the Hadoop Streaming API (among other use cases). That means reading from STDIN and writing to STDOUT. It turns out, tokio doesn't have support for non-blocking I/O for stdio.

It turns out others have looked into this. As it happens, upstream progress on async stdio has stalled pending more research into how IO Completion Ports work. It seems that Windows being different makes life difficult.

This is going to require more research into how STDIN & co works w/ IOCP. I will tentatively assign this to the 1.0 milestone, but will potentially have to punt if it is tricky.

— carllerche, Dec 2015

Does it look possible?

A fairly large number of projects don't think so. Here is a quote from 2008 that is a fairly telling portent:

Development of the library Boost.Process stopped two years ago. One of the biggest outstanding issues is adding support for asynchronous I/O to stdin/stdout/stderr.

— Boris, asio C++ mailing list

Let's work our way through the MSDN documentation to figure the situation out. To start, let's clear up a few terms so that we all know what we're talking about.

What is an IOCP?

There are some significant differences between the UNIXish multiverse and Windows family when it comes to networking I/O. As well as differing APIs, there is also differing terminology.

Unlike calling select or poll on a single file descriptor, Windows offers you the ability to wrap a file handle in an IO completion port (IOCP). The file handle and the completion port are independent, but linked. The port takes care of dealing with the file itself.

Its proponents believe (with good reason) that the completion port model is a good one for supporting interleaved reads and writes across multiple threads without blocking.

Some notes on Windows terminology differences:

  • Windows uses the term "overlapped I/O" where most UNIX-esque programmers would use the term "non-blocking I/O".
  • STDIN, STDOUT and STDERR are sometimes referred to as CONIN$, CONOUT$ and CONERR$ within Windows documentation

With all of this in mind, creating an IOCP looks like this under the covers:

HANDLE WINAPI CreateIoCompletionPort(  
  _In_     HANDLE    FileHandle,
  _In_opt_ HANDLE    ExistingCompletionPort,
  _In_     ULONG_PTR CompletionKey,
  _In_     DWORD     NumberOfConcurrentThreads
);

The important parameter is FileHandle, an object created by CreateFile. That handle must support overlapped I/O. Here is the relevant extract of the CreateIoCompletionPort reference:

The handle passed in the FileHandle parameter can be any handle that supports overlapped I/O. Most commonly, this is a handle opened by the CreateFile function using the FILE_FLAG_OVERLAPPED flag (for example, files, mail slots, and pipes). Objects created by other functions such as socket can also be associated with an I/O completion port. For an example using sockets, see AcceptEx. A handle can be associated with only one I/O completion port, and after the association is made, the handle remains associated with that I/O completion port until it is closed.

— "CreateIoCompletionPort function" MSDN

This raises an important question: do the file handles for CONIN$, CONOUT$ & CONERR$ support FILE_FLAG_OVERLAPPED? We need to look to the documentation for CreateFile to see.

After some browsing, one comes across the section on async I/O describing how to provide the flag. We provide it within the dwFlagsAndAttributes parameter.

Synchronous and Asynchronous I/O Handles

CreateFile provides for creating a file or device handle that is either synchronous or asynchronous. A synchronous handle behaves such that I/O function calls using that handle are blocked until they complete, while an asynchronous file handle makes it possible for the system to return immediately from I/O function calls, whether they completed the I/O operation or not. As stated previously, this synchronous versus asynchronous behavior is determined by specifying FILE_FLAG_OVERLAPPED within the dwFlagsAndAttributes parameter. There are several complexities and potential pitfalls when using asynchronous I/O; for more information, see Synchronous and Asynchronous I/O.

This gets us closer, but we still don't know. When you read the Consoles section of the same article, you discover the documentation explicitly states that the parameter is ignored.

Consoles

The CreateFile function can create a handle to console input (CONIN$). If the process has an open handle to it as a result of inheritance or duplication, it can also create a handle to the active screen buffer (CONOUT$).

...

dwFlagsAndAttributes ignored

So after all of that we discover that no, it's not possible.

Maybe I should have read that original post in a little more detail before hunting through all of the documentation myself:

> If you look at the MSDN docs for CreateFile then you will see, under the
> heading Consoles, that CreateFile ignores file flags when creating a
> handle to a console buffer. I doubt that there is any way to do genuine
> asynchronous io to a console buffer.

— Roger Austin, asio C++ mailing list

Other Approaches

Clearly, many projects face similar issues. They want to write to STDOUT as fast as possible, without blocking the main thread. What have they done to create non-blocking servers that access these blocking APIs?

There are two main options:

  • use threads
  • redirect STDOUT/etc to another file handle such as a named pipe and perform async I/O on that

Threading

In an article entitled "Asynchronous I/O in Windows for Unix Programmers", Ryan Dahl (creator of node.js) provides a very good discussion of IOCP that includes file I/O, rather than just network I/O. His suggested approach for console applications is to spawn threads that wait for events and then communicate with the main thread.

Console/TTY

It is (usually?) possible to poll a Unix TTY file descriptor for readability or writablity just like a TCP socket—this is very helpful and nice. In Windows the situation is worse, not only is it a completely different API but there are not overlapped versions to read and write to the TTY. Polling for readability can be accomplished by waiting in another thread with RegisterWaitForSingleObject().

emphasis added

This approach is taken by FastCGI within libfcgi/os_win32.c. STDIN is mocked out, but STDOUT is kept synchronous. The StdinThread function loops in a thread until shutdown:

/*
 *--------------------------------------------------------------
 *
 * StdinThread--
 *
 *    This thread performs I/O on stadard input.  It is needed
 *      because you can't guarantee that all applications will
 *      create standard input with sufficient access to perform
 *      asynchronous I/O.  Since we don't want to block the app
 *      reading from stdin we make it look like it's using I/O 
 *      completion ports to perform async I/O.
 *
 * Results:
 *    Data is read from stdin and posted to the io completion
 *      port.
 *
 * Side effects:
 *    None.
 *
 *--------------------------------------------------------------
 */
static void StdinThread(LPDWORD startup){

    int doIo = TRUE;
    int fd;
    int bytesRead;
    POVERLAPPED_REQUEST pOv;

    while(doIo) {
        /*
         * Block until a request to read from stdin comes in or a
         * request to terminate the thread arrives (fd = -1).
         */
        if (!GetQueuedCompletionStatus(hStdinCompPort, &bytesRead, &fd,
        (LPOVERLAPPED *)&pOv, (DWORD)-1) && !pOv) {
            doIo = 0;
            break;
        }

    ASSERT((fd == STDIN_FILENO) || (fd == -1));
        if(fd == -1) {
            doIo = 0;
            break;
        }
        ASSERT(pOv->clientData1 != NULL);

        if(ReadFile(stdioHandles[STDIN_FILENO], pOv->clientData1, bytesRead,
                    &bytesRead, NULL)) {
            PostQueuedCompletionStatus(hIoCompPort, bytesRead, 
                                       STDIN_FILENO, (LPOVERLAPPED)pOv);
        } else {
            doIo = 0;
            break;
        }
    }

    ExitThread(0);
}

Redirect to another file handle, such as a named pipe

A very old version of Twisted seems to have implemented this approach. (From glancing at the current code, it looks like Twisted has moved back to threads :/ )

There are bound to be more examples of using a proxy handle around, though, as it seems like quite a nifty approach. The relevant MSDN article is "Creating a Child Process with Redirected Input and Output".

The important takeaways seem to be:

  • set up SECURITY_ATTRIBUTES correctly
  • make sure your named pipes have unique names
  • make sure that you are reading and writing correct ends of the pipe from each process

An extract of old Twisted code demonstrating how to proceed looks like this:

# Counter for uniquely identifying pipes
counter = itertools.count(1)

class Process(object):  
  ...
  def __init__(...):
    ...

    # Set the bInheritHandle flag so pipe handles are inherited. 
    saAttr = win32security.SECURITY_ATTRIBUTES()
    saAttr.bInheritHandle = 1

    # Create a pipe for the child process's STDIN. This one is opened
    # in duplex mode so we can read from it too in order to detect when
    # the child closes their end of the pipe.
    self.stdinPipeName = r"\\.\pipe\twisted-iocp-stdin-%d-%d-%d" % (self.pid, counter.next(), time.time())
    self.hChildStdinWr = win32pipe.CreateNamedPipe(
            self.stdinPipeName,
            win32con.PIPE_ACCESS_DUPLEX | win32con.FILE_FLAG_OVERLAPPED, # open mode
            win32con.PIPE_TYPE_BYTE, # pipe mode
            1, # max instances
            self.pipeBufferSize, # out buffer size
            self.pipeBufferSize, # in buffer size
            0, # timeout 
            saAttr)

    self.hChildStdinRd = win32file.CreateFile(
            self.stdinPipeName,
            win32con.GENERIC_READ,
            win32con.FILE_SHARE_READ|win32con.FILE_SHARE_WRITE,
            saAttr,
            win32con.OPEN_EXISTING,
            win32con.FILE_FLAG_OVERLAPPED,
            0);

    # Duplicate the write handle to the pipe so it is not inherited.
    self.hChildStdinWrDup = win32api.DuplicateHandle(
            currentPid, self.hChildStdinWr, 
            currentPid, 0, 
            0,
            win32con.DUPLICATE_SAME_ACCESS)
    win32api.CloseHandle(self.hChildStdinWr)
    self.hChildStdinWr = self.hChildStdinWrDup
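
Once the parent holds hChildStdinWr, a handle created with FILE_FLAG_OVERLAPPED, it can perform genuinely asynchronous I/O on it. The following is only a rough sketch of what an overlapped write might look like with pywin32; it is my own illustration, not code from Twisted:

import pywintypes
import win32event
import win32file
import winerror

def overlapped_write(handle, data):
    # Kick off an overlapped (asynchronous) write and wait on its event.
    # In a real reactor you would hand the event (or an IOCP) to the event
    # loop rather than blocking here.
    ov = pywintypes.OVERLAPPED()
    ov.hEvent = win32event.CreateEvent(None, True, False, None)
    rc, _ = win32file.WriteFile(handle, data, ov)
    if rc not in (0, winerror.ERROR_IO_PENDING):
        raise OSError("WriteFile failed with code {}".format(rc))
    return win32file.GetOverlappedResult(handle, ov, True)  # bytes written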

Which approach to take?

There are others who are significantly more experienced in this area than I. The conventional approach certainly seems to be threads, but using redirection does appeal to me for some reason. As it nears midnight, my suggestion to the Tokio team and others would be to go with the approach that's easiest to maintain unless benchmarks prove compelling.

]]>
/iocp-and-stdio/2f39ff56-18b4-4d38-9a7e-4dd6275a83a8Thu, 12 Jan 2017 10:53:00 GMT
<![CDATA[Etymology of Rust Language Terms]]>My impression from reading crate documentation is that much of Rust's community assumes a fairly high level of understanding of systems programming and computer science. Perhaps that high level of assumed prior knowledge is one of the reasons that the language is often described as hard to learn.

Here is an example from the docs as of Dec 2016. Without prior knowledge of what the terms “box” and “destructure” mean, the introduction of the chapter "Box Syntax and Patterns" is hard to comprehend:

Currently the only stable way to create a Box is via the Box::new method. Also it is not possible in stable Rust to destructure a Box in a match pattern. The unstable box keyword can be used to both create and destructure a Box.

As someone who came to Rust with prior experience largely in very high level dynamic languages, especially Python, some of the terminology has been a little confusing.

Here are some notes that I've put together that may be useful for others. It will expand with time. Corrections are welcome. Ping me on Twitter or Reddit.

  • “static”, e.g. 'static lifetime
    Seems to come from the C keyword, which itself is probably recycled from an earlier language. Indicates that a variable has a static location in memory (although its value may change, i.e. it is not constant).
  • “unit” - e.g. unit struct, unit (())
    Comes from the term "unit type" of type theory, types that contain one value and thus no information. Imagine a Boolean object that could only store true. That's a unit type. Booleans could be implemented as the union of two unit types, True and False.
  • “move” - e.g. move semantics, contrasted with copy semantics
    Used fairly heavily in C/C++ texts, indicating that the owner changes (not the data!). Avoids unnecessary copying of data, e.g. should be a faster, simpler runtime.
  • “box” - e.g. Box::new(datum) - tells the compiler to allocate datum on the heap rather than the stack. rustc provides the ability for you to customise a Box's allocation and de-allocation behaviour with lang items.

Other terms I'm still looking into:

  • trait (was this an invention by Graydon et al to describe something that looks like but isn't an interface or behaviour?)
  • ownership (does this have a deeper/more technical meaning than the intuitive one that I've developed?)
  • lifetime (reading into Wikipedia's RAII article, it looks like this term was in use before its development, e.g. pre-1985)
  • Drop (is this just a shorter term for destructor?)

The question that I always have when encountering an English term used in a specific field, e.g. computer science, or community (how many definitions of the term "object" are there in programming?), is whether the common English meaning is only approximately correct. For programming it's slightly worse, as everything is a metaphor. Or has the term been in use for 30 years and developed a very technical and precise meaning?


Acknowledgements: many thanks to _habnabit on irc.mozilla.org#rust for reviewing the post

]]>
/etymology-of-rust-language-terms/9e893a96-3509-4ac0-8480-f0d230c5e7e0Fri, 23 Dec 2016 00:19:00 GMT
<![CDATA[An Incomplete History of Concurrent Programming Languages]]>Many computer scientists have tried to make concurrent programming easier over several decades. What can we learn from that work?

Disclaimer 1: You are the peer review. This isn't necessarily the truth, it's a blog post.

Disclaimer 2: This is an early draft with quite a few empty sections. Good enough to begin soliciting feedback for, not finished enough to feel satisfied.

1970s

Concurrent Pascal

Aside: I love the idea of Pascal. It feels somewhat sad to me that this language that's easy to compile and is quite readable got absolutely destroyed by Sun Microsystems in the 90s and its Java project. ETH did some amazing work in the preceding two decades: Pascal, Modula-2, Oberon. I sometimes feel like the WinTel monopoly/monoculture did worse things to computer science than it did to consumer-facing software.

Concurrent Pascal [PDF] was Per Brinch Hansen's attempt to fix Pascal. GOTOs are out! Concurrency is in.

In Concurrent Pascal, two important concepts are "monitors" and "processes". A monitor is able to access shared data, such as global variables. Processes are not able to access this shared state. They must communicate with monitors via message passing.

External hardware is interfaced by another type called a "class". A class provides multiple processes access to a single device. Significantly, the compiler guarantees at compile time that no two processes can write to a single class.

The compiler has a lot of work to do in Concurrent Pascal. Hansen was convinced that a fairly rigid, hierarchical structure of processes, monitors and classes would ensure that the code was completely deadlock free.

Here is an example of a process type definition:

type page = array[1..512] of char

type jobprocess =  
  process
    (input, output: diskbuffer);
  var
    block: page;
  cycle
    input.receive(block);
    update(block);
    output.send(block);
end  

Smalltalk-76

1980s

Erlang/OTP

1990s

ParC took inspiration from the hardware description languages Verilog and VHDL. The language relies heavily on the metaphor of events, signals and receivers.

Its authors' community is in a similar space to SystemC, which was built to simulate many concurrent activities.

Hardware descriptions are different from normal coding in that the description is intrinsically parallel (with a large number of threads) and entirely static.

V-MS (Rick Munden (?))

To make use of the primitives, you include a header library (parc.h) and then you're able to create actor-like parallel blocks called a "process". ParC processes are mapped 1:1 to OS-threads. Communication between ParC processes is facilitated with channels (called pipes) under a message passing model.

Interesting language features include assertions that enforce the sequence of an event, e.g. asserting that one event occurs before another.

@?(<expression>) is an event that fires prior to the same event expression written with an @, e.g. @?(clock) fires before @(clock), and no other events can interleave between them.

Another is the "non-blocking assignment", with @= notation, that allows programmers to defer the value's assignment into some time in the future.

Here is example 12 from their examples page, illustrating two sinks receiving messages:

#include <stdio.h>
#include "parc.h"

using namespace parc;

pipe<int> chn;

module top {  
public:  
  int i;
  top() {
    i = 1;
  }

  process p1 {
  start:
    for (; i < 5 ; i++) {
      mt_printf("p1> %d @ %3.3f\n",
                i,MyKern()->Now().D());
      chn.write(&i,1);
    }
  } a;

  process p2 {
    int d,r; 
  start:
    while (CHNS_DEAD != (r = chn.read(&d,1))) mt_printf("p2< %d @ %3.3f\n",
                                                        d,MyKern()->Now().D());
    printf("p2: done\n");
  } b;

};

  process p3 {
    int d,r; 
  start:
    migrate();
    while (CHNS_DEAD != (r = chn.read(&d,1))) mt_printf("p3< %d @ %3.3f\n",
                                                        d,MyKern()->Now().D());
    printf("p3: done\n");
  };


void test()  
{
  top t;

  p3 c;

  root()->StartAll();
}

int main(int argc,char **argv)  
{
  test();
}

2010s

Fortress

Chapel

Super Instruction Architecture (SIA) and its implementation (?) SIAL.

Mesham

References

http://www.cise.ufl.edu/research/ParallelPatterns/contents.htm

]]>
/an-incomplete-history-of-concurrent-programming-languages/792f4af6-aa8b-41bd-b6cb-0e985598ea6fThu, 22 Dec 2016 09:40:00 GMT
<![CDATA[Describing the Actor Model of Computation]]>I have always been under the impression that an actor is more or less defined as an isolated process that communicates via message passing, with each process sending messages to others by inserting them in some mailbox. Describing Erlang/OTP and Scala/Akka as actor implementations would embody that view.

Actors, as described by Carl Hewitt* - who is arguably the creator of the model - turn out not to mean lightweight threads that exchange messages. They're far more restricted, yet in a sense significantly more profound and fundamental. Under his definition, an actor is a formalism for a primitive of concurrent computation. The sense of the word "primitive" here is something like the way the for loop is a primitive of structured programming.

If not lightweight threads, then what? Hewitt Actors can create other actors, respond to messages, update their own state and send asynchronous messages to their acquaintances. The rest is superfluous, perhaps even internally inconsistent. In Hewitt's mind, there is no requirement for an actor to have a mailbox, for example. That mailbox would itself be an actor, requiring its own actor-mailbox, ad infinitum. Hewitt describes mailbox implementations as "Fog Cutter" Actors.

In the Hewitt Actor model, there is no global state. There are no universal truths. Information propagates through the social network of actors. In effect, the entire application would operate in a manner like DNS. Or, as was intended, operate like a scientific community. In 1976, he described his overall intentions for the project:

The long term goal is to construct systems whose behavior approximates the behavior of scientific societies. That is, the ultimate aim is to build systems which model the way scientists construct, communicate, test, and modify theories.

In this light, the actor model really looks like it is attempting to emphasise the connectedness of actors, rather than their isolation. Perhaps a better term for Hewitt's model would have been neuron or hypernode or something? Actor still sounds to me a little like imposter or façade.

Reading through them, I found papers relating to the actor model from the mid-70s into the mid-80s quite fascinating, at least in part because of the enthusiasm that leapt out of almost every page. In the 80s, he and his MIT colleagues were deeply inspired by a) the prospect of emerging massively parallel computer architectures, such as the Connection Machine, b) creating a new model of computation that could efficiently make use of them and c) the prospect of mimicking scientific communities to make peer-to-peer, distributed knowledge work. I found this quote from one of Hewitt's then-PhD students, Gul Agha, particularly representative given it was penned in 1985:

However, there is now good reason to believe that we may have approached the point of diminishing returns in terms of the size and speed of the individual processor. Already, smaller processors would be far more cost-effective, if we could use large numbers of them cooperatively. In particular, this implies being able to use them in parallel.

Actor programming remains influential, if somewhat niche, as a method for implementing distributed systems. Proponents have emphasised the safety guarantees of message passing semantics and many distributed algorithms avoid global state entirely.

It is somewhat sad to me though that the most interesting part of the overall actor mission, to create systems that can resolve conflict and reason cooperatively, has never really resonated. Perhaps it is just too hard. I am sure that Hewitt is impressed with the prominence of the Internet of Things, Merkle trees and probably even the DAO. Still, I'm sure he and his students must be shaking their fists slightly. I wonder though how many people are willing to learn the lessons of 40 years ago by reading scanned copies of old papers.

Given the vague nature of human language and the differences that arise between theoretical computer science and implementation, it's difficult to distinguish between CSP, dataflow programming and message passing in general.

* Fun Fact: Hewitt got his account banned from Wikipedia for his articles on Actor programming

]]>
/describing-the-actor-model-of-computation/5c6fbad4-4695-4de4-8798-7df0aeb7f961Thu, 10 Nov 2016 09:10:00 GMT
<![CDATA[Escaping from Rust's Borrow Checker]]>Rust is a programming language that provides strong assurances about safety. It achieves that by being confident about who is responsible for what at any given time. One part of how it achieves that is through its ownership system and "move semantics".

During a move, values are copied and the original values are marked as invalid by the compiler. This can be a strange thing to get your head around for new Rust programmers as it makes variables in local scope inaccessible later on within a function.

The following code snippet sets up a situation that will break with the error "use of moved value: `car`" when you attempt to compile it:

fn main () {  
  let me = Driver{};
  let car = Car{};

  me.drive(car);
  me.drive(car); // illegal, has been moved
}

struct Driver;  
struct Car;

impl Driver {  
  fn drive(&self, car: Car) -> () {
    println!("Zoom zoom!");
  }
}

The type signature of Driver.drive() tells us the level of authority it needs over its arguments. When you're interpreting a signature such as fn drive(&self, car: Car), the ampersand (&) says "I need a reference to self" and no decoration indicates that ownership of car is required.

One of the reasons that taking ownership is an issue later on within the local scope of main() is that objects are deleted (via the Drop trait) when their owners no longer need them. That is, once Driver.drive() returns, car is dropped.

Using References

If we are in charge of the code that requires ownership, perhaps we could just adjust our function to not need it?

In our tiny application, our definition of Driver.drive() changes from this..

impl Driver {  
  fn drive(&self, car: Car) -> () {
    println!("Zoom zoom!");
  }
}

..to this:

impl Driver {
  fn drive(&self, car: &Car) -> () {
    println!("Zoom zoom!");
  }
}

Our main() function also changes. We now need to adjust our code to match the signature that we just defined:

fn main () {  
  let me = Driver{};
  let car = Car{};

  me.drive(car);
  me.drive(car); // illegal, has been moved
}

Is now..

fn main () {
  let me = Driver{};
  let car = Car{};

  me.drive(&car);
  me.drive(&car);
}

Now, when we run the code, we hear our car roar!

Zoom zoom!
Zoom zoom!

Using Clone

Using references is a good sign that you're doing things the Rust way. Duplicating data unnecessarily is frowned upon, as it probably runs slower than using a reference and will almost certainly lead to more memory being used.

Still, the Clone trait is perfectly valid Rust and can be a convenient way to sidestep the borrow checker while you're focusing on learning Rust.

It's quite easy to use. In our main() function, we only need to call car.clone(). Make sure you're cloning at the first call to me.drive(), otherwise you would be trying to clone data that has already been moved.

To implement the trait on our data, you generally need to implement a function that is able to duplicate the struct you're interested in duplicating. In our case, as we're using a unit struct, we can use a code annotation to have the implementation written for us by the Rust compiler.

fn main () {
  let me = Driver{};
  let car = Car{};

  me.drive(car.clone()); 
  me.drive(car);
}

struct Driver;
#[derive(Clone)]
struct Car;

impl Driver {
  fn drive(&self, car: Car) -> () {
    println!("Zoom zoom!");
  }
}

When you run that code, you should see the familiar:

Zoom zoom!
Zoom zoom!

Using Copy

If you are fairly new to Rust, you may be wondering why issues with the borrow checker don't seem to crop up when dealing with integers and floats. That's because they're not being moved, they're being copied.

Copy is available to a few primitive data types (e.g. number types and a few others) and to types composed only of Copy types. Luckily for us, unit structs don't have any data members and are also able to make use of Copy.

To make use of Copy in our code, our program changes to the following:

fn main () {
  let me = Driver{};
  let car = Car{};

  me.drive(car); 
  me.drive(car);
}

struct Driver;
#[derive(Clone, Copy)]
struct Car;

impl Driver {
  fn drive(&self, car: Car) -> () {
    println!("Zoom zoom!");
  }
}

The application code, e.g. main() becomes simpler here. Unfortunately though, Copy is only available for a fairly limited number of use cases. Still, it's available to you when you might need it.

Wrapping Up

Programming in Rust can feel pedantic and perhaps a little stuffy at first. Everyone has been stung by the borrow checker. Its job is to prevent you from being stung much more severely at runtime under load. Hopefully these little tricks have explained what's going on and have offered you some tips if you get stuck.

]]>
/borrow-checker-escape-hatches/acaa4cce-27f5-45a6-a081-32ab76d2ccb7Sat, 15 Oct 2016 08:33:00 GMT
<![CDATA[If Rust doesn't have exceptions, what happens on Ctrl+C?]]>Rust does not have exceptions. To indicate issues, it instead relies on strongly typed return values called Result that are either in an Err state or Ok state. Part of programming in Rust is learning that you need to explicitly handle both of these cases, even if common cases are handled by try! and ?.

That's sort of fine for inspecting the return values of functions. But what happens if an error occurs outside of a function boundary?

To illustrate what I expect to happen, consider this Python code:

import time

seconds_so_far = 0.0
delay = 0.25

while True:
    print('{}'.format(seconds_so_far))
    seconds_so_far += delay
    time.sleep(delay)

If I save that code as seconds_so_far.py and run it in cmd.exe or a Terminal, I am quite used to something like the following appearing:

$ python seconds_so_far.py 
0.0
0.25
0.5
0.75
1.0
1.25
1.5
^CTraceback (most recent call last):
  File "seconds_so_far.py", line 9, in <module>
    time.sleep(delay)
KeyboardInterrupt

The ^C indicates that I've pressed Ctrl+C in my bash shell. Eventually the operating system will send SIGINT to the Python interpreter, which will then raise the KeyboardInterrupt exception. If this isn't explicitly handled by the programmer, then Python will terminate.
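
If I did want to handle it, wrapping the loop is enough. Here is a sketch of how the same script could catch KeyboardInterrupt and exit cleanly:

import time

seconds_so_far = 0.0
delay = 0.25

try:
    while True:
        print('{}'.format(seconds_so_far))
        seconds_so_far += delay
        time.sleep(delay)
except KeyboardInterrupt:
    # reached when Ctrl+C delivers SIGINT and the interpreter raises
    print('ran for roughly {} seconds'.format(seconds_so_far))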

Let's try writing a similar sort of thing in Rust. We'll use cargo to get us going.

$ cargo new --bin seconds_so_far

Once the project is generated, open seconds_so_far/src/main.rs in a text editor and try replacing its contents with the following:

use std::{thread, time};

fn main() {  
    let mut seconds_so_far = 0.0;
    let delay = time::Duration::from_millis(250);

    loop {
        println!("{}", seconds_so_far);
        thread::sleep(delay);
        seconds_so_far += 0.25;
    }
}

Let's try and run the project to see what will happen when we interrupt it once it's going..

$ cargo run
   Compiling seconds_so_far v0.1.0 (file:///home/tim/Documents/Articles/Seconds%20So%20Far/seconds_so_far)
     Running `target/debug/seconds_so_far`
0  
0.25  
0.5  
0.75  
1  
1.25  
^C

Wow. Severe anti-climax. Pretty much what one might expect really.. the program exited.

Interestingly for me (as a Python programmer), this doesn't give me much information to go by if I wanted to handle the signal in my code.

Looking at the best issue ticket I could find, it looks like signal handling in the stdlib is currently up in the air. It seems that the Rust team is waiting for third party libraries to play around with APIs and then whatever settles will make its way into the language.

]]>
/if-rust-doesnt-have-exceptions-what-happens-on-ctrlc/c2761366-ed41-4cd2-8d81-5997c5fdf87aSat, 15 Oct 2016 07:57:00 GMT
<![CDATA[about tim]]>Tim is a senior data engineer from New Zealand, responsible for large scale text mining, distributed processing and machine learning pipelines. He is an experienced programmer with a deep interest in natural language processing, text mining and wider forms of machine learning and artificial intelligence.

Tim is the vice president of the New Zealand Open Source Society, the treasurer of the NZ Python Users Group Inc, and the Sandstorm New Zealand community lead. He believes in building a more secure, less centralized Internet where individuals and families are able to have autonomy, freedom and privacy.

As an advocate of open access to government information and data, he is a leader in New Zealand's open government data movement. Among other achievements, he was an invited contributor to the Declaration on Open and Transparent Government during its development.

When working at the University of Auckland, he became the first New Zealand Software Carpentry instructor. Software Carpentry aims to improve the software engineering practices of researchers to enable greater reproducibility in contemporary science.

Digital humanitarianism has been a major thread of his personal and professional life. While most active in responding to disasters during 2009-2011, he continues to be an active member of the Humanitarian OpenStreetMap community. He hopes to use deep learning to build street maps from satellite imagery in under-represented areas.

If you would like to learn more about Tim, his sporadically updated website contains a number of blog posts and digital artwork: http://timmcnamara.co.nz.

]]>
/about-tim/c996b385-d4d1-4ef3-9d89-ce40281f47a8Thu, 29 Sep 2016 21:06:00 GMT
<![CDATA[Welcome to Ghost]]>You're live! Nice. We've put together a little post to introduce you to the Ghost editor and get you started. You can manage your content by signing in to the admin area at <your blog URL>/ghost/. When you arrive, you can select this post from a list on the left and see a preview of it on the right. Click the little pencil icon at the top of the preview to edit this post and read the next section!

Getting Started

Ghost uses something called Markdown for writing. Essentially, it's a shorthand way to manage your post formatting as you write!

Writing in Markdown is really easy. In the left hand panel of Ghost, you simply write as you normally would. Where appropriate, you can use shortcuts to style your content. For example, a list:

  • Item number one
  • Item number two
    • A nested item
  • A final item

or with numbers!

  1. Remember to buy some milk
  2. Drink the milk
  3. Tweet that I remembered to buy the milk, and drank it

Want to link to a source? No problem. If you paste in a URL, like http://ghost.org - it'll automatically be linked up. But if you want to customise your anchor text, you can do that too! Here's a link to the Ghost website. Neat.

What about Images?

Images work too! Already know the URL of the image you want to include in your article? Simply paste it in like this to make it show up:

The Ghost Logo

Not sure which image you want to use yet? That's ok too. Leave yourself a descriptive placeholder and keep writing. Come back later and drag and drop the image in to upload:

Quoting

Sometimes a link isn't enough, you want to quote someone on what they've said. It was probably very wisdomous. Is wisdomous a word? Find out in a future release when we introduce spellcheck! For now - it's definitely a word.

Wisdomous - it's definitely a word.

Working with Code

Got a streak of geek? We've got you covered there, too. You can write inline <code> blocks really easily with back ticks. Want to show off something more comprehensive? 4 spaces of indentation gets you there.

.awesome-thing {
    display: block;
    width: 100%;
}

Ready for a Break?

Throw 3 or more dashes down on any new line and you've got yourself a fancy new divider. Aw yeah.


Advanced Usage

There's one fantastic secret about Markdown. If you want, you can write plain old HTML and it'll still work! Very flexible.

That should be enough to get you started. Have fun - and let us know what you think :)

]]>
/welcome-to-ghost/2311db5c-1221-4024-8eca-02ce31f246d1Fri, 05 Sep 2014 05:25:09 GMT