Friday, August 10, 2018

How Ballerina does I/O

Recently I came across this article, which I found rather fascinating, and thought of comparing how Ballerina does I/O with the other programming languages covered in it.

If you have not read the article, I encourage you to read it before reading the rest of this post, since the concepts explained there will not be reiterated here.


Author's Summary of the Article



The article mainly compared the existing I/O capabilities provided by different programming languages (PHP, Java, Node.js and Go). The following table illustrates how the author summarized his findings.


The author compares several programming languages on their I/O capabilities and stresses three things: support for multi-threading rather than creating a process per request, non-blocking I/O, and ease of use. By ease of use he means letting developers write straightforward code instead of managing callbacks, which some languages introduced as a side effect of their non-blocking functionality.

The author concludes that "Golang" provides most of the characteristics a programming language needs to perform I/O operations, compared with the other languages he evaluated.

Since Golang came out on top, I thought of further evaluating the pros and cons of Golang to really identify its capabilities and how it is perceived more widely.

Further Evaluation 


Going beyond the boundaries of the article, I came across the following post, which describes some of the pain points developers are experiencing in Golang:

- Lack of vectored I/O, which means operations such as scatter-gather are not supported.
- Each time an I/O read is performed, a buffer is allocated first, before the sys call is made. However, this does not guarantee that the read will return with content filled into the allocated buffer.
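
To make the first point concrete, here is a minimal sketch of what scatter-gather looks like where vectored I/O is available. It is written in Python rather than Go or Ballerina, uses a pipe purely for illustration, and assumes a Unix-like system where `os.readv` and `os.writev` exist. The function name and the header/body layout are hypothetical.

```python
import os


def scatter_gather_demo():
    """Write two buffers with one writev call, then read them back with one readv call."""
    read_fd, write_fd = os.pipe()
    try:
        header = b"HDR1"
        body = b"payload"

        # Gather: a single system call writes both buffers, in order.
        written = os.writev(write_fd, [header, body])

        # Scatter: a single system call fills both buffers, in order.
        header_buf = bytearray(len(header))
        body_buf = bytearray(len(body))
        read = os.readv(read_fd, [header_buf, body_buf])

        return written, read, bytes(header_buf), bytes(body_buf)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

Without vectored I/O, the same result needs either one system call per buffer or an extra copy into a single contiguous buffer.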

To elaborate, say you're attempting to read content from a socket. When attempting to read, you expect content to be present (obviously.. duh).

There are several approaches to reading content. The post explains that in Golang a buffer is allocated first, and then a read() sys call is made. As stated above, bytes might not be present at the time of the read; if there are no bytes, the read returns empty. However, the allocated buffer stays allocated, which means an increased memory footprint.

So is there an alternative? How can you allocate a buffer and perform the sys call only if content is present? I will explain later in this post how Ballerina does this. Again, if you have not read the article, I encourage you to do so. The author explains how the sys calls work, which I will not repeat.
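
The allocate-first pattern can be sketched as follows. This is a Python analogue of the pattern the post describes, not Go's actual runtime behaviour; the function names and the 64 KiB buffer size are assumptions made for illustration.

```python
import socket


def read_with_upfront_buffer(sock, size):
    """Allocate first, then attempt the read. Returns (buffer, bytes_read)."""
    buf = bytearray(size)  # memory is committed before we know any data exists
    try:
        n = sock.recv_into(buf)
    except BlockingIOError:
        n = 0  # nothing was available, yet the buffer is still held
    return buf, n


def upfront_buffer_demo():
    receiver, sender = socket.socketpair()
    receiver.setblocking(False)
    try:
        # No data has been sent yet: the read comes back empty,
        # but a 64 KiB buffer was allocated anyway.
        buf, n = read_with_upfront_buffer(receiver, 64 * 1024)
        empty_attempt = (len(buf), n)

        sender.sendall(b"hello")
        buf, n = read_with_upfront_buffer(receiver, 64 * 1024)
        return empty_attempt, bytes(buf[:n])
    finally:
        receiver.close()
        sender.close()
```

With many idle connections, each holding a buffer like this while waiting for data, the wasted memory adds up.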

Ballerina I/O Architecture 





Canonicalization


Shown above is an illustration of Ballerina I/O architecture.

I/O, or Input/Output, is about how a program reads data for processing or writes data after processing. I wouldn't be wrong to say every program involves an I/O operation in some way or another: you read from a file and process it, write some content to a socket, or perhaps do some calculation and write the result to the standard output. At the end of the day you would mainly see the following attributes involved.

# I/O sources - these could be files, sockets, devices etc. Basically, the entities you use to gather or place the information your program needs for processing.
# Information - this is mainly delivered to a program as bytes, which is the standard way to represent information.
# Interpretation - information, or a sequence of bytes (1 or more), can be interpreted differently. In other words, a byte is basically a number. This number could represent a character in an alphabet, a pixel in an image, a minute detail of an audio clip, a state in a traffic light etc. Once a byte is read from or written to an I/O source, the program interprets its value to perform its processing. These interpretations are commonly characters, records etc.

Interpretation can be independent of its source. Whether you get the information from a socket, a file or the standard input doesn't matter, as long as it's the same information. The Ballerina I/O architecture revolves around this principle.

So what advantage does this provide?

Say, as a developer, you write a program that reads from a file and processes the content. Then suddenly you're told that the very same content should be read over the network. In this case the source changes from file to socket, but the information and the interpretation stay the same. So in Ballerina all you need to change is the source, and the rest of the program keeps working. You will see how once you look at the semantics/syntax it provides.
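
The idea can be sketched in a few lines. The example below is in Python, not Ballerina, and only illustrates the principle of keeping the interpretation separate from the source; `read_lines`, `demo_sources` and the sample payload are hypothetical names and data.

```python
import os
import socket
import tempfile


def read_lines(byte_source):
    """Interpretation step: treat the bytes as UTF-8 text lines.

    Works on any binary file-like object, regardless of where the bytes come from.
    """
    return [raw.decode("utf-8").rstrip("\n") for raw in byte_source]


def demo_sources():
    payload = "id,name\n1,Ballerina\n".encode("utf-8")

    # Source 1: a file on disk.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.csv")
        with open(path, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            from_file = read_lines(f)

    # Source 2: a socket. Only the way the source is opened changes.
    writer, reader = socket.socketpair()
    writer.sendall(payload)
    writer.close()  # closing the writer signals end of stream to the reader
    with reader, reader.makefile("rb") as stream:
        from_socket = read_lines(stream)

    return from_file, from_socket
```

Both calls go through the same `read_lines` function; swapping the file for the socket touches only the code that opens the source.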


Syntax


You could try the samples available on the site to get a broader sense of how to use the I/O APIs. The following pseudo code shows what the I/O APIs look like.

function main(string... args) {
         //Initialize channel/I/O source
         io:ByteChannel sourceChannel = io:openFile(filePath, permission);
         //Read from the channel
         var result = sourceChannel.read(numberOfBytes);
         //Write to the channel
         sourceChannel.write(..);
}

So basically, when you write a program and then find that the I/O source needs to change from file to socket, all you need to change is how the ByteChannel is initialized. In Ballerina, all I/O sources are represented as a ByteChannel.

Going back to the beginning of the post, here are some highlights of how Ballerina compares with the other programming languages covered in the article.

# Async model -

Ballerina uses a threading model with a controlled number of I/O threads and reduced context switches, based on NIO (non-blocking I/O), which is described further in this post.

# Ease of use -

As you can observe, there are no callbacks. For the developer, the non-blocking behaviour is transparent, which makes Ballerina as easy to use as the languages that top the list for ease of use.

As I mentioned at the beginning of the post, let me illustrate the approach Ballerina uses to read content while avoiding the additional memory footprint.

Under the Hood (How it works)


Say you're reading content from a socket. The functional flow looks like the following,



To illustrate using the syntax shown earlier in the post, consider what happens when program execution hits the following line:

var result = channel.read(numberOfBytes);

The steps are as follows:

1. When read is called, an event is registered with a selector (a multiplexer that captures events from the kernel and notifies accordingly). At this point no buffer is allocated and no thread is hung or blocked internally. The event waits until it is notified by the kernel.
2. When the specific I/O source writes data to the kernel buffer, the kernel notifies the selector in user space that data is ready to be read from the buffer for the given channel.
3. At this point it is certain that data is available to be read, so the selector manager dispatches a read event to the IOThreadPool (a dedicated thread group that handles I/O operations in Ballerina). Only at this point does Ballerina allocate a buffer.
4. Data is read into the newly allocated buffer and, once the operation is complete, the next LoC is executed. The entire operation is natively non-blocking, but the developer does not need to struggle with callbacks.
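
The steps above can be sketched as follows. This is a simplified, single-threaded Python illustration built on the standard `selectors` module, not Ballerina's actual implementation: here the calling thread waits in `select`, whereas Ballerina hands the read off to its I/O thread pool. The function names, the timeout parameter and the 4096-byte limit are assumptions.

```python
import selectors
import socket


def read_when_ready(selector, sock, max_bytes, timeout):
    """Register interest, wait for readiness, and only then allocate and read."""
    selector.register(sock, selectors.EVENT_READ)  # step 1: no buffer exists yet
    try:
        events = selector.select(timeout)          # step 2: kernel reports readiness
        if not events:
            return None                            # no data, and nothing was allocated
        buf = bytearray(max_bytes)                 # step 3: allocate only now
        n = sock.recv_into(buf)                    # step 4: read into the new buffer
        return bytes(buf[:n])
    finally:
        selector.unregister(sock)


def selector_demo():
    receiver, sender = socket.socketpair()
    receiver.setblocking(False)
    selector = selectors.DefaultSelector()
    try:
        # Nothing sent yet: a zero timeout polls once and returns without allocating.
        before = read_when_ready(selector, receiver, 4096, timeout=0)

        sender.sendall(b"ping")
        after = read_when_ready(selector, receiver, 4096, timeout=1.0)
        return before, after
    finally:
        selector.close()
        receiver.close()
        sender.close()
```

Compared with allocating before the read, the buffer here only exists once the kernel has confirmed there is something to put in it.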


Summary 


The post mainly compared some of the factors used to evaluate the I/O APIs offered by different programming languages. The comparison points were derived from the article. Next, some of the pain points of existing I/O models were discussed. Finally, the Ballerina I/O architecture was measured against the comparison points from the article, with a further explanation of how Ballerina addresses some commonly known limitations that even the winner named in the article faces.

I encourage you to try Ballerina out yourself.