#programming #programming languages
8 January 2013
Many people I have met over the years of my professional career as a software engineer seem to vastly overrate programming languages. Every now and then I see a battle of this language against that language, a flame war and a shit storm. To what end - I don’t know.
For me, languages are tools, no more, no less. They are tools for humans to formulate repeatable problem solutions in a way that a computer and other humans can deal with. Some languages fit one solution better, some fit another. Every ecosystem has its own benefits and drawbacks. But no language by itself helps you find the solution to the problem in question. And that’s the crucial point of engineering: finding solutions. Programming, in essence, is the selection or creation of algorithms, not simply coding.
I never made a big deal about the language I use, and I have used a lot of them. If I have to learn another one, I do it, simple as that. Sure, I have a sense for the beauty or smartness of a particular language, but at the end of the day I try to choose the right tool for my particular solution. And that’s what I care about the most: the solution, the algorithm. This is, by the way, also the right starting point for optimization. I have even helped optimize code in languages I had never used before, just by analyzing the algorithmic complexity of the particular solution, an approach many programmers seem to forget when discussing questions like which language is the fastest.
It’s rather similar to what Nietzsche, who spoke a number of contemporary and ancient languages fluently, once said: it does not matter how many languages you know or do not know; what counts is whether you have something to say in the first place. This is why I have spent more time reading papers and books about algorithms than the latest fancy “Programming in XYZ”. If you have something to say, you will find a way to say it. If you have not, no language can help.
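To make the point about algorithmic complexity concrete, here is a minimal sketch in Go. It is not taken from any of the projects mentioned above; the function names are made up for illustration. Both functions answer the same question (does a slice contain a duplicate?), but the first compares every pair of elements while the second remembers what it has already seen.

```go
package main

import "fmt"

// hasDuplicateQuadratic compares every pair of elements: O(n^2) comparisons.
func hasDuplicateQuadratic(xs []int) bool {
	for i := 0; i < len(xs); i++ {
		for j := i + 1; j < len(xs); j++ {
			if xs[i] == xs[j] {
				return true
			}
		}
	}
	return false
}

// hasDuplicateLinear remembers seen values in a set: O(n) expected time,
// paid for with O(n) extra memory.
func hasDuplicateLinear(xs []int) bool {
	seen := make(map[int]struct{}, len(xs))
	for _, x := range xs {
		if _, ok := seen[x]; ok {
			return true
		}
		seen[x] = struct{}{}
	}
	return false
}

func main() {
	xs := []int{3, 1, 4, 1, 5}
	fmt.Println(hasDuplicateQuadratic(xs), hasDuplicateLinear(xs))
}
```

The same two variants can be written in nearly any language, and for large inputs the gap between them will usually dwarf the gap between languages.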
2 August 2012
These days I’m taking my first steps in the Go programming language. Today I wrote my web server hello world and I’m really excited. All I had to do was write these lines of code.
package main

import (
	"flag"
	"fmt"
	"log"
	"net/http"
	"runtime"
	"strconv"
)

func main() {
	// command line arguments
	port := flag.Int("port", 8000, "http port")
	maxprocs := flag.Int("maxprocs", 1, "GOMAXPROCS")
	flag.Parse()

	// limit how many OS threads may execute Go code at the same time
	runtime.GOMAXPROCS(*maxprocs)

	// http handler, registered on the default mux
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Hello, %s", "world")
	})

	// http server; with no Handler set it serves the default mux
	s := &http.Server{
		Addr: ":" + strconv.Itoa(*port),
	}
	// ListenAndServe only returns on failure, so report the error
	log.Fatal(s.ListenAndServe())
}
Compilation took less than a second, then I could start the server:
$ bin/web -port=8000 -maxprocs=4
send some requests:
$ ab -c 100 -n 1000000 http://localhost:8000/
and this is what ab told me about performance:
Requests per second: 23454.77 [#/sec] (mean)
During that quick test, the web server allocated about 8MB of resident memory, while ab needed 34MB.
Sure, this thing does nothing useful. And sure, I can reach similar numbers with an embedded Jetty or a POCO C++ embedded web server, but all I had to do to get this to work was install Go, write that small code snippet and compile. No Maven XML, no painful dependency resolution, just some calls into the standard library and that’s it. Moreover, I have a single statically linked binary, 3.8MB in size, that I can deploy to whatever 64-bit Linux box I want.
This is practically amazing.
3 July 2012
In a blog post I recently read, Conrad Irwin extends the well-known MVC pattern to a MOVE pattern (http://cirw.in/blog/time-to-move-on). While his criticism of MVC hits the mark (“… but the problem with MVC as given is that you end up stuffing too much code into your controllers …”), I cannot see what MOVE does better. It seems to be slightly more complex - and that is exactly what I would criticize.
What if we simplify things? All we need to separate the UI from data and processing in a stateful application are Views and a State Machine. And that is all MVC really is at its very core. We have a state machine with transitions initiated by events from the outside world and some kind of frontier or surface or membrane between that world and the state machine. That surface is what we know as the View.
The view is a window into state with a given perspective. Not everything may be visible through a given window; some state may be hidden, but moving over the surface will eventually expose all state. At some points on the surface we have sensory cells (those buttons) that fire events into the state machine when touched, causing transitions. And that’s what most MVC applications already do.
The perception of overburdened controllers (and often views as well) is in essence caused by the fact that controllers simply don’t exist. Partly they are elements of the “M” - for Machine - and partly elements of the “V” for View. And if you want a clean architecture based on a well understood computational structure, all you have to do is accept the fact of having a State Machine and Views of state and nothing more.
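A minimal Go sketch of this idea, under my own assumptions: the names State, Event, Machine and View and the door example are invented for illustration and come neither from MVC nor from MOVE. The machine owns all state and transitions, the views are just functions that render the state from a given perspective, and events are the "button presses" arriving from outside.

```go
package main

import "fmt"

// State and Event are illustrative names, not taken from any framework.
type State string
type Event string

// Machine is a small state machine: a current state plus a transition table.
type Machine struct {
	Current     State
	Transitions map[State]map[Event]State
}

// Fire applies an event coming from the outside world. It reports whether a
// transition was defined for the current state; otherwise the state is kept.
func (m *Machine) Fire(e Event) bool {
	next, ok := m.Transitions[m.Current][e]
	if !ok {
		return false
	}
	m.Current = next
	return true
}

// View is a window into the state with a given perspective.
type View func(s State) string

// newDoor builds a hypothetical door machine used as the running example.
func newDoor() *Machine {
	return &Machine{
		Current: "closed",
		Transitions: map[State]map[Event]State{
			"closed": {"open": "open", "lock": "locked"},
			"open":   {"close": "closed"},
			"locked": {"unlock": "closed"},
		},
	}
}

// detailView exposes the full state.
func detailView(s State) string { return "door is " + string(s) }

// passageView hides the difference between closed and locked.
func passageView(s State) string {
	if s == "open" {
		return "passable"
	}
	return "blocked"
}

func main() {
	door := newDoor()
	views := []View{detailView, passageView}
	// Each event plays the role of a button press on the surface.
	for _, e := range []Event{"lock", "open", "unlock", "open"} {
		fired := door.Fire(e)
		for _, v := range views {
			fmt.Printf("%s (fired=%v): %s\n", e, fired, v(door.Current))
		}
	}
}
```

Note that there is no controller type anywhere: deciding what an event means lives in the transition table, and deciding what to show lives in the views.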
BTW I deliberately omitted the “F” of the (F)SM since this might be a subject of discussion here, despite the fact that “infinite” is really huge in the world of computation.
12 January 2012
If you ever have to deal with really huge data that does not fit in RAM anymore, but you still need a consistent interface and efficient handling - and you do not have the time to write it yourself, as usual - have a look at STXXL. It has some downsides, but all in all it works well and efficiently. For me it was the fastest way - both in runtime and in implementation time - to deal with data in the TB range.
You may find it here: http://stxxl.sourceforge.net
#amqp #cluster #distributed computing #rabbitmq #erlang
25 November 2011
Clustering RabbitMQ is very easy - if you know how. Unfortunately, the documentation on this topic is good but not good enough (cf. RabbitMQ Clustering). If you try it, you may get lost along the way until you find some insightful posts on the mailing list. This is why I summarize here how I got it to work.
Say you want to create a cluster with two disc nodes and two ram nodes. If you do this on at least two machines, each hosting a disc node and a ram node, you achieve good fault tolerance and good scalability with a single setup. Your clients may connect to the ram nodes only, either directly or through an additional load balancer.
But, how do I make a node a disc node and another node a ram node?
There is no command like “rabbitmqctl mkdisc” and there is no related configuration option. On the one hand, this is a little counterintuitive; on the other hand, it adds a lot of flexibility, since you may alter the roles of nodes and restructure your cluster on the fly whenever necessary.
The roles are assigned by the way you call the “rabbitmqctl cluster” command. In our scenario, we have multiple nodes on the same host, so we need to wrap the calls to “rabbitmqctl” in shell scripts that set some environment variables (cf. RabbitMQ Configuration). Once this is done, make sure all nodes of the cluster are running. Then execute the sequence “stop_app”, “reset”, “cluster”, “start_app” on every node. For the “cluster” command, pass a space-separated list of all disc nodes you want to create, and pass that same list on every node. A node that appears in the list itself becomes a disc node; any other node becomes a ram node. My mnemonic for this is that you copy the current node to all disc nodes. The whole sequence may look like this, with “rbctl.*” being your wrapper scripts:
host-of-disc1$ rbctl.disc1 stop_app
host-of-disc1$ rbctl.disc1 reset
host-of-disc1$ rbctl.disc1 cluster disc1@host-of-disc1 disc2@host-of-disc2
host-of-disc1$ rbctl.disc1 start_app
host-of-ram1$ rbctl.ram1 stop_app
host-of-ram1$ rbctl.ram1 reset
host-of-ram1$ rbctl.ram1 cluster disc1@host-of-disc1 disc2@host-of-disc2
host-of-ram1$ rbctl.ram1 start_app
host-of-ram2$ rbctl.ram2 stop_app
host-of-ram2$ rbctl.ram2 reset
host-of-ram2$ rbctl.ram2 cluster disc1@host-of-disc1 disc2@host-of-disc2
host-of-ram2$ rbctl.ram2 start_app
host-of-disc2$ rbctl.disc2 stop_app
host-of-disc2$ rbctl.disc2 reset
host-of-disc2$ rbctl.disc2 cluster disc1@host-of-disc1 disc2@host-of-disc2
host-of-disc2$ rbctl.disc2 start_app
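Since the sequence is the same four steps for every node, it can also be generated. The following Go sketch only builds and prints the command lines, it does not execute anything. The Node type and the clusterCommands function are my own illustration; the “rbctl.<name>” wrappers and the “host-of-<name>” hosts are simply the naming used in the listing above.

```go
package main

import (
	"fmt"
	"strings"
)

// Node describes one broker node. Name and Host follow the naming used in
// the listing (e.g. Name "disc1" on Host "host-of-disc1").
type Node struct {
	Name string
	Host string
}

// clusterCommands returns the stop_app, reset, cluster, start_app sequence
// for node n. The cluster step always receives the full list of disc nodes.
func clusterCommands(n Node, discNodes []Node) []string {
	targets := make([]string, len(discNodes))
	for i, d := range discNodes {
		targets[i] = d.Name + "@" + d.Host
	}
	// Prompt plus wrapper script, e.g. "host-of-ram1$ rbctl.ram1 ".
	prefix := n.Host + "$ rbctl." + n.Name + " "
	return []string{
		prefix + "stop_app",
		prefix + "reset",
		prefix + "cluster " + strings.Join(targets, " "),
		prefix + "start_app",
	}
}

func main() {
	discs := []Node{{"disc1", "host-of-disc1"}, {"disc2", "host-of-disc2"}}
	all := []Node{discs[0], {"ram1", "host-of-ram1"}, {"ram2", "host-of-ram2"}, discs[1]}
	for _, n := range all {
		for _, line := range clusterCommands(n, discs) {
			fmt.Println(line)
		}
	}
}
```

Changing which nodes are disc nodes then only means changing the discs slice, which mirrors how the roles follow from the arguments of the “cluster” command.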
If you have to add users, vhosts and permissions, you had better do it at the end of this procedure, otherwise the “reset” will delete all of this information. Also, if you want to change the cluster setup later, be careful with “reset” and omit it for at least one disc node.
Another weak point of the whole clustering business is the location of the “.erlang.cookie” file. This file is essential for clustering and must have the same content on all nodes in the cluster. The documentation says RabbitMQ looks at “/var/lib/rabbitmq/.erlang.cookie”, but I found that this is not always true. Assuming RABBIT_HOME points to the directory where the rabbit distribution is located, I copied the file to “$RABBIT_HOME/../.erlang.cookie” and RabbitMQ used that one. I’m not quite sure whether this is a general rule.
22 November 2011
Very interesting paper: The Anatomy of the Facebook Social Graph
Reading this very interesting analysis of the structure of the Facebook graph, I come to the conclusion that the main features may be qualitatively common to all graphs of social networks. For example, I have seen another social graph, one order of magnitude smaller, with very similar features. It would be really interesting to compare such graphs to other graphs, especially directed social graphs like Twitter or Google+, in terms of different measures. It would also be interesting to compare these graphs to offline social networks. Another interesting question would be how such graphs evolve over time, analyzed for example with methods from the theory of random graphs.
Powered by Tumblr; designed by Adam Lloyd and Ingo Schramm.