Node JS for backend programmers

I started working in a new and exciting start up, and the thought of using Java for development crushed my soul, so I began shopping around for a fast-development, fast-deployment, easy ramp up and heavily supported language. I wanted something that will be easy for new developers to learn, and most importantly, something that I’ll have fun writing in.

NodeJS was definitely not my first (or my second, for that matter) option: I’m a backend kind of guy. I’ve done some web, but I would hardly claim expertise in web and UI. When I’ve heard of NodeJS, which was described to me as “Javascript for backend”, I was hardly excited about it. I’ve had some experience with JS, and the experience has hardly made me want to keep using it, let alone write my backend with it — it had none of the order or the elegance of what I used to think of as “Real languages”.

However, since the name NodeJS kept coming up from people and colleagues whose opinions I really appreciate, I’ve decided I can’t keep ignoring it and I have to look into it before I dismiss it.

To make a long story short, I now use NodeJS as part of the production environment, and I find it flexible, easy to deploy and very well supported by the community. However, it doesn’t fit everything and everyone — specially if you have a cpu intensive app.

Bottom line — It’s a very efficient functional language with non-blocking I/O that runs on a single thread, on a single cpu (but you can start several apps and communicate between them) and with a large and thriving community.

So, what is NodeJS, anyway?

Basically, Node JS is frameworks designed around an event-driven, non-blocking I/O model, based on Javascript with I/O libraries written in C++. There are 2 main reasons why JavaScript was chosen as the programming language of NodeJS:

  • Google’s V8 engine (the javascript interpreter in chrome)
  • The fact that it had no I/O modules, so the I/O could be implemented a new to support the goal of non-blocking I/O.

The fact that it’s running on Google’s V8 also ensures that as long as google keeps improving Chrome’s performances, NodeJS platform will keep advancing as well.

What does it do, and how?

Well, from my POV, the main deal with node is that everything is non-blocking, except your code. What does it mean? It means that every call an I/O resource — disk, db, web resource or what have you is non-blocking. How can it be non blocking, you ask? everything is based on callbacks.

So for example, if you’re working with mongodb, this is how a query looks like:

var onDBQueryReturn = function(err, results) { 
   console.log(“Found “ + JSON.stringify(results) + “ users”);
   console.log(“Calling query”);
   db.usersCollection.find({‘_id’: “1234”}, onDBQueryReturn);
   console.log(“Called query”);
}

The output will be:

Calling queryCalled queryFound {“_id”: “1234”, “name”: “User1”}

Now, this might make perfect sense for people who are experienced with functional programming, but it’s kind of a strange behaviour for people accustomed to Java, for example. In node (much like in other functional programming languages), a function is a first level object, just as a string — so it can be easily passed as an argument. What happens here is that when the call to the DB is completed, the callback function is called and executed.

The main thing to remember is that node js runs on a single cpu*. Because all it’s I/O intensive calls are non-blocking, it can gain high efficiency managing everything else on a single cpu, because it never hangs waiting on I/O.

How is it like to write backend app with Nodejs?

In one word — easy. In several words — easy until you get to parallelism .

Once you get the hang of the weird scoping issues of Javascript, Node can be treated like any other language, with a non-customary-yet-powerful I/O mechanism. Multi threading, however, is not a simple as in other languages — but keep in mind that the concept of non-blocking I/O has made the need for multithreading much less necessary than you usually think.

In addition, Node has a very thriving community — which means that everything you could possible want has already been developed, tried and honed: You can use the npm — Node Package Manager (which is Node’s equivalent of the maven repo, I would say) for easy access to everything: One of the most interesting pages is the Most depended upon modules.

Multithreading in Node

There is nothing parallel in node. It might look like a major quirk, but if you think about it, given that there’s no I/O blocking, the decision to make it single threaded actually makes everything a lot easier — You don’t really need multithreading for the kind of apps Node was designed for if you don’t hang on I/O, and it relieves you from the need to think about parallel access to variables and other object.

Inspite all the above, I feel strange when I can’t run things in parallel, and sometimes it easy kind of limiting, which is why Node comes with cluster capabilities — i.e, running several node processes and communicating between them on sockets — but it’s still experimental . Other options are using the fork(), exec() and spawn(), which are all different implementations of ChildProcess.

Should I use it or not?

The short answer is, as always, it depends.

If you feel at home with functional programming and you’re running an I/O intensive app — which is the classic use case for a website — then by all means, do. The community is very vibrant and up to date, deployment is a spree (specially with hosting services like Nodejitsu and codeship.io)

If you’re running a cpu-intensive application, if you don’t trust dynamic typing, if you don’t like functional programming — don’t use it. Nice as it is, I don’t think it offers anything that couldn’t be achieved using Ruby or Python (or even scala, for that matter).

Few last tips

(in no special order)

IDE

IDEs are a kind of religion, but I’ve been using WebStorm and I like it a lot. It’s 30-days trial and a $50 for a personal licence (or $99 for a commercial one), and I think they even provide them for free for open-source projects.

Testing

It’s very easy to make mistakes, especially in dynamic language. Familiarize yourself with Mocha unit testing, and integrate it into you project from day 1.

Coordinating tasks

Sometimes you want to run several processes one after another, or you might want to run a function after everything is done, and so forth. There are several packages for that exact purpose, but the most widely used and my personal favourite is Async

More resources

Understanding the node.js event loop
Multiple processes in nodejs and sharing ports
Multithreaded Node.JS

AWS Redundant Docker logs

Our AWS, ElasticBeanstalk deployed services are running with Docker.

Our dockerrun.aws.json defines a both the logpath:

"LogPath": "/var/lib/docker/containers/<containerId>/<containerId>-json.log",
"LogConfig": {
        "Type": "json-file",
        "Config": {
            "max-file": "4",
            "max-size": "250m"
        }
    },
}

And indeed, the log files are there, and they rotate according to size (250m) and max of 4 files.

However, we have more logs files, which is not totally clear who is writing, which are found at /var/log/containers/ – and they are casuing us troubles, because they can grow to huge sizes, and chocke up the host.

These are not the files which are read when running docker logs <containerId>, because I’ve deleted them and still the docker logs works. When I delete the files at /var/lib/docker/containers/, on the other end, the docker logs returns empty.

Apple Provisioning Hell

We’re developing an ionic app for iOS and android, and handling the Apple provisioning/certificates/apn profile is hell. Just pure hell.

One of them most annoying messages is “Your account already has a valid iOS distribution certificate”. If my account already has a valid iOS dist. cert, why won’t you just get it and build the freaking thing?

After wasting craploads of time on the nonsese, I finally found Sigh. They had me at the first sentence:

Because you would rather spend your time building stuff than fighting provisioning

Just install it: sudo gem install sigh

And then run it: sigh

And that’s it. It will get, fix, download and install all the provisioning profiles, and you’ll save yourself hours of tweaking around with apple’s weird licensing logic.

Sigh is also a part of an open-source toolbox called fastlane, to easily facilitate iOS development, testing, installation and deployment.

Monitoring CloudWatch statistics using Grafana, InfluxDB and Telegraf

We’ve started checking out monitoring solutions for our AWS-based infrastructure, and we want it to be not-that-expensive, monitor infrastructure (cpu, I/O, network…) and Application statistics

We’ve looked into several options, and we’re currently narrowing it down to Grafana-InfluxDB-Telegraph.

The idea is as following: Use Telegraf to pull CloudWatch statistics from amazon, save them into the InfluxDB, and use Grafana to present these statistics and manage alerts and notifications.

(Why not just use the Grafana CloudWatch plugin? Because it doesn’t supoort notification, sadly )

Set up the environment

To test everything, we’ve set up a docker env:

Create a network

docker network create monitoring

The Grafana docker

docker run -d -p 3000:3000 --name grafana --net=monitoring -v $PWD:/var/lib/grafana -e "GF_SECURITY_ADMIN_PASSWORD=secret" grafana/grafana

The Influx docker

docker run -p 8086:8086 -d --name influxdb --net=monitoring -v $PWD:/var/lib/influxdb influxdb

Important! add 127.0.0.1 influxdb to your hosts file (see Sanities for durther explanation)

The Kapacitor docker

We’re running it with — net=container:influxdb docker run -p 9092:9092 -d --name=kapacitor -h kapacitor --net=monitoring -e KAPACITOR_INFLUXDB_0_URLS_0=http://influxdb:8086 $PWD/kapacitor.conf:/etc/kapacitor/kapacitor.conf:ro kapacitor

The telegraph docker

First we need to generate a config file for our needs, so: docker run --rm telegraf --input-filter cloudwatch --output-filter influxdb config > telegraf.conf

And then we need to fix the region, credentials, and so on (not a lot) Then run the docker:

docker run -d --name=telegraf --net=monitoring -v $PWD/telegraf-aws-influx.conf:/etc/telegraf/telegraf.conf:ro telegraf

and

docker logs -f telegraf

Let’s monitor!

So we have all the services up — grafana, influxDB and telegraf. By now, telegraf should be pulling data from aws cloudwatch, and storing them inside influxDB. So now we need to hook up grapfana into to that data stream!

Create a new DataSource from your InfluxDb, with db = telegraf (you’ll have to input it in the DataSource page) and call influx_cloudwatch

Create a new Dashboard with your influx_cloudwatch data source, and create a new Graph.

Entities probelm

As you might have notice, we now have all these matrics, but we have a problem: We want to monitor our performance by application, and most of the data is available to us with only instanceIds (and these are not fixed, because we use ElasticBeanstalk).

Some of the data measurements, like AWS/ECS is available with clusterName tag which is a bit similar to our application name (“awseb-${appName}-rAnd0mNum”), but the AWS/EC2 instances only come with AutoscalingGroup tag, which is by not very indicative to our application names(awseb-r-x5r778aw-stack-AWSEBAutoScalingGroup-123Z3RRAFF86). So, we need to find a normal way to add the application name to both the EC2 data and the ECS data, so we could build something that makes sense.

So we’re using Kapacitor:

To add script: kapacitor define ${scriptName} -type stream -tick ${scriptFileName.tick} -dbrp kapacitor_example.autogen

To write to the db: https://docs.influxdata.com/kapacitor/v1.3//nodes/influx_d_b_out_node/

Let’s Alert!

In order to add alerts to our graphs, we first need to add Alert channels. In Grafana, go to Alerting, and add an alerting channel. The easiest one imho is the Telegram alert channel.

Just install telegram (on your local machine, and on your mobile phone), and then go to the BotFather. Create a bot according to the instructions, and after you create it, run /token. It will give you the Bot Api Token. The next thing you need is you chatId. To get that, just go to get_id_bot which will give you your Chat Id.

That’s all you need. Now you can go to one of the graphs, hit ‘Alerts’, and there ‘Notifications’

Sanities

If things don’t seem to work:

First, log in to your Grafana docker and run curl -G 'http://influxdb:8086/query?pretty=true' --data-urlencode "q=SHOW DATABASES" If the query passes, you can access influx db from the grafana docker Now run the same thing from the telegraf docke.

Also, the reason you should add “influxdb” to your hosts file is this: When we use all the dockers in the same network it means they can access each other seamlessly. However, when we open Grafana using our local browser, and try to add influxdb as a Data Source, it is all done on the client — which is the host(!) of the dockers. So, it’s doesn’t know what is “influxdb”. That’s why we add it to the hosts file.