
Fibers and Threads in node.js – what for?


https://bjouhier.wordpress.com/2012/03/11/fibers-and-threads-in-node-js-what-for/

 

I like node.js, and I’m not the only one, obviously! I like it primarily for two things: it is simple and it is very fast. I already said it many times but one more won’t hurt.

Before working with node, I had spent many years working with threaded application servers. This was fun sometimes but it was also often frustrating: so many APIs to learn, so much code to write, so many risks with critical sections and deadlocks, such a waste of costly system resources (stacks, locks), etc. Node came as a breath of fresh air: a simple event loop and callbacks. You can do a lot with so little, and it really flies!

But it does not look like we managed to eradicate threads. They keep coming back. At the beginning of last year Marcel Laverdet opened Pandora’s box by releasing node-fibers: his threads are a little greener than our old ones but they have some similarities with them. And this week the box got wide open as Jorge Chamorro Bieling released threads_a_gogo, an implementation of real threads for node.js.

Isn’t that awful? We were perfectly happy with the event loop and callbacks, and now we have to deal with threads and all their complexities again. Why on earth? Can’t we stop the thread cancer before it kills us!

Well. First, things aren’t so bad because fibers and threads did not make it into node’s core. The core is still relying only on the event loop and callbacks. And it is probably better this way.

And then maybe we need to overcome our natural aversion for threads and their complexities. Maybe these new threads aren’t so complex after all. And maybe they solve real problems. This is what I’m going to explore in this post.

Threads and Fibers

The main difference between fibers and real threads is on the scheduling side: threads use implicit, preemptive scheduling while fibers use explicit, non-preemptive scheduling. This means that threaded code may be interrupted at any point, even in the middle of evaluating an expression, to give CPU cycles to code running in another thread. With fibers, these interruptions and context switches don’t happen randomly; they are in the hands of the programmer, who decides where his code is going to yield and give CPU cycles to other fibers.

The big advantage of the fibers’ explicit yielding is that the programmer does not need to protect critical code sections, as long as they don’t yield. Any piece of code that does not yield cannot be interrupted by other fibers. This means a lot less synchronization overhead.
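
This guarantee can be illustrated with a toy cooperative scheduler (sketched here with modern JavaScript generators standing in for fibers; the names `runRoundRobin` and `makeTask` are mine, not from any library): each task runs uninterrupted until it explicitly yields, so the work between yield points is effectively atomic and needs no locks.

```javascript
// A toy cooperative scheduler: each "fiber" is a generator that runs
// until its next explicit yield point, then goes to the back of the queue.
function runRoundRobin(tasks) {
  var queue = tasks.map(function (t) { return t(); });
  while (queue.length) {
    var task = queue.shift();
    if (!task.next().done) queue.push(task);
  }
}

var log = [];
function makeTask(name) {
  return function* () {
    for (var i = 0; i < 2; i++) {
      log.push(name + i); // atomic: no other task can run here
      yield;              // explicit yield point: others may run now
    }
  };
}

runRoundRobin([makeTask("A"), makeTask("B")]);
console.log(log.join(",")); // → A0,B0,A1,B1
```

Note how the two tasks interleave only at the `yield` statements, never in the middle of the loop body.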

But there is a flip side of the coin: threads are fair; fibers are not. If a fiber runs a long computation without yielding, it prevents other fibers from getting CPU cycles. This phenomenon is known as starvation, and it is not new in node.js: it is inherent to node’s event loop model. If a callback starts a long computation, it blocks the event loop and prevents other events from getting their chance to run.
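
The starvation effect is easy to observe with plain node code, no fibers required: a callback that computes without yielding delays every other event, timers included. A minimal, self-contained demonstration:

```javascript
var timerFired = false;
setTimeout(function () { timerFired = true; }, 10);

var start = Date.now();
while (Date.now() - start < 100) {} // busy-wait 100 ms without yielding

// More than 10 ms have elapsed, but the callback still has not run:
// nothing else runs until this synchronous code returns to the event loop.
console.log("timer fired during the computation? " + timerFired); // → false
```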

Also, threads take advantage of multiple cores. If four threads compete for CPU on a quad-core processor, each thread gets 100% (or close) of a core. With fibers there is no real parallelism; at one point in time, there is only one fiber that runs on one of the cores and the other fibers only get a chance to run at the next yielding point.

Fibers – What for?

So, it looks like fibers don’t bring much to the plate. They don’t allow node modules to take advantage of multiple cores and they have the same starvation/fairness issues as the basic event loop. What’s the deal then?

Fibers were introduced and are getting some love primarily because they solve one of node’s big programming pain points: the so-called callback pyramid of doom. The problem is best demonstrated by an example:

function archiveOrders(date, cb) {
  db.connect(function(err, conn) {
    if (err) return cb(err);
    conn.query("select * from orders where date < ?",
               [date], function(err, orders) {
      if (err) return cb(err);
      helper.each(orders, function(order, next) {
        conn.execute("insert into archivedOrders ...", 
                     [order.id, ...], function(err) {
          if (err) return cb(err);
          conn.execute("delete from orders where id=?", 
                       [order.id], function(err) {
            if (err) return cb(err);
            next();
          });
        });
      }, function() {
        console.log("orders have been archived");
        cb();
      });
    });
  });
}

This is a very simple piece of business logic but we already see the pyramid forming. Also, the code is polluted by lots of callback noise. And things get worse as the business logic gets more complex, with more tests and loops.
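
The `helper.each` call in the example above is a hypothetical sequential async iterator, not a real library. A minimal sketch of what such a helper could look like:

```javascript
// A sequential async iterator in the spirit of the hypothetical
// helper.each above: fn(item, next) is called for each item in turn,
// and done() is called once the array is exhausted.
function each(items, fn, done) {
  var i = 0;
  function next() {
    if (i >= items.length) return done();
    fn(items[i++], next);
  }
  next();
}

var seen = [];
each(["a", "b"], function (item, next) {
  seen.push(item);
  next(); // an async fn would call next() from its own callback instead
}, function () {
  seen.push("done");
});
console.log(seen.join(",")); // → a,b,done
```

Note that even this small helper adds to the callback noise the article is complaining about: flow control that would be a plain `for` loop in synchronous code has to be reified as a function.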

Fibers, with Marcel’s futures

var archiveOrders = (function(date) {
  var conn = db.connect().wait();
  conn.query("select * from orders where date < ?",
             [date]).wait().forEach(function(order) {
    conn.execute("insert into archivedOrders ...", 
                 [order.id, ...]).wait();
    conn.execute("delete from orders where id=?", 
                 [order.id]).wait();
  });
  console.log("orders have been archived");
}).future();

The callback pyramid is gone; the signal to noise ratio is higher; asynchronous calls can be chained (for example query(...).wait().forEach(...)); etc. And things don’t get worse when the business logic gets more complex. You just write normal code with the usual control flow keywords (if, while, etc.) and built-in functions (forEach). You can even use classical try/catch exception handling.

Less code. Easier to read. Easier to modify. Easier to debug. Fibers clearly give the programmer a better comfort zone.

Fibers make this possible because they solve a tricky topological problem with callbacks. I’ll try to explain this problem on a very simple example:



db.connect(function(err, conn) {
  if (err) return cb(err);
  // conn is available in this block
  doSomething(conn);
});
// Would be nice to be able to assign conn to a variable
// in this scope so that we could resume execution here
// rather than in the block above.
// But, unfortunately, this is impossible, at least if we
// stick to vanilla JS (without fibers).


The topological issue is that the conn variable is only visible inside the callback. A naive workaround would be to copy it into a variable declared in the outer scope:


var c;
db.connect(function(err, conn) {
  if (err) return cb(err);
  c = conn;
});
// conn is now in c (???)
doSomething(c);
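
You can watch this naive version fail by running it with a stand-in for the database call (`fakeConnect` below is hypothetical, used only to make the snippet self-contained):

```javascript
// fakeConnect is a hypothetical stand-in for db.connect: it delivers
// its result asynchronously, on a later event-loop turn.
function fakeConnect(cb) {
  setImmediate(function () { cb(null, { name: "conn" }); });
}

var c;
fakeConnect(function (err, conn) { c = conn; });

// The callback has not run yet, so c is still undefined here:
console.log(typeof c); // → undefined
```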


But it does not work, because the callback is invoked asynchronously. So c is still undefined when execution reaches doSomething(c). The c variable gets assigned much later, when the asynchronous connect call completes.

Fibers make this possible, though, because they provide a yield primitive that suspends the current fiber until the callback delivers its value:

var fiber = Fiber.current;
db.connect(function(err, conn) {
  if (err) return fiber.throwInto(err);
  fiber.run(conn);
});
// Next line will yield until fiber.throwInto 
// or fiber.run are called
var c = Fiber.yield();
// If fiber.throwInto was called we don't reach this point 
// because the previous line throws.
// So we only get here if fiber.run was called and then 
// c receives the conn value.
doSomething(c);
// Problem solved!

Things are slightly more complex in real life because you also need to create a Fiber and start it with run(). But the key point is that the yield/run/throwInto combination makes it possible to transfer the conn value from the callback into the main flow of the function.

Here, I dived into the low level fiber primitives. I don’t want this to be taken as an encouragement to write code with these primitives, because this can be very error prone. Marcel’s futures library, on the other hand, packages these primitives into safe and convenient constructs like wait() and future().

And, to be complete, it would be unfair to say that fibers solve just this problem. They also enable powerful programming abstractions like generators. But my sense is that the main reason why they get so much attention in node.js is because they provide a very elegant and efficient solution to the pyramid of doom problem.
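
JavaScript has since gained native generators (function*), which give a feel for the kind of abstraction fibers were used to emulate: code that suspends at explicit yield points and resumes on demand. A minimal sketch:

```javascript
// An infinite lazy sequence: execution suspends at each yield and
// resumes exactly there on the next call to gen.next().
function* fibonacci() {
  var a = 0, b = 1;
  while (true) {
    yield b;
    var t = a + b;
    a = b;
    b = t;
  }
}

var gen = fibonacci();
var firstFive = [];
for (var i = 0; i < 5; i++) firstFive.push(gen.next().value);
console.log(firstFive.join(",")); // → 1,1,2,3,5
```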

Sponsored Ad

The pyramid of doom problem can be solved in a different way, by applying a CPS transformation to the code. This is what my own tool, streamline.js, does. It leads to code which is very similar to what you’d write with fibers’ futures:

function archiveOrders(date, _) {
  var conn = db.connect(_);
  flows.each(_, conn.query("select * from orders where date < ?",
                           [date], _), function(_, order) {
    conn.execute("insert into archivedOrders ...", 
                 [order.id, ...], _);
    conn.execute("delete from orders where id=?", 
                 [order.id], _);
  });
  console.log("orders have been archived");
}

The signal to noise ratio is even slightly better, as the wait() and future() calls are gone. And streamline gives you the choice between transforming the code into pure callback code, or into code that takes advantage of the node-fibers library.

Wrapping up on fibers

In summary, fibers don’t really change the execution model of node.js. Execution is still single-threaded and the scheduling of fibers is non-preemptive, just like the scheduling of events and callbacks in node’s event loop. Fibers don’t really bring much help with fairness/starvation issues caused by CPU intensive tasks either.

But, on the other hand, fibers solve the callback pyramid of doom problem very elegantly, and that alone makes them worth a close look.

Threads – What for?

As I said in the intro, threads landed into node this week, with Jorge’s threads_a_gogo implementation (and I had a head start on them because Jorge asked me to help with beta testing and packaging). What do they bring to the plate? And this time we are talking about real threads, not the green kind. Shouldn’t we be concerned that these threads will drag us into the classical threading issues that we had avoided so far?

Well, the answer is loud and clear: there is nothing to be worried about! These threads aren’t disruptive in any way. They won’t create havoc in what we have. But they will fill an important gap, as they will allow us to handle CPU intensive operations very cleanly and efficiently in node. In short, all we get here is bonus!

Sounds too good to be true! Why would these threads be so good when we had so many issues with threads before? The answer is simple: because we had the wrong culprit! The problems that we had were not due to the threads themselves, they were due to the fact that we had SHARED MUTABLE STATE!

When you are programming with threads in Java or .NET or other similar environments, any object which is directly or indirectly accessible from a global variable, or from a reference that you pass from one thread to another, is shared by several threads. If this object is immutable, there is no real problem because no thread can alter it. But if the object is mutable, you have to introduce synchronization to ensure that the object’s state is changed and read in a disciplined way. If you don’t, some thread may access the object in an inconsistent state because another thread was interrupted in the middle of a modification on the object. And then things usually get really bad: incorrect values, crashes because data structures are corrupted, etc.

If you have shared mutable state, you need synchronization. And synchronization is a difficult and risky art. If your locks are too coarse you get very low throughput because your threads spend most of their time waiting on locks. If they are too granular, you run the risk of missing some edge cases in your locking strategy or of letting deadlocks creep in. And, even if you get your synchronization right, you pay a price for it because locks are not free and don’t scale well.

But threads a gogo (I’ll call them TAGG from now on) don’t share mutable state. Each thread runs in its own isolate, which means that it has its own copy of the Javascript code, its own global variables, its own heap and stack. Also, the API does not let you pass a reference to a mutable Javascript object from one thread to another. You can only pass strings (which are immutable in Javascript) (*). So you are on the safe side, you don’t run the risk of having one thread modify something that another thread is accessing at the same time. And you don’t need synchronization, at least not the kind you needed around shared mutable objects.

(*) it would be nice to be able to share frozen objects across threads. This is not available in the first version of TAGG but this may become possible in the future. TAGG may also support passing buffers across thread boundaries at some point (note that this may introduce a limited, but acceptable, form of shared state).

I hope that I have reassured the skeptics at this point. As Jorge puts it, these threads aren’t evil. And actually, they solve an important problem which was dramatized in a blog post a few months ago: node breaks on CPU intensive tasks. The blog post that I’m referring to was really trashy and derogative and it was making a huge fuss about a problem that most node applications won’t have. But it cannot be dismissed completely: some applications need to make expensive computations, and, without threads, node does not handle this well, to say the least, because any long running computation blocks the event loop. This is where TAGG comes to the rescue.

If you have a function that uses a lot of CPU, TAGG lets you create a worker thread and load your function into it. The API is straightforward:



var TAGG = require('threads_a_gogo');

// our CPU intensive function
function fibo(n) {
  return n > 1 ? fibo(n - 1) + fibo(n - 2) : 1;
}

// create a worker thread
var t = TAGG.create();
// load our function into the worker thread
t.eval(fibo);



Once you have loaded your function, you can call it. Here also the API is simple:

t.eval("fibo(30)", function(err, result) {
  console.log("fibo(30)=" + result);
});

The function is executed in a separate thread, running in its own isolate. It runs in parallel with the main thread. So, if you have more than one core the computation will run at full speed in a spare core, without any impact on the main thread, which will continue to dispatch and process events at full speed.

When the function completes, its result is transferred to the main thread and dispatched to the callback of the t.eval call. So, from the main thread, the fibo computation behaves like an ordinary asynchronous operation: it is initiated by the t.eval call and its result comes back through a callback.

Often you’ll have several requests that need expensive computations. So TAGG comes with a simple pool API that lets you allocate several threads and dispatch requests to the first available one. For example:

var pool = TAGG.createPool(16);
// load the function in all 16 threads
pool.all.eval(fibo);
// dispatch the request to one of the threads
pool.any.eval("fibo(30)", function(err, result) {
  console.log("fibo(30)=" + result);
});
});

TAGG also provides support for events. You can exchange events in both directions between the main thread and worker threads. And, as you probably guessed at this point, the API is naturally aligned on node’s EventEmitter.

A slight word of caution though: this is a first release so TAGG may lack a few usability features. The one that comes first to mind is a module system to make it easy to load complex functions with code split in several source files. And there are still a lot of topics to explore, like passing frozen objects or buffers. But the implementation is very clean, very simple and performance is awesome.

Wrapping up on threads

Of course, I’m a bit biased because Jorge involved me in the TAGG project before the release. But I find TAGG really exciting. It removes one of node’s main limitations, its inability to deal with intensive computations. And it does it with a very simple API which is completely aligned on node’s fundamentals.

Actually, threads are not completely new to node and you could already write addons that delegate complex functions to threads, but you had to do it in C/C++. Now, you can do it in Javascript. A very different proposition for people like me who invested a lot in Javascript recently, and not much in C/C++.

The problem could also be solved by delegating long computations to child processes but this is costlier and slower.

From a more academic standpoint, TAGG brings a first bit of Erlang’s concurrency model, based on share-nothing threads and message passing, into node. An excellent move.

Putting it all together

I thought that I was going to write a short post, for once, but it looks like I went overboard, as usual. So I’ll quickly recap by saying that fibers and threads are different beasts and play different roles in node.

Fibers introduce powerful programming abstractions like generators and fix the callback pyramid problem. They address a usability issue.

Threads, on the other hand, fix a hole in node’s story, its inability to deal with CPU intensive operations, without having to dive into C/C++. They address a performance issue.

And the two blend well together (and — sponsored ad — they also blend with streamline.js), as this last example shows:

var pool = TAGG.createPool(16);
pool.all.eval(fibo);
console.log("fibo(30)=" + pool.any.eval("fibo(30)", _));

Kudos to Marcel and Jorge for making these amazing technologies available to the community.


  • Newtonsoft.Json使用
    目录枚举类型序列化/反序列化结论序列化后的字符串字段顺序控制通过[JsonProperty(Order=1)]控制字段序列化后的顺序通过自定义ContractResolver的方式实现相关参考用于记录开发过程中Newtonsoft.Json中使用的常用功能。枚举类型序列化/反序列化由于枚举类型的基类是值类型,......