Qnet Networking

Where the programmer lives
Node descriptors aren't nids!
How do I pass node descriptors around?
The <sys/netmgr.h> header file
When node descriptors are valid for a node other than its own

Where the programmer lives

The Qnet Networking chapter describes what the programmer needs to know to write code.

For information that describes what the user needs to know about networking, see the System Administration guide.

The constants and APIs involved in networking exist in the following header files:

<sys/neutrino.h>
<sys/netmgr.h>

We'll discuss each in turn, but here they are as a group:

from <sys/neutrino.h>:
```
int ConnectAttach(uint32_t nd, pid_t pid, int chid, 
    unsigned index, int flags);

int SignalKill(uint32_t nd, pid_t pid, int tid, 
    int signo, int code, int value);
```
The first parameter in each function is a node descriptor. If it's zero, the local node is used. If it's non-zero, the connection request or signal is sent to the appropriate node.

from <sys/netmgr.h>:

ND_LOCAL_NODE
ND_NODE_CMP(a,b)
int netmgr_strtond(const char *nodename, char **endstr);
int netmgr_ndtostr(unsigned flags, int nd, char *buf, size_t maxbuf);
int netmgr_remote_nd(int remote_nd, int local_nd);
int netmgr_ctl(int nd, int op);

Node descriptors aren't nids!

In QNX 4 (and QNX 2), nodes were identified by numbers known as "nids" (short for network identifiers). These nids were globally unique and were used both as the external representation of the node (as in the //<nid> syntax, or after a -n option on the command line) and as the number that was passed to various APIs and in messages.

However, in QNX 6:

the external representation is the node name
a node descriptor is the value passed to APIs and in messages.

Here are a few rules about node descriptors:

Node descriptors are not globally unique - FQNNs (Fully Qualified Node Names) serve that purpose.
Node descriptor "5" might refer to a node named "A" on one machine and some other node "B" on a second machine. In other words, node descriptor numbers are relative to the machine that the program is running on. You might be saying to yourself, "Huh? If they're only good on the machine that they were created on, how do I pass them around in messages and make things work when the message goes off node?" A good question, and we'll get to it shortly.
Node descriptors aren't valid for all time.
If there are no connections to a node and it hasn't talked to our node for a period of time, qnet will detect that the node is unreachable, and may reclaim the node descriptor for reuse. That means that at one time, node descriptor 5 on node A refers to node B. At some later point in time (on the order of hours), node descriptor 5 on node A might refer to node C. This won't happen if there are active connections to the node, but consider the following case: you have a connection to a node and attempt to send data to it, but qnet has determined that the link has failed (the cable was cut, the machine was turned off, etc.). You'll get an error report from the send and the connection will be torn down. When you attempt to reestablish the connection to the remote machine, you cannot reuse the same node descriptor you used before -- you have to convert the node name to a new node descriptor (which will fail if the cable is still cut, or the machine is still off, etc.).
The upshot of this is that a program should never hang on to a node descriptor for any length of time. When it gets the node descriptor as the return value from a function, or in a message, it should immediately turn around and pass it as the parameter to another function, or put it in another message to be sent on. If you're putting a node descriptor in a static variable, you're doing something wrong. Think of them as one of those exotic sub-atomic particles that come into existence for a few nanoseconds and then decay into something else :-). If you want to remember another node, remember the node name (preferably in the "directory" form so that the IS department can fiddle the node domains freely).
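The "remember the name, not the descriptor" rule can be sketched as follows. This is only an illustration: `struct peer`, `peer_init()`, and `peer_with_nd()` are hypothetical names, and the resolver is passed in as a function pointer with the same shape as netmgr_strtond() so the sketch also compiles on a machine without the QNX headers. The two `demo_` functions are test doubles, not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as netmgr_strtond() in the listing above. */
typedef int (*nd_resolver_fn)(const char *nodename, char **endstr);

/* What we keep long-term: the node name, never the descriptor. */
struct peer {
    char name[128];
};

/* Copy the node name into the peer; returns 0, or -1 if it doesn't fit. */
int peer_init(struct peer *p, const char *nodename)
{
    size_t len = strlen(nodename);
    if (len >= sizeof p->name)
        return -1;
    memcpy(p->name, nodename, len + 1);
    return 0;
}

/* Resolve the name right now and hand the descriptor straight to `use`.
 * The descriptor exists only for the duration of this call. */
int peer_with_nd(const struct peer *p, nd_resolver_fn resolve,
                 int (*use)(int nd, void *arg), void *arg)
{
    int nd = resolve(p->name, NULL);
    if (nd == -1)
        return -1;              /* node unknown or unreachable right now */
    return use(nd, arg);
}

/* Test double: pretends the descriptor for the node changed between calls. */
int demo_resolver(const char *nodename, char **endstr)
{
    static int calls;
    (void)nodename;
    (void)endstr;
    return calls++ == 0 ? 5 : 9;
}

/* Test double: records the descriptor it was handed. */
int demo_record(int nd, void *arg)
{
    *(int *)arg = nd;
    return 0;
}
```

On a QNX target you would presumably pass netmgr_strtond itself as the resolver. The point of the design is that every use pays for a fresh name lookup, so a reclaimed and reassigned descriptor can never be used by accident.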

Now that you're thoroughly depressed, a ray of light... The vast majority of programs don't ever have to deal with node descriptors at all. Their usage is buried in the C library (mainly in the open code).

How do I pass node descriptors around?

Since node descriptors are only good on the local machine, we've got to do something magical when they go off node. That magical thing is netmgr_remote_nd() (described in detail later on).

Before any node descriptor is sent via MsgSend() or MsgReply(), the program has to call netmgr_remote_nd(). The first parameter is the node descriptor of the machine that the buffer is going to, and the second is the node descriptor that you want to put into the buffer. What comes back is a node descriptor that refers to the same node as the second parameter, but is valid for the node that the message is going to. This is one of only two cases where you'll see a node descriptor that isn't for your local node (for the other case, see "When node descriptors are valid for a node other than its own").

Here's an example from the C library:

int flink(int fd, const char *path) {
    io_link_extra_t             extra;
    
    if(ConnectServerInfo(0, fd, &extra.info) != fd) {
        return -1;
    }
    extra.info.nd = netmgr_remote_nd(extra.info.nd, ND_LOCAL_NODE);
    extra.info.pid = getpid();
    extra.info.coid = fd;
    
    if((fd = _connect(PATHMGR_COID, path, 0, O_CREAT | O_EXCL, 
             SH_DENYNO, _IO_CONNECT_LINK, 0, 0, 0,
             _IO_CONNECT_EXTRA_LINK, sizeof extra, &extra, 
             0, 0, 0)) == -1) {
        return -1;
    }
    ConnectDetach(fd);
    return 0;
}

The flink() function creates a new filename (link) for a given file descriptor. The file descriptor is for the local node (of course) and the server is on the node identified by ConnectServerInfo(). In order to tell the server that we want the operation done, we have to tell the server our node descriptor, but the server process is going to want the descriptor relative to the node that it's running on. The netmgr_remote_nd() function accomplishes that.
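To make the translation concrete, here is a toy model in plain C. It is not how qnet works internally: each node simply holds a private table mapping descriptors to node names, and `toy_remote_nd()` (a hypothetical function) looks up the same machine in the remote node's table. All `toy_` names are invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ND 4

/* One node's private descriptor table: index = descriptor, value = node name.
 * Slot 0 is the node itself, mirroring ND_LOCAL_NODE == 0. */
struct toy_node {
    const char *names[TOY_MAX_ND];
};

/* Return the name that descriptor `nd` refers to on node `n`, or NULL. */
static const char *toy_name(const struct toy_node *n, int nd)
{
    return (nd >= 0 && nd < TOY_MAX_ND) ? n->names[nd] : NULL;
}

/* Return the descriptor that node `n` uses for `name`, or -1 if none. */
static int toy_lookup(const struct toy_node *n, const char *name)
{
    for (int nd = 0; nd < TOY_MAX_ND; nd++)
        if (n->names[nd] != NULL && strcmp(n->names[nd], name) == 0)
            return nd;
    return -1;
}

/* Toy analogue of netmgr_remote_nd(remote_nd, local_nd), run on node `self`.
 * `net` lists every node so the model can consult the remote node's table. */
int toy_remote_nd(const struct toy_node *self,
                  const struct toy_node *net, size_t count,
                  int remote_nd, int local_nd)
{
    const char *remote = toy_name(self, remote_nd);
    const char *target = toy_name(self, local_nd);

    if (remote == NULL || target == NULL)
        return -1;
    for (size_t i = 0; i < count; i++)
        if (net[i].names[0] != NULL && strcmp(net[i].names[0], remote) == 0)
            return toy_lookup(&net[i], target);
    return -1;                  /* remote node not part of the model */
}
```

With three nodes whose tables are A: {A, B, C}, B: {B, C, A}, and C: {C, A, B}, node A calling `toy_remote_nd()` with remote descriptor 1 (B) and local descriptor 0 (itself) gets 2, which is the descriptor that B uses for A. That is the value a message from A to B would have to carry.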

The <sys/netmgr.h> header file

The <sys/netmgr.h> header defines the ND_LOCAL_NODE macro as zero. It can be used anyplace where you're dealing with node descriptors to make it painfully explicit that you're talking about the local node.

We've been talking about node descriptors representing machines (and they do), but they also have Quality of Service stuff in them. If you want to see if two node descriptors refer to the same machine, you can't just blindly compare the two of them for equality; the ND_NODE_CMP(a,b) macro is provided for this purpose. If the return value from the macro is zero, the descriptors refer to the same node. If the value is < 0, the first is a node that's "less than" the second. If the value is > 0, the first is a node that's "greater than" the second. This is the same way that the strcmp() or memcmp() functions work. It's done this way in the (unlikely) event that you want to sort based on node descriptors.
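A small sketch of comparing and sorting descriptors by node. On a QNX target it uses the real ND_NODE_CMP() from the header. Elsewhere it falls back to a stand-in that assumes the node lives in the low 16 bits and the Quality of Service bits sit above them. That layout is an assumption made purely so the example runs off-target, so don't rely on it. `same_node()` and `sort_by_node()` are hypothetical helper names.

```c
#include <stdint.h>
#include <stdlib.h>

#ifdef __QNXNTO__
#include <sys/netmgr.h>         /* real ND_NODE_CMP() */
#else
/* Stand-in for off-target builds only. ASSUMPTION: low 16 bits = node. */
#define TOY_NODE_MASK 0xFFFFu
#define ND_NODE_CMP(a, b) \
    ((int)((a) & TOY_NODE_MASK) - (int)((b) & TOY_NODE_MASK))
#endif

/* True if both descriptors refer to the same node, whatever their QoS. */
int same_node(uint32_t a, uint32_t b)
{
    return ND_NODE_CMP(a, b) == 0;
}

/* qsort() comparator: orders descriptors by node, ignoring QoS bits. */
static int cmp_nd(const void *pa, const void *pb)
{
    uint32_t a = *(const uint32_t *)pa;
    uint32_t b = *(const uint32_t *)pb;
    return ND_NODE_CMP(a, b);
}

/* Sort an array of descriptors by the node they refer to. */
void sort_by_node(uint32_t *nds, size_t count)
{
    qsort(nds, count, sizeof *nds, cmp_nd);
}
```

Under the stand-in layout, 0x00010005 and 0x00020005 differ as integers but `same_node()` reports them as the same node, which is exactly the case a plain `==` would get wrong.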

The header file also defines the following networking functions:

netmgr_strtond()
netmgr_ndtostr()
netmgr_remote_nd()

netmgr_strtond()

int netmgr_strtond(const char *nodename, char **endstr);

Converts the string pointed to by nodename to a node descriptor, which it returns. If there's an error, -1 is returned (errno is set). If the endstr parameter is non-NULL, *endstr is set to point to the first character beyond the end of the node name. All three forms of node name -- simple, directory, FQNN -- are accepted by this function.

netmgr_ndtostr()

int netmgr_ndtostr(unsigned flags, int nd, char *buf, size_t maxbuf);

Converts the given node descriptor to a string and stores it in the memory pointed to by buf. The size of the buffer is given by maxbuf. The return value is the actual length of the node name (even if the function had to truncate the name to get it to fit into the space specified by maxbuf), or -1 if an error occurs (errno is set).
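Because the return value is the full length even when the name was truncated, a caller can size a buffer in two passes. The sketch below assumes only the signature shown above. `nd_name_dup()` is a hypothetical helper, and the converter is passed as a function pointer so the example can run against a stand-in. `demo_ndtostr()` is a test double built on snprintf(), not the real function. The text doesn't say whether the returned length counts the terminating NUL, so the helper allocates one extra byte to be safe.

```c
#include <stdio.h>
#include <stdlib.h>

/* Same shape as netmgr_ndtostr() in the listing above. */
typedef int (*ndtostr_fn)(unsigned flags, int nd, char *buf, size_t maxbuf);

/* Return a malloc()ed copy of the node name, or NULL on error.
 * The caller frees the result. */
char *nd_name_dup(ndtostr_fn to_str, unsigned flags, int nd)
{
    char probe[8];
    char *name;
    size_t need;

    /* First pass: the name may be truncated, but the length is not. */
    int len = to_str(flags, nd, probe, sizeof probe);
    if (len < 0)
        return NULL;

    need = (size_t)len + 1;     /* +1 in case len excludes the NUL */
    name = malloc(need);
    if (name == NULL)
        return NULL;

    /* Second pass: now the whole name fits. */
    if (to_str(flags, nd, name, need) < 0) {
        free(name);
        return NULL;
    }
    return name;
}

/* Test double: always reports the same (made-up) node name. */
int demo_ndtostr(unsigned flags, int nd, char *buf, size_t maxbuf)
{
    (void)flags;
    (void)nd;
    return snprintf(buf, maxbuf, "%s", "build-server.lab.example");
}
```

On a QNX target the first argument would presumably be netmgr_ndtostr itself. Converting to a string this way is also how a program can "remember" a node without keeping its descriptor around.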

The flags parameter controls the conversion process, indicating which pieces of the string are to be output. The following bits are defined:

ND2S_DIR_SHOW (0x0001), ND2S_DIR_HIDE (0x0002): Show or hide the network directory portion of the string. If neither of these bits is specified, the network directory portion will be output if the node is not in the default network directory.
ND2S_QOS_SHOW (0x0004), ND2S_QOS_HIDE (0x0008): Show or hide the quality of service portion of the string. If neither is specified, the quality of service portion will be output if it is not the default QoS for the node.
ND2S_NAME_SHOW (0x0010), ND2S_NAME_HIDE (0x0020): Show or hide the node name portion of the string. If neither is specified, the name will be output if the node descriptor is not the local node.
ND2S_DOMAIN_SHOW (0x0040), ND2S_DOMAIN_HIDE (0x0080): Show or hide the node domain portion of the string. If neither is specified and a network directory portion was output, the node domain will be output if it is not the default for the output network directory. If neither is specified and no network directory portion was output, the node domain will be output if the domain is not in the default network directory.

By combining the above bits in various combinations, all sorts of interesting information can be extracted, for example:

ND2S_NAME_SHOW: Something good for display purposes.
ND2S_DIR_HIDE | ND2S_NAME_SHOW | ND2S_DOMAIN_SHOW: Something that can be passed to another node and still be known to refer to the same machine (the FQNN).
ND2S_DIR_SHOW | ND2S_NAME_HIDE | ND2S_DOMAIN_HIDE with ND_LOCAL_NODE: The default network directory.
ND2S_DIR_HIDE | ND2S_QOS_SHOW | ND2S_NAME_HIDE | ND2S_DOMAIN_HIDE with ND_LOCAL_NODE: The default Quality of Service for the node.
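The combinations above can be given names so they aren't retyped at every call site. In this sketch the `FMT_` constants and `nd2s_flags_conflict()` are hypothetical. Off-target, the ND2S_ bits are defined with the values listed above. On QNX the real header supplies them. The text doesn't say how the library treats a SHOW bit and its HIDE bit set together, so the check merely flags what is presumably a caller mistake.

```c
#ifdef __QNXNTO__
#include <sys/netmgr.h>
#else
/* Off-target stand-ins; the values are the ones listed in the text. */
#define ND2S_DIR_SHOW     0x0001
#define ND2S_DIR_HIDE     0x0002
#define ND2S_QOS_SHOW     0x0004
#define ND2S_QOS_HIDE     0x0008
#define ND2S_NAME_SHOW    0x0010
#define ND2S_NAME_HIDE    0x0020
#define ND2S_DOMAIN_SHOW  0x0040
#define ND2S_DOMAIN_HIDE  0x0080
#endif

/* Named presets for the four combinations described above. */
enum {
    FMT_DISPLAY   = ND2S_NAME_SHOW,
    FMT_FQNN      = ND2S_DIR_HIDE | ND2S_NAME_SHOW | ND2S_DOMAIN_SHOW,
    FMT_LOCAL_DIR = ND2S_DIR_SHOW | ND2S_NAME_HIDE | ND2S_DOMAIN_HIDE,
    FMT_LOCAL_QOS = ND2S_DIR_HIDE | ND2S_QOS_SHOW | ND2S_NAME_HIDE
                    | ND2S_DOMAIN_HIDE
};

/* Nonzero if some piece has both its SHOW and its HIDE bit set.
 * Relies on each HIDE bit sitting one position above its SHOW bit,
 * which holds for the values listed in the text. */
int nd2s_flags_conflict(unsigned flags)
{
    const unsigned show = ND2S_DIR_SHOW | ND2S_QOS_SHOW
                        | ND2S_NAME_SHOW | ND2S_DOMAIN_SHOW;
    return ((flags & show) & (flags >> 1)) != 0;
}
```

The last two presets are meant to be used with ND_LOCAL_NODE as the descriptor, as described above.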

netmgr_remote_nd()

int netmgr_remote_nd(int remote_nd, int local_nd);

Takes the local_nd node descriptor (which is relative to this node) and returns a new node descriptor that refers to the same machine, but is valid for the node identified by remote_nd. The function can return -1 in some strange cases (e.g. if the remote_nd machine can't talk to the local_nd machine for some routing reason).

When node descriptors are valid for a node other than its own

There are two cases when a program will see a node descriptor that's valid for a node other than its own:

In the return value from netmgr_remote_nd().
(This was shown in the section, "How do I pass node descriptors around?").
In the _msg_info structure returned by the MsgReceive() or MsgInfo() functions.
You'll find an nd field in the _msg_info structure -- which is the node descriptor (relative to the local node) of the node that did the MsgSend(). You'll also find a srcnd field (a terrible name, but we couldn't think of anything else) -- which is the node descriptor of the receiving node relative to the sending node. This is exactly the same value that would be returned if the receiving process did a:
```
srcnd = netmgr_remote_nd(msg_info.nd, ND_LOCAL_NODE);
```
It turns out that having this information was valuable for a number of reasons, so we added it to the _msg_info structure.