Writing a Network Driver

In this chapter, we look at the work that you must do to write a driver for your own hardware card.

From io-net's perspective, the card is an up producer because it produces data that goes up into the io-net infrastructure. It isn't a down producer because it doesn't produce or pass along any data that goes down in the io-net infrastructure -- the downward direction is strictly limited to the hardware and network interface of the card.

Our example is a "null" driver that absorbs any data sent to it (it pretends it went out to the hardware) and, once per second, generates incoming data (it pretends data arrived from the hardware).


Caution: Since your driver is part of a shared object (and not its own separate process), you have to be very careful about error checking, memory leaks, and such issues. For example, if you call exit() within your driver, you'll take down the entire io-net process! If your driver gets loaded and unloaded many times, and it has a memory leak, eventually your system will run out of memory.
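
As a rough illustration of the caution above, here's a minimal sketch of an initialization routine that reports failure to its caller and releases whatever it allocated, instead of calling exit(). The names demo_ctx_t, demo_init(), and demo_fini() are hypothetical and aren't part of the DDK.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-driver context; not a DDK type. */
typedef struct {
    unsigned char *rx_buf;
    unsigned char *tx_buf;
} demo_ctx_t;

/* Release everything demo_init() allocated. Safe on partial state. */
void demo_fini(demo_ctx_t *ctx)
{
    if (ctx == NULL) {
        return;
    }
    free(ctx->rx_buf);
    free(ctx->tx_buf);
    free(ctx);
}

/*
 * Allocate the context. On failure, clean up, set errno, and return
 * NULL so the caller can fail the load. We never call exit() here,
 * because that would terminate the process that loaded us.
 */
demo_ctx_t *demo_init(size_t rx_size, size_t tx_size)
{
    demo_ctx_t *ctx;

    if (rx_size == 0 || tx_size == 0) {
        errno = EINVAL;
        return NULL;
    }
    if ((ctx = calloc(1, sizeof(*ctx))) == NULL) {
        errno = ENOMEM;
        return NULL;
    }
    ctx->rx_buf = malloc(rx_size);
    ctx->tx_buf = malloc(tx_size);
    if (ctx->rx_buf == NULL || ctx->tx_buf == NULL) {
        demo_fini(ctx);      /* frees whichever buffer did succeed */
        errno = ENOMEM;
        return NULL;
    }
    return ctx;
}
```

Pairing every allocation with a single cleanup routine makes it easier to keep repeated load/unload cycles from leaking memory.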

DDK source code

When you install the DDK package, the source is put into a directory under the /usr/src/ddk-6.2.0 directory. Currently, the directory structure for the Network DDK looks like this:


[Figure: Directory structure for the Network DDK.]


Initialization

You must include the file <sys/io-net.h>, which contains structures that you'll use to bind your driver to io-net.

Here's what happens when you load your network driver:

  1. The io-net manager searches for the global symbol, io_net_dll_entry. This defines the driver's initialization function, which io-net then calls.
  2. The initialization function registers the driver and its functions with io-net.
  3. The initialization function advertises the driver's capabilities to io-net and its modules.

These steps are described in the sections that follow.

The io_net_dll_entry global symbol

The first thing that you must do in your driver is create a public symbol called io_net_dll_entry of type io_net_dll_entry_t (see the Network DDK API chapter). The io-net process searches for this symbol when it loads your shared object.

Here's the definition for our sample driver:

// Forward declaration of our initialization function:
int my_init (void *dll_hdl,
             dispatch_t *dpp,
             io_net_self_t *ion,
             char *options);

// Global symbol:
io_net_dll_entry_t io_net_dll_entry =
{
    2,         // Number of functions
    my_init,   // init()
    NULL       // "master" shutdown()
};

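Here we've supplied only an initialization function, my_init(); the "master" shutdown entry is NULL. The count is 2 because the table has two entries, even though the second one is unused.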
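
A leading count followed by function pointers is a common way to let a caller check how many entries a table really supplies before using one. The sketch below shows that general pattern only; the demo_entry_t type and the demo_call_*() helpers are hypothetical, and this isn't io-net's actual loader code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table laid out like the sample: a count, then entries. */
typedef struct {
    int nfuncs;                  /* how many entries follow */
    int (*init)(void *hdl);      /* entry 1 */
    int (*shutdown)(void *hdl);  /* entry 2; may be NULL */
} demo_entry_t;

/* Call init() only if the table is large enough and the slot is set. */
int demo_call_init(const demo_entry_t *e, void *hdl)
{
    if (e == NULL || e->nfuncs < 1 || e->init == NULL) {
        return -1;
    }
    return e->init(hdl);
}

/* shutdown() is optional: a missing or NULL entry means "nothing to do". */
int demo_call_shutdown(const demo_entry_t *e, void *hdl)
{
    if (e == NULL || e->nfuncs < 2 || e->shutdown == NULL) {
        return 0;
    }
    return e->shutdown(hdl);
}

/* A trivial init() used to exercise the table. */
int demo_init_ok(void *hdl)
{
    (void) hdl;
    return 0;
}
```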

Initialization function

The init() function that you supply is passed the following arguments:

void *dll_hdl
An internal handle used by io-net -- you'll need to hold onto this handle for future calls into the io-net framework.
dispatch_t *dpp
Dispatch handle.
io_net_self_t *ion
A pointer to a data structure of the io-net functions that your driver can call. For more information, see io_net_self_t in the Network DDK API chapter.
char *options
Command-line suboptions related to your driver.

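At a minimum, the initialization function should register the driver and its functions with io-net and advertise the driver's capabilities, as outlined in the steps above.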

The initialization function may perform additional functions:

Sample initialization function

Let's take a look at our sample driver's initialization function, my_init(). When we load our driver, io-net calls my_init(), passing the arguments described earlier:

void          *null_dll_hdl;
io_net_self_t *null_ion;

int
my_init (void *dll_hdl, dispatch_t *dpp, io_net_self_t *ion,
         char *options)
{
    null_dll_hdl = dll_hdl;
    null_ion = ion;

    if (!null_register_device ()  // Register with io-net
    || (errno = pthread_create (NULL, NULL,
                                null_rx_thread, NULL))) {
        return (-1);    // couldn't register, fail;
                        // errno says why
    }

    // Advertise our driver's capabilities
    null_advertise (null_reg_hdl, null_entry.func_hdl);

    return (0);         // success
}

We ignore the dpp and options; we don't use them in our trivial example here. The other two parameters we'll just store in global variables for later use:

null_dll_hdl
The handle that we'll need when calling io-net's functions.
null_ion
The structure that lists io-net's functions.

If you wish, you can define macros like these to access the function pointers (to keep things simple, our sample driver doesn't use them):

#define ion_alloc       null_ion->alloc
#define ion_alloc_npkt  null_ion->alloc_up_npkt
#define ion_add_done    null_ion->reg_tx_done
#define ion_free        null_ion->free
#define ion_rx_packets  null_ion->tx_up
#define ion_tx_complete null_ion->tx_done

Notice how we've created a receiver thread (using pthread_create()). For our trivial example, this thread simply sits in a do-forever loop, sleeps for one second, and then pretends that data has arrived from somewhere, finally giving the data to io-net (we'll see the code for this shortly).

In a real driver, the functionality is similar; the thread waits for some kind of indication from the hardware that data has arrived (perhaps via a hardware interrupt) and then gets the data from the hardware, processes it, and gives it to io-net.

Registering with io-net

Once the device is configured, you'll want to bind it into the io-net hierarchy. This is done by calling the reg() function that io-net provided in the io_net_self_t structure passed to your driver's initialization function:

int (*reg) (void *dll_hdl,
            io_net_registrant_t *registrant,
            int *reg_hdlp,
            uint16_t *cell,
            uint16_t *endpoint)

The arguments are:

dll_hdl
The handle that io-net passed when it called your driver's init() function.
registrant
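A pointer to an io_net_registrant_t structure that describes what your driver is registering as. As the sample registration below shows, this structure includes the registrant type flags, the driver's name, its top and bottom types, a function handle, a pointer to the table of functions that the driver supplies, and the number of dependencies.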

For details, see the Network DDK API chapter.

reg_hdlp
The registrant handle, which is filled in if the registration succeeds. Use it as the registrant_hdl parameter to subsequent calls into io-net.
cell and endpoint
Filled in by io-net if the registration succeeds. They identify your driver's place in the hierarchy to other registrants.

As described earlier, io-net uses the module type and the type of packet produced or accepted on the way up and down to determine where a module fits in with the other modules.

Once bound in, you'll receive callouts from io-net into the functions that you specified in the io_net_registrant_funcs_t structure. Your hardware most likely generates interrupts (or informs you in some other way that data has arrived); you then use the call-ins to io-net to inform it that data has arrived (after suitable processing on your end).

Sample registration

To perform the second phase of our initialization for our sample driver, we need to register it with io-net. Since we're going to be an up-producer and nothing else, this call is as follows:

// functions that we supply
io_net_registrant_funcs_t null_funcs =
{
    9,                       // nfuncs
    NULL,                    // rx_up()
    null_send_packets,       // rx_down()
    null_receive_complete,   // tx_done()
    null_shutdown1,          // shutdown1()
    null_shutdown2,          // shutdown2()
    null_advertise,          // dl_advert()
    null_devctl,             // devctl()
    null_flush,              // flush()
    NULL                     // raw_open()
};

// a description of our driver
io_net_registrant_t null_entry =
{
    _REG_PRODUCER_UP,   // we're an "up" producer
    "devn-null.so",     // our name
    "en",               // our top type
    NULL,               // our bottom type (none)
    NULL,               // function handle (see the note below)
    &null_funcs,        // pointer to our functions
    0                   // #dependencies
};

int         null_reg_hdl;
uint16_t    null_cell;
uint16_t    null_lan;

static int
null_register_device (void)
{
    if ((*null_ion -> reg)
        (null_dll_hdl,
         &null_entry,
         &null_reg_hdl,
         &null_cell,
         &null_lan) < 0) {

        return (0);     // failed
    }

    return (1);         // success
}

At this point, you've registered your device driver with io-net.


Note: For simplicity, we've used global variables (null_reg_hdl, null_cell, and null_lan) to hold our driver's registrant handle, cell, and endpoint number.

In a real driver, you'd most likely allocate a structure, and pass a pointer to that structure around. This helps your driver support multiple cards, as each card's context information (or "handle") can be passed individually. The io-net infrastructure lets you associate your own handle with the binding in the func_hdl member of io_net_registrant_t (we've passed NULL).
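
The idea in this note can be sketched as follows. The card_ctx_t type and the helpers are hypothetical; a real driver would store whatever per-card state it needs and pass the pointer as the func_hdl member of io_net_registrant_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-card state, replacing the sample's globals. */
typedef struct {
    int      reg_hdl;     /* registrant handle returned by reg() */
    uint16_t cell;        /* cell assigned at registration */
    uint16_t endpoint;    /* endpoint (LAN number) assigned at registration */
    unsigned card_index;  /* which physical card this context describes */
} card_ctx_t;

/* Allocate a zeroed context for one card; returns NULL on failure. */
card_ctx_t *card_ctx_create(unsigned card_index)
{
    card_ctx_t *ctx = calloc(1, sizeof(*ctx));

    if (ctx != NULL) {
        ctx->reg_hdl = -1;            /* not registered yet */
        ctx->card_index = card_index;
    }
    return ctx;
}

/*
 * A helper in the style of a driver callback: the handle we registered
 * comes back as func_hdl, so each card's callback sees only its own state.
 */
unsigned card_index_from_hdl(void *func_hdl)
{
    card_ctx_t *ctx = func_hdl;

    assert(ctx != NULL);
    return ctx->card_index;
}
```

With this layout, supporting a second card is a matter of creating a second context and registering again, rather than duplicating global variables.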


Advertising the driver's capabilities to io-net

The next thing to do is advertise the driver's capabilities to io-net. This is done via the dl_advert() function that you call in your driver's initialization function whenever you detect a card.


Note: The io-net manager might call your dl_advert() function some time later as well. This happens whenever another module is mounted above yours, so that it too can be informed of your driver's capabilities. This ties in with our earlier discussion about the dynamic nature of the loading of the modules.

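Your driver advertises its capabilities by filling in a structure of type io_net_msg_dl_advert_t and passing it up io-net's hierarchy. As the sample below shows, this structure includes the message type, the interface flags, the minimum, maximum, and preferred MTU sizes, the up type, and the link-level address information (a struct sockaddr_dl).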

For details, see the Network DDK API chapter.

The interface number could be used for a piece of hardware that has multiple channels. It would have one interface, but have one interface number per channel.

Sample advertising function

In our simple example, we assume that the devn-null device always detects exactly one card, so we simply call this function, called null_advertise(), once in the driver's initialization function. Here's the code for our null_advertise() function (the numbers in the comments correspond to the notes just after the code sample):

#define MTUSIZE         1514

int
null_advertise (int reg_hdl, void *func_hdl)
{
    npkt_t                  *npkt;
    net_buf_t               *nb;
    net_iov_t               *iov;
    io_net_msg_dl_advert_t  *ap;

    // 1) Allocate a packet; we'll use this for communications
    //    with io-net.
    if ((npkt = null_ion->alloc_up_npkt (sizeof (*nb) + sizeof (*iov),
                                (void **) &nb)) == NULL) {
        return (0);
    }

    // 2) Allocate room for the advertisement message.
    if ((ap = null_ion->alloc (sizeof (*ap), 0)) == NULL) {
        null_ion->free (npkt);
        return (0);
    }

    // 3) Set up the packet into the queue.
    TAILQ_INSERT_HEAD (&npkt -> buffers, nb, ptrs);

    iov = (net_iov_t *) (nb + 1);

    nb -> niov = 1;
    nb -> net_iov = iov;
    iov -> iov_base = ap;
    iov -> iov_len = sizeof (*ap);

    // 4) Generate the info for the advertisement message.
    memset (ap, 0x00, sizeof (*ap));
    ap -> type          = _IO_NET_MSG_DL_ADVERT;
    ap -> iflags        = (IFF_SIMPLEX | IFF_BROADCAST |
                           IFF_MULTICAST | IFF_RUNNING);
    ap -> mtu_min       = 0;
    ap -> mtu_max       = MTUSIZE;
    ap -> mtu_preferred = MTUSIZE;
    sprintf (ap -> up_type, "en%d", null_lan);
    strcpy (ap -> dl.sdl_data, ap -> up_type);

    ap -> dl.sdl_len = sizeof (struct sockaddr_dl);
    ap -> dl.sdl_family = AF_LINK;
    ap -> dl.sdl_index  = null_lan;
    ap -> dl.sdl_type = IFT_ETHER;

    // Not terminated:
    ap -> dl.sdl_nlen = strlen (ap -> dl.sdl_data); 
    ap -> dl.sdl_alen = 6;
    memcpy (ap -> dl.sdl_data + ap -> dl.sdl_nlen,
            "\x12\x34\x56\x78\x9a\xbc", 6);

    // 5) Bind the advertisement message to the packet; note
    //    the use of the _NPKT_MSG flag to indicate to the 
    //    upper modules that this is a message intended for
    //    them. It isn't just a "regular" packet.
    npkt -> org_data = ap;
    npkt -> flags |= _NPKT_MSG;
    npkt -> iface = 0;
    npkt -> framelen = sizeof (*ap);

    if (null_ion->reg_tx_done (null_reg_hdl, npkt, NULL) == -1) {
        null_ion->free (ap);
        null_ion->free (npkt);
        return (0);
    }

    // 6) Complete the transaction.
    if (null_ion->tx_up (null_reg_hdl, npkt, 0, 0,
                         null_cell, null_lan, 0) == 0) {
        null_ion->tx_done (null_reg_hdl, npkt);
    }
    return (0);
}

In the code sample above, the following steps are taken:

  1. Allocate a packet for the communication. The alloc_up_npkt() function (defined in the io_net_self_t structure) is responsible for allocating an up-going packet. Here we've created the initial packet that we're going to send to the upper layer (io-net itself).
  2. Allocate room for the advertisement message. The alloc() function (defined in the io_net_self_t structure) is used to create room for our advertising structure.
  3. Set up the packet into the queue. Here we bind the pointers to the buffers into the net_iov_t data type that we allocated above.
  4. Generate the information for the advertisement message. We create the advertisement message ourselves here by filling the various members of the ap structure (of type io_net_msg_dl_advert_t). The type field is set to _IO_NET_MSG_DL_ADVERT, which tells upper modules the type of message they're receiving. The iflags field is used to advertise the capabilities of our driver.
  5. Bind the advertisement message to the packet. Finally, we perform pointer manipulations to attach the data (the advertisement message) into the packet.
  6. Complete the transaction. To complete the transaction, we call tx_up() (defined in the io_net_self_t structure), which sends a packet to the layer above.

    We call tx_done() (also defined in the io_net_self_t structure), if no one above us takes the packet. Only when all modules (ourself and any above us that took the packet) are done with the packet is our own tx_done() called.

At this point, any modules that are attached to you from above know the characteristics of your driver.
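
In the advertisement, the interface name and the link-level address share the sdl_data array: the name comes first (sdl_nlen bytes, not NUL-terminated), followed by sdl_alen bytes of address. The sketch below mimics that layout with a hypothetical demo_dl_t structure (a simplified stand-in for struct sockaddr_dl, so that it builds anywhere) and shows how a reader of the message could recover the address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the name/address part of struct sockaddr_dl. */
typedef struct {
    uint8_t nlen;       /* length of the interface name */
    uint8_t alen;       /* length of the link-level address */
    char    data[24];   /* name, then address, back to back */
} demo_dl_t;

/* Pack name and addr into dl->data; returns 0 on success, -1 if too big. */
int demo_dl_pack(demo_dl_t *dl, const char *name,
                 const uint8_t *addr, size_t alen)
{
    size_t nlen = strlen(name);

    if (nlen + alen > sizeof(dl->data)) {
        return -1;
    }
    memset(dl, 0, sizeof(*dl));
    memcpy(dl->data, name, nlen);          /* no terminating NUL */
    memcpy(dl->data + nlen, addr, alen);
    dl->nlen = (uint8_t) nlen;
    dl->alen = (uint8_t) alen;
    return 0;
}

/* The address starts right after the name. */
const uint8_t *demo_dl_addr(const demo_dl_t *dl)
{
    return (const uint8_t *) dl->data + dl->nlen;
}
```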

The next two things to look at are how your driver receives data from the higher levels (destined for transmission via the hardware) and how it tells the higher levels that data has arrived (from the hardware).

Receiving data from the hardware

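When a packet originates in the hardware, the network driver is notified in some way, such as by an interrupt. The driver then gets the data from the hardware, places it in an npkt_t packet, and passes the packet up to io-net by calling tx_up_start().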

When all the modules have finished with the packet, io-net calls our driver's tx_done() function, so the driver can dispose of the packet.

The prototype for the tx_up_start() function is:

npkt_t *(*tx_up_start) (int registrant_hdl,
                        npkt_t *npkt,
                        int off,
                        int framelen_sub,
                        uint16_t cell,
                        uint16_t endpoint,
                        uint16_t iface,
                        void *done_hdl)

The arguments are:

registrant_hdl
Your driver's registrant handle, which your driver was given when it registered with io-net.
npkt
A pointer to a single packet or a linked list of packets to be processed.
off
Indicates to the layer above at which offset into the packet the type your layer presents starts. For example, if your driver processes Ethernet packets, it sets this argument to the size of the Ethernet header.
framelen_sub
How many bytes on the end of the packet should be ignored by modules at higher levels. For an Ethernet packet, this value is 0. The off and framelen_sub arguments let a packet be "decapsulated" without the need to perform a copy operation.
cell, endpoint, and iface
Indicate to the layers above that this packet came from your driver. Pass the cell and endpoint that were assigned to your driver by io-net when it registered.

The iface argument is for internal use and lets a single registrant present multiple interfaces of the same type to upper modules. It should start at 0 and increase sequentially. In the case of a driver talking to hardware (a simple up producer with no modules below it), it's actually more flexible to register multiple times if multiple interfaces are present (once for each interface). In this case, the iface parameter is always 0.

done_hdl
A pointer to extra data that your driver wants passed to its own tx_done() function.

This function returns a linked list of npkts that had errors, or NULL if all succeeded.
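
To illustrate how off and framelen_sub allow decapsulation without copying, here's a sketch of how a receiving module might locate its payload. The demo_payload() helper is hypothetical; the 14-byte header length is the standard Ethernet header size (two 6-byte addresses plus a 2-byte type).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETHER_HDR_LEN 14   /* 6 dst + 6 src + 2 type */

/*
 * Given a whole frame, return a pointer to the part the upper layer
 * cares about and store its length in *len. Nothing is copied; we just
 * skip "off" bytes at the front and ignore "framelen_sub" bytes at the
 * end. Returns NULL if the offsets don't fit inside the frame.
 */
const uint8_t *demo_payload(const uint8_t *frame, size_t framelen,
                            size_t off, size_t framelen_sub, size_t *len)
{
    if (frame == NULL || len == NULL || off > framelen ||
        framelen_sub > framelen - off) {
        return NULL;
    }
    *len = framelen - off - framelen_sub;
    return frame + off;
}
```

For a 64-byte Ethernet frame passed up with off set to 14 and framelen_sub set to 0, the upper layer sees a 50-byte payload that starts 14 bytes into the same buffer.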

Sample receiver

In our sample devn-null driver, recall that we created a thread to perform the "receive data from hardware" function:

pthread_create (NULL, NULL, null_rx_thread, NULL);

Let's now look at the null_rx_thread() function:

#include <sys/io-net.h>
#include <inttypes.h>
#include <atomic.h>
#include <unistd.h>

#define NULL_SIZE 50
#define NULL_IN_USE (0x1U << 31)   // unsigned, so the shift is well defined

extern io_net_self_t *null_ion;
extern int null_reg_hdl;
extern uint16_t null_cell;
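extern uint16_t null_lan;

// Our tx_done() callback, defined below:
int null_receive_complete (npkt_t *npkt, void *done_hdl, void *func_hdl);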


void *
null_rx_thread (void *arg)
{
    npkt_t *npkt = NULL;
    net_buf_t *nb;
    net_iov_t *ni;
    uint32_t ret;

    while (1)
    {
        if(!npkt)
        {
            // 1) We'll get one packet once and reuse it every time.
            if(!(npkt = null_ion->alloc_up_npkt(
                          sizeof *nb + sizeof *ni + NULL_SIZE,
                          (void **) &nb)))
            {
                continue;
            }
            npkt->tot_iov  = 1;    //next thing we set up below

            ni = (net_iov_t *)(nb + 1);

            nb->niov    = 1;
            nb->net_iov = ni;

            ni->iov_base = ni + 1;
            ni->iov_len  = NULL_SIZE;
            ni->iov_phys = (paddr_t)(null_ion->mphys(ni->iov_base));
            // pretend our hardware likes physical addresses
        }

        // 2) Wait for hardware.
        sleep (1);

        if((ret = atomic_set_value(&npkt->flags, NULL_IN_USE)) &
           NULL_IN_USE)
        {
            // 3) Still in use from last time.
            continue; 

        }

        // 4) At this point, we pretend the hardware has supplied
        //    us with NULL_SIZE bytes.


        // 5) Send it up, using one of these methods:
#if 1
        // Method 1:
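        {
            // Keep the returned error list in its own variable so
            // that we don't lose our reusable packet on success.
            npkt_t *failed = null_ion->tx_up_start(null_reg_hdl, npkt,
                                 0, 0, null_cell, null_lan, 0, NULL);

            if(failed)
            {
                null_receive_complete(failed, NULL, NULL);
            }
        }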
#else
        // Method 2:
        if(null_ion->reg_tx_done(null_reg_hdl, npkt, NULL) == -1)
        {
            null_receive_complete(npkt, NULL, NULL);
            continue;
        }
        if(null_ion->tx_up(null_reg_hdl, npkt, 0, 0, null_cell,
                           null_lan, 0) == 0)
        {
            null_ion->tx_done(null_reg_hdl, npkt);
        }
#endif
    }
}

int
null_receive_complete (npkt_t *npkt, void *done_hdl, void *func_hdl)
{
    // 6) Restock our cache.
    atomic_clr(&npkt->flags, NULL_IN_USE);
    return 0;
}

Here are the notes for this code:

  1. In this example, we use a single packet over and over. In a real driver, we'd probably reuse a pool of packets. We might also load the hardware, telling it to put data where the npkt's iovs point.
  2. Wait for hardware. In our simple driver, we simply sleep for one second to simulate some form of delay as might be encountered while waiting for data from a network. Depending on the complexity of your actual hardware, the sleep() call might just be replaced with something equally simple like InterruptWait(). This depends on the hardware architecture.
  3. Since we're the originator of the packet, we're allowed to use the top 12 bits of the npkt structure's flags member as we like. In this example, the top bit indicates that this packet is in use. It's cleared when our tx_done() function, null_receive_complete(), is called. In a typical driver, the tx_done() routine probably puts the npkt on a list for reuse or actually frees it.
  4. We received an event saying that the hardware has supplied us with data. At this point, we might typically reload the hardware, trying to keep it supplied with buffers.
  5. Send the packet up. The two methods are equivalent; however, the first makes more efficient use of internal locking within io-net. The second method is shown to illustrate that every module dealing with an up-headed packet must call io-net's tx_done() when done with the packet. The io-net manager calls our own tx_done() routine, null_receive_complete(), only when all modules (including our own) are done with the npkt.
  6. All modules are done with this packet. Mark it as ready for reuse by clearing the in-use bit in the npkt structure's flags.
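
The in-use bit works as a simple claim/release protocol. The sketch below shows the same idea with C11 <stdatomic.h> as a portable stand-in for the QNX atomic_set_value() and atomic_clr() calls used in the sample; demo_claim() and demo_release() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_IN_USE (UINT32_C(1) << 31)   /* top bit of the flags word */

/*
 * Try to claim the packet. atomic_fetch_or() returns the previous
 * value, so if the bit was already set, someone still owns the packet.
 */
bool demo_claim(_Atomic uint32_t *flags)
{
    uint32_t old = atomic_fetch_or(flags, DEMO_IN_USE);

    return (old & DEMO_IN_USE) == 0;
}

/* Called from a tx_done()-style callback: mark the packet reusable. */
void demo_release(_Atomic uint32_t *flags)
{
    atomic_fetch_and(flags, ~DEMO_IN_USE);
}
```

Because only the top bit is set and cleared, any other bits in the flags word are left untouched.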

Transmitting data to the hardware

When a higher level wants to send data to a lower level for processing, it calls the io-net manager's tx_down() function. When this happens, io-net determines the destination of the packet and calls that module's rx_down() function:

int (*rx_down) (npkt_t *npkt,
                void *func_hdl);

The arguments are:

npkt
A pointer to the data structure that describes the packet.
func_hdl
The handle you specified for your driver in io_net_registrant_t.

The members of the npkt structure that are of particular interest are:

cell, endpoint, and iface
The destination of the packet.
buffers
The list of net_buf_t structures that hold the packet data.
tot_iov
The total number of I/O vectors that contain the packet data.
framelen
The total size, in bytes, of all the data.

Note: The data sent from a higher level may be presented as a number of buffers. This is because of the way that the higher levels prepend and append encapsulation data onto the packet.

We'd typically load up our hardware, telling it to transmit the packet as described by the linked list of net_buf_t structures. Use the physical addresses (in the net_iov_t structures) to program the DMA. We'd then wait for some indication from the hardware, telling us it was done with the packet. We'd then call io-net's tx_done() function on the packet. In the previous example, this is the call:

null_ion->tx_done(null_reg_hdl, npkt);
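
As a sketch of what an rx_down() handler has to cope with when a packet arrives as several buffers, the code below walks a list of buffers, each with its own I/O vectors, and copies the fragments into one contiguous frame (as a driver might for hardware that can't do scatter/gather DMA). The demo_buf_t and demo_iov_t types are simplified, hypothetical stand-ins for net_buf_t and net_iov_t; the real structures are linked with queue macros and also carry physical addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const void *base;   /* start of this fragment */
    size_t      len;    /* bytes in this fragment */
} demo_iov_t;

typedef struct demo_buf {
    struct demo_buf  *next;  /* next buffer in the packet, or NULL */
    int               niov;  /* number of entries in iov[] */
    const demo_iov_t *iov;
} demo_buf_t;

/*
 * Copy every fragment, in order, into "out". Returns the number of
 * bytes copied, or -1 if the packet doesn't fit in out_size bytes.
 */
long demo_gather(const demo_buf_t *head, unsigned char *out, size_t out_size)
{
    size_t used = 0;
    const demo_buf_t *nb;
    int i;

    for (nb = head; nb != NULL; nb = nb->next) {
        for (i = 0; i < nb->niov; i++) {
            if (nb->iov[i].len > out_size - used) {
                return -1;
            }
            memcpy(out + used, nb->iov[i].base, nb->iov[i].len);
            used += nb->iov[i].len;
        }
    }
    return (long) used;
}
```

In a real driver, the byte total found this way should agree with the packet's framelen, and the number of vectors visited with tot_iov.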

nicinfo interface

You can use the nicinfo utility (described in the QNX 6 Utilities Reference) to get information about the current state of the network. It reports some generic information as well as more specific information (if it's available).

The generic information includes the total number of Transmit and Receive (Tx/Rx) packets that have been sent through the interface and the total number of Tx/Rx errors.

Each type of physical interface also has its own type of information that it can report. Currently, there's a specific structure for reporting information on Ethernet devices. There's also a more generic structure for passing name/value pairs.

The main data structure is Nic_t, which contains these other structures:

Each of these structures is described in more detail in the Network DDK API chapter.

By default, the nicinfo utility gets its information from a driver by performing a devctl() on /dev/io-net/en0, the Ethernet device on LAN 0. This devctl() then causes io-net to perform a callback into your driver using the devctl() function in the io_net_registrant_funcs_t structure that the driver registered with io-net.

When the driver's devctl() function is called, the dcmd argument is set to the command being performed. In the case of nicinfo, the dcmd is DCMD_IO_NET_NICINFO. The driver code should check for this command and copy the statistics into the buffer provided. For example:

#include <errno.h>      // EOK, ENOTSUP
#include <stdlib.h>     // min()
#include <string.h>     // memcpy()
#include <sys/nic.h>
int generic_eth_devctl( void *hdl, int dcmd, void *data,
                        size_t size, int *ret )
{
    Nic_t               *nic = (Nic_t *)hdl;
    int                 status;

    status  = EOK;
    switch( dcmd ) {
        case DCMD_IO_NET_NICINFO:
            memcpy( data, nic, min( size, sizeof( Nic_t ) ) );
            break;

        default:
            status = ENOTSUP;
            break;
    }

    return( status );
}

Your driver must ensure that the Nic_t structure that's passed back to nicinfo contains the correct information. Many of the entries in the Nic_t structure can be filled in at startup, but some change on the fly. For more information on this structure, see the Network DDK API chapter.

At runtime, your driver must update the nstats member of the Nic_t structure with the correct statistical information. The least that your driver must do is increment the Tx/Rx count on each Tx/Rx packet and also increment the Tx/Rx errors on each Tx/Rx error. There's more specific data for each device that doesn't strictly need to be updated, but it should be, since this is one of the best ways to debug a system.
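
A minimal sketch of this bookkeeping follows. The demo_stats_t structure and its member names are hypothetical placeholders, not the actual layout of the nstats member; see the Network DDK API chapter for the real fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counters; the real statistics structure differs. */
typedef struct {
    uint32_t tx_packets;
    uint32_t tx_errors;
    uint32_t rx_packets;
    uint32_t rx_errors;
} demo_stats_t;

/* Call once per transmit attempt; "ok" is nonzero if it succeeded. */
void demo_count_tx(demo_stats_t *s, int ok)
{
    if (ok) {
        s->tx_packets++;
    } else {
        s->tx_errors++;
    }
}

/* Call once per received packet; "ok" is nonzero if it was good. */
void demo_count_rx(demo_stats_t *s, int ok)
{
    if (ok) {
        s->rx_packets++;
    } else {
        s->rx_errors++;
    }
}

/*
 * Copy the statistics out, never writing more than the caller's
 * buffer holds. Returns the number of bytes copied.
 */
size_t demo_stats_copy(void *data, size_t size, const demo_stats_t *s)
{
    size_t n = size < sizeof(*s) ? size : sizeof(*s);

    memcpy(data, s, n);
    return n;
}
```

If the counters are updated from more than one thread, a real driver would also need to protect them, for example with a mutex or atomic operations.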

