docs/src/modules.md

Mon, 14 May 2018 18:20:56 +0200

author
Mike Becker <universe@uap-core.de>
date
Mon, 14 May 2018 18:20:56 +0200
changeset 312
e1e3b768ae8b
parent 310
b09677d72413
child 314
5d28dc8f0765
permissions
-rw-r--r--

renames ucx_ptrcmp() to ucx_cmp_ptr()

---
title: Modules
---

UCX provides several modules for data structures and algorithms.
You may choose to use specific modules by inclueding the corresponding header
file.
Please note, that some modules make use of other UCX modules.
For instance, the [Allocator](#allocator) module is used by many other modules
to allow flexible memory allocation.
By default the header files are placed into an `ucx` directory within your
systems include directory. In this case you can use a module by including it
via `#include <ucx/MODULENAME.h>`.
Required modules are included automatically.

<div id="modules" align="center">
    
----------------------- ----------------------      ----------------------------     -------------------------
[Allocator](#allocator) [AVL&nbsp;Tree](#avl-tree)  [Buffer](#buffer)                [List](#list)
[Logging](#logging)     [Map](#map)                 [Memory&nbsp;Pool](#memory-pool) [Properties](#properties)
[Stack](#stack)         [String](#string)           [Testing](#testing)              [Utilities](#utilities)
----------------------- ----------------------      ----------------------------     -------------------------

</div>

## Allocator

*Header file:* [allocator.h](api/allocator_8h.html)  
*Required modules:* None.

A UCX allocator consists of a pointer to the memory area / pool and four
function pointers to memory management functions operating on this memory
area / pool. These functions shall behave equivalent to the standard libc
functions `malloc`, `calloc`, `realloc` and `free`.

The signature of the memory management functions is based on the signature
of the respective libc function but each of them takes the pointer to the
memory area / pool as first argument.

As the pointer to the memory area / pool can be arbitrarily chosen, any data
can be provided to the memory management functions. One example is the
[UCX Memory Pool](#memory-pool).

## AVL Tree

*Header file:* [avl.h](api/avl_8h.html)  
*Required modules:* [Allocator](#allocator)

This binary search tree implementation allows average O(1) insertion and
removal of elements (excluding binary search time).
All common binary tree operations are implemented. Furthermore, this module
provides search functions via lower and upper bounds.

### Filtering items with a time window

Suppose you have a list of items which contain a `time_t` value and your task
is to find all items within a time window `[t_start, t_end]`.
With AVL Trees this is easy:
```C
/* ---------------------
 * Somewhere in a header
 */
typedef struct {
    time_t ts;
    /* other important data */
} MyObject;

/* -----------
 * Source code
 */

UcxAVLTree* tree = ucx_avl_new(ucx_longintcmp);
/* ... populate tree with objects, use '& MyObject.ts' as key ... */


/* Now find every item, with 30 <= ts <= 70 */
time_t ts_start = 30;
time_t ts_end = 70;

printf("Values in range:\n");
for (
        UcxAVLNode* node = ucx_avl_find_node(
            tree, (intptr_t) &ts_start,
            ucx_longintdist, UCX_AVL_FIND_LOWER_BOUNDED);
        node && (*(time_t*)node->key) <= ts_end;
        node = ucx_avl_succ(node)
    ) {
    printf(" ts: %ld\n", ((MyObject*)node->value)->ts);
}

ucx_avl_free_content(tree, free);
ucx_avl_free(tree);
```

## Buffer

*Header file:* [buffer.h](api/buffer_8h.html)  
*Required modules:* None.

Instances of this buffer implementation can be used to read from or to write to
memory like you would do with a stream. This allows the use of
`ucx_stream_copy()` from the [Utilities](#utilities) module to copy contents
from one buffer to another or from file or network streams to the buffer and
vice-versa.

More features for convenient use of the buffer can be enabled, like automatic
memory management and automatic resizing of the buffer space.
See the documentation of the macro constants in the header file for more
information.

### Add line numbers to a file

When reading a file line by line, you have three options: first, you could limit
the maximum supported line length.
Second, you allocate a god buffer large
enough for the most lines a text file could have.
And third, undoubtedly the best option, you start with a small buffer, which
adjusts on demand.
An `UcxBuffer` can be created to do just that for you.
Just pass the `UCX_BUFFER_AUTOEXTEND` option to the initialization function.
Here is a full working program, which adds line numbers to a file.
```C
#include <stdio.h>
#include <ucx/buffer.h>
#include <ucx/utils.h>

int main(int argc, char** argv) {

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file>\n", argv[0]);
        return 1;
    }

    FILE* input = fopen(argv[1], "r");
    if (!input) {
        perror("Canno read input");
        return 1;
    }

    const size_t chunksize = 256;

    UcxBuffer* linebuf =
        ucx_buffer_new(
            NULL,       /* the buffer should manage the memory area for us */
            2*chunksize,  /* initial size should be twice the chunk size */
            UCX_BUFFER_AUTOEXTEND); /* the buffer will grow when necessary */

    size_t lineno = 1;
    do {
        /* read line chunk */
        size_t read = ucx_stream_ncopy(
                input, linebuf, fread, ucx_buffer_write, chunksize);
        if (read == 0) break;
        
        /* handle line endings */
        do {
            sstr_t bufstr = ucx_buffer_to_sstr(linebuf);
            sstr_t nl = sstrchr(bufstr, '\n');
            if (nl.length == 0) break;

            size_t linelen = bufstr.length - nl.length;
            sstr_t linestr = sstrsubsl(bufstr, 0, linelen);

            printf("%zu: %" PRIsstr "\n", lineno++, SFMT(linestr));

            /* shift the buffer to the next line */
            ucx_buffer_shift_left(linebuf, linelen+1);
        } while(1);

    } while(1);

    /* print the 'noeol' line, if any */
    sstr_t lastline = ucx_buffer_to_sstr(linebuf);
    if (lastline.length > 0) {
        printf("%zu: %" PRIsstr, lineno, SFMT(lastline));
    }

    fclose(input);
    ucx_buffer_free(linebuf);

    return 0;
}
```

## List

*Header file:* [list.h](api/list_8h.html)  
*Required modules:* [Allocator](#allocator)

This module provides the data structure and several functions for a doubly
linked list. Among the common operations like insert, remove, search and sort,
we allow convenient iteration via a special `UCX_FOREACH` macro.

### Remove duplicates from an array of strings

Assume you are given an array of `sstr_t` and want to create a list of these
strings without duplicates.
```C
#include <stdio.h>
#include <ucx/list.h>
#include <ucx/string.h>
#include <ucx/utils.h>

UcxList* remove_duplicates(sstr_t* array, size_t arrlen) {
    UcxList* list = NULL;
    for (size_t i = 0 ; i < arrlen ; ++i) {
        if (ucx_list_find(list, array+i, ucx_cmp_sstr, NULL) == -1) {
            sstr_t* s = malloc(sizeof(sstr_t));
            *s = sstrdup(array[i]);
            list = ucx_list_append(list, s);
        }
    }
    return list;
}

/* we will need this function to clean up the list contents later */
void free_sstr(void* ptr) {
    sstr_t* s = ptr;
    free(s->ptr);
    free(s);
}

/* ... */

sstr_t* array = /* some array of strings */
size_t arrlen = /* the length of the array */

UcxList* list = remove_duplicates(array,arrlen);

/* Iterate over the list and print the elements */
UCX_FOREACH(elem, list) {
    sstr_t s = *((sstr_t*)elem->data);
    printf("%" PRIsstr "\n", SFMT(s));
}

/* Use our free function to free the duplicated strings. */
ucx_list_free_content(list, free_sstr);
ucx_list_free(list);
```

## Logging

*Header file:* [logging.h](api/logging_8h.html)  
*Required modules:* [Map](#map), [String](#string)

The logging module comes with some predefined log levels and allows some more
customization. You may choose if you want to get timestamps or source file and
line number logged automatically when outputting a message.
The following function call initializes a debug logger with all of the above
information:
```C
    log = ucx_logger_new(stdout, UCX_LOGGER_DEBUG,
            UCX_LOGGER_LEVEL | UCX_LOGGER_TIMESTAMP | UCX_LOGGER_SOURCE);
```
Afterwards you can use this logger with the predefined macros
```C
    ucx_logger_trace(log, "Verbose output");
    ucx_logger_debug(log, "Debug message");
    ucx_logger_info(log, "Information");
    ucx_logger_warn(log, "Warning");
    ucx_logger_error(log, "Error message");
```
or you use
```C
    ucx_logger_log(log, CUSTOM_LEVEL, "Some message")
```
When you use your custom log level, don't forget to register it with
```C
    ucx_logger_register_level(log, CUSTOM_LEVEL, "CUSTOM")
```
where the last argument must be a string literal.

## Map

*Header file:* [map.h](api/map_8h.html)  
*Required modules:* [Allocator](#allocator), [String](#string)

This module provides a hash map implementation using murmur hash 2 and separate
chaining with linked lists. Similarly to the list module, we provide a
`UCX_MAP_FOREACH` macro to conveniently iterate through the key/value pairs.

### Parsing command line options

Assume you want to parse command line options and record them within a map.
One way to do this is shown by the following code sample:
```C
    UcxMap* options = ucx_map_new(16);
    const char *NOARG = "";
    
    char *option = NULL;
    char optchar = 0;
    for(int i=1;i<argc;i++) {
        char *arg = argv[i];
        size_t len = strlen(arg);
        if(len > 1 && arg[0] == '-') {
            for(int c=1;c<len;c++) {
                if(option) {
                    fprintf(stderr,
                            "Missing argument for option -%c\n", optchar);
                    return 1;
                }
                switch(arg[c]) {
                    default: {
                        fprintf(stderr, "Unknown option -%c\n\n", arg[c]);
                        return 1;
                    }
                    case 'v': {
                        ucx_map_cstr_put(options, "verbose", NOARG);
                        break;
                    }
                    case 'o': {
                        option = "output";
                        optchar = 'o';
                        break;
                    }
                }
            }
        } else if(option) {
            ucx_map_cstr_put(options, option, arg);
            option = NULL;
        } else {
            /* ... handle argument that is not an option ... */
        }
    }
    if(option) {
        fprintf(stderr,
                "Missing argument for option -%c\n", optchar);
        return 1;
    }
```
With the following loop, you can access the previously recorded options:
```C
    UcxMapIterator iter = ucx_map_iterator(options);
    char *arg;
    UCX_MAP_FOREACH(optkey, arg, iter) {
        char* opt = optkey.data;
        if (*arg) {
            printf("%s = %s\n", opt, arg);
        } else {
            printf("%s active\n", opt);
        }
    }
```
Don't forget to call `ucx_map_free()`, when you are done with the map.

## Memory Pool

*Header file:* [mempool.h](api/mempool_8h.html)  
*Required modules:* [Allocator](#allocator)

Here we have a concrete allocator implementation in the sense of a memory pool.
This pool allows you to register destructor functions for the allocated memory,
which are automatically called on the destruction of the pool.
But you may also register *independent* destructor functions within a pool in
case some external library allocated memory for you, which should be
destroyed together with this pool.

Many UCX modules support the use of an allocator.
The [String Module](#string), for instance, provides the `sstrdup_a()` function,
which uses the specified allocator to allocate the memory for the duplicated
string.
This way, you can use a `UcxMempool` to keep track of the memory occupied by
duplicated strings and cleanup everything with just a single call to
`ucx_mempool_destroy()`.

### Read CSV data into a structure

The following code example shows some of the basic memory pool functions and
how they can be used with other UCX modules.
```C
#include <stdio.h>
#include <ucx/mempool.h>
#include <ucx/list.h>
#include <ucx/string.h>
#include <ucx/buffer.h>
#include <ucx/utils.h>

typedef struct {
    sstr_t column_a;
    sstr_t column_b;
    sstr_t column_c;
} CSVData;

int main(int argc, char** argv) {

    UcxMempool* pool = ucx_mempool_new(128);

    FILE *f = fopen("test.csv", "r");
    if (!f) {
        perror("Cannot open file");
        return 1;
    }
    /* close the file automatically at pool destruction*/
    ucx_mempool_reg_destr(pool, f, (ucx_destructor) fclose);

    /* create a buffer and register it at the memory pool for destruction */
    UcxBuffer* content = ucx_buffer_new(NULL, 256, UCX_BUFFER_AUTOEXTEND);
    ucx_mempool_reg_destr(pool, content, (ucx_destructor) ucx_buffer_free);

    /* read the file and split it by lines first */
    ucx_stream_copy(f, content, fread, ucx_buffer_write);
    sstr_t contentstr = ucx_buffer_to_sstr(content);
    ssize_t lc = 0;
    sstr_t* lines = sstrsplit_a(pool->allocator, contentstr, S("\n"), &lc);

    /* skip the header and parse the remaining data */
    UcxList* datalist = NULL;
    for (size_t i = 1 ; i < lc ; i++) {
        if (lines[i].length == 0) continue;
        ssize_t fc = 3;
        sstr_t* fields = sstrsplit_a(pool->allocator, lines[i], S(";"), &fc);
        if (fc != 3) {
            fprintf(stderr, "Syntax error in line %zu.\n", i);
            ucx_mempool_destroy(pool);
            return 1;
        }
        CSVData* data = ucx_mempool_malloc(pool, sizeof(CSVData));
        data->column_a = fields[0];
        data->column_b = fields[1];
        data->column_c = fields[2];
        datalist = ucx_list_append_a(pool->allocator, datalist, data);
    }

    /* control output */
    UCX_FOREACH(elem, datalist) {
        CSVData* data = elem->data;
        printf("Column A: %" PRIsstr " | "
               "Column B: %" PRIsstr " | "
               "Column C: %" PRIsstr "\n",
               SFMT(data->column_a), SFMT(data->column_b), SFMT(data->column_c)
        );
    }

    /* cleanup everything, no manual free() needed */
    ucx_mempool_destroy(pool);

    return 0;
} 
```

### Overriding the default destructor

Sometimes you need to allocate memory with `ucx_mempool_malloc()`, but the
memory is not supposed to be freed with a simple call to `free()`.
In this case, you can overwrite the default destructor as follows:
```C
    MyObject* obj = ucx_mempool_malloc(pool, sizeof(MyObject));

    /* some special initialization with own resource management */
    my_object_init(obj);

    /* register destructor function */
    ucx_mempool_set_destr(obj, (ucx_destructor) my_object_destroy);
```
Be aware, that your destructor function should not free any memory, that is
also managed by the pool.
Otherwise you might be risking a double-free.

## Properties

*Header file:* [properties.h](api/properties_8h.html)  
*Required modules:* [Map](#map)

This module provides load and store function for `*.properties` files.
The key/value pairs are stored within an UCX Map.

### Example: Loading properties from a file

```C
/* Open the file as usual */
FILE* file = fopen("myprops.properties", "r");
if (!file) {
    // error handling
    return 1;
}

/* Load the properties from the file */
UcxMap* myprops = ucx_map_new(16);
if (ucx_properties_load(myprops, file)) {
    /* ... error handling ... */
    fclose(file);
    ucx_map_free(myprops);
    return 1;
}

/* Print out the key/value pairs */
char* propval;
UcxMapIterator propiter = ucx_map_iterator(myprops);
UCX_MAP_FOREACH(key, propval, propiter) {
    printf("%s = %s\n", (char*)key.data, propval);
}

/* Don't forget to free the values before freeing the map */
ucx_map_free_content(myprops, NULL);
ucx_map_free(myprops);
fclose(file);
```

## Stack

*Header file:* [stack.h](api/stack_8h.html)  
*Required modules:* [Allocator](#allocator)

This concrete implementation of an UCX Allocator allows you to grab some amount
of memory which is then handled as a stack.
Please note, that the term *stack* only refers to the behavior of this
allocator. You may still choose to use either stack or heap memory
for the underlying space.
A typical use case is an algorithm where you need to allocate and free large
amounts of memory very frequently.

The following code sample shows how to initialize a stack and push and pop
simple data.
```C
    const size_t len = 1024;
    char space[len];
    UcxStack stack;
    ucx_stack_init(&stack, space, len);

    int i = 42;
    float f = 3.14f;
    const char* str = "Hello!";
    size_t strn = 7;

    /* push the integer */
    ucx_stack_push(&stack, sizeof(int), &i);

    /* push the float and rember the address */
    float* remember = ucx_stack_push(&stack, sizeof(float), &f);

    /* push the string with zero terminator */
    ucx_stack_push(&stack, strn, str);

    /* if we forget, how big an element was, we can ask the stack */
    printf("Length of string: %zu\n", ucx_stack_topsize(&stack)-1);

    /* retrieve the string as sstr_t, without zero terminator! */
    sstr_t s;
    s.length = ucx_stack_topsize(&stack)-1;
    s.ptr = malloc(s.length);
    ucx_stack_popn(&stack, s.ptr, s.length);
    printf("%" PRIsstr "\n", SFMT(s));

    /* print the float directly from the stack and free it */
    printf("Float: %f\n", *remember);
    ucx_stack_free(&stack, remember);

    /* the last element is the integer */
    int j;
    ucx_stack_pop(&stack, &j);
    printf("Integer: %d\n", j);
```



## String

*Header file:* [string.h](api/string_8h.html)  
*Required modules:* [Allocator](#allocator)

This module provides a safe implementation of bounded string.
Usually C strings do not carry a length. While for zero-terminated strings you
can easily get the length with `strlen`, this is not generally possible for
arbitrary strings.
The `sstr_t` type of this module always carries the string and its length to
reduce the risk of buffer overflows dramatically.

### Initialization

There are several ways to create an `sstr_t`:

```C
/* (1) sstr() uses strlen() internally, hence cstr MUST be zero-terminated */
sstr_t a = sstr(cstr);

/* (2) cstr does not need to be zero-terminated, if length is specified */
sstr_t b = sstrn(cstr, len);

/* (3) S() macro creates sstr_t from a string using sizeof() and using sstrn().
       This version is especially useful for function arguments */
sstr_t c = S("hello");

/* (4) ST() macro creates sstr_t struct literal using sizeof() */
sstr_t d = ST("hello");
```

You should not use the `S()` or `ST()` macro with string of unknown origin,
since the `sizeof()` call might not coincide with the string length in those
cases. If you know what you are doing, it can save you some performance,
because you do not need the `strlen()` call.

### Finding the position of a substring

The `sstrstr()` function gives you a new `sstr_t` object starting with the
requested substring. Thus determining the position comes down to a simple
subtraction.

```C
sstr_t haystack = ST("Here we go!");
sstr_t needle = ST("we");
sstr_t result = sstrstr(haystack, needle);
if (result.ptr)
    printf("Found at position %zd.\n", haystack.length-result.length);
else
    printf("Not found.\n");
```

### Spliting a string by a delimiter

The `sstrsplit()` function (and its allocator based version `sstrsplit_a()`) is
very powerful and might look a bit nasty at a first glance. But it is indeed
very simple to use. It is even more convenient in combination with a memory
pool.

```C
sstr_t test = ST("here::are::some::strings");
sstr_t delim = ST("::");

ssize_t count = 0; /* no limit */
UcxMempool* pool = ucx_mempool_new_default();

sstr_t* result = sstrsplit_a(pool->allocator, test, delim, &count);
for (ssize_t i = 0 ; i < count ; i++) {
    /* don't forget to specify the length via the %*s format specifier */
    printf("%*s\n", result[i].length, result[i].ptr);
}

ucx_mempool_destroy(pool);
```
The output is:

    here
    are
    some
    strings

The memory pool ensures, that all strings are freed.

## Testing

*Header file:* [test.h](api/test_8h.html)  
*Required modules:* None.

This module provides a testing framework which allows you to execute test cases
within test suites.
To avoid code duplication within tests, we also provide the possibility to
define test subroutines.

You should declare test cases and subroutines in a header file per test unit
and implement them as you would implement normal functions.
```C
    /* myunit.h */
    UCX_TEST(function_name);
    UCX_TEST_SUBROUTINE(subroutine_name, paramlist); /* optional */


    /* myunit.c */
    UCX_TEST_SUBROUTINE(subroutine_name, paramlist) {
        /* ... reusable tests with UCX_TEST_ASSERT() ... */
    }

    UCX_TEST(function_name) {
        /* ... resource allocation and other test preparation ... */

        /* mandatory marker for the start of the tests */
        UCX_TEST_BEGIN

        /*  ... verifications with UCX_TEST_ASSERT() ...
         * (and/or calls with UCX_TEST_CALL_SUBROUTINE())
         */

        /* mandatory marker for the end of the tests */
        UCX_TEST_END

        /* ... resource cleanup ...
         * (all code after UCX_TEST_END is always executed)
         */
    }
```
If you want to use the `UCX_TEST_ASSERT()` macro in a function, you are
*required* to use a `UCX_TEST_SUBROUTINE`.
Otherwise the testing framework does not know where to jump, when the assertion
fails.

After implementing the tests, you can easily build a test suite and execute it:
```C
    UcxTestSuite* suite = ucx_test_suite_new();
    ucx_test_register(suite, testMyTestCase01);
    ucx_test_register(suite, testMyTestCase02);
    /* ... */
    ucx_test_run(suite, stdout); /* stdout, or any other FILE stream */
```

## Utilities

*Header file:* [utils.h](api/utils_8h.html)  
*Required modules:* [Allocator](#allocator), [String](#string)

In this module we provide very general utility function for copy and compare
operations.
We also provide several `printf` variants to conveniently print formatted data
to streams or strings.

### A simple copy program

The utilities package provides several stream copy functions.
One of them has a very simple interface and can, for instance, be used to copy
whole files in a single call.
This is a minimal working example:
```C
#include <stdio.h>
#include <ucx/utils.h>

int main(int argc, char** argv) {

    if (argc != 3) {
        fprintf(stderr, "Use %s <src> <dest>", argv[0]);
        return 1;
    }

    FILE *srcf = fopen(argv[1], "r");   /* insert error handling on your own */
    FILE *destf = fopen(argv[2], "w");
    
    size_t n =  ucx_stream_copy(srcf, destf, fread, fwrite);
    printf("%zu bytes copied.\n", n);

    fclose(srcf);
    fclose(destf);


    return 0;
}
```

### Automatic allocation for formatted strings

The UCX utility function `ucx_asprintf()` and it's convenient shortcut
`ucx_sprintf` allow easy formatting of strings, without ever having to worry
about the required space.
```C
sstr_t mystring = ucx_sprintf("The answer is: %d!", 42);
```
Still, you have to pass `mystring.ptr` to `free()` (or the free function of
your allocator, if you use `ucx_asprintf`).
If you don't have all the information ready to build your string, you can even
use a [UcxBuffer](#buffer) as a target with the utility function
`ucx_bprintf()`.
```C
UcxBuffer* strbuffer = ucx_buffer_new(NULL, 512, UCX_BUFFER_AUTOEXTEND);

for (unsigned int i = 2 ; i < 100 ; i++) {
        ucx_bprintf(strbuffer, "Integer %d is %s\n",
                        i, prime(i) ? "prime" : "not prime");
}

/* print the result to stdout */
printf("%s", (char*)strbuffer->space);

ucx_buffer_free(strbuffer);
```

mercurial