--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/Writerside/topics/features.md	Thu Jan 23 01:15:52 2025 +0100
@@ -0,0 +1,395 @@
+---
+title: UCX Features
+---
+
+<div id="modules">
+
+------------------------ ------------------------- ------------------- ---------------------------------
+[Allocator](#allocator) [String](#string) [Buffer](#buffer) [Memory Pool](#memory-pool)
+[Iterator](#iterator) [Collection](#collection) [List](#list) [Map](#map)
+[Utilities](#utilities)
+------------------------ ------------------------- ------------------- ---------------------------------
+
+</div>
+
+## Allocator
+
+*Header file:* [allocator.h](api/allocator_8h.html)
+
+The UCX allocator provides an interface for implementing your own memory allocation mechanism.
+Various functions in UCX provide an alternative signature that takes an allocator as an
+argument. A default allocator implementation using the stdlib memory management functions is
+available via the global symbol `cxDefaultAllocator`.
+
+If you want to define your own allocator, you need to initialize the `CxAllocator` structure
+with a pointer to an allocator class (containing function pointers for the memory management
+functions) and an optional pointer to an arbitrary memory region that can be used to store
+state information for the allocator. An example is shown below:
+
+```c
+#include <stdlib.h>
+#include <cx/allocator.h>
+
+struct my_allocator_state {
+    size_t total;
+    size_t avail;
+    char mem[];
+};
+
+// my_malloc_impl() and the other functions are assumed to be declared and defined elsewhere
+static cx_allocator_class my_allocator_class = {
+    my_malloc_impl,
+    my_realloc_impl,
+    my_calloc_impl,
+    my_free_impl
+};
+
+CxAllocator create_my_allocator(size_t n) {
+    CxAllocator alloc;
+    alloc.cl = &my_allocator_class;
+    struct my_allocator_state *state = calloc(1, sizeof(struct my_allocator_state) + n);
+    if (state != NULL) {
+        // the allocator initially manages n bytes, all of them available
+        state->total = n;
+        state->avail = n;
+    }
+    alloc.data = state;
+    return alloc;
+}
+```
+
+## String
+
+*Header file:* [string.h](api/string_8h.html)
+
+UCX strings come in two variants: immutable (`cxstring`) and mutable (`cxmutstr`).
+The functions of UCX are designed to work with immutable strings by default, but where it is necessary,
+the API also provides alternative functions that work directly with mutable strings.
+Functions that change a string in place, of course, only accept mutable strings.
+
+When you are using UCX functions or defining your own, you may face the situation
+that a function only accepts arguments of type `cxstring`, but you only have a `cxmutstr` at hand.
+In this case, you _should not_ introduce a wrapper function that accepts the `cxmutstr`;
+instead, use the `cx_strcast()` function to convert the argument to the correct type.
+
+In general, UCX strings are **not** necessarily zero-terminated. If a function guarantees to return a zero-terminated
+string, this is explicitly mentioned in the documentation of the respective function.
+As a rule of thumb, you _should not_ pass the `ptr` of a UCX string structure to another API without explicitly
+ensuring that the string is zero-terminated.
+
+## Buffer
+
+*Header file:* [buffer.h](api/buffer_8h.html)
+
+Instances of this buffer implementation can be used to read from or write to memory like you would do with a stream.
+This allows the use of `cx_stream_copy()` (see [Utilities](#utilities)) to copy contents from one buffer to another,
+or from a file or network stream to the buffer and vice versa.
+
+Additional features for convenient use of the buffer can be enabled, like automatic memory management and automatic
+resizing of the buffer space.
+
+Since UCX 3.0, the buffer also supports automatic flushing of contents to another stream (or buffer) as an alternative
+to automatically resizing the buffer space.
+Please refer to the API doc for the fields prefixed with `flush_` to learn more.
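+
+To illustrate what "stream-compatible" means, here is a minimal standalone sketch that does not use UCX at all.
+The type `membuf` and the functions `membuf_write()` and `membuf_read()` are hypothetical names chosen for this
+example and are not part of the UCX API. The write function follows the `fwrite()` parameter order (pointer,
+element size, element count, stream) and grows the memory on demand, similar in spirit to `CX_BUFFER_AUTO_EXTEND`;
+the read function mirrors `fread()`.
+
+```c
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+// hypothetical stream-like memory buffer (not the UCX CxBuffer struct)
+typedef struct {
+    char *space;     // backing memory, grown with realloc()
+    size_t size;     // number of valid bytes
+    size_t capacity; // number of allocated bytes
+    size_t pos;      // current read/write position
+} membuf;
+
+// fwrite()-like: writes nitems elements of the given size at pos and
+// grows the memory on demand; returns the number of elements written
+size_t membuf_write(const void *ptr, size_t size, size_t nitems, void *stream) {
+    membuf *buf = stream;
+    if (size == 0 || nitems == 0) return 0;
+    if (nitems > (SIZE_MAX - buf->pos) / size) return 0; // size overflow
+    size_t len = size * nitems;
+    size_t required = buf->pos + len;
+    if (required > buf->capacity) {
+        size_t newcap = buf->capacity > 0 ? buf->capacity : 16;
+        while (newcap < required) {
+            // double the capacity, or jump straight to the requirement near SIZE_MAX
+            newcap = newcap > SIZE_MAX / 2 ? required : newcap * 2;
+        }
+        char *newspace = realloc(buf->space, newcap);
+        if (newspace == NULL) return 0;
+        buf->space = newspace;
+        buf->capacity = newcap;
+    }
+    memcpy(buf->space + buf->pos, ptr, len);
+    buf->pos += len;
+    if (buf->pos > buf->size) buf->size = buf->pos;
+    return nitems;
+}
+
+// fread()-like: reads up to nitems complete elements starting at pos;
+// returns the number of elements read
+size_t membuf_read(void *ptr, size_t size, size_t nitems, void *stream) {
+    membuf *buf = stream;
+    if (size == 0 || buf->pos >= buf->size) return 0;
+    size_t avail = (buf->size - buf->pos) / size;
+    size_t n = nitems < avail ? nitems : avail;
+    memcpy(ptr, buf->space + buf->pos, n * size);
+    buf->pos += n * size;
+    return n;
+}
+```
+
+Because both functions use the same parameter order as `fwrite()` and `fread()`, a generic copy loop that only
+receives a read and a write function pointer could move data between a `FILE` and a `membuf` without knowing which
+is which. This is the idea behind passing `(cx_read_func) fread` and `cxBufferWriteFunc` to `cx_stream_copy()`.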
+
+## Memory Pool
+
+*Header file:* [mempool.h](api/mempool_8h.html)
+
+A memory pool provides an allocator implementation that automatically deallocates the memory upon its destruction.
+It also allows you to register destructor functions for the allocated memory, which are automatically called before
+the memory is deallocated.
+Additionally, you may register _independent_ destructor functions within a pool in case some external library
+allocated memory for you, which should be freed together with this pool.
+
+Many UCX features support the use of an allocator.
+The [strings](#string), for instance, provide several functions suffixed with `_a` that allow specifying an allocator.
+You can use this to keep track of the memory occupied by dynamically allocated strings and clean up everything with
+just a single call to `cxMempoolFree()`.
+
+The following code illustrates this with the example of reading a CSV file into memory.
+```C
+#include <stdio.h>
+#include <stdint.h>
+#include <cx/mempool.h>
+#include <cx/linked_list.h>
+#include <cx/string.h>
+#include <cx/buffer.h>
+#include <cx/utils.h>
+
+typedef struct {
+    cxstring column_a;
+    cxstring column_b;
+    cxstring column_c;
+} CSVData;
+
+int main(void) {
+    CxMempool* pool = cxBasicMempoolCreate(128);
+
+    FILE *f = fopen("test.csv", "r");
+    if (!f) {
+        perror("Cannot open file");
+        cxMempoolFree(pool);
+        return 1;
+    }
+    // close the file automatically at pool destruction
+    cxMempoolRegister(pool, f, (cx_destructor_func) fclose);
+
+    // create a buffer using the memory pool for destruction
+    CxBuffer *content = cxBufferCreate(NULL, 256, pool->allocator, CX_BUFFER_AUTO_EXTEND);
+
+    // read the file into the buffer and turn it into a string;
+    // no fclose() here, because the pool already closes the file
+    cx_stream_copy(f, content, (cx_read_func) fread, cxBufferWriteFunc);
+    cxstring contentstr = cx_strn(content->space, content->size);
+
+    // split the string into lines, using the mempool to allocate the target array
+    cxstring* lines;
+    size_t lc = cx_strsplit_a(pool->allocator, contentstr,
CX_STR("\n"), SIZE_MAX, &lines); + + // skip the header and parse the remaining data into a linked list + // the nodes of the linked list shall also be allocated by the mempool + CxList* datalist = cxLinkedListCreate(pool->allocator, NULL, sizeof(CSVData)); + for (size_t i = 1 ; i < lc ; i++) { + if (lines[i].length == 0) continue; + cxstring fields[3]; + size_t fc = cx_strsplit(lines[i], CX_STR(";"), 3, fields); + if (fc != 3) { + fprintf(stderr, "Syntax error in line %zu.\n", i); + cxMempoolFree(pool); + return 1; + } + CSVData data; + data.column_a = fields[0]; + data.column_b = fields[1]; + data.column_c = fields[2]; + cxListAdd(datalist, &data); + } + + // iterate through the list and output the data + CxIterator iter = cxListIterator(datalist); + cx_foreach(CSVData*, data, iter) { + printf("Column A: %.*s | " + "Column B: %.*s | " + "Column C: %.*s\n", + (int)data->column_a.length, data->column_a.ptr, + (int)data->column_b.length, data->column_b.ptr, + (int)data->column_c.length, data->column_c.ptr + ); + } + + // cleanup everything, no manual free() needed + cxMempoolFree(pool); + + return 0; +} +``` + +## Iterator + +*Header file:* [iterator.h](api/iterator_8h.html) + +In UCX 3 a new feature has been introduced to write own iterators, that work with the `cx_foreach` macro. +In previous UCX releases there were different hard-coded foreach macros for lists and maps that were not customizable. +Now, creating an iterator is as simple as creating a `CxIterator` struct and setting the fields in a meaningful way. + +You do not always need all fields in the iterator structure, depending on your use case. +Sometimes you only need the `index` (for example when iterating over simple lists), and other times you will need the +`slot` and `kv_data` fields (for example when iterating over maps). 
+
+If the predefined fields are insufficient for your use case, you can alternatively create your own iterator structure
+and place the `CX_ITERATOR_BASE` macro as the first member of that structure.
+
+Usually, an iterator does not mutate the collection it is iterating over.
+Some programming languages even disallow changing the collection while iterating over it with foreach.
+But sometimes it is desirable to remove an element from the collection while iterating over it.
+For this purpose, most collections allow the creation of a _mutating_ iterator.
+The only differences are that the `mutating` flag is `true` and the `src_handle` is not const.
+On mutating iterators, it is allowed to call the `cxFlagForRemoval()` function, which instructs the iterator to remove
+the current element from the collection on the next call to `cxIteratorNext()` and to clear the flag afterward.
+If you are implementing your own iterator, it is up to you to implement this behavior.
+
+## Collection
+
+*Header file:* [collection.h](api/collection_8h.html)
+
+Collections in UCX 3 have several common features.
+If you want to implement your own collection data type that uses the same features, you can use the
+`CX_COLLECTION_BASE` macro at the beginning of your struct to declare all the members that a typical UCX collection has.
+```c
+struct my_fancy_collection_s {
+    CX_COLLECTION_BASE;
+    struct my_collection_data_s *data;
+};
+```
+Based on this structure, this header provides some convenience macros for invoking the destructor functions
+that are part of the basic collection members.
+The idea of having destructor functions within a collection is that you can destroy the collection _and_ its
+contents with one single function call.
+When you are implementing a collection, you are responsible for invoking the destructors at the right places, e.g.,
+when removing (and deleting) elements from the collection, when clearing the collection, or, most prominently,
+when destroying the collection.
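+
+A minimal standalone sketch of this responsibility: a hypothetical pointer list (`my_ptr_list`, not a UCX type)
+stores an optional element destructor, and both clearing and destroying the list invoke it for every element that
+is dropped. The actual members provided by `CX_COLLECTION_BASE` and the destructor macros are documented in
+collection.h.
+
+```c
+#include <stdlib.h>
+
+// hypothetical element destructor type (not UCX's cx_destructor_func)
+typedef void (*elem_destructor)(void *elem);
+
+// hypothetical pointer list that owns its elements when a destructor is set
+typedef struct {
+    void **items;
+    size_t size;
+    size_t capacity;
+    elem_destructor destructor; // may be NULL
+} my_ptr_list;
+
+int my_ptr_list_add(my_ptr_list *list, void *item) {
+    if (list->size == list->capacity) {
+        size_t newcap = list->capacity > 0 ? 2 * list->capacity : 8;
+        void **items = realloc(list->items, newcap * sizeof(void *));
+        if (items == NULL) return 1;
+        list->items = items;
+        list->capacity = newcap;
+    }
+    list->items[list->size++] = item;
+    return 0;
+}
+
+// the collection, not the caller, invokes the destructor for dropped elements
+void my_ptr_list_clear(my_ptr_list *list) {
+    if (list->destructor != NULL) {
+        for (size_t i = 0; i < list->size; i++) {
+            list->destructor(list->items[i]);
+        }
+    }
+    list->size = 0;
+}
+
+// one call releases the contents and the list's own memory
+void my_ptr_list_destroy(my_ptr_list *list) {
+    my_ptr_list_clear(list);
+    free(list->items);
+    list->items = NULL;
+    list->capacity = 0;
+}
+```
+
+With `free` as the destructor, a single call to `my_ptr_list_destroy()` releases the list and all elements it owns.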
+
+You can always look at the UCX list and map implementations if you need some inspiration.
+
+## List
+
+*Header file:* [list.h](api/list_8h.html)
+
+This header defines a common interface for all list implementations.
+
+UCX already comes with two common list implementations (linked list and array list) that should cover most use cases.
+But if you feel the need to implement your own list, the only thing you need to do is to define a struct with a
+`struct cx_list_s` as its first member and set an appropriate list class that implements the functionality.
+It is strongly recommended that this class is shared among all instances of the same list type, because otherwise
+the `cxListCompare` function cannot use the optimized implementation of your class and will instead fall back to
+using iterators to compare the contents element-wise.
+
+### Linked List
+
+*Header file:* [linked_list.h](api/linked__list_8h.html)
+
+On top of implementing the list interface, this header also defines several low-level functions that
+work with arbitrary structures.
+Low-level functions, in contrast to the high-level list interface, can easily be recognized by their snake-casing.
+The function `cx_linked_list_at`, for example, implements functionality similar to `cxListAt`, but operates
+on arbitrary structures.
+The following snippet shows how it is used.
+All other low-level functions work similarly.
+```c
+#include <stddef.h>
+#include <cx/linked_list.h>
+
+struct node {
+    struct node *next;
+    struct node *prev;
+    int data;
+};
+
+void example(void) {
+    const ptrdiff_t loc_prev = offsetof(struct node, prev);
+    const ptrdiff_t loc_next = offsetof(struct node, next);
+    const ptrdiff_t loc_data = offsetof(struct node, data); // not needed by cx_linked_list_at()
+
+    // link the nodes to the chain a <-> b <-> c <-> d
+    struct node a = {0}, b = {0}, c = {0}, d = {0};
+    cx_linked_list_link(&a, &b, loc_prev, loc_next);
+    cx_linked_list_link(&b, &c, loc_prev, loc_next);
+    cx_linked_list_link(&c, &d, loc_prev, loc_next);
+
+    // start at a (index 0) and follow the next pointers up to index 2
+    struct node *n = cx_linked_list_at(&a, 0, loc_next, 2); // n points to c
+}
+```
+
+### Array List
+
+*Header file:* [array_list.h](api/array__list_8h.html)
+
+Since low-level array lists are just plain arrays, there is no need for as many low-level functions as for linked
+lists.
+However, there is one extremely powerful function that can be used for several complex tasks: `cx_array_copy`.
+The full signature is shown below:
+```c
+int cx_array_copy(
+        void **target,
+        void *size,
+        void *capacity,
+        unsigned width,
+        size_t index,
+        const void *src,
+        size_t elem_size,
+        size_t elem_count,
+        struct cx_array_reallocator_s *reallocator
+);
+```
+The `target` argument is a pointer to the target array pointer.
+The reason for this additional indirection is that this function writes
+back the pointer to the possibly reallocated array.
+The next two arguments are pointers to the `size` and `capacity` of the target array, whose integer width
+(in bits) is specified by the `width` argument.
+
+On a successful invocation, the function copies `elem_count` elements, each of size `elem_size`, from
+`src` to `*target`, starting at position `index`, and uses the `reallocator` to extend the array when necessary.
+Finally, the size, capacity, and the pointer to the array are all updated and the function returns zero.
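+
+The mechanics described above can be sketched without UCX: the extra indirection through `target`, the
+write-back of size, capacity, and array pointer, growth on demand, and overlap-safe copying. The function
+`array_copy()` below is hypothetical and deliberately simpler than `cx_array_copy`: it uses `size_t` counters
+instead of the `width` parameter, calls `realloc()` instead of a reallocator, grows the array to exactly the
+required capacity, and zero-fills any gap that arises when `index` lies beyond the current size (an assumption
+of this sketch, not documented UCX behavior).
+
+```c
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+// hypothetical, simplified variant of the idea behind cx_array_copy:
+// copies elem_count elements of elem_size bytes from src into *target,
+// starting at index, and grows the array with realloc() when needed;
+// returns zero on success and non-zero on failure
+int array_copy(void **target, size_t *size, size_t *capacity, size_t index,
+               const void *src, size_t elem_size, size_t elem_count) {
+    if (elem_count == 0) return 0; // nothing to copy
+    if (elem_size == 0 || index > SIZE_MAX - elem_count) return 1;
+    size_t required = index + elem_count;
+    if (required > *capacity) {
+        if (required > SIZE_MAX / elem_size) return 1;
+        // realloc() may move the array, so remember whether src points
+        // into it and at which offset
+        uintptr_t old = (uintptr_t) *target;
+        uintptr_t s = (uintptr_t) src;
+        int src_inside = *target != NULL && s >= old
+                         && s < old + *capacity * elem_size;
+        size_t src_offset = src_inside ? (size_t) (s - old) : 0;
+        void *mem = realloc(*target, required * elem_size);
+        if (mem == NULL) return 1;
+        if (src_inside) src = (const char *) mem + src_offset;
+        *target = mem; // write back the possibly moved pointer
+        *capacity = required;
+    }
+    // zero-fill a gap between the old size and index
+    if (index > *size) {
+        memset((char *) *target + *size * elem_size, 0,
+               (index - *size) * elem_size);
+    }
+    // memmove() because src may overlap the target array
+    memmove((char *) *target + index * elem_size, src, elem_count * elem_size);
+    if (required > *size) *size = required;
+    return 0;
+}
+```
+
+The offset bookkeeping before `realloc()` matters: if `src` points into the array that is being grown,
+`realloc()` may move the whole block, and the old `src` pointer would then dangle.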
+
+A few things to note:
+* `*target` and `src` can point to the same memory region, effectively copying elements within the array with `memmove`
+* `*target` does not need to point to the start of the array, but `size` and `capacity` always start counting from the
+  position `*target` points to; in this scenario, reallocation must be avoided, because the array would then be
+  reallocated through a pointer that does not point to the start of the allocated memory
+* `index` does not need to be within the size of the current array
+* `index` does not even need to be within the capacity of the array
+* `width` must be one of 8, 16, 32, 64 (only on 64-bit systems), or zero (in which case the native word width is used)
+
+If you just want to add a single element to an existing array, you can use the macro `cx_array_add()`.
+You can use `CX_ARRAY_DECLARE()` to declare the necessary fields within a structure and then use the
+`cx_array_simple_*()` convenience macros to reduce code overhead.
+The convenience macros automatically determine the width of the size/capacity variables.
+
+## Map
+
+*Header file:* [map.h](api/map_8h.html)
+
+Similar to the list interface, the map interface provides a common API for implementing maps.
+There are some subtle differences, though.
+
+First, the `remove` method is not always a destructive removal.
+Instead, the last argument is a Boolean that indicates whether the element shall be destroyed or returned.
+```c
+void *(*remove)(CxMap *map, CxHashKey key, bool destroy);
+```
+When you implement this method, you are either supposed to invoke the destructors and return `NULL`,
+or just remove the element from the map and return it.
+
+Secondly, the iterator method is a bit more elaborate. The signature is as follows:
+```c
+CxIterator (*iterator)(const CxMap *map, enum cx_map_iterator_type type);
+```
+There are three map iterator types: one for values, one for keys, and one for key/value pairs.
+Depending on the requested iterator type, you need to create an iterator with the correct methods that
+return the requested elements.
+There are no automatic checks to enforce this; it is entirely up to you.
+If you need inspiration on how to do that, check the hash map implementation that comes with UCX.
+
+### Hash Map
+
+*Header file:* [hash_map.h](api/hash__map_8h.html)
+
+UCX provides a basic hash map implementation with a configurable number of buckets.
+If you do not specify the number of buckets, a default of 16 buckets will be used.
+You can always rehash the map with `cxMapRehash()` to change the number of buckets to something more efficient,
+but you need to be careful: when you use this function, you are effectively locking yourself into this
+specific hash map implementation, and you would need to remove all calls to this function when you want to
+exchange the concrete map implementation with something different.
+
+## Utilities
+
+*Header file:* [utils.h](api/utils_8h.html)
+
+UCX provides some utilities for routine tasks.
+
+The most useful utilities are the *stream copy* functions, which provide a simple way to copy all (or a
+bounded amount of) data from one stream to another. Since the read/write functions of a UCX buffer are
+fully compatible with stream read/write functions, you can easily transfer data from file or network streams to
+a UCX buffer or vice versa.
+
+The following example shows how easy it is to read the contents of a file into a buffer:
+```c
+// infilename holds the path of the file to read
+FILE *inputfile = fopen(infilename, "r");
+if (inputfile) {
+    CxBuffer fbuf;
+    cxBufferInit(&fbuf, NULL, 4096, NULL, CX_BUFFER_AUTO_EXTEND);
+    cx_stream_copy(inputfile, &fbuf,
+                   (cx_read_func) fread,
+                   cxBufferWriteFunc);
+    fclose(inputfile);
+
+    // ... do something meaningful with the contents ...
+
+    cxBufferDestroy(&fbuf);
+} else {
+    perror("Error opening input file");
+}
+```
+
+### Printf Functions
+
+*Header file:* [printf.h](api/printf_8h.html)
+
+In this utility header, you can find `printf()`-like functions that write the formatted output either to an arbitrary
+stream (or UCX buffer) or to memory allocated by an allocator, within a single function call.
+With the help of these convenience functions, you no longer need to `snprintf` your string to a temporary buffer,
+and you do not need to worry about buffers that are too small, because the functions automatically allocate enough
+memory to contain the entire formatted string.
+
+### Compare Functions
+
+*Header file:* [compare.h](api/compare_8h.html)
+
+This header file contains a collection of compare functions for various data types.
+Their signatures are designed to be compatible with the `cx_compare_func` function pointer type.
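+
+A compare function receives pointers to two elements and returns a negative value, zero, or a positive value.
+The sketch below assumes that `cx_compare_func` has the same shape as the comparator expected by the standard
+`qsort()`, namely `int (*)(const void *, const void *)`; the struct `point` and the functions `cmp_point()` and
+`sort_points()` are hypothetical.
+
+```c
+#include <stdlib.h>
+
+struct point {
+    int x;
+    int y;
+};
+
+// orders points by x, then by y; returns <0, 0, or >0 like strcmp()
+int cmp_point(const void *left, const void *right) {
+    const struct point *l = left;
+    const struct point *r = right;
+    if (l->x != r->x) return l->x < r->x ? -1 : 1;
+    if (l->y != r->y) return l->y < r->y ? -1 : 1;
+    return 0;
+}
+
+// sorts an array of points in place using the comparator
+void sort_points(struct point *pts, size_t n) {
+    qsort(pts, n, sizeof(struct point), cmp_point);
+}
+```
+
+Comparing with `<` instead of returning `l->x - r->x` avoids signed integer overflow when the coordinates are far
+apart.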