docs/src/features.md

Fri, 07 Jul 2023 17:28:07 +0200

author
Mike Becker <universe@uap-core.de>

add iterator documentation

---
title: UCX Features
---

<div id="modules">

------------------------ -------------------------  -------------------  ---------------------------------
[Allocator](#allocator)  [String](#string)          [Buffer](#buffer)    [Memory&nbsp;Pool](#memory-pool)
[Iterator](#iterator)    [Collection](#collection)  [List](#list)        [Map](#map)
[Utilities](#utilities)
------------------------ -------------------------  -------------------  ---------------------------------

</div>

## Allocator

*Header file:* [allocator.h](api/allocator_8h.html)

The UCX allocator provides an interface for implementing your own memory allocation mechanism.
Various functions in UCX provide an additional alternative signature that takes an allocator as
argument. A default allocator implementation using the stdlib memory management functions is
available via the global symbol `cxDefaultAllocator`.

If you want to define your own allocator, you need to initialize the `CxAllocator` structure
with a pointer to an allocator class (containing function pointers for the memory management
functions) and an optional pointer to an arbitrary memory region that can be used to store
state information for the allocator. An example is shown below:

```c
#include <stdlib.h>
#include <cx/allocator.h>

struct my_allocator_state {
    size_t total;
    size_t avail;
    char mem[];
};

static cx_allocator_class my_allocator_class = {
        my_malloc_impl,
        my_realloc_impl,   // all these functions are defined elsewhere
        my_calloc_impl,
        my_free_impl
};

CxAllocator create_my_allocator(size_t n) {
    CxAllocator alloc;
    alloc.cl = &my_allocator_class;
    alloc.data = calloc(1, sizeof(struct my_allocator_state) + n);
    return alloc;
}

void free_my_allocator(CxAllocator *alloc) {
    // only the state lives on the heap; the CxAllocator itself was returned by value
    free(alloc->data);
}
```

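The `my_*_impl` functions are left undefined above. As a rough sketch of what such functions might look like, here is a self-contained bump allocator behind a function-pointer table. Note that `demo_allocator`, `demo_allocator_class`, and the callback signatures (state pointer as first argument) are hypothetical stand-ins and not the UCX definitions, and only two of the four functions are modeled. Check allocator.h for the real types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-ins for the UCX types (signatures are an assumption).
typedef struct {
    void *(*malloc_fn)(void *data, size_t n);
    void (*free_fn)(void *data, void *mem);
} demo_allocator_class;

typedef struct {
    demo_allocator_class const *cl;
    void *data;
} demo_allocator;

struct bump_state {
    size_t total;
    size_t avail;
    char mem[];
};

// Hands out consecutive chunks of the region; returns NULL when exhausted.
// Alignment is ignored to keep the sketch short.
static void *bump_malloc(void *data, size_t n) {
    struct bump_state *s = data;
    if (n > s->avail) return NULL;
    void *ptr = s->mem + (s->total - s->avail);
    s->avail -= n;
    return ptr;
}

// A bump allocator cannot release individual chunks, so this does nothing.
static void bump_free(void *data, void *mem) {
    (void) data;
    (void) mem;
}

static demo_allocator_class const bump_class = { bump_malloc, bump_free };

demo_allocator bump_allocator_create(size_t n) {
    struct bump_state *s = calloc(1, sizeof(struct bump_state) + n);
    assert(s != NULL);
    s->total = n;
    s->avail = n;
    demo_allocator alloc = { &bump_class, s };
    return alloc;
}

void bump_allocator_destroy(demo_allocator *alloc) {
    free(alloc->data);
    alloc->data = NULL;
}
```

The state pointer is what lets one class (the function table) serve many allocator instances, each with its own memory region.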
## String

*Header file:* [string.h](api/string_8h.html)

UCX strings come in two variants: immutable (`cxstring`) and mutable (`cxmutstr`).
The functions of UCX are designed to work with immutable strings by default, but where necessary
the API also provides alternative functions that work directly with mutable strings.
Functions that change a string in-place, of course, only accept mutable strings.

When you use UCX functions or define your own, you will sometimes face the "problem"
that a function only accepts arguments of type `cxstring` but you only have a `cxmutstr` at hand.
In this case you _should not_ introduce a wrapper function that accepts the `cxmutstr`,
but instead you should use the `cx_strcast()` function to cast the argument to the correct type.

In general, UCX strings are **not** necessarily zero-terminated. If a function guarantees to return a zero-terminated
string, it is explicitly mentioned in the documentation of the respective function.
As a rule of thumb, you _should not_ pass the strings of a UCX string structure to another API without explicitly
ensuring that the string is zero-terminated.

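To illustrate why the terminator matters, the following sketch models a pointer-plus-length string and makes a zero-terminated copy before handing it to a classic C API. The type `demo_str` and the function `demo_str_to_cstr` are hypothetical and only mimic the general shape of such a string. They are not part of UCX.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for a pointer-plus-length string.
typedef struct {
    char const *ptr;
    size_t length;
} demo_str;

// Returns a zero-terminated heap copy; the caller must free() it.
char *demo_str_to_cstr(demo_str s) {
    assert(s.ptr != NULL || s.length == 0);
    char *copy = malloc(s.length + 1);
    if (copy == NULL) return NULL;
    if (s.length > 0) {
        memcpy(copy, s.ptr, s.length);
    }
    copy[s.length] = '\0';
    return copy;
}
```

A view such as `{ text + 6, 3 }` into `"hello world"` covers only `wor`. Passing its `ptr` directly to `strlen()` or `printf("%s")` would read on to the end of the underlying text.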
## Buffer

*Header file:* [buffer.h](api/buffer_8h.html)

Instances of this buffer implementation can be used to read from or write to memory like you would do with a stream.
This allows the use of `cx_stream_copy()` (see [Utilities](#utilities)) to copy contents from one buffer to another,
or from file or network streams to the buffer and vice-versa.

More features for convenient use of the buffer can be enabled, like automatic memory management and automatic
resizing of the buffer space.

Since UCX 3.0, the buffer also supports automatic flushing of contents to another stream (or buffer) as an alternative
to automatically resizing the buffer space.
Please refer to the API doc for the fields prefixed with `flush_` to learn more.

## Memory Pool

*Header file:* [mempool.h](api/mempool_8h.html)

A memory pool provides an allocator implementation that automatically deallocates the memory upon its destruction.
It also allows you to register destructor functions for the allocated memory, which are automatically called before
the memory is deallocated.
Additionally, you may also register _independent_ destructor functions within a pool in case some external library
allocated memory for you, which should be destroyed together with this pool.

Many UCX features support the use of an allocator.
The [strings](#string), for instance, provide several functions suffixed with `_a` that allow specifying an allocator.
You can use this to keep track of the memory occupied by dynamically allocated strings and clean up everything with
just a single call to `cxMempoolDestroy()`.

The following code illustrates this using the example of reading a CSV file into memory.
```C
#include <stdio.h>
#include <stdint.h>   // SIZE_MAX
#include <cx/mempool.h>
#include <cx/linked_list.h>
#include <cx/string.h>
#include <cx/buffer.h>
#include <cx/utils.h>

typedef struct {
    cxstring column_a;
    cxstring column_b;
    cxstring column_c;
} CSVData;

int main(void) {
    CxMempool* pool = cxBasicMempoolCreate(128);

    FILE *f = fopen("test.csv", "r");
    if (!f) {
        perror("Cannot open file");
        cxMempoolDestroy(pool);
        return 1;
    }
    // close the file automatically at pool destruction
    cxMempoolRegister(pool, f, (cx_destructor_func) fclose);

    // create a buffer using the memory pool for destruction
    CxBuffer *content = cxBufferCreate(NULL, 256, pool->allocator, CX_BUFFER_AUTO_EXTEND);

    // read the file into the buffer and turn it into a string
    cx_stream_copy(f, content, (cx_read_func) fread, (cx_write_func) cxBufferWrite);
    cxstring contentstr = cx_strn(content->space, content->size);

    // split the string into lines - use the mempool for allocating the target array
    cxstring* lines;
    size_t lc = cx_strsplit_a(pool->allocator, contentstr,
                              CX_STR("\n"), SIZE_MAX, &lines);

    // skip the header and parse the remaining data into a linked list
    // the nodes of the linked list shall also be allocated by the mempool
    CxList* datalist = cxLinkedListCreate(pool->allocator, NULL, sizeof(CSVData));
    for (size_t i = 1 ; i < lc ; i++) {
        if (lines[i].length == 0) continue;
        cxstring fields[3];
        size_t fc = cx_strsplit(lines[i], CX_STR(";"), 3, fields);
        if (fc != 3) {
            fprintf(stderr, "Syntax error in line %zu.\n", i);
            cxMempoolDestroy(pool);
            return 1;
        }
        CSVData* data = cxMalloc(pool->allocator, sizeof(CSVData));
        data->column_a = fields[0];
        data->column_b = fields[1];
        data->column_c = fields[2];
        cxListAdd(datalist, data);
    }

    // iterate through the list and output the data
    CxIterator iter = cxListIterator(datalist);
    cx_foreach(CSVData*, data, iter) {
        printf("Column A: %.*s | "
               "Column B: %.*s | "
               "Column C: %.*s\n",
               (int)data->column_a.length, data->column_a.ptr,
               (int)data->column_b.length, data->column_b.ptr,
               (int)data->column_c.length, data->column_c.ptr
        );
    }

    // cleanup everything, no manual free() needed
    cxMempoolDestroy(pool);

    return 0;
}
```

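If you are curious how a pool can clean up both its own allocations and foreign resources in one call, the following self-contained sketch shows one possible bookkeeping scheme. It is not the UCX implementation: all `demo_*` names are hypothetical, and the fixed capacity is a simplification chosen for brevity.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*demo_destructor)(void *);

struct demo_pool_entry {
    void *mem;
    demo_destructor destr;   // may be NULL
    int owned;               // nonzero: the pool free()s mem itself
};

typedef struct {
    struct demo_pool_entry *entries;
    size_t size;
    size_t capacity;
} demo_pool;

demo_pool *demo_pool_create(size_t capacity) {
    assert(capacity > 0);
    demo_pool *pool = malloc(sizeof(demo_pool));
    if (pool == NULL) return NULL;
    pool->entries = calloc(capacity, sizeof(struct demo_pool_entry));
    if (pool->entries == NULL) {
        free(pool);
        return NULL;
    }
    pool->size = 0;
    pool->capacity = capacity;
    return pool;
}

static int demo_pool_track(demo_pool *pool, void *mem,
                           demo_destructor destr, int owned) {
    if (pool->size == pool->capacity) return 1; // fixed capacity in this sketch
    pool->entries[pool->size++] = (struct demo_pool_entry) { mem, destr, owned };
    return 0;
}

// Allocates memory owned by the pool, with an optional destructor.
void *demo_pool_malloc(demo_pool *pool, size_t n, demo_destructor destr) {
    void *mem = malloc(n);
    if (mem == NULL) return NULL;
    if (demo_pool_track(pool, mem, destr, 1)) {
        free(mem);
        return NULL;
    }
    return mem;
}

// Registers foreign memory: only the destructor runs, the pool never free()s it.
int demo_pool_register(demo_pool *pool, void *mem, demo_destructor destr) {
    return demo_pool_track(pool, mem, destr, 0);
}

// Runs all destructors, then releases everything the pool owns.
void demo_pool_destroy(demo_pool *pool) {
    for (size_t i = 0; i < pool->size; i++) {
        struct demo_pool_entry *e = &pool->entries[i];
        if (e->destr != NULL) e->destr(e->mem);
        if (e->owned) free(e->mem);
    }
    free(pool->entries);
    free(pool);
}

// Example destructor for illustration: counts its invocations.
int demo_destroyed = 0;
void demo_count_destructor(void *mem) {
    (void) mem;
    demo_destroyed++;
}
```

The `owned` flag is what separates the two cases described above: memory allocated through the pool is freed by it, while independently registered resources only get their destructor called.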
## Iterator

*Header file:* [iterator.h](api/iterator_8h.html)

UCX 3 introduces the ability to write your own iterators that work with the `cx_foreach` macro.
In previous UCX releases there were different hard-coded foreach macros for lists and maps that were not customizable.
Now, creating an iterator is as simple as creating a `CxIterator` struct and setting the fields in a meaningful way.

Depending on your use case, you do not always need all fields in the iterator structure.
Sometimes you only need the `index` (for example when iterating over simple lists), and other times you will need the
`slot` and `kv_data` fields (for example when iterating over maps).

Usually an iterator does not mutate the collection it is iterating over.
In some programming languages it is even disallowed to change the collection while iterating with foreach.
But sometimes it is desirable to remove an element from the collection while iterating over it.
This is what the `CxMutIterator` is for.
The only differences are that the `mutating` flag is `true` and the `src_handle` is not const.
On mutating iterators it is allowed to call the `cxFlagForRemoval()` function, which instructs the iterator to remove
the current element from the collection on the next call to `cxIteratorNext()` and clear the flag afterward.
If you are implementing your own iterator, it is up to you to implement this behavior in your `next` method, or
make the implementation of the `flag_removal` method always return `false`.

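The removal-on-next behavior is easier to see in code. The sketch below implements a mutating iterator over a plain `int` array together with a foreach macro. The struct layout, the `demo_*` names, and the macro are assumptions made for this illustration. They only loosely follow the description above and are not the definitions from iterator.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical iterator layout, loosely modeled on the description above.
typedef struct demo_iter_s demo_iter;
struct demo_iter_s {
    bool (*valid)(demo_iter const *it);
    void *(*current)(demo_iter const *it);
    void (*next)(demo_iter *it);
    int *array;       // source handle
    size_t *size;     // element count, updated on removal
    size_t index;
    bool remove;      // set by demo_flag_removal()
};

static bool arr_valid(demo_iter const *it) {
    return it->index < *it->size;
}

static void *arr_current(demo_iter const *it) {
    return &it->array[it->index];
}

// Advances the iterator; a flagged element is removed instead of skipped.
static void arr_next(demo_iter *it) {
    if (it->remove) {
        it->remove = false;
        for (size_t i = it->index + 1; i < *it->size; i++) {
            it->array[i - 1] = it->array[i];
        }
        (*it->size)--;   // the next element now sits at the same index
    } else {
        it->index++;
    }
}

demo_iter demo_array_iterator(int *array, size_t *size) {
    assert(array != NULL && size != NULL);
    demo_iter it = { arr_valid, arr_current, arr_next, array, size, 0, false };
    return it;
}

#define demo_flag_removal(it) ((it).remove = true)
#define demo_foreach(type, elem, it) \
    for (type elem; (it).valid(&(it)) \
         && (elem = (type) (it).current(&(it))) != NULL; (it).next(&(it)))

// Removes all even numbers in place and returns the sum of the odd ones.
int demo_sum_odd_remove_even(int *array, size_t *size) {
    int sum = 0;
    demo_iter it = demo_array_iterator(array, size);
    demo_foreach(int *, x, it) {
        if (*x % 2 == 0) demo_flag_removal(it);
        else sum += *x;
    }
    return sum;
}
```

Deferring the removal to `next` keeps the current element valid for the rest of the loop body, which is the reason the flag exists in the first place.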
## Collection

*Header file:* [collection.h](api/collection_8h.html)

Collections in UCX 3 have several common features.
If you want to implement your own collection data type that uses the same features, you can use the
`CX_COLLECTION_MEMBERS` macro at the beginning of your struct to roll out all members a usual UCX collection has.
```c
struct my_fancy_collection_s {
    CX_COLLECTION_MEMBERS
    struct my_collection_data_s *data;
};
```
Based on this structure, this header provides some convenience macros for invoking the destructor functions
that are part of the basic collection members.
The idea of having destructor functions within a collection is that you can destroy the collection _and_ the
contents with one single function call.
When you are implementing a collection, you are responsible for invoking the destructors at the right places, e.g.
when removing (and deleting) elements in the collection, clearing the collection, or - the most prominent case -
destroying the collection.

You can always look at the UCX list and map implementations if you need some inspiration.

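As a small illustration of "invoking the destructors at the right places", the sketch below clears a toy collection and destroys each element on the way. The macro `DEMO_COLLECTION_MEMBERS` and its member names are made up for this example. The real members and convenience macros are defined in collection.h and may look different.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for shared collection members (names are assumptions).
#define DEMO_COLLECTION_MEMBERS \
    size_t size; \
    void (*destructor)(void *elem);

struct demo_int_stack {
    DEMO_COLLECTION_MEMBERS
    int items[8];
};

// Invokes the destructor, if one is set, for a single element.
static void demo_invoke_destructor(struct demo_int_stack *s, void *elem) {
    if (s->destructor != NULL) s->destructor(elem);
}

// Clearing the collection must destroy every element it still holds.
void demo_stack_clear(struct demo_int_stack *s) {
    assert(s != NULL);
    for (size_t i = 0; i < s->size; i++) {
        demo_invoke_destructor(s, &s->items[i]);
    }
    s->size = 0;
}

// Example destructor for illustration: overwrites the element with -1.
void demo_mark_destroyed(void *elem) {
    *(int *) elem = -1;
}
```

The same helper would be called from a remove or destroy function, so that users of the collection never have to destroy elements themselves.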
## List

*Header file:* [list.h](api/list_8h.html)

This header defines a common interface for all list implementations, which is basically as simple as the following
structure.
```c
struct cx_list_s {
    CX_COLLECTION_MEMBERS       // size, capacity, etc.
    cx_list_class const *cl;    // The list class definition
};
```
The actual structure contains one more class pointer that is used when wrapping a list into a pointer-aware list
with `cxListStorePointers()`. This means that, if you want to implement your own list structure, you
only need to cover the case where the list is storing copies of your objects.

UCX comes with two common list implementations (linked list and array list) that should cover most use cases.
But if you feel the need to implement your own list, the only thing you need to do is to define a struct whose
first member is a `struct cx_list_s`, and set an appropriate list class that implements the functionality.
It is strongly recommended that this class is shared among all instances of the same list type, because otherwise
the `cxListCompare` function cannot use the optimized implementation of your class and will instead fall back to
using iterators to compare the contents element-wise.

### Linked List

*Header file:* [linked_list.h](api/linked__list_8h.html)

On top of implementing the list interface, this header also defines several low-level functions that
work with arbitrary structures.
Low-level functions, in contrast to the high-level list interface, can easily be recognized by their snake-casing.
The function `cx_linked_list_at`, for example, implements functionality similar to `cxListAt`, but operates
on arbitrary structures.
The following snippet shows how it is used.
All other low-level functions work similarly.
```c
#include <stddef.h>   // offsetof, ptrdiff_t

struct node {
    struct node *next;
    struct node *prev;
    int data;
};

const ptrdiff_t loc_prev = offsetof(struct node, prev);
const ptrdiff_t loc_next = offsetof(struct node, next);
const ptrdiff_t loc_data = offsetof(struct node, data);

struct node a = {0}, b = {0}, c = {0}, d = {0};
cx_linked_list_link(&a, &b, loc_prev, loc_next);
cx_linked_list_link(&b, &c, loc_prev, loc_next);
cx_linked_list_link(&c, &d, loc_prev, loc_next);

cx_linked_list_at(&a, 0, loc_next, 2); // returns pointer to c
```

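How can one function traverse "arbitrary structures"? The trick is to pass the byte offsets of the link pointers instead of a concrete type. The following sketch shows the idea with a hypothetical `demo_ll_at` function. Its name, parameter list, and behavior are assumptions for illustration and differ from the real `cx_linked_list_at`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {
    struct node *next;
    struct node *prev;
    int data;
};

// Reads the pointer stored `loc` bytes into the structure at `node`.
static void const *demo_ll_follow(void const *node, ptrdiff_t loc) {
    void *ptr;
    memcpy(&ptr, (char const *) node + loc, sizeof(void *));
    return ptr;
}

// Walks from `start` (which has index `start_index`) to `index`,
// following next or prev pointers depending on the direction.
// Returns NULL when the list ends before the index is reached.
void *demo_ll_at(void const *start, size_t start_index,
                 ptrdiff_t loc_prev, ptrdiff_t loc_next, size_t index) {
    assert(start != NULL);
    void const *cur = start;
    size_t i = start_index;
    while (cur != NULL && i != index) {
        if (i < index) {
            cur = demo_ll_follow(cur, loc_next);
            i++;
        } else {
            cur = demo_ll_follow(cur, loc_prev);
            i--;
        }
    }
    return (void *) cur;
}
```

Because only offsets are involved, the same function works for any node type, no matter where its link pointers are placed inside the struct.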
### Array List

*Header file:* [array_list.h](api/array__list_8h.html)

Since low-level array lists are just plain arrays, there is no need for as many low-level functions as for linked
lists.
However, there is one extremely powerful function that can be used for several complex tasks: `cx_array_copy`.
The full signature is shown below:
```c
enum cx_array_copy_result cx_array_copy(
        void **target,
        size_t *size,
        size_t *capacity,  // optional
        size_t index,
        void const *src,
        size_t elem_size,
        size_t elem_count,
        struct cx_array_reallocator_s *reallocator // optional
);
```
The `target` argument is a pointer to the target array pointer.
The reason for this additional indirection is that, given that you provide a `reallocator`, this function writes
back the pointer to the possibly reallocated array.
The next two arguments are pointers to the `size` and `capacity` of the target array.
Tracking the capacity is optional.
If you do not specify a pointer for the capacity, automatic reallocation of the array is entirely disabled (i.e. it
does not make sense to specify a `reallocator` then).
In this case, the function cannot copy more than `size-index` elements and if you try, it will return
`CX_ARRAY_COPY_REALLOC_NOT_SUPPORTED` and do nothing.

On a successful invocation, the function copies `elem_count` elements, each of size `elem_size`, from
`src` to `*target` and uses the `reallocator` to extend the array when necessary.
Finally, the size, capacity, and the pointer to the array are all updated and the function returns
`CX_ARRAY_COPY_SUCCESS`.

The third, but extremely rare, return code is `CX_ARRAY_COPY_REALLOC_FAILED`, which speaks for itself.

A few things to note:
* `*target` and `src` can point to the same memory region, effectively copying elements within the array with `memmove`
* `*target` does not need to point to the start of the array, but `size` and `capacity` always start counting from the
  position `*target` points to - in this scenario, specifying a `reallocator` is forbidden for obvious reasons
* `index` does not need to be within the size of the current array, if `capacity` is specified
* `index` does not even need to be within the capacity of the array, if `reallocator` is specified


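To make the interplay of `size`, `capacity`, and reallocation more tangible, here is a simplified sketch of such a copy function. It is not the UCX implementation: `demo_array_copy` and its result enum are hypothetical, it always grows with plain `realloc()` instead of a reallocator object, and it does not handle the case where `src` points into an array that gets moved by the reallocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum demo_copy_result {
    DEMO_COPY_SUCCESS,
    DEMO_COPY_REALLOC_NOT_SUPPORTED,
    DEMO_COPY_REALLOC_FAILED
};

// Copies elem_count elements from src to position `index` of *target.
// If `capacity` is NULL, the array is never reallocated.
enum demo_copy_result demo_array_copy(
        void **target, size_t *size, size_t *capacity,
        size_t index, void const *src,
        size_t elem_size, size_t elem_count) {
    assert(target != NULL && size != NULL && src != NULL);
    size_t limit = capacity != NULL ? *capacity : *size;
    size_t needed = index + elem_count;
    if (needed > limit) {
        if (capacity == NULL) return DEMO_COPY_REALLOC_NOT_SUPPORTED;
        void *grown = realloc(*target, needed * elem_size);
        if (grown == NULL) return DEMO_COPY_REALLOC_FAILED;
        *target = grown;      // write back the possibly moved array
        *capacity = needed;
    }
    // memmove tolerates overlapping source and target regions
    memmove((char *) *target + index * elem_size, src, elem_size * elem_count);
    if (needed > *size) *size = needed;
    return DEMO_COPY_SUCCESS;
}
```

Appending is then just a copy to `index == size`, and overwriting a range is a copy to a smaller index, which is why a single function can cover so many tasks.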
## Map

*Header file:* [map.h](api/map_8h.html)

Similar to the list interface, the map interface provides a common API for implementing maps.
There are some subtle differences, though.

First, the `remove` method is not always a destructive removal.
Instead, the last argument is a Boolean that indicates whether the element shall be destroyed or returned.
```c
void *(*remove)(CxMap *map, CxHashKey key, bool destroy);
```
When you implement this method, you are either supposed to invoke the destructors and return `NULL`,
or just remove the element from the map and return it.

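The contract of such a `remove` method can be sketched with a toy map. Everything named `demo_*` below is hypothetical: the map uses integer keys and a fixed number of slots, has no hashing, and does not check for duplicate keys. It only demonstrates the two possible outcomes of the `destroy` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>   // free() is a convenient destructor for heap values

// Hypothetical fixed-size map with integer keys; not the UCX map.
struct demo_slot {
    int key;
    void *value;
    bool used;
};

typedef struct {
    struct demo_slot slots[8];
    void (*destructor)(void *value);   // may be NULL
} demo_map;

// Stores value under key; returns 0 on success, 1 if the map is full.
int demo_map_put(demo_map *map, int key, void *value) {
    for (size_t i = 0; i < 8; i++) {
        if (!map->slots[i].used) {
            map->slots[i] = (struct demo_slot) { key, value, true };
            return 0;
        }
    }
    return 1;
}

// Removes the entry for key. With destroy == true the value is destroyed
// and NULL is returned; otherwise the value is handed back to the caller.
void *demo_map_remove(demo_map *map, int key, bool destroy) {
    assert(map != NULL);
    for (size_t i = 0; i < 8; i++) {
        struct demo_slot *slot = &map->slots[i];
        if (slot->used && slot->key == key) {
            void *value = slot->value;
            slot->used = false;
            if (!destroy) return value;
            if (map->destructor != NULL) map->destructor(value);
            return NULL;
        }
    }
    return NULL;   // key not found
}
```

Note that a `NULL` return value is ambiguous in this sketch: it can mean "destroyed" or "not found". Callers that need the distinction have to look up the key first.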
Secondly, the iterator method is a bit more involved. The signature is as follows:
```c
CxIterator (*iterator)(CxMap const *map, enum cx_map_iterator_type type);
```
There are three map iterator types: for values, for keys, and for pairs.
Depending on the iterator type requested, you need to create an iterator with the correct methods that
return the requested thing.
There are no automatic checks to enforce this - it's completely up to you.
If you need inspiration on how to do that, check the hash map implementation that comes with UCX.

### Hash Map

*Header file:* [hash_map.h](api/hash__map_8h.html)

UCX provides a basic hash map implementation with a configurable number of buckets.
If you do not specify the number of buckets, a default of 16 buckets will be used.
You can always rehash the map with `cxMapRehash()` to change the number of buckets to something more efficient,
but you need to be careful: when you use this function, you are effectively locking yourself into this
specific hash map implementation, and you would need to remove all calls to this function when you want to
exchange the concrete map implementation with something different.

## Utilities

*Header file:* [utils.h](api/utils_8h.html)

UCX provides some utilities for routine tasks. Most of them are simple macros, such as the `cx_for_n()` macro,
which creates a `for` loop counting from zero to (n-1) and is extremely useful for traversing the indices of
an array.

But the most useful utilities are the *stream copy* functions, which provide a simple way to copy all - or a
bounded amount of - data from one stream to another. Since the read/write functions of a UCX buffer are
fully compatible with stream read/write functions, you can easily transfer data from file or network streams to
a UCX buffer or vice-versa.

The following example shows how easy it is to read the contents of a file into a buffer:
```c
// infilename is the path of the file to read, defined elsewhere
FILE *inputfile = fopen(infilename, "r");
if (inputfile) {
    CxBuffer fbuf;
    cxBufferInit(&fbuf, NULL, 4096, NULL, CX_BUFFER_AUTO_EXTEND);
    cx_stream_copy(inputfile, &fbuf,
                   (cx_read_func) fread,
                   (cx_write_func) cxBufferWrite);
    fclose(inputfile);

    // ... do something meaningful with the contents ...

    cxBufferDestroy(&fbuf);
} else {
    perror("Error opening input file");
}
```

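What makes this work is that both ends of the copy are accessed through `fread`/`fwrite`-shaped function pointers. The sketch below shows one way such a copy loop could be written, together with a minimal in-memory stream to exercise it. The `demo_*` names, the chunk size of 256 bytes, and the return value are assumptions for this illustration and not the behavior of `cx_stream_copy()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*demo_read_func)(void *buf, size_t size, size_t n, void *src);
typedef size_t (*demo_write_func)(void const *buf, size_t size, size_t n, void *dest);

// Copies everything from src to dest in chunks.
// Returns the number of bytes written to dest.
size_t demo_stream_copy(void *src, void *dest,
                        demo_read_func rfnc, demo_write_func wfnc) {
    assert(rfnc != NULL && wfnc != NULL);
    char chunk[256];
    size_t total = 0, r;
    while ((r = rfnc(chunk, 1, sizeof chunk, src)) > 0) {
        size_t w = wfnc(chunk, 1, r, dest);
        total += w;
        if (w < r) break;   // destination is full or failed
    }
    return total;
}

// A minimal in-memory stream with fread/fwrite-like access functions.
typedef struct {
    char *data;
    size_t size;
    size_t pos;
} demo_mem;

size_t demo_mem_read(void *buf, size_t size, size_t n, void *src) {
    demo_mem *m = src;
    size_t bytes = size * n;
    if (bytes > m->size - m->pos) bytes = m->size - m->pos;
    memcpy(buf, m->data + m->pos, bytes);
    m->pos += bytes;
    return size ? bytes / size : 0;
}

size_t demo_mem_write(void const *buf, size_t size, size_t n, void *dest) {
    demo_mem *m = dest;
    size_t bytes = size * n;
    if (bytes > m->size - m->pos) bytes = m->size - m->pos;
    memcpy(m->data + m->pos, buf, bytes);
    m->pos += bytes;
    return size ? bytes / size : 0;
}
```

Any pair of functions with these shapes can be plugged in, which is why a file, a socket wrapper, or a memory buffer can sit on either side of the copy.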
### Printf Functions

*Header file:* [printf.h](api/printf_8h.html)

In this utility header you can find `printf()`-like functions that can write the formatted output to an arbitrary
stream (or a UCX buffer), or to memory allocated by an allocator within a single function call.
With the help of these convenience functions, you no longer need to `snprintf` your string to a temporary buffer,
and you do not need to worry about buffer sizes that are too small, because the functions automatically allocate enough
memory to contain the entire formatted string.

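For comparison, this is roughly what you would have to write by hand with the standard library alone: a two-pass `vsnprintf()` that first measures and then formats. The function `demo_asprintf` is a hypothetical helper for this illustration and not one of the functions from printf.h.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

// Formats into freshly allocated memory of exactly the required size.
// Returns NULL on failure; the caller must free() the result.
char *demo_asprintf(char const *fmt, ...) {
    assert(fmt != NULL);
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);
    int len = vsnprintf(NULL, 0, fmt, ap);   // first pass: measure only
    va_end(ap);
    char *str = len < 0 ? NULL : malloc((size_t) len + 1);
    if (str != NULL) {
        vsnprintf(str, (size_t) len + 1, fmt, ap2);   // second pass: write
    }
    va_end(ap2);
    return str;
}
```

The argument list has to be copied with `va_copy()` because a `va_list` cannot be reused after `vsnprintf()` has consumed it.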
### Compare Functions

*Header file:* [compare.h](api/compare_8h.html)

This header file contains a collection of compare functions for various data types.
Their signatures are designed to be compatible with the `cx_compare_func` function pointer type.
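
As a sketch of what such a function looks like, here is a compare function for `int` values. It assumes that `cx_compare_func` has the same shape as the comparator expected by the standard `qsort()`, that is, two untyped const pointers in and an `int` out. The names `demo_cmp_int` and `demo_sort_ints` are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

// Compares two ints through untyped pointers. Returns a negative value,
// zero, or a positive value. Avoids the overflow of a plain subtraction.
int demo_cmp_int(void const *a, void const *b) {
    int x = *(int const *) a;
    int y = *(int const *) b;
    return (x > y) - (x < y);
}

// Sorts a non-empty int array in ascending order using the compare function.
void demo_sort_ints(int *values, size_t n) {
    assert(values != NULL && n > 0);
    qsort(values, n, sizeof(int), demo_cmp_int);
}
```

Because the comparator only sees untyped pointers, the same function can be handed to any API that accepts this pointer type, regardless of the container holding the values.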
