docs/src/features.md

Thu, 23 May 2024 19:29:14 +0200

author
Mike Becker <universe@uap-core.de>
date
Thu, 23 May 2024 19:29:14 +0200
changeset 853
d4baf4dd55c3
parent 834
04c53b3c8378
child 854
fe0d69d72bcd
permissions
-rw-r--r--

simplify iterator structures

universe@720 1 ---
universe@720 2 title: UCX Features
universe@720 3 ---
universe@720 4
universe@720 5 <div id="modules">
universe@720 6
universe@720 7 ------------------------ ------------------------- ------------------- ---------------------------------
universe@720 8 [Allocator](#allocator) [String](#string) [Buffer](#buffer) [Memory&nbsp;Pool](#memory-pool)
universe@720 9 [Iterator](#iterator) [Collection](#collection) [List](#list) [Map](#map)
universe@720 10 [Utilities](#utilities)
universe@720 11 ------------------------ ------------------------- ------------------- ---------------------------------
universe@720 12
universe@720 13 </div>
universe@720 14
universe@720 15 ## Allocator
universe@720 16
universe@720 17 *Header file:* [allocator.h](api/allocator_8h.html)
universe@720 18
universe@722 19 The UCX allocator provides an interface for implementing an own memory allocation mechanism.
universe@722 20 Various function in UCX provide an additional alternative signature that takes an allocator as
universe@722 21 argument. A default allocator implementation using the stdlib memory management functions is
universe@722 22 available via the global symbol `cxDefaultAllocator`.
universe@722 23
universe@722 24 If you want to define your own allocator, you need to initialize the `CxAllocator` structure
universe@722 25 with a pointer to an allocator class (containing function pointers for the memory management
universe@722 26 functions) and an optional pointer to an arbitrary memory region that can be used to store
universe@722 27 state information for the allocator. An example is shown below:
universe@722 28
universe@722 29 ```c
universe@722 30 struct my_allocator_state {
universe@722 31 size_t total;
universe@722 32 size_t avail;
universe@727 33 char mem[];
universe@722 34 };
universe@722 35
universe@722 36 static cx_allocator_class my_allocator_class = {
universe@722 37 my_malloc_impl,
universe@722 38 my_realloc_impl, // all these functions are somewhere defined
universe@722 39 my_calloc_impl,
universe@722 40 my_free_impl
universe@722 41 };
universe@722 42
universe@722 43 CxAllocator create_my_allocator(size_t n) {
universe@722 44 CxAllocator alloc;
universe@722 45 alloc.cl = &my_allocator_class;
universe@722 46 alloc.data = calloc(1, sizeof(struct my_allocator_state) + n);
universe@722 47 return alloc;
universe@722 48 }
universe@722 49 ```
universe@722 50
universe@720 51 ## String
universe@720 52
universe@720 53 *Header file:* [string.h](api/string_8h.html)
universe@720 54
universe@723 55 UCX strings come in two variants: immutable (`cxstring`) and mutable (`cxmutstr`).
universe@723 56 The functions of UCX are designed to work with immutable strings by default but in situations where it is necessary,
universe@723 57 the API also provides alternative functions that work directly with mutable strings.
universe@723 58 Functions that change a string in-place are, of course, only accepting mutable strings.
universe@723 59
universe@723 60 When you are using UCX functions, or defining your own functions, you are sometimes facing the "problem",
universe@723 61 that the function only accepts arguments of type `cxstring` but you only have a `cxmutstr` at hand.
universe@723 62 In this case you _should not_ introduce a wrapper function that accepts the `cxmutstr`,
universe@723 63 but instead you should use the `cx_strcast()` function to cast the argument to the correct type.
universe@723 64
universe@723 65 In general, UCX strings are **not** necessarily zero-terminated. If a function guarantees to return zero-terminated
universe@723 66 string, it is explicitly mentioned in the documentation of the respective function.
universe@723 67 As a rule of thumb, you _should not_ pass the strings of a UCX string structure to another API without explicitly
universe@723 68 ensuring that the string is zero-terminated.
universe@723 69
universe@720 70 ## Buffer
universe@720 71
universe@724 72 *Header file:* [buffer.h](api/buffer_8h.html)
universe@724 73
universe@724 74 Instances of this buffer implementation can be used to read from or write to memory like you would do with a stream.
universe@724 75 This allows the use of `cx_stream_copy()` (see [Utilities](#utilities)) to copy contents from one buffer to another,
universe@724 76 or from a file or network streams to the buffer and vice-versa.
universe@724 77
universe@724 78 More features for convenient use of the buffer can be enabled, like automatic memory management and automatic
universe@724 79 resizing of the buffer space.
universe@724 80
universe@724 81 Since UCX 3.0, the buffer also supports automatic flushing of contents to another stream (or buffer) as an alternative
universe@724 82 to automatically resizing the buffer space.
universe@724 83 Please refer to the API doc for the fields prefixed with `flush_` to learn more.
universe@720 84
universe@720 85 ## Memory Pool
universe@720 86
universe@725 87 *Header file:* [mempool.h](api/mempool_8h.html)
universe@720 88
universe@729 89 A memory pool is providing an allocator implementation that automatically deallocates the memory upon its destruction.
universe@729 90 It also allows you to register destructor functions for the allocated memory, which are automatically called before
universe@729 91 the memory is deallocated.
universe@729 92 Additionally, you may also register _independent_ destructor functions within a pool in case some external library
universe@729 93 allocated memory for you, which should be destroyed together with this pool.
universe@729 94
universe@729 95 Many UCX features support the use of an allocator.
universe@729 96 The [strings](#string), for instance, provide several functions suffixed with `_a` that allow specifying an allocator.
universe@729 97 You can use this to keep track of the memory occupied by dynamically allocated strings and cleanup everything with
universe@729 98 just a single call to `cxMempoolDestroy()`.
universe@729 99
universe@729 100 The following code illustrates this on the example of reading a CSV file into memory.
universe@729 101 ```C
universe@729 102 #include <stdio.h>
universe@729 103 #include <cx/mempool.h>
universe@729 104 #include <cx/linked_list.h>
universe@729 105 #include <cx/string.h>
universe@729 106 #include <cx/buffer.h>
universe@729 107 #include <cx/utils.h>
universe@729 108
universe@729 109 typedef struct {
universe@729 110 cxstring column_a;
universe@729 111 cxstring column_b;
universe@729 112 cxstring column_c;
universe@729 113 } CSVData;
universe@729 114
universe@729 115 int main(void) {
universe@729 116 CxMempool* pool = cxBasicMempoolCreate(128);
universe@729 117
universe@729 118 FILE *f = fopen("test.csv", "r");
universe@729 119 if (!f) {
universe@729 120 perror("Cannot open file");
universe@729 121 return 1;
universe@729 122 }
universe@729 123 // close the file automatically at pool destruction
universe@729 124 cxMempoolRegister(pool, f, (cx_destructor_func) fclose);
universe@729 125
universe@729 126 // create a buffer using the memory pool for destruction
universe@729 127 CxBuffer *content = cxBufferCreate(NULL, 256, pool->allocator, CX_BUFFER_AUTO_EXTEND);
universe@729 128
universe@729 129 // read the file into the buffer and turn it into a string
universe@729 130 cx_stream_copy(f, content, (cx_read_func) fread, (cx_write_func) cxBufferWrite);
universe@745 131 fclose(f);
universe@729 132 cxstring contentstr = cx_strn(content->space, content->size);
universe@729 133
universe@729 134 // split the string into lines - use the mempool for allocating the target array
universe@729 135 cxstring* lines;
universe@729 136 size_t lc = cx_strsplit_a(pool->allocator, contentstr,
universe@729 137 CX_STR("\n"), SIZE_MAX, &lines);
universe@729 138
universe@729 139 // skip the header and parse the remaining data into a linked list
universe@729 140 // the nodes of the linked list shall also be allocated by the mempool
universe@729 141 CxList* datalist = cxLinkedListCreate(pool->allocator, NULL, sizeof(CSVData));
universe@729 142 for (size_t i = 1 ; i < lc ; i++) {
universe@729 143 if (lines[i].length == 0) continue;
universe@729 144 cxstring fields[3];
universe@729 145 size_t fc = cx_strsplit(lines[i], CX_STR(";"), 3, fields);
universe@729 146 if (fc != 3) {
universe@729 147 fprintf(stderr, "Syntax error in line %zu.\n", i);
universe@729 148 cxMempoolDestroy(pool);
universe@729 149 return 1;
universe@729 150 }
universe@746 151 CSVData data;
universe@746 152 data.column_a = fields[0];
universe@746 153 data.column_b = fields[1];
universe@746 154 data.column_c = fields[2];
universe@746 155 cxListAdd(datalist, &data);
universe@729 156 }
universe@729 157
universe@729 158 // iterate through the list and output the data
universe@729 159 CxIterator iter = cxListIterator(datalist);
universe@729 160 cx_foreach(CSVData*, data, iter) {
universe@729 161 printf("Column A: %.*s | "
universe@729 162 "Column B: %.*s | "
universe@729 163 "Column C: %.*s\n",
universe@729 164 (int)data->column_a.length, data->column_a.ptr,
universe@729 165 (int)data->column_b.length, data->column_b.ptr,
universe@729 166 (int)data->column_c.length, data->column_c.ptr
universe@729 167 );
universe@729 168 }
universe@729 169
universe@729 170 // cleanup everything, no manual free() needed
universe@729 171 cxMempoolDestroy(pool);
universe@729 172
universe@729 173 return 0;
universe@729 174 }
universe@729 175 ```
universe@729 176
universe@720 177 ## Iterator
universe@720 178
universe@720 179 *Header file:* [iterator.h](api/iterator_8h.html)
universe@720 180
universe@733 181 In UCX 3 a new feature has been introduced to write own iterators, that work with the `cx_foreach` macro.
universe@733 182 In previous UCX releases there were different hard-coded foreach macros for lists and maps that were not customizable.
universe@733 183 Now, creating an iterator is as simple as creating a `CxIterator` struct and setting the fields in a meaningful way.
universe@733 184
universe@733 185 You do not always need all fields in the iterator structure, depending on your use case.
universe@733 186 Sometimes you only need the `index` (for example when iterating over simple lists), and other times you will need the
universe@733 187 `slot` and `kv_data` fields (for example when iterating over maps).
universe@733 188
universe@853 189 If the predefined fields are insufficient for your use case, you can alternatively create your own iterator structure
universe@853 190 and place the `CX_ITERATOR_BASE` macro inside.
universe@853 191
universe@733 192 Usually an iterator is not mutating the collection it is iterating over.
universe@733 193 In some programming languages it is even disallowed to change the collection while iterating with foreach.
universe@733 194 But sometimes it is desirable to remove an element from the collection while iterating over it.
universe@853 195 For this purpose, most collections allow the creation of a _mutating_ iterator.
universe@733 196 The only differences are, that the `mutating` flag is `true` and the `src_handle` is not const.
universe@733 197 On mutating iterators it is allowed to call the `cxFlagForRemoval()` function, which instructs the iterator to remove
universe@733 198 the current element from the collection on the next call to `cxIteratorNext()` and clear the flag afterward.
universe@853 199 If you are implementing your own iterator, it is up to you to implement this behavior.
universe@733 200
universe@720 201 ## Collection
universe@720 202
universe@720 203 *Header file:* [collection.h](api/collection_8h.html)
universe@720 204
universe@730 205 Collections in UCX 3 have several common features.
universe@730 206 If you want to implement an own collection data type that uses the same features, you can use the
universe@730 207 `CX_COLLECTION_MEMBERS` macro at the beginning of your struct to roll out all members a usual UCX collection has.
universe@730 208 ```c
universe@730 209 struct my_fancy_collection_s {
universe@730 210 CX_COLLECTION_MEMBERS
universe@730 211 struct my_collection_data_s *data;
universe@730 212 };
universe@730 213 ```
universe@730 214 Based on this structure, this header provides some convenience macros for invoking the destructor functions
universe@730 215 that are part of the basic collection members.
universe@730 216 The idea of having destructor functions within a collection is that you can destroy the collection _and_ the
universe@730 217 contents with one single function call.
universe@730 218 When you are implementing a collection, you are responsible for invoking the destructors at the right places, e.g.
universe@730 219 when removing (and deleting) elements in the collection, clearing the collection, or - the most prominent case -
universe@730 220 destroying the collection.
universe@730 221
universe@730 222 You can always look at the UCX list and map implementations if you need some inspiration.
universe@730 223
universe@720 224 ## List
universe@720 225
universe@720 226 *Header file:* [list.h](api/list_8h.html)
universe@720 227
universe@731 228 This header defines a common interface for all list implementations, which is basically as simple as the following
universe@731 229 structure.
universe@731 230 ```c
universe@731 231 struct cx_list_s {
universe@731 232 CX_COLLECTION_MEMBERS // size, capacity, etc.
universe@731 233 cx_list_class const *cl; // The list class definition
universe@731 234 };
universe@731 235 ```
universe@745 236 The actual structure contains one more class pointer that is used when wrapping a list into a pointer-aware list
universe@731 237 with `cxListStorePointers()`. What this means, is that - if you want to implement your own list structure - you
universe@731 238 only need to cover the case where the list is storing copies of your objects.
universe@731 239
universe@731 240 UCX comes with two common list implementations (linked list and array list) that should cover most use cases.
universe@731 241 But if you feel the need to implement an own list, the only thing you need to do is to define a struct where
universe@731 242 `struct cx_list_s`, and set an appropriate list class that implements the functionality.
universe@731 243 It is strongly recommended that this class is shared among all instances of the same list type, because otherwise
universe@731 244 the `cxListCompare` function cannot use the optimized implementation of your class and will instead fall back to
universe@731 245 using iterators to compare the contents element-wise.
universe@731 246
universe@720 247 ### Linked List
universe@720 248
universe@720 249 *Header file:* [linked_list.h](api/linked__list_8h.html)
universe@720 250
universe@731 251 On top of implementing the list interface, this header also defines several low-level functions that
universe@731 252 work with arbitrary structures.
universe@731 253 Low-level functions, in contrast to the high-level list interface, can easily be recognized by their snake-casing.
universe@731 254 The function `cx_linked_list_at`, for example, implements a similar functionality like `cxListAt`, but operates
universe@731 255 on arbitrary structures.
universe@731 256 The following snippet shows how it is used.
universe@731 257 All other low-level functions work similarly.
universe@731 258 ```c
universe@731 259 struct node {
universe@731 260 node *next;
universe@731 261 node *prev;
universe@731 262 int data;
universe@731 263 };
universe@731 264
universe@731 265 const ptrdiff_t loc_prev = offsetof(struct node, prev);
universe@731 266 const ptrdiff_t loc_next = offsetof(struct node, next);
universe@731 267 const ptrdiff_t loc_data = offsetof(struct node, data);
universe@731 268
universe@731 269 struct node a = {0}, b = {0}, c = {0}, d = {0};
universe@731 270 cx_linked_list_link(&a, &b, loc_prev, loc_next);
universe@731 271 cx_linked_list_link(&b, &c, loc_prev, loc_next);
universe@731 272 cx_linked_list_link(&c, &d, loc_prev, loc_next);
universe@731 273
universe@731 274 cx_linked_list_at(&a, 0, loc_next, 2); // returns pointer to c
universe@731 275 ```
universe@731 276
universe@720 277 ### Array List
universe@720 278
universe@720 279 *Header file:* [array_list.h](api/array__list_8h.html)
universe@720 280
universe@731 281 Since low-level array lists are just plain arrays, there is no need for such many low-level functions as for linked
universe@731 282 lists.
universe@731 283 However, there is one extremely powerful function that can be used for several complex tasks: `cx_array_copy`.
universe@731 284 The full signature is shown below:
universe@731 285 ```c
universe@819 286 enum cx_array_result cx_array_copy(
universe@731 287 void **target,
universe@731 288 size_t *size,
universe@731 289 size_t *capacity, // optional
universe@731 290 size_t index,
universe@731 291 void const *src,
universe@731 292 size_t elem_size,
universe@731 293 size_t elem_count,
universe@731 294 struct cx_array_reallocator_s *reallocator // optional
universe@731 295 );
universe@731 296 ```
universe@731 297 The `target` argument is a pointer to the target array pointer.
universe@731 298 The reason for this additional indirection is that - given that you provide a `reallocator` - this function writes
universe@731 299 back the pointer to the possibly reallocated array.
universe@819 300 The next two arguments are pointers to the `size` and `capacity` of the target array.
universe@731 301 Tracking the capacity is optional.
universe@731 302 If you do not specify a pointer for the capacity, automatic reallocation of the array is entirely disabled (i.e. it
universe@731 303 does not make sense to specify a `reallocator` then).
universe@731 304 In this case, the function cannot copy more than `size-index` elements and if you try, it will return
universe@819 305 `CX_ARRAY_REALLOC_NOT_SUPPORTED` and do nothing.
universe@731 306
universe@731 307 On a successful invocation, the function copies `elem_count` number of elements, each of size `elem_size` from
universe@731 308 `src` to `*target` and uses the `reallocator` to extend the array when necessary.
universe@731 309 Finally, the size, capacity, and the pointer to the array are all updated and the function returns
universe@819 310 `CX_ARRAY_SUCCESS`.
universe@731 311
universe@819 312 The third, but extremely rare, return code is `CX_ARRAY_REALLOC_FAILED` and speaks for itself.
universe@731 313
universe@731 314 A few things to note:
universe@731 315 * `*target` and `src` can point to the same memory region, effectively copying elements within the array with `memmove`
universe@731 316 * `*target` does not need to point to the start of the array, but `size` and `capacity` always start counting from the
universe@731 317 position, `*target` points to - in this scenario, specifying a `reallocator` is forbidden for obvious reasons
universe@731 318 * `index` does not need to be within size of the current array, if `capacity` is specified
universe@731 319 * `index` does not even need to be within the capacity of the array, if `reallocator` is specified
universe@731 320
universe@819 321 If you just want to add one single element to an existing array, you can use the macro `cx_array_add()`.
universe@819 322 In that case, since the element is added to the end of the array, the `capacity` argument is mandatory.
universe@834 323 You can use `CX_ARRAY_DECLARE()` to declare the necessary fields within a structure and then use the
universe@831 324 `cx_array_simple_*()` convenience macros to reduce code overhead.
universe@731 325
universe@720 326 ## Map
universe@720 327
universe@720 328 *Header file:* [map.h](api/map_8h.html)
universe@720 329
universe@732 330 Similar to the list interface, the map interface provides a common API for implementing maps.
universe@732 331 There are some minor subtle differences, though.
universe@732 332
universe@732 333 First, the `remove` method is not always a destructive removal.
universe@732 334 Instead, the last argument is a Boolean that indicates whether the element shall be destroyed or returned.
universe@732 335 ```c
universe@732 336 void *(*remove)(CxMap *map, CxHashKey key, bool destroy);
universe@732 337 ```
universe@732 338 When you implement this method, you are either supposed to invoke the destructors and return `NULL`,
universe@732 339 or just remove the element from the map and return it.
universe@732 340
universe@732 341 Secondly, the iterator method is a bit more complete. The signature is as follows:
universe@732 342 ```c
universe@732 343 CxIterator (*iterator)(CxMap const *map, enum cx_map_iterator_type type);
universe@732 344 ```
universe@732 345 There are three map iterator types: for values, for keys, for pairs.
universe@732 346 Depending on the iterator type requested, you need to create an iterator with the correct methods that
universe@732 347 return the requested thing.
universe@732 348 There are no automatic checks to enforce this - it's completely up to you.
universe@732 349 If you need inspiration on how to do that, check the hash map implementation that comes with UCX.
universe@732 350
universe@720 351 ### Hash Map
universe@720 352
universe@720 353 *Header file:* [hash_map.h](api/hash__map_8h.html)
universe@720 354
universe@732 355 UCX provides a basic hash map implementation with a configurable amount of buckets.
universe@732 356 If you do not specify the number of buckets, a default of 16 buckets will be used.
universe@732 357 You can always rehash the map with `cxMapRehash()` to change the number of buckets to something more efficient,
universe@732 358 but you need to be careful, because when you use this function you are effectively locking into using this
universe@732 359 specific hash map implementation, and you would need to remove all calls to this function when you want to
universe@732 360 exchange the concrete map implementation with something different.
universe@732 361
universe@720 362 ## Utilities
universe@720 363
universe@720 364 *Header file:* [utils.h](api/utils_8h.html)
universe@720 365
universe@724 366 UCX provides some utilities for routine tasks. Most of them are simple macros, like e.g. the `cx_for_n()` macro,
universe@724 367 creating a `for` loop counting from zero to (n-1) which is extremely useful to traverse the indices of
universe@724 368 an array.
universe@724 369
universe@724 370 But the most useful utilities are the *stream copy* functions, which provide a simple way to copy all - or a
universe@724 371 bounded amount of - data from one stream to another. Since the read/write functions of a UCX buffer are
universe@724 372 fully compatible with stream read/write functions, you can easily transfer data from file or network streams to
universe@724 373 a UCX buffer or vice-versa.
universe@724 374
universe@724 375 The following example shows, how easy it is to read the contents of a file into a buffer:
universe@724 376 ```c
universe@724 377 FILE *inputfile = fopen(infilename, "r");
universe@724 378 if (inputfile) {
universe@724 379 CxBuffer fbuf;
universe@724 380 cxBufferInit(&fbuf, NULL, 4096, NULL, CX_BUFFER_AUTO_EXTEND);
universe@724 381 cx_stream_copy(inputfile, &fbuf,
universe@724 382 (cx_read_func) fread,
universe@724 383 (cx_write_func) cxBufferWrite);
universe@724 384 fclose(inputfile);
universe@724 385
universe@724 386 // ... do something meaningful with the contents ...
universe@724 387
universe@724 388 cxBufferDestroy(&fbuf);
universe@724 389 } else {
universe@724 390 perror("Error opening input file");
universe@724 391 if (fout != stdout) {
universe@724 392 fclose(fout);
universe@724 393 }
universe@724 394 }
universe@724 395 ```
universe@724 396
universe@720 397 ### Printf Functions
universe@720 398
universe@720 399 *Header file:* [printf.h](api/printf_8h.html)
universe@720 400
universe@725 401 In this utility header you can find `printf()`-like functions that can write the formatted output to an arbitrary
universe@725 402 stream (or UCX buffer, resp.), or to memory allocated by an allocator within a single function call.
universe@725 403 With the help of these convenience functions, you do not need to `snprintf` your string to a temporary buffer anymore,
universe@725 404 plus you do not need to worry about too small buffer sizes, because the functions will automatically allocate enough
universe@725 405 memory to contain the entire formatted string.
universe@725 406
universe@720 407 ### Compare Functions
universe@720 408
universe@720 409 *Header file:* [compare.h](api/compare_8h.html)
universe@725 410
universe@725 411 This header file contains a collection of compare functions for various data types.
universe@725 412 Their signatures are designed to be compatible with the `cx_compare_func` function pointer type.

mercurial