Mon, 17 Feb 2025 23:34:33 +0100
start documenting the string functions
relates to #451
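The note on `CX_STR(lit)` that this changeset adds to the documentation relies on a general C property: for a string literal, `sizeof` includes the terminating zero byte, so the length is available at compile time without calling `strlen()`. The following is only a minimal sketch of that idea, not UCX's actual macro. The names `my_string`, `MY_STR`, and `my_str` are hypothetical and made up for illustration; the struct merely mirrors the `cx_string_s` layout shown in the diff.

```C
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */
#include <string.h>   /* strlen */

/* Hypothetical stand-in mirroring the cx_string_s layout from the diff. */
typedef struct {
    const char *ptr;
    size_t length;
} my_string;

/* Hypothetical macro, NOT the real CX_STR: for a string literal,
 * sizeof(lit) counts the terminating zero byte, so the length is
 * sizeof(lit) - 1 and is known at compile time. */
#define MY_STR(lit) ((my_string){ (lit), sizeof(lit) - 1 })

/* Hypothetical run-time counterpart for arbitrary zero-terminated
 * strings: here the length has to be measured with strlen(). */
static inline my_string my_str(const char *cstring) {
    my_string s = { cstring, strlen(cstring) };
    return s;
}

/* The compiler can check the literal case without running any code. */
static_assert(sizeof("hello") - 1 == 5, "literal length is sizeof - 1");
```

With these definitions, `MY_STR("hello")` and `my_str("hello")` yield the same length. Passing a `char *` variable to the macro would instead measure the size of the pointer, which is presumably why the documentation restricts the macro to literals.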
docs/Writerside/topics/string.h.md
--- a/docs/Writerside/topics/string.h.md	Sun Feb 16 12:59:14 2025 +0100
+++ b/docs/Writerside/topics/string.h.md	Mon Feb 17 23:34:33 2025 +0100
@@ -1,111 +1,239 @@
 # String
 
-<warning>
-Outdated Section - will be updated soon!
-</warning>
-
-UCX strings come in two variants: immutable (`cxstring`) and mutable (`cxmutstr`).
-The functions of UCX are designed to work with immutable strings by default but in situations where it is necessary,
-the API also provides alternative functions that work directly with mutable strings.
-Functions that change a string in-place are, of course, only accepting mutable strings.
+UCX strings store character arrays together with a length and come in two variants: immutable (`cxstring`) and mutable (`cxmutstr`).
 
-When you are using UCX functions, or defining your own functions, you are sometimes facing the "problem",
-that the function only accepts arguments of type `cxstring` but you only have a `cxmutstr` at hand.
-In this case you _should not_ introduce a wrapper function that accepts the `cxmutstr`,
-but instead you should use the `cx_strcast()` function to cast the argument to the correct type.
-
-In general, UCX strings are **not** necessarily zero-terminated. If a function guarantees to return zero-terminated
-string, it is explicitly mentioned in the documentation of the respective function.
-As a rule of thumb, you _should not_ pass the strings of a UCX string structure to another API without explicitly
+In general, UCX strings are *not* necessarily zero-terminated.
+If a function guarantees to return a zero-terminated string, it is explicitly mentioned in the documentation.
+As a rule of thumb, you _should not_ pass a character array of a UCX string structure to another API without explicitly
 ensuring that the string is zero-terminated.
 
-<!--
 ## Basics
 
-### cx_mutstr
-### cx_mutstrn
-### cx_str
-### cx_strn
-### cx_strcast
-### cx_strfree
-### cx_strfree_a
-### cx_strdup
-### cx_strdup_a
-### cx_strlen
-### cx_strtrim
-### cx_strtrim_m
-### cx_strlower
-### cx_strupper
+> To make documentation simpler, we introduce the pseudo-type `AnyStr` with the meaning that
+> both `cxstring` and `cxmutstr` are accepted for that argument.
+> The implementation is actually hidden behind a macro which uses `cx_strcast()` to guarantee compatibility.
+{style="note"}
+
+```C
+#include <cx/string.h>
+
+struct cx_string_s {const char *ptr; size_t length;};
+
+struct cx_mutstr_s {char *ptr; size_t length;};
+
+typedef struct cx_string_s cxstring;
+
+typedef struct cx_mutstr_s cxmutstr;
+
+cxstring cx_str(const char *cstring);
+
+cxstring cx_strn(const char *cstring, size_t length);
+
+cxmutstr cx_mutstr(char *cstring);
+
+cxmutstr cx_mutstrn(char *cstring, size_t length);
+
+cxstring cx_strcast(AnyStr str);
+
+cxmutstr cx_strdupa(AnyStr string);
+
+cxmutstr cx_strdup_a(const CxAllocator *allocator, AnyStr string);
+
+void cx_strfree(cxmutstr *str);
+
+void cx_strfree_a(const CxAllocator *alloc, cxmutstr *str);
+```
+
+> Documentation work in progress.
+>{style="warning"}
+
+> When you want to convert a string _literal_ into a UCX string, you can also use the `CX_STR(lit)` macro.
+> This macro uses the fact that `sizeof(lit)` for a string literal `lit` is always the string length plus one,
+> effectively saving an invocation of `strlen()`.
+> However, this only works for literals - in all other cases you must use `cx_str()` or `cx_strn`.
 
 ## Comparison
 
-### cx_strcmp
-### cx_strcmp_p
-### cx_strcasecmp
-### cx_strcasecmp_p
-### cx_strprefix
-### cx_strsuffix
-### cx_strcaseprefix
-### cx_strcasesuffix
+```C
+#include <cx/string.h>
+
+int cx_strcmp(cxstring s1, cxstring s2);
+
+int cx_strcmp_p(const void *s1, const void *s2);
+
+bool cx_strprefix(cxstring string, cxstring prefix);
+
+bool cx_strsuffix(cxstring string, cxstring suffix);
+
+int cx_strcasecmp(cxstring s1, cxstring s2);
+
+int cx_strcasecmp_p(const void *s1, const void *s2);
+
+bool cx_strcaseprefix(cxstring string, cxstring prefix);
+
+bool cx_strcasesuffix(cxstring string, cxstring suffix);
+```
+
+> Documentation work in progress.
+>{style="warning"}
 
 ## Concatenation
 
-### cx_strcat_ma
+```C
+#include <cx/string.h>
+
+cxmutstr cx_strcat(size_t count, ... );
+
+cxmutstr cx_strcat_a(const CxAllocator *alloc, size_t count, ... );
+
+cxmutstr cx_strcat_m(cxmutstr str, size_t count, ... );
+
+cxmutstr cx_strcat_ma(const CxAllocator *alloc,
+        cxmutstr str, size_t count, ... );
+
+size_t cx_strlen(size_t count, ...);
+```
+
+> Documentation work in progress.
+>{style="warning"}
 
 ## Find Characters and Substrings
 
-### cx_strchr
-### cx_strchr_m
-### cx_strrchr
-### cx_strrchr_m
-### cx_strstr
-### cx_strstr_m
-### cx_strsubs
-### cx_strsubsl
-### cx_strsubsl_m
-### cx_strsubs_m
+```C
+#include <cx/string.h>
+
+cxstring cx_strchr(cxstring string, int chr);
+
+cxmutstr cx_strchr_m(cxmutstr string, int chr);
+
+cxstring cx_strrchr(cxstring string,int chr);
+
+cxmutstr cx_strrchr_m(cxmutstr string, int chr);
+
+cxstring cx_strstr(cxstring haystack, cxstring needle);
+
+cxmutstr cx_strstr_m(cxmutstr haystack, cxstring needle);
+
+cxstring cx_strsubs(cxstring string, size_t start);
+
+cxstring cx_strsubsl(cxstring string, size_t start, size_t length);
+
+cxmutstr cx_strsubs_m(cxmutstr string, size_t start);
+
+cxmutstr cx_strsubsl_m(cxmutstr string, size_t start, size_t length);
+
+cxstring cx_strtrim(cxstring string);
+
+cxmutstr cx_strtrim_m(cxmutstr string);
+```
+
+> Documentation work in progress.
+>{style="warning"}
 
 ## Replace Substrings
 
-### cx_strreplacen_a
+```C
+#include <cx/string.h>
+
+cxmutstr cx_strreplace(cxstring str, cxstring pattern, cxstring repl);
+
+cxmutstr cx_strreplace_a(const CxAllocator *allocator, cxstring str,
+        cxstring pattern, cxstring repl);
+
+cxmutstr cx_strreplacen(cxstring str, cxstring pattern, cxstring repl,
+        size_t replmax);
+
+cxmutstr cx_strreplacen_a(const CxAllocator *allocator, cxstring str,
+        cxstring pattern, cxstring repl, size_t replmax);
+```
+
+> Documentation work in progress.
+>{style="warning"}
 
 ## Basic Splitting
 
-### cx_strsplit
-### cx_strsplit_a
-### cx_strsplit_m
-### cx_strsplit_ma
+```C
+#include <cx/string.h>
+
+size_t cx_strsplit(cxstring string, cxstring delim,
+        size_t limit, cxstring *output);
+
+size_t cx_strsplit_a(const CxAllocator *allocator,
+        cxstring string, cxstring delim,
+        size_t limit, cxstring **output);
+
+size_t cx_strsplit_m(cxmutstr string, cxstring delim,
+        size_t limit, cxmutstr *output);
+
+size_t cx_strsplit_ma(const CxAllocator *allocator,
+        cxmutstr string, cxstring delim,
+        size_t limit, cxmutstr **output);
+```
+
+> Documentation work in progress.
+>{style="warning"}
 
 ## Complex Tokenization
 
-### cx_strtok_
-### cx_strtok_delim
-### cx_strtok_next
-### cx_strtok_next_m
+```C
+#include <cx/string.h>
+
+CxStrtokCtx cx_strtok(AnyStr str, AnyStr delim, size_t limit);
+
+void cx_strtok_delim(CxStrtokCtx *ctx,
+        const cxstring *delim, size_t count);
+
+bool cx_strtok_next(CxStrtokCtx *ctx, cxstring *token);
+
+bool cx_strtok_next_m(CxStrtokCtx *ctx, cxmutstr *token);
+```
+
+> Documentation work in progress.
+>{style="warning"}
 
 ## Conversion to Numbers
 
-### cx_strtod_lc_
-### cx_strtof_lc_
-### cx_strtoi16_lc_
-### cx_strtoi32_lc_
-### cx_strtoi64_lc_
-### cx_strtoi8_lc_
-### cx_strtoi_lc_
-### cx_strtol_lc
-### cx_strtoll_lc
-### cx_strtos_lc
-### cx_strtou16_lc
-### cx_strtou32_lc
-### cx_strtou64_lc
-### cx_strtou8_lc
-### cx_strtou_lc
-### cx_strtoul_lc
-### cx_strtoull_lc
-### cx_strtous_lc
-### cx_strtouz_lc
-### cx_strtoz_lc
---->
+For each integer type, as well as `float` and `double`, there are functions to convert a UCX string to a number of that type.
+
+Integer conversion comes in two flavours:
+```C
+int cx_strtoi(AnyStr str, int *output, int base);
+
+int cx_strtoi_lc(AnyStr str, int *output, int base,
+        const char *groupsep);
+```
+
+The basic variant takes a string of any UCX string type, a pointer to the `output` integer, and the `base` (one of 2, 8, 10, or 16).
+Conversion is attempted with respect to the specified `base` and respects possible special notations for that base.
+Hexadecimal numbers may be prefixed with `0x`, `x`, or `#`, and binary numbers may be prefixed with `0b` or `b`.
+
+The `_lc` versions of the integer conversion functions are equivalent, except that they allow the specification of an
+array of group separator chars, each of which is simply ignored during conversion.
+The default group separator for the basic version is a comma `,`.
+
+The signature for the floating point conversions is quite similar:
+```C
+int cx_strtof(AnyStr str, float *output);
+
+int cx_strtof_lc(AnyStr str, float *output,
+        char decsep, const char *groupsep);
+```
+
+The two differences are that the floating point versions do not support different bases,
+and the `_lc` variant allows specifying not only an array of group separators,
+but also the character used for the decimal separator.
+
+In the basic variant, the group separator is again a comma `,`, and the decimal separator is a dot `.`.
+
+> The floating point conversions of UCX 3.1 do not achieve the same precision as standard library implementations
+> which usually use more sophisticated algorithms.
+> The precision might increase in future UCX releases,
+> but until then be aware of slight inaccuracies, in particular when working with `double`.
+{style="warning"}
+
+> The UCX string to number conversions are intentionally not considering any locale settings
+> and are therefore independent of any global state.
+{style="note"}
 
 <seealso>
 <category ref="apidoc">