--- a/docs/Writerside/topics/string.h.md	Sun Feb 16 12:59:14 2025 +0100
+++ b/docs/Writerside/topics/string.h.md	Mon Feb 17 23:34:33 2025 +0100
@@ -1,111 +1,239 @@
 # String

-<warning>
-Outdated Section - will be updated soon!
-</warning>
-
-UCX strings come in two variants: immutable (`cxstring`) and mutable (`cxmutstr`).
-The functions of UCX are designed to work with immutable strings by default but in situations where it is necessary,
-the API also provides alternative functions that work directly with mutable strings.
-Functions that change a string in-place are, of course, only accepting mutable strings.
+UCX strings store character arrays together with a length and come in two variants: immutable (`cxstring`) and mutable (`cxmutstr`).

-When you are using UCX functions, or defining your own functions, you are sometimes facing the "problem",
-that the function only accepts arguments of type `cxstring` but you only have a `cxmutstr` at hand.
-In this case you _should not_ introduce a wrapper function that accepts the `cxmutstr`,
-but instead you should use the `cx_strcast()` function to cast the argument to the correct type.
-
-In general, UCX strings are **not** necessarily zero-terminated. If a function guarantees to return zero-terminated
-string, it is explicitly mentioned in the documentation of the respective function.
-As a rule of thumb, you _should not_ pass the strings of a UCX string structure to another API without explicitly
+In general, UCX strings are *not* necessarily zero-terminated.
+If a function guarantees to return a zero-terminated string, it is explicitly mentioned in the documentation.
+As a rule of thumb, you _should not_ pass the character array of a UCX string structure to another API without explicitly
 ensuring that the string is zero-terminated.
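
One way to ensure zero-termination before handing a string to another API is to copy it into a buffer that is one byte larger. The following is a minimal sketch under stated assumptions: `strview` and `strview_to_cstr()` are hypothetical names that only mirror the pointer-plus-length idea and are not part of UCX.

```C
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-carrying string; not the UCX type. */
typedef struct {
    const char *ptr;
    size_t length;
} strview;

/* Copies the view into a freshly allocated, zero-terminated buffer.
 * Returns NULL if the allocation fails; the caller frees the result. */
char *strview_to_cstr(strview s) {
    assert(s.ptr != NULL || s.length == 0);
    char *copy = malloc(s.length + 1);
    if (copy == NULL) return NULL;
    if (s.length > 0) memcpy(copy, s.ptr, s.length);
    copy[s.length] = '\0'; /* the terminator the view does not guarantee */
    return copy;
}
```

When the string only needs to be printed, `printf("%.*s", (int) s.length, s.ptr)` avoids the copy, because the precision limits how many bytes are read.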

-<!--
 ## Basics

-### cx_mutstr
-### cx_mutstrn
-### cx_str
-### cx_strn
-### cx_strcast
-### cx_strfree
-### cx_strfree_a
-### cx_strdup
-### cx_strdup_a
-### cx_strlen
-### cx_strtrim
-### cx_strtrim_m
-### cx_strlower
-### cx_strupper
+> To make documentation simpler, we introduce the pseudo-type `AnyStr` with the meaning that
+> both `cxstring` and `cxmutstr` are accepted for that argument.
+> The implementation is actually hidden behind a macro which uses `cx_strcast()` to guarantee compatibility.
+{style="note"}
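
To illustrate how a macro can accept both string variants, here is a sketch based on C11 `_Generic`. All names (`imm_str`, `mut_str`, `to_imm`, `count_char`) are hypothetical, and the actual UCX macro may be implemented differently.

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring an immutable and a mutable layout. */
typedef struct { const char *ptr; size_t length; } imm_str;
typedef struct { char *ptr; size_t length; } mut_str;

imm_str imm_from_imm(imm_str s) { return s; }
imm_str imm_from_mut(mut_str s) { return (imm_str){ s.ptr, s.length }; }

/* Select the conversion by the static type of the argument. */
#define to_imm(s) _Generic((s), \
        imm_str: imm_from_imm,  \
        mut_str: imm_from_mut)(s)

/* The implementation only accepts the immutable variant ... */
size_t count_char_impl(imm_str s, char c) {
    assert(s.ptr != NULL || s.length == 0);
    size_t n = 0;
    for (size_t i = 0; i < s.length; i++) {
        if (s.ptr[i] == c) n++;
    }
    return n;
}

/* ... and the macro lets callers pass either variant. */
#define count_char(s, c) count_char_impl(to_imm(s), (c))
```

With this pattern, a caller never needs a separate wrapper for the mutable type; passing any other type fails at compile time because `_Generic` has no matching association.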
+
+```C
+#include <cx/string.h>
+
+struct cx_string_s {const char *ptr; size_t length;};
+
+struct cx_mutstr_s {char *ptr; size_t length;};
+
+typedef struct cx_string_s cxstring;
+
+typedef struct cx_mutstr_s cxmutstr;
+
+cxstring cx_str(const char *cstring);
+
+cxstring cx_strn(const char *cstring, size_t length);
+
+cxmutstr cx_mutstr(char *cstring);
+
+cxmutstr cx_mutstrn(char *cstring, size_t length);
+
+cxstring cx_strcast(AnyStr str);
+
+cxmutstr cx_strdup(AnyStr string);
+
+cxmutstr cx_strdup_a(const CxAllocator *allocator, AnyStr string);
+
+void cx_strfree(cxmutstr *str);
+
+void cx_strfree_a(const CxAllocator *alloc, cxmutstr *str);
+```
+
+> Documentation work in progress.
+>{style="warning"}
+
+> When you want to convert a string _literal_ into a UCX string, you can also use the `CX_STR(lit)` macro.
+> This macro uses the fact that `sizeof(lit)` for a string literal `lit` is always the string length plus one,
+> effectively saving an invocation of `strlen()`.
+> However, this only works for literals - in all other cases you must use `cx_str()` or `cx_strn()`.
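
The `sizeof` trick described above can be sketched as follows. `lit_str`, `LIT_STR`, and `lit_equal()` are hypothetical names for illustration and do not claim to match the definition of `CX_STR`. The empty literal in front of the argument is an extra safeguard assumed here: it makes the macro fail to compile when a pointer is passed instead of a literal.

```C
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-plus-length pair; not the UCX type. */
typedef struct { const char *ptr; size_t length; } lit_str;

/* sizeof of a string literal counts the terminator, hence the -1.
 * The "" concatenation only compiles when lit is a string literal. */
#define LIT_STR(lit) ((lit_str){ "" lit, sizeof(lit) - 1 })

/* Compares two strings by length and content, without strlen(). */
int lit_equal(lit_str a, lit_str b) {
    assert(a.ptr != NULL && b.ptr != NULL);
    return a.length == b.length
        && memcmp(a.ptr, b.ptr, a.length) == 0;
}
```

Applied to a `const char *` variable, `sizeof` would instead yield the size of the pointer, which is why the technique is restricted to literals.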

 ## Comparison

-### cx_strcmp
-### cx_strcmp_p
-### cx_strcasecmp
-### cx_strcasecmp_p
-### cx_strprefix
-### cx_strsuffix
-### cx_strcaseprefix
-### cx_strcasesuffix
+```C
+#include <cx/string.h>
+
+int cx_strcmp(cxstring s1, cxstring s2);
+
+int cx_strcmp_p(const void *s1, const void *s2);
+
+bool cx_strprefix(cxstring string, cxstring prefix);
+
+bool cx_strsuffix(cxstring string, cxstring suffix);
+
+int cx_strcasecmp(cxstring s1, cxstring s2);
+
+int cx_strcasecmp_p(const void *s1, const void *s2);
+
+bool cx_strcaseprefix(cxstring string, cxstring prefix);
+
+bool cx_strcasesuffix(cxstring string, cxstring suffix);
+```
+
+> Documentation work in progress.
+>{style="warning"}

 ## Concatenation

-### cx_strcat_ma
+```C
+#include <cx/string.h>
+
+cxmutstr cx_strcat(size_t count, ... );
+
+cxmutstr cx_strcat_a(const CxAllocator *alloc, size_t count, ... );
+
+cxmutstr cx_strcat_m(cxmutstr str, size_t count, ... );
+
+cxmutstr cx_strcat_ma(const CxAllocator *alloc,
+        cxmutstr str, size_t count, ... );
+
+size_t cx_strlen(size_t count, ...);
+```
+
+> Documentation work in progress.
+>{style="warning"}

 ## Find Characters and Substrings

-### cx_strchr
-### cx_strchr_m
-### cx_strrchr
-### cx_strrchr_m
-### cx_strstr
-### cx_strstr_m
-### cx_strsubs
-### cx_strsubsl
-### cx_strsubsl_m
-### cx_strsubs_m
+```C
+#include <cx/string.h>
+
+cxstring cx_strchr(cxstring string, int chr);
+
+cxmutstr cx_strchr_m(cxmutstr string, int chr);
+
+cxstring cx_strrchr(cxstring string, int chr);
+
+cxmutstr cx_strrchr_m(cxmutstr string, int chr);
+
+cxstring cx_strstr(cxstring haystack, cxstring needle);
+
+cxmutstr cx_strstr_m(cxmutstr haystack, cxstring needle);
+
+cxstring cx_strsubs(cxstring string, size_t start);
+
+cxstring cx_strsubsl(cxstring string, size_t start, size_t length);
+
+cxmutstr cx_strsubs_m(cxmutstr string, size_t start);
+
+cxmutstr cx_strsubsl_m(cxmutstr string, size_t start, size_t length);
+
+cxstring cx_strtrim(cxstring string);
+
+cxmutstr cx_strtrim_m(cxmutstr string);
+```
+
+> Documentation work in progress.
+>{style="warning"}

 ## Replace Substrings

-### cx_strreplacen_a
+```C
+#include <cx/string.h>
+
+cxmutstr cx_strreplace(cxstring str, cxstring pattern, cxstring repl);
+
+cxmutstr cx_strreplace_a(const CxAllocator *allocator, cxstring str,
+        cxstring pattern, cxstring repl);
+
+cxmutstr cx_strreplacen(cxstring str, cxstring pattern, cxstring repl,
+        size_t replmax);
+
+cxmutstr cx_strreplacen_a(const CxAllocator *allocator, cxstring str,
+        cxstring pattern, cxstring repl, size_t replmax);
+```
+
+> Documentation work in progress.
+>{style="warning"}

 ## Basic Splitting

-### cx_strsplit
-### cx_strsplit_a
-### cx_strsplit_m
-### cx_strsplit_ma
+```C
+#include <cx/string.h>
+
+size_t cx_strsplit(cxstring string, cxstring delim,
+        size_t limit, cxstring *output);
+
+size_t cx_strsplit_a(const CxAllocator *allocator,
+        cxstring string, cxstring delim,
+        size_t limit, cxstring **output);
+
+size_t cx_strsplit_m(cxmutstr string, cxstring delim,
+        size_t limit, cxmutstr *output);
+
+size_t cx_strsplit_ma(const CxAllocator *allocator,
+        cxmutstr string, cxstring delim,
+        size_t limit, cxmutstr **output);
+```
+
+> Documentation work in progress.
+>{style="warning"}

 ## Complex Tokenization

-### cx_strtok_
-### cx_strtok_delim
-### cx_strtok_next
-### cx_strtok_next_m
+```C
+#include <cx/string.h>
+
+CxStrtokCtx cx_strtok(AnyStr str, AnyStr delim, size_t limit);
+
+void cx_strtok_delim(CxStrtokCtx *ctx,
+        const cxstring *delim, size_t count);
+
+bool cx_strtok_next(CxStrtokCtx *ctx, cxstring *token);
+
+bool cx_strtok_next_m(CxStrtokCtx *ctx, cxmutstr *token);
+```
+
+> Documentation work in progress.
+>{style="warning"}

 ## Conversion to Numbers

-### cx_strtod_lc_
-### cx_strtof_lc_
-### cx_strtoi16_lc_
-### cx_strtoi32_lc_
-### cx_strtoi64_lc_
-### cx_strtoi8_lc_
-### cx_strtoi_lc_
-### cx_strtol_lc
-### cx_strtoll_lc
-### cx_strtos_lc
-### cx_strtou16_lc
-### cx_strtou32_lc
-### cx_strtou64_lc
-### cx_strtou8_lc
-### cx_strtou_lc
-### cx_strtoul_lc
-### cx_strtoull_lc
-### cx_strtous_lc
-### cx_strtouz_lc
-### cx_strtoz_lc
--->
+For each integer type, as well as `float` and `double`, there are functions to convert a UCX string to a number of that type.
+
+Integer conversion comes in two flavours:
+```C
+int cx_strtoi(AnyStr str, int *output, int base);
+
+int cx_strtoi_lc(AnyStr str, int *output, int base,
+        const char *groupsep);
+```
+
+The basic variant takes a string of any UCX string type, a pointer to the `output` integer, and the `base` (one of 2, 8, 10, or 16).
+The conversion uses the specified `base` and accepts the special prefix notations that exist for that base.
+Hexadecimal numbers may be prefixed with `0x`, `x`, or `#`, and binary numbers may be prefixed with `0b` or `b`.
+
+The `_lc` versions of the integer conversion functions behave identically, except that they accept an
+array of group separator characters, each of which is simply ignored during conversion.
+The default group separator for the basic version is a comma `,`.
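
The behaviour described above (base prefixes plus ignored group separators) can be sketched with a small stand-alone parser. `parse_grouped()` is a hypothetical helper written for this illustration; it is not the UCX implementation, it handles only unsigned values, and its return convention (0 on success, -1 on error) is an assumption.

```C
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT the UCX implementation.
 * Parses len characters as an unsigned number in base 2, 8, 10, or 16
 * and skips every character that is listed in groupsep.
 * Returns 0 on success, -1 on malformed input or overflow. */
int parse_grouped(const char *s, size_t len, unsigned long *out,
        int base, const char *groupsep) {
    assert(s != NULL && out != NULL && groupsep != NULL);
    assert(base == 2 || base == 8 || base == 10 || base == 16);
    size_t i = 0;
    /* optional prefixes: 0x, x, # for base 16 and 0b, b for base 2 */
    if (base == 16 && len >= 2 && s[0] == '0' && s[1] == 'x') i = 2;
    else if (base == 16 && len >= 1 && (s[0] == 'x' || s[0] == '#')) i = 1;
    else if (base == 2 && len >= 2 && s[0] == '0' && s[1] == 'b') i = 2;
    else if (base == 2 && len >= 1 && s[0] == 'b') i = 1;

    unsigned long value = 0;
    size_t digits = 0;
    for (; i < len; i++) {
        char c = s[i];
        /* group separators are ignored wherever they appear */
        if (c != '\0' && strchr(groupsep, c) != NULL) continue;
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;
        if (d >= base) return -1;
        /* reject values that do not fit into unsigned long */
        if (value > (ULONG_MAX - (unsigned long) d) / (unsigned long) base)
            return -1;
        value = value * (unsigned long) base + (unsigned long) d;
        digits++;
    }
    if (digits == 0) return -1;
    *out = value;
    return 0;
}
```

Passing `","` as `groupsep` corresponds to the default of the basic variant, while a string such as `".'"` would accept both a dot and an apostrophe as group separators.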
+
+The signature for the floating point conversions is quite similar:
+```C
+int cx_strtof(AnyStr str, float *output);
+
+int cx_strtof_lc(AnyStr str, float *output,
+        char decsep, const char *groupsep);
+```
+
+The two differences are that the floating point versions do not support different bases,
+and the `_lc` variant allows specifying not only an array of group separators,
+but also the character used for the decimal separator.
+
+In the basic variant, the group separator is again a comma `,`, and the decimal separator is a dot `.`.
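
The role of the two separator parameters can be illustrated with a deliberately simple decimal parser. `parse_decimal()` is a hypothetical helper and not the UCX implementation; exponents are not supported, and the return convention (0 on success, -1 on error) is an assumption.

```C
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT the UCX implementation.
 * Parses len characters as a decimal number, using decsep as the
 * decimal separator and ignoring every character in groupsep.
 * Returns 0 on success, -1 on malformed input. */
int parse_decimal(const char *s, size_t len, double *out,
        char decsep, const char *groupsep) {
    assert(s != NULL && out != NULL && groupsep != NULL);
    size_t i = 0;
    int negative = 0;
    if (len > 0 && (s[0] == '+' || s[0] == '-')) {
        negative = (s[0] == '-');
        i = 1;
    }
    double value = 0.0;  /* all digits accumulated as one integer */
    double scale = 1.0;  /* 10^(number of fractional digits) */
    int seen_decsep = 0;
    size_t digits = 0;
    for (; i < len; i++) {
        char c = s[i];
        if (c == decsep) {
            if (seen_decsep) return -1; /* at most one separator */
            seen_decsep = 1;
        } else if (c != '\0' && strchr(groupsep, c) != NULL) {
            continue; /* group separators are ignored */
        } else if (c >= '0' && c <= '9') {
            value = value * 10.0 + (double) (c - '0');
            if (seen_decsep) scale *= 10.0;
            digits++;
        } else {
            return -1;
        }
    }
    if (digits == 0) return -1;
    value /= scale;
    *out = negative ? -value : value;
    return 0;
}
```

A call with `decsep = ','` and `groupsep = "."` parses the notation `1.234,5`, whereas `'.'` and `","` match the defaults of the basic variant. An accumulate-and-scale approach like this is not correctly rounded for every input, so it should be read as an illustration of the parameters only.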
+
+> The floating point conversions of UCX 3.1 do not achieve the same precision as standard library implementations,
+> which usually use more sophisticated algorithms.
+> The precision might increase in future UCX releases,
+> but until then be aware of slight inaccuracies, in particular when working with `double`.
+{style="warning"}
+
+> The UCX string to number conversions intentionally ignore all locale settings
+> and are therefore independent of any global state.
+{style="note"}

 <seealso>
 <category ref="apidoc">