# String

UCX strings store character arrays together with a length and come in two variants: immutable (`cxstring`) and mutable (`cxmutstr`).

UCX strings are *not* necessarily zero-terminated.
A function that guarantees a zero-terminated result states this explicitly in its documentation.
As a rule of thumb, you _should not_ pass the character array of a UCX string structure to another API without first
ensuring that the string is zero-terminated.
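
One way to follow this rule is to copy the characters into a buffer and append the terminator yourself.
The following is a minimal sketch and not part of UCX: `demo_str` and `demo_to_cstr()` are hypothetical names,
and the struct only mirrors the pointer-plus-length idea so that the example compiles without the library.

```C
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-delimited string (not a UCX type). */
struct demo_str { const char *ptr; size_t length; };

/*
 * Copies at most bufsize - 1 characters of s into buf and appends the
 * terminator. Returns the number of characters copied, not counting
 * the terminator. Longer strings are truncated.
 */
size_t demo_to_cstr(struct demo_str s, char *buf, size_t bufsize) {
    assert(buf != NULL && bufsize > 0);
    size_t n = s.length < bufsize - 1 ? s.length : bufsize - 1;
    if (n > 0) {
        memcpy(buf, s.ptr, n);
    }
    buf[n] = '\0';
    return n;
}
```

The resulting buffer can be handed to functions such as `printf("%s")` or `fopen()`, which would otherwise read past the end of a substring view.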

## Basics

> To make documentation simpler, we introduce the pseudo-type `AnyStr` with the meaning that
> both `cxstring` and `cxmutstr` are accepted for that argument.
> The implementation is actually hidden behind a macro which uses `cx_strcast()` to guarantee compatibility.
{style="note"}

```C
#include <cx/string.h>

struct cx_string_s {const char *ptr; size_t length;};

struct cx_mutstr_s {char *ptr; size_t length;};

typedef struct cx_string_s cxstring;

typedef struct cx_mutstr_s cxmutstr;

cxstring cx_str(const char *cstring);

cxstring cx_strn(const char *cstring, size_t length);

cxmutstr cx_mutstr(char *cstring);

cxmutstr cx_mutstrn(char *cstring, size_t length);

cxstring cx_strcast(AnyStr str);

cxmutstr cx_strdup(AnyStr string);

cxmutstr cx_strdup_a(const CxAllocator *allocator, AnyStr string);

void cx_strfree(cxmutstr *str);

void cx_strfree_a(const CxAllocator *alloc, cxmutstr *str);
```
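
To illustrate how a macro can accept both string variants, as the `AnyStr` note describes, here is one possible technique based on C11 generic selection.
This document does not show how UCX implements `cx_strcast()`, so treat this as an assumption-laden sketch: all `demo_` names are hypothetical.

```C
#include <stddef.h>

/* Hypothetical stand-ins for the immutable and mutable variants. */
typedef struct { const char *ptr; size_t length; } demo_str;
typedef struct { char *ptr; size_t length; } demo_mutstr;

/* Identity conversion for the immutable variant. */
static inline demo_str demo_cast_c(demo_str s) { return s; }

/* The mutable variant converts by adding const to the pointer. */
static inline demo_str demo_cast_m(demo_mutstr s) {
    return (demo_str){ s.ptr, s.length };
}

/* Generic selection picks the conversion that matches the argument type. */
#define demo_cast(s) _Generic((s), \
        demo_str: demo_cast_c, \
        demo_mutstr: demo_cast_m)(s)

/* A read-only function is implemented once for demo_str ... */
static inline size_t demo_length_impl(demo_str s) { return s.length; }

/* ... and wrapped in a macro so that callers may pass either variant. */
#define demo_length(s) demo_length_impl(demo_cast(s))
```

Passing any other type to `demo_length()` fails at compile time, because the generic selection has no matching association.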

> Documentation work in progress.
>{style="warning"}

> When you want to convert a string _literal_ into a UCX string, you can also use the `CX_STR(lit)` macro.
> This macro uses the fact that `sizeof(lit)` for a string literal `lit` is always the string length plus one,
> effectively saving an invocation of `strlen()`.
> However, this only works for literals; in all other cases you must use `cx_str()` or `cx_strn()`.
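
The `sizeof` trick mentioned above can be sketched as follows.
This is not the definition of `CX_STR`: `DEMO_STR`, `demo_str`, and `demo_str_from()` are hypothetical names chosen for illustration.

```C
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an immutable length-delimited string. */
typedef struct { const char *ptr; size_t length; } demo_str;

/* sizeof on a string literal counts the terminator, hence the "- 1". */
static_assert(sizeof("abc") - 1 == 3, "sizeof includes the terminator");

/*
 * Literal-only constructor: the length is a compile-time constant.
 * The leading "" makes the macro fail to compile for anything that
 * is not a string literal, where sizeof would yield the pointer size.
 */
#define DEMO_STR(lit) ((demo_str){ "" lit, sizeof(lit) - 1 })

/* General constructor: the length must be computed at run time. */
static inline demo_str demo_str_from(const char *cstring) {
    return (demo_str){ cstring, strlen(cstring) };
}
```

Both constructors produce the same length for the same text, but only the macro avoids scanning the characters.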

## Comparison

```C
#include <cx/string.h>

int cx_strcmp(cxstring s1, cxstring s2);

int cx_strcmp_p(const void *s1, const void *s2);

bool cx_strprefix(cxstring string, cxstring prefix);

bool cx_strsuffix(cxstring string, cxstring suffix);

int cx_strcasecmp(cxstring s1, cxstring s2);

int cx_strcasecmp_p(const void *s1, const void *s2);

bool cx_strcaseprefix(cxstring string, cxstring prefix);

bool cx_strcasesuffix(cxstring string, cxstring suffix);
```

> Documentation work in progress.
>{style="warning"}

## Concatenation

```C
#include <cx/string.h>

cxmutstr cx_strcat(size_t count, ... );

cxmutstr cx_strcat_a(const CxAllocator *alloc, size_t count, ... );

cxmutstr cx_strcat_m(cxmutstr str, size_t count, ... );

cxmutstr cx_strcat_ma(const CxAllocator *alloc,
        cxmutstr str, size_t count, ... );

size_t cx_strlen(size_t count, ...);
```

> Documentation work in progress.
>{style="warning"}

## Find Characters and Substrings

```C
#include <cx/string.h>

cxstring cx_strchr(cxstring string, int chr);

cxmutstr cx_strchr_m(cxmutstr string, int chr);

cxstring cx_strrchr(cxstring string, int chr);

cxmutstr cx_strrchr_m(cxmutstr string, int chr);

cxstring cx_strstr(cxstring haystack, cxstring needle);

cxmutstr cx_strstr_m(cxmutstr haystack, cxstring needle);

cxstring cx_strsubs(cxstring string, size_t start);

cxstring cx_strsubsl(cxstring string, size_t start, size_t length);

cxmutstr cx_strsubs_m(cxmutstr string, size_t start);

cxmutstr cx_strsubsl_m(cxmutstr string, size_t start, size_t length);

cxstring cx_strtrim(cxstring string);

cxmutstr cx_strtrim_m(cxmutstr string);
```

> Documentation work in progress.
>{style="warning"}

## Replace Substrings

```C
#include <cx/string.h>

cxmutstr cx_strreplace(cxstring str, cxstring pattern, cxstring repl);

cxmutstr cx_strreplace_a(const CxAllocator *allocator, cxstring str,
        cxstring pattern, cxstring repl);

cxmutstr cx_strreplacen(cxstring str, cxstring pattern, cxstring repl,
        size_t replmax);

cxmutstr cx_strreplacen_a(const CxAllocator *allocator, cxstring str,
        cxstring pattern, cxstring repl, size_t replmax);
```

> Documentation work in progress.
>{style="warning"}

## Basic Splitting

```C
#include <cx/string.h>

size_t cx_strsplit(cxstring string, cxstring delim,
        size_t limit, cxstring *output);

size_t cx_strsplit_a(const CxAllocator *allocator,
        cxstring string, cxstring delim,
        size_t limit, cxstring **output);

size_t cx_strsplit_m(cxmutstr string, cxstring delim,
        size_t limit, cxmutstr *output);

size_t cx_strsplit_ma(const CxAllocator *allocator,
        cxmutstr string, cxstring delim,
        size_t limit, cxmutstr **output);
```

> Documentation work in progress.
>{style="warning"}

## Complex Tokenization

```C
#include <cx/string.h>

CxStrtokCtx cx_strtok(AnyStr str, AnyStr delim, size_t limit);

void cx_strtok_delim(CxStrtokCtx *ctx,
        const cxstring *delim, size_t count);

bool cx_strtok_next(CxStrtokCtx *ctx, cxstring *token);

bool cx_strtok_next_m(CxStrtokCtx *ctx, cxmutstr *token);
```

> Documentation work in progress.
>{style="warning"}

## Conversion to Numbers

For each integer type, as well as `float` and `double`, there are functions to convert a UCX string to a number of that type.

Integer conversion comes in two flavours:
```C
int cx_strtoi(AnyStr str, int *output, int base);

int cx_strtoi_lc(AnyStr str, int *output, int base,
        const char *groupsep);
```

The basic variant takes a string of any UCX string type, a pointer to the `output` integer, and the `base` (one of 2, 8, 10, or 16).
Conversion is attempted with respect to the specified `base` and respects possible special notations for that base.
Hexadecimal numbers may be prefixed with `0x`, `x`, or `#`, and binary numbers may be prefixed with `0b` or `b`.

The `_lc` versions of the integer conversion functions behave the same, except that they accept an
array of group separator characters, each of which is ignored during conversion.
The default group separator for the basic version is a comma `,`.
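
The effect of group separators can be sketched with a small stand-alone parser.
This is not the UCX implementation: `demo_strtoi_lc()` is a hypothetical helper restricted to base 10,
and its return convention (0 on success, -1 on failure) is an assumption made for this example.

```C
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Parses the first len characters of s as a decimal integer and skips
 * every character that occurs in the zero-terminated groupsep string.
 * Returns 0 on success and -1 on invalid input or overflow.
 */
int demo_strtoi_lc(const char *s, size_t len, int *output,
        const char *groupsep) {
    assert(s != NULL && output != NULL && groupsep != NULL);
    size_t i = 0;
    int negative = 0;
    if (len > 0 && (s[0] == '-' || s[0] == '+')) {
        negative = s[0] == '-';
        i = 1;
    }
    long long value = 0;
    int digits = 0;
    for (; i < len; i++) {
        char c = s[i];
        /* group separators are ignored, wherever they appear */
        if (c != '\0' && strchr(groupsep, c) != NULL) continue;
        if (c < '0' || c > '9') return -1;
        value = value * 10 + (c - '0');
        /* stop early so that value never overflows long long */
        if (value > (long long) INT_MAX + 1) return -1;
        digits++;
    }
    if (digits == 0) return -1;
    if (negative) value = -value;
    if (value > INT_MAX || value < INT_MIN) return -1;
    *output = (int) value;
    return 0;
}
```

Because the length is passed explicitly, the input does not need to be zero-terminated: parsing the first two characters of `"12345"` yields 12.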

The signature for the floating point conversions is quite similar:
```C
int cx_strtof(AnyStr str, float *output);

int cx_strtof_lc(AnyStr str, float *output,
        char decsep, const char *groupsep);
```

The two differences are that the floating point versions do not support different bases,
and the `_lc` variant allows specifying not only an array of group separators,
but also the character used for the decimal separator.

In the basic variant, the group separator is again a comma `,`, and the decimal separator is a dot `.`.
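
How a configurable decimal separator interacts with group separators can be sketched in the same way.
Again, `demo_strtod_lc()` is a hypothetical helper and not the UCX algorithm; it uses a naive digit accumulation,
supports no exponents, and makes no attempt at correctly rounded results for long inputs.
The return convention (0 on success, -1 on failure) is an assumption.

```C
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Parses the first len characters of s as a decimal number.
 * decsep separates the integer from the fractional part and may occur
 * once; characters contained in groupsep are skipped.
 */
int demo_strtod_lc(const char *s, size_t len, double *output,
        char decsep, const char *groupsep) {
    assert(s != NULL && output != NULL && groupsep != NULL);
    size_t i = 0;
    int negative = 0;
    if (len > 0 && (s[0] == '-' || s[0] == '+')) {
        negative = s[0] == '-';
        i = 1;
    }
    double value = 0.0;  /* all digits read so far, as one integer */
    double scale = 1.0;  /* 10 to the number of fractional digits */
    int seen_decsep = 0;
    int digits = 0;
    for (; i < len; i++) {
        char c = s[i];
        if (c == decsep && !seen_decsep) {
            seen_decsep = 1;
            continue;
        }
        if (c != '\0' && strchr(groupsep, c) != NULL) continue;
        if (c < '0' || c > '9') return -1;
        value = value * 10.0 + (c - '0');
        if (seen_decsep) scale *= 10.0;
        digits++;
    }
    if (digits == 0) return -1;
    value /= scale;
    *output = negative ? -value : value;
    return 0;
}
```

With `decsep` set to `','` and `groupsep` set to `"."`, the input `1.234,5` is read as 1234.5, which matches the notation common in several European countries.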

> The floating point conversions of UCX 3.1 do not achieve the same precision as standard library implementations,
> which usually use more sophisticated algorithms.
> The precision might increase in future UCX releases,
> but until then be aware of slight inaccuracies, in particular when working with `double`.
{style="warning"}

> The UCX string to number conversions intentionally ignore all locale settings
> and are therefore independent of any global state.
{style="note"}

<seealso>
<category ref="apidoc">
<a href="https://ucx.sourceforge.io/api/string_8h.html">string.h</a>
</category>
</seealso>