.. default-domain:: chpl

.. module:: Bytes
   :synopsis: The following document shows functions and methods used to manipulate and process Chapel bytes variables.

Bytes
=====
The following document shows functions and methods used to manipulate and
process Chapel bytes variables. :mod:`bytes <Bytes>` is similar to a string but
allows arbitrary data to be stored in it. Methods on bytes that interpret the
data as characters assume that the bytes are ASCII characters.

Creating :mod:`bytes <Bytes>`
-----------------------------

- A :mod:`bytes <Bytes>` can be created using a literal, similar to a string
  literal:

  .. code-block:: chapel

     var b = b"my bytes";

- If you need to create :mod:`bytes <Bytes>` using a specific buffer (i.e. data
  in another :mod:`bytes <Bytes>`, a `c_string` or a C pointer) you can use the
  factory functions shown below, such as :proc:`createBytesWithNewBuffer`.

:mod:`bytes <Bytes>` and :mod:`string <String>`
-----------------------------------------------

As :mod:`bytes <Bytes>` can store arbitrary data, any :mod:`string <String>`
can be cast to :mod:`bytes <Bytes>`. In that event, the bytes will store UTF-8
encoded character data. However, a :mod:`bytes <Bytes>` can contain non-UTF-8
bytes and needs to be decoded to be converted to a string.

.. code-block:: chapel

   var s = "my string";
   var b = s:bytes;  // this is legal

   /*
    The reverse is not. The following is a compiler error:

    var s2 = b:string;
   */

   var s2 = b.decode(); // you need to decode a bytes to convert it to a string

See the documentation for the :proc:`~bytes.decode` method for details.

Similarly, a :mod:`bytes <Bytes>` can be initialized using a string:

.. code-block:: chapel

   var s = "my string";
   var b: bytes = s;

Casts from :mod:`bytes <Bytes>` to a Numeric Type
-------------------------------------------------

This module supports casts from :mod:`bytes <Bytes>` to numeric types. Such
casts interpret the :mod:`bytes <Bytes>` as ASCII characters, convert them to
the numeric type, and throw an error if the :mod:`bytes <Bytes>` does not match
the expected format of a number. For example:

.. code-block:: chapel

   var b = b"a";
   var number = b:int;

throws an error when it is executed, but

.. code-block:: chapel

   var b = b"1";
   var number = b:int;

stores the value ``1`` in ``number``.

To learn more about handling these errors, see the
:ref:`Error Handling technical note <readme-errorHandling>`.

.. function:: proc createBytesWithBorrowedBuffer(x: bytes)

   Creates a new :mod:`bytes <Bytes>` which borrows the internal buffer of
   another :mod:`bytes <Bytes>`. If the buffer is freed before the
   :mod:`bytes <Bytes>` returned from this function, accessing it is undefined
   behavior.

   :arg x: The :mod:`bytes <Bytes>` to borrow the buffer from

   :returns: A new :mod:`bytes <Bytes>`

.. function:: proc createBytesWithBorrowedBuffer(x: c_string, length = x.size)

   Creates a new :mod:`bytes <Bytes>` which borrows the internal buffer of a
   `c_string`. If the buffer is freed before the :mod:`bytes <Bytes>` returned
   from this function, accessing it is undefined behavior.

   :arg x: `c_string` to borrow the buffer from

   :arg length: Length of `x`'s buffer, excluding the terminating null byte.
   :type length: `int`

   :returns: A new :mod:`bytes <Bytes>`

.. function:: proc createBytesWithBorrowedBuffer(x: c_ptr(?t), length: int, size: int)

   Creates a new :mod:`bytes <Bytes>` which borrows the memory allocated for a
   `c_ptr`. If the buffer is freed before the :mod:`bytes <Bytes>` returned
   from this function, accessing it is undefined behavior.

   :arg x: Buffer to borrow
   :type x: `c_ptr(uint(8))` or `c_ptr(c_char)`

   :arg length: Length of the buffer `x`, excluding the terminating null byte.

   :arg size: Size of memory allocated for `x` in bytes

   :returns: A new :mod:`bytes <Bytes>`

.. function:: proc createBytesWithOwnedBuffer(x: c_string, length = x.size)

   Creates a new :mod:`bytes <Bytes>` which takes ownership of the internal
   buffer of a `c_string`. The buffer will be freed when the
   :mod:`bytes <Bytes>` is deinitialized.

   :arg x: The `c_string` to take ownership of the buffer from

   :arg length: Length of `x`'s buffer, excluding the terminating null byte.
   :type length: `int`

   :returns: A new :mod:`bytes <Bytes>`

.. function:: proc createBytesWithOwnedBuffer(x: c_ptr(?t), length: int, size: int)

   Creates a new :mod:`bytes <Bytes>` which takes ownership of the memory
   allocated for a `c_ptr`. The buffer will be freed when the
   :mod:`bytes <Bytes>` is deinitialized.

   :arg x: The buffer to take ownership of
   :type x: `c_ptr(uint(8))` or `c_ptr(c_char)`

   :arg length: Length of the buffer `x`, excluding the terminating null byte.

   :arg size: Size of memory allocated for `x` in bytes

   :returns: A new :mod:`bytes <Bytes>`

.. function:: proc createBytesWithNewBuffer(x: bytes)

   Creates a new :mod:`bytes <Bytes>` by creating a copy of the buffer of
   another :mod:`bytes <Bytes>`.

   :arg x: The :mod:`bytes <Bytes>` to copy the buffer from

   :returns: A new :mod:`bytes <Bytes>`

.. function:: proc createBytesWithNewBuffer(x: c_string, length = x.size)

   Creates a new :mod:`bytes <Bytes>` by creating a copy of the buffer of a
   `c_string`.

   :arg x: The `c_string` to copy the buffer from

   :arg length: Length of `x`'s buffer, excluding the terminating null byte.
   :type length: `int`

   :returns: A new :mod:`bytes <Bytes>`

.. function:: proc createBytesWithNewBuffer(x: c_ptr(?t), length: int, size = length+1)

   Creates a new :mod:`bytes <Bytes>` by creating a copy of a buffer.

   :arg x: The buffer to copy
   :type x: `c_ptr(uint(8))` or `c_ptr(c_char)`

   :arg length: Length of buffer `x`, excluding the terminating null byte.

   :arg size: Size of memory allocated for `x` in bytes

   :returns: A new :mod:`bytes <Bytes>`

.. method:: proc bytes.size

   :returns: The number of bytes in the :mod:`bytes <Bytes>`.

.. method:: proc bytes.indices

   :returns: The indices that can be used to index into the bytes
             (i.e., the range ``0..<this.size``)

.. method:: proc bytes.localize(): bytes

   Gets a version of the :mod:`bytes <Bytes>` that is on the currently
   executing locale.

   :returns: A shallow copy if the :mod:`bytes <Bytes>` is already on the
             current locale, otherwise a deep copy is performed.

.. method:: proc bytes.c_str(): c_string

   Gets a `c_string` from a :mod:`bytes <Bytes>`. The returned `c_string`
   shares the buffer with the :mod:`bytes <Bytes>`.


   :returns: A `c_string`

.. method:: proc bytes.item(i: int): bytes

   Gets an ASCII character from the :mod:`bytes <Bytes>`.

   :arg i: The index

   :returns: A 1-length :mod:`bytes <Bytes>`

.. method:: proc bytes.this(i: int): byteType

   Gets a byte from the :mod:`bytes <Bytes>`.

   :arg i: The index

   :returns: uint(8)

.. method:: proc bytes.toByte(): uint(8)

   :returns: The value of a single-byte :mod:`bytes <Bytes>` as an integer.

.. method:: proc bytes.byte(i: int): byteType

   Gets a byte from the :mod:`bytes <Bytes>`.

   :arg i: The index

   :returns: The value of the `i` th byte as an integer.

.. itermethod:: iter bytes.items(): bytes

   Iterates over the :mod:`bytes <Bytes>`, yielding ASCII characters.

   :yields: 1-length :mod:`bytes <Bytes>`

.. itermethod:: iter bytes.these(): byteType

   Iterates over the :mod:`bytes <Bytes>`.

   :yields: uint(8)

.. itermethod:: iter bytes.bytes(): byteType

   Iterates over the :mod:`bytes <Bytes>` byte by byte.

   :yields: uint(8)

.. method:: proc bytes.this(r: range(?)): bytes

   Slices the :mod:`bytes <Bytes>`. Halts if `r` is non-empty and not
   completely inside the range ``this.indices`` when compiled with `--checks`.
   `--fast` disables this check.

   :arg r: The range of indices the new :mod:`bytes <Bytes>` should be made
           from

   :returns: a new :mod:`bytes <Bytes>` that is a slice within
             ``this.indices``. If the length of `r` is zero, an empty
             :mod:`bytes <Bytes>` is returned.

.. method:: proc bytes.isEmpty(): bool

   Checks if the :mod:`bytes <Bytes>` is empty.

   :returns: * `true` -- when empty
             * `false` -- otherwise

.. method:: proc bytes.startsWith(needles: bytes ...): bool

   Checks if the :mod:`bytes <Bytes>` starts with any of the given arguments.

   :arg needles: :mod:`bytes <Bytes>` (s) to match against.

   :returns: * `true` -- when the :mod:`bytes <Bytes>` begins with one or more
               of the `needles`
             * `false` -- otherwise

.. method:: proc bytes.endsWith(needles: bytes ...): bool

   Checks if the :mod:`bytes <Bytes>` ends with any of the given arguments.

   :arg needles: :mod:`bytes <Bytes>` (s) to match against.

   :returns: * `true` -- when the :mod:`bytes <Bytes>` ends with one or more
               of the `needles`
             * `false` -- otherwise

.. method:: proc bytes.find(needle: bytes, region: range(?) = this.indices): idxType

   Finds the argument in the :mod:`bytes <Bytes>`.

   :arg needle: :mod:`bytes <Bytes>` to search for

   :arg region: an optional range defining the indices to search within;
                defaults to the whole :mod:`bytes <Bytes>`. Halts if the range
                is not within ``this.indices``

   :returns: the index of the first occurrence from the left of `needle`
             within the :mod:`bytes <Bytes>`, or -1 if the `needle` is not in
             the :mod:`bytes <Bytes>`.

.. method:: proc bytes.rfind(needle: bytes, region: range(?) = this.indices): idxType

   Finds the argument in the :mod:`bytes <Bytes>`, searching from the right.

   :arg needle: The :mod:`bytes <Bytes>` to search for

   :arg region: an optional range defining the indices to search within;
                defaults to the whole :mod:`bytes <Bytes>`. Halts if the range
                is not within ``this.indices``

   :returns: the index of the first occurrence from the right of `needle`
             within the :mod:`bytes <Bytes>`, or -1 if the `needle` is not in
             the :mod:`bytes <Bytes>`.

.. method:: proc bytes.count(needle: bytes, region: range(?) = this.indices): int

   Counts the number of occurrences of the argument in the
   :mod:`bytes <Bytes>`.

   :arg needle: The :mod:`bytes <Bytes>` to search for

   :arg region: an optional range defining the substring to search within;
                defaults to the whole :mod:`bytes <Bytes>`. Halts if the range
                is not within ``this.indices``

   :returns: the number of times `needle` occurs in the :mod:`bytes <Bytes>`

.. method:: proc bytes.replace(needle: bytes, replacement: bytes, count: int = -1): bytes

   Replaces occurrences of a :mod:`bytes <Bytes>` with another.

   :arg needle: The :mod:`bytes <Bytes>` to search for

   :arg replacement: The :mod:`bytes <Bytes>` to replace `needle` with

   :arg count: an optional argument specifying the number of replacements to
               make; values less than zero replace all occurrences

   :returns: a copy of the :mod:`bytes <Bytes>` where `replacement` replaces
             `needle` up to `count` times

.. itermethod:: iter bytes.split(sep: bytes, maxsplit: int = -1, ignoreEmpty: bool = false): bytes

   Splits the :mod:`bytes <Bytes>` on `sep`, yielding the bytes between each
   occurrence, up to `maxsplit` times.


   :arg sep: The delimiter used to break the :mod:`bytes <Bytes>` into chunks.

   :arg maxsplit: The number of times to split the :mod:`bytes <Bytes>`;
                  negative values indicate no limit.

   :arg ignoreEmpty: * `true` -- Empty :mod:`bytes <Bytes>` will not be yielded
                     * `false` -- Empty :mod:`bytes <Bytes>` will be yielded

   :yields: :mod:`bytes <Bytes>`

.. itermethod:: iter bytes.split(maxsplit: int = -1): bytes

   Works as above, but uses runs of whitespace as the delimiter.

   :arg maxsplit: The maximum number of times to split the
                  :mod:`bytes <Bytes>`; negative values indicate no limit.

   :yields: :mod:`bytes <Bytes>`

.. method:: proc bytes.join(const ref S: bytes ...): bytes

   Returns a new :mod:`bytes <Bytes>`, which is the concatenation of all of
   the :mod:`bytes <Bytes>` passed in with the contents of the method receiver
   inserted between them.

   .. code-block:: chapel

      var x = b"|".join(b"a",b"10",b"d");
      writeln(x); // prints: "a|10|d"

   :arg S: :mod:`bytes <Bytes>` values to be joined

   :returns: A :mod:`bytes <Bytes>`

.. method:: proc bytes.join(const ref x): bytes

   Returns a new :mod:`bytes <Bytes>`, which is the concatenation of all of
   the :mod:`bytes <Bytes>` passed in with the contents of the method receiver
   inserted between them.

   .. code-block:: chapel

      var tup = (b"a",b"10",b"d");
      var x = b"|".join(tup);
      writeln(x); // prints: "a|10|d"

   :arg x: :mod:`bytes <Bytes>` values to be joined
   :type x: tuple or array of :mod:`bytes <Bytes>`

   :returns: A :mod:`bytes <Bytes>`

.. method:: proc bytes.strip(chars = b" \t\r\n", leading = true, trailing = true): bytes

   Strips the given set of leading and/or trailing characters.

   :arg chars: Characters to remove. Defaults to `b" \\t\\r\\n"`.

   :arg leading: Indicates if leading occurrences should be removed.
                 Defaults to `true`.

   :arg trailing: Indicates if trailing occurrences should be removed.
                  Defaults to `true`.

   :returns: A new :mod:`bytes <Bytes>` with `leading` and/or `trailing`
             occurrences of characters in `chars` removed as appropriate.

.. method:: proc bytes.partition(sep: bytes): 3*(bytes)

   Splits the :mod:`bytes <Bytes>` on a given separator.

   :arg sep: The separator

   :returns: a `3*bytes` consisting of the section before `sep`, `sep`, and
             the section after `sep`. If `sep` is not found, the tuple will
             contain the whole :mod:`bytes <Bytes>`, and then two empty
             :mod:`bytes <Bytes>`.

.. method:: proc bytes.dedent(columns = 0, ignoreFirst = true): bytes

   Removes indentation from each line of bytes.

   This can be useful when applied to multi-line bytes that are indented in
   the source code, but should not be indented in the output.

   When ``columns == 0``, determine the level of indentation to remove from
   all lines by finding the common leading whitespace across all non-empty
   lines. Empty lines are lines containing only whitespace. Tabs and spaces
   are the only whitespace characters considered, and they are not treated as
   the same character when determining common whitespace.

   When ``columns > 0``, remove ``columns`` leading whitespace characters from
   each line. Tabs are not considered whitespace when ``columns > 0``, so only
   leading spaces are removed.

   :arg columns: The number of columns of indentation to remove. Infer common
                 leading whitespace if ``columns == 0``.

   :arg ignoreFirst: When ``true``, ignore the first line when determining the
                     common leading whitespace, and make no changes to the
                     first line.

   :returns: A new `bytes` with indentation removed.

   .. warning::

      ``bytes.dedent`` is not considered stable and is subject to change in
      future Chapel releases.

.. method:: proc bytes.decode(policy = decodePolicy.strict): string throws

   Returns a UTF-8 string from the given :mod:`bytes <Bytes>`. If the data is
   malformed for UTF-8, the `policy` argument determines the action.


   :arg policy: - `decodePolicy.strict` raises an error
                - `decodePolicy.replace` replaces the malformed character with
                  the UTF-8 replacement character
                - `decodePolicy.drop` drops the data silently
                - `decodePolicy.escape` escapes each illegal byte with
                  private use codepoints

   :throws: `DecodeError` if `decodePolicy.strict` is passed to the `policy`
            argument and the :mod:`bytes <Bytes>` contains non-UTF-8
            characters.

   :returns: A UTF-8 string.

.. method:: proc bytes.isUpper(): bool

   Checks if all the characters in the :mod:`bytes <Bytes>` are uppercase
   (A-Z) in ASCII. Ignores uncased (not a letter) and extended ASCII
   characters (decimal value larger than 127).

   :returns: * `true` -- there is at least one uppercase and no lowercase
               characters
             * `false` -- otherwise

.. method:: proc bytes.isLower(): bool

   Checks if all the characters in the :mod:`bytes <Bytes>` are lowercase
   (a-z) in ASCII. Ignores uncased (not a letter) and extended ASCII
   characters (decimal value larger than 127).

   :returns: * `true` -- there is at least one lowercase and no uppercase
               characters
             * `false` -- otherwise

.. method:: proc bytes.isSpace(): bool

   Checks if all the characters in the :mod:`bytes <Bytes>` are whitespace
   (' ', '\\t', '\\n', '\\v', '\\f', '\\r') in ASCII.

   :returns: * `true` -- when all the characters are whitespace.
             * `false` -- otherwise

.. method:: proc bytes.isAlpha(): bool

   Checks if all the characters in the :mod:`bytes <Bytes>` are alphabetic
   (a-zA-Z) in ASCII.

   :returns: * `true` -- when the characters are alphabetic.
             * `false` -- otherwise

.. method:: proc bytes.isDigit(): bool

   Checks if all the characters in the :mod:`bytes <Bytes>` are digits (0-9)
   in ASCII.

   :returns: * `true` -- when the characters are digits.
             * `false` -- otherwise

.. method:: proc bytes.isAlnum(): bool

   Checks if all the characters in the :mod:`bytes <Bytes>` are alphanumeric
   (a-zA-Z0-9) in ASCII.

   :returns: * `true` -- when the characters are alphanumeric.
             * `false` -- otherwise

.. method:: proc bytes.isPrintable(): bool

   Checks if all the characters in the :mod:`bytes <Bytes>` are printable in
   ASCII.

   :returns: * `true` -- when the characters are printable.
             * `false` -- otherwise

.. method:: proc bytes.isTitle(): bool

   Checks if all uppercase characters are preceded by uncased characters, and
   if all lowercase characters are preceded by cased characters in ASCII.

   :returns: * `true` -- when the condition described above is met.
             * `false` -- otherwise

.. method:: proc bytes.toLower(): bytes

   Creates a new :mod:`bytes <Bytes>` with all applicable characters converted
   to lowercase.

   :returns: A new :mod:`bytes <Bytes>` with all uppercase characters (A-Z)
             replaced with their lowercase counterpart in ASCII. Other
             characters remain untouched.

.. method:: proc bytes.toUpper(): bytes

   Creates a new :mod:`bytes <Bytes>` with all applicable characters converted
   to uppercase.

   :returns: A new :mod:`bytes <Bytes>` with all lowercase characters (a-z)
             replaced with their uppercase counterpart in ASCII. Other
             characters remain untouched.

.. method:: proc bytes.toTitle(): bytes

   Creates a new :mod:`bytes <Bytes>` with all applicable characters converted
   to title capitalization.

   :returns: A new :mod:`bytes <Bytes>` with all cased characters (a-zA-Z)
             following an uncased character converted to uppercase, and all
             cased characters following another cased character converted to
             lowercase.

.. function:: proc +=(ref lhs: bytes, const ref rhs: bytes): void

   Appends the :mod:`bytes <Bytes>` `rhs` to the :mod:`bytes <Bytes>` `lhs`.

.. function:: proc =(ref lhs: bytes, rhs: bytes)

   Copies the :mod:`bytes <Bytes>` `rhs` into the :mod:`bytes <Bytes>` `lhs`.

.. function:: proc =(ref lhs: bytes, rhs_c: c_string)

   Copies the `c_string` `rhs_c` into the :mod:`bytes <Bytes>` `lhs`.

   Halts if `lhs` is a remote :mod:`bytes <Bytes>`.

.. function:: proc +(s0: bytes, s1: bytes)

   :returns: A new :mod:`bytes <Bytes>` which is the result of concatenating
             `s0` and `s1`

.. function:: proc *(s: bytes, n: integral)

   :returns: A new :mod:`bytes <Bytes>` which is the result of repeating `s`
             `n` times. If `n` is less than or equal to 0, an empty
             :mod:`bytes <Bytes>` is returned.
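
As a minimal sketch of how the operators documented above combine, the
following example appends, concatenates, and repeats bytes values. The variable
names are illustrative only and are not part of the module.

.. code-block:: chapel

   // Illustrative names; only the operators themselves come from this module.
   var greeting = b"hello";
   greeting += b" world";        // += appends in place
   var line = greeting + b"!";   // + returns a new bytes
   var rule = b"-=" * 3;         // * repeats the bytes 3 times
   var nothing = b"x" * 0;       // n <= 0 yields an empty bytes

   writeln(line);                // hello world!
   writeln(rule);                // -=-=-=
   writeln(nothing.isEmpty());   // true

Note that ``+`` and ``*`` return new values and leave their operands unchanged,
while ``+=`` modifies ``lhs``.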