Source code for ae.base

"""
basic constants, helper functions, classes and context managers
===============================================================

this module is pure python, has no external dependencies, and provides a comprehensive toolkit of base constants,
common helper functions, useful classes, and context managers for a wide variety of programming tasks.


string manipulation
-------------------

functions for converting, cleaning, normalizing, and formatting strings.

* :func:`ascii_dec_str`: decodes an ascii string literal converted by :func:`ascii_enc_lit` back to its Unicode form.
* :func:`ascii_enc_lit`: encodes a Unicode string into a reversible 7-bit ASCII representation, useful for transport
  protocol/HTTP headers.
* :func:`camel_to_snake`: converts a string from CamelCase to snake_case.
* :func:`snake_to_camel`: converts a string from snake_case to CamelCase.
* :func:`norm_name`: normalizes a string to be a valid identifier (e.g., for variable-, method-, or file-names).
* :func:`norm_line_sep`: converts all line separator combinations (CRLF, CR) in a string to a single newline (LF).
* :func:`defuse`: converts special characters in a string to Unicode alternatives, making it safe for use as
  a URL slug, path or filename.
* :func:`dedefuse`: reverses the operation of :func:`defuse`, restoring the original string.
* :func:`force_encoding`: ensures text is in a specific encoding without raising errors, replacing characters as needed.
* :func:`to_ascii`: converts a Unicode string into its closest ASCII representation by removing accents and diacritics.
* :func:`format_given`: a replacement for `str.format_map` that formats a string but leaves placeholders intact if they
  are not found in the provided mapping.


data structure utilities
------------------------

helpers for working with lists, dictionaries, and other data structures.

* :func:`evaluate_literal`: replacement for :func:`ast.literal_eval` that also interprets/recognizes unquoted strings
  as `str` type.
* :func:`duplicates`: returns a list of all duplicate items found in any type of iterable.
* :func:`deep_dict_update`: recursively updates a dictionary in-place with values from another dictionary.
* :func:`mask_secrets`: hides sensitive string values (e.g., passwords, API keys) in deeply nested data structures,
  useful for logging.


file, path & I/O operations
---------------------------

simplify file system interactions with wrappers and context managers.

* :func:`extend_file`: appends a string to a file, creating the file if it does not exist.
* :func:`in_wd`: a context manager to temporarily switch/change the current working directory.
* :func:`norm_path`: normalizes a path by expanding user home directories (`~`), resolving `.`, `..`, symbolic links,
  and converting between absolute and relative paths.
* :func:`read_bin_file`: reads the entire content of a binary file into a bytes object.
* :func:`read_file`: reads the entire content of a text file into a string.
* :func:`write_bin_file`: writes a bytes object to a file, overwriting existing content.
* :func:`write_file`: writes a string into a file, overwriting existing content.


networking utilities
--------------------

* :func:`mask_url`: hides or replaces the password/token portion of a URL for safe logging.
* :func:`url_failure`: determines if and why an HTTP|FTP target is unavailable.


general utilities & helpers
---------------------------

a collection of miscellaneous mathematical, date/time, and other standalone helper functions.

mathematical
^^^^^^^^^^^^

* :func:`sign`: returns the sign of a number (-1 for negative, 0 for zero, 1 for positive).
* :func:`round_traditional`: rounds a float value using traditional rounding rules (e.g., `0.5` rounds up).

date & time
^^^^^^^^^^^
* :func:`utc_datetime`: returns the current date and time as a timezone-naive `datetime` object in UTC.
* :func:`now_str`: creates a compact, sortable timestamp string from the current UTC time.

miscellaneous
^^^^^^^^^^^^^
* :func:`dummy_function`: a null function that accepts any arguments and returns `None`.
* :func:`env_str`: retrieves the string value of an OS environment variable, with an option to automatically convert the
  variable name to the conventional format.
* :func:`on_ci_host`: detects if it is running on the CI of a Git repository server (GitHub or GitLab).


base types and classes
----------------------

* :class:`UnsetType`: the class for the :data:`UNSET` singleton object, useful as a sentinel value when `None` is a
  valid input.
* :class:`UnformattedValue`: a helper class for :func:`format_given` to represent a placeholder that was not found in
  the formatting map.
* :class:`GivenFormatter`: a helper class for :func:`format_given` that overrides default formatting behavior to keep
  missing placeholders.


base constants
--------------

predefined constants for defaults, project structure, and file conventions, to reduce redundancy and improve performance.

project & file structure
^^^^^^^^^^^^^^^^^^^^^^^^

* :data:`CFG_EXT`: file extension for CFG/INI configuration files ('.cfg').
* :data:`DEF_PROJECT_PARENT_FOLDER`: default directory name for grouping source code projects ('src').
* :data:`DOCS_FOLDER`: default name for a project's documentation folder ('docs').
* :data:`INI_EXT`: file extension for INI configuration files ('.ini').
* :data:`PACKAGE_INCLUDE_FILES_PREFIX`: prefix for files/folders to be included in setup package data ('ae_'; used by
  :mod:`ae.updater` and :mod:`aedev.project_manager`).
* :data:`PY_CACHE_FOLDER`: default name for Python's cache folder ('__pycache__').
* :data:`PY_EXT`: file extension for Python modules ('.py').
* :data:`PY_INIT`: the filename for a Python package initializer ('__init__.py').
* :data:`PY_MAIN`: the filename for a Python executable's main module ('__main__.py').
* :data:`TESTS_FOLDER`: default name for a project's tests folder ('tests').
* :data:`TEMPLATES_FOLDER`: default name for a folder containing file templates ('templates').


formats & default settings
^^^^^^^^^^^^^^^^^^^^^^^^^^

* :data:`DATE_ISO`: ISO format string for dates ("%Y-%m-%d").
* :data:`DATE_TIME_ISO`: ISO format string for :class:`datetime.datetime` values ("%Y-%m-%d %H:%M:%S.%f").
* :data:`DEF_ENCODE_ERRORS`: the default error handling strategy for encoding ('backslashreplace').
* :data:`DEF_ENCODING`: the default encoding used for string operations ('ascii').
* :data:`NAME_PARTS_SEP`: the character used as a separator in name conversions ('_').
* :data:`NOW_STR_FORMAT`: the datetime format string, used e.g. by :func:`now_str` for creating timestamps.
* :data:`UNSET`: a singleton instance of :class:`UnsetType`, used where `None` is a valid data value.


os.path shortcuts
^^^^^^^^^^^^^^^^^

the following are direct references to functions in the :mod:`os.path` module for convenient and quicker access:

* :data:`os_path_abspath`: :func:`os.path.abspath`
* :data:`os_path_basename`: :func:`os.path.basename`
* :data:`os_path_dirname`: :func:`os.path.dirname`
* :data:`os_path_expanduser`: :func:`os.path.expanduser`
* :data:`os_path_isdir`: :func:`os.path.isdir`
* :data:`os_path_isfile`: :func:`os.path.isfile`
* :data:`os_path_join`: :func:`os.path.join`
* :data:`os_path_normpath`: :func:`os.path.normpath`
* :data:`os_path_realpath`: :func:`os.path.realpath`
* :data:`os_path_relpath`: :func:`os.path.relpath`
* :data:`os_path_sep`: :data:`os.path.sep`
* :data:`os_path_splitext`: :func:`os.path.splitext`

"""
import base64
import datetime
import os
import socket
import ssl
import string
import unicodedata

from ast import literal_eval
from contextlib import contextmanager
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse, urlunparse
from urllib.request import Request, urlopen
from typing import Any, Final, Generator, Iterable


__version__ = '0.3.85'


DOCS_FOLDER = 'docs'                            #: project documentation root folder name
TESTS_FOLDER = 'tests'                          #: name of project folder to store unit/integration tests
TEMPLATES_FOLDER = 'templates'
""" template folder name, used in template and namespace root projects to maintain and provide common file templates """

PACKAGE_INCLUDE_FILES_PREFIX = 'ae_'            #: file/folder names prefix included in setup package_data/ae_updater

PY_CACHE_FOLDER = '__pycache__'                 #: python cache folder name
PY_EXT = '.py'                                  #: file extension for modules and hooks
PY_INIT = '__init__' + PY_EXT                   #: init-module file name of a python package
PY_MAIN = '__main__' + PY_EXT                   #: main-module file name of a python executable

CFG_EXT = '.cfg'                                #: CFG config file extension
INI_EXT = '.ini'                                #: INI config file extension

DATE_ISO = "%Y-%m-%d"                           #: ISO string format for date values (e.g. in config files/variables)
DATE_TIME_ISO = "%Y-%m-%d %H:%M:%S.%f"          #: ISO string format for datetime values

DEF_PROJECT_PARENT_FOLDER = 'src'               #: default directory name to put code project roots underneath of it

DEF_ENCODE_ERRORS = 'backslashreplace'          #: default encode error handling for UnicodeEncodeErrors
DEF_ENCODING = 'ascii'
""" encoding for :func:`force_encoding` that will always work independent from destination (console, file sys, ...).
"""

NAME_PARTS_SEP = '_'                                #: name parts separator character, e.g. for :func:`norm_name`

NOW_STR_FORMAT = "{sep}%Y%m%d{sep}%H%M%S{sep}%f"    #: timestamp format of :func:`now_str`


def ascii_dec_str(encoded_str: str) -> str:
    """ convert non-ASCII chars in a string literal encoded with :func:`ascii_enc_lit` to Unicode chars.

    :param encoded_str:     string literal to decode (convert contained ASCII-encoded characters back to Unicode chars).
    :return:                decoded Unicode string.
    :raises:                SyntaxError if invalid string literal got specified in
                            :paramref:`~ascii_dec_str.encoded_str`.
    """
    return literal_eval(encoded_str).decode()


def ascii_enc_lit(unicode_str: str) -> str:
    """ convert a Unicode string with non-ASCII chars to a revertible 7-bit/ASCII literal/representation.

    :param unicode_str:     string to encode/convert.
    :return:                revertible representation of the specified string, using only ASCII characters, e.g., to put
                            in an http header.
    """
    return repr(unicode_str.encode())


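A minimal, self-contained sketch of the encode/decode round trip. The two function bodies are copied from the listing so the snippet runs without the package installed; the sample string is only an illustration.

```python
from ast import literal_eval


def ascii_enc_lit(unicode_str: str) -> str:
    # repr() of the UTF-8 encoded bytes is a literal made of ASCII characters only
    return repr(unicode_str.encode())


def ascii_dec_str(encoded_str: str) -> str:
    # parse the bytes literal and decode the bytes (UTF-8 by default)
    return literal_eval(encoded_str).decode()


original = "Grüße"
encoded = ascii_enc_lit(original)
decoded = ascii_dec_str(encoded)

print(encoded)              # b'Gr\xc3\xbc\xc3\x9fe'
print(decoded == original)  # True
```

The encoded form includes the leading `b'` and the trailing quote, so it has to be passed back to the decoder unchanged.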
def camel_to_snake(name: str) -> str:
    """ convert a name from CamelCase to snake_case.

    :param name:            name string in CamelCaseFormat.
    :return:                name with an underscore inserted in front of every upper-case character. the case of the
                            characters is kept, so the caller has to lower- or upper-case the result if needed.
    """
    str_parts = []
    for char in name:
        if char.isupper():
            str_parts.append(NAME_PARTS_SEP + char)
        else:
            str_parts.append(char)
    return "".join(str_parts)


def deep_dict_update(data: dict, update: dict, overwrite: bool = True):
    """ update the optionally nested data dict in-place with the items and subitems from the update dict.

    :param data:            dict to be updated/extended. non-existing keys of dict-subitems will be added.
    :param update:          dict with the [sub-]items to update in the :paramref:`~deep_dict_update.data` dict.
    :param overwrite:       pass False to not overwrite an already existing value.

    .. hint:: see the module/portion :mod:`ae.deep` for more deep update helper functions.
    """
    for upd_key, upd_val in update.items():
        if isinstance(upd_val, dict):
            if upd_key not in data:
                data[upd_key] = {}
            deep_dict_update(data[upd_key], upd_val, overwrite=overwrite)
        elif overwrite or upd_key not in data:
            data[upd_key] = upd_val


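The effect of the `overwrite` flag is easiest to see on a small nested dict. The sketch below copies the function body from the listing so it runs stand-alone; the `config` and `defaults` dicts are made-up sample data.

```python
def deep_dict_update(data: dict, update: dict, overwrite: bool = True):
    # body copied from the listing: recurse into dict values, set all other values
    for upd_key, upd_val in update.items():
        if isinstance(upd_val, dict):
            if upd_key not in data:
                data[upd_key] = {}
            deep_dict_update(data[upd_key], upd_val, overwrite=overwrite)
        elif overwrite or upd_key not in data:
            data[upd_key] = upd_val


# default behaviour: existing leaf values get replaced, missing ones get added
config = {"db": {"host": "localhost", "port": 5432}}
deep_dict_update(config, {"db": {"port": 6543, "user": "app"}})

# overwrite=False: existing leaf values are kept, only missing ones get added
defaults = {"db": {"host": "localhost"}}
deep_dict_update(defaults, {"db": {"host": "other", "timeout": 30}}, overwrite=False)

print(config)    # {'db': {'host': 'localhost', 'port': 6543, 'user': 'app'}}
print(defaults)  # {'db': {'host': 'localhost', 'timeout': 30}}
```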
# noinspection GrazieInspection ASCII_UNICODE = ( ('/', '⁄'), # U+2044: Fraction Slash; '∕' U+2215: Division Slash; '⧸' U+29F8: Big Solidus; # '╱' U+FF0F: Fullwidth Solidus; '╱' U+2571: Box Drawings Light Diagonal Upper Right to Lower Left ('|', '।'), # U+0964: Devanagari Danda ('\\', '﹨'), # U+FE68: SMALL REVERSE SOLIDUS; '⑊' U+244A OCR DOUBLE BACKSLASH; '⧵' U+29F5 REV. SOLIDUS OPERATOR (':', '﹕'), # U+FE55: Small Colon ('*', '﹡'), # U+FE61: Small Asterisk ('?', '﹖'), # U+FE56: Small Question Mark ('"', '"'), # U+FF02: Fullwidth Quotation Mark ("'", '‘'), # U+2018: Left Single; '’' U+2019: Right Single; '‛' U+201B: Single High-Reversed-9 Quotation Mark ('<', '⟨'), # U+27E8: LEFT ANGLE BRACKET; '‹' U+2039: Single Left-Pointing Angle Quotation Mark ('>', '⟩'), # U+27E9: RIGHT ANGLE BRACKET; '›' U+203A: Single Right-Pointing Angle Quotation Mark ('(', '⟮'), # U+27EE: MATHEMATICAL LEFT FLATTENED PARENTHESIS (')', '⟯'), # U+27EF: MATHEMATICAL RIGHT FLATTENED PARENTHESIS ('[', '⟦'), # U+27E6: MATHEMATICAL LEFT WHITE SQUARE BRACKET (']', '⟧'), # U+27E7: MATHEMATICAL RIGHT WHITE SQUARE BRACKET ('{', '﹛'), # U+FE5B: Small Left Curly Bracket ('}', '﹜'), # U+FE5C: Small Right Curly Bracket ('#', '﹟'), # U+FE5F: Small Number Sign (';', '﹔'), # U+FE54: Small Semicolon ('@', '﹫'), # U+FE6B: Small Commercial At ('&', '﹠'), # U+FE60: Small Ampersand ('=', '﹦'), # U+FE66: Small Equals Sign ('+', '﹢'), # U+FE62: Small Plus Sign ('$', '﹩'), # U+FE69: Small Dollar Sign ('%', '﹪'), # U+FE6A: Small Percent Sign ('^', '^'), # U+FF3E: Fullwidth Circumflex Accent (',', '﹐'), # U+FE50: Small Comma (' ', '␣'), # U+2423: Open Box; more see underneath and https://unicode-explorer.com/articles/space-characters: # ' ' U+00A0: No-Break Space (NBSP); ' ' U+1680 Ogham Space Mark; ' ' U+2000 En Quad; # ' ' U+2001 Em Quad; ' ' U+2002 En Space; ' ' U+2003 Em Space; ' ' U+2004 Three-Per-Em # ' ' U+2005 Four-Per-Em; ' ' U+2006 Six-Per-Em; ' ' U+2007 Figure Space; # ' ' U+2008 Punctuation Space; ' ' 
U+2009 Thin; ' ' U+200A Hair Space; # ' ' U+202F: Narrow No-Break Space (NNBSP); ' ' U+205F Medium Mathematical Space; # '␠' U+2420 symbol for space; '␣' U+2423 Open Box; ' ' U+3000: Ideographic Space (chr(127), '␡'), # U+2421: DELETE SYMBOL # ('_', '𛲖'), # U+1BC96: Duployan Affix Low Line; '_' U+FF3F Fullwidth Low Line ) + tuple((chr(low_asc_ord), chr(0x2400 + low_asc_ord)) for low_asc_ord in range(32)) """ transformation table of special ASCII characters to a similar/alternative non-functional/-escaping Unicode char, see https://www.compart.com/en/unicode/category/Po and https://xahlee.info/comp/unicode_naming_slash.html (http!) """ URI_SEP_STR = '://' #: separator between service and address(host/path) in URIs URI_SEP_UNICODE_CHAR = '⫻' #: single Unicode char for :data:`URI_SEP_STR` U+2AFB: TRIPLE SOLIDUS BINARY RELATION ASCII_TO_UNICODE = str.maketrans(dict(ASCII_UNICODE)) """ :func:`str.translate` map to convert ASCII to an alternative defused Unicode character - used by :func:`defuse` """ UNICODE_TO_ASCII = str.maketrans({unicode_char: ascii_char for ascii_char, unicode_char in ASCII_UNICODE + ((URI_SEP_STR, URI_SEP_UNICODE_CHAR), )}) """ :func:`str.translate` Unicode to ASCII map - used by :func:`dedefuse` """
def dedefuse(value: str) -> str:
    """ convert a string that got defused with :func:`defuse` back to its original form.

    :param value:           string defused with the function :func:`defuse`.
    :return:                re-activated form of the string (with all ASCII special characters recovered).
    """
    return value.translate(UNICODE_TO_ASCII)


def defuse(value: str) -> str:
    # noinspection GrazieInspection
    """ convert a file path or a URI into a defused/presentational form to be usable as URL slug or file/folder name.

    :param value:           any string to defuse (replace special chars with Unicode alternatives).
    :return:                string with its special characters replaced by their purely presentational alternatives.

    the ASCII character range 0..31 gets converted to the Unicode range U+2400 + ord(char): 0==U+2400 ... 31==U+241F.

    in most unix variants only the slash and the ASCII 0 characters are not allowed in file names.

    MS Windows does not allow: ASCII 0..31 / | \\ : * ? ” % < > ( ). some blogs also recommend not allowing
    (converting) the characters `#` and `'`.

    only old POSIX seems to be even more restricted (only allowing alphanumeric characters plus . - and _).

    more on allowed characters in file names in the answers of RedGrittyBrick on
    https://superuser.com/questions/358855 and of Christopher Oezbek on
    https://stackoverflow.com/questions/1976007.

    file name length is not restricted/shortened by this function, although the maximum is 255 characters on most OSs.

    .. hint:: use the :func:`dedefuse` function to convert the defused string back to the corresponding URI/file-path.
    """
    return value.replace(URI_SEP_STR, URI_SEP_UNICODE_CHAR).translate(ASCII_TO_UNICODE)     # replace makes URIs shorter


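A reduced sketch of the defuse/dedefuse mechanism. It assumes a three-entry stand-in for the much larger `ASCII_UNICODE` table of the module and writes the replacement characters as escapes; the two function bodies are copied from the listing, and the URL is made up.

```python
# illustrative subset only (assumption): the module's ASCII_UNICODE table has many more entries
ASCII_UNICODE = (
    ('/', '\u2044'),    # Fraction Slash
    (':', '\uFE55'),    # Small Colon
    ('?', '\uFE56'),    # Small Question Mark
)
URI_SEP_STR = '://'
URI_SEP_UNICODE_CHAR = '\u2AFB'     # Triple Solidus Binary Relation

ASCII_TO_UNICODE = str.maketrans(dict(ASCII_UNICODE))
UNICODE_TO_ASCII = str.maketrans(
    {uni_char: asc_char for asc_char, uni_char in ASCII_UNICODE + ((URI_SEP_STR, URI_SEP_UNICODE_CHAR), )})


def defuse(value: str) -> str:
    # '://' is replaced by a single char first, then each remaining special char gets translated
    return value.replace(URI_SEP_STR, URI_SEP_UNICODE_CHAR).translate(ASCII_TO_UNICODE)


def dedefuse(value: str) -> str:
    # str.translate() may map one char to several, so the URI separator char expands back to '://'
    return value.translate(UNICODE_TO_ASCII)


url = "https://example.com/docs/page?x:y"
slug = defuse(url)
restored = dedefuse(slug)
print(restored == url)  # True
```

The round trip only restores the input if the original string contains none of the replacement characters itself.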
def dummy_function(*_args, **_kwargs):
    """ null function accepting any arguments and returning None.

    :param _args:           ignored positional arguments.
    :param _kwargs:         ignored keyword arguments.
    :return:                always None.
    """


def duplicates(values: Iterable) -> list:
    """ determine all duplicates in the iterable specified in the :paramref:`~duplicates.values` argument.

    inspired by Ritesh Kumar's answer to https://stackoverflow.com/questions/9835762.

    :param values:          iterable (list, tuple, str, ...) to search for duplicate items.
    :return:                list of the duplicate items found (can contain the same duplicate multiple times).
    """
    seen_set: set = set()
    seen_add = seen_set.add
    dup_list: list = []
    dup_add = dup_list.append
    for item in values:
        if item in seen_set:
            dup_add(item)
        else:
            seen_add(item)
    return dup_list


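A short sketch of what "can contain the same duplicate multiple times" means in practice. The function body is condensed from the listing (the local method aliases are dropped); the input list is sample data.

```python
from typing import Iterable


def duplicates(values: Iterable) -> list:
    # condensed copy of the listing: every repeated occurrence after the first one is reported
    seen_set: set = set()
    dup_list: list = []
    for item in values:
        if item in seen_set:
            dup_list.append(item)
        else:
            seen_set.add(item)
    return dup_list


found = duplicates([1, 2, 1, 3, 1, 2])
print(found)                # [1, 1, 2]
print(sorted(set(found)))   # [1, 2]
```

Because a set tracks the seen items, the items have to be hashable.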
def env_str(name: str, convert_name: bool = False) -> str | None:
    """ determine the string value of an OS environment variable, optionally preventing invalid variable name.

    :param name:            name of an OS environment variable.
    :param convert_name:    pass True to prevent invalid variable names by converting CamelCase names into SNAKE_CASE,
                            lower-case into upper-case and all non-alpha-numeric characters into underscore characters.
    :return:                string value of OS environment variable if found, else None.
    """
    if convert_name:
        name = norm_name(camel_to_snake(name)).upper()
    return os.environ.get(name)


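The name conversion chains three steps: `camel_to_snake`, `norm_name`, and `upper()`. The sketch below reproduces the three functions from the listing in condensed form; the variable name `MY_APP_KEY` and its value are made up for the demonstration.

```python
import os

NAME_PARTS_SEP = '_'


def camel_to_snake(name: str) -> str:
    # condensed from the listing: prefix every upper-case char with an underscore
    return "".join(NAME_PARTS_SEP + char if char.isupper() else char for char in name)


def norm_name(name: str, allow_num_prefix: bool = False) -> str:
    # copied from the listing: replace each char that is not allowed with an underscore
    str_parts: list = []
    for char in name:
        if char.isalpha() or char.isalnum() and (allow_num_prefix or str_parts):
            str_parts.append(char)
        else:
            str_parts.append('_')
    return "".join(str_parts)


def env_str(name: str, convert_name: bool = False):
    if convert_name:
        name = norm_name(camel_to_snake(name)).upper()
    return os.environ.get(name)


os.environ["MY_APP_KEY"] = "secret-value"           # hypothetical variable for this demo

from_camel = env_str("myAppKey", convert_name=True)     # looked up as MY_APP_KEY
from_dotted = env_str("my-app.key", convert_name=True)  # looked up as MY_APP_KEY
not_converted = env_str("myAppKey")                     # looked up unchanged
print(from_camel, from_dotted, not_converted)
```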
def evaluate_literal(literal_string: str
                     ) -> bool | bytes | dict | complex | float | int | list | set | str | tuple | None:
    """ evaluates a Python literal while accepting unquoted strings as str type.

    :param literal_string:  any literal of the base types (like dict, list, set, tuple) which are recognized by
                            :func:`ast.literal_eval`.
    :return:                an instance of the data type, or the specified string unchanged if it cannot be evaluated
                            (e.g., because it is a string without enclosing quote characters).
                            `None` will be returned if the specified literal is the string "None".
    """
    try:
        return literal_eval(literal_string)
    except (IndentationError, SyntaxError, TypeError, ValueError):
        return literal_string


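A few sample calls make the fallback behaviour concrete. The function body is copied from the listing; the inputs are illustrative.

```python
from ast import literal_eval


def evaluate_literal(literal_string: str):
    # copied from the listing: fall back to the unchanged string if it is not a valid literal
    try:
        return literal_eval(literal_string)
    except (IndentationError, SyntaxError, TypeError, ValueError):
        return literal_string


as_list = evaluate_literal("[1, 2.5, 'x']")     # a list literal gets evaluated
as_none = evaluate_literal("None")              # the string "None" becomes None
quoted = evaluate_literal("'quoted text'")      # quotes get stripped by the evaluation
unquoted = evaluate_literal("plain text")       # not a literal: returned unchanged

print(as_list, as_none, quoted, unquoted)
```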
def extend_file(file_path: str, content: str, encoding: str | None = None, make_dirs: bool = False):
    """ create/extend the text file specified by :paramref:`~extend_file.file_path` with the specified content string.

    :param file_path:       file path/name to append the passed content to. the file gets created if it does not exist;
                            the content of an existing file is kept.
    :param content:         string content to append to the file.
    :param encoding:        encoding used to convert/interpret the string content to write.
    :param make_dirs:       pass True to create not existing parent folders of the specified file path.
    :raises IsADirectoryError:  file_path points to a directory instead of a file.
    :raises LookupError:        unknown encoding name.
    :raises NotADirectoryError: part of the path expected to be a directory is actually a file.
    :raises OSError:            disk full, filename too long, too many open files, network or device disconnected,
                                file_path is misspelled or contains invalid characters.
    :raises PermissionError:    if the current OS user account lacks permissions to write the file content.
    :raises TypeError:          content is not of type `str`.
    :raises UnicodeEncodeError: content cannot be encoded using the selected encoding.
    :raises ValueError:         other encoding errors, invalid mode or incompatible arguments.
    """
    if make_dirs and (dir_path := os_path_dirname(file_path)):
        os.makedirs(dir_path, exist_ok=True)

    with open(file_path, mode='a', encoding=encoding) as file_handle:
        file_handle.write(content)


def force_encoding(text: str | bytes, encoding: str = DEF_ENCODING, errors: str = DEF_ENCODE_ERRORS) -> str:
    """ force/ensure the encoding of text (str or bytes) without any UnicodeDecodeError/UnicodeEncodeError.

    :param text:            text as str/bytes.
    :param encoding:        encoding (def= :data:`DEF_ENCODING`).
    :param errors:          encode error handling (def= :data:`DEF_ENCODE_ERRORS`).
    :return:                text as str (with all characters checked/converted/replaced to be encode-able).
    """
    enc_str: bytes = text.encode(encoding=encoding, errors=errors) if isinstance(text, str) else text
    return enc_str.decode(encoding=encoding)


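With the defaults listed in the constants section ('ascii' and 'backslashreplace'), non-ASCII characters turn into backslash escape sequences. The sketch below copies the function body and the two default constants from the listing; the sample text is made up.

```python
DEF_ENCODE_ERRORS = 'backslashreplace'
DEF_ENCODING = 'ascii'


def force_encoding(text, encoding: str = DEF_ENCODING, errors: str = DEF_ENCODE_ERRORS) -> str:
    # copied from the listing: encode with a lenient error handler, then decode again
    enc_str: bytes = text.encode(encoding=encoding, errors=errors) if isinstance(text, str) else text
    return enc_str.decode(encoding=encoding)


forced = force_encoding("Grüße €")
from_bytes = force_encoding(b"plain bytes")

print(forced)       # Gr\xfc\xdfe \u20ac
print(from_bytes)   # plain bytes
```

Bytes are passed straight to `decode()`, so the lenient error handling only covers `str` input.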
class UnformattedValue:                                                             # pylint: disable=too-few-public-methods
    """ helper class for :func:`~ae.base.format_given` to keep placeholder with format unchanged if not found. """
    def __init__(self, key: int | str):
        self.key = key

    def __format__(self, format_spec: str):
        """ overriding Python object class method to return placeholder unchanged, including the curly brackets. """
        # pylint: disable=consider-using-f-string
        return "{{{}{}}}".format(self.key, ":" + format_spec if format_spec else "")


class GivenFormatter(string.Formatter):
    """ helper class for :func:`~ae.base.format_given` to keep placeholder with format unchanged if not found. """
    def get_value(self, key, args, kwargs):
        """ overriding to keep placeholder unchanged if not found """
        try:
            return super().get_value(key, args, kwargs)
        except KeyError:
            return UnformattedValue(key)


def format_given(text: str, placeholder_map: dict[str, Any], strict: bool = False):
    """ replacement for Python's str.format_map(), keeping intact placeholders that are not in the specified mapping.

    :param text:            text/template in which the given/specified placeholders will get replaced. in contrast to
                            :func:`str.format_map`, no KeyError will be raised for placeholders not specified in
                            :paramref:`~format_given.placeholder_map`.
    :param placeholder_map: dict with placeholder keys to be replaced in :paramref:`~format_given.text` argument.
    :param strict:          pass True to raise an error for text templates containing unpaired curly brackets.
    :return:                the specified :paramref:`~format_given.text` with only the placeholders specified in
                            :paramref:`~format_given.placeholder_map` replaced with their respective map value.
                            if formatting fails and :paramref:`~format_given.strict` is False, then the text gets
                            returned unchanged.

    inspired by the answer of CodeManX in `https://stackoverflow.com/questions/3536303`__
    """
    formatter = GivenFormatter()
    try:
        return formatter.vformat(text, (), placeholder_map)
    except Exception:                                       # pylint: disable=broad-except
        if strict:
            raise
        return text


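The interplay of the two helper classes is visible with one known and one unknown placeholder. The sketch reproduces the three definitions from the listing (with the exception clause simplified to `except Exception`); the template strings are made up.

```python
import string
from typing import Any


class UnformattedValue:
    def __init__(self, key):
        self.key = key

    def __format__(self, format_spec: str):
        # rebuild the placeholder text, including its format spec if there is one
        return "{{{}{}}}".format(self.key, ":" + format_spec if format_spec else "")


class GivenFormatter(string.Formatter):
    def get_value(self, key, args, kwargs):
        try:
            return super().get_value(key, args, kwargs)
        except KeyError:
            return UnformattedValue(key)    # unknown placeholder: keep it for later


def format_given(text: str, placeholder_map: dict[str, Any], strict: bool = False):
    try:
        return GivenFormatter().vformat(text, (), placeholder_map)
    except Exception:
        if strict:
            raise
        return text


partly = format_given("{greeting}, {name:>8}!", {"greeting": "hello"})
broken = format_given("unpaired { bracket", {"greeting": "hello"})

print(partly)   # hello, {name:>8}!
print(broken)   # unpaired { bracket
```

A second pass over `partly` with a map containing `name` would fill in the remaining placeholder, which allows templates to be resolved in stages.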
@contextmanager
def in_wd(new_cwd: str) -> Generator[None, None, None]:
    """ context manager to temporarily switch the current working directory / cwd.

    :param new_cwd:         path to the directory to switch to (within the context/with block). an empty string gets
                            interpreted as the current working directory.

    the following example demonstrates a typical usage, together with a temporary path, created with the help of
    Python's :class:`~tempfile.TemporaryDirectory` class::

        with tempfile.TemporaryDirectory() as tmp_dir, in_wd(tmp_dir):
            # within the context the tmp_dir is set as the current working directory
            assert os.getcwd() == tmp_dir
        # here the current working directory got set back to the original path and the temporary directory got removed

    """
    cur_dir = os.getcwd()
    try:
        if new_cwd:     # empty new_cwd results in the current working folder (no dir change needed/prevent error)
            os.chdir(new_cwd)
        yield
    finally:
        os.chdir(cur_dir)


def mask_secrets(data: dict | Iterable, fragments: Iterable[str] = ('password', 'pwd')) -> dict | Iterable:
    """ partially-hide secret string values like passwords/credit-card-numbers in deeply nestable data structures.

    :param data:            iterable deep data structure wherein its item values get masked if their related dict item
                            key contains one of the fragments specified in :paramref:`~mask_secrets.fragments`.
    :param fragments:       dict key string fragments of which the related value will be masked. each fragment has to be
                            specified with lower case chars! defaults to ('password', 'pwd') if not passed.
    :return:                specified data structure with the secrets masked (¡in-place!).
    """
    is_dict = isinstance(data, dict)
    for idx, val in tuple(data.items()) if is_dict else enumerate(data):    # type: ignore # silly mypy not sees is_dict
        val_is_str = isinstance(val, str)
        if not val_is_str and isinstance(val, Iterable):
            mask_secrets(val, fragments=fragments)
        elif is_dict and val_is_str and isinstance(idx, str):
            idx_lower = idx.lower()
            if any(_frag in idx_lower for _frag in fragments):
                data[idx] = val[:3] + "*" * 9                               # type: ignore # silly mypy not sees is_dict

    return data


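A small nested structure shows which values get masked and how. The function body is copied from the listing, except that `Iterable` is imported from `collections.abc` here; the `settings` data is invented for the demonstration.

```python
from collections.abc import Iterable


def mask_secrets(data, fragments=('password', 'pwd')):
    is_dict = isinstance(data, dict)
    for idx, val in tuple(data.items()) if is_dict else enumerate(data):
        val_is_str = isinstance(val, str)
        if not val_is_str and isinstance(val, Iterable):
            mask_secrets(val, fragments=fragments)      # descend into nested dicts/lists/tuples
        elif is_dict and val_is_str and isinstance(idx, str):
            idx_lower = idx.lower()
            if any(_frag in idx_lower for _frag in fragments):
                data[idx] = val[:3] + "*" * 9           # keep 3 chars, append 9 asterisks

    return data


settings = {
    "user": "bob",
    "Password": "s3cr3t-value",
    "servers": [{"host": "db1", "db_pwd": "abcdef"}],
}
masked = mask_secrets(settings)

print(masked["Password"])               # s3c*********
print(masked["servers"][0]["db_pwd"])   # abc*********
print(masked is settings)               # True
```

Two properties follow from the code: the structure is changed in place, and the first three characters of each secret stay visible, so the result is only partially hidden.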
def mask_url(url: str, replacement: str = "¿¿¿") -> str:
    """ hide|replace the password/token in a URL.

    :param url:             URL in which an optional password|token will be searched and replaced.
    :param replacement:     optional replacement string, if not specified then the default value will be used.
    :return:                URL with the credentials masked/replaced.
    """
    parts = urlparse(url)
    if parts.password is None:
        return url

    # manually split out the netloc, because parts.hostname and parts.port would have to be checked for None,
    # and parts.hostname is converted to lower-case
    parts = parts._replace(netloc=f"{parts.username}:{replacement}@{parts.netloc.rpartition('@')[-1]}")

    # noinspection PyTypeChecker
    return urlunparse(parts)


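A stand-alone sketch of the masking. The function body is copied from the listing; the URLs and credentials are made up.

```python
from urllib.parse import urlparse, urlunparse


def mask_url(url: str, replacement: str = "¿¿¿") -> str:
    parts = urlparse(url)
    if parts.password is None:
        return url      # nothing to hide

    # keep everything after the last '@' of the netloc (host and optional port) as it is
    parts = parts._replace(netloc=f"{parts.username}:{replacement}@{parts.netloc.rpartition('@')[-1]}")
    return urlunparse(parts)


with_secret = mask_url("https://deploy:t0k3n@Example.com:8443/repo.git?ref=main")
custom = mask_url("https://deploy:t0k3n@example.com/repo.git", replacement="***")
without_secret = mask_url("https://example.com/repo.git")

print(with_secret)      # https://deploy:¿¿¿@Example.com:8443/repo.git?ref=main
print(custom)           # https://deploy:***@example.com/repo.git
print(without_secret)   # https://example.com/repo.git
```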
def norm_line_sep(text: str) -> str:
    # noinspection GrazieInspection
    """ convert any combination of line separators in the :paramref:`~norm_line_sep.text` arg to new-line characters.

    :param text:            string containing any combination of line separators ('\\\\r\\\\n' or '\\\\r').
    :return:                normalized/converted string with only new-line ('\\\\n') line separator characters.
    """
    return text.replace('\r\n', '\n').replace('\r', '\n')


def norm_name(name: str, allow_num_prefix: bool = False) -> str:
    """ normalize name to start with a letter/alphabetic/underscore and to contain only alphanumeric/underscore chars.

    :param name:            any string to be converted into a valid variable/method/file/... name.
    :param allow_num_prefix: pass True to allow leading digits in the returned normalized name.
    :return:                cleaned/normalized/converted name string (e.g., for a variable-/method-/file-name).
    """
    str_parts: list[str] = []
    for char in name:
        if char.isalpha() or char.isalnum() and (allow_num_prefix or str_parts):
            str_parts.append(char)
        else:
            str_parts.append('_')
    return "".join(str_parts)


def norm_path(path: str, make_absolute: bool = True, remove_base_path: str = "", remove_dots: bool = True,
              resolve_sym_links: bool = True) -> str:
    """ normalize a path, replacing `..`/`.` parts or the tilde character (home folder) and transform to relative/abs.

    :param path:            path string to normalize/transform.
    :param make_absolute:   pass False to not convert the returned path to an absolute path.
    :param remove_base_path: pass a valid base path to return a relative path, even if the argument values of
                            :paramref:`~norm_path.make_absolute` or :paramref:`~norm_path.resolve_sym_links` are `True`.
    :param remove_dots:     pass False to not replace/remove the `.` and `..` placeholders.
    :param resolve_sym_links: pass False to not resolve symbolic links, passing True implies a `True` value also for
                            the :paramref:`~norm_path.make_absolute` argument.
    :return:                normalized path string: absolute if :paramref:`~norm_path.remove_base_path` is empty and
                            either :paramref:`~norm_path.make_absolute` or :paramref:`~norm_path.resolve_sym_links`
                            is `True`; relative if :paramref:`~norm_path.remove_base_path` is a base path of
                            :paramref:`~norm_path.path` or if :paramref:`~norm_path.path` got specified as a relative
                            path and neither :paramref:`~norm_path.make_absolute` nor
                            :paramref:`~norm_path.resolve_sym_links` is `True`.

    .. hint:: the :func:`~ae.paths.normalize` function additionally replaces :data:`~ae.paths.PATH_PLACEHOLDERS`.
    """
    path = path or "."

    if path[0] == "~":
        path = os_path_expanduser(path)

    if remove_dots:
        path = os_path_normpath(path)

    if resolve_sym_links:
        path = os_path_realpath(path)
    elif make_absolute:
        path = os_path_abspath(path)

    if remove_base_path:
        if remove_base_path[0] == "~":
            remove_base_path = os_path_expanduser(remove_base_path)
        path = os_path_relpath(path, remove_base_path)

    return path


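How the flags combine can be checked with paths that do not need to exist. The sketch condenses the function from the listing and uses `os.path` directly instead of the `os_path_*` shortcuts; the `docs/...` paths are invented, and symbolic link resolution is switched off to keep the results independent of the local file system layout.

```python
import os


def norm_path(path: str, make_absolute: bool = True, remove_base_path: str = "", remove_dots: bool = True,
              resolve_sym_links: bool = True) -> str:
    path = path or "."
    if path[0] == "~":
        path = os.path.expanduser(path)
    if remove_dots:
        path = os.path.normpath(path)
    if resolve_sym_links:
        path = os.path.realpath(path)
    elif make_absolute:
        path = os.path.abspath(path)
    if remove_base_path:
        if remove_base_path[0] == "~":
            remove_base_path = os.path.expanduser(remove_base_path)
        path = os.path.relpath(path, remove_base_path)
    return path


dotted = os.path.join("docs", ".", "api", "..", "index.rst")

# only remove the dot parts, keep the path relative
relative = norm_path(dotted, make_absolute=False, resolve_sym_links=False)

# default make_absolute=True: the result is anchored at the current working directory
absolute = norm_path(dotted, resolve_sym_links=False)

# remove_base_path turns the absolute path back into a relative one
rebased = norm_path(absolute, remove_base_path=os.getcwd(), resolve_sym_links=False)

print(relative, absolute, rebased, sep="\n")
```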
def now_str(sep: str = "") -> str:
    """ return the current UTC timestamp as string (to use as suffix for file and variable/attribute names).

    :param sep:             optional prefix and separator character (separating date from time and in time part
                            the seconds from the microseconds).
    :return:                naive UTC timestamp (without timezone info) as string (length=20 + 3 * len(sep)).
    """
    return utc_datetime().strftime(NOW_STR_FORMAT.format(sep=sep))


def on_ci_host() -> bool:
    """ check and return True if it is running on the GitLab/GitHub CI host/server.

    :return:                True if running on CI host, else False.

    .. note::
        env vars always available: 'CI' on GitHub (Pre-pipeline); 'CI_PROJECT_ID' (internal ProjectId) on GitLab
    """
    return 'CI' in os.environ or 'CI_PROJECT_ID' in os.environ


os_path_abspath = os.path.abspath
os_path_basename = os.path.basename
os_path_dirname = os.path.dirname
os_path_expanduser = os.path.expanduser
os_path_isdir = os.path.isdir
os_path_isfile = os.path.isfile
os_path_join = os.path.join
os_path_normpath = os.path.normpath
os_path_realpath = os.path.realpath
os_path_relpath = os.path.relpath
os_path_sep = os.path.sep       # pylint: disable=invalid-name
os_path_splitext = os.path.splitext


def pep8_format(value: Any, indent_level: int = 0):
    """ PEP-8-conform representation code string of deep dict/list structures, superseding :func:`pprint.pformat`.

    :param value:           value to format PEP-8-conform (hanging indent always with 4 spaces).
    :param indent_level:    level of indentation. pass e.g. 1 to indent the output with 4 spaces.
    :return:                representation string of the specified value.
    """
    spaces = " " * 4    # PEP-8: 4 spaces
    indent_spaces = spaces * indent_level
    parts = []

    if value and isinstance(value, dict):
        parts.append("{")
        for key, val in value.items():
            formatted = pep8_format(val, indent_level=indent_level + 1)
            parts.append(f"{indent_spaces}{spaces}{repr(key)}: {formatted},")
        parts.append(indent_spaces + "}")
    elif value and isinstance(value, list):
        parts.append("[")
        for item in value:
            formatted = pep8_format(item, indent_level + 1)
            parts.append(f"{indent_spaces}{spaces}{formatted},")
        parts.append(indent_spaces + "]")
    else:
        parts.append(repr(value))

    return os.linesep.join(parts)


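The shape of the generated text is easiest to judge from a small nested value. The function body is copied from the listing; the `project` dict is sample data.

```python
import os
from typing import Any


def pep8_format(value: Any, indent_level: int = 0):
    spaces = " " * 4
    indent_spaces = spaces * indent_level
    parts = []

    if value and isinstance(value, dict):
        parts.append("{")
        for key, val in value.items():
            formatted = pep8_format(val, indent_level=indent_level + 1)
            parts.append(f"{indent_spaces}{spaces}{repr(key)}: {formatted},")
        parts.append(indent_spaces + "}")
    elif value and isinstance(value, list):
        parts.append("[")
        for item in value:
            formatted = pep8_format(item, indent_level + 1)
            parts.append(f"{indent_spaces}{spaces}{formatted},")
        parts.append(indent_spaces + "]")
    else:
        parts.append(repr(value))   # scalars and empty containers fall back to repr()

    return os.linesep.join(parts)


project = {"name": "demo", "tags": ["a", "b"], "extras": {}}
formatted_text = pep8_format(project)
print(formatted_text)
# {
#     'name': 'demo',
#     'tags': [
#         'a',
#         'b',
#     ],
#     'extras': {},
# }
```

The lines are joined with `os.linesep`, so on Windows the returned string contains CRLF line ends.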
def read_bin_file(file_path: str) -> bytes:
    """ return the binary content of the file specified by the :paramref:`~read_bin_file.file_path` argument.

    :param file_path:       path/name of the file to load the content from.
    :return:                file content as a bytes object.
    :raises FileNotFoundError:  if the file to read from does not exist.
    :raises OSError:            if :paramref:`~read_bin_file.file_path` is misspelled or contains invalid characters.
    :raises PermissionError:    if the current OS user account lacks permissions to read the file content.
    """
    with open(file_path, "rb") as file_handle:
        return file_handle.read()


def read_file(file_path: str, encoding: str | None = None, error_handling: str | None = 'ignore') -> str:
    """ return the string content of the text file specified by the :paramref:`~read_file.file_path` argument.

    :param file_path:       path/name of the file to load the content from.
    :param encoding:        encoding used to load and convert/interpret the file content (passed onto the `encoding`
                            parameter of the built-in `open` function).
    :param error_handling:  pass `'strict'` or ``None`` to raise a `ValueError` exception on encoding errors.
                            the default value `'ignore'` will ignore any decoding errors (resulting in missing
                            characters in the return value). passed onto the `errors` parameter of the built-in
                            `open` function.
    :return:                the content of the file as a string.
    :raises FileNotFoundError:  if the file to read from does not exist.
    :raises IsADirectoryError:  file_path points to a directory instead of a file.
    :raises LookupError:        unknown encoding name.
    :raises NotADirectoryError: part of the path expected to be a directory is actually a file.
    :raises OSError:            filename too long, too many open files, device/network error, file_path misspelled
                                or contains invalid characters.
    :raises PermissionError:    if the current OS user account lacks permissions to read the file content.
    :raises UnicodeDecodeError: file content cannot be decoded with the specified encoding or error_handling.
    :raises ValueError:         invalid error_handling argument.
    """
    with open(file_path, "r", encoding=encoding, errors=error_handling) as file_handle:
        return file_handle.read()


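Since `extend_file` opens the file in append mode, repeated calls accumulate content, which `read_file` then returns in one piece. The sketch reproduces both functions from the listing (using `os.path.dirname` directly); the file name `logs/run.txt` is made up, and a temporary directory keeps the demo free of side effects.

```python
import os
import tempfile


def extend_file(file_path: str, content: str, encoding=None, make_dirs: bool = False):
    if make_dirs and (dir_path := os.path.dirname(file_path)):
        os.makedirs(dir_path, exist_ok=True)
    with open(file_path, mode='a', encoding=encoding) as file_handle:   # 'a' appends, creates if missing
        file_handle.write(content)


def read_file(file_path: str, encoding=None, error_handling='ignore') -> str:
    with open(file_path, "r", encoding=encoding, errors=error_handling) as file_handle:
        return file_handle.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "logs", "run.txt")     # the 'logs' folder does not exist yet

    extend_file(log_path, "first line\n", make_dirs=True)   # creates the folder and the file
    extend_file(log_path, "second line\n")                  # appends to the existing file
    log_content = read_file(log_path)

print(log_content)
```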
def round_traditional(num_value: float, num_digits: int = 0) -> float:
    """ round a numeric value the traditional way.

    needed because the built-in :func:`round` works differently, e.g., round(0.075, 2) == 0.07 instead of 0.08.
    inspired by https://stackoverflow.com/questions/31818050/python-2-7-round-number-to-nearest-integer.

    :param num_value:           float value to be rounded.
    :param num_digits:          number of decimal digits to round to (default=0, rounds to an integer value).
    :return:                    rounded value.
    """
    return round(num_value + 10 ** (-len(str(num_value)) - 1), num_digits)

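The difference to the built-in `round` can be checked with a few values. The snippet below uses a local copy of the one-line formula from the listing, so it runs without the `ae.base` package; the sample values are chosen only for illustration.

```python
def round_traditional(num_value: float, num_digits: int = 0) -> float:
    # local copy of the formula from the listing: add a tiny positive offset before rounding
    return round(num_value + 10 ** (-len(str(num_value)) - 1), num_digits)


builtin_result = round(0.075, 2)                    # 0.075 is stored slightly below 0.075
traditional_result = round_traditional(0.075, 2)    # the offset lifts it above the tie

builtin_tie = round(2.5)                            # built-in rounds ties to the even neighbour
traditional_tie = round_traditional(2.5)

negative_tie = round_traditional(-2.5)              # the offset is always positive

print(builtin_result, traditional_result, builtin_tie, traditional_tie, negative_tie)
```

Because the added offset is always positive, a tie on a negative value such as -2.5 ends up at -2.0 with this formula, which may or may not match what a caller expects from "traditional" rounding.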
def sign(number: float) -> int:
    """ return the sign (-1, 0, 1) of a number.

    :param number:              any number of type float or int.
    :return:                    -1 if the number is negative, 0 if it is zero, or 1 if it is positive.
    """
    return (number > 0) - (number < 0)

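The expression relies on Python booleans behaving like the integers 1 and 0 in arithmetic. A minimal check, using a local copy of the function and arbitrary sample values:

```python
def sign(number: float) -> int:
    # True/False count as 1/0, so the difference is -1, 0 or 1
    return (number > 0) - (number < 0)


signs = [sign(value) for value in (-3.5, 0, 7)]
print(signs)
```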
def snake_to_camel(name: str, back_convertible: bool = False) -> str:
    """ convert name from snake_case to CamelCase.

    :param name:                name string composed of parts separated by an underscore character
                                (:data:`NAME_PARTS_SEP`).
    :param back_convertible:    pass `True` to get the first character of the returned name in lower-case
                                if the snake name has no leading underscore character (and to allow
                                the conversion between snake and camel case without information loss).
    :return:                    name in camel case.
    """
    ret = "".join(part.capitalize() for part in name.split(NAME_PARTS_SEP))
    if back_convertible and name and name[0] != NAME_PARTS_SEP:     # guard: an empty name has no first character
        ret = ret[0].lower() + ret[1:]
    return ret

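The role of `back_convertible` is easiest to see on a few names. The sketch below is a local re-implementation of the conversion logic with `NAME_PARTS_SEP` assumed to be the underscore character, as the docstring states; the sample names are made up.

```python
NAME_PARTS_SEP = '_'    # assumption: matches the constant referenced in the docstring


def snake_to_camel(name: str, back_convertible: bool = False) -> str:
    # capitalize every underscore-separated part and join them
    ret = "".join(part.capitalize() for part in name.split(NAME_PARTS_SEP))
    if back_convertible and name and name[0] != NAME_PARTS_SEP:
        ret = ret[0].lower() + ret[1:]
    return ret


upper_camel = snake_to_camel("my_var_name")
lower_camel = snake_to_camel("my_var_name", back_convertible=True)
leading_sep = snake_to_camel("_private_name", back_convertible=True)
acronym = snake_to_camel("my_HTTP_server")          # str.capitalize() lower-cases the rest of each part

print(upper_camel, lower_camel, leading_sep, acronym)
```

With `back_convertible=True` the case of the first character encodes whether the snake name had a leading underscore, which is what allows a reverse conversion to restore it.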
def to_ascii(unicode_str: str) -> str:
    """ converts Unicode string into ascii representation.

    useful for fuzzy string compare; inspired by MiniQuark's answer in:
    https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string

    :param unicode_str:         string to convert.
    :return:                    converted string (replaced accents, diacritics, ... into normal ascii characters).
    """
    nfkd_form = unicodedata.normalize('NFKD', unicode_str)
    return "".join([c for c in nfkd_form if not unicodedata.combining(c)]).replace('ß', "ss").replace('€', "Euro")

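The NFKD approach splits an accented character into its base letter plus combining marks, which are then dropped. A small sketch with a local copy of the function and made-up sample strings:

```python
import unicodedata


def to_ascii(unicode_str: str) -> str:
    # decompose characters, drop the combining marks, then apply the two explicit replacements
    nfkd_form = unicodedata.normalize('NFKD', unicode_str)
    return "".join([c for c in nfkd_form if not unicodedata.combining(c)]).replace('ß', "ss").replace('€', "Euro")


dessert = to_ascii("Crème Brûlée")
street = to_ascii("Straße")
price = to_ascii("10 €")

print(dessert, street, price)
```

Characters that have no Unicode decomposition and are not covered by the explicit replacements pass through unchanged, so the result is a best-effort approximation rather than guaranteed 7-bit ASCII.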
# pylint: disable-next=too-many-arguments,too-many-positional-arguments,too-many-return-statements
def url_failure(url: str, token: str = "", username: str = "", password: str = "",
                git_repo: bool = False, timeout: float | None = None) -> str:
    """ determine if and why an FTP or HTTP[S] target is not available via a GET request.

    :param url:                 URL of a target|page|file to check (not downloaded, fetching only the header).
    :param token:               optional bearer token to authenticate (only for HTTPS protocol).
    :param username:            optional username to authenticate (for HTTPS, together with the password argument).
    :param password:            optional password to authenticate (for HTTPS, together with the username argument).
    :param git_repo:            optimized check for Git repository HTTP servers/sites (like GitHub, GitLab,
                                Bitbucket, Gitea, SourceHut, Mercury, etc. as long as they implement Smart HTTP).
                                if specified then the :paramref:`~url_failure.url` has to point to a repository.
    :param timeout:             connection timeout in seconds (see :func:`urllib.request.urlopen`).
    :return:                    empty string if target header is available, else an error description.
                                if an FTP|HTTP response error occurred then the error/status code
                                will be returned in the first 3 characters.

    .. note::
        credentials for server authentication can be specified either (1) embedded into the specified url argument,
        (2) as bearer token in the token argument or (3) via the username/password arguments. in all cases
        the function will remove these secrets from the returned error description string.
    """
    if git_repo:
        if not url.endswith(".git"):
            url += ".git"
        url += "/info/refs?service=git-upload-pack"

    headers = {}
    if token:
        assert not username and not password, "url_failure accepts either a token or username/password, not both"
        headers['Authorization'] = "Bearer " + token
    elif username or password:
        creds = f"{username}:{password}".encode('utf-8')
        headers['Authorization'] = "Basic " + base64.b64encode(creds).decode('utf-8')

    # noinspection PyBroadException
    try:
        request = Request(url, method='GET', headers=headers)
        with urlopen(request, timeout=timeout) as response:     # open connection and only read the header
            status = response.getcode()                         # no need to call response.read()
            return "" if 200 <= status < 300 else f"{status} {mask_url(url)} {response.reason=}"

    except HTTPError as exception:
        return f"{exception.code} {mask_url(url)} raised HTTPError {exception.reason=}"

    except URLError as exception:
        err_msg = f" {mask_url(url)} raised {exception.errno=} {exception.reason=};"
        if isinstance(exception.reason, socket.gaierror):
            return '995' + f"{err_msg} could not resolve hostname"
        if isinstance(exception.reason, ssl.SSLCertVerificationError):
            return '996' + f"{err_msg} SSL certificate verification failed"
        if isinstance(exception.reason, socket.timeout):
            return '997' + f"{err_msg} connection timed out after {timeout} seconds"
        return '998' + f"{err_msg} could not reach the server"

    except socket.timeout as _exception:    # noqa: F841  # str(_exception) could contain password|token
        return '997' + f" {mask_url(url)} raised socket-timeout exception after {timeout} seconds"

    except Exception as _exception:         # noqa: F841 # pylint: disable=broad-exception-caught
        return '999' + f" {mask_url(url)} raised unexpected exception"     # str(_exception) COULD contain password

def utc_datetime() -> datetime.datetime:
    """ return the current UTC date and time as a timezone-naive datetime object (e.g., to build suffixes for
    file and variable/attribute names).

    :return:                    timezone-naive :class:`~datetime.datetime` object of the current UTC date and time.
    """
    return datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)

def write_bin_file(file_path: str, content: bytes, make_dirs: bool = False):
    """ (over)write the file specified by :paramref:`~write_bin_file.file_path` with the specified binary/bytes content.

    :param file_path:           file path/name to write the passed content into (overwriting any previous content!).
    :param content:             new file content specified as `bytes`.
    :param make_dirs:           pass True to automatically create missing folders of the file path.
    :raises FileExistsError:    if the file to write to exists already and is write-protected.
    :raises FileNotFoundError:  if parts of the file path do not exist.
    :raises IsADirectoryError:  file_path points to a directory instead of a file.
    :raises NotADirectoryError: part of the path expected to be a directory is actually a file.
    :raises OSError:            disk full, filename too long, too many open files, network or device disconnected,
                                file_path is misspelled or contains invalid characters.
    :raises PermissionError:    if the current OS user account lacks permissions to write the file content.
    :raises TypeError:          content is not of type `bytes`.
    """
    if make_dirs and (dir_path := os_path_dirname(file_path)):
        os.makedirs(dir_path, exist_ok=True)

    with open(file_path, mode='wb') as file_handle:
        file_handle.write(content)

def write_file(file_path: str, content: str, encoding: str | None = None, make_dirs: bool = False):
    """ (over)write the file specified by :paramref:`~write_file.file_path` with the specified string content.

    :param file_path:           file path/name to write the passed content into (overwriting any previous content!).
    :param content:             new file content passed as string.
    :param encoding:            encoding used to write/convert/interpret the file content to write (passed onto
                                the `encoding` parameter of the built-in `open` function; ``None`` selects
                                the platform-dependent default encoding).
    :param make_dirs:           pass True to automatically create missing folders of the file path
                                (specified in :paramref:`~write_file.file_path`).
    :raises FileExistsError:    if the file to write to exists already and is write-protected.
    :raises FileNotFoundError:  if parts of the file path do not exist.
    :raises IsADirectoryError:  file_path points to a directory instead of a file.
    :raises LookupError:        unknown encoding name.
    :raises NotADirectoryError: part of the path expected to be a directory is actually a file.
    :raises OSError:            disk full, filename too long, too many open files, network or device disconnected,
                                file_path is misspelled or contains invalid characters.
    :raises PermissionError:    if the current OS user account lacks permissions to write the file content.
    :raises TypeError:          content is not of type `str`.
    :raises UnicodeEncodeError: content cannot be encoded using the selected encoding.
    :raises ValueError:         other encoding errors, invalid mode or incompatible arguments.

    to extend this function for Android 14+, see
    `<https://github.com/beeware/toga/pull/1158#issuecomment-2254564657>`__ and
    `<https://gist.github.com/neonankiti/05922cf0a44108a2e2732671ed9ef386>`__

    Yes, to use ACTION_CREATE_DOCUMENT, you don't supply a URI in the intent. You wait for the intent result,
    and that will contain a URI which you can write to.
    See #1158 (comment - `<https://github.com/beeware/toga/pull/1158#issuecomment-2254564657>`__) for a link
    to a Java example, and
    #1158 (comment - `<https://github.com/beeware/toga/pull/1158#issuecomment-1446196973>`__) for how to wait
    for an intent result.

    Related German docs: `<https://developer.android.com/training/data-storage/shared/media?hl=de>`__
    """
    if make_dirs and (dir_path := os_path_dirname(file_path)):
        os.makedirs(dir_path, exist_ok=True)

    with open(file_path, mode='w', encoding=encoding) as file_handle:
        file_handle.write(content)

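How `make_dirs` changes the behavior for a path with missing parent folders can be sketched as follows. This is a local re-implementation for illustration only: it assumes that `os_path_dirname` in the listing is an alias of `os.path.dirname`, and the folder and file names are made up.

```python
import os
import tempfile


def write_file(file_path: str, content: str, encoding=None, make_dirs: bool = False):
    # assumption: os.path.dirname stands in for the os_path_dirname helper of the listing
    if make_dirs and (dir_path := os.path.dirname(file_path)):
        os.makedirs(dir_path, exist_ok=True)
    with open(file_path, mode='w', encoding=encoding) as file_handle:
        file_handle.write(content)


with tempfile.TemporaryDirectory() as tmp_dir:
    nested_path = os.path.join(tmp_dir, "not", "yet", "existing", "note.txt")

    # without make_dirs the missing parent folders make open() fail
    try:
        write_file(nested_path, "hello")
        missing_dir_error = None
    except FileNotFoundError as exception:
        missing_dir_error = exception

    # with make_dirs the folders are created first; an explicit encoding keeps the result portable
    write_file(nested_path, "grüße", encoding="utf-8", make_dirs=True)
    with open(nested_path, encoding="utf-8") as file_handle:
        round_trip = file_handle.read()

print(type(missing_dir_error).__name__, round_trip)
```

Passing the same explicit encoding when writing and reading avoids surprises on platforms whose default encoding differs.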
class UnsetType:
    """ (singleton) UNSET (type) object class. """

    def __bool__(self):
        """ ensure to be evaluated as False, like None. """
        return False

    def __len__(self):
        """ ensure to be evaluated as empty. """
        return 0

UNSET: Final = UnsetType() #: pseudo value used for attributes/arguments if ``None`` is needed as a valid value
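A typical use of such a sentinel is to tell "argument not passed" apart from an explicit ``None``. The sketch below re-declares the class locally so it runs stand-alone; the `describe` function and its messages are hypothetical and not part of the module.

```python
from typing import Any, Final


class UnsetType:
    """ local copy of the sentinel class from the listing. """
    def __bool__(self):
        return False

    def __len__(self):
        return 0


UNSET: Final = UnsetType()


def describe(value: Any = UNSET) -> str:
    """ hypothetical helper: distinguish a missing argument from an explicit None. """
    if value is UNSET:          # identity check, because UNSET is also falsy like None
        return "not passed"
    if value is None:
        return "explicit None"
    return f"value {value!r}"


results = [describe(), describe(None), describe(0)]
print(results, bool(UNSET), len(UNSET))
```

Since `UNSET` evaluates as false and empty, a plain truthiness test cannot separate it from ``None`` or ``0``; the identity comparison with `is` is what makes the distinction.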