ae.files

generic file object helpers

this namespace portion is pure Python code, providing helpers for managing file objects and their content. it only depends on the ae.base namespace portion.

Hint

more helper functions to manage directory/folder structures are provided by the ae.paths portion.

the helper function copy_bytes() provides recoverable copies of binary files and file streams, with progress callbacks for every copied chunk/buffer.

file_lines() and read_file_text() are helpers to read/load text file contents.

the function write_file_text() stores a string to a text file.

the helper function file_transfer_progress() puts the number of transferred bytes in a short and user-readable format, to be displayed as progress string in a file transfer.

RegisteredFile and CachedFile encapsulate and optionally cache the contents of a file within a file object. instances of these classes are compatible with the file objects provided by Python’s pathlib module. pure path strings can also be used as file objects (see also the FileObject type).

all these types of file objects are supported by the class FilesRegister from the ae.paths portion.

registered file

a registered file object represents a single file on your file system and can be instantiated from one of the classes RegisteredFile or CachedFile provided by this module/portion:

from ae.files import RegisteredFile

rf = RegisteredFile('path/to/the/file_name.extension')

assert str(rf) == 'path/to/the/file_name.extension'
assert rf.path == 'path/to/the/file_name.extension'
assert rf.stem == 'file_name'
assert rf.ext == '.extension'
assert rf.properties == {}

file properties are automatically attached to each file object instance and are available via the instance attribute properties. in the last example this is an empty dictionary because the path of this file object does not contain any folder names with an underscore character.

file properties

file property names and values are automatically determined from the names of their subfolders, specified in the path attribute. every subfolder name containing an underscore character in the format <property-name>_<value> will be interpreted as a file property:

rf = RegisteredFile('property1_69/property2_3.69/property3_whatever/file_name.ext')
assert rf.properties['property1'] == 69
assert rf.properties['property2'] == 3.69
assert rf.properties['property3'] == 'whatever'

the property types int, float and string are recognized and converted into a property value. boolean values can be specified as 1 and 0 integers.
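the conversion described above can be sketched with the standard library only. the helper name parse_properties is hypothetical, and splitting each folder name at its first underscore is an assumption; this is not the implementation used by RegisteredFile:

```python
import os
from typing import Dict, Union


def parse_properties(file_path: str) -> Dict[str, Union[int, float, str]]:
    """hypothetical helper: derive file properties from the folder names of a path."""
    properties: Dict[str, Union[int, float, str]] = {}
    folder_path = os.path.dirname(file_path.replace("\\", "/"))
    for folder in folder_path.split("/"):
        # assumption: the property name ends at the first underscore
        name, separator, literal = folder.partition("_")
        if not (name and separator and literal):
            continue                        # not a <property-name>_<value> folder
        value: Union[int, float, str] = literal
        for convert in (int, float):        # try int first, then float, else keep the string
            try:
                value = convert(literal)
                break
            except ValueError:
                pass
        properties[name] = value
    return properties


print(parse_properties('property1_69/property2_3.69/property3_whatever/file_name.ext'))
print(parse_properties('path/to/the/file_name.extension'))
```

the underscore in the file name itself (file_name.ext) is ignored in this sketch because only the folder part of the path is inspected.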

cached file

a cached file created from the CachedFile class behaves like a registered file and additionally provides the possibility to cache parts or the whole file content as well as the file pointer of the opened file:

from ae.files import CachedFile

cf = CachedFile('integer_69/float_3.69/string_whatever/file_name.ext')

assert str(cf) == 'integer_69/float_3.69/string_whatever/file_name.ext'
assert cf.path == 'integer_69/float_3.69/string_whatever/file_name.ext'
assert cf.stem == 'file_name'
assert cf.ext == '.ext'
assert cf.properties['integer'] == 69
assert cf.properties['float'] == 3.69
assert cf.properties['string'] == 'whatever'

on instantiation of a CachedFile file object, the default file object loader function _default_object_loader() is used, which opens a file stream via Python’s open() built-in. alternatively, you can specify a different file object loader with the object_loader parameter or by assigning a callable directly to the object_loader attribute:

# my_open_method is a placeholder for your own loader function
cf = CachedFile('integer_69/float_3.69/string_whatever/file_name.ext',
                object_loader=lambda cached_file_obj: my_open_method(cached_file_obj.path))

the loaded and cached object is accessible via the loaded_object attribute of the CachedFile instance:

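from io import TextIOWrapper

# with the default loader, the loaded object is the opened text file stream
assert isinstance(cf.loaded_object, TextIOWrapper)
cf.loaded_object.seek(...)
cf.loaded_object.read(...)

cf.loaded_object.close()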

Module Attributes

FileObject

file object type, e.g. a file path str or any class or callable where the returned instance/value is either a string or an object with a stem attribute (holding the file name w/o extension), like e.g. CachedFile, RegisteredFile, pathlib.Path or pathlib.PurePath.

PropertyType

types of file property values

PropertiesType

dict of file properties

FilenameOrStream

file name or file stream pointer

Functions

copy_bytes(src_file, dst_file, *[, ...])

recoverable copy of a file or stream (file-like object), optionally with progress callbacks.

file_lines(file_path[, encoding])

returning lines of the text file specified by file_path argument as tuple.

file_transfer_progress(transferred_bytes[, ...])

return string to display the transfer progress of transferred bytes in short and user-readable format.

read_file_text(file_path[, encoding, ...])

returning content of the text file specified by file_path argument as string, while suppressing exceptions.

write_file_text(text_or_lines, file_path[, ...])

write the passed text string or list of line strings into the text file specified by file_path argument.

Classes

CachedFile(file_path[, object_loader, ...])

represents a cacheable registered file object - see also cached file examples.

RegisteredFile(file_path, **kwargs)

represents a single file - see also registered file examples.

FileObject

file object type, e.g. a file path str or any class or callable where the returned instance/value is either a string or an object with a stem attribute (holding the file name w/o extension), like e.g. CachedFile, RegisteredFile, pathlib.Path or pathlib.PurePath.

alias of str | RegisteredFile | CachedFile | Path | PurePath | Any

PropertyType

types of file property values

alias of int | float | str

PropertiesType

dict of file properties

alias of Dict[str, int | float | str]

FilenameOrStream

file name or file stream pointer

alias of str | BinaryIO

copy_bytes(src_file, dst_file, *, transferred_bytes=0, total_bytes=0, buf_size=16384, overwrite=False, move_file=False, recoverable=False, errors=None, progress_func=<function dummy_function>, **progress_kwargs)

recoverable copy of a file or stream (file-like object), optionally with progress callbacks.

Parameters:
  • src_file (Union[str, BinaryIO]) – source file name or opened stream (file-like) object. if passing a non-seekable stream together with a non-zero value in transferred_bytes, then the source stream has to be set to the correct position before you call this function. if passing any source stream, then the total file/stream size also has to be passed into the total_bytes parameter. source streams also do not support a True value in the move_file argument.

  • dst_file (Union[str, BinaryIO]) – destination file name or opened stream (file-like) object. if a stream is passed, recoverable copies and copies with a True value in the overwrite argument are not allowed; always use a destination file name if you need a recoverable/overwriting copy.

  • transferred_bytes (int) – file offset at which the copy process starts. if not passed for recoverable copies, then copy_bytes will determine this value from the file length of the destination file.

  • total_bytes (int) – source file size in bytes (needed only if src_file is a stream).

  • buf_size (int) – size of copy buffer/chunk in bytes (that get copied before each progress callback).

  • overwrite (bool) – pass True to allow overwriting the destination file. if the destination file already exists and this argument is not specified (or evaluates as False), then this function will return an error.

  • move_file (bool) – pass True to delete the source file once it has been copied completely (not supported if the source is a stream).

  • recoverable (bool) – pass True to allow a recoverable file copy (only works if the destination is a file name, not a stream).

  • errors (Optional[List[str]]) – pass an empty list to get a list of detailed error messages.

  • progress_func (Callable) – optional callback to dispatch or break/cancel the copy progress for large files. if the callback returns a non-empty value, it will be interpreted as cancel reason, the copy process will be stopped, and an error will be returned.

  • progress_kwargs – optional additional kwargs passed to the progress function. the kwargs total_bytes and transferred_bytes will be updated before the callback.

Return type:

str

Returns:

destination file name/stream as string or empty string on error.

Hint

this function extends the compatible Python functions shutil.copyfileobj(), shutil.copyfile(), shutil.copy(), shutil.copy2() and http.server.SimpleHTTPRequestHandler.copyfile() with recoverability and a progress callback. it can also be used as an argument for the copy_function parameter of e.g. shutil.copytree() and shutil.move().
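the chunked copy with progress callbacks and a resumable offset, as described for copy_bytes(), can be illustrated with a minimal stream-to-stream sketch. the function copy_chunks and its return convention (empty string on success, otherwise an error text) are hypothetical and only reuse the parameter names documented above; this is not the implementation of copy_bytes():

```python
import io
from typing import BinaryIO, Callable


def copy_chunks(src: BinaryIO, dst: BinaryIO, total_bytes: int,
                transferred_bytes: int = 0, buf_size: int = 16384,
                progress_func: Callable[..., str] = lambda **_: "") -> str:
    """hypothetical sketch: copy src to dst in chunks, return '' or an error text."""
    if transferred_bytes:
        src.seek(transferred_bytes)     # resume: skip the part that was already copied
        dst.seek(transferred_bytes)
    while transferred_bytes < total_bytes:
        chunk = src.read(min(buf_size, total_bytes - transferred_bytes))
        if not chunk:
            return "source ended before total_bytes were read"
        dst.write(chunk)
        transferred_bytes += len(chunk)
        # a non-empty return value of the callback is treated as cancel reason
        cancel_reason = progress_func(transferred_bytes=transferred_bytes,
                                      total_bytes=total_bytes)
        if cancel_reason:
            return f"cancelled: {cancel_reason}"
    return ""


payload = bytes(range(256)) * 10        # 2560 bytes of demo data
reported = []


def report(**progress_kwargs) -> str:
    """collect the progress values; returning '' lets the copy continue."""
    reported.append(progress_kwargs["transferred_bytes"])
    return ""


target = io.BytesIO()
error = copy_chunks(io.BytesIO(payload), target, len(payload),
                    buf_size=1024, progress_func=report)

# resume a copy where the first 1000 bytes are already in the destination
resumed = io.BytesIO(payload[:1000])
resume_error = copy_chunks(io.BytesIO(payload), resumed, len(payload),
                           transferred_bytes=1000, buf_size=1024)
```

the sketch requires seekable streams for the resume case, whereas the documentation above states that copy_bytes() expects a non-seekable source stream to be positioned by the caller.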

file_lines(file_path, encoding=None)

returning lines of the text file specified by file_path argument as tuple.

Parameters:
  • file_path (str) – file path/name to parse/load.

  • encoding (Optional[str]) – encoding used to load and convert/interpret the file content.

Return type:

Tuple[str, ...]

Returns:

tuple of the lines found in the specified file or empty tuple if the file could not be found or opened.
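the documented behavior (a tuple of lines, or an empty tuple if the file cannot be found or opened) can be sketched as follows. the helper lines_or_empty is hypothetical, and stripping the line endings is an assumption; this is not the implementation of file_lines():

```python
import os
import tempfile
from typing import Optional, Tuple


def lines_or_empty(file_path: str, encoding: Optional[str] = None) -> Tuple[str, ...]:
    """hypothetical helper: return the lines of a text file, or () if it cannot be read."""
    try:
        with open(file_path, encoding=encoding) as file_handle:
            return tuple(file_handle.read().splitlines())
    except (OSError, ValueError):   # covers FileNotFoundError, PermissionError, decoding errors
        return ()


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as file_handle:
        file_handle.write("first line\nsecond line\n")
    existing = lines_or_empty(demo_path, encoding="utf-8")
    missing = lines_or_empty(os.path.join(tmp_dir, "not_existing.txt"))
```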

file_transfer_progress(transferred_bytes, total_bytes=0, decimal_places=3)

return string to display the transfer progress of transferred bytes in short and user-readable format.

Parameters:
  • transferred_bytes (int) – number of transferred bytes.

  • total_bytes (int) – number of total bytes.

  • decimal_places (int) – number of decimal places (should be between 0 and 3).

Return type:

str

Returns:

formatted string to display the progress of the currently running transfer.
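the exact output format of file_transfer_progress() is not specified here. the following sketch only illustrates the idea of a short, user-readable progress string; the function names, the unit labels and the 1024-based scaling are all assumptions:

```python
UNITS = ("B", "KB", "MB", "GB", "TB")


def short_size(num_bytes: int, decimal_places: int = 3) -> str:
    """hypothetical helper: scale a byte count to the largest fitting unit."""
    value = float(num_bytes)
    unit_index = 0
    while value >= 1024 and unit_index < len(UNITS) - 1:
        value /= 1024
        unit_index += 1
    places = decimal_places if unit_index else 0    # plain bytes need no decimals
    return f"{value:.{places}f} {UNITS[unit_index]}"


def progress_text(transferred_bytes: int, total_bytes: int = 0, decimal_places: int = 3) -> str:
    """hypothetical helper: 'transferred' or 'transferred / total' as short text."""
    text = short_size(transferred_bytes, decimal_places)
    if total_bytes:
        text += " / " + short_size(total_bytes, decimal_places)
    return text


print(progress_text(1536, total_bytes=4096, decimal_places=1))
```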

read_file_text(file_path, encoding=None, error_handling='ignore')

returning content of the text file specified by file_path argument as string, while suppressing exceptions.

Parameters:
  • file_path (str) – file path/name to load into a string.

  • encoding (Optional[str]) – encoding used to load and convert/interpret the file content (see built-in open).

  • error_handling (str) – passed onto the errors parameter of the built-in open function.

Return type:

Optional[str]

Returns:

the file contents as a string. if the file could not be decoded, found or opened, returns an empty string (if error_handling is unspecified or set to ‘ignore’), otherwise None. this function suppresses and catches exceptions such as FileNotFoundError, OSError, PermissionError, and ValueError.

write_file_text(text_or_lines, file_path, encoding=None)

write the passed text string or list of line strings into the text file specified by file_path argument.

Parameters:
  • text_or_lines (Union[str, List[str], Tuple[str]]) – new file content either passed as string or list of line strings (will be concatenated with the line separator of the current OS: os.linesep).

  • file_path (str) – file path/name to write the passed content into (overwriting any previous content!).

  • encoding (Optional[str]) – encoding used to write/convert/interpret the file content to write.

Return type:

bool

Returns:

True if the content got written to the file, False on error/exception. this function suppresses and catches exceptions such as FileExistsError, FileNotFoundError, OSError, PermissionError, and ValueError.
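a minimal sketch of this behavior, using only the standard library, could look as follows. the helper write_text_sketch is hypothetical and not the implementation of write_file_text(); opening the file with newline="" is an assumption made here so that the os.linesep separators are written unchanged:

```python
import os
import tempfile
from typing import List, Optional, Tuple, Union


def write_text_sketch(text_or_lines: Union[str, List[str], Tuple[str, ...]],
                      file_path: str, encoding: Optional[str] = None) -> bool:
    """hypothetical helper: write text or lines to a file, return False on error."""
    if not isinstance(text_or_lines, str):
        text_or_lines = os.linesep.join(text_or_lines)
    try:
        # newline="" prevents a second translation of the line separators
        with open(file_path, "w", encoding=encoding, newline="") as file_handle:
            file_handle.write(text_or_lines)
    except (OSError, ValueError):   # OSError covers FileNotFoundError and PermissionError
        return False
    return True


with tempfile.TemporaryDirectory() as tmp_dir:
    target_path = os.path.join(tmp_dir, "demo.txt")
    written = write_text_sketch(["first", "second"], target_path)
    with open(target_path, newline="") as file_handle:
        stored = file_handle.read()
    # the parent folder does not exist, so the write attempt fails
    failed = write_text_sketch("text", os.path.join(tmp_dir, "missing_dir", "demo.txt"))
```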

class RegisteredFile(file_path, **kwargs)

Bases: object

represents a single file - see also registered file examples.

__init__(file_path, **kwargs)

initialize the registered file instance.

Parameters:
  • file_path (str) – file path string.

  • kwargs – not supported; accepted only for signature compatibility with CachedFile, so that invalid kwargs can be detected.

path: str

file path

stem: str

file basename without extension

ext: str

file name extension

properties: Dict[str, int | float | str]

file properties

__eq__(other)

allow equality checks.

Parameters:

other (Union[str, RegisteredFile, CachedFile, Path, PurePath, Any]) – another file object to compare this instance with.

Return type:

bool

Returns:

True if both objects are of this type and contain a file with the same path, else False.
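the equality rule above, together with the __hash__ = None attribute listed for this class, follows a standard Python pattern: a class that defines __eq__() without defining __hash__() becomes unhashable. the class PathOnlyFile below is a hypothetical stand-in to demonstrate this, not the RegisteredFile implementation:

```python
class PathOnlyFile:
    """hypothetical stand-in that compares by type and path."""

    def __init__(self, path: str):
        self.path = path

    def __eq__(self, other: object) -> bool:
        # equal only if the other object has the same type and the same path
        return isinstance(other, self.__class__) and other.path == self.path


same = PathOnlyFile('a/b.txt') == PathOnlyFile('a/b.txt')
differs_from_str = PathOnlyFile('a/b.txt') != 'a/b.txt'

try:
    {PathOnlyFile('a/b.txt')}           # sets and dict keys need hashable items
    hashable = True
except TypeError:
    hashable = False
```

as a consequence, such instances cannot be used as dict keys or set members, but they can be stored in lists and compared with the == operator.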

__repr__()

for config var storage and eval recovery.

Returns:

evaluable/recoverable representation of this object.
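an evaluable representation means that passing the repr() string to eval() recreates an equivalent object. the class ReprFile is a hypothetical stand-in showing the pattern; the exact string returned by RegisteredFile.__repr__() is not specified here:

```python
class ReprFile:
    """hypothetical stand-in with an eval-recoverable representation."""

    def __init__(self, path: str):
        self.path = path

    def __repr__(self) -> str:
        # class name plus the quoted path results in a valid constructor call
        return f"{self.__class__.__name__}({self.path!r})"

    def __str__(self) -> str:
        return self.path


original = ReprFile('path/to/the/file_name.extension')
restored = eval(repr(original))         # only use eval() on trusted strings
```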

__str__()

return the file path of the registered file.

Returns:

file path string of this file object.

add_property(property_name, str_value)

add a property to this file object instance.

Parameters:
  • property_name (str) – name of the property to add.

  • str_value (str) – literal of the property value (int/float/str type will be detected).

__hash__ = None

_default_object_loader(file_obj)

file object loader that opens the file and keeps the handle of the opened file.

Parameters:

file_obj (Union[str, RegisteredFile, CachedFile, Path, PurePath, Any]) – file object (path string or obj with path attribute holding the complete file path).

Returns:

file handle to the opened file.

class CachedFile(file_path, object_loader=<function _default_object_loader>, late_loading=True)

Bases: RegisteredFile

represents a cacheable registered file object - see also cached file examples.

__init__(file_path, object_loader=<function _default_object_loader>, late_loading=True)

create a cached file object instance.

Parameters:
  • file_path (str) – path string of the file.

  • object_loader (Callable[[CachedFile], Any]) – callable converting the file_obj into a cached object (available via loaded_object).

  • late_loading (bool) – pass False to convert/load file_obj cache early, directly at instantiation.

property loaded_object: Any

loaded object class instance property.

Returns:

the loaded and cached file object.
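the interplay of the object_loader and late_loading parameters with the loaded_object property can be sketched with a lazily evaluated property. the class LazyFile is a hypothetical stand-in that only reuses the documented parameter and attribute names; it is not the CachedFile implementation:

```python
from typing import Any, Callable

_NOT_LOADED = object()      # sentinel, so that a loader may also return None


class LazyFile:
    """hypothetical stand-in: load an object once and cache it."""

    def __init__(self, file_path: str,
                 object_loader: Callable[["LazyFile"], Any],
                 late_loading: bool = True):
        self.path = file_path
        self.object_loader = object_loader
        self._loaded_object: Any = _NOT_LOADED
        if not late_loading:
            self._loaded_object = object_loader(self)   # load early, at instantiation

    @property
    def loaded_object(self) -> Any:
        if self._loaded_object is _NOT_LOADED:
            self._loaded_object = self.object_loader(self)  # first access: load and cache
        return self._loaded_object


load_calls = []


def demo_loader(file_obj: LazyFile) -> str:
    """record every call and return a dummy object instead of opening a file."""
    load_calls.append(file_obj.path)
    return "content of " + file_obj.path


late = LazyFile('path/to/late.txt', demo_loader)
calls_before_access = len(load_calls)
first_access = late.loaded_object
second_access = late.loaded_object
early = LazyFile('path/to/early.txt', demo_loader, late_loading=False)
```

in this sketch the loader runs once per instance: on the first access of loaded_object for late loading, or directly in the constructor if late_loading is False.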