ae.files

generic file object helpers

this namespace portion is pure Python and provides helpers for managing file objects and file contents. it depends only on the ae.base namespace portion.

Hint

more helper functions to manage directory/folder structures are provided by the ae.paths portion.

the helper function copy_bytes() creates recoverable copies of binary files and file streams, with a progress callback for each copied chunk/buffer of bytes.

file_lines() and read_file_text() are helpers to read/load text file contents.

the function write_file_text() stores a string to a text file.

the helper function file_transfer_progress() formats the number of transferred bytes into a short, user-readable string, to be displayed as the progress of a file transfer.

RegisteredFile and CachedFile encapsulate a file within a file object, and CachedFile can additionally cache the file's content. instances of these classes are compatible with the file objects provided by Python’s pathlib module, and plain path strings can also be used as file objects (see also the FileObject type).

all these types of file objects are supported by the files register class FilesRegister from the ae.paths portion.

registered file

a registered file object represents a single file on your file system and can be instantiated from one of the classes RegisteredFile or CachedFile provided by this module/portion:

from ae.files import RegisteredFile

rf = RegisteredFile('path/to/the/file_name.extension')

assert str(rf) == 'path/to/the/file_name.extension'
assert rf.path == 'path/to/the/file_name.extension'
assert rf.stem == 'file_name'
assert rf.ext == '.extension'
assert rf.properties == {}

file properties are automatically attached to each file object instance via its instance attribute properties. in the last example this results in an empty dictionary, because the path of this file object does not contain any folder name with an underscore character.

file properties

file property names and values are automatically determined from the names of the sub-folders specified in the path attribute. every sub-folder name containing an underscore character in the format <property-name>_<value> is interpreted as a file property:

rf = RegisteredFile('property1_69/property2_3.69/property3_whatever/file_name.ext')
assert rf.properties['property1'] == 69
assert rf.properties['property2'] == 3.69
assert rf.properties['property3'] == 'whatever'

the property value types int, float and str are recognized and converted automatically. boolean values can be specified as the integers 1 and 0.
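the conversion of folder names into typed property values can be sketched in plain Python. this is a minimal, hypothetical sketch (parse_path_properties() and convert_property_value() are illustrative names, not part of the ae.files API), assuming that each folder name is split at its first underscore and that int is tried before float, with str as fallback:

```python
from pathlib import PurePosixPath
from typing import Dict, Union

PropertyValue = Union[int, float, str]


def convert_property_value(str_value: str) -> PropertyValue:
    """ hypothetical: convert a property literal to int or float, falling back to str. """
    for converter in (int, float):
        try:
            return converter(str_value)
        except ValueError:
            pass
    return str_value


def parse_path_properties(file_path: str) -> Dict[str, PropertyValue]:
    """ hypothetical: collect all <name>_<value> folder names of file_path as properties. """
    properties: Dict[str, PropertyValue] = {}
    for folder in PurePosixPath(file_path).parent.parts:
        name, separator, str_value = folder.partition('_')   # split at the first underscore (assumption)
        if separator:                                         # folders without underscore are ignored
            properties[name] = convert_property_value(str_value)
    return properties


props = parse_path_properties('property1_69/property2_3.69/property3_whatever/file_name.ext')
print(props)   # {'property1': 69, 'property2': 3.69, 'property3': 'whatever'}
```

trying int before float keeps 69 an integer, which is also why boolean flags are best written as 1 and 0.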

cached file

a cached file created from the CachedFile class behaves like a registered file and additionally provides the possibility to cache parts or all of the file content, as well as the file pointer of the opened file:

from ae.files import CachedFile

cf = CachedFile('integer_69/float_3.69/string_whatever/file_name.ext')

assert str(cf) == 'integer_69/float_3.69/string_whatever/file_name.ext'
assert cf.path == 'integer_69/float_3.69/string_whatever/file_name.ext'
assert cf.stem == 'file_name'
assert cf.ext == '.ext'
assert cf.properties['integer'] == 69
assert cf.properties['float'] == 3.69
assert cf.properties['string'] == 'whatever'

on instantiation of a CachedFile file object, the default file object loader function _default_object_loader() is used, which opens a file stream via Python’s open() built-in. alternatively, you can specify a custom file object loader with the object_loader parameter, or by assigning a callable directly to the object_loader attribute:

# my_open_method() stands for your own function that opens/loads the file
cf = CachedFile('integer_69/float_3.69/string_whatever/file_name.ext',
                object_loader=lambda cached_file_obj: my_open_method(cached_file_obj.path))

the loaded (cached) object is accessible via the loaded_object property of the cached file object instance:

from io import TextIOWrapper

assert isinstance(cf.loaded_object, TextIOWrapper)   # when using the default object loader
cf.loaded_object.seek(0)
content = cf.loaded_object.read()

cf.loaded_object.close()
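the caching and late_loading behaviour described above can be illustrated with a minimal, hypothetical stand-in class. LazyCachedFile, open_file() and read_content() are illustrative names, not the ae.files implementation; the sketch assumes that the loader is called once, on the first access of loaded_object (or at instantiation if late_loading is False), and that its result is reused afterwards:

```python
import os
import tempfile
from typing import Any, Callable, List


def open_file(file_obj: Any) -> Any:
    """ hypothetical loader in the spirit of the default loader: return an opened file handle. """
    return open(file_obj.path)


class LazyCachedFile:
    """ hypothetical stand-in for CachedFile, reduced to lazy loading and caching. """

    def __init__(self, file_path: str, object_loader: Callable[[Any], Any] = open_file,
                 late_loading: bool = True):
        self.path = file_path
        self.object_loader = object_loader
        self._loaded = False
        self._loaded_object: Any = None
        if not late_loading:
            _ = self.loaded_object          # load the cache early, directly at instantiation

    @property
    def loaded_object(self) -> Any:
        """ call the loader only on the first access and return the cached result afterwards. """
        if not self._loaded:
            self._loaded_object = self.object_loader(self)
            self._loaded = True
        return self._loaded_object


load_calls: List[str] = []


def read_content(file_obj: Any) -> str:
    """ hypothetical loader caching the whole file content instead of an open file handle. """
    load_calls.append(file_obj.path)
    with open(file_obj.path) as file_handle:
        return file_handle.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'file_name.ext')
    with open(file_path, 'w') as fh:
        fh.write('cached content')
    cf = LazyCachedFile(file_path, object_loader=read_content)
    calls_before_access = len(load_calls)     # nothing is loaded before the first access
    first = cf.loaded_object
    second = cf.loaded_object                 # served from the cache, the loader is not called again
```

a loader returning the file content (instead of an open handle) avoids having to close the handle later, at the cost of keeping the whole content in memory.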

Module Attributes

FileObject

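file object type, e.g. a file path str or an object with a stem attribute, like CachedFile, RegisteredFile or pathlib.Path.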

PropertyType

types of file property values

PropertiesType

dict of file properties

FilenameOrStream

file name or file stream pointer

Functions

copy_bytes(src_file, dst_file, *[, ...])

recoverable copy of a file or stream (file-like object), optionally with progress callbacks.

file_lines(file_path[, encoding])

returning lines of the text file specified by file_path argument as tuple.

file_transfer_progress(transferred_bytes[, ...])

return string to display the transfer progress of transferred bytes in short and user readable format.

read_file_text(file_path[, encoding, ...])

returning content of the text file specified by file_path argument as string.

write_file_text(text_or_lines, file_path[, ...])

write the passed text string or list of line strings into the text file specified by file_path argument.

Classes

CachedFile(file_path[, object_loader, ...])

represents a cacheable registered file object - see also cached file examples.

RegisteredFile(file_path, **kwargs)

represents a single file - see also registered file examples.

FileObject

file object type, e.g. a file path str or any class or callable whose returned instance/value is either a string or an object with a stem attribute (holding the file name w/o extension), such as CachedFile, RegisteredFile, pathlib.Path or pathlib.PurePath.

alias of Union[str, RegisteredFile, CachedFile, Path, PurePath, Any]
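because a file object is either a path string or an object with a stem attribute, code accepting any FileObject can normalize it with a single type check. this is a minimal, hypothetical helper (file_object_stem() and StemOnly are illustrative names, not part of ae.files) relying on duck typing for everything that is not a string:

```python
from pathlib import Path, PurePath
from typing import Any


def file_object_stem(file_obj: Any) -> str:
    """ hypothetical: file name without extension of a path string or of an object with a stem attribute. """
    if isinstance(file_obj, str):
        return PurePath(file_obj).stem      # plain path strings are valid file objects too
    return file_obj.stem                    # e.g. pathlib.Path, pathlib.PurePath, RegisteredFile, CachedFile


class StemOnly:
    """ any object providing a stem attribute is treated like a file object (duck typing). """
    stem = 'duck_typed'


stems = [file_object_stem(obj) for obj in
         ('path/to/file_name.ext', Path('path/to/file_name.ext'), PurePath('file_name.ext'), StemOnly())]
```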

PropertyType

types of file property values

alias of Union[int, float, str]

PropertiesType

dict of file properties

alias of Dict[str, Union[int, float, str]]

FilenameOrStream

file name or file stream pointer

alias of Union[str, BinaryIO]

copy_bytes(src_file, dst_file, *, transferred_bytes=0, total_bytes=0, buf_size=16384, overwrite=False, move_file=False, recoverable=False, errors=None, progress_func=<function dummy_function>, **progress_kwargs)[source]

recoverable copy of a file or stream (file-like object), optionally with progress callbacks.

Parameters:
  • src_file (Union[str, BinaryIO]) – source file name or opened stream (file-like) object. if you pass a non-seekable stream together with a non-zero value in transferred_bytes, then the source stream has to be set to the correct position before you call this function. if you pass any source stream, then the total file/stream size also has to be passed into the total_bytes parameter. source file streams also do not support a True value in the move_file argument.

  • dst_file (Union[str, BinaryIO]) – destination file name or opened stream (file-like) object. destination streams support neither recoverable copies nor a True value in the overwrite argument; always use a destination file name if you need a recoverable/overwriting copy.

  • transferred_bytes (int) – file offset at which the copy process starts. if not passed for recoverable copies, then copy_bytes will determine this value from the file length of the destination file.

  • total_bytes (int) – source file size in bytes (needed only if src_file is a stream).

  • buf_size (int) – size of the copy buffer/chunk in bytes (the number of bytes copied before each progress callback).

  • overwrite (bool) – pass True to allow overwriting an existing destination file. if this argument is not passed or is False and the destination file already exists, then this function returns an error.

  • move_file (bool) – pass True to delete the source file after it has been completely copied (only works if the source is a file name, not a stream).

  • recoverable (bool) – pass True to allow a recoverable file copy (only works if the destination is a file name, not a stream).

  • errors (Optional[List[str]]) – pass an empty list to receive detailed error messages in it.

  • progress_func (Callable) – optional callback to report or cancel the copy progress of large files. if the callback returns a non-empty value, it will be interpreted as cancel reason: the copy process is stopped and an error is returned.

  • progress_kwargs – optional additional kwargs passed to the progress function. the kwargs total_bytes and transferred_bytes will be updated before the callback.

Return type:

str

Returns:

destination file name/stream as string or empty string on error.

Hint

this function extends the compatible Python functions shutil.copyfileobj(), shutil.copyfile(), shutil.copy(), shutil.copy2() and http.server.SimpleHTTPRequestHandler.copyfile() with recoverability and a progress callback. it can also be passed as the copy_function argument of e.g. shutil.copytree() and shutil.move().
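the idea of a recoverable copy with progress callbacks can be sketched with the standard library alone. this is a minimal, hypothetical sketch (resumable_copy() is an illustrative name, not the copy_bytes() implementation) that only handles file names, assumes that an existing destination file holds the first bytes of an interrupted copy, and returns an empty string when the callback cancels:

```python
import os
import shutil
import tempfile
from typing import Callable, Optional


def resumable_copy(src_path: str, dst_path: str, buf_size: int = 16384, recoverable: bool = False,
                   progress_func: Callable[..., Optional[str]] = lambda **kwargs: None) -> str:
    """ hypothetical chunked file copy: resumable, with progress callbacks; returns '' on cancel. """
    total_bytes = os.path.getsize(src_path)
    # a recoverable copy resumes at the length of an already (partially) written destination file
    transferred_bytes = os.path.getsize(dst_path) if recoverable and os.path.exists(dst_path) else 0
    with open(src_path, 'rb') as src, open(dst_path, 'ab' if transferred_bytes else 'wb') as dst:
        src.seek(transferred_bytes)
        while transferred_bytes < total_bytes:
            chunk = src.read(buf_size)
            if not chunk:
                break
            dst.write(chunk)
            transferred_bytes += len(chunk)
            # a non-empty return value of the callback is interpreted as cancel reason
            if progress_func(transferred_bytes=transferred_bytes, total_bytes=total_bytes):
                return ''
    return dst_path


with tempfile.TemporaryDirectory() as tmp_dir:
    src_file = os.path.join(tmp_dir, 'src.bin')
    dst_file = os.path.join(tmp_dir, 'dst.bin')
    with open(src_file, 'wb') as fh:
        fh.write(bytes(range(256)) * 200)            # 51200 bytes of source data
    with open(dst_file, 'wb') as fh:
        fh.write(bytes(range(256)) * 50)             # first 12800 bytes of an interrupted copy

    progress = []
    result = resumable_copy(src_file, dst_file, recoverable=True,
                            progress_func=lambda **kw: progress.append(kw['transferred_bytes']))
    with open(src_file, 'rb') as src_fh, open(dst_file, 'rb') as dst_fh:
        copied_ok = src_fh.read() == dst_fh.read()

    cancelled = resumable_copy(src_file, os.path.join(tmp_dir, 'cancelled.bin'),
                               progress_func=lambda **kw: 'cancelled by user')

    # the (src, dst) signature also fits the copy_function parameter of shutil.copytree()
    tree_src = os.path.join(tmp_dir, 'tree')
    os.mkdir(tree_src)
    with open(os.path.join(tree_src, 'file.txt'), 'w') as fh:
        fh.write('tree content')
    tree_dst = shutil.copytree(tree_src, os.path.join(tmp_dir, 'tree_copy'), copy_function=resumable_copy)
    tree_ok = os.path.isfile(os.path.join(tree_dst, 'file.txt'))
```

the resumed copy only transfers the missing 38400 bytes, so the callback sees 29184, 45568 and finally 51200 transferred bytes.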

file_lines(file_path, encoding=None)[source]

returning lines of the text file specified by file_path argument as tuple.

Parameters:
  • file_path (str) – file path/name to parse/load.

  • encoding (Optional[str]) – encoding used to load and convert/interpret the file content.

Return type:

Tuple[str, ...]

Returns:

tuple of the lines found in the specified file or empty tuple if the file could not be found or opened.
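the documented contract (a tuple of lines, or an empty tuple if the file cannot be found or opened) can be sketched as follows. read_lines_tuple() is a hypothetical name, and stripping the line endings is an assumption of this sketch, not confirmed behaviour of file_lines():

```python
import os
import tempfile
from typing import Optional, Tuple


def read_lines_tuple(file_path: str, encoding: Optional[str] = None) -> Tuple[str, ...]:
    """ hypothetical: lines of a text file as tuple, or an empty tuple if it cannot be opened. """
    try:
        with open(file_path, encoding=encoding) as file_handle:
            return tuple(file_handle.read().splitlines())   # line endings are stripped (assumption)
    except OSError:                                         # e.g. FileNotFoundError, PermissionError
        return ()


with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = os.path.join(tmp_dir, 'lines.txt')
    with open(text_path, 'w') as fh:
        fh.write('first\nsecond\n')
    lines = read_lines_tuple(text_path)

missing = read_lines_tuple(os.path.join('does', 'not', 'exist.txt'))
```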

file_transfer_progress(transferred_bytes, total_bytes=0, decimal_places=3)[source]

return string to display the transfer progress of transferred bytes in short and user readable format.

Parameters:
  • transferred_bytes (int) – number of transferred bytes.

  • total_bytes (int) – number of total bytes.

  • decimal_places (int) – number of decimal places (should be between 0 and 3).

Return type:

str

Returns:

formatted string to display progress of currently running transfer.
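a progress string like this is typically built by scaling the byte counts to a readable unit. the following is a minimal, hypothetical sketch: transfer_progress_text(), format_byte_count(), the unit names, the 1024 unit step and the "transferred / total" layout are all assumptions, not the exact output format of file_transfer_progress():

```python
from typing import Tuple

UNITS: Tuple[str, ...] = ('Bytes', 'KB', 'MB', 'GB', 'TB')


def format_byte_count(byte_count: int, decimal_places: int = 3) -> str:
    """ hypothetical: scale byte_count to the largest fitting unit, using 1024 as unit step. """
    value = float(byte_count)
    unit_index = 0
    while value >= 1024 and unit_index < len(UNITS) - 1:
        value /= 1024
        unit_index += 1
    if unit_index == 0:
        return f"{byte_count} {UNITS[0]}"        # whole bytes need no decimal places
    return f"{value:.{decimal_places}f} {UNITS[unit_index]}"


def transfer_progress_text(transferred_bytes: int, total_bytes: int = 0, decimal_places: int = 3) -> str:
    """ hypothetical progress string: transferred amount, followed by the total if it is known. """
    text = format_byte_count(transferred_bytes, decimal_places)
    if total_bytes:
        text += " / " + format_byte_count(total_bytes, decimal_places)
    return text


print(transfer_progress_text(1536, 3072, 1))   # 1.5 KB / 3.0 KB
```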

read_file_text(file_path, encoding=None, error_handling='ignore')[source]

returning content of the text file specified by file_path argument as string.

Parameters:
  • file_path (str) – file path/name to load into a string.

  • encoding (Optional[str]) – encoding used to load and convert/interpret the file content.

  • error_handling (str) – pass ‘strict’ or None to return None (instead of an empty string) if either a decoding ValueError or any OSError (e.g. FileNotFoundError or PermissionError) is raised. the default value ‘ignore’ ignores any decoding errors (skipping the undecodable characters) and returns an empty string on any file/OS exception.

Return type:

Optional[str]

Returns:

file content string. if the file could not be decoded, found or opened, then an empty string or None is returned (None only if ‘strict’ or None got passed to the error_handling parameter).
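the error_handling semantics can be reproduced with the errors parameter of Python's open() built-in. this is a minimal, hypothetical sketch (load_text() is an illustrative name, not the read_file_text() implementation), assuming that the error_handling value is passed on as the errors argument:

```python
import os
import tempfile
from typing import Optional


def load_text(file_path: str, encoding: Optional[str] = None,
              error_handling: Optional[str] = 'ignore') -> Optional[str]:
    """ hypothetical sketch of the documented error_handling semantics. """
    try:
        # errors=None means 'strict' for open(); 'ignore' silently drops undecodable bytes
        with open(file_path, encoding=encoding, errors=error_handling) as file_handle:
            return file_handle.read()
    except (ValueError, OSError):      # UnicodeDecodeError is a subclass of ValueError
        return None if error_handling in ('strict', None) else ''


with tempfile.TemporaryDirectory() as tmp_dir:
    latin1_path = os.path.join(tmp_dir, 'latin1.txt')
    with open(latin1_path, 'wb') as fh:
        fh.write(b'caf\xe9')                       # 'café' encoded as latin-1, invalid as UTF-8
    lenient = load_text(latin1_path, encoding='utf-8')
    strict = load_text(latin1_path, encoding='utf-8', error_handling='strict')

missing_lenient = load_text(os.path.join('no', 'such', 'file.txt'))
missing_strict = load_text(os.path.join('no', 'such', 'file.txt'), error_handling='strict')
```

with 'ignore' the undecodable byte is dropped and 'caf' is returned; with 'strict' the same file yields None.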

write_file_text(text_or_lines, file_path, encoding=None)[source]

write the passed text string or list of line strings into the text file specified by file_path argument.

Parameters:
  • text_or_lines (Union[str, List[str], Tuple[str]]) – new file content either passed as string or list of line strings (will be concatenated with the line separator of the current OS: os.linesep).

  • file_path (str) – file path/name to write the passed content into (overwriting any previous content!).

  • encoding (Optional[str]) – encoding used to write/convert/interpret the file content to write.

Return type:

bool

Returns:

True if the content got written to the file, False on any file/OS error.
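joining a list of lines with os.linesep and reporting success as bool can be sketched as follows. save_text() is a hypothetical name; opening the file with newline='' is a design choice of this sketch (it stops text mode from translating the "\n" inside os.linesep again, which would produce "\r\r\n" on Windows), so a passed string is also written without newline translation:

```python
import os
import tempfile
from typing import List, Optional, Tuple, Union


def save_text(text_or_lines: Union[str, List[str], Tuple[str, ...]], file_path: str,
              encoding: Optional[str] = None) -> bool:
    """ hypothetical: write a string or a sequence of lines to file_path, returning success as bool. """
    if not isinstance(text_or_lines, str):
        text_or_lines = os.linesep.join(text_or_lines)
    try:
        # newline='' disables newline translation, so os.linesep is written unchanged
        with open(file_path, 'w', encoding=encoding, newline='') as file_handle:
            file_handle.write(text_or_lines)
        return True
    except OSError:
        return False


with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'lines.txt')
    written = save_text(['first', 'second'], file_path)
    with open(file_path, 'rb') as fh:
        raw = fh.read()
    failed = save_text('text', os.path.join(tmp_dir, 'missing_dir', 'file.txt'))
```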

class RegisteredFile(file_path, **kwargs)[source]

Bases: object

represents a single file - see also registered file examples.

__init__(file_path, **kwargs)[source]

initialize registered file_obj instance.

Parameters:
  • file_path (str) – file path string.

  • kwargs – not supported; only present for compatibility with CachedFile, to detect invalid kwargs.

path: str

file path

stem: str

file basename without extension

ext: str

file name extension

properties: Dict[str, Union[int, float, str]]

file properties

__eq__(other)[source]

allow equality checks.

Parameters:

other (Union[str, RegisteredFile, CachedFile, Path, PurePath, Any]) – other file object to compare this instance with.

Return type:

bool

Returns:

True if both objects are of this type and contain a file with the same path, else False.
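in Python, a class that defines __eq__() without also defining __hash__() gets its __hash__ attribute set to None, which is why __hash__ = None is listed for this class: its instances cannot be used as set members or dict keys. a minimal, hypothetical class (PathFile is an illustrative name, not part of ae.files) with the documented equality semantics shows the effect:

```python
class PathFile:
    """ hypothetical minimal class mirroring the documented equality semantics. """

    def __init__(self, file_path: str):
        self.path = file_path

    def __eq__(self, other: object) -> bool:
        # equal only to instances of the same type holding the same path
        return isinstance(other, self.__class__) and self.path == other.path


same = PathFile('a/b.txt') == PathFile('a/b.txt')
different_type = PathFile('a/b.txt') == 'a/b.txt'   # a plain path string is not equal
hash_is_none = PathFile.__hash__ is None            # set implicitly because only __eq__ is defined
try:
    hash(PathFile('a/b.txt'))
    hashable = True
except TypeError:
    hashable = False
```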

__repr__()[source]

for config var storage and eval recovery.

Returns:

evaluable/recoverable representation of this object.
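an evaluable representation usually consists of the class name plus the repr() of the constructor arguments, so that eval() recreates an equivalent instance. this minimal, hypothetical sketch (ReprFile is an illustrative name, not the ae.files implementation) uses the !r conversion to get the quoting right even for paths containing quotes; eval() should only be applied to trusted strings, such as your own stored config variables:

```python
class ReprFile:
    """ hypothetical minimal class: repr() returns an expression recreating the instance. """

    def __init__(self, file_path: str):
        self.path = file_path

    def __repr__(self) -> str:
        # !r quotes and escapes the path so that eval() can parse it back
        return f"{self.__class__.__name__}({self.path!r})"


original = ReprFile("folder_1/file's name.txt")
text = repr(original)          # ReprFile("folder_1/file's name.txt")
restored = eval(text)          # only safe for trusted input
```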

__str__()[source]

return file path.

Returns:

file path string of this file object.

add_property(property_name, str_value)[source]

add a property to this file object instance.

Parameters:
  • property_name (str) – name of the property to add.

  • str_value (str) – literal of the property value (int/float/str type will be detected).

__hash__ = None

_default_object_loader(file_obj)[source]

file object loader that opens the file and keeps the handle of the opened file.

Parameters:

file_obj (Union[str, RegisteredFile, CachedFile, Path, PurePath, Any]) – file object (path string or obj with path attribute holding the complete file path).

Returns:

file handle to the opened file.

class CachedFile(file_path, object_loader=<function _default_object_loader>, late_loading=True)[source]

Bases: RegisteredFile

represents a cacheable registered file object - see also cached file examples.

__init__(file_path, object_loader=<function _default_object_loader>, late_loading=True)[source]

create cached file object instance.

Parameters:
  • file_path (str) – path string of the file.

  • object_loader (Callable[[CachedFile], Any]) – callable converting the file_obj into a cached object (available via loaded_object).

  • late_loading (bool) – pass False to convert/load file_obj cache early, directly at instantiation.

path: str

file path

stem: str

file basename without extension

ext: str

file name extension

properties: Dict[str, Union[int, float, str]]

file properties

property loaded_object: Any

property giving access to the loaded object instance.

Returns:

loaded and cached file object.