megfile.s3_path module

class megfile.s3_path.S3Path(path: str | BasePath | PathLike, *other_paths: str | BasePath | PathLike)[source]

Bases: URIPath

absolute() → S3Path[source]: Make the path absolute, without normalization or resolving symlinks. Returns a new path object

access(mode: Access = Access.READ) → bool[source]

Test if path has access permission described by mode

Parameters: mode – Access mode to test
Returns: True if the bucket of s3_url grants the access described by mode (read or write), else False

copy(dst_url: str | BasePath | PathLike, callback: Callable[[int], None] | None = None, followlinks: bool = False, overwrite: bool = True) → None[source]

Copy a file on S3: copy the content of the file at src_path to dst_path. It is the caller’s responsibility to ensure that s3_isfile(src_url) is True

Parameters:

dst_url – Target file path
callback – Called periodically during the copy; its argument is the number of bytes copied since the last call
followlinks – If False, treat a symlink as a file; if True, follow it
overwrite – Whether to overwrite the destination file if it exists; default is True
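The callback contract described above (each call receives only the bytes copied since the previous call) can be illustrated with a minimal in-memory sketch. It does not call megfile or S3; `copy_with_callback`, `Progress`, and the chunk size are hypothetical names chosen for illustration.

```python
import io


def copy_with_callback(src, dst, callback=None, chunk_size=4):
    """Copy src to dst in chunks (hypothetical helper, not megfile code).

    After each chunk, the callback receives the number of bytes copied
    since the previous call, not the running total.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        if callback is not None:
            callback(len(chunk))


class Progress:
    """Accumulates the per-call increments into a running total."""

    def __init__(self):
        self.total = 0
        self.calls = []

    def __call__(self, nbytes):
        self.calls.append(nbytes)
        self.total += nbytes


progress = Progress()
source = io.BytesIO(b"0123456789")
target = io.BytesIO()
copy_with_callback(source, target, callback=progress)
```

Because the callback receives increments, a progress display has to sum them itself, as `Progress` does here.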

cwd() → S3Path[source]

Return current working directory

Returns: Current working directory

exists(followlinks: bool = False) → bool[source]

Test if s3_url exists

If reading the bucket of s3_url is not permitted, return False

Returns: True if s3_url exists, else False

getmtime(follow_symlinks: bool = False) → float[source]

Get last-modified time of the file on the given s3_url path (in Unix timestamp format).

If the path is an existing directory, return the latest modification time among all files in it. The mtime of an empty directory is 1970-01-01 00:00:00

If s3_url is not an existing path, which means s3_exists(s3_url) returns False, raise S3FileNotFoundError

Returns: Last-modified time
Raises: S3FileNotFoundError, UnsupportedError
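The directory rule can be sketched in plain Python: take the maximum mtime over the files, and fall back to the Unix epoch (timestamp 0) when there are none. `latest_mtime` is a hypothetical helper, not part of megfile, and the sketch assumes the stated 1970-01-01 00:00:00 is meant in UTC.

```python
from datetime import datetime, timezone


def latest_mtime(file_mtimes):
    """Model of the directory rule: newest file mtime, or 0.0 if empty."""
    return max(file_mtimes, default=0.0)


# A directory with three files: the newest timestamp wins.
newest = latest_mtime([1700000000.0, 1700000500.0, 1600000000.0])

# An empty directory maps to the Unix epoch.
empty = latest_mtime([])
empty_as_datetime = datetime.fromtimestamp(empty, tz=timezone.utc)
```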

getsize(follow_symlinks: bool = False) → int[source]

Get file size on the given s3_url path (in bytes).

If the path is a directory, return the total size of all files in it, including files in subdirectories (if any).

The result excludes the size of the directory itself. In other words, return 0 bytes for an empty directory path.

If s3_url is not an existing path, which means s3_exists(s3_url) returns False, raise S3FileNotFoundError

Returns: File size
Raises: S3FileNotFoundError, UnsupportedError
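The size rules above can be modeled over an in-memory mapping of keys to sizes. This is only a sketch of the documented behaviour, not megfile's implementation. `model_getsize`, `SIZES`, and `empty_dirs` are hypothetical, and the built-in FileNotFoundError stands in for S3FileNotFoundError.

```python
# Hypothetical listing: object key -> size in bytes
SIZES = {
    "bucket/dir/a.bin": 10,
    "bucket/dir/sub/b.bin": 32,
    "bucket/other.bin": 7,
}


def model_getsize(sizes, path, empty_dirs=()):
    """Return a file's size, or the recursive sum for a directory."""
    if path in sizes:
        return sizes[path]
    prefix = path.rstrip("/") + "/"
    matched = [size for key, size in sizes.items() if key.startswith(prefix)]
    if not matched and path.rstrip("/") not in empty_dirs:
        # Stand-in for S3FileNotFoundError in this model
        raise FileNotFoundError(path)
    # The directory itself contributes nothing, so an empty one sums to 0
    return sum(matched)


dir_size = model_getsize(SIZES, "bucket/dir")
file_size = model_getsize(SIZES, "bucket/other.bin")
empty_size = model_getsize(SIZES, "emptybucket", empty_dirs=("emptybucket",))
```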

glob(pattern, recursive: bool = True, missing_ok: bool = True) → List[S3Path][source]

Return a list of S3 paths that match the glob pattern, in ascending alphabetical order

Note: Globbing works only within a bucket. Trying to match the bucket name with wildcard characters raises UnsupportedError

Parameters:

pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directories recursively
missing_ok – If False and the pattern doesn’t match any file, raise FileNotFoundError

Raises:

UnsupportedError, when the bucket part contains wildcard characters

Returns:

A list of paths that match s3_pathname
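The bucket restriction in the note can be sketched as a check on the first path component. The set of characters treated as wildcards here (`*`, `?`, `[`, `{`) is an assumption, and `split_bucket` and the local `UnsupportedError` class are stand-ins, not megfile's own objects.

```python
import re


class UnsupportedError(Exception):
    """Stand-in for megfile's UnsupportedError in this sketch."""


# Assumed wildcard characters; the library's actual set may differ.
WILDCARD = re.compile(r"[*?\[{]")


def split_bucket(s3_pathname):
    """Split an S3 pathname into (bucket, key), rejecting bucket wildcards."""
    prefix = "s3://"
    rest = s3_pathname[len(prefix):] if s3_pathname.startswith(prefix) else s3_pathname
    bucket, _, key = rest.partition("/")
    if WILDCARD.search(bucket):
        raise UnsupportedError("wildcards in the bucket part: %r" % s3_pathname)
    return bucket, key


bucket, key = split_bucket("s3://bucket/dir/**/*.txt")
```

Wildcards in the key part pass through untouched; only the bucket component is checked.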

glob_stat(pattern, recursive: bool = True, missing_ok: bool = True) → Iterator[FileEntry][source]

Return a generator of tuples of path and file stat for the paths that match the glob pattern, in ascending alphabetical order

Note: Globbing works only within a bucket. Trying to match the bucket name with wildcard characters raises UnsupportedError

Parameters:

pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directories recursively
missing_ok – If False and the pattern doesn’t match any file, raise FileNotFoundError

Raises:

UnsupportedError, when the bucket part contains wildcard characters

Returns:

A generator of tuples of path and file stat, in which the paths match s3_pathname

hasbucket() → bool[source]

Test if the bucket of s3_url exists

Returns: True if the bucket of s3_url exists, else False

iglob(pattern, recursive: bool = True, missing_ok: bool = True) → Iterator[S3Path][source]

Return an iterator of S3 paths that match the glob pattern, in ascending alphabetical order

Note: Globbing works only within a bucket. Trying to match the bucket name with wildcard characters raises UnsupportedError

Parameters:

pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directories recursively
missing_ok – If False and the pattern doesn’t match any file, raise FileNotFoundError

Raises:

UnsupportedError, when the bucket part contains wildcard characters

Returns:

An iterator of paths that match s3_pathname

is_dir(followlinks: bool = False) → bool[source]

Test if an S3 URL is a directory. The URL is treated as a directory in either of the following cases:

There exists a suffix such that os.path.join(s3_url, suffix) is a file
The URL is an empty bucket or s3://

Parameters: followlinks – Has no effect: the result is the same whether followlinks is True or False, because S3 symlinks do not support directories.
Returns: True if the path is an S3 directory, else False
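The directory test can be modeled over an in-memory mapping of bucket names to key sets. This is a sketch of the documented rules rather than megfile's implementation; `model_is_dir` and `BUCKETS` are hypothetical.

```python
# Hypothetical store: bucket name -> set of object keys
BUCKETS = {
    "b": {"dir/file.txt"},
    "empty": set(),
}


def model_is_dir(buckets, s3_url):
    """Apply the two documented rules to an in-memory bucket listing."""
    rest = s3_url[len("s3://"):]
    if rest == "":
        return True  # s3:// itself
    bucket, _, key = rest.partition("/")
    if bucket not in buckets:
        return False
    key = key.rstrip("/")
    if key == "":
        # A bucket URL: either empty, or every key in it is a valid suffix
        return True
    prefix = key + "/"
    # Some suffix exists such that prefix + suffix is a file
    return any(k.startswith(prefix) for k in buckets[bucket])
```

Note that a bare string prefix is not enough: `s3://b/di` is not a directory even though `dir/file.txt` starts with `di`.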

is_file(followlinks: bool = False) → bool[source]

Test if s3_url is a file

Returns: True if the path is an S3 file, else False

is_symlink() → bool[source]

Test whether a path is a link

Returns: True if the path is a link, else False
Raises: S3NotALinkError

iterdir() → Iterator[S3Path][source]

Get all contents of the given s3_url. The results are in arbitrary order.

Returns: All contents whose paths have the prefix s3_url
Raises: S3FileNotFoundError, S3NotADirectoryError

listdir() → List[str][source]

Get all contents of the given s3_url. The result is in ascending alphabetical order.

Parameters: missing_ok – If True and the target directory does not exist, return an empty list; default is True.
Returns: All contents whose paths have the prefix s3_url, in ascending alphabetical order
Raises: S3FileNotFoundError, S3NotADirectoryError

load() → BinaryIO[source]

Read all content of the file at the specified path in binary mode and load it into memory

The caller should close the returned BinaryIO manually

Returns: BinaryIO

md5(recalculate: bool = False, followlinks: bool = False) → str[source]

Get the md5 meta info of files that were uploaded/copied via megfile

If the meta info is lost or non-existent, return None

Parameters:

recalculate – If True, calculate md5 in real time; otherwise return the S3 ETag
followlinks – If True, calculate md5 for the real file that the symlink points to

Returns:

md5 meta info

mkdir(mode=511, parents: bool = False, exist_ok: bool = False)[source]

Create an S3 directory. Purely creating a directory is not possible because directories are unavailable on OSS. This function only tests whether the target bucket has WRITE access.

Parameters:

mode – Ignored; accepted only for compatibility with pathlib.Path
parents – Ignored; accepted only for compatibility with pathlib.Path
exist_ok – If False and the target directory exists, raise S3FileExistsError

Raises:

S3BucketNotFoundError, S3FileExistsError

move(dst_url: str | BasePath | PathLike, overwrite: bool = True) → None[source]

Move file/directory path from src_url to dst_url

Parameters:

dst_url – Given destination path
overwrite – Whether to overwrite the destination file if it exists

open(mode: str = 'r', *, encoding: str | None = None, errors: str | None = None, s3_open_func: Callable = <function s3_buffered_open>, **kwargs) → IO[source]: Open the file with the given mode.

property parts: Tuple[str, ...]: A tuple giving access to the path’s various components

property path_with_protocol: str: Return path with protocol, like file:///root, s3://bucket/key

property path_without_protocol: str: Return path without protocol. For example, if path is s3://bucket/key, return bucket/key
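The relationship between the two path properties can be sketched with plain string handling. `with_protocol` and `without_protocol` are hypothetical helpers that only mirror the documented examples, not the library's code.

```python
def without_protocol(path, protocol="s3"):
    """Drop a leading '<protocol>://' if present."""
    prefix = protocol + "://"
    return path[len(prefix):] if path.startswith(prefix) else path


def with_protocol(path, protocol="s3"):
    """Add a leading '<protocol>://' unless it is already there."""
    prefix = protocol + "://"
    return path if path.startswith(prefix) else prefix + path


stripped = without_protocol("s3://bucket/key")
restored = with_protocol(stripped)
```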

protocol = 's3'

readlink() → S3Path[source]

Return an S3Path instance representing the path to which the symbolic link points

Returns: An S3Path instance representing the path to which the symbolic link points
Raises: S3NameTooLongError, S3BucketNotFoundError, S3IsADirectoryError, S3NotALinkError

remove(missing_ok: bool = False) → None[source]

Remove the file or directory on S3. Removing s3:// or s3://bucket is not permitted

Parameters: missing_ok – If False and the target file/directory does not exist, raise S3FileNotFoundError
Raises: S3PermissionError, S3FileNotFoundError, UnsupportedError

rename(dst_path: str | BasePath | PathLike, overwrite: bool = True) → S3Path[source]

Move s3 file path from src_url to dst_url

Parameters:

dst_path – Given destination path
overwrite – Whether to overwrite the destination file if it exists

save(file_object: BinaryIO)[source]

Write the opened binary stream to the specified path. The stream is not closed afterwards

Parameters: file_object – Stream to be read

scan(missing_ok: bool = True, followlinks: bool = False) → Iterator[str][source]

Iteratively traverse only the files in the given S3 directory, in alphabetical order. Every iteration on the generator yields a path string.

If s3_url is a file path, yield the file only

If s3_url is a non-existent path, return an empty generator

If s3_url is a bucket path, return all file paths in the bucket

If s3_url is an empty bucket, return an empty generator

If s3_url doesn’t contain any bucket, that is, s3_url == ‘s3://’, raise UnsupportedError. Traversing the whole of S3 is not supported in megfile

Parameters: missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
Raises: UnsupportedError
Returns: A file path generator
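The cases above can be modeled over an in-memory bucket listing. This sketch is not megfile's implementation. `model_scan`, `BUCKETS`, and the local `UnsupportedError` are hypothetical, and the form of the yielded strings (with the `s3://` prefix) is an assumption. Unlike a true generator, the model raises eagerly instead of on first iteration.

```python
class UnsupportedError(Exception):
    """Stand-in for megfile's UnsupportedError in this sketch."""


# Hypothetical store: bucket name -> set of object keys
BUCKETS = {
    "b": {"dir/b.txt", "dir/a.txt", "top.txt"},
    "empty": set(),
}


def model_scan(buckets, s3_url, missing_ok=True):
    """Return an iterator over matching file paths, sorted alphabetically."""
    rest = s3_url[len("s3://"):]
    bucket, _, key = rest.partition("/")
    if not bucket:
        raise UnsupportedError("scanning the whole of S3 is not supported")
    keys = buckets.get(bucket, ())
    key = key.rstrip("/")
    if key == "":
        matched = sorted(keys)  # bucket path: every file in the bucket
    else:
        # Either the file itself, or every file below the directory
        matched = sorted(k for k in keys if k == key or k.startswith(key + "/"))
    if not matched and not missing_ok:
        raise FileNotFoundError(s3_url)
    return iter("s3://%s/%s" % (bucket, k) for k in matched)
```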

scan_stat(missing_ok: bool = True, followlinks: bool = False) → Iterator[FileEntry][source]

Iteratively traverse only the files in the given directory, in alphabetical order. Every iteration on the generator yields a tuple of path string and file stat

Parameters: missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
Raises: UnsupportedError
Returns: A generator of tuples of file path and file stat

scandir() → ContextIterator[source]

Get all contents of the given s3_url. The results are in arbitrary order.

Returns: All contents whose paths have the prefix s3_url
Raises: S3BucketNotFoundError, S3FileNotFoundError, S3NotADirectoryError

stat(follow_symlinks=True) → StatResult[source]

Get the StatResult of the s3_url file, including file size and mtime; see s3_getsize and s3_getmtime

If s3_url is not an existing path, which means s3_exists(s3_url) returns False, raise S3FileNotFoundError

If attempting to get the StatResult of the whole of S3, such as s3_dir_url == ‘s3://’, raise S3BucketNotFoundError

Returns: StatResult
Raises: S3FileNotFoundError, S3BucketNotFoundError

symlink(dst_path: str | BasePath | PathLike) → None[source]

Create a symbolic link pointing to src_path named dst_path.

Parameters: dst_path – Destination path
Raises: S3NameTooLongError, S3BucketNotFoundError, S3IsADirectoryError

sync(dst_url: str | BasePath | PathLike, followlinks: bool = False, force: bool = False, overwrite: bool = True) → None[source]

Copy file/directory on src_url to dst_url

Parameters:

dst_url – Given destination path
followlinks – If False, treat a symlink as a file; if True, follow it
force – Force the sync: copy files even when the same file already exists at the destination. Takes priority over ‘overwrite’; default is False
overwrite – Whether to overwrite the destination file if it exists; default is True
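One reading of how force and overwrite interact, based only on the parameter descriptions above, is the decision function below. `should_copy` and its arguments are hypothetical; the library's actual logic may differ, in particular in how it decides that two files are the same.

```python
def should_copy(dst_exists, same_file, force=False, overwrite=True):
    """Decide whether one file is copied during a sync (assumed semantics)."""
    if force:
        return True   # force wins over everything, including overwrite=False
    if not dst_exists:
        return True   # nothing to overwrite
    if not overwrite:
        return False  # keep the existing destination file
    return not same_file  # skip files that are already identical
```

Under this reading, `force=True` re-copies identical files and also overrides `overwrite=False`.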

unlink(missing_ok: bool = False) → None[source]

Remove the file on S3

Parameters: missing_ok – If False and the target file does not exist, raise S3FileNotFoundError
Raises: S3PermissionError, S3FileNotFoundError, S3IsADirectoryError

walk(followlinks: bool = False) → Iterator[Tuple[str, List[str], List[str]]][source]

Iteratively traverse the given S3 directory in top-down order. In other words, traverse the parent directory first; if subdirectories exist, traverse them in alphabetical order.

Every iteration on the generator yields a 3-tuple: (root, dirs, files)

root: Current S3 path;
dirs: Name list of subdirectories in the current directory. The list is sorted by name in ascending alphabetical order;
files: Name list of files in the current directory. The list is sorted by name in ascending alphabetical order;

If s3_url is a file path, return an empty generator

If s3_url is a non-existent path, return an empty generator

If s3_url is a bucket path, the bucket is the top directory and is returned at the first iteration of the generator

If s3_url is an empty bucket, yield only one 3-tuple (note: S3 doesn’t have empty directories)

If s3_url doesn’t contain any bucket, that is, s3_url == ‘s3://’, raise UnsupportedError. walk() on the whole of S3 is not supported in megfile

Parameters: followlinks – Has no effect: the result is the same whether followlinks is True or False, because S3 symlinks do not support directories.
Raises: UnsupportedError
Returns: A 3-tuple generator
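The traversal order can be modeled over a flat list of keys in one bucket. This is a sketch of the documented ordering and edge cases, not megfile's implementation; `model_walk` is hypothetical, and the exact form of the root strings is an assumption.

```python
def model_walk(bucket, keys, top=""):
    """Yield (root, dirs, files) top-down over a flat key listing."""
    top = top.strip("/")
    base = top + "/" if top else ""
    dirs, files = set(), []
    for key in keys:
        if not key.startswith(base):
            continue
        rest = key[len(base):]
        if not rest:
            continue
        head, sep, _ = rest.partition("/")
        if sep:
            dirs.add(head)      # something deeper lives under this name
        else:
            files.append(head)  # a file directly in this directory
    if top and not dirs and not files:
        return  # file path or non-existent path: empty generator
    root = "s3://%s/%s" % (bucket, top) if top else "s3://%s" % bucket
    sorted_dirs = sorted(dirs)
    yield root, sorted_dirs, sorted(files)
    for name in sorted_dirs:
        # The parent is yielded first, then subdirectories alphabetically
        yield from model_walk(bucket, keys, base + name)


KEYS = ["a/x.txt", "a/b/y.txt", "z.txt", "c/w.txt"]
result = list(model_walk("bkt", KEYS))
```

For a bucket with no keys, the model yields a single tuple with empty lists, matching the empty-bucket case above.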