aiomegfile.filesystem.hdfs module
HDFS filesystem implementation backed by the sync hdfs client.
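The phrase "backed by the sync hdfs client" suggests that each coroutine delegates to a blocking call. One common way to do this is `asyncio.to_thread`; whether this module uses that exact mechanism is an assumption. `FakeSyncClient` and `AsyncWrapper` below are hypothetical stand-ins, not part of aiomegfile or the hdfs package.

```python
import asyncio


class FakeSyncClient:
    """Hypothetical stand-in for a blocking HDFS client (not the real hdfs API)."""

    def __init__(self, known_paths):
        self._known_paths = set(known_paths)

    def path_exists(self, path: str) -> bool:
        # A real client would perform blocking network I/O here.
        return path in self._known_paths


class AsyncWrapper:
    """Hypothetical async facade that offloads blocking calls to worker threads."""

    def __init__(self, client: FakeSyncClient) -> None:
        self._client = client

    async def exists(self, path: str) -> bool:
        # asyncio.to_thread keeps the event loop free while the sync call runs.
        return await asyncio.to_thread(self._client.path_exists, path)


async def check_paths() -> list:
    fs = AsyncWrapper(FakeSyncClient({"/data/a.txt"}))
    # The two checks are scheduled concurrently on worker threads.
    return await asyncio.gather(fs.exists("/data/a.txt"), fs.exists("/missing"))
```

Running `asyncio.run(check_paths())` exercises the wrapper without any HDFS cluster, which makes the pattern easy to try in isolation.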
- class aiomegfile.filesystem.hdfs.HdfsFileSystem(profile_name: str | None = None)[source]
Bases:
BaseFileSystem
Filesystem implementation for HDFS URIs.
- async absolute(path: str) str[source]
Make the path absolute without resolving symlinks.
- Parameters:
path – HDFS path without protocol.
- Returns:
Absolute HDFS path without protocol.
- Return type:
str
- async access(path: str, mode: Access = Access.READ) bool[source]
Check read/write access heuristically for an HDFS path.
- Parameters:
path – HDFS path without protocol.
mode – Access mode enum.
- Returns:
Whether access is likely available.
- Return type:
bool
- build_uri(path: str) str
Build URI from path part.
- Parameters:
path – Path without protocol.
- Returns:
Full HDFS URI.
- Return type:
str
- async copy(src_path: str, dst_path: str, callback: Callable[[int], None] | None = None) str[source]
Copy a single file inside HDFS.
- Parameters:
src_path – Source HDFS path without protocol.
dst_path – Destination HDFS path without protocol.
callback – Optional callback receiving transferred byte counts.
- Returns:
Destination path.
- Return type:
str
- async download(src_path: str, dst_path: str, callback: Callable[[int], None] | None = None) None[source]
Download an HDFS file into the local filesystem.
- Parameters:
src_path – Source HDFS path without protocol.
dst_path – Local destination file path.
callback – Optional callback receiving transferred byte counts.
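The `callback` parameter of `copy` and `download` is typed `Callable[[int], None]`. A minimal sketch of a progress accumulator follows, assuming each call reports the size of the chunk just transferred (incremental, not cumulative); the documentation does not state which. `ByteCounter` and `copy_stream` are hypothetical helpers used only to simulate a transfer locally.

```python
import io
from typing import BinaryIO, Callable, Optional


class ByteCounter:
    """Callable that accumulates byte counts reported by a transfer."""

    def __init__(self) -> None:
        self.total = 0

    def __call__(self, nbytes: int) -> None:
        # Assumption: nbytes is the size of the chunk just transferred.
        self.total += nbytes


def copy_stream(
    src: BinaryIO,
    dst: BinaryIO,
    callback: Optional[Callable[[int], None]] = None,
    chunk_size: int = 4,
) -> None:
    """Hypothetical chunked copy that reports progress after each chunk."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        if callback is not None:
            callback(len(chunk))


def run_counter_demo() -> int:
    counter = ByteCounter()
    src, dst = io.BytesIO(b"0123456789"), io.BytesIO()
    copy_stream(src, dst, callback=counter)
    return counter.total
```

An instance such as `ByteCounter()` could then be passed as `callback=` to `download` or `copy`; if the real callback turns out to report cumulative totals, the accumulator would need to store the latest value instead of summing.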
- async exists(path: str, followlinks: bool = False) bool[source]
Return whether the path points to an existing file or directory.
- Parameters:
path – HDFS path without protocol.
followlinks – Ignored because HDFS symlinks are unsupported.
- Returns:
True if the path exists, otherwise False.
- Return type:
bool
- classmethod from_uri(uri: str)
Create filesystem instance from URI.
- Parameters:
uri – URI string.
- Returns:
HdfsFileSystem instance.
- Return type:
HdfsFileSystem
- async is_absolute(path: str) bool[source]
Return whether an HDFS path is absolute.
- Parameters:
path – HDFS path without protocol.
- Returns:
True if the path is absolute.
- Return type:
bool
- async is_dir(path: str, followlinks: bool = False) bool[source]
Return True if the path points to a directory.
- Parameters:
path – HDFS path without protocol.
followlinks – Ignored because HDFS symlinks are unsupported.
- Returns:
True if the path is a directory, otherwise False.
- Return type:
bool
- async is_file(path: str, followlinks: bool = False) bool[source]
Return True if the path points to a regular file.
- Parameters:
path – HDFS path without protocol.
followlinks – Ignored because HDFS symlinks are unsupported.
- Returns:
True if the path is a file, otherwise False.
- Return type:
bool
- async is_symlink(path: str) bool[source]
Return False because HDFS symlinks are unsupported.
- Parameters:
path – Path to check.
- Returns:
Always False.
- Return type:
bool
- async md5(path: str, recalculate: bool = False, followlinks: bool = False) str[source]
Return MD5 checksum for a file or directory.
- Parameters:
path – HDFS path without protocol.
recalculate – Ignored for compatibility.
followlinks – Ignored because HDFS symlinks are unsupported.
- Returns:
MD5 hex digest.
- Return type:
str
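For reference, an MD5 hex digest of file content can be computed in chunks with the standard library, as sketched below. This is only an illustration of the digest format; how `md5` derives a checksum for a directory, or whether it reads file content at all, is not stated here. `md5_hexdigest` is a hypothetical helper.

```python
import hashlib
import io
from typing import BinaryIO


def md5_hexdigest(stream: BinaryIO, chunk_size: int = 8192) -> str:
    """Return the MD5 hex digest of a binary stream, read in fixed-size chunks."""
    digest = hashlib.md5()
    # iter() with a sentinel stops when read() returns an empty bytes object.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


example_digest = md5_hexdigest(io.BytesIO(b"hello"))
```

The result is a 32-character lowercase hexadecimal string, which matches the `str` return type documented above.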
- async mkdir(path: str, mode: int = 511, parents: bool = False, exist_ok: bool = False) None[source]
Create a directory.
- Parameters:
path – HDFS path without protocol.
mode – Permission bits for the new directory. The default, 511, is octal 0o777.
parents – Ignored for compatibility with pathlib semantics.
exist_ok – Whether to ignore if the directory exists.
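The default `mode` of 511 in the signature is the decimal rendering of octal `0o777`. The short check below uses only the standard library to show the equivalent permission string; `describe_mode` is a hypothetical helper, and how HDFS combines the mode with its own umask is outside this sketch.

```python
import stat

DEFAULT_MODE = 511  # default shown in the mkdir signature


def describe_mode(mode: int) -> str:
    """Render permission bits for a directory in ls-style notation."""
    return stat.filemode(stat.S_IFDIR | mode)


default_as_octal = oct(DEFAULT_MODE)
default_as_string = describe_mode(DEFAULT_MODE)
```

Writing the value as `0o755` or `0o700` at the call site tends to be easier to read than the decimal form.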
- async move(src_path: str, dst_path: str, overwrite: bool = True) str[source]
Move a file or directory to another HDFS path.
- Parameters:
src_path – Source HDFS path without protocol.
dst_path – Destination HDFS path without protocol.
overwrite – Whether to overwrite the destination path.
- Returns:
Destination path.
- Return type:
str
- open(path: str, mode: str = 'r', buffering: int = -1, encoding: str | None = None, errors: str | None = None, newline: str | None = None, **kwargs: Any) AsyncContextManager[source]
Open an HDFS file with the requested mode.
- Parameters:
path – HDFS path without protocol.
mode – File open mode.
buffering – Buffering policy.
encoding – Text encoding in text mode.
errors – Error handling strategy.
newline – Newline handling policy.
kwargs – Extra open options for compatibility with megfile.
- Returns:
Async file context manager.
- Return type:
AsyncContextManager
- Raises:
HdfsInvalidError – If an unacceptable mode is provided.
- parse_uri(uri: str) str[source]
Parse URI into path part without protocol.
- Parameters:
uri – URI string.
- Returns:
Path without protocol.
- Return type:
str
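`build_uri` and `parse_uri` are described as inverses: one adds the protocol, the other strips it. A minimal sketch of that round trip follows, assuming the URI is simply the `hdfs` protocol, `://`, and the path. The exact format the library produces is not confirmed here, and both function names are hypothetical.

```python
PROTOCOL = "hdfs"  # matches the documented `protocol` attribute


def build_uri_sketch(path: str) -> str:
    """Prefix a protocol-less path with the protocol (assumed format)."""
    return f"{PROTOCOL}://{path}"


def parse_uri_sketch(uri: str) -> str:
    """Strip the protocol prefix and return the bare path."""
    prefix = f"{PROTOCOL}://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an {PROTOCOL} URI: {uri!r}")
    return uri[len(prefix):]


round_trip = parse_uri_sketch(build_uri_sketch("/data/a.txt"))
```

Most methods on this page take the path "without protocol", so a caller holding a full URI would parse it first and pass only the path part.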
- protocol = 'hdfs'
- async readlink(path: str) str[source]
Raise because HDFS symlinks are unsupported.
- Parameters:
path – Symlink path.
- Returns:
Never returns.
- Return type:
str
- Raises:
NotImplementedError – Always.
- async remove(path: str, missing_ok: bool = False) None[source]
Remove a file or directory recursively.
- Parameters:
path – HDFS path without protocol.
missing_ok – Whether to ignore missing targets.
- same_endpoint(other_filesystem: BaseFileSystem) bool[source]
Return whether another filesystem points to the same HDFS endpoint.
- Parameters:
other_filesystem – Filesystem to compare.
- Returns:
True when two filesystems share the same HDFS profile/config.
- Return type:
bool
- async samefile(path: str, other_path: str) bool[source]
Return whether two HDFS paths point to the same file.
- Parameters:
path – First HDFS path without protocol.
other_path – Second HDFS path without protocol.
- Returns:
True if both point to the same file.
- Return type:
bool
- scandir(path: str) AsyncContextManager[AsyncIterator[FileEntry]][source]
Return an async context manager for iterating directory entries.
- Parameters:
path – HDFS directory path without protocol.
- Returns:
Async context manager producing FileEntry items.
- Return type:
AsyncContextManager[AsyncIterator[FileEntry]]
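The return type `AsyncContextManager[AsyncIterator[FileEntry]]` implies a two-step consumption pattern: enter the context with `async with`, then iterate with `async for`. The sketch below reproduces that shape with stand-ins so it runs without HDFS. `EntrySketch` and `scandir_sketch` are hypothetical, and the fields of the real `FileEntry` are not assumed to match.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class EntrySketch:
    """Hypothetical stand-in for FileEntry."""
    name: str
    path: str


@asynccontextmanager
async def scandir_sketch(path: str):
    """Yield an async iterator of entries, mirroring the documented shape."""

    async def iterate() -> AsyncIterator[EntrySketch]:
        for name in ("a.txt", "b.txt"):
            yield EntrySketch(name=name, path=f"{path}/{name}")

    # Resources would be acquired before this yield and released after it.
    yield iterate()


async def list_names(path: str) -> List[str]:
    async with scandir_sketch(path) as entries:
        return [entry.name async for entry in entries]
```

With the real filesystem object, the same pattern would presumably read `async with fs.scandir(path) as entries:` followed by `async for entry in entries:`.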
- scanfile(path: str, sort: bool = False) AsyncContextManager[AsyncIterator[FileEntry]][source]
Iteratively traverse only files under the given path.
- Parameters:
path – HDFS path without protocol.
sort – Compatibility flag for protocol-aligned scanfile APIs.
- Returns:
Async context manager yielding file entries.
- Return type:
AsyncContextManager[AsyncIterator[FileEntry]]
- async stat(path: str, followlinks: bool = False) StatResult[source]
Get the status of the path.
- Parameters:
path – HDFS path without protocol.
followlinks – Ignored because HDFS symlinks are unsupported.
- Returns:
Populated stat result.
- Return type:
StatResult
- aiomegfile.filesystem.hdfs.get_hdfs_client(profile_name: str | None = None) Any[source]
Create or reuse a cached sync HDFS client.
- Parameters:
profile_name – Optional HDFS profile name.
- Returns:
HDFS client instance.
- Return type:
Any
- Raises:
ImportError – If the optional hdfs dependency is unavailable.
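"Create or reuse a cached sync HDFS client" describes a per-profile cache. One simple way to get that behavior is a dictionary keyed by profile name, sketched below. The actual caching strategy of `get_hdfs_client` is not documented here, so this is an assumption; `ClientSketch` and `get_client_sketch` are hypothetical names.

```python
from typing import Dict, Optional


class ClientSketch:
    """Hypothetical stand-in for a sync HDFS client."""

    def __init__(self, profile_name: Optional[str]) -> None:
        self.profile_name = profile_name


_CLIENT_CACHE: Dict[Optional[str], ClientSketch] = {}


def get_client_sketch(profile_name: Optional[str] = None) -> ClientSketch:
    """Return the cached client for a profile, creating it on first use."""
    if profile_name not in _CLIENT_CACHE:
        # Not thread-safe; a lock would be needed under concurrent first calls.
        _CLIENT_CACHE[profile_name] = ClientSketch(profile_name)
    return _CLIENT_CACHE[profile_name]
```

Keying on `profile_name` means the default profile (`None`) and each named profile get their own client, which is consistent with `same_endpoint` comparing filesystems by profile/config.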
- aiomegfile.filesystem.hdfs.get_hdfs_config(profile_name: str | None = None) Dict[str, Any][source]
Load HDFS configuration from environment variables and config file.
- Parameters:
profile_name – Optional HDFS profile name.
- Returns:
HDFS client configuration dictionary.
- Return type:
dict[str, Any]
- Raises:
Exception – If required HDFS config is missing.