aiomegfile.filesystem.hdfs module

HDFS filesystem implementation backed by the sync hdfs client.
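The module exposes async methods on top of a synchronous client. The source does not say how the bridging is done; one common pattern is to push each blocking call onto a worker thread. The sketch below illustrates that pattern only. `blocking_list` is a hypothetical stand-in for a sync client call and is not part of aiomegfile.

```python
import asyncio
import time


def blocking_list(path: str) -> list[str]:
    """Hypothetical stand-in for a blocking call on a sync HDFS client."""
    time.sleep(0.01)  # simulate network latency
    return [f"{path}/a.txt", f"{path}/b.txt"]


async def list_async(path: str) -> list[str]:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_list, path)


async def main() -> list[list[str]]:
    # Both listings are awaited concurrently; results keep argument order.
    return await asyncio.gather(list_async("/data"), list_async("/logs"))
```

Calling `asyncio.run(main())` returns one list per path, in the order the coroutines were passed to `gather`.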

class aiomegfile.filesystem.hdfs.HdfsFileSystem(profile_name: str | None = None)

Bases: BaseFileSystem

Filesystem implementation for HDFS URIs.

async absolute(path: str) → str

Make the path absolute without resolving symlinks.

Parameters:

path – HDFS path without protocol.

Returns:

Absolute HDFS path without protocol.

Return type:

str

async access(path: str, mode: Access = Access.READ) → bool

Check read/write access heuristically for an HDFS path.

Parameters:
  • path – HDFS path without protocol.

  • mode – Access mode enum.

Returns:

Whether access is likely available.

Return type:

bool

build_uri(path: str) → str

Build URI from path part.

Parameters:

path – Path without protocol.

Returns:

Full HDFS URI.

Return type:

str

async copy(src_path: str, dst_path: str, callback: Callable[[int], None] | None = None) → str

Copy a single file inside HDFS.

Parameters:
  • src_path – Source HDFS path without protocol.

  • dst_path – Destination HDFS path without protocol.

  • callback – Optional callback receiving transferred byte counts.

Returns:

Destination path.

Return type:

str
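The `callback` parameter of `copy`, `download`, and `upload` has the type `Callable[[int], None]`. The sketch below shows a progress tracker with that shape. It assumes the callback receives the size of each transferred chunk (incremental counts rather than a running total), which the source does not confirm. `fake_transfer` is a hypothetical stand-in for a real transfer.

```python
from typing import Callable, Optional


class Progress:
    """Callable that accumulates byte counts reported by a transfer."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.done = 0

    def __call__(self, nbytes: int) -> None:
        # Assumption: nbytes is the size of the chunk just transferred.
        self.done += nbytes


def fake_transfer(
    size: int, chunk: int, callback: Optional[Callable[[int], None]] = None
) -> None:
    """Hypothetical transfer loop that reports each chunk to the callback."""
    sent = 0
    while sent < size:
        n = min(chunk, size - sent)
        sent += n
        if callback is not None:
            callback(n)


progress = Progress(total=10)
fake_transfer(size=10, chunk=4, callback=progress)
```

If the callback turns out to receive cumulative totals instead, `Progress.__call__` would assign rather than add.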

async download(src_path: str, dst_path: str, callback: Callable[[int], None] | None = None) → None

Download an HDFS file into the local filesystem.

Parameters:
  • src_path – Source HDFS path without protocol.

  • dst_path – Local destination file path.

  • callback – Optional callback receiving transferred byte counts.

async exists(path: str, followlinks: bool = False) → bool

Return whether the path points to an existing file or directory.

Parameters:
  • path – HDFS path without protocol.

  • followlinks – Ignored because HDFS symlinks are unsupported.

Returns:

True if the path exists, otherwise False.

Return type:

bool

classmethod from_uri(uri: str)

Create filesystem instance from URI.

Parameters:

uri – URI string.

Returns:

HdfsFileSystem instance.

Return type:

HdfsFileSystem

async is_absolute(path: str) → bool

Return whether an HDFS path is absolute.

Parameters:

path – HDFS path without protocol.

Returns:

True if the path is absolute.

Return type:

bool
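As a rough illustration of what `is_absolute` and `absolute` describe, the sketch below treats HDFS paths as POSIX-style strings. Both the leading-slash rule and the choice of `/` as the anchor for relative paths are assumptions made for illustration; the actual implementation may differ.

```python
import posixpath


def is_absolute_sketch(path: str) -> bool:
    # Assumption: an absolute HDFS path starts with a slash.
    return path.startswith("/")


def absolute_sketch(path: str, base: str = "/") -> str:
    # Assumption: relative paths are anchored at a hypothetical base directory.
    # No symlink resolution is attempted, matching the documented behaviour.
    if is_absolute_sketch(path):
        return path
    return posixpath.join(base, path)
```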

async is_dir(path: str, followlinks: bool = False) → bool

Return True if the path points to a directory.

Parameters:
  • path – HDFS path without protocol.

  • followlinks – Ignored because HDFS symlinks are unsupported.

Returns:

True if the path is a directory, otherwise False.

Return type:

bool

async is_file(path: str, followlinks: bool = False) → bool

Return True if the path points to a regular file.

Parameters:
  • path – HDFS path without protocol.

  • followlinks – Ignored because HDFS symlinks are unsupported.

Returns:

True if the path is a file, otherwise False.

Return type:

bool

async is_symlink(path: str) → bool

Return False because HDFS symlinks are unsupported.

Parameters:

path – Path to check.

Returns:

Always False.

Return type:

bool

async md5(path: str, recalculate: bool = False, followlinks: bool = False) → str

Return MD5 checksum for a file or directory.

Parameters:
  • path – HDFS path without protocol.

  • recalculate – Ignored; accepted only for signature compatibility.

  • followlinks – Ignored because HDFS symlinks are unsupported.

Returns:

MD5 hex digest.

Return type:

str
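For reference, an MD5 hex digest of file content can be computed in chunks with the standard library, as sketched below. This is a generic illustration, not the module's implementation, and it does not cover how a digest is derived for a directory. `md5_of_stream` is a hypothetical helper name.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 8192) -> str:
    """Hash a binary stream chunk by chunk and return the hex digest."""
    digest = hashlib.md5()
    # iter() with a sentinel stops once read() returns an empty bytes object.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


sample_digest = md5_of_stream(io.BytesIO(b"hello"), chunk_size=2)
```

Reading in chunks keeps memory use bounded regardless of file size, and the result is identical to hashing the whole content at once.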

async mkdir(path: str, mode: int = 511, parents: bool = False, exist_ok: bool = False) → None

Create a directory.

Parameters:
  • path – HDFS path without protocol.

  • mode – Permission bits for the new directory.

  • parents – Ignored; accepted only for compatibility with the pathlib-style signature.

  • exist_ok – Whether to ignore if the directory exists.
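The default `mode` of 511 is the decimal rendering of the octal permission value 0o777. The short check below makes that explicit using only the standard library.

```python
import stat

DEFAULT_MODE = 511  # default shown in the mkdir signature

# 511 in decimal equals 0o777 in octal: rwx for user, group, and other.
assert DEFAULT_MODE == 0o777

# Render the bits the way `ls -l` would show them for a directory.
mode_string = stat.filemode(stat.S_IFDIR | DEFAULT_MODE)
```

Passing an octal literal such as `0o755` is usually clearer than its decimal equivalent when calling `mkdir`.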

async move(src_path: str, dst_path: str, overwrite: bool = True) → str

Move a file or directory to another HDFS path.

Parameters:
  • src_path – Source HDFS path without protocol.

  • dst_path – Destination HDFS path without protocol.

  • overwrite – Whether to overwrite the destination path.

Returns:

Destination path.

Return type:

str

open(path: str, mode: str = 'r', buffering: int = -1, encoding: str | None = None, errors: str | None = None, newline: str | None = None, **kwargs: Any) → AsyncContextManager

Open an HDFS file with the requested mode.

Parameters:
  • path – HDFS path without protocol.

  • mode – File open mode.

  • buffering – Buffering policy.

  • encoding – Text encoding in text mode.

  • errors – Error handling strategy.

  • newline – Newline handling policy.

  • kwargs – Extra open options for compatibility with megfile.

Returns:

Async file context manager.

Return type:

AsyncContextManager

Raises:

HdfsInvalidError – If an unacceptable mode is provided.

parse_uri(uri: str) → str

Parse URI into path part without protocol.

Parameters:

uri – URI string.

Returns:

Path without protocol.

Return type:

str
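`build_uri` and `parse_uri` are inverse operations between a protocol-less path and a full URI. The sketch below shows that round trip under the assumption that a URI is simply the protocol, `://`, and the path concatenated; the real methods may normalize or validate differently. The function names are hypothetical.

```python
PROTOCOL = "hdfs"  # mirrors the documented `protocol` class attribute


def build_uri_sketch(path: str) -> str:
    # Assumption: the URI is "<protocol>://" followed by the path as given.
    return f"{PROTOCOL}://{path}"


def parse_uri_sketch(uri: str) -> str:
    prefix = f"{PROTOCOL}://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an {PROTOCOL} URI: {uri!r}")
    # Strip the protocol prefix and return the remaining path part.
    return uri[len(prefix):]


round_trip = parse_uri_sketch(build_uri_sketch("user/data/file.txt"))
```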

protocol = 'hdfs'

async readlink(path: str) → str

Raise NotImplementedError because HDFS symlinks are unsupported.

Parameters:

path – Symlink path.

Returns:

Never returns.

Return type:

str

Raises:

NotImplementedError – Always.

async remove(path: str, missing_ok: bool = False) → None

Remove a file or directory recursively.

Parameters:
  • path – HDFS path without protocol.

  • missing_ok – Whether to ignore missing targets.

same_endpoint(other_filesystem: BaseFileSystem) → bool

Return whether another filesystem points to the same HDFS endpoint.

Parameters:

other_filesystem – Filesystem to compare.

Returns:

True when two filesystems share the same HDFS profile/config.

Return type:

bool

async samefile(path: str, other_path: str) → bool

Return whether two HDFS paths point to the same file.

Parameters:
  • path – First HDFS path without protocol.

  • other_path – Second HDFS path without protocol.

Returns:

True if both point to the same file.

Return type:

bool

scandir(path: str) → AsyncContextManager[AsyncIterator[FileEntry]]

Return an async context manager for iterating directory entries.

Parameters:

path – HDFS directory path without protocol.

Returns:

Async context manager producing FileEntry items.

Return type:

AsyncContextManager[AsyncIterator[FileEntry]]
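Because `scandir` is a plain (non-async) method returning an async context manager that wraps an async iterator, it is consumed with `async with` followed by `async for`. The sketch below shows that calling pattern against a stub, so it runs without an HDFS cluster. `StubFileSystem` and `Entry` are hypothetical stand-ins; the real `FileEntry` fields may differ.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class Entry:
    """Hypothetical stand-in for FileEntry; only a name is modelled."""
    name: str


class StubFileSystem:
    @asynccontextmanager
    async def scandir(self, path: str) -> AsyncIterator[AsyncIterator[Entry]]:
        async def entries() -> AsyncIterator[Entry]:
            for name in ("a.txt", "b.txt"):
                yield Entry(name)

        # The context manager hands out the async iterator of entries.
        yield entries()


async def list_names(fs: StubFileSystem, path: str) -> list[str]:
    # Enter the context manager, then iterate the entries asynchronously.
    async with fs.scandir(path) as entries:
        return [entry.name async for entry in entries]
```

The same two-step pattern applies to `scanfile`, which has the same return type.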

scanfile(path: str, sort: bool = False) → AsyncContextManager[AsyncIterator[FileEntry]]

Iteratively traverse only files under the given path.

Parameters:
  • path – HDFS path without protocol.

  • sort – Compatibility flag for protocol-aligned scanfile APIs.

Returns:

Async context manager yielding file entries.

Return type:

AsyncContextManager[AsyncIterator[FileEntry]]

async stat(path: str, followlinks: bool = False) → StatResult

Get the status of the path.

Parameters:
  • path – HDFS path without protocol.

  • followlinks – Ignored because HDFS symlinks are unsupported.

Returns:

Populated stat result.

Return type:

StatResult

async symlink(src_path: str, dst_path: str) → None

Raise NotImplementedError because HDFS symlinks are unsupported.

Parameters:
  • src_path – Source path.

  • dst_path – Destination path.

Raises:

NotImplementedError – Always.

async upload(src_path: str, dst_path: str, callback: Callable[[int], None] | None = None) → None

Upload a local file into HDFS.

Parameters:
  • src_path – Local source file path.

  • dst_path – Destination HDFS path without protocol.

  • callback – Optional callback receiving transferred byte counts.

aiomegfile.filesystem.hdfs.get_hdfs_client(profile_name: str | None = None) → Any

Create or reuse a cached sync HDFS client.

Parameters:

profile_name – Optional HDFS profile name.

Returns:

HDFS client instance.

Return type:

Any

Raises:

ImportError – If the optional hdfs dependency is unavailable.
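"Create or reuse a cached client" implies that repeated calls with the same profile return the same object. The source does not describe the caching mechanism; the sketch below shows one simple way to get that behaviour with `functools.lru_cache`. `StubClient` and `get_client_sketch` are hypothetical names used only for illustration.

```python
from functools import lru_cache
from typing import Any, Optional


class StubClient:
    """Hypothetical stand-in for a sync HDFS client."""

    def __init__(self, profile_name: Optional[str]) -> None:
        self.profile_name = profile_name


@lru_cache(maxsize=None)
def get_client_sketch(profile_name: Optional[str] = None) -> Any:
    # The first call per profile builds a client; later calls reuse it.
    return StubClient(profile_name)


first = get_client_sketch("prod")
second = get_client_sketch("prod")
other = get_client_sketch("staging")
```

Here `first` and `second` are the same object, while `other` is a separate client for a different profile.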

aiomegfile.filesystem.hdfs.get_hdfs_config(profile_name: str | None = None) → Dict[str, Any]

Load HDFS configuration from environment variables and config file.

Parameters:

profile_name – Optional HDFS profile name.

Returns:

HDFS client configuration dictionary.

Return type:

dict[str, Any]

Raises:

Exception – If required HDFS config is missing.

aiomegfile.filesystem.hdfs.is_hdfs(path: str | PathLike) → bool

Return whether the given path is an HDFS URI.

Parameters:

path – Path to be tested.

Returns:

True if path is an HDFS URI.

Return type:

bool