Configuration
Common Configuration
Environment configurations
MEGFILE_BLOCK_SIZE: block size used by some open functions, like http_open and s3_open; default is 8MB
MEGFILE_MAX_BLOCK_SIZE: max block size used by some open functions, like http_open and s3_open; default is block size * 16
MEGFILE_MAX_BUFFER_SIZE: max buffer size used by some open functions, like http_open and s3_open; default is block size * 16
MEGFILE_MAX_WORKERS: max number of threads to use; default is 32
MEGFILE_BLOCK_CAPACITY: default cache capacity of blocks and concurrency; default is 16
MEGFILE_S3_CLIENT_CACHE_MODE: s3 client cache mode, thread_local or process_local; default is thread_local; this is an experimental feature
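As a minimal sketch, these variables can also be set from Python before megfile is imported, since they are read as the library initializes; the values below are illustrative, sizes are assumed to be given in bytes, and the s3 path is a placeholder:
import os

# Illustrative values only; sizes are assumed to be plain byte counts.
os.environ["MEGFILE_BLOCK_SIZE"] = str(16 * 2**20)  # raise block size to 16MB
os.environ["MEGFILE_MAX_WORKERS"] = "64"            # allow up to 64 threads

from megfile import smart_open

# 's3://bucket/key' is a placeholder path.
with smart_open("s3://bucket/key", "rb") as f:
    data = f.read()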
S3 Configuration
Before using megfile to access files on s3, you need to set up authentication credentials for your s3 account. In addition to the configuration supported by boto3, megfile provides some additional configuration items; the following describes the common ones.
You can configure megfile with environment variables or a configuration file; environment variables take precedence over the configuration file.
Use environment variables
You can use environment variables to set up authentication credentials for your s3 account:
AWS_ACCESS_KEY_ID: access key
AWS_SECRET_ACCESS_KEY: secret key
OSS_ENDPOINT: endpoint url of s3
AWS_S3_ADDRESSING_STYLE: addressing style
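For example, a minimal sketch of setting these from Python before using megfile; the credentials, endpoint and path below are placeholders:
import os

os.environ["AWS_ACCESS_KEY_ID"] = "accesskey"      # placeholder
os.environ["AWS_SECRET_ACCESS_KEY"] = "secretkey"  # placeholder
os.environ["OSS_ENDPOINT"] = "http://oss-cn-hangzhou.aliyuncs.com"
os.environ["AWS_S3_ADDRESSING_STYLE"] = "virtual"

from megfile import smart_exists

print(smart_exists("s3://bucket/key"))  # placeholder path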
Use command
You can update the config file easily with the megfile command:
megfile config s3 [OPTIONS] AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
$ megfile config s3 accesskey secretkey
# for aliyun
$ megfile config s3 accesskey secretkey \
--addressing-style virtual \
--endpoint-url http://oss-cn-hangzhou.aliyuncs.com
The resulting configuration in ~/.aws/credentials will look like:
[default]
aws_access_key_id = accesskey
aws_secret_access_key = secretkey
s3 =
  addressing_style = virtual
  endpoint_url = http://oss-cn-hangzhou.aliyuncs.com
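With the credentials in place, s3 paths work like local paths; a quick sanity check (the bucket and key are placeholders):
from megfile import smart_open

# Write a small file to a placeholder s3 path to confirm the credentials work.
with smart_open("s3://bucket/key.txt", "w") as f:
    f.write("hello")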
Config for different s3 servers or authentications
You can operate on s3 files with different endpoint urls, access keys and secret keys.
For example, suppose you have two s3 servers with different endpoint urls, access keys and secret keys. With the right configuration, you can use paths with a profile name, like s3+profile_name://bucket/key, to operate on each s3 server:
from megfile import smart_sync
smart_sync('s3+profile1://bucket/key', 's3+profile2://bucket/key')
Using environment variables
You need to use the PROFILE_NAME__ prefix, like:
PROFILE1__AWS_ACCESS_KEY_ID
PROFILE1__AWS_SECRET_ACCESS_KEY
PROFILE1__OSS_ENDPOINT
PROFILE1__AWS_S3_ADDRESSING_STYLE
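A minimal sketch of the same idea from Python; the prefixed variables only affect paths carrying the matching profile name (all values and the path are placeholders):
import os

os.environ["PROFILE1__AWS_ACCESS_KEY_ID"] = "accesskey"      # placeholder
os.environ["PROFILE1__AWS_SECRET_ACCESS_KEY"] = "secretkey"  # placeholder
os.environ["PROFILE1__OSS_ENDPOINT"] = "http://oss-cn-hangzhou.aliyuncs.com"

from megfile import smart_exists

print(smart_exists("s3+profile1://bucket/key"))  # placeholder path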
Using command:
megfile config s3 AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY --profile-name profile1
Then the config file’s content will be:
[profile1]
aws_access_key_id = accesskey
aws_secret_access_key = secretkey
HDFS Configuration
Please run pip install 'megfile[hdfs]' to install the hdfs requirements.
You can configure megfile with environment variables or a configuration file; environment variables take precedence over the configuration file.
Use environment variables
You can use environment variables to set up authentication credentials and other configuration items:
HDFS_USER: hdfs user
HDFS_URL: WebHDFS url; to support High Availability namenodes, simply add more urls, delimited with a semicolon (;)
HDFS_ROOT: hdfs root directory, used when resolving relative paths
HDFS_TIMEOUT: request timeout for the hdfs server
HDFS_TOKEN: hdfs token, if the hdfs server requires one
HDFS_CONFIG_PATH: hdfs config file, default is ~/.hdfscli.cfg
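For example, a minimal sketch of configuring hdfs access from Python before using megfile (the url, user and path below are placeholders):
import os

os.environ["HDFS_URL"] = "http://127.0.0.1:50070"  # placeholder WebHDFS url
os.environ["HDFS_USER"] = "admin"                  # placeholder
# For High Availability namenodes, delimit multiple urls with a semicolon:
# os.environ["HDFS_URL"] = "http://namenode1:50070;http://namenode2:50070"

from megfile import smart_exists

print(smart_exists("hdfs://path/to/file"))  # placeholder path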
Use command
You can update the config file easily with the megfile command:
megfile config hdfs [OPTIONS] URL
$ megfile config hdfs http://127.0.0.1:50070 --user admin --root '/' --token xxx
The resulting configuration in ~/.hdfscli.cfg will look like:
[global]
default.alias = default
[default.alias]
url = http://127.0.0.1:50070
user = admin
root = /
token = xxx
More information about the configuration file: https://hdfscli.readthedocs.io/en/latest/quickstart.html#configuration
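Once the alias is configured, hdfs paths can be used like any other megfile path; a short example (the path is a placeholder):
from megfile import smart_open

# 'hdfs://path/to/file' is a placeholder path.
with smart_open("hdfs://path/to/file", "r") as f:
    print(f.read())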
Config for different hdfs servers
You can operate on hdfs files on different hdfs servers.
For example, suppose you have two hdfs servers with different urls. With the right configuration, you can use paths with a profile name, like hdfs+profile_name://path/to/file, to operate on each hdfs server:
from megfile import smart_sync
smart_sync('hdfs+profile1://path/to/file', 'hdfs+profile2://path/to/file')
Using environment variables
You need to use the PROFILE_NAME__ prefix, like:
PROFILE1__HDFS_USER
PROFILE1__HDFS_URL
PROFILE1__HDFS_ROOT
PROFILE1__HDFS_TIMEOUT
PROFILE1__HDFS_TOKEN
Using command:
megfile config hdfs http://127.0.0.1:8000 --user admin \
--root /b --token aaa --profile-name profile1
megfile config hdfs http://127.0.0.1:8001 --user admin \
--root /a --token bbb --profile-name profile2
Then the configuration file's content will be:
[global]
default.alias = default
[profile1.alias]
url = http://127.0.0.1:8000
user = admin
root = /b
token = aaa
[profile2.alias]
url = http://127.0.0.1:8001
user = admin
root = /a
token = bbb
SFTP Configuration
SFTP is a little different from other protocols, because some settings can be put in the path itself (sftp://[username[:password]@]hostname[:port]/file_path). However, we suggest not putting the password in the path. You can also set the configuration with environment variables; settings in the path take precedence over environment variables.
Use environment variables
You can use environment variables to set up authentication credentials:
SFTP_USERNAME
SFTP_PASSWORD
SFTP_PRIVATE_KEY_PATH: ssh private key path
SFTP_PRIVATE_KEY_TYPE: algorithm of the ssh key
SFTP_PRIVATE_KEY_PASSWORD: password of the private key; if the key has no password, do not set this environment variable
SFTP_MAX_UNAUTH_CONN: maximum number of concurrent unauthenticated connections; this relates to the sftp server's MaxStartups setting and matters when connecting to the sftp server concurrently
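A minimal sketch putting this together with key-based authentication, which avoids embedding a password in the path; the username, key path, hostname and file path below are placeholders:
import os

os.environ["SFTP_USERNAME"] = "user"                            # placeholder
os.environ["SFTP_PRIVATE_KEY_PATH"] = "/home/user/.ssh/id_rsa"  # placeholder

from megfile import smart_open

# hostname and file path are placeholders.
with smart_open("sftp://hostname/path/to/file", "rb") as f:
    data = f.read()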