utils
- s3pathlib.utils.split_s3_uri(s3_uri: str) → Tuple[str, str]
Split an AWS S3 URI and return the bucket and key.
- Parameters:
s3_uri – example, "s3://my-bucket/my-folder/data.json"
New in version 1.0.1.
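The split described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation; the name `split_s3_uri_sketch` is hypothetical, and raising `ValueError` for a missing `s3://` prefix is an assumption.

```python
from typing import Tuple


def split_s3_uri_sketch(s3_uri: str) -> Tuple[str, str]:
    """Hypothetical stand-in for split_s3_uri; not the library's code."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        # Assumption: an input without the scheme is treated as an error.
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    # Text up to the first "/" after the scheme is the bucket;
    # the remainder (possibly empty) is the key.
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    return bucket, key


print(split_s3_uri_sketch("s3://my-bucket/my-folder/data.json"))
```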
- s3pathlib.utils.join_s3_uri(bucket: str, key: str) → str
Join a bucket and key into an AWS S3 URI.
- Parameters:
bucket – example, "my-bucket"
key – example, "my-folder/data.json" or "my-folder/"
New in version 1.0.1.
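As a rough sketch of the join, using the two example keys from the parameter list. The helper name `join_s3_uri_sketch` is hypothetical and the body is an assumption about the behavior, not the library's code.

```python
def join_s3_uri_sketch(bucket: str, key: str) -> str:
    """Hypothetical stand-in for join_s3_uri; not the library's code."""
    # The URI is the "s3://" scheme, the bucket, one "/", then the key.
    return f"s3://{bucket}/{key}"


print(join_s3_uri_sketch("my-bucket", "my-folder/data.json"))
print(join_s3_uri_sketch("my-bucket", "my-folder/"))
```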
- s3pathlib.utils.split_parts(key: str) → List[str]
Split an S3 key into parts using the “/” delimiter.
Example:
>>> split_parts("a/b/c")
["a", "b", "c"]
>>> split_parts("//a//b//c//")
["a", "b", "c"]
New in version 1.0.1.
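One way to get the behavior shown in the example is to split on “/” and drop the empty strings produced by leading, trailing, or repeated slashes. The function name below is hypothetical and this is only a sketch of the idea.

```python
from typing import List


def split_parts_sketch(key: str) -> List[str]:
    """Hypothetical stand-in for split_parts; not the library's code."""
    # str.split("/") yields "" for every extra slash; filter those out.
    return [part for part in key.split("/") if part]


print(split_parts_sketch("a/b/c"))
print(split_parts_sketch("//a//b//c//"))
```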
- s3pathlib.utils.smart_join_s3_key(parts: List[str], is_dir: bool) → str
Note: it assumes there is no double slash within any of the parts. It ensures that the S3 key never contains consecutive “/” characters.
- Parameters:
parts – list of S3 key path parts; each part may contain “/”.
is_dir – if True, the S3 key ends with “/”; otherwise the key is forced to have no trailing “/”.
Example:
>>> smart_join_s3_key(parts=["/a/", "b/", "/c"], is_dir=True)
a/b/c/
>>> smart_join_s3_key(parts=["/a/", "b/", "/c"], is_dir=False)
a/b/c
New in version 1.0.1.
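The example above can be reproduced by stripping slashes from each part before joining. The sketch below uses a hypothetical name and assumes that an empty result stays empty even when `is_dir` is True; the library may handle that case differently.

```python
from typing import List


def smart_join_s3_key_sketch(parts: List[str], is_dir: bool) -> str:
    """Hypothetical stand-in for smart_join_s3_key; not the library's code."""
    # Remove leading and trailing slashes from every part, skip empty parts.
    stripped = [part.strip("/") for part in parts]
    key = "/".join(part for part in stripped if part)
    # Assumption: only a non-empty key gets the trailing "/" for directories.
    if is_dir and key:
        key += "/"
    return key


print(smart_join_s3_key_sketch(["/a/", "b/", "/c"], is_dir=True))
print(smart_join_s3_key_sketch(["/a/", "b/", "/c"], is_dir=False))
```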
- s3pathlib.utils.make_s3_console_url(bucket: str | None = None, prefix: str | None = None, s3_uri: str | None = None, version_id: str | None = None, is_us_gov_cloud: bool = False) → str
Return an AWS Console URL that you can open in your browser.
- Parameters:
bucket – example, "my-bucket"
prefix – example, "my-folder/"
s3_uri – example, "s3://my-bucket/my-folder/data.json"
Example:
>>> make_s3_console_url(s3_uri="s3://my-bucket/my-folder/data.json")
https://s3.console.aws.amazon.com/s3/object/my-bucket?prefix=my-folder/data.json
New in version 1.0.1.
Changed in version 2.0.1: added the version_id parameter.
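The documented example suggests the following URL shape for an object. This sketch covers only that one case (an object given as `s3_uri`); how the library handles `bucket`, `prefix`, `version_id`, and `is_us_gov_cloud` is not shown here, and the function name is hypothetical.

```python
def object_console_url_sketch(s3_uri: str) -> str:
    """Hypothetical helper covering only the documented object example."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    # URL pattern copied from the documented example output.
    return f"https://s3.console.aws.amazon.com/s3/object/{bucket}?prefix={key}"


print(object_console_url_sketch("s3://my-bucket/my-folder/data.json"))
```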
- s3pathlib.utils.ensure_s3_object(s3_key_or_uri: str) → None
Raise an exception if the string is not in a valid format for an AWS S3 object.
New in version 1.0.1.
- s3pathlib.utils.ensure_s3_dir(s3_key_or_uri: str) → None
Raise an exception if the string is not in a valid format for an AWS S3 directory.
New in version 1.0.1.
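The two checks above might look roughly like this, assuming the convention used elsewhere on this page that a directory key ends with “/” and an object key does not. The function names, the exact rule, and the `ValueError` exception type are all assumptions, not the library's documented behavior.

```python
def ensure_s3_object_sketch(s3_key_or_uri: str) -> None:
    """Hypothetical check: an object key or URI must not end with "/"."""
    if s3_key_or_uri.endswith("/"):
        raise ValueError(f"{s3_key_or_uri!r} looks like a directory")


def ensure_s3_dir_sketch(s3_key_or_uri: str) -> None:
    """Hypothetical check: a directory key or URI must end with "/"."""
    if not s3_key_or_uri.endswith("/"):
        raise ValueError(f"{s3_key_or_uri!r} looks like an object")


# Both calls pass silently because the inputs match the assumed convention.
ensure_s3_object_sketch("s3://my-bucket/my-folder/data.json")
ensure_s3_dir_sketch("s3://my-bucket/my-folder/")
```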
- s3pathlib.utils.validate_s3_bucket(bucket)
Ref: https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html
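For orientation, a subset of the general-purpose bucket naming rules from the referenced AWS page can be checked as below. This is a partial sketch with a hypothetical name; it does not cover every rule (for example reserved prefixes and suffixes), and it is not known whether the library checks exactly these rules or raises this exception type.

```python
import re

# 3 to 63 characters: lowercase letters, digits, dots, hyphens;
# must begin and end with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Names formatted like an IPv4 address are not allowed.
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def validate_s3_bucket_sketch(bucket: str) -> None:
    """Hypothetical partial validator; raises ValueError on a bad name."""
    if not _BUCKET_RE.fullmatch(bucket):
        raise ValueError(f"invalid length or characters: {bucket!r}")
    if ".." in bucket:
        raise ValueError(f"adjacent periods are not allowed: {bucket!r}")
    if _IP_RE.fullmatch(bucket):
        raise ValueError(f"must not look like an IP address: {bucket!r}")


validate_s3_bucket_sketch("my-bucket")  # passes silently
```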
- s3pathlib.utils.validate_s3_key(key)
Ref: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html#object-key-guidelines
- s3pathlib.utils.repr_data_size(size_in_bytes: int, precision: int = 2) → str
Return a human-readable string representation of a file size. Sizes greater than 1 YB are not supported.
For example:
100 bytes => 100 B
100,000 bytes => 97.66 KB
100,000,000 bytes => 95.37 MB
100,000,000,000 bytes => 93.13 GB
100,000,000,000,000 bytes => 90.95 TB
100,000,000,000,000,000 bytes => 88.82 PB
and more …
Magnitude of data:
1000         kB    kilobyte
1000 ** 2    MB    megabyte
1000 ** 3    GB    gigabyte
1000 ** 4    TB    terabyte
1000 ** 5    PB    petabyte
1000 ** 6    EB    exabyte
1000 ** 7    ZB    zettabyte
1000 ** 8    YB    yottabyte
New in version 1.0.1.
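The example values listed above (for instance 100,000 bytes => 97.66 KB) are what repeated division by 1024 produces, even though the magnitude table is written in powers of 1000. The sketch below reproduces those examples under that assumption; the function name is hypothetical and this is not necessarily how the library computes the result.

```python
def repr_data_size_sketch(size_in_bytes: int, precision: int = 2) -> str:
    """Hypothetical stand-in for repr_data_size, using a 1024 divisor."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
    if size_in_bytes < 1024:
        return f"{size_in_bytes} B"
    value = float(size_in_bytes)
    index = 0
    # Move up one unit per division until the value drops below 1024.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.{precision}f} {units[index]}"


for size in (100, 100_000, 100_000_000, 100_000_000_000):
    print(size, "=>", repr_data_size_sketch(size))
```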
- s3pathlib.utils.parse_data_size(s) → int
Parse a human-readable string representing a file size. Sizes greater than 1 YB are not supported.
Examples:
>>> parse_data_size("3.43 MB")
3596615
>>> parse_data_size("2_512.4 MB")
2634442342
>>> parse_data_size("2,512.4 MB")
2634442342
New in version 1.0.5.
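The documented results are consistent with multiplying by a power of 1024 and truncating to an integer (3.43 × 1024² ≈ 3596615.68). A minimal sketch under that assumption follows; the function name is hypothetical, and the input is assumed to be a number and a unit separated by whitespace.

```python
def parse_data_size_sketch(s: str) -> int:
    """Hypothetical stand-in for parse_data_size, using a 1024 multiplier."""
    exponents = {"B": 0, "KB": 1, "MB": 2, "GB": 3, "TB": 4,
                 "PB": 5, "EB": 6, "ZB": 7, "YB": 8}
    # Assumption: the input looks like "<number> <unit>".
    number, unit = s.strip().split()
    # Drop thousands separators such as "," and "_".
    number = number.replace(",", "").replace("_", "")
    return int(float(number) * 1024 ** exponents[unit.upper()])


print(parse_data_size_sketch("3.43 MB"))
print(parse_data_size_sketch("2,512.4 MB"))
```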
- s3pathlib.utils.hash_binary(b: bytes, hash_meth: callable) → str
Get the hash of a binary object.
- Parameters:
b – binary object
hash_meth – callable hash method, example: hashlib.md5
- Returns:
hash value in hex digits.
New in version 1.0.1.
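With the standard library's `hashlib`, the pattern described here (pass the bytes to a hash constructor, then take the hex digest) can be sketched as follows. The wrapper name is hypothetical and the body is an assumption about the behavior, not the library's code.

```python
import hashlib
from typing import Callable


def hash_binary_sketch(b: bytes, hash_meth: Callable) -> str:
    """Hypothetical stand-in for hash_binary; not the library's code."""
    # hash_meth is a hashlib constructor such as hashlib.md5.
    return hash_meth(b).hexdigest()


data = b"hello world"
print(hash_binary_sketch(data, hashlib.md5))     # 32 hex digits
print(hash_binary_sketch(data, hashlib.sha256))  # 64 hex digits
```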
- s3pathlib.utils.md5_binary(b: bytes) → str
Get the md5 hash of a binary object.
- Parameters:
b – binary object
- Returns:
hash value in hex digits.
New in version 1.0.1.
- s3pathlib.utils.sha256_binary(b: bytes) → str
Get the sha256 hash of a binary object.
- Parameters:
b – binary object
- Returns:
hash value in hex digits.
New in version 1.0.1.
- s3pathlib.utils.hash_file(abspath: str, hash_meth: callable, nbytes: int = 0, chunk_size: int = 64) → str
Get the hash of a file on local drive.
- Parameters:
abspath – absolute path of the file
hash_meth – callable hash method, example: hashlib.md5
nbytes – hash only the first nbytes of the file
chunk_size – internal option; the amount of data streamed into the hash at a time, which avoids high memory usage.
- Returns:
hash value in hex digits.
New in version 1.0.1.
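Chunked hashing of a local file can be sketched as below. Several details are assumptions: that `nbytes=0` means the whole file, that `chunk_size` is measured in bytes, and the larger default chosen here; the function name is hypothetical and this is not the library's code.

```python
import hashlib
import os
import tempfile
from typing import Callable


def hash_file_sketch(abspath: str, hash_meth: Callable,
                     nbytes: int = 0, chunk_size: int = 64 * 1024) -> str:
    """Hypothetical stand-in for hash_file; reads the file in chunks."""
    hasher = hash_meth()
    # Assumption: nbytes <= 0 means "hash the whole file".
    remaining = nbytes if nbytes > 0 else None
    with open(abspath, "rb") as f:
        while remaining is None or remaining > 0:
            size = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(size)
            if not chunk:  # end of file
                break
            hasher.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return hasher.hexdigest()


# Usage: hash a temporary file, then only its first 10 bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1000)
print(hash_file_sketch(tmp.name, hashlib.md5))
print(hash_file_sketch(tmp.name, hashlib.md5, nbytes=10))
os.remove(tmp.name)
```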
- s3pathlib.utils.grouper_list(l: Iterable, n: int) → Iterable[list]
Divide an iterable into fixed-length pieces; the last piece is not padded if it has fewer than n items.
Example:
>>> list(grouper_list(range(10), n=3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
- Parameters:
l – an iterable object
n – number of item per list
New in version 1.0.1.
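The chunking behavior in the example can be sketched with `itertools.islice`. The function name is hypothetical and this is one possible approach, not necessarily the one the library uses.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def grouper_list_sketch(l: Iterable, n: int) -> Iterator[List]:
    """Hypothetical stand-in for grouper_list; not the library's code."""
    it = iter(l)
    while True:
        # Take up to n items; the final chunk may be shorter.
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


print(list(grouper_list_sketch(range(10), n=3)))
```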