utils

s3pathlib.utils.split_s3_uri(s3_uri: str) → Tuple[str, str]

Split an AWS S3 URI into its bucket and key.

Parameters:

s3_uri – example, "s3://my-bucket/my-folder/data.json"

New in version 1.0.1.

s3pathlib.utils.join_s3_uri(bucket: str, key: str) → str

Build an AWS S3 URI from a bucket and a key.

Parameters:
  • bucket – example, "my-bucket"

  • key – example, "my-folder/data.json" or "my-folder/"

New in version 1.0.1.
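
To illustrate the URI layout that split_s3_uri and join_s3_uri work with, here is a minimal standalone sketch. The names split_uri_sketch and join_uri_sketch are hypothetical and this is not the library's implementation; it only assumes the "s3://bucket/key" form shown in the parameter examples.

```python
from typing import Tuple


def split_uri_sketch(s3_uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split "s3://bucket/key" into (bucket, key)."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    # Everything up to the first "/" after the scheme is the bucket,
    # the remainder (possibly empty) is the key.
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    return bucket, key


def join_uri_sketch(bucket: str, key: str) -> str:
    """Hypothetical helper: build "s3://bucket/key" from its two parts."""
    return f"s3://{bucket}/{key}"


bucket, key = split_uri_sketch("s3://my-bucket/my-folder/data.json")
print(bucket, key)
print(join_uri_sketch(bucket, key))
```

Under these assumptions, splitting a URI and joining the result gives back the original string.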

s3pathlib.utils.split_parts(key: str) → List[str]

Split an S3 key into its parts using the “/” delimiter. Empty parts produced by leading, trailing, or repeated slashes are dropped.

Example:

>>> split_parts("a/b/c")
["a", "b", "c"]
>>> split_parts("//a//b//c//")
["a", "b", "c"]

New in version 1.0.1.
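
The behaviour in the example above can be reproduced with a short sketch. The name split_parts_sketch is hypothetical and the code is an assumption about the logic, not the library's source.

```python
from typing import List


def split_parts_sketch(key: str) -> List[str]:
    """Hypothetical helper: split on "/" and drop the empty pieces."""
    return [part for part in key.split("/") if part]


print(split_parts_sketch("a/b/c"))
print(split_parts_sketch("//a//b//c//"))
```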

s3pathlib.utils.smart_join_s3_key(parts: List[str], is_dir: bool) → str

Join key parts into a single S3 key. The result never contains two consecutive “/” characters, on the assumption that a valid key has no double slash.

Parameters:
  • parts – list of S3 key path parts; each part may contain “/”

  • is_dir – if True, the S3 key ends with “/”; otherwise no trailing “/” is allowed.

Example:

>>> smart_join_s3_key(parts=["/a/", "b/", "/c"], is_dir=True)
a/b/c/
>>> smart_join_s3_key(parts=["/a/", "b/", "/c"], is_dir=False)
a/b/c

New in version 1.0.1.
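
One way to get the results shown above is to split every part on “/”, discard empty pieces, and join what remains. The sketch below uses the hypothetical name smart_join_sketch; whether the library appends “/” for an empty key when is_dir is True is an assumption here.

```python
from typing import List


def smart_join_sketch(parts: List[str], is_dir: bool) -> str:
    """Hypothetical helper: join parts with exactly one "/" between them."""
    pieces: List[str] = []
    for part in parts:
        # A part such as "/a/" may carry its own slashes; keep only the names.
        pieces.extend(p for p in part.split("/") if p)
    key = "/".join(pieces)
    # Assumption: a directory key always gets a trailing "/".
    return key + "/" if is_dir else key


print(smart_join_sketch(["/a/", "b/", "/c"], is_dir=True))
print(smart_join_sketch(["/a/", "b/", "/c"], is_dir=False))
```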

s3pathlib.utils.make_s3_console_url(bucket: str | None = None, prefix: str | None = None, s3_uri: str | None = None, version_id: str | None = None, is_us_gov_cloud: bool = False) → str

Return an AWS Console URL for the given bucket, prefix, or object that you can open in your browser.

Parameters:
  • bucket – example, "my-bucket"

  • prefix – example, "my-folder/"

  • s3_uri – example, "s3://my-bucket/my-folder/data.json"

Example:

>>> make_s3_console_url(s3_uri="s3://my-bucket/my-folder/data.json")
https://s3.console.aws.amazon.com/s3/object/my-bucket?prefix=my-folder/data.json

New in version 1.0.1.

Changed in version 2.0.1: added the version_id parameter.

s3pathlib.utils.ensure_s3_object(s3_key_or_uri: str) → None

Raise an exception if the string is not in a valid format for an AWS S3 object.

New in version 1.0.1.

s3pathlib.utils.ensure_s3_dir(s3_key_or_uri: str) → None

Raise an exception if the string is not in a valid format for an AWS S3 directory.

New in version 1.0.1.

s3pathlib.utils.validate_s3_bucket(bucket)

Ref: https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html

s3pathlib.utils.validate_s3_key(key)

Ref: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html#object-key-guidelines

s3pathlib.utils.repr_data_size(size_in_bytes: int, precision: int = 2) → str

Return a human-readable string representation of a file size. Sizes greater than 1 YB are not supported.

For example:

  • 100 bytes => 100 B

  • 100,000 bytes => 97.66 KB

  • 100,000,000 bytes => 95.37 MB

  • 100,000,000,000 bytes => 93.13 GB

  • 100,000,000,000,000 bytes => 90.95 TB

  • 100,000,000,000,000,000 bytes => 88.82 PB

  • and more …

Magnitude of data (the examples above use 1024-based units):

1024         KB    kilobyte
1024 ** 2    MB    megabyte
1024 ** 3    GB    gigabyte
1024 ** 4    TB    terabyte
1024 ** 5    PB    petabyte
1024 ** 6    EB    exabyte
1024 ** 7    ZB    zettabyte
1024 ** 8    YB    yottabyte

New in version 1.0.1.
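
The example values listed above are consistent with dividing by 1024 at each step (100,000 / 1024 ≈ 97.66). The following sketch reproduces them under that assumption; repr_size_sketch is a hypothetical name, not the library's code.

```python
def repr_size_sketch(size_in_bytes: int, precision: int = 2) -> str:
    """Hypothetical helper: format a byte count using 1024-based units."""
    if size_in_bytes < 1024:
        return f"{size_in_bytes} B"
    value = float(size_in_bytes)
    unit = "B"
    for unit in ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]:
        value /= 1024
        if value < 1024:
            break
    return f"{value:.{precision}f} {unit}"


for size in (100, 100_000, 100_000_000, 100_000_000_000):
    print(size, "=>", repr_size_sketch(size))
```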

s3pathlib.utils.parse_data_size(s) → int

Parse a human-readable string representing a file size and return the size in bytes. Sizes greater than 1 YB are not supported.

Examples:

>>> parse_data_size("3.43 MB")
3596615

>>> parse_data_size("2_512.4 MB")
2634442342

>>> parse_data_size("2,512.4 MB")
2634442342

New in version 1.0.5.
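
The results above match multiplying the number by a power of 1024 and truncating to an integer (3.43 × 1024² ≈ 3596615.68). The sketch below works under that assumption; parse_size_sketch and its accepted input grammar are hypothetical, not the library's parser.

```python
import re

# Exponent of 1024 for each unit suffix (assumed set of units).
_UNIT_POWER = {
    "B": 0, "KB": 1, "MB": 2, "GB": 3, "TB": 4,
    "PB": 5, "EB": 6, "ZB": 7, "YB": 8,
}


def parse_size_sketch(s: str) -> int:
    """Hypothetical helper: parse strings such as "2,512.4 MB" into bytes."""
    # Drop digit separators before matching the number.
    cleaned = s.replace("_", "").replace(",", "").strip()
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)\s*([A-Za-z]+)", cleaned)
    if match is None:
        raise ValueError(f"cannot parse data size: {s!r}")
    number, unit = match.groups()
    return int(float(number) * 1024 ** _UNIT_POWER[unit.upper()])


print(parse_size_sketch("3.43 MB"))
print(parse_size_sketch("2,512.4 MB"))
```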

s3pathlib.utils.hash_binary(b: bytes, hash_meth: callable) → str

Get the hash of a binary object.

Parameters:
  • b – binary object

  • hash_meth – callable hash method, example: hashlib.md5

Returns:

hash value in hex digits.

New in version 1.0.1.
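
As an illustration of what hashing a binary object with a callable such as hashlib.md5 looks like, here is a short sketch using only the standard library. The name hash_bytes_sketch is hypothetical.

```python
import hashlib
from typing import Callable


def hash_bytes_sketch(b: bytes, hash_meth: Callable) -> str:
    """Hypothetical helper: hash bytes and return the hex digest."""
    return hash_meth(b).hexdigest()


print(hash_bytes_sketch(b"hello", hashlib.md5))
print(hash_bytes_sketch(b"hello", hashlib.sha256))
```

Passing the hash constructor as an argument lets one function serve MD5, SHA-256, and any other hashlib algorithm.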

s3pathlib.utils.md5_binary(b: bytes) → str

Get the MD5 hash of a binary object.

Parameters:

b – binary object

Returns:

hash value in hex digits.

New in version 1.0.1.

s3pathlib.utils.sha256_binary(b: bytes) → str

Get the SHA-256 hash of a binary object.

Parameters:

b – binary object

Returns:

hash value in hex digits.

New in version 1.0.1.

s3pathlib.utils.hash_file(abspath: str, hash_meth: callable, nbytes: int = 0, chunk_size: int = 64) → str

Get the hash of a file on the local drive.

Parameters:
  • abspath – absolute path of the file

  • hash_meth – callable hash method, example: hashlib.md5

  • nbytes – only hash the first nbytes of the file

  • chunk_size – internal option; the file is read and hashed one chunk of this size at a time to avoid high memory usage.

Returns:

hash value in hex digits.

New in version 1.0.1.
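
The chunked reading described for chunk_size can be sketched as follows. The name hash_file_sketch is hypothetical, and two details are assumptions not confirmed by the text above: chunk_size is treated as a byte count, and nbytes=0 is treated as "hash the whole file".

```python
import hashlib
from typing import Callable


def hash_file_sketch(
    path: str,
    hash_meth: Callable,
    nbytes: int = 0,
    chunk_size: int = 65536,
) -> str:
    """Hypothetical helper: hash a file in chunks to bound memory usage."""
    hasher = hash_meth()
    # Assumption: nbytes <= 0 means "no limit".
    remaining = nbytes if nbytes > 0 else None
    with open(path, "rb") as f:
        while True:
            to_read = chunk_size if remaining is None else min(chunk_size, remaining)
            if to_read == 0:
                break  # the requested prefix has been consumed
            chunk = f.read(to_read)
            if not chunk:
                break  # end of file
            hasher.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return hasher.hexdigest()
```

For example, `hash_file_sketch("/tmp/data.bin", hashlib.md5, nbytes=1024)` would hash only the first 1024 bytes of that (hypothetical) file.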

s3pathlib.utils.grouper_list(l: Iterable, n: int) → Iterable[list]

Divide an iterable into lists of fixed length n. When the items run out, the last list is shorter and is not padded with a fill value.

Example:

>>> list(grouper_list(range(10), n=3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Parameters:
  • l – an iterable object

  • n – number of items per list

New in version 1.0.1.
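
A common way to get this chunking behaviour is itertools.islice. The sketch below uses the hypothetical name grouper_sketch and is not the library's implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def grouper_sketch(items: Iterable, n: int) -> Iterator[List]:
    """Hypothetical helper: yield lists of up to n items, no padding."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, n))
        if not chunk:
            return  # the iterable is exhausted
        yield chunk


print(list(grouper_sketch(range(10), n=3)))
```

Because it consumes the input lazily, this approach also works on generators and other iterables without a known length.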