iter_objects#

List objects related API.

class s3pathlib.core.iter_objects.S3PathIterProxy(iterable: Iterable)[source]#

An iterator proxy utility class that provides client-side, in-memory filtering. It is heavily inspired by the sqlalchemy Result Proxy, which relies on SQL server-side filtering.

It allows client-side, in-memory filtering for iterator objects that yield S3Path.

It is a special variation of s3pathlib.iterproxy.IterProxy; see s3pathlib.iterproxy.IterProxy for more details.

New in version 1.0.3.
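To illustrate the client-side, in-memory filtering idea, here is a minimal, hypothetical sketch of an IterProxy-like class. This is NOT the actual s3pathlib implementation — the class name and details are invented for illustration; it only demonstrates the iterator-wrapping pattern the methods below rely on:

```python
import itertools
from typing import Callable, Iterable, Iterator


class MiniIterProxy:
    """A minimal, hypothetical IterProxy-like wrapper for illustration only."""

    def __init__(self, iterable: Iterable):
        self._iterator: Iterator = iter(iterable)

    def __iter__(self):
        return self._iterator

    def one(self):
        # Raises StopIteration when the iterator is exhausted.
        return next(self._iterator)

    def one_or_none(self):
        # Returns None instead of raising when exhausted.
        return next(self._iterator, None)

    def many(self, k: int) -> list:
        # Pull up to k items without consuming the rest.
        return list(itertools.islice(self._iterator, k))

    def all(self) -> list:
        # Drain everything that remains.
        return list(self._iterator)

    def skip(self, k: int) -> "MiniIterProxy":
        # Consume and discard the next k items, then return self for chaining.
        for _ in itertools.islice(self._iterator, k):
            pass
        return self

    def filter(self, func: Callable) -> "MiniIterProxy":
        # Client-side, in-memory filtering: wrap the iterator in a generator.
        self._iterator = (x for x in self._iterator if func(x))
        return self
```

Under these assumptions, `MiniIterProxy(range(10)).skip(3).many(4)` returns `[3, 4, 5, 6]`, mirroring the examples below.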

one() S3Path[source]#

Return one item from the iterator.

Example:

# create an IterProxy
>>> proxy = IterProxy(range(10))

# fetch one
>>> proxy.one()
0

# fetch another one
>>> proxy.one()
1

New in version 0.1.1.

one_or_none() S3Path | None[source]#

Return one item from the iterator. If nothing is left in the iterator, return None.

Example:

# create an IterProxy
>>> proxy = IterProxy(range(10))

# iterate through all items to exhaust the iterator
>>> for i in proxy:
...     print(i)

# nothing is left, so one_or_none() returns None
>>> proxy.one_or_none()
None

# while one() raises StopIteration
>>> proxy.one()
StopIteration

New in version 0.1.1.

many(k: int) List[S3Path][source]#

Return the next k items from the iterator as a list.

Example:

# Create an IterProxy
>>> proxy = IterProxy(range(10))

# fetch 3 items
>>> proxy.many(3)
[0, 1, 2]

>>> proxy.many(4)
[3, 4, 5, 6]

New in version 0.1.1.

all() List[S3Path][source]#

Return all remaining items in the iterator as a list.

Example:

# Create an IterProxy
>>> proxy = IterProxy(range(10))

# fetch 3 items
>>> proxy.many(3)
[0, 1, 2]

# fetch remaining
>>> proxy.all()
[3, 4, 5, 6, 7, 8, 9]

New in version 0.1.1.

skip(k: int) S3PathIterProxy[source]#

Skip the next k items in the iterator.

Example:

# Create an IterProxy
>>> proxy = IterProxy(range(10))

# skip first 3 items
>>> proxy.skip(3)

# fetch 4 items
>>> proxy.many(4)
[3, 4, 5, 6]

New in version 0.1.1.

filter_by_ext(*exts: str) S3PathIterProxy[source]#

Filter S3 objects by file extension. The match is case-insensitive.

Example:

>>> p = S3Path("bucket")
>>> for path in p.iter_objects().filter_by_ext(".csv", ".json"):
...      print(path)
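The matching logic presumably behind such an extension filter can be sketched in a few lines. This is a guess at the behavior, not the actual s3pathlib code; the helper name `match_ext` is invented:

```python
def match_ext(key: str, *exts: str) -> bool:
    """Case-insensitive extension test: does key end with any of exts?

    A hypothetical sketch of the logic behind filter_by_ext, for illustration.
    """
    lowered = key.lower()
    # Lower-case both sides so ".CSV" matches ".csv".
    return any(lowered.endswith(ext.lower()) for ext in exts)
```

For example, `match_ext("data/report.CSV", ".csv", ".json")` is True, while `match_ext("data/report.txt", ".csv", ".json")` is False.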
class s3pathlib.core.iter_objects.IterObjectsAPIMixin[source]#

A mixin class that implements the iter objects methods.

iter_objects(batch_size: int = 1000, limit: int = Sentinel('NOTHING'), encoding_type: str = Sentinel('NOTHING'), fetch_owner: bool = Sentinel('NOTHING'), start_after: str = Sentinel('NOTHING'), request_payer: str = Sentinel('NOTHING'), expected_bucket_owner: str = Sentinel('NOTHING'), recursive: bool = True, bsm: BotoSesManager | None = None) S3PathIterProxy[source]#

Recursively iterate objects under this prefix, yielding S3Path.

Assuming we have the following folder structure:

s3://my-bucket/
s3://my-bucket/README.txt
s3://my-bucket/hard-folder/ (this is a hard folder)
s3://my-bucket/hard-folder/1.txt
s3://my-bucket/soft-folder/ (this is a soft folder)
s3://my-bucket/soft-folder/2.txt

Example:

>>> s3dir = S3Path("s3://my-bucket/")
>>> s3dir.iter_objects().all()
[
    S3Path('s3://my-bucket/README.txt'),
    S3Path('s3://my-bucket/hard-folder/'),
    S3Path('s3://my-bucket/hard-folder/1.txt'),
    S3Path('s3://my-bucket/soft-folder/2.txt'),
]
Parameters:
  • batch_size – Number of S3 objects returned per paginator page; valid values are 1 to 1000. A larger number can reduce I/O.

  • limit – Total number of S3 objects to return.

  • encoding_type – See ListObjectsV2.

  • fetch_owner – See ListObjectsV2.

  • start_after – See ListObjectsV2.

  • request_payer – See ListObjectsV2.

  • expected_bucket_owner – See ListObjectsV2.

  • recursive – Default True. If True, also iterate objects in sub folders; if False, only iterate objects directly under this prefix.

  • bsm – See bsm.

New in version 1.0.1.

Changed in version 2.0.1: Remove include_folder argument. Support all list_objects_v2 arguments.

TODO: add unix glob-like syntax for pattern matching
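The interaction between batch_size and limit can be illustrated with a plain-Python simulation. This is NOT the real paginator (which calls list_objects_v2 under the hood); the function name `paginate` and its shape are invented for illustration:

```python
from typing import Iterator, List, Optional


def paginate(
    keys: List[str],
    batch_size: int = 1000,
    limit: Optional[int] = None,
) -> Iterator[List[str]]:
    """Simulate paginated listing: yield keys page by page, stopping at limit."""
    if limit is not None:
        keys = keys[:limit]
    for i in range(0, len(keys), batch_size):
        # Each inner list stands in for one list_objects_v2 response page.
        yield keys[i : i + batch_size]
```

With 2500 keys, `batch_size=1000` and `limit=2200`, this yields pages of 1000, 1000, and 200 keys: the page size bounds each request, while the limit caps the total.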

iterdir(batch_size: int = 1000, limit: int = Sentinel('NOTHING'), encoding_type: str = Sentinel('NOTHING'), fetch_owner: bool = Sentinel('NOTHING'), start_after: str = Sentinel('NOTHING'), request_payer: str = Sentinel('NOTHING'), expected_bucket_owner: str = Sentinel('NOTHING'), bsm: BotoSesManager | None = None) S3PathIterProxy[source]#

Non-recursively iterate objects and folders under this prefix, yielding S3Path.

Assuming we have the following folder structure:

s3://my-bucket/
s3://my-bucket/README.txt
s3://my-bucket/hard-folder/ (this is a hard folder)
s3://my-bucket/hard-folder/1.txt
s3://my-bucket/soft-folder/ (this is a soft folder)
s3://my-bucket/soft-folder/2.txt

Example:

>>> s3dir = S3Path("s3://my-bucket/")
>>> s3dir.iterdir().all()
[
    S3Path('s3://my-bucket/hard-folder/'),
    S3Path('s3://my-bucket/soft-folder/'),
    S3Path('s3://my-bucket/README.txt'),
]
Parameters:
  • batch_size – Number of S3 objects returned per paginator page; valid values are 1 to 1000. A larger number can reduce I/O.

  • limit – Total number of S3 objects (not folders) to return.

  • encoding_type – See ListObjectsV2.

  • fetch_owner – See ListObjectsV2.

  • start_after – See ListObjectsV2.

  • request_payer – See ListObjectsV2.

  • expected_bucket_owner – See ListObjectsV2.

  • bsm – See bsm.

New in version 1.0.6.

Changed in version 2.0.1: Support all list_objects_v2 arguments.
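iterdir's non-recursive behavior corresponds to S3 listing with `Delimiter="/"`: any key extending past the first "/" after the prefix collapses into a single folder entry (a CommonPrefixes item in the list_objects_v2 response). A rough simulation of that grouping, not the actual implementation (the helper name `list_dir` is invented):

```python
from typing import List


def list_dir(keys: List[str], prefix: str) -> List[str]:
    """Simulate Delimiter='/' listing: return immediate children of prefix."""
    children: List[str] = []
    for key in keys:
        if not key.startswith(prefix) or key == prefix:
            continue
        rest = key[len(prefix):]
        if "/" in rest:
            # Everything past the first "/" collapses into one folder entry,
            # analogous to a CommonPrefixes item in list_objects_v2.
            child = prefix + rest.split("/", 1)[0] + "/"
        else:
            child = key
        if child not in children:
            children.append(child)
    return children
```

Applied to the example keys above, `list_dir(keys, "")` yields `README.txt`, `hard-folder/`, and `soft-folder/` — `soft-folder/2.txt` surfaces only as its parent folder, matching the iterdir example.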

calculate_total_size(for_human: bool = False, include_folder: bool = False, bsm: BotoSesManager | None = None) Tuple[int, int | str][source]#

Perform the “Calculate Total Size” action in the AWS S3 console.

Assuming we have the following folder structure:

s3://my-bucket/
s3://my-bucket/README.txt
s3://my-bucket/hard-folder/ (this is a hard folder)
s3://my-bucket/hard-folder/1.txt
s3://my-bucket/soft-folder/ (this is a soft folder)
s3://my-bucket/soft-folder/2.txt

Example:

>>> s3dir = S3Path("s3://my-bucket/")
>>> s3dir.calculate_total_size()
(3, 15360) # README.txt, hard-folder/1.txt, soft-folder/2.txt
>>> s3dir.calculate_total_size(for_human=True)
(3, '15 KB') # README.txt, hard-folder/1.txt, soft-folder/2.txt
>>> s3dir.calculate_total_size(for_human=True, include_folder=True)
(4, '15 KB') # README.txt, hard-folder/, hard-folder/1.txt, soft-folder/2.txt
Parameters:
  • for_human – Default False. If True, return a human-readable string for “size”.

  • include_folder – Default False. Whether to count the hard folder (an empty “/” object).

  • bsm – See bsm.

Returns:

a tuple: the first value is the number of objects, the second value is the total size in bytes (or a human-readable string if for_human is True)

New in version 1.0.1.
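The counting and summing, plus the human-readable formatting, can be sketched as follows. This is a simplified stand-in operating on an iterable of sizes rather than live S3 objects, and the base-1024 humanizer is an assumption — the real library may format sizes differently:

```python
from typing import Iterable, Tuple, Union


def calculate_total_size(
    sizes: Iterable[int], for_human: bool = False
) -> Tuple[int, Union[int, str]]:
    """Sketch: count objects and sum their sizes, optionally humanizing the total."""
    count, total = 0, 0
    for size in sizes:
        count += 1
        total += size
    if not for_human:
        return count, total
    # Simple base-1024 humanizer; an assumption, not the library's formatter.
    value = float(total)
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if value < 1024 or unit == "TB":
            return count, f"{value:.2f} {unit}"
        value /= 1024
```

For three objects of 1024, 2048, and 12288 bytes, this returns `(3, 15360)`, or `(3, '15.00 KB')` with `for_human=True`.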

count_objects(include_folder: bool = False, bsm: BotoSesManager | None = None) int[source]#

Count how many objects are under this s3 directory.

Assuming we have the following folder structure:

s3://my-bucket/
s3://my-bucket/README.txt
s3://my-bucket/hard-folder/ (this is a hard folder)
s3://my-bucket/hard-folder/1.txt
s3://my-bucket/soft-folder/ (this is a soft folder)
s3://my-bucket/soft-folder/2.txt

Example:

>>> s3dir = S3Path("s3://my-bucket/")
>>> s3dir.count_objects()
3 # README.txt, hard-folder/1.txt, soft-folder/2.txt
>>> s3dir.count_objects(include_folder=True)
4 # README.txt, hard-folder/, hard-folder/1.txt, soft-folder/2.txt
Parameters:
  • include_folder – Default False. Whether to count the hard folder (an empty “/” object).

  • bsm – See bsm.

Returns:

an integer representing the number of objects

New in version 1.0.1.