iter_objects#
APIs related to listing objects.
- class s3pathlib.core.iter_objects.S3PathIterProxy(iterable: Iterable)[source]#
An iterator proxy utility class that provides client-side, in-memory filtering. It is highly inspired by the SQLAlchemy Result Proxy, which relies on server-side SQL filtering.
It allows client-side, in-memory filtering for an iterator that yields
S3Path
. It is a special variation of
s3pathlib.iterproxy.IterProxy
; see s3pathlib.iterproxy.IterProxy for more details.
New in version 1.0.3.
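The proxy's fetch semantics (one / many / all / skip, documented below) can be sketched with a minimal stand-in class. This is a simplified illustration only, not the library's implementation (the real class lives in s3pathlib.iterproxy):

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")

class MiniIterProxy:
    """A simplified sketch of IterProxy's fetch semantics."""

    def __init__(self, iterable: Iterable[T]):
        self._iterator: Iterator[T] = iter(iterable)

    def one(self) -> T:
        # Raises StopIteration when the iterator is exhausted.
        return next(self._iterator)

    def one_or_none(self) -> Optional[T]:
        # Like one(), but returns None instead of raising.
        return next(self._iterator, None)

    def many(self, k: int) -> List[T]:
        # Return the next k items (fewer if the iterator runs out).
        return list(islice(self._iterator, k))

    def all(self) -> List[T]:
        # Drain everything that remains.
        return list(self._iterator)

    def skip(self, k: int) -> "MiniIterProxy":
        # Consume and discard the next k items, then return self for chaining.
        for _ in islice(self._iterator, k):
            pass
        return self

proxy = MiniIterProxy(range(10))
print(proxy.one())          # 0
print(proxy.many(3))        # [1, 2, 3]
print(proxy.skip(2).all())  # [6, 7, 8, 9]
```

Because every fetch method consumes the same underlying iterator, calls compose: each method picks up exactly where the previous one stopped.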
- one() S3Path [source]#
Return one item from the iterator.
Example:
# create an IterProxy
>>> proxy = IterProxy(range(10))

# fetch one
>>> proxy.one()
0

# fetch another one
>>> proxy.one()
1
See also:
New in version 0.1.1.
- one_or_none() S3Path | None [source]#
Return one item from the iterator. If nothing left in the iterator, it returns None.
Example:
# create an IterProxy
>>> proxy = IterProxy(range(10))

# iterate all items
>>> for i in proxy:
...     print(i)

# fetch one or none
>>> proxy.one_or_none()
None

>>> proxy.one()
StopIteration
See also:
New in version 0.1.1.
- many(k: int) List[S3Path] [source]#
Return the next k items from the iterator as a list.
Example:
# create an IterProxy
>>> proxy = IterProxy(range(10))

# fetch 3 items
>>> proxy.many(3)
[0, 1, 2]

>>> proxy.many(4)
[3, 4, 5, 6]
See also:
New in version 0.1.1.
- all() List[S3Path] [source]#
Return all remaining items in the iterator as a list.
Example:
# create an IterProxy
>>> proxy = IterProxy(range(10))

# fetch 3 items
>>> proxy.many(3)
[0, 1, 2]

# fetch the remaining items
>>> proxy.all()
[3, 4, 5, 6, 7, 8, 9]
See also:
New in version 0.1.1.
- skip(k: int) S3PathIterProxy [source]#
Skip the next k items.
Example:
# create an IterProxy
>>> proxy = IterProxy(range(10))

# skip the first 3 items
>>> proxy.skip(3)

# fetch 4 items
>>> proxy.many(4)
[3, 4, 5, 6]
See also:
New in version 0.1.1.
- filter_by_ext(*exts: str) S3PathIterProxy [source]#
Filter S3 objects by file extension. The match is case-insensitive.
Example:
>>> p = S3Path("bucket")
>>> for path in p.iter_objects().filter_by_ext(".csv", ".json"):
...     print(path)
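The case-insensitive extension match can be sketched as a plain generator over key strings (a simplified illustration; the real method filters S3Path objects by their key):

```python
from typing import Iterable, Iterator

def filter_by_ext(keys: Iterable[str], *exts: str) -> Iterator[str]:
    """Yield keys whose extension matches any of ``exts``, ignoring case."""
    # Lowercase the wanted extensions once; str.endswith accepts a tuple.
    lowered = tuple(ext.lower() for ext in exts)
    for key in keys:
        if key.lower().endswith(lowered):
            yield key

keys = ["a.CSV", "b.json", "c.txt", "d.Json"]
print(list(filter_by_ext(keys, ".csv", ".json")))
# ['a.CSV', 'b.json', 'd.Json']
```

Lowercasing both sides is what makes `.CSV` and `.Json` match `.csv` and `.json`.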
- class s3pathlib.core.iter_objects.IterObjectsAPIMixin[source]#
A mixin class that implements the iter objects methods.
- iter_objects(batch_size: int = 1000, limit: int = Sentinel('NOTHING'), encoding_type: str = Sentinel('NOTHING'), fetch_owner: bool = Sentinel('NOTHING'), start_after: str = Sentinel('NOTHING'), request_payer: str = Sentinel('NOTHING'), expected_bucket_owner: str = Sentinel('NOTHING'), recursive: bool = True, bsm: BotoSesManager | None = None) S3PathIterProxy [source]#
Recursively iterate objects under this prefix, yielding
S3Path
.
Assuming we have the following folder structure:
s3://my-bucket/
s3://my-bucket/README.txt
s3://my-bucket/hard-folder/ (this is a hard folder)
s3://my-bucket/hard-folder/1.txt
s3://my-bucket/soft-folder/ (this is a soft folder)
s3://my-bucket/soft-folder/2.txt
Example:
>>> s3dir = S3Path("s3://my-bucket/")
>>> s3dir.iter_objects().all()
[
    S3Path('s3://my-bucket/README.txt'),
    S3Path('s3://my-bucket/hard-folder/'),
    S3Path('s3://my-bucket/hard-folder/1.txt'),
    S3Path('s3://my-bucket/soft-folder/2.txt'),
]
- Parameters:
batch_size – Number of S3 objects returned per paginator request; valid values are 1 to 1000. A larger value reduces the number of API calls.
limit – Total number of S3 objects to return.
encoding_type – See ListObjectsV2.
fetch_owner – See ListObjectsV2.
start_after – See ListObjectsV2.
request_payer – See ListObjectsV2.
expected_bucket_owner – See ListObjectsV2.
recursive – if True (default), include objects in sub-folders; if False, only iterate objects directly under this prefix.
bsm – See bsm.
New in version 1.0.1.
Changed in version 2.0.1: Removed the
include_folder
argument. Supports all list_objects_v2 arguments.
TODO: add unix glob-like syntax for pattern matching
- iterdir(batch_size: int = 1000, limit: int = Sentinel('NOTHING'), encoding_type: str = Sentinel('NOTHING'), fetch_owner: bool = Sentinel('NOTHING'), start_after: str = Sentinel('NOTHING'), request_payer: str = Sentinel('NOTHING'), expected_bucket_owner: str = Sentinel('NOTHING'), bsm: BotoSesManager | None = None) S3PathIterProxy [source]#
Iterate objects and folders under this prefix non-recursively, yielding
S3Path
.
Assuming we have the following folder structure:
s3://my-bucket/
s3://my-bucket/README.txt
s3://my-bucket/hard-folder/ (this is a hard folder)
s3://my-bucket/hard-folder/1.txt
s3://my-bucket/soft-folder/ (this is a soft folder)
s3://my-bucket/soft-folder/2.txt
Example:
>>> s3dir = S3Path("s3://my-bucket/")
>>> s3dir.iterdir().all()
[
    S3Path('s3://my-bucket/hard-folder/'),
    S3Path('s3://my-bucket/soft-folder/'),
    S3Path('s3://my-bucket/README.txt'),
]
- Parameters:
batch_size – Number of S3 objects returned per paginator request; valid values are 1 to 1000. A larger value reduces the number of API calls.
limit – Total number of S3 objects (not folders) to return.
encoding_type – See ListObjectsV2.
fetch_owner – See ListObjectsV2.
start_after – See ListObjectsV2.
request_payer – See ListObjectsV2.
expected_bucket_owner – See ListObjectsV2.
bsm – See bsm.
New in version 1.0.6.
Changed in version 2.0.1: Support all list_objects_v2 arguments.
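Non-recursive listing behaves like list_objects_v2 with a "/" delimiter: keys more than one level below the prefix are collapsed into a single folder entry. A simplified sketch over plain key strings (an illustration under that assumption, not the library's implementation):

```python
from typing import Iterable, List

def iterdir(keys: Iterable[str], prefix: str = "") -> List[str]:
    """Collapse keys under ``prefix`` into direct children, like Delimiter='/'."""
    children: List[str] = []
    seen = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        slash = rest.find("/")
        # Anything deeper than one level collapses into its folder entry.
        child = rest if slash == -1 else rest[: slash + 1]
        if child and child not in seen:
            seen.add(child)
            children.append(prefix + child)
    return children

keys = [
    "README.txt",
    "hard-folder/",
    "hard-folder/1.txt",
    "soft-folder/2.txt",
]
print(iterdir(keys))
# ['README.txt', 'hard-folder/', 'soft-folder/']
```

Note that "soft-folder/" appears as a child even though no empty "/" object exists for it: the common prefix is derived from its descendant keys, which is why soft folders show up in iterdir() output.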
- calculate_total_size(for_human: bool = False, include_folder: bool = False, bsm: BotoSesManager | None = None) Tuple[int, int | str] [source]#
Perform the “Calculate Total Size” action as seen in the AWS S3 console.
Assuming we have the following folder structure:
s3://my-bucket/
s3://my-bucket/README.txt
s3://my-bucket/hard-folder/ (this is a hard folder)
s3://my-bucket/hard-folder/1.txt
s3://my-bucket/soft-folder/ (this is a soft folder)
s3://my-bucket/soft-folder/2.txt
Example:
>>> s3dir = S3Path("s3://my-bucket/")
>>> s3dir.calculate_total_size()
(3, 15360)  # README.txt, hard-folder/1.txt, soft-folder/2.txt

>>> s3dir.calculate_total_size(for_human=True)
(3, '15 KB')  # README.txt, hard-folder/1.txt, soft-folder/2.txt

>>> s3dir.calculate_total_size(for_human=True, include_folder=True)
(4, '15 KB')  # README.txt, hard-folder/, hard-folder/1.txt, soft-folder/2.txt
- Parameters:
for_human – Default False. If True, return a human-readable string for “size”.
include_folder – Default False. Whether to count the hard folder (an empty “/” object).
bsm – See bsm.
- Returns:
a tuple; the first value is the number of objects, the second value is the total size in bytes (or a human-readable string if for_human is True)
New in version 1.0.1.
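The action can be sketched as a fold over (key, size) pairs, skipping empty “/” folder objects unless include_folder is set. A simplified illustration with hypothetical data, not the library's implementation:

```python
from typing import Iterable, Tuple

def calculate_total_size(
    objects: Iterable[Tuple[str, int]],
    include_folder: bool = False,
) -> Tuple[int, int]:
    """Return (object count, total size in bytes) over (key, size) pairs."""
    count, total = 0, 0
    for key, size in objects:
        # A "hard folder" is an empty object whose key ends with "/".
        if key.endswith("/") and not include_folder:
            continue
        count += 1
        total += size
    return count, total

# Hypothetical sizes matching the folder structure above (5 KB each file).
objects = [
    ("README.txt", 5120),
    ("hard-folder/", 0),
    ("hard-folder/1.txt", 5120),
    ("soft-folder/2.txt", 5120),
]
print(calculate_total_size(objects))                       # (3, 15360)
print(calculate_total_size(objects, include_folder=True))  # (4, 15360)
```

Including the folder object changes the count but not the size, since a hard folder is a zero-byte object.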
- count_objects(include_folder: bool = False, bsm: BotoSesManager | None = None) int [source]#
Count how many objects are under this s3 directory.
Assuming we have the following folder structure:
s3://my-bucket/
s3://my-bucket/README.txt
s3://my-bucket/hard-folder/ (this is a hard folder)
s3://my-bucket/hard-folder/1.txt
s3://my-bucket/soft-folder/ (this is a soft folder)
s3://my-bucket/soft-folder/2.txt
Example:
>>> s3dir = S3Path("s3://my-bucket/")
>>> s3dir.count_objects()
3  # README.txt, hard-folder/1.txt, soft-folder/2.txt

>>> s3dir.count_objects(include_folder=True)
4  # README.txt, hard-folder/, hard-folder/1.txt, soft-folder/2.txt
- Parameters:
include_folder – Default False. Whether to count the hard folder (an empty “/” object).
bsm – See bsm.
- Returns:
an integer representing the number of objects
New in version 1.0.1.