datalad_next.itertools.itemize
- datalad_next.itertools.itemize(iterable: Iterable[T], sep: T | None, *, keep_ends: bool = False) -> Generator[T, None, None]
Yields complete items (only), assembled from an iterable
This function consumes chunks from an iterable and yields items defined by a separator. An item might span multiple input chunks. Input chunks can be bytes, bytearray, or str objects. The result type is determined by the type of the first input chunk. The type of the elements in iterable must not change while the function runs.
Items are defined by a separator given via sep. If sep is None, the line separators built into str.splitlines() are used, and each yielded item will be a line. If sep is not None, its type must be compatible with the type of the elements in iterable.
A separator could, for example, be b'\n', in which case the items would be terminated by Unix line endings, i.e. each yielded item is a single line. The separator could also be b'\x00' (or '\x00'), to split zero-byte delimited content, like the output of git ls-files -z.
Separators can be longer than one byte or character, e.g. b'\r\n', or b'\n-------------------\n'.
Content after the last separator, possibly merged across input chunks, is always yielded as the last item, even if it is not terminated by the separator.
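How items are assembled across chunk boundaries can be illustrated with a minimal sketch. This is not the datalad_next implementation: itemize_sketch is a hypothetical helper that handles only an explicit separator (not sep=None), and it assumes that an empty remainder after the final separator is not yielded.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T", str, bytes, bytearray)


def itemize_sketch(chunks: Iterable[T], sep: T, keep_ends: bool = False) -> Iterator[T]:
    """Hypothetical helper: yield items delimited by sep (illustration only)."""
    buffer = None
    for chunk in chunks:
        # Prepend whatever was left over from earlier chunks.
        buffer = chunk if buffer is None else buffer + chunk
        parts = buffer.split(sep)
        # The last part has no terminating separator yet; keep it for later.
        buffer = parts.pop()
        for part in parts:
            yield part + sep if keep_ends else part
    # Content after the last separator becomes the final item.
    # Assumption of this sketch: an empty remainder is not yielded.
    if buffer:
        yield buffer


# Zero-byte delimited input, with one item spanning two chunks.
chunks = [b"a.txt\x00b.", b"txt\x00c.txt"]
print(list(itemize_sketch(chunks, b"\x00")))
# [b'a.txt', b'b.txt', b'c.txt']
print(list(itemize_sketch(chunks, b"\x00", keep_ends=True)))
# [b'a.txt\x00', b'b.txt\x00', b'c.txt']
```

Because the unfinished tail is carried over into the next chunk before splitting, a multi-character separator such as '\r\n' is still found when it is split across two chunks.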
Performance notes:
Using None as a separator (splitlines mode) is slower than providing a specific separator.
If a separator other than None is used, running with keep_ends=False is faster than with keep_ends=True.
- Parameters:
iterable (Iterable[str | bytes | bytearray]) -- The iterable that yields the input data
sep (str | bytes | bytearray | None) -- The separator that defines items. If None, the items are determined by the line separators that are built into str.splitlines().
keep_ends (bool) -- If True, the item separator will remain at the end of a yielded item. If False, items will not contain the separator. Preserving separators implies a runtime cost, unless the separator is None.
- Yields:
str | bytes | bytearray -- The items determined from the input iterable. The type of the yielded items depends on the type of the first element in iterable.
Examples
>>> from datalad_next.itertools import itemize
>>> with open('/etc/passwd', 'rt') as f:
...     print(tuple(itemize(iter(f.read, ''), sep=None))[0:2])
('root:x:0:0:root:/root:/bin/bash', 'systemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin')
>>> with open('/etc/passwd', 'rt') as f:
...     print(tuple(itemize(iter(f.read, ''), sep=':'))[0:10])
('root', 'x', '0', '0', 'root', '/root', '/bin/bash\nsystemd-timesync', 'x', '497', '497')
>>> with open('/etc/passwd', 'rt') as f:
...     print(tuple(itemize(iter(f.read, ''), sep=':', keep_ends=True))[0:10])
('root:', 'x:', '0:', '0:', 'root:', '/root:', '/bin/bash\nsystemd-timesync:', 'x:', '497:', '497:')
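The splitlines mode (sep=None) can be approximated with plain str.splitlines() calls. The following is a rough sketch under stated assumptions, not the library's code: itemize_lines_sketch is a hypothetical helper, and it treats a final line as unfinished when splitting with and without line endings gives the same text. One known limitation of this sketch is that a '\r\n' ending split across two chunks is seen as two separate line endings.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T", str, bytes, bytearray)


def itemize_lines_sketch(chunks: Iterable[T], keep_ends: bool = False) -> Iterator[T]:
    """Hypothetical helper: yield complete lines from chunked input."""
    buffer = None
    for chunk in chunks:
        # Prepend the unfinished line left over from earlier chunks.
        buffer = chunk if buffer is None else buffer + chunk
        with_ends = buffer.splitlines(keepends=True)
        if not with_ends:
            continue  # nothing but empty input so far
        without_ends = buffer.splitlines()
        # A last line that looks the same with and without its ending has
        # no line ending yet, so it may continue in the next chunk.
        if with_ends[-1] == without_ends[-1]:
            buffer = with_ends.pop()
            without_ends.pop()
        else:
            buffer = None
        yield from (with_ends if keep_ends else without_ends)
    # Whatever remains is the last, unterminated line.
    if buffer:
        yield buffer


chunks = ["ab\ncd", "ef\r\ngh"]
print(list(itemize_lines_sketch(chunks)))
# ['ab', 'cdef', 'gh']
print(list(itemize_lines_sketch(chunks, keep_ends=True)))
# ['ab\n', 'cdef\r\n', 'gh']
```

Splitting twice per chunk avoids having to enumerate every line ending that str.splitlines() recognizes, at the cost of extra work for each chunk.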