Metadata-Version: 2.4
Name: unicode-rbnf
Version: 2.4.0
Summary: Rule-based number formatting using Unicode CLDR data
Author-email: Michael Hansen <mike@rhasspy.org>
License: MIT
Project-URL: Source Code, https://github.com/rhasspy/unicode-rbnf
Keywords: rbnf,unicode,number,format
Platform: any
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing :: Linguistic
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
License-File: LICENSE.md
Provides-Extra: dev
Requires-Dist: black==24.8.0; extra == "dev"
Requires-Dist: build==1.2.2; extra == "dev"
Requires-Dist: flake8==7.1.1; extra == "dev"
Requires-Dist: mypy==1.14.0; extra == "dev"
Requires-Dist: pylint==3.2.7; extra == "dev"
Requires-Dist: pytest==8.3.4; extra == "dev"
Dynamic: license-file

# Unicode RBNF

A pure Python implementation of [rule-based number formatting](https://icu-project.org/docs/papers/a_rule_based_approach_to_number_spellout/) (RBNF) using the [Unicode Common Locale Data Repository](https://cldr.unicode.org) (CLDR).

This lets you spell out numbers in many locales:

``` python
from unicode_rbnf import RbnfEngine

engine = RbnfEngine.for_language("en")
assert engine.format_number(1234).text == "one thousand two hundred thirty-four"
```

Different formatting purposes are supported as well, depending on the locale:

``` python
from unicode_rbnf import RbnfEngine, FormatPurpose

engine = RbnfEngine.for_language("en")
assert engine.format_number(1999, FormatPurpose.CARDINAL).text == "one thousand nine hundred ninety-nine"
assert engine.format_number(1999, FormatPurpose.YEAR).text == "nineteen ninety-nine"
assert engine.format_number(11, FormatPurpose.ORDINAL).text == "eleventh"
```

For locales with multiple genders, cases, etc., the different texts are accessible in the result of `format_number`:

``` python
from unicode_rbnf import RbnfEngine

engine = RbnfEngine.for_language("de")
print(engine.format_number(1))
```

Result:

```
FormatResult(
  text='eins',
  text_by_ruleset={
    'spellout-numbering': 'eins',
    'spellout-cardinal-neuter': 'ein',
    'spellout-cardinal-masculine': 'ein',
    'spellout-cardinal-feminine': 'eine',
    'spellout-cardinal-n': 'einen',
    'spellout-cardinal-r': 'einer',
    'spellout-cardinal-s': 'eines',
    'spellout-cardinal-m': 'einem'
  }
)
```

The `text` property of the result holds the text of the ruleset with the shortest name (least specific).
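
The selection rule described above can be sketched without the library. This is only an illustration of "shortest ruleset name wins", not the library's actual code: the helper `least_specific_text` is hypothetical, and the dictionary is copied from the German result shown earlier.

```python
# Ruleset texts copied from the German example output for the number 1.
text_by_ruleset = {
    "spellout-numbering": "eins",
    "spellout-cardinal-neuter": "ein",
    "spellout-cardinal-masculine": "ein",
    "spellout-cardinal-feminine": "eine",
    "spellout-cardinal-n": "einen",
    "spellout-cardinal-r": "einer",
    "spellout-cardinal-s": "eines",
    "spellout-cardinal-m": "einem",
}


def least_specific_text(texts: dict) -> str:
    """Hypothetical helper: return the text of the ruleset with the shortest name."""
    shortest_name = min(texts, key=len)  # iterates over the ruleset names
    return texts[shortest_name]


print(least_specific_text(text_by_ruleset))  # eins
```

Here `spellout-numbering` is the shortest name, so its text matches the `text` value in the result above. How the library breaks ties between names of equal length is not covered by this sketch.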

## Supported locales

See: https://github.com/unicode-org/cldr/tree/release-44/common/rbnf

## Engine implementation

Not [all features](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classRuleBasedNumberFormat.html) of the RBNF engine are implemented. The following features are available:

* Literal text (`hundred`)
* Quotient substitution (`<<` or `←←`)
* Remainder substitution (`>>` or `→→`)
* Optional substitution (`[...]`)
* Rule substitution (`←%ruleset_name←`)
* Rule replacement (`=%ruleset_name=`)
* Special rules:
    * Negative numbers (`-x`)
    * Improper fractions (`x.x`)
    * Not a number (`NaN`)
    * Infinity (`Inf`)
    
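To make the substitution features listed above more concrete, here is a minimal toy sketch of how literal text, quotient substitution, remainder substitution, optional text, and the negative-number rule can combine. It does not use this library or real CLDR data: the rule table, the `(base value, divisor, rule text)` layout, and the `spell` function are all hypothetical and cover only English integers with an absolute value below one million.

```python
from bisect import bisect_right

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

# Hypothetical toy rules: (base value, divisor, rule text), sorted by base value.
RULES = [(value, 1, word) for value, word in enumerate(ONES)]
RULES += [(20 + 10 * i, 10, word + "[-→→]") for i, word in enumerate(TENS)]
RULES += [(100, 100, "←← hundred[ →→]"), (1000, 1000, "←← thousand[ →→]")]
BASES = [base for base, _, _ in RULES]


def spell(number: int) -> str:
    """Spell out an integer with an absolute value below one million."""
    if number < 0:
        # Stand-in for the negative number rule (-x).
        return "minus " + spell(-number)

    # Pick the rule with the largest base value that does not exceed the number.
    _, divisor, text = RULES[bisect_right(BASES, number) - 1]
    quotient, remainder = divmod(number, divisor)

    # Optional substitution: keep the bracketed part only if there is a remainder.
    if "[" in text:
        start, end = text.index("["), text.index("]")
        optional = text[start + 1:end] if remainder else ""
        text = text[:start] + optional + text[end + 1:]

    # Quotient and remainder substitution, each formatted recursively.
    if "←←" in text:
        text = text.replace("←←", spell(quotient))
    if "→→" in text:
        text = text.replace("→→", spell(remainder))
    return text


print(spell(1234))  # one thousand two hundred thirty-four
print(spell(-42))   # minus forty-two
```

The real engine reads its rules from the CLDR files and supports named rulesets, so treat this only as a picture of the idea: each rule divides the number, formats the quotient and remainder with the same rules, and drops the bracketed text when the remainder is zero.
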
Some features still need to be added:

* Proper fraction rules (`0.x`)
* Preceding remainder substitution (`>>>` or `→→→`)
* Number format strings (`==`)
* Decimal format patterns (`#,##0.00`)
* Plural replacements (`$(ordinal,one{st}...)`)
