Interfacing with the Pandas Package
The pandas package is a package for high-performance data analysis of table-like structures that is complementary to the Table class in astropy.
In order to exchange data between the Table class and the pandas DataFrame class (the main data structure in pandas), the Table class includes two methods, to_pandas() and from_pandas().
Example
To demonstrate, we can create a minimal table:
>>> from astropy.table import Table
>>> t = Table()
>>> t['a'] = [1, 2, 3, 4]
>>> t['b'] = ['a', 'b', 'c', 'd']
We can then convert this table to a pandas DataFrame:
>>> df = t.to_pandas()
>>> df
   a  b
0  1  a
1  2  b
2  3  c
3  4  d
>>> type(df)
<class 'pandas.core.frame.DataFrame'>
It is also possible to create a table from a DataFrame:
>>> t2 = Table.from_pandas(df)
>>> t2
<Table length=4>
  a      b
int64 string8
----- -------
    1       a
    2       b
    3       c
    4       d
The conversions to and from pandas are subject to the following caveats:
The pandas DataFrame structure does not support multidimensional columns, so Table objects with multidimensional columns cannot be converted to DataFrame.
Masked tables can be converted, but DataFrame uses numpy.nan to indicate masked values, so all numerical columns (integer or float) are converted to numpy.float columns in DataFrame, and string columns with missing values are converted to object columns with numpy.nan values to indicate missing values. For numerical columns, the conversion therefore does not necessarily round-trip when converting back to an astropy table, because the distinction between numpy.nan and masked values is lost, and integer columns (for example) come back as floating-point columns.
Tables with mixin columns cannot currently be converted, but this may be implemented in the future.
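The masked-value caveat can be illustrated with a minimal sketch that uses plain NumPy masked arrays. It does not call to_pandas() or from_pandas(); it only mimics the NaN-fill step described above, on the assumption that masked entries are replaced by numpy.nan and that every NaN is treated as masked on the way back. The helper names fill_with_nan and remask_nan are hypothetical and are not part of astropy or pandas.

```python
import numpy as np

# An integer column with one masked entry (index 1).
ints = np.ma.MaskedArray([1, 2, 3], mask=[False, True, False])

# A float column holding a genuine NaN (index 0) and one masked entry (index 2).
floats = np.ma.MaskedArray([np.nan, 2.0, 3.0], mask=[False, False, True])


def fill_with_nan(column):
    """Hypothetical helper: cast to float and replace masked entries with NaN."""
    return column.astype(float).filled(np.nan)


def remask_nan(values):
    """Hypothetical helper: mask every NaN when converting back."""
    return np.ma.masked_invalid(values)


ints_back = remask_nan(fill_with_nan(ints))
floats_back = remask_nan(fill_with_nan(floats))

# The integer column comes back as a floating-point column.
print(ints.dtype, "->", ints_back.dtype)

# The genuine NaN at index 0 is now indistinguishable from a masked value.
print(floats.mask.tolist(), "->", floats_back.mask.tolist())
```

In this sketch the integer column returns with a floating-point dtype, and the float column returns with two masked entries instead of one, which are the two ways the round trip can fail to preserve the original table.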