Mixin Columns¶
astropy tables support the concept of a “mixin column” in tables, which
allows integration of appropriate non-Column based class objects within a
Table object. These mixin column objects are not converted in any way but are
used natively.
The available built-in mixin column classes are:
Example¶
As an example we can create a table and add a time column:
>>> from astropy.table import Table
>>> from astropy.time import Time
>>> t = Table()
>>> t['index'] = [1, 2]
>>> t['time'] = Time(['2001-01-02T12:34:56', '2001-02-03T00:01:02'])
>>> print(t)
index time
----- -----------------------
1 2001-01-02T12:34:56.000
2 2001-02-03T00:01:02.000
The important point here is that the time column is a bona fide Time
object:
>>> t['time']
<Time object: scale='utc' format='isot' value=['2001-01-02T12:34:56.000' '2001-02-03T00:01:02.000']>
>>> t['time'].mjd
array([51911.52425926, 51943.00071759])
Quantity and QTable¶
The ability to natively handle Quantity objects within a table makes it more
convenient to manipulate tabular data with units in a natural and robust way.
However, this feature introduces an ambiguity because data with a unit
(e.g., from a FITS binary table) can be represented as either a Column with a
unit attribute or as a Quantity object. In order to cleanly resolve this
ambiguity, astropy defines a minor variant of the Table class called
QTable. The QTable class is exactly the same as Table except that
Quantity is the default for any data column with a defined unit.
If you take advantage of the Quantity infrastructure in your analysis, then
QTable is the preferred way to create tables with units. If instead you use
table column units more as a descriptive label, then the plain Table class is
probably the best class to use.
Example¶
To illustrate these concepts we first create a standard Table where we supply
as input a Time object and a Quantity object with units of m / s. In
this case the quantity is converted to a Column (which has a unit
attribute but does not have all of the features of a Quantity):
>>> import astropy.units as u
>>> t = Table()
>>> t['index'] = [1, 2]
>>> t['time'] = Time(['2001-01-02T12:34:56', '2001-02-03T00:01:02'])
>>> t['velocity'] = [3, 4] * u.m / u.s
>>> print(t)
index time velocity
m / s
----- ----------------------- --------
1 2001-01-02T12:34:56.000 3.0
2 2001-02-03T00:01:02.000 4.0
>>> type(t['velocity'])
<class 'astropy.table.column.Column'>
>>> t['velocity'].unit
Unit("m / s")
>>> (t['velocity'] ** 2).unit # WRONG because Column is not smart about unit
Unit("m / s")
So instead let’s do the same thing using a quantity table QTable:
>>> from astropy.table import QTable
>>> qt = QTable()
>>> qt['index'] = [1, 2]
>>> qt['time'] = Time(['2001-01-02T12:34:56', '2001-02-03T00:01:02'])
>>> qt['velocity'] = [3, 4] * u.m / u.s
The velocity column is now a Quantity and behaves accordingly:
>>> type(qt['velocity'])
<class 'astropy.units.quantity.Quantity'>
>>> qt['velocity'].unit
Unit("m / s")
>>> (qt['velocity'] ** 2).unit # GOOD!
Unit("m2 / s2")
You can conveniently convert Table to QTable and vice-versa:
>>> qt2 = QTable(t)
>>> type(qt2['velocity'])
<class 'astropy.units.quantity.Quantity'>
>>> t2 = Table(qt2)
>>> type(t2['velocity'])
<class 'astropy.table.column.Column'>
Note
To summarize: the only difference between QTable and
Table is the behavior when adding a column that has a
specified unit. With QTable such a column is always
converted to a Quantity object before being added to the
table. Likewise if a unit is specified for an existing unit-less
Column in a QTable, then the column is
converted to Quantity.
The converse is that if you add a Quantity column to an
ordinary Table then it gets converted to an ordinary
Column with the corresponding unit attribute.
Mixin Attributes¶
The usual column attributes name, dtype, unit, format, and
description are available in any mixin column via the info property:
>>> qt['velocity'].info.name
'velocity'
This info property is a key bit of glue that allows a non-Column object to
behave much like a column.
The same info property is also available in standard
Column objects. These info attributes like
t['a'].info.name refer to the direct Column
attribute (e.g., t['a'].name) and can be used interchangeably.
Likewise in a Quantity object, info.dtype
attribute refers to the native dtype attribute of the object.
Note
When writing generalized code that handles column objects which
might be mixin columns, you must always use the info
property to access column attributes.
Details and Caveats¶
Most common table operations behave as expected when mixin columns are part of the table. However, there are limitations in the current implementation.
Adding or inserting a row
Adding or inserting a row works as expected only for mixin classes that are
mutable (data can be changed internally) and that have an insert() method.
Quantity and Time support insert() but, for example, SkyCoord does
not. If you tried to insert a row into a table with a SkyCoord column then
an exception like the following would occur:
ValueError: Unable to insert row because of exception in column 'skycoord':
'SkyCoord' object has no attribute 'insert'
Initializing from a list of rows or a list of dicts
This mode of initializing a table does not work with mixin columns, so both of the following will fail:
>>> qt = QTable([{'a': 1 * u.m, 'b': 2},
... {'a': 2 * u.m, 'b': 3}])
Traceback (most recent call last):
...
TypeError: only dimensionless scalar quantities can be converted to Python scalars
>>> qt = QTable(rows=[[1 * u.m, 2],
... [2 * u.m, 3]])
Traceback (most recent call last):
...
TypeError: only dimensionless scalar quantities can be converted to Python scalars
The problem lies in knowing if and how to assemble the individual elements
for each column into an appropriate mixin column. The current code uses
numpy to perform this function on numerical or string types, but it
does not handle mixin column types like Quantity or SkyCoord.
Masking
Mixin columns do not generally support masking (with the exception of Time),
but there is limited support for use of mixins within a masked table. In this
case a mask attribute is assigned to the mixin column object. This mask
is a special object that is a boolean array of False corresponding to the
mixin data shape. The mask looks like a normal numpy array but an
exception will be raised if True is assigned to any element. The
consequences of the limitation are most apparent in the high-level table
operations.
High-level table operations
The table below gives a summary of support for high-level operations on tables that contain mixin columns:
Operation |
Support |
|---|---|
Not implemented yet, but no fundamental limitation. |
|
Available for |
|
Works if output mixin column supports masking or if no masking is required. |
|
Works if output mixin column supports masking or if no masking is
required; key columns must be subclasses of |
|
Not implemented yet, uses grouped operations. |
ASCII table writing
Tables with mixin columns can be written out to file using the
astropy.io.ascii module, but the fast C-based writers are not available.
Instead, the pure-Python writers will be used. For writing tables with mixin
columns it is recommended to use the 'ecsv' ASCII format. This will fully
serialize the table data and metadata, allowing full “round-trip” of the table
when it is read back. See ECSV Format for details.
Binary table writing
Starting with `astropy 3.0, tables with mixin columns can be written in
binary format to file using both FITS and HDF5 formats. These can be read back
to recover exactly the original Table including mixin columns and metadata.
See Unified File Read/Write Interface for details.
Mixin Protocol¶
A key idea behind mixin columns is that any class which satisfies a specified
protocol can be used. That means many user-defined class objects which handle
array-like data can be used natively within a Table. The protocol is
relatively concise and requires that a class behave like a minimal numpy
array with the following properties:
Contains array-like data.
Implements
__getitem__to support getting data as a single item, slicing, or index array access.Has a
shapeattribute.Has a
__len__method for length.Has an
infoclass descriptor which is a subclass of theastropy.utils.data_info.MixinInfoclass.
The Example: ArrayWrapper section shows a minimal working example of a class which can be used as a mixin column. A pandas.Series object can function as a mixin column as well.
Other interesting possibilities for mixin columns include:
Columns which are dynamically computed as a function of other columns (AKA spreadsheet).
Columns which are themselves a
Table(i.e., nested tables). A proof of concept is available.
new_like() method¶
In order to support high-level operations like join and
vstack, a mixin class must provide a new_like() method
in the info class descriptor. A key part of the functionality is to ensure
that the input column metadata are merged appropriately and that the columns
have consistent properties such as the shape.
A mixin class that provides new_like() must also implement __setitem__
to support setting via a single item, slicing, or index array.
The new_like method has the following signature:
def new_like(self, cols, length, metadata_conflicts='warn', name=None):
"""
Return a new instance of this class which is consistent with the
input ``cols`` and has ``length`` rows.
This is intended for creating an empty column object whose elements can
be set in-place for table operations like join or vstack.
Parameters
----------
cols : list
List of input columns
length : int
Length of the output column object
metadata_conflicts : str ('warn'|'error'|'silent')
How to handle metadata conflicts
name : str
Output column name
Returns
-------
col : object
New instance of this class consistent with ``cols``
"""
Examples of this are found in the ColumnInfo and
QuantityInfo classes.
Example: ArrayWrapper¶
The code listing below shows an example of a data container class which acts as
a mixin column class. This class is a wrapper around a numpy array. It
is used in the astropy mixin test suite and is fully compliant as a mixin
column.
from astropy.utils.data_info import ParentDtypeInfo
class ArrayWrapper(object):
"""
Minimal mixin using a simple wrapper around a numpy array
"""
info = ParentDtypeInfo()
def __init__(self, data):
self.data = np.array(data)
if 'info' in getattr(data, '__dict__', ()):
self.info = data.info
def __getitem__(self, item):
if isinstance(item, (int, np.integer)):
out = self.data[item]
else:
out = self.__class__(self.data[item])
if 'info' in self.__dict__:
out.info = self.info
return out
def __setitem__(self, item, value):
self.data[item] = value
def __len__(self):
return len(self.data)
@property
def dtype(self):
return self.data.dtype
@property
def shape(self):
return self.data.shape
def __repr__(self):
return ("<{0} name='{1}' data={2}>"
.format(self.__class__.__name__, self.info.name, self.data))