read_csv
- astropy.io.misc.pyarrow.csv.read_csv(input_file, *, delimiter=',', quotechar='"', doublequote=True, escapechar=False, header_start=0, data_start=None, names=None, include_names=None, dtypes=None, comment=None, null_values=None, encoding='utf-8', newlines_in_values=False, timestamp_parsers=None)
Read a CSV file into an astropy Table using PyArrow.
This function allows highly performant reading of text CSV files into an astropy Table using PyArrow. The best performance is achieved for files with only numeric data types, but even for files with mixed data types, the performance is still better than the standard astropy.io.ascii fast CSV reader.

By default, empty values (zero-length string “”) in the CSV file are read as masked values in the Table. This can be changed by using the null_values parameter to specify a list of strings to interpret as null (masked) values.

Entirely empty lines in the CSV file are ignored.

Columns consisting of only string values True and False are parsed as boolean data.

Columns with ISO 8601 date/time strings are parsed as shown below:
- 12:13:14.123456: object[datetime.time]
- 2025-01-01: np.datetime64[D]
- 2025-01-01T01:02:03: np.datetime64[s]
- 2025-01-01T01:02:03.123456: np.datetime64[ns]

Support for ignoring comment lines in the CSV file is provided by the comment parameter. If this is set to a string, any line starting with optional whitespace and then this string is ignored. This is done by reading the entire file and scanning for comment lines. If the comment lines are all at the beginning of the file and neither header_start nor data_start is specified, then the file is read efficiently by setting header_start to the first line after the comments. Otherwise the entire file is read into memory and the comment lines are removed before passing to the PyArrow CSV reader. Any values of header_start and data_start apply to the line counts after the comment lines have been removed.

- Parameters:
- input_file: str, path-like object, or file-like object
File path or binary file-like object to read from.
- delimiter: 1-character str, optional (default “,”)
Character delimiting individual cells in the CSV data.
- quotechar: 1-character str or False, optional (default ‘"’)
Character used optionally for quoting CSV values (False if quoting is not allowed).
- doublequote: bool, optional (default True)
Whether two quotes in a quoted CSV value denote a single quote in the data.
- escapechar: 1-character str or False, optional (default False)
Character used optionally for escaping special characters (False if escaping is not allowed).
- header_start: int, None, optional (default 0)
Line index for the header line with column names. If None this implies that there is no header line and the column names are taken from names or generated automatically (“f0”, “f1”, …).
- data_start: int, None, optional (default None)
Line index for the start of data. If None, then data starts one line after the header line, or on the first line if there is no header.
- names: list, None, optional (default None)
List of names for input data columns when there is no header line. If supplied, then header_start must be None.
- include_names: list, None, optional (default None)
List of column names to include in output. If None, all columns are included.
- dtypes: dict[str, Any], None, optional (default None)
If provided, this is a dictionary of data types for output columns. Each key is a column name and the value is either a PyArrow data type or a data type specifier that is accepted as an argument to numpy.dtype. Examples include pyarrow.Int32(), pyarrow.time32("s"), int, np.float32, np.dtype('f4') or "float32". Default is to infer the data types.
- comment: 1-character str or None, optional (default None)
Character used to indicate the start of a comment. Any line starting with optional whitespace and then this character is ignored. Using this option will cause the parser to be slower and potentially use more memory as it uses Python code to strip comments.
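The comment rule described for the comment parameter (a line is dropped when, after optional leading whitespace, it starts with the comment string) can be sketched in plain Python. This is only an illustration of the documented behavior, not the code astropy uses internally; the helper name strip_comment_lines is hypothetical.

```python
def strip_comment_lines(lines, comment):
    """Drop lines whose first non-whitespace text is the comment string.

    Hypothetical helper that illustrates the documented rule only;
    it is not astropy's implementation.
    """
    return [line for line in lines if not line.lstrip().startswith(comment)]


raw = [
    "# produced by some pipeline",
    "a,b",
    "  # indented comment",
    "1,2",
]
cleaned = strip_comment_lines(raw, "#")

# Per the description above, header_start and data_start count lines
# after comment removal, so they would index into `cleaned`:
# the header "a,b" sits at line index 0 and the data begins at index 1.
```

In this sketch both the leading comment and the indented comment are removed, which is why line indices are counted against the cleaned lines rather than the raw file.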
- Returns:
astropy.table.Table
An astropy Table containing the data from the CSV file.
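A minimal usage sketch, assuming an astropy version that ships this reader and an installed PyArrow. The CSV content, the column names, and the wrapper function load_example are made up for illustration; only read_csv and its documented keyword arguments come from the signature above.

```python
import io

# Small in-memory CSV; read_csv accepts a binary file-like object.
CSV_BYTES = b"id,flux,flag\n1,2.5,True\n2,,False\n"


def load_example():
    # Imported lazily so the snippet can be pasted into an environment
    # where this astropy reader or PyArrow may be missing.
    from astropy.io.misc.pyarrow.csv import read_csv

    return read_csv(
        io.BytesIO(CSV_BYTES),
        include_names=["id", "flux"],  # keep only these columns
        dtypes={"flux": "float32"},    # any specifier accepted by numpy.dtype
    )


try:
    table = load_example()
except ImportError:
    table = None  # the reader or PyArrow is not available here
```

When the call succeeds, table is an astropy.table.Table; by the default null handling described above, the empty flux cell in the second data row is expected to be masked.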
- Other Parameters:
- null_values: list, optional (default None)
List of strings to interpret as null values. By default, only empty strings are considered as null values (equivalent to null_values=[""]). Set to [] to disable null value handling.
- encoding: str, optional (default ‘utf-8’)
Encoding of the input data.
- newlines_in_values: bool, optional (default False)
Whether newline characters are allowed in CSV values. Setting this to True reduces the performance of multi-threaded CSV reading.
- timestamp_parsers: list, optional
A sequence of strptime()-compatible format strings, tried in order when attempting to infer or convert timestamp values. The default is the special value pyarrow.csv.ISO8601, which uses the optimized internal ISO8601 parser.
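As a hedged sketch of null_values, the snippet below treats several sentinel strings as null when reading a numeric column. The sample data, the sentinel strings, and the wrapper function load_with_nulls are assumptions for illustration; it also assumes that an astropy version providing this reader and PyArrow are installed.

```python
import io

# "N/A" and "-999" are illustrative sentinels, not values the reader
# recognizes on its own.
CSV_WITH_SENTINELS = b"id,flux\n1,1.5\n2,N/A\n3,-999\n"


def load_with_nulls():
    # Lazy import: the reader needs both astropy and PyArrow.
    from astropy.io.misc.pyarrow.csv import read_csv

    return read_csv(
        io.BytesIO(CSV_WITH_SENTINELS),
        # Keep "" in the list so empty cells are still treated as null.
        null_values=["", "N/A", "-999"],
    )


try:
    table_with_nulls = load_with_nulls()
except ImportError:
    table_with_nulls = None  # the reader or PyArrow is not available here
```

Passing an explicit list replaces the default, which is why the empty string is listed again here; passing [] instead would disable null handling, as noted in the null_values description.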