Create a histogram by putting the data into bins of fixed width.
- Parameters
| Parameter | Description |
|---|---|
| indata | The input data to be binned. It is copied, and only the copy is modified. |
| close_top_bin | Normally, a bin covers the half-open range from its minimum up to, but not including, its minimum plus the bin width. If 'y', the top bin also includes points equal to its upper bound. This avoids histograms whose top bin holds just one point. |
| binspec | An apop_data set with the same number of columns as indata. For fixed-width bins, the first row of the bin spec gives the bin width for each column, so you can set a different width for each dimension or the same width for all of them (see the example after this table). The first bin is presumed to start at zero in every dimension; add a second row to the spec to give an offset for each dimension. Default: NULL. If binspec is NULL, I build a grid whose offset is the minimum of each column and whose bin width is chosen so that bin_count bins cover the range up to the column's maximum. |
| bin_count | If you don't provide a bin spec, I'll provide this many evenly-sized bins. Default: . |

For example, to use bins of width 10 in every dimension:

    Apop_row(indata, 0, firstrow);                //view of the first row of indata
    apop_data *binspec = apop_data_copy(firstrow);
    gsl_matrix_set_all(binspec->matrix, 10);      //bins of width 10 in every matrix column
    apop_data_to_bins(indata, binspec);
- Returns
- A pointer to a binned apop_data set. If you didn't give me a binspec, then I attach one to the output set as a page named <binspec>, so you can snap a second data set to the same grid:

    apop_data *first_binned = apop_data_to_bins(first_set, NULL);
    //The generated spec is attached to the output set, not to first_set.
    apop_data *second_binned = apop_data_to_bins(second_set,
                                   apop_data_get_page(first_binned, "<binspec>"));
The text segment, if any, is not binned. I use apop_data_pmf_compress as the final step in the binning, and it respects the text segment: rows that fall into the same bin but carry different text stay separate.
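Here is a sample program highlighting the difference between apop_data_to_bins, which snaps values to a grid before compressing, and apop_data_pmf_compress, which leaves values as they are and only merges duplicate rows.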
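    #define _GNU_SOURCE  //for asprintf
    #include <apop.h>
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    #ifdef Testing
    #define printdata(dataset) ;
    #else
    #define printdata(dataset) \
        printf("\n-----------\n\n"); \
        apop_data_print(dataset);
    #endif

    int main(){
        //Illustrative data: six rows, in which the pair (1, "A") occurs twice.
        apop_data *d = apop_text_alloc(apop_data_alloc(6), 6, 1);
        apop_data_fill(d, 1, 2, 3, 3, 1, 2);
        apop_text_fill(d, "A", "A", "A", "A", "A", "B");
        asprintf(&d->names->title, "Original data set");
        printdata(d);

        //Binning: values snap to an evenly spaced grid, then get compressed.
        apop_data *binned = apop_data_to_bins(d, NULL);
        asprintf(&binned->names->title, "Post binning");
        printdata(binned);

        //Compression: values stay as they are; duplicate rows are merged
        //and their counts are stored in the weights vector.
        apop_data_pmf_compress(d);
        asprintf(&d->names->title, "Post compression");
        printdata(d);

        apop_model *d_as_pmf = apop_estimate(d, apop_pmf);
        Apop_row(d, 0, firstrow); //how likely is the first row?
        assert(fabs(apop_p(firstrow, d_as_pmf) - 2./6) < 1e-5);
    }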
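The 2./6 in the final assertion is the first row's share of the total weight, assuming apop_pmf assigns each distinct row a probability proportional to its weight: six observations in all, two of which match the first row.

$$
P(\text{first row}) = \frac{w_{\text{first}}}{\sum_i w_i} = \frac{2}{6} \approx 0.333
$$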