SAGE II Loader

class pysagereader.SAGEIILoaderV700(data_folder: str = None, output_format: str = 'xarray', species: List[str] = ('aerosol', 'h2o', 'no2', 'ozone', 'background'), cf_names: bool = False, filter_aerosol: bool = False, filter_ozone: bool = False, enumerate_flags: bool = False, normalize_percent_error: bool = False, return_separate_flags: bool = False)

Class designed to load the v7.00 SAGE II spec and index files provided by the NASA ASDC into Python.

Data files must be accessible from the user's machine and can be downloaded from: https://eosweb.larc.nasa.gov/project/sage2/sage2_v7_table

Parameters:

data_folder – location of the SAGE II index and spec files.
output_format –
format for the output data. If 'xarray' the output is returned as an xarray.Dataset. If None the output is returned as a dictionary of numpy arrays.

NOTE: the following options only apply to xarray output types
species – Species to be returned in the output data. If None all species are returned. Options are aerosol, ozone, h2o, and no2. If more than one species is returned, fields will be NaN-padded where data is not available. species is only used if output_format is 'xarray'; otherwise it has no effect.
cf_names – If True, CF-1.7 naming conventions are used for the output data when 'xarray' is selected.
filter_aerosol – filter the aerosol using the cloud flag
filter_ozone –
filter the ozone using the criteria recommended in the release notes:
- Exclusion of all data points with an uncertainty estimate of 300% or greater
- Exclusion of all profiles with an uncertainty greater than 10% between 30 and 50 km
- Exclusion of all data points at and below the altitude where an aerosol extinction value greater than 0.006 km^-1 occurs
- Exclusion of all data points at and below the altitude where the 525 nm aerosol extinction exceeds 0.001 km^-1 and the 525/1020 extinction ratio falls below 1.4
- Exclusion of all data points below 35 km with an uncertainty estimate of 200% or larger
enumerate_flags – expand the index and species flags to their boolean values.
normalize_percent_error – give the species error as percent rather than percent * 100
return_separate_flags – return the enumerated flags as a separate data array
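
Three of the point-wise filter_ozone criteria listed above can be sketched in plain Python as follows. This is an illustration only, not the loader's implementation: the function name screen_ozone_points and its arguments are hypothetical, the aerosol criterion is read as "at and below the highest altitude where the extinction exceeds 0.006 km^-1", and the profile-level (30 to 50 km) and extinction-ratio criteria are left out.

```python
import math


def screen_ozone_points(altitude_km, ozone, error_pct, aerosol_ext):
    """Return a copy of `ozone` with rejected points replaced by NaN.

    All four inputs are equal-length sequences describing one profile.
    `error_pct` is the ozone uncertainty in percent and `aerosol_ext`
    is an aerosol extinction in km^-1 (hypothetical argument names).
    """
    # Highest altitude at which the aerosol extinction exceeds 0.006 km^-1.
    contaminated = [z for z, ext in zip(altitude_km, aerosol_ext) if ext > 0.006]
    cutoff = max(contaminated) if contaminated else -math.inf

    screened = []
    for z, o3, err in zip(altitude_km, ozone, error_pct):
        reject = (
            err >= 300.0                    # uncertainty of 300% or greater
            or (z < 35.0 and err >= 200.0)  # below 35 km with 200% or larger
            or z <= cutoff                  # at and below the aerosol cutoff
        )
        screened.append(math.nan if reject else o3)
    return screened
```

With filter_ozone=True the loader is documented to apply the full set of criteria itself, so a helper like this is only useful for understanding what the screening does.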

Example

>>> sage = SAGEIILoaderV700()
>>> sage.data_folder = 'path/to/data'
>>> data = sage.load_data('2004-1-1','2004-5-1')

In addition to the SAGE II fields reported in the files, two additional time fields are provided to allow for easier subsetting of the data.

data['mjd'] is a numpy array containing the modified Julian dates of each scan

data['time'] is a pandas time series object containing the times of each scan
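
The relationship between the two time fields can be sketched as below, assuming the standard Modified Julian Date epoch of 1858-11-17 00:00. The helper mjd_to_datetime is hypothetical and is not part of pysagereader; the time zone the loader uses is not stated here.

```python
from datetime import datetime, timedelta

# Modified Julian Date 0 corresponds to midnight on 17 November 1858.
MJD_EPOCH = datetime(1858, 11, 17)


def mjd_to_datetime(mjd):
    """Convert a modified Julian date (in days) to a naive datetime."""
    return MJD_EPOCH + timedelta(days=mjd)
```

For example, mjd_to_datetime(53005.0) gives midnight on 2004-01-01, the start date used in the example above.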

static convert_index_bit_flags(data: Dict[KT, VT]) → xarray.core.dataset.Dataset

Convert the int32 index flags to a dataset of distinct flags

Parameters:	data – Dictionary of input data as returned by load_data
Returns:	Dataset of the index bit flags
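
Expanding a packed integer flag into distinct boolean flags can be sketched as follows. This is a minimal illustration of the general technique, not the library's code: the helper expand_bit_flags and the flag names in the usage note are hypothetical, and the real bit assignments are defined by the SAGE II v7.00 documentation.

```python
def expand_bit_flags(value, names):
    """Map each name to the state of the corresponding bit in `value`.

    names[0] is taken to be the least significant bit (an assumption).
    """
    return {name: bool((value >> bit) & 1) for bit, name in enumerate(names)}
```

For instance, expand_bit_flags(0b101, ['flag_a', 'flag_b', 'flag_c']) marks flag_a and flag_c as set and flag_b as clear.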

static convert_species_bit_flags(data: Dict[KT, VT]) → xarray.core.dataset.Dataset

Convert the int32 species flags to a dataset of distinct flags

Parameters:	data – Dictionary of input data as returned by load_data
Returns:	Dataset of the species bit flags

convert_to_xarray(data: Dict[KT, VT]) → Union[xarray.core.dataset.Dataset, Tuple[xarray.core.dataset.Dataset, xarray.core.dataset.Dataset]]

Parameters:	data – Data from the load_data function
Returns:	data formatted to an xarray Dataset

get_index_filename(year: int, month: int) → str

Returns the index filename given a year and month

Parameters:
year – year of the data that will be loaded
month – month of the data that will be loaded
Returns:	filename of the index file where the data is stored

static get_index_format() → Dict[str, Tuple[str, int]]

Index format taken from sg2_indexinfo.pro, provided in the v7.00 download, and used for reading the binary data format.

Returns:	An ordered dictionary of variables provided in the index file. Each dictionary field contains a tuple with the information (data type, length). Ordering is important as the SAGE II binary files are read sequentially.
Return type:	Dict
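
Why the ordering matters can be seen in a small sketch of a sequential reader driven by such a dictionary. This uses only the standard library and is not the loader's implementation: the helper read_sequential, the type-code table, the field names in the usage note, and the little-endian default are all assumptions made for illustration.

```python
import io
import struct
from collections import OrderedDict

# Hypothetical mapping from type names to struct format characters.
STRUCT_CODES = {'int32': 'i', 'float32': 'f'}


def read_sequential(stream, fmt, byte_order='<'):
    """Read one record from `stream`, field by field, in dictionary order.

    `fmt` maps field name -> (data type, length), as described above.
    The byte order is an assumption; check it against the real files.
    """
    record = OrderedDict()
    for name, (dtype, length) in fmt.items():
        code = f"{byte_order}{length}{STRUCT_CODES[dtype]}"
        raw = stream.read(struct.calcsize(code))
        record[name] = list(struct.unpack(code, raw))
    return record
```

Given fmt = OrderedDict([('count', ('int32', 1)), ('values', ('float32', 3))]), the reader consumes 4 bytes for count and then 12 bytes for values; swapping the two entries would misinterpret the same bytes.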

get_spec_filename(year: int, month: int) → str

Returns the spec filename given a year and month

Parameters:
year – year of the data that will be loaded
month – month of the data that will be loaded
Returns:	filename of the spec file where the data is stored

static get_spec_format() → Dict[str, Tuple[str, int]]

Spec format taken from sg2_specinfo.pro, provided in the v7.00 download, and used for reading the binary data format.

Returns:	Ordered dictionary of variables provided in the spec file. Each dictionary field contains a tuple with the information (data type, number of data points). Ordering is important as the SAGE II binary files are read sequentially.
Return type:	Dict

load_data(min_date: str, max_date: str, min_lat: float = -90, max_lat: float = 90, min_lon: float = -180, max_lon: float = 360) → Union[Dict[KT, VT], xarray.core.dataset.Dataset]

Load the SAGE II data for the specified dates and locations.

Parameters:
min_date – start date where data will be loaded in ISO format, e.g. '2004-1-1'
max_date – end date where data will be loaded in ISO format, e.g. '2004-1-1'
min_lat – minimum latitude (optional)
max_lat – maximum latitude (optional)
min_lon – minimum longitude (optional)
max_lon – maximum longitude (optional)
Returns:	Variables are returned as numpy arrays (1 or 2 dimensional depending on the variable)

read_index_file(file: str) → Dict[KT, VT]

Read the binary file into a Python data structure

Parameters:	file – filename to be read
Returns:	data from the file

read_spec_file(file: str, num_profiles: int) → List[Dict[KT, VT]]

Parameters:
file – name of the spec file to be read
num_profiles – number of profiles to read from the spec file (usually determined from the index file)
Returns:	list of dictionaries containing the spec data. Each list element is one event

static subset_data(data: Dict[KT, VT], min_date: str, max_date: str, min_lat: float, max_lat: float, min_lon: float, max_lon: float) → Dict[KT, VT]

Removes any data from the dictionary that does not meet the specified time, latitude and longitude requirements.

Parameters:
data – dictionary of SAGE II data. Must have the fields 'mjd', 'Lat' and 'Lon'. All others are optional
min_date – start date where data will be loaded in ISO format, e.g. '2004-1-1'
max_date – end date where data will be loaded in ISO format, e.g. '2004-1-1'
min_lat – minimum latitude
max_lat – maximum latitude
min_lon – minimum longitude
max_lon – maximum longitude
Returns:	the dictionary with only data in the valid latitude, longitude and time range
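
The kind of subsetting described here can be sketched with a boolean mask over the events. This is a simplified stand-in, not the method itself: the hypothetical helper subset_records takes the time range as modified Julian dates rather than date strings, treats all bounds as inclusive, and assumes every field holds exactly one entry per event.

```python
def subset_records(data, min_mjd, max_mjd, min_lat, max_lat, min_lon, max_lon):
    """Keep only events inside the given time, latitude and longitude range.

    `data` maps field names to equal-length lists and must contain the
    'mjd', 'Lat' and 'Lon' fields.
    """
    keep = [
        min_mjd <= mjd <= max_mjd
        and min_lat <= lat <= max_lat
        and min_lon <= lon <= max_lon
        for mjd, lat, lon in zip(data['mjd'], data['Lat'], data['Lon'])
    ]
    # Apply the same mask to every field so the events stay aligned.
    return {
        key: [value for value, flag in zip(values, keep) if flag]
        for key, values in data.items()
    }
```

Applying one mask to every field keeps the per-event arrays aligned, which is the property the real method needs to preserve as well.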