SAGE II Loader
-
class pysagereader.SAGEIILoaderV700(data_folder: str = None, output_format: str = 'xarray', species: List[str] = ('aerosol', 'h2o', 'no2', 'ozone', 'background'), cf_names: bool = False, filter_aerosol: bool = False, filter_ozone: bool = False, enumerate_flags: bool = False, normalize_percent_error: bool = False, return_separate_flags: bool = False)
Class designed to load the v7.00 SAGE II spec and index files provided by the NASA ASDC into Python.
Data files must be accessible from the user's machine and can be downloaded from: https://eosweb.larc.nasa.gov/project/sage2/sage2_v7_table
Parameters:
- data_folder – location of the SAGE II index and spec files.
- output_format – format for the output data. If 'xarray', the output is returned as an xarray.Dataset. If None, the output is returned as a dictionary of numpy arrays. NOTE: the following options only apply to xarray output types.
- species – species to be returned in the output data. If None, all species are returned. Options are aerosol, ozone, h2o, and no2. If more than one species is returned, fields are NaN-padded where data is not available. species is only used when output_format is 'xarray'; otherwise it has no effect.
- cf_names – if True, CF-1.7 naming conventions are used for the output data when xarray is selected.
- filter_aerosol – filter the aerosol using the cloud flag.
- filter_ozone – filter the ozone using the criteria recommended in the release notes:
  - Exclusion of all data points with an uncertainty estimate of 300% or greater
  - Exclusion of all profiles with an uncertainty greater than 10% between 30 and 50 km
  - Exclusion of all data points at altitude and below the occurrence of an aerosol extinction value greater than 0.006 km^-1
  - Exclusion of all data points at altitude and below the occurrence of both the 525 nm aerosol extinction value exceeding 0.001 km^-1 and the 525/1020 extinction ratio falling below 1.4
  - Exclusion of all data points below 35 km with an uncertainty estimate of 200% or larger
- enumerate_flags – expand the index and species flags to their boolean values.
- normalize_percent_error – give the species error as percent rather than percent * 100
- return_separate_flags – return the enumerated flags as a separate data array
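The ozone criteria listed under filter_ozone can be sketched for a single profile as below. This is a minimal pure-Python illustration, not the loader's implementation: the function and argument names are hypothetical, errors are assumed to be in percent and extinction in km^-1, the 0.006 km^-1 threshold is assumed to apply to the 525 nm channel, and the 10% criterion is read as "any point between 30 and 50 km exceeds 10%".

```python
import math
from typing import List, Sequence


def ozone_keep_mask(alt_km: Sequence[float],
                    o3_err_pct: Sequence[float],
                    ext525: Sequence[float],
                    ext1020: Sequence[float]) -> List[bool]:
    """Return True where an ozone point survives the listed criteria."""
    n = len(alt_km)

    # Whole-profile rejection: uncertainty above 10% anywhere in 30-50 km
    # (this reading of the criterion is an assumption).
    if any(30.0 <= a <= 50.0 and e > 10.0 for a, e in zip(alt_km, o3_err_pct)):
        return [False] * n

    # Highest altitude showing aerosol or cloud contamination; every point
    # at and below this altitude is rejected.
    cutoff = -math.inf
    for a, e5, e10 in zip(alt_km, ext525, ext1020):
        heavy_aerosol = e5 > 0.006
        cloud_like = e5 > 0.001 and e10 > 0 and (e5 / e10) < 1.4
        if heavy_aerosol or cloud_like:
            cutoff = max(cutoff, a)

    keep = [True] * n
    for i, (a, e) in enumerate(zip(alt_km, o3_err_pct)):
        if e >= 300.0:                # uncertainty of 300% or greater
            keep[i] = False
        if a <= cutoff:               # at or below a contaminated altitude
            keep[i] = False
        if a < 35.0 and e >= 200.0:   # below 35 km with 200% or larger
            keep[i] = False
    return keep
```

The returned mask could then be used to replace rejected ozone values with NaN; how the loader itself marks filtered points is not stated here.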
Example
>>> from pysagereader import SAGEIILoaderV700
>>> sage = SAGEIILoaderV700()
>>> sage.data_folder = 'path/to/data'
>>> data = sage.load_data('2004-1-1','2004-5-1')
In addition to the SAGE II fields reported in the files, two additional time fields are provided to allow for easier subsetting of the data.
data['mjd'] is a numpy array containing the modified Julian dates of each scan.
data['time'] is a pandas time series object containing the times of each scan.
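For readers unfamiliar with modified Julian dates, the following sketch shows how an MJD value relates to a calendar time. It uses only the standard library and the standard MJD epoch (1858-11-17 00:00); the helper names are hypothetical, and treating the values as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

# MJD 0 corresponds to 1858-11-17 00:00 (taken as UTC here).
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def mjd_to_datetime(mjd: float) -> datetime:
    """Convert a modified Julian date to a timezone-aware datetime."""
    return MJD_EPOCH + timedelta(days=mjd)


def datetime_to_mjd(when: datetime) -> float:
    """Convert a timezone-aware datetime to a modified Julian date."""
    return (when - MJD_EPOCH) / timedelta(days=1)


start_of_2004 = mjd_to_datetime(53005.0)  # 2004-01-01 00:00 UTC
```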
-
static convert_index_bit_flags(data: Dict[KT, VT]) → xarray.core.dataset.Dataset
Convert the int32 index flags to a dataset of distinct flags.
Parameters: data – dictionary of input data as returned by load_data
Returns: Dataset of the index bit flags
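Expanding a packed integer into distinct boolean flags can be sketched as follows. This is a generic illustration of bit unpacking, not the library's code: the helper name and the flag names are hypothetical, and the real bit assignments are defined by the SAGE II file format.

```python
from typing import Dict, Sequence


def expand_bit_flags(value: int, names: Sequence[str]) -> Dict[str, bool]:
    """Map bit i of `value` (least-significant bit first) to names[i]."""
    return {name: bool((value >> bit) & 1) for bit, name in enumerate(names)}


# Hypothetical flag names, for illustration only.
EXAMPLE_FLAGS = ("flag_a", "flag_b", "flag_c", "flag_d")

flags = expand_bit_flags(0b0101, EXAMPLE_FLAGS)
# flags == {"flag_a": True, "flag_b": False, "flag_c": True, "flag_d": False}
```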
-
static convert_species_bit_flags(data: Dict[KT, VT]) → xarray.core.dataset.Dataset
Convert the int32 species flags to a dataset of distinct flags.
Parameters: data – dictionary of input data as returned by load_data
Returns: Dataset of the species bit flags
-
convert_to_xarray(data: Dict[KT, VT]) → Union[xarray.core.dataset.Dataset, Tuple[xarray.core.dataset.Dataset, xarray.core.dataset.Dataset]]
Parameters: data – data from the load_data function
Returns: data formatted as an xarray Dataset
-
get_index_filename(year: int, month: int) → str
Returns the index filename given a year and month.
Parameters:
- year – year of the data that will be loaded
- month – month of the data that will be loaded
Returns: filename of the index file where the data is stored
-
static get_index_format() → Dict[str, Tuple[str, int]]
Index format taken from sg2_indexinfo.pro provided in the v7.00 download, used for reading the binary data format.
Returns: an ordered dictionary of variables provided in the index file. Each dictionary field contains a tuple with the information (data type, length). Ordering is important as the SAGE II binary files are read sequentially.
Return type: Dict
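To show why the ordering of such a format dictionary matters, here is a minimal sketch of reading one binary record field by field. The field names, the use of `struct` type codes as the "data type" entry, and the little-endian byte order are all assumptions made for illustration; they are not taken from sg2_indexinfo.pro or from the library.

```python
import io
import struct
from collections import OrderedDict
from typing import Any, BinaryIO, Dict, Tuple

# Hypothetical layout: name -> (struct type code, number of values).
EXAMPLE_FORMAT: Dict[str, Tuple[str, int]] = OrderedDict([
    ("num_prof", ("i", 1)),  # one 32-bit integer
    ("lat", ("f", 3)),       # three 32-bit floats
    ("flags", ("i", 3)),     # three 32-bit integers
])


def read_record(stream: BinaryIO,
                fmt: Dict[str, Tuple[str, int]],
                byte_order: str = "<") -> Dict[str, Any]:
    """Read the fields of `fmt` from `stream` in dictionary order."""
    record: Dict[str, Any] = {}
    for name, (code, count) in fmt.items():
        layout = f"{byte_order}{count}{code}"
        size = struct.calcsize(layout)
        raw = stream.read(size)
        if len(raw) != size:
            raise EOFError(f"file ended while reading field {name!r}")
        values = struct.unpack(layout, raw)
        record[name] = values[0] if count == 1 else list(values)
    return record


# Build a small in-memory "file" matching the hypothetical layout.
payload = struct.pack("<i3f3i", 3, 10.0, 20.5, -30.25, 1, 0, 4)
record = read_record(io.BytesIO(payload), EXAMPLE_FORMAT)
```

Because each field is consumed from the current stream position, swapping two entries in the dictionary would silently misinterpret the bytes, which is why an ordered mapping is used.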
-
get_spec_filename(year: int, month: int) → str
Returns the spec filename given a year and month.
Parameters:
- year – year of the data that will be loaded
- month – month of the data that will be loaded
Returns: filename of the spec file where the data is stored
-
static get_spec_format() → Dict[str, Tuple[str, int]]
Spec format taken from sg2_specinfo.pro provided in the v7.00 download, used for reading the binary data format.
Returns: an ordered dictionary of variables provided in the spec file. Each dictionary field contains a tuple with the information (data type, number of data points). Ordering is important as the SAGE II binary files are read sequentially.
Return type: Dict
-
load_data(min_date: str, max_date: str, min_lat: float = -90, max_lat: float = 90, min_lon: float = -180, max_lon: float = 360) → Union[Dict[KT, VT], xarray.core.dataset.Dataset]
Load the SAGE II data for the specified dates and locations.
Parameters:
- min_date – start date of the data to load, in ISO format, e.g. '2004-1-1'
- max_date – end date of the data to load, in ISO format, e.g. '2004-1-1'
- min_lat – minimum latitude (optional)
- max_lat – maximum latitude (optional)
- min_lon – minimum longitude (optional)
- max_lon – maximum longitude (optional)
Returns: variables are returned as numpy arrays (1 or 2 dimensional depending on the variable)
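As a usage sketch, the constructor options and the latitude bounds of load_data can be combined as below. The wrapper function name is hypothetical, the top-level import path is assumed from the documented class path, and the call needs the SAGE II files to be present in data_folder, so it is shown as a function rather than executed.

```python
def load_tropical_ozone(data_folder: str):
    """Load filtered ozone between 20S and 20N for early 2004."""
    # Imported lazily so the sketch can be defined without the package
    # installed; the import path is assumed from the documented class path.
    from pysagereader import SAGEIILoaderV700

    loader = SAGEIILoaderV700(data_folder=data_folder,
                              species=('ozone',),
                              filter_ozone=True)
    # Longitude bounds are left at their defaults (-180 to 360).
    return loader.load_data('2004-1-1', '2004-5-1', min_lat=-20, max_lat=20)
```

With the default output_format of 'xarray', the result would be an xarray Dataset according to the signature above.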
-
read_index_file(file: str) → Dict[KT, VT]
Read the binary file into a Python data structure.
Parameters: file – filename to be read
Returns: data from the file
-
read_spec_file(file: str, num_profiles: int) → List[Dict[KT, VT]]
Parameters:
- file – name of the spec file to be read
- num_profiles – number of profiles to read from the spec file (usually determined from the index file)
Returns: list of dictionaries containing the spec data. Each list element is one event
-
static subset_data(data: Dict[KT, VT], min_date: str, max_date: str, min_lat: float, max_lat: float, min_lon: float, max_lon: float) → Dict[KT, VT]
Removes any data from the dictionary that does not meet the specified time, latitude and longitude requirements.
Parameters:
- data – dictionary of SAGE II data. Must have the fields 'mjd', 'Lat' and 'Lon'. All others are optional
- min_date – start date of the data to keep, in ISO format, e.g. '2004-1-1'
- max_date – end date of the data to keep, in ISO format, e.g. '2004-1-1'
- min_lat – minimum latitude
- max_lat – maximum latitude
- min_lon – minimum longitude
- max_lon – maximum longitude
Returns: the dictionary with only data in the valid latitude, longitude and time range
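The subsetting described above amounts to building one keep/drop decision per profile and applying it to every field. A simplified sketch, assuming plain Python lists with one entry per profile and date bounds already expressed as MJD values (the real method takes date strings and numpy arrays); the function name is hypothetical:

```python
from typing import Dict, List


def subset_profiles(data: Dict[str, List[float]],
                    min_mjd: float, max_mjd: float,
                    min_lat: float, max_lat: float,
                    min_lon: float, max_lon: float) -> Dict[str, List[float]]:
    """Keep only profiles inside the time, latitude and longitude bounds."""
    keep = [
        min_mjd <= mjd <= max_mjd
        and min_lat <= lat <= max_lat
        and min_lon <= lon <= max_lon
        for mjd, lat, lon in zip(data["mjd"], data["Lat"], data["Lon"])
    ]
    # Apply the same per-profile decision to every field.
    return {
        name: [value for value, ok in zip(values, keep) if ok]
        for name, values in data.items()
    }


example = {
    "mjd": [53005.0, 53010.0, 53200.0],
    "Lat": [10.0, 60.0, 5.0],
    "Lon": [100.0, 100.0, 100.0],
    "O3": [1.0, 2.0, 3.0],  # hypothetical per-profile field
}
tropics = subset_profiles(example, 53005.0, 53126.0, -20.0, 20.0, -180.0, 360.0)
```

Here only the first profile is kept: the second lies outside the latitude range and the third outside the date range.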