Welcome to pysagereader’s documentation!
pysagereader is a Python reader for the SAGE II data. Data is loaded into either an xarray data structure or a dictionary of numpy arrays.
Quickstart
Get the data
The SAGE II data is not provided with the pysagereader package but can be obtained from NASA ASDC.
Loading the data
By default, data is loaded into an xarray Dataset that can easily be sliced, subset, and plotted.
from pysagereader import SAGEIILoaderV700
sage = SAGEIILoaderV700(data_folder=r'/path/to/sage/data')
data = sage.load_data(min_date='2000-1-1', max_date='2003-12-31', min_lat=-10, max_lat=10)
data.O3.plot(x='time', robust=True)
Creating NetCDF files
The xarray package also provides convenient export to NetCDF files:
from pysagereader import SAGEIILoaderV700
sage = SAGEIILoaderV700(data_folder=r'/path/to/sage/data')
data = sage.load_data(min_date='2000-1-1', max_date='2000-12-31')  # one calendar year, matching the output filename
data.to_netcdf('sage_ii_v700_2000.nc')
Or from the command line:
python /install/directory/pysagereader/make_netcdf.py -i /sageii/data/folder -o /output/folder -time_res yearly
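A per-year export in the spirit of -time_res yearly can also be sketched from Python using only the documented load_data and xarray's to_netcdf. This is a sketch, not how make_netcdf.py works: yearly_date_ranges and export_yearly are hypothetical helpers that are not part of pysagereader, and the sketch assumes the default xarray output, so that load_data returns a single Dataset.

```python
def yearly_date_ranges(first_year: int, last_year: int):
    """Yield (year, min_date, max_date) tuples covering whole calendar years."""
    for year in range(first_year, last_year + 1):
        # Same date-string style as the load_data examples, e.g. '2000-1-1'.
        yield year, f"{year}-1-1", f"{year}-12-31"


def export_yearly(loader, first_year: int, last_year: int,
                  pattern: str = "sage_ii_v700_{year}.nc") -> list:
    """Write one NetCDF file per year and return the paths written.

    `loader` is assumed to behave like SAGEIILoaderV700 with the default
    xarray output, so load_data returns a Dataset that has to_netcdf().
    """
    written = []
    for year, min_date, max_date in yearly_date_ranges(first_year, last_year):
        data = loader.load_data(min_date=min_date, max_date=max_date)
        path = pattern.format(year=year)
        data.to_netcdf(path)
        written.append(path)
    return written


# Usage (requires pysagereader and a local copy of the SAGE II files):
#   from pysagereader import SAGEIILoaderV700
#   sage = SAGEIILoaderV700(data_folder=r'/path/to/sage/data')
#   export_yearly(sage, 2000, 2003)
```

Passing the loader in as an argument keeps the helper independent of pysagereader, which also makes it easy to exercise with a stand-in object.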
SAGE II Loader
class pysagereader.SAGEIILoaderV700(data_folder: str = None, output_format: str = 'xarray', species: List[str] = ('aerosol', 'h2o', 'no2', 'ozone', 'background'), cf_names: bool = False, filter_aerosol: bool = False, filter_ozone: bool = False, enumerate_flags: bool = False, normalize_percent_error: bool = False, return_separate_flags: bool = False)
Class designed to load the v7.00 SAGE II spec and index files provided by NASA ASDC into Python.
Data files must be accessible from the user's machine and can be downloaded from: https://eosweb.larc.nasa.gov/project/sage2/sage2_v7_table
Parameters:
- data_folder – location of the SAGE II index and spec files.
- output_format – format for the output data. If 'xarray', the output is returned as an xarray.Dataset. If None, the output is returned as a dictionary of numpy arrays.
NOTE: the following options only apply to xarray output types.
- species – species to be returned in the output data. If None, all species are returned. Options are aerosol, ozone, h2o, and no2. If more than one species is returned, fields are NaN-padded where data is not available. species is only used if 'xarray' is set as the output_format; otherwise it has no effect.
- cf_names – if True, CF-1.7 naming conventions are used for the output data when xarray is selected.
- filter_aerosol – filter the aerosol using the cloud flag.
- filter_ozone – filter the ozone using the criteria recommended in the release notes:
  - Exclusion of all data points with an uncertainty estimate of 300% or greater
  - Exclusion of all profiles with an uncertainty greater than 10% between 30 and 50 km
  - Exclusion of all data points at and below the altitude where an aerosol extinction value greater than 0.006 km^-1 occurs
  - Exclusion of all data points at and below the altitude where the 525 nm aerosol extinction exceeds 0.001 km^-1 and the 525/1020 extinction ratio falls below 1.4
  - Exclusion of all data points below 35 km with an uncertainty estimate of 200% or greater
- enumerate_flags – expand the index and species flags to their boolean values.
- normalize_percent_error – give the species error as percent rather than percent * 100.
- return_separate_flags – return the enumerated flags as a separate data array.
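The species tables later on this page list uncertainties as "%x100", i.e. percent multiplied by 100. A minimal sketch of the rescaling that normalize_percent_error describes, on plain Python lists; the function name and the optional fill-value handling are illustrative assumptions, not pysagereader's code.

```python
def percent_x100_to_percent(values, fill_value=None):
    """Convert uncertainties stored as percent * 100 to plain percent.

    A stored value of 1250 becomes 12.5 (%). Entries equal to `fill_value`,
    if one is given, are returned as None instead of being rescaled.
    """
    converted = []
    for value in values:
        if fill_value is not None and value == fill_value:
            converted.append(None)
        else:
            converted.append(value / 100.0)
    return converted
```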
Example
>>> sage = SAGEIILoaderV700()
>>> sage.data_folder = 'path/to/data'
>>> data = sage.load_data('2004-1-1', '2004-5-1')
In addition to the SAGE II fields reported in the files, two additional time fields are provided to make subsetting the data easier:
- data['mjd'] is a numpy array containing the modified Julian date of each scan.
- data['time'] is a pandas time series object containing the time of each scan.
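For reference, the modified Julian date counts days since 1858-11-17 00:00 UTC. A standard-library sketch converting between MJD and datetime, plus building an event time from the YYYYMMDD and HHMMSS fields listed under Event Specific Info. Treating those two fields as integers is an assumption, and this is not necessarily how pysagereader computes mjd.

```python
from datetime import datetime, timedelta

# Modified Julian date 0.0 corresponds to 1858-11-17 00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17)


def datetime_to_mjd(t: datetime) -> float:
    """Modified Julian date of a naive datetime interpreted as UTC."""
    return (t - MJD_EPOCH) / timedelta(days=1)


def mjd_to_datetime(mjd: float) -> datetime:
    """Naive UTC datetime for a modified Julian date."""
    return MJD_EPOCH + timedelta(days=mjd)


def event_datetime(yyyymmdd: int, hhmmss: int) -> datetime:
    """Combine integer YYYYMMDD and HHMMSS fields into one datetime (assumed encoding)."""
    return datetime.strptime(f"{yyyymmdd:08d}{hhmmss:06d}", "%Y%m%d%H%M%S")
```

The zero-padding in event_datetime matters for early-morning events, where an HHMMSS value such as 93000 has only five digits.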
static convert_index_bit_flags(data: Dict[KT, VT]) → xarray.core.dataset.Dataset
Convert the int32 index flags to a dataset of distinct flags.
Parameters: data – dictionary of input data as returned by load_data
Returns: Dataset of the index bit flags
static convert_species_bit_flags(data: Dict[KT, VT]) → xarray.core.dataset.Dataset
Convert the int32 species flags to a dataset of distinct flags.
Parameters: data – dictionary of input data as returned by load_data
Returns: Dataset of the species bit flags
convert_to_xarray(data: Dict[KT, VT]) → Union[xarray.core.dataset.Dataset, Tuple[xarray.core.dataset.Dataset, xarray.core.dataset.Dataset]]
Parameters: data – data from the load_data function
Returns: the data formatted as an xarray Dataset, or a (data, flags) tuple of Datasets when return_separate_flags is True
get_index_filename(year: int, month: int) → str
Returns the index filename for a given year and month.
Parameters:
- year – year of the data that will be loaded
- month – month of the data that will be loaded
Returns: filename of the index file where the data is stored
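Since index and spec filenames are looked up by year and month, loading a date range means visiting every month the range touches. A sketch of enumerating those months; months_in_range is a hypothetical helper, not part of pysagereader, and each (year, month) pair it returns is the kind of input get_index_filename and get_spec_filename take.

```python
from datetime import datetime


def months_in_range(min_date: str, max_date: str) -> list:
    """Return the (year, month) pairs touched by an inclusive date range.

    Dates use the style of the load_data examples, e.g. '2004-1-1';
    strptime accepts month and day values without zero padding.
    """
    start = datetime.strptime(min_date, "%Y-%m-%d")
    end = datetime.strptime(max_date, "%Y-%m-%d")
    year, month = start.year, start.month
    months = []
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        # Advance one calendar month, rolling over into the next year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months
```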
static get_index_format() → Dict[str, Tuple[str, int]]
Index format taken from sg2_indexinfo.pro, provided in the v7.00 download, and used for reading the binary data format.
Returns: an ordered dictionary of the variables provided in the index file. Each field contains a tuple with the information (data type, length). Ordering is important because the SAGE II binary files are read sequentially.
Return type: Dict
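Because the format dictionary is ordered and the files are read sequentially, a reader can walk the fields in order and advance a byte offset after each one. A sketch using the standard struct module; TOY_FORMAT, the type-name mapping and the big-endian default are assumptions for illustration, not the actual contents of sg2_indexinfo.pro or pysagereader's reader.

```python
import struct

# Toy format in the (data type, length) shape described above; the field
# list, types and lengths are illustrative, not the real index layout.
TOY_FORMAT = {
    "Num_Prof": ("int32", 1),
    "Lat": ("float32", 3),
    "Lon": ("float32", 3),
}

# Map the type names used in the toy format to struct codes.
STRUCT_CODES = {"int16": "h", "int32": "i", "float32": "f", "float64": "d"}


def read_sequential(buffer: bytes, fmt: dict, byte_order: str = ">") -> dict:
    """Read fields from `buffer` in the insertion order of `fmt`."""
    offset = 0
    out = {}
    for name, (dtype, length) in fmt.items():
        layout = f"{byte_order}{length}{STRUCT_CODES[dtype]}"
        values = struct.unpack_from(layout, buffer, offset)
        offset += struct.calcsize(layout)
        # Scalars come back as a single value, arrays as a list.
        out[name] = values[0] if length == 1 else list(values)
    return out
```

Python dictionaries preserve insertion order, which is what makes a plain dict usable as the "ordered dictionary" here.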
get_spec_filename(year: int, month: int) → str
Returns the spec filename for a given year and month.
Parameters:
- year – year of the data that will be loaded
- month – month of the data that will be loaded
Returns: filename of the spec file where the data is stored
static get_spec_format() → Dict[str, Tuple[str, int]]
Spec format taken from sg2_specinfo.pro, provided in the v7.00 download, and used for reading the binary data format.
Returns: an ordered dictionary of the variables provided in the spec file. Each field contains a tuple with the information (data type, number of data points). Ordering is important because the SAGE II binary files are read sequentially.
Return type: Dict
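The same (data type, number of data points) pairs also determine where each field sits within one event and how many bytes that event occupies. A sketch computing that layout; the item sizes are the standard widths for those type names, while TOY_SPEC_FORMAT and its counts are made up for illustration.

```python
# Byte sizes of the type names used in this sketch (standard widths).
ITEM_SIZES = {"int8": 1, "int16": 2, "int32": 4, "float32": 4, "float64": 8}

# Made-up example in the (data type, number of data points) shape described
# above; the real field list lives in sg2_specinfo.pro.
TOY_SPEC_FORMAT = {
    "Tan_Alt": ("float32", 4),
    "O3": ("float32", 4),
    "InfVec": ("int32", 1),
}


def record_layout(fmt: dict):
    """Return ({field: (byte offset, byte size)}, total bytes per record)."""
    layout = {}
    offset = 0
    for name, (dtype, count) in fmt.items():
        size = ITEM_SIZES[dtype] * count
        layout[name] = (offset, size)
        offset += size
    return layout, offset
```

If event records were fixed-size and stored back to back (an assumption, not something stated here), num_profiles times the record size would give the expected amount of event data in a spec file, which is a cheap sanity check before reading.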
load_data(min_date: str, max_date: str, min_lat: float = -90, max_lat: float = 90, min_lon: float = -180, max_lon: float = 360) → Union[Dict[KT, VT], xarray.core.dataset.Dataset]
Load the SAGE II data for the specified dates and locations.
Parameters:
- min_date – start date of the data to load, in ISO format, e.g. '2004-1-1'
- max_date – end date of the data to load, in ISO format, e.g. '2004-1-1'
- min_lat – minimum latitude (optional)
- max_lat – maximum latitude (optional)
- min_lon – minimum longitude (optional)
- max_lon – maximum longitude (optional)
Returns: a dictionary of numpy arrays (1- or 2-dimensional depending on the variable), or an xarray Dataset when the output format is 'xarray'
read_index_file(file: str) → Dict[KT, VT]
Read the binary index file into a Python data structure.
Parameters: file – filename to be read
Returns: data from the file
read_spec_file(file: str, num_profiles: int) → List[Dict[KT, VT]]
Parameters:
- file – name of the spec file to be read
- num_profiles – number of profiles to read from the spec file (usually determined from the index file)
Returns: list of dictionaries containing the spec data; each dictionary is one event
static subset_data(data: Dict[KT, VT], min_date: str, max_date: str, min_lat: float, max_lat: float, min_lon: float, max_lon: float) → Dict[KT, VT]
Removes any data from the dictionary that does not meet the specified time, latitude and longitude requirements.
Parameters:
- data – dictionary of SAGE II data. Must have the fields 'mjd', 'Lat' and 'Lon'; all others are optional
- min_date – start date of the data to keep, in ISO format, e.g. '2004-1-1'
- max_date – end date of the data to keep, in ISO format, e.g. '2004-1-1'
- min_lat – minimum latitude
- max_lat – maximum latitude
- min_lon – minimum longitude
- max_lon – maximum longitude
Returns: the dictionary with only the data in the valid latitude, longitude and time range
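The selection subset_data describes amounts to a box test on each event. A list-based sketch, assuming inclusive bounds and dates already converted to MJD; subset_events is a hypothetical name, and it treats any list with one entry per event as per-event data, which is a simplification.

```python
def subset_events(data: dict, min_mjd: float, max_mjd: float,
                  min_lat: float = -90.0, max_lat: float = 90.0,
                  min_lon: float = -180.0, max_lon: float = 360.0) -> dict:
    """Keep only events inside an inclusive time/latitude/longitude box.

    `data` needs per-event lists 'mjd', 'Lat' and 'Lon'. Any other list with
    one entry per event is subset the same way; everything else is copied.
    """
    n_events = len(data["mjd"])
    keep = [
        i for i, (t, lat, lon) in enumerate(zip(data["mjd"], data["Lat"], data["Lon"]))
        if min_mjd <= t <= max_mjd
        and min_lat <= lat <= max_lat
        and min_lon <= lon <= max_lon
    ]
    subset = {}
    for key, value in data.items():
        if isinstance(value, list) and len(value) == n_events:
            subset[key] = [value[i] for i in keep]
        else:
            subset[key] = value  # scalars such as a fill value are left untouched
    return subset
```

With the default longitude bounds of -180 to 360, events pass the longitude test whether they are stored in the -180..180 or the 0..360 convention.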
SAGE II
Variables
As far as possible, the original SAGE II variable names used in the IDL scripts and documentation have been adopted. If xarray is chosen as the output format, revision and creation date information is not included in the output structure.
Altitude Range for Species
Species | Range (km) |
---|---|
Ozone | 5-60 |
NO2 | 15-60 |
Aerosol | 1-45 |
Water Vapor | MSL-40 |
Revision Info
Field | Description |
---|---|
Num_Prof | Number of profiles (records) in file |
Met_Rev_Date | LaRC Met Model Rev Date (yyyymmdd) |
Driver_Rev | LaRC Driver version (e.g. 6.20) |
Transmission_Rev | LaRC Transmission version |
Inversion_Rev | LaRC Inversion version |
Spectroscopy_Rev | LaRC Spectroscopy version |
Eph_File_Name | Ephemeris file name |
Met_File_Name | Met file name |
Ref_File_Name | Refraction file name |
Trans_File_Name | Transmission file name |
Spec_File_Name | Species profile file name |
FillVal | Fill value |
Altitude grid and range info
Field | Description |
---|---|
Grid_Size | Altitude Grid spacing |
Alt_Grid | Geometric Alt |
Alt_Mid_Atm | Geometric Alt for Dens_Mid_Atm |
Range_Trans | Transmission Min & Max alt |
Range_O3 | Ozone Density Min & Max alt |
Range_NO2 | NO2 Density Min & Max alt |
Range_H2O | H2O Density Min & Max alt |
Range_Ext | Extinction Min & Max alt |
Range_Density | Density Min & Max alt |
Range_Surface | Surface Area Min & Max alt |
Event Specific Info
Field | Description |
---|---|
YYYYMMDD | Event Date (yyyymmdd) at 30 km |
event_num | The event number |
HHMMSS | Event Time (hhmmss) at 30 km |
Day_Frac | Time of Year (ddd.fraction) |
Lat | Sub-tangent Lat at 30km |
Lon | Sub-tangent Lon at 30km |
Beta | Spacecraft Beta angle (degrees) |
Duration | Duration of event (seconds) |
Type_Sat | Instrument event type (0 = sunrise, 1 = sunset) |
Type_Tan | Event type, local (0 = sunrise, 1 = sunset) |
Process Tracking Flag info
Field | Description |
---|---|
Processing Success | |
Dropped | Value is non-zero if event is dropped |
InfVec | 32 bits describing the event processing |
Ephemeris | |
Eph_Cre_Date | Record creation date (yyyymmdd) |
Eph_Cre_Time | Record creation time (hhmmss) |
Met | |
Met_Cre_Date | Record creation date (yyyymmdd) |
Met_Cre_Time | Record creation time (hhmmss) |
Refraction | |
Ref_Cre_Date | Record creation date (yyyymmdd) |
Ref_Cre_Time | Record creation time (hhmmss) |
Transmission | |
TRANS_Cre_Date | Record creation date (yyyymmdd) |
TRANS_Cre_Time | Record creation time (hhmmss) |
Inversion | |
SPECIES_Cre_Date | Record creation date (yyyymmdd) |
SPECIES_Cre_Time | Record creation time (hhmmss) |
Species File Contents
Field | Description |
---|---|
Tan_Alt | Center-of-Sun Tangent Alt (km) |
Tan_Lat | Center-of-Sun Lat (deg) |
Tan_Lon | Center-of-Sun Lon (deg) |
NMC_Pres | Pressure (mb) (0.5-70km) |
NMC_Temp | Temperature (K), (0.5-70km) |
NMC_Dens | Density (molecules/cm3) (0.5-70km) |
NMC_Dens_Err | Density uncertainty (%x100) |
Trop_Height | Tropopause height in km |
Wavelength | Channel wavelengths |
O3 | O3 number density (cm-3) |
NO2 | NO2 number density (cm-3) |
H2O | H2O mixing ratio (ppp) |
Ext386 | 386 nm aerosol extinction (1/km) |
Ext452 | 452 nm aerosol extinction (1/km) |
Ext525 | 525 nm aerosol extinction (1/km) |
Ext1020 | 1020 nm aerosol extinction (1/km) |
Density | Molecular density (1/cm^3) |
SurfDen | Aerosol surface area density (micrometer^2/cm^3) |
Radius | Aerosol effective radius (micrometer) |
Dens_Mid_Atm | Middle atmosphere retrieved density (1/cm^3) |
O3_Err | O3 number density uncertainty (%x100) |
NO2_Err | NO2 number density uncertainty (%x100) |
H2O_Err | H2O mixing ratio uncertainty (%x100) |
Ext386_Err | 386 nm aerosol ext. uncertainty (%x100) |
Ext452_Err | 452 nm aerosol ext. uncertainty (%x100) |
Ext525_Err | 525 nm aerosol ext. uncertainty (%x100) |
Ext1020_Err | 1020 nm aerosol ext. uncertainty (%x100) |
Density_Err | Density uncertainty (%x100) |
SurfDen_Err | Aerosol surface area density uncertainty (%x100) |
Radius_Err | Aerosol effective radius uncertainty (%x100) |
Dens_Mid_Atm_Err | Middle atmosphere density uncertainty (%x100) |
InfVec | Bit-wise quality flags |
Quality Flags
SAGE II data includes both event (index) flags and species flags. These are 32-bit integers contained in the InfVec and ProfileInfVec variables, respectively. For easier use, the flags can be expanded to show each bit separately.
from pysagereader.sage_ii_reader import SAGEIILoaderV700
sage = SAGEIILoaderV700(data_folder=r'path\to\sage\data', enumerate_flags=True)
data = sage.load_data('2000-1-1', '2003-12-31', -10, 10)
The flags can also be returned in a separate array for convenience.
from pysagereader.sage_ii_reader import SAGEIILoaderV700
sage = SAGEIILoaderV700(data_folder=r'path\to\sage\data', return_separate_flags=True)
data, flags = sage.load_data('2000-1-1', '2003-12-31', -10, 10)
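Conceptually, enumerating a flag word turns one 32-bit integer into 32 booleans. A standard-library sketch of that expansion; the meaning of each bit in InfVec and ProfileInfVec comes from the SAGE II documentation and is not reproduced here, so any names passed to named_flags are placeholders, and both helpers are hypothetical rather than pysagereader functions.

```python
def expand_bit_flags(value: int, n_bits: int = 32) -> list:
    """Return the bits of a flag word as booleans, least significant bit first."""
    # Mask to n_bits so a negative (signed int32) value yields its two's-complement bits.
    value &= (1 << n_bits) - 1
    return [bool((value >> bit) & 1) for bit in range(n_bits)]


def named_flags(value: int, names: dict) -> dict:
    """Pick out selected bits by position, e.g. {0: 'placeholder_name'}."""
    bits = expand_bit_flags(value)
    return {name: bits[position] for position, name in names.items()}
```

The masking step matters because the flags are stored as int32: a word with its top bit set reads as a negative number, and masking recovers the intended bit pattern.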
Data Filtering
Ozone
It is recommended that only a subset of the ozone data be used for scientific analysis, based on the filtering recommendations from the release notes. Ozone values that meet these criteria can be identified from the ozone_filter variable in the returned dataset; a value of 0 indicates that the ozone value should not be used. The following criteria are used as the filters:
- Exclusion of all data points with an uncertainty estimate of 300% or greater
- Exclusion of all profiles with an uncertainty greater than 10% between 30 and 50 km
- Exclusion of all data points at and below the altitude where an aerosol extinction value greater than 0.006 km^-1 occurs
- Exclusion of all data points at and below the altitude where the 525 nm aerosol extinction exceeds 0.001 km^-1 and the 525/1020 extinction ratio falls below 1.4
- Exclusion of all data points below 35 km with an uncertainty estimate of 200% or greater
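The five criteria can be sketched for a single profile. This illustrates the listed rules and is not pysagereader's ozone_filter code. It assumes uncertainties already in percent (not %x100), inclusive 30 to 50 km bounds, and that missing values are passed as None and rejected. The caller chooses which extinction channel feeds the 0.006 km^-1 test, since the criterion as written here does not name one.

```python
def ozone_filter(alt, o3_err, ext_test, ext525, ext1020):
    """Return 1 (usable) or 0 (do not use) for each level of one ozone profile.

    alt      -- altitudes (km)
    o3_err   -- ozone uncertainty in percent, not %x100
    ext_test -- aerosol extinction (1/km) compared against 0.006; which
                channel to pass is left to the caller (assumption)
    ext525   -- 525 nm aerosol extinction (1/km)
    ext1020  -- 1020 nm aerosol extinction (1/km)
    Missing values are given as None.
    """
    n = len(alt)

    # Reject the whole profile if any level between 30 and 50 km has an
    # uncertainty greater than 10 %.
    for z, err in zip(alt, o3_err):
        if err is not None and 30 <= z <= 50 and err > 10:
            return [0] * n

    # Highest altitude where either aerosol test trips; every level at or
    # below it is rejected.
    cutoff = None
    for z, e_test, e525, e1020 in zip(alt, ext_test, ext525, ext1020):
        too_thick = e_test is not None and e_test > 0.006
        low_ratio = (e525 is not None and e1020 is not None and e1020 != 0
                     and e525 > 0.001 and e525 / e1020 < 1.4)
        if too_thick or low_ratio:
            cutoff = z if cutoff is None else max(cutoff, z)

    keep = []
    for z, err in zip(alt, o3_err):
        if err is None or err >= 300:              # missing, or >= 300 %
            keep.append(0)
        elif z < 35 and err >= 200:                # below 35 km and >= 200 %
            keep.append(0)
        elif cutoff is not None and z <= cutoff:   # at or below the aerosol cutoff
            keep.append(0)
        else:
            keep.append(1)
    return keep
```

The two aerosol criteria are merged into one cutoff: excluding everything at or below either trigger altitude is the same as excluding everything at or below the higher of the two.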
Aerosol
To remove cloud contamination from the aerosol data, the flags Cloud_Bit_1 and Cloud_Bit_2 are used to compute the cloud_filter. A value of 1 indicates that a cloud is present at or above that altitude.
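A sketch of the "at or above" logic for one profile. It assumes that a level counts as cloudy when either Cloud_Bit_1 or Cloud_Bit_2 is set there. How pysagereader actually combines the two bits is not documented on this page, and cloud_filter here is a hypothetical function.

```python
def cloud_filter(alt, cloud_bit_1, cloud_bit_2):
    """Return 1 where a cloud is flagged at or above a level, else 0.

    A level counts as cloudy if either cloud bit is set there (assumption).
    Works for altitude grids in either ascending or descending order.
    """
    cloudy_alts = [z for z, b1, b2 in zip(alt, cloud_bit_1, cloud_bit_2) if b1 or b2]
    if not cloudy_alts:
        return [0] * len(alt)
    # Every level at or below the highest cloudy level has a cloud at or above it.
    top = max(cloudy_alts)
    return [1 if z <= top else 0 for z in alt]
```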