nif:isString
|
-
All procedures were carried out in strict compliance with recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health, and were approved by the Institutional Animal Care and Use Committee of the University of California, San Francisco. Details regarding protocol and methodology have been published previously [58–59]. A brief description follows. Physiological data were collected from two adult squirrel monkeys (Saimiri sciureus, Monkey 1: male; Monkey 2: female). Subjects were group-housed with conspecifics in a temperature- and humidity-controlled colony. Subjects had ad libitum access to water and primate diet supplemented with fresh fruits and vegetables. An environmental enrichment program was administered by UCSF Laboratory Animal Resource Center staff. Regular monitoring and care were provided by UCSF veterinary staff. Prior to physiological recording, subjects were trained to sit in a primate chair. A head post was then surgically implanted to allow head restraint. For all surgical procedures, subjects were sedated with ketamine (25 mg/kg) and midazolam (0.1 mg/kg), and anesthetized with isoflurane gas (0.5–5%). Implants were secured to the cranium with bone screws and dental acrylic. Perioperative antibiotics and analgesics were administered as needed in consultation with UCSF veterinary staff. After subjects were trained to sit in the primate chair while head-fixed, they underwent a second surgery in which a recording chamber was implanted over primary auditory cortex (A1). The temporal muscle was resected, the cranium overlying auditory cortex was exposed, and the recording chamber was secured with bone screws and dental acrylic. Perioperative care was administered as before. Sterile procedures were used for all recording sessions to access auditory cortex.
Following lidocaine (1%) application, a small cranial burr hole (2–3 mm) was drilled inside the recording chamber under magnification with a surgical microscope. A small incision was then made in the dura using micro-surgical instruments. The process was repeated as needed for subsequent recording sessions to expose additional areas of auditory cortex. Between recording sessions, implants were cleaned aseptically and the chamber was filled with antibiotic ointment and sealed with a metal cap.
Recordings were conducted inside a sound attenuation chamber (Industrial Acoustics Company, Bronx, NY). Extracellular data were collected using 16-channel linear electrode arrays (177 μm² contact size, 150 μm spacing; NeuroNexus Technologies, Ann Arbor, MI). Probes were advanced into cortex with a hydraulic microdrive (David Kopf Instruments, Tujunga, CA) to depths at which neural activity was evident on most or all channels. Penetrations were approximately perpendicular to the surface of the exposed cortex, although it was not possible to achieve strict orthogonality for every recording given the complex anatomy of auditory cortex near the superior temporal sulcus [60–61]. Extracellular signals were amplified with an RA16 Medusa preamplifier (Tucker-Davis Technologies, Gainesville, FL), band-pass filtered (800–5000 Hz) and stored to hard disk at 30.3 kHz using a Cheetah A/D system (Neuralynx, Inc., Bozeman, MT) for offline analysis. Spike waveforms that exceeded three median absolute deviations of the raw voltage distribution were retained for further analysis. Both multi-unit (MU) and isolated single-unit (SU) signals were analyzed. Custom MATLAB software (MathWorks, Natick, MA) was used for spike waveform detection, outlier rejection, and sorting. Template matching was used in combination with manual sorting in 2D and 3D waveform feature space (e.g., projections onto principal components, peak/valley amplitude, spike times). Autocorrelation, cross-correlation, and refractory period analyses were used to support SU classifications. Only SUs that remained active for the duration of the recording were included in subsequent analyses. Filtered spikes that could not be assigned to a SU were considered MU signals.
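The amplitude criterion for spike detection can be illustrated with a minimal sketch in pure Python. This is not the custom MATLAB software used in the study. The function name is hypothetical, and the sketch assumes that deviations are measured from the median voltage and that both positive and negative excursions count; the text does not specify either detail.

```python
from statistics import median

def mad_threshold_crossings(voltage, n_mads=3.0):
    """Return sample indices whose deviation from the median voltage
    exceeds n_mads median absolute deviations (MADs).

    Assumption: the deviation is two-sided and taken relative to the
    median; the source only states a three-MAD criterion.
    """
    med = median(voltage)
    mad = median(abs(v - med) for v in voltage)
    cutoff = n_mads * mad
    return [i for i, v in enumerate(voltage) if abs(v - med) > cutoff]

# Toy trace: baseline noise of about +/-1 with one large deflection.
trace = [0, 1, -1, 0, 10, 0, -1, 1, 0]
crossings = mad_threshold_crossings(trace)
```

In this toy trace the median is 0 and the MAD is 1, so only the sample with value 10 exceeds the three-MAD cutoff.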
Sounds were delivered through a free-field speaker directly in front of the subject, 40 cm from the interaural line. Sound levels were calibrated with a Brüel & Kjær Model 2209 meter and a Model 4192 microphone, using an A-weighted decibel filter. Across the experiment, levels were held between 64 and 66 dB, and sound levels within the same recording session fell within 1 dB of each other. The stimulus used for estimating STRFs (below) was a dynamic moving ripple (DMR; Fig 1A), which has been extensively used in auditory STRF analysis as described in detail elsewhere [5–6,31,39]. Briefly, the DMR is a temporally varying broadband stimulus that shares many features with natural sounds such as short-term (local) spectrotemporal correlations, but is fully balanced in the long term for durations exceeding a few minutes [31]. It is thus capable of driving auditory cortical responses, and permits rigorous STRF estimates using the STA method without additional correction for stimulus correlations. For the present experiment, the DMR was 30 min in duration and comprised ~40 sinusoidal carriers per octave spanning 50–40,000 Hz, each with randomized phase. Carrier magnitude was modulated by the spectrotemporal envelope, which at a given time is defined by a single spectral modulation rate (peaks/oct) and a single temporal modulation rate (peaks/s). The spectral modulation rate varied from 0 to 4 cycles per octave, and the temporal modulation rate varied between –150 Hz (upward sweep) and 150 Hz (downward sweep). Both modulation parameters varied randomly and independently over time, and were statistically independent and unbiased within their respective ranges. Maximum modulation depth was 40 dB with a logarithmic amplitude distribution [39]. A unique 30-s DMR segment (50 repetitions), generated with the same parameters as the estimation stimulus, was used as a validation stimulus to assess the prediction accuracy of the STRF estimates (Fig 1B).
Figure data removed from full text. Figure identifier and caption: 10.1371/journal.pone.0183914.g001 Spectrotemporal receptive field (STRF) estimation and validation procedures. (A) Units were first probed with a 30-min dynamic moving ripple (DMR), a synthetic broadband stimulus sharing many features with natural sounds including local spectrotemporal correlations. (B) Responses elicited by a novel 30-s DMR segment (50 repetitions) were used for subsequent validation and testing. (C) STRFs were estimated by calculating the spike-triggered average (STA). Response predictions were obtained by convolution of the STRF and validation spectrogram, with the output nonlinearity modeled by half-wave rectification. (D) STRF validity was assessed by calculating the correlation coefficient between predictions and neuronal responses obtained with the trial-averaged peristimulus time histogram (PSTH). (E) Null STAs computed with circularly-shifted spike times were used to generate a sample of gain values expected by chance. A normal distribution fit to these values was used to determine gain value cutoffs corresponding to a logarithmically-spaced range of significance levels from p < 10⁰ to p < 10⁻⁹. (F) Similarly, null STAs subjected to a gain threshold were used to generate a sample of cluster mass values expected by chance, and a gamma distribution fit to these values was used to identify cluster mass cutoffs corresponding to the same range of significance levels. (G) The corrected STA was defined by pixels (clusters) exceeding a specified significance level.
All analyses were performed in MATLAB (MathWorks, Natick, MA). Raw STAs were obtained for each unit by computing the average stimulus preceding each spike (Fig 1C). STRFs were estimated at a resolution of 193 frequency bins (~0.05 oct, y-axis) and 200 time bins (1 ms, x-axis) to adequately reflect the spectral and temporal encoding fidelity of the auditory system in awake primates [41–48]. The color spectrum (z-axis) used to define STRF pixel values corresponds to the spike rate relative to the mean, such that red and blue reflect firing rates above and below the mean, respectively [31]. For simplicity, we refer to these responses as the excitatory and inhibitory regions of the STRF, respectively [17]. We use the term gain to refer to the strength of these responses (reflected in STRF pixel intensity values). Neuronal responses evoked by the validation stimulus were evaluated by computing trial-averaged peristimulus time histograms (PSTHs). These were compared to predicted responses (Fig 1D) obtained by convolution of the STRF with the validation stimulus (MATLAB conv2 function; [16,23,40,62]). Half-wave rectification was used as a simple approximation of the output nonlinearity characteristic of extracellular recordings [40,62]. The correlation coefficient between the response and prediction was used to quantify prediction accuracy. As described in further detail below, the validation stimulus was randomly divided into two 15-s halves, one for validation and the other for testing.
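The STA, prediction, and accuracy calculations described above can be sketched as follows. This is a minimal illustration in pure Python, not the MATLAB code used in the study. The function names, the list-of-rows spectrogram layout, and the handling of the first few time bins (which lack a full stimulus history) are assumptions made for the example.

```python
import math

def spike_triggered_average(stim, spike_bins, n_lags):
    """Average the stimulus over the n_lags bins up to each spike.

    stim: list of frequency rows, each a list of time-bin values.
    Returns sta[f][lag]; lag 0 is the spike bin, lag k is k bins earlier.
    """
    n_freq = len(stim)
    sta = [[0.0] * n_lags for _ in range(n_freq)]
    used = 0
    for t in spike_bins:
        if t < n_lags - 1:
            continue  # assumption: skip spikes without full stimulus history
        used += 1
        for f in range(n_freq):
            for lag in range(n_lags):
                sta[f][lag] += stim[f][t - lag]
    if used:
        sta = [[v / used for v in row] for row in sta]
    return sta

def predict_response(strf, stim):
    """Convolve the STRF with the stimulus along time, sum across
    frequency, then half-wave rectify the result."""
    n_freq, n_lags, n_time = len(strf), len(strf[0]), len(stim[0])
    pred = []
    for t in range(n_time):
        drive = 0.0
        for f in range(n_freq):
            for lag in range(min(n_lags, t + 1)):
                drive += strf[f][lag] * stim[f][t - lag]
        pred.append(max(drive, 0.0))  # half-wave rectification
    return pred

def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

# Toy example: one frequency channel, spikes one bin after each stimulus peak.
stim = [[0, 1, 0, 0, 1, 0]]
sta = spike_triggered_average(stim, spike_bins=[2, 5], n_lags=2)
prediction = predict_response(sta, stim)
accuracy = pearson_r(prediction, [0, 0, 1, 0, 0, 1])
```

In the toy example the STA recovers a one-bin response latency, so the prediction matches the toy PSTH. With real data the STRF would have 193 × 200 pixels and the PSTH would be the trial average over the 50 repetitions of the validation stimulus.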
Raw STAs were corrected with a broad continuum of liberal-to-conservative statistical thresholds corresponding to 30 logarithmically spaced significance values ranging from p = 10⁰ to p = 10⁻⁹ (note that p = 10⁰ reflects the uncorrected STA). For each STA, gain threshold cutoff values corresponding to each p value (expressed as pgain) were obtained by fitting a normal distribution to a sample of null STA gain values (Fig 1E). Expressed in this way, each p value gives the proportion of the null distribution with values more extreme than the corresponding gain cutoff. For example, pgain < 0.01 denotes that only STRF pixels with gain values beyond the cutoff bounding the most extreme 1% of the null distribution were retained for further analysis. To obtain the null STAs, spike times were circularly shifted by a random value (MATLAB circshift function) selected from an interval equal to the stimulus duration [63]. Thus, if spikes near the beginning of the stimulus were shifted toward the middle of the stimulus, spikes near the end of the stimulus wrapped around to the beginning. An STA was then computed with the shifted spike times using the same axes and resolution as the true STA (200 iterations). The circular shifting approach ensured that both spike counts and inter-spike interval (ISI) distributions were preserved across the true and null STAs. The validity of the STRFs obtained with each gain threshold setting was then evaluated in terms of prediction accuracy. A similar approach was implemented in a two-step correction procedure comprising a gain threshold followed by a cluster-based threshold. Following gain thresholding, the remaining clusters (contiguous pixels identified with the MATLAB bwconncomp function) were subjected to a range of cluster mass thresholds corresponding to the same 30 logarithmically spaced p values described above. Cluster mass was defined as the sum of the absolute pixel values within a cluster.
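The gain and cluster mass steps described above can be illustrated with a minimal pure-Python sketch, which is not the study's MATLAB implementation. All function names are hypothetical. The sketch assumes a two-sided, symmetric gain cutoff, 8-connected clusters (the bwconncomp default for 2D input), and clusters in which excitatory and inhibitory pixels may be joined; the text does not state these details. The cluster mass cutoff is taken as given rather than derived from a null distribution.

```python
import random
from statistics import NormalDist, mean, stdev

def log_spaced_p_values(n=30, min_exponent=-9.0):
    """n significance levels from 10**0 down to 10**min_exponent."""
    return [10.0 ** (min_exponent * k / (n - 1)) for k in range(n)]

def circular_shift(spike_bins, shift, n_bins):
    """Shift spike bins by `shift`, wrapping past the end to the start."""
    return sorted((t + shift) % n_bins for t in spike_bins)

def null_spike_bins(spike_bins, n_bins, rng):
    """One null spike train: a random circular shift within the stimulus."""
    return circular_shift(spike_bins, rng.randrange(n_bins), n_bins)

def gain_cutoffs(null_gains, p):
    """Lower and upper cutoffs from a normal fit to null gain values.
    Assumption: two-sided test, p/2 in each tail."""
    fit = NormalDist(mean(null_gains), stdev(null_gains))
    half_width = fit.inv_cdf(1.0 - p / 2.0) - fit.mean
    return fit.mean - half_width, fit.mean + half_width

def apply_gain_threshold(sta, lo, hi):
    """Zero every pixel whose value lies within [lo, hi]."""
    return [[v if (v < lo or v > hi) else 0.0 for v in row] for row in sta]

def clusters_with_mass(sta):
    """Find 8-connected clusters of nonzero pixels.
    Returns a list of (pixel_list, mass); mass is the summed |pixel value|."""
    n_rows, n_cols = len(sta), len(sta[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    clusters = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if sta[r0][c0] == 0 or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            stack, pixels = [(r0, c0)], []
            while stack:  # depth-first flood fill
                r, c = stack.pop()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < n_rows and 0 <= cc < n_cols
                                and sta[rr][cc] != 0 and not seen[rr][cc]):
                            seen[rr][cc] = True
                            stack.append((rr, cc))
            clusters.append((pixels, sum(abs(sta[r][c]) for r, c in pixels)))
    return clusters

def apply_cluster_threshold(sta, mass_cutoff):
    """Keep only clusters whose mass exceeds mass_cutoff."""
    out = [[0.0] * len(sta[0]) for _ in sta]
    for pixels, mass in clusters_with_mass(sta):
        if mass > mass_cutoff:
            for r, c in pixels:
                out[r][c] = sta[r][c]
    return out

p_values = log_spaced_p_values()                # 10**0 ... 10**-9
shifted = null_spike_bins([1, 8, 9], 10, random.Random(0))
lo, hi = gain_cutoffs([-1.0, 0.0, 1.0], p=0.05)
```

With 30 logarithmically spaced levels, the second most liberal value is approximately 0.49 and the third approximately 0.24. In practice the null gain sample would be pooled from the pixels of the 200 null STAs rather than from three toy values.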
Cluster mass cutoffs corresponding to each p value were obtained for each STA by fitting a gamma distribution to a sample of null clusters (Fig 1F). For computational efficiency, the null cluster mass distribution was obtained from the sample of 200 null STAs by computing the masses of clusters remaining in every ith null STA after applying gain thresholds computed with every jth null STA. The same range of gain thresholds described above was applied to STAs for subsequent cluster analysis with the exception of the most extremely liberal and conservative settings, as follows: [i] The p value reflecting the raw STA (p = 10⁰) was omitted since no gain threshold was implemented, and thus, no clusters were available for analysis, [ii] The p value reflecting the most liberal gain threshold (p < 0.49) was omitted because it generally produced only a small number of extremely large clusters, [iii] The eight most conservative gain threshold settings (approximate range: p < 10⁻⁷ to 10⁻⁹) were omitted because such extreme gain thresholds applied to null STAs typically resulted in very few or zero surviving clusters. Thus, STAs were first corrected at the pixel (gain) level with a total of 20 thresholds (approximate range: p < 0.24 to 10⁻⁶), and subsequently corrected at the cluster (mass) level with 30 thresholds reflecting the original range of p values (10⁰ to 10⁻⁹). STRF structure resulting from each gain and cluster threshold intersection (expressed p(gain,clst)) was evaluated in terms of prediction accuracy as described above. Although previous studies have applied the same significance thresholds to excitatory and inhibitory regions of the STRF [31], it is conceivable that STRF validity could benefit from independent thresholding. This is because, as numerous studies have reported, inhibitory regions tend to be less robust and stereotyped, and more variable than excitatory regions of the STRF [11,17,30,64].
Thus, an excitation-dominated STRF might benefit from more rigorous elimination of inhibitory pixels and clusters. To test this hypothesis, the correction approaches outlined above were extended to two additional analyses in which excitatory and inhibitory pixels and clusters were thresholded independently. This yielded a total of four statistical correction approaches which are summarized and compared below: [1] pixel gain correction (pgain), [2] independent excitatory and inhibitory pixel gain correction (pgain{exc,inh}), [3] cluster mass correction (p(gain,clst)), [4] independent excitatory and inhibitory cluster mass correction (p(gain,clst{exc,inh})). We note that, contrary to our a priori expectations, independent excitatory and inhibitory subfield analysis yielded little, if any, significant improvement in predictive validity over the more basic approaches. As such, all figures pertaining to these analyses are presented in the Supporting Information section (S1–S4 Figs) to permit better focus on the principal results of the paper. Two implementations of each of the foregoing gain- and cluster-based thresholding approaches are summarized in the results below. First, a fixed-parameter approach was tested in which all units were uniformly subjected to the same threshold settings. Each unit was exhaustively tested at all possible intersections of the significance levels included in our study (p < 10⁰ to p < 10⁻⁹). For each unit, this yielded a vector of 30 prediction correlation values for gain thresholding alone (pgain), a matrix of 30 × 30 prediction values for independent excitatory and inhibitory gain thresholding (pgain{exc,inh}), a matrix of 20 × 30 prediction values for gain- plus cluster-thresholding (p(gain,clst)), and an array of 20 × 30 × 30 values for independent excitatory and inhibitory cluster mass correction (p(gain,clst{exc,inh})). For the second approach, best threshold settings were selected for each unit via cross-validation.
To avoid overfitting these threshold settings, the validation stimulus was randomly divided into two equal 15-s segments. The threshold settings that maximized prediction accuracy for the first half of the data (validation dataset) were then evaluated in terms of prediction accuracy for the second half (test dataset). For the fixed-parameter approaches and raw STA, prediction values reflect the test dataset alone. To minimize the dependence of the results on any particular definition of the validation and test datasets, the procedure was repeated ten times for randomly-selected dataset halves. Prediction correlation values reported below indicate the mean across iterations. Because the present study was primarily concerned with the comparative consequences of gain and cluster correction choices, no smoothing was applied to responses, predictions, or STRF kernels. Unless otherwise noted, all analyses included the full estimation, validation, and test datasets. To ensure the results reflected units with reliable responses during both the estimation and validation phases, each multi-unit and single-unit was characterized with the reliability index (RI [22]) and trial similarity (TS [65]) metrics. To calculate RI, the estimation dataset was first divided into 30 1-min segments. An STRF was then computed using half of the segments selected at random (pgain < 0.05), and a second STRF was computed using the remaining segments. RI was defined as the mean correlation coefficient between the two STRFs across 200 iterations. TS was computed by constructing a PSTH from half of the validation trials selected at random (bin size = 10 ms), and a second PSTH from the remaining trials. TS was defined as the mean correlation coefficient between PSTHs across 100 iterations. The RI and TS calculations were repeated using circularly-shifted spike times, and only units with RI and TS values exceeding chance levels (p < 0.01) were retained for subsequent analysis. 
Unit populations were not screened further, e.g., by prediction significance criteria. To facilitate comparison with previous studies, the results focus on responses and predictions analyzed at 10-ms resolution. Additional summaries are provided for bin sizes of 1, 2, 5, 10, 20, 50, and 100 ms.
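The cross-validated threshold selection and the split-half trial similarity (TS) metric described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the study's MATLAB code. The function names and the `score_fn` callback are hypothetical, and the sketch assumes that the two halves of the validation stimulus are drawn as random sets of time bins; the text does not specify how the halves were constructed.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def split_halves(n_items, rng):
    """Randomly partition indices 0..n_items-1 into two halves."""
    order = list(range(n_items))
    rng.shuffle(order)
    half = n_items // 2
    return sorted(order[:half]), sorted(order[half:])

def cross_validated_accuracy(score_fn, thresholds, n_bins,
                             n_repeats=10, seed=0):
    """Pick the threshold that maximizes accuracy on the validation half,
    score it on the held-out test half, and average over repeats.

    score_fn(threshold, bins) is a hypothetical callback returning the
    prediction correlation restricted to the given time bins.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        val_bins, test_bins = split_halves(n_bins, rng)
        best = max(thresholds, key=lambda th: score_fn(th, val_bins))
        scores.append(score_fn(best, test_bins))
    return sum(scores) / n_repeats

def trial_similarity(trials, n_iter=100, seed=0):
    """Mean correlation between PSTHs built from random halves of trials.

    trials: list of per-trial binned spike counts (equal length).
    """
    rng = random.Random(seed)
    n_bins = len(trials[0])
    rs = []
    for _ in range(n_iter):
        idx_a, idx_b = split_halves(len(trials), rng)
        psth_a = [sum(trials[i][b] for i in idx_a) / len(idx_a)
                  for b in range(n_bins)]
        psth_b = [sum(trials[i][b] for i in idx_b) / len(idx_b)
                  for b in range(n_bins)]
        rs.append(pearson_r(psth_a, psth_b))
    return sum(rs) / n_iter

# Toy usage: four identical trials give perfectly similar half-PSTHs.
ts = trial_similarity([[0, 1, 2, 0]] * 4)
```

The reliability index (RI) follows the same split-half pattern, with 1-min estimation segments in place of trials and STRFs in place of PSTHs. Chance levels for both metrics would come from repeating the calculation on circularly shifted spike times.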
Temporal and spectral modulation preferences: Modulation properties of each unit were obtained by computing the two-dimensional Fourier transform of each version of the corrected STRF, as described in detail elsewhere [17,31,66]. Briefly, the Fourier transform is a function of temporal (–150 to 150 cycles/s) and spectral modulation frequency (0 to 4 cycles/octave). The ripple transfer function (RTF) is obtained by folding along the temporal midline (temporal modulation frequency = 0). Summing down the columns of the RTF yields the temporal modulation transfer function (tMTF), and summing across the rows of the RTF yields the spectral modulation transfer function (sMTF). MTFs were considered band-pass if values above and below the peak of the MTF decreased by at least 3 dB. All others were considered low-pass (high-pass MTFs were not encountered). The best modulation frequency (BMF) was defined as the peak of the MTF for band-pass MTFs, and the mean between zero and the 3-dB upper cutoff for low-pass MTFs.
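The band-pass versus low-pass classification and the BMF definition can be illustrated with a short pure-Python sketch. The function name is hypothetical. The sketch assumes that MTF values are power-like, so that a 3-dB drop corresponds to a factor of 10^(−0.3) ≈ 0.5, and that the upper cutoff is the first sampled frequency at or below that level, without interpolation. Neither detail is stated in the text.

```python
def classify_mtf(freqs, mtf, drop_db=3.0):
    """Classify an MTF as band-pass or low-pass and return its BMF.

    freqs: modulation frequencies in ascending order, starting at 0.
    mtf:   nonnegative, power-like MTF values (assumption: 10*log10 dB).
    """
    peak_idx = max(range(len(mtf)), key=lambda i: mtf[i])
    level = mtf[peak_idx] * 10.0 ** (-drop_db / 10.0)

    # Does the MTF fall by at least drop_db on the low side of the peak?
    drops_below = any(v <= level for v in mtf[:peak_idx])
    # First sample on the high side that has fallen by at least drop_db.
    upper_idx = next((i for i in range(peak_idx + 1, len(mtf))
                      if mtf[i] <= level), None)

    if drops_below and upper_idx is not None:
        return "band-pass", freqs[peak_idx]

    # Low-pass: BMF is the mean of zero and the upper cutoff.
    # If the MTF never falls by drop_db, the last sampled frequency is
    # used as the cutoff (an assumption for this sketch).
    upper = freqs[upper_idx] if upper_idx is not None else freqs[-1]
    return "low-pass", (0.0 + upper) / 2.0

freqs = [0, 1, 2, 3, 4]
band = classify_mtf(freqs, [0.2, 1.0, 0.6, 0.3, 0.1])
low = classify_mtf(freqs, [1.0, 0.9, 0.6, 0.4, 0.1])
```

In the first toy MTF the values fall by more than 3 dB on both sides of the peak at 1, so it is band-pass with a BMF of 1. In the second, the peak sits at zero and the 3-dB upper cutoff is reached at 3, giving a low-pass BMF of 1.5.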
|