Detection of Adolescent Depression from Speech Using Optimised Spectral Roll-Off Parameters

The purpose of this paper is to examine adolescent depression detection from a clinical database of 63 adolescents (29 depressed and 34 non-depressed) interacting with a parent. A range of spectral roll-off parameters was investigated to examine how the frequency-energy relationship relates to depression. The spectral roll-off range improved depression classification rates compared to the best individual roll-off parameter. Further improvement was accomplished using a 2-stage mRMR/SVM feature selection approach to optimize a subset of the roll-off parameters. The proposed optimized feature set reached an average depression detection accuracy of 82.2% for males and 70.5% for females. Further acoustic spectral features, including flux, centroid, entropy, formants and power spectral density, were also investigated for depression classification. The optimized spectral roll-off set was the most effective of the acoustic spectral features. All spectral features, including the best individual spectral roll-off, were grouped into a baseline feature category (S*) with an average classification accuracy of 71.4% (males) and 70.6% (females). A new spectral category (S), with the inclusion of the proposed optimized spectral roll-off subset, performed best with an average accuracy of 97.5% (males) and 92.3% (females).


Introduction
Clinical depression is a debilitating affective disorder characterized by emotional disturbances, reduced emotional expression and prolonged phases of excessive sadness [1], and it impairs a person's ability to function [2]. Depression is the third highest cause of global disease burden and will be the highest by 2030 [3,4]. In Australia depression is the third leading cause of disease burden and the leading cause of non-fatal disability [5], and it has a large economic impact, costing $14.9 billion annually [6]. The World Health Organization (WHO) has documented an increase in worldwide depression from an estimated 121 million sufferers in 2001 [4] to 350 million in 2012 [58]. Depression is the most prevalent mental disorder with the highest lifetime risk [7].
Depression is the leading cause of suicide and accounts for two-thirds of suicides [8], with higher risk in men, indigenous Australians, those in remote areas and children [9]. Since the 1970s there has been a considerable increase in adolescent depression prevalence. In Australia mental illness is most prevalent in the 16-24 year old age group [10,11]. Symptoms of depression frequently first appear in adolescence [12]. Half of lifetime cases have their onset by the age of 14 [12] and one in five by the age of 18 [13]. The rise in adolescent depression is correlated to an increase in youth suicide [13,14], which is the leading cause of youth death [15]. The lifetime risk of suicide with depression is 20% [16] and can be reduced with treatment [17].
Diagnosis is important but many suffering from a depressive illness do not seek or receive treatment. Approximately 65% of people with mental illness in Australia do not have treatment access [5]. The main difficulty in diagnosis is a lack of health care resources and providers [18]. Depression diagnosis is especially difficult in adolescents as symptoms are unrecognized during initial appearance [19]. Psychological depression diagnosis techniques rely almost completely on professional observations and evaluations. The diagnostic method is subjective and contingent on clinical judgment that depends on the training, skill set and experience of the practitioner [20].

ISSN: 2574-1241
to create an accessible, non-invasive, efficient and objective adolescent depression detection system to improve on current diagnosis procedures. Automatic mass screening could increase depression detection rates and improve access, ability and willingness to seek help. The remainder of this paper is organized as follows: Section II gives a literature review of depression detection and outlines how this study extends on existing work. Section III describes the clinical conversational speech database. Section IV outlines the method of pre-processing, feature extraction and modeling. Section V supplies results with discussions and Section VII provides conclusions and comparisons to equivalent past work.

Previous Work
The seriousness of depression has led to interest in depression analysis and detection. Clinical depression is associated with dull, monotonous and lifeless speech with a lack of expression [21]. Changes in speech quality can indicate affective disorders including depression [22,23]. Depressed subjects experience physiological fluctuations that alter vocal fold and vocal tract airflow, modifying speech properties [24,25]. Depressed speakers exhibit quantifiable changes in spectral, prosodic, articulatory and phonetic properties [26-29]. Studies have subjectively and objectively evaluated speech parameters as indicators of depression, severity and treatment efficacy [27]. Acoustic depression detection studies have concentrated on feature categories and found that spectral features outperform prosodic features [30-36].
Moore et al. compared prosodic and spectral features and determined that prosodic features performed worse than spectral features [29]. A follow-up study found that an optimal subset of prosodic, spectral and glottal features reached 91% (males) and 96% (females) in depression classification [36]. France et al. also studied acoustic properties in correlation to depression severity and found that spectral features (formants and PSD) attained up to 94% depression level detection [35]. Low et al. investigated acoustic adolescent depression detection with a large clinical database (the Oregon Research Institute database, ORI-DB) of 139 parent-adolescent interactions (68 depressed and 71 controls) [36,37]. The studies found that MFCC achieved 58%/60% (males/females), TEO attained 54%/61%, and the best result of 65%/65% was obtained with MFCC+LogE. Low et al. also investigated adolescent depression using ORI-DB and found that a combination of TEO, F0, LogE, shimmer, spectral flux and spectral roll-off gave the best result for males of 78% [31,32].
Speech energy has a tendency to be lower at high frequencies and higher at low frequencies [38]. The energy-frequency relationship is an important factor in the speech of depressed persons [39,40] and can be represented by spectral roll-off. Most speech, emotion and depression detection tasks using the spectral roll-off are restricted to a single parameter where the majority of energy resides (e.g. 75% [41], 80% [31,32], 85% [42,43], 90% or 95% [44]). Some studies have used a limited range of roll-off coefficients (e.g. 25%, 50%, 75% and 90%) combined with other features [33,45-47]. This study investigates the effectiveness of spectral features (flux, centroid, entropy, PSD, formants, roll-off) for adolescent depression detection. Spectral roll-off is expanded to a range of k values from 5% to 95% in 5% intervals, and it is proposed to optimize a subset of the new roll-off range.

Database
Family relationships are correlated to adolescent depression, so a database built on family interactions is well suited to the task [48,49]. The Oregon Research Institute (ORI), USA, has gathered a database (ORI-DB) of adolescent-parent(s) conversational interactions that has been validated by psychology [49-52] and engineering [31,32,36,37] studies. Detailed descriptions of participant recruitment, questionnaires, interviews and depression assessments are available in [53]. The ORI-DB contains 152 conversations between an adolescent (14-18 years old) and their parent(s). The adolescent was considered depressed if they met the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) criteria for major depressive disorder (MDD) [53] and non-depressed if they had no history of mental illness and the diagnostic criteria were not met. The participants were from West Oregon, USA, with an attempt to match depressed and control groups in demographic variables (e.g. age, gender, ethnicity and socioeconomic status) [49]. Depressed subjects generally had a lower socioeconomic status and mothers with higher depression levels, which reflects known associations in depression [54].
Each participant, seated a few feet apart, had a wireless lapel microphone and was recorded on a separate channel with a 44 kHz sampling frequency in a quiet laboratory room. The interactions were divided into three 20-minute discussions: event planning (EPI), family consensus (FCI) and problem solving (PSI) interactions [55]. The unscripted setup was designed to preserve naturally expressed emotions [55]. Furthermore, spontaneous speech has attained higher depression detection rates than read speech [23,56,57]. The following experiments used a subset of the ORI-DB containing only dyadic conversations, giving the final corpus a total of 63 subjects (29 depressed and 34 control adolescents). The gender pairings, given by Table 1, show a biased gender ratio that reflects the trend of higher depression rates in females [58] and is a common problem in related studies [31,32,35-37].

Methods
The proposed methodology for adolescent depression detection is summarized by Figure 1 with three main stages: pre-processing, acoustic feature extraction and SVM modeling/classification. The training phase learns models that best discriminate between depressed and non-depressed training subjects. In the testing phase the learned models are evaluated and used to classify unknown test subjects.

Pre-Processing
The adolescent speech signals were pre-processed and cleaned to remove background noise and cross-talk from the parents' microphone. The clean signal was then segmented into overlapped, windowed voiced segments. More detailed descriptions of the speech signal pre-processing are as follows:
Decimate Sampling Frequency: It is acceptable to remove the higher frequency components, as these are redundant for the majority of phonemes. A decrease in sampling frequency reduces the amount of data and hence processing time and memory requirements. The audio signal was recorded with a 44 kHz sampling frequency, which was reduced to 11 kHz. Down-sampling alone causes aliasing, in which high-frequency components are misrepresented as lower frequencies. An anti-aliasing filter (low-pass with a 5.5 kHz cut-off) was used to band-limit the signal to the new Nyquist frequency and mitigate this effect. Down-sampling was then conducted by an integer factor, M=4, to decimate the sampling frequency.
Cross-Talk Removal using Fast ICA for BSS: During recording the close proximity of speakers led to interference between microphones. Instances of simultaneous speech (cross-talk) meant each microphone recorded a weighted mixture of both speaker sources. Simultaneous speech segments should be retained considering their importance in parent-adolescent conversations [59]. Blind Source Separation (BSS) was required to recover the original speech sources from simultaneous speech segments. BSS was solved using the FastICA fixed-point independent component analysis algorithm [60].
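The decimation step can be illustrated with a minimal sketch. It assumes a Hamming-windowed sinc FIR filter with 101 taps as the anti-aliasing filter; the paper does not state the filter design, so `design_lowpass`, `decimate` and the tap count are illustrative choices only.

```python
import math

def design_lowpass(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc low-pass FIR filter, normalised to unity DC gain.
    The filter type and length are assumptions, not taken from the paper."""
    fc = cutoff_hz / fs_hz                      # cut-off in cycles per sample
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]

def decimate(signal, factor, taps):
    """Low-pass filter the signal, then keep every `factor`-th sample.
    Only the retained output samples are computed; edges are zero-padded."""
    half = len(taps) // 2
    out = []
    for start in range(0, len(signal), factor):
        acc = 0.0
        for k, h in enumerate(taps):
            idx = start + half - k              # centred (zero-delay) convolution
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        out.append(acc)
    return out

FS_IN = 44000          # recording rate stated in the text
M = 4                  # integer decimation factor stated in the text
taps = design_lowpass(cutoff_hz=5500.0, fs_hz=FS_IN)
```

Calling `decimate(x, M, taps)` on a 44 kHz signal `x` returns the 11 kHz version; content above 5.5 kHz is attenuated before the samples are discarded, which is what prevents aliasing.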

Background Noise Removal using SS:
The speech of the adolescents was processed to reduce background noise and improve audio quality. In this study Spectral Subtraction (SS), implemented with Adobe Audition, was utilized to estimate the noise spectrum, which was then removed from the recorded speech spectrum [61].

Windowing:
The cleaned signal was normalized to within an amplitude range of -1 to 1 based on the absolute maximum amplitude. The normalized signal was segmented into 25 ms frames with a 50% overlap. The length was chosen to cover an entire periodic cycle of speech and capture the fundamental frequency. If the final frame was too short, it was padded with random noise 30 dB below the maximum frame amplitude. The frames were windowed with an overlapping Hamming function, preferable as it minimizes the magnitude of the nearest side-lobe in the frequency domain, to increase time resolution and avoid discontinuities between segments.
Voiced Speech Segment Extraction using VAD: Depending on the production process, speech is categorized as voiced or unvoiced. If airflow vibrates the vocal cords and excites the vocal tract, speech is considered voiced. If air is forced through a constricted vocal tract the vocal cords do not vibrate; the speech then has no fundamental frequency and is defined as unvoiced. In many emotion and speech analysis applications only voiced segments are analyzed, with unvoiced segments considered noise [62]. The voiced speech segments were extracted using a voice activity detector (VAD), based on linear prediction [62], from the MATLAB speech processing and synthesis toolbox [63]. The 13th order linear prediction coefficients, the energy of the prediction error, E, and the first reflection coefficient, r1, were generated. A segment was denoted as voiced, and retained, if it met the optimal VAD thresholds of r1 > 0.2 and E > 1.85×10^7 [64]. If the criteria were not met the segment was considered unvoiced speech or silence and discarded [64].
Feature Extraction: Speech analysis studies generally follow procedures that group acoustic features into categories that relate to the human speech production model [30,32,35] with physiological and perceptual components. The spectral category relates to the linear speech production model through the glottis, vocal tract and filtering. The spectral category acoustic speech features are summarized in Table 2 with the corresponding number of coefficients. The features included flux, centroid, entropy, roll-off coefficients, formants with bandwidths and the power spectral density (PSD) with sub-bands and power ratios. Detailed explanations and methodologies of each subcategory feature are given in the following sections.
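The windowing procedure described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the function name `frame_and_window`, the uniform noise generator and the fixed seed are assumptions, and the padding level is taken relative to the normalised peak of 1.

```python
import math
import random

def frame_and_window(signal, fs_hz, frame_ms=25.0, pad_db=-30.0, seed=0):
    """Normalise to [-1, 1], cut into 50%-overlapped frames and apply a Hamming window.
    A short final frame is padded with low-level random noise (assumed uniform)."""
    peak = max(abs(s) for s in signal)
    x = [s / peak for s in signal] if peak > 0 else list(signal)

    n = int(round(fs_hz * frame_ms / 1000.0))   # samples per frame
    hop = n // 2                                 # 50% overlap
    count = 1 + max(0, math.ceil((len(x) - n) / hop))

    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    rng = random.Random(seed)
    noise_amp = 10.0 ** (pad_db / 20.0)          # 30 dB below the normalised peak

    frames = []
    for f in range(count):
        chunk = x[f * hop : f * hop + n]
        chunk += [rng.uniform(-noise_amp, noise_amp) for _ in range(n - len(chunk))]
        frames.append([c * w for c, w in zip(chunk, window)])
    return frames
```

At an 11 kHz sampling rate each frame holds 275 samples and consecutive frames start 137 samples apart.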

Formants and Bandwidths
Formants significantly differ between depressed and non-depressed speakers [28,29,34,39,40] and can provide important spectral characteristics for speech analysis. Formants are spectral peaks of the vocal tract representing acoustic resonance frequencies. The vocal tract was modeled with a 13th order linear prediction (LP) filter. The first five formants, Fi, and bandwidths, BWi, were estimated as the peaks in the vocal tract spectral envelope represented by the poles of the transfer function, pi. In the standard pole-angle and pole-radius form, with f_s denoting the sampling frequency, these are:

\[ F_i = \frac{f_s}{2\pi}\tan^{-1}\!\left(\frac{\mathrm{Im}(p_i)}{\mathrm{Re}(p_i)}\right) \tag{1} \]

\[ BW_i = -\frac{f_s}{\pi}\ln\lvert p_i\rvert \tag{2} \]

Power Spectral Density (PSD)
Power spectral density (PSD) has been effective in distinguishing between speech of depressed and non-depressed subjects [34,39,40]. In this study the single-sided PSD was estimated using the Welch spectral estimator [65]. The PSDdB, given by (3), was generated with a 4096-point FFT with 50% overlapped Hamming windows for the entire bandwidth. The total power for the entire bandwidth, Ptotal, was calculated as the area under PSDdB using trapezoidal numerical integration. The power calculation was repeated for multiple spectral sub-bands (Psub-band) in the 0-500 Hz, 500-1000 Hz, 1000-1500 Hz and 1500-2000 Hz bands. The ratios of each sub-band power to the total power are given by (4).
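The sub-band power ratios can be illustrated with a short sketch of the trapezoidal integration. It assumes the PSD has already been estimated and is supplied as parallel lists of bin frequencies and PSD values; the helper names `trapezoid`, `band_power` and `sub_band_ratios` are hypothetical.

```python
SUB_BANDS = [(0, 500), (500, 1000), (1000, 1500), (1500, 2000)]  # Hz, as listed in the text

def trapezoid(freqs, values):
    """Area under `values` over `freqs` by the trapezoidal rule."""
    return sum(0.5 * (values[i] + values[i + 1]) * (freqs[i + 1] - freqs[i])
               for i in range(len(freqs) - 1))

def band_power(freqs, psd, lo_hz, hi_hz):
    """Power inside [lo_hz, hi_hz], using only the bins that fall in the band."""
    pts = [(f, p) for f, p in zip(freqs, psd) if lo_hz <= f <= hi_hz]
    return trapezoid([f for f, _ in pts], [p for _, p in pts])

def sub_band_ratios(freqs, psd):
    """Ratio of each sub-band power to the total power over the full bandwidth."""
    total = trapezoid(freqs, psd)
    return [band_power(freqs, psd, lo, hi) / total for lo, hi in SUB_BANDS]
```

For a flat spectrum over 0-5500 Hz each 500 Hz sub-band holds one eleventh of the total power, which is a convenient sanity check.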

Spectral Entropy
Spectral entropy measures the amount of signal information (Shannon's information theory) and the spikiness of the spectral distribution. For each frame the spectral entropy, H, was computed using (5), where PS is the power spectrum, n defines the frequency component index and N the FFT length.
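Equation (5) is not reproduced in the text, so the following sketch assumes the common definition: the power spectrum is normalised to sum to one and the Shannon entropy of the resulting distribution is taken in bits.

```python
import math

def spectral_entropy(power_spectrum):
    """Shannon entropy (bits) of the power spectrum treated as a probability distribution.
    Assumed form: H = -sum(p_n * log2(p_n)) with p_n = PS_n / sum(PS)."""
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0
    h = 0.0
    for ps in power_spectrum:
        p = ps / total
        if p > 0:
            h -= p * math.log2(p)
    return h
```

A flat spectrum gives the maximum entropy of log2(N) bits, while a single spectral spike gives zero, which matches the "spikiness" interpretation above.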

Spectral Centroid
The spectral centroid, CS, denotes the center of a signal's spectral power distribution as defined by (6), where the power spectral magnitudes, PSn, weight the corresponding frequencies, fn, over N frequency bins.
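As a minimal sketch of the description above, the centroid can be computed as the power-weighted mean frequency. Equation (6) is not shown in the text, so this normalised form is an assumption.

```python
def spectral_centroid(freqs, power_spectrum):
    """Power-weighted mean frequency: sum(f_n * PS_n) / sum(PS_n)."""
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0
    return sum(f * p for f, p in zip(freqs, power_spectrum)) / total
```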

Spectral Flux
Spectral flux measures the cycle-to-cycle fluctuation in the power spectrum (PS). The spectral flux, FxS, was measured as the Euclidean distance between the PS of consecutive frames, as defined by (7), where i denotes the frame index.
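The flux calculation can be sketched directly from this description, assuming each frame's power spectrum is given as a list of equal length:

```python
import math

def spectral_flux(frame_spectra):
    """Euclidean distance between the power spectra of consecutive frames.
    Returns one value per frame pair (i-1, i)."""
    return [math.sqrt(sum((cur - prev) ** 2 for prev, cur in zip(a, b)))
            for a, b in zip(frame_spectra, frame_spectra[1:])]
```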

Spectral Roll-Off
Energy in speech signals has a tendency to be lower at high frequencies. This quality can be observed with spectral roll-off, which characterizes the relationship between energy and frequency [38].
Equation (8) defines the spectral roll-off, where n is the frequency bin index, PSn is the corresponding spectral magnitude, fR is the spectral roll-off frequency and N is the total number of frequency bins. The spectral roll-off is defined as the frequency, fR, below which a specified proportion, k, of the total spectral energy is accumulated. Past studies have considered the spectral roll-off where the majority of energy resides [31,32,42,44,66] or a limited range of spectral roll-off values [33,46,47]. In the following experiments the spectral roll-off is generated for a large range of k values from 5% to 95% in 5% intervals (i.e. k=0.05:0.05:0.95), giving a total of 19 parameters.
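The roll-off definition lends itself to a short sketch: accumulate the spectrum from the lowest bin upwards and report the first frequency at which the running sum reaches the proportion k of the total. The function names are illustrative, and returning the bin frequency without interpolation is an assumption.

```python
def spectral_rolloff(freqs, power_spectrum, k):
    """Lowest bin frequency f_R at which the cumulative spectrum reaches k * total."""
    target = k * sum(power_spectrum)
    running = 0.0
    for f, ps in zip(freqs, power_spectrum):
        running += ps
        if running >= target:
            return f
    return freqs[-1]

# The 19 proportions used in the study: 5% to 95% in 5% steps.
K_VALUES = [round(0.05 * i, 2) for i in range(1, 20)]

def rolloff_range(freqs, power_spectrum):
    """Roll-off frequency for every k in K_VALUES (19 features per frame)."""
    return [spectral_rolloff(freqs, power_spectrum, k) for k in K_VALUES]
```

Because the cumulative sum never decreases, the 19 roll-off frequencies returned for a frame are non-decreasing in k.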

Modeling and Classification
The SVM is considered an effective binary classifier [67] with good generalization capabilities [68,69]. In this application, the LIBSVM toolbox was used for depression classification [70], with sequential minimal optimization (SMO) and a radial basis function (RBF) as the kernel. The SVM hyper-parameters (C, γ) were optimized, from 10% of the entire dataset, using a 3-stage grid-search to fine-tune parameters at big, medium and small scales. The objective was to determine the hyper-parameters that maximized the 3-fold cross-validated depression classification accuracy. At each stage the precision was increased and the search space reduced around the current optimal parameters. The remainder of the dataset (90%) was segmented into 80% (training) and 20% (testing) to learn the SVM weights and bias (w and b) parameters. The overall training and testing process was implemented with 3-fold CV and used the previously determined optimal hyper-parameters.
Typically, speech analysis applications are more effective with gender dependent models due to acoustic differences in male and female speech [71,72]. Furthermore, depression symptoms [73,74] and speech [59,75,76] are suggested to differ between genders. Consequently, depression detection accuracy is improved using gender dependence [32,36,37]. Therefore in this study the SVM was trained and tested using either the female adolescent (GDM-F) or the male adolescent (GDM-M) speech features for gender dependent modeling (GDM).
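The coarse-to-fine hyper-parameter search can be sketched as below. The search ranges, step sizes and the name `refine_grid_search` are assumptions (the paper does not give them), and `score` is a hypothetical callback standing in for the 3-fold cross-validated SVM accuracy at a given (C, γ).

```python
def refine_grid_search(score, c_range=(-5.0, 15.0), g_range=(-15.0, 3.0),
                       steps=(2.0, 0.5, 0.125)):
    """Three-stage grid search over log2(C) and log2(gamma).
    Each stage uses a finer step and a window centred on the previous optimum."""
    c_lo, c_hi = c_range
    g_lo, g_hi = g_range
    best = None
    for step in steps:
        best = None
        c = c_lo
        while c <= c_hi + 1e-9:
            g = g_lo
            while g <= g_hi + 1e-9:
                s = score(2.0 ** c, 2.0 ** g)
                if best is None or s > best[0]:
                    best = (s, c, g)
                g += step
            c += step
        # Shrink the search window to one coarse step either side of the optimum.
        _, bc, bg = best
        c_lo, c_hi = bc - step, bc + step
        g_lo, g_hi = bg - step, bg + step
    return 2.0 ** best[1], 2.0 ** best[2], best[0]
```

Each refined grid contains the previous optimum, so the best score cannot decrease from one stage to the next.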

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \times 100\% \tag{9} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \times 100\% \tag{10} \]

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{11} \]

where the true positive (TP), false positive (FP), false negative (FN) and true negative (TN) parameters are defined as follows:
TP: number of samples correctly classified as depressed
FP: number of samples misclassified as depressed
TN: number of samples correctly classified as non-depressed
FN: number of samples misclassified as non-depressed

The spectral roll-off is defined as the frequency, fR, below which a specified proportion, k, of the spectrum is contained. In past studies the spectral roll-off has been designated as the frequency below which the majority of the energy exists (e.g. 75%, 80%, 85% and 95%) [32,33,41-44] or has been given a limited range of cut-off points [33,45-47]. In this study the spectral roll-off range has been extended with an increased resolution of 5% increments and a larger range from 5% to 95%. The spectral roll-off range was analyzed for correlations to depression by comparing the roll-off at each point (k). Figure 2 illustrates the relative difference of the average spectral roll-off between the depressed (D) and non-depressed (ND) adolescents. The roll-off frequencies for k<55% are higher for ND than D, and in contrast for k>55% the values are higher for D than ND. The interpretation is that D compared to ND have less (more) than 55% of energy concentrated below a lower (higher) frequency. This implies that D subjects have a higher energy concentration in higher frequencies than ND and that ND subjects have relatively more energy concentrated in the lower frequencies than D. In general speech energy has a tendency to be lower at high frequencies [38], and this is shown to be more evident in the ND case compared to the D case. The largest difference of the average spectral roll-off frequencies between D and ND is at k=30% and k=80%. On average there is minimal difference of the average spectral roll-off with k intervals between 5%-10% and 50%-80%.
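The sensitivity, specificity and accuracy measures of (9)-(11) can be computed from the four confusion counts as in this small sketch (the function name is illustrative):

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy in percent from confusion counts."""
    sensitivity = 100.0 * tp / (tp + fn)          # depressed samples found
    specificity = 100.0 * tn / (tn + fp)          # non-depressed samples found
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy
```

For example, 40 true positives, 10 false negatives, 45 true negatives and 5 false positives give 80% sensitivity, 90% specificity and 85% accuracy.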

Spectral Roll-off Range
The roll-off parameters were examined to determine significance in a pairwise comparison of depressed and control groups. ANOVA was performed on separate coefficients, assessed using the Wilks' lambda statistical procedure, and considered significant if p<0.05. The p-values, given by Tables 2 & 3, denote the significance of t-tests of the spectral roll-off. The roll-off points with significance or no significance coincide with the observations made from Figure 2.

Depression Classification with Optimized Spectral Rolloff Range Features
In past studies the spectral roll-off has been the frequency below which the majority of the energy exists [12,3,6] or a limited range [13,14,9,76]. In this study the spectral roll-off has an increased range and resolution with a total of 19 parameters. The characteristics of the range of spectral roll-off points can provide additional information in relation to depression. However, the entire range could include irrelevant or redundant parameters that do not improve depression detection. An optimal subset of roll-off parameters should be established to maximize depression classification performance. The optimized subset of roll-off parameters was generated using minimum redundancy and maximum relevancy (mRMR) as a filter-based feature selection method [40]. mRMR was implemented with the mutual information quotient (MIQ) to rank the feature parameters that best discriminate between depressed and non-depressed subjects, subject to the constraint that features are mutually dissimilar [41,40]. MIQ is defined by (12), where C is the class (D or ND), i is the current index of the candidate feature and j is a feature that already belongs to the optimal subset given by S. The mutual information between feature i and class C is given by I(i,C) and the mutual information between features i and j is I(i,j).

\[ \max_{i \notin S}\left[ \frac{I(i,C)}{\frac{1}{\lvert S\rvert}\sum_{j \in S} I(i,j)} \right] \tag{12} \]

\[ I(x,y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \tag{13} \]

The mutual information between two features is given by (13), where p(x) or p(y) is the probability density function of variable x or y and p(x,y) is the joint probability density function of x and y. The mRMR filter is followed by a second-stage wrapper to further optimize a feature subset that improves depression classification accuracy [40,41]. The wrapper stage was carried out by iteratively removing the lowest ranked coefficient, from the first stage, to find the best 3-fold CV SVM classification accuracy. The entire feature optimisation process is summarised by Figure 3, with both stages.
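The two-stage selection can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: features are assumed to be discretised already so that mutual information can be estimated from counts, a small `eps` guards against division by zero, and `evaluate` is a hypothetical callback that would return the 3-fold CV SVM accuracy for a candidate subset.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (bits) between two discrete sequences, estimated from counts."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mrmr_miq_rank(features, labels, eps=1e-12):
    """Stage 1 (filter): greedy MIQ ranking of feature indices, best first.
    `features` is a list of columns, one sequence per feature."""
    relevance = [mutual_information(f, labels) for f in features]
    remaining = list(range(len(features)))
    selected = []
    while remaining:
        best_i, best_q = None, None
        for i in remaining:
            if selected:
                redundancy = sum(mutual_information(features[i], features[j])
                                 for j in selected) / len(selected)
                q = relevance[i] / (redundancy + eps)   # relevance / mean redundancy
            else:
                q = relevance[i]                        # first pick: relevance only
            if best_q is None or q > best_q:
                best_i, best_q = i, q
        selected.append(best_i)
        remaining.remove(best_i)
    return selected

def wrapper_select(ranking, evaluate):
    """Stage 2 (wrapper): drop the lowest-ranked feature one at a time and
    keep the subset with the highest score returned by `evaluate`."""
    subset = list(ranking)
    best_subset, best_score = list(subset), evaluate(subset)
    while len(subset) > 1:
        subset = subset[:-1]
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score
```

In this scheme a feature that duplicates an already selected one is pushed down the ranking even when it is individually relevant, which is the redundancy penalty that the MIQ criterion is designed to impose.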

Spectral Sub-Category Features Depression Classification
Past depression detection studies [31,34,36] have investigated a variety of spectral features. Similar studies in adolescent depression detection using the ORI-DB have categorized spectral features into a combined feature set [6,3,18,20]. This experiment examines the same spectral features (formants, PSD, flux, centroid, entropy) as used in past studies in comparison to the proposed range of optimized spectral roll-offs from Section B. The depression classification performances of the spectral sub-category feature sets are given in Table 4 for each interaction and GDM. The feature sets belonging to the spectral category have an average accuracy of 56%, 54% and 68% (GDM-F) and 65%, 66% and 68% (GDM-M) in the EPI, FCI and PSI respectively. Spectral flux, centroid and entropy are the worst performing spectral features with just 50% accuracy and a poor sensitivity to specificity ratio. Formants, PSD and the entire roll-off set are more effective in depression detection.

Depression Classification with Combined Spectral Category Features (S and S*)
Each spectral feature, including the best individual roll-off parameter, was combined into a spectral (S*) category to replicate procedures in past studies [32,33,44,24]. This was compared to a new spectral category (S) with the addition of the proposed optimized roll-offs from Section B. The SVM depression classification performance of each implementation (S* and S) is compared in Tables 5 & 6. The new spectral category (S) with optimized roll-off parameters was the best spectral category by an average of 26% (GDM-M) and 22% (GDM-F) compared to the original spectral category (S*). The average accuracy, across the three interactions, was 71.4% (GDM-M) and 70.6% (GDM-F) for S* and 97.5% (GDM-M) and 92.3% (GDM-F) for S. Overall the best result obtained using the S category was in the PSI/GDM-M case, reaching 97.9% accuracy with 99.5% sensitivity and 96.5% specificity. The classification rates of the individual spectral features range from 51% (spectral centroid and entropy) to 88% (optimized roll-off range) depending on the feature/topic/gender combination. Individual spectral features are improved by an average accuracy of 5% (GDM-M) and 11% (GDM-F) when combined into the S* category and by 31% (GDM-M) and 33% (GDM-F) using the S category.

Conclusion
This study has provided an investigation of adolescent depression detection using a variety of commonly used spectral features, including an examination of spectral roll-off features, independently and in combination. In past studies gender dependence has improved depression classification, with the best results reported for females [38,39], for males [32,33] or varying amongst features [32,33,37,39]. In this study depression detection was more effective in males (GDM-M) than females (GDM-F). The only exception, with features performing better in GDM-F, was in the conflict-invoking PSI. For the majority of sub-category and category feature sets the best interaction was the PSI, especially in the GDM-F case. This is consistent with past examinations of interactional topic in relation to depression detection accuracy [33,39]. The three interaction tasks were specifically designed to access unique behavioral characteristics that elicit differential levels of each affect [57,46]. The PSI was set up to evoke conflicting behavior that is strongly correlated to depression in family interactions and could explain the increased depression detection rates [79].
The entire range of 19 roll-off parameters improved depression detection performance compared to the best individual spectral roll-off parameter. Accuracy was improved further using the new optimized subset of spectral roll-off parameters, obtained with 2-stage mRMR/SVM feature selection, reaching an average accuracy of 82.2% (GDM-M) and 70.5% (GDM-F). The optimized spectral roll-off set was the most effective of all the spectral features. The spectral features were combined into spectral categories with a baseline category (S*) and a new spectral category (S) with the inclusion of the optimized spectral roll-off set. Of the two fused categories, the S category, using the optimized roll-off, had an average accuracy of 97.5% (GDM-M) and 92.3% (GDM-F), outperforming the S* category. Overall the best result was in the GDM-M/PSI case with 97.9% accuracy and 99.5%/96.5% sensitivity/specificity. The best results in this paper, of 95.1% and 97.9% for GDM-F and GDM-M, outperform the previous best ORI-DB study by Low et al., which reached 79% (GDM-F) and 87% (GDM-M) using TEO [33]. The study also outperforms the best current depression detection study by Moore et al. at 95.6% (GDM-F) and 91.3% (GDM-M) using optimized feature selection from glottal and prosodic feature sets [80].

Figure 1: Flow chart framework for acoustic depression recognition procedure including an outline of the three main stages: pre-processing, feature extraction and SVM modeling/classification. The procedure is illustrated with a separation for both the training and testing phases.

Figure 2: Illustration of the relative difference of the average spectral roll-offs between the non-depressed (ND) and depressed (D) subjects over a range of cut-off points, k.

Figure 3: Framework to optimize a subset from the range of spectral roll-off coefficients using a 2-stage feature selection method with a filter (mRMR) and wrapper (SVM classifier).

Figure 4: Accuracy of SVM GDM-M depression classification with 1 to 19 roll-off feature coefficients kept based on MIQ ranking from the mRMR filter.

Figure 4 (GDM-M) and Figure 5 (GDM-F) show the depression classification accuracy for iteratively reduced subsets of roll-off coefficients via mRMR/SVM feature selection. Crosses signify the highest classification accuracy with the optimal feature subset. The best accuracy for EPI, FCI and PSI occurred with 14, 16 and 17 (GDM-M) and 11, 10 and 17 (GDM-F) of the top ranked coefficients retained in the subset.

Figure 5: Accuracy of SVM GDM-F depression classification with 1 to 19 roll-off feature coefficients kept based on MIQ ranking from the mRMR filter.

A summary of the spectral roll-off depression classification performance is given in Table 3, comparing the optimized spectral roll-off range, the entire roll-off set and the best individual roll-off. Depression classification rates are lower using the best individual roll-off compared to the entire roll-off feature set by an average of 19.8% (GDM-M) and 11.2% (GDM-F). The entire feature set was improved further using the optimized roll-off feature set, with an average increase in depression classification accuracy of 5% (GDM-M) and 5% (GDM-F). The optimized feature selection was on average 25.2% (GDM-M) and 16% (GDM-F) more accurate compared to the best individual roll-off coefficient. The optimized roll-off parameter subset has an average accuracy of 82.2% (GDM-M) and 70.5% (GDM-F), with an overall best in the PSI/GDM-F case attaining 88.1% accuracy with a sensitivity of 86.8% and a specificity of 89.9%.

Table 1: Gender distribution of depressed and non-depressed participants from the ORI database for dyadic conversations.

Table 2: Summary of acoustic spectral feature category with feature subcategories and the respective total number of coefficients.

Table 3: ANOVA analysis on depressed and control spectral roll-off features for male and female adolescents, where "<" denotes p<0.001 and the shaded cells indicate significance (p<0.05).

Table 4: Depression classification accuracy, sensitivity and specificity comparing the entire spectral roll-off feature set, the best individual roll-off parameter and the optimized roll-off feature set using 2-stage mRMR/SVM selection.

Table 5: Accuracy, sensitivity and specificity of SVM depression classification (GDM-M) using spectral sub-category features.

Table 6: Depression classification accuracy, sensitivity and specificity comparing the spectral categories with the roll-off subcategory as the best roll-off (S*) and optimized roll-off (S).