Canonical Correlation Analysis to Study the Impacts of Different Social Factors on Awareness of Health Hazard of Tobacco Smoking and Smoking Habit

The present analysis uses data collected from 1012 students of three universities, selected by convenience sampling. Most of the students (88.3%) are highly aware of the health hazard of smoking. Still, a good number of students (32.9%) are smokers. Smoking is more prevalent among older students. Awareness of the health hazard of smoking and smoking habit are associated with each other, and both are associated with the socioeconomic background of the students. Canonical correlation analysis is therefore performed to study the complex relationship of awareness and smoking habit with other socioeconomic variables. The analysis indicates that the important variables in this relationship are sex and marital status.

Biomedical Journal of Scientific & Technical Research. Cite this article: Bhuyan KC, Urmi AF. Canonical Correlation Analysis to Study the Impacts of Different Social Factors on Awareness of Health Hazard of Tobacco Smoking and Smoking Habit. Biomed J Sci&Tech Res 10(5)-2018. BJSTR. MS.ID.002011. DOI: 10.26717/BJSTR.2018.10.002011. Volume 10, Issue 5: 2018.
Table 1: Distribution of students according to sex and smoking habit.
                 Smoking habit
Sex         Yes n (%)       No n (%)        Total n (%)
Male        317 (38.1)      514 (61.9)       831 (82.1)
Female       16 (8.8)       165 (91.2)       181 (17.9)
Total       333 (32.9)      679 (67.1)      1012 (100)

Table 2: Distribution of students according to awareness of health hazard of smoking and sex.
                 Awareness
Sex         Medium n (%)    High n (%)      Total n (%)
Male        100 (12.0)      731 (88.0)       831 (82.1)
Female       18 (9.9)       163 (90.1)       181 (17.9)
Total       118 (11.7)      894 (88.3)      1012 (100)

Table 3: Distribution of students according to awareness of health hazard of smoking and smoking habit.


Introduction
Smoked tobacco is consumed around the world and poses a serious health threat globally. Cohen [1] reported that smoking is an increasingly prevalent habit in Bangladesh, particularly among males. According to the Global Tobacco Survey [2], 60% of tobacco users consume only smokeless tobacco. Tobacco smoking remains the leading preventable cause of death worldwide, with tobacco-induced deaths projected at over 6 million annually [3]. However, antismoking campaigns and programs have achieved considerable success in preventing tobacco-related disease [4]. Tobacco use is a global epidemic among young people. As with adults, it poses a serious health threat to youth and young adults. Most young smokers become adult smokers, and 50% of adult smokers die prematurely from tobacco-related disease [5]. Health care providers therefore need ways and means to prevent death among smokers.
Among the barriers to implementing tobacco-control policy are limited education and awareness among consumers. Knowledge of the health effects of smoking is an important factor in predicting smoking-related behavior, including a lower likelihood of initiation and a greater likelihood of quitting [3,6-9]. Khatun and Bhuyan [10] and Bhuyan et al. [11] observed that awareness among university students is increasing and that highly aware students are less likely to smoke. Moreover, awareness and smoking habit are associated with some socioeconomic factors. We are therefore interested in studying the joint relationship of smoking habit and awareness with other socioeconomic characteristics. This type of analysis is possible when there is one dependent set of variables and one independent set of variables; such an analysis is known as canonical correlation analysis [12][13][14]. In this paper, canonical correlation analysis is used to study the complex relationship of smoking habit and awareness of the health hazard of smoking with some of the socioeconomic background factors of the respondents.

Methodology
For the canonical correlation analysis, the criterion set of variables (Y-set) consists of smoking habit (y1) and awareness (y2), and the predictor set (X-set) consists of age (x1), sex (x2), marital status (x3), religion (x4), education of father (x5), education of mother (x6), occupation of father (x7), occupation of mother (x8) and family income (x9). All the variables are measured on a nominal scale for the purpose of the analysis. The awareness of health hazard among students [10,14] has been measured by 20 questions, each with the closed answers 'True', 'False' and 'Don't know'. An answer reflecting awareness is scored '3', a less aware answer '2', and an answer that is not affirmative to awareness '1'. Hence the sum of the assigned values ranges from a minimum of 20 to a maximum of 60, and it differs across respondents. According to this sum, the respondents are classified into 3 classes, viz.
a. Low in awareness (sum of the assigned values < 30),
b. Medium in awareness (sum of the assigned values 30-40), and
c. High in awareness (sum 40+).
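The scoring and classification rule can be sketched as follows. This is a minimal illustration, assuming each of the 20 answers has already been coded as 3, 2 or 1 (the mapping from 'True', 'False' and 'Don't know' depends on the item and is not given here). The function names are hypothetical, and placing a total of exactly 40 in the 'high' class is an assumption, because the stated classes overlap at 40.

```python
def awareness_score(answers):
    """Total awareness score for one respondent (hypothetical helper).

    `answers` holds 20 item scores already coded as 3 (aware response),
    2 (less aware response) or 1 (response not affirmative to awareness).
    """
    if len(answers) != 20 or any(a not in (1, 2, 3) for a in answers):
        raise ValueError("expected 20 item scores, each 1, 2 or 3")
    return sum(answers)  # always between 20 and 60


def awareness_class(score):
    """Map a total score to the three awareness classes.

    Assumption: a score of exactly 40 is placed in the 'high' class.
    """
    if score < 30:
        return "low"
    if score < 40:
        return "medium"
    return "high"


# Example: 15 aware answers and 5 less aware answers give 55 points
print(awareness_class(awareness_score([3] * 15 + [2] * 5)))  # prints high
```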
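Let $R_{xx}$, $R_{yy}$ and $R_{xy}$ be the sample correlation matrices of the variables in the X-set, in the Y-set, and between the X-set and the Y-set, respectively (with $R_{yx} = R_{xy}'$). According to the objective of the study, we need to find two linear combinations, $Y^* = b'Y$ and $X^* = a'X$, of the variables in the Y-set and X-set, respectively, such that the simple correlation coefficient between $X^*$ and $Y^*$ is maximized. Here $a$ and $b$ are the eigenvectors of the characteristic equations

$$(R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx} - \lambda I)\,a = 0, \qquad (R_{yy}^{-1} R_{yx} R_{xx}^{-1} R_{xy} - \lambda I)\,b = 0.$$

The elements of $a$ and $b$ are the canonical weights; their magnitudes indicate the importance of the variables in the X-set and Y-set, respectively, in producing the maximum correlation between the two sets. Canonical correlation analysis is fruitful only if the variables in the X-set and Y-set are significantly correlated. This can be checked with the test statistic

$$\chi^2 = -\left[n - 1 - \tfrac{1}{2}(p + q + 1)\right]\ln\Lambda, \qquad \Lambda = \prod_{j=1}^{M}(1 - \lambda_j),$$

where $\lambda_j$ ($j = 1, \dots, M$) are the eigenvalues of the characteristic equations above, $n$ is the sample size, and $p$ and $q$ are the numbers of variables in the Y-set ($p = 2$) and the X-set ($q = 9$), respectively. The number of eigenvalues is $M = \min(p, q)$. This $\chi^2$ has $pq$ degrees of freedom. Rejection of $H_0: \Sigma_{XY} = 0$ against $H_A: \Sigma_{XY} \neq 0$ by this $\chi^2$ test justifies the canonical correlation analysis. The analysis yields $\min(p, q)$ pairs of canonical variates, but not all pairs need be statistically significant. The significance of the canonical variate pairs remaining after the first $M'$ pairs ($M' < M$) is tested by the statistic

$$\chi^{2*} = -\left[n - 1 - \tfrac{1}{2}(p + q + 1)\right]\ln\Lambda'^{*}, \qquad \Lambda'^{*} = \prod_{j=M'+1}^{M}(1 - \lambda_j),$$

which has $(p - M')(q - M')$ degrees of freedom.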
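The eigenproblem above can be sketched in Python with NumPy. This is an illustrative implementation, not the authors' software: it assumes the full correlation matrix is ordered with the q X-set variables first, that every $\lambda_j$ is strictly positive, and it scales the weights so that each canonical variate has unit variance ($a_j' R_{xx} a_j = b_j' R_{yy} b_j = 1$), the scaling under which the canonical loadings are $R_{xx} a_j$ and $R_{yy} b_j$. The function name is hypothetical.

```python
import numpy as np


def canonical_correlations(R, q):
    """Solve the CCA eigenproblem from a partitioned correlation matrix.

    Assumes R is ordered [X-set | Y-set] with the first q rows/columns
    belonging to the X-set (a hypothetical layout for this sketch).
    Returns (lam, a, b): eigenvalues lambda_j in decreasing order, and the
    X-set and Y-set canonical weights as columns of a (q x M) and b (p x M).
    """
    Rxx, Rxy = R[:q, :q], R[:q, q:]
    Ryx, Ryy = R[q:, :q], R[q:, q:]
    p = R.shape[0] - q
    M = min(p, q)
    # (Ryy^-1 Ryx Rxx^-1 Rxy - lambda I) b = 0; solve() avoids explicit inverses
    K = np.linalg.solve(Ryy, Ryx) @ np.linalg.solve(Rxx, Rxy)
    lam, b = np.linalg.eig(K)
    lam, b = lam.real, b.real          # eigenvalues are real and lie in [0, 1]
    order = np.argsort(lam)[::-1][:M]  # keep the M largest
    lam, b = lam[order], b[:, order]
    # Scale each b_j so that Var(Y*) = b_j' Ryy b_j = 1
    b = b / np.sqrt(np.einsum("ij,ik,kj->j", b, Ryy, b))
    # The matching X-set weights satisfy a_j proportional to Rxx^-1 Rxy b_j
    a = np.linalg.solve(Rxx, Rxy @ b)
    a = a / np.sqrt(np.einsum("ij,ik,kj->j", a, Rxx, a))
    return lam, a, b


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 3))
    Y = np.column_stack([X[:, 0] + rng.normal(size=300), rng.normal(size=300)])
    R = np.corrcoef(np.column_stack([X, Y]), rowvar=False)
    lam, a, b = canonical_correlations(R, q=3)
    print("canonical correlations:", np.sqrt(lam))
```

For the study's layout (q = 9, p = 2), R would be 11 × 11 and the function returns M = 2 eigenvalues. Working with the 2 × 2 Y-side matrix keeps the eigenproblem small; the X-set weights are then recovered from $a_j \propto R_{xx}^{-1} R_{xy} b_j$, which also makes each canonical correlation $a_j' R_{xy} b_j = \sqrt{\lambda_j}$ positive.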
The main objective of the analysis is to study the relationship between any variable in the Y-set and any variable in the X-set. The strength of this relationship can be measured by cross-weights, where a cross-weight is the product of the canonical loading of a variable and the canonical correlation coefficient. For the j-th canonical variate pair, $\sqrt{\lambda_j}$ is the canonical correlation coefficient, and

$$R_{x}^{(j)} = R_{xx}\,a_j \quad\text{and}\quad R_{y}^{(j)} = R_{yy}\,b_j$$

are the canonical loadings of the X-set and Y-set, respectively, corresponding to the j-th canonical variate pair. Here $a_j$ and $b_j$ are the vectors of canonical weights for the j-th variate pair. Each canonical variate pair explains a certain percentage of the total variation of the Y-set and X-set. This can be measured, respectively, by

$$R^2_{y(j)} = \frac{1}{p}\,R_{y}^{(j)\prime} R_{y}^{(j)} \quad\text{and}\quad R^2_{x(j)} = \frac{1}{q}\,R_{x}^{(j)\prime} R_{x}^{(j)}.$$
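The loadings, cross-weights and explained-variation shares follow directly from these formulas. A minimal NumPy sketch, assuming a (q × M) and b (p × M) hold the canonical weights as columns, scaled so that $a_j' R_{xx} a_j = b_j' R_{yy} b_j = 1$; the function name and the layout of the returned tuple are hypothetical.

```python
import numpy as np


def loadings_and_shares(Rxx, Ryy, a, b, lam):
    """Canonical loadings, cross-weights and explained-variation shares.

    a (q x M) and b (p x M) hold unit-variance canonical weights as columns;
    lam holds the eigenvalues lambda_j (squared canonical correlations).
    """
    rc = np.sqrt(lam)                        # canonical correlations sqrt(lambda_j)
    load_x = Rxx @ a                         # R_x^(j) = Rxx a_j, one column per pair
    load_y = Ryy @ b                         # R_y^(j) = Ryy b_j
    cross_x = load_x * rc                    # cross-weight = loading x canonical correlation
    cross_y = load_y * rc
    q, p = Rxx.shape[0], Ryy.shape[0]
    share_x = (load_x ** 2).sum(axis=0) / q  # R^2_x(j) = R_x^(j)' R_x^(j) / q
    share_y = (load_y ** 2).sum(axis=0) / p  # R^2_y(j) = R_y^(j)' R_y^(j) / p
    return load_x, load_y, cross_x, cross_y, share_x, share_y
```

When M = p, as in this study (M = p = 2), the Y-set shares $R^2_{y(j)}$ sum to 1, because the p canonical variates of the Y-set together reproduce all of its standardized variation; the X-set shares (q = 9) generally do not.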

Results and Discussion
Among the investigated students, 82.1% are male, and 38.1% of the male students are smokers (Table 1). Smoking is thus more common among male students, and the difference in smoking habit between males and females is statistically significant [χ² = 57.822, p < 0.001], although most of the students (88.3%, Table 2) are highly aware of the health hazard of smoking. A higher proportion of female students (90.1%) are highly aware of the problem, yet a good number of them (8.8%) are smokers. However, the difference in awareness between males and females is not significant [χ² = 0.630, p = 0.427]. The study indicates that smoking is far more prevalent among males than among females, but males and females are similarly aware of the health hazard of smoking. On the other hand, the rate of smoking is lower among students who are highly aware of the problem (Table 3). Awareness and smoking habit are significantly and negatively associated [χ² = 5.423, p = 0.02]. The data show that 70.4% of the respondents are from urban areas, and 87.4% of them are highly aware of the health hazard. The corresponding percentage among rural students is 90.7 (Table 4). Thus most respondents, whether from rural or urban areas, are aware of the problem. Likewise, no significant variation is observed in the levels of awareness among respondents of different ages. This finding is similar to that reported by Bhuyan et al. [10,11]. Variations in the levels of awareness according to father's education [χ² = 0.33, p = 0.848], mother's occupation [χ² = 3.75, p > 0.05] and mother's education [χ² = 1.735, p = 0.42] are also found to be insignificant. However, father's occupation is significantly associated with the level of awareness of their offspring (Table 6) [χ² = 6.32, p < 0.05]. These results indicate that some socioeconomic variables are associated with knowledge of the health hazard of smoking. Moreover, smoking is significantly associated with awareness (Table 3).
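The chi-square statistics quoted for Tables 1 and 2 can be recomputed from the cell counts. The sketch below assumes the uncorrected Pearson statistic (no Yates continuity correction) and uses the fact that, with 1 d.f., the upper-tail probability is $\operatorname{erfc}(\sqrt{\chi^2/2})$; the function and variable names are hypothetical.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]] holds the cell counts; returns (chi2, p) with 1 d.f.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 d.f., chi2 is the square of a standard normal: P(X >= x) = erfc(sqrt(x/2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


sex_by_smoking = [[317, 514], [16, 165]]    # Table 1: rows male/female, columns smoker yes/no
sex_by_awareness = [[100, 731], [18, 163]]  # Table 2: rows male/female, columns medium/high

for name, table in [("sex x smoking", sex_by_smoking), ("sex x awareness", sex_by_awareness)]:
    chi2, p = pearson_chi2_2x2(table)
    print(f"{name}: chi2 = {chi2:.3f}, p = {p:.3g}")
```

This reproduces χ² = 57.822 for sex and smoking habit and χ² = 0.630 (p ≈ 0.427) for sex and awareness, which suggests the reported values are the uncorrected Pearson statistics.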
Smoking is also associated with some socioeconomic characteristics [10,11]. The socioeconomic variables significantly associated with the respondents' smoking habit are sex (Table 1), age (Table 7) and father's occupation (Table 6). Smoking habit increases significantly with the age of the respondents [χ² = 12.109, p = 0.002], so the prevalence of smoking is higher among older students. This is expected: as time passes, students are influenced by their friends, most of them live away from their parents, and social and family restrictions on smoking weaken day by day [11]. Thus some socioeconomic variables are associated with awareness of the health hazard of smoking and with smoking habit, and smoking habit is in turn associated with awareness. To study the complex relationship of the socioeconomic variables with smoking habit and awareness of the health hazard of smoking, canonical correlation analysis is therefore performed, with the variables transformed to the nominal scale. The following information is obtained from the analysis. $R_{xx}$ is the correlation matrix of the predictor variables (Table 9), $R_{xy}$ is the correlation matrix between the criterion and predictor variables (Table 10), and $R_{yy}$ is the correlation matrix of the criterion variables (Table 11). The rank of the product matrix $R_{xx}^{-1}R_{xy}R_{yy}^{-1}R_{yx}$ or $R_{yy}^{-1}R_{yx}R_{xx}^{-1}R_{xy}$ is $M = \min(p, q) = 2$, so there are at most 2 canonical variate pairs. These pairs correspond to the eigenvalues $\lambda_1 = 0.078$ and $\lambda_2 = 0.017$, and both pairs are found significant (Table 12). The canonical weights are the elements of the eigenvectors corresponding to $\lambda_1$ and $\lambda_2$; they indicate the importance of each variable in maximizing the correlation between the two sets. The weights are shown in Table 13.

The first canonical variate pair explains 82.26% of the variation in the data set, and the important variables in explaining this variation are sex and smoking habit. These two variables are significantly associated (Table 1). The second canonical variate pair explains 17.74% of the variation, and the important variables here are marital status and awareness of the health hazard of smoking. The correlation matrix (Table 10) shows that the pairs sex and smoking habit, and marital status and awareness, are highly correlated. Canonical weights may not reflect the real importance of the variables if the variables in the X-set are collinear. To avoid this problem, standardized canonical coefficients are calculated (Table 13). Both sets of analytical results lead to similar conclusions (Table 14).
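The Bartlett tests given in the Methodology can be applied to the eigenvalues reported here. The sketch below assumes n = 1012, p = 2 and q = 9 and uses the rounded eigenvalues 0.078 and 0.017, so its statistics are approximate. It relies on the closed form of the chi-square upper tail for an even number of degrees of freedom, which covers both 18 and 8 d.f.; the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with even d.f.

    Closed form: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k      # (x/2)^k / k!, built incrementally
        total += term
    return math.exp(-half) * total


def bartlett_sequential(lams, n, p, q):
    """Bartlett chi-square statistics for the canonical variate pairs.

    Step m (m = 0, 1, ...) uses the eigenvalues lambda_{m+1}, ..., lambda_M:
    chi2 = -[n - 1 - (p + q + 1)/2] * ln prod(1 - lambda_j), with (p - m)(q - m) d.f.
    """
    factor = n - 1 - (p + q + 1) / 2
    results = []
    for m in range(len(lams)):
        log_wilks = sum(math.log(1 - lam) for lam in lams[m:])
        results.append((-factor * log_wilks, (p - m) * (q - m)))
    return results


# Rounded eigenvalues and design sizes reported in the text
for chi2, df in bartlett_sequential([0.078, 0.017], n=1012, p=2, q=9):
    print(f"chi2 = {chi2:.2f}, df = {df}, p = {chi2_sf_even_df(chi2, df):.3g}")
```

With these inputs the first statistic is about 98.8 on 18 d.f. and the second about 17.2 on 8 d.f. (p ≈ 0.03), consistent with both pairs being significant; the corresponding canonical correlations are √0.078 ≈ 0.279 and √0.017 ≈ 0.130. With 2 d.f. the same function reduces to exp(−x/2), which gives about 0.0023 for the age statistic 12.109, in line with the reported p = 0.002 if age was grouped into three classes (an assumption).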

Conclusion
The present analysis is based on data collected from 1012 students of American International University Bangladesh, Jahangirnagar University and World University. The students were selected by convenience sampling under the supervision of teachers of the respective universities. Of the selected students, 82.1% are male, and 88.0% of the male students are highly aware of the health hazard of smoking. Still, a good number of students (32.9%) are smokers. Students who are highly aware of the health hazard of smoking are less prone to smoking (31.3%), and a lower level of awareness is associated with a higher proportion of smokers. Overall, 88.3% of the respondents are highly aware of the problem, and no respondent was found to be unaware of it. Awareness is independent of the respondents' age, but smoking habit is not independent of age or awareness; more students of higher ages are prone to smoking.
The study indicates that awareness and smoking habit are significantly inter-related. Moreover, both are associated with some socioeconomic characteristics of the respondents; specifically, the offspring of servicemen are more prone to smoking. Because awareness and smoking habit are associated with some socioeconomic characteristics of the respondents, canonical correlation analysis [12][13][14] has been performed to study the complex relationship of awareness and smoking habit with other socioeconomic variables. The analysis indicates that sex and smoking habit, and marital status and awareness, are significantly interrelated.