Log-linear models for ordinal multidimensional categorical data

This study emphasised on methods for analysis of categorical data having ordered categories for the multidimensional case and the paper discusses some of the specialized models which efficiently use the information on the ordering, unlike standard methods for nominal categorical data, for multidimensional variables In order to illustrate the methodology, three dimensional data from a shopping survey in Oxford was used The Standard nominal model fitted, represented the associations between the life cycle level, car availability and the agreement with the statement I find getting to grocery shops very tiring, with 16 degrees of freedom The model selected taking the ordinal nature of variables into account also represented the same associations with 27 degrees of freedom, thus with lesser number of parameters The standard log-linear model requires describing interactions using a number of parameters where as when ordinal nature of the variables is considered, interactions can be represented by a few parameters Based on the model which takes into consideration the ordinal nature of the variables the odds ratios to illustrate the association between the life cycle and agreement, disagreement, tendency to disagree, in-between, and tendency to agree with statement are 0 8, 0 4, 0 9 and 0 9 respectively The odds ratio that describes the association of the car availability and the agreement with the statement is 0 91 It is established that ordering of categories utilizes the information reflected from data where as nominal models do not use the information in the ordered categories Also the suggested models have less parameters and are thus simpler and more parsimonious


INTRODUCTION
Statistical methodology to analyse categorical data has only recently reached the level of sophistication achieved early in this century by methodology for analysis of continuous data 1 The recent development of methods for analysis of categorical data was stimulated by increasing methodological sophistication in the social and bio-medical sciences 1 In analyzing categorical data it is necessary to consider the scale of measurement and there are two mam types of scales of interest Nominal scale is the simplest scale of measurement which assumes that distinct levels differ in quality but not in quantity and ordinal scale considers the difference of distinct levels and even a hierarchy of importance Ordinal scales are pervasive in the social sciences, in particular, for measuring attitudes and opinions on various issues and states of various response types Besides they occur commonly m many diverse fields 1 In many studies, ordinal variables are analyzed using nominal techniques assuming that results are invariant to permutations of the categories of any of the variables 2 This sacrifices a certain amount of information when the measurements are of ordinal scale Because of this, for many years in the past, researches have been carried out to extend the log-linear models to perform more complete and informative analysis for ordinal data It has been proposed that there are many advantages to be gained from using ordinal methods instead of, or in addition to the standard nominal procedures 2 Ordinal methods represent two-way associations using a single parameter whereas in standard nominal case, summarization of tables is required As ordinal methods have lesser number of parameters compared to nominal models, they are more parsimonious and thus have rriore power to test the significance of the interaction terms in the log-linear model (Proof give in Annex 2) In this paper an attempt has been made to view the importance of utilizing the ordinal property of ordinal variables in the analysis of categorical data and determine if any differences occur in the results obtained from these ordinal models when compared to standard categorical data techniques that treat all the variables as nominal.
Application of these theories have been discussed by several authors 12 ' 3,4 only for two-way tables, where as in this paper attention is paid to application of these theories to the multi-dimensional case. In order to demonstrate the usage of orderings of categories, data from a Shopping Survey in Oxford 5 was used. There were three variables of interest namely, the life cycle levels, car availability and the agreement with the statement 'Ifind getting to grocery shops very tiring. Section three provides a description of the levels of each of these three variables. This three dimensional example can easily be generalized to the multidimensional case.
The objectives ofthis study were to explore the advantages of considering the ordering of levels of categorical variables with respect to: i. Obtaining simpler models which are easily interpretable ii. Providing easier quantification of associations iii. Improving the power to test the significance of the interaction terms in the log-linear model Section 2 ofthis paper introduces the theory behind the methods used for the study and section 3 provides an illustration using an example. Section 4 discusses the results and draws conclusions. Annex 1 provides some vital proofs and tables required to understand the usage of considering the ordinal nature of variables and Annex 2 gives a proof that ordinal models have more power to detect interactions when compared to nominal models.

METHODS AND MATERIALS
The main aspect of this study was to explore the advantages of considering the ordering of levels of categorical variables.
In order to visualize the above mentioned advantages it is necessary that the data is analyzed in both ways, by treating levels as nominal and by considering the natural orderings of the levels.
The models that were used in the analysis were as follows.
a. Standard log-linear model 6 . Where e is the expected value for the cell made up by the ** row and 7 th column, \i represents the overall mean effect, kf represents the effect of the i th level of the variable X, A Y represents the effect of they* level of the variable Y.
Parameter fi describes the association between X and Y. Values u and v are the known scores assigned to the rows and columns. Often the above model is taken as, which is known as the centered model The independence model is a special case when P = 0. Since {w}and {v}are fixed the linear-by-linear association model has only one more parameter (ft) than the independence model. Thus the degrees of freedom for the linear-by-linear association model are Note that model (3) is unsaturated for tables with J>2. It is a special case of the saturated model (2), in which k XY takes the form fiu i v. While the linear-by-linear association model requires one parameter (fi) to describe association regardless of/and Jthe saturated model requires (1-1)(J-1) parameters. In many applications the choice of scores will reflect assumed distances between midpoints of categories for an underlying interval scale. Equally spaced scores result in the simplest interpretation for the model discussed in this section. In practice, the integer scores {u = z'}and { v =y}are most commonly used.

Row effects model 2 :
Here the row variable (X) is treated as nominal and column variable (Y) as ordinal. The model is appropriate for two-way tables with ordered column classifications. The ordered scores v <• v 2 <, v y are assigned to reflect the ordering of the columns. The rows are now unordered (nominal). Using the linear-by-linear structure as before and replacing the ordered values {//} in the linear-by-linear association model by the unordered parameters gives the row effects model The degrees of freedom for this model is of = total cells -number of independent parameters = (I-l)(J-2) Direct ordinal test of independence: For the linear-bylinear association model, independence is given by the null hypothesis H: 13 = 0. The likelihood ratio test statistic for testing H 0 is, G 2 (lfLxL) = G 2 (I) -G 2 (LxL). The notation G 2 (I) represents the deviance of the standard nominal model and G 2 (L xL) represents the deviance of the linear-by-linear model. The likelihood ratio statistics G 2 (IJL xL) is the difference of these two deviances and 31 measures to what extent the nominal model is a better fit compared to the linear-by-linear model. When the L x L model holds, the ordinal test using G 2 (1JL x L) is asymptotically more powerful than the test using G 2 (I) (Proof give in Annex 2). The power of a chi-squared test increases when degree of freedom decreases, for fixed non-centrality. When the L x L model holds, the noncentrality is the same for [G 2 (I JL xL)J and G 2 (I). Thus, G 2 (lJLx L) is more powerful. Since marginal distributions and marginal means and variances are identical for fitted and observed distributions, the equation (9) implies that the correlation between the scores for X and Y is the same for both distributions. The fitted counts display the same positive or negative trend as the data.

Likelihood Equations and Model
Since {uf and {v} are fixed, the L x L model log (e ) = [i + k x + k Y + p uv has only one more parameter (/3) than the independence model. Its residual df= IJ-I-Jis unsaturated for all other tables except for 2 x 2 tables.

Ordinal variables in models for multi-way tables:
Generalizations of association models can be used for multidimensional tables with ordinal responses. In three dimensions, the rich collection of models includes 1. association models that are more parsimonious than the nominal model (XY, XZ, YZ).
Models for association that are special cases of (XY, XZ, YZ) replace A association terms by structured terms that account for ordinality. For instance, when bothXand Fare ordinal, alternatives to A M are a linear-by-linear term f$ uy, a row effects term a v, or a column effects term u A; these provide a stochastic ordering of conditional distributions within rows and columns, or just within rows, or just within columns. With a linear-by-linear term the model is, The conditional odds ratio 6 ., then satisfies The u s and v 's are as defined before.
The association is the same in different partial tables, with homogeneous linear-by-linear XY association. When the association is heterogeneous, structured terms for ordinal variables make effects simpler to interpret than in the saturated model. For instance, the heterogeneous linear-by-linear association XY model allows the XY association to change across levels of Z. With unit-spaced scores,

for all i and]
It has uniform association within each level of Z, but heterogeneity among levels of Z in the strength of association. Fitting it corresponds to fitting the L x L model separately at each level of Z.

An example
In order to illustrate the methodology, data from a Shopping Survey in Oxford 4 was used. Three ordinal

Prabhani Kuruppumullage & Roshim Sooriyarachchi
variables were considered for the illustration and those were life cycle levels, car availability, and agreement with the statement 7 find getting to grocery shops very tiring 1 . The variable life cycle levels has three levels as middle-aged, younger people with children and younger people without children. The variable car availability also has three levels as no car availability, some car availability and full car availability. The third variable has five levels as disagree, tend to disagree, in between, tend to agree, and agree.
In order to visualize the advantages of considering the natural ordering of variables, first the standard loglinear model was obtained using the forward selection technique 2 . SAS PROC CATMOD was used to obtain the standard log-linear model and PROC GENMOD was used when ordinal nature is considered. The respective constraints for these procedures were sum of parameters equals zero and parameter for last level equals zero.
The selected standard nominal model to represent the associations was as follows.
Using the likelihood ratios, goodness of fit of the above model was assessed. The corresponding hypothesis is; After obtaining a model to represent the associations between the variables by ignoring the natural orderings of the variables it is of interest to fit a model which utilizes the natural orderings of the variables.
As the backward elimination procedure 2 is to be used in selecting the most representative model by utilizing the natural orderings of the variables, the model fitting was started by taking the selected standard model into account. The strategy adopted in model selection was to first examine whether any nominal terms can be replaced by linear by linear interaction terms and second to examine whether remaining nominal terms can be replaced by roweffect or column-effect terms. In doing this two things are considered.
Initially the term resulting in the largest p-value (>0.05) for the difference in deviance is selected and then the goodness of fit ofthis selected model is examined (using p-value). Only if both the p-values of the difference in deviance and goodness of fit are not significant (>0.05) is the relevant term selected.
The notation sf fe = i represents the scores associated with the variable life(life cycle) when treated as linear, u car = j represents the scores associated with the variable car(car availability) when treated as linear and v agree = fc represents variable agree(agreement with the statement) when treated as linear.
When the deviance increments with respect to model A were considered it was seen that, the model in which both the variables car and agree were treated as linear resulted in the largest p-value (0.1502) and is greater than 0.05. Thus it is possible to treat both variables in the term car * agree in model (12) as linear. The p-value of the model for the goodness of fit is 0.053 which is marginally higher than 0.05. This shows that while the fit of the model is not excellent it is adequate.
Selection of linear-by-linear interaction term(s):VJYi$n considering model A it is clear that the first two-factor interaction term that was chosen to be treated as linear was car*agree. It was found that both these ordinal variables could be treated as linear without making any significant changes to the goodness of fit of the selected standard log-linear model. The results obtained in each of the cases are tabulated in table 3.1.
Then the next two-way interaction terms were treated as linear-by-linear terms and the results are summarized in table 3.2.
It is revealed from the    Thus it was concluded that the only linear-by-linear term that could be included in the model is (u car * vf gree ).

Selection of row-effects/column effects interaction term(s):
After selecting the linear-by-linear terms, it was attempted to make other two-way interaction terms row-effects/columneffects by using the same strategy used in the above section.
The two-way interaction terms life*agree and life*car were treated as row-effects and column-effects terms respectively and the results are summarized Table  3.3.) Table 3.3 reveals that the largest p-value corresponding to deviance difference is 0.3725 which is greater than 0.05. This is when variable lifeflife cycle) is treated as linear and variable agree(agreement with the statement) is treated as factor in the two-factor interaction life*agree. Thus it could be concluded that two-way interaction life*agree could be treated as row-effects term. When the goodness of fit p-value of the model (=0.0607) is considered it is clear that this model fits well at 5% significance level. Thus the selected model is as follows. The similar procedure was applied to obtain more row-effects/column-effects terms and the results are summarized in Table 3.4.
The largest p-value corresponding to deviance difference is 0.1088 which is greater than 0.05. Thus the model is not significantly different from model C. But when the goodness of fit of the model is considered it is seen that this model does not fit well as the p-value corresponding to the model 0.0397 is less than 0.05. Thus this model is not selected and it is concluded that model C is the best model obtained by taking the natural orderings of the variables into account. When referring the model C it could be seen that the model C is a combination of a linear-by-linear association term (u car * v k agree ), a row-effects term (s i hfe *agree fc ) and

two-way nominal interaction term between (life*car ).
After selecting the model, it is necessary to assess the goodness of fit of the selected model. The following hypothesis is used for this. And it is seen that the value of the deviance of the chosen model is less than the corresponding chi-square table value. Also the p-value of the selected model is 0.0607 (> 0.05) which implied that the chosen model represents associations well. Thus it is possible to conclude at the 5% significance level that there is no sufficient evidence to say that the chosen model does not fit well enough. Hence the best model to represent the associations between factors life cycle level, car availability, and agreement with the statement 'I find getting to grocery shops very tiring' after considering the natural orderings of the appropriate variables is model C.
Also it is necessary that the selected model does not deviate significantly from the standard log-linear model. Thus a comparison of the standard log-linear model and the combination model obtained by considering the natural ordering were compared. It is also known that this deviance increment follows an asymptotic chi-square distribution and thus the corresponding table value x 2 1} m ( = 19.6751) is used to assess the following hypothesis.
As the deviance increment due to consideration of linearity of possible variables (=15.0006) is less than the corresponding chi-square table value y? u 5% (= 19.6751), it is concluded at 5% significance level not to reject the null hypothesis. Thus it could be concluded at 5% significance level that there is no sufficient evidence to say that the two models are significantly different with respect to the fit. Hence it is possible to select the model C which had higher number of degrees of freedom and thus a simpler model. The whole idea behind this paper is to utilize information revealed by the natural orderings of the variables. And it was discovered that by considering the natural ordering of the variables, it is possible to save 11 degrees of freedom and could thus obtain a simpler model compared to the standard log-linear model.

Parameter estimation, odds ratio calculation and model interpretation:
After selecting the model the interest is then to interpret the model using the parameter estimates and appropriate odds ratios. For two-way interactions where both the variables are treated as factors, it is attempted to look at the estimated parameters for a better interpretation.  It is seen that middle-aged people had a high chance to have some car availability compared to younger people without children. Also the chance of having no car availability for younger people with children is lower than the younger people without children.
To interpret the associations with linear effects relative odds ratios are calculated and have been explained below. As in the two-way interaction life*agree, it is found that variable life behaves linearly and the corresponding odds ratio is obtained for this case.
The estimates corresponding to first four levels of the agreement with the statement with respect to the respondents who had agreed with the statement are agree j = 0.2316, agree 2 = 0.8061, agree 3 = 0.0092 and agree ~ 0.1273. The further agree t falls in the positive direction, the greater the tendency for the respondents with level of agreement i to locate at the maximum life cycle level (i.e: younger people without children) relative to respondents who have agreed to the statement. In this case younger people without children have disagree with the statement 'I find getting to grocery shops very tiring' than the younger people with children and middle-aged people.
The derivation of the estimated odds ratios are given in the Annex 1. All estimated log odds ratios are negative indicating a tendency for middle aged people to agree with the statement.

When k=l;
The estimated odds that a respondent who has agreed with the statement being younger person without children instead of younger person with children, or being younger person with children instead of middle-aged, are 0.8 times {exp(-0.2316) = 0.7933} the corresponding estimated odds for a respondent who has disagreed with the statement. The 95% confidence interval is (0.6274, 1.0030). As one is included in the confidence interval, null hypothesis 6 = 1 is not rejected. This indicates that there is no difference in odds, of being younger person without children instead of younger person with children, or being younger person with children instead of middleaged, between respondents who have agreed and disagreed with the statement.
However here the upper limit is only just above one and therefore the result is nearly significant Thus, it could be concluded that the estimated odds that a respondent who finds getting to grocery shops very tiring being younger person without children instead of younger person with children, or being younger person with 37 children instead of middle-aged is around 0.8 times less compared to a respondent who does not find getting to grocery shops very tiring.

When k=2;
The estimated odds that a respondent who has agreed with the statement being younger person without children instead of younger person with children, or being younger person with children instead of middle-aged, are 0.4 times {exp(~0.8061) -0.4466} the corresponding estimated odds for a respondent who has tend to disagreed with the statement. The 95% confidence interval is (0.2773, 0.7194). As one is not included in the confidence interval, null hypothesis 6 = 1 is rejected. Thus the confidence interval supports the conclusions taken using odds ratio. Thus it is concluded that the estimated odds that a respondent who finds getting to grocery shops very tiring being younger person without children instead of younger person with children, or being younger person with children instead of middle-aged is less (0.4 times) comparatively to a respondent who does not tend to find getting to grocery shops very tiring.

When k=3;
The estimated odds that a respondent who has agreed with the statement being younger person without children instead of younger person with children, or being younger person with children instead of middle-aged, are 0.9 times {exp(-0.0092) = 0.9908} the corresponding estimated odds for a respondent who has responds (agreed) inbetween with the statement. The 95% confidence interval is (0.6756, 1.4532). As one is included in the confidence interval, null hypothesis 6 = 1 is not rejected. This indicates that there is no difference in odds of being a younger person without children instead of younger person with children or being younger person with children instead of middle-aged between respondents who have agreed and are in-between.

When k=4;
The estimated odds that a respondent who has agreed with the statement being younger person without children instead of younger person with children, or being younger person with children instead of middle-aged, are 0.9 times {exp(-0.1273) = 0.8805} the corresponding estimated odds for a respondent who has tend to agree with the statement. The 95% confidence interval is (0.5978, 1.2967). As one is included in the confidence interval, null hypothesis 6 = 1 is not rejected. The interpretation is similar to k=3.
Also it is found that both the variables in the twoway interaction term car*agree where car represents the car availability and agree represents the agreement with the statement, could be treated as linear. Thus the selected term is a linear-by-linear term and the information reflected from this term could be interpreted as follows.
The ML estimate fi = -0.0950 (negative) indicates that the respondents having higher car availability tend to disagree more with the statement 7 find getting to grocery shops very tiring'.
The corresponding odds ratio is; Thus the estimated odds that a respondent agreeing with the statement (who finds getting to grocery shops very tiring) having full car availability instead having some car availability, or having some car availability instead of no car availability is approximately 0.9 times {exp (-0.0950) = 0.9094} the corresponding estimated odds for a respondent who does not find getting to grocery shops very tiring. The 95% confidence interval is (0.8615, 0.9599) which does not include 1. Thus the null hypothesis 6 = 1 is rejected and hence it is seen that the confidence interval supports the conclusions taken using the odds ratio.
Thus it could be concluded that respondents do not find getting to grocery shops tiring with increasing car availability.

DISCUSSION
The main objective of this paper was to discuss the advantages of using the natural orderings of ordinal categorical variables. It was of particular interest to illustrate that models which use the natural orderings are simpler, provide easier quantification of associations in terms of odds ratios and have more power to detect interactions when compared to models which use nominal scale variables. A three-dimensional example was used for illustration.
A standard log-linear model was chosen to explore the associations among life cycle levels, car availability, and agreement with the statement '1 find getting to grocery shops very tiring' by considering the variables to be of nominal scale.
Then the model was chosen by taking the natural orderings of the variable into account. In this case it was

March 2007
Prabhani Kuruppumullage <£ Roshini Sooriyarachchi found that in the association between car and agree, both variables car and agree could be treated as linear variables than factors. And also in the association between life and agree, variable life could be treated as continuous. This reduced the number of parameters corresponding to each of the interactions. Due to this parameter reduction the degrees of freedom corresponding to the model was increased (by 11) and thus it was possible to obtain a more parsimonious model with much simpler interpretations.
Estimated odds showed that, respondents with higher car availability do not find getting to grocery shops very tiring and thus tends to disagree with the statement 7 find getting to grocery shops very tiring'.
Also respondents who have agreed with the statement being younger persons without children instead of younger persons with children, or being younger persons with children instead of being middleaged, are 0.4 times {exp(-0.8061) = 0.4466} the corresponding estimated odds for a respondent who tends to disagree with the statement.
It was clear from the analysis that the suggested model tests the associations with 27 degrees of freedom where as standard log-linear model tests the same associations only with 16 degrees of freedom. Thus the number of parameters used to interpret the associations in the suggested model is less than in the standard loglinear model, illustrating that the suggested model is simpler. Though standard log-linear models require 2x2 sub-tables to describe the interactions, the suggested model can be easily utilized to calculate odds ratios (Annex 1) to describe the similar interactions. As shown in Annex 2 the power for testing associations is higher in the suggested model.
Throughout this paper an illustration of methods of selection of terms, deciding the form of the terms and interpreting terms have been studied for the three dimensional case. This could be easily generalized, using the same approach for the multi dimensional case. This work could further be extended by examining the magnitude of increase in power of the likelihood-ratio tests, when the ordinal nature of the categorical variables is utilized, by using simulation studies. To assess the 95% confidence interval; var {log e (B [k(/) )} = var(agree (k+!) ) + var(agree) -

Annex 2
The power of a test is defined to be the probability of rejecting the null hypothesis given that the alternative hypothesis is true.
It is denoted by

-(3 = Pr (rejecting H Q /H is true)
For simplicity consider the two dimensional case where variables X and Y are ordinal categorical variables with levels I and J respectively. Consider the three models The notation in the models is as defined before. Suppose the three models have deviances D r D 2 = 0, andD 3 respectively. The degrees of freedom of the models are v } =1J-1-J+ 1, v 2 = 0, v 3 =1J-1-J respectively. The power with which to detect the association between X and Y when such an association is present: Based on the nominal log-linear model (ii) is given by