mChoice                package:Hmisc                R Documentation

_M_e_t_h_o_d_s _f_o_r _S_t_o_r_i_n_g _a_n_d _A_n_a_l_y_z_i_n_g _M_u_l_t_i_p_l_e _C_h_o_i_c_e _V_a_r_i_a_b_l_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     'mChoice' is a function that is useful for defining a group of
     variables on the right-hand side of a formula.  The variables can
     represent individual choices on a multiple choice question.  These
     choices are typically factor or character values but may be of any
     type.  Levels of component factor variables need not be the same;
     all unique levels (or unique character values) are collected over
     all of the multiple variables.  Then a new character vector is
     formed with integer choice numbers separated by semicolons.
     Ideally, a database system would have exported the
     semicolon-separated character strings with a 'levels' attribute
     containing strings defining value labels corresponding to the
     integer choice numbers.  'mChoice' is a function for creating a
     multiple-choice variable after the fact.  'mChoice' variables are
     explicitly handled by the 'describe' and 'summary.formula'
     functions.  'NA's or blanks in input variables are ignored.
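     The encoding just described can be illustrated with a minimal base
     R sketch.  'encode_choices' is a hypothetical helper, not part of
     Hmisc, and it assumes levels are collected in order of first
     appearance (Hmisc may order them differently and also handles
     factor levels, 'drop', and 'add.none', which are ignored here).

```r
# Hypothetical helper (not Hmisc code) sketching the mChoice encoding:
# pool the distinct non-missing, non-blank values of all inputs into a set
# of levels, then store each observation as its distinct integer codes
# joined by ";".
encode_choices <- function(..., sort. = TRUE) {
  vars <- lapply(list(...), as.character)
  # Levels in order of first appearance, scanning inputs left to right
  # (an assumption; Hmisc may order levels differently).
  lev <- unique(unlist(vars))
  lev <- lev[!is.na(lev) & lev != ""]
  n <- length(vars[[1]])
  codes <- character(n)
  for (i in seq_len(n)) {
    vals <- vapply(vars, function(v) v[i], character(1))
    idx <- unique(match(vals, lev))  # NA and "" have no match, giving NA
    idx <- idx[!is.na(idx)]          # ignore missing/blank selections
    if (sort.) idx <- sort(idx)
    codes[i] <- paste(idx, collapse = ";")
  }
  structure(codes, levels = lev)
}

s1 <- c("Headache", "Hangnail", NA)
s2 <- c("Hangnail", "Hangnail", "Depressed")
x <- encode_choices(s1, s2)
x  # codes "1;2" "2" "3"; levels "Headache" "Hangnail" "Depressed"
```

     Note how the repeated 'Hangnail' in the second observation collapses
     to a single code and the 'NA' in the third is dropped, matching the
     statement that missing or blank inputs are ignored.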

     'format.mChoice' will convert the multiple choice representation
     to text form by substituting 'levels' for integer codes.
     'as.double.mChoice' converts the 'mChoice' object to a binary
     numeric matrix, one column per used level (or all levels if
     'drop=FALSE').  Users invoke this method by calling
     'as.numeric'.  There is a 'print' method and a 'summary' method,
     and a 'print' method for the 'summary.mChoice' object.  The
     'summary' method computes frequencies of all two-way choice
     combinations, the frequencies of the top 'ncombos' (default 5)
     combinations, information about which other choices are present
     when each given choice is present, and the frequency distribution
     of the number of choices per observation.  This 'summary' output
     is used in the 'describe' function.
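     The conversions and summaries above can be sketched in base R to
     show the underlying idea.  'format_choices', 'choice_matrix', and
     'split_codes' are hypothetical helpers, not Hmisc functions; the
     co-occurrence table via 'crossprod' and the top-combination count
     via 'table' are one plausible way to compute such summaries, not
     necessarily what 'summary.mChoice' does internally.

```r
# Hypothetical helpers (not Hmisc code). 'codes' is a character vector of
# ";"-separated integer codes; 'lev' holds the value labels they index.
split_codes <- function(codes) {
  lapply(strsplit(codes, ";", fixed = TRUE),
         function(p) as.integer(p[nzchar(p)]))  # "" means no choices
}

# Analogous to 'format': substitute value labels for integer codes.
format_choices <- function(codes, lev, sep = ";") {
  vapply(split_codes(codes),
         function(idx) paste(lev[idx], collapse = sep), character(1))
}

# Analogous to 'as.double': binary indicator matrix, one column per level.
choice_matrix <- function(codes, lev) {
  m <- matrix(0, nrow = length(codes), ncol = length(lev),
              dimnames = list(NULL, lev))
  idx <- split_codes(codes)
  for (i in seq_along(idx)) m[i, idx[[i]]] <- 1
  m
}

lev <- c("Headache", "Hangnail", "Depressed")
codes <- c("1;3", "1;3", "2", "")
format_choices(codes, lev)
m <- choice_matrix(codes, lev)
rowSums(m)     # number of choices per observation
crossprod(m)   # co-occurrence counts for every pair of choices
head(sort(table(codes), decreasing = TRUE), 5)  # most frequent combinations
```

     The diagonal of 'crossprod(m)' gives how often each choice occurs,
     and each off-diagonal cell counts observations having both choices.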

     'inmChoice' creates a logical vector the same length as 'x' whose
     elements are 'TRUE' when the observation in 'x' contains at least
     one of the codes or value labels in the second argument.
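     The membership test can be sketched as follows.  'in_choices' is a
     hypothetical base R helper, not the Hmisc implementation; it assumes,
     as the 'values' argument description states, that character values
     are value labels and numeric values are choice codes.

```r
# Hypothetical sketch (not Hmisc code) of the test 'inmChoice' performs.
in_choices <- function(codes, lev, values) {
  # Character values are treated as value labels and mapped to codes;
  # numeric values are used as codes directly.
  if (is.character(values)) values <- match(values, lev)
  vapply(strsplit(codes, ";", fixed = TRUE), function(p) {
    # TRUE when any code of this observation is among 'values'
    any(as.integer(p[nzchar(p)]) %in% values)
  }, logical(1))
}

lev <- c("Headache", "Hangnail", "Depressed")
codes <- c("1;3", "2", "")
in_choices(codes, lev, "Hangnail")
in_choices(codes, lev, c(1, 2))
```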

     'is.mChoice' returns 'TRUE' if the argument is a multiple choice
     variable.

_U_s_a_g_e:

     mChoice(..., label='', sort.=TRUE,
             sort.levels=c('original','alphabetic'), 
             add.none=FALSE, drop=TRUE)

     ## S3 method for class 'mChoice':
     format(x, minlength=NULL, sep=";", ...)

     ## S3 method for class 'mChoice':
     as.double(x, drop=FALSE, ...)

     ## S3 method for class 'mChoice':
     print(x, long=FALSE, ...)

     ## S3 method for class 'mChoice':
     summary(object, ncombos=5, minlength=NULL, drop=TRUE, ...)

     ## S3 method for class 'summary.mChoice':
     print(x, prlabel=TRUE, ...)

     ## S3 method for class 'mChoice':
     x[..., drop=FALSE]

     inmChoice(x, values)

     is.mChoice(x)

_A_r_g_u_m_e_n_t_s:

     ...: a series of vectors

   sort.: By default, choice codes are sorted in ascending numeric
          order.  Set 'sort.=FALSE' to preserve the original left to
          right ordering from the input variables.

   label: a character string to attach as the 'label' attribute of
          the object created by 'mChoice'

sort.levels: set 'sort.levels="alphabetic"' to sort the levels of the
          object created by 'mChoice' (and hence the columns of the
          matrix produced by 'as.double') alphabetically by category
          rather than by the original order of levels in component
          factor variables (if any input variables were factors)

add.none: Set 'add.none' to 'TRUE' to make a new category '"none"' if
          it does not already exist and if there is an observation with
          no choices selected.

    drop: set 'drop=FALSE' to keep unused factor levels as levels of
          the object produced by 'mChoice'

       x: an object of class '"mChoice"' such as that created by
          'mChoice'.  For 'is.mChoice', any object.

  object: an object of class '"mChoice"' such as that created by
          'mChoice'

 ncombos: maximum number of choice combinations to report; the most
          frequent combinations are listed.

minlength: By default no abbreviation of levels is done in 'format' and
          'summary'.  Specify a positive integer to use abbreviation in
          those functions.  See 'abbreviate'.

     sep: character to use to separate levels when formatting

    long: Set to 'TRUE' to print the formatted levels.  Otherwise
          integer codes are printed. 

 prlabel: set to 'FALSE' to keep 'print.summary.mChoice' from printing
          the variable label and number of unique values

  values: a scalar or vector.  If 'values' is numeric, it is taken as
          choice codes; if it is a character vector, it is taken as
          value labels.

_V_a_l_u_e:

     'mChoice' returns a character vector of class '"mChoice"' plus
     attributes '"levels"' and '"label"'. 'summary.mChoice' returns an
     object of class '"summary.mChoice"'.  'inmChoice' returns a
     logical vector. 'format.mChoice' returns a character vector, and
     'as.double.mChoice' returns a binary numeric matrix.

_A_u_t_h_o_r(_s):

     Frank Harrell 
      Department of Biostatistics 
      Vanderbilt University 
      f.harrell@vanderbilt.edu

_S_e_e _A_l_s_o:

     'label'

_E_x_a_m_p_l_e_s:

     library(Hmisc)
     options(digits=3)
     set.seed(3)
     n <- 20
     sex <- factor(sample(c("m","f"), n, replace=TRUE))
     age <- rnorm(n, 50, 5)
     treatment <- factor(sample(c("Drug","Placebo"), n, replace=TRUE))

     # Generate a 3-choice variable; each of 3 variables has 5 possible levels
     symp <- c('Headache','Stomach Ache','Hangnail',
               'Muscle Ache','Depressed')
     symptom1 <- sample(symp, n, TRUE)
     symptom2 <- sample(symp, n, TRUE)
     symptom3 <- sample(symp, n, TRUE)
     cbind(symptom1, symptom2, symptom3)[1:5,]
     Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')
     Symptoms
     print(Symptoms, long=TRUE)
     format(Symptoms[1:5])
     inmChoice(Symptoms,'Headache')
     levels(Symptoms)
     inmChoice(Symptoms, 3)
     inmChoice(Symptoms, c('Headache','Hangnail'))
     # Note: In this example, some subjects have the same symptom checked
     # multiple times; in practice these redundant selections would be NAs.
     # mChoice ignores such redundant selections.

     meanage <- N <- numeric(5)
     for(j in 1:5) {
      meanage[j] <- mean(age[inmChoice(Symptoms,j)])
      N[j] <- sum(inmChoice(Symptoms,j))
     }
     names(meanage) <- names(N) <- levels(Symptoms)
     meanage
     N

     # Manually compute mean age for 2 symptoms
     mean(age[symptom1=='Headache' | symptom2=='Headache' | symptom3=='Headache'])
     mean(age[symptom1=='Hangnail' | symptom2=='Hangnail' | symptom3=='Hangnail'])

     summary(Symptoms)

     #Frequency table sex*treatment, sex*Symptoms
     summary(sex ~ treatment + Symptoms, fun=table)
     # Check:
     ma <- inmChoice(Symptoms, 'Muscle Ache')
     table(sex[ma])

     # could also do:
     # summary(sex ~ treatment + mChoice(symptom1,symptom2,symptom3), fun=table)

     #Compute mean age, separately by 3 variables
     summary(age ~ sex + treatment + Symptoms)

     summary(age ~ sex + treatment + Symptoms, method="cross")

     f <- summary(treatment ~ age + sex + Symptoms, method="reverse", test=TRUE)
     f
     # each trio of numbers represents the 25th, 50th, and 75th percentiles
     print(f, long=TRUE)

