Documentation

From IMaChWiki

Computing Health Expectancies using IMaCh

A Maximum Likelihood Computer Program using Interpolation of Markov Chains

INED and EUROREVES Version 0.99s12, November 2024

Main Author of the program: Nicolas Brouard, senior researcher at the Institut National d'Etudes Démographiques (INED, Paris) in the "Mortality, Health and Epidemiology" Research Unit
Agnès Lièvre contributed to earlier versions of the program until 2007

Contribution to the mathematics: C. R. Heathcote (Australian National University, Canberra).


Contact: Nicolas Brouard (brouard chez ined point fr)

See the IMaCh main page (http://euroreves.ined.fr/imach) for implementation and downloads, as well as the Main page of this wiki for other information and performance notes.



Introduction

This program computes Healthy Life Expectancies from cross-longitudinal data using the methodology pioneered by Laditka and Wolf (1). Within the family of Health Expectancies (HE), disability-free life expectancy (DFLE) is probably the most important index to monitor. In low-mortality countries, there is a fear that when mortality declines (and therefore total life expectancy improves), disability-free life expectancy will not increase as much, leading to an expansion of morbidity.

Most of the data collected today, in particular by the international REVES network on Health Expectancy and the disability process, and most HE indices based on these data, are cross-sectional. This means that the information comes from a single cross-sectional survey: people of various ages (but often old people) are surveyed on their health status at a single date. The proportion of people disabled at each age can then be estimated at that date. This age-specific prevalence curve is used to distinguish, within the stationary population (based on the period life table computed from vital statistics on deaths occurring around the same date), the disabled population from the disability-free population. Life expectancy (LE) (the total of this stationary population divided by its yearly number of births or deaths) is then decomposed into disability-free life expectancy (DFLE) and disability life expectancy (DLE). This way of computing HE is usually called the Sullivan method (after the author who first described it). It is a mixed method, combining cross-sectional data (prevalences) for health statuses with period data (incidences) for mortality.
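The Sullivan decomposition described above can be sketched in a few lines. This is a minimal illustration with invented numbers (five-year age groups from age 70), not IMaCh code:

```python
# Minimal sketch of the Sullivan method (illustrative numbers, not real data).
# Lx: person-years lived in each age group of the period life table,
# prev: cross-sectional prevalence of disability in each age group.

ages = [70, 75, 80, 85, 90]            # start of each 5-year age group
Lx   = [4.5, 3.8, 2.9, 1.8, 0.8]       # person-years per survivor at age 70
prev = [0.10, 0.18, 0.30, 0.45, 0.65]  # proportion disabled

le   = sum(Lx)                                     # total life expectancy at 70
dfle = sum(L * (1 - p) for L, p in zip(Lx, prev))  # disability-free years
dle  = le - dfle                                   # years lived with disability

print(f"LE(70) = {le:.2f}, DFLE(70) = {dfle:.2f}, DLE(70) = {dle:.2f}")
```

The weakness pointed out in the rest of this section is that `prev` here is a cross-sectional prevalence, which carries the history of the cohorts rather than current period conditions.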

IMaCh needs period data (incidences) for all changes in states (health statuses as well as mortality).

The age-specific proportions of people disabled (prevalence of disability) depend on the historical flows of entering disability and recovering in the past. The age-specific forces (or incidence rates) of entering disability or recovering good health, estimated over a recent period of time (like period forces of mortality), reflect current conditions and can therefore be used at each age to forecast the future of a cohort if nothing changes, i.e. to forecast the prevalence of disability of each cohort. Our finding (2) is that the period prevalence of disability (computed from period incidences) was lower than the cross-sectional prevalence.

For example, if a country is improving its prosthetic technology, the incidence of recovering the ability to walk will be higher at each (old) age, but the prevalence of disability will only slightly reflect this improvement because the prevalence is mostly shaped by the history of the cohort and not by recent period effects. To measure the period improvement we have to simulate the future of a cohort of new-borns entering or leaving the disability state or dying at each age according to the incidence rates measured today on different cohorts. The proportion of people disabled at each age in this simulated cohort will be much lower than the proportions observed at each age in a cross-sectional survey. This new prevalence curve introduced in a life table will give a more realistic HE level than the Sullivan method, which mostly reflects the history of health conditions in a country.
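The cohort simulation described above can be sketched as follows. All rates are invented for illustration (IMaCh estimates them from the data), the starting age is 70 rather than birth for brevity, and the one-year update scheme (survive first, then change state) is our modeling choice, not IMaCh's:

```python
# Sketch: push a cohort through period incidence rates to obtain the
# period prevalence of disability at each age (invented rates).

def period_prevalence(p12, p21, s1, s2, ages):
    """p12: yearly incidence healthy->disabled, p21: recovery disabled->healthy,
    s1, s2: yearly survival probability in each state, all indexed by age."""
    healthy, disabled = 1.0, 0.0        # everybody healthy at the starting age
    prev = {}
    for a in ages:
        prev[a] = disabled / (healthy + disabled)
        # survive the year, then possibly change state
        h, d = healthy * s1[a], disabled * s2[a]
        healthy = h * (1 - p12[a]) + d * p21[a]
        disabled = h * p12[a] + d * (1 - p21[a])
    return prev

ages = range(70, 90)
prev = period_prevalence({a: 0.06 for a in ages}, {a: 0.02 for a in ages},
                         {a: 0.97 for a in ages}, {a: 0.93 for a in ages}, ages)
# prev[a] rises with age toward an equilibrium period prevalence
```

With constant rates the simulated prevalence converges to the stable ("forward") prevalence discussed later in this document.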

Therefore, the main question is how to measure incidence rates from cross-longitudinal surveys. This is the goal of the IMaCh program. From your data and using IMaCh you can estimate period HE. In addition, the standard errors of the HE are computed.

A cross-longitudinal survey consists of a first survey ("cross") where individuals of different ages are interviewed about their health status or degree of disability. At least a second wave of interviews ("longitudinal") should measure each individual's new health status. Health expectancies are computed from the transitions observed between waves (interviews) and are computed for each degree of severity of disability (number of health states). The more degrees of severity considered, the more time is necessary to reach the maximum likelihood of the parameters involved in the model. Considering only two states of disability (disabled and healthy) is generally enough, but the computer program also works with more health states.

The simplest model for the transition probabilities is the multinomial logistic model, where pij is the probability of being observed in state j at the second wave conditional on being observed in state i at the first wave. A simple model is therefore: log(pij/pii) = aij + bij*age + cij*sex, where age and sex are covariates. The advantage this computer program claims is that if the delay between waves is not identical for each individual, or if some individuals missed an interview, the information is not rounded or lost, but taken into account using an interpolation or extrapolation. hPijx is the probability of being observed in state j at age x+h conditional on the observed state i at age x. The delay h can be split into an exact number nh of unobserved intermediate steps of stepm months each (h = nh*stepm). Each elementary transition (by month, quarter, semester or year) is modeled by the above multinomial logistic. The hPx matrix is simply the matrix product of nh elementary matrices, and the contribution of each individual to the likelihood is simply hPijx.
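The elementary-matrix idea above can be sketched numerically. The parameter values (aij, bij) below are invented, the model has two living states plus death, and this is a simplified illustration of the matrix product, not IMaCh's actual implementation:

```python
import math

# Sketch: monthly multinomial-logistic transition matrix and the product
# of nh elementary matrices (invented parameters; states 1 = healthy,
# 2 = disabled, 3 = dead, absorbing).

def p_row(i, age, params):
    """One row of the monthly matrix: log(pij/pii) = aij + bij*age."""
    logits = {j: params[(i, j)][0] + params[(i, j)][1] * age
              for j in (1, 2, 3) if j != i}
    denom = 1.0 + sum(math.exp(v) for v in logits.values())
    row = {j: math.exp(v) / denom for j, v in logits.items()}
    row[i] = 1.0 / denom
    return [row[1], row[2], row[3]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Invented parameters (aij, bij) for each transition i -> j
params = {(1, 2): (-6.0, 0.05), (1, 3): (-9.0, 0.08),
          (2, 1): (-1.0, -0.02), (2, 3): (-7.0, 0.07)}

# hPx for h = 24 months starting at age 78.4, with stepm = 1 month (nh = 24):
stepm, nh, age = 1, 24, 78.4
P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
for _ in range(nh):
    M = [p_row(1, age, params), p_row(2, age, params), [0.0, 0.0, 1.0]]
    P = matmul(P, M)
    age += stepm / 12.0
# P[0][0] is the 2-year probability of remaining healthy from age 78.4
```

Note how the age is advanced at each elementary step, so the product mixes matrices evaluated at slightly different ages, which is the essence of the interpolation.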

The program presented in this documentation is a general program named IMaCh (for Interpolated MArkov CHain), designed to analyse transitions from longitudinal surveys. The first step is the estimation of the set of parameters of a model for the transition probabilities between an initial state and a final state. From there, the computer program produces indicators such as the observed and stationary prevalence, life expectancies and their variances, both numerically and graphically. Our transition model consists of absorbing and non-absorbing states, allowing returns between the non-absorbing states. The main advantage of this package, compared to other programs for the analysis of transition data (for example, Proc Catmod of SAS®), is that the whole individual information is used even if an interview is missing, a state or a date is unknown, or the delay between waves is not identical for each individual. The program depends on a set of parameters supplied by the user: selection of a sub-sample, number of absorbing and non-absorbing states, number of waves to be taken into account, a tolerance level for the maximization function, the periodicity of the transitions (annual, quarterly or monthly), and covariates in the model. IMaCh works on Windows and on Unix platforms (OS X and Linux).

In version 0.99 (2018), IMaCh was extended to include fixed quantitative covariates as well as time-varying dummies and time-varying quantitative variables. Also, the concept of backward prevalences has been published (3), and the earlier so-called implied prevalences, stable prevalences or prevalence limits are therefore now named forward prevalences. Backward prevalences are estimates of the prevalences of the currently old cohorts when they were younger; forward prevalences are estimates of the prevalences of younger cohorts when they will reach older ages.

Also, IMaCh can now be applied to any kind of states, not only health states. In the published chapter (3), one example uses "Coresidence", "Living alone" and "Living in Institution" as states, while the health statuses measured at each of the 9 interviews of the Health and Retirement Study (HRS 2018) are used as time-varying variables instead of IMaCh states.

At REVES 2019, a presentation used ordinary follow-ups of patients suffering from cystic fibrosis, where the yearly average volume of ventilation is recorded. The impact of a potential lung transplantation on Health Expectancies can then be measured.

At the REVES workshop on multistate models in Halifax (2022), the organizers challenged us to use IMaCh with more complex models, i.e. with more states, more covariates and more interactions. The reason was that with the US Health and Retirement Study (HRS), up to ten waves are now available, providing the statistical power for more complex models. But the number of parameters of the likelihood function to be maximized jumped from a few (18 parameters) to a few hundred, and we discovered that Powell's algorithm (from the book Numerical Recipes in C) no longer converged in a reasonable time. Going back to the literature on maximization of a function without derivatives, we found that the algorithm named Praxis (Brent 1973) proposed solutions to some of the issues of Powell's algorithm.

At the next REVES workshop, in Padova (2023), we tested a recent (Burkardt 2019) C version of Praxis, which did converge. We then spent a year implementing the Praxis algorithm within IMaCh, after testing and fixing bugs in various Praxis programs, from the original Algol W version (compiled using the MTS Algol W compiler or even the very recent AWE compiler of Glyn Webster) to the C version of Gegenfurtner (1991), which is much more readable and closer to the Algol original than the Burkardt C version.

At the REVES workshop of Bogotá (2024) we presented (4) the current new 's' series of IMaCh versions, which uses Brent's Praxis algorithm instead of Powell's (Numerical Recipes) algorithm.



(1) Laditka S. B. and Wolf, D. (1998), New Methods for Analyzing Active Life Expectancy. Journal of Aging and Health. Vol 10, No. 2.

(2) Lièvre A., Brouard N. and Heathcote Ch. (2003) Estimating Health Expectancies from Cross-longitudinal Surveys. Mathematical Population Studies, 10(4), pp. 211-248. DOI 10.1080/713644739

(3) Nicolas Brouard (2019) - Chapter 10 - Theory and Applications of Backward Probabilities and Prevalences in Cross-Longitudinal Surveys. In Handbook of Statistics, Elsevier, Volume 40, edited by: Arni S.R. Srinivasa Rao, C.R. Rao, Pages 435-486, ISSN 0169-7161, ISBN 9780444641526, DOI 10.1016/bs.host.2018.11.009.

(4) Nicolas Brouard and Feinuo Sun (2024) - REVES 35th Conference in Bogotá - New Challenges in Estimating Multistate Models.


What kind of data is required?

The minimum data required for a transition model is the recording of a set of individuals interviewed at a first date and interviewed at least once more. From the observations of an individual, we obtain a follow-up over time of the occurrence of a specific event. In this documentation the event is related to health state, but the program can be applied to many longitudinal studies in different contexts. To build the data file as explained in the next section, you must have the month and year of each interview and the corresponding health state. In order to compute age, the date of birth (month and year) is required (missing values are allowed for the month). The date of death (month and year) is important information, also required if the individual is dead. A shorter step (i.e. a month) will more closely take into account the survival time after the last interview.


The data file

In this example, 8,000 people have been interviewed in a cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990). Some people missed 1, 2 or 3 interviews. Health states are healthy (1) and disabled (2). The survey is not a real one but a simulation of the American Longitudinal Study of Aging. The disability state is defined as dependence in at least one of four ADLs (Activities of Daily Living, like bathing, eating, walking). Therefore, even if the individuals interviewed in the sample are virtual, the information in this sample is close to reality for the United States. Sex is not recorded in this sample. The LSOA survey is biased in the sense that people living in an institution were not included in the first interview in 1984. Thus the prevalence of disability observed in 1984 is lower than the true prevalence at old ages. However, when people moved into an institution, they were interviewed there in 1986, 1988 and 1990. Thus the incidences of disabilities are not biased. Cross-sectional prevalences of disability at old ages are thus artificially increasing in 1986, 1988 and 1990 because a greater proportion of the sample is getting institutionalized. Our article (Lièvre A., Brouard N. and Heathcote Ch. (2003)) shows that, on the contrary, the period prevalence based on the incidences is lower at old ages than the adjusted cross-sectional prevalence, illustrating that there has been significant progress against disability.

Each line of the data set (named data1.txt in this first example) is an individual record. Fields are separated by blanks. You can insert comment lines, which should start with a sharp (#):

#ID V1V2W BirthD Death   1st_pass s1 2nd    s2  3rd    s3  4th    s4
 1 1 1 1  7/1914 99/9999  7/1984  1  6/1986  1  6/1988  1  7/1990 -1
  • Index number: positive number (field 1)
  • First covariate: should be 1 or 0 (reference) (field 2)
  • Second covariate: should be 1 or 0 (reference) (field 3)
  • Weight: positive number (field 4). In most surveys individuals are weighted to account for stratification of the sample.
  • Date of birth: coded as mm/yyyy. Missing dates are coded as 99/9999 or with a dot (field 5)
  • Date of death: coded as mm/yyyy. Missing dates are coded as 99/9999 or with a dot (field 6)
  • Date of first interview: coded as mm/yyyy. Missing dates are coded as 99/9999 or with a dot (field 7)
  • Status at first interview: positive number. Missing values are coded -1. (field 8)
  • Date of second interview: coded as mm/yyyy. Missing dates are coded as 99/9999 or with a dot (field 9)
  • Status at second interview: positive number. Missing values are coded -1. (field 10)
  • Date of third interview: coded as mm/yyyy. Missing dates are coded as 99/9999 or with a dot (field 11)
  • Status at third interview: positive number. Missing values are coded -1. (field 12)
  • Date of fourth interview: coded as mm/yyyy. Missing dates are coded as 99/9999 or with a dot (field 13)
  • Status at fourth interview: positive number. Missing values are coded -1. (field 14)
  • etc.

If you do not wish to include information on weights or covariates, you must fill the column with a number (e.g. 1) since all fields must be present.
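The record layout above can be sketched with a small parser. This is our own illustration of the whitespace-separated format (the field names in the dictionary are ours, not IMaCh's):

```python
# Sketch: parse one record of the whitespace-separated data file described
# above: id, 2 dummy covariates, weight, birth, death, then (date, status)
# pairs, one per wave.

def parse_record(line):
    f = line.split()
    rec = {"id": int(f[0]), "v1": int(f[1]), "v2": int(f[2]),
           "weight": float(f[3]), "birth": f[4], "death": f[5]}
    # remaining fields come in (date, status) pairs
    rec["waves"] = [(f[i], int(f[i + 1])) for i in range(6, len(f), 2)]
    return rec

rec = parse_record("1 1 1 1 7/1914 99/9999 7/1984 1 6/1986 1 6/1988 1 7/1990 -1")
# rec["waves"][3] -> ("7/1990", -1): status missing at the 4th wave
```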

Version 0.99

Version 0.99 actually contains two series, the old "r" series and the new "s" series. The only difference is the log-likelihood maximization algorithm: the "r" series uses Powell's algorithm (from the book Numerical Recipes in C), while the "s" series uses Brent's Praxis algorithm. Powell's algorithm struggles to handle more than 30 parameters, but Brent's can handle at least a few hundred.

In versions 0.99r14 and later (or 0.99s3 and later) you can add columns for nqv fixed quantitative variables (nqv can be 0), ntv time-varying dummy covariates and nqtv time-varying quantitative variables.

Fixed quantitative variables (for example years of education) are columns inserted after the ncovcol fixed dummy covariates (V3 in the example below). Time-varying dummies are inserted after the varying state at each interview (V4 in the example), and time-varying quantitative variables (like BMI) are inserted after the varying dummies, if any (V5 in the example).

#ID V1 V2  V3  W BirthDate Death   1st_pass s1 V4 V5   2nd    s2 V4 V5   3rd    s3 V4  V5 
 1   1 1  3.5  1  7/1914  99/9999  7/1984    1  0 20.2 6/1986  1  1 20.1 6/1988  1 0 19.4
  • Index number: positive number (field 1)
  • First covariate: should be 1 or 0 (reference) (field 2, V1)
  • Second covariate: should be 1 or 0 (reference) (field 3, V2)
  • First fixed quantitative covariate: a real number, like years of education (field 4, V3)
  • Weight: positive number (field 5). In most surveys individuals are weighted to account for stratification of the sample.
  • Date of birth: coded as mm/yyyy. Missing dates are coded as 99/9999 or with a dot (field 6)
  • Date of death: coded as mm/yyyy. Missing dates are coded as 99/9999 or with a dot (field 7)
  • Date of first interview: coded as mm/yyyy. Missing dates are coded as 99/9999 or with a dot (field 8)
  • State at first interview: positive number. Missing values are coded -1. (field 9)
  • First dummy covariate at first interview: 0 or 1. Missing values are coded -1. (field 10, V4)
  • First quantitative covariate at first interview: a real number, like BMI (field 11, V5)
  • Date of second interview: coded as mm/yyyy. Missing dates are coded as 99/9999 or with a dot (field 12)
  • State at second interview: positive number. Missing values are coded -1. (field 13)
  • First dummy covariate at second interview: 0 or 1. Missing values are coded -1. (field 14, V4)
  • First quantitative covariate at second interview: a real number, like BMI (field 15, V5)
  • Date of third interview: coded as mm/yyyy. Missing dates are coded as 99/9999 or with a dot (field 16)
  • State at third interview: positive number. Missing values are coded -1. (field 17)
  • First dummy covariate at third interview: 0 or 1. Missing values are coded -1. (field 18, V4)
  • First quantitative covariate at third interview: a real number, like BMI (field 19, V5)
  • etc.


The following section is only relevant if you have heard of deaths that occurred after the last wave of your multiple-round survey. If you want to take this information into account, which can be a good idea, you should be careful about the potential bias of including only the deaths you have heard of, rather than all deaths that occurred before a certain date. What is called hereafter an additional wave is not an interview, but rather a check of the vital statuses, for example against the civil registry.

Missing date of interview, status at interview and date of death at last wave

In early versions of IMaCh, mortality was only measured between waves. But more and more users have reported deaths occurring after the last wave, without reporting the status of the other people still alive. In addition, only known deaths have been reported. Therefore the mortality was overestimated and the results were biased. The Lexis domain of estimation does not need to be rectangular, but it should be the same for all individuals in the sample, with all deaths occurring in the domain divided by all person-years counted in the same domain.

Consequently, IMaCh will not accept the contribution of a death if the date (year and month) of the death is later than the date (year and month) of the last interview.

In order to warn users of this potential bias, IMaCh (versions 0.98r4 and later) displays an ERROR for each such case (only the first error is displayed on the screen; the others are printed only in the log file) and the transition towards death is not taken into account. Your estimates are not biased, but you lose some statistical power.

In order to take this last transition to death into account in the likelihood, you must ADD a new fictitious wave dated at the verification of the vital status of ALL individuals (and not just the deceased people you have heard of). This may be the month of the most recent death, but in this case the vital status of all people alive on the date of the previous verification should be checked.

Please take a look at the param/ILK_param.txt text file, as it describes the different contributions of each individual (and adds them all up to find the parameter values that maximize the sum). Some charts use the information from this very important text file to highlight various aspects (probability or likelihood).

  1. If the person is deceased and you know the date of death, the status S at this additional wave should be coded "death" ("3" in the simplest case). The contribution to the likelihood will be the probability of going from the last known health state "i" to death, i.e. "pi3".
  2. If the person is known to be alive on the scan date but their health status is unknown, you will code "-1" or "." and the probability will be calculated as pi., the probability of passing from state i to any living state (pi. = pi1 + pi2 in this simplest example).
  3. If a person's vital status at the date of the last scan is unknown, you can code it -2 and the transition will not be taken into account.
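The three coding rules above can be sketched as a small helper that appends the verification wave to every record. The function and field names are ours, not IMaCh's; the death code "3" assumes the simplest two-living-state model:

```python
# Sketch: append a fictitious vital-status wave (same scan date for everyone).
# Status codes follow the rules above: 3 = dead, -1 = alive but health unknown,
# -2 = vital status unknown at the scan date.

def add_scan_wave(rec, scan_date, death_code=3):
    """rec: dict with 'death' ('mm/yyyy' or '99/9999' if unknown), a 'waves'
    list of (date, status) pairs, and optionally 'known_alive' (True/False)."""
    if rec["death"] != "99/9999":          # date of death is known
        status = death_code
    elif rec.get("known_alive"):           # alive at scan date, health unknown
        status = -1
    else:                                  # vital status unknown
        status = -2
    rec["waves"].append((scan_date, status))
    return rec

rec = {"death": "10/2016", "waves": [("3/2016", -1)]}
add_scan_wave(rec, "12/2018")
# rec["waves"][-1] -> ("12/2018", 3)
```

The important point, as stressed above, is that the same `scan_date` must be applied to every record of the sample, not only to the known deaths.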

Example

NEW: In versions 0.99r43 and later, the value of each covariate is added at the end of each line describing the contributions to the log-likelihood, making it possible to verify their values and to draw nice graphs at the very beginning of the HTML listing.

In the example below, the space between waves is about 2 years and the parameter stepm is initially set to 24 months. This is the simplest model with two health statuses (1 for healthy, 2 for disabled) and death (code 3).

The first wave occurred between April 2002 and March 2003, the second between 1/2004 and 12/2005, etc. The last and 8th wave occurred between 1/2016 and 1/2017, and this man with id 423 was surveyed in March 2016 but could not answer. His health status was reported as unknown ("-1"). Later on, maybe in 2018, you learned that he died in October 2016 and added this information in the "date of death" column but did not change the status of the 8th wave: IMaCh will signal an ERROR, and the contribution of the death (which is the probability of moving from the last known disability status, in October 2014, to death two years later) will not be taken into account.

Error! Death for individual 423 line=6 occurred at 10/2016 after last wave 8 interviewed at 3/2016. Potential bias if other individuals are still alive on that date but ignored. This case (6)/wave (8) is skipped, no contribution to likelihood. Please add a new fictitious wave at the date of last vital status scan, with a dead status. See documentation

Let us look in more detail at the various contributions of this record to the likelihood:

#id  msex   psclas  isclas  srh   weight  dbirth  ddeath  intv1  s1 intv2   s2  intv3   s3 intv4   s4  intv5   s5  intv6   s6  intv7   s7  intv8   s8
423  0      0       0       0     1       6/1924  10/2016 11/2002 2  9/2004  -1  1/2007  2  1/2009  2  11/2010 2   9/2012  2   10/2014 2  3/2016  -1

The man was born in June 1924 and died in October 2016. At the first interview, in November 2002, he is in state 2 and aged 78.4 years. His status was unknown in September 2004. Then, in January 2007, his health status had not changed. After 4.2 years he is aged 82.6 years, but as the step of the elementary transition matrix is set to 2 years (stepm=24), IMaCh computes the product of 3 matrices: from 78.4 to 80.4 years, from 80.4 to 82.4 and from 82.4 to 84.4 years. The third matrix is used only partially, from 82.4 to 82.6 years, and the contribution to the likelihood, which is the element p22 of the matrix product, is interpolated (there are different ways to interpolate according to the value of the parameter mle). Its value, as reported below from file param/ILK_param.txt, is 0.633710.

#individual(line's_record) count ageb ageend s1 s2 wave# effective_wave# number_of_matrices_product pij weight weight/gpw -2ln(pij)*weight 0pij_x 0pij_(x-stepm) cumulating_loglikeli_by_health_state(reweighted=-2ll*weightXnumber_of_contribs/sum_of_weights) and_total
#num_i ageb agend i s1 s2 mi mw dh likeli   weight %weight 2wlli   out      sav  -2*gipw/gsw*weight*ll[1]++ -2*gipw/gsw*weight*ll[2]++ -2*gipw/gsw*weight*ll(total)
423   78.4   84.4 6  2  2  1  1  3 0.633710 1.0000 1.000 -0.912327 0.483846 0.647334   17.869715   22.739298  40.609013
423   82.6   84.6 6  2  2  2  3  1 0.742829 1.0000 1.000 -0.594579 0.742829 1.000000   17.869715   23.333877  41.203592
423   84.6   86.6 6  2  2  3  4  1 0.722615 1.0000 1.000 -0.649758 0.697398 1.000000   17.869715   23.983635  41.853350
423   86.4   88.4 6  2  2  4  5  1 0.680954 1.0000 1.000 -0.768521 0.651950 1.000000   17.869715   24.752156  42.621871
423   88.2   92.2 6  2  2  5  6  2 0.592156 1.0000 1.000 -1.047969 0.331009  0.603511   17.869715  25.800125  43.669840

His second contribution, 0.742829, is from age 82.6 to 84.6 (1/2007 to 1/2009), the third, 0.722615, from 84.6 to 86.4, the fourth, 0.680954, from 86.4 to 88.2, the fifth, 0.592156, from 88.2 to 92.2, and the last contribution, which is the death, is not taken into account because it occurred after the last wave!
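The interpolation of the partially used last matrix can be sketched as follows. This is a simple linear interpolation between the products of 2 and 3 elementary matrices with invented element values; the exact scheme used by IMaCh depends on the mle parameter, as noted above:

```python
# Sketch: interpolate the contribution when the delay between two interviews
# (4.2 years here) is not a multiple of stepm (2 years). p2 is the relevant
# element of the product of 2 matrices (covering 78.4 -> 82.4), p3 of the
# product of 3 matrices (covering 78.4 -> 84.4); the true delay ends at 82.6.

def interpolate(p2, p3, delay, stepm_years):
    """Linear interpolation inside the last, partially used step."""
    frac = (delay % stepm_years) / stepm_years   # 0.2 / 2 = 0.1 here
    return (1 - frac) * p2 + frac * p3

# Invented matrix elements p2 = 0.70 and p3 = 0.55, delay 78.4 -> 82.6:
contrib = interpolate(0.70, 0.55, 4.2, 2.0)
# contrib lies 10% of the way from p2 toward p3
```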

In order to keep the information on the death, we have to add a new wave; but this new 9th wave has to be defined for all individuals. In the ideal case the vital status of each person has to be known in the same month. For example it could be December 2018 if the last known death was in December 2018; but in that case the vital status of every individual in the sample has to be known in December 2018. Below are two individual records, case 423 already described and case 168, the most recent death:

#id  msex   psclas  isclas  srh   weight  dbirth  ddeath  intv1  s1 intv2   s2  intv3   s3 intv4   s4  intv5   s5  intv6   s6  intv7   s7  intv8   s8  intv9  s9
423  0      0       0       0     1       6/1924  10/2016 11/2002 2  9/2004  -1  1/2007  2  1/2009  2  11/2010 2   9/2012  2   10/2014 2  3/2016  -1 12/2018  3
168  0      0       0       1     1       6/1942  12/2018  6/2002 1  2/2005  -1	 5/2006 -1  9/2008 -1   8/2010 -1  10/2012 -1  11/2014 -1 10/2016 -1 12/2018  3

The contributions of the death are now taken into account, p23=0.454324 for individual 423 and p13=0.059778 for individual 168:

#num_i ageb agend i s1 s2 mi mw dh likeli  weight %weight 2wlli   out     sav     -2*gipw/gsw*weight*ll[1]++ -2*gipw/gsw*weight*ll[2]++ -2*gipw/gsw*weight*ll(total)
423   90.3 92.3   6  2  3  6  7 1 0.454324 1.0    1.0   -1.577890 0.454324 0.000000   17.870219   27.382626   45.252845
168   60.0 78.0 449  1  3  1  1 9 0.059778 1.0    1.0   -5.634227 0.260914 0.201136  595.360661 1232.254763 1827.615424


If you don't want to add a new fictitious wave, you have to change the unknown status of the 8th wave into the death state, and its date should be the date of death, not the date of the last wave. Why? Again, if you report only the few deaths you happened to hear of after the end of the survey, without knowing the vital status of each individual of your sample, you potentially bias your estimates.

#id  msex   psclas  isclas  srh   weight  dbirth  ddeath  intv1  s1 intv2   s2  intv3   s3 intv4   s4  intv5   s5  intv6   s6  intv7   s7  intv8   s8
423  0      0       0       0     1       6/1924  10/2016 11/2002 2  9/2004  -1  1/2007  2  1/2009  2  11/2010 2   9/2012  2   10/2014 2   10/2016  3
168  0      0       0       1     1       6/1942  12/2018  6/2002 1  2/2005  -1	 5/2006 -1  9/2008 -1   8/2010 -1  10/2012 -1  11/2014 -1  12/2018  3

The contributions of the death are now taken into account and should have exactly the same values.

More generally, here is a table describing what IMaCh 0.98r4 and later does.

The date of interview DI is either known (DIK) or unknown (DIU), and the date of death DD is either known (DDK) or unknown (DDU).

The (health) status can also be known (SK) or unknown (SU), coded S=-1 (or S=-2 when even the vital status is unknown).

Date of death known (DDK), status SK or SU(S=-1):
  • DIK, DD<DI: Warning, S should be dead (=3 here)
  • DIK, DD>=DI: Error, potential bias; case/wave is skipped
  • DIU, SK: Warning, case skipped
  • DIU, SU(S=-1): Error, potential bias; case/wave is skipped

Date of death unknown (DDU):
  • DIK, SK: OK, transition taken into account
  • DIK, SU(S=-1): alive at a precise date but the health status is unknown. Warning: transition to any living state taken into account (pi.)
  • DIK, SU(S=-2): vital status unknown at a precise date. No warning, case skipped
  • DIU, SU(S=-2): vital status unknown, without a precise date. Warning: case skipped


Explanations

  • Date of death known (DDK)
    • When the status is known (SK) or unknown (SU, with S=-1) and the date of death occurred before the given date of interview, there is a "Warning" because the status should have been coded Dead (S=3 in this example). If the date of death is later than the date of interview, it may mean that only some deaths have been reported and not all deaths. Thus if we took those deaths into account but not the survivors, we would bias the results. This is why IMaCh raises an Error (and not a Warning) and the information concerning this death is not taken into account. To take this death event into account, see above and add a new wave at the date of the last scan of the vital statuses of your sample.
    • If the date of interview is not known, we make the same distinction between Warning and Error and in both cases the information on the date of death is not used.
  • If the date of death is unknown (DDU)
    • but the status at the wave is known (the usual case), the transition is taken into account. If the status is coded -1, it means that the person is ALIVE but his/her health status is unknown: the contribution of this wave is then taken into account as pi. = pi1 + pi2 (because we don't know whether the person who was in state i moved to state 1 or 2). If the status is coded -2, the vital status itself is unknown at a PRECISE date (screening of a death registry, for example); this is correct but no information can be used.
    • if the status is coded -2 but the date of this unknown vital status is itself unknown, then there is a Warning and the information is skipped. You are supposed to have a common screening date at which the vital status of every person is known (alive or dead) in order to compute mortality without bias.

Covariates

We are not using covariates in this example. Note that the value of a covariate can only be 0 or 1. Some earlier versions of IMaCh allowed values like 1 and 2, but in versions 0.98 and later you will get a warning and the program will stop if your covariate columns contain any value other than 0 or 1.

This is to make results clear: IMaCh is not able (and will probably never be able) to create Design variables and you must provide them. In the simple binomial case, like sex, females could be the reference and coded 0 while males could be 1.

In the case of a polytomous independent variable like RACE with four categories (White, Black, Hispanic and Other), the reference Other could be coded 0 0 0, White vs Other 1 0 0, Black vs Other 0 1 0 and Hispanic vs Other 0 0 1. The value 1 1 1 should not exist in the sample and is meaningless here, but IMaCh might output tables and graphs for this case because IMaCh cannot distinguish between a model with one polytomous variable of 4 categories and a model with 3 binomial independent variables V1, V2 and V3.

In IMaCh, age is the only variable treated as continuous in the logit; any other covariate must be dichotomized (with a reference category and design variables). If you have other continuous variables (like the severity of a deficiency) which enter the logit, you have to split them into levels and create design variables (you could also modify the sources of the program in order to add such a continuous variable to the computed likelihood).
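The dummy coding described above, with "Other" as the reference category, can be sketched as:

```python
# Sketch: build design (dummy) variables for a polytomous covariate such as
# RACE, with "Other" as the reference category (coded all zeros), as
# described above. This preprocessing is done before feeding IMaCh.

CATEGORIES = ["White", "Black", "Hispanic"]   # reference "Other" is omitted

def design_variables(race):
    """Return the V1 V2 V3 columns for one individual."""
    return [1 if race == c else 0 for c in CATEGORIES]

design_variables("Other")     # -> [0, 0, 0]  (reference)
design_variables("White")     # -> [1, 0, 0]
design_variables("Hispanic")  # -> [0, 0, 1]
```

By construction the combination 1 1 1 can never be produced, which is exactly why it should not appear in a properly coded sample.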



Your first example parameter file

#Imach version 0.99r11, April 2017, INED-EUROREVES
title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4
ftol=1.e-8 stepm=1 ncovcol=2 nqv=0 ntv=0 nqtv=0 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0
model=1+age+.
# Parameters
12 0. 0.
13 0. 0.
21 0. 0.
23 0. 0.
# Scales
12 0. 0. 
13 0. 0. 
21 0. 0. 
23 0. 0. 
# Covariance matrix
121 0.
122 0. 0.
131 0. 0. 0. 
132 0. 0. 0. 0. 
211 0. 0. 0. 0. 0. 
212 0. 0. 0. 0. 0. 0. 
231 0. 0. 0. 0. 0. 0. 0. 
232 0. 0. 0. 0. 0. 0. 0. 0.
# agemin agemax for life expectancy, bage fage (if mle==0 ie no data nor Max likelihood).
agemin=70 agemax=100 bage=50 fage=100 estepm=24 ftolpl=6e-4
# Observed prevalence period
begin-prev-date=1/1/1984 end-prev-date=1/6/1988 mov_average=0
# Health expectancies computed from stable (period) prevalence (pop_based=0) or population based (1)
pop_based=1
# Prevalence forecasting and backcasting are skipped here
##prevforecast=0  starting-proj-date=1/1/1989 final-proj-date=1/1/1990 mobil_average=0
#prevforecast=1 yearsfproj=10 mobil_average=0
#prevbackcast=1 yearsbproj=10 mobil_average=1
# Results model=1+age.
result:.


This first line is a comment. Comment lines start with a '#'. Each parameter value is announced by a parameter name and an equals sign without any space: 'parameter_name=value' is good, 'parameter_name =value' is wrong:

First uncommented line

title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4
  • title= 1st_example is the title of the run.
  • datafile= data1.txt is the name of the data set. Our example is a six-year follow-up survey. It consists of a baseline followed by 3 reinterviews. If the complete file name with its path contains blanks, you should replace them (renaming the directories as well) with, for example, underscores:
title=BMI_nomiss datafile=C:\Users\c3017510\Desktop\Healthy_Ageing\IMaCh_June_2015\Programs\bmi.txt
lastobs=12432 firstpass=1 lastpass=11
Quoting the entire path with quotes (Windows) or escaping the blanks with a backslash (Unixes) may not work.
  • lastobs= 8600 The program is able to run on a subsample where the last observation number is lastobs. It can be set to a number bigger than the real number of observations (e.g. 100000). In this example, the maximisation will be done on the first 8600 records.
  • firstpass=1 , lastpass=4 If there are more than two interviews in the survey, the program can be run on selected transition periods. firstpass=1 means the first interview included in the calculation is the baseline survey. lastpass=4 means that the last interview to be included will be the 4th.

Second uncommented line

ftol=1.e-8 stepm=1 ncovcol=2 nqv=0 ntv=0 nqtv=0 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0
  • ftol=1e-8 Convergence tolerance on the function value in the maximisation of the likelihood. Choosing a correct value for ftol is difficult; 1e-8 is an appropriate value for a 32-bit computer.
  • stepm=1 The time unit in months for interpolation. Examples:
    • If stepm=1, the unit is a month
    • If stepm=3, the unit is a trimester
    • If stepm=12, the unit is a year
    • If stepm=24, the unit is two years
    • ...
  • ncovcol=2 Number of covariate (dummy) columns included in the datafile before the column for the date of birth. You can include covariates that will not be used in the model, as this number is not the number of covariates specified by the model. The 'model' line describes the covariates to be taken into account during the run.
  • nqv=0 Number of Quantitative coVariate (not dummy) columns included in the datafile. They should be inserted after the columns of the ncovcol dummy covariates (they could be, for example, height or years of education).
  • ntv=0 Number of Time-varying coVariate (dummy) columns included, for each wave in the datafile. They should be inserted after the column of the state observed at the interview.
  • nqtv=0 Number of Time-varying Quantitative coVariate (not dummy) columns included in the datafile. They should be inserted after the ntv columns if any.
  • nlstate=2 Number of non-absorbing (alive) states. Here we have two alive states: disability-free is coded 1 and disability is coded 2.
  • ndeath=1 Number of absorbing states. The absorbing state death is coded 3.
  • maxwav=4 Number of waves in the datafile.
  • mle=1 Option for the Maximisation and the computation of the Likelihood.
    • If mle=1 the program does the maximisation and the calculation of health expectancies
    • If mle=0 the program uses the values of the parameters (which will be inserted as parameter lines) in order to calculate the first-order values (means) of the various health expectancies and some other indices. It also reads the covariance matrix in order to calculate and display some graphs of second-order estimates of these indices (variances and confidence intervals). With mle=0 the program does not enter the maximization process. There are also other possible values for mle:
      • If mle=-1 you get a template for the number of parameters and the size of the variance-covariance matrix. This is useful if the model is complex with many covariates.
      • If mle=-3 IMaCh computes the mortality only, without any health status (May 2004)
      • If mle=2 IMach likelihood corresponds to a linear interpolation
      • If mle=3 IMach likelihood corresponds to an exponential inter-extrapolation
      • If mle=4 IMaCh likelihood corresponds to no inter-extrapolation, thus biasing the results if stepm is big. If stepm is 24 months (2 years) and the interval between two waves of a person is 2.5 years, the program will compute the probability of transition over 2 years; and if it is 3.2 years, it will compute the probability as the product of two matrices, as if the transition was done in exactly 4 years instead of 3.2 (see the publication for details).
      • If mle=5 IMaCh likelihood corresponds to no inter-extrapolation as before the correction of Jackson's bug (avoid this option).
  • weight=0 Provides the possibility of adding weights.
    • If weight=0 no weights are included
    • If weight=1 the maximisation integrates the weights, which are in the last column after the ncovcol and nqv covariate columns and before the date of birth.

The model line with covariates

The intercept and age are automatically included in the model, but since version 0.98q2 (June 2015) the constant 1 and age have to be explicitly set in the model. Additional covariates can be included with the command:

model=1+age+. 
  • if model=1+age+. then no other covariate than constant and age are included (before 0.98q2 the syntax was model=.).
  • if model=1+age+V1 the model includes the first covariate (field 2)
  • if model=1+age+V2 the model includes the second covariate (field 3)
  • if model=1+age+V1+V2 the model includes the first and the second covariate (fields 2 and 3)
  • if model=1+age+V1*V2 the model includes the product of the first and the second covariate (fields 2 and 3)
  • if model=1+age+V1+V1*age the model includes the first covariate and its product with age.

In this example, we have two covariates in the data file (fields 2 and 3). The number of covariates included in the data file between the id and the date of birth is ncovcol=2 (it was named ncov in versions prior to 0.8). If you have 3 covariates in the datafile (fields 2, 3 and 4), you will set ncovcol=3. Then you can run the program with a new parametrisation taking into account the third covariate. For example, model=1+age+V1+V3 estimates a model with the first and third covariates. More complicated models can be used, but they will take more time to converge. With a simple model (no covariates), the program estimates 8 parameters. Adding covariates increases the number of parameters: 12 for model=1+age+V1, 16 for model=1+age+V1+V1*age and 20 for model=1+age+V1+V2+V3.

Please remember to increase the number of covariates step by step, and use a large stepm (=24) to test whether the parameters are significant before tuning your estimation with a smaller stepm. With a smaller stepm the time to complete will be much greater: the number of iterations should not be higher, but the number of calls to the likelihood function will be much higher.

You should try the simple model first, without any new covariate, and note the value of -2*LL. By adding a covariate V1 (0-1), the new -2*LL should decrease by more than the value of a $\chi^2$ with 1 degree of freedom, which is 3.84 at the 95% level. If the decrease is not large enough, you cannot keep this new parameter in your regression model.

The next step consists in adding a second covariate and testing, against the former model, whether the log-likelihood ratio test is successful or not.

Testing a complete model like 1+age+V1+V2+V3+V3*age directly is usually a waste of time. It is usually a waste of time in standard regression analysis too, but software for multiple regression or logistic regression models is not as complex and time consuming as Interpolated Markov Chain models.

Guess values for optimization

You must write the initial guess values of the parameters for the optimization. The number of parameters, N, depends on the number of absorbing and non-absorbing states and on the number of covariates in the model (ncovmodel).
N is given by the formula N=(nlstate + ndeath - 1)*nlstate*ncovmodel .


You should copy and paste these lines into your parameter file and run with mle=1 . The file rfoo.imach will be similar, but with the estimated values of the parameters as well as of the covariance matrix.
Thus in the simple case with 2 covariates in the model (the model is log (pij/pii) = aij + bij * age, where the intercept and age are the two covariates), 2 health states (1 for disability-free and 2 for disability) and 1 absorbing state (3), you must enter 8 initial values: a12, b12, a13, b13, a21, b21, a23, b23. You can start with zeros as in this example, but if you have a more precise set (for example from an earlier run) you can enter it and it will speed up the convergence.
Each of the four lines starts with the indices "ij": ij aij bij


# Guess values of aij and bij in log (pij/pii) = aij + bij * age
 12 -14.155633  0.110794 
 13  -7.925360  0.032091 
 21  -1.890135 -0.029473 
 23  -6.234642  0.022315

or, to simplify (in most cases it converges, but there is no guarantee!):

 12 0.0 0.0
 13 0.0 0.0
 21 0.0 0.0
 23 0.0 0.0

In order to speed up the convergence you can make a first run with a large stepm, i.e. stepm=12 or 24, and then decrease stepm until stepm=1 month. If stepm is the new, shorter time unit and the old one is a multiple of it, n times stepm, then the following approximation holds:

aij(stepm) = aij(n . stepm) - ln(n)

and

bij(stepm) = bij(n . stepm) .

For example if you already ran with stepm=6 (a 6 months interval) and got:

# Parameters
12 -13.390179  0.126133 
13  -7.493460  0.048069 
21   0.575975 -0.041322 
23  -4.748678  0.030626 
 

If you now want to get the monthly estimates, you can guess the aij by subtracting ln(6) = 1.7918
and running using

12 -15.18193847  0.126133 
13 -9.285219469  0.048069
21 -1.215784469 -0.041322
23 -6.540437469  0.030626

and get

12 -15.029768 0.124347 
13 -8.472981  0.036599
21 -1.472527 -0.038394 
23 -6.553602  0.029856 

which is close to the converged results. The approximation is probably useful only for very small intervals, and we don't have enough experience to know whether it will speed up the convergence or not.

          -ln(12)= -2.485
  -ln(6/1)=-ln(6)= -1.792
  -ln(3/1)=-ln(3)= -1.099
 -ln(12/6)=-ln(2)= -0.693

In version 0.9 and higher you can still get valuable results even if your stepm parameter is bigger than a month. The idea is to run with a bigger stepm in order to have a quicker convergence at the price of a small bias. Once you know which model you want to fit, you can set stepm=1 and wait hours or days for the convergence! To get unbiased results even with a large stepm, we introduced the idea of a pseudo-likelihood obtained by interpolating two exact likelihoods. In more detail:

If the interval of d months between two waves is not a multiple of stepm, but lies between (n-1) stepm and n stepm, then both exact likelihoods are computed (the contribution to the likelihood at n stepm requires one more matrix product). Remember that we are modelling the probability of being observed in a particular state after d months, given the state observed at month 0. The distance (bh in the program) from the month of interview to the rounded date n stepm is computed. It can be negative (the interview occurs before n stepm) or positive (the interview occurs after n stepm and before (n+1) stepm).
The final contribution to the total likelihood is then a weighted average of these two exact likelihoods at n stepm (out) and at (n-1) stepm (savm). We did not want to compute a third likelihood at (n+1) stepm because it is too costly in time, so we use an extrapolation if bh is positive.
The formula for the inter/extrapolation may vary according to the value of parameter mle:

  • mle=1
    • lli = log((1.+bbh)*out[s1][s2] - bbh*savm[s1][s2]); /* linear interpolation */
  • mle=2
    • lli = (savm[s1][s2] > (double)1.e-8 ?
            log((1.+bbh)*out[s1][s2] - bbh*savm[s1][s2]) :
            log((1.+bbh)*out[s1][s2])); /* linear interpolation */
  • mle=3
    • lli = (savm[s1][s2] > 1.e-8 ?
            (1.+bbh)*log(out[s1][s2]) - bbh*log(savm[s1][s2]) :
            log((1.+bbh)*out[s1][s2])); /* exponential inter-extrapolation */
  • mle=4
    • lli = log(out[s[mw[mi][i]][i]][s[mw[mi+1][i]][i]]); /* no interpolation */
      (no need to save the previous likelihood into memory)

If the death occurs between the first and second pass, for example more precisely between n stepm and (n+1) stepm, the contribution of these people to the likelihood is simply the difference between the probability of dying before n stepm and the probability of dying before (n+1) stepm. There was a bug in version 0.8: death was treated like any other state, i.e. as if it were an observed death at the second pass. This was correct but imprecise; when information on the precise month of death was available (death occurring prior to the second pass), the likelihood was not changed accordingly. We thank Chris Jackson for correcting this. In earlier versions (fortunately before first publication) the total mortality was thus overestimated (people were dying too early) by about 10%. Versions 0.95 and higher are correct.

Our suggested choice is mle=1 . If stepm=1 there is no difference between the various mle options (methods of interpolation). If stepm is big, like 12, 24 or 48 months, and mle=4 (no interpolation), the bias may be very important if the mean duration between two waves is not a multiple of stepm. See the appendix of our main publication concerning the sine curve of biases.

Convergence of the Likelihood and graphs

In such logistic regressions, the likelihood of the sample depends on the values of the parameters. There is no guarantee that a unique set of parameters maximizes the likelihood globally, but if the data are of good quality and the model is meaningful, you will get meaningful results.

It is always a better idea to start with a value of stepm close to the mean interval between two waves, mostly in order to speed up the convergence.

Guess values for computing variances

These values are output by the maximisation of the likelihood mle=1 and can be used as an input for a second run in order to get the various output data files (Health expectancies, period prevalence etc.) and figures without rerunning the long maximisation phase (mle=0).

The 'scales' are small values needed for computing numerical derivatives. These derivatives are used to compute the Hessian matrix of the parameters, which is the inverse of the covariance matrix, and they are often used for estimating variances and confidence intervals. Each line consists of the indices "ij" followed by the initial scales (zeros to simplify) associated with aij and bij.

  • If mle=1 you can enter zeros:
# Scales (for hessian or gradient estimation)
 12 0. 0. 
 13 0. 0. 
 21 0. 0. 
 23 0. 0.
  • If mle=0 (no maximisation of Likelihood) you must enter real values (usually obtained from an earlier run).

Covariance matrix of parameters

The covariance matrix is output if mle=1. But it can be used as an input to get the various output data files (Health expectancies, period prevalence etc.) and figures without rerunning the maximisation phase (mle=0).
Each line starts with the indices "ijk" followed by the covariances between the aij and bij; the matrix is symmetric and only the lower triangle should be entered:

121 Var(a12) 
122 Cov(b12,a12)  Var(b12) 
        ...
232 Cov(b23,a12)  Cov(b23,b12) ... Var (b23)
  • If mle=1 you can enter zeroes for the starting values:
# Covariance matrix
121 0.
122 0. 0.
131 0. 0. 0. 
132 0. 0. 0. 0. 
211 0. 0. 0. 0. 0. 
212 0. 0. 0. 0. 0. 0. 
231 0. 0. 0. 0. 0. 0. 0. 
232 0. 0. 0. 0. 0. 0. 0. 0.
  • If mle=0 you must enter a real covariance matrix (usually obtained from an earlier run with mle=1).

mle=-1 for a correct design of parameters and covariance matrix

In fact, there are two different aspects to running the program: the first, called the design, is the number of columns and lines of the initial parameter values, and the second is the values themselves. In order to get a correct design, you should set mle=-1 and run the program; you will get something like

$ cat leigh.imach
title=BMI_nomiss datafile=bmi.txt lastobs=12432 firstpass=1 lastpass=11
ftol=1.e-08 stepm=24 ncovcol=3 nlstate=2 ndeath=1 maxwav=11 mle=-1 weight=0
model=1+age+V1+V2+V3

Run

$ imach leigh.imach
title=BMI_nomiss datafile=bmi.txt lastobs=12432 firstpass=1 lastpass=11
ftol=1.e-08 stepm=24 ncovcol=3 nlstate=2 ndeath=1 maxwav=11 mle=-1 weight=0
model=1+age+V1+V2+V3
# Parameters
# Parameters nlstate*nlstate*ncov a12*1 + b12 * age + ...
12 0. 0. 0. 0. 0.
13 0. 0. 0. 0. 0.
21 0. 0. 0. 0. 0.
23 0. 0. 0. 0. 0.
# Scales (for hessian or gradient estimation)
12 0. 0. 0. 0. 0.
13 0. 0. 0. 0. 0.
21 0. 0. 0. 0. 0.
23 0. 0. 0. 0. 0.
# Covariance matrix
#121 Var(a12)
#122 Cov(b12,a12) Var(b12)
#123 Cov(c12,a12) Cov(c12,b12) Var(c12)
#124 Cov(d12,a12) Cov(d12,b12) Cov(d12,c12) Var(d12)
#125 Cov(e12,a12) Cov(e12,b12) Cov(e12,c12) Cov(e12,d12) Var(e12)
#131 Cov(a13,a12) Cov(a13,b12) Cov(a13,c12) Cov(a13,d12) Cov(a13,e12) Var(a13)
#132 Cov(b13,a12) Cov(b13,b12) Cov(b13,c12) Cov(b13,d12) Cov(b13,e12) Cov(b13,a13) Var(b13)
#133 Cov(c13,a12) Cov(c13,b12) Cov(c13,c12) Cov(c13,d12) Cov(c13,e12) Cov(c13,a13) Cov(c13,b13) Var(c13)
#134 Cov(d13,a12) Cov(d13,b12) Cov(d13,c12) Cov(d13,d12) Cov(d13,e12) Cov(d13,a13) Cov(d13,b13) Cov(d13,c13) Var(d13)
#135 Cov(e13,a12) Cov(e13,b12) Cov(e13,c12) Cov(e13,d12) Cov(e13,e12) Cov(e13,a13) Cov(e13,b13) Cov(e13,c13) Cov(e13,d13) Var(e13)
#211 Cov(a21,a12) Cov(a21,b12) Cov(a21,c12) Cov(a21,d12) Cov(a21,e12) Cov(a21,a13) Cov(a21,b13) Cov(a21,c13) Cov(a21,d13) Cov(a21,e13) Var(a21)
#212 Cov(b21,a12) Cov(b21,b12) Cov(b21,c12) Cov(b21,d12) Cov(b21,e12) Cov(b21,a13) Cov(b21,b13) Cov(b21,c13) Cov(b21,d13) Cov(b21,e13) Cov(b21,a21) Var(b21)
#213 Cov(c21,a12) Cov(c21,b12) Cov(c21,c12) Cov(c21,d12) Cov(c21,e12) Cov(c21,a13) Cov(c21,b13) Cov(c21,c13) Cov(c21,d13) Cov(c21,e13) Cov(c21,a21) Cov(c21,b21) Var(c21)
#214 Cov(d21,a12) Cov(d21,b12) Cov(d21,c12) Cov(d21,d12) Cov(d21,e12) Cov(d21,a13) Cov(d21,b13) Cov(d21,c13) Cov(d21,d13) Cov(d21,e13) Cov(d21,a21) Cov(d21,b21) Cov(d21,c21) Var(d21)
#215 Cov(e21,a12) Cov(e21,b12) Cov(e21,c12) Cov(e21,d12) Cov(e21,e12) Cov(e21,a13) Cov(e21,b13) Cov(e21,c13) Cov(e21,d13) Cov(e21,e13) Cov(e21,a21) Cov(e21,b21) Cov(e21,c21) Cov(e21,d21) Var(e21)
#231 Cov(a23,a12) Cov(a23,b12) Cov(a23,c12) Cov(a23,d12) Cov(a23,e12) Cov(a23,a13) Cov(a23,b13) Cov(a23,c13) Cov(a23,d13) Cov(a23,e13) Cov(a23,a21) Cov(a23,b21) Cov(a23,c21) Cov(a23,d21) Cov(a23,e21) Var(a23)
#232 Cov(b23,a12) Cov(b23,b12) Cov(b23,c12) Cov(b23,d12) Cov(b23,e12) Cov(b23,a13) Cov(b23,b13) Cov(b23,c13) Cov(b23,d13) Cov(b23,e13) Cov(b23,a21) Cov(b23,b21) Cov(b23,c21) Cov(b23,d21) Cov(b23,e21) Cov(b23,a23) Var(b23)
#233 Cov(c23,a12) Cov(c23,b12) Cov(c23,c12) Cov(c23,d12) Cov(c23,e12) Cov(c23,a13) Cov(c23,b13) Cov(c23,c13) Cov(c23,d13) Cov(c23,e13) Cov(c23,a21) Cov(c23,b21) Cov(c23,c21) Cov(c23,d21) Cov(c23,e21) Cov(c23,a23) Cov(c23,b23) Var(c23)
#234 Cov(d23,a12) Cov(d23,b12) Cov(d23,c12) Cov(d23,d12) Cov(d23,e12) Cov(d23,a13) Cov(d23,b13) Cov(d23,c13) Cov(d23,d13) Cov(d23,e13) Cov(d23,a21) Cov(d23,b21) Cov(d23,c21) Cov(d23,d21) Cov(d23,e21) Cov(d23,a23) Cov(d23,b23) Cov(d23,c23) Var(d23)
#235 Cov(e23,a12) Cov(e23,b12) Cov(e23,c12) Cov(e23,d12) Cov(e23,e12) Cov(e23,a13) Cov(e23,b13) Cov(e23,c13) Cov(e23,d13) Cov(e23,e13) Cov(e23,a21) Cov(e23,b21) Cov(e23,c21) Cov(e23,d21) Cov(e23,e21) Cov(e23,a23) Cov(e23,b23) Cov(e23,c23) Cov(e23,d23) Var(e23)
121 0.
122 0. 0.
123 0. 0. 0.
124 0. 0. 0. 0.
125 0. 0. 0. 0. 0.
131 0. 0. 0. 0. 0. 0.
132 0. 0. 0. 0. 0. 0. 0.
133 0. 0. 0. 0. 0. 0. 0. 0.
134 0. 0. 0. 0. 0. 0. 0. 0. 0.
135 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
211 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
212 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
213 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
214 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
215 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
231 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
232 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
233 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
234 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
235 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 You chose mle=-1, look at file orleigh.txt for a template of covariance matrix 

Type  q for exiting:

On Windows the output is stored in a .log file, and if an error occurred you have to check that file. On Unixes, just type in a terminal: imach foo.imach

Age range for calculation of period (stable) prevalences and health expectancies

agemin=70 agemax=100 bage=50 fage=100 estepm=24 ftolpl=6e-4

Once the parameters have been estimated, the program is able to calculate various results, such as the period prevalences, transition probabilities and life expectancies at any age. Choosing the age range is useful for extrapolations. In this example, the ages of the people interviewed vary from 69 to 102 and the model is estimated using their exact ages. But if you are interested in the age-specific period prevalence, you can start the simulation at an exact age like 70 and stop at 100. The program will then draw at least two curves describing the forecasted prevalences of two cohorts, one of healthy people at age 70 and one of disabled people at the same initial age. According to the mixing property (ergodicity), and because of recovery, both prevalences will tend to be identical at later ages. Thus if you want to compute the prevalence at age 70 itself, you should enter a lower agemin value.

Setting bage=50 (begin age) and fage=100 (final age) lets the program compute life expectancies from age 'bage' to age 'fage'. Since we use a model, we can usefully compute life expectancy over a wider age range than that of the data, but the model can be rather wrong over much larger intervals. The program is limited to about 120 as the upper age!

  • agemin= Minimum age for calculation of the period prevalence
  • agemax= Maximum age for calculation of the period prevalence
  • bage= Minimum age for calculation of the health expectancies
  • fage= Maximum age for calculation of the health expectancies
  • estepm= Unit (in months). We compute the life expectancy from trapezoids spaced every estepm months. This is mainly to measure the difference between two models: for example if stepm=24 months, the pijx are given only every 2 years, and by summing them we calculate an estimate of the life expectancy assuming a linear progression in between, thus over- or underestimating it according to the curvature of the survival function. If, for the same data, we estimate the model with stepm=1 month, we can keep estepm at 24 months to compare the new estimate of life expectancy under the same linear hypothesis. A more precise result, taking a more precise curvature into account, is obtained when estepm is as small as stepm.
  • ftolpl= 6e-4 . The period or stable prevalence at age x in state j is computed as the probability of being in state j at age x for people who were in any live state i at age x-n. The weak ergodicity property says that these probabilities do not depend on the initial state i, but if n is too small the probabilities still differ, with a relative error of ftolpl, which can be adjusted. Graphs of confidence intervals are not smooth when ftolpl is too big. Also, if estepm is very precise (1 month for example), computations with a very small ftolpl can take a long time. You can change this tolerance of the prevalence limit; if you set it to a value that is not small enough, you may get many warnings.

Computing the cross-sectional prevalence

begin-prev-date=1/1/1984 end-prev-date=1/6/1988 mov_average=0

The statements 'begin-prev-date' and 'end-prev-date' allow the user to select the period over which the observed prevalences in each state are computed. In this example, the prevalences are calculated from survey data collected between 1 January 1984 and 1 June 1988.

  • begin-prev-date= Starting date (day/month/year)
  • end-prev-date= Final date (day/month/year)
  • mov_average= smoothing with a five-age moving average centered on the mid-age of the five-year age group. The command mov_average takes the value 1 if the prevalences are smoothed and 0 otherwise.

Population- or status-based health expectancies

pop_based=0

The program computes status-based health expectancies, i.e. health expectancies which depend on the initial health state. If you are healthy, your healthy life expectancy (e11) is higher than if you were disabled (e21, with e11 > e21).
To compute a healthy life expectancy 'independent' of the initial status, we have to weight e11 and e21 according to the probability of being in each state at the initial age, which corresponds to the proportions of people in each health state (the cross-sectional prevalences).

We can also compute e12 and e22 and get e.2 by weighting them according to the observed cross-sectional prevalences at the initial age.

In a similar way we can compute the total life expectancy by summing e.1 and e.2 .
The main difference between 'population based' and 'implied' or 'period' indices lies in the weights used. Usually, cross-sectional prevalences of disability are higher than period prevalences, particularly at old ages. This is true if the country is improving its health system, for example by teaching people how to prevent disability through better screening of people needing cataract surgery. Then the proportion of disabled people at age 90 will eventually be lower than the currently observed proportion.

Thus a better health expectancy, and even a better life expectancy value, is obtained by forecasting not only the currently lower mortality at all ages but also a lower incidence of disability and a higher recovery.
By using the period prevalences as weights instead of the cross-sectional prevalences, we compute indices which are more specific to the current situation, and therefore more useful for predicting improvements or regressions in the future and for comparing different policies in various countries.

  • pop_based= 0 Health expectancies are computed at each age from the period prevalences 'expected' at this initial age.
  • pop_based= 1 Health expectancies are computed at each age from the cross-sectional 'observed' prevalence at the initial age. As the whole population is not observed at the same exact date, we define a short period within which the observed prevalence can be computed as follows:
    we simply count all the people surveyed between these two exact dates who belong to a particular age group (single year of age) at the date of interview and are in a particular health state. It is then easy to get the proportion of people in a particular health state as a percentage of all the people of the same age group.
    If the two dates are far apart and cover two waves or more, people interviewed twice or more are counted twice or more. The program also takes into account the selection of individuals interviewed between firstpass and lastpass (we don't know if this is useful). If pop_based=1, the program first outputs the results from pop_based=0 (period prevalence) and then outputs the health expectancies weighted by the cross-sectional prevalence at each initial age.

Prevalence forecasting, forward prevalences and backward prevalences

prevforecast=1 yearsfproj=10.0 mobil_average=0
prevbackcast=1 yearsbproj=10.0 mobil_average=1

Since version 0.99r19, forward and backward projections of the prevalences use the newer syntax above instead of the older one below (which still works):

prevforecast=1 starting-proj-date=1/1/1989 final-proj-date=1/1/1992 mobil_average=0

The program estimates the prevalence in each state from the average date of interview up to the given number of years (here 10 years). The statement mobil_average allows the computation of smoothed forecast prevalences with a five-age moving average centered on the mid-age of the five-year age group. For backward prevalences, smoothing is mandatory.



Result lines

With older versions (0.98), if the model involved covariates, for example model=1+age+V1+V2, results were output for every combination of covariate values. This could be huge, and sometimes wrong if, for example, the combination V1=1 and V2=1 did not exist in the data.

Thus, with versions 0.99 and later, you can write as many result lines as you need, although currently limited to 10 because of the huge number of graphs to produce. If you need more, run again with mle=0 (to avoid the optimization) and add 10 new result lines. If the model is a basic model (model=1+age.) without any covariate other than age, the result line is not required, but it is as if you had added

result:.

If your model involves covariates (and if it converged), you can output results for any combination of your dummy or quantitative covariates, or time varying covariates.

For example, if you have only one dummy fixed covariate V1, you need two lines:

 result:V1=0
 result:V1=1

in order to output graphs and tables for each value of the covariate V1=1 and V1=0.

You can have a very complex model like V1 dummy fixed, V2 quantitative fixed, V3 and V4 varying dummy, V5 varying quantitative:

ftol=1.e-4 stepm=12 ncovcol=1 nqv=1 ntv=2 nqtv=1 nlstate=3 ndeath=1 maxwav=8 mle= -2 weight=0
model=1+age+V5+V4+V3+V4*V3+V5*age+V2+V1*V2+V1*age+V1.

and ask only for some results:

 # Results model=1+age+V5+V4+V3+V4*V3+V5*age+V2+V1*V2+V1*age+V1.
 result: V4=1 V5=25.1 V3=0 V2=8 V1=1
 result:V4=1 V5=24.1 V3=1  V2=8 V1=0

The program should complain if a value is not entered.

In the last example, the combinations of dummy variables are different (V1=1 and V1=0). If the combinations were exactly the same but differed only in the quantitative variables, the graphs would draw lines for all the values of the quantitative variables of the current run (a kind of bug). You can look at the generated gnuplot text file and search for the name of the graph; you can also edit the data file and verify that each curve corresponds to a numbered combination of the dummy variables (first column) and that the data output for each value of the quantitative variables is included in this file with the same combination number, and thus drawn on the same graph. If you want graphs for different values, please make a separate run for each value.

For a time-varying covariate, the results use the 'fixed' value of the result line to compute the expectancies, exaggerating the influence of the covariate, but this is what is done in demographic analysis.

As already said, the number of result lines is limited to 10. In order to get more result lines, you have to run IMaCh again after changing the result lines. To avoid wasting time on the likelihood optimization, you can set 'mle=0' in your parameter file: IMaCh will skip the maximum likelihood process and directly compute life expectancies and plot graphs according to your new result lines. Of course, with 'mle=0' you must replace the parameter values with the optimized values (as well as the optimized covariance matrix). The so-called 'r' parameter file is the parameter file created, with the optimized values, when you run IMaCh on a parameter file; therefore you should add the new result lines to such an 'r' (or 'rr', etc.) parameter file.



Running Imach with this example (this documentation might be out of date June 2017)

We assume that you have already typed your 1st_example parameter file as explained above. To run the program under Windows you should either:

  • click on the imach.exe icon and either:
    • enter the name of the parameter file which is for example C:\home\myname\lsoa\biaspar.imach
    • or locate the biaspar.imach icon in your folder such as C:\home\myname\lsoa and drag it, with your mouse, on the already open imach window.
  • With versions 0.97b and above, if you ran setup at installation, Windows is supposed to understand the ".imach" extension and you can right click the biaspar.imach icon and either edit with wordpad (better than notepad) the parameter file or execute it with IMaCh.

The time to converge depends on the step unit used (1 month is more precise but more cpu time consuming), on the number of cases, and on the number of variables (covariates).

The program outputs many files; most of them are plotted to make the results easier to understand.

Running IMaCh under Linux is essentially the same, and it is no more difficult on a Macintosh.


Output of the program and graphs (graphs are out of date June 2017)

Once the optimization is finished (once the convergence is reached), many tables and graphics are produced.

The IMaCh program will create a subdirectory with the same name as your parameter file (here biaspar) where all the tables and figures will be stored.
Important files like the log file and the output parameter file (the latter contains the maximum likelihood estimates) are stored at the main level, not in this subdirectory. Files with extension .log and .txt can be edited with a standard editor like wordpad or notepad, or can even be viewed with a browser like Internet Explorer or Mozilla.

The main html file carries the same name, biaspar.htm. You can shift-click on it to open it in another window (Windows).

Our grapher is Gnuplot, an interactive plotting program (GPL) which can also work in batch mode. A gnuplot reference manual is available here.
When the run is finished, the window does not disappear until the user enters one of the following characters:

  • 'e' to open the main result html file biaspar.htm and browse the output files and graphs.
  • 'g' to graph again
  • 'c' to start again the program from the beginning.
  • 'q' for exiting.

Note that if you want to use the program in batch mode, you can use the command 'imach parameter.imach < quit.txt' where quit.txt contains one line with 'q'. Of course this is easier on Unix/Linux than on DOS with .bat files.

The main gnuplot file is named biaspar.gp and can be edited (right click) and run again.

Gnuplot is easy and you can use it to make more complex graphs. Just click on gnuplot and type plot sin(x) to see how easy it is.

Results files

- Cross-sectional prevalence in each state (at first pass): biaspar/prbiaspar.txt

The first line is the title and displays each field of the file. The first column corresponds to age. Fields 2 and 6 are the proportions of individuals in states 1 and 2 respectively, as observed at first exam. The other fields are the numbers of people in states 1, 2 or more. The number of columns increases if the number of states is higher than 2.
The header of the file is

# Age Prev(1) N(1) N Age Prev(2) N(2) N
 70 1.00000 631 631 70 0.00000 0 631
 71 0.99681 625 627 71 0.00319 2 627 
 72 0.97125 1115 1148 72 0.02875 33 1148 

It means that at age 70 (between 70 and 71), the prevalence in state 1 is 1.00000 and in state 2 is 0.00000. At age 71 the number of individuals in state 1 is 625 and in state 2 is 2; hence the total number of people aged 71 is 625+2=627.

- Estimated parameters and covariance matrix: rbiaspar.imach

This file contains all the maximisation results:

 -2 log likelihood= 21660.918613445392
 Estimated parameters: a12 = -12.290174 b12 = 0.092161 
                       a13 = -9.155590  b13 = 0.046627 
                       a21 = -2.629849  b21 = -0.022030 
                       a23 = -7.958519  b23 = 0.042614  
 Covariance matrix: Var(a12) = 1.47453e-001
                    Var(b12) = 2.18676e-005
                    Var(a13) = 2.09715e-001
                    Var(b13) = 3.28937e-005  
                    Var(a21) = 9.19832e-001
                    Var(b21) = 1.29229e-004
                    Var(a23) = 4.48405e-001
                    Var(b23) = 5.85631e-005 

By substitution of these parameters in the regression model, we obtain the elementary transition probabilities:

Pebiaspar11.png
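As a minimal numerical sketch of this substitution, assuming the multinomial logit specification used by IMaCh for one elementary step, ln(p1j/p11) = a1j + b1j*age, the parameters printed above for transitions out of the healthy state give:

```python
import math

# Estimated parameters from rbiaspar.imach (state 1 = healthy):
a12, b12 = -12.290174, 0.092161   # healthy -> disabled
a13, b13 = -9.155590, 0.046627    # healthy -> dead

def p1j_at(age):
    """One-step transition probabilities out of state 1 at a given age,
    under the multinomial logit model ln(p1j/p11) = a1j + b1j*age."""
    e12 = math.exp(a12 + b12 * age)
    e13 = math.exp(a13 + b13 * age)
    p11 = 1.0 / (1.0 + e12 + e13)
    return p11, p11 * e12, p11 * e13

p11, p12, p13 = p1j_at(70)
print(p11, p12, p13)  # the three probabilities sum to 1
```

At age 70 the probability of staying healthy over one elementary step is close to 1; the longer-horizon probabilities pij(x, x+nh) shown below are obtained by chaining such elementary steps.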

- Transition probabilities: biaspar/pijrbiaspar.txt

Here are the transition probabilities pij(x, x+nh). The second column is the starting age x, the third is the final age (x+nh), and the others are the transition probabilities p11, p12, p13, p21, p22, p23. The first column indicates the value of the covariate (without any other variable than age it is equal to 1). For example, line 5 of the file is:

1 100 106 0.02655 0.17622 0.79722 0.01809 0.13678 0.84513

and this means:

p11(100,106)=0.02655
p12(100,106)=0.17622
p13(100,106)=0.79722
p21(100,106)=0.01809
p22(100,106)=0.13678
p23(100,106)=0.84513

- Period prevalence in each state: biaspar/plrbiaspar.txt
#Prevalence
#Age 1-1 2-2
 
#************ 
70 0.90134 0.09866
71 0.89177 0.10823 
72 0.88139 0.11861 
73 0.87015 0.12985

At age 70 the period prevalence is 0.90134 in state 1 and 0.09866 in state 2. This period prevalence differs from the cross-sectional prevalence, and we now explain why. The cross-sectional prevalence at age 70 results from the incidence of disability, the incidence of recovery and the mortality which occurred in the past for the cohort. The period prevalence results from a simulation with current incidences of disability, recovery and mortality estimated from this cross-longitudinal survey. It is a good prediction of the prevalence in the future if "nothing changes in the future". This is exactly what demographers do with a period life table: life expectancy is the expected mean survival time if current mortality rates (age-specific incidences of mortality) "remain constant" in the future.

- Standard deviation of period prevalence: biaspar/vplrbiaspar.txt

The period prevalence has to be compared with the cross-sectional prevalence. But both are statistical estimates and therefore have confidence intervals.
For the cross-sectional prevalence we generally need information on the design of the survey. It is usually not enough to consider the number of people surveyed at a particular age and to estimate a Bernoulli confidence interval based on the prevalence at that age, but you can do it to get an idea of the randomness. At least you can get a visual appreciation of the randomness by looking at the fluctuations over ages.
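As a minimal sketch (ignoring the survey design, which usually widens this interval), such a Bernoulli confidence interval can be computed for age 72 of the prbiaspar.txt example above, where 33 of 1148 people were observed in state 2:

```python
import math

n, n_disabled = 1148, 33            # age 72 in prbiaspar.txt above
p = n_disabled / n                  # cross-sectional prevalence in state 2
se = math.sqrt(p * (1 - p) / n)     # Bernoulli (binomial) standard error
low, high = p - 1.96 * se, p + 1.96 * se
print(f"prevalence = {p:.5f}, 95% CI = [{low:.4f}, {high:.4f}]")
```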

For the period prevalence it is possible to estimate the confidence interval from the Hessian matrix (see the publication for details). We are supposing that the design of the survey only alters the weight of each individual. IMaCh scales the weights of the individual-waves contributing to the likelihood so that the sum of the weights equals the number of contributing individual-waves: a weighted survey doesn't increase or decrease the size of the survey; it only gives more weight to some individuals and thus less to others.
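This rescaling can be sketched as follows (the raw weights are hypothetical; the point is that the rescaled weights sum to the number of contributions, so the effective sample size is unchanged):

```python
raw_weights = [0.5, 1.2, 2.0, 0.8, 1.5]        # hypothetical survey weights
n = len(raw_weights)                            # number of individual-waves
scale = n / sum(raw_weights)
weights = [w * scale for w in raw_weights]      # weights used in the likelihood
print(weights, sum(weights))                    # the rescaled weights sum to n = 5
```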

- Cross-sectional and period prevalence in state (2=disabled) with confidence interval: biaspar/vbiaspar21.png

This graph exhibits the period prevalence in state (2) with the confidence interval in red. The green curve is the observed prevalence (the proportion of individuals in state (2)). Without discussing the results (that is not the purpose here), we observe that the green curve is somewhat below the period prevalence. If the data were not biased by the non-inclusion of people living in institutions, we would conclude that the prevalence of disability will increase in the future (see the main publication if you are interested in real data, whose results are the opposite).

Vbiaspar21.png

-Convergence to the period prevalence of disability: biaspar/pbiaspar11.png

Pbiaspar11.png
This graph plots the conditional transition probabilities from an initial state (1=healthy, in red at the bottom, or 2=disabled, in green at the top) at age x to the final state 2=disabled at age x+h, where "conditional" means conditional on being alive at age x+h, i.e. on p12(x,x+h)+p22(x,x+h). The curves p12(x,x+h)/(p12(x,x+h)+p22(x,x+h)) and p22(x,x+h)/(p12(x,x+h)+p22(x,x+h)) converge, as h increases, to the period prevalence of disability. In order to get the period prevalence at age 70 we should start the process at an earlier age, i.e. 50. If the disability state is defined by severe disability criteria with only a small chance of recovering, then the incidence of recovery is low and the time to convergence is probably longer. But we don't have experience of this yet.
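The convergence can be sketched numerically with a fixed (age-independent) transition matrix; the probabilities below are hypothetical, whereas real IMaCh transition matrices vary with age:

```python
# Hypothetical one-step transition probabilities (1=healthy, 2=disabled, 3=dead)
p11, p12 = 0.92, 0.05   # from healthy (the remaining 0.03 goes to death)
p21, p22 = 0.10, 0.80   # from disabled (the remaining 0.10 goes to death)

v1, v2 = 1.0, 0.0        # start with everyone healthy
for _ in range(50):
    w1 = v1 * p11 + v2 * p21                 # survivors now healthy
    w2 = v1 * p12 + v2 * p22                 # survivors now disabled
    v1, v2 = w1 / (w1 + w2), w2 / (w1 + w2)  # condition on being alive
print(v1, v2)   # the stable conditional distribution is the period prevalence
```

Starting everyone disabled instead (v1, v2 = 0.0, 1.0) converges to the same limit, which is why the two curves in the graph meet.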

- Life expectancies by age and initial health status with standard deviation: biaspar/erbiaspar.txt
# Health expectancies 
 # Age 1-1 (SE) 1-2 (SE) 2-1 (SE) 2-2 (SE)
  70   11.0180 (0.1277)    3.1950 (0.3635)    4.6500 (0.0871)    4.4807 (0.2187)
  71   10.4786 (0.1184)    3.2093 (0.3212)    4.3384 (0.0875)    4.4820 (0.2076)
  72    9.9551 (0.1103)    3.2236 (0.2827)    4.0426 (0.0885)    4.4827 (0.1966)
  73    9.4476 (0.1035)    3.2379 (0.2478)    3.7621 (0.0899)    4.4825 (0.1858)
  74    8.9564 (0.0980)    3.2522 (0.2165)    3.4966 (0.0920)    4.4815 (0.1754)
  75    8.4815 (0.0937)    3.2665 (0.1887)    3.2457 (0.0946)    4.4798 (0.1656)
  76    8.0230 (0.0905)    3.2806 (0.1645)    3.0090 (0.0979)    4.4772 (0.1565)
  77    7.5810 (0.0884)    3.2946 (0.1438)    2.7860 (0.1017)    4.4738 (0.1484)
  78    7.1554 (0.0871)    3.3084 (0.1264)    2.5763 (0.1062)    4.4696 (0.1416)
  79    6.7464 (0.0867)    3.3220 (0.1124)    2.3794 (0.1112)    4.4646 (0.1364)
  80    6.3538 (0.0868)    3.3354 (0.1014)    2.1949 (0.1168)    4.4587 (0.1331)
  81    5.9775 (0.0873)    3.3484 (0.0933)    2.0222 (0.1230)    4.4520 (0.1320)
 

For example

70  11.0180 (0.1277) 3.1950 (0.3635) 4.6500 (0.0871)  4.4807 (0.2187)

means

e11=11.0180 e12=3.1950 e21=4.6500 e22=4.4807
Expbiaspar21.pngExpbiaspar11.png

For example, the life expectancy of a healthy individual at age 70 is 11.0 years in the healthy state and 3.2 years in the disability state (a total of 14.2 years). If he was disabled at age 70, his life expectancy is shorter: 4.65 years in the healthy state and 4.5 years in the disability state (a total of 9.15 years). The total life expectancy is a weighted mean of 14.2 and 9.15; the weight is the proportion of people disabled at age 70. In order to get a period index (i.e. based only on incidences) we use as weights the period (stable) prevalence in each state at age 70 (i.e. computed from incidences at earlier ages) instead of the cross-sectional prevalence (observed, for example, at first interview); see below.

- Variances of life expectancies by age and initial health status: biaspar/vrbiaspar.txt

For example, the covariances of life expectancies Cov(ei,ej) at age 50 are (line 3)

   Cov(e1,e1)=0.4776  Cov(e1,e2)=0.0488=Cov(e2,e1)  Cov(e2,e2)=0.0424
- Variances of one-step probabilities: biaspar/probrbiaspar.txt

For example, at age 65

   p11=9.960e-001 standard deviation of p11=2.359e-004
- Health expectancies with standard errors in parentheses: biaspar/trbiaspar.txt
#Total LEs with variances: e.. (std) e.1 (std) e.2 (std) 
70 13.26 (0.22) 9.95 (0.20) 3.30 (0.14)

Thus, at age 70 the total life expectancy e..=13.26 years is the weighted mean of e1.=13.46 and e2.=11.35, weighted by the period prevalences at age 70, which are 0.90134 in state 1 and 0.09866 in state 2 respectively (they sum to one). e.1=9.95 is the disability-free life expectancy at age 70 (again a weighted mean of e11 and e21), and e.2=3.30 is the life expectancy at age 70 to be spent in the disability state.
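This decomposition can be checked with the numbers above (the small residual difference is due to rounding in the printed values):

```python
w1, w2 = 0.90134, 0.09866      # period prevalences at age 70 (sum to 1)
e1_, e2_ = 13.46, 11.35        # total LE starting healthy / starting disabled
e_total = w1 * e1_ + w2 * e2_  # weighted mean = e..
print(round(e_total, 2))       # close to e.. = 13.26
```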

- Total life expectancy by age and health expectancies in states (1=healthy) and (2=disabled): biaspar/ebiaspar1.png

This figure represents the health expectancies and the total life expectancy with a confidence interval (dashed line).

        Ebiaspar1.png

Standard deviations (obtained from the information matrix of the model) of these quantities are very useful. Cross-longitudinal surveys are costly and do not involve huge samples, generally a few thousand people surveyed; therefore it is very important to have an idea of the standard deviation of our estimates. It has been a big challenge to compute the Health Expectancy standard deviations. Don't be confused: life expectancy is, like any expected value, the mean of a distribution; but here we are not computing the standard deviation of the distribution, we are computing the standard deviation of the estimate of the mean.

Our health expectancy estimates vary according to the sample size (the standard deviations give confidence intervals of the estimates) but also according to the model fitted. NEW: In versions 0.99s5 (May 2024) and later, the standard error of the proportion of total life expectancy spent in a specific health state j, std(e.j/e..), is computed (often asked for by referees) and printed in the right part of the "T_" tables, as well as included in the same graphs using a right-hand scale of 0 to 100%.

Let us explain the strategy for choosing a model in more detail.

Choosing a model means at least two kinds of choices: first a decision concerning the number of disability states (should we group mild and severe?) and, second, the model itself i.e. variables, covariates, confounding factors etc. to be included.

The more disability states we have, the better our demographic approximation of the disability process is, but the smaller the number of transitions between each state and the higher the noise in the measurement.

We have not yet experimented with enough models to summarize the advantages and disadvantages of each solution, but it is important to note that even if we had huge unbiased samples, the total life expectancy computed from a cross-longitudinal survey would vary with the number of states.

If we define only two states, alive and dead, we find the usual "life expectancy" which assumes that at each age people are at the same risk of dying.

If we now differentiate the alive state into healthy and disabled, then, as mortality from the disabled state is higher than mortality from the healthy state, we are introducing heterogeneity in the risk of dying.

The total mortality at each age is the mean of the mortality from each state, weighted by the prevalence of each state. Therefore, if the proportion of people at each age and in each state differs from the period equilibrium, there is no reason to find the same total mortality at a particular age.
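A small numerical sketch (with hypothetical death probabilities) shows how the same state-specific mortality yields different total mortality under different prevalence mixes:

```python
# Hypothetical one-step death probabilities at a given age
q_healthy, q_disabled = 0.02, 0.10

def total_mortality(w_disabled):
    """Mean of state-specific mortality, weighted by the prevalence of disability."""
    return (1 - w_disabled) * q_healthy + w_disabled * q_disabled

# Period-equilibrium mix vs a hypothetical cohort mix:
print(total_mortality(0.09866))   # about 0.0279
print(total_mortality(0.25))      # 0.0400
```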

Life expectancy, even if it is a very useful tool, carries a very strong implicit hypothesis of homogeneity of the population with respect to the risk of dying. Our main purpose is not to measure differential mortality but to measure the expected time spent in a healthy or disabled state, in order to maximize the former and minimize the latter. But the differential in mortality complicates the measurement.

Incidences of disability or recovery are not affected by the number of states if these states are independent. But incidence estimates do depend on the specification of the model. The more covariates we add to the logit model, the better the model, but some covariates are not well measured and some are confounding factors, as in any statistical model. The procedure to "fit the best model" is similar to logistic regression, which itself is similar to regression analysis. We haven't gone that far yet because we also have (had) a severe limitation: the speed of convergence. On a Pentium III at 500 MHz, even the simplest model, estimated by month on 8,000 people, took 4 hours to converge.

Also, the IMaCh program is not a statistical package and does not allow sophisticated design variables. If you need sophisticated design variables you have to construct them yourself and add them as ordinary variables. IMaCh allows up to 8 variables (20 in version 0.98j on Mac/OSX).

The current version of this program only allows adding simple terms like age+sex or age+sex+age*sex, and will never be fully general.

What is to be remembered is that the incidences, or probabilities of change from one state to another, are affected by the variables specified in the model.

Also, the age range of the people interviewed is linked to the age range of the life expectancy which can be estimated by extrapolation. If your sample ranges from age 70 to 95, you can clearly estimate a life expectancy at age 70 and trust your confidence interval, because it is mostly based on your sample size; but if you want to estimate the life expectancy at age 50, you have to rely on the design of your model. Fitting a logistic model on an age range of 70 to 95 and estimating probabilities of transition outside this age range, say at age 50, is very dangerous. At least you should remember that the confidence intervals given by the standard deviations of the health expectancies hold under the strong assumption that your model is the "true model", which is probably not the case outside the age range of your sample.

- Copy of the parameter file: orbiaspar.txt

This copy of the parameter file can be useful to re-run the program while keeping the old output files.

Trying on an example

Since you know how to run the program, it is time to test it on your own computer. Try it, for example, on a parameter file named imachpar.imach, which is a copy of mypar.imach included in mytry, a subdirectory of imach. Edit it and change the name of the data file to mydata.txt if you don't want to copy it into the same directory. The file mydata.txt is a smaller file of 3,000 people, but still with 4 waves.

Right click the .imach file and a window will popup with the string 'Enter the parameter file name:'

IMACH, Version 0.98i
Enter the parameter file name: imachpar.imach

Most of the data files or image files generated will use the 'imachpar' string in their name. The running time is about 2-3 minutes on a Pentium III. If the execution worked correctly, the output files are created in the current directory and should be the same as the mypar files initially included in the directory mytry.

  • Output on the screen. The output looks like biaspar.log:
#
title=MLE datafile=mydata.txt lastobs=3000 firstpass=1 lastpass=3
ftol=1.000000e-008 stepm=24 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0
Total number of individuals= 2965, Agemin = 70.00, Agemax= 100.92

Warning, no any valid information for:126 line=126
Warning, no any valid information for:2307 line=2307
Delay (in months) between two waves Min=21 Max=51 Mean=24.495826
These lines give some warnings on the data file and also some raw statistics on frequencies of transitions.
Age 70 1.=230 loss[1]=3.5% 2.=16 loss[2]=12.5% 1.=222 prev[1]=94.1% 2.=14
 prev[2]=5.9% 1-1=8 11=200 12=7 13=15 2-1=2 21=6 22=7 23=1
Age 102 1.=0 loss[1]=NaNQ% 2.=0 loss[2]=NaNQ% 1.=0 prev[1]=NaNQ% 2.=0

It includes some warnings or errors which are very important for you. Be careful with such warnings, because your results may be biased if, for example, some people agreed to be interviewed at the first pass but never afterwards, or if you don't have the exact month of death. In such cases IMaCh doesn't take any initiative; it only warns you. It is up to you to decide what to do with these people. Excluding them is usually a wrong decision; it is better to decide, for example, that the month of death is at the mid-point of the interval between the last two waves.

If your survey suffers from severe attrition, you have to analyse the characteristics of the lost people and, for example, overweight people with the same characteristics.

By default, IMaCH warns and excludes these problematic people, but you have to be careful with such results.

  • Maximisation with the Powell algorithm. 8 directions are given, corresponding to the 8 parameters. This can take rather long to converge.
Powell iter=1 -2*LL=11531.405658264877 1 0.000000000000 2 0.000000000000 3
0.000000000000 4 0.000000000000 5 0.000000000000 6 0.000000000000 7 
0.000000000000 8 0.000000000000
1..........2.................3..........4.................5.........
6................7........8...............
Powell iter=23 -2*LL=6744.954108371555 1 -12.967632334283 
2 0.135136681033 3 -7.402109728262 4 0.067844593326 
5 -0.673601538129 6 -0.006615504377 7 -5.051341616718 8 0.051272038506
1..............2...........3..............4...........
5..........6................7...........8.........
#Number of iterations = 23, -2 Log likelihood = 6744.954042573691
# Parameters
12 -12.966061 0.135117 
13 -7.401109 0.067831 
21 -0.672648 -0.006627 
23 -5.051297 0.051271 
Calculation of the hessian matrix. Wait...
 12345678.12.13.14.15.16.17.18.23.24.25.26.27.28.34.35.36.37.38.45.
46.47.48.56.57.58.67.68.78
 
Inverting the hessian to get the covariance matrix. Wait...
 
#Hessian matrix#
 3.344e+002 2.708e+004 -4.586e+001 -3.806e+003 -1.577e+000 -1.313e+002 3.914e-001 3.166e+001 
 2.708e+004 2.204e+006 -3.805e+003 -3.174e+005 -1.303e+002 -1.091e+004 2.967e+001 2.399e+003 
 -4.586e+001 -3.805e+003 4.044e+002 3.197e+004 2.431e-002 1.995e+000 1.783e-001 1.486e+001 
 -3.806e+003 -3.174e+005 3.197e+004 2.541e+006 2.436e+000 2.051e+002 1.483e+001 1.244e+003 
 -1.577e+000 -1.303e+002 2.431e-002 2.436e+000 1.093e+002 8.979e+003 -3.402e+001 -2.843e+003 
 -1.313e+002 -1.091e+004 1.995e+000 2.051e+002 8.979e+003 7.420e+005 -2.842e+003 -2.388e+005 
 3.914e-001 2.967e+001 1.783e-001 1.483e+001 -3.402e+001 -2.842e+003 1.494e+002 1.251e+004 
 3.166e+001 2.399e+003 1.486e+001 1.244e+003 -2.843e+003 -2.388e+005 1.251e+004 1.053e+006 
# Scales
 12 1.00000e-004 1.00000e-006
 13 1.00000e-004 1.00000e-006
 21 1.00000e-003 1.00000e-005
 23 1.00000e-004 1.00000e-005
# Covariance
   1 5.90661e-001
   2 -7.26732e-003 8.98810e-005
   3 8.80177e-002 -1.12706e-003 5.15824e-001
   4 -1.13082e-003 1.45267e-005 -6.50070e-003 8.23270e-005
   5 9.31265e-003 -1.16106e-004 6.00210e-004 -8.04151e-006 1.75753e+000
   6 -1.15664e-004 1.44850e-006 -7.79995e-006 1.04770e-007 -2.12929e-002 2.59422e-004
   7 1.35103e-003 -1.75392e-005 -6.38237e-004 7.85424e-006 4.02601e-001 -4.86776e-003 1.32682e+000
   8 -1.82421e-005 2.35811e-007 7.75503e-006 -9.58687e-008 -4.86589e-003 5.91641e-005 -1.57767e-002 1.88622e-004
#agemin agemax for lifexpectancy, bage fage (if mle==0 ie no data nor Max likelihood).
 agemin=70 agemax=100 bage=50 fage=100
 Computing prevalence limit: result on file 'plrmypar.txt' 
 Computing pij: result on file 'pijrmypar.txt' 
 Computing Health Expectancies: result on file 'ermypar.txt' 
 Computing Variance-covariance of DFLEs: file 'vrmypar.txt' 
 Computing Total LEs with variances: file 'trmypar.txt' 
 Computing Variance-covariance of Prevalence limit: file 'vplrmypar.txt' 
 End of Imach

Once the running is finished, the program requires a character:

Type e to edit output files, g to graph again, c to start again, and q for exiting:

You can avoid this prompt (for example in a batch job with successive runs) by adding a 'q' in a comment line just after the model line:

model=1+age
#q comment line. 

In order to give you an idea of the time needed to reach convergence, IMaCh prints an estimate of the duration if convergence requires 10, 20 or 30 iterations. It might be useful.

First you should enter e to edit the master file mypar.htm.

  • Outputs files

- Copy of the parameter file: ormypar.txt
- Gnuplot file name: mypar.gp.txt
- Cross-sectional prevalence in each state: prmypar.txt
- Period prevalence in each state: plrmypar.txt
- Transition probabilities: pijrmypar.txt
- Life expectancies by age and initial health status (estepm=24 months): ermypar.txt
- Parameter file with estimated parameters and the covariance matrix: rmypar.txt
- Variance of one-step probabilities: probrmypar.txt
- Variances of life expectancies by age and initial health status (estepm=24 months): vrmypar.txt
- Health expectancies with their variances: trmypar.txt
- Standard deviation of period prevalences: vplrmypar.txt

  • Graphs

-One-step transition probabilities
-Convergence to the period prevalence
-Cross-sectional and period prevalence in state (1) with the confidence interval
-Cross-sectional and period prevalence in state (2) with the confidence interval
-Health life expectancies by age and initial health state (1)
-Health life expectancies by age and initial health state (2)
-Total life expectancy by age and health expectancies in states (1) and (2).

Grants

This software has been partly funded by Euro-REVES, a concerted action of the European Union.

  • In 2003-2004 it was funded by the French Institute on Longevity.
  • In January 2014 it was funded by the Japan Society for the Promotion of Science (Grant-in-Aid for Scientific Research 25293121).
  • In 2015 by Intel Software, which provided free licences for C/C++ on the three OSes.
  • In 2016 by Intel Software, which provided free licences for C/C++ on the three OSes for three additional years, until 2019.
  • In 2020-2022 by the Economic Research Institute for ASEAN and East Asia (ERIA), Jakarta, Indonesia.

Download latest version

The latest version (0.99s5 of May 2025) for Windows, OS/X and Linux can be accessed at http://euroreves.ined.fr/imach/Download

The 'r' series up to 0.99r46 work well with up to 30 parameters, but the algorithm used (from Powell) doesn't converge (or takes a lot of time) with a few hundred parameters; this is the reason why we changed the algorithm to Praxis, from Brent's 1973 thesis, giving birth to the 's' series of IMaCh. It doesn't change anything for the user. Older versions can be found in the Download-old directory.

Version 0.99s5 includes the calculation and graphs of the standard error of the proportion of total life expectancy spent in a specific health state.

Specific workshops on IMaCh

Current workshop in 2024:

Pre-REVES Workshop - Bogota, Colombia - 20-21 May 2024, devoted to R.

Previous workshop in 2023:

Pre-REVES Workshop - Padova, Italy - 22-23 May 2023 PRE-REVES2023.

Previous workshops:

Pre-REVES Workshop - Halifax, Canada - 19-20 September 2022 PRE-REVES2022.

Pre-REVES workshop on IMaCh software: Live Event 25th of May 2021 3 to 5pm (Paris/Brussels UTC+2) before the 2021 REVES Meeting – Brussels, Belgium – 26-28th May 2021 PRE-REVES2021;

Pre-REVES workshop on software to compute health expectancy - Barcelona - 28th of May 2019 PREREVES2019.