Main Page



Computing Health Expectancies using IMaCh


A Maximum Likelihood Computer Program using Interpolation of Markov Chains



Introduction

IMaCh stands for Interpolated Markov Chains. The idea of Markov chain interpolation was pioneered by Laditka and Wolf (1). IMaCh itself is a computer program which maximizes the likelihood of a sample of individuals who were surveyed at two or more dates and gave information on their condition each time. IMaCh is mostly used to estimate health expectancies from cross-longitudinal surveys. At each wave of the survey, the health status is recorded (with or without disability, for example). If the sample is large enough and the information on disability is detailed enough, a third status can be added; the model of the ageing process then consists, in this example, of transitions between "No disability", "Mild disability" and "Severe disability".

Transitions are supposed to occur at any time, and death is always an additional competing risk. But the health condition is measured only at the dates of interview. As the time between interviews varies among individuals (some people may also miss an interview but give information at another wave), modeling the probability of being observed in each state after a "mean" delay between two interviews produces wrong results: a simple logistic regression modeling the transition between two states is not enough.

Therefore the Interpolated Markov Chain model fits a logistic regression to the transitions between two states within a short period of one month. In fact we are modeling the Markov "process" through the attached Markov chain, by observing the state (and not the transition) at exact, regularly spaced dates.

For an individual, the probability of observing two given states at waves spaced, say, 17 months apart is obtained by multiplying 17 monthly Markov matrices and extracting the corresponding element from the matrix product. The total likelihood of the sample, which is to be maximized, is the product of these elements, each individual contributing as many times as he or she was interviewed (or observed dead).
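The matrix product described above can be sketched in a few lines of Python. This is only an illustration, not IMaCh's own code: the three states, the multinomial logit form of each row, the parameter values in PARAMS and all function names are assumptions made for the example.

```python
import math

# Hypothetical parameters (intercept, age slope) for each monthly transition
# i -> j between state 0 (no disability), 1 (disability) and 2 (death).
PARAMS = {
    (0, 1): (-6.0, 0.04),
    (0, 2): (-9.0, 0.07),
    (1, 0): (-3.0, -0.02),
    (1, 2): (-7.0, 0.06),
}
N_STATES = 3
DEAD = 2


def monthly_matrix(age_years, params=PARAMS):
    """One-month transition matrix at a given age (assumed multinomial logit rows)."""
    matrix = [[0.0] * N_STATES for _ in range(N_STATES)]
    for i in range(N_STATES):
        if i == DEAD:
            matrix[i][i] = 1.0  # death is absorbing
            continue
        odds = {j: math.exp(a + b * age_years)
                for (origin, j), (a, b) in params.items() if origin == i}
        denom = 1.0 + sum(odds.values())
        matrix[i][i] = 1.0 / denom  # staying in the same state is the reference outcome
        for j, value in odds.items():
            matrix[i][j] = value / denom
    return matrix


def matmul(a, b):
    """Plain product of two square matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(N_STATES))
             for j in range(N_STATES)] for i in range(N_STATES)]


def transition_over(age_start, months, params=PARAMS):
    """Product of `months` monthly matrices, age advancing by one month each step."""
    product = [[float(i == j) for j in range(N_STATES)] for i in range(N_STATES)]
    for m in range(months):
        product = matmul(product, monthly_matrix(age_start + m / 12.0, params))
    return product


def log_likelihood(observations, params=PARAMS):
    """Sum of log contributions; each observation is (age, months, from_state, to_state)."""
    return sum(math.log(transition_over(age, months, params)[s1][s2])
               for age, months, s1, s2 in observations)


# Two waves 17 months apart, first interview at age 70.
p17 = transition_over(70.0, 17)
```

Here p17[0][1] is the probability of being observed without disability at the first wave and with disability 17 months later; log_likelihood sums the logarithms of such elements over all observed pairs of interviews, which is the quantity a maximizer would work on.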

Age is a major covariate: it is always included in the model and varies by month.

Because of the high number of matrix products and the time needed to reach convergence, IMaCh is not run directly with a monthly model (stepm=1 month). It is first run with a broader interval, close to the mean interval between waves (60 months or 24 months), and stepm is then decreased to 12, 4, 3 and 1 month as long as convergence is reached. With a small interval of one month, the logistic model is similar to the log model and the theory is similar to integrating the differential equations of the real infinitesimal process, but with the advantage that the estimators have maximum likelihood properties.
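The closeness of the logistic and log models for a one-month step can be seen from a short expansion. This is only a sketch; p denotes a monthly transition probability, a symbol introduced here for illustration.

```latex
\operatorname{logit}(p) = \log\frac{p}{1-p} = \log p - \log(1-p) = \log p + p + O(p^{2}), \qquad p \to 0 .
```

When the step is short, p is small and the correction term p becomes negligible compared with log p, so both models give nearly the same fit.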

For large stepm, if the delay between two interviews is not a multiple of stepm, a pseudo-likelihood is computed by interpolating or extrapolating the likelihood obtained at two exact multiples of stepm.
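As an illustration of the idea, the sketch below interpolates linearly between the probabilities obtained at the two multiples of stepm that surround the observed delay. The linear rule, the function name and the toy probability function are assumptions made for the example; the exact rule used inside IMaCh is not described on this page and may differ.

```python
def interpolated_probability(prob_at, delay_months, stepm):
    """Linear interpolation between two exact multiples of stepm (assumed rule).

    prob_at(k) must return the probability of the observed pair of states
    after k steps of length stepm (a hypothetical callback).
    """
    lower = delay_months // stepm                    # whole steps below the delay
    frac = (delay_months - lower * stepm) / stepm    # position between the two multiples
    if frac == 0:
        return prob_at(lower)                        # delay is an exact multiple
    return (1.0 - frac) * prob_at(lower) + frac * prob_at(lower + 1)


# Toy example: probability of remaining in the same state, 0.9 per 12-month step.
def stay(k):
    return 0.9 ** k


p18 = interpolated_probability(stay, 18, 12)   # halfway between 1 and 2 steps
p24 = interpolated_probability(stay, 24, 12)   # exact multiple, no interpolation
```

With stepm=1 every delay expressed in months is an exact multiple, which is one reason why the final run is made with a one-month step.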

IMaCh can accept many states and many covariates, but with many of them computers are not fast enough to give results within a day. Also, the more states and covariates are given, the more difficult the interpretation of the results.

IMaCh is "research" software, still under development, and does not provide the facilities of other software packages. For example, the design variables of multinomial covariates have to be split and provided as binary covariates (0 - 1) only.
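Splitting a multinomial covariate into binary ones has to be done before the data file is written. A minimal sketch, assuming a covariate stored as a list of labels; the function name, the labels and the choice of a reference level are made up for the example and are not part of IMaCh:

```python
def split_into_binary(values, reference):
    """Return one 0/1 indicator column per level, except the reference level."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical three-level education covariate for four individuals.
education = ["low", "mid", "high", "mid"]
dummies = split_into_binary(education, reference="low")
```

Each resulting column can then be written to the data file as a separate 0 - 1 covariate; individuals at the reference level have 0 in every column.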

IMaCh only treats two-way transitions (the difficult case), in order to provide "period prevalences" that can be compared with "cross-sectional prevalences". An attempt is currently (May 2010) being made to estimate one-way transitions when the likelihood plateaus (meaning that there are not enough cases to estimate the way back).

IMaCh is well suited to merging different waves: all the information available on each individual contributes to the computation of the likelihood.

Its main outputs are age-specific "period prevalences" and health expectancies. The input data file is very simple.

Details are given on the Documentation page (wiki) and, for installation, on the Installation page.

Downloads are available at http://euroreves.ined.fr/imach/Download. Older versions are kept at http://euroreves.ined.fr/imach/Download-old.

But much information is still missing for the user. Our hope is that this IMaChWiki will help new users and that experienced users will improve it.

The most important publication on IMaCh is probably the article published by the authors in Mathematical Population Studies (2). An increasing number of people have used IMaCh; please add your own work to the list of:

Publications using IMaCh


Workshops on IMaCh

IMaCH_and_Statistical_Packages

One important point still missing from IMaCh is how to create suitable data files from standard statistical packages; please have a look at the IMaCH_and_Statistical_Packages page and improve it with your own tips.


(1) Laditka S. B. and Wolf D. (1998) New Methods for Analyzing Active Life Expectancy. Journal of Aging and Health, 10(2).

(2) Lièvre A., Brouard N. and Heathcote Ch. (2003) Estimating Health Expectancies from Cross-longitudinal surveys. Mathematical Population Studies, 10(4), pp. 211-248. DOI 10.1080/713644739.

Performance

This page deals with performance on various platforms and compilers.

Sources

You can browse the sources on the CVS web of Ined at http://sauvy.ined.fr/cgi-bin/cvsweb.cgi. The ChangeLog is at http://sauvy.ined.fr/cgi-bin/cvsweb.cgi/imach/src/ChangeLog.

Bugs

  • With version 0.99
    • Version 0.99r45: Many bugs which produced wrong drawings (but never affected the optimization) have been fixed. Most of them were discovered by Feinuo Sun from Halifax. The main remaining problem arises when there are many covariates, as in a complex model (a few hundred parameters!) such as
model=1+age+V7*V4*age+V6*V4*age+V7*V3*age+V6*V3*age+V6*V2*age+V7*age+V6*age+V4*age+V3*age+V2*age+V7*V4+V6*V4+V7*V3+V6*V3+V7*V2+V6*V2+V7+V6+V4+V3+V2

In that case the maximization either does not work or takes a lot of time, and you have to restart it from the last parameter values (the directions are then reset to unity and the entire space is explored again). This has to be fixed.

    • Version 0.99r19 was fine except for errors in the resultline loop, which were not fixed in versions 0.99r20, r21 and r22. They were fixed only in version 0.99-23 (thanks to Holly).
  • Until 0.98q4,
    • The value reported as the variance of the one-step probabilities was in fact its square root, as written in the file probrXXX.txt but not in the main html file (thank you to Yao-Chi Shih).
    • In the log file, parameters were not labelled correctly: 11, 12, 21, 22 instead of 12, 13, 21, 23, etc. (thanks to Lucy Leigh).
    • For the parameter estimates, the 95% confidence intervals and T (p/sqrt(var)) are now calculated (thank you to Zachary Zimmer). They are currently output on screen as well as in the log file.
  • 0.98p0: April 2015: an important bug was found (see the 0.98nX question below) in the Numerical Recipes in C library. The routine mnbrak is faulty. Version 0.98q0 is correct and gives accurate values.
With version 0.98q0, which fixes the wrong MNBRAK algorithm, the results are more accurate in terms of likelihood: -2*LL decreases from the historical 46542.387547682214 to 46537.279274631677. The difference of about 5 (42-37=5) is not tiny when compared with the chi^2 critical value of about 15.5 (qchisq(.95,df=8)) in the case of 8 parameters.
If I have no time to understand how the initial process worked with the bug, I will make a dirty version which starts with the bug (0.98p0), in order to have some good first iterations and better initial values, and then moves on without the bug (0.98q0), in order to have accurate final results.
This is what you are currently asked to do by hand with the two versions. But if you want to keep results exactly as IMaCh has been producing them for a while (10 years?), use 0.98p0.
The supposed advantage of version 0.98q0 is that, when it is started from the values output by 0.98p0, you should get convergence (and accurate results) when decreasing stepm from 3 or 2 years down to stepm=1 month, which should give the most scientifically sound results.
There was no reason to get divergence with stepm=1 when convergence had been obtained with stepm=24 (or whatever). It seems that the divergence came from the original MNBRAK routine, which did not find a correct bracket when the function was close to its minimum (the maximum in the case of MLE). MNBRAK works along one direction of the multidimensional space and is supposed to find a and b such that the minimum of the function lies in between, but close to the MLE (after 30 iterations, for example) the routine went wrong.
  • 0.98nX: Strange differences are obtained between Visual Studio 2013 64 bit and Intel Parallel Studio 2015 64 bit (ilc64), the latter being much more accurate. Investigations have to be done.
    • OS/X Installation of the documentation on html instead of doc;
    • Wrong time;
    • Wrong link on main tables, please look at "Other downloads".
    • Gnuplot not launched. dyld: Library not loaded: /usr/local/lib/libreadline.6.2.dylib
  • 0.98m: It fixes some bugs and enables the use of missing values, but the various graphs are not fully output because of some confusion in the numbering of parameters and covariates. You need to make these computations on your own.
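The comparison made in the 0.98p0 entry above, between the gain in -2*LL and the chi-square critical value for 8 parameters, can be checked with a short script. This is a sketch which uses only the standard library; the function names are made up for the example, and the closed-form tail probability used here is valid for even degrees of freedom only.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only valid for even, positive df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i      # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def chi2_critical_even_df(alpha, df):
    """Value c such that P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# -2*LL values quoted in the 0.98p0 entry.
with_bug = 46542.387547682214
without_bug = 46537.279274631677
improvement = with_bug - without_bug

critical = chi2_critical_even_df(0.05, 8)   # same quantity as qchisq(.95, df=8) in R
print(improvement, critical)
```

The script prints the gain in -2*LL next to the critical value, so the order of magnitude of the improvement can be judged at a glance.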

General considerations about this Wiki

Please feel free to add new sections.

You can learn from Wikipedia and its introduction how to enter text, figures, tables, maths or new sections.

Please log in with your full name, like John Smith (with a blank). But, as on Wikipedia, you can use a pseudonym or even not log in at all (your IP address is then recorded).