Thu Dec 28 18:49:56 2000 UTC (23 years, 10 months ago) by brouard
Branches: lievre, MAIN
CVS tags: start, Version-0-8a-jackson-revised, Version-0-8a, HEAD
Import of imach064

<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Computing Health Expectancies using IMaCh</title>
</head>

<body bgcolor="#FFFFFF">

<hr size="3" color="#EC5E5E">

<h1 align="center"><font color="#00006A">Computing Health
Expectancies using IMaCh</font></h1>

<h1 align="center"><font color="#00006A" size="5">(a Maximum
Likelihood Computer Program using Interpolation of Markov Chains)</font></h1>

<p align="center">&nbsp;</p>

<p align="center"><a href="http://www.ined.fr/"><img
src="logo-ined.gif" border="0" width="151" height="76"></a><img
src="euroreves2.gif" width="151" height="75"></p>

<h3 align="center"><a href="http://www.ined.fr/"><font
color="#00006A">INED</font></a><font color="#00006A"> and </font><a
href="http://euroreves.ined.fr"><font color="#00006A">EUROREVES</font></a></h3>

<p align="center"><font color="#00006A" size="4"><strong>March
2000</strong></font></p>

<hr size="3" color="#EC5E5E">

<p align="center"><font color="#00006A"><strong>Authors of the
program: </strong></font><a href="http://sauvy.ined.fr/brouard"><font
color="#00006A"><strong>Nicolas Brouard</strong></font></a><font
color="#00006A"><strong>, senior researcher at the </strong></font><a
href="http://www.ined.fr"><font color="#00006A"><strong>Institut
National d'Etudes Démographiques</strong></font></a><font
color="#00006A"><strong> (INED, Paris) in the &quot;Mortality,
Health and Epidemiology&quot; Research Unit </strong></font></p>

<p align="center"><font color="#00006A"><strong>and Agnès
Lièvre<br clear="left">
</strong></font></p>

<h4><font color="#00006A">Contribution to the mathematics: C. R.
Heathcote </font><font color="#00006A" size="2">(Australian
National University, Canberra).</font></h4>

<h4><font color="#00006A">Contact: Agnès Lièvre (</font><a
href="mailto:lievre@ined.fr"><font color="#00006A"><i>lievre@ined.fr</i></font></a><font
color="#00006A">) </font></h4>

<hr>

<ul>
    <li><a href="#intro">Introduction</a> </li>
    <li>The detailed statistical model (<a href="docmath.pdf">PDF
        version</a>),(<a href="docmath.ps">ps version</a>) </li>
    <li><a href="#data">On what kind of data can it be used?</a></li>
    <li><a href="#datafile">The data file</a> </li>
    <li><a href="#biaspar">The parameter file</a> </li>
    <li><a href="#running">Running Imach</a> </li>
    <li><a href="#output">Output files and graphs</a> </li>
    <li><a href="#example">Example</a> </li>
</ul>

<hr>

<h2><a name="intro"><font color="#00006A">Introduction</font></a></h2>

<p>This program computes <b>Healthy Life Expectancies</b> from <b>cross-longitudinal
data</b>. Within the family of Health Expectancies (HE),
Disability-free life expectancy (DFLE) is probably the most
important index to monitor. In low mortality countries, there is
a fear that when mortality declines, the increase in DFLE is not
proportionate to the increase in total Life expectancy. This case
is called the <em>Expansion of morbidity</em>. Most of the data
collected today, in particular by the international <a
href="http://euroreves/reves">REVES</a> network on Health
expectancy, and most HE indices based on these data, are <em>cross-sectional</em>.
This means that the information collected comes from a single
cross-sectional survey: people of various ages (but mostly old
people) are surveyed on their health status at a single date.
The proportion of people disabled at each age can then be measured
at that date. This age-specific prevalence curve is then used to
distinguish, within the stationary population (which, by
definition, is the life table estimated from the vital statistics
on mortality at the same date), the disabled population from the
disability-free population. Life expectancy (LE) (or total
population divided by the yearly number of births or deaths of
this stationary population) is then decomposed into DFLE and DLE
(life expectancy with disability).
This method of computing HE is usually called the Sullivan method
(from the name of the author who first described it).</p>
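
<p>As an illustration of the Sullivan decomposition described above,
here is a minimal sketch. It is not part of IMaCh; the function name
and the small abridged life table are hypothetical values chosen only
to show the arithmetic: person-years lived in each age group are split
by the prevalence of disability observed in that group.</p>

```python
def sullivan(survivors_at_start, person_years, prevalence):
    """Split life expectancy into disability-free (DFLE) and disabled (DLE) years.

    person_years[k] is the number of person-years lived in age group k by the
    stationary population; prevalence[k] is the proportion disabled in group k.
    """
    dfle = sum(L * (1.0 - p) for L, p in zip(person_years, prevalence))
    dle = sum(L * p for L, p in zip(person_years, prevalence))
    return dfle / survivors_at_start, dle / survivors_at_start


# Hypothetical life table from age 70 onwards (illustrative numbers only).
survivors_at_70 = 1000.0
person_years = [4500.0, 3500.0, 2200.0, 1000.0, 300.0]
prevalence = [0.05, 0.10, 0.20, 0.35, 0.50]

dfle, dle = sullivan(survivors_at_70, person_years, prevalence)
life_expectancy = sum(person_years) / survivors_at_70
print(f"LE={life_expectancy:.3f} DFLE={dfle:.3f} DLE={dle:.3f}")
```

<p>By construction DFLE and DLE add up to LE; only the prevalence
curve, not the incidence rates, enters the calculation.</p>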

<p>Age-specific proportions of disabled people are very difficult
to forecast because each proportion corresponds to the historical
conditions of the cohort: it is the result of the past
flows of entry into disability and of recovery, up to
today. The age-specific intensities (or incidence rates) of
entering disability or recovering good health reflect
current conditions and therefore can be used at each age to
forecast the future of this cohort. For example, if a country is
improving its prosthesis technology, the incidence of
recovering the ability to walk will be higher at each (old) age,
but the prevalence of disability will only slightly reflect the
improvement because the prevalence is mostly affected by the history
of the cohort and not by recent period effects. To measure the
period improvement we have to simulate the future of a cohort of
new-borns entering or leaving the disability state at each age, or
dying, according to the incidence rates measured today on
different cohorts. The proportion of people disabled at each age
in this simulated cohort will be much lower (using the example of
an improvement) than the proportions observed at each age in a
cross-sectional survey. This new prevalence curve introduced in a
life table will give a much more current and realistic HE level
than the Sullivan method, which mostly measures the history of
health conditions in this country.</p>

<p>Therefore, the main question is how to measure incidence rates
from cross-longitudinal surveys. This is the goal of the IMaCh
program. From your data and using IMaCh you can estimate period
HE and not only Sullivan's HE. The standard errors of the HE
are also computed.</p>

<p>A cross-longitudinal survey consists of a first survey
(&quot;cross&quot;) where individuals of different ages are
interviewed on their health status or degree of disability. At
least a second wave of interviews (&quot;longitudinal&quot;)
should measure each individual's new health status. Health
expectancies are computed from the transitions observed between
waves and are computed for each degree of severity of disability
(number of life states). The more degrees you consider, the more time is
necessary to reach the Maximum Likelihood of the parameters
involved in the model. Considering only two states of disability
(disabled and healthy) is generally enough, but the computer
program also works with more health statuses.<br>
<br>
The simplest model is the multinomial logistic model where <i>pij</i>
is the probability of being observed in state <i>j</i> at the second
wave conditional on being observed in state <em>i</em> at the first
wave. Therefore a simple model is: log<em>(pij/pii)= aij +
bij*age+ cij*sex,</em> where '<i>age</i>' is age and '<i>sex</i>'
is a covariate. The advantage that this computer program claims
is that if the delay between waves is not identical for
each individual, or if some individuals missed an interview, the
information is not rounded or lost, but taken into account using
an interpolation or extrapolation. <i>hPijx</i> is the
probability of being observed in state <i>j</i> at age <i>x+h</i>
conditional on being observed in state <i>i</i> at age <i>x</i>. The
delay '<i>h</i>' can be split into an exact number (<i>nh*stepm</i>)
of unobserved intermediate states. This elementary transition (by
month, quarter, semester or year) is modeled as a
multinomial logistic. The <i>hPx</i> matrix is simply the matrix
product of <i>nh*stepm</i> elementary matrices and the
contribution of each individual to the likelihood is simply <i>hPijx</i>.
<br>
</p>
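
<p>The following sketch illustrates, under stated assumptions, how an
elementary multinomial logistic matrix and the product matrix <i>hPx</i>
could be computed. It is not the IMaCh source code: the function names are
hypothetical, the parameter values are the estimates quoted in the results
section of this manual, and we assume that age is expressed in years and that
each elementary step advances age by <i>stepm</i> months.</p>

```python
import math

# log(pij/pii) = aij + bij * age for origin state i and destination j != i.
# Values copied from the estimated parameters quoted in this manual.
PARAMS = {
    (1, 2): (-12.691743, 0.095819),
    (1, 3): (-7.815392, 0.031851),
    (2, 1): (-1.809895, -0.030470),
    (2, 3): (-7.838248, 0.039490),
}


def elementary_matrix(age):
    """One-step transition matrix over states 1, 2 (alive) and 3 (dead)."""
    matrix = []
    for origin in (1, 2):
        odds = {j: math.exp(a + b * age)
                for (i, j), (a, b) in PARAMS.items() if i == origin}
        p_stay = 1.0 / (1.0 + sum(odds.values()))
        row = [0.0, 0.0, 0.0]
        row[origin - 1] = p_stay
        for j, value in odds.items():
            row[j - 1] = p_stay * value
        matrix.append(row)
    matrix.append([0.0, 0.0, 1.0])  # death is absorbing
    return matrix


def matmul(a, b):
    """Plain 3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def h_step_matrix(age, n_steps, stepm=1):
    """Product of n_steps elementary matrices, age advancing stepm months per step."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for step in range(n_steps):
        result = matmul(result, elementary_matrix(age + step * stepm / 12.0))
    return result


# Two years between waves, interpolated month by month (assumption: stepm=1).
hP = h_step_matrix(70.0, 24)
for row in hP:
    print(" ".join(f"{p:.5f}" for p in row))
```

<p>Each row of the product is again a probability distribution, so an
individual observed in state <i>i</i> at age <i>x</i> and in state <i>j</i>
two years later would contribute the entry of row <i>i</i>, column <i>j</i>
to the likelihood.</p>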

<p>The program presented in this manual is a quite general
program named <strong>IMaCh</strong> (for <strong>I</strong>nterpolated
<strong>MA</strong>rkov <strong>CH</strong>ain), designed to
analyse transition data from longitudinal surveys. The first step
is the estimation of the parameters of a model of transition probabilities
between an initial status and a final status. From there, the
computer program produces some indicators such as observed and
stationary prevalence, life expectancies and their variances, and
graphs. Our transition model consists of absorbing and
non-absorbing states with the possibility of return across the
non-absorbing states. The main advantage of this package,
compared to other programs for the analysis of transition data
(for example: Proc Catmod of SAS<sup>®</sup>), is that the whole
individual information is used even if an interview is missing, a
status or a date is unknown, or the delay between waves is
not identical for each individual. The program can be executed
according to parameters: selection of a sub-sample, number of
absorbing and non-absorbing states, number of waves taken into
account (the user inputs the first and the last interview), a
tolerance level for the maximization function, the periodicity of
the transitions (we can compute annual, quarterly or monthly
transitions), and covariates in the model. It works on Windows or on
Unix.<br>
</p>

<hr>

<h2><a name="data"><font color="#00006A">On what kind of data can
it be used?</font></a></h2>

<p>The minimum data required for a transition model is the
recording of a set of individuals interviewed at a first date and
interviewed again at least one more time. From the
observations of an individual, we obtain a follow-up over time of
the occurrence of a specific event. In this documentation, the
event is related to health status at older ages, but the program
can be applied to many longitudinal studies in different
contexts. To build the data file explained in the next section,
you must have the month and year of each interview and the
corresponding health status. In order to get age, the date of
birth (month and year) is also required (a missing value is allowed for
the month). The date of death (month and year) is important
information, also required if the individual is dead. Shorter
steps (e.g. a month) will more closely take into account the
survival time after the last interview.</p>

<hr>

<h2><a name="datafile"><font color="#00006A">The data file</font></a></h2>

<p>In this example, 8,000 people have been interviewed in a
cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990).
Some people missed 1, 2 or 3 interviews. Health statuses are
healthy (1) and disabled (2). The survey is not a real one. It is
a simulation of the American Longitudinal Survey on Aging. An
individual is classified as disabled if he or she is unable to perform one of four
ADLs (Activities of Daily Living, like bathing, eating, walking).
Therefore, even if the individuals interviewed in the sample are
virtual, the information brought by this sample is close to the
situation of the United States. Sex is not recorded in this
sample.</p>

<p>Each line of the data set (named <a href="data1.txt">data1.txt</a>
in this first example) is an individual record whose fields are: </p>

<ul>
    <li><b>Index number</b>: positive number (field 1) </li>
    <li><b>First covariate</b>: positive number (field 2) </li>
    <li><b>Second covariate</b>: positive number (field 3) </li>
    <li><a name="Weight"><b>Weight</b></a>: positive number
        (field 4) . In most surveys individuals are weighted
        according to the stratification of the sample.</li>
    <li><b>Date of birth</b>: coded as mm/yyyy. Missing dates are
        coded as 99/9999 (field 5) </li>
    <li><b>Date of death</b>: coded as mm/yyyy. Missing dates are
        coded as 99/9999 (field 6) </li>
    <li><b>Date of first interview</b>: coded as mm/yyyy. Missing
        dates are coded as 99/9999 (field 7) </li>
    <li><b>Status at first interview</b>: positive number.
        Missing values are coded -1. (field 8) </li>
    <li><b>Date of second interview</b>: coded as mm/yyyy.
        Missing dates are coded as 99/9999 (field 9) </li>
    <li><strong>Status at second interview</strong>: positive
        number. Missing values are coded -1. (field 10) </li>
    <li><b>Date of third interview</b>: coded as mm/yyyy. Missing
        dates are coded as 99/9999 (field 11) </li>
    <li><strong>Status at third interview</strong>: positive
        number. Missing values are coded -1. (field 12) </li>
    <li><b>Date of fourth interview</b>: coded as mm/yyyy.
        Missing dates are coded as 99/9999 (field 13) </li>
    <li><strong>Status at fourth interview</strong> positive
        number. Missing values are coded -1. (field 14) </li>
    <li>etc</li>
</ul>

<p>&nbsp;</p>

<p>If your longitudinal survey does not include information about
weights or covariates, you must fill the column with a number
(e.g. 1) because a missing field is not allowed.</p>
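
<p>To make the record layout concrete, here is a small sketch that reads
one record with the fields listed above. It is only an illustration: we
assume that fields are separated by white space, the function names are
hypothetical, and the sample line is invented rather than taken from
data1.txt.</p>

```python
def parse_date(text):
    """Convert 'mm/yyyy' to (month, year); 99 and 9999 mark missing parts."""
    month_text, year_text = text.split("/")
    month, year = int(month_text), int(year_text)
    if year == 9999:
        return None  # the whole date is missing
    return (None if month == 99 else month, year)


def parse_record(line):
    """Read one individual record: fixed fields 1-6, then (date, status) pairs."""
    fields = line.split()
    record = {
        "index": int(fields[0]),
        "covariates": (float(fields[1]), float(fields[2])),
        "weight": float(fields[3]),
        "birth": parse_date(fields[4]),
        "death": parse_date(fields[5]),
        "interviews": [],
    }
    # Fields 7, 8, 9, 10, ... come in pairs: interview date, then status.
    for position in range(6, len(fields) - 1, 2):
        status = int(fields[position + 1])
        record["interviews"].append(
            (parse_date(fields[position]), None if status == -1 else status)
        )
    return record


# Invented record: alive, interviewed three times, fourth interview missed.
sample = "1 1 1 1 3/1910 99/9999 6/1984 1 6/1986 1 6/1988 2 99/9999 -1"
record = parse_record(sample)
print(record)
```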

<hr>

<h2><font color="#00006A">Your first example parameter file</font><a
href="http://euroreves.ined.fr/imach"></a><a name="uio"></a></h2>

<h2><a name="biaspar"></a>#Imach version 0.63, February 2000,
INED-EUROREVES </h2>

<p>This is a comment. Comments start with a '#'.</p>

<h4><font color="#FF0000">First uncommented line</font></h4>

<pre>title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4</pre>

<ul>
    <li><b>title=</b> 1st_example is the title of the run. </li>
    <li><b>datafile=</b>data1.txt is the name of the data set.
        Our example is a six years follow-up survey. It consists
        in a baseline followed by 3 reinterviews. </li>
    <li><b>lastobs=</b> 8600 the program is able to run on a
        subsample where the last observation number is lastobs.
        It can be set to a number larger than the real number of
        observations (e.g. 100000). In this example, maximisation
        will be done on the first 8600 records. </li>
    <li><b>firstpass=1</b> , <b>lastpass=4 </b>If the survey has more
        than two interviews, the program can be run
        on selected transition periods. firstpass=1 means the
        first interview included in the calculation is the
        baseline survey. lastpass=4 means that the information
        brought by the 4th interview is taken into account.</li>
</ul>

<p>&nbsp;</p>

<h4><a name="biaspar-2"><font color="#FF0000">Second uncommented
line</font></a></h4>

<pre>ftol=1.e-08 stepm=1 ncov=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0</pre>

<ul>
    <li><b>ftol=1e-8</b> Convergence tolerance on the function
        value in the maximisation of the likelihood. Choosing a
        suitable value for ftol is difficult. 1e-8 is a suitable
        value for a 32-bit computer.</li>
    <li><b>stepm=1</b> Time unit in months for interpolation.
        Examples:<ul>
            <li>If stepm=1, the unit is a month </li>
            <li>If stepm=3, the unit is a quarter (trimester)</li>
            <li>If stepm=12, the unit is a year </li>
            <li>If stepm=24, the unit is two years</li>
            <li>... </li>
        </ul>
    </li>
    <li><b>ncov=2</b> Number of covariates to be added to the
        model. The intercept and the age parameter count
        as 2 covariates. For example, if you want to add gender
        to the covariate vector you must write ncov=3, otherwise
        ncov=2. </li>
    <li><b>nlstate=2</b> Number of non-absorbing (live) states.
        Here we have two alive states: disability-free is coded 1
        and disability is coded 2. </li>
    <li><b>ndeath=1</b> Number of absorbing states. The absorbing
        state death is coded 3. </li>
    <li><b>maxwav=4</b> Maximum number of waves. The program
        cannot include more than 4 interviews. </li>
    <li><a name="mle"><b>mle</b></a><b>=1</b> Option for the
        Maximum Likelihood Estimation. <ul>
            <li>If mle=1 the program does the maximisation and
                the calculation of health expectancies </li>
            <li>If mle=0 the program only does the calculation of
                the health expectancies. </li>
        </ul>
    </li>
    <li><b>weight=0</b> Possibility to add weights. <ul>
            <li>If weight=0 no weights are included </li>
            <li>If weight=1 the maximisation integrates the
                weights which are in field <a href="#Weight">4</a></li>
        </ul>
    </li>
</ul>
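
<p>The two uncommented lines above are simple lists of <i>key=value</i>
pairs. The sketch below shows one way such a line could be read; it is a
hypothetical helper written for this manual, not the parser used by
IMaCh.</p>

```python
def parse_parameter_line(line):
    """Split a 'key=value key=value ...' line into a dictionary of strings."""
    settings = {}
    for token in line.split():
        key, _, value = token.partition("=")
        settings[key] = value
    return settings


first = parse_parameter_line(
    "title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4"
)
second = parse_parameter_line(
    "ftol=1.e-08 stepm=1 ncov=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0"
)

# Numeric settings are converted explicitly where they are needed.
ftol = float(second["ftol"])
stepm = int(second["stepm"])
waves_used = int(first["lastpass"]) - int(first["firstpass"]) + 1
print(first["datafile"], ftol, stepm, waves_used)
```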

<h4><font color="#FF0000">Guess values for optimization</font><font
color="#00006A"> </font></h4>

<p>You must write the initial guess values of the parameters for
optimization. The number of parameters, <em>N</em>, depends on the
number of absorbing states and non-absorbing states and on the
number of covariates. <br>
<em>N</em> is given by the formula <em>N</em>=(<em>nlstate</em> +
<em>ndeath</em>-1)*<em>nlstate</em>*<em>ncov</em>&nbsp;. <br>
<br>
Thus in the simple case with 2 covariates (the model is log
(pij/pii) = aij + bij * age where intercept and age are the two
covariates), 2 health degrees (1 for disability-free and 2
for disability) and 1 absorbing state (3), you must enter 8
initial values, a12, b12, a13, b13, a21, b21, a23, b23. You can
start with zeros as in this example, but if you have a more
precise set (for example from an earlier run) you can enter it
and it will speed up the convergence.<br>
Each of the four lines starts with indices &quot;ij&quot;: <br>
<br>
<b>ij aij bij</b> </p>

<blockquote>
    <pre># Guess values of aij and bij in log (pij/pii) = aij + bij * age
12 -14.155633  0.110794 
13  -7.925360  0.032091 
21  -1.890135 -0.029473 
23  -6.234642  0.022315 </pre>
</blockquote>

<p>or, to simplify: </p>

<blockquote>
    <pre>12 0.0 0.0
13 0.0 0.0
21 0.0 0.0
23 0.0 0.0</pre>
</blockquote>
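
<p>The count <em>N</em> and the zero-valued guess lines can be checked
with a few lines of code. This is a sketch with hypothetical function
names; it only reproduces the formula and the line layout shown above.</p>

```python
def number_of_parameters(nlstate, ndeath, ncov):
    """N = (nlstate + ndeath - 1) * nlstate * ncov, as given in this manual."""
    return (nlstate + ndeath - 1) * nlstate * ncov


def zero_guess_lines(nlstate, ndeath, ncov):
    """One line per transition i -> j (j != i), starting from a live state i."""
    lines = []
    for i in range(1, nlstate + 1):
        for j in range(1, nlstate + ndeath + 1):
            if j != i:
                lines.append(f"{i}{j} " + " ".join(["0.0"] * ncov))
    return lines


n_params = number_of_parameters(nlstate=2, ndeath=1, ncov=2)
guess_lines = zero_guess_lines(nlstate=2, ndeath=1, ncov=2)
print(n_params)
print("\n".join(guess_lines))
```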

<h4><font color="#FF0000">Guess values for computing variances</font></h4>

<p>This is an output if <a href="#mle">mle</a>=1. But it can be
used as an input to get the various output data files (Health
expectancies, stationary prevalence etc.) and figures without
rerunning the rather long maximisation phase (mle=0). </p>

<p>The scales are small values for the evaluation of numerical
derivatives. These derivatives are used to compute the hessian
matrix of the parameters, that is the inverse of the covariance
matrix, and the variances of health expectancies. Each line
consists of indices &quot;ij&quot; followed by the initial scales
(zero to simplify) associated with aij and bij. </p>

<ul>
    <li>If mle=1 you can enter zeros:</li>
</ul>

<blockquote>
    <pre># Scales (for hessian or gradient estimation)
12 0. 0. 
13 0. 0. 
21 0. 0. 
23 0. 0. </pre>
</blockquote>

<ul>
    <li>If mle=0 you must enter a covariance matrix (usually
        obtained from an earlier run).</li>
</ul>

<h4><font color="#FF0000">Covariance matrix of parameters</font></h4>

<p>This is an output if <a href="#mle">mle</a>=1. But it can be
used as an input to get the various output data files (Health
expectancies, stationary prevalence etc.) and figures without
rerunning the rather long maximisation phase (mle=0). </p>

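<p>Each line starts with indices &quot;ijk&quot; (k=1 refers to aij
and k=2 to bij) followed by the covariances of that parameter with
the parameters of the preceding lines, ending with its own variance: </p>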

<pre>
   121 Var(a12) 
   122 Cov(b12,a12)  Var(b12) 
          ...
   232 Cov(b23,a12)  Cov(b23,b12) ... Var (b23) </pre>

<ul>
    <li>If mle=1 you can enter zeros. </li>
</ul>

<blockquote>
    <pre># Covariance matrix
121 0.
122 0. 0.
131 0. 0. 0. 
132 0. 0. 0. 0. 
211 0. 0. 0. 0. 0. 
212 0. 0. 0. 0. 0. 0. 
231 0. 0. 0. 0. 0. 0. 0. 
232 0. 0. 0. 0. 0. 0. 0. 0.</pre>
</blockquote>

<ul>
    <li>If mle=0 you must enter a covariance matrix (usually
        obtained from an earlier run).<br>
        </li>
</ul>
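
<p>The covariance block is a lower triangle: the <i>n</i>-th line carries
<i>n</i> values. As a sketch (the function name is hypothetical), the
zero-filled template shown above for mle=1 could be generated like
this:</p>

```python
def zero_covariance_lines(nlstate, ndeath, ncov):
    """Lower-triangular template: line n has the label 'ijk' followed by n zeros."""
    lines = []
    for i in range(1, nlstate + 1):
        for j in range(1, nlstate + ndeath + 1):
            if j == i:
                continue
            for k in range(1, ncov + 1):
                zeros = " ".join(["0."] * (len(lines) + 1))
                lines.append(f"{i}{j}{k} {zeros}")
    return lines


covariance_lines = zero_covariance_lines(nlstate=2, ndeath=1, ncov=2)
print("\n".join(covariance_lines))
```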

<h4><a name="biaspar-l"></a><font color="#FF0000">Last
uncommented line</font></h4>

<pre>agemin=70 agemax=100 bage=50 fage=100</pre>

<p>Once we have obtained the estimated parameters, the program is able
to calculate stationary prevalence, transition probabilities
and life expectancies at any age. The choice of age ranges is useful
for extrapolation. In our data file, ages vary from age 70 to
102. Setting bage=50 and fage=100 makes the program compute
life expectancy from age bage to age fage. As we use a model, we
can compute life expectancy on a wider age range than the age
range of the data. But the model can be rather wrong over large
intervals.</p>

<p>Similarly, it is possible to get extrapolated stationary
prevalence by age ranging from agemin to agemax. </p>

<ul>
    <li><b>agemin=</b> Minimum age for calculation of the
        stationary prevalence </li>
    <li><b>agemax=</b> Maximum age for calculation of the
        stationary prevalence </li>
    <li><b>bage=</b> Minimum age for calculation of the health
        expectancies </li>
    <li><b>fage=</b> Maximum age for calculation of the health
        expectancies </li>
</ul>

<hr>

<h2><a name="running"></a><font color="#00006A">Running Imach
with this example</font></h2>

<p>We assume that you entered your <a href="biaspar.txt">1st_example
parameter file</a> as explained <a href="#biaspar">above</a>. To
run the program you should click on the imach.exe icon and enter
the name of the parameter file, which is for example <a
href="C:\usr\imach\mle\biaspar.txt">C:\usr\imach\mle\biaspar.txt</a>
(you can also click on the biaspar.txt icon located in <br>
<a href="C:\usr\imach\mle">C:\usr\imach\mle</a> and drag it with
the mouse onto the imach window).<br>
</p>

<p>The time to converge depends on the step unit that you used (a
step of 1 month is CPU-intensive), on the number of cases, and on the
number of variables.</p>

<p>The program outputs many files. Most of them are files which
will be plotted for better understanding.</p>

<hr>

<h2><a name="output"><font color="#00006A">Output of the program
and graphs</font> </a></h2>

<p>Once the optimization is finished, some graphics can be made
with a grapher. We use Gnuplot, an interactive plotting
program that is copyrighted but freely distributed. Imach outputs the
source of a gnuplot file, named 'graph.gp', which can be directly
input into gnuplot.<br>
When the run is finished, the user should enter a character
for plotting and output editing. </p>

<p>These characters are:</p>

<ul>
    <li>'c' to start again the program from the beginning.</li>
    <li>'g' to make graphics. The output graphs are in GIF format
        and you have no control over which ones are produced. If you
        want to modify the graphics or make another one, you
        should modify the parameters in the file <b>graph.gp</b>
        located in imach\bin. A gnuplot reference manual is
        available <a
        href="http://www.cs.dartmouth.edu/gnuplot/gnuplot.html">here</a>.
    </li>
    <li>'e' opens the <strong>index.htm</strong> file to edit the
        output files and graphs. </li>
    <li>'q' for exiting.</li>
</ul>

<h5><font size="4"><strong>Results files </strong></font><br>
<br>
<font color="#EC5E5E" size="3"><strong>- </strong></font><a
name="Observed prevalence in each state"><font color="#EC5E5E"
size="3"><strong>Observed prevalence in each state</strong></font></a><font
color="#EC5E5E" size="3"><strong> (and at first pass)</strong></font><b>:
</b><a href="prbiaspar.txt"><b>prbiaspar.txt</b></a><br>
</h5>

<p>The first line is the title and displays each field of the
file. The first column is age. Fields 2 and 6 are the
proportions of individuals in states 1 and 2 respectively, as
observed during the first exam. The other fields are the number of
people in each state, N(1) and N(2), and the total N. The number of
columns increases if the number of states is higher than 2.<br>
The header of the file is </p>

<pre># Age Prev(1) N(1) N Age Prev(2) N(2) N
70 1.00000 631 631 70 0.00000 0 631
71 0.99681 625 627 71 0.00319 2 627 
72 0.97125 1115 1148 72 0.02875 33 1148 </pre>

<pre># Age Prev(1) N(1) N Age Prev(2) N(2) N
    70 0.95721 604 631 70 0.04279 27 631</pre>

<p>It means that at age 70, the prevalence in state 1 is 1.000
and in state 2 is 0.000. At age 71 the number of individuals in
state 1 is 625 and in state 2 is 2, hence the total number of
people aged 71 is 625+2=627. <br>
</p>
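
<p>The prevalence columns are simply the counts divided by the total at
each age. The short check below recomputes them from the counts quoted in
the first listing above (the variable names are ours):</p>

```python
# Counts N(1) and N(2) by age, copied from the listing above.
counts = {70: (631, 0), 71: (625, 2), 72: (1115, 33)}

observed = {}
for age, (n1, n2) in counts.items():
    total = n1 + n2
    observed[age] = (round(n1 / total, 5), round(n2 / total, 5), total)
    print(age, observed[age])
```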

<h5><font color="#EC5E5E" size="3"><b>- Estimated parameters and
covariance matrix</b></font><b>: </b><a href="rbiaspar.txt"><b>rbiaspar.txt</b></a></h5>

<p>This file contains all the maximisation results: </p>

<pre> Number of iterations=47
 -2 log likelihood=46553.005854373667  
 Estimated parameters: a12 = -12.691743 b12 = 0.095819 
                       a13 = -7.815392   b13 = 0.031851 
                       a21 = -1.809895 b21 = -0.030470 
                       a23 = -7.838248  b23 = 0.039490  
 Covariance matrix: Var(a12) = 1.03611e-001
                    Var(b12) = 1.51173e-005
                    Var(a13) = 1.08952e-001
                    Var(b13) = 1.68520e-005  
                    Var(a21) = 4.82801e-001
                    Var(b21) = 6.86392e-005
                    Var(a23) = 2.27587e-001
                    Var(b23) = 3.04465e-005 
 </pre>

<h5><font color="#EC5E5E" size="3"><b>- Transition probabilities</b></font><b>:
</b><a href="pijrbiaspar.txt"><b>pijrbiaspar.txt</b></a></h5>

<p>This file gives the transition probabilities Pij(x, x+nh), where nh
is a multiple of 2 years. The first column is the starting age x
(from age 50 to 100), the second is age (x+nh) and the others are
the transition probabilities p11, p12, p13, p21, p22, p23. For
example, line 5 of the file is: </p>

<pre> 100 106 0.03286 0.23512 0.73202 0.02330 0.19210 0.78460 </pre>

<p>and this means: </p>

<pre>p11(100,106)=0.03286
p12(100,106)=0.23512
p13(100,106)=0.73202
p21(100,106)=0.02330
p22(100,106)=0.19210 
p22(100,106)=0.78460 </pre>
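
<p>A line of this file can be read and checked as follows. This is only
a sketch with variable names of our own; it uses the column order stated
above (p11, p12, p13, p21, p22, p23) and verifies that the probabilities
out of each starting state sum to one.</p>

```python
line = " 100 106 0.03286 0.23512 0.73202 0.02330 0.19210 0.78460 "

fields = line.split()
start_age, end_age = int(fields[0]), int(fields[1])
labels = ["p11", "p12", "p13", "p21", "p22", "p23"]
pij = dict(zip(labels, (float(value) for value in fields[2:])))

# From each live state the individual ends up in state 1, 2 or 3 (dead).
from_healthy = pij["p11"] + pij["p12"] + pij["p13"]
from_disabled = pij["p21"] + pij["p22"] + pij["p23"]
print(start_age, end_age, round(from_healthy, 5), round(from_disabled, 5))
```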

<h5><font color="#EC5E5E" size="3"><b>- </b></font><a
name="Stationary prevalence in each state"><font color="#EC5E5E"
size="3"><b>Stationary prevalence in each state</b></font></a><b>:
</b><a href="plrbiaspar.txt"><b>plrbiaspar.txt</b></a></h5>

<pre>#Age 1-1 2-2 
70 0.92274 0.07726 
71 0.91420 0.08580 
72 0.90481 0.09519 
73 0.89453 0.10547</pre>

<p>At age 70 the stationary prevalence is 0.92274 in state 1 and
0.07726 in state 2. This stationary prevalence differs from the
observed prevalence, and this is the key point. The observed prevalence
at age 70 results from the incidence of disability, incidence of
recovery and mortality which occurred in the past of the cohort.
Stationary prevalence results from a simulation with current
incidences and mortality (estimated from this cross-longitudinal
survey). It is the best predictive value of the prevalence in the
future if &quot;nothing changes in the future&quot;. This is
exactly what demographers do with a Life table. Life expectancy
is the expected mean time to survive if observed mortality rates
(incidence of mortality) &quot;remain constant&quot; in the
future. </p>

<h5><font color="#EC5E5E" size="3"><b>- Standard deviation of
stationary prevalence</b></font><b>: </b><a
href="vplrbiaspar.txt"><b>vplrbiaspar.txt</b></a></h5>

<p>The stationary prevalence has to be compared with the observed
prevalence by age. But both are statistical estimates and
subject to stochastic errors due to the size of the sample, the
design of the survey and, for the stationary prevalence, the
model used and fitted. It is possible to compute the standard
deviation of the stationary prevalence at each age.</p>

<h6><font color="#EC5E5E" size="3">Observed and stationary
prevalence in state (2=disabled) with the confidence interval</font>:<b>
vbiaspar2.gif</b></h6>

<p><br>
This graph exhibits the stationary prevalence in state (2) with
the confidence interval in red. The green curve is the observed
prevalence (or proportion of individuals in state (2)). Without
discussing the results (it is not the purpose here), we observe
that the green curve is rather below the stationary prevalence.
It suggests an increase of the disability prevalence in the
future.</p>

<p><img src="vbiaspar2.gif" width="400" height="300"></p>

<h6><font color="#EC5E5E" size="3"><b>Convergence to the
stationary prevalence of disability</b></font><b>: pbiaspar1.gif</b><br>
<img src="pbiaspar1.gif" width="400" height="300"> </h6>

<p>This graph plots the conditional transition probabilities from
an initial state (1=healthy in red at the bottom, or 2=disabled in
green on top) at age <em>x</em> to the final state 2=disabled at
age <em>x+h</em>. Conditional means conditional on being alive
at age <em>x+h</em>, which has probability <i>hP11x + hP12x</i> for an
individual initially healthy and <i>hP21x + hP22x</i> for an individual
initially disabled. The
curves <i>hP12x/(hP11x + hP12x)</i> and <i>hP22x/(hP21x + hP22x)</i>
converge with <em>h</em> to the <em>stationary
prevalence of disability</em>. In order to get the stationary
prevalence at age 70 we should start the process at an earlier
age, e.g. 50. If the disability state is defined by severe
disability criteria with little chance of recovery, then the
incidence of recovery is low and the time to convergence is
probably longer. But we do not have experience of this yet.</p>
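
<p>The convergence described above can be illustrated with a toy example.
The sketch below is not the IMaCh computation: for simplicity it assumes
a constant, hypothetical one-step matrix between the two live states
(in the program the probabilities change with age), and it only shows
that the conditional probability of being disabled forgets the initial
state as the number of steps grows.</p>

```python
# Hypothetical one-step probabilities between the live states 1 and 2.
# Each row sums to less than 1: the remainder is the probability of dying.
Q = [[0.90, 0.05],
     [0.10, 0.80]]


def step(P, Q):
    """Multiply two 2x2 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def conditional_disability(n_steps):
    """Probability of being disabled after n_steps, given alive, by initial state."""
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n_steps):
        P = step(P, Q)
    from_healthy = P[0][1] / (P[0][0] + P[0][1])
    from_disabled = P[1][1] / (P[1][0] + P[1][1])
    return from_healthy, from_disabled


for n in (0, 1, 5, 20, 200):
    healthy, disabled = conditional_disability(n)
    print(n, round(healthy, 5), round(disabled, 5))
```

<p>The lower curve starts at 0 and the upper one at 1; both approach the
same limit, which plays the role of the stationary prevalence of
disability in this toy setting.</p>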

<h5><font color="#EC5E5E" size="3"><b>- Life expectancies by age
and initial health status</b></font><b>: </b><a
href="erbiaspar.txt"><b>erbiaspar.txt</b></a></h5>

<pre># Health expectancies 
# Age 1-1 1-2 2-1 2-2 
70 10.7297 2.7809 6.3440 5.9813 
71 10.3078 2.8233 5.9295 5.9959 
72 9.8927 2.8643 5.5305 6.0033 
73 9.4848 2.9036 5.1474 6.0035 </pre>

<pre>For example 70 10.7297 2.7809 6.3440 5.9813 means:
e11=10.7297 e12=2.7809 e21=6.3440 e22=5.9813</pre>

<pre><img src="exbiaspar1.gif" width="400" height="300"><img
src="exbiaspar2.gif" width="400" height="300"></pre>

<p>For example, the life expectancy of a healthy individual at age 70
is 10.73 years in the healthy state and 2.78 in the disability state
(=13.51 years). For an individual disabled at age 70, life expectancy
is shorter: 6.34 years in the healthy state and 5.98 in the
disability state (=12.32 years). The total life expectancy is a
weighted mean of both, 13.51 and 12.32; the weights are the proportions
of people healthy and disabled at age 70. In order to get a pure period index
(i.e. based only on incidences) we use the <a
href="#Stationary prevalence in each state">computed or
stationary prevalence</a> at age 70 (i.e. computed from
incidences at earlier ages) instead of the <a
href="#Observed prevalence in each state">observed prevalence</a>
(for example at first exam) (<a href="#Health expectancies">see
below</a>).</p>

<h5><font color="#EC5E5E" size="3"><b>- Variances of life
expectancies by age and initial health status</b></font><b>: </b><a
href="vrbiaspar.txt"><b>vrbiaspar.txt</b></a></h5>

<p>For example, the covariances of life expectancies Cov(ei,ej)
at age 50 are (line 3) </p>

<pre>   Cov(e1,e1)=0.4667  Cov(e1,e2)=0.0605=Cov(e2,e1)  Cov(e2,e2)=0.0183</pre>

<h5><font color="#EC5E5E" size="3"><b>- </b></font><a
name="Health expectancies"><font color="#EC5E5E" size="3"><b>Health
expectancies</b></font></a><font color="#EC5E5E" size="3"><b>
with standard errors in parentheses</b></font><b>: </b><a
href="trbiaspar.txt"><font face="Courier New"><b>trbiaspar.txt</b></font></a></h5>

<pre>#Total LEs with variances: e.. (std) e.1 (std) e.2 (std) </pre>

<pre>70 13.42 (0.18) 10.39 (0.15) 3.03 (0.10)
70 13.81 (0.18) 11.28 (0.14) 2.53 (0.09) </pre>

<p>Thus, at age 70 the total life expectancy, e..=13.42 years, is
the mean of e1.=13.51 and e2.=12.32 weighted by the stationary
prevalences at age 70, which are 0.92274 in state 1 and 0.07726 in
state 2, respectively (the sum is equal to one). e.1=10.39 is the
Disability-free life expectancy at age 70 (it is again a weighted
mean of e11 and e21). Similarly, e.2=3.03 is the life expectancy at age
70 to be spent in the disability state.</p>
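
<p>The weighted means can be verified directly from the figures quoted
in this manual: the life expectancies by initial state at age 70
(erbiaspar.txt) and the stationary prevalence at age 70 (plrbiaspar.txt).
The variable names in this sketch are ours.</p>

```python
# Stationary prevalence at age 70, by state.
prevalence = {1: 0.92274, 2: 0.07726}

# e[(i, j)]: expected years spent in state j for an individual in state i at 70.
e = {(1, 1): 10.7297, (1, 2): 2.7809, (2, 1): 6.3440, (2, 2): 5.9813}

# Total life expectancy by initial state: e1. and e2.
e_by_initial = {i: e[(i, 1)] + e[(i, 2)] for i in (1, 2)}

# Population-level expectancies, weighting initial states by prevalence.
e_total = sum(prevalence[i] * e_by_initial[i] for i in (1, 2))
e_healthy = sum(prevalence[i] * e[(i, 1)] for i in (1, 2))
e_disabled = sum(prevalence[i] * e[(i, 2)] for i in (1, 2))

print(f"e..={e_total:.2f} e.1={e_healthy:.2f} e.2={e_disabled:.2f}")
```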

<h6><font color="#EC5E5E" size="3"><b>Total life expectancy by
age and health expectancies in states (1=healthy) and (2=disable)</b></font><b>:
ebiaspar.gif</b></h6>

<p>This figure represents the health expectancies and the total
life expectancy with the confidence interval as a dashed curve. </p>

<pre>        <img src="ebiaspar.gif" width="400" height="300"></pre>

<p>Standard deviations of these quantities (obtained from the
information matrix of the model) are very useful.
Cross-longitudinal surveys are costly and do not involve huge
samples, generally a few thousand people; it is therefore very
important to have an idea of the standard deviation of our
estimates. Computing the standard deviations of the health
expectancies has been a big challenge. Do not be confused: life
expectancy is, like any expected value, the mean of a distribution;
here we are not computing the standard deviation of that distribution,
but the standard deviation of the estimate of the mean.</p>
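<p>As a hedged illustration, a standard deviation of the estimate
can be converted into an approximate 95% confidence interval. The
sketch below assumes a normal approximation (multiplier 1.96), which
is our assumption and not something IMaCh prints; it uses the values
e..=13.42 and 0.18 reported for age 70.</p>

```python
def normal_interval(estimate, std_error, z=1.96):
    """Approximate confidence interval under a normal approximation."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

# Total life expectancy at age 70 and its standard error (from trbiaspar.txt).
low, high = normal_interval(13.42, 0.18)
print(f"{low:.2f} to {high:.2f}")   # 13.07 to 13.77
```

<p>The interval describes the uncertainty of the estimated mean,
not the spread of individual lifetimes.</p>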

<p>Our health expectancy estimates vary with the sample
size (the standard deviations give confidence intervals for
the estimates) but also with the model fitted. Let us
explain this in more detail.</p>

<p>Choosing a model means making at least two kinds of choices. First we
have to decide on the number of disability states. Second we have to
design the model within the logit model family: variables,
covariates, confounding factors etc. to be included.</p>

<p>The more disability states we have, the better our demographic
description of the disability process, but the smaller the number of
transitions between each pair of states and the higher the noise in the
measurement. We do not yet have enough experience with the various
models to summarize their advantages and disadvantages, but it is
important to say that even if we had huge and unbiased samples,
the total life expectancy computed from a cross-longitudinal
survey varies with the number of states. If we define only two
states, alive or dead, we find the usual life expectancy, where it
is assumed that at each age people are at the same risk of dying.
If we split the alive state into healthy and
disabled, and since mortality from the disability state is higher
than mortality from the healthy state, we introduce
heterogeneity in the risk of dying. The total mortality at each
age is the mean of the mortality in each state weighted by the
prevalence in each state. Therefore, if the proportion of people
at each age and in each state differs from the stationary
equilibrium, there is no reason to find the same total mortality
at a particular age. Life expectancy, however useful a
tool, rests on a very strong hypothesis of homogeneity of the
population. Our main purpose is not to measure differential
mortality but to measure the expected time in a healthy or
disabled state, in order to maximise the former and minimise the
latter. But the differential in mortality complicates the
measurement.</p>
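<p>The dependence of total mortality on the prevalence can be made
concrete with a small sketch. All numbers below are hypothetical and
chosen only for illustration; they are not IMaCh output.</p>

```python
def total_mortality(death_prob_by_state, prevalence_by_state):
    """Total death probability as the prevalence-weighted mean over states."""
    return sum(q * w for q, w in zip(death_prob_by_state, prevalence_by_state))

# Hypothetical death probabilities at some age: healthy, disabled.
death_prob = [0.02, 0.10]

# Two hypothetical populations with the same state-specific mortality
# but different proportions of healthy and disabled people.
stationary = [0.92, 0.08]
observed = [0.80, 0.20]

q_stationary = total_mortality(death_prob, stationary)
q_observed = total_mortality(death_prob, observed)
print(f"{q_stationary:.4f} {q_observed:.4f}")
```

<p>With identical state-specific risks, the population with more
disabled people has the higher total mortality, which is why the
two-state life expectancy need not match the multistate one.</p>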

<p>Incidences of disability or recovery are not affected by the
number of states if these states are independent. But incidence
estimates depend on the specification of the model. The more
covariates we add to the logit model, the better the model, but
some covariates are not well measured and some are confounding
factors, as in any statistical model. The procedure to &quot;fit
the best model&quot; is similar to logistic regression, which itself is
similar to regression analysis. We have not yet gone that far, because
we also have a severe limitation, which is the speed of
convergence. On a Pentium III, 500 MHz, even the simplest model,
estimated by month on 8,000 people, may take 4 hours to converge.
Also, the program is not yet a statistical package that permits
a simple specification of the variables and of the model to be used
in the maximisation. The current program only allows
simple variables to be added without interactions, like age+sex but
not age+sex+age*sex. An interaction can be added from the source code
(you have to change three lines in the source code) but this will
never be general enough. What must be remembered is that
incidences, or probabilities of change from one state to another, are
affected by the variables specified in the model.</p>

<p>Also, the age range of the people interviewed is linked to
the age range of the life expectancy that can be estimated by
extrapolation. If your sample ranges from age 70 to 95, you can
clearly estimate a life expectancy at age 70 and trust your
confidence interval, which is mostly based on your sample size.
If you want to estimate the life expectancy at age 50, you
have to rely on your model, but fitting a logistic model on an age
range of 70-95 and estimating probabilities of transition outside
this age range, say at age 50, is very dangerous. At least you
should remember that the confidence intervals given by the
standard deviations of the health expectancies hold under the
strong assumption that your model is the 'true model', which is
probably not the case.</p>

<h5><font color="#EC5E5E" size="3"><b>- Copy of the parameter
file</b></font><b>: </b><a href="orbiaspar.txt"><b>orbiaspar.txt</b></a></h5>

<p>This copy of the parameter file can be useful to re-run the
program while saving the old output files. </p>

<hr>

<h2><a name="example"><font color="#00006A">Trying an example</font></a></h2>

<p>Now that you know how to run the program, it is time to test it
on your own computer. Try for example a parameter file named <a
href="file://../mytry/imachpar.txt">imachpar.txt</a>, which is a
copy of <font size="2" face="Courier New">mypar.txt</font>
included in the <font size="2"
face="Courier New">mytry</font> subdirectory of imach. Edit it to change the name of
the data file to <font size="2" face="Courier New">..\data\mydata.txt</font>
if you don't want to copy the data file into the same directory. The file <font
face="Courier New">mydata.txt</font> is a smaller file of 3,000
people but still with 4 waves. </p>

<p>Click on the imach.exe icon to open a window. Answer the
question: '<strong>Enter the parameter file name:</strong>'</p>

<table border="1">
    <tr>
        <td width="100%"><strong>IMACH, Version 0.63</strong><p><strong>Enter
        the parameter file name: ..\mytry\imachpar.txt</strong></p>
        </td>
    </tr>
</table>

<p>Most of the data files and image files generated will have the
'imachpar' string in their names. The running time is about 2-3
minutes on a Pentium III. If the execution worked correctly, the
output files are created in the current directory, and should be
the same as the mypar files initially included in the directory <font
size="2" face="Courier New">mytry</font>.</p>

<ul>
    <li><pre><u>Output on the screen</u> The output screen looks like <a
href="imachrun.LOG">this Log file</a>
#

title=MLE datafile=..\data\mydata.txt lastobs=3000 firstpass=1 lastpass=3
ftol=1.000000e-008 stepm=24 ncov=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0</pre>
    </li>
    <li><pre>Total number of individuals= 2965, Agemin = 70.00, Agemax= 100.92

Warning, no any valid information for:126 line=126
Warning, no any valid information for:2307 line=2307
Delay (in months) between two waves Min=21 Max=51 Mean=24.495826
<font face="Times New Roman">These lines give some warnings on the data file and also some raw statistics on frequencies of transitions.</font>
Age 70 1.=230 loss[1]=3.5% 2.=16 loss[2]=12.5% 1.=222 prev[1]=94.1% 2.=14
 prev[2]=5.9% 1-1=8 11=200 12=7 13=15 2-1=2 21=6 22=7 23=1
Age 102 1.=0 loss[1]=NaNQ% 2.=0 loss[2]=NaNQ% 1.=0 prev[1]=NaNQ% 2.=0 </pre>
    </li>
</ul>
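<p>The percentages on the &quot;Age 70&quot; line can be reproduced
from the counts printed on the same line. The sketch below is only
our reading of that line: we assume that &quot;1-1&quot; and
&quot;2-1&quot; count people lost to follow-up and that state 3 is
death. The variable names are hypothetical.</p>

```python
# Counts copied from the "Age 70" line of the screen output.
start = {1: 230, 2: 16}          # "1.=230" and "2.=16"
lost = {1: 8, 2: 2}              # "1-1=8" and "2-1=2" (assumed: lost to follow-up)
transitions = {
    1: {1: 200, 2: 7, 3: 15},    # "11=200 12=7 13=15"
    2: {1: 6, 2: 7, 3: 1},       # "21=6 22=7 23=1"
}

# Share of each starting state that was lost, in percent.
loss_pct = {s: 100.0 * lost[s] / start[s] for s in start}

# People actually followed in each state, and the observed prevalence.
followed = {s: sum(transitions[s].values()) for s in transitions}
total_followed = sum(followed.values())
prev_pct = {s: 100.0 * followed[s] / total_followed for s in followed}

for s in (1, 2):
    print(f"state {s}: loss={loss_pct[s]:.1f}% "
          f"followed={followed[s]} prev={prev_pct[s]:.1f}%")
```

<p>Under these assumptions the sketch gives 3.5% and 12.5% for the
losses and 94.1% and 5.9% for the prevalences, matching the values
on the screen.</p>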

<p>&nbsp;</p>

<ul>
    <li>Maximisation with the Powell algorithm. 8 directions are
        given, corresponding to the 8 parameters. Convergence can
        take rather long.<br>
        <font size="1" face="Courier New"><br>
        Powell iter=1 -2*LL=11531.405658264877 1 0.000000000000 2
        0.000000000000 3<br>
        0.000000000000 4 0.000000000000 5 0.000000000000 6
        0.000000000000 7 <br>
        0.000000000000 8 0.000000000000<br>
        1..........2.................3..........4.................5.........<br>
        6................7........8...............<br>
        Powell iter=23 -2*LL=6744.954108371555 1 -12.967632334283
        <br>
        2 0.135136681033 3 -7.402109728262 4 0.067844593326 <br>
        5 -0.673601538129 6 -0.006615504377 7 -5.051341616718 <br>
        8 0.051272038506<br>
        1..............2...........3..............4...........<br>
        5..........6................7...........8.........<br>
        #Number of iterations = 23, -2 Log likelihood =
        6744.954042573691<br>
        # Parameters<br>
        12 -12.966061 0.135117 <br>
        13 -7.401109 0.067831 <br>
        21 -0.672648 -0.006627 <br>
        23 -5.051297 0.051271 </font><br>
        </li>
    <li><pre><font size="2">Calculation of the hessian matrix. Wait...
12345678.12.13.14.15.16.17.18.23.24.25.26.27.28.34.35.36.37.38.45.46.47.48.56.57.58.67.68.78

Inverting the hessian to get the covariance matrix. Wait...

#Hessian matrix#
3.344e+002 2.708e+004 -4.586e+001 -3.806e+003 -1.577e+000 -1.313e+002 3.914e-001 3.166e+001 
2.708e+004 2.204e+006 -3.805e+003 -3.174e+005 -1.303e+002 -1.091e+004 2.967e+001 2.399e+003 
-4.586e+001 -3.805e+003 4.044e+002 3.197e+004 2.431e-002 1.995e+000 1.783e-001 1.486e+001 
-3.806e+003 -3.174e+005 3.197e+004 2.541e+006 2.436e+000 2.051e+002 1.483e+001 1.244e+003 
-1.577e+000 -1.303e+002 2.431e-002 2.436e+000 1.093e+002 8.979e+003 -3.402e+001 -2.843e+003 
-1.313e+002 -1.091e+004 1.995e+000 2.051e+002 8.979e+003 7.420e+005 -2.842e+003 -2.388e+005 
3.914e-001 2.967e+001 1.783e-001 1.483e+001 -3.402e+001 -2.842e+003 1.494e+002 1.251e+004 
3.166e+001 2.399e+003 1.486e+001 1.244e+003 -2.843e+003 -2.388e+005 1.251e+004 1.053e+006 
# Scales
12 1.00000e-004 1.00000e-006
13 1.00000e-004 1.00000e-006
21 1.00000e-003 1.00000e-005
23 1.00000e-004 1.00000e-005
# Covariance
  1 5.90661e-001
  2 -7.26732e-003 8.98810e-005
  3 8.80177e-002 -1.12706e-003 5.15824e-001
  4 -1.13082e-003 1.45267e-005 -6.50070e-003 8.23270e-005
  5 9.31265e-003 -1.16106e-004 6.00210e-004 -8.04151e-006 1.75753e+000
  6 -1.15664e-004 1.44850e-006 -7.79995e-006 1.04770e-007 -2.12929e-002 2.59422e-004
  7 1.35103e-003 -1.75392e-005 -6.38237e-004 7.85424e-006 4.02601e-001 -4.86776e-003 1.32682e+000
  8 -1.82421e-005 2.35811e-007 7.75503e-006 -9.58687e-008 -4.86589e-003 5.91641e-005 -1.57767e-002 1.88622e-004
# agemin agemax for lifexpectancy, bage fage (if mle==0 ie no data nor Max likelihood).


agemin=70 agemax=100 bage=50 fage=100
Computing prevalence limit: result on file 'plrmypar.txt' 
Computing pij: result on file 'pijrmypar.txt' 
Computing Health Expectancies: result on file 'ermypar.txt' 
Computing Variance-covariance of DFLEs: file 'vrmypar.txt' 
Computing Total LEs with variances: file 'trmypar.txt' 
Computing Variance-covariance of Prevalence limit: file 'vplrmypar.txt' 
End of Imach
</font></pre>
    </li>
</ul>
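<p>To get a feeling for the fitted parameters, they can be turned
into transition probabilities at a given age. The sketch below
assumes a multinomial logit of the form
log(pij/pii) = aij + bij*age, with staying in the origin state as
the reference, and assumes that state 3 is death; both are our
assumptions about the model form, and the probabilities refer to
the elementary time step of the model. The function name is
hypothetical.</p>

```python
import math

# Values copied from the "# Parameters" block of the screen output:
# (origin, destination): (intercept a_ij, age slope b_ij).
params = {
    (1, 2): (-12.966061, 0.135117),
    (1, 3): (-7.401109, 0.067831),
    (2, 1): (-0.672648, -0.006627),
    (2, 3): (-5.051297, 0.051271),
}

def transition_probs(origin, age):
    """Assumed multinomial logit with 'stay in origin' as the reference."""
    odds = {dest: math.exp(a + b * age)
            for (i, dest), (a, b) in params.items() if i == origin}
    denom = 1.0 + sum(odds.values())
    probs = {dest: o / denom for dest, o in odds.items()}
    probs[origin] = 1.0 / denom   # probability of staying in the origin state
    return probs

for origin in (1, 2):
    probs = transition_probs(origin, 70)
    print(origin, {dest: round(p, 4) for dest, p in sorted(probs.items())})
```

<p>By construction each row sums to one, and a positive age slope
means the corresponding transition becomes more likely with age.</p>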

<p><font size="3">Once the run is finished, the program
waits for a character:</font></p>

<table border="1">
    <tr>
        <td width="100%"><strong>Type g for plotting (available
        if mle=1), e to edit output files, c to start again,</strong><p><strong>and
        q for exiting:</strong></p>
        </td>
    </tr>
</table>

<p><font size="3">First you should enter <strong>g</strong> to
make the figures and then you can edit all the results by typing <strong>e</strong>.
</font></p>

<ul>
    <li><u>Output files</u> <br>
        - index.htm, this file is the master file on which you
        should click first.<br>
        - Observed prevalence in each state: <a
        href="..\mytry\prmypar.txt">prmypar.txt</a> <br>
        - Estimated parameters and the covariance matrix: <a
        href="..\mytry\rmypar.txt">rmypar.txt</a> <br>
        - Stationary prevalence in each state: <a
        href="..\mytry\plrmypar.txt">plrmypar.txt</a> <br>
        - Transition probabilities: <a
        href="..\mytry\pijrmypar.txt">pijrmypar.txt</a> <br>
        - Copy of the parameter file: <a
        href="..\mytry\ormypar.txt">ormypar.txt</a> <br>
        - Life expectancies by age and initial health status: <a
        href="..\mytry\ermypar.txt">ermypar.txt</a> <br>
        - Variances of life expectancies by age and initial
        health status: <a href="..\mytry\vrmypar.txt">vrmypar.txt</a>
        <br>
        - Health expectancies with their variances: <a
        href="..\mytry\trmypar.txt">trmypar.txt</a> <br>
        - Standard deviation of stationary prevalence: <a
        href="..\mytry\vplrmypar.txt">vplrmypar.txt</a> <br>
        <br>
        </li>
    <li><u>Graphs</u> <br>
        <br>
        -<a href="..\mytry\vmypar1.gif">Observed and stationary
        prevalence in state (1) with the confidence interval</a> <br>
        -<a href="..\mytry\vmypar2.gif">Observed and stationary
        prevalence in state (2) with the confidence interval</a> <br>
        -<a href="..\mytry\exmypar1.gif">Health life expectancies
        by age and initial health state (1)</a> <br>
        -<a href="..\mytry\exmypar2.gif">Health life expectancies
        by age and initial health state (2)</a> <br>
        -<a href="..\mytry\emypar.gif">Total life expectancy by
        age and health expectancies in states (1) and (2).</a> </li>
</ul>

<p>This software has been partly funded by <a
href="http://euroreves.ined.fr">Euro-REVES</a>, a concerted
action of the European Union. It will be copyrighted
identically to a GNU software product, i.e. program and software
can be distributed freely for non commercial use. Sources are not
widely distributed today. You can get them by sending us a
simple justification (name, email, institute) at <a
href="mailto:brouard@ined.fr">mailto:brouard@ined.fr</a> and <a
href="mailto:lievre@ined.fr">mailto:lievre@ined.fr</a> .</p>

<p>The latest version (0.63 of 16 March 2000) can be accessed at <a
href="http://euroreves.ined.fr/imach">http://euroreves.ined.fr/imach</a><br>
</p>
</body>
</html>
