<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Computing Health Expectancies using IMaCh</title>
</head>

<body bgcolor="#FFFFFF">

<hr size="3" color="#EC5E5E">

<h1 align="center"><font color="#00006A">Computing Health
Expectancies using IMaCh</font></h1>

<h1 align="center"><font color="#00006A" size="5">(a Maximum
Likelihood Computer Program using Interpolation of Markov Chains)</font></h1>

<p align="center"> </p>

<p align="center"><a href="http://www.ined.fr/"><img
src="logo-ined.gif" border="0" width="151" height="76"></a><img
src="euroreves2.gif" width="151" height="75"></p>

<h3 align="center"><a href="http://www.ined.fr/"><font
color="#00006A">INED</font></a><font color="#00006A"> and </font><a
href="http://euroreves.ined.fr"><font color="#00006A">EUROREVES</font></a></h3>

<p align="center"><font color="#00006A" size="4"><strong>March
2000</strong></font></p>

<hr size="3" color="#EC5E5E">

<p align="center"><font color="#00006A"><strong>Authors of the
program: </strong></font><a href="http://sauvy.ined.fr/brouard"><font
color="#00006A"><strong>Nicolas Brouard</strong></font></a><font
color="#00006A"><strong>, senior researcher at the </strong></font><a
href="http://www.ined.fr"><font color="#00006A"><strong>Institut
National d'Etudes Démographiques</strong></font></a><font
color="#00006A"><strong> (INED, Paris) in the "Mortality,
Health and Epidemiology" Research Unit </strong></font></p>

<p align="center"><font color="#00006A"><strong>and Agnès
Lièvre<br clear="left">
</strong></font></p>

<h4><font color="#00006A">Contribution to the mathematics: C. R.
Heathcote </font><font color="#00006A" size="2">(Australian
National University, Canberra).</font></h4>

<h4><font color="#00006A">Contact: Agnès Lièvre (</font><a
href="mailto:lievre@ined.fr"><font color="#00006A"><i>lievre@ined.fr</i></font></a><font
color="#00006A">) </font></h4>

<hr>

<ul>
<li><a href="#intro">Introduction</a> </li>
<li>The detailed statistical model (<a href="docmath.pdf">PDF
version</a>, <a href="docmath.ps">PostScript version</a>) </li>
<li><a href="#data">On what kind of data can it be used?</a></li>
<li><a href="#datafile">The data file</a> </li>
<li><a href="#biaspar">The parameter file</a> </li>
<li><a href="#running">Running IMaCh</a> </li>
<li><a href="#output">Output files and graphs</a> </li>
<li><a href="#example">Example</a> </li>
</ul>

<hr>

<h2><a name="intro"><font color="#00006A">Introduction</font></a></h2>

<p>This program computes <b>Healthy Life Expectancies</b> from <b>cross-longitudinal
data</b>. Within the family of Health Expectancies (HE),
Disability-free life expectancy (DFLE) is probably the most
important index to monitor. In low-mortality countries, there is
a fear that, when mortality declines, the increase in DFLE is not
proportionate to the increase in total life expectancy. This case
is called the <em>Expansion of morbidity</em>. Most of the data
collected today, in particular by the international <a
href="http://euroreves/reves">REVES</a> network on Health
expectancy, and most HE indices based on these data, are <em>cross-sectional</em>.
This means that the information collected comes from a single
cross-sectional survey: people of various ages (but mostly old
people) are surveyed on their health status at a single date.
The proportion of people disabled at each age can then be measured
at that date. This age-specific prevalence curve is then used to
distinguish, within the stationary population (which, by
definition, is the life table estimated from the vital statistics
on mortality at the same date), the disabled population from the
disability-free population. Life expectancy (LE) (or total
population divided by the yearly number of births or deaths of
this stationary population) is then decomposed into DFLE and
disabled life expectancy (DLE). This method of computing HE is
usually called the Sullivan method (after the author who first
described it).</p>

<p>Age-specific proportions of disabled people are very difficult
to forecast, because each proportion reflects the historical
conditions of the cohort: it results from the past flows into
disability and back to good health, up to today. The age-specific
intensities (or incidence rates) of entering disability or of
recovering good health reflect current conditions and can
therefore be used at each age to forecast the future of this
cohort. For example, if a country improves its prosthesis
technology, the incidence of recovering the ability to walk will
be higher at each (old) age, but the prevalence of disability will
reflect this improvement only slightly, because prevalence is
mostly affected by the history of the cohort and not by recent
period effects. To measure the period improvement we have to
simulate the future of a cohort of newborns entering or leaving
the disability state, or dying, at each age according to the
incidence rates measured today on different cohorts. In the
example of an improvement, the proportion of people disabled at
each age in this simulated cohort will be much lower than the
proportions observed at each age in a cross-sectional survey.
Introduced in a life table, this new prevalence curve gives a
much more up-to-date and realistic HE level than the Sullivan
method, which mostly measures the history of health conditions in
the country.</p>

<p>The main question is therefore how to measure incidence rates
from cross-longitudinal surveys. This is the goal of the IMaCh
program. From your data, IMaCh estimates period HE, and not only
Sullivan's HE, together with the standard errors of the HE.</p>

<p>A cross-longitudinal survey consists of a first survey
("cross") where individuals of different ages are
interviewed on their health status or degree of disability. At
least one further wave of interviews ("longitudinal")
measures the new health status of each individual. Health
expectancies are computed from the transitions observed between
waves, and are computed for each degree of severity of disability
(number of life states). The more degrees you consider, the more
time is needed to reach the maximum of the likelihood of the
parameters involved in the model. Considering only two states of
disability (disabled and healthy) is generally enough, but the
program also works with more health statuses.<br>
<br>
The simplest model is the multinomial logistic model where <i>pij</i>
is the probability of being observed in state <i>j</i> at the second
wave conditional on being observed in state <em>i</em> at the first
wave. A simple model is therefore: log<em>(pij/pii)= aij +
bij*age+ cij*sex,</em> where '<i>age</i>' is age and '<i>sex</i>'
is a covariate. The advantage claimed by this program is that,
if the delay between waves is not identical for each individual,
or if some individuals missed an interview, the information is
not rounded or lost but taken into account through interpolation
or extrapolation. <i>hPijx</i> is the probability of being observed
in state <i>j</i> at age <i>x+h</i> conditional on the observed
state <i>i</i> at age <i>x</i>. The delay '<i>h</i>' can be split
into an exact number <i>nh</i> of elementary steps of
<i>stepm</i> months each (<i>h</i> = <i>nh*stepm</i>), with
unobserved intermediate states. This elementary transition
(by month, quarter, semester or year) is modeled as a multinomial
logistic. The <i>hPx</i> matrix is simply the matrix product of
these <i>nh</i> elementary matrices, and the contribution of each
individual to the likelihood is simply <i>hPijx</i>.
<br>
</p>
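
<p>The following Python sketch illustrates this model; it is not IMaCh's own code. It assumes, for illustration, that the elementary step lasts <i>stepm</i> months, that the age entering the logistic formula is the age (in years) at the start of each step, and that state 3 is death. The parameter values are the monthly estimates printed in <b>rbiaspar.txt</b> (see the output section); the function names are hypothetical.</p>

<pre>
import math

# Monthly estimates printed in rbiaspar.txt (stepm=1: one step = 1 month).
# Key (i, j) with j != i: log(pij/pii) = aij + bij*age.  State 3 is death.
PARAMS = {
    (1, 2): (-12.691743, 0.095819),
    (1, 3): (-7.815392, 0.031851),
    (2, 1): (-1.809895, -0.030470),
    (2, 3): (-7.838248, 0.039490),
}
NLSTATE, NDEATH = 2, 1
NSTATES = NLSTATE + NDEATH


def elementary_matrix(age):
    """One-step transition matrix of the multinomial logistic model."""
    p = [[0.0] * NSTATES for _ in range(NSTATES)]
    for i in range(1, NLSTATE + 1):
        odds = {j: math.exp(a + b * age)
                for (k, j), (a, b) in PARAMS.items() if k == i}
        pii = 1.0 / (1.0 + sum(odds.values()))
        p[i - 1][i - 1] = pii
        for j, o in odds.items():
            p[i - 1][j - 1] = pii * o      # pij = pii * exp(aij + bij*age)
    for d in range(NLSTATE, NSTATES):
        p[d][d] = 1.0                       # death is absorbing
    return p


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def h_step_matrix(x, nh, stepm):
    """hPx: product of nh elementary matrices of stepm months each."""
    result = [[float(i == j) for j in range(NSTATES)] for i in range(NSTATES)]
    for step in range(nh):
        result = matmul(result, elementary_matrix(x + step * stepm / 12.0))
    return result


def likelihood_contribution(x, nh, stepm, i, j):
    """Log of hPijx for one observed transition from state i to state j."""
    return math.log(h_step_matrix(x, nh, stepm)[i - 1][j - 1])


# Healthy at age 70, disabled 24 months later, with stepm=1 (so nh=24).
print(likelihood_contribution(70.0, 24, 1, 1, 2))
</pre>

<p>Because each elementary matrix is stochastic, every row of <i>hPx</i> sums to one whatever <i>nh</i> is; in this sketch a transition to death simply reads the third column.</p>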

<p>The program presented in this manual is a fairly general
program named <strong>IMaCh</strong> (for <strong>I</strong>nterpolated
<strong>MA</strong>rkov <strong>CH</strong>ain), designed to
analyse transition data from longitudinal surveys. The first step
is the estimation of the parameters of a model of transition
probabilities between an initial status and a final status. From
there, the program produces indicators such as observed and
stationary prevalence, life expectancies and their variances, and
graphs. Our transition model consists of absorbing and
non-absorbing states, with the possibility of return across the
non-absorbing states. The main advantage of this package,
compared to other programs for the analysis of transition data
(for example Proc Catmod of SAS<sup>®</sup>), is that all the
information on each individual is used even if an interview is
missing, if a status or a date is unknown, or when the delay
between waves is not identical for each individual. The run is
controlled by parameters: selection of a sub-sample, number of
absorbing and non-absorbing states, number of waves taken into
account (the user inputs the first and the last interview), a
tolerance level for the maximization function, the periodicity of
the transitions (we can compute annual, quarterly or monthly
transitions) and the covariates in the model. It works on Windows
or on Unix.<br>
</p>

<hr>

<h2><a name="data"><font color="#00006A">On what kind of data can
it be used?</font></a></h2>

<p>The minimum data required for a transition model is the
recording of a set of individuals interviewed at a first date and
interviewed again at least once more. From the observations of an
individual, we obtain a follow-up over time of the occurrence of a
specific event. In this documentation, the event is related to
health status at older ages, but the program can be applied to
many longitudinal studies in different contexts. To build the data
file explained in the next section, you must have the month and
year of each interview and the corresponding health status. To
compute ages, the date of birth (month and year) is required (a
missing month is allowed). The date of death (month and year) is
also required if the individual has died. Shorter steps (e.g. a
month) take the survival time after the last interview into
account more closely.</p>
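
<p>As an illustration only, the Python sketch below derives an age from two (month, year) dates. Placing each date at the middle of its month, and a date with an unknown month at the middle of the year, is an assumption of this sketch, not a documented IMaCh rule; the function names are hypothetical.</p>

<pre>
def decimal_date(month, year):
    """Convert (month, year) to a decimal year, at the middle of the month.

    A missing month (None) is placed at mid-year: an assumption of this sketch.
    """
    if month is None:
        return year + 0.5
    return year + (month - 0.5) / 12.0


def age_at(birth, interview):
    """Age in years at an interview; both dates are (month, year) pairs."""
    return decimal_date(*interview) - decimal_date(*birth)


print(age_at((3, 1915), (6, 1984)))     # born 03/1915, seen 06/1984: about 69.25
print(age_at((None, 1915), (6, 1984)))  # month of birth unknown: about 68.96
</pre>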

<hr>

<h2><a name="datafile"><font color="#00006A">The data file</font></a></h2>

<p>In this example, 8,000 people have been interviewed in a
cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990).
Some people missed 1, 2 or 3 interviews. Health statuses are
healthy (1) and disabled (2). The survey is not a real one: it is
a simulation of the American Longitudinal Survey on Aging. An
individual is considered disabled if he or she cannot perform one
of four ADL (Activities of Daily Living, such as bathing, eating
or walking). Therefore, even if the individuals interviewed in the
sample are virtual, the information brought by this sample is
close to the situation of the United States. Sex is not recorded
in this sample.</p>

<p>Each line of the data set (named <a href="data1.txt">data1.txt</a>
in this first example) is an individual record whose fields are: </p>

<ul>
<li><b>Index number</b>: positive number (field 1) </li>
<li><b>First covariate</b>: positive number (field 2) </li>
<li><b>Second covariate</b>: positive number (field 3) </li>
<li><a name="Weight"><b>Weight</b></a>: positive number
(field 4). In most surveys individuals are weighted
according to the stratification of the sample.</li>
<li><b>Date of birth</b>: coded as mm/yyyy. Missing dates are
coded as 99/9999 (field 5) </li>
<li><b>Date of death</b>: coded as mm/yyyy. Missing dates are
coded as 99/9999 (field 6) </li>
<li><b>Date of first interview</b>: coded as mm/yyyy. Missing
dates are coded as 99/9999 (field 7) </li>
<li><b>Status at first interview</b>: positive number.
Missing values are coded -1 (field 8) </li>
<li><b>Date of second interview</b>: coded as mm/yyyy.
Missing dates are coded as 99/9999 (field 9) </li>
<li><strong>Status at second interview</strong>: positive
number. Missing values are coded -1 (field 10) </li>
<li><b>Date of third interview</b>: coded as mm/yyyy. Missing
dates are coded as 99/9999 (field 11) </li>
<li><strong>Status at third interview</strong>: positive
number. Missing values are coded -1 (field 12) </li>
<li><b>Date of fourth interview</b>: coded as mm/yyyy.
Missing dates are coded as 99/9999 (field 13) </li>
<li><strong>Status at fourth interview</strong>: positive
number. Missing values are coded -1 (field 14) </li>
<li>etc.</li>
</ul>

<p> </p>

<p>If your longitudinal survey does not include information about
weights or covariates, you must fill the corresponding column with
a number (e.g. 1), because a missing field is not allowed.</p>
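
<p>A minimal Python sketch of reading one record with this layout, assuming whitespace-separated fields and <i>maxwav</i> (date, status) pairs after field 6. It is not IMaCh's reader and the function names are hypothetical. Following the rule above, a line with a missing field is rejected, and the codes 99/9999 and -1 become <code>None</code> so that later steps cannot mistake them for real values.</p>

<pre>
def parse_date(text):
    """'mm/yyyy' -> (month, year); the codes 99 and 9999 become None."""
    month, year = (int(part) for part in text.split("/"))
    return (None if month == 99 else month, None if year == 9999 else year)


def parse_record(line, maxwav=4):
    """Split one line of the data file into named fields."""
    f = line.split()
    if len(f) != 6 + 2 * maxwav:
        raise ValueError(f"expected {6 + 2 * maxwav} fields, got {len(f)}")
    record = {
        "index": int(f[0]),
        "covariates": (float(f[1]), float(f[2])),
        "weight": float(f[3]),
        "birth": parse_date(f[4]),
        "death": parse_date(f[5]),
        "waves": [],
    }
    for w in range(maxwav):
        date, status = parse_date(f[6 + 2 * w]), int(f[7 + 2 * w])
        record["waves"].append((date, None if status == -1 else status))
    return record
</pre>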

<hr>

<h2><font color="#00006A">Your first example parameter file</font><a
href="http://euroreves.ined.fr/imach"></a><a name="uio"></a></h2>

<h2><a name="biaspar"></a>#Imach version 0.63, February 2000,
INED-EUROREVES </h2>

<p>This is a comment. Comments start with a '#'.</p>

<h4><font color="#FF0000">First uncommented line</font></h4>

<pre>title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4</pre>

<ul>
<li><b>title=</b>1st_example is the title of the run. </li>
<li><b>datafile=</b>data1.txt is the name of the data set.
Our example is a six-year follow-up survey. It consists
of a baseline followed by 3 re-interviews. </li>
<li><b>lastobs=</b>8600: the program can run on a
subsample made of the records up to number lastobs.
It can be set to a number larger than the real number of
observations (e.g. 100000). In this example, the maximisation
is done on the first 8600 records. </li>
<li><b>firstpass=1</b>, <b>lastpass=4</b>: when the survey
has more than two interviews, the program can be run
on selected transition periods. firstpass=1 means that the
first interview included in the calculation is the
baseline survey. lastpass=4 means that the information
brought by the 4th interview is taken into account.</li>
</ul>
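
<p>The <i>key=value</i> layout of this line can be read with a few lines of Python. This hypothetical reader accepts the keys in any order; IMaCh itself may expect them exactly in the order shown, which the sketch does not check.</p>

<pre>
def convert(value):
    """Turn '8600' into 8600 and '1.e-08' into 1e-08; keep other text as is."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_settings(line):
    """Parse a 'key=value key=value ...' line of the parameter file."""
    settings = {}
    for token in line.split():
        key, _, value = token.partition("=")
        settings[key] = convert(value)
    return settings


first = parse_settings(
    "title=1st_example datafile=data1.txt lastobs=8600 firstpass=1 lastpass=4")
print(first["datafile"], first["lastobs"])   # data1.txt 8600
</pre>

<p>The same function also reads the second uncommented line; for instance, <i>ftol=1.e-08</i> becomes the number 1e-08.</p>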

<p> </p>

<h4><a name="biaspar-2"><font color="#FF0000">Second uncommented
line</font></a></h4>

<pre>ftol=1.e-08 stepm=1 ncov=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0</pre>

<ul>
<li><b>ftol=1e-8</b>: convergence tolerance on the function
value in the maximisation of the likelihood. Choosing a
correct value for ftol is difficult; 1e-8 is a suitable
value on a 32-bit computer.</li>
<li><b>stepm=1</b>: time unit, in months, for the interpolation.
Examples:<ul>
<li>If stepm=1, the unit is a month </li>
<li>If stepm=3, the unit is a quarter (trimester)</li>
<li>If stepm=12, the unit is a year </li>
<li>If stepm=24, the unit is two years</li>
<li>... </li>
</ul>
</li>
<li><b>ncov=2</b>: number of covariates in the
model. The intercept and the age parameter count
as 2 covariates. For example, if you want to add gender
to the covariate vector you must write ncov=3, otherwise
ncov=2. </li>
<li><b>nlstate=2</b>: number of non-absorbing (live) states.
Here we have two live states: disability-free is coded 1
and disability is coded 2. </li>
<li><b>ndeath=1</b>: number of absorbing states. The absorbing
state death is coded 3. </li>
<li><b>maxwav=4</b>: maximum number of waves. The program
does not include more than 4 interviews here. </li>
<li><a name="mle"><b>mle</b></a><b>=1</b>: option for the
Maximum Likelihood Estimation. <ul>
<li>If mle=1 the program does the maximisation and
the calculation of health expectancies </li>
<li>If mle=0 the program only does the calculation of
the health expectancies. </li>
</ul>
</li>
<li><b>weight=0</b>: possibility to add weights. <ul>
<li>If weight=0 no weights are included </li>
<li>If weight=1 the maximisation integrates the
weights which are in field <a href="#Weight">4</a></li>
</ul>
</li>
</ul>
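
<p>To see how <i>stepm</i> relates an observed delay to the number <i>nh</i> of elementary steps, here is a small Python sketch. Rounding the delay to the nearest whole number of steps is an assumption made for illustration; this manual does not say how IMaCh treats delays that are not a multiple of <i>stepm</i>. The function names are hypothetical.</p>

<pre>
def months_between(first, second):
    """Whole months between two (month, year) dates."""
    (m1, y1), (m2, y2) = first, second
    return 12 * (y2 - y1) + (m2 - m1)


def n_steps(first, second, stepm):
    """Number nh of elementary steps of stepm months (rounded: an assumption)."""
    return round(months_between(first, second) / stepm)


# Interviews in 06/1984 and 09/1986 are 27 months apart.
for stepm in (1, 3, 12, 24):
    print(stepm, n_steps((6, 1984), (9, 1986), stepm))
</pre>

<p>The loop prints 27, 9, 2 and 1 steps for stepm=1, 3, 12 and 24: the coarser the unit, the more the actual delay is approximated.</p>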

<h4><font color="#FF0000">Guess values for optimization</font><font
color="#00006A"> </font></h4>

<p>You must write the initial guess values of the parameters for
optimization. The number of parameters, <em>N</em>, depends on the
number of absorbing and non-absorbing states and on the
number of covariates. <br>
<em>N</em> is given by the formula <em>N</em>=(<em>nlstate</em> +
<em>ndeath</em>-1)*<em>nlstate</em>*<em>ncov</em> . <br>
<br>
Thus, in the simple case with 2 covariates (the model is log
(pij/pii) = aij + bij * age, where intercept and age are the two
covariates), 2 health degrees (1 for disability-free and 2
for disability) and 1 absorbing state (3), you must enter
(2+1-1)*2*2 = 8 initial values: a12, b12, a13, b13, a21, b21, a23, b23.
You can start with zeros as in this example, but if you have a more
precise set (for example from an earlier run) you can enter it,
and it will speed up the maximisation.<br>
Each of the four lines starts with indices "ij": <br>
<br>
<b>ij aij bij</b> </p>

<blockquote>
<pre># Guess values of aij and bij in log (pij/pii) = aij + bij * age
12 -14.155633 0.110794
13 -7.925360 0.032091
21 -1.890135 -0.029473
23 -6.234642 0.022315 </pre>
</blockquote>

<p>or, to simplify: </p>

<blockquote>
<pre>12 0.0 0.0
13 0.0 0.0
21 0.0 0.0
23 0.0 0.0</pre>
</blockquote>
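
<p>The count <i>N</i> and the order of the guess values can be checked with this Python sketch (the function names are ours). Naming a third covariate <i>c</i>, as in <i>cij*sex</i> in the introduction, is the only convention assumed beyond the text above, and the labels assume one-digit state numbers.</p>

<pre>
def n_parameters(nlstate, ndeath, ncov):
    """N = (nlstate + ndeath - 1) * nlstate * ncov."""
    return (nlstate + ndeath - 1) * nlstate * ncov


def parameter_labels(nlstate, ndeath, ncov):
    """Names in the order of the guess-value lines: a12, b12, a13, ..."""
    letters = "abcdefghij"          # a = intercept, b = age, c = next covariate
    labels = []
    for i in range(1, nlstate + 1):                 # origin: a live state
        for j in range(1, nlstate + ndeath + 1):    # destination: another state
            if j != i:
                labels.extend(letters[k] + str(i) + str(j) for k in range(ncov))
    return labels


print(n_parameters(2, 1, 2), parameter_labels(2, 1, 2))
# 8 ['a12', 'b12', 'a13', 'b13', 'a21', 'b21', 'a23', 'b23']
</pre>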

<h4><font color="#FF0000">Guess values for computing variances</font></h4>

<p>This is an output if <a href="#mle">mle</a>=1. But it can be
used as an input to get the various output data files (health
expectancies, stationary prevalence, etc.) and figures without
rerunning the rather long maximisation phase (mle=0). </p>

<p>The scales are small values for the evaluation of numerical
derivatives. These derivatives are used to compute the hessian
matrix of the parameters, that is the inverse of the covariance
matrix, and the variances of health expectancies. Each line
consists of the indices "ij" followed by the initial scales
(zero to simplify) associated with aij and bij. </p>

<ul>
<li>If mle=1 you can enter zeros:</li>
</ul>

<blockquote>
<pre># Scales (for hessian or gradient estimation)
12 0. 0.
13 0. 0.
21 0. 0.
23 0. 0. </pre>
</blockquote>

<ul>
<li>If mle=0 you must enter a covariance matrix (usually
obtained from an earlier run).</li>
</ul>
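
<p>As a generic illustration of how small step sizes (scales) yield a Hessian and then a covariance matrix, here is a central-difference sketch in Python. The quadratic function is a hypothetical stand-in for minus the log-likelihood, and the names are ours. The steps must be non-zero in this sketch; how IMaCh chooses its steps when the scales are entered as zeros is not described here.</p>

<pre>
def numerical_hessian(f, theta, scales):
    """Central-difference second derivatives of f at theta, steps scales[i]."""
    n = len(theta)

    def f_shifted(i, di, j, dj):
        t = list(theta)
        t[i] += di * scales[i]
        t[j] += dj * scales[j]
        return f(t)

    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            hess[i][j] = (f_shifted(i, 1, j, 1) - f_shifted(i, 1, j, -1)
                          - f_shifted(i, -1, j, 1) + f_shifted(i, -1, j, -1)
                          ) / (4.0 * scales[i] * scales[j])
    return hess


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def minus_log_lik(t):
    # Hypothetical quadratic stand-in whose exact Hessian is [[4, 1], [1, 2]].
    return 0.5 * (4.0 * t[0] ** 2 + 2.0 * t[0] * t[1] + 2.0 * t[1] ** 2)


hessian = numerical_hessian(minus_log_lik, [0.3, -0.2], [1e-3, 1e-3])
covariance = inverse_2x2(hessian)   # close to [[2/7, -1/7], [-1/7, 4/7]]
</pre>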

<h4><font color="#FF0000">Covariance matrix of parameters</font></h4>

<p>This is an output if <a href="#mle">mle</a>=1. But it can be
used as an input to get the various output data files (health
expectancies, stationary prevalence, etc.) and figures without
rerunning the rather long maximisation phase (mle=0). </p>

<p>Each line starts with the indices "ijk" (<i>k</i>=1 for
<i>aij</i>, <i>k</i>=2 for <i>bij</i>) followed by the lower
triangle of the covariance matrix of the parameters: </p>

<pre>
121 Var(a12)
122 Cov(b12,a12) Var(b12)
...
232 Cov(b23,a12) Cov(b23,b12) ... Var(b23) </pre>

<ul>
<li>If mle=1 you can enter zeros. </li>
</ul>

<blockquote>
<pre># Covariance matrix
121 0.
122 0. 0.
131 0. 0. 0.
132 0. 0. 0. 0.
211 0. 0. 0. 0. 0.
212 0. 0. 0. 0. 0. 0.
231 0. 0. 0. 0. 0. 0. 0.
232 0. 0. 0. 0. 0. 0. 0. 0.</pre>
</blockquote>

<ul>
<li>If mle=0 you must enter a covariance matrix (usually
obtained from an earlier run).<br>
</li>
</ul>
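
<p>The lower-triangular layout can be expanded into a full symmetric matrix with a short Python sketch, assuming one-digit state indices and <i>k</i>=1 for <i>aij</i>, 2 for <i>bij</i> (and, by extension, 3 for a third covariate). The function name is hypothetical.</p>

<pre>
EXAMPLE = """# Covariance matrix
121 0.
122 0. 0.
131 0. 0. 0.
132 0. 0. 0. 0.
211 0. 0. 0. 0. 0.
212 0. 0. 0. 0. 0. 0.
231 0. 0. 0. 0. 0. 0. 0.
232 0. 0. 0. 0. 0. 0. 0. 0."""


def parse_covariance(text):
    """Expand the lower-triangular 'ijk' lines into a full symmetric matrix.

    Assumes one-digit indices: i = origin, j = destination, k = 1 for aij,
    2 for bij, 3 for cij, and so on.
    """
    labels, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                                   # skip comments
        code, *values = line.split()
        i, j, k = code
        labels.append("abcdefghij"[int(k) - 1] + i + j)
        rows.append([float(v) for v in values])
    n = len(rows)
    cov = [[0.0] * n for _ in range(n)]
    for r, values in enumerate(rows):
        for c, v in enumerate(values):                 # row r holds columns 0..r
            cov[r][c] = cov[c][r] = v
    return labels, cov


labels, cov = parse_covariance(EXAMPLE)
print(labels)   # ['a12', 'b12', 'a13', 'b13', 'a21', 'b21', 'a23', 'b23']
</pre>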

<h4><a name="biaspar-l"></a><font color="#FF0000">Last
uncommented line</font></h4>

<pre>agemin=70 agemax=100 bage=50 fage=100</pre>

<p>Once the parameters are estimated, the program can calculate
stationary prevalence, transition probabilities and life
expectancies at any age. The choice of age ranges is useful for
extrapolation. In our data file, ages vary from 70 to 102.
Setting bage=50 and fage=100 makes the program compute life
expectancies from age bage to age fage. As we use a model, we can
compute life expectancy over a wider age range than the one
covered by the data, but the model can be rather wrong over large
intervals.</p>

<p>Similarly, it is possible to get extrapolated stationary
prevalence for ages ranging from agemin to agemax. </p>

<ul>
<li><b>agemin=</b> Minimum age for calculation of the
stationary prevalence </li>
<li><b>agemax=</b> Maximum age for calculation of the
stationary prevalence </li>
<li><b>bage=</b> Minimum age for calculation of the health
expectancies </li>
<li><b>fage=</b> Maximum age for calculation of the health
expectancies </li>
</ul>

<hr>

<h2><a name="running"></a><font color="#00006A">Running IMaCh
with this example</font></h2>

<p>We assume that you entered your <a href="biaspar.txt">1st_example
parameter file</a> as explained <a href="#biaspar">above</a>. To
run the program, click on the imach.exe icon and enter
the name of the parameter file, for example <a
href="C:\usr\imach\mle\biaspar.txt">C:\usr\imach\mle\biaspar.txt</a>
(you can also drag the biaspar.txt icon located in <br>
<a href="C:\usr\imach\mle">C:\usr\imach\mle</a> with
the mouse and drop it onto the IMaCh window).<br>
</p>

<p>The time to converge depends on the step unit that you used (a
1-month step is CPU-intensive), on the number of cases, and on the
number of variables.</p>

<p>The program outputs many files. Most of them are meant to be
plotted for better understanding.</p>

<hr>

<h2><a name="output"><font color="#00006A">Output of the program
and graphs</font> </a></h2>

<p>Once the optimization is finished, some graphics can be made
with a plotting program. We use Gnuplot, an interactive plotting
program that is copyrighted but freely distributed. IMaCh outputs
the source of a gnuplot file, named 'graph.gp', which can be
directly input into gnuplot.<br>
When the run is finished, the user should enter a character
for plotting and output editing. </p>

<p>These characters are:</p>

<ul>
<li>'c' to restart the program from the beginning.</li>
<li>'g' to make graphics. The output graphs are in GIF format
and you have no control over which ones are produced. If you
want to modify the graphics or make another one, you
should modify the parameters in the file <b>graph.gp</b>
located in imach\bin. A gnuplot reference manual is
available <a
href="http://www.cs.dartmouth.edu/gnuplot/gnuplot.html">here</a>.
</li>
<li>'e' to open the <strong>index.htm</strong> file, which displays the
output files and graphs. </li>
<li>'q' to exit.</li>
</ul>

<h5><font size="4"><strong>Results files </strong></font><br>
<br>
<font color="#EC5E5E" size="3"><strong>- </strong></font><a
name="Observed prevalence in each state"><font color="#EC5E5E"
size="3"><strong>Observed prevalence in each state</strong></font></a><font
color="#EC5E5E" size="3"><strong> (and at first pass)</strong></font><b>:
</b><a href="prbiaspar.txt"><b>prbiaspar.txt</b></a><br>
</h5>

<p>The first line is a header naming each field of the
file. The first column is age. Fields 2 and 6 are the
proportions of individuals in states 1 and 2 respectively, as
observed during the first exam. The other fields are the numbers
of people in states 1, 2 or more. The number of columns increases
if the number of states is higher than 2.<br>
The header of the file is </p>

<pre># Age Prev(1) N(1) N Age Prev(2) N(2) N
70 1.00000 631 631 70 0.00000 0 631
71 0.99681 625 627 71 0.00319 2 627
72 0.97125 1115 1148 72 0.02875 33 1148 </pre>

<pre># Age Prev(1) N(1) N Age Prev(2) N(2) N
70 0.95721 604 631 70 0.04279 27 631</pre>

<p>This means that at age 70 the prevalence is 1.000 in state 1
and 0.000 in state 2. At age 71 the number of individuals in
state 1 is 625 and in state 2 is 2, hence the total number of
people aged 71 is 625+2=627. <br>
</p>
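
<p>Each proportion is simply N(1)/N or N(2)/N, with N = N(1) + N(2). The Python sketch below, whose function name is ours, rebuilds the three rows of the first excerpt from their counts and prints them in the same layout.</p>

<pre>
# N(1) and N(2) copied from the first prbiaspar.txt excerpt: age: (N(1), N(2)).
counts = {70: (631, 0), 71: (625, 2), 72: (1115, 33)}


def prevalence_rows(counts):
    """Rebuild the Age, Prev(1), N(1), N, Age, Prev(2), N(2), N columns."""
    rows = []
    for age, (n1, n2) in sorted(counts.items()):
        n = n1 + n2                      # N: individuals observed at this age
        rows.append((age, f"{n1 / n:.5f}", n1, n, age, f"{n2 / n:.5f}", n2, n))
    return rows


for row in prevalence_rows(counts):
    print(*row)
</pre>

<p>The three printed lines match the excerpt above, for example <i>71 0.99681 625 627 71 0.00319 2 627</i>.</p>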

<h5><font color="#EC5E5E" size="3"><b>- Estimated parameters and
covariance matrix</b></font><b>: </b><a href="rbiaspar.txt"><b>rbiaspar.txt</b></a></h5>

<p>This file contains all the maximisation results: </p>

<pre> Number of iterations=47
-2 log likelihood=46553.005854373667
Estimated parameters: a12 = -12.691743 b12 = 0.095819
a13 = -7.815392 b13 = 0.031851
a21 = -1.809895 b21 = -0.030470
a23 = -7.838248 b23 = 0.039490
Covariance matrix: Var(a12) = 1.03611e-001
Var(b12) = 1.51173e-005
Var(a13) = 1.08952e-001
Var(b13) = 1.68520e-005
Var(a21) = 4.82801e-001
Var(b21) = 6.86392e-005
Var(a23) = 2.27587e-001
Var(b23) = 3.04465e-005
</pre>

<h5><font color="#EC5E5E" size="3"><b>- Transition probabilities</b></font><b>:
</b><a href="pijrbiaspar.txt"><b>pijrbiaspar.txt</b></a></h5>

<p>Here are the transition probabilities Pij(x, x+nh), where nh
is a multiple of 2 years. The first column is the starting age x
(from age 50 to 100), the second is age (x+nh) and the others are
the transition probabilities p11, p12, p13, p21, p22, p23. For
example, line 5 of the file is: </p>

<pre> 100 106 0.03286 0.23512 0.73202 0.02330 0.19210 0.78460 </pre>

<p>and this means: </p>

<pre>p11(100,106)=0.03286
p12(100,106)=0.23512
p13(100,106)=0.73202
p21(100,106)=0.02330
p22(100,106)=0.19210
p23(100,106)=0.78460 </pre>
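
<p>Starting from a live state, an individual ends up healthy, disabled or dead, so p11+p12+p13 and p21+p22+p23 must both equal one. A short Python check of line 5, assuming the column order given above (the function name is ours):</p>

<pre>
line = "100 106 0.03286 0.23512 0.73202 0.02330 0.19210 0.78460"


def parse_pij(line, nlstate=2, ndeath=1):
    """Return x, x+h and a {(i, j): pij} dict for one line of pijrbiaspar.txt."""
    fields = line.split()
    x, x_h = int(fields[0]), int(fields[1])
    values = iter(float(v) for v in fields[2:])
    nstates = nlstate + ndeath
    probs = {(i, j): next(values)
             for i in range(1, nlstate + 1) for j in range(1, nstates + 1)}
    return x, x_h, probs


x, x_h, p = parse_pij(line)
for i in (1, 2):
    # From live state i: healthy, disabled or dead at age x+h.
    print(i, round(sum(p[(i, j)] for j in (1, 2, 3)), 5))
</pre>

<p>Both sums print as 1.0 for this line.</p>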

<h5><font color="#EC5E5E" size="3"><b>- </b></font><a
name="Stationary prevalence in each state"><font color="#EC5E5E"
size="3"><b>Stationary prevalence in each state</b></font></a><b>:
</b><a href="plrbiaspar.txt"><b>plrbiaspar.txt</b></a></h5>

<pre>#Age 1-1 2-2
70 0.92274 0.07726
71 0.91420 0.08580
72 0.90481 0.09519
73 0.89453 0.10547</pre>

<p>At age 70 the stationary prevalence is 0.92274 in state 1 and
0.07726 in state 2. This stationary prevalence differs from the
observed prevalence, and this is the key point. The observed
prevalence at age 70 results from the incidence of disability, the
incidence of recovery and the mortality which occurred in the past
of the cohort. The stationary prevalence results from a simulation
with current incidences and mortality (estimated from this
cross-longitudinal survey). It is the best prediction of future
prevalence if "nothing changes in the future". This is
exactly what demographers do with a life table: life expectancy
is the expected mean time to survive if the observed mortality
rates (incidence of mortality) "remain constant" in the
future. </p>

<h5><font color="#EC5E5E" size="3"><b>- Standard deviation of
stationary prevalence</b></font><b>: </b><a
href="vplrbiaspar.txt"><b>vplrbiaspar.txt</b></a></h5>

<p>The stationary prevalence has to be compared with the observed
prevalence by age. But both are statistical estimates and
subject to stochastic errors due to the size of the sample, the
design of the survey and, for the stationary prevalence, the
model used and fitted. It is possible to compute the standard
deviation of the stationary prevalence at each age.</p>

<h6><font color="#EC5E5E" size="3">Observed and stationary
prevalence in state (2=disabled) with the confidence interval</font>:<b>
vbiaspar2.gif</b></h6>

<p><br>
This graph exhibits the stationary prevalence in state (2) with
the confidence interval in red. The green curve is the observed
prevalence (or proportion of individuals in state (2)). Without
discussing the results (it is not the purpose here), we observe
that the green curve is rather below the stationary prevalence.
This suggests an increase of the disability prevalence in the
future.</p>

<p><img src="vbiaspar2.gif" width="400" height="300"></p>

<h6><font color="#EC5E5E" size="3"><b>Convergence to the
stationary prevalence of disability</b></font><b>: pbiaspar1.gif</b><br>
<img src="pbiaspar1.gif" width="400" height="300"> </h6>

<p>This graph plots the conditional transition probabilities from
an initial state (1=healthy in red at the bottom, or 2=disabled in
green on top) at age <em>x </em>to the final state 2=disabled<em> </em>at
age <em>x+h</em>. Conditional means conditional on being alive
at age <em>x+h</em>, which, starting from state <em>i</em>, has
probability <i>hPi1x</i> + <i>hPi2x</i>. The curves
<i>hP12x</i>/(<i>hP11x</i> + <i>hP12x</i>) and
<i>hP22x</i>/(<i>hP21x</i> + <i>hP22x</i>) converge with
<em>h</em> to the <em>stationary prevalence of disability</em>.
In order to get the stationary prevalence at age 70 we should
therefore start the process at an earlier age, e.g. 50. If the
disability state is defined by severe disability criteria with
little chance of recovery, then the incidence of recovery is low
and the time to convergence is probably longer, but we do not yet
have experience with such cases.</p>
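
<p>The convergence can be reproduced in principle with a Python sketch, assuming the monthly multinomial logistic model with the estimates of <b>rbiaspar.txt</b> and ages in years. The process starts <i>h</i> years before age 70, in state 1 or in state 2, multiplies monthly matrices up to age 70, and conditions on being alive. This is an illustration, not IMaCh's algorithm, and it is not claimed to reproduce <b>plrbiaspar.txt</b> exactly; the function names are hypothetical.</p>

<pre>
import math

# Monthly estimates (stepm=1) from rbiaspar.txt; state 3 is death.
PARAMS = {(1, 2): (-12.691743, 0.095819), (1, 3): (-7.815392, 0.031851),
          (2, 1): (-1.809895, -0.030470), (2, 3): (-7.838248, 0.039490)}


def monthly_matrix(age):
    """3x3 one-month transition matrix: log(pij/pii) = aij + bij*age."""
    p = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    for i in (1, 2):
        odds = {j: math.exp(a + b * age)
                for (k, j), (a, b) in PARAMS.items() if k == i}
        pii = 1.0 / (1.0 + sum(odds.values()))
        p[i - 1][i - 1] = pii
        for j, o in odds.items():
            p[i - 1][j - 1] = pii * o
    return p


def conditional_disability(x, h):
    """Share disabled at age x among survivors, starting at age x-h
    healthy (first value) or disabled (second value)."""
    rows = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]      # start in state 1 / state 2
    for month in range(12 * h):
        p = monthly_matrix(x - h + month / 12.0)
        rows = [[sum(r[k] * p[k][j] for k in range(3)) for j in range(3)]
                for r in rows]
    return [r[1] / (r[0] + r[1]) for r in rows]    # condition on being alive


for h in (1, 5, 10, 20):
    print(h, [round(v, 4) for v in conditional_disability(70, h)])
</pre>

<p>As <i>h</i> grows, the two printed values (starting healthy, starting disabled) move closer together, which is the convergence shown in the graph.</p>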
673:
674: <h5><font color="#EC5E5E" size="3"><b>- Life expectancies by age
675: and initial health status</b></font><b>: </b><a
676: href="erbiaspar.txt"><b>erbiaspar.txt</b></a></h5>
677:
678: <pre># Health expectancies
679: # Age 1-1 1-2 2-1 2-2
680: 70 10.7297 2.7809 6.3440 5.9813
681: 71 10.3078 2.8233 5.9295 5.9959
682: 72 9.8927 2.8643 5.5305 6.0033
683: 73 9.4848 2.9036 5.1474 6.0035 </pre>
684:
685: <pre>For example 70 10.7297 2.7809 6.3440 5.9813 means:
686: e11=10.7297 e12=2.7809 e21=6.3440 e22=5.9813</pre>
687:
688: <pre><img src="exbiaspar1.gif" width="400" height="300"><img
689: src="exbiaspar2.gif" width="400" height="300"></pre>
690:
691: <p>For example, life expectancy of a healthy individual at age 70
692: is 10.73 in the healthy state and 2.78 in the disability state
693: (=13.51 years). If he was disable at age 70, his life expectancy
694: will be shorter, 6.34 in the healthy state and 5.98 in the
695: disability state (=12.32 years). The total life expectancy is a
696: weighted mean of both, 13.51 and 12.32; weight is the proportion
697: of people disabled at age 70. In order to get a pure period index
698: (i.e. based only on incidences) we use the <a
699: href="#Stationary prevalence in each state">computed or
700: stationary prevalence</a> at age 70 (i.e. computed from
701: incidences at earlier ages) instead of the <a
702: href="#Observed prevalence in each state">observed prevalence</a>
703: (for example at first exam) (<a href="#Health expectancies">see
704: below</a>).</p>
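<p>The arithmetic of this paragraph can be checked with a short Python sketch. The values are those of the age-70 line of erbiaspar.txt shown above. The dictionary layout and the function name are our own choices, for illustration only.</p>

```python
# Age-70 line of erbiaspar.txt: E[(i, j)] is the expected number of years
# spent in state j by a person in state i at age 70 (1 = healthy, 2 = disabled).
E = {
    (1, 1): 10.7297, (1, 2): 2.7809,
    (2, 1): 6.3440, (2, 2): 5.9813,
}


def life_expectancy_from(state, e=E):
    """Total life expectancy e_i. of a person in `state` at age 70."""
    return sum(years for (origin, _), years in e.items() if origin == state)


e1_dot = life_expectancy_from(1)  # starting healthy
e2_dot = life_expectancy_from(2)  # starting disabled
print(f"e1. = {e1_dot:.2f} years, e2. = {e2_dot:.2f} years")
# prints: e1. = 13.51 years, e2. = 12.33 years
```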
705:
706: <h5><font color="#EC5E5E" size="3"><b>- Variances of life
707: expectancies by age and initial health status</b></font><b>: </b><a
708: href="vrbiaspar.txt"><b>vrbiaspar.txt</b></a></h5>
709:
710: <p>For example, the covariances of life expectancies Cov(ei,ej)
711: at age 50 (line 3 of the file) are:</p>
712:
713: <pre> Cov(e1,e1)=0.4667 Cov(e1,e2)=0.0605=Cov(e2,e1) Cov(e2,e2)=0.0183</pre>
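<p>In such a covariance matrix, the diagonal terms are variances, whose square roots are standard errors. The off-diagonal term measures how the two estimates vary together. Below is a minimal Python sketch using the three values quoted above. The variable names are ours.</p>

```python
import math

# Covariances of the life-expectancy estimates at age 50 (line 3 of vrbiaspar.txt).
cov_11, cov_12, cov_22 = 0.4667, 0.0605, 0.0183

se_1 = math.sqrt(cov_11)                # standard error of e1
se_2 = math.sqrt(cov_22)                # standard error of e2
corr_12 = cov_12 / (se_1 * se_2)        # correlation between the two estimates
var_sum = cov_11 + 2 * cov_12 + cov_22  # Var(e1 + e2) includes the covariance term

print(f"se(e1) = {se_1:.4f}, se(e2) = {se_2:.4f}, corr = {corr_12:.3f}")
print(f"se(e1 + e2) = {math.sqrt(var_sum):.4f}")
```

<p>Because the covariance is positive, the standard error of a sum such as e1 + e2 is larger than it would be if the two estimates were treated as independent.</p>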
714:
715: <h5><font color="#EC5E5E" size="3"><b>- </b></font><a
716: name="Health expectancies"><font color="#EC5E5E" size="3"><b>Health
717: expectancies</b></font></a><font color="#EC5E5E" size="3"><b>
718: with standard errors in parentheses</b></font><b>: </b><a
719: href="trbiaspar.txt"><font face="Courier New"><b>trbiaspar.txt</b></font></a></h5>
720:
721: <pre>#Total LEs with variances: e.. (std) e.1 (std) e.2 (std) </pre>
722:
723: <pre>70 13.42 (0.18) 10.39 (0.15) 3.03 (0.10)
70 13.81 (0.18) 11.28 (0.14) 2.53 (0.09) </pre>
724:
725: <p>Thus, at age 70 the total life expectancy, e..=13.42 years, is
726: the mean of e1.=13.51 and e2.=12.33 weighted by the stationary
727: prevalences at age 70, which are 0.92274 in state 1 and 0.07726 in
728: state 2, respectively (they sum to one). e.1=10.39 is the
729: disability-free life expectancy at age 70 (again a weighted
730: mean, here of e11 and e21). Similarly, e.2=3.03 is the life expectancy at age
731: 70 to be spent in the disability state (a weighted mean of e12 and e22).</p>
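<p>The weighted means can be reproduced in a few lines of Python. The sketch takes the age-70 health expectancies from erbiaspar.txt and the stationary prevalences quoted in this paragraph, and recovers the first age-70 record of trbiaspar.txt to two decimals. Names are illustrative only.</p>

```python
# Age-70 health expectancies from erbiaspar.txt (1 = healthy, 2 = disabled)
# and the stationary prevalences at age 70 quoted in the text.
E = {
    (1, 1): 10.7297, (1, 2): 2.7809,
    (2, 1): 6.3440, (2, 2): 5.9813,
}
PREV = {1: 0.92274, 2: 0.07726}
STATES = (1, 2)

# e.j: prevalence-weighted mean over initial states i of e_ij.
e_dot = {j: sum(PREV[i] * E[(i, j)] for i in STATES) for j in STATES}
# e..: total life expectancy, the sum of the expectancies in each state.
e_total = sum(e_dot.values())

print(f"e.. = {e_total:.2f}, e.1 = {e_dot[1]:.2f}, e.2 = {e_dot[2]:.2f}")
# prints: e.. = 13.42, e.1 = 10.39, e.2 = 3.03
```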
732:
733: <h6><font color="#EC5E5E" size="3"><b>Total life expectancy by
734: age and health expectancies in states (1=healthy) and (2=disabled)</b></font><b>:
735: ebiaspar.gif</b></h6>
736:
737: <p>This figure represents the health expectancies and the total
738: life expectancy, with the confidence interval shown as dashed curves. </p>
739:
740: <pre> <img src="ebiaspar.gif" width="400" height="300"></pre>
741:
742: <p>Standard deviations of these quantities (obtained from the information matrix of
743: the model) are very useful.
744: Cross-longitudinal surveys are costly and do not involve huge
745: samples, generally a few thousand individuals; therefore it is very
746: important to have an idea of the standard deviation of our
747: estimates. Computing the standard deviations of health
748: expectancies has been a big challenge. Do not be confused: life expectancy
749: is, like any expected value, the mean of a distribution; but here
750: we are not computing the standard deviation of the distribution,
751: but the standard deviation of the estimate of the mean (its standard error).</p>
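<p>A standard error becomes a confidence interval for the mean once a distribution is assumed for the estimator. The Python sketch below uses the usual large-sample normal (Wald) approximation for maximum-likelihood estimates, giving estimate ± 1.96 × standard error. It takes the first age-70 record of trbiaspar.txt as input. We do not claim this is the exact rule behind the dashed curves in the figure. The function name is hypothetical.</p>

```python
Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def normal_interval(estimate, std_error, z=Z_95):
    """Approximate (Wald) confidence interval: estimate +/- z * std_error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# First age-70 record of trbiaspar.txt: (estimate, standard error).
row_70 = {"e..": (13.42, 0.18), "e.1": (10.39, 0.15), "e.2": (3.03, 0.10)}
for name, (est, se) in row_70.items():
    low, high = normal_interval(est, se)
    print(f"{name} = {est:.2f}  95% CI [{low:.2f}, {high:.2f}]")
```

<p>For e.. this gives roughly 13.07 to 13.77 years. That is an interval for the mean life expectancy, not a range for individual lifetimes.</p>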
752:
753: <p>Our health expectancy estimates vary with the sample
754: size (the standard deviations give confidence intervals for
755: the estimates) but also with the model fitted. Let us
756: explain this in more detail.</p>
757:
758: <p>Choosing a model involves at least two kinds of choices. First, we
759: have to decide the number of disability states. Second, we have to
760: specify the model within the logit family: the variables,
761: covariates, confounding factors, etc. to be included.</p>
762:
763: <p>The more disability states we have, the better our demographic
764: description of the disability process, but the smaller the number of
765: transitions between each pair of states and the higher the noise in the
766: measurement. We do not have enough experience with the various
767: models to summarize their advantages and disadvantages, but it is
768: important to note that even with huge and unbiased samples,
769: the total life expectancy computed from a cross-longitudinal
770: survey varies with the number of states. If we define only two
771: states, alive or dead, we obtain the usual life expectancy, in which it
772: is assumed that at each age all people face the same risk of dying.
773: If we split the alive state into healthy and
774: disabled states, since mortality from the disability state is higher
775: than mortality from the healthy state, we introduce
776: heterogeneity in the risk of dying. The total mortality at each
777: age is the mean of the mortality in each state weighted by the
778: prevalence in each state. Therefore, if the proportion of people
779: at each age in each state differs from the stationary
780: equilibrium, there is no reason to find the same total mortality
781: at a particular age. Life expectancy, although a very useful
782: tool, rests on a very strong assumption of homogeneity of the
783: population. Our main purpose is not to measure differential
784: mortality but to measure the expected time spent in the healthy or
785: disability state, in order to maximise the former and minimise the
786: latter. But differential mortality complicates the
787: measurement.</p>
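<p>The weighted-mean argument can be made concrete with hypothetical numbers. In this Python sketch, the death rates and prevalences are invented for illustration. The same state-specific mortality gives a different total mortality when the prevalences differ from the stationary ones.</p>

```python
def total_mortality(prevalence, mortality):
    """Mean of state-specific death rates weighted by the prevalence in each state."""
    if abs(sum(prevalence) - 1.0) > 1e-9:
        raise ValueError("prevalences must sum to one")
    return sum(p * m for p, m in zip(prevalence, mortality))


# Hypothetical death rates at one age: healthy, disabled (disabled is higher).
MORTALITY = [0.04, 0.12]
stationary_prev = [0.90, 0.10]  # hypothetical stationary prevalence
observed_prev = [0.80, 0.20]    # hypothetical observed prevalence

print(f"{total_mortality(stationary_prev, MORTALITY):.3f}")  # prints 0.048
print(f"{total_mortality(observed_prev, MORTALITY):.3f}")    # prints 0.056
```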
788:
789: <p>Incidences of disability or recovery are not affected by the
790: number of states if these states are independent. But incidence
791: estimates do depend on the specification of the model. The more
792: covariates we add to the logit model, the better the model fits, but
793: some covariates are poorly measured and some are confounding
794: factors, as in any statistical model. The procedure to "fit
795: the best model" is similar to logistic regression, which itself is
796: similar to regression analysis. We have not gone that far yet because
797: we also face a severe limitation: the speed of
798: convergence. On a 500 MHz Pentium III, even the simplest model,
799: estimated by month on 8,000 people, may take 4 hours to converge.
800: Also, the program is not yet a statistical package that allows
801: the variables and the model to be specified simply for
802: the maximisation. The current program only allows
803: adding simple variables without interactions, like age+sex but
804: not age+sex+age*sex. Interactions can be added in the source code
805: (three lines have to be changed), but this will
806: never be general enough. What should be remembered is that
807: incidences, or probabilities of moving from one state to another, are
808: affected by the variables specified in the model.</p>
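<p>The difference between age+sex and age+sex+age*sex shows up on the logit scale. In the hypothetical Python sketch below, the coefficient names and values are invented and do not come from IMaCh. The additive model forces the sex effect to be the same at every age. The interaction term lets that effect change with age, which the current program cannot express without editing the source.</p>

```python
# Hypothetical coefficients for one transition (say healthy -> disabled) on the
# logit scale; names and values are invented for illustration only.
COEF = {"const": -13.0, "age": 0.135, "sex": 0.3, "age_sex": -0.01}


def linear_predictor(coef, age, sex, interaction=False):
    """Logit-scale predictor: age + sex, or age + sex + age*sex if `interaction`."""
    eta = coef["const"] + coef["age"] * age + coef["sex"] * sex
    if interaction:
        eta += coef["age_sex"] * age * sex
    return eta


def sex_gap(coef, age, interaction):
    """Difference on the logit scale between sex = 1 and sex = 0 at a given age."""
    return (linear_predictor(coef, age, 1, interaction)
            - linear_predictor(coef, age, 0, interaction))


for interaction in (False, True):
    gaps = [round(sex_gap(COEF, age, interaction), 2) for age in (70, 90)]
    print(f"interaction={interaction}: sex gap at ages 70 and 90 = {gaps}")
```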
809:
810: <p>Also, the age range of the people interviewed is linked to the
811: age range over which life expectancy can be estimated; outside it, one
812: has to extrapolate. If your sample ranges from age 70 to 95, you can
813: clearly estimate a life expectancy at age 70 and trust your
814: confidence interval, which is mostly based on your sample size;
815: but if you want to estimate the life expectancy at age 50, you
816: must rely on your model, and fitting a logistic model on an age
817: range of 70-95 and estimating transition probabilities outside
818: this age range, say at age 50, is very dangerous. At the very least, you
819: should remember that the confidence intervals given by the
820: standard deviations of the health expectancies rest on the
821: strong assumption that your model is the 'true model', which is
822: probably not the case.</p>
823:
824: <h5><font color="#EC5E5E" size="3"><b>- Copy of the parameter
825: file</b></font><b>: </b><a href="orbiaspar.txt"><b>orbiaspar.txt</b></a></h5>
826:
827: <p>This copy of the parameter file can be useful to re-run the
828: program while saving the old output files. </p>
829:
830: <hr>
831:
832: <h2><a name="example"><font color="#00006A">Trying an example</font></a></h2>
833:
834: <p>Now that you know how to run the program, it is time to test it
835: on your own computer. Try, for example, a parameter file named <a
836: href="file://../mytry/imachpar.txt">imachpar.txt</a>, which is a
837: copy of <font size="2" face="Courier New">mypar.txt</font>
838: included in <font size="2"
839: face="Courier New">mytry</font>, a subdirectory of imach. Edit it to change the name of
840: the data file to <font size="2" face="Courier New">..\data\mydata.txt</font>
841: if you do not want to copy the data file into the same directory. The file <font
842: face="Courier New">mydata.txt</font> is a smaller file of 3,000
843: people, but still with 4 waves. </p>
844:
845: <p>Click on the imach.exe icon to open a window. Answer the
846: question '<strong>Enter the parameter file name:</strong>'</p>
847:
848: <table border="1">
849: <tr>
850: <td width="100%"><strong>IMACH, Version 0.63</strong><p><strong>Enter
851: the parameter file name: ..\mytry\imachpar.txt</strong></p>
852: </td>
853: </tr>
854: </table>
855:
856: <p>Most of the generated data and image files will include the
857: 'imachpar' string in their names. The running time is about 2-3
858: minutes on a Pentium III. If the execution worked correctly, the
859: output files are created in the current directory and should be
860: the same as the mypar files initially included in the directory <font
861: size="2" face="Courier New">mytry</font>.</p>
862:
863: <ul>
864: <li><pre><u>Output on the screen</u> The output screen looks like <a
865: href="imachrun.LOG">this Log file</a>
866: #
867:
868: title=MLE datafile=..\data\mydata.txt lastobs=3000 firstpass=1 lastpass=3
869: ftol=1.000000e-008 stepm=24 ncov=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0</pre>
870: </li>
871: <li><pre>Total number of individuals= 2965, Agemin = 70.00, Agemax= 100.92
872:
873: Warning, no any valid information for:126 line=126
874: Warning, no any valid information for:2307 line=2307
875: Delay (in months) between two waves Min=21 Max=51 Mean=24.495826
876: <font face="Times New Roman">These lines give some warnings about the data file and some raw statistics on transition frequencies.</font>
877: Age 70 1.=230 loss[1]=3.5% 2.=16 loss[2]=12.5% 1.=222 prev[1]=94.1% 2.=14
878: prev[2]=5.9% 1-1=8 11=200 12=7 13=15 2-1=2 21=6 22=7 23=1
879: Age 102 1.=0 loss[1]=NaNQ% 2.=0 loss[2]=NaNQ% 1.=0 prev[1]=NaNQ% 2.=0 </pre>
880: </li>
881: </ul>
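<p>The raw statistics printed for each age can be reproduced from the transition counts. A minimal Python sketch for age 70 follows; the dictionary layout and function names are ours. Reading the "1-1" and "2-1" fields as transitions to an unknown status is an inference from the printed numbers (8 + 200 + 7 + 15 = 230), not documented behaviour.</p>

```python
import math

# Transition counts at age 70 from the screen output above. Key -1 is our reading
# of the "1-1" and "2-1" fields as "status unknown at the next wave"; it is
# consistent with the printed totals and percentages, but it is an assumption.
COUNTS_AGE_70 = {
    1: {-1: 8, 1: 200, 2: 7, 3: 15},
    2: {-1: 2, 1: 6, 2: 7, 3: 1},
}


def ratio(num, den):
    """num / den, or NaN when den is zero (the program prints NaNQ in that case)."""
    return num / den if den else math.nan


def raw_statistics(counts):
    """Totals, known-status counts, loss rates and observed prevalences per state."""
    total = {i: sum(row.values()) for i, row in counts.items()}          # 1.=230, 2.=16
    known = {i: total[i] - row.get(-1, 0) for i, row in counts.items()}  # 1.=222, 2.=14
    loss = {i: ratio(row.get(-1, 0), total[i]) for i, row in counts.items()}
    n_known = sum(known.values())
    prev = {i: ratio(known[i], n_known) for i in counts}
    return total, known, loss, prev


total, known, loss, prev = raw_statistics(COUNTS_AGE_70)
for i in (1, 2):
    print(f"state {i}: loss={100 * loss[i]:.1f}% prev={100 * prev[i]:.1f}%")
# prints:
# state 1: loss=3.5% prev=94.1%
# state 2: loss=12.5% prev=5.9%
```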
882:
883: <p> </p>
884:
885: <ul>
886: <li>Maximisation uses the Powell algorithm. Eight directions are
887: used, corresponding to the 8 parameters. Convergence can take
888: rather long.<br>
889: <font size="1" face="Courier New"><br>
890: Powell iter=1 -2*LL=11531.405658264877 1 0.000000000000 2
891: 0.000000000000 3<br>
892: 0.000000000000 4 0.000000000000 5 0.000000000000 6
893: 0.000000000000 7 <br>
894: 0.000000000000 8 0.000000000000<br>
895: 1..........2.................3..........4.................5.........<br>
896: 6................7........8...............<br>
897: Powell iter=23 -2*LL=6744.954108371555 1 -12.967632334283
898: <br>
899: 2 0.135136681033 3 -7.402109728262 4 0.067844593326 <br>
900: 5 -0.673601538129 6 -0.006615504377 7 -5.051341616718 <br>
901: 8 0.051272038506<br>
902: 1..............2...........3..............4...........<br>
903: 5..........6................7...........8.........<br>
904: #Number of iterations = 23, -2 Log likelihood =
905: 6744.954042573691<br>
906: # Parameters<br>
907: 12 -12.966061 0.135117 <br>
908: 13 -7.401109 0.067831 <br>
909: 21 -0.672648 -0.006627 <br>
910: 23 -5.051297 0.051271 </font><br>
911: </li>
912: <li><pre><font size="2">Calculation of the hessian matrix. Wait...
913: 12345678.12.13.14.15.16.17.18.23.24.25.26.27.28.34.35.36.37.38.45.46.47.48.56.57.58.67.68.78
914:
915: Inverting the hessian to get the covariance matrix. Wait...
916:
917: #Hessian matrix#
918: 3.344e+002 2.708e+004 -4.586e+001 -3.806e+003 -1.577e+000 -1.313e+002 3.914e-001 3.166e+001
919: 2.708e+004 2.204e+006 -3.805e+003 -3.174e+005 -1.303e+002 -1.091e+004 2.967e+001 2.399e+003
920: -4.586e+001 -3.805e+003 4.044e+002 3.197e+004 2.431e-002 1.995e+000 1.783e-001 1.486e+001
921: -3.806e+003 -3.174e+005 3.197e+004 2.541e+006 2.436e+000 2.051e+002 1.483e+001 1.244e+003
922: -1.577e+000 -1.303e+002 2.431e-002 2.436e+000 1.093e+002 8.979e+003 -3.402e+001 -2.843e+003
923: -1.313e+002 -1.091e+004 1.995e+000 2.051e+002 8.979e+003 7.420e+005 -2.842e+003 -2.388e+005
924: 3.914e-001 2.967e+001 1.783e-001 1.483e+001 -3.402e+001 -2.842e+003 1.494e+002 1.251e+004
925: 3.166e+001 2.399e+003 1.486e+001 1.244e+003 -2.843e+003 -2.388e+005 1.251e+004 1.053e+006
926: # Scales
927: 12 1.00000e-004 1.00000e-006
928: 13 1.00000e-004 1.00000e-006
929: 21 1.00000e-003 1.00000e-005
930: 23 1.00000e-004 1.00000e-005
931: # Covariance
932: 1 5.90661e-001
933: 2 -7.26732e-003 8.98810e-005
934: 3 8.80177e-002 -1.12706e-003 5.15824e-001
935: 4 -1.13082e-003 1.45267e-005 -6.50070e-003 8.23270e-005
936: 5 9.31265e-003 -1.16106e-004 6.00210e-004 -8.04151e-006 1.75753e+000
937: 6 -1.15664e-004 1.44850e-006 -7.79995e-006 1.04770e-007 -2.12929e-002 2.59422e-004
938: 7 1.35103e-003 -1.75392e-005 -6.38237e-004 7.85424e-006 4.02601e-001 -4.86776e-003 1.32682e+000
939: 8 -1.82421e-005 2.35811e-007 7.75503e-006 -9.58687e-008 -4.86589e-003 5.91641e-005 -1.57767e-002 1.88622e-004
940: # agemin agemax for lifexpectancy, bage fage (if mle==0 ie no data nor Max likelihood).
941:
942:
943: agemin=70 agemax=100 bage=50 fage=100
944: Computing prevalence limit: result on file 'plrmypar.txt'
945: Computing pij: result on file 'pijrmypar.txt'
946: Computing Health Expectancies: result on file 'ermypar.txt'
947: Computing Variance-covariance of DFLEs: file 'vrmypar.txt'
948: Computing Total LEs with variances: file 'trmypar.txt'
949: Computing Variance-covariance of Prevalence limit: file 'vplrmypar.txt'
950: End of Imach
951: </font></pre>
952: </li>
953: </ul>
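<p>The printed parameters can be turned into transition probabilities and standard errors. The Python sketch below makes three assumptions. First, a multinomial logit parameterisation, ln(p<sub>ij</sub>/p<sub>ii</sub>) = a<sub>ij</sub> + b<sub>ij</sub>·age. Second, the two numbers on each "# Parameters" line are a<sub>ij</sub> and b<sub>ij</sub>. Third, the rows of the covariance matrix follow the same order. All three are our reading of the output, not a specification of IMaCh. Standard errors are the square roots of the diagonal of the covariance matrix, which the program obtains by inverting the Hessian.</p>

```python
import math

# Parameters printed at the end of the maximisation: (a_ij, b_ij) per transition i -> j.
PARAMS = {
    (1, 2): (-12.966061, 0.135117),
    (1, 3): (-7.401109, 0.067831),
    (2, 1): (-0.672648, -0.006627),
    (2, 3): (-5.051297, 0.051271),
}
# Diagonal of the printed covariance matrix, assumed to follow the same order
# (a12, b12, a13, b13, a21, b21, a23, b23).
COV_DIAG = [5.90661e-01, 8.98810e-05, 5.15824e-01, 8.23270e-05,
            1.75753e+00, 2.59422e-04, 1.32682e+00, 1.88622e-04]
STD_ERRORS = [math.sqrt(v) for v in COV_DIAG]


def one_step_probs(origin, age):
    """One-step transition probabilities out of `origin`, assuming the multinomial
    logit ln(p_ij / p_ii) = a_ij + b_ij * age."""
    etas = {j: a + b * age for (i, j), (a, b) in PARAMS.items() if i == origin}
    denom = 1.0 + sum(math.exp(eta) for eta in etas.values())
    probs = {j: math.exp(eta) / denom for j, eta in etas.items()}
    probs[origin] = 1.0 / denom  # probability of staying in the same state
    return probs


for origin in (1, 2):
    probs = one_step_probs(origin, 70)
    print(origin, {j: round(p, 4) for j, p in sorted(probs.items())})
print("std errors:", [round(se, 4) for se in STD_ERRORS])
```

<p>Dividing a coefficient by its standard error gives a rough Wald statistic. For example, b12 = 0.135117 against a standard error of about 0.0095 gives a statistic of about 14 for the age effect on the 1 to 2 transition.</p>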
954:
955: <p><font size="3">Once the run is finished, the program
956: asks for a character:</font></p>
957:
958: <table border="1">
959: <tr>
960: <td width="100%"><strong>Type g for plotting (available
961: if mle=1), e to edit output files, c to start again,</strong><p><strong>and
962: q for exiting:</strong></p>
963: </td>
964: </tr>
965: </table>
966:
967: <p><font size="3">First you should enter <strong>g</strong> to
968: make the figures and then you can edit all the results by typing <strong>e</strong>.
969: </font></p>
970:
971: <ul>
972: <li><u>Output files</u> <br>
973: - index.htm, this file is the master file on which you
974: should click first.<br>
975: - Observed prevalence in each state: <a
976: href="..\mytry\prmypar.txt">prmypar.txt</a> <br>
977: - Estimated parameters and the covariance matrix: <a
978: href="..\mytry\rmypar.txt">rmypar.txt</a> <br>
979: - Stationary prevalence in each state: <a
980: href="..\mytry\plrmypar.txt">plrmypar.txt</a> <br>
981: - Transition probabilities: <a
982: href="..\mytry\pijrmypar.txt">pijrmypar.txt</a> <br>
983: - Copy of the parameter file: <a
984: href="..\mytry\ormypar.txt">ormypar.txt</a> <br>
985: - Life expectancies by age and initial health status: <a
986: href="..\mytry\ermypar.txt">ermypar.txt</a> <br>
987: - Variances of life expectancies by age and initial
988: health status: <a href="..\mytry\vrmypar.txt">vrmypar.txt</a>
989: <br>
990: - Health expectancies with their variances: <a
991: href="..\mytry\trmypar.txt">trmypar.txt</a> <br>
992: - Standard deviation of stationary prevalence: <a
993: href="..\mytry\vplrmypar.txt">vplrmypar.txt</a> <br>
994: <br>
995: </li>
996: <li><u>Graphs</u> <br>
997: <br>
998: -<a href="..\mytry\vmypar1.gif">Observed and stationary
999: prevalence in state (1) with the confidence interval</a> <br>
1000: -<a href="..\mytry\vmypar2.gif">Observed and stationary
1001: prevalence in state (2) with the confidence interval</a> <br>
1002: -<a href="..\mytry\exmypar1.gif">Health life expectancies
1003: by age and initial health state (1)</a> <br>
1004: -<a href="..\mytry\exmypar2.gif">Health life expectancies
1005: by age and initial health state (2)</a> <br>
1006: -<a href="..\mytry\emypar.gif">Total life expectancy by
1007: age and health expectancies in states (1) and (2).</a> </li>
1008: </ul>
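<p>After a run, it can be handy to check that all these files were produced. The following Python sketch builds the expected names from the prefixes visible in the lists above and reports any that are missing. The naming rule is inferred from this single example; for instance, that image names are numbered 1 to nlstate. The function names are hypothetical and not part of IMaCh.</p>

```python
from pathlib import Path

# Prefixes of the text outputs listed above; the parameter-file name (without
# extension) is appended, e.g. "plr" + "mypar" -> "plrmypar.txt".
TEXT_PREFIXES = ["pr", "r", "plr", "pij", "o", "er", "vr", "tr", "vplr"]


def expected_outputs(stem, nlstate=2):
    """File names we expect after a run, inferred from the lists above."""
    names = ["index.htm"]
    names += [f"{prefix}{stem}.txt" for prefix in TEXT_PREFIXES]
    for state in range(1, nlstate + 1):
        names += [f"v{stem}{state}.gif", f"ex{stem}{state}.gif"]
    names.append(f"e{stem}.gif")
    return names


def missing_outputs(directory, stem, nlstate=2):
    """Expected output files that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in expected_outputs(stem, nlstate)
            if not (folder / name).exists()]


print(missing_outputs(".", "imachpar"))
```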
1009:
1010: <p>This software has been partly funded by <a
1011: href="http://euroreves.ined.fr">Euro-REVES</a>, a concerted
1012: action of the European Union. It will be copyrighted
1013: in the same way as a GNU software product, i.e. the program and software
1014: can be distributed freely for non-commercial use. Sources are not
1015: widely distributed yet. You can get them by asking us, with a
1016: simple justification (name, email, institute), at <a
1017: href="mailto:brouard@ined.fr">mailto:brouard@ined.fr</a> and <a
1018: href="mailto:lievre@ined.fr">mailto:lievre@ined.fr</a>.</p>
1019:
1020: <p>The latest version (0.63 of 16 March 2000) can be accessed at <a
1021: href="http://euroreves.ined.fr/imach">http://euroreves.ined.fr/imach</a><br>
1022: </p>
1023: </body>
1024: </html>