--- imach/html/doc/imach.htm 2004/06/16 12:05:30 1.1 +++ imach/html/doc/imach.htm 2005/05/17 22:13:25 1.4 @@ -1,12 +1,10 @@ - +
-Version -0.8a, May 2002
+0.97, June 2004Age-specific proportions of people disable are very difficult -to forecast because each proportion corresponds to historical -conditions of the cohort and it is the result of the historical -flows from entering disability and recovering in the past until -today. The age-specific intensities (or incidence rates) of -entering disability or recovering a good health, are reflecting -actual conditions and therefore can be used at each age to -forecast the future of this cohort. For example if a country is -improving its technology of prosthesis, the incidence of -recovering the ability to walk will be higher at each (old) age, -but the prevalence of disability will only slightly reflect an -improve because the prevalence is mostly affected by the history -of the cohort and not by recent period effects. To measure the -period improvement we have to simulate the future of a cohort of -new-borns entering or leaving at each age the disability state or -dying according to the incidence rates measured today on -different cohorts. The proportion of people disabled at each age -in this simulated cohort will be much lower (using the exemple of -an improvement) that the proportions observed at each age in a -cross-sectional survey. This new prevalence curve introduced in a -life table will give a much more actual and realistic HE level -than the Sullivan method which mostly measured the History of -health conditions in this country.
+Age-specific proportions of people disabled (prevalence of
+disability) depend on the historical flows of entering
+disability and recovering, from the past until today. The age-specific
+forces (or incidence rates) of entering disability or
+recovering good health, estimated over a recent period of time
+(as for period forces of mortality), reflect current conditions and
+therefore can be used at each age to forecast the future of this
+cohort if nothing changes in the future, i.e. to forecast the
+prevalence of disability of each cohort. Our finding (2) is that the period
+prevalence of disability (computed from period incidences) is lower
+than the cross-sectional prevalence. For example, if a country is
+improving its prosthesis technology, the incidence of recovering
+the ability to walk will be higher at each (old) age, but the
+prevalence of disability will only slightly reflect this improvement because
+the prevalence is mostly affected by the history of the cohort and not
+by recent period effects. To measure the period improvement we have to
+simulate the future of a cohort of new-borns entering or leaving
+the disability state at each age, or dying, according to the incidence
+rates measured today on different cohorts. The proportion of people
+disabled at each age in this simulated cohort will be much lower than
+the proportions observed at each age in a cross-sectional survey. This
+new prevalence curve introduced in a life table will give a more
+realistic HE level than the Sullivan method, which mostly measures the
+history of health conditions in this country.
Therefore, the main question is how to measure incidence rates
from cross-longitudinal surveys? This is the goal of the IMaCH
@@ -196,6 +195,9 @@ Unix.
(1) Laditka, Sarah B. and Wolf, Douglas A. (1998), "New Methods for Analyzing Active Life Expectancy". Journal of Aging and Health. Vol 10, No. 2.
+In this example, 8,000 people have been interviewed in a -cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990). -Some people missed 1, 2 or 3 interviews. Health statuses are -healthy (1) and disable (2). The survey is not a real one. It is -a simulation of the American Longitudinal Survey on Aging. The -disability state is defined if the individual missed one of four -ADL (Activity of daily living, like bathing, eating, walking). -Therefore, even is the individuals interviewed in the sample are -virtual, the information brought with this sample is close to the -situation of the United States. Sex is not recorded is this -sample.
+cross-longitudinal survey of 4 waves (1984, 1986, 1988, 1990). Some
+people missed 1, 2 or 3 interviews. Health statuses are healthy (1)
+and disabled (2). The survey is not a real one. It is a simulation of
+the American Longitudinal Survey on Aging. An individual is classified
+as disabled if he or she failed one of four ADLs (Activities of daily
+living, like bathing, eating, walking). Therefore, even if the
+individuals interviewed in the sample are virtual, the information
+provided by this sample is close to the situation of the United
+States. Sex is not recorded in this sample. The LSOA survey is biased
+in the sense that people living in an institution were not surveyed at
+first pass in 1984. Thus the prevalence of disability in 1984 is
+biased downwards at old ages. But when people left their household for
+an institution, they were surveyed in their institution in 1986,
+1988 or 1990. Thus incidences are not biased. But cross-sectional
+prevalences of disability at old ages are artificially increasing
+in 1986, 1988 and 1990 because of a higher weight of institutionalized
+people in the sample. Our article shows the
+opposite: the period prevalence is lower at old ages than the
+adjusted cross-sectional prevalence, proving important current progress
+against disability.
Each line of the data set (named data1.txt
-in this first example) is an individual record which fields are:
+in this first example) is an individual record. Fields are separated +by blanks:-
If your longitudinal survey do not include information about +
If your longitudinal survey does not include information about weights or covariates, you must fill the column with a number (e.g. 1) because a missing field is not allowed.
@@ -278,10 +291,10 @@ weights or covariates, you must fill theThis is a comment. Comments start with a '#'.
+This first line is a comment. Comment lines start with a '#'.
+If the interval of d months between two waves is not a
+multiple of 'stepm', but lies between (n-1) stepm and
+n stepm, then both exact likelihoods are computed (the
+contribution to the likelihood at n stepm requires one more matrix
+product). Recall that we are modelling the probability
+of being observed in a particular state after d months, given the
+state observed at month 0. The distance (bh in
+the program) from the month of interview to the rounded date of n
+stepm is computed. It is negative if the interview occurs before
+n stepm, and positive if the interview occurs after n
+stepm (and before (n+1)stepm).
+
+Then the final contribution to the total likelihood is a weighted
+average of these two exact likelihoods at n stepm (out) and
+at (n-1)stepm (savm). We did not want to compute the third
+likelihood at (n+1)stepm because it is too costly in time, so
+we use an extrapolation if bh is positive.
The formula for
+inter/extrapolation depends on the value of the parameter mle:
+
+mle=1 lli= log((1.+bbh)*out[s1][s2]- bbh*savm[s1][s2]); /* linear interpolation */
+
+mle=2 lli= (savm[s1][s2]>(double)1.e-8 ? \
+          log((1.+bbh)*out[s1][s2]- bbh*(savm[s1][s2])): \
+          log((1.+bbh)*out[s1][s2])); /* linear interpolation */
+mle=3 lli= (savm[s1][s2]>1.e-8 ? \
+          (1.+bbh)*log(out[s1][s2])- bbh*log(savm[s1][s2]): \
+          log((1.+bbh)*out[s1][s2])); /* exponential inter-extrapolation */
+
+mle=4 lli=log(out[s[mw[mi][i]][i]][s[mw[mi+1][i]][i]]); /* No interpolation */
+          no need to save previous likelihood into memory.
+
+
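The four variants above can be compared outside the program with a small sketch. It is written in Python for brevity and is not the IMaCh source: the function name log_contribution is hypothetical, out and savm stand for the scalar probabilities out[s1][s2] and savm[s1][s2], and bbh is assumed to be the signed distance bh expressed as a fraction of stepm (the text only names bh).

```python
import math

def log_contribution(mle, out, savm, bbh):
    """Log-likelihood contribution of one transition for a given mle option.

    out  : probability at n stepm
    savm : probability at (n-1) stepm
    bbh  : signed distance to n stepm, assumed to be a fraction of stepm
    """
    if mle == 1:  # linear inter/extrapolation
        return math.log((1.0 + bbh) * out - bbh * savm)
    if mle == 2:  # linear, guarded against a near-zero savm
        if savm > 1e-8:
            return math.log((1.0 + bbh) * out - bbh * savm)
        return math.log((1.0 + bbh) * out)
    if mle == 3:  # exponential inter/extrapolation
        if savm > 1e-8:
            return (1.0 + bbh) * math.log(out) - bbh * math.log(savm)
        return math.log((1.0 + bbh) * out)
    if mle == 4:  # no interpolation
        return math.log(out)
    raise ValueError("mle must be 1, 2, 3 or 4")

# Hypothetical values: interview halfway between (n-1) stepm and n stepm.
out, savm, bbh = 0.20, 0.10, -0.5
for mle in (1, 2, 3, 4):
    print(mle, round(log_contribution(mle, out, savm, bbh), 4))
```

With bbh=0 (interview exactly at n stepm) all four options reduce to log(out), which is a quick way to check that the variants differ only in how they handle the distance to the rounded date.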
+If the death occurs between the first and second pass, and more
+precisely, for example, between n stepm and (n+1)stepm, the
+contribution of this individual to the likelihood is simply the difference
+between the probability of dying before n stepm and the
+probability of dying before (n+1)stepm. There was a bug in
+version 0.8 and death was treated as any other state, i.e. as if it
+was an observed death at second pass. This was not precise but
+correct, but when information on the precise month of death came
+(death occurring prior to second pass) we did not change the likelihood
+accordingly. Thanks to Chris Jackson for correcting us. In earlier
+versions (fortunately before first publication) the total mortality
+was overestimated (people were dying too early) by about 10%. Versions
+0.95 and higher are correct.
+
+
Our suggested choice is mle=1. If stepm=1 there is no difference
+between the various mle options (methods of interpolation). If stepm is
+large, like 12 or 24 or 48, and mle=4 (no interpolation), the bias may be
+very large if the mean duration between two waves is not a
+multiple of stepm. See the appendix in our main publication concerning
+the sine curve of biases.
+
+
This is an output if mle=1. But it can be -used as an input to get the various output data files (Health -expectancies, stationary prevalence etc.) and figures without -rerunning the rather long maximisation phase (mle=0).
- -The scales are small values for the evaluation of numerical -derivatives. These derivatives are used to compute the hessian -matrix of the parameters, that is the inverse of the covariance -matrix, and the variances of health expectancies. Each line -consists in indices "ij" followed by the initial scales -(zero to simplify) associated with aij and bij.
+These values are output by the maximisation of the likelihood (mle=1). These values can be used as an input for a
+second run in order to get the various output data files (health
+expectancies, period prevalence etc.) and figures without rerunning
+the long maximisation phase (mle=0).
+
+These 'scales' are small values needed for computing
+numerical derivatives. These derivatives are used to compute the
+Hessian matrix of the parameters, that is the inverse of the
+covariance matrix. They are often used for estimating variances and
+confidence intervals. Each line consists of indices "ij"
+followed by the initial scales (zero to simplify) associated with aij
+and bij.
This is an output if mle=1. But it can be
-used as an input to get the various output data files (Health
-expectancies, stationary prevalence etc.) and figures without
-rerunning the rather long maximisation phase (mle=0).
+
The covariance matrix is output if mle=1. It can
+also be used as an input to get the various output data files (health
+expectancies, period prevalence etc.) and figures without
+rerunning the maximisation phase (mle=0).
Each line starts with indices "ijk" followed by the
covariances between aij and bij:
+Once we obtained the estimated parameters, the program is able -to calculated stationary prevalence, transitions probabilities +to calculate period prevalence, transitions probabilities and life expectancies at any age. Choice of age range is useful -for extrapolation. In our data file, ages varies from age 70 to -102. It is possible to get extrapolated stationary prevalence by -age ranging from agemin to agemax. - - -Setting bage=50 (begin age) and fage=100 (final age), makes -the program computing life expectancy from age 'bage' to age +for extrapolation. In this example, age of people interviewed varies +from 69 to 102 and the model is estimated using their exact ages. But +if you are interested in the age-specific period prevalence you can +start the simulation at an exact age like 70 and stop at 100. Then the +program will draw at least two curves describing the forecasted +prevalences of two cohorts, one for healthy people at age 70 and the second +for disabled people at the same initial age. And according to the +mixing property (ergodicity) and because of recovery, both prevalences +will tend to be identical at later ages. Thus if you want to compute +the prevalence at age 70, you should enter a lower agemin value. + +
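The mixing property described above can be illustrated with a toy simulation. The sketch below is in Python and uses a constant, entirely hypothetical annual transition matrix (IMaCh itself estimates age-dependent probabilities). It follows two cohorts, one starting healthy and one starting disabled, and prints the proportion disabled among survivors; the two curves approach each other as the horizon grows.

```python
# Hypothetical annual transition probabilities between
# state 1 (healthy), state 2 (disabled) and death; each row sums to 1.
P = {
    1: {1: 0.85, 2: 0.10, "dead": 0.05},
    2: {1: 0.15, 2: 0.70, "dead": 0.15},
}

def disabled_share_among_survivors(initial_state, years):
    """Proportion disabled among survivors, year by year (index 0 = start)."""
    healthy = 1.0 if initial_state == 1 else 0.0
    disabled = 1.0 - healthy
    shares = [disabled / (healthy + disabled)]
    for _ in range(years):
        # One-year update of the two living states; deaths leave the cohort.
        healthy, disabled = (
            healthy * P[1][1] + disabled * P[2][1],
            healthy * P[1][2] + disabled * P[2][2],
        )
        shares.append(disabled / (healthy + disabled))
    return shares

from_healthy = disabled_share_among_survivors(1, 30)
from_disabled = disabled_share_among_survivors(2, 30)
for h in (0, 5, 10, 20, 30):
    print(h, round(from_healthy[h], 4), round(from_disabled[h], 4))
```

This is why the text recommends starting the simulation well before the age of interest: the first few years still carry the arbitrary initial state.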
+Setting bage=50 (begin age) and fage=100 (final age) lets
+the program compute life expectancy from age 'bage' to age
'fage'. As we use a model, we can interestingly compute life
expectancy on a wider age range than the age range from the data.
But the model can be rather wrong on much larger intervals.
@@ -568,9 +671,9 @@ Program is limited to around 120 for upp
begin-prev-date=1/1/1984 end-prev-date=1/6/1988 estepm=1-
++Statements 'begin-prev-date' and 'end-prev-date' allow to select the period in which we calculate the observed prevalences in each state. In this example, the prevalences are calculated on data survey collected between 1 january 1984 and 1 june 1988. -
pop_based=0-
The program computes status-based health expectancies, i.e
-health expectancies which depends on your initial health state.
-If you are healthy your healthy life expectancy (e11) is higher
-than if you were disabled (e21, with e11 > e21).
-To compute a healthy life expectancy independant of the initial
-status we have to weight e11 and e21 according to the probability
-to be in each state at initial age or, with other word, according
-to the proportion of people in each state.
-We prefer computing a 'pure' period healthy life expectancy based
-only on the transtion forces. Then the weights are simply the
-stationnary prevalences or 'implied' prevalences at the initial
-age.
-Some other people would like to use the cross-sectional
-prevalences (the "Sullivan prevalences") observed at
-the initial age during a period of time defined
-just above.
-
The program computes status-based health expectancies, i.e health
+expectancies which depend on the initial health state. If you are
+healthy, your healthy life expectancy (e11) is higher than if you were
+disabled (e21, with e11 > e21).
To compute a healthy life
+expectancy 'independent' of the initial status we have to weight e11
+and e21 according to the probability of being in each state at the initial
+age, which corresponds to the proportions of people in each health
+state (cross-sectional prevalences).
+
+We could also compute e12 and e22 and get e.2 by weighting them
+according to the observed cross-sectional prevalences at the initial age.
+
In a similar way we could compute the total life expectancy by
+summing e.1 and e.2 .
+
+The main difference between 'population based' and 'implied' or
+'period' consists in the weights used. 'Usually', cross-sectional
+prevalences of disability are higher than period prevalences,
+particularly at old ages. This is true if the country is improving its
+health system by teaching people how to prevent disability, by
+promoting better screening (for example of people needing cataract
+surgery), and for many unknown reasons that this program may help to
+discover. Then the proportion of disabled people at age 90 will be
+lower than the current observed proportion.
+
+Thus a better Health Expectancy and even a better Life Expectancy
+value is given by forecasting not only the current lower mortality at
+all ages but also a lower incidence of disability and higher recovery.
+
Using the period prevalences as weights instead of the
+cross-sectional prevalences, we compute indices which are more
+specific to the current situation and therefore more useful to
+predict improvements or regressions in the future and to compare
+different policies in various countries.
starting-proj-date=1/1/1989 final-proj-date=1/1/1992 mov_average=0@@ -659,6 +789,8 @@ smoothed forecasted prevalences with a f centered at the mid-age of the five-age period.
popforecast=0 popfile=pyram.txt popfiledate=1/1/1989 last-popfiledate=1/1/1992- -
This command is available if the interpolation unit is a
-month, i.e. stepm=1 and if popforecast=1. From a data file
-including age and number of persons alive at the precise date
-popfiledate, you can forecast the number of persons
-in each state until date last-popfiledate. In this
-example, the popfile pyram.txt
-includes real data which are the Japanese population in 1989.
-
We assume that you typed in your 1st_example ++To run the program under Windows you should either: +We assume that you already typed your 1st_example parameter file as explained above. -To run the program you should either: -
The time to converge depends on the step unit that you used (1 -month is cpu consuming), on the number of cases, and on the -number of variables. - - -The program outputs many files. Most of them are files which -will be plotted for better understanding. - -+
The time to converge depends on the step unit that you used (1 +month is more precise but more cpu consuming), on the number of cases, +and on the number of variables (covariates). + +
+The program outputs many files. Most of them are files which will be +plotted for better understanding. +
+Running under Linux is mostly the same.
+
+It is no more difficult to run it on a Macintosh.
Once the optimization is finished, some graphics can be made
-with a grapher. We use Gnuplot which is an interactive plotting
-program copyrighted but freely distributed. A gnuplot reference
-manual is available here.
-When the running is finished, the user should enter a caracter
-for plotting and output editing.
-These caracters are:
+
Once the optimization is finished (once the convergence is +reached), many tables and graphics are produced.
+The IMaCh program will create a subdirectory of the same name as your
+parameter file (here mypar) where all the tables and figures will be
+stored.
+
+Important files like the log file and the output parameter file (which
+contains the estimates of the maximisation) are stored at the main
+level, not in this subdirectory. Files with extension .log and .txt can
+be edited with a standard editor like wordpad or notepad, or
+viewed with a browser like Internet Explorer or Mozilla.
+
+
The main html file is also named with the same name biaspar.htm. You can click on it by holding +your shift key in order to open it in another window (Windows). +
+ Our grapher is Gnuplot, an interactive plotting program (GPL) which
+ can also work in batch mode. A gnuplot reference manual is available here.
When the run is
+ finished, and in order that the window doesn't disappear, the user
+ should enter a character like q for quitting.
These
+ characters are:
Gnuplot is easy and you can use it to make more complex +graphs. Just click on gnuplot and type plot sin(x) to see how easy it +is. + +
The first line is the title and displays each field of the
-file. The first column is age. The fields 2 and 6 are the
+file. First column corresponds to age. Fields 2 and 6 are the
proportion of individuals in states 1 and 2 respectively as
-observed during the first exam. Others fields are the numbers of
+observed at first exam. Other fields are the numbers of
people in states 1, 2 or more. The number of columns increases if
the number of states is higher than 2.
The header of the file is
It means that at age 70, the prevalence in state 1 is 1.000 +
It means that at age 70 (between 70 and 71), the prevalence in state 1 is 1.000
and in state 2 is 0.00 . At age 71 the number of individuals in
state 1 is 625 and in state 2 is 2, hence the total number of
people aged 71 is 625+2=627.
@@ -809,18 +951,19 @@ covariance matrix: By substitution of these parameters in the regression model,
we obtain the elementary transition probabilities:
Here are the transitions probabilities Pij(x, x+nh) where nh -is a multiple of 2 years. The first column is the starting age x -(from age 50 to 100), the second is age (x+nh) and the others are -the transition probabilities p11, p12, p13, p21, p22, p23. For -example, line 5 of the file is:
+Here are the transition probabilities Pij(x, x+nh). The second
+column is the starting age x (from age 95 to 65), the third is age
+(x+nh) and the others are the transition probabilities p11, p12, p13,
+p21, p22, p23. The first column indicates the value of the covariate
+(with no variable other than age, it is equal to 1). For example, line 5 of the file
+is:
-100 106 0.02655 0.17622 0.79722 0.01809 0.13678 0.84513+
1 100 106 0.02655 0.17622 0.79722 0.01809 0.13678 0.84513
and this means:
@@ -832,9 +975,9 @@ p22(100,106)=0.13678 p22(100,106)=0.84513#Prevalence #Age 1-1 2-2 @@ -845,49 +988,64 @@ size="3">Stationary prevalence in eac 72 0.88139 0.11861 73 0.87015 0.12985-
At age 70 the stationary prevalence is 0.90134 in state 1 and -0.09866 in state 2. This stationary prevalence differs from -observed prevalence. Here is the point. The observed prevalence -at age 70 results from the incidence of disability, incidence of -recovery and mortality which occurred in the past of the cohort. -Stationary prevalence results from a simulation with actual -incidences and mortality (estimated from this cross-longitudinal -survey). It is the best predictive value of the prevalence in the -future if "nothing changes in the future". This is -exactly what demographers do with a Life table. Life expectancy -is the expected mean time to survive if observed mortality rates -(incidence of mortality) "remains constant" in the -future.
+At age 70 the period prevalence is 0.90134 in state 1 and 0.09866
+in state 2. This period prevalence differs from the cross-sectional
+prevalence, and this is the key point. The cross-sectional prevalence at age
+70 results from the incidence of disability, incidence of recovery and
+mortality which occurred in the past of the cohort. Period prevalence
+results from a simulation with current incidences of disability,
+recovery and mortality estimated from this cross-longitudinal
+survey. It is a good prediction of the prevalence in the
+future if "nothing changes in the future". This is exactly
+what demographers do with a period life table. Life expectancy is the
+expected mean survival time if current mortality rates (age-specific incidences
+of mortality) "remain constant" in the future.
The stationary prevalence has to be compared with the observed -prevalence by age. But both are statistical estimates and -subjected to stochastic errors due to the size of the sample, the -design of the survey, and, for the stationary prevalence to the -model used and fitted. It is possible to compute the standard -deviation of the stationary prevalence at each age.
+The period prevalence has to be compared with the cross-sectional
+prevalence. But both are statistical estimates and therefore
+have confidence intervals.
+
For the cross-sectional prevalence we generally need information on
+the design of the surveys. It is usually not enough to consider the
+number of people surveyed at a particular age and to estimate a
+Bernoulli confidence interval based on the prevalence at that
+age. But you can do so to get an idea of the randomness. At least you
+can get a visual appreciation of the randomness by looking at the
+fluctuation over ages.
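As a rough illustration of the Bernoulli interval mentioned above, the Python sketch below computes a normal-approximation (Wald) interval for a prevalence from raw counts. The counts and the function name are hypothetical, and the calculation deliberately ignores the survey design and weights, which is why the text treats such an interval as indicative only.

```python
import math

def bernoulli_interval(n_disabled, n_total, z=1.96):
    """Normal-approximation interval for a proportion, clipped to [0, 1]."""
    p = n_disabled / n_total
    half_width = z * math.sqrt(p * (1.0 - p) / n_total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical counts at one age: 60 disabled out of 500 interviewed.
p, low, high = bernoulli_interval(60, 500)
print(f"prevalence={p:.3f}  95% interval=({low:.3f}, {high:.3f})")
```

At ages with very few disabled people the normal approximation becomes poor, so the interval is best read as an order of magnitude.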
+
+
For the period prevalence it is possible to estimate the
+confidence interval from the Hessian matrix (see the publication for
+details). We assume that the design of the survey only
+alters the weight of each individual. IMaCh scales the weights of
+individual-waves contributing to the likelihood by making the sum of
+the weights equal to the number of individual-waves contributing: a
+weighted survey doesn't increase or decrease the size of the survey,
+it only gives more weight to some individuals and thus less to the
+others.
-
This graph exhibits the stationary prevalence in state (2) -with the confidence interval in red. The green curve is the -observed prevalence (or proportion of individuals in state (2)). -Without discussing the results (it is not the purpose here), we -observe that the green curve is rather below the stationary -prevalence. It suggests an increase of the disability prevalence -in the future.
+This graph exhibits the period prevalence in state (2) with the
+confidence interval in red. The green curve is the observed prevalence
+(or proportion of individuals in state (2)). Without discussing the
+results (it is not the purpose here), we observe that the green curve
+is rather below the period prevalence. If the data were not biased by
+the non-inclusion of people living in institutions, we would have
+concluded that the prevalence of disability will increase in the
+future (see the main publication if you are interested in real data
+and results, which are opposite).
- +This graph plots the conditional transition probabilities from an initial state (1=healthy in red at the bottom, or 2=disable in @@ -895,8 +1053,8 @@ green on top) at age x to the f age x+h. Conditional means at the condition to be alive at age x+h which is hP12x + hP22x. The curves hP12x/(hP12x + hP22x) and hP22x/(hP12x -+ hP22x) converge with h, to the stationary -prevalence of disability. In order to get the stationary ++ hP22x) converge with h, to the period +prevalence of disability. In order to get the period prevalence at age 70 we should start the process at an earlier age, i.e.50. If the disability state is defined by severe disability criteria with only a few chance to recover, then the @@ -905,40 +1063,49 @@ probably longer. But we don't have exper
# Health expectancies # Age 1-1 (SE) 1-2 (SE) 2-1 (SE) 2-2 (SE) -70 10.4171 (0.1517) 3.0433 (0.4733) 5.6641 (0.1121) 5.6907 (0.3366) -71 9.9325 (0.1409) 3.0495 (0.4234) 5.2627 (0.1107) 5.6384 (0.3129) -72 9.4603 (0.1319) 3.0540 (0.3770) 4.8810 (0.1099) 5.5811 (0.2907) -73 9.0009 (0.1246) 3.0565 (0.3345) 4.5188 (0.1098) 5.5187 (0.2702) + 70 11.0180 (0.1277) 3.1950 (0.3635) 4.6500 (0.0871) 4.4807 (0.2187) + 71 10.4786 (0.1184) 3.2093 (0.3212) 4.3384 (0.0875) 4.4820 (0.2076) + 72 9.9551 (0.1103) 3.2236 (0.2827) 4.0426 (0.0885) 4.4827 (0.1966) + 73 9.4476 (0.1035) 3.2379 (0.2478) 3.7621 (0.0899) 4.4825 (0.1858) + 74 8.9564 (0.0980) 3.2522 (0.2165) 3.4966 (0.0920) 4.4815 (0.1754) + 75 8.4815 (0.0937) 3.2665 (0.1887) 3.2457 (0.0946) 4.4798 (0.1656) + 76 8.0230 (0.0905) 3.2806 (0.1645) 3.0090 (0.0979) 4.4772 (0.1565) + 77 7.5810 (0.0884) 3.2946 (0.1438) 2.7860 (0.1017) 4.4738 (0.1484) + 78 7.1554 (0.0871) 3.3084 (0.1264) 2.5763 (0.1062) 4.4696 (0.1416) + 79 6.7464 (0.0867) 3.3220 (0.1124) 2.3794 (0.1112) 4.4646 (0.1364) + 80 6.3538 (0.0868) 3.3354 (0.1014) 2.1949 (0.1168) 4.4587 (0.1331) + 81 5.9775 (0.0873) 3.3484 (0.0933) 2.0222 (0.1230) 4.4520 (0.1320)-
For example 70 10.4171 (0.1517) 3.0433 (0.4733) 5.6641 (0.1121) 5.6907 (0.3366) means: -e11=10.4171 e12=3.0433 e21=5.6641 e22=5.6907+
For example 70 11.0180 (0.1277) 3.1950 (0.3635) 4.6500 (0.0871) 4.4807 (0.2187) +means +e11=11.0180 e12=3.1950 e21=4.6500 e22=4.4807- +
For example, life expectancy of a healthy individual at age 70 -is 10.42 in the healthy state and 3.04 in the disability state -(=13.46 years). If he was disable at age 70, his life expectancy -will be shorter, 5.66 in the healthy state and 5.69 in the -disability state (=11.35 years). The total life expectancy is a -weighted mean of both, 13.46 and 11.35; weight is the proportion -of people disabled at age 70. In order to get a pure period index +is 11.0 in the healthy state and 3.2 in the disability state +(total of 14.2 years). If he was disable at age 70, his life expectancy +will be shorter, 4.65 years in the healthy state and 4.5 in the +disability state (=9.15 years). The total life expectancy is a +weighted mean of both, 14.2 and 9.15. The weight is the proportion +of people disabled at age 70. In order to get a period index (i.e. based only on incidences) we use the computed or -stationary prevalence at age 70 (i.e. computed from +href="#Period prevalence in each state">stable or +period prevalence at age 70 (i.e. computed from incidences at earlier ages) instead of the observed prevalence -(for example at first exam) (see +href="#cross-sectional prevalence in each state">cross-sectional prevalence +(observed for example at first medical exam) (see below).
For example, the covariances of life expectancies Cov(ei,ej) at age 50 are (line 3)
@@ -946,7 +1113,7 @@ at age 50 are (line 3)Cov(e1,e1)=0.4776 Cov(e1,e2)=0.0488=Cov(e2,e1) Cov(e2,e2)=0.0424
For example, at age 65
@@ -956,28 +1123,28 @@ probabilities : Health expectancies with standard errors in parentheses: trbiaspar.txt +href="biaspar/trbiaspar.txt">biaspar/trbiaspar.txt#Total LEs with variances: e.. (std) e.1 (std) e.2 (std)
70 13.26 (0.22) 9.95 (0.20) 3.30 (0.14)
Thus, at age 70 the total life expectancy, e..=13.26 years is -the weighted mean of e1.=13.46 and e2.=11.35 by the stationary -prevalence at age 70 which are 0.90134 in state 1 and 0.09866 in -state 2, respectively (the sum is equal to one). e.1=9.95 is the +the weighted mean of e1.=13.46 and e2.=11.35 by the period +prevalences at age 70 which are 0.90134 in state 1 and 0.09866 in +state 2 respectively (the sum is equal to one). e.1=9.95 is the Disability-free life expectancy at age 70 (it is again a weighted mean of e11 and e21). e.2=3.30 is also the life expectancy at age 70 to be spent in the disability state.
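The weighting described above can be checked by hand. The Python sketch below uses the status-based expectancies consistent with the e1.=13.46 and e2.=11.35 quoted in this paragraph (e11=10.4171, e12=3.0433, e21=5.6641, e22=5.6907, from the earlier version of the table) and the period prevalences 0.90134 and 0.09866. Variable names are illustrative only.

```python
# Status-based health expectancies at age 70 (years), earlier table values.
e11, e12 = 10.4171, 3.0433   # starting healthy: healthy years, disabled years
e21, e22 = 5.6641, 5.6907    # starting disabled: healthy years, disabled years
w1, w2 = 0.90134, 0.09866    # period prevalences at age 70 (sum to one)

e_dot1 = w1 * e11 + w2 * e21     # disability-free life expectancy e.1
e_dot2 = w1 * e12 + w2 * e22     # life expectancy in disability e.2
e_total = e_dot1 + e_dot2        # total life expectancy e..

print(f"e.1={e_dot1:.2f}  e.2={e_dot2:.2f}  e..={e_total:.2f}")
```

The weighted means reproduce e.1 and e.2 to two decimals, and the total lands within about 0.01 of the reported 13.26, a gap consistent with rounding in the published inputs.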
This figure represents the health expectancies and the total -life expectancy with the confident interval in dashed curve.
+life expectancy with a confidence interval (dashed line). -+
Standard deviations (obtained from the information matrix of the model) of these quantities are very useful. @@ -992,13 +1159,13 @@ but the standard deviation of the estima
Our health expectancies estimates vary according to the sample size (and the standard deviations give confidence intervals of -the estimate) but also according to the model fitted. Let us +the estimates) but also according to the model fitted. Let us explain it in more details.
-Choosing a model means ar least two kind of choices. First we -have to decide the number of disability states. Second we have to -design, within the logit model family, the model: variables, -covariables, confonding factors etc. to be included.
+Choosing a model means at least two kinds of choices. First, we
+have to decide the number of disability states. Second, we have to
+design, within the logit model family, the model itself: variables,
+covariables, confounding factors etc. to be included.
More disability states we have, better is our demographical approach of the disability process, but smaller are the number of @@ -1016,7 +1183,7 @@ than the mortality from the healthy stat heterogeneity in the risk of dying. The total mortality at each age is the weighted mean of the mortality in each state by the prevalence in each state. Therefore if the proportion of people -at each age and in each state is different from the stationary +at each age and in each state is different from the period equilibrium, there is no reason to find the same total mortality at a particular age. Life expectancy, even if it is a very useful tool, has a very strong hypothesis of homogeneity of the @@ -1026,38 +1193,39 @@ disability state in order to maximise th latter. But the differential in mortality complexifies the measurement.
-Incidences of disability or recovery are not affected by the -number of states if these states are independant. But incidences -estimates are dependant on the specification of the model. More -covariates we added in the logit model better is the model, but -some covariates are not well measured, some are confounding -factors like in any statistical model. The procedure to "fit -the best model' is similar to logistic regression which itself is -similar to regression analysis. We haven't yet been sofar because -we also have a severe limitation which is the speed of the -convergence. On a Pentium III, 500 MHz, even the simplest model, -estimated by month on 8,000 people may take 4 hours to converge. -Also, the program is not yet a statistical package, which permits -a simple writing of the variables and the model to take into -account in the maximisation. The actual program allows only to -add simple variables like age+sex or age+sex+ age*sex but will -never be general enough. But what is to remember, is that -incidences or probability of change from one state to another is -affected by the variables specified into the model.
+Incidences of disability or recovery are not affected by the number +of states if these states are independent. But incidence estimates +are dependent on the specification of the model. The more covariates we +add to the logit model, the better the model is, but some covariates are +not well measured, and some are confounding factors, as in any +statistical model. The procedure to "fit the best model" is +similar to logistic regression, which itself is similar to regression +analysis. We haven't yet gone that far because we also have a severe +limitation, which is the speed of convergence. On a Pentium III, +500 MHz, even the simplest model, estimated by month on 8,000 people, +may take 4 hours to converge. Also, the IMaCh program is not a +statistical package, and does not allow sophisticated design +variables. If you need sophisticated design variables you have to build them +yourself and add them as ordinary variables. IMaCh allows up to 8 +variables. The current version of this program only allows adding +simple variables like age+sex or age+sex+age*sex and will never be +general enough. But what should be remembered is that incidences, or +probabilities of change from one state to another, are affected by the +variables specified in the model.
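To make the last point concrete, here is a minimal sketch of how the covariates specified in a model change the transition probabilities. It assumes a multinomial logit form, log(p_ij / p_ii) = a + b*age + c*sex + d*age*sex, for each destination state j different from the origin i. All coefficients are hypothetical, and `transition_probabilities` is an illustrative helper, not part of IMaCh.

```python
import math

def transition_probabilities(coeffs, age, sex):
    """Probabilities of leaving or staying in one origin state.

    coeffs maps each destination state j != i to (a, b, c, d), with
    log(p_ij / p_ii) = a + b*age + c*sex + d*age*sex (assumed form).
    """
    odds = {}
    for dest, (a, b, c, d) in coeffs.items():
        odds[dest] = math.exp(a + b * age + c * sex + d * age * sex)
    denom = 1.0 + sum(odds.values())
    probs = {dest: o / denom for dest, o in odds.items()}
    probs["stay"] = 1.0 / denom  # remaining in the origin state
    return probs

# Hypothetical coefficients for transitions out of the healthy state.
AGE_ONLY = {"disabled": (-9.0, 0.08, 0.0, 0.0), "dead": (-11.0, 0.09, 0.0, 0.0)}
AGE_SEX = {"disabled": (-9.0, 0.08, 0.4, 0.0), "dead": (-11.0, 0.09, -0.5, 0.0)}

p_age = transition_probabilities(AGE_ONLY, age=80, sex=1)
p_age_sex = transition_probabilities(AGE_SEX, age=80, sex=1)
print(p_age)
print(p_age_sex)
```

For the same person (age 80, sex coded 1), the two specifications give different probabilities of becoming disabled, so the estimated incidences depend on which variables enter the model.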
-Also, the age range of the people interviewed has a link with +
Also, the age range of the people interviewed is linked to the age range of the life expectancy which can be estimated by extrapolation. If your sample ranges from age 70 to 95, you can clearly estimate a life expectancy at age 70 and trust your -confidence interval which is mostly based on your sample size, +confidence interval because it is mostly based on your sample size, but if you want to estimate the life expectancy at age 50, you -should rely in your model, but fitting a logistic model on a age -range of 70-95 and estimating probabilties of transition out of -this age range, say at age 50 is very dangerous. At least you +should rely on the design of your model. Fitting a logistic model on an age +range of 70 to 95 and estimating probabilities of transition outside +this age range, say at age 50, is very dangerous. At least you should remember that the confidence interval given by the standard deviation of the health expectancies, are under the strong assumption that your model is the 'true model', which is -probably not the case.
+probably not the case outside the age range of your sample.First, +
+ +First, we have estimated the observed prevalence between 1/1/1984 and -1/6/1988. The mean date of interview (weighed average of the -interviews performed between1/1/1984 and 1/6/1988) is estimated +1/6/1988 (June, European syntax of dates). The mean date of all interviews (weighted average of the +interviews performed between 1/1/1984 and 1/6/1988) is estimated to be 13/9/1985, as written on the top on the file. Then we forecast the probability to be in each state.
-Example, -at date 1/1/1989 :
++For example on 1/1/1989 :
# StartingAge FinalAge P.1 P.2 P.3 # Forecasting at date 1/1/1989 73 0.807 0.078 0.115-
Since -the minimum age is 70 on the 13/9/1985, the youngest forecasted -age is 73. This means that at age a person aged 70 at 13/9/1989 -has a probability to enter state1 of 0.807 at age 73 on 1/1/1989. +
+ +Since the minimum age is 70 on 13/9/1985, the youngest forecasted +age is 73. This means that a person aged 70 on 13/9/1985 has a +probability of 0.807 of being in state 1 at age 73 on 1/1/1989. Similarly, the probability to be in state 2 is 0.078 and the -probability to die is 0.115. Then, on the 1/1/1989, the -prevalence of disability at age 73 is estimated to be 0.088.
+probability to die is 0.115. Then, on the 1/1/1989, the prevalence of +disability at age 73 is estimated to be 0.088.# Age P.1 P.2 P.3 [Population] # Forecasting at date 1/1/1989 @@ -1118,21 +1286,21 @@ are in state 2. One year latter, 512892Since you know how to run the program, it is time to test it on your own computer. Try for example on a parameter file named imachpar.imach which is a copy +href="imachpar.imach">imachpar.imach which is a copy of mypar.imach included in the subdirectory of imach, mytry. -Edit it to change the name of the data file to ..\data\mydata.txt if you don't want to +Edit it and change the name of the data file to mydata.txt if you don't want to copy it on the same directory. The file mydata.txt is a smaller file of 3,000 people but still with 4 waves.
-Click on the imach.exe icon to open a window. Answer to the -question:'Enter the parameter file name:'
+Right click on the .imach file and a window will pop up with the +string 'Enter the parameter file name:'
IMACH, Version 0.8a Enter - the parameter file name: ..\mytry\imachpar.imach + | IMACH, Version 0.97b Enter + the parameter file name: imachpar.imach |
Output on the screen The output screen looks like this Log file +href="biaspar.log">biaspar.log # - -title=MLE datafile=..\data\mydata.txt lastobs=3000 firstpass=1 lastpass=3 +title=MLE datafile=mydaiata.txt lastobs=3000 firstpass=1 lastpass=3 ftol=1.000000e-008 stepm=24 ncovcol=2 nlstate=2 ndeath=1 maxwav=4 mle=1 weight=0
Total number of individuals= 2965, Agemin = 70.00, Agemax= 100.92 @@ -1163,6 +1330,22 @@ Age 70 1.=230 loss[1]=3.5% 2.=16 loss[2] Age 102 1.=0 loss[1]=NaNQ% 2.=0 loss[2]=NaNQ% 1.=0 prev[1]=NaNQ% 2.=0
+ +If your survey suffers from severe attrition, you have to analyse the +characteristics of the lost people and, for example, give more weight to people with the same +characteristics. +
+By default, IMaCh warns and excludes these problematic people, but you +have to be careful with such results.
@@ -1237,7 +1420,7 @@ End of Imach
Once the running is finished, the program -requires a caracter:
+requires a character:First you should enter e to edit the master file mypar.htm.
@@ -1254,9 +1441,9 @@ edit the master file mypar.htm. <This software have been partly granted by Euro-REVES, a concerted -action from the European Union. It will be copyrighted -identically to a GNU software product, i.e. program and software -can be distributed freely for non commercial use. Sources are not -widely distributed today. You can get them by asking us with a -simple justification (name, email, institute) Euro-REVES, a concerted action +from the European Union. Since 2003 it is also partly granted by the +French Institute on Longevity. It will be copyrighted identically to a +GNU software product, i.e. program and software can be distributed +freely for non commercial use. Sources are not widely distributed +today because some part of the codes are copyrighted by Numerical +Recipes in C. You can get our GPL codes by asking us with a simple +justification (name, email, institute) mailto:brouard@ined.fr and mailto:lievre@ined.fr .
-Latest version (0.8a of May 2002) can be accessed at Latest version (0.97b of June 2004) can be accessed at http://euroreves.ined.fr/imach