Formula for mid-age of incidence from age-specific prevalence of chronic disease and its application

This study aimed to devise a theoretical formula for the mid-age of incidence (MAI) from the prevalence in age groups and to confirm its application. The formula was devised using the concept of lost years of health and was then tested by simulation. In the inhabitants' survey, MAI was calculated from the prevalence of liver disease in each area, and the main cause of disease was compared between areas with significantly high prevalence where MAI was below the lower 2.5% limit of its distribution (HL group) and areas with significantly high prevalence where MAI was above the upper 2.5% limit (HU group). In the computer simulation, MAI was close to or slightly lower than the mid-age of occurrence. In addition, the sum of the incidence rates in the 1-year age groups approximately corresponded to the maximum prevalence within the age groups in the simulation. In the HL group, the main cause of liver disease was alcoholic liver injury; in the HU group, one cause was type C hepatitis, whereas in many other areas it was advanced alcoholic liver injury. Thus, the HL and HU groups were confirmed as active and quiescent areas, respectively. Investigating the cause or stage of disease in these groups is considered useful in future studies.


INTRODUCTION
The age-adjusted (AA) (Armitage et al., 2002) years of potential life lost (YPLL) rate was introduced as follows: when $n_i$ is the population of the ith age group in some area, $d_i$ is the number of deaths in that age group, $N_{ir}$ is the standard population of the ith age group, and $N_r$ is the standard population between the ages of 1 and 70 years, the years of life lost in each age group are weighted by $d_i/n_i$ and $N_{ir}/N_r$ and summed over the age groups. The analogous index based on prevalence is referred to as the health index, which denotes the years of potential health lost (YPHL) (Inoue, 2002). The YPHL is calculated from the prevalence between 40 and 74 years of age. When $d_i$ is the number of patients in the ith age group, $n_i$ is the number of examinees, $P_i = d_i/n_i$ is the prevalence, and $T$ (> 74) is the reference age, the YPHL rate in the ith age group is ($T$ − mid-age of the ith age group) × $d_i/n_i$. These rates are combined over the age groups, where NG is the number of age groups when each age group covers the same period, for the equivalent adjustment (Chin, 1961).
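As a worked illustration of the YPHL rate in a single age group, the block below restates the expression from the text and evaluates it once. The symbol $m_i$ for the mid-age of the ith age group and the example numbers are assumptions introduced here for illustration; they are not values from the study.

```latex
% YPHL rate in the ith age group, as described in the text
\[
\text{YPHL rate}_i = (T - m_i)\,\frac{d_i}{n_i} = (T - m_i)\,P_i
\]
% Assumed example: T = 75, age group 60--64 (m_i = 62.5), P_i = 0.10
\[
(75 - 62.5) \times 0.10 = 1.25
\]
```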
The theoretical formula for the mid-age of incidence (MAI) was devised from the concept of the YPHL and was assessed using computer simulation. Furthermore, the MAI was calculated from the prevalence of liver disease in the inhabitants' detection survey of Japan, and its application was investigated.

METHODOLOGY

Formula for MAI
Figure 1 shows a simple model as a guide to the formula. The upper line shows a model of a person who falls ill at 50 years of age and dies at 78 years. The straight line AD shows the time when the inhabitants' survey was done. A model of a 40-year-old person at time AD is shown as the bottom line of the parallelogram, whereas a model of a 75-year-old person at time AD is shown as the top line. The same model applies in sequence along the horizontal lines for all ages from 40 (the bottom) to 75 (the top) years at the time of line AD (inhabitants' detection survey) in the parallelogram (Figure 1).
The area of triangle ABC shows the loss of health years before the age of 75 years, that is, the value of YPHL at time AB. Line BC represents the period of the disease. The length of BC is calculated as the area of triangle ABC × 2 divided by the length of AB. Thus, the period of disease is calculated as follows: (value of YPHL before the age of 75 years at time AD) × 2/number of patients at time AD.
Formula 1 gives the onset age in terms of $P_i$, the prevalence in the ith age group, and $c_i$, the normalized weight for the length of each age group, with the adjustment used for the equivalent average. The formula shows that the period of disease before 75 years is (75 − the average age of patients from age-specific prevalence before 75 years) × 2. The Appendix explains this process. The onset age calculated by the formula is referred to as the MAI.
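One reading of the formula that is consistent with the description above is sketched below. The symbol $m_i$ for the mid-age of the ith age group, the symbol $\bar{a}$ for the average age of patients, the reference age $T = 75$, and the step MAI = $T$ − (period of disease) are assumptions made here; this is not a quotation of the authors' Formula 1.

```latex
% Average age of patients from age-specific prevalence (assumed notation)
\[
\bar{a} = \frac{\sum_i c_i P_i m_i}{\sum_i c_i P_i}
\]
% Period of disease = 2 x YPHL / number of patients
\[
\text{period of disease}
  = \frac{2 \sum_i c_i P_i (T - m_i)}{\sum_i c_i P_i}
  = 2\,(T - \bar{a})
\]
% Onset age (MAI) under the assumption MAI = T - period of disease
\[
\text{MAI} = T - 2\,(T - \bar{a}) = 2\bar{a} - T
\]
```

Under this reading, if every patient fell ill at exactly 50 years of age and the prevalence were constant from 50 to 75 years, the average age of patients would be 62.5 years and the MAI would be 2 × 62.5 − 75 = 50 years.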

Computer simulation
The age range in the simulation is 40 to 79 years with an interval of 1 year.

Without the death rate
In the simulation without the death rate, the sum of the occurrence rates within the age groups is set at 0.15, and the shape of occurrences is an isosceles triangle with its vertex at 55.5 years of age and a base width of 5, 17, or 29 years. The occurrence (incidence) rate of the youngest age group in the simulation is the sum of the incidence rates of the age groups × {2/(width of age groups of incidence + 1)}². The results are the same regardless of the population size.
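The triangular occurrence pattern described above can be sketched as follows. The function name `triangular_incidence` and the choice of 53 as the first age group for a width of 5 (so that the peak falls in the age group 55, mid-age 55.5) are assumptions made for illustration.

```python
def triangular_incidence(total_rate, width, start_age):
    """Return {age: incidence rate} for an isosceles-triangle occurrence pattern.

    width is the (odd) number of 1-year age groups with non-zero incidence.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so that the triangle has a single peak")
    # Rate of the youngest age group: total * {2 / (width + 1)}^2
    base = total_rate * (2 / (width + 1)) ** 2
    rates = {}
    for k in range(width):
        step = min(k + 1, width - k)  # 1, 2, ..., peak, ..., 2, 1
        rates[start_age + k] = base * step
    return rates


# Hypothetical example: width 5, peak in the age group 55 (mid-age 55.5)
rates5 = triangular_incidence(0.15, 5, 53)
```

The rates rise in equal steps to the peak and fall symmetrically, so they add up to the chosen total.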

With death rate of disease and total death
The simulation without the death rate reveals that the sum of the occurrence rates of the age groups corresponds to the maximum prevalence in the age groups. In the simulation with the death rate, the sum of the occurrence rates of the age groups is set at 60 per 1000 population. For the death rate from the disease and the total death rate, the death rate from the disease in the age group 79 years is set at 0.52/1000, and the total death rate is set at 38.9/1000. In the first simulation, the beginning age group for death rates is 42 years, when the occurrence of the disease begins, and the death rate increases in the same proportion until 79 years. In another simulation, the approximate formula used for the death rate of disease is $y = 9.030\times10^{-11}x^{4} - 1.106\times10^{-}x^{3} + 5.245\times10^{-7}x^{2} - 1.043\times10^{-5}x + 7.105\times10^{-5}$, which shows an increase with aging, and that for total death is $y = 1.075\times10^{-8}x^{4} - 1.483\times10^{-}x^{3} + 7.452\times10^{-5}x^{2} - 1.515\times10^{-3}x + 1.053\times10^{-2}$. These values were obtained from an article reporting the age-specific estimated incidence (ASEI) rate (Inoue, 2013).
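A minimal sketch of the first (linear) death-rate model is given below. The text states only the value at 79 years and that the rate rises in the same proportion from the age group 42; the assumption that the rate is zero at 42 and below, and the function name `linear_death_rate`, are introduced here for illustration.

```python
def linear_death_rate(age, rate_at_end, start_age=42, end_age=79):
    """Death rate that is 0 up to start_age and rises linearly to rate_at_end.

    Assumption: the ramp starts from zero at start_age; the source gives only
    the end-point value and says the rate increases in the same proportion.
    """
    if age <= start_age:
        return 0.0
    if age >= end_age:
        return rate_at_end
    return rate_at_end * (age - start_age) / (end_age - start_age)


# End-point values taken from the text: 0.52/1000 (disease), 38.9/1000 (total)
disease_death = {a: linear_death_rate(a, 0.52 / 1000) for a in range(40, 80)}
total_death = {a: linear_death_rate(a, 38.9 / 1000) for a in range(40, 80)}
```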
The simulation was carried out using the following steps:
Number of patients: Initially, the prevalence, and thus the number of patients, is 0. The number of patients is calculated as prevalence × total population, and the number of healthy people is given as total population × (1 − prevalence). The number of new patients is computed as the number of healthy people × the occurrence rate set in the simulation. The total number of patients becomes the number of patients + the number of new patients.
Number of deaths (in the simulation with the death rate): The number of deaths due to the disease is calculated as population × death rate of disease. When the number of patients is greater than the number of deaths from the disease, the number of deaths is subtracted from the number of patients.
Population (in the simulation with the death rate): When the number of patients is greater than the number of deaths from the disease, the population is calculated as population × (1 − death rate of disease).
Population (in the simulation with the death rate and total death): When the number of patients is greater than the number of deaths from the disease, the population is calculated as population × (1 − total death rate).
Prevalence: Prevalence is calculated as the number of patients divided by the entire population.
Shift in age group: The population and the prevalence are shifted to the next age group.
These simulations are repeated until the age-specific prevalence becomes constant, and MAI is then compared with the mid-age set in the simulation.
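The steps above, for the case without any death rate, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' program: the population is normalized to 1 per age group, prevalence is recorded before the shift to the next age group, and the `mai` function uses an assumed reading of the formula (MAI = 2 × prevalence-weighted mean mid-age − 75). All function names are hypothetical.

```python
def triangle(total, width, start_age):
    # Isosceles-triangle incidence over `width` (odd) 1-year age groups.
    base = total * (2 / (width + 1)) ** 2
    return {start_age + k: base * min(k + 1, width - k) for k in range(width)}


def simulate(incidence, ages=range(40, 80), tol=1e-12, max_years=500):
    """Repeat the yearly steps (no deaths) until age-specific prevalence is constant."""
    ages = list(ages)
    entering = {a: 0.0 for a in ages}      # prevalence on entering each age group
    observed = dict(entering)
    for _ in range(max_years):
        new_observed = {}
        for a in ages:
            patients = entering[a]          # population normalized to 1
            healthy = 1.0 - entering[a]
            patients += healthy * incidence.get(a, 0.0)
            new_observed[a] = patients      # prevalence = patients / population
        # Shift: each cohort moves up one age group; a disease-free cohort enters.
        entering = {a: new_observed.get(a - 1, 0.0) for a in ages}
        change = max(abs(new_observed[a] - observed[a]) for a in ages)
        observed = new_observed
        if change < tol:
            break
    return observed


def mai(prevalence, reference_age=75):
    # Assumed reading of the formula: 2 * (mean mid-age of patients) - T.
    ages = [a for a in prevalence if a < reference_age]
    total = sum(prevalence[a] for a in ages)
    mean_age = sum(prevalence[a] * (a + 0.5) for a in ages) / total
    return 2 * mean_age - reference_age


results = {}
for width in (5, 17, 29):
    start = 55 - (width - 1) // 2           # peak in the age group 55 (mid-age 55.5)
    prev = simulate(triangle(0.15, width, start))
    results[width] = (max(prev.values()), mai(prev))
```

Each entry of `results` holds the maximum prevalence and the MAI for one width, which can be compared with the sum of the occurrence rates (0.15) and the set mid-age (55.5).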

Studies according to the MAI value using practical data
MAI was calculated from the age-specific prevalence of liver function disorder and other disorders obtained from the cities, towns, and villages (hereafter called areas) included in the 2007 inhabitants' survey of Japan. The prevalence was acquired from the website of the Ministry of Health, Labour and Welfare. Table 1 shows the categories of liver disease and other diseases. Data could be obtained from 1816 areas. The total number of examinees was not uniform across areas; when the areas were grouped by number of examinees, the group with 501 to 1000 examinees contained the most areas. Figure 2 shows the distribution of MAI for moderate-grade hypercholesterolemia. The value of MAI is calculated using the prevalence up to the age of 75 years according to Formula 1.
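As an illustration of how an MAI value might be obtained from 5-year age-group prevalence, the sketch below assumes that Formula 1 reduces to MAI = 2 × (weighted mean mid-age of patients) − 75, with equal weights when none are supplied. The function name, the default weights, and the example prevalences are hypothetical and are not taken from the survey.

```python
def mid_age_of_incidence(prevalence, mid_ages, weights=None, reference_age=75):
    """Assumed reading of Formula 1 for grouped prevalence data."""
    if weights is None:
        # Equal normalized weights when all age groups have the same length
        weights = [1.0 / len(prevalence)] * len(prevalence)
    num = sum(c * p * m for c, p, m in zip(weights, prevalence, mid_ages))
    den = sum(c * p for c, p in zip(weights, prevalence))
    if den == 0:
        raise ValueError("no patients below the reference age")
    mean_age = num / den
    return 2 * mean_age - reference_age


# Age groups 40-44, 45-49, ..., 70-74 and made-up prevalences
mid_ages = [42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5]
prevalence = [0.0, 0.0, 0.10, 0.10, 0.10, 0.10, 0.10]
mai_value = mid_age_of_incidence(prevalence, mid_ages)

# Same total burden but starting one age group earlier
earlier = [0.0, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10]
mai_earlier = mid_age_of_incidence(earlier, mid_ages)
```

In the first example the disease is present at a constant level from the 50 to 54 age group onward, and the computed value is the lower bound of that group.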
When P is the prevalence, n is the number of examinees in the area, and Ṗ is the prevalence in the total area (Japan), the condition nṖ > 5 is necessary for the binomial distribution to be close to the normal distribution. The AA-P (age-adjusted prevalence) before 75 years was tested under this condition. Thus, MAI values were calculated only when all the age groups met this condition (Miller, 1983a).
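The condition above can be checked per age group as in the sketch below. It assumes that n is taken as the number of examinees in each age group of the area and Ṗ as the national prevalence in the same age group; the function name and the example figures are hypothetical.

```python
def normal_approximation_ok(examinees_by_group, national_prevalence_by_group, threshold=5):
    """True when n * P_national > threshold holds in every age group."""
    return all(
        n * p > threshold
        for n, p in zip(examinees_by_group, national_prevalence_by_group)
    )


# Made-up figures for seven 5-year age groups (40-44, ..., 70-74)
national = [0.08, 0.09, 0.10, 0.11, 0.11, 0.10, 0.09]
large_area = [120, 150, 200, 260, 300, 280, 210]
small_area = [20, 30, 45, 60, 70, 65, 50]

large_ok = normal_approximation_ok(large_area, national)
small_ok = normal_approximation_ok(small_area, national)
```

An area is kept for the MAI calculation only when the check passes for all of its age groups.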
The MAI values were calculated under this condition, and the mean, standard deviation, and skewness of the distribution of MAI values were obtained (Miller, 1983b). The standardization z = (x − µ)/δ, where µ is the mean and δ is the standard deviation of the MAI values, was then used to select the areas where MAI was below or above the 2.5% limit (Miller, 1983c).
In addition, MAI was meaningful in cases where the AA-P was high. Thus, those areas where the AA-P before 75 years was significantly high compared with the AA-P in the total area, and where MAI was lower than the lower 2.5% limit of the MAI distribution, were designated as the HL group (high and lower). Those areas where the AA-P was significantly high and MAI was higher than the upper 2.5% limit were designated as the HU group (high and upper) (Miller, 1983d,e).
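The selection of the HL and HU groups can be sketched as follows. The sketch assumes a normal approximation, so that the lower and upper 2.5% limits correspond to z = −1.96 and z = +1.96, and it takes the result of the significance test on the AA-P as a precomputed flag; the function name, the use of the population standard deviation, and the example data are assumptions.

```python
from statistics import mean, pstdev


def classify_areas(mai_by_area, high_prevalence, z_limit=1.96):
    """Label each area 'HL', 'HU', or 'other' from its standardized MAI."""
    mu = mean(mai_by_area.values())
    sd = pstdev(mai_by_area.values())
    groups = {}
    for area, x in mai_by_area.items():
        z = (x - mu) / sd
        if not high_prevalence.get(area, False):
            groups[area] = "other"      # AA-P not significantly high
        elif z < -z_limit:
            groups[area] = "HL"         # high prevalence, MAI below lower limit
        elif z > z_limit:
            groups[area] = "HU"         # high prevalence, MAI above upper limit
        else:
            groups[area] = "other"
    return groups


# Made-up MAI values: 30 typical areas and two extreme ones
mai_by_area = {f"area{i}": 50.0 for i in range(30)}
mai_by_area["low"] = 30.0
mai_by_area["high"] = 70.0
flags = {"low": True, "high": True, "area0": True}
groups = classify_areas(mai_by_area, flags)
```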
A telephone survey was conducted in several government offices to obtain information on liver disease in the HL and HU groups. The main cause of liver disease found in the HL group was alcoholic liver injury (hereafter ALI). The analysis of the prevalence of ALI was not age-adjusted because the prevalence in the age groups was small. In a few areas of the HU group, the cause of liver disease was found to be type C hepatitis. Because the main cause of hepatoma in Japan is type C hepatitis (Tanaka et al., 2005), the proportion of deaths from hepatoma among all deaths and its significance are shown for these areas.
In many areas of both the HL and the HU groups, the prevalence of ALI was found to be significantly high. Thus, the MAI values of liver disease were set as the x-value in areas where the age-adjusted prevalence of liver disease and the crude rate of ALI were significantly high. The y-value was set to 1 if the crude rate of another medical category between ages 40 and 74 was significantly high compared with the total rate; otherwise, the y-value was 0. Logistic regression analyses were performed between the MAI and the significance in other medical categories (Truett et al., 1967).
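A minimal sketch of a one-predictor logistic regression of the 0/1 significance flag on the MAI is shown below, fitted by Newton-Raphson in plain Python. The function name and the data are hypothetical, and the sketch estimates only the coefficients; it does not reproduce the significance tests reported in the study.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    x_mean = sum(x) / len(x)
    xc = [v - x_mean for v in x]            # center x for numerical stability
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(xc, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p                    # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                        # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0 - b1 * x_mean, b1             # intercept on the original x scale


# Made-up data: x = MAI of liver disease, y = 1 if another category is significant
x = [40, 42, 44, 46, 48, 50, 52, 54, 56, 58]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
intercept, slope = fit_logistic(x, y)
```

A positive slope corresponds to a positive regression between the MAI and the significance of the other category.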

Computer simulation
Table 2 shows the results of the computer simulations without the death rate, with occurrence widths of 5, 17, and 29 years. The simulation results show that the prevalence within the age groups became constant after several years. The age at which the maximum prevalence was found was consistent with the final age for occurrences in the isosceles triangle. Figure 3 shows the simulation without the death rate and with a width of occurrence of 17 years. The slope of change in the prevalence between the beginning and ending age became flatter as the width of the occurrence became broader. MAI became younger as the width of occurrence became wider. The sum of the occurrence rates for the 1-year age groups (0.15) is almost the same as the maximum prevalence (Table 2 and Figure 3).
Table 3 presents the simulation with the death rate. The linear model of the death rate is denoted death-1 (death-1 of disease or total death-1), and the approximation formula model is denoted death-2. MAI is closer to the mid-age setting in the death-2 model than in death-1.
Furthermore, when 3 × death-1 of the disease was used, MAI was calculated to be younger than when death-1 was used.
In the case of death-2 of disease and total death-2, there was no peak in the prevalence within the age groups.The maximum prevalence rate found was higher than the sum of the occurrence rates (0.15), as shown in Figure 4 and Table 3.

Study using practical data
Table 4 shows the results of the study on liver disease. There were nine HL-group areas, a rate of 0.54% of all the areas. The HU-group areas numbered 12, a rate of 0.72%. Table 5 shows the prevalence of ALI in the areas; the data on the HL group are shown in the upper part, whereas the data on the HU group are in the lower part. In eight of the nine HL-group areas, the rates of ALI were significantly higher (mostly p<0.001) than that in Japan; thus, liver function disorders in those areas were judged as ALI.
In one HU-group area, the proportion of death by hepatoma was found to be significantly higher than that in Japan; in another area, the proportion was much higher than that in Japan, but not significantly so (p<0.06). It is possible that there were many patients with type C hepatitis in these areas. The large number of type C hepatitis infections in the 1920s to 1940s is attributed to the injection treatment for schistosomiasis (Tanaka et al., 2005). Further, in many areas of the HU group, the prevalence of ALI was significantly high.
As can be seen in Table 6, which shows the results of the logistic regression analyses, the MAI for liver disease showed a significant positive regression with anemia (P<0.001), hypercholesterolemia (P<0.001), and kidney dysfunction (P<0.01). Alcoholic fatty liver is accompanied by hypercholesterolemia, and alcoholic liver cirrhosis is accompanied by anemia or kidney dysfunction as ALI advances. Accordingly, ALI in the HU group was assessed as advanced ALI (Wade et al., 1996).
Similar analyses were performed in the other medical categories, as shown in Table 6. In each case, the other medical category could be either the cause or the result of the disease in the MAI category if the MAI value indicates advancement of the disease. Thus, the sign of the regression coefficient and the presence of significance between the disease and another medical category were not always the same when the two diseases were exchanged. The mean age of examinees was calculated as follows: sum of (mid-age × number of examinees in the age group)/sum of examinees between ages 40 and 74. The regression between the mean age of examinees and the MAI of ALI showed significance. The MAI value could not be dissociated from the mean age of the examinees, because the MAI is calculated from the mean age of the patients, which could be related to the mean age of the examinees.
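Written as an equation, the mean age of examinees described above takes the following form. The symbols $m_i$ (mid-age of the ith age group) and $n_i$ (number of examinees in the ith age group), and the example numbers, are assumptions introduced here for illustration.

```latex
% Mean age of examinees over the age groups between 40 and 74 years
\[
\text{mean age of examinees} = \frac{\sum_i m_i\, n_i}{\sum_i n_i}
\]
% Assumed example with two age groups: 100 examinees at m = 52.5, 300 at m = 67.5
\[
\frac{52.5 \times 100 + 67.5 \times 300}{100 + 300} = \frac{25500}{400} = 63.75
\]
```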
Japanese food, especially that prepared in the traditional way, is low in calories and high in sodium, unsaturated fatty acid, and dietary fiber (Oiso, 1975; Ueshima et al., 1982; Kuratsune et al., 1985; Kato et al., 1987). Accordingly, it was reasonable that the relationship between the MAI of hypertension and the significance of hypercholesterolemia and diabetes mellitus showed a significant negative regression. The significant negative regression between the MAI of hypertension and anemia could be explained by the polycythemia found in patients with sustained hypertension (Eugene, 1989). The significant negative regressions between the MAI of hypercholesterolemia and liver disease, between the MAI of anemia and hypercholesterolemia or diabetes mellitus, and between the MAI of kidney dysfunction and hypercholesterolemia could be explained by the malnutrition found in patients with chronic hepatitis, anemia, or chronic renal failure, whose stored body fat is decreased (John, 2001; Charles, 2001).

Formula for MAI and simulation
The incidence rate is usually calculated as the number of new cases in a given period divided by the population at risk (Armitage et al., 2002). What, then, are the population at risk and the given period for the MAI value? In the simulation, a constant MAI was reached after x years, where x = the reference age − the age of the age group in which the disease begins (hereafter, the youngest age group). Accordingly, the population at risk is the set of age classes from the youngest age group to the last age group before the reference age (the oldest age group), and x is the number of years to go back. In Figure 5, the z-axis shows the number of years going back, the small curves show the age-specific incidence rate in each year, and the large curve at year 0 shows the age-specific prevalence. The oldest age group at year 0 corresponds to the youngest age group at year x. Among the age groups participating in incidence, the youngest age group contributes in every year from 0 to x, whereas the oldest age group contributes in year 0 only. Accordingly, the years to go back for the MAI are denoted j (j = 0, 1, 2, 3, ..., x), the reference age is denoted T, the youngest age group is denoted a, and the incidence rate of the i-year age group in year j is denoted $R_{ij}$.

Figure 5. Concept of the virtual age-specific incidence rates for MAI. The z-axis is the number of years going back for the calculation of MAI. The small curves show the age-specific incidence rate in each year. The large curve at year 0 shows the age-specific prevalence. The oldest age group at year 0 corresponds to the youngest age group at year x.
The MAI is considered to be nearest to the mean age of the virtual incidence rates (Figure 5). In the 2010 Japan population research, 80.7% of those in the 40 to 49-year age group continued to reside in the same city, town, or village over 5 years, and in the older age groups this percentage was about 90% or more. Additionally, MAI is calculated for lifestyle diseases in the older age groups. Age groups older than 40 years that are separated by only 5 years could therefore share the same characteristics of the area. The number of patients or the population in the age groups was not an independent random variable, especially between neighboring age groups, and these age groups were considered as one group (Miller, 1983f). Thus, the virtual incidence rate could be considered. In the computer simulation with or without the death rate, the maximum prevalence rate within the age groups almost corresponds to the sum of the occurrence rates in the 1-year age groups.
In the simulation without the death rate, MAI was younger than the set age when the width of occurrence was wide. Moreover, in the simulation with the death rate, MAI was younger than the set age when the death rate was larger. However, in the simulation with death from disease and total death, MAI was closer to the set age than in the simulation with death from disease only.
The parallelogram in Figure 1 features the line EC, in which a case with no old patients is represented as line AF and a case where these old patients are dead at the time of the inhabitants' detection survey is shown as line AD. In this case, MAI should be the age E and younger than B. If MAI becomes younger as the likelihood of death becomes higher, the proportion of the HU group could be small, and the upper limit might then have to be set at a broader percentage.
The formula for MAI is very easy to calculate when there is data on age-specific prevalence.
MAI was considered significant when nṖ > 5, and it was close to or lower than the actual value.

Discrimination of main disease or cause in the disorder by MAI
The quiescent case shown in Figure 6 represents the HU group.The HU group with liver disease was thought to have been infected by type C hepatitis in the past or to have advanced ALI.On the other hand, the cause of liver disease in the HL group was thought to be ALI, which was found to occur in the younger age groups, as shown by the active case (Figure 6).Further, in several medical categories, significance related to the medication was found.In the non-HU and non-HL groups, there could be patients in the early and advanced stages of disease in those areas where the prevalence could be shown to be very high.
This method is called "MAI analysis." An investigation of the causes in the HL or HU group might be useful in future studies, allowing, for example, patients in the advanced or early stage of disease to be found easily. These procedures could help to advance public health in communities. Furthermore, the mechanism by which chronic disease advances could be revealed by this analysis, and this procedure might support or confirm findings from basic medicine.

Reference age for calculation of MAI
In the 2005 life table of Japan, the survival rate at 75 years was 69.1% for men and 85.0% for women (Japanese Government, 2009).Thus, the reference age of 75 years could be used in this study because of these high rates of survival.

Application of MAI analysis in death
The mean age of death and death rate could be used instead of the prevalence.Thus, the different causes under the same category of death could be revealed in this analysis.

Conclusion
The theoretical formula for the MAI of chronic disease, calculated using the prevalence before the age of 75 years in broad age groups, is given by Formula 1. The sum of the incidence rates within the 1-year age groups almost corresponds to the maximum prevalence of chronic diseases within the age groups. MAI could be calculated as younger than the actual age, depending on how strongly the severity of the disease is related to death. When data on the age-specific prevalence of a chronic disease in all areas were present, those areas where the age-adjusted prevalence was significantly high and where MAI fell below the lower 2.5% limit of the distribution of MAI were classified as the HL group. These areas were considered active areas for the incidence of the disease. Those areas where the age-adjusted prevalence was significantly high and where MAI fell above the upper 2.5% limit of the distribution of MAI were classified as the HU group. These areas were considered quiescent areas for the incidence of the disease.
Some parts of this study were presented at the Joint Scientific Meeting of the International Epidemiological Association Western Pacific Region and the Japan Epidemiological Association in 2010 (Saitama).

The formula above shows that the period of disease before 75 years is (75 − the average age of patients from age-specific prevalence) × 2.

Figure 1. Simple model of the theoretical formula for mid-age of incidence (MAI). The straight line AD shows the time when the inhabitants' detection survey was carried out. A model person aged 40 years at the time of line AD forms the bottom of the parallelogram, and a model person aged 75 years at the time of line AD forms the top of the parallelogram. BC=∆ABC×2/AD.

Figure 2. Histogram of MAI in moderate hypercholesterolemia. The vertical line shows the mean value (39.2).

Figure 3. Prevalence after several years in the simulation without the death rate and with a width of occurrence of 17 years. After 1 year in the simulation, the prevalence of the age groups is equal to the incidence rate in the simulation. The vertical line shows the age at which the maximum prevalence was found.

Figure 4. Prevalence after several years in the simulation with the death rate of disease and the total death rate using the approximation formula model.

Figure 6. Models of common, quiescent, and active cases in disease.

Table 1. Categories of medical diagnosis in the inhabitants' detection survey.
Anemia (confirmed suspected disease): hemoglobin; male, less than 13 g/dl; female, less than 12 g/dl.
Kidney dysfunction (confirmed suspected disease): serum creatinine; male, over 1.2 mg/dl; female, over 1.0 mg/dl.
SBP, systolic blood pressure; DBP, diastolic blood pressure.

Table 2. The simulation results without the mechanism of death.

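Table 3. The simulation results with the mechanism of death. Death-1 of disease and total death-1: simple linear model. Death-2 of disease and total death-2: approximation formula model.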

Table 4. Proportions of the HL group and the HU group according to liver disease in the inhabitants' survey.

Table 5. Liver disease and alcoholic liver injury in the inhabitants' detection survey in 2007, and death rate from hepatoma in 2010. AA, age-adjusted; LD, liver disease; ALI, alcoholic liver injury; CND, cannot be determined. Tests of significance were two-tailed. *p<0.05, **p<0.01, ***p<0.001.

Table 6. Logistic regression between the MAI of areas where the age-adjusted prevalence of disease was significantly high and the significance of another medical category, and regression on the mean age of examinees. Cannot be determined: the calculated MAI values of the disease and of the other category were the same. The mean of another category is shown as the sum of 1 divided by the number of data.
Here $P_i$ is the prevalence in the ith age group, $c_i$ is the normalized weight for the length of each age group, and age adjustment is used for the equivalent average. There is variation within the age groups if there are multiple persons in each age group. Therefore, the normalized weights were obtained from the reciprocal of the number of examinees.