Related: Cox model, Cox proportional hazards model, hazards model, proportional hazard model, proportional hazards model


Wikipedia preview

Source (authority): Wikipedia, the free encyclopedia, as of 2017/03/04 14:59:55 (JST)


Survival analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems. This topic is called reliability theory or reliability analysis in engineering, duration analysis or duration modelling in economics, and event history analysis in sociology. Survival analysis attempts to answer questions such as: what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?

To answer such questions, it is necessary to define "lifetime". In the case of biological survival, death is unambiguous, but for mechanical reliability, failure may not be well-defined, for there may well be mechanical systems in which failure is partial, a matter of degree, or not otherwise localized in time. Even in biological problems, some events (for example, heart attack or other organ failure) may have the same ambiguity. The theory outlined below assumes well-defined events at specific times; other cases may be better treated by models which explicitly account for ambiguous events.

More generally, survival analysis involves the modelling of time to event data; in this context, death or failure is considered an "event" in the survival analysis literature – traditionally only a single event occurs for each subject, after which the organism or mechanism is dead or broken. Recurring event or repeated event models relax that assumption. The study of recurring events is relevant in systems reliability, and in many areas of social sciences and medical research.

1 Introduction to survival analysis
- 1.1 Definitions of common terms in survival analysis
- 1.2 Example: Acute Myelogenous Leukemia survival data
  - 1.2.1 Kaplan-Meier plot for the aml data
  - 1.2.2 Life table for the aml data
  - 1.2.3 Log-rank test: Testing for differences in survival in the aml data
- 1.3 Cox proportional hazards (PH) regression analysis
  - 1.3.1 Example: Cox proportional hazards regression analysis for melanoma
  - 1.3.2 Cox model using a covariate in the melanoma data
  - 1.3.3 Extensions to Cox models
- 1.4 Tree-structured survival models
  - 1.4.1 Example survival tree analysis
  - 1.4.2 Survival random forests
2 General formulation
- 2.1 Survival function
- 2.2 Lifetime distribution function and event density
- 2.3 Hazard function and cumulative hazard function
- 2.4 Quantities derived from the survival distribution
3 Censoring
4 Fitting parameters to data
5 Non-parametric estimation
6 Computer software for survival analysis
- 6.1 Survival analysis in R
  - 6.1.1 Analyses using the R package "survival"
  - 6.1.2 Survival tree analysis using the rpart package
  - 6.1.3 Survival random forest models using the randomForestSRC package
7 Distributions used in survival analysis
8 See also
9 References
10 Further reading
11 External links

Introduction to survival analysis

Survival analysis is used in several ways:

To describe the survival times of members of a group
- Life tables
- Kaplan-Meier curves
- Survival function
- Hazard function
To compare the survival times of two or more groups
- Log-rank test
To describe the effect of categorical or quantitative variables on survival
- Cox proportional hazards regression
- Parametric survival models
- Survival trees
- Survival random forests

Definitions of common terms in survival analysis

The following terms are commonly used in survival analyses.

Event: Death, disease occurrence, disease recurrence, recovery, or other experience of interest
Time: The time from the beginning of an observation period (such as surgery or beginning treatment) to (i) an event, or (ii) end of the study, or (iii) loss of contact or withdrawal from the study.
Censoring / Censored observation: If a subject does not have an event during the observation time, they are described as censored. The subject is censored in the sense that nothing is observed or known about that subject after the time of censoring. A censored subject may or may not have an event after the end of observation time.
Survival function S(t): The probability that a subject survives longer than time t.

Example: Acute Myelogenous Leukemia survival data

This example uses the Acute Myelogenous Leukemia survival data set "aml" from the "survival" package in R. The data set is from Miller (1997) ^[1]. The question at the time was whether the standard course of chemotherapy should be extended ('maintained') for additional cycles.

The aml data set sorted by survival time is shown in the box.

aml data set sorted by survival time

Time is indicated by the variable "time", which is the survival or censoring time
Event (recurrence of aml cancer) is indicated by the variable "status". 0 = no event (censored), 1 = event (recurrence)
Treatment group: the variable "x" indicates if maintenance chemotherapy was given

The last observation (11), at 161 weeks, is censored. Censoring indicates that the patient did not have an event (no recurrence of aml cancer). Another subject, observation 3, was censored at 13 weeks (indicated by status=0). This subject was only in the study for 13 weeks, and the aml cancer did not recur during those 13 weeks. It is possible that this patient was enrolled near the end of the study, so that they could only be observed for 13 weeks. It is also possible that the patient was enrolled early in the study, but was lost to follow up or withdrew from the study. The table shows that other subjects were censored at 16, 28, and 45 weeks (observations 17, 6, and 9 with status=0). The remaining subjects all experienced events (recurrence of aml cancer) while in the study. The question of interest is whether recurrence occurs later in maintained patients than in non-maintained patients.

Kaplan-Meier plot for the aml data

The survival function S(t) is the probability that a subject survives longer than time t. S(t) is theoretically a smooth curve, but it is usually estimated using the Kaplan-Meier (KM) curve. The graph shows the KM plot for the aml data.

Kaplan-Meier plot of AML survival data set

The KM plot is interpreted as follows.

The x-axis is time, from zero (when observation began) to the last observed time point.
The y-axis is the proportion of subjects surviving. At time zero, 100% of the subjects are alive without an event.
The solid line (similar to a staircase) shows the events.
A vertical drop indicates an event. In the aml table shown above, two subjects had events at 5 weeks, two had events at 8 weeks, one had an event at 9 weeks, and so on. These events at 5 weeks, 8 weeks and so on are indicated by the vertical drops in the KM plot at those time points.
At the far right end of the KM plot there is a tick mark at 161 weeks. The vertical tick mark indicates that a patient was censored at this time. In the aml data table five subjects were censored, at 13, 16, 28, 45 and 161 weeks. There are five tick marks in the KM plot, corresponding to these censored observations.
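The staircase described above can be reproduced by hand. The following is a minimal sketch of the Kaplan-Meier product-limit estimate in plain Python, not the code used to draw the plot. The (time, status) pairs are the 23 aml observations as distributed with the R "survival" package; the function name `kaplan_meier` and the row layout are illustrative assumptions.

```python
# Kaplan-Meier product-limit estimate, written from scratch for illustration.
# status: 1 = event (recurrence), 0 = censored.
aml = [
    (9, 1), (13, 1), (13, 0), (18, 1), (23, 1), (28, 0),
    (31, 1), (34, 1), (45, 0), (48, 1), (161, 0),          # maintained
    (5, 1), (5, 1), (8, 1), (8, 1), (12, 1), (16, 0),
    (23, 1), (27, 1), (30, 1), (33, 1), (43, 1), (45, 1),  # non-maintained
]


def kaplan_meier(data):
    """Return [(time, n_risk, n_event, survival)] at each distinct event time."""
    event_times = sorted({t for t, status in data if status == 1})
    survival = 1.0
    rows = []
    for t in event_times:
        # Subjects still under observation just before t (censored-at-t included).
        n_risk = sum(1 for u, _ in data if u >= t)
        n_event = sum(1 for u, s in data if u == t and s == 1)
        survival *= 1.0 - n_event / n_risk   # size of the vertical drop at t
        rows.append((t, n_risk, n_event, survival))
    return rows


km_rows = kaplan_meier(aml)
for t, n_risk, n_event, survival in km_rows[:3]:
    print(f"t={t:3d}  n.risk={n_risk:2d}  n.event={n_event}  S(t)={survival:.4f}")
```

Censored observations never produce a drop; they only reduce the number at risk at later event times, which is why the steps grow larger toward the right of the plot.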

Life table for the aml data

A life table summarizes survival data in terms of the number of events and the proportion surviving at each event time point. The life table for the aml data, created using the R software, is shown.

Life table for the aml data

The life table summarizes the events and the proportion surviving at each event time point. The columns in the life table have the following interpretation.

time gives the time points at which events occur.
n.risk is the number of subjects at risk immediately before the time point, t. Being "at risk" means that the subject has not had an event before time t, and has not been censored before time t. By the usual convention, a subject censored exactly at time t is still counted as at risk at t.
n.event is the number of subjects who have events at time t.
survival is the proportion surviving, as determined using the Kaplan-Meier product-limit estimate.
std.err is the standard error of the estimated survival. The standard error of the Kaplan-Meier product-limit estimate at time t is calculated using Greenwood’s formula, and depends on the number at risk (n.risk in the table), the number of deaths (n.event in the table) and the proportion surviving (survival in the table).
lower 95% CI and upper 95% CI are the lower and upper 95% confidence bounds for the proportion surviving.
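As a rough illustration of how the std.err column arises, the sketch below applies Greenwood's formula to the first three event times of the aml data (2 events among 23 at risk at 5 weeks, 2 among 21 at 8 weeks, 1 among 19 at 9 weeks). The function name `life_table` is an assumption, and the plain symmetric bounds computed here are a simplification: R's default interval is built on the log scale, so its bounds will generally differ.

```python
import math

# (time, n.risk, n.event) for the first three event times of the aml data.
rows = [(5, 23, 2), (8, 21, 2), (9, 19, 1)]


def life_table(rows, z=1.96):
    """Add survival, Greenwood std.err and simple symmetric 95% bounds.

    Assumes n_event < n_risk in every row (Greenwood's term is undefined
    when everyone still at risk has the event).
    """
    survival = 1.0
    greenwood_sum = 0.0
    table = []
    for t, n_risk, n_event in rows:
        survival *= 1.0 - n_event / n_risk
        greenwood_sum += n_event / (n_risk * (n_risk - n_event))
        std_err = survival * math.sqrt(greenwood_sum)
        lower = max(0.0, survival - z * std_err)
        upper = min(1.0, survival + z * std_err)
        table.append((t, n_risk, n_event, survival, std_err, lower, upper))
    return table


table = life_table(rows)
for t, n_risk, n_event, survival, std_err, lower, upper in table:
    print(f"{t:3d} {n_risk:3d} {n_event:2d} {survival:.4f} {std_err:.4f} "
          f"{lower:.4f} {upper:.4f}")
```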

Log-rank test: Testing for differences in survival in the aml data

The logrank test compares the survival times of two or more groups. This example uses a logrank test for a difference in survival in the maintained versus non-maintained treatment groups in the aml data. The graph shows KM plots for the aml data broken out by treatment group, which is indicated by the variable "x" in the data.

Kaplan-Meier graph by treatment group in aml

The null hypothesis for a logrank test is that the groups have the same survival. The expected number of events at each time point in each group is adjusted for the number of subjects at risk in the groups at each event time. The logrank test determines if the observed number of events in each group is significantly different from the expected number. The formal test is based on a chi-squared statistic. When the logrank statistic is large, it is evidence for a difference in the survival times between the groups. For two groups, the logrank statistic has approximately a chi-squared distribution with one degree of freedom under the null hypothesis, and the p-value is calculated using the chi-squared distribution.

The log rank test for difference in survival gives a p-value of p=0.0653, indicating that the treatment groups do not differ significantly in survival, assuming an alpha level of 0.05. The sample size of 23 subjects is modest, so there is little power to detect differences between the treatment groups. The chi-squared test is based on asymptotic approximation, so the p-value should be regarded with caution for small sample sizes.
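The observed-versus-expected comparison can be sketched in a few lines of Python. This is an illustrative two-group logrank computation, not the R routine itself; the group lists are the aml observations as distributed with the R "survival" package, and the function name `logrank` is an assumption. For one degree of freedom, the chi-squared upper tail equals `erfc(sqrt(x / 2))`, which avoids any dependency beyond the standard library.

```python
import math

# (time, status) pairs; status 1 = event, 0 = censored.
maintained = [(9, 1), (13, 1), (13, 0), (18, 1), (23, 1), (28, 0),
              (31, 1), (34, 1), (45, 0), (48, 1), (161, 0)]
nonmaintained = [(5, 1), (5, 1), (8, 1), (8, 1), (12, 1), (16, 0),
                 (23, 1), (27, 1), (30, 1), (33, 1), (43, 1), (45, 1)]


def logrank(group1, group2):
    """Two-group logrank test; returns (observed1, expected1, chi2, p_value)."""
    pooled = group1 + group2
    event_times = sorted({t for t, s in pooled if s == 1})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for u, _ in group1 if u >= t)   # at risk in group 1
        n2 = sum(1 for u, _ in group2 if u >= t)   # at risk in group 2
        d1 = sum(1 for u, s in group1 if u == t and s == 1)
        d2 = sum(1 for u, s in group2 if u == t and s == 1)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n                    # expected under equal hazards
        if n > 1:                                  # hypergeometric variance
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = (observed1 - expected1) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))     # chi-squared tail, 1 df
    return observed1, expected1, chi2, p_value


observed, expected, chi2, p_value = logrank(maintained, nonmaintained)
print(f"observed={observed:.0f} expected={expected:.2f} "
      f"chi2={chi2:.2f} p={p_value:.4f}")
```

The maintained group has fewer events than expected under the null hypothesis, which is the direction of the difference visible in the Kaplan-Meier curves.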

Cox proportional hazards (PH) regression analysis

Kaplan-Meier curves and logrank tests are most useful when the predictor variable is categorical (e.g., drug vs. placebo), or takes a small number of values (e.g., drug doses 0, 20, 50, and 100 mg/day) that can be treated as categorical. The logrank test and K-M curves don’t work easily with quantitative predictors such as gene expression, white blood count, or age. For quantitative predictor variables, an alternative method is Cox proportional hazards regression analysis. Cox PH models^[2] work also with categorical predictor variables, which are encoded as {0,1} indicator or dummy variables. The logrank test is a special case of a Cox PH analysis, and can be performed using Cox PH software.
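To make the regression concrete, the sketch below fits a Cox model with a single {0,1} covariate by Newton-Raphson on the log partial likelihood. Several things here are assumptions rather than part of the text: the Breslow form of the partial likelihood for tied event times, the use of the aml treatment indicator as the covariate (x = 1 for non-maintained), and the function names. R's coxph uses the Efron approximation for ties by default, so its estimate may differ slightly from this one.

```python
import math

# (time, status, x): aml observations, x = 1 non-maintained, 0 maintained.
aml = (
    [(t, s, 0) for t, s in [(9, 1), (13, 1), (13, 0), (18, 1), (23, 1), (28, 0),
                            (31, 1), (34, 1), (45, 0), (48, 1), (161, 0)]]
    + [(t, s, 1) for t, s in [(5, 1), (5, 1), (8, 1), (8, 1), (12, 1), (16, 0),
                              (23, 1), (27, 1), (30, 1), (33, 1), (43, 1), (45, 1)]]
)


def score_and_information(beta, data):
    """Derivative and negative second derivative of the Breslow log partial likelihood."""
    score = information = 0.0
    for t_i, status_i, x_i in data:
        if status_i != 1:
            continue                       # censored subjects contribute via risk sets only
        risk_set = [x for t, _, x in data if t >= t_i]
        s0 = sum(math.exp(beta * x) for x in risk_set)
        s1 = sum(x * math.exp(beta * x) for x in risk_set)
        s2 = sum(x * x * math.exp(beta * x) for x in risk_set)
        score += x_i - s1 / s0             # observed minus risk-weighted mean of x
        information += s2 / s0 - (s1 / s0) ** 2
    return score, information


def fit_cox(data, iterations=25):
    """Newton-Raphson for one coefficient, starting from beta = 0."""
    beta = 0.0
    for _ in range(iterations):
        score, information = score_and_information(beta, data)
        beta += score / information
    return beta


beta_hat = fit_cox(aml)
_, info_hat = score_and_information(beta_hat, aml)
print(f"coef={beta_hat:.3f}  exp(coef)={math.exp(beta_hat):.2f}  "
      f"se(coef)={1 / math.sqrt(info_hat):.3f}")
```

At beta = 0 the score is the observed minus expected event count in the x = 1 group, which is the quantity the logrank test is built on.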

Example: Cox proportional hazards regression analysis for melanoma

This example uses the melanoma data set from Dalgaard Chapter 12. ^[3]

Data are in the R package ISwR. The Cox proportional hazards regression using R gives the results shown in the box.

Cox proportional hazards regression output for melanoma data. Predictor variable is sex 1: female, 2: male.

The Cox regression results are interpreted as follows.

Sex is encoded as a numeric vector (1: female, 2: male). The R summary for the Cox model gives the hazard ratio (HR) for the second group relative to the first group, that is, male versus female.
coef = 0.662 is the estimated logarithm of the hazard ratio for males versus females.
exp(coef) = 1.94 = exp(0.662). The log of the hazard ratio (coef = 0.662) is transformed to the hazard ratio using exp(coef). The estimated hazard ratio of 1.94 indicates that males have a higher risk of death (lower survival rates) than females in these data.
se(coef) = 0.265 is the standard error of the log hazard ratio.
z = 2.5 = coef/se(coef) = 0.662/0.265. Dividing the coef by its standard error gives the z score.
p = 0.013. The p-value corresponding to z = 2.5 for sex is p = 0.013, indicating that there is a significant difference in survival as a function of sex.

The summary output also gives the lower and upper bounds of the 95% confidence interval for the hazard ratio: lower 95% bound = 1.15, upper 95% bound = 3.26.
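The quantities in the output are linked by simple arithmetic, which the following sketch recomputes from the reported coefficient and standard error alone. It assumes the usual normal (Wald) approximation with the multiplier 1.96; the variable names are illustrative.

```python
import math

coef, se = 0.662, 0.265   # log hazard ratio and its standard error, as reported

hazard_ratio = math.exp(coef)
z = coef / se
p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
ci_lower = math.exp(coef - 1.96 * se)          # interval is built on the log
ci_upper = math.exp(coef + 1.96 * se)          # scale, then exponentiated

print(f"HR={hazard_ratio:.2f}  z={z:.2f}  p={p_value:.4f}  "
      f"95% CI=({ci_lower:.2f}, {ci_upper:.2f})")
```

Because the interval is symmetric on the log scale, it is not symmetric around the hazard ratio itself.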

Finally, the output gives p-values for three alternative tests for overall significance of the model:

Likelihood ratio test= 6.15 on 1 df, p=0.0131
Wald test = 6.24 on 1 df, p=0.0125
Score (logrank) test = 6.47 on 1 df, p=0.0110

These three methods are asymptotically equivalent. For large enough N, they will give similar results. For small N, they may differ somewhat. The last row, "Score (logrank) test" is the result for the logrank test, with p=0.011, the same result as the logrank test, because the logrank test is a special case of a Cox PH regression. The Likelihood ratio test has better behavior for small sample sizes, so it is generally preferred.

Cox model using a covariate in the melanoma data

The Cox model extends the logrank test by allowing the inclusion of additional covariates. This example uses the melanom data set, where the predictor variables include a continuous covariate, the thickness of the tumor (variable name = "thick").

Histograms of melanoma tumor thickness

In the histograms, the thickness values do not look normally distributed. Regression models, including the Cox model, generally give more reliable results with normally-distributed variables. This example therefore uses a log transform: the log of the tumor thickness looks more normally distributed, so the Cox models use log thickness. The Cox PH analysis gives the results in the box.

Cox PH output for melanoma data set with covariate log tumor thickness

The p-values for all three overall tests (likelihood ratio, Wald, and score) are significant, indicating that the model is significant. The p-value for log(thick) is 6.9e-07, with a hazard ratio HR = exp(coef) = 2.18, indicating a strong relationship between the thickness of the tumor and increased risk of death.

By contrast, the p-value for sex is now p=0.088. The hazard ratio HR = exp(coef) = 1.58, with a 95% confidence interval of 0.934 to 2.68. Because the confidence interval for HR includes 1, these results indicate that sex makes a smaller contribution to the difference in the HR after controlling for the thickness of the tumor, and only trends toward significance. Examination of graphs of log(thickness) by sex and a t-test of log(thickness) by sex both indicate that there is a significant difference between men and women in the thickness of the tumor when they first see the clinician.

The Cox model assumes that the hazards are proportional. The proportional hazard assumption may be tested using the R function cox.zph(). A p-value less than 0.05 indicates that the hazards are not proportional. For the melanoma data, p=0.222, indicating that the hazards are, at least approximately, proportional. Additional tests and graphs for examining a Cox model are described in the textbooks cited.

Extensions to Cox models

Cox models can be extended to deal with variations on the simple analysis.

Stratification. The subjects can be divided into strata, where subjects within a stratum are expected to be relatively more similar to each other than to randomly chosen subjects from other strata. The regression parameters are assumed to be the same across the strata, but a different baseline hazard may exist for each stratum. Stratification is useful for analyses using matched subjects, for dealing with patient subsets, such as different clinics, and for dealing with violations of the proportional hazard assumption.
Time-varying covariates. Some variables, such as gender and treatment group, generally stay the same in a clinical trial. Other clinical variables, such as serum protein levels or dose of concomitant medications may change over the course of a study. Cox models may be extended for such time-varying covariates.

Tree-structured survival models

The Cox PH regression model is a linear model. It is similar to linear regression and logistic regression. Specifically, these methods assume that a single line, curve, plane, or surface is sufficient to separate groups (alive, dead) or to estimate a quantitative response (survival time).

In some cases alternative partitions give more accurate classification or quantitative estimates. One set of alternative methods is tree-structured survival models, including survival random forests. Tree-structured survival models may give more accurate predictions than Cox models. Examining both types of models for a given data set is a reasonable strategy.

Example survival tree analysis

This example of a survival tree analysis uses the R package "rpart". The example is based on 146 stage C prostate cancer patients in the data set stagec in rpart. Rpart and the stagec example are described in the PDF document "An Introduction to Recursive Partitioning Using the RPART Routines". Terry M. Therneau, Elizabeth J. Atkinson, Mayo Foundation. September 3, 1997.

The variables in stagec are:

pgtime time to progression, or last follow-up free of progression
pgstat status at last follow-up (1=progressed, 0=censored)
age age at diagnosis
eet early endocrine therapy (1=no, 0=yes)
ploidy diploid/tetraploid/aneuploid DNA pattern
g2 % of cells in G2 phase
grade tumor grade (1-4)
gleason Gleason grade (3-10)

The survival tree produced by the analysis is shown in the figure.

Survival tree for prostate cancer data set

Each branch in the tree indicates a split on the value of a variable. For example, the root of the tree splits subjects with grade < 2.5 versus subjects with grade 2.5 or greater. The terminal nodes indicate the number of subjects in the node, the number of subjects who have events, and the relative event rate compared to the root. In the node on the far left, the values 1/33 indicate that 1 of the 33 subjects in the node had an event, and that the relative event rate is 0.122. In the node on the far right bottom, the values 11/15 indicate that 11 of 15 subjects in the node had an event, and the relative event rate is 2.7.

Survival random forests

An alternative to building a single survival tree is to build many survival trees, where each tree is constructed using a sample of the data, and average the trees to predict survival. This is the method underlying the survival random forest models. Survival random forest analysis is available in the R package "randomForestSRC".

The randomForestSRC package includes an example survival random forest analysis using the data set pbc. This data is from the Mayo Clinic Primary Biliary Cirrhosis (PBC) trial of the liver conducted between 1974 and 1984. In the example, the random forest survival model gives more accurate predictions of survival than the Cox PH model. The prediction errors are estimated by bootstrap re-sampling.

General formulation

Survival function

The object of primary interest is the survival function, conventionally denoted S, which is defined as

S(t)=\Pr(T>t)

where t is some time, T is a random variable denoting the time of death, and "Pr" stands for probability. That is, the survival function is the probability that the time of death is later than some specified time t. The survival function is also called the survivor function or survivorship function in problems of biological survival, and the reliability function in mechanical survival problems. In the latter case, the reliability function is denoted R(t).

Usually one assumes S(0) = 1, although it could be less than 1 if there is the possibility of immediate death or failure.

The survival function must be non-increasing: S(u) ≤ S(t) if u ≥ t. This property follows directly because T>u implies T>t. This reflects the notion that survival to a later age is only possible if all younger ages are attained. Given this property, the lifetime distribution function and event density (F and f below) are well-defined.

The survival function is usually assumed to approach zero as age increases without bound, i.e., S(t) → 0 as t → ∞, although the limit could be greater than zero if eternal life is possible. For instance, we could apply survival analysis to a mixture of stable and unstable carbon isotopes; unstable isotopes would decay sooner or later, but the stable isotopes would last indefinitely.

Lifetime distribution function and event density

Related quantities are defined in terms of the survival function.

The lifetime distribution function, conventionally denoted F, is defined as the complement of the survival function,

F(t)=\Pr(T\leq t)=1-S(t).

If F is differentiable then the derivative, which is the density function of the lifetime distribution, is conventionally denoted f,

f(t)=F'(t)={\frac {d}{dt}}F(t).

The function f is sometimes called the event density; it is the rate of death or failure events per unit time.

The survival function can be expressed in terms of probability distribution and probability density functions

S(t)=\Pr(T>t)=\int _{t}^{\infty }f(u)\,du=1-F(t).

Similarly, a survival event density function can be defined as

s(t)=S'(t)={\frac {d}{dt}}S(t)={\frac {d}{dt}}\int _{t}^{\infty }f(u)\,du={\frac {d}{dt}}[1-F(t)]=-f(t).

In other fields, such as statistical physics, the survival event density function is known as the first passage time density.

Hazard function and cumulative hazard function

The hazard function, conventionally denoted $\lambda$ , is defined as the event rate at time t conditional on survival until time t or later (that is, T ≥ t). Suppose that an item has survived for a time t and we desire the probability that it will not survive for an additional time dt. That is, consider $\Pr(t\leq T<t+dt\mid T\geq t)$ . Dividing this conditional probability by dt and letting dt tend to zero gives the hazard function:

\lambda (t)=\lim _{dt\rightarrow 0}{\frac {\Pr(t\leq T<t+dt)}{dt\cdot S(t)}}={\frac {f(t)}{S(t)}}=-{\frac {S'(t)}{S(t)}}.

Force of mortality is a synonym of hazard function which is used particularly in demography and actuarial science, where it is denoted by $\mu$ . The term hazard rate is another synonym.

The force of mortality of the survival function is defined as $\mu (x)=-{d \over dx}\ln(S(x))={\frac {f(x)}{S(x)}}$ ,

where $f(x)$ is the probability density function of the distribution. The force of mortality is also called the force of failure.

In actuarial science, the hazard rate is the rate of death for lives aged x. For a life aged x, the force of mortality t years later is the force of mortality for a (x + t)–year old. The hazard rate is also called the failure rate. Hazard rate and failure rate are names used in reliability theory.

Any function h is a hazard function if and only if it satisfies the following properties:

$h(x)\geq 0$ for all $x\geq 0$ ,
$\int _{0}^{\infty }h(x)\,dx=\infty$ .

In fact, the hazard rate is usually more informative about the underlying mechanism of failure than the other representations of a lifetime distribution.

The hazard function must be non-negative, λ(t) ≥ 0, and its integral over $[0,\infty )$ must be infinite, but is not otherwise constrained; it may be increasing or decreasing, non-monotonic, or discontinuous. An example is the bathtub curve hazard function, which is large for small values of t, decreasing to some minimum, and thereafter increasing again; this can model the property of some mechanical systems to either fail soon after operation, or much later, as the system ages.
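The relation λ(t) = f(t)/S(t) and the possible shapes of a hazard can be illustrated numerically. The sketch below uses the Weibull distribution purely as a convenient assumption (it is not singled out in the text above); the parameter values and function names are hypothetical. The bathtub shape is obtained here by adding a decreasing and an increasing Weibull hazard, which is one simple way to construct such a curve.

```python
import math


def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_density(t, shape, scale):
    """f(t) = -S'(t); requires t > 0 when shape < 1."""
    return (shape / scale) * (t / scale) ** (shape - 1) * weibull_survival(t, shape, scale)


def weibull_hazard(t, shape, scale):
    """Hazard computed directly from its definition f(t) / S(t)."""
    return weibull_density(t, shape, scale) / weibull_survival(t, shape, scale)


def bathtub_hazard(t):
    """Early-failure (shape < 1) plus wear-out (shape > 1) components."""
    return weibull_hazard(t, 0.5, 1.0) + weibull_hazard(t, 3.0, 2.0)


for t in (0.1, 1.0, 3.0):
    print(f"t={t:3.1f}  decreasing={weibull_hazard(t, 0.5, 1.0):.3f}  "
          f"constant={weibull_hazard(t, 1.0, 2.0):.3f}  "
          f"increasing={weibull_hazard(t, 3.0, 2.0):.3f}  "
          f"bathtub={bathtub_hazard(t):.3f}")
```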

The hazard function can alternatively be represented in terms of the cumulative hazard function, conventionally denoted $\Lambda$ :

\,\Lambda (t)=-\log S(t)

so transposing signs and exponentiating

\,S(t)=\exp(-\Lambda (t))

or differentiating (with the chain rule)

{\frac {d}{dt}}\Lambda (t)=-{\frac {S'(t)}{S(t)}}=\lambda (t).

The name "cumulative hazard function" is derived from the fact that

\Lambda (t)=\int _{0}^{t}\lambda (u)\,du

which is the "accumulation" of the hazard over time.

From the definition of $\Lambda (t)$ , we see that it increases without bound as t tends to infinity (assuming that S(t) tends to zero). This implies that $\lambda (t)$ must not decrease too quickly, since, by definition, the cumulative hazard has to diverge. For example, $\exp(-t)$ is not the hazard function of any survival distribution, because its integral converges to 1.
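The divergence requirement can be checked numerically. The sketch below integrates a hazard with the trapezoidal rule and converts the result to a survival probability through S(t) = exp(-Λ(t)). The constant hazard of 0.5 is a hypothetical example value; the decaying hazard exp(-t) is the counterexample from the paragraph above, for which the implied survival levels off near exp(-1) instead of reaching zero.

```python
import math


def cumulative_hazard(hazard, t, steps=10_000):
    """Trapezoidal approximation of the integral of hazard(u) for u in [0, t]."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width


def survival_from_hazard(hazard, t):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(hazard, t))


def constant_hazard(u):
    return 0.5                 # cumulative hazard 0.5 * t grows without bound


def decaying_hazard(u):
    return math.exp(-u)        # cumulative hazard 1 - exp(-t) is bounded by 1


s_constant = survival_from_hazard(constant_hazard, 4.0)
s_decaying = survival_from_hazard(decaying_hazard, 50.0)
print(f"constant hazard: S(4)  = {s_constant:.4f}")
print(f"decaying hazard: S(50) = {s_decaying:.4f}  (exp(-1) = {math.exp(-1):.4f})")
```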

Quantities derived from the survival distribution

Future lifetime at a given time $t_{0}$ is the time remaining until death, given survival to age $t_{0}$ . Thus, it is $T-t_{0}$ in the present notation. The expected future lifetime is the expected value of future lifetime. The probability of death at or before age $t_{0}+t$ , given survival until age $t_{0}$ , is just

P(T\leq t_{0}+t\mid T>t_{0})={\frac {P(t_{0}<T\leq t_{0}+t)}{P(T>t_{0})}}={\frac {F(t_{0}+t)-F(t_{0})}{S(t_{0})}}.

Therefore, the probability density of future lifetime is

{\frac {d}{dt}}{\frac {F(t_{0}+t)-F(t_{0})}{S(t_{0})}}={\frac {f(t_{0}+t)}{S(t_{0})}}

and the expected future lifetime is

{\frac {1}{S(t_{0})}}\int _{0}^{\infty }t\,f(t_{0}+t)\,dt={\frac {1}{S(t_{0})}}\int _{t_{0}}^{\infty }S(t)\,dt,

where the second expression is obtained using integration by parts.

For $t_{0}=0$ , that is, at birth, this reduces to the expected lifetime.

In reliability problems, the expected lifetime is called the mean time to failure, and the expected future lifetime is called the mean residual lifetime.
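The second expression for the expected future lifetime lends itself to direct numerical evaluation. The following sketch integrates S(t) from t0 up to a finite cutoff, which is an approximation that assumes the survival function is negligible beyond the cutoff. The exponential survival function with rate 0.5 is a hypothetical example; for it, the mean residual lifetime is 1/rate at every age.

```python
import math


def mean_residual_life(survival, t0, upper=100.0, steps=100_000):
    """(1 / S(t0)) * integral of S(t) over [t0, upper], by the trapezoidal rule."""
    width = (upper - t0) / steps
    total = 0.5 * (survival(t0) + survival(upper))
    for i in range(1, steps):
        total += survival(t0 + i * width)
    return total * width / survival(t0)


def exponential_survival(t, rate=0.5):
    return math.exp(-rate * t)


mttf = mean_residual_life(exponential_survival, 0.0)   # t0 = 0: expected lifetime
mrl_at_3 = mean_residual_life(exponential_survival, 3.0)
print(f"mean time to failure:          {mttf:.4f}")
print(f"mean residual lifetime at t=3: {mrl_at_3:.4f}")
```

The two values agree because a constant hazard implies that having survived to age 3 carries no information about the remaining lifetime.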

As the probability of an individual surviving until age t or later is S(t), by definition, the expected number of survivors at age t out of an initial population of n newborns is n × S(t), assuming the same survival function for all individuals. Thus the expected proportion of survivors is S(t). If the survival of different individuals is independent, the number of survivors at age t has a binomial distribution with parameters n and S(t), and the variance of the proportion of survivors is S(t) × (1-S(t))/n.

The age at which a specified proportion of survivors remain can be found by solving the equation S(t) = q for t, where q is the quantile in question. Typically one is interested in the median lifetime, for which q = 1/2, or other quantiles such as q = 0.90 or q = 0.99.
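When S(t) = q has no convenient closed-form solution, the equation can be solved numerically because S is non-increasing. The sketch below uses bisection; the function name, the search bound `upper`, and the exponential example with rate 0.1 are assumptions chosen for illustration. For that example the median can be verified against the closed form ln(2)/rate.

```python
import math


def survival_quantile(survival, q, upper=1e6, tol=1e-9):
    """Smallest t (to within tol) with S(t) <= q; assumes S(upper) <= q."""
    low, high = 0.0, upper
    while high - low > tol:
        mid = (low + high) / 2.0
        if survival(mid) > q:
            low = mid      # still more than a proportion q surviving
        else:
            high = mid
    return (low + high) / 2.0


def exponential_survival(t, rate=0.1):
    return math.exp(-rate * t)


median = survival_quantile(exponential_survival, 0.5)
t_90 = survival_quantile(exponential_survival, 0.90)
print(f"median lifetime: {median:.4f}  (ln 2 / rate = {math.log(2) / 0.1:.4f})")
print(f"age at which 90% remain: {t_90:.4f}")
```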

One can also make more complex inferences from the survival distribution. In mechanical reliability problems, one can bring cost (or, more generally, utility) into consideration, and thus solve problems concerning repair or replacement. This leads to the study of renewal theory and reliability theory of ageing and longevity.

Censoring

Censoring is a form of missing data problem which is common in survival analysis. Ideally, both the birth and death dates of a subject are known, in which case the lifetime is known.

If it is known only that the date of death is after some date, this is called right censoring. Right censoring will occur for those subjects whose birth date is known but who are still alive when they are lost to follow-up or when the study ends.

If a subject's lifetime is known to be less than a certain duration, the lifetime is said to be left-censored. Left censoring is usually applied when subjects in a study already have exhibited the event in question at the start of the study but information about when they first reached the event is unclear.^[4]

It may also happen that subjects with a lifetime less than some threshold may not be observed at all: this is called truncation. Note that truncation is different from left censoring, since for a left censored datum, we know the subject exists, but for a truncated datum, we may be completely unaware of the subject. Truncation is also common. In a so-called delayed entry study, subjects are not observed at all until they have reached a certain age. For example, people may not be observed until they have reached the age to enter school. Any deceased subjects in the pre-school age group would be unknown. Left-truncated data are common in actuarial work for life insurance and pensions.^[5]

We generally encounter right-censored data. Left-censored data can occur when a person's survival time is incomplete on the left side of the follow-up period. For example, in an epidemiological study, we may monitor a patient for an infectious disorder starting from the time when he or she tests positive for the infection. Although we may know the right-hand side of the duration of interest, we may never know the exact time of exposure to the infectious agent.^[6]

Fitting parameters to data

Survival models can be usefully viewed as ordinary regression models in which the response variable is time. However, computing the likelihood function (needed for fitting parameters or making other kinds of inferences) is complicated by the censoring. The likelihood function for a survival model, in the presence of censored data, is formulated as follows. By definition the likelihood function is the conditional probability of the data given the parameters of the model. It is customary to assume that the data are independent given the parameters. Then the likelihood function is the product of the likelihood of each datum. It is convenient to partition the data into four categories: uncensored, left censored, right censored, and interval censored. These are denoted "unc.", "l.c.", "r.c.", and "i.c." in the equation below.

L(\theta )=\prod _{i\in unc.}\Pr(T=T_{i}\mid \theta )\prod _{i\in l.c.}\Pr(T<T_{i}\mid \theta )\prod _{i\in r.c.}\Pr(T>T_{i}\mid \theta )\prod _{i\in i.c.}\Pr(T_{i,l}<T<T_{i,r}\mid \theta ).

For uncensored data, with $T_{i}$ equal to the age at death, we have

\Pr(T=T_{i}\mid \theta )=f(T_{i}\mid \theta ).

For left-censored data, such that the age at death is known to be less than $T_{i}$ , we have

\Pr(T<T_{i}\mid \theta )=F(T_{i}\mid \theta )=1-S(T_{i}\mid \theta ).

For right-censored data, such that the age at death is known to be greater than $T_{i}$ , we have

\Pr(T>T_{i}\mid \theta )=1-F(T_{i}\mid \theta )=S(T_{i}\mid \theta ).

For an interval censored datum, such that the age at death is known to be less than $T_{i,r}$ and greater than $T_{i,l}$ , we have

\Pr(T_{i,l}<T<T_{i,r}\mid \theta )=S(T_{i,l}\mid \theta )-S(T_{i,r}\mid \theta ).
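As a minimal sketch of how the four kinds of contribution combine, the following base R code evaluates the log-likelihood for an exponential model and maximizes it numerically. The toy data, the column names (`type`, `t1`, `t2`) and the choice of the exponential distribution are assumptions made for illustration; they are not taken from the article.

```r
# Hypothetical toy data: one row per observation.
# type "unc": exact time t1; "lc": event before t1; "rc": event after t1;
# type "ic": event between t1 and t2.
obs <- data.frame(
  type = c("unc", "unc", "lc", "rc", "rc", "ic"),
  t1   = c(2.0, 5.5, 1.0, 4.0, 7.0, 3.0),
  t2   = c(NA,  NA,  NA,  NA,  NA,  6.0)
)

# Survival function S(t | rate) and density f(t | rate) of the exponential model
S <- function(t, rate) pexp(t, rate, lower.tail = FALSE)
f <- function(t, rate) dexp(t, rate)

# Log-likelihood: sum over the data of the log of each datum's contribution
loglik <- function(rate, d) {
  contrib <- ifelse(d$type == "unc", f(d$t1, rate),
             ifelse(d$type == "lc",  1 - S(d$t1, rate),
             ifelse(d$type == "rc",  S(d$t1, rate),
                                     S(d$t1, rate) - S(d$t2, rate))))
  sum(log(contrib))
}

# Maximize over the single parameter (the search interval is an arbitrary choice)
fit <- optimize(loglik, interval = c(1e-4, 5), d = obs, maximum = TRUE)
rate_hat <- fit$maximum
rate_hat
```

When only uncensored and right-censored observations are present, the exponential model has the closed-form estimate (number of events) / (total observed time), which gives a quick check on the numerical result.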

An important application where interval-censored data arise is current status data, where an event time $T_{i}$ is known not to have occurred before an observation time and to have occurred before the next observation time.

Non-parametric estimation

The Kaplan–Meier estimator can be used to estimate the survival function. The Nelson–Aalen estimator can be used to provide a non-parametric estimate of the cumulative hazard rate function.
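Both estimators are simple functions of the number of events and the number at risk at each distinct event time. The base R sketch below computes them by hand for a small hypothetical data set; the values and variable names are illustrative assumptions.

```r
# Hypothetical toy data: follow-up time and event indicator
# (1 = event observed, 0 = right-censored)
time   <- c(2, 3, 3, 5, 7, 8, 8, 10)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)

event_times <- sort(unique(time[status == 1]))

# d: number of events at each event time; n: number at risk just before it
d <- sapply(event_times, function(t) sum(time == t & status == 1))
n <- sapply(event_times, function(t) sum(time >= t))

km <- cumprod(1 - d / n)   # Kaplan-Meier estimate of the survival function
na <- cumsum(d / n)        # Nelson-Aalen estimate of the cumulative hazard

data.frame(time = event_times, at_risk = n, events = d,
           kaplan_meier = km, nelson_aalen = na)
```

Because exp(-x) is never smaller than 1 - x, the survival estimate exp(-na) derived from the Nelson–Aalen estimator is never below the Kaplan–Meier estimate at the same time point.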

Computer software for survival analysis

The UCLA website http://www.ats.ucla.edu/stat/ has numerous examples of statistical analyses using SAS, R, SPSS and STATA, including survival analyses.

The textbook by Kleinbaum ^[7] has examples of survival analyses using SAS, R, and other packages. The textbooks by Brostrom ^[8], by Dalgaard ^[3], and by Tableman and Kim ^[9] give examples of survival analyses using R (or using S code that also runs in R).

Survival analysis in R

The code below performs the analyses on this Wikipedia page.

Analyses using the R package "survival"

The examples above use the R package "survival", except for the tree analyses described below.

# Install and load the survival package

install.packages("survival")
library(survival)

# Sort the aml data by time
aml <- aml[order(aml$time), ]

# Display the sorted data
aml

# Plot the length of time that each subject was in the study
with(aml, plot(time, type = "h"))

# Create the life table survival object for aml
aml.survfit = survfit(Surv(time, status == 1) ~ 1, data=aml)

# Plot the Kaplan-Meier curve for aml. Don't print the confidence interval.
plot(aml.survfit, xlab = "Time (weeks)", ylab="Proportion surviving", conf.int=FALSE, main="Survival in AML")

# Display the life table for the aml data.
# survfit() and Surv() created the life table survival object aml.survfit above.
# summary() displays the life table; plot() draws the KM curve from the same object.
summary(aml.survfit)

# Kaplan-Meier curve for aml with the confidence bounds. 
# By default, R includes the confidence interval. 
plot(aml.survfit, xlab = "Time", ylab="Proportion surviving")

# Create aml life tables and KM plots broken out by treatment (x,  "Maintained" vs. "Not maintained")
surv.by.aml.rx = survfit(Surv(time, status == 1) ~ x, data = aml)

summary(surv.by.aml.rx)

# Plot KM 
plot(surv.by.aml.rx, xlab = "Time", ylab="Survival",col=c("black", "red"), lty = 1:2, main="Kaplan-Meier Survival vs. Maintenance in AML")

# Add legend
legend(100, .6, c("Maintained", "Not maintained"), lty = 1:2, col=c("black", "red"))

# Perform the log rank test using the R function survdiff().

surv.diff.aml= survdiff(Surv(time, status == 1) ~ x, data=aml)

surv.diff.aml

# Cox Proportional Hazards regression
# melanoma data set from ISwR package, described in Dalgaard Chapter 12. 
# install the ISwR package and load the library into R.
# The ISwR package currently only appears to be available for older versions of R

install.packages("ISwR")

library(ISwR)

help(melanom) # description of the melanoma data

# The log-rank test is a special case of Cox proportional hazards regression analysis.
# The same comparison can be performed using the R function coxph().
# Melanoma example using a log-rank test.
surv.diff.sex = survdiff(Surv(days, status == 1) ~ sex, data = melanom)

surv.diff.sex

# melanoma analysis using Cox proportional hazards regression
coxph.sex = coxph(Surv(days, status == 1) ~ sex, data = melanom)

summary(coxph.sex)

# Melanoma Cox analysis including the covariate tumor thickness (variable thick)

# Plot the thickness values and log(thickness)
hist(melanom$thick)

hist(log(melanom$thick))

# The Cox PH analysis of melanoma data including covariate log(thick)

coxph.sex.thick = coxph(Surv(days, status == 1) ~ sex + log(thick), data = melanom)

summary(coxph.sex.thick)

# Examine thickness by sex
boxplot(log(melanom$thick) ~ melanom$sex)

t.test(log(melanom$thick) ~ melanom$sex)

# Test of proportional hazards assumption
coxph.sex = coxph(Surv(days, status == 1) ~ sex, data = melanom)

cox.zph(coxph.sex)
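The relationship between the log-rank test and the Cox model noted in the comments above can be checked on the aml data, which ships with the "survival" package. This is a minimal sketch: it compares the chi-square statistic from survdiff() with the score test reported by summary() for a Cox model with treatment as the only covariate. The two values are expected to be close but need not be identical when the data contain tied event times, because coxph() applies a tie correction.

```r
library(survival)

# Log-rank test comparing the two maintenance groups
lr <- survdiff(Surv(time, status) ~ x, data = aml)

# Cox model with the same grouping variable as the only covariate
cox <- coxph(Surv(time, status) ~ x, data = aml)

logrank_chisq <- lr$chisq
score_chisq   <- unname(summary(cox)$sctest["test"])

c(logrank = logrank_chisq, cox_score = score_chisq)
```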

Survival tree analysis using the rpart package

Rpart and the example are described in the PDF document "An Introduction to Recursive Partitioning Using the RPART Routines". Terry M. Therneau, Elizabeth J. Atkinson, Mayo Foundation. September 3, 1997.

install.packages("rpart")
library(rpart)

head(stagec)

# Pass a survival object from Surv() to the function rpart() to perform the analysis.
fit <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade + gleason + ploidy, data=stagec)

# Plot the resulting tree (TRUE is spelled out because T can be reassigned)
plot(fit, uniform=TRUE, branch=.4, compress=TRUE)
text(fit, use.n=TRUE)

# The print() function provides details of the tree not shown in the plot
print(fit)

Survival random forest models using the randomForestSRC package

Note that the R package randomSurvivalForest has been replaced by the package randomForestSRC, "Random Forests for Survival, Regression and Classification". See the randomForestSRC package for documentation on running the example.

Distributions used in survival analysis

Exponential distribution
Weibull distribution
Log-logistic distribution
Gamma distribution
Exponential-logarithmic distribution
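Some of these distributions can be fitted with survreg() from the "survival" package. The sketch below fits exponential, Weibull and log-logistic models to the aml data and compares them by AIC. It covers only the three distributions that survreg() supports directly; the gamma and exponential-logarithmic distributions would need other tools. The choice of the aml data and of AIC as the comparison criterion is an assumption made for illustration.

```r
library(survival)

# Distribution names accepted by survreg() for these three models
dists <- c("exponential", "weibull", "loglogistic")

# Fit one parametric model per distribution, with treatment as the covariate
fits <- lapply(dists, function(d) {
  survreg(Surv(time, status) ~ x, data = aml, dist = d)
})
names(fits) <- dists

# Maximized log-likelihood and AIC for each fitted model
loglik_by_dist <- sapply(fits, function(f) as.numeric(logLik(f)))
aic_by_dist    <- sapply(fits, AIC)

rbind(logLik = loglik_by_dist, AIC = aic_by_dist)
```

In survreg() the exponential model is the Weibull model with its scale parameter fixed at 1, so the Weibull fit can never have a lower maximized log-likelihood than the exponential fit.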

References

[1] Miller, Rupert G. (1997), Survival analysis, John Wiley & Sons, ISBN 0-471-25218-2
[2] Renganathan, Vinaitheerthan (2016-03-31). "Overview of Frequentist and Bayesian approach to Survival Analysis". Applied Medical Informatics. 38 (1): 25–38. ISSN 2067-7855.
[3] Dalgaard, Peter (2008), Introductory Statistics with R (Second ed.), Springer, ISBN 978-0387790534
[4] "Censoring, Left and Right." International Encyclopedia of the Social Sciences, edited by William A. Darity, Jr., 2nd ed., vol. 1, Macmillan Reference USA, 2008, pp. 473-474. libraries.state.ma.us/login?gwurl=http://ic.galegroup.com/ic/uhic/ReferenceDetailsPage/ReferenceDetailsWindow?disableHighlighting=false&displayGroupName=Reference&currPage=&scanId=&query=&prodId=UHIC&search_within_results=&p=UHIC%3AWHIC&mode=view&catId=&limiter=&display-query=&displayGroups=&contentModules=&action=e&sortBy=&documentId=GALE%7CCX3045300295&windowstate=normal&activityType=&failOverType=&commentary=&source=Bookmark&u=mlin_w_amhercol&jsid=0938fef854cc86b83b5fe8a2c4bcb54b. Accessed 6 Nov. 2016.
[5] Richards, S. J. (2012). "A handbook of parametric survival models for actuarial use". Scandinavian Actuarial Journal. 2012 (4): 233–257. doi:10.1080/03461238.2010.506688.
[6] Singh, R.; Mukhopadhyay, K. (2011). "Survival analysis in clinical trials: Basics and must know areas". Perspect Clin Res. 2 (4): 145–148. doi:10.4103/2229-3485.86872.
[7] Kleinbaum, David G.; Klein, Mitchel (2012), Survival analysis: A Self-learning text (Third ed.), Springer, ISBN 978-1441966452
[8] Brostrom, Göran (2012), Event History Analysis with R (First ed.), Chapman & Hall/CRC, ISBN 978-1439831649
[9] Tableman, Mara; Kim, Jong Sung (2003), Survival Analysis Using S (First ed.), Chapman and Hall/CRC, ISBN 978-1584884088

External links

Therneau, Terry. "A Package for Survival Analysis in S". via Dr. Therneau's page on the Mayo Clinic website
"Engineering Statistics Handbook". NIST/SEMATECH.
SOCR, Survival analysis applet and interactive learning activity.
Survival/Failure Time Analysis @ Statistics' Textbook Page
Survival Analysis in R
Lifelines, a Python package for survival analysis
Survival Analysis in NAG Fortran Library

UpToDate Contents


1. Prognosis of heart failure
2. Indications and contraindications for cardiac transplantation in adults
3. Systematic review and meta-analysis
4. Infants with antenatal exposure to selective serotonin reuptake inhibitors (SSRIs) and serotonin-norepinephrine reuptake inhibitors (SNRIs)
5. Treatment of atrial septal abnormalities (PFO, ASD, and ASA) for prevention of stroke in adults

English Journal

Outcomes of Chinese Patients with End-stage Pulmonary Disease while Awaiting Lung Transplantation: A Single-center Study.

He WX, Yang YL, Xia Y, Song N, Liu M, Zhang P, Fan J, Jiang GN.
Chinese Medical Journal. Chin Med J (Engl). 2016 Jan 5;129(1):3-7. doi: 10.4103/0366-6999.172547.
BACKGROUND: The factors affecting the outcome of patients referred for lung transplantation (LTx) still have not been investigated extensively. The aim of this study was to characterize the patient outcomes and identify the prognostic factors for death while awaiting the LTx. METHODS: From January 20
PMID 26712425

Toxicity and bioaccumulation of copper in Limnodrilus hoffmeisteri under different pH values: Impacts of perfluorooctane sulfonate.

Meng L, Yang S, Feng M, Qu R, Li Y, Liu J, Wang Z, Sun C.
Journal of Hazardous Materials. J Hazard Mater. 2016 Mar 15;305:219-28. doi: 10.1016/j.jhazmat.2015.11.048. Epub 2015 Dec 2.
Aquatic oligochaete Limnodrilus hoffmeisteri (L. hoffmeisteri) has been commonly used as a lethal and/or sub-lethal toxicological model organism in ecological risk assessments in contaminated water environments. In this study, experiments were conducted to investigate the potential toxic effects of
PMID 26686481

A robust localized soft sensor for particulate matter modeling in Seoul metro systems.

Liu H, Yoo C.
Journal of Hazardous Materials. J Hazard Mater. 2016 Mar 15;305:209-18. doi: 10.1016/j.jhazmat.2015.11.051. Epub 2015 Dec 2.
Developing accurate soft sensors to predict and monitor the indoor air quality (IAQ) of hazardous pollutants that accumulate in underground metro systems is of key importance. The just-in-time (JIT) learning technique possesses a local feature that can track the variations in the dynamic process mor
PMID 26686480

Japanese Journal

A NEW RISK ESTIMATION MODEL OF BAYESIAN NETWORK FOR ADAPTING TO DRIVING ENVIRONMENT CHANGING

ZHANG ZHONG,FURUICHI TAIRA,UEDA TAKUMA,AKIDUKI TAKUMA,MASHIMO TOMOAKI
ICIC express letters. Part B, Applications : an international journal of research and surveys 10(6), 515-521, 2019-06
NAID 40021897270

Working Hours and Risk of Acute Myocardial Infarction and Stroke Among Middle-Aged Japanese Men : The Japan Public Health Center-Based Prospective Study Cohort II

山岸良匡,Rie Hayashi,Hiroyasu Iso,Kazumasa YAMAGISHI,Hiroshi Yatsuya,Isao Saito,Yoshihiro Kokubo,Ehab S. Eshak,Norie Sawada,Shoichiro Tsugane
Circulation journal 83(5), 1072-1079, 2019-04
… Cox proportional hazards models adjusted for sociodemographic factors, cardiovascular risk factors, and occupation showed that multivariable-adjusted hazard ratios (HRs) associated with overtime work of ≥11h/day were: 1.63 (95% confidence interval [CI] 1.01–2.63) for acute myocardial infarction and 0.83 (95% CI 0.60–1.13) for total stroke, as compared with the reference group (working 7 to <9 h/day). …
NAID 120006619045

Achieving LDL cholesterol target levels <1.81 mmol/L may provide extra cardiovascular protection in patients at high risk: Exploratory analysis of the Standard Versus Intensive Statin Therapy for Patients with Hypercholesterolaemia and Diabetic Retinopathy study

Itoh Hiroshi,Komuro Issei,Takeuchi Masahiro,Akasaka Takashi,Daida Hiroyuki,Egashira Yoshiki,Fujita Hideo,Higaki Jitsuo,Hirata Ken-ichi,Ishibashi Shun,Isshiki Takaaki,Ito Sadayoshi,Kashiwagi Atsunori,Kato Satoshi,Kitagawa Kazuo,Kitakaze Masafumi,Kitazono Takanari,Kurabayashi Masahiko,Miyauchi Katsumi,Murakami Tomoaki,Murohara Toyoaki,Node Koichi,Ogawa Susumu,Saito Yoshihiko,Seino Yoshihiko,Shigeeda Takashi,Shindo Shunya,Sugawara Masahiro,Sugiyama Seigo,Terauchi Yasuo,Tsutsui Hiroyuki,Ueshima Kenji,Utsunomiya Kazunori,Yamagishi Masakazu,Yamazaki Tsutomu,Yo Shoei,Yokote Koutaro,Yoshida Kiyoshi,Yoshimura Michihiro,Yoshimura Nagahisa,Nakao Kazuwa,Nagai Ryozo
Diabetes Obesity & Metabolism 21(4), 791-800, 2019-04
… A Cox proportional hazards model was used to estimate hazard ratios (HRs) for incidence of the primary endpoint in patients who achieved target LDL cholesterol levels in each group. …
NAID 120006601525

「Cox proportional hazards model」

Japanese: コックス比例ハザードモデル、Cox比例ハザードモデル

Related: Cox model, hazard model, hazards model, proportional hazard model, proportional hazards model

「Cox model」

Japanese: コックスモデル、Coxモデル

Related: Cox proportional hazards model, hazard model, hazards model, proportional hazard model, proportional hazards model

「proportional hazards model」

Japanese: (clinical statistics) 比例ハザードモデル

Related: Cox model, Cox proportional hazards model, hazard model, hazards model, proportional hazard model

「hazards model」

Japanese: ハザードモデル

Related: Cox model, Cox proportional hazards model, hazard model, proportional hazard model, proportional hazards model

「ハザードモデル」

English: hazard model, hazards model
Related: 比例ハザードモデル (proportional hazards model), コックス比例ハザードモデル (Cox proportional hazards model), コックスモデル (Cox model)

「proportional hazard model」

Japanese: 比例ハザードモデル

Related: Cox model, Cox proportional hazards model, hazard model, hazards model, proportional hazards model

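「hazard」

n.

hazard, danger, harm; chance, luck; an unexpected occurrence, accident; leaving things to chance

vt.

to stake (one's life, etc.); to venture something at risk, to try one's luck; to wager (money, etc.), to expose to danger

Related: danger, dangerous, harm, hazardous, hazardously, injure, jeopardy, risk, risky, unsafe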

「mode」

n.

mode, type, mechanism, method

Related: fashion, form, manner, means, mechanism, method, pattern, procedure, process, type, typed, way

「model」

n.

model, scale model

[Miller1997-1] Miller, Rupert G. (1997), Survival analysis, John Wiley & Sons, ISBN 0-471-25218-2

[2] Renganathan, Vinaitheerthan (2016-03-31). "Overview of Frequentist and Bayesian approach to Survival Analysis". Applied Medical Informatics. 38 (1): 25–38. ISSN 2067-7855.

[Dalgaard2008-3] Dalgaard, Peter (2008), Introductory Statistics with R (Second ed.), Springer, ISBN 978-0387790534

[4] "Censoring, Left and Right." International Encyclopedia of the Social Sciences, edited by William A. Darity, Jr., 2nd ed., vol. 1, Macmillan Reference USA, 2008, pp. 473-474. libraries.state.ma.us/login?gwurl=http://ic.galegroup.com/ic/uhic/ReferenceDetailsPage/ReferenceDetailsWindow?disableHighlighting=false&displayGroupName=Reference&currPage=&scanId=&query=&prodId=UHIC&search_within_results=&p=UHIC%3AWHIC&mode=view&catId=&limiter=&display-query=&displayGroups=&contentModules=&action=e&sortBy=&documentId=GALE%7CCX3045300295&windowstate=normal&activityType=&failOverType=&commentary=&source=Bookmark&u=mlin_w_amhercol&jsid=0938fef854cc86b83b5fe8a2c4bcb54b. Accessed 6 Nov. 2016.

[5] Richards, S. J. (2012). "A handbook of parametric survival models for actuarial use". Scandinavian Actuarial Journal. 2012 (4): 233–257. doi:10.1080/03461238.2010.506688.

[6] Singh, R.; Mukhopadhyay, K. (2011). "Survival analysis in clinical trials: Basics and must know areas". Perspect Clin Res. 2 (4): 145–148. doi:10.4103/2229-3485.86872.

[KleinbaumKlein2012-7] Kleinbaum, David G.; Klein, Mitchel (2012), Survival analysis: A Self-learning text (Third ed.), Springer, ISBN 978-1441966452

[Brostrom2012-8] Brostrom, Göran (2012), Event History Analysis with R (First ed.), Chapman & Hall/CRC, ISBN 978-1439831649

[TablemanKim2003-9] Tableman, Mara; Kim, Jong Sung (2003), Survival Analysis Using S (First ed.), Chapman and Hall/CRC, ISBN 978-1584884088
