Tuesday, February 2, 2010

Use of Censored Data in Survival Analysis

Let us include some censored data in the previous exercise and repeat it. Every number followed by + is a censored observation. These are censored because the event of interest had not happened in those cases by the recorded time; the reason can be anything.

The following data give the time, in days, at which the first claim happened in various cases under a health insurance product.

157, 300, 289, 91, 102+, 235, 28, 188, 311+, 32+, 119, 273, 78, 200+, 37, 349, 235, 96, 200+, 157, 78+, 314, 178, 135+, 263+, 62, 198+, 235, 192.
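Before working by hand, it can help to write the data down as (days, claim observed) pairs, so that a censored value such as 102+ is kept apart from an actual claim at 102 days. The following is only a minimal sketch of one way to do that; the names RAW and parse_observations are made up for this illustration.

```python
# The data from the post; a trailing "+" marks a censored observation.
RAW = (
    "157, 300, 289, 91, 102+, 235, 28, 188, 311+, 32+, 119, 273, 78, "
    "200+, 37, 349, 235, 96, 200+, 157, 78+, 314, 178, 135+, 263+, 62, "
    "198+, 235, 192"
)


def parse_observations(raw):
    """Turn the comma-separated text into (days, claim_observed) pairs.

    claim_observed is False for censored values (those written with "+").
    """
    observations = []
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue
        is_censored = token.endswith("+")
        days = int(token.rstrip("+"))
        observations.append((days, not is_censored))
    return observations


observations = parse_observations(RAW)
claims = [days for days, observed in observations if observed]
censored = [days for days, observed in observations if not observed]
print(len(observations), "cases:", len(claims), "claims,", len(censored), "censored")
```

Keeping the flag next to each time matters later: a censored case still counts as "no claim so far" up to its recorded day, but it must never be counted as a claim.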

We are interested in the pattern in the probability that the first claim happens only after a given number of days.

(Apply your own reasoning and try to find the pattern. To draw any real conclusion about probability in this case, we would need a larger volume of data. But doing this exercise manually will help in understanding the concept.)

1. In how many cases is there no claim until day 150?
2. What is the probability that there will not be any claim until day 150?
3. What is the range of days over which the probability in Q2 above remains the same?
4. Identify similar ranges in which the probability doesn't change.
5. Draw the probability vs. time graph. This graph is called the survival graph and represents the survival function.
6. Develop a method to draw such graphs from any given set of similar data.
7. Think about the impact on the graph if the volume of data is increased.
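
After trying the questions by hand, one standard method for Q6 is the product-limit (Kaplan-Meier) estimator. The post does not name a method, so treat the following as a sketch of one common approach rather than the intended answer. It assumes the usual convention that a case censored on day t is still counted as at risk for a claim on day t. The names kaplan_meier and survival_at are made up for this illustration.

```python
from collections import Counter

# Same data as in the post; a trailing "+" marks a censored observation.
RAW = (
    "157, 300, 289, 91, 102+, 235, 28, 188, 311+, 32+, 119, 273, 78, "
    "200+, 37, 349, 235, 96, 200+, 157, 78+, 314, 178, 135+, 263+, 62, "
    "198+, 235, 192"
)
# (days, claim_observed) pairs; claim_observed is False for censored values.
DATA = [
    (int(tok.strip().rstrip("+")), not tok.strip().endswith("+"))
    for tok in RAW.split(",")
]


def kaplan_meier(observations):
    """Return [(day, survival probability just after that day), ...].

    The curve only steps down on days with at least one observed claim.
    """
    claims_on = Counter(day for day, observed in observations if observed)
    leaving_on = Counter(day for day, _ in observations)  # claims + censored
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for day in sorted(leaving_on):
        d = claims_on.get(day, 0)
        if d:
            # Fraction of the cases still at risk that get past this day.
            survival *= (at_risk - d) / at_risk
            curve.append((day, survival))
        # Everyone recorded on this day (claim or censored) leaves the risk set.
        at_risk -= leaving_on[day]
    return curve


def survival_at(curve, day):
    """Estimated probability of no claim up to and including `day`."""
    probability = 1.0
    for step_day, step_survival in curve:
        if step_day > day:
            break
        probability = step_survival
    return probability


curve = kaplan_meier(DATA)
for day, s in curve:
    print(f"day {day:3d}: S = {s:.4f}")
print("S(150) =", round(survival_at(curve, 150), 4))
```

The probability stays flat between two consecutive claim days, which is why the survival graph is a staircase. Censored cases do not create a step of their own; they only shrink the number at risk, so later steps become larger.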
