Data & Theory - Insurance Example
"Insurance anyone?"
This post is admittedly lengthy and theoretical. It
covers a topic with which most people on the list are
probably familiar but only distantly involved.
I will use a business analogy, car insurance (liability
only), for the theory-data problem that we face: what
to do about data that yield results contradictory to
accepted theories. In the radon case, residential
radon is the independent variable, and the dependent
variable is lung cancer or perhaps even overall life
expectancy.
Let's say that we RadSafers got together and started a
car insurance company. We need to decide how to
perform risk assessment for drivers.
First, "everyone knows" that one's car insurance rate
is determined by one's driving record and age, and to
a lesser extent, by the vehicle one drives. Teenagers
who drive souped-up Camaros generally pay a lot for
insurance, especially if they've had a ticket or two.
It's intuitively obvious that any person who has a lot
of speeding tickets, has run stop signs, etc. is
probably more likely to be the cause of a fender
bender or crash than a 40-year-old zero-point driver
who drives a Volvo. Let's call this the traditional
three-factor model of car insurance: motor vehicle
record, age, vehicle.
As a driver but non-specialist in insurance matters, I
always thought that this three-factor model was THE
logical source of liability insurance rates. I have
given rather little thought to this, other than trying
to not get pulled over, avoiding driving in really bad
weather when the risk of a wreck is higher, etc. ;-).
However, there has been a new twist in auto insurance
in the last few years.
Relatively recently, because of a move, I had to
change auto policies. I comparison-shopped and got
quotes from two agencies working for two different
major firms. One firm merely ran a Motor Vehicle
record. The other ran the same Motor Vehicle record
and also ran a credit check. The quote from the
agency performing the credit check was 40% lower than
the quote from the agency that didn't perform the
check! Amazed, I told the gal at the second agency
to immediately write the policy. At the time, I was
under the impression that the credit check was merely
to see whether they thought I would pay my bill on
time or not. Being a person with a conservative
cause-and-effect background, I made no connection
between driving risk estimation on the insurer's
behalf and credit rating.
However, yesterday evening I was listening to the show
"Marketplace" on NPR and they discussed a new twist
that the insurance companies have for determining
drivers' auto insurance rates. Some of the "major"
insurance companies are using credit rating as a
surrogate for collision risk, because they have found
a high correlation between claims and bad credit.
They are charging higher premiums to people with a
history of bankruptcy, etc. Now, that's a
"correlational," "ecological" model if I've ever heard
one!
The insurance companies have apparently plugged
various factors into models of auto insurance claim
risk [i.e. fender bender and crash risk]. These
factors obviously include driving record, age and type
of vehicle driven. However, it was stated that a very
high =correlation= was obtained when credit rating was
plugged in as well. The correlation for credit was so
high that it rivaled the correlation with the driving
record. It's my guess that the insurance companies
used regression analysis to develop this model.
Although I can postulate a causal relationship between
credit rating and the likelihood of future insurance
bills being paid on time, I know of no demonstrated
=causal= relationship between credit rating and risk
of an auto insurance claim, e.g. driving habits and
risk of crash. Possible mechanisms can be proposed,
but none has been shown to explain why people with
good credit would be better drivers than those with
bad credit. Thus, the part of the credit rating that
bears on the driver's risk of a claim is solely
correlational: it rests on the probability that the
individual will engage in 'risky' behavior (in the
general sense) in the future. There is now, in
effect, a "four-factor model" of car insurance:
driving record + credit record + age + vehicle driven.
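To illustrate how such a correlation can arise with no
direct causal link, here is a minimal Python sketch on
made-up data. It assumes a hypothetical hidden trait
("caution," along the lines of the insurers' own
reasoning that a person careful in one area of life is
careful in others) that influences both a credit score
and a tendency to file claims. The variable names, the
noise levels and the simulated numbers are all
assumptions for illustration; none of this is insurance
data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, seed=0):
    """Simulate drivers whose credit and claims share one hidden cause.

    ASSUMPTION: 'caution' is a hypothetical latent trait. Credit score
    never enters the formula for claims, so any correlation between the
    two comes only from the shared hidden trait.
    """
    rng = random.Random(seed)
    credit, claims = [], []
    for _ in range(n):
        caution = rng.gauss(0, 1)                    # hidden trait
        credit.append(caution + rng.gauss(0, 1))     # higher = better credit
        claims.append(-caution + rng.gauss(0, 1))    # higher = more claims
    return credit, claims


credit, claims = simulate()
r = pearson(credit, claims)
print(f"correlation between credit and claims: {r:.2f}")
```

With these assumed noise levels the correlation comes
out near -0.5, even though credit score has no effect
on claims anywhere in the simulation. A model fitted
to such data would still predict claims usefully from
credit score, which is all the insurer needs.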
As a 'scientist,' I would say that the correlation
noted by the insurance industry is an interesting one
that should be tested [and apparently is being tested
by drivers across the USA].
Is it valid to make decisions on the basis of a
correlation such as this one that is not proved to be
directly causal? This is precisely why ecological
work with radon is so controversial. Note that there
is no proof of a mechanism, such as, for instance,
"people who have declared bankruptcy delay in getting
their brakes fixed and thus cause rear-end
collisions." [This might be plausible, and we hope
the driver behind us doesn't fit this description ;-)]
Think about this: The auto insurers haven't told us
WHY people with good credit have fewer claims than
people with bad credit, just that their data show
this is the case. There is no mechanistic model that
I know of.
Yet, the auto insurance industry has staked millions
of dollars' worth of rate decisions over the past
couple of years on what is solely a
correlational model. Remember that an insurance
company's goal is not to prove any particular theory
right or wrong, but is rather to make money by
minimizing claims paid out while maximizing profit by
maintaining strong market share i.e. number of
policies held at a price that customers are willing to
pay. So, if a particular model makes more money and
results in fewer claims being paid out than another
model, the insurance company will go with it, rather
than holding to a particular theory. In other words,
the insurance company goes with the data, because the
data help it to make money by reducing claims and
charging more for policies that have a higher risk of
claims. Economists say that the data-based model has
more "utility," a concept widely employed in finance
and insurance that denotes subjective value, here
value to the company.
This is an example of why =data cannot be ignored=
even if they do not fit a currently accepted theory.
At this time, no one has told the insurance companies
they have to stick to a traditional driving-record
model or a strict cause-and-effect model. So, the
insurance companies go with the model with the most
value to their bottom line. They predict a lower
claims-to-premiums ratio from the four-factor
empirical model than from driving record alone or
from driving record/age/vehicle, and thus have started
to use credit record to predict claims, even though
there is no definite explanation or mechanism for why
people with bad credit or bankruptcy have more auto
claims than others.
[It isn't the point of this post to discuss whether
this is "fair" or to discuss issues like privacy here
or to cover specific exceptions. However, what about
the 50-year-old 0-point driver who has just been
divorced or widowed and has recently had a marked
income reduction? Should almost exclusive emphasis
be placed on the [clean] driving record in this case?
Note that the credit rating is a "crude" measure
because it does not take into account factors in
individual situations such as loss of a job, death of
spouse, divorce, etc. Does this individual fit the
new, four-factor model?]
Back to radon. Although in reality, health insurance
companies use medical history, age, and geographic
location to determine premiums, let's imagine that a
health insurance company wanted to use radon as a
surrogate to help determine premiums [!]. If they
did, they could, based on data, reduce the health
insurance premiums for people who live in counties
with means of 1.5-5 pCi/l radon and raise the premiums
for people who live in counties with means of less
than 1.0 pCi/l radon, even if no explanation can be
provided at this time. [We don't have a mechanism for
auto collisions and bad credit, either. See above
about bankrupt people and brakes].
One theoretical health insurance example would be a
slightly reduced premium on the basis of the reduced
lung cancer risk documented by Dr. Cohen (1,2): a
small additional radon-related factor would be plugged
into health insurance formulas that already take into
account the greater factors of smoking and age.
Another theoretical example could be based on overall
life expectancy, where radon levels in the highest
longevity counties are higher than those in lower
longevity counties [controlled for race and gender].
[I would be happy to send a "Top 100/Bottom 100 Life
Expectancy Counties" Excel spreadsheet of life
expectancy vs. radon data for white females, from
last semester's term paper, to anyone who would like
one]. [See "Top 10" below]. No, I can't give a
mechanistic explanation.
Bottom line to think about: This post does NOT say
that correlation implies causation. What it does give
is an example of a multi-million dollar industry that
uses correlational data for large volumes of business
decisions where no clear explanation [or proof that
would contradict the data] is present. The point
is that it is not valid to ignore the radon data.
Rather, an explanation should be sought for it.
~Ruth 2 aka Ruth Sponsler
====================================================
Insurance links:
http://www.insurance.com/profiles_insights/ask_expert/ask_question_0400_7.asp
"It seems that there is a connection between credit
risk and safety risk. Although there is no
explanation for the findings, some insurance company
statistics show that drivers with derogatory credit,
historically file more accident claims than drivers
without derogatory credit. Insurers reason that a
consumer who is careful with one aspect of their life
(e.g., financial affairs) is also likely to be careful
with other aspects of their life (e.g., driving
habits)."
"Credit information is also needed to determine
whether an applicant is likely to pay premiums in a
timely fashion." [duh]
http://news.mpr.org/features/199901/22_stawickie_credit/
References:
===================================================
1. Cohen, B.L. 1997. Lung cancer rate vs. mean radon
level in U.S. counties of various characteristics.
Health Phys. 72(1):114-119.
2. Cohen, B.L. 1989. Expected indoor 222Rn levels in
counties with very high and very low lung cancer
rates. Health Phys. 57:897-907.
====================================================
Top 10/Bottom 10 Data
====================================================
(From term paper):
"Top 10" white female life expectancy counties in the
U.S. and their respective geometric mean radon levels
[note: Florida and Hawaii omitted, along with
California and Arizona, because of the large amount of
retiree migration from other areas]. Results with my
rough smoking correction were similar. Radon data
courtesy of B.L. Cohen and Minnesota comparisons
courtesy of Dr. P. Price.
http://eetd.lbl.gov/iep/high-radon/data/minn-tbl.html
=========================================
Top 10 Life Expectancy Counties for White Females
1. MN - Stearns Co. - 82.89 y - 2.286 pCi/l - 153 Bq/m^3 using Price et al. data
2. SD - Brookings Co. - 82.89 y - 2.761 pCi/l
3. MN - Nobles Co. - 82.80 y - 3.368 pCi/l - 198 Bq/m^3 using Price et al. data
4. MN - Carver Co. - 82.52 y - 2.423 pCi/l - 153 Bq/m^3 using Price et al. data
5. Iowa - Plymouth Co. - 82.48 y - 4.897 pCi/l [no joke]
6. N. Dakota - Eddy Co. - 82.45 y - 2.912 pCi/l
7. N. Dakota - Benson Co. - 82.45 y - 2.091 pCi/l
8. N. Dakota - Wells Co. - 82.45 y - 1.673 pCi/l
9. Iowa - Carroll Co. - 82.41 y - 3.760 pCi/l
10. S. Dakota - Yankton Co. - 82.40 y - 5.328 pCi/l [no joke!!]
Mean radon for Top 10 = 3.150 pCi/l +/- 0.380 pCi/l (SE)
--------------
Bottom 10 Life Expectancy Counties for White Females
1. Kentucky - Harlan Co. - 75.60 y - 1.484 pCi/l
2. W. Virginia - Logan Co. - 75.89 y - 1.238 pCi/l
3. W. Virginia - Fayette Co. - 76.2 y - 1.055 pCi/l
4. W. Virginia - Wyoming Co. - 76.21 y - 1.511 pCi/l
5. Indiana - Scott Co. - 76.45 y - 0.976 pCi/l
6. Oklahoma - McCurtain Co. - 76.51 y - 0.916 pCi/l
7. Missouri - St. Louis - 76.61 y - 1.0199 pCi/l
8. Mississippi - Pearl River Co. - 76.63 y - 0.684 pCi/l
9. W. Virginia - Boone Co. - 76.67 y - 1.057 pCi/l
10. Virginia - Wise Co. - 76.71 y - 1.840 pCi/l
Mean radon for Bottom 10 = 1.178 pCi/l +/- 0.108 pCi/l (SE)
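
For anyone who wants to check the two summary lines,
here is a short Python sketch that recomputes them from
the county radon values listed above. It assumes the
"mean" is the arithmetic mean of the ten county values
and that "SE" is the sample standard deviation (n - 1
in the denominator) divided by sqrt(n); the function
and variable names are my own.

```python
from math import sqrt

# County geometric-mean radon levels (pCi/l), copied from the lists above.
top10 = [2.286, 2.761, 3.368, 2.423, 4.897,
         2.912, 2.091, 1.673, 3.760, 5.328]
bottom10 = [1.484, 1.238, 1.055, 1.511, 0.976,
            0.916, 1.0199, 0.684, 1.057, 1.840]


def mean_and_se(values):
    """Return (arithmetic mean, standard error of the mean).

    ASSUMPTION: SE = sample standard deviation / sqrt(n),
    with n - 1 in the variance denominator.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, sqrt(variance / n)


top_mean, top_se = mean_and_se(top10)
bottom_mean, bottom_se = mean_and_se(bottom10)

print(f"Top 10:    {top_mean:.3f} +/- {top_se:.3f} pCi/l (SE)")
print(f"Bottom 10: {bottom_mean:.3f} +/- {bottom_se:.3f} pCi/l (SE)")
print(f"Difference in means: {top_mean - bottom_mean:.3f} pCi/l")
```

Under those assumptions the output matches the posted
figures: 3.150 +/- 0.380 for the Top 10 and 1.178 +/-
0.108 for the Bottom 10. Keep in mind that these are
the extreme counties only, so the contrast says nothing
by itself about the counties in between.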
===================================================
hmmm....
===================================================
> I have always felt as an experimentalist that data is
> sacrosanct and cannot be ignored. So I believe it is
> incumbent upon the supporters of the LNT to devise a
> quantitative model which reproduces that data, within
> statistics. Again as an experimentalist I do not feel
> that it is a prerequisite that I be able to explain my
> data, but that everyone with a theory must take it into
> account.
************************************************************************
You are currently subscribed to the Radsafe mailing list. To unsubscribe,
send an e-mail to Majordomo@list.vanderbilt.edu Put the text "unsubscribe
radsafe" (no quote marks) in the body of the e-mail, with no subject line.
You can view the Radsafe archives at http://www.vanderbilt.edu/radsafe/