Data & Theory - Insurance Example

"Insurance anyone?"

This post is admittedly lengthy and theoretical. It covers a topic that most people on the list are probably familiar with but only distantly involved in.

I will use a business analogy, car insurance (liability only), for the Theory-Data problem we have with "what to do about data that yield results contradicting accepted theories." In the radon case, residential radon is the independent variable, and the dependent variables are lung cancer or perhaps even overall life expectancy.

Let's say that we RadSafers got together and started a car insurance company. We need to decide how to perform risk assessment for drivers.

First, "everyone knows" that one's car insurance rate is determined by one's driving record and age and, to a lesser extent, by the vehicle one drives. Teenagers who drive souped-up Camaros generally pay a lot for insurance, especially if they've had a ticket or two. It's intuitively obvious that a person with a lot of speeding tickets, run stop signs, etc. is more likely to cause a fender bender or crash than a 40-year-old driver with zero points who drives a Volvo. Let's call this the traditional three-factor model of car insurance: motor vehicle record, age, vehicle.

As a driver but a non-specialist in insurance matters, I always thought that this three-factor model was THE logical basis for liability insurance rates. I have given rather little thought to this, other than trying not to get pulled over, avoiding driving in really bad weather when the risk of a wreck is higher, etc. ;-)

However, there has been a new twist in auto insurance in the last few years.

Relatively recently, because I moved, I had to change auto policies. I comparison-shopped and got quotes from two agencies working for two different major firms. One firm merely ran my motor vehicle record. The other ran the same motor vehicle record and also ran a credit check. The quote from the agency performing the credit check was 40% lower than the quote from the agency that didn't perform the check! Amazed, I told the gal at the second agency to write the policy immediately. At the time, I was under the impression that the credit check was merely to see whether they thought I would pay my bill on time. Being a person with a conservative cause-and-effect background, I made no connection between the insurer's estimate of my driving risk and my credit rating.

However, yesterday evening I was listening to the show "Marketplace" on NPR, and they discussed a new approach that insurance companies have for determining drivers' auto insurance rates. Some of the "major" insurance companies are using credit rating as a surrogate for collision risk, because they have found a high correlation between claims and bad credit. They are charging higher premiums to people with a history of bankruptcy, etc. Now, that's a "correlational," "ecological" model if I've ever heard one!

The insurance companies have apparently plugged various factors into models of auto insurance claim risk [i.e., fender bender and crash risk]. These factors obviously include driving record, age, and type of vehicle driven. However, it was stated that a very high =correlation= was obtained when credit rating was plugged in as well. The correlation for credit was so high that it rivaled the correlation with the driving record. My guess is that the insurance companies used regression analysis to develop this model. I can postulate a causal relationship between credit rating and the likelihood that future insurance bills will be paid on time. However, I can think of no way to establish an actual =causal= relationship between credit rating and the risk of an auto insurance claim (e.g., driving habits and risk of a crash). I have no idea why people with good credit would be better drivers than those with bad credit, although possible mechanisms can be proposed.
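
To see how a strong correlation can arise without any direct causal link, here is a toy simulation in Python. It is a hypothetical sketch, not anything the insurers have published. It assumes an unobserved trait (called `carefulness` here, an invented name) that raises credit scores and also lowers the chance of a claim. Credit is never used when deciding whether a claim occurs, yet the two still come out correlated. All parameters are made up.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate_drivers(n=10_000, seed=1):
    """Simulate hypothetical drivers whose credit and claims share a hidden cause.

    Every number here is invented for illustration; none comes from insurer data.
    """
    rng = random.Random(seed)
    credit_scores, had_claim = [], []
    for _ in range(n):
        carefulness = rng.gauss(0.0, 1.0)  # unobserved trait (the confounder)
        # Credit score depends on carefulness plus unrelated noise.
        credit_scores.append(650 + 60 * carefulness + rng.gauss(0.0, 40.0))
        # Claim probability depends on carefulness only; credit is never used here.
        p_claim = min(1.0, 0.10 * math.exp(-0.5 * carefulness))
        had_claim.append(1 if rng.random() < p_claim else 0)
    return credit_scores, had_claim


credit_scores, had_claim = simulate_drivers()
r = pearson(credit_scores, had_claim)
print(f"correlation(credit, claim) = {r:.2f}")  # negative: better credit, fewer claims
```

Nothing in this toy says the real insurers' correlation works this way. It only shows that a hidden common factor is enough to produce one.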



Thus, the portion of the credit rating that factors into the driver's risk of a claim is solely correlational. It is based on the probability that the individual will engage in 'risky' behavior (in the general sense) in the future. So there is now a "four-factor model" of car insurance: driving record + credit record + age + vehicle driven.
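
One common way to combine rating factors is multiplicatively: a base rate times a relativity for each factor. The sketch below assumes that structure. The base rate and every relativity are hypothetical numbers, chosen only to show how adding a credit factor changes a quote. They are not any insurer's actual tables.

```python
from dataclasses import dataclass

# Hypothetical rating tables (invented values, for illustration only).
BASE_RATE = 500.0  # annual liability premium in dollars before adjustment
RECORD = {"clean": 0.90, "one_ticket": 1.15, "two_plus": 1.50}
AGE = {"teen": 2.00, "adult": 1.00, "senior": 1.10}
VEHICLE = {"volvo": 0.95, "camaro": 1.30}
CREDIT = {"good": 0.80, "average": 1.00, "poor": 1.35}  # the "fourth factor"


@dataclass
class Driver:
    record: str
    age_band: str
    vehicle: str
    credit: str


def three_factor_premium(d: Driver) -> float:
    """Traditional model: driving record, age, vehicle."""
    return BASE_RATE * RECORD[d.record] * AGE[d.age_band] * VEHICLE[d.vehicle]


def four_factor_premium(d: Driver) -> float:
    """Same model with a credit relativity multiplied in."""
    return three_factor_premium(d) * CREDIT[d.credit]


volvo_driver = Driver("clean", "adult", "volvo", "good")
print(round(three_factor_premium(volvo_driver), 2))  # 427.5
print(round(four_factor_premium(volvo_driver), 2))   # 342.0
```

Here the same clean-record Volvo driver is quoted 20% lower once good credit is factored in. The credit factor alone moves the price, with no change in the driver's record, age, or vehicle. In practice the relativities would be estimated from claims data rather than typed in by hand.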



As a 'scientist,' I would say that the correlation noted by the insurance industry is an interesting one that should be tested [and apparently is being tested by drivers across the USA].

Is it valid to make decisions on the basis of a correlation such as this one, which has not been proved to be directly causal? This question is precisely why ecological work with radon is so controversial. Note that there is no proof of a mechanism, such as, for instance, "people who have declared bankruptcy delay getting their brakes fixed and thus cause rear-end collisions." [This might be plausible, and we hope the driver behind us doesn't fit this description ;-)]

Think about this: the auto insurers haven't told us WHY people with good credit have fewer claims than people with bad credit, just that their data show this is the case. There is no mechanistic model that I know of.

Yet the auto insurance industry has staked millions of dollars' worth of rate decisions over the past couple of years on what is solely a correlational model. Remember that an insurance company's goal is not to prove any particular theory right or wrong. Rather, its goal is to make money by minimizing the claims it pays out. It also maximizes profit by maintaining strong market share, i.e., the number of policies held at a price that customers are willing to pay. So, if a particular model makes more money and results in fewer claims being paid out than another model, the insurance company will go with it rather than hold to a particular theory. In other words, the insurance company goes with the data, because the data help it make money by reducing claims and charging more for policies that have a higher risk of claims. Economists say that the data-based model has more "utility," meaning that it has more subjective value to the insurance company. "Utility" is a concept widely employed in finance and insurance that denotes subjective value, i.e., value to the company.

This is an example of why =data cannot be ignored= even if they do not fit a currently accepted theory.

At this time, no one has told the insurance companies that they have to stick to a traditional driving-record model or a strict cause-and-effect model. So the insurance companies go with the model that adds the most value to their bottom line. The insurance companies predict a lower claims-to-premiums ratio from the four-factor empirical model than from driving record alone, or from driving record/age/vehicle. So they have started to use credit record to predict claims, even though there is no definite explanation or mechanism for why people with bad credit or bankruptcies have more auto claims than others.

[It isn't the point of this post to discuss whether this is "fair," to discuss issues like privacy, or to cover specific exceptions. However, what about the 50-year-old zero-point driver who has just been divorced or widowed and has recently had a marked income reduction? Should almost exclusive emphasis be placed on the [clean] driving record in this case? Note that the credit rating is a "crude" measure because it does not take into account individual situations such as loss of a job, death of a spouse, divorce, etc. Does this individual fit the new four-factor model?]

Back to radon. In reality, health insurance companies use medical history, age, and geographic location to determine premiums. But let's imagine that a health insurance company wanted to use radon as a surrogate to help determine premiums [!]. If it did, it could reduce the premiums for people who live in counties with mean radon levels of 1.5-5 pCi/l. It could also raise the premiums for people who live in counties with means of less than 1.0 pCi/l. Both changes would be based on data, even if no explanation can be provided at this time. [We don't have a mechanism for auto collisions and bad credit, either. See above about bankrupt people and brakes.]
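
The thought experiment above amounts to a simple lookup from county mean radon to a premium multiplier. Here is a hypothetical sketch of that lookup. The 0.98 and 1.02 multipliers are invented. Treating the 1.5-5 pCi/l band as inclusive is an assumption. Counties between 1.0 and 1.5 pCi/l, or above 5 pCi/l, are left unadjusted because the post does not say what to do with them. The example counties and their values are taken from the Top 10/Bottom 10 lists at the end of this post, which give geometric means.

```python
def radon_premium_factor(county_mean_pci_per_l: float) -> float:
    """Hypothetical premium multiplier keyed on county mean radon (pCi/l).

    Thresholds follow the thought experiment in the post; the multipliers
    (0.98 and 1.02) and the inclusive 1.5-5 band are assumptions.
    """
    if county_mean_pci_per_l < 1.0:
        return 1.02  # surcharge for counties with means below 1.0 pCi/l
    if 1.5 <= county_mean_pci_per_l <= 5.0:
        return 0.98  # discount for counties with means of 1.5-5 pCi/l
    return 1.00      # bands the post does not address: no adjustment


def adjusted_premium(base_premium: float, county_mean_pci_per_l: float) -> float:
    """Apply the hypothetical radon multiplier to an existing premium."""
    return base_premium * radon_premium_factor(county_mean_pci_per_l)


for county, radon in [("Stearns Co., MN", 2.286),
                      ("Pearl River Co., MS", 0.684),
                      ("Harlan Co., KY", 1.484)]:
    print(county, round(adjusted_premium(1000.0, radon), 2))
# Stearns Co., MN 980.0
# Pearl River Co., MS 1020.0
# Harlan Co., KY 1000.0
```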





A theoretical health insurance example would be a slightly reduced premium based on the reduced lung cancer risk documented by Dr. Cohen (1,2). The insurer could plug a small additional factor related to lung cancer into health insurance formulas that already take into account the larger factors of smoking and age. Another theoretical example could be based on overall life expectancy, since radon levels in the highest-longevity counties are higher than those in lower-longevity counties [controlled for race and gender]. [From last semester's term paper, I have a "Top 100/Bottom 100 Life Expectancy Counties" Excel spreadsheet of life expectancy vs. radon data for white females. I would be happy to send it to anyone who would like one.] [See "Top 10" below.] No, I can't give a mechanistic explanation.

Bottom line to think about: this post does NOT say that correlation implies causation. What it does give is an example of a multi-million-dollar industry that uses correlational data for large volumes of business decisions where no clear explanation [or proof that would contradict the data] is present. The point is that it is not valid to ignore the radon data. Rather, an explanation should be sought for it.

~Ruth 2 aka Ruth Sponsler



====================================================



Insurance links:



http://www.insurance.com/profiles_insights/ask_expert/ask_question_0400_7.asp



"It seems that there is a connection between credit risk and safety risk.  Although there is no explanation for the findings, some insurance company statistics show that drivers with derogatory credit, historically file more accident claims than drivers without derogatory credit. Insurers reason that a consumer who is careful with one aspect of their life (e.g., financial affairs) is also likely to be careful with other aspects of their life (e.g., driving habits)."

"Credit information is also needed to determine whether an applicant is likely to pay premiums in a timely fashion." [duh]

http://news.mpr.org/features/199901/22_stawickie_credit/



References:



===================================================

1. Cohen, B.L. 1997. Lung cancer rate vs. mean radon level in U.S. counties of various characteristics. Health Phys. 72(1):114-119.

2. Cohen, B.L. 1989. Expected indoor 222Rn levels in counties with very high and very low lung cancer rates. Health Phys. 57:897-907.

====================================================

Top 10/Bottom 10 Data

====================================================

(From term paper):

"Top 10" white female life expectancy counties in the U.S. and their respective geometric mean radon levels. [Note: Florida and Hawaii were omitted, along with California and Arizona, because of the large amount of retiree migration from other areas.] Results with my rough smoking correction were similar. Radon data courtesy of B.L. Cohen; Minnesota comparisons courtesy of Dr. P. Price.

http://eetd.lbl.gov/iep/high-radon/data/minn-tbl.html



=========================================

Top 10 Life Expectancy Counties for white females.



1.  Minnesota - Stearns Co. - 82.89 y - 2.286 pCi/l - 153 Bq/m^3 using Price et al. data
2.  S. Dakota - Brookings Co. - 82.89 y - 2.761 pCi/l
3.  Minnesota - Nobles Co. - 82.80 y - 3.368 pCi/l - 198 Bq/m^3 using Price et al. data
4.  Minnesota - Carver Co. - 82.52 y - 2.423 pCi/l - 153 Bq/m^3 using Price et al. data
5.  Iowa - Plymouth Co. - 82.48 y - 4.897 pCi/l [no joke]
6.  N. Dakota - Eddy Co. - 82.45 y - 2.912 pCi/l
7.  N. Dakota - Benson Co. - 82.45 y - 2.091 pCi/l
8.  N. Dakota - Wells Co. - 82.45 y - 1.673 pCi/l
9.  Iowa - Carroll Co. - 82.41 y - 3.760 pCi/l
10. S. Dakota - Yankton Co. - 82.40 y - 5.328 pCi/l [no joke!!]

Mean radon for Top 10 = 3.150 pCi/l +/- 0.380 pCi/l (SE)

--------------

Bottom 10 Life Expectancy Counties for White Females



1.  Kentucky - Harlan Co. - 75.60 y - 1.484 pCi/l
2.  W. Virginia - Logan Co. - 75.89 y - 1.238 pCi/l
3.  W. Virginia - Fayette Co. - 76.2 y - 1.055 pCi/l
4.  W. Virginia - Wyoming Co. - 76.21 y - 1.511 pCi/l
5.  Indiana - Scott Co. - 76.45 y - 0.976 pCi/l
6.  Oklahoma - McCurtain Co. - 76.51 y - 0.916 pCi/l
7.  Missouri - St. Louis - 76.61 y - 1.0199 pCi/l
8.  Mississippi - Pearl River Co. - 76.63 y - 0.684 pCi/l
9.  W. Virginia - Boone Co. - 76.67 y - 1.057 pCi/l
10. Virginia - Wise Co. - 76.71 y - 1.840 pCi/l

Mean radon for Bottom 10 = 1.178 pCi/l +/- 0.108 pCi/l (SE)
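
The two summary lines can be checked with a short script. It assumes the "+/-" values are standard errors of the mean. That is, the sample standard deviation (n - 1 denominator) is divided by the square root of 10. Under that assumption it reproduces 3.150 +/- 0.380 and 1.178 +/- 0.108 pCi/l from the listed county values. The St. Louis value is kept exactly as listed (1.0199).

```python
import math
import statistics

# County radon values (pCi/l) copied from the Top 10 and Bottom 10 lists.
TOP_10 = [2.286, 2.761, 3.368, 2.423, 4.897, 2.912, 2.091, 1.673, 3.760, 5.328]
BOTTOM_10 = [1.484, 1.238, 1.055, 1.511, 0.976, 0.916, 1.0199, 0.684, 1.057, 1.840]


def mean_and_se(values):
    """Return (mean, standard error of the mean) using the sample SD (n - 1)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, se


for label, values in [("Top 10", TOP_10), ("Bottom 10", BOTTOM_10)]:
    m, se = mean_and_se(values)
    print(f"{label}: {m:.3f} pCi/l +/- {se:.3f} (SE)")
# Top 10: 3.150 pCi/l +/- 0.380 (SE)
# Bottom 10: 1.178 pCi/l +/- 0.108 (SE)
```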



===================================================



hmmm....



===================================================

> I have always felt as an experimentalist that data is sacrosanct and cannot
> be ignored. So I believe it is incumbent upon the supporters of the LNT to
> devise a quantitative model which reproduces that data, within statistics.
> Again as an experimentalist I do not feel that it is a prerequisite that I be
> able to explain my data, but that everyone with a theory must take it into
> account.

************************************************************************

You are currently subscribed to the Radsafe mailing list. To unsubscribe,

send an e-mail to Majordomo@list.vanderbilt.edu  Put the text "unsubscribe

radsafe" (no quote marks) in the body of the e-mail, with no subject line.

You can view the Radsafe archives at http://www.vanderbilt.edu/radsafe/