Personality testing is a big part of the way organizations make hiring decisions, and it has been for some time now (it wasn’t popular before about 1980). With advances in technology there has been a great proliferation of personality assessments. They’re not all good. A personality test is much easier to generate than it is to validate. The quiz below can help you tell whether you’re using the wrong personality test. (Have some fun with it.)
Directions: The following list of paired statements (questions) reflects things I occasionally hear when folks are evaluating personality tests. In each pair, one statement is more problematic than the other. Reflecting on your current situation, which of the two statements would I be more likely to hear from you or others if I were a fly on the wall when you were getting the pitch from your vendor?
Response Key: For odd-numbered pairs the more problematic statement is in column A; for even-numbered pairs it is in column B.
Some of the statements require more assumption than others, so don’t get too caught up in the scoring. These are my answers and rationale:
- “It sure worked for me” – Frequently a personality test is sold by having the decision maker complete the assessment. This isn’t a bad thing; I encourage users to complete a personality test for themselves. The potential problem is that this is frequently the decision maker’s primary (or sole) evaluation criterion. Vendors know this, and some hawk an instrument that produces unrealistically favorable results: “It says I’m good, therefore it must be right.” As for column B, the 300-page manual: good ones are typically lengthy. It takes some pulp to present all the evidence supporting a properly constructed inventory.
- “A type’s a type” – The most popular personality assessment of all, the MBTI, presents results for an individual as one of 16 types. Scores, to the extent that they are reported, only reflect the likelihood that the respondent is a given type or style, not that they are more or less extraverted, for example. But research and common sense say that personality traits do vary in degree; someone can be “really neurotic.” Two individuals with the same type can behave quite differently depending on how much of a trait they possess. A very extraverted person is different from someone who is only slightly extraverted: same type, different people. (No, I don’t condone mocking or calling out anyone’s score, as it would appear I’m suggesting in column A, but with a good test such a statement is potentially valid.)
- “That’s a clever twist” – Few personality tests are fully transparent to the respondent – this helps control the issue of social desirability. But some go too far with “tricky” scoring or scales. This is a problem in two ways: 1) if the trick gets out (Google that), the assessment loses its value, and 2) respondents don’t like being tricked. It’s better to be fairly obvious with an item than to deal with very frustrated respondents who may just take you to court.
- “It was built using retina imaging” – Here’s another statement that needs a little help to see what’s going on (no pun intended). I’m not against new technology; it’s driving ever better assessment. But sometimes the technology is misused or inadequately supported with research. There’s a reason that some personality assessments have been around for more than 50 years. Validity isn’t always sexy.
- “That’s what I heard in a TED talk” – My intent here was to implicate “faddish” assessments. They may say they’re measuring the hot topic of the day, but more often than not, what’s hot in personality assessment, at least as far as traits are concerned, is not new. Research has concluded that many supposedly new traits are not meaningfully different from ones that have been around a while. Don’t fall for an assessment just because you like the vocabulary; check the manual to see if it’s legitimately derived. There’s a reason that scientists prefer instruments based on the Big 5 traits (not the big 50).
- “Now that’s what I call an algorithm” – More complicated isn’t necessarily better. Some very good assessments, typically public domain ones, can be scored by hand. Tests that use Item Response Theory (IRT) for scoring do have more complicated algorithms than tests scored via Classical Test Theory (i.e., much the way your third-grade teacher scored your spelling test). Still, a three-parameter IRT scoring method isn’t necessarily better than a one-parameter model, and it isn’t three times more complicated anyway. Proprietary assessments typically protect their copyright with nontransparent scoring, but for the most part what’s obfuscated or obscure is which items go into a calculation, not that the calculation is necessarily complex. Good assessments should employ fairly straightforward scoring to render both raw scores and percentile, or normed, scores.
- “It really has big correlations” – As with some prior items, a bit more context is needed to get the point I’m trying to make. Here the issue is sufficiency. Yes, a good instrument will show some relatively high correlations, but they need to be the right correlations. (And they need to be truthful. Unfortunately, I know of cases where misleading statistics have been presented.) It helps to know about research design and to have a realistic expectation for that validity correlation. If the vendor tells you that their assessment correlates with performance above .40, make them prove it. (And a .40 correlation equates to a 16% reduction in uncertainty, since .40 squared is .16, not a 40% reduction. Sometimes vendors get this confused.)
- “It’s too long, let’s cut some items” – It’s tempting to simply eliminate the scales or items that are irrelevant to your specific need. After all, you’re not touching the items that make up the traits you want to know about. The problem is that the assessment is validated “as is.” Both the length of an assessment and its contents can influence scores. Priming biases are one example of how items interact with each other. Any time you modify an assessment, it needs to be revalidated. This has typically been done for short forms of assessments (i.e., they’ve been specifically validated), so it’s fair to ask about this alternate form.
- “That’s amazing” — By now you should see that a common factor in my problem statements has to do with how much goes on “out of view” (less is better) and how thorough the test manual is. “That’s amazing” is for magic shows, not science (I realize I’m parsing semantics here – you get my point).
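The arithmetic behind the .40 correlation point in the list above is easy to check for yourself. Here is a minimal Python sketch. It assumes that the “reduction in uncertainty” is read as the proportion of variance explained (the correlation squared), and the function name `variance_explained` is my own label for illustration, not something taken from any test manual.

```python
def variance_explained(r: float) -> float:
    """Return r squared: the share of variance in the outcome
    (e.g., job performance) accounted for by the test score.
    Assumption: 'reduction in uncertainty' means variance explained."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Compare a few validity correlations a vendor might quote.
for r in (0.20, 0.30, 0.40, 0.50):
    print(f"r = {r:.2f} -> {variance_explained(r):.0%} of variance explained")
```

Running it for a few values shows how quickly the percentage shrinks once you square the correlation: a quoted .40 works out to 16%, and a .20 to only 4%.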
A personality test can be, and most often is, a legitimate assessment for many (if not most) jobs. (This even applies to machines. Researchers are using a variation of personality inventories to manipulate the perceived personality of robots.) Without exception, it’s critical to ensure that any assessment is validated for its specific use, but you want to start with something that has been thoroughly researched. If everything has been done right, you can expect local results to be in line with the manual (assuming your tested population isn’t that different from the test manual’s sample(s)).
A lot goes into validating a personality test, and test manuals are lengthy. Although this length is good and necessary for adequately evaluating the test, it can be used in intimidating or misleading ways. It’s easy for claims to be made out of context even if the manual itself is accurate, especially when the decisions being made affect someone’s job. It’s important to review the test manual, not just the marketing brochure. (The good news is these manuals are boringly redundant. For example, the same figure is used for each scale, or trait, when the testing for gender bias is repeated.) Although I’m sure your vendor is a “stand up” person, you can’t rely on this fact if your process gets challenged in court. It pays to review the manual thoroughly.
I hope your personality inventory passed the test.
Psychways is owned and produced by Talentlift, LLC.