Who decides which jobs AI will take?


This article is an on-site version of our The AI Shift newsletter. Premium subscribers can sign up here (https://ep.ft.com/newsletters/subscribe?newsletterIds=68da4b4af493110b11187d9f) to get the newsletter delivered every Thursday. Standard subscribers can upgrade to Premium here (https://www.ft.com/manage/subscription/change/713f1e28-0bc5-8261-f1e6-eebab6f7600e?), or explore (https://www.ft.com/newsletters) all FT newsletters.

Welcome back to The AI Shift, our weekly dive beneath the surface of the AI and jobs story. We’ve written before, with a somewhat sceptical eye, about how economists produce their splashy estimates of which jobs are “exposed” to AI disruption. Today we’re going one step deeper: who (or, more to the point, what) actually decides whether AI is capable of performing an individual task? And how consistent are those assessments?

John writes

Over the past year or two, dozens of headline-making studies have made claims about whether or not we are on the brink of a white-collar jobs wipeout, whether AI is already eating graduate jobs, and so on. Away from the headlines is the small detail that the vast majority of those studies have been underpinned by a single set of scores for how exposed particular jobs are to AI. And deeper still in the footnotes is how those exposure scores are determined.

A crucial detail that often gets glossed over is the fact that in the most widely used and cited index of occupational exposure, originating in a 2024 study by researchers at OpenAI, the extent to which AI can accomplish a particular task (writing a report, interpreting an image, giving instructions to an employee, and so on) was determined by . . . AI. Specifically by GPT-4, one of OpenAI’s earlier models.

Having large language models do the legwork of classifying thousands of tasks’ exposure levels is perfectly defensible. But an important and until now unexplored question is whether different AI models make the same assessments. A new study by Northwestern University’s Michelle Yin finds that the answer is often a resounding “no”.

Yin took all 705 jobs in the US occupational coding scheme, and had four different models (the original GPT-4, as well as newer models from OpenAI, Anthropic and Google) assess their exposure to automation by AI using exactly the same approach as in the original OpenAI study, which is to say making a judgment as to whether their constituent tasks could be significantly sped up using current consumer-facing AI tools.

She found a remarkable level of disagreement, with estimates of the share of at-risk jobs ranging from less than 15 per cent (when assessed by Gemini) to 50 per cent (when assessed by Claude). The gaps are especially wide for specific occupations, unsurprisingly affecting white-collar jobs the most. For example, when scored by GPT-4 in the 2024 OpenAI study, economists were judged to be 10 per cent exposed; its successor GPT-5 puts that at just over 50 per cent, and Claude has it at 80 per cent.

These disparities have big downstream consequences for our understanding of whether and how AI really is impacting the labour market. Use the original scores to assess occupational exposure and we find that AI has had a weak negative impact on employment. Use Gemini’s assessments, and that flips to a weak positive impact, with the most exposed jobs seeing employment growth. As Yin emphasises, these starkly contrasting narratives come from using exactly the same data and methodology, only varying which AI model is doing the exposure rating.

The good news is this problem has a relatively straightforward solution: analyses of the real-world impacts of theoretical AI exposure should be run using several different models’ assessments of exposure. As Yin points out, discrepancies in the results could actually be useful, telling us that the findings reflect characteristics of the AI models, not the labour market. Where different models tell the same story, we can be more confident we’re looking at something concrete.

Reading Yin’s work, it also struck me that versions of this issue apply to other ongoing labour market debates, among them the impact of the rise in working from home. A paper published a few days ago found that the rise of remote work may be a better explanation than AI for the outsized decline in entry-level hiring over the past few years. Notably, the authors argue that one reason their analysis found this result where others have not is that their measure of an occupation’s exposure to remote work is based on actual job adverts offering remote or hybrid work, while others have used a theoretical assessment (carried out by humans but conceptually similar to the AI classification method).

Sarah, the whole field of AI job displacement research feels to me like a case where the closer you look, the fuzzier the picture becomes. What was your sense after reading this study?

Sarah writes

Well John, I won’t rehash here all the reasons that I think these measures of AI “exposure” should be taken with a spoonful of salt anyway (they’re here if anyone missed them). But Yin’s very interesting study made me consider a wider question: is there a similar “model disagreement effect” to be found in other areas in which LLMs are being used to make consequential recommendations or decisions?

The EU’s GDPR rules stipulate that people have the right to insist on a “human review” of certain automated decisions which affect them profoundly, such as whether they can be approved for a loan, or whether their job application reaches the next stage.

But I wonder whether, in the future, people will also begin to insist on having a second and third “AI opinion”, for example by having their job application run through a system which relies on a different underlying model, to see if a different decision is reached.

That wouldn’t necessarily be as straightforward as saying: “if this decision relied on an OpenAI model, please now run it through a Google and Anthropic model.” For a start, not all “AI” systems rely on non-deterministic LLMs. Many will use proprietary sets of models designed for a specific purpose. HireVue, for example, which is one of the biggest providers of automated hiring software, uses a number of different third-party and proprietary models to turn people’s video interviews into text, to analyse that text, and to assign job applicants “competency” scores on various metrics such as “adaptability” or “problem solving.”

But it’s an interesting line of thought and I think it does, at the least, increase the case for requiring maximum disclosure from companies about what goes on “under the hood” when AI is involved in making decisions that affect people’s lives.

I do have one question for you, John. It looks like the newer models Yin tried were more bullish about how many tasks could be automated. Is that because those more recent models are, indeed, more capable across a wider set of tasks? And therefore, to play devil’s advocate, is the fact they come up with higher exposure estimates actually quite useful information?

John replies . . .

Thanks Sarah, that’s a really interesting point on requesting review by a particular model. In answer to your question about newer models being more bullish on AI capabilities, there certainly seems to be some of that at play. On average GPT-5 scored tasks and jobs as being roughly twice as exposed to AI as its predecessor did in the original study, and Claude 4.5 (the newest of the models tested) was the punchiest of all.

As you surmise, Yin attributes some of this to the newer models ‘knowing’ about their own expanded abilities and having information on emerging AI capabilities that were not in GPT-4’s training data; in that sense the higher exposure scores could indeed be more useful. But I would add two caveats.

One is that the model whose task assessments are the closest match to measured real-world AI usage is the oldest: GPT-4. Though this may be because the models’ assessments are inherently forward-looking, and perhaps when we look back, the scores given by 2026-era models will be a better match for 2028-era usage.

The bigger one is that some of this seems to be about much broader differences in interpretation of the question of what tasks AI can do. Claude is particularly confident that AI tools can manage and supervise human workers in a wide range of jobs, rating occupations from CEOs to factory floor supervisors as highly exposed. That assessment is not on its face preposterous, but when Gemini rates the same jobs as very low exposure, they’re probably ‘thinking’ about the question in very different ways.

Recommended reading

1. Ethan Mollick has a thoughtful essay (https://www.oneusefulthing.org/p/choosing-to-stay-human) on the need to build healthy and intentional habits around the ways we use AI, so as to have it enhance our strengths instead of making us all less human (John)
2. Our colleague Ellesheva Kissin penned a very interesting Big Read (https://www.ft.com/content/d82d2a5c-74ab-4eb9-a658-fd5467e71670?syn-25a6b1a6=1) on the ways in which AI is changing the consulting industry (Sarah)

Recommended newsletters for you

The Lex Newsletter: Lex, our investment column, breaks down the week’s key themes, with analysis by award-winning writers. Sign up here (https://ep.ft.com/newsletters/subscribe?newsletterIds=56657d10e4b04e04251004fd)

Working It: Everything you need to get ahead at work, in your inbox every Wednesday. Sign up here (https://ep.ft.com/newsletters/subscribe?newsletterIds=62039b7ea31d6577a31f70df)
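
For readers who want to see what the robustness check John describes might look like in practice, here is a minimal Python sketch. It re-runs the same simple analysis (a least-squares slope of employment growth on exposure) once per rating model and asks whether the direction of the result survives the swap. Everything in it is an assumption made for illustration: the model names, the occupations, the scores, the growth figures and the choice of a plain slope are all hypothetical, and none of it is taken from Yin's study or its method.

```python
from statistics import mean

# Hypothetical exposure scores (0 to 1) per occupation, one dict per rating model.
# All numbers here are invented for illustration; none come from Yin's study.
EXPOSURE_BY_MODEL = {
    "model_a": {"occ_1": 0.1, "occ_2": 0.2, "occ_3": 0.7, "occ_4": 0.9},
    "model_b": {"occ_1": 0.3, "occ_2": 0.2, "occ_3": 0.1, "occ_4": 0.1},
}

# Hypothetical employment growth per occupation; identical across all runs,
# so only the exposure ratings change between analyses.
EMPLOYMENT_GROWTH = {"occ_1": 0.04, "occ_2": 0.02, "occ_3": -0.01, "occ_4": -0.03}


def ols_slope(xs, ys):
    """Return the ordinary least-squares slope of ys regressed on xs."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


def slopes_by_model(exposure_by_model, growth):
    """Run the same slope estimate once for each model's exposure ratings."""
    occupations = sorted(growth)
    ys = [growth[occ] for occ in occupations]
    return {
        model: ols_slope([scores[occ] for occ in occupations], ys)
        for model, scores in exposure_by_model.items()
    }


def signs_agree(slopes):
    """True if every slope points the same way (zero is counted as not positive)."""
    return len({slope > 0 for slope in slopes.values()}) == 1


if __name__ == "__main__":
    slopes = slopes_by_model(EXPOSURE_BY_MODEL, EMPLOYMENT_GROWTH)
    for model, slope in slopes.items():
        print(f"{model}: slope of growth on exposure = {slope:+.3f}")
    print("Same direction under every model:", signs_agree(slopes))
```

With these made-up numbers the two raters give slopes of opposite sign, which is the kind of flip the newsletter describes. In a real analysis, disagreement of this sort would be a cue to treat the finding as a property of the rating model and not of the labour market.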


