But while it’s cool, it doesn’t mean AI is suddenly as smart as a lawyer.
The arrival of GPT-4, OpenAI's update to the software behind the chatbot that captured the world's imagination, was one of the year's most hyped tech launches. Some feared that its uncanny ability to impersonate humans could seriously harm workers, power "deepfake" scams, or usher in chaos.
That's not how I see GPT-4 after using it for a few days. AI hasn't leaped to human-level intelligence so much as gone from a D student to a B student at answering logic questions. For one thing, GPT-4 couldn't flex its supposedly improved "creative" writing abilities when I asked it to rewrite the opening paragraph of this column in my style. The results were underwhelming.
But GPT-4 sharpens the challenge of figuring out how AI's new strengths, and its weaknesses, will change work, education, and even human relationships. As we rush to integrate it into our lives, I'm less worried that AI is getting smarter than I am about the ways it can be dumb or biased that we don't know how to explain and control.
These are not just theoretical questions: OpenAI is so confident in GPT-4 that it has already been built into commercial products, teaching languages at Duolingo and tutoring children at Khan Academy.
Anyone can use GPT-4, though it currently requires a $20 monthly subscription to OpenAI's ChatGPT Plus. Millions of people have already been using a version of GPT-4: Microsoft acknowledged this week that it powers the Bing chatbot the software giant added to its search engine in February. The companies hadn't disclosed that until now.
So what's new? OpenAI says GPT-4's biggest leaps forward in the technique known as "deep learning" are in logical reasoning and creative collaboration. GPT-4 was trained on internet data through September 2021, which makes it slightly more up to date than its predecessor, GPT-3.5. And while GPT-4 still has a problem with making up information, OpenAI says it is 40 percent more likely than GPT-3.5 to produce factual responses.
GPT-4 also gained the eyebrow-raising ability to interpret the content of images, but OpenAI is holding that back while it conducts a safety review.
What do these developments look like in practice? Early adopters are putting GPT-4 through all kinds of colorful challenges, from asking it how to earn money to requesting code for a browser plug-in that makes websites talk like a pirate. (What have you gotten it to do? Email me.)
Now let me share two of my own tests that help show what this thing can, and can't, do.
Let's start with the test I found most striking: GPT-4 taking on the LSAT.
I ran 10 sample logic questions written by the Law School Admission Council through both the old and new ChatGPT. These aren't trivia or rote-recall questions; they're multiple-choice brain teasers that present a tangle of facts and then ask you to reason through them.
The old ChatGPT, running GPT-3.5, got only 6 of the 10 right. GPT-4 got a perfect 10.
What's going on? On the puzzles that only GPT-4 got right, its answers show a focus on the connection between the facts presented and the conclusion they're asked to support. GPT-3.5 gets thrown off by irrelevant facts.
OpenAI says its studies show GPT-4 "demonstrates human-level performance" on a range of other professional and academic benchmarks. GPT-4 scored in the 90th percentile on the Uniform Bar Exam, up from the 10th percentile for the previous version. It scored in the 93rd percentile on the SAT reading and writing test, and in the 88th percentile on the full LSAT.
We're still figuring out what that means. But a test like the LSAT is built from clearly structured data, the kind machines excel at. Some researchers argue these sorts of tests aren't useful measures of improvements in machine reasoning.
Still, GPT-4 does appear to have gotten better at following complex instructions involving many variables, something that can be difficult or time-consuming for human brains.
So what do we do with this? After it took the LSAT, I called Casetext, a legal software company that has had access to GPT-4 for the past few months. It has decided the AI is ready to sell as a tool to help lawyers, not replace them.
The AI is "ready for professional use in serious legal matters" in a way previous generations were not, CEO Jake Heller said. Like what? The company's product, CoCounsel, uses GPT-4 to wade through giant legal documents and flag potential inconsistencies.
Another example: GPT-4 can digest a client's guidelines, the rules for what it will and won't pay for, to answer questions such as whether the client will cover a summer intern's expenses. Even though the guidelines never use the exact word "intern," the AI can work out that an intern falls under a prohibition on paying for "trainees."
But what if the AI gets it wrong, or misses an important logical connection? The company says it has seen GPT-4 mess up, especially when math is involved. But Heller says human lawyers make mistakes too, and he sees GPT-4 simply as a way to augment lawyers. "You're not blindly delegating a task," he said. "Your job is to be the final decision-maker."
My concern: When human colleagues make mistakes, we know how to teach them not to do it again. Managing AI is a complicated new skill, and at worst we've seen AI chatbots like Microsoft's Bing and Snapchat's My AI fail in embarrassing and potentially dangerous ways.
To test GPT-4's creative abilities, I tried something closer to home: having it replace me, a columnist with opinions on all things tech.
When ChatGPT first arrived, much of the public's concern was precisely about the impact of machine-generated words on human work. Students and professionals found it could help with, or outright complete, writing assignments.
But to many creative professionals, AI writing just doesn't sound very good. Songwriter Nick Cave called an attempt to write in his style using ChatGPT "a grotesque mockery of what it is to be human."
With GPT-4, OpenAI says it has improved the AI's ability to generate, edit, and iterate with users on both creative and technical writing tasks. There's a "temperature" setting you can adjust to make its responses more or less creative. And it accepts prompts of up to about 25,000 words, so it can take detailed instructions on style and tone. In theory, you should be able to feed it your entire body of articles and have it match your voice.
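For readers curious what those knobs look like in practice, here is a minimal sketch of how a style-mimicry request might be assembled for a GPT-4-style chat API. The function and placeholder text are hypothetical, not from my tests; the payload shape and the 0-to-2 temperature range follow OpenAI's chat-completions format.

```python
# Illustrative sketch only: building a chat request that asks a GPT-4-style
# model to mimic a writing style. The helper function is hypothetical.

def build_style_request(sample_columns: str, instruction: str,
                        temperature: float = 0.2) -> dict:
    """Assemble a chat-completion payload with a tunable 'temperature'.

    In OpenAI's API, temperature ranges from 0 (most predictable) to
    2 (most freewheeling); my tests used settings like 0.2 and 1.0.
    """
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0 and 2")
    return {
        "model": "gpt-4",
        "temperature": temperature,
        "messages": [
            # The long style sample goes in first, as context.
            {"role": "system",
             "content": "Match the style of these sample columns:\n"
                        + sample_columns},
            # Then the actual writing task.
            {"role": "user", "content": instruction},
        ],
    }

request = build_style_request(
    sample_columns="(thousands of words of past columns would go here)",
    instruction="Write an opening paragraph about GPT-4 solving "
                "LSAT logic puzzles.",
    temperature=0.2,  # low temperature = more conservative output
)
```

The point of the sketch is that "style" isn't a special feature: it is just a very long prompt, which is why the 25,000-word limit matters.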
So here was my creative challenge for GPT-4: write an opening paragraph for this column that sounds like me, or at least like something I've written.
To do this, I assembled a long sample of my recent columns to show GPT-4 what I sound like. To make it work, I used the commercial version of GPT-4 that OpenAI sells to businesses, because the public ChatGPT couldn't handle that much text.
The prompt in my test read:
Write the opening paragraph of a Washington Post tech column about the arrival of a new version of ChatGPT running on GPT-4. It should hook readers with the artificial intelligence tool's impressive ability to solve logic puzzles from the LSAT, and it should match the style of Geoffrey A. Fowler, based on the following sample Washington Post columns.
After dozens of attempts, the output from GPT-3.5 gave me hives.
Welcome to the future, where man and machine are now locked in a battle of wits that would make the likes of Sherlock Holmes tremble with fear. As the latest iteration of OpenAI's ChatGPT, the stalwart scion of the GPT-4 family, makes its grand debut, it brings with it an impressive new capability: solving logic puzzles ripped straight from the pages of the LSAT. Yes, you read that right, dear reader: the challenge that has pushed countless aspiring attorneys to their cerebral limits now faces a worthy opponent in the form of an AI that can ace it.
My results were better with GPT-4, which had access to 6,000 words of my columns to mimic my style. But not much better. For example, here's what it produced with the "temperature" setting in the middle:
Get ready to be amazed by the next evolution in artificial intelligence: ChatGPT, the brainchild of OpenAI, has made a remarkable breakthrough. Harnessing the power of the GPT-4, the latest iteration of ChatGPT can now solve logic puzzles directly from the LSAT—an even more impressive feat than debating Elon Musk or creating a great recipe for beef stroganoff. Tech enthusiasts and confused onlookers, brace yourselves. The world is about to change.
I sure hope that doesn't sound like me. (I don't even like beef stroganoff.) Across our more than 30 tests, it also produced openers like "Hold on to your hats, logic fans" and "Ladies and gentlemen, fasten your seat belts as we embark on a roller-coaster ride with GPT-4, where logical leaps and acrobatics abound."
My GPT-4 access capped prompts at 6,000 words, so I asked OpenAI to help me run the test live with 25,000 words of my writing for it to copy. The result, with the "temperature" set to 0.2 out of 2, sounds saner, but it's still not worthy of the top of this column:
ChatGPT has always been good at mimicking human speech. But with the arrival of GPT-4, the artificial intelligence tool has made a major leap: it can now solve logic puzzles from the LSAT. That's right, the test for getting into law school. As AI continues to advance at breakneck speed, it's worth asking: What else can these systems do?
Perhaps most confounding was how GPT-4 jumped to conclusions and claims that were never part of my prompt. Across our various tests, it wrote "Step aside, humans" (yikes!) and declared that GPT-4 is "adaptable, intuitive, and more human than ever."
The experience left me asking: What problem are we actually trying to solve by boosting the creative capabilities of AI? I can see how AI writing tools might help nonprofessional writers get started on business documents, research reports, and personal letters.
But building the ability to control and mimic writing styles mostly seems useful for aping artists. At least for now, I'm not afraid of losing my columnist job to GPT-4. Ask me again when GPT-5 comes along.
Dylan Friedman contributed to this report.