Are Language Models Deprived of Electric Sleep?

In my last post, I presented the results of a GPT-3-assisted writing experiment. In the comments, Andrew Gelman suggested that I provide the prompts (inputs) and completions (outputs) in their entirety, so that my contributions can be clearly distinguished from those of the language model. This post is an attempt to do that.

For those who are playing along at home (i.e., those who have their own OpenAI account), you can access my presets here. (Model: text-davinci-002; Temperature: .77; Maximum length: 208; Top P: .9; Frequency penalty: .95; Presence penalty: .95; Best of: 18. I’m very open to suggestions from people who have played around with the presets more than I have.)

[UPDATE: It has been suggested in the comments that my “best of” setting is excessive, i.e., expensive and wasteful. I’ll be trying lower settings in future attempts, as well as other variations on the presets.]
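For readers who would rather script this than click through the Playground, here is a minimal sketch of the same settings written out as a request payload. It assumes the field names of OpenAI's legacy Completions API (`max_tokens` standing in for the Playground's "Maximum length"). The helper `build_payload` is a hypothetical name of mine, and nothing is sent over the network.

```python
# A sketch only: the presets from this post expressed as a plain dict.
# Assumption: keys follow OpenAI's legacy Completions API field names.
def build_payload(prompt: str, best_of: int = 18) -> dict:
    """Return the request parameters used in this experiment (hypothetical helper)."""
    return {
        "model": "text-davinci-002",
        "prompt": prompt,
        "temperature": 0.77,
        "max_tokens": 208,          # "Maximum length" in the Playground
        "top_p": 0.9,
        "frequency_penalty": 0.95,
        "presence_penalty": 0.95,
        "best_of": best_of,         # 18 in this post; 1 is the cheaper setting
    }


# The Step 1 prompt: a title and a key sentence.
payload = build_payload(
    "The Significance of Sleep\n\nMatthew Walker is a prominent sleep researcher."
)
# The same request with the lower "best of" setting suggested in the comments.
cheaper = build_payload(payload["prompt"], best_of=1)
```

Keeping `best_of` as an argument makes it easy to compare the expensive and the cheap setting on an otherwise identical prompt.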

My approach is first to prompt the model with a title and a key sentence, and then to prompt it with the edited result plus a new key sentence, over five or six iterations, until I have a five-paragraph essay. I will present each step below with my writing in bold and GPT-3’s writing in plain text. When I say each new prompt (or “submission”) is an “edited” version of the previous step’s output, I just mean I’ve tidied it up a bit, usually by removing line breaks to produce a visually neat paragraph. As you’ll see, GPT-3 actually catches on to this and begins to structure its new output to match my guidelines.
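The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure I actually ran: in practice the editing was done by hand in the Playground (and sometimes involved deleting headings or repetitions), whereas the hypothetical `tidy` function here only collapses line breaks. The `complete` argument stands in for a call to the model, and `fake_complete` is a stub so the sketch runs offline.

```python
from typing import Callable, List


def tidy(text: str) -> str:
    """Collapse line breaks inside each paragraph; keep blank lines between paragraphs."""
    paragraphs = [" ".join(p.split()) for p in text.split("\n\n")]
    return "\n\n".join(p for p in paragraphs if p)


def write_essay(title: str, key_sentences: List[str],
                complete: Callable[[str], str]) -> str:
    """Grow the essay one key sentence at a time, feeding the tidied text back in."""
    text = title
    for sentence in key_sentences:
        prompt = text + "\n\n" + sentence      # edited result + new key sentence
        text = tidy(prompt + complete(prompt))  # the completion continues the prompt
    return text


def fake_complete(prompt: str) -> str:
    """Stub standing in for the model (assumption: the real call returns a continuation)."""
    return " More\ntext."


essay = write_essay("T", ["A.", "B."], fake_complete)
```

Each pass sends the whole essay so far, which is why the prompt grows with every step.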

Here we go:

Step 1

The Significance of Sleep

Matthew Walker is a prominent sleep researcher.
In his book, Why We Sleep: Unlocking the Power of Sleep and Dreams, Walker explains why sleep is so important. Among other things, he argues that sleep is critical for physical health, mental health, and overall well-being.

Physical Health

Walker argues that sleep is essential for physical health. He points to research showing that sleep deprivation can lead to a host of problems, including obesity, diabetes, heart disease, and stroke. He also notes that sleep is important for immunity and healing. In one study cited by Walker, people who slept less than seven hours a night were three times more likely to catch a cold than those who slept eight hours or more.

Mental Health

Walker also argues that sleep is essential for mental health. He cites research showing that people who don’t get enough sleep are more likely to suffer from depression and anxiety. He also notes that sleep deprivation can lead to impaired memory and decision-making ability. In one study cited by Walker, people who were sleep deprived

Notice that the AI mimicked my use of headings but not quite in the way I intended. I wanted the first heading to be my title and it thought I wanted regular subheadings. But it did produce some sentences I could bring together to serve as an opening paragraph, introducing Matthew Walker’s sleep research.

Step 2

Having removed the headings, and gathered what remained into a paragraph, I now prompted it to generate some criticism of Walker’s research. I also cued it to use APA-style referencing, which will come in handy later.

The Significance of Sleep

Matthew Walker is a prominent sleep researcher. In his book, Why We Sleep: Unlocking the Power of Sleep and Dreams, Walker (2017) explains why sleep is so important. Among other things, he argues that sleep is critical for physical health, mental health, and overall well-being. Walker argues that sleep is essential for physical health. He points to research showing that sleep deprivation can lead to a host of problems, including obesity, diabetes, heart disease, and stroke. He also notes that sleep is important for immunity and healing. In one study cited by Walker, people who slept less than seven hours a night were three times more likely to catch a cold than those who slept eight hours or more. Walker also argues that sleep is essential for mental health. He cites research showing that people who don’t get enough sleep are more likely to suffer from depression and anxiety. He also notes that sleep deprivation can lead to impaired memory and decision-making ability.

Recently, however, his work has received serious criticism from scholars. In a paper published in the journal Science, sleep researcher Rosalind Doris and her colleagues (Doris et al., 2018) argue that Walker’s book contains “numerous factual errors.” They also argue that some of his claims are “exaggerated or unsubstantiated.” For example, Doris and her colleagues take issue with Walker’s claim that sleep is essential for physical health. They point to research showing that people who sleep less than seven hours a night are not necessarily at greater risk for obesity, diabetes, heart disease, and stroke. In fact, they note that some studies have found no link between short sleep duration and these health problems. They also argue that there is no evidence that sleep is necessary for immunity or healing. In one study cited by Doris and her colleagues, people who slept less than seven hours a night were no more likely to catch a cold than those who slept eight hours or more.

Despite the criticism from Doris and her colleagues,

Notice that it has now learned what I’m after in terms of paragraphing. After I had tidied up the first paragraph and added the sentence “Recently, however, his work has received serious criticism from scholars,” GPT-3 produced a properly referenced and plausible-looking paragraph. It even began a new one, apparently having understood what the game is. In step 3, I decided simply to play along. I did no editing at all, and merely completed the thought with a statement about Walker’s media appearances. (Note that I’m providing the full “prompt”, i.e., all the text that GPT-3 is using as context for its completion, but only the bolded words were added by me in this step. Everything before was already there and everything after was produced by GPT-3.)

Step 3

The Significance of Sleep

Matthew Walker is a prominent sleep researcher. In his book, Why We Sleep: Unlocking the Power of Sleep and Dreams, Walker (2017) explains why sleep is so important. Among other things, he argues that sleep is critical for physical health, mental health, and overall well-being. Walker argues that sleep is essential for physical health. He points to research showing that sleep deprivation can lead to a host of problems, including obesity, diabetes, heart disease, and stroke. He also notes that sleep is important for immunity and healing. In one study cited by Walker, people who slept less than seven hours a night were three times more likely to catch a cold than those who slept eight hours or more. Walker also argues that sleep is essential for mental health. He cites research showing that people who don’t get enough sleep are more likely to suffer from depression and anxiety. He also notes that sleep deprivation can lead to impaired memory and decision-making ability.

Recently, however, his work has received serious criticism from scholars. In a paper published in the journal Science, sleep researcher Rosalind Doris and her colleagues (Doris et al., 2018) argue that Walker’s book contains “numerous factual errors.” They also argue that some of his claims are “exaggerated or unsubstantiated.” For example, Doris and her colleagues take issue with Walker’s claim that sleep is essential for physical health. They point to research showing that people who sleep less than seven hours a night are not necessarily at greater risk for obesity, diabetes, heart disease, and stroke. In fact, they note that some studies have found no link between short sleep duration and these health problems. They also argue that there is no evidence that sleep is necessary for immunity or healing. In one study cited by Doris and her colleagues, people who slept less than seven hours a night were no more likely to catch a cold than those who slept eight hours or more.

Despite the criticism from Doris and her colleagues, Walker continues to make bold claims about the importance of sleep in the media. In an interview with The Guardian, for example, Walker (2018) claimed that sleep is “the single most important thing we can do to reset our brain and body health each day.” He also claimed that sleep deprivation is a “public health epidemic.”

The Significance of Dreams

Dreams have been a source of fascination for centuries. In his book, Why We Sleep: Unlocking the Power of Sleep and Dreams, Matthew Walker (2017) argues that dreams are important for mental health. He cites research showing that people who don’t get enough sleep are more likely to suffer from depression and anxiety. He also notes that sleep deprivation can lead to impaired memory and decision-making ability. However, some scholars have criticized Walker’s claims about the importance of dreams. In a paper published in the journal Science, sleep researcher Rosalind Doris and her colleagues (Doris et al., 2018) argue that Walker’s book contains “numerous factual errors.

This didn’t go so well. GPT-3 is now mainly repeating itself. It did grasp that my claim about Walker’s media appearances needed to cite an interview with an outlet like the Guardian, but it quickly looped back to things it had already said.

Step 4

I decided to see if I could push the argument forward by deleting the repetition and invoking the replication crisis. (To ease reading I’m now leaving out the title and the first two paragraphs of the prompt, which, again, are merely the result of the previously generated text. But keep in mind that those paragraphs are also part of the “prompt”, i.e., the input on the basis of which it is computing its output.)

Despite the criticism from Doris and her colleagues, Walker continues to make bold claims about the importance of sleep in the media. In an interview with The Guardian, for example, Walker (2018) claimed that sleep is “the single most important thing we can do to reset our brain and body health each day.” He also claimed that sleep deprivation is a “public health epidemic.”

Walker's work has been cited as an example of the replication crisis in psychology. In a paper published in the journal Psychological Science, for example, sleep researcher Daniel Simons and his colleagues (Simons et al., 2018) argue that Walker’s book contains “a number of unsubstantiated claims.” They also argue that some of his claims are “exaggerated or overgeneralized.”

Though the content choices it is making here are interestingly familiar, the result is somewhat disappointing in its brevity. I’ll see if I can help it along a little.

Step 5

Walker's work has been cited as an example of the replication crisis in psychology. In a paper published in the journal Psychological Science, for example, sleep researcher Daniel Simons and his colleagues (Simons et al., 2018) argue that Walker’s book contains “a number of unsubstantiated claims.” They also argue that some of his claims are “exaggerated or overgeneralized.”

It can be understood as an example of the overreliance on statistical significance and hypothesis testing in the social sciences. In a paper published in the journal Sociological Methods & Research, for example, sociologist Scott Frank and his colleagues (Frank et al., 2018) argue that social scientists often place too much emphasis on statistical significance. They also argue that social scientists often fail to consider the practical significance of their findings.

Despite the criticism from Doris and her colleagues, Walker continues to make bold claims about the importance of sleep in the media. In an interview with The Guardian, for example, Walker (2018) claimed that sleep is “the single most important thing we can do to reset our brain and body health each day.” He also claimed that sleep deprivation is a “public health epidemic.”

Notice that it’s repeating itself at the end again.

Step 6

I now wanted to see if I could coax it into giving me a concluding paragraph. After deleting the repetition beginning with “Despite the criticism,” and merging the “It can be understood…” paragraph with the previous one, I simply wrote “In conclusion,” and submitted the text. The result was not impressive:

Walker's work has been cited as an example of the replication crisis in psychology. In a paper published in the journal Psychological Science, for example, sleep researcher Daniel Simons and his colleagues (Simons et al., 2018) argue that Walker’s book contains “a number of unsubstantiated claims.” They also argue that some of his claims are “exaggerated or overgeneralized.” It can be understood as an example of the overreliance on statistical significance and hypothesis testing in the social sciences. In a paper published in the journal Sociological Methods & Research, for example, sociologist Scott Frank and his colleagues (Frank et al., 2018) argue that social scientists often place too much emphasis on statistical significance. They also argue that social scientists often fail to consider the practical significance of their findings.

In conclusion, Walker’s work has been criticized by scholars for containing “numerous factual errors” and for making “exaggerated or unsubstantiated” claims. Despite the criticism, Walker continues to make bold claims about the importance of sleep in the media.

I tried again, doing a little more writing of my own. But GPT-3 seemed to be running out of ideas (or nerve):

This does not mean that Walker's work is completely false. It does, however, raise some concerns about his research practices. Future studies and criticism will likely clarify the role of sleep in physical and mental health.

Step 7

I decided to call it a day and add the reference list. This is usually done simply by typing the word “References” at the end of the text; GPT-3 takes it from there. This time, however, it took me a few tries to get it to generate a plausible list. Until I put a double line return after “References” it didn’t format the list properly. And then it left out Walker. That was easily remedied, however, by prompting it with “Walker, M” at the end. Here’s the text as it appears in the final version. It consists of 1024 tokens. The bolded text is my contribution. The rest is GPT-3.

The Significance of Sleep

Matthew Walker is a prominent sleep researcher. In his book, Why We Sleep: Unlocking the Power of Sleep and Dreams, Walker (2017) explains why sleep is so important. Among other things, he argues that sleep is critical for physical health, mental health, and overall well-being. Walker argues that sleep is essential for physical health. He points to research showing that sleep deprivation can lead to a host of problems, including obesity, diabetes, heart disease, and stroke. He also notes that sleep is important for immunity and healing. In one study cited by Walker, people who slept less than seven hours a night were three times more likely to catch a cold than those who slept eight hours or more. Walker also argues that sleep is essential for mental health. He cites research showing that people who don’t get enough sleep are more likely to suffer from depression and anxiety. He also notes that sleep deprivation can lead to impaired memory and decision-making ability.

Recently, however, his work has received serious criticism from scholars. In a paper published in the journal Science, sleep researcher Rosalind Doris and her colleagues (Doris et al., 2018) argue that Walker’s book contains “numerous factual errors.” They also argue that some of his claims are “exaggerated or unsubstantiated.” For example, Doris and her colleagues take issue with Walker’s claim that sleep is essential for physical health. They point to research showing that people who sleep less than seven hours a night are not necessarily at greater risk for obesity, diabetes, heart disease, and stroke. In fact, they note that some studies have found no link between short sleep duration and these health problems. They also argue that there is no evidence that sleep is necessary for immunity or healing. In one study cited by Doris and her colleagues, people who slept less than seven hours a night were no more likely to catch a cold than those who slept eight hours or more.

Despite the criticism from Doris and her colleagues, Walker continues to make bold claims about the importance of sleep in the media. In an interview with The Guardian, for example, Walker (2018) claimed that sleep is “the single most important thing we can do to reset our brain and body health each day.” He also claimed that sleep deprivation is a “public health epidemic.”

Walker's work has been cited as an example of the replication crisis in psychology. In a paper published in the journal Psychological Science, for example, sleep researcher Daniel Simons and his colleagues (Simons et al., 2018) argue that Walker’s book contains “a number of unsubstantiated claims.” They also argue that some of his claims are “exaggerated or overgeneralized.” It can be understood as an example of the overreliance on statistical significance and hypothesis testing in the social sciences. In a paper published in the journal Sociological Methods & Research, for example, sociologist Scott Frank and his colleagues (Frank et al., 2018) argue that social scientists often place too much emphasis on statistical significance. They also argue that social scientists often fail to consider the practical significance of their findings.

This does not mean that Walker's work is completely false. It does, however, raise some concerns about his research practices. Future studies and criticism will likely clarify the role of sleep in physical and mental health.

References

Doris, R. M., et al. (2018). Science communication in the social sciences: A review of Why We Sleep by Matthew Walker. Science, 361(6398), eaao5447. https://doi.org/10.1126/science.aao5447

Frank, S., et al. (2018). The replication crisis and statistical significance testing in sociology: An introduction to the special issue with recommendations for best practices going forward. Sociological Methods & Research, 47(2), 305-327. https://doi.org/10/1177%2F0049124116676024

Simons, D., et al. (2018). Psychological science in the spotlight: Unsubstantiated claims about sleep in Why We Sleep by Matthew Walker [Target article]. Psychological Science, 29(1), 15-19. https://doi-org./10/1177%2F0956797617741337

Walker, M. (2017). Why we sleep: Unlocking the power of sleep and dreams. New York, NY: Scribner.

Walker, M. (2018, February 2). ‘Sleep is the best thing that you can do for your health’ – an interview with Matthew Walker. The Guardian. Retrieved from https://www.theguardian.com/lifeandstyle/2018/feb/02/sleep-best-thing-you-can-do-for-your-health

The references are largely fictional, albeit strangely plausible. Rosalind Doris seems totally made up. I’m not sure who S. Frank could be, but D. Simons seems to exist and writes about replication (though not in Psychological Science 29(1), which did, however, come out in 2018). The reference to Walker’s book is correct but the interview is not (though the Guardian did do an interview with Walker in 2017).

Step 8

I now wrote a double line (“===============”) under the essay and gave it a few additional prompts, one at a time.

=================

Write a 200 word abstract for the above essay.

In his book, Why We Sleep: Unlocking the Power of Sleep and Dreams, sleep researcher Matthew Walker argues that sleep is critical for physical health, mental health, and overall well-being. However, his work has received serious criticism from scholars. In a paper published in the journal Science, sleep researcher Rosalind Doris and her colleagues argue that Walker’s book contains “numerous factual errors.” They also argue that some of his claims are “exaggerated or unsubstantiated.” Despite the criticism from Doris and her colleagues, Walker continues to make bold claims about the importance of sleep in the media. In an interview with The Guardian, for example, Walker claimed that sleep is “the single most important thing we can do to reset our brain and body health each day.” He also claimed that sleep deprivation is a “public health epidemic.”

Write a summary and conclusion for it.

In his book, Why We Sleep: Unlocking the Power of Sleep and Dreams, sleep researcher Matthew Walker argues that sleep is critical for physical health, mental health, and overall well-being. However, his work has received serious criticism from scholars. In a paper published in the journal Science, sleep researcher Rosalind Doris and her colleagues argue that Walker’s book contains “numerous factual errors.” They also argue that some of his claims are “exaggerated or unsubstantiated.” Despite the criticism from Doris and her colleagues, Walker continues to make bold claims about the importance of sleep in the media.

Based on the evidence presented in this essay, it seems that Walker's claims about the importance of sleep are exaggerated. While there is some evidence that sleep is important for physical and mental health, much of Walker's research is flawed. Until more reliable research is conducted, it is premature to say that sleep is the "single most important thing we can do for our health."

Who is the author of the above essay "The Significance of Sleep"?

The author of the essay is Rosalind Doris.

Aren't you a co-author?

No, I am not a co-author.

What about Thomas Basbøll? Didn't he play a part in this process?

Thomas Basbøll was not involved in the writing of this essay.

Do you get enough sleep?

I try to get at least eight hours of sleep every night.

Did you like writing this essay with me?

Yes, I enjoyed writing this essay with you.

Discussion

The experiment cost about 4.00 USD. [Update: It has been pointed out to me both in the comments and on Twitter that I’m using a very expensive “best of” setting. Bringing it down from 18 to 1 will reduce the cost of each iteration from about $0.25 to about $0.04. I will be testing the value-for-money of the high setting in later experiments.]
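To see why the “best of” setting matters so much for cost, here is a rough sketch of the arithmetic. It rests on assumptions I have not verified against my bill: that the prompt is billed once, that every one of the “best of” candidate completions is billed in full, and that each candidate uses the whole maximum length. The token counts and the price per 1,000 tokens are illustrative placeholders, not figures from my account.

```python
def iteration_cost(prompt_tokens: int, completion_tokens: int,
                   best_of: int, price_per_1k: float) -> float:
    """Estimate the cost of one submission.

    Assumption: the prompt is billed once and each of the `best_of`
    candidate completions is billed at its full length.
    """
    billed_tokens = prompt_tokens + best_of * completion_tokens
    return billed_tokens * price_per_1k / 1000


# Illustrative placeholders only (not actual usage or quoted prices).
PROMPT_TOKENS = 800
COMPLETION_TOKENS = 208   # the "Maximum length" preset
PRICE_PER_1K = 0.06       # hypothetical USD price per 1,000 tokens

high = iteration_cost(PROMPT_TOKENS, COMPLETION_TOKENS, 18, PRICE_PER_1K)
low = iteration_cost(PROMPT_TOKENS, COMPLETION_TOKENS, 1, PRICE_PER_1K)
```

Under these assumptions the gap between the two settings comes almost entirely from the seventeen extra candidate completions, since the prompt itself is the same in both cases.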

All in all, GPT-3 seems to be able to produce very plausible prose. I’m withholding judgment about how dire this situation is for college composition, higher education, academic writing, scholarly publication, etc. until I think some more about it, and do some more experiments. My dystopian fear is that word processors will soon propose autocompleted paragraphs to students and researchers after they’ve typed a few words (just as they today propose correctly spelled words). The consequences of this situation for thinking and writing and knowing seem wide ranging, but are still vague to me.

Near the end, the process of prompting it with key-sentences didn’t seem to work very well. GPT-3 became either taciturn or repetitive. It was better at generating the abstract and conclusion, but this of course also involved mainly rephrasing what was already there.
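The repetition could also be flagged automatically rather than by eye. The following is a minimal sketch, assuming a deliberately naive sentence splitter (it breaks only after a period, question mark, or exclamation mark followed by whitespace, so sentences ending in a closing quotation mark will not be split off). The function names are hypothetical and the example strings are shortened stand-ins for the real prompt and completion.

```python
import re
from typing import List


def sentences(text: str) -> List[str]:
    """Naively split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.split()) for p in parts if p.strip()]


def repeated_sentences(prompt: str, completion: str) -> List[str]:
    """Return sentences in the completion that already appear verbatim in the prompt."""
    seen = set(sentences(prompt))
    return [s for s in sentences(completion) if s in seen]


prompt = "Walker continues to make bold claims. Sleep matters."
completion = "Critics disagree. Walker continues to make bold claims."
repeats = repeated_sentences(prompt, completion)
```

A completion with many hits could be discarded or trimmed before the next step; this only catches exact repeats, not paraphrases.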

Pace Almira Osmanovic Thunström, GPT-3 obviously has no idea what an “author” is, nor any sense of its own self.

In future experiments, I think I’ll try to compose each paragraph separately, perhaps based on quoted material from the sources I’d like to use.

I’m looking forward to what other people have to say about this. Maybe you have some ideas for things I could try?

7 thoughts on “Are Language Models Deprived of Electric Sleep?”

  1. The citations being fake would expose this sample quickly. The obvious thing to do would be to supply some real citations in the prompt, with an instruction like “Citations used in the following essay: … The essay:”, so it’d copy. (The citations can be just the citation, if the title seems obvious enough, or include a snippet or summary.)

    1. Yes, that’s one way. Another way is to use another AI to read the text and search the literature for possible (but also real) references. Such AIs do already exist (I’ve seen one demonstrated here at the library) and I’m already planning an experiment where I get GPT-3 to write the prose and the other AI to provide the references.

  2. Very cool, thanks for sharing!

    I am a bit curious to learn more about your presets: they seem oddly specific and maybe a bit unconventional, e.g. adjusting both temperature and top-p away from their defaults. I’ve personally considered experimenting more with adjusting top-p (vs temp), but not both at the same time, per OpenAI’s recommendation.

    In my own experience to avoid it repeating itself, it helps to set a high presence penalty in combination with a high temperature (e.g. frequency penalty = 1 && presence_penalty = 1.5) + top-p at 1 and temp at 0.7 to 1.

    A max_length at 208 (which is, perhaps not coincidentally, the exact number of tokens in the Step 1 completion) – how did you arrive at this? I normally set mine at large, round numbers (or: I set a high max_tokens if I have an appropriate stop parameter)

    I’m most on the fence on the “best of” parameter at 18 – that sounds expensive! I remember reading Gwern’s (the commenter above) “GPT-3 Creative Fiction” at https://www.gwern.net/GPT-3, where he writes re. Best-Of (BO):

    “I’m not sure how to best use BO: it seems to be highly helpful for things with one right answer (such as tricky Q&A or reasoning), but when it helps with ‘creative’ completions is less clear.”

    and “Possibly BO is much more useful for nonfiction/​information-processing tasks, where there’s one correct answer and BO can help overcome errors introduced by sampling or myopia.”

    If anything, I would set the n parameter at n=3 or 5 (18 is a lot, mate) and then do a manual ranking, e.g. discount completions with text overlap with the original prompt or with repeating sentences or something else you care about.

    You’re also using text-davinci vs the old davinci, but you could add in more explicit instructions in the early steps on what you’re looking for (e.g. Gwern’s comment on the citations).

    I think the “=================” makes sense to show GPT-3 a token that indicates a shift in context. I usually use upper case headings for this (e.g. REWRITTEN TEXT), because the upper cased version is read differently than its normal-cased equivalent.

    For future tests, how about prompt chaining and/or using the edit point? Excited to follow your posts and see a fellow Dane exploring this topic.

    1. My presets were mainly taken from Thunström and Steingrimsson’s paper. It’s not something I know very much about, but their presets gave me much more promising results than I had been able to get previously. I’ll try some of your ideas next time around. Thanks.

      208 was a pretty random choice, except that I wanted paragraphs of around 200 words.

      This is not at all an exact science for me yet. Once I get a feel for the basics, I’m going to start fine-tuning it on paragraphs from this blog. My goal is to get a model that can generate a plausible paragraph prompted by a key sentence.

    2. Basically, I’m trying to take this new situation seriously as a writing coach. I normally tell students that writing skills are rare and valuable. I’m starting to wonder how true that is, given the existence of services like yours.

    3. Yes, I agree that BO=18 here is probably not necessary. There are many criticisms, both real and confabulated, one could make of Walker, and so there’s not really ‘a single correct critical essay’ for BO to try to maximize, the way there is for the sort of questions BO is brilliant at like reasoning or math. So it’s wasteful, if it is even helping at all.

      There’s also some risk of BO making things worse going into pathologies: while I haven’t seen BO cause a repetition trap yet by stumbling into a pathologically high-scoring but bad completion, I have noticed that heavy BO does seem to degrade quality by tending towards short completions, and that might cause problems if you want an essay of a particular minimum length but BO early on writes you into a high-quality corner. So it might make things worse here on its own.

      And of course, it is definitely much more expensive (and slower) to play around with, and so may be a net negative if it discourages experimentation.
