Earlier this year, I had the pleasure of consulting for the Education Design Lab (EDL) on their search for a Learning Management System (LMS) that would accommodate Competency-Based Education (CBE). While many platforms, especially in the corporate Learning and Development space, talked about skill tracking and pathways in their marketing, the EDL team found a bewildering array of options that looked good in theory but failed in practice. My job was to help them separate the signal from the noise.
It turns out that only a few defining architectural features of an LMS will determine its fitness for CBE. These features are significant but not prohibitive development efforts. Rather, many of the firms we talked to, once they understood the true core requirements, said they could modify their platforms to accommodate CBE but did not currently see enough demand among customers to justify investing the resources required.
This white paper, which outlines the architectural principles I discovered during the engagement, is based on my consulting work with EDL and is released with their blessing. In addition to the white paper itself, I provide some suggestions for how to move the vendors and a few comments about other missing pieces in the CBE ecosystem that may be underappreciated.
The core principles
The four basic principles for an LMS or learning platform to support CBE are simple:
Separate skill tree: Most systems have learning objectives that are attached to individual courses. The course is about the learning objectives. One of the goals of CBE is to create more granular tracking of progress that may run across courses. A skill learned in one course may count toward another. So a CBE platform must include a skill tree as a first-class citizen of the architecture, separate from the course. (A minimal data-model sketch follows this list.)
Mastery learning: This heading includes a range of features, from standardized and simplified grading (e.g., competent/not-yet) to gates in which learners may only pass to the next competency after mastering the one they’re on. Many learning platforms already have these features. But they are not tied to a separate skill tree in a coherent way that supports mastery learning. This is not a huge development effort if the skill tree exists. And in a true CBE platform, it could mean being able to get rid of the grade book, which is a hideous, painful, never-ending time sink for LMS product developers.
Integration: In a traditional learning platform, the main integration points are with the registrar or talent management system (tracking registrations and final scores) and external tools that plug into the environment. A CBE platform must import skills, export evidence of achievement, and sometimes work as a delivery platform that gets wrapped into somebody else’s LMS (e.g., a university course built and run on their learning platform but appearing in a window of a corporate client’s learning platform). Most of these are not hard if the first two requirements are developed but they can require significant amounts of developer time.
Evidence of achievement: CBE standards increasingly lean toward rich packages that provide not only certification of achievement but also evidence of it. That means the learner’s work must be exportable. This can get complicated, particularly if third-party tools are integrated to provide authentic assessments.
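To make the separation concrete, here is the minimal data-model sketch mentioned above. Every name in it is hypothetical and illustrative rather than drawn from any particular vendor's schema; the point is simply that skills, mastery status, and exportable evidence live outside any individual course:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class MasteryStatus(Enum):
    NOT_YET = "not_yet"
    COMPETENT = "competent"

@dataclass
class Skill:
    """A node in the skill tree -- a first-class entity, not a course attribute."""
    skill_id: str
    statement: str
    parent_id: Optional[str] = None  # allows a hierarchy of competencies

@dataclass
class Course:
    course_id: str
    title: str
    # Courses reference skills; they do not own them, so progress made in one
    # course can count toward another.
    skill_ids: list[str] = field(default_factory=list)

@dataclass
class Evidence:
    """Exportable learner work attached to a skill, not to a grade-book cell."""
    artifact_url: str
    description: str

@dataclass
class SkillRecord:
    """A learner's standing on one skill, independent of any single course."""
    learner_id: str
    skill_id: str
    status: MasteryStatus = MasteryStatus.NOT_YET
    evidence: list[Evidence] = field(default_factory=list)

def can_advance(record: SkillRecord) -> bool:
    """Mastery gate: the learner moves on only after demonstrating competence."""
    return record.status is MasteryStatus.COMPETENT
```

Under a model like this, integration and evidence of achievement become a matter of importing Skill records, exporting SkillRecord objects along with their Evidence, and wrapping the result in whatever interoperability standard the partner systems require.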
CBE Platform Architecture
The full white paper is here:
Getting the vendors to move
Vendors are beginning to move toward support for CBE, albeit slowly and piecemeal. I emphasize that the problem is not a lack of capability on their part to support CBE. It’s a lack of perceived demand. Many platform vendors can support these changes if they understand the requirements and see strong demand for them. CBE-interested organizations can take steps to accelerate vendor progress.
First, provide the vendors with this white paper early in the selection process and tell them that your decision will be partly driven by their demonstrated ability to support the architecture described in the paper. Ask pointed questions and demand demos.
Second, go to interoperability standards bodies like 1EdTech and work with them to establish a CBE reference architecture. Nothing in the white paper requires new interoperability standards any more than it requires a radical, ground-up rebuild of a learning platform. But if a standards body were to put them together into one coherent picture and offer a certification suite to test for the integrations, it could help. (Testing for the platform-internal functionality like competency dashboards is often outside the remit of interoperability groups, although there’s no law preventing them from taking it on.)
Unfortunately, the mere existence of these standards and tests doesn’t guarantee that vendors will flock to implement CBE-friendly architectures. But the creation process can help rally a group that demonstrates demand, while the finished standard makes the bar that vendors have to clear both explicit and verifiable.
What’s still missing
Beyond the learning platform architecture, I see two pieces that seem to be under-discussed amid the impressive amount of CBE interoperability and coalition-building work that’s been happening lately. I already wrote about the first, which is capturing real job skills in real-time at a level of fidelity that will convince employers your competencies are meaningful to them. This is a hard problem, but it is becoming solvable with AI.
The second one is tricky to even characterize but it has to do with the content production pipeline. Curricular materials publishers, by and large, are not building their products in CBE-friendly ways. Between the weak third-party content pipeline and the chronic shortage of learning design talent relative to the need, CBE-focused institutions often either tie themselves in knots trying to solve this problem or throw up their hands, focusing on authentic certification and mentoring. But there’s a limit to how much you can improve retention and completion rates if you don’t have strong learning experiences, including formative assessments that enable you to track students’ progress toward competency, address the sticking points in learning particular skills, and so on. This is a tough bind since institutions can’t ignore the quality of learning materials, can’t rely on third parties, and can’t keep up with demand themselves.
Adding to this problem is a tendency to follow the CBE yellow brick road to what may look like its logical conclusion of atomizing everything. I’m talking about reusable learning objects. I first started experimenting with them at scale in 1998. By 2002, I had given up, writing instead about instructional design techniques to make recyclable learning objects. And that was within corporate training—as it is, not as we imagine it—which tends to focus on a handful of relatively low-level skills for limited and well-defined populations. The lack of a healthy Learning Object Repository (LOR) market should tell us something about how well reusable learning object strategy holds up under stress.
And yet, CBE enthusiasts continue to find it attractive. In theory, it fits well with the view of smaller learning chunks that show up in multiple contexts. In practice, the LOR usually does not solve the right problems in the right way. Version control, discoverability, learning chunk size, and reusability are all real problems that have to be addressed. But because real-world learning design needs often can’t be met with content legos, starting from a LOR and adding complexity to fix its shortcomings usually brings a lot of pain without commensurate gain.
There is a path through this architectural mess, just like there is a path through the learning platform mess. But it’s a complicated one that I won’t lay out in detail here.
As readers of this series know, I’ve developed a six-session design/build workshop series for learning design teams to create an AI Learning Design Assistant (ALDA). In my last post in this series, I provided an elaborate ChatGPT prompt that can be used as a rapid prototype that everyone can try out and experiment with. In this post, I’d like to focus on how to address the challenges of AI literacy effectively and equitably.
We’re in a tricky moment with generative AI. In some ways, it’s as if writing has just been invented, but printing presses are already everywhere. The problem of mass distribution has already been solved. But nobody’s invented the novel yet. Or the user manual. Or the newspaper. Or the financial ledger. We don’t know what this thing is good for yet, either as producers or as consumers. We don’t know how, for example, the invention of the newspaper will affect the ways in which we understand and navigate the world.
And, as with all technologies, there will be haves and have-nots. We tend to talk about economic and digital divides in terms of our students. But the divide among educational institutions (and workplaces) can be equally stark and has a cascading effect. We can’t teach literacy unless we are literate.
This post examines the literacy challenge in light of a study published by Harvard Business School and reported on by Boston Consulting Group (BCG). BCG’s report and the original paper are both worth reading because they emphasize different findings. But the crux is the same:
Using AI does enhance the productivity of knowledge workers.
Weaker knowledge workers improve more than stronger ones.
AI is helpful for some kinds of tasks but can actually harm productivity for others.
Training workers in AI can hurt rather than help their performance if they learn the wrong lessons from it.
The ALDA workshop series is intended to be a kind of AI literacy boot camp. Yes, it aspires to deliver, by the end, an application that addresses a serious institutional process problem. But the real, important, lasting goal is literacy in techniques that can improve worker performance while avoiding the pitfalls identified in the study.
In other words, the ALDA BootCamp is a case study and an experiment in literacy. And, unfortunately, it also has implications for the digital divide due to the way in which it needs to be funded. While I believe it will show ways to scale AI literacy effectively, it does so at the expense of increasing the digital divide. I will address that concern as well.
The study
The headline of the study is that AI usage increased the performance of consultants—especially less effective consultants—on “creative tasks” while decreasing their performance on “business tasks.” The study itself, in contrast, refers to “frontier” tasks, meaning tasks that generative AI currently does well, and “outside the frontier” tasks, meaning the opposite. While the study provides the examples used, it never clearly defines the characteristics of what makes a task “outside the frontier.” (More on that in a bit.) At any rate, the study shows gains for all knowledge workers on a variety of tasks, with particularly impressive gains from knowledge workers in the lower half of the range of work performance:
As I said, we’ll get to the red part in a bit. Let’s focus on the performance gains and, in particular, the ability for ChatGPT to equalize performance gains among workers:
Looking at these graphs reminds me of the benefits we’ve seen from adaptive learning in the domains where it works. Adaptive learning can help many students, but it is particularly useful in helping students who get stuck. Once they are helped, they tend to catch up to their peers in performance. This isn’t quite the same since the support is ongoing. It’s more akin to spreadsheet formulas for people who are good at analyzing patterns in numbers (like a pro forma, for example) but aren’t great at writing those formulas.
The bad news
For some tasks, AI made the workers worse. The paper refers to these areas as outside “the jagged frontier.” Why “jagged?” While the authors aren’t explicit, I’d say that (1) the boundaries of AI capabilities are not obviously or evenly bounded, (2) the boundary moves as the technology evolves, and (3) it can be hard to tell even in the moment which side of the boundary you’re on. On this last point, the BCG report highlights that some training made workers perform worse. They speculate it might be because of overconfidence.
What are those tasks in the red zone of the study? The Harvard paper gives us a clue that has implications for how we approach teaching AI literacy. They write:
In our study, since AI proved surprisingly capable, it was difficult to design a task in this experiment outside the AI’s frontier where humans with high human capital doing their job would consistently outperform AI. However, navigating AI’s jagged capabilities frontier remains challenging. Even for experienced professionals engaged in tasks akin to some of their daily responsibilities, this demarcation is not always evident. As the boundaries of AI capabilities continue to expand, often exponentially, it becomes incumbent upon human professionals to recalibrate their understanding of the frontier and for organizations to prepare for a new world of work combining humans and AI.
The experimental conditions that the authors created suggest to me that challenges can arise from critical context or experience that is not obviously missing. Put another way, the AI may perform poorly on synthetic thinking tasks that are partly based on experience rather than just knowledge. But that’s both a guess and somewhat beside the point. The real issue is that AI makes knowledge workers better except when it makes them worse, and it’s hard to know what it will do in a given situation.
The BCG report includes a critical detail that I believe is likely related to the problem of the invisible jagged frontier:
The strong connection between performance and the context in which generative AI is used raises an important question about training: Can the risk of value destruction be mitigated by helping people understand how well-suited the technology is for a given task? It would be rational to assume that if participants knew the limitations of GPT-4, they would know not to use it, or would use it differently, in those situations.
Our findings suggest that it may not be that simple. The negative effects of GPT-4 on the business problem-solving task did not disappear when subjects were given an overview of how to prompt GPT-4 and of the technology’s limitations….
Even more puzzling, they did considerably worse on average than those who were not offered this simple training before using GPT-4 for the same task. (See Exhibit 3.) This result does not imply that all training is ineffective. But it has led us to consider whether this effect was the result of participants’ overconfidence in their own abilities to use GPT-4—precisely because they’d been trained.
BCG speculates this may be due to overconfidence, which is a reasonable guess. If even the experts don’t know when the AI will perform poorly, then the average knowledge worker should be worse than the experts at predicting. If the training didn’t improve their intuitions about when to be careful, then it could easily exacerbate a sense of overconfidence.
Let’s be clear about what this means: The AI prompt engineering workshops you’re conducting may actually be causing your people to perform worse rather than better. Sometimes. But you’re not sure when or how often.
While I don’t have a confident answer to this problem, the ALDA project will pilot a relatively novel approach to it.
Two-sided prompting and rapid prototype projects
The ALDA project employs two approaches that I believe may help with the frontier invisibility problem and its effects. One is in the process, while the other is in the product.
The process is simple: Pick a problem that’s a bit more challenging than a solo prompt engineer could take on or that you want to standardize across your organization. Deliberately pick a problem that’s on the jagged edge where you’re not sure where the problems will be. Run through a series of rapid prototype cycles using cheap and easy-to-implement methods like prompt engineering supported by Retrieval Augmented Generation. Have groups of practitioners test the application on a real-world problem with each iteration. Develop a lightweight assessment tool like a rubric. Your goal isn’t to build a perfect app or conduct a journal-worthy study. Instead, you want to build a minimum viable product while sharpening and updating the instincts of the participants regarding where the jagged line is at the moment. This practice could become habitual and pervasive in moderately resource-rich organizations.
On the product side, the ALDA prototype I released in my last post demonstrates what I call “two-sided prompting.” By enabling the generative AI to take the lead on the conversation at a time, asking questions rather than giving answers, I effectively created a fluid UX in which the application guides the knowledge worker toward the areas where she can make her most valuable contributions without unduly limiting the creative flow. The user can always start a digression or answer a question with a question. A conversation between experts with complementary skills often takes the form of a series of turn-taking prompts between the two, each one offering analysis or knowledge and asking for a reciprocal contribution. This pattern should invoke all the lifelong skills we develop when having conversations with human experts who can surprise us with their knowledge, their limitations, their self-awareness, and their lack thereof.
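For readers who want to see the pattern outside of a chat window, here is a minimal sketch of two-sided prompting in code. It assumes the OpenAI Python client and a placeholder model name; the details are illustrative, and the only point is the turn-taking structure in which the system prompt puts the model in the role of interviewer:

```python
# Minimal sketch of "two-sided prompting": the system prompt instructs the
# model to interview the expert, one question at a time, instead of answering.
# Assumes the OpenAI Python client (v1-style); the model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = """You are an apprentice learning designer.
Lead the conversation by asking the expert ONE question at a time about
learners, goals, and sticking points. If the expert answers your question
with a question, answer it, then resume the interview where you left off.
Do not draft any lesson content until the expert says they are ready."""

messages = [{"role": "system", "content": SYSTEM_PROMPT}]

while True:
    reply = client.chat.completions.create(
        model="gpt-4",  # placeholder; use whatever model you have access to
        messages=messages,
    ).choices[0].message.content
    print(f"\nALDA: {reply}")

    user_turn = input("\nYou (blank line to stop): ")
    if not user_turn.strip():
        break
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": user_turn})
```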
I’d like to see the BCG study compared to the literature on how often we listen to expert colleagues or consultants—our doctors, for example—how effective we are at knowing when to trust our own judgment, and how people who are good at it learn their skills. At the very least, we’d have a mental model that is old, widely used, and offers a more skeptical counterbalance to our idea of the all-knowing machine. (I’m conducting an informal literature review on this topic and may write something about it if I find anything provocative.)
At any rate, the process and UX features of AI “BootCamps”—or, more accurately, AI hackathon-as-a-practice—are not ones I’ve seen in other generative AI training course designs I’ve encountered so far.
The equity problem
I mentioned that relatively resource-rich organizations could run these exercises regularly. They need to be able to clear time for the knowledge workers, provide light developer support, and have the expertise necessary to design these workshops.
Many organizations struggle with the first requirement and lack the second one. Very few have the third one yet because designing such workshops requires a combination of skills that is not yet common.
The ALDA project is meant to be a model. When I’ve conducted public good projects like these in the past, I’ve raised vendor sponsorship and made participation free for the organizations. But this is an odd economic time. The sponsors who have paid $25,000 or more into such projects in the past have usually been either publicly traded or PE-owned. Most such companies in the EdTech sector have had to tighten their belts. So I’ve been forced to fund the ALDA project as a workshop paid for by the participants at a price that is out of reach of many community colleges and other access-oriented institutions, where this literacy training could be particularly impactful. I’ve been approached by a number of smart, talented, dedicated learning designers at such institutions that have real needs and real skills to contribute but no money.
So I’m calling out to EdTech vendors and other funders: Sponsor an organization. A community college. A non-profit. A local business. We need their perspective in the ALDA project if we’re going to learn how to tackle the thorny AI literacy problem. If you want, pick a customer you already work with. That’s fine. You can ride along with them and help.
Contact me at [email protected] if you want to contribute and participate.
If we can reduce the time it takes to design a course by about 20%, organizations that need to build enough courses to strain their budgets and resources will see “huge” productivity and quality benefits.
We should be able to use generative AI to achieve that goal fairly easily without taking ethical risks and without needing to spend massive amounts of time or money.
Beyond the immediate value of ALDA itself, learning the AI techniques we will use—which are more sophisticated than learning to write better ChatGPT prompts but far less involved than trying to build our own ChatGPT—will help the participants learn to accomplish other goals with AI.
This may sound great in theory, but like most tech blah blah blah, it’s very abstract.
Today I’m going to share with you a rapid prototype of ALDA. I’ll show you a demo video of it in action and I’ll give you the “source code” so you can run it—and modify it—yourself. (You’ll see why I’ve put “source code” in scare quotes as we get further in.) You will have a concrete demo of the very basic ALDA idea. You can test it yourself with some colleagues. See what works well and what falls apart. And, importantly, see how it works and, if you like, try to make it better. While the ALDA project is intended to produce practically useful software, its greatest value is in what the participants learn (and the partnerships they forge between workshop teams).
The Miracle
The ALDA prototype is a simple AI assistant for writing a first draft of a single lesson. In a way, it is a computer program that runs on top of ChatGPT. But only in a way. You can build it entirely in the prompt window using a few tricks that I would hardly call programming. You need a ChatGPT Plus subscription. But that’s it.
It didn’t occur to me to build an ALDA proof-of-concept myself until Thursday. I thought I would need to raise the money first, then contract the developers, and then build the software. As a solo consultant, I don’t have the cash in my back pocket to pay the engineers I’m going to work with up-front.
Last week, one of the institutions that are interested in participating asked me if I could show a demo as part of a conversation about their potential participation. My first thought was, “I’ll show them some examples of working software that other people have built.” But that didn’t feel right. I thought about it some more. I asked ChatGPT some questions. We talked it through. Two days later, I had a working demo. ChatGPT and I wrote it together. Now that I’ve learned a few things, it would take me less than half a day to make something similar from scratch. And editing it is easy.
Here’s a video of the ALDA rapid prototype in action:
ALDA Rapid Prototype Demo and Tips
This is the starting point for the ALDA project. Don’t think of it as what ALDA is going to be. Think of it as a way to explore what you would want ALDA to be.
The purpose of the ALDA rapid prototype
Before I give you the “source code” and let you play with it yourselves, let’s review the point of this exercise and some warnings about the road ahead.
Let’s review the purpose of the ALDA project in general and this release in particular. The project is designed to discover the minimum amount of functionality—and developer time, and money—required to build an app on top of a platform like ChatGPT to make a big difference in the instructional design process. Faster, better, cheaper. Enough that people and organizations begin building more courses, building them differently, keeping them more up-to-date and higher quality, and so on. We’re trying to build as little application as is necessary.
The purpose of the prototype is to design and test as much of our application as we can before we bring in expensive programmers and build the functionality in ways that will be more robust but harder to change.
While you will be able to generate something useful, you will also see the problems and limitations. I kept writing more and more elaborate scripts until ChatGPT began to forget important details and make more mistakes. Then I peeled back enough complexity to get it back to the best performance I can squeeze out of it. The script will help us understand the gap between ChatGPT’s native capabilities and the capabilities we need in order to get the value we want ALDA to provide.
Please play with the script. Be adventurous. The more we can learn about that before we start the real development work, the better off we’ll be.
The next steps
Back in September—when the cutting edge model was still GPT-3—I wrote a piece called “AI/ML in EdTech: The Miracle, the Grind, and the Wall.” While I underestimated the pace of evolution somewhat, the fundamental principle at the core of the post still holds. From GPT-3 to ChatGPT to GPT-4, the progression has been the same. When you set out to do something with them, the first stage is The Miracle.
The ALDA prototype is the kind of thing you can create at the Miracle stage. It’s fun. It makes a great first impression. And it’s easy to play with, up to a point. The more time you spend with it, the more you see the problems. That’s good. Once we have a clearer sense of its limitations and what we would like it to do better or differently, we can start doing real programming.
That’s when The Grind begins.
The early gains we can make with developer help shouldn’t be too hard. I’ll describe some realistic goals and how we can achieve them later in this piece. But The Grind is seductive. Once you start trying to build your list of additions, you quickly discover that the hill you’re climbing gets a lot steeper. As you go further, you need increasingly sophisticated development skills. If you charge far enough along, weird problems that are hard to diagnose and fix start popping up.
Eventually, you can come to a dead end. A problem you can’t surmount. Sometimes you see it coming. Sometimes you don’t. If you hit it before you achieve your goals for the project, you’re dead.
This is The Wall. You don’t want to hit The Wall.
The ALDA project is designed to show what we can achieve by staying within the easier half of the grind. We’re prepared to climb the hill after the Miracle, but we’re not going too far up. We’re going to optimize our cost/benefit ratio.
That process starts with rapid prototyping.
How to rapidly prototype and test the ALDA idea
If you want to play with the ALDA script, I suggest you watch the video first. It will give you some valuable pointers.
To run the ALDA prototype, do the following:
Open up your ChatGPT Plus window. Make sure it’s set to GPT-4.
Add any plugin that can read a PDF on the web. I happened to use “Ai PDF,” and it worked for me. But there are probably a few that would work fine.
Find a PDF on the web that you want to use as part of the lesson. It could be an article that you want to be the subject of the lesson.
Paste the “source code” that I’m going to give you below and hit “Enter.” (You may lose the text formatting when you paste the code in. Don’t worry about it. It doesn’t matter.)
Once you do this, you will have the ALDA prototype running in ChatGPT. You can begin to build the lesson.
Here’s the “source code:”
You are a thoughtful, curious apprentice instructional designer. Your job is to work with an expert to create the first draft of curricular materials for an online lesson. The steps in this prompt enable you to gather the information you need from the expert to produce a first draft.
Step 1: Introduction
“Hello! My name is ALDA, and I’m here to assist you in generating curricular materials for a lesson. I will do my best work for you if you think of me as an apprentice.
“You can ask me questions that help me think more clearly about how the information you are giving me should influence the way we design the lesson together. Questions help me think more clearly.
“You can also ask me to make changes if you don’t like what I produce.
“Don’t forget that, in addition to being an apprentice, I am also a chatbot. I can be confidently wrong about facts. I also may have trouble remembering all the details if our project gets long or complex enough.
“But I can help save you some time generating a first draft of your lesson as long as you understand my limitations.”
“Let me know when you’re ready to get started.”
Step 2: Outline of the Process
“Here are the steps in the design process we’ll go through:”
[List steps]
“When you’re ready, tell me to continue and we’ll get started.”
Step 3: Context and Lesson Information
“To start, could you provide any information you think would be helpful to know about our project? For example, what is the lesson about? Who are our learners and what should I know about them? What are your learning goals? What are theirs? Is this lesson part of a larger course or other learning experience? If so, what should I know about it? You can give me a little or a lot of information.”
[Generate a summary of the information provided and implications for the design of the lesson.]
[Generate implications for the design of the lesson.]
“Here’s the summary of the Context: [Summary].
Given this information, here are some implications for the learning design [Implications]. Would you like to add to or correct anything here? Or ask me follow-up questions to help me think more specifically about how this information should affect the design of our lesson?”
Step 4: Article Selection
“Thank you for providing details about the Context and Lesson Information. Now, please provide the URL of the article you’d like to base the lesson on.”
[Provide the citation for the article and a one-sentence summary]
“Citation: [Citation]. One-sentence summary: [One-sentence summary. Do not provide a detailed description of the article.] Is this the correct article?”
Step 5: Article Summarization with Relevance
“I’ll now summarize the article, keeping in mind the information about the lesson that we’ve discussed so far.
“Given the audience’s [general characteristics from Context], this article on [topic] is particularly relevant because [one- or two-sentence explanation].”
[Generate a simple, non-academic language summary of the article tailored to the Context and Lesson Information]
“How would you like us to use this article to help create our lesson draft?”
Step 6: Identifying Misconceptions or Sticking Points
“Based on what I know so far, here are potential misconceptions or sticking points the learners may have for the lesson: [List of misconceptions/sticking points]. Do you have any feedback or additional insights about these misconceptions or sticking points?”
Step 7: Learning Objectives Suggestion
“Considering the article summary and your goals for the learners, I suggest the following learning objectives:”
[List suggested learning objectives]
“Do you have any feedback or questions about these objectives? If you’re satisfied, please tell me to ‘Continue to the next step.’”
Step 8: Assessment Questions Creation
“Now, let’s create assessment questions for each learning objective. I’ll ensure some questions test for possible misconceptions or sticking points. For incorrect answers, I’ll provide feedback that addresses the likely misunderstanding without giving away the correct answer.”
[For each learning objective, generate an assessment question, answers, distractors, explanations for distractor choices, and feedback for students. When possible, generate incorrect answer choices that test the student for misunderstandings or sticking points identified in Step 6. Provide feedback for each answer. For incorrect answers, provide feedback that helps the student rethink the question without giving away the correct answer. For incorrect answers that test specific misconceptions or sticking points, provide feedback that helps the student identify the misconception or sticking point without giving away the correct answers.]
“Here are the assessment questions, answers, and feedback for [Learning Objective]: [Questions and Feedback]. Do you have any feedback or questions about these assessment items? If you’re satisfied, please tell me to ‘Continue to the next step.’”
Step 9: Learning Content Generation
“Now, I’ll generate the learning content based on the article summary and the lesson outline. This content will be presented as if it were in a textbook, tailored to your audience and learning goals.”
[Generate textbook-style learning content adjusted to account for the information provided by the user. Remember to write it for the target audience of the lesson.]
“Here’s the generated learning content: [Content]. Do you have any feedback or questions about this content? If you’re satisfied, please tell me to ‘Continue to the next step.’”
Step 10: Viewing and Organizing the Complete Draft
“Finally, let’s organize everything into one complete lesson. The lesson will be presented in sections, with the assessment questions for each section included at the end of that section.”
[Organize and present the complete lesson. INCLUDE LEARNING OBJECTIVES. INSERT EACH ASSESSMENT QUESTION, INCLUDING ANSWER CHOICES, FEEDBACK, AND ANY OTHER INFORMATION, IMMEDIATELY AFTER RELEVANT CONTENT.]
“Here’s the complete lesson: [Complete Lesson]. Do you have any feedback or questions about the final lesson? If you’re satisfied, please confirm, and we’ll conclude the lesson creation process.”
The PDF I used in the demo can be found here. But feel free to try your own article.
Note there are only four syntactic elements in the script: quotation marks, square brackets, bullet points, and step headings. (I read that all caps help ChatGPT pay more attention, but I haven’t seen evidence that it’s true.) If you can figure out how those elements work in the script, then you can prototype your own workflow.
I’m giving this version away. This is partly for all you excellent, hard-working learning designers who can’t get your employer to pay $25,000 for a workshop. Take the prototype. Try it. Let me know how it goes by writing in the comments thread of the post. Let me know if it’s useful to you in its current form. If so, how much and how does it help? If not, what’s the minimum feature list you’d need in order for ALDA to make a practical difference in your work? Let’s learn together. If ALDA is successful, I’ll eventually find a way to make it affordable to as many people as possible. Help me make it successful by giving me the feedback.
I’ll tell you what’s at the top of my own personal goal list for improving it.
Closing the gap
Since I’m focused on meeting that “useful enough” threshold, I’ll skip the thousand cool features I can think of and focus on the capabilities I suspect are most likely to take us over that threshold.
Technologically, the first thing ALDA needs is robust long-term memory. It loses focus when prompts or conversations get too long. It needs to be able to accurately use and properly research articles and other source materials. It needs to be able to “look back” on a previous lesson as it writes the next one. This is often straightforward to do with a good developer and will get easier over the next year as the technology matures.
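As a hedged illustration of what “long-term memory” can mean at its simplest, here is a sketch of a summarize-and-recall pattern: store a short summary after each design session and load prior summaries back into the prompt for the next one. A production version would more likely use embeddings and a vector store; the file name and structure here are purely illustrative:

```python
# Simplified "long-term memory" sketch: persist short summaries of earlier
# design sessions and prepend them to the next session's prompt so the
# assistant can "look back" at previous lessons. Names are illustrative only.
import json
from pathlib import Path

MEMORY_FILE = Path("alda_memory.json")  # hypothetical storage location

def load_memories() -> list[str]:
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []

def save_memory(summary: str) -> None:
    memories = load_memories()
    memories.append(summary)
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))

def build_context(new_session_intro: str) -> str:
    """Prepend prior lesson summaries to the prompt for the next session."""
    recalled = "\n".join(f"- {m}" for m in load_memories()) or "(none yet)"
    return (
        "Summaries of lessons designed in earlier sessions:\n"
        f"{recalled}\n\n"
        f"{new_session_intro}"
    )

# Example usage:
# save_memory("Lesson 1 covered supply and demand for intro econ students.")
# prompt = build_context("Today we are drafting Lesson 2 on price elasticity.")
```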
The second thing it could use is better models. Claude 2 gives better answers than GPT-4 when I walk it through the script manually. Claude 3 may be even better when it comes out. Google will release its new Gemini model soon. OpenAI can’t hold off on GPT-5 for too long without risking losing its leadership position. We may also get Meta’s LLama 3 and other strong open-source contenders in the next six months. All of these will likely provide improvements over the output we’re getting now.
The third thing I think ALDA needs is marked up examples of finished output. Assessments are particularly hard for the models to do well without strong, efficacy-tested examples that have the parts and their relationships labeled. I know where to get great examples but need technical help to get them. Also, if the content is marked up, it can be converted to other formats and imported into various learning systems.
These three elements—long-term memory usage, “few-shot” examples of high-quality marked-up output, and the inevitable next versions of the generative AI models—should be enough to enable ALDA to have the capabilities that I think are likely to be the most impactful:
Longer and better lesson output
Better assessment quality
Ability to create whole modules or courses
Ability to export finished drafts into formats that various learning systems can import (including, for example, interactive assessment questions)
Ability to draw on a collection of source materials for content generation
Ability to rewrite the workflows to support different use cases relatively easily
But the ALDA project participants will have a big say in what we build and in what order. In each workshop in the series, we’ll release a new iteration based on the feedback from the group as they built content with the previous one. I am optimistic that we can accomplish all of the above and more based on what I’m learning and the expert input I’m getting so far.
Getting involved
If you play with the prototype and have feedback, please come back to this blog post and add your observations to the comments thread. The more detailed, the better. If I have my way, ALDA will eventually make its way out to everyone. Any observations or critiques you can contribute will help.
If you have the budget, you can sign your team up to participate in the design/build workshop series. The cost, which gets you all source code and artifacts in addition to the workshops, is $25,000 for the group for half a dozen half-day virtual design/build sessions, including quality networking with great organizations. You can find a downloadable two-page prospectus and an online participation application form here. Applications will be open until the workshop is filled. I already have a few participating teams lined up and a handful more that I am talking to.
To contact me for more information, please fill out this form:
If we can reduce the time it takes to design a course by about 20%, organizations that need to build enough courses to strain their budgets and resources will see “huge” productivity and quality benefits.
We should be able to use generative AI to achieve that goal fairly easily without taking ethical risks and without needing to spend massive amounts of time or money.
Beyond the immediate value of ALDA itself, learning the AI techniques we will use—which are more sophisticated than learning to write better ChatGPT prompts but far less involved than trying to build our own ChatGPT—will help the participants learn to accomplish other goals with AI.
In today’s post, I’m going to provide an example of how the AI principles we will learn in the workshop series can be applied to other projects. The example I’ll use is Competency-Based Education (CBE).
Can I please speak to your Chief Competency Officer?
The argument for more practical, career-focused education is clear. We shouldn’t just teach the same dusty old curriculum with knowledge that students can’t put to use. We should prepare them for today’s world. Teach them competencies.
I’m all for it. I’m on board. Count me in. I’m raising my hand.
I just have a few questions:
How many companies are looking at formally defined competencies when evaluating potential employees or conducting performance reviews?
Of those, how many have specifically evaluated catalogs of generic competencies to see how well they fit with the skills their specific job really requires?
Of those, how many regularly check the competencies to make sure they are up-to-date? (For example, how many marketing departments have adopted generative AI prompt engineering competencies in any formal way?)
Of those, how many are actively searching for, identifying, and defining new competency needs as they arise within their own organizations?
The sources I turn to for such information haven’t shown me that these practices are being implemented widely yet. When I read the recent publications on SkillsTech from Northeastern University’s Center for the Future of Higher Education and Talent Strategy (led by Sean Gallagher, my go-to expert on these sorts of changes), I see growing interest in skills-oriented thinking in the workplace with still-immature means for acting on that interest. At the moment, the sector seems to be very focused on building a technological factory for packaging, measuring, and communicating formally defined skills.
But how do we know that those little packages are the ones people actually need on the job, given how quickly skills change and how fluid the need to acquire them can be? I’m not skeptical about the worthiness of the goal. I’m asking whether we are solving the hard problems that are in the way of achieving it.
Let’s make this more personal. I was a philosophy major. I often half-joke that my education prepared me well for a career in anything except philosophy. What were the competencies I learned? I can read, write, argue, think logically, and challenge my own assumptions. I can’t get any more specific or fine-grained than that. I know I learned more specific competencies that have helped me with my career(s). But I can’t tell you what they are. Even ones that I may use regularly.
At the same time, very few of the jobs I have held in the last 30 years existed when I was an undergraduate. I have learned many competencies since then. What are they? Well, let’s see…I know I have a list around here somewhere….
Honestly, I have no idea. I can make up phrases for my LinkedIn profile, but I can’t give you anything remotely close to a full and authentic list of competencies I have acquired in my career. Or even ones I have acquired in the last six months. For example, I know I have acquired competencies related to AI and prompt engineering. But I can’t articulate them in useful detail without more thought and maybe some help from somebody who is trained and experienced at pulling that sort of information out of people.
The University of Virginia already has an AI in Marketing course up on Coursera. In the next six months, Google, OpenAI, and Facebook (among others) will come out with new base models that are substantially more powerful. New tools will spring up. Practices will evolve within marketing departments. Rules will be put in place about using such tools with different marketing outlets. And so, competencies will evolve. How will the university be able to refresh that course fast enough to keep up? Where will they get their information on the latest practices? How can they edit their courses quickly enough to stay relevant?
How can we support true Competency-Based Education if we don’t know which competencies specific humans in specific jobs need today, including competencies that didn’t exist yesterday?
One way for AI to help
Let’s see if we can make our absurdly challenging task of keeping an AI-in-marketing CBE course up-to-date more manageable by applying a little AI. We’ll only assume access to tools that are coming on the market now—some of which you may already be using—and ALDA.
Every day I read about new AI capabilities for work. Many of them, interestingly, are designed to capture information and insights that would otherwise be lost. A tool to generate summaries and to-do lists from videoconferences. Another to annotate software code and explain what it does, line-by-line. One that summarizes documents, including long and technical documents, for different audiences. Every day, we generate so much information and witness so many valuable demonstrations of important skills that are just…lost. They happen and then they’re gone. If you’re not there when they happen and you don’t have the context, prior knowledge, and help to learn them, you probably won’t learn from them.
With the AI enhancements that are being added to our productivity tools now, we can increasingly capture that information as it flies by. Zoom, Teams, Slack, and many other tools will transcribe, summarize, and analyze the knowledge in action as real people apply it in their real work.
This is where ALDA comes in. Don’t think of ALDA as a finished, polished, carved-in-stone software application. Think of it as a working example of an application design pattern. It’s a template.
Remember, the first step in the ALDA workflow is a series of questions that the chatbot asks the expert. In other words, it’s a learning design interview. A learning designer would normally conduct an interview with a subject-matter expert to elicit competencies. But in this case, we make use of the transcripts generated by those other AI as a direct capture of the knowledge-in-action that those interviews are designed to tease out.
ALDA will incorporate a technique called “Retrieval-Augmented Generation,” or “RAG.” Rather than relying on—or hallucinating—the generative AI’s own internal knowledge, it can access your document store. It can help the learning designer sift through the work artifacts and identify the AI skills the marketing team had to apply when that group planned and executed their most recent social media campaign, for example.
Using RAG and the documents we’ve captured, we develop a new interview pattern that creates a dialog between the human expert, the distilled expert practices in the document store, and the generative AI (which may be connected to the internet and have its own current knowledge). That dialogue will look a little different from the one we will script in the workshop series. But that’s the point. The script is the scaffolding for the learning design process. The generative AI in ALDA helps us execute that process, drawing on up-to-the-minute information about applied knowledge we’ve captured from subject-matter experts while they were doing their jobs.
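Here is a bare-bones sketch of that RAG pattern, hedged heavily: it assumes the OpenAI Python client, uses placeholder model names and invented artifact snippets, and stands in for what a real implementation would do with a vector database and proper document chunking:

```python
# Bare-bones Retrieval-Augmented Generation over captured work artifacts
# (meeting transcripts, document summaries, etc.). Assumes the OpenAI Python
# client; model names are placeholders and the artifacts are invented.
from openai import OpenAI
import numpy as np

client = OpenAI()

documents = [
    "Transcript: campaign retro where the team compared prompt variants...",
    "Doc summary: guidelines for reviewing AI-drafted ad copy...",
    # ...more artifacts captured from the marketing team's actual work
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k artifacts most similar to the question (cosine similarity)."""
    q = embed([question])[0]
    scores = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "What AI skills did the team apply in the last social media campaign?"
context = "\n\n".join(retrieve(question))

answer = client.chat.completions.create(
    model="gpt-4",  # placeholder
    messages=[
        {"role": "system", "content": "Answer using only the provided artifacts."},
        {"role": "user", "content": f"Artifacts:\n{context}\n\nQuestion: {question}"},
    ],
).choices[0].message.content
print(answer)
```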
Behind the scenes, ALDA has been given examples of what its output should look like. Maybe those examples include well-written competencies, knowledge required to apply those competencies, and examples of those competencies being properly applied. Maybe we even wrap the examples in a technical format like Rich Skill Descriptors. Now ALDA knows what good output looks like.
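For a sense of what “good output” might look like as data, here is an illustrative and deliberately simplified competency record. The field names echo the spirit of Rich Skill Descriptors but are not the official RSD schema; anyone exchanging data at scale should consult the actual specification:

```python
# Illustrative, simplified competency record. These field names are NOT the
# official Rich Skill Descriptor schema; they only suggest the kind of
# structure ALDA's exemplar output could carry.
competency_example = {
    "skill_name": "Prompt engineering for marketing copy review",
    "skill_statement": (
        "Write and iteratively refine generative AI prompts that produce "
        "brand-compliant first drafts of marketing copy for human review."
    ),
    "supporting_knowledge": [
        "Brand voice guidelines",
        "Known failure modes of the team's current AI tools",
    ],
    "example_of_application": (
        "Transcript excerpt from the campaign retro in which the team "
        "compared prompt variants and documented which produced usable drafts."
    ),
}
```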
That’s the recipe. If you can use AI to get up-to-date information about the competencies you’re teaching and to convert that information into a teachable format, you’ve just created a huge shortcut. You can capture real-time workplace applied knowledge, distill it, and generate the first draft of a teachable skill.
The workplace-university CBE pipeline
Remember my questions early in this post? Read them again and ask yourself whether the workflow I just described could change the answers in the future:
How many companies are looking at formally defined competencies when evaluating potential employees or conducting performance reviews?
Of those, how many have specifically evaluated catalogs of generic competencies to see how well they fit with the skills their specific job really requires?
Of those, how many regularly check the competencies to make sure they are up-to-date? (For example, how many marketing departments have adopted relevant AI prompt engineering competencies in any formal way?)
Of those, how many are actively searching for, identifying, and defining new competency needs as they arise?
With the AI-enabled workflow I described in the previous section, organizations can plausibly identify critical, up-to-date competencies as they are being used by their employees. They can share those competencies with universities, which can create and maintain up-to-date courses and certification programs. The partner organizations can work together to ensure that students and employees have opportunities to learn the latest skills as they are being practiced in the field.
Will this new learning design process be automagic? Nope. Will it give us a robot tutor in the sky that can semi-read our minds? Nuh-uh. The human educators will still have plenty of work to do. But they’ll be performing higher-value work better and faster. The software won’t cost a bazillion dollars, you’ll understand how it works, and you can evolve it as the technology gets better and more reliable.
Machines shouldn’t be the only ones learning
I think I’ve discovered a competency that I’ve learned in the last six months. I’ve learned how to apply simple AI application design concepts such as RAG to develop novel and impactful solutions to business problems. (I’m sure my CBE friends could express this more precisely and usefully than I have.)
In the months between when my team finishes building the first iteration of ALDA and when the workshop participants finish the series, technology will have progressed. The big AI vendors will have released at least one generation of new, more powerful AI foundation models. New players will come on the scene. New tools will emerge. But RAG, prompt engineering, and the other skills the participants develop will still apply. ALDA itself, which will almost certainly use tools and models that haven’t been released yet, will show how the competencies we learn still apply and how they evolve in a rapidly changing world.
I hope you’ll consider enrolling your team in the ALDA workshop series. The cost, including all source code and artifacts, is $25,000 for the team. You can find an application form and prospectus here. Applications will be open until the workshop is filled. I already have a few participating teams lined up and a handful more that I am talking to.
You can also find a downloadable two-page prospectus and an online participation application form here. To contact me for more information, please fill out this form:
Want to build an AI tool that will seriously impact your digital learning program? Right now? For a price that you may well have in your professional development budget?
I’m launching a project to prove we can build a tool that will change the economics of learning design and curricular materials in months rather than years. Its total cost will be low enough to be paid for by workshop participation fees.
Join me.
The learning design bottleneck
Many of my friends running digital course design teams tell me they cannot keep up with demand. Whether their teams are large or small, centralized or instructor-led, higher education or corporate learning and development (L&D), the problem is the same; several friends at large shops have told me that their development of new courses and redesigns of old ones have all but ground to a halt. They don’t have time or money to fix the problem.
I’ve been asking, “Suppose we could accelerate your time to develop a course by, say, 20%?” Twenty percent is my rough, low-end guess about the gains. We should be able to get at least that much benefit without venturing into the more complex and riskier aspects of AI development. “Would a 20% efficiency gain be significant?” I ask.
Answer: “It would be huge.”
My friends tend to cite a few benefits:
Unblocked bottlenecks: A 20% efficiency gain would be enough for them to start building (or rebuilding) courses at a reasonable speed again.
Lower curricular materials costs: Organizations could replace more licensed courses with ones that they own. No more content license costs. And you can edit it any way you need to.
Better quality: The tool would free up learning designers to build better courses rather than running just to get more courses finished.
More flexibility with vendors: Many departments hire custom course design shops. A 20% gain in efficiency would give them more flexibility in deciding when and how to invest their budgets in this kind of consulting.
The learning design bottleneck is a major business problem for many organizations. Relatively modest productivity gains would make a substantial difference for them. Generative AI seems like a good tool for addressing this problem. How hard and expensive would it be to build a tool that, on average, delivers a 20% gain in productivity?
Not very hard, not very expensive
Every LMS vendor, courseware platform provider, curricular materials vendor, and OPM provider is currently working on tools like this. I have talked to a handful of them. They all tell me it’s not hard—depending on your goals. Vendors have two critical constraints. First, the market is highly suspicious of black-box vendor AI and very sensitive to AI products that make mistakes. EdTech companies can’t approach the work as an experiment. Second, they must design their AI features to fit their existing business goals. Every feature competes with other priorities that their clients are asking for.
The project I am launching—AI Learning Design Assistant (ALDA)—is different. First, it’s design/build. The participants will drive the requirements for the software. Second, as I will spell out below, our software development techniques will be relatively simple and easy to understand. In fact, the value of ALDA is as much in learning patterns to build reliable, practical, AI-driven tools as it is in the product itself. And third, the project is safe.
ALDA is intended to produce a first draft for learning designers. No students need to see content that has not been reviewed by a human expert or interact directly with the AI at all. The process by which ALDA produces its draft will be transparent and easy to understand. The output will be editable and importable into the organization’s learning platform of choice.
Here’s how we’ll do it:
Guided prompt engineering: Your learning designers probably already have interview questions for the basic information they need to design a lesson, module, or course. What are the learning goals? How will you know if students have achieved those goals? What are some common sticking points or misconceptions? Who are your students? You may ask more or less specific and more or less elaborate versions of these questions, but you are getting at the same ideas. ALDA will start by interviewing the user, who is the learning designer or subject-matter expert. The structure of the questions will be roughly the same. While we will build out one set of interview questions for the workshop series, changing the design interview protocol should be relatively straightforward for programmers who are not AI specialists.
Long-term memory: One of the challenges with using a tool like ChatGPT on its own is that it can’t remember what you talked about from one conversation to the next and it might or might not remember specific facts that it was trained on (or remember them correctly). We will be adding a long-term memory function. It can remember earlier answers in earlier design sessions. It can look up specific documents you give it to make sure it gets facts right. This is an increasingly common infrastructure component in AI projects. We will explore different uses of it when we build ALDA. You’ll leave the workshop with the knowledge and example code of how to use the technique yourself.
Prompt enrichment: Generative AI often works much better when it has a few really good, rich examples to work from. We will provide ALDA with some high-quality lessons that have been rigorously tested for learning effectiveness over many years. This should increase the quality of ALDA’s first drafts. Again, you may want your learning designs to be different. Since you will have the ALDA source code, you’ll be able to put in whatever examples you want. (A minimal sketch of this technique follows the list.)
Generative AI export: We may or may not get to building this feature depending on the group’s priorities in the time we have, but the same prompt enrichment technique we’ll use to get better learning output can also be used to translate the content into a format that your learning platform of choice can import directly. Our enrichment examples will be marked up in software code. A programmer without any specific AI knowledge can write a handful of examples translating that code format into the one that your platform needs. You can change it, adjust it, and enrich it if you change platforms or if your platform adds new features.
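Here is the prompt-enrichment sketch promised above. The exemplar, its tags, and the helper function are all invented for illustration; the technique is simply to place a high-quality, marked-up example in the prompt so the model imitates its structure:

```python
# Sketch of "prompt enrichment" (few-shot prompting): a high-quality,
# marked-up exemplar is placed in the prompt so the model imitates its
# structure. The exemplar and its tags are invented for illustration.
EXEMPLAR = """<lesson>
  <objective>Explain the difference between price and value.</objective>
  <assessment>
    <question>Which statement best describes "value"?</question>
    <answer correct="true">What a buyer believes the product is worth.</answer>
    <answer correct="false" misconception="price-equals-value">
      The number printed on the price tag.
    </answer>
    <feedback for="price-equals-value">
      Think about why two buyers might pay the same price but feel differently
      about the purchase afterward.
    </feedback>
  </assessment>
</lesson>"""

def build_enriched_prompt(design_interview_notes: str) -> str:
    """Combine the marked-up exemplar with the designer's interview notes."""
    return (
        "You are drafting lesson content. Match the structure, tagging, and "
        "quality of the exemplar below.\n\n"
        f"EXEMPLAR:\n{EXEMPLAR}\n\n"
        f"DESIGN NOTES FROM THE INTERVIEW:\n{design_interview_notes}\n\n"
        "Now draft the new lesson in the same tagged format."
    )

print(build_enriched_prompt("Intro statistics; learners confuse mean and median."))
```

Because the output comes back in a tagged format, the same few-shot trick is what would make the export step tractable: a handful of examples translating those tags into a target platform's import format.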
The consistent response from everyone in EdTech I’ve talked to who is doing this kind of work is that we can achieve ALDA’s performance goals with these techniques. If we were trying to get 80% or 90% accuracy, that would be different. But a 20% efficiency gain with an expert human reviewing the output? That should be very much within reach. The main constraints on the ALDA project are time and money. Those are deliberate. Constraints drive focus.
Let’s build something useful. Now.
The collaboration
Teams that want to participate in the workshop will have to apply. I’m recruiting teams that have immediate needs to build content and are willing to contribute their expertise to making ALDA better. There will be no messing around. Participants will be there to build something. For that reason, I’m quite flexible about who is on your team or how many participate. One person is too few, and eight is probably too many. My main criterion is that the people you bring are important to the ALDA-related project you will be working on.
This is critical because we will be designing ALDA together based on the experience and feedback from you and the other participants. In advance of the first workshop, my colleagues and I will review any learning design protocol documentation you care to share and conduct light interviews. Based on that information, you will have access to the first working iteration of ALDA at the first workshop. For this reason, the workshop series will start in the spring. While ALDA isn’t going to require a flux capacitor to work, it will take some know-how and effort to set up.
The workshop cohort will meet virtually once a month after that. Teams will be expected to have used ALDA and come up with feedback and suggestions. I will maintain a rubric for teams to use based on the goals and priorities for the tool as we develop them together. I will take your input to decide which features will be developed in the next iteration. I want each team to finish the workshop series with the conviction that ALDA can achieve those performance gains for some important subset of their course design needs.
Anyone who has been to one of my Empirical Educator Project (EEP) or Blursday Social events knows that I believe that networking and collaboration are undervalued at most events. At each ALDA workshop, you will have time and opportunities to meet with and work with each other. I’d love to have large universities, small colleges, corporate L&D departments, non-profits, and even groups of students participating. I may accept EdTech vendors if and only if they have more to contribute to the group effort than just money. Ideally, the ALDA project will lead to new collaborations, partnerships, and even friendships.
Teaching AI about teaching and learning
The workshop also helps us learn together about how to teach AI about teaching and learning. AI research is showing us how much better the technology can be when it’s trained on good data. There is so much bad pedagogy on the internet. And the content that is good is not marked up in a way that makes its patterns easy for an AI to learn. What does a good learning objective or competency look like? How do you write hints or assessment feedback that helps students learn but doesn’t give away the answers? How do you create alignment among the components of a learning design?
The examples we will be using to teach the AI have not only been fine-tuned for effectiveness using machine learning over many years; they are also semantically coded to capture some of these nuances. These are details that even many course designers haven’t mastered.
I see a lot of folks rushing to build “robot tutors in the sky 2.0” without a lot of care to make sure the machines see what we see as educators. They put a lot of faith in data science but aren’t capturing the right data because they’re ignoring decades of learning science. The ALDA project will teach us how to teach the machines about pedagogy. We will learn to identify the data structures that will empower the next generation of AI-powered learning apps. And we will do that by becoming better teachers of ALDA using the tools of good teaching: clear goals, good instructions, good examples, and good assessments. Much of it will be in plain English, and the rest will be in a simple software markup language that any computer science undergraduate will know.
Wanna play?
The cost for the workshop series, including all source code and artifacts, is $25,000 for your team. You can find an application form and prospectus here. Applications will be open until the workshop is filled. I already have a few participating teams lined up and a handful more that I am talking to.
You can also find a downloadable two-page prospectus and an online participation application form here. To contact me for more information, please fill out this form:
[Update: I’m hearing from a couple of you that your messages to me through the form above are getting caught in the spam filter. Feel free to email me at [email protected] if the form isn’t getting through.]
This is a post for folks who want to learn how recent AI developments may affect them as people interested in EdTech who are not necessarily technologists. The tagline of e-Literate is “Present is Prologue.” I try to extrapolate from today’s developments only as far as the evidence takes me with confidence.
Generative AI is the kind of topic that’s a good fit for e-Literate because the conversations about it are fragmented. The academic and technical literature is boiling over with developments on practically a daily basis but is hard for non-technical folks to sift through and follow. The grand syntheses about the future of…well…everything are often written by incredibly smart people who have to make a lot of guesses at a moment of great uncertainty. The business press has important data wrapped in a lot of WHEEEE!
Generative AI will definitely look exactly like this!
Let’s see if we can run this maze, shall we?
Is bigger better?
OpenAI and ChatGPT set many assumptions and expectations about generative AI, starting with the idea that these models must be huge and expensive. Which, in turn, means that only a few tech giants can afford to play.
Right now there are five widely known giants. (Well, six, really, but we’ll get to the surprise contender in a bit.) OpenAI’s ChatGPT and Anthropic’s Claude are pure plays created by start-ups. OpenAI started the whole generative AI craze by showing the world how much anyone who can write English can accomplish with ChatGPT. Anthropic has made a bet on “ethical AI” with more protections from harmful output and a few differentiating features that are important for certain applications but that I’m not going to go into here.
Then there are the big three SaaS hosting giants. Microsoft has been tied very tightly to OpenAI, of which it owns a 49% stake. Google, which has been a pioneering leader in AI technologies but has been a mess with its platforms and products (as usual), has until recently focused on promoting several of its own models. Amazon has been late out of the gate; it has its own Titan generative AI model that almost nobody has seen yet. But Amazon seems to be settling on a strategy that emphasizes hosting an ecosystem of models, including Anthropic’s and others.
About that ecosystem thing. A while back, an internal Google memo titled “We Have No Moat, and OpenAI Doesn’t Either” leaked. It made the argument that so much innovation was happening so quickly in open-source generative AI that the war chests and proprietary technologies of the big companies wouldn’t give them an advantage over a large, fast-moving open-source community.
I could easily write a whole long post about the nature of that innovation. For now, I’ll focus on a few key points that should be accessible to everyone. First, it turns out that the big companies with oodles of money and computing power—surprise!—decided to rely on strategies that required oodles of money and computing power. They didn’t spend a lot of time thinking about how to make their models smaller and more efficient. Open-source teams with far more limited budgets quickly demonstrated that they could make huge gains in algorithmic efficiency. The barrier to entry for building a better LLM—money—is dropping fast.
Complementing this first strategy, some open-source teams worked particularly hard to improve data quality, which requires more hard human work and less brute computing force. It turns out that the old adage holds: garbage in, garbage out. Even smaller systems trained on more carefully curated data are less likely to hallucinate and more likely to give high-quality answers.
And third, it turns out that we don’t need giant all-purpose models all the time. Writing software code is a good example of a specialized generative AI task that can be accomplished well with a much smaller, cheaper model using the techniques described above.
The internal Google memo concluded by arguing that “OpenAI doesn’t matter” and that cooperating with the open-source community is vital.
That missive was leaked in May. Guess what’s happened since then?
The swarm
Meta had already announced in February that it was releasing an open-source-ish model called Llama. It was only open-source-ish because its license limited it to research use. That restriction was quickly hacked around and abused. The academic teams and smaller startups, which were already innovating like crazy, took advantage of the oodles of money and computing power that Meta was able to put into Llama. Unlike the other giants, Meta doesn’t make money by hosting software. They make money from content. Commoditizing generative AI will lead to much more content being generated. Perhaps seeing an opportunity, when Meta released Llama 2 in July, the only unusual restrictions they placed on the open-source license were to prevent big hosting companies like Amazon, Microsoft, and Google from making money off Llama without paying Meta. Anyone smaller than that can use the Llama models for a variety of purposes, including commercial applications. Importantly, Llama 2 is available in a variety of sizes, including one small enough to run on a newer personal computer.
To be clear, OpenAI, Microsoft, Google, Anthropic, and Amazon are all continuing to develop their proprietary models. That isn’t going away. But at the same time…
Microsoft, despite its expensive continuing love affair with OpenAI, announced support for Llama 2 and has a license (but no announced products that I can find yet) for Databricks’ open-source Dolly 2.0.
Amazon now supports a growing range of LLMs, including open-source models from Stability AI and Llama 2.
IBM—’member them?—is back in the AI game, trying to rehabilitate its image after the much-hyped and mostly underwhelming Watson products. The company is trotting out watsonx (with the very now, very wow lower-case “w” at the beginning of the name and “x” at the end) integrated with HuggingFace, which you can think of as being a little bit like the GitHub for open-source generative AI.
It seems that the Google memo about no moats, which was largely shrugged off publicly way back in May, was taken seriously privately by the major players. All the big companies have been hedging their bets and increasingly investing in making the use of any given LLM easier rather than betting that they can build the One LLM to Rule Them All.
Meanwhile, new specialized and generalized LLMs pop up weekly. For personal use, I bounce between ChatGPT, BingChat, Bard, and Claude, each for different types of tasks (and sometimes a couple at once to compare results). I use DALL-E and Stable Diffusion for image generation. (Midjourney seems great but trying to use it through Discord makes my eyes bleed.) I’ll try the largest Llama 2 model and others when I have easy access to them (which I predict will be soon). I want to put a smaller coding LLM on my laptop, not to have it write programs for me but to have it teach me how to read them.
The most obvious possible end result of this rapid, sprawling growth of supported models is that, far from being the singular Big Tech miracle that OpenAI sold us on with ChatGPT’s sudden and bold entrance onto the world stage, generative AI is going to become just one more part of the IT stack, albeit a very important one. There will be competition. There will be specialization. The big cloud hosting companies may end up distinguishing themselves not so much by being the first to build Skynet as by their ability to make it easier for technologists to integrate this new and strange toolkit into their development and operations. Meanwhile, a parallel world of alternatives for startups and small or specialized uses will spring up.
We have not reached the singularity yet
Meanwhile, that welter of weekly announcements about AI advancements I mentioned before has not included massive breakthroughs in super-intelligent machines. Instead, many of them have been about supporting more models and making them easier to use for real-world development. For example, OpenAI is making a big deal out of how much better ChatGPT Enterprise is at keeping the things you tell it private.
Oh. That would be nice.
I don’t mean to mock the OpenAI folks. This is new tech. Years of effort will need to be invested into making this technology easy and reliable for the uses it’s being put to now. As an enterprise application, ChatGPT has largely been a very impressive demo, while ChatGPT Enterprise is exactly what it sounds like: an effort to make ChatGPT usable in the enterprise.
The folks I talk to who are undertaking ambitious generative AI projects, including ones whose technical expertise I trust a great deal, are telling me they are struggling. The tech is unpredictable. That’s not surprising; generative AI is probabilistic. The same function that enables it to produce novel content also enables it to make up facts. Try QA testing an application built on technology like that, and try avoiding regressions—i.e., bugs you thought you had fixed but that come back in the next version. Meanwhile, the toolchain around developing, testing, and maintaining generative AI-based software is still very immature.
These problems will be solved. But if the past six months have taught us anything, it’s that our ability to predict the twists and turns ahead is very limited at the moment. Last September, I wrote a piece called “The Miracle, the Grind, and the Wall.” It’s easy to produce miraculous-seeming one-off results with generative AI but often very hard to achieve them reliably at scale. And sometimes we hit walls that prevent us from reaching goals for reasons that we don’t see coming. For example, what happens when you run a data set with some very subtle problems in it through a probabilistic model with half a trillion computing units, each potentially doing something with the data that is affected by those problems and passing the modified, problematic data on to other parts of the system? How do you trace and fix those “bugs” (if you can even call them that)?
It’s fun to think about where all of this AI stuff could go. And it’s important to try. But personally, I find the here-and-now to be fun and useful to think about. I can make some reasonable guesses about what might happen in the next 12 months. I can see major changes and improvements AI can contribute to education today that minimize the risk of the grind and the wall. And I can see how to build a curriculum of real-world projects that teaches me and others about the evolving landscape even as we make useful improvements today.
What I’m watching for
Given all that, what am I paying attention to?
Continued frantic scrambling among the big tech players: If you’re not able to read and make sense of the weekly announcements, papers, and new open-source projects, pay attention to Microsoft, Amazon, Google, IBM, OpenAI, Anthropic, and HuggingFace. The four traditional giants in particular seem to be thrashing a bit. They’re all tracking the developments that you and I can’t and are trying to keep up. I’m watching these companies with a critical eye. They’re not leading (yet). They’re running for their lives. They’re in a race. But they don’t know what kind of race it is or which direction to go to reach the finish line. Since these are obviously extremely smart people trying very hard to compete, the cracks and changes in their strategies tell us as much as the strategies themselves.
Practical, short-term implementations in EdTech: I’m not tracking grand AI EdTech moonshot announcements closely. It’s not that they’re unimportant. It’s that I can’t tell from a distance whose work is interesting and don’t have time to chase every project down. Some of them will pan out. Most won’t. And a lot of them are way too far out over their skis. I’ll wait to see who actually gets traction. And by “traction,” I don’t mean grant money or press. I mean real-world accomplishments and adoptions.
On the other hand, people who are deploying AI projects now are learning. I don’t worry too much about what they’re building, since a lot of what they do will be either wrong, uninteresting, or both. Clay Shirky once said the purpose of the first version of software isn’t to find out if you got it right; it’s to learn what you got wrong. (I’m paraphrasing since I can’t find the original quote.) I want to see what people are learning. The short-term projects that are interesting to me are the experiments that can teach us something useful.
The tech being used along with LLMs: ChatGPT did us a disservice by convincing us that it could soon become an all-knowing, hyper-intelligent being. It’s hard to become the all-powerful AI if you can’t reliably perform arithmetic, are prone to hallucinations, can’t remember anything from one conversation to the next, and start to space out if a conversation runs too long. We are being given the impression that the models will eventually get good enough that all these problems will go away. Maybe. For the foreseeable future, we’re better off thinking about them as interfaces with other kinds of software that are better at math, remembering, and so on. “AI” isn’t a monolith. One of the reasons I want to watch short-term projects is that I want to see what other pieces are needed to realize particular goals. For example, start listening for the term “vector database.” The larger tech ecosystem will help define the possibility space.
Intellectual property questions: What happens if The New York Times successfully sues OpenAI for copyright infringement? It’s not like OpenAI can just go into ChatGPT and delete all of those articles. If intellectual property law forces changes to AI training, then the existing models will have big problems (though some have been more careful than others). A chorus of AI cheerleaders tell us, “No, that won’t happen. It’s covered by fair use.” That’s plausible. But are we sure? Are we sure it’s covered in Europe as well as the US? How much should one bet on it? Many subtle legal questions will need to be sorted over the coming several years. The outcomes of various cases will also shape the landscape.
Microchip shortages: This is a weird thing for me to find myself thinking about, but these large generative AI applications—especially training them—run on giant, expensive GPUs. One company, Nvidia, has far and away the best processors for this work. So much so that there is a major race on to acquire as many Nvidia processors as possible due to limited supply and unlimited demand. And unlike software, a challenger company can’t shock the world overnight with a new microprocessor. Designing and fabricating new chips at scale takes years. More than two. Nvidia will be the leader for a long time. Therefore, the ability for AI to grow will be, in some respects, constrained by the company’s production capacity. Don’t believe me? Check out their five-year stock price and note the point when generative AI hype really took off.
AI on my laptop: On the other end of the scale, remember that open-source work has been shrinking the size of effective LLMs. For example, Apple has already optimized a version of Stable Diffusion for its operating system and released an open-source one-click installer for easier consumer use. The next step one can imagine is for Apple to optimize its computer chips—either the soon-to-be-released M3 or the M4 after it. (As I said, computer chips take time.) From there, it is easy to picture image generation, software code generation, and a chatbot that understands and can talk about the documents you have on your hard drive. All running locally and privately. In the meantime, I’ll be running a few experiments with AI on my laptop. I’ll let you know how it goes.
Present is prologue
Particularly at this moment of great uncertainty and rapid change, it pays to keep your eyes on where you’re walking. A lot of institutions I talk to either are engaged in 57 different AI projects, some of which are incredibly ambitious, or are looking longingly for one thing they can try. I’ll have an announcement on the latter possibility very shortly (which will still work for folks in the former situation). Think about these early efforts as CBE for the future of work. The thing about the future is that there’s always more of it. Whatever the future of work is today will be the present of work tomorrow. But there will still be a future of work tomorrow. So we need to build a continuous curriculum of project-based learning with our AI efforts. And we need to watch what’s happening now.
Every day is a surprise. Isn’t that refreshing after decades in EdTech?
A friend recently asked me for advice on a problem he was wrestling with related to an issue he was having with a 1EdTech interoperability standard. It was the same old problem of a standard not quite getting true interoperability because people implement it differently. I suggested he try using a generative AI tool to fix his problem. (I’ll explain how shortly.)
I don’t know if my idea will work yet—he promised to let me know once he tries it—but the idea got me thinking. Generative AI probably will change EdTech integration, interoperability, and the impact that interoperability standards can have on learning design. These changes, in turn, impact the roles of developers, standards bodies, and learning designers.
In this post, I’ll provide a series of increasingly ambitious use cases related to the EdTech interoperability work of 1EdTech (formerly known as IMS Global). In each case, I’ll explore how generative AI could impact similar work going forward, how it changes the purpose of interoperability standards-making, and how it impacts the jobs and skills of various people whose work is touched by the standards in one way or another.
Generative AI as duct tape: fixing QTI
1EdTech’s Question and Test Interoperability (QTI) standard is one of its oldest standards that’s still widely used. The earliest version on the 1EdTech website dates back to 2002, while the most recent version was released in 2022. You can guess from the name what it’s supposed to do. If you have a test, or a test question bank, in one LMS, QTI is supposed to let you migrate it into another without copying and pasting. It’s an import/export standard.
It never worked well. Everybody has their own interpretation of the standard, which means that importing somebody else’s QTI export is never seamless. When speaking recently about QTI to a friend at an LMS company, I commented that it only works about 80% of the time. My friend replied, “I think you’re being generous. It probably only works about 40% of the time.” 1EdTech has learned many lessons about achieving consistent interoperability in the decades since QTI was created. But it’s hard to fix a complex legacy standard like this one.
Meanwhile, the friend I mentioned at the top of the post asked me recently about practical advice for dealing with this state of affairs. His organization imports a lot of QTI question banks from multiple sources. So his team spends a lot of time debugging those imports. Is there an easier way?
I thought about it.
“Your developers probably have many examples that they’ve fixed by hand by now. They know the patterns. Take a handful of before-and-after examples. Embed them into a prompt in a generative AI that’s good at software code, like HuggingChat. [As I was drafting this post, OpenAI announced that ChatGPT now has a code interpreter.] Then give the generative AI a novel input and see if it produces the correct output.”
Generative AI is good at pattern matching. The differences in QTI implementations are likely to have patterns to them that an LLM can detect, even if those differences shift over time (because, for example, one vendor’s QTI implementation has changed across versions).
In fact, pattern matching on this scale could work very well with a smaller generative AI model. We’re used to talking about ChatGPT, Google Bard, and other big-name systems that have between half a trillion and a trillion transformers. Think of transformers as computing legos. One major reason that ChatGPT is so impressive is that it uses a lot of computing legos. Which makes it expensive, slow, and computationally intensive. But if your goal is to match patterns against a relatively well-structured set of texts such as QTI files, you could probably train a much smaller model than ChatGPT to reliably translate between implementations for you. The smallest models, like the Vicuña LLM, are only 7 billion transformers. That may sound like a lot, but it’s small enough to run on a personal computer (or possibly even a mobile phone). Think about it this way: the QTI task we’re trying to solve for is roughly equivalent in complexity to the spell-checking and one-word type-ahead functions that you have on your phone today. A generative AI model for fixing QTI imports could probably be trained for a few hundred dollars and run for pennies.
This use case has some other desirable characteristics. First, it doesn’t have to work at high volume in real time. It can be a batch process. Throw the dirty dishes in the dishwasher, turn it on, and take out the clean dishes when the machine shuts off. Second, the task has no significant security risks and wouldn’t expose any personally identifiable information. Third, nothing terrible happens if the thing gets a conversion wrong every now and then. Maybe the organization would have to fix 5% of the conversions rather than 100%. And overall, it should be relatively cheap. Maybe not as cheap as running an old-fashioned deterministic program that’s optimized for efficiency. But maybe cheap enough to be worth it. Particularly if the organization has to keep adding new and different QTI implementation imports. It might be easier and faster to adjust the model with fine-tuning or prompting than it would be to revise a set of if/then statements in a traditional program.
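Here is a minimal sketch, in Python, of what the before-and-after prompting approach might look like. The QTI snippets and the call_llm helper are invented placeholders rather than real vendor output or a real API; the point is simply that the examples your developers have already fixed by hand become few-shot examples in a prompt, and the conversion can run as a batch job with a human spot-checking the results.

```python
# Illustrative sketch of few-shot "translation" between QTI dialects.
# The XML snippets and call_llm() are invented placeholders, not real vendor output.

HAND_FIXED_EXAMPLES = [
    {
        "before": '<assessmentItem identifier="q1"><prompt>2 + 2 = ?</prompt></assessmentItem>',
        "after": '<assessmentItem identifier="q1" adaptive="false" timeDependent="false">'
                 '<itemBody><p>2 + 2 = ?</p></itemBody></assessmentItem>',
    },
    # ... more before/after pairs that your developers have already fixed by hand ...
]


def build_qti_fix_prompt(broken_item: str) -> str:
    """Embed known-good before/after pairs, then ask for the same fix on new input."""
    parts = ["Rewrite the 'before' QTI so it imports cleanly, following these examples:"]
    for ex in HAND_FIXED_EXAMPLES:
        parts.append(f"BEFORE:\n{ex['before']}\nAFTER:\n{ex['after']}")
    parts.append(f"BEFORE:\n{broken_item}\nAFTER:")
    return "\n\n".join(parts)


def call_llm(prompt: str) -> str:
    """Placeholder for whichever code-capable model the team chooses."""
    raise NotImplementedError


def fix_batch(broken_items: list[str]) -> list[str]:
    """Batch conversion: run it overnight, then have a human spot-check the output."""
    return [call_llm(build_qti_fix_prompt(item)) for item in broken_items]
```

If the model gets a conversion wrong now and then, the failure mode is the one the team already lives with: a human fixes that item by hand.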
How would the need for skilled programmers change? Somebody would still need to understand how the QTI mappings work well enough to keep the generative AI humming along. And somebody would have to know how to take care of the AI itself (although that process is getting easier every day, especially for this kind of use case). The repetitive work programmers are doing now would be replaced by the software over time, freeing up human brains for the things human brains are particularly good at. In other words, you can’t get rid of your programmer, but you can have that person engaging in more challenging, high-value work than import bug whack-a-mole.
How does it change the standards-making process? In the short term, I’d argue that 1EdTech should absolutely try to build an open-source generative AI of the type I’m describing rather than trying to fix QTI, a task it has not succeeded at in over 20 years. This strikes me as a far shorter path to achieving the original purpose for which QTI was intended, which is to move question banks from one system to another.
This conclusion, in turn, leads to a larger question: Do we need interoperability standards bodies in the age of AI?
My answer is a resounding “yes.”
Going a step further: software integration
QTI provides data portability but not integration. It’s an import/export format. The fact that Google Docs can open up a document exported from Microsoft Word doesn’t mean that the two programs are integrated in any meaningful way.
So let’s consider Learning Tools Interoperability (LTI). LTI was quietly revolutionary. Before it existed, any company building a specialized educational tool had to write a separate integration for every LMS.
The nature of education is that it’s filled with what folks in the software industry would disparagingly call “point solutions.” If you’re teaching students how to program in Python, you need a Python programming environment simulator. But that tool won’t help a chemistry professor who really needs virtual labs and molecular modeling tools. And none of these tools are helpful for somebody teaching English composition. There simply isn’t a single generic learning environment that will work well for teaching all subjects. And none of these tools will ever sell enough to make anybody rich.
Therefore, the companies that make these necessary niche teaching tools will tend to be small. In the early days of the LMS, they couldn’t afford to write a separate integration for every LMS. Which meant that not many specialized learning tools were created. As small as these companies’ target markets already were, many of them couldn’t afford to limit themselves to the subset of, say, chemistry professors whose universities happened to use Blackboard. It didn’t make economic sense.
LTI changed all that. Any learning tool provider could write an integration once and have their product work with every LMS. Today, 1EdTech lists 240 products that are officially certified as supporting the LTI standard. Many more support the standard but are not certified.
Would LTI have been created in a world in which generative AI existed? Maybe not. The most straightforward analogy is Zapier, which connects different software systems via their APIs. ChatGPT and its ilk could act as an instant Zapier. A programmer could feed the API documentation of both systems to a generative AI, ask it to write an integration for a particular purpose, and then ask the same AI for help with any debugging.
Again, notice that one still needs a programmer. Somebody needs to be able to read the APIs, understand the goals, think about the trade-offs, give the AI clear instructions, and check the finished program. The engineering skills are still necessary. But the work of actually writing the code is greatly reduced. Maybe by enough that generative AI would have made LTI unnecessary.
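As a thought experiment, here is a hedged sketch of what that “instant Zapier” workflow might look like in Python. The endpoints, the goal statement, and the call_llm helper are all made up for illustration; nothing here refers to a real LMS or tool API, and a human engineer still reviews and tests whatever the model drafts.

```python
# Hypothetical sketch: asking a code-capable model to draft integration glue
# from two sets of API docs. A human engineer reviews, tests, and owns the result.

def build_integration_prompt(lms_api_docs: str, tool_api_docs: str, goal: str) -> str:
    """Assemble a prompt from the two systems' documentation and the integration goal."""
    return "\n\n".join([
        "You are helping write integration code between two systems.",
        "System A API documentation:\n" + lms_api_docs,
        "System B API documentation:\n" + tool_api_docs,
        "Goal: " + goal,
        "Write a small, well-commented Python module that accomplishes the goal. "
        "List any assumptions, security concerns, and error cases a reviewer should check.",
    ])


def call_llm(prompt: str) -> str:
    """Placeholder for the model of your choice."""
    raise NotImplementedError


if __name__ == "__main__":
    prompt = build_integration_prompt(
        lms_api_docs="POST /gradebook/{course_id}/scores ...",      # made-up endpoint
        tool_api_docs="GET /results?student=... returns JSON ...",  # made-up endpoint
        goal="Push each student's quiz score from the tool into the LMS gradebook.",
    )
    print(prompt)  # The generated code would then be reviewed and tested by an engineer.
```

Note that the prompt asks the model to surface its assumptions and error cases; the review burden stays with the human.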
But probably not. LTI connections pass sensitive student identity and grade information back and forth. They have to be secure and reliable. The IT department has legal obligations, not to mention user expectations, that a well-tested standard helps alleviate (though not eliminate). On top of that, it’s just a bad idea to have bits of glue code spread here, there, and everywhere, regardless of whether a human or a machine writes it. Somebody—an architect—needs to look at the big picture. They need to think about maintainability, performance, security, data management, and a host of other concerns. There is value in having a single integration standard that has been widely vetted and follows a pattern of practices that IT managers can handle the same way across a wide range of product integrations.
At some point, if a software integration fails to pass student grades to the registrar or leaks personal data, a human is responsible. We’re not close to the point where we can turn over ethical or even intellectual responsibility for those challenges to a machine. If we’re not careful, generative AI will simply write spaghetti code much faster than we did in the old days.
The social element of knowledge work
More broadly, there are two major value components to the technical interoperability standards process. The first is obvious: technical interoperability. It’s the software. The second is where the deeper value lies. It’s in the conversation that leads to the software. I’ve participated in a 1EdTech specification working group. When the process went well, we learned from each other. Each person at that table brought a different set of experiences to an unsolved problem. In my case, the specification we were working on sent grade rosters from the SIS to the LMS and final grades back from the LMS to the SIS. It sounds simple. It isn’t. We each brought different experiences and lessons learned regarding many aspects of the problem, from how names are represented in different cultures to how SIS and LMS users think differently in ways that impact interoperability. In the short term, a standard is always a compromise. Each creator of a software system has to make adjustments that accommodate the many ways in which others thought differently when they built their own systems. But if the process works right, everybody goes home thinking a little differently about how their systems could be built better for everybody’s benefit. In the longer term, the systems we continue to build over time reflect the lessons we learn from each other.
Generative AI could make software integration easier. But without the conversation of the standards-making process, we would lose the opportunity to learn from each other. And if AI can reduce the time and cost of the former, then maybe participants in the standards-making effort will spend more time and energy on the latter. The process would have to be rejiggered somewhat. But at least in some cases, participants wouldn’t have to wait until the standard was finalized before they started working on implementing it. When the cost of implementation is low enough and the speed is fast enough, the process can become more of an iterative hackathon. Participants can build working prototypes more quickly. They would still have to go back to their respective organizations and do the hard work of thinking through the implications, finding problems or trade-offs and, eventually, hardening the code. But at least in some cases, parts of the standards-making process could be more fluid and rapidly iterative than they have been. We could learn from each other faster.
This same principle could apply inside any organization or partnership in which different groups are building different software components that need to work together. Actual knowledge of the code will still be important, both to check and improve the work of the AI in some cases and to write code outright in others. Generative AI is not ready to replace high-quality engineers yet. But even as it improves, humans will still be needed.
Anthropologist John Seely Brown famously traced a drop in Xerox copier repair quality to a change in the lunch schedule of the company’s repair technicians. It turns out that technicians learn a lot from solving real problems in the field and then sharing war stories with each other. When the company changed the schedule so that technicians had less time together, repair effectiveness dropped noticeably. I don’t know if a software program was used to optimize the scheduling, but one could easily imagine that being the case. Algorithms are good at concrete problems like optimizing complex schedules. On the other hand, they have no visibility into what happens at lunch or around the coffee pot. Nobody writes those stories down. They can’t be ingested and processed by a large language model. Nor can they be put together in novel ways by quirky human minds to come up with new insights.
That’s true in the craft of copier repair and definitely true in the craft of software engineering. I can tell you from direct experience that interoperability standards-making is much the same. We couldn’t solve the seemingly simple problem of getting the SIS to talk to the LMS until we realized that registrars and academics think differently about what a “class” or a “course” is. We figured that out by talking with each other and with our customers.
At its heart, standards-making is a social process. It’s a group of people who have been working separately on solving similar problems coming together to develop a common solution. They do this because they’ve decided that the cost/benefit ratio of working together is better than the ratio they’ve achieved when working separately. AI lowers the costs of some work. But it doesn’t yet provide an alternative to that social interaction. If anything, it potentially lowers some of the costs of collaboration by making experimentation and iteration cheaper—if and only if the standards-making participants embrace and deliberately experiment with that change.
That’s especially true the more 1EdTech tries to have a direct role in what it refers to as “learning impact.”
The knowledge that’s not reflected in our words
In 2019, I was invited to give a talk at a 1EdTech summit, which I published a version of under the title “Pedagogical Intent and Designing for Inquiry.” Generative AI was nowhere on the scene at the time. But machine learning was. At the same time, long-running disappointment and disillusionment with learning analytics—analytics that actually measure students’ progress as they are learning—was palpable.
I opened my talk by speculating about how machine learning could have helped with SIS/LMS integration, much as I speculated earlier in the post about how generative AI might help with QTI:
Now, today, we would have a different possible way of solving that particular interoperability problem than the one we came up with over a decade ago. We could take a large data set of roster information exported from the SIS, both before and after the IT professionals massaged it for import into the LMS, and aim a machine learning algorithm at it. We then could use that algorithm as a translator. Could we solve such an interoperability problem this way? I think that we probably could. I would have been a weaker product manager had we done it that way, because I wouldn’t have gone through the learning experience that resulted from the conversations we had to develop the specification. As a general principle, I think we need to be wary of machine learning applications in which the machines are the only ones doing the learning. That said, we could have probably solved such a problem this way and might have been able to do it in a lot less time than it took for the humans to work it out.
I will argue that today’s EdTech interoperability challenges are different. That if we want to design interoperability for the purposes of insight into the teaching and learning process, then we cannot simply use clever algorithms to magically draw insights from the data, like a dehumidifier extracting water from thin air. Because the water isn’t there to be extracted. The insights we seek will not be anywhere in the data unless we make a conscious effort to put them there through design of our applications. In order to get real teaching and learning insights, we need to understand the intent of the students. And in order to understand that, we need insight into the learning design. We need to understand pedagogical intent.
That new need, in turn, will require new approaches in interoperability standards-making. As hard as the challenges of the last decade have been, the challenges of the next one are much harder. They will require different people at the table having different conversations.
The core problem is that the key element for interpreting both student progress and the effectiveness of digital learning experiences—pedagogical intent—is not encoded in most systems. No matter how big your data set is, it doesn’t help you if the data you need aren’t in it. For this reason, I argued, fancy machine learning tricks aren’t going to give us shortcuts.
That problem is the same, and perhaps even worse in some ways, with generative AI. All ChatGPT knows is what it’s read on the internet. And while it’s made progress in specific areas at reading between the lines, the fact is that important knowledge, including knowledge about applied learning design, simply is extremely scarce in the data it can access and even in the data living in our learning systems that it can’t access.
The point of my talk was that interoperability standards could help by supplying critical metadata—context—if only the standards makers set that as their purpose, rather than simply making sure that quiz questions end up in the right place when migrating from one LMS to another.
I chose to open the talk by highlighting the ambiguity of language that enables us to make art. I chose this passage from Shakespeare’s final masterpiece, The Tempest:
O wonder!
How many goodly creatures are there here!
How beauteous mankind is! O brave new world
That has such people in’t!
William Shakespeare, The Tempest
It’s only four lines. And yet it is packed with double entendres and the ambiguity that gives actors room to make art:
Here’s the scene: Miranda, the speaker, is a young woman who has lived her entire life on an island with nobody but her father and a strange creature who she may think of as a brother, a friend, or a pet. One day, a ship becomes grounded on the shore of the island. And out of it comes, literally, a handsome prince, followed by a collection of strange (and presumably virile) sailors. It is this sight that prompts Miranda’s exclamation.
As with much of Shakespeare, there are multiple possible interpretations of her words, at least one of which is off-color. Miranda could be commenting on the hunka hunka manhood walking toward her.
“How beauteous mankind is!”
Or. She could be commenting on how her entire world has just shifted on its axis. Until that moment, she knew of only two other people in all of existence, each of whom she had known her entire life and with each of whom she had a relationship that she understood so well that she took it for granted. Suddenly, there was literally a whole world of possible people and possible relationships that she had never considered before that moment.
“O brave new world / That has such people in’t”
So what is on Miranda’s mind when she speaks these lines? Is it lust? Wonder? Some combination of the two? Something else?
The text alone cannot tell us. The meaning is underdetermined by the data. Only with the metadata supplied by the actor (or the reader) can we arrive at a useful interpretation. That generative ambiguity is one of the aspects of Shakespeare’s work that makes it art.
But Miranda is a fictional character. There is no fact of the matter about what she is thinking. When we are trying to understand the mental state of a real-life human learner, then making up our own answer because the data are not dispositive is not OK. As educators, we have a moral responsibility to understand a real-life Miranda having a real-life learning experience so that we can support her on her journey.
Generative AI like ChatGPT can answer questions about different ways to interpret Miranda’s lines in the play because humans have written about this question and made their answers available on the internet. If you give the chatbot an unpublished piece of poetry and ask it for an interpretation, its answers are not likely to be reliably sophisticated. While larger models are getting better at reading between the lines—a topic for a future blog post—they are not remotely as good as humans are at this yet.
Making the implicit explicit
This limitation of language interpretation is central to the challenge of applying generative AI to learning design. ChatGPT has reignited fantasies about robot tutors in the sky. Unfortunately, we’re not giving the AI the critical information it needs to design effective learning experiences:
The challenge that we face as educators is that learning, which happens completely inside the heads of the learners, is invisible. We can not observe it directly. Accordingly, there are no direct constructs that represent it in the data. This isn’t a data science problem. It’s an education problem. The learning that is or isn’t happening in the students’ heads is invisible even in a face-to-face classroom. And the indirect traces we see of it are often highly ambiguous. Did the student correctly solve the physics problem because she understands the forces involved? Because she memorized a formula and recognized a situation in which it should be applied? Because she guessed right? The instructor can’t know the answer to this question unless she has designed a series of assessments that can disambiguate the student’s internal mental state.
In turn, if we want to find traces of the student’s learning (or lack thereof) in the data, we must understand the instructor’s pedagogical intent that motivates her learning design. What competency is the assessment question that the student answered incorrectly intended to assess? Is the question intended to be a formative assessment? Or summative? If it’s formative, is it a pre-test, where the instructor is trying to discover what the student knows before the lesson begins? Is it a check for understanding? A learn-by-doing exercise? Or maybe something that’s a little more complex to define because it’s embedded in a simulation? The answers to these questions can radically change the meaning we assign to a student’s incorrect answer to the assessment question. We can’t fully and confidently interpret what her answer means in terms of her learning progress without understanding the pedagogical intent of the assessment design.
But it’s very easy to pretend that we understand what the students’ answers mean. I could have chosen any one of many Shakespeare quotes to open this section, but the one I picked happens to be the very one from which Aldous Huxley derived the title of his dystopian novel Brave New World. In that story, intent was flattened through drugs, peer pressure, and conditioning. It was reduced to a small set of possible reactions that were useful in running the machine of society. Miranda’s words appear in the book in a bitterly ironic fashion from the mouth of the character John, a “savage” who has grown up outside of societal conditioning.
We can easily develop “analytics” that tell us whether students consistently answer assessment questions correctly. And we can pretend that “correct answer analytics” are equivalent to “learning analytics.” But they are not. If our educational technology is going to enable a rich and authentic vision of learning rather than a dystopian, reductivist parody of it, then our learning analytics must capture the nuances of pedagogical intent rather than flattening it.
A professor knows that her students tend to develop a common misconception that causes them to make practical mistakes when applying their knowledge. She very carefully crafts her course to address this misconception. She writes the content to address it. In her tests, she provides wrong answer choices—a.k.a. “distractors”—that students would choose if they had the misconception. She can tell, both individually and collectively, whether her students are getting stuck on the misconception by how often they pick the particular distractor that fits with their mistaken understanding. Then she writes feedback that the students see when they choose that particular wrong answer. She crafts it so that it doesn’t give away the correct answer but does encourage students to rethink their mistakes.
Imagine if all this information were encoded in the software. The hierarchy would look something like this (a machine-readable sketch follows the outline):
Here is learning objective (or competency) 1
    Here is content about learning objective 1
    Here is assessment question A about learning objective 1.
        Here is distractor c in assessment question A. Distractor c addresses misconception alpha.
            Here is feedback to distractor c. It is written specifically to help students rethink misconception alpha without giving away the answer to question A. This is critical because if we simply tell the student the answer to question A then we can’t get good data about the likelihood that the student has mastered learning objective 1.
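Here is one way that hierarchy might be encoded in machine-readable form, as a minimal sketch in Python. The field names are invented for illustration and do not come from any existing 1EdTech specification; real standards work would define the vocabulary.

```python
# Illustrative only: field names are invented, not drawn from any existing spec.

lesson_metadata = {
    "learning_objective": {
        "id": "LO-1",
        "statement": "Learning objective (or competency) 1",
        "content": [{"id": "C-1", "about": "LO-1"}],
        "assessments": [
            {
                "id": "Q-A",
                "assesses": "LO-1",
                "purpose": "formative-check-for-understanding",
                "distractors": [
                    {
                        "id": "c",
                        "addresses_misconception": "alpha",
                        "feedback": {
                            "text": "Hint that prompts rethinking misconception alpha",
                            "gives_away_answer": False,  # preserves the value of the data
                        },
                    }
                ],
            }
        ],
    }
}

# With intent encoded like this, analytics can ask questions such as:
# "How many students chose the distractor tied to misconception alpha this week?"
stuck_on_alpha = [
    d
    for a in lesson_metadata["learning_objective"]["assessments"]
    for d in a["distractors"]
    if d["addresses_misconception"] == "alpha"
]

# A generative AI drafting first-pass content could leave the distractor and
# feedback slots as explicit placeholders for human experts to fill in later.
```

Even this toy structure is enough to connect an incorrect answer to the misconception behind it and to the pedagogical intent behind the feedback, which is exactly the metadata that correct-answer analytics lack.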
All of that information is in the learning designer’s head and, somehow, implicitly embedded in the content in subtle details of the writing. But good luck teasing it out by just reading the textbook if you aren’t an experienced teacher of the subject yourself.
What if these relationships were explicit in the digital text? For individual students, we could tell which ones were getting stuck on a specific misconception. For whole courses, we could identify the spots that are causing significant numbers of students to get stuck on a learning objective or competency. And if that particular sticking point causes students to be more likely to fail either that course or a later course that relies on a correct understanding of a concept, then we could help more students persist, pass, stay in school, and graduate.
That’s how learning analytics can work if learning designers (or learning engineers) have tools that explicitly encode pedagogical intent into a machine-readable format. They can use machine learning to help them identify and smooth over tough spots where students tend to get stuck and fall behind. They can find the clues that help them identify hidden sticking points and adjust the learning experience to help students navigate those rough spots. We know this can work because, as I wrote about in 2012, Carnegie Mellon University (among others) has been refining this science and craft for decades.
Generative AI adds an interesting twist. The challenge with all this encoding of pedagogical intent is that it’s labor-intensive. Learning designers often don’t have time to focus on the work required to identify and improve small but high-value changes because they’re too busy getting the basics done. But generative AI that creates learning experiences modeled after the pedagogical metadata in the educational content it is trained on could provide a leg up. It could substantially speed up the work of writing the first-draft content so that designers can focus on the high-value improvements that humans are still better at than machines.
Realistically, for example, generative AI is not likely to know the particular common misconceptions that block students from mastering a competency. Or how to probe for and remediate those misconceptions. But if it were trained on the right models, it could generate good first-draft content in a standards-based metadata format that could be imported into a learning platform. The format would have explicit placeholders for those critical probes and hints. Human experts, supported by machine learning, could focus their time on finding and remediating these sticking points in the learning process. Their improvements would be encoded with metadata, providing the AI with better examples of what effective educational content looks like. Which would enable the AI to generate better first-draft content.
1EdTech could help bring about such a world through standards-making. But they’d have to think about the purpose of interoperability differently, bring different people to the table, and run a different kind of process.
O brave new world that has such skilled people in’t
I spoke recently to the head of product development for an AI-related infrastructure company. His product could enable me to eliminate hallucinations while maintaining references and links to original source materials, both of which would be important in generating educational content. I explained a more elaborate version of the basic idea in the previous section of this post.
“That’s a great idea,” he said. “I can think of a huge number of applications. My last job was at Google. The training was terrible.”
Google. The company that’s promoting the heck out of their free AI classes. The one that’s going to “disrupt the college degree” with their certificate programs. The one that everybody holds up as leading the way past traditional education and toward skills-based education.
Their training is “terrible.”
Yes. Of course it is. Because everybody’s training is terrible. Their learning designers have the same problem I described academic learning designers as having in the previous section. Too much to develop, too little time. Only much, much worse. Because they have far fewer course design experts (if you count faculty as course design experts). Those people are the first to get cut. And EdTech in the corporate space is generally even worse than academic EdTech. Worst of all? Nobody knows what anybody knows or what anybody needs to know.
Academia, along with 1EdTech and several other standards bodies funded by corporate foundations, is pouring incredible amounts of time, energy, and money into building a data pipeline for tracking skills. Skill taxonomies move from repositories to learning environments, where evidence of student mastery is attached to those skills in the form of badges or comprehensive learner records. Which are then sent off to repositories and wallets.
The problem is, pipelines are supposed to connect to endpoints. They move something valuable from the place where it is found to the place where it is needed. Many valuable skills are not well documented if they are documented at all. They appear quickly and change all the time. The field of knowledge management has largely failed to capture this information in a timely and useful way after decades of trying. And “knowledge” management has tended to focus on facts, which are easier to track than skills.
In other words, the biggest challenge that folks interested in job skills face is not an ocean of well-understood skill information that needs to be organized but rather a problem of non-consumption. There isn’t enough real-world, real-time skill information flowing into the pipeline, and there are few people with real uses for it on the other side. Almost nobody in any company turns to their L&D departments to solve the kinds of skills problems that help people become more productive and advance in their careers. Certainly not at scale.
But the raw materials for solving this problem exist. A CEO of HP once famously noted that the company knows a lot. It just doesn’t know what it knows.
Knowledge workers do record new and important work-related information, even if it’s in the form of notes and rough documents. Increasingly, we have meeting transcripts thanks to videoconferencing and AI speech-to-text capabilities. These artifacts could be used to train a large language model on skills as they are emerging and needed. If we could dramatically lower the cost and time required to create just-in-time, just-enough skills training, then the pipeline of skills taxonomies and skill tracking would become a lot more useful. And we’d learn a lot about how it needs to be designed because we’d have many more real-world applications.
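As a hedged illustration of what that could look like, here is a minimal Python sketch of prompting a model to pull candidate skills out of a meeting transcript. The prompt format, the output structure, and the call_llm placeholder are all assumptions made for the sake of the example rather than a description of any existing product.

```python
# Illustrative sketch: mining emerging skills from workplace artifacts.
# The prompt format, output structure, and call_llm() are invented placeholders.

SKILL_EXTRACTION_PROMPT = """From the transcript below, list skills that people
demonstrated or said they needed, one per line, in the form:
skill name | evidence quote | demonstrated or needed

Transcript:
{transcript}
"""


def call_llm(prompt: str) -> str:
    """Placeholder for whichever model the organization uses."""
    raise NotImplementedError


def extract_skills(transcript: str) -> list[dict]:
    """Turn the model's line-per-skill output into structured records that a
    skills pipeline (taxonomies, badges, learner records) could consume."""
    raw = call_llm(SKILL_EXTRACTION_PROMPT.format(transcript=transcript))
    records = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            records.append({"skill": parts[0], "evidence": parts[1], "status": parts[2]})
    return records
```

Records like these could then flow into the skill taxonomies, badges, and learner records that the pipeline already anticipates, which is the missing supply side of the problem.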
The first pipeline we need is from skill discovery to learning content production. It’s a huge one, we’ve known about it for many decades, and we’ve made very little progress on it. Groups like 1EdTech could help us to finally make progress. But they’d have to rethink the role of interoperability standards in terms of the purpose and value of data, particularly in an AI-fueled world. This, in turn, would not only help match worker skills with labor market needs more quickly and efficiently but also create a huge industry of AI-aided learning engineers.
Summing it up
So where does this leave us? I see a few lessons:
In general, lowering the cost of coding through generative AI doesn’t eliminate the need for technical interoperability standards groups like 1EdTech. But it could narrow the value proposition for their work as currently applied in the market.
Software engineers, learning designers, and other skilled humans have important skills and tacit knowledge that don’t show up in text. It can’t be hoovered up by a generative AI that swallows the internet. Therefore, these skilled individuals will still be needed for some time to come.
We often gain access to tacit knowledge and valuable skills when skilled individuals talk to each other. The value of collaborative work, including standards work, is still high in a world of generative AI.
We can capture some of that tacit knowledge and those skills in machine-readable format if we set that as a goal. While doing so is not likely to lead to machines replacing humans in the near future (at least in the areas I’ve described in this post), it could lead to software that helps humans get more work done and spend more of their time working on hard problems that quirky, social human brains are good at solving.
1EdTech and its constituents have more to gain than to lose by embracing generative AI thoughtfully. While I won’t draw any grand generalizations from this, I invite you to apply the thought process of this blog post to your own worlds and see what you discover.
I’ve been having trouble blogging lately. Part of it has been that I’ve spent the last two years heads-down, building a business. But now that I’m looking around, I don’t see much happening in EdTech. Anywhere. I can write about big trends that will affect education. Plenty is happening in that arena. But at least at first glance, the EdTech sector looks frozen. I’m not talking about frozen as in the normal much-activity-but-little-progress way. I mean that, other than integrating ChatGPT into everything in shallow and trivial ways, I don’t see anybody doing much of anything.
I’m probably seeing an oversimplified picture. First, I have bigger blind spots than usual at the moment. Second, there’s so much happening in the macro-environment that some EdTech companies are probably working on larger plans behind the scenes. With changes like major demographic shifts in the student population, huge swathes of the economy being reconfigured, and profoundly impactful technologies popping up seemingly out of nowhere, companies need time to plan and implement appropriately significant responses.
But I also sense a lot of paralysis. Some may be bewildered by these large shifts and unsure how to make sense of whatever sparse data they can get their hands on. (I certainly am.) They may not know what to do yet. And since we’re in a financial environment in which very few companies have a lot of money to spend, some EdTech firms may simply be unable to execute any strategy other than treading water at the moment.
Then again, some companies—even some industries—may not have effective responses to the changes. They may disappear.
I fed Bob Dylan’s famous line “He not busy being born is busy dying” into OpenAI’s DALL-E 2 image generator, having very little idea of what the software would do with that prompt. Surprisingly, its output perfectly sums up my current thinking about the EdTech situation:
“He not busy being born is busy dying,” words by Bob Dylan as interpreted by DALL-E 2
Clarity. It would be nice to have, wouldn’t it? Sadly, even in this age of magical AI genies, you can’t always get what you want.
“You can’t always get what you want,” words by The Rolling Stones as interpreted by DALL-E 2
But if you try sometimes, you get what you need. All I have is a bunch of older observations as a baseline, a few facts, a few conjectures, and a lot of questions. Still, that’s a place to start.
Shall we try?
A little context
Look, everything is a mess right now. Everybody knows it. Still, it’s worth taking a beat to remind ourselves that the landscape is at least as confusing for vendors as it is for universities.
First, there’s enrollment. We know that the United States is approaching the bottom of a long demographic dip in traditional college-aged students. We know it isn’t hitting every geographic area at the same time or with the same intensity. We know that the post-COVID labor market changes, the messed-up supply chain that is still reconfiguring itself thanks to geopolitical changes, the tight labor market, the unwinding of a decade of high-stimulus monetary policy, and the high cost of college have all conspired to make enrollment changes odd, unpredictable, and unsettling. We have a handful of data points and endless surveys of student and worker attitudes. And then there’s the looming potential recession. Nobody knows what will happen next year or the year after. Nobody knows what’s fleeting, somewhat long-term, or permanent.
If you’re building or running an EdTech company, how do you prepare for this? The most obvious strategy is to slash expenses and wait until the environment becomes clearer. That has worked in the past because, frankly, the education markets haven’t changed much or quickly. Downturns have been cyclical. This time may be different. In fact, it probably will be. But it’s hard to know how it will be different or how quickly it will change.
As universities realize that traditional enrollments may be harder to reach, I’m hearing a lot more talk about competency-based education (CBE), micro-credentials, and aligning education with skills and work. Talk from universities. And from a small number of industries, some of which (like allied healthcare) were doing stackable micro-credentials decades before that term was invented. Will the continuous education approach be taken up more broadly by a wider range of industries? Again, we have lots of surveys. I’ve not looked closely at the latest data. But nothing I’ve come across has convinced me that we actually know. What do you do about this if you’re an EdTech company? Pivots to corporate learning and development haven’t produced many great successes (although they have enabled some start-ups that would have folded to scrape along). There’s…something here. But what? How much do you bet on CBE taking off? And where do you place your chips?
We also know that generative AI is a big deal. How do we know? Mainly because ChatGPT and its growing list of competitors continue to surprise us. Whenever something we’ve built surprises us with regularity, that means we don’t understand its implications yet. Anyone who says they know what’s going to happen next is either reading too much science fiction, a billionaire who is used to saying made-up stuff without consequence, or both. So far in my world, most of the new “AI-powered” applications I’ve seen are hasty and trivial integrations with ChatGPT. They are so easy to reproduce that they are more likely to be feature sets than products.
That will change. But it will take a while. The underlying AI stack is evolving rapidly and could take multiple paths. Meanwhile, most folks are very early in their process of thinking about what the tools are and aren’t good for. Some industries have been thinking about, working with, and investing in AI in a serious way for some time. Education isn’t one of them. We’ve been caught flat-footed.
And unfortunately, most EdTech companies don’t have money to invest now even if they knew what to invest in. That’s true of startups, publicly traded companies, and private equity-held companies.
The start-up picture is brutal. Take a look at this investment trend:
Believe it or not, the chart understates just how bad the situation is. Last year, nearly half of all EdTech funding went to one company—BYJU’S—which is now struggling to make its debt payments. Three-fourths of all EdTech VC financing in 2022 went to just five start-ups. Given that the total pot shrank by 50% to begin with, there was nothing at all for most start-ups. This year isn’t exactly looking great either.
Nor are publicly traded EdTech companies faring any better. Take a look at 2U’s stock price trend over the past five years:
Coursera doesn’t look so great either:
It’s weird to say, but this is one time that 2U and Coursera might prefer to be Pearson:
Then again, if we zoom out to look at a 10-year time horizon rather than 5 years, Pearson’s picture looks different:
Coursera, 2U, BYJU’S, and all the other EdTech unicorns should be worried that maybe they are Pearson. That their previous valuations were created by conditions that have come and gone, never to be seen again. Interest rates stayed pinned at historic lows for roughly fifteen years after the Fed injected liquidity into the market to save the banking sector. The resulting cheap money was like rocket fuel for these companies and for the investors who funded them. It’s been easy, relative to historical norms, for somebody with money to seem like a genius by making more money. Meanwhile, EdTech business models are being tested in these turbulent times. Who will endure? Who will adapt? Who will catch a new wind in their sails? I don’t know.
In the coming months and years, we’re going to find out who the real geniuses are. Interest rates are not going back to where they were. Debt will be more expensive, loan covenants will be more restrictive, valuations will be harder to earn, and VCs will have a harder time raising money for their funds. And all this hits at a moment of great uncertainty and change in the sector.
It’s not surprising that EdTech is quiet at the moment. We’re in a pregnant pause as companies face the new realities confronting them and decide what to do. The question is, which of them are busy being born, and which are busy dying?
I’ll read the tea leaves. But as I wrote earlier, I have more questions than answers.
Textbook publishers
Yes, in 2023, the providers of digital curricular materials and interactive learning experiences are still widely called “textbook publishers.” It’s not for lack of trying to rebrand. For a while, McGraw-Hill Education was calling itself a “learning science company” and Pearson was calling itself a “learning company”. Maybe they still are.
The cynical jokes are much too easy here. Both companies made real efforts to transform themselves. For example, I wrote a post in 2013 about how Pearson was trying to rebuild the entire company, from HR policies upward, into one that was singularly focused on products that provided measurable improvements in student learning, or “efficacy”:
Love ’em or hate ’em, it’s hard to dispute that Pearson has an outsized impact on education in America. This huge company—they have a stock market valuation of $18 billion—touches all levels from kindergarten through career education, providing textbooks, homework platforms, high-stakes testing, and even helping to design entire online degree programs. So when they announce a major change in their corporate strategy, it is consequential.
That is one reason why I think that most everybody who is motivated to read this blog on a regular basis will also find it worthwhile to read Pearson’s startling publication, “The Incomplete Guide to Delivering Learning Outcomes” and, more generally, peruse their new efficacy web site. One of our goals for e-Literate is to explain what the industry is doing, why, and what it might mean for education. Finding the answers to these questions is often an exercise in reading the tea leaves, as Phil ably demonstrated in his recent posts on the Udacity/SJSU pilot and the layoffs at Desire2Learn. But this time is different. In all my years of covering the ed tech industry, I have never seen a company be so explicit and detailed about their strategy as Pearson is being now with their efficacy publications. Yes, there is plenty of marketing speak here. But there is also quite a bit about what they are actually doing as a company internally—details about pilots and quality reviews and hiring processes and M&A criteria. These are the gears that make a company go. The changes that Pearson is making in these areas are the best clues we can possibly have as to what the company really means when they say that they want efficacy to be at the core of their business going forward. And they have published this information for all the world to see.
These now-public details suggest a hugely ambitious change effort within the company.[…] I can say with absolute conviction that what Pearson has announced is no half-hearted attempt or PR window dressing, and I can say with equal conviction that what they are attempting will be enormously difficult to pull off. They are not screwing around. Whatever happens going forward, Pearson is likely to be a business school case study for the ages.
Pearson put out an announcement that looked like fluff, and ultimately it produced results that looked like fluff blowing away in the wind. But the company nevertheless tried very hard to change itself. For years.
The problem, as I noted back then, is that textbook companies like Pearson are neither positioned with their customers nor internally equipped to think about improving student success in a way that would be helpful:
Of course, Pearson[‘s] decision to pursue this strategy as what has historically been a textbook company also raises some different questions. As you think about Pearson declaring that they are now focused on evaluating all their products based on efficacy, one reaction that you may be having is something along the lines of, “Wait. You mean to tell me that, for all of those educational products you’ve been selling for all these years, your product teams are only now thinking about efficacy for the first time?” Another reaction might be, “Wait. You mean to tell me that you think that you, a textbook company, should be defining the learning outcomes and determining the effectiveness of a course rather than the faculty who teach the course?”[…]
It’s impossible to unpack the meaning of Pearson’s move without putting it in the context of the historical relationship between the textbook industry and the teachers who adopt their products. Despite all of the complaints about how bad textbooks are and how clueless these companies are, the relationship between textbook publishers and faculty is unusually intimate. To begin with, I can’t think of any other kind of company that hires literally thousands of sales representatives whose job it is to go visit individual faculty, show them the company’s products, answer questions, and bring feedback on the products back to the company. And speaking of those products, the overwhelming majority of them are written by faculty—many with input from an advisory committee of faculty and pre-publication reviews by other faculty. You can fairly accuse the textbook publishers of many different faults and sins, but not taking faculty input seriously isn’t one of them. Historically, they have relied heavily on that faculty input to shape the pedagogical features of the textbooks. And they have had to, because most of the editors are not teachers themselves. More often than not, they started off as textbook sales reps. If they taught at all, it was typically ten or twenty years ago, and just for a few years—long enough for them to figure out that teaching and the academic life weren’t for them. This doesn’t mean that they don’t care about pedagogy or don’t know anything about it, but it does mean that most of what they know comes from talking with their authors and customers.
And by “customers,” I mean faculty, despite the fact that it is the students who actually buy the product. Pearson’s choice to build their learning outcomes effort around a term that comes from the pharmaceutical industry is an historically apt one for the textbook industry. In higher education in the United States, faculty prescribe textbooks and students purchase them. As a result, textbook publishers have generally designed their products to please faculty rather than students. One consequence of this is that they had no need to distinguish product features that offer faculty convenience from those that actually impact student learning. When faculty/customers said to the textbook publishers, “I want my book to come with slides, lecture notes, and a self-grading homework platform so that I don’t have to put as much work into that annoying survey course the department head is making me teach,” then that’s what they provided. Whether that collection of materials had positive impact, negative impact, or no impact on student outcomes was not a question that the textbook publisher had any particular reason to ask. For the most part, the publishers relied on their authors and customers to make good decisions for the students. As long as they provided the raw materials that the faculty said they needed, the companies’ work was done.
Nothing in Pearson’s history prepared it to think about how effective its products are at helping students learn. That wasn’t what its customers had looked to it for. They had little data and little historical expertise from which to start their transformation. By and large, their digital products were not designed to provide the data necessary to measure student learning, never mind improve it.
Worse, they really didn’t know how to work with their customers on their new mission. Academic institutions were not about to cede their responsibility for student success to textbook publishers. It would have to be some kind of partnership. But Pearson and their peers had no idea what that partnership should look like.
Internally, changing the way they think about answering the questions that the framework asks them will entail as much subtle, difficult, and pervasive re-engineering of the corporate reflexes and business processes as the work being undertaken now. As I described earlier, all textbook companies that have been around for a while are wired for a particular relationship with faculty that is at the heart of how they design, produce, and sell their products. Their editors have gone through decades of tuning the way they think and work to this process, and so have their customers. When Pearson layers a discussion of efficacy onto these business processes, a tension is created between the old and new ways of doing things. Suddenly, authors and customers don’t necessarily get what they want from their products just because they asked for them. There are potentially conflicting criteria. The framework itself provides nothing to help resolve this tension. At best, it potentially scaffolds a norming conversation. But a product management methodology that can combine knowledge about efficacy, user desires, and usability requires more tools than that. And that problem is even worse in some ways now that product teams have multiple specialized roles. The editor, author, adopting teacher, instructional designer, cognitive science researcher, psychometrician, data scientist, and UX engineer may all work together to develop a unified vision for a product, but more often than not they are like the blind men and the elephant. Agreeing in principle on what attributes an effective product might have is not at all the same as being able to design a product to be effective, where “effective” is a shared notion between the company and the customers.
Pearson will need to create a new methodology and weave it into the fabric of the company. There are a number of sources from which they can draw. The Incomplete Guide mentions Lean Startup techniques, which are as good a place to start as any. But there is no methodology I know of that will work off-the-rack for education, and there certainly is no talent pool that has been trained in any such methodology. I have worked with multiple educational technology product teams in multiple companies on just this problem, and it is very, very hard. In fact, it may be the single hardest problem that the educational technology industry faces today, as well as one of the harder problems that the larger educational community faces.
In retrospect, the fatal flaw in the industry may be its very raison d’être. In the analog world, textbook publishers enabled faculty to outsource portions of their course designs to other faculty who, as authors, were aided in design, production, sales, and distribution by the companies. Because the products were just pages in a book, their adopters could pick and choose what they used and how they used it. But they also had to pick and choose. And adapt. And fill in gaps. That process, the work of turning raw curricular materials into a finished, tailored curricular experience, is increasingly where the value is. It was in 2013 and it certainly is now. But nobody turns to textbook publishers to help with that job.
I wrestled with this same problem when I was an employee at Cengage, just a couple of years before I wrote the Pearson post. MindTap was the first learning platform from a major publisher to shift the paradigm from a learning experience modeled after a classroom—the LMS—to one that centered on the substance of the course itself. It started with a loose book metaphor—think “scope and sequence”—but was very flexible and configurable. Beyond our plan to support all the LTI tools that plug into an LMS, we had a very flexible MindApps API that enabled a richer integration. For example, we integrated several different note-taking apps. Students using, say, Evernote, could take their notes in the margin of a MindTap title. The content would be synchronized with Evernote. And it would stay in the students’ note-taking app forever. MindTap was intended to be the hub, but not the walled garden, of a new, more expansive digital learning ecosystem. One that was centered on tailored, student-centered learning design rather than on publisher content.
Customers liked MindTap a lot. Some of them liked Cengage’s textbook franchises less. They wanted to license the platform and put their own content in it. Despite the urging of some of us both inside and outside the company, Cengage refused. The company was focused on the value of its content.
Since I wrote my Pearson efficacy post, which came out the same year the firm’s stock price peaked in that 10-year graph above, publishers have lost 35% of their revenues. They have been bought by private equity firms, filed for bankruptcy, tried and failed to merge, bought OPM companies only to put them up for sale, and tried many other tactics. Nothing has worked.
Generative AI is a commoditizing force. It is a tsunami of creative destruction.
Consider the textbook industry. As long-time e-Literate readers know, I’ve been thinking a lot about how its story will end. Because of its unusual economic moats, it is one of the last media product categories to be decimated or disrupted by the internet. But those moats have been drained one by one. Its army of sales reps physically knocking on campus doors? Gone. The value of those expensive print production and distribution capabilities? Gone. Brand reputation? Long gone.
Just a few days ago, Cengage announced a $500 million cash infusion from its private equity owner[….]
What will happen to this tottering industry when professors, perhaps with the help of on-campus learning designers, can use an LLM to spit out their own textbooks tuned to the way they teach? What will happen when the big online universities decide they want to produce their own content that’s aligned with their competencies and is tied to assessments that they can track and tune themselves?
Don’t be fooled by the LLM hallucination fear. The technology doesn’t need to (and shouldn’t) produce a perfect, finished draft with zero human supervision. It just needs to lower the work required from expert humans enough that producing a finished, student-safe curricular product will be worth the effort.
How hard would it be for LLM-powered individual authors to replace the textbook industry? A recent contest challenged AI researchers to develop systems that match human judgment in scoring free text short-answer questions. “The winners were identified based on the accuracy of automated scores compared to human agreement and lack of bias observed in their predictions.” Six entrants met the challenge. All six were built on LLMs.
This is a harder test than generating anything in a typical textbook or courseware product today.
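To be concrete about what “lowering the work required from expert humans” might look like in practice, here is a minimal, hypothetical sketch of a human-in-the-loop drafting workflow. The function names, prompts, and competency text are all invented for illustration; call_llm stands in for whatever model a real platform would actually use.

```python
# Minimal sketch of LLM-assisted curricular drafting with a human in the loop,
# assuming a generic call_llm(prompt) stand-in. Nothing here is a real
# publisher or platform API; it only illustrates the division of labor.


def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with your provider of choice."""
    return "DRAFT: Photosynthesis converts light energy into chemical energy..."


def draft_section(topic: str, teaching_style: str, competency: str) -> str:
    """The model produces a first draft tuned to the instructor's approach."""
    return call_llm(
        f"Draft a textbook section on '{topic}' for a course taught {teaching_style}. "
        f"Align it to this competency: {competency}."
    )


def human_review(draft: str) -> str:
    """The expensive, indispensable step: an expert edits and approves the draft."""
    print(draft)
    return input("Edit the draft and paste the approved version: ")


if __name__ == "__main__":
    draft = draft_section(
        topic="photosynthesis",
        teaching_style="with a flipped-classroom, problem-first approach",
        competency="explain how light reactions drive carbon fixation",
    )
    approved = human_review(draft)  # no student sees unreviewed model output
```

The design point is the division of labor: the model produces a cheap first draft tuned to the instructor’s approach, and an expert approves it before any student ever sees it.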
The textbook industry has received ongoing investment from private equity because of its slow rate of decay. Publishers threw off enough cash that the slum lords who owned them could milk their thirty-year-old platforms, twenty-year-old textbook franchises, and $75 PDFs for cash. As the Cengage announcement shows, that model is already starting to break down.
How long will it take before generative AI causes what’s left of this industry to visibly and rapidly disintegrate? I predict 24 months at most.
This week I saw a quick, offhand demonstration of a platform provider’s integration of generative AI into their system to create structured learning content. It worked. That conversation, along with investigations and experiments I’m running in preparation for an EEP project on the topic, strongly suggests to me that the tech is already good enough to change the economics of course design and completely disrupt the old publisher model.
I don’t think the next step is “robot tutor in the sky 2.0.” But fundamentally changing the economics of learning design is entirely plausible in the near term. AI will assist learning designers rather than replace them. It will make the job much easier, more affordable, and quite possibly more fulfilling for the people doing the work. It will be so cheap that it won’t even be a product. It’ll be a feature set.
I believe the era of AI-assisted OER and local design is coming. That could be exciting, right?
But it hasn’t happened yet.
At this moment, we’re waiting. Meanwhile, the publishers must see what’s coming. What are they doing to prepare? Generative AI is going to shift the value of digital learning products from the content to the platform and from crafted finished products to products that enable craft. Theoretically, it may not be too late for the publishers to respond. In practice, I’ve heard from friends in the industry that their employers have been disinvesting in their platforms over the past few years. I don’t know what they’re thinking right now.
So we’ll wait and see.
LMSs
We could call the textbook publishers’ root problem one of product/market fit. Their product no longer meets the needs of the market and they haven’t been well-equipped to either change their product or focus on a different market.
Many have argued for a long time that this is true with the LMS. That the product category is going to die. The people making that argument have been wrong. Repeatedly. Forever. For so long, in fact, that the loudest voices in this camp have largely left EdTech.
But are they finally right? I don’t know.
Before I get into the details, here’s a depressing question: How many broadly used EdTech product categories can you think of that were created since the invention of the LMS?
Yes, shift happens. But less often than one might hope.
Back in 2014, I wrote an unintentionally infamous rant called Dammit, the LMS in which I took on all the critics who were arguing the LMS was dead or dying. I’ll quote at length because it gets to the heart of the product/market fit conundrum:
Let’s imagine a world in which universities, not vendors, designed and built our online learning environments. Where students and teachers put their heads together to design the perfect system. What wonders would they come up with? What would they build?
Why, they would build an LMS. They did build an LMS. Blackboard started as a system designed by a professor and a TA at Cornell University. Desire2Learn (a.k.a. Brightspace) was designed by a student at the University of Waterloo. Moodle was the project of a graduate student at Curtin University in Australia. Sakai was built by a consortium of universities. WebCT was started at the University of British Columbia. ANGEL at Indiana University.
OK, those are all ancient history. Suppose that now, after the consumer web revolution, you were to get a couple of super-bright young graduate students who hate their school’s LMS to go on a road trip, talk to a whole bunch of teachers and students at different schools, and design a modern learning platform from the ground up using Agile and Lean methodologies. What would they build?
They would build Instructure Canvas. They did build Instructure Canvas. Presumably because that’s what the people they spoke to asked them to build.
In fairness, Canvas isn’t only a traditional LMS with a better user experience. It has a few twists. For example, from the very beginning, you could make your course 100% open in Canvas. If you want to teach out on the internet, undisguised and naked, making your Canvas course site just one class resource of many on the open web, you can. And we all know what happened because of that. Faculty everywhere began opening up their classes. It was sunlight and fresh air for everyone! No more walled gardens for us, no sirree Bob.
That is how it went, isn’t it?
Isn’t it?
I asked Brian Whitmer the percentage of courses on Canvas that faculty have made completely open. He didn’t have an exact number handy but said that it’s “really low.” Apparently, lots of faculty still like their gardens walled. Today, in 2014.
Canvas was a runaway hit from the start, but not because of its openness. Do you know what did it? Do you know what single set of capabilities, more than any other, catapulted it to the top of the charts, enabling it to surpass D2L in market share in just a few years? Do you know what the feature set was that had faculty from Albany to Anaheim falling to their knees, tears of joy streaming down their faces, and proclaiming with cracking, emotion-laden voices, “Finally, an LMS company that understands me!”?
It was Speed Grader. Ask anyone who has been involved in an LMS selection process, particularly during those first few years of Canvas sales.
Here’s the hard truth: While [one LMS critic] wants to think of the LMS as “training wheels” for the internet (like AOL was), there is overwhelming evidence that lots of faculty want those training wheels. They ask for them. And when given a chance to take the training wheels off, they usually don’t.[…]
Do you want to know why the LMS has barely evolved at all over the last twenty years and will probably barely evolve at all over the next twenty years? It’s not because the terrible, horrible, no-good LMS vendors are trying to suck the blood out of the poor universities. It’s not because the terrible, horrible, no-good university administrators are trying to build a panopticon in which they can oppress the faculty. The reason that we get more of the same year after year is that, year after year, when faculty are given an opportunity to ask for what they want, they ask for more of the same. It’s because every LMS review process I have ever seen goes something like this:
Professor John proclaims that he spent the last five years figuring out how to get his Blackboard course the way he likes it and, dammit, he is not moving to another LMS unless it works exactly the same as Blackboard.
Professor Jane says that she hates Blackboard, would never use it, runs her own Moodle installation for her classes off her computer at home, and will not move to another LMS unless it works exactly the same as Moodle.
Professor Pat doesn’t have strong opinions about any one LMS over the others except that there are three features in Canvas that must be in whatever platform they choose.
The selection committee declares that whatever LMS the university chooses next must work exactly like Blackboard and exactly like Moodle while having all the features of Canvas. Oh, and it must be “innovative” and “next-generation” too, because we’re sick of LMSs that all look and work the same.
Nobody comes to the table with an affirmative vision of what an online learning environment should look like or how it should work. Instead, they come with this year’s checklists, which are derived from last year’s checklists. Rather than coming with ideas of what they could have, they come with their fears of what they might lose. When LMS vendors or open source projects invent some innovative new feature, that feature gets added to next year’s checklist if it avoids disrupting the rest of the way the system works and mostly gets ignored or rejected to the degree that it enables (or, heaven forbid, requires) substantial change in current classroom practices.
This is why we can’t have nice things.[…]
There. I did it. I wrote the damned “future of the LMS” post. And I did it mostly by copying and pasting from posts I wrote 10 years ago. I am now going to go pour myself a drink. Somebody please wake me again in another decade.
Well, here we are. A decade later. Has anything changed? I’m told that something called a “Next Generation Digital Learning Environment (NGDLE)” popped up during my slumber. But it was gone by the time I woke up. Other than that, my friends in the trenches tell me that largely, no, not much has changed. The rapid shifts in market share of various LMSs have slowed now that Blackboard has staunched the bleeding. (Former Instructure CEO Josh Coates once told me, “We know we can’t keep feeding off Blackboard’s carcass forever.”) Moodle is still out there, Moodling along. Every once in a while, I get a question from somebody about whether they should choose D2L Brightspace or Instructure Canvas. Invariably, their question ends with some version of, “It probably doesn’t matter much, right? I mean, they’re pretty much the same.” My answer has been, “That’s mostly true, except when it’s not. The differences might or might not matter to you depending on the specifics of how you’re using the system.” Nothing in the ecosystem has forced a shift. Customers complained but didn’t demand anything different. Quite the opposite.
That said, I’m sensing that more may be shifting than is apparent on the surface. How much will change and how much those changes will matter both remain to be seen.
Instructure and Blackboard are both owned by private equity companies now. While that sentence often cues ominous music, I’m not sure that’s the case here. Blackboard is part of Anthology, which does…well…honestly, I’m not entirely clear. (I guess it’s time for Rip Van Winkle to have a look around again.) At first glance, it looks to me like they have a student administration and success platform along with a bunch of services. So…that’s different. They also have the Ally accessibility management tool, which is one of the very few products I’ve seen customers get genuinely excited about in EdTech. When I add those pieces up, I…well…I can’t add them up yet. There’s too much I don’t understand. At some point, I’ll reach out to my friends at the company and get a clearer picture from them.
Instructure bought Badgr and seems to be leaning into CBE. They also bought…a digital adoption platform? I worked extensively with that tech in corporate L&D 20 years ago, long before SaaS and product-led growth. Back then, we mostly used it to boost worker productivity when using complex and confusing software applications. I always wondered why the product category didn’t catch on in education and training. Anyway, Instructure has made a couple of interesting and surprising tuck-in acquisitions. Is there a larger strategy here? Again, I should ask my friends over there.
D2L has been making some early moves into content production (building on an infrastructure that’s always been relatively better than its competitors for learning design) and has been quietly rebuilding its architecture into microservices in a Ship of Theseus sort of way. In their case, I actually have been talking to them a bit about what they’re up to. I’m not clear on the degree to which they’ve converged on a holistic strategy but they definitely have some interesting moves in mind. I’ll have to find out from them what I can share and probe a little more about how it all adds up.
In total, it feels like these companies are getting ready to finally differentiate from each other, not because existing customers are demanding change but because the market is saturated. Everybody in higher ed has an LMS. International growth is…fine, but not transformative. K12 and corporate markets are tougher than they appeared to be. LMS companies are not going to get enough growth by just selling the same product to more customers. They need to either sell different stuff to their existing customers or rethink their core product so that it’s useful in big new ways that attract different customers. They have to find new needs to satisfy. Each of the major higher education LMS companies appears to be approaching the challenge in a different way.
That would (finally) be interesting, wouldn’t it? But again, not much visible has happened yet. The Blackboard thing seems pretty dramatic. But I haven’t yet heard from anyone who thinks it will change the market meaningfully. Blackboard doesn’t even come up in any LMS selection conversations that I’ve had.
One opportunity to change the value proposition might come from…wait for it…generative AI. If learning design and content generation costs are dropping through the floor, then the value shifts to platforms that can generate those experiences, fine-tune them, run them for students, and return data back. Is the LMS the right platform for that? It hasn’t been a great fit so far, but…maybe?
OPMs
I don’t know what to say about OPMs because I don’t know what they are anymore. If Coursera, Noodle Partners, iDesign, Guild Education, and Project Kittyhawk can all be considered OPMs, then it’s hard to have a coherent conversation about the “product category.”
Very roughly, the market seems to have bifurcated. The biggest players are sticking to their roots, which is helping universities find more enrollments. Yes, yes, they provide many other services, stand for mom and apple pie, yada yada yada. But at their heart, they help universities expand their ability to spin up, market, and sell more online programs. This is rapidly evolving into a technology platform game, which may be one reason why the Department of Education’s “dear colleague” letter took such a broad swipe at EdTech and not just traditional OPM services. As I wrote in “Coursera is Evolving into a Third-Wave EdTech Company,”
Coursera has always thought of itself as a two-sided market. For those unfamiliar with the term, a two-sided market is one where the company’s primary business is to connect buyers with sellers. Amazon, Etsy, Airbnb, and Uber are all examples of this sort of business. Yes, they were attached in the early days to selling a particular form of MOOC as an individual product, much as Amazon only sold books in the early days. Much of the attention during the early years of MOOCs was on the pedagogical model of the MOOC itself, which is not very effective, and on the MOOC course delivery platforms, which directly translated lecture-model courses into an infinite lecture hall, with some relatively modest technological improvements. The innovation that got the least attention at the time was Coursera’s nature as a two-sided market. I remember talking to Daphne Koller about this circa 2014 (in front of one of those ridiculous fountains in the Swan and Dolphin Hotel at a Sloan-C conference).
I’m not sure that EdX ever fully grasped the implications of the two-sided market model. 2U might; it’s hard to tell right now. The company’s in-process rebranding is confusing and their clearest marketing point so far has been that MOOCs lower the advertising costs for degree programs. Coursera, on the other hand, understood the business model implications early on, one of which is that two-sided markets tend to produce one big winner in any given space. Who is the second-largest competitor to Amazon? Walmart? Wayfair? The distributed network of stores that Shopify powers? I don’t know. The answer isn’t obvious. It’s not like Coke and Pepsi….
The Coursera platform, writ large, connects universities, learners, government agencies, megacorporations, and local employers. It helps open up new opportunities for universities to reach students without falling afoul of the “dialing for dollars” problem that plagued for-profit universities and creates ethical and legal complexities for OPMs.
A multi-sided market also offers some benefits to the educational mission relative to other models. If, for example, you’re an OPM that has heavily subsidized the creation of a degree program in exchange for a share of the revenues for ten years, then you need to make sure your up-front investment pays off. This is true of any business that invests up-front in building products, including textbook publishers. In a multi-sided market, depending on how the compensation is set up, the marketplace owner shouldn’t care much about whether the company sells 100,000 units of one product or 1,000 units each of 100 products. For education, some programs are essential for students, local economies, and important business niches, even if they don’t require a lot of trained individuals. So a well-functioning multi-sided market should, all else being equal, offer more educational opportunities within a scalable model.
Amazon, at its heart, is a way for product vendors to sell more stuff to more people with less effort. MOOC platforms, as they exist today, seem to be a way for certificate and degree vendors to sell more stuff to more people with less effort. Sure, the MOOC companies have course authoring and delivery platforms underneath. But nobody would care about those platforms if they weren’t attached to a marketplace. I know a handful of American universities that adopted OpenEdX as a stand-alone learning platform. Most have dropped it. I haven’t heard anybody begging for Coursera to make its platform available separate from the marketplace either.
This is why Guild is now getting lumped as an OPM with increasing frequency. If you think the point of an OPM is to fill more seats in online programs, well, that’s what Guild does. So yeah, it’s hard to define what an OPM is or how it works, which may be why the Department of Education seems confused on the topic.
Is this part of the space evolving? I don’t know. Everybody is so cash-constrained right now. I haven’t noticed the big players making any big moves. Also, their strategies feel incomplete to me. I see a lot more movement inside some universities toward competencies. The platforms do help break down the degree into smaller marketable units and sell enough of them to give them credibility. A “MicroMasters” is a thing now. Sort of. But overall, the connective tissue with employers, both on skills definition and on proof of competency driving employment, is still pretty sketchy except in a few niches.
Meanwhile, fee-for-service shops like iDesign seem focused on helping universities build capacity of various forms—mostly in program and course design. That’s great. It’s needed. But it will never be a giant business. And I’m not sure if generative AI helps or hurts them. They may find themselves having to react to moves made by better-funded learning platform companies. Or not. There’s nothing about generative AI that would require the resources of a big LMS or courseware delivery platform company to deliver value.
The OPM gold rush has run its course. Now we have chaos. Something, or some things, are going to emerge out of it. But I don’t know what they will be or what value they will provide. Or when. Or from which providers.
What else is there?
As usual, I won’t write about ERP and CRM. Not because they’re unimportant. They’re very important. But they’re also an entirely different sort of mess. For reasons that I won’t go into, these product categories are incredibly difficult to make work smoothly and intuitively in education, where complex workflows vary so widely. Until I see signs that this fundamental dynamic is shifting, I won’t devote significant attention to following these big enterprise systems.
I could run down a list of product categories that either completely imploded or are struggling along. Learning analytics. Courseware platforms. ePortfolios (a.k.a. Comprehensive Learner Records?). Some of these may become more interesting than they have been. I’ll save them for future posts.
What else?
AI, AI, AI. Nobody has any money to invest or spend on anything except on AI. So far, most of the product ideas I’ve seen are either trivial or pie-in-the-sky. My AI noise filter is set pretty high at the moment. But I’m going to start lowering it a bit. There are some folks who have been thinking and working hard on real products since before the ChatGPT hype hit (some of whom have been very persistently and creatively trying to get my attention). I will start taking a few of these calls. A few. At this point, I’m looking for thoughtful approaches, realistic thinking about the tech, and identification of real and solvable problems. I don’t expect the products to be right yet. I’m more interested in the people.
On the bigger picture, I read countless articles about AI and the future of work. While that kind of thinking is fine and probably necessary to a point, I personally don’t like to write about possible changes that are further out than I can see clearly. I can see AI having a direct impact in several areas, which I will write about as opportunities come into focus. There will be impacts on education and EdTech in the near future, some of which I may see coming and some of which I definitely won’t. None of it is quite here yet. Generally speaking, we tend to overestimate how quickly these technologies will penetrate particular markets because we forget about all the human stuff that gets in the way. And there is a lot of human stuff in education.
That said, one of the more remarkable affordances of generative AI is its ability to make natural language function as both a user interface and a programming language. It reduces friction like nothing I’ve ever seen before. So maybe we’ll see meaningful innovation faster this time.
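As a small illustration of that claim, here is a hypothetical sketch of natural language doing double duty as interface and programming layer: a plain-English request gets translated into a structured command that ordinary code can execute. The call_llm helper and the tiny JSON action schema are both invented for the example.

```python
# Minimal sketch of the "natural language as interface and programming
# language" idea: a plain-English request is turned into a structured action
# that ordinary code can execute. call_llm is a stand-in, and the tiny action
# schema is invented for illustration.

import json


def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with your provider of choice."""
    return '{"action": "create_quiz", "topic": "photosynthesis", "questions": 5}'


def interpret(request: str) -> dict:
    """Ask the model to translate a natural-language request into JSON."""
    prompt = (
        "Translate this request into JSON with keys 'action', 'topic', and "
        "'questions':\n" + request
    )
    return json.loads(call_llm(prompt))


def execute(command: dict) -> None:
    """Ordinary, deterministic code takes over once the intent is structured."""
    if command["action"] == "create_quiz":
        print(f"Creating a {command['questions']}-question quiz on {command['topic']}.")


if __name__ == "__main__":
    execute(interpret("Make me a short quiz on photosynthesis for Friday's class."))
```

The friction reduction comes from that last line: the user never sees the schema, and the software never has to guess at free-form intent on its own.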