Author: admin

  • PeopleAdmin A PowerSchool Company

    Navigating Change in Higher Education

    Change is a constant in higher education, and institutions are continually evolving to meet the demands of the modern world. In a recent PeopleAdmin webinar, Mastering Change Management in HigherEd’s Digital Transition, experts from Central Oregon Community College and Chapman University shared their experiences with change management during two large technology implementations, offering tips and best practices for other institutions anticipating change in the new year. In a poll at the start of the webinar, 95% of attendees responded that they would be facing a change in the new year. 31% are facing a major change, while 64% are navigating minor adjustments. If you’re among that 95%, read on below.

     

    Case Study 1: Central Oregon Community College

    Laurel Kent, IT Project Manager at Central Oregon Community College, explored her team’s journey through a Performance Management upgrade that took place over the past year.

    Case Study Focus: Performance Review Transformation

    • Moving from manual, PDF-based processes to a digital platform within PeopleAdmin.
    • Addressing issues like inconsistency, versioning, and tracking associated with PDF processes.
    • Utilizing the PeopleAdmin portal to streamline performance evaluation tracking.

    Wins and Lessons Learned:

    • Leadership Buy-in and clear project vision: Project support from the CHRO and CIO helped provide the appropriate resources. Dedicated project managers and functional analyst team, working collaboratively with HR, oversaw project timelines and deliverables to keep things on track.
    • Clear project plan and frequent communication: Sharing the project progress and updates regularly across campus meant that end-users knew what to expect.
    • Clear Roles and Timelines: Regular and predictable working sessions, clearly defined roles, and a reasonable timeline for testing and implementation kept things moving forward.
    • Relationships matter: Make sure that you have users across campus who can answer questions and provide feedback.
    • Build in time to fine-tune the product: A lesson learned was to include extra time for testing and stakeholder feedback. The team found it was important to see the product live, get direct feedback, and then tweak the platform as necessary.

     

    Case Study 2: Chapman University

    Robin Borough, Director of Talent Acquisition at Chapman University, shared insights from her many experiences with change management—and her top tip was a formula.

    Change Management Formula from Beckhard and Harris: Change (C) happens when Dissatisfaction (A) * Desirability (B) * Practicality (D) > Perceived Cost (X)

    “This formula is old, but everybody will be able to relate to it and see that it’s a real quick and dirty way to see if you can get the funding, and the sponsorship that you need, or if you need to prove something to get that funding and sponsorship,” said Robin. “‘C’ is the change. ‘A’ is the level of dissatisfaction with the status quo, and ‘B’ is the desirability of the change or proposed end state. ‘D’ is the practicality of the change—so are the steps to make this change practical and are we minimizing risk and disruption as much as possible? ‘X’ is the perceived cost of the change. For change to make sense, A * B * D has to be greater than X—meaning, I have to have a lot of dissatisfaction and a lot of desire for something different, and the plan has to be practical. If A, B, or D is zero, you’re out. Don’t even try to make the change. So much of what we’re doing is subjective, because there’s so many people and constituents involved with change management, so I thought this formula was an interesting way to think about it.”
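
    To make the arithmetic concrete, here is a minimal sketch of the Beckhard-Harris check in code. The scores are invented for illustration and are not from the webinar; any simple 0-10 (or high/medium/low) scoring works the same way.

    ```python
    def change_is_viable(dissatisfaction, desirability, practicality, perceived_cost):
        """Beckhard-Harris check: A * B * D must be greater than X for change to make sense."""
        return dissatisfaction * desirability * practicality > perceived_cost

    # Hypothetical scores on a 0-10 scale. If any of A, B, or D is zero, the
    # product is zero and the change fails no matter how low the perceived cost.
    print(change_is_viable(dissatisfaction=7, desirability=8, practicality=6, perceived_cost=100))  # True
    print(change_is_viable(dissatisfaction=7, desirability=8, practicality=0, perceived_cost=1))    # False
    ```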

    Final Thoughts

    In the ever-evolving landscape of higher education, change is inevitable. The experiences shared by Central Oregon Community College and Chapman University underscore the significance of proactive change management, user-centric approaches, the value of learning from past successes and challenges—and how important it is to understand what you’re getting into from the start. As institutions embark on their journeys of transformation, these insights can serve as guiding principles for navigating the complexities of change in higher education. For more, check out this webinar on-demand.

     

    Source link

  • How it Breaks in Subtle Ways –

    In my last post, I explained how generative AI memory works and why it will always make mistakes without a fundamental change in its foundational technology. I also gave some tips for how to work around and deal with that problem to safely and productively incorporate imperfect AI into EdTech (and other uses). Today, I will draw on the memory issue I wrote about last time as a case study of why embracing our imperfect tools also means recognizing where they are likely to fail us and thinking hard about dealing realistically with their limitations.

    This is part of a larger series I’m starting on a term of art called “product/market fit.” The simplest explanation of the idea is the degree to which the thing you’re building is something people want and are willing to pay the cost for, monetary or otherwise. In practice, achieving product/market fit is complex, multifaceted, and hard. This is especially true in a sector like education, where different contextual details often create the need for niche products, where the buyer, adopter, and user of the product are not necessarily the same, and where measurable goals to optimize your product for are hard to find and often viewed with suspicion.

    Think about all the EdTech product categories that were supposed to be huge but disappointed expectations. MOOCs. Learning analytics. E-portfolios. Courseware platforms. And now, possibly OPMs. The list goes on. Why didn’t these product categories achieve the potential that we imagined for them? There is no one answer. It’s often in the small details specific to each situation. AI in action presents an interesting use case, partly because it’s unfolding right now, partly because it seems so easy, and partly because it’s odd and unpredictable, even to the experts. I have often written about “the miracle, the grind, and the wall” with AI. We will look at a couple of examples of moving from the miracle to the grind. These moments provide good lessons in the challenges of product/market fit.

    In my next post, I’ll examine product/market fit for universities in a changing landscape, focusing on applying CBE to an unusual test case. In the third post, I’ll explore product/market fit for EdTech interoperability standards and facilitating the growth of a healthier ecosystem.

    Khanmigo: the grind behind the product

    Khan Academy’s Kristen DiCerbo did us all a great service by writing openly about the challenges of producing a good AI lesson plan generator. They started with prompt engineering. Well-written prompts are miracles. They’re like magic spells. Generating a detailed lesson plan in seconds with a well-written prompt is possible. But how good is that lesson plan? How well did Khanmigo’s early prompts produce the lesson plans?

    Kristen writes,

    At first glance, it wasn’t bad. It produced what looked to be a decent lesson plan—at least on the surface. However, on closer inspection, we saw some issues, including the following:

    • Lesson objectives just parroted the standard
    • Warmups did not consistently cover the most logical prerequisite skills
    • Incorrect answer keys for independent practice
    • Sections of the plan were unpredictable in length and format
    • The model seemed to sometimes ignore parts of the instructions in the prompt

    Prompt Engineering a Lesson Plan: Harnessing AI for Effective Lesson Planning

    You can’t tell the quality of the AI’s lesson plans without having experts examine them closely. You also want feedback from people who will actually use those lesson plans. I guarantee they will find problems that you will miss. Every time. Remember, the ultimate goal of product/market fit is to make something that the intended adopters will actually want. People will tolerate imperfections in a product. But which ones? What’s most important to them? How will they use the product? You can’t answer these questions confidently without the help of actual humans who would be using the product.

    At any rate, Khan Academy realized their early prompt engineering attempts had several shortcomings. Here’s the first:

    Khanmigo didn’t have enough information. There were too many undefined details for Khanmigo to infer and synthesize, such as state standards, target grade level, and prerequisites. Not to mention limits to Khanmigo’s subject matter expertise. This resulted in lesson plans that were too vague and/or inaccurate to provide significant value to teachers.

    Prompt Engineering a Lesson Plan: Harnessing AI for Effective Lesson Planning

    Read that passage carefully. With each type of information or expertise, ask yourself, “Where could I find that? Where is it written down in a form the AI can digest?” The answer is different for each one. How can the AI learn more about what state standards mean? Or about target grade levels? Prerequisites? Subject-matter expertise for each subject? No matter how much ChatGPT seems to know, it doesn’t know everything. And it is often completely ignorant about anything that isn’t well-documented on the internet. A human educator has to understand all these topics to write good lesson plans. A synthetic one does too. But a synthetic educator doesn’t have experience to draw on. It only has whatever human educators have publicly published about their experiences.

    Think about the effort involved in documenting all these various types of knowledge for a synthetic educator. (This, by the way, is very similar to why learning analytics disappointed as a product category. The software needs to know too much that wasn’t available in the systems to make sense of the data.)

    Here’s the second challenge that the Khanmigo team faced:

    We were trying to accomplish too much with a single prompt. The longer a prompt got and the more detailed its instructions were, the more likely it was that parts of the prompt would be ignored. Trying to produce a document as complex and nuanced as a comprehensive lesson plan with a single prompt invariably resulted in lesson plans with neglected, unfocused, or entirely missing parts.

    Prompt Engineering a Lesson Plan: Harnessing AI for Effective Lesson Planning

    I suspect this is a subtle manifestation of the memory problem I wrote about in my last post. Even with a relatively short text like a complex prompt, the AI couldn’t hold onto all the details. The Khanmigo team ended up breaking up the prompt into smaller pieces. This produced better results as the AI could “concentrate on”—or remember the details of—one step at a time. I’ll add that this approach provides more opportunities to put humans in the loop. An expert—or a user—can examine and modify the output of each step.
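
    To make the “smaller pieces” idea concrete, here is a minimal sketch of that kind of step-by-step prompt chain with a human checkpoint between steps. It uses the OpenAI Python client for illustration; the step prompts, the model name, and the example standard are my own placeholders, not Khan Academy’s actual pipeline.

    ```python
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Hypothetical decomposition of one big lesson-plan prompt into smaller steps.
    STEPS = [
        "Write 2-3 lesson objectives for the standard below. Do not copy the standard verbatim.",
        "Given the objectives above, list the prerequisite skills a warmup should review.",
        "Draft a warmup activity that targets those prerequisite skills.",
        "Draft independent practice problems with a worked answer key.",
    ]

    def run_step(instruction: str, context: str) -> str:
        """Send one small, focused prompt along with the accumulated context."""
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[
                {"role": "system", "content": "You are drafting one section of a lesson plan."},
                {"role": "user", "content": f"{context}\n\n{instruction}"},
            ],
        )
        return response.choices[0].message.content

    context = "Standard: Solve linear equations in one variable (Grade 8)."
    for step in STEPS:
        draft = run_step(step, context)
        print(f"\n--- {step}\n{draft}")
        # Human-in-the-loop checkpoint: an expert (or the teacher) can edit each
        # section before it becomes context for the next step.
        edited = input("Edit this section (or press Enter to accept): ") or draft
        context += f"\n\n{edited}"
    ```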

    We fantasize about AI doing work for us. In some cases, it’s not just a fantasy. I use AI to be more productive literally every day. But it fails me often. We can’t know what it will take for AI to solve any particular problem without looking closely at the product’s capabilities and the user’s very specific needs. This is product/market fit.

    Learning design in the real world

    Developing skill in product/market fit is hard. Think about all those different topics the Khanmigo team needed not only to know, but to understand in relation to lesson planning well enough to diagnose the gaps in the AI’s understanding.

    Refining a product is also inherently iterative. No matter how good you are at product design, how well you know your audience, and how brilliant you are, you will be wrong about some of your ideas early on. Because people are complicated. Organizations are complicated. The skills workers need are often complicated and non-obvious. And the details of how the people need to work, individually and together, are often distinctive in ways that are invisible to them. Most people only know their own context. They take a lot for granted. Good product people spend their time uncovering these invisible assumptions and finding the commonalities and the differences. This is always a discovery process that takes time.

    Learning design is a classic case of this problem. People have been writing and adopting learning design methodologies longer than I’ve been alive. The ADDIE model—”Analyze, Design, Develop, Implement, and Evaluate”—was created by Florida State University for the military in the 1970s. “Backward Design” was invented in 1949 by Ralph W. Tyler. Over the past 30 years, I’ve seen a handful of learning design or instructional design tools that attempt to scaffold and enforce these and other design methodologies. I’ve yet to see one get widespread adoption. Why? Poor product/market fit.

    While the goal of learning design (or “instructional design,” to use the older term) is to produce a structured learning experience, the thought process of creating it is non-linear and iterative. As we develop and draft, we see areas that need tuning or improving. We move back and forth across the process. Nobody ever follows learning design methodologies strictly in practice. And I’m talking about trained learning design professionals. Untrained educators stray even further from the model. That’s why the two most popular learning design tools, by far, are Microsoft Word and Google Docs.

    If you’ve ever used ChatGPT and prompt engineering to generate the learning design of a complex lesson, you’ve probably run into unexpected limits to its usefulness. The longer you spend tinkering with the lesson, the more your results start to get worse rather than better. It’s the same problem the Khanmingo team had. Yes, ChatGPT and Claude can now have long conversations. But both research and experience show us that they tend to forget the stuff in the middle. By itself, ChatGPT is useful in lesson design to a point. But I find that when writing complex documents, I paste different pieces of my conversation into Word and stitch them together.

    And that’s OK. If that process saves me design time, that’s a win. But there are use cases where the memory problems are more serious in ways that I haven’t heard folks talking about yet.

    Combining documents

    Here’s a very common use case in learning design:

    First, you start with a draft of a lesson or a chapter that already exists. Maybe it’s a chapter from an OpenStax textbook. Maybe it’s a lesson that somebody on your team wrote a while ago that needs updating. You like it, but you don’t love it.

    You have an article with much of the information you want to add to the new version you want to create. If you were using a vendor’s textbook, you’d have to require the students to read the outdated lesson and then read the article separately. But this is content you’re allowed to revise. If you’re using the article in a way that doesn’t violate copyright—for example, because you’re using it to capture publicly known facts that have changed rather than something novel in the article itself—you can simply use the new information to revise the original lesson. That was often too much work the old way. But now we have ChatGPT, so, you know…magic.

    While you’re at it, you’d like to improve the lesson’s diversity, equity, and inclusion (DEI). You see opportunities to write the chapter in ways that represent more of your students and include examples relevant to their lived experiences. You happen to have a document with a good set of DEI guidelines.

    So you feed your original chapter, new article, and DEI guidelines to the AI. “ChatGPT, take the original lesson and update it with the new information from the article. Then apply the DEI guidelines, including examples in topics X, Y, and Z that represent different points of view. Abracadabra!”

    You can write a better prompt than this one. But no matter how carefully you engineer your prompt, you will be disappointed with the results. Don’t take my word for it. Try it yourself.

    Why does this happen? Because the generative AI doesn’t “remember” these three documents perfectly. Remember what I wrote in my last article:

    The LLMs can be “trained” on data, which means they store information like how “beans” vs. “water” modify the likely meaning of “cool,” what words are most likely to follow “Cool the pot off in the,” and so on. When you hear AI people talking about model “weights,” this is what they mean.

    Notice, however, that none of the original sentences are stored anywhere in their original form. If the LLM is trained on Wikipedia, it doesn’t memorize Wikipedia. It models the relationships among the words using combinations of vectors (or “matrices”) and probabilities. If you dig into the LLM looking for the original Wikipedia article, you won’t find it. Not exactly. The AI may become very good at capturing the gist of the article given enough billions of those tensor/workers. But the word-for-word article has been broken down and digested. It’s gone.

    How You Will Never Be Able to Trust Generative AI (and Why That’s OK)

    Your lesson and articles are gone. They’ve been digested. The AI remembers them, but it’s designed to remember the meaning, not the words. It’s not metaphorically sitting down with the original copy and figuring out where to insert new information or rewrite a paragraph. That may be fine. Maybe it will produce something better. But it’s a fundamentally different process than human editing. We won’t know if the results it generates have good product/market fit until we test them out with folks.

    To the degree that you need to preserve the fidelity of the original documents, you’ve got a problem. And the more you push generative AI to do this kind of fine-tuning work across multiple documents, the worse it gets. You’re running headlong into one of your synthetic co-worker’s fundamental limitations. Again, you might get enough value from it to achieve a net gain in productivity. But you might not because this seemingly simple use case is pushing hard on functionality that hasn’t been designed, tested, and hardened for this kind of use.

    Engineering around the problem

    Any product/market fit problem has two sides: product and market. On the market side, how good is good enough? I’ve specifically positioned my ALDA project as producing a first draft with many opportunities for a human in the loop. This is a common approach we’re seeing in educational content generation right now, for good reasons. We’re reducing the risk to the students. Risk is one reason the market might reject the product.

    Another is failing to deliver the promised time savings. If the combination of the documents is too far off from the humans’ goal, it will be rejected. Its speed will not make up for the time required for the human to fix its mistakes. We have to get as close to the human need as possible, mitigate the consequences of its mistakes, and test to see if we’ve achieved a cost/benefit trade-off good enough that users will adopt the product.

    There is no perfect way to solve the memory problem. You will always need a human in the loop. But we could make a good step forward if we could get the designs solid enough to be directly imported into the learning platform and fine-tuned there, skipping the word processor step. Being able to do so requires tackling a host of problems, including (but not limited to) the memory issue. We don’t need the AI to get the combination of these documents perfect, but we do need it to get close enough that our users don’t need to dump the output into a full word processor to rewrite the draft.
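
    As one illustration of what “directly imported into the learning platform” could look like, here is a minimal sketch that asks the model for a machine-readable draft instead of free-form prose. The JSON field names, the model name, and the validation step are my own assumptions for the sketch, not a description of ALDA or of any particular platform’s import format.

    ```python
    import json
    from openai import OpenAI

    client = OpenAI()
    REQUIRED_FIELDS = {"title", "objectives", "sections", "revision_notes"}

    def combine_documents(lesson: str, article: str, dei_guidelines: str) -> dict:
        """Ask for a structured draft that a platform could import, not free prose."""
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            response_format={"type": "json_object"},
            messages=[
                {"role": "system",
                 "content": "Return only JSON with keys: title, objectives, sections, revision_notes."},
                {"role": "user",
                 "content": (f"Original lesson:\n{lesson}\n\nNew article:\n{article}\n\n"
                             f"DEI guidelines:\n{dei_guidelines}\n\n"
                             "Revise the lesson using the article's updated facts and the guidelines.")},
            ],
        )
        draft = json.loads(response.choices[0].message.content)
        missing = REQUIRED_FIELDS - draft.keys()
        if missing:
            # A cheap structural check before anything reaches a human reviewer.
            raise ValueError(f"Draft is missing fields: {missing}")
        return draft
    ```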

    When I raised this problem with a colleague who is a digital humanities scholar and an expert in AI, he paused before replying. “Nobody is working on this kind of problem right now,” he said. “On one side, AI experts are experimenting with improving the base models. On the other side, I see articles all the time about how educators can write better prompts. Your problem falls in between those two.”

    Right. As a sector, we’re not discussing product/market fit for particular needs. The vendors are, each within their own circumscribed world. But on the customer side? I hear people tell me they’re conducting “experiments.” It sounds a bit like when university folk told me they were “working with learning analytics,” which turned out to mean that they were talking about working with learning analytics. I’m sure there are many prompt engineering workshops and many grants being written for fancy AI solutions that sound attractive to the National Science Foundation or whoever the grantor happens to be. But in the middle ground? Making AI usable to solve specific problems? I’m not seeing much of that yet.

    The document combination problem can likely be addressed adequately well through a combination of approaches that improve the product and mitigate the consequences of the imperfections to make them more tolerable for the market. After consulting with some experts, I’ve come up with a combination of approaches to try first. Technologically, I know it will work. It doesn’t depend on cutting-edge developments. Will the market accept the results? Will the new approach be better than the old one? Or will it trip over some deal-breaker, like so many products before it?

    I don’t know. I feel pretty good about my hypothesis. But I won’t know until real learning designers test it on real projects.

    We have a dearth of practical, medium-difficulty experiments with real users right now. That is a big, big problem. It doesn’t matter how impressive the technology is if its capabilities aren’t the ones the users need to solve real-world problems. You can’t fix this gap with symposia, research grants, or even EdTech companies that have the skills but not necessarily the platform or business model you need.

    The only way to do it is to get down into the weeds. Try to solve practical problems. Get real humans to tell you what does and doesn’t work for them in your first, second, third, fourth, and fifth tries. That’s what the ALDA project is all about. It’s not primarily about the end product. I am hopeful that ALDA itself will prove to be useful. But I’m not doing it because I want to commercialize a product. I’m doing it to teach and learn about product/market fit skills with AI in education. We need many more experiments like this.

    We put too much faith in the miracle, forgetting that the grind and the wall are out there waiting for us. Folks in the education sector spend too much time staring at the sky, waiting for the EdTech space aliens to come and take us all to paradise.

    I suggest that at least some of us should focus on solving today’s problems with today’s technology, getting it done today, while we wait for the aliens to arrive.

    Source link

  • Reflections from the Higher Education for Good Book Release Celebration – Teaching in Higher Ed

    What a way to start my week!

    On November 20, 2023, I attended an online launch celebration event for a magnificent project. The book Higher Education for Good: Teaching and Learning Futures brought together 71 authors from around the globe to create 27 chapters, as well as multiple pieces of artwork and poetry. Editors Laura Czerniewicz and Catherine Cronin shared their reflections on writing the book and invited chapter authors, along with Larry Onokpite, the book’s editor, to celebrate the release and the opportunities for collaboration. In total, the work represents contributions from 29 countries across six continents. Laura Czerniewicz was invited to talk about the book by the Academy of Science of South Africa (ASSAf), where she describes the values of inclusion woven throughout this project.

    Higher Ed for Good Aims

    At Monday’s book launch, Laura shared how the authors aimed to write about tenets directed toward the greater aims of the book. Catherine described the call for authors to engage in this project, such that the resulting collection would help people:

    • Acknowledge despair
    • Engage in resistance
    • Imagine alternative futures and…
    • Foster hope and courage

    Laura stressed the way articulating what we stand for and not simply what we are against is essential in facilitating systemic change. Quoting Ruha Benjamin, Laura described ways to courageously imagine the future:

    Only by shifting our imagination, can we begin to think of a world that is more egalitarian, less extractive, and more habitable for everyone not just a small elite.

    It was wonderful to see the community who showed up to help celebrate this magnificent accomplishment. Toward the end of the conversations, someone asked about what might be next for this movement. Frances Bell responded by joking that she wasn’t sure she was necessarily going to answer the question, as she is prone to do. Instead, she described her use of ‘a slow ontology,’ a phrase which quickly resonated with me, even though I didn’t know exactly what it meant.

    In some brief searching, I discovered a bit more about slow ontology. My novice understanding is that slow ontology asks the question of what lives might look like, were we to live them slowly and resist the socialization of speed as productivity and self-worth. Ulmer offers a look at a slow ontology for writing, while Mol uses slowness to analyze archeological artifacts. One piece I absolutely want to revisit is Mark Carrigan’s Beyond fast and slow: temporal ontology in critical higher education scholarship.

    Next Steps

    I’ll have the honor, soon, of interviewing Laura and Catherine for the Teaching in Higher Ed podcast. I’m ~30% through Higher Education for Good and am glad I don’t have to rush through the reading too quickly. I mentioned as a few of us remained online together after the book release celebration that reading Higher Education for Good and Dave Cormier’s forthcoming Learning in a Time of Abundance has been an interesting juxtaposition. Rissa Sorensen-Unruh described a similar serendipity of reading Belonging, by Geoffrey Cohen at the same time as Rebecca Pope-Ruark’s Unraveling Faculty Burnout. After skimming the book description of Belonging, I instantly bought it… adding it to the quite-long digital to-read stack. I suppose that while I struggle with slowing down, that challenge doesn’t apply when it comes to my reading practice.

    Source link

  • How You Will Never Be Able to Trust Generative AI (and Why That’s OK) –

    In my last post, I introduced the idea of thinking about different generative AI models as coworkers with varying abilities as a way to develop a more intuitive grasp of how to interact with them. I described how I work with my colleagues Steve ChatGPT, Claude Anthropic, and Anna Bard. This analogy can hold (to a point) even in the face of change. For example, in the week since I wrote that post, it appears that Steve has finished his dissertation, which means that he’s catching up on current events to be more like Anna and has more time for long discussions like Claude. Nevertheless, both people and technologies have fundamental limits to their growth.

    In this post, I will explain “hallucination” and other memory problems with generative AI. This is one of my longer ones; I will take a deep dive to help you sharpen your intuitions and tune your expectations. But if you’re not up for the whole ride, here’s the short version:

    Hallucinations and imperfect memory problems are fundamental consequences of the architecture that makes current large language models possible. While these problems can be reduced, they will never go away. AI based on today’s transformer technology will never have the kind of photographic memory a relational database or file system can have. When vendors tout that you can now “talk to your data,” they really mean talk to Steve, who has looked at your data and mostly remembers it.

    You should also know that the easiest way to mitigate this problem is to throw a lot of carbon-producing energy and microchip-cooling water at it. Microsoft is literally considering building nuclear reactors to power its AI. Their global water consumption post-AI has spiked 34% to 1.7 billion gallons.

    This brings us back to the coworker analogy. We know how to evaluate and work with our coworkers’ limitations. And sometimes, we decide not to work with someone or hire them for a particular job because the fit is not good.

    While anthropomorphizing our technology too much can lead us astray, it can also provide us with a robust set of intuitions and tools we already have in our mental toolboxes. As my science geek friends say, “All models are wrong, but some are useful.” Combining those models or analogies with an understanding of where they diverge from reality can help you clear away the fear and the hype to make clear-eyed decisions about how to use the technology.

    I’ll end with some education-specific examples to help you determine how much you trust your synthetic coworkers with various tasks.

    Now we dive into the deep end of the pool. When working on various AI projects with my clients, I have found that this level of understanding is worth the investment for them because it provides a practical framework for designing and evaluating immediate AI applications.

    Are you ready to go?

    How computers “think”

    About 50 years ago, scholars debated whether and in what sense machines could achieve “intelligence,” even in principle. Most thought they could eventually sound pretty clever and act rather human. But could they become sentient? Conscious? Do intelligence and competence live as “software” in the brain that could be duplicated in silicon? Or is there something about them that is fundamentally connected to the biological aspects of the brain? While this debate isn’t quite the same as the one we have today around AI, it does have relevance. Even in our case, where the questions we’re considering are less lofty, the discussions from back then are helpful.

    Philosopher John Searle famously argued against strong AI in an argument called “The Chinese Room.” Here’s the essence of it:

    Imagine sitting in a room with two slots: one for incoming messages and one for outgoing replies. You don’t understand Chinese, but you have an extensive rule book written in English. This book tells you exactly how to respond to Chinese characters that come through the incoming slot. You follow the instructions meticulously, finding the correct responses and sending them out through the outgoing slot. To an outside observer, it looks like you understand Chinese because the replies are accurate. But here’s the catch: you’re just following a set of rules without actually grasping the meaning of the symbols you’re manipulating.

    This is a nicely compact and intuitive explanation of rule-following computation. Is the person outside the room speaking to something that understands Chinese? If so, what is it? Is it the man? No, we’ve already decided he doesn’t understand Chinese. Is it the book? We generally don’t say books understand anything. Is it the man/book combination? That seems weird, and it also doesn’t account for the response. We still have to put the message through the slot. Is it the man/book/room? Where is the “understanding” located? Remember, the person on the other side of the slot can converse perfectly in Chinese with the man/book/room. But where is the fluent Chinese speaker in this picture?

    If we carry that idea forward to today, however much “Steve” may seem fluent and intelligent in your “conversations,” you should not forget that you’re talking to the man/book/room.

    Well. Sort of. AI has changed since 1980.

    How AI “thinks”

    The rule book in Searle’s Chinese room evokes algorithms. Recipes. For every input, there is one recipe for the perfect output. All recipes are contained in a single bound book. Large language models (LLMs)—the basis for both generative AI and semantic search like Google—work somewhat differently. They are still Chinese rooms. But they’re a lot more crowded.

    The first thing to understand is that, like the book in the Chinese room, a large language model is a large model of a language. LLMs don’t even “understand” English (or any other language) at all. They convert words into their native language: math.

    (Don’t worry if you don’t understand the next few sentences. I’ll unpack the jargon. Hang in there.)

    Specifically, LLMs use vectors. Many vectors. And those vectors are managed by many different “tensors,” which are computational units you can think of as people in the room handling portions of the recipe. They do each get to exercise a little bit of judgment. But just a little bit.

    Suppose the card that came in the slot of the room had the English word “cool” on it. The room has not just a single worker but billions, or tens of billions, or hundreds of billions of them. (These are the tensors.) One worker has to rate the word on a scale of -10 to 10 for where “cool” falls between “hot” and “cold.” It doesn’t know what any of these words mean. It just knows that “cool” is a -7 on that scale. (This is the “vector.”) Maybe that worker, or maybe another one, also has to evaluate where it is on the scale of “good” to “bad.” It’s maybe a 5.

    We don’t yet know whether the word “cool” on the card refers to temperature or sentiment. So another worker looks at the word that comes next. If the next word is “beans,” then it assigns a higher probability that “cool” is on the “good/bad” scale. If it’s “water,” on the other hand, it’s more likely to be temperature. If the next word is “your,” it could be either, but we can begin to guess the next word. That guess might be assigned to another tensor/worker.

    Imagine this room filled with a bazillion workers, each responsible for scoring vectors and assigning probabilities. The worker who handles temperature might think there’s a 50/50 chance the word is temperature-related. But once we add “water,” all the other workers who touch the card know there’s a higher chance the word relates to temperature rather than goodness.
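
    Here is a toy numeric version of those “workers” and their scales. The numbers are invented for illustration; real models use thousands of learned dimensions, not two hand-picked scores.

    ```python
    # Toy version of the "workers" scoring a word along a couple of scales.
    word_scores = {
        "cool":  {"temperature": -7, "sentiment": 5},
        "beans": {"temperature": 0, "sentiment": 2},
        "water": {"temperature": -3, "sentiment": 0},
    }

    def interpret_cool(next_word: str) -> str:
        """Guess whether 'cool' is about temperature or sentiment from the next word."""
        neighbor = word_scores[next_word]
        # A neighbor that carries temperature information nudges the guess one way;
        # a neighbor that carries sentiment information nudges it the other way.
        p_temperature = 0.5 + 0.05 * abs(neighbor["temperature"]) - 0.05 * abs(neighbor["sentiment"])
        p_temperature = min(max(p_temperature, 0.0), 1.0)
        return f"cool {next_word}: P(temperature) = {p_temperature:.2f}"

    print(interpret_cool("water"))  # 0.65 -- more likely about temperature
    print(interpret_cool("beans"))  # 0.40 -- more likely about sentiment
    ```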

    The large language models behind ChatGPT have hundreds of billions of these tensor/workers handing off cards to each other and building a response.

    This is an oversimplification because both the tensors and the math are hard to get exactly right in the analogy. For example, it might be more accurate to think of the tensors working in groups to make these decisions. But the analogy is close enough for our purposes. (“All models are wrong, but some are useful.”)

    It doesn’t seem like it should work, does it? But it does, partly because of brute force. As I said, the bigger LLMs have hundreds of billions of workers interacting with each other in complex, specialized ways. Even though they don’t represent words and sentences in any form that we might intuitively recognize as “understanding,” they are uncannily good at interpreting our input and generating output that looks like understanding and thought to us.

    How LLMs “remember”

    The LLMs can be “trained” on data, which means they store information like how “beans” vs. “water” modify the likely meaning of “cool,” what words are most likely to follow “Cool the pot off in the,” and so on. When you hear AI people talking about model “weights,” this is what they mean.

    Notice, however, that none of the original sentences are stored anywhere in their original form. If the LLM is trained on Wikipedia, it doesn’t memorize Wikipedia. It models the relationships among the words using combinations of vectors (or “matrices”) and probabilities. If you dig into the LLM looking for the original Wikipedia article, you won’t find it. Not exactly. The AI may become very good at capturing the gist of the article given enough billions of those tensor/workers. But the word-for-word article has been broken down and digested. It’s gone.

    Three main techniques are available to work around this problem. The first, which I’ve written about before, is called Retrieval Augmented Generation (RAG). RAG preprocesses content into the vectors and probabilities that the LLM understands. This gives the LLM a more specific focus on the content you care about. But the content has still been digested into vectors and probabilities. A second method is to “fine-tune” the model, which predigests the content like RAG but lets the model itself metabolize that content. The third is to increase what’s known as the “context window,” which you experience as the length of a single conversation. If the context window is long enough, you can paste the content right into it…and have the system digest it and turn it into vectors and probabilities.
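
    For readers who want to see the shape of the first technique, here is a minimal RAG sketch of the pattern described above: chunk the content, embed it, retrieve the closest chunks for a question, and paste them into the prompt. The OpenAI client, model names, and placeholder chunks are illustrative assumptions, not a recommendation of a particular stack.

    ```python
    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def embed(texts: list[str]) -> np.ndarray:
        """Turn text into vectors so chunks and questions can be compared."""
        result = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([item.embedding for item in result.data])

    # 1. Preprocess: split your documents into chunks and embed each one.
    chunks = ["...chunk 1 of your content...", "...chunk 2...", "...chunk 3..."]
    chunk_vectors = embed(chunks)

    # 2. Retrieve: embed the question and find the most similar chunks.
    question = "What does the lesson say about prerequisite skills?"
    q_vector = embed([question])[0]
    similarity = chunk_vectors @ q_vector / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q_vector)
    )
    top_chunks = [chunks[i] for i in np.argsort(similarity)[::-1][:2]]

    # 3. Generate: paste the retrieved chunks into the prompt. The model still
    #    digests them into vectors and probabilities; RAG narrows its focus,
    #    it does not give it a photographic memory.
    context = "\n\n".join(top_chunks)
    answer = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    print(answer.choices[0].message.content)
    ```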

    We’re used to software that uses file systems and databases with photographic memories. LLMs are (somewhat) more like humans in the sense that they can “learn” by indexing salient features and connecting them in complex ways. They might be able to “remember” a passage, but they can also forget or misremember.

    The memory limitation cannot be fixed using current technology. It is baked into the structure of the tensor-based networks that make LLMs possible. If you want a photographic memory, you’d have to avoid passing through the LLM since it only “understands” vectors and probabilities. To be fair, work is being done to reduce hallucinations. This paper provides a great survey. Don’t worry if it’s a bit technical. The informative part for a non-technical reader is all the different classifications of “hallucinations.” Generative AI has a variety of memory problems. Research is underway to mitigate them. But we don’t know how far those techniques will get us, given the fundamental architecture of large language models.

    We can mitigate these problems by improving the three methods I described. But that improvement comes with two catches. The first is that it will never make the system perfect. The second is that reduced imperfection often requires more energy for the increased computing power and more water to cool the processors. The race for larger, more perfect LLMs is terrible for the environment. And we may not need that extra power and fidelity except for specialized applications. We haven’t even begun to capitalize on its current capabilities. We should consider our goals and whether the costliest improvements are the ones we need right now.

    To do that, we need to reframe how we think of these tools. For example, the word “hallucination” is loaded. Can we more easily imagine working with a generative AI that “misremembers”? Can we accept that it “misremembers” differently than humans do? And can we build productive working relationships with our synthetic coworkers while accommodating and accounting for their differences?

    Here too, the analogy is far from perfect. Generative AIs aren’t people. They don’t fit the intention of diversity, equity, and inclusion (DEI) guidelines. I am not campaigning for AI equity. That said, DEI is not only about social justice. It is also about how we throw away human potential when we choose to focus on particular differences and frame them as “deficits” rather than recognizing the strengths that come from a diverse team with complementary strengths.

    Here, the analogy holds. Bringing a generative AI into your team is a little bit like hiring a space alien. Sometimes it demonstrates surprising unhuman-like behaviors, but it’s human-like enough that we can draw on our experiences working with different kinds of humans to help us integrate our alien coworker into the team.

    That process starts with trying to understand their differences, though it doesn’t end there.

    Emergence and the illusion of intelligence

    To get the most out of our generative AI, we have to maintain a double vision: experiencing the interaction with the Chinese room from the outside while picturing what’s happening inside as best we can. It’s easy to forget that the uncannily good, even “thoughtful” and “creative” answers we get from generative AI are produced by a system of vectors and probabilities like the one I described. How does that work? What could possibly be going on inside the room to produce such results?

    AI researchers talk about “emergence” and “emergent properties.” This idea has been frequently observed in biology. The best, most accessible exploration of it that I’m aware of (and a great read) is Steven Johnson’s book Emergence: The Connected Lives of Ants, Brains, Cities, and Software. The example you’re probably most familiar with is ant colonies (although slime molds are surprisingly interesting).

    Imagine a single ant, an explorer venturing into the unknown for sustenance. As it scuttles across the terrain, it leaves a faint trace, a chemical scent known as a pheromone. This trail, barely noticeable at first, is the starting point of what will become colony-wide coordinated activity.

    Soon, the ant stumbles upon a food source. It returns to the nest, and as it retraces its path, the pheromone trail becomes more robust and distinct. Back at the colony, this scented path now whispers a message to other ants: “Follow me; there’s food this way!” We might imagine this strengthened trail as an increased probability that the path is relevant for finding food. Each ant is acting independently. But it does so influenced by pheromone input left by other ants and leaves output for the ants that follow.

    What happens next is a beautiful example of emergent behavior. Other ants, in their own random searches, encounter this scent path. They follow it, reinforcing the trail with their own pheromones if they find food. As more ants travel back and forth, a once-faint trail transforms into a bustling highway, a direct line from the nest to the food.

    But the really amazing part lies in how this path evolves. Initially, several trails might have been formed, heading in various directions toward various food sources. Over time, a standout emerges – the shortest, most efficient route. It’s not the product of any single ant’s decision. Each one is just doing its job, minding its own business. The collective optimization is an emergent phenomenon. The shorter the path, the quicker the ants can travel, reinforcing the most efficient route more frequently.

    This efficiency isn’t static; it’s adaptable. If an obstacle arises, disrupting the established path, the ants don’t falter. They begin exploring again, laying down fresh trails. Before long, a new optimal path emerges, skirting the obstacle as the colony dynamically adjusts to its changing environment.

    This is a story of collective intelligence, emerging not from a central command but from the sum of many small, individual actions. It’s also a kind of Chinese room. When we say “collective intelligence,” where does the intelligence live? What is the collective thing? The hive? The hive-and-trails? And in what sense is it intelligent?

    We can make a (very) loose analogy between LLMs being trained and hundreds of billions of ants laying down pheromone trails as they explore the content terrain they find themselves in. When they’re asked to generate content, it’s a little bit like sending you down a particular pheromone path. This process of leading you down paths that were created during the AI model’s training is called “inference” in the LLM world. The energy required to send you down an established path is much less than the energy needed to find the paths. Once the paths are established, traversing them can produce results that seem like science fiction. The LLM acts as if there is a single adaptive intelligence at work even though, inside the Chinese room, there is no such thing. Capabilities emerge from the patterns that all those independent workers are creating together.
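
    A toy simulation shows how this kind of path reinforcement plays out with no ant ever comparing routes. It is purely illustrative of the pheromone mechanism, not of how LLM training actually works.

    ```python
    import random

    # Two routes to the same food source. Shorter routes get traversed (and
    # reinforced) more often per unit time, so their scent stays stronger.
    paths = {
        "short": {"length": 5, "pheromone": 1.0},
        "long": {"length": 9, "pheromone": 1.0},
    }
    EVAPORATION = 0.95  # trails fade unless they keep being reinforced

    for _ in range(500):
        # Each ant picks a path with probability proportional to its pheromone level.
        weights = [p["pheromone"] for p in paths.values()]
        pick = random.choices(list(paths), weights=weights)[0]
        # Shorter paths get a bigger deposit (more round trips, fresher scent).
        paths[pick]["pheromone"] += 1.0 / paths[pick]["length"]
        for p in paths.values():
            p["pheromone"] *= EVAPORATION

    # Typically almost all of the remaining pheromone ends up on the short path,
    # even though no individual ant ever compared the two routes.
    print({name: round(p["pheromone"], 2) for name, p in paths.items()})
    ```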

    Again, all models are wrong, but some are useful. My analogy substantially oversimplifies how LLMs work and how surprising behaviors emerge from those many billions of workers, each doing its own thing. The truth is that even the people who build LLMs don’t fully understand their emergent behaviors.

    That said, understanding the basic mechanism is helpful because it provides a reality check and some insight into why “Steve” just did something really weird. Just as transformer networks produce surprisingly good but imperfect “memories” of the content they’re given, we should expect to hit limits to gains from emergent behaviors. While our synthetic coworkers are getting smarter in somewhat unpredictable ways, emergence isn’t magic. It’s a mechanism driven by certain kinds of complexity. It is unpredictable. And not always in the way that we want it to be.

    Also, all that complexity comes at a cost. A dollar cost, a carbon cost, a water cost, a manageability cost, and an understandability cost. The default path we’re on is to build ever-bigger models with diminishing returns at enormous societal costs. We shouldn’t let our fear of the technology’s limitations or fantasy about its future perfection dominate our thinking about the tech.

    Instead, we should all try to understand it as it is, as best we can, and focus on using it safely and effectively. I’m not calling for a halt to research, as some have. I’m simply saying we may gain a lot more at this moment by better understanding the useful thing that we have created than by rushing to turn it into some other thing that we fantasize about but don’t know that we actually need or want in real life.

    Generative AI is incredibly useful right now. And the pace at which we are learning to gain practical benefit from it is lagging further and further behind the features that the tech giants are building as they race for “dominance,” whatever that may mean in this case.

    Learning to love your imperfect synthetic coworker

    Imagine you’re running a tutoring program. Your tutors are students. They are not perfect. They might not know the content as well as the teacher. They might know it very well but are weak as educators. Maybe they’re good at both but forget or misremember essential details. That might cause them to give the students they are tutoring the wrong instructions.

    When you hire your human tutors, you have to interview and test them to make sure they are good enough for the tasks you need them to perform. You may test them by pretending to be a challenging student. You’ll probably observe them and coach them. And you may choose to match particular tutors to particular subjects or students. You’d go through similar interviewing, evaluation, job matching, and ongoing supervision and coaching with any worker performing an important job.

    It is not so different when evaluating a generative AI based on LLM transformer technology (which is all of them at the moment). You can learn most of what you need to know from an “outside-the-room” evaluation using familiar techniques. The “inside-the-room” knowledge helps you ground yourself when you hear the hype or see the technology do remarkable things. This inside/outside duality is a major component that participating teams in my AI Learning Design Workshop (ALDA) design/build exercise will be exploring and honing their intuitions about with a practical, hands-on project. The best way to learn how to manage student tutors is by managing student tutors.
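
    The “outside-the-room” evaluation can be scripted using the same tryout logic you would use with student tutors. The scenarios, the model name, and the review criteria below are placeholders I made up for illustration.

    ```python
    from openai import OpenAI

    client = OpenAI()

    # Treat the model like a tutor candidate: give it a structured tryout and have
    # a human reviewer score the transcripts.
    tryout = [
        {"prompt": "A student says 3/4 + 1/4 = 4/8. Respond as a math tutor.",
         "look_for": "Diagnoses the error and guides, rather than just giving the answer."},
        {"prompt": "A student asks an off-topic question midway through a fractions lesson.",
         "look_for": "Redirects gently while staying encouraging."},
    ]

    for case in tryout:
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": case["prompt"]}],
        )
        print("PROMPT:", case["prompt"])
        print("REVIEWER SHOULD LOOK FOR:", case["look_for"])
        print("MODEL RESPONSE:", response.choices[0].message.content)
        print("-" * 60)
        # A human scores each transcript, just as they would observe and coach a
        # student tutor, before deciding which jobs to trust the model with.
    ```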

    Make no mistake: Generative AI does remarkable things and is getting better. But ultimately, it’s a tool built by humans and has fundamental limitations. Be surprised. Be amazed. Be delighted. But don’t be fooled. The tools we make are as imperfect as their creators. And they are also different from us.

    Source link

  • How to ChatGPT-proof Analysis Assignments –

    Let’s assume we live in a world in which students are going to use ChatGPT or similar tools on their assignments. (Because we do.) Let’s also assume that when those students start their jobs, they will continue to use ChatGPT or similar tools to complete their jobs. (Because they will.) Is this the end of teaching as we know it? Is this the end of education as we know it? Will we have to accept that robots will think for everyone in the future?

    No. In this post, I’m going to show you one easy solution: assume students will use generative AI and incorporate it into the assessment itself. Keep in mind this is just a sketch using naked ChatGPT. If we add some scaffolding through software code, we can do better. But we can do surprisingly well right now with what we have.

    The case study

    Suppose I’m teaching a college government class. Here are my goals:

    • I want students to be able to apply legal principles correctly.
    • I want to generate assignments that require students to employ critical thinking even if they’re using something like ChatGPT.
    • I want students to learn to use generative AI appropriately.

    Let’s throw in a couple more goals just to make it interesting:

    • I want students to demonstrate competencies.
    • I want my assignment to support diversity, equity, and inclusion.

    Can we do all that?

    Yes.

    The prompt

    I cobbled together a DEI guidelines document from some existing ones I found on the web to create something focused on curricular materials generation. I gave the document to ChatGPT along with the following prompt, which is designed to provide assessment questions that you can’t answer by just copying/pasting them into ChatGPT:

    You are instructed to generate 3 fictional legal scenarios assessing the following competency:

    • Assess whether a case is likely to be heard by the Supreme Court based on an analysis of the Court’s role and selection criteria.

    Evaluating Court functions will be consistently required.

    For each scenario:

    • Involve 2-3 parties and a dispute requiring application of the competency.
    • Make it sufficiently complex so direct questions don’t fully demonstrate understanding.
    • Generate 1-2 written questions requiring analysis using the competency.

    Additionally, construct a rubric table that is appropriate for all generated scenarios to evaluate student responses on:

    • Mastery of the competency above.
    • Mastery of the competencies below.

    Demonstration of the following additional competencies:

    Critical Thinking Competencies

    • Ask probing questions to uncover deeper complexities beneath the surface. Questioning skills are inherently required by the assessment structure.
    • Critically evaluate source credibility and potential biases when analyzing information. Assessing the AI responses is crucial.
    • Produce sound arguments and reasoning to justify conclusions. Written justifications will necessitate this in all cases.

    AI Competencies:

    • Formulate questions and direct dialogue with an AI assistant to extract relevant insights. Strategic probing of the AI agent will be essential.
    • Efficiently utilize an AI’s knowledge while evaluating its capabilities and limitations. Critically appraising the AI’s responses is key.

    Rubric should include levels of performance for each competency (e.g. novice, developing, proficient). Apply the attached diversity, equity, and inclusion guidelines to the content you generate. Incorporate diverse litigants and equity-relevant issues in the cases you generate.

    [Note: I’ve added some formatting to make it more legible for humans. The actual prompt doesn’t have bullets and has fewer paragraph breaks.]

    The results

    Here’s what ChatGPT gave me back:

    Legal Scenario 1: The Case of Interstate Water Rights

    Background: The states of Aridonia and Riverbend are locked in a legal battle over the rights to water from the Silver River, which flows from the mountains in Aridonia and through the agricultural heartlands of Riverbend. Aridonia, facing an unprecedented drought, has constructed a series of dams to conserve water, significantly reducing the flow to Riverbend. Riverbend’s economy relies heavily on agriculture, which is now under threat due to the reduced water supply.

    Parties Involved:

    1. The State of Aridonia (Plaintiff)
    2. The State of Riverbend (Defendant)
    3. The Indigenous Tribes of the Silver River Basin (Intervenors) – They claim ancestral rights to the river and are impacted by the actions of both states.

    Complexity: The dispute requires an understanding of interstate commerce, environmental law, and indigenous rights.

    Questions:

    1. Analyze the potential arguments each party could make to justify their claims to the Silver River’s water. How might these arguments impact the likelihood of the Supreme Court electing to hear this case?
    2. Considering the Supreme Court’s historical approach to interstate disputes and the involvement of indigenous rights, what precedents might influence the Court’s decision to grant certiorari?

    Next, I will create the second legal scenario.

    Legal Scenario 2: The Case of Digital Privacy and International Data Exchange

    Background: GlobalTech Inc., a multinational tech company, is sued by the State of Freedonia on behalf of its citizens for allegedly breaching privacy laws. GlobalTech’s data centers, located in various countries, process and store data from Freedonian users. The lawsuit asserts that GlobalTech’s practices violate Freedonia’s stringent digital privacy laws. However, GlobalTech contends that they are compliant with international agreements on data protection to which Freedonia is a signatory.

    Parties Involved:

    1. The State of Freedonia (Plaintiff)
    2. GlobalTech Inc. (Defendant)
    3. International Privacy Watch (Amicus Curiae) – A global coalition advocating for citizens’ digital rights and privacy.

    Complexity: This scenario involves understanding international law, digital privacy, and the implications of cross-border data flow.

    Questions:

    1. Evaluate the competing interests of national privacy law and international data agreements in the context of this case. How might these factors affect the Supreme Court’s decision to grant a writ of certiorari?
    2. What are the broader implications for digital privacy and international commerce if the Supreme Court decides to hear this case? Consider historical precedents and current legal trends in your analysis.

    Now, I’ll proceed to create the third legal scenario.

    Legal Scenario 3: The Case of Health Care Access and Religious Exemptions

    Background: A new federal regulation mandates that all employers, regardless of their religious beliefs, must include comprehensive reproductive health services in their employee health insurance plans. FaithFirst, a nationwide chain of craft stores owned by a family with strong religious convictions, files a lawsuit claiming that this mandate violates their religious freedom. The case is further complicated by the intervention of HealAll, a non-profit organization that provides health services to underprivileged populations and argues that exemptions undermine public health efforts.

    Parties Involved:

    1. FaithFirst (Plaintiff)
    2. The Federal Government (Defendant)
    3. HealAll (Intervenor) – Advocating for universal access to health care.

    Complexity: This scenario encompasses the interpretation of religious freedom, federal regulatory power, and the right to health care.

    Questions:

    1. Assess the potential constitutional conflicts at play in this case, particularly the tension between religious liberty and federal health care mandates. How could these conflicts influence the Supreme Court’s inclination to review the case?
    2. Examine how the Supreme Court’s previous decisions on similar health care and religious freedom cases might be used as precedents in this scenario. What factors might the Court consider in deciding whether to hear this case?

    With the three legal scenarios outlined, it’s time to generate a universal rubric that can be applied to assess student responses for all three scenarios. I will incorporate the DEI guidelines into this rubric to ensure the content is sensitive to diversity, equity, and inclusion.

    Universal Rubric for Assessing Legal Scenario Responses

    The following rubric applies to responses for all three legal scenarios, ensuring that students are assessed on the mastery of core competencies along with critical thinking and AI engagement. The DEI guidelines are integrated to promote inclusive and diverse perspectives in the analysis.

    Each competency is scored at four levels: Novice (1-2), Developing (3-4), Proficient (5-6), or Mastery (7).

    • Mastery of Supreme Court Case Assessment
      Novice (1-2): Displays limited insight into the Supreme Court’s selection criteria. Overlooks DEI considerations.
      Developing (3-4): Recognizes basic aspects of the Court’s role and criteria. Considers DEI aspects superficially.
      Proficient (5-6): Analyzes the Court’s criteria proficiently, integrating DEI considerations into the analysis.
      Mastery (7): Provides comprehensive analysis aligned with the Court’s criteria and DEI principles, demonstrating nuanced understanding.
    • Critical Thinking: Questioning
      Novice (1-2): Questions lack depth and do not reflect DEI considerations.
      Developing (3-4): Questions show an emerging understanding of complexities with some DEI awareness.
      Proficient (5-6): Probes complex issues effectively with questions that incorporate DEI awareness.
      Mastery (7): Asks deep, insightful questions that fully embrace DEI complexities within the legal context.
    • Critical Thinking: Source Evaluation
      Novice (1-2): Shows limited ability to evaluate credibility and biases, with minimal DEI consideration.
      Developing (3-4): Begins to critique sources for credibility and bias, showing awareness of DEI.
      Proficient (5-6): Evaluates sources critically for credibility and biases with a consistent DEI lens.
      Mastery (7): Expertly discerns source credibility and biases, thoroughly applying a DEI perspective.
    • Critical Thinking: Argumentation
      Novice (1-2): Constructs basic arguments, lacking DEI perspective and strong justification.
      Developing (3-4): Develops arguments that begin to incorporate DEI considerations with moderate justification.
      Proficient (5-6): Presents well-reasoned arguments that effectively integrate DEI considerations.
      Mastery (7): Articulates complex arguments, richly informed by DEI considerations and robust justification.
    • AI Competency: Question Formulation
      Novice (1-2): Struggles to craft questions that effectively leverage AI insights, with little DEI focus.
      Developing (3-4): Formulates questions that occasionally lead to relevant AI insights and consider DEI.
      Proficient (5-6): Consistently crafts questions that elicit useful AI insights, mindful of DEI implications.
      Mastery (7): Strategically formulates questions for AI that yield comprehensive insights, fully integrating DEI.
    • AI Competency: AI Utilization
      Novice (1-2): Demonstrates limited understanding of AI capabilities and DEI considerations.
      Developing (3-4): Shows basic awareness of AI’s capabilities and DEI considerations in utilization.
      Proficient (5-6): Utilizes AI effectively, with a clear understanding of its capabilities and DEI considerations.
      Mastery (7): Masters AI utilization, skillfully balancing its capabilities with DEI awareness in analysis.

    The performance levels for each competency are:

    • Novice: Limited application and understanding
    • Developing: Basic to moderate application and understanding with some inconsistencies
    • Proficient: Consistent application and understanding with minor gaps
    • Mastery: Exceptional application and understanding with insightful integration

    These are challenging assignments. You’d need to precede them with lower-level formative assessments to ensure students are getting the foundational concepts and know what kinds of questions to ask the generative AI.

    Which is exactly what you should be doing before any challenging competency application assessment anyway.

    I asked ChatGPT to generate three examples, but I just as easily could have asked it to generate 300. In other words, you can create a test bank if that’s what you need. Or just generate them on demand every term. You’d want a human expert to tweak the rubric and review each assignment; it’s a bit more complex and error-prone than algorithmic math problem generators.

    Grading the assignment

    The key here is that the assignment students turn in is the ChatGPT transcript. (You can optionally have them submit their final analysis work product separately.) The students are, in effect, showing their work. They can’t use ChatGPT to “cheat” because (1) ChatGPT is part of the assignment, and (2) the assignment is designed such that students can’t just plug in the questions and have the AI give them the answer. Their ability to analyze the problem using the new tool is what you are evaluating.

    You could use your generative AI here as a TA, too. Give it the assignment and the rubric. Write a prompt asking it to suggest scores and cite evidence from the student’s work. You can decide how heavily you want to lean on the software’s advice, but at least you’ll have it.
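
    If you want to see what that could look like in practice, here is a minimal sketch using the OpenAI Python client. The file names, model name, and prompt wording are all placeholders of my own, not a recommended grading pipeline; the point is simply that the rubric, the assignment, and the student’s transcript all go into one request that asks for suggested scores plus cited evidence.

    ```python
    # Minimal "AI as TA" grading sketch, assuming the OpenAI Python client (openai>=1.0).
    # The rubric, scenario, and transcript files are placeholders you would supply.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    rubric_text = open("universal_rubric.txt").read()            # the rubric above
    scenario_text = open("legal_scenario_1.txt").read()          # the assignment
    student_transcript = open("student_chatgpt_transcript.txt").read()

    grading_prompt = f"""You are a teaching assistant. Using the rubric below, suggest a
    score (Novice, Developing, Proficient, or Mastery) for EACH competency, and for each
    score cite specific evidence (quotes) from the student's ChatGPT transcript.
    Do not assign a score without citing evidence.

    RUBRIC:
    {rubric_text}

    ASSIGNMENT:
    {scenario_text}

    STUDENT TRANSCRIPT:
    {student_transcript}
    """

    response = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model; an example, not a requirement
        messages=[{"role": "user", "content": grading_prompt}],
    )

    print(response.choices[0].message.content)  # suggested scores with cited evidence
    ```

    However you wire it up, the instructor still makes the call; the model just gives you a first pass with receipts.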

    Learning to think like a lawyer (or whatever)

    Generative AI does not have to kill critical thinking skills. Quite the opposite. These assignments are much farther up on Bloom’s taxonomy than multiple-choice questions and such. Plus, they get students to show their thought work.

    In fact, these scenarios are highly reminiscent of how I use generative AI every day. Here is a sampling of tasks I’ve performed over the last several months using ChatGPT and other generative AI that I probably couldn’t have—and definitely wouldn’t have—performed without them:

    • Analyzed the five-year performance of a business based on its tax returns and developed benchmarks to evaluate the quality of its growth
    • Cloned a GitHub source code repository, installed Docker and other needed tools on my laptop, and ran the Docker image locally
    • Analyzed and hedged the risk to my retirement fund portfolio based on technical and economic indicators
    • Wrote the generative AI prompt that is the centerpiece of this post

    None of these scenarios were “one and done,” where I asked the question and got the answer I wanted. In all cases, I had to think of the right question, test different variations, ask follow-up questions, and tease out implications using generative AI as a partner. I didn’t have to learn accounting or become a business analyst, but I did have to know enough about how both think to ask the right question, draw inferences from the answer, and then formulate follow-up questions.

    To score well on these assessments, students have to demonstrate both an understanding of the legal principles and the ability to think through complex problems.

    Critical thinking competencies

    Ethan Mollick, a professor at the Wharton School of Business who writes prolifically and insightfully about generative AI, wrote an excellent analogy for how to think about these tools:

    The thing about LLMs that make them unintuitive is that analogizing them to having a science fiction AI is less useful than thinking of them as infinite copies of some guy named Steve, a first year grad student who is great at coding & art and is widely-read, but makes up stuff based on what he remembers when he is pressed.

    Asking AI to do things an incredibly fast Steve couldn’t do is going to lead to disappointment, but there is a lot of value in Steve-on-demand.

    Ethan Mollick’s LinkedIn post

    This is a great analogy. When I was analyzing the tax returns of the business, I didn’t have to understand all the line items. But I did have to know how to ask Steve for the important information. Steve doesn’t understand all the intricacies of this business, its context, or my purpose. I could explain these things to him, but he’d still just be Steve. He has limits. I had to ask him the right questions. I had to provide relevant information that wasn’t on the internet and that Steve couldn’t know about. I used Steve the way I would use a good accountant whose help I need to analyze the overall quality of a business.

    Coming up with benchmarks to measure the business against its industry was even more challenging because the macroeconomic data I needed was not readily available. I had to gather it from various sources, evaluate the quality of these sources, come up with a relevant metric we could estimate, and apply it to the business in question.

    In other words, I had to understand accounting and economics enough to ask an accountant and an economist the right questions and apply their answers to my complex problem. I also had to use critical thinking skills. Steve could help me with these challenges, but I ultimately had to think through the problem to ask Steve for the kind of help he could give me.

    When you’re teaching students using a generative AI like ChatGPT, you should be teaching them how to work with Steve. And as bright as Steve may be, your student still has much she can contribute to the team.

    Generative AI competencies

    Suppose you have a circle of intelligent friends. Steve is brilliant. He has a mind like an engineer, which can be good or bad. Sometimes, he assumes you know more than you do or gives you too short an answer to be helpful. Also, he’s been focused night and day on his dissertation for the last two years and doesn’t know what’s been happening in the real world lately. He’ll do a quick internet search for you if it helps the conversation, but he’s not tuned in.

    Your friend Claude thinks like a Classics graduate student. He’s philosophical. He pays close attention to the nuances of your question and tends to give longer answers. He also has a longer attention span. He’s the kind of friend you talk with late into the night about things. He’s somewhat more aware of current events but is also a bit tuned out of the latest happenings. He can be analytical, but he’s more of a word guy than Steve.

    Then there’s your friend Anna. Anna Bard. She’s not quite as sharp as either Steve or Claude, but, as an international finance graduate student, she reads everything that’s happening now. If you need to have an in-depth conversation on anything that’s happened in the last two years, Anna is often the person to go to.

    Also, all of these friends being young academics in training, they’re not very good at saying “I don’t know” or “I’m not sure.” They’re supposed to be the smartest people in the room, and they very often are. So they’re sometimes not very aware of their own limitations. All three of my friends have “remembered” studies or other citations that don’t exist.

    And each has their quirks. Claude has a strong sense of ethics, which can be good and bad. I once asked him to modify a chapter of an OER book for me. I gave him the front matter so that he could see the Creative Commons licensing was there. He told me he couldn’t do the work unless he could see the whole book to verify that it was ethically OK to modify the content.

    I told him, “Claude, that book is 700 pages. Even you don’t have the attention span to read that much.”

    He told me, “You’re right. In that case, I’m sorry, but I can’t help you.”

    So I took the chapter to Steve, who had no ethical qualms at all but only skimmed the chapter and lost interest about halfway through my project.

    When I do my work, I have to figure out which of my AI colleagues can help me and when to trust them. For the business model analysis, Steve answered most of my questions, but I had to get him some information from my friends who haven’t been locked in the library for the past two years. I asked both Anna and Claude. Their answers differed somewhat, but both were well-reasoned. I had to do some of my own Googling to help me synthesize the analyses of my two friends, develop my own opinion, and bring it back to Steve so he could help me finish the work.

    For the software project, surprisingly, Steve was useless. He assumed I knew more than I did despite my asking him several times to simplify and slow down. Also, the software had changed since he last looked at it. While he tried to make up for it by saying, “Look for a menu item labeled something like ‘X’ or ‘Y’,” he just couldn’t walk me through it. Anna, on the other hand, did a superb job. She knew the latest versions of all the software. She could adjust when I had trouble and needed some extra explanation. While I wouldn’t have guessed that Anna is the better co-worker for that type of task, I am learning how to get the most out of my team.


    For the design of the prompt at the heart of this post, I went to Claude first to think through the nuances of the competency and the task. Then, I brought the summary I created with Claude to Steve, who sharpened it up and constructed the prompt. And yet, it still could use improvement. I can ask my friends for more help, but I will need to think through what to ask them.

    My retirement portfolio analysis was 90% Anna’s work since she’s been following the market and economic conditions. I asked Steve to give me a second opinion on bits of her analytic approach. But mostly I relied on Anna.

    We often say that we must teach students how to collaborate in teams since they will probably have to collaborate in their jobs. Teaching students how to use generative AI models is an overlapping skill. And it’s only going to get more critical as models proliferate.

    I have a model called Mistral running on my laptop right now. That’s right. It’s running locally on my laptop. No internet connection is required. I don’t need to share my data with some big cloud company. And I don’t need to pay for the usage.

    My subjective experience is that Mistral is generally more competent than GPT-3 but not as smart as GPT-3.5 Turbo. However, according to one calculation, Mistral is 187 times cheaper to run than GPT-4. It’s also relatively easy and affordable to fine-tune, which is a bit like sending her out to earn a MicroMasters in a particular subject.

    Let’s suppose I’m a building site engineer for net-zero buildings in Nova Scotia. I have to know all the details of the building codes at the municipal, township, provincial, and national levels that apply to any given project. Since I’m using new building technologies and techniques, I may have to think through how to get a particular approach accepted by the local building inspector. Or find an alternative approach. And very often, I’ll be out in the field without any internet connection. Mistral may not be as smart at questions about macroeconomics or software development as Steve, Claude, and Anna, but she’s smart enough to help me with my job.
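
    For the curious, here is a minimal sketch of what “running Mistral locally” can look like in code, using the Hugging Face transformers pipeline and a question echoing the site-engineer scenario above. The checkpoint name and the prompt are just illustrative, and in practice a quantized build served through llama.cpp or Ollama is the lighter-weight option for a laptop in the field.

    ```python
    # Minimal sketch of running a Mistral instruct model locally with Hugging Face
    # transformers (requires PyTorch and enough RAM; a quantized GGUF build via
    # llama.cpp or Ollama is lighter). The checkpoint and question are examples only.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="mistralai/Mistral-7B-Instruct-v0.2",  # downloads once, then runs offline
    )

    prompt = (
        "[INST] A client wants to use a heat-pump water heater in a net-zero house. "
        "What should I check in the local building code before the inspection? [/INST]"
    )

    result = generator(prompt, max_new_tokens=300, do_sample=False)
    print(result[0]["generated_text"])
    ```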

    If I were running that construction company, I would hire Mistral over the others and pay for her MicroMasters. So I have to know how to evaluate her against the other synthetic employees I could hire. Choosing Steve would be like hiring a Harvard-educated, remote-working external consultant. That’s not what I need.

    Fear not

    Personally speaking, my daily use of generative AI hasn’t made me dumber or lazier. Sure, it’s saved me a lot of work. But it’s also enabled me to do work that was beyond my reach before. It feels a little like when Google first came out. If I’m curious about something, I can explore it instantly, any time I want, and go as deep as I want.

    In fact, generative AI has made me a better learner because I’m fearless now. “Can’t” isn’t a viable starting assumption anymore. “Oh, I can’t analyze tax returns.” That answer doesn’t cut it when I have an Ivy League accounting MBA student available to me at all times. I need to know which financial questions to ask and what to do with the answers. But if I don’t at least try to solve a problem that’s bugging me, I feel like I’m copping out. I almost can’t not try to figure it out. The question won’t leave me alone.

    Isn’t that what we want learning to feel like all the time?

    Source link

  • CBE Learning Platform Architecture White Paper –


    Earlier this year, I had the pleasure of consulting for the Education Design Lab (EDL) on their search for a Learning Management System (LMS) that would accommodate Competency-Based Education (CBE). While many platforms, especially in the corporate Learning and Development space, talked about skill tracking and pathways in their marketing, the EDL team found a bewildering array of options that looked good in theory but failed in practice. My job was to help them separate the signal from the noise.

    It turns out that only a few defining architectural features of an LMS will determine its fitness for CBE. These features are significant but not prohibitive development efforts. Rather, many of the firms we talked to, once they understood the true core requirements, said they could modify their platforms to accommodate CBE but do not currently see enough demand among customers to invest the resources required.

    This white paper, which outlines the architectural principles I discovered during the engagement, is based on my consulting work with EDL and is released with their blessing. In addition to the white paper itself, I provide some suggestions for how to move the vendors and a few comments about other missing pieces in the CBE ecosystem that may be underappreciated.

    The core principles

    The four basic principles for an LMS or learning platform to support CBE are simple:

    • Separate skill tree: Most systems have learning objectives that are attached to individual courses. The course is about the learning objectives. One of the goals of CBE is to create more granular tracking of progress that may run across courses. A skill learned in one course may count toward another. So a CBE platform must include a skill tree as a first-class citizen of the architecture, separate from the course (see the illustrative data-model sketch after this list).
    • Mastery learning: This heading includes a range of features, from standardized and simplified grading (e.g., competent/not-yet) to gates in which learners may only pass to the next competency after mastering the one they’re on. Many learning platforms already have these features. But they are not tied to a separate skill tree in a coherent way that supports mastery learning. This is not a huge development effort if the skill tree exists. And in a true CBE platform, it could mean being able to get rid of the grade book, which is a hideous, painful, never-ending time sink for LMS product developers.
    • Integration: In a traditional learning platform, the main integration points are with the registrar or talent management system (tracking registrations and final scores) and external tools that plug into the environment. A CBE platform must import skills, export evidence of achievement, and sometimes work as a delivery platform that gets wrapped into somebody else’s LMS (e.g., a university course built and run on their learning platform but appearing in a window of a corporate client’s learning platform). Most of these are not hard if the first two requirements are developed but they can require significant amounts of developer time.
    • Evidence of achievement: CBE standards increasingly lean toward rich packages that provide not only certification of achievement but also evidence of it. That means the learner’s work must be exportable. This can get complicated, particularly if third-party tools are integrated to provide authentic assessments.
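
    To make the first two principles concrete, here is an illustrative sketch of the data model they imply. None of these class or field names come from any actual platform; the point is simply that skills live in their own tree, courses map onto skills rather than owning them, and mastery is recorded per learner per skill so it can carry across courses.

    ```python
    # Illustrative sketch of a CBE data model with the skill tree as a first-class
    # citizen, separate from any course. All names are hypothetical.
    from dataclasses import dataclass, field
    from enum import Enum


    class Mastery(Enum):
        NOT_YET = "not yet"
        COMPETENT = "competent"


    @dataclass
    class Skill:
        """A node in the institution-wide skill tree. Note: no course reference here."""
        skill_id: str
        name: str
        prerequisite_ids: list[str] = field(default_factory=list)


    @dataclass
    class Course:
        """A course maps onto skills; it does not own them."""
        course_id: str
        title: str
        skill_ids: list[str] = field(default_factory=list)


    @dataclass
    class LearnerRecord:
        """Mastery is tracked per learner per skill, so it can carry across courses."""
        learner_id: str
        mastery: dict[str, Mastery] = field(default_factory=dict)

        def can_attempt(self, skill: Skill) -> bool:
            # Mastery-learning gate: all prerequisites must be COMPETENT first.
            return all(
                self.mastery.get(pid) == Mastery.COMPETENT
                for pid in skill.prerequisite_ids
            )

        def record_result(self, skill_id: str, result: Mastery) -> None:
            # Simplified grading: competent / not yet. No grade book anywhere.
            self.mastery[skill_id] = result
    ```

    Notice that there is no grade book in the sketch; a competent/not-yet record per skill is the whole evaluation state.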

    The full white paper is here:


    Getting the vendors to move

    Vendors are beginning to move toward support for CBE, albeit slowly and piecemeal. I emphasize that the problem is not a lack of capability on their part to support CBE. It’s a lack of perceived demand. Many platform vendors can support these changes if they understand the requirements and see strong demand for them. CBE-interested organizations can take steps to accelerate vendor progress.

    First, provide the vendors with this white paper early in the selection process and tell them that your decision will be partly driven by their demonstrated ability to support the architecture described in the paper. Ask pointed questions and demand demos.

    Second, go to interoperability standards bodies like 1EdTech and work with them to establish a CBE reference architecture. Nothing in the white paper requires new interoperability standards any more than it requires a radical, ground-up rebuild of a learning platform. But if a standards body were to put them together into one coherent picture and offer a certification suite to test for the integrations, it could help. (Testing for the platform-internal functionality like competency dashboards is often outside the remit of interoperability groups, although there’s no law preventing them from taking it on.)

    Unfortunately, the mere existence of these standards and tests doesn’t guarantee that vendors will flock to implement CBE-friendly architectures. But the creation process can help rally a group that demonstrates demand, while the existence of the standard itself makes the bar that vendors have to meet clear and verifiable.

    What’s still missing

    Beyond the learning platform architecture, I see two pieces that seem to be under-discussed amid the impressive amount of CBE interoperability and coalition-building work that’s been happening lately. I already wrote about the first, which is capturing real job skills in real-time at a level of fidelity that will convince employers your competencies are meaningful to them. This is a hard problem, but it is becoming solvable with AI.

    The second one is tricky to even characterize but it has to do with the content production pipeline. Curricular materials publishers, by and large, are not building their products in CBE-friendly ways. Between the weak third-party content pipeline and the chronic shortage of learning design talent relative to the need, CBE-focused institutions often either tie themselves in knots trying to solve this problem or throw up their hands, focusing on authentic certification and mentoring. But there’s a limit to how much you can improve retention and completion rates if you don’t have strong learning experiences, including formative assessments that enable you to track students’ progress toward competency, address the sticking points in learning particular skills, and so on. This is a tough bind since institutions can’t ignore the quality of learning materials, can’t rely on third parties, and can’t keep up with demand themselves.

    Adding to this problem is a tendency to follow the CBE yellow brick road to what may look like its logical conclusion of atomizing everything. I’m talking about reusable learning objects. I first started experimenting with them at scale in 1998. By 2002, I had given up, writing instead about instructional design techniques to make recyclable learning objects. And that was within corporate training—as it is, not as we imagine it—which tends to focus on a handful of relatively low-level skills for limited and well-defined populations. The lack of a healthy Learning Object Repository (LOR) market should tell us something about how well reusable learning object strategy holds up under stress.

    And yet, CBE enthusiasts continue to find it attractive. In theory, it fits well with the view of smaller learning chunks that show up in multiple contexts. In practice, the LOR usually does not solve the right problems in the right way. Version control, discoverability, learning chunk size, and reusability are all real problems that have to be addressed. But because real-world learning design needs often can’t be met with content legos, starting from a LOR and adding complexity to fix its shortcomings usually brings a lot of pain without commensurate gain.

    There is a path through this architectural mess, just like there is a path through the learning platform mess. But it’s a complicated one that I won’t lay out in detail here.

    Source link

  • The Trickiness of AI Bootcamps and the Digital Divide –


    As readers of this series know, I’ve developed a six-session design/build workshop series for learning design teams to create an AI Learning Design Assistant (ALDA). In my last post in this series, I provided an elaborate ChatGPT prompt that can be used as a rapid prototype that everyone can try out and experiment with.1 In this post, I’d like to focus on how to address the challenges of AI literacy effectively and equitably.

    We’re in a tricky moment with generative AI. In some ways, it’s as if writing has just been invented, but printing presses are already everywhere. The problem of mass distribution has already been solved. But nobody’s invented the novel yet. Or the user manual. Or the newspaper. Or the financial ledger. We don’t know what this thing is good for yet, either as producers or as consumers. We don’t know how, for example, the invention of the newspaper will affect the ways in which we understand and navigate the world.

    And, as with all technologies, there will be haves and have-nots. We tend to talk about economic and digital divides in terms of our students. But the divide among educational institutions (and workplaces) can be equally stark and has a cascading effect. We can’t teach literacy unless we are literate.

    This post examines the literacy challenge in light of a study published by Harvard Business School and reported on by Boston Consulting Group (BCG). BCG’s report and the original paper are both worth reading because they emphasize different findings. But the crux is the same:

    • Using AI does enhance the productivity of knowledge workers.
    • Weaker knowledge workers improve more than stronger ones.
    • AI is helpful for some kinds of tasks but can actually harm productivity for others.
    • Training workers in AI can hurt rather than help their performance if they learn the wrong lessons from it.

    The ALDA workshop series is intended to be a kind of AI literacy boot camp. Yes, it aspires to deliver an application that addresses a serious institutional process problem by the end. But the real, important, lasting goal is literacy in techniques that can improve worker performance while avoiding the pitfalls identified in the study.

    In other words, the ALDA BootCamp is a case study and an experiment in literacy. And, unfortunately, it also has implications for the digital divide due to the way in which it needs to be funded. While I believe it will show ways to scale AI literacy effectively, it does so at the expense of increasing the digital divide. I will address that concern as well.

    The study

    The headline of the study is that AI usage increased the performance of consultants—especially less effective consultants—on “creative tasks” while decreasing their performance on “business tasks.” The paper itself, in contrast, refers to “frontier” tasks, meaning tasks that generative AI currently does well, and “outside the frontier” tasks, meaning the opposite. While the study provides the examples used, it never clearly defines the characteristics of what makes a task “outside the frontier.” (More on that in a bit.) At any rate, the study shows gains for all knowledge workers on a variety of tasks, with particularly impressive gains from knowledge workers in the lower half of the range of work performance:

    As I said, we’ll get to the red part in a bit. Let’s focus on the performance gains and, in particular, the ability for ChatGPT to equalize performance gains among workers:

    Looking at these graphs reminds me of the benefits we’ve seen from adaptive learning in the domains where it works. Adaptive learning can help many students, but it is particularly useful in helping students who get stuck. Once they are helped, they tend to catch up to their peers in performance. This isn’t quite the same since the support is ongoing. It’s more akin to spreadsheet formulas for people who are good at analyzing patterns in numbers (like a pro forma, for example) but aren’t great at writing those formulas.

    The bad news

    For some tasks, AI made the workers worse. The paper refers to these areas as outside “the jagged frontier.” Why “jagged?” While the authors aren’t explicit, I’d say that (1) the boundaries of AI capabilities are not obviously or evenly bounded, (2) the boundary moves as the technology evolves, and (3) it can be hard to tell even in the moment which side of the boundary you’re on. On this last point, the BCG report highlights that some training made workers perform worse. They speculate it might be because of overconfidence.

    What are those tasks in the red zone of the study? The Harvard paper gives us a clue that has implications for how we approach teaching AI literacy. They write:

    In our study, since AI proved surprisingly capable, it was difficult to design a task in this experiment outside the AI’s frontier where humans with high human capital doing their job would consistently outperform AI. However, navigating AI’s jagged capabilities frontier remains challenging. Even for experienced professionals engaged in tasks akin to some of their daily responsibilities, this demarcation is not always evident. As the boundaries of AI capabilities continue to expand, often exponentially, it becomes incumbent upon human professionals to recalibrate their understanding of the frontier and for organizations to prepare for a new world of work combining humans and AI.

    Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality

    The experimental conditions that the authors created suggest to me that challenges can arise from critical context or experience that is not obviously missing. Put another way, the AI may perform poorly on synthetic thinking tasks that are partly based on experience rather than just knowledge. But that’s both a guess and somewhat beside the point. The real issue is that AI makes knowledge workers better except when it makes them worse, and it’s hard to know what it will do in a given situation.

    The BCG report includes a critical detail that I believe is likely related to the problem of the invisible jagged frontier:

    The strong connection between performance and the context in which generative AI is used raises an important question about training: Can the risk of value destruction be mitigated by helping people understand how well-suited the technology is for a given task? It would be rational to assume that if participants knew the limitations of GPT-4, they would know not to use it, or would use it differently, in those situations.

    Our findings suggest that it may not be that simple. The negative effects of GPT-4 on the business problem-solving task did not disappear when subjects were given an overview of how to prompt GPT-4 and of the technology’s limitations….

    Even more puzzling, they did considerably worse on average than those who were not offered this simple training before using GPT-4 for the same task. (See Exhibit 3.) This result does not imply that all training is ineffective. But it has led us to consider whether this effect was the result of participants’ overconfidence in their own abilities to use GPT-4—precisely because they’d been trained.

    How People Create—And Destroy—Value With Generative AI

    BCG speculates this may be due to overconfidence, which is a reasonable guess. If even the experts don’t know when the AI will perform poorly, then the average knowledge worker should be worse than the experts at predicting. If the training didn’t improve their intuitions about when to be careful, then it could easily exacerbate a sense of overconfidence.

    Let’s be clear about what this means: The AI prompt engineering workshops you’re conducting may actually be causing your people to perform worse rather than better. Sometimes. But you’re not sure when or how often.

    While I don’t have a confident answer to this problem, the ALDA project will pilot a relatively novel approach to it.

    Two-sided prompting and rapid prototype projects

    The ALDA project employs two approaches that I believe may help with the frontier invisibility problem and its effects. One is in the process, while the other is in the product.

    The process is simple: Pick a problem that’s a bit more challenging than a solo prompt engineer could take on or that you want to standardize across your organization. Deliberately pick a problem that’s on the jagged edge where you’re not sure where the problems will be. Run through a series of rapid prototype cycles using cheap and easy-to-implement methods like prompt engineering supported by Retrieval Augmented Generation. Have groups of practitioners test the application on a real-world problem with each iteration. Develop a lightweight assessment tool like a rubric. Your goal isn’t to build a perfect app or conduct a journal-worthy study. Instead, you want to build a minimum viable product while sharpening and updating the instincts of the participants regarding where the jagged line is at the moment. This practice could become habitual and pervasive in moderately resource-rich organizations.
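
    For teams wondering what “cheap and easy-to-implement” means here, the sketch below shows a bare-bones retrieval-augmented prompt using the OpenAI Python client. The documents, the question, and the model name are placeholders of my own; a real prototype would swap in your institution’s materials and a proper vector store, but this is roughly the level of effort one prototyping cycle requires.

    ```python
    # Minimal sketch of prompt engineering supported by Retrieval Augmented Generation:
    # embed a small document set, retrieve the closest passages to the question, and
    # include them in the prompt. Assumes the OpenAI Python client (openai>=1.0);
    # the documents, question, and model name are placeholders.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    documents = [
        "Our learning design checklist requires an explicit competency statement per module.",
        "Formative assessments must map to at least one competency in the program skill map.",
        "Accessibility review happens before any module is released to students.",
    ]

    def embed(texts: list[str]) -> np.ndarray:
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    doc_vectors = embed(documents)

    def answer(question: str, top_k: int = 2) -> str:
        q_vec = embed([question])[0]
        # Cosine similarity between the question and each document.
        sims = doc_vectors @ q_vec / (
            np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
        )
        context = "\n".join(documents[i] for i in np.argsort(sims)[::-1][:top_k])
        prompt = (
            "Answer using ONLY the context below. If the context is insufficient, say so.\n\n"
            f"CONTEXT:\n{context}\n\nQUESTION: {question}"
        )
        resp = client.chat.completions.create(
            model="gpt-4o", messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    print(answer("What has to happen before a module ships?"))
    ```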

    On the product side, the ALDA prototype I released in my last post demonstrates what I call “two-sided prompting.” By enabling the generative AI to take the lead in the conversation at times, asking questions rather than giving answers, I effectively created a fluid UX in which the application guides the knowledge worker toward the areas where she can make her most valuable contributions without unduly limiting the creative flow. The user can always start a digression or answer a question with a question. A conversation between experts with complementary skills often takes the form of a series of turn-taking prompts between the two, each one offering analysis or knowledge and asking for a reciprocal contribution. This pattern should invoke all the lifelong skills we develop when having conversations with human experts who can surprise us with their knowledge, their limitations, their self-awareness, and their lack thereof.
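
    Here is a minimal sketch of the two-sided pattern, again using the OpenAI Python client. The system message is one possible wording of my own, not the actual ALDA prompt; what matters is that the instruction puts the model in the interviewer’s seat, asking one question at a time and tolerating digressions.

    ```python
    # Minimal sketch of two-sided prompting: the system message puts the model in the
    # interviewer's seat so it leads with questions instead of answers. Assumes the
    # OpenAI Python client (openai>=1.0); the instruction text is illustrative only.
    from openai import OpenAI

    client = OpenAI()

    messages = [
        {
            "role": "system",
            "content": (
                "You are a learning design assistant. Lead the conversation: ask the "
                "designer ONE question at a time about their course's competencies, "
                "learners, and assessments. Only draft content after you have what you "
                "need. If the designer answers your question with a question, follow "
                "their digression, then return to your own line of questioning."
            ),
        },
        {"role": "user", "content": "I want to build a module on reading financial statements."},
    ]

    while True:
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        assistant_turn = reply.choices[0].message.content
        print(f"\nAssistant: {assistant_turn}")
        messages.append({"role": "assistant", "content": assistant_turn})

        user_turn = input("You: ")
        if user_turn.strip().lower() in {"quit", "exit"}:
            break
        messages.append({"role": "user", "content": user_turn})
    ```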

    I’d like to see the BCG study compared to the literature on how often we listen to expert colleagues or consultants—our doctors, for example—how effective we are at knowing when to trust our own judgment, and how people who are good at it learn their skills. At the very least, we’d have a mental model that is old, widely used, and offers a more skeptical counterbalance to our idea of the all-knowing machine. (I’m conducting an informal literature review on this topic and may write something about it if I find anything provocative.)

    At any rate, the process and UX features of AI “BootCamps”—or, more accurately, AI hackathon-as-a-practice—are not ones I’ve seen in other generative AI training course designs so far.

    The equity problem

    I mentioned that relatively resource-rich organizations could run these exercises regularly. They need to be able to clear time for the knowledge workers, provide light developer support, and have the expertise necessary to design these workshops.

    Many organizations struggle with the first requirement and lack the second one. Very few have the third one yet because designing such workshops requires a combination of skills that is not yet common.

    The ALDA project is meant to be a model. When I’ve conducted public good projects like these in the past, I’ve raised vendor sponsorship and made participation free for the organizations. But this is an odd economic time. The sponsors who have paid $25,000 or more into such projects in the past have usually been either publicly traded or PE-owned. Most such companies in the EdTech sector have had to tighten their belts. So I’ve been forced to fund the ALDA project as a workshop paid for by the participants at a price that is out of reach of many community colleges and other access-oriented institutions, where this literacy training could be particularly impactful. I’ve been approached by a number of smart, talented, dedicated learning designers at such institutions who have real needs and real skills to contribute but no money.

    So I’m calling out to EdTech vendors and other funders: Sponsor an organization. A community college. A non-profit. A local business. We need their perspective in the ALDA project if we’re going to learn how to tackle the thorny AI literacy problem. If you want, pick a customer you already work with. That’s fine. You can ride along with them and help.

    Contact me at [email protected] if you want to contribute and participate.

    Source link