Tag: Generative

  • Will the use of generative AI shift higher education from a knowledge-first system to a skills-first system?

    On the eve of the release of HEPI’s Student Generative AI Survey 2025, HEPI hosted a roundtable dinner with the report’s sponsor, Kortext, and invited guests to discuss the following essay question:

    How will AI change the university experience for the next generation?

    This was the third roundtable discussion we have hosted with Kortext on AI, over three years. Observing the debate mature from a cautious, risk-averse response to this forward-looking, employability-focused discussion has been fascinating. We spent much of the evening discussing a potential pivot for teaching and learning in the sector.

    The higher education sector places the highest importance on creating, collecting, and applying knowledge. ‘Traditional’ assessments have focused on the recollection of knowledge (exams) or the organisation and communication of knowledge (in essays). The advent of search engines has made acquiring knowledge more accessible, while generative AI has automated the communication of knowledge.

    If knowledge is easily accessible, explainable, and digestible, which skills should our graduates possess that cannot be replaced by ChatGPT, now or in the future? It was suggested that these are distinctly ‘human’ skills: relationship building, in-person communication, and leadership. Are we explicitly teaching these skills within the curriculum? Are we assessing them? Are we rebalancing our taught programmes from knowledge to irreplaceable skills to stay ahead of the AI curve?

    And to get a bit meta about it all, what AI skills are we teaching? Not just the practical skills of applying AI in one’s field, but deep AI literacy: recognising bias, verifying accuracy, understanding intellectual property rights and embracing digital ambition. (Professor Sarah Jones of Southampton Solent University has written about this here.)

    Given recent geopolitical events, critical thinking was also emphasised. When and why can something be considered the ‘truth’? What is ‘truth’, and why is it important?

    Colleagues were clear that developing students’ knowledge and understanding should still be a key part of the higher education process (after all, you can’t apply knowledge if you don’t have a basic level of it). In addition, they suggested that we need to be clearer with students about the experiential benefits of learning. As one colleague stated,

    ‘The value of the essay is not the words you have put on the page, it is the processes you go through in getting the words to the page. How do you select your information? How do you structure your argument more clearly? How do you choose the right words to convince your reader of your point?’

    There was further discussion about the importance of experiential learning, even within traditional frameworks. Do we clearly explain to students the benefits of learning experiences – such as essay writing – and how these will develop their personal and employability skills? One participant mentioned that they were bribing their son not to use ChatGPT to complete his Maths homework. As students increasingly find their time constrained by paid work and caring responsibilities, how can we convince them of the value of fully engaging with their learning experiences and assessments when ChatGPT is such an attractive option? How explicitly are we talking to students about their skills development?

    There was a sense of urgency to the discussion. One colleague described this as a critical juncture, a ‘one-time opportunity’ to make bold choices about developing our programmes to be future-focused. This will ensure graduates leave higher education with the skills expected and needed by their employers, which will outlast the rapidly evolving world of generative AI and ensure the sector remains relevant in a world of bite-sized, video-based learning and increasing automation.

    Kortext is a HEPI partner.

    Founded in 2013, Kortext is the UK’s leading student experience and engagement expert, pioneering digitally enhanced teaching and learning in the higher education community. Kortext supports institutions in boosting student engagement and driving outcomes with our AI-powered, cutting-edge content discovery and study products, market-leading learner analytics, and streamlined workflows for higher education. For more information, please visit: kortext.com

  • Engaging Students in Collaborative Research and Writing Through Positive Psychology, Student Wellness, and Generative AI Integration – Faculty Focus

  • Probabilities of generative AI pale next to individual ideas

    While I was working on the manuscript for More Than Words: How to Think About Writing in the Age of AI, I did a significant amount of experimenting with large language models, spending the most time with ChatGPT (and its various successors) and Claude (in its different flavors).

    I anticipated that over time this experimenting would reveal some genuinely useful application of this technology to my work as a writer.

    In truth, it’s been the opposite, and I think it’s interesting to explore why.

    One factor is that I have become more concerned about what I see as a largely uncritical embrace of generative AI in educational contexts. I am not merely talking about egregiously wrongheaded moves like introducing an AI-powered Anne Frank emulator that has only gracious thoughts toward Nazis, but other examples of instructors and institutions assuming that because the technology is something of a wonder, it must have a positive effect on teaching and learning.

    This has pushed me closer to a resistance mindset, if for no other reason than to provide a counterbalance to those who see AI as an inevitability without considering what’s on the other side. In truth, however, rather than being a full-on resister I’m more in line with Marc Watkins, who believes that we should be seeing AI as “unavoidable” but not “inevitable.” While I think throwing a bear hug around generative AI is beyond foolish, I also do not dismiss the technology’s potential utility in helping students learn.

    (Though, a big open question is what and how we want them to learn these things.)

    Another factor has been that the more I worked with the LLMs, the less I trusted them. Part of this was because I was trying to deploy their capabilities to support me on writing in areas where I have significant background knowledge and I found them consistently steering me wrong in subtle yet meaningful ways. This in turn made me fearful of using them in areas where I do not have the necessary knowledge to police their hallucinations.

    Mostly, though, just about every time I tried to use them in the interests of giving myself a shortcut to a faster outcome, I realized by taking the shortcut I’d missed some important experience along the way.

    As one example, in a section where I argue for the importance of cultivating one’s own taste and sense of aesthetic quality, I intended to use some material from New Yorker staff writer Kyle Chayka’s book Filterworld: How Algorithms Flattened Culture. I’d read and even reviewed the book several months before, so I thought I had a good handle on it, but still, I needed a refresher on what Chayka calls “algorithmic anxiety” and prompted ChatGPT to remind me what Chayka meant by this.

    The summary delivered by ChatGPT was perfectly fine, accurate and nonhallucinatory, but I couldn’t manage to go from the notion I had in my head about Chayka’s idea to something useful on the page via that summary of Chayka’s idea. In the end, I had to go back and reread the material in the book surrounding the concept to kick my brain into gear in a way that allowed me to articulate a thought of my own.

    Something similar happened several other times, and I began to wonder exactly what was up. It’s possible that my writing process is idiosyncratic, but I discovered that what I needed in order to keep working the problem of saying (hopefully) interesting and insightful things in the book was not a summary of the ideas of others, but the original expression of others as fuel for my thoughts.

    This phenomenon might be related to the nature of how I view writing, which is that writing is a continual process of discovery where I have initial thoughts that bring me to the page, but the act of bringing the idea to the page alters those initial thoughts.

    I tend to think all writing, or all good writing, anyway, operates this way because it is how you will know that you are getting the output of a unique intelligence on the page. The goal is to uncover something I didn’t know for myself, operating under the theory that this will also deliver something fresh for the audience. If the writer hasn’t discovered something for themselves in the process, what’s the point of the whole exercise?

    When I turned to an LLM for a summary and could find no use for it, I came to recognize that I was interacting not with an intelligence, but a probability. Without an interesting human feature to latch onto, I couldn’t find a way to engage my own humanity.

    I accept that others are having different experiences in working alongside large language models, that they find them truly generative (pardon the pun). Still, I wonder what it means to find a spark in generalized probabilities, rather than the singular intelligence.

    I believe I say a lot of interesting and insightful things in More Than Words. I’m also confident I have some things wrong and that, over time, my beliefs will be changed by exposing myself to the responses of others. This is the process of communication and conversation, processes that are not a capacity of large language models, given that they have no intention at work underneath the hood of their algorithms.

    Believing otherwise is to indulge in a delusion. Maybe it’s a helpful delusion, but a delusion nonetheless.

    The capacities of this technology are amazing and increasing all the time, but to me, for my work, they don’t offer all that much of meaning.

  • Using Generative AI to “Hack Time” for Implementing Real-World Projects – Faculty Focus

  • Call for Submissions for Special Edition – “Trends in the Use of Generative Artificial Intelligence for Digital Learning.” (Anthony Picciano)

    Dear Commons Community,

    Patsy Moskal and I have decided to be guest editors for Education Sciences for a special edition entitled,

    “Trends in the Use of Generative Artificial Intelligence for Digital Learning.” (See below for a longer description.)

    It is a most timely topic of deep interest to many in the academy. We would love to have you contribute an article. Your submission can be research-based, practitioner-based, or thought-based, and it does not have to be a long article (the minimum is 4,000 words). Final articles will be due no later than July 1, 2025.

    You can find more details at: https://www.mdpi.com/journal/education/special_issues/6UHTBIOT14#info

    Thank you for your consideration!

    Tony

  • for Generative AI Integration into Education – Sovorel

    I’m very happy and excited to share that I have released a new book geared specifically to helping universities, and all educational institutions, with the very important topic of generative AI integration into education. This is a vital process that higher education and all places of learning need to address in order to become and stay relevant in a world so filled with AI. All of us in academia must develop AI Literacy skills in order to fully develop those skills within our students. If educational institutions do not begin this important process now, they will not be properly setting up their students for success. This book provides an action plan to help educational institutions be part of the solution and better ensure success.

    Here is a video trailer for the 9 Point Action Plan: for Generative AI Integration into Education book:

    Table of contents for the 9 Point Action Plan: for Generative AI Integration into Education book that is now available as an ebook or printed book at Amazon: https://www.amazon.com/Point-Action-Plan-Generative-Integration/dp/B0D172TMMB

    TABLE OF CONTENTS

    1. Chapter 1: Institutional Policies
      • Examples
      • Policy Examples
      • Implementation
    2. Chapter 2: Leadership Guidance on Utilization of Generative AI
      • Examples
      • Michigan State University Example
      • Yale University Example
      • Template Example: Leadership Guidance on Generative AI in Education
      • Implementation
    3. Chapter 3: Training
      • Faculty Training
      • Staff Training
      • Student Training
      • Examples
      • American University of Armenia Example
      • Arizona State University Example
      • Other Examples
      • Implementation
    4. Chapter 4: Generative AI Teaching & Learning Resources
      • Examples
      • University of Arizona
      • American University of Armenia
      • The University of California Los Angeles (UCLA)
      • Implementation
    5. Chapter 5: Outside Information/Confirmation
      • Bring in an Outside Speaker, Presenter, Facilitator
      • Examples
      • Obtain Employers’/Organizations’ Views & Ideas on Needed AI Skills
      • Implementation
    6. Chapter 6: Syllabus AI Use Statement
      • Examples
      • Tufts University Example
      • Vanderbilt College of Arts and Science
      • American University of Armenia Example
      • Implementation
    7. Chapter 7: Strategic Plan Integration
      • Components of a Good Strategic Plan and AI Considerations
      • Environmental Analysis
      • Review of Organizational Vision/Mission
      • Identification of Strategic Goals and Objectives
      • Key Performance Indicators
      • Integration of AI Literacy into the Curriculum
      • Example: White Paper: Integration of AI Literacy into Our Curriculum
    8. Chapter 8: Integration Observation and Evaluation
    9. Chapter 9: Community Outreach
      • Example Benefits of Community Outreach
      • Implementation
    10. Chapter 10: Conclusion and Call to Action
    11. Glossary
    12. References
    13. Additional Resources

    As with all of my books, please reach out if you have any questions. I can be found on LinkedIn and Twitter, and I respond to all comments posted on this blog or through YouTube. Please also join the Sovorel Center for Teaching and Learning Facebook page, where I post a lot of updates.

  • How You Will Never Be Able to Trust Generative AI (and Why That’s OK) –

    In my last post, I introduced the idea of thinking about different generative AI models as coworkers with varying abilities as a way to develop a more intuitive grasp of how to interact with them. I described how I work with my colleagues Steve ChatGPT, Claude Anthropic, and Anna Bard. This analogy can hold (to a point) even in the face of change. For example, in the week since I wrote that post, it appears that Steve has finished his dissertation, which means that he’s catching up on current events to be more like Anna and has more time for long discussions like Claude. Nevertheless, both people and technologies have fundamental limits to their growth.

    In this post, I will explain “hallucination” and other memory problems with generative AI. This is one of my longer ones; I will take a deep dive to help you sharpen your intuitions and tune your expectations. But if you’re not up for the whole ride, here’s the short version:

    Hallucinations and imperfect memory problems are fundamental consequences of the architecture that makes current large language models possible. While these problems can be reduced, they will never go away. AI based on today’s transformer technology will never have the kind of photographic memory a relational database or file system can have. When vendors tout that you can now “talk to your data,” they really mean talk to Steve, who has looked at your data and mostly remembers it.

    You should also know that the easiest way to mitigate this problem is to throw a lot of carbon-producing energy and microchip-cooling water at it. Microsoft is literally considering building nuclear reactors to power its AI. Their global water consumption post-AI has spiked 34% to 1.7 billion gallons.

    This brings us back to the coworker analogy. We know how to evaluate and work with our coworkers’ limitations. And sometimes, we decide not to work with someone or hire them for a particular job because the fit is not good.

    While anthropomorphizing our technology too much can lead us astray, it can also provide us with a robust set of intuitions and tools we already have in our mental toolboxes. As my science geek friends say, “All models are wrong, but some are useful.” Combining those models or analogies with an understanding of where they diverge from reality can help you clear away the fear and the hype to make clear-eyed decisions about how to use the technology.

    I’ll end with some education-specific examples to help you determine how much you trust your synthetic coworkers with various tasks.

    Now we dive into the deep end of the pool. When working on various AI projects with my clients, I have found that this level of understanding is worth the investment for them because it provides a practical framework for designing and evaluating immediate AI applications.

    Are you ready to go?

    How computers “think”

    About 50 years ago, scholars debated whether and in what sense machines could achieve “intelligence,” even in principle. Most thought they could eventually sound pretty clever and act rather human. But could they become sentient? Conscious? Do intelligence and competence live as “software” in the brain that could be duplicated in silicon? Or is there something about them that is fundamentally connected to the biological aspects of the brain? While this debate isn’t quite the same as the one we have today around AI, it does have relevance. Even in our case, where the questions we’re considering are less lofty, the discussions from back then are helpful.

    Philosopher John Searle famously argued against strong AI in an argument called “The Chinese Room.” Here’s the essence of it:

    Imagine sitting in a room with two slots: one for incoming messages and one for outgoing replies. You don’t understand Chinese, but you have an extensive rule book written in English. This book tells you exactly how to respond to Chinese characters that come through the incoming slot. You follow the instructions meticulously, finding the correct responses and sending them out through the outgoing slot. To an outside observer, it looks like you understand Chinese because the replies are accurate. But here’s the catch: you’re just following a set of rules without actually grasping the meaning of the symbols you’re manipulating.

    This is a nicely compact and intuitive explanation of rule-following computation. Is the person outside the room speaking to something that understands Chinese? If so, what is it? Is it the man? No, we’ve already decided he doesn’t understand Chinese. Is it the book? We generally don’t say books understand anything. Is it the man/book combination? That seems weird, and it also doesn’t account for the response. We still have to put the message through the slot. Is it the man/book/room? Where is the “understanding” located? Remember, the person on the other side of the slot can converse perfectly in Chinese with the man/book/room. But where is the fluent Chinese speaker in this picture?

    If we carry that idea forward to today, however much “Steve” may seem fluent and intelligent in your “conversations,” you should not forget that you’re talking to man/book/room.

    Well. Sort of. AI has changed since 1980.

    How AI “thinks”

    Searle’s Chinese room book evokes algorithms. Recipes. For every input, there is one recipe for the perfect output. All recipes are contained in a single bound book. Large language models (LLMs), the basis for both generative AI and semantic search like Google’s, work somewhat differently. They are still Chinese rooms. But they’re a lot more crowded.

    The first thing to understand is that, like the book in the Chinese room, a large language model is a large model of a language. LLMs don’t even “understand” English (or any other language) at all. They convert words into their native language: math.

    (Don’t worry if you don’t understand the next few sentences. I’ll unpack the jargon. Hang in there.)

    Specifically, LLMs use vectors. Many vectors. And those vectors are managed by many different “tensors,” which are computational units you can think of as people in the room handling portions of the recipe. They do each get to exercise a little bit of judgment. But just a little bit.

    Suppose the card that came in the slot of the room had the English word “cool” on it. The room has not just a single worker but billions, or tens of billions, or hundreds of billions of them. (These are the tensors.) One worker has to rate the word on a scale of 10 to -10 on where “cool” falls on the scale between “hot” and “cold.” It doesn’t know what any of these words mean. It just knows that “cool” is a -7 on that scale. (This is the “vector.”) Maybe that worker, or maybe another one, also has to evaluate where it is on the scale of “good” to “bad.” It’s maybe 5.

    We don’t yet know whether the word “cool” on the card refers to temperature or sentiment. So another worker looks at the word that comes next. If the next word is “beans,” then it assigns a higher probability that “cool” is on the “good/bad” scale. If it’s “water,” on the other hand, it’s more likely to be temperature. If the next word is “your,” it could be either, but we can begin to guess the next word. That guess might be assigned to another tensor/worker.

    Imagine this room filled with a bazillion workers, each responsible for scoring vectors and assigning probabilities. The worker who handles temperature might think there’s a 50/50 chance the word is temperature-related. But once we add “water,” all the other workers who touch the card know there’s a higher chance the word relates to temperature rather than goodness.
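The scoring-and-updating story above can be sketched in a few lines of toy Python. Everything here is invented for illustration (the named scales, the numbers, the fixed 0.3 probability shift); real models learn dense vectors with thousands of unnamed dimensions, but the shape of the idea is the same: a word starts ambiguous, and its neighbor shifts the probabilities.

```python
# Toy sketch: each word gets scores on two "scales" (a tiny vector),
# and the following word shifts how "cool" is probably meant.
# All numbers are invented for illustration.

# scales: temperature (-10 cold .. +10 hot), sentiment (-10 bad .. +10 good)
vectors = {
    "cool":  {"temperature": -7.0, "sentiment": 5.0},
    "water": {"temperature": -2.0, "sentiment": 0.0},
    "beans": {"temperature": 0.0,  "sentiment": 1.0},
}

def reading_of_cool(next_word):
    """Start with a 50/50 guess about which sense 'cool' carries, then
    shift probability toward the scale on which the following word
    itself carries more signal (a crude stand-in for the updates the
    tensor 'workers' perform)."""
    p_temperature = 0.5
    ctx = vectors[next_word]
    if abs(ctx["temperature"]) > abs(ctx["sentiment"]):
        p_temperature += 0.3   # e.g. "cool water": temperature sense
    elif abs(ctx["sentiment"]) > abs(ctx["temperature"]):
        p_temperature -= 0.3   # e.g. "cool beans": sentiment sense
    return {"temperature": p_temperature, "sentiment": 1 - p_temperature}

print(reading_of_cool("water"))  # temperature now the more likely sense
print(reading_of_cool("beans"))  # sentiment now the more likely sense
```

No single line of this code "knows" what cool means; the interpretation is just numbers being nudged, which is the point of the analogy.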

    The large language models behind ChatGPT have hundreds of billions of these tensor/workers handing off cards to each other and building a response.

    This is an oversimplification because both the tensors and the math are hard to get exactly right in the analogy. For example, it might be more accurate to think of the tensors working in groups to make these decisions. But the analogy is close enough for our purposes. (“All models are wrong, but some are useful.”)

    It doesn’t seem like it should work, does it? But it does, partly because of brute force. As I said, the bigger LLMs have hundreds of billions of workers interacting with each other in complex, specialized ways. Even though they don’t represent words and sentences in any form that we might intuitively recognize as “understanding,” they are uncannily good at interpreting our input and generating output that looks like understanding and thought to us.

    How LLMs “remember”

    The LLMs can be “trained” on data, which means they store information like how “beans” vs. “water” modify the likely meaning of “cool,” what words are most likely to follow “Cool the pot off in the,” and so on. When you hear AI people talking about model “weights,” this is what they mean.

    Notice, however, that none of the original sentences are stored anywhere in their original form. If the LLM is trained on Wikipedia, it doesn’t memorize Wikipedia. It models the relationships among the words using combinations of vectors (or “matrices”) and probabilities. If you dig into the LLM looking for the original Wikipedia article, you won’t find it. Not exactly. The AI may become very good at capturing the gist of the article given enough billions of those tensor/workers. But the word-for-word article has been broken down and digested. It’s gone.
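A minimal illustration of why the original text is gone: the toy next-word model below (corpus and names invented for this sketch) keeps only co-occurrence counts, its "weights," and from those counts alone the exact training sentences cannot be reconstructed; only the gist survives.

```python
from collections import Counter, defaultdict

# Toy "training": count which word follows which. These counts are the
# model's entire memory; the original sentence is discarded.
corpus = "cool beans , cool beans , cool water"

counts = defaultdict(Counter)
words = corpus.split()
for prev, nxt in zip(words, words[1:]):
    counts[prev][nxt] += 1          # update the "weights"

def most_likely_next(word):
    """Return the highest-count next word: the gist, not the source."""
    return counts[word].most_common(1)[0][0]

print(most_likely_next("cool"))     # prints 'beans' (seen twice vs once)
```

Ask this model to reproduce the corpus word for word and it cannot; it can only walk the probabilities it distilled, which is the toy version of an LLM capturing the gist of a Wikipedia article it no longer contains.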

    Three main techniques are available to work around this problem. The first, which I’ve written about before, is called Retrieval Augmented Generation (RAG). RAG preprocesses content into the vectors and probabilities that the LLM understands, giving the LLM a more specific focus on the content you care about. But that content has still been digested into vectors and probabilities. A second method is to “fine-tune” the model, which predigests the content like RAG but lets the model itself metabolize that content. The third is to increase what’s known as the “context window,” which you experience as the length of a single conversation. If the context window is long enough, you can paste the content right into it and have the system digest it and turn it into vectors and probabilities.
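The RAG shape can be sketched with a toy retriever. The bag-of-words "embedding" and word-overlap similarity below are deliberately crude stand-ins for a real embedding model and cosine similarity, and the chunks and function names are invented, but the pipeline is the real one: chunk the content, embed the chunks, retrieve the closest chunk for a question, and paste it into the prompt (the context window).

```python
# Toy RAG pipeline: embed -> retrieve -> stuff into the prompt.
# The "embedding" here is just a set of words; real systems use
# dense float vectors produced by a trained model.

def embed(text):
    """Toy 'embedding': the set of lowercase words in the text."""
    return set(text.lower().split())

def similarity(a, b):
    """Word-overlap (Jaccard) score as a stand-in for cosine similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Invented example content, pre-chunked and pre-embedded ("indexed").
chunks = [
    "The library opens at 9am on weekdays.",
    "Course grades are posted within two weeks of the final exam.",
]
index = [(embed(c), c) for c in chunks]

def retrieve(question):
    """Return the chunk whose embedding is closest to the question's."""
    q = embed(question)
    return max(index, key=lambda pair: similarity(q, pair[0]))[1]

def build_prompt(question):
    """Paste the retrieved chunk into the context window."""
    context = retrieve(question)
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("When does the library open?"))
```

Note that even here the LLM never sees your document as a document: it sees whatever the retriever chose, re-digested into vectors and probabilities, which is why "talk to your data" really means talk to Steve, who mostly remembers it.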

    We’re used to software that uses file systems and databases with photographic memories. LLMs are (somewhat) more like humans in the sense that they can “learn” by indexing salient features and connecting them in complex ways. They might be able to “remember” a passage, but they can also forget or misremember.

    The memory limitation cannot be fixed using current technology. It is baked into the structure of the tensor-based networks that make LLMs possible. If you want a photographic memory, you’d have to avoid passing the content through the LLM, since the LLM only “understands” vectors and probabilities. To be fair, work is being done to reduce hallucinations. This paper provides a great survey. Don’t worry if it’s a bit technical. The informative part for a non-technical reader is all the different classifications of “hallucinations.” Generative AI has a variety of memory problems, and research is underway to mitigate them. But we don’t know how far those techniques will get us, given the fundamental architecture of large language models.

    We can mitigate these problems by improving the three methods I described. But that improvement comes with two catches. The first is that it will never make the system perfect. The second is that reduced imperfection often requires more energy for the increased computing power and more water to cool the processors. The race for larger, more perfect LLMs is terrible for the environment. And we may not need that extra power and fidelity except for specialized applications. We haven’t even begun to capitalize on its current capabilities. We should consider our goals and whether the costliest improvements are the ones we need right now.

    To do that, we need to reframe how we think of these tools. For example, the word “hallucination” is loaded. Can we more easily imagine working with a generative AI that “misremembers”? Can we accept that it “misremembers” differently than humans do? And can we build productive working relationships with our synthetic coworkers while accommodating and accounting for their differences?

    Here too, the analogy is far from perfect. Generative AIs aren’t people. They don’t fit the intention of diversity, equity, and inclusion (DEI) guidelines. I am not campaigning for AI equity. That said, DEI is not only about social justice. It is also about how we throw away human potential when we choose to focus on particular differences and frame them as “deficits” rather than recognizing the strengths that come from a diverse team with complementary strengths.

    Here, the analogy holds. Bringing a generative AI into your team is a little bit like hiring a space alien. Sometimes it demonstrates surprising unhuman-like behaviors, but it’s human-like enough that we can draw on our experiences working with different kinds of humans to help us integrate our alien coworker into the team.

    That process starts with trying to understand their differences, though it doesn’t end there.

    Emergence and the illusion of intelligence

    To get the most out of our generative AI, we have to maintain a double vision: experiencing the interaction with the Chinese room from the outside while picturing what’s happening inside as best we can. It’s easy to forget that the uncannily good, even “thoughtful” and “creative” answers we get from generative AI are produced by a system of vectors and probabilities like the one I described. How does that work? What could possibly be going on inside the room to produce such results?

    AI researchers talk about “emergence” and “emergent properties.” This idea has been frequently observed in biology. The best, most accessible exploration of it that I’m aware of (and a great read) is Steven Johnson’s book Emergence: The Connected Lives of Ants, Brains, Cities, and Software. The example you’re probably most familiar with is ant colonies (although slime molds are surprisingly interesting).

    Imagine a single ant, an explorer venturing into the unknown for sustenance. As it scuttles across the terrain, it leaves a faint trace, a chemical scent known as a pheromone. This trail, barely noticeable at first, is the starting point of what will become colony-wide coordinated activity.

    Soon, the ant stumbles upon a food source. It returns to the nest, and as it retraces its path, the pheromone trail becomes more robust and distinct. Back at the colony, this scented path now whispers a message to other ants: “Follow me; there’s food this way!” We might imagine this strengthened trail as an increased probability that the path is relevant for finding food. Each ant is acting independently. But it does so influenced by pheromone input left by other ants and leaves output for the ants that follow.

    What happens next is a beautiful example of emergent behavior. Other ants, in their own random searches, encounter this scent path. They follow it, reinforcing the trail with their own pheromones if they find food. As more ants travel back and forth, a once-faint trail transforms into a bustling highway, a direct line from the nest to the food.

    But the really amazing part lies in how this path evolves. Initially, several trails might have been formed, heading in various directions toward various food sources. Over time, a standout emerges – the shortest, most efficient route. It’s not the product of any single ant’s decision. Each one is just doing its job, minding its own business. The collective optimization is an emergent phenomenon. The shorter the path, the quicker the ants can travel, reinforcing the most efficient route more frequently.

    This efficiency isn’t static; it’s adaptable. If an obstacle arises, disrupting the established path, the ants don’t falter. They begin exploring again, laying down fresh trails. Before long, a new optimal path emerges, skirting the obstacle as the colony dynamically adjusts to its changing environment.
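The colony story can be mimicked with a deterministic toy simulation (all parameters invented): a unit of ants splits across two paths in proportion to pheromone, shorter paths are reinforced more per step because round trips are quicker, and trails evaporate. No line of code chooses the best path, yet the short path ends up dominating.

```python
# Toy emergence: two paths to food, proportional traffic, evaporation.
# All parameters are invented; this is a deterministic simplification
# in which every "ant" follows the same local rule.

paths = {"short": 2.0, "long": 5.0}          # trip cost (path length)
pheromone = {"short": 1.0, "long": 1.0}      # trails start identical
EVAPORATION = 0.95                           # trails fade without traffic

for step in range(100):
    total = sum(pheromone.values())
    # ants split across paths in proportion to current pheromone
    shares = {p: pheromone[p] / total for p in paths}
    for p in paths:
        # quicker round trips deposit more pheromone per unit time
        pheromone[p] += shares[p] / paths[p]
        pheromone[p] *= EVAPORATION

print(pheromone["short"] > pheromone["long"])  # True: short path dominates
```

The positive feedback loop (more pheromone brings more traffic brings more pheromone, faster on the short path) is the whole mechanism; the "collective optimization" is nowhere in the rules, only in their interaction.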

    This is a story of collective intelligence, emerging not from a central command but from the sum of many small, individual actions. It’s also a kind of Chinese room. When we say “collective intelligence,” where does the intelligence live? What is the collective thing? The hive? The hive-and-trails? And in what sense is it intelligent?

    We can make a (very) loose analogy between LLMs being trained and hundreds of billions of ants laying down pheromone trails as they explore the content terrain they find themselves in. When the model is asked to generate content, it’s a little bit like being sent down a particular pheromone path. This process of leading you down paths that were created during the AI model’s training is called “inference.” The energy required to send you down an established path is much less than the energy needed to find the paths in the first place. Once the paths are established, traversing them can look like science fiction. The LLM acts as if there is a single adaptive intelligence at work even though, inside the Chinese room, there is no such thing. Capabilities emerge from the patterns that all those independent workers are creating together.

    Again, all models are wrong, but some are useful. My analogy substantially oversimplifies how LLMs work and how surprising behaviors emerge from those many billions of workers, each doing its own thing. The truth is that even the people who build LLMs don’t fully understand their emergent behaviors.

    That said, understanding the basic mechanism is helpful because it provides a reality check and some insight into why “Steve” just did something really weird. Just as transformer networks produce surprisingly good but imperfect “memories” of the content they’re given, we should expect to hit limits to gains from emergent behaviors. While our synthetic coworkers are getting smarter in somewhat unpredictable ways, emergence isn’t magic. It’s a mechanism driven by certain kinds of complexity. It is unpredictable. And not always in the way that we want it to be.

    Also, all that complexity comes at a cost. A dollar cost, a carbon cost, a water cost, a manageability cost, and an understandability cost. The default path we’re on is to build ever-bigger models with diminishing returns at enormous societal costs. We shouldn’t let our fear of the technology’s limitations or fantasy about its future perfection dominate our thinking about the tech.

    Instead, we should all try to understand it as it is, as best we can, and focus on using it safely and effectively. I’m not calling for a halt to research, as some have. I’m simply saying we may gain a lot more at this moment by better understanding the useful thing that we have created than by rushing to turn it into some other thing that we fantasize about but don’t know that we actually need or want in real life.

    Generative AI is incredibly useful right now. And the pace at which we are learning to gain practical benefit from it is lagging further and further behind the features that the tech giants are building as they race for “dominance,” whatever that may mean in this case.

    Learning to love your imperfect synthetic coworker

    Imagine you’re running a tutoring program. Your tutors are students. They are not perfect. They might not know the content as well as the teacher. They might know it very well but be weak as educators. Maybe they’re good at both but forget or misremember essential details. That might cause them to give the students they are tutoring the wrong instructions.

    When you hire your human tutors, you have to interview and test them to make sure they are good enough for the tasks you need them to perform. You may test them by pretending to be a challenging student. You’ll probably observe them and coach them. And you may choose to match particular tutors to particular subjects or students. You’d go through similar interviewing, evaluation, job matching, and ongoing supervision and coaching with any worker performing an important job.

    It is not so different when evaluating a generative AI based on LLM transformer technology (which is all of them at the moment). You can learn most of what you need to know from an “outside-the-room” evaluation using familiar techniques. The “inside-the-room” knowledge helps you ground yourself when you hear the hype or see the technology do remarkable things. This inside/outside duality is a major theme of my AI Learning Design Workshop (ALDA), in which participating teams will explore the technology and hone their intuitions about it through a practical, hands-on design/build exercise. The best way to learn how to manage student tutors is by managing student tutors.

    Make no mistake: Generative AI does remarkable things and is getting better. But ultimately, it’s a tool built by humans and has fundamental limitations. Be surprised. Be amazed. Be delighted. But don’t be fooled. The tools we make are as imperfect as their creators. And they are also different from us.

    Source link

  • Who Is Winning the Generative AI Race? Nobody (yet). –

    Who Is Winning the Generative AI Race? Nobody (yet). –

    This is a post for folks who want to learn how recent AI developments may affect them as people interested in EdTech who are not necessarily technologists. The tagline of e-Literate is “Present is Prologue.” I try to extrapolate from today’s developments only as far as the evidence takes me with confidence.

    Generative AI is the kind of topic that’s a good fit for e-Literate because the conversations about it are fragmented. The academic and technical literature is boiling over with developments on practically a daily basis but is hard for non-technical folks to sift through and follow. The grand syntheses about the future of…well…everything are often written by incredibly smart people who have to make a lot of guesses at a moment of great uncertainty. The business press has important data wrapped in a lot of WHEEEE!

    Let’s see if we can run this maze, shall we?

    Is bigger better?

    OpenAI and ChatGPT set many assumptions and expectations about generative AI, starting with the idea that these models must be huge and expensive. Which, in turn, means that only a few tech giants can afford to play.

    Right now there are five widely known giants. (Well, six, really, but we’ll get to the surprise contender in a bit.) OpenAI’s ChatGPT and Anthropic’s Claude are pure plays created by start-ups. OpenAI started the whole generative AI craze by showing the world how much anyone who can write English can accomplish with ChatGPT. Anthropic has made a bet on “ethical AI” with more protections from harmful output and a few differentiating features that are important for certain applications but that I’m not going to go into here.

    Then there are the big three SaaS hosting giants. Microsoft has been tied very tightly to OpenAI, of which it owns a 49% stake. Google, which has been a pioneering leader in AI technologies but has been a mess with its platforms and products (as usual), has until recently focused on promoting several of its own models. Amazon, which has been late to the party, has its own Titan generative AI model that almost nobody has seen yet. But Amazon seems to be coming out of the gate with a strategy that emphasizes hosting an ecosystem of platforms, including Anthropic and others.

    About that ecosystem thing. A while back, an internal paper called “We Have No Moat, and OpenAI Doesn’t Either” leaked from Google. It made the argument that so much innovation was happening so quickly in open-source generative AI that the war chests and proprietary technologies of these big companies wouldn’t give them an advantage over the rapid innovation of a large open-source community.

    I could easily write a whole long post about the nature of that innovation. For now, I’ll focus on a few key points that should be accessible to everyone. First, it turns out that the big companies with oodles of money and computing power—surprise!—decided to rely on strategies that required oodles of money and computing power. They didn’t spend a lot of time thinking about how to make their models smaller and more efficient. Open-source teams with far more limited budgets quickly demonstrated that they could make huge gains in algorithmic efficiency. The barrier to entry for building a better LLM—money—is dropping fast.

    Complementing this first strategy, some open-source teams worked particularly hard to improve data quality, which requires more hard human work and less brute computing force. It turns out that the old adage holds: garbage in, garbage out. Even smaller systems trained on more carefully curated data are less likely to hallucinate and more likely to give high-quality answers.

    And third, it turns out that we don’t need giant all-purpose models all the time. Writing software code is a good example of a specialized generative AI task that can be accomplished well with a much smaller, cheaper model using the techniques described above.

    The internal Google memo concluded by arguing that “OpenAI doesn’t matter” while cooperating with open source is vital.

    That missive was leaked in May. Guess what’s happened since then?

    The swarm

    Meta had already announced in February that it was releasing an open-source-ish model called Llama. It was only open-source-ish because its license limited it to research use. That restriction was quickly hacked and widely ignored. The academic teams and smaller startups, which were already innovating like crazy, took advantage of the oodles of money and computing power that Meta was able to put into Llama. Unlike the other giants, Meta doesn’t make money by hosting software. It makes money from content. Commoditizing generative AI will lead to much more content being generated. Perhaps seeing an opportunity, when Meta released Llama 2 in July, the only unusual restrictions it placed on the open-source license were to prevent big hosting companies like Amazon, Microsoft, and Google from making money off Llama without paying Meta. Anyone smaller than that can use the Llama models for a variety of purposes, including commercial applications. Importantly, Llama 2 is available in a variety of sizes, including one small enough to run on a newer personal computer.

    To be clear, OpenAI, Microsoft, Google, Anthropic, and Amazon are all continuing to develop their proprietary models. That isn’t going away. But at the same time…

    • Microsoft, despite their expensive continuing love affair with OpenAI, announced support for Llama 2 and has licensed Databricks’ open-source Dolly 2.0 (though I can’t find any announced products using it yet).
    • Google Cloud is adding both Llama 2 and Anthropic’s Claude 2 to the list of 100 LLMs it supports, which includes its own open-source Flan-T5 as well as its PaLM models.
    • Amazon now supports a growing range of LLMs, including open-source Stability AI and Llama 2.
    • IBM—’member them?—is back in the AI game, trying to rehabilitate its image after the much-hyped and mostly underwhelming Watson products. The company is trotting out watsonx (with the very now, very wow lower-case “w” at the beginning of the name and “x” at the end) integrated with HuggingFace, which you can think of as being a little bit like the Github for open-source generative AI.

    It seems that the Google memo about no moats, which was largely shrugged off publicly way back in May, was taken seriously privately by the major players. All the big companies have been hedging their bets and increasingly investing in making the use of any given LLM easier rather than betting that they can build the One LLM to Rule Them All.

    Meanwhile, new specialized and generalized LLMs pop up weekly. For personal use, I bounce between ChatGPT, BingChat, Bard, and Claude, each for different types of tasks (and sometimes a couple at once to compare results). I use DALL-E and Stable Diffusion for image generation. (Midjourney seems great but trying to use it through Discord makes my eyes bleed.) I’ll try the largest Llama 2 model and others when I have easy access to them (which I predict will be soon). I want to put a smaller coding LLM on my laptop, not to have it write programs for me but to have it teach me how to read them.

    The most obvious possible end result of this rapid, sprawling growth of supported models is that, far from being the singular Big Tech miracle that OpenAI sold us on with ChatGPT’s sudden and bold entrance onto the world stage, generative AI is going to become just one more part of the IT stack, albeit a very important one. There will be competition. There will be specialization. The big cloud hosting companies may end up distinguishing themselves not so much by being the first to build Skynet as by their ability to make it easier for technologists to integrate this new and strange toolkit into their development and operations. Meanwhile, a parallel world of alternatives for startups and small or specialized uses will spring up.

    We have not reached the singularity yet

    Meanwhile, that welter of weekly announcements about AI advancements I mentioned before has not included massive breakthroughs in super-intelligent machines. Instead, many of the announcements have been about supporting more models and making them easier to use for real-world development. For example, OpenAI is making a big deal out of how much better ChatGPT Enterprise is at keeping the things you tell it private.

    Oh. That would be nice.

    I don’t mean to mock the OpenAI folks. This is new tech. Years of effort will need to be invested in making this technology easy and reliable for the uses it’s being put to now. As an enterprise application, ChatGPT has largely been a very impressive demo, while ChatGPT Enterprise is exactly what it sounds like: an effort to make ChatGPT usable in the enterprise.

    The folks I talk to who are undertaking ambitious generative AI projects, including ones whose technical expertise I trust a great deal, are telling me they are struggling. The tech is unpredictable. That’s not surprising; generative AI is probabilistic. The same property that enables it to produce novel content also enables it to make up facts. Try QA testing an application built on technology like that, or avoiding regressions—i.e., bugs you thought you fixed but that came back in the next version. Meanwhile, the toolchain around developing, testing, and maintaining generative AI-based software is still very immature.

    These problems will be solved. But if the past six months have taught us anything, it’s that our ability to predict the twists and turns ahead is very limited at the moment. Last September, I wrote a piece called “The Miracle, the Grind, and the Wall.” It’s easy to produce miraculous-seeming one-off results with generative AI but often very hard to achieve them reliably at scale. And sometimes we hit walls that prevent us from reaching goals for reasons that we don’t see coming. For example, what happens when you run a data set that has some very subtle problems with it through a probabilistic model with half a trillion computing units, each potentially doing something with the data that is impacted by the problems and passing the modified problematic data onto other parts of the system? How do you trace and fix those “bugs” (if you can even call them that)?

    It’s fun to think about where all of this AI stuff could go. And it’s important to try. But personally, I find the here-and-now to be fun and useful to think about. I can make some reasonable guesses about what might happen in the next 12 months. I can see major changes and improvements AI can contribute to education today that minimize the risk of the grind and the wall. And I can see how to build a curriculum of real-world projects that teaches me and others about the evolving landscape even as we make useful improvements today.

    What I’m watching for

    Given all that, what am I paying attention to?

    • Continued frantic scrambling among the big tech players: If you’re not able to read and make sense of the weekly announcements, papers, and new open-source projects, pay attention to Microsoft, Amazon, Google, IBM, OpenAI, Anthropic, and HuggingFace. The four traditional giants in particular seem to be thrashing a bit. They’re all tracking the developments that you and I can’t and are trying to keep up. I’m watching these companies with a critical eye. They’re not leading (yet). They’re running for their lives. They’re in a race. But they don’t know what kind of race it is or which direction to go to reach the finish line. Since these are obviously extremely smart people trying very hard to compete, the cracks and changes in their strategies tell us as much as the strategies themselves.
    • Practical, short-term implementations in EdTech: I’m not tracking grand AI EdTech moonshot announcements closely. It’s not that they’re unimportant. It’s that I can’t tell from a distance whose work is interesting and don’t have time to chase every project down. Some of them will pan out. Most won’t. And a lot of them are way too far out over their skis. I’ll wait to see who actually gets traction. And by “traction,” I don’t mean grant money or press. I mean real-world accomplishments and adoptions.

      On the other hand, people who are deploying AI projects now are learning. I don’t worry too much about what they’re building, since a lot of what they do will be either wrong, uninteresting, or both. Clay Shirky once said the purpose of the first version of software isn’t to find out if you got it right; it’s to learn what you got wrong. (I’m paraphrasing since I can’t find the original quote.) I want to see what people are learning. The short-term projects that are interesting to me are the experiments that can teach us something useful.

    • The tech being used along with LLMs: ChatGPT did us a disservice by convincing us that it could soon become an all-knowing, hyper-intelligent being. It’s hard to become the all-powerful AI if you can’t reliably perform arithmetic, are prone to hallucinations, can’t remember anything from one conversation to the next, and start to space out if a conversation runs too long. We are being given the impression that the models will eventually get good enough that all these problems will go away. Maybe. For the foreseeable future, we’re better off thinking about them as interfaces with other kinds of software that are better at math, remembering, and so on. “AI” isn’t a monolith. One of the reasons I want to watch short-term projects is that I want to see what other pieces are needed to realize particular goals. For example, start listening for the term “vector database.” The larger tech ecosystem will help define the possibility space.
    • Intellectual property questions: What happens if The New York Times successfully sues OpenAI for copyright infringement? It’s not like OpenAI can just go into ChatGPT and delete all of those articles. If intellectual property law forces changes to AI training, then the existing models will have big problems (though some have been more careful than others). A chorus of AI cheerleaders tell us, “No, that won’t happen. It’s covered by fair use.” That’s plausible. But are we sure? Are we sure it’s covered in Europe as well as the US? How much should one bet on it? Many subtle legal questions will need to be sorted over the coming several years. The outcomes of various cases will also shape the landscape.
    • Microchip shortages: This is a weird thing for me to find myself thinking about, but these large generative AI applications—especially training them—run on giant, expensive GPUs. One company, Nvidia, has far and away the best processors for this work. So much so that there is a major race on to acquire as many Nvidia processors as possible due to limited supply and unlimited demand. And unlike software, a challenger company can’t shock the world overnight with a new microprocessor. Designing and fabricating new chips at scale takes years. More than two. Nvidia will be the leader for a long time. Therefore, the ability of AI to grow will be, in some respects, constrained by the company’s production capacity. Don’t believe me? Check out their five-year stock price and note the point when generative AI hype really took off.
    • AI on my laptop: On the other end of the scale, remember that open-source has been shrinking the size of effective LLMs. For example, Apple has already optimized a version of Stable Diffusion for their operating system and released an open-source one-click installer for easier consumer use. The next step one can imagine is for them to optimize their computer chip—either the soon-to-be-released M3 or the M4 after it. (As I said, computer chips take time.) But one can easily imagine image generation, software code generation, and a chatbot that understands and can talk about the documents you have on your hard drive. All running locally and privately. In the meantime, I’ll be running a few experiments with AI on my laptop. I’ll let you know how it goes.
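    To make the “vector database” idea from the list above concrete: at its core, such a store retrieves the stored text whose embedding vector is closest to a query’s, often by cosine similarity, which is how an LLM application can be given a “memory” of your documents. A toy sketch with made-up three-dimensional “embeddings” (real systems use embedding-model vectors with hundreds or thousands of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest(query, store):
    """Return the stored text whose embedding is most similar to the query."""
    return max(store, key=lambda item: cosine(query, item[0]))[1]

# Toy "embeddings"; the texts and vectors are invented for illustration.
store = [
    ([0.9, 0.1, 0.0], "grading policy"),
    ([0.0, 0.8, 0.6], "lab safety rules"),
]
match = nearest([0.85, 0.2, 0.05], store)  # query vector near the first entry
```

A real vector database adds indexing so this lookup stays fast across millions of entries, but the retrieval idea is the same.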

    Present is prologue

    Particularly at this moment of great uncertainty and rapid change, it pays to keep your eyes on where you’re walking. A lot of institutions I talk to either are engaged in 57 different AI projects, some of which are incredibly ambitious, or are looking longingly for one thing they can try. I’ll have an announcement on the latter possibility very shortly (which will still work for folks in the former situation). Think of these early efforts as competency-based education for the future of work. The thing about the future is that there’s always more of it. Whatever the future of work is today will be the present of work tomorrow. But there will still be a future of work tomorrow. So we need to build a continuous curriculum of project-based learning with our AI efforts. And we need to watch what’s happening now.

    Every day is a surprise. Isn’t that refreshing after decades in EdTech?

    Source link

  • Generative AI and the Near Future of Work: An EdTech Example –

    Generative AI and the Near Future of Work: An EdTech Example –

    A friend recently asked me for advice on a problem he was wrestling with related to an issue he was having with a 1EdTech interoperability standard. It was the same old problem of a standard not quite getting true interoperability because people implement it differently. I suggested he try using a generative AI tool to fix his problem. (I’ll explain how shortly.)

    I don’t know if my idea will work yet—he promised to let me know once he tries it—but the idea got me thinking. Generative AI probably will change EdTech integration, interoperability, and the impact that interoperability standards can have on learning design. These changes, in turn, impact the roles of developers, standards bodies, and learning designers.

    In this post, I’ll provide a series of increasingly ambitious use cases related to the EdTech interoperability work of 1EdTech (formerly known as IMS Global). In each case, I’ll explore how generative AI could impact similar work going forward, how it changes the purpose of interoperability standards-making, and how it impacts the jobs and skills of various people whose work is touched by the standards in one way or another.

    Generative AI as duct tape: fixing QTI

    1EdTech’s Question and Test Interoperability (QTI) standard is one of its oldest standards that’s still widely used. The earliest version on the 1EdTech website dates back to 2002, while the most recent version was released in 2022. You can guess from the name what it’s supposed to do. If you have a test, or a test question bank, in one LMS, QTI is supposed to let you migrate it into another without copying and pasting. It’s an import/export standard.

    It never worked well. Everybody has their own interpretation of the standard, which means that importing somebody else’s QTI export is never seamless. When speaking recently about QTI to a friend at an LMS company, I commented that it only works about 80% of the time. My friend replied, “I think you’re being generous. It probably only works about 40% of the time.” 1EdTech has learned many lessons about achieving consistent interoperability in the decades since QTI was created. But it’s hard to fix a complex legacy standard like this one.

    Meanwhile, the friend I mentioned at the top of the post asked me recently about practical advice for dealing with this state of affairs. His organization imports a lot of QTI question banks from multiple sources. So his team spends a lot of time debugging those imports. Is there an easier way?

    I thought about it.

    “Your developers probably have many examples that they’ve fixed by hand by now. They know the patterns. Take a handful of before-and-after examples. Embed them into a prompt in a generative AI that’s good at software code, like HuggingChat.” [As I was drafting this post, OpenAI announced that ChatGPT now has a code interpreter.] “Then give the generative AI a novel input and see if it produces the correct output.”

    Generative AI is good at pattern matching. The differences in QTI implementations are likely to have patterns that an LLM can detect, even if those differences change over time (because, for example, one vendor’s QTI implementation has changed across versions).

    In fact, pattern matching on this scale could work very well with a smaller generative AI model. We’re used to talking about ChatGPT, Google Bard, and other big-name systems that have somewhere between half a trillion and a trillion parameters. Think of parameters as computing Legos. One major reason that ChatGPT is so impressive is that it uses a lot of computing Legos. Which makes it expensive, slow, and computationally intensive. But if your goal is to match patterns against a set of relatively well-structured texts such as QTI files, you could probably train a much smaller model than ChatGPT to reliably translate between implementations for you. The smallest models, like the Vicuna LLM, have only 7 billion parameters. That may sound like a lot, but it’s small enough to run on a personal computer (or possibly even a mobile phone). Think about it this way: the QTI task we’re trying to solve for is roughly equivalent in complexity to the spell-checking and one-word type-ahead functions that you have on your phone today. A generative AI model for fixing QTI imports could probably be trained for a few hundred dollars and run for pennies.
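    The prompt-assembly step of the advice above can be sketched simply. The QTI snippets and instruction wording here are invented for illustration, and the actual call to a model is deliberately left out, since it varies by provider:

```python
def build_fewshot_prompt(examples, novel_input):
    """Assemble a few-shot prompt from hand-fixed (broken, fixed) QTI pairs.

    `examples` is a list of (broken_xml, fixed_xml) tuples that developers
    have already repaired by hand; `novel_input` is a new broken export.
    The returned string is what you would send to a code-capable model.
    """
    parts = [
        "You translate QTI exports from one vendor's dialect to another's.",
        "Given a broken Input, produce the corrected Output.",
    ]
    for broken, fixed in examples:
        parts.append(f"Input:\n{broken}\nOutput:\n{fixed}")
    # End with the novel input and an empty Output slot for the model to fill.
    parts.append(f"Input:\n{novel_input}\nOutput:")
    return "\n\n".join(parts)

# Hypothetical example pair; real QTI fixes would be full XML documents.
examples = [('<qti-item ident="q1">', '<assessmentItem identifier="q1">')]
prompt = build_fewshot_prompt(examples, '<qti-item ident="q2">')
```

The model then continues the pattern, emitting its guess at the corrected version of the novel input.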

    This use case has some other desirable characteristics. First, it doesn’t have to work at high volume in real time. It can be a batch process. Throw the dirty dishes in the dishwasher, turn it on, and take out the clean dishes when the machine shuts off. Second, the task has no significant security risks and wouldn’t expose any personally identifiable information. Third, nothing terrible happens if the thing gets a conversion wrong every now and then. Maybe the organization would have to fix 5% of the conversions rather than 100%. And overall, it should be relatively cheap. Maybe not as cheap as running an old-fashioned deterministic program that’s optimized for efficiency. But maybe cheap enough to be worth it. Particularly if the organization has to keep adding new and different QTI implementation imports. It might be easier and faster to adjust the model with fine-tuning or prompting than it would be to revise a set of if/then statements in a traditional program.
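    The dishwasher-style batch flow described above might look something like this sketch, where `convert` stands in for whatever model call you wire up and `validate` is a cheap deterministic check (does the output parse? does it pass schema validation?); both are placeholders, not a real API:

```python
from pathlib import Path

def convert_batch(in_dir, out_dir, convert, validate):
    """Batch-convert QTI exports; queue anything the model fumbles for a human.

    `convert` maps a broken export string to a (hopefully) fixed one;
    `validate` returns True if the result looks acceptable. Files that
    fail validation are returned for manual repair.
    """
    needs_review = []
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.xml")):
        result = convert(src.read_text())
        if validate(result):
            (out / src.name).write_text(result)
        else:
            needs_review.append(src.name)  # the small fraction a human still fixes
    return needs_review
```

Because nothing here runs in real time, a slow, cheap model is fine, and the human-review queue absorbs the occasional wrong conversion.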

    How would the need for skilled programmers change? Somebody would still need to understand how the QTI mappings work well enough to keep the generative AI humming along. And somebody would have to know how to take care of the AI itself (although that process is getting easier every day, especially for this kind of a use case). The repetitive work they are doing now would be replaced by the software over time, freeing up the human brains for other things that human brains are particularly good at. In other words, you can’t get rid of your programmer but you can have that person engaging in more challenging, high-value work than import bug whack-a-mole.

    How does it change the standards-making process? In the short term, I’d argue that 1EdTech should absolutely try to build an open-source generative AI of the type I’m describing rather than trying to fix QTI, a task at which it has not succeeded in over 20 years. This strikes me as a far shorter path to achieving the original purpose for which QTI was intended, which is to move question banks from one system to another.

    This conclusion, in turn, leads to a larger question: Do we need interoperability standards bodies in the age of AI?

    My answer is a resounding “yes.”

    Going a step further: software integration

    QTI provides data portability but not integration. It’s an import/export format. The fact that Google Docs can open up a document exported from Microsoft Word doesn’t mean that the two programs are integrated in any meaningful way.

    So let’s consider Learning Tools Interoperability (LTI). LTI was quietly revolutionary. Before it existed, any company building a specialized educational tool would have to write separate integrations for every LMS.

    The nature of education is that it’s filled with what folks in the software industry would disparagingly call “point solutions.” If you’re teaching students how to program in Python, you need a Python programming environment. But that tool won’t help a chemistry professor who really needs virtual labs and molecular modeling tools. And none of these tools are helpful for somebody teaching English composition. There simply isn’t a single generic learning environment that will work well for teaching all subjects. None of these tools will ever sell enough to make anybody rich.

    Therefore, the companies that make these necessary niche teaching tools will tend to be small. In the early days of the LMS, they couldn’t afford to write a separate integration for every LMS. Which meant that not many specialized learning tools were created. As small as these companies’ target markets already were, many of them couldn’t afford to limit themselves to the subset of, say, chemistry professors whose universities happened to use Blackboard. It didn’t make economic sense.

    LTI changed all that. Any learning tool provider could write an integration once and have their product work with every LMS. Today, 1EdTech lists 240 products that are officially certified as supporting the LTI interoperability standard. Many more support the standard but are not certified.

    Would LTI have been created in a world in which generative AI existed? Maybe not. The most straightforward analogy is Zapier, which connects different software systems via their APIs. ChatGPT and its ilk could act as an instant Zapier. A programmer could feed the API documentation for both systems to a generative AI, ask it to write an integration that serves a particular purpose, and then ask the same AI for help with any debugging.

    Again, notice that one still needs a programmer. Somebody needs to be able to read the APIs, understand the goals, think about the trade-offs, give the AI clear instructions, and check the finished program. The engineering skills are still necessary. But the work of actually writing the code is greatly reduced. Maybe by enough that generative AI would have made LTI unnecessary.

    But probably not. LTI connections pass sensitive student identity and grade information back and forth. It has to be secure and reliable. The IT department has legal obligations, not to mention user expectations, that a well-tested standard helps alleviate (though not eliminate). On top of that, it’s just a bad idea to have spread bits of glue code here, there, and everywhere, regardless of whether a human or a machine writes it. Somebody—an architect—needs to look at the big picture. They need to think about maintainability, performance, security, data management, and a host of other concerns. There is value in having a single integration standard that has been widely vetted and follows a pattern of practices that IT managers can handle the same way across a wide range of product integrations.

    At some point, if a software integration fails to pass student grades to the registrar or leaks personal data, a human is responsible. We’re not close to the point where we can turn over ethical or even intellectual responsibility for those challenges to a machine. If we’re not careful, generative AI will simply write spaghetti code much faster than we did in the old days.

    The social element of knowledge work

    More broadly, there are two major value components to the technical interoperability standards process. The first is obvious: technical interoperability. It’s the software. The second is where the deeper value lies. It’s in the conversation that leads to the software. I’ve participated in a 1EdTech specification working group. When the process went well, we learned from each other. Each person at that table brought a different set of experiences to an unsolved problem. In my case, the specification we were working on sent grade rosters from the SIS to the LMS and final grades back from the LMS to the SIS. It sounds simple. It isn’t. We each brought different experiences and lessons learned regarding many aspects of the problem, from how names are represented in different cultures to how SIS and LMS users think differently in ways that impact interoperability. In the short term, a standard is always a compromise. Each creator of a software system has to make adjustments that accommodate the many ways in which others thought differently when they built their own systems. But if the process works right, everybody goes home thinking a little differently about how their systems could be built better for everybody’s benefit. In the longer term, the systems we continue to build over time reflect the lessons we learn from each other.

    Generative AI could make software integration easier. But without the conversation of the standards-making process, we would lose the opportunity to learn from each other. And if AI can reduce the time and cost of the former, then maybe participants in the standards-making effort will spend more time and energy on the latter. The process would have to be rejiggered somewhat. But at least in some cases, participants wouldn’t have to wait until the standard was finalized before they started working on implementing it. When the cost of implementation is low enough and the speed is fast enough, the process can become more of an iterative hackathon. Participants can build working prototypes more quickly. They would still have to go back to their respective organizations and do the hard work of thinking through the implications, finding problems or trade-offs and, eventually, hardening the code. But at least in some cases, parts of the standards-making process could be more fluid and rapidly iterative than they have been. We could learn from each other faster.

    This same principle could apply inside any organization or partnership in which different groups are building different software components that need to work together. Actual knowledge of the code will still be important, both to check and improve the work of the AI in some cases and to write code in others. Generative AI is not ready to replace high-quality engineers yet. But even as it improves, humans will still be needed.

    John Seely Brown famously traced a drop in Xerox copier repair quality to a change in the lunch schedule for the company’s repair technicians. It turns out that technicians learn a lot from solving real problems in the field and then sharing war stories with each other. When the company changed the schedule so that technicians had less time together, repair effectiveness dropped noticeably. I don’t know if a software program was used to optimize the scheduling, but one could easily imagine that being the case. Algorithms are good at concrete problems like optimizing complex schedules. On the other hand, they have no visibility into what happens at lunch or around the coffee pot. Nobody writes those stories down. They can’t be ingested and processed by a large language model. Nor can they be put together in novel ways by quirky human minds to come up with new insights.

    That’s true in the craft of copier repair and definitely true in the craft of software engineering. I can tell you from direct experience that interoperability standards-making is much the same. We couldn’t solve the seemingly simple problem of getting the SIS to talk to the LMS until we realized that registrars and academics think differently about what a “class” or a “course” is. We figured that out by talking with each other and with our customers.

    At its heart, standards-making is a social process. It’s a group of people who have been working separately on solving similar problems coming together to develop a common solution. They do this because they’ve decided that the cost/benefit ratio of working together is better than the ratio they’ve achieved when working separately. AI lowers the costs of some work. But it doesn’t yet provide an alternative to that social interaction. If anything, it potentially lowers some of the costs of collaboration by making experimentation and iteration cheaper—if and only if the standards-making participants embrace and deliberately experiment with that change.

    That’s especially true the more 1EdTech tries to have a direct role in what it refers to as “learning impact.”

    The knowledge that’s not reflected in our words

    In 2019, I was invited to give a talk at a 1EdTech summit, which I published a version of under the title “Pedagogical Intent and Designing for Inquiry.” Generative AI was nowhere on the scene at the time. But machine learning was. At the same time, long-running disappointment and disillusionment with learning analytics—analytics that actually measure students’ progress as they are learning—was palpable.

    I opened my talk by speculating about how machine learning could have helped with SIS/LMS integration, much as I speculated earlier in the post about how generative AI might help with QTI:

    Now, today, we would have a different possible way of solving that particular interoperability problem than the one we came up with over a decade ago. We could take a large data set of roster information exported from the SIS, both before and after the IT professionals massaged it for import into the LMS, and aim a machine learning algorithm at it. We then could use that algorithm as a translator. Could we solve such an interoperability problem this way? I think that we probably could. I would have been a weaker product manager had we done it that way, because I wouldn’t have gone through the learning experience that resulted from the conversations we had to develop the specification. As a general principle, I think we need to be wary of machine learning applications in which the machines are the only ones doing the learning. That said, we could have probably solved such a problem this way and might have been able to do it in a lot less time than it took for the humans to work it out.

    I will argue that today’s EdTech interoperability challenges are different. That if we want to design interoperability for the purposes of insight into the teaching and learning process, then we cannot simply use clever algorithms to magically draw insights from the data, like a dehumidifier extracting water from thin air. Because the water isn’t there to be extracted. The insights we seek will not be anywhere in the data unless we make a conscious effort to put them there through design of our applications. In order to get real teaching and learning insights, we need to understand the intent of the students. And in order to understand that, we need insight into the learning design. We need to understand pedagogical intent.

    That new need, in turn, will require new approaches in interoperability standards-making. As hard as the challenges of the last decade have been, the challenges of the next one are much harder. They will require different people at the table having different conversations.

    Pedagogical Intent and Designing for Inquiry
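The “machine learning as translator” idea from that talk can be sketched in miniature. The following toy heuristic (not a real machine learning model, and with invented field names) guesses how SIS roster fields map onto LMS fields by comparing example exports from before and after the IT staff massaged them:

```python
def infer_field_mapping(sis_rows, lms_rows):
    """Guess which LMS field each SIS field became, by value overlap."""
    mapping = {}
    for sis_field in sis_rows[0]:
        sis_values = {row[sis_field] for row in sis_rows}
        best, best_overlap = None, 0
        for lms_field in lms_rows[0]:
            lms_values = {row[lms_field] for row in lms_rows}
            overlap = len(sis_values & lms_values)
            if overlap > best_overlap:
                best, best_overlap = lms_field, overlap
        if best is not None:
            mapping[sis_field] = best
    return mapping

# Invented example exports: the same two students, before and after massaging.
sis = [{"EMPLID": "1001", "LAST_NAME": "Diaz"},
       {"EMPLID": "1002", "LAST_NAME": "Chen"}]
lms = [{"user_id": "1001", "surname": "Diaz"},
       {"user_id": "1002", "surname": "Chen"}]

print(infer_field_mapping(sis, lms))
# {'EMPLID': 'user_id', 'LAST_NAME': 'surname'}
```

A statistical model trained on many such paired exports could generalize far better than this toy, but the point stands either way: the mapping can be learned from data without the humans ever having the conversation.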

    The core problem is that the key element for interpreting both student progress and the effectiveness of digital learning experiences—pedagogical intent—is not encoded in most systems. No matter how big your data set is, it doesn’t help you if the data you need aren’t in it. For this reason, I argued, fancy machine learning tricks aren’t going to give us shortcuts.

    That problem is the same, and perhaps even worse in some ways, with generative AI. All ChatGPT knows is what it’s read on the internet. And while it’s made progress in specific areas at reading between the lines, the fact is that important knowledge, including knowledge about applied learning design, is simply extremely scarce in the data it can access, and even in the data living in our learning systems that it can’t access.

    The point of my talk was that interoperability standards could help by supplying critical metadata—context—if only the standards makers set that as their purpose, rather than simply making sure that quiz questions end up in the right place when migrating from one LMS to another.

    I chose to open the talk by highlighting the ambiguity of language that enables us to make art. I chose this passage from Shakespeare’s final masterpiece, The Tempest:

    O wonder!
    How many goodly creatures are there here!
    How beauteous mankind is! O brave new world
    That has such people in’t!

    William Shakespeare, The Tempest

    It’s only four lines. And yet it is packed with double entendres and the ambiguity that gives actors room to make art:

    Here’s the scene: Miranda, the speaker, is a young woman who has lived her entire life on an island with nobody but her father and a strange creature whom she may think of as a brother, a friend, or a pet. One day, a ship runs aground on the shore of the island. And out of it comes, literally, a handsome prince, followed by a collection of strange (and presumably virile) sailors. It is this sight that prompts Miranda’s exclamation.

    As with much of Shakespeare, there are multiple possible interpretations of her words, at least one of which is off-color. Miranda could be commenting on the hunka hunka manhood walking toward her.

    “How beauteous mankind is!”

    Or. She could be commenting on how her entire world has just shifted on its axis. Until that moment, she knew of only two other people in all of existence, each of whom she had known her entire life and with each of whom she had a relationship that she understood so well that she took it for granted. Suddenly, there was literally a whole world of possible people and possible relationships that she had never considered before that moment.

    “O brave new world / That has such people in’t”

    So what is on Miranda’s mind when she speaks these lines? Is it lust? Wonder? Some combination of the two? Something else?

    The text alone cannot tell us. The meaning is underdetermined by the data. Only with the metadata supplied by the actor (or the reader) can we arrive at a useful interpretation. That generative ambiguity is one of the aspects of Shakespeare’s work that makes it art.

    But Miranda is a fictional character. There is no fact of the matter about what she is thinking. When we are trying to understand the mental state of a real-life human learner, then making up our own answer because the data are not dispositive is not OK. As educators, we have a moral responsibility to understand a real-life Miranda having a real-life learning experience so that we can support her on her journey.

    Pedagogical Intent and Designing for Inquiry

    Generative AI like ChatGPT can answer questions about different ways to interpret Miranda’s lines in the play because humans have written about this question and made their answers available on the internet. If you give the chatbot an unpublished piece of poetry and ask it for an interpretation, its answers are not likely to be reliably sophisticated. While larger models are getting better at reading between the lines—a topic for a future blog post—they are not remotely as good as humans are at this yet.

    Making the implicit explicit

    This limitation of language interpretation is central to the challenge of applying generative AI to learning design. ChatGPT has reignited fantasies about robot tutors in the sky. Unfortunately, we’re not giving the AI the critical information it needs to design effective learning experiences:

    The challenge that we face as educators is that learning, which happens completely inside the heads of the learners, is invisible. We cannot observe it directly. Accordingly, there are no direct constructs that represent it in the data. This isn’t a data science problem. It’s an education problem. The learning that is or isn’t happening in the students’ heads is invisible even in a face-to-face classroom. And the indirect traces we see of it are often highly ambiguous. Did the student correctly solve the physics problem because she understands the forces involved? Because she memorized a formula and recognized a situation in which it should be applied? Because she guessed right? The instructor can’t know the answer to this question unless she has designed a series of assessments that can disambiguate the student’s internal mental state.

    In turn, if we want to find traces of the student’s learning (or lack thereof) in the data, we must understand the instructor’s pedagogical intent that motivates her learning design. What competency is the assessment question that the student answered incorrectly intended to assess? Is the question intended to be a formative assessment? Or summative? If it’s formative, is it a pre-test, where the instructor is trying to discover what the student knows before the lesson begins? Is it a check for understanding? A learn-by-doing exercise? Or maybe something that’s a little more complex to define because it’s embedded in a simulation? The answers to these questions can radically change the meaning we assign to a student’s incorrect answer to the assessment question. We can’t fully and confidently interpret what her answer means in terms of her learning progress without understanding the pedagogical intent of the assessment design.

    But it’s very easy to pretend that we understand what the students’ answers mean. I could have chosen any one of many Shakespeare quotes to open this section, but the one I picked happens to be the very one from which Aldous Huxley derived the title of his dystopian novel Brave New World. In that story, intent was flattened through drugs, peer pressure, and conditioning. It was reduced to a small set of possible reactions that were useful in running the machine of society. Miranda’s words appear in the book in a bitterly ironic fashion from the mouth of the character John, a “savage” who has grown up outside of societal conditioning.

    We can easily develop “analytics” that tell us whether students consistently answer assessment questions correctly. And we can pretend that “correct answer analytics” are equivalent to “learning analytics.” But they are not. If our educational technology is going to enable a rich and authentic vision of learning rather than a dystopian, reductivist parody of it, then our learning analytics must capture the nuances of pedagogical intent rather than flattening it.

    This is hard.

    Pedagogical Intent and Designing for Inquiry

    Consider the following example:

    A professor knows that her students tend to develop a common misconception that causes them to make practical mistakes when applying their knowledge. She very carefully crafts her course to address this misconception. She writes the content to address it. In her tests, she provides wrong answer choices—a.k.a. “distractors”—that students would choose if they had the misconception. She can tell, both individually and collectively, whether her students are getting stuck on the misconception by how often they pick the particular distractor that fits with their mistaken understanding. Then she writes feedback that the students see when they choose that particular wrong answer. She crafts it so that it doesn’t give away the correct answer but does encourage students to rethink their mistakes.

    Imagine if all this information were encoded in the software. The hierarchy would look something like this:

    • Here is learning objective (or competency) 1
      • Here is content about learning objective 1
        • Here is assessment question A about learning objective 1.
          • Here is distractor c in assessment question A. Distractor c addresses misconception alpha.
            • Here is feedback to distractor c. It is written specifically to help students rethink misconception alpha without giving away the answer to question A. This is critical because if we simply tell the student the answer to question A then we can’t get good data about the likelihood that the student has mastered learning objective 1.
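In machine-readable form, that hierarchy might look something like the following sketch. The format and field names are invented for illustration, not drawn from any existing 1EdTech specification:

```python
# Hypothetical machine-readable encoding of the designer's pedagogical intent.
COURSE_DESIGN = {
    "learning_objective_1": {
        "question_A": {
            "distractors": {
                "c": {
                    "misconception": "alpha",
                    "feedback": "Hint that prompts rethinking misconception "
                                "alpha without revealing the answer.",
                }
            }
        }
    }
}

def students_stuck_on(misconception, responses, design=COURSE_DESIGN):
    """Flag students whose wrong answers map to a given misconception."""
    flagged = set()
    for student, (objective, question, choice) in responses.items():
        distractor = (design.get(objective, {})
                            .get(question, {})
                            .get("distractors", {})
                            .get(choice))
        if distractor and distractor["misconception"] == misconception:
            flagged.add(student)
    return flagged

# Each response: (learning objective, question, answer choice selected).
responses = {
    "miranda": ("learning_objective_1", "question_A", "c"),
    "ferdinand": ("learning_objective_1", "question_A", "b"),
}
print(students_stuck_on("alpha", responses))
# {'miranda'}
```

With even this much intent encoded, “correct answer analytics” start to become learning analytics: a wrong answer can be tied back to the specific misconception the designer anticipated.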

    All of that information is in the learning designer’s head and, somehow, implicitly embedded in the content in subtle details of the writing. But good luck teasing it out by just reading the textbook if you aren’t an experienced teacher of the subject yourself.

    What if these relationships were explicit in the digital text? For individual students, we could tell which ones were getting stuck on a specific misconception. For whole courses, we could identify the spots that are causing significant numbers of students to get stuck on a learning objective or competency. And if that particular sticking point causes students to be more likely to fail either that course or a later course that relies on a correct understanding of a concept, then we could help more students persist, pass, stay in school, and graduate.

    That’s how learning analytics can work if learning designers (or learning engineers) have tools that explicitly encode pedagogical intent into a machine-readable format. They can use machine learning to help them identify and smooth over tough spots where students tend to get stuck and fall behind. They can find the clues that help them identify hidden sticking points and adjust the learning experience to help students navigate those rough spots. We know this can work because, as I wrote about in 2012, Carnegie Mellon University (among others) has been refining this science and craft for decades.

    Generative AI adds an interesting twist. The challenge with all this encoding of pedagogical intent is that it’s labor-intensive. Learning designers often don’t have time to focus on the work required to identify and improve small but high-value changes because they’re too busy getting the basics done. But generative AI that creates learning experiences modeled after the pedagogical metadata in the educational content it is trained on could provide a leg up. It could substantially speed up the work of writing the first-draft content so that designers can focus on the high-value improvements that humans are still better at than machines.

    Realistically, for example, generative AI is not likely to know the particular common misconceptions that block students from mastering a competency, or how to probe for and remediate those misconceptions. But if it were trained on the right models, it could generate good first-draft content through a standards-based metadata format that could be imported into a learning platform. The format would have explicit placeholders for those critical probes and hints. Human experts, supported by machine learning, could focus their time on finding and remediating these sticking points in the learning process. Their improvements would be encoded with metadata, providing the AI with better examples of what effective educational content looks like. Which would enable the AI to generate better first-draft content.

    1EdTech could help bring about such a world through standards-making. But they’d have to think about the purpose of interoperability differently, bring different people to the table, and run a different kind of process.

    O brave new world that has such skilled people in’t

    I spoke recently to the head of product development for an AI-related infrastructure company. His product could enable me to eliminate hallucinations while maintaining references and links to original source materials, both of which would be important in generating educational content. I explained a more elaborate version of the basic idea in the previous section of this post.

    “That’s a great idea,” he said. “I can think of a huge number of applications. My last job was at Google. The training was terrible.”

    Google. The company that’s promoting the heck out of their free AI classes. The one that’s going to “disrupt the college degree” with their certificate programs. The one that everybody holds up as leading the way past traditional education and toward skills-based education.

    Their training is “terrible.”

    Yes. Of course it is. Because everybody’s training is terrible. Their learning designers have the same problem I described academic learning designers as having in the previous section. Too much to develop, too little time. Only much, much worse. Because they have far fewer course design experts (if you count faculty as course design experts). Those people are the first to get cut. And EdTech in the corporate space is generally even worse than academic EdTech. Worst of all? Nobody knows what anybody knows or what anybody needs to know.

    Academia, along with 1EdTech and several other standards bodies funded by corporate foundations, is pouring incredible amounts of time, energy, and money into building a data pipeline for tracking skills. Skill taxonomies move from repositories to learning environments, where evidence of student mastery is attached to those skills in the form of badges or comprehensive learner records. Which are then sent off to repositories and wallets.

    The problem is, pipelines are supposed to connect to endpoints. They move something valuable from the place where it is found to the place where it is needed. Many valuable skills are not well documented if they are documented at all. They appear quickly and change all the time. The field of knowledge management has largely failed to capture this information in a timely and useful way after decades of trying. And “knowledge” management has tended to focus on facts, which are easier to track than skills.

    In other words, the biggest challenge that folks interested in job skills face is not an ocean of well-understood skill information that needs to be organized but rather a problem of non-consumption. There isn’t enough real-world, real-time skill information flowing into the pipeline and few people who have real uses for it on the other side. Almost nobody in any company turns to their L&D departments to solve the kinds of skills problems that help people become more productive and advance in their careers. Certainly not at scale.

    But the raw materials for solving this problem exist. As a CEO of HP once famously observed, the company knows a lot. It just doesn’t know what it knows.

    Knowledge workers do record new and important work-related information, even if it’s in the form of notes and rough documents. Increasingly, we have meeting transcripts thanks to videoconferencing and AI speech-to-text capabilities. These artifacts could be used to train a large language model on skills as they are emerging and needed. If we could dramatically lower the cost and time required to create just-in-time, just-enough skills training then the pipeline of skills taxonomies and skill tracking would become a lot more useful. And we’d learn a lot about how it needs to be designed because we’d have many more real-world applications.
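As a toy illustration of mining such artifacts, here is a sketch that surfaces candidate skill gaps by counting how-to questions in meeting transcripts. A real pipeline would use an LLM rather than a regular expression, and the transcripts here are invented:

```python
import re
from collections import Counter

# Naive pattern for spoken how-to questions, e.g. "how do I rebase ...?"
HOW_TO = re.compile(r"how (?:do|can|would) (?:i|we) ([a-z ]+?)[?.]", re.I)

def skill_gap_candidates(transcripts):
    """Rank recurring how-to questions as candidate just-in-time training topics."""
    counts = Counter()
    for text in transcripts:
        for phrase in HOW_TO.findall(text):
            counts[phrase.strip().lower()] += 1
    return counts.most_common()

transcripts = [
    "Quick question: how do I rebase our release branch? Also, standup moved.",
    "Someone asked how can we rebase our release branch? again today.",
    "How would I configure the grade passback endpoint?",
]
print(skill_gap_candidates(transcripts))
# [('rebase our release branch', 2), ('configure the grade passback endpoint', 1)]
```

Even a crude signal like this points at the missing pipeline: recurring questions are evidence of a skill that is needed now, long before anyone writes it into a taxonomy.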

    The first pipeline we need is from skill discovery to learning content production. It’s a huge one, we’ve known about it for many decades, and we’ve made very little progress on it. Groups like 1EdTech could help us to finally make progress. But they’d have to rethink the role of interoperability standards in terms of the purpose and value of data, particularly in an AI-fueled world. This, in turn, would not only help match worker skills with labor market needs more quickly and efficiently but also create a huge industry of AI-aided learning engineers.

    Summing it up

    So where does this leave us? I see a few lessons:

    • In general, lowering the cost of coding through generative AI doesn’t eliminate the need for technical interoperability standards groups like 1EdTech. But it could narrow the value proposition for their work as currently applied in the market.
    • Software engineers, learning designers, and other skilled humans have important skills and tacit knowledge that don’t show up in text and can’t be hoovered up by a generative AI that swallows the internet. Therefore, these skilled individuals will still be needed for some time to come.
    • We often gain access to tacit knowledge and valuable skills when skilled individuals talk to each other. The value of collaborative work, including standards work, is still high in a world of generative AI.
    • We can capture some of that tacit knowledge and those skills in machine-readable format if we set that as a goal. While doing so is not likely to lead to machines replacing humans in the near future (at least in the areas I’ve described in this post), it could lead to software that helps humans get more work done and spend more of their time working on hard problems that quirky, social human brains are good at solving.
    • 1EdTech and its constituents have more to gain than to lose by embracing generative AI thoughtfully. While I won’t draw any grand generalizations from this, I invite you to apply the thought process of this blog post to your own worlds and see what you discover.


  • desperately in need of redefinition in the age of generative AI. – Sijen


    The vernacular definition of plagiarism is often “passing off someone else’s work as your own” or, more fully, in the University of Oxford’s formal guidance, “Presenting work or ideas from another source as your own, with or without consent of the original author, by incorporating it into your work without full acknowledgement.” This latter definition works better in the current climate, in which generative AI assistants are being rolled out across many word-processing tools. When a student can start a prompt and have the system, rather than another individual, write paragraphs, there is an urgent need to redefine academic integrity.

    If they are not your own thoughts committed to text, where did they come from? Any thoughts that are not your own need to be attributed. Generative AI applications are already being used the way previous generations used Wikipedia: as a source of initial ‘research’, clarification, and definitions, and, for the more diligent, perhaps for sources. In the early days of Wikipedia, I saw digitally illiterate students copy and paste whole blocks of text from the website straight into their submissions, often without removing the hyperlinks! The character of Wikipedia as a source has evolved. We need to engage in an open conversation with students, and between ourselves, about the purpose of any writing task assigned to a student. We need to quickly move students beyond unreferenced chatbots into structured and referenced generative AI tools, deploying what we have learnt about Wikipedia. Students need to differentiate between their own thoughts and triangulate everything else before citing and referencing it.


