Last night, I looked at a chart that had been tweeted out by Marco Learning, a terrific source for information about The College Board’s AP Program. It showed the percentage of all scores graded 4 and 5 over time by subject, and there were some glaring points: Lots of big increases in certain subjects that didn’t seem to make sense. Turns out, their data was correct.
Wanting to dive down a little deeper, I went to the College Board website to look at the data myself, and to “download” it for some additional analysis. I put the word download in quotation marks on purpose.
I have a history with College Board, of course. I used to download the very rich AP data by state, exam, and ethnicity they’d post on their site and put it into an interactive format that pulled out insight better than the large, text-exclusive spreadsheets they’d post. Then–despite the organization’s oft-cited commitment to transparency–they stopped.
In an example of Newspeak worthy of the novel 1984 that they might want to use in a future AP English Literature Exam, College Board said they were going to implement a “streamlined” reporting protocol for the data. Less data, and less insight, in other words, was better. (They also announced that their “Landscape” product was being pulled down while they were saying they were making it more transparent, by the way, and no high school person has access to it today.)
Anyway, this chart shows incorrect data for AP Psych, suggesting that the percentage of 4 and 5 scores increased by 42 percentage points between 2022 and 2024. Let me explain how it found its way into my tweet, and the larger issues it points out.
You can still download summary data at the subject level (but nothing more detailed than that) on the College Board website, but it comes in a messy format that makes one think they don’t really want you to do any analysis on it. It has hidden rows, hidden columns, merged cells, and different formats by row that make anything other than tedious manual extraction almost impossible. It looks like this; the data are clearly intended for casual users who want a quick answer, not formatted in a way that makes in-depth study easy.
So, after getting frustrated wrangling this and admitting I’d been foiled by the data people on Vesey Street, I settled not for raw data, but for summaries on their website, on pages like this for 2024 and this for 2022. I manually copied all the tables, pasted them into Excel, and then set about cleaning them up. Even that was frustrating: In some years, College Board calls its exam “AP English Language & Composition,” while in other years, it’s “AP English Language and Composition.” Similarly, it’s either “AP 2D Art & Design” or “AP 2-D Art and Design.” Some years, data are rounded to the nearest whole number; in others, to one decimal place. These are insignificant differences to human readers, but they’re a big deal for computers.
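For readers who want to see why those naming differences matter to a computer, here is a minimal sketch of one way to reconcile them. The function name `normalize_exam_name` and its rules are assumptions made up for illustration; they cover only the variations mentioned above and are not a description of how the data was actually cleaned.

```python
import re

def normalize_exam_name(name: str) -> str:
    """Collapse cosmetic variations so the same exam matches across years.

    Hypothetical helper: handles only the "&" vs "and" and "2D" vs "2-D"
    differences described in the post.
    """
    s = name.strip().lower()
    s = s.replace("&", " and ")               # "&" and "and" should compare equal
    s = re.sub(r"\b(\d)-d\b", r"\1d", s)      # "2-d" -> "2d", "3-d" -> "3d"
    s = re.sub(r"\s+", " ", s)                # collapse runs of whitespace
    return s.strip()

# The four spellings quoted above reduce to two keys.
print(normalize_exam_name("AP English Language & Composition"))
print(normalize_exam_name("AP 2-D Art and Design"))
```

Matching on the normalized key, while keeping the original label for display, lets rows from different years line up without altering what College Board actually published.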
All seemed to be going well, although the year-to-year changes in nomenclature and formatting seem capricious and undisciplined from a data standpoint, especially for an organization that prides itself on its research and analysis capabilities.
And, finally, on the 2024 link, above, guess what? AP Psych is listed twice: First under “History and Social Sciences” and then again under “Sciences.” So, AP Psych in 2024 (but not the other years) got counted twice.
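A check like the following would catch an exam that appears under two categories in the same year. It is only a sketch: the row layout (`year`, `category`, `exam`) and both function names are assumptions for illustration, and the rows are placeholders showing structure, not real College Board data.

```python
from collections import Counter

def find_duplicate_exams(rows):
    """Return sorted (year, exam) keys that appear more than once."""
    counts = Counter((row["year"], row["exam"]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

def drop_duplicate_exams(rows):
    """Keep only the first row seen for each (year, exam) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["year"], row["exam"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Placeholder rows (hypothetical layout; the 2022 category is illustrative).
rows = [
    {"year": 2024, "category": "History and Social Sciences", "exam": "AP Psychology"},
    {"year": 2024, "category": "Sciences", "exam": "AP Psychology"},
    {"year": 2022, "category": "History and Social Sciences", "exam": "AP Psychology"},
]

print(find_duplicate_exams(rows))
print(len(drop_duplicate_exams(rows)))
```

The key deliberately leaves out the category, since the duplicate here differs only in which heading the exam sits under.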
Had I been successful in just downloading and cleaning the numbers, this would not have happened, because I calculate percentages from the totals of the raw numbers. But because I had to scrape this off a website, this error showed up. I should have checked this a couple of ways before posting, but I didn’t, and that’s my fault.
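To illustrate why working from raw counts is more forgiving, here is a small sketch with made-up numbers. It assumes, and this is only an assumption about the mechanism, that scraped percentage rows get added together per exam; the field names and both functions are hypothetical.

```python
def share_from_counts(rows):
    """Percent of scores that are 4 or 5, computed from raw counts."""
    top = sum(row["n4"] + row["n5"] for row in rows)
    total = sum(row["n_total"] for row in rows)
    return 100.0 * top / total

def share_from_summed_percentages(rows):
    """Percent of 4s and 5s obtained by adding already-computed percentages."""
    return sum(row["pct4"] + row["pct5"] for row in rows)

# Made-up numbers for illustration only, not real AP results.
row = {"n_total": 1000, "n4": 250, "n5": 150, "pct4": 25.0, "pct5": 15.0}
once = [row]
twice = [row, row]  # the same exam listed under two categories

print(share_from_counts(once), share_from_counts(twice))
print(share_from_summed_percentages(once), share_from_summed_percentages(twice))
```

With raw counts, a duplicated row doubles both the numerator and the denominator, so the ratio does not move. With summed percentages, the duplicate doubles the result, which is the kind of jump that a simple "is this over 100, or wildly different from last year?" check would flag.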
This would normally be where I’d call on College Board to make their data more accessible to the general public in the interest of transparency, but a) they don’t listen, b) they don’t give a crap about the members, and c) they just wait for people to forget how bad they are at the simplest things and keep paying their executives multimillion-dollar salaries.
And these are the people, I’d remind you, who are being asked to fix the FAFSA, and despite the massive conflict of interest it creates, gleefully and arrogantly agree to do so.
All is good. Carry on. I’ll post the complete data soon, after I do more auditing.