A Public Discourse Bot Should Be Mistaken for the Real Thing

1. Afterwards, there was still the command line

This past month, having just defended my dissertation, I found myself similarly needing to do some work, but not up to the challenge of literature. In one of my favorite passages from De Quincey’s Confessions, he claims that in the depths of his addiction, his thinking clouded by pain and intoxication, political economy helped him to convalesce:

In this state of imbecility I had, for amusement, turned my attention to political economy; my understanding, which formerly had been as active and restless as a hyæna, could not, I suppose (so long as I lived at all) sink into utter lethargy; and political economy offers this advantage to a person in my state, that though it is eminently an organic science (no part, that is to say, but what acts on the whole as the whole again reacts on each part), yet the several parts may be detached and contemplated singly.

In a similar situation, I decided to code something. Coding projects, in my experience, have a cognitive reward structure similar to what De Quincey describes of reading economics. Because computers do exactly what you tell them to, it’s hard, unless you’re working in a complex design environment, to go too far afield of your intentions. Contrary to the questions-based approach of traditional humanistic inquiry, I’ve found that programming, at my level of sophistication, either results in something that works or something that doesn’t. Every once in a while, an error will be genuinely interesting, but that’s the rare exception. And so the modular thinking facilitated by programming allows me to test and re-test pieces and wholes in ways that keep me productively balanced in that flow state between feeling successful and feeling frustrated. In short, I find the work of coding to be qualitatively different from that of critical thinking. But that’s a far cry from saying that the end result of such work necessarily lacks a critical edge.

2. Critical bots

I’ve wanted to build a Twitter bot for a while, both because the Twitter API is where I first cut my teeth on processing large data sets, and because some of the work in this area I’ve found intellectually stimulating and politically inspiring. Mark Sample has a list of “protest bots,” which, in the post that inspired my title here, he defines as “topical, cumulative, data-based, and oppositional,” and, more importantly, as bots that “can’t be mistaken for bullshit.” In my opinion, the most medium-transforming protest bot is @every3minutes, by the historian Caleb McDaniel at Rice: it tweets a variation of “A slave was just sold” every three minutes, the average time between such sales from 1820 to 1860.

This bot changes the entire Twitter experience: scrolling through status updates stops being a thoughtless pastime when, every few flicks of the thumb, you’re suddenly socked in the gut by a historical reality made immediate.

3. A political discourse bot

I attempted to build something slightly different — a public discourse bot, we might call it. It’s May 2015, and the 2016 presidential campaign is well underway. On the one hand, this is depressing because blah blah permanent campaign blah blah. On the other hand, it’s exciting because it is one of the few domains in which most (not all) Americans still have some say over policy: voting districts have been gerrymandered, and then some, out of existence, and the Supreme Court is openly flirting with disregarding the clear intent of what little Congress has passed. In other words, I think presidential campaigns are a great opportunity for public discourse.

What I built, then, was an interactive Twitter bot based on a rudimentary keyword search engine. I scraped the congressional record for all of Bernie Sanders’ statements, and used NLTK to build a database of his keywords and phrases.[1] I removed some of the passages that would be confusing when taken out of context (points of order, some letters read into the record, the details of some amendments, etc.), and after messing around for a while with the basic search algorithm and the output formatting, the senator’s congressional record had essentially been made interactive.
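For the curious, the indexing step looked roughly like the sketch below. This is not the bot’s actual code: the `sanders_statements` directory is a placeholder for the scraped record, and the “database” here is just a stopword-filtered frequency distribution plus NLTK’s bigram collocations standing in for my keyword tables.

```python
import os
import nltk
from nltk.corpus import stopwords
from nltk.collocations import BigramCollocationFinder, BigramAssocMeasures

# One-time setup: nltk.download('punkt'); nltk.download('stopwords')
STOP = set(stopwords.words('english'))

def profile(text, n_phrases=10):
    """Return (word frequencies, top bigram phrases) for one statement."""
    words = [w.lower() for w in nltk.word_tokenize(text) if w.isalpha()]
    freq = nltk.FreqDist(w for w in words if w not in STOP)
    finder = BigramCollocationFinder.from_words(words)
    finder.apply_word_filter(lambda w: w in STOP)  # drop phrases containing stopwords
    phrases = finder.nbest(BigramAssocMeasures.pmi, n_phrases)
    return freq, phrases

# One profile per scraped statement.
corpus = {}
for fname in os.listdir('sanders_statements'):
    with open(os.path.join('sanders_statements', fname)) as f:
        corpus[fname] = profile(f.read())
```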

4. The Specifics

@SandersBot has two functions built in:

  1. If someone mentions him, he makes a weighted random guess as to which document in his corpus is the best fit, and then picks the best passage in that document (sketched below, after this list).
  2. If nobody talks to him for an hour, he reads what his friends are tweeting about, and tries to respond with the two sentences he thinks are most relevant to those topics.

And, because he was written for an academic, he always includes a link to the source material in the congressional record.
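To give a sense of how the first function works under the hood, here is a sketch of the shape of that logic (a reconstruction, not the deployed code), reusing the `corpus` dict from the indexing sketch above and a deliberately crude relevance score:

```python
import os
import random
import nltk

def relevance(query, freq):
    # Crude scoring: sum the statement's frequency of each query word.
    return sum(freq[w] for w in query)

def respond(mention_text, corpus, statements_dir='sanders_statements'):
    query = {w.lower() for w in nltk.word_tokenize(mention_text) if w.isalpha()}
    docs = list(corpus)
    # Weighted random guess (random.choices needs Python 3.6+): better-fitting
    # documents are likelier, but repeat questions won't always get one answer.
    weights = [relevance(query, corpus[d][0]) + 1 for d in docs]
    choice = random.choices(docs, weights=weights, k=1)[0]
    # Best passage: the sentence sharing the most words with the query.
    with open(os.path.join(statements_dir, choice)) as f:
        sentences = nltk.sent_tokenize(f.read())
    best = max(sentences,
               key=lambda s: len(query & {w.lower() for w in nltk.word_tokenize(s)}))
    return best, choice
```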

I’ve been surprised at how well he deals with basic policy questions:

[Screenshot: the bot fielding a policy question from a user]

And this past weekend, on Memorial Day, he really seemed to pick up on the day’s theme in his timeline:

[Screenshot: the bot’s Memorial Day replies about veterans]

Sometimes, of course, he misses his mark. But the other day, the bot did alright on Twitter’s version of the Turing test, when a couple users didn’t realize it was a bot account, and engaged it in an extended discussion. The bot responds to almost any mention, so replies to him can quickly prompt runaway threads.  One person was a bit frustrated at how quickly the bot posted long, sometimes off-topic replies in her timeline; after I told her the replies weren’t coming from a human, she was very kind but said she wouldn’t use it again. Another person who engaged it at length didn’t seem to mind when I told him it was a bot, and came back to ask it a few more questions the next day.

5. Conclusion

I don’t know if this would work with every congressperson (though I did post an early version of the project’s code to GitHub). What’s ironic about my choice of Bernie Sanders is that his vocal stance on specific issues is both what makes this bot functional and what makes it in many ways superfluous. Functional, because his specific and expressive stances on policy questions make the corpus so searchable and quote-worthy; superfluous, because his clear stances on most issues make them easy to find in any news search engine.[2]

But at the same time, the bot’s regular tweeting schedule and always-on availability for questions stage less of an intervention than a contribution to political discourse. At the end of the day, the bot is supposed to be like C-SPAN having a dream about social media: it tries to blend the dusty text of the congressional record and the day-and-night pulse of Twitter, to make the politics of Congress accessible on one of the most politically rich communications platforms we have today.

 

Footnotes

———–

[1] I was inspired here by one of my students, who this semester submitted a truly compelling final project that took William Wordsworth’s poetic corpus and scrambled the lines into believable simulacra using Markov chains and Natural Language Processing.
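For readers who haven’t met the technique: a word-level Markov chain records which words follow each n-word window in the source text, then walks those transitions to produce new text. A generic sketch (emphatically not my student’s code):

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each `order`-word window to the words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=30):
    """Random-walk the chain to produce a believable simulacrum."""
    state = random.choice(list(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break
        out.append(random.choice(followers))
        state = tuple(out[-len(state):])
    return ' '.join(out)
```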

[2] If anybody is interested, I started processing Hillary Clinton’s congressional corpus but decided I’d given enough time to the project as a whole. I would be happy to send you the files.

2014 Wordsworthian Student Projects

2014 was busy: teaching at URI, moving to Houston, writing like mad, and teaching at Rice — Romanticism and Shakespeare, so thanks both to the Romanticists back at Brown and to the Shakespeareans James Kuzner and Jean Feerick, without whom I wouldn’t be able to teach a spider to weave a web.


The point being that I only blog about my students these days, and my Romanticism students this past semester did some very interesting work that lends itself to the blog format. Conveniently, they both worked with the same subject material, namely the working relationship of Dorothy and William Wordsworth. That relationship has become something of a pedagogical touchstone for me (displacing even Blake!), after two years of on-and-off dissertation engagement with the subject.

And so in my Fall 2014 Rice seminar, I gave students a crash course in Wordsworthianism, walking them through Homans, Levin, Mellor, Woof (and Woof), Fay, and Newlyn – though I should say I only teach the journals. I’ve adopted, as a means of fixing the canonicity problem, the resolution never to teach William without Dorothy. I’m more in Fay’s camp than anywhere else, and try to do justice to the complexity of their working relationship, but as one student (I can’t remember who) assessed my terrible poker face rather fairly during office hours, “You weren’t going to let William get away with it.”

And I love what they did. Dorothy, in their renderings, isn’t an appendix, or a pretext, or an index. Their representations make us think of the writers’ relationship in terms of intertextuality, with interesting and productive differences between their readings.

———–

1. Jessica, Chas, and Daniel made a video reflecting on William and Dorothy’s biographical and literary relationship. In a reading (by Daniel) of an excerpted “Tintern Abbey”, they allow William only to go so far in his approach to the poem’s closing address to his sister (see Fay’s “address-to-maiden”). At a certain point, Dorothy insinuates herself into the poem as though she were refusing merely to be imagined by her brother, and retroactively changes the video we’ve just watched.

———–

2. Alitha (Computer Science major) and Sharon (English major) combined their skills to create a hypertext version of the “Daffodils”. In one column, we have William’s poem (the 1815 version); in the second, Dorothy’s journal entry; and in the third, a changing block of commentary. When you mouse over portions of either column that have been categorized in a particular way, corresponding blocks of text in both columns are highlighted. When you click on such highlighted text, commentary pops up in the third column. It’s a lot like RapGenius, but I like this interface much better: there is no original text here with an index, but two interrelated texts and a changing third that attempts to mediate. As you’ll see, the page takes a determined interpretive stance on the nature of the relation between the texts; and while such authoritativeness is often a criticism leveled against hypertextual presentations, I think the site’s interactivity and critical approach present a real challenge by Digital Humanities to the traditional anthology form.

Concept sketch. Click to view the interactive site (best viewed in full-screen mode).

Kudos to both groups for doing great critical work in nontraditional formats, and for giving me some incredible teaching aids for this semester’s Romanticism class!

2013 Summer Science Fiction Course Feedback

Regular visitors to my blog know that I taught a science fiction course last year at Brown, through the Continuing Education department. In that course, we explored a number of different authors’ and filmmakers’ attempts to understand the limits of personal and social human existence, from Mary Shelley’s Frankenstein to Star Trek to Neal Stephenson’s imagination of a Turing test with paranoid computers.

Some of my students, a year out, have written testimonials about how the course has helped them as writers and critical readers of literature, and I’d like to share them! It was an immensely gratifying experience to teach an often-overlooked genre to such gifted students, and I’m so pleased to hear the same from them.

Student 1:

“Future Perfect was a fantastic course, not only because of the nature of the material, which was in and of itself fascinating, but because of the new way that the class forced me to look at literature, at society, and at the world – expanding my horizons, opening my eyes to a new layer of textual analysis. Exploring the crossroads of science and literature in a way I never would have imagined, gave us all a chance to think about things in new ways and to try to understand the innate human desire to wonder about what the future will bring, what our technological advancements mean, and ultimately what makes us human and what makes humanity superior or different to nature or to technology. I find myself thinking about literature in new ways still, of course as much as this is an accolade for the class, it is even more so an accolade for John, who expanded my world view, while simultaneously helping me focus on the close analysis of science fiction in every medium and most surprisingly helped me to unearth aspects of my own writing that I otherwise never would have realized fully.”

Student 2:

As someone who adores science fiction, Future Perfect was basically an irresistible opportunity to talk about the books and films I love. Sci-fi is a field that gets tragically overlooked by just about every curriculum, so to be able to learn more about it and discuss its themes in a classroom environment was a unique (and awesome) experience. The class definitely helped me as a writer; the assignments gave us a lot of room for creativity, and the feedback we received was conducive to improvement.

Student 3:

I signed up for Future Perfect planning to indulge a guilty pleasure of mine (science fiction of course!), but after the two weeks were over I had such a deep respect for the genre and its ideas that I now consider sci-fi a true art form and a (entirely guiltless) passion. The class touched on everything from philosophy to history to real scientific discoveries, all of which are the inspiration for science fiction along with the powerful question, what if? Science fiction is not only a film and literary genre, it is a lens through which many people have looked at the world and imagined it differently, which of course, is the first step to changing it. As a student of the Future Perfect class, I was expected to complete college-level reading and writing assignments, and the fact that I was reading Frankenstein and writing short stories about reanimated corpse “service beings” and meteorites that bent the laws of probability had no impact on the fact that now, as I face college in a mere few months, I feel ready for it. Future Perfect was an unforgettable experience, and as close to perfect as things come outside of fiction.

I’m teaching a new version of this course in July. If you know a pre-college student who loves science fiction and wants to learn how to read it and write it (either in short story form or as an academic essay), please send them my way!
john_mulligan@brown.edu

#ACLA2014 tweets (Now with #ACLA14) through Saturday, March 22, 10:00pm

The graph has been updated to include all tweets before 10:00pm (ish) on Saturday, March 22. It includes Friday’s tweets as well.

In addition, it now includes #ACLA14, since there has been some activity on that hashtag as well.

I’m sharing what I think is a useful tool for navigating the Twitter activity on #ACLA2014 (& #ACLA14). When I first posted this, it was more of a potential utility, as there was not yet enough activity to require this kind of map; there is now enough (734 tweets by 225 users, and 196 connections) to produce a navigable map.

The nodes in this graph are people tweeting on #ACLA2014 & #ACLA14, or mentioned by people using that hashtag. If you click on them, you’ll see a user icon, a list of their tweets with links to that content, and below this a list of their connections to other people. Connections represented here: retweets, replies, and in-line mentions (“Loved the panel with @SoAndSo”).
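Concretely, each connection type can be read straight off the tweet objects the REST API returns. A sketch of the edge-building step (the field names are Twitter’s v1.1 tweet JSON; the networkx graph is my stand-in for the Gephi side of the workflow):

```python
import networkx as nx

def add_interactions(G, tweets):
    """Add retweet/reply/mention edges from a list of v1.1 tweet dicts."""
    for t in tweets:
        src = t['user']['screen_name']
        if 'retweeted_status' in t:                        # retweet
            G.add_edge(src, t['retweeted_status']['user']['screen_name'],
                       kind='retweet')
        elif t.get('in_reply_to_screen_name'):             # reply
            G.add_edge(src, t['in_reply_to_screen_name'], kind='reply')
        for m in t['entities'].get('user_mentions', []):   # in-line mention
            G.add_edge(src, m['screen_name'], kind='mention')
    return G

# `tweets` is a list of parsed tweet JSON objects gathered by the capture script.
G = add_interactions(nx.DiGraph(), tweets)
```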

View the network graph here.

[Screenshot of the #ACLA2014 network graph]

I will update it over the course of the day, and make another for Saturday’s activity (unless people would prefer a multi-day map). I welcome feedback!

My own panel can be found on pages 274-5 of your program. It’s Friday & Saturday, 4:40-6:30, at 25 West 4th C-16. I present on Saturday, and will be talking about Thomas De Quincey and the Netflix & Amazon recommendation algorithms.

Studies in Romanticism Dynamic, Co-Citational Network Graph (Video)

Last week, because of a tweet by Alan Liu, I found Scott Weingart’s wonderful digital humanities blog. As it turns out, he had already gone through, last year, some of the same work processes that I codified last month by adding a .gexf export function to Neal Caren’s refcliq.

One of the things I learned from Scott’s post was that I had been drawing co-citational, rather than citational, graphs, which made a lot of sense of the structures I’d been seeing. Basically, a line on the graph between A and B doesn’t represent work A citing work B, but rather that A and B are both cited by some third work, not necessarily represented on the graph. All the nodes you have seen in the graphs I have posted are works that have been cited two or more times, and each edge represents that its two endpoints have been cited together by two or more separate articles.
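In code, the distinction is easy to see: you never draw an edge from a citing article to its references; you draw edges among the references themselves, once per article that cites them together, and then apply those two thresholds. A sketch (my own minimal version, not refcliq’s implementation), assuming one list of cited works per article:

```python
from collections import Counter
from itertools import combinations
import networkx as nx

def cocitation_graph(reference_lists, min_cites=2, min_cocites=2):
    cites = Counter()    # how many articles cite each work
    cocites = Counter()  # how many articles cite each pair together
    for refs in reference_lists:
        refs = sorted(set(refs))
        cites.update(refs)
        cocites.update(combinations(refs, 2))
    G = nx.Graph()
    for (a, b), n in cocites.items():
        if n >= min_cocites and cites[a] >= min_cites and cites[b] >= min_cites:
            G.add_edge(a, b, weight=n)  # edge weight = co-citation count
    return G
```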

What was missing from these exports was the temporal dimension: co-citational network graphs allow us to think visually about how fields organize knowledge, and their own production of it. The interactive graphs I published before, however, were static, and so did not allow us to see how these internal structures developed over time.

I therefore reworked the code to export dynamic graphs (.gexf format only). These graphs register changes in influence and connectedness, over time, of the works cited by a journal.
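For anyone wanting to reproduce the export without my patch: networkx can also write GEXF, and, as I read its writer, it switches to dynamic mode when nodes and edges carry 'start' (and optionally 'end') attributes. Treat the sketch below as my assumption about that behavior rather than gospel; here a work appears at the year of its first co-citation and never expires, matching the no-decay rule described below.

```python
import networkx as nx

def make_dynamic(G, first_cited):
    """Timestamp nodes and edges so Gephi's timeline can replay the graph.

    first_cited: dict mapping each work to the year it first appears.
    Nodes get a 'start' but no 'end', so once a work appears it persists.
    """
    for n in G.nodes():
        G.nodes[n]['start'] = first_cited[n]
    for u, v in G.edges():
        G.edges[u, v]['start'] = max(first_cited[u], first_cited[v])
    return G

# G: the co-citation graph; first_cited: derived from the Web of Science records.
nx.write_gexf(make_dynamic(G, first_cited), 'sir_dynamic.gexf')
```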

I believe I wrote the export code properly, but it is producing small variances in graph sizes compared to Caren’s original, so if anyone is interested in helping to unpack that, definitely email me. I also considered the usefulness of making modularity classes dynamic.

Back to the graph. My test case is again Studies in Romanticism. Over time, you will see individual nodes and edges changing size based on (respectively) the number of times a given work has been cited, and the number of times two works have been cited together. You will also see clusters develop, and separate from one another. I have not added any decay function, so once works are linked, or once a work has a specific node size, it either keeps that size or grows; no works or links diminish in absolute terms simply because they haven’t been cited in a while.

In relative terms, however, they may fail to keep up with the growing influence of Wordsworth’s Prelude, Coleridge’s Biographia Literaria, or even Milton’s Paradise Lost. I have identified clusters around the big six poets, plus three around Mary Shelley, William Godwin, and Walter Scott. I have also identified the developments of two of these sub-areas of Romantic interest with the publications of major critical works (all shown on graph).

The .gexf file can be downloaded here.

Here is my annotated video screen capture of the dynamic graph’s development over time.

Studies in Romanticism Dynamic Co-Citational Graph from John Mulligan on Vimeo.

I really think this kind of visualization could be an incredible research aid, if the raw data were cleaned up. But other commitments are likely going to keep me from working on this project for a while. In the meantime, please consider developing the code on GitHub and/or using the tool to create a map of your own field’s evolution. If you do, please email me a link to give me a few minutes’ break :)

PMLA Citation Network, 1975-2010

I’ll be making one more of these graphs (Victorian Studies) before I give it a rest for a while, but I thought I would present a nice coda to the MLA interaction graphs (I have two network graphs of Twitter interactions on the hashtag #mla14, using slightly different scripting and visualization, in previous blog posts).

To round off this thinking about academic networks in all senses — though I have to say I haven’t been doing much thinking at all on the blog about this, as I just try to make the data legible — I thought I would publish a citational network graph for PMLA. For the details on how to navigate these graphs, go to my earlier post on Studies in Romanticism. On this graph, though, the metadata doesn’t seem to be doing the job of identifying communities, and the database had a good number of orphan nodes that were causing problems with the graph and had to be removed.
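Removing the orphans, at least, is nearly a one-liner if you work in networkx rather than Gephi (a trivial sketch, assuming G is the loaded graph):

```python
import networkx as nx

# Drop nodes that ended up with no citation edges before layout and export.
G.remove_nodes_from(list(nx.isolates(G)))
```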

The graph below displays the citational network for PMLA from vol. 90, no. 1 to vol. 125, no. 4 (1975-2010).

View Graph:
[Interactive network graph]

#MLA14 network, 6pm Friday to 10am Sunday

This is my second graph of tweets on this blog. I’m using an old script of mine to capture #mla14 tweets (using Twitter’s REST API). The graph below was built from 9785 tweets posted between 6pm Friday and 10am Sunday; it shows 1736 users and 3666 interactions.
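My capture script isn’t public in this form, but the gist is just paging through the v1.1 search endpoint. Here is a sketch of the equivalent with tweepy (api.search was tweepy’s wrapper for search/tweets when this was written; newer versions rename it search_tweets, and the credential names below are placeholders):

```python
import json
import tweepy

# Placeholder credentials from a registered Twitter app.
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Page through recent #mla14 tweets and keep the raw JSON for graph-building.
with open('mla14.jsonl', 'w') as out:
    for status in tweepy.Cursor(api.search, q='#mla14', count=100).items():
        out.write(json.dumps(status._json) + '\n')
```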

There is at least one other #mla14 visualization out there: Ernesto Priego’s, which is built with Martin Hawksey’s TAGS, rendered with d3.js, and searchable. Priego is also posting regular updates on #mla14 statistics to his Twitter feed.

My version, built with Gephi and my own script, will not update with new data, but it loads and runs a bit faster as a result. It would be interesting to hold the two networks up side by side and see how differences in interpreting mentions create different groupings. This visualization is made using a sigma.js plugin (see below).

Click to view:
[Interactive network graph]

Credits:

I’m using Alexis Jacomy’s sigma.js to render Graph #1. Graph #2 was produced using a plugin written by Scott Hale. The original graph was drawn in Gephi, using data gathered by an old script of mine that I’ve updated to use on this hashtag.

Three Americanist Journals

I’ve had a request to map citations in three Americanist journals: Early American Literature, American Literary History, and American Literature.

I’ve mapped all three together, for simplicity’s sake and to see how well the community-detection algorithms work across journals. I’m actually quite surprised at how well this seems to have worked (and how coherent the detected communities seem to be). I have a little bit of training in (19th-c.) Americanism, so I’ve gone ahead and identified some of the communities:

[Annotated screenshot of the detected communities]
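Gephi’s modularity routine is based on the Louvain method; if you’d rather script the community detection, the python-louvain package gives a rough equivalent (a sketch, with G as the co-citation graph):

```python
import networkx as nx
import community as community_louvain  # the python-louvain package

# Assign each node a community id, which can then be color-coded in Gephi.
partition = community_louvain.best_partition(G, weight='weight')
nx.set_node_attributes(G, partition, 'community')
```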

The full, interactive graph is available below:

View full screen:
[Interactive network graph]

I welcome expert commentary below, or on Twitter.

You can download the original gephi file here.

Credits:

As usual, some credits: the javascript visualization, which allows this complex graph to be presented in your browser, was written by Alexis Jacomy. The raw data comes from Thomson Reuters’ Web of Science. The parser/analyzer that turns the raw data into a network was written by Neal Caren. And I wrote a patch that allows Caren’s code to talk to Gephi. It occurs to me that these credits might lend themselves to a network graph…

#MLA14 Twitter interactions

This blog is quickly becoming a library of network graphs, but I really couldn’t resist this.

I dusted off an old script of mine for pulling Twitter data, and have built a network graph of all the interactions between people using the hashtag “mla14”. This graph is current through about 11:15pm Eastern.

This graph did not readily partition into distinct communities. Perhaps someone else will have better luck with the metadata, which I’ve exported here.

But the full graph, which I’ve published using a Gephi sigma.js export plugin, is available below:

[Interactive network graph]

You can also view this as an .svg image here:
[SVG image of the network]

Stay warm in Chicago!

 

Another Citational Network: Eighteenth-Century Studies

Yesterday I posted a draft visualization of the citational network in the journal Studies in Romanticism. When I did so, I said I would post a similar network for the journal Eighteenth-Century Studies. Here we go!

To navigate:

  • Zoom by scrolling
  • Pan by clicking-and-dragging
  • Examine a work’s “neighbors” by rolling your mouse over a node

View full screen:
[Interactive network graph]

As promised, Nancy Armstrong’s Desire and Domestic Fiction has more indexed citations in the Web of Science database than does Thomas Hobbes’s most famous work. Bigger than Leviathan, as it were. I’m interested in hearing feedback from people about the algorithmically-detected “communities” color-coded in the above graph, and any noteworthy connections that can be found.

I also suggested yesterday that I had some ideas for how best to use this kind of visualization. Here are a couple:

1. As a discovery tool

My medium-term goal (end of summer?) is to turn these graphs into graphical interfaces. Essentially, one would be able to click on a node, and see the article’s related metadata (e.g., abstract, publication information, cited and citing articles), including a link to the JSTOR resource (or Amazon page), displayed in a sidebar. I’ve already written some of the back-end code for this. This would allow users to move seamlessly from a basic exploration of connections to research. If the strength of these graphs is that they show us important connections we weren’t yet aware of, they can only be helped in this area by assisting scholars in turning parts of the network into ad-hoc bibliographies.

2. As a minoritarian discovery tool

A major downside of these graphs for scholarly discovery is that they tend to reinforce our prejudices towards major/great works (both primary and secondary). It would be all too easy to use this tool to create mere checklists of scholars to cite. Moreover, the citational database we are using here bakes this bias in, to a certain degree: 1) the “orphan” articles that I’ve removed may very well be cited by works that the Web of Science simply didn’t parse correctly, and 2) these articles certainly cite other works, but whoever they’re citing, the Web of Science didn’t recognize the reference.

A longer-term project would involve fleshing out this graph fully, probably by writing a bibliographic parser and workflow for a human to do error correction on the parsed data. With that full graph, one could create a devil’s dictionary of dead-end citations and pseudo-communities. By looking in the aggregate at small groups of scholars who withered on the vine, one could tell a history of whose work went nowhere, and attempt to explain why. These analytic tools need to be used to actively counteract the confirmation bias that they inherently favor.

3. As dynamic visualizations of the development of fields and subfields

As noted above, the publication dates correspond roughly with the pagination of articles within journal issues and volumes. The use of this data goes beyond a temporal filter on a static graph, though. By time-coding the different works’ appearances and references of one another, and applying a force-based layout as these elements are introduced, we should be able to see the historical emergence of scholarly communities over time, and so to assist attempts to narrativize this information.

But…

I’ve got a (non-digital English Lit) dissertation to write, and for the time being my work on this project will be limited to cleaning up these data sets (probably using JSTOR information with code I’ve been writing). The Web of Science database is fairly solid on citations (though there are plenty of duplicate entries and probably missed connections), but its other bibliographic metadata (title, author, date, &c.) are dismally bad. It’s a rare node on the above graph whose title is complete.

Credits:

Again, some credits: the javascript visualization, which allows this complex graph to be presented in your browser, was written by Alexis Jacomy. The raw data comes from Thomson Reuters’ Web of Science. The parser/analyzer that turns the raw data into a network was written by Neal Caren. And I wrote a patch that allows Caren’s code to talk to Gephi. It occurs to me that these credits might lend themselves to a network graph…