Digital Humanities Day 2 (Notes)

8:30 AM 

Toward a Noisier Digital Humanities – Darren Mueller, Whitney Trettien

  • @TeamNoise – PhD lab at Duke
  • soundBox – incorporate sound into digital projects
  • provocations: hybrid projects, layering physical spaces with digital connectivity to provoke new sonically engaged scholarly expression
  • publishing format – Provoke! A web-book including audio, video, and source code, plus a website showcasing new multimedia methods of scholarship
  • “Sonic edition” of the book
  • #DHsound
  • Archiving, publishing, working with sound in digital environment – found scholars around the world struggling with this
  • Graduate pedagogy is usually coursework – exams – dissertations – supposed to be scaffolded, but are usually just huge steps with not a lot of support between them
  • The GDSI grant scaffolded graduate pedagogy through freeform collaboration – built a professional community
  • Seed grant ($500) fostered very early collaboration across disciplines, got the conversation going
  • Downside of the GDSI scholarship – cannot be used for thesis work, Soundbox was all extra-curricular
  • Sonifying library exhibit space – had more luck with external galleries, but white-box space less interesting
  • Annotating a book with sound – use cheap sound cards (you can buy a greeting card and record your own voice)

Linked Data for Music Collections – a user-centred approach – Seamus Lawless

  • Linked Data – “it is about making links, so that a person or machine can explore the web of data…when you have some of it, you can find other, related data.” (TBL)
  • Huge increase in music collections online, sound recordings, but also textual info – biographies, scores, histories of works, engineering info about how recordings are made.
  • Disparate databases – a score in this database, a recording of the same thing in another
  • Linking biographical information to recordings elsewhere.
  • Different metadata, different formats – collections MAY share Dublin Core for basic fields, but use their own standards for other parts of the collection
  • LinkedData and RDA help collections be more pluralistic – move away from siloed mark-up
  • Allows for linking to related but different info – where did a performance take place, where was a recording done?
  • Current landscape – Music Ontology – a vocabulary for music info that builds upon and extends other ontologies: Timeline, Event, FOAF [friend of a friend], FRBR, Key, Instrument. BUT no support for music scores, and can identify too closely with particular genres (not sure about this)
  • MusicBrainz – music metadata database, collects core and user-contributed data on artists, recordings, etc. – not a lot of classical coverage, doesn’t distinguish between performers and composers, privileges performances.
  • Contemporary Music Centre in Ireland – national archive & resource centre, has extensive collections of scores and a huge range of media
  • User base – composers, researchers, promoters, teachers, students
  • Conducted focus groups and interviews to discuss exposing the data of the centre – how to link to external content? How would users feel about reuse, both internally and externally (borrow from others, share their own collection)? How to make it discoverable?
  • Key findings – ensure trust and accuracy of data, avoid over-linking to external data, want more focused, music-related links, enhance browsability, prevent technological disadvantage among composers (i.e. some are not very present on the web – will they be drowned out by more savvy composers?)
  • Overall model was developed for integration of Linked Data and the CMC’s collection – remodel the database using FRBR, map CMC’s metadata to the Music Ontology, embed mappings as RDF in a new Drupal version of the CMC website, develop visualization tools to support browsing
  • The benefits of linked data become obvious very quickly for music, but the user base had many fears about the design. They were MORE nervous about pulling info in from outside to enhance the local experience – excited about linked data being used to enhance the profile of the institution and share the collection with outsiders
  • http://buff.ly/12M4w2Y ← dissertation on this topic
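
The core linked-data move described in this talk – expressing catalogue records as triples others can link to – can be sketched in a few lines of Python. This is a toy illustration only: the record fields, example.org URIs, and choice of properties are my own assumptions, not the CMC’s actual mapping.

```python
# Toy mapping of a flat catalogue record to Music-Ontology-style
# N-Triples. Field names and example.org URIs are invented.
MO = "http://purl.org/ontology/mo/"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"

def record_to_ntriples(record):
    """Render one catalogue record as N-Triples text."""
    work = f"<http://example.org/work/{record['work_id']}>"
    composer = f"<http://example.org/person/{record['composer_id']}>"
    return "\n".join([
        f"{work} <{DCT}title> \"{record['title']}\" .",
        f"{work} <{MO}composer> {composer} .",
        f"{composer} <{FOAF}name> \"{record['composer_name']}\" .",
    ])

print(record_to_ntriples({
    "work_id": "w42", "composer_id": "p7",
    "composer_name": "Jane Doe", "title": "String Quartet No. 1",
}))
```

A real implementation would use an RDF library (e.g. rdflib) and the full FRBR remodelling the talk describes.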

Schooling the Scholar, Coaching the Fan: Fannish Intellectual Production  and Social Network Analysis – Hannah Goodwin

  • Based on work at UC Santa Barbara
  • Studying social networks around television shows
  • How to analyze a TV show? They’re long, expansive, not always treated as a single unit
  • Analyze characters from Friday Night Lights – race is explicitly discussed
  • Some datasets come from fan forums
  • Social network graphs
  • Wanted to call attention to how DH projects enhance fannish/fandom study
  • What characters interact and when? What does it mean?
  • X-Men example of a network graph, Lost, Glee
  • Went to fan wiki to collect info about characters – familial and romantic relationship listed, but not friendships
  • Graph created in Mathematica – demonstrates lack of interracial interaction
  • Analyzed forum discussion – very little forum activity around black families and relationships
  • Without leading black character (Smash), black characters are completely separate, no interactions between black and white characters (Smash as a node)
  • While this is meant to be a show that is open about racial issues, other than Smash all black characters are unimportant, do not have relationships that reappear – conveys superficiality.
  • Watched an episode, marked down each interaction between characters, drew lines between them. But this doesn’t show whether a discussion is angry, romantic, etc. Drawing a graph of fan discussion forums, though, means that you can see which interactions were meaningful – may capture emotional investment better. Graphs include ‘fantasy’ relationships – fans interested in relationships that don’t actually exist in the text of the show.
  • The show’s following is slightly smaller (than, say, Lost’s), so she felt that she had a grasp on the community.
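
The method described above – tallying character co-occurrence per episode or scene to build an interaction graph – can be sketched in plain Python (the talk used Mathematica; the scene lists and names here are invented placeholders, not Friday Night Lights data):

```python
from collections import defaultdict
from itertools import combinations

def build_graph(scenes):
    """scenes: list of character-name lists, one per scene.
    Returns an undirected edge -> co-occurrence count mapping."""
    edges = defaultdict(int)
    for scene in scenes:
        # Every unordered pair appearing together counts as an interaction.
        for a, b in combinations(sorted(set(scene)), 2):
            edges[(a, b)] += 1
    return dict(edges)

scenes = [["Alice", "Bob"], ["Bob", "Cara", "Alice"], ["Cara", "Dan"]]
graph = build_graph(scenes)
print(graph)  # ("Alice", "Bob") co-occur in two scenes
```

Edge weights like these are what reveal isolated clusters (e.g. characters who only ever connect through a single node).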

Robots Watching Television – Approaching Algorithmic Media Analysis – Jarom McDonald

  • He has Google glasses and will share!
  • “An experimental testbed” –
  • “Reading Machine” – “it is not that interpretive readings of texts can be arrived at algorithmically, but simply that algorithmic transformation can provide the alternative visions that give rise to such readings…patterns. From pattern the critic may move to the grander rhetorical formations that constitute critical reading” – Stephen Ramsay
  • Why is this hard? Video has unique challenges
    • Amount of bits, even when well-compressed, need a lot of processing power
    • 4 dimensional variables (linguistic, aural, visual, temporal)
    • Contextual purpose? Narrative studies? Media studies?
  • Projects: Cinemetrics, Shot Logger, Story Map, EnactiveCinema, Anyclip, NERD, CVP Media, Google X Lab – algorithmic analysis of media. Can run through millions of hours, identify cat footage
  • Emphasize: these are experiments…what might be pattern-yielding algorithms?
  • What algorithms might help us develop a hermeneutics of programmatically approaching television? Allow for ‘live’ analysis and aggregate, comparative analysis
  • Make sure they can run in modern browsers, driven by JavaScript (means the interface is already there)
  • Just because you have numbers, doesn’t mean that there’s a narrative
  • Experiments – start with text-based algorithms – content, text, semantic – tricky – what is a text? Text-to-speech, closed-captioning transcript, original script?
  • Jason Mittell blog post from Nov 30 2012 – intersection of media studies and narrative studies how to analyze TV?
  • Spectrogram generation – audio signals. A waveform? No, a spectrogram. 3D data plotted in a 2D graph – live JavaScript spectrogram reader – can view as you watch the video
  • Graphed tweets-per-minute from a show as it was happening (Pretty Little Liars – much tweeted). The number of tweets spiked directly after narratively significant moments – can this be overlaid on the spectrogram? Audio patterns change to match Twitter spikes.
  • Next steps – direct relationships between media structure and narrative, design experiments for assessing the strength of those relationships, what might we say once we’ve found significant correlations, how machines might apply these algorithms without intervention
  • How to store, what quality? Lower quality okay for now – but we don’t know what we might want to do in the future.
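
A spectrogram of the kind described – 3D data (time × frequency × magnitude) plotted in a 2D graph – boils down to a Fourier transform per frame of audio. A minimal pure-Python sketch on a synthetic signal (real tools use windowed FFTs, and the talk’s version runs live in JavaScript):

```python
import math
import cmath

def spectrogram(signal, frame_size=64, hop=32):
    """Slice the signal into overlapping frames and take the DFT
    magnitude of each; returns a time x frequency grid."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2):  # keep positive frequencies
            s = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size))
            mags.append(abs(s))
        frames.append(mags)
    return frames

# A pure sine at 4 cycles per 64-sample frame should peak at bin 4:
sig = [math.sin(2 * math.pi * 4 * n / 64) for n in range(256)]
spec = spectrogram(sig)
```

Overlaying a tweets-per-minute series on such a grid then becomes a matter of aligning the two time axes.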

10:30 

Reading the Visual Page of Victorian Poetry – Neal Audenaert & Natalie Houston

  • Project – The Visual Page
  • Developing software for the computational analysis of digitized page images
  • Dataset: 300 single author books of poetry published in England from 1860-1880
  • Dan Cohen – “Searching for the Victorians” – “should we worry that our scholarship might be anecdotally correct but comprehensively wrong?”
  • Poetry as a part of 19th Century cultural production
  • Interesting to study because of its use of white space (you always know it’s a poem). Words on the page (title, imprint) convey a series of messages
  • William Morris – pages demand to be read, but also seen
  • What kind of visual codes were structuring more ordinary poetry books? What messages?
  • Richard Watson Dixon – Christ’s Company and Other Poems (1861) – in verses, indented lines (very easy to know what you’re reading)
  • Long narrative blank verse poems, poems end on one page and another begins, poems with dropped, ornate capital letter at the beginning (challenge for OCR)
  • Matthew Arnold’s 1867 Dover Beach was radical – unrhymed, mixture of long and short lines
  • Goal is to see similarities and differences between sets of printed materials, historical changes in printed materials, measure and identify distinctive features and/or distinctive books, measure and identify representative features in typical books, and influence and imitation in the design of printed materials.
  • Neal’s approach – there are documents that scholars need to see – what tools can we build to support interaction and analysis when the transcribed text is not sufficient?
  • How to extract and model features from a systems perspective? How to join OCR, pattern recognition, user interface to support this research, how to build user interactions
  • Image analysis – a lot of work has been done on image analysis, but text-based – how can traditional work be leveraged?
  • Layout analysis gets labeled, needs to be indexed by pages, by poems, and into books
  • Feature extraction: Easy to compute line height, width, leading margins, indentations, ratio of black pixels
  • Preliminary results: looking at line length, found the end of ’50 Modern Poems’ was way off – much longer lines – the last 20 pages were ads from the publisher. Can use this method to find different types of materials in periodicals
  • Further research – visual analytics in large-scale historical understanding, how can visual analysis combine with text analysis, topic modeling, expand this research to other types of printed material (periodicals and newspaper)
  • Where are machine learning and pattern classification techniques helpful? How can we design search interfaces to support exploration and research? How to design for scalability of data and analysis?
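
Features like the ones listed above (black-pixel ratio, margins/indentation) are simple to compute once a page is binarized. A toy sketch, with an invented 3×6 “page” where 1 = ink – not the project’s actual pipeline:

```python
def black_pixel_ratio(image):
    """Fraction of ink pixels in a binary image (list of 0/1 rows)."""
    total = sum(len(row) for row in image)
    ink = sum(sum(row) for row in image)
    return ink / total

def left_margins(image):
    """Leading whitespace per inked row - a proxy for indentation."""
    margins = []
    for row in image:
        if any(row):
            margins.append(next(i for i, px in enumerate(row) if px))
    return margins

page = [
    [0, 0, 1, 1, 1, 0],   # an indented line
    [0, 0, 0, 0, 0, 0],   # blank line (white space between stanzas)
    [0, 1, 1, 1, 1, 0],   # a longer, less indented line
]
print(black_pixel_ratio(page), left_margins(page))
```

Aggregating such per-line numbers per page and per book is what lets outliers (like the publisher’s ads) stand out.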

Representing Materiality in a Digital Archive – Matthew Lavin

  • “Words are written, books are designed and built”
  • UNL has 50 copies of Death Comes for the Archbishop.
  • Many have ‘artifact value’ i.e. Willa Cather’s copy with a fan letter, her name,  a photo
  • Digital Humanities Next Top Model
  • TEI – con: high learning curve, lack of analytical payoff
  • FRBR – pro: very good with nuanced bibliographical data; con: hard to explain to people, hasn’t yet been fully realized.
  • ….

Beyond the Document: Transcribing the Text of the Document and the Variant States of the Text

  • Capacity for annotation and transcription greatly enhanced by digitization, but there’s a lot of money and time required
  • Shift in focus to transcription of primary documents – “texts are constructs from documents”
  • Shift from traditional role of the editor – intense editorial efforts should focus on the construction of the text of the documents
  • Coding system to represent different series of texts – represent meaningful marks on the document
  • Relationship between the TEI community and the DH community
  • How to encode the digital page? How does this display?
  • We labour over encoding, but have we done enough to explain why understanding the page is important, and how page and content are related?
  • ….

1:30

Mapping Homer’s Catalogue of Ships – Courtney Evans and Ben Jasnow – UVA Scholars Lab

  • Mapping Homer – Greek poet, composer of Iliad and Odyssey. Oral poet – no writing, no maps
  • “The Catalogue of Ships” lists 29 kingdoms from around the world – their leaders, which ships they provided – and enumerates the towns in each kingdom – so 180 place names total
  • Homer uses geography as a mnemonic device – kingdom-by-kingdom coherence is clear
  • Research question: does a geographical principle underlie Homer’s narration of towns?
  • Groups reflect roads and landforms – demonstrate this using Neatline
  • Syntactical group – place names share same verb or pronoun. Line-by-line group – place names appear in the same line
  • One exception – Boeotia. Is the poet ‘standing’ in Thebes, pointing out the towns that encircle it? “The view from a wall” – literary technique where the narrator stands above, enumerating those on the battlefield below.
  • Future direction: adding archival info – bibliography, photos, annotations to the catalogue
  • Implications – could possibly find lost ancient cities – if a city named does not exist now, but is suspected to be geographically located between two known places….

Literary Editions and GIS

  • Field report of existing literary geo-spatial projects that have attributes of literary editions
    • Mapping the Lakes – includes literary text in full with Google Maps
    • HyperCities –
    • Europe: A Literary History
    • Rings of Saturn
    • Mapping St. Petersburg – large blocks of Crime and Punishment included with locations
  • GIS provides a medium for ‘visual intuition’ and the interpretation of patterns and modeling
  • Denis Cosgrove’s Apollo’s Eye – how the globe is pictured
  • GIS – “a provocative new form of the literary edition” – some readers first encounter the text on these platforms
  • More than mapping route or location of a text, students can be readers, analyzers, critical scholars
  • Student project – Malory’s Roman War
  • Students typically have little experience with ArcGIS – GeoFlow uses Excel to create 3D visualizations

Computing Place: Naturehoods in large US cities

  • http://citynature.stanford.edu
  • Many aspects of cities obey certain scaling laws: the larger the city, the more patents and creative jobs, and the higher the wages
  • The number of parks and green spaces does not follow this – no consistency between city size and amount of green space.
  • People live in neighbourhoods as much as cities – green space is probably very uneven across neighbourhoods.
  • Took satellite data for ‘greenness’ – backyards, open lots, etc. and added that to park land calculations
  • Calculate distance to park-level greenness, then average by neighbourhood
  • Looked at social variables with 2010 census data – there is not much of a correlation between affluence, ethnicity, and access to park space
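
The neighbourhood-averaging step described above can be sketched as a simple group-by. The cell values and neighbourhood names here are invented placeholders, not the project’s satellite data:

```python
from collections import defaultdict

def mean_greenness(cells):
    """cells: list of (neighbourhood, greenness) pairs, one per
    satellite grid cell; returns the per-neighbourhood average."""
    sums, counts = defaultdict(float), defaultdict(int)
    for hood, g in cells:
        sums[hood] += g
        counts[hood] += 1
    return {hood: sums[hood] / counts[hood] for hood in sums}

cells = [("Mission", 0.2), ("Mission", 0.4), ("Sunset", 0.8)]
print(mean_greenness(cells))
```

The resulting per-neighbourhood scores are what get correlated against the census variables.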

A Clear Temporal GIS Viewer and Software for Discovering Irregularities in Historical GIS – Vitit Kantabutra

  • Need for temporal GIS in the digital humanities – human and natural historic processes are not static; they involve space and time.
  • Temporal GIS is an essential visualization tool
  • Developed a database management system, Intentionally-Linked Entities
  • Created 3 pieces of software for spatio-temporal exploration:
    • Clearview – “a high-fidelity viewer for historical/temporal GIS” – does not lose or hide data
    • Software for detecting database situations where a region’s capital lies outside the region’s border
    • Software for displaying these abnormal situations
  • Working on China Historical GIS (CHGIS) – previously available GIS that work with time all have trouble viewing data at historical time-scale.
  • Google Earth – skips much of the data, does not show locations and labels clearly.
  • ESRI ArcGIS 10 – goes back and forth in time as users slide the time-slider.
  • Emphasis is on pretty visuals rather than presenting information clearly.
  • Capital lies outside the boundary in about 1/5 of cases – 22% abnormalities
  • Epilogue – what’s wrong with packaged software: too many layers, general inefficiency because the makers try to satisfy everyone. Difficult to learn, often for the same reason; poor or too much documentation. Bugs –
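
The capital-outside-the-border check reduces to a point-in-polygon test. A sketch using the standard ray-casting algorithm (the region shape and coordinates are invented, not CHGIS data):

```python
def point_in_polygon(pt, poly):
    """Ray casting: count edge crossings of a horizontal ray from pt."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            # x coordinate where the edge crosses that level:
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

region = [(0, 0), (4, 0), (4, 4), (0, 4)]      # a square region
assert point_in_polygon((2, 2), region)         # capital inside: fine
assert not point_in_polygon((5, 2), region)     # capital outside: flag it
```

Running this over every (region polygon, capital point) pair in the database is enough to surface the ~22% of abnormal cases mentioned above.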

Mapping Text – Automated Geoparsing and Map Browser for Electronic Theses and Dissertations

  • About 10k born digital theses in repository (since 2004)
  • Why map a textual collection? Increases attention and access to the collection, presents a unique context, visualizes interconnections in the location of study, fills gaps not covered by traditional cataloguing
  • Increase understanding of geoparsing, automated metadata collection
  • Geoparsing – enables a map based interface, encodes co-ordinates
  • Desired map functionality – three base maps, dropdown menu of countries and states, dropdown menu for departments, search by author, time slider.
  • Geoparser – DSpace supports curation tasks (custom Java programs)
  • Problems – name extraction – OpenNLP, Stanford NLP, MALLET – classifies strings of text as being place names or not
  • Disambiguation – (they use geonames.org)
  • Heuristics – context-based – clustering of places, favour candidates of mentioned feature types (if you mention Dallas, the Red River, Paris… you probably mean Paris, Texas. If there’s no context, Paris, France will be chosen because of its higher population)
  • Geonames – web look-up returns are unclear in ordering
  • Future plans – use statistical techniques for name disambiguation, integrate the tool into document submitter/curator workflow.
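
The context heuristic described above can be sketched as: score each candidate by its distance to the other places mentioned in the document, and fall back to population when there is no context. The coordinates and populations below are rough illustrative values, not geonames.org data:

```python
def disambiguate(candidates, context_coords):
    """candidates: list of (name, (lat, lon), population).
    context_coords: (lat, lon) of other places mentioned nearby."""
    if context_coords:
        def spread(cand):
            _, (lat, lon), _ = cand
            # Sum of squared distances to the context places.
            return sum((lat - clat) ** 2 + (lon - clon) ** 2
                       for clat, clon in context_coords)
        return min(candidates, key=spread)
    # No context: prefer the most populous candidate.
    return max(candidates, key=lambda c: c[2])

paris_candidates = [
    ("Paris, France", (48.9, 2.4), 2_100_000),
    ("Paris, Texas", (33.7, -95.6), 25_000),
]
texas_context = [(32.8, -96.8), (33.9, -96.6)]  # Dallas, Red River area
print(disambiguate(paris_candidates, texas_context)[0])  # Paris, Texas
print(disambiguate(paris_candidates, [])[0])             # Paris, France
```

The talk’s planned statistical disambiguation would replace this hand-rolled scoring with learned weights.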