Monday, March 4, 2013

Research and Data Blogathon - Day #2

Good morning.
“And the beat goes on........................”

Apologies on commenting.  I often forget that in the Google blogger platform, you have to be on the site to leave a comment.  So if you have received the post via email as a subscriber, and would like to comment, please click on the Barry's Blog button above the Google icon on the top left hand side to be taken to the site.  Then scroll down to the end of the day's blog and click on the 'comment' icon.  Sorry.

Comment by Bryce Merrill to yesterday's postings:

"I would like to add to Margaret Wyszomirski's comments that the academy has not exactly embraced art as an esteemed research topic--the cultural turn in the social sciences did much to elevate the importance of culture and little to accelerate and legitimate arts research. Getting graduate students in the social sciences, for example, to specialize in arts research is a hard sell when tenure-track jobs teaching arts research and policy are few and far between."


Barry:  QUESTION #2:   Given the trends in research toward “big data,” what are the merits of centralizing arts data and making these data available online to the arts field? If such a central location were to be developed, what might be included and whose responsibility would it be to maintain and curate this repository?

Sunil Iyengar:  T.S. Eliot's weary skepticism notwithstanding ("Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?"), I'm as starry-eyed as the next researcher when confronted with the promise of big data--especially given the prognosis of Census officials that household surveys are reaching a saturation point, and that administrative records and commercial datasets (from online transactions, for example) will be used increasingly to fill the void.

To make good on that promise, arts researchers could benefit from a web portal that allows them to engage promiscuously with all manner of datasets featuring arts variables and outcome variables that can be studied in relation to one another. Ultimately, smart curation will be needed not only to procure such data but to describe their value for arts-related research so that their strengths and limitations are apparent to all who visit the site. This is not as easy as it sounds: it takes imagination and foresight to perceive how different datasets can be used for different purposes. Reader-friendly user's guides and data dictionaries are a must. Further, one probably will want access to both macro- and micro-datasets, to extensive search capabilities, data visualizations, and (one can dream) an extensive library of citations or abstracts from studies that have relied on those datasets. As for who will do the curating, I imagine a national consortium of public and private funders recruiting a team of data managers, researchers, and arts practitioners to act as an advisory team responsible for vetting the data and recommending new acquisitions. And let's not forget the lawyers (as if they would let us!), to advise on the intellectual property and data confidentiality issues that are sure to arise.

Margaret Wyszomirski:  Centralizing access to data and making it available online are two developments that would encourage more data use and analysis in arts research. This is not the first time this issue has come up, and we would do well to remember what we learned from previous efforts and to build on them when possible.  I would regard four sources as key building blocks in this effort: (1) the ICPSR at the University of Michigan; (2) CPANDA at Princeton University; (3) the NEA's Sourcebook on Arts Statistics; and (4) the Unified Database of Arts Organizations at the National Center for Charitable Statistics in the Urban Institute.

1).  In the social sciences, we have long had such a resource for survey research on a range of topics in the Inter-university Consortium for Political and Social Research (ICPSR).  Here is some basic information on the consortium:
An international consortium of more than 700 academic institutions and research organizations, ICPSR provides leadership and training in data access, curation, and methods of analysis for the social science research community.
ICPSR maintains a data archive of more than 500,000 files of research in the social sciences. It hosts 16 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.
The consortium also has an extensive set of summer workshop and course offerings.  Key data sets like the SPPA and the American Time Use Survey are archived at ICPSR, and a quick keyword search indicates that “arts” shows up in a scattering of other datasets across a number of topics, but there isn’t much here on the arts right now. I mention ICPSR because it provides something of a model for how data has been archived and made more available to scholars and other users on other subjects.

2).  Indeed, ICPSR was something of a model for the construction of CPANDA at Princeton University in the late 1990s.  CPANDA is the Cultural Policy and the Arts National Data Archive; its website identifies it as follows:
CPANDA, the Cultural Policy & the Arts National Data Archive, is the world's first interactive digital archive of policy-relevant data on the arts and cultural policy in the United States. It was founded in 2001. It is a collaborative effort of Princeton University's Firestone Library and the Princeton Center for Arts and Cultural Policy Studies. The Pew Charitable Trusts underwrote the original development of the archive.

This site includes far more databases concerning the arts than ICPSR.  It also facilitates on-line analysis of the data, and a search by key variables.  Although it is no longer aggressively obtaining new datasets, various researchers continue to archive their materials there and key series like the SPPA are updated regularly.  This provides a valuable resource foundation for expanding and building accessible data resources for the field.

3).  We could also look to an older “analog” resource that the NEA used to provide in the Sourcebook on Arts Statistics that it produced in 1989 and 1992.  This was something of a guide and sampler to data that was collected across the arts sector and about the arts sector.  It was particularly valuable in that it included overview information from most of the major arts service organizations—information that is still not widely available outside of association membership. It also included information on commercial arts activities such as broadcasting and film.  This helped to provide an introduction to the kind of information that was available and who collected it.  No comparable introduction to information resources on the arts is currently available.

4).  The Unified Database of Arts Organizations (UDAO) was a collaborative effort of the National Endowment for the Arts, the National Assembly of State Arts Agencies, and the National Center for Charitable Statistics to combine their separate lists of arts organizations from the NEA's grant management systems, NASAA’s National Information Systems Project, and the NCCS’s IRS files on arts and cultural organizations into one unified database.  The goal of the Unified Database is to provide a comprehensive, flexible, and sustainable database of arts organizations that meets the needs of arts researchers, funders, and policy makers. The core of the system is a master list of organizations with activities in the arts that is derived from data obtained from the IRS and from state arts agencies through grantee or mailing lists. This core subset includes extensive financial data and reliable classifications of activities. The database encompasses data from a number of sources and includes data on for-profit and governmental organizations as available.  It became a resource for a series of local reports on arts participation that involved the Performing Arts Research Coalition (Opera America, ASOL, Dance USA, and TCG).

So we might say that ICPSR provides a model for building and maintaining a data archive and also identifies another key feature: training modules for researchers and students that can expand quantitative information use and analysis.  We could argue that CPANDA constitutes a major building block for acquiring relevant datasets and facilitating on-line use. Both CPANDA and ICPSR provide examples of field governance for such efforts with interdisciplinary and diverse advisory groups.  The Sourcebook provided a broad overview of relevant information sources, examples of their data, and an introduction to this information to help orient newcomers.  Finally, the Unified Database took a big step toward merging overlapping databases about arts organizations to build on the strengths of each and minimize the weaknesses of each.  It also involved service organizations in the effort and its use.  If we are to fully engage in the era of big data, we will need all of these building blocks and will also need to build new resources that address ecology components in the infrastructure, cover digital participation, and collect political and policy-relevant information as well.

Finally, it bears mentioning that there is precious little in the way of policy or political databases available in the field.  To teach about the history of arts policy positions in party platforms, I had to build my own database, which required combing through both Democratic and Republican platforms (as well as occasional major third-party efforts) since 1960, now a fifty-year period.  AFTA reports on congressional scorecards regarding arts policy issues, but to my knowledge, that information has not been gathered into an archive.

Bryce Merrill:  This is a very important question, and not because it asks us to weigh in on the arts and big data. I will save my “Vivek Kundra-meets-IBM” argument for another day. For now, the issue of greatest import is that of big data and centralization. There are certainly significant merits to making arts data available to the field online; however, centralizing all of these data and expecting one organization to bear that administrative burden is unrealistic and likely unwise. Moreover, given the broad availability of big data collection tools including data scraping and data mining--and also constant advances in lowering the cost of data storage--single-source or limited-source approaches to data collection are becoming anachronistic. Add to this the increasing power of data manipulation technologies, many of which are built using open-source applications, and the arts field should be cautious about centralization. The arts field can be active participants and even innovators in the era of big data--the field will not advance otherwise--but a highly flexible, less bureaucratic, multi-source and financially sustainable model of data engagement is likely to be needed. The field would benefit from a multiplicity of big data collection and management efforts joined with a more enlightened understanding of the costs of engaging in the world of big data.

There are many benefits to making arts and arts-related data available to the arts field online. From a research perspective, having online access to the General Social Survey, for example, encourages scholars to use that ample database for research. I applaud the National Endowment for the Arts’ efforts to make available online some of the data sets the agency routinely collects. They could do even more. For example, they could further expand the data they make available by taking advantage of innovations in online data application technologies to make the data more readily accessible. (The real estate industry offers a number of excellent, user-friendly data portals that provide housing market and neighborhood indicators data in easily accessible formats. Even data.gov is growing in accessibility.) The Cultural Policy and the Arts National Data Archive (CPANDA) is a good start at making arts data available online, but much more work will need to go into that effort for the data to be usable by broader audiences--especially field practitioners who often do not have knowledge of research and data analysis methods. The field also needs to address the question that such a data portal begs: “Who needs it?” If the answer is “no one,” then we either have a useless resource or maybe a larger problem--a resource and no one to use it. Our (WESTAF’s) Creative Vitality Index portal and Americans for the Arts’ Local Arts Index web portal are also positive contributions, but ones that need to be expanded in terms of data availability and usability. Quite frankly, before the age of big data, there was a need to surface the rich data sets produced by arts researchers over the years. Now, like the volume of arts data that exists, the need for data access is even greater.

No one organization should be or can be responsible for meeting the growing need for access to data. Consider the considerable resources that have been invested in the Cultural Data Project, a project that attempts only to surface a single section of the arts data landscape. The CDP is an important resource for the field, but gathering and surfacing a census of information about nonprofit arts and culture organizations is a Herculean task. The CDP is not even close to completion in terms of its potential as a national resource, and the cost of building and maintaining such a system may be unsustainable. Imagine the difficulty and cost of centralizing all arts data. Also, imagine the lack of imagination and systemic inflexibility that often characterize the type of large bureaucracy that might volunteer for the task of centralizing arts data. Informal and crowd-sourced data scraping sites, HackFests, and other “open-data” phenomena may be the most nimble, inexpensive, and effective options for surfacing arts data currently available. Instead of centralization, the field would benefit from multiple data surfacing efforts and even some overlapping of such efforts.

Just because one can find data online does not mean that one should use it or that the finder knows how to use it. The field would benefit from a more concerted effort to encourage online data literacy and competency. With such literacy and competency, we could as a field have an ongoing discussion about the benefits and pitfalls of having access to all of these data. We could also have a meaningful discussion of the merits of responsible research. For example, the field could have more candid conversations about criteria for data validity, excellence, and use.

WESTAF’s 2012 Cultural Policy Symposium, Arts and Culture Research in the Digital Age, was an attempt to contribute to the field’s growing awareness of big data. Proceedings from this symposium, including Geoff McGhee’s keynote address, will be published on the WESTAF website in June, followed by the publication of excellent presentations by Steven Tepper, Margaret Wyszomirski, Laura Zucker, Anne Gadwa, Eric Rodenbeck, and others. National arts organizations and major funders could collaborate in their efforts to help the arts field thrive in the age of big data. They could do so by sponsoring informational convenings, offering professional development opportunities, and supporting data literacy in the field more broadly. In fact, I often find myself defending a plurality of efforts in the arts field like a capitalist defends free market competition, but the one place where collaboration would be most beneficial would be helping the field find big advantages in big data.

The field also would benefit from influencing the existing collection of arts data by large data collecting organizations, such as the IRS. Elizabeth Boris, Director of the Center on Nonprofits and Philanthropy, recently advanced a similar position regarding nonprofit data. According to Boris:

[w]hile the time is ripe to deepen and standardize information collected [on the nonprofit sector] independently, we must acknowledge that the IRS data collected annually on Forms 990 are critical to our ability to understand the sector. There must be ongoing interaction with the IRS if we want sector information to be robust, inclusive, and ongoing.

In other words, there are good reasons to centralize our efforts in the field to improve the national data sources--such as 990 data--that the field relies on regularly. Working with the major federal agencies that collect and maintain data relevant to the arts to refine and improve the measurement of the arts is a cause we can all get behind.

Going back to the implications of big data for the arts field, researchers and non-researchers should be concerned about the field’s ability to access and gain insight from big data. We likely need access to big data in order to support the inquiries now being launched about participation in the arts, the value of arts education, patterns of art consumption, etc. Rather than becoming fixated on data centralization, I propose we spend more time on field literacy regarding data and research. A research-literate field will be a critical consumer of big data. Let’s find unity in the language we bring to big data. Doing so will pay larger dividends than attempting to create a centralized source of arts data.

The arts field must unite around big data, but we do not need to be united in our approach.

Randy Cohen:  I remember my first 100MB computer hard drive and thinking, “This is so big, I’ll never need another!”  Today, I email that much data before lunch.  The amount of data generated globally doubles every 2-3 years, and arts data is expanding at an amazing rate as well.  Big Data is more than an archive of large and complex data sets.  It also refers to their accessibility and usage.  The Big Data magic happens when we can compile enough data about past events and behaviors that we are able to predict future ones.

There are some solid Big Data efforts in the arts currently:
· National Center for Charitable Statistics (NCCS) is home for IRS data on nonprofit organizations
· Cultural Data Project (CDP) is an online financial management tool in 13 states
· WESTAF’s Creative Vitality Index (CVI) provides dynamic data about the creative economy
· Americans for the Arts’ National & Local Arts Index provides measures of the health and vitality of the arts

The beauty of Big Data is that we can track multiple aspects of a phenomenon using scores of indicators.  In years past, decreases in attendance suggested a sector in decline and we feared the worst.  From the National Arts Index, however, we now know that while the share of the population attending live performing arts events declined annually between 2003 and 2009, there was a simultaneous upswing in personal creation, electronic participation, and the number of college arts degrees conferred.  Thus, the issue may be more about challenges to our traditional delivery mechanisms than about an industry in crisis.

Savvy arts organizations use these findings to tap a public seeking a more active arts engagement.  The Baltimore Symphony did this with its brilliantly-conceived ‘Rusty Musicians’ project, where members of the community proficient in an orchestral instrument and able to read music were able to perform in a live concert with the symphony. Demand for participation was beyond expectations and family, friends, and co-workers flocked to the concert hall at movie-theater prices to watch the special concerts.  For many attendees, it was their first time in the symphony hall.

Arts organizations are in a great position to gather and share data from and about their consumers. There are some excellent regional Big Data efforts of this kind already under way, such as LA Stage Alliance’s “LA Arts Census,” a collaborative research and marketing tool that combines data on 4 million patrons from more than 225 arts organizations, and even integrates psychographic and consumer data. The ultimate test of such endeavors will be whether arts organizations become facile enough with the data to not just generate mailing lists, but also build long-term organizational strategies.

The measure for Big Data success is not just how much of it there is (it’s always growing), but how well it is applied to policy and practice.  Big Data will be most valuable when there is a clear link between the numbers and arts leaders taking action.

Elizabeth Currid-Halkett:   The merits are innumerable.  Sharing data lessens the burden of data collection (meaning we can spend more time actually doing research and analysis and less time collecting) and also allows for great potential in comparative analysis on many different scales and units of measurement. I also think that such centralization would bring us closer to an agreed-upon, universal definition of what we mean by “the arts” and what sectors and occupations comprise them.

Thank you.

Have a great day.

Don't Quit