Personal information management in academia

I’ve been thinking about Personal Information Management (PIM) for the last few weeks as I’ve been wrapping up my semester course work. For my class on Human Information Interactions, I developed a short annotated bibliography for research on how faculty and researchers organize information. I initially had some trouble locating articles that dealt specifically with PIM in academia: most research examines information workers outside of the university. However, there were a handful of useful studies and I thought I would share those in case anyone else needed a good starting point.

Introduction

While scholarly communication has received significant attention from researchers in the field of human information behavior, less attention has been given to how scholars actually organize their files in the pre- and post-publication stages of research. As the world of academic research becomes increasingly digital, networked, and transparent, information scientists should turn their attention to the underlying structures, methodologies, habits, and perceptions of personal archiving in a university environment. Not only is it easier in a digital environment to track the scholarly communication process, but by focusing on these activities, we will see how digital networks are changing the ways scholars create, store, and disseminate information at all stages of research, from planning to publication and beyond.

The field of Personal Information Management (PIM) provides a theoretical and practical framework for discussing the technical details of the research process. Unfortunately, even though there are numerous PIM studies on engineers, travel agencies, financial firms, legal firms, etc., researchers have rarely turned a critical eye upon their own practices. Perhaps, as many of the works below suggest, this is due to the realization that PIM is uniquely tailored by each individual: no one system works for everyone. Those studies that do exist are fairly limited in scope, usually focusing on a single tool (e.g. email, bookmarks) or a single user group (e.g. computer scientists, graduate students). Few studies broadly discuss PIM in a university environment.

The following works were chosen because, in part or in whole, they deal with PIM in a university environment by faculty and researchers. Together, they provide a rough outline of the major concerns for PIM in academia: How much information should be saved? How will it be organized? Who should be responsible for its organization and preservation? What motivations drive information storage? What barriers exist and what are the implications for scholarly communication? For more information on PIM in general, I recommend the works of William Jones and Jaime Teevan, especially their Personal Information Management (Seattle: University of Washington Press, 2007), and Peter Williams, Jeremy John, and Ian Rowlands’s 2009 article “The personal curation of digital objects: A lifecycle approach” (Aslib Proceedings, 61(4), 340–363).

Bibliography

Boardman, R. & Sasse, M.A. (2004). “Stuff goes into the computer and doesn’t come out”: A cross-tool study of personal information management. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 583–590). New York: Association for Computing Machinery.

Boardman and Sasse are constantly referred to in the literature that exists on how faculty and researchers organize personal information. Their research provides data and a methodology for creating an empirical foundation for PIM. In the study, information about how users in an academic setting organize information was collected across multiple tools (email, files, and bookmarks) and over time. All the participants except for one were from the university community and the majority of the participants were researchers. Using interviews, observations of the work environment, and long-term observations of file management, the authors examined the structures, maintenance, and retrieval preferences of the participants.

This research provides useful information for understanding how some individuals organize information and how they feel about their personal organizational methods. For example, the authors discovered that when users had similar hierarchies of file folders and hierarchies of email folders (termed “overlap”), users did so according to their roles (e.g. teacher) or projects (e.g. research proposal). Additionally, the users that filed items more frequently (daily) and had established organizational systems exhibited a sense of pride at their ability to organize their files over the years, even while simultaneously recognizing flaws in their system. This confirms what other studies have suggested: that the best PIM system is a highly personalized one.

Most importantly, the authors conclude that the categories used to describe information organizers in previous studies, such as Whittaker and Sidner’s “pilers” and “filers” (Whittaker, S. & Sidner, C. (1996). Email overload: Exploring personal information management of email. Proceedings CHI 1996, 276–283.), were not granular enough to describe all users. The participants in this study used multiple PIM strategies across multiple tools and did not easily fit in the previously established categories. This study provides a broader framework, based on previous research but adapted to describe the results of this experiment, for discussing the various PIM strategies.

Foster, N.F. & Gibbons, S. (2005). Understanding faculty to improve content recruitment for institutional repositories. D-Lib Magazine 11(1). Retrieved November 22, 2010, from http://www.dlib.org/dlib/january05/foster/01foster.html

In this year-long study funded by a 2003 Institute of Museum and Library Services grant, Nancy Foster and Susan Gibbons of the University of Rochester River Campus Libraries system sought to understand how faculty manage information. The purpose of their research was to find innovative ways to market and adapt IR systems to meet faculty needs, ultimately increasing participation. The article’s goal is not to explore PIM, but its findings provide insight into how faculty manage personal information and the information needs of individuals in a research environment.

The authors asked faculty members what they expected from an IR system. The majority of faculty indicated that they wanted tools for authoring, archiving, disseminating, locating, and reading research. They also expressed a desire for tools to control versioning, access information anywhere, and control access by other users. Faculty want their research to be archived with similar materials (related by subject), which suggests how they conceptualize the context of their personal information in a networked environment. In many cases, faculty had already created systems and methods that met these needs without specialized software: e.g. emailing files to oneself or to family members as a versioning control system. The broad array of responses indicates the wide range of information needs.

The observations and documentation of the faculty at work were based on anthropological participant observation. The data was gathered and analyzed by a diverse team that included reference librarians, computer scientists, an anthropologist, a programmer, a cataloger, and a graphic designer: an aspect that makes the research particularly insightful. The latter half of the article is primarily concerned with how to use this information to market buy-in for IR systems. For the purposes of this bibliography, it illustrates one practical benefit of understanding how faculty organize information.

Gandel, P.B., Katz, R.N., Metros, S.E. (2004). The “weariness of the flesh”: Reflections on the life of the mind in an era of abundance. EDUCAUSE Review, 39(2), 40–5.

The authors of this commentary on the current state of knowledge management in higher education offer a CIO’s perspective on the future of personal information organization. Gandel, Vice-Provost for Information Services and Dean of University Libraries at the University of Rhode Island; Katz, Vice-President of EDUCAUSE; and Metros, Deputy CIO and Executive Director for eLearning at Ohio State University, combine their extensive experience working with various stakeholders in the information landscape of universities to offer simple solutions to the problem of information abundance and recommend ways to encourage faculty buy-in on institutional repositories.

The authors claim that before the age of the computer, there was a fairly stable equilibrium between the demand for information and the supply of people to teach that information, but that now we live in an era of information abundance. The shift from an industrial to a knowledge economy, the falling cost of computer processors, the rapid adoption of information systems for all aspects of operations, and the growing acceptance of education as a life-long process have all contributed to a growing dependence on information resources in higher education. The future promises to be an age of abundance as individuals discover and utilize their ability to archive any and all aspects of human life in digital form. This includes the production of scholarly works.

The authors suggest that we think of the information landscape in terms of “ecologies” and of individuals as the organisms within that ecosphere. How will we study these organisms? How will we adapt our ecosystem to meet the needs of these individuals? What necessities will this ecosystem require? These questions, though not asked explicitly, are suggested as the authors discuss the roles that administrators, librarians, archivists, and publishers play in this new ecosystem. Gandel, Katz, and Metros conclude by recommending that institutional repositories be easy to use and seamlessly integrated with [faculty] desktop systems to encourage use and provide a stress-free way of incorporating tacit and explicit institutional knowledge into the networked ecosystem of information. Their image of the future calls to mind a great university-run Memex, both individual and institutional in its scope. For the purposes of this bibliography, this article provides an institution-wide perspective on the implications of PIM when integrated into a networked environment.

Kaye, J., Vertesi, J., Avery, S., Dafoe, A., David, S., Onaga, L., Rosero, I., et al. (2006). To have and to hold. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 275–284). New York: Association for Computing Machinery.

Kaye et al. set out to discover how academics at one Ivy League university organize and archive their information and to understand the values inherent in their organizational system. The authors posed a set of questions to 48 academics, took pictures of their information spaces, and qualitatively analyzed the results. They discovered five principal reasons for personal archiving: (1) retrieval, (2) legacy building, (3) resource sharing, (4) fear of loss, and (5) identity construction. While the organizational systems varied from one individual to the next, each system tended to utilize one particular medium (e.g. bookshelves, boxes, file folders, digital bookmarks) that was influenced by the organizer’s principal values (the five stated above) and work lifestyle (e.g. single office vs. multiple office).

Kaye et al.’s study suggests that the need to retrieve information is neither the only nor the most important reason for personal archiving among academics. Additionally, the study states that no one system was significantly more effective at information retrieval than any other. Academics archive material for reasons that are not always rational (e.g. fear of loss) or immediately transparent (e.g. identity construction). Based on this knowledge, system designers should develop information systems that reflect the values inherent in personal archiving. Currently used systems can be judged according to these values. The authors also suggest studying the relationship between personal identity and the customization of desktops, blogs, and personal websites when designing digital archiving tools.

Marshall, C.C. (2008). From writing and analysis to the repository: Taking the scholars’ perspective on scholarly archiving. In Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 251–260). New York: Association for Computing Machinery.

Catherine Marshall of Microsoft Research and the Center for the Study of Digital Libraries at Texas A&M University studied the information behaviors of 14 computer scientists with a significant number of publications in their field in order to understand how they organized information related to their research. The participants in this study were more familiar with computing environments and thus illustrated more complex PIM practices. Marshall used semi-structured, open-ended interviews and observations over the course of six months to gather her data.

The participants in the study typically made an effort to archive six types of materials: (1) paper sources of their publications, (2) digital copies of the same, (3) research codes, (4) data sets and logs, (5) bibliographies of related work, and (6) email. These files existed in various forms of completion, across multiple tools, and among multiple collaborators, illustrating the complex nature of scholarly communication in a digital, networked environment. Of particular note, Marshall discovered that personal archiving is more a side effect of collaboration and publication than a unique, intended process. If files are shared with colleagues via email, then email becomes the tool used for version control and storage. In her words, personal archiving is at once both “opportunistic” and “social.”

This study also raised a number of interesting questions about PIM, including: if two or more authors are collaborating on a single publication, who has the authoritative version? At what point do data sets become archive-worthy: as raw data or after the data has been worked on? Do citations stored in BibTeX files need to be complete or just complete enough to be recognizable? Marshall ends by offering implications for collaborative information management, for personal scholarly archives, and for institutional and disciplinary repositories.

Winget, M.A., Chang, K. & Tibbo, H. (2006). Personal email management on the University Digital Desktop: User behaviors vs. archival best practices. Proceedings of the American Society for Information Science and Technology, 43(1), 1–13.

This article offers a summary of the findings of a three-year project that examined the records management behaviors, particularly email management, of faculty and staff at two North Carolina universities. In-depth interviews were used to collect information about the subjects’ organization methods, retention habits, and concerns about digital information. While the majority of the article discusses the practice of record retention in the legal context of a state-supported university, it does provide some useful data for understanding how faculty and staff at a university manage their email, including: how important emails are stored; how emails are organized; and how attachments are stored.

Winget, Chang, and Tibbo discovered a variety of behaviors when it came to how important messages were stored, including saving them to a hard-drive or network drive, printing them out, moving them to a sub-folder, flagging them, moving them to another format (e.g. Microsoft Word), and leaving them in the inbox. The majority of respondents (88%) used a folder system to organize emails, most ranging from 11 to 50 folders. 89% of the respondents saved attachments outside the email program. Like other studies, this shows the variety of methods university faculty and staff use to organize information. While there are certainly strong tendencies to organize information in a particular way, no one system is shown to be more effective than another.

Winget, M.A. & Ramirez, M. (2006). Developing a meaningful digital self-archiving model: Archival theory vs. natural behavior in the Minds of Carolina Research Project. Proceedings of the American Society for Information Science and Technology, 43(1), 1–12.

The goal of this paper was to examine how users, specifically university faculty, might choose to self-archive digital objects. The authors interviewed two faculty members, one scientist and one humanities scholar, and asked them to consider and collect what they would submit to a digital archive and discuss how they would organize it. The two faculty members took two very different approaches. The scientist intentionally excluded lab notebooks (an item the authors considered to be of great academic value), created a lengthy narrative of his career to accompany the materials that he did include, and mostly referenced his publications by providing links to PubMed citations rather than submitting the actual documents themselves. The humanities scholar provided materials related to the development of a single monograph. These included documents that illustrated the creative and iterative process of translation (of poetry) and contextualized the monograph within the scholar’s work and professional connections. For example, he included pre-prints of the work that contained notes from other colleagues.

Winget and Ramirez spend much of the article making recommendations for future developments of digital archives. Concerning personal information management, they discovered that the desire to self-archive at the early stage of one’s career is inhibited by (1) lack of need to reflect and “look back” and (2) the hesitation to publish mistakes, especially in light of a rigorous tenure process. The article also illustrates how two people can choose two radically different approaches to organizing information and deciding what information is worthy of preservation. Additionally, Winget and Ramirez point out that these approaches were contrary to archival best practices.

Zimmerman, E. (2009). PIM @ academia: How e-mail is used by scholars. Online Information Review, 33(1), 22–42.

In this study, Eric Zimmerman, Vice-Provost for Academic Affairs and Director of Research at Interdisciplinary Center Herzlia in Israel, assesses the relationships between email use and scholarly work. While not an original research question, this study, performed decades after the introduction of email, is unique in that it is undertaken at a time when it is understood, based on previous studies, that the vast majority of scholars today are comfortable using email technology.

Zimmerman surveyed 390 faculty members of the humanities, social sciences, and sciences at Bar-Ilan University in Israel. The surveys were distributed via email and paper formats and asked faculty members a number of questions regarding email use, level of comfort, skill level, and the application of email for scholarly communication. Of 17 predefined uses, faculty mostly used email for proposal development, manuscript submission, research collaboration, and participation in committees.

Other important findings include: (1) a negative correlation between age and self-described email skill: older users expressed lower levels of comfort using email; (2) 45% of those surveyed feel overloaded, but almost 65% expressed little difficulty in organizing email; and (3) scholars with more publications tended to use email more frequently. Additionally, Zimmerman found that while respondents view email as a benefit to scholarly work (rated on a Likert scale), when the results are broken down by school, humanities faculty generally rate its benefit lower than social sciences or sciences faculty.

The results of this study suggest that email is perhaps the most widely used tool in the scholarly communication process, serving the processes of communication, collaboration, drafting, peer-review, manuscript submission, versioning, and archiving in the publication process.

I hope this information is helpful. If you have additional resources on Personal Information Management in universities, please share in the comments.

My information management

During the week, I spend approximately 13 hours online each day. On the weekends, it’s slightly less than that. Being a full-time library cataloger and a full-time graduate student in an online program comes with some considerable drawbacks, not the least of which is finding ways to organize all the data that I collect and interact with on a daily basis. Here’s how I do it:

Online Storage

Since most of my digital experiences happen online, I store most of my data in the cloud.

Bookmarks (Websites)

Anything I come across online that I think is worthy of coming back to later is stored using Delicious. This usually includes root level domains of websites or major directories within websites. Rarely, I will save blog posts or articles here, though the more “academic” in nature, the more likely I am to save it in Zotero instead (see below). I’ve synced my Delicious bookmarks with all my Firefox browsers so they are immediately accessible and a new site can be added and tagged in seconds.

Contacts

I’ve migrated all my contact information to Google Contacts: phone numbers, emails, mailing addresses. With the exception of mailing out wedding invites, every time I need an address or phone number, I’m usually out of the house. So I’ve synced my Google contacts to my iPod touch and stored them locally so they are always available even without a connection.

Email

I practice a mix of GTD and Inbox Zero methodologies. This requires (1) action-based labels and (2) smart use of filters. Basically, everything that comes into my mailbox is tagged and marked for (a) needs action, (b) read and review, (c) notifications, or (d) trash. So depending on who the email comes from, whether or not I’m the only person in the To: line, what words are in the subject, etc., each email gets moved to a certain place and I deal with each batch as time permits. By the end of every day, my email box is always empty. I save whatever I would be sad to lose and delete everything else (which makes future searching much more efficient).
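For anyone who wants to try something similar, here is a rough sketch of that kind of filtering logic in Python. It is only an illustration, not my actual filter setup: the `Email` class, the `triage` function, the example addresses, and the specific matching rules are all made up for this example, and real mail filters are configured in the mail client rather than in code.

```python
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    to: list       # addresses on the To: line
    subject: str


def triage(email, me="me@example.com"):
    """Return one of the four labels for an incoming message.

    The rules below are hypothetical stand-ins for the three signals
    described above: who sent it, whether I am the only person on the
    To: line, and which words appear in the subject.
    """
    subject = email.subject.lower()

    # Automated senders and notification subjects get their own batch.
    if email.sender.startswith("no-reply@") or "notification" in subject:
        return "notifications"

    # If I am the only recipient, someone probably expects a response.
    if email.to == [me]:
        return "needs action"

    # Informational messages sent to a group can wait for a reading session.
    if subject.startswith("fyi") or "digest" in subject:
        return "read and review"

    # Everything else is deleted.
    return "trash"
```

The order of the checks matters: a notification addressed only to me is still filed as a notification, because that rule is tested first.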

Current Notes

I just started using Evernote to collect my ideas, clippings, and drafts for blogging projects (for this site and my library blog). Evernote allows you to import any type of note (text, image, pdf, whatever) and it will index any text (even text in images). I then tag all my notes based on the context in which I want to consider it in the future (e.g. read and review, potential posts, reading notes, tumblr blog, iav blog, etc.). Essentially, this is my pile of research notes, ideas, drafts.

Citations

For any article that I plan to cite in future writings or research, I store the citation in Zotero, a Firefox plugin that will store all the bibliographic data locally and on a server. I can then cut and paste the citations into documents using any of the usual formats (APA, MLA, Chicago, etc.). Keeping all these together and separate from my Delicious bookmarks lets me know what I’ve cited in other papers, when I accessed the articles, and in what context I used them (based on any tags or notes I added).

Tasks

I’m a huge fan of the GTD methodology, which stresses the importance of context over priority when deciding which task to do next. I use Remember the Milk to create lists of tasks based on project type (research paper, home repair, blog work) and context (online, errand, at work, at home). RTM also allows me to assign due dates, repeating tasks, durations, and more. Using these tags, I can create smart lists such as: a list of any tasks that are time sensitive, can be done at home, and take under 20 minutes… a great way to decide what to do when you’ve got a few minutes to waste before going to see a movie.
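To make the smart list idea concrete, here is a small Python sketch of the same filter. This is not how Remember the Milk works internally and it does not use RTM's search syntax; the `Task` class and `quick_home_tasks` function are hypothetical, and I am assuming that "time sensitive" simply means the task has a due date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Task:
    name: str
    tags: frozenset                  # project and context tags, e.g. {"home", "blog work"}
    due: Optional[date] = None       # None means no due date
    minutes: Optional[int] = None    # estimated duration, if known


def quick_home_tasks(tasks, max_minutes=20):
    """Tasks with a due date, tagged "home", estimated under max_minutes."""
    return [
        t for t in tasks
        if t.due is not None                                  # time sensitive (assumption)
        and "home" in t.tags                                  # can be done at home
        and t.minutes is not None and t.minutes < max_minutes # short enough
    ]
```

Tasks without an estimate are left out on purpose, since there is no way to know whether they fit in the time available.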

Lists

I also use RTM to store all my simple lists such as: (1) CDs I want to check out, (2) things I need to buy, (3) gift ideas, etc.

Local Storage

Most of the files I store on my local drives are archived items: things I don’t plan to access anytime in the next 6-12 months (or ever again). This includes old research papers, pictures, raw data from financial statements, etc. Nonetheless, the information is important, so I have a regular backup schedule that utilizes SyncBack to save specific folders to an external drive and Dropbox to save specific folders to a server.

Monthly: music and pictures. These items don’t change often and I rarely add a lot of new content to their folders, so at worst, if I lose a month of data, it isn’t that much. These files are backed up to the external drive.

Weekly: archived documents. I set up a document folder for any files I am no longer working on or don’t plan to work on in the next 6 months. These are backed up to the external drive.

Daily: Anything I am currently working on is stored in my Dropbox folder, which syncs those files whenever a change is made (i.e. you hit the save button). So all of my current school projects are stored here. These files are synced to a server online so I can access them from any computer.

Online data: Additionally, there is some online data that I save to my local drive, such as financial statements and my blog XML files. These files are archived in my documents folder and are additionally backed up to the external drive weekly.

You gotta have a system and this is mine. What’s your system for managing all your data?

The web is not a library

Kent Anderson at the Scholarly Kitchen has a post this morning on Google’s business model and its influence on the web as an organizer of information. He brings up a number of important questions that deserve rumination, including: is Google’s ad-based business model really the most natural model for the web? are digital [organizing] systems sufficient compared to more intuitive human models?

I allowed Kent to lead me until I came to this paragraph:

Because Google’s reputation is that it has been able to organize the world’s information, it’s tempting to think there’s a system in the digital realm that can actually do this. But the fact is, Google is a pretty limited organizational system. For instance, I can’t drag all the files I find in a search onto my hard drive. I can’t be sure the search results a week from now will be the same as those I’ll get today. I can’t compare one page to another within the system.

There are two inherent biases present in these statements that I often hear in speaking with people in the print publishing field. First, that one would WANT to drag all the files in a search to one’s hard drive is a ridiculous scenario given the amount of information on the web. But let’s assume you queried the web so succinctly that you found 30 perfect results. You COULD get those to your hard drive via RSS or “save page”, but that’s not the point. Information on the web is more fluid than print media: it can change, grow, adapt, and improve itself (or fizzle away). Its complex structure exists thinly spread across space and time and is affected by those two forces.

Second, considering the billions of web pages that exist on the internet, Google does a pretty good job of using what metadata exists to organize items in terms of a search query. And that’s where its strength lies: not in organizing the web from the top down (like Yahoo directories), but from the bottom up. The web, in its amorphous state, remains so until acted upon by your query and Google’s (or any search engine’s) algorithm. It is organization, just not the type that people of the old media are used to (some librarians included).

Otherwise, it’s an article worth reading if you have a moment.