Foundations of Information Organization

Page history last edited by c.j._petlick@whirlpool.com 10 years, 5 months ago

Anyone who uses a personal computer has devised some kind of system for organizing their information — and it is almost always based on a taxonomy defined by the user. At a single-user level, this approach works because the underlying taxonomy is meaningful to the individual managing his or her own information. However, when information must be shared among multiple users, things become much more complicated, and many problems often arise. Fortunately, a number of best practices for organizing information have been identified. Some common organizational issues are examined on the following pages, followed by a discussion of best practices that can help make collaborative information sharing as efficient and productive as possible.


A Bit of History

We humans have been recording and organizing information for thousands of years. Among early examples of this are the Ebla tablets, discovered in 1974 during excavations of palace archives in Ebla, Syria. [Moorey 1991] This collection of 1,800 clay tablets, plus more than 4,700 fragments of tablets, dates to 2500 B.C. Interestingly, because the building containing the tablets burned 4,500 years ago, the soft clay tablets were fired in place and found arranged on collapsed shelves, accompanied by many of the clay tags used to reference them. The Ebla tablets represent two information repositories: one containing bureaucratic economic records, and the other containing ritual and literary texts. [Wellisch 1981]


Similarly, the library of the Assyrian king Ashurbanipal, dating to about 660 B.C., contained cuneiform tablets, arranged by subject.


The Ancient Library of Alexandria (Egypt), which existed from about 300 B.C. to its destruction in 48 B.C., was said to contain hundreds of thousands of papyrus scrolls, including a 120-scroll bibliography. [Morville and Rosenfeld 2006]


Somewhat more recently, in 1735 the Swedish botanist, zoologist, and physician Carl Linnaeus introduced his well-known hierarchical taxonomy of the natural world (kingdom-class-order-genus-species) in the book Systema Naturae, laying the groundwork for all biological classification that followed, including the familiar seven-level scheme (kingdom-phylum-class-order-family-genus-species).


In 1876, American Melvil Dewey published the first version of the Dewey Decimal Classification (also called the Dewey Decimal System), which is familiar to modern library users. According to the Introduction to Dewey Decimal Classification (in its 22nd formal revision), the DDC is "the most widely used classification system in the world. . . . At the broadest level, the DDC is divided into ten main classes, which together cover the entire world of knowledge. Each main class is further divided into ten divisions, and each division into ten sections (not all the numbers for the divisions and sections have been used)." [OCLC 2003]

About ten years later, in 1887, Dewey founded the first school of library science at Columbia University, thus formalizing an interdisciplinary field concerned with the “organization, management, and dissemination of information resources.” [OCLC 2003]


Much more recently (in the late 1990s), the new discipline of information architecture emerged in response to the challenges of organizing information in the digital age. [Morville and Rosenfeld 2006] In addition to roots in library science, information architecture is influenced and informed by cognitive psychology, computer science, human-computer interaction, usability engineering, and various design disciplines. Drawing on this multi-disciplinary body of knowledge, information architects study and respond to the behaviors and needs of modern (i.e., online) information-seekers.


The most important point is that, for thousands of years, taxonomies have provided the means to effectively organize, store, and retrieve information.


Theoretical Concepts

The taxonomy (or taxonomies) created for a document repository, library, database, or website is the key to locating any piece of information contained within it. For example, a taxonomy in the form of a hierarchical classification system enables navigation or browsing to information of interest, whereas a taxonomy composed of metadata (keywords, tags, etc.) enables searching for information. In order to create useful, effective taxonomies, it’s important to understand a few theoretical concepts.
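The difference between browsing a hierarchy and searching metadata can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the folder names, file names, and tags are hypothetical, and the helper functions `browse` and `search` are invented for this sketch rather than taken from any repository product.

```python
# Hypothetical repository: a nested dict models the hierarchical
# classification, and a separate dict maps each file to its metadata tags.
hierarchy = {
    "Projects": {
        "Website Redesign": ["sitemap.doc", "wireframes.pdf"],
        "Annual Report": ["budget.xls"],
    },
    "Policies": {
        "Travel": ["expense-rules.doc"],
    },
}

tags = {
    "sitemap.doc": {"navigation", "draft"},
    "wireframes.pdf": {"navigation", "design"},
    "budget.xls": {"finance"},
    "expense-rules.doc": {"finance", "policy"},
}


def browse(tree, path):
    """Navigate: follow a list of category names down the hierarchy."""
    node = tree
    for name in path:
        node = node[name]
    return node


def search(tag_index, keyword):
    """Search: return every file carrying the keyword, wherever it is filed."""
    return sorted(f for f, file_tags in tag_index.items() if keyword in file_tags)
```

Browsing requires knowing (or guessing) the branch in which an item was filed, as in `browse(hierarchy, ["Projects", "Website Redesign"])`. Searching with `search(tags, "finance")` returns files from two different branches at once, which is the behavior a metadata taxonomy is meant to enable.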


Information Seeking Behavior 

One useful model widely referenced by information architects identifies four types of information “seeking” behaviors, particularly among online computer users: [Wodtke 2002]


Known-item search. In this type of seeking, the user typically knows what they want, knows what words to use to describe it, and has a good understanding of where to start. [Spencer 2006]


Exploratory seeking. In this type of seeking, users have some idea of what they need to know, but they may or may not know how to articulate it and, if they can, may not yet know the right words to use. They may not know where to start to look. They usually recognize when they have found the right answer, but they may not know whether they have found enough information. [Spencer 2006]


Don’t know what they need to know. The key concept behind this type of seeking is that people often don’t know exactly what they need to know. They may think they need one thing but really need another; or, they may be browsing without a specific goal in mind. [Spencer 2006]


Re-finding. This type of seeking appears to be straightforward --- a user is looking for information he or she has already seen. It is not the same as a “known-item search,” though, because the user may not be able to recall the exact words needed to locate the information. It’s not the same as “exploratory seeking,” either, because the user knows that they are looking for a specific piece of information, and they want to find it quickly. What the user really needs is a reminder and a short-cut to the information. [Spencer 2006]


Organizational Schemes

Information architects have also defined two main types of organizational schemes (i.e., taxonomies) that support the information seeking behaviors described above:


Exact (objective) organization schemes divide information into mutually exclusive sections. Assigning information to the proper section is quite straightforward. Some examples of exact schemes include alphabetical, numerical, chronological, and geographic schemes. These exact organization schemes tend to support known-item searching. [Morville and Rosenfeld 2006]
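As a rough sketch of what "mutually exclusive sections" means in practice, the Python below files the same set of documents under an alphabetical scheme and a chronological scheme. The document titles, dates, and the `group_by` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical documents: (title, date created).
documents = [
    ("Budget forecast", date(2009, 3, 14)),
    ("Archive policy", date(2010, 1, 5)),
    ("Board minutes", date(2009, 11, 30)),
]


def group_by(items, key):
    """Place each item in exactly one section, chosen by an objective key."""
    sections = defaultdict(list)
    for item in items:
        sections[key(item)].append(item[0])
    return dict(sections)


# Alphabetical scheme: section is the first letter of the title.
alphabetical = group_by(documents, lambda d: d[0][0].upper())
# Chronological scheme: section is the year of creation.
chronological = group_by(documents, lambda d: d[1].year)
```

Because the key is computed from the item itself, no judgment is needed to decide where a document belongs, and every document lands in one and only one section of each scheme.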


Ambiguous (subjective) organization schemes are more difficult to design and maintain. Some examples of ambiguous schemes include organizing by topic, by task, or by audience, as well as various combinations of these. However, even though they can be difficult to create and use, ambiguous schemes can be more useful than exact schemes, simply because people frequently don’t know “exactly” what they are looking for. As an example, library patrons can search for material by author, by title, or by subject. However, research has shown that patrons use the “ambiguous” subject search method much more frequently than “exact” author or title searches. [Morville and Rosenfeld 2006] Ambiguous organization schemes tend to support targeted exploratory seeking, as well as less-focused, associative “don’t know what they need to know” seeking.


It is important to understand that, due to their subjectivity, decisions made when creating ambiguous organization schemes have consequences. For example, a classification scheme that deviates from accepted medical or pharmaceutical convention may result in overlooked information and, possibly, the death of a patient. More subtle is the way in which ambiguous organization schemes can reflect the biases of their creator(s). An excellent example of this bias is found in the Dewey Decimal Classification described above: Even today, the ten divisions under the “Religion” class consist of two “general” divisions, seven divisions related to Christianity, and only one division dedicated to all of the world’s other religions.


Information Structures

Information architects also describe several types of information structures, which are essentially ways of defining organization schemes (i.e., taxonomies):


Bottom-up information structures are typically derived from underlying database models. These structures can be very detailed, and they can leverage relationships between elements. Consequently, these bottom-up structures typically support known-item searching, but they can also be used to define complex schemes that support exploratory seeking (e.g., think of the complex network of categories, subcategories, sub-subcategories, etc. at eBay.com).


Top-down structures, on the other hand, are the classic hierarchies that we are all familiar with. By starting with a very broad category, then choosing from selected subcategories, and then sub-subcategories within those, we are supported in our exploratory seeking, and we are also more likely to come across information “we didn’t know we needed to know.”


Hypertext structures are relatively new, consisting of links between metadata associated with information objects (files). While hypertext structures may be hierarchical, they are very often non-linear. This offers tremendous flexibility, supporting exploratory seeking in entirely new ways. However, users can easily get lost in non-linear structures, quickly becoming overwhelmed and frustrated. [Morville and Rosenfeld 2006]


Social classification (tagging) is a type of information structure that has appeared only recently. Free-tagging (also known as “collaborative categorization” or “mob indexing”) allows users --- rather than repository administrators --- to create and apply metadata to information objects. Social classification combines elements of all three structures described above; what makes it unique is that its participatory and democratic nature generates taxonomies that are organic and constantly changing. [Morville and Rosenfeld 2006]
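A minimal sketch of free-tagging, assuming tags arrive as simple (user, file, tag) records, might look like the following. The user names, file names, and the `build_tag_counts` helper are hypothetical; real tagging systems differ in how they store and weight tags.

```python
from collections import Counter, defaultdict

# Hypothetical tagging events: (user, file, tag). Any user may apply any tag.
tag_events = [
    ("ana", "q3-report.doc", "finance"),
    ("ben", "q3-report.doc", "quarterly"),
    ("cho", "q3-report.doc", "Finance"),
    ("ana", "logo.png", "branding"),
]


def build_tag_counts(events):
    """Count how often each tag (case-insensitive) was applied to each file."""
    counts = defaultdict(Counter)
    for _user, filename, tag in events:
        counts[filename][tag.lower()] += 1
    return counts


tag_counts = build_tag_counts(tag_events)
# The tag most often applied to the report, with its count.
top_tag = tag_counts["q3-report.doc"].most_common(1)[0]
```

Nothing in this structure is fixed in advance: the vocabulary is whatever users have typed so far, and the most common tag for a file can change with the next tagging event, which is the "organic and constantly changing" quality described above.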


Applied Research

A great deal of research has been conducted to determine (a) what kinds of problems are encountered by information-seekers, and (b) what kinds of organizational models best support information-seeking. In the last twenty-five years or so, there has been a particular focus on the users of shared file repositories. This applied research has been conducted by individuals in various fields, including library science, information systems, human-computer interaction, and computer science.


A number of researchers have studied common problems associated with information retrieval. One well-known study examined business information retrieval (of both paper-based and electronic files), and identified four broad areas of concern. These problem areas, with specific examples of each, include: [Gordon 1997]

  • Searching for and losing information
    • Lost time
    • Lost information
    • Inefficient (and ineffective) work patterns
  • Ineffective sharing
    • Lack of institutional support for sharing
    • Duplication of effort
    • Missed opportunities
    • Non-utilized (or under-utilized) work product
    • Wasted organizational knowledge
  • Overload and volume
    • Significant (and increasing) extent and consequences of information overload
    • Cost of information storage
  • Organizational perspectives
    • Information retrieval not a priority
    • Resistance to change
    • Incorrect assumption that business re-engineering will resolve retrieval problems


A more recent paper described issues specific to online shared file repositories. One general observation was that “[shared file repositories] tend to accumulate content over time and become more and more disorganized, such that users have difficulty finding the files they need.” [Rader 2007]


A particular problem noted in this same paper was that shared file repositories generally lack structure: “Despite the importance of information stored within them, shared file repositories generally do not have explicit rules or structures for organization and searching, like a library catalog does.” The author also acknowledges that while the subjective use of language often results in confusion over category names or file names, this problem can be mitigated by good communication and feedback among repository users, creating “common ground.” Another point made is that users of shared file repositories are both producers and consumers of information, but “in most situations, people do not effectively package information for others.” [Rader 2007]


Fortunately, there has also been considerable research into what works well, and how we can create taxonomies, organizational schemes, and organizational structures that facilitate information retrieval.


For example, two independent studies of computer users’ filing-and-finding behaviors in the mid-1990s reported surprisingly similar observations. When the researchers became aware of each other’s work, they collaborated on a paper that jointly presented their findings. [Barreau and Nardi 1995] While both studies had fairly small study groups, they covered a wide range of operating system technologies (DOS, Windows 3.0, OS/2, and Macintosh OS) and varying levels of user experience. Despite these differences, the researchers found striking similarities:

  • Users have a strong preference for location-based file-seeking. In both studies, users overwhelmingly preferred to navigate to the subdirectory where they believed the file was located, then browse through the list of files available, and then select their target --- regardless of whether they were using a graphical user interface or a text-based command-line interface. Even though it was available, users rarely (or never) used a “Find” GUI function or the DOS “whereis” command. The underlying reason was that users rely on “recognition” rather than “remembering” and, in fact, they often cannot remember file names.  

  • File location serves a critical “reminding” function for users. Users in all environments placed their important files in locations where they were most likely to notice them (i.e., at an upper level of a directory structure in either a command-line or GUI operating system or, in a GUI operating system, in a highly noticeable desktop location or near related icons).

  • Users manage and retrieve three types of information:
    • Ephemeral information consists of “action items,” which only need to be retained for a short time.

    • Working files are active for weeks or months, are important enough to be organized by location and category, and are accessed frequently, so users usually have no trouble locating and retrieving them.

    • Archived information has a shelf life of months or years, but it is rarely accessed.

  • Dealing with inactive files and archiving information is a low priority for users. Since archived work is rarely accessed, users will attempt to establish well-defined directory structures to help them locate information later. However, every user in both studies reported that “their attempts to establish elaborate filing schemes for archived information failed because they proved to require more time and effort than the information was worth.” Additionally, the researchers found that ephemeral information tends to remain in a system well beyond its active life, often cluttering the file system. [Barreau and Nardi 1995]


Based on the research above, some conclusions (i.e., best practices) might include the following:

  • Users must be able to navigate (browse) to information of interest. While search functionality may be required as well, especially in repositories containing large volumes of information, organizing files in navigable directory (folder) structures is essential.

  • Careful attention must be paid to the handling of ephemeral information. It should be placed in locations that provide a strong “reminding” function, but it must also be managed in such a way that information clutter does not become a problem.

  • Since users do not want to expend much effort creating and maintaining file archives, archiving systems should be as transparent and effortless for users as possible, provided the information is important enough to keep.
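One way to make archiving nearly effortless is to let the system propose candidates instead of asking users to file them. The Python sketch below assumes that "inactive" can be approximated by the date a file was last accessed; the 180-day cutoff, the file paths, and the `archive_candidates` helper are all hypothetical choices for illustration, not recommendations from the research cited above.

```python
from datetime import date, timedelta

# Hypothetical listing: file path -> date last accessed.
last_accessed = {
    "inbox/todo-call-vendor.txt": date(2010, 5, 20),
    "projects/redesign/sitemap.doc": date(2010, 4, 2),
    "projects/2008-audit/findings.doc": date(2008, 12, 1),
}


def archive_candidates(files, today, max_idle_days=180):
    """Return paths untouched for longer than max_idle_days (an assumed cutoff)."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(path for path, accessed in files.items() if accessed < cutoff)


candidates = archive_candidates(last_accessed, today=date(2010, 5, 31))
```

The same function run with a much shorter cutoff could be used to flag stale ephemeral items for review, so that action items left behind do not clutter the locations that are supposed to serve a "reminding" function.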

Comments (9)

Evelyn Kay said

at 3:44 pm on May 31, 2010

It is clear and obvious that you have done your research because this section seems to flow effortlessly. I am a history fanatic so I definitely appreciated the background information.

maso0137@... said

at 5:10 pm on May 24, 2010

I also think this is well written; by breaking the concept down into smaller pieces, the topic became much more manageable.

Erik Wallin said

at 2:23 pm on May 24, 2010

Well structured and written. I like the History section, but I would rather see a list than 1 sentence paragraphs.

kcarrero@depaul.edu said

at 10:07 am on May 24, 2010

I have to agree with the previous comments. The section is easy to read because one sentence leads to the next with a natural flow. I found the individual sections were organized in the proper order and definitions were clear and concise.

Jimmy Prathan said

at 12:54 am on May 24, 2010

Good flow to support the outline, well defined sources and interesting info on the history of the subject. Good job.

John Wolfram said

at 10:44 pm on May 23, 2010

Nice Job. Nice structure and material flows in a logical manner.

wajihazafar@... said

at 10:17 pm on May 23, 2010

Good job C.J (or whoever worked on this section)...You've made an otherwise dull subject quite interesting to read and understand in this section. I wish I had more of that skill to work on my chapter!! :)

marisha.s.b@... said

at 9:33 pm on May 23, 2010

After reading this, must say it was structured really well, words were clearly defined, and fairly easy read.

Mosa Sallam said

at 7:14 pm on May 23, 2010

I truly have nothing to say, the structure is great and the contents are well defined and straightforward. I don’t really think there is anything more the readers will ask for. Great job.
