Features

The sea change of the Web: What is the Second-Generation, Semantic Web?

Raymond Yee, IST—Interactive University

Large historical shifts are always difficult to discern properly, especially when one is in the middle of them. Many argue that adoption of XML (the eXtensible Markup Language) and the "Second-Generation Web" represent a profound sea change for the Web [12]. In spite of the hype surrounding XML, changes are clearly afoot for the Web — changes which, in turn, will transform the work of the University. This shift from the first to the second generation is a multifaceted transition (see Table 1 for a schematic of those changes). Perhaps the most tangible change is that of core markup language from HTML to XML. However, this shift by itself would be meaningless without an attendant change in mindset surrounding how documents are generated and interpreted on the Web.

Table 1. First vs. Second-Generation Web: The distinctions are not hard and fast ones but represent a continuum of what the first and second generations are usually like.

Characteristic | First-Generation Web | Second-Generation Web
core markup language | HTML | XML
formality and structure | unstructured documents | structured documents
semantics | implicit semantics | explicit labeling (metadata, Semantic Web)
relationship between content and form | HTML = conflation of content and form | layering of content and form: XML + transformation (e.g., XSL) to HTML, WML, PDF, or other formats
changeability | static documents | dynamic documents
decomposability and recomposability | monolithic, standalone websites | bricolage (aggregation), syndication, repurposing of content
interactivity | one-way, broadcast medium | two-way, writeable web
audience | for human consumption | for human and computer consumption (e.g., web services)
production control | centralized | decentralized (P2P)

Increasing numbers of automated computer agents (such as search engines) have been deployed to parse, produce, organize, and interpret the content of the Web (which is still largely expressed as HTML). However, human beings are much more proficient than these agents at deciphering the meaning (or semantics) embedded in an HTML-formatted web page, because much of that meaning is implied (visually and otherwise) rather than explicitly stated. For example, an educated human reader of The New York Times on the Web has little difficulty picking out headlines, pictures, their associated captions, and the various sections of the newspaper as rendered from HTML in a browser. On the other hand, developing software to automate the extraction and syndication of headlines or, more ambitious yet, the translation of these HTML pages into reusable content, is painstaking work (and patent-pending technology) [2, 19].

The implicit semantics of HTML were certainly not a surprise to the designers and users of SGML (from which HTML was derived), since SGML was created as a way of marking up structured documents to explicitly denote their semantics. SGML was not widely deployed on the Web because its generality and complexity made implementing SGML browsers impractical. XML was devised as a simplification of SGML specialized for marking up documents for the Web.

Interest in XML has been far more widespread than interest in SGML, spawning an entire family of technologies, practices, and frameworks. The general philosophy for applying XML to the production of documents is to separate content from form: the content and meaning are expressed in XML, and a transformation or styling technique (e.g., XSL or CSS) is then applied to the XML document to produce the published document in the desired form (e.g., HTML or PDF). In contrast to authoring documents directly as HTML, in which content and form are conflated, expressing documents as XML promises greater repurposability and reuse of the original content. An often-used example is the deployment of content to wireless devices that use WML (Wireless Markup Language). Translating structured XML into WML is generally a more straightforward task than transcoding HTML into WML [4].
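As a rough illustration of layering form on top of content, the following Python sketch renders one small XML document two ways, once as HTML and once as a WML card. It is only a sketch under stated assumptions: the element names (article, headline, byline, para) and the sample text are hypothetical, and the hand-written rendering functions stand in for what an XSL stylesheet would normally do.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup; these element names are illustrative only
# and do not come from any published schema.
ARTICLE_XML = """
<article>
  <headline>Campus Adopts XML</headline>
  <byline>A. Writer</byline>
  <para>Content is stored once.</para>
  <para>Form is applied later.</para>
</article>
""".strip()


def to_html(article):
    """Render the article as a small HTML page (headline, byline, all paragraphs)."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = article.findtext("headline")
    ET.SubElement(body, "p", {"class": "byline"}).text = article.findtext("byline")
    for para in article.findall("para"):
        ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")


def to_wml(article):
    """Render the same article as one WML card (headline as title, first paragraph only)."""
    wml = ET.Element("wml")
    card = ET.SubElement(
        wml, "card", {"id": "story", "title": article.findtext("headline")}
    )
    ET.SubElement(card, "p").text = article.findtext("para")
    return ET.tostring(wml, encoding="unicode")


article = ET.fromstring(ARTICLE_XML)
html_output = to_html(article)
wml_output = to_wml(article)
print(html_output)
print(wml_output)
```

Because both renderings read the same labeled elements, neither function has to guess which text is the headline; that guess is exactly what transcoding from HTML would require.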

The core of the Semantic Web [11] — Tim Berners-Lee's vision for the future of the Web — is then a sea of documents marked up in XML that provides structuring of internal content. Associated with individual documents are metadata (literally, data about data). Tying metadata to documents is akin to labeling soup cans: one can read about the contents without having to open the can and taste the soup. Labels can describe not only the ingredients contained in a can but also the relationship among various cans (e.g., "If you prefer less sodium, try our XYZ brand."). The Semantic Web then is the network of resources on the Web whose semantics and interrelationships are explicitly stated, allowing software to make formal, even novel, deductions about these resources. Such a Web, if realized, would permit a new level of empowerment of computer agents on the Web. In a parallel development, web services have garnered just as much attention and excitement. The web services movement is essentially a reconceptualization of database-backed websites as large computational objects with an API (application program interface) that other programs can invoke, rather than primarily a destination for a user to access through the browser [21, 25].
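The soup-can analogy can be made concrete with a toy example. The Python sketch below stores labels as (subject, predicate, object) statements and deduces a lower-sodium alternative from them without "opening the can." Everything in it is an assumption made for illustration: the identifiers, the predicates sodium_mg and variant_of, and the numbers are invented, and real Semantic Web metadata would be expressed in a format such as RDF rather than Python tuples.

```python
# Each label is a (subject, predicate, object) statement.
# All identifiers and values here are made up for illustration.
LABELS = {
    ("soup:tomato", "sodium_mg", 480),
    ("soup:tomato-lite", "sodium_mg", 140),
    ("soup:tomato-lite", "variant_of", "soup:tomato"),
    ("soup:minestrone", "sodium_mg", 650),
}


def value(subject, predicate):
    """Return the object stated for (subject, predicate), or None if unstated."""
    for s, p, o in LABELS:
        if s == subject and p == predicate:
            return o
    return None


def lower_sodium_alternatives(subject):
    """Deduce variants of `subject` whose stated sodium content is lower."""
    base = value(subject, "sodium_mg")
    found = []
    for s, p, o in LABELS:
        if p == "variant_of" and o == subject:
            sodium = value(s, "sodium_mg")
            if base is not None and sodium is not None and sodium < base:
                found.append(s)
    return sorted(found)


alternatives = lower_sodium_alternatives("soup:tomato")
print(alternatives)
```

The recommendation is not stored anywhere; it follows from combining two kinds of explicit statements, which is the sort of deduction that explicit labeling is meant to make possible for software.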

The Second-Generation, Semantic Web accentuates the shift from static documents to dynamic documents. Moreover, once the underlying documents of the Web are semantically structured, their components can be more sensibly disaggregated, making them available for recombination or reassembly. RSS (RDF Site Summary or Rich Site Summary) channels, an application of XML, already enable the syndication of headlines and HTML fragments (parts of weblogs), which are then reaggregated [7, 16, 30]. Such technology is beginning to empower a new level of web bricolage: the mixing and matching of not only the Web content of others but one's own. Web documents are no longer monolithic structures but pieces of a gigantic jigsaw puzzle [13, 14, 19].
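To show what reaggregation of syndicated headlines can look like in practice, here is a minimal Python sketch that reads two channels and merges their items into one list. It assumes the RSS 0.9x layout, in which item elements sit inside channel; RSS 1.0 uses RDF namespaces and would need different lookups. The feed contents and the example.edu and example.org URLs are hypothetical, and the feeds are embedded as strings rather than fetched over the network.

```python
import xml.etree.ElementTree as ET

# Two hypothetical feeds in the RSS 0.9x layout (items nested inside <channel>).
FEED_A = """<rss version="0.92"><channel>
  <title>Campus News</title>
  <link>http://example.edu/news</link>
  <description>Hypothetical campus feed</description>
  <item><title>Library extends hours</title><link>http://example.edu/news/1</link></item>
  <item><title>New XML workshop</title><link>http://example.edu/news/2</link></item>
</channel></rss>"""

FEED_B = """<rss version="0.92"><channel>
  <title>Teaching Weblog</title>
  <link>http://example.org/blog</link>
  <description>Hypothetical weblog feed</description>
  <item><title>Notes on learning objects</title><link>http://example.org/blog/7</link></item>
</channel></rss>"""


def headlines(feed_xml):
    """Return (channel title, item title, item link) for every item in one feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    return [
        (source, item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


# Reaggregate: one combined list drawn from several independent channels.
aggregated = []
for feed in (FEED_A, FEED_B):
    aggregated.extend(headlines(feed))

for source, title, link in aggregated:
    print(f"[{source}] {title} <{link}>")
```

No screen scraping is involved: because each headline and link is explicitly labeled, the aggregator needs only a few lines to combine content from sites that know nothing about each other.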

A trend that is not so much about the use of XML, although clearly tied to it, is the deepening of the collaborative potential of the Web. Although the Web was created to enable physicists to share documents, it has evolved primarily into a broadcast medium, leaving much to be desired in the area of collaboration [23]. However, efforts abound to restore the original communicative symmetry of the Web, to re-establish a two-way web, a writeable web, giving people the power not only to talk to each other but also to author and publish texts in more direct, more spontaneous ways [9, 29]. Both server-based tools (such as Manila [8]) and P2P applications (such as Groove [3]) are facilitating new forms of collaboration [31]. Weblogging (see Weblogging: Another kind of website in this issue of BC&C), the currently popular practice of keeping often-updated web journals that refer to other parts of the Web, has potentially profound applications for journalism [17, 18], knowledge sharing [20], education [22], and project management [24]. In the context of collaboration, in which interactions vary in formality, the machinery of XML semantic markup is especially helpful for exchanging structured data embedded in the midst of conversations and narratives. For instance, RSS channels provide a helpful, formal mechanism for a common activity in weblogging, namely the sharing of URLs and accompanying commentary.

The Second-Generation Web is bound to have a profound impact on the worlds of education, research, and teaching. Every day, this campus adds to its rich array of digital resources available on the Web to support research, teaching, and learning. Interest will grow in building upon this work, whether through referencing, reusing, or recontextualizing these resources. Obviously, such trends will affect the entire campus, often in unexpected and novel ways. The goal of UC Berkeley's Interactive University Project (IU, http://interactiveu.berkeley.edu:8000/IU/) is to enable Berkeley to make its unique resources of people and knowledge available on the Internet to K-12 educators. IU is developing a new model [5] in which collaborative communities (spanning the university and K-12) produce and disseminate curricular materials and learning objects, "digital resource[s] that can be reused to support learning" [28]. Packaged as XML and associated with relevant K-12 and discipline-specific metadata, these learning objects will be flexible and reusable documents, assembled and distributed in the IU Open Learning Environment (IU-OLE), a web environment in which California's teachers, students, and family members will be able to find, access, manipulate, assemble, and share these documents. The IU-OLE is designed to adapt to a Second-Generation Semantic Web [32].

Clearly the Second-Generation, Semantic Web is more a dream than a reality at this point. Several significant barriers impede its realization, among them the complexity (and grandiosity, perhaps) of the ideas and the challenges of writing structured documents (including the changes in authoring process relative to word processing, and the lack of widely adopted, inexpensive tools) [27]. Legal frameworks, social norms, and actual practice concerning intellectual property, all of which have been thrown into flux by the Web, may ultimately set the direction of the Web. "In the longer term, e-publishing presents some interesting questions about copyright protection and fair use. Bricolage authoring is increasingly common, especially for multimedia materials. For example, people will gather digital images from the Web and use them to illustrate presentations. Or within a company, the same system diagram may find its way into many different Powerpoint files and be used in entirely different ways. At the same time, we are seeing increasingly rigid copyright protection schemes come into place, schemes that significantly flatten notions of fair use. This tension is bound to bring about some important changes." [15] We are left then with a key question: through what practices can we enable the new technology of the Second-Generation Web to empower teachers, researchers, and learners to share their creative work and gain appropriate recognition for what they are sharing, while protecting their work from being misrepresented or misappropriated (e.g., sold by others for financial gain)?

References

[1] ACM Symposium on Document Engineering 2001: Call for Papers (http://www.documentengineering.org/).

[2] Connotate Technologies: Provider of web content monitoring, mining, and aggregation tools (http://www.connotate.com/core_technology.asp).

[3] Groove (http://www.groove.net/).

[4] IBM Transcoding Technology: Architecture, 2001 (http://www-4.ibm.com/software/webservers/transcoding/publications/transcoding.html).

[5] The Interactive University: A Future Model (http://iu.berkeley.edu/newiu).

[6] Open Content Syndication at Internet Alchemy (http://internetalchemy.org/ocs/index.html).

[7] RDF Site Summary (RSS) 1.0 (http://groups.yahoo.com/group/rss-dev/files/namespace.html).

[8] What is Manila? Userland, Inc. (http://manila.userland.com/).

[9] The Writeable Web (http://www.oreillynet.com/pub/t/84).

[10] XSL Tutorial (http://www.xml101.com/xsl/).

[11] Berners-Lee, T., Hendler, J. and Lassila, O. The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American, 284 (May 2001). 34-38,40-43 (http://www.sciam.com/2001/0501issue/0501berners-lee.html).

[12] Bosak, J. and Bray, T. XML and the Second-Generation Web. Scientific American, 280 (May 1999). 89-93 (http://www.sciam.com/1999/0599issue/0599bosak.html).

[13] Brown, J.S. Growing Up Digital: How the Web Changes Work, Education, and the Ways People Learn. Change, 2000. 10-20 (http://www.aahe.org/change/digital.pdf).

[14] Brown, J.S. Learning, Working & Playing in the Digital Age, 1999 (http://serendip.brynmawr.edu/sci_edu/seelybrown/seelybrown.html).

[15] Carbone, C. and Marshall, C. Where E-Technologies Are Taking Us: An Interview with Cathy Marshall. scan360, 2000 (http://raven.ubalt.edu/features/scan360/e-books/culture/int1.htm).

[16] Dornfest, R. Meerkat: An Open Wire Service. O'Reilly Network (http://www.oreillynet.com/pub/a/rss/2000/03/17/about_meerkat.html).

[17] Lasica, J.D. Blogging as a Form of Journalism. Online Journalism Review, 2001 (http://ojr.usc.edu/content/story.cfm?ID=585).

[18] Lasica, J.D. Weblogs: A New Source of News. Online Journalism Review, 2001 (http://ojr.usc.edu/content/story.cfm?ID=588).

[19] Luh, J.C. Content Goes to Pieces: The Walls That Hold Data Inside Web Sites Are Crumbling, And They May Topple Existing Business Models With Them. Internet World Magazine, 2000 (http://www.internetworld.com/070100/7.01cover1.html).

[20] Nichani, M. and Rajamanickam, V. Grassroots KM through blogging. elearningpost, 2001 (http://www.elearningpost.com/elthemes/blog.asp).

[21] O'Reilly, T. The Network Really Is the Computer. O'Reilly Network, 2000 (http://www.oreillynet.com/pub/a/network/2000/06/09/java_keynote.html?page=1).

[22] Shefler, L. Indisciplinary Education: A Pedagogy of Nudges, 2000 (http://yinzgandantananat.editthispage.com/stories/storyReader$365).

[23] Udell, J. Internet Groupware for Scientific Collaboration. Software Carpentry, 2000 (http://software-carpentry.codesourcery.com/Groupware/report.html).

[24] Udell, J. Telling a Story: The Weblog as a project-management tool. Byte.com, 2001 (http://www.byte.com/documents/BYT20010524S0001/).

[25] Vasudevan, V. A Web Services Primer. XML.com, 2001 (http://www.xml.com/pub/a/2001/04/04/webservices/).

[26] Walsh, N. A Technical Introduction to XML. XML.com, 1998 (http://www.xml.com/pub/a/98/10/guide0.html).

[27] Walsh, N. and Muellner, L. DocBook: The Definitive Guide. O'Reilly & Associates, Sebastopol, CA, 1999 (http://www.docbook.org/tdg/html/docbook.html).

[28] Wiley, D.A. Connecting learning objects to instructional design theory: A definition, a metaphor, and a taxonomy. in Wiley, D.A. ed. Instructional Use of Learning Objects: Online Version, 2000 (http://reusability.org/read/chapters/wiley.doc).

[29] Winer, D. Bootstrapping the Two-Way-Web, 2000 (http://www.xmlrpc.com/bootstrappingTheTwoWayWeb).

[30] Winer, D. RSS 0.92, 2000 (http://backend.userland.com/rss092).

[31] Yee, R. What is P2P and why should you be mindful of it? Berkeley Computing and Communications, 11 (3). 16-17 (http://istpub.berkeley.edu:4201/bcc/Summer2001/info.p2p.html).

[32] Yee, R. and Yoes, C., The IU-OLE: University and K-12 Educators Collaborating and Teaching through Flexible, Repurposable Learning Objects. in ACM Symposium on Document Engineering 2001, (Atlanta, GA, 2001), ACM (submitted).

Berkeley Computing & Communications, Volume 11, Number 4 (Fall 2001)
Copyright 2001, The Regents of the University of California