
NCSA PSP Annual Meeting Abstracts

Alliance Grid Science Portal
Jay Alameda, Research Programmer, Scientific Applications

The group will demonstrate the Alliance Grid Science Portal with two usage scenarios:

  • Using the portal to set up and submit a WRF job, which includes processing, WRF simulation, postprocessing (scoring), and visualization using VisBench.


  • Using the portal to set up and submit a linked chemical engineering copper electrodeposition job on a grid resource, followed by visualization through VisBench.


Allstate/NCSA Partnership Activities in Data Mining and Knowledge Management
Fred Geinosky, Senior Systems Planning Consultant, Allstate Insurance Company

Fred will discuss the projects that Allstate and NCSA have worked on jointly over the past year. Emphasis will be given to the NCSA staff who have worked on these projects and the tools they used, or are still using, to complete them.


CAVE Demo
Bill Sherman, Visualization Programmer, Scientific Applications

At the NCSA CAVE we will be demonstrating our latest virtual reality applications, displayed on our new projectors with increased resolution and clarity. The NCSA virtual reality applications have been developed through collaborations with industry, local research scientists, NCSA Faculty Fellows, University of Illinois students, and VR programmers from around the Alliance. Visitors to the CAVE will have the opportunity to see our newest work, or some old classics that may be new to them.


Computational Biology at NCSA
Eric Jakobsson, Senior Research Engineer, Scientific Applications

The new paradigm of PC-based grid and cluster/supercomputing provides unprecedented and growing cost-effective compute power together with an assured stable architecture for computational biology. These capabilities open new opportunities in meaningful simulation and analysis of biomolecular assemblies, biomolecular structure prediction, and computational genomics. Combined with group-to-group communication tools such as the Access Grid, this grid/cluster compute power can be put into the service of corporate training and formal education in computational biology on a national or even international scale. This presentation will discuss implementations of these capabilities that are presently ongoing or under development, or that might be possible in the near future.


Commodity Computing with Condor
Miron Livny, Professor of Computer Science, University of Wisconsin-Madison

Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.

While providing functionality similar to that of a more traditional batch queueing system, Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail. Condor can be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to only use desktop machines where the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (for example, when a key press is detected), in many circumstances Condor is able to transparently produce a checkpoint and migrate the job to a different machine that would otherwise be idle. Condor does not require a shared file system across machines - if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or Condor may be able to transparently redirect all the job's I/O requests back to the submit machine. As a result, Condor can be used to seamlessly combine all of an organization's computational power into one resource.

The ClassAd mechanism in Condor provides an extremely flexible and expressive framework for matching resource requests (jobs) with resource offers (machines). Jobs can easily state both job requirements and job preferences. Likewise, machines can specify requirements and preferences about the jobs they are willing to run. These requirements and preferences can be described in powerful expressions, allowing Condor to adapt to nearly any desired policy.
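
The matchmaking idea can be illustrated with a small Python sketch. This is a conceptual illustration only, not Condor's actual ClassAd language, and the attribute names below are hypothetical.

    # Conceptual sketch of ClassAd-style matchmaking (illustrative; this is not
    # Condor's actual ClassAd language).  Both sides of the match carry attributes,
    # a Requirements predicate, and (for the job) a Rank function for preferences.

    job_ad = {
        "Owner": "alice",
        "ImageSize_MB": 512,
        # Run only on Linux machines with enough memory...
        "Requirements": lambda machine: machine["OpSys"] == "LINUX"
                                        and machine["Memory_MB"] >= 512,
        # ...and prefer whichever acceptable machine has the most memory.
        "Rank": lambda machine: machine["Memory_MB"],
    }

    def desktop_requirements(machine):
        # This desktop accepts jobs only after 15 minutes of keyboard idleness.
        return lambda job: machine["KeyboardIdle_s"] >= 900

    node01 = {"Name": "node01", "OpSys": "LINUX", "Memory_MB": 1024, "KeyboardIdle_s": 1200}
    node02 = {"Name": "node02", "OpSys": "LINUX", "Memory_MB": 2048, "KeyboardIdle_s": 30}
    machine_ads = [node01, node02]
    for m in machine_ads:
        m["Requirements"] = desktop_requirements(m)

    def match(job, machines):
        """Return the mutually acceptable machine the job ranks highest, or None."""
        acceptable = [m for m in machines
                      if job["Requirements"](m) and m["Requirements"](job)]
        return max(acceptable, key=job["Rank"], default=None)

    best = match(job_ad, machine_ads)
    print(best["Name"] if best else "no match")   # -> node01 (node02 is in use)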

Condor can be used to build Grid-style computing environments that cross administrative boundaries. Condor's "flocking" technology allows multiple Condor compute installations to work together. Condor incorporates many of the emerging Grid-based computing methodologies and protocols. For instance, Condor is fully interoperable with resources managed by Globus.

Condor is the product of the Condor Research Project at the University of Wisconsin-Madison (UW-Madison), and it was first installed as a production system in the UW-Madison Department of Computer Sciences nearly 10 years ago. This Condor installation has since served as a major source of computing cycles to UW-Madison faculty and students. Today, just in our department, Condor manages more than 1000 workstations. On a typical day, Condor delivers more than 650 CPU days to UW researchers. Additional Condor installations have been established over the years across our campus and the world. Hundreds of organizations in industry, government, and academia have used Condor to establish compute installations ranging in size from a handful to well over one thousand workstations.

The Condor software and complete documentation are freely available from the Condor project's website at http://www.cs.wisc.edu/condor/. Most flavors of Unix are supported, as well as Windows NT/2K.


D2K - Data to Knowledge
Michael Welge, Research & Development, Technical Program Lead, Computing & Data Management

D2K is a next-generation workspace architecture for the creation of data analysis applications in any domain area. It is a visual programming environment that allows users to easily connect software modules together in a unique data-flow environment to form an application. D2K supplies a standard set of software components and application templates, along with a standard API for software component development. The software modules are reusable components, which facilitate efficiency and collaboration among developers. Modules that have been modified for a specific domain application, as well as generated models, can be stored in a central repository for use by others in a given research community.
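
As a rough illustration of the data-flow idea, the following Python sketch wires a few toy modules into a linear pipeline. It is conceptual only; it does not use the actual D2K API, and the module names are hypothetical.

    # Conceptual sketch of a data-flow pipeline of reusable modules
    # (illustrative only; this is not the D2K API).

    class Module:
        def run(self, data):
            raise NotImplementedError

    class LoadRows(Module):
        def __init__(self, rows):
            self.rows = rows                      # stand-in for a real data source
        def run(self, _):
            return self.rows

    class FilterMissing(Module):
        def run(self, rows):
            return [r for r in rows if None not in r.values()]

    class Summarize(Module):
        def run(self, rows):
            return {"count": len(rows)}

    def connect(*modules):
        """Wire modules into a linear data flow: each module's output feeds the next."""
        def application(data=None):
            for m in modules:
                data = m.run(data)
            return data
        return application

    app = connect(LoadRows([{"x": 1}, {"x": None}]), FilterMissing(), Summarize())
    print(app())   # -> {'count': 1}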


Data Mining at NCSA: What Next?
Michael Welge, Director, Automated Learning Group

The area of data mining, sometimes referred to as knowledge discovery from databases, has already produced significant practical results for Private Sector Program Partners in areas such as fraud detection, medical outcomes analysis, prediction of customer purchase behavior, prediction of web user interests, and optimization of manufacturing processes. It has also led to a set of fascinating scientific questions about how computers might automatically learn from experience.

While this first generation of data mining algorithms has produced highly practical applications, data mining is still in its infancy. It is reasonable to expect that future algorithmic development will produce an order of magnitude advance in the state of the art. These next generation algorithms will:

  • Accommodate dramatically more diverse sources and types of data
  • Automate a broader range of steps involved in the data mining process
  • Support mixed-initiative data mining in which human experts collaborate with the computer to form hypotheses and test them against the data
  • Allow for the use of distributed data and computational resources
  • Provide an easy to use environment to facilitate the development and execution of this new class of data mining approaches.
This presentation will highlight future activities of the Automated Learning Group that may address some of these next-generation data mining needs and answer the question "What next?" in the areas of applications, research, and development at NCSA.


e-Learning & Knowledge Sharing
Tim Wentling, Senior Research Scientist, Cybercommunities

This presentation will describe the research and development efforts of the Knowledge and Learning Systems Group. The group has been conducting research on knowledge sharing practices within companies. They have used this research to advance the field of knowledge management, and also to inform their design of a new knowledge center concept for organizations. The group is also doing research on the relationship of culture (national, professional, and organizational) to e-learning interaction and success. These studies are being used to guide the design of adaptive systems for learning. Overall, the group sees e-Learning and Knowledge Management converging, and its work is directed with this focus. The presentation will combine a summary of this research with demonstrations of what is being built.


Exchange Conferencing Server
Tilt Thompkins, Division Director, Integrated Decision Technologies

The software industry has recently released several new server software products that, when used in conjunction, allow multiple participants to communicate in a real-time, encrypted, scalable virtual conferencing environment. Representative examples include a large assortment of tools and utilities in addition to standard real-time video/audio collaboration. An interactive display will demonstrate real-time conferencing, file transfer, collaborative application sharing, automated H.323 unicast-to-multicast bridging, 128-bit conference encryption, password authentication, and concurrent private and public chat.


Hot Topics Detection
Tilt Thompkins, Division Director, Integrated Decision Technologies

The aim of the Hot Topics project is to produce software capable of analyzing a large body of documents to identify those concepts which are "emerging" or of high interest to a specific user. The software used to identify hot topics is composed of several modules and a document-gathering step. After accumulating a set of documents, the hot topics collection builder module uses natural language processing techniques to detect and gather the noun phrases ("concepts") used in the texts. Next, the user is presented with a query interface, in which he or she can select a subset of the gathered data to be analyzed for hot topics detection. For example, a user could start up the hot topics engine, feeding it all of the patent data from a large database; then, at the query interface, the user could choose to restrict analysis to only patent documents produced by IBM or Kodak between the years 1970 and 1980. After the user has selected the data subset and tuned the search, the TEXAN model builder is used to compute clusters of co-occurring concepts and to gather data about the concepts used in the next step of the process--the trained neural net classifier. This classifier takes information from the model builder about each of the concepts, such as frequency of appearance year by year, appearance in clusters, and size of the clusters in which a concept appears. Given these parameters, the classifier runs a pre-trained neural net and classifies each concept as emerging or non-emerging. The final module of the hot topics system is a visualization interface, which graphically displays the results of the classifier and gives other statistical information about the concepts that have been analyzed.
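
The kind of per-concept, per-year features the classifier consumes can be sketched in Python as below. The sketch is illustrative only: the documents and concepts are invented, and a simple growth-rate threshold stands in for the TEXAN model builder and the trained neural net.

    # Illustrative sketch: per-concept, per-year frequency features of the kind the
    # hot-topics classifier consumes.  A simple growth-rate threshold is a
    # hypothetical stand-in for the trained neural net described above.
    from collections import defaultdict

    documents = [
        {"year": 1978, "concepts": ["charge-coupled device", "magnetic tape"]},
        {"year": 1979, "concepts": ["charge-coupled device", "charge-coupled device"]},
        {"year": 1980, "concepts": ["charge-coupled device", "charge-coupled device",
                                    "charge-coupled device", "magnetic tape"]},
    ]

    def yearly_counts(docs):
        counts = defaultdict(lambda: defaultdict(int))   # concept -> year -> count
        for d in docs:
            for c in d["concepts"]:
                counts[c][d["year"]] += 1
        return counts

    def classify(counts, growth_threshold=1.5):
        """Label a concept 'emerging' if its latest-year count exceeds its
        earliest-year count by the given factor."""
        labels = {}
        for concept, by_year in counts.items():
            years = sorted(by_year)
            first, last = by_year[years[0]], by_year[years[-1]]
            labels[concept] = "emerging" if last >= growth_threshold * first else "non-emerging"
        return labels

    print(classify(yearly_counts(documents)))
    # -> {'charge-coupled device': 'emerging', 'magnetic tape': 'non-emerging'}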


HyperComputiCations: What Happens When The Grid Grows Up And Has To Make A Living?
Gerry Labedz, Dan Noble Fellow, Motorola Laboratories

Efforts underway today to create the capability to link computing resources that are geographically separated are aimed at Big Science and Big Government programs. This hoped-for computing capability is called The Grid. Someday, The Grid may be able to play a role in the commercial world by being part of a future communications infrastructure. A research project, active for three years in Motorola Laboratories, explores how HYPERfast networks could combine with remote COMPUTational assistance to enable the commercially viable content-rich communiCATIONS applications of the future. That is, The Grid may have a role to play in "HyperComputiCations systems".


Interactive Passive Stereo Theatre
Donna Cox, Division Director, Experimental Technologies

A prototype interactive multi-platform API, VirDir2, has been developed to navigate, record, and play back visualization applications in a new interactive high-definition (HD) passive stereo theatre. The theatre can play back HD digital frames as well as display HD interactive graphics applications. A multi-way remote collaboration using streaming video will be demonstrated. Users will collaborate with other remote users as streaming video avatars within three-dimensional visualization data. For this demonstration, VirDir2 will be coupled with Partiview, a particle and geometry viewer for scientific and astronomical data.


Iperf/Multicast Beacon
John Towns, Division Director, Scientific Computing Division

The National Laboratory for Applied Network Research (NLANR) Distributed Applications Support Team (DAST), located at NCSA, develops tools and provides support for the use of resources connected by high-performance networks. We will be showing a few of these tools as examples of the work under way in this group. These tools are:

Iperf - a modern alternative for measuring TCP and UDP bandwidth performance that allows tuning of various parameters and UDP characteristics. Iperf reports bandwidth, delay jitter, and datagram loss.

Multicast Beacon - active measurement software that monitors the performance and connectivity within a multicast session. This tool has recently been updated, and work is now under way to extend the system's functionality. The beacon has two components (a minimal probe sketch follows this list):

  • Beacon Client -- an active probing program running on each machine and measuring the performance of the transmission. It then reports to the Beacon Server. The current version (v0.63) is written in Java.
  • Beacon Server -- a central server collecting the performance information from the Beacon Clients.
End-to-End Performance Framework - a platform to allow easy integration of network diagnostic tools, with the ability to query the results of these tools and act on them in order to understand end-to-end performance issues and potentially make recommendations on how to improve the performance of an application on a particular network path. This is a new project within this group and is just now producing its first prototype pieces.
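
The beacon-probe idea referenced above can be sketched as a heavily simplified Python illustration. The production Beacon Client is written in Java and reports its measurements to a central Beacon Server, whereas this sketch simply returns them; the multicast group, port, and packet format are arbitrary choices.

    # Minimal beacon-probe sketch (illustrative only).
    import socket, struct, time

    GROUP, PORT = "239.1.2.3", 5007

    def send_probes(count=100, interval=0.1):
        """Emit sequence-numbered, timestamped probes to the multicast group."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        for seq in range(count):
            sock.sendto(struct.pack("!Id", seq, time.time()), (GROUP, PORT))
            time.sleep(interval)

    def receive_probes(expected=100, quiet_period=5.0):
        """Join the group and estimate loss and one-way delay.
        (One-way delay is only meaningful if the sender and receiver clocks are synchronized.)"""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", PORT))
        membership = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
        sock.settimeout(quiet_period)
        seen, delays = set(), []
        try:
            while True:
                data, _ = sock.recvfrom(64)
                seq, sent_at = struct.unpack("!Id", data)
                seen.add(seq)
                delays.append(time.time() - sent_at)
        except socket.timeout:
            pass                                   # no traffic for quiet_period seconds
        loss = 1.0 - len(seen) / expected          # gaps in sequence numbers = loss
        return loss, delays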


Linux and Clusters in the Enterprise
Rob Pennington, NCSA Senior Associate Director of Computing and Data Management

Linux was insignificant as an operating system for the enterprise until very recently. In large part this was due to the perception that it was an unsupported operating system that was the domain of people wanting to do Open Source work, which was usually viewed as irrelevant to the business environment. What we are seeing now is that an increasing number of vendors are offering Linux on their systems along with the support necessary to make it an effective operating system. Just as important, there is an increased emphasis on using Linux to scale up in clusters. We are currently using large-scale computational Linux clusters at NCSA, working with other sites on Open Source projects, and using Linux on desktops. We expect acceptance in the community and the enterprise to continue to grow as people and businesses gain experience with Linux and clusters.


The Making of a Discovery Channel Program
Donna Cox, Division Director, Experimental Technologies

Donna Cox will describe the technology and visualization techniques used in creating "Unfolding Universe." Her group created most of the computer graphics scenes and visualized Alliance scientists' computational science. She will show clips from the show, which will air June 3, 2002.


NCSA's e-Learning Solution—Chame-Learn
Tim Wentling, Senior Research Scientist, Cybercommunities

This session will demonstrate many of the features of the e-Learning system designed by the Knowledge and Learning Systems Group at NCSA. The system provides customizable user interfaces allowing multiple delivery options to individual and organization-wide users. Some of the system components are designed to deliver specialized content to wireless mobile devices.

A prototype of the system was used last summer to deliver a graduate-credit course to students located both on and off campus. Usability research is currently being performed to assist in the refinement of this custom system.


NCSA Grid Security Directions
Randy Butler, Division Director, Networking and Middleware

Over the years NCSA has maintained an open yet secure high-performance computational local area environment, adapting to ever-increasing threats with new capabilities. NCSA is actively involved with local, state, and federal cyber law enforcement. Our security projects have included secure authentication systems such as Kerberos, SSH, and PKI. We have utilized a wide array of intrusion detection strategies, have developed grid security software that is in use around the world today, and have developed key management prototypes for the United States Air Force. With the increased interest in cyber security, NCSA is building on these experiences and expanding its cyber security research efforts. In this talk I will discuss NCSA's past, present, and future cyber security projects, highlighting the grid aspects, and discuss some of the initiatives we are working on now.


NCSA VIAS
Alan Craig, Research Programmer, Computing & Data Management

The NCSA VIAS system automatically gathers internet information resources on specific fields of interest, and then applies an assortment of algorithms to the resulting database to extract metadata such as company names, bibliographic references, and over 30 more metadata types. The system then provides a web-based interface to support various kinds of queries against that database of information. This is a fully automated system, requiring no human intervention.

We will be demonstrating the VIAS system, highlighting the changes and improvements we have made since last year, including database optimizations, a port to Oracle, new content areas, assembling a low cost production cluster, plans to integrate with NCSA T2K/D2K, and user interface enhancements.
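
As a small illustration of rule-based metadata extraction of the kind described above, the Python sketch below pulls company names and bibliographic references out of raw text. The patterns and sample text are hypothetical and far simpler than the algorithms VIAS actually applies.

    # Illustrative sketch of rule-based metadata extraction
    # (hypothetical patterns; not the algorithms VIAS uses).
    import re

    COMPANY = re.compile(r"\b([A-Z][\w&.-]*(?:\s+[A-Z][\w&.-]*)*\s+(?:Inc\.|Corp\.|Ltd\.|LLC))")
    CITATION = re.compile(r"\[\d+\]\s+.+?\d{4}\.")

    text = ("The sensor was licensed to Acme Widgets Inc. in 1998. "
            "[1] J. Doe, Sensor Design. 1997.")
    print(COMPANY.findall(text))    # -> ['Acme Widgets Inc.']
    print(CITATION.findall(text))   # -> ['[1] J. Doe, Sensor Design. 1997.']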


Network Utilization Profiling and Wireless
Tony Rimovsky, Assistant Director, Network, Engineering & Resources

The Network Research group at NCSA will be demonstrating its PacketLine network flow processing application and an example of low cost, rapidly deployable wireless networking spanning a large geographic area. Information will also be available about other wireless projects the group is working on and NCSA's involvement with 10 Gigabit/Second network technologies.

PacketLine is part of an ongoing effort to produce network utilization information that can be used to characterize how users are using the network. PacketLine takes incoming information about traffic patterns (network flows) and provides a modular system for handling the information in a pipeline fashion. The system provides a toolkit from which multiple actions, such as storage, analysis, visualization, and intrusion detection, can all be taken. We will demonstrate the pipeline process and how it can be used to look at data from both the network traffic utilization and the application usage perspectives.
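
The pipeline idea can be sketched in Python as below. This is a conceptual illustration, not the PacketLine code; the flow-record fields, the application mapping, and the toy "scan" rule are all hypothetical.

    # Conceptual sketch of a modular flow-processing pipeline (not PacketLine itself).
    # Each stage receives a stream of flow records and passes them, possibly
    # annotated, to the next stage; terminal stages store, summarize, or alert.

    def source(flows):
        for flow in flows:
            yield flow

    def annotate_application(flows):
        """Tag each flow with a coarse application label based on its destination port."""
        ports = {80: "web", 22: "ssh", 119: "news"}
        for flow in flows:
            flow["app"] = ports.get(flow["dst_port"], "other")
            yield flow

    def detect_scans(flows, alert):
        """Crude stand-in for intrusion detection: flag very small flows."""
        for flow in flows:
            if flow["bytes"] < 64:
                alert(flow)
            yield flow

    def summarize(flows):
        totals = {}
        for flow in flows:
            totals[flow["app"]] = totals.get(flow["app"], 0) + flow["bytes"]
        return totals

    records = [
        {"src": "10.0.0.1", "dst_port": 80, "bytes": 120000},
        {"src": "10.0.0.2", "dst_port": 22, "bytes": 40},
    ]
    pipeline = detect_scans(annotate_application(source(records)),
                            alert=lambda f: print("possible scan:", f["src"]))
    print(summarize(pipeline))   # per-application byte totals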

The group will also demonstrate a low-cost, portable wireless deployment in a point-to-multipoint configuration. The goal of this demonstration is to show the potential for rapidly deployable wireless infrastructures that provide a connection between the main infrastructure and a remote wireless cloud. Using this technology, access to a home site could be provided at a remote location up to several miles away within a few minutes.


NLANR Tool Development
John Towns, Division Director, Scientific Computing Division

The National Laboratory for Applied Network Research (NLANR) Distributed Applications Support Team (DAST), located at NCSA, develops tools and provides support for the use of resources connected by high-performance networks. An overview of current activities in network performance and related tool development and applications enablement in grid-based environments will be provided.


PORT - Postweb Open Resource Toolkit
Doug Fein, Technical Programming Lead, Communications

The Postweb Open Resource Toolkit (PORT) demonstration will show voice interaction with data and conversation at the PSP meeting. The project uses a simple display system that brings up data sources based on items discussed in a conference room or on the telephone. This demo will require a space that is not too near other demos, to avoid interference with the voice interaction. Using the display system we will show documents and web pages that relate to the conversations ongoing in the room, and we will show the predictive elements of the PORT voice interaction system.


Prototype Knowledge Sharing System
Tim Wentling, Senior Research Scientist, Cybercommunities

This demonstration will highlight the major features of a knowledge sharing prototype designed by the Knowledge and Learning Systems Group at NCSA. The system provides for the capture, cataloging, and sharing of information in and among communities of practice.

Major features of the system include:

  • Document management
    • media and document repository
    • automated information-gathering utility
    • search & query
  • Seamless integration with all knowledge sharing applications
    • including e-Learning system
    • within a custom portal interface
    • adaptable and scalable
  • Collaboration with various communities of practice
    • custom web forums with knowledge capture
    • multipoint desktop video conferencing
  • Sophisticated user support system


Real-Time Automatic Speech-Based Indexing of MPEG-2 Video Streams
Tilt Thompkins, Division Director, Integrated Decision Technologies

Real-time speech-based indexing of video streams is normally a multi-pass operation: video content is encoded and stored to a file server for later analysis on a campus network. With live streaming provided by a networked MPEG-2 encoder, the indexing application can instead tune into the stream as a multicast. By singling out the audio channel from the video, it can perform real-time voice recognition. The result is a database that may be searched using natural language processing in order to correlate queries with video events containing contextual speech. This provides a multi-level approach to context retrieval without multiple passes over the video streams, accomplished through tight integration between off-the-shelf MPEG-2 encoder appliances and the software indexing package. The demonstration shows how an MPEG-2 video multicast present on the network provides the source media to a real-time indexing package running on a PC.
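
A schematic Python sketch of the indexing and retrieval step appears below. It is illustrative only: the timestamped transcript is hard-coded in place of a real speech engine, and no MPEG-2 or multicast handling is shown.

    # Schematic sketch of speech-based indexing: timestamped transcript text is
    # indexed so that a query can be correlated back to points in the video.
    # The transcript below is hard-coded; a real system would obtain it from a
    # speech engine fed by the audio channel of the MPEG-2 stream.
    from collections import defaultdict

    transcript = [
        (12.5, "the governor announced the new budget"),
        (47.0, "researchers described the storm simulation"),
    ]

    def build_index(segments):
        index = defaultdict(list)          # word -> list of timestamps (seconds)
        for timestamp, text in segments:
            for word in text.lower().split():
                index[word].append(timestamp)
        return index

    def query(index, words):
        """Return timestamps at which all query words were heard."""
        hits = [set(index.get(w.lower(), [])) for w in words]
        return sorted(set.intersection(*hits)) if hits else []

    idx = build_index(transcript)
    print(query(idx, ["storm", "simulation"]))   # -> [47.0]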


SAMCat - Secure Active Metadata Catalog
Tilt Thompkins, Division Director, Integrated Decision Technologies

The goal of the SAMCat (Secure Active Metadata CATalog) project is to create a versatile distributed objectspace using tuplespace systems—and then to exploit the power of this software as the backbone for comprehensive scalable file and services management. This next-generation system uses XML to generate objects accessible not only by location, but by associative matching on the attributes of the objects in the space—giving the user the capability to access files and services based on their characteristics, rather than on an arbitrary location name. In this way, we use tuplespaces to create a secure, location-transparent system that cleanly and elegantly manages data, documents, and services across large networked systems.

A central use of the distributed tuplespace paradigm is to create a secure, distributed, scalable automatic cataloguing system for large groups of static or active objects. At the foundation of SAMCat is a distributed tuplespace system, through which objects (stored in XML format) can be accessed using associative matching on all or any of their tag fields, rather than through any preset catalog structures. In this way, access to the objects (which can be of any type—text or binary files) is query-centric, rather than predetermined by a fixed indexing system—the user can access documents and files from across a networked system according to his or her own needs, either by matching on preset (or automatically generated) metadata for a document, or by treating the entire document as metadata, and using full-text matching. Furthermore, the location of the documents is handled by the distributed SAMCat, and is entirely transparent to the user—documents can be retrieved from a local machine, or from a machine at the other end of the network, with no difference in access.

SAMCat goes beyond simple file management, however; the tuplespace backbone can be used to automatically manage distributed processes, as well as static objects like documents and data files. Because of the anonymous, asynchronous nature of tuplespace communication, we can harness the space to work as an automatic job distributor, in which master users place jobs in the space, and client workers perform the jobs, writing results in XML format back to the space for the master to gather and process. We will demonstrate an optimization suite which automatically distributes parallel processes to a group of worker machines (which can be added and removed on-the-fly!), and returns the results to a calling program—which need know nothing of the number or location of the workers carrying out its request. Furthermore, we can use the tuplespace to allow clients to transparently access services across a network without needing to know location or name of these service machines—through associative matching, we can access software and components by comparing metadata, rather than absolute unique names. We will also demonstrate a web services system that operates entirely through the cataloguing and job distribution mechanisms of SAMCat, and show other examples illustrating the versatility of this software, including an image processing tool and a dynamic web page generator.
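
The associative-matching idea at the heart of a tuplespace can be sketched in a few lines of Python. The sketch below is an in-memory toy, not SAMCat itself, which is distributed, secure, and stores its objects in XML; the attribute names are hypothetical.

    # Minimal in-memory tuplespace sketch with associative matching
    # (illustrative only; SAMCat is distributed, secure, and XML-based).
    import threading

    class TupleSpace:
        def __init__(self):
            self._tuples, self._cond = [], threading.Condition()

        def out(self, **attrs):
            """Write an object, described by its attributes, into the space."""
            with self._cond:
                self._tuples.append(attrs)
                self._cond.notify_all()

        def rd(self, **template):
            """Read (without removing) the first object whose attributes match the
            template; blocks until one appears."""
            with self._cond:
                while True:
                    for t in self._tuples:
                        if all(t.get(k) == v for k, v in template.items()):
                            return t
                    self._cond.wait()

    space = TupleSpace()
    space.out(kind="document", author="NCSA", format="pdf", location="/data/report.pdf")
    # Retrieval is by attribute match, not by location or file name:
    print(space.rd(kind="document", author="NCSA")["location"])

The same space could carry job descriptions and results, which is the essence of the master/worker distribution pattern described above.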


Service Oriented Architectures
Tilt Thompkins, Division Director, Integrated Decision Technologies

Service Oriented Architectures, e.g., web services, are emerging as serious competitors to distributed object systems, e.g., CORBA, for enterprise-level application integration. This talk will review service-oriented architectures; common web services protocols such as XML, SOAP, and WSDL; and commercial offerings from Sun and Microsoft. The talk will also illustrate how NCSA's Secure Active Metadata CATalog (SAMCat) project can be used to create and manage application integration. SAMCat is a versatile distributed tuplespace system, and we exploit the power of this software as the backbone for comprehensive scalable file and services management. This next-generation system uses XML to generate objects accessible not only by location, but by associative matching on the attributes of the objects in the space—giving the user the capability to access files and services based on their characteristics, rather than on an arbitrary location name. In this way, we use tuplespaces to create a secure, location-transparent system that cleanly and elegantly manages data, documents, and services across large networked systems while not requiring the tight integration of other approaches.
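
To make the protocol layer concrete, the short Python sketch below builds a minimal SOAP 1.1 request envelope. The operation name, parameters, and service namespace are hypothetical; in practice a WSDL document would describe the operation and its types.

    # Minimal sketch of building a SOAP 1.1 request envelope
    # (illustrative only; the operation and parameters are hypothetical).
    import xml.etree.ElementTree as ET

    SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

    def make_request(operation, params, service_ns="urn:example:catalog"):
        envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
        body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
        call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
        for name, value in params.items():
            ET.SubElement(call, name).text = str(value)
        return ET.tostring(envelope, encoding="unicode")

    # A hypothetical lookup call against a catalog-style service.
    print(make_request("FindDocuments", {"author": "NCSA", "format": "pdf"}))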


Smart Environments
Steve Pietrowicz, Research Programmer, Scientific Applications

This presentation will describe the components of the NCSA smart room. We will discuss the design philosophy of the project, present the software and hardware we are currently using, and describe the components we will build in the future.


Smart Room
Steve Pietrowicz, Senior Research Programmer, Experimental Technologies & Volodymyr Kindratenko, Senior Research Scientist, Experimental Technologies

This demonstration will show how systems that monitor their surroundings and external events can enhance your own personal environment. Interaction with the environment using speech, hand-held devices and smart tags will be shown.


Smart Tags
Volodymyr Kindratenko, Research Scientist, Scientific Applications

The main goal of the Smart Tags project is to explore the capabilities of Radio Frequency Identification (RFID) technology for identifying and tracking people and objects in the context of a Smart Space Environment. The tracked location data can be stored and used to provide advanced services via data mining and visualization and to trigger various events in the Smart Space Environment.


Supply and Demand Chain Optimization—Models and Algorithms
Udatta Palekar, Professor of Mechanical and Industrial Engineering, University of Illinois at Urbana-Champaign

In this presentation, we discuss models and algorithms we have developed for Supply and Demand Chain Optimization problems. After a brief description of the mathematical techniques used in our research, we will describe several applications of these ideas to design and operational problems in supply chain and distribution management. We discuss supply chain network design problems for multi-location, multi-product manufacturing; optimal design of multi-echelon distribution systems; and operations planning problems in supply chain networks. On the demand management side, we will discuss our work on optimal assortment planning and trade promotion planning.


Tiled Display Wall
Paul Rajlich, Visualization Programmer, Scientific Applications

The NCSA Visualization and Virtual Environments group will be demonstrating its scalable tiled display wall, one of the highest-resolution displays in existence. The current configuration is a 40-projector system driven by a cluster of 40 PCs, resulting in one logical display with a resolution of 8,192 x 3,840 pixels. This work represents NCSA's contribution to the scalable tiled display wall community and is being documented as part of the Alliance Wall-in-a-box initiative. The applications that will be shown include high-resolution movie playback, large image exploration, and interactive 3D visualization.
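
For reference, the stated resolution is consistent with, for example, an 8 x 5 array of 1024 x 768 projectors; the per-projector resolution and layout are assumptions for illustration, not figures stated above:

$$ 8 \times 1024 = 8192, \qquad 5 \times 768 = 3840, \qquad 8 \times 5 = 40\ \text{projectors}. $$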


VisBench
Randy Heiland, Research Programmer, Scientific Applications

VisBench is a component-based data visualization and analysis system. A primary goal is to enable a person to easily visualize remote data. VisBench uses CORBA (http://www.corba.org/) for the component-to-component communication. We will demonstrate a VisBench client running on a laptop that communicates with a VisBench server running on a Linux cluster. Data will be (transparently) moved from a mass storage device (Unitree at NCSA) to the cluster, where the visualization will be performed and results sent to/displayed on the laptop. More information can be found at http://visbench.ncsa.uiuc.edu/.


 

 NCSA/Alliance
605 East Springfield Avenue
Champaign, Illinois 61821
217-244-0072