Scientists from the National Center for Data Mining (NCDM) at the
University of Illinois at Chicago and the Geophysical Center at the
Space Research Institute, Russian Academy of Sciences, Moscow,
demonstrated a new method for distributing extremely large volumes of
scientific information across the world. They successfully moved 1.4
terabytes (TB) of data in about 4.5 hours over a 1 Gbps lightpath
between Chicago and Moscow as part of the Teraflow Network
initiative. This event, which represents the highest-performance
information transfer ever recorded between these two countries, was
made possible by a unique international organizational partnership.
Although the amount of science information is growing rapidly, the
ability to move it on the regular Internet is still very limited.
NCDM partnered with Russia's Geophysical Center to demonstrate a new
capability for moving science data by transferring the Sloan Digital
Sky Survey (SDSS) dataset between their two sites, using a
specialized international communications facility.
This new capability uses two integrated innovations. One requires
placing information directly on lightwaves while avoiding the slower
services that are used by the traditional Internet. The other uses
specialized communications technologies (new network protocols) for
high performance streaming to avoid the limitations of standard
Internet communications.
Using NCDM's open-source, high-performance network transport protocol
UDT (UDP-based Data Transfer) on the Teraflow Network, researchers
were able to quickly transfer the SDSS astronomy catalog data
between Chicago and Moscow. The 2.5 TB catalog is compressed to 1.4
TB, split into 60 files, and distributed to astronomers around the
world from the NCDM in Chicago. Using UDT, the 1.4 TB was
transferred over a 1 Gbps lightpath and then decompressed in Moscow
to its original size. It now resides on a local server www.skyserver.ru in Moscow.
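As a rough illustration of the dataset figures above, the following sketch works out the compression ratio and the average file size. It assumes decimal units (1 TB = 1000 GB) and files of equal size; neither assumption is stated in the release, and the function names are illustrative only.

```python
# Figures quoted in the release.
UNCOMPRESSED_TB = 2.5   # original SDSS catalog size
COMPRESSED_TB = 1.4     # size after compression
FILE_COUNT = 60         # number of files the catalog is split into


def compression_ratio(compressed_tb, uncompressed_tb):
    """Fraction of the original size that remains after compression."""
    return compressed_tb / uncompressed_tb


def average_file_size_gb(total_tb, file_count):
    """Average file size in GB, assuming 1 TB = 1000 GB (an assumption)."""
    return total_tb * 1000 / file_count


ratio = compression_ratio(COMPRESSED_TB, UNCOMPRESSED_TB)
avg_gb = average_file_size_gb(COMPRESSED_TB, FILE_COUNT)

print(f"compressed size is {ratio:.0%} of the original")
print(f"average file size is about {avg_gb:.1f} GB")
```

Under these assumptions the compressed catalog is a little over half the original size, and each file averages somewhat more than 23 GB.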
This data transfer had a sustained rate of 711 Mbps and a peak rate
of 844 Mbps, and took about 4.5 hours to complete. This is about
the speed that the data could be moved across the city of Chicago
over a 1 Gbps network, which graphically illustrates how barriers of
distance are being eliminated by the new communications
infrastructures and technologies. These techniques are required for
research and experimentation in many science disciplines, and in the
future they may also be used for many types of data-intensive
commercial applications.
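The reported duration can be checked against the reported sustained rate with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and ignores protocol overhead; both are assumptions rather than details from the release.

```python
# Figures quoted in the release.
DATA_TB = 1.4          # compressed dataset size
SUSTAINED_MBPS = 711   # reported sustained rate
PEAK_MBPS = 844        # reported peak rate
LINK_MBPS = 1000       # nominal capacity of the 1 Gbps lightpath


def transfer_hours(data_tb, rate_mbps):
    """Hours needed to move data_tb terabytes at rate_mbps megabits per second."""
    bits = data_tb * 1e12 * 8            # assumes 1 TB = 10**12 bytes
    seconds = bits / (rate_mbps * 1e6)   # assumes 1 Mbps = 10**6 bits/s
    return seconds / 3600


hours_sustained = transfer_hours(DATA_TB, SUSTAINED_MBPS)
hours_line_rate = transfer_hours(DATA_TB, LINK_MBPS)
sustained_utilization = SUSTAINED_MBPS / LINK_MBPS
peak_utilization = PEAK_MBPS / LINK_MBPS

print(f"time at sustained rate: {hours_sustained:.2f} h")
print(f"time at full line rate: {hours_line_rate:.2f} h")
print(f"sustained utilization: {sustained_utilization:.1%}")
print(f"peak utilization: {peak_utilization:.1%}")
```

With these assumptions the sustained rate gives a little under 4.4 hours, which is broadly in line with the "about 4.5 hours" quoted. The sustained rate corresponds to roughly 71% of the nominal link capacity.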
This accomplishment was made possible through a unique partnership
among organizations in eleven countries that have created
international advanced communication facilities at locations
literally around the world. GLORIAD, the Global Ring Network for
Advanced Applications Development, is a consortium of several
countries, notably the USA, Russia, China, Korea, Canada, the
Netherlands, and the Nordic countries (Denmark, Sweden, Norway,
Finland and Iceland), that are contributing networking capabilities
to build a global 10 Gbps optical network around the northern
hemisphere of the globe in support of advanced science and
engineering. In the USA, GLORIAD is supported by the National Science
Foundation's International Research Network Connections (IRNC)
program, which also funds a 10 Gbps path between Chicago and
Amsterdam called TransLight/StarLight. GLORIAD has been provided with
a 3 Gbps path on TransLight/StarLight to allow a direct high-
performance connection between the USA and Europe.
GLORIAD's Russian partners recently installed a 10 Gbps path from
Amsterdam to Moscow, provided by the Russian Research Center
"Kurchatov Institute". This allowed a 1 Gbps lightpath to be
dedicated to the Teraflow Network, from Chicago (the StarLight
facility), to Amsterdam (the NetherLight facility) and then on to
Moscow (the MoscowLight facility). GLORIAD participants are part of a
global initiative called the Global Lambda Integrated Facility
(GLIF), which promotes the paradigm of lightpaths, or lambda
networks, for data-intensive scientific research and applications.
This science demonstration was also supported by NCDM's Teraflow
Network, an international facility designed to develop innovative
technologies to stream massive distributed datasets over high-
performance networks at 1 Gbps, 10 Gbps and multiple 10 Gbps. The
Teraflow Network is being used as a next-generation platform, capable
of supporting data-intensive applications, including many requiring
information transfers that cannot be supported by traditional
networks. The Teraflow Network is developing techniques that will be
required by future global applications.
"This is the latest in a string of demonstrations that proves that it
is now practical for the working scientist to efficiently access
terabyte size datasets from anywhere in the world. All it takes are
today's high-performance networks and new network protocols, such as
UDT," said Robert Grossman, NCDM director at the University of
Illinois at Chicago. "With the technology now available, there is no
reason for scientists not to have access to the latest data available
in order to advance their research."
"We look forward to using these new technologies to share and mine
very large databases in global change, space weather and remote
sensing studies," said Mikhail Zhizhin, head of the Telematics Lab at
the Geophysical Center in Moscow, "and to applying the technologies
from the Teraflow Network to the larger GLORIAD infrastructure. In
particular, the Research Group is working with the USA National
Geophysical Data Center (NOAA) on the Space Physics Interactive Data
Resource (SPIDR), and is working with Microsoft Research Cambridge on
the Environmental Scenario Search Engine (ESSE). Additionally, there
is strong demand to transmit real-time data streams and high-
resolution images, which has not previously been possible."
"This is a significant achievement between USA and Russian
scientists," stated Alexey Soldatov, co-director of GLORIAD/Russia
and director of the Institute of Information Systems, Russian
Research Center "Kurchatov Institute" (RRC "KI"). "GLORIAD/Russia,
based at RRC "KI", provides support and development of the networking
infrastructure for scientists and educators. In addition, our Research
Center is one of the leaders of the nationwide Russian Data Intensive
Grid program, which will use GLORIAD's advanced networking
infrastructure to support data-intensive projects and frontier
experiments in high-energy physics, nanotechnology, gravitational
wave research, digital astronomy and molecular genomics."
"The ability to move multi-terabyte datasets internationally in a
matter of hours, and ultimately minutes, has been based on the
cooperation and efforts among many international teams and it builds
a solid foundation for future international science projects," said
Natalia Bulashova, GLORIAD/USA co-principal investigator.
"Lessons learned on the Teraflow Network can be expanded to the
entire GLORIAD community, and ultimately other GLIF international
partners," said Greg Cole, GLORIAD/USA principal investigator. "No
matter how fast we increase capacity and services on the GLORIAD
network, the various science groups out there are moving faster. It's
a real challenge for us, but it's a good challenge."