08.13.09

Scientific Internet Searches Benefit from New Technology

Leave your hard hat and pick behind: it is now easier to mine information in the deep web.

The Internet is an amazing tool that contains a vast amount of data that is theoretically available to everyone. I say theoretically because the Internet is really divided into two unequal pools of information – the surface web and the deep web.

According to Walt Warnick, Director of the DOE Office of Scientific and Technical Information (OSTI), "The deep web is huge" – by some estimates, the deep web is more than 500 times the size of the surface web.

Popular search engines, like Google and Yahoo, crawl across Internet pages on the surface, but are unable to dig into the databases to retrieve information from the deep web.

In effect, the deep web, which accounts for 99 percent of all scientific research and development results, is cut off from common Internet searches.

Using a series of Small Business Innovation Research (SBIR) grants, Abe Lederman's company, Deep Web Technologies, developed the technology to mine data in the deep web using federated search.

Unlike common search engines that crawl along the surface of the Internet, a federated search moves across the surface web but then digs down into select databases, searching for information that matches the query. Deep Web Technologies moves beyond other federated search engines by adding the benefit of relevance ranking.
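As a rough illustration of the idea, and not a description of Deep Web Technologies' actual software, the following Python sketch sends one query to several sources at the same time and merges the hits into a single relevance-ranked list. The source names, the sample titles, and the scoring rule (counting how often the query term appears in a title) are all assumptions made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "databases"; a real system would query remote sources.
SOURCES = {
    "energy_db": ["solar cell efficiency study", "wind turbine blade design"],
    "medical_db": ["solar radiation and skin health", "protein folding survey"],
    "physics_db": ["solar neutrino measurements", "solar cell materials for solar arrays"],
}


def search_source(name, query):
    """Search one source; score each title by how often the query term appears."""
    term = query.lower()
    hits = []
    for title in SOURCES[name]:
        score = title.lower().split().count(term)
        if score > 0:
            hits.append((score, name, title))
    return hits


def federated_search(query):
    """Query every source in parallel, then merge into one ranked list."""
    names = sorted(SOURCES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_source = list(pool.map(lambda n: search_source(n, query), names))
    merged = [hit for hits in per_source for hit in hits]
    # Highest score first; ties broken by source name, then title.
    return sorted(merged, key=lambda h: (-h[0], h[1], h[2]))


if __name__ == "__main__":
    for score, source, title in federated_search("solar"):
        print(score, source, title)
```

A simple crawler would never see these records because they sit behind each database's own search interface; the federated approach asks each database directly and leaves only the merging and ranking to the central engine.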

"Today the technology allows for hundreds of simultaneous searches," said Lederman. "Ultimately, these innovations have been used to develop and deploy applications for the Department of Energy and other federal agencies, as well as private companies."

The technology is now available in over 20 portals, most notably the search engines WorldWideScience.org, Science.gov, ScienceResearch.com, Scitopia.org, Mednar, and Biznar.

One of Lederman's biggest accomplishments is the use of his company's technology in the search engine WorldWideScience.org, which provides a one-stop search engine for global scientific databases. Since the debut of the search engine prototype in 2007, WorldWideScience has expanded from 10 to 56 participating countries and it scours more than 375 million pages of scientific information contained in deep web databases.


Credit: Darcy Pedersen

Deep Web Technologies uses a "divide-and-conquer" approach to searching multiple databases simultaneously.

"Deep Web Technologies is a great SBIR success story," said Lederman. "We develop powerful search solutions that can then be used in products such as WorldWideScience.org. As the benefit of these technologies has been realized, we've grown. We started with 2-1/3 employees, the 1/3 being my brother, and grew to 23 employees."

When asked how he came up with this idea, Lederman smiled and said it all began with a conversation with his brother (remember the 1/3 employee). They had the brainstorm to develop an online comparison shopping site for books. "Then Amazon came along and we dropped the idea," said Lederman. When approached by Dave Henderson of OSTI to build a site for environmental science researchers, "I brushed off the book idea and decided to apply it to mining databases."

Warnick called the development of this technology "a series of sequential miracles" that makes the deep web, where billions of dollars' worth of government-sponsored scientific research results reside, accessible to a wider audience.

An MIT graduate, Lederman has had a fondness for computers and computer programming from an early age. "I first realized I wanted to work with computers when I was in high school. I wrote a Monopoly program in BASIC and submitted it for competitions. I won a few of them," Lederman said with a smile.

Lederman is not resting on his laurels. "The future of scientific research depends on sifting through more information, more quickly, and more effectively. Searching through 1,000 databases simultaneously for critical information is not unreasonable," said Lederman.

The future looks good for Deep Web Technologies: it is one in which searching thousands of collections in parallel is not only achievable but quickly becoming a strategic necessity.

The greatest obstacle to achieving this goal is making sense of the vast amount of data available on the Internet today. Lederman explains this obstacle in terms of scalability – the ability to quickly and easily sift through this information and present it to the user in a relevant and appropriate manner.

Deep Web Technologies is tackling scalability with a divide-and-conquer approach. "Where a 1,000-source search engine may struggle," said Lederman, "by combining 10 to 20 federated search engines that each search 50 to 100 sources will allow each search engine to perform a manageable amount of work."
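The divide-and-conquer arrangement Lederman describes can be sketched in Python as a two-level search. This is only a toy model under stated assumptions: the function names, the integer source IDs, and the made-up scoring formula are hypothetical and are not taken from the company's product. Each mid-level engine searches its own group of sources and passes up only its best few hits, so the top-level engine merges a small number of short lists instead of contacting 1,000 sources itself.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def partition(sources, group_size):
    """Split the source list into consecutive groups of at most group_size."""
    return [sources[i:i + group_size] for i in range(0, len(sources), group_size)]


def search_source(source, query):
    """Hypothetical source lookup returning (score, source_id) pairs.

    The score is a deterministic toy value derived from the source id;
    a real source would compute relevance from its own records.
    """
    return [((source * 37) % 101, source)]


def group_engine(group, query, top_k):
    """One mid-level engine: search its sources and keep its best top_k hits."""
    hits = [hit for source in group for hit in search_source(source, query)]
    return heapq.nlargest(top_k, hits)


def two_level_search(sources, query, group_size=100, top_k=5):
    """Top-level engine: fan out to the group engines, then merge their results."""
    groups = partition(sources, group_size)
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        partials = list(pool.map(lambda g: group_engine(g, query, top_k), groups))
    return heapq.nlargest(top_k, [hit for part in partials for hit in part])


if __name__ == "__main__":
    all_sources = list(range(1000))          # 1,000 hypothetical sources
    print(len(partition(all_sources, 100)))  # number of mid-level engines
    print(two_level_search(all_sources, "climate"))
```

Because every group reports its own best hits, the overall top results are the same as a single engine searching all sources would produce, while each engine handles a manageable share of the work.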


Credit: Darcy Pedersen

A screenshot of results from the ScienceResearch.Gov search engine.

Deep web search engines allow scientists, researchers, educators, and engineers to easily share and transfer knowledge, which can lead to cross-pollination of new ideas from different fields of study. This sharing of knowledge may lead to breakthroughs and innovation in new and unique ways.

This work is supported by the Department of Energy (DOE) Office of Science, which invests in science and solving critical issues impacting people's daily lives and the nation's future. For more information, visit http://www.science.doe.gov/.

The DOE Small Business Innovation Research (SBIR) program supports scientific excellence and technological innovation through the investment of federal research funds in areas of critical importance to building a strong national economy, one small business at a time.

This article was written by Stacy W. Kish.

Last modified: 3/15/2013 5:24:38 PM