Sunday, November 16, 2008

Making Search Social

Looking for an apartment online, day after day, can get tedious. Finding the right sofa at the right price can also be time consuming. A new search engine, called Yotify, is designed to make these kinds of persistent quests more tolerable, and hopefully more successful.

Much like Google Alerts and Yahoo Alerts, a Yotify search does not start and end in an instant. Instead, the search runs at regular intervals--either hourly or daily, depending on the user's preference--with results sent back to the user via e-mail.

But Yotify offers much more than the search giants' current alert tools, argues Ron Bouganim, CEO and cofounder of Branchnext, the San Francisco startup behind Yotify. Those alert tools, Bouganim says, are merely an afterthought for these huge companies, and they do not take into account important Web 2.0 developments, such as social networking.

"We want to create a richer experience," Bouganim says.

When users sign up for an account, they are given a personal profile page that lists, stores, and displays what they've searched for and where. That information can be made public as well, so that friends can share the results and help refine the search. This could be particularly useful for group projects, such as apartment hunting with roommates.

Meanwhile, Yotify is making it a point to closely integrate with the major social-networking sites, most notably Facebook and LinkedIn. "If people want to search through Facebook using our technology, we want to let them do it," claims Bouganim.

Another distinguishing characteristic of Yotify versus Google Alerts or Yahoo Alerts is its focus on shopping. Whereas Google Alerts is primarily concerned with retrieving news and other hard information, Yotify is setting up as more of a sales tool for its partner sites, which include general retailers such as Shopping.com as well as a host of niche players.

In this respect, Yotify does go above and beyond what Google Alerts currently provides. Say a user wants to buy a black futon. What matters is not that the user obtain the futon immediately, but that it become available at a certain price. Yotify will continually monitor its partner sites, then notify the user when a black futon turns up at that price.
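To make the mechanics concrete, here is a minimal sketch of how such a persistent, price-triggered search might work, written as a simple polling loop that runs at the user's chosen interval. Yotify's actual partner integration is not public, so the feed URL, field names, and e-mail addresses below are hypothetical.

```python
# A minimal sketch of a persistent price-watch alert. The partner feed URL,
# JSON field names, and addresses are hypothetical; this only illustrates the
# "scheduled search plus e-mail notification" idea described above.
import json
import smtplib
import time
import urllib.request
from email.message import EmailMessage

FEED_URL = "https://partner.example.com/listings.json"  # hypothetical partner feed
QUERY, MAX_PRICE = "black futon", 150.00                # the saved search
CHECK_INTERVAL = 60 * 60                                # hourly, per the user's preference

def check_once(already_seen):
    """Fetch the partner feed and return new listings that match the saved search."""
    with urllib.request.urlopen(FEED_URL) as resp:
        listings = json.load(resp)
    matches = [
        item for item in listings
        if QUERY in item["title"].lower()
        and item["price"] <= MAX_PRICE
        and item["id"] not in already_seen
    ]
    already_seen.update(item["id"] for item in matches)
    return matches

def notify(matches):
    """E-mail the user a digest of new matches (assumes a local SMTP server)."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(matches)} new results for '{QUERY}'"
    msg["From"], msg["To"] = "alerts@example.com", "user@example.com"
    msg.set_content("\n".join(f"{m['title']} at ${m['price']}: {m['url']}" for m in matches))
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    seen = set()
    while True:                 # the saved search keeps running until cancelled
        hits = check_once(seen)
        if hits:
            notify(hits)
        time.sleep(CHECK_INTERVAL)
```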

The main problem with Yotify is that, as of now, it only scans a small portion of the Web: users can only search among Yotify's partner sites. While the search engine has partnered with many key websites, such as Craigslist, the New York Times, and eBay, it certainly does not have the breadth of a search giant such as Google or Yahoo.


The technology involved is quite different from the large-scale indexing done by a typical search engine. Yotify asks partner sites to integrate its software into their systems. "We don't 'scrape' information from other sites," explains Bouganim. "We help other sites distribute their information in a way that fully complies with the goals of the partner site."

From the user perspective, however, all that matters is the effectiveness of the search. And a user who has no idea where to find what he is looking for won't want to follow the Yotify format and select specific blogs or news sites for the search.

Bouganim claims that this "deep but narrow" issue will be resolved in future releases--and sooner rather than later. Indeed, it's still early days for the search engine; a test version of the tool was just launched on September 24.

Online media analyst Mike Boland of The Kelsey Group notes that getting users to switch from Google to a different engine could prove difficult, no matter what innovations Yotify attempts. "It is such an uphill battle to get users to break out of deep-rooted online habits," Boland says. "Companies that have spent too much time drinking the Kool-Aid seem to forget that, because they think their solution is so great that it will overcome this issue. But it usually doesn't."

Although unwilling to get into details about the business model, Bouganim is clearly planning to exploit the social-networking and e-commerce aspects of Yotify. "Understanding people's wants and needs, as well as those of their friends, obviously has a tremendous amount of value."

Unused Internet

In a little more than two years, the last Internet addresses will be assigned by the international group tasked with managing the 4.3 billion numbers. And yet, while most Internet engineers are looking to Internet Protocol version 6 (IPv6), the next-generation Internet addressing scheme, a research team has probed the entire Internet and found that the problem may not be as bad as many fear. The probe reveals millions of Internet addresses that have been allocated but remain unused.

In a paper to be presented later this month at the ACM Internet Measurement Conference, a team of six researchers has documented what it claims is the first complete census of the Internet in more than two decades. The researchers discovered a surprising number of unused addresses and conclude that plenty will still be lying idle when the last numbers are handed out in a few years' time. The problem, they say, is that some companies and institutions are using just a small fraction of the millions of addresses they have been allocated.

"People are very concerned that the IPv4 address space is very close to being exhausted," says John Heidemann, a research associate professor in the department of computer science at the University of Southern California (USC) and the paper's lead author. "Our data suggests that maybe there are better things we should be doing in managing the IPv4 address space."

The census, carried out every quarter since 2003 but only recently published, is the first comprehensive map of the Internet since David Smallberg, then a computer-science student at the University of California, Los Angeles, canvassed the Internet's first servers--all 300-plus of them--following the switchover from the ARPANET in early 1983.

Internet Protocol version 4 (IPv4) addresses are typically managed as network blocks consisting of 256 addresses (known as a C block), 65,536 addresses (known as a B block), or approximately 16.8 million addresses (known as an A block). About a quarter of the A block addresses--the largest segments of the Internet--were given out in the first days of the Internet to early participants and to companies and organizations including Apple, IBM, and Xerox.
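Those block sizes follow directly from the structure of a 32-bit IPv4 address: an A, B, or C block fixes the first 8, 16, or 24 bits and leaves the rest to number individual hosts. A quick back-of-the-envelope check:

```python
# Block sizes for 32-bit IPv4 addresses: an A, B, or C block fixes the first
# 8, 16, or 24 bits, leaving the remaining bits to number individual hosts.
for name, prefix_bits in [("A block", 8), ("B block", 16), ("C block", 24)]:
    host_bits = 32 - prefix_bits
    print(f"{name} (/{prefix_bits}): {2 ** host_bits:,} addresses")

# A block (/8):  16,777,216 addresses  (the "approximately 16.8 million" above)
# B block (/16): 65,536 addresses
# C block (/24): 256 addresses
```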

Today, A blocks are issued by an organization called the Internet Assigned Numbers Authority (IANA) to large Internet service providers or to regional registrars to which the A blocks are resold. But because accelerating use of the Internet is quickly eating up the remaining free blocks of network addresses, the last blocks will likely be given out between the end of 2010 and 2011.

The next-generation Internet address scheme, IPv6, solves the shortage by vastly increasing the number of addresses available. While IPv4 offers about 4.3 billion addresses for the earth's 6.7 billion people, IPv6 will offer roughly 51 thousand trillion trillion addresses per person. However, the move to IPv6 has progressed slowly because of cost and complexity, even with recent mandates for use of IPv6 within the U.S. government.
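That per-person figure is simple arithmetic on the two address-space sizes, as the quick check below shows.

```python
# Quick check of the addresses-per-person comparison cited above.
ipv4_addresses = 2 ** 32       # about 4.3 billion in total
ipv6_addresses = 2 ** 128      # about 3.4 x 10^38 in total
world_population = 6.7e9       # the 2008 estimate used in the article

print(f"IPv4 total: {ipv4_addresses:,}")                             # 4,294,967,296
print(f"IPv6 per person: {ipv6_addresses / world_population:.2e}")   # ~5.1e+28, i.e.
# roughly 51 thousand trillion trillion addresses for every person on earth
```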

The new map of the Internet suggests that there is room for more hosts even if addresses are running out. The map reveals that, while roughly a quarter of all blocks of network addresses are heavily populated and therefore efficiently used, about half of the Internet is either used lightly or is located behind firewalls blocking responses to the survey. The last quarter of network blocks consists of addresses that can still be assigned in the future.

The USC research group used the most innocuous type of network packet to probe the farthest reaches of the Internet. Known as the Internet Control Message Protocol, or ICMP, this packet is typically used to send error messages between servers and other network hardware. Sending an ICMP packet to another host (an action known as pinging) is generally not seen as hostile, Heidemann says. "There are certainly people who misunderstand what we are doing," and interpret it as the prelude to an attack, he says. "By request, we remove them from the survey, but it's fewer people than you might think. Pings are pretty innocuous."

The researchers found that ICMP pings stack up well against another method of host detection, one built on the Internet's main means of transmitting data: the Transmission Control Protocol, or TCP. TCP probing is a common technique used by network scanners, but it tends to take longer and is considered more aggressive than ICMP pinging, so it is more likely to be blocked. To compare the effectiveness of each technique, the team probed a million random Internet addresses using both ICMP and TCP, finding a total of 54,297 active hosts. ICMP pings elicited a response from approximately three-quarters of visible hosts, while TCP probes garnered a response slightly less than two-thirds of the time.
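The two styles of probe are easy to illustrate on a small scale. The sketch below checks a couple of hosts using the system ping command for ICMP (Linux-style flags) and a plain TCP connection attempt; it is only an illustration of the two techniques, not the researchers' actual survey tooling.

```python
# Illustrative host-detection probes: ICMP echo (via the system ping command,
# Linux-style flags) versus a TCP connection attempt. Not the USC survey code.
import socket
import subprocess

def icmp_probe(host, timeout=2):
    """Return True if the host answers an ICMP echo request (a ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def tcp_probe(host, port=80, timeout=2):
    """Return True if the host completes a TCP handshake on the given port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # 192.0.2.1 sits in a reserved documentation range and should not answer;
    # example.com is a real host that typically answers both probes.
    for host in ["192.0.2.1", "example.com"]:
        print(f"{host}: ICMP={icmp_probe(host)} TCP={tcp_probe(host)}")
```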

In total, the researchers estimate that there are 112 million responsive addresses, with between 52 million and 60 million addresses assigned to hosts that are contactable 95 percent of the time.

The survey may miss computers behind firewalls or computers that do not respond to pings, but the overall conclusion--that the Internet has room to grow--is spot on, says Gordon Lyon, a security researcher who created the popular network scanning tool Nmap.

"There are huge chunks of IP space which are not allocated yet, and also giant swaths which are inefficiently allocated," Lyon says. "For example, Xerox, GE, IBM, HP, Apple, and Ford each have more than 16 million IP addresses to themselves because they were allocated when the Internet was just starting."

Google Android

There's a lot of talk at Mobile Internet World 2008, in Boston, about how great applications for mobile devices die all the time because it's so hard to get through all the negotiation that stands in the way of real people using the software. A startup often has to work deals with carriers, device manufacturers, and the company that controls a device's operating system before having any hope that people might one day be able to buy or use any software that the company intends to build.

But while insider woes may not matter to the average person, the goals described by Rich Miner, group manager of mobile platforms for Google and one of the visionaries behind the company's open Android platform, could vastly change how large numbers of people access the Internet--if Android succeeds. Google is supporting Android for a long-term reason, Miner said. The company's products are all Web services, and, having won the hearts and minds of many laptop and desktop users, it hopes to grow in part by convincing more users to access its services through mobile phones. That requires making it possible for them to do so.

Miner described Google's frustrations building a Maps application for mobile phones. Miner said that after having established itself by building on open-source software in most cases--using the Linux operating system, for example--the company was shocked at the closed, serpentine processes typical of building mobile applications. The company wants to change what is now often an expensive, 18-month process into a matter of days and a $25 application fee. The company has successfully pushed industry giants to talk the same talk. Yesterday at the conference, Verizon Wireless director of open development Anthony Lewis spoke about his company's efforts to reduce the application approval process to only four weeks.

If these types of efforts succeed, people will see many more applications available through mobile phones. It will be easier to access Web pages and services familiar from the larger Internet, and devices will stop existing as separate animals. And presumably, Google will continue to rake in money through advertisements as more people access the Internet more often.

The vision that Miner described is in line with other things that I've heard from Google, particularly in relation to App Engine, its quick-start service designed to help Web application developers get going quickly and easily. The idea is that the easier it is for people to build software for the Web, the more reasons people will have to access the Web. The Web will become an ever-larger part of people's lives. In the end, this will be good for Google. In service to this strategy, the company has poured money and effort into shaking up the mobile industry.

The first phone running Android software came out this Tuesday, with many more to follow, so it's time for users to put Google's strategy to the test. I'm hoping that Android and other open efforts succeed. Google's profit motives aside, the mobile industry is clearly choked and stifled by the wrangling and politics associated with getting new software and hardware on the market. Breaking that block will bring better services to people using mobile devices.

Saturday, November 8, 2008

Open Cloud Computing

Cloud-computing platforms such as Amazon's Elastic Compute Cloud (EC2), Microsoft's Azure Services Platform, and Google App Engine have given many businesses flexible access to computing resources, ushering in an era in which, among other things, startups can operate with much lower infrastructure costs. Instead of having to buy or rent hardware, users can pay for only the processing power that they actually use and are free to use more or less as their needs change.

However, relying on cloud computing comes with drawbacks, including privacy, security, and reliability concerns. So there is now growing interest in open-source cloud-computing tools, for which the source code is freely available. These tools could let companies build and customize their own computing clouds to work alongside more powerful commercial solutions.

One open-source software-infrastructure project, called Eucalyptus, imitates the experience of using EC2 but lets users run programs on their own resources and provides a detailed view of what would otherwise be the black box of cloud-computing services.

Another open-source cloud-computing project is the University of Chicago's Globus Nimbus, which is widely recognized as having pioneered the field. And a European cloud-computing initiative coordinated by IBM, called RESERVOIR, features several open-source components, including OpenNebula, a tool for managing the virtual machines within a cloud. Even some companies, such as Enomaly and 10gen, are developing open-source cloud-computing tools.

Rich Wolski, a professor in the computer-science department at the University of California, Santa Barbara, who directs the Eucalyptus project, says that his focus is on developing a platform that is easy to use, maintain, and modify. "We actually started from first principles to build something that looks like a cloud," he says. "As a result, we believe that our thing is more malleable. We can modify it, we can see inside it, we can install it and maintain it in a cloud environment in a more natural way."

Reuven Cohen, founder and chief technologist of Enomaly, explains that an open-source cloud provides useful flexibility for academics and large companies. For example, he says, a company might want to run most of its computing in a commercial cloud such as that provided by Amazon but use the same software to process sensitive data on its own machines, for added security. Alternatively, a user might want to run software on his or her own resources most of the time, but have the option to expand to a commercial service in times of high demand. In both cases, an open-source cloud-computing interface can offer that flexibility, serving as a complement to the commercial service rather than a replacement.
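The flexibility Cohen describes comes down to pointing the same client code at different EC2-compatible endpoints. The sketch below uses boto3, one EC2-compatible client library, to show the pattern; the private endpoint URL and machine-image ID are hypothetical, and this illustrates the hybrid idea rather than Eucalyptus's actual tooling.

```python
# Same client code, two clouds: Amazon EC2 by default, or a private
# EC2-compatible endpoint (hypothetical URL) for sensitive workloads.
import boto3

def launch_worker(endpoint_url=None, image_id="ami-12345678"):
    """Start one instance on EC2 or on a private EC2-compatible cloud.
    The image ID is a placeholder; real credentials are read from the environment."""
    ec2 = boto3.client("ec2", region_name="us-east-1", endpoint_url=endpoint_url)
    response = ec2.run_instances(ImageId=image_id, InstanceType="m1.small",
                                 MinCount=1, MaxCount=1)
    return response["Instances"][0]["InstanceId"]

# Ordinary workloads go to the commercial cloud...
public_instance = launch_worker()

# ...while sensitive data is processed on the company's own machines, using the
# same code pointed at an in-house, EC2-compatible endpoint (hypothetical URL).
private_instance = launch_worker(
    endpoint_url="http://cloud.internal.example:8773/services/Eucalyptus"
)
```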

Indeed, Wolski says that Eucalyptus isn't meant to be an EC2 killer (for one thing, it's not designed to scale to the same size). However, he believes that the project can make a productive contribution by offering a simple way to customize programs for use in the cloud. Wolski says that it's easier to assess a program's performance when it's possible to see how it operates both at the interface and from within a cloud.

Wolski says that Eucalyptus will also imitate Amazon's popular Simple Storage Service, which allows users to access storage space on demand, as well as its Elastic IP addresses, which keep the address of a Web resource the same even if its physical location changes.

Ignacio Llorente, a professor in the distributed systems architecture group at the Universidad Complutense de Madrid, in Spain, who works on OpenNebula, says that Eucalyptus's main advantage is that it uses the popular EC2 interface. However, he adds that "the open-source interface is only one part of the solution. Their back-end [the system's internal management of physical resources and virtual machines] is too basic. A complete cloud solution requires other components." Llorente says that Eucalyptus is just one example of a growing ecosystem of open-source cloud-computing components.

Wolski expects many of Eucalyptus's users to be academics interested in studying cloud-computing infrastructure. Although he doubts that such a platform would be used as a distributed system for ordinary computer users, he doesn't discount the possibility. "You can argue it both ways," he notes. But Wolski says that he thinks some open-source cloud-computing tool will become important in the future. "If it's not Eucalyptus, I suspect [it will be] something else," he says. "There will be an open-source thing that everyone gets excited about and runs in their environment."

Cracking The Physical Internet

For decades, the physical Internet has been in a state of suspended animation. It was designed in the 1960s to transmit files and e-mail, and even the advent of YouTube, Internet phone calls, streaming music, and networked video games has done little to change it. In part, that's because the only network big enough to provide a test bed for new hardware tricks is the Internet itself; in part, it's because the routers and switches that make up the Internet are closed technologies, sold by a handful of companies.

A project led by Nick McKeown of Stanford University, however, has begun to open up some of the most commonly used network hardware, from companies such as HP, Cisco, NEC, and Juniper. Allowing researchers to fiddle with Internet hardware, McKeown says, will make the Internet more secure, more reliable, more energy efficient, and more pervasive.

"In the last 10 years, there's been no transfer of ideas into the [Internet] infrastructure," says McKeown, a professor of electrical engineering and computer science. "What we're trying to do is enable thousands of graduate students to demonstrate ideas at scale. That could lead to a faster rate of innovation, and ultimately these ideas can be incorporated into products."

Under the auspices of a project called OpenFlow, McKeown's team has secured permission from equipment vendors to write a small amount of code that, essentially, grants access to a critical part of a router or switch called a flow table. When a packet--a chunk of data--arrives at a switch, software in the switch looks up instructions in the flow table to decide where to send the packet.

"What OpenFlow does is give you direct access to the flow table, to add and delete instructions," says McKeown. "It's a completely brain-dead idea." But it hasn't been implemented before because the assumption was that vendors wouldn't open up their hardware. "We figured out that there was a minimum amount of access to the flow table that network vendors were okay with allowing that was still extremely useful to us for testing out our ideas," McKeown says.

At a recent demonstration, McKeown and his team showed off their ability to control the traffic in a network via a simple cartoonlike interface on a PC. One test was designed to let people play a first-person-shooter video game on laptops, while moving between wireless access points, without losing any information or experiencing any lags. (First-person-shooter games are commonly used in network tests because they are resource intensive, and if the network fails, it's immediately obvious.) In the demonstration, the researchers instructed a server on Stanford's network to find the most efficient connection to the device at any given moment. "It's a good idea for a game, but today you can't do that because you can't control the routing," McKeown says.

In another demonstration, the researchers showed that OpenFlow can enable direct manual control of network traffic: using a mouse cursor, researchers rerouted data traffic from Stanford to a network in Japan. "The goal is not to show that you are controlling your network from a mouse, but that you now have control," McKeown says. "It's not left up to whatever the box vendor decides . . . This infrastructure that's been held close is being opened and democratized."

OpenFlow is creating an entirely new field of research, with benefits that the average person could enjoy within the next couple of years. "This could take over the Internet," says Rick McGeer, a researcher at HP Labs who's working on projects similar to McKeown's. "This actually looks like an elegant, efficient solution that we can use to take all of these ideas that we've been exploring for the past five years and start implementing them, and start putting them in the network."

There could, however, still be some challenges ahead, McGeer warns. First, he says, vendors would need to continue to support the project as it moves out of the lab and onto the live Internet. Second, companies that provide Internet service would need to see the benefits of opening up their networks. "If I had to guess what would happen first," McGeer says, "Comcast might want to offer multicast trees [a way to distribute the burden of data-intensive Web functions] for efficient YouTube videos, and they'll start to put that in for their services."

McKeown sees the potential to completely open up the airwaves, allowing portable devices to access any wireless network that they can detect. In a city, for instance, a Wi-Fi-enabled cell phone can probably recognize dozens of networks, McKeown says--from Wi-Fi access points to the cell networks maintained by different carriers. But if a user needs more bandwidth for a download, or a stronger signal for a clearer call, or if she moves out of range of a wireless transmitter, switching to another network is difficult, if not impossible. "Our goal is seamless mobility," McKeown says. "We'd love to come up with a way to re-architect cellular wireless networks. But that's further out. We're talking 10 years."

Wednesday, October 29, 2008

Azure on the Horizon

Microsoft announced details of its new offering, Azure, a Web-based infrastructure platform that seems intended to compete with the likes of Google App Engine and Amazon's EC2. Rumors have been flying for some time that Microsoft planned to release a Web-based operating system, and based on recent talks by company luminaries such as Craig Mundie and Ray Ozzie, the giant has been looking to change its strategy to better compete with challenges from Internet companies such as Google. A lot is still up in the air about Azure, however: Microsoft hasn't yet revealed when it will be out or how much it will cost.