Sunday, September 6, 2009
Retweeting Evolution
Twitter has incorporated other user-generated linguistic tools, such as using a hash symbol in front of a word to make it easily searchable (like "#conference09"). Another common technique is typing @ in front of a username to reply directly (but publicly) to the user, which Twitter also formalized after users adopted it. These linguistic tools have even trickled into other social media environments, including YouTube, Flickr, Facebook, and blogs.
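These conventions are simple enough that they can be picked out of a tweet mechanically. The following Python snippet is a rough sketch; its patterns are simplifications of the conventions described above, not Twitter's actual tokenization rules:

    import re

    # Simplified patterns; the real rules for what counts as a valid
    # hashtag or username are stricter than \w+ suggests.
    HASHTAG = re.compile(r"#(\w+)")
    MENTION = re.compile(r"@(\w+)")

    tweet = "Heading to #conference09 -- thanks @alice for the tip!"
    print(HASHTAG.findall(tweet))  # ['conference09']
    print(MENTION.findall(tweet))  # ['alice']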
Currently, there is no set format for retweeting, which loosely consists of reposting someone's tweet and giving due credit. The most common scheme for a retweet involves prefacing the post with the letters "RT," then the @ symbol, and the username of the person being quoted. The retweet rebroadcasts the information to a new set of followers, who see the retweet and have the option of retweeting themselves. In this way, ideas, links, and other information can be distributed--and tracked--fairly quickly.
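Because the convention is purely textual, anyone tracking the spread of an idea can recover the attribution with a simple pattern match. The sketch below assumes the common "RT @username" prefix described above; as the next paragraph notes, real usage is messier:

    import re

    # Matches the common convention: a tweet beginning "RT @username",
    # optionally followed by a colon and the quoted text.
    RETWEET = re.compile(r"^RT\s+@(\w+):?\s*(.*)", re.DOTALL)

    post = "RT @bob: New paper on the behavior of retweeting"
    match = RETWEET.match(post)
    if match:
        source, quoted = match.groups()
        print(source)  # 'bob'
        print(quoted)  # 'New paper on the behavior of retweeting'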
But the retweeting format is much more inconsistent and complex than the targeted reply and hashtag conventions, according to Microsoft Research social media scientist Danah Boyd, who recently posted a paper on the behavior of retweeting. Variations include typing the attribution at the end and using "via," "by," or "retweet" instead of "RT." What's more, people often add their own comments before or after a retweet. This becomes a problem with Twitter's 140-character limit, explains Boyd. Typing "RT @username" takes up characters, and so does adding a comment. To deal with this, users will paraphrase or omit part of the original text, sometimes leading to incorrect quotes.
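The character arithmetic Boyd describes is easy to make concrete. The sketch below is a hypothetical helper, not any real client's logic; it shows how quickly the attribution and a comment eat into the 140 characters, forcing the quoted text to be clipped:

    LIMIT = 140

    def compose_retweet(username, original, comment=""):
        """Prefix a comment and "RT @username" attribution, clipping the
        quoted text if the result would exceed the 140-character limit."""
        prefix = (comment + " " if comment else "") + "RT @" + username + ": "
        budget = LIMIT - len(prefix)
        if len(original) <= budget:
            return prefix + original
        return prefix + original[:budget - 3] + "..."

    # A 130-character original plus a short comment forces truncation.
    original = "x" * 130
    retweet = compose_retweet("bob", original, "Agreed!")
    print(len(retweet))  # 140 -- the quoted text had to be clipped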
Last week, Twitter announced that it will soon implement a button that will let users automatically repost someone else's tweet. While this will make it quicker and easier for users to retweet accurately, the mockup of the new button does not appear to let users edit the retweet to incorporate commentary. Rather, the "retweet" button will add the image and name of the quoted person to the original tweet and post it for those who follow the retweeter.
The new retweet function "is not going to meet the needs of those who retweet. At the same time, I think it's going to bring retweeting to a whole new population," says Boyd. "Adding commentary is a huge element to why people retweet." Instead of just replying privately to a person with an opinion, by retweeting and adding a comment, users can target a larger audience, sharing their opinions and inviting others to do the same, she says.
Boyd found that the percentage of Twitter users who retweet is fairly small, but she expects that number to increase once the retweet button is incorporated. In her research, Boyd found that 11 percent of the retweets examined contained commentary. But she says that number likely underestimates the phenomenon, as she only looked for comments at the beginning of the message.
"Retweeting is primarily used by the geeks and news folks," she says. "What's really starting to hit [Twitter] in large numbers... are those involved with the pop culture." Boyd expects that a retweet button will bring the practice to those millions of users who follow celebrities, such as Twitter fanatics Ashton Kutcher and Oprah Winfrey, for example. "We're going to see information spread from populations who haven't engaged in that way [before]. We'll see an evolution of the behavior," says Boyd. "It will become a way to validate or agree with other users' content."
Users often employ retweets to provide context in conversation, says Susan Herring, a professor of information science and linguistics at Indiana University and editor in chief of the Language@Internet journal. "I can't imagine that [the new Twitter tool] will be very satisfactory to Twitter retweeters," says Herring. "A retweet plus a comment is a conversation. A retweet alone could be an endorsement, but it's a stretch to view an exchange of endorsements as a conversation." Herring does agree that it will increase retweeting and broaden the range of users who retweet.
Retweets are not just of interest to users but also are valuable to companies and researchers who strive to keep track of how ideas spread. Retweeting "is this elegant viral mechanism," says Dan Zarrella, a Web developer who studies viral marketing in social media. "The scale and data you can extract from [retweets] has never been possible with [other] viral or word-of-mouth communications," says Zarrella, who claims to have a database of more than 30 million retweets.
"I think that having a button and supported structure of retweeting is definitely a good idea, but I disagree with the implementation," Zarrella says, and suggests using a format like third-party Twitter tool TweetDeck and others do: pressing a retweet button there will automatically copy and paste the old link with the "RT" syntax, but the tool still allows the retweeter to modify the text.
By taking out the "RT @username," Twitter is making it impossible for users to search for retweets themselves, says Zarrella. "They're limiting how much you can analyze retweets." Zarrella speculates that the retweet button may have been designed so that, down the road, Twitter can charge for features such as extensive retweet tracking.
In addition to showing the original tweeter's image, the new Twitter button will also show the latest 20 retweets of a post. "If they show the breadcrumbs of the trail of everyone who retweeted, that's a good thing," says Steve Garfield, a new media advisor to several large companies and prolific video blogger. "I like to add value to my retweets by adding a comment, to tell people why I like it." If the new function doesn't allow for comments, Garfield says users will just design a new way or revert to the old way.
"People will continue to repurpose Twitter to meet their needs," predicts Herring. "I can't imagine that those who are passionate retweeters will discontinue their practices."
Sunday, November 16, 2008
Making Search Social

Much like Google Alerts and Yahoo Alerts, a Yotify search does not start and end in an instant. Instead, the search runs at regular intervals--either hourly or daily, depending on the user's preference--with results sent back to the user via e-mail.
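Functionally, this is a standing query: the search is saved, re-run on a schedule, and anything new is pushed to the user. A minimal sketch of the pattern, with hypothetical placeholder functions (Yotify has not published its internals), might look like:

    import time

    def run_search(query):
        # Placeholder: query the partner sites, return a list of result URLs.
        return []

    def email_results(user, results):
        # Placeholder: send the user an e-mail digest of new results.
        print("Mailing %d new results to %s" % (len(results), user))

    def standing_query(user, query, interval_hours=24):
        """Re-run a saved search on a schedule, mailing only unseen results."""
        seen = set()
        while True:
            fresh = [r for r in run_search(query) if r not in seen]
            if fresh:
                email_results(user, fresh)
                seen.update(fresh)
            # Hourly or daily, depending on the user's preference.
            time.sleep(interval_hours * 3600)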
But Yotify offers much more than the search giants' current alert tools, argues Ron Bouganim, CEO and cofounder of Branchnext, the San Francisco startup behind Yotify. Those alert tools, Bouganim says, are merely an afterthought for these huge companies, and they do not take into account important Web 2.0 developments, such as social networking.
"We want to create a richer experience," Bouganim says.
When users sign up for an account, they are given a personal profile page that lists, stores, and displays what they've searched for and where. That information can be made public as well, so that friends can share the results and help refine the search. This could be particularly useful for group projects such as apartment hunting with roommates.
Meanwhile, Yotify is making it a point to closely integrate with the major social-networking sites, most notably Facebook and LinkedIn. "If people want to search through Facebook using our technology, we want to let them do it," claims Bouganim.
Another distinguishing characteristic of Yotify versus Google Alerts or Yahoo Alerts is its focus on shopping. Whereas Google Alerts is primarily concerned with retrieving news and other hard information, Yotify is positioning itself as more of a sales tool for its partner sites, which include general retailers such as Shopping.com as well as a host of niche players.
In this respect, Yotify does go above and beyond what Google Alerts currently provides. Say a user wants to buy a black futon, for example. The important aspect of the search is not that the user obtain the futon immediately, but that it be available at a certain price. Yotify will continually monitor its partner sites, then notify the user when a black futon is available at that price.
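In other words, the alert fires on a condition over the listing, not on the mere appearance of a keyword. A toy version of that check, with an assumed data shape, looks like this:

    def matching_offers(listings, keyword, max_price):
        """Return listings that match both the item and the price threshold."""
        return [item for item in listings
                if keyword in item["title"].lower() and item["price"] <= max_price]

    listings = [
        {"title": "Black futon, barely used", "price": 80},
        {"title": "Black futon, brand new", "price": 250},
    ]
    # Notify the user only when a black futon appears under $100.
    print(matching_offers(listings, "black futon", 100))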
The main problem with Yotify is that, as of now, it only scans a small portion of the Web: users can only search among Yotify's partner sites. While the search engine has partnered with many key websites, such as Craigslist, the New York Times, and eBay, it certainly does not have the breadth of a search giant such as Google or Yahoo.
The technology involved is quite different from the large-scale indexing done by a typical search engine. Yotify asks partner sites to integrate its software into their systems. "We don't 'scrape' information from other sites," explains Bouganim. "We help other sites distribute their information in a way that fully complies with the goals of the partner site."
From the user perspective, however, all that matters is the effectiveness of the search. And a user who has no idea where to find what he is looking for won't want to follow the Yotify format and select specific blogs or news sites for the search.
Bouganim claims that this "deep but narrow" issue will be resolved in future releases--and sooner rather than later. Indeed, it's still early days for the search engine; a test version of the tool was just launched on September 24.
Online media analyst Mike Boland of The Kelsey Group notes that getting users to switch from Google to a different engine could prove difficult, no matter what innovations Yotify attempts. "It is such an uphill battle to get users to break out of deep-rooted online habits," Boland says. "Companies that have spent too much time drinking the Kool-Aid seem to forget that, because they think their solution is so great that it will overcome this issue. But it usually doesn't."
Although unwilling to get into details about the business model, Bouganim is clearly planning to exploit the social-networking and e-commerce aspects of Yotify. "Understanding people's wants and needs, as well as those of their friends, obviously has a tremendous amount of value."
Unused Internet
In a paper to be presented later this month at the ACM Internet Measurement Conference, a team of six researchers has documented what they claim is the first complete census of the Internet in more than two decades. They discovered a surprising number of unused addresses and conclude that plenty will still be lying idle when the last numbers are handed out in a few years' time. The problem, they say, is that some companies and institutions are using just a small fraction of the millions of addresses they have been allocated.
"People are very concerned that the IPv4 address space is very close to being exhausted," says John Heidemann, a research associate professor in the department of computer science at the University of Southern California (USC) and the paper's lead author. "Our data suggests that maybe there are better things we should be doing in managing the IPv4 address space."
The census, carried out every quarter since 2003 but only recently published, is the first comprehensive map of the Internet since David Smallberg, then a computer-science student at the University of California, Los Angeles, canvassed the Internet's first servers--all 300-plus of them--following the switchover from the ARPANET in early 1983.
Internet Protocol version 4 (IPv4) addresses are typically managed as network blocks consisting of 256 addresses (known as a C block), 65,536 addresses (known as a B block), or approximately 16.8 million addresses (known as an A block). About a quarter of the A block addresses--the largest segments of the Internet--were given out in the first days of the Internet to early participants and to companies and organizations including Apple, IBM, and Xerox.
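Those block sizes fall directly out of the 32-bit structure of an IPv4 address: a C block fixes the first 24 bits, a B block the first 16, and an A block only the first 8, leaving 2^8, 2^16, and 2^24 possible addresses respectively:

    # Addresses in a block = 2 ** (32 - fixed prefix bits)
    for name, prefix_bits in [("A", 8), ("B", 16), ("C", 24)]:
        count = 2 ** (32 - prefix_bits)
        print("%s block: 2**%d = %s addresses" % (name, 32 - prefix_bits, format(count, ",")))
    # A block: 2**24 = 16,777,216 addresses
    # B block: 2**16 = 65,536 addresses
    # C block: 2**8 = 256 addresses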
Today, A blocks are issued by an organization called the Internet Assigned Numbers Authority (IANA) to large Internet service providers or to regional registrars to which the A blocks are resold. But because accelerating use of the Internet is quickly eating up the remaining free blocks of network addresses, the last blocks will likely be given out between the end of 2010 and 2011.
The new map of the Internet suggests that there is room for more hosts even if addresses are running out. The map reveals that, while roughly a quarter of all blocks of network addresses are heavily populated and therefore efficiently used, about half of the Internet is either used lightly or is located behind firewalls blocking responses to the survey. The last quarter of network blocks consists of addresses that can still be assigned in the future.
The USC research group used the most innocuous type of network packet to probe the farthest reaches of the Internet. Known as the Internet Control Message Protocol, or ICMP, this packet is typically used to send error messages between servers and other network hardware. Sending an ICMP packet to another host (an action known as pinging) is generally not seen as hostile, Heidemann says. "There are certainly people who misunderstand what we are doing," and interpret it as the prelude to an attack, he says. "By request, we remove them from the survey, but it's fewer people than you might think. Pings are pretty innocuous."
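A probe of this kind can be reproduced with any stock ping utility. The sketch below simply shells out to the system's ping rather than crafting raw ICMP packets (which would require elevated privileges); the flags are those of Linux's iputils ping and vary by platform:

    import subprocess

    def icmp_probe(host, timeout_s=2):
        """Send a single ICMP echo request via the system ping utility.
        Returns True if the host answered (ping exits with status 0)."""
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    print(icmp_probe("192.0.2.1"))  # TEST-NET address; expect False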
The researchers found that ICMP pings stack up well against another method of host detection, the Internet's main means of transmitting data: the Transmission Control Protocol, or TCP. TCP-probing is a common technique used by network scanners, but it tends to take longer and is considered more aggressive than ICMP pings, so it may be blocked. To compare the effectiveness of each technique, the team probed a million random Internet addresses using both ICMP and TCP, finding a total of 54,297 active hosts. ICMP pings elicited a response from approximately three-quarters of visible hosts, while TCP probes garnered a response slightly less than two-thirds of the time.
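The TCP alternative amounts to attempting a connection on a likely-open port and seeing whether anything answers. The sketch below probes port 80; that choice, like the rest, is an assumption for illustration, since the paper's exact probe parameters aren't given here:

    import errno
    import socket

    def tcp_probe(host, port=80, timeout_s=2):
        """Attempt a TCP connection. A completed connection or an explicit
        refusal both indicate a live host; a silent timeout suggests an
        unused address or a firewall dropping the probe."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout_s)
        try:
            code = sock.connect_ex((host, port))
            return code == 0 or code == errno.ECONNREFUSED
        finally:
            sock.close()

    print(tcp_probe("192.0.2.1"))  # TEST-NET address; expect False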
In total, the researchers estimate that there are 112 million responsive addresses, with between 52 million and 60 million addresses assigned to hosts that are contactable 95 percent of the time.
The survey may miss computers behind firewalls or computers that do not respond to pings, but the overall conclusion--that the Internet has room to grow--is spot on, says Gordon Lyon, a security researcher who created the popular network scanning tool Nmap.
"There are huge chunks of IP space which are not allocated yet, and also giant swaths which are inefficiently allocated," Lyon says. "For example, Xerox, GE, IBM, HP, Apple, and Ford each have more than 16 million IP addresses to themselves because they were allocated when the Internet was just starting."