Sunday, September 6, 2009
Retweeting Evolution
Twitter has incorporated other user-generated linguistic tools, such as using a hash symbol in front of a word to make it easily searchable (like "#conference09"). Another common technique is typing @ in front of a username to reply directly (but publicly) to the user, which Twitter also formalized after users adopted it. These linguistic tools have even trickled into other social media environments, including YouTube, Flickr, Facebook, and blogs.
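These conventions are simple enough that they can be picked out of a tweet mechanically. The following Python snippet is a rough sketch; its patterns are simplifications of the conventions described above, not Twitter's actual tokenization rules:

    import re

    # Simplified patterns; the real rules for what counts as a valid
    # hashtag or username are stricter than \w+ suggests.
    HASHTAG = re.compile(r"#(\w+)")
    MENTION = re.compile(r"@(\w+)")

    tweet = "Heading to #conference09 -- thanks @alice for the tip!"
    print(HASHTAG.findall(tweet))  # ['conference09']
    print(MENTION.findall(tweet))  # ['alice']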
Currently, there is no set format for retweeting, which loosely consists of reposting someone's tweet and giving due credit. The most common scheme for a retweet involves prefacing the post with the letters "RT," then the @ symbol, and the username of the person being quoted. The retweet rebroadcasts the information to a new set of followers, who see the retweet and have the option of retweeting themselves. In this way, ideas, links, and other information can be distributed--and tracked--fairly quickly.
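Because the convention is purely textual, anyone tracking the spread of an idea can recover the attribution with a simple pattern match. The sketch below assumes the common "RT @username" prefix described above; as the next paragraph notes, real usage is messier:

    import re

    # Matches the common convention: a tweet beginning "RT @username",
    # optionally followed by a colon and the quoted text.
    RETWEET = re.compile(r"^RT\s+@(\w+):?\s*(.*)", re.DOTALL)

    post = "RT @bob: New paper on the behavior of retweeting"
    match = RETWEET.match(post)
    if match:
        source, quoted = match.groups()
        print(source)  # 'bob'
        print(quoted)  # 'New paper on the behavior of retweeting'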
But the retweeting format is much more inconsistent and complex than the targeted reply and hashtag conventions, according to Microsoft Research social media scientist Danah Boyd, who recently posted a paper on the behavior of retweeting. Variations include typing the attribution at the end and using "via," "by," or "retweet" instead of "RT." What's more, people often add their own comments before or after a retweet. This becomes a problem with Twitter's 140-character limit, explains Boyd. Typing "RT @username" takes up characters, and so does adding a comment. To deal with this, users will paraphrase or omit part of the original text, sometimes leading to incorrect quotes.
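The character arithmetic Boyd describes is easy to make concrete. The sketch below is a hypothetical helper, not any real client's logic; it shows how quickly the attribution and a comment eat into the 140 characters, forcing the quoted text to be clipped:

    LIMIT = 140

    def compose_retweet(username, original, comment=""):
        """Prefix a comment and "RT @username" attribution, clipping the
        quoted text if the result would exceed the 140-character limit."""
        prefix = (comment + " " if comment else "") + "RT @" + username + ": "
        budget = LIMIT - len(prefix)
        if len(original) <= budget:
            return prefix + original
        return prefix + original[:budget - 3] + "..."

    # A 130-character original plus a short comment forces truncation.
    original = "x" * 130
    retweet = compose_retweet("bob", original, "Agreed!")
    print(len(retweet))  # 140 -- the quoted text had to be clipped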
Last week, Twitter announced that it will soon implement a button that will let users automatically repost someone else's tweet. While this will make it quicker and easier for users to retweet accurately, the mockup of the new button does not appear to let users edit the retweet to incorporate commentary. Rather, the "retweet" button will add the image and name of the quoted person to the original tweet and post it for those who follow the retweeter.
The new retweet function "is not going to meet the needs of those who retweet. At the same time, I think it's going to bring retweeting to a whole new population," says Boyd. "Adding commentary is a huge element to why people retweet." Instead of just replying privately to a person with an opinion, by retweeting and adding a comment, users can target a larger audience, sharing their opinions and inviting others to do the same, she says.
Boyd found that the percentage of Twitter users who retweet is fairly small, but she expects that number to increase once the retweet button is incorporated. In her research, Boyd found that 11 percent of the retweets examined contained commentary. But she says that number likely underestimates the phenomenon, as she only looked for comments at the beginning of the message.
"Retweeting is primarily used by the geeks and news folks," she says. "What's really starting to hit [Twitter] in large numbers... are those involved with the pop culture." Boyd expects that a retweet button will bring the practice to those millions of users who follow celebrities, such as Twitter fanatics Ashton Kutcher and Oprah Winfrey, for example. "We're going to see information spread from populations who haven't engaged in that way [before]. We'll see an evolution of the behavior," says Boyd. "It will become a way to validate or agree with other users' content."
Users often employ retweets to provide context in conversation, says Susan Herring, a professor of information science and linguistics at Indiana University and editor in chief of the Language@Internet journal. "I can't imagine that [the new Twitter tool] will be very satisfactory to Twitter retweeters," says Herring. "A retweet plus a comment is a conversation. A retweet alone could be an endorsement, but it's a stretch to view an exchange of endorsements as a conversation." Herring does agree that it will increase retweeting and broaden the range of users who retweet.
Retweets are not just of interest to users but also are valuable to companies and researchers who strive to keep track of how ideas spread. Retweeting "is this elegant viral mechanism," says Dan Zarrella, a Web developer who studies viral marketing in social media. "The scale and data you can extract from [retweets] has never been possible with [other] viral or word-of-mouth communications," says Zarrella, who claims to have a database of more than 30 million retweets.
"I think that having a button and supported structure of retweeting is definitely a good idea, but I disagree with the implementation," Zarrella says, and suggests using a format like third-party Twitter tool TweetDeck and others do: pressing a retweet button there will automatically copy and paste the old link with the "RT" syntax, but the tool still allows the retweeter to modify the text.
By taking out the "RT @username," Twitter is making it impossible for users to search for retweets themselves, says Zarrella. "They're limiting how much you can analyze retweets." Zarrella speculates that the retweet button may have been designed so that, down the road, Twitter can charge for features such as extensive retweet tracking.
In addition to showing the original tweeter's image, the new Twitter button will also show the latest 20 retweets of a post. "If they show the breadcrumbs of the trail of everyone who retweeted, that's a good thing," says Steve Garfield, a new media advisor to several large companies and prolific video blogger. "I like to add value to my retweets by adding a comment, to tell people why I like it." If the new function doesn't allow for comments, Garfield says users will just design a new way or revert to the old way.
"People will continue to repurpose Twitter to meet their needs," predicts Herring. "I can't imagine that those who are passionate retweeters will discontinue their practices."
Sunday, November 16, 2008
Making Search Social

Much like Google Alerts and Yahoo Alerts, a Yotify search does not start and end in an instant. Instead, the search runs at regular intervals--either hourly or daily, depending on the user's preference--with results sent back to the user via e-mail.
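Functionally, this is a standing query: the search is saved, re-run on a schedule, and anything new is pushed to the user. A minimal sketch of the pattern, with hypothetical placeholder functions (Yotify has not published its internals), might look like:

    import time

    def run_search(query):
        # Placeholder: query the partner sites, return a list of result URLs.
        return []

    def email_results(user, results):
        # Placeholder: send the user an e-mail digest of new results.
        print("Mailing %d new results to %s" % (len(results), user))

    def standing_query(user, query, interval_hours=24):
        """Re-run a saved search on a schedule, mailing only unseen results."""
        seen = set()
        while True:
            fresh = [r for r in run_search(query) if r not in seen]
            if fresh:
                email_results(user, fresh)
                seen.update(fresh)
            # Hourly or daily, depending on the user's preference.
            time.sleep(interval_hours * 3600)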
But Yotify offers much more than the search giants' current alert tools, argues Ron Bouganim, CEO and cofounder of Branchnext, the San Francisco startup behind Yotify. Those alert tools, Bouganim says, are merely an afterthought for these huge companies, and they do not take into account important Web 2.0 developments, such as social networking.
"We want to create a richer experience," Bouganim says.
When users sign up for an account, they are given a personal profile page that lists, stores, and displays what they've searched for and where. That information can be made public as well, so that friends can share the results and help refine the search. This could be particularly useful for group projects such as apartment hunting with roommates.
Meanwhile, Yotify is making it a point to closely integrate with the major social-networking sites, most notably Facebook and LinkedIn. "If people want to search through Facebook using our technology, we want to let them do it," claims Bouganim.
Another distinguishing characteristic of Yotify versus Google Alerts or Yahoo Alerts is its focus on shopping. Whereas Google Alerts is primarily concerned with retrieving news and other hard information, Yotify is positioning itself as more of a sales tool for its partner sites, which include general retailers such as Shopping.com as well as a host of niche players.
In this respect, Yotify does go above and beyond what Google Alerts currently provides. Say a user wants to buy a black futon, for example. The important aspect of the search is not that the user obtain the futon immediately, but that it be available at a certain price. Yotify will continually monitor its partner sites, then notify the user when a black futon is available at that price.
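In other words, the alert fires on a condition over the listing, not on the mere appearance of a keyword. A toy version of that check, with an assumed data shape, looks like this:

    def matching_offers(listings, keyword, max_price):
        """Return listings that match both the item and the price threshold."""
        return [item for item in listings
                if keyword in item["title"].lower() and item["price"] <= max_price]

    listings = [
        {"title": "Black futon, barely used", "price": 80},
        {"title": "Black futon, brand new", "price": 250},
    ]
    # Notify the user only when a black futon appears under $100.
    print(matching_offers(listings, "black futon", 100))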
The main problem with Yotify is that, as of now, it only scans a small portion of the Web: users can only search among Yotify's partner sites. While the search engine has partnered with many key websites, such as Craigslist, the New York Times, and eBay, it certainly does not have the breadth of a search giant such as Google or Yahoo.
The technology involved is quite different from the large-scale indexing done by a typical search engine. Yotify asks partner sites to integrate its software into their systems. "We don't 'scrape' information from other sites," explains Bouganim. "We help other sites distribute their information in a way that fully complies with the goals of the partner site."
From the user perspective, however, all that matters is the effectiveness of the search. And a user who has no idea where to find what he is looking for won't want to follow the Yotify format and select specific blogs or news sites for the search.
Bouganim claims that this "deep but narrow" issue will be resolved in future releases--and sooner rather than later. Indeed, it's still early days for the search engine; a test version of the tool was just launched on September 24.
Online media analyst Mike Boland of The Kelsey Group notes that getting users to switch from Google to a different engine could prove difficult, no matter what innovations Yotify attempts. "It is such an uphill battle to get users to break out of deep-rooted online habits," Boland says. "Companies that have spent too much time drinking the Kool-Aid seem to forget that, because they think their solution is so great that it will overcome this issue. But it usually doesn't."
Although unwilling to get into details about the business model, Bouganim is clearly planning to exploit the social-networking and e-commerce aspects of Yotify. "Understanding people's wants and needs, as well as those of their friends, obviously has a tremendous amount of value."
Unused Internet
In a paper to be presented later this month at the ACM Internet Measurement Conference, a team of six researchers has documented what they claim is the first complete census of the Internet in more than two decades. They discovered a surprising number of unused addresses and conclude that plenty will still be lying idle when the last numbers are handed out in a few years' time. The problem, they say, is that some companies and institutions are using just a small fraction of the millions of addresses they have been allocated.
"People are very concerned that the IPv4 address space is very close to being exhausted," says John Heidemann, a research associate professor in the department of computer science at the University of Southern California (USC) and the paper's lead author. "Our data suggests that maybe there are better things we should be doing in managing the IPv4 address space."
The census, carried out every quarter since 2003 but only recently published, is the first comprehensive map of the Internet since David Smallberg, then a computer-science student at the University of California, Los Angeles, canvassed the Internet's first servers--all 300-plus of them--following the switchover from the ARPANET in early 1983.
Internet Protocol version 4 (IPv4) addresses are typically managed as network blocks consisting of 256 addresses (known as a C block), 65,536 addresses (known as a B block), or approximately 16.8 million addresses (known as an A block). About a quarter of the A block addresses--the largest segments of the Internet--were given out in the first days of the Internet to early participants and to companies and organizations including Apple, IBM, and Xerox.
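Those block sizes fall directly out of the 32-bit structure of an IPv4 address: a C block fixes the first 24 bits, a B block the first 16, and an A block only the first 8, leaving 2^8, 2^16, and 2^24 possible addresses respectively:

    # Addresses in a block = 2 ** (32 - fixed prefix bits)
    for name, prefix_bits in [("A", 8), ("B", 16), ("C", 24)]:
        count = 2 ** (32 - prefix_bits)
        print("%s block: 2**%d = %s addresses" % (name, 32 - prefix_bits, format(count, ",")))
    # A block: 2**24 = 16,777,216 addresses
    # B block: 2**16 = 65,536 addresses
    # C block: 2**8 = 256 addresses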
Today, A blocks are issued by an organization called the Internet Assigned Numbers Authority (IANA) to large Internet service providers or to regional registrars to which the A blocks are resold. But because accelerating use of the Internet is quickly eating up the remaining free blocks of network addresses, the last blocks will likely be given out between the end of 2010 and 2011.
The new map of the Internet suggests that there is room for more hosts even if addresses are running out. The map reveals that, while roughly a quarter of all blocks of network addresses are heavily populated and therefore efficiently used, about half of the Internet is either used lightly or is located behind firewalls blocking responses to the survey. The last quarter of network blocks consists of addresses that can still be assigned in the future.
The USC research group used the most innocuous type of network packet to probe the farthest reaches of the Internet. Known as the Internet Control Message Protocol, or ICMP, this packet is typically used to send error messages between servers and other network hardware. Sending an ICMP packet to another host (an action known as pinging) is generally not seen as hostile, Heidemann says. "There are certainly people who misunderstand what we are doing," and interpret it as the prelude to an attack, he says. "By request, we remove them from the survey, but it's fewer people than you might think. Pings are pretty innocuous."
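A probe of this kind can be reproduced with any stock ping utility. The sketch below simply shells out to the system's ping rather than crafting raw ICMP packets (which would require elevated privileges); the flags are those of Linux's iputils ping and vary by platform:

    import subprocess

    def icmp_probe(host, timeout_s=2):
        """Send a single ICMP echo request via the system ping utility.
        Returns True if the host answered (ping exits with status 0)."""
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    print(icmp_probe("192.0.2.1"))  # TEST-NET address; expect False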
The researchers found that ICMP pings stack up well against another method of host detection, the Internet's main means of transmitting data: the Transmission Control Protocol, or TCP. TCP-probing is a common technique used by network scanners, but it tends to take longer and is considered more aggressive than ICMP pings, so it may be blocked. To compare the effectiveness of each technique, the team probed a million random Internet addresses using both ICMP and TCP, finding a total of 54,297 active hosts. ICMP pings elicited a response from approximately three-quarters of visible hosts, while TCP probes garnered a response slightly less than two-thirds of the time.
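The TCP alternative amounts to attempting a connection on a likely-open port and seeing whether anything answers. The sketch below probes port 80; that choice, like the rest, is an assumption for illustration, since the paper's exact probe parameters aren't given here:

    import errno
    import socket

    def tcp_probe(host, port=80, timeout_s=2):
        """Attempt a TCP connection. A completed connection or an explicit
        refusal both indicate a live host; a silent timeout suggests an
        unused address or a firewall dropping the probe."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout_s)
        try:
            code = sock.connect_ex((host, port))
            return code == 0 or code == errno.ECONNREFUSED
        finally:
            sock.close()

    print(tcp_probe("192.0.2.1"))  # TEST-NET address; expect False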
In total, the researchers estimate that there are 112 million responsive addresses, with between 52 million and 60 million addresses assigned to hosts that are contactable 95 percent of the time.
The survey may miss computers behind firewalls or computers that do not respond to pings, but the overall conclusion--that the Internet has room to grow--is spot on, says Gordon Lyon, a security researcher who created the popular network scanning tool Nmap.
"There are huge chunks of IP space which are not allocated yet, and also giant swaths which are inefficiently allocated," Lyon says. "For example, Xerox, GE, IBM, HP, Apple, and Ford each have more than 16 million IP addresses to themselves because they were allocated when the Internet was just starting."