Progress Report



Completed Tasks

Data Collection:

We have collected a relatively small sample of tweets from the Twitter API. Because this sample may not be statistically significant and may be somewhat biased, we will continue extracting data from the API and hope to have enough to analyze by this Sunday. Our target is 20K tweets.

Another option is to buy data already gathered from Twitter (from InfoChimps, for instance), but that data is only available at a cost ($300).


Formulation of Hypotheses:

We came up with a few questions about what makes a tweet 'optimal'. From those questions, we want to provide the user with two kinds of information:
a. Personal Data Set: Provide the user with a dynamic set of statistics regarding topics, time of day, the time it takes to get a retweet or reply, tweet/retweet ratio, the style of their friends' tweets, and what their friends have been retweeting and replying to.
b. Generic Data Set: Provide the user with a broader set of information that includes statistics from our initial results, looking for trends, response times, and styles that are trending worldwide. This data set will provide general information on effective tweeting techniques and will not be limited to friends' trends.

These hypotheses will be tested once we collect the data and analyze it properly. Once we confirm or reject them and obtain other interesting insights, we will proceed to actually build the application.

Design Specs:

We put ourselves in the user's shoes to determine what would actually be useful to see. We want to make the information from the API available to anyone without much technical knowledge, complement it with the insights we obtain, and thus facilitate the decision-making process at the time of tweeting, all without overwhelming the user with too much information.

We figured that if we give the user too many statistics and suggestions, the app will stop being user-friendly, so we decided it would be best to keep things simple at all times. Of course, we won't know which information is useful until we do the data analysis, so we would like to have the analysis done as soon as possible.

Web Infrastructure:

We have decided to build the product on Google App Engine, as it is scalable, free, and easy to use.

On Deck for Next Week

- Gather more data (by Friday, May 14th at 3 PM; Maurizio and Jeremy):
Leave computers on 24/7 to extract a larger sample from the Twitter API. Dump that data into a database for the data analysis.
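The dump step could look something like the sketch below: tweets fetched from the API are written into a SQLite table keyed by tweet ID, so re-running the collector never inserts duplicates. The dictionaries here are stand-ins for records a Twitter API client would return; the field names and table schema are our assumptions, not a fixed design.

```python
import sqlite3

def dump_tweets(conn, tweets):
    """Insert fetched tweets into a local table, skipping duplicates by ID."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id INTEGER PRIMARY KEY,
               text TEXT,
               created_at TEXT,
               retweet_count INTEGER
           )"""
    )
    # INSERT OR IGNORE means collectors running 24/7 can overlap safely.
    conn.executemany(
        "INSERT OR IGNORE INTO tweets VALUES (:id, :text, :created_at, :retweet_count)",
        tweets,
    )
    conn.commit()

# Placeholder records standing in for API results (hypothetical fields).
sample = [
    {"id": 1, "text": "hello world", "created_at": "2010-05-10T14:02:00", "retweet_count": 3},
    {"id": 2, "text": "good morning", "created_at": "2010-05-10T09:15:00", "retweet_count": 0},
    {"id": 2, "text": "good morning", "created_at": "2010-05-10T09:15:00", "retweet_count": 0},
]
conn = sqlite3.connect(":memory:")
dump_tweets(conn, sample)
print(conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0])  # → 2
```

The duplicate record in the sample shows the dedup behavior: only two rows land in the table.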

- Statistical Analysis (by Wednesday; Mike, Diego, and Nedu, with Maurizio and Jeremy helping as well):
Use tools such as Matlab and R to extract meaningful insights from the data and test our hypotheses. Ask the staff for feedback before we proceed with the coding.
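As one small illustration of the kind of hypothesis we will test (here in Python rather than Matlab/R): does the hour a tweet is posted correlate with how often it gets retweeted? The (hour, retweet_count) pairs below are made-up placeholders for rows pulled from our tweet database; the real analysis would use the full sample and proper significance tests.

```python
from collections import defaultdict
from statistics import mean

# Placeholder observations: (hour posted, retweet count) per tweet.
observations = [
    (9, 1), (9, 0), (9, 2),
    (14, 5), (14, 3), (14, 4),
    (21, 2), (21, 1), (21, 3),
]

# Group retweet counts by hour of posting.
by_hour = defaultdict(list)
for hour, retweets in observations:
    by_hour[hour].append(retweets)

# Average retweets per hour bucket, and the best-performing hour.
avg_by_hour = {hour: mean(counts) for hour, counts in by_hour.items()}
best_hour = max(avg_by_hour, key=avg_by_hour.get)
print(best_hour, avg_by_hour[best_hour])  # → 14 4
```

With real data we would also check whether the difference between buckets is statistically significant before recommending a posting time to the user.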

- Create a functional prototype of the product (by Friday, May 21st)

- Test the product (over the weekend; Mike, Nedu, Molino, and volunteers):
See what improvements can be made and how we can make the user experience better.

- Build a more sophisticated tool

Candid Discussion of Project Status


We came up with the idea for this project a bit late, so we haven't done much coding yet. However, the project is heading in a good direction: we think we are asking the right questions, and if executed correctly this could be a valuable tool for personal use as well as for companies and mass tweeters. The team is motivated and ready to bring the project up to speed. We anticipate being fully caught up, if not ahead, by this Friday.
The critical risk is that the data gathered may not be statistically significant, or may be somewhat biased, given that our current sample is relatively small.