
The Next Thought

In Egyptian mythology, Thoth was the god of the moon, associated with writing and wisdom. Here are my thoughts and, to quote Ludwig Wittgenstein, "the limits of my language mean the limits of my world."

Media Makeup, USA, Olympics

You don’t watch LIVE Olympics nowadays, at least not on the NBC channels. You watch events captured LIVE by NBC but delivered to you hours after news of the outcome has made it out over the Great Wall. You couldn’t watch the opening ceremony live, when the whole world was enjoying the spectacle with bated breath. This is so very ridiculous. Think of it: frozen food is fine, but some things are better experienced LIVE than canned. You get up at 5 AM and watch some lipstick-painted faces telling you how great a day it will be, and how the stock market will boom when the magical moment of 08.08.08.08.08.08 is upon us. So much for corporatism and High-Definition TV.

Then again, the US performance in the Olympics has been disastrous. The media does not highlight it, nor do the analysts discuss it. Look at the medal tally. China has won handsomely over the others in many events: 35 gold medals vs just 19 for the US. If you take out Phelps’s 8, the US has won just 11 others. And still they show the US on top of the chart, while all along I knew that the gold medal haul decides the rankings. So much for Mr Bush’s fan support.

The country that re-discovered the spelling of colour is now rediscovering the power of the color red.

Macro Indicators

We all love macro indicators. If you can tell that there will be a storm on the Pacific coast given that a butterfly flapped its wings halfway round the globe, there is nothing like it. The author signing books at a release function can gauge that his amateur ad-writing days for the local underwear company are over. The maturity of an association, club or conference can be gauged from the awards it gives out. Lifetime achievement awards are definitely a sign of super-maturity. Maybe that’s why universities bestow honorary PhDs by the dozen. When a C-movie comes out about the current economic downturn, pray hard: you survived!

Google vs Live – 08.08.08 Olympics

We are moments away from history. 08.08.08 begins in a BRIC nation, China. These are the Olympics of billions of people, not just billions of dollars. Folks like me will be watching coverage online, live, courtesy of http://www.nbcolympics.com and their Silverlight player. It’s the coming of age of .Net, the coming of age of the Internet, the coming of age of democratic communism. A billion dreams and memories will be formed and shared. Dope tests, world records, and of course the great Indian rope trick – a billion people, 100 athletes, no medals. I am so excited.

However, the gold for web search goes, yet again, to Google. See below for why.

image image

How could they miss this? I mean, get someone to make a database entry. Well, it proves machines are not humans, and they don’t learn so well either!

The Problem of Scale: The saga continues

I blogged briefly about the problems of plenty. Well, I have been grappling with the problem. The basic idea has been to create a graph in memory and walk it. The data stays in the database; we pull it up, walk it (hoping the algo will run) and dump the results back into the database.

In situations like these, when you are trying to fit 8GB of data into 4GB of physical memory, you realize that every name like “This is a function” takes a zillion more bytes than a generated identifier, which is usually a long. Well, I did that, and bingo, I was down to 2GB in DB space. Still, loading up the entire data set was taking hours. The load started out pretty fast, but the speed went down exponentially thereafter. By the time you had everything in memory and started your walker, it was time to call it a day. We tried and we failed, thrice.
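A minimal sketch of the name-to-identifier trick, in Python rather than the .Net code this was actually done in. The class `NameInterner` and the helper `encode_edges` are made-up names for illustration; the point is only that each long name is stored once and every edge carries two small integers instead of two strings.

```python
from typing import Dict, Iterable, List, Tuple


class NameInterner:
    """Hypothetical helper: maps each distinct name to a small integer id."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}   # name -> id
        self._names: List[str] = []      # id -> name

    def intern(self, name: str) -> int:
        """Return the id for `name`, assigning the next free id on first sight."""
        node_id = self._ids.get(name)
        if node_id is None:
            node_id = len(self._names)
            self._ids[name] = node_id
            self._names.append(name)
        return node_id

    def name_of(self, node_id: int) -> str:
        """Translate an id back to the original name (e.g. when dumping results)."""
        return self._names[node_id]


def encode_edges(
    named_edges: Iterable[Tuple[str, str]], interner: NameInterner
) -> List[Tuple[int, int]]:
    """Replace the names on both ends of every edge with integer ids."""
    return [(interner.intern(src), interner.intern(dst)) for src, dst in named_edges]
```

The graph walker then works purely on the integer pairs, and `name_of` is only needed when results go back to the database.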

It turned out that in .Net the greatest size of a single object can be 2GB. With all the nodes and edges and extras, we sure were hitting that limit. You can break the graph down into 5 subgraphs, but you lose the elegance of the algorithm, which I bet runs like lightning.

Suddenly you feel you are back 20 years in time, cranking out programs bit by bit on 640KB of RAM. I have tried playing around with R and igraph. igraph is a superb piece of geekware but goes belly up at ~10 million nodes for an Erdős-Rényi graph.

Can you imagine? I am thinking of writing a paging algorithm for my app that uses SQL as the disk.
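For what it's worth, here is a rough sketch of what such a paging layer could look like. Everything in it is an assumption for illustration: SQLite stands in for the real SQL database, the `edges(src, dst)` table layout is invented, and `PagedAdjacency` and `make_demo_db` are hypothetical names. It keeps only the most recently used adjacency lists in memory and goes back to the database for the rest.

```python
import sqlite3
from collections import OrderedDict
from typing import Iterable, List, Tuple


def make_demo_db(edges: Iterable[Tuple[int, int]]) -> sqlite3.Connection:
    """Build a throwaway in-memory database with an assumed edges(src, dst) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.execute("CREATE INDEX idx_edges_src ON edges (src)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    return conn


class PagedAdjacency:
    """LRU cache of adjacency lists, with the SQL table acting as the 'disk'."""

    def __init__(self, conn: sqlite3.Connection, capacity: int) -> None:
        self._conn = conn
        self._capacity = capacity
        self._cache: "OrderedDict[int, List[int]]" = OrderedDict()
        self.db_reads = 0  # counts page faults, i.e. trips to the database

    def neighbours(self, node: int) -> List[int]:
        if node in self._cache:
            # Cache hit: mark this entry as most recently used.
            self._cache.move_to_end(node)
            return self._cache[node]
        # Cache miss: page the adjacency list in from the database.
        rows = self._conn.execute(
            "SELECT dst FROM edges WHERE src = ? ORDER BY dst", (node,)
        ).fetchall()
        self.db_reads += 1
        result = [row[0] for row in rows]
        self._cache[node] = result
        if len(self._cache) > self._capacity:
            # Evict the least recently used adjacency list.
            self._cache.popitem(last=False)
        return result
```

Whether this beats the operating system's own paging depends entirely on the access pattern of the walk; the `db_reads` counter is there so you can measure that before trusting it.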

BTW, I read this paper on processing massive datasets elegantly using SCOPE. I don’t have that much to do to ask for COSMOS. But to quote Einstein, “The questions remain the same, the answers change.”

Quite a Lot

It’s not often that you realize the bounds of the 4GB memory model. It’s not often you scour the net for information and find none. It’s not often you need to process billions of items using the same abstraction model you used for the HelloWorldish data.

It’s not often you need to run BFS on a graph of a million nodes, a million times over. It’s fun, it’s quite a lot.
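For reference, the BFS itself is tiny; it is the scale that hurts. A minimal sketch in Python, assuming nodes are already numbered 0..n-1 and edges are directed (the function names `build_adjacency` and `bfs_distances` are illustrative, not from the actual app):

```python
from collections import deque
from typing import Iterable, List, Tuple


def build_adjacency(num_nodes: int, edges: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Build adjacency lists once, so every BFS run can reuse them."""
    adjacency: List[List[int]] = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency


def bfs_distances(adjacency: List[List[int]], source: int) -> List[int]:
    """Return the hop count from `source` to every node (-1 if unreachable)."""
    dist = [-1] * len(adjacency)
    dist[source] = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if dist[neighbour] == -1:  # not visited yet
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist
```

Each run costs time proportional to nodes plus edges, so a million runs over a million-node graph is where the "quite a lot" comes from.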

Document Similarity

Many approaches have been tried to identify and measure document similarity. Most of them concern text content. The classic measure of similarity between 2 documents is the cosine of the angle between their term vectors. When the term-document matrix is large, we can use Latent Semantic Indexing to reduce the dimensionality and make it computationally easier. More recent statistical methods like Latent Dirichlet Allocation have also been used for better results.
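The cosine measure is easy to sketch. The version below assumes the crudest possible term vector, raw word counts after lowercasing and splitting on whitespace, with no stemming, stop-word removal or TF-IDF weighting; `term_vector` and `cosine_similarity` are illustrative names.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term vector: lowercase, whitespace-split word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Identical documents score 1, documents with no words in common score 0, and everything else lands in between.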

On the other hand, linguists have advocated a language-science-based approach, using stemming and part-of-speech tagging as a means to nail down the intent. The WordNet experiment goes along these lines.

Semantic search is becoming more mainstream by the day. The intent of an English sentence depends not only on the syntax but also on the choice of words, the context and the punctuation. While it’s true that a 2-word query to a search engine does not indicate much, longer questions should let the engine find all of these and more, and search documents on that basis. I have to investigate what semantic measures they use to distinguish sarcasm from a plain sentence. Well, if you can capture the mood of the user before using his search string, would it help?

Hi how are you doing today? What can I help you look for?

First Order Thought

Most of our instinctive thoughts are useless. Leave them to linger, maybe come back to them or log them down. 80% are useless. Unfortunately, most of the time we pay more attention to them than we should. What if the guy down the road honked his horn hard? Fact is, you should have an agenda of your own. If you think a little higher than usual, the thought process takes care of keeping you busy. Some people get it right. With tears in his eyes, Bill Gates ended an era he made for himself. I watched the send-off, live. Very poignant… the feelings and reflections are still sinking in, and will keep doing so.

Privacy And Social Networks

Privacy is a six-dimensional problem, probably. The 5 inputs (Provider, Content, Consumer, Medium, Time) determine the measure of privacy of the content being shared between the Provider and Consumer at a particular time.

Let us try to define the concepts in the above statement as:

  1. Provider: The agent who provides the content.
  2. Content: The comment or video or photo that’s being shared.
  3. Consumer: The agent who will receive the content for consumption or relay.
  4. Medium: The medium through which content is exchanged, e.g. a cell phone.
  5. Time: The instant or interval when the exchange happens. Time is an essential criterion, since the rules and preferences for all the other inputs can be expressed as temporal constraints.
  6. Privacy: A number in [0,1] where 0 indicates total privacy and 1 indicates no privacy, like a newspaper article.

As you can see, we can define explicit rules for 1-4 in terms of access control and optionally make them time-dependent.

E.g. Provider X (who is authenticated and is the President) can send text e-mail [Content] using the Corporate Network [Medium] to all General Managers [Consumers] at any time of the day [Time].
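One way to picture such a rule is as a small record with a time window attached. This is only a sketch under assumptions: the `SharingRule` class, its field names and the example consumer ids are invented for illustration, and a real system would also need the authentication step that is hand-waved here.

```python
from dataclasses import dataclass
from datetime import time
from typing import FrozenSet


@dataclass(frozen=True)
class SharingRule:
    """Hypothetical access-control rule over (Provider, Content, Medium, Consumer, Time)."""

    provider: str
    content_type: str
    medium: str
    consumers: FrozenSet[str]
    start: time = time.min  # default window: any time of the day
    end: time = time.max

    def allows(self, provider: str, content_type: str, medium: str,
               consumer: str, at: time) -> bool:
        """True only if every input matches the rule and `at` falls in the window."""
        return (
            provider == self.provider
            and content_type == self.content_type
            and medium == self.medium
            and consumer in self.consumers
            and self.start <= at <= self.end
        )


# The example rule from the text, with made-up consumer ids.
president_mail = SharingRule(
    provider="X",
    content_type="text e-mail",
    medium="corporate network",
    consumers=frozenset({"gm-east", "gm-west"}),
)
```

Note that a rule like this only answers "may this exchange happen?"; it says nothing about what the consumer does with the content afterwards, which is where privacy proper begins.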

How does privacy come into the picture? Let us identify a few situations.

  1. Can X control which General Managers he wants to exclude?
  2. Can X control which General Manager sees which attachment?
  3. Can X control that the mail is not forwarded or downloaded?
  4. Can X control that the mail is deleted automatically after 10 days?

All of these questions relate to the control X will have over the e-mail during its lifetime, as to who does what with it.

A social network is usually an ego network. Users will go to any length to reveal personal and semi-personal information to gain mileage. [Ralph Gross et al., Information Revelation and Privacy in Online Social Networks, pre-proceedings version, ACM Workshop on Privacy in the Electronic Society (WPES), 2005]

While today’s social networks have improved quite a bit in terms of the granularity of controls exposed to the user for different content, several downsides still remain:

  1. What about the data still being there on the social network, and the network provider mining it for preference discovery?
  2. Can you track who does what with your data? There are Trojans everywhere.
  3. Can you remember or track which data you provided to which site?

The Internet was built on simplicity and openness. SMTP never had sender authentication, and that created the spam industry. Similarly, privacy has never been a fundamental pillar of social applications. We may be getting there, but in the interim, what about:

  1. Whenever I upload data to a site, the site stores the data internally and provides me with a digital digest, which I can store so that I can track my uploads.
  2. Whenever the user uploads content, the upload engine tries to analyze the content and tags and suggests privacy ratings for the user to approve. This rating can then be checked against the user's preferences during downstream distribution.
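The digest idea in the first point could be sketched as below. This is an illustration under assumptions, not how any site actually works: SHA-256 is just one reasonable choice of digest, and `upload_digest` and `UploadLedger` are hypothetical names for a ledger the user would keep on their own side.

```python
import hashlib
from typing import Dict, List


def upload_digest(content: bytes) -> str:
    """Digest a site could hand back for an upload (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


class UploadLedger:
    """User-side record of which content digests were given to which site."""

    def __init__(self) -> None:
        self._by_site: Dict[str, List[str]] = {}

    def record(self, site: str, content: bytes) -> str:
        """Remember that `content` went to `site`; return the digest for reference."""
        digest = upload_digest(content)
        self._by_site.setdefault(site, []).append(digest)
        return digest

    def sites_holding(self, content: bytes) -> List[str]:
        """List every site this exact content was uploaded to, sorted by name."""
        digest = upload_digest(content)
        return sorted(site for site, digests in self._by_site.items() if digest in digests)
```

The ledger only answers "where did I put this?"; it cannot tell you what the site did with the data afterwards.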

Providing 1,000 controls to select my privacy options for different data points is just too raw. The systems should rate content dynamically, maybe against previous content in the same category, and suggest options for the user. E.g. a person who has photos of babies in his album should be offered a high privacy rating when uploading a photo of an open human brain.

Finally, privacy of content should change with time, e.g. increase when content is meant for point-to-point consumption.

Price of e-books

This book sells for a Benjamin at Amazon. The Kindle version sells at $75 or so. Given that the resale market for e-books is almost non-existent, it beats me why I would buy the Kindle version if I am a cost-sensitive buyer.

Mining Graph Data

Second, given the savings on paper and transportation, I see no reason why the e-book shouldn’t sell at $25. On top of that, just see the feedback and reviews: far too few for a book on the topic. Maybe they should sell the Kindle version for $10. What say?

About Solving Intractable Computation Problems

How do we replicate human intelligence, cognitive capability, and emotions in a computer? Sounds Frankenstein-ous? Artificial Intelligence, Machine Learning and Pattern Recognition have been adapted, developed and used in this direction for the past half century. The latest effort revolves around a new dimension: well, you guessed it, money.

Wikinomics describes how collaborative business activity won't be as controllable as independent enterprise, but will produce better and unexpected results. Similarly, voting algorithms like those used at Digg aggregate user wisdom to rank and classify news items.

Image search is another difficult problem, given that objects are not easily identifiable from images of poor quality, for example. The Google Image Labeler recasts this identification problem as a tagging game between anonymous partners, who label images together while racking up points on the leaderboard.

Wikipedia is another classic example of the wisdom of the crowds phenomenon.

Now the question is, what’s the incentive for the crowd? Up until now, we have seen the economies of attention and reputation play out extensively in successful human activities. These are slightly fine-grained and need to be adapted to the particular application for perfect results.

The economy of money on the other hand is like the sledgehammer in this context. Very application agnostic, easy to apply and control.

Live Search, on the face of it, is trying to use the money economy to gain attention and reputation.

Trying to find the right incentive will probably be the crux of many scenarios for applying human knowledge to solving computation problems, it seems.
