Archive for the ‘idea!’ Category

The Zen of Cluster Counting

Sunday, May 25th, 2008

Problem I’m using the k-means (or insert-clustering-gizmo-here) algorithm. How many clusters should I partition my data into?

Solution Consider the scale of the clustering, which is naively the zoom setting at which the data is plotted. At a wide enough zoom, all the data is one cluster; at telephoto, each point is its own cluster. The scale is chosen at the outset of the problem, and it is determined not by the clustering algorithm but by what the clustering is meant to achieve.

Let’s characterize the correct number of clusters at a given scale.

  1. A big Tibshirani Gap: The within-group variance (summed across the n groups) of n-grouped, similarly distributed random data is much higher than that of the n-grouped given data.
    • How big is big? Try various values of n and pick the one with the largest gap.
    • How do you generate similarly distributed random data in high dimensions?
  2. Low across-iteration variance in variance: If the number of clusters hits the sweet spot, the grouping will be stable across iterations; i.e., a global minimum of the variance will exist and will be attained repeatedly. For n-clustered data, the variance measure at each iteration will be stable.
    • What’s the variance for multi-dimensional data? For a basic implementation: the trace of the covariance matrix.
    • How many iterations? Thousands of them.
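
A rough sketch of the first criterion, in plain Python. Several things here are my assumptions rather than part of the recipe above: the "similarly distributed random data" is drawn uniformly over the bounding box of the given data, the variance measure is the within-cluster sum of squared distances (which is proportional to the trace of the pooled within-cluster covariance), and the names kmeans_wss, wss_runs, uniform_like and gap are made up for illustration. The demo data is three synthetic blobs.

```python
import math
import random

def kmeans_wss(points, k, rng, iters=50):
    """One run of plain Lloyd's k-means from a random start.
    Returns the within-cluster sum of squared distances."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # An empty cluster keeps its old center.
        new_centers = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)

def wss_runs(points, k, rng, restarts=20):
    """Variance measure from several independent random starts."""
    return [kmeans_wss(points, k, rng) for _ in range(restarts)]

def uniform_like(points, rng):
    """Reference data (assumption): uniform over the data's bounding box."""
    lo = [min(xs) for xs in zip(*points)]
    hi = [max(xs) for xs in zip(*points)]
    return [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in points]

def gap(points, k, rng, n_refs=5):
    """Mean log-variance of reference data minus log-variance of the data,
    using the best of several restarts for each."""
    log_w = math.log(min(wss_runs(points, k, rng)))
    ref_logs = [
        math.log(min(wss_runs(uniform_like(points, rng), k, rng)))
        for _ in range(n_refs)
    ]
    return sum(ref_logs) / n_refs - log_w

# Demo on synthetic data: three tight, well-separated 2-D blobs.
rng = random.Random(0)
blob_centers = [(0, 0), (10, 0), (5, 9)]
data = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in blob_centers
    for _ in range(20)
]
gaps = {k: gap(data, k, rng) for k in range(1, 6)}
best_k = max(gaps, key=gaps.get)
print(gaps, best_k)
```

The list returned by wss_runs is also the raw material for the second criterion: its spread across restarts is the "variance in variance". Beware that with purely random starts, plain k-means can land in poor local minima even at the right number of clusters, so that spread depends on the initialization as much as on n.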

Analog Karnaugh Maps

Saturday, November 24th, 2007

Problem A set of input n-sets is available, along with the desired output for each. The objective is to determine a function f such that f(x_i1, …, x_in) = y_i for each n-set and its output value. This is a continuous-valued “truth table.”

Solution Regression.
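
As a minimal sketch of what that can look like, assuming f is linear in its inputs (the post only says "regression", so the linear model, the function names fit_linear and predict, and the toy table below are all my assumptions):

```python
def fit_linear(rows, ys):
    """Least-squares fit of y ~ w0 + w1*x1 + ... + wn*xn.
    Solves the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    X = [[1.0, *row] for row in rows]  # prepend a constant term
    m = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(m)]
        + [sum(x[i] * y for x, y in zip(X, ys))]
        for i in range(m)
    ]
    for c in range(m):
        # Partial pivoting: bring the largest entry in column c to the diagonal.
        p = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(m):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][m] / A[i][i] for i in range(m)]

def predict(weights, row):
    """Evaluate the fitted function on one input n-set."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))

# Toy "truth table" generated from y = 1 + 2*x1 - 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, -2, 0, 2]
weights = fit_linear(rows, ys)
print(weights, predict(weights, (3, 2)))
```

For a table that is not close to linear, the same idea applies with extra columns (products or powers of the inputs) added to each row before fitting.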

Anaglyph Pen in Gimp

Tuesday, November 20th, 2007

Problem I got red/cyan anaglyph glasses, but don’t have red/cyan pencils to play around with.

Briefly Make a gimp brush to go with them!

Poisson Control

Friday, November 9th, 2007

Problem A digital camera photograph is noisy.

Discussion This noise is generally modeled as a sum of Gaussian noise, constant “dark noise”, and Poisson noise. In situations where Poisson noise is the limiting factor, modeling the noise as a spatial Poisson point process can be used to estimate the signal.

Under these assumptions it comes down to: smooth over large areas when signal intensity is high, small areas when it’s low.
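
One way to read that rule, sketched for a 1-D row of pixel counts. The reading is my assumption: for Poisson counts the variance equals the mean, so averaging N pixels of intensity lam leaves a variance of lam / N, and holding that at a fixed absolute level gives N = lam / target_var. The names window_radius and adaptive_smooth, the three-pixel intensity estimate, and the default parameters are all made up for illustration.

```python
def window_radius(lam, target_var=1.0, max_radius=10):
    """Half-width of the averaging window for local intensity lam.
    Var(mean of N Poisson pixels) = lam / N, so N = lam / target_var."""
    width = max(1, round(lam / target_var))
    return min(max_radius, width // 2)

def adaptive_smooth(counts, target_var=1.0, max_radius=10):
    """Box-average each pixel over a window that grows with local intensity."""
    n = len(counts)
    out = []
    for i in range(n):
        # Crude local intensity estimate from the pixel and its neighbours
        # (assumption; a real estimator would be more careful).
        lo, hi = max(0, i - 1), min(n, i + 2)
        lam = sum(counts[lo:hi]) / (hi - lo)
        r = window_radius(lam, target_var, max_radius)
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(counts[lo:hi]) / (hi - lo))
    return out

# Dim pixels are barely touched, bright pixels are averaged widely.
row = [0, 1, 0, 2, 1, 40, 38, 45, 41, 39, 44, 37, 42]
print(adaptive_smooth(row))
```

This sketch ignores edges in the signal: a wide window that straddles a jump in intensity, as near the middle of the row above, will blur it.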

What more can the Poisson model yield?