Archive for the ‘storm!’ Category

That Voight-Kampff test of yours: Language

Tuesday, October 14th, 2008
  1. Tools
  2. Artificial Intelligence & Knowledge Representation
  3. Acoustics
  4. Computational Linguistics
  5. Experimental Linguistics & Psycholinguistics
  6. Philosophy
  7. Language-based Linguistics (as opposed to Concept-based)
  8. Compilers/PL

The Zen of Cluster Counting

Sunday, May 25th, 2008

Problem I’m using the k-means algorithm (or insert-clustering-gizmo-here). How many clusters should I partition my data into?

Solution Consider the scale of the clustering, which is, naively, the zoom setting at which the data is plotted. At a wide enough zoom, all the data forms one cluster; at telephoto, each point is its own cluster. The scale is chosen at the outset of the problem, determined not by the clustering algorithm but by what the clustering is meant to achieve.

Let’s characterize the correct number of clusters at a given scale.

  1. A big Tibshirani Gap: the within-group variance, summed across the n groups, of similarly distributed random data is much higher than that of the given data grouped into n clusters.
    • How big is big? Try various values of n and pick the one with the largest gap.
    • How do you generate similarly distributed random data in high dimensions?
  2. Low across-iteration variance in variance: if the number of clusters hits the sweet spot, the grouping will be stable across iterations (independent runs of the algorithm); i.e., a global minimum of the variance exists and is attained several times. For data that truly has n clusters, the variance measure will be stable from one iteration to the next.
    • What’s the variance for multi-dimensional data? For a basic implementation: the trace of the covariance matrix.
    • How many iterations? Thousands of them.
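Both criteria can be sketched in plain Python using only the standard library. This is a minimal sketch under several assumptions, not a reference implementation: every function name is hypothetical; k-means uses k-means++ seeding rather than purely random starts; the "similarly distributed random data" is drawn uniformly over each feature's bounding box, the simplest reference distribution in Tibshirani et al.'s gap-statistic paper; each "iteration" of criterion 2 is taken to be an independent restart of k-means; and the variance measure W_k is the within-cluster sum of squares, i.e. the sum over clusters of cluster size times the trace of that cluster's covariance matrix. The demo data is made up.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=100):
    """Lloyd's k-means with k-means++ seeding (an assumption); returns a label per point."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2)[0])
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def within_ss(points, labels, k):
    """W_k: sum over clusters of (cluster size) * trace(cluster covariance)."""
    total = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            mean = [sum(c) / len(members) for c in zip(*members)]
            total += sum(sq_dist(p, mean) for p in members)
    return total


def restart_ws(points, k, rng, restarts):
    """W_k from each of several independent k-means runs."""
    return [within_ss(points, kmeans(points, k, rng), k) for _ in range(restarts)]


def uniform_reference(points, rng):
    """Random data spread uniformly over the bounding box of each feature."""
    lo = [min(c) for c in zip(*points)]
    hi = [max(c) for c in zip(*points)]
    return [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in points]


def gap(points, k, rng, n_ref=10, restarts=10):
    """Criterion 1: mean log W_k of reference data minus log W_k of the given data."""
    log_w = math.log(min(restart_ws(points, k, rng, restarts)))
    ref = [math.log(min(restart_ws(uniform_reference(points, rng), k, rng, restarts)))
           for _ in range(n_ref)]
    return sum(ref) / n_ref - log_w


def stability(points, k, rng, restarts=50, tol=1e-6):
    """Criterion 2: relative spread of W_k across restarts, and how often the best W_k recurs."""
    ws = restart_ws(points, k, rng, restarts)
    mean = sum(ws) / len(ws)
    rel_std = math.sqrt(sum((w - mean) ** 2 for w in ws) / len(ws)) / mean
    hits = sum(w <= min(ws) * (1 + tol) for w in ws) / len(ws)
    return rel_std, hits


rng = random.Random(0)
# Made-up demo data: three well-separated 2-D Gaussian blobs, 30 points each.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)) for cx, cy in centers for _ in range(30)]
gaps = {k: gap(data, k, rng) for k in range(1, 6)}
print(max(gaps, key=gaps.get))  # n with the largest gap
print(stability(data, 3, rng))  # (relative spread, fraction of runs hitting the best W_k)
```

The demo picks the n with the largest gap, as the bullet suggests; Tibshirani et al. instead recommend the smallest n whose gap is within one standard error of the next n's gap, which guards against the gap creeping upward at finer scales. For high-dimensional data, the same paper also suggests drawing the reference uniformly over a box aligned with the data's principal components.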

Analog Karnaugh Maps

Saturday, November 24th, 2007

Problem A set of input n-tuples and the desired output for each are available. The objective is to determine a function f such that f(x_i1, …, x_in) = y_i for each n-tuple and its output value. This is a continuous-valued “truth table.”

Solution Regression.
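Regression covers many model families. As one minimal sketch, assuming a linear model y ≈ b0 + b1*x1 + … + bn*xn is adequate, ordinary least squares can be solved through the normal equations in plain Python. `fit_linear` is a hypothetical name, and the rows must contain enough linearly independent n-tuples for the system to be solvable.

```python
def fit_linear(rows, ys):
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bn*xn; returns [b0, b1, ..., bn]."""
    X = [[1.0, *r] for r in rows]  # leading 1.0 column carries the intercept b0
    m = len(X[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(m)]
         + [sum(x[i] * y for x, y in zip(X, ys))]
         for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]


# A made-up continuous "truth table" generated by y = 1 + 2*x1 - 3*x2.
rows = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
ys = [1 + 2 * a - 3 * b for a, b in rows]
coef = fit_linear(rows, ys)
print(coef)  # approximately [1, 2, -3]
```

Interaction terms, the analog of a Karnaugh-map product term such as x1·x2, can be handled by appending x1*x2 as an extra column in each row; the fit stays linear in the coefficients.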

MATLAB Session Reports

Thursday, November 15th, 2007

Problem I run a MATLAB session and test numerous hypotheses. The session remains undocumented because there’s no standard for documenting it elegantly.

Discussion

Features of an organization standard

  • Organization of input
  • Allow modification
  • Inclusion of all graphs
  • Exclusion of selected figures
  • Automation
  • Integration into the target documentation (in this case, TeX/LaTeX)
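Several of these features (organized input, inclusion of all graphs with selective exclusion, automation, and LaTeX integration) can be sketched as a small generator. This sketch assumes each figure has already been exported from MATLAB to a PDF file (for example with saveas) and paired with a caption; `latex_report` and the file names are hypothetical. The output is a LaTeX fragment meant to be pulled into a main document with \input, which must load the graphicx package.

```python
def latex_report(figures, exclude=()):
    """Build a LaTeX fragment with one figure environment per (file, caption) pair,
    skipping any file listed in `exclude`."""
    parts = []
    for path, caption in figures:
        if path in exclude:
            continue  # selectively excluded figure
        parts.append(
            "\\begin{figure}[htbp]\n"
            "  \\centering\n"
            f"  \\includegraphics[width=0.8\\linewidth]{{{path}}}\n"
            f"  \\caption{{{caption}}}\n"
            "\\end{figure}"
        )
    return "\n\n".join(parts)


# Hypothetical figures exported during one session, in the order they were made.
figs = [
    ("hyp1_scatter.pdf", "Hypothesis 1: scatter of raw data"),
    ("scratch.pdf", "Throwaway exploratory plot"),
    ("hyp2_fit.pdf", "Hypothesis 2: fitted curve"),
]
tex = latex_report(figs, exclude={"scratch.pdf"})
print(tex)
```

Keeping the figure list as plain data means the report can be regenerated automatically after each session, and modifying the report is just editing that list.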
