general-discussion > Error in running 'niak_kmeans_clustering'
Apr 11, 2011 10:04 PM | mohammad z
Error in running 'niak_kmeans_clustering'
Hi, when running the m-file 'niak_kmeans_clustering', I get this error: "Undefined function or method 'niak_part2mat'". It seems there is no such file as 'niak_part2mat'. Also, can the 'niak_kmeans_clustering' function be used for bisecting k-means? In each phase of the bisecting k-means method, we obtain two clusters that are OPTIMAL (the two centroids of the two partitions are updated iteratively until the clusters are optimal). Does 'niak_kmeans_clustering' give the same result? Thanks.
Apr 14, 2011 09:04 PM | Pierre Bellec
RE: Error in running 'niak_kmeans_clustering'
Dear Mohammad,
Sorry for the late reply, I needed to work a little bit on this issue. You are using a feature found in the development version, and I had forgotten a dependency from my private libraries. I have just committed the missing function to the repository. Please check out the latest version of the code on the Google Code site:
http://code.google.com/p/niak/source/checkout
You will also need a version of PSOM:
http://www.nitrc.org/projects/psom
Here's an example of how to use bisecting k-means:
>> tseries = [(0.5+randn([100 20])) (1.5+randn([100 20])) (5+randn([100 20])) (6+randn([100 20]))];
>> opt_k.nb_classes = 4;
>> opt_k.flag_bisecting = true;
>> part = niak_kmeans_clustering(tseries,opt_k);
>> niak_visu_part(part);
Regarding your question on the algorithm: at each iteration, the cluster with the largest inertia (sum of squared Euclidean distances to the cluster mean) is split into two using standard k-means (you can still decide how to initialize this k-means and how to deal with empty clusters). The main advantages of this approach are that it can be much faster than standard k-means, it depends less on the initialization, and it will generally result in exactly the specified number of clusters (while clusters often disappear in standard k-means when a large number of clusters are identified). Note that in my experience, a better initialization for k-means such as k-means++ (http://en.wikipedia.org/wiki/K-means%2B%2B), as opposed to a random partition, also addresses these issues and can in some situations perform better than bisecting k-means. You can try it out if you want (it is also implemented in the development version):
>> tseries = [(0.5+randn([100 20])) (1.5+randn([100 20])) (5+randn([100 20])) (6+randn([100 20]))];
>> opt_k.nb_classes = 4;
>> opt_k.flag_bisecting = false;
>> opt_k.type_init = 'kmeans++';
>> part = niak_kmeans_clustering(tseries,opt_k);
>> niak_visu_part(part);
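For readers who want to see the splitting loop described above outside MATLAB, here is a minimal Python/NumPy sketch. This is not NIAK code: the deterministic farthest-point seeding of the inner 2-means is a simple stand-in (my assumption) for the configurable initialization mentioned above, and empty clusters are handled by simply keeping the previous centroid.

```python
import numpy as np

def kmeans2(x):
    """Plain 2-means on the rows of x, seeded with a deterministic
    farthest-point pair (a stand-in for fancier initializations)."""
    c0 = x[0]
    c1 = x[((x - c0) ** 2).sum(1).argmax()]
    cent = np.stack([c0, c1]).astype(float)
    for _ in range(50):
        # assign each row to the nearest of the two centroids
        d = ((x[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
        lab = d.argmin(1)
        for k in range(2):
            if np.any(lab == k):  # guard against an empty cluster
                cent[k] = x[lab == k].mean(0)
    return lab

def bisecting_kmeans(x, n_clusters):
    """Repeatedly split the cluster with the largest inertia
    (sum of squared Euclidean distances to its mean) using 2-means."""
    labels = np.zeros(len(x), dtype=int)
    for new_label in range(1, n_clusters):
        # pick the cluster with the largest inertia
        inertia = [((x[labels == k] - x[labels == k].mean(0)) ** 2).sum()
                   for k in range(new_label)]
        target = int(np.argmax(inertia))
        rows = np.flatnonzero(labels == target)
        sub = kmeans2(x[rows])
        labels[rows[sub == 1]] = new_label  # one half gets a fresh label
    return labels
```

On well-separated data, e.g. `bisecting_kmeans(data, 4)`, each pass peels off one extra cluster, which is why the method tends to return exactly the requested number of clusters.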
I hope this helps,
Pierre