(*********************************************************************** Mathematica-Compatible Notebook This notebook can be used on any computer system with Mathematica 3.0, MathReader 3.0, or any compatible application. The data for the notebook starts with the line of stars above. To get the notebook into a Mathematica-compatible application, do one of the following: * Save the data starting with the line of stars above into a file with a name ending in .nb, then open the file inside the application; * Copy the data starting with the line of stars above to the clipboard, then use the Paste menu command inside the application. Data for notebooks contains only printable 7-bit ASCII and can be sent directly in email or through ftp in text mode. Newlines can be CR, LF or CRLF (Unix, Macintosh or MS-DOS style). NOTE: If you modify the data for this notebook not in a Mathematica- compatible application, you must delete the line below containing the word CacheID, otherwise Mathematica-compatible applications may try to use invalid cache data. For more information on notebooks and Mathematica-compatible applications, contact Wolfram Research: web: http://www.wolfram.com email: info@wolfram.com phone: +1-217-398-0700 (U.S.) Notebook reader applications are available free of charge from Wolfram Research. 
***********************************************************************) (*NotebookFileLineBreakTest NotebookFileLineBreakTest*) (*NotebookOptionsPosition[ 7558, 188]*) (*NotebookOutlinePosition[ 8950, 230]*) (* CellTagsIndexPosition[ 8906, 226]*) (*WindowFrame->Normal*) Notebook[{ Cell[CellGroupData[{ Cell["Cluster Analysis", "Subtitle"], Cell[BoxData[ \(Needs["\"]\)], "Input"], Cell["\<\ This package is needed for the data: in general, the principal \ components of the original data are used in this clustering procedure rather \ than the data itself.\ \>", "Text"], Cell["\<\ The main function is recursive, so special cases are included to \ avoid infinite recursion.\ \>", "Text"], Cell[BoxData[ \(clusterData::usage = "\"\)], "Input"], Cell[BoxData[ \(clusterData[currentclusters : {__?MatrixQ}, data_?MatrixQ, threshold_Real] /; \((\((Length[data] > 1)\)\ && \ \((And\ @@\ \((\ Thread[ \((\(\(Dimensions[#]\)\[LeftDoubleBracket]2 \[RightDoubleBracket]\ &\)\ /@\ currentclusters) \) == \(Dimensions[data]\)\[LeftDoubleBracket]2 \[RightDoubleBracket]\ ])\)\ )\)\ )\) := \n\t clusterData[updateclusters[currentclusters, First[data], threshold], Rest[data], threshold]\)], "Input"], Cell[BoxData[ \(clusterData[currentclusters : {__?MatrixQ}, data : {_?VectorQ}, threshold_Real]\ /; \((And\ @@\ \((\ Thread[ \((\(\(Dimensions[#]\)\[LeftDoubleBracket]2 \[RightDoubleBracket]\ &\)\ /@\ currentclusters) \) == \(Dimensions[data]\)\[LeftDoubleBracket]2 \[RightDoubleBracket]\ ])\)\ )\)\ := \n\t updateclusters[currentclusters, First[data], threshold]\)], "Input"], Cell[BoxData[ \(clusterData[currentclusters : {__?MatrixQ}, {{}}, threshold_Real] /; \((Equal\ @@ \((\ \(\(( \(Dimensions[#]\)\[LeftDoubleBracket]2 \[RightDoubleBracket])\)\ &\)\ /@\ currentclusters)\))\) := \ currentclusters\)], "Input"], Cell["\<\ This is the meat of the program. 
The two main versions of \ clusterData (other than the \"empty points\" one) call updateclusters as the \ means of allocating each new point.\ \>", "Text"], Cell[BoxData[ \(updateclusters::usage = "\"\)], "Input"], Cell[BoxData[ \(\(updateclusters[currentclusters : {__?MatrixQ}, newpoint_?VectorQ, threshold_Real] /; \n\t \((And\ @@\ \((\ Thread[ \((\(\((\(Dimensions[#]\)\[LeftDoubleBracket]2 \[RightDoubleBracket])\)\ &\)\ /@\ currentclusters)\) == Length[newpoint]\ ])\))\) := \n\t With[{centroids = Sqrt[Apply[Plus, \ \((\(\((\((First[#] - newpoint)\)^2)\)&\)\ /@currentclusters) \), {1}]]}, \n\t With[{closest = \(Flatten[Position[centroids, Min[centroids]]] \)\[LeftDoubleBracket]1\[RightDoubleBracket]}, \n\t\t If[centroids\[LeftDoubleBracket]closest\[RightDoubleBracket] < threshold, \n\t\t\t If[\(Dimensions[currentclusters]\)\[LeftDoubleBracket]1 \[RightDoubleBracket] == 1, \n\t\t\t\t Insert[currentclusters, {newpoint}, {\(-1\)}], Insert[currentclusters, newpoint, {closest, Length[currentclusters\[LeftDoubleBracket]closest \[RightDoubleBracket]] + 1}]], Join[currentclusters, {{newpoint}}]]\ ]]\ \)\)], "Input"], Cell["\<\ These are two information statistics that are intended to help \ determine the best number of clusters. clusterLambdaStatistic is the ratio of the determinant of the matrix of sums \ of sums of squares and crossproducts of each cluster, to the determinant of \ the matrix of the sums of squares and cross products for the whole dataset. Think of this as the ratio of the sum of the variations within each cluster, \ to the total variation in the data set. Smaller numbers are better. MarriotClusterStatistic trades off parsimony for explanatory power, being the \ product of the square of the number of clusters, and the determinant of the \ matrix of sums of sums of squares and crossproducts of each cluster. Think of this as the product of the total within-cluster variation and the \ square of the number of clusters. 
The more clusters, the less variation \ within each of them, but this is traded off against the number of clusters. \ Again, the smaller the value of the statistic, the better.\ \>", "Text"], Cell[BoxData[ \(clusterLambdaStatistic[clusters : {__?MatrixQ}] /; \((Equal\ @@\ \((\(\((\(Dimensions[#]\)\[LeftDoubleBracket]2 \[RightDoubleBracket])\)\ &\)\ /@\ clusters)\))\) := \n \tWith[{joined = Join@@\ clusters}, \n\t\t Det[Plus\ @@ \((\(Apply[Plus, Apply\ [Plus, Outer[Times, Transpose[#], #], {1}], {1}]& \)\ /@\ clusters)\)]/\n\t\t\t\t Det[Apply[Plus, Apply\ [Plus, Outer[Times, Transpose[\((joined)\)], \((joined)\)], {1}], { 1}]]\ ]\)], "Input"], Cell[BoxData[ \(MarriotClusterStatistic[clusters : {__?MatrixQ}] /; \((Equal\ @@\ \((\(\((\(Dimensions[#]\)\[LeftDoubleBracket]2 \[RightDoubleBracket])\)\ &\)\ /@\ clusters)\))\) := \n \t\t\tLength[clusters]^2* Det[Plus\ @@ \((\(Apply[Plus, Apply\ [Plus, Outer[Times, Transpose[#], #], {1}], {1}]& \)\ /@\ clusters)\)]\)], "Input"] }, Open ]] }, FrontEndVersion->"Macintosh 3.0", ScreenRectangle->{{0, 1152}, {0, 850}}, WindowSize->{866, 760}, WindowMargins->{{Automatic, 89}, {23, Automatic}}, PrintingCopies->1, PrintingPageRange->{1, Automatic}, PageHeaders->{{Cell[ TextData[ { CounterBox[ "Page"]}], "PageNumber"], Inherited, Cell[ TextData[ { ValueBox[ "FileName"]}], "Header"]}, {Cell[ TextData[ { ValueBox[ "FileName"]}], "Header"], Inherited, Cell[ TextData[ { CounterBox[ "Page"]}], "PageNumber"]}}, PrintingOptions->{"PrintingMargins"->{{36, 36}, {36, 36}}, "PrintCellBrackets"->False, "PrintRegistrationMarks"->False, "PrintMultipleHorizontalPages"->False}, MacintoshSystemPageSetup->"\<\ 01h0005X0FP000003g`;C?oComH@A0]f8085N`?P0000005X0FP000003g`;C05/ 038;C4LH05000@4100000BL?00400@=930`\>" ] (*********************************************************************** Cached data follows. If you edit this Notebook file directly, not using Mathematica, you must remove the line containing CacheID at the top of the file. 
The cache data will then be recreated when you save this file from within Mathematica. ***********************************************************************) (*CellTagsOutline CellTagsIndex->{} *) (*CellTagsIndex CellTagsIndex->{} *) (*NotebookFileOutline Notebook[{ Cell[CellGroupData[{ Cell[1731, 51, 36, 0, 51, "Subtitle"], Cell[1770, 53, 84, 1, 27, "Input"], Cell[1857, 56, 190, 4, 30, "Text"], Cell[2050, 62, 115, 3, 30, "Text"], Cell[2168, 67, 189, 3, 43, "Input"], Cell[2360, 72, 610, 11, 59, "Input"], Cell[2973, 85, 503, 9, 59, "Input"], Cell[3479, 96, 305, 6, 59, "Input"], Cell[3787, 104, 199, 4, 30, "Text"], Cell[3989, 110, 168, 3, 43, "Input"], Cell[4160, 115, 1266, 24, 155, "Input"], Cell[5429, 141, 1029, 18, 190, "Text"], Cell[6461, 161, 634, 13, 75, "Input"], Cell[7098, 176, 444, 9, 59, "Input"] }, Open ]] } ] *) (*********************************************************************** End of Mathematica Notebook file. ***********************************************************************)