Biopsy And Clustering Essay

Improved Essays
Clustering with Biopsy
Introduction
For this assignment we have been instructed to experiment with clustering. Within the MASS library we can find the data set biopsy, which gives approximately nine attributes for breast tumors of 699 patients. The first column of the data is the patient ID number and the last column is the classification (“benign” or “malignant”) of the tumor.
Scenario
Our responsibility is as follows: remove ID, classification and replace any missing values with zero; select two clustering techniques and determine the number of clusters; cross-tabulate the two methods with each other and compute the value of CramersV; and then cross tabulate one cluster method against the true class.
Data Preparations

In order to properly analyze the biopsy data for
…show more content…
Hence, the reason it is called the elbow method and is considered an appropriate indicator for determining the number of clusters. This elbow looking plot is a representation of when the total within cluster sum of squares is minimized and as compacted as possible. This may not always be the best answer but I believe it is better than just picking arbitrary values for (k).

Now that I have established a reasonable number of clusters; I then run clustering techniques for kmeans and pam

Kmeans Method

With this method I used the process shown in class. I set nstart to a large number, so that I could establish a global optimum. I had to run this twice because it will give different results. I then setup a two-way table so I could see how close I have come to a global optimum. The Rcode is below as well as Figure 2 with the results:

It appear that both of the runs for kmeans agree and that I am fairly close to optimal by clustering in two groups.

Pam Method

Based on the given from above, I simply clustered using pam function by cluster in two groups. The following Rcode was used to cluster with

Related Documents

  • Great Essays

    Nt1330 Unit 6 Mis

    • 1255 Words
    • 6 Pages

    The main issue with FOS was coming up with an algorithm that determines the important deliverables of thermocouple systems by using time history data generated from a first-order system. Deliverables include: our MATLAB coded algorithm -which outputs the parameters yL, yH, ts, and � - an executive MATLAB file -which outputs SEE and standard deviation values- and a regression plot which compares the prices of FOS models to �. Determining these four parameters within 20.0% of the calibrated data points merits a fairly successful model. We had to deal with one important constraint: we are working with data specific to a first-order system, and must use equations that appropriately match.…

    • 1255 Words
    • 6 Pages
    Great Essays
  • Decent Essays

    Step1: Start the program Step2: Initialize the nodes by fixing the number of nodes, type of antenna used, type of routing protocol and plotting circumference Step3: Frequency is allocated for the MIMO antennas. Step4: Positioning and plotting the nodes Step5: Base Bandwidth allocation for primary and secondary nodes • Primary network range-…

    • 306 Words
    • 2 Pages
    Decent Essays
  • Improved Essays

    The specification of hardware is GPU used : NVIDIA GTX280 (has about 30 multiprocessors each with 8 processors, frequency is 1.29 GHz) CPU used : Intel i5D, 4 cores, frequency of 2.67 GHz. GPU memory, bandwidth : 1 GB, 141.7GB/s To get a more clear picture speedup calculated only after the I/O file is completed. Results that are obtained from the proposed differential (data size dependent) approach are compared with other approaches like HP_k_means (for smaller hence low-dimension data), UV_k-means , GMiner (for large data sets) and then fialy the performance is compared with CPU. A. Small data sets (Low –dimension) For this a data set of sizes 2 million and 4 million with varying values of “k” (number of the distinct sets/groups) and “d”…

    • 971 Words
    • 4 Pages
    Improved Essays
  • Superior Essays

    Li Faumeni Syndrome

    • 1484 Words
    • 6 Pages

    The control, however, is very important in the determination of possible metastases because it contains wild type (normal) DNA that can be compared to Valerie’s DNA samples. In Valerie’s peripheral blood and normal breast tissue samples, a copy of the wild type DNA is present in addition to a copy of the cancer DNA, thus she is heterozygous in this area. In Valerie’s tumor tissue sample, however, there is no copy of the normal gene present, thus she is homozygous for the cancer DNA in this area. Because Valerie has one copy of the wild type…

    • 1484 Words
    • 6 Pages
    Superior Essays
  • Improved Essays

    3 Week Diet Review Essay

    • 578 Words
    • 3 Pages

    5. Five useful and powerful methods used by the 3 Week Diet to burn the obstinate fat The 3 Week Diet created by the personal trainer and nutritionist, Brian Flat is considered a revolutionary and multipurpose weight loss weight loss program, as it assists people in attaining their weight loss goals as well as within a short time of 21 days. In his creation, the author asserts that users may get several health benefits, including:  Reduced cellulite.  Quicker metabolism.…

    • 578 Words
    • 3 Pages
    Improved Essays
  • Great Essays

    A single cell cannot grow larger and larger indefinitely because on a cellular level and metabolic level, a cell requires essential provisions, cell-to-cell communication, and transport of molecules/nutrients. The outside of a cell grows slower than the inside of a cell. If a single cell were to continuously grow the outside layer of the cell would not be able to keep up. The larger a cell is, the more difficult it is to transport nutrients and oxygen from outside of the cell into the inside of the cell. Consequently, the large cell would have more trouble transporting water, nutrients, and oxygen across the cell membrane and through the plasma membrane.…

    • 1357 Words
    • 6 Pages
    Great Essays
  • Great Essays

    The biomedical model ideal originate from Rudolf Virchochow, considered the “Father of Pathology”, concluded that all disease is a result of cellular abnormality or disturbance. In order to further understand pathology, this model adopted a factor analysis approach, which helps to reduce large sums of data to remove replica data. This…

    • 1484 Words
    • 6 Pages
    Great Essays
  • Decent Essays

    Cronbach Alpha

    • 551 Words
    • 3 Pages

    2. There is a good, positive correlation between the six variables in the HC scale. The Cronbach Alpha is .769. From looking at the Inter-Item Correlation if I delete any variable the Cronbach Alpha will go down so there is no need to delete anything. The SE scale Cronbach Alpha was .686, after looking at the Inter-item Correlation chart all of the variables were positive but there were a few variable that were low.…

    • 551 Words
    • 3 Pages
    Decent Essays
  • Superior Essays

    Epidemiologic Surveillance 6318 Short Report # 3 An Evaluation of the STI Surveillance in the Summerville County Importance of Evaluation According to CDC, surveillance is the ongoing systemic collection, analysis, and interpretation of health related data which is essential for planning, implementation, and evaluation of public health practice (Lee et al, 2010). Surveillance system evaluation should be done periodically to assess the effectiveness, relevance and impact of the surveillance system (Lee et al, 2010). Evaluation of the surveillance system is of importance because it will help ensure that data collection resource is used efficiently and that public health problems is monitored effectively (Lee et al, 2010).…

    • 1344 Words
    • 6 Pages
    Superior Essays
  • Improved Essays

    3.8.2 Geary’s C Another measure of spatial autocorrelation is Geary’s C statistic which ranges from 0 to 2 where 0 signifies maximum positive spatial autocorrelation or clustering, 1 signifies no autocorrelation or randomness and 2 signifies maximum negative autocorrelation or dispersal. If the values of Geary’s C are low it indicates positive spatial autocorrelation and if the values are high it indicates negative spatial autocorrelation. The calculation is similar to Moran…

    • 1289 Words
    • 6 Pages
    Improved Essays
  • Improved Essays

    Essay On Neoplasm

    • 465 Words
    • 2 Pages

    The term neoplasm refers to tumors that are masses or growths that arise from normal tissue. Can a growth occur at any time in life? Are all neoplasms life-threatening? Explain in your own words why or why not.…

    • 465 Words
    • 2 Pages
    Improved Essays
  • Great Essays

    The Master Surgery Scheduling Level The MSSP involves the development of an MSS, a cyclic timetable that determines the patient category associated with each block of OR time. At this level, the target is to generate an MSS that balances both beds and nurses daily requirements, while considering surgeons preferences. In this study, it is assumed that surgeons prefer working for certain number of days per week. The model objective function minimizes the weighted sum of peaks in the daily bed occupancy and nurse daily workloads.…

    • 1702 Words
    • 7 Pages
    Great Essays
  • Improved Essays

    Linear Kriging Essay

    • 702 Words
    • 3 Pages

    The Kriging weights w_i1,w_i2,w_i3,…w_ik can be used to estimate the fair value of VA contracts x ⃑_i by the following formula, y ̂_i=∑_(j=1)^k▒〖w_ij∙y_j 〗. The kriging weights can be calculated by the following linear equations, Where the is a control variable, which is used to make sure that ∑_(j=1)^k▒w_ij =1. V_rs=α+exp⁡(-3/β D(z ⃑_r,z ⃑_s,λ)),r,s=1,2,3,…,k, D_ij= α+exp⁡(-3/β D(x ⃑_i,z ⃑_j,λ)),j=1,2,3,…,k, The D(.) function is the distance function mentioned in the clustering section,α≥0 and β≥0 are two parameters.…

    • 702 Words
    • 3 Pages
    Improved Essays
  • Improved Essays

    On Monday, October 5th we discussed how we would going about not using your phone to text for 24 hours, so I decided that I would not text from Monday at 11 o’clock until the same time on Tuesday. My immediate thought process was that this assignment was going to be rather challenging because sometimes you just need to text people to remain in contact and coordinate your daily routine. But as I began to think more about it, I realized how easy this task was going to be. All I had to do was not send a text for 24 hours, I could still read and receive text messages and I still had various social media platforms to keep myself entertained.…

    • 1502 Words
    • 6 Pages
    Improved Essays
  • Improved Essays

    TEEN And CH Protocols

    • 953 Words
    • 4 Pages

    TEEN and LEACH Protocols LEACH and TEENfollow self-organizing and adaptive CH selection criteria. In setup phase, CH is elected on the bases of following threshold equation1 [32]; Fig.3 and Fig.4. T(n)={■(p/(1-p(rmod 1/p) )&ifn∈G@0&otherwise) ┤ (1) wherep is the desired number of CHs, r is the current round and G is the set of nodes that have not been CH in the current epoch. Epoch is the number of rounds for a CH, after which again it become eligible to become a CH. Each node generates a random number between 0 and 1, if the number is less than the node’s threshold, then this sensor node becomes a CH.…

    • 953 Words
    • 4 Pages
    Improved Essays