199 Cards in this Set
- Front
- Back
The field of bioinformatics is split into two fields, what are those two fields? |
1) Computing and modeling 2) Data management |
|
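In R programming, what can stringsAsFactors be used for? |
stringsAsFactors controls whether character columns are converted to factors when a data frame is created (for example by data.frame() or read.table()); with stringsAsFactors = FALSE, strings are kept as plain character strings and nothing else |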
|
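A minimal sketch of the effect of stringsAsFactors, using a made-up two-column data frame (the column names gene and level are illustrative assumptions, not from the course material):

```r
# Toy data; the column names are hypothetical.
asFactor <- data.frame(gene = c("a", "b"), level = c(1.5, 2.5),
                       stringsAsFactors = TRUE)
asString <- data.frame(gene = c("a", "b"), level = c(1.5, 2.5),
                       stringsAsFactors = FALSE)

class(asFactor$gene)  # "factor"
class(asString$gene)  # "character"
```

Since R 4.0.0 the default is stringsAsFactors = FALSE; older versions converted strings to factors unless told otherwise.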
In R programming, what would set.seed(#) do? |
Sets the seed of the random number generator, so that the same sequence of pseudo-random numbers (random numbers generated with a random number generator algorithm) is produced every time the code is run |
|
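A short sketch of why this matters for reproducibility; the seed value is arbitrary and chosen only for illustration:

```r
set.seed(112358)          # the seed value itself is arbitrary
first <- runif(3)
set.seed(112358)          # reset to the same seed
second <- runif(3)
identical(first, second)  # TRUE: the same seed gives the same numbers
```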
In R programming, is 'for' a loop? |
Yes |
|
In R programming, what is this saying: for (i in 1:11) |
Run the body of the loop 11 times, with i taking the values 1, 2, ..., 11 in turn |
|
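As a small illustration, the loop below records each value that i takes; the variable name visited is a hypothetical choice:

```r
visited <- c()
for (i in 1:11) {
  visited <- c(visited, i)   # the body runs once per value of i
}
length(visited)  # 11
```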
In R programming, how do you select the data in rows 1:10 of columns 1 and 2? |
myData[c(1,2,3,4,5,6,7,8,9,10), c(1,2)] OR myData[1:10, 1:2] |
|
In R programming, how do you select the data in rows 1:10 of the first two columns in reverse order? |
myData[10:1, 1:2] |
|
In R programming, how do you select the data in rows 1:10 of the first two columns in reverse order, but without the third row of the result? |
(myData[10:1, 1:2])[-3, ] |
|
In R programming, how do you select the data in rows 1:10 of the first two columns in random order? |
sample(1:10) myData[sample(1:10), 1:2] |
|
In R programming, how do you select the data in rows 1:10 of the first two columns, ordered by the values in the second column, ascending? |
myData[1:10, 2] order(myData[1:10, 2]) myData[order(myData[1:10,2]), 1:2] |
|
In R programming, how do you select the data in columns 1:2 of all rows with gene names that begin with "q"? |
substr(myData[,"genes"], 1,1) == "q" myData[substr(myData[,"genes"], 1,1) == "q", 1:2] |
|
In R programming, how do you select the data rows of the genes with the highest final expression level? |
myData[myData[,ncol(myData)] == max(myData[,ncol(myData)]), ] |
|
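The subsetting expressions from the cards above can be tried on a toy table. This is only a sketch: the data frame below, its gene names and its expression values are invented for illustration.

```r
# Hypothetical toy data: 12 genes, two expression columns.
myData <- data.frame(
  genes = c("qax", "abc", "qrs", "def", "ghi", "jkl",
            "qtu", "mno", "pqr", "stu", "vwx", "yza"),
  t0 = c(5, 3, 9, 1, 7, 2, 8, 4, 6, 10, 12, 11),
  t1 = c(2.5, 7.1, 3.3, 9.8, 1.2, 4.4, 6.6, 8.0, 5.5, 0.9, 9.8, 3.0),
  stringsAsFactors = FALSE
)

firstTen <- myData[1:10, 1:2]                    # rows 1 to 10, columns 1 and 2
reversed <- myData[10:1, 1:2]                    # same rows, last one first
noThird  <- (myData[10:1, 1:2])[-3, ]            # drop the third row of that result
shuffled <- myData[sample(1:10), 1:2]            # random order
sorted   <- myData[order(myData[1:10, 2]), 1:2]  # ascending by column 2
qGenes   <- myData[substr(myData[, "genes"], 1, 1) == "q", 1:2]
top      <- myData[myData[, ncol(myData)] == max(myData[, ncol(myData)]), ]
```

Note that the last expression returns every row that shares the maximum, so ties give more than one row.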
In R programming, what is ncol? |
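Returns the number of columns of a matrix or data frame; used as a column index, as in myData[, ncol(myData)], it selects the final column |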
|
What type of file format is FASTA? |
Flat file format |
|
What is the R programming language? |
It is a statistics environment and programming language that is exceptionally well engineered It is a free platform for data manipulation and analysis |
|
In R programming, what does the "#" character mark? |
Following text is a comment and not executed by R |
|
In R programming, what does help(rnorm) do? |
Opens a help page of Normal Distribution information |
|
In R programming, what does dnorm do? |
Gives density in normal distribution |
|
In R programming, what does pnorm do? |
Gives distribution function in normal distribution |
|
In R programming, what does rnorm do? |
Generates random deviates from the normal distribution The length of the result is determined by the argument n |
|
In R programming, what does dbinom do? |
Gives density of binomial distribution |
|
In R programming, what does pbinom do? |
Gives distribution function of binomial distribution |
|
In R programming, what does qbinom do? |
Gives quantile function of binomial distribution |
|
In R programming, what does rbinom do? |
Generates random deviates of binomial distribution |
|
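The d/p/q/r prefixes work the same way for the binomial functions as for dnorm, pnorm, qnorm and rnorm. A small sketch with four trials and success probability 0.5 (values chosen only because the arithmetic is easy to check by hand):

```r
dbinom(2, size = 4, prob = 0.5)    # density: P(X = 2) = 6/16 = 0.375
pbinom(2, size = 4, prob = 0.5)    # distribution function: P(X <= 2) = 11/16 = 0.6875
qbinom(0.5, size = 4, prob = 0.5)  # quantile: smallest x with P(X <= x) >= 0.5, here 2
draws <- rbinom(5, size = 4, prob = 0.5)  # five random deviates, each between 0 and 4
```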
In R programming, what does apropos( ) do? |
Returns a character vector giving the names of all objects in the search list that match what (a character string, interpreted as a regular expression) |
|
In R programming, how do you use apropos to find 1-letter things? |
apropos("^.$") |
|
In R programming, how do you use apropos to find things that start with me? |
apropos("^me") |
|
In R programming, how do you use apropos to find 2-to-4 letter things? |
apropos("^.{2,4}$") |
|
In R programming, how do you use apropos to find things that contain me? |
apropos("me") |
|
In R programming, how do you use apropos to find things that end with me? |
apropos("me$") |
|
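What apropos() returns depends on which packages are loaded, so the patterns above are easier to check against a fixed list. In this sketch the vector candidates is a made-up set of names, and grepl() stands in for the matching that apropos() does on the search path:

```r
# Made-up object names, used only to show what each pattern matches.
candidates <- c("c", "t", "me", "mean", "median", "time", "some", "names")

candidates[grepl("^.$", candidates)]       # one-letter names: "c" "t"
candidates[grepl("^me", candidates)]       # start with me: "me" "mean" "median"
candidates[grepl("^.{2,4}$", candidates)]  # 2 to 4 letters: "me" "mean" "time" "some"
candidates[grepl("me$", candidates)]       # end with me: "me" "time" "some"

# apropos() applies the same patterns to the objects on the search path.
"mean" %in% apropos("^me")                 # TRUE
```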
In R programming, what does find( ) do? |
Returns where objects of a given name can be found, that is, the packages or environments on the search path that contain an object called what (character string with name of an object) |
|
In R programming, what does ignore.case do? |
Logical indicating if the search should be case-insensitive TRUE by default |
|
In R programming, what is ignore.case set to by default? |
TRUE |
|
In R programming, what does simple.words do? |
Logical If TRUE, the what argument is only searched as whole word |
|
In R programming, what do var, cov and cor do? |
Compute the variance of x and the covariance or correlation of x and y if these are vectors If x and y are matrices, then the covariance (or correlations) between the columns of x and columns of y are computed For cov and cor, one must give a matrix or a data frame for x or give both x and y |
|
In R programming, what does cov2cor do? |
Scales a covariance matrix into corresponding correlation matrix efficiently |
|
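In R programming, what do dput( ) and head( ) do? |
dput() writes a text representation of an R object that can be pasted back into R to recreate it; head() returns the first few elements or rows Used together, as in dput(head(myData)), they create a small, reproducible dataset with which your problem can be reproduced or your question illustrated |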
|
In R programming, what does getwd( ) do? |
List what "Working Directory" is currently set to |
|
In R programming, what does setwd( ) do? |
Sets the working directory to the given location |
|
In R programming, how would you use setwd( )? |
setwd("your/directory/name") |
|
In R programming, what does ls( ) do? |
Returns a vector of character strings giving the names of the objects in the specified environment
Shows what data sets and functions a user has defined When invoked with no argument inside a function, ls returns the names of the function's local variables |
|
In R programming, what does a( ) do? |
From seqinr package This is a vectorized function to convert three-letters amino-acid code into one-letter codes |
|
In R programming, what does aaindex do? |
From seqinr package List of 544 physicochemical and biological properties for the 20 amino acids Format: - H String: accession number in aaindex database - D String: data description - R String: LITDB entry number - A String: Author(s) - T String: Title of article - J String: Journal reference and comments - C String: Accession number of similar entries with the correlation coefficients of 0.8 (-0.8) or more (less). The correlation coefficient is calculated with zeroes filled for missing values - I Numeric named vector: amino acid index data |
|
In R programming, what is the H string of the aaindex? |
Accession number in aaindex database |
|
In R programming, what is the D string of the aaindex? |
Data description |
|
In R programming, what is the R string of the aaindex? |
LITDB entry number |
|
In R programming, what is the A string of the aaindex? |
Author(s) |
|
In R programming, what is the T string of the aaindex? |
Title of article |
|
In R programming, what is the J string of the aaindex? |
Journal reference and comments |
|
In R programming, what is the C string of the aaindex? |
Accession number of similar entries with correlation coefficients of 0.8 (-0.8) or more (less) The correlation coefficient is calculated with zeroes filled for missing values |
|
In R programming, what is the I string of the aaindex? |
Numeric named vector |
|
In R programming, what does <- mean? |
Assignment operator Assigns a value to a variable |
|
In R programming, what does list( ) do? |
Creates a list Lists are generally ordered collections of components |
|
In R programming, what does return( ) do? |
Data gets into the function via function arguments and this data would be returned using the return function |
|
In R programming, what does browser( ) do? |
Enters browser mode Sets a breakpoint in your function Use if (condition) browser( ) to insert a conditional breakpoint or watchpoint |
|
What is an R package (in R programming)? |
A package is a collection of code, documentation and often sample data |
|
In R programming, what is the seqinr package? |
Exploratory data analysis and data visualization for biological sequences (DNA and protein) data Include utilities for sequence data management under the ACNUC system |
|
How does R programming compute a matrix norm of x? |
Using LAPACK |
|
In R programming, if an element of x for a binomial distribution is not an integer, what would the result of dbinom be? |
Zero (0) |
|
In R programming, what algorithm is p(x) for a binomial distribution computed with? |
Loader's algorithm |
|
What is R studio? |
A free IDE for R |
|
What is cargo cult science? |
Practices that have the semblance of being scientific but do not in fact follow the scientific method |
|
What is the Mbp1 component? |
It is an ig-fold transcription factor involved in regulation of cell cycle progression from G1 to S-phase It forms a complex with Swi6p that binds to the MluI cell cycle box regulatory element in promoters of DNA synthesis genes Located on chromosome 4 of yeast Positively regulates transcription by RNA polymerase II Usually found in the nucleus |
|
Mbp1 forms a complex with what that binds to what? |
Forms a complex with Swi6p that binds to MluI cell cycle box regulatory elements |
|
What does Mbp1 positively regulate? |
Transcription by RNA polymerase II; it is also involved in the G1 to S-phase transition of the mitotic cell cycle |
|
If a yeast has a null mutant of Mbp1, what happens to that yeast? |
Abnormal vacuolar and mitochondrial morphology Respiration defects Decreased ethanol tolerance Increased lifespan and budding index Increased resistance to caffeine and desiccation |
|
If a yeast is homozygous diploid null for Mbp1, what happens to that yeast? |
Sensitive to starvation |
|
If a yeast over expresses Mbp1, what happens to that yeast? |
Slow growth Affect cellular morphology and budding |
|
What is the SGD database? |
A web-based Saccharomyces genome database Includes summary, sequence, protein, analyze, function, literature, gene ontology, interactions, regulations and expressions Website is http://www.yeastgenome.org |
|
What is the NCBI? |
National Center for Biotechnology Information Largest international provider of data for genomics and molecular biology Its data is freely and openly available over the internet |
|
What is NCBI's Entrez? |
NCBI's primary text search and retrieval system that integrates the PubMed database of biomedical literature with 39 other literature and molecular databases |
|
What are NCBI's Boolean operators? |
Provide a way of generating precise queries that produce well-defined sets of results AND, OR, NOT Must be written in uppercase and are processed in a left-to-right sequence |
|
What are NCBI's Limits pages? |
Pre-selected popular or useful searches that are available on the Limits page of each Entrez database Selecting any of the boxes intersects the current search with corresponding limited search term |
|
What are NCBI's Wild Cards? |
AKA Truncation searching
Using an asterisk * to represent characters |
|
What are NCBI's GenBank records? |
Archival entries, submitted by independent research projects |
|
What are NCBI's RefSeq? |
Preferred entry to work with Curated, non-redundant databases which solve a number of problems of archival databases |
|
What is NCBI's Swiss-Prot sequence? |
Cross-reference into UniProt, the huge protein sequence database maintained by EBI (European Bioinformatics Institute) which is NCBI's counterpart in Europe Highest annotation standard overall and are expertly curated |
|
What is PubMed's Weighted? |
Applies a weighting algorithm to find broadly relevant information in PubMed |
|
In R programming, what does toupper( ) do? |
Translate characters in character vectors from lower case to upper |
|
What is MySQL? |
A free, open relational database Based on a client-server model Database engine runs as a daemon in the background and waits for connection attempts |
|
Why would you use MySQL over R? |
Scalability, concurrency and ACID compliance |
|
How does MySQL have better scalability than R? |
In theory R is good with large data objects, but not so much in practice when the data is more than the computer can keep in memory all at once MySQL can handle this |
|
What is ACID compliance? |
Atomicity (either succeeds fully with all requested elements or not at all) Consistency (requires that any transaction will bring the database from one valid state to another) Isolation (ensures that any concurrent execution of transactions results in the exact same database state as if the transactions had been executed serially, one after the other) Durability (ensures that a committed transaction remains permanently committed, even in the event that the database crashes or another error occurs) |
|
What is Entity-Relationship Diagram (ERD)? |
A semi-formal diagram that shows the key features of the data model |
|
In R programming: if (!require(seqinr, quietly=TRUE)) { install.packages("seqinr") library(seqinr) } This one if statement actually takes care of three different scenarios/cases. What are they? |
(1) If the package seqinr is already installed and loaded, the entire if statement will evaluate to FALSE and nothing will happen (2) If the package seqinr is installed but not loaded, the require function will load the package. The if statement will evaluate to FALSE again so the contents of the if statement will not be executed (3) If the package is not installed, the if statement will evaluate to TRUE. So the contents of the if statement will be executed. The package will be installed and loaded |
|
In RStudio, when creating a vector such as the one shown: a <- c(1, "d", 3.0, TRUE) When you print the vector a, what class of data will all values in the vector be coerced to? (Check all that apply) A) Logical B) Character C) Integer D) Complex |
B) |
|
In R programming, what is a function of square brackets in R? A) To add an internal or external link B) To search for/define something C) Retrieving elements or slices from matrices D) To exit the program |
C) |
|
In R programming, what is the result of the following statement? !as.logical(0) |
TRUE |
|
Given values: 1, 2, 3, 4, 5 & 6, (or any other set of numbers)
Describe two (or more) ways to compute the mean in R programming |
1) x = 1:6 mean(x)
2) a <- c(1,2,3,4,5,6) mean(a)
3) mean(1:6) mean(6:1) |
|
What is the difference between a vector, list, matrix, and data.frame in R programming |
A vector is a one dimensional collection of a single data type A list is a vector with multiple data types A matrix is a two (or more) dimensional vector of a single data type (a matrix can also be multidimensional) A data.frame is a two dimensional list which can have columns with different data types. |
|
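A brief sketch of the four structures side by side, with made-up values; the last lines also show the coercion that happens when types are mixed in a vector:

```r
v  <- c(1, 2, 3)                                # vector: one dimension, one type
l  <- list(1, "a", TRUE)                        # list: components may differ in type
m  <- matrix(1:6, nrow = 2)                     # matrix: two dimensions, one type
df <- data.frame(id = 1:2, name = c("x", "y"))  # columns may differ in type

# Mixing types in a vector coerces everything to the most general type.
mixed <- c(1, "d", 3.0, TRUE)
class(mixed)   # "character"
dim(m)         # 2 3
is.list(df)    # TRUE: a data frame is a list of equal-length columns
```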
Given the following code in R programming: a <- 1:12; a dim(a) <- c(2,2,3);a dim(a)[1] What will this return? |
2 |
|
What is one way of creating a matrix with 9 rows and 2 columns? Write the code in R programming |
1) Using dim() a<-1:18 dim(a)<-c(9,2); a 2) Using cbind() m<-cbind(1:9, 10:18); m 3) Using matrix() with byrow matrix(c(1:18), nrow=9, byrow=TRUE) 4) Using matrix() matrix(1:18, 9, 2) |
|
In R programming, which of the following takes a quoted string as its argument, and which of the following takes a variable name, without quotation marks? install.packages() library() |
install.packages() - quoted string library() - no quotation marks |
|
What will the output be for the following statement in R programming? f <- c(1,1,2,3,5,8,13,21); f[length(f)-3:length(f)] |
5 3 2 1 1 |
|
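The result follows from operator precedence: ":" binds more tightly than "-". A step-by-step sketch, with a parenthesised variant added here only for contrast (it is not part of the original question):

```r
f <- c(1, 1, 2, 3, 5, 8, 13, 21)

# ":" is evaluated before "-", so this is length(f) - (3:length(f)).
idx <- length(f) - 3:length(f)
idx                            # 5 4 3 2 1 0
f[idx]                         # 5 3 2 1 1 (index 0 selects nothing)

# With parentheses, the last four elements are selected instead.
f[(length(f) - 3):length(f)]   # 5 8 13 21
```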
Is the argument logical in R programming? Why or why not? #sample script: #define a vector a <- c(a, 1, 9, 7, 2, 71, 26) #list its contents a #calculate the mean of its values mean(a) |
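It is not valid. If a has not been defined beforehand, c(a, 1, 9, 7, 2, 71, 26) already fails with an "object 'a' not found" error. If a is meant to be the character "a", the whole vector is coerced to character, the argument of mean() is then not numeric or logical, and mean() returns NA with a warning instead of calculating the mean |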
|
The following R programming code was given: a <- complex(3,3,3) b <- complex(3,5,4) c <- a[3] + b[2] print(c) What would the code give A) 8 + 7i B) 7 + 8i C) 8 + 8i D) 7 + 7i |
A) |
|
What is the difference between the assignment operators: "<-", "<<-" and "=" in R programming? |
"<-" : Assigns value in the environment it is being evaluated in. It can also be used anywhere in the program. "<<-" : Assigns value in a global context, if the variable was assigned a value previously, it is redefined. Usually used in functions to avoid multiple assignments of the same value to a variable. "=" : Assigns value in the environment it is being evaluated in. However, it can only be used in the top level or as a subexpression in a list within braces. |
|
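A small sketch of the difference between "<-" and "<<-" inside functions; the function names bumpLocal and bumpOuter are hypothetical:

```r
counter <- 0

bumpLocal <- function() {
  counter <- counter + 1    # "<-" creates a new local variable; the outer one is untouched
  counter
}

bumpOuter <- function() {
  counter <<- counter + 1   # "<<-" redefines the existing variable in the enclosing environment
  counter
}

bumpLocal()   # returns 1
counter       # still 0
bumpOuter()   # returns 1
counter       # now 1
```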
By default, which assignment operator is used to assign values to a variable in R? A) ? B) <- C) <<- D) == |
B) |
|
What is the output of the following in R programming? complex(r=5,4,6) |
5+6i, 5+6i, 5+6i, 5+6i |
|
Predict the output of the following R programming code: c <- 3 b <- 4 (c + 10) + b b != c c > b |
> (c + 10) + b [1] 17 (3 + 10) + 4 = 17 > b != c [1] TRUE b does NOT equal c > c > b [1] FALSE c is NOT greater than b |
|
When employing a function in R programming, its arguments must always be listed in a specific predefined order. True or False? |
False If one simply lists the values of the arguments then they must be listed in a predefined order. However, when values are assigned via their argument name the order is no longer important |
|
What is the output of the following R code: a <- c(2,4,6,8) a[a[a<4]] |
> a<4 = TRUE FALSE FALSE FALSE > a[a<4] = 2 > a[a[a<4]] = a[2] = 4 The output is 4 |
|
How can you find the median of the following set of data in R programming? 55, 23, 132, 1, 3, 43, 11 |
1) Create a vector using the above data a<- c(55, 23, 132, 1, 3, 43, 11) median(a) 2) Using the vector, order the data, and using the value of length(a) simply use the value of the median number a<- c(55, 23, 132, 1, 3, 43, 11) a<- a[order(a)] length(a) a[4] |
|
Given the following R code: x <- matrix(c(1,2,3,11,13,12),4,3) What are the following outputs of the matrix 'x'? A) x[1,2] B) x[4,3] C) x[nrow(x),ncol(x)] D) x[ncol(x),nrow(x)] E) x[as.integer(TRUE),TRUE] |
A) 13 (the six values are recycled and filled in column by column, so the second column is 13, 12, 1, 2) B) 12 C) 12 D) Error: subscript out of bounds E) 1 13 3 |
|
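Printing the matrix makes the answers easier to verify, because matrix() fills column by column and recycles the six values to fill all twelve cells:

```r
x <- matrix(c(1, 2, 3, 11, 13, 12), 4, 3)
x
#      [,1] [,2] [,3]
# [1,]    1   13    3
# [2,]    2   12   11
# [3,]    3    1   13
# [4,]   11    2   12

x[1, 2]               # 13
x[nrow(x), ncol(x)]   # 12, the bottom-right element
x[1, TRUE]            # 1 13 3, the whole first row
```

x[ncol(x), nrow(x)] asks for row 3, column 4, and there is no fourth column, hence the subscript error.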
Following the installation of the package:seqinr, student A decides to test the function that allows for her to change three letter AA codes into their respective one letter code. Upon typing >a("phe") in R programming, she gets an error that tells her the amino acid does not exist. Why does this happen and what does she have to do to achieve her desired results? |
R is a case sensitive language Student A needs to capitalize the "p" in "phe" in order to use the function If she instead enters >a("Phe"), she should be rewarded with her desired result: [1] "F" |
|
When extracting components from a list in R programming (i.e. pKA23 <- list(size=4000, marker="kan", ori="ColE1", BanI=c(240, 450, 600, 3000) ) ), what is the difference between pKA23[ [2] ] and pKA23[2]? |
pKA23[ [2] ] gives output of "kan", which is the defined value of the second object in the pKA23 list, "marker" pKA23[2] gives the output of: "$marker" [1] "kan" |
|
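The same list can be used to check the difference directly; class() shows that double brackets return the component itself, while single brackets return a shorter list:

```r
pKA23 <- list(size = 4000, marker = "kan", ori = "ColE1",
              BanI = c(240, 450, 600, 3000))

pKA23[[2]]          # "kan": the component itself
pKA23[2]            # a list of length 1 that still carries the name "marker"
class(pKA23[[2]])   # "character"
class(pKA23[2])     # "list"
pKA23$marker        # same as pKA23[[2]]
```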
In R programming, if a <- c(4, 5, 6, 7, 8, 9, 10), what is a[seq(2,6,2)]? A) 4 5 6 B) 5 7 9 C) 8 9 10 D) 5 6 7 |
B) |
|
Given: a <- c(1,2,3,4) b <- c(2,3,4,5) What elements of 'a', when specified in the following expression, enter R into the "Browser Mode"? What brackets do we use for this specification? > if (a < b[3]) browser () |
Square brackets are used to specify elements Elements "1", "2" and "3" of 'a' will, when specified in the above expression, enter us into Browser Mode For element "1" code would be: > if (a[1] < b[3]) browser () |
|
In R programming:
f<- c(1,2,3,4,5,6,7,8,9,10)
How would you retrieve the first, second and fifth item together? |
cat(f[1], f[2], f[5])
OR
cat("First:", f[1], " Second:", f[2], " Fifth:", f[5]) |
|
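Besides cat(), indexing with a vector of positions retrieves the items together as a single vector, which is usually more convenient for further computation. A short sketch:

```r
f <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
f[c(1, 2, 5)]   # 1 2 5
```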
Where can the indexed terms of a specific field be found in NCBI's Entrez system? |
The indexed terms can be found on the Advanced Search Page |
|
Both Mbp1 and Swi6p bind directly to DNA? True or False |
False Only Mbp1 can bind directly to DNA and it can do so without Swi6p |
|
The N-terminus of Mbp1 binds to _______ and the C-terminus binds to _______ . |
DNA (N-terminus) Swi6p (C-terminus) |
|
What is genome annotation? |
Genome annotation is the process of attaching biological information to sequences It can be automatic or manual (curated) |
|
What does this code output in R programming? s="i love ramen!" substr(s,8,12) |
ramen The substr() function returns a substring of the parameter 's', from indices 8 to 12 |
|
In R programming, what does this code output? s="i love ramen!" substr(s,8,12) <- "frogs" s |
i love frogs! In this case, substr() does not return a value, but changes the content of 's'; in particular, it replaces the indices 8-12 with 'frogs' |
|
MySQL, MariaDB, and PostgreSQL are all examples of what? |
Relational databases |
|
How would you do an Entrez search for items with either Mbp1 or Swi6p intersected with a search for regulators but excluding any results with human using Boolean operators? |
(((Mbp1 OR Swi6p) AND regulators) NOT human) |
|
Given the following of R programming: > randomNumber <- function(len=1, MIN=0, MAX=1000) { return(floor(runif(len, min=MIN, max=MAX))) } Write the proper code you would use to debug the function in the example above. In addition, how would one exit the debugging mode of said function? |
debug(randomNumber) -> to enter debugging mode undebug(randomNumber) -> to exit debugging mode. |
|
Which of the following are TRUE regarding NCBI's Entrez system? A) Entrez integrates data with links only within databases B) There is only one way of entering 'gene' database homepage. C) Boolean operator AND is not case-sensitive. D) Individual search terms separated by spaces are normally automatically combined as if they were joined by OR operators |
All are FALSE |
|
Which of the following are useful applications/places to store your lab notes electronically? A) Evernote B) Google Keep C) Microsoft OneNote D) The Student Wiki |
All |
|
In NCBI's Entrez, what is problematic about the search term cat*? |
This is an example of a truncation search The search (cat*) will give incomplete results, because truncation searches can only use the first 600 variations of the search term. |
|
How would NCBI's Entrez generate a search based on the following?
Ghrelin AND (bipolar OR schizophrenia) What purpose do the parentheses serve? |
The union of bipolar and schizophrenia results is processed first followed by the result of the ghrelin search
The information inside the parentheses is processed first and will override the default left to right processing |
|
How can R be used to organize data? |
Connect R to a database like mySQL, MariaDB Use data.frame() to keep complex data Use read.table() |
|
What is the cardinality of the relationship between DNA sequence and protein? A) 1:2 B) 1:1 C) 1:n D) n:n E) A and B F) B and C |
F) |
|
When constructing an Entity-Relationship Diagram for a protein, why is it important to have a Unique Identifier, Remove Redundant Data, and Create Separate Tables for attributes that do not depend on our protein? |
Make the data model more efficient, and internally consistent Remove redundant information Accommodating different features |
|
When running a Protein BLAST, what is the E value and does it increase or decrease as we go down the list (towards less significant alignments)? |
The E value (the Expect Value) increases as the sequence alignments become less significant (the closer to zero = the more significant the match is) |
|
What is Glycine structure, short form and properties? |
Glycine, Gly, G Only has a hydrogen atom as a side chain Aliphatic side chain Does not have L or D forms Very flexible, maybe the most flexible amino acid, and the smallest Allows close packing and van der Waals forces Hydrophobic Simplest amino acid, has only a single hydrogen for an R group and this hydrogen is not a good hydrogen bond former Glycine's solubility properties are influenced mainly by its polar amino and carboxyl groups and thus glycine is best considered a member of the polar, uncharged group Except for glycine, all the amino acids isolated from proteins have 4 different groups attached to the alpha-carbon atom Glycine is sterically the most adaptable of the amino acids and it accommodates conveniently to other steric constraints in the beta-turn |
|
What is Alanine structure, short form and properties? |
Alanine, Ala, A Has a methyl group for R-chain Aliphatic side chain Most generic Non-polar Hydrophobic |
|
What is Valine structure, short form and properties? |
Valine, Val, V Beta branched Large aliphatic chain Non-polar Hydrophobic due to aliphatic chain Terrible alpha-helix former due to beta branch |
|
What is Leucine structure, short form and properties? |
Leucine, Leu, L Aliphatic side chain Most common amino acid in proteins Great alpha-helix former Hydrophobic Non-polar |
|
What is Isoleucine structure, short form and properties? |
Isoleucine, Ile, I Aliphatic side chain Non-polar Hydrophobic |
|
What is Serine structure, short form and properties? |
Serine, Ser, S Aliphatic hydroxyl side chain Good hydrogen bond-forming moieties Hydrophilic Polar EN negative |
|
What is Threonine structure, short form and properties? |
Threonine, Thr, T Beta branched Aliphatic hydroxyl side chain Polar EN negative Hydroxyl group AND methyl group Good hydrogen bond-forming moieties Hydrophilic |
|
What is Phenylalanine structure, short form and properties? |
Phenylalanine, Phe, F Aromatic side chain Benzyl group R-side chain Hydrophobic Absorbs ultraviolet light above 250nm |
|
What is Tyrosine structure, short form and properties? |
Tyrosine, Tyr, Y Aromatic side chain Amphipathic Hydrophobic with polar properties Good hydrogen-bond forming moieties Also has non-polar characteristics due to its aromatic ring and could arguably be placed in the non-polar group (Has pKa of 10.1, its phenolic hydroxyl is a charged, polar entity at high pH) Absorbs ultraviolet light above 250nm |
|
What is Tryptophan structure, short form and properties? |
Tryptophan, Trp, W Aromatic side chain Has indole ring R-side chain which gives it absorption of 290nm light Nitrogen on indole ring gives it hydrogen donor potential Considered a borderline member of aromatic side chain group because it can interact favourably with water via the N-H moiety of indole ring |
|
What is Cysteine structure, short form and properties? |
Cysteine, Cys, C Contains sulfur EN negative Can deprotonate at pH values greater than 7 Hydrophilic Can also be considered hydrophobic because of its sulfide and found buried inside the protein Can create disulfide bridges with other cysteines |
|
What is Methionine structure, short form and properties? |
Methionine, Met, M Contains sulfur EN negative Often the first amino acid to be cut off as it is the initiator amino acid Amphipathic (least polar of the amphipathic amino acids but its thioether sulfur can be an effective metal ligand in proteins) Hydrophobic |
|
What is Aspartate structure, short form and properties? |
Aspartate, Asp, D Polar Acidic Hydrophilic |
|
What is Glutamate structure, short form and properties? |
Glutamate, Glu, E Polar Acidic |
|
What is Asparagine structure, short form and properties? |
Asparagine, Asn, N Good hydrogen-bond forming moieties Hydrophilic To test what a D is doing in a protein, you can change it to N to see if it does anything or if it kills the protein Polar Amide R-side chain Uncharged (it is the amide of the acidic amino acid aspartate) |
|
What is Glutamine structure, short form and properties? |
Glutamine, Gln, Q Hydrophilic Polar Uncharged (it is the amide of the acidic amino acid glutamate) |
|
What is Lysine structure, short form and properties? |
Lysine, Lys, K Has four methylene groups in a row, making that part of the side chain hydrophobic The charged head pokes out and interacts with water while the chain hides inside the protein, called snorkeling Head is different from rest of protein Basic Lysine contains a protonated alkyl amino group Side chains are protonated under physiological conditions and participate in electrostatic interactions in proteins Amphipathic Can be considered amphipathic because its R group consists of an aliphatic side chain, which can interact with hydrophobic amino acids in proteins, and an amino group that is normally charged at neutral pH Polar |
|
What is Arginine structure, short form and properties? |
Arginine, Arg, R Very positively charged and never loses the charge under natural circumstances Very important for binding substances Basic Arginine contains a guanidinium group Side chains are protonated under physiological conditions and participate in electrostatic interactions in proteins Hydrophilic Has resonance structure due to double bond |
|
What is Histidine structure, short form and properties? |
Histidine, His, H Has an imidazole ring with resonance structures in which the position of the positive charge can change It can take up or lose its proton easily Basic The side chains of the other basic amino acids are fully protonated at pH 7, but histidine, with a side chain pKa of 6, is only about 10% protonated at pH 7 With a pKa near neutrality, histidine side chains play important roles as proton donors and acceptors in many enzyme reactions Hydrophilic |
|
What is Proline structure, short form and properties? |
Proline, Pro, P Is an imino acid, not strictly an amino acid, because its side chain is cyclic Side chain is cyclic and forms a ring via a covalent bond with the backbone nitrogen atom Cyclic ring makes it a very rigid structure and makes kinks in chains If you put it in an alpha helix, it would break and bend the helix In folded proteins, proline is often found in the bends and turns Non-polar Hydrophobic Proline has a cyclic structure and a fixed phi angle, so, to some extent, it forces the formation of a beta-turn |
|
What are the hydrophobic amino acids? |
Ala, Cys, Ile, Leu, Met, Phe, Val |
|
What are the hydrophilic amino acids? |
Basic: Arg, Lys Acidic: Asp, Glu Polar: Asn, Gln, His |
|
What is the most common amino acid? |
Leu |
|
What are the side chain pKa values of the amino acids R, K, C, H, E, D? |
R = 12.5 K = 10.5 C = 8.3 H = 6.0 E = 4.3 D = 3.9 |
|
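These pKa values can be turned into the fraction of side chains that carry their proton at a given pH with the Henderson-Hasselbalch relation, fraction = 1 / (1 + 10^(pH - pKa)). A small sketch; the function name fracProtonated is an illustrative assumption:

```r
pKa <- c(R = 12.5, K = 10.5, C = 8.3, H = 6.0, E = 4.3, D = 3.9)

# Fraction of side chains carrying the proton at a given pH.
fracProtonated <- function(pKa, pH) 1 / (1 + 10^(pH - pKa))

round(fracProtonated(pKa, pH = 7), 3)
```

For histidine this gives 1/11, about 0.09, which is where the "roughly 10% protonated at pH 7" figure comes from. At pH equal to the pKa the fraction is exactly one half.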
What are the top helix forming amino acid residues? |
Glu, Met, Ala, Leu, Lys |
|
What are the best helix-breaking amino acid residues? |
Gly, Pro, Asn, Tyr, Cys |
|
Why would you want to keep the GeneSequence info in a relational database? |
To see if the gene is part of a gene cluster and see what its possible functions are Promoters including upstream and downstream cofactors Alternative splicing Conserved sequences in genome (look for dN/dS scores) |
|
What are SMILES strings? |
Linear strings of text that uniquely define a chemical molecule even if the molecule is cyclic |
|
What amino acid is B? |
D or N |
|
What amino acid is J? |
I or L |
|
What amino acid is O? |
Pyrrolysine |
|
What amino acid is U? |
Selenocysteine |
|
What amino acid is X? |
Unknown |
|
What amino acid is Z? |
E or Q |
|
What are the hydrophobic amino acids? |
FAMILYVW |
|
What is the yellow "key" in an MySQL Workbench? |
Primary key |
|
What is the red diamond in an MySQL Workbench? |
Foreign key |
|
What is a foreign key in MySQL workbench? |
Information they reference is not in our schema but somewhere else |
|
What is the white diamond in an MySQL workbench? |
Normal attributes |
|
What is the green diamond in MySQL Workbench? |
Cannot be "NULL" |
|
How do you create an empty list in R programming? |
db<- list() |
|
What does str() do in R programming? |
Compactly displays the internal structure of an R object A diagnostic function and an alternative to summary |
|
What does strOptions() do in R programming? |
Convenience function for setting options for str() |
|
What does setDataPart() do in R programming? |
Called to implement object@.Data Used to merge the data part of a superclass prototype |
|
What does gsub() do in R programming? |
Replaces all matches of a pattern in a string (its companion sub() replaces only the first match) |
|
What does this code do in R programming? gsub("[^a-zA-Z]", "", seq) |
Replaces anything that is not a letter a to z, lower or upper case, with nothing |
|
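A short sketch of the difference between gsub() and sub() on a made-up raw sequence string (the string is invented for illustration and stands in for seq):

```r
# A made-up raw sequence string with digits, spaces and a line break.
raw <- "1 mkvl tesq\n11 ARNd"

gsub("[^a-zA-Z]", "", raw)   # "mkvltesqARNd": every non-letter is removed
sub("[^a-zA-Z]", "", raw)    # only the first non-letter (the leading "1") is removed
```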
What does missing() do in R programming? |
Checks whether the argument has been provided |
|
What does computePI() do in R programming? |
From seqinr package This function calculates the theoretical isoelectric point of a protein This estimate does not account for the post-translational modifications |
|
What does strsplit() do in R programming? |
Splits the elements of a character vector x into substrings according to the matches to substring split within them |
|
What does unlist() do in R programming? |
Given a list structure x, unlist simplifies it to produce a vector which contains all the atomic components which occur in x |
|
What does pmw() do in R programming? |
With default parameter values, returns the apparent molecular weight of one mole (6.0221415 × 10^23 molecules) of the input protein, expressed in grams, at sea level on Earth with terrestrial isotopic composition |
|
What does AAstat() do in R programming? |
Returns simple protein sequence information, including the number of residues, the percentage of physico-chemical classes and the theoretical isoelectric point |
|
What is isoelectric point? |
The pH at which the protein carries no net electrical charge |
|
Why is it bad for a data model to have new values assigned directly to its elements? |
The whole model can end up in an inconsistent state. It is much better to write functions that get and set data elements and also keep the data consistent |
|
What would a setData function have to look like? |
1) Create a new entry if the requested row of a table does not exist yet 2) Update data if the protein exists 3) Perform a consistency check (check that the data has the correct type) 4) Perform a sanity check (check that data values fall into the expected range) 5) Perform a completeness check (handle incomplete data) |
|
What are regular expressions? |
A concise description language for defining patterns for pattern matching in strings |
|
What is RegexPal? |
A JavaScript regular expression tester. Gives immediate visual feedback. Website: http://regexpal.com |
|
In RegexPal, how do you specify more than one character to match? |
Place the characters in square brackets: [lq] matches l OR q |
|
How do you do ranges in RegexPal? |
[1-5] or [a-z] |
|
How do you do exclusions in RegexPal? |
Use a caret (^) as the first character inside the brackets: [^0-9] or [^a-z] |
|
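The same three bracket constructs (sets, ranges, exclusions) also work in R's own regular expression functions. A small sketch, assuming a made-up vector of test strings `x` (all names here are illustrative):

```r
# Hypothetical test strings
x <- c("lamp", "quiz", "123", "abc9")

# Set: does the string contain l OR q?
has_l_or_q <- grepl("[lq]", x)

# Range: does the string contain a digit from 1 to 5?
has_1_to_5 <- grepl("[1-5]", x)

# Exclusion: delete everything that is NOT a lowercase letter
only_lower <- gsub("[^a-z]", "", x)

print(has_l_or_q)
print(has_1_to_5)
print(only_lower)
```

Note that the caret only means "not" when it is the first character inside the brackets.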
Can you use commas as ands in RegexPal? |
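Yes, unless there is a comma in the sequence, in which case it poses a problem: inside square brackets a comma is matched as a literal character, not treated as a separator |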
|
We have learnt that it is more convenient to write set and get data functions in order to edit components of data models. List the five characteristics of an appropriate set data function |
1) Create a new entry if the requested row of a table does not exist yet 2) Update data if the protein exists 3) Perform consistency checks (i.e. check that the data has the correct type) 4) Perform sanity checks (i.e. check that data values fall into the expected range) 5) Perform completeness checks (i.e. handle incomplete data) |
|
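The five characteristics can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the one-table model, the column names `id` and `len`, the helper `newTable()` and the example values are all hypothetical and not taken from the course's actual data model.

```r
# Hypothetical one-table model: protein ID and sequence length
newTable <- function() {
  data.frame(id = character(0), len = numeric(0), stringsAsFactors = FALSE)
}

setData <- function(tbl, id, len) {
  # Completeness check: all values must be supplied and must not be NA
  if (missing(id) || missing(len) || is.na(id) || is.na(len)) {
    stop("id and len have to be specified")
  }
  # Consistency check: values must have the correct type
  if (!is.character(id) || !is.numeric(len)) {
    stop("id must be character and len must be numeric")
  }
  # Sanity check: values must fall into the expected range
  if (len <= 0) {
    stop("len must be positive")
  }
  row <- which(tbl$id == id)
  if (length(row) == 0) {
    # Create a new entry if the requested row does not exist yet
    tbl <- rbind(tbl, data.frame(id = id, len = len, stringsAsFactors = FALSE))
  } else {
    # Update data if the protein exists
    tbl$len[row] <- len
  }
  tbl
}

db <- newTable()
db <- setData(db, "protA", 120)   # creates a new row
db <- setData(db, "protA", 125)   # updates the existing row
print(db)
```

Returning the modified table, rather than changing a global object, keeps every update going through the checks.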
Why would we not consider RefSeq ID and Uniprot ID "foreign keys"? |
They refer us to the NCBI/EBI websites; the records they point to are not in our schema |
|
Which of the following is/are sequence analysis tools found in the EMBOSS package? A) pepstats B) tmap C) shuffleseq D) a and b E) all of a, b and c |
E) |
|
What is one way a sequence string can be split into a vector of characters? |
1) Use the s2c() function in the seqinr package: a <- "apple"; a <- s2c(toupper(a)); a then prints [1] "A" "P" "P" "L" "E" 2) Use strsplit() and unlist(): a <- "apple"; a <- toupper(a); a <- strsplit(a, ""); a <- unlist(a); a then prints [1] "A" "P" "P" "L" "E" |
|
Given the nucleotide sequence below, what are two ways to extract only the sequence using regular expressions? >ENSONIE00000000371_116_T11948TTTCACCGTTCCCACACCTTAAAGCGGAATGGAGAAGAGCGGGAGGCAGAGAGGAAAGGAAAGACCGAGACAGAGAATGAAAGGAGGGGTAAACCGGGGCGATATCCTCTTTACCTGACCGGGTTGCTCACCTGAGCGGACTCACCTGTCCCGACGCCGAAAATACTTTTCTTTAGCGCTTGGGAACAAATCTGGTTGAGAGGAAAGGTGTGNCCGGGAAAGCGGAACTGGAGTGAACTCCCTGATCATGAGCGAGGGGACGTCTACATCCC |
1) Use [A,C,T,G] to select only the sequence and copy it to a new string (e.g. assign it to a vector) 2) Use [^A,C,T,G] to select every character except the bases and delete them from the original string |
|
In the data model developed, there is an xref_table that holds two attributes: 1) the type of cross-reference (e.g. PubMed), from which the URL/accession is constructed 2) the key (e.g. 10747782). Why is the type of cross-reference (i.e. xref_type_id) included as a foreign key rather than as an attribute of the xref_table? |
The same cross-reference type may be described by different strings. By storing each description type in its own table (xref_type) and linking the tables via a foreign key (xref_type_id), we make sure that we get all of the cross-references associated with our protein/features, regardless of variation in the description strings |
|
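A rough sketch of this design using two base R data frames. Only the table names, the `xref_type_id` column and the key 10747782 come from the card; the other column names and the placeholder keys are assumptions made for illustration.

```r
# Type table: each cross-reference type is described exactly once
xref_type <- data.frame(
  xref_type_id = c(1, 2),
  description  = c("PubMed", "RefSeq"),
  stringsAsFactors = FALSE
)

# Cross-reference table: stores the foreign key, not the description string
xref <- data.frame(
  xref_id      = 1:3,
  xref_type_id = c(1, 1, 2),
  key          = c("10747782", "placeholder-1", "placeholder-2"),
  stringsAsFactors = FALSE
)

# Look up the type ID once, then select every cross-reference of that type
pubmed_id <- xref_type$xref_type_id[xref_type$description == "PubMed"]
hits <- xref[xref$xref_type_id == pubmed_id, ]
print(hits)
```

Because the rows are matched on the ID, a spelling variant of the description can never cause a cross-reference to be missed.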
Consider the code below. Suppose your data overlapped your legend, what can you do to move the legend location? legend (x = 1, y = -1, legend = c("charged (+)", "charged (-)", "hydrophilic", "hydrophobic", "plain"), bty = "n", fill = c(chargePlus, chargeMinus, hydrophilic, hydrophobic, plain) ) |
The x and y parameters of the legend() function allow you to specify where the legend is displayed. Replacing the values with, say, x = 10 while keeping y = -1 would do the trick |
|
Consider the following sequence: seq <- "1111zzzzz2222YYYYY333$$$TTT///" Using RStudio, how would you extract only the special characters ($ and /)? How would you extract only the upper case letters? |
1) To extract only special characters ($ and /), use the following command: seq <- gsub("[^$/]", "", seq) 2) To extract only the upper case letters, use the following command: seq <- gsub("[^A-Z]", "", seq) |
|
What are the regular expression(s) to find all amino acids but exclude the ones with the letter code 'n' in the following sequence? 1 msnqiysary sgvdvyefih stgsimkrkk ddwvnathil kaanfakakr trilekevlk 61 ethekvqggf gkyqgtwvpl niakqlaekf svydqlkplf dftqtdgsas pppapkhhha 121 skvdrkkair sastsaimet krnnkkaeen qfqsskilgn ptaaprkrgr pvgstrgsrr 181 klgvnlqrsq sdmgfprpai pnssisttql psirstmgpq sptlgileee rhdsrqqqpq 241 qnnsaqfkei dledglssdv epsqqlqqvf nqntgfvpqq qssliqtqqt esmatsvsss 301 pslptspgdf adsnpfeerf pgggtspiis miprypvtsr pqtsdindkv nkylsklvdy |
There are two ways: 1) [a-mo-z] (find all the a-m and o-z characters) 2) [^ 0-9n] (exclude the space, digit and n characters) |
|
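Both patterns can be checked in base R on a short excerpt of the sequence. This is a sketch using regmatches() and gregexpr(); the variable names are illustrative.

```r
# First two blocks of the numbered sequence from the question
x <- "1 msnqiysary sgvdvyefih"

# 1) Find every character in the ranges a-m and o-z
m1 <- regmatches(x, gregexpr("[a-mo-z]", x))[[1]]

# 2) Find every character that is not a space, a digit or n
m2 <- regmatches(x, gregexpr("[^ 0-9n]", x))[[1]]

# Both approaches select the same residues
print(identical(m1, m2))
print(paste(m1, collapse = ""))
```

The two patterns agree only because the input holds nothing but lowercase letters, digits and spaces; any other character would be kept by the second pattern but not by the first.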
Does the protein Mbp1 have any coiled-coil motifs? If so, how many? How long? |
Mbp1 has two coiled-coil motifs, as shown by the pepcoil tool. The first is located at residues 627-655, with a length of 29, and the second ranges from 740-767, with a length of 28 |
|
What information can we determine from pepstats using EMBOSS tools? Be sure to give specific examples. |
We obtain calculated statistical information relating to the properties of the protein (e.g. molecular weight, charge, isoelectric point, extinction coefficients, etc.) |
|
What do these lines of code do? if (missing(database) | missing(table) | missing(id)) { stop("database, table and id have to be specified") } |
missing() within a function checks whether the argument has been provided. The vertical bar means "OR". Therefore, if any of the three arguments has not been provided, the condition is TRUE and the stop() function is executed |
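A runnable sketch of this check inside a function. The function name `fetchRow()` and the toy `db` list are hypothetical; only the if/stop lines are taken from the card.

```r
# Hypothetical getter that looks up one row of one table by ID
fetchRow <- function(database, table, id) {
  if (missing(database) | missing(table) | missing(id)) {
    stop("database, table and id have to be specified")
  }
  tbl <- database[[table]]
  tbl[tbl$id == id, ]
}

# Toy database: a list holding a single table
db <- list(protein = data.frame(id = c("p1", "p2"), len = c(100, 200)))

# All three arguments supplied: the row is returned
ok <- fetchRow(db, "protein", "p2")

# id omitted: stop() raises an error, captured here as a message string
err <- tryCatch(fetchRow(db, "protein"),
                error = function(e) conditionMessage(e))

print(ok)
print(err)
```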