199 Cards in this Set

The field of bioinformatics is split into two areas. What are they?

1) Computing and modeling
2) Data management

In R programming, what is stringsAsFactors used for?

stringsAsFactors is an argument of data.frame() and read.table() that controls whether character columns are converted to factors. With stringsAsFactors = FALSE, strings are kept as plain character strings
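A minimal sketch of the difference, using a small hypothetical vector of gene names. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the argument mainly matters when you want factors or run older R versions.

```r
# Hypothetical gene names, used only for illustration
genes <- c("MBP1", "SWI6", "SWI4")

df_factor <- data.frame(gene = genes, stringsAsFactors = TRUE)
df_char   <- data.frame(gene = genes, stringsAsFactors = FALSE)

class(df_factor$gene)   # "factor": strings converted to a categorical variable
class(df_char$gene)     # "character": strings kept as strings
levels(df_factor$gene)  # "MBP1" "SWI4" "SWI6": factor levels are sorted
```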

In R programming, what would set.seed(#) do?

Sets the starting state (seed) of R's pseudo-random number generator, so the "random" numbers generated afterwards are reproducible: the same seed always gives the same sequence
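A short illustration: resetting the same seed (123 is an arbitrary choice) reproduces exactly the same pseudo-random numbers.

```r
set.seed(123)             # fix the generator's starting state
first  <- runif(3)        # three pseudo-random numbers in [0, 1)
set.seed(123)             # reset to the same state
second <- runif(3)
identical(first, second)  # TRUE: same seed, same sequence
```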

In R programming, is 'for' a loop?

Yes

In R programming, what is this saying:
for (I in 1:11)

Repeat the loop body 11 times, assigning the loop variable I the values 1, 2, ..., 11 in turn
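A small sketch of such a loop; the vector name squares is only illustrative. The body runs once for each of the 11 values.

```r
squares <- numeric(0)
for (i in 1:11) {     # i takes the values 1, 2, ..., 11 in turn
  squares[i] <- i^2   # the body runs once per value of i
}
squares               # 1 4 9 ... 121
```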

In R programming, how do you select rows 1 to 10 of columns 1 and 2 of a data frame?

myData[c(1,2,3,4,5,6,7,8,9,10), c(1,2)]
OR
myData[1:10, 1:2]

In R programming, how do you select rows 1 to 10 of the first two columns in reverse order?

myData[10:1, 1:2]

In R programming, how do you select rows 1 to 10 of the first two columns in reverse order, but without the third row of the result?

(myData[10:1, 1:2])[-3, ]
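The slicing cards above can be tried on a small stand-in for myData. The data frame below, including its gene names and expression values, is hypothetical and chosen only so the results are easy to check.

```r
# Hypothetical stand-in for myData: 12 genes, an expression column and a final column
myData <- data.frame(
  genes = c("qcr2", "mbp1", "qri1", "swi6", "cln3", "qdr2",
            "swi4", "cdc28", "qns1", "whi5", "sic1", "far1"),
  expr  = c(5, 3, 9, 1, 7, 2, 8, 6, 4, 10, 12, 11),
  final = c(20, 45, 12, 88, 30, 88, 15, 50, 60, 25, 40, 35),
  stringsAsFactors = FALSE)

myData[1:10, 1:2]           # rows 1-10, columns 1-2
rev10 <- myData[10:1, 1:2]  # the same rows in reverse order
rev10[-3, ]                 # drop the 3rd row of that result (original row 8)
```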

In R programming, how do you select rows 1 to 10 of the first two columns in random order?

sample(1:10) (a random permutation of 1 to 10)
myData[sample(1:10), 1:2]

In R programming, how do you select rows 1 to 10 of the first two columns, ordered by the value in the second column (ascending)?

myData[1:10, 2]
order(myData[1:10, 2])
myData[order(myData[1:10, 2]), 1:2]
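A sketch of random and sorted row order on a hypothetical myData (set.seed() is added only so the shuffle is reproducible). Note that order() returns row positions, not values.

```r
# Hypothetical stand-in for myData
myData <- data.frame(
  genes = c("qcr2", "mbp1", "qri1", "swi6", "cln3", "qdr2",
            "swi4", "cdc28", "qns1", "whi5", "sic1", "far1"),
  expr  = c(5, 3, 9, 1, 7, 2, 8, 6, 4, 10, 12, 11),
  final = c(20, 45, 12, 88, 30, 88, 15, 50, 60, 25, 40, 35),
  stringsAsFactors = FALSE)

set.seed(1)                            # only so the shuffle is reproducible
shuffled <- myData[sample(1:10), 1:2]  # rows 1-10 in random order

order(myData[1:10, 2])                 # positions that would sort column 2: 4 6 2 9 1 8 5 7 3 10
sorted <- myData[order(myData[1:10, 2]), 1:2]
sorted$expr                            # 1 2 3 ... 10
```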

In R programming, how do you select columns 1 and 2 of all rows whose gene names begin with "q"?

substr(myData[,"genes"], 1, 1) == "q"
myData[substr(myData[,"genes"], 1, 1) == "q", 1:2]
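Applied to a hypothetical myData, the logical vector from substr() picks out the rows whose gene name starts with "q".

```r
# Hypothetical stand-in for myData
myData <- data.frame(
  genes = c("qcr2", "mbp1", "qri1", "swi6", "cln3", "qdr2",
            "swi4", "cdc28", "qns1", "whi5", "sic1", "far1"),
  expr  = c(5, 3, 9, 1, 7, 2, 8, 6, 4, 10, 12, 11),
  final = c(20, 45, 12, 88, 30, 88, 15, 50, 60, 25, 40, 35),
  stringsAsFactors = FALSE)

starts_q <- substr(myData[, "genes"], 1, 1) == "q"  # TRUE where the name begins with "q"
myData[starts_q, 1:2]                               # qcr2, qri1, qdr2, qns1
```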

In R programming, how do you select the rows of the genes with the highest final expression level?

myData[myData[,ncol(myData)] == max(myData[,ncol(myData)]), ]

In R programming, what is ncol?

ncol(x) returns the number of columns of x, so myData[, ncol(myData)] selects the final column
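Using a hypothetical myData, ncol() gives the index of the final column, which the highest-expression card relies on; rows tied for the maximum are all returned.

```r
# Hypothetical stand-in for myData
myData <- data.frame(
  genes = c("qcr2", "mbp1", "qri1", "swi6", "cln3", "qdr2",
            "swi4", "cdc28", "qns1", "whi5", "sic1", "far1"),
  expr  = c(5, 3, 9, 1, 7, 2, 8, 6, 4, 10, 12, 11),
  final = c(20, 45, 12, 88, 30, 88, 15, 50, 60, 25, 40, 35),
  stringsAsFactors = FALSE)

ncol(myData)                    # 3: the number of columns
last <- myData[, ncol(myData)]  # so this selects the last column ("final")
myData[last == max(last), ]     # every row tied for the highest final value
```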

What type of file format is FASTA?

Flat file format

What is the R programming language?

It is a statistics environment and programming language that is exceptionally well engineered
It is a free platform for data manipulation and analysis

In R programming, what does the "#" character mark?

The rest of the line after # is a comment and is not executed by R

In R programming, what does help(rnorm) do?

Opens the help page for rnorm, which documents the normal distribution functions (dnorm, pnorm, qnorm and rnorm)

In R programming, what does dnorm do?

Gives the density of the normal distribution

In R programming, what does pnorm do?

Gives the distribution function (cumulative probability) of the normal distribution

In R programming, what does rnorm do?

Generates random deviates from the normal distribution
The length of the result is determined by the argument n
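A quick sketch of the four normal-distribution functions; the specific arguments (1.96, mean = 10, sd = 2, seed 42) are arbitrary illustration values.

```r
dnorm(0)      # density of N(0, 1) at its mean: 1/sqrt(2*pi), about 0.399
pnorm(1.96)   # P(X <= 1.96), about 0.975
qnorm(0.975)  # inverse of pnorm: about 1.96
set.seed(42)
x <- rnorm(n = 5, mean = 10, sd = 2)  # n sets the length of the result
length(x)     # 5
```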

In R programming, what does dbinom do?

Gives the density (probability mass function) of the binomial distribution

In R programming, what does pbinom do?

Gives the distribution function of the binomial distribution

In R programming, what does qbinom do?

Gives the quantile function of the binomial distribution

In R programming, what does rbinom do?

Generates random deviates from the binomial distribution
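The same d/p/q/r pattern for the binomial distribution, here with size = 4 and prob = 0.5 as arbitrary example parameters. The last line also shows that dbinom() returns 0 for a non-integer x (R additionally issues a warning, suppressed here).

```r
# X ~ Binomial(size = 4, prob = 0.5), e.g. heads in four fair coin tosses
dbinom(2, size = 4, prob = 0.5)    # P(X = 2)  = 6/16  = 0.375
pbinom(2, size = 4, prob = 0.5)    # P(X <= 2) = 11/16 = 0.6875
qbinom(0.5, size = 4, prob = 0.5)  # smallest x with P(X <= x) >= 0.5: 2
set.seed(7)
draws <- rbinom(10, size = 4, prob = 0.5)  # 10 random deviates, each in 0..4
suppressWarnings(dbinom(1.5, size = 4, prob = 0.5))  # 0: x is not an integer
```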

In R programming, what does apropos( ) do?

Returns a character vector giving the names of all objects in the search list that match what (a character string, interpreted as a regular expression)

In R programming, how do you use apropos to find 1-letter things?

apropos("^.$")

In R programming, how do you use apropos to find things that start with me?

apropos("^me")

In R programming, how do you use apropos to find 2-to-4 letter things?

apropos("^.{2,4}$")

In R programming, how do you use apropos to find things that contain me?

apropos("me")

In R programming, how do you use apropos to find things that end with me?

apropos("me$")
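The regular expressions above can be checked directly. The exact names returned depend on which packages are attached, so the comments only mention names that ship with base R.

```r
apropos("^me")       # names starting with "me", e.g. "mean", "median", "merge"
apropos("me$")       # names ending with "me", e.g. "Sys.time"
apropos("^.{2,4}$")  # names 2 to 4 characters long
apropos("^.$")       # one-character names such as "c"
```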

In R programming, what does find( ) do?

Returns a character vector giving the names of the attached packages and environments in the search list that contain objects matching what (a character string with the name of an object)

In R programming, what does ignore.case do?

A logical argument of apropos() indicating whether the search should be case-insensitive
TRUE by default

In R programming, what is ignore.case set to by default?

TRUE (in apropos())

In R programming, what does simple.words do?

A logical argument of find()
If TRUE (the default), the what argument is only searched as a whole word; if FALSE, it is treated as a regular expression
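For comparison, find() reports where matching objects live rather than listing their names; with simple.words = FALSE the pattern is treated as a regular expression. A minimal sketch:

```r
find("mean")                       # "package:base": where mean() is defined
find("^me", simple.words = FALSE)  # environments holding any name that starts with "me"
```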

In R programming, what do var, cov and cor do?

var computes the variance of x; cov and cor compute the covariance or correlation of x and y if these are vectors
If x and y are matrices, the covariances (or correlations) between the columns of x and the columns of y are computed
For cov and cor, one must give a matrix or a data frame for x, or give both x and y

In R programming, what does cov2cor do?

Scales a covariance matrix into the corresponding correlation matrix efficiently
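A small worked example with two made-up vectors, showing that cov2cor() turns a covariance matrix into the same matrix cor() computes directly:

```r
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 9)
var(x)           # 1.666667 (sample variance, denominator n - 1)
cov(x, y)        # covariance of two vectors
cor(x, y)        # Pearson correlation, between -1 and 1
m <- cbind(x, y)
cov(m)           # 2 x 2 covariance matrix of the columns
cov2cor(cov(m))  # same result as cor(m)
```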

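In R programming, what do dput( ) and head( ) do?

dput() writes a text representation of an R object that can be pasted back into R to recreate it, and head() returns the first part of an object (by default the first 6 elements or rows). Together they are used to create a small, reproducible dataset with which your problem can be reproduced or your question illustrated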
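A minimal sketch of the dput()/head() workflow on a made-up numeric vector: the printed text can be pasted back into R to rebuild exactly the same object.

```r
v <- c(5, 3, 8, 1, 9, 2, 7)
small <- head(v, 3)        # first 3 elements: 5 3 8
dput(small)                # prints c(5, 3, 8): text that rebuilds the object
rebuilt <- eval(parse(text = capture.output(dput(small))))
identical(rebuilt, small)  # TRUE
```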

In R programming, what does getwd( ) do?

Returns the path of the current working directory

In R programming, what does setwd( ) do?

Sets (changes) the working directory to the given location; the directory must already exist

In R programming, how would you use setwd( )?

setwd("your/directory/name")
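A safe pattern, sketched here with R's per-session temporary directory: remember the old working directory, change it, and change back afterwards.

```r
old <- getwd()    # remember where we started
setwd(tempdir())  # change to R's temporary directory for this session
getwd()           # now the temporary directory
setwd(old)        # change back
```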

In R programming, what does ls( ) do?

Returns a vector of character strings giving the names of the objects in the specified environment
At the top level, it shows which data sets and functions the user has defined
When invoked with no argument inside a function, ls returns the names of the function's local variables
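A small sketch of the last point, using a hypothetical function f with two local variables (ls() sorts names alphabetically by default):

```r
f <- function() {
  a <- 1
  b <- 2
  ls()  # no argument inside a function: lists its local variables
}
f()     # "a" "b"
```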

In R programming, what does a( ) do?

From the seqinr package
A vectorized function that converts three-letter amino acid codes into one-letter codes

In R programming, what does aaindex do?

From the seqinr package
A list of 544 physicochemical and biological properties for the 20 amino acids
Format:
- H String: accession number in the aaindex database
- D String: data description
- R String: LITDB entry number
- A String: author(s)
- T String: title of the article
- J String: journal reference and comments
- C String: accession numbers of similar entries with correlation coefficients of 0.8 (-0.8) or more (less). The correlation coefficient is calculated with zeroes filled for missing values
- I Numeric named vector: amino acid index data

In R programming, what is the H string of the aaindex?

Accession number in aaindex database

In R programming, what is the D string of the aaindex?

Data description

In R programming, what is the R string of the aaindex?

LITDB entry number

In R programming, what is the A string of the aaindex?

Author(s)

In R programming, what is the T string of the aaindex?

Title of article

In R programming, what is the J string of the aaindex?

Journal reference and comments

In R programming, what is the C string of the aaindex?

Accession numbers of similar entries with correlation coefficients of 0.8 (-0.8) or more (less)
The correlation coefficient is calculated with zeroes filled for missing values

In R programming, what is the I element of the aaindex?

A numeric named vector holding the amino acid index data

In R programming, what does <- mean?

Assignment operator
Assigns a value to a name (variable)

In R programming, what does list( ) do?

Creates a list
Lists are generally ordered collections of components, which can be of different types

In R programming, what does return( ) do?

Data gets into a function via its arguments, and the function's result is passed back to the caller with return(), which also ends the function's execution

In R programming, what does browser( ) do?

Enters browser mode
Sets a breakpoint in your function
Use if (condition) browser( ) to insert a conditional breakpoint or watchpoint

What is an R package (in R programming)?

A package is a collection of code, documentation and often sample data

In R programming, what is the seqinr package?

Exploratory data analysis and data visualization for biological sequence (DNA and protein) data
Includes utilities for sequence data management under the ACNUC system

How does R programming compute a matrix norm of x?

Using LAPACK
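The norm() function exposes this; a minimal example on a 2 x 2 matrix of made-up values, showing three of the supported norm types:

```r
m <- matrix(c(1, 2, 3, 4), nrow = 2)  # columns (1, 2) and (3, 4)
norm(m)       # default type "O": maximum absolute column sum = 7
norm(m, "I")  # infinity norm: maximum absolute row sum = 6
norm(m, "F")  # Frobenius norm: sqrt(1 + 4 + 9 + 16) = sqrt(30)
```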

In R programming, if an element of x for a binomial distribution is not an integer, what is the result of dbinom?

Zero (0), and R issues a warning

In R programming, which algorithm is used to compute p(x) for a binomial distribution?

Loader's algorithm

What is RStudio?

A free IDE for R

What is cargo cult science?

Practices that have the semblance of being scientific but do not in fact follow the scientific method

What is Mbp1?

It is an Ig-fold transcription factor involved in regulating cell cycle progression from G1 to S phase
It forms a complex with Swi6p that binds to MluI cell cycle box regulatory elements in the promoters of DNA synthesis genes
Located on chromosome 4 (IV) of yeast
Positively regulates transcription by RNA polymerase II
Usually found in the nucleus

Mbp1 forms a complex with what, and what does that complex bind to?

It forms a complex with Swi6p that binds to MluI cell cycle box regulatory elements

What does Mbp1 positively regulate?

Transcription by RNA polymerase II; it is also involved in the G1 to S-phase transition of the mitotic cell cycle

If a yeast is a null mutant for Mbp1, what happens to that yeast?

Abnormal vacuolar and mitochondrial morphology
Respiration defects
Decreased ethanol tolerance
Increased lifespan and budding index
Increased resistance to caffeine and desiccation

If a yeast is homozygous diploid null for Mbp1, what happens to that yeast?

Sensitive to starvation

If a yeast overexpresses Mbp1, what happens to that yeast?

Slow growth
Affected cellular morphology and budding

What is the SGD database?

A web-based Saccharomyces Genome Database
Includes summary, sequence, protein, analyze, function, literature, gene ontology, interactions, regulation and expression information
Website is http://www.yeastgenome.org

What is the NCBI?

National Center for Biotechnology Information
Largest international provider of data for genomics and molecular biology
Its data is freely and openly available over the internet

What is NCBI's Entrez?

NCBI's primary text search and retrieval system that integrates the PubMed database of biomedical literature with 39 other literature and molecular databases

What are NCBI's Boolean operators?

They provide a way of generating precise queries that produce well-defined sets of results
AND, OR, NOT
They must be entered in uppercase and are processed in a left-to-right sequence

What are NCBI's Limits pages?

Pre-selected popular or useful searches that are available on the Limits page of each Entrez database
Selecting any of the boxes intersects the current search with the corresponding limiting search term

What are NCBI's wildcards?

Also known as truncation searching
An asterisk (*) is used to represent any trailing characters of a term (e.g. cat*)

What are NCBI's GenBank records?

Archival entries, submitted by independent research projects

What is NCBI's RefSeq?

The preferred entries to work with
Curated, non-redundant databases that solve a number of the problems of archival databases

What is NCBI's Swiss-Prot sequence?

A cross-reference into UniProt, the huge protein sequence database maintained by the EBI (European Bioinformatics Institute), which is NCBI's counterpart in Europe
Swiss-Prot entries have the highest annotation standard overall and are expertly curated

What is PubMed's Weighted?

Applies a weighting algorithm to find broadly relevant information in PubMed

In R programming, what does toupper( ) do?

Translates characters in character vectors from lower case to upper case
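A quick illustration with a gene name and a short sequence (both just example strings); tolower() does the reverse:

```r
toupper("mbp1")              # "MBP1"
toupper(c("acgt", "Swi6p"))  # works element-wise: "ACGT" "SWI6P"
tolower("ACGT")              # the reverse conversion: "acgt"
```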

What is MySQL?

A free, open relational database
Based on a client-server model
The database engine runs as a daemon in the background and waits for connection attempts

Why would you use MySQL over R?

Scalability, concurrency and ACID compliance

How does MySQL offer better scalability than R?

In theory R handles large data objects well, but in practice it struggles when the data are larger than what the computer can hold in memory all at once
MySQL can handle this

What is ACID compliance?

Atomicity (a transaction either succeeds fully, with all requested elements, or not at all)
Consistency (any transaction brings the database from one valid state to another)
Isolation (concurrent execution of transactions results in exactly the same database state as if the transactions had been executed serially, one after the other)
Durability (a committed transaction remains permanently committed, even if the database crashes or another error occurs)

What is Entity-Relationship Diagram (ERD)?

Semi-formal diagrams that show the key features of a data model

In R programming:
if (!require(seqinr, quietly=TRUE)) {
  install.packages("seqinr")
  library(seqinr)
}
This one if statement actually takes care of three different scenarios/cases. What are they?

(1) If the package seqinr is already installed and loaded, require() returns TRUE, the if condition evaluates to FALSE and nothing happens
(2) If the package seqinr is installed but not loaded, require() loads it and returns TRUE. The condition again evaluates to FALSE, so the contents of the if statement are not executed
(3) If the package is not installed, require() returns FALSE, so the condition evaluates to TRUE and the contents of the if statement are executed: the package is installed and loaded

In R, when creating a vector such as the one shown:
a<-c(1, "d", 3.0, TRUE)
When you print the vector a, what class of data will all values in the vector be coerced to? (Check all that apply)
A) Logical
B) Character
C) Integer
D) Complex

B)
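This can be checked directly: c() coerces every element to the most flexible type present, which here is character.

```r
a <- c(1, "d", 3.0, TRUE)
class(a)  # "character"
a         # "1" "d" "3" "TRUE"
```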

In R programming, what is a function of square brackets in R?
A) To add an internal or external link
B) To search for/define something
C) Retrieving elements or slices from matrices
D) To exit the program

C)

In R programming, what is the result of the following statement?
!as.logical(0)

TRUE (as.logical(0) is FALSE, and ! negates it)

Given values: 1, 2, 3, 4, 5 & 6 (or any other set of numbers), describe two (or more) ways to compute the mean in R programming

1)
x = 1:6
mean(x)
2)
a <- c(1,2,3,4,5,6)
mean(a)
3)
mean(1:6)
mean(6:1)
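As a further way, the mean can be computed from its definition, the sum divided by the number of values:

```r
x <- c(1, 2, 3, 4, 5, 6)
sum(x) / length(x)  # 3.5, computed by hand
mean(x)             # 3.5, built-in
```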

What is the difference between a vector, list, matrix, and data.frame in R programming?

A vector is a one-dimensional collection of a single data type
A list is a vector whose elements can have different data types
A matrix is a two-dimensional vector of a single data type (an array extends this to more than two dimensions)
A data.frame is a two-dimensional list of equal-length columns, which can have different data types
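A minimal sketch of each structure (variable names are illustrative), plus an array for the more-than-two-dimensions case:

```r
v  <- c(1, 2, 3)                            # vector: one type
l  <- list(1, "a", TRUE)                    # list: mixed types allowed
m  <- matrix(1:6, nrow = 2)                 # matrix: 2-D, one type
ar <- array(1:24, dim = c(2, 3, 4))         # array: more than 2 dimensions
df <- data.frame(n = 1:2, s = c("a", "b"))  # data frame: list of equal-length columns
```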

Given the following code in R programming:
a <- 1:12; a
dim(a) <- c(2,2,3);a
dim(a)[1]
What will this return?

2 (the size of the first dimension)

What is one way of creating a matrix with 9 rows and 2 columns? Write the code in R programming

1) Using dim()
a<-1:18
dim(a)<-c(9,2); a
2) Using cbind()
m<-cbind(1:9, 10:18); m
3) Using matrix() and filling by row
matrix(c(1:18), nrow=9, byrow=TRUE)
4) Using matrix()
matrix(1:18, 9, 2)
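The methods can be compared directly: dim(), cbind() and matrix() with default arguments all fill column by column and give identical matrices, while byrow = TRUE gives a 9 x 2 matrix with the values arranged differently.

```r
a <- 1:18; dim(a) <- c(9, 2)
m1 <- cbind(1:9, 10:18)
m2 <- matrix(1:18, 9, 2)
identical(a, m1) && identical(m1, m2)        # TRUE: all fill column by column
m3 <- matrix(1:18, nrow = 9, byrow = TRUE)   # also 9 x 2, but filled row by row
m3[1, ]                                      # 1 2
```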

In R programming, which of the following takes a quoted string as its argument, and which takes a variable name without quotation marks?
install.packages()
library()

install.packages() - quoted string
library() - no quotation marks (although library() also accepts a quoted string)

What will the output be for the following statement in R programming?
f <- c(1,1,2,3,5,8,13,21); f[length(f)-3:length(f)]

5 3 2 1 1
(: is evaluated before -, so the indices are 8 - (3:8) = 5 4 3 2 1 0, and index 0 is dropped)
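The result depends on operator precedence: : is evaluated before binary -. A sketch contrasting the original expression with a parenthesized version:

```r
f <- c(1, 1, 2, 3, 5, 8, 13, 21)
length(f) - 3:length(f)       # ":" binds first: 8 - (3:8) = 5 4 3 2 1 0
f[length(f) - 3:length(f)]    # index 0 is dropped: 5 3 2 1 1
f[(length(f) - 3):length(f)]  # parentheses give indices 5:8 instead: 5 8 13 21
```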

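Is the logic of this script valid in R programming? Why or why not?
#sample script:
#define a vector
a <- c(a, 1, 9, 7, 2, 71, 26)
#list its contents
a
#calculate the mean of its values
mean(a)

No, it is not valid. The vector a is defined in terms of itself: if a does not exist yet, R stops with the error "object 'a' not found". If a already holds non-numeric values (for example characters), c() coerces everything to character and mean(a) returns NA with the warning "argument is not numeric or logical: returning NA", so the mean cannot be calculated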
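Both failure modes can be reproduced with tryCatch(). The name undefined_vec is hypothetical, chosen so it is not already defined; the messages in the comments are R's English-locale texts.

```r
# Case 1: the name on the right-hand side does not exist yet
msg <- tryCatch(
  { undefined_vec <- c(undefined_vec, 1, 9, 7); "no error" },
  error = function(e) conditionMessage(e)
)
msg   # "object 'undefined_vec' not found"

# Case 2: the vector already holds text, so c() coerces everything to character
a <- c("d", 1, 9, 7, 2, 71, 26)
w <- tryCatch(mean(a), warning = function(w) conditionMessage(w))
w     # "argument is not numeric or logical: returning NA"
suppressWarnings(mean(a))  # NA
```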

The following R programming code was given:
a <- complex(3,3,3)
b <- complex(3,5,4)
c <- a[3] + b[2]
print(c)
What would the code give?
A) 8 + 7i
B) 7 + 8i
C) 8 + 8i
D) 7 + 7i

A)
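The arithmetic can be checked in R. The result is stored in s here rather than c, to avoid shadowing the name of the c() function (R would still find the function, but the code is harder to read).

```r
a <- complex(3, 3, 3)  # length.out = 3, real = 3, imaginary = 3: three copies of 3+3i
b <- complex(3, 5, 4)  # three copies of 5+4i
s <- a[3] + b[2]       # (3 + 5) + (3 + 4)i
s                      # 8+7i
```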

What is the difference between the assignment operators "<-", "<<-" and "=" in R programming?

"<-" : Assigns a value in the environment in which it is evaluated. It can be used anywhere in a program.
"<<-" : Searches the enclosing (parent) environments for an existing variable of that name and reassigns it there; if none is found, the value is assigned in the global environment. It is usually used inside functions to modify a variable defined outside them.
"=" : Assigns a value in the environment in which it is evaluated. However, it is only allowed at the top level or as one of the subexpressions in a braced list of expressions; inside a function call, = names an argument instead.

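A common use of <<- is a closure that keeps state between calls. A minimal sketch with a hypothetical make_counter() function:

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # updates 'count' in the enclosing function's environment
    count
  }
}
counter <- make_counter()
counter()  # 1
counter()  # 2
```
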
By default, which assignment operator is used to assign values to a constant in R?
A) ?
B) <-
C) <<-
D) ==

B)

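What is the output of the following in R programming?
complex(r=5,4,6)

5+6i 5+6i 5+6i 5+6i
(r=5 partially matches the argument real, so the unnamed 4 and 6 fill length.out and imaginary in order)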
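Written out with full argument names, the call is easier to read, and both forms give the same result:

```r
complex(r = 5, 4, 6)                              # r partially matches 'real'
complex(length.out = 4, real = 5, imaginary = 6)  # the same call written out in full
```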

Predict the output of the following R programming code:
c <- 3
b <- 4
(c + 10) + b
b != c
c > b

> (c + 10) + b
[1] 17
(3 + 10) + 4 = 17
> b != c
[1] TRUE
b does NOT equal c
> c > b
[1] FALSE
c is NOT greater than b

When employing a function in R programming, its arguments must always be listed in a specific predefined order. True or False?

False
If one simply lists the values of the arguments, they must be listed in the predefined order. However, when values are assigned via their argument names, the order is no longer important
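A sketch with rnorm(), whose arguments are n, mean and sd in that order; resetting the seed (arbitrary value 1) makes the two draws comparable:

```r
set.seed(1); by_position <- rnorm(3, 10, 2)                  # n, mean, sd in declared order
set.seed(1); by_name     <- rnorm(sd = 2, n = 3, mean = 10)  # any order when named
identical(by_position, by_name)                              # TRUE
```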

What is the output of the following R code:
a <- c(2,4,6,8)
a[a[a<4]]

a<4 gives TRUE FALSE FALSE FALSE
a[a<4] gives 2
a[a[a<4]] is therefore a[2], which is 4
The output is 4

How can you find the median of the following set of data in R programming?
55, 23, 132, 1, 3, 43, 11

1) Create a vector from the data and call median()
a<- c(55, 23, 132, 1, 3, 43, 11)
median(a)
2) Sort the vector; since length(a) is 7 (odd), the median is the middle (4th) value
a<- c(55, 23, 132, 1, 3, 43, 11)
a<- a[order(a)]
length(a)
a[4]
Both methods give 23
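The second method works directly because length(a) is odd. A hypothetical helper that also handles an even number of values (by averaging the two middle ones) might look like this:

```r
my_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) x[(n + 1) / 2]    # odd n: the middle value
  else mean(x[c(n / 2, n / 2 + 1)])  # even n: average of the two middle values
}
my_median(c(55, 23, 132, 1, 3, 43, 11))  # 23
my_median(c(4, 1, 3, 2))                 # 2.5
```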

Given the following R code:
x<-matrix(c(1,2,3,11,13,12),4,3)
What are the following outputs of the matrix 'x'?
A) x[1,2]
B) x[4,3]
C) x[nrow(x),ncol(x)]
D) x[ncol(x),nrow(x)]
E) x[as.integer(TRUE),TRUE]

A) 13
B) 12
C) 12
D) Error: subscript out of bounds
E) 1 13 3
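Printing x shows why: the 6 values are recycled to fill the 12 cells column by column, so row 1 is 1, 13, 3.

```r
x <- matrix(c(1, 2, 3, 11, 13, 12), 4, 3)  # 6 values recycled to fill 12 cells
x
#      [,1] [,2] [,3]
# [1,]    1   13    3
# [2,]    2   12   11
# [3,]    3    1   13
# [4,]   11    2   12
x[1, 2]                # 13
x[as.integer(TRUE), ]  # row 1: 1 13 3
```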

Following the installation of the seqinr package, student A decides to test the function that changes three-letter amino acid codes into their one-letter codes. Upon typing >a("phe") in R, she gets an error telling her that the amino acid does not exist. Why does this happen, and what does she have to do to get the desired result?

R is a case-sensitive language
Student A needs to capitalize the "p" in "phe" so that the function recognizes the code
If she instead enters >a("Phe"), she should get her desired result: [1] "F"

When extracting components from a list in R programming (e.g. pKA23 <- list(size=4000, marker="kan", ori="ColE1", BanI=c(240, 450, 600, 3000))), what is the difference between pKA23[[2]] and pKA23[2]?

pKA23[[2]] returns "kan", the value of the second component of the pKA23 list, "marker"
pKA23[2] returns a sub-list of length 1 that keeps the component's name, printed as:
$marker
[1] "kan"
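The difference is easiest to see with class(); the $ operator is a third way to extract the component by name:

```r
pKA23 <- list(size = 4000, marker = "kan", ori = "ColE1",
              BanI = c(240, 450, 600, 3000))
pKA23[[2]]         # "kan": the element itself
pKA23[2]           # a list of length 1 that still carries the name "marker"
pKA23$marker       # same as pKA23[[2]], selected by name
class(pKA23[[2]])  # "character"
class(pKA23[2])    # "list"
```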

In R programming, if a <- c(4, 5, 6, 7, 8, 9, 10), what is a[seq(2,6,2)]?
A) 4 5 6
B) 5 7 9
C) 8 9 10
D) 5 6 7

B) (seq(2,6,2) gives the indices 2 4 6)

Given:
a <- c(1,2,3,4)
b <- c(2,3,4,5)
What elements of 'a', when specified in the following expression, enter R into "Browser Mode"? What brackets do we use for this specification?
> if (a < b[3]) browser ()

Square brackets are used to specify elements
Elements "1", "2" and "3" of 'a' will, when specified in the above expression, enter us into Browser Mode, because b[3] is 4
For element "1" the code would be:
> if (a[1] < b[3]) browser ()
Using the whole vector a, as in the question, is an error in R 4.2.0 and later, because the condition of if() must have length 1
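which() returns the positions where the condition is TRUE, confirming which elements qualify:

```r
a <- c(1, 2, 3, 4)
b <- c(2, 3, 4, 5)
a < b[3]         # TRUE TRUE TRUE FALSE (b[3] is 4)
which(a < b[3])  # 1 2 3: the elements that would trigger browser()
```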

In R programming:
f<- c(1,2,3,4,5,6,7,8,9,10)
How would you retrieve the first, second and fifth item together?

f[c(1, 2, 5)] returns them together as a vector
To print them:
cat(f[1], f[2], f[5])
OR
cat("First:", f[1], " Second:", f[2], " Fifth:", f[5])

Where can you find the indexed terms of a specific field in NCBI's Entrez system?

The indexed terms can be found on the Advanced Search page

Do both Mbp1 and Swi6p bind directly to DNA? True or False

False
Only Mbp1 can bind directly to DNA, and it can do so without Swi6p

The N-terminus of Mbp1 binds to _______ and the C-terminus binds to _______.

DNA (N-terminus)
Swi6p (C-terminus)

What is genome annotation?

Genome annotation is the process of attaching biological information to sequences
It can be automatic or manual (curated)

What does this code output in R programming?
s="i love ramen!"
substr(s,8,12)

ramen
The substr() function returns the substring of 's' from character 8 to character 12

In R programming, what does this code output?
s="i love ramen!"
substr(s,8,12) <- "frogs"
s

i love frogs!
Here the replacement form substr<- is used: instead of returning a substring, it changes the content of 's', replacing characters 8-12 with 'frogs'
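Both cards can be run together. The second string s2 (an illustrative addition) shows that a longer replacement value is truncated to the number of characters being replaced, so substr<- never changes the string's length.

```r
s <- "i love ramen!"
substr(s, 8, 12)              # "ramen"
substr(s, 8, 12) <- "frogs"   # replacement form: modifies s
s                             # "i love frogs!"
s2 <- "i love ramen!"
substr(s2, 8, 12) <- "noodles"  # only the first 5 characters of the value are used
s2                              # "i love noodl!"
```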

MySQL, MariaDB, and PostgreSQL are all examples of what?

Relational databases

How would you do an Entrez search for items with either Mbp1 or Swi6p intersected with a search for regulators but excluding any results with human using Boolean operators?

(((Mbp1 OR Swi6p) AND regulators) NOT human)

Given the following R programming code:
> randomNumber <- function(len=1, MIN=0, MAX=1000) {
  return(floor(runif(len, min=MIN, max=MAX)))
}
Write the code you would use to debug the function above. In addition, how would one exit debugging mode for that function?

debug(randomNumber) -> to enter debugging mode
undebug(randomNumber) -> to exit debugging mode
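A minimal sketch that can run non-interactively: isdebugged() reports whether the debugging flag is set. In an interactive session, typing Q at the browser prompt also quits the browser, and debugonce() debugs only the next call.

```r
randomNumber <- function(len = 1, MIN = 0, MAX = 1000) {
  return(floor(runif(len, min = MIN, max = MAX)))
}
debug(randomNumber)       # the next calls will step through the function in the browser
isdebugged(randomNumber)  # TRUE
undebug(randomNumber)     # switch debugging off again
isdebugged(randomNumber)  # FALSE
```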

Which of the following are TRUE regarding NCBI's Entrez system?
A) Entrez integrates data with links only within databases
B) There is only one way of entering the 'gene' database homepage
C) The Boolean operator AND is not case-sensitive
D) Individual search terms separated by spaces are normally automatically combined as if they were joined by OR operators

All are FALSE

Which of the following are useful applications/places to store your lab notes electronically?
A) Evernote
B) Google Keep
C) Microsoft OneNote
D) The Student Wiki

All

In NCBI's Entrez, what is problematic about the search term cat*?

This is an example of a truncation search
The search cat* will give incomplete results, because a truncation search only uses the first 600 variations of the search term

How would NCBI's Entrez process a search based on the following?
Ghrelin AND (bipolar OR schizophrenia)
What purpose do the parentheses serve?

The union of the bipolar and schizophrenia results is computed first, and that union is then intersected with the ghrelin results
The information inside the parentheses is processed first and overrides the default left-to-right processing

How can R be used to organize data?

Connect R to a database such as MySQL or MariaDB
Use data.frame() to keep complex data
Use read.table() to read tabular data from files

What is the cardinality of the relationship between DNA sequence and protein?
A) 1:2
B) 1:1
C) 1:n
D) n:n
E) A and B
F) B and C

F)

When constructing an Entity-Relationship Diagram for a protein, why is it important to have a Unique Identifier, Remove Redundant Data, and Create Separate Tables for attributes that do not depend on our protein?

It makes the data model more efficient and internally consistent
It removes redundant information
It accommodates different features

When running a protein BLAST, what is the E value, and does it increase or decrease as we go down the list (toward less significant alignments)?

The E value (Expect value) is the number of alignments with at least this score that would be expected by chance. It increases as we go down the list toward less significant alignments; the closer it is to zero, the more significant the match

What are glycine's structure, short form and properties?

Glycine, Gly, G
Has only a hydrogen atom as its side chain
Aliphatic side chain
Does not have L or D forms
The smallest and probably the most flexible amino acid
Allows close packing and van der Waals interactions
Hydrophobic
Simplest amino acid; its R group is a single hydrogen, which is not a good hydrogen bond former
Glycine's solubility properties are influenced mainly by its polar amino and carboxyl groups, so glycine is best considered a member of the polar, uncharged group
Except for glycine, all the amino acids isolated from proteins have 4 different groups attached to the alpha-carbon atom
Glycine is sterically the most adaptable of the amino acids, and it conveniently accommodates the other steric constraints of the beta-turn

What are alanine's structure, short form and properties?

Alanine, Ala, A
Has a methyl group as its R group
Aliphatic side chain
Most generic
Non-polar
Hydrophobic

What are valine's structure, short form and properties?

Valine, Val, V
Beta-branched
Large aliphatic side chain
Non-polar
Hydrophobic due to its aliphatic side chain
Very poor alpha-helix former due to its beta branch

What are leucine's structure, short form and properties?

Leucine, Leu, L
Aliphatic side chain
Most common amino acid in proteins
Great alpha-helix former
Hydrophobic
Non-polar

What are isoleucine's structure, short form and properties?

Isoleucine, Ile, I
Aliphatic side chain
Non-polar
Hydrophobic

What are serine's structure, short form and properties?

Serine, Ser, S
Aliphatic hydroxyl side chain
Good hydrogen bond-forming moieties
Hydrophilic
Polar
Electronegative

What are threonine's structure, short form and properties?

Threonine, Thr, T
Beta-branched
Aliphatic hydroxyl side chain
Polar
Electronegative
Has a hydroxyl group AND a methyl group
Good hydrogen bond-forming moieties
Hydrophilic

What are phenylalanine's structure, short form and properties?

Phenylalanine, Phe, F
Aromatic side chain
Benzyl group as its R side chain
Hydrophobic
Absorbs ultraviolet light above 250 nm

What are tyrosine's structure, short form and properties?

Tyrosine, Tyr, Y
Aromatic side chain
Amphipathic
Hydrophobic with polar properties
Good hydrogen bond-forming moieties
Also has non-polar characteristics due to its aromatic ring and could arguably be placed in the non-polar group (its phenolic hydroxyl has a pKa of 10.1 and is a charged, polar entity at high pH)
Absorbs ultraviolet light above 250 nm

What are tryptophan's structure, short form and properties?

Tryptophan, Trp, W
Aromatic side chain
Has an indole ring as its R side chain, which gives it absorption of 290 nm light
The nitrogen on the indole ring gives it hydrogen-donor potential
Considered a borderline member of the aromatic side chain group because it can interact favourably with water via the N-H moiety of the indole ring

What are cysteine's structure, short form and properties?

Cysteine, Cys, C
Contains sulfur
Electronegative
Can deprotonate at pH values greater than 7
Hydrophilic
Can also be considered hydrophobic because of its sulfhydryl group, and is often found buried inside the protein
Can form disulfide bridges with other cysteines

What are methionine's structure, short form and properties?

Methionine, Met, M
Contains sulfur
Electronegative
Often the first amino acid to be cleaved off, as it is the initiator residue
Amphipathic (the least polar of the amphipathic amino acids, but its thioether sulfur can be an effective metal ligand in proteins)
Hydrophobic

What are aspartate's structure, short form and properties?

Aspartate, Asp, D
Polar
Acidic
Hydrophilic

What are glutamate's structure, short form and properties?

Glutamate, Glu, E
Polar
Acidic

What are asparagine's structure, short form and properties?

Asparagine, Asn, N
Good hydrogen bond-forming moieties
Hydrophilic
To test what a D is doing in a protein, you can change it to N to see whether it does anything or whether it kills the protein
Polar
Amide R side chain
The amide derivative of the acidic amino acid aspartate (the amide side chain itself is uncharged)

What are glutamine's structure, short form and properties?

Glutamine, Gln, Q
Hydrophilic
The amide derivative of the acidic amino acid glutamate (the amide side chain itself is uncharged)

What are lysine's structure, short form and properties?

Lysine, Lys, K
Has four methylene (CH2) groups in a row, which make its chain hydrophobic
The charged head pokes out and interacts with water while the chain hides inside the protein; this is called snorkeling
The head is different from the rest of the protein
Basic
Lysine contains a protonated alkyl amino group
Its side chain is protonated under physiological conditions and participates in electrostatic interactions in proteins
Amphipathic
Can be considered amphipathic because its R group consists of an aliphatic chain, which can interact with hydrophobic amino acids in the protein, and an amino group that is normally charged at neutral pH
Polar

What are arginine's structure, short form and properties?

Arginine, Arg, R
Strongly positively charged, and never loses the charge under natural conditions
Very important for binding substances
Basic
Arginine contains a guanidinium group
Its side chain is protonated under physiological conditions and participates in electrostatic interactions in proteins
Hydrophilic
Has resonance structures due to its delocalized double bond

What are histidine's structure, short form and properties?

Histidine, His, H
Has an imidazole ring with resonance structures in which the position of the positive charge can change
It can easily gain or lose a proton
Basic
The side chains of Lys and Arg are fully protonated at pH 7, but histidine has a side-chain pKa of about 6, which means it is only about 10% protonated at pH 7
With a pKa near neutrality, histidine side chains play important roles as proton donors and acceptors in many enzyme reactions
Hydrophilic

What are proline's structure, short form and properties?

Proline, Pro, P
An imino acid: strictly speaking not an amino acid, because its side chain is cyclic
The side chain is cyclic and forms a ring via a covalent bond with the backbone nitrogen atom
The cyclic ring makes it a very rigid structure and creates kinks in chains
If you put it in an alpha helix, it would break and bend the helix
For the protein to fold, proline is often found in the bends and folds
Non-polar
Hydrophobic
Proline has a cyclic structure and a fixed phi angle, so, to some extent, it forces the formation of a beta-turn

What are the hydrophobic amino acids?

Ala, Cys, Ile, Leu, Met, Phe, Val

What are the hydrophilic amino acids?

Basic: Arg, Lys
Acidic: Asp, Glu
Polar: Asn, Gln, His

What is the most common amino acid?

Leu

What are the side-chain pKa values of the amino acids R, K, C, H, E and D?

R = 12.5
K = 10.5
C = 8.3
H = 6.0
E = 4.3
D = 3.9
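These pKa values can be turned into protonation states with the Henderson-Hasselbalch equation, fraction protonated = 1 / (1 + 10^(pH - pKa)). A minimal sketch (the function name frac_protonated is hypothetical); for histidine at pH 7 it gives 1/11, about 9%, in line with the "only about 10% protonated" statement on the histidine card.

```r
# Side-chain pKa values from the card above
pKa <- c(R = 12.5, K = 10.5, C = 8.3, H = 6.0, E = 4.3, D = 3.9)

# Henderson-Hasselbalch: fraction of the side chain in its protonated form
frac_protonated <- function(pKa, pH) 1 / (1 + 10^(pH - pKa))

round(frac_protonated(pKa, pH = 7), 3)  # near 1 for R and K, about 0.091 for H
```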

What are the top helix forming amino acid residues?

Glu, Met, Ala, Leu, Lys

What are the best helix-breaking amino acid residues?

Gly, Pro, Asn, Tyr, Cys

Why would you want to keep the GeneSequence info in a relational database?

To see whether the gene is part of a gene cluster and what its possible functions are
Promoters, including upstream and downstream cofactors
Alternative splicing
Conserved sequences in the genome (look for dN/dS scores)

What are SMILES strings?

Linear strings of text that uniquely define a chemical molecule, even if the molecule is cyclic

What amino acid is B?

D or N

What amino acid is J?

I or L

What amino acid is O?

Pyrrolysine

What amino acid is U?

Selenocysteine

What amino acid is X?

Unknown

What amino acid is Z?

E or Q

What are the hydrophobic amino acids?

FAMILYVW
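The mnemonic can be used directly in R, for example to compute the fraction of hydrophobic residues in a sequence. The sequence below is hypothetical, and the two hydrophobic-residue cards use slightly different lists (this one includes Tyr and Trp but not Cys).

```r
hydrophobic <- strsplit("FAMILYVW", "")[[1]]  # the mnemonic's one-letter codes
seq_aa <- "MKFAVLDE"                          # hypothetical sequence
residues <- strsplit(seq_aa, "")[[1]]
mean(residues %in% hydrophobic)               # fraction hydrophobic: 5/8 = 0.625
```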

What is the yellow "key" in MySQL Workbench?

Primary key

What is the red diamond in MySQL Workbench?

Foreign key

What is a foreign key in MySQL Workbench?

A column whose values refer to information that is not stored in this table but somewhere else (a key in another table)

What is the white diamond in MySQL Workbench?

Normal attributes

What is the green diamond in MySQL Workbench?

Cannot be "NULL"

How do you create an empty list in R programming?

db <- list()

What does str() do in R programming?

Compactly display the internal structure of an R object




A diagnostic function and an alternative to summary()

What does strOptions() do in R programming?

Convenience function for setting options for str()

What does setDataPart() do in R programming?

Called to implement object@.Data




Used to merge the data part of a superclass prototype

What does gsub() do in R programming?

Replaces all matches of a pattern in a string; its sibling sub() replaces only the first match
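The difference between sub() and gsub() is easiest to see on a string with several matches. A minimal sketch in R; the example string is hypothetical.

```r
x <- "ACGT-ACGT-ACGT"

# sub() replaces only the first match of the pattern
firstOnly <- sub("-", "", x)
firstOnly   # "ACGTACGT-ACGT"

# gsub() replaces every match
everyMatch <- gsub("-", "", x)
everyMatch  # "ACGTACGTACGT"
```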

What does this code do in R programming?




gsub("[^a-zA-Z]", "", seq)

Replaces every character that is not a letter (a to z, lower or upper case) with nothing, i.e. deletes it

What does missing() do in R programming?

Checks whether the argument has been provided

What does computePI() do in R programming?

From the seqinr package




This function calculates the theoretical isoelectric point of a protein




This estimate does not account for post-translational modifications

What does strsplit() do in R programming?

Splits the elements of a character vector x into substrings according to the matches to the substring split within them

What does unlist() do in R programming?

Given a list structure x, unlist() simplifies it to produce a vector that contains all the atomic components occurring in x

What does pmw() do in R programming?

With default parameter values, returns the apparent molecular weight of one mole (6.0221415e23 molecules) of the input protein, expressed in grams at sea level on Earth with terrestrial isotopic composition
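The idea behind an average molecular weight can be sketched as the sum of the residue masses plus one water for the free termini. This is not pmw()'s actual implementation: the helper `proteinMW()` is hypothetical, and the average residue masses are assumed values from standard average-mass tables.

```r
# Average residue masses in Da (amino acid minus one water); assumed values
# from standard average-mass tables
residueMass <- c(A =  71.0788, R = 156.1875, N = 114.1038, D = 115.0886,
                 C = 103.1388, E = 129.1155, Q = 128.1307, G =  57.0519,
                 H = 137.1411, I = 113.1594, L = 113.1594, K = 128.1741,
                 M = 131.1926, F = 147.1766, P =  97.1167, S =  87.0782,
                 T = 101.1051, W = 186.2132, Y = 163.1760, V =  99.1326)
waterMass <- 18.01528  # one water for the free N- and C-termini

# Hypothetical helper: approximate molecular weight (g/mol) of a peptide
proteinMW <- function(seq) {
  aa <- strsplit(toupper(seq), "")[[1]]
  if (!all(aa %in% names(residueMass))) stop("unknown residue code in sequence")
  sum(residueMass[aa]) + waterMass
}

proteinMW("G")    # about 75.07, the molecular weight of free glycine
proteinMW("MKV")
```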

What does AAstat() do in R programming?

Returns simple protein sequence information, including the number of residues, the percentage of physico-chemical classes and the theoretical isoelectric point

What is isoelectric point?

The pH at which the protein carries no net electrical charge
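The theoretical isoelectric point can be estimated by computing the net charge with the Henderson-Hasselbalch equation and searching for the pH where it is zero. A minimal sketch in R: the side-chain pKa values come from the pKa card, but the terminal pKa values (8.6 and 3.6) are assumptions, tyrosine is ignored, and `netCharge()`/`isoelectricPoint()` are hypothetical helpers, not computePI()'s real algorithm.

```r
# Side-chain pKa values from the pKa card
pKa <- c(R = 12.5, K = 10.5, C = 8.3, H = 6.0, E = 4.3, D = 3.9)
# Terminal pKa values are NOT from the cards; they are assumed for illustration
pKaNterm <- 8.6
pKaCterm <- 3.6

# Net charge at a given pH from the Henderson-Hasselbalch equation
netCharge <- function(seq, pH) {
  aa <- strsplit(toupper(seq), "")[[1]]
  count <- function(code) sum(aa == code)
  positive <- 1 / (1 + 10^(pH - pKaNterm)) +
    sum(sapply(c("R", "K", "H"), function(r) count(r) / (1 + 10^(pH - pKa[[r]]))))
  negative <- 1 / (1 + 10^(pKaCterm - pH)) +
    sum(sapply(c("D", "E", "C"), function(r) count(r) / (1 + 10^(pKa[[r]] - pH))))
  positive - negative
}

# Net charge falls steadily as pH rises, so bisection finds the pH where it is zero
isoelectricPoint <- function(seq, lower = 0, upper = 14, tol = 1e-4) {
  while (upper - lower > tol) {
    mid <- (lower + upper) / 2
    if (netCharge(seq, mid) > 0) lower <- mid else upper <- mid
  }
  (lower + upper) / 2
}

isoelectricPoint("G")                                  # about 6.1
isoelectricPoint("KKKKK") > isoelectricPoint("DDDDD")  # TRUE
```

For a single glycine only the two termini are charged, so the estimate lands halfway between the two assumed terminal pKa values.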

Why is it bad to change a data model by directly assigning new values to its elements?

The whole model can end up in an inconsistent state




Much better to write functions that get and set data elements which also keep data consistent

What would a setData function have to look like?

Create a new entry if the requested row of a table does not exist yet




Update the data if the protein exists




Perform consistency check (check that data has correct type)




Perform sanity check (check that data values fall into expected range)




Perform completeness check (handle incomplete data)
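The five requirements above can be sketched as a single R function. The name setData comes from the card, but its signature, the data model (a list of data frames keyed by an "id" column, with a hypothetical "protein" table) and the "length must be positive" sanity rule are assumptions for illustration.

```r
# Hypothetical data model: a list of tables, each a data frame keyed by "id"
db <- list(protein = data.frame(id = character(0), name = character(0),
                                length = numeric(0), stringsAsFactors = FALSE))

setData <- function(db, table, id, field, value) {
  # completeness check: every argument must be supplied
  if (missing(db) || missing(table) || missing(id) ||
      missing(field) || missing(value)) {
    stop("db, table, id, field and value have to be specified")
  }
  tbl <- db[[table]]
  if (is.null(tbl)) stop("unknown table: ", table)
  if (!field %in% names(tbl)) stop("unknown field: ", field)

  # consistency check: the value must have the same type as the column
  if (!identical(class(value), class(tbl[[field]]))) {
    stop("field ", field, " expects a value of class ", class(tbl[[field]]))
  }
  # sanity check: values must fall into the expected range (example rule)
  if (field == "length" && value <= 0) stop("length must be positive")

  # create a new row of NAs if the requested id does not exist yet
  if (!id %in% tbl$id) {
    newRow <- as.data.frame(lapply(tbl, function(col) col[NA_integer_]),
                            stringsAsFactors = FALSE)
    newRow$id <- id
    tbl <- rbind(tbl, newRow)
  }
  # update the existing (or freshly created) row
  tbl[tbl$id == id, field] <- value
  db[[table]] <- tbl
  db
}

db <- setData(db, "protein", "P1", "name", "Mbp1")
db <- setData(db, "protein", "P1", "length", 120)   # hypothetical value
db$protein
```

Because every change goes through setData, a bad value stops with an error instead of silently leaving the model inconsistent.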

What are regular expressions?

A concise description language for defining patterns used in pattern matching on strings

What is RegexPal?

A JavaScript regular expression tester




Gives immediate visual feedback




Website: http://regexpal.com

In RegexPal, how do you specify a set of alternative characters to match?

Place them in square brackets




[lq] gives l OR q

How do you do ranges in RegexPal?

[1-5] or [a-z]

How do you do exclusions in RegexPal?

Use a caret (^) as the first character inside the square brackets




[^0-9] or [^a-z]

Can you use commas as ands in RegexPal?

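Yes, in the sense that a comma inside square brackets is just another literal character, so [A,C,T,G] still matches A, C, T or G; however, it also matches commas, which poses a problem if the sequence itself contains commas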
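The same bracket syntax used in RegexPal also works in R's regular expression functions, so the set, range, exclusion and comma cards can be checked directly. A minimal sketch; the test string is hypothetical.

```r
x <- "lq,Q7"

# A set in square brackets matches any one of its characters: l OR q
setHits <- regmatches(x, gregexpr("[lq]", x))[[1]]
setHits      # "l" "q"

# A range: any lowercase letter from a to z
rangeHits <- regmatches(x, gregexpr("[a-z]", x))[[1]]
rangeHits    # "l" "q"

# An exclusion: a caret right after "[" matches anything NOT in the set
excludeHits <- regmatches(x, gregexpr("[^a-z]", x))[[1]]
excludeHits  # "," "Q" "7"

# Inside brackets a comma is just another literal character,
# so [l,q] also matches the comma in the string
commaHits <- regmatches(x, gregexpr("[l,q]", x))[[1]]
commaHits    # "l" "q" ","
```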

We have learnt that it is more convenient to write set and get data functions in order to edit components of data models




List the five characteristics of an appropriate set data function

1) Create a new entry if the requested row of a table does not exist yet




2) Update data if the protein exists




3) Perform consistency checks (i.e. check that the data has the correct type)




4) Perform sanity checks (i.e. check that data values fall into the expected range)




5) Perform completeness checks (i.e. handle incomplete data)

Why would we not consider RefSeq IDs and UniProt IDs "foreign keys"?

They point to records on the NCBI/EBI websites rather than to tables in our schema

Which of the following is/are sequence analysis tools found in the EMBOSS package?




A) pepstats




B) tmap




C) shuffleseq




D) a and b




E) all of a, b and c

E) All of a, b and c

What are two ways a sequence string can be split into a vector of single characters?

1) Use the s2c() function from the seqinr package

> library(seqinr)
> a <- "apple"
> a <- s2c(toupper(a))
> a
[1] "A" "P" "P" "L" "E"

2) Use strsplit() and unlist() from base R

> a <- "apple"
> a <- toupper(a)
> a <- strsplit(a, "")
> a <- unlist(a)
> a
[1] "A" "P" "P" "L" "E"

Given the nucleotide sequence below, what are two ways to extract only the sequence using regular expressions?




>ENSONIE00000000371_116_T11948TTTCACCGTTCCCACACCTTAAAGCGGAATGGAGAAGAGCGGGAGGCAGAGAGGAAAGGAAAGACCGAGACAGAGAATGAAAGGAGGGGTAAACCGGGGCGATATCCTCTTTACCTGACCGGGTTGCTCACCTGAGCGGACTCACCTGTCCCGACGCCGAAAATACTTTTCTTTAGCGCTTGGGAACAAATCTGGTTGAGAGGAAAGGTGTGNCCGGGAAAGCGGAACTGGAGTGAACTCCCTGATCATGAGCGAGGGGACGTCTACATCCC

1) Use [A,C,T,G] to select only the sequence and copy it to a new string (e.g. assign it to a vector)




2) Use [^A,C,T,G] to select every character except the bases and delete them from the original string
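Both approaches can be sketched in R. Two assumptions differ from the card: the header line is removed first (the real header, ENSONIE..., contains letters such as N and T that would otherwise leak into the result), and N is added to the class because the sequence above contains an ambiguous N. The brackets are written without commas, since a comma inside brackets is matched literally. The FASTA record is a hypothetical excerpt.

```r
# Hypothetical FASTA record; the header sits on its own line
fasta <- ">example_header\nTTTCACCGTTCCCAC\nGTGTGNCCGGG"

# Drop the header line(s) first so header letters cannot leak into the sequence
lines <- strsplit(fasta, "\n")[[1]]
s <- paste(lines[!startsWith(lines, ">")], collapse = "")

# Way 1: select only the base characters and collect them into a new string
way1 <- paste(regmatches(s, gregexpr("[ACGTN]", s))[[1]], collapse = "")

# Way 2: delete every character that is not a base
way2 <- gsub("[^ACGTN]", "", s)

way1   # "TTTCACCGTTCCCACGTGTGNCCGGG"
```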

In the data model developed, the xref_table holds two attributes:



1) The type of cross-reference (e.g. PubMed), from which the URL for the accession is constructed



2) The key (e.g. 10747782)



Why is the type of cross-reference (i.e. xref_type_id) included as a foreign key rather than as an attribute of the xref_table?

The same cross-reference type may be described by different strings



By storing each cross-reference type once in its own table (xref_type) and linking to it via a foreign key (xref_type_id), we make sure that we get all of the cross-references associated with our proteins/features, regardless of variation in the description strings
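The foreign-key lookup can be imitated with two data frames and merge(). A minimal sketch: the table and column names follow the card, but the rows are hypothetical (only the PubMed key 10747782 comes from the card).

```r
# Hypothetical lookup table: one row per cross-reference type
xref_type <- data.frame(xref_type_id = c(1L, 2L),
                        description  = c("PubMed", "RefSeq"),
                        stringsAsFactors = FALSE)

# Cross-references store only the foreign key xref_type_id, never a free-text type
xref <- data.frame(xref_id      = 1:3,
                   xref_type_id = c(1L, 1L, 2L),
                   key          = c("10747782", "key_2", "key_3"),  # key_2, key_3 hypothetical
                   stringsAsFactors = FALSE)

# Resolve the foreign key: attach each cross-reference's type description
joined <- merge(xref, xref_type, by = "xref_type_id")

# Every PubMed cross-reference is found through its id,
# independent of how the type was spelled in the original source
pubmedKeys <- joined$key[joined$xref_type_id == 1L]
pubmedKeys
```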

Consider the code below. Suppose your data overlapped your legend, what can you do to move the legend location?




legend(x = 1,
       y = -1,
       legend = c("charged (+)",
                  "charged (-)",
                  "hydrophilic",
                  "hydrophobic",
                  "plain"),
       bty = "n",
       fill = c(chargePlus,
                chargeMinus,
                hydrophilic,
                hydrophobic,
                plain)
)

The x and y parameters of legend() specify where the legend is displayed




For example, changing x to 10 while keeping y = -1 could move the legend away from the data
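Besides numeric coordinates, legend() also accepts a position keyword such as "topright" or "bottomright", which avoids guessing coordinates. A minimal sketch in R; the plotted data and the colours standing in for chargePlus, chargeMinus, etc. are hypothetical.

```r
pdf(NULL)  # draw on a null device so the sketch runs without a screen or file

labels <- c("charged (+)", "charged (-)", "hydrophilic", "hydrophobic", "plain")
cols <- c("red", "blue", "lightblue", "orange", "grey")  # hypothetical colours

plot(1:10, 1:10, pch = 19)

# Numeric x, y: by default they give the top-left corner of the legend box
legend(x = 1, y = 10, legend = labels, bty = "n", fill = cols)

# A keyword instead of coordinates places the legend relative to the plot region
pos <- legend("bottomright", legend = labels, bty = "n", fill = cols)

invisible(dev.off())
```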

Consider the following sequence:




seq <- "1111zzzzz2222YYYYY333$$$TTT///"




Using R (e.g. in RStudio), how would you extract only the special characters ($ and /)? How would you extract only the upper case letters?

1) To extract only special characters ($ and /), use the following command:




specials <- gsub("[^$/]", "", seq)   # "$$$///"




2) To extract only the upper case letters, use the following command:




upperCase <- gsub("[^A-Z]", "", seq)   # "YYYYYTTT"

What regular expression(s) find all amino acids except those with the one-letter code 'n' in the following sequence?




1 msnqiysary sgvdvyefih stgsimkrkk ddwvnathil kaanfakakr trilekevlk




61 ethekvqggf gkyqgtwvpl niakqlaekf svydqlkplf dftqtdgsas pppapkhhha




121 skvdrkkair sastsaimet krnnkkaeen qfqsskilgn ptaaprkrgr pvgstrgsrr




181 klgvnlqrsq sdmgfprpai pnssisttql psirstmgpq sptlgileee rhdsrqqqpq




241 qnnsaqfkei dledglssdv epsqqlqqvf nqntgfvpqq qssliqtqqt esmatsvsss




301 pslptspgdf adsnpfeerf pgggtspiis miprypvtsr pqtsdindkv nkylsklvdy

There are two ways:




1) [a-mo-z] (find all the a-m and o-z characters)




2) [^ 0-9n] (exclude spaces, digits and n characters)
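Both patterns can be checked in R on the start of the first sequence line; for text made only of lowercase letters, spaces and digits they select the same characters. A minimal sketch; the variable names are hypothetical.

```r
# Start of the first line of the sequence above
x <- "1 msnqiysary sgvdvyefih stgsimkrkk"

# Way 1: match only a-m and o-z
way1 <- paste(regmatches(x, gregexpr("[a-mo-z]", x))[[1]], collapse = "")
# Way 2: match anything that is not a space, a digit or n
way2 <- paste(regmatches(x, gregexpr("[^ 0-9n]", x))[[1]], collapse = "")

way1                   # "msqiysarysgvdvyefihstgsimkrkk"
identical(way1, way2)  # TRUE
```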

Does the protein Mbp1 have any coiled-coil motifs? If so, how many, and how long are they?

Mbp1 has two coiled-coil motifs, as shown by the pepcoil tool




The first is located from residues 627-655, with a length of 29, and the second ranges from 740-767, with a length of 28

What information can we determine from pepstats using EMBOSS tools?




Be sure to give specific examples

We obtain calculated statistical information relating to the properties of the protein (e.g. molecular weight, charge, isoelectric point, extinction coefficients, etc.)

What do these lines of code do?




if (missing(database) | missing(table) | missing(id)) {
  stop("database, table and id have to be specified")
}

missing() within a function checks whether the argument has been provided




The vertical bar means "OR"




Therefore, if any of the three arguments has not been provided, the condition is TRUE and stop() is executed, halting the function with the error message
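The check can be seen in action inside a complete function. A minimal sketch in R: the function name `getData()` and the toy `db` table are hypothetical; only the argument names and the error message come from the card. It uses `||`, the short-circuit OR usually preferred inside if(); `|` gives the same result here because each missing() call returns a single TRUE/FALSE.

```r
# Hypothetical accessor; the argument names follow the card above
getData <- function(database, table, id) {
  if (missing(database) || missing(table) || missing(id)) {
    stop("database, table and id have to be specified")
  }
  tbl <- database[[table]]
  tbl[tbl$id == id, , drop = FALSE]
}

db <- list(protein = data.frame(id = c("P1", "P2"), name = c("Mbp1", "other"),
                                stringsAsFactors = FALSE))

getData(db, "protein", "P1")   # the row for P1

# Leaving out id triggers stop(); tryCatch() captures the error message
msg <- tryCatch(getData(db, "protein"), error = function(e) conditionMessage(e))
msg   # "database, table and id have to be specified"
```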