Wednesday, September 25, 2024

digital governance ideas

🚨 [AI Governance] Have you come across the term "digital governance"? Did you know that privacy careers are evolving, and AI governance expertise is becoming increasingly valuable? Here's what you need to know:

1️⃣ The IAPP has just released its "Organizational Digital Governance Report 2024," which is essential reading for those working in privacy and AI governance (link provided below).

2️⃣ Many organizations face challenges in structuring their internal digital governance frameworks. Alongside this, privacy roles are shifting, and top privacy professionals must now be well-versed in navigating AI governance concerns. Here are two key excerpts from the IAPP report:

"Existing C-suite leaders of specific domains are seeing their personal remits expanded and elevated. For example, 69% of chief privacy officers surveyed have acquired additional responsibility for AI governance, while 69% are responsible for data governance and data ethics, 37% for cybersecurity regulatory compliance, and 20% for platform liability. This trend continues at a team level, with over 80% of privacy teams gaining responsibilities that extend beyond privacy. At 55%, more than one in two privacy professionals works in functions with AI governance responsibilities. At 58%, more than one in two privacy pros has picked up data governance and data ethics. At 32%, almost one in three covers cybersecurity regulatory compliance. At 19%, almost one in five has platform liability responsibilities."

-

"One trend taking root is the expansion and greater empowerment of the role of the CPO. Some of this extends to include responsibility for other digital governance subdomains. For example, many CPOs have acquired responsibility for AI governance. The IAPP-EY Professionalizing Organizational AI Governance Report found 63% of organizations have tasked their privacy functions with AI governance responsibility. In some cases, the extension is even broader to include digital safety and ethics. These are logical extensions in many ways, given the disciplinary, regulatory and governance overlaps. This trend can be observed through the changing job titles in the market, with many additional descriptors tagged on to CPO titles over the past year."

3️⃣ Privacy careers are evolving, adapting to new demands. As AI continues to expand, AI governance skills are increasingly sought after. If you're looking to enhance your privacy career or transition into AI governance, now is an opportune time to upskill and pursue AI governance training—many already recognize this shift.

4️⃣ Check out the full report below.

5️⃣ Considering designing an AI governance Bootcamp. Any ideas welcome. 

6️⃣ In the ever-changing world of "digital governance," staying informed is crucial. Perhaps a newsletter for the latest updates in AI policy, compliance, and regulation would be good.

➡️ Read the full report:
https://t.co/kxdsvbHuFy

Tuesday, September 24, 2024

Qualitative Research Fundamentals - Social Sciences

 Book Ref : Research Methods and Statistics for the Social Sciences: A Brief Introduction by Amber DeBono (author) (ISBN: 9781516537389) 

Variable - anything that can be measured or changed in a research study is called a variable.

Independent Versus Dependent Variable

The independent variable is the factor that the researcher believes will affect an outcome. Variables that cannot be experimentally manipulated (constants), are not true independent variables.

The dependent variable is the variable that the researcher believes will be affected by the independent variable. 

Categorical Versus Continuous Variables 

A categorical variable is a variable in which participants belong to different groups or categories.

A continuous variable is a variable in which participants can fall anywhere on a spectrum of scores.  

It is important to know whether variables are categorical or continuous, because this affects which type of statistical analysis researchers will use to analyze their data.

Hypothesis

Null hypothesis always states that your independent variable will have no effect on the dependent variable. We test the null hypothesis with statistical analyses in order to find support for our research hypothesis. We call this null hypothesis testing.

Example

- Research Question : Is [Group1] more likely than [Group2].

- Research Hypothesis : If [condition], then [Group1] [consequence] more likely than [Group2 consequence].

- Null Hypothesis : If [condition], then [Group1] [consequence] will be equally likely to [Group2 consequence].

Directional Versus Non-Directional Research Hypothesis  

Directional research hypotheses always predict that one group will be higher than the other on the dependent variable. 

Nondirectional research hypotheses are written in a way that differences are predicted between the groups, but the researcher isn’t sure which group will be higher than the other. 

Example. 

- Directional Research Hypothesis: If [condition], then [Group1] [consequence] more likely than [Group2 consequence]. 

- Nondirectional Research Hypothesis: If [condition], then [Group1] [consequence] more or less likely than [Group2] [consequence].

Many statistical tests assume that the data are approximately "normal". If there is a strong skew, the data may need to be transformed toward a normal distribution before those tests are used.

Probability and Null Hypothesis Testing: p < .05

The most important probability is p = .05. Converted into a percentage, this is a probability of 5%. Typically, social scientists test the null hypothesis, and they do not want the null hypothesis to be true. When they calculate their statistical tests on the null hypothesis, they want the probability of getting results like theirs, if the null hypothesis were true, to be less than 5% (p < .05). That’s a pretty low probability. By demonstrating how unlikely their data would be under the null hypothesis, the researchers have evidence to support their research hypothesis. If p > .05, consider improving your study design, for example by increasing your sample size.

Descriptive Versus Inferential Statistics 

Descriptive statistics describe the data (e.g., demographics). Inferential statistics infer the relationship between variables (e.g., height vs. weight: a taller person may weigh more, though of course this may not be true in all cases).

Raw scores are the actual data before we transform or standardize them. When we count the occurrences of a raw score, the count is called its frequency.

Statistical Tests

You are always testing the null hypothesis (which assumes no relation between variables). Your aim is to find p < .05, so that you can reject it.

--------------------------------------

Central tendency: Values that represent the central point within a group of scores

Mode: The most frequently occurring score or value in a set of scores or values

Median: The midpoint in the set of scores in which 50% of the scores are above this midpoint and 50% are below

Mean: The sum of the scores in a dataset divided by the number of scores (also known as the average)

Variance: A single number that describes how spread out the scores in a dataset are

Standard deviation: A single number derived from the variance that states how spread out the scores in a dataset are. The standard deviation is typically reported in manuscripts to describe the variance
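The definitions above can be tried out with Python's standard `statistics` module. This is only a minimal sketch: the `scores` list is made-up data, not taken from the book.

```python
import statistics

# Hypothetical raw scores from six participants (illustrative data only).
scores = [2, 4, 4, 5, 7, 8]

mode = statistics.mode(scores)          # most frequently occurring score
median = statistics.median(scores)      # midpoint of the sorted scores
mean = statistics.mean(scores)          # sum of the scores / number of scores
variance = statistics.variance(scores)  # sample variance (divides by N - 1)
sd = statistics.stdev(scores)           # square root of the sample variance

print(mode, median, mean, variance, sd)
```

Note that `statistics.variance` and `statistics.stdev` are the sample versions; `pvariance` and `pstdev` are the population versions.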





Central Tendency (Mean, Median, Mode) vs Spread (Variance)




(N-1) is called the degrees of freedom: the number of values in a distribution of scores that are free to vary.

Conceptually, the formula for variance is a simple fraction: the sum of the squared differences between each score and the mean (the numerator), divided by N-1 (the denominator). That means that the bigger the numerator is, and the smaller the denominator is, the larger your variance. When we look at the formula, this means that the more your scores are different from the mean and the fewer participants you have, the larger your variance will be. Likewise, the closer your scores are to the mean and the more participants you have, the smaller your variance will be. Typically, we want our variance to be small. This is the reason we aim for a higher sample size.
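To make the "simple fraction" concrete, here is a rough sketch of the sample variance written out by hand. The function name `sample_variance` is my own, and the data in the usage line are invented.

```python
def sample_variance(scores):
    """Sample variance: sum of squared deviations divided by N - 1."""
    n = len(scores)
    mean = sum(scores) / n
    # Numerator: how far the scores are from the mean, squared and summed.
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    # Denominator: degrees of freedom (N - 1).
    return sum_of_squares / (n - 1)


print(sample_variance([2, 4, 4, 5, 7, 8]))
```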

The standard deviation tells us how much our scores vary as a whole. That is, smaller standard deviations tell us that the scores aren’t very different from each other, whereas larger standard deviations tell us that the scores are very different from each other.

  

----------------------------------------

Z-Scores (Standardized Scores)

Raw scores are the actual scores that you get from your participants.

Standardized scores are very useful because they tell you how many standard deviations your raw score is from the mean. The sign of Z-score indicates whether the participant is below or above the mean, and the Magnitude tells us if it is an outlier. They are also important for calculating correlations.
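As a small illustration of standardizing, the sketch below converts raw scores to z-scores using the sample standard deviation. The helper name `z_scores` and the data are assumptions for the example.

```python
import statistics


def z_scores(raw_scores):
    """Return how many standard deviations each raw score is from the mean."""
    mean = statistics.mean(raw_scores)
    sd = statistics.stdev(raw_scores)  # sample SD (N - 1 in the denominator)
    return [(x - mean) / sd for x in raw_scores]


# Negative z-scores are below the mean, positive ones are above it.
print(z_scores([2, 4, 4, 5, 7, 8]))
```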


One-Tailed Versus Two-Tailed Test

Recall that a nondirectional research hypothesis will have no specific prediction for the relationship between the research variables, whereas a directional research hypothesis will predict a very specific relationship between the research variables. 

When we have a directional research hypothesis, we should conduct a one-tailed test. 

Ex: if our hypothesis is that Loneliness increases Depression, we want our statistic to be in the top 5% of the distribution, so we test the top tail.

But if our hypothesis is that Loneliness affects Depression (we don't know the direction), we test the lowest 2.5% and the highest 2.5%.

Journals may still want researchers to conduct a 2-tailed test for a directional hypothesis too, because they don't want to take your word about pre-conceived directions!

Z-Test: Your First Statistical Test

This test tells us whether a z-score is substantially different from the mean. Such a difference is called statistical significance. It means that a z-score this extreme should occur less than 5% of the time (p < .05) in your dataset.

So, a 2-tailed Z-test would be conducted to publish in a journal, or if you aren't sure about the direction. :-)    Two-tailed Z-test: see if the z-score is above +1.96 or below -1.96 [Critical Values].

In a bell curve, the top 2.5% starts at a z-score of +1.96 and the bottom 2.5% starts at a z-score of -1.96.



This means that if you get a z-score below -1.96 or above +1.96, a result that extreme would be very unlikely if the Null Hypothesis were true. You win!!  

<<Whenever a researcher calculates a z-score below -1.96 or above +1.96, the researcher can reject the null hypothesis and claim the score demonstrates evidence to support the research hypothesis.>>

One-Tailed Z-test:  

For a one-tailed z-test (when you have a directional research hypothesis and you’re not going to publish your results), you would need to see if your z-score is above +1.64, if your research hypothesis predicts that your score is higher than the mean, or below -1.64, if your research hypothesis predicts that your score is below the mean. Statistical significance is a good thing. 

When we have statistical significance, this means we have evidence to support our research hypothesis.
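The critical values 1.96 and 1.64 come from the standard normal curve, and they can be recovered with `statistics.NormalDist` from the standard library. The `z_test` helper and its parameter names below are hypothetical, just a sketch of the decision rule described above.

```python
from statistics import NormalDist


def z_test(z, alpha=0.05, tails=2, direction="higher"):
    """Return True if the z-score is statistically significant."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tails == 2:
        # Split alpha across both tails: 2.5% at the bottom, 2.5% at the top.
        critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96
        return abs(z) > critical
    # One-tailed: all of alpha sits in the predicted tail.
    critical = std_normal.inv_cdf(1 - alpha)  # about 1.64
    return z > critical if direction == "higher" else z < -critical


print(z_test(1.8))           # two-tailed test
print(z_test(1.8, tails=1))  # one-tailed test, predicted direction "higher"
```

The same z-score of 1.8 falls short of the two-tailed cutoff but passes the one-tailed cutoff, which is why the choice of test should be made before looking at the data.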


Correlational Research Design

This is a type of research study that examines how two or more variables are related to each other. 

A positive correlation is when two variables are related in a way that as one variable increases, the other also increases. This must also logically mean that as scores on one variable decrease, the other one also decreases. 

A negative correlation is when one variable increases, another variable decreases.  

Positive and negative, in terms of correlations, simply explain whether we expect the variables to correspond in the same (positive) or opposite (negative) direction.

This is best suited when you can't change a variable, or don't have the resources (time etc) to do so. Like trying to test correlation between Self Esteem and Exam Scores of students. You can't change the self esteem in an experiment. 

Correlational designs are also an excellent way to replicate the effects from another study. Indeed, if the same effect can be found using multiple research methodologies, this can be some very powerful evidence for your hypotheses!

CORRELATION IS NOT CAUSATION!

Calculating Correlations: Pearson’s r


A scatterplot is a graph that shows the relationship between two variables. As a researcher, you are hoping that your data plots (the dots in the scatterplot) cluster closely to a diagonal line and in the direction that is consistent with your research hypothesis.

The diagonal line that the data plots fall along is called the Regression Line. It is the best-fitting line in the scatterplot, the one that is closest to the most data points.

The more the datapoints cluster on the regression line, the stronger the correlation; that is, the closer your correlation will be to -1.00 for negative correlations or +1.00 for positive correlations.


Most commonly used correlation, Pearson’s r (Pearson, 1920), formula:

r = sum of (z-score on X * z-score on Y) / (N - 1)

Basically, multiply the z-scores for variable X by the z-scores for variable Y. Then you add them up and divide that number by your total number of participants minus 1.
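The recipe above translates almost line by line into code. This is a minimal sketch assuming the z-scores are computed with the sample standard deviation; the function name `pearson_r` and the example data are mine.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson's r: sum of z-score products divided by N - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    # Multiply each participant's two z-scores, add them up, divide by N - 1.
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)


hours_studied = [1, 2, 3, 4]  # hypothetical variable X
exam_points = [2, 4, 6, 8]    # hypothetical variable Y
print(pearson_r(hours_studied, exam_points))
```

Points that fall exactly on a rising line give a correlation of (approximately) +1.00, and points on a falling line give approximately -1.00.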


Summary

Z-scores: Also known as standardized scores, these scores tell us how far each score is from the mean (in terms of standard deviations)

Z-test: A statistical test to find out if a single score is significantly different from the mean

Correlational research: A type of research study that examines how two or more variables are related to each other but does not determine cause and effect

Scatterplot: A graph that includes plots for participants’ data on two variables

Pearson's r: The most frequently reported correlation. It is calculated by summing the products of z-scores on two variables and dividing that sum by the number of participants minus 1 (N - 1).

Regression line: The best-fitting line in a scatterplot that is closest to the most data points

Positive correlation: A correlation in which scores on one variable increase as scores on the other variable increase

Negative correlation: A correlation in which scores on one variable increase as scores on the other variable decrease

Strong correlation: A type of correlation that is good at predicting how one person will score on one variable, knowing how they scored on another variable

Weak correlation: A type of correlation that is not very good at predicting how one person will score on one variable, knowing how they scored on another variable

------------------------------------

When researchers create a questionnaire or other type of assessment, they need to make sure that it is valid (accurate) and reliable (consistent). 

There are three types of reliability that are important for researchers: 
            - internal consistency (Cronbach’s alpha), 
            - test-retest reliability, and 
            - inter-rater reliability. 

Cronbach’s alpha, a measure of internal consistency, tells researchers how closely questionnaire items are correlated with one another. 
Test-retest reliability tells researchers how much their assessment results in similar scores over time. 
Inter-rater reliability tells researchers who use behavioral measures if they are consistently measuring a behavior based on people’s ratings. 

To make sure that their assessment is valid, they need to find evidence for content and construct validity.  Construct validity tells us if our scale is measuring what it is supposed to, whereas content validity tells us if our items are measuring what they are supposed to measure. 


RELIABILITY = Consistent

Our methods (let's say our scales or established questionnaires that we administer to people to measure their "loneliness", for example) are deemed reliable when we have evidence that they are consistent. We want our questionnaire to consistently find the same results. We want the scores on a questionnaire to consistently measure the same characteristic. 

There are three main ways that we measure reliability: 
     - internal consistency (Cronbach’s alpha), 
     - test-retest reliability, and 
     - inter-rater reliability.

Internal Consistency: Cronbach’s Alpha

We want the questionnaires that we use in the social sciences to be internally consistent. This means that all the questions seem to be measuring the same concept, the one that we claim to be measuring. If all the questions are measuring the same concept, then the responses to the questions should mostly be positively correlated with one another, rather than with questions designed to measure a different concept or behavior. 

Instead of examining the multiple correlations between item responses, we look to a single number: the Cronbach’s alpha. The Cronbach’s alpha is like a mega-correlation; it tells us the extent to which responses to all questions correlate with each other.

Cronbach’s alphas for your methods ought to be above .80, although .70 can also be considered acceptable. Less than that, junk the questionnaire, find another.
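The notes do not give the formula for Cronbach's alpha, so the sketch below uses the commonly cited variance-based form, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), as an assumption. The function name `cronbach_alpha` and the response data are made up for illustration.

```python
import statistics


def cronbach_alpha(items):
    """items: one list of responses per question, same participants in each."""
    k = len(items)  # number of questions
    sum_item_variances = sum(statistics.variance(col) for col in items)
    # Total questionnaire score for each participant.
    totals = [sum(responses) for responses in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Three hypothetical questions answered by five participants.
q1 = [1, 2, 3, 4, 5]
q2 = [2, 2, 3, 5, 5]
q3 = [1, 3, 3, 4, 4]
alpha = cronbach_alpha([q1, q2, q3])
print(alpha, "acceptable" if alpha >= 0.70 else "find another questionnaire")
```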

Test-Retest Reliability

We also want to receive the same (or very similar) scores on the questionnaires every time we administer them to each of our participants. Just find the correlation between the responses from the 2 attempts; the correlation should be more than 0.7.

Inter-Rater Reliability
Inter-rater reliability is most often used when behavior is measured directly by raters, not with questionnaires. Again, the correlation between the "raters" or "observers" has to be high.


VALIDITY = measuring what we think we are measuring,
The primary ways that we establish that our scales are valid are 
    - construct validity and 
    - content validity.

Construct Validity: Convergent and Discriminant Validity

Construct validity tells us if our scale is measuring the construct (e.g. the trait) that it’s supposed to. There are two main ways that researchers can establish construct validity: 
        - Convergent Validity and 
        - Discriminant Validity. 

    Convergent Validity: To demonstrate that your questionnaire has good convergent validity, you will need to administer your questionnaire and other questionnaires that measure very similar characteristics. 

Researchers usually want their questionnaires to measure a unique characteristic or to measure that characteristic better than already existing questionnaires. Hence the correlation with the similar questionnaires should be positive but not too high (r > 0.9 would suggest the new questionnaire adds nothing new)!

    For Discriminant validity, researchers want to ensure that the questionnaire they developed is not related to questionnaires that measure characteristics that are unrelated to the characteristic the researcher is hoping to measure. Ex: if you want to measure self-esteem, it should not correlate highly with, say, humour or death anxiety lol. You need to look for a correlation close to zero in this case!

Content Validity = Face Validity 

Content validity tells us if the items in the questionnaire are assessing what they are supposed to (Haynes et al., 1995). One of the best ways to establish content validity is to establish face validity. 
Face validity means that a lay person or expert in the field has reviewed the items in your questionnaire and agrees that your items seem like they would measure what you are trying to measure (Holden, 2010). 

Summary

Reliability: A characteristic of a measure that demonstrates consistency
Internal consistency: A characteristic of a measure that demonstrates that all questions are measuring the same concept; this is typically reported as Cronbach’s alpha (α)
Cronbach’s alpha: A statistic that we use to measure the internal consistency of a questionnaire
Test-retest reliability: A characteristic of a measure that tells a researcher how consistent the scale is over time
Inter-rater reliability: A characteristic of a measure (usually behavioral) that demonstrates the consistency of raters

Validity: A characteristic of a measure that tells us how accurate our measure is, if the measure is measuring what it is designed to measure
Construct validity: A characteristic of a measure that tells us if our scale is measuring the construct it’s supposed to. This includes convergent and discriminant validity.
Convergent validity: The extent to which a measure correlates with similar measures
Discriminant validity: The extent to which a measure does not correlate with unrelated measures
Content validity: A characteristic of a measure that tells us if our items are measuring the content they are supposed to
Face validity: Tells the researcher if, simply reading each item, the items seem to measure the characteristic they are supposed to measure

------------------------------ pg 50----

Experimental Designs

Until now, we discussed what makes a questionnaire good. Let's now discuss what makes an experiment good. The experimental method requires two basic components: an independent and a dependent variable.

The experimental method is considered the best scientific method because it can provide evidence for a cause-and-effect relationship between the independent and dependent variable. To make sure that you have a good experiment, the experimenter must thwart several threats to the experiment’s internal and external validity. Mook (1983) suggested that it is critical for researchers to replicate their results. By doing this, the experimenter has strong evidence that an experiment’s results are generalizable.

There are two main types of experiments: between-subjects and within-subjects

A between-subjects design is an experiment where groups of participants receive different experiences of the independent variable. A within-subjects design is an experiment in which each participant serves as both the experimental and control condition. Often, this means a pre-/post-test design.

Internal validity describes how well an experiment demonstrates the cause-and-effect relationship between the independent and dependent variables. 

Some of the most frequent threats to internal validity are :-
  1. History : History refers to the events that occur between the measurements of the dependent variable in a within-subjects design.  To protect against this threat, a good experimenter would make sure there is as little time as possible in between the pre- and post-test measures of the dependent variable.
  2. Maturation : Maturation refers to changes in participants that occur over time during an experiment. To combat this threat, it is important for the experimenter to make their studies as short as possible.
  3. Practice Effect :  The practice effect is when the experimenter measures the dependent variable so often that the participants perform better on the dependent variable simply because of practice (and not the independent variable). To avoid this validity threat, researchers should keep  measurements of the dependent variable to a minimum.
  4. Reactive Measures : Reactive measures are measurements of the dependent variable that provoke the participants and result in imprecisely measuring the dependent variable. For example, questionnaires that measure participants’ sexual activity and drug use would be considered reactive measures. One way to combat this threat is to obtain a certificate of confidentiality.
  5. Selection :  Selection refers to choosing participants in a way so that our groups are not equal prior to the experiment. To fight this threat, researchers should always randomly assign participants to condition.
  6. Mortality : Mortality refers to participants' drop-out rates. To combat this threat, keep participants incentivized to finish the study.
  7. Demand Characteristics : Demand characteristics occur when the researcher leads participants to behave in a certain way in the experiment. To prevent the occurrence of these demand characteristics, experimenters should create scripts that are practiced and strictly followed during the experiment. Participants, in turn, may respond to questionnaires in a way that is misleading or false. This type of bias is called response bias. Response bias may be due to demand characteristics, but it may simply be due to the participants’ desire to present themselves favorably.

Important Steps to Protect Against These Threats
To ensure that your experiment is internally valid, it is critical for you to randomly assign your participants (in a between-group design). Also, be sure to keep the length of your study short and keep measurements of the dependent variable to a minimum. A good researcher will also practice running the experiment multiple times and follow a script so that all participants are treated similarly (except for the experience of the independent variable).  

External Validity: Generalizing Your Findings
Researchers want to be able to generalize their findings from the sample they recruited to the general population, across time and place. In order to do that, researchers must keep three types of generalizability in mind: 
    - population generalizability, 
    - environmental generalizability, and 
    - temporal generalizability.

Population Generalizability : the ability to demonstrate that the results of studies conducted on only one population can generalize to people of other races, genders, ages, and socioeconomic statuses. Social scientists should, to the extent possible, recruit from a broad swath of the population to ensure good population generalizability.

Environmental Generalizability
The ability to find the same (or very similar) results from an experiment to a situation or environment that differs from that of the original experiment is called environmental generalizability.

Temporal Generalizability
To have good temporal generalizability, you need to conduct your experiment for years and find very similar results every year. 

The Statistics of Assessing Generalizability
To determine generalizability, we will need to find a similar pattern of findings across studies. The best way to assess these patterns is with a meta-analysis.

Some of the most frequent threats to external validity are :-
Artificial Conditions - controlled labs. white rats. Convenience Sampling.
Importance of Replication.


Summary:

  • Experimental method: A research design that includes a manipulated independent variable (assigning participant to different experiences) and a measured dependent variable
  • Between-subjects design: An experiment in which groups of participants receive different experiences of the independent variable
  • Within-subjects design: An experiment in which the participants serve as both the experimental and control conditions
  • Internal validity: The extent to which an experimenter can demonstrate that the independent variable causes changes to the dependent variable
  • History: The events that occur between the measurements of the dependent variable in a within-subjects design
  • Maturation: The changes in participants that occur over time during an experiment
  • Practice effect: Participants perform better on the dependent variable due to multiple measurements of the dependent variable
  • Reactive measures: Measurements of the dependent variable that provoke the participants
  • Selection: Choosing participants in a way so that groups are not equal prior to the experiment
  • Mortality: Participants’ dropout rates that are particularly problematic if dropout rates differ between experimental conditions
  • Demand characteristics: The researcher leads participants to behave in a certain way in the experiment
  • Response bias: Participants in a research study respond in a way that presents themselves more favorably
  • External validity: The extent to which experimental results apply to different populations and situations
  • Population generalizability: The ability to apply the results of an experiment to a group of participants that is different and more encompassing than those used in the original experiment
  • Environmental generalizability: The ability to find the same (or very similar) results from an experiment to a situation or environment that differs from the original experiment
  • Temporal generalizability: The ability to find the same (or very similar) results from an experiment over time
  • Meta-analysis: A statistical analysis that examines the combined findings of multiple studies (published and unpublished)
  • Artificial conditions: A research environment, such as a laboratory, that does not look or feel like the participants’ natural environment
  • Convenience sampling: Recruiting participants who are convenient or easy to find and participate in research
  • Replication: Repeating an experiment with a new set of participants.

----------------------------

Having learned the experimental designs for data collection, we now learn how to analyze our data.

Recall that the two most common types of experimental designs are between and within subjects. 

Between-subjects design refers to experiments in which people are assigned to different experimental groups and the researcher determines if there is a difference between these groups. Each group  experiences completely different aspects of the independent variable, in this case social exclusion. This type of research design is often referred to as a classic experimental design because it is a frequently used experimental design. 

Within-subjects design :  This type of research design means that the subjects experience all or some of the aspects of the independent variable. Participants are serving as both the control and experimental condition. This is also an example of a pre-/post-test design. In this case, the dependent variable (aggression) was measured before and after the experimental intervention, making participants feel left out. 

Pros and Cons for using a between-subjects design. 

- A between-subjects design tends to be shorter because the dependent variable is only measured once, whereas within-subjects designs are repeated-measures designs.
- Order effects (~ carryover effects) are irrelevant because the order of the independent variable and dependent variable is the same for all participants. However, for a within-subjects design, this would be a major concern.




Two-Group Between-Subjects Design: Independent T-Test

The simplest research design is a two-group between-subjects design, with one independent variable and one dependent variable.
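The notes stop at this heading, so the following is only a rough sketch of what an independent t-test computes for two groups. It assumes the standard pooled-variance (Student) form, which may or may not be the exact variant the book presents; the function name and the scores are invented.

```python
import statistics


def independent_t(group_a, group_b):
    """Return (t, degrees of freedom) for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their df.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = (pooled * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean_a - mean_b) / standard_error, df


# Hypothetical dependent-variable scores for two conditions.
experimental = [5, 6, 7]
control = [2, 3, 4]
print(independent_t(experimental, control))
```

The resulting t value would then be compared against a critical value for the given degrees of freedom, in the same spirit as the z-test cutoffs earlier in these notes.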





















Saturday, September 21, 2024

Quantitative Research Fundamentals

REF BOOK: Statistics for Management by Richard I Levin, David S Rubin, Sanjay Rastogi and Masood Husain Siddiqui 

Chapter 2

DEFINITIONS

Continuous Data : Data that may progress from one class to the next without a break and may be expressed by either whole numbers or fractions.

Cumulative frequency distribution : A tabular display of data showing how many observations lie above, or below, certain values.

Data : A collection of any number of related observations on one or more variables.

Data Array : The arrangement of raw data by observations in either ascending or descending order.

Data Point : A single observation from a data set.

Data Set:  A collection of data.

Discrete Classes : Data that do not progress from one class to the next without a break; that is, where classes represent distinct categories or counts and may be represented by whole numbers.

Frequency Curve: A frequency polygon smoothed by adding classes and data points to a data set.

Frequency Distribution: An organized display of data that shows the number of observations from a data set that fall into each of a set of mutually exclusive and collectively exhaustive classes.

Frequency Polygon : A line graph connecting the midpoints of each class in a data set, plotted at a height corresponding to the frequency of the class.

Histogram : A graph of a data set, composed of a series of rectangles, each proportional in width to the range of values in a class and proportional in height to the number of items falling in the class, or the fraction of items in the class.

Ogive: A graph of a cumulative frequency distribution.

Open-Ended Class : A class that allows either the upper or lower end of a quantitative classification scheme to be limitless.

Population : A collection of all the elements we are studying and about which we are trying to draw conclusions. 

Raw Data: Information before it is arranged or analyzed by statistical methods.

Relative Frequency Distribution : the display of a data set that shows the fraction or percentage of the total data set that falls into each of a set of mutually exclusive and collectively exhaustive classes 

Representative Sample : A sample that contains the relevant characteristics of the population in the same proportions as they are included in that population.

Sample : A collection of some, but not all, of the elements of the population under study, used to describe the population.

Width of class intervals = (Next unit value after largest value in data - Smallest value in data) / Total number of class intervals
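
The class-width formula can be checked with a small Python sketch. The function name and the numbers are hypothetical; the example assumes whole-number data, so the "next unit value" after a largest value of 61 is 62.

```python
def class_width(next_unit_after_largest, smallest, num_classes):
    """Width of class intervals, following the formula above."""
    return (next_unit_after_largest - smallest) / num_classes

# Hypothetical data: whole numbers ranging from 12 to 61, grouped into 5 classes.
width = class_width(next_unit_after_largest=62, smallest=12, num_classes=5)
print(width)  # 10.0
```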

Chapter 3

Bimodal Distribution: A distribution of data points in which two values occur more frequently than the rest of the values in the data set.

Boxplot : A graphical EDA technique used to highlight the center and extremes of a data set.

Chebyshev's Theorem : No matter what the shape of a distribution, 
- at least 75 percent of the values in the population will fall within 2 standard deviations of the mean and
- at least 89 percent will fall within 3 standard deviations.
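
The 75 and 89 percent figures come from the general form of the theorem: at least 1 - 1/k^2 of the values lie within k standard deviations of the mean, for k > 1. The sketch below computes that bound and compares it with the observed fraction for a hypothetical, deliberately skewed data set that is treated as a whole population. The function names and the data are illustrative assumptions.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of values within k population standard deviations of the mean."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)

# Hypothetical, strongly skewed data treated as a whole population.
data = [1, 1, 2, 2, 3, 3, 4, 5, 8, 40]

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    observed = fraction_within(data, k)
    print(f"k={k}: bound={bound:.3f}, observed={observed:.3f}")
```

The observed fraction can be well above the bound; the theorem only promises that it is never below it.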

Coding: A method of calculating the mean for grouped data by recoding values of class midpoints to more simple values.

Coefficient of Variation: A relative measure of dispersion, comparable across distributions, that expresses the standard deviation as a percentage of the mean.
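
A short Python sketch of the coefficient of variation follows. It assumes the data are a whole population (so the population standard deviation is used), and both data sets are hypothetical. They have the same standard deviation but very different means, which is the situation where a relative measure is useful.

```python
import statistics

def coefficient_of_variation(data):
    """Standard deviation expressed as a percentage of the mean."""
    return statistics.pstdev(data) / statistics.fmean(data) * 100

# Two hypothetical data sets with the same spread but different means.
small_values = [10, 20, 30]
large_values = [1000, 1010, 1020]

print(round(coefficient_of_variation(small_values), 2))
print(round(coefficient_of_variation(large_values), 2))
```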

Deciles:  Fractiles that divide the data into 10 equal parts.

Dispersion: The spread or variability in a set of data.

Distance Measure: A measure of dispersion in terms of the difference between two values in the data set.

Exploratory Data Analysis (EDA): Methods for analyzing data that require very few prior assumptions.

Fractile: In a frequency distribution, the location of a value at or above a given fraction of the data. 

Geometric Mean: A measure of central tendency used to measure the average rate of change or growth for some quantity, computed by taking the nth root of the product of n values representing change.
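
To make the "nth root of the product of n values" concrete, here is a minimal Python sketch. The growth factors are hypothetical (a 10 percent rise, a 20 percent rise, then a 5 percent fall), and the function name is only illustrative.

```python
import math

def geometric_mean(factors):
    """nth root of the product of n growth factors."""
    n = len(factors)
    return math.prod(factors) ** (1 / n)

# Hypothetical yearly growth factors: +10%, +20%, -5%.
growth_factors = [1.10, 1.20, 0.95]
average_factor = geometric_mean(growth_factors)

# Applying the average factor three times reproduces the overall growth.
print(round(average_factor, 4))
print(round(average_factor ** 3, 4), round(math.prod(growth_factors), 4))
```

Python's standard library also offers `statistics.geometric_mean` (Python 3.8 and later), which should give the same result for positive values.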

Interfractile Range : A measure of the spread between two fractiles in a distribution, that is, the difference between the values of two fractiles.

Interquartile Range : The difference between the values of the first and the third quartiles; this difference indicates the range of the middle half of the data set. 
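
The interquartile range can be sketched in Python with `statistics.quantiles`. The data are hypothetical, and the quartile values depend on the interpolation convention: this example uses the function's default "exclusive" method, which may differ slightly from the rule used in a given textbook.

```python
import statistics

# Hypothetical data set.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 18]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
interquartile_range = q3 - q1

print(q1, q2, q3, interquartile_range)
```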

Kurtosis: The degree of peakedness of a distribution of points.

Mean: A central tendency measure representing the arithmetic average of a set of observations.

Measure of Central Tendency : A measure indicating the value to be expected of a typical or middle data point.

Measure of Dispersion : A measure describing how the observations in a data set are scattered or spread out.

Median : The middle point of a data set, a measure of location that divides the data set into halves. 

Median Class : The class in a frequency distribution that contains the median value for a data set.

Mode : The value most often repeated in the data set. It is represented by the highest point in the distribution curve of a data set.

Parameters : Numerical values that describe the characteristics of a whole population, commonly represented by Greek letters.

Percentiles : Fractiles that divide the data into 100 equal parts. 

Quartiles : Fractiles that divide the data into four equal parts.

Range : The distance between the highest and lowest values in a data set.

Skewness : The extent to which a distribution of data points is concentrated at one end or the other; the lack of symmetry.

Standard Deviation : The positive square root of the variance; a measure of dispersion in the same units as the original data, rather than in the squared units of the variance.

Standard Score : Expressing an observation in terms of standard deviation units above or below the mean; that is, the transformation of an observation by subtracting the mean and dividing by the standard deviation.
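
The standard score transformation can be written as a one-line Python function. The function name and the numbers (an observation of 85 in a distribution with mean 70 and standard deviation 10) are hypothetical.

```python
def standard_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Hypothetical observation: 85, from a distribution with mean 70 and standard deviation 10.
z = standard_score(85, mean=70, std_dev=10)
print(z)  # 1.5
```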

Statistics :  Numerical measures describing the characteristics of a sample. Represented by Roman letters. 

Stem and Leaf Display:  A histogram-like display used in EDA to group data, while still displaying all the original values.

Summary Statistics: Single numbers that describe certain characteristics of a data set.

 Symmetrical: A characteristic of a distribution in which each half is the mirror image of the other half. 

Variance : A measure of the average squared distance between the mean and each item in the population.
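
A brief Python sketch of the population variance and its square root, the standard deviation, is given below. The data are hypothetical and are treated as a whole population, so the sum of squared distances is divided by the number of items.

```python
import math
import statistics

# Hypothetical population.
population = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.fmean(population)

# Variance: average squared distance between the mean and each item.
variance = sum((x - mu) ** 2 for x in population) / len(population)

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

print(mu, variance, std_dev)  # 5.0 4.0 2.0
```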

 Weighted Mean: An average calculated to take into account the importance of each value to the overall total, that is, an average in which each observation value is weighted by some index of its importance.
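
A weighted mean can be sketched in a few lines of Python. The values and weights are hypothetical (for example, three scores whose weights reflect their relative importance), and the function name is only illustrative.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical scores and importance weights.
scores = [80, 90, 70]
weights = [2, 3, 5]

print(weighted_mean(scores, weights))  # 78.0
```

With equal weights, the result reduces to the ordinary arithmetic mean.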