P-Hacking: The Menace In Science

The American Statistical Association (2016a) statement quotes the following exchange:

Q: Why do so many colleges and grad schools teach p = 0.05?

A: Because that’s still what the scientific community and journal editors use.

Q: Why do so many people still use p = 0.05?

A: Because that’s what they were taught in college or grad school.

You don’t need to study philosophy, or prepare for the Law School Admission Test (LSAT), to see the flaw in that argument. It’s circular reasoning, and that is the point. The p-value is overused when there are so many other ways to measure the strength of the data and its significance. Plus, the p = 0.05 cut-off is arbitrary, and the threshold varies by field. I have seen papers use p = 0.10, p = 0.05, p = 0.01 and, rarely, p = 0.001. But are the results reliable, replicable, and reproducible? Some studies even manipulate their data to reach these elusive p-values…

Scientific research is the bedrock of pushing society forward. However, not every published result represents the best of science. Some researchers have altered how long a study lasts, ignored a confounding variable that could be causing the results, used a sample size too small to be reliable (which lets luck come into play), or resorted to p-hacking (Adam Ruins Everything, 2017; CrashCourse, 2018; Oliver, 2016).

P-hacking is defined as gathering as many variables as possible, then massaging the huge amount of data until a statistically significant result appears (CrashCourse, 2018; Oliver, 2016). However, that result could be completely meaningless. For example, the 538 blog ran a deliberate p-hacking study for its article “You Can’t Trust What You Read About Nutrition.” It surveyed 54 people, collected over 1,000 variables, and found a statistically significant correlation between eating raw tomatoes and Judaism. 538 did this study just to point out the issue of p-hacking (Aschwanden, 2016).
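
To see why gathering hundreds of variables almost guarantees a “finding,” here is a minimal Python sketch. It is an illustration and not 538’s actual analysis. It mirrors the 54-person, 1,000-variable setup using pure random noise, so every “significant” correlation is a false positive by construction. The helper names (`pearson_r`, `p_value`, `run_study`) are hypothetical, and the p-values come from Fisher’s z approximation rather than an exact test.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Approximate two-sided p-value for 'no correlation' via Fisher's
    z-transform: z = atanh(r) * sqrt(n - 3) is roughly standard normal."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def run_study(rng, n_people, n_variables, alpha=0.05):
    """Correlate one pure-noise outcome with many pure-noise variables and
    return the indices of the variables that look 'significant'."""
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    hits = []
    for v in range(n_variables):
        variable = [rng.gauss(0, 1) for _ in range(n_people)]
        if p_value(pearson_r(variable, outcome), n_people) < alpha:
            hits.append(v)
    return hits


rng = random.Random(2016)  # fixed seed so the run is repeatable
first_hits = run_study(rng, n_people=54, n_variables=1000)
# Replication: re-test that many variables on a fresh, independent sample.
# Because every variable is pure noise, fresh draws are all a replication needs.
replicated_hits = run_study(rng, n_people=54, n_variables=len(first_hits))

print(f"original study: {len(first_hits)} 'significant' noise variables")
print(f"replication:    {len(replicated_hits)} of them hold up")
```

At α = 0.05, we would expect roughly 5% of the noise variables (around 50) to clear the bar. Only about 5% of those should survive a rerun on fresh data, which is why replication is such an effective check.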

The best way to protect ourselves from p-hacking is to replicate the study and see whether we get results similar to the original study’s (Adam Ruins Everything, 2017; John Oliver, 2016). Unfortunately, in science there is no prize for fact-checking (John Oliver, 2016). That is why, when we do research, we must make sure our results are robust by testing multiple times where possible. If that is not possible within your own research, then others need to run a replication study. However, replication studies are rarely funded and rarely get published (Adam Ruins Everything, 2017). A great way around this is to collaborate with scientific peers from multiple universities. Everyone works on the same problem with the same methodology but different datasets, then publishes one paper or a series of papers that confirms a result as replicable and robust. If we don’t do this, the field ends up funding and publishing only exploratory studies, and their results never get evaluated. Unfortunately, the adage for most scientists is “publish or perish,” and as Prof. Brian Nosek from the Center for Open Science said, “There is NO COST to getting things WRONG. THE COST is not getting them PUBLISHED.” (John Oliver, 2016).

The American Statistical Association (2016b) suggested the following approaches to supplement, or even replace, p-values and give a more accurate representation of significance:

  • Methods that emphasize estimation over testing
    • Confidence intervals
    • Credibility intervals
    • Prediction intervals
  • Bayesian methods
  • Alternative measures of evidence
    • Likelihood ratios
    • Bayes factors
  • Decision-Theoretic modeling
  • False discovery rates
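
Of the alternatives listed, false discovery rates are the easiest to sketch in a few lines. The Python below implements the standard Benjamini-Hochberg step-up procedure. This is my choice of FDR method, since the ASA statement names the idea but not a specific procedure. It runs on ten invented p-values, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans in the same order as p_values, where True
    marks a 'discovery' while controlling the false discovery rate at q.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose p-value satisfies p_(k) <= (k / m) * q
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank
    # Every hypothesis ranked at or below the cutoff is a discovery
    discoveries = [False] * m
    for rank, i in enumerate(order, start=1):
        discoveries[i] = rank <= cutoff_rank
    return discoveries


# Ten made-up p-values from ten hypothetical tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
naive = [p < 0.05 for p in p_values]
fdr = benjamini_hochberg(p_values, q=0.05)
print(sum(naive), "pass p < 0.05;", sum(fdr), "survive Benjamini-Hochberg")
```

Five of these made-up results clear a naive 0.05 cut-off, but only the two smallest survive once the procedure accounts for having run ten tests.
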

Have hope: most reputable scientists don’t take the result of one study to heart, but look at it in the context of all the work done in that field (Adam Ruins Everything, 2017). Also, most reputable scientists tend to downplay the implications and generalizations of their results when they publish their findings (American Statistical Association, 2016b; Adam Ruins Everything, 2017; CrashCourse, 2018; Oliver, 2016). Looking for those kinds of studies, and knowing how p-hacking is done, is the best ammunition against spurious results.

Resources

Finance/Accounting 101: Capital and Operating Expense

Capital Expenditure – CapEx (Finance/Accounting): Includes all spending on an asset that is expected to last for more than a year (Apptio, 2018). Usually, it is used to undertake a new project, but it can also be used to purchase or upgrade equipment, buildings, etc. (Investopedia, n.d.b.). CapEx assets are depreciated over time; see my previous post on depreciation (Apptio, n.d.; Investopedia, n.d.b.). A car is a great example of personal CapEx, given that it depreciates over time and you typically purchase or lease it for more than a year.

Operating Expense – OpEx (Finance/Accounting): Includes all the ongoing costs of running things day to day (Apptio, 2018; Investopedia, n.d.a.). For instance, OpEx could include rent, equipment, inventory costs, marketing, payroll, insurance, and funds allocated for research and development (Investopedia, n.d.a.). Essentially, the rent you pay for a place to live, or for a rental car, can be considered your own OpEx. Your health, dental, vision, disability, housing, and car insurance, etc. also fall under this category. Even gas for your car fits here, since it keeps your asset operable. According to Apptio (2018), bills like electricity and water can fall under this category as well.

Your budget can be more CapEx-heavy or more OpEx-heavy, and each has its benefits. If you are more CapEx-heavy, your costs are more predictable in the long run and you can easily calculate your net worth. In that scenario, however, you may not have enough cash to pay for some opportunities. If you are more OpEx-heavy, you tend to save more money for investment purposes. Here you have more flexibility to take on an opportunity, but it’s harder to show or calculate your net worth.

Another way to look at this: OpEx is like the cloud storage plan on your phone, where you pay for what you use, be it 5 gigs, 25 gigs, 50 gigs, etc. CapEx, by contrast, is steady: you would rather pay for the entire asset and use as much or as little of it as you want.
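
The pay-as-you-go versus buy-outright comparison boils down to break-even arithmetic. This minimal Python sketch uses made-up prices and hypothetical helper names. It finds the month in which the running total of a subscription (OpEx) reaches the one-time purchase price (CapEx), and it ignores interest, resale value, and maintenance.

```python
import math


def break_even_month(purchase_price, monthly_fee):
    """First month in which the running total of a pay-as-you-go fee (OpEx)
    reaches the one-time purchase price (CapEx). Ignores interest, resale
    value, and maintenance, which a real comparison would include."""
    return math.ceil(purchase_price / monthly_fee)


def cumulative_costs(purchase_price, monthly_fee, months):
    """Return (month, capex_total, opex_total) rows for a side-by-side view."""
    return [(m, purchase_price, monthly_fee * m) for m in range(1, months + 1)]


# Hypothetical numbers: a $1,200 purchase versus a $50/month subscription
month = break_even_month(1200, 50)
for m, capex, opex in cumulative_costs(1200, 50, 36)[::12]:
    print(f"month {m:2d}: CapEx ${capex:,}  OpEx ${opex:,}")
print("OpEx catches up with CapEx in month", month)
```

Before the break-even month, the OpEx route keeps more cash free. After it, owning is cheaper for as long as you keep using the asset.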

Resources:

Finance/Accounting 101: Sunk and Opportunity Costs

Opportunity Cost (Finance): The cost of what you miss out on when you choose one option over another (Investopedia, n.d.a.). This usually occurs when you have limited resources. If you have little money to budget, then after you set aside money for all your needs, you only have so much left for your wants. You cannot buy all your wants, so when you buy one want, you may not be able to afford another. The biggest limited resource we have is time, and time is usually associated with money. When I did my doctorate, I couldn’t use that time to go to law school, so my opportunity cost during my doctorate was law school. However, going to law school now would mean that the opportunity cost I have to pay is time with family, friends, and pets. As Ursula from The Little Mermaid said, “You can’t get something for nothing”; even free things have their cost. You can get a free cookie, piece of cake, ice cream, or pizza slice, but that may mean more time in the gym to burn off those unneeded calories.

Opportunity cost can be calculated in dollars, time, or any other metric, but since you forgo that opportunity in exchange for another, you cannot claim it for accounting purposes. However, calculating it can be extremely useful for decision making.
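
Because the opportunity cost of a choice is the value of the best alternative you give up, you can compute it once every option is valued in the same unit. This small Python sketch uses hypothetical dollar values for three ways to spend a Saturday, and the function names are my own.

```python
def opportunity_cost(options, chosen):
    """Value of the best alternative you give up by picking `chosen`.
    `options` maps each option to an estimated value in one shared unit
    (dollars, hours, ...)."""
    alternatives = [value for name, value in options.items() if name != chosen]
    return max(alternatives, default=0)


def net_advantage(options, chosen):
    """Positive if `chosen` beats the best forgone alternative."""
    return options[chosen] - opportunity_cost(options, chosen)


# Hypothetical ways to spend one Saturday, valued in dollars
saturday = {
    "freelance project": 300,
    "overtime shift": 240,
    "sell old furniture": 90,
}
for option in saturday:
    print(option, "-> opportunity cost:", opportunity_cost(saturday, option),
          "net advantage:", net_advantage(saturday, option))
```

Only the option with a positive net advantage beats everything it displaces. The values themselves are estimates you supply, which is where most of the real work lies.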

Sunk cost (finance/accounting): A cost that has already been incurred and cannot be recovered. It matters especially when deciding whether to continue to invest or to divest (Investopedia, n.d.b.). The sunk cost fallacy is our human tendency to factor sunk costs into the decision to keep moving forward. For instance, say you majored in physics in college, and in your senior year you realized you want to be a biologist instead. You have to decide whether to finish physics as a double major with biology, finish physics and stay in that field, or stop studying physics and pursue biology. The sunk costs are all the physics classes that won’t count towards a degree in biology. Some people may look at the problem and say they are 3-4 classes away from the degree, so they might as well suck it up. Others may say they have enough for a minor and should cut their losses. When making a decision like this, we should look at the problem afresh, without looking at what was already invested. If you hate physics but are 3-4 classes away, you will hate those three or four classes and your future career. It makes no sense to continue.

A sunk cost doesn’t mean that you cannot try to recover some value from what you invested. For instance, you could claim a minor in physics, or see which of those credits can transfer to lighten the load of classes you need for a biology major. That is a smart way to minimize sunk cost. And if there is a sunk cost, that is OK. The mistake is to keep wasting resources on a lost cause and increasing the sunk cost.
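
The rule of looking at the problem afresh translates directly into code: compare options only on what they will cost and return from here on. This Python sketch uses a hypothetical project with made-up numbers rather than the physics example. Salvage shows up as a future benefit of stopping, mirroring the idea of claiming a minor or transferring credits.

```python
def best_option(options):
    """Pick the option with the highest *future* net value.

    Each option is (name, future_benefit, future_cost). Sunk costs are
    deliberately not an input: money already spent is gone whichever
    option you pick, so it cannot change which option is best.
    """
    return max(options, key=lambda o: o[1] - o[2])[0]


SUNK_COST = 8_000  # hypothetical amount already spent; shown only for context

options = [
    ("finish the project", 4_000, 5_000),         # future benefit, future cost
    ("stop and salvage what you can", 1_000, 0),  # salvage recovers some value
]
print("Already spent:", SUNK_COST)
print("Best choice going forward:", best_option(options))
```

Raising or lowering `SUNK_COST` never changes the answer, which is the whole point: only the future columns decide.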

Personally, I have fallen victim to the sunk cost fallacy a few times as a life-long learner, especially when reading a book. Just because I checked out a book from the library doesn’t mean I have to read it from cover to cover, especially if I don’t like it after a few chapters. But, again, we tend to want to see things through to the end. Letting go of a book is a great exercise in building resilience against the sunk cost fallacy. Give it a try. Is there a book on your nightstand that you just don’t want to read anymore? Then let it go. Donate it to a library, a school, etc. Relish the fact that you didn’t give in to the sunk cost fallacy by reading that book to the end.

Resources:

Finance/Accounting 101: Direct and Indirect Costs

Direct Cost (Finance/Accounting): A cost, fixed or variable, that is 100% dedicated to a single service, asset, etc. (Apptio, 2018; Investopedia, n.d.a.). Imagine you buy a new laptop. Its purchase price is a fixed direct cost of acquiring it.

Indirect Cost (Finance/Accounting): Costs that are shared among services, assets, etc. (Apptio, 2018). Let’s look at the laptop you just bought. Even though the price of the physical laptop is a fixed direct cost, you have indirect fixed and variable costs associated with it. Some of the indirect fixed costs come from purchasing software, such as the OS license and virus and malware detection software. Some of the indirect variable costs come from the electricity you use to keep your laptop’s battery charged. Indirect costs can be hard to find if your budget isn’t transparent (Apptio, 2018).
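
The laptop example can be turned into a rough total-cost-of-ownership calculation. This Python sketch uses made-up prices and a hypothetical helper. It files software under indirect fixed costs, as the paragraph above does, and it estimates electricity from an assumed average power draw.

```python
def laptop_cost_of_ownership(years, price, software_per_year,
                             watts, hours_per_day, price_per_kwh):
    """Split a laptop's total cost into direct and indirect parts."""
    direct_fixed = price                          # the laptop itself
    indirect_fixed = software_per_year * years    # licenses, antivirus, ...
    kwh = watts / 1000 * hours_per_day * 365 * years
    indirect_variable = kwh * price_per_kwh       # electricity, varies with use
    return {
        "direct fixed": direct_fixed,
        "indirect fixed": indirect_fixed,
        "indirect variable": indirect_variable,
        "total": direct_fixed + indirect_fixed + indirect_variable,
    }


# Hypothetical inputs: $1,000 laptop, $100/year of software, 50 W for
# 8 hours a day at $0.15 per kWh, kept for 3 years
costs = laptop_cost_of_ownership(3, 1000, 100, 50, 8, 0.15)
for part, amount in costs.items():
    print(f"{part:>17}: ${amount:,.2f}")
```
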

Resources:

Finance/Accounting 101: Fixed and Variable Costs

Fixed Expense (Finance/Accounting): Expenses that remain the same over time (Apptio, 2018; Investopedia, n.d.a.). If you have a gym membership, you are charged a flat membership fee every period. Thus, you know exactly how much to budget for it.

Variable Expense (Finance/Accounting): Expenses that change over time (Apptio, 2018; Investopedia, n.d.b.). A great example is people who don’t have an unlimited talk, text, and data plan on their cell phone. Because the carrier measures the exact minutes they spend talking each month, the number of texts they send or receive, and how much data they download, their cell phone bill will be variable. Here, you don’t know exactly how much to budget for. Things happen.

Variable expenses aren’t necessarily bad, nor are fixed expenses necessarily good. It depends on the context, asset, service, etc. Therefore, you should regularly evaluate your budget and check whether your fixed and variable expenses are justified. The benefit of a variable expense is that you have the most leverage over how much you consume or spend, giving you greater control over your budget than a fixed expense does. This leverage gives you budget flexibility (Apptio, 2018).

A healthy budget would take into account fixed and variable costs and will have an appropriate mix of the two.
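
A minimal Python sketch of such a mix, with hypothetical prices and helper names. The gym fee stays put every month, while a metered phone bill moves with data usage.

```python
FIXED_EXPENSES = {"gym membership": 40.0}  # same amount every month


def metered_phone_bill(gb_used, base_fee=20.0, price_per_gb=5.0):
    """Variable expense: a metered plan's bill grows with usage."""
    return base_fee + gb_used * price_per_gb


def monthly_budget(gb_used):
    """Return (fixed, variable, total) spending for one month."""
    fixed = sum(FIXED_EXPENSES.values())
    variable = metered_phone_bill(gb_used)
    return fixed, variable, fixed + variable


# Hypothetical data usage over three months
for month, gb in zip(["Jan", "Feb", "Mar"], [2, 6, 11]):
    fixed, variable, total = monthly_budget(gb)
    print(f"{month}: fixed ${fixed:.2f} + variable ${variable:.2f} = ${total:.2f}")
```

The fixed line is easy to plan around. The variable line is where both the surprises and the room to cut back live.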

Resources:

Finance/Accounting 101: Amortization and depreciation

The Matching Principle (accounting): Expenses are matched to, and recorded in, the period in which you realize the benefits (Accounting Coach, n.d.). It doesn’t matter when you received or sent out an invoice; it matters only when you get paid or you pay the invoice (Apptio, 2018). For example, I get my credit card statement on the 23rd of the month (a weird date, but it is what it is). The credit card company cannot realize the benefit, my payment, until I pay the statement. Until then, my balance is money owed to them, a receivable rather than cash in hand (Accounting Coach, n.d.). Usually, people have about a month or less to pay back their balance in part or in full. Until I pay, the credit card company cannot account for the money as revenue (Accounting Coach, n.d.; Apptio, 2018). After all, how can the credit card company say I paid for the service rendered if I haven’t cut the check or e-paid my bill? When I do pay, I can pay on the 23rd, the 24th, or the 2nd of the following month. Once I pay my bill, either in part or in full, the credit card company can say it has realized the benefits of the service rendered (in this case, me borrowing money on credit).

Depreciation and Amortization (accounting/finance): These refer to how money is spread throughout the lifetime of a product or service (Apptio, 2018). The best example we have for amortization is a mortgage on a house. When I bought my house, I got a long printout (Excel-sheet style, and a waste of trees). It showed how much I will pay on my mortgage and how much of that goes to escrow, to the principal, and to the interest. The mortgage schedule shows that over time I will pay more toward the principal and less toward the interest, which lowers the book value of my housing loan (Investopedia, n.d.a.). If I were to sell the house after it lost value, for example when a housing bubble bursts, I would be in a budget shock (Apptio, 2018). The reason is that the remaining loan balance is due in full at closing, and I will be on the hook for any difference between that balance and the sale price.
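
The mortgage printout is an amortization schedule, and you can regenerate it from three inputs with the standard level-payment formula. In this Python sketch, the loan amount, rate, and term are hypothetical. Escrow is left out because it doesn’t pay down the loan.

```python
def monthly_payment(principal, annual_rate, years):
    """Level monthly payment for a fixed-rate, fully amortizing loan."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


def amortization_schedule(principal, annual_rate, years):
    """List of (month, interest, principal_paid, remaining_balance) rows.
    Escrow (taxes, insurance) is left out because it does not reduce the loan."""
    payment = monthly_payment(principal, annual_rate, years)
    r = annual_rate / 12
    balance = principal
    rows = []
    for month in range(1, years * 12 + 1):
        interest = balance * r               # interest owed on what is left
        principal_paid = payment - interest  # the rest pays down the loan
        # max() clamps tiny negative floating-point residue in the last month
        balance = max(balance - principal_paid, 0.0)
        rows.append((month, interest, principal_paid, balance))
    return rows


# Hypothetical loan: $200,000 at 6% a year for 30 years
payment = monthly_payment(200_000, 0.06, 30)
schedule = amortization_schedule(200_000, 0.06, 30)
print(f"monthly payment: ${payment:,.2f}")
for month, interest, principal_paid, balance in (schedule[0], schedule[-1]):
    print(f"month {month:3d}: interest ${interest:,.2f}, "
          f"principal ${principal_paid:,.2f}, balance ${balance:,.2f}")
print(f"payoff owed after 5 years: ${schedule[59][3]:,.2f}")
```

The interest share shrinks and the principal share grows every month. The balance in any row, such as `schedule[59]` after five years, is what a sale at that point would have to pay off at closing.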

So, let’s look into buying a new car! When we drive a shiny new car off the lot, it is said to depreciate by over 20% in a matter of seconds. Its value keeps depreciating over the first two years, so the usual advice is to buy a car that is 2-5 years old and keep more of your money, given that most of the depreciation occurs in the first 2-5 years (2 Cents, 2018). Thus, depreciation is not necessarily a loss of the car’s intrinsic usefulness, just a loss of financial value over time (Apptio, 2018). Depreciation is recorded for taxes and in accounting books, where it illustrates the loss of value of an asset over the asset’s life (Apptio, 2018; Investopedia, n.d.b.).
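
Depreciation methods spread an asset’s loss of financial value over its life. This Python sketch compares two common textbook methods, straight-line and double-declining balance, on a hypothetical car. The prices are invented, the declining-balance version is simplified, and tax depreciation rules differ by jurisdiction.

```python
def straight_line(cost, salvage, years):
    """Same depreciation every year: (cost - salvage) / years.
    Returns (year, depreciation, book_value_at_year_end) rows."""
    annual = (cost - salvage) / years
    rows, book = [], cost
    for year in range(1, years + 1):
        book -= annual
        rows.append((year, annual, book))
    return rows


def declining_balance(cost, salvage, years, factor=2.0):
    """Front-loaded depreciation: a fixed rate (factor / years) of the
    remaining book value, never dropping below salvage. Simplified: it does
    not switch to straight-line in later years as some methods do."""
    rate = factor / years
    rows, book = [], cost
    for year in range(1, years + 1):
        depreciation = min(book * rate, book - salvage)
        book -= depreciation
        rows.append((year, depreciation, book))
    return rows


# Hypothetical car: $30,000 new, worth $6,000 after 5 years
for name, method in (("straight-line", straight_line),
                     ("double-declining", declining_balance)):
    print(name)
    for year, dep, book in method(30_000, 6_000, 5):
        print(f"  year {year}: -${dep:,.0f} -> book value ${book:,.0f}")
```

The declining-balance schedule front-loads the loss, which matches the pattern of a new car losing the most value in its first few years.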

Note that in business, assets can be tangible, usually a physical server, a building, etc., or intangible, like patents or copyrights (Apptio, 2018).

Resources:

Adv Database Management: SQL Unions

Please note that the following blog post provides a summary of what you need to get done (each task is named on its own line) and quick examples that illustrate how to do it in SQL (the code beneath each task name). For more information, please see the resources below.

Union
SELECT ename, job, deptno
  FROM emp
  UNION
SELECT name, title, deptid
  FROM emp_history
Intersect
SELECT ename, job, empno
  FROM emp
  INTERSECT
SELECT name, title, empid
  FROM emp_history
Union all (will include duplicate values)
SELECT ename, job, empno
  FROM emp
  UNION ALL
SELECT name, title, empid
  FROM emp_history
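
To check the duplicate handling yourself, this Python sketch runs the three set operators against an in-memory SQLite database. SQLite stands in here for whatever database you use. The sample rows and the `emp_history` column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tiny hypothetical tables; the emp_history column names are assumptions
conn.executescript("""
    CREATE TABLE emp (empno INTEGER, ename TEXT, job TEXT, deptno INTEGER);
    CREATE TABLE emp_history (empid INTEGER, name TEXT, title TEXT, deptid INTEGER);
    INSERT INTO emp VALUES (7369, 'SMITH', 'CLERK', 20), (7499, 'ALLEN', 'SALESMAN', 30);
    INSERT INTO emp_history VALUES (7369, 'SMITH', 'CLERK', 20), (7566, 'JONES', 'MANAGER', 20);
""")

union_rows = conn.execute("""
    SELECT ename, job, deptno FROM emp
    UNION
    SELECT name, title, deptid FROM emp_history""").fetchall()

union_all_rows = conn.execute("""
    SELECT ename, job, deptno FROM emp
    UNION ALL
    SELECT name, title, deptid FROM emp_history""").fetchall()

intersect_rows = conn.execute("""
    SELECT ename, job, empno FROM emp
    INTERSECT
    SELECT name, title, empid FROM emp_history""").fetchall()

print(len(union_rows), "rows from UNION (SMITH appears once)")
print(len(union_all_rows), "rows from UNION ALL (SMITH appears twice)")
print(intersect_rows)
```

Each pair of SELECTs must list compatible columns in the same order, which is why `job` lines up with `title` in both halves.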

Adv Database Management: SQL Sub-queries and views

Subquery
SELECT ename
  FROM emp
  WHERE sal >
    (SELECT sal
     FROM emp
     WHERE empno = 7566)
Correlated Subqueries
SELECT empno, sal, deptno
  FROM   emp outr
  WHERE  sal >
    (SELECT AVG(sal)
     FROM   emp innr
     WHERE  outr.deptno = innr.deptno)
Exists
SELECT empno, ename, job, deptno
  FROM   emp outr
  WHERE  EXISTS
    (SELECT empno
     FROM   emp innr
     WHERE  innr.mgr = outr.empno)
Not Exists
SELECT dname, deptno
  FROM   dept d
  WHERE  NOT EXISTS
    (SELECT *
     FROM   emp e
     WHERE  d.deptno = e.deptno)
In
SELECT empno, ename, job, deptno
  FROM   emp outr
  WHERE empno IN
    (SELECT mgr
     FROM   emp)
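
Here is a runnable check of the correlated subquery and NOT EXISTS patterns. This Python sketch uses an in-memory SQLite database loaded with a small hypothetical sample shaped like the emp and dept tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Small hypothetical sample in the shape of the emp/dept tables
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, mgr INTEGER,
                      sal INTEGER, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'),
                            (30, 'SALES'), (40, 'OPERATIONS');
    INSERT INTO emp VALUES
        (7839, 'KING',  NULL, 5000, 10),
        (7566, 'JONES', 7839, 2975, 20),
        (7788, 'SCOTT', 7566, 3000, 20),
        (7876, 'ADAMS', 7788, 1100, 20),
        (7698, 'BLAKE', 7839, 2850, 30),
        (7900, 'JAMES', 7698,  950, 30);
""")

# Correlated subquery: the inner AVG is recomputed for each outer row's deptno
above_dept_avg = conn.execute("""
    SELECT empno, ename FROM emp outr
    WHERE sal > (SELECT AVG(sal) FROM emp innr
                 WHERE outr.deptno = innr.deptno)
    ORDER BY empno""").fetchall()

# NOT EXISTS: departments with no matching employee row
empty_depts = conn.execute("""
    SELECT dname, deptno FROM dept d
    WHERE NOT EXISTS (SELECT * FROM emp e WHERE d.deptno = e.deptno)""").fetchall()

print(above_dept_avg)
print(empty_depts)
```
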

Creating a view
CREATE VIEW empvu10
  AS SELECT empno, ename, job
     FROM emp
     WHERE deptno = 10
Drop view
DROP VIEW empvu10
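
A view stores the query, not a copy of the rows. This Python sketch demonstrates that with an in-memory SQLite database and hypothetical rows: a row inserted after the view is created still shows up in it, and dropping the view leaves the base table alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER, ename TEXT, job TEXT, deptno INTEGER);
    INSERT INTO emp VALUES (7839, 'KING', 'PRESIDENT', 10),
                           (7782, 'CLARK', 'MANAGER', 10),
                           (7369, 'SMITH', 'CLERK', 20);
    CREATE VIEW empvu10 AS
        SELECT empno, ename, job FROM emp WHERE deptno = 10;
""")

before = conn.execute("SELECT * FROM empvu10 ORDER BY empno").fetchall()

# The view re-runs its query, so a newly inserted department-10 row appears
conn.execute("INSERT INTO emp VALUES (7934, 'MILLER', 'CLERK', 10)")
after = conn.execute("SELECT * FROM empvu10 ORDER BY empno").fetchall()

conn.execute("DROP VIEW empvu10")
views_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()
rows_in_emp = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

print(len(before), "rows before the insert,", len(after), "after")
```
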

Adv Database Management: SQL Group functions

AVG, COUNT, MAX, MIN, STDDEV, SUM, VARIANCE
SELECT AVG(sal), MAX(sal), MIN(sal), SUM(sal)
  FROM emp
  WHERE job LIKE 'Sales%'
COUNT
SELECT COUNT(*)
  FROM emp
  WHERE deptno = 30
Group By
SELECT deptno, AVG(sal)
  FROM emp
  GROUP BY deptno
Rollup and cube
SELECT   deptno, MAX(sal)
  FROM     emp
  GROUP BY ROLLUP (deptno)   -- or CUBE (deptno); MySQL uses GROUP BY deptno WITH ROLLUP
Having
SELECT   deptno, MAX(sal)
  FROM     emp
  GROUP BY deptno
  HAVING max(sal)>2900
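
This Python sketch runs GROUP BY and HAVING on hypothetical rows in an in-memory SQLite database. SQLite has no ROLLUP, so the sketch emulates one: it adds a grand-total row, with NULL in the grouping column, through UNION ALL. That is the extra row ROLLUP produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (ename TEXT, sal INTEGER, deptno INTEGER);
    INSERT INTO emp VALUES ('KING', 5000, 10), ('CLARK', 2450, 10),
                           ('JONES', 2975, 20), ('SCOTT', 3000, 20),
                           ('ADAMS', 1100, 20), ('BLAKE', 2850, 30),
                           ('JAMES', 950, 30), ('MARTIN', 1250, 30);
""")

# HAVING filters groups after aggregation (WHERE filters rows before it)
having_rows = conn.execute("""
    SELECT deptno, MAX(sal) FROM emp
    GROUP BY deptno
    HAVING MAX(sal) > 2900
    ORDER BY deptno""").fetchall()

# ROLLUP emulation: per-group rows plus a grand-total row (deptno is NULL)
rollup_rows = conn.execute("""
    SELECT deptno, MAX(sal) FROM emp GROUP BY deptno
    UNION ALL
    SELECT NULL, MAX(sal) FROM emp""").fetchall()

print(having_rows)
print(rollup_rows)
```
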

 

Database Management: SQL Joins

Equijoins
SELECT e.ename, e.deptno, d.deptno, d.dname
  FROM emp e INNER JOIN dept d
  ON e.deptno = d.deptno
Non-Equijoins
SELECT e.ename, e.sal, s.grade
  FROM emp e INNER JOIN salgrade s
  ON e.sal BETWEEN s.losal AND s.hisal

Given this SALGRADE table:
grade   losal   hisal
-----   -----   -----
1         700    1200
2        1201    1400
3        1401    2000
4        2001    3000
5        3001    9999

the query returns rows such as:
ename    sal    grade
-----   -----   -----
JAMES     950       1
SMITH     800       1
ADAMS    1100       1
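
This Python sketch reproduces the non-equijoin in an in-memory SQLite database. The SALGRADE rows come from the table above. The FORD and KING salaries are hypothetical additions that exercise the higher grades.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salgrade (grade INTEGER, losal INTEGER, hisal INTEGER);
    INSERT INTO salgrade VALUES (1, 700, 1200), (2, 1201, 1400), (3, 1401, 2000),
                                (4, 2001, 3000), (5, 3001, 9999);
    CREATE TABLE emp (ename TEXT, sal INTEGER);
    INSERT INTO emp VALUES ('JAMES', 950), ('SMITH', 800), ('ADAMS', 1100),
                           ('FORD', 3000), ('KING', 5000);
""")

# Non-equijoin: match each salary to the grade whose range contains it
graded = conn.execute("""
    SELECT e.ename, e.sal, s.grade
    FROM emp e INNER JOIN salgrade s
      ON e.sal BETWEEN s.losal AND s.hisal
    ORDER BY e.sal""").fetchall()

for row in graded:
    print(row)
```

Because BETWEEN is inclusive on both ends, the grade ranges must not overlap, or a salary would match two grades.
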
Outer joins
SELECT e.ename, e.deptno,  d.deptno
  FROM emp e RIGHT JOIN dept d
  ON e.deptno = d.deptno

SELECT e.deptno, d.deptno, d.dname
  FROM emp e LEFT JOIN dept d
  ON e.deptno = d.deptno
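
This Python sketch shows which rows each outer join keeps, using an in-memory SQLite database and hypothetical rows. RIGHT JOIN only arrived in SQLite 3.39.0, so the sketch writes the RIGHT JOIN as the equivalent LEFT JOIN with the tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER, dname TEXT);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'),
                            (30, 'SALES'), (40, 'OPERATIONS');
    CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER);
    INSERT INTO emp VALUES (7369, 'SMITH', 20), (7839, 'KING', 10),
                           (7900, 'JAMES', 30), (8000, 'GRANT', NULL);
""")

# Same result as "emp e RIGHT JOIN dept d": keep every department
all_depts = conn.execute("""
    SELECT e.ename, e.deptno, d.deptno
    FROM dept d LEFT JOIN emp e ON e.deptno = d.deptno
    ORDER BY d.deptno""").fetchall()

# "emp e LEFT JOIN dept d": keep every employee, even one with no department
all_emps = conn.execute("""
    SELECT e.ename, e.deptno, d.dname
    FROM emp e LEFT JOIN dept d ON e.deptno = d.deptno
    ORDER BY e.empno""").fetchall()

print(all_depts)  # OPERATIONS (40) has no employees, so its e.* columns are None
print(all_emps)   # GRANT has no department, so d.dname is None
```
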
Self Joins
SELECT worker.ename + ' works for ' + manager.ename   -- + is SQL Server; Oracle and PostgreSQL use ||
  FROM emp worker INNER JOIN emp manager
  ON worker.mgr = manager.empno
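
This Python sketch runs the self join in an in-memory SQLite database with hypothetical rows. SQLite concatenates strings with the standard `||` operator rather than `+`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER, ename TEXT, mgr INTEGER);
    INSERT INTO emp VALUES (7839, 'KING', NULL), (7566, 'JONES', 7839),
                           (7788, 'SCOTT', 7566), (7876, 'ADAMS', 7788);
""")

# The same table appears twice under two aliases: one row as the worker,
# the other as that worker's manager
pairs = conn.execute("""
    SELECT worker.ename || ' works for ' || manager.ename
    FROM emp worker INNER JOIN emp manager
      ON worker.mgr = manager.empno
    ORDER BY worker.empno""").fetchall()

for (sentence,) in pairs:
    print(sentence)
```

KING has no manager, so the inner join leaves him out. A LEFT JOIN would keep him, with a NULL manager name.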