Date of Ranking: June 29, 2018
Rank | huID | P-index
Algorithm Details
The algorithm has two distinct parts: the first scores and ranks the phenotypes of the PGP, and the second scores and ranks its participants. All comparisons are case-insensitive, i.e. they do not distinguish between capital and small letters.
Phenotype Score
A phenotype is defined as any reportable information that can be ascribed to an individual participant. Phenotypes range from trait information to disease status: examples include common quantitative trait measurements such as height and weight, and disease status such as whether the participant has breast cancer, acne or lipoma.
The Phenotype Score of a phenotype is the number of unique participants who reported a valid value for it. For example, a Phenotype Score of 100 indicates that 100 unique participants reported a valid value for that phenotype. The score does not increase if the same participant reports multiple valid values for the same phenotype. A participant need not have the disease for a value to be valid: for the phenotype "diabetes mellitus, type 2", valid values could be "yes", "no", "positive", etc. Invalid values are those that are not informative about the phenotype.
The current invalid phenotype values are:
- Unsure
- Not applicable
- Other / Don't know / No response
- No response
- Not sure
- Unspecified
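The counting rule described above can be sketched in a few lines of Python. This is only an illustration, not the PGP's actual code: the function name `phenotype_scores`, the `(participant_id, phenotype, value)` record layout and the sample participant IDs are assumptions made for the example. The invalid values are the ones listed above.

```python
from collections import defaultdict

# Invalid values as listed above, lower-cased for case-insensitive comparison.
INVALID_VALUES = {
    "unsure",
    "not applicable",
    "other / don't know / no response",
    "no response",
    "not sure",
    "unspecified",
}


def phenotype_scores(records):
    """Count unique participants with at least one valid value per phenotype.

    `records` is an iterable of (participant_id, phenotype, value) tuples;
    this layout is an assumption made for the example.
    """
    reporters = defaultdict(set)
    for participant_id, phenotype, value in records:
        if value.strip().lower() in INVALID_VALUES:
            continue  # uninformative value: does not count
        # A set ignores repeated reports from the same participant.
        reporters[phenotype.strip().lower()].add(participant_id)
    return {phenotype: len(ids) for phenotype, ids in reporters.items()}


# Hypothetical sample data.
records = [
    ("hu000001", "Height", "170 cm"),
    ("hu000001", "height", "171 cm"),    # same participant, counted once
    ("hu000002", "Height", "Not sure"),  # invalid value, ignored
    ("hu000003", "Height", "165 cm"),
    ("hu000002", "Diabetes Mellitus, Type 2", "No"),  # "no" is a valid value
]
print(phenotype_scores(records))
# {'height': 2, 'diabetes mellitus, type 2': 1}
```

Storing participant IDs in a set per phenotype is what keeps the score from increasing when one participant reports several valid values.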
Some phenotypes are synonyms of one another. Although synonyms cannot be determined a priori, the list below was used to map known synonyms to the same phenotype. Each entry gives the phenotype, followed after the colon by the synonyms mapped to it (separated by |).
- Weight: 1.3 --- Weight
- Height: 1.2 --- Height
- Blood Type: 1.1 --- Blood Type
- Race/ethnicity: Race
- Sex/Gender: Gender
- Date of Birth: Date of Birth (mm/dd/yyyy)
- Comments: 1.4 --- Comments | 2.5 ---Comments | 3.3 --- Comments
- Any final thoughts?: 4.1 --- Any final thoughts?
- Handedness: 1.4 --- Handedness
- Left Eye: 2.1 --- Left Eye (Photograph Number) (full-size image: https://goo.gl/XQ2Voh)
- Right Eye: 2.2 --- Right Eye (Photograph Number) (full-size image: https://goo.gl/XQ2Voh)
- Left Eye Color: 2.3 --- Left Eye Color - Text Description
- Right Eye Color: 2.4 --- Right Eye Color - Text Description
- Hair Color: 3.2 --- Hair Color - Text Description
- Natural Hair Color: 3.1 --- What is your natural hair color currently, when without artificial color or dye?
- Systolic Blood Pressure: Blood Pressure, Systolic (Upper Number) | Systolic Pressure | Systolic
- Diastolic Blood Pressure: Blood Pressure, Diastolic (Lower Number) | Diastolic Pressure | Diastolic
- HDL Cholesterol: HDL Cholesterol.
- LDL Cholesterol: LDL Cholesterol Calc | LDL Cholesterol (Calculated)
- High Cholesterol (Hypercholesterolemia): High Cholesterol
- Hepatitis B Vaccine, Adult: hepatitis b vaccine (hepb) adult
- Hepatitis B Vaccine, Type Unknown: hib/hepatitis b vaccine
- Hepatitis A Vaccine, Type Unknown: hepatitis a vaccine (hepa)
- Hepatitis B Vaccine, Adolescent or Pediatric: hepatitis b vaccine (hepb) child
- Hepatitis A/Hepatitis B vaccine: hepatits a and hepatitis b vaccine (hepa/hepb)
- Influenza Vaccine, Type Unknown: influenza vaccine | flu shot
- Myopia (Nearsightedness): Myopia | Nearsightedness
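One way to apply such a synonym list is a lookup table from each synonym to its phenotype, consulted after lower-casing the reported name. The sketch below is a hypothetical illustration: the names `SYNONYMS` and `canonical_phenotype` are assumptions, and only a handful of entries from the list above are included.

```python
# A few entries from the list above, stored as synonym -> phenotype.
# Keys are lower-cased because all comparisons are case-insensitive.
SYNONYMS = {
    "1.3 --- weight": "weight",
    "race": "race/ethnicity",
    "systolic pressure": "systolic blood pressure",
    "systolic": "systolic blood pressure",
    "flu shot": "influenza vaccine, type unknown",
    "nearsightedness": "myopia (nearsightedness)",
}


def canonical_phenotype(name):
    """Return the phenotype that `name` maps to, or `name` itself lower-cased."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)


print(canonical_phenotype("Flu Shot"))  # influenza vaccine, type unknown
print(canonical_phenotype("SYSTOLIC"))  # systolic blood pressure
print(canonical_phenotype("Height"))    # height
```

Names that are not in the table pass through unchanged (apart from lower-casing), so phenotypes without known synonyms are unaffected.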
One can use the Phenotype Score as a tool to browse the phenotypes available from the PGP.
P-index
The P-index of a participant is calculated by adding up the Phenotype Scores of all phenotypes reported by that participant and dividing the sum by the theoretical maximum score, i.e. the sum of all Phenotype Scores. The result is multiplied by 100, giving a value between 0 and 100. A P-index of 0 indicates that the participant reported no phenotypes, while a P-index of 100 indicates that the participant reported all phenotypes available from the PGP.
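The calculation can be illustrated with a short Python sketch. The function name `p_index`, the sample scores and the guard for an empty score table are assumptions made for this example and are not taken from the PGP's implementation.

```python
def p_index(reported, scores):
    """P-index = 100 * (sum of scores of reported phenotypes) / (sum of all scores).

    `reported` is the collection of phenotypes a participant reported and
    `scores` maps every phenotype to its Phenotype Score.
    """
    maximum = sum(scores.values())
    if maximum == 0:
        return 0.0  # assumption: no scored phenotypes means a P-index of 0
    total = sum(scores[p] for p in set(reported) if p in scores)
    return 100.0 * total / maximum


# Hypothetical Phenotype Scores.
scores = {"height": 100, "weight": 80, "blood type": 20}

print(p_index({"height", "blood type"}, scores))  # 60.0
print(p_index(set(), scores))                     # 0.0
print(p_index(scores.keys(), scores))             # 100.0
```

In the first call the participant's phenotypes sum to 120 out of a maximum of 200, which gives 60. Because each phenotype is weighted by its score, widely reported phenotypes contribute more to the P-index than rarely reported ones.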
The P-index is then used to rank the participants of the PGP: the higher the P-index, the better the rank. If two or more participants have exactly the same P-index, they are considered tied and their rank is prefixed with the letter T. For example, if 2 participants are tied at 10th place, both are ranked T10 and the next participant is ranked 12, i.e. there is no 11th place.
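The tie rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `rank_participants` and the sample participant IDs and P-index values are hypothetical.

```python
def rank_participants(p_indexes):
    """Return (rank_label, participant_id, p_index) tuples, best first.

    Tied participants share a rank prefixed with "T", and the places
    occupied by the tie are skipped for the next participant.
    """
    ordered = sorted(p_indexes.items(), key=lambda item: item[1], reverse=True)

    # How many participants share each P-index value.
    counts = {}
    for _, value in ordered:
        counts[value] = counts.get(value, 0) + 1

    ranked = []
    rank = 0
    previous = None
    for position, (participant_id, value) in enumerate(ordered, start=1):
        if value != previous:
            rank = position  # new P-index value: rank jumps to the position
            previous = value
        label = f"T{rank}" if counts[value] > 1 else str(rank)
        ranked.append((label, participant_id, value))
    return ranked


# Hypothetical P-index values.
example = {"huA": 90.0, "huB": 75.5, "huC": 75.5, "huD": 60.0}
for row in rank_participants(example):
    print(row)
# ('1', 'huA', 90.0)
# ('T2', 'huB', 75.5)
# ('T2', 'huC', 75.5)
# ('4', 'huD', 60.0)
```

Here huB and huC are tied at second place, so both are labelled T2 and there is no third place.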
Not all phenotypes are used for ranking participants. The exclusion criteria of the current implementation (which differ from those reported in the publication) are:
- Real Name is excluded. We do not wish to incentivize individuals to publicly identify themselves.
- Phenotypes that are generated automatically during enrollment are excluded, e.g. "enrolled", "consent", "account created".
- Phenotypes that pertain to sample collection are excluded, e.g. "boston ma, june 21 2014", "mountain view ca, may 7 2014".
- Phenotypes that pertain to the enrollment of family members are excluded, e.g. "enrolled relatives [children]", "sibling".
- Phenotypes that are too non-specific are excluded, e.g. "comments", "any final thoughts?".
- Phenotypes that pertain to genotyping information about the participant are excluded, e.g. "23 and me", "ancestrydna".
- Phenotypes that have a Phenotype Score of less than 5 are excluded.
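A filter along these lines could be applied to the Phenotype Scores before the P-index is computed. The sketch below is hypothetical: the names `EXCLUDED_EXAMPLES`, `MIN_PHENOTYPE_SCORE` and `rankable_scores` are assumptions, and the exclusion set contains only the examples quoted above, not the complete list used by the PGP.

```python
MIN_PHENOTYPE_SCORE = 5  # phenotypes scoring below this are excluded

# Example names quoted in the criteria above; the full exclusion list is
# not given in this document, so this set is illustrative only.
EXCLUDED_EXAMPLES = {
    "real name",
    "enrolled", "consent", "account created",
    "boston ma, june 21 2014", "mountain view ca, may 7 2014",
    "enrolled relatives [children]", "sibling",
    "comments", "any final thoughts?",
    "23 and me", "ancestrydna",
}


def rankable_scores(scores):
    """Keep only the phenotypes that count towards the P-index."""
    return {
        phenotype: score
        for phenotype, score in scores.items()
        if phenotype.strip().lower() not in EXCLUDED_EXAMPLES
        and score >= MIN_PHENOTYPE_SCORE
    }


# Hypothetical Phenotype Scores.
scores = {"Height": 100, "Comments": 40, "AncestryDNA": 12, "Lipoma": 3}
print(rankable_scores(scores))  # {'Height': 100}
```

In this example "Comments" and "AncestryDNA" are dropped by name and "Lipoma" is dropped because its score is below 5.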
Citation
To cite this work, please cite the following article:
Chan, Y. et al. An unbiased index to quantify participant's phenotypic contribution to an open-access cohort. Sci. Rep. 7, 46148; doi: 10.1038/srep46148 (2017)
The manuscript is open-access and can be reached through the DOI above.
For more information, please contact Rigel Chan or Elaine Lim at pgpresearch@wyss.harvard.edu.