Statistical Nerds League (SNL)

My brain is just no good for mathematical statistics, though I enjoy reading them. When I read stats I'm on the verge of comprehension, then it quickly dissipates. It's probably my ADHD. I hit "watch thread" and will enjoy the posts as I read them.
You are very welcome to follow along. If there is anything that we can help to explain, please just ask. We will try to keep it as easy as possible to follow.
 
OK, so I played around with the weights for my scores and have currently settled on Quality 3, Comfort 2 and Effectiveness 1.

I then ran everything through the database, and the following top 5 razors emerged:
  1. Gillette Rocket HD500 (T: 9.60 SD: 0.25)
  2. Mühle R89 (T: 9.44 SD: 0.41)
  3. Gillette NEW Short Comb (T: 9.35 SD: 0.30)
  4. Gillette Single Ring (T: 9.35 SD: 0.58)
  5. Mühle Rocca R94 (T: 9.35 SD: 0.71)
These razors are very consistent in their results and provide a very comfortable and smooth shave, for me at least. It is fair to say that these are also household names of the shaving industry. From a modern perspective you cannot go wrong with an R89, and if you like vintage, the Rocket (and likely other Super Speeds) is simply bliss as well.

(Disclaimer: it's easy to change these statistics in any favour we like - so there is the proverbial grain of salt)
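
For anyone who wants to try this kind of weighting outside a spreadsheet, here is a minimal Python sketch. It assumes the total (T) is a weighted average of the three category scores using the 3/2/1 weights above, and that SD is the sample standard deviation of those totals. The ratings are made up for illustration, and the actual database may well calculate T and SD differently.

```python
from statistics import mean, stdev

# Weights from the post: Quality 3, Comfort 2, Effectiveness 1.
WEIGHTS = {"quality": 3, "comfort": 2, "effectiveness": 1}

def weighted_total(shave):
    """Weighted average of the category ratings for a single shave."""
    total_weight = sum(WEIGHTS.values())
    return sum(shave[name] * w for name, w in WEIGHTS.items()) / total_weight

# Hypothetical ratings for one razor (not real data from this thread).
shaves = [
    {"quality": 9.5, "comfort": 9.0, "effectiveness": 10.0},
    {"quality": 10.0, "comfort": 9.5, "effectiveness": 9.0},
    {"quality": 9.0, "comfort": 9.0, "effectiveness": 9.0},
]

totals = [weighted_total(s) for s in shaves]
razor_mean = mean(totals)  # a "T"-style figure under the assumption above
razor_sd = stdev(totals)   # sample standard deviation of the totals
print(f"T: {razor_mean:.2f} SD: {razor_sd:.2f}")
```

Changing the numbers in WEIGHTS is all it takes to see how much the ranking depends on the weighting, which is exactly the grain of salt mentioned above.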
 
This is today's statistical nerds update.

Today's topic is averages (also called means): what they are used for and how to calculate them. Averages are everywhere. We calculate average house prices, the average cost of a car, the average gasoline price and so on. It is one of the most common ways to look at groups of data.

Calculating the average of anything is very straightforward. It is important to understand that means can only be calculated for numerical quantities. This means that the numerical shave ratings in your shave log are vital for this step. To calculate a mean, all we have to do is add all the values together and divide by the total number of values.

Average (mean) = "Sum of all the values" / "Number of values"

If we perform this over all our shave log entries we get an average shave rating. Now that is fine and dandy, but it gets more interesting if we start asking a few questions like:

What is my average shave rating while using blade "X"?
What is my average shave rating while using razor "Y"?

Now this starts getting interesting. If we do this for all the blades, razors and so on in our shave log, we can start to build a picture of how well a particular blade or razor works for us compared to the others that we have.

How do we do this in Excel/Google Sheets?
There are some very easy-to-understand functions that will calculate this for you:

=AVERAGE(<top cell in column>:<bottom cell in column>)

Even more useful is the conditional average, but maybe a bit confusing
=AVERAGEIF(<top cell in criterion column>:<bottom cell in criterion column> , <criterion> , <top cell in column>:<bottom cell in column>)
EXAMPLE
1715691200938.png
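
For those who keep their log outside a spreadsheet, the same overall and conditional averages can be sketched in a few lines of Python. This is only an illustration: the log layout (razor, blade, rating) and all the values are invented, and the average_by helper is a hypothetical name, not part of any library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical shave log: one (razor, blade, rating) tuple per shave.
shave_log = [
    ("Razor A", "Blade X", 9.0),
    ("Razor A", "Blade Y", 8.0),
    ("Razor B", "Blade X", 7.0),
    ("Razor B", "Blade Y", 8.0),
    ("Razor A", "Blade X", 10.0),
]

def average_by(log, column):
    """Average rating grouped by column 0 (razor) or column 1 (blade)."""
    groups = defaultdict(list)
    for row in log:
        groups[row[column]].append(row[2])
    return {key: mean(values) for key, values in groups.items()}

# Equivalent of =AVERAGE(...) over the whole rating column.
overall = mean(rating for _, _, rating in shave_log)

# Equivalent of one =AVERAGEIF(...) per razor or per blade.
by_razor = average_by(shave_log, 0)
by_blade = average_by(shave_log, 1)

print(overall)
print(by_razor)
print(by_blade)
```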


Using averages we can find out all kinds of useful information about how certain items work for us. Be careful not to read too much into things, though. I can do all this for soaps, brushes and so on, and still draw conclusions that are incorrect or not meaningful. This is true for all statistical analysis: we have to be careful not to read into the data what is not there. It is very easy to do, especially when dealing with quantities as subjective as these.
 
This is today's statistical nerds update.

How do we rate our shaves?

When you come up with your ratings there are a few things to keep in mind. First, your ratings are yours and nobody else's. The ratings you enter in your shave log cannot be compared with another person's shave ratings, because they are very subjective. This also means that unless you do some massaging of your statistical data, it is difficult at best to compare statistical outcomes between people.

The only thing you have to keep in mind while setting up your rating scale is to try to be as consistent with YOURSELF as you can be. This is the only person you have to be consistent with to eventually gain some useful information. This may seem easy, but we change over time, and our judgements of our shaves may drift as well. Just keep this in mind. It is probably a good idea to draft some sort of scale that you can follow.

For me the "Shave Quality" is a bit easier because you can actually put it on a scale that has some sort of measurement behind it. The picture below shows what my scale looks like. Now come up with your own. Maybe you want to use only a 5-step scale, or maybe you do not like the way I named the steps; great, name them something different. The important thing is that you try to make it as objective as you can for yourself and then stick to it.
View attachment 1843136

If you change your scale or rating system after a while, you will find yourself in a bit of a dilemma. You will not be able to do statistical analysis on your whole shave log as a unit. You may have to treat the records from before and after the rating change differently. If you are lucky you can translate one rating scale to the other, but many times this is not possible either, and you are stuck dealing with them separately.

How do you rate your shaves? Please share.

Thank you for defining your shave result/quality/closeness criteria and for specifying that BBS = NO stubble felt in any direction! I completely agree with your method to then rate the shave on the percentage of your shave area you got to BBS. That's what I do as well. It is easy to estimate that and, while still somewhat subjective, can also be consistent. I know for certain I will never get a perfect 10, and that too is good for the measurement scale top end.

My shave quality scale (inspired by @T Bone) expands the top numbers a bit, which gives me more dynamic range with which to measure:
10 = 100% smooth everywhere ATG
9 = smooth ATG 2/3 of area, XTG on the rest
8 = smooth ATG 1/2 of area, XTG on the rest
7 = smooth ATG 1/3 of area, XTG on the rest
6 = smooth ATG ~20% of area
5 = smooth XTG everywhere

This is a hard grading scale. I regularly get in the 8s and lower 9s. But with this tougher scale I can better differentiate shaves. Not everything piles up between 9.8 and 10. Have you ever considered broadening your top end a bit?

I measure other factors too, but it is really important to me not to blend the quality of the shave result with other data. An 8.7 result with 20 weepers and my face on fire is very different from an 8.7 with no blood or feedback at all.
 
Yeah, my shave quality ratings tend to pile up at 9 and 9.5, with a 10 once in a great while. I guess it is a sign that my technique has improved. The problem is that my 220 entries so far are on this scale, so I would have to go back and adjust them, and I really do not want to start all over. I guess I am a bit too lazy to go back and readjust them, because I would probably have to do it manually.

In reality, as long as we are "consistent" with ourselves it is not really an issue. Besides, I have started to use Z-scores (thanks to @Guido75, who put me onto them) to analyze my data, which standardizes the scale so that the range matters even less. A Z-score subtracts the average from each rating and divides the result by the standard deviation. I also do not think that any of our rating scales are linear. It is what it is. In retrospect a smaller range might even have been better.
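
As a rough illustration of what the Z-score does to a set of ratings, here is a small Python sketch. The ratings are invented, and it uses the population standard deviation (pstdev); using the sample version instead is an equally reasonable choice.

```python
from statistics import mean, pstdev

def z_scores(ratings):
    """Standardize ratings: subtract the mean, divide by the standard deviation."""
    mu = mean(ratings)
    sigma = pstdev(ratings)  # population SD; stdev() would give the sample SD
    if sigma == 0:
        # All ratings identical: there is no spread to scale by.
        return [0.0 for _ in ratings]
    return [(r - mu) / sigma for r in ratings]

# Hypothetical ratings that pile up near the top of a 10-point scale.
ratings = [9.0, 9.5, 9.0, 10.0, 8.5, 9.5]
for r, z in zip(ratings, z_scores(ratings)):
    print(f"{r:4.1f} -> z = {z:+.2f}")
```

Even though the raw ratings only span 8.5 to 10, the standardized values spread out around zero, which is why a compressed top end matters less once Z-scores are used.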

I do think we need to keep a couple of ratings without getting too granular. I think we do need one for some sort of how close the shave was and one for how rough/smooth it was. I would say between 2-5 different categories are probably fine. One problem with too many is that we tend not to fill them out well. It needs to be easy and something that we can do longer term.

Everything that we do here is subjective and as such we will break statistical rules on a regular basis here like I said in my opening post. We can look at some things and maybe draw some conclusions, but we have to be cautious as well.
 
Hi Everyone

I am working on a shared shave diary for statistical data collection and AI training about wet shaving. I would appreciate any review this group could give to the work so far. It grew out of a project to try and get a scientific analysis of Chinese manufactured blades done.

If you could take some time to review the database design located here (Version 3) it would be greatly appreciated.

Thank You in advance

 
This is a very large and extensive shave diary database. It looks like it is for a PhD dissertation. You have certainly put a lot of thought and effort into this database design. One thing I would evaluate is how much data participants will be willing to enter per shave, and whether they will do it regularly for a long period of time over many, many shaves. Can you expand a bit on what your end goal is with this data collection?
 
I've added some manual entries for May to get a feel for the formatting:
Comments, suggestions are welcome. Meanwhile I have to do some major cleanup before attempting to import my text file with over three years of data.

View attachment 1843916
Since you're revising existing data, flip your rows and columns before you get too deep into your conversion effort. Each row should be one shave.

This opens you up to filtering, sorting and a host of other features. Think of this as a primitive database.

For my date column, I enter date and time, and I have a separate calculated shave interval column. I find this important for learning whether patterns emerge based on resting my face longer than 24 hours.

When I get to my laptop, I'll copy/paste the date formula. But first ... it's time for a shave ;-)

[edit] working through this thread, I see that @blethenstrom already mentioned flipping your data. If you create a second worksheet, you can copy/paste one shave at a time into it. There's a flipping/paste function if you right-mouse instead of ctrl-v. I'm sure there are macros for this, but I haven't taken Excel that far.

[edit #2] Ah! Transpose function! That's a new one on me. Thanks, @blethenstrom!

... Thom
 
Below is a snip of the first few columns from my log. I track only razors, blades, and shave # on the blade.

I don't track brushes, soap, or post-shave. Obviously, we track what we need to track ;-)

I currently use only Haslinger Schafmilch and, occasionally, Cella Red. Post-shave is also simple and unchanging: witch hazel with a tiny bit of alcohol, followed by a light smear of Weleda Skin Food.

Brushes are 3 Mühle STFs and one Oumo ST-1, and I lather those up consistently.

The time of day column (TOD) is there only as a "check field". I'll likely delete it since my interval calculator is working.

To the point of this post:

Here's the formula for calculating the shaving interval in column-F (using the day/time field - column-E) - this, for row-110:

=IF(E110<>"",ROUND(((E110-E109)*24),0)&" hrs","")

Razor              | Blade                    | Blade Shave # | TOD  | Day/Date                | Interval (Previous Shave)
Overlander - Brass | Personna Platinum Chrome | 1             | 10pm | 09-May-2024 (Thu) 22:00 | 30 hrs
Overlander - Brass | Personna Platinum Chrome | 2             | 12am | 11-May-2024 (Sat) 00:00 | 26 hrs
Overlander - Brass | Personna Platinum Chrome | 3             | 2am  | 12-May-2024 (Sun) 02:00 | 26 hrs
Overlander - Brass | Wizamet Iridium          | 1             | 1pm  | 13-May-2024 (Mon) 13:00 | 35 hrs
Tatara Masamune    | Wizamet Iridium          | 2             | 4pm  | 14-May-2024 (Tue) 16:00 | 27 hrs
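
For comparison, the same interval calculation can be sketched in Python. The timestamps are the five from the rows above, re-typed in year-month-day form to keep the parsing simple; the function name is just an illustrative choice. The first row's 30 hrs cannot be reproduced here because the shave before it is not shown.

```python
from datetime import datetime

# Date/time values from the log rows above, written as YYYY-MM-DD HH:MM.
shave_times = [
    "2024-05-09 22:00",
    "2024-05-11 00:00",
    "2024-05-12 02:00",
    "2024-05-13 13:00",
    "2024-05-14 16:00",
]

def intervals_in_hours(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Rounded hours between each shave and the one before it."""
    times = [datetime.strptime(t, fmt) for t in timestamps]
    # Note: Python's round() sends exact halves to the nearest even number,
    # whereas Excel's ROUND sends them away from zero.
    return [
        round((later - earlier).total_seconds() / 3600)
        for earlier, later in zip(times, times[1:])
    ]

print(intervals_in_hours(shave_times))  # [26, 26, 35, 27]
```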

... Thom
 
This is a very large and extensive shave diary database. It looks like it is for a PhD dissertation. You have certainly put a lot of thought and effort into this database design. One thing I would evaluate is how much data participants will be willing to enter per shave, and whether they will do it regularly for a long period of time over many, many shaves. Can you expand a bit on what your end goal is with this data collection?
Thankfully I don't have to do that again. No, it is a curiosity and give back project. The curiosity part is all of the legend, myths, traditional wisdom, etc. that has developed over the years without any real objective analysis of correlations (can't really get to causality as we aren't running experiments). I've been lurking in forums for a while to gather what people think is true as well as seek out the most onerous YMMV to see if I could sink some teeth into it (source for all the environmental and physiological data in the model). It's also a ripe opportunity for training an AI to help people narrow down their search for their grail hardware and software.

My working assumption is that most YMMV can be correlated to environment and physiology. Technique is important but other than prep and passes I am not sure how you can record that. A follow on (by someone else) someday might be technique experiments holding everything else constant. If people supply the data on prep and passes it will likely suggest that there is value in experiment on technique.

To gather the data, it really needed to be useful for those supplying the data and be attractive to use. Much of what has gone into the design is to minimize typing, make as much as possible a simple selection, and over time anticipate and pre-load values that can easily be overridden. It also shouldn't barf if data is missing. It will serve its purpose if all people do is put in razor/blade/outcome, though I suspect people will get into it as they start to see stats, trends, managing their collections, etc.

Since I want the data freely available to anyone who wants to do research on it, there is a lot of intentional duplication in the flat file diary so it can be offered totally separate from the actual database (likely in comma delimited format).

My goal is to fund its operation for 5 years to collect the data, and then turn it into a going concern by turning it over to someone like B&B. It came about because of my interest in the new Chinese blades (and mixed results) and the fact there is very little scientific/academic work and papers on the subject. Filling a little bit of that void would help the hobby and industry I think.
 
Even more useful is the conditional average, but maybe a bit confusing
=AVERAGEIF(<top cell in criterion column>:<bottom cell in criterion column> , <criterion> , <top cell in column>:<bottom cell in column>)
Excellent example. I use Pivot Tables for that in a separate sheet in my file, so one tab has the data and the other tab has the analysis. Using Pivot Tables you can start juxtaposing razor and blade combinations, for example, or modern versus vintage, depending on what you are tracking.

Great!

Guido
 
Shhh, I have not divulged the power of pivot tables yet. Actually, maybe that is something that you can help me with. It is too large a topic to cover in a single post. Pivot tables are just so useful and powerful, and they do a lot of the work for us.
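
As a small taste of what a pivot table does with razor and blade combinations, here is a Python sketch that averages ratings for every razor/blade pair and prints them as a grid. The log rows are invented, and pivot_mean is a hypothetical helper name; a real pivot table offers far more than this.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log rows: (razor, blade, rating).
log = [
    ("Razor A", "Blade X", 9.0),
    ("Razor A", "Blade X", 10.0),
    ("Razor A", "Blade Y", 8.0),
    ("Razor B", "Blade X", 7.0),
    ("Razor B", "Blade Y", 8.5),
]

def pivot_mean(rows):
    """Mean rating for every (razor, blade) pair, like a pivot table set to AVERAGE."""
    cells = defaultdict(list)
    for razor, blade, rating in rows:
        cells[(razor, blade)].append(rating)
    return {pair: mean(values) for pair, values in cells.items()}

table = pivot_mean(log)
razors = sorted({razor for razor, _ in table})
blades = sorted({blade for _, blade in table})

# Print a small grid: razors as rows, blades as columns.
print("".ljust(10) + "".join(blade.ljust(10) for blade in blades))
for razor in razors:
    row_cells = (
        f"{table[(razor, blade)]:.2f}" if (razor, blade) in table else "-"
        for blade in blades
    )
    print(razor.ljust(10) + "".join(cell.ljust(10) for cell in row_cells))
```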
 
I think your efforts are very noble and I understand where you are heading. I think the big challenge will be comparing data from different people. Even how we judge our own shaves is highly subjective. That means that even if you have a selection table of choices, people will choose differently, even if their shaves were objectively the same. Their experiences of the shave will be different. Things like your mental state when you shave come into play a lot, in addition to the things you mentioned. Whether you are angry, sad, happy, content and so on while shaving will affect how you perceive your shave and its outcome. Were you dehydrated, tired......?

We all also have preconceived notions about a shave even before we shave. Let's say we have bought an expensive razor and a very inexpensive razor, and we use both of them. We have a tendency to favor the expensive one over the inexpensive one, even if the shaves were identical. Same thing with well regarded items on B&B vs less well regarded items on B&B. It colors our judgement. We all do this, unbeknownst to ourselves, with everything. This is why trying to compare notes, so to speak, on which razors, blades and so on are the best is very, very difficult.

I have written in a thread about this before and we really would not get the true picture unless blind tests were done. For example, we have our well regarded blade in our mind when we try it and we are more likely to give it high marks whether or not it actually was any better. Likewise with a less regarded blade. Even this would have its problems.

Comparing your own data with your own past shaves is probably the best you can do, IMHO. Now, there is certainly no harm in gathering the data and seeing what you get. Maybe some useful conclusions can be obtained. Since your own data will be available to you, you can always just compare with yourself, so go for it.
 
Good points. These are all traditional issues in qualitative research. While I am collecting "numbers", the reality is that the answers are subjective (which is what qualitative methods are meant to deal with). There are hopefully three phases of the analysis:

Analyzing subjective shave data from a growing population presents unique challenges and opportunities. Here are some statistical techniques that work with my model, starting with approaches for smaller samples and scaling up as the data grows:

Early Stage: Smaller Sample Sizes
  • Descriptive Statistics: Begin with descriptive statistics like means, medians, and standard deviations for each rating (smoothness, comfort, etc.). This will give a basic overview of user perceptions for different combinations.
  • Visualizations: Use bar charts or box plots to compare ratings across different razors or blades visually. This can help identify early trends and outliers.
  • Cluster Analysis: Group similar shavers (based on skin/beard type) to see if responses differ significantly between these groups. This can help identify which factors most impact user perception.
Medium Stage: Growing Sample Sizes
  • Regression Analysis: Once I have enough data, I can use regression models to explore relationships between razor/blade characteristics (price, aggressiveness, etc.) and user ratings. This can help predict how a new razor or blade might be perceived.
  • Factor Analysis: This technique can help identify underlying factors (e.g., "overall shave quality," "blade comfort") that influence the subjective ratings, providing deeper insights into user preferences.
Large Sample Sizes (Long-Term):
  • Multilevel Modeling: With a large population and many observations per person, this approach accounts for individual differences and repeated measures over time. It helps separate individual variability from product effects.
  • Bayesian Analysis: Bayesian methods can incorporate prior knowledge or beliefs about razors/blades, updating those beliefs as more data is collected. This can be particularly useful when dealing with subjective ratings and uncertainty.
  • Machine Learning Techniques: At this stage I intend to use more advanced methods like collaborative filtering or recommender systems to predict which razors/blades a specific user might prefer based on their past ratings and similar users.
Addressing Subjectivity
  • Normalization: As the population increases I can normalize the data to account for individual differences in rating scales. For example, I could standardize ratings within each person or use a z-score transformation to center the data.
  • Inter-Rater Reliability: I can assess the level of agreement between different users when rating the same razor/blade combination. This can help gauge the reliability of the data.
  • Qualitative Feedback: I always incorporate open-ended feedback alongside numerical ratings via the comment sections. This provides valuable context and insights that quantitative data alone can't capture. There are a number of qualitative techniques that can be used to parse that unstructured data and then encode it (identify themes) that can be further used to normalize the quantitative data.
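
The within-person standardization mentioned under Normalization could look something like the Python sketch below. The user names, razors and ratings are all invented, the function name is hypothetical, and it assumes a plain z-score per user using the population standard deviation; it is one possible reading of the idea, not the project's actual method.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (user, razor, rating) records; a harsh rater and a generous one.
records = [
    ("alice", "Razor A", 9.5), ("alice", "Razor B", 9.0), ("alice", "Razor C", 10.0),
    ("bob", "Razor A", 7.0), ("bob", "Razor B", 6.0), ("bob", "Razor C", 8.0),
]

def standardize_within_user(rows):
    """Replace each rating with its z-score relative to that user's own ratings."""
    by_user = defaultdict(list)
    for user, _, rating in rows:
        by_user[user].append(rating)
    stats = {user: (mean(vals), pstdev(vals)) for user, vals in by_user.items()}

    result = []
    for user, razor, rating in rows:
        mu, sigma = stats[user]
        # A user whose ratings never vary carries no ranking information.
        z = (rating - mu) / sigma if sigma else 0.0
        result.append((user, razor, z))
    return result

for row in standardize_within_user(records):
    print(row)
```

In this made-up data the two users disagree by two or more points on every razor in raw terms, yet after standardization they agree exactly on how the razors compare.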
All of this is why I want to make the data freely available. I teach AI and Qualitative Research, but I am not a stats guru. Hopefully studying the data turns into a collaborative effort.

In the end, I need to do all of this to generate a training base for a "newbie advisor AI", which is kind of my end goal (as well as generating insights that might advance the hobby and technologies).

I also hypothesize that hard data about razors/blades/lubricants/etc. might also be the source of some of the YMMV issues - for example, there are differences in dimensions among blades (KAIs are wider than most) which change the gap/exposure relationships of the razor. I am collecting hard data on blade sharpness (BESS rating), dimension variability (relative to a standard) and thickness to see potential correlations among user types and razors.
 
You are certainly over my head with all this. I am no statistician and am not well read on any of it. Now, give me a circuit board to design and I am your man, but not this. Z-scores are as far as I go. It sounds like you have a good, detailed plan, and I will be following your progress with interest.
 
Hopefully you'll want to use it when it's done....
Jim,

I have read your design version 3 and your structure is impressive. I see some duplications but you already mentioned those were deliberate.

I have asked the moderators in the past whether it was possible to get a download of the SOTD thread to build that database, but unfortunately the structure of the forum doesn't allow that without significant effort on their part.

Your approach is the next best thing and a daunting task, to say the least.

You do pose the million-dollar question of whether people will indeed use it, as the sheer number of variables to be assessed is extensive, even when populated. I am all for data analysis and like it even for its own sake, because in all honesty I have never actually done anything with my statistics other than checking razor usage. I have favourite blades, but for the past year I have been blade agnostic and simply shave with the blade of the week, so to speak.

I also hypothesize that hard data about razors/blades/lubricants/etc. might also be the source of some of the YMMV issues
It definitely might, yes, and it would be good to get these handled if we can, but I also think circumstances might differ between shaves, even if only minutely, purely because of the operator. For example, I load for 15 counts before face lathering. Is my count even throughout? Or did I inadvertently speed up this morning? Do I apply consistent and constant pressure when loading or even when face lathering?
Inter-Rater Reliability: I can assess the level of agreement between different users when rating the same razor/blade combination. This can help gauge the reliability of the data.
So I see much benefit in establishing internal validity in the data. Given the subjective nature of YMMV, I can also hypothesise that IRR might not be applicable because the underlying events are less similar than expected. ICC tries to counter that effect by calculating IRR based on one observation. There is a paper available on the calculations. You might want to check whether it makes sense for shaving data.
Multilevel Modeling: With a large population and many observations per person, this approach accounts for individual differences and repeated measures over time. It helps separate individual variability from product effects.
This would be super cool. You could do HLM on individual or razor (I think) and if we populate the database from across the globe it would make even more possible.

Just some first thoughts.

Cheers,

Guido
 
This is getting intense. I love it.

I had a thought about rating compression after discussing some of my scores with a few individuals.

I'm currently grading my great shaves in the 9.0 range for closeness. Some have commented that my shaves fall in the 7.5 to 8.0 closeness range on @T Bone's scale.

This got me thinking about rating compression and headroom at the upper end of the scale. I'm wondering if some of us perceive rating steps (7, 8, 9) as being linear ones while others perceive them as being logarithmic.

For subjective assessments like comfort, whether people rate logarithmically may be a consideration. I seem to think logarithmically for comfort.

My statistical knowledge is primitive, and it's been decades since I've thought about this beyond basic concepts.

I'm just throwing it out there.

... Thom
 