Comparing the Aesthetic Quality and Creativity Scales – Blog Post #2

After obtaining all the scores given to each drawing on the Creativity scale and the Aesthetic Quality scale, I have been working on establishing the reliability of the scales, the construct validity of the Aesthetic Quality scale, and the potential relationship between the scores that drawings received on each scale and the scores that participants obtained on the VAST.

First, inter-rater reliability was assessed by computing Cronbach’s alpha for each scale. Using the alpha function in R, Cronbach’s alpha was found to be .80 for the Creativity scale and .82 for the Aesthetic Quality scale, suggesting that the raters gave consistent scores to each drawing. A more in-depth analysis of each rater showed that removing the scores given by any particular rater would change the inter-rater reliability very little, which further suggests that raters were consistent in the scores they gave.
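
For reference, below is a minimal sketch of how this can be computed with the alpha function from the psych package in R. The data layout and file name are assumptions for illustration: one row per drawing, one column per rater.

    # Minimal sketch, assuming ratings_creativity is a data frame with one
    # row per drawing and one column per rater (file name is hypothetical).
    library(psych)
    ratings_creativity <- read.csv("creativity_ratings.csv")
    out <- alpha(ratings_creativity)
    out$total$raw_alpha   # overall Cronbach's alpha for the scale
    out$alpha.drop        # reliability if any one rater's scores were removed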

Given that raters were consistent in their scores, I included all of the scores when creating a composite score for each drawing. For each scale, I took the mean of all the scores assigned to a particular drawing, which was simple enough to calculate in LibreOffice (a spreadsheet program used to create and manage the data). There was a good deal of variety among the drawings’ mean scores, suggesting that the individual drawings differed in their apparent creativity and aesthetic quality.
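
In R, the same composite could be computed in one line; this sketch reuses the hypothetical ratings_creativity data frame from above.

    # Composite score per drawing: the mean of all raters' scores.
    creativity_mean <- rowMeans(ratings_creativity, na.rm = TRUE)
    summary(creativity_mean)   # shows the spread of composites across drawings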

Using a Pearson product-moment correlation test (the “cor.test” function in R), I determined the correlation between the mean scores assigned to drawings on the Aesthetic Quality and Creativity scales to be approximately .81 (p < .05). I also ran the same test on the relationship between the mean scores on each scale and the participants’ VAST scores, which returned non-significant results (weak correlations, p > .05). These results suggest that the Aesthetic Quality and Creativity scales are being scored similarly, meaning that the two constructs are not as distinct from each other as they should be. They also indicate that there is no significant correlation between participants’ VAST scores and the scores their drawings received on either scale.
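
As a sketch of those tests, assuming creativity_mean and aesthetic_mean are the per-drawing composites (aligned by drawing) and vast_pre holds the matched participants’ VAST pretest scores (all names hypothetical):

    # Pearson correlation between the two scales' composites
    # (cor.test defaults to method = "pearson").
    cor.test(creativity_mean, aesthetic_mean)
    # Same test against the VAST pretest scores, for each scale.
    cor.test(aesthetic_mean, vast_pre)
    cor.test(creativity_mean, vast_pre)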

Moving forward, I still intend to look into whether any individual raters differentiated between the two scales and, if so, to examine whether those raters’ VAST scores affected how they scored the drawings. Given the strong inter-rater reliability, however, I am not expecting to find anything particularly significant. I will also look into whether the instructions given to the raters could be revised to better distinguish the two constructs.
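
One way to check for such raters, sketched here under the assumption that ratings_creativity and ratings_aesthetic are data frames with matching rows (drawings) and columns (raters), is to correlate each rater’s scores across the two scales:

    # Per-rater correlation between the Creativity and Aesthetic Quality
    # scores; a rater with a low correlation differentiated the scales most.
    rater_cors <- sapply(seq_len(ncol(ratings_creativity)), function(j) {
      cor(ratings_creativity[[j]], ratings_aesthetic[[j]],
          use = "pairwise.complete.obs")
    })
    names(rater_cors) <- colnames(ratings_creativity)
    sort(rater_cors)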

There was some difficulty in matching the scores of participants’ drawings to their VAST scores, since the rating files (for both the Creativity and Aesthetic Quality scales) identified participants differently than the VAST did. After obtaining a file which linked the two methods of identification, I created two new files to organize the data, one for the Creativity scale and one for the Aesthetic Quality scale. In each file, I linked the VAST user ID number to the participant’s average score and their VAST score. The VAST was administered twice, so I included both the pre- and post-test scores, but I am focusing primarily on the VAST pretest data since not every participant completed the second VAST test. I was able to link all but one participant, whose user ID did not seem to match any of the available VAST scores. I intend to look into that further, but for now, that participant was not included in analyses involving the VAST scores.
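
The linking step itself amounts to a pair of joins. A hypothetical sketch in R, with all file and column names assumed for illustration:

    # composites: one row per drawing, with a drawing_id column and the
    # mean scores; key maps drawing_id to vast_user_id; vast holds the
    # pre- and post-test scores by vast_user_id.
    key  <- read.csv("id_key.csv")
    vast <- read.csv("vast_scores.csv")
    linked <- merge(merge(composites, key, by = "drawing_id"),
                    vast, by = "vast_user_id")
    # merge() defaults to an inner join, so a participant whose ID does
    # not match any VAST score simply drops out of linked.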

Given that Aesthetic Quality does not appear to have been scored differently from Creativity, future research will need to make an effort to differentiate the constructs. I intend to look further into how to do that, referring to the literature on art and aesthetics which attempts to define aesthetic quality. Barring any particularly noteworthy results from examining the individual raters, defining Aesthetic Quality in a way that is more distinct from Creativity will be the next step if progress is to be made toward determining whether training in virtual reality can improve skills related to aesthetic judgment or sensitivity.

Assessing Aesthetic Judgment and Related Skills – Blog Post #1

The purpose of this project is to examine the potential of virtual reality (VR) training to improve aesthetic skills. Our main goal is to create an aesthetic quality scale, which will be used to measure the aesthetic quality of 3D drawings created in a VR setting. In future research, such a scale could be used to determine whether VR training has a significant effect on aesthetic skills, by testing whether the aesthetic quality of the 3D drawings people produce improves after VR training. The scale may also prove useful for other research on measuring aesthetic quality, but for our purposes, it will make it possible to measure the observable quality of participants’ drawings in order to assess any potential changes to the internal characteristic of aesthetic sensitivity. The aesthetic quality of drawings may thus serve as a way to assess an individual’s aesthetic skills. Before research can be conducted into that, the project will focus on developing a suitable aesthetic quality scale, assessing its validity, and determining whether the scores the scale assigns to drawings can be used as a proxy for an individual’s aesthetic sensitivity.

The data being used in this project was collected for a different study, which examined the link between VR training and creativity. That study involved assessing 3D drawings produced by the participants in a VR setting on a scale designed to measure creativity. Each participant was instructed to use the program to create a drawing of an animal of their choosing (real or imaginary). The Visual Aesthetic Sensitivity Test (VAST) is an accepted measure of aesthetic sensitivity, and a revised version (VAST-R) was used to determine participants’ aesthetic sensitivity; all participants completed the VAST-R prior to producing their drawing. In the current study, the VAST-R scores will be examined in relation to the scores given to the drawings on the aesthetic quality scale. If participants who achieve higher VAST-R scores have their drawings scored higher on the aesthetic quality scale, that will add credibility to the validity of the scale, and the VAST-R scores will also serve as an important variable to consider since the project is examining the potential for VR to improve aesthetic sensitivity.

The VR training itself consisted of various immersive experiences. Participants watched a music video which fully surrounded them, explored an underwater scene, played an interactive story game, and interacted with a recorded scene during which the researcher provided tactile stimulation that mimicked what the participant would expect to feel in the experience. This training was used in the previous study, which involved a group of 68 participants, each of whom created their 3D drawing after completing the VR training. Each drawing produced in the study was scored for creativity using the Consensual Assessment Technique (CAT), in which raters are given a set of creative productions to rate on a multipoint scale, freely assigning the score they think best suits each production. In this study, the judges were asked to rate the 68 productions for creativity on a seven-point scale, as well as on the new, similarly structured aesthetic quality scale. Judges were also given the VAST-R prior to scoring, to see whether their aesthetic sensitivity would affect the scores they gave on either scale.

Currently, we are focused on establishing the validity of the aesthetic quality scale in particular. A valid scale is one which yields an accurate score for the construct it is measuring. Validity is vital because a scale which does not measure what it is intended to measure cannot support conclusions about that construct. In the case of the aesthetic quality scale, it has not yet been established that the scores being produced are indicative of aesthetic quality, so that is our immediate next step. To determine whether the scale is valid, we will be looking for trends in how individuals score items using the aesthetic quality scale.

If the aesthetic quality scale measures what it is intended to measure, then the scores given to each 3D drawing will likely reflect the aesthetic sensitivity of the individual who produced it. To that end, we will examine whether each individual’s VAST-R score is predictive of the aesthetic quality rating assigned to their drawing. If higher VAST-R scores predict higher scores on the aesthetic quality scale, that will suggest that the aesthetic quality of the drawings is a result of an individual’s aesthetic sensitivity. This may make it possible for future studies to determine whether aesthetic sensitivity can be improved through VR training. Furthermore, comparing the scores given on the aesthetic quality scale with the scores given on the creativity scale will determine the extent to which the construct of aesthetic quality is separate from the construct of creativity. It is necessary to show that the aesthetic quality scale measures something unique and specific in order to establish its validity.
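
A minimal sketch of that check in R, using simulated placeholder data in place of the real scores (all variable names are hypothetical):

    # Simulated stand-in data: one row per participant, with their VAST-R
    # score and the mean aesthetic quality rating their drawing received.
    set.seed(1)
    dat <- data.frame(vast_r  = rnorm(68),
                      aq_mean = rnorm(68))
    fit <- lm(aq_mean ~ vast_r, data = dat)
    summary(fit)   # a significant positive slope for vast_r would support validity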

In addition to establishing validity, we will look into whether the scale is reliable. If the aesthetic quality scale is reliable, then the scores assigned to a drawing will be consistent between raters. If the scores are not consistent between raters, we will examine differences between individuals in the scores they assign to drawings. Since we administered the VAST-R to the judges, their VAST-R scores will be our primary focus when examining which judges are more likely to produce valid scores; it may be the case that judges need a high VAST-R score for their judgment to be relied upon. If there is strong inter-rater reliability, however, the judging can likely be done by anyone who understands the scale. Establishing how reliable the scale is will determine how the aesthetic quality scale can be used in the future.

We are also working on writing syntax to automate the IRT scoring of the VAST-R. IRT refers to Item Response Theory, a framework used in designing, analyzing, and scoring measures of abilities, attitudes, and other psychological dimensions. IRT treats the unobserved ability of an individual and the unobserved characteristics of the items (both of which are considered fixed values for a given individual and item) as predictors of the individual’s observable score on each item. In the process, the unobservable parameters (notably, each individual’s latent ability) are estimated, making IRT an accurate alternative to regular “sum” scoring.
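
For a dichotomously scored test like the VAST-R, this kind of scoring can be sketched in R with the mirt package (an assumed tool for illustration, not necessarily the one used in this project); a unidimensional 2PL model estimates each item’s parameters and each person’s latent ability, and the 2PL models the probability of a correct response as a logistic function of the gap between a person’s ability and the item’s difficulty:

    library(mirt)
    # Simulated 0/1 response matrix standing in for real VAST-R data:
    # one row per test-taker, one column per item.
    set.seed(1)
    responses <- simdata(a = rep(1, 20), d = rnorm(20), N = 100,
                         itemtype = "dich")
    mod   <- mirt(responses, model = 1, itemtype = "2PL")  # one latent ability
    theta <- fscores(mod, method = "EAP")   # estimated latent ability scores
    head(theta)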

This is my first experience writing syntax, as well as my first encounter with the IRT framework, so I am learning quite a bit as I go. I hope to become familiar with these skills since they will be useful in future projects. Additionally, by writing out syntax for automating the IRT scoring process, the same syntax can be reused, by me and by anyone else who wishes to automate IRT scoring in the future. This, along with the progress being made toward the project’s ultimate goal of assessing aesthetic judgment and creating an aesthetic quality scale, is informative in and of itself and will be useful for future projects. I hope to gain more knowledge about psychometrics while working on this project, as well as produce results that future studies can build upon.