There's the study that u/HOU_Civil_Econ provided, but if you're trying to see how college causally affects income, it might be easier to compare the incomes of people with a given SAT score who went to college against people with the same score who didn't. That controls for the "intelligence" variable using information that is easier to get and verify. It also avoids a problem with the accepted-but-didn't-attend group: those people likely declined for financial reasons, which means they were low income or came from a low-income family, and that would definitely distort the study.
There's also this study comparing barely accepted and barely rejected students. It achieves a similar thing, but only for people around the cutoff, not for people who would have been accepted but didn't apply, who may see better or worse returns on their educational investment:
https://academic.oup.com/qje/advance-article/doi/10.1093/qje/qjaf055/8376650?utm_source=chatgpt.com&login=false
No. How would you get that data?
Edit: presuming u/HOU_Civil_ECON's take on what you were really asking is right, yes, there are a number of resources on the impact of college education on wages.
I’m linking a couple, including the 1999 Handbook article (it’s still a good, free read).
https://www.journals.uchicago.edu/doi/abs/10.1086/698760?casa_token=r1MuC44Z12AAAAAA:ROu0e0IpO4rM8w3PhcsxmjozlhFmj1chra8HzsG90SToObzCv9vIrJRrpcDZX-1zVog8gKdJfryfgw
https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-0084.2012.00708.x?casa_token=0LQBtG8U5HcAAAAA%3AMIJKGQYDjiyqEtBkPEwD2uSocu02bqEDbUY7EVJYtdO_RQl9ihTbD_HgPWktG3anrJ6-q6rF6eU_lXac&casa_token=jW57byPaVLwAAAAA%3ADO8igtfK_MfFAdulmOwM2gZofCMGdT3B83HovdxfMtdd5GL17ReIyu5yCWIJDTnbaAhwzP58l_OHzPVd
https://www.sciencedirect.com/science/chapter/handbook/abs/pii/S1573446399030114
Almost certainly someone has looked at people right around a whole bunch of cutoffs. So the comparison often won’t be "accepted but didn’t attend"; instead it will be a 189 versus a 190 on whatever scoring matrix. That is barely accepted vs. barely rejected.
The point is that there is no publicly available repository where people register that they were "accepted" into X university but chose not to attend.
Graduation classes from college are easy to find, plus people tend to list it on their resumes.
Nobody is listing "accepted to UNC but chose to become a mechanic" anywhere.
The point is letting OP know the more feasible ways economists might try to answer the fundamental underlying question.
It's also a much better way of estimating the causal impact of college attendance on wages.
Even if we had data on people who turned down going to college despite an acceptance, there's a selection issue. People who choose not to go to college are different from people who do choose to go in ways that may affect their future wages.
By using an arbitrary cutoff that is outside the applicants' control, we can argue that the people on either side of the cutoff have similar underlying characteristics and that they therefore make appropriate treatment and control groups.
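To make the cutoff idea concrete, here is a minimal sketch of a regression discontinuity estimate on simulated data. Everything in it is an assumption for illustration: the 190 cutoff is borrowed from the example upthread, the 0.22 jump in log wages is an arbitrary simulated value, and the function names and the bandwidth are made up. It fits a separate line on each side of the cutoff and compares the two predictions at the cutoff itself.

```python
import random

def simulate_applicants(n=20000, cutoff=190.0, true_effect=0.22, seed=0):
    """Fake applicants: log wage rises smoothly with score, plus a jump at the cutoff."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        score = rng.uniform(150.0, 230.0)
        admitted = score >= cutoff
        log_wage = (10.0 + 0.01 * (score - cutoff)
                    + (true_effect if admitted else 0.0)
                    + rng.gauss(0.0, 0.3))
        rows.append((score, log_wage))
    return rows

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rd_estimate(rows, cutoff=190.0, bandwidth=10.0):
    """Jump in predicted log wage at the cutoff, using only scores within the bandwidth."""
    # Center scores on the cutoff so each fitted intercept is the prediction at the cutoff.
    left = [(s - cutoff, w) for s, w in rows if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, w) for s, w in rows if cutoff <= s <= cutoff + bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left

estimate = rd_estimate(simulate_applicants())
print(f"estimated jump in log wages at the cutoff: {estimate:.3f}")
```

The estimate only speaks to applicants near the cutoff, which is the "local" part of the local treatment effect discussed in this thread.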
We don't know the underlying question because OP didn't specify. If he wanted the (local treatment) effect of college, he would have said so.
Huh? They chose not to attend. How is that a scoring matrix cutoff?
For OP, this is more likely how it is going to be done.
But that’s a different question. That’s accepted versus not.
For OP, this is more likely how the underlying question “how does college causally impact wages” would be addressed.
Why not respond to OP then?
That still isn’t the question OP asked. While I agree that a cutoff in scores is useful and would be available as data at institutions, to me that answers a different question.
Because I felt you had a good but incomplete answer. The OP question is clearly trying to get at the causal effect of a college education.
👍🏻
"Clearly" is a stretch
Sincere question: what seems particularly hard about getting this data vs anything else?
It’s not going to be one question. It’s going to have to include some options about the WHY they chose not to go.
You’re also going to have to control for where they got rejected versus not.
Beyond that, I imagine it’s going to be a small sample anyways.
Edit: it’s not that the data wouldn’t be useful. It would be plausibly causal. It would probably be the best treatment effect for identifying the impact of college/no college on economic outcomes. I would use a survey dataset like that.
Congratulations OP you accidentally rediscovered causal inference. Now say “identification strategy” three times and you’re halfway to a PhD in Economics.
https://www.journals.uchicago.edu/doi/abs/10.1086/676661 leans in to the fact that Florida has a guaranteed admission policy based on high school performance. https://www.fldoe.org/schools/family-community/activities-programs/pre-collegiate/talented-twenty-program/ outlines the requirements.
They find a 22% average increase in wages 8-14 years after high school graduation when looking at the marginal admission. I don't know what they consider marginal, but I assume it's looking at something like the students who are barely in the top 20% of their high school class.
I think I recently read a similar study that looked at Ivy League admittances (or something like that).
Do you have a link to it?
I'm not sure if this is what I read but I think it references the same study: https://fordhaminstitute.org/national/commentary/revisiting-research-who-gets-elite-colleges-and-how-it-affects-their-lives
Does that look at people who didn't go to college at all, or just those who didn't go to an Ivy but went to a state school or similar to save money?
It's been a few months so I can't remember. I think the study was mentioned in this article, but I can't say with certainty: https://fordhaminstitute.org/national/commentary/revisiting-research-who-gets-elite-colleges-and-how-it-affects-their-lives
The “marginal admission” studies are probably the best that we can do. The issue is that they only cover a narrow demographic band.
The reason you can’t get a representative sample from “people who were accepted but chose not to attend” is exactly because of that choice part. It’s “self selection”. The danger with inferring population conclusions from a heavily self selected sample is that the self selection effect might overwhelm whatever it is you’re trying to segregate on. In other words, the “chose not to go” selection effect might overwhelm the “college or no college” effect, which means your study didn’t tell you anything about the impact of going to college.
You can actually partially remove that effect by using multivariate analysis. For example, I can collect a large pool of independent attributes that I think have signal, including whether they were admitted and whether they went as two separate variables, throw it all into a machine learning model that can handle that many attributes, and just ignore the coefficients / weights / branches of the tree (depending on which model you used) associated with the choice itself.
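As a rough illustration of what adjusting for other attributes buys you, here is a small simulation. It swaps the machine learning model for plain linear regression with one control, done by partialling out (regress both attendance and wages on the control, then regress the leftovers on each other). All names and numbers are assumptions made up for the sketch: among admitted students, family income raises both the chance of attending and later wages, and the simulated effect of attending is 0.20 in log wages.

```python
import random

def slope(xs, ys):
    """OLS slope of y on x (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def residuals(xs, ys):
    """What is left of y after removing its linear relationship with x."""
    b = slope(xs, ys)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return [(y - my) - b * (x - mx) for x, y in zip(xs, ys)]

def simulate_admitted(n=20000, true_effect=0.20, seed=1):
    """Admitted students only; richer families attend more often AND earn more anyway."""
    rng = random.Random(seed)
    income, attend, log_wage = [], [], []
    for _ in range(n):
        fam = rng.gauss(0.0, 1.0)  # standardized family income (hypothetical)
        p_attend = 0.5 + 0.3 * max(-1.0, min(1.0, fam))
        went = 1.0 if rng.random() < p_attend else 0.0
        wage = 10.0 + true_effect * went + 0.15 * fam + rng.gauss(0.0, 0.3)
        income.append(fam)
        attend.append(went)
        log_wage.append(wage)
    return income, attend, log_wage

income, attend, log_wage = simulate_admitted()

# Naive: raw gap in mean log wage between attendees and non-attendees.
naive = slope(attend, log_wage)
# Adjusted: same comparison after partialling family income out of both variables.
adjusted = slope(residuals(income, attend), residuals(income, log_wage))

print(f"naive gap:    {naive:.3f}")
print(f"adjusted gap: {adjusted:.3f}  (simulated true effect: 0.20)")
```

The adjustment works here only because the thing driving the choice is observed and sits in the data. Whatever drives the choice and isn't measured stays in the estimate, which is why this removes the selection effect only partially.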