05 October 2022
Managing Editor Laura Norén, Editor and Producer Brad Stenger, and ADSA staff
A discussion of research, journalism, working conditions, tools, software, events, jobs, and community in data science. 
CS/DS Cluster Hires
Last week we noted our surprise that Yale could hire so many CS faculty members in a three-year period. Our readers pointed out that Yale is not the only school in a CS/DS hiring frenzy, and is far from making the largest cluster hires.

UIUC hired 12 CS faculty in the same period, just one short of Yale, though the comparison may not be perfect because UIUC is significantly larger than Yale. Still, it’s a large talent acquisition.

The biggest volume of hires by far is at the Roux Institute, which is a Portland, Maine-based part of Northeastern University. (Northeastern’s main campus is in Boston.) They announced they are seeking 70+ new hires, though it looks like only 10 are for full-time faculty positions.

Hiring was notably absent in the CS/DS field at Meta. Over the summer, the company told interns that its past practice of offering permanent positions to at least some interns at the end of the summer would not continue in 2022.

danah boyd to Step Down from Data & Society Board of Directors
danah boyd has announced she will step down from her board role at Data & Society next March as part of a “natural” leadership transition, relinquishing the role of Chair now. Charlton McIlwain, NYU Vice Provost and Professor of Media, Culture, and Communication, will assume the position of Chair of the Board.

boyd founded Data & Society in 2013 as the first independent, research-based non-profit focused on the societal impacts of communication in our networked societies. boyd notes that “while this decision is bittersweet… The bones of the organization are strong, the mission is clear, and the work could not be more relevant or important at this moment in time.” A prolific researcher in this intellectual space, she intends to continue collaborating, engaging, and writing.

UConn, UNC Scrape Students’ Social Media
Various universities, including UConn and UNC, have contracted with a third-party service called Navigate360 to scan and collect content posted to students’ social media. UNC decided not to renew its contract. UConn students are pushing for the same non-renewal in Connecticut, arguing that “UConn has created a panopticon.”

For their part, Navigate360 and similar web-scraping companies argue that any content on the public web is fair game.
We enjoy hearing from readers. Hit reply to send constructive feedback and suggestions.

Featured jobs

The Johns Hopkins University
Mathematical Institute for Data Science (MINDS)
Baltimore, MD

Associate/Full Professor – Data Science
University of Texas, San Antonio
School of Data Science
San Antonio, TX

Associate/Full Professor - Bioinformatics
University of Texas, San Antonio
School of Data Science
San Antonio, TX

Professor of Instruction – Workforce
University of Texas, San Antonio
School of Data Science
San Antonio, TX

Audio of the week
History of Medical Experiments in the US
By Malcolm Gladwell

Spreadsheet of new data science programs and funding announcements.

Academic Inequality, Follow-up
We wrote about academic inequality two weeks ago and want to follow up with a pertinent finding from Aaron Clauset and Daniel Larremore, who worked with K. Hunter Wapman and Sam Zhang on a network analysis in which PhD-granting programs were nodes connected by hiring “edges” to tenure-track hiring departments. For computer science departments: “only 12% of faculty were able to get jobs at universities more prestigious than where they went to school—a number that plummeted to 6% in economics.” In other words, there is a strong status gradient that makes it statistically improbable for PhDs to get positions at institutions ranked higher than the one where they got their degree, and it is remarkably strong across fields. In their words, “the prestige hierarchies that broadly define faculty hiring are universally steep, and often substantially steeper than can be explained by the ubiquitous and large production inequalities.”
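The structure of their analysis is easy to see on a toy example. The sketch below (our illustration, not the authors’ code or data) uses hypothetical programs and hires: each hire is an edge from a PhD-granting program to a hiring department, each program has a prestige rank, and the statistic of interest is the fraction of hires that land at a department more prestigious than the new hire’s doctoral program.

```python
# Hypothetical prestige ranks for four PhD-granting programs (1 = most prestigious).
prestige = {"A": 1, "B": 2, "C": 3, "D": 4}

# Hypothetical hiring edges: (phd_program, hiring_department).
hires = [
    ("A", "B"), ("A", "C"), ("A", "D"),  # downward moves
    ("B", "C"), ("B", "D"),              # downward moves
    ("C", "B"),                          # the rare upward move
]

def upward_fraction(hires, prestige):
    """Fraction of hires landing at a more prestigious department
    than the program that granted the PhD."""
    up = sum(1 for phd, dept in hires if prestige[dept] < prestige[phd])
    return up / len(hires)

print(f"{upward_fraction(hires, prestige):.0%} of hires moved up")  # prints "17% of hires moved up"
```

In this toy network, one of six hires (17%) moves up the hierarchy, in the same ballpark as the 12% the authors report for real computer science departments.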

See also: Data Viz of the Week, bottom of this email.
Sponsored Content 
The Draper Data Science Business Plan Competition is designed for student entrepreneurs to advance business ventures that use data science to create value and solve unmet problems. Female teams are encouraged to apply. First information session is Friday, November 11, 2022.

Insurance market collapsing
As readers of this newsletter know, climate change is both an event occurring in real time and a future event we’d be better off avoiding. As we have written in the past, using a social science lens, our prediction is that the insurance industry will prove to be one of the most impacted players - and most impactful actors - in the way climate change shifts human behavior, at least in countries where insurance is required to obtain housing mortgages and finance other substantial purchases.

Professor Carolyn Kousky wrote an explainer on how insurers are affected by climate change: “Very large homeowners insurance companies have largely left the state of Florida, and you see insurers leaving the Gulf Coast all the time. Several just went bankrupt with Hurricane Ida in Louisiana.” In California, R J Lehmann found that “nonrenewals of residential policies in the state grew by 36% and new policies written by the state’s residual market FAIR Plan surged 225%,” largely due to wildfire risk (FAIR explained). The costs to insurers of the 2017 and 2018 wildfires obliterated their profits from the previous 25 years: “Homeowners insurers doing business in the state posted a combined underwriting loss of $20 billion for the massive wildfire years of 2017 and 2018 alone…double the total combined underwriting profit of $10 billion that California homeowners insurers had generated from 1991 to 2016.”

It’s not just toasty California and soggy Florida.

After Hurricane Sandy, coastal properties in the east were also denied insurance renewals. Without homeowners insurance, banks will not underwrite mortgages, dramatically slowing transfers of properties from one owner to the next and shifting risk to state-backed insurance programs of last resort (e.g., flood insurance is always state-backed; homeowners who face non-renewal can fall back on limited state-backed FAIR insurance if they are unable to qualify with private insurers). And in Oregon, the state Department of Natural Resources’ new wildfire risk map, which would have been used by insurers to update their own risk calculations, had to be pulled back for at least a year after vociferous public outcry.

When asked in a post-Hurricane Ian MSNBC interview about homeowners potentially retreating from the risky areas where their properties were destroyed by Ian to safer areas, Kousky noted that “if people want to move somewhere safer or build better… there are a lot of challenges in the system. Some of the federal assistance dollars are only meant to be used to build back in place in the same way. There’s some policy changes we need there. Sometimes there is simply not enough money to help someone pay off a mortgage and relocate somewhere else.” In other words, the science on climate change may have been clear enough to advise against living near the coast or in wildfire-prone areas, but getting people to respond may require economic and political levers that haven’t yet been coordinated. Failing to receive insurance renewal and having to fall back on the states’ limited FAIR programs means owners end up paying more for less coverage, and states end up overburdened by a pool of the riskiest properties.

Atmospheric data scientists’ predictions are improving, for better and worse. Israeli scientists can now predict flash floods with more than 90 percent accuracy using a model fed water vapor data collected at Eastern Mediterranean weather stations 24 hours before a rain event. Simulations to forecast rain get exponentially more complex (and more expensive) as the time window stretches beyond a few days, so Nicklas Boers’ team at the Technical University of Munich turned to GANs to help predict extreme precipitation events over longer windows without exploding the computational requirements; the approach augments Earth-system simulations and helps with scenario development and bias recognition. Atmospheric geo-engineers at Colorado State University and Stanford simulate spraying fine particles (reflective aerosols) into the upper atmosphere to reflect heat. Results are promising: planetary temperatures would likely cool significantly but unevenly, and their models show air temperatures continuing to rise in many places for at least ten years after a stratospheric intervention.

From our perspective in the data science community, the international nature of scientific collaboration and the interdisciplinary nature of data science make this field one of the best ‘rapid response’ crews available to scope the choices facing individuals and governments and to provide reasonable information about them. Already, First Street Foundation has provided a free Risk Factor calculator that maps wildfire and flood risk for every property in the US using data-science-backed projections that are fresher than those of any similar risk calculator available to the public.
UCLA Buys Marymount California Campus
UCLA announced plans to buy Marymount California’s “24.5-acre campus and an 11-acre residential site” for $80 million. UCLA, potentially the most popular university in the country by undergraduate applications (140,000+), has run out of places to house its expanding student body. Marymount California University in Rancho Palos Verdes shuttered earlier this year (2022).

The academic marketplace for undergraduates is well-diversified in the US. The loss of one smaller private school to waning student interest, to the benefit of a public school with stupendous student interest, is likely a net win for California’s undergraduates and their families. It should be feasible for students to attend in-state public schools, yet California has struggled for decades to keep up with its growing population and the demand for high-quality education. Some students at UC Santa Cruz, having either been priced out of the dorms or lotteried out of them, are living in unusual circumstances; student Matthew Chin, for instance, is living in a trailer parked on someone’s driveway.
On fairness in AI:
Recommender Systems

Recommender systems play a large role in cultural markets - music, video, TV, movies, podcasts - particularly as the number of choices explodes. The Guardian’s pop and rock critic Alexis Petridis suggests 60,000 new musical tracks are uploaded to listening platforms every day. Recommenders also come into play in employment, dating, travel, news discovery, and other crucial markets where efficient discovery reduces overall resource expenditure. But how do we know if recommenders are operating fairly? Many recommender models assume that item relevance should be directly linked to item exposure: the more “relevant” the item, the higher it is ranked relative to less-relevant items. The greater exposure that highly relevant items receive is deemed ‘fair’ because it reduces the search effort expended by the recipients of the recommendation.

Researchers Yuta Saito and Thorsten Joachims at Cornell presented a paper at SIGKDD that introduced a new definition of fairness, derived from a social-welfare-maximization model developed in economics (video available). Their adaptation of Nash’s social welfare function follows three principles: “Every item’s benefit from being ranked on the platform is better than being discovered at random; no item's impact, such as revenue, can easily be improved; and no item envies the allocation of positions for the other items.” This process for producing recommendations has the potential to be fairer to content creators and other parties on the item-distribution side of the equation.
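To see why the choice of welfare function matters, consider the toy sketch below (our simplification, not the authors’ algorithm: items, relevances, and the exposure budget are all hypothetical). A utilitarian objective sums item utilities and so pushes all exposure toward the single most relevant item; Nash social welfare multiplies item utilities, so any allocation that zeroes out one item scores zero, and every item on the supply side keeps a nonzero benefit.

```python
# Hypothetical item relevances; each item's utility = its exposure * its relevance.
relevance = {"a": 0.9, "b": 0.5, "c": 0.1}

def total_utility(alloc):
    """Utilitarian welfare: the sum of item utilities."""
    return sum(alloc[i] * relevance[i] for i in relevance)

def nash_welfare(alloc):
    """Nash social welfare: the product of item utilities."""
    p = 1.0
    for i in relevance:
        p *= alloc[i] * relevance[i]
    return p

# Utilitarian optimum: give the entire exposure budget to the top item.
greedy = {"a": 1.0, "b": 0.0, "c": 0.0}
# Under Nash welfare, starving any item drives welfare to zero,
# so the optimum spreads the budget across all items.
equal = {i: 1 / 3 for i in relevance}

print(total_utility(greedy), nash_welfare(greedy))  # 0.9 total, but Nash welfare is 0.0
print(total_utility(equal), nash_welfare(equal))    # lower total, positive Nash welfare
```

The real paper’s criteria are richer (randomized rankings, envy-freeness, a better-than-random guarantee), but this captures the basic intuition: multiplicative welfare refuses to sacrifice any item entirely.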

AND Removing (Queer) Bias from Language Models
Researcher Katy Felkner made progress toward de-biasing a language model’s treatment of LGBTQ people by gathering a better training set and feeding it to the model, drawn mostly from Twitter accounts of LGBTQ people talking about their experiences. (News articles about LGBTQ happenings weren’t as helpful.) Training data matters: when it is drawn from a population with a history of bias against a particular group, the model is likely to reflect that bias unless trained otherwise.

Parkinson’s Severity Detection
Placing radio-wave sensors in the homes of people with Parkinson’s - 40% of whom receive little or no specialized care for their disease - could help monitor disease severity and responsiveness to drugs. In a year-long pilot test across 50 homes (34 housing people with Parkinson’s, 16 serving as controls), MIT researchers led by Dina Katabi, Yingcheng Liu, and Guo Zhang confirmed that gait speed is a key factor distinguishing people who have Parkinson’s from those who don’t. Further, the radio-wave sensors detected improvements in gait that spiked after patients took their medications, then slowly declined over the subsequent hours. This is particularly useful for Parkinson’s patients because drug side effects are noxious; tools that give specialist physicians enough information to time and titrate doses can increase patients’ quality of life. This type of real-time response-to-medication data is also a boon to researchers working to develop new drugs, since the sample size needed for a study like this one (N=50) is fairly manageable.
Animal Detection AI
“Blue, fin, humpback and grey whales are vulnerable to ship strikes as they migrate and feed” off the coast of California. With AI trained on visual and acoustical data, whale locations can be pinpointed and port traffic controllers can alert ships to steer clear of the large mammals. Already deployed in the Santa Barbara Channel near the ports of Long Beach and Los Angeles, a similar system running the same model is headed for the area around San Francisco, supported by Marc Benioff’s foundation and the Benioff Ocean Science Laboratory.

Researchers at Cornell and in Australia have trained an AI to detect whale calls that is now 90% accurate, besting human raters, who only hit 70% accuracy. Perhaps more importantly, “It took about 10 hours of human effort to identify the calls. It took the ML-detector 30 seconds – 1,200 times faster. And it doesn’t get tired,” according to lead author Brian Miller of the Australian Antarctic Division.

In Australia, exotic animals are smuggled in and out in people’s luggage. It’s impossible to open every piece of luggage to prevent this flow, so Vanessa Pirotta and Justine O’Brien developed an image recognition model trained on airport luggage scanner imagery to detect animals like lizards, birds, and, possibly, koalas. Non-native animals coming into the island nation can become destructive invasive species. Native animals leaving the country may be endangered - not only by being crammed into stuffy luggage, but also in the sense that they are on the endangered species list.

In North American cattle-raising territory, OneCup AI has developed a model called Betsy, based on data from cameras placed where cattle eat and sleep. Betsy learns which cow is which (cow facial recognition) and detects limping and other health issues, as well as calving. AgTech and animal protection tech will continue to be a growth sector, and will likely continue to be under-covered in data science news cycles.
Papermills Have a New Nemesis
Adam Day, director of a data-services company called Clear Skies in London, trained a text classifier on known “paper mill” papers using a dataset developed by Elizabeth Bik and David Bimler. (Those two have been on a bit of a crusade against plagiarism and similar transgressions in academic publishing for years.) Day’s model looks only at titles and abstracts, flagging those with high similarity scores relative to known papermill publications. About 1% of the papers he fed through the model were flagged as potential scholarly fakes masquerading as real research.
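The core idea - score a title and abstract by similarity to a corpus of known paper-mill texts and flag anything above a threshold - can be sketched in a few lines. Note this is our illustration, not Day’s model: the example “mill” texts, the bag-of-words representation, and the 0.5 threshold are all hypothetical stand-ins for his (unpublished) classifier.

```python
from collections import Counter
import math

# Hypothetical known paper-mill titles/abstracts (stand-ins, not real data).
KNOWN_MILL = [
    "role of long noncoding rna in cancer cell proliferation",
    "mir-21 promotes tumor growth via pi3k akt signaling pathway",
]

def bow(text):
    """Bag-of-words term counts for a lowercased text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def flag(text, threshold=0.5):
    """Flag a text whose best match against the mill corpus exceeds the threshold."""
    score = max(cosine(bow(text), bow(m)) for m in KNOWN_MILL)
    return score >= threshold, score

suspicious, score = flag("noncoding rna promotes cancer cell proliferation via akt pathway")
print(suspicious)  # prints True
```

A production system would use a trained classifier over richer text features rather than raw word overlap, but the pipeline shape - vectorize, compare to a labeled corpus, threshold - is the same.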

Demanding Authorship
In another study touching on academic bad habits, veterinary researcher Nicola di Girolamo was part of a Cornell University team that reviewed the author statements on ~82,000 papers. They wanted to determine how many qualitative descriptions of per-author contributions would reveal authors falling short of the authorship criteria laid out by journals. They found that, “overall, some 35% of the authors failed to meet the ICMJE criteria, and 4% didn’t meet the PNAS standards.” These ‘honorary authors’ “‘are misrepresenting their contributions in the scientific literature,’ possibly to inflate their volume of publications for tenure and promotion.”
White House Releases 5 Principles for Ethical AI
The White House Office of Science and Technology Policy has released five principles for ethical AI that serve as an intellectual warm-up act for an AI Bill of Rights.
  1. Safe and Effective Systems: You should be protected from unsafe or ineffective systems.
  2. Algorithmic Discrimination Protections: You should not face discrimination by algorithms, and systems should be used and designed in an equitable way. “This protection should include proactive equity assessments as part of the system design, use of representative data and protection against proxies for demographic features, ensuring accessibility for people with disabilities in design and development, pre-deployment and ongoing disparity testing and mitigation, and clear organizational oversight.”
  3. Data Privacy: You should be protected from abusive data practices via built-in protections, and you should have agency over how data about you is used. Here we want to point out two stand-out pieces of advice. First: “Consent should only be used to justify collection of data in cases where it can be appropriately and meaningfully given.” This is an excellent advancement in terms of privacy, acknowledging that consent is illegitimate if it is not informed and freely given. They go on to take a stand against surveillance creep: “Continuous surveillance and monitoring should not be used in education, work, housing, or in other contexts where the use of such surveillance technologies is likely to limit rights, opportunities, or access.”
  4. Notice and Explanation: You should know that an automated system is being used and understand how and why it contributes to outcomes that impact you. “Automated systems should provide explanations that are technically valid, meaningful and useful to you and to any operators or others who need to understand the system, and calibrated to the level of risk based on the context.” The emphasis on AI explainability is notable because it is still an active area of research and debate. A new paper by Anne Marie Nussberger and MJ Crockett looks at the preference for accuracy or interpretability when it is not possible to optimize equally for both. They find that people tend to prefer accuracy, which is a real challenge for this principle. They conclude with a call for more research on the downstream consequences of preferring accuracy over interpretability: “These attitudes could drive a proliferation of AI systems making high-impact ethical decisions that are difficult to explain and understand.”
  5. Human Alternatives, Consideration, and Fallback: You should be able to opt out, where appropriate, and have access to a person who can quickly consider and remedy problems you encounter.
Data Viz of the Week 
Figure 6 in Prestige change from doctorate to faculty job in the US faculty hiring network
By K. Hunter Wapman, Sam Zhang, Aaron Clauset, and Daniel Larremore

Quantifying hierarchy and dynamics in US faculty hiring and retention
Calling researchers in the social and behavioral sciences! It's happening again!
"All throughout October... we're running more AI-powered prediction markets and we need you to join us. We'll give you cash to play with."

2022 Foundational Integrity Research request for proposals
“In this request for proposals (RFP), Meta is offering awards to global social science researchers interested in exploring integrity issues related to social communication technologies. We will provide a total of $1,000,000 USD in funding for research proposals.”
Deadline for proposals: November 22.
Tools & Resources  
I do a lot of work on science of selection. … Someone asked me for my "rules of thumb of selection". Here's a thread!
Twitter, Jim Savage from September 29, 2022
*The @KumarAGarg principle*
- Give homework and select on ability to do things.
- Some people do things without *any* prompting. They're on rails. Kumar's "doers".
- Almost nothing else matters.
Design principles for data analysis
Flowing Data blog, Nathan Yau from September 27, 2022
"To teach, learn, and measure the process of analysis more concretely, Lucy D’Agostino McGowan, Roger D. Peng, and Stephanie C. Hicks explain their work in the Journal of Computational and Graphical Statistics"
About Us 
The Data Science Community Newsletter was founded in 2015 in the Moore-Sloan Data Science Environment at NYU's Center for Data Science. We continue to be supported by the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation through the Academic Data Science Alliance.

Our archive of newsletters is available.
If you no longer want to receive this newsletter: Unsubscribe
Please forward the newsletter to colleagues and students: Subscribe.
Academic Data Science Alliance Twitter account
ADSA Website
Copyright ©2022 Academic Data Science Alliance, All rights reserved.

Our mailing address is:
1037 NE 65th St #316; Seattle, WA 98115
