Issue 256: 1 December 2022
Managing Editor Laura Norén, Editor and Producer Brad Stenger, and ADSA staff
A discussion of research, journalism, working conditions, tools, software, events, jobs, and community in data science. 
UK - 70,000 Academics on Strike
In the last issue we reported that the University of California system was seeing the largest strike action ever in higher education with 48,000 workers out on the picket line. That record only held until Thanksgiving day when 70,000 academic workers in the UK went out on strike. They are seeking better pay, the reversal of pension cuts, more job security, and better working hours. 

To put a finer point on it, pay raises for 2022-2023 were set at 3% despite an 11% inflation rate in the UK. Strikers want a 13.6% pay raise. Furthermore, “UCU (the union) says that on average, university staff work two extra unpaid days per week, and that one-third of academic staff are on temporary contracts.” They also want the reversal of what amounts to a 35% cut to pension benefits.
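The real-terms arithmetic behind these figures is easy to sketch (an illustrative calculation only; the exact outcome depends on which inflation index and time period apply):

```python
# Real-terms change in pay: (1 + raise) / (1 + inflation) - 1.
# Figures taken from the article above.
inflation = 0.11
offered_raise = 0.03
demanded_raise = 0.136

def real_change(nominal_raise, inflation_rate):
    """Convert a nominal pay raise into a real (inflation-adjusted) change."""
    return (1 + nominal_raise) / (1 + inflation_rate) - 1

print(f"3% offer in real terms:     {real_change(offered_raise, inflation):+.1%}")
print(f"13.6% demand in real terms: {real_change(demanded_raise, inflation):+.1%}")
```

By this rough math, the 3% offer amounts to about a 7.2% real-terms pay cut, while the 13.6% demand would be roughly a 2.3% real-terms raise.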

UF: Going up for tenure every 5 years
Florida’s state university system is facing a significant threat to tenure as we know it. The latest legislation would give state universities the ability to require faculty members with tenure to undergo a “post-tenure review” every five years. This requirement would be most troubling for faculty teaching in the humanities and social sciences because the bill also imposes subjective requirements on how topics like race and racism can be taught. It “prohibits trainings that cause someone to feel guilty or ashamed about the past collective actions of their race or sex”. The bill would also require universities to change which bodies they use to obtain accreditation, a very difficult stipulation because there simply aren’t many accreditation bodies and it takes years to switch from one to another.

The University System of Georgia recently adopted a similar five-year post-tenure review policy. It was then censured by the American Association of University Professors for creating conditions that do not allow academic freedom to flourish. 
Featured jobs

Lecturers and Professors of Computer Science Practice for Data Science
Viterbi School of Engineering, University of Southern California
Los Angeles, CA

Clinical Faculty
Center for Data Science, New York University
New York, NY

Data Scientist
Goergen Institute for Data Science (GIDS), University of Rochester
Rochester, NY


Audio of the week

UC on Strike [13:28]
WNYC, The Takeaway

Spreadsheet of new data science programs and funding announcements.

Sponsored Content 
Three Recommendations for Responsible AI
Our readership is likely aware, and in agreement, that it is a good idea to produce ‘responsible AI’. We’ve previously covered various frameworks and opinions about what “responsible AI” is and how to go about achieving it. This week we ran across an interview with Navrina Singh, CEO of Credo AI, appointee to President Biden’s AI task force, and former engineering executive at Microsoft and Qualcomm. Singh laid out three high-level guardrails:
  1. “First, responsible AI requires a full lifecycle approach. AI systems cannot be considered ‘responsible’ based on one point-in-time snapshot, but instead must be continuously evaluated.” This makes a lot of sense as a check against various kinds of model drift as well as changes in the underlying real-world conditions. 
  2. “Secondly, context is critical for AI governance…trustworthy AI depends on a shared understanding that AI is industry specific, application specific, data specific and context driven.” Though she is an engineer, Singh sounds exactly like a social scientist here. Context matters, not as a secondary consideration but as the primary one!
  3. “Finally, transparency reporting and system assessments….cannot be overstated as a critical foundation for AI governance for all organizations.” This may sound like a softball - who objects to transparency? - but it is the most difficult for most organizations. They see little benefit to preparing reports that will invite scrutiny and negative commentary from academics, journalists, and regulators.
Singh’s three guardrails succinctly outline sound governance criteria for the emerging field of AI.
An administratively manufactured disaster (UC strike)
UC-San Diego Assistant Professor Lilly Irani wrote about the impossible position she has been put in by the UC system negotiators’ failure, over 8 months of bargaining, to reach a compromise with the UAW that could have averted the current strike. Sympathetic to the striking workers’ expectation of earning enough to live in the hot housing markets surrounding many UC schools - Los Angeles, Santa Barbara, Santa Cruz, San Diego, Irvine - she also laments what this “administratively manufactured disaster” means for her students. 

Physically and politically unable to grade all the student work submitted to her large lecture class, she summarizes what we all know but don’t like to admit: “without grades to confirm their progress, students may lose their financial aid and, with it, their rent, food and tuition. Veterans can lose VA educational benefits. With incomplete transcripts, graduate school applications will get stuck. Students may find themselves unable to graduate.” The striking workers are well aware of this; they timed the strike to be resolvable before finals while still impinging on these meaningful deadlines, to underscore how much important labor they do. 

Our Audio of the Week in the right sidebar provides additional coverage of the UC strike action.

The strike continues, but postdocs and academic researchers have reached a tentative agreement that would give them raises, make postdoc appointments a minimum of two years long (particularly helpful for international postdocs whose visas are tied to contract length), provide 8 weeks of paid leave for new parents, and add transportation benefits. Postdocs will continue to strike in solidarity with graduate students who have yet to reach agreements. There has also been discussion about whether the postdoc union should have pushed harder on disability rights issues.

Researchers With Greater Access to Research and Teaching Labor Are More Productive
Elite universities with greater and more consistent access to paid ‘junior labor’ - teaching and research assistants - see higher productivity among their tenured faculty, as measured by research output. 

Sam Zhang and Aaron Clauset at the University of Colorado Boulder co-authored a study that examined the publication records of “78,802 tenured or tenure-track researchers at 262 PhD-granting US universities”. Non-faculty group members at the same schools were no more productive, suggesting that it is not the intellectual or physical environment alone that drives productivity. Having access to more labor support appears to explain the increased productivity, debunking “the myth of the meritocracy, that productivity is tied to some sort of inherent characteristics about a person,” according to Clauset.

Animal AI
One of our favorite occasional series is Animal AI where we cover applications of AI to animal settings. This time around we’ve got:
Science of Happiness - See more people
A dispatch from the science of happiness builds on existing findings that people who report being happier also report spending more time with others AND people are happier when they’re being social. But these findings haven’t pressed too hard on a key question: does happiness depend on which people we spend our time with? Anecdotally, we all know the answer is “yes”. But this is where social science can elevate an intuition to a finding.

A new paper by Hanne Collins (Harvard Business School), Serena Hagerty (University of Virginia), Jordi Quoidbach (Esade Business School), Michael Norton (Harvard Business School) and Alison Wood Brooks (Harvard Business School) finds that those who spend time with a diversity of people are happier than those who are equally social and equally diverse in their activity set, but have less diversity in their social portfolio. This finding holds when researchers looked between people and within the same individuals measured over time. Go reach out to an old friend or colleague or make a new social connection. You’ll both feel better once you get past the initial social anxiety. 

Sponsored Content 
An event for researchers who work with images as a primary source of data.
AI for Drug Discovery - new Therapeutics Data Commons
Marinka Zitnik is a biomedical informatician at Harvard Medical School working at the intersection of statistics, clinical medicine, chemistry, and AI. She has launched the Therapeutics Data Commons to bring clinical researchers together with computer and data scientists in an intellectual exchange supported by practical processes. Zitnik explains that even though, “we see in publications that those [AI] models are achieving near-perfect accuracy….we [aren’t] seeing widespread adoption of machine learning in drug discovery. This is because there is a big gap between performing well on a benchmark data set and being ready to transition to real-world implementation in a biomedical or clinical setting. The data on which these models are trained and tested are not indicative of the kind of challenges these models are exposed to when they’re used in real practice.” The Therapeutics Data Commons aims to close this gap. 

Because we recognize how hard it can be to do interdisciplinary work, we are particularly interested in attempts to bridge intellectual and research gaps that mirror disciplinary boundaries, like this one does. As a new project, it remains to be seen how well the Therapeutics Data Commons will serve its intended purposes.
Copyright and AI - Still Anyone’s Game
Using old regulatory paradigms for new technologies is usually stupid. Heck, even using regulatory frameworks created to govern new technologies often leads to stupid outcomes (e.g. all those cookie banners that have to be interacted with for every single website in the EU. If we had listened to CMU’s Lorrie Cranor 20+ years ago, our browsers would be beautifully orchestrating our preferences behind the scenes). 

It is no surprise, then, that figuring out how to apply existing copyright law to govern what goes into AI models (training data), the model itself, and the outputs of the model has led to massive confusion. According to The Verge writer James Vincent, some lawyers confidently claim that there are definitely grounds for copyright infringement with typical AI applications, and others confidently claim that there are not. A couple of realities of copyright law are emerging: Fair use may allow content to be used for training purposes without model builders worrying that they need copyright releases. Model outputs generated solely by machine are unlikely to receive copyright protection. And yet there is a large legal territory between these two loosely established stakes in the ground. 

Scholar Andres Guadamuz at the University of Sussex has at least formulated a tight set of questions to frame the discussion: “First, can you copyright the output of a generative AI model, and if so, who owns it? Second, if you own the copyright to the input used to train an AI, does that give you any legal claim over the model or the content it creates? Once these questions are answered, an even larger one emerges: how do you deal with the fallout of this technology? What kind of legal restraints could — or should — be put in place on data collection? And can there be peace between the people building these systems and those whose data is needed to create them?”

New Unit Measurement Terms Created for the Biggest and Smallest Data
Advances in data processing and nanotechnology mean we need new words for very big and very small units. The 27th General Conference on Weights and Measures has introduced four new prefixes:

ronna - 10^27 (a 1 followed by 27 zeroes)
quetta - 10^30 (a 1 followed by 30 zeroes)

ronto - 10^-27 (26 zeroes after the decimal point, then a 1)
quecto - 10^-30 (29 zeroes after the decimal point, then a 1)
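Written as powers of ten, the new prefixes slot neatly into code (a quick illustrative sketch; the dictionary name and example are ours):

```python
# The four new SI prefixes adopted at the 27th CGPM (2022), as powers of ten.
NEW_SI_PREFIXES = {
    "ronna": 1e27,    # symbol R
    "quetta": 1e30,   # symbol Q
    "ronto": 1e-27,   # symbol r
    "quecto": 1e-30,  # symbol q
}

# Example: Earth's mass (~5.97e27 g) is about 5.97 ronnagrams.
earth_mass_g = 5.97e27
print(earth_mass_g / NEW_SI_PREFIXES["ronna"], "ronnagrams")
```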

Computer Science Job Searches - 21% fail rate
This is a straight pass-through of data reported by the Computing Research Association’s survey of the job market for tenure-track computer science positions.

“Survey respondents reported filling a total of 289 tenure-track faculty for an aggregate success rate of 76%, which is comparable to the 2019 study. Examination on the success of the search for each of the 148 institutions found that 21% of institutions failed to hire any faculty, while 55% succeeded in hiring at least as many faculty as were being sought. These failed search results are worse than, and the institutional success results are comparable to, survey results from 2019. In terms of results for different types of institutions, PhD institutions reported failed search rates of only a few percent and at least successful searches for more than 60% of institutions.  In contrast, BS/BA (37%) and MS (29%) institutions reported a much higher percentage of failed searches. There was also a lower percentage of successful searches with 38% for MS and 54% for BS/BA institutions.”
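A back-of-envelope check on those numbers (assuming, as seems implied, that the aggregate success rate is positions filled divided by positions sought):

```python
filled = 289          # tenure-track positions reported filled
success_rate = 0.76   # aggregate success rate from the survey
sought = filled / success_rate
print(round(sought))  # roughly 380 positions sought in aggregate
```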

UK Conservatives Contemplate Banning Foreign Students Except at TOP Universities
If conservative lawmakers in the UK prevail, foreign students will be turned down for student visas unless they are studying at “top universities” and avoid pursuing “low quality degrees”. Foreign students who are able to win a spot at a “top university” in a degree course deemed high quality will face limits on the number of dependents they can bring with them.  

Meanwhile, the annual ranking of leading science cities as measured by scientific publications does not feature a single UK city in the top 5. Beijing, New York, Shanghai, Boston and San Francisco top the list, in that order. 

Large Language Models
We are working on a piece that will get into Large Language Models. For those really into this topic, we’re including a link to an LLM benchmark study out of Stanford’s HAI unit that plots 25 LLMs in comparison to one another. They did not get to Meta’s new Galactica LLM…but it was pulled offline only days after its public debut. Designed to help write scientific articles, it was found to 1) generate realistic-sounding garbage and 2) be easily coaxed into racist output.

US Census - Bad Look
The US Census Bureau recently failed a cybersecurity penetration test, but defended its decision to use differential privacy as the “best available option”. This comes despite researchers’ critiques that the Bureau’s practices for applying differential privacy have introduced unacceptable latency in releasing new data and made it extremely difficult or impossible to study rural areas. One might argue that in this case, the Bureau’s main privacy threat is a lack of solid cybersecurity. 
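For readers unfamiliar with the mechanism at issue: the textbook building block of differential privacy adds calibrated Laplace noise to each released count. A minimal standard-library sketch (the classroom version, not the Bureau's actual TopDown algorithm):

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1e-300, 1.0 - 2.0 * abs(u)))

def laplace_mechanism(true_count, epsilon, sensitivity=1.0, rng=random):
    """Release a count with noise scaled to sensitivity / epsilon.
    Smaller epsilon -> more noise -> stronger privacy but noisier data:
    the core tension behind complaints about small rural-area statistics."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

random.seed(42)
print(laplace_mechanism(1200, epsilon=0.1))  # a noisy version of the count 1200
```

With a low epsilon like 0.1 the noise has scale 10, which barely matters for a county of 120,000 people but can swamp a rural block group of 120.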
Wide, Growing Racial Gap in NSF Funding 
A new study looked at NSF funding decisions for over 1 million proposal submissions from 1996 to 2019. The authors, who include Christine Yifeng Chen at Lawrence Livermore National Laboratory and Rosie Alegado at the University of Hawaii Manoa, found a large and widening racial gap. White PIs were and are more likely to be funded than PIs of other racial backgrounds. They measured the gap in ‘surplus awards’ - the number of awards granted beyond what would have been consistent with the statistical average - and found: “In 2019…white scientists received 798 surplus grants. The cumulative surplus over 20 years was 12,820 awards.” They noted that the surplus awards are higher for research projects than for teaching or other non-research awards.
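Our reading of the 'surplus awards' metric, in code (hypothetical numbers for illustration; the paper's actual methodology may differ in detail):

```python
def surplus_awards(group_proposals, group_awards, total_proposals, total_awards):
    """Awards a group received beyond what the overall funding rate,
    applied to that group's proposal count, would predict."""
    expected = group_proposals * (total_awards / total_proposals)
    return group_awards - expected

# Hypothetical example: a group submits 1,000 of 4,000 proposals;
# 1,000 awards are made overall, so 250 would be 'expected' for the group.
print(surplus_awards(1000, 300, 4000, 1000))  # 50.0 surplus awards
```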
Data Viz of the Week 

A visualization of US Couples’ Money Arrangements by Joanna R. Pepin in Socius

The 2023 Innovators Under 35 competition is now open for nominations
"Here’s what makes a great candidate—and how you can help [MIT Technology Review magazine] choose the finalists." Deadline for nominations is January 30, 2023.

Educational Opportunity
@NYUDataScience continues to search for Faculty Fellows (interdisciplinary postdocs).
"The position comes with a "generous compensation package which may include NYU faculty housing as well as funds to support research and travel". Deadline Dec 22, 2022.

Call for White Papers: Mid-cycle Robotics Roadmap Update
"We [Computing Community Consortium] are asking researchers to submit a white paper / position statement, no longer than two pages, that addresses the following: What are the current gaps in robotics research and its ecosystem – and what are potential opportunities to fill those gaps?" Deadline for submissions is January 9, 2023.
Tools & Resources  
Finding ‘fairness’ in AI: How to combat bias in the data collection process
Thomson Reuters Institute, Zach Warren from November 14, 2022
"Legal organizations and others that depend on artificial intelligence to power their data analytics and decision-making need to ensure they are addressing potential bias in data collection"

The 60 Best Campus Novels from the Last 100 Years
Lit Hub, Emily Temple from November 2, 2022
"To keep you company as the cold weather descends, here is a list of the greatest academic satires, campus novels, and boarding school bildungsromans in the modern canon."

Community Building in DS4A/Empowerment
Correlation One, C1 Insights blog from November 22, 2022
"Previous DS4A/Empowerment teaching assistants and graduates of DS4A/Empowerment discuss community building and maintaining networks post-program."
  We enjoy hearing from readers. Hit reply to send constructive feedback and suggestions.

About Us 
The Data Science Community Newsletter was founded in 2015 in the Moore-Sloan Data Science Environment at NYU's Center for Data Science. We continue to be supported by the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation through the Academic Data Science Alliance.

Our archive of newsletters is available.
If you no longer want to receive this newsletter: Unsubscribe
Please forward the newsletter to colleagues and students: Subscribe.
Academic Data Science Alliance Twitter account
ADSA Website
Copyright ©2022 Academic Data Science Alliance, All rights reserved.

Our mailing address is:
1037 NE 65th St #316; Seattle, WA 98115
