Fair game: the ethics of telco fraud

How do we connect high-level principles with day-to-day product decision making? How do we move past the AI Ethics hype and start trying, testing and implementing practical approaches?

These questions are at the heart of Laura’s work, and in this talk she shares stories, discoveries, and decisions from her time as an ‘ethics ops’ consultant embedded with a small team in a big telco.

From improving the science bit of data science, to developing the collective sensitivity of the team, to designing recourse for false positives, tune in for pragmatic pointers and actionable take-aways that you can try with your team right away.

Fair game: the ethics of telco fraud

Laura Summers: Multi-disciplinary designer – Debias.ai

Keywords: ethics, ethics ops, risk mitigation, harm mapping, explicit vs implicit knowledge, data interpretation, fraud and security, cognitive design, ethics litmus tests

TL;DR: Laura parses some of the challenges involved in designing and developing ‘Ethics Ops’. Whilst she doesn’t proffer a magic bullet, she shares some key learnings and insights gleaned through her work as an Ethics Ops consultant, specifically learnings from within the fraud space, presenting a range of use cases to demonstrate how thinking through product design via a harm identification and mitigation lens can be helpful not just for our users but also ourselves, our teams, and our companies. By developing a healthy cynicism about data sources, mapping knowledge states, designing for failure first, and paying attention to upstream prevention as opposed to downstream mitigation, Laura introduces some practices and questions we can experiment with to work towards meaningfully grappling with the inherent risks, problems, and complexities surrounding the space of ethics and fairness.

Before we kick off, Laura would like to share a new project she has just published a digital version of: Ethics Litmus Tests. The online version features a card-flipping tool for choosing cards. Laura borrowed the code for this project from Ben Buchanan, who created a card chooser for a similar card deck called Oblique Strategies, which was an inspiration for Laura’s project. This is a lovely moment of networks colliding! If you don’t know Ben, he’s the fella behind the Big Stonkin’ Post blog summaries of all the fabulousness coming out of Web Directions. Shout out and thanks to Ben, and if you want to try the online version of Ethics Litmus Tests, please do!

If you want any of the references for anything in this talk, you can find the slides here.

Section 1: Friend or fraud? Laura will cover four primary findings she’d like us to take away today:

  1. Developing a healthy cynicism about data sources.
  2. Mapping knowledge states.
  3. Designing for failure first.
  4. Upstream prevention vs downstream mitigation.

Don’t be worried that this will be too specific to telco or machine learning. It’s actually quite generic. The findings and techniques she’s offering are certainly foundation pieces for doing complex modelling correctly, but they apply to any kind of automation, any kind of decision system. Ergo: they apply to you.

At the beginning of 2020 Laura began consulting for a telco through her company Debias.ai. She’s primarily embedded in their fraud detection team. Their approach to fraud detection is fairly standard in the fraud space – a mix of monitoring, internal business rules, and some third-party systems with proprietary fraud-detection ML algorithms.

If you’re unfamiliar with the term ‘ethics ops’, think: dev ops, but for ethics. Laura introduces a harm identification and harm mitigation lens. Pragmatically, that work looks like monitoring notifications and alerts, as well as product flows and feedback loops.

Fraud is a really interesting space in which to discuss ethics. Not only are you the company capable of inflicting harm on your customers, you’re simultaneously under attack by a subset of customers who mean you harm. You have to balance the real (and often urgent) concerns of identifying bad behaviour whilst also acknowledging the possibility of false positives and considering how you might identify and support any innocent parties who accidentally got caught in the crossfire. It’s a war of attrition. Think: email spam.

Finding #1: Developing a healthy cynicism about data sources. When we start to feel too comfortable about how we’re thinking about or interpreting our data, that’s when we tend to get bitten by something unusual or unexpected.

Laura believes that data (interpretation) pride comes before a fall. An illustrative example: one person sending a high volume of SMS messages, many of them to overseas numbers. Typically, the team would assign this type of behaviour to a marketing strategy or spam (Buy these Ray-Bans! etc.). Upon investigation, this was not the case. They were actually sending information-based messages containing activism literature. Typically you would expect this to be transmitted via email, but here it was via text. Unusual? Yes. Malicious? No.

Another example: customers self-automating across various arenas, like daily reminders, or health-monitoring auto-calls to check in on vulnerable populations. The danger here is that these kinds of automations look very much like the more problematic bot behaviours, or the overly consistent ‘not human’ behaviours, that the data flags. A core element of doing this work well is keeping in mind that these innocent instances are happening simultaneously with more suspicious behaviours within the data set. How do we help ourselves think through these issues?

Deconstructing your proxy: thinking of a proxy as a stand-in or placeholder for the thing you actually want to measure, all analytics and all product work rely on proxies. For example: we might assume that the number of clicks on a button is a proxy for ‘I want to buy that thing / take the next step’, but it could equally be rage clicks. Or if we look at how much time a visitor spends on a page, we might assume that’s a proxy for how engaging the content is, but it could equally be the page being abandoned because a pop-up confused them and turned them off. Being suspicious of your proxies and taking the time to think them through gives you a much more pragmatic understanding of what you might be learning.

A concept framework. Start with the idea of a signal – the incoming data, the zeros and ones. Sometimes it comes from a third-party server, sometimes from infrastructure, sometimes from data sources you’re otherwise plugged into. It is the thing you are measuring. From the signal, you infer some type of activity. But note: that first inference is a leap in logic. It’s telling you ‘I think this is probably what this is’, but actually it could be something else. Back to the texting example: the intuitive and obvious interpretation of seeing SMS messages coming from a phone is that somebody is on their phone sending a text. Actually, it might be malware sending the texts without the person’s knowledge. Acknowledging the second possibility injects uncertainty into the equation.

From the activity, you might further infer a persona – if somebody is texting, you assume two people are having a conversation. But it could also be somebody simply confirming an appointment. There are many iterations of texting that may not reflect a traditional conversation – testing an API, pinging, and so on. The alternative persona could be the person who wrote the malware that makes the phone send SMS messages in the background. We don’t know much about them, but we can assume they are making money somehow. What this framework of signal infers activity infers persona demonstrates is that every leap of logic carries some uncertainty. The more we do the work to map this out and think it through, the more pragmatic and grounded our understanding of what we’re learning will be.
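To make the framework concrete, here is a minimal sketch (hypothetical names and placeholder confidence numbers, not anything from Laura’s actual system) of how a team might write down the signal-to-activity-to-persona leaps so that the alternative interpretations, and their uncertainty, stay visible:

```python
# A minimal sketch (hypothetical names, placeholder numbers) of the
# signal -> activity -> persona framework: each inference step keeps its
# alternative interpretations visible instead of flattening them into the
# single "obvious" reading.
from dataclasses import dataclass, field

@dataclass
class Inference:
    label: str          # what we think this might be
    confidence: float   # rough, subjective weight - not a calibrated probability

@dataclass
class Signal:
    description: str    # the raw thing being measured
    activities: list[Inference] = field(default_factory=list)  # first leap of logic
    personas: list[Inference] = field(default_factory=list)    # second leap of logic

high_volume_sms = Signal(
    description="High volume of outbound SMS, mostly to overseas numbers",
    activities=[
        Inference("A person sending texts from their handset", 0.6),
        Inference("Malware sending texts without the owner's knowledge", 0.3),
        Inference("Customer-built automation (reminders, check-in messages)", 0.1),
    ],
    personas=[
        Inference("Spam marketer", 0.4),
        Inference("Activist sharing literature by text", 0.2),
        Inference("Malware author monetising compromised handsets", 0.4),
    ],
)

# The point is writing down more than one candidate per leap; the numbers are
# prompts for a team conversation, not model output.
```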

A $600m mistake? Lest we think Laura is throwing shade at prior employers, think about this example in terms of how small and subtle misunderstandings can spin out to massive impact: This recent NBN news story where they didn’t know they were missing 300k premises which still needed to be hooked up. To go back and re-integrate these added $600m to the project budget. The data distinction is that they had a list of addresses, but they did not know that that list of addresses was not every single premises on that address. This is an easy mistake to make and Laura feels empathy for the engineer who identified the problem. But it’s also a great Aesop’s Fable for the rest of us who cannot afford to make $600m mistakes!

Look at your raw data. Beyond deconstructing your signals and proxies, look at your raw data, not just at dashboards. Data scientists, please use the tools available to you to interrogate the validity of that raw data. Does it actually meet your expectations? Do the types match? Are there a lot of formatting issues? How is the overall data quality? This is a group responsibility, not just the data scientist’s. Subject matter experts often have strong intuitions about the human stories underneath the data, and those intuitions can be invaluable in helping data scientists and analysts know what they’re looking at and what to make of it.
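As an illustration of what interrogating the raw data can look like, here is a minimal sketch in pandas, with hypothetical file and column names, of the kind of quick checks worth running before trusting any dashboard built on top of that data:

```python
# A minimal sketch (hypothetical file and column names) of interrogating raw
# data before trusting the dashboards built on top of it: do the types match,
# how much is missing or malformed, and is the recency what you think it is?
import pandas as pd

raw = pd.read_csv("sms_events.csv")  # hypothetical export of raw event data

# Do the column types meet expectations?
expected_types = {"service_id": "object", "event_time": "object",
                  "destination": "object", "message_count": "int64"}
for col, dtype in expected_types.items():
    actual = str(raw[col].dtype)
    if actual != dtype:
        print(f"{col}: expected {dtype}, got {actual}")

# How much is missing, duplicated, or obviously malformed?
print(raw.isna().mean().sort_values(ascending=False).head())  # share of nulls per column
print("duplicate rows:", raw.duplicated().sum())
print(raw.loc[~raw["destination"].astype(str).str.match(r"^\+?\d+$")].head())  # odd-looking numbers

# Are the timestamps parseable, and what period does the data actually cover?
parsed = pd.to_datetime(raw["event_time"], errors="coerce")
print("unparseable timestamps:", parsed.isna().sum())
print("data covers", parsed.min(), "to", parsed.max())
```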

Data misinterpretation is easy, and likely, under pressure. One of the most interesting aspects of working in fraud is the pressure put on data interpretation during periods of attack. It’s akin to putting your data science hygiene through a crucible of intense experience, in unexpected ways. If you’re not prepared to put your data science, labelling, and hygiene through that high-pressure experience, you may not yet be at the level of maturity you need. If that’s the case, go back and think through what you’d need to build up the confidence and load-bearing robustness that can withstand the crucible.

An example from Laura’s experience in fraud: in one instance the team thought they had taken action against specific services; they could see that the bot in production thought it had performed its necessary function, and they were very confused trying to understand what had actually happened. Upon investigation it turned out that the service running the data batch had simply been down for a few hours during a planned maintenance outage. This is a common occurrence, but the team felt so stressed about the problem that they missed the obvious explanation. She shares this not as a perceived failure of the team, but to highlight how easy it is to get wound up rather than taking the time to step back and logically work through the parts of the automation process where the issue may have occurred. Difficulty of interpretation under pressure is a common issue. A second example: the team had some features that they knew took a fixed period of time to calculate. The lag from the data being captured by real-world activity, to the features being calculated, to the alert coming out on Slack was a fixed period which they knew they had to account for. But when they saw the alert come out at 2am, stressed and under attack, they forgot about the time lag and assumed something more suspicious was going on, because they had taken action on a service and weren’t seeing the result immediately. These kinds of interpretations are common and human, but with cool heads and deep breaths it’s easier to move on.

Design for cognitive load. Simple doesn’t imply you’re stupid. Just because you have a team of brilliant, skilled data scientists, developers, and machine learning engineers doesn’t mean they need to function as detectives under time and cost pressure. If you can, add labels. Be explicit. Make it impossible to misinterpret.

Litmus test. A great litmus test here is: on a quick glance, is this easy to misunderstand? If you literally look away from the screen, look back, and don’t see clarity immediately, that’s a signal you could do more to make it interpretable. More pointers for making data interpretation scalable and easier across the team (a small sketch pulling these together follows the list):

  • Are we using a DSL (domain-specific language)? Are we using jargon? Are we using acronyms? If so, can we change it? Being explicit, particularly around load-bearing data or notifications, and not assuming prior knowledge will almost always pay dividends.
  • Add units: When looking at time, is it minutes, hours, or days? When looking at counts, is it one person, or one hundred services, or one thousand?
  • Absolute (total count) or relative (%)? A relative percentage only yields useful information when grounded in the total or absolute count: 90% accuracy on a sample of ten means one mistake, while 90% accuracy on a sample of a million means a hundred thousand mistakes.
  • Is the data recency clear? Calling back to the aforementioned fixed delay – if there is going to be a delay between data capture and seeing the thing, make that as explicit as you can, for example: ‘the data was captured at [timestamp x] and the alert fired at [timestamp y]’. This may feel over-explicit at the time, but future you will thank past you for doing it!
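Pulling those pointers together, here is a minimal sketch (hypothetical numbers, not a real alert format from Laura’s team) of an alert message that spells out units, pairs the percentage with the absolute count, and makes data recency and the expected processing lag explicit:

```python
# A minimal sketch (hypothetical numbers) of an alert message that applies the
# pointers above: units spelled out, the absolute count shown next to the
# percentage, and data recency plus the expected processing lag made explicit.
from datetime import datetime, timezone

def format_alert(flagged: int, total: int, captured_at: datetime, alerted_at: datetime) -> str:
    pct = 100 * flagged / total
    lag_minutes = (alerted_at - captured_at).total_seconds() / 60
    return (
        f"{flagged} of {total} services flagged ({pct:.1f}%). "
        f"Data captured at {captured_at:%Y-%m-%d %H:%M} UTC, "
        f"alert sent at {alerted_at:%Y-%m-%d %H:%M} UTC "
        f"({lag_minutes:.0f} minutes of expected processing lag)."
    )

print(format_alert(
    flagged=12,
    total=4000,
    captured_at=datetime(2020, 8, 3, 1, 10, tzinfo=timezone.utc),
    alerted_at=datetime(2020, 8, 3, 2, 0, tzinfo=timezone.utc),
))
# "12 of 4000 services flagged (0.3%). Data captured at 2020-08-03 01:10 UTC,
#  alert sent at 2020-08-03 02:00 UTC (50 minutes of expected processing lag)."
```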

Finding #2: Mapping knowledge states. This is where Laura tells you that the best way to improve your data science practice as a whole is to do better at the boring science bits: the bits where we capture knowledge and store it so we can see it in the future.

Beware the swamp of lazy assumptions. The swamp of lazy assumptions is the status quo for most companies, in most domains. There are times when this matters a lot, for example in the fraud space. When first joining her team, Laura had trouble getting clear, understandable answers about the types of behaviours that were happening. There were many mismatched assumptions, a lot of namespace clashing, and much murkiness. Again, this is common, but doing the job of asking the questions highlighted that many others had similar issues and mismatched mental models.

Implicit vs Explicit knowledge. Just because a thing is knowable doesn’t mean it’s known. Even if it is known, it doesn’t mean it’s knowledge that the team or the company owns or possesses. There’s a distance between what is knowable, what is known, and what everybody knows collectively. Become a knowledge excavator: The first step to creating shared understanding is to ‘sniff out’ implicit knowledge and try to formalize it.

Naming matters. Naming things, models, reference points:

  • Building a shared vocabulary – this is key. Even if you’re working in a domain with a lot of existing language, jargon, and vocabulary, taking the time to write out what you mean when you say the thing, and to capture all the other words that mean the same thing, is super useful in getting everyone onto the same page.
  • Build shared mental models.
  • Avoid namespace clashing – if you do have a namespace clash, find a different name for one of the things so that you don’t have a push and pull of meaning. For example: international revenue sharing fraud vs toll fraud. These are two names for exactly the same kind of fraud, but the names themselves don’t indicate that. The cognitive dissonance creates more interpretation work for everybody.

Establish personas. Laura’s team uses personas to understand the different sorts of misuse behaviours and the drivers behind them. (NB: these are not the equivalent of the deeply researched personas typical in UX work, due to the obvious barriers to talking to fraudsters, but the team does what it can with the data available. It’s still useful to have names for the personas to help the team communicate internally.) For example: they have a persona for somebody who over-uses promotions and engages in deal-seeking behaviour (think: a digital version of a coupon clipper). This person should not be confused with someone posing as a telco and trying to on-sell the company’s services, which is a whole different kind of costly, large-scale fraud. You’re sending messages or warnings to two very different subsets of people with very different intentions, so you need to talk to them accordingly.

Defining your baseline. Without shared understanding you can’t define your baseline. Eliciting and discovering the small discrepancies where team members’ understandings don’t match yields great insight, whether in the form of revealing questions to investigate or pointing to product opportunities. This work may feel a little painful or verbose, but if you take the time to do it, it can be insightful and sow new seeds.

Data schemas. While we do well at capturing data on the product side – what people do with our tools – we aren’t so good at capturing information about our state of the world, our understanding, and what changes over time. Designing these data schemas is worth doing both for establishing our baseline and for establishing a starting point if we do want to go into statistical or ML modelling. For example: first, what did I think at the time? What was my intuition or inference based on the data? Then: what new info came to light; what did I discover? Then: what changed? What did I think after I saw this new info? Did it reinforce or change my original idea? As you get into the habit of designing data schemas, it will really flesh out and formalize these experiments that we’re running all the time in product.
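As a sketch of what such a schema could look like, here is a minimal example with hypothetical field names for recording a knowledge state: what we thought at the time, what new evidence came to light, and what we think now.

```python
# A minimal sketch (hypothetical field names) of a schema for capturing
# knowledge states over time: what we thought at the time, what new evidence
# came to light, and what we think now.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class KnowledgeRecord:
    recorded_at: datetime
    observation: str          # what the data showed
    initial_hypothesis: str   # what we thought at the time
    new_evidence: str         # what came to light on investigation
    revised_hypothesis: str   # what we think after seeing the new evidence
    changed_our_minds: bool   # did the new evidence overturn the original idea?

record = KnowledgeRecord(
    recorded_at=datetime(2020, 6, 12, 9, 30),
    observation="High volume of SMS to overseas numbers from one service",
    initial_hypothesis="Marketing spam campaign",
    new_evidence="Messages contained activism literature, not advertising",
    revised_hypothesis="Unusual but legitimate personal use",
    changed_our_minds=True,
)
```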

Changing our language changes our minds. An example: Laura’s colleague, presenting his data analysis and the possibility of adding a new feature, said “We’ll wait until we’re 100% certain… no, make that 99% certain.” Hashtag proud mama! Laura had succeeded in getting everybody on the team to take on board the idea that yes, we’re doing science and moving toward certainty, but we’re never all the way there. We always hold space in our minds for the possibility of doubt and change. Small hygiene changes like moving from 100% to 99% are big steps.

Added uncertainty in the space of fraud. Fraud adds another layer of uncertainty on top of the normal uncertainty of product or usability interpretation. Here, you really can’t trust what people say. People may answer your questions honestly, or they may lie. Dealing with this uncertainty day in, day out can make these discussions feel suspect, and you start to see fraud everywhere.

The antidote to suspicion. How do we avoid hardening our hearts to our customers? Set an explicit, deliberate intention to be kind and respectful in all of our communications.

bUt wHAt iF iT’s A BAd AcTor? This is a fair question, but so what? What’s the actual cost or problem? What possible outcomes could I foresee from this scenario? If I speak kindly or respectfully to a bad actor, is there any cost or harm to them or the business? If you speak dismissively or unkindly to a good actor who’s confused, is there a cost to the person or business? What if they’re differently abled or have difficulty with certain channels? The cost of speaking nicely to a bad actor is nothing compared with the cost of speaking badly to a good actor. You also need to be mindful of reputation cost. Focus on your goal: to prevent bad behavior that is costing us money. Keep respect and kindness at the heart of your communications.

Keep the moralizing out of it. You don’t know the person or their story. Even if they are exactly who you expect they are, it doesn’t really matter. Describe behaviours, not people. Describe: ‘service misuse’ or ‘promotions overuse’ rather than ‘an abuser’ or ‘a fraudster’ which triggers negative connotations and hardens your heart to them.

Finding #3: Designing for failure first. At the telco, Laura has been involved in designing communication flows to provide clearer feedback to people whose service came under suspicion, and to allow them to request support if they think there was a mistake. In some cases they received a warning first; in others it was after the fact. These were only small-batch cases, and they proved very challenging. If you don’t ask, customers won’t tell. If you make an incorrect assumption about a person and take action against them, it’s so much easier for them to churn than to come back and say ‘that was wrong and I’m unhappy with you.’ You likely won’t get any feedback at all unless they are really angry, which is obviously the worst-case outcome.

Feedback loops must be: Intuitive (make sense at the time), Contextual (specific to the action or experience this person just had), and Timely (both for the customer and for you – ask when it’s fresh in their mind). Get your feedback loops up and running and also incorporate…

Plan time for…

  • Customer support – if somebody comes back and says ‘fix it’ what will your support script be?
  • Product/model improvements – you may have spaces that aren’t performing well. You may need more features or improvements.
  • Integrate your learnings – think about other ways the feedback may be integrated into your product cadence.

Be explicit. You need to think about the potential harms and outcomes. If you can’t imagine consequences, you’re not thinking hard enough.

Shot.

We believe this system has contributed to making Facebook the safest place on the internet for people and their information.

See this 2011 paper by a group of Facebook engineers, which discusses their attempts to classify and prevent malicious behaviour (e.g. bots and other attacks) on the platform – trying to figure out who’s behaving suspiciously and shut them down.

Chaser.

The goal is to protect the [social] graph against all attacks rather than to maximize the accuracy of any one specific classifier. The opportunity cost of refining a model for one attack may be increasing the detection and response on other attacks.

Insert a string of scream emojis here. TL;DR: what they’re saying is: ‘We don’t care how good our classifier is, nor do we care about the consequences of false positives.’ Zoom to 2020: the impact of being de-platformed or otherwise silenced on a platform like Facebook can be really significant. It can affect income (e.g. a small business) or safety (e.g. vetting sex workers), to the point of potentially leading to death. So for Facebook to say, in an academic way, that they don’t care about the accuracy of any specific classifier is not just naive but deeply problematic and downright dangerous. Keep this paper top of mind when you need a reminder of how unintended consequences can change over time.

Harm mapping. This topic is critical, and has a breadth beyond the time we have available here. Laura’s recommended starting point, if you want some tools, workshops, and activities to support your learning in this space, is her GitHub page. Please do come back to her with questions around any of these – she’s also exploring and developing intuitions about which ones are useful in which kinds of circumstances.

Failure-first design. If you’re thinking ‘What does that even mean?’, here are a couple of kick-starter suggestions. It could be things like:

  • UI interactions – links or buttons with phrases like “Get help” or “This isn’t right”. For example: Laura would love Netflix to add a button to get rid of a recommended movie forever. They failed by recommending a film she hates, and she wants to be able to correct that failure through some kind of UI interaction.
  • Email/SMS templates – for example, offering the right to reply.
  • Support scripts for conversations – what are your bullshit sniff tests? These conversations are tricky, so put some thought into them.
  • Data schema design – capture your internal learnings and map the state of your understandings over time.

Litmus test: what if this (anything in your product) happened to my most vulnerable customer? We all have customers who are vulnerable in one or many ways. What’s the worst thing that could happen to them? Being forced to grapple with the worst-case scenario and map out the associated user journey will force you to work out how important it is to mitigate it, identify it earlier and figure out how to offer support, or prevent it outright.

Prepare for pushback. Don’t expect gratitude! People don’t like it when you make bad assumptions about them! (Or sometimes even good ones!) Making the implicit explicit is often uncomfortable. Prepare yourself mentally.

Designing an escape hatch. This is a useful metaphor for designing failure-first. You don’t want to have to use it, but if you need it, you sure want it to be there, and you want to make sure the hinges aren’t rusted shut.

Litmus test: could any customer go through my escape hatch, recover, and remain a happy customer? This is a high bar; you could even drop ‘happy’ and still retain a good-quality bar.

There’s a paper by Donella Meadows called Leverage Points: Places to Intervene in a System, which is great for some of the systems theory content we’ve been discussing here. She writes:

Delays in feedback loops are common causes of oscillations. If you’re trying to adjust a system state to your goal, but you only receive delayed information about what the system state is, you will overshoot and undershoot. Donella Meadows

The feedback loops we’re baking in as part of failure-first design aren’t only there to protect and support our customers; they’re also for our own understanding of the accuracy of our system. The feedback timeframe is important.

Finding #4: Upstream prevention vs downstream mitigation. Or: is this a product problem or a behaviour-detection problem? The following quote comes from Eva PenzeyMoog’s newsletter, which focuses on her work on technology and domestic violence and abuse, a space with a lot of overlap with fraud and technology: i.e. people with a different agenda to yours are looking for vulnerabilities they can exploit for their own ends.

These sorts of things will happen. I’m very intentional about not saying abuse ‘might’ happen – if it can, it will. – Eva PenzeyMoog

Product use cascade: a hierarchy of what shapes how people use your product.

  1. What’s possible to do in the product – this trumps all else
  2. Intentions, framing, design – for example, microcopy helping people know what to put into the form or how to use the tool, or onboarding copy.
  3. Culture of your community – if you have a product with a lot of people with an established culture of how they interact, that culture will also impact how people use your tool.
  4. …[anything else]
  5. Terms of use – this is least important and least impactful when it comes to changing people’s behaviour.

Possible = permissible. If you don’t care to make it impossible for someone to do something in your product, it’s tacit permission for the behaviour to occur.

Setting boundaries is design. Think about your marketing copy, onboarding, and continued engagement comms – these are all opportunities to establish the norms of your tool and give people hints about what will be OK. A snarky tweet from Cory Doctorow:

The rise and rise of terms of service is a genuinely astonishing cultural dysfunction. Think of what a bizarre pretense we all engage in, that anyone, ever, has read these sprawling garbage novellas of impenetrable legalese.

We have to get beyond the pretense that terms and conditions are somehow meaningful or useful.

Downstream is usually more costly than upstream. Upstream changes the product; downstream observes the behaviour and does something about it. Example: promotions abuse. If you build guardrails into your design that prevent people from fraudulently using that QR code or that URL more times than you want them to, that’s always going to be better than trying to work out post hoc how people have done it and clawing back credit.
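As a sketch of the upstream approach, here is a minimal example (hypothetical names, nothing from the telco’s actual system) of enforcing a promo-code redemption limit at the point of use rather than detecting over-use afterwards:

```python
# A minimal sketch (hypothetical names) of an upstream guardrail: enforce the
# redemption limit on a promo code at the point of use, instead of detecting
# over-use downstream and clawing back credit afterwards.
MAX_REDEMPTIONS_PER_CUSTOMER = 1

redemptions: dict[tuple[str, str], int] = {}  # (promo_code, customer_id) -> times redeemed

def redeem(promo_code: str, customer_id: str) -> bool:
    """Apply the credit only if this customer is still under the limit."""
    key = (promo_code, customer_id)
    if redemptions.get(key, 0) >= MAX_REDEMPTIONS_PER_CUSTOMER:
        return False  # refused upstream; nothing to detect or claw back later
    redemptions[key] = redemptions.get(key, 0) + 1
    # ... apply the credit to the customer's account here ...
    return True

assert redeem("WELCOME10", "cust-42") is True
assert redeem("WELCOME10", "cust-42") is False  # second attempt blocked at the source
```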

Section 2: Top-down vs. bottom-up.

Responsible tech. This presentation is not about how to write a values strategy or principles document. There are no magic bullets in this space. We live in a world of complex, entangled systems and, as a result, our problems in the space of fairness and ethics are complex. Addendum: not only are there no magic bullets, but anyone who tells you otherwise is selling something. The desire for a magic bullet is part of the problem. We are in the Cambrian explosion phase, not just of AI but also of AI ethics. It’s on us to interrogate the power structures and agendas behind glossy marketing copy.

The Principled Artificial Intelligence graphic demonstrates the enormity of the AI ethics space. It was created through a meta-study of values documents that found a great deal of conceptual overlap and took that overlap as progress in AI ethics. Laura is unconvinced that the repetition of themes indicates global agreement on how to move forward; it’s more likely just a function of copy and paste.

Quote from this study exploring AI ethics tools:

As a result, developers are becoming frustrated by how little help is offered by highly abstract principles when it comes to the ‘day job.’

Let’s look at some of these abstract principles to ground this. Here’s one from the Mozilla Trustworthy AI paper:

Goal: Consumers choose trustworthy products when available and demand them when they aren’t.

There are two really problematic assumptions here. 1) That customers will know if it’s trustworthy. We have abstractions on top of abstractions, to the point where technologists themselves don’t know what a classification means; how can we expect consumers to meaningfully interrogate or interpret that technology to determine trustworthiness? 2) That we are in a position to exercise our consumer power. We aren’t. These are models being purchased by advertisers, police stations, and governments. Consumers don’t get a vote as customers at this stage, which means minimal power. There is both an economic asymmetry and an information asymmetry.

Quote from Laura’s recent Twitter rant on this:

I’ve been meaning to write about the @mozilla white paper on Trustworthy AI for ages. I started to formulate a response with friends but then…life events happened and I didn’t have the bandwidth. Here’s the paper.

The next example of a goal or value or principle is taken from IBM’s ethics principles:

AI must be designed to minimize bias and promote inclusive representation

Laura doesn’t disagree in theory, but: how much? Or: what’s my code test for this? How do I know that bias has been minimized? If I care about inclusive representation, how much is enough? [And: will you be reporting on that? And: how can I interrogate it?] The problem isn’t the statement but the follow-through, or lack thereof.

One more example, from Smart Dubai:

AI systems should be safe and secure, and should serve and protect humanity.

Again, sounds great, but hand-wavey and divorced from reality.

Laura worries that much of the ethics work in this space is ‘bike-shedding’. Without measurable principles, or impacts or penalties for not getting this right, it feels toothless. Worse, these documents are resource thieves, lulling us into thinking we’re already doing the work when we haven’t really started.

A quote from Laura’s partner Andy encapsulates this problem: ‘A string of empty profundities’ – Andy Kitchen

Laura doesn’t want to hear profundities that she can’t test and measure. When building technology systems, particularly big, enterprise-level systems, think of us as a team sailing on a big cruise liner. Down on the lower decks it feels safe and even boring; it feels more stable and the sense of motion is minimized. But if you go up to the top deck, stand outside in the wind, look down, and comprehend the speed of the boat and the context of where you are, that’s when you can meaningfully grapple with dangers and risks.

Laura posits that we want to feel the wind rushing past. Insulating ourselves from that sensation is itself unhelpful.

How not to do principles: Don’t write a document, throw it over the wall and say ‘go use this.’ If you want a principles document, champion the approach from inside your teams, get your hands dirty and show by doing.

Commit to action: assign strategy champion(s) who will actually support people in making it real; set measurable targets; provide examples; and allow for uncertainty. So much of this work is contextual – a principle that makes sense in one domain may not in another.

Section 3: Final Thoughts.

  1. Developing a healthy cynicism about data sources.
  2. Mapping knowledge states.
  3. Designing for failure first.
  4. Upstream prevention vs downstream mitigation.

All of these lead into automation design itself: how we think about it, and how we might change the way we think about it. Stop thinking about automation design as an all-or-nothing feature. How do we break down the monolith, test our assumptions, and explore what’s working in lightweight, iterative, easy steps:

  • Concierge prototypes (human-as-bot): if you put a human where a script or bot would be, and the thing they’re supposed to do feels bad to them, that’s useful info.
  • Automations can be temporary or time-boxed, they don’t need to be forever.
  • Not every automation needs to go end to end from input data to action at the end. Think about where it should be added – Dashboard? Notification? Escalation? Action? There is a world of nuance here.
  • Do usability testing (duh). If people are going to see it, make sure it’s interpretable and testable.

The first principle is that you must not fool yourself – and you are the easiest person to fool. – Richard Feynman

Whether we’re doing product, running science experiments, or doing physics, we have to remember that our own overconfidence is as much the enemy as anyone outside ourselves.

Remember that the real game is trying to catch the fraud without defrauding ourselves. Our minds and interpretations are fair game. We need to improve our way of understanding what’s happening to avoid getting overconfident and stuck in our own biases. We are driven by the same incentives as the fraudsters; it’s just the light and dark versions of markets and capital. We have as much in common with them as we do with any other user.

Thank you! Check out Laura’s links below:

Slides.

Ethics Litmus Tests

Debias ai

Harm mapping