
Tue Nov 26 16:26:21 2024

Building Human Curiosity into Artificial Intelligence
Source: Scott Zoldi

Advances in artificial intelligence (AI) and machine learning are increasing the power of predictive analytics to boost bottom lines. Does that mean that smart machines are about to replace humans in higher-complexity jobs? No doubt, smart machines are getting smarter. But even the smartest machines lack fundamental human characteristics that are critical to solving certain types of problems, whether the solver is human or machine. One of these key capabilities is curiosity, but how can we replicate that?

For the answer, we need to look at neuro-dynamic programming. It’s an analytic method for learning and anticipating how current and future actions are likely to contribute to a long-term cumulative reward. This technique is related to advanced AI reinforcement learning methods, which take inspiration from behaviorist psychology to connect future reward/penalty back to earlier steps in a decision-making process. That contrasts with traditional supervised learning, which attributes reward only to the current decision.
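To make the idea of connecting a later reward back to earlier steps concrete, here is a minimal sketch of a discounted return, a standard quantity in reinforcement learning. The function name, the sample rewards and the discount factor `gamma` are illustrative assumptions, not details from the article.

```python
def discounted_returns(rewards, gamma=0.9):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step inherits a discounted share of later rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: the only reward arrives at the final step.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(episode_rewards, gamma=0.5)
print(returns)
```

Even though the first three steps earn nothing immediately, each one receives a nonzero return, which is how credit for a late payoff reaches the earlier decisions that led to it.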

These advanced methods focus on repeated experimentation and prediction, and the resulting chains of actions ultimately produce much more complex decisions, strategies and outcomes. For example, these methods are used in robotics so that a machine can learn to stabilize, grasp and manipulate an object. These analytic methods mimic the way the brain learns complex task sequences through pleasurable or painful feedback signals that may occur later in time: essentially, how humans seek and achieve long-term positive results. Think about how you learned to ride a bike -- gradually mastering balance, braking, mounting and dismounting (and falling safely).

Clearly, analytics that can “think” well ahead and focus on the most favorable long-term outcomes are highly valued. That’s particularly true in the many operational decisions about customers that have long-term consequences and where loyalty is earned over repeated interactions with an organization.

High customer lifetime value and healthy, sustainable cash flow are both produced by a series of interactions: The business takes an action, the customer reacts, the business responds to the new state of the relationship with another action, the customer reacts … and so on. In this way, neuro-dynamic programming enables smart machines to think ahead -- potentially making moves early in the decision chain that may not appear optimal in the short run but lead to better decisions in the long term.
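The action-and-reaction loop described above can be sketched as a tiny decision process solved with value iteration, a classic dynamic programming technique. Everything in this sketch is a made-up assumption for illustration: the three customer states, the two actions, the reward numbers and the discount factor.

```python
GAMMA = 0.9  # assumed discount factor for future rewards

# Hypothetical model: (state, action) -> (next_state, immediate_reward).
TRANSITIONS = {
    "new": {"hard_sell": ("lost", 5.0), "nurture": ("loyal", -1.0)},
    "loyal": {"hard_sell": ("lost", 5.0), "nurture": ("loyal", 3.0)},
    "lost": {"hard_sell": ("lost", 0.0), "nurture": ("lost", 0.0)},
}


def value_iteration(transitions, gamma, sweeps=500):
    """Repeatedly back up the best achievable long-term value of each state."""
    values = {state: 0.0 for state in transitions}
    for _ in range(sweeps):
        values = {
            state: max(r + gamma * values[nxt] for nxt, r in actions.values())
            for state, actions in transitions.items()
        }
    return values


def long_term_policy(transitions, values, gamma):
    """Pick the action with the best immediate reward plus discounted future value."""
    return {
        state: max(actions, key=lambda a: actions[a][1] + gamma * values[actions[a][0]])
        for state, actions in transitions.items()
    }


def myopic_policy(transitions):
    """Pick the action with the best immediate reward only."""
    return {
        state: max(actions, key=lambda a: actions[a][1])
        for state, actions in transitions.items()
    }


values = value_iteration(TRANSITIONS, GAMMA)
policy = long_term_policy(TRANSITIONS, values, GAMMA)
myopic = myopic_policy(TRANSITIONS)
print(policy["new"], myopic["new"])
```

In this toy model the short-sighted choice for a new customer is the hard sell, because it pays more right away, while the long-term policy accepts a small up-front cost to nurture the relationship and collect repeated rewards later.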

Another way to think about this concept is to consider a group of dumb software agents, similar to individual ants. The agents interact with their environment and are rewarded or penalized according to a small set of success criteria. Gradually, sequences of successful behavior emerge as the agents begin to map out the risk and reward of various inter-related activities: many paths are explored, and non-optimal ones are learned and abandoned in the pursuit of the best chains of actions. Those agents with few successes receive a low “fitness” score and die out, whereas those with many successful sequences score high and are allowed to reproduce, mutate, or combine with other high-scoring agents. In this way, the overall performance of the group increases.
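The fitness-scoring, die-out and reproduction cycle described above resembles a simple genetic algorithm, which can be sketched as follows. The bit-string "genome", the fitness function (a count of 1-bits standing in for successful behaviors) and all parameter values are hypothetical choices for illustration only.

```python
import random


def fitness(genome):
    """Hypothetical success score: the number of 1-bits in the genome."""
    return sum(genome)


def evolve(pop_size=20, genome_len=16, generations=30, mutation_rate=0.05, seed=0):
    """Return the best fitness seen at the start of each generation."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)
    ]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        # Low scorers die out; the top half survives unchanged.
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # Two high scorers combine at a random cut point...
            parent_a, parent_b = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_len)
            child = parent_a[:cut] + parent_b[cut:]
            # ...and the child mutates slightly with a small probability per bit.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = survivors + children
    return best_per_generation


history = evolve()
print(history[0], history[-1])
```

Because the surviving half is carried over unchanged, the best score in this sketch can never decrease from one generation to the next, which mirrors the claim that the overall performance of the group increases.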

All the while, their environment is changing. So these agents not only act in the optimal way based on their current best “map of the world,” they also experiment to deal with these changing conditions. Using probabilities, they make slight variations and mutate around the optimal strategies. As these activities result in rewards and penalties, they learn from these experiments and adjust to a changing fitness landscape continually.
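One common way to combine acting on the current best “map of the world” with occasional experiments is an epsilon-greedy rule, sketched here on a two-option problem whose payoffs swap halfway through. The rule itself is a standard technique swapped in for illustration; the payoffs, the exploration rate `epsilon`, the learning `step_size` and the seed are all assumed values, not details from the article.

```python
import random


def run_bandit(steps=2000, epsilon=0.1, step_size=0.1, seed=0):
    """Track the value of two options while the environment changes midway."""
    rng = random.Random(seed)
    estimates = [0.0, 0.0]
    for t in range(steps):
        # Hypothetical environment: option 0 pays first, then option 1 pays.
        payoffs = [1.0, 0.0] if t < steps // 2 else [0.0, 1.0]
        if rng.random() < epsilon:
            choice = rng.randrange(2)  # experiment with a random option
        else:
            choice = 0 if estimates[0] >= estimates[1] else 1  # act on best estimate
        # A constant step size weights recent feedback more, so old beliefs fade.
        estimates[choice] += step_size * (payoffs[choice] - estimates[choice])
    return estimates


estimates = run_bandit()
print(estimates)
```

Without the occasional experiment, the agent in this sketch would never try the second option again after the switch and would never discover that it had become the better choice.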

As you can see in Figure 1, at any point in the sequence, the current state of the customer relationship is the result not only of the just-taken action, but also of the string of previous actions. Just as in a chess match, where a checkmate could be rooted 10 moves back ― or even in the first move ― the loss of a valuable customer may have started with actions taken months ago. To be successful, a business needs to understand and track this dynamic.
