IT & AI Meet Innovation

Own Your Stack

Are You Going To Own The Most Profitable Portion Of Your Business 5 Years From Now, Or Are You Going To Give It Away?

About us

We offer full-stack consulting services that improve your business and your bottom line. Every member of our team is a full-stack generalist: from Python, to SQL, to JavaScript, to HTML and CSS, we do it all. Whether you want your own app, want your tech stack assessed, or want to talk AI, we specialize in reducing IT costs and turning your IT department into a source of profit.

I currently have over 30 books available on Amazon covering every aspect of Artificial Intelligence, from development, to mathematics, to philosophy.

I currently offer over 30 courses on AI and Machine Learning on Udemy. Several of them are completely free.

Blog

Fuzzy matching has been a personal thorn in my side for almost a decade now, so I have been intrigued by the idea of using LLMs to replace fuzzy-logic record matching, the core job of a Customer Data Platform (CDP). On paper, this is something an LLM should actually excel at, far better than a rules-based system at least.

 

Most people are likely not aware, but rules-based systems are absolutely awful at fuzzy matching, which is why the entire CDP industry sprang up in the first place. It is a simple challenge, yet computers are awful at it. The reason is pretty straightforward to me: this problem was 'solved' in the 1950s, and no significant iteration has occurred since.

 

Anyone who delves into fuzzy matching runs into this solution: it is elegant, it is simple, and it works, until it fails miserably. You simply set a cap on the number of allowed character edits, the Levenshtein edit distance, and the system goes from there. If you set that cap to one edit, then:

Cat

or

Kat

would register as being the same according to the algorithm; Cat is within one edit of Kat. Whereas:

Cats

or

Caatz

would register as not being related, because it takes two changes, two edits, to get from Caatz to Cats.
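To make that concrete, here is a minimal sketch of the thresholding idea in Python. The edit_distance helper is my own illustrative implementation of the classic Levenshtein distance, not the exact code from my rules-based system.

def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

MAX_EDITS = 1  # the cap on allowed edits

print(edit_distance('Cat', 'Kat') <= MAX_EDITS)     # True  -> treated as the same
print(edit_distance('Cats', 'Caatz') <= MAX_EDITS)  # False -> treated as unrelated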

 

This is simple and straightforward when you are dealing with only one field of data, and the system will detect related records with a very high degree of accuracy. That accuracy drops, though, as you introduce more fields. First Name, Last Name, Email Address, Phone Number, Address: we now have five fields, and some of them can contain a lot of characters. All of a sudden, our rules-based system cannot keep up.

 

Here are two sample records that I used throughout my experiments:

 

# Example records
record1 = {'first_name': 'John', 'last_name': 'Doe', 'email': 'john.doe@example.com',
           'address': '123 Main St', 'phone': '555-1234', 'age': 30, 'location': 'City A'}

record2 = {'first_name': 'Jon', 'last_name': 'Doe', 'email': 'john.doe@example.com',
           'address': '123 Main St', 'phone': '555-1234', 'age': 32, 'location': 'City A'}

 

There are only two differences in this data: the first name is John or Jon, and the age is 30 or 32. A person looking at these two records could easily deduce that Jon Doe, with the same phone number and the same email as John Doe and a two-year difference in age, is almost assuredly the same person. Yet when I run these records through even machine-learning-enhanced rules-based systems, the best I get is a 50% chance that the two records are related.

 

Rules-based systems aren't up to the task, but what about LLMs? I am firmly in the camp of advising responsible AI deployment and use. AI is the shiny new toy, but that does not mean you should simply throw it at everything. The first step should always be to evaluate whether AI is actually a viable option for what you would like to accomplish.

 

In this instance, what a CDP does is a perfect example of something an advanced AI model can handle with ease, where a rules-based system could never compete. As mentioned previously, my rules-based system (built on fuzzywuzzy and skfuzzy) gave our example records a 50% chance of being related to each other. Unless you are an absolute madman, you will never set your confidence threshold in a scenario like this to 50%. So, under any rules-based system, these two records would not be matched.
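For context, here is a minimal sketch of the kind of per-field fuzzywuzzy scoring that sits underneath a rules-based matcher, reusing record1 and record2 from above. My full system also layers skfuzzy membership functions on top, so the numbers it produces differ; the point here is only to show where the per-field scores come from.

from fuzzywuzzy import fuzz  # pip install fuzzywuzzy python-Levenshtein

# Per-field similarity scores (0-100) between the two example records.
fields = ['first_name', 'last_name', 'email', 'address', 'phone', 'age', 'location']
for field in fields:
    score = fuzz.ratio(str(record1[field]), str(record2[field]))
    print(f"{field:12s} {score}")

# first_name and age score below 100, every other field is a perfect match.
# The hard part is not scoring individual fields; it is combining those scores
# into a single confidence that two records describe the same person.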

 

In comparison, this is the exact type of challenge AI was made for. My first instinct was that this is such a simple challenge for AI that I could use even the most basic of models and they would reason through it properly. My experiments ended up shedding a lot of light on the question for me. Up front: there is a definite capability threshold, even among LLMs, for being able to solve this problem.

 

I first experimented with TinyLlama, which was not able to solve the challenge on any level; in virtually all of my experiments, the model simply parroted the question back to me. From there, I tried Quyen SE, which has proven to be very logical for its parameter size. Quyen SE also got this question wrong. It reasoned basically along the same lines as the rules-based system: they are different people because of the number of differences between the two records.

From there, I decided to go up in power and tried Llama 7B. To my surprise, even Llama 7B failed the test. I then needed to confirm that my initial hypothesis, that an LLM could actually handle this challenge, was in fact correct, so I tested Qwen 70B. Qwen passed the test with flying colors, as expected. I then went back to the 7B class and chose Mistral 7B, which also passed. So you can go as low as 7B with no additional fine-tuning or special training of any kind; you simply have to be selective about the model.
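For anyone who wants to reproduce a single trial, here is a minimal sketch of how one of these tests can be run with the Hugging Face transformers library. The exact checkpoint name is an assumption on my part and the prompt is simplified; the Colab notebook linked below contains the prompts and data formats I actually used.

from transformers import pipeline  # assumes a GPU with enough memory for a 7B model

# Mistral 7B is one of the models that passed the test; substitute any instruct model.
generator = pipeline('text-generation', model='mistralai/Mistral-7B-Instruct-v0.2')

prompt = (
    "You are acting as a customer data platform. Decide whether these two records "
    "describe the same person. Answer YES or NO, then explain briefly.\n\n"
    f"Record 1: {record1}\n"
    f"Record 2: {record2}\n"
)

result = generator(prompt, max_new_tokens=150, do_sample=False)
print(result[0]['generated_text'])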

From this testing, I can deduce that fine-tuning would increase the accuracy of models on this task. With fine-tuning you could perhaps even get Quyen SE to accomplish it, and surely Phi-2 could. If I had the choice, though, I would simply throw a 70B model at the problem to make sure there were zero issues.

 

If you are an enterprise organization currently paying for a CDP of any kind, a 70B LLM is almost assuredly cheaper to build and maintain than your current CDP, and you also get ancillary benefits from it beyond the CDP capabilities.

 

For anyone interested in diving deep into the weeds of all this, here is a Colab version of the code I used for my experiments. It contains different ways to feed the data to the LLM, different prompts, and the complete rules-based system.


https://colab.research.google.com/drive/1gYRWIlSlTkWB0u2eaYY5udlrdL_bcMVg?usp=sharing

Contacts

+1 661 699 7603
turingssolutions@gmail.com
