I’m building a chatbot for a school where parents can ask for their child’s report (e.g., marks from last year). All student data is stored in MongoDB.
My idea:

Use LLaMA 3 to extract intent and entities (like student name, year).
Use that info to query MongoDB.
Generate a response based on the data.
Does this sound like the right approach?
The issue: LLaMA 3 needs high-spec hardware, which I can’t afford right now.
Any suggestions for lightweight alternatives or ways to optimize this?
I can’t use OpenAI or other hosted APIs, as the data is confidential.
Thanks!

2 Likes

Hi there, you can use Gemma 3 (4B or smaller, depending on your hardware), which requires much lower specs. However, based on my experiments, the results are not great (poor quality, very slow responses).

I spent a couple of months investigating an optimal chatbot setup (low specs/cost, quick responses, good results), but my conclusion is that there is no low-cost, high-quality solution. You have to pay with at least one of cost, response time, or result quality.

2 Likes

If you purely need the answer, a good solution could be a fine-tuned encoder + decoder: you provide the table description + the user’s question, as output you get a generated query, and then you execute it and present the response. If that is not an option, it is indeed worth looking into other LLMs, such as Gemma 3 as mentioned by @VietCat. You should experiment here and see what works better.

2 Likes

Maybe you can try Qwen’s models; they seem to perform relatively well on low-end setups. At least, I chose to use Qwen myself locally.
It is from China, but its performance is better than DeepSeek’s.

3 Likes

Hey! Thank you so much for your response. I’m currently working with Phi-3 and planning to fine-tune it to see what output I get.

Actually, my current plan was simply to extract the intent and entities from the query typed by the user/parent.
For example, if a parent types, “I want to check the report of John (studentName) of last year (date)”,
the model must return John as studentName and 2024 as date.
After this, I’ll simply run a query to fetch the data from MongoDB
and generate a text response based on the MongoDB result.
Is this a good approach? Or is something better available that I’m not aware of?
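A minimal sketch of that extraction step, assuming the model is prompted to answer in JSON; the helper names and the field names (studentName, date, year) are made up for illustration, not a real schema:

```python
import json
import re

def parse_entities(model_output: str) -> dict:
    """Pull the first JSON object out of the model's raw output.
    Small local models often wrap JSON in extra chatter, so be defensive."""
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    if not match:
        return {}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}

def build_report_filter(entities: dict, current_year: int = 2025) -> dict:
    """Turn extracted entities into a MongoDB filter dict."""
    query = {}
    if entities.get("studentName"):
        query["studentName"] = entities["studentName"]
    date = entities.get("date")
    if date == "last year":
        query["year"] = current_year - 1
    elif date is not None:
        query["year"] = int(date)
    return query

# e.g. model output for "I want to check report of John of last year"
raw = 'Sure! {"studentName": "John", "date": "last year"}'
print(build_report_filter(parse_entities(raw)))
# {'studentName': 'John', 'year': 2024}
```

The filter dict can then be passed straight to a pymongo `find()` call; keeping the parsing defensive matters because small local models don’t always emit clean JSON.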

@entfane @VietCat

1 Like

Hey @dannyyeung,

As my data is confidential, I can’t use those kinds of LLMs;
I can’t share my data with any third-party LLM.
That’s why I’m looking for an open-source model that can run locally on my device without internet access.

1 Like

Yes, the approach you’re describing is actually similar to what I did before.

  • A user sends a message (it can be in any form and wording).
  • An LLM (or any intent/entity extractor model) extracts the info into a form you can process programmatically.
  • Your business logic takes the extracted info, builds a query, and fetches the data from your DB.
  • The LLM uses the result to build a “natural” response, which is then sent to the user.

I understand that your challenge is that you don’t have a high-spec machine to deploy a good LLM locally, which will introduce more difficulties (e.g., incorrect info extracted from the user’s message, slow response times…). To be honest, I had the same issue when I tried to build a similar chatbot with limited resources.
From my experiments, Gemma is a bit better than the others (Qwen3, TinyLlama, some PhoBERT models) when deployed on a low-spec machine.
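The last step of that pipeline (turning the DB result into a “natural” response) is mostly prompt construction. A sketch, with a hypothetical record shape; grounding the model in the fetched record this way helps stop small models from inventing marks:

```python
def build_answer_prompt(question: str, record: dict) -> str:
    """Format the MongoDB result as a list of facts the LLM must answer from."""
    facts = "\n".join(f"- {key}: {value}" for key, value in record.items())
    return (
        "Answer the parent's question using ONLY the facts below.\n"
        "If a fact is missing, say you don't have that information.\n\n"
        f"Facts:\n{facts}\n\n"
        f"Question: {question}\nAnswer:"
    )

record = {"studentName": "John", "year": 2024, "marks": {"Math": 91, "English": 84}}
print(build_answer_prompt("What were John's marks last year?", record))
```

The resulting string is what you’d feed to whichever local model you end up choosing (Gemma, Qwen, etc.).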

2 Likes

Qwen has an offline version.

2 Likes

Okay, got it. I’ll try Gemma, and also Qwen as suggested by @dannyyeung. If I get stuck, I’ll let you know.
Thank you so much for your response, it’ll help me a lot!

2 Likes

Just an introduction to the models: if you need a relatively minor language, I think Qwen or Gemma would be good choices. If you only need English, I think SmolLM3 is excellent.

Good smaller LLMs

I’m no expert here, barely even a beginner (this is my first-ever post, I think)…

But have you considered possible data leakage in this solution? Will there be guardrails so Parent X can’t look up the marks and report cards of the children of Parents A, B, C?

I’m sure you’ve likely thought about this, but this was just the first thing that popped in my head when I read this thread. Wish you all the best, and I hope the solution comes out awesome for the school, parents and students.

2 Likes

I can’t use OpenAI or anything as data is confidential

I think they have thought about privacy carefully.

1 Like

That’s a good point and it should definitely be taken into account!

2 Likes

Use any bot you want; it makes no difference. Just index your data via textdb.

"""
Script: index_text_dataset.py
Purpose: Index a text dataset using textdb for fast retrieval and efficient data access.
Dependencies: pip install textdb
Instructions:
  • Set INPUT_FILE and OUTPUT_DB.
  • Run: python index_text_dataset.py
"""

from textdb import TextDB

# Path to your input text file (one sample per line)
INPUT_FILE = '/path/to/input.txt'

# Output directory for textdb index files
OUTPUT_DB = '/path/to/output_textdb'

def main():
    # 1. Load the dataset
    print(f"Loading data from: {INPUT_FILE}")
    with open(INPUT_FILE, 'r', encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

    # 2. Create and build the TextDB index
    print(f"Building index at: {OUTPUT_DB}")
    db = TextDB(OUTPUT_DB, mode='w')
    db.add_all(lines)  # Index all samples
    db.save()          # Persist index to disk

    print(f"Indexed {len(lines)} lines. TextDB saved to: {OUTPUT_DB}")

if __name__ == '__main__':
    main()

1 Like

Hey! Sorry for the late response.

Those things will, of course, be taken care of. Each parent will be given separate login credentials, so they can only access information related to their own children.

For example, if Parent X has 100 kids (which hopefully won’t be the case :grinning_face_with_smiling_eyes:), they’ll only be able to search for and view those 100 students. This is because the children are linked to Parent X via their unique objectId.

Hope that makes sense!
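Enforcing that link server-side could look something like this sketch; the `studentId` field and id values are illustrative (in MongoDB these would typically be the children’s ObjectIds stored against the parent), the key point being that the chatbot’s extracted filter is always intersected with the logged-in parent’s allowed set:

```python
def authorize_filter(query: dict, allowed_student_ids: list) -> dict:
    """Scope any report query to the students linked to the logged-in parent.
    Applied server-side, after entity extraction, before hitting the DB."""
    scoped = dict(query)  # never mutate the caller's filter
    scoped["studentId"] = {"$in": list(allowed_student_ids)}
    return scoped

# Whatever the chatbot extracted, Parent X can only reach their own children
q = authorize_filter({"studentName": "John", "year": 2024}, ["s1", "s7"])
print(q)  # {'studentName': 'John', 'year': 2024, 'studentId': {'$in': ['s1', 's7']}}
```

Because the `$in` clause comes from the session, not from the model output, a prompt-injected or mis-extracted query still can’t return another parent’s children.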

2 Likes

Thanks so much, bro! I’m definitely going to try that!

2 Likes

Hey, thank you so much for the reply, and sorry for the late response! I’m currently checking out Gemma 3n 4B but I’m running into an issue: it requires a lot of memory to run locally after downloading it from Hugging Face. Do I need to quantize the model before running it?

1 Like

Yeah. But quantization at runtime (on-the-fly quantization) is not that difficult. Of course, it becomes difficult if you are particular about accuracy and speed…
Also, there are many high-quality GGUF files available on the Hub, so it would be a good idea to search for those.

# pip install -U bitsandbytes
from transformers import pipeline, BitsAndBytesConfig
import torch

# https://huggingface.co/blog/4bit-transformers-bitsandbytes
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# quantization_config is a model kwarg, so pass it via model_kwargs
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-e4b-it",
    torch_dtype=torch.bfloat16,
    model_kwargs={"quantization_config": nf4_config},
)

Will this directly run without quantization?

1 Like

Yeah. GGUF files are pre-quantized and can be used in their quantized state. They are dequantized at runtime, but that is not something you need to worry about.
Hugging Face’s Transformers is not well suited to running GGUF, so if you want to use GGUF, it is better to run it with Ollama or similar tools. There are various quantization formats available for Transformers, but BitsAndBytes is usually sufficient.

1 Like