I’m building a chatbot for a school where parents can ask for their child’s report (e.g., marks from last year). All student data is stored in MongoDB.
My idea:

Use LLaMA 3 to extract intent and entities (like student name, year).
Use that info to query MongoDB.
Generate a response based on the data.
Does this sound like the right approach?
The issue: LLaMA 3 needs high-spec hardware, which I can’t afford right now.
Any suggestions for lightweight alternatives or ways to optimize this?
I can’t use OpenAI or other hosted APIs, as the data is confidential.
Thanks!

2 Likes

Hi there, you can use Gemma 3 (4B or smaller, depending on your hardware), which requires much lower specs. However, based on my experiments, the results are not great (poor quality, very slow responses).

I spent a couple of months investigating an optimal chatbot setup (low specs/cost, quick responses, good results), but my conclusion is that there is no low-cost, high-quality solution. You have to pay with at least one of cost, response time, or result quality.

2 Likes

If you purely need the answer, a good solution could be a fine-tuned encoder + decoder: you provide the table description + the user’s question, as output you get a generated query, and then you execute it and present the response. If that is not an option, it is indeed worth looking into other LLMs, such as Gemma 3 as mentioned by @VietCat. You should experiment here and see what works better.

2 Likes

Maybe you can try Qwen’s models; they seem to perform relatively well on low-end setups. At least, I chose to use Qwen myself locally.
It is from China, but its performance is better than DeepSeek’s.

3 Likes

Hey! Thank you so much for your response. I’m currently working with Phi-3 and planning to fine-tune it to see what output I get.

Actually, my current plan was simply to extract the intent and entities from the query typed by the user/parent.
For example, if a parent types, “I want to check the report of John (studentName) of last year (date)”,
the model must return John as studentName and 2024 as date.
After this, I’ll simply run a query to fetch the data from MongoDB
and generate a text response based on the MongoDB result.
Is this a good approach? Or is something better available that I’m not aware of?
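A minimal sketch of that extraction step, assuming the model is prompted to answer in JSON; the helper names and the field names (studentName, date, year) are made up for illustration, not a real schema:

```python
import json
import re

def parse_entities(model_output: str) -> dict:
    """Pull the first JSON object out of the model's raw output.
    Small local models often wrap JSON in extra chatter, so be defensive."""
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    if not match:
        return {}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}

def build_report_filter(entities: dict, current_year: int = 2025) -> dict:
    """Turn extracted entities into a MongoDB filter dict."""
    query = {}
    if entities.get("studentName"):
        query["studentName"] = entities["studentName"]
    date = entities.get("date")
    if date == "last year":
        query["year"] = current_year - 1
    elif date is not None:
        query["year"] = int(date)
    return query

# e.g. model output for "I want to check report of John of last year"
raw = 'Sure! {"studentName": "John", "date": "last year"}'
print(build_report_filter(parse_entities(raw)))
# {'studentName': 'John', 'year': 2024}
```

The filter dict can then be passed straight to a pymongo `find()` call; keeping the parsing defensive matters because small local models don’t always emit clean JSON.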

@entfane @VietCat

1 Like

Hey @dannyyeung,

As my data is confidential, I can’t use those kinds of LLMs;
I can’t share my data with any third-party LLM.
That’s why I’m looking for an open-source model that can run locally on my device without internet access.

1 Like

Yes, the approach you’re describing is actually similar to what I did before.

  • A user sends a message (it can be in any form and wording).
  • An LLM (or any intent/entity extractor model) extracts the info into a form you can process programmatically.
  • Your business logic takes the extracted info, builds a query, and fetches the data from your DB.
  • The LLM uses the result to build a “natural” response, which is then sent to the user.

I understand that your challenge is that you don’t have a high-spec machine to deploy a good LLM locally, which will introduce more difficulties (e.g., incorrect info extracted from the user’s message, slow response times…). To be honest, I had the same issue when I tried to build a similar chatbot with limited resources.
From my experiments, Gemma is a bit better than the others (Qwen3, TinyLlama, some PhoBERT models) when deployed on a low-spec machine.
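The last step of that pipeline (turning the DB result into a “natural” response) is mostly prompt construction. A sketch, with a hypothetical record shape; grounding the model in the fetched record this way helps stop small models from inventing marks:

```python
def build_answer_prompt(question: str, record: dict) -> str:
    """Format the MongoDB result as a list of facts the LLM must answer from."""
    facts = "\n".join(f"- {key}: {value}" for key, value in record.items())
    return (
        "Answer the parent's question using ONLY the facts below.\n"
        "If a fact is missing, say you don't have that information.\n\n"
        f"Facts:\n{facts}\n\n"
        f"Question: {question}\nAnswer:"
    )

record = {"studentName": "John", "year": 2024, "marks": {"Math": 91, "English": 84}}
print(build_answer_prompt("What were John's marks last year?", record))
```

The resulting string is what you’d feed to whichever local model you end up choosing (Gemma, Qwen, etc.).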

2 Likes

Qwen has an offline version.

2 Likes

Okay, got it. I’ll try Gemma, and also Qwen as suggested by @dannyyeung. If I get stuck, I’ll let you know.
Thank you so much for your response, it’ll help me a lot!

2 Likes

Just an introduction to the models: if you need a relatively minor language, I think Qwen or Gemma would be good choices. If you only need English, I think SmolLM3 is excellent.

Good smaller LLMs

I’m no expert here, barely even a beginner (this is my first-ever post, I think)…

But have you considered possible data leakage in this solution? Will there be guardrails so Parent X can’t look up the marks and report cards of the children of Parents A, B, C?

I’m sure you’ve likely thought about this, but this was just the first thing that popped in my head when I read this thread. Wish you all the best, and I hope the solution comes out awesome for the school, parents and students.

2 Likes

I can’t use OpenAI or anything as data is confidential

I think they have thought about privacy carefully.

1 Like

That’s a good point and it should definitely be taken into account!

2 Likes

Use any bot you want; it makes no difference. Just index your data via textdb.

"""
Script: index_text_dataset.py
Purpose: Index a text dataset using textdb for fast retrieval and efficient data access.
Dependencies: pip install textdb
Instructions:
  • Set INPUT_FILE and OUTPUT_DB.
  • Run: python index_text_dataset.py
"""

from textdb import TextDB

# Path to your input text file (one sample per line)
INPUT_FILE = '/path/to/input.txt'

# Output directory for textdb index files
OUTPUT_DB = '/path/to/output_textdb'

def main():
    # 1. Load the dataset
    print(f"Loading data from: {INPUT_FILE}")
    with open(INPUT_FILE, 'r', encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

    # 2. Create and build the TextDB index
    print(f"Building index at: {OUTPUT_DB}")
    db = TextDB(OUTPUT_DB, mode='w')
    db.add_all(lines)  # Index all samples
    db.save()          # Persist index to disk

    print(f"Indexed {len(lines)} lines. TextDB saved to: {OUTPUT_DB}")

if __name__ == '__main__':
    main()

1 Like

Hey! Sorry for the late response.

Those things will, of course, be taken care of. Each parent will be given separate login credentials, so they can only access information related to their own children.

For example, if Parent X has 100 kids (which hopefully won’t be the case :grinning_face_with_smiling_eyes:), they’ll only be able to search for and view those 100 students. This is because the children are linked to Parent X via their unique objectId.

Hope that makes sense!
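Enforcing that link server-side could look something like this sketch; the `studentId` field and id values are illustrative (in MongoDB these would typically be the children’s ObjectIds stored against the parent), the key point being that the chatbot’s extracted filter is always intersected with the logged-in parent’s allowed set:

```python
def authorize_filter(query: dict, allowed_student_ids: list) -> dict:
    """Scope any report query to the students linked to the logged-in parent.
    Applied server-side, after entity extraction, before hitting the DB."""
    scoped = dict(query)  # never mutate the caller's filter
    scoped["studentId"] = {"$in": list(allowed_student_ids)}
    return scoped

# Whatever the chatbot extracted, Parent X can only reach their own children
q = authorize_filter({"studentName": "John", "year": 2024}, ["s1", "s7"])
print(q)  # {'studentName': 'John', 'year': 2024, 'studentId': {'$in': ['s1', 's7']}}
```

Because the `$in` clause comes from the session, not from the model output, a prompt-injected or mis-extracted query still can’t return another parent’s children.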

2 Likes

Thanks so much, bro! I’m definitely going to try that!

2 Likes

Hey, thank you so much for the reply, and sorry for the late response! I’m currently checking out Gemma 3n 4B but I’m running into an issue: it requires a lot of memory to run locally after downloading it from Hugging Face. Do I need to quantize the model before running it?

1 Like

Yeah. But quantization at runtime (on-the-fly quantization) is not that difficult. Of course, it becomes difficult if you are particular about accuracy and speed…
Also, there are many high-quality GGUF files available on the Hub, so it would be a good idea to search for those.

# pip install -U bitsandbytes
from transformers import pipeline, BitsAndBytesConfig
import torch

# https://huggingface.co/blog/4bit-transformers-bitsandbytes
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# quantization_config is a model kwarg, so pass it via model_kwargs
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-e4b-it",
    torch_dtype=torch.bfloat16,
    model_kwargs={"quantization_config": nf4_config},
)

Will this directly run without quantization?

1 Like

Yeah. GGUF files are pre-quantized and can be used in their quantized state. They are dequantized at runtime, but that is not something you need to worry about.
Hugging Face’s Transformers is not well suited to running GGUF, so if you want to use GGUF, it is better to run it with Ollama or similar tools. There are various quantization formats available for Transformers, but BitsAndBytes is usually sufficient.

1 Like