

Demo AI Quoting System with Fine-tuning and Quantization
A technical report and reflection on building a demo AI quoting system: fine-tuning Qwen3-4B on a single RTX 4090, then deploying the quantized model on an i9 CPU
This article is not only a technical report on the demo AI quoting system; it also includes some thoughts on the rising "AI Staff" industry. It is also a treasured record of my internship. Many thanks to CX for giving me, a first-year student who is still learning, this opportunity.
0) Context
- Company’s Goal: Applying “AI Staff” to manufacturing/distribution workflows
- Use case: Parse customer quote Excels/Images/PDFs → normalized JSON → matching price database.
- Why LLM vs rules: document variety, long‑tail fields, layout drift.
I normalize each line item into this compact schema used throughout this post:
{"index": "", "model": "", "voltage": "", "spec": "", "unit": "", "num": ""}
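A strict check of this schema is handy when post-processing model output. The sketch below is not part of the original pipeline: `SCHEMA_KEYS` and `parse_line_item` are names introduced here for illustration, and it assumes that every value is emitted as a string.

```python
import json

# Keys of the line-item schema used in this post.
SCHEMA_KEYS = ("index", "model", "voltage", "spec", "unit", "num")


def parse_line_item(text: str) -> dict:
    """Parse one model response and verify that it matches the schema exactly."""
    obj = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    if set(obj) != set(SCHEMA_KEYS):
        raise ValueError(f"unexpected keys: {sorted(obj)}")
    if not all(isinstance(v, str) for v in obj.values()):
        raise ValueError("all values are assumed to be strings")
    return obj


item = parse_line_item(
    '{"index": "1", "model": "ZC-ERF", "voltage": "0.6/1kV", '
    '"spec": "1*300", "unit": "m", "num": "1000"}'
)
print(item["model"], item["spec"])
```

Rejecting anything with missing or extra keys keeps the downstream price lookup from silently working on partial records.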
1) Model & Setup
- Base model: Qwen3-4B-Instruct-2507
- Hardware: RTX 4090, i9-14900K
- Framework: LLaMA-Factory, llama.cpp
- Approach: LoRA, GGUF (4-bit quant)
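To get a feel for why LoRA and 4-bit quantization fit this hardware, here is a back-of-the-envelope calculation. It is a rough sketch, not a measurement: the 4e9 parameter count, the 2560 x 2560 layer shape, and the LoRA rank of 8 are illustrative assumptions, and real 4-bit GGUF files also store per-block scale data, so they come out somewhat larger than the pure 4-bit figure.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA learns B (d_out x rank) and A (rank x d_in) instead of a full d_out x d_in update."""
    return rank * (d_in + d_out)


N_PARAMS = 4e9  # assumed round figure for a "4B" model

fp16_gb = weight_gb(N_PARAMS, 16)
q4_gb = weight_gb(N_PARAMS, 4)
print(f"FP16 weights:  {fp16_gb:.1f} GB")
print(f"4-bit weights: {q4_gb:.1f} GB (lower bound, ignores quantization scales)")

# One square projection matrix with illustrative dimensions.
D_IN, D_OUT, RANK = 2560, 2560, 8
full = D_IN * D_OUT
lora = lora_trainable_params(D_IN, D_OUT, RANK)
print(f"Full matrix: {full:,} params; LoRA adapter: {lora:,} params ({lora / full:.3%})")
```

Under these assumptions the adapter trains well under 1% of the parameters of each adapted matrix, which is what makes fine-tuning feasible on a single consumer GPU, and the 4-bit weights are roughly a quarter of the FP16 size, which is what makes CPU deployment practical.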
2) Data Processing & Formatting
- Source: Thanks to my amazing colleagues, I was given Excel sheets that had already been labelled to a high standard.
Anonymization and sharing policy
I do not publish any company data. All examples here are redacted; I only show a 5‑line preview screenshot for illustration. The full dataset is private.
Why only Excel in v1
Although the company handles PDFs/images/Excels, this demo focuses on Excel quotes only. Future work: OCR pipeline → the same JSON schema.
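| 序号 | 物料编码 | 货物(功能规格)描述 | 单位 | 数量 | 单价(元) | 税率(%) | 单价(元) | CX-型号 | CX-电压 | CX-规格 | CX-单位 | CX-数量 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 500135730 | 低压电力电缆,ERF,铜,300,1芯,ZC,无铠装,普通 | 千米 | 1 | 251885.1 | 13% | 284630.16 | ZC-ERF | 0.6/1kV | `1*300` | m | 1000 |
| 2 | 500109714 | 低压电力电缆,VV,铜,35/16,3+1芯,ZC,22,普通 | 千米 | 1 | 90639.79 | 13% | 102422.96 | ZC-VV22 | 0.6/1kV | `3*35+1*16` | m | 1000 |
| 3 | 500109080 | 低压电力电缆,YJLV,铝,120,4芯,ZC,22,普通 | 千米 | 1 | 52033.58 | 13% | 58797.95 | ZC-YJLV22 | 0.6/1kV | `4*120` | m | 1000 |
| 4 | 500132449 | 低压电力电缆,YJLV,铝,120,4芯,ZC,无铠装,普通 | 千米 | 1 | 44103.17 | 13% | 49836.58 | ZC-YJLV | 0.6/1kV | `4*120` | m | 1000 |
| 5 | 500015270 | 低压电力电缆,YJLV,铝,120,4芯,不阻燃,22,普通 | 千米 | 1 | 51573.13 | 13% | 58277.64 | YJLV22 | 0.6/1kV | `4*120` | m | 1000 |

Column glosses: 序号 = row no.; 物料编码 = material code; 货物(功能规格)描述 = goods (functional specification) description; 单位 = unit; 数量 = quantity; 单价(元) = unit price (CNY); 税率(%) = tax rate (%); CX-型号 / CX-电压 / CX-规格 / CX-单位 / CX-数量 = annotated model / voltage / spec / unit / quantity. 千米 means kilometre, which is why a source quantity of 1 becomes 1000 m in the annotation. The first description reads "low-voltage power cable, ERF, copper, 300, 1 core, ZC, unarmored, standard".

- Processing & formatting: Because I use LLaMA-Factory as my fine-tuning framework, the dataset has to be converted to a JSONL file in Alpaca format for supervised fine-tuning. An Alpaca record typically contains four fields: instruction (prompt), input (context), output (desired response), and system (system prompt). For details, see LLM dataset formats in LLaMA-Factory.
Here is the Python script I wrote to process the dataset.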
```python
import argparse
import json
import re
from pathlib import Path

import pandas as pd

# Dash variants seen in the annotated column headers:
# ASCII hyphen, em dash (U+2014), en dash (U+2013).
# The hyphen comes first so it is a literal, not a range operator.
DASH = r"[-\u2014\u2013]"

MODEL_COLS = [f"CX{DASH}型号", "CX-型号"]
VOLT_COLS = [f"CX{DASH}电压", "CX-电压"]
SPEC_COLS = [f"CX{DASH}规格", "CX-规格"]
UNIT_COLS = [f"CX{DASH}单位", "CX-单位"]
NUM_COLS = [f"CX{DASH}数量", "CX-数量"]
CX_PATTERNS = [
    f"^CX{DASH}型号$",
    f"^CX{DASH}电压$",
    f"^CX{DASH}规格$",
    f"^CX{DASH}单位$",
    f"^CX{DASH}数量$",
]

# Characters that mark a candidate column name as a regex rather than a literal.
REGEX_CHARS = set("[]^$\\")

DEFAULT_INSTRUCTION = (
    "Given one table row of cables, produce ONLY a JSON object with keys exactly: "
    "index, model, voltage, spec, unit, num. If the input contains an index token "
    "like '<n>#', set index to '<n>' (no '#'). If the input has no index, set index "
    "to an empty string."
)
DEFAULT_SYSTEM = (
    "Return strict JSON with keys exactly: index, model, voltage, spec, unit, num. "
    "No extra text. Do not invent values. For index: if the input includes an index "
    "token like '<n>#',copy the number and output it as '<n>' (no '#'); otherwise "
    "set index to an empty string."
)


def to_str(x):
    """Convert a cell value to a clean string ('' for NaN)."""
    if pd.isna(x):
        return ""
    s = str(x).strip()
    # Collapse internal whitespace.
    s = re.sub(r"\s+", " ", s)
    # Drop the trailing ".0" that Excel adds to integer-valued cells.
    if re.match(r"^\d+\.0$", s):
        s = s[:-2]
    return s


def is_cx_col(colname) -> bool:
    """Return True if the column is one of the annotated CX- columns."""
    if colname is None:
        return False
    name = str(colname).strip()
    return any(re.match(pat, name) for pat in CX_PATTERNS)


def pick_first_present(row, candidates):
    """Return the value of the first candidate column found (regex or literal)."""
    for cand in candidates:
        if any(ch in REGEX_CHARS for ch in cand):
            # Regex candidate: require the whole column name to match.
            for c in row.index:
                if re.fullmatch(cand, str(c).strip()):
                    return to_str(row[c])
        elif cand in row.index:
            return to_str(row[cand])
    return ""


def build_input_values_only(row):
    """Join all non-CX cell values from the row into one string (values only)."""
    vals = []
    for col in row.index:
        if is_cx_col(str(col).strip()):
            continue
        v = to_str(row[col])
        if v != "":
            vals.append(v)
    return " | ".join(vals)


def make_output_obj(row, one_based_index):
    """Build the target JSON object from the annotated CX- columns."""
    return {
        "index": str(one_based_index),
        "model": pick_first_present(row, MODEL_COLS),
        "voltage": pick_first_present(row, VOLT_COLS),
        "spec": pick_first_present(row, SPEC_COLS),
        "unit": pick_first_present(row, UNIT_COLS),
        "num": pick_first_present(row, NUM_COLS),
    }


def process_excel(path, sheet, instruction, system, output_as_object):
    """Convert one Excel sheet into a list of Alpaca records."""
    try:
        df = pd.read_excel(path, sheet_name=sheet)
    except Exception as e:
        print(f"[WARN] Skip {path} (read error): {e}")
        return []
    df = df.dropna(how="all")
    records = []
    for i, row in df.iterrows():
        # The DataFrame index is 0-based, so add 1 to get the row number.
        index_1_based = i + 1
        input_text = build_input_values_only(row)
        out_obj = make_output_obj(row, index_1_based)
        rec = {
            "instruction": instruction,
            "input": input_text,
            "output": out_obj if output_as_object else json.dumps(out_obj, ensure_ascii=False),
        }
        if system:
            rec["system"] = system
        records.append(rec)
    return records


def iter_excels(paths):
    """Yield Excel files from a mix of folders (scanned recursively) and files."""
    for p in paths:
        p = Path(p)
        if p.is_dir():
            yield from sorted(p.rglob("*.xls*"))
        elif p.is_file() and p.suffix.lower().startswith(".xls"):
            yield p


def main():
    ap = argparse.ArgumentParser(
        description="Convert a folder of Excel files to Alpaca JSONL (values-only input, CX- output)."
    )
    ap.add_argument("--in", dest="inputs", nargs="+", required=True,
                    help="Folder(s) and/or file(s). Folders scanned recursively.")
    ap.add_argument("--sheet", default="CX-1",
                    help="Sheet index (int) or name (str). Default: CX-1.")
    ap.add_argument("--out", default="CX_AI_Quote_813.jsonl", help="Output JSONL.")
    ap.add_argument("--instruction", default=DEFAULT_INSTRUCTION, help="Instruction text.")
    ap.add_argument("--no-system", action="store_true", help="Omit the system field.")
    ap.add_argument("--output-as-object", action="store_true",
                    help="Store 'output' as a JSON object instead of a JSON string.")
    args = ap.parse_args()

    sheet = int(args.sheet) if args.sheet.isdigit() else args.sheet
    system = None if args.no_system else DEFAULT_SYSTEM

    all_recs, files = [], list(iter_excels(args.inputs))
    for f in files:
        recs = process_excel(f, sheet, args.instruction, system, args.output_as_object)
        print(f"[OK] {f.name}: {len(recs)} rows")
        all_recs.extend(recs)

    Path(args.out).parent.mkdir(parents=True, exist_ok=True)
    with open(args.out, "w", encoding="utf-8") as w:
        for r in all_recs:
            w.write(json.dumps(r, ensure_ascii=False) + "\n")
    print(f"[DONE] Wrote {len(all_recs)} samples from {len(files)} file(s) → {args.out}")


if __name__ == "__main__":
    main()
```
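Once the JSONL file exists, LLaMA-Factory needs to be told about it through an entry in its `dataset_info.json`. The snippet below is a sketch of such an entry, plus a quick sanity check on one record. The dataset name `cx_quote` and the helper `check_record` are hypothetical names chosen here; the column mapping follows the Alpaca layout described in LLaMA-Factory's dataset documentation, so it is worth checking against the version you have installed.

```python
import json

# Hypothetical dataset name; the file name is the script's default --out value.
dataset_entry = {
    "cx_quote": {
        "file_name": "CX_AI_Quote_813.jsonl",
        "columns": {
            "prompt": "instruction",
            "query": "input",
            "response": "output",
            "system": "system",
        },
    }
}
print(json.dumps(dataset_entry, ensure_ascii=False, indent=2))


def check_record(line: str) -> dict:
    """Verify that one JSONL line has the Alpaca fields and a JSON-string output."""
    rec = json.loads(line)
    for field in ("instruction", "input", "output"):
        if field not in rec:
            raise ValueError(f"missing field: {field}")
    json.loads(rec["output"])  # the output itself must be a valid JSON string
    return rec


# A shortened sample record; the instruction text is abbreviated for illustration.
sample = json.dumps(
    {
        "instruction": "Given one table row of cables, ...",
        "input": "500135730 | 千米 | 1",
        "output": json.dumps({"index": "1", "model": "ZC-ERF"}),
    },
    ensure_ascii=False,
)
record = check_record(sample)
print(record["input"])
```

The entry is meant to be merged into the `dataset_info.json` in LLaMA-Factory's data directory, after which the dataset can be referred to by name in the training configuration.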
- Instruction: “Given one table row of cables, produce ONLY a JSON object with keys exactly: index, model, voltage, spec, unit, num. If the input contains an index token like `'<n>#'`, set index to `'<n>'` (no '#'). If the input has no index, set index to an empty string.”
- System: “Return strict JSON with keys exactly: index, model, voltage, spec, unit, num. No extra text. Do not invent values. For index: if the input includes an index token like `'<n>#'`,copy the number and output it as `'<n>'` (no '#'); otherwise set index to an empty string.”
- Input: The input is built from the cell values of each row in the Excel table, joined with a '|' delimiter. A '#' is appended to the index number, which is taken from the Excel row number, to distinguish it from real cell values.
  The reason for marking each row with an index is to support multi-row inputs in the future. The index preserves a 1:1 mapping between inputs and outputs, which keeps postprocessing simple without changing the schema.
- Output: The output format is
  {"index": "", "model": "", "voltage": "", "spec": "", "unit": "", "num": ""}
  The output is taken from the CX- columns, which hold the annotated data, except for the index. The index in the output also comes from the Excel row number.
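The indexing idea can be sketched as follows. This is an illustration rather than the code used in the project: `build_input` and `align_outputs` are hypothetical helpers, and the exact placement of the `<n>#` token (here, as the first `|`-separated field) is an assumption.

```python
import json


def build_input(index: int, values: list[str]) -> str:
    """Prefix the row values with an '<n>#' index token and join them with ' | '."""
    return " | ".join([f"{index}#", *values])


def align_outputs(indices: list[int], outputs: list[str]) -> dict:
    """Map each expected index to the parsed output carrying it (None if missing)."""
    by_index = {}
    for text in outputs:
        obj = json.loads(text)
        by_index[obj["index"]] = obj
    return {i: by_index.get(str(i)) for i in indices}


row_values = ["500135730", "低压电力电缆,ERF,铜,300,1芯,ZC,无铠装,普通", "千米", "1"]
prompt = build_input(1, row_values)
print(prompt)

# Outputs may come back in any order; the index restores the pairing.
outputs = [
    '{"index": "2", "model": "ZC-VV22", "voltage": "0.6/1kV", '
    '"spec": "3*35+1*16", "unit": "m", "num": "1000"}',
    '{"index": "1", "model": "ZC-ERF", "voltage": "0.6/1kV", '
    '"spec": "1*300", "unit": "m", "num": "1000"}',
]
aligned = align_outputs([1, 2, 3], outputs)
print(aligned[1]["model"], aligned[2]["model"], aligned[3])
```

Because each output carries its own index, a row the model skipped shows up as `None` instead of shifting every later row by one position.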
3) Fine-tuning Recipe
- Method: LoRA on Qwen3-4B-Instruct (SFT via LLaMA-Factory)
- Dataset size: ~1099 JSON objects
- Formatting: Alpaca style (instruction/input/output[/system]); assistant output is JSON‑only