The development of new spectral analysis methods in bio thin-film detection has generated intense interest in terahertz (THz) spectroscopy and its application in a wide range of fields. In this paper, it is the first time that machine learning methods are applied to the quantitative characterization of bovine serum albumin (BSA) deposited thin-films detected by terahertz time-domain spectroscopy. The spectra data of BSA thin-films prepared by solutions with concentrations ranging from 0.5 to 35 mg/ml are analyzed using the support vector regression method to learn the underlying model of the frequency against the target concentration. The learned mode successfully predicts the concentrations of the unknown test samples with a coefficient of determination R2 = 0.97932. Furthermore, aiming to identify the relevance of each frequency to the concentration, the maximal information coefficient statistical analysis is used and the three most discriminating frequencies in THz frequency are identified at 1.2, 1.1 and 0.5 THz respectively, which means a good prediction for BSA concentration can be achieved by using the top three relevant frequencies. Moreover, the top discriminating frequencies are in good agreement with the frequencies predicted by a long-wavelength elastic vibration model for BSA protein.
Research. Development. Production.
We are a leading supplier to the global Life Science industry with solutions and services for research, biotechnology development and production, and pharmaceutical drug therapy development and production.