14 releases
Version | Date |
---|---|
0.0.15 (new) | Oct 31, 2024 |
0.0.13 | Aug 31, 2024 |
0.0.12 | Jul 28, 2024 |
0.0.8 | Mar 22, 2024 |
0.0.3 | Oct 28, 2023 |
MalwareDB Types
Note: These parsers are designed to extract potentially useful features from various file types. They are in no way designed to be complete representations of their respective file format. That said, contributions are welcome to extract additional features/information, to add support for a new file format, or to make general improvements!
This crate contains the logic for parsing some executable and document datatypes, and for determining if a Zip file is an MS Office document or an archive of files.
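The Zip check described above can be sketched as follows. This is only an illustration, not the crate's actual code: `ZipKind` and `classify_zip` are hypothetical names, and the function assumes the caller has already listed the archive's entry names with whatever Zip reader is in use. It relies on the fact that Office Open XML documents carry a `[Content_Types].xml` manifest plus an application directory (`word/`, `xl/`, or `ppt/`).

```rust
/// Coarse classification of a Zip container (hypothetical type, for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum ZipKind {
    Word,
    Excel,
    PowerPoint,
    /// Has the OOXML manifest but no recognized application directory.
    OtherOffice,
    /// No OOXML manifest: treat as a plain archive of files.
    Archive,
}

/// Classify a Zip file from the names of its entries.
pub fn classify_zip<'a, I>(entry_names: I) -> ZipKind
where
    I: IntoIterator<Item = &'a str>,
{
    let mut has_manifest = false;
    let mut kind: Option<ZipKind> = None;
    for name in entry_names {
        if name == "[Content_Types].xml" {
            has_manifest = true;
        } else if name.starts_with("word/") {
            // Keep the first application directory seen.
            kind = kind.or(Some(ZipKind::Word));
        } else if name.starts_with("xl/") {
            kind = kind.or(Some(ZipKind::Excel));
        } else if name.starts_with("ppt/") {
            kind = kind.or(Some(ZipKind::PowerPoint));
        }
    }
    match (has_manifest, kind) {
        (true, Some(k)) => k,
        (true, None) => ZipKind::OtherOffice,
        (false, _) => ZipKind::Archive,
    }
}
```

Working on entry names alone keeps the check cheap, since no entry has to be decompressed to make the decision.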
Executable Types:
- ELF (feature flag `elf`, default)
- Mach-O and Fat Mach-O (feature flag `macho`, default). Fat Mach-O's embedded Mach-O binaries are extracted and processed as child elements.
- PE32 (feature flag `pe32`, default)
- PEF (feature flag `pef`, not default and probably not useful)
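As a rough illustration of how these formats can be told apart before any real parsing, the sketch below inspects the leading magic bytes. `ExecKind` and `sniff_executable` are hypothetical names, not part of this crate's API, and the crate's own detection may work differently. All reads are bounds-checked so truncated input yields `None` instead of a panic.

```rust
/// Executable families from the list above (hypothetical enum, for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum ExecKind {
    Elf,
    MachO,
    FatMachO,
    Pe32,
    Pef,
}

/// Guess the executable type from magic bytes; `None` if nothing matches.
pub fn sniff_executable(data: &[u8]) -> Option<ExecKind> {
    match data {
        [0x7f, b'E', b'L', b'F', ..] => Some(ExecKind::Elf),
        // 32- and 64-bit Mach-O magic, in either byte order.
        [0xfe, 0xed, 0xfa, 0xce, ..]
        | [0xce, 0xfa, 0xed, 0xfe, ..]
        | [0xfe, 0xed, 0xfa, 0xcf, ..]
        | [0xcf, 0xfa, 0xed, 0xfe, ..] => Some(ExecKind::MachO),
        // Fat (universal) header. Java class files share this magic,
        // so a real parser needs a further check.
        [0xca, 0xfe, 0xba, 0xbe, ..] => Some(ExecKind::FatMachO),
        // PEF containers start with the tags "Joy!" and "peff".
        [b'J', b'o', b'y', b'!', b'p', b'e', b'f', b'f', ..] => Some(ExecKind::Pef),
        // DOS stub: follow the little-endian offset at 0x3c to "PE\0\0".
        [b'M', b'Z', ..] => {
            let p = data.get(0x3c..0x40)?;
            let off = u32::from_le_bytes([p[0], p[1], p[2], p[3]]) as usize;
            let sig = data.get(off..off.checked_add(4)?)?;
            if sig == &b"PE\0\0"[..] {
                Some(ExecKind::Pe32)
            } else {
                None
            }
        }
        _ => None,
    }
}
```

A bare `MZ` header without a valid PE signature is rejected here, since that is a plain DOS executable and not a PE32 file.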
For each executable, the goal is to extract:
- Section information: names, sizes, entropy
- Import data
- Target: architecture, operating system, endianness, pointer size (32 vs 64-bit)
- Binary type (object file, executable, library, etc.)
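For the entropy figure in the section information, a common choice is Shannon entropy over byte frequencies, which ranges from 0.0 (one repeated byte) to 8.0 (uniformly distributed bytes). The sketch below assumes that definition; `shannon_entropy` is a hypothetical helper and the crate's own calculation may differ.

```rust
/// Shannon entropy of a byte slice, in bits per byte (0.0 to 8.0).
pub fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let len = data.len() as f64;
    // Sum -p * log2(p) over the byte values that actually occur.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

High values (close to 8.0) for a section often indicate compressed or encrypted content, which is why the number is useful as a malware feature.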
Some complications:
- How to get the imports for ELFs? Go has this figured out, but I haven't been able to replicate it. See Goblin issue #363.
- Should I ditch the custom parsers for Goblin? It would allow me to get Authenticode data from PE32 files, but I worry it won't be tolerant of malformed files (as malware often is).
Document Types:
- PDF, via the `pdf` crate (feature flag `pdf`, default)
- RTF, currently incomplete (feature flag `rtf`, default)
There should be a simple way to represent the needed data so the component which stores the data in the database doesn't have to be aware of file formats.
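One possible shape for such a representation is a format-independent summary struct behind a trait, so the storage component only ever sees the summary. Everything in this sketch (`FeatureSummary`, `Summarize`, `ToyElf`, `to_row`) is hypothetical and exists only to illustrate the idea; it is not the crate's actual API.

```rust
/// Format-independent view of a parsed file (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct FeatureSummary {
    pub type_name: &'static str,
    pub architecture: Option<String>,
    pub is_64bit: Option<bool>,
    pub imports: Vec<String>,
}

/// Implemented by each format-specific parser result.
pub trait Summarize {
    fn summarize(&self) -> FeatureSummary;
}

/// Stand-in for a real ELF parser's output.
pub struct ToyElf {
    pub machine: String,
    pub is_64bit: bool,
    pub needed_libs: Vec<String>,
}

impl Summarize for ToyElf {
    fn summarize(&self) -> FeatureSummary {
        FeatureSummary {
            type_name: "ELF",
            architecture: Some(self.machine.clone()),
            is_64bit: Some(self.is_64bit),
            imports: self.needed_libs.clone(),
        }
    }
}

/// The storage side depends only on the trait, never on a file format.
pub fn to_row(item: &dyn Summarize) -> String {
    let s = item.summarize();
    format!(
        "{}|{}|{}",
        s.type_name,
        s.architecture.as_deref().unwrap_or("unknown"),
        s.imports.len()
    )
}
```

Fields are optional where a format may not provide them, for instance a PDF has no architecture, so document types can implement the same trait without inventing values.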