#dataset #numbers #business #analysis #parser #json #abn

app simple-abns

Simplify the ABR's Australian Business Number dataset for easier analysis

1 unstable release

0.1.0 Apr 26, 2024

#26 in #business

38 downloads per month

MIT license

24KB
526 lines

simple-abns

simple-abns parses the ABR's Australian Business Number dataset and converts it to a simpler JSON format.

You can download a copy of the converted dataset. Note that this is not updated automatically - please open an issue if it could use a refresh.

You can also find machine-readable names for the entity types the ABR uses in ./entity_types.json.

If you'd like to generate the dataset yourself, you'll need to download the raw XML data and place all 20 chunks in ./raw. simple-abns will parse them and print each ABN record as a seperate line. You can see progress and compress the output using:

cargo run --release | pv -ls 18M | zstd -T0 -9 > simple-abns.jsonl.zst

Example

Input:

<ABR recordLastUpdatedDate="20240412" replaced="N">
	<ABN status="ACT" ABNStatusFromDate="19991101">88712649015</ABN>
	<EntityType>
		<EntityTypeInd>SGE</EntityTypeInd>
		<EntityTypeText>State Government Entity</EntityTypeText>
	</EntityType>
	<MainEntity>
		<NonIndividualName type="MN">
			<NonIndividualNameText>STATE EMERGENCY SERVICE (NSW)</NonIndividualNameText>
		</NonIndividualName>
		<BusinessAddress>
			<AddressDetails>
				<State>NSW</State>
				<Postcode>2500</Postcode>
			</AddressDetails>
		</BusinessAddress>
	</MainEntity>
	<GST status="ACT" GSTStatusFromDate="20000701" />
	<OtherEntity>
		<NonIndividualName type="TRD">
			<NonIndividualNameText>NEW SOUTH WALES STATE EMERGENCY SERVICE</NonIndividualNameText>
		</NonIndividualName>
	</OtherEntity>
</ABR>

Output:

{
  "abn": "88712649015",
  "status": "Active",
  "status_since": "1999-11-01",
  "last_updated": "2024-04-12",
  "entity_name": {
    "type": "NonIndividual",
    "name": "STATE EMERGENCY SERVICE (NSW)"
  },
  "entity_type": "SGE",
  "trade_names": [
    "NEW SOUTH WALES STATE EMERGENCY SERVICE"
  ],
  "postcode": "2500",
  "state": "NSW",
  "gst_status": "Active",
  "gst_status_since": "2000-07-01"
}

Dependencies

~3.5–5MB
~93K SLoC