Find phrase in text file. Python

In this post I make a program that searches a text file for a word or phrase chosen by the user. It prints every sentence that contains it.


Plan

  • Find a file to use for testing
  • Input
  • Save file
  • Format text
  • Split text
  • Check each sentence
  • Print results

Find a file to use for testing

I used this random text generator and saved it as a text file. You can use anything (make sure your text has full stops).


Input

phrase=input("What phrase do you want to find in your text? ")
phrase=phrase.lower()

This lets the user input a phrase and makes the phrase lowercase.


Save file

#Open file
with open('Text.txt','r') as f:
    text=f.read()

This opens the file, reads its contents into the variable text and closes the file.
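If Text.txt is missing, the program stops with an error. Here is a minimal sketch of one way to handle that. The helper name read_text and the utf-8 encoding are my assumptions, not part of the original program.

```python
def read_text(path):
    """Return the contents of the file at path, or None if it does not exist.

    read_text is a hypothetical helper; utf-8 is an assumed encoding.
    """
    try:
        # The with block closes the file automatically, even if reading fails.
        with open(path, 'r', encoding='utf-8') as f:
            return f.read()
    except FileNotFoundError:
        return None
```

You could then write text=read_text('Text.txt') and print a friendly message when the result is None.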


Format text

#Remove useless characters
text=' '.join(text.split('\n'))
text=''.join(text.split(','))
text=''.join(text.split(':'))
text=''.join(text.split(';'))
text=' '.join(text.split('  '))
text=text.lower()

#Change Punctuation
text='.'.join(text.split('!'))
text='.'.join(text.split('?'))

This code gets rid of common punctuation, so if you search for ‘man’, a sentence containing ‘man:’ will still appear.

Then the text is made lowercase. This prevents case-sensitivity issues.

Then I changed the punctuation so that both “?” and “!” count as full stops.
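The same clean-up can probably be done in fewer steps with str.translate. This is only a sketch: clean_text is a name I made up, and unlike the code above it collapses any run of spaces, not just one pair of double spaces.

```python
def clean_text(text):
    """Clean up text in one pass (hypothetical helper, not the original code)."""
    table = str.maketrans({
        '\n': ' ',                          # line breaks become spaces
        ',': None, ':': None, ';': None,    # these characters are deleted
        '!': '.', '?': '.',                 # these count as full stops
    })
    text = text.translate(table).lower()
    # Assumption: collapse every run of spaces down to a single space.
    while '  ' in text:
        text = text.replace('  ', ' ')
    return text
```

For example, clean_text("Hello, World!\nIs it:  ok?") gives "hello world. is it ok.".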


Split text

#Split text
sentences=text.split('.')
valid=[]

This code splits the text into sentences at every full stop and creates an empty list, valid, to hold the matching sentences.
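One thing to be aware of: text.split('.') leaves a leading space on every sentence after the first, and an empty string at the end if the text finishes with a full stop. A small sketch that tidies this up, with split_sentences being a hypothetical helper name:

```python
def split_sentences(text):
    """Split on full stops, trim spaces and drop empty pieces (hypothetical helper)."""
    return [s.strip() for s in text.split('.') if s.strip()]
```

With this, split_sentences("hello world. is it ok.") gives ['hello world', 'is it ok'] instead of ['hello world', ' is it ok', ''].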


Check each sentence

#Check each sentence
for x in sentences:
    words=x.split(' ')
    for y in words:
        if phrase==y:
            if x not in valid:
                valid.append(x)

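This checks every word in every sentence and keeps the sentences that contain the search term. Because the comparison is made word by word, a search term containing a space never matches, so only single words are found.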
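To match phrases of more than one word, one option is to compare runs of consecutive words instead of single words. This is a sketch under my own assumptions, not the method used above; contains_phrase and find_sentences are hypothetical names.

```python
def contains_phrase(sentence, phrase):
    """Return True if the words of phrase appear consecutively in sentence."""
    words = sentence.split()      # split() with no argument ignores extra spaces
    target = phrase.split()
    n = len(target)
    if n == 0:
        return False              # an empty phrase matches nothing
    # Slide a window of n words across the sentence and compare each one.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def find_sentences(sentences, phrase):
    """Collect each matching sentence once, keeping the original order."""
    valid = []
    for s in sentences:
        if contains_phrase(s, phrase) and s not in valid:
            valid.append(s)
    return valid
```

Like the original loop, this still matches whole words only, so searching for ‘man’ does not return sentences that only contain ‘woman’.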


Print results

#print results
for z in valid:
    print(z)

This prints every sentence containing the phrase on its own line. You could just do:

print(valid)

But that prints the whole list on one line, brackets and quotes included, which is harder to read.
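If you want the output to be a bit friendlier, here is a sketch that numbers the results and says so when nothing was found. The name format_results and the wording of the message are my own choices.

```python
def format_results(valid):
    """Return the results as numbered lines (hypothetical helper)."""
    if not valid:
        return "No sentences found."
    # enumerate(..., start=1) numbers the sentences from 1 instead of 0.
    return "\n".join(f"{i}. {s}" for i, s in enumerate(valid, start=1))


print(format_results(["the old man sat", "a man walked by"]))
```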


Full code

#Input
phrase=input("What phrase do you want to find in your text? ")
phrase=phrase.lower()

#Open file
with open('Text.txt','r') as f:
    text=f.read()

#Remove useless characters
text=' '.join(text.split('\n'))
text=''.join(text.split(','))
text=''.join(text.split(':'))
text=''.join(text.split(';'))
text=' '.join(text.split('  '))
text=text.lower()

#Change Punctuation
text='.'.join(text.split('!'))
text='.'.join(text.split('?'))

#Split text
sentences=text.split('.')
valid=[]

#Check each sentence
for x in sentences:
    words=x.split(' ')
    for y in words:
        if phrase==y:
            if x not in valid:
                valid.append(x)

#print results
for z in valid:
    print(z)
